The Pattern Classification Based on the Nearest Feature Midpoints

Zonglin Zhou, Chee Keong Kwoh
Bioinformatics Research Centre, School of Computer Engineering
Nanyang Technological University, Nanyang Avenue, Singapore 639798
[email protected], Fax: (+65) 6792-6559

Abstract

In this paper, we propose a novel method, called the nearest feature midpoint (NFM), for pattern classification. Any two feature points of the same class are generalized by the feature midpoint (FM) between them, which expands the representational capacity of the available prototypes. Classification is based on the nearest distance from the query feature point to each FM. A theoretical proof is provided to show that, for the n-dimensional Gaussian distribution, classification based on the NFM distance metric achieves the least error probability compared with classification based on any other point on the feature lines. Furthermore, a theoretical investigation indicates that, under an i.i.d. assumption, the nearest feature line (NFL) method is approximately equivalent to the NFM when the dimension of the feature space is high. An empirical evaluation on a simulated data set agrees with these theoretical results.

1 Introduction

In pattern recognition, various distance metrics have been used for classification: the Euclidean distance, the cosine distance, the Hamming distance, and so on. They all measure the dissimilarity between the query and an individual prototype (feature point). In such classification, a class is treated as a collection of isolated points in the feature space, and there is no class-membership concept for the prototypes. This type of classification can be referred to collectively as nearest-neighbor (NN) classification [3, 2]. However, in many cases multiple prototypes are available within a class. Such a characteristic can be used to improve classification performance but is ignored by NN-type methods. In [6, 4, 5], the nearest feature line (NFL) method is proposed for pattern classification to circumvent this limitation of the NN. The basic assumption made in the NFL is that at least two prototype feature points are available for each class, which is usually satisfied. In the NFL, a feature subspace is constructed for each class from the straight lines (feature lines) passing through each pair of prototypes (feature points) belonging to that class. The prototypes are generalized by the feature lines. A feature line (FL) covers more of the feature space than the two feature points alone and virtually provides an infinite number of feature points of the class to which the two prototypes belong. The representational capacity of the available prototypes is thus expanded.

In this paper, we present a new method, called the nearest feature midpoint (NFM), for pattern classification. The basic assumption made in the NFM is the same as in the NFL, that is, at least two prototype feature points are available for each class. In the NFM, a feature subspace is constructed for each class from the midpoints (feature midpoints) between each pair of prototypes belonging to that class. The NFM thus also makes use of the information about each class contained in its multiple prototypes. The within-class prototypes are generalized by the feature midpoints to represent variants of that class, and the generalization ability of the classifier is thereby improved.

2 The Pattern Classification Using NFL and NFM

In the NFL, the straight line passing through two prototypes $x_1$ and $x_2$ of the same class, denoted $\overline{x_1 x_2}$, is called a feature line (FL) of that class. The feature point $x$ of a query (test) sample is projected onto an FL as the point $x_p$ (Fig. 1). The FL distance between $x$ and $\overline{x_1 x_2}$ is defined as $d(x, \overline{x_1 x_2}) = \|x - x_p\|$, where $\|\cdot\|$ is some norm. The projection point can be computed as $x_p = x_1 + \mu (x_2 - x_1)$, where $\mu \in \mathbb{R}$, called the position parameter, can be calculated from $x_1$, $x_2$ and $x$ as follows. Because $x - x_p$ is perpendicular to $x_2 - x_1$, we have $(x - x_p) \cdot (x_2 - x_1) = (x - x_1 - \mu(x_2 - x_1)) \cdot (x_2 - x_1) = 0$, where $\cdot$ stands for the dot product, and thus




\mu = \frac{(x - x_1)\cdot(x_2 - x_1)}{(x_2 - x_1)\cdot(x_2 - x_1)}    (1)
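For concreteness, the computation in Eq. (1) can be sketched in a few lines of NumPy (an illustrative implementation of ours, not code from the original work; the function name is arbitrary):

import numpy as np

def fl_distance(x, x1, x2):
    # Distance from the query x to the feature line through prototypes x1 and x2.
    # Returns the FL distance ||x - x_p|| and the position parameter mu of Eq. (1),
    # where x_p = x1 + mu * (x2 - x1) is the projection of x onto the line.
    d = x2 - x1
    mu = np.dot(x - x1, d) / np.dot(d, d)   # Eq. (1)
    x_p = x1 + mu * d                       # projection point
    return np.linalg.norm(x - x_p), mu

For instance, fl_distance(np.array([1.0, 1.0]), np.array([0.0, 0.0]), np.array([2.0, 0.0])) returns (1.0, 0.5): the query lies at distance 1 from the line, directly above the midpoint of the two prototypes.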

Figure 1. Generalizing two prototype feature points $x_1$ and $x_2$ by the feature line $\overline{x_1 x_2}$ and the feature midpoint $m_{x_1 x_2}$, respectively. The feature point $x$ of a query is projected onto the line as the point $x_p$.

The NFL classification is done by using the minimum FL distance between the feature point of the query and all the feature lines.

In the NFM proposed in this paper, the midpoint between $x_1$ and $x_2$ of the same class is called a feature midpoint (FM) of that class and is denoted $m_{x_1 x_2}$. Any point on the feature line $\overline{x_1 x_2}$ can be expressed as $x_p = x_1 + \mu (x_2 - x_1)$, where $\mu \in \mathbb{R}$. When $\mu = 1/2$, $m_{x_1 x_2} = \frac{1}{2}(x_1 + x_2)$ is the FM. The FM distance between the feature point $x$ of a query and $m_{x_1 x_2}$ is defined as $d(x, m_{x_1 x_2}) = \|x - m_{x_1 x_2}\|$, where $\|\cdot\|$ is the same norm as in the NFL.

Mathematically, let $x_i^c$ and $x_j^c$ be two distinct prototype feature points belonging to class $c$. The FM distance between the query $x$ and each pair of prototypes $x_i^c$ and $x_j^c$, $i \neq j$, is calculated for each class $c$. Suppose there are $C$ classes and each class $c$ has $N_c$ prototype feature points. The FM distances are sorted in ascending order, each being associated with a class identifier and two prototypes. The NFM distance is the first-rank FM distance:

d\big(x, m_{x_{i^*}^{c^*} x_{j^*}^{c^*}}\big) = \min_{1 \le c \le C} \; \min_{1 \le i < j \le N_c} d\big(x, m_{x_i^c x_j^c}\big)    (2)

The first rank gives the NFM classification, consisting of the best-matched class $c^*$ and the two best-matched prototypes $i^*$ and $j^*$ of that class.
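As an illustration of Eq. (2), the following sketch (our own minimal NumPy version, not code from the original work) classifies a query by the nearest feature midpoint over all within-class prototype pairs:

from itertools import combinations
import numpy as np

def nfm_classify(x, prototypes):
    # Nearest feature midpoint classification, Eq. (2).
    # prototypes: dict mapping a class label to an array of shape (N_c, n)
    # holding that class's prototype feature points (N_c >= 2, as assumed in the text).
    best_dist, best_class, best_pair = np.inf, None, None
    for label, points in prototypes.items():
        for i, j in combinations(range(len(points)), 2):
            midpoint = 0.5 * (points[i] + points[j])   # feature midpoint m_{x_i x_j}
            dist = np.linalg.norm(x - midpoint)        # FM distance
            if dist < best_dist:
                best_dist, best_class, best_pair = dist, label, (i, j)
    return best_class, best_pair

Because the midpoints do not depend on the query, they can also be precomputed once per prototype set, which is the source of the complexity advantage discussed in Section 3.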

3 Theoretical Analysis

In this section, we investigate the NFM method in theory, as well as the relationship between the NFL and the NFM. The computational complexities of the NFL and the NFM are compared as well.

Denote the mean vector of class $c$ by $\mu_c$ and its covariance matrix by $\Sigma_c$. Let $x_1$ and $x_2$ be two prototypes of class $c$, so that $E(x_1) = E(x_2) = \mu_c$ and $\mathrm{Var}(x_1) = \mathrm{Var}(x_2) = \Sigma_c$, and let $x_p = x_1 + \mu (x_2 - x_1)$ be a point on the feature line $\overline{x_1 x_2}$. Assume that all the points in each class are independent of each other; then

E(x_p) = (1 - \mu) E(x_1) + \mu E(x_2) = \mu_c    (3)

\mathrm{Var}(x_p) = (1 - \mu)^2 \mathrm{Var}(x_1) + \mu^2 \mathrm{Var}(x_2) = \big((1 - \mu)^2 + \mu^2\big) \Sigma_c    (4)

(1 - \mu)^2 + \mu^2 = 2\left(\mu - \tfrac{1}{2}\right)^2 + \tfrac{1}{2} \ge \tfrac{1}{2}, \quad \text{with equality if and only if } \mu = \tfrac{1}{2}    (5)

The following lemma elucidates the conditions under which the nearest-neighbor (NN) classifier is equivalent to the Bayes classifier.

Lemma 3.1 Suppose there are $C$ classes $\omega_1, \ldots, \omega_C$. The likelihood distribution is

p(x \mid \omega_i) = \frac{1}{(2\pi)^{n/2} |\Sigma_i|^{1/2}} \exp\!\left(-\frac{d_i(x)}{2}\right)    (6)

d_i(x) = (x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i)    (7)

where $d_i(x)$ is the (Mahalanobis) distance between $x$ and $\mu_i$ given $\Sigma_i$, and $\mu_i$, $\Sigma_i$ are the mean vector and covariance matrix of class $\omega_i$, $i = 1, \ldots, C$. The classifier that assigns $x$ to the class with the minimum $d_i(x)$ is equivalent to the Bayes classifier if the following conditions hold:

(i) all classes are equally probable, $P(\omega_i) = 1/C$;

(ii) the variation of $d_i(x)$ with $i$ dominates the variation of $\ln |\Sigma_i|$, i.e., $|d_i(x) - d_j(x)| \gg \big|\ln|\Sigma_i| - \ln|\Sigma_j|\big|$ for all $i \neq j$.

Proof: The posterior distribution is

P(\omega_i \mid x) = \frac{p(x \mid \omega_i) P(\omega_i)}{p(x)}    (8)

where $p(x) = \sum_{i=1}^{C} p(x \mid \omega_i) P(\omega_i)$. In terms of the Bayes classification rule, $x$ is classified as $\omega_k$ if $P(\omega_k \mid x) \ge P(\omega_i \mid x)$ for all $i$, which, by Eq. (8) and condition (i), is equivalent to $p(x \mid \omega_k) \ge p(x \mid \omega_i)$ for all $i$. From Eq. (6) and condition (ii), the latter holds if and only if $d_k(x) \le d_i(x)$ for all $i$, which is just the minimum-distance classification rule. Thus, the classifier is equivalent to the Bayes classifier under the stated conditions.

A remark on condition (ii) of Lemma 3.1: the condition means that the change of $d_i(x)$ with $i$ relatively dominates the change of $\ln |\Sigma_i|$. This is especially true in many real cases where the prototypes from different classes are subject to the same noise processes, hence we can set $\Sigma_1 = \Sigma_2 = \cdots = \Sigma_C = \Sigma$ [1].

To illustrate further, the error probability of the Bayes classifier, and hence that of the NN, is examined in the two-class case for simplicity, where the two classes are equally probable, $P(\omega_1) = P(\omega_2) = 1/2$, i.e., no priors are available. Under the conditions of Lemma 3.1, the error probability of the Bayes classifier [3], and thus that of the NN, is

\varepsilon = \frac{1}{2}\varepsilon_1 + \frac{1}{2}\varepsilon_2 = \frac{1}{2} P\big(h(x) \le 0 \mid \omega_1\big) + \frac{1}{2} P\big(h(x) > 0 \mid \omega_2\big)    (9)

where $h(x) = d_2(x) - d_1(x)$ is called the discriminant function. $\varepsilon_1$ and $\varepsilon_2$ can be calculated as follows. When $\Sigma_1 = \Sigma_2 = \Sigma$,

h(x) = d_2(x) - d_1(x) = 2(\mu_1 - \mu_2)^T \Sigma^{-1} x + \mu_2^T \Sigma^{-1} \mu_2 - \mu_1^T \Sigma^{-1} \mu_1

In this case, $h(x)$ is also a Gaussian random variable. Letting $\eta = (\mu_1 - \mu_2)^T \Sigma^{-1} (\mu_1 - \mu_2)$, we have $E(h(x) \mid \omega_1) = \eta$, $E(h(x) \mid \omega_2) = -\eta$, and $\mathrm{Var}(h(x) \mid \omega_i) = 4\eta$. Therefore,

\varepsilon_1 = \varepsilon_2 = \Phi\!\left(-\frac{\sqrt{\eta}}{2}\right)    (10)

where $\Phi(u) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{u} e^{-t^2/2}\, dt$ is the normal error function.

In the theorem below, classification based on an arbitrary point $x_p$ on the feature lines means that the position parameter $\mu$ is fixed at the same value for all the feature lines, and the classification uses the nearest distance from the query to each such $x_p$.

Theorem 3.1 For the n-dimensional Gaussian distribution in Eq. (6), if the two conditions in Lemma 3.1 are satisfied, then classification based on the nearest feature midpoint ($\mu = 1/2$) achieves the least error probability relative to classification based on any other point $x_p$, $\mu \neq 1/2$, on the feature lines.

Proof: Case 1: $\Sigma_1 = \Sigma_2 = \cdots = \Sigma_C = \Sigma$. Assume that $x_1$ and $x_2$ are from the same class $c$. Let $x_p$ be any point on the feature line $\overline{x_1 x_2}$, $x_p = x_1 + \mu(x_2 - x_1)$, $\mu \in \mathbb{R}$. From Eqs. (3) and (4), $x_p$ has a Gaussian distribution $x_p \sim N\big(\mu_c, ((1-\mu)^2 + \mu^2)\Sigma\big)$, that is,

E(x_p) = \mu_c    (11)

\mathrm{Var}(x_p) = \big((1-\mu)^2 + \mu^2\big)\Sigma    (12)

From Eq. (10), the error probability of the classification based on $x_p$ is

\varepsilon(\mu) = \Phi\!\left(-\frac{1}{2}\sqrt{\frac{\eta}{(1-\mu)^2 + \mu^2}}\right)    (13)

since replacing $\Sigma$ by $((1-\mu)^2 + \mu^2)\Sigma$ scales $\eta$ by $1/((1-\mu)^2 + \mu^2)$. Because $\Phi$ is increasing, $\varepsilon(\mu)$ is minimized when $(1-\mu)^2 + \mu^2$ is minimized. Thus, from Eq. (5),

\varepsilon\!\left(\tfrac{1}{2}\right) \le \varepsilon(\mu) \quad \text{for all } \mu    (14)

with equality only at $\mu = 1/2$, i.e., at the feature midpoints.

Case 2: the covariance matrices are class-dependent. Here we convert Case 2 to Case 1, that is, we approximate by letting $\Sigma_1 = \Sigma_2 = \cdots = \Sigma_C = \Sigma$. This is a suboptimal solution in the sense of classification error.

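As a quick numerical check of Eq. (13) under the equal-covariance assumption of Case 1 (our own illustration, with an arbitrarily chosen separation $\eta$), the sketch below evaluates $\varepsilon(\mu)$ for several position parameters and confirms that the minimum is attained at the midpoint $\mu = 1/2$:

import numpy as np
from scipy.stats import norm

def error_probability(mu, eta=4.0):
    # Eq. (13): error probability of classification based on the point x_p with
    # position parameter mu, for squared Mahalanobis separation eta between the means.
    scale = (1.0 - mu) ** 2 + mu ** 2          # Eq. (5): >= 1/2, minimized at mu = 1/2
    return norm.cdf(-0.5 * np.sqrt(eta / scale))

for mu in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"mu = {mu:.2f}: error probability = {error_probability(mu):.4f}")
# The printed values decrease toward mu = 0.5 and rise again beyond it,
# in agreement with Eq. (14).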
The theorem below reveals the relationship between the NFL and the NFM.

Theorem 3.2 Let $x = (X_1, \ldots, X_n)^T$, $x_1 = (X_1^{(1)}, \ldots, X_n^{(1)})^T$ and $x_2 = (X_1^{(2)}, \ldots, X_n^{(2)})^T$ be n-dimensional random vectors, where $x_1$ and $x_2$ belong to the same class and have a common distribution, and let

\mu = \frac{(x - x_1)\cdot(x_2 - x_1)}{(x_2 - x_1)\cdot(x_2 - x_1)}    (15)

where $\cdot$ stands for the dot product. If the components of $x$, $x_1$ and $x_2$ are i.i.d., then $\mu \to \frac{1}{2}$ as $n \to \infty$; that is, $\mu$ converges to $\frac{1}{2}$ in the probability sense, and hence the projection point $x_p = x_1 + \mu(x_2 - x_1)$ converges to the midpoint $\frac{1}{2}(x_1 + x_2)$ as $n$ goes to infinity.

Proof: From Eq. (15) we have

\mu - \frac{1}{2} = \frac{\sum_{k=1}^{n}\big(X_k - \frac{1}{2}(X_k^{(1)} + X_k^{(2)})\big)\big(X_k^{(2)} - X_k^{(1)}\big)}{\sum_{k=1}^{n}\big(X_k^{(2)} - X_k^{(1)}\big)^2}    (16)

According to the assumptions made in the theorem, the terms $\big(X_k - \frac{1}{2}(X_k^{(1)} + X_k^{(2)})\big)\big(X_k^{(2)} - X_k^{(1)}\big)$, $k = 1, \ldots, n$, are i.i.d., and the terms $\big(X_k^{(2)} - X_k^{(1)}\big)^2$, $k = 1, \ldots, n$, are i.i.d. as well. For every $k$, suppose $E(X_k^{(1)}) = E(X_k^{(2)}) = a$; then, since $x$ is independent of $x_1$ and $x_2$ and since $X_k^{(1)}$ and $X_k^{(2)}$ have a common distribution,

E\Big[\big(X_k - \tfrac{1}{2}(X_k^{(1)} + X_k^{(2)})\big)\big(X_k^{(2)} - X_k^{(1)}\big)\Big] = E(X_k)(a - a) - \tfrac{1}{2}\Big(E\big[(X_k^{(2)})^2\big] - E\big[(X_k^{(1)})^2\big]\Big) = 0

Dividing the numerator and the denominator of Eq. (16) by $n$ and applying the law of large numbers, we have

\frac{1}{n}\sum_{k=1}^{n}\big(X_k - \tfrac{1}{2}(X_k^{(1)} + X_k^{(2)})\big)\big(X_k^{(2)} - X_k^{(1)}\big) \xrightarrow{P} 0

\frac{1}{n}\sum_{k=1}^{n}\big(X_k^{(2)} - X_k^{(1)}\big)^2 \xrightarrow{P} E\big[(X_1^{(2)} - X_1^{(1)})^2\big] > 0

From Eq. (16), $\mu - \frac{1}{2} \to 0$ in probability, or equivalently, $\mu \to \frac{1}{2}$ in probability.

In Theorem 3.2, $x_1$ and $x_2$ belong to the same class, so it is natural for them to have a common distribution. Thus, according to Theorem 3.2, the performance of the NFL classifier is approximately equivalent to that of the NFM under such assumptions when the dimension of the feature space is high. Complexity-wise, since the position parameter $\mu$ in Eq. (1) for the NFL depends on the query and on each pair of prototypes in each class, it must be calculated for every query and every prototype pair. In the NFM, however, the midpoint of each pair of prototypes in each class is fixed and can be precomputed, so no position parameter needs to be evaluated at query time. Thus, the computational cost of the NFM is significantly less than that of the NFL.
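The convergence stated in Theorem 3.2 is easy to observe empirically. The sketch below (our own illustration, assuming standard-normal i.i.d. components for the query and the two prototypes) evaluates the position parameter of Eq. (15) for increasing dimensions:

import numpy as np

rng = np.random.default_rng(0)
for n in (4, 16, 64, 256, 1024):
    # Query x and prototypes x1, x2 with i.i.d. standard-normal components.
    x, x1, x2 = rng.standard_normal((3, n))
    d = x2 - x1
    mu = np.dot(x - x1, d) / np.dot(d, d)      # position parameter, Eq. (15)
    print(f"n = {n:4d}: mu = {mu:.3f}")
# mu fluctuates widely for small n and concentrates near 0.5 as n grows,
# which is the convergence described in Theorem 3.2.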


Table 1. Error rates of classifications. Columns x_p(mu_1)-x_p(mu_4) give classifications based on other fixed points on the feature lines (position parameters other than 1/2); N_p is the number of prototypes per class.

Dim  N_p  x_p(mu_1)  x_p(mu_2)  x_p(mu_3)  x_p(mu_4)  NFM     NFL     NN      5-NN    10-NN   15-NN
  8    6  0.8021     0.6875     0.5729     0.5938     0.5208  0.6354  0.6354  0.6250  0.6875  0.6875
 16    6  0.7188     0.6458     0.5312     0.4583     0.3646  0.5000  0.5000  0.5729  0.5938  0.7917
 16   12  0.7760     0.6823     0.5000     0.3698     0.3542  0.4010  0.4740  0.5260  0.5677  0.6094
 32    6  0.8229     0.5833     0.4479     0.3333     0.3125  0.3021  0.4375  0.4896  0.5729  0.6875
 32   12  0.7552     0.6458     0.4427     0.2656     0.2552  0.2760  0.4271  0.4948  0.5469  0.5729
 32   24  0.7682     0.5990     0.3646     0.2135     0.1693  0.2057  0.3776  0.4062  0.4609  0.4766
 64    6  0.7917     0.6562     0.5000     0.3438     0.2812  0.2812  0.4896  0.5833  0.6458  0.7917
 64   12  0.8125     0.6979     0.4479     0.2135     0.1615  0.1823  0.4583  0.5104  0.5729  0.6146
 64   24  0.7917     0.6432     0.3828     0.1458     0.0833  0.0964  0.3516  0.4141  0.4219  0.4453
 64   48  0.7148     0.5430     0.2630     0.0911     0.0547  0.0638  0.2526  0.2630  0.2669  0.2747
128    6  0.7604     0.6250     0.3854     0.1667     0.1250  0.1146  0.3958  0.5208  0.5000  0.5729
128   12  0.7448     0.6302     0.4323     0.1979     0.1094  0.1094  0.4479  0.5000  0.5156  0.5156
128   24  0.8464     0.7500     0.5000     0.1380     0.0651  0.0651  0.5130  0.5833  0.5938  0.6198
128   48  0.7799     0.6641     0.2656     0.0430     0.0130  0.0143  0.2591  0.2982  0.3177  0.3346
128   96  0.8099     0.6400     0.2799     0.0456     0.0182  0.0189  0.2747  0.3333  0.3652  0.3874

4 Experiments

The following simulation experiment illustrates the findings of Theorem 3.1 and Theorem 3.2. In this experiment, sixteen classes are assumed, whose samples are randomly generated from Gaussian distributions. Let $U_1$ and $U_2$ denote two uniformly distributed random variables. They determine, respectively, the means and the variances of the Gaussian distributions as follows: $U_1$ varies with both the component index of a random sample and the membership of the class the sample belongs to, but does not vary for the same component across different random samples of the same class; $U_2$ changes only with the class membership. Consequently, the covariance matrix of any random sample of each class is a diagonal matrix, all the diagonal components of every covariance matrix are randomly generated by $U_2$, the components of any random sample of each class are mutually independent, and the random samples of different classes have different covariance matrices. Cases with different data dimensions (Dim) and different numbers of prototypes per class ($N_p$) are examined in Table 1. The experiment is evaluated by a leave-one-out test: when a sample is used as the query, it is not used as a prototype, i.e., it is removed from the prototype set. Table 1 displays the error rates of classifications using the NFM ($\mu = 1/2$), the NFL, the NN, the k-NN, and those based on other points on the feature lines. For the k-NN, values of k equal to 5, 10 and 15 are tested. The NFM yields consistently lower error rates than all the others in almost all cases. As proved in Theorem 3.2, under the i.i.d. assumption the NFL is approximately equivalent to the NFM when the dimension of the feature space is high. In Table 1, for example, when the number of prototypes per class is 6, the difference between the error rates of the NFM and the NFL decreases, although not monotonically, from 0.1146 to 0.0104 as the dimension increases from 8 to 128. A sketch of this experimental protocol is given below.
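The following minimal sketch reflects our own reading of the protocol above (class means drawn per component and per class, a diagonal per-class covariance, leave-one-out evaluation with the NFM rule of Eq. (2)); the sizes and random-number choices are illustrative rather than those of the original study:

import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
n_classes, dim, n_proto = 16, 32, 6            # illustrative sizes, not the paper's full grid

# Per-class diagonal Gaussians: means vary with component and class,
# variances with class only; classes therefore have different covariance matrices.
means = rng.uniform(0.0, 1.0, size=(n_classes, dim))
variances = rng.uniform(0.1, 1.0, size=n_classes)
samples = {c: means[c] + np.sqrt(variances[c]) * rng.standard_normal((n_proto, dim))
           for c in range(n_classes)}

def nfm_loo_error(samples):
    # Leave-one-out error rate of the NFM classifier on the generated samples.
    errors, total = 0, 0
    for c, pts in samples.items():
        for q in range(len(pts)):
            query, best = pts[q], (np.inf, None)
            for label, others in samples.items():
                keep = [k for k in range(len(others)) if not (label == c and k == q)]
                for i, j in combinations(keep, 2):
                    dist = np.linalg.norm(query - 0.5 * (others[i] + others[j]))
                    if dist < best[0]:
                        best = (dist, label)
            errors += int(best[1] != c)
            total += 1
    return errors / total

print(f"NFM leave-one-out error rate: {nfm_loo_error(samples):.4f}")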

5 Conclusions

In this paper, a novel pattern classification method named the NFM is proposed, and a theoretical analysis is conducted to justify it. Furthermore, it has been theoretically proved that, under the i.i.d. assumption, the performance of the NFL is approximately equivalent to that of the NFM when the dimension of the feature space is high, while the computational complexity of the NFM is significantly lower than that of the NFL. Therefore, the NFM is a good alternative to the NFL in high-dimensional feature spaces. The simulation experiment further shows that the NFM can yield considerably lower error rates than classifications based on other points on the feature lines, as well as the NFL, the NN, and the k-NN.

References

[1] S. Dasgupta. Learning Probability Distributions. PhD thesis, University of California at Berkeley, 2000.
[2] C. Domeniconi, J. Peng, and D. Gunopulos. "Locally adaptive metric nearest-neighbor classification". IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(9):1281-1285, September 2002.
[3] K. Fukunaga. Introduction to Statistical Pattern Recognition. Academic Press, Boston, 2nd edition, 1990.
[4] S. Z. Li. "Content-based classification and retrieval of audio using the nearest feature line method". IEEE Transactions on Speech and Audio Processing, 8(5):619-625, September 2000.
[5] S. Z. Li, K. L. Chan, and C. L. Wang. "Performance evaluation of the nearest feature line method in image retrieval". IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(11):1335-1339, November 2000.
[6] S. Z. Li and J. Lu. "Face recognition using the nearest feature line method". IEEE Transactions on Neural Networks, 10(2):439-443, March 1999.



