Pattern Recognition 38 (2005) 449–452
www.elsevier.com/locate/patcog

Rapid and brief communication

A theorem on the generalized canonical projective vectors

Quan-Sen Sun^{a,b,∗}, Zheng-Dong Liu^a, Pheng-Ann Heng^c, De-Shen Xia^a

a Department of Computer Science, Nanjing University of Science & Technology, Nanjing 210094, China
b Department of Mathematics, Jinan University, Jinan 250022, China
c Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong

Received 22 July 2004; accepted 17 August 2004

Abstract

This paper proposes the generalized canonical projective vectors (GCPV), within the framework of canonical correlation analysis (CCA) applied to image recognition. Unlike canonical projective vectors (CPV), the process of obtaining GCPV incorporates the class information of the samples, so that the combined features extracted on the basis of GCPV give better classification performance. Experimental results on the Concordia University CENPARMI handwritten Arabic numeral database show that our method is superior to the method based on CPV.
© 2004 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.

Keywords: Canonical correlation analysis (CCA); Canonical projective vectors (CPV); Generalized CPV (GCPV); Feature fusion; Handwritten character recognition

1. Introduction

Canonical correlation analysis (CCA) is an important technique for processing multiple data sets [1–4]. In Ref. [4], based on the idea of feature fusion, we proposed the canonical projective vectors (CPV), established a CCA framework for image recognition, and applied CPV to fields such as automatic face recognition and handwritten character recognition. Although the canonical discriminant features extracted via CCA perform well, the process of obtaining CPV contains no class information of the samples, so the extracted features may not be optimal for pattern classification. In light of this, we propose the generalized canonical projective vectors (GCPV), which incorporate the class information of the training samples by improving the canonical correlation criterion function.

∗ Corresponding author. Department of Computer Science, Nanjing University of Science & Technology, Nanjing 210094, China. Tel./fax: +86 531 2927158.
E-mail address: [email protected] (Q.-S. Sun).

The discriminant features extracted on the basis of the improved criterion function have a clear physical meaning and excellent classification performance. Experimental results on the Concordia University CENPARMI handwritten numeral database show that the GCPV proposed in this paper are superior to CPV.

2. Theory of GCPV

Suppose A and B are two feature sets defined on a pattern sample space Ω. For any pattern sample ξ ∈ Ω, the two corresponding feature vectors are x ∈ A ⊂ Ω and y ∈ B ⊂ Ω.

2.1. CPV

Consider two zero-mean random vectors x ∈ R^p and y ∈ R^q. CCA finds pairs of directions α and β that maximize the correlation between the projections x^* = α^T x and y^* = β^T y.

0031-3203/$30.00 © 2004 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved. doi:10.1016/j.patcog.2004.08.009


In general, the projective directions α and β are obtained by maximizing the following criterion function:

J(\alpha, \beta) = \frac{\alpha^T S_{xy} \beta}{(\alpha^T S_{xx} \alpha \cdot \beta^T S_{yy} \beta)^{1/2}},    (1)

where S_{xx} ∈ R^{p×p} and S_{yy} ∈ R^{q×q} are the covariance matrices of x and y, respectively, S_{xy} ∈ R^{p×q} is their between-set covariance matrix with S_{xy}^T = S_{yx}, and r = rank(S_{xy}). In Ref. [4], the projective vectors {α_i} and {β_i} that maximize the criterion function J(α, β) are called CPV. The CPV give rise to two linear transformations (two feature fusion strategies):

FFS I:  Z_1 = \begin{pmatrix} W_x & 0 \\ 0 & W_y \end{pmatrix}^T \begin{pmatrix} x \\ y \end{pmatrix},    (2)

FFS II:  Z_2 = \begin{pmatrix} W_x \\ W_y \end{pmatrix}^T \begin{pmatrix} x \\ y \end{pmatrix},    (3)

where W_x = (α_1, α_2, ..., α_d) and W_y = (β_1, β_2, ..., β_d). The combined features extracted by the linear transformations (2) and (3) are called canonical discriminant features (CDF) and are used for classification.
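As a concrete illustration (not part of the original paper), the following Python sketch shows one way the CPV and the two fusion strategies might be computed with NumPy/SciPy; the function names and the small ridge term reg are our own assumptions, added for numerical stability:

import numpy as np
from scipy.linalg import eigh

def cpv(X, Y, d, reg=1e-8):
    """CPV via classical CCA for zero-mean X (n, p) and Y (n, q).
    Returns Wx (p, d) and Wy (q, d) maximizing criterion (1)."""
    n = X.shape[0]
    Sxx = X.T @ X / n + reg * np.eye(X.shape[1])   # covariance of x (regularized)
    Syy = Y.T @ Y / n + reg * np.eye(Y.shape[1])   # covariance of y (regularized)
    Sxy = X.T @ Y / n                              # between-set covariance
    # alpha solves Sxy Syy^{-1} Syx alpha = lambda^2 Sxx alpha
    M = Sxy @ np.linalg.solve(Syy, Sxy.T)          # symmetric positive semidefinite
    lam2, A = eigh(M, Sxx)                         # ascending generalized eigenvalues
    idx = np.argsort(lam2)[::-1][:d]               # keep the d largest lambda^2
    lam = np.sqrt(lam2[idx])                       # assumes d <= rank(Sxy)
    Wx = A[:, idx]                                 # alpha_i^T Sxx alpha_j = delta_ij
    Wy = np.linalg.solve(Syy, Sxy.T @ Wx) / lam    # beta = (1/lambda) Syy^{-1} Syx alpha
    return Wx, Wy

def ffs1(Wx, Wy, X, Y):
    # FFS I, Eq. (2): concatenate the two projections -> 2d-dimensional features
    return np.hstack([X @ Wx, Y @ Wy])

def ffs2(Wx, Wy, X, Y):
    # FFS II, Eq. (3): sum the two projections -> d-dimensional features
    return X @ Wx + Y @ Wy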

2.2. GCPV

Let S_{Wx} and S_{Wy} denote the within-class scatter matrices of the training sample spaces A and B, respectively, i.e.

S_{Wx} = \sum_{i=1}^{c} P(\omega_i) \left[ \frac{1}{l_i} \sum_{j=1}^{l_i} (x_{ij} - m_i^x)(x_{ij} - m_i^x)^T \right],

S_{Wy} = \sum_{i=1}^{c} P(\omega_i) \left[ \frac{1}{l_i} \sum_{j=1}^{l_i} (y_{ij} - m_i^y)(y_{ij} - m_i^y)^T \right],

where x_{ij} ∈ A and y_{ij} ∈ B denote the jth training sample in class i; P(ω_i) is the prior probability of class i; l_i is the number of training samples in class i; and m_i^x and m_i^y are the mean vectors of the training samples in class i. Suppose L_{xy} = (1/n) S_{xy}, where n is the total number of training samples, and suppose S_{Wx} and S_{Wy} are both positive definite. We define the criterion function

J_g(\alpha, \beta) = \frac{\alpha^T L_{xy} \beta}{(\alpha^T S_{Wx} \alpha \cdot \beta^T S_{Wy} \beta)^{1/2}},    (4)

which we call the generalized canonical correlation discriminant criterion. The pairs of vectors that maximize J_g(α, β) are taken as the projective directions; their physical meaning is that the two sets of projected feature vectors have maximum correlation while the projections minimize the within-class scatter. We call the vectors {α_i} and {β_i} that maximize the criterion function J_g(α, β) the generalized CPV (GCPV). The combined features extracted by the linear transformations (2) and (3) composed of GCPV are called the generalized CDF (GCDF). For the solution of GCPV, we present the following theorem.

Theorem 1. Under the generalized criterion function J_g(α, β), the GCPV can be taken as the d pairs of eigenvectors corresponding to the first d (d ≤ r = rank(L_{xy})) largest eigenvalues λ_1^2 ≥ λ_2^2 ≥ ··· ≥ λ_d^2 of the two eigenequations (5) and (6),

L_{xy} S_{Wy}^{-1} L_{yx} \alpha = \lambda^2 S_{Wx} \alpha,    (5)

L_{yx} S_{Wx}^{-1} L_{xy} \beta = \lambda^2 S_{Wy} \beta,    (6)

and these GCPV satisfy

\alpha_i^T S_{Wx} \alpha_j = \beta_i^T S_{Wy} \beta_j = \delta_{ij}, \qquad \alpha_i^T L_{xy} \beta_j = \lambda_i \delta_{ij} \quad (i, j = 1, 2, \ldots, d),    (7)

where we assume that the eigenvalues satisfy λ_i ≠ λ_j for i ≠ j, and δ_{ij} = 1 if i = j and 0 otherwise.

Proof. Let the projective directions α and β satisfy the following constraints:

\alpha^T S_{Wx} \alpha = \beta^T S_{Wy} \beta = 1.    (8)

Use the Lagrange multiplier method to transform Eq. (4):

L(\alpha, \beta) = \alpha^T L_{xy} \beta - \frac{\lambda_1}{2} (\alpha^T S_{Wx} \alpha - 1) - \frac{\lambda_2}{2} (\beta^T S_{Wy} \beta - 1),

where λ_1 and λ_2 are Lagrange multipliers. Setting the partial derivatives to zero gives

\frac{\partial L}{\partial \alpha} = L_{xy} \beta - \lambda_1 S_{Wx} \alpha = 0,    (9)

\frac{\partial L}{\partial \beta} = L_{yx} \alpha - \lambda_2 S_{Wy} \beta = 0.    (10)

Multiplying Eqs. (9) and (10) on the left by α^T and β^T, respectively, and using the constraints (8), we obtain

\alpha^T L_{xy} \beta = \lambda_1 \alpha^T S_{Wx} \alpha = \lambda_1, \qquad \beta^T L_{yx} \alpha = \lambda_2 \beta^T S_{Wy} \beta = \lambda_2.

Since L_{yx}^T = L_{xy}, we have λ_1 = α^T L_{xy} β = (α^T L_{xy} β)^T = β^T L_{yx} α = λ_2. Let λ_1 = λ_2 = λ; then

J_g(\alpha, \beta) = \alpha^T L_{xy} \beta = \beta^T L_{yx} \alpha = \lambda.    (11)

Here the criterion function attains its maximal value; thus, Eqs. (9) and (10) can also be written as

L_{xy} \beta - \lambda S_{Wx} \alpha = 0,    (12)

L_{yx} \alpha - \lambda S_{Wy} \beta = 0.    (13)


Since S_{Wx} and S_{Wy} are both positive definite, Eq. (13) gives β = (1/λ) S_{Wy}^{-1} L_{yx} α. Substituting this expression into Eq. (12) yields the generalized eigenequation (5); likewise, the generalized eigenequation (6) can be inferred. As mentioned above, the generalized eigenequations (5) and (6) share the same non-zero eigenvalues λ_1^2 ≥ λ_2^2 ≥ ··· ≥ λ_r^2 (r = rank(L_{xy})), sorted in decreasing order, with corresponding eigenvectors α_1, α_2, ..., α_r and β_1, β_2, ..., β_r, respectively. The first d pairs of eigenvectors compose the GCPV. For any i, j = 1, 2, ..., d (d ≤ r), when i = j, condition (8) gives

\alpha_i^T S_{Wx} \alpha_i = \beta_i^T S_{Wy} \beta_i = 1.    (14)

When i ≠ j, eigenequation (5) gives

\alpha_i^T S_{Wx} \alpha_j = \frac{1}{\lambda_j^2} \alpha_i^T L_{xy} S_{Wy}^{-1} L_{yx} \alpha_j,    (15)

\alpha_j^T S_{Wx} \alpha_i = \frac{1}{\lambda_i^2} \alpha_j^T L_{xy} S_{Wy}^{-1} L_{yx} \alpha_i.    (16)

Therefore,

\alpha_i^T S_{Wx} \alpha_j = (\alpha_j^T S_{Wx} \alpha_i)^T = \frac{1}{\lambda_i^2} \alpha_i^T L_{xy} S_{Wy}^{-1} L_{yx} \alpha_j.    (17)

Comparing Eqs. (15) and (17), and noting that the hypothesis λ_i ≠ λ_j implies λ_i^2 ≠ λ_j^2, we deduce α_i^T L_{xy} S_{Wy}^{-1} L_{yx} α_j = 0, i.e., α_i^T S_{Wx} α_j = 0; likewise, β_i^T S_{Wy} β_j = 0. It then follows that

\alpha_i^T S_{Wx} \alpha_j = \beta_i^T S_{Wy} \beta_j = \delta_{ij}.    (18)

By Eqs. (12) and (18), we can also deduce

\alpha_i^T L_{xy} \beta_j = \lambda_j \alpha_i^T S_{Wx} \alpha_j = \lambda_i \delta_{ij}.  □

Theorem 1 provides an algorithm for solving the GCPV. Once the GCPV are determined, the two feature fusion strategies FFS I and FFS II can be used to extract the combined features. When solving for the GCPV, we only need to compute one group of eigenvectors and can obtain the other group from Eq. (12) or (13). The generalized eigenequation of lower dimension (min(p, q)) can therefore be selected for the eigenvalue computation, which greatly speeds up feature extraction, particularly when the dimensions of the two groups of feature vectors differ.
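To make this concrete, here is a possible NumPy/SciPy sketch of the algorithm of Theorem 1, including the low-dimension shortcut just described. It is our own illustration rather than the authors' code; the function names, the frequency-estimated class priors, and the small ridge term reg are assumptions:

import numpy as np
from scipy.linalg import eigh

def within_class_scatter(Z, labels):
    # S_W = sum_i P(w_i) (1/l_i) sum_j (z_ij - m_i)(z_ij - m_i)^T,
    # with the priors P(w_i) estimated from class frequencies (our assumption).
    n, dim = Z.shape
    S = np.zeros((dim, dim))
    for c in np.unique(labels):
        Zc = Z[labels == c]
        Dc = Zc - Zc.mean(axis=0)                  # center within the class
        S += (len(Zc) / n) * (Dc.T @ Dc) / len(Zc)
    return S

def gcpv(X, Y, labels, d, reg=1e-8):
    # GCPV per Theorem 1 for zero-mean X (n, p) and Y (n, q); assumes
    # d <= rank(Lxy) so that all selected eigenvalues are positive.
    n, p = X.shape
    q = Y.shape[1]
    Lxy = X.T @ Y / n              # a constant rescaling of L_xy only rescales lambda
    SWx = within_class_scatter(X, labels) + reg * np.eye(p)
    SWy = within_class_scatter(Y, labels) + reg * np.eye(q)
    if q <= p:
        # solve the q-dimensional eigenequation (6): Lyx SWx^{-1} Lxy beta = lambda^2 SWy beta
        M = Lxy.T @ np.linalg.solve(SWx, Lxy)
        lam2, V = eigh(M, SWy)                     # ascending; V is SWy-orthonormal
        idx = np.argsort(lam2)[::-1][:d]
        lam = np.sqrt(lam2[idx])
        Wy = V[:, idx]
        Wx = np.linalg.solve(SWx, Lxy @ Wy) / lam  # recover the alphas via Eq. (12)
    else:
        # solve the p-dimensional eigenequation (5): Lxy SWy^{-1} Lyx alpha = lambda^2 SWx alpha
        M = Lxy @ np.linalg.solve(SWy, Lxy.T)
        lam2, V = eigh(M, SWx)
        idx = np.argsort(lam2)[::-1][:d]
        lam = np.sqrt(lam2[idx])
        Wx = V[:, idx]
        Wy = np.linalg.solve(SWy, Lxy.T @ Wx) / lam  # recover the betas via Eq. (13)
    return Wx, Wy, lam

The resulting Wx and Wy can be fed to the ffs1/ffs2 helpers of the earlier sketch, and the GCPV properties (7) can be checked numerically: Wx.T @ SWx @ Wx and Wy.T @ SWy @ Wy should be (up to the ridge term) identity matrices, and Wx.T @ Lxy @ Wy should equal diag(lam).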


3. Experiments and analysis

We tested our method on the well-known Concordia University CENPARMI handwritten numeral database. The database contains 10 classes, i.e., the 10 digits 0 to 9, with 600 samples for each class; 4000 samples are used for training and 2000 for testing. In Ref. [5], Hu et al. performed some preprocessing and extracted the following four kinds of features:

XG: 256-dimensional Gabor transformation feature;
XL: 121-dimensional Legendre moment feature;
XP: 36-dimensional pseudo-Zernike moment feature;
XZ: 30-dimensional Zernike moment feature.

Combining any two of the above four features in the original feature space and applying the algorithm described in Section 2.2, we obtain the GCPV and use FFS I and FFS II to extract the combined features. The minimum-distance classifier is used for classification. Note that when computing the GCPV, the low-dimensional generalized eigenequation should be solved first. For example, to combine the two features XG and XZ, i.e., to obtain the 256-dimensional and 30-dimensional GCPV, we only need the eigenvalues and eigenvectors of the 30-dimensional generalized eigenequation.

To demonstrate the superiority of GCPV, we also extracted combined features using the CPV of Section 2.1 on the same database, and we additionally report direct classification on the fused original features. All results, obtained with the minimum-distance classifier, are shown in Table 1.
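For reference, the minimum-distance (nearest class mean) classifier used throughout these experiments can be sketched as follows; this is again our own illustration, not the authors' code:

import numpy as np

def min_distance_classify(train_feats, train_labels, test_feats):
    # Assign each test sample to the class whose training mean is nearest
    # in Euclidean distance, i.e., a minimum-distance classifier.
    classes = np.unique(train_labels)
    means = np.stack([train_feats[train_labels == c].mean(axis=0)
                      for c in classes])           # (n_classes, dim) class means
    d2 = ((test_feats[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return classes[d2.argmin(axis=1)]

With the earlier sketches, the fused training and test features would be obtained via, e.g., ffs2(Wx, Wy, X_train, Y_train) and ffs2(Wx, Wy, X_test, Y_test).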

Table 1
Classification error rates based on the CPV method, the GCPV method, and the original feature spaces with the minimum-distance classifier

                    FFS I                             FFS II                            Primitive features
Combined features   CPV [4]   Dim   GCPV     Dim     CPV [4]   Dim   GCPV     Dim     Error     Dim
XG–XL               0.1170    242   0.0760   140     0.1290    85    0.0840   29      0.2070    377
XG–XP               0.2050    72    0.1235   72      0.2300    31    0.1470   35      0.2270    292
XG–XZ               0.2230    60    0.1315   56      0.2445    20    0.1560   28      0.2290    286
XL–XP               0.1810    72    0.1025   44      0.2160    34    0.1285   33      0.2415    157
XL–XZ               0.2110    60    0.1160   60      0.2405    30    0.1535   30      0.2510    151
XP–XZ               0.3215    56    0.2910   56      0.3295    30    0.2960   30      0.4195    67


Table 1 shows that, under both feature fusion strategies, the method based on GCPV is significantly superior to the method based on CPV, while both the GCPV- and CPV-based methods perform better than direct recognition on the fused original features. The experimental results indicate that the GCPV proposed in this paper yield better classification performance because the extracted GCDF incorporate the class information of the training samples.

4. Conclusion

In this paper, we have generalized the theory of CCA with pattern classification in mind. The GCPV set forth in this paper make the extracted canonical features more advantageous for classification owing to the added class information of the samples. This is the main reason why the adoption of GCPV is superior to that of CPV, as the experimental results have shown. Our future research direction is to investigate its application to other fields of image recognition.

Acknowledgements

This work was supported by the Research Grants Council of the Hong Kong Special Administrative Region under an Earmarked Research Grant (project no. CUHK4185/00E).

References

[1] Xiao-Ting Zhang, Kai-Tai Fang, Introduction to Multivariate Statistics, Science Press, Beijing, 1999 (in Chinese).
[2] M. Borga, Learning Multidimensional Signal Processing, Linköping Studies in Science and Technology, Dissertations, vol. 531, Department of Electrical Engineering, Linköping University, Linköping, Sweden, 1998.
[3] Thomas Melzer, Michael Reiter, Horst Bischof, Appearance models based on kernel canonical correlation analysis, Pattern Recognition 36 (2003) 1961–1971.
[4] Quan-Sen Sun, Mao-Long Yang, Pheng-Ann Heng, De-Shen Xia, Improvements on CCA model with application to face recognition, in: Proceedings of the International Conference on Intelligent Information Processing, 2004, pp. 125–134.
[5] Zhong-Shan Hu, Z. Lou, J.-Y. Yang, Handwritten digit recognition based on multi-classifier combination, Chinese J. Computers 22 (4) (1999) 369–374 (in Chinese).

About the Author—QUAN-SEN SUN is an associate professor in the Department of Mathematics at Jinan University. He is also working toward his Ph.D. degree in pattern recognition and intelligence systems at Nanjing University of Science and Technology (NUST). His current interests include pattern recognition, image processing, computer vision and data fusion.

About the Author—ZHENG-DONG LIU is working toward his Ph.D. degree in pattern recognition and intelligence systems at NUST. His current interests include pattern recognition and image processing.

About the Author—PHENG-ANN HENG received his Ph.D. degree in computer science from Indiana University, USA, in 1992. He is now a professor and Ph.D. supervisor in the Department of Computer Science and Engineering at The Chinese University of Hong Kong (CUHK), and Director of the Virtual Reality, Visualization and Imaging Research Centre at CUHK. His research interests include virtual reality applications in medicine, scientific visualization, 3D medical imaging, user interfaces, rendering and modeling, interactive graphics and animation.

About the Author—DE-SHEN XIA received his Ph.D. degree in pattern recognition and intelligent systems from the University of Rouen, France, in 1987. He is now an honorary professor at ESIGELEC, France, and a professor and Ph.D. supervisor in the Department of Computer Science at NUST, where he directs the Laboratory of Image Processing, Analysis and Recognition. His research interests are in the domain of image processing, remote sensing, medical image analysis and pattern recognition.