Unified Locally Linear Embedding and Linear Discriminant Analysis Algorithm (ULLELDA) for Face Recognition

Junping Zhang (1), Huanxing Shen (2), and Zhi-Hua Zhou (3)

(1) Intelligent Information Processing Laboratory, Fudan University, Shanghai 200433, China. [email protected]
(2) School of Software, Fudan University, Shanghai 200433, China. [email protected]
(3) National Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China. [email protected]

Abstract. Manifold learning approaches such as the locally linear embedding (LLE) algorithm and the isometric mapping (Isomap) algorithm aim to discover the intrinsic low-dimensional variables underlying high-dimensional nonlinear data. However, many problems remain to be solved before manifold learning can support effective recognition tasks. In this paper, we propose a unified algorithm based on LLE and linear discriminant analysis (ULLELDA) to address these remaining problems. First, training samples are mapped into a low-dimensional embedding space, and LDA is then used to project them into a discriminant space that enlarges between-class distances and reduces within-class distances. Second, unknown samples are mapped directly into the discriminant space without computing their counterparts in the low-dimensional embedding space. Experiments on several face databases show the advantages of the proposed algorithm.

1 Introduction

Faces with varying intrinsic features such as illumination, pose, and expression are thought to constitute highly nonlinear manifolds in the high-dimensional observation space [1]. Visualization and exploration of high-dimensional nonlinear manifolds have therefore become a focus of much current machine learning research. However, most recognition systems based on linear methods ignore subtleties of manifolds such as concavities and protrusions, which is a bottleneck for achieving highly accurate recognition. This problem has to be solved before a high-performance recognition system can be built. In recent years, progress has been made in modelling nonlinear subspaces or manifolds, and a rich literature on manifold learning now exists. Based on their representations, these methods can be roughly divided into four major classes: projection methods, generative methods, embedding methods, and mutual information methods.


1. The first class finds principal surfaces passing through the middle of the data, such as principal curves [2][3]. Although geometrically intuitive, this approach has difficulty generalizing the global variable (the arc-length parameter) to higher-dimensional surfaces.
2. The second class adopts generative topographic models [4][5], hypothesizing that the observed data are generated from evenly spaced low-dimensional latent nodes; the mapping between the observation space and the latent space can then be modelled. Owing to the inherent insufficiency of the adopted EM (Expectation-Maximization) algorithms, however, these generative models easily fall into local minima and also converge slowly.
3. The third class is generally divided into global and local embedding algorithms. Isomap [6], a global algorithm, presumes that isometric properties should be preserved between the observation space and the intrinsic embedding space in the affine sense. Locally Linear Embedding (LLE) [7] and Laplacian Eigenmaps [8], on the other hand, focus on preserving the local neighborhood structure.
4. The fourth class assumes that mutual information measures the difference between the probability distributions of the observed space and the embedded space, as in stochastic neighbor embedding (SNE) [9] and manifold charting [10].

While there are many impressive results on discovering the intrinsic features of manifolds, few reports have been published on practical applications of manifold learning, especially face recognition. A possible explanation is that practical data include a large number of intrinsic features and have high curvature both in the observation space and in the embedded space, whereas present manifold learning methods depend strongly on the selection of parameters.

Assuming that data are drawn independently and identically distributed from an underlying unknown distribution, we propose a unified locally linear embedding and linear discriminant analysis algorithm for face recognition. Training samples are projected into the intrinsic low-dimensional space. To improve classification ability, LDA is introduced to enlarge between-class distances and decrease within-class distances by mapping samples into a discriminant space. Finally, based on the assumption that the neighborhood of an unknown sample in the high-dimensional space is the same as that of the sample in the low-dimensional discriminant space, the unknown sample is mapped directly into the discriminant space with the proposed algorithm. Experiments on several face databases show the advantages of the proposed recognition approach. In the final section we discuss potential problems and further research.

2 Unified Locally Linear Embedding and Linear Discriminant Analysis Algorithm (ULLELDA)

2.1 Locally Linear Embedding

To establish the mapping relationship between the observed data and the corresponding low-dimensional data, the locally linear embedding (LLE) algorithm [7] is used to obtain the low-dimensional data Y (Y ⊂ R^d) of the training set X (X ⊂ R^N, N >> d). The data set (X, Y) is then used to model the subsequent mapping relationship. The main principle of the LLE algorithm is to preserve the local neighborhood relations of the data in both the observation space and the intrinsic embedding space: each sample in the observation space is a linearly weighted average of its neighbors. The basic LLE algorithm is described as follows.

Step 1: Define

    ψ(W) = || x_i − Σ_{j=1}^{K} W_{ij} x_{i_j} ||^2        (1)

where the samples x_{i_j} are the neighbors of x_i. Subject to the constraint Σ_j W_{ij} = 1, and with W_{ij} = 0 whenever x_{i_j} is not a neighbor of x_i, compute the weight matrix W by least squares.

Step 2: Define

    ϕ(Y) = Σ_i || y_i − Σ_{j=1}^{K} W*_{ij} y_{i_j} ||^2        (2)

where W* = arg min_W ψ(W). Subject to the constraints Σ_i y_i = 0 and Σ_i y_i y_i^T / n = I, where n is the number of samples, calculate Y* = arg min_Y ϕ(Y).

The algorithm approximates the nonlinear manifold around each sample x_i by the linear hyperplane passing through its neighbors {x_{i_1}, ..., x_{i_K}}. Because the objective ϕ(Y) is invariant to translations of Y, the constraint Σ_i y_i = 0 is added in Step 2; the other constraint, Σ_i y_i y_i^T / n = I, avoids the degenerate solution Y = 0. Step 2 is thus transformed into an eigenvector decomposition:

    Y* = arg min_Y ϕ(Y)
       = arg min_Y Σ_i || y_i − Σ_{j=1}^{K} W*_{ij} y_{i_j} ||^2
       = arg min_Y || (I − W*) Y ||^2
       = arg min_Y Y^T (I − W*)^T (I − W*) Y        (3)

The optimal solution Y* of Formula (3) is given by the eigenvectors of the matrix (I − W*)^T (I − W*) associated with its smallest eigenvalues. Because of the constraint conditions, the eigenvector whose eigenvalue is zero has to be removed; we therefore compute the bottom (d + 1) eigenvectors of the matrix and discard the bottom one, which corresponds to the zero eigenvalue.
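For concreteness, the two steps above can be sketched in a few lines of NumPy. This is only an illustrative implementation; the function and parameter names, and the regularization of the local Gram matrix, are our assumptions rather than part of the original algorithm description.

```python
import numpy as np
from scipy.linalg import eigh

def lle_embed(X, K=10, d=2, reg=1e-3):
    """Illustrative LLE following Steps 1-2 above. X: (n, N) data matrix."""
    n = X.shape[0]
    # Step 1: reconstruction weights over the K nearest neighbors of each sample.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)
    neighbors = np.argsort(dists, axis=1)[:, :K]
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[neighbors[i]] - X[i]              # local coordinates relative to x_i
        C = Z @ Z.T                              # local Gram matrix
        C += reg * np.trace(C) * np.eye(K)       # regularization for numerical stability
        w = np.linalg.solve(C, np.ones(K))
        W[i, neighbors[i]] = w / w.sum()         # enforce sum_j W_ij = 1
    # Step 2: bottom eigenvectors of M = (I - W)^T (I - W), dropping the zero-eigenvalue one.
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    _, vecs = eigh(M)                            # eigenvalues in ascending order
    return vecs[:, 1:d + 1]                      # Y*: discard the constant eigenvector
```

An equivalent and more scalable implementation is available as sklearn.manifold.LocallyLinearEmbedding in scikit-learn.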

2.2 Linear Discriminant Analysis

We assume that the data of different classes share the same or similar category structure; for instance, facial images sampled from different persons are generally regarded as instances of the same cognitive concept, so data of different classes can be reduced into the same subspace with manifold learning approaches. While manifold learning is capable of recovering the intrinsic low-dimensional space, the result may not be optimal for recognition. When two highly nonlinear manifolds are mapped into the same low-dimensional subspace, there is no reason to believe that an optimal classification hyperplane also exists between the two unravelled manifolds. If the principal axes of the two low-dimensional mapped classes form an acute angle, for example, the classification ability may be impaired [11]. Therefore, linear discriminant analysis (LDA) is introduced to maximize the separability of the data among different classes.

Suppose each class has equal prior probability. The within-class scatter matrix over the n_i samples y_j of class i, with class mean m_i, i = 1, 2, ..., L, is defined as

    S_w = Σ_{i=1}^{L} Σ_{j=1}^{n_i} (y_j − m_i)(y_j − m_i)^T.

With m denoting the overall mean of all samples from all classes, the between-class scatter matrix is defined as

    S_b = Σ_{i=1}^{L} (m_i − m)(m_i − m)^T [11].

To maximize the between-class distances while minimizing the within-class distances of the manifolds, the column vectors of the discriminant matrix W are the eigenvectors of S_w^{-1} S_b associated with the largest eigenvalues. The projection matrix W then maps a vector in the low-dimensional face subspace into the discriminatory feature space:

    Z = Y W,    Z ∈ R^{d'}, Y ∈ R^{d}, W ∈ R^{d×d'}        (4)
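A minimal sketch of this LDA step is given below, assuming the rows of Y are the LLE embeddings of the training samples and labels is an integer array of class indices; the small ridge added to S_w and the function name are our additions for illustration.

```python
import numpy as np
from scipy.linalg import eigh

def lda_project(Y, labels, d_out):
    """Project LLE embeddings Y into a d_out-dimensional discriminant space (Formula (4))."""
    d = Y.shape[1]
    m = Y.mean(axis=0)                                     # overall mean
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(labels):
        Yc = Y[labels == c]
        mc = Yc.mean(axis=0)                               # class mean m_i
        Sw += (Yc - mc).T @ (Yc - mc)                      # within-class scatter
        Sb += np.outer(mc - m, mc - m)                     # between-class scatter
    # Generalized eigenproblem S_b w = lambda S_w w; keep the largest eigenvalues.
    vals, vecs = eigh(Sb, Sw + 1e-6 * np.eye(d))           # ridge keeps S_w positive definite
    W = vecs[:, np.argsort(vals)[::-1][:d_out]]            # discriminant matrix
    return Y @ W, W                                        # Z = Y W
```

In the experiments (Section 3), the reduced dimension d_out is kept at most L − 1, where L is the number of classes, since S_b has rank at most L − 1.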

2.3 The Proposed ULLELDA Algorithm

It is not difficult to see that the procedure described above comprises two steps: data are first mapped into the intrinsic low-dimensional space by LLE and then mapped into the discriminant space by LDA. In this paper we unify the two mappings into a single step for unknown samples, so that computational effectiveness is improved. Considering the nearest neighbors of an unknown sample, the weights between the unknown sample and the training samples are first computed following the idea of LLE:

    φ(W') = || x'_i − Σ_{j=1}^{K} W'_{ij} x_{i_j} ||^2,    x_{i_j} ∈ X ⊂ R^N        (5)

where x'_i denotes the i-th unknown sample and the x_{i_j} are its K nearest training samples. Once the weights of the neighbors of the unknown sample are obtained, we assume that the data have the same neighborhood relationship in the high-dimensional space as in the low-dimensional discriminant space. The unified mapping can therefore be written as

    z'_i = Σ_{j=1}^{K} W'_{ij} z_{i_j},    z_{i_j} ∈ Z ⊂ R^{d'}        (6)

where z'_i is the counterpart of the unknown sample in the discriminant space, and the z_{i_j} are the discriminant-space coordinates of its neighboring training samples, obtained through the two steps mentioned above; the neighbor indices are the same as those of the sample in the original high-dimensional space.


Finally, recognition is carried out in the discriminant space. The proposed approach has several advantages: 1) unknown data are mapped directly into the discriminant space without computing their coordinates in the intrinsic low-dimensional space; 2) for classification, the neighbor relationship of the unknown sample implicitly embodies the capability of discriminant analysis.
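The direct mapping of Formulas (5) and (6) can be sketched as follows; X_train and Z_train are assumed to hold the training samples and their discriminant-space coordinates produced by the LLE and LDA steps above, and the function name and regularizer are ours.

```python
import numpy as np

def ullelda_map(x_new, X_train, Z_train, K=40, reg=1e-3):
    """Map an unknown sample straight into the discriminant space (Formulas (5)-(6))."""
    dists = np.linalg.norm(X_train - x_new, axis=1)
    idx = np.argsort(dists)[:K]                    # K nearest training samples of x_new
    Z = X_train[idx] - x_new                       # local coordinates relative to x_new
    C = Z @ Z.T
    C += reg * np.trace(C) * np.eye(K)             # regularize the local Gram matrix
    w = np.linalg.solve(C, np.ones(K))
    w /= w.sum()                                    # reconstruction weights, sum to 1
    return w @ Z_train[idx]                        # z' = sum_j w_j z_{i_j}
```

Recognition then reduces to classifying z' against the projected training samples, for example with 1-NN or NFL as described in Section 3.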

Fig. 1. Examples of ORL Face Database

Fig. 2. Examples of UMIST Face Database

3 Experiments

To verify the proposed ULLELDA approach, three face databases (the ORL database [12], the UMIST database [13], and the JAFFE database [14]) are investigated. Some examples are shown in Figures 1 to 3. Training and test samples are randomly separated without overlap; the details are given in Table 1. In this paper, the intensity of each pixel is regarded as one dimension; for example, a 112*92-pixel image corresponds to 10304 dimensions. All samples are standardized to the range [0,1]. The reported results are averages over 100 runs.

Fig. 3. Examples of JAFFE Face Database

Table 1. The number of training samples (TR) and test samples (TE), the number of classes, and the image dimensions of each database

Database     TR    TE    Classes   Dimensions
ORL          200   200   40        112 * 92
UMIST        200   375   20        112 * 92
JAFFE        60    153   10        146 * 111
Expression   168   45    7         146 * 111
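As a concrete illustration of the setup above, the following sketch flattens each image into a pixel vector, scales intensities to [0,1], and makes one random, non-overlapping train/test split. The array names and the 8-bit-image assumption are ours; the per-class training count would be chosen to match the TR column of Table 1 (e.g. 5 per class for ORL).

```python
import numpy as np

def prepare_split(images, labels, n_train_per_class, rng=np.random.default_rng(0)):
    """Flatten images, scale to [0,1], and split each class into train/test without overlap."""
    X = images.reshape(len(images), -1).astype(float)
    X /= 255.0                                      # assume 8-bit grayscale intensities
    train_idx, test_idx = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        train_idx.extend(idx[:n_train_per_class])   # e.g. 5 per class for ORL (200 total)
        test_idx.extend(idx[n_train_per_class:])
    return X[train_idx], labels[train_idx], X[test_idx], labels[test_idx]
```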

To compare the effectiveness of the proposed dimensionality reduction approach, experiments are also performed with LLE and PCA [15]. When LLE is used for dimensionality reduction, test samples are first mapped into the intrinsic low-dimensional space using the same mapping approach as in ULLELDA. The dimension of the LLE-reduced data is set to 150, except for JAFFE where it is 50. For the second mapping (the LDA-based reduction), the reduced dimension is generally no more than L − 1, otherwise the eigenvalues and eigenvectors become complex; in practice, we retain the real part of the complex values when the second reduced dimension is higher than L − 1. In addition, the neighbor factor K of the LLE algorithm needs to be predefined. Through extensive experiments we found that the selection of the neighbor factor has little impact on the final recognition result. Without loss of generality, we set K to 40 for the ORL, UMIST, and JAFFE expression databases, and to 20 for the JAFFE face database. Finally, two classification algorithms, the 1-nearest neighbor (NN) algorithm and the nearest feature line (NFL) algorithm [16] (sketched below), are adopted for recognition; the corresponding combined algorithm, for example the combination of ULLELDA and NFL, is abbreviated as ULLELDA+NFL. Because the adopted dimensionality reduction approaches differ, the experimental result of each algorithm listed in Table 2 is its lowest error rate over the corresponding reduced dimensions; for example, the error rate of ULLELDA+NFL on ORL is obtained with 19 dimensions.
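For reference, a minimal sketch of NFL classification in the discriminant space follows; it applies the usual nearest-feature-line rule of [16] (distance from the query to the line through every pair of same-class prototypes), with function and variable names of our choosing.

```python
import numpy as np

def nfl_classify(q, Z_train, labels):
    """Nearest-feature-line classification of query q (assumes >= 2 prototypes per class)."""
    best_class, best_dist = None, np.inf
    for c in np.unique(labels):
        P = Z_train[labels == c]                          # prototypes of class c
        for a in range(len(P)):
            for b in range(a + 1, len(P)):
                d_ab = P[b] - P[a]
                t = (q - P[a]) @ d_ab / (d_ab @ d_ab)     # foot of the perpendicular
                dist = np.linalg.norm(q - (P[a] + t * d_ab))
                if dist < best_dist:
                    best_class, best_dist = c, dist
    return best_class
```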

Table 2. The error rates (%) and standard deviations (%) of the proposed algorithm and other algorithms

Algorithm        ORL          UMIST        JAFFE         Expression
LLE+NFL          4.35±1.42    3.79±2.14    4.50±1.76     11.89±5.14
LLE+NN           7.01±1.83    5.78±2.49    5.955±1.97    13.489±5.212
PCA+NFL          8.27±2.41    9.65±2.7     5.435±1.74    17.4±6.16
PCA+NN           10.12±2.06   11.29±2.85   8.38±1.88     32.62±6.85
ULLELDA+NFL      4.06±1.4     2.144±1.45   2.40±1.18     10.8±5.14
ULLELDA+NN       4.13±1.33    2.18±1.38    2.20±1.22     10.69±5.26
PCA+LDA+NFL      7.35±1.89    1.09±1.11    2.62±1.22     12.26±4.95
PCA+LDA+NN       7.45±1.92    0.93±1.05    2.59±1.21     10.82±5.28

For further analysis, one of the recognition tasks is illustrated in Figure 4. From the figure we can see that the proposed algorithm performs better with either the NFL or the NN approach. More experimental results will be presented in a longer version of this paper. The proposed approach clearly improves on LLE alone and on PCA. When LDA is introduced for both LLE and PCA, the recognition performances of the two approaches are comparable. Based on our observations, two points can be made: 1) Because the proposed algorithm combines LLE and LDA in one step, some structural information may be lost, which can decrease the recognition rate; we have in fact implemented a different mapping approach (a manifold learning algorithm) for face recognition, and some of those experimental results are better than those of the proposed ULLELDA algorithm (details can be found in [17]). 2) For some high-dimensional data, the nonlinearity is inherent in the intrinsic variables rather than generated by a nonlinear mapping of intrinsic variables; in this case, LDA may be unsuitable for further recognition.

[Figure 4: error rates versus reduced dimensions on the ORL database for ULLELDA+NFL, ULLELDA+NN, PCA+LDA+NFL, and PCA+LDA+NN.]

Fig. 4. ORL Face Recognition

4 Conclusions

In this paper, we propose the ULLELDA algorithm for face recognition. First, the training set is projected into the intrinsic low-dimensional space, and LDA is then adopted to enlarge between-class distances and decrease within-class distances. Second, unknown samples are projected directly into the discriminant space without computing their counterparts in the low-dimensional space. Experiments show that the proposed algorithm outperforms the LLE and PCA algorithms for recognition, and is comparable to the PCA+LDA approach. However, several problems are worth further research. First, we observe that the selection of the neighbor factor K is related to the recognition error rates; we will study the feasibility of using ensemble approaches to address this problem [18]. Second, the proposed ULLELDA algorithm has a lower recognition rate on some databases than the proposed MLA approach with LDA [17]; a possible reason is that the projection of a test sample based on the neighbor factor does not consider the distribution of the data set, so the global structure information cannot be embodied effectively. Finally, we will compare the proposed dimensionality reduction approach with state-of-the-art approaches in the future.

Acknowledgements

This work was supported by Grant 2003-HT-FD05 and the National Outstanding Youth Foundation of China under Grant No. 60325207.

References

1. Haw-Minn Lu, Yeshaiahu Fainman, and Robert Hecht-Nielsen, "Image Manifolds", in Proc. SPIE, vol. 3307, pp. 52-63, 1998.
2. T. Hastie and W. Stuetzle, "Principal Curves", Journal of the American Statistical Association, 84(406), pp. 502-516, 1989.
3. B. Kégl, A. Krzyzak, T. Linder, and K. Zeger, "Learning and design of principal curves", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 3, pp. 281-297, 2000.
4. C. M. Bishop, M. Svensén, and C. K. I. Williams, "GTM: The generative topographic mapping", Neural Computation, 10, pp. 215-234, 1998.
5. K. Chang and J. Ghosh, "A unified model for probabilistic principal surfaces", IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(1), pp. 22-41, 2001.
6. J. B. Tenenbaum, V. de Silva, and J. C. Langford, "A global geometric framework for nonlinear dimensionality reduction", Science, 290, pp. 2319-2323, 2000.
7. S. T. Roweis and L. K. Saul, "Nonlinear dimensionality reduction by locally linear embedding", Science, 290, pp. 2323-2326, 2000.
8. Mikhail Belkin and Partha Niyogi, "Laplacian Eigenmaps for Dimensionality Reduction and Data Representation", 2001.
9. G. Hinton and S. Roweis, "Stochastic Neighbor Embedding", Neural Information Processing Systems: Natural and Synthetic, Vancouver, Canada, December 9-14, 2002.
10. M. Brand, "Charting a manifold", Neural Information Processing Systems: Natural and Synthetic, Vancouver, Canada, December 9-14, 2002.
11. Daniel L. Swets and John (Juyang) Weng, "Using Discriminant Eigenfeatures for Image Retrieval", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 8, pp. 831-836, 1996.
12. F. S. Samaria, "Face Recognition Using Hidden Markov Models", PhD thesis, University of Cambridge, 1994.
13. Daniel B. Graham and Nigel M. Allinson, "Characterizing Virtual Eigensignatures for General Purpose Face Recognition", in H. Wechsler, P. J. Phillips, V. Bruce, F. Fogelman-Soulie, and T. S. Huang (eds), Face Recognition: From Theory to Applications, NATO ASI Series F, Computer and Systems Sciences, vol. 163, pp. 446-456, 1998.
14. Michael J. Lyons, Julien Budynek, and Shigeru Akamatsu, "Automatic Classification of Single Facial Images", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, no. 12, pp. 1357-1362, 1999.
15. M. Turk and A. Pentland, "Eigenfaces for Recognition", Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71-86, 1991.
16. Stan Z. Li, K. L. Chan, and C. L. Wang, "Performance Evaluation of the Nearest Feature Line Method in Image Classification and Retrieval", IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(11), pp. 1335-1339, November 2000.
17. Junping Zhang, Stan Z. Li, and Jue Wang, "Manifold Learning and Applications in Recognition", in Intelligent Multimedia Processing with Soft Computing, Yap Peng Tan, Kim Hui Yap, and Lipo Wang (eds), Springer-Verlag, Heidelberg, 2004.
18. D. Opitz and R. Maclin, "Popular Ensemble Methods: An Empirical Study", Journal of Artificial Intelligence Research, 11, pp. 169-198, 1999.
