Self-training-based face recognition using semi-supervised linear discriminant analysis and affinity propagation

Haitao Gan,1 Nong Sang,1,* and Rui Huang2

1School of Automation, Huazhong University of Science and Technology, Wuhan 430074, China
2NEC Laboratories China, Beijing 100084, China
*Corresponding author: [email protected]

Received June 4, 2013; revised October 9, 2013; accepted November 6, 2013; posted November 13, 2013 (Doc. ID 191741); published December 2, 2013

Face recognition is one of the most important applications of machine learning and computer vision. Traditional supervised learning methods require a large amount of labeled face images to achieve good performance. In practice, however, labeled images are usually scarce while unlabeled ones may be abundant. In this paper, we introduce a semi-supervised face recognition method in which semi-supervised linear discriminant analysis (SDA) and affinity propagation (AP) are integrated into a self-training framework. In particular, SDA is employed to compute the face subspace using both labeled and unlabeled images, and AP is used to identify the exemplars of the different face classes in the subspace. The unlabeled data can then be classified according to the exemplars, the newly labeled data with the highest confidence are added to the labeled data, and the whole procedure iterates until convergence. A series of experiments on four face datasets is carried out to evaluate the performance of our algorithm. Experimental results illustrate that our algorithm outperforms the other unsupervised, semi-supervised, and supervised methods. © 2013 Optical Society of America

OCIS codes: (100.3008) Image recognition, algorithms and filters; (100.5010) Pattern recognition; (150.2950) Illumination.

http://dx.doi.org/10.1364/JOSAA.31.000001

1. INTRODUCTION

During the past decades, face recognition has become one of the most important and successful applications of machine learning and computer vision [1–5]. One widely used category of methods for face recognition is appearance-based methods [6–10]. To describe the variations in face appearance, multiple templates associated with different poses, illuminations, and occlusions should be gathered in the training phase. However, in many practical face recognition tasks only a small amount of labeled face images is available to train a classifier. In this case, face recognition performance may not be robust to face variations, since the templates of such variations are not well represented by a small amount of labeled data. On the other hand, unlabeled face images are often abundant or easy to collect in the real world. Consequently, semi-supervised learning, which attempts to train a better classifier with both labeled and unlabeled data, has become a topic of great interest. Many researchers have investigated this idea, and a number of approaches have been proposed in the semi-supervised learning field, such as self-training [11], co-training [12], transductive support vector machines [13], generative models [14], and graph-based methods [15]; more details can be found in [16]. Among these methods, self-training is a widely used technique. As the name implies, a classifier trained on the labeled data is first used to classify the unlabeled data, and the newly labeled data with the highest confidence (probability of belonging to a certain class)

are added incrementally to the labeled dataset with their predicted labels. The procedure is repeated until all unlabeled data are labeled.

In face recognition, Roli and Marcialis [17] proposed a principal component analysis (PCA)-based semi-supervised method (PCA self-training) that uses PCA [18] to update the eigenspace in each iteration of self-training. However, PCA is an unsupervised method and does not consider the discriminative information of the labeled images. Zhao et al. [19] proposed a linear discriminant analysis (LDA)-based semi-supervised method (LDA self-training) that uses LDA [18] to update the Fisher space in each iteration of self-training. Neither of these self-training methods, however, addresses the problem that the performance of PCA and LDA may be drastically reduced when the set of training images is small. Recently, a semi-supervised LDA (SDA) [20] was proposed to learn the discriminative subspace from both labeled and unlabeled data. SDA has shown its effectiveness for face recognition with only one training image per person, although it does not involve the self-training strategy. Another problem with the above methods is that their templates are calculated as the means of the projected images, which do not exist in the real world (i.e., they are virtual faces in the original data space). Frey and Dueck [21] proposed a clustering algorithm named affinity propagation (AP), which uses the similarities between pairs of data points to cluster the dataset and obtains, for each cluster, exemplars that are actual data points in that cluster. Figure 1 shows the results obtained by mean templates and by AP on a synthetic dataset.


Fig. 1. Results obtained by mean templates and AP after 10 iterations.

As we can see, after 10 iterations the labeling results obtained with AP are better than those obtained with mean templates. This suggests that the templates computed by AP are intuitively more reasonable than the mean faces used in [17,19].

In this paper, we propose an improved semi-supervised approach for face recognition in which SDA and AP are integrated into a self-training framework. In each iteration of our proposed method, the labeled and unlabeled face images are first projected into a low-dimensional feature space using SDA, and the exemplars (or templates, as they are usually called in the face recognition literature) of each class are computed using AP on the labeled data only. The unlabeled data can then be classified according to the exemplars, the newly labeled data with the highest confidence are added to the labeled data, and the whole procedure iterates until convergence. In contrast to [17] and [19], we use SDA to update the face subspace, which is more robust than PCA and LDA when there is only a small amount of labeled face images. Different from SDA [20], our approach is a self-training method.

The rest of the paper is organized as follows: In Section 2, we briefly review the related work, including SDA and AP. In Section 3, we describe our algorithm in detail. Section 4 presents the experimental results on several face datasets. Finally, we conclude the paper in Section 5.

2. PREVIOUS WORK

In this section, we briefly review the related work.

A. Semi-Supervised Discriminant Analysis
SDA is a semi-supervised dimensionality reduction technique extended from LDA. SDA attempts to find a projection by exploiting the discriminant structure of the labeled data and the intrinsic geometrical structure of both labeled and unlabeled data. To make full use of the geometrical structure, SDA constructs a p-nearest-neighbor graph to model it and adds a graph-based smoothness regularization term to the objective function of LDA. A basic assumption in SDA is the smoothness assumption, i.e., that similar data should have similar labels.

Given a dataset $X = \{x_1, \dots, x_l, x_{l+1}, \dots, x_n\} \in \mathbb{R}^{D \times n}$, the first $l$ data points are labeled as $y_i \in \{1, \dots, c\}$, with $l_j$ labeled points in the $j$th class, and the rest are unlabeled. Let us denote the weight matrix $W \in \mathbb{R}^{n \times n}$, where $W_{ij} = \delta_{y_i, y_j}/l_{y_i}$ for labeled data and $W_{ij} = 0$ otherwise. According to the graph representation of LDA [20], the between-class scatter structure of the labeled data can be defined as

$$J_1(a) = a^T X W X^T a. \quad (1)$$

The total scatter structure of the labeled data can be defined as

$$J_2(a) = a^T X \tilde{I} X^T a, \quad (2)$$

where

$$\tilde{I} = \begin{bmatrix} I & 0 \\ 0 & 0 \end{bmatrix}$$

and $I$ is an identity matrix of size $l \times l$.

The geometrical structure of both labeled and unlabeled data is exploited by constructing a $p$-nearest-neighbor graph with adjacency matrix $S$, which SDA defines as

$$S_{ij} = \begin{cases} 1 & \text{if } x_i \in N_p(x_j) \text{ or } x_j \in N_p(x_i), \\ 0 & \text{otherwise}, \end{cases}$$

where $N_p(x_i)$ denotes the set of $p$ nearest neighbors of $x_i$. SDA then defines the graph-based smoothness regularization term as

$$J_3(a) = \sum_{ij} (a^T x_i - a^T x_j)^2 S_{ij} = 2 a^T X L X^T a, \quad (3)$$

where $L = D - S$ is the graph Laplacian and $D$ is a diagonal matrix with $D_{ii} = \sum_j S_{ij}$. Finally, the objective function of SDA can be written as

$$\max_a \frac{a^T X W X^T a}{a^T (X (\tilde{I} + \alpha L) X^T + \beta I) a}. \quad (4)$$

Here $\alpha$ tunes the balance between the model complexity and the empirical loss, and $\beta$ is the regularization parameter. The solution of Eq. (4) is given by the eigenvectors corresponding to the nonzero eigenvalues of the generalized eigenproblem

$$X W X^T a = \lambda (X (\tilde{I} + \alpha L) X^T + \beta I) a. \quad (5)$$
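For concreteness, the following is a minimal NumPy/SciPy sketch of the SDA projection as defined by Eqs. (1)–(5). It is not the authors' implementation; the function name, the parameter defaults, and the data layout (one image per column, labeled columns first) are our assumptions.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.neighbors import kneighbors_graph

def sda_projection(X, y, n_labeled, n_components, p=5, alpha=0.1, beta=0.01):
    """X: D x n data matrix whose first n_labeled columns are labeled;
    y: labels of those columns, taking values in {0, ..., c-1}."""
    D_dim, n = X.shape
    # Weight matrix W of Eq. (1): 1/l_c for same-class labeled pairs, 0 elsewhere.
    W = np.zeros((n, n))
    for cls in np.unique(y):
        idx = np.flatnonzero(y == cls)
        W[np.ix_(idx, idx)] = 1.0 / len(idx)
    # I_tilde of Eq. (2): identity on the labeled block, zero elsewhere.
    I_tilde = np.zeros((n, n))
    I_tilde[:n_labeled, :n_labeled] = np.eye(n_labeled)
    # Symmetrized p-nearest-neighbor graph S and Laplacian L = D - S of Eq. (3).
    S = kneighbors_graph(X.T, p, mode='connectivity').toarray()
    S = np.maximum(S, S.T)
    L = np.diag(S.sum(axis=1)) - S
    # Generalized eigenproblem of Eq. (5); B is positive definite thanks to beta*I.
    A = X @ W @ X.T
    B = X @ (I_tilde + alpha * L) @ X.T + beta * np.eye(D_dim)
    eigvals, eigvecs = eigh(A, B)
    # Eigenvectors with the largest eigenvalues span the SDA subspace.
    return eigvecs[:, np.argsort(eigvals)[::-1][:n_components]]
```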


B. Affinity Propagation
AP is an unsupervised clustering method that finds a set of exemplars to represent the whole dataset [21,22]. In traditional clustering methods the cluster center is generally fictional, whereas the exemplars obtained by AP are selected from the actual data points. Given a similarity matrix with entries $s_{ij}$ describing the similarity between samples $x_i$ and $x_j$, AP tries to find the exemplars that maximize the net similarity, i.e., the overall sum of the similarities of the samples to their exemplars. Any distance measure can be used to compute the similarity, such as the negative Euclidean distance. The solution of AP can be derived with the max-sum algorithm [23], which reduces the problem to a message-passing process with two kinds of messages that are iteratively updated until convergence: the responsibility $r_{ij}$ and the availability $a_{ij}$. Their values are updated as follows:

$$r_{ij} = (1 - \lambda)\rho_{ij} + \lambda r_{ij}, \qquad a_{ij} = (1 - \lambda)\alpha_{ij} + \lambda a_{ij}, \quad (6)$$

where $\lambda$ is a damping factor, and $\rho_{ij}$ and $\alpha_{ij}$ are computed as

$$\rho_{ij} = \begin{cases} s_{ij} - \max_{k \neq j}\{a_{ik} + s_{ik}\} & i \neq j, \\ s_{ij} - \max_{k \neq j}\{s_{ik}\} & i = j, \end{cases} \qquad \alpha_{ij} = \begin{cases} \min\{0, r_{jj} + \sum_{k \neq i,j} \max\{0, r_{kj}\}\} & i \neq j, \\ \sum_{k \neq j} \max\{0, r_{kj}\} & i = j. \end{cases}$$

When convergence is reached, the exemplars are the points $x_j$ that maximize $r_{jj} + a_{jj}$, and each non-exemplar $x_i$ is assigned to an exemplar by computing $\arg\max_{j \in J}\{r_{ij} + a_{ij}\}$, where $J$ denotes the set of exemplars.
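The updates above translate almost line for line into NumPy. The sketch below is illustrative, not the authors' code; it applies the standard Frey–Dueck responsibility update to the diagonal as well (which coincides with the $i = j$ case above when the availabilities are zero), and the $r_{jj} + a_{jj} > 0$ exemplar criterion is a common implementation choice.

```python
import numpy as np

def affinity_propagation(s, damping=0.5, n_iter=200):
    """s: n x n similarity matrix; the diagonal s_jj holds the preferences."""
    n = s.shape[0]
    r = np.zeros((n, n))  # responsibilities r_ij
    a = np.zeros((n, n))  # availabilities a_ij
    idx = np.arange(n)
    for _ in range(n_iter):
        # rho_ij = s_ij - max_{k != j} (a_ik + s_ik), via row max and second max.
        m = a + s
        first = np.argmax(m, axis=1)
        max1 = m[idx, first]
        m[idx, first] = -np.inf
        max2 = m.max(axis=1)
        rho = s - max1[:, None]
        rho[idx, first] = s[idx, first] - max2
        r = (1 - damping) * rho + damping * r          # damped update, Eq. (6)
        # alpha_ij = min(0, r_jj + sum_{k != i,j} max(0, r_kj)) for i != j.
        rp = np.maximum(r, 0)
        np.fill_diagonal(rp, np.diag(r))               # keep r_jj itself in the sums
        col = rp.sum(axis=0)
        alpha = np.minimum(0.0, col[None, :] - rp)
        np.fill_diagonal(alpha, col - np.diag(r))      # alpha_jj = sum_{k != j} max(0, r_kj)
        a = (1 - damping) * alpha + damping * a        # damped update, Eq. (6)
    # Exemplars maximize r_jj + a_jj; every point joins its best exemplar.
    exemplars = np.flatnonzero(np.diag(r) + np.diag(a) > 0)
    if exemplars.size == 0:
        exemplars = np.array([np.argmax(np.diag(r) + np.diag(a))])
    labels = exemplars[np.argmax((r + a)[:, exemplars], axis=1)]
    return exemplars, labels
```

In practice, scikit-learn's sklearn.cluster.AffinityPropagation implements the same damped message-passing updates and can be used instead.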

3. OUR ALGORITHM

In this section, we present a semi-supervised approach for face recognition in which SDA and AP are integrated into self-training: SDA is used to update the face subspace, and AP identifies the most representative templates in that subspace. More precisely, in the training stage the initial input consists of a labeled set Dl and an unlabeled set Du. First, a projection matrix is learned using SDA. The images in both Dl and Du are then mapped into a c-dimensional feature space using the projection matrix, where c is the number of classes. The exemplar of each class is computed by AP from the projected labeled images of that class. The projected unlabeled images are then assigned labels according to the nearest exemplar under the Euclidean distance. The ki images that are nearest to the corresponding template of the ith class in the subspace, together with their predicted labels, are moved from the unlabeled set Du to the labeled set Dl. The procedure is repeated until Du is empty. Finally, all the formerly unlabeled images carry labels, and the projection matrix for subspace learning is obtained by traditional LDA. The training details are given in Algorithm 1. In the testing stage, the face images are first projected into the low-dimensional feature space through the learned projection matrix and then identified using K-nearest neighbor (KNN). The flow chart of our algorithm is shown in Fig. 2.
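In code, the testing stage is just this projection followed by nearest-neighbor matching. A minimal sketch, assuming P_lda is the projection matrix learned in Algorithm 1 below, (X_l, y_l) is the final labeled set, and images are stored column-wise; these names are ours, not from the paper.

```python
from sklearn.neighbors import KNeighborsClassifier

def classify_test(P_lda, X_l, y_l, X_test, k=1):
    """Project gallery and test images with the learned matrix, then apply KNN."""
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit((P_lda.T @ X_l).T, y_l)           # projected labeled (gallery) images
    return knn.predict((P_lda.T @ X_test).T)  # predicted identities of test images
```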


Algorithm 1. Label the unlabeled images

Input: a labeled set Dl, an unlabeled set Du, and the number ki of images to select for the ith class in each iteration.
Output: labels of the (formerly) unlabeled set Du and a projection matrix obtained by LDA.
Method:
1. Use SDA to project Dl and Du into the c-dimensional feature space.
2. Compute a template for each class by AP, using the projected labeled images of that class.
3. Label the images in Du according to the nearest template.
4. Move the ki images that are nearest to the corresponding template of the ith class from Du to Dl.
5. Repeat steps 1–4 until Du is empty.
6. Use LDA to find the projection matrix with Dl.
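To make the procedure concrete, here is a hedged sketch of the training loop, reusing the sda_projection and affinity_propagation helpers sketched in Section 2. The helper names, the use of a single common k instead of per-class ki, the median preference, and the fallback to the class mean are our simplifications, not the authors' code.

```python
import numpy as np

def class_template(Z_c):
    """One AP exemplar (template) for the projected images of a single class."""
    pts = Z_c.T                                               # samples as rows
    s = -((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)   # negative squared Euclidean
    np.fill_diagonal(s, np.median(s))                         # a common preference choice
    exemplars, _ = affinity_propagation(s)
    return Z_c[:, exemplars[0]] if exemplars.size else Z_c.mean(axis=1)

def self_train(X_l, y_l, X_u, n_classes, k=1):
    """Algorithm 1: grow the labeled set until the unlabeled pool is empty.
    Labels are assumed to take values in {0, ..., n_classes-1}."""
    while X_u.shape[1] > 0:
        # Step 1: SDA subspace from the current labeled + unlabeled data.
        X = np.hstack([X_l, X_u])
        P = sda_projection(X, y_l, X_l.shape[1], n_components=n_classes)
        Z_l, Z_u = P.T @ X_l, P.T @ X_u
        # Step 2: one template per class from the projected labeled images.
        T = np.stack([class_template(Z_l[:, y_l == c]) for c in range(n_classes)], axis=1)
        # Step 3: label each unlabeled image by its nearest template.
        d = np.linalg.norm(Z_u[:, :, None] - T[:, None, :], axis=0)   # n_u x n_classes
        pred = np.argmin(d, axis=1)
        # Step 4: move the k images closest to each template into the labeled set.
        move = []
        for c in range(n_classes):
            cand = np.flatnonzero(pred == c)
            move.extend(cand[np.argsort(d[cand, c])[:k]])
        move = np.asarray(move, dtype=int)
        X_l = np.hstack([X_l, X_u[:, move]])
        y_l = np.concatenate([y_l, pred[move]])
        X_u = np.delete(X_u, move, axis=1)
    return X_l, y_l   # step 6 (LDA on the fully labeled set) follows afterward
```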

It is worth pointing out the differences among PCA self-training, LDA self-training, and our proposed method. First, since PCA is an unsupervised dimensionality reduction method, it does not take the information of the labeled images into account. Meanwhile, the performance of LDA relies heavily on the number of labeled images; when the labeled set is small, its generalization capability cannot be guaranteed. SDA, by contrast, makes use of both labeled and unlabeled images and is therefore more robust than PCA and LDA when only a small amount of labeled images is available. Second, the templates obtained by AP are selected from the original images, which is more reasonable than the mean templates of the other two semi-supervised methods.

4. EXPERIMENTAL RESULTS

In this section, we carry out a series of experiments to evaluate our algorithm and compare its performance with Fisherface [24], Laplacianface [4], SDA [20], and LDA self-training [19]. Since PCA [25] (i.e., Eigenface) and PCA self-training are both significantly inferior to their discriminant counterparts (i.e., Fisherface and LDA self-training), we omit their results. Following [19,20], the experiments are conducted under two scenarios: (1) a small amount of labeled images per class, on three face datasets; and (2) a single labeled image per class, on a fourth face dataset.

Fig. 2. Flow chart of our algorithm.


In our experiments, we use recognition accuracy (rank-1 identification rate) as the performance indicator of the different methods on the various face datasets.

A. Training with a Small Amount of Labeled Images per Class
The first series of experiments, in which there are only a small amount of labeled images per class, is performed on three datasets.

1. Face Datasets
Three face datasets are used in the experiments: the Olivetti Research Lab (ORL) face dataset [26], the Yale face dataset [27], and the extended Yale face dataset B (Yale_B) [28]. The face images are manually aligned and cropped [29]. The size of each cropped image is 32 × 32 pixels, with 256 gray levels per pixel. The face datasets can be downloaded at [28].

The ORL dataset contains 40 people with 10 images per person. For some people, the images were taken at different times and with varying lighting, facial expressions (open/closed eyes, smiling/not smiling), and facial details (glasses/no glasses). The images were captured against a dark homogeneous background. In our experiment, we randomly select eight images per person as training data and use the remaining images as testing data.

The Yale dataset has 165 images of 15 individuals. There are 11 images per individual, taken with different facial expressions or configurations (center-light, with glasses, happy, left-light, without glasses, normal, right-light, sad, sleepy, surprised, and wink). We randomly select eight images per individual as training data and use the remaining images as testing data.

The Yale_B dataset used in our experiment contains two parts: the Yale Face Database B and the extended Yale Face Database B [30]. Altogether, the Yale_B dataset has 38 human subjects under 9 poses and 64 illumination conditions. In our experiment, we use the images under the frontal pose and different illuminations, which gives 64 images per subject. For the Yale_B dataset, we randomly select 50 images per subject as training data and use the remaining images as testing data.

Some example images are shown in Fig. 3. The details are given in Table 1.

Table 1. Description of Experimental Face Datasets

Dataset   Training Size   Testing Size   Attributes   Classes
ORL           320              80           1024         40
Yale          120              45           1024         15
Yale_B       1900             514           1024         38

2. Comparing Our Algorithm with the Other Methods
In this section, we compare our algorithm with the other four methods given a small amount of labeled images. First, we randomly divide the training data into two subsets: labeled and unlabeled data. Then we randomly choose 2, 2, and 8 images

per person in the ORL, Yale, and Yale_B datasets, respectively, as the labeled set, with the remaining training images as the unlabeled set. We repeat this process 20 times to evaluate all the methods. The parameter ki is set to 1, 1, and 5 on the three datasets, respectively. The results are shown in Table 2.

Table 2. Recognition Accuracy of the Five Algorithms on the Testing Set of the Three Face Datasets

Method              ORL            Yale           Yale_B
Fisherface          65.75 ± 5.98   47.77 ± 8.46   86.24 ± 1.05
Laplacianface       70.38 ± 5.92   51.55 ± 4.89   83.69 ± 1.82
SDA                 77.00 ± 5.37   58.44 ± 4.45   88.30 ± 1.73
LDA Self-training   83.75 ± 3.86   51.11 ± 6.37   87.77 ± 1.18
Our algorithm       86.88 ± 3.25   58.44 ± 4.34   96.32 ± 0.65

From Table 2, we make two observations. First, the three semi-supervised methods give better results than the unsupervised and supervised methods; to some extent, this indicates that unlabeled images do help train a better classifier and improve its generalization ability. Second, our algorithm achieves the best results among all the methods, with a particularly clear improvement on the Yale_B dataset. Moreover, the standard deviation of the testing performance of our algorithm is the lowest, which shows that our algorithm is more robust and depends less on the initial training data. We attribute this to the combination of SDA, which exploits the low-dimensional feature space, and AP, which calculates the template for each class. Hence, even when the labeled face set is small, our algorithm gives steady performance and outperforms the other unsupervised, semi-supervised, and supervised methods.

Additionally, we investigated the training process of LDA self-training and our algorithm in more detail, using the accuracy of labeling the unlabeled images as the performance indicator. Table 3 shows the performance of the two algorithms. As can be seen, our algorithm outperforms LDA self-training on all three face datasets.

Table 3. Recognition Accuracy of the Two Algorithms on the Unlabeled Set of the Three Face Datasets

Method              ORL            Yale           Yale_B
LDA Self-training   88.25 ± 3.14   51.56 ± 4.03   87.83 ± 0.73
Our algorithm       91.42 ± 2.53   62.78 ± 4.06   97.12 ± 0.52
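The split-and-repeat protocol described above can be summarized in a short driver. A hedged sketch: run_once is an assumed wrapper around the Algorithm 1 training and KNN testing sketches of Section 3, and the 100× scaling matches the percentage figures reported in the tables.

```python
import numpy as np

def evaluate(X_tr, y_tr, X_te, y_te, n_labeled_per_class, run_once, trials=20, seed=0):
    """Mean and standard deviation of test accuracy over random labeled/unlabeled splits."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(trials):
        # Randomly pick n_labeled_per_class labeled images per class from the training set.
        lab = np.zeros(len(y_tr), dtype=bool)
        for c in np.unique(y_tr):
            idx = np.flatnonzero(y_tr == c)
            lab[rng.choice(idx, n_labeled_per_class, replace=False)] = True
        y_pred = run_once(X_tr[:, lab], y_tr[lab], X_tr[:, ~lab], X_te)
        accs.append(np.mean(y_pred == y_te))
    return 100 * np.mean(accs), 100 * np.std(accs)
```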

Fig. 3. Face images from the four face datasets.

3. Impact of the Parameter ki
Since the value of the parameter ki, i.e., the number of images selected for the ith class in each iteration, directly influences the performance of LDA self-training and our algorithm, we analyze its impact here.


Table 4. Recognition Accuracy of the Two Algorithms on the Testing Set of the Three Face Datasets with Different Numbers of Selected Images in Each Iteration

Dataset   ki   LDA Self-Training   Our Algorithm
ORL       2    82.12 ± 4.29        85.62 ± 4.26
ORL       3    82.50 ± 4.86        85.00 ± 4.79
ORL       4    81.75 ± 5.90        84.00 ± 5.36
Yale      2    52.67 ± 3.15        57.33 ± 7.01
Yale      3    53.11 ± 8.19        60.44 ± 8.37
Yale      4    51.11 ± 8.34        55.33 ± 9.16
Yale_B    7    86.86 ± 2.72        95.91 ± 0.47
Yale_B    9    84.49 ± 2.49        95.48 ± 0.69
Yale_B    11   83.19 ± 2.42        95.15 ± 0.55

The value of ki ranges over {2, 3, 4} for the ORL and Yale datasets and over {7, 9, 11} for the Yale_B dataset. The experiments are carried out as specified in Section 4.A.2. The results are shown in Table 4. From the table, we find that the performance of our algorithm is relatively stable with respect to the parameter ki.

4. Impact of the Number of Labeled Images
In this section, we discuss the impact of the number of labeled images on the performance of the different methods. Experiments are carried out to compare our algorithm with the other four methods as the number of labeled images in the training set increases. The results are shown in Figs. 4–6. Not surprisingly, the accuracies of all five methods increase overall as the number of labeled images increases. On the whole, our algorithm gives the best results on the Yale and Yale_B datasets and achieves comparable results on the ORL dataset in comparison to the other four methods, and the semi-supervised methods generally outperform the unsupervised and supervised methods. We find that Laplacianface obtains the worst results across the range of labeled images on all three datasets, possibly because it does not use the labeled images to find a discriminative projection.

Fig. 4. Recognition accuracy on ORL dataset.

Fig. 5. Recognition accuracy on Yale dataset.

B. Training with a Single Labeled Image per Class
In this section, we discuss the situation where there is only one labeled image per class. The experiment is conducted on the CMU PIE face dataset [31]. The dataset contains 41,368 face images of 68 individuals, taken under varying poses, illumination conditions, and expressions. Each image is manually aligned and cropped to 32 × 32 pixels. Figure 3 shows sample images of one individual. For a fair comparison with SDA and LDA self-training, we choose the frontal-pose images (C27), 43 images per individual, in this experiment. For each individual, we randomly select 30 images as training data and use the remaining images as testing data. In each trial, a single image per individual is randomly selected from the training data as labeled data, and the rest of the training data are treated as unlabeled data. We repeat this process 20 times and average the results to evaluate our algorithm and the other methods. Since LDA fails to work when there is only one labeled image per individual, we do not conduct this experiment with LDA. Table 5 shows the recognition results of the various algorithms.

Fig. 6. Recognition accuracy on Yale_B dataset.


Table 5. Recognition Accuracies of the Four Algorithms on the CMU PIE Face Dataset with One Labeled Image per Individual

Method                   Unlabeled Set   Test Set
Laplacianface [4]        56.1 ± 2.3      56.4 ± 2.4
SDA [20]                 59.0 ± 2.0      59.5 ± 2.7
LDA Self-training [19]   84.5 ± 9.5      71.3 ± 6.5
Our algorithm            85.7 ± 6.1      86.9 ± 6.2

As one can see, our algorithm outperforms the other three methods, achieving the best results in comparison to both SDA and LDA self-training. On the one hand, the comparison between our algorithm and SDA indicates that it is effective to integrate AP into the self-training framework. On the other hand, the comparison between our algorithm and LDA self-training illustrates that both the use of SDA and our way of calculating the templates are effective.

5. CONCLUSION

In this paper, we introduced a semi-supervised method for face recognition that integrates SDA and AP into self-training. In each iteration of self-training, SDA is used to find a projection that maps the original face space into a low-dimensional feature space using both labeled and unlabeled images, and AP is employed to compute a template for each class in the subspace from the projected labeled images. The newly added labeled images help learn a better projection with SDA, and the procedure iterates until all data are labeled. We carried out a series of experiments to evaluate the performance of our algorithm. Experimental results show that our algorithm outperforms the other unsupervised, semi-supervised, and supervised methods on four face datasets.

If the labeling procedure wrongly assigns labels to unlabeled images during the iterations, the performance of semi-supervised learning will decrease. How to design a safe semi-supervised classifier based on our algorithm is therefore one direction for future work. Additionally, the basic assumption of our algorithm is that the unlabeled samples belong to one of the c classes; if the majority of the unlabeled images do not belong to subjects in the gallery (i.e., to one of the c classes), our algorithm is not likely to work. How to exploit the information in such unlabeled images is another direction for future work.

ACKNOWLEDGMENTS

The work is supported by the National Natural Science Foundation of China under Grant Nos. 61271328 and 61105014.

REFERENCES

1. A. F. Abate, M. Nappi, D. Riccio, and G. Sabatino, "2D and 3D face recognition: a survey," Pattern Recogn. Lett. 28, 1885–1906 (2007).
2. R. Jafri and H. R. Arabnia, "A survey of face recognition techniques," J. Inf. Process. Syst. 5, 41–68 (2009).
3. D. Cai, X. He, Y. Hu, J. Han, and T. Huang, "Learning a spatially smooth subspace for face recognition," in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2007), pp. 1–7.
4. X. He, S. Yan, Y. Hu, P. Niyogi, and H.-J. Zhang, "Face recognition using Laplacianfaces," IEEE Trans. Pattern Anal. Mach. Intell. 27, 328–340 (2005).
5. J. Lu, Y.-P. Tan, and G. Wang, "Discriminative multimanifold analysis for face recognition from a single training sample per person," IEEE Trans. Pattern Anal. Mach. Intell. 35, 39–51 (2013).
6. R. Gross, I. Matthews, and S. Baker, "Appearance-based face recognition and light-fields," IEEE Trans. Pattern Anal. Mach. Intell. 26, 449–465 (2004).
7. H. K. Ekenel and R. Stiefelhagen, "Local appearance-based face recognition using discrete cosine transform," in 13th European Signal Processing Conference (EUSIPCO, 2005).
8. H. Murase and S. K. Nayar, "Visual learning and recognition of 3D objects from appearance," Int. J. Comput. Vis. 14, 5–24 (1995).
9. Z. Lei, S. Liao, M. Pietikainen, and S. Z. Li, "Face recognition by exploring information jointly in space, scale and orientation," IEEE Trans. Image Process. 20, 247–256 (2011).
10. W. Yu, X. Teng, and C. Liu, "Face recognition using discriminant locality preserving projections," Image Vis. Comput. 24, 239–248 (2006).
11. C. Rosenberg, M. Hebert, and H. Schneiderman, "Semi-supervised self-training of object detection models," in Proceedings of the Seventh IEEE Workshops on Application of Computer Vision (IEEE, 2005), pp. 29–36.
12. A. Blum and T. Mitchell, "Combining labeled and unlabeled data with co-training," in Proceedings of the Eleventh Annual Conference on Computational Learning Theory (ACM, 1998), pp. 92–100.
13. T. Joachims, "Transductive inference for text classification using support vector machines," in Proceedings of the Sixteenth International Conference on Machine Learning (Morgan Kaufmann, 1999), pp. 200–209.
14. K. P. Nigam, "Using unlabeled data to improve text classification," Ph.D. thesis (Carnegie Mellon University, 2001).
15. X. Zhu, Z. Ghahramani, and J. Lafferty, "Semi-supervised learning using Gaussian fields and harmonic functions," in Proceedings of the 20th International Conference on Machine Learning (Morgan Kaufmann, 2003), pp. 912–919.
16. X. Zhu, "Semi-supervised learning literature survey," Tech. Rep. 1530 (Computer Sciences, University of Wisconsin-Madison, 2005).
17. F. Roli and G. Marcialis, "Semi-supervised PCA-based face recognition using self-training," in Structural, Syntactic, and Statistical Pattern Recognition, Vol. 4109 of Lecture Notes in Computer Science (Springer, 2006), pp. 560–568.
18. A. M. Martinez and A. Kak, "PCA versus LDA," IEEE Trans. Pattern Anal. Mach. Intell. 23, 228–233 (2001).
19. X. Zhao, N. W. D. Evans, and J.-L. Dugelay, "Semi-supervised face recognition with LDA self-training," in IEEE International Conference on Image Processing (IEEE, 2011), pp. 3102–3105.
20. D. Cai, X. He, and J. Han, "Semi-supervised discriminant analysis," in Proceedings of the International Conference on Computer Vision (IEEE, 2007).
21. B. J. Frey and D. Dueck, "Clustering by passing messages between data points," Science 315, 972–976 (2007).
22. Y. Fujiwara, G. Irie, and T. Kitahara, "Fast algorithm for affinity propagation," in Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence (AAAI, 2011), pp. 2238–2243.
23. F. Kschischang, B. Frey, and H.-A. Loeliger, "Factor graphs and the sum-product algorithm," IEEE Trans. Inf. Theory 47, 498–519 (2001).
24. P. Belhumeur, J. Hespanha, and D. Kriegman, "Eigenfaces vs. Fisherfaces: recognition using class specific linear projection," IEEE Trans. Pattern Anal. Mach. Intell. 19, 711–720 (1997).
25. M. Turk and A. Pentland, "Eigenfaces for recognition," J. Cogn. Neurosci. 3, 71–86 (1991).
26. ORL face dataset, http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html.
27. Yale face dataset, http://cvc.yale.edu/projects/yalefaces/yalefaces.html.
28. Extended Yale face dataset B, http://www.cad.zju.edu.cn/home/dengcai/Data/FaceData.html.
29. D. Cai, X. He, J. Han, and H.-J. Zhang, "Orthogonal Laplacianfaces for face recognition," IEEE Trans. Image Process. 15, 3608–3614 (2006).
30. D. Cai, X. He, and J. Han, "Spectral regression for efficient regularized subspace learning," in IEEE 11th International Conference on Computer Vision (IEEE, 2007), pp. 1–8.
31. CMU PIE face dataset, http://vasc.ri.cmu.edu/idb/html/face/.