Face Recognition using a color PCA framework
M. Thomas1, S. Kumar2, and C. Kambhamettu1
1 Video/Image Modeling and Synthesis Lab, University of Delaware, Newark, DE 19711
2 Bell Laboratories, Lucent Technologies, 600 Mountain Avenue, Murray Hill, NJ 07974
Abstract. This paper delves into the problem of face recognition using color as an important cue for improving recognition accuracy. To perform recognition of color images, we use the characteristics of a 3D color tensor to generate a subspace, which in turn can be used to recognize a new probe image. To test the accuracy of our methodology, we computed the recognition rate across two color face databases and also compared our results against a multi-class neural network model. We observe that the use of the color subspace improved recognition accuracy over the standard gray scale 2D-PCA approach [17] and the two-layer feed-forward neural network model with 15 hidden nodes. Additionally, due to the computational efficiency of this algorithm, the entire system can be deployed with a considerably short turnaround time between the training and testing stages.
1 Introduction
With the surge of security and surveillance activities, biometric research has become one of the important research topics in computer vision. From the results of the Face Recognition Vendor Test (FRVT 2006) [10], it is clear that current face recognition algorithms can achieve very high recognition accuracy; in some cases the tests indicate that some algorithms surpass human recognition rates. This high accuracy, however, is obtained on faces observed under very high resolution cameras, where the average separation between the eye centers is ∼350 pixels. The current state of imaging technology does not show a linear relationship between the price and quality of a camera: a slight increase in quality imposes a significant increase in camera cost. With this in mind, it is essential that the robustness of any vision algorithm be tested against images from the lower end of the quality spectrum. This is especially true for typical low-end commodity cameras (e.g. web cameras), where the presence of noise and the limited pixel resolution provide an acid test of the workings of the algorithm. The other issue which is usually not given its due importance is the computational efficiency of the vision algorithm. The requirement that an algorithm strike a balance between computational cost and estimation accuracy defines two of the important aspects of any vision algorithm.
The above requirements are especially pertinent to the problem of face recognition using web cameras, where the typical image quality is low while the efficiency requirements are high. To satisfy this need, we propose a face recognition algorithm in which we use color to improve the recognition accuracy over the 2D-PCA approach [17]. The organization of the paper is as follows. In the next section, we describe some of the important works that defined our study. This is followed by a description of the algorithm that we implemented and the results obtained on the face databases. Finally, we conclude our work with possible future directions.
2 Background Studies
Visual development in infants rapidly moves from poor fixation ability and a limited ability to discriminate color to mature visual acuity within 3 years. It is of interest to note that color is one of the primary recognition cues for an infant well before it begins to observe shapes and structures (www.ski.org/Vision/infanttimeline.html). We believe it is essential to utilize this very important stimulus when attempting to identify and recognize faces. Typical face recognition algorithms are composed of two sequential stages, a detection step followed by a recognition step [19]. In this work, we assume that the faces have already been located, either as a cropped face database or as the output of an external face detection algorithm. For the face detection section of our work, we have used the face detection algorithm proposed by Lienhart and Maydt [9], which identifies faces using AdaBoost over Haar-like features. Readers are directed to the works by Lienhart and Maydt [9] and Barreto et al. [1] for a detailed overview of this algorithm. Research in face recognition has burgeoned over the past few years and it would be impossible to list all the currently ongoing works. Readers can get a better picture of the current state of the art in face recognition algorithms at http://www.face-rec.org/, and the graphs at http://www.face-rec.org/interesting-papers/ show the rate at which research in face recognition is currently progressing. Therefore, in view of the current research question, we would like to review some significant previous works that were influential to our research. One of the earliest contributions to the field of recognition via subspace decomposition was by Sirovich and Kirby [13] and Turk and Pentland [15]. The principal component analysis (PCA) of the face space, the "Eigenfaces" approach, was seminal.
It has since then given rise to many variants and reformulations that attempt to tackle the difficulties present in its original formulation. A different subspace decomposition strategy was proposed by Belhumeur et al. [2] based on Fisher's linear discriminant analysis (LDA). This was shown to be superior to the original PCA formulation but came with an increased computational requirement. The "Eigenfaces" and "Fisherfaces" approaches can be considered the two main subspace decomposition works that have been improved upon by other researchers. Comparative studies by Ruiz-del-Solar and Navarrete [4], Delac et al. [5] and Shakhnarovich and Moghaddam [11], to name but a few, can direct interested readers to the available subspace techniques. The use of 3D principal component analysis was first proposed by Tucker [14], who used it to analyze a 3-way psychometric dataset. A multi-linear generalization of the 3-way data analysis was subsequently provided by Lathauwer et al. [8], which remains the primary analysis of the mathematical aspects of the Higher Order Singular Value Decomposition (HOSVD). Higher order SVD has been applied in different contexts. Costantini et al. [3] recently developed a dynamic texture synthesis algorithm by applying linear updates to the HOSVD of the spatio-temporal tensor. In the field of face recognition, the earliest application of HOSVD was by Vasilescu and Terzopoulos [16], who described a "tensorfaces" framework to handle aspects such as scene structure, illumination and viewpoint. Recently, Yu and Bennamoun [18] used these generalizations to perform face recognition on 3D face data. However, according to these authors, the biggest problem in performing HOSVD lay in its computational complexity.
3 Tensor Concepts
For the sake of completeness, we shall briefly describe the concept of a tensor from multi-linear algebra. A tensor is a multi-linear mapping over a set of vector spaces, D ∈ R^(I1×I2×···×IN), and can be seen as a generalization of vectors (1st order tensors) and matrices (2nd order tensors). Given a tensor D, we can unfold it along any of its n ≤ N dimensions to obtain a mode-n matrix version of the tensor. Figure 1 shows the 3 possible unfoldings of a 3rd order tensor along each of its highlighted dimensions; general mode-n unfolding can be visualized along similar lines. The mode-n product of a tensor D ∈ R^(I1×···×In×···×IN) with a matrix U ∈ R^(Jn×In) is computed by unfolding D along In, performing the matrix multiplication and then folding the result back into tensorial form (D ×n U). Having thus defined mode-n multiplication, the HOSVD of D can be written as

D = S ×1 U(1) ×2 · · · ×N U(N),        (1)

where U(n) is a unitary matrix of size In × In and S is an all-orthogonal core tensor, analogous to the eigenvectors and eigenvalues in the SVD formulation, respectively. The unitary matrices U(n) can be obtained by unfolding D into its mode-n matrix and performing a regular SVD on the unfolded matrix.
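To make the notation concrete, the mode-n unfolding, the mode-n product and the HOSVD of equation (1) can be sketched in NumPy as follows. This is a minimal sketch; `unfold`, `fold`, `mode_n_product` and `hosvd` are our own illustrative names, not library routines:

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: bring axis `mode` to the front and flatten the
    remaining axes into columns, giving an I_n x (prod of other dims) matrix."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def fold(matrix, mode, shape):
    """Inverse of `unfold` for a tensor of the given full shape."""
    full = [shape[mode]] + [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(matrix.reshape(full), 0, mode)

def mode_n_product(tensor, matrix, mode):
    """D x_n U: unfold D along the mode, multiply by U, fold back."""
    shape = list(tensor.shape)
    shape[mode] = matrix.shape[0]
    return fold(matrix @ unfold(tensor, mode), mode, tuple(shape))

def hosvd(tensor):
    """HOSVD: U^(n) comes from the SVD of the mode-n unfolding; the core is
    S = D x_1 U1^T x_2 ... x_N UN^T (cf. equation (1))."""
    factors = []
    core = tensor
    for mode in range(tensor.ndim):
        U, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(U)
        core = mode_n_product(core, U.T, mode)
    return core, factors
```

Applying `mode_n_product` with each factor in turn reconstructs the tensor exactly, mirroring equation (1).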
4 Algorithm
Yang et al. [17] showed the superiority of the 2D-PCA over the 1D-PCA (Eigenfaces) approach, in terms of accuracy and efficiency. One of the possible reasons for the improvement in accuracy might be due to the pixel coherence that exists within any finite neighborhood when images are observed as 2D matrices.
Fig. 1. Mode-1, mode-2 and mode-3 unfolding of a 3rd order tensor (| · | denotes the size of the dimension).
Typically, skin pixels occur in close proximity to other skin pixels, except of course at boundaries. In the 1D-PCA approach, this coherence is disturbed by the vectorization of the image pixels, while 2D-PCA maintains the neighborhood coherence, thereby achieving higher recognition accuracy.
Fig. 2. Horizontal and vertical mode-3 unfolding of the 3D color tensor.
We can observe that the mode-3 unfolding of the 3D color tensor best maintains the coherence between local pixel neighborhoods, and thus we use this unfolding in our experiments. Another observation is that the mode-3 unfolding can be performed along two directions, horizontal and vertical. The two types of mode-3 unfolding are shown in figure 2, where h is the image height, w is the image width and c is the number of channels in the image (in our case c = 3). Given M training images, we compute the vertical and horizontal mode-3 unfolding for each image. The image covariance (scatter) matrix Φ over the M training samples can then be computed as

Φ = (1/M) Σ_{i=1..M} (Γi − Ψ)^T (Γi − Ψ),    Ψ = (1/M) Σ_{i=1..M} Γi,        (2)
where Γi is the ith training image. In the case of vertical unfolding, Γi, Ψ ∈ R^(hc×w) and Φ ∈ R^(w×w), while for horizontal unfolding, Γi, Ψ ∈ R^(h×wc) and Φ ∈ R^(wc×wc). The optimal projection axes that maximize the variance of the M images correspond to the top d eigenvector–eigenvalue pairs of Φ [17]. These orthonormal eigenvectors span the required 2D color subspace. Once the color subspace is computed, each training image can be projected into the subspace and its projection vector stored in the face database. In the recognition stage, an input test sample is unfolded and projected into this subspace. The minimum Euclidean distance between the projected test vector and all stored projections is computed, and the class of the closest projection is assigned to the test image.
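A rough sketch of the training and recognition stages described above, using the vertical mode-3 unfolding, the scatter matrix of equation (2), the top-d eigenvectors and a nearest-neighbor match, is given below. It assumes images arrive as h × w × c NumPy arrays; the function names are ours, not part of any library:

```python
import numpy as np

def unfold_vertical(img):
    """Vertical mode-3 unfolding of an h x w x c color image into an
    (h*c) x w matrix: the color channels are stacked along the height."""
    h, w, c = img.shape
    return img.transpose(2, 0, 1).reshape(h * c, w)

def train(images, d):
    """Compute the w x w scatter matrix of equation (2), keep the top-d
    eigenvectors, and return projections of all training images."""
    G = np.stack([unfold_vertical(im) for im in images])  # M x hc x w
    mean = G.mean(axis=0)                                 # Psi
    X = G - mean
    # Phi = (1/M) sum_i (Gi - Psi)^T (Gi - Psi), a w x w matrix
    phi = np.einsum('mij,mik->jk', X, X) / len(images)
    vals, vecs = np.linalg.eigh(phi)                      # ascending order
    V = vecs[:, ::-1][:, :d]                              # top-d projection axes
    feats = G @ V                                         # M x hc x d
    return V, feats.reshape(len(images), -1)

def recognize(probe, V, feats, labels):
    """Project a probe into the subspace and return the label of the
    nearest stored projection (minimum Euclidean distance)."""
    f = (unfold_vertical(probe) @ V).ravel()
    return labels[np.argmin(np.linalg.norm(feats - f, axis=1))]
```

For horizontal unfolding, the image would instead be reshaped into an h × wc matrix, making Φ a wc × wc matrix.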
5 Results and Analysis
For quantitative comparisons, we have performed our analysis on two face databases. The first was the Georgia Tech (GATech) face database (www.anefian.com/face_reco.htm) and the second was a pruned version of the California Institute of Technology (CalTech) face database (www.vision.caltech.edu/html-files/archive.html).
Fig. 3. Sample images from the two face databases (a) GATech (b) CalTech.
The GATech images (figure 3(a)) were cropped low resolution (∼200 × ∼150 pixels) color images of 50 individuals with 15 views per individual, with no specific order in their viewing direction. The pruned CalTech database (figure 3(b)) was composed of 19 individuals with 20 views per individual. The original database contained 450 face images of 27 individuals, but we pruned out individuals who had fewer than 20 views to maintain uniformity within the database. Unlike the GATech database, these images were compressed JPEG images captured at a higher resolution (∼450 × ∼300 pixels), which might explain the differences in recognition accuracy.

5.1 Error Analysis against Face Databases
For comparing our algorithm against the face databases, we computed the recognition rate in a K-fold cross validation scheme [7], where K varied from 1 to one less than the total number of views (leave-one-out). Thus, for each K, we randomly selected K views of each individual for training and used the remaining N − K images (N = 15 for GATech and N = 20 for CalTech) for testing. We averaged the recognition accuracy for each value of K to obtain an average rate of identification. The results below are the output from the best K-fold cross validation scheme. To compare the various possible measurements, we conducted the following experiments:

Unfolding Direction In this experiment, we tested the unfolding direction and its contribution to the recognition rate. From the test cases, we observed that unfolding the 3D color tensor along the vertical direction was better than unfolding along the horizontal direction for most color spaces. Figure 4 shows the results for the RGB and HSV color spaces, where we can see that the vertical unfolding provided a better recognition rate than the horizontally unfolded version in both color spaces.
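The evaluation protocol above, selecting K random views per individual for training and testing on the remaining N − K views, can be sketched as follows; `classify` is a placeholder for any of the recognizers compared in this section, and the helper names are ours:

```python
import numpy as np

def k_view_split(n_views, k, rng):
    """Randomly choose k training view indices per individual; the
    remaining n_views - k views are held out for testing."""
    perm = rng.permutation(n_views)
    return perm[:k], perm[k:]

def cross_validate(views_per_id, n_views, k, classify, rng=None):
    """views_per_id maps an identity to its list of n_views images.
    classify(train_set, probe) -> predicted identity. Returns the
    fraction of held-out probes recognized correctly."""
    rng = rng if rng is not None else np.random.default_rng(0)
    train_idx, test_idx = k_view_split(n_views, k, rng)
    train_set = {pid: [v[i] for i in train_idx]
                 for pid, v in views_per_id.items()}
    correct = total = 0
    for pid, views in views_per_id.items():
        for i in test_idx:
            correct += (classify(train_set, views[i]) == pid)
            total += 1
    return correct / total
```

In the experiments this accuracy would additionally be averaged over repeated random splits for each value of K.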
Fig. 4. Variations in the direction of unfolding of the 3D color tensor.
1D/2D variations Here, we observed the improvement of the color space 2D-PCA approach over the traditional 1D-PCA and grayscale 2D-PCA in face recognition. For comparative purposes, we experimented with 1D grayscale (the original eigenface approach [15]), 1D RGB (the original eigenface approach using color components), 2D grayscale (2D-PCA based face recognition [17]) and 2D RGB (color space based PCA). For the 2D RGB case, we used the horizontal unfolding of the 3D color tensor to compute the subspace. Figure 5 shows the recognition rates with the use of 1D/2D data. From the figure it can be observed that the 2D oriented approaches are better than 1D-PCA, despite the use of color components in the 1D analysis. Between the gray scale and color based 2D approaches, using the color space improved the recognition rate by around 2–3% on both databases.
Fig. 5. Variations in using 1D vectors or their 2D counterparts (comparison between gray scale and RGB color).
Color space variations To understand the sensitivity of the algorithm to the choice of color space, we analyzed the recognition rate with 4 color spaces: RGB, HSV, YUV and YIQ [6]. From the previous two experiments, we observed that the vertical unfolding of the 2D face images was better than the alternatives, so here we only show the results across the 4 color spaces with vertical mode-3 unfolding. The conspicuous finding, immediately observable from figure 6, is that any color space performs better than the grayscale 2D-PCA approach. In fact, the performance of 2D HSV is significantly better for the GATech database (an improvement from 86% to 94%). Unfortunately, we could not reach any conclusive evidence of the superiority of one color space over another, since the results from the two databases favor different color spaces.
Fig. 6. Variations in using different color spaces for analysis (with vertical unfolding).
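For reference, the linear color-space conversions used in this experiment can be sketched as below. The RGB→YUV and RGB→YIQ matrices are the commonly used ITU-R BT.601-derived coefficients, which we assume match those of Ford and Roberts [6]; HSV, being a non-linear transform, would instead use e.g. Python's `colorsys.rgb_to_hsv`:

```python
import numpy as np

# Standard linear RGB -> YUV and RGB -> YIQ transforms (BT.601-derived
# coefficients; assumed here, since the paper only cites [6] for them).
RGB2YUV = np.array([[ 0.299,  0.587,  0.114],
                    [-0.147, -0.289,  0.436],
                    [ 0.615, -0.515, -0.100]])
RGB2YIQ = np.array([[ 0.299,  0.587,  0.114],
                    [ 0.596, -0.274, -0.322],
                    [ 0.211, -0.523,  0.312]])

def convert(img_rgb, matrix):
    """Apply a 3x3 linear color-space transform to an h x w x 3 image,
    treating each pixel as an RGB column vector."""
    return img_rgb @ matrix.T
```

The converted h × w × 3 tensor then feeds into the same unfolding and 2D-PCA pipeline as the RGB case.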
5.2 Recognition performance
Among the many existing techniques, neural network models have been used in the past to perform reasonably accurate identification. Typically, facial features like the eyes, nose and mouth are extracted, vectorized and used in the training process. For our experiments, we used a holistic face recognition neural network model instead of a feature oriented approach, so as to provide a uniform basis for comparison against the subspace techniques we implemented. The two-layer feed-forward neural network model that we used was adapted from the ANN toolbox developed by Sigurdsson et al. [12]. The weights in the network were computed using the BFGS optimization algorithm with a soft line search to determine step lengths. Readers are directed to the code (http://isp.imm.dtu.dk/toolbox/ann/) and the references therein for additional information. To keep the computational effort reasonable, we used 15 individuals from the GATech database and generated the classification error rates with two neural network models, one using 15 hidden layer nodes and the other using 7 hidden layer nodes. To reduce the dimensions of the face space, we first converted the color images into grayscale and then projected the 1D grayscale face images into their eigenspace so as to retain ≤75% of their cumulative variance. Using leave-one-out cross validation [7], we repeated our measurements, training a new network for every simulation step and using the trained network to perform the recognition. Table 1 and the adjacent figure show the recognition accuracy and computational requirements of our PCA oriented approach and the neural network model. Increasing the number of hidden nodes improved the recognition rate but required a higher computational cost. Table 1(a) shows the time in seconds required by the various algorithms compared.
It is important to observe that the training time for a PCA oriented algorithm depends on the number of eigenvalues used to represent the eigenspace, as depicted in table 1(a):

Training time in seconds
λi   1D-Gr   2D-Gr   2D-HSV
 1   0.705   0.253   0.800
 2   0.716   0.256   0.805
 3   0.719   0.257   0.809
 4   0.719   0.259   0.815
 5   0.723   0.271   0.821
 6   0.721   0.265   0.826
 7   0.727   0.278   0.832
 8   0.728   0.267   0.838
 9   0.729   0.272   0.842
10   0.730   0.273   0.848
NN-7: 97.18    NN-15: 357.4

Table 1. Performance comparison against the multi-class neural network classifier: (a) time required to perform training; (b) recognition rate for the corresponding technique (GATech database, 15 individuals with 15 views per individual).

The neural network models produced improved recognition rates, but at the cost of computational efficiency. In contrast, our approach showed better recognition accuracy at a significantly lower computational cost.
6 Conclusions
In this paper, we have shown that the utilization of color cues improves the accuracy of a face recognition algorithm. Instead of computing a full HOSVD, we compute a partial mode-n unfolding of the original tensor and apply 2D-PCA to the unfolded color tensor. From the quantitative experiments, we have observed that a color space oriented approach can improve recognition accuracy when compared to the grayscale 2D-PCA approach. Differences between the various color spaces have not been conclusively established, and further research is needed to see which color space performs best. Another important observation is that the vertical unfolding of the color tensor improves the recognition rate over the horizontal unfolding. This occurs independently of the color space used, and we are currently trying to understand the reason for this improvement. With the availability of cameras that provide color information by default, color based recognition provides a means to improve recognition accuracy by utilizing all the available information.
References

1. J. Barreto, P. Menezes, and J. Dias. Human-robot interaction based on haar-like features and eigenfaces. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA 2004), volume 2, pages 1888–1893, 2004.
2. P. N. Belhumeur, J. Hespanha, and D. J. Kriegman. Eigenfaces vs. fisherfaces: Recognition using class specific linear projection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):711–720, 1997.
3. R. Costantini, L. Sbaiz, and S. Süsstrunk. Higher order SVD analysis for dynamic texture synthesis. IEEE Transactions on Image Processing, 2007.
4. J. Ruiz-del-Solar and P. Navarrete. Eigenspace-based face recognition: a comparative study of different approaches. IEEE Transactions on Systems, Man and Cybernetics - Part C: Applications and Reviews, 35(3):315–325, 2005.
5. K. Delac, M. Grgic, and S. Grgic. Independent comparative study of PCA, ICA, and LDA on the FERET data set. IJIST, 15(5):252–260, 2005.
6. A. Ford and A. Roberts. Colour space conversions. August 1998.
7. R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. In IJCAI, pages 1137–1145, 1995.
8. L. D. Lathauwer, B. D. Moor, and J. Vandewalle. A multilinear singular value decomposition. SIAM Journal on Matrix Analysis and Applications, 21(4):1253–1278, 2000.
9. R. Lienhart and J. Maydt. An extended set of haar-like features for rapid object detection. In Proceedings of the International Conference on Image Processing, pages 900–903, September 2002.
10. P. J. Phillips, W. T. Scruggs, A. J. O'Toole, P. J. Flynn, K. W. Bowyer, C. L. Schott, and M. Sharpe. FRVT 2006 and ICE 2006 large-scale results. Technical Report NISTIR 7408, National Institute of Standards and Technology, March 2007.
11. G. Shakhnarovich and B. Moghaddam. Handbook of Face Recognition, chapter Face Recognition in Subspaces. Springer-Verlag, 2004.
12. S. Sigurdsson, J. Larsen, L. Hansen, P. Philipsen, and H. Wulf. Outlier estimation and detection: application to skin lesion classification. In Proceedings of the Int. Conf. on Acoustics, Speech and Signal Processing, pages 1049–1052, 2002.
13. L. Sirovich and M. Kirby. Low-dimensional procedure for the characterization of human faces. Journal of the Optical Society of America A, 4(3):519–524, 1987.
14. L. R. Tucker. Some mathematical notes on three-mode factor analysis. Psychometrika, 31:279–311, 1966.
15. M. A. Turk and A. P. Pentland. Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3(1):71–86, 1991.
16. M. A. O. Vasilescu and D. Terzopoulos. Multilinear image analysis for facial recognition. In Proceedings of the International Conference on Pattern Recognition (ICPR 2002), volume 2, pages 511–514, 2002.
17. J. Yang, D. Zhang, A. F. Frangi, and J. Yang. Two-dimensional PCA: A new approach to appearance-based face representation and recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(1):131–137, 2004.
18. H. Yu and M. Bennamoun. 1D-PCA, 2D-PCA to nD-PCA. In Proceedings of the International Conference on Pattern Recognition (ICPR 2006), volume IV, pages 181–184, 2006.
19. W. Zhao, R. Chellappa, P. J. Phillips, and A. Rosenfeld. Face recognition: A literature survey. ACM Computing Surveys, 35(4):399–458, 2003.