ARTICLE International Journal of Advanced Robotic Systems
An improved Robust Sparse Coding for Face Recognition with Disguise
Regular Paper
Dexing Zhong1,2,*, Peihong Zhu1, Jiuqiang Han1 and Shengbin Li2
1 Ministry of Education Key Lab for Intelligent Networks and Network Security, Xi'an Jiaotong University, Xi'an, China
2 State Key Laboratory of the Health Ministry for Forensic Sciences, Xi'an Jiaotong University, Xi'an, China
* Corresponding author E-mail:
[email protected]
Received 1 Jun 2012; Accepted 25 Jul 2012 DOI: 10.5772/51861 © 2012 Zhong et al.; licensee InTech. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract Robust vision-based face recognition is one of the most challenging tasks for robots. Recently, sparse representation-based classification (SRC) has been proposed to solve this problem. All training samples without disguise are used to compose an over-complete dictionary, and a testing sample with disguise is represented over the dictionary by sparse coding coefficients plus an error. The coding residuals between the sample and each class of training samples are measured, and the class with the minimum residual is taken as the identified class to which the sample belongs. Robust sparse coding (RSC) seeks the maximum likelihood estimation (MLE) solution of the sparse coding problem, so it is more robust to disguise. However, the iterative algorithm used to solve RSC is highly time-consuming. In this paper, we propose an improved robust sparse coding (iRSC) algorithm for practical application conditions. During the iterations, the dictionary is reduced by eliminating the objects with larger coding residuals, while the over-complete property of the dictionary is not affected. Experiments on the AR face database demonstrate that the coding is sparser and the efficiency is higher in iRSC.

Keywords Face Recognition, Sparse Representation, Robust Sparse Coding, Disguise
1. Introduction

In the last few decades, face recognition has attracted more and more attention in the fields of computer vision and pattern recognition [1-3]. As one of the most successful applications of biometrics, face recognition can be applied in social robotics to fulfill the person identification task in a natural and non-contact way. In practice, face patterns are subject to changes in illumination, pose, facial expression, etc. Among them, face recognition with real disguise is a very important and hard problem. Therefore, robust vision-based face recognition has been extensively studied by researchers from the areas of computer vision, robotics, artificial intelligence, etc.

Generally, a face image is stretched into a high-dimensional face vector; feature extraction and dimensionality reduction algorithms are then applied in the face space, so that the high-dimensional face vector is transformed into a low-dimensional subspace in which the classification and identification tasks can be carried out. Two classical linear face recognition methods are principal component analysis (PCA) [4] and linear discriminant analysis (LDA) [5]. PCA is widely used to reduce the dimensionality of the original face images, and the
extracted Eigenface features are used as inputs for other methods. LDA is a supervised subspace learning method, which seeks the optimal projection directions that maximize the between-class scatter and minimize the within-class scatter at the same time. The typical nonlinear methods are kernel methods based on the linear ones, which apply a kernel transformation to enhance the classification ability; for example, see [6, 7]. The other nonlinear methods are manifold learning algorithms, e.g., locally linear embedding (LLE) [8] and locality preserving projection (LPP) [9], which assume that the distribution of face image data lies close to manifolds embedded in the high-dimensional space. In 2007, graph embedding (GE) [10] was proposed as a general framework to unify a series of dimensionality reduction algorithms for face recognition. Each algorithm can be considered as a certain kind of graph embedding, in which a specific graph is designed to describe a certain statistical or geometric property of the data set. Following GE, marginal Fisher analysis (MFA) [10] and neighborhood discriminant embedding (NDE) [11] were proposed. These algorithms can better reveal the representative and discriminative features from the underlying manifold structures of face images.

Recently, sparse representation has been introduced from compressive sensing theory into the field of pattern recognition; sparse representation-based classification (SRC) [12] is a landmark algorithm for robust face recognition, which can deal with face occlusion, corruption and real disguise. The basic idea of SRC is to represent the query face image using a small number of atoms parsimoniously chosen out of an over-complete dictionary consisting of all training samples. The sparsity constraint on the coding coefficients is employed to ensure that only a few samples from the same class as the query face have distinct nonzero values, whereas the coefficients of the other samples are equal or close to zero. The sparsity of the coding coefficients can be directly measured by the l0-norm, which counts the number of nonzero entries in a vector. However, l0-norm minimization is an NP-hard problem; therefore, l1-norm minimization is widely employed instead. It has been demonstrated that l0-norm and l1-norm minimizations are equivalent if the solution is sufficiently sparse [13].

The representation fidelity of SRC is measured by the l2-norm of the coding residual, which actually assumes that the coding residual follows a Gaussian distribution. This may not effectively describe the real model of the coding residual in practical face recognition, especially when dealing with real face disguise, for example a face with sunglasses or a scarf, see Figure 1. The robust sparse coding (RSC) [14, 15] seeks the MLE (maximum
likelihood estimation) solution of the sparse coding problem, so that the distribution of the coding residual is modeled more accurately than by a Gaussian or Laplacian, and it is more robust to disguise than SRC.
Figure 1. Five objects from the AR database; the first row shows five samples without disguise, the second row shows five samples with sunglasses, and the third row shows five samples with scarves.
In RSC, the iteratively reweighted regularized robust coding (IR3C) [15] algorithm is proposed to solve the MLE of the coding problem. Usually more than 10 iterations are needed before IR3C obtains a convergent result. To improve the efficiency of the algorithm and increase the robustness of RSC when dealing with real face disguise, in this paper we propose an improved robust sparse coding (iRSC) algorithm. In each iteration, the dictionary, which consists of all training samples, is reduced by eliminating the objects with larger coding residuals. The reduced dictionary is used to obtain the convergent MLE solution of the sparse coding problem. By eliminating the interference of the objects with larger coding residuals, iRSC converges quickly and is more efficient. Our experiments on the AR face database [16] show that iRSC achieves better performance than SRC and RSC when dealing with real face disguise.

The rest of this paper is organized as follows: Section 2 reviews the algorithms of SRC and RSC, Section 3 presents our proposed iRSC, Section 4 reports the experiments, and Section 5 concludes the paper.

2. Reviews of SRC and RSC

In this section, we review two sparse representation-based face recognition algorithms. Given a face image sample from a certain face database, it is stored as an M×N color or gray-level image. The face image is stretched into a d-dimensional face vector x (d = M × N). Then face
recognition algorithms can be directly applied in the d-dimensional face space.

2.1 Sparse Representation-based Classification (SRC)

In SRC, the over-complete dictionary consists of all training samples, i.e., $D = [D_1, D_2, \ldots, D_k] \in \mathbb{R}^{d \times n}$, where the $i$th object class is $D_i = [x_{i,1}, x_{i,2}, \ldots, x_{i,n_i}]$, $n = n_1 + n_2 + \ldots + n_k$, and $k$ is the number of classes. The query sample $y \in \mathbb{R}^{d}$ can be represented over the dictionary, i.e., $y = D\alpha$, where $\alpha$ is the coding vector of $y$ over $D$. Therefore, the sparse coding problem can be formulated as:

$$\min_{\alpha} \|\alpha\|_{0} \quad \text{s.t.} \quad y = D\alpha, \qquad (1)$$
where $\|\cdot\|_{0}$ is the $l_0$-norm. Since finding the sparsest solution of Eq. (1) is an NP-hard problem, SRC uses the equivalent $l_1$-norm minimization instead, under the condition that the solution is sufficiently sparse [13]. The sparse coefficients can be obtained by the following regularization:

$$\hat{\alpha} = \arg\min_{\alpha} \{\|y - D\alpha\|_{2}^{2} + \lambda\|\alpha\|_{1}\}. \qquad (2)$$

When SRC deals with face occlusion and corruption, it introduces an identity matrix $I$ as a dictionary to code the outlier pixels. Therefore, Eq. (2) can be extended as follows:

$$[\hat{\alpha}; \hat{\beta}] = \arg\min_{\alpha,\beta} \{\|y - [D, I][\alpha; \beta]\|_{2}^{2} + \lambda\|[\alpha; \beta]\|_{1}\}. \qquad (3)$$

According to [17], Eq. (3) is equivalent to the Lagrangian formulation:

$$\hat{\alpha} = \arg\min_{\alpha} \{\|y - D\alpha\|_{1} + \lambda\|\alpha\|_{1}\}. \qquad (4)$$

Here SRC uses the $l_1$-norm to model the coding residual $y - D\alpha$, so that it gains a certain robustness to outliers. The classification criterion of SRC is to find which class of training samples best represents the query sample:

$$\mathrm{identity}(y) = \arg\min_{i} \|y - D_i\hat{\alpha}_i\|_{2} \quad \text{or} \quad \arg\min_{i} \|y - I\hat{\beta} - D_i\hat{\alpha}_i\|_{2}, \qquad (5)$$

where $\hat{\alpha}_i$ is the sub-coding vector associated with the $i$th class of training samples.
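As an illustration of how the SRC rule in Eqs. (2) and (5) could be implemented, the following minimal sketch codes a query over a given dictionary and picks the class with the smallest residual. It is not the authors' code: the helper name src_identify and the label vector are assumptions, and scikit-learn's Lasso is used as a convenient stand-in for the l1_ls solver mentioned in Section 4 (its objective is a rescaled version of Eq. (2)).

```python
# Minimal sketch of the SRC decision rule, Eqs. (2) and (5). Assumes D is a
# d x n matrix whose columns are unit l2-norm training samples, labels is an
# array of length n giving the class of each column, and y is the query.
import numpy as np
from sklearn.linear_model import Lasso

def src_identify(D, labels, y, lam=0.001):
    # Approximate Eq. (2): alpha_hat = argmin ||y - D a||_2^2 + lam ||a||_1.
    # Lasso minimizes (1/(2*d))||y - D a||_2^2 + alpha ||a||_1, hence the rescaling.
    solver = Lasso(alpha=lam / (2 * len(y)), fit_intercept=False, max_iter=10000)
    solver.fit(D, y)
    alpha = solver.coef_

    # Eq. (5): keep only the coefficients of class i and measure the residual.
    residuals = {}
    for c in np.unique(labels):
        alpha_c = np.where(labels == c, alpha, 0.0)
        residuals[c] = np.linalg.norm(y - D @ alpha_c)
    return min(residuals, key=residuals.get), residuals
```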
2.2 Robust Sparse Coding (RSC)

In SRC, the sparse representation fidelity is actually measured by the $l_2$-norm or the $l_1$-norm of the coding residual, i.e., $\|y - D\alpha\|_{2}^{2}$ in Eq. (2) or $\|y - D\alpha\|_{1}$ in Eq. (4). The sparse coding model thus assumes that the coding residual follows a Gaussian or Laplacian distribution, respectively. In practical situations, neither distribution may be effective enough to describe the coding residual, especially when dealing with real face disguise. Therefore, RSC assumes a more suitable distribution for the coding residual $e = y - D\alpha$, as follows:

$$\rho_{\theta}(e) = -\frac{1}{2\mu}\left(\ln\left(1 + \exp(\mu\delta - \mu e^{2})\right) - \ln\left(1 + \exp(\mu\delta)\right)\right), \qquad (6)$$

where $\rho_{\theta}(e) = -\ln f_{\theta}(e)$, $f_{\theta}(e)$ is the probability density function (PDF) of $e$, $\theta$ denotes the unknown parameter set that characterizes the distribution, $\mu$ and $\delta$ are positive scalars, $\mu$ controls the decreasing rate from 1 to 0, and $\delta$ controls the location of the demarcation point. The RSC model is as follows:

$$\hat{\alpha} = \arg\min_{\alpha} \{\|W^{1/2}(y - D\alpha)\|_{2}^{2} + \lambda\|\alpha\|_{1}\}, \qquad (7)$$

where $W$ is the estimated diagonal weight matrix, and its diagonal elements are as follows:

$$W_{i,i} = \omega_{\theta}(e_i) = \frac{1}{1 + \exp(\mu e_i^{2} - \mu\delta)}. \qquad (8)$$
The element $W_{i,i}$ is the weight assigned to pixel $i$ of the query image $y$; the outlier pixels (e.g., occluded or corrupted pixels) receive small weights, which reduce their effect on the sparse coding since they have large coding residuals. The same classification strategy as in SRC is used to classify the query face image:

$$\mathrm{identity}(y) = \arg\min_{i} \|W_{final}^{1/2}(y - D_i\hat{\alpha}_i)\|_{2}, \qquad (9)$$

where $W_{final}$ is the convergent final weight matrix. The IR3C algorithm can solve the RSC model; however, it usually needs more than 10 steps to converge, and each step has a high computational complexity. In order to reduce the computational cost and enhance the robustness, we propose the improved robust sparse coding (iRSC) algorithm presented in the next section.
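Before moving on to iRSC, a small sketch of the weight function in Eq. (8) may help to see why RSC down-weights disguised pixels. This is only an illustration of the stated model; the parameter values below are placeholders, not the tuned values of [15].

```python
import numpy as np

def rsc_weights(e, mu=8.0, delta=0.5):
    # Eq. (8): w_i = 1 / (1 + exp(mu * e_i^2 - mu * delta)).
    # Pixels with small residuals get weights close to 1, while outlier
    # pixels (large |e_i|, e.g., sunglasses or scarf regions) get weights
    # close to 0, so they barely influence the weighted coding in Eq. (7).
    return 1.0 / (1.0 + np.exp(mu * e**2 - mu * delta))

# Example: a clean pixel versus an occluded pixel.
print(rsc_weights(np.array([0.05, 0.9])))  # roughly [0.98, 0.08]
```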
3. The improved Robust Sparse Coding (iRSC)

In both SRC and RSC, all training samples are involved in composing the over-complete dictionary, and each query sample is represented as a sparse linear combination over the dictionary. The sparsity constraint on the coding coefficients and the iterative solving algorithm make the computational cost of RSC very high. Since the dictionary is over-complete for sparse representation (for example, the AR database has 100 classes, and practical systems may have even more), the dictionary can be reduced during the iterative steps used to calculate the weight matrix $W$. The objects with larger coding residuals contribute little to representing the query sample, and the coding coefficients associated with those objects are usually equal or close to zero under the sparsity constraint. Therefore, those objects can be omitted from the dictionary without losing its over-complete property.

The iRSC algorithm is proposed according to this principle. At the beginning, more irrelevant objects can be omitted to reduce the total computing cost. When the over-complete dictionary becomes small enough, all remaining objects should be retained to preserve the conditions for a good sparse representation. Thus, we define a retention factor $R_t$ of the dictionary at step $t$ as follows:

$$R_t = \begin{cases} 0.1t + 0.5, & t \le 5 \\ 1, & t > 5 \end{cases} \qquad (10)$$

After step $t$, only $R_t \times 100\%$ of the dictionary classes, those with the smallest coding residuals, are retained for the next step. Alternatively, the retention factor $R$ can be set to a fixed ratio or to the median ratio. Fig. 2(a) shows the size reduction curve of the over-complete dictionary on the AR database. Although the size of the dictionary is reduced gradually, the convergence of iRSC is almost the same as that of RSC, see Fig. 2(b), which means that the over-complete property of the dictionary is almost unaffected.

The improved Robust Sparse Coding (iRSC)
Input: Normalized query image $y$ with unit $l_2$-norm; dictionary $D$ (each column of $D$ has unit $l_2$-norm); $D^{(1)} = D$, $\alpha^{(1)} = [\frac{1}{n}, \frac{1}{n}, \ldots, \frac{1}{n}]^{T}$.
1. Start from $t = 1$.
2. Calculate the residual $e^{(t)} = y - D^{(t)}\alpha^{(t)}$ and estimate the weights $W_{i,i}^{(t)} = \omega_{\theta}(e_i^{(t)}) = \frac{1}{1 + \exp(\mu (e_i^{(t)})^{2} - \mu\delta)}$.
3. Solve the $l_1$-minimization problem: $\hat{\alpha} = \arg\min_{\alpha} \{\|(W^{(t)})^{1/2}(y - D^{(t)}\alpha)\|_{2}^{2} + \lambda\|\alpha\|_{1}\}$.
4. Calculate the residual between $y$ and $\hat{y}_i$ represented by only the $i$th class: $r_i(y) = \|(W^{(t)})^{1/2}(y - D_i^{(t)}\hat{\alpha}_i)\|_{2}$, where $D_i^{(t)}$ is the sub-dictionary associated with the $i$th class and $\hat{\alpha}_i$ is the sub-coding vector associated with the $i$th class.
5. Retain the $R_t \times 100\%$ classes of the dictionary $D^{(t)}$ with the smaller residuals $r_i(y)$: $D^{(t+1)} = R_t(D^{(t)})$. Update the sparse coding coefficients: $\alpha^{(t+1)} = R_t(\hat{\alpha})$, which is a new vector whose entries are the entries of $\hat{\alpha}$ associated with the retained classes.
6. Let $t = t + 1$. Go back to step 2 until the condition of convergence is met, or the maximal number of iterations is reached.
7. Output: $\mathrm{identity}(y) = \arg\min_{i} r_i(y)$.

Table 1. Algorithm of the improved robust sparse coding
The algorithm of iRSC is presented in Table 1. In step 6, the condition of convergence is as follows:

$$\frac{\|W^{(t+1)} - W^{(t)}\|_{2}}{\|W^{(t)}\|_{2}} < \varepsilon_W, \qquad (11)$$

where $\varepsilon_W$ is a small positive scalar. The maximal number of iterations is usually set to 10 in our experiments on the AR database.
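A compact sketch of the whole iRSC iteration of Table 1, together with the retention schedule of Eq. (10) and the stopping rule of Eq. (11), is given below. It is only an illustration under assumptions, not the authors' Matlab implementation: the helper names irsc_identify and retention are hypothetical, the parameter values are placeholders, and scikit-learn's Lasso again stands in for the weighted l1-minimization of step 3.

```python
import numpy as np
from sklearn.linear_model import Lasso

def retention(t):
    # Eq. (10): keep R_t * 100% of the classes at step t (0.6 at t=1, 1 from t=5 on).
    return min(0.1 * t + 0.5, 1.0)

def irsc_identify(D, labels, y, lam=0.001, mu=8.0, delta=0.5,
                  max_iter=10, eps_w=0.01):
    d, n = D.shape
    alpha = np.full(n, 1.0 / n)            # alpha^(1) = [1/n, ..., 1/n]^T
    classes = np.unique(labels)
    w_prev = None
    for t in range(1, max_iter + 1):
        e = y - D @ alpha                                    # step 2: residual
        w = 1.0 / (1.0 + np.exp(mu * e**2 - mu * delta))     # step 2: Eq. (8) weights
        # step 6: stop when the weights stabilize, Eq. (11)
        if w_prev is not None and np.linalg.norm(w - w_prev) < eps_w * np.linalg.norm(w_prev):
            break
        w_prev = w
        sw = np.sqrt(w)
        # step 3: weighted l1-regularized coding (Lasso as a stand-in solver)
        solver = Lasso(alpha=lam / (2 * d), fit_intercept=False, max_iter=10000)
        solver.fit(D * sw[:, None], y * sw)
        alpha_hat = solver.coef_
        # step 4: per-class weighted residuals r_i(y)
        residuals = {c: np.linalg.norm(sw * (y - D[:, labels == c] @ alpha_hat[labels == c]))
                     for c in classes}
        # step 5: keep the R_t fraction of classes with the smallest residuals
        keep_n = max(1, int(round(retention(t) * len(classes))))
        keep = sorted(classes, key=residuals.get)[:keep_n]
        mask = np.isin(labels, keep)
        D, labels, alpha = D[:, mask], labels[mask], alpha_hat[mask]
        classes = np.unique(labels)
    return min(residuals, key=residuals.get)                 # step 7: output
```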
4. Experimental Results
Figure 2. (a) The size reduction curve of the dictionary on the AR database. (b) The convergence curves of RSC and iRSC; the difference of $W$ is defined as $\|W^{(t+1)} - W^{(t)}\|_{2} / \|W^{(t)}\|_{2}$.
In this paper, we focus on face recognition with real disguise. Therefore, we conduct our experiments on the AR face database [16], which contains samples with sunglasses or scarves, see Figure 1. We compare iRSC with SRC [12] and RSC [14], which are the benchmark sparse representation methods for face recognition. A subset of the AR database is used in the experiments, which consists of 600 images (6 non-occluded frontal view samples per class, 3 from Session 1 and 3 from Session 2) from 100 subjects (50 males and 50 females) for training, and 200 images (2 samples per class, with sunglasses or scarf) from the same 100 subjects for testing.
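Before any of the three classifiers can be applied, the training images have to be vectorized and normalized into unit l2-norm columns of the dictionary, as required by the input of Table 1. The sketch below shows one plausible way to do this preprocessing; the build_dictionary helper and the file-loading details are assumptions made for illustration and are not part of the original experiments.

```python
import numpy as np
from PIL import Image

def build_dictionary(image_paths, labels, size=(30, 42)):
    # Each face is converted to gray-level, resized to 42 x 30 (PIL uses
    # (width, height)), stretched into a d = 1260 vector and normalized to
    # unit l2-norm, as required by the input of Table 1.
    cols = []
    for path in image_paths:
        img = Image.open(path).convert("L").resize(size)
        v = np.asarray(img, dtype=np.float64).flatten()
        cols.append(v / np.linalg.norm(v))
    return np.column_stack(cols), np.asarray(labels)

# D, labels = build_dictionary(train_paths, train_labels)  # D has shape (1260, 600)
```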
Figure 3 shows six training samples with facial expression changes and two testing samples with a neutral expression from the first subject in the AR database.

Figure 3. (a) Six training samples and (b) two testing samples of the first subject from the AR database.

The images are resized to 42 × 30, and the parameters $\mu$ and $\delta$ are set the same as in [15]; the regularization parameter $\lambda$ is set to 0.001 by default. Fig. 4(a) shows a test image with sunglasses; Fig. 4(b) is the training sample associated with the maximum coding entry; Fig. 4(c) and Fig. 4(d) show the minimum of the residuals and the final weight map obtained by RSC and iRSC, respectively.
Figure 4. An example of face recognition with disguise using RSC and iRSC. (a) A test image with sunglasses. (b) The training sample associated with the maximum of coding entries by both RSC and iRSC. (c) and (d) are the minimum of residuals ri ( y ) and the final weight map by RSC and iRSC, respectively.
In Figure 5, (a) and (b) show the sparse coding of the test sample and the residuals of each class obtained by RSC. Only one training sample has a dominant coding entry, while the others are close to zero; only the residual associated with the identified subject is close to zero, while the others are very large. (c) and (d) show the corresponding results obtained by iRSC. As a result of the size reduction of the dictionary, the coding becomes sparser, while the same identification result is achieved.

The face recognition results by SRC, RSC and iRSC are listed in Table 2. Although the dictionary is reduced in iRSC, it still achieves recognition rates competitive with RSC for both the sunglasses and scarf disguises. SRC does not perform well with the scarf disguise (only 38% accuracy), in which about 40% of the face region is covered; the reason is that SRC cannot handle occlusions larger than around 30%.

In the experiments, the programming environment is Matlab 7.0a, and the computer has a 3.10 GHz Intel(R) Core(TM) i5-2400 CPU and 4.00 GB RAM. The average runtimes of the three methods are listed in Table 3. As a result of the size reduction of the dictionary, the average runtime of iRSC is much shorter than that of both SRC and RSC. Since the l1_ls [18] l1-minimization solver is used in all the methods, the empirical computational complexity of SRC is $O(n^2 m^{1.3})$, where $n$ is the dimensionality of the face feature and $m$ is the number of dictionary atoms. When dealing with occlusion, its complexity is $O(n^2 (m+n)^{1.3})$, because it needs to add an identity matrix to code the occluded pixels. The complexity of RSC is about $O(t n^2 m^{1.3})$, with $t \approx 10$ in this case. Owing to the size reduction of the dictionary in iRSC, its runtime is only about 16% of that of RSC.
Figure 5. (a) The sparse coding of the test sample by RSC; the identified sample is indicated. (b) The residuals of each class by RSC. (c) The sparse coding of the test sample by iRSC. (d) The residuals of each class by iRSC.
Algorithms               Sunglasses   Scarves
SRC                      87%          38%
RSC                      100%         99%
iRSC (Eq. (10))          98%          99%
iRSC (fixed ratio 0.8)   100%         98%

Table 2. Recognition rates by competing methods on the AR database with disguise occlusion.
Algorithms               Sunglasses   Scarves
SRC                      17.86 s      20.09 s
RSC                      28.32 s      23.35 s
iRSC (Eq. (10))          4.43 s       4.03 s
iRSC (fixed ratio 0.8)   5.85 s       4.80 s

Table 3. Average runtimes by competing methods on the AR database with disguise occlusion.
Next, we conduct a more challenging task in which the testing samples have more facial expression variations. Another subset of the AR database is used in this experiment, which consists of 700 images (7 non-occluded frontal view samples per class) for training. The testing dataset consists of 600 images (each class has 6 samples with sunglasses or scarf). The other parameters are the same as in the first experiment. The face recognition accuracies and average runtimes of SRC, RSC and iRSC are listed in Table 4. The iRSC with a fixed ratio achieves better results than the one using Eq. (10) in this case, while RSC still obtains the highest accuracy. The different ways of choosing the retention factor R are also investigated. Using the median ratio for the retention factor deletes too many training samples, so the over-complete property of the dictionary may be weakened; although its average runtime is the lowest, its accuracy is much lower, so it is not a good choice. The basic principle in choosing R is to preserve the over-complete property of the dictionary.

Algorithms               Sunglasses      Scarves
SRC                      71.17% (19.4)   26.33% (20.4)
RSC                      88.17% (32.7)   88.50% (22.6)
iRSC (Eq. (10))          82.00% (5.02)   81.83% (4.55)
iRSC (fixed ratio 0.9)   87.17% (5.58)   85.83% (4.85)
iRSC (median ratio)      79.00% (1.42)   69.00% (1.29)

Table 4. Recognition rates and average runtimes (in parentheses, in seconds) by competing methods on the AR database with disguise occlusion.
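As a small illustration of the retention-factor choices compared in Table 4, the helper below sketches the three policies discussed above. The function name is hypothetical, and interpreting the median ratio as keeping the classes whose residuals fall below the median (roughly half of them) is an assumption made for the example.

```python
def retention_factor(t, policy="eq10", fixed=0.9):
    # "eq10": the schedule of Eq. (10), growing from 0.6 to 1 within 5 steps.
    if policy == "eq10":
        return min(0.1 * t + 0.5, 1.0)
    # "fixed": keep the same fraction of classes at every step.
    if policy == "fixed":
        return fixed
    # "median": keep only the classes whose residual is below the median,
    # i.e., roughly half of the remaining classes at each step (assumed here).
    if policy == "median":
        return 0.5
    raise ValueError(policy)
```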
5. Conclusion

This paper presented an improved robust sparse coding (iRSC) algorithm for robust face recognition with real disguise. The advantages of RSC, namely its robustness to various types of outliers and to large-region occlusion, are well preserved. By reducing the size of the dictionary in each iterative step of iRSC, its computational complexity is reduced significantly; its average runtime is only about 16% of that of RSC. In this process, the over-complete property of the dictionary is not affected, so iRSC can still achieve recognition rates competitive with RSC. The experimental results on the AR face database demonstrate that iRSC has better overall performance than SRC and RSC. With a high recognition rate and a low computational cost, iRSC is a good candidate for practical robotic systems that need to fulfill robust face recognition tasks.
6. Acknowledgments

This work is supported by grants from the National Natural Science Foundation of China (No. 61105021), the China Postdoctoral Science Foundation (No. 2011M501442) and the Fundamental Research Funds for the Central Universities.

7. References

[1] Bowyer K.W, Chang K, Flynn P (2006) A survey of approaches and challenges in 3D and multi-modal 3D+2D face recognition. Computer Vision and Image Understanding. 101(1): 1-15.
[2] Abate A.F, Nappi M, Riccio D, et al. (2007) 2D and 3D face recognition: A survey. Pattern Recognition Letters. 28(14): 1885-1906.
[3] Zhao W, Chellappa R, Rosenfeld A, et al. (2003) Face Recognition: A Literature Survey. ACM Computing Surveys. 35(4): 399-458.
[4] Turk M, Pentland A (1991) Eigenfaces for recognition. Journal of Cognitive Neuroscience. 3(1): 71-86.
[5] Belhumeur P.N, Hespanha J.P, Kriegman D.J (1997) Eigenfaces vs. Fisherfaces: recognition using class specific linear projection. IEEE Trans. Pattern Analysis and Machine Intelligence. 19(7): 711-720.
[6] Scholkopf B, Smola A, Muller K.R (1998) Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation. 10: 1299-1319.
[7] Mika S, Ratsch G, Weston J (1999) Fisher discriminant analysis with kernels. Proc. IEEE Neural Networks for Signal Processing. 41-48.
[8] Roweis S, Saul L.K (2000) Nonlinear dimensionality reduction by locally linear embedding. Science. 290(5500): 2323-2326.
[9] He X.F, Yan S.C, Hu Y.X, et al. (2005) Face recognition using Laplacianfaces. IEEE Trans. Pattern Analysis and Machine Intelligence. 27(3): 328-340.
[10] Yan S.C, Xu D, Zhang B.Y, et al. (2007) Graph embedding and extensions: a general framework for dimensionality reduction. IEEE Trans. Pattern Analysis and Machine Intelligence. 29(1): 40-51.
[11] Zhong D, Han J, Zhang X, et al. (2010) Neighborhood discriminant embedding in face recognition. Optical Engineering. 49(7): 077203.
[12] Wright J, Yang A.Y, Ganesh A, et al. (2009) Robust Face Recognition via Sparse Representation. IEEE Trans. Pattern Analysis and Machine Intelligence. 31(2): 210-227.
[13] Donoho D (2006) For Most Large Underdetermined Systems of Linear Equations the Minimal l1-Norm Solution is also the Sparsest Solution. Comm. Pure and Applied Math. 59(6): 797-829.
[14] Yang M, Zhang L, Yang J, et al. (2011) Robust sparse coding for face recognition. Proc. IEEE Computer Vision and Pattern Recognition (CVPR).
[15] Yang M, Zhang L, Yang J, et al. (2012) Regularized Robust Coding for Face Recognition. Available: http://arxiv.org/abs/1202.4207v2.
[16] Martinez A, Benavente R (1998) The AR face database. CVC Technical Report No. 24.
[17] Yang J, Zhang Y (2009) Alternating direction algorithms for l1-problems in compressive sensing. Technical report, Rice University.
[18] Kim S.J, Koh K, Lustig M, et al. (2007) An interior-point method for large-scale l1-regularized least squares. IEEE Journal of Selected Topics in Signal Processing. 1(4): 606-617.