IEEE TRANS. ON INSTRUM. MEAS.
Enhancing the Performance of Active Shape Models in Face Recognition Applications

Carlos A.R. Behaine and Jacob Scharcanski, Senior Member, IEEE

Abstract—Biometric features in face recognition systems are one of the most reliable and least intrusive alternatives for personal identity authentication. The Active Shape Model (ASM) is an adaptive shape matching technique that has often been used for locating facial features in face images. However, the performance of ASM can degrade substantially in the presence of noise or near the face frame contours. In this correspondence, we propose a new ASM landmark selection scheme to improve the ASM performance in face recognition applications. The proposed scheme selects robust landmark points where relevant facial features are found, and assigns higher weights to their corresponding features in the face classification stage. The experimental results are promising, and indicate that our approach tends to enhance the performance of ASM, leading to improvements in the final face classification results.

Index Terms—Adjusted mutual information, active shape models, feature selection, face recognition, image sensors.
I. Introduction

Facial features are important biological traits for biometric recognition, allowing contactless data acquisition with covert imaging sensors. In face modeling and recognition methods, like the Active Shape Model (ASM), a number of facial features (N) are captured from an input face image, but only R (R < N) facial features are useful for characterizing the face, and the other N − R features have small contributions or are noisy. This issue has been neglected in the ASM framework applied to face matching. In fact, the ASM face matching errors can be high at some face locations [1], despite the recent improvements made to the ASM technique [2][3]. Even though current ASM implementations have improved the landmark location accuracy in face images [4], detecting facial features under varying pose and illumination is still challenging [5]. Inconsistent facial feature detections often occur because the trained ASM converges towards salient image edges, and if these salient edges are noisy or distorted by shading or illumination, erroneous feature matchings may occur. However, the facial features that are located consistently in the training set are more relevant for characterizing and discriminating the problem face classes. In this paper, we present a method based on Adjusted Mutual Information (AMI)¹ for assigning higher weights to the facial features that are located consistently in a training set, and we show that this feature weighting scheme can improve the reliability of the ASM method in face recognition applications. Next, we define our feature weighting approach in Section II, the experimental results and discussions are presented in Section III, and our conclusions are in Section IV.

C. A. R. Behaine is with the Graduate Programme on Electrical Engineering, Federal University of Rio Grande do Sul, Porto Alegre, RS, Brazil, 90035-190. E-mail: [email protected]

II. Active Shape Models and face recognition

Face recognition can be understood as the classification of a given face into K distinct face classes, assuming that expression, pose and illumination may change slightly with respect to the training data. In ASM, face shapes are modeled by point distribution models (PDMs), as illustrated in Fig. 1(a). The points of a PDM are landmark points, and the location of these landmark points on face images can have location (or matching) errors (see Fig. 1(b)). An ASM for faces is trained on a face image training set, and N PDM points $S_{k,\epsilon}$ are used to represent the shape of each face of the $k$-th face class, $k = 1, \dots, K$, namely $S_{k,\epsilon} = \{p_i(x_i + \epsilon_{x_i}, y_i + \epsilon_{y_i})\}$, $i = 1, \dots, N$, where $(x_i, y_i)$ are the coordinates of the PDM point $p_i$ and $(\epsilon_{x_i}, \epsilon_{y_i})$ are location errors. Each PDM landmark point $p_i$ represents a relevant facial feature location (e.g. eye centers, face boundaries, etc.), and the visual aspect (appearance) of that landmark point in the face image is described by a feature set (e.g. chrominance, texture, etc.), here denoted $\{F_{j,i}\}$, $j = 1, \dots, Q$, where $Q$ is the number of image features used per landmark point.
Fig. 1. Illustration of: (a) landmark points pi (PDM) used to model a face in ASM; and (b) location of the landmark points on a face image.
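To make the PDM representation concrete, the sketch below (a hypothetical illustration, not the authors' implementation) stores a face shape as an array of N landmark coordinates and perturbs it with approximately Gaussian location errors, mirroring the $S_{k,\epsilon}$ notation above:

```python
import numpy as np

# Hypothetical sketch: a PDM face shape as N landmark points p_i = (x_i, y_i),
# observed with location errors (eps_x_i, eps_y_i), as in S_{k,eps}.
N = 68  # number of ASM landmark points (see [5])

rng = np.random.default_rng(0)
shape = rng.uniform(0.0, 100.0, size=(N, 2))   # ideal landmark coordinates (x_i, y_i)
eps = rng.normal(0.0, 1.5, size=(N, 2))        # approximately Gaussian location errors
observed = shape + eps                         # landmark locations as found on a face image

print(observed.shape)
```

The coordinate range and the error standard deviation above are arbitrary illustration values, not parameters from the paper.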
A. ASM for selecting key face locations for measuring facial features

In our approach, each one of the N PDM landmark points $p_i$ (N = 68, see [5]) is described by the mean $\mu_{F_{j,i}}$ and variance $\sigma^2_{F_{j,i}}$ of the image measurements of the $j$-th feature in all M training face image samples ($F^m_{j,i}$, $m = 1, \dots, M$):

$$\mu_{F_{j,i}} = \frac{1}{w^2} \sum_{r=1}^{w} \sum_{q=1}^{w} \mu_{j,i}(r,q), \quad (1)$$

$$\sigma^2_{F_{j,i}} = \max_{r,q \in W} \left\{ \sigma^2_{j,i}(r,q) \right\}, \quad (2)$$

J. Scharcanski is with the Institute of Informatics and Graduate Programme on Electrical Engineering, UFRGS, Porto Alegre, RS, Brazil, 90035-190. Phone: +55 51 3308-7128. E-mail: [email protected]

¹ Adjusted Mutual Information is a normalized measure of mutual dependence between two variables [6].
where $\mu_{j,i}(r,q) = \frac{1}{M} \sum_{m=1}^{M} F^m_{j,i}(r,q)$ and $\sigma^2_{j,i}(r,q) = \frac{1}{M} \sum_{m=1}^{M} \left( F^m_{j,i}(r,q) - \mu_{j,i}(r,q) \right)^2$, $(r,q)$ are the pixel coordinates within the window $W = w \times w$ centered at the $i$-th ASM landmark point $p_i$, and calculated for the $j$-th image feature. The maximum window variance was chosen in Eq. (2) to account for the feature variability within the $w \times w$ vicinity of the $i$-th ASM landmark point $p_i$. The location errors are assumed approximately Gaussian at each landmark point $p_i$ during ASM training [7], and we adopt $w = 2\max\{\sigma_\epsilon\}$, where $\sigma_\epsilon$ is the standard deviation of the location errors of all landmark points, measured during ASM training.

B. Selection of consistent features for ASM face recognition

The ASM landmark point location performance tends to improve with larger sets of landmark points [5]. However, the face classification performance based on ASM landmark locations decreases as the problem dimensionality increases. In this section, we describe our approach for selecting the R landmark points that are consistently located during ASM training (R ≤ N), achieving a better face inter-class separation.

Let $AMI(u,v) = \frac{MI(u,v) - E\{MI(u,v)\}}{\max\{H(u),H(v)\} - E\{MI(u,v)\}}$ be the adjusted mutual information between the variables $u$ and $v$, $0 \leq AMI(u,v) \leq 1$, where $MI(u,v)$ and $E\{MI(u,v)\}$ denote the mutual information between the variables $u$ and $v$ and its expected value, respectively; $H(u)$ and $H(v)$ are the entropies of $u$ and $v$. We weight the consistency $a_i$ of a landmark point $p_i$ in the training set as follows:

$$a_i = \frac{1}{N} \sum_{j=1}^{Q} \frac{1}{\sum_{n=1}^{N} AMI\left(\vec{\mu}_{F_{j,n}}, \vec{\mu}_{F_{j,i}}\right)}, \quad (3)$$
where $AMI(\vec{\mu}_{F_{j,n}}, \vec{\mu}_{F_{j,i}})$ is calculated for feature vectors considering all samples of the landmark point $p_i$ in the training set (comprising all face classes), and for the N landmark points $p_n$. Higher $a_i$ values reflect low adjusted mutual information in the training set between the landmark $p_i$ and the other landmark points $p_n$, i.e. less interdependence. The value of R is problem dependent [9]. We select the R landmark points with the highest $a_i$ values (R ≤ N) as the set of least interdependent landmark points, reducing the face recognition problem dimensionality. In fact, $a_i$ is interpreted as the distinctiveness of the measurements taken at $p_i$. We used R = 27 in our tests (i.e., R < N/2), since it reduces the original number of landmark points while maximizing $a_i$.

Given a face image, we have N landmark point locations $p_i$ given by the trained ASM, where the image facial features are measured. The values assumed by the image feature $j$ at the landmark point $p_i$ in the training set of the face class $k$ are represented by a Gaussian $G^k_{j,i} = G\left(\mu^k_{F_{j,i}}, \sigma^{2,k}_{F_{j,i}}\right)$, where $\mu^k_{F_{j,i}}$ and $\sigma^{2,k}_{F_{j,i}}$ are its mean and variance parameters, respectively. Now, considering all N landmark points and the Q measured face image features at the landmark point locations, a Gaussian mixture is used to statistically model the face class $k$ as $\Omega_k = \sum_{i=1}^{N} \sum_{j=1}^{Q} d_{X_k} G^k_{j,i}$, where $d_{X_k}$ is the weight of the Gaussian mixture component $G^k_{j,i}$. Details of the Gaussian mixture estimation are in [8].

The location of some landmark points may change substantially in different face images, leading to unreliable facial feature matches at such locations. Therefore, $\Omega_k$ is
modified in our approach to take into consideration the reliability of the landmark points in the face feature matching process. In our face feature matching method, we assign higher weights to more consistent landmark points (e.g. a landmark point $p_i$ can be weighted differently for each face class $k$, reflecting the relevance of the landmark point for characterizing the facial features at that location). Therefore, by assigning a different weight to each landmark point $p_i$ in each face class $k$, the reliability of the measurements taken at $p_i$ is taken into consideration during facial feature matching in class $k$, and the modified $\Omega_k$, namely $\Omega^c_k$, is:

$$\Omega^c_k = \frac{1}{R} \sum_{i=1}^{R} \frac{1}{a_i} \left[ \frac{1}{Q} \sum_{j=1}^{Q} \alpha G^k_{j,i} + (1-\alpha) B^k_i \right], \quad (4)$$

where $\Omega^c_k$ is a scalar value, and $B^k_i$ indicates the reliability of the feature measurements taken at $p_i$ in class $k$, where $0 < B^k_i \leq 1$, $0 < \alpha \leq 1$, and $B^k_i = \frac{1}{MQ} \sum_{m=1}^{M} \sum_{j=1}^{Q} G^{k,m}_{j,i}$, and $G^{k,m}_{j,i}$ represents the value of feature $j$ at the landmark $p_i$ in the face image $m$ of the face class $k$. Higher $B^k_i$ values suggest higher confidence in the features measured at $p_i$ in class $k$. The term $(1-\alpha)B^k_i$ balances the reliability of the location where the features are measured ($p_i$) and how representative the obtained measurements are in class $k$. The value of $\alpha$ is calculated to maximize the predictive value $PV$² during training. Recall that $a_i$ weights the distinctiveness of the measurements taken at the landmark point $p_i$ within the original set of N landmark points (see Eq. (3)).

Given an input face image, the face class $k'$ that maximizes $\Omega^c_k$ ($\forall k \in \{K\}$, where $\{K\}$ is the set of all face classes) indicates the most likely face class, i.e. the one whose facial features are most similar to those of the input face image. Therefore, the input face image is assigned to the face class $k'$ by running a search over the set of all face classes and selecting

$$k' = \arg\max_{\forall k \in \{K\}} \Omega^c_k .$$

² $PV = \frac{TP}{TP + FP}$, where $TP$ and $FP$ are the true positive and false positive rates calculated with respect to the ground truth, respectively.
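A minimal sketch of the scoring in Eq. (4) and the arg-max rule follows. This is a hypothetical simplification, not the authors' implementation: each class-conditional feature density is taken as a single Gaussian per landmark/feature pair (rather than the full mixture estimated as in [8]), and the consistency weights $a_i$ of Eq. (3) and reliability terms $B^k_i$ are assumed precomputed.

```python
import numpy as np

def classify_face(features, means, stds, a, B, alpha=0.2):
    """Sketch of Eq. (4): Omega_c[k] = (1/R) sum_i (1/a_i) *
    [ (1/Q) sum_j alpha*G_{j,i}^k + (1 - alpha)*B_i^k ],
    followed by k' = argmax_k Omega_c[k].

    features: (R, Q) measurements at the R selected landmarks
    means, stds: (K, R, Q) per-class Gaussian parameters of G_{j,i}^k
    a: (R,) consistency weights a_i (Eq. (3)), assumed precomputed
    B: (K, R) reliability terms B_i^k, assumed precomputed
    """
    K = means.shape[0]
    scores = np.empty(K)
    for k in range(K):
        # Gaussian densities G_{j,i}^k evaluated at the measured features
        G = np.exp(-0.5 * ((features - means[k]) / stds[k]) ** 2) \
            / (stds[k] * np.sqrt(2.0 * np.pi))
        inner = alpha * G.mean(axis=1) + (1.0 - alpha) * B[k]  # (1/Q) sum over j
        scores[k] = np.mean(inner / a)   # (1/R) sum over i, weighted by 1/a_i
    return int(np.argmax(scores)), scores
```

For instance, with two classes whose feature means are well separated, a probe whose measurements match one class's means is assigned to that class by the arg-max rule.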
III. Experimental Results

In order to evaluate the performance of our method (i.e. with $\Omega^c_k$ as a criterion for face classification), we tested: the standard ASM approach; two well-known methods suitable for dimensionality reduction, namely Principal Component Analysis (PCA) [9] and Spectral Regression Discriminant Analysis (SRDA) [10]; and the Data Fusion Boosted Face Recognition method (DFBFR) [11], which presents an impressive performance and represents the state of the art in face recognition. These methods were tested on the Essex Face Database [12], which contains a significant user diversity (we used 100 face classes; for each class we used 5 images for training and 15 images for testing).

Fig. 2. Comparison of performances (accuracy as a function of the number R of ASM points, for ASM (N = 68), PCA, SRDA, and the proposed method with α = 1, 0.4 and 0.2): (a) method evaluation; and (b) systemic evaluation.
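The accuracy used in this section is the average face recognition rate per class; a minimal sketch of that computation (a hypothetical helper, assuming predicted and ground-truth class labels are available) is:

```python
import numpy as np

def per_class_accuracy(y_true, y_pred):
    """Average recognition rate per class: compute the recognition rate
    within each true class, then average, so every class contributes
    equally regardless of how many test images it has."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    classes = np.unique(y_true)
    rates = [np.mean(y_pred[y_true == c] == c) for c in classes]
    return float(np.mean(rates))
```

In the protocol above this would be applied to 100 classes with 15 test images each.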
We adopted the face detector used by Demirel et al. [11]. The ASM algorithm for faces was implemented with N = 68 landmark points and w = 11 pixels. The two color chrominance channels Cr and Cb were used as face image features [11] (Q = 2). In order to conduct a fair evaluation of the proposed method, two tests were conducted: (a) evaluation of the method using faces successfully detected; and (b) systemic evaluation of the method. The method evaluation using faces successfully detected tests the method itself, and discounts the errors introduced by faces incorrectly detected. Accuracy is here defined as the measured average face recognition rate for each class. The results in Table I(a) illustrate the improvement achieved by our proposed method compared with ASM and the other methods. In comparison with the other methods, ASM has the lowest accuracy, which can be improved by our feature selection approach (Enhanced ASM). In the systemic evaluation, images were selected randomly, and the face detector performance tends to affect the final results negatively. Table I(b) shows that Enhanced ASM improves on the ASM performance. This may be explained by considering that the ASM method tends to perform poorly when landmark points fall off the boundaries of the frame returned by the face detector (i.e., when faces are not detected correctly), impacting negatively the ASM systemic evaluation. The performance gain as a function of α is illustrated in Figs. 2(a) and (b), for both tests. Since DFBFR does not allow dimensionality reduction, only its overall performance appears in Table I.

TABLE I
Performance comparison for our test set (highest accuracies).

Tested Method                            | (a) Method Accuracy | (b) Systemic Accuracy
-----------------------------------------|---------------------|----------------------
PCA [9]                                  | 0.9033              | 0.8720
SRDA [10]                                | 0.9207              | 0.9000
DFBFR [11]                               | 0.9373              | 0.9153
ASM (N = 68)                             | 0.9040              | 0.9027
Proposed: Enhanced ASM (α = 0.2, R = 27) | 0.9533              | 0.9193

IV. Conclusion
This correspondence proposes to improve the performance of ASM in face recognition applications by selecting the landmarks that correspond to the most reliable facial features. A new ASM feature selection method is introduced, and the experimental results indicate that the performance of ASM in face recognition applications can be substantially improved using the proposed approach.

Acknowledgment

The authors would like to thank CAPES (Coordenadoria de Aperfeiçoamento de Pessoal de Ensino Superior, Brazil) for financial support, and the Vision Group (University of Essex, UK) for providing the facial database.

References

[1] A. Hill, T. Cootes, C. Taylor, Active shape models and the shape approximation problem, Image and Vision Computing, 14, 9, 1996, pp. 601–607.
[2] K.W. Wan, K.M. Lam, K.C. Ng, An accurate active shape model for facial feature extraction, Pattern Recognition Letters, 26, 2005, pp. 2409–2423.
[3] J. Kim, M. Cetin, A.S. Willsky, Nonparametric shape priors for active contour-based image segmentation, Signal Processing, 87, 12, 2007, pp. 3021–3044.
[4] Z. Zheng, J. Jiong, D. Chunjiang, X. Liu, J. Yang, Facial feature localization based on an improved active shape model, Information Sciences, 178, 9, 2008, pp. 2215–2223.
[5] S. Milborrow, F. Nicolls, Locating facial features with an extended active shape model, European Conference on Computer Vision, 2008. Available: http://www.milbo.users.sonic.net/stasm
[6] N.X. Vinh, J. Epps, J. Bailey, Information theoretic measures for clusterings comparison: variants, properties, normalization and correction for chance, Journal of Machine Learning Research, 11, 2010, pp. 2837–2854.
[7] J. Shi, A. Samal, D. Marx, How effective are landmarks and their geometry for face recognition?, Computer Vision and Image Understanding, 102, 5, 2006, pp. 117–133.
[8] M. Figueiredo, A.K. Jain, Unsupervised learning of finite mixture models, IEEE Transactions on Pattern Analysis and Machine Intelligence, 24, 3, 2002, pp. 381–396.
[9] M.A. Turk, A.P. Pentland, Face recognition using eigenfaces, in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1991, pp. 586–591.
[10] D. Cai, X. He, J. Han, SRDA: an efficient algorithm for large-scale discriminant analysis, IEEE Transactions on Knowledge and Data Engineering, 20, 1, 2008, pp. 1–12.
[11] H. Demirel, G. Anbarjafari, Data fusion boosted face recognition based on probability distribution functions in different colour channels, EURASIP Journal on Advances in Signal Processing, 2009, pp. 1–10.
[12] Face Database, Vision Group, University of Essex, UK. Available: http://cswww.essex.ac.uk/mv/allfaces/faces94.html