IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 2, NO. 3, SEPTEMBER 2007


Toward Pose-Invariant 2-D Face Recognition Through Point Distribution Models and Facial Symmetry

Daniel González-Jiménez and José Luis Alba-Castro

Abstract—This paper proposes novel ways to deal with pose variations in a 2-D face recognition scenario. Using a training set of sparse face meshes, we built a Point Distribution Model and identified the parameters which are responsible for controlling the apparent changes in shape due to turning and nodding the head, namely the pose parameters. Based on them, we propose two approaches for pose correction: 1) a method in which the pose parameters from both meshes are set to typical values of frontal faces, and 2) a method in which one mesh adopts the pose parameters of the other one. Finally, we obtain pose-corrected meshes and, taking advantage of facial symmetry, virtual views are synthesized via Thin Plate Splines-based warping. Given that the corrected images are not embedded into a constant reference frame, holistic methods are not suitable for feature extraction. Instead, the virtual faces are fed into a system that makes use of Gabor filtering for recognition. Unlike other approaches that warp faces onto a mean shape, we show that if only pose parameters are modified, client-specific information remains in the warped image and discrimination between subjects is more reliable. Statistical analysis of the authentication results obtained on the XM2VTS database confirms the hypothesis. Also, the CMU PIE database is used to assess the performance of the proposed methods in an identification scenario where large pose variations are present, achieving state-of-the-art results and outperforming both research and commercial techniques.

Index Terms—CMU PIE database, facial symmetry, Gabor jets, point distribution models, pose-invariant face recognition, thin-plate splines, XM2VTS database.

I. INTRODUCTION

Automatic face recognition has attracted a lot of attention from the computer vision and pattern recognition communities during the last decade. This paper addresses one of the major issues within the general face recognition problem: dealing with pose changes. It is well known that the performance of face recognition systems drops drastically when pose differences are present within the input images, and it has become a major goal to design algorithms that are able to cope with this kind of variation. Up to now, the most successful algorithms are those which make use of prior knowledge of the class of faces.


In [4], Pentland et al. extend the eigenface approach [3] to a view-based eigenface method, where an individual eigenspace is constructed for each pose. In [2], Beymer and Poggio extend the earlier attempt presented in [1] (whose main drawback was that images from different viewpoints were needed for every client): from a single image of a subject, and making use of face class information, virtual views facing different poses are synthesized and used in a view-based recognizer. For the generation of the virtual views, two different techniques were used: linear classes and parallel deformation. In [5], Maurer and von der Malsburg propose a pose-invariant face recognition approach based on Elastic Bunch Graph Matching [6]; the transformations of Gabor features ("jets") are learnt from training faces that are rotated in depth. In [9], Blanz and Vetter propose a 3-D Morphable Model, where each face can be represented as a linear combination of 3-D face exemplars. Given an input image, the 3-D Morphable Model is fitted, recovering shape and texture parameters following an analysis-by-synthesis scheme. Several approaches make use of the 3-D Morphable Model to perform recognition; the main drawback of these methods is the high computational complexity needed to recover the image parameters. In [10], Romdhani and Vetter report high recognition rates on the CMU PIE database [26] by means of the 3-D Morphable Model and a fitting algorithm that makes use of linear relations to update the shape and texture parameters, which are then employed for recognition purposes. Blanz et al. also use the 3-D Morphable Model in [11] to synthesize frontal faces from nonfrontal views, which are then fed into the recognition system. In this same direction, other researchers have tried to generate frontal faces from nonfrontal views, such as the works proposed by Xiujuan Chai et al.: in [7], via linear regression in each of the regions into which the face is divided, and in [8], where a 3-D model is used. In [12], Samaras and Zhang combine the strengths of morphable models to capture the variability of 3-D face shape with a spherical harmonic representation for the illumination. A 3-D model is also used by Lee and Surendra in [13] to synthesize faces at different poses. In [14], Kanade and Yamada propose a completely different method, where the problem of pose variations is addressed via a probabilistic approach that takes into account the pose difference between probe and gallery images, learning how facial features change as the pose changes. In [16], Liu and Chen approximate a human head with a 3-D ellipsoid model, and both training and test images are back-projected onto the surface of the ellipsoid, forming texture maps which are used for comparison. Moreover, this texture map is represented as an array of local patches, and a probabilistic model is trained to compare corresponding patches. In [15], Gross et al. propose to estimate the eigen light-fields of the subject's head, using them for recognition across pose and illumination changes, with tests on the CMU PIE database.


Using a dataset containing sparse face meshes (62 points per image), we built a Point Distribution Model and, from its main modes of variation, identified the parameters responsible for controlling the apparent changes in shape due to turning and nodding the head (the so-called pose parameters), similarly to the work of Lanitis et al. [18], where the pose of the face was estimated using those parameters. Based on them, we propose two novel approaches for pose correction: 1) a method in which the pose parameters from both images are set to typical values of frontal faces; and 2) a method in which one image adopts the pose parameters of the other one. Both methods need the synthesis of virtual images, which is accomplished through Thin Plate Splines-based warping [23]. The use of texture mapping for the generation of virtual views is close in spirit to the parallel deformation of Beymer and Poggio proposed in [2], and it has the advantage over linear classes that it preserves subject-specific texture peculiarities, since texture is sampled from the subject's real face image. However, the goal of parallel deformation is to map a facial transformation observed between two images of a prototype subject onto a novel subject's face. The problem with this approach arises when the shape of the prototype subject differs significantly from the novel subject's shape, as the virtual view will appear geometrically distorted. We minimize this effect by modifying only pose parameters rather than the whole shape. Also, there exist similarities between our first method and the works of Blanz et al. [11] and Xiujuan Chai et al. [7], [8], as all of them try to generate frontal images. Unlike their approaches (among other differences), we take facial symmetry into account in order to overcome problems due to self-occlusion, leading to important improvements in system performance.

Holistic feature-based face recognition methods such as eigenfaces [3] need all images to be embedded into a constant reference frame (an average shape, for instance) in order to represent a face as a vector of ordered pixels. Lanitis et al. [18] also deformed each face image to the mean shape using 14 landmarks, extracted shape and appearance parameters, and classified them using the Mahalanobis distance. However, the virtual images we obtain do not comply with the constant-reference-frame requirement and, hence, local features must be employed for recognition. To this aim, we compute local Gabor responses on the synthesized face. Maurer and von der Malsburg [5] also used Gabor features in a pose-invariant framework but, in their case, the correction was applied to the Gabor features extracted from the original nonfrontal image.

Lanitis et al. [18] show that a linear model is enough to simulate large changes in viewpoint, as long as all the landmarks remain visible. Cootes et al. [32] state that a model trained on near fronto-parallel images can cope with a limited range of pose variations; for larger angular displacements, however, some facial features (landmarks) become occluded and the assumptions of the model break down. In order to deal with such large rotations, [32] uses a set of models to represent shape and appearance from different viewpoints. Other approaches tackling this problem have either used a full 3-D model [9] or included nonlinearities in the 2-D model [33].

Fig. 1. Position of the 62 landmarks used in this paper on an image from the XM2VTS database.

Based on the previous statement ("a linear model is enough to simulate large changes in viewpoint, as long as all the landmarks remain visible"), and under large rotation angles, we decided to use the restricted subset of visible landmarks for virtual face synthesis, empirically demonstrating the validity of our approach with realistic face images and identification experiments.

This paper is organized as follows. The next section briefly reviews point distribution models, and Section III introduces the concept of pose eigenvectors and pose parameters. Section IV describes the technique used to synthesize pose-corrected images, thin plate splines-based warping, with examples of virtual images across pose. In Section V, we explain different ways to cope with pose variations, while Section VI describes feature extraction on the corrected images through Gabor filtering. Sections VII and VIII show experimental results with two face databases.
• Authentication results on the XM2VTS database [22] confirm the advantages of normalizing only pose parameters rather than warping onto a mean shape.
• Identification experiments on the CMU PIE database [26] allow us to assess the performance of the methods in the presence of large pose variations, and the benefits of taking facial symmetry into account.
Finally, conclusions and future research lines are drawn in Section IX.

II. POINT DISTRIBUTION MODEL FOR FACES

A point distribution model (PDM) of a face is generated from a set of training examples. For each training image, N landmarks are located and their normalized coordinates (obtained by removing translation, rotation, and scale) are stored, forming a vector

x_i = (x_{i1}, y_{i1}, x_{i2}, y_{i2}, ..., x_{iN}, y_{iN})^T    (1)

The pair (x_{ij}, y_{ij}) represents the normalized coordinates of the j-th landmark in the i-th training image.


Fig. 2. Effect of changing the value of b_1 on the reconstructed shapes. b_1 controls the up–down rotation of the face.

Fig. 3. Effect of changing the value of b_2 on the reconstructed shapes. Upper row: coupling of both rigid (left–right rotation) and nonrigid (eyebrow movement and lip width) facial motion within the second eigenvector. Lower row: when virtual symmetric meshes are used to augment the training set, expression changes are not noticeable in the obtained eigenvector.

Principal component analysis (PCA) is applied to find the most important modes of shape variation. As a consequence, any training shape x can be approximately reconstructed as

x ≈ x̄ + P·b    (2)

where x̄ stands for the mean shape, P is a matrix whose columns are the unit eigenvectors of the first t modes of variation found in the training set, and b is the vector of parameters that defines the actual shape of x. So, the k-th component of b, b_k, weighs the k-th eigenvector from P. Also, since the columns of P are orthonormal, we have that P^T·P = I, and thus

b = P^T·(x − x̄)    (3)

that is, given any shape, it is possible to obtain its vector of parameters b.

We built a 62-point PDM using manually annotated landmarks (some of them were provided by the FGnet project¹, while others were manually annotated by ourselves). Fig. 1 shows the position of the landmarks on an image from the XM2VTS database [22]. When a new image containing a face is presented to the system, the vector of shape parameters that fits the data, b, should be computed automatically. There are several techniques, such as ASM [19], IOF-ASM [20], and AAM [21], to deal with this problem. In this work, we have used manual annotations instead, which allows us to test the classification performance alone, without the effect of landmark detection errors.

¹Available at http://www-prima.inrialpes.fr/FGnet/data/07-XM2VTS/xm2vts-markup.html
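The PDM machinery of (1)–(3) amounts to a PCA on aligned landmark vectors. The following minimal NumPy sketch illustrates how such a model could be built and how shape parameters are recovered; the function names and array layout are illustrative assumptions, not part of the original system.

```python
import numpy as np

def build_pdm(shapes, num_modes):
    """Build a point distribution model from aligned training shapes.

    shapes: (M, 2N) array; each row is (x1, y1, ..., xN, yN) after
            removing translation, rotation, and scale.
    Returns the mean shape and the matrix P whose columns are the
    first num_modes unit eigenvectors.
    """
    mean_shape = shapes.mean(axis=0)
    centered = shapes - mean_shape
    # PCA via SVD of the centered data matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    P = vt[:num_modes].T
    return mean_shape, P

def shape_to_params(x, mean_shape, P):
    """Eq. (3): b = P^T (x - mean_shape)."""
    return P.T @ (x - mean_shape)

def params_to_shape(b, mean_shape, P):
    """Eq. (2): x ~ mean_shape + P b."""
    return mean_shape + P @ b
```

A pose-modified shape is then obtained simply by editing selected entries of b and calling params_to_shape again.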

III. POSE EIGENVECTORS AND POSE PARAMETERS

Among the obtained modes of shape variation, we are interested in isolating the eigenvectors that are responsible for controlling the apparent changes in shape due to rigid facial motion (pose). For each eigenvector, the value of its corresponding parameter b_k is swept within suitable limits (while the remaining ones are set to 0) and the reconstructed shapes are observed. This way, we can assess, by visual inspection, which eigenvectors contain pose information. Clearly, the eigenvectors (and their relative positions) obtained after PCA strongly depend on the training data; hence, if all meshes used to build the PDM were strictly frontal, no eigenvector explaining rotations in depth would appear. However, if we are sure that pose changes are present in the training set, the eigenvectors explaining those variations will appear among the first ones, because the energy associated with rigid facial motion should be higher than that of most expression/identity changes (once again, depending on the specific dataset used to train the PDM). With our settings, it turned out that b_1 controlled up–down rotations (see Fig. 2), while b_2 was responsible for left–right rotations.

A major problem, inherent to the underlying PCA analysis, is that a given pose eigenvector may not only contain rigid facial motion (pose) but also nonrigid (expression/identity) information, mostly depending on the training data used to build the PDM. Regarding b_1, it has been shown [31] that there exists a dependence between the vertical variation in viewpoint (nodding) and the perception of facial expression: faces that are tilted forwards (leftmost shape in Fig. 2) are judged as happier, while faces tilted backwards (rightmost shape in Fig. 2) are judged as sadder. Regarding b_2, the upper row of Fig. 3 shows the reconstructed shapes obtained by varying it. Apart from the left–right rotation, it is clear that the second eigenvector also contains facial expression/identity information: faces rotated to the right seem to show surprise (raised eyebrows), while faces rotated to the left look more serious. Ideally, rigid facial motion should be orthogonal to the other factors of shape variation, but we can see that this does not hold exactly for the left–right eigenvector (although the variations due to in-depth rotation are much more important than those induced by expression/identity changes).

In order to soften this coupling for the left–right eigenvector, the training set was augmented with virtual symmetric meshes. The reconstructed shapes in the lower row of Fig. 3 show that expression changes are no longer noticeable in the obtained eigenvector. By enlarging the original training set with artificial symmetric samples:
• the variance of the training set is increased precisely in the left–right direction;
• no examples with new expression/identity information are introduced;
• the nonrigid variation is cancelled along the left–right eigenvector, because the training samples looking to the right and those looking to the left have exactly the same amount of expression/identity information: any exemplar in the original training set (with a given pose and expression/identity) has its corresponding mirror version (with the same expression/identity but opposite pose), and thus the augmented training data we get are "more balanced" in the left–right direction.


A. Theoretical Evidence on the Fact That Symmetric Meshes Help to Decouple Left–Right Rotations and Nonrigid Factors

We now provide a theoretical explanation behind the intuitive use of the training set augmented with virtual symmetric meshes. Let S be the original training set comprising M examples of face meshes

S = {x_1, x_2, ..., x_M}    (4)

Its covariance matrix C_S (assuming the data are zero mean) is given by

C_S = (1/M) · Σ_{i=1}^{M} x_i · x_i^T    (5)

As stated before, v is the eigenvector that controls the left–right rotations. By definition,

C_S · v = λ · v    (6)

where λ is its associated eigenvalue. From the top row of Fig. 3, we already realized that there also exists some coupled nonrigid information within v (which is mostly encoded in the vertical displacements of the eyebrows). Fig. 4 shows the coefficients of v grouped by the facial feature (face contour, eyes, mouth, etc.) they affect. Concentrate, for instance, on the coefficients weighing the eyebrows' y-coordinates: as long as they are far from being 0, changing the specific value of b_2 provokes expression changes (eyebrow raising and bending). On the other hand, the coefficients controlling horizontal displacements (i.e., those weighing the x-coordinates) are mostly responsible for left–right rotations. Hence, it makes sense to assume that v can be expressed as the sum of two components, one controlling left–right variations (LRV) and the other accounting for nonrigid variations (NRV):

v = v_LRV + v_NRV    (7)

From the upper row in Fig. 3, it is straightforward to conclude that the nonrigid variation encoded in v_NRV is clearly smaller than the left–right contribution, and this conclusion can also be extracted from Fig. 4, since the coefficients associated with v_NRV (mainly those corresponding to the mouth's and eyebrows' y-coordinates) have smaller values (in modulus) than those associated with v_LRV (see, for instance, the coefficients weighing the face contour's x-coordinates). In general terms, the moduli of the x-coefficients are greater than those of their corresponding y-coefficients (with the exception of the eyebrows).

Fig. 4. Coefficients from eigenvector v (obtained with the original training set S) grouped by the specific facial feature they affect.

Let us now consider the symmetrized training set S' = {x'_1, ..., x'_M}. For the i-th face shape x_i, the y-coordinates of its mirror version x'_i do not change, while the x-coordinates have the opposite sign. However, we have to take the following consideration into account: let us concentrate on the first landmark from Fig. 1. When obtaining the mirror version of this face shape, the symmetrized landmark is no longer the first one; it actually becomes landmark #15, and vice versa. A similar reasoning can be applied to the remaining landmarks. Mathematically, this relabeling can be expressed by using a permutation matrix P. Hence, x'_i can be expressed as follows:

x'_i = P · T · x_i = Q · x_i    (8)

where T is the diagonal matrix that changes the sign of the x-coordinates. Given that P is a permutation matrix, it yields that P^T·P = I, and the same occurs with T (i.e., T^T·T = I); hence Q^T·Q = I as well. The covariance matrix C_S' of the symmetric training set is given by

C_S' = (1/M) · Σ_{i=1}^{M} x'_i · (x'_i)^T = Q · C_S · Q^T    (9)

Given that C_S' = Q·C_S·Q^T, both covariance matrices have the same eigenvalues and, hence,

C_S' · v' = λ · v'    (10)

where v' = Q·v is the eigenvector controlling the left–right rotations (plus some coupled nonrigid variations) in the symmetric training set. The upper row of Fig. 5 shows the reconstructed shapes obtained using v, while the bottom row plots the reconstructed shapes obtained using v', for a given value of the parameter. The first thing we should note is that v' ≠ v. Moreover, the shape reconstructed with v' presents the same pose as the one reconstructed with −v, and the same nonrigid information as the one reconstructed with v. Taking these facts into account, v' can be decomposed (in a similar way as v) into

v' = −v_LRV + v_NRV    (11)

Subtracting (10) from (6), and taking (7) and (11) into account, it yields

(C_S + C_S') · v_LRV + (C_S − C_S') · v_NRV = 2λ · v_LRV    (12)


Fig. 7. Coefficients from the left–right eigenvector obtained with the augmented training set S_A, grouped by the specific facial feature they affect.

Fig. 5. Upper row: reconstructed shapes using v. Bottom row: reconstructed shapes using v'. Clearly, the shape x̄ + b·v' has the same pose as x̄ − b·v, and the same nonrigid information as x̄ + b·v.

Fig. 6. Covariance matrix plots: a) C_S, b) C_S', c) C_S + C_S', d) C_S − C_S'. From these plots, and from the fact that the nonrigid contribution is smaller than the rigid one, we can assume that (C_S − C_S')·v_NRV is not significant compared to (C_S + C_S')·v_LRV.

Now, we will assume that (C_S − C_S')·v_NRV is not significant compared to (C_S + C_S')·v_LRV. In fact, we have already seen that the nonrigid contribution v_NRV is small compared to v_LRV. Moreover, C_S' is very similar to C_S (see Fig. 6 for visual evidence) and, hence, it turns out that

(C_S + C_S') · v_LRV ≈ 2λ · v_LRV    (13)

Dividing both sides of (13) by the norm of v_LRV, and denoting u = v_LRV / ||v_LRV||, we have

(1/2) · (C_S + C_S') · u ≈ λ · u    (14)

From (14), it is clear that u (containing just left–right rotations) is (approximately) an eigenvector of (1/2)·(C_S + C_S') with an associated eigenvalue λ. It is straightforward to see that (1/2)·(C_S + C_S') is precisely the covariance matrix of the augmented training set S_A comprising both original and symmetric meshes. In fact, S_A = {x_1, ..., x_M, x'_1, ..., x'_M} and its covariance matrix C_A is given by

C_A = (1/(2M)) · ( Σ_{i=1}^{M} x_i · x_i^T + Σ_{i=1}^{M} x'_i · (x'_i)^T ) = (1/2) · (C_S + C_S')    (15)

Hence, we have demonstrated that, by using the augmented training set, we were able to get rid of the component containing the small nonrigid variations (as was shown in the bottom row of Fig. 3). Fig. 7 plots the coefficients of the left–right eigenvector obtained with the augmented training set. By comparing this plot with Fig. 4, we can conclude that:
• the coefficients weighing the y-coordinates have smaller values in modulus (thus closer to 0) than the corresponding ones from the original eigenvector, hence reducing expression changes when sweeping the parameter. This is especially significant for the coefficients related to the eyebrows' y-coordinates;
• the coefficients weighing the right and left contour's x-coordinates (indices 1 to 7 and 9 to 15, respectively) show a perfectly symmetric pattern. This means that, for a given value of the parameter, the shapes reconstructed with +b and −b show exactly opposite left–right angles. This was not the case for the original eigenvector, whose contour coefficients did not show such a perfectly symmetric pattern;
• in fact, for every facial feature, the coefficients weighing the x-coordinates from the left side of the feature and the corresponding ones affecting the right side share the same values. This symmetry causes the shapes reconstructed with +b and −b to be simple reflections of each other (as shown in the bottom row of Fig. 3).
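As a concrete illustration of the augmentation analyzed above, the sketch below builds the virtual symmetric meshes of (8), assuming landmarks are stored as (x, y) rows and that a left/right correspondence table (e.g., landmark 1 ↔ landmark 15 in Fig. 1) is available; the helper names are hypothetical.

```python
import numpy as np

def mirror_shape(shape_xy, mirror_pairs):
    """Create the virtual symmetric version of one normalized face mesh.

    shape_xy:     (N, 2) array of normalized landmark coordinates.
    mirror_pairs: length-N index array; mirror_pairs[j] is the landmark that j
                  maps to under left/right reflection (e.g. 0 <-> 14).
    """
    mirrored = shape_xy.copy()
    mirrored[:, 0] = -mirrored[:, 0]      # negate x-coordinates
    mirrored = mirrored[mirror_pairs]     # relabel left/right landmarks
    return mirrored

def augment_with_mirrors(shapes, mirror_pairs):
    """Return the augmented training set S_A of Section III-A."""
    mirrored = np.stack([mirror_shape(s, mirror_pairs) for s in shapes])
    return np.concatenate([shapes, mirrored], axis=0)
```

Running PCA on the augmented set (for instance, with the build_pdm sketch of Section II) then yields a left–right eigenvector whose y-coefficients are close to zero, as in Fig. 7.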


Fig. 9. Images taken from all cameras of the CMU PIE database for subject 04006. The nine cameras in the horizontal sweep are each separated by about 22.5° [26].

B. Experiment on a Video-Sequence: Decoupling of Pose and Expression

In order to demonstrate on real data that the presence of nonrigid factors within the identified pose eigenvectors is minimal, we used a manually annotated video-sequence of a man during conversation² (hence, rich in expression changes). For each frame f in the video, the vector of shape parameters b(f) of the corresponding mesh X(f) was calculated and split into the rigid (pose) part b_r(f), which keeps only the pose parameters (all remaining components being set to 0), and the nonrigid (expression/identity) part b_nr(f), which keeps the complementary components. Finally, we calculated the reconstructed meshes X_r(f) and X_nr(f) using (2) with b_r(f) and b_nr(f), respectively. Ideally, X_r(f) should only contain rigid mesh information, while X_nr(f) should reflect changes in expression and contain identity information. As shown in Fig. 8, it is clear that, although some coupling exists (especially in the third row, with small eyebrow bending in X_r(f)), X_nr(f) is responsible for expression changes and identity information (face shape is clearly encoded in X_nr(f)), while X_r(f) mainly contains rigid motion information. For instance, the original shapes from the fourth and seventh rows share approximately the same pose, but differ substantially in their expression. Accordingly, the X_r(f)'s are approximately the same while the X_nr(f)'s are clearly different.

²http://www-prima.inrialpes.fr/FGnet/data/01-TalkingFace/talking_face.html
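A sketch of the rigid/nonrigid split used in this experiment, assuming the identified pose parameters occupy the first two positions of b (one for up–down and one for left–right rotation):

```python
import numpy as np

POSE_IDX = [0, 1]   # assumed positions of the up-down and left-right parameters

def split_pose_nonrigid(b):
    """Split a shape-parameter vector into rigid (pose) and nonrigid parts."""
    b_rigid = np.zeros_like(b)
    b_rigid[POSE_IDX] = b[POSE_IDX]
    b_nonrigid = b - b_rigid
    return b_rigid, b_nonrigid

# Reconstructed meshes, using params_to_shape from the PDM sketch above:
# X_r  = params_to_shape(b_rigid, mean_shape, P)
# X_nr = params_to_shape(b_nonrigid, mean_shape, P)
```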

Fig. 8. Experiment on the video sequence. Each row shows, for a given frame f, the original shape X(f) and the reconstructed shapes X_r(f) and X_nr(f) obtained using b_r(f) and b_nr(f), respectively. Clearly, X_nr(f) controls expression and identity while X_r(f) is mostly responsible for rigid changes.

C. Experiment on the CMU PIE Database

The CMU pose, illumination, and expression (PIE) database [26] consists of face images from 68 subjects recorded under different combinations of poses and illuminations. Fig. 9 shows the images taken for subject 04006 from all cameras with neutral illumination. As we can see, this database is especially suitable for testing the robustness of systems to left–right face rotations. In this paper, we use a subset of the database, namely the images taken from cameras 11, 29, 27, 05, and 37 (with nominal rotation angles of approximately −45°, −22.5°, 0°, +22.5°, and +45°, respectively) under neutral illumination. All of them (a total of 68 × 5 images) were manually annotated with the same set of 62 landmarks shown in Fig. 1.

For each annotated mesh, its vector of shape parameters was calculated. So, for every subject, we have 5 vectors, each one


TABLE I
RELATIONSHIP BETWEEN THE PARAMETER b_2 AND THE ANGLE OF ROTATION θ

corresponding to a certain pose (11, 29, 27, 05, and 37) and, for a given pose, we have 68 vectors, each one corresponding to a subject from the database. Table I shows, for each pose, the average value of the parameter b_2, along with the nominal rotation angle θ. Clearly, an approximate linear relationship exists between b_2 and θ, in agreement with the result obtained in [18]. Hence, it seems that, within the set of considered poses (ranging from −45° to +45°), the PCA analysis is able to deal with rotations properly.

Moreover, if b_2 were only responsible for left–right rotations, the variance of b_2 with pose changes (interpose variance) should be high while, when the pose is fixed, the variance of b_2 (intrapose variance) should be small (i.e., b_2 should not be seriously affected by other factors such as identity variations). Fig. 10 presents both inter- and intrapose variances for every parameter. It is clear that:
• b_2 has the highest interpose variance among the whole set of shape parameters;
• the intrapose variance associated with b_2 is much lower than its interpose variance;
• given that all tested poses have approximately the same elevation, the interpose variance of the up–down parameter b_1 is small.
From Fig. 10, it is also clear that, apart from b_2, other parameters present high interpose variances. However, their corresponding intrapose variances are also high.
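One plausible way of computing these quantities from the annotated PIE meshes is sketched below (taking, for the interpose variance, the spread of the per-pose means of each parameter); this exact definition is an assumption made for illustration, and the sketch also returns the ratio discussed next.

```python
import numpy as np

def pose_variance_ratio(params):
    """Inter- and intrapose variances per shape parameter (cf. Figs. 10 and 11).

    params: (num_poses, num_subjects, num_modes) array of b vectors,
            e.g. 5 poses x 68 subjects x num_modes for the PIE subset.
    """
    # Interpose: variance, across poses, of the per-pose mean of each parameter.
    interpose = params.mean(axis=1).var(axis=0)
    # Intrapose: variance across subjects within a pose, averaged over poses.
    intrapose = params.var(axis=1).mean(axis=0)
    return interpose, intrapose, interpose / intrapose
```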

Fig. 10. Intra- and interpose variances for each shape parameter.

The ratio between both quantities (interpose variance divided by intrapose variance) is an adequate way of measuring how the two variances are related for a given parameter. As shown in Fig. 11, b_2 presents the highest value of this ratio among the set of shape parameters. Section IV-B, with examples of virtual images, will give another token of the fact that changing the particular values of b_1 and b_2 provokes variations in pose but does not seriously affect other facial properties such as identity or expression factors.

IV. VIRTUAL FACE SYNTHESIS

A. Thin-Plate Splines Warping

All of the methods that will be introduced in Section V share one common feature. Given one face image I, the coordinates of its respective fitted mesh, X, and a new set of coordinates, X', a virtual face image must be synthesized by warping the original face onto the new shape. For this purpose, we used the method developed by Bookstein based on thin plate splines [23]. Provided the set of correspondences between X and X', the original face is allowed to be deformed so that the original landmarks are moved to fit the new shape.
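A compact sketch of such a TPS-based warp using SciPy's radial basis function interpolator, which supports a thin-plate-spline kernel; this is an illustration of the idea rather than the authors' implementation. The mapping is applied backwards: for every pixel of the synthesized image we estimate its position in the original image and sample the texture there.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator
from scipy.ndimage import map_coordinates

def tps_warp(image, src_landmarks, dst_landmarks):
    """Warp `image` so that src_landmarks move to dst_landmarks.

    image:         (H, W) gray-level array.
    src_landmarks: (N, 2) array of (row, col) landmark positions in `image`.
    dst_landmarks: (N, 2) target positions of the same landmarks.
    """
    # Thin-plate-spline mapping from destination to source coordinates
    # (backward mapping avoids holes in the synthesized image).
    tps = RBFInterpolator(dst_landmarks, src_landmarks,
                          kernel='thin_plate_spline')
    h, w = image.shape
    rows, cols = np.mgrid[0:h, 0:w]
    grid = np.column_stack([rows.ravel(), cols.ravel()])
    src_coords = tps(grid)            # sampling position for every output pixel
    warped = map_coordinates(image, src_coords.T, order=1, mode='nearest')
    return warped.reshape(h, w)
```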

Fig. 11. Interpose variance divided by intrapose variance for each of the shape parameters. Clearly, the b_2 parameter we identified as responsible for left–right rotations presents the highest ratio.

B. Synthesizing Virtual Face Images Across Pose Using Thin Plate Splines

Fig. 12 shows some examples of virtual face images, corresponding to subjects from the CMU PIE database, synthesized using thin plate splines warping. For each subject, only the frontal image (i.e., pose 27) and its associated mesh are used as inputs. The vector of shape parameters is recovered from the frontal mesh using (3), and b_2 is forced to sweep a range of values, synthesizing virtual meshes using (2). Finally, the frontal face image is warped onto these virtual meshes, generating face images under different viewpoints.

Fig. 13 shows the original face images taken at ±22.5° and ±45°. Their corresponding virtual images are presented in the two leftmost and the two rightmost columns of Fig. 12.


Fig. 12. Examples of synthesized face images across azimuth. In each row, the frontal face is warped onto virtual meshes obtained by sweeping b_2 within a range of values.

Fig. 14. Examples of synthesized face images across elevation. In each column, the frontal face is warped onto virtual meshes obtained by sweeping b_1 within a range of values.

Fig. 15. Slight coupling between pose changes and facial expression when the training set has not been properly chosen.

Fig. 16. Upper row: Identity is not modified when changing the value of the up-down parameter (a good training set has been chosen). Lower row: Identity is clearly distorted when changing the value of the up-down parameter (not enough up–down tilting examples in the training set).


Fig. 13. Original face images across azimuth (±22.5° and ±45°). By comparing the two leftmost and the two rightmost columns of Fig. 12 with the faces shown here, we can realize that the virtual images are very similar to the original ones.

It is clear that the synthetic faces are very similar to the real ones. Fig. 14 shows the virtual images synthesized by varying b_1 in an analogous way. We would like to remark that, although some distortions due to mesh manipulation and the warping process may exist, identity and expression³ properties are preserved in most of the synthesized images from both Figs. 12 and 14.

Coming back to the discussion addressed in Section III regarding the decoupling between rigid and nonrigid information within an identified pose eigenvector, we would like to re-emphasize that choosing a good training set is required in order to obtain such decoupling. Otherwise, both rigid and nonrigid information may get mixed. It was shown that the use of an augmented training set with symmetric meshes helped to decouple left–right rotation and facial expression, and this can be noted once again when comparing the first row in Fig. 12 with Fig. 15.

³In agreement with the conclusions presented in [31], virtual faces tilted forwards look happier than those tilted backwards.

Clearly, in the latter, some coupling exists between facial expression and pose changes (which was removed by using the augmented training set). Moreover, Fig. 16 provides another clear example of the need for choosing an adequate training set regarding the up–down eigenvector.
• The upper row shows the warped images obtained by modifying the identified up–down parameter when the training set contains enough up–down tilting examples. Clearly, modifying the specific value of the parameter does not affect the identity of the synthetic images.
• The lower row shows the warped images obtained by modifying the identified up–down parameter when the training set does not contain enough up–down tilting examples. Clearly, face shape and up–down rotation are mixed in the associated eigenvector, distorting identity as the value of the parameter changes.
Up to now, all synthesized images showed a neutral expression, but what happens in case a subject is expressing happiness, anger, etc.? Will the synthetic faces maintain the expression as the pose changes? We already discussed that there exists a dependence between nodding and the perception of facial expression, but left–right rotations should not affect expression at all. Fig. 17 clearly suggests that there is no change in the facial expression as the pose changes for the two cases shown.


Fig. 17. Upper row: Although pose is forced to change, synthetic images maintain the original worried expression. Lower row: The same occurs when the subject smiles.


Fig. 18. Images from subject 013 of the XM2VTS. Left: Original image. Right: Image warped onto the average shape. Observe that subject-specific information has been reduced (especially in the lips region).

V. POSE CORRECTION

Given a test image I_T with unknown identity and a training image I_C of a given client, the system must output a measure of similarity (or dissimilarity) between them. Straightforward texture comparison between I_T and I_C may not produce desirable results, as differences in pose could be quite important. So, in order to deal with these differences, we apply and compare three different algorithms that make use of the PDM parameters.

A. Warping to Mean Shape (WMS)

Once the meshes have been fitted to I_T and I_C, both faces are warped onto the average shape of the training set, x̄, which corresponds to setting all shape parameters to 0 (i.e., b = 0). Thus, the images are deformed so that a set of landmarks is moved to coincide with the corresponding set of landmarks on the average shape, obtaining the virtual images Î_T and Î_C. The number of landmarks used as "anchor" points is another variable to be fixed. For the experiments, we used two different sets:
• the whole set of 62 points;
• the set of 14 landmarks used in [18].
As the number of "anchor" points grows, the synthesized image is more likely to present artifacts, because more points are forced to be moved to landmarks of a mean shape (which may differ significantly from the subject's shape). On the other hand, with few "anchor" points, only a small pose correction can be made.

B. Normalizing to Frontal Pose and Warping (NFPW)

We argue that normalizing only the pose parameters should produce better results than warping images onto a mean shape, because in the latter approach (WMS), discriminative information may be removed during the warping process, as all shape parameters are fixed to zero and the set of "anchor" points is forced to be moved to landmarks of a mean shape. Holistic approaches such as eigenfaces [3] need all images to be embedded into a given reference frame (an average shape, for instance) in order to represent these images as vectors of ordered pixels. The problem arises when the subject's shape differs enough from the average shape, as the warped image may appear geometrically distorted and subject-specific information may be removed (see Fig. 18 for an example). Given that our recognition method is not holistic but uses local features instead, the reference-frame constraint is avoided and the distortion is minimized by modifying only pose parameters rather than the whole shape. In Fig. 19, we can see a block diagram of this method. Given I_T and I_C, only the subset of parameters that accounts for pose is fixed to the typical values of frontal faces from the training set (as the average shape corresponds to a frontal face, we fixed the pose parameters to zero). New mesh coordinates are computed using (2) and the virtual images Î_T and Î_C are synthesized. In [29], this method was tested on the XM2VTS database, achieving good results.

C. Pose Transfer and Warping (PTW): Warping One Image to Adopt the Other One's Pose

Based on the particular values of the pose parameters of I_T and I_C, we can also think of synthesizing a virtual face adopting the pose of the other one. A block diagram of this approach can be seen in Fig. 20. Compared to NFPW, this approach has one computational disadvantage: if several training images are available for a given client, it is approximately that many times slower than NFPW, as each comparison between I_T and I_C needs the synthesis of a virtual image, while with NFPW each training image must be warped only once to adopt a frontal pose, assuming that these warped images have been computed and stored during the training stage. Experiments on the video sequence used in Section III-B revealed slightly better performance when warping the near profile face to adopt the pose of the near frontal one. This particular choice is the one that will be tested on the first set of experiments over the XM2VTS database (Section VII).

D. Taking Advantage of Facial Symmetry

As explained in a previous section, the synthesis of a virtual image is accomplished by sampling texture from the original one. The problem arises when, due to self-occlusion, some face regions are not visible, i.e., texture is not available, and hence the corresponding regions in the pose-normalized image do not represent the subject's appearance correctly. In order to overcome this drawback, we take advantage of the vertical symmetry of the face. For a horizontal rotation in depth of the head, and once the mesh has been fitted, the parameter controlling the azimuth angle indicates whether the face is showing mostly its right or its left side. Whenever a frontal face is synthesized from a nonfrontal view, we warp the original image and its mirror version onto the pose-corrected frontal mesh and then blend the two virtual images, using simple masks that weigh the two sides of the face appropriately (according to the current rotation, left or right, of the head), as can be seen in Fig. 21.

On the other hand, when PTW is used and the images to be compared show opposite sides (i.e., one face is rotated to the left and the other to the right), direct warping from one pose to the other provides poor results. A better solution can be obtained with the use of mirror images (see Fig. 22 for an example):
1) mirror versions of both faces and of their respective meshes are obtained;
2) the pose parameters of each original mesh are transferred to the mirrored mesh of the other face, obtaining two pose-corrected meshes;
3) each mirrored face is warped onto its pose-corrected mesh, obtaining two virtual faces;
4) each virtual face is compared with the original face whose pose it adopted;
5) the two obtained scores are averaged.
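A sketch of the blending step of Section V-D, where the frontalized original image and its frontalized mirror version are combined with a left/right weighting mask; the simple column-wise mask with a linear transition is an illustrative choice, not necessarily the one used by the authors.

```python
import numpy as np

def blend_symmetric(frontalized, frontalized_mirror, rotation_sign, center_col,
                    transition=10):
    """Blend a frontalized image with its frontalized mirror version.

    rotation_sign: +1 or -1, sign of the azimuth parameter after mesh fitting
                   (which side of the face was visible in the original view).
    center_col:    column of the face symmetry axis in the frontalized image.
    transition:    width (pixels) of the linear blending band (illustrative value).
    """
    h, w = frontalized.shape
    cols = np.arange(w, dtype=float)
    ramp = np.clip((cols - center_col) / transition + 0.5, 0.0, 1.0)
    mask = ramp if rotation_sign > 0 else 1.0 - ramp   # weight 1 on the visible side
    mask = np.tile(mask, (h, 1))
    return mask * frontalized + (1.0 - mask) * frontalized_mirror
```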


Fig. 19. Block diagram for pose correction using NFPW. After face alignment, the obtained meshes are corrected to frontal pose (pose normalization block), and virtual faces are obtained through thin-plate splines (TPS) warping. Finally, both synthesized images are compared. It is important to note that the processing of the training image could (and should) be done offline, thus saving time during recognition.

Fig. 20. Block diagram for pose normalization using PTW. After face alignment, mesh A adopts the pose of mesh B (pose transfer block), and the virtual face Â is obtained through thin-plate splines (TPS) warping. Finally, faces Â and B are compared.


Fig. 21. Block diagram for pose normalization using NFPW and facial symmetry.

TABLE II
FALSE ACCEPTANCE RATE (FAR), FALSE REJECTION RATE (FRR), AND TOTAL ERROR RATE (TER) OVER THE TEST SET FOR DIFFERENT METHODS

VI. FEATURE EXTRACTION

The recognition engine is based on Gabor filtering. Gabor filters are biologically motivated convolution kernels in the shape of plane waves restricted by a Gaussian envelope, as shown next:

ψ_j(z) = (||k_j||² / σ²) · exp(−||k_j||²·||z||² / (2σ²)) · [exp(i·k_j·z) − exp(−σ²/2)]    (16)

where k_j contains information about the frequency and orientation of the filters, and σ = 2π. Our system uses a set of 40 Gabor filters with the same configuration as in [6]. The region surrounding a pixel in the image is encoded by the convolution of the image patch with these filters, and the set of responses is called a jet, J. So, a jet is a vector with 40 coefficients, and it provides information about a specific region of the image.

At each of the 62 nodes of the pose-normalized mesh, a Gabor jet is extracted and stored for comparison. Given two images to be compared, say I_A and I_B, with node coordinates {z_A^i} and {z_B^i}, their respective sets of jets are computed, {J_A^i} and {J_B^i}, i = 1, ..., 62. Finally, the score between the two images is given by

Score(I_A, I_B) = F( ⟨J_A^1, J_B^1⟩, ⟨J_A^2, J_B^2⟩, ..., ⟨J_A^62, J_B^62⟩ )    (17)

where ⟨·,·⟩ represents the normalized dot product between corresponding jets, taking into account that only the moduli of the jet coefficients are used. In (17), F stands for a generic combination rule of the dot products.
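A sketch of the jet comparison in (17): normalized dot products between the moduli of corresponding jets, combined with a generic rule F (here the median, the rule adopted later in Section VII); the extraction of the 40 complex Gabor responses per node is assumed to be available elsewhere.

```python
import numpy as np

def jet_similarity(jet_a, jet_b):
    """Normalized dot product between the moduli of two Gabor jets (40 coefficients)."""
    ma, mb = np.abs(jet_a), np.abs(jet_b)
    return float(ma @ mb / (np.linalg.norm(ma) * np.linalg.norm(mb)))

def face_score(jets_a, jets_b, fuse=np.median):
    """Eq. (17): combine the 62 local jet similarities with a generic rule F."""
    local_scores = [jet_similarity(ja, jb) for ja, jb in zip(jets_a, jets_b)]
    return float(fuse(local_scores))
```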

VII. FACE AUTHENTICATION ON THE XM2VTS DATABASE

Using the XM2VTS database [22], authentication experiments were performed on configuration I of the Lausanne protocol [24] in order to confirm the advantages of modifying only pose parameters over warping onto a mean shape.

A. Database and Experimental Setup

The XM2VTS database contains mainly frontal face images of 295 subjects (200 clients, 25 evaluation impostors, and 70 test impostors) recorded during four sessions taken at one-month intervals. The database was divided into three sets: a training set, an evaluation set, and a test set. The training set was used


TABLE III
CONFIDENCE INTERVAL AROUND Δ = HTER_A − HTER_B FOR z_{α/2} = 1.645

Fig. 22. Taking advantage of facial symmetry in PTW.

to build client models, while the evaluation set was used to estimate thresholds. Finally, the test set was employed to assess system performance.

As explained in Section VI, 62 jets were computed for every image, thus obtaining 62 local scores when comparing two faces. The median rule was used to fuse these scores, i.e., F = median. According to configuration I, 3 training images are available per client. Hence, when a test image claims a given identity, we obtain 3 scores, which may be fused in order to improve the results. Again, the median rule was used to combine these values, obtaining a final score ready for verification.

We compared the performance of the different methods presented in Section V. More concisely:
• WMS:
  1) WMS_14: warping images onto a mean shape using the same set of 14 "anchor" points employed in [18];
  2) WMS_62: warping images onto a mean shape using 62 "anchor" points;
• NFPW: normalizing only the subset of pose parameters to adopt a frontal mesh;
• PTW: warping one image to adopt the pose of the other one.
Table II shows the false acceptance rate (FAR), false rejection rate (FRR), and total error rate (TER) over the test set mentioned before. Moreover, the last row of this table presents the baseline results when no pose correction is applied (baseline).
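A sketch of the two-level median fusion just described (62 local jet scores per image pair, then 3 per-template scores per claim); the score argument stands for a jet-set comparison function such as the face_score sketch of Section VI.

```python
import numpy as np

def claim_score(test_jets, client_templates, score):
    """Fuse the scores of a claim against the training images of the claimed client.

    client_templates: list of stored jet sets (3 per client in configuration I).
    score:            callable comparing two jet sets, e.g. face_score above.
    """
    per_template = [score(test_jets, tpl) for tpl in client_templates]
    return float(np.median(per_template))

# Verification: accept the claim if the fused score exceeds a threshold
# estimated on the evaluation set.
# accept = claim_score(test_jets, templates, face_score) >= threshold
```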

We should remark that facial symmetry was not taken into account for these experiments. Pose variation is not a major characteristic of the XM2VTS database, and the impact of facial symmetry is not very impressive on it. However, we will show with tests on the CMU PIE database that, in the presence of large pose changes, performance is significantly improved if symmetry is used.

B. Statistical Analysis of the Results

In [25], the authors adapt statistical tests to compute confidence intervals around half total error rate (HTER) measures, and to assess whether statistically significant differences exist between two approaches. Given methods A and B, with respective performances HTER_A and HTER_B, we compute a confidence interval (CI) around Δ = HTER_A − HTER_B. Clearly, if the range of obtained values is symmetric around 0, we cannot say the two methods are different. The confidence interval is given by Δ ± z_{α/2}·σ_Δ, where

σ_Δ = sqrt( σ_A² + σ_B² )    (18)

and, for each method X ∈ {A, B}, the standard error follows from the binomial variances of its FAR and FRR estimates:

σ_X² = FAR_X·(1 − FAR_X)/(4·NI) + FRR_X·(1 − FRR_X)/(4·NC)    (19)

In (18) and (19), NC stands for the number of client accesses, while NI stands for the number of impostor trials.


For each comparison between the methods presented in Table II, we calculated confidence intervals, which are shown in Table III. From both tables, we can conclude the following.
1) Although pose variation is not a major characteristic of the XM2VTS database, it is clear that the use of both NFPW and PTW significantly improved system performance compared to the baseline method.
2) Warping both images to a mean shape suffers from the greatest degradation in performance. It is clear that synthesizing face images with WMS does seriously distort the "identity" of the warped image, as the performances of the baseline algorithm and the two WMS methods are very similar (robustness to pose provokes suppression of subject-specific information, leading to no improvement at all). Furthermore, we assess that significant differences are present when comparing WMS with NFPW and PTW, as the confidence intervals do not include 0 in their range of values.
3) There are no statistically significant differences between WMS_14 and WMS_62, as the confidence interval is symmetric around 0.
4) For the same reason, no significant differences are present between NFPW and PTW. In Section V, we stated that warping the near profile image reported slightly better results over a video sequence, and this fact was confirmed on the XM2VTS database, as warping the near frontal face gave a total error rate of 5.45% (compared to 4.76%). However, we cannot conclude that these methods are statistically significantly different, as the corresponding confidence interval included 0 in its range of values. In the next section, where results on the CMU PIE database are presented, we will follow the scheme of Fig. 22, performing the two warps and averaging the scores obtained from both comparisons. As stated before, the use of symmetry on the XM2VTS database leads to small improvements that are not statistically significant with respect to the non-symmetry versions.

In the previous sections, it was highlighted that the images synthesized using both NFPW and PTW are not suitable for holistic feature extraction, but this is not the case of WMS. In order to assess the performance of a (baseline) holistic method on this database, we applied eigenfaces [3] to the images generated through WMS and obtained a TER of 16.27%, which is significantly worse than that of the local feature extraction on WMS images, with a confidence interval around Δ of [2.95%, 6.01%].

VIII. FACE IDENTIFICATION ON THE CMU PIE DATABASE

Up to now, the obtained results have shown the benefits of normalizing only pose parameters, but the performance of the methods under large pose variations has not been assessed yet. Moreover, we want to test whether there exist improvements when facial symmetry is taken into account. For these purposes, we used the pose subset of the CMU PIE database that was introduced in Section III-C.

A. Experimental Setup

Following [27], we distinguish between gallery and probe images.


TABLE IV IDENTIFICATION RATES (%) ON THE CMU PIE DATABASE: NO POSE CORRECTION

TABLE V IDENTIFICATION RATES (%) ON THE CMU PIE DATABASE: NFPW WITHOUT FACIAL SYMMETRY

TABLE VI IDENTIFICATION RATES (%) ON THE CMU PIE DATABASE: NFPW PLUS FACIAL SYMMETRY

The gallery contains images of known individuals, which are used to build templates, and the probe set contains images of subjects with unknown identity that must be compared against the gallery. A closed universe model is used to assess system performance, meaning that every subject in the probe set is also present in the gallery. We did not restrict ourselves to working with frontal faces as gallery. Instead, the performance of the system was computed for all possible (gallery, probe) combinations.

B. Our Results

Table IV shows the baseline results when no pose correction is applied. The average recognition rate is 68.38%. When the NFPW method is used, the correct identification rate increases to 78.46% (Table V). However, results are poor for completely different viewpoints. As can be seen from Table VI, performance is improved if facial symmetry is taken into account, leading to an average recognition rate of 87.50%.

It seems a rather safe hypothesis to think that better results could be obtained if pose is transferred and facial symmetry is used (PTW plus symmetry), especially when viewpoints are quite different. The results shown in Table VII confirm this supposition. The average recognition rate over all (gallery, probe) pairs is 91.47%, and the highest improvements are achieved when the gallery and probe sets are facing opposite directions, that is, pairs (11,05), (05,11), (11,37), (37,11), (29,05), (05,29), (29,37), and (37,29) (the leftmost bottom four cells and the rightmost top four cells). The average recognition rate in these cells increases from 74.63% using NFPW and symmetry to 84.56% using PTW and symmetry. The average recognition rate in the other cells is the same (96.08%) for the two methods.
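For completeness, a sketch of the closed-universe (rank-1) identification used in these experiments; again, score stands for any jet-set comparison function, such as the face_score sketch of Section VI.

```python
def identify(probe_jets, gallery, score):
    """Rank-1 closed-set identification: return the gallery identity with the highest score.

    gallery: dict mapping subject id -> jet set of that subject's gallery image.
    score:   callable comparing two jet sets (e.g. face_score from Section VI).
    """
    return max(gallery, key=lambda sid: score(probe_jets, gallery[sid]))

def identification_rate(probes, gallery, score):
    """Fraction of (true_id, jets) probes whose true identity ranks first."""
    hits = sum(identify(jets, gallery, score) == true_id for true_id, jets in probes)
    return hits / len(probes)
```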


TABLE VII IDENTIFICATION RATES (%) ON THE CMU PIE DATABASE: PTW PLUS FACIAL SYMMETRY

TABLE X IDENTIFICATION RATES (%) ON THE CMU PIE DATABASE: VISIONICS’ FACEIT RESULTS [17]

TABLE VIII IDENTIFICATION RATES (%) ON THE CMU PIE DATABASE: 3-D MORPHABLE MODEL WITH LIST FITTING ALGORITHM [10]

Fig. 23. Near profile image from subject 04004 of the PIE database.

TABLE IX
IDENTIFICATION RATES (%) ON THE CMU PIE DATABASE: OTHER RESULTS

Fig. 24. Effect of large b_2 values on the reconstructed shapes. The "occluded" contour, marked with a blue dashed line, seems to disappear behind the visible features.

C. Other Researchers' Results

Table VIII presents the recognition rates achieved with the 3-D morphable model-based face recognition system described in [10]. The average recognition rate is 88.45%. As we can see, PTW plus symmetry achieves better performance over the set of considered poses. Although their approach is semiautomatic, some parameters used for algorithm initialization, such as the face pose, focal length, etc., are computed using data provided by the maker of the database.

In [7] and [8], only frontal images were used as gallery. The recognition rates for these two methods are shown in the first two rows of Table IX, with averages of 85.5% and 94.87%, respectively. For the same gallery, NFPW plus symmetry obtains 95.59% and PTW plus symmetry achieves 97.43% correct recognition rate.

In [15], the authors propose an appearance-based algorithm that uses a special kind of holistic feature, the so-called eigen light-field (ELF), for face recognition. From the third to the sixth row, Table IX presents the results achieved with two different versions of the ELF approach: the 3-point ELF (3 points, the eyes and mouth, are used to warp the face image) and the Complex ELF (where a set of manually annotated points is used for the normalization). Due to the use of manual landmarks, the latter is especially suitable for comparison with our method. We can see that PTW plus symmetry outperforms

the Complex ELF in the range of considered poses⁴ (93% compared to 82.5% correct recognition rate). Reference [15] also presented the performance of a baseline method using holistic feature extraction (eigenfaces). The results achieved with this method are shown in the last two rows of Table IX and, as can be seen, they are even worse than those of our baseline algorithm.

⁴At the moment of writing this paper, only numerical results with poses c27 and c37 as gallery could be obtained from the authors of [15].

In [17], results obtained with Visionics' face recognition module FaceIt were presented. Table X shows its performance on the set of considered poses. If only one image is used as gallery (first five rows), the average recognition rate is 66.10%. This performance is similar to the one obtained with our baseline method, where no pose correction was applied. Clearly, correcting pose with any of the approaches we have proposed provides better results than FaceIt. If not only frontal images, but also faces rotated to the right and to the left are used as gallery (last two rows of Table X), the general performance is improved and the recognition rate, for a given probe pose, is approximately limited by the performance of the best (single) gallery pose.

D. Testing the System With Large Pose Changes

Up to now, we have used images from the PIE database ranging from −45° to +45° of horizontal rotation.


Fig. 25. Upper row: Examples of virtual images using the whole set of 62 landmarks. As we can see, serious distortions are induced in the presence of large rotations. Lower row: Synthesized faces using the set of visible landmarks. Clearly, images under large pose changes are much more realistic and seem to preserve identity information correctly.

It has been demonstrated, both with realistic examples (Section IV-B) and with good identification rates, that the method is suitable for face recognition across pose within the mentioned range of angles. These results are in agreement with [18], [32], where the authors state that a linear model is enough to simulate large changes in viewpoint, as long as all the landmarks remain visible. But what if we exceed this range? Is the system able to deal with such extreme pose variations? As discussed in [32], when some of the landmarks become occluded, the assumptions of the linear model break down and therefore it is no longer valid. Clearly, in a near profile image (see Fig. 23) we cannot use the original set of landmarks from Fig. 1, because some of them are occluded.

Based on the previous statement ("a linear model is enough to simulate large changes in viewpoint, as long as all the landmarks remain visible") and for large rotation angles, we decided to use the restricted set of visible landmarks: after training the PDM, we assess by visual inspection the specific values of b_2 that start producing severe occlusions in both directions, and determine the subset of landmarks that remains visible in the presence of such extreme rotations. Fig. 24 shows the effect of using high b_2 values on the reconstructed shape. Apparently, the "occluded" contour seems to really disappear behind the visible features, while the non-occluded landmarks still define plausible (rotated) face shapes. Hence, in order to avoid distortions during the warping process, we discard the "occluded" landmarks and their corresponding regions. The upper row of Fig. 25 shows the result of using the whole set of 62 landmarks in the warping process. Clearly, the facial images are seriously distorted when large pose changes are induced. More realistic faces can be synthesized with the restricted set of visible landmarks (lower row of Fig. 25).

In order to assess the performance of the system in the presence of large rotations, we used the faces from poses 11, 29, 27, 05, and 37 as gallery and the images from poses 02 and 14 (±67.5° of rotation) as probe. For these two poses, the whole set of 62 landmarks is not visible and, hence, we are not able to obtain the vector of shape parameters nor the pose parameters as explained in Sections II and III. For the same reason, we cannot apply NFPW in a straightforward manner. However, we will demonstrate that a variant of the PTW method is useful in this case.

In Section III-C, it was shown that there exists an approximate linear relationship between b_2 and the angle of horizontal rotation θ. Although we know that, in the presence of large rotations, the linear dependence may not hold, estimates of the b_2 values for θ = +67.5° and θ = −67.5° were computed assuming linearity.

Fig. 26. First and third columns show original images at poses 14 and 02, respectively, while the second and fourth columns present the corresponding synthesized images using the variant of the PTW method introduced in Section VIII-D.

For each of the gallery images, the b_2 value of its corresponding mesh was set to the estimate corresponding to +67.5° or −67.5° (according to the probe pose), and virtual meshes were obtained. Finally, taking facial symmetry and the subset of visible landmarks into account, we synthesized the virtual faces. In Fig. 26, we can see several examples of warped images obtained with this procedure, along with the original faces at poses 02 and 14. It is clear that:
• corresponding virtual and original images show a similar pose; hence, the procedure of estimating values for b_2 assuming linearity turned out to work quite well. However, the reconstructed face does not always adopt the correct pose (see, for instance, the last two columns in the second row of Fig. 26);
• identity is preserved in the synthesized images. Although large changes in pose are induced, the PDM and the warping process do not introduce serious distortions when only visible landmarks are used. However, some features, such as the shape of the nose, cannot always be reconstructed accurately due to the lack of 3-D information in the 2-D model.
Table XI shows the results for every (gallery, probe) pose combination. The average recognition rate is 77.5%.
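A sketch of the linear estimation mentioned above: a least-squares line is fitted to the per-pose mean b_2 values of Table I and evaluated at ±67.5°; numpy.polyfit is used for the fit, and the actual mean values are not reproduced here.

```python
import numpy as np

def extrapolate_b2(angles, mean_b2, target_angle):
    """Estimate b_2 at target_angle from the approximately linear b_2-vs-angle
    relationship of Table I (least-squares line through the measured means)."""
    slope, intercept = np.polyfit(angles, mean_b2, deg=1)
    return slope * target_angle + intercept

# Example use with the five annotated PIE poses (mean_b2 holds the per-pose
# averages of b_2 measured from the annotated meshes):
# angles  = np.array([-45.0, -22.5, 0.0, 22.5, 45.0])
# b2_67p5 = extrapolate_b2(angles, mean_b2, 67.5)
```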



TABLE XI IDENTIFICATION RATES (%) ON THE CMU PIE DATABASE: TESTING THE SYSTEM WITH EXTREME POSE CHANGES

TABLE XII IDENTIFICATION RATES (%) ON THE CMU PIE DATABASE: 3-D MORPHABLE MODEL WITH LIST FITTING ALGORITHM [10]

Table XII presents the results for the same set using the 3-D morphable model with the LiST fitting algorithm [10]. The average recognition rate for this technique is 79.7%. As we can see, the variant of the PTW method achieves performance similar to that of the 3-D morphable model when near-profile faces are tested. However, using only 2-D information, we cannot expect good performance for full-profile images. In fact, the average recognition rate drops to 20% with poses 22 and 34 (full profile). In this case, the 3-D model clearly outperforms our system, with an average recognition rate of 55%.

IX. CONCLUSION

Based on a subset of the modes of a point distribution model, namely the pose parameters, we have proposed methods that aim to minimize differences in pose while preserving discriminative subject information. We have demonstrated that the identified pose parameters are mostly responsible for rigid mesh changes and do not contain important nonrigid (expression/identity) information that could severely distort the synthesized images. Qualitatively, we justified the benefits of normalizing only the pose parameters instead of warping onto an average face shape. This was quantitatively confirmed by authentication tests on the XM2VTS database, not only with a relative improvement of 31–35%, but also with the certainty that there exist statistically significant differences between both approaches. Moreover, the identification experiments on the CMU PIE database show that:
• taking advantage of facial symmetry clearly improves system performance;
• transferring pose performs better than normalizing to frontal pose;
• the proposed methods achieve state-of-the-art results, outperforming the 3-D morphable model and other approaches in a set of rotation angles ranging from -45° to +45°;

• the variant of PTW achieves performance similar to that of the 3-D morphable model when near-profile images are used, but degrades with full-profile views.
Hence, we have demonstrated the suitability of the methods (especially PTW) for face recognition across pose in a set of angles ranging from -45° to +45°, and shown that a 2-D model can deal with rotations up to 67.5°, obtaining performance similar to that achieved with a more complex 3-D system. However, the latter significantly outperforms the 2-D model when full-profile views are tested. Although some results with automatic fitting were presented in [30], showing little degradation in comparison with manually annotated landmarks, we have not yet validated our methods on the CMU PIE database with automatic face alignment. Hence, the next step will be to test the pose correction stage on this database with automatic fitting, using well-known techniques such as AAMs [21]. With the help of a face tracking algorithm such as [28], we plan to perform experiments on video sequences in order to test the suitability of the methods for pose-robust face recognition from video. Although the variant of the PTW method turned out to work quite well, we should still refine this algorithm for near-profile views. Another possible improvement could be to learn a view-based weight function, so that, depending on the current pose of the face, some regions get more importance than others in the computation of the final similarity score.

ACKNOWLEDGMENT

The authors would like to thank F. Sukno for proofreading this paper and for his helpful comments, and I. Alonso for his help with the statistical analysis of the results. They would also like to thank S. Baker and R. Gross for their great support regarding the CMU PIE database, the FGnet project for providing very useful annotations of face images, and the reviewers for helpful comments that improved the quality of this paper.

REFERENCES

[1] D. J. Beymer, "Face recognition under varying pose," in Proc. IEEE Conf. CVPR, 1994, pp. 756–761.
[2] D. J. Beymer and T. Poggio, "Face recognition from one example view," in Proc. Int. Conf. Computer Vision, 1995, pp. 500–507.
[3] M. Turk and A. Pentland, "Eigenfaces for recognition," J. Cognit. Neurosci., vol. 3, pp. 72–86, 1991.
[4] A. Pentland, B. Moghaddam, and T. Starner, "View-based and modular eigenspaces for face recognition," in Proc. IEEE Conf. CVPR, 1994, pp. 84–91.
[5] T. Maurer and C. von der Malsburg, "Single view based recognition of faces rotated in depth," in Proc. Int. Workshop on Automatic Face and Gesture Recognition, 1996, pp. 176–181.
[6] L. Wiskott, J. M. Fellous, N. Kruger, and C. von der Malsburg, "Face recognition by elastic bunch graph matching," IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 7, pp. 775–779, Jul. 1997.
[7] X. Chai, S. Shan, X. Chen, and W. Gao, "Local linear regression (LLR) for pose invariant face recognition," in Proc. 7th Int. Conf. Automatic Face and Gesture Recognition, Southampton, U.K., 2006, pp. 631–636.
[8] X. Chai, L. Qing, S. Shan, X. Chen, and W. Gao, "Pose invariant face recognition under arbitrary illumination based on 3D face reconstruction," in Proc. Audio- and Video-Based Biometric Person Authentication, New York, 2005, pp. 956–965.
[9] V. Blanz and T. Vetter, "A morphable model for the synthesis of 3D faces," in Proc. SIGGRAPH, 1999, pp. 187–194.


[10] S. Romdhani, V. Blanz, and T. Vetter, "Face identification by fitting a 3D morphable model using linear shape and texture error functions," in Proc. Eur. Conf. Computer Vision, Copenhagen, Denmark, 2002, pp. 3–19.
[11] V. Blanz, P. Grother, P. J. Phillips, and T. Vetter, "Face recognition based on frontal views generated from non-frontal images," in Proc. IEEE Conf. CVPR, 2005, pp. 454–461.
[12] L. Zhang and D. Samaras, "Face recognition from a single training image under arbitrary unknown lighting using spherical harmonics," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 3, pp. 351–363, Mar. 2006.
[13] M. W. Lee and S. Ranganath, "Pose-invariant face recognition using a 3D deformable model," Pattern Recognit., vol. 36, no. 8, pp. 1835–1846, 2003.
[14] T. Kanade and A. Yamada, "Multi-subregion based probabilistic approach toward pose-invariant face recognition," in Proc. IEEE Int. Symp. Computational Intelligence in Robotics and Automation, Kobe, Japan, Jul. 2003, pp. 954–959.
[15] R. Gross, I. Matthews, and S. Baker, "Appearance-based face recognition and light-fields," IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 4, pp. 449–465, Apr. 2004.
[16] X. Liu and T. Chen, "Pose-robust face recognition using geometry assisted probabilistic modeling," in Proc. IEEE Conf. CVPR, 2005, pp. 502–509.
[17] R. Gross, J. Shi, and J. Cohn, "Quo Vadis face recognition?," presented at the 3rd Workshop on Empirical Evaluation Methods in Computer Vision, Kauai, HI, Dec. 2001.
[18] A. Lanitis, C. J. Taylor, and T. F. Cootes, "Automatic interpretation and coding of face images using flexible models," IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 7, pp. 743–756, Jul. 1997.
[19] T. Cootes, C. Taylor, D. Cooper, and J. Graham, "Active shape models—Their training and application," Comput. Vis. Image Understanding, vol. 61, no. 1, pp. 38–59, 1995.
[20] F. Sukno, S. Ordas, C. Butakoff, S. Cruz, and A. Frangi, "Active shape models with invariant optimal features (IOF-ASMs)," in Proc. Audio- and Video-Based Biometric Person Authentication, 2005, pp. 365–375.
[21] T. Cootes, G. Edwards, and C. Taylor, "Active appearance models," IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 6, pp. 681–685, Jun. 2001.
[22] K. Messer, J. Matas, J. Kittler, J. Luettin, and G. Maitre, "The extended M2VTS database," in Proc. Audio- and Video-Based Biometric Person Authentication, 1999, pp. 72–77. [Online]. Available: http://www.ee.surrey.ac.uk/Research/VSSP/xm2vtsdb/
[23] F. L. Bookstein, "Principal warps: Thin-plate splines and the decomposition of deformations," IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-11, no. 6, pp. 567–585, Jun. 1989.
[24] J. Luettin and G. Maître, "Evaluation protocol for the extended M2VTS database (XM2VTSDB)," IDIAP, Tech. Rep. RR-21, 1998.
[25] S. Bengio and J. Mariethoz, "A statistical significance test for person authentication," in Proc. Odyssey, 2004, pp. 237–244.
[26] T. Sim, S. Baker, and M. Bsat, "The CMU pose, illumination, and expression database," IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 12, pp. 1615–1618, Dec. 2003.


[27] P. J. Phillips, H. Moon, S. Rizvi, and P. Rauss, "The FERET evaluation methodology for face recognition algorithms," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 10, pp. 1090–1104, Oct. 2000.
[28] S. Baker and I. Matthews, "Equivalence and efficiency of image alignment algorithms," in Proc. IEEE Conf. CVPR, 2001, pp. 1090–1097.
[29] D. González-Jiménez and J. L. Alba-Castro, "Pose correction and subject-specific features for face authentication," in Proc. ICPR, 2006, vol. 4, pp. 602–605.
[30] D. González-Jiménez, F. Sukno, J. L. Alba-Castro, and A. Frangi, "Automatic pose correction for local feature-based face authentication," in Proc. IAPR Conf. Articulated Motion and Deformable Objects, 2006, pp. 356–365.
[31] M. J. Lyons, R. Campbell, A. Plante, M. Coleman, M. Kamachi, and S. Akamatsu, "The Noh mask effect: Vertical viewpoint dependence of facial expression perception," Proc. Roy. Soc. London B, vol. 267, pp. 2239–2245, 2000.
[32] T. F. Cootes, K. N. Walker, and C. J. Taylor, "View-based active appearance models," in Proc. Int. Conf. Face and Gesture Recognition, 2000, pp. 227–232.
[33] S. Romdhani, S. Gong, and A. Psarrou, "A multi-view nonlinear active shape model using kernel PCA," in Proc. British Machine Vision Conf., 1999, pp. 483–492.

Daniel González-Jiménez received the Telecommunications Engineer degree from the Universidad de Vigo, Vigo, Spain, in 2003, where he is currently pursuing the Ph.D. degree in the field of face-based biometrics. His research interests include computer vision and image processing.

José Luis Alba-Castro received the M.Sc. and Ph.D. degrees (Hons.) in telecommunications engineering from the Universidad de Santiago, Santiago, Spain, in 1990, and the Universidad de Vigo, Vigo, Spain, in 1997, respectively. His research interests include computer vision, statistical pattern recognition, automatic speech and speaker recognition, and image-based biometrics. He has written several papers and led several R&D projects on these topics. He is an Associate Professor of Discrete Signal Processing, Pattern Recognition, Image Processing, and Biometrics at the Universidad de Vigo.
