Symmetry-based Face Pose Estimation from a Single Uncalibrated View

Vinod Pathangay and Sukhendu Das
Visualization and Perception Laboratory
Indian Institute of Technology Madras, Chennai-600036, India
[email protected], [email protected]

T. Greiner
Faculty of Engineering, Pforzheim University
Tiefenbronner Str. 65, 75175 Pforzheim, Germany
[email protected]

Abstract

In this paper, a geometric method for estimating the face pose (roll and yaw angles) from a single uncalibrated view is presented. The symmetric structure of the human face is exploited by taking the mirror image (horizontal flip) of a test face image as a virtual second view. Facial feature point correspondences are established between the given test image and its mirror image using an active appearance model. Thus, the face pose estimation problem is cast as a two-view rotation estimation problem. By using the bilateral symmetry, roll and yaw angles are estimated without the need for camera calibration. The proposed pose estimation method is evaluated on synthetic and natural face datasets, and the results are compared with an eigenspace-based method. The proposed symmetry-based method is shown to perform comparably to the eigenspace-based method on both synthetic and real face image datasets.

1. Introduction

Computer vision algorithms that analyze human faces have to deal with significant variations in pose. The human ability to maintain visual constancy across different poses of the face is also required of algorithms that recognize faces, facial expressions and gestures. The pose of the face (in terms of pitch, yaw and roll angles) can thus be a significant input to applications that analyze human faces. The human face exhibits near bilateral symmetry. Although facial symmetry does not hold at a fine level of detail, it is apparent at a coarser level. In this paper, we exploit this coarse symmetry of faces to estimate the roll and yaw angles. In order to exploit symmetry, the test face image is flipped about the vertical axis to get a mirror image. This mirror image is treated as a virtual second view of the face after rotation. Therefore the pose estimation problem is cast as a rotation estimation problem from two views (either with the face stationary and two convergent cameras, or with a single camera and a rotating face). The block diagram of the proposed method is shown in Fig. 1. The input image (obtained after applying a face detection algorithm [19]) is flipped along the vertical axis


Figure 1: Block diagram of the proposed method (input image → flip → mirror image → feature point localization → correspondences → rotation → roll, yaw).

(at the center of the image) to obtain a corresponding mirror image. Facial feature points are located on both these images using an active appearance model [12] to obtain the correspondences. The correspondences are used as input to the rotation model for estimating the roll and yaw angles. This paper is organized as follows. In section 2, related work on face pose estimation is reviewed. In section 3, the rotation model for pose estimation is derived. In section 4, we discuss the experimental results of the proposed pose estimation technique on synthetic and real (public domain) face image datasets, and compare the accuracy of the proposed method against a baseline eigenspace-based pose estimation method for each dataset. In section 5, we discuss the limitations of the proposed method as well as possible further improvements of this work.

2. Related work

An extensive list of previous work on face pose estimation can be obtained from [1]. Most work on face pose estimation can be classified into appearance-based, feature-based and geometry-based methods. Appearance-based methods use linear subspace decomposition and other non-linear variants to model appearances of different poses of the face [2], [3], [4], [5], [6]. The method presented in [2] uses a weighted linear combination of local

linear models in a pose parameter space. In [3], an independent component analysis (ICA) based approach is presented for learning view-specific subspace representations for different face poses in an unsupervised manner. A nonlinear interpolation method to estimate the head pose between two training views using Fisher manifold learning is presented in [5]. In [6], biased manifold embedding is proposed for estimating face pose under the assumption that face images lie on a smooth lower-dimensional manifold in a higher-dimensional feature space. Appearance-based methods can be used for large pose angles (-90º to +90º) with a step of 2º (as reported in [6]). However, training images are necessary for each step angle.

In feature-based methods, the pose-specific properties of certain points on the face (eye, nose etc.) are represented as feature templates. One of the commonly used features is the Gabor jet [7], which is derived from a set of linear filter operations in the form of convolutions of the image with a set of Gabor wavelets with parameterized orientations and wavelengths. Such points and their features can be connected to form pose-specific attribute graphs or bunch graphs [8], [7]. Another feature-based method uses the iterated closest point algorithm to match facial feature curves and edges with a generic template [20]. Several low-level features such as colour, area, and moments of the segmented face region have also been used [21].

Geometric methods, on the other hand, depend on the relationship between the locations of certain feature points on the face images and derive an analytical solution for the face pose angles. Epipolar geometry and a weak-perspective camera model are used to obtain face pose by tracking facial feature points [9]. Facial asymmetry is used for coarse-pose estimation in [10], where an iterative 2D-to-3D geometric model-based matching is used to obtain the fine pose. In [11], a relation between the facial feature locations in the image and the facial normal and gaze direction is derived from ratios of lengths between the feature points using a weak-perspective projection. Geometric methods have been used to obtain continuous values of face pose. Some geometric methods (e.g. [9], [24]) use multiple views of the moving face to track features between views for estimating rotation. In [24], symmetry of the face has been used to compute the rotation angles using differences in left-right feature projections. Other methods ([10], [11]) locate specific facial features and estimate pose from a single view based on the spatial relationship between the feature locations.

In this paper, we present a geometric method for estimating pose from a single view using the symmetric property of the face. We use an active appearance model (AAM) for localization of facial feature points [12] and a rotation model to estimate roll and yaw. In the following section, we discuss the rotation model that exploits the symmetry of the face for pose estimation.

Figure 2: Calculation of roll: α is the 2D rotation angle between the test image (left) and the mirror image (right). The roll angle with respect to the vertical axis is half the rotation angle (α/2).

3. Rotation model

As shown in the block diagram in Fig. 1, the second virtual view is synthesized by flipping the given test face image about its vertical axis. Here we use the bilateral symmetry of the face, under which reflection appears to be equivalent to a rotational transformation. Using this assumption, we describe two rotation models for estimating the roll and yaw angles of the face.

3.1 Roll estimation

Roll is essentially an in-plane rotation of the face. This is modeled as a 2D rotation. Fig. 2 shows the in-plane 2D rotation between the test image and its mirror image. The roll angle with respect to the vertical axis (shown dashed) is half of α. Let the facial feature point correspondences from the test image and its mirror image be represented as x_i and x'_i (image coordinates represented in homogeneous form). The relation between correspondences for pure roll is given by

x'_i = \begin{bmatrix} a & b & 0 \\ -b & a & 0 \\ 0 & 0 & 1 \end{bmatrix} x_i    (1)

where a and b are the cosine and sine of the 2D rotation angle α (as shown in Fig. 2). With a sufficient number of correspondences, an over-determined linear system of equations can be set up to estimate a and b with the additional constraint that a^2 + b^2 = 1.
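As a concrete illustration, a minimal NumPy sketch of this estimation step is given below (an illustration only, not the authors' implementation). It stacks the correspondences into an over-determined linear system in a and b, solves it in the least-squares sense, re-normalizes so that a^2 + b^2 = 1, and returns half of the recovered 2D rotation as the roll angle; the feature points are assumed to have been translated so that the rotation is about the origin.

    import numpy as np

    def estimate_roll(pts, pts_mirror):
        """Estimate the roll angle from 2D feature correspondences via eq. (1).

        pts, pts_mirror : (N, 2) arrays of matching feature locations in the
        test image and its mirror image, assumed already translated so that
        the in-plane rotation is about the origin.
        """
        x, y = pts[:, 0], pts[:, 1]
        xp, yp = pts_mirror[:, 0], pts_mirror[:, 1]

        # Eq. (1) written out:  x' =  a*x + b*y
        #                       y' = -b*x + a*y   (linear in the unknowns a, b)
        A = np.concatenate([np.stack([x, y], axis=1),
                            np.stack([y, -x], axis=1)])
        rhs = np.concatenate([xp, yp])
        (a, b), *_ = np.linalg.lstsq(A, rhs, rcond=None)

        # Enforce the constraint a^2 + b^2 = 1 by projecting onto the unit circle.
        norm = np.hypot(a, b)
        a, b = a / norm, b / norm

        alpha = np.degrees(np.arctan2(b, a))  # 2D rotation between the two views
        return alpha / 2.0                    # roll with respect to the vertical axis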

3.2 Yaw estimation

Fig. 4 illustrates the symmetry assumption used for estimating yaw. The test image and its mirror image are considered as two different views of the face. Both images can be considered either as taken from two different convergent cameras, or as images taken from a stationary camera while the face undergoes a 3D yaw rotation. We use the latter assumption of a stationary camera and rotating face as it makes the formulation simpler. Unlike roll, yaw rotations are in-depth and therefore we use a camera model, since our measurements of the points on the 3D face are made in 2D images. We use a canonical perspective camera model that is general enough for most types of facial images. Also, we do not use any calibration information, as most face images extracted from the web or personal image collections are acquired with various cameras with unknown calibration.

We use the notation followed in [14]. A 3D point X on the face is imaged as x and x', before and after flipping (or rotation) respectively. As shown in Fig. 3, we place the camera such that its center coincides with the origin of the world coordinates. The face is located along the negative Z-axis with a displacement t_z. We assume that, due to the symmetry, flipping the face image is the same as rotating the face about an axis parallel to the Y-axis located at t_z from the origin. This motion involves: i) translation of the face to the origin along the Z-axis (M1), ii) rotation of the face about the Y-axis (M2, Fig. 3), and iii) translation back to the original location at a distance t_z from the origin along the negative Z-axis (M3), where M1, M2 and M3 are 4x4 motion matrices. Therefore the motion matrix is of the form M = M1 M2 M3, which is expanded as follows

Figure 3: Relative placement of camera and face; camera center is at the origin.

M = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & -t_z \\ 0 & 0 & 0 & 1 \end{bmatrix}
    \begin{bmatrix} c & 0 & d & 0 \\ 0 & 1 & 0 & 0 \\ -d & 0 & c & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
    \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & t_z \\ 0 & 0 & 0 & 1 \end{bmatrix}    (2)

Here c and d are the cosine and sine of the 3D rotation angle β (as shown in Fig. 4). The projection of a 3D point X on the test image is x = PX, where P is the 3x4 canonical perspective camera matrix of the form P = [ I_{3x3} | 0_{3x1} ]. After rotation, we get x' = PX' = PMX.
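The composition in (2) can be illustrated with the following NumPy sketch (an illustration under the stated assumptions, not the authors' code); the face point, rotation angle and t_z value are arbitrary examples.

    import numpy as np

    def motion_matrix(beta_deg, tz):
        """Compose the 4x4 motion matrix M of eq. (2) for a rotation of beta degrees
        about an axis parallel to Y located at distance tz along the negative Z-axis."""
        c, d = np.cos(np.radians(beta_deg)), np.sin(np.radians(beta_deg))
        t_back = np.eye(4)
        t_back[2, 3] = -tz                 # translate back to z = -tz
        r_y = np.array([[c, 0.0, d, 0.0],
                        [0.0, 1.0, 0.0, 0.0],
                        [-d, 0.0, c, 0.0],
                        [0.0, 0.0, 0.0, 1.0]])   # rotation about the Y-axis
        t_orig = np.eye(4)
        t_orig[2, 3] = tz                  # translate the face to the origin
        return t_back @ r_y @ t_orig

    P = np.hstack([np.eye(3), np.zeros((3, 1))])  # canonical camera [I | 0]
    X = np.array([0.1, 0.05, -5.0, 1.0])          # a 3D face point (homogeneous), tz = 5
    M = motion_matrix(30.0, 5.0)

    x = P @ X        # homogeneous projection before the rotation
    xp = P @ M @ X   # homogeneous projection after the rotation, x' = PMX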

Figure 4: Calculation of yaw: β is the 3D rotation angle between the test image (left) and the mirror image (right). The yaw angle with respect to the frontal axis (dashed) is half the rotation angle (β/2).

The relation between the correspondences x and x' is given by

x' = P M P^{+} x    (3)

where P^{+} is the pseudo-inverse of the matrix P. On substituting M and P in (3), we get

x' = \begin{bmatrix} c & 0 & d \\ 0 & 1 & 0 \\ -d & 0 & c \end{bmatrix} x    (4)

This shows that the relationship between x and x' is independent of t_z. In order to have a robust estimate of yaw, we use a bi-directional error function given by

e = \sum_i ( | x'_i - R x_i |^2 + | x_i - R^T x'_i |^2 )    (5)

where i is the point index and R is the 3x3 rotation matrix for pure yaw as in (4). The error function is iteratively minimized over a range of possible yaw angles (-45º to +45º). This range has been considered because all the facial features about the line of symmetry of the face (not the image) can be localized in this range. For yaw angles greater than |45º|, the natural self-occlusion of the face hinders localization of facial features on both sides of the nose (or line of symmetry). This has been observed empirically over a set of face images.
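A minimal NumPy sketch of this minimization is shown below (an illustration only, not the authors' implementation). It forms the matrix R(β) of (4) for candidate view-to-view rotation angles, evaluates the bi-directional error (5) on the homogeneous correspondences, and returns half of the best β as the yaw; since β is the rotation between the two views, it is searched over twice the yaw range.

    import numpy as np

    def yaw_rotation(beta_deg):
        """3x3 matrix of eq. (4) relating homogeneous correspondences under pure yaw."""
        c, d = np.cos(np.radians(beta_deg)), np.sin(np.radians(beta_deg))
        return np.array([[c, 0.0, d],
                         [0.0, 1.0, 0.0],
                         [-d, 0.0, c]])

    def estimate_yaw(x, x_mirror, betas=np.arange(-90.0, 90.5, 0.5)):
        """Search over the view-to-view rotation beta minimizing the bi-directional
        error of eq. (5); the yaw estimate is beta/2.

        x, x_mirror : (N, 3) arrays of corresponding points in homogeneous form
        [u, v, 1], normalized as described in Sec. 4.2.
        """
        best_beta, best_err = 0.0, np.inf
        for beta in betas:
            R = yaw_rotation(beta)
            # |x'_i - R x_i|^2 + |x_i - R^T x'_i|^2, summed over all points i
            err = (np.sum((x_mirror - x @ R.T) ** 2) +
                   np.sum((x - x_mirror @ R) ** 2))
            if err < best_err:
                best_beta, best_err = beta, err
        return best_beta / 2.0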

Figure 5: Synthetic images used for testing roll estimation.

For the angle β corresponding to the minimum error, the yaw angle with respect to the camera principal axis is β/2 (as shown in Fig. 4). In the following section, we show experimental results of the proposed method.

4. Experimental results

We evaluated the proposed pose estimation method on two synthetic and one natural face image dataset that have labeled pose variations. We also compared our yaw estimation results with a baseline eigenspace-based pose estimation method, which is a simplified version of [15].

4.1 Face datasets used

Different datasets were used to estimate roll and yaw. As the roll estimation method is relatively simple compared to yaw, we present results mainly for yaw estimation and show limited results for roll estimation. For evaluating roll estimation, we used rotated frontal images as shown in Fig. 5. For evaluating yaw estimation, three face datasets were used:

i) MPI face dataset [16] – consists of 100 male and 100 female synthetic face images in three poses (-30º, 0º and +30º). We used images of two male and two female subjects for training and the rest for testing.

ii) MIT-CBCL face recognition dataset [17] – consists of 10 subjects with face pose (yaw) ranging from -32º to 0º in steps of 4º. We used the synthetically rendered face images from the 'training-synthetic' directory with constant lighting.

iii) INRIA head pose dataset [18] – consists of real image sequences of 10 subjects with two sessions each, with combined yaw and pitch variations. We used the subset of images with pure yaw in the range -45º to +45º in steps of 15º.

For both the MIT-CBCL and INRIA head pose datasets, we used the first two subjects for training and the rest for testing.

Figure 6: Result of facial feature detection using an active appearance model on a test image (left) and its mirror image (right). The yaw angle in this case is 44º.

4.2 Pre-processing: Facial feature localization

The face region is automatically detected using the Haar-cascade technique [19]; we use the public domain OpenCV implementation for extracting faces. Next, facial features are localized and correspondences established between the test face and its mirror image. Feature-point matching using the Harris [22] and SIFT [23] methods produces highly erroneous correspondences between faces and their mirror images. In order to avoid such errors, we localize facial features using active appearance models (AAM). An AAM localizes feature points on the face image based on global structural constraints. We use a public domain implementation of active appearance models (aam-api [13]) trained on face contours for facial feature localization. We trained the AAM with 18-point contours (eyebrows, eyes and lips) for the range of poses in which both eyes are visible (i.e. the magnitude of the yaw angle is up to 45º). We did not use any contours on the nose, as the nose exhibits the maximum variation across poses. Fig. 6 shows the result of the AAM search applied to a face and its mirror image, with the correspondences overlaid. It can be observed that point 1 is reflected to point 10 in the mirror image. The AAM technique, however, maintains the structural relationship, and thus point 1 in both images corresponds to the first point on the left eyebrow (as viewed). The X-coordinate of the respective (face-centric) axial line (passing through the lower or upper middle lip point) is subtracted from the X-coordinates of all feature points in each image. This normalizes any relative translations induced by flipping. It may also be noted that the AAM search is performed separately on both images, and therefore the correspondences between the feature points localized on the test and mirror images are not related by an exact reflection.
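The pre-processing stage can be sketched as follows in Python/OpenCV (an illustration only; the input file name, the lip-point index and the locate_aam_points helper are placeholders, since the paper uses the aam-api library [13] for the AAM search).

    import cv2

    def locate_aam_points(image):
        # Placeholder for the AAM search of [13]; it should return an
        # (18, 2) float array of feature locations (eyebrows, eyes and lips).
        raise NotImplementedError

    LOWER_LIP_IDX = 17  # illustrative index of the lower middle lip point

    # Haar-cascade face detection as in [19]; the cascade file ships with OpenCV.
    detector = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")
    img = cv2.imread("input.jpg")
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    x, y, w, h = detector.detectMultiScale(gray, 1.1, 3)[0]  # first detected face
    face = img[y:y + h, x:x + w]
    mirror = cv2.flip(face, 1)  # horizontal flip: the virtual second view

    pts = locate_aam_points(face)
    pts_mirror = locate_aam_points(mirror)

    # Normalize out the flip-induced translation by subtracting the X-coordinate
    # of the face-centric axial line (here taken as the lower middle lip point).
    pts[:, 0] -= pts[LOWER_LIP_IDX, 0]
    pts_mirror[:, 0] -= pts_mirror[LOWER_LIP_IDX, 0]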

Figure 7: Results of the symmetry-based roll estimation method on different rotations of a frontal face. (Blue 'o' show the estimated roll and the red '+' show the ground truth.)

4.3 Roll estimation

Fig. 7 shows the results of roll estimation on a set of face images rotated at 20º intervals. The active appearance model is used to localize facial feature points in the test image and its mirror image, and the correspondences are used to estimate the in-plane rotation. The roll angle is taken as half the rotation angle, as discussed in Sec. 3. It can also be observed in Fig. 7 that there is an offset between the estimated roll and the ground truth. This is because of a tilt towards the left in the central (frontal) face image, which propagates to all the rotated images.

4.4 Yaw estimation

The baseline system used for comparison is an eigenspace-based method (similar to [15]) that classifies the face into classes of different yaw angles. The training face images for the eigenspace-based method are the same as those used for training the AAM in the symmetry-based method. For each face dataset, training face images with different values of yaw are resized to 32x32 and reordered to produce a 1024-dimensional column vector. Principal component analysis (PCA) is used to reduce the dimensionality of the feature vector. The projections of a test feature vector onto the first six eigenvectors form the reduced feature vector, and nearest neighbor classification is used to assign the pose of a test face image.
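A sketch of this baseline is given below (an illustration using scikit-learn rather than the authors' implementation; face images are assumed to be grayscale arrays).

    import numpy as np
    import cv2
    from sklearn.decomposition import PCA
    from sklearn.neighbors import KNeighborsClassifier

    def to_vector(img):
        """Resize a grayscale face image to 32x32 and reorder it into a 1024-d vector."""
        return cv2.resize(img, (32, 32)).astype(np.float64).ravel()

    def train_eigenspace_baseline(train_images, train_yaws):
        """Fit PCA on the training vectors and a 1-NN classifier on the projections."""
        X = np.stack([to_vector(im) for im in train_images])
        pca = PCA(n_components=6).fit(X)            # first six eigenvectors
        knn = KNeighborsClassifier(n_neighbors=1)   # nearest-neighbor pose classifier
        knn.fit(pca.transform(X), train_yaws)
        return pca, knn

    def predict_yaw(pca, knn, img):
        """Assign the yaw label of the nearest training face in eigenspace."""
        return knn.predict(pca.transform(to_vector(img)[None, :]))[0]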

Figure 8: Result of yaw estimation on a single subject of the MPI face dataset. (Yaw angles in degrees.)

Fig. 8 shows the result of yaw estimation for a single subject of the MPI face dataset. The blue 'o' indicate the result of the symmetry-based method and the red '+' indicate the ground truth. Table 1 shows a comparison of the symmetry-based and eigenspace-based methods. The errors in yaw estimation are averaged over 196 subjects (4 of the 200 subjects were used for training). It can be noted that the eigenspace-based method performs better than the symmetry-based method on the MPI dataset. This is because faces with -30º, 0º and +30º yaw angles form three well-separated classes in eigenspace that can be easily discriminated using the nearest neighbor classifier.

Fig. 9 shows the result of yaw estimation for subject 5 of the MIT-CBCL face recognition database. It can be observed that the faces with yaw angles -32º and -28º are wrongly estimated, because the facial features are not correctly localized by the AAM.

Figure 10: Results of yaw estimation on different subjects of the INRIA head pose dataset. Blue circles show estimated values and red '+' show ground truth. (Yaw angles in degrees.)

This leads to wrong correspondences and thus an erroneous yaw estimate. In Table 2, a comparison of the average yaw estimation error (averaged over all subjects) for the proposed symmetry-based method and the eigenspace-based method is shown. The first four subjects were used for training the AAM and the eigenspace-based method. It can be observed that the error for the eigenspace-based method is almost the same (around 6º) for all yaw angles. This is because the pose angles are estimated by nearest neighbor classification, and the misclassification rate (error in yaw estimate) remains almost the same due to the similarity between faces at 4º yaw intervals. On the other hand, the symmetry-based method shows a higher average error for larger yaw angles. Fig. 10 shows the yaw estimates for all test subjects in the INRIA head pose dataset. It can be noted that there is a larger error for larger angles. This trend is also observed for the MIT-CBCL dataset (Table 2). This can be justified as follows: i) for larger yaw angles, the face images deviate from the ideal canonical camera assumption, due to the larger effect of foreshortening at larger angles; ii) another source of error is the inaccuracy of the correspondences at large angles. Hence larger errors are observed for larger angles. The comparison of the proposed symmetry-based method with the eigenspace-based method is shown in Table 3 in terms of the average error in yaw estimation over all subjects. Here, the performance of the baseline eigenspace-based method is relatively poor compared to the other datasets. Unlike the previous two synthetic datasets, the lack of uniformity in face alignment after the face detection step contributed to the relatively low performance of the baseline method.

Figure 9: Result of yaw estimation on a single subject of the MIT-CBCL face dataset. Blue circles show estimated values and red '+' show ground truth. The inset face image shows an error in the yaw estimate due to wrong facial feature localization.

However, the proposed symmetry-based method is not affected by misalignment during face detection, as the AAM can localize feature points in such cases. The average time required for pre-processing is around 19 seconds, whereas the times required for roll and yaw estimation are 0.1 and 0.4 seconds respectively (evaluated with no optimizations on a 1.73 GHz CPU). Therefore, with faster pre-processing, the proposed symmetry-based pose estimation method can be used in real-time applications.

Table 1: Average yaw estimation errors (degrees) for the MPI dataset

Method              -30º     0º    +30º
Symmetry-based      -2.0    1.4     1.2
Eigenspace-based     0.3    0.0     0.5

Table 2: Average yaw estimation errors (degrees) for the MIT-CBCL dataset

Method               0º    -4º    -8º   -12º   -16º
Symmetry-based      3.0    1.6    1.6    2.0    6.7
Eigenspace-based    5.3    6.0    6.7    4.0    5.3

Method             -20º   -24º   -28º   -32º
Symmetry-based      6.7    6.7   12.3   13.3
Eigenspace-based    6.7    6.0    6.0    9.3

Table 3: Average yaw estimation errors (degrees) for the INRIA head pose dataset

Method             -45º   -30º   -15º     0º
Symmetry-based     22.4   14.0    5.8    4.8
Eigenspace-based   27.1   20.7   20.7   31.2

Method              15º    30º    45º
Symmetry-based      3.5   11.6   20.1
Eigenspace-based   16.7   10.9   26.5

5. Conclusions and future work

In this paper, we propose a symmetry-based face pose estimation method, where the problem of estimating the face pose is cast as a two-view rotation estimation problem. The roll angle is half the 2D in-plane rotation estimated between the test face image and its mirror image. The yaw angle is taken as half the 3D in-depth rotation between the two views generated similarly. The rotation angles are estimated using facial feature correspondences obtained with an active appearance model. Our experiments show that the eigenspace-based method performs better than the symmetry-based method for discriminating between coarse pose angles (such as -30º, 0º, +30º). For finer angles, the symmetry-based method outperforms the eigenspace-based method. The proposed method is also invariant to misalignments in the output of the face detection step, as the AAM can localize feature points in such cases. It is also observed that the proposed method has larger errors at larger angles of yaw, whereas the eigenspace-based method has almost constant errors for all angles. We do not consider tilt in this paper, although it should be possible to estimate roll and yaw for faces with a very small tilt. The work presented in this paper estimates pure roll and yaw angles independently. The method can be considered as a coarse yaw and an accurate roll angle estimator, providing an initial estimate for any iterative process required for a finer search (in the presence of additional information). We are extending the current work to estimate more accurate pose angles with combined roll, yaw and pitch for face image sequences using a multi-view framework.

References

[1] Keith Price Bibliography – Face Pose, Head Pose, http://www.visionbib.com/bibliography/people913.html.
[2] K. Okada and C. von der Malsburg, "Analysis and Synthesis of Human Faces with Pose Variations by a Parametric Piecewise Linear Subspace Method", in Proc. Computer Vision and Pattern Recognition (CVPR), 2001.
[3] S. Z. Li, X. Lu, X. Hou, X. Peng and Q. Cheng, "Learning Multiview Face Subspaces and Facial Pose Estimation Using Independent Component Analysis", IEEE Trans. Image Processing, vol. 14, no. 6, Jun. 2005, pp. 705-712.
[4] S. Srinivasan and K. L. Boyer, "Head Pose Estimation Using View Based Eigenspaces", in Proc. Int. Conf. Pattern Recognition (ICPR), 2002.
[5] L. Chen, L. Zhang, Y. X. Hu, M. J. Li and H. J. Zhang, "Head Pose Estimation Using Fisher Manifold Learning", in Proc. Int. Workshop on Analysis and Modeling of Faces and Gestures (AMFG), 2003.
[6] V. N. Balasubramanian, J. P. Ye and S. Panchanathan, "Biased Manifold Embedding: A Framework for Person-Independent Head Pose Estimation", in Proc. Computer Vision and Pattern Recognition (CVPR), 2007.
[7] J. W. Wu and M. M. Trivedi, "A Two-Stage Head Pose Estimation Framework and Evaluation", Pattern Recognition, vol. 41, no. 3, Mar. 2008, pp. 1138-1158.
[8] N. Kruger, M. Potzsch and C. von der Malsburg, "Determination of Face Position and Pose with a Learned Representation Based on Labeled Graphs", Image and Vision Computing, vol. 15, no. 8, Aug. 1997, pp. 665-673.
[9] T. Otsuka and J. Ohya, "Real-time Estimation of Head Motion Using Weak Perspective Epipolar Geometry", in Proc. Workshop on Applications of Computer Vision (WACV), 1998.
[10] Y. X. Hu, L. B. Chen, Y. Zhou and H. J. Zhang, "Estimating Face Pose by Facial Asymmetry and Geometry", in Proc. Automatic Face and Gesture Recognition (AFGR), 2004.
[11] A. Gee and R. Cipolla, "Determining the Gaze of Faces in Images", Image and Vision Computing, vol. 12, no. 10, Dec. 1994.
[12] G. J. Edwards, C. J. Taylor and T. F. Cootes, "Interpreting Face Images Using Active Appearance Models", in Proc. Int. Conf. on Face and Gesture Recognition, 1998, pp. 300-305.
[13] M. B. Stegmann, "Active Appearance Models" (aam-api), http://www2.imm.dtu.dk/~aam/.
[14] R. Hartley and A. Zisserman, "Multiple View Geometry in Computer Vision", 2nd Edition, 2003.
[15] T. Darrell, B. Moghaddam and A. Pentland, "Active Face Tracking and Pose Estimation in an Interactive Room", in Proc. Computer Vision and Pattern Recognition (CVPR), 1996.
[16] N. Troje and H. H. Bülthoff, "Face Recognition Under Varying Poses: The Role of Texture and Shape", Vision Research, vol. 36, pp. 1761-1771, 1996.
[17] B. Weyrauch, J. Huang, B. Heisele and V. Blanz, "Component-based Face Recognition with 3D Morphable Models", in Proc. First IEEE Workshop on Face Processing in Video, Washington, D.C., 2004.
[18] N. Gourier, D. Hall and J. L. Crowley, "Estimating Face Orientation from Robust Detection of Salient Facial Features", in Proc. Pointing 2004, ICPR International Workshop on Visual Observation of Deictic Gestures, Cambridge, UK.
[19] P. Viola and M. Jones, "Rapid Object Detection Using a Boosted Cascade of Simple Features", in Proc. Computer Vision and Pattern Recognition (CVPR), 2001.
[20] I. Shimizu, Z. Zhang, S. Akamatsu and K. Deguchi, "Head Pose Determination from One Image Using a Generic Model", in Proc. Automatic Face and Gesture Recognition (AFGR), 1998.
[21] Q. Chen, H. Wu, T. Fukumoto and M. Yachida, "3D Head Pose Estimation Without Feature Tracking", in Proc. Automatic Face and Gesture Recognition (AFGR), 1998.
[22] C. Harris and M. Stephens, "A Combined Corner and Edge Detector", in Proc. 4th Alvey Vision Conference, pp. 147-151, 1988.
[23] K. Mikolajczyk and C. Schmid, "Scale and Affine Invariant Interest Point Detectors", Int. Journal of Computer Vision, vol. 60, no. 1, pp. 63-86, 2004.
[24] T. Horprasert, Y. Yacoob and L. Davis, "Computing 3-D Head Orientation from Monocular Image Sequence", in Proc. Int. Conf. Face and Gesture Recognition, 1996, pp. 242-247.