3D Face Reconstruction from a Single 2D Face Image
Sung Won Park, Jingu Heo and Marios Savvides
Electrical and Computer Engineering Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
Email: {sungwonp, jheo, marioss}@cmu.edu

Abstract—3D face reconstruction from a single 2D image is mathematically ill-posed. However, a variety of methods have been proposed to solve ill-posed problems in computer vision; typical solutions estimate latent information or apply model-based approaches. In this paper, we propose a novel method to reconstruct a 3D face from a single 2D face image based on pose estimation and a deformable model of 3D face shape. For 3D face reconstruction from a single 2D face image, the first task is to estimate the depth lost by the 2D projection of the 3D face. Applying the EM algorithm to facial landmarks in a 2D image, we propose a pose estimation algorithm that infers the pose parameters of rotation, scaling, and translation. After estimating the pose, a much denser set of points is interpolated between the landmark points using a 3D deformable model and barycentric coordinates. In contrast to previous work, our method locates facial feature points in a 2D facial image automatically. Moreover, we show that the proposed pose estimation method can be successfully applied to 3D face reconstruction. Experiments demonstrate that our approach produces reliable results for reconstructing photorealistic 3D faces.

I. INTRODUCTION

3D face reconstruction using a single 2D facial image is a challenging task requiring several processes, such as depth estimation and face modeling, since it is an ill-posed problem. In particular, the pose estimation task is often set up as a problem of inferring both the depth and the pose parameters for a given 2D facial image. In this case, the Expectation-Maximization (EM) algorithm is a powerful tool that can be successfully applied to pose estimation. The EM algorithm is one of the most widely used methods for parameter estimation, and one of its most significant benefits is that parameters can be estimated even when part of the information in a data set is missing. The EM algorithm has been applied to pose estimation using feature points since it enables us to infer both the positions of the feature points, treated as missing data, and the pose parameters of rotation, scaling, and translation [2] [11] [9] [4]. Choi et al. [2] propose pose estimation with the EM algorithm under weak perspective projection using the summation of the posterior probabilities of all the 3D feature points, which is easy to calculate in practice; however, for theoretical correctness rather than empirical convenience, this summation should be replaced by integrals of the posterior probabilities over the whole region of the 3D feature points.


Zhou et al. [11] [9] also suggested a novel way to perform pose estimation by combining the EM algorithm and a deformable model of 3D face shape. Their method is applicable to various poses, such as both frontal and profile views, by using a mixture model [9]. They take advantage of one of the strong points of the EM algorithm: it can be successfully applied to estimate the parameters of a mixture model. However, their method has a limitation in practical use in that it only estimates 2D rotation, so it cannot be used for 3D pose estimation.

The above-mentioned methods commonly require feature point detection as the input of pose estimation [2] [11] [9] [4], and inventing a reliable facial feature detector is itself a difficult task. In particular, some of the feature points are invisible or hard to detect in a given 2D facial image with pose variation, so handling invisible or missing feature points is problematic. In many cases, a way to estimate the missing feature points must also be proposed for pose estimation [4]. In contrast, the pose estimation method proposed in this paper does not require estimating the invisible points, since it can perform pose estimation with only the successfully detected feature points.

In this paper, we propose a novel method for pose estimation applicable to 3D alignment and 3D reconstruction of a face in a single 2D facial image. To solve the pose estimation problem, we employ the EM algorithm as in related work, but our approach does not require any effort to estimate invisible or missing feature points. Moreover, we propose a reliable method to reconstruct a 3D face from the given 2D image by applying our pose estimator. Consequently, the method proposed in this paper can perform the whole process of 3D face reconstruction without any manual intervention.

II. PROBLEM FORMULATION

As the input of pose estimation, we need the positions of facial landmarks in a given 2D image, obtained by a reliable feature point detector. First of all, we assume that N feature points in the 2D facial image are already detected successfully; each point is denoted by a 2 × 1 vector d_i, where i = 1, 2, ..., N. A 2D face image can be considered as the 2D projection of a 3D face after changing its 3D pose from a frontal view by rotation, scaling, and translation. Thus, pose estimation is defined as inferring the pose parameters: scaling, rotation, and translation.

Also, pose correction is defined as recovering the 3D positions of the feature points in a normalized frontal 3D face using the estimated pose parameters. A 3 × 1 vector z_i denotes the unknown position of a feature point in the normalized frontal 3D face. Assuming weak perspective projection, we represent the projection of a 3D point onto the image plane in the following manner [11] [9] [4]:

d_i = s U R z_i + t + \varepsilon_i,   (1)

where the 3D point z_i is first rotated by the 3 × 3 rotation matrix R, then projected onto the image plane with the 2 × 3 matrix U = [1 0 0; 0 1 0], losing its depth, and scaled by s. Finally, it is translated by t in the 2D image plane. Thus, in our proposed method for pose estimation, the set of unknown parameters to be estimated with the EM algorithm is Θ = {s, R, t}, and the set of missing data is {z_1, z_2, ..., z_N}. We assume that ε_i and z_i are generated by multivariate Gaussian distributions: ε_i ~ N(0, σ_i^2 I_{2×2}) and z_i ~ N(µ_i, ρ_i^2 I_{3×3}). In particular, z_i ~ N(µ_i, ρ_i^2 I_{3×3}) means that z_i lies on the frontal face, which is centered at the origin and size-normalized. While previous work applied a deformable model to solve for the probability of z_i [11] [9] [4], we employ a simpler Gaussian model for z_i; µ_i and ρ_i are easily calculated from the training set. In this paper, we use the USF Human-ID database [1], which has 100 laser-scanned 3D faces. Each face in the database has 75,972 vertices, as shown in Figure 1(b). We apply the EM algorithm to estimate the pose variation; the idea of the EM algorithm is to find the parameters that maximize the expected value of the log-likelihood.
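To make the projection model of Eq. (1) concrete, the following minimal Python sketch (our illustration, not code from the paper; the function name and the toy rotation are assumptions) projects a 3D feature point into the image plane:

```python
import numpy as np

def weak_perspective_project(z, s, R, t, noise_std=0.0):
    """Project a 3D point z into the image plane via Eq. (1): d = s*U*R*z + t + eps."""
    U = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])          # drops the depth (Z) coordinate
    eps = np.random.normal(0.0, noise_std, size=2)
    return s * U @ (R @ z) + t + eps

# toy example: rotate 30 degrees about the vertical (Y) axis
theta = np.deg2rad(30.0)
R = np.array([[ np.cos(theta), 0.0, np.sin(theta)],
              [ 0.0,           1.0, 0.0          ],
              [-np.sin(theta), 0.0, np.cos(theta)]])
z = np.array([0.2, -0.1, 0.5])               # a 3D feature point on the frontal face
d = weak_perspective_project(z, s=1.5, R=R, t=np.array([10.0, 20.0]))
```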

III. POSE ESTIMATION WITH THE EM ALGORITHM

The EM algorithm is one of the most widely used methods for parameter estimation. One of its most significant benefits is that parameters can be estimated even when part of the information in a data set is missing. The EM algorithm estimates both the missing data and the unknown model parameters in two steps: the expectation (E) step and the maximization (M) step. The positions of the 3D feature points are estimated in the E-step, and the optimal pose parameters of rotation, scaling, and translation are calculated in the M-step. The EM algorithm has broad applicability; in particular, pose estimation of a 2D face is an appropriate application since the 3D depth is lost by 2D projection and the parameters of the pose transform are unknown. Thus, in this paper, we set up the pose estimation task as a problem of inferring both the depth and the model parameters for a given 2D facial image and propose a way to solve it with the EM algorithm. As the input of the proposed pose estimation method, we use the positions of facial landmarks in the 2D image obtained by our feature point detector.

A. Facial Feature Detection

Automatic facial feature detection is a difficult but key task for many practical applications of face image analysis. For example, from the feature points, the rotation and expression of a face can be estimated directly. Active models such as Active Shape Models and Active Appearance Models [3] [5] localize the landmark points in an image using global geometric shape constraints to fit the points iteratively. These schemes require a good initialization of the shape model to converge within a few iterations. To alleviate the problems of fitting with active models, we focus on detecting salient facial landmarks which help to initialize the global shape more accurately. In this paper, invisible points under occlusion and undetected points are estimated through the global shape and texture constraints of Active Appearance Models. We define the facial feature points mostly around the eyes, nose, eyebrows, mouth, and boundary of the face. These points provide general shape information about any face.

Let x_i be a vectorized image patch containing the pixel values centered at a landmark point within a certain window size, where i = 1, ..., N_P and N_P is the number of image patches. As a default window size, 15 × 15 is used in this paper. Next, a simple linear regression model for the image patches x is defined as

y_i = \theta_i^T x_i + \varepsilon_i,   (2)

where y_i is the output of the facial feature detector and \theta_i is the parameter of the linear regression model for the i-th patch x_i. The closed-form solution, called the normal equation, is

\theta = (X^T X)^{-1} X^T y,   (3)

where X is the set of training image patches which should be detected as a feature point. To avoid the singularity problem, an infinitesimal noise ω is added to the diagonal elements of X^T X:

\theta = (X^T X + \omega I)^{-1} X^T y,   (4)

where I is the identity matrix. By forcing the output vector y to take the same value for all the training samples, \theta can be considered an equal correlation filter [6]. Simply, y is an N_P × 1 column vector:

y = [1\ 1\ 1\ \cdots\ 1]^T.   (5)

Since we only care about the inner product (correlation) between a filter (model) and the training samples, outputs deviating from the desired output correspond to non-interesting samples. The posterior distribution of the output can then be modeled by a Gaussian distribution:

p(y \mid x, \theta) = N(1, \sigma^2).   (6)

The classification rule for assigning x to y via the linear regression model is to choose a threshold value according to the standard deviation. All the samples within the specified confidence interval [95%, 99%] are assigned to 1. If σ approaches zero during training, all the samples are assigned very close to the desired output; if σ increases, the confidence interval spreads, so the number of wrongly accepted samples is likely to increase.
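As a rough illustration of Eqs. (3)-(5) (not the authors' code; the regularizer value and function names are assumptions), the filter for one landmark could be trained and applied as follows:

```python
import numpy as np

def train_equal_correlation_filter(patches, omega=1e-3):
    """patches: (num_samples, patch_dim) matrix X of vectorized 15x15 patches.
    Solves theta = (X^T X + omega*I)^{-1} X^T y with y = [1, 1, ..., 1]^T (Eqs. 4-5)."""
    X = np.asarray(patches, dtype=float)
    y = np.ones(X.shape[0])                      # same desired output for every sample
    A = X.T @ X + omega * np.eye(X.shape[1])     # regularized to avoid singularity
    return np.linalg.solve(A, X.T @ y)

def detector_response(theta, patch):
    """Correlation (inner product) between the filter and a candidate patch (Eq. 2)."""
    return float(theta @ patch)

# toy usage: 200 random 15x15 "training patches" for one landmark
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 15 * 15))
theta = train_equal_correlation_filter(X_train)
score = detector_response(theta, X_train[0])     # should be close to 1 for a training patch
```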

Fig. 1. 3D average shape created from 3D training images. (a) the average shape of the USF Human-ID database; (b) 75,972 vertices; (c) 79 feature points.

To handle scale and in-plane rotation changes of a face, the feature detectors should be tolerant to these affine transformations. To limit the range of possible transformations, we restrict our focus to limited scale and rotation changes after applying a face detector. To create more training images, we modify each training image patch by scaling and rotating it; we use the scale variations {0.8, 0.9, 1, 1.1, 1.2} and the rotations {−15°, −14°, −13°, ..., 15°}. We apply 79 different regions and produce 79 equal correlation filters to find the 79 feature points. For testing, we first use the face detector [8] to crop the face region and then apply the feature detectors. Figure 2 shows the process and results of feature point detection.
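A minimal sketch of this augmentation step (illustrative only; the paper does not specify an implementation, and scipy here is an assumed choice):

```python
import numpy as np
from scipy.ndimage import rotate, zoom

def augment_patch(patch):
    """Generate scaled and rotated variants of one training patch,
    using the scale set {0.8,...,1.2} and rotations from -15 to 15 degrees."""
    variants = []
    for s in (0.8, 0.9, 1.0, 1.1, 1.2):
        scaled = zoom(patch, s, order=1)
        for angle in range(-15, 16):
            variants.append(rotate(scaled, angle, reshape=False, order=1))
    return variants

patch = np.random.rand(15, 15)      # stand-in for a real 15x15 landmark patch
augmented = augment_patch(patch)    # 5 scales x 31 rotations = 155 variants
```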

B. E Step

Next, we demonstrate that the pose variation can be estimated using the feature points in the given 2D image. We formulate the pose estimation task as inferring missing data and unknown parameters so that the EM algorithm can be applied. First, we compute the joint probability p(d_i, z_i | Θ) and the posterior p(z_i | d_i, Θ). p(d_i, z_i | Θ) is easily obtained from the two Gaussian distributions assumed above, i.e.,

p(d_i, z_i \mid \Theta) = p(d_i \mid z_i, \Theta)\, p(z_i \mid \Theta) \propto \exp\left(-\frac{\|d_i - sURz_i - t\|^2}{2\sigma_i^2} - \frac{\|z_i - \mu_i\|^2}{2\rho_i^2}\right).   (7)

Since p(z_i | d_i, Θ) = p(d_i | z_i, Θ) p(z_i | Θ) / p(d_i | Θ), we need to compute p(d_i | Θ):

p(d_i \mid \Theta) = \int_{z_i} p(d_i, z_i \mid \Theta)\, dz_i \propto \int_{z_i} \exp\left(-\frac{\|d_i - sURz_i - t\|^2}{2\sigma_i^2} - \frac{\|z_i - \mu_i\|^2}{2\rho_i^2}\right) dz_i = |\Sigma_i|^{\frac{1}{2}} \exp\left(-\frac{\|d_i - t\|^2}{2\sigma_i^2} - \frac{\|\mu_i\|^2}{2\rho_i^2} + \frac{1}{2} a_i^T \Sigma_i a_i\right),   (8)

where

\Sigma_i = \left(\frac{s^2}{\sigma_i^2} R^T U^T U R + \frac{1}{\rho_i^2} I\right)^{-1}   (9)

and

a_i = \frac{s}{\sigma_i^2} R^T U^T (d_i - t) + \frac{\mu_i}{\rho_i^2}.   (10)

Since p(z_i | d_i, Θ) is represented by the Gaussian distribution with mean Σ_i a_i and covariance Σ_i,

p(z_i \mid d_i, \Theta) \propto \exp\left(-\frac{1}{2}(z_i - \Sigma_i a_i)^T \Sigma_i^{-1} (z_i - \Sigma_i a_i)\right).   (11)

According to Eqs. (8)-(10), the expectation of the latent data is obtained as

E(z_i \mid d_i, \Theta) = \int_{z_i} z_i\, p(z_i \mid d_i, \Theta)\, dz_i = \Sigma_i a_i = m_i.   (12)

We also need one more expectation, E(z_i^T C z_i | d_i, Θ), for the next subsection:

E(z_i^T C z_i \mid d_i, \Theta) = \int_{z_i} z_i^T C z_i\, p(z_i \mid d_i, \Theta)\, dz_i = \mathrm{tr}(\Sigma_i C) + m_i^T C m_i.   (13)

C. M Step

In the M step, the goal is to find the parameters that maximize

\Theta^{(k+1)} = \arg\max_{\Theta} Q(\Theta, \Theta^{(k)}), \qquad Q(\Theta, \Theta^{(k)}) = \sum_{i=1}^{N} q_i(\Theta, \Theta^{(k)}) = \sum_{i=1}^{N} \int_{z_i} p(z_i \mid d_i, \Theta^{(k)}) \ln p(d_i, z_i \mid \Theta)\, dz_i,   (14)

where

q_i(\Theta, \Theta^{(k)}) = \int_{z_i} p(z_i \mid d_i, \Theta^{(k)}) \ln p(d_i, z_i \mid \Theta)\, dz_i   (15)

and N is the number of detected feature points. So, even when some points are invisible under occlusion or undetected by the feature point detector, we can apply our approach to pose estimation. Using Eqs. (12) and (13),

q_i(\Theta, \Theta^{(k)}) = -\frac{\|d_i - t\|^2}{2\sigma_i^2} - \frac{1}{2}\mathrm{tr}(\Sigma_i^{-1}\Sigma_i^{(k)}) + a_i^T m_i^{(k)} - \frac{1}{2} m_i^{(k)T} \Sigma_i^{-1} m_i^{(k)} + \mathrm{const}.   (16)

Maximizing Eq. (14) leads to the solution of the optimal parameters t^{(k+1)} and s^{(k+1)} as

t^{(k+1)} = \arg\max_{t} Q(\Theta, \Theta^{(k)}) = \frac{\sum_{i=1}^{N} d_i/\sigma_i^2}{\sum_{i=1}^{N} 1/\sigma_i^2},   (17)
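For concreteness, a small Python sketch of the E-step quantities in Eqs. (9), (10), and (12) (an illustration under the stated Gaussian assumptions; the variable names are ours, not the paper's):

```python
import numpy as np

U = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])      # weak-perspective projection matrix

def e_step(d, mu, sigma2, rho2, s, R, t):
    """Posterior moments of one hidden 3D point z_i given its 2D observation d_i.
    Returns (m_i, Sigma_i) following Eqs. (9), (10), and (12)."""
    M = U @ R                                                    # 2x3
    Sigma_inv = (s**2 / sigma2) * (M.T @ M) + np.eye(3) / rho2   # inverse of Eq. (9)
    Sigma = np.linalg.inv(Sigma_inv)
    a = (s / sigma2) * M.T @ (d - t) + mu / rho2                 # Eq. (10)
    m = Sigma @ a                                                # Eq. (12): E[z_i | d_i]
    return m, Sigma
```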

Fig. 2. Results of the facial feature detection. (a) detecting high detection rates with high false positive detection; (b) removing falsely accepted points using local constraints; (c) keeping good points for the initialization of the mean face shape; (d) initializing a global mean shape; (e) fitting Active Appearance Models.

and

s^{(k+1)} = \arg\max_{s} Q(\Theta, \Theta^{(k)}) = \frac{\sum_{i=1}^{N} \frac{1}{\sigma_i^2} (d_i - t^{(k+1)})^T U R^{(k+1)} m_i^{(k)}}{\sum_{i=1}^{N} \frac{1}{\sigma_i^2} \left(\mathrm{tr}(A \Sigma_i^{(k)}) + m_i^{(k)T} A m_i^{(k)}\right)},   (18)

where the matrix

A = R^{(k+1)T} U^T U R^{(k+1)}   (19)

is defined to simplify the notation. Eq. (17) is obtained from the derivative of Q with respect to t, and, once R^{(k+1)} is available, Eq. (18) is obtained from the derivative with respect to s. To solve for R^{(k+1)}, we need to set up an optimization problem with not only an objective function but also the constraints of a rotation matrix. The constraints on R that make it a rotation matrix are that its three row (or column) vectors are orthogonal to each other, have unit length, and form a right-handed system:

r_1^T r_2 = 0, \quad r_2^T r_3 = 0, \quad r_3^T r_1 = 0,   (20)

\|r_1\|^2 = 1, \quad \|r_2\|^2 = 1, \quad \|r_3\|^2 = 1,   (21)

where R is denoted as R = [r_1\ r_2\ r_3]^T. By replacing R with UR = [r_1\ r_2]^T and using Eqs. (14) and (16), we can set up the following optimization problem to solve for r_1^{(k+1)} and r_2^{(k+1)} instead of R^{(k+1)}, without any constraint on r_3:

\{r_1^{(k+1)}, r_2^{(k+1)}\} = \arg\max_{r_1, r_2} Q(\Theta, \Theta^{(k)}) \quad \text{subject to} \quad r_1^T r_1 = 1,\ r_2^T r_2 = 1,\ r_1^T r_2 = 0.   (22)

By applying Lagrange multipliers, Eq. (22) can be solved through the Lagrangian

L(r_1, r_2, \lambda_1, \lambda_2, \lambda_3) = Q(\Theta, \Theta^{(k)}) + \lambda_1 (r_1^T r_1 - 1) + \lambda_2 (r_2^T r_2 - 1) + \lambda_3\, r_1^T r_2.   (23)

Finally, we can solve for the R^{(k+1)} that maximizes Eq. (23):

R^{(k+1)} = \begin{bmatrix} r_1^{(k+1)T} \\ r_2^{(k+1)T} \\ r_3^{(k+1)T} \end{bmatrix} = \begin{bmatrix} r_1'^{(k+1)T}/\|r_1'^{(k+1)}\| \\ r_2'^{(k+1)T}/\|r_2'^{(k+1)}\| \\ (r_1'^{(k+1)} \times r_2'^{(k+1)})^T/\|r_1'^{(k+1)} \times r_2'^{(k+1)}\| \end{bmatrix},   (24)

where

\begin{bmatrix} r_1'^{(k+1)T} \\ r_2'^{(k+1)T} \end{bmatrix}_{2 \times 3} = \sum_{i=1}^{N} \frac{1}{\sigma_i^2} (d_i - t^{(k+1)})\, m_i^{(k)T}.   (25)
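A sketch of one M-step update following Eqs. (17), (18), (24), and (25) (again illustrative: a common noise variance and the array layout are our assumptions, and the simple t update mirrors Eq. (17) as written):

```python
import numpy as np

U = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])

def m_step(D, M_list, Sigma_list, sigma2):
    """One M-step given 2D points D (N x 2), posterior means M_list (N x 3),
    posterior covariances Sigma_list (N x 3 x 3), and a common noise variance sigma2."""
    w = 1.0 / sigma2
    # Eq. (17): translation as the (weighted) mean of the 2D points
    t_new = np.average(D, axis=0)
    # Eq. (25): unconstrained solution for the first two rows of R
    B = sum(w * np.outer(d - t_new, m) for d, m in zip(D, M_list))      # 2 x 3
    # Eq. (24): renormalize and complete a right-handed rotation matrix
    r1 = B[0] / np.linalg.norm(B[0])
    r2 = B[1] / np.linalg.norm(B[1])
    r3 = np.cross(r1, r2); r3 /= np.linalg.norm(r3)
    R_new = np.vstack([r1, r2, r3])
    # Eq. (18): closed-form scale update
    A = R_new.T @ U.T @ U @ R_new
    num = sum(w * (d - t_new) @ (U @ R_new @ m) for d, m in zip(D, M_list))
    den = sum(w * (np.trace(A @ S) + m @ A @ m) for S, m in zip(Sigma_list, M_list))
    s_new = num / den
    return s_new, R_new, t_new
```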

IV. 3D FACE RECONSTRUCTION

In this section, we propose a way to reconstruct a 3D face from the feature points and the pose estimate by interpolating more points in the input 2D image and estimating the corresponding vertices in the output 3D face. After feature detection and pose estimation by the proposed method, we obtain the 3D positions of the 79 feature points, but they are not enough to reconstruct a whole 3D face. For example, when we model a 3D face based on the 3D geometric structure of the USF Human-ID database, both the shape and the texture of 75,972 vertices must be obtained. As the first step toward this information, we interpolate 75,893 points in the input 2D image in a simple way using barycentric coordinates. Next, we estimate the 3D vertices with their depths in a 3D face by a linear deformable model.

Fig. 3. Interpolation of 75,893 points in an input 2D face. (a) 79 feature points detected by the proposed feature point detector; (b) pose-corrected feature points; (c) Delaunay triangulation of the 79 feature points in the 3D average shape; (d) 36,112 points inside the triangles; (e) 39,805 points outside the triangles.

A. Interpolating 2D Points in a Frontal View

To reconstruct a complete 3D face from only the 79 feature points detected by the method proposed in the previous section, we first need to interpolate more points in the 2D image. Fortunately, the difference between the shape of a specific face and the average face is not significant. Inspired by this observation, we first apply pose correction of the 2D feature points to the frontal view using the estimated pose parameters. After pose correction to the frontal view, the transformed feature points can easily be compared to the corresponding points in the average shape obtained from a training set of frontal 3D faces. The process of interpolation is shown in Figure 3. Figure 3(b) shows the pose correction result for the input 2D image in Figure 3(a) and also demonstrates that the proposed method estimates the pose parameters successfully.

Next, 2D points are interpolated in each triangle formed by the pose-corrected feature points. We have already constructed the average shape of the 3D training facial images, so with the Delaunay triangulation we can get a triangular mesh consisting of a set of lines connecting each point to its natural neighbors, as shown in Figure 3(c). Let p_1 be the 2D projection of a certain vertex in the 3D average shape to the frontal view; that is, p_1 = (x_1, y_1) when the corresponding vertex is v_1 = (x_1, y_1, z_1). Now, our goal is to interpolate a point p_2 = (x_2, y_2) between the pose-corrected 2D feature points. We apply barycentric coordinates to interpolate the 36,112 points inside the individual triangles shown in Figure 3(d), while the other 39,805 points in Figure 3(e) are interpolated in a different way proposed in Section IV-B. Using barycentric coordinates, p_1 on a triangle formed by a_1, b_1, and c_1 and p_2 on a different triangle formed by a_2, b_2, and c_2 can be represented as weighted sums of the three vertices of their respective triangles with the same weights. That is, p_2 is selected so as to satisfy p_1 = αa_1 + βb_1 + γc_1 and p_2 = αa_2 + βb_2 + γc_2, where α + β + γ = 1. Note that p_2 is obtained on the plane including a_2, b_2, and c_2, so its Z coordinate is not determined here; hence, the depth of p_2 is also estimated in the next subsection.
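A minimal sketch of this barycentric transfer (our own illustration; the triangle lookup and variable names are assumptions):

```python
import numpy as np

def barycentric_coords(p, a, b, c):
    """Solve p = alpha*a + beta*b + gamma*c with alpha + beta + gamma = 1."""
    T = np.column_stack([b - a, c - a])            # 2x2 system in (beta, gamma)
    beta, gamma = np.linalg.solve(T, p - a)
    return 1.0 - beta - gamma, beta, gamma

def transfer_point(p1, tri_avg, tri_corrected):
    """Map a vertex projection p1 on a triangle of the average shape to the
    corresponding point p2 on the matching triangle of pose-corrected landmarks."""
    a1, b1, c1 = tri_avg
    a2, b2, c2 = tri_corrected
    alpha, beta, gamma = barycentric_coords(p1, a1, b1, c1)
    return alpha * a2 + beta * b2 + gamma * c2     # same weights, different triangle

# toy usage with made-up 2D triangles
tri_avg = [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])]
tri_cor = [np.array([0.1, 0.2]), np.array([1.2, 0.1]), np.array([0.0, 1.1])]
p2 = transfer_point(np.array([0.3, 0.3]), tri_avg, tri_cor)
```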

B. Depth Estimation by a Linear Deformable Model

By applying interpolation with barycentric coordinates, we get the X and Y coordinates of the points inside the triangular mesh. From the results of the interpolation, we propose a method to infer the Z coordinates of the points in Figure 3(d) and the X, Y, and Z coordinates of the points in Figure 3(e). We propose two linear deformable models with two different PCA subspaces, one learned from the points inside the triangular mesh and one from all the points. Let z_all be a (3 × N_all) × 1 vector defined by z_all = (x_1, y_1, z_1, ..., x_{N_all}, y_{N_all}, z_{N_all})^T, where N_all = 75,917. In the same way, the 3D points inside the triangular mesh can be represented by z_in = (x'_1, y'_1, z'_1, ..., x'_{N_in}, y'_{N_in}, z'_{N_in})^T, where N_in = 36,112. Then, the two deformable models for z_all and z_in are learned from a 3D training set, where a training face consists of N_all 3D vertices taken from a frontal view:

z_{all} = W_{all} c_{all} + \mu_{all},   (26)

z_{in} = W_{in} c_{in} + \mu_{in},   (27)

where W_all ∈ R^{(3 × N_all) × m} and W_in ∈ R^{(3 × N_in) × m} are the eigenvector matrices for the N_all points and the N_in points of a 3D training face, respectively. The two eigenvector matrices keep the same number of eigenvectors, ordered by decreasing eigenvalue. Note that the N_in points are a subset of the N_all points, so the coefficients c_all and c_in are ideally the same; the common coefficients c can be defined as c = c_all = c_in. Given a 2D facial image unseen during training, c_in of the image can easily be calculated from the N_in interpolated points as c_in = W_in^T (z_in − µ_in). Consequently, we can also calculate all N_all 3D points by z_all = W_all c_in + µ_all. Finally, we estimate the shape of the given 2D image by inferring all the 3D vertices taken from the frontal view. After reconstructing the shape of the 3D face, its texture (or color) is also mapped from the given 2D facial image.
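The two coupled PCA models of Eqs. (26)-(27) can be sketched as follows (illustrative only; we fit PCA with plain numpy SVD, and the array names and toy dimensions are assumptions):

```python
import numpy as np

def fit_pca(Z, m):
    """Z: (num_faces, dim) training matrix of stacked (x, y, z) vertex coordinates.
    Returns the mean and the top-m eigenvectors (columns of W)."""
    mu = Z.mean(axis=0)
    _, _, Vt = np.linalg.svd(Z - mu, full_matrices=False)
    return mu, Vt[:m].T                      # W has shape (dim, m)

def reconstruct_all(z_in, mu_in, W_in, mu_all, W_all):
    """Eqs. (26)-(27): infer coefficients from the interpolated subset, then
    generate the full vertex set, including the missing depths."""
    c = W_in.T @ (z_in - mu_in)              # shared coefficients c = c_in = c_all
    return W_all @ c + mu_all

# toy dimensions: 100 training faces, 30 "inside" coords, 60 "all" coords, m = 10
rng = np.random.default_rng(1)
Z_all = rng.normal(size=(100, 60)); Z_in = Z_all[:, :30]
mu_in, W_in = fit_pca(Z_in, 10)
mu_all, W_all = fit_pca(Z_all, 10)
z_all_hat = reconstruct_all(Z_in[0], mu_in, W_in, mu_all, W_all)
```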

Fig. 4. Synthesis results. (a) 2D alignment; (b) 3D shape reconstruction; (c) the frontal view of the 3D face after texture mapping; (d),(e) various poses synthesized by rotating the 3D faces reconstructed by our proposed method.

Fig. 5. A glaring defect of reconstruction around the mouth. (a) manually selected feature points; (b) 3D shape reconstruction; (c) texture mapping.

Texture mapping is easy for the N_in points in Figure 3(d), since we have already calculated the correspondence between each 3D vertex in the reconstructed 3D face and a 2D point in the given 2D image. To map the texture onto the other N_all − N_in vertices of the 3D reconstruction, we apply interpolation with barycentric coordinates again, but in the opposite direction: from a 3D vertex in the 3D reconstruction to a 2D point in the 2D image. We take the triangle that is closest to the 3D vertex and calculate the corresponding 2D point by interpolation, even though the vertex is not inside the triangle. One of the merits of interpolation with barycentric coordinates is that it can also interpolate points outside the triangular mesh; a point p is inside the triangle formed by a, b, and c if and only if 0 < α < 1, 0 < β < 1, and 0 < γ < 1.
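The containment criterion above can be checked directly from the barycentric weights (a small sketch, using the same solve as earlier; the names are ours):

```python
import numpy as np

def inside_triangle(p, a, b, c):
    """A 2D point is strictly inside the triangle iff all barycentric weights lie in (0, 1)."""
    T = np.column_stack([b - a, c - a])
    beta, gamma = np.linalg.solve(T, p - a)
    alpha = 1.0 - beta - gamma
    return all(0.0 < w < 1.0 for w in (alpha, beta, gamma))
```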

V. EXPERIMENTAL RESULTS

For our experiments, two face databases are used: the USF Human-ID database [1] for training and the CMU PIE database [7] for testing. First, for training, we use the 100 laser-scanned 3D faces in the USF Human-ID database [1]. Each face in the database has 75,972 vertices. At the training step, for each 3D feature point z_i, the mean µ_i and the variance ρ_i are calculated, and the two deformable models in Eqs. (26) and (27) are learned from the 100 3D faces. At the testing step, we take face images with neutral expression and the poses labeled 00, 05, 07, 09, 11, 27, 29, and 37 among the 13 poses in the CMU PIE database. The other poses, such as profile views, are not used since the proposed method for pose estimation is not designed for a mixture shape model, such as the two clusters of frontal and profile views proposed in [10].

Figure 4 shows some results of alignment and reconstruction by the proposed pose estimation. The images in the first column, Figure 4(a), show the input 2D images and the alignment results using our feature point detector. The second column, Figure 4(b), shows the 3D shapes reconstructed from the given 2D images. Figures 4(c), (d), and (e) show multiple views of the 3D faces after shape reconstruction and texture mapping. To visualize a 3D face, a VRML model is used. The experimental results for 3D face reconstruction in Figure 4(b)-(e) demonstrate that the proposed method for pose estimation can produce reliable results.

However, we mainly find two problems in the reconstruction results. First, the texture mapping outside the triangular mesh designed in this paper is not accurate and smooth; on the sides of the forehead or around the eyebrows, some defects of reconstruction are often found, which make the reconstruction results look unrealistic. Also, Figure 5(b) and (c) shows a glaring defect of reconstruction around the mouth, although the feature points shown in Figure 5(a) were manually selected in the expectation of a more reliable result. The defect may be caused by a limitation of the training set used in this paper; the training faces hardly contain open mouths such as the one in the input 2D image in Figure 5(a). These results suggest collecting or synthesizing more 3D faces for training in order to handle a larger variety of human faces.

VI. CONCLUSION

In this paper, we proposed a novel method for 3D face reconstruction from a single 2D face image. The whole process from feature detection to shape reconstruction is performed automatically. In particular, we also proposed a reliable method for 3D pose estimation, which is a key task for 3D face reconstruction. Pose estimation by matching a 3D face model to a 2D face image is an ill-posed problem, since the depth lost by projecting a 3D face onto a 2D image plane must be estimated. By applying the EM algorithm, the proposed method can successfully solve this problematic pose estimation task. We set up the pose estimation problem as inferring missing depths and pose parameters so that the EM algorithm can be applied.

REFERENCES

[1] V. Blanz and T. Vetter. A morphable model for the synthesis of 3D faces. In ACM SIGGRAPH, pages 187–194, 1999.
[2] K. N. Choi and M. C. Mozer. Recovering facial pose with the EM algorithm. Pages 2073–2093, 2002.
[3] T. Cootes, G. Edwards, and C. Taylor. Active appearance models. In Proceedings of the European Conference on Computer Vision, pages 484–498, 1998.
[4] L. Gu and T. Kanade. 3D alignment of face in a single image. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1305–1312, June 2006.
[5] I. Matthews and S. Baker. Active appearance models revisited. International Journal of Computer Vision, number 2, pages 135–164, 2004.
[6] M. Savvides, B. V. Kumar, and P. Khosla. Face verification using correlation filters. In Proc. of Third IEEE Automatic Identification Advanced Technologies, pages 56–61, 2002.

[7] T. Sim, S. Baker, and M. Bsat. The CMU pose, illumination, and expression database. IEEE Transactions on Pattern Analysis and Machine Intelligence, number 12, pages 1615–1618, 2003.
[8] P. Viola and M. Jones. Robust real-time object detection. In IEEE ICCV Workshop on Statistical and Computational Theories of Vision, July 2001.
[9] Y. Zhou, L. Gu, and H. Zhang. Bayesian tangent shape model: Estimating shape and pose parameters via Bayesian inference. In IEEE Conference on Computer Vision and Pattern Recognition, volume 1, pages 109–116, 2003.
[10] Y. Zhou, G. Lie, and H. Zhang. Bayesian tangent shape model: Estimating shape and pose parameters via Bayesian inference. In IEEE Conference on Computer Vision and Pattern Recognition, pages 741–746, 2003.
[11] Y. Zhou, W. Zhang, X. Tang, and H. Shum. A Bayesian mixture model for multi-view face alignment. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1741–1746, 2005.