3D Spatio-Temporal Face Recognition Using Dynamic Range Model Sequences

Yi Sun and Lijun Yin
Department of Computer Science
State University of New York at Binghamton, Binghamton, NY
Abstract

Research on 3D face recognition has intensified in recent years; however, most of it has focused on static 3D data analysis. In this paper, we investigate the face recognition problem using dynamic 3D face model sequences. Based on our newly created 3D dynamic face database, we propose a spatio-temporal Hidden Markov Model (HMM) that incorporates 3D surface feature characterization to learn the spatial and temporal information of faces. The advantage of using 3D dynamic data for face recognition is evaluated by comparing our approach to three conventional approaches: a 2D-video-based temporal HMM, a conventional 2D-texture-based approach (e.g., a Gabor-wavelet-based approach), and static 3D-model-based approaches.

1. Introduction

According to imaging modality, face recognition research is generally categorized as either 2D-based or 3D-based. Each category is represented by one of two formats: static or dynamic. The majority of face recognition research uses ordinary 2D images or videos, which face the major challenges of PIES (pose, illumination, expression, and scale) [14, 30]. 3D facial data show the potential to alleviate the problems encountered by 2D-based approaches. Recent works [6, 17, 3, 5, 15, 22, 24, 7] have shown great success in 3D face recognition. The majority of existing work utilizes static 3D range data acquired through laser scanning, stereo photogrammetry, or active light projection. One of the major challenges is sensitivity to facial expressions, due to a lack of temporal information. Excellent surveys can be found in [4, 23, 25, 13].

The face is by nature dynamic. Dynamic cues from expressive and talking movements of human faces provide information about an individual's facial structure. Recent work has shown success in recognizing faces by exploiting temporal information [16, 31, 11]; however, the data modality is still 2D. Recent technological advances in 3D imaging allow for real-time 3D shape acquisition [27, 10]. Such 3D sequential data (called 4D data) capture the dynamics of time-varying 3D facial surfaces, and thus allow us to analyze facial behavior in great detail. The 3D dynamic face representation is believed to be the best reflection of facial nature. Use of such 4D data allows algorithms to learn an individual's characteristics and personal traits through 3D dynamic sequences. Such scrutiny of 3D facial behavior could alleviate the problems of 2D data (e.g., sensitivity to illumination and pose) and static 3D data (e.g., sensitivity to facial expressions).

There have been very few reported works that use 4D data for face recognition. Wang et al. [27] developed a hierarchical framework for tracking high-density 3D facial expression sequences captured with a structured-light imaging system; however, the applicability of the tracking data to the face recognition problem has not yet been reported. The recent work in [8] utilized six 3D model sequences for facial analysis and editing, but mainly for facial expression analysis. In [21], Papatheodorou et al. evaluated a so-called 4D face recognition approach, which was, however, just static 3D data plus texture; no temporal information was explored. Li et al. [15] reported a model-fitting approach to generate facial identity surfaces through video sequences; its application to face recognition relies on the quality of the tracked low-resolution face model.

In this work, we created a new high-definition 3D dynamic face database (called a 4D face database) that includes 101 subjects. Each subject has six 4D sequences corresponding to the six universal facial expressions, for a total of 606 3D facial video sequences. Inspired by previous research on face recognition using Hidden Markov Models [16, 20, 2], we propose to use a 3D surface primitive feature with HMMs to investigate 3D spatio-temporal facial behavior in the 4D domain. We combine a spatial HMM and a temporal HMM to learn the statistics of the 3D model sequences of each subject and their temporal process. Temporal characteristics of
test 3D model sequences are then analyzed over time using a recognition process. The general framework of our proposed 3D spatio-temporal face recognition approach is outlined in Figure 2. We apply a generic model to track the high-density range models frame by frame. Facial surfaces are characterized by eight types of surface primitive features. The spatial and temporal information of these labeled features is learned by a spatio-temporal HMM. We conducted experiments and compared the results with (1) a 2D-image-based approach (e.g., the Gabor-wavelet-based approach [28]), (2) static 3D-model-based approaches using LDA [18, 1] and PCA, and (3) a video-based adaptive HMM method [16] for face recognition. Each component of the system is described in the following sections.
2. 3D Dynamic Face Database

To investigate how the dynamics of 3D model sequences with varied facial expressions and poses affect face analysis, we created a dynamic 3D face database. The data were captured using Dimensional Imaging's 3D dynamic capturing system [10]. The system captures a sequence of stereo images of a dynamically changing facial surface; each pair of stereo images is processed using a passive stereo photogrammetry approach [10] to produce a separate range map. The range maps are then combined to produce a temporally varying sequence of high-resolution 3D images. The system achieves video-rate (25 Hz) acquisition of dynamic 3D faces with an RMS accuracy of 0.2 mm. Each subject was asked to perform the six universal expressions (anger, disgust, fear, smile, sadness, and surprise), with each 3D video sequence capturing one expression; each subject therefore performed six separate times in front of the 3D cameras. The 3D videos were captured at twenty-five frames per second, and each 3D video clip is approximately 4 seconds long. Each 3D face model has a resolution of approximately 30,000–40,000 vertices. Our database currently consists of 101 subjects (about 60% female and 40% male) with a variety of ethnic/racial ancestries. In its entirety, the database includes 606 3D model sequences with six prototypic expressions. A snapshot of a sample video sequence from the database is shown in Figure 1. The database is described in detail in [29].
3. 3D Spatio-Temporal Face Analysis

The 3D dynamic face data provide both high-resolution facial surface information and temporal motion information. To better use the 3D spatial and temporal information, we propose to integrate our facial surface feature descriptor with Hidden Markov Models to analyze the spatial dynamic properties over time. Motivated by the recent success of HMMs for 2D video-based face recognition [16], we extend the idea to analyzing 3D facial motions in order to learn individual facial characteristics with HMMs.

Figure 1. A sample video sequence from the dynamic 3D face model database. From top to bottom: the textured models, shaded models, and wire-frame models with 83 adapted control points, for a subject with a smile expression.

Our proposed approach is outlined in Figure 2 and consists of three stages: (1) model pre-processing; (2) HMM-based training; and (3) the recognition process. In the first stage, we adapt and track a generic model (called the tracking model) to each range model of the 3D model sequence. The adapted tracking model establishes the correspondence of feature points across the 3D range model sequence. An automatic labeling approach is then used to label the surface geometric features at the tracked locations of the range models as one of eight label types (e.g., convex peak, concave pit, etc.). As a result, each range model in the sequence can be represented by a vector G = [g_1, g_2, ..., g_n], where g_i is one of the eight label types and n is the number of vertices of the adapted model. Due to the high dimensionality of the feature vector G (n is over 1000 in our case), we use a Linear Discriminant Analysis (LDA) based method to reduce the feature space. The LDA transformation maps the feature space into an optimal space in which different subjects are easy to differentiate. Given the observations of the transformed feature vectors, the second stage of the system learns HMMs for each subject: a spatial HMM and a temporal HMM. Once training is complete, the final stage performs the recognition test. In this stage, the spatial and temporal dynamics of test videos are analyzed by the trained HMMs, and the probability score for each HMM is evaluated by a Bayesian decision classifier. By integrating the results of the S-HMM and T-HMM, each test video sequence can be classified as one of the subjects in the database.
Figure 2. System diagram.

3.1. Model adaptation and tracking

Facial model tracking is the first step of the spatio-temporal analysis [9]. Since the high-resolution range models created by the range system have different numbers of vertices across the model sequence, it is hard to establish feature correspondences among the model frames in order to construct feature vectors. We therefore apply a generic model adaptation approach to "sample" each range model so that its feature vertices can be detected. In the first frame, we used our 83 selected feature points as the initial positions, then applied an AAM model to track the features along the 3D video sequences using 2D texture and 3D model correspondences. We then applied a generic model to adapt to the 3D range models based on the tracked feature points and a radius-based interpolation approach, as sketched below. Figure 1 shows an example of the tracked points. These key points are also used to derive the facial sub-regions for HMM training in a later section.
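As an illustration of the adaptation step, the following sketch warps a generic model toward a range scan by interpolating the displacements of the 83 tracked feature points over all generic-model vertices with a radial basis function. This is only a plausible reading of the paper's "radius based interpolation": SciPy's RBFInterpolator and its thin-plate-spline kernel stand in for the authors' scheme, and all variable names are hypothetical.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

def adapt_generic_model(generic_vertices, generic_landmarks, tracked_landmarks):
    """Warp the generic model to a range scan (hedged sketch of Sec. 3.1).

    generic_vertices:  (n, 3) vertices of the generic tracking model.
    generic_landmarks: (83, 3) landmark positions on the generic model.
    tracked_landmarks: (83, 3) AAM-tracked positions on the range model.
    """
    # Displacements known only at the 83 tracked feature points.
    displacements = tracked_landmarks - generic_landmarks
    # Propagate them to every vertex by radial-basis interpolation
    # (kernel choice is an assumption; the paper does not specify one).
    rbf = RBFInterpolator(generic_landmarks, displacements,
                          kernel="thin_plate_spline")
    return generic_vertices + rbf(generic_vertices)
```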
3.2. Geometric surface labeling

3D facial range models can be characterized by their surface primitive features. This spatial feature can be classified into eight types: convex peak, convex cylinder, convex saddle, minimal surface, concave saddle, concave cylinder, concave pit, and planar. Such a local shape descriptor provides a robust facial surface representation [26]. To label the range model surface, we select the vertices that are overlapped by the face region of the tracking model, then classify each of them into one of the primitive labels. The classification of surface vertices is based on surface curvature computation, briefly described as follows. Let p(x, y, z) be a point on a surface S, N_p the unit normal at p, and X_{uv} a local parameterization of S at p. A polynomial patch is used to approximate the local surface around p:

\[ z(x, y) = \tfrac{A}{2}x^2 + Bxy + \tfrac{C}{2}y^2 + Dx^3 + Ex^2y + Fxy^2 + Gy^3 \tag{1} \]

Using X_u, X_v, and N_p as a local orthogonal system, we can obtain the principal curvatures by computing the eigenvalues of the Weingarten curvature matrix:

\[ W = \begin{bmatrix} A & B \\ B & C \end{bmatrix} \tag{2} \]
After calculating the curvature values of each vertex, we use a categorization method (similar to [26]) to label each vertex on the range model, as sketched below. As a result, each range model is represented by a group of labels, which constructs a feature vector G = (g_1, g_2, ..., g_n), where g_i represents one of the primitive shape labels and n equals the number of vertices in the facial region of the adapted model. An example of a labeled face surface is shown in Figure 4.
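To make the labeling concrete, here is a minimal sketch that fits the cubic patch of Eq. (1) to a vertex's neighbors expressed in the local (X_u, X_v, N_p) frame, takes the eigenvalues of the Weingarten matrix of Eq. (2) as the principal curvatures, and maps them to the eight primitive types by the signs of the mean and Gaussian curvatures. The sign table, the tolerance eps, and the sign convention (concave for positive mean curvature, which depends on normal orientation) are our assumptions standing in for the categorization of [26].

```python
import numpy as np

def principal_curvatures(local_pts):
    """Least-squares fit of the patch of Eq. (1) to neighbor points given
    in the local (X_u, X_v, N_p) frame; returns eigenvalues of Eq. (2)."""
    x, y, z = local_pts[:, 0], local_pts[:, 1], local_pts[:, 2]
    M = np.column_stack([x**2 / 2, x * y, y**2 / 2,
                         x**3, x**2 * y, x * y**2, y**3])
    coeffs, *_ = np.linalg.lstsq(M, z, rcond=None)
    A, B, C = coeffs[:3]
    # Principal curvatures k1 <= k2 from the Weingarten matrix.
    return np.linalg.eigvalsh(np.array([[A, B], [B, C]]))

def primitive_label(k1, k2, eps=1e-3):
    """Map principal curvatures to one of the eight primitive types via
    mean (H) and Gaussian (K) curvature signs; eps is a flatness tolerance."""
    H, K = (k1 + k2) / 2.0, k1 * k2
    if abs(K) < eps:                      # at least one curvature near zero
        if abs(H) < eps:
            return "planar"
        return "concave cylinder" if H > 0 else "convex cylinder"
    if K < 0:                             # saddle family
        if abs(H) < eps:
            return "minimal surface"
        return "concave saddle" if H > 0 else "convex saddle"
    return "concave pit" if H > 0 else "convex peak"
```

Applying primitive_label to every vertex in the face region yields the label vector G = (g_1, ..., g_n) for one range model.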
3.3. Optimal feature space transformation

Up to this stage, each face model is represented by its geometric surface types. To classify the face vectors efficiently, we transform them to an optimal feature space. We use Linear Discriminant Analysis (LDA) to project the feature space G to an optimal feature space O_G. LDA seeks to retain the discriminative features rather than the expressive features by defining the within-class scatter matrix S_w and the between-class scatter matrix S_b. It then transforms the n-dimensional feature G to the d-dimensional optimized feature O_G by Equation 3:

\[ O_G = D_o^T G \tag{3} \]

where D_o = \arg\max_D \left| (D^T S_b D) / (D^T S_w D) \right|, d < n, and D is the projection matrix. For the face recognition task, the discriminative classes are the different subjects. Thus the derived optimized features keep the discriminative information needed to distinguish different subjects while minimizing the influence of expressions.
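A minimal sketch of the transformation of Eq. (3), using scikit-learn's LDA in place of the authors' own implementation; the arrays label_vectors and subject_ids and their file names are hypothetical.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# label_vectors: (num_models, n) array; each row is the per-vertex
#   primitive-label vector G of one range model (labels encoded as ints).
# subject_ids:   (num_models,) array of subject identities (the classes).
label_vectors = np.load("label_vectors.npy")   # hypothetical files
subject_ids = np.load("subject_ids.npy")

# With C subjects, LDA yields at most C-1 discriminative dimensions.
n_classes = len(set(subject_ids))
lda = LinearDiscriminantAnalysis(n_components=min(100, n_classes - 1))
O_G = lda.fit_transform(label_vectors, subject_ids)  # rows are features O_G
```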
3.4. Basic Hidden Markov Models

During the training stage, the statistical information and temporal dynamics of the training data are learned by HMMs. For the face recognition task, each subject is modeled by an N-state HMM. The continuous HMM is described briefly as follows. Let λ = [A, B, π] denote an HMM to be trained. Let N be the number of hidden states in the model; we denote the individual states as S = {S_0, S_1, ..., S_{N−1}} and the state at time t as q_t. A = {a_{ij}} is the state transition probability distribution, where

\[ a_{ij} = P[q_{t+1} = S_j \mid q_t = S_i], \quad 0 \le i, j \le N-1 \tag{4} \]

B = {b_j(k)} is the observation probability distribution in state j, where k is an observation. We use a Gaussian distribution to estimate each b_j(k):

\[ b_j(k) = P[k \mid q_t = S_j] \sim \mathcal{N}(\mu_j, \Sigma_j), \quad 0 \le j \le N-1 \tag{5} \]
Figure 3. Structure of an HMM model.
Let π = {π_i} be the initial state distribution, where

\[ \pi_i = P[q_0 = S_i], \quad 0 \le i \le N-1 \tag{6} \]

Then, given an observation sequence O = O_1 O_2 ... O_T, where O_i denotes an observation at time i, the HMM training procedure can be described as follows:

Step 0: Take the optimized feature representation O_G of each observed 3D range face model as an observation.
Step 1: Initialize the HMM model λ = [A, B, π]. Each observed model has N parts, and each part corresponds to one state. The observation parts for one state are used to estimate the parameters of the observation matrix B. Set the initial values of A and π based on the observations.
Step 2: Use the forward-backward (Baum-Welch) algorithm [20] to derive the maximum likelihood estimate of the model parameter λ = [A, B, π], i.e., the λ for which P(O|λ) is maximized.

Given a query model sequence Q, we can use the Bayesian decision rule to classify it:

\[ C^* = \arg\max_{1 \le i \le C} P(\lambda_i \mid Q), \quad \text{where} \quad P(\lambda_i \mid Q) = \frac{P(Q \mid \lambda_i)\,P(\lambda_i)}{\sum_{j=1}^{C} P(Q \mid \lambda_j)\,P(\lambda_j)} \tag{7} \]

and C is the number of trained HMM models. Based on our experiments, we used 12 states to build the temporal HMM and 6 states to build the spatial HMM for face recognition. Figure 3 shows an example of a spatial HMM model with 6 states.
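As a concrete illustration of Steps 0-2 and the decision rule of Eq. (7), the sketch below trains one continuous HMM per subject with the hmmlearn package (a stand-in for the authors' implementation) and classifies a query sequence by its log-likelihood under each model, which is equivalent to Eq. (7) when the priors P(λ_i) are equal. The variables train_seqs and query are hypothetical.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

# train_seqs: dict mapping subject id -> list of observation sequences,
#   each sequence a (T, d) array of LDA-transformed feature vectors O_G.
# query: a single (T, d) observation sequence to identify.

def train_subject_hmms(train_seqs, n_states=12):
    """Train one N-state continuous HMM per subject (Steps 0-2)."""
    models = {}
    for subject, seqs in train_seqs.items():
        X = np.vstack(seqs)                  # concatenated observations
        lengths = [len(s) for s in seqs]     # per-sequence lengths
        hmm = GaussianHMM(n_components=n_states, covariance_type="diag",
                          n_iter=100)        # Baum-Welch (EM) training
        hmm.fit(X, lengths)
        models[subject] = hmm
    return models

def classify(models, query):
    """Bayesian decision rule of Eq. (7); with equal priors P(lambda_i),
    this reduces to picking the model with the highest log-likelihood."""
    scores = {s: m.score(query) for s, m in models.items()}
    return max(scores, key=scores.get)
```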
3.5. Spatio-Temporal HMMs

Individual facial characteristics are represented not only by the temporal change (inter-frame) but also by the spatial change (intra-frame). To better model the 3D facial dynamics, we investigate the HMM in both the spatial domain and the temporal domain, as well as their combination.

3.5.1 Temporal HMM (T-HMM)
To explore the temporal dynamics of the 3D facial surface along a time sequence, the temporal HMM takes each frame of a face sequence as one observation. Each frame contains the entire 3D facial region. As described in Section 3.3, we first transform each frame (i.e., a labeled 3D face model) into the optimized feature space using LDA. Each subject is modeled by an N-state fully connected one-dimensional HMM, as shown in Figure 3. Here, we used 12 states (N = 12) to build the HMM model, following training Steps 1 and 2 (described in Section 3.4). We then apply the above procedure to learn a basic HMM on the optimized feature space for each subject's 3D model sequence. Given a query model sequence, the probability of the observation sequence is computed under each HMM. The decision classifying this query model sequence to a subject is denoted Decision_T.

3.5.2 Spatial HMM (S-HMM)
In each frame, the 3D facial model is subdivided into six regions from top to bottom: R1, R2, R3, R4, R5, and R6, as shown in Figure 4. The subdivision is based on the feature points that have been tracked on the facial surface (e.g., the contours of the eyebrows, eyes, nose, mouth, and chin). From top to bottom, we construct a one-dimensional HMM consisting of six states (N = 6), corresponding to the six regions, as shown in Figure 5. As in the entire-face-region case above, for each sub-region we transform the labeled surface to the optimized feature space using the LDA transformation. Given such an observation for each sub-region, we can train the HMM for each subject. Given a query face model sequence of length N, we compute the likelihood score for each frame and use the Bayesian decision rule to decide which subject each frame is classified to. Since we obtain N results for N frames, we take a majority-voting strategy to make the final decision, as sketched below: the query model sequence is recognized as Subject A if A is the majority result among the N frames. Since this method tracks the spatial dynamics of the face surface, we call it the spatial HMM (S-HMM). The decision made by this method is denoted Decision_S.
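A minimal sketch of the per-frame S-HMM classification and majority vote, assuming one 6-state spatial HMM has been trained per subject (e.g., with the training sketch of Section 3.4). s_hmms and frames are hypothetical names; each frame is a (6, d) array of LDA-transformed region features ordered R1 to R6.

```python
from collections import Counter

def s_hmm_decision(s_hmms, frames):
    """Classify each frame by maximum HMM likelihood (equal priors),
    then majority-vote across the N frames of the query sequence."""
    votes = [max(s_hmms, key=lambda subj: s_hmms[subj].score(frame))
             for frame in frames]
    decision_s, n_majority = Counter(votes).most_common(1)[0]
    confidence_s = n_majority / len(frames)  # reused by the C-HMM rule (Sec. 3.5.3)
    return decision_s, confidence_s
```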
3.5.3 Combined Spatio-Temporal HMM (C-HMM)
In order to model both the spatial and the temporal information of 3D face sequences, we propose to combine the spatial HMM and the temporal HMM. Since the extension to a fully connected two-dimensional HMM is computationally very expensive, for simplicity we use a pseudo two-dimensional HMM for 3D face sequence recognition, in which we build HMMs separately along the temporal axis and the spatial axis (as shown in Figure 5). The final decision is made based on the results of the temporal HMM and the spatial HMM. The decision procedure using the combined spatio-temporal HMM (Decision_C) is as follows:

IF Decision_T == Decision_S
    Decision_C = Decision_S
ELSE IF Confidence_S is less than a threshold
    Decision_C = Decision_T
ELSE
    Decision_C = Decision_S
END

Figure 4. Spatial sub-regions (R1–R6) on an adapted model and a labeled model.

Figure 5. Spatial and temporal HMM combination (C-HMM). (Here, N = 12 frames for the 12 states.)

Here, we define Confidence_S as the ratio of the number of majority votes to the number of frames in a query model sequence. In our experiments, we take 12 frames as a sequence (corresponding to the 12 states of the T-HMM) and choose the threshold as 0.7. In other words, if at least 9 frames of a query sequence are recognized as Subject A, we decide the query sequence is Subject A; otherwise, the result comes from Decision_T (see the sketch below). Essentially, this uses the temporally learned facial characteristics to compensate for the spatially learned ones. Such a spatio-temporal HMM combination is effective when the facial surface changes dramatically along with changes in facial expression and pose. We further study the performance in the experiment section using expression-dependent and expression-independent face recognition.
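A minimal sketch of this fusion rule, combining the T-HMM decision with the S-HMM decision and its vote confidence (computed as in the S-HMM sketch of Section 3.5.2); the function name and arguments are illustrative.

```python
def c_hmm_decision(decision_t, decision_s, confidence_s, threshold=0.7):
    """Combined spatio-temporal decision rule (0.7 ~ 9 of 12 frames)."""
    if decision_t == decision_s:
        return decision_s
    if confidence_s < threshold:
        return decision_t   # spatial vote too weak; trust the temporal HMM
    return decision_s
```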
4. Experiments

We conducted experiments on face recognition to verify the functionality and viability of the proposed approach on 3D dynamic range models.
Method                    Recognition rate
2D Gabor-wavelet based    85.09%
3D LLE-based              82.34%
3D PCA-based              90.78%
3D LDA-based              91.37%
4D HMM-based              98.61%

Table 1. Expression-dependent face recognition results.
4.1. Comparison of 4D, static 3D, and static 2D

(1) Expression-dependent: For each subject in the 4D database, we split each of the six video sequences into two equal parts: a training set and a testing set. For each split sequence, we began by choosing the first 12 frames to generate the first sub-sequence, then shifted the sub-sequence index by 3 frames at a time until the end of the split sequence (see the sketch below). Since both the training set and the testing set contain observations of each subject with all six expressions, we call this an expression-dependent experiment. Following the HMM training procedure, we generated a set of HMMs (a T-HMM and an S-HMM) for each subject. The recognition procedure was then applied to classify the identity of each input dynamic sequence, as described in the previous section. The correct recognition rate is as high as 98.61%.

To test whether the proposed approach outperforms conventional 2D approaches and static 3D approaches, we conducted comparison experiments. The Gabor-wavelet-based approach [28] was implemented and tested on frontal-view facial images taken from our 3D video sequences. We used 40 Gabor kernels spanning 5 scales and 8 orientations and applied them to the 83 key points on the 2D images. For the static-3D-based methods, we implemented three approaches (LLE-based, PCA-based, and LDA-based), all of which use the geometric surface feature vector G (defined in Section 3.2) as input. The first is a locally linear embedding (LLE) based approach: we transform the labeled feature G of each range model into the LLE space and select key frames using k-means clustering in that space. All selected key-frame models are then used as the gallery database for classification, and a majority-voting strategy classifies each individual 3D query model in the test set based on its similarity scores to the gallery models. We also implemented the PCA- and LDA-based approaches [18, 1]. Both incorporate the statistical information among all the models in the training dataset; however, neither considers using temporal information to analyze the dynamics of a face over time. From Table 1, we see that the proposed 4D HMM-based approach outperforms all the other algorithms we evaluated.
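As a sketch of the sub-sequence generation described above: 12-frame windows shifted 3 frames at a time over each split sequence. The variable frames is a hypothetical list of per-frame feature vectors.

```python
def subsequences(frames, window=12, stride=3):
    """Yield overlapping windows used as HMM observation sequences."""
    for start in range(0, len(frames) - window + 1, stride):
        yield frames[start:start + window]
```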
Expressional sequence used for training    Recognition rate
Anger                                      95.31%
Disgust                                    95.27%
Fear                                       95.79%
Smile                                      95.64%
Sad                                        95.69%
Surprise                                   95.73%

Table 2. Expression-independent face recognition results.
(2) Expression-independent: We also conducted an expression-independent experiment on the 4D HMM-based approach, in which the HMMs (T-HMM and S-HMM) are trained for each subject using only one type of expression model sequence and tested with the other five expression model sequences. Table 2 shows the recognition results for the different training expressions. The average recognition rate of the 4D HMM-based approach is 95.65%, only a small degradation compared to the expression-dependent result. This illustrates that dynamic 3D face sequences provide ample spatial and temporal information (e.g., unique surface feature labels) for the spatio-temporal HMM to learn each subject's characteristics.
4.2. Comparison of 4D and 2D video [16]

To further evaluate our 4D-data-based face recognition approach, we compared it with the recent work of Liu and Chen [16], which applied a so-called adaptive HMM (A-HMM) to 2D video-based face recognition. The adaptive HMM updates its model parameters when observing a new test sequence. In our 4D face database, each model is associated with one frontal-view texture. Following the algorithm described in [16], we cropped the facial regions from all frames of the 2D video sequences and normalized the cropped facial images to 48×48 pixels. All cropped face images were reduced to low-dimensional feature vectors by PCA; we used 30 eigenvectors and trained a 14-state temporal HMM for each subject. Following [16], the weighting factors α and β were chosen as 0.5 and 0.3, respectively. We used the first half of each video sequence for training and the second half for testing. During recognition, once a test sequence is recognized as a subject, it is used to update that subject's HMM. As before, we conducted both expression-dependent and expression-independent experiments. The results are reported in Table 3.

Methods                E-dependent    E-independent
2D video A-HMM [16]    93.97%         67.05%
4D-based ST-HMM        98.61%         95.65%

Table 3. Comparison with Liu's method [16] in the expression-dependent and expression-independent studies.

The results show that our 4D-data-based approach outperforms the 2D-video-based approach [16]. This can be explained by three major aspects:

1. Unlike the temporal HMM alone, our spatio-temporal HMM learns both the spatial and the temporal information from face model sequences. Most importantly, the spatial HMM and the temporal HMM can verify and compensate for each other, so that the facial dynamic structure is learned from individual frames as well as from successive frames.

2. The dynamic 3D face sequence represents the dynamic facial shape invariantly to illumination and pose changes, whereas the 2D image patches used in [16] are sensitive to illumination, scale, and expression. Liu's algorithm has been shown to be very effective when the training and test sets come from the same video dataset; however, when the data are captured under different imaging conditions (at different time periods), recognition performance degrades dramatically (as reported in [16], "Table 1, Task: new set"). Since our database contains videos with different expressions and illuminations for each subject, the expression-independent test reproduced the low recognition rate of Liu's algorithm. Our geometry-based 3D model sequences, by contrast, are invariant to illumination and pose changes and therefore give more reliable performance in the expression-independent evaluation.

3. Our observation vectors are obtained by an LDA transformation, while [16] used PCA for dimensionality reduction. Compared to PCA, LDA transforms the face data into a more discriminative space. The experiments reported in Table 1 show that the LDA-based approach performs better than the PCA-based approach for 3D face recognition.
5. Discussions and conclusions

In this paper, we proposed a 3D spatio-temporal analysis approach to investigate the use of 4D data to improve face recognition performance. We created a 4D face database comprising 606 3D model sequences with six prototypic expressions. To evaluate the usability of such data for face recognition, we applied a generic model to track the range model sequences and establish the correspondence of range model frames over time. After tracking-model labeling and the LDA transformation, we trained two HMMs (an S-HMM and a T-HMM) for each subject to learn the spatial and temporal information of the 3D model sequences. A query sequence is classified based on the results of the two HMMs. Compared to the conventional 2D-texture-based and static-3D-based approaches (Table 1) and the 2D-video-based approach (Table 3), our 3D spatio-temporal face analysis approach achieves better recognition performance.
There are some limitations to our current work. (1) Due to the lack of other public 4D databases, we were unable to evaluate our approach on more than one 4D data set; the current database needs to be expanded to a larger scale in order to conduct more intensive face recognition tests. (2) In this work, we concerned ourselves with studying the importance and usefulness of the new modality (i.e., 4D data) for face recognition; the model pre-processing could be improved with a more robust and automatic face tracking algorithm, for example by building on the existing approaches [27, 19, 12]. (3) We will further investigate the integration of dynamic 3D geometric data with 2D sequential data in order to improve face recognition performance [21].
6. Acknowledgements

This material is based upon work supported in part by the National Science Foundation under grants IIS-0541044 and IIS-0414029, and by NYSTAR's James D. Watson Investigator Program.
References

[1] P. Belhumeur, J. Hespanha, and D. Kriegman. Eigenfaces vs. fisherfaces: Recognition using class specific linear projection. IEEE Trans. on PAMI, 19(7):711–720, 1997.
[2] M. Bicego, E. Grosso, and M. Tistarelli. Person authentication from video of faces: A behavioral and physiological approach using pseudo hierarchical hidden Markov models. In International Conference on Biometric Authentication, Hong Kong, 2006.
[3] V. Blanz and T. Vetter. Face recognition based on fitting a 3D morphable model. IEEE Trans. on PAMI, 25(9), Sept. 2003.
[4] K. Bowyer, K. Chang, and P. Flynn. A survey of approaches and challenges in 3D and multi-modal 3D+2D face recognition. CVIU, 101(1):1–15, 2006.
[5] A. Bronstein, M. Bronstein, and R. Kimmel. Three-dimensional face recognition. International Journal of Computer Vision, 64(1), 2005.
[6] K. Chang, K. Bowyer, and P. Flynn. An evaluation of multimodal 2D+3D face biometrics. IEEE Trans. on PAMI, 27(4), 2005.
[7] K. Chang, K. Bowyer, and P. Flynn. Multiple nose region matching for 3D face recognition under varying facial expression. IEEE Trans. on PAMI, 28(10), 2006.
[8] Y. Chang, M. Vieira, M. Turk, and L. Velho. Automatic 3D facial expression analysis in videos. In ICCV Workshop on Analysis and Modeling of Faces and Gestures, 2005.
[9] D. DeCarlo and D. Metaxas. Optical flow constraints on deformable models with applications to face tracking. IJCV, 38(2):99–127, 2000.
[10] Di3D (Dimensional Imaging). http://www.di3d.com, 2006.
[11] G. Edwards, C. Taylor, and T. Cootes. Improving identification performance by integrating evidence from sequences. In IEEE CVPR, 1999.
[12] S. Goldenstein, C. Vogler, and D. Metaxas. 3D facial tracking from corrupted movie sequences. In IEEE CVPR, Washington, DC, June 2004.
[13] J. Kittler, A. Hilton, M. Hamouz, and J. Illingworth. 3D assisted face recognition: A survey of 3D imaging, modelling and recognition approaches. In CVPR Workshop on A3DISS, 2005.
[14] S. Li and A. Jain. Handbook of Face Recognition. Springer, New York, 2004.
[15] Y. Li, S. Gong, and H. Liddell. Constructing facial identity surfaces for recognition. IJCV, 53(1):71–92, 2003.
[16] X. Liu and T. Chen. Video-based face recognition using adaptive hidden Markov models. In IEEE CVPR, 2003.
[17] X. Lu, A. Jain, and D. Colbry. Matching 2.5D face scans to 3D models. IEEE Trans. on PAMI, 28(1):31–43, Jan. 2006.
[18] A. Martinez and A. Kak. PCA versus LDA. IEEE Trans. on PAMI, 23(2):228–233, 2001.
[19] D. Metaxas and I. Kakadiaris. Elastically adaptive deformable models. IEEE Trans. on PAMI, 24(10):1310–1321, Oct. 2002.
[20] A. Nefian. A Hidden Markov Model Based Approach for Face Detection and Recognition. Ph.D. thesis, Georgia Institute of Technology, 1999.
[21] T. Papatheodorou and D. Rueckert. Evaluation of automatic 4D face recognition using surface and texture registration. In FGR, 2004.
[22] G. Passalis, I. Kakadiaris, and T. Theoharis. Intraclass retrieval of nonrigid 3D objects: Application to face recognition. IEEE Trans. on PAMI, 29(2), 2007.
[23] P. Phillips, P. Flynn, T. Scruggs, K. Bowyer, J. Chang, K. Hoffman, J. Marques, J. Min, and W. Worek. Overview of the face recognition grand challenge. In IEEE CVPR, San Diego, CA, 2005.
[24] C. Samir, A. Srivastava, and M. Daoudi. Three-dimensional face recognition using shapes of facial curves. IEEE Trans. on PAMI, 28(11), 2006.
[25] A. Scheenstra, A. Ruifrok, and R. Veltkamp. A survey of 3D face recognition methods. In AVBPA, pages 891–899, 2005.
[26] J. Wang, L. Yin, et al. 3D facial expression recognition based on primitive surface feature distribution. In IEEE CVPR, 2006.
[27] Y. Wang, X. Huang, C. Lee, S. Zhang, Z. Li, D. Samaras, D. Metaxas, A. Elgammal, and P. Huang. High resolution acquisition, learning and transfer of dynamic 3D facial expressions. In EUROGRAPHICS, 2004.
[28] L. Wiskott, J. Fellous, N. Krüger, and C. von der Malsburg. Face recognition by elastic bunch graph matching. IEEE Trans. on PAMI, 19(7):776–779, 1997.
[29] L. Yin et al. A high-resolution 3D dynamic facial expression database. Technical report, CS Department, Binghamton University, 2008.
[30] W. Zhao, R. Chellappa, P. Phillips, and A. Rosenfeld. Face recognition: A literature survey. ACM Computing Surveys, 35(4), Dec. 2003.
[31] S. Zhou, V. Krueger, and R. Chellappa. Probabilistic recognition of human faces from video. CVIU, 2003.