
ILLUMINATION INVARIANT THREE-STAGE APPROACH FOR FACE ALIGNMENT

Fatih Kahraman (1), Muhittin Gökmen (2)
Istanbul Technical University, (1) Informatics Institute, (2) Computer Engineering Department
34469 Ayazaga, Istanbul, Turkey
[email protected], [email protected]

ABSTRACT

Localization of facial components is very important for many pattern recognition and computer vision applications, such as face recognition and facial expression tracking. This paper addresses the problem of precisely locating facial components such as the eyes, the mouth and the nose. We present an illumination invariant three-stage approach for face alignment. The paper combines the component-based approach with face alignment and develops a component-based Active Appearance Model (AAM) method for fine face alignment. We also propose a new image representation that improves AAM segmentation accuracy for illumination invariant face alignment.

Index Terms— Image registration, Image shape analysis, Face recognition

1. INTRODUCTION

Face alignment is required to obtain high recognition rates in any face recognition system, and recent studies are centered around this problem. Proposed solutions are generally model-based algorithms. Many shape- and appearance-based methods have been proposed for face alignment; current alignment methods such as the Active Shape Model (ASM) [1], the AAM [2] and its extensions [3] are applied to the face alignment problem.

In this paper, we present an illumination invariant three-stage approach for face alignment. The first stage is coarse face alignment using Haar cascade classifiers. The second stage is global AAM based face alignment to locate facial features. The last stage is fine face alignment that uses a component-based AAM to obtain accurate locations of facial landmarks.

The rest of this paper is organized as follows. In Section 2, we propose the illumination invariant feature descriptor, describe the selection of perceptually important contours, and present the proposed image representation for illumination invariant face alignment. In Section 3, we describe the details of the proposed three-stage face alignment approach, first introducing the initialization of the AAM using a Haar cascade classifier [10], and then describing how salient face components are extracted using the global and component-based AAMs. Experimental results and conclusions are given in the following two sections.

2. ILLUMINATION INVARIANT FEATURE DESCRIPTOR

This section describes the illumination invariant feature descriptor used in our face alignment method. First, we apply an appropriate edge detection method to detect edges. Then we apply contour filtering to select the perceptually important contours in face images. Finally, we obtain a dense image representation from the sparse edge map by using an appropriate surface reconstruction algorithm. Rather than representing the image using normalized grey values [2], gradient values [4], edge orientation [5] or edge phase congruency [6], we use this dense image representation in the second stage of our face alignment method.

Face components such as the eyes, eyebrows, nose and lips correspond to object boundaries. Another observation is that contours arising from texture disappear at large scales. We utilize the Generalized Edge Detector [7][8], which combines most of the existing high-performance edge detectors under a unified framework. The most important part of the contour extraction algorithm is to select the perceptually important contours among the contours obtained by tracing these edges. This is achieved by assigning a priority to each contour, computed as the weighted sum of the normalized contour length, the average contrast along the normal direction and the average curvature. The priority assigned to contour $C_i$ is

$\mathrm{Priority}(C_i) = w_{length}\,\mathrm{Length}(C_i) + w_{contrast}\,\mathrm{Contrast}(C_i) + w_{curvature}\,\mathrm{Curvature}(C_i).$ (1)

We can obtain perceptually important contours, mostly resulting from object boundaries, by selecting only the leading contours in this order and omitting the others, as shown in Fig. 1.
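As a concrete illustration of equation (1), the following minimal Python/NumPy sketch computes the priority of a single traced contour. It is our own illustration: the weight values and the way the three terms are estimated and normalized are assumptions, since the paper does not fix them. Contours would then be sorted by this priority and only the leading ones kept.

```python
import numpy as np

def contour_priority(contour, image,
                     w_length=0.4, w_contrast=0.4, w_curvature=0.2):
    # contour: (N, 2) array of (x, y) points traced along one edge chain.
    # image:   2-D grayscale array, used here to estimate contrast.
    pts = np.asarray(contour, dtype=float)
    seg = np.diff(pts, axis=0)
    seg_len = np.hypot(seg[:, 0], seg[:, 1])
    length = seg_len.sum()                     # contour length term
    # Contrast term: mean gradient magnitude sampled at the contour points
    # (the paper measures contrast along the contour normal direction).
    gy, gx = np.gradient(image.astype(float))
    xs = np.clip(pts[:, 0].astype(int), 0, image.shape[1] - 1)
    ys = np.clip(pts[:, 1].astype(int), 0, image.shape[0] - 1)
    contrast = np.hypot(gx[ys, xs], gy[ys, xs]).mean()
    # Curvature term: mean turning angle between consecutive segments.
    t = seg / (seg_len[:, None] + 1e-9)
    cos_ang = np.clip((t[:-1] * t[1:]).sum(axis=1), -1.0, 1.0)
    curvature = np.abs(np.arccos(cos_ang)).mean()
    return w_length * length + w_contrast * contrast + w_curvature * curvature
```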


Figure 1: Contour selection. (a) Original image; (b) detected edges; (c) selected contours.

Figure 2: Illumination invariant feature descriptor. (a) Selected contours; (b) filter kernel; (c) feature descriptor.

A dense image can be obtained from the sparse edge map by using an appropriate surface reconstruction algorithm. To overcome the locality problem of edges, a membrane functional [8] can be applied to the edge maps by minimizing

$E_m(f, \lambda) = \iint_\Omega (f - d)^2\, dx\, dy + \lambda \iint_\Omega \left(f_x^2 + f_y^2\right) dx\, dy.$ (2)

The spread edge profiles obtained by this membrane fitting give rise to a dense image called "Hill" [9]. Hills have high values at boundary locations and decrease as we move away from the edges.

In our pipeline, goal-oriented edge detection is applied first by sorting the edges and selecting meaningful contours describing a face. The detected contours are then filtered with the R1-filter [8][9], which has very good localization performance while smoothing the edges. Filtering the selected edges, instead of using the contours alone, improves the convergence of the AAM under varying illumination conditions. The resulting Hill image is shown in Figure 2(c). Rather than representing the image using normalized grey values or an edge map, we use this dense image representation as the input image in the second stage of our system.

The same result can be obtained by convolving the edge image with the first-order regularization filter

$R_1(x, y; \lambda) = \frac{1}{2\lambda}\, e^{-\left(|x| + |y|\right)/\lambda}.$ (3)
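A minimal sketch of this filtering step is given below, assuming NumPy/SciPy. It is our own illustration: the truncation of the kernel support at about six λ is our choice, and the paper's implementation may differ.

```python
import numpy as np
from scipy.ndimage import convolve

def r1_kernel(lam=5.0):
    # Separable first-order regularization (membrane) kernel of eq. (3).
    half = int(6 * lam)                       # support size is an assumption
    x = np.arange(-half, half + 1, dtype=float)
    k1d = np.exp(-np.abs(x) / lam)
    return np.outer(k1d, k1d) / (2.0 * lam)

def hill_image(edge_map, lam=5.0):
    # Spread a sparse (binary) edge map into a dense "Hill" image.
    return convolve(np.asarray(edge_map, dtype=float), r1_kernel(lam),
                    mode='constant')
```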

In our experiments, the best results for edge detection are obtained with λ = 5. Faces of different sizes require edge detectors at different scales, and the Generalized Edge Detector is capable of producing these different results. In our case, we determine the scale of the face image from the bounding box returned by the first stage's face detector and set the λ parameter adaptively from the scale of the input face.

3. THREE-STAGE FACE ALIGNMENT

In this section, we describe a new three-stage approach for face alignment. The first stage is coarse face alignment using the Haar cascade face detector [10]. The second stage is global AAM based face alignment to locate facial features. The last stage is fine face alignment that uses a component-based AAM to obtain accurate locations of facial landmarks.

3.1 First Stage: Coarse Face Alignment

The first step in our face alignment system is to detect the face region in the still image. This is achieved using a Haar Cascade Classifier (HCC), which is scale invariant and can cope with pose variations to some degree. The classifier was first proposed by Viola and Jones [10]. It is trained on images taken from face (positive) and non-face (negative) regions of size 20 by 20 pixels. After successful training, the classifier outputs 1 where a face exists and 0 otherwise. This first stage can only give a rough estimate of the face region.

We use a global AAM to extract face landmark points in the second stage of our face alignment system. Initialization of the standard AAM is very important for good convergence: an inappropriate initialization can cause wrong model convergence and consequently yields wrong landmark locations. To avoid dependence on initialization, the HCC results are used to initialize the AAM automatically, which provides both a better initialization and faster convergence of the model.
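For illustration, a minimal OpenCV sketch of this first stage follows. The cascade file and detection parameters are our assumptions; this is the stock OpenCV frontal-face cascade, not the 20x20 classifier trained in the paper.

```python
import cv2

# The cascade file below ships with OpenCV; using it here is our
# assumption, not the classifier trained by the authors.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_face(image_bgr):
    # Returns (x, y, w, h) of the largest detected face, or None.
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=4)
    if len(faces) == 0:
        return None
    # The largest bounding box initializes the global AAM of stage two
    # and sets the scale used to pick lambda in Section 2.
    return max(faces, key=lambda box: box[2] * box[3])
```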

3.2 Second Stage: AAM based face alignment

In this section, a brief review of the AAM is given. Any human face can be synthesized from the model, which is trained on a database prepared by a human observer selecting the landmark points. Once the model is constructed, any given face can be mapped to the model space by minimizing the residual error between the original face image and the synthesized face image [11]. Assume that the data belonging to the i-th face image are denoted $\{(S_i, T_i)\}$, where $S_i = ((x_1, y_1), (x_2, y_2), \ldots, (x_K, y_K))$ is a set of points containing the shape information and $T_i$ contains the texture information at $S_i$. The AAM is obtained by applying principal component analysis to $\{(S_i, T_i)\}$:

$S = \bar{S} + P_s\, s, \qquad T = \bar{T} + P_t\, t,$ (4)

where $\bar{S}$ is the mean shape, $\bar{T}$ is the mean texture, $P_s$ and $P_t$ hold the eigenvectors corresponding to the $m$ largest eigenvalues, and $s$ and $t$ are the shape and texture parameter vectors. Any face image can easily be mapped to the model using $s = P_s^T (S - \bar{S})$ and $t = P_t^T (T - \bar{T})$. Any change in the shape leads to a change in the texture, since the AAM model space is composed of the texture and shape subspaces. Hence the appearance model $A$ for any given image can be obtained by the formula

$A = (\Lambda s,\; t)^T,$

where $\Lambda$ denotes the diagonal shape weight matrix.
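As a concrete illustration of equation (4) and the combined appearance vector, here is a minimal NumPy sketch. It is our own illustration: PCA is done with an SVD, and Λ is reduced to a single scalar weight, which is one common way to balance shape against texture variance, not necessarily the paper's exact weighting.

```python
import numpy as np

def pca(X, keep=0.95):
    # PCA on the rows of X; keep enough eigenvectors to explain `keep`
    # of the total variance. Returns the mean and eigenvector matrix P.
    mean = X.mean(axis=0)
    _, sv, Vt = np.linalg.svd(X - mean, full_matrices=False)
    energy = np.cumsum(sv ** 2) / np.sum(sv ** 2)
    k = int(np.searchsorted(energy, keep)) + 1
    return mean, Vt[:k].T

def build_aam(shapes, textures, keep=0.95):
    # shapes: (n, 2K) aligned landmark vectors; textures: (n, L) vectors.
    S_mean, Ps = pca(shapes, keep)
    T_mean, Pt = pca(textures, keep)
    s = (shapes - S_mean) @ Ps        # s = Ps^T (S - S_mean), eq. (4)
    t = (textures - T_mean) @ Pt      # t = Pt^T (T - T_mean)
    # A scalar lambda balancing shape and texture variance (assumption).
    lam = np.sqrt(np.sum(np.var(t, axis=0)) / np.sum(np.var(s, axis=0)))
    A = np.hstack([lam * s, t])       # combined appearance vectors
    A_mean, Pa = pca(A, keep)         # A ~ A_mean + Pa a (next paragraph)
    return S_mean, Ps, T_mean, Pt, lam, A_mean, Pa
```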

Principal component analysis can be applied to $A$ in order to reveal the relationship between the shape and texture subspaces, $A = P_a\, a$, where $P_a$ holds the eigenvectors corresponding to the $m$ largest eigenvalues and $a$ is the appearance parameter vector.

3.3 Last Stage: Component based AAM

The last stage of the face alignment system is fine face alignment using a component-based AAM to obtain accurate locations of facial landmarks. This study combines the component-based approach with AAM based face alignment and develops a component-based AAM method for fine face alignment. It is known that component-based face detection can yield better performance than global approaches when pose and illumination variations and occlusion are considered. While pose and illumination considerably change the global face appearance, the components, being smaller than the whole face, are less prone to these changes.

We propose a component-based AAM for face alignment which decomposes the face into a set of facial components. In our approach the face region is divided into four components, namely the left eye region, the right eye region, the mouth region and the nose region. After the global AAM search is over, we employ a sub-space AAM search to minimize the average point-to-point distance between the searched shape and the manually labeled shape for each face component. For example, in our system the second stage gives the face component locations coarsely. We can also determine the scale and rotation of the face from the second stage and use this information to limit the search; such prior information could be used to immediately accept or reject search results in the final component-based AAM search.

The AAM converges to the correct solution if a good initialization is given, but is otherwise prone to local minima. The face landmark locations predicted by the second stage of our system are all close enough to their ground-truth locations to guarantee a good initialization for the component-based AAM search for fine face alignment. In Figure 3, component-based AAM (leftEyeAAM) results are shown; in these examples, all initial locations are obtained from the previous stage (global AAM). When the global AAM search is over, we employ the sub-space AAM (right/leftEyeAAM, noseAAM, mouthAAM) search for fine alignment. Figure 3 illustrates the advantage of a component-based representation in disambiguating false global AAM face alignment. A sketch of the component decomposition is given below.
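The landmark index ranges in the sketch below are hypothetical: the paper gives the per-component point counts but not the exact ordering of its 74-point markup.

```python
import numpy as np

# Hypothetical index ranges into the 74-point markup; the actual
# grouping used in the paper may differ.
COMPONENTS = {
    "mouth":     range(0, 8),
    "nose":      range(8, 19),
    "left_eye":  range(19, 41),
    "right_eye": range(41, 63),
}

def split_shape(shape):
    # shape: (74, 2) landmarks from the global AAM. Each sub-shape then
    # initializes its own component AAM (mouthAAM, noseAAM,
    # left/rightEyeAAM) for the fine-alignment search.
    shape = np.asarray(shape)
    return {name: shape[list(idx)] for name, idx in COMPONENTS.items()}
```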

Figure 3: Top: initializations of the eyeAAM. Bottom: results obtained at the 12th iteration.

4. EXPERIMENTAL RESULTS

In our experiments we used the IMM face database [13] and the CMU-PIE face database [14]; 134 color images containing a full frontal face with a neutral expression were used. There are 74 color, full-frontal face images in the IMM database: 37 images with no spotlight were used for training, and the remaining 37 images, which have a spotlight added at the person's left side, were used for testing. We also selected 60 images from the CMU-PIE face database to test our face alignment method under extreme illumination. We picked 74 landmarks in each face image of the training set: 8 points for the mouth, 11 points for the nose, 22 points for each eye, and 11 points for the chin.

Table 1: Face segmentation results of the 2nd stage (global AAM) for test images (640x480), given as pt.-pt. error.

  Feature descriptor   IMM           PIE
  RGB (standard)       6.37 ± 1.35   38.67 ± 2.04
  Hill (proposed)      6.09 ± 0.31    6.35 ± 0.39
  HHG (proposed)       4.82 ± 0.23    5.95 ± 0.27

Table 2: Face segmentation results of the 3rd stage (component-based AAM) for test images (640x480), given as pt.-pt. error.

  Feature descriptor   IMM           PIE
  RGB (standard)       4.64 ± 0.10   21.70 ± 1.98
  Hill (proposed)      5.30 ± 0.17    6.11 ± 0.27
  HHG (proposed)       3.59 ± 0.09    5.23 ± 0.19
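Both tables report the point-to-point (pt.-pt.) error, the mean Euclidean distance between corresponding landmarks, as defined below. A minimal sketch of the metric (our own illustration):

```python
import numpy as np

def pt_pt_error(found, ground_truth):
    # Mean Euclidean distance between corresponding landmarks;
    # both arguments are (K, 2) arrays of (x, y) coordinates.
    d = np.asarray(found, float) - np.asarray(ground_truth, float)
    return float(np.hypot(d[:, 0], d[:, 1]).mean())
```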

We constructed shape and texture spaces to represent 95% of the observed variation in shape and texture for each component. Finally, we constructed appearance spaces for each component and for the global model to represent 95% of the total variation observed in the shape and texture coefficients. The global and component-based AAMs were trained on the same training datasets. In all experiments λ = 5 was used to obtain the Hill images.

We obtained better segmentation results with the AAM when the Hill images were used in the second stage. Using the Hill images in the second stage solves the AAM convergence problem, but fine alignment still needs improvement: we need more detailed features in the last stage. For the last stage, we therefore propose a new multi-band image representation which is less sensitive to illumination without losing the texture details of the face images. It is known from previous studies that lighting variations have less influence on the hue band of the Hue, Saturation and Value (HSV) color space [12][4]. Changes in illumination can strongly affect the skin color distribution, and these changes are reduced in the HSV color space. We therefore use the Hue component in our multi-band image representation to gain further robustness to illumination variation. Rather than representing the image using grey values alone, we use Hue, Hill, and Gray values (HHG). We obtain high-accuracy segmentation results with the AAM when this multi-band modeling of appearance is used in the second and last stages. As seen in Table 1 and Table 2, the proposed three-band HHG features outperform both the standard and the Hill-based AAM. The computation of the Hue values and the details of the three-band AAM modeling are given in our previous study [12].

The accuracy of the segmentation is computed with the point-to-point (pt.-pt.) distance measure, the Euclidean distance between corresponding landmarks of the model and the ground truth [4]. The comparative results are given in Tables 1 and 2. Our proposed three-stage approach locates the points more accurately than the classical AAM for test images taken under different illumination conditions, because our illumination invariant feature descriptor is obtained by smoothing the most prominent edge contours in the face image, gathered by means of an efficient and effective edge ordering scheme. A sketch of the HHG construction is given below.
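A minimal sketch of assembling the HHG bands, assuming OpenCV for the color conversions. This is our own illustration; the exact band scaling and the full details of the three-band modeling are in [12].

```python
import cv2
import numpy as np
from scipy.ndimage import convolve

def hhg_image(image_bgr, edge_map, lam=5.0):
    # Stack Hue, Hill and Gray into the three-band HHG representation.
    hue = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)[:, :, 0].astype(float)
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY).astype(float)
    # Hill band: the edge map spread with the R1 membrane filter of
    # Section 2; the kernel truncation at ~6*lam is our assumption.
    x = np.arange(-int(6 * lam), int(6 * lam) + 1, dtype=float)
    k1d = np.exp(-np.abs(x) / lam)
    hill = convolve(np.asarray(edge_map, float),
                    np.outer(k1d, k1d) / (2.0 * lam), mode='constant')
    return np.dstack([hue, hill, gray])
```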

Figure 4: Comparison under good and extreme illumination. Top: standard AAM results using RGB values (2nd stage). Bottom: results of the component-based AAM (3rd stage).

5. CONCLUSION AND FUTURE WORKS

From the experimental results presented in Section 4, it is clear that the component-based AAM and the proposed image representation provide higher accuracy for fine alignment. Our three-stage approach handles not only illumination variations but also poor initializations. The results show that the component-based AAM outperforms the global AAM in facial shape localization. The performance of the proposed scheme on larger face databases is currently under investigation.

6. REFERENCES

[1] T. F. Cootes, D. Cooper, C. Taylor, and J. Graham, "Active shape models: their training and application," Computer Vision and Image Understanding, vol. 61, no. 1, pp. 38-59, 1995.
[2] T. F. Cootes, G. J. Edwards, and C. J. Taylor, "Active appearance models," in Proc. ECCV'98, vol. 2, pp. 484-498, 1998.
[3] T. F. Cootes and P. Kittipanya-ngam, "Comparing variations on the active appearance model algorithm," in Proc. BMVC 2002, vol. 2, pp. 837-846.
[4] M. B. Stegmann and R. Larsen, "Multi-band modelling of appearance," Image and Vision Computing, vol. 21, no. 1, pp. 61-67, 2003.
[5] T. F. Cootes and C. J. Taylor, "On representing edge structure for model matching," in Proc. CVPR 2001, vol. 1, pp. 1114-1119.
[6] Y. Huang, S. Lin, S. Z. Li, H. Lu, and H. Y. Shum, "Face alignment under variable illumination," in Proc. IEEE Int. Conf. on Automatic Face and Gesture Recognition (FG 2004), 2004.
[7] B. Kurt, M. Gökmen, and A. K. Jain, "Image compression based on centipede model," in Proc. ICIAP'97, vol. I, pp. 303-310, 1997.
[8] M. Gökmen and A. K. Jain, "λ-τ space representation of images and generalized edge detection," IEEE Trans. on PAMI, vol. 19, no. 6, pp. 545-563, June 1997.
[9] A. Yilmaz and M. Gökmen, "Eigenhills vs. eigenface and eigenedge," Pattern Recognition, vol. 34, pp. 181-184, 2001.
[10] P. Viola and M. J. Jones, "Rapid object detection using a boosted cascade of simple features," in Proc. Computer Vision and Pattern Recognition Conf., 2001.
[11] J. Ahlberg, "A system for face localization and facial feature extraction," Technical Report LiTH-ISY-R-2172, 1999.
[12] F. Kahraman and M. Gökmen, "Illumination invariant face alignment using multi-band AAM," in Proc. Int. Conf. on Pattern Recognition and Machine Intelligence, Kolkata, India, Dec. 2005.
[13] M. B. Stegmann, "Analysis and segmentation of face images using point annotations and linear subspace techniques," Technical Report, DTU, 2002.
[14] T. Sim, S. Baker, and M. Bsat, "The CMU pose, illumination, and expression (PIE) database," in Proc. Int. Conf. on Automatic Face and Gesture Recognition, May 2002.
