Methodologies to Build Automatic Point Distribution Models for Faces Represented in Images Maria João M. Vasconcelos João Manuel R. S. Tavares
Faculdade de Engenharia da Universidade do Porto Instituto de Engenharia Mecânica e Gestão Industrial Laboratório de Óptica e Mecânica Experimental Rua Drº Roberto Frias s/n, 4200-465 Porto, PORTUGAL
ABSTRACT: This paper presents new methods to automatically build Point Distribution Models for faces represented in images. These models consider significant points of faces from several images and study them in order to obtain the mean shape of the object and the main modes of its variation. Active Shape Models and Active Appearance Models use the Point Distribution Model to segment the modelled object in new images. In this paper, these models, their automatic building, and some application examples with faces represented in images are described.

1 INTRODUCTION

One of the most recent areas of interest in Computational Vision is image analysis based on flexible models. In this field, the use of statistical methods for object modelling has proved suitable for problems in which the objects have variable shapes. This work is mainly concerned with the use of Point Distribution Models (PDMs) in the modelling of objects represented in images (Cootes et al. 1992). These models are obtained by analysing the statistics of the co-ordinates of the landmarks that represent the deformable object under study: after aligning the object shapes, a Principal Component Analysis is performed and the mean shape of the object and the main modes of its variation are obtained. The grey levels of the objects can also be modelled and used to build Active Shape Models (ASMs) and Active Appearance Models (AAMs), in order to segment (identify) the modelled object in new images. These statistical models have been very useful for image analysis in different applications of Computational Vision. For instance, they can be used in areas like medicine, for locating bones and organs in medical images; industry, for industrial inspection; and security, for face recognition.

Usually, because it is done manually, the determination of the landmark points of the objects to be modelled is the most time-consuming step of the construction of PDMs, and so of ASMs and AAMs as well. Consequently, some authors, like (Hill & Taylor 1994, Baker & Matthews 2002, Hicks et al. 2002, Angelopoulou & Psarrou 2004, Carvalho & Tavares 2005, Vasconcelos 2005), have been developing methodologies to fully automate this stage. In this work, we present three methodologies to automatically extract significant points from faces represented in images.

The main goals of the present work are: the introduction of the Point Distribution Models and their variants, namely ASMs and AAMs; the building of these models for faces represented in images using fully automatic procedures; and their application, namely to the automatic segmentation of faces in new images.

This paper is organized as follows: in the next section, the models considered are presented; in section 3, our methods to automatically extract landmark points of the faces to be modelled, using the models previously presented, are described; in section 4, some experimental results are presented; finally, in the last section, some conclusions and perspectives of future work are addressed.

2 POINT DISTRIBUTION MODEL

(Cootes et al. 1992) describe how to build flexible shape models for objects, called Point Distribution Models. These models are generated from examples of shapes of the object to be modelled, where each shape is represented by a set of labelled landmark points. The landmarks can represent the boundary or significant internal locations of the object (Fig. 1).
Figure 1. Training image, landmarks and an image labelled with the landmark points (from left to right).
In this modelling method, all the training examples are aligned into a standard co-ordinate frame and a Principal Component Analysis is applied to the co-ordinates of the landmark points. This produces the mean position of each landmark, and a description of the main ways in which these points tend to move together. The equation below represents the Point Distribution Model, or Shape Model, and can be used to generate new shapes:

x = \bar{x} + P_s b_s ,   (1)

where x represents the n points of the shape:

x = (x_0, y_0, x_1, y_1, \ldots, x_{n-1}, y_{n-1})^T ,
(x_k, y_k) the position of point k, \bar{x} the mean position of the points, P_s = (p_{s1}\ p_{s2}\ \ldots\ p_{st}) the matrix of the first t modes of variation, p_{si}, corresponding to the most significant eigenvectors in a Principal Component Analysis of the position variables, and b_s = (b_{s1}\ b_{s2}\ \ldots\ b_{st})^T a vector of weights for each mode. If the shape parameters b_s are chosen within suitable limits (derived from the training set), then the shapes generated by equation (1) will be similar to those given in the original training set.
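As a concrete illustration of how such a model can be built and sampled, the following sketch uses Python with NumPy (the original work used MATLAB; the function and variable names are ours, and the ±3 standard deviation limit is a common choice rather than one prescribed by the paper):

```python
import numpy as np

def build_pdm(shapes, variance_kept=0.95):
    """Build a Point Distribution Model from aligned training shapes.

    shapes: (m, 2n) array; each row is (x0, y0, ..., xn-1, yn-1).
    Returns the mean shape, the matrix Ps of the first t modes, and
    the corresponding eigenvalues.
    """
    x_mean = shapes.mean(axis=0)
    cov = np.cov(shapes, rowvar=False)               # covariance of the coordinates
    eigvals, eigvecs = np.linalg.eigh(cov)           # ascending eigenvalues
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
    cum = np.cumsum(eigvals) / eigvals.sum()
    t = int(np.searchsorted(cum, variance_kept)) + 1  # modes explaining the target variance
    return x_mean, eigvecs[:, :t], eigvals[:t]

def generate_shape(x_mean, Ps, eigvals, bs):
    """Generate a new shape with equation (1), clamping each parameter
    to +/- 3 standard deviations so the shape stays plausible."""
    limit = 3.0 * np.sqrt(eigvals)
    bs = np.clip(bs, -limit, limit)
    return x_mean + Ps @ bs
```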
The local grey-level environment around each landmark point can also be considered in the modelling of an object represented in images. Thus, statistical information is obtained about the mean and covariance of the grey values of the pixels around each landmark point. This information is used in the PDM variants: to evaluate the match between landmark points in Active Shape Models, and to construct the appearance models in Active Appearance Models, as we explain next.

2.1 Active Shape Model

After building the PDM and the grey-level profiles for each landmark point of an object, we can segment that object in new images using Active Shape Models, an iterative technique for fitting flexible models to objects represented in images (Cootes & Taylor 1992a). This technique is an iterative optimisation scheme for PDMs that allows initial estimates of the pose, scale and shape of an object to be refined in a new image. The approach can be summarized in the following steps: 1) at each landmark point of the model, the movement needed to displace that point to a better position is calculated; 2) the changes in the overall position, orientation and scale of the model which best satisfy these displacements are calculated; 3) finally, any residual differences are used to deform the shape of the model, by calculating the required adjustments to the shape parameters. In (Cootes et al. 1994), an improvement to Active Shape Models that uses multiresolution is presented: the method first constructs a multiresolution pyramid of the images to be considered, by applying a Gaussian mask, and then studies the grey-level profiles at the various levels of the pyramid, in this way making the active models faster and more reliable.
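A minimal sketch of one ASM update is given below (Python/NumPy, for illustration only). It assumes that candidate grey-level profiles have already been sampled along each landmark's normal, selects the best match by Mahalanobis distance to the trained profile statistics, and then constrains the suggested shape with the model; pose fitting is omitted for brevity, and all names are our own:

```python
import numpy as np

def best_profile_shift(candidates, g_mean, S_inv):
    """Index of the candidate profile closest to the trained grey-level
    statistics, measured by Mahalanobis distance."""
    dists = [(g - g_mean) @ S_inv @ (g - g_mean) for g in candidates]
    return int(np.argmin(dists))

def asm_constrain(x_mean, Ps, eigvals, x_suggested):
    """Project the suggested landmark positions onto the shape model and
    clamp the parameters, as in step 3 of the search."""
    bs = Ps.T @ (x_suggested - x_mean)
    limit = 3.0 * np.sqrt(eigvals)
    bs = np.clip(bs, -limit, limit)
    return x_mean + Ps @ bs
```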
2.2 Active Appearance Model

This approach was presented in (Cootes et al. 1998) and allows the building of texture and appearance models. These models are generated by combining a model of shape variation (a geometric model) with a model of the appearance variations in a shape-normalized frame. The statistical shape model used is also described by equation (1). To build a statistical model of the grey-level appearance, we deform each example image so that its landmark points match the mean shape of the object, using a triangulation algorithm. We then sample the grey-level information, g_{im}, from the shape-normalized image over the region covered by the mean shape. To minimize the effect of global lighting variation, we normalize this vector, obtaining g. By applying a Principal Component Analysis to this data, we obtain a linear model, the texture model:

g = \bar{g} + P_g b_g ,   (2)
where \bar{g} is the mean normalised grey-level vector, P_g is a set of orthogonal modes of grey-level variation, and b_g is a set of grey-level model parameters. Therefore, the shape and appearance of any example of the modelled object can be defined by the vectors b_s and b_g. Since there may be some correlation between the shape and grey-level variations, we apply a further Principal Component Analysis to the data of the models. Thus, for each training example we generate the concatenated vector:

b = \begin{pmatrix} W_s b_s \\ b_g \end{pmatrix} = \begin{pmatrix} W_s P_s^T (x - \bar{x}) \\ P_g^T (g - \bar{g}) \end{pmatrix} ,   (3)
where W_s is a diagonal matrix of weights for each shape parameter, allowing an adequate balance between the shape and grey-level models. Then, we apply a Principal Component Analysis to these vectors, giving a further model:

b = Q c ,   (4)
where Q is the matrix of eigenvectors of the vectors b, and c is the vector of appearance parameters controlling both the shape and the grey levels of the model. Thus, an example of the modelled object can be synthesized for a given c by generating the shape-free grey-level object from the vector g and deforming it using the landmark points described by x.
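The synthesis step can be sketched as follows (Python/NumPy; the splitting index n_shape and the name Ws_inv are our assumptions about how the concatenated vector of equation (3) is stored):

```python
import numpy as np

def synthesize(c, Q, Ws_inv, n_shape, x_mean, Ps, g_mean, Pg):
    """Recover shape and texture from the appearance parameters c.

    b = Q c concatenates Ws*bs (first n_shape entries) and bg (the rest),
    as in equation (3); equations (1) and (2) then give x and g.
    """
    b = Q @ c
    bs = Ws_inv @ b[:n_shape]    # undo the shape parameter weighting
    bg = b[n_shape:]
    x = x_mean + Ps @ bs         # landmark positions, equation (1)
    g = g_mean + Pg @ bg         # shape-free grey levels, equation (2)
    return x, g
```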
3 AUTOMATIC EXTRACTION OF LANDMARK POINTS

Figure 2 presents some results obtained on a training image example using our first method (section 3.1) to automatically extract landmark points of faces represented in images.
3.1 Face Contour Extraction

This method extracts significant points of faces represented in images, namely on the chin, eyes, eyebrows and mouth. The first step of our method uses a skin detection algorithm to localize the face region. This algorithm uses a representative skin model, built with skin samples of the individual under study. Studies like (Jones & Rehg 1999, Tien et al. 2004, Zheng et al. 2004, Carvalho & Tavares 2005) show that skin colour usually has the same luminance range, and that by studying the skin chromatic colours it is possible to build a probability function for skin regions. Studies like (Campadelli et al. 2003) show that chrominance maps are useful for the localization of eyebrows and eyes in images. Chromatic colours can be obtained from the RGB colour space using the transformation:

C_r = \frac{R}{R + G + B} , \quad C_b = \frac{B}{R + G + B} .   (5)
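A sketch of this transformation and of a simple skin-probability model is given below (Python/NumPy; the paper builds its probability function from skin samples of the individual, and a single Gaussian in the CrCb plane is only one plausible realisation of it, not necessarily the one used):

```python
import numpy as np

def chromatic_colours(rgb):
    """Equation (5): chromatic (luminance-normalized) colours from an
    RGB image given as a float array of shape (H, W, 3)."""
    s = rgb.sum(axis=2, keepdims=True)
    s = np.where(s == 0, 1.0, s)          # avoid division by zero
    cr = rgb[..., 0] / s[..., 0]          # R / (R + G + B)
    cb = rgb[..., 2] / s[..., 0]          # B / (R + G + B)
    return cr, cb

def skin_probability(cr, cb, mean, cov_inv):
    """Gaussian skin model (an assumption) evaluated on the CrCb values;
    mean and cov_inv are estimated from the individual's skin samples."""
    d = np.stack([cr, cb], axis=-1) - mean
    m = np.einsum('...i,ij,...j->...', d, cov_inv, d)
    return np.exp(-0.5 * m)
```

Thresholding the resulting probability map then gives the face region used in the following steps.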
Usually, eyes are characterized in the C_bC_r plane by low values on the red component, C_r, and high values on the blue component, C_b, so the chrominance map for the eyes can be defined by the following equation:

EyeMap = \frac{1}{3} \left[ (C_b)^2 + (\tilde{C}_r)^2 + \frac{C_b}{C_r} \right] ,   (6)

where (C_b)^2, (\tilde{C}_r)^2 and C_b / C_r are normalized to the range [0, 255] and \tilde{C}_r is the negative of C_r (i.e., \tilde{C}_r = 255 - C_r). In our work, the EyeMap is also used to identify the eyebrows region, with good results. On the other hand, in our method the mouth region is identified using the HSV colour space, where H, S and V represent hue, saturation and value, respectively; the mouth is habitually characterized by high values on the saturation component. By congregating the contours of the face, eyebrows, eyes and mouth, it is possible to extract landmark points from each of these zones. Considering that the zone of the chin is the most important segment of the face contour, we use only its inferior part, between the ears.
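A direct transcription of equation (6) might look like the following (Python/NumPy; rescaling each term to [0, 255] is our assumption about the intended normalization):

```python
import numpy as np

def eye_map(cb, cr):
    """Equation (6): chrominance eye map from Cb and Cr planes given as
    arrays with values in [0, 255]."""
    cb = cb.astype(np.float64)
    cr = cr.astype(np.float64)

    def to_255(a):                          # rescale a term to [0, 255]
        span = np.ptp(a)
        return 255.0 * (a - a.min()) / (span if span > 0 else 1.0)

    cr_neg = 255.0 - cr                     # the negative of Cr
    ratio = cb / np.maximum(cr, 1.0)        # Cb / Cr, avoiding division by zero
    return (to_255(cb ** 2) + to_255(cr_neg ** 2) + to_255(ratio)) / 3.0
```

High values of this map can then be thresholded to localize the eye (and, in our case, eyebrow) regions.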
Figure 2. a) Training image, b) segmentation result using the skin algorithm, c) face contour extracted, d) eyebrows and eyes found, e) mouth identified, and f) final contours obtained.
3.2 Face Regular Mesh

The second method developed for the automatic extraction of the landmark points of faces represented in images is based on the work presented in (Baker & Matthews 2004) which, to construct active appearance models, considers the landmark points as the nodes of a mesh defined on the object to be modelled. Our method starts by identifying the face and eye regions as described in the previous section, and then adjusts a regular rectangular mesh to the detected face region, rotating it according to the angle given by the eyes' centroids. The nodes of the obtained mesh are then considered as landmark points of the object and used to build active appearance models for it. Figure 3 shows the face mesh obtained on a training image example using this method; a sketch of the mesh construction follows the figure caption.
Figure 3. a) Training image, b) face regular mesh (red points) adapted to the face region (face contour in blue) and rotated according to the eyes' direction (yellow).
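The construction of the rotated regular mesh can be sketched as follows (Python/NumPy; the mesh density and the choice of rotating about the centre of the face region are our assumptions):

```python
import numpy as np

def regular_face_mesh(face_bbox, eye_left, eye_right, rows=7, cols=7):
    """Regular rectangular mesh over the detected face region, rotated
    by the angle of the line joining the two eye centroids.

    face_bbox: (x_min, y_min, x_max, y_max); eye_*: (x, y) centroids.
    Returns an (rows*cols, 2) array of mesh nodes / landmark points.
    """
    x0, y0, x1, y1 = face_bbox
    xs = np.linspace(x0, x1, cols)
    ys = np.linspace(y0, y1, rows)
    nodes = np.array([(x, y) for y in ys for x in xs])
    dx, dy = np.subtract(eye_right, eye_left)
    a = np.arctan2(dy, dx)                           # angle of the eye line
    R = np.array([[np.cos(a), -np.sin(a)],
                  [np.sin(a),  np.cos(a)]])
    centre = np.array([(x0 + x1) / 2.0, (y0 + y1) / 2.0])
    return (nodes - centre) @ R.T + centre           # rotate about the centre
```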
3.3 Face Adaptive Multiresolution Mesh

Finally, our third method combines the philosophy of the first method described, using the face, eye and mouth localizations, with that of the second, considering the landmark points as the nodes of the defined meshes. Thus, our new method builds a multiresolution mesh considering the face, eye and mouth positions. After localizing the face, eye and mouth regions in the input image as described before, this new method constructs adaptive meshes in the detected eye and mouth regions, according to their localization, and then adds additional nodes to the large mesh (that contains the face region), defined by its external edges and the bounds of the sub-meshes used in the eye and mouth regions. A sketch of this construction is given below; one example of the resulting final mesh obtained with our third method for a face represented in an image is presented in Figure 4.
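The combination can be sketched as follows (Python/NumPy; the sub-mesh densities and the de-duplication of coincident nodes are our assumptions about details the text leaves open):

```python
import numpy as np

def grid(bbox, rows, cols):
    """Regular grid of nodes over a bounding box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = bbox
    xs, ys = np.linspace(x0, x1, cols), np.linspace(y0, y1, rows)
    return np.array([(x, y) for y in ys for x in xs])

def adaptive_face_mesh(face_bbox, eye_bboxes, mouth_bbox):
    """Multiresolution mesh: a coarse mesh over the face region plus
    denser sub-meshes over the detected eye and mouth regions."""
    parts = [grid(face_bbox, 5, 5)]                  # coarse face mesh
    parts += [grid(b, 3, 4) for b in eye_bboxes]     # denser eye meshes
    parts.append(grid(mouth_bbox, 3, 5))             # denser mouth mesh
    nodes = np.vstack(parts)
    return np.unique(np.round(nodes, 3), axis=0)     # drop coincident nodes
```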
Figure 4. a) Training image and b) example of an adaptive multiresolution mesh obtained for a face.
In all the implementations developed for our methods to automatically extract landmark points of faces represented in images, the parameters that define the resulting contour or mesh can be chosen; that is, the number of landmark points defined in each zone of interest of the object to be modelled.

4 RESULTS

The methods described in this paper were used to automatically build active shape and active appearance models for objects like faces represented in images. During this work, we developed an application in MATLAB to build shape models, using the Active Shape Models software (Hamarneh 1999). For the appearance models, we used the Modelling and Search Software available in (Cootes 2004). The images used in this paper are available in (Cootes 2004a). For modelling faces represented in images, we used a training set of 22 images; 4 other images were used only for testing purposes. The active shape model was built using the first method presented in this paper for the automatic extraction of landmark points of faces represented in images, and the other two methods presented for the same purpose were used to build active appearance models. We present results for active models using the three approaches proposed for extracting landmark points: the face contour method extracted 44 landmark points, the regular mesh approach extracted 49 landmark points, and the third method extracted 54 and 75 landmark points.

For the active shape model built, using 44 landmark points, the first 10 modes of variation could explain 90% of all the shape variance of the modelled object. For the first face shape model trained (face contour), it was found that 95% of the shape variance could be explained by the first 13 modes of variation alone. On the other hand, for the texture model, it was found that 95% of the variance could be explained by the first 15 modes of variation. Finally, the appearance model needed only 12 modes of variation to explain 95% of the observed variance. The first four modes of appearance variation are shown in Figure 5.
Figure 5. First four modes of appearance variation for the face contour model built (±2 sd).
For the model trained using an adaptive multiresolution face mesh, it was found that 95% of the variance of the modelled object could be explained by the first 3 modes of variation alone. On the other hand, for the texture model, it was found that 95% of the variance of the same object could be explained by the first 14 modes of variation. Lastly, the appearance model needed only 8 modes of variation to explain 95% of the observed variance of the modelled object. The first four modes of variation of the texture and appearance models built are shown in Figure 6.
Figure 6. First four modes of appearance variation for the adaptive multiresolution face mesh model considered (±2 sd).
Figures 7, 8 and 9 present some segmentation results obtained on a test image using the active appearance models built with the face contour model, the regular face mesh model and the adaptive face mesh model, respectively.
Figure 7. Test image with initial position of the mean model overlapped, and after the 1st, 7th, 12th, 17th and 21st iteration of the search with the active appearance model built for the face contour model.
Figure 8. Test image with initial position of the mean model overlapped, and after the 1st, 10th, 15th, 19th and 24th iteration of the search with the active appearance model built for the regular face mesh model.
Figure 9. Test image with initial position of the mean model overlapped, and after the 1st, 10th, 15th, 20th and 23rd iteration of the search with the active appearance model built for the adaptive face mesh model.
In the active appearance search process, 5 levels of resolution were used and a maximum of 5 iterations was allowed per level. The active shape models built using the alignment process that considers the variance of the landmark points, retaining 95% of the variance of the modelled object and using grey-level profiles 7 or 15 pixels long, were the ones that obtained the best segmentation results. For the active appearance models, the models that obtained the best segmentation results considered 99% of the variance and 50000 pixels for the texture model. In the face models, the mean segmentation error was between 6.2 and 15.5 pixels for the active shape model; for the active appearance models, it was between 4.1 and 6.1 pixels using the face contour extraction method, between 1.3 and 4.9 pixels using the face regular mesh method, and between 1.5 and 3.7 pixels using the face adaptive multiresolution method. The mean error calculated for each test image is the mean Euclidean distance between the landmark points obtained by the model used and those of the object to be segmented.
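For reference, this error measure can be written as (Python/NumPy; the names are ours):

```python
import numpy as np

def mean_segmentation_error(found, truth):
    """Mean Euclidean distance, in pixels, between corresponding
    landmark points of the fitted model and of the target object.

    found, truth: (n, 2) arrays of landmark coordinates.
    """
    found, truth = np.asarray(found, float), np.asarray(truth, float)
    return float(np.linalg.norm(found - truth, axis=1).mean())
```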
5 CONCLUSIONS AND FUTURE WORK

A methodology to automatically build flexible models for deformable objects represented in images, namely faces, was presented, using a statistical approach. The methods developed to automatically extract landmark points from faces represented in images proved to be reliable and allow the building of active shape models and active appearance models in a fully automatic way. The segmentation results obtained in this work showed that the active appearance models built with the regular face mesh model and the adaptive face mesh model present better results than the one built with the face contour model. In general, active appearance models allow the construction of a robust model using relatively few landmark points compared to active shape models; the former are therefore preferred in problems in which the extraction of landmark points is not an easy process.

For future work, the use of prior knowledge about the physical properties of the objects to be modelled can be considered in the building of their statistical models. Another interesting line of work is the study of the influence of the number of training images used on the models built.

6 ACKNOWLEDGMENTS

This work was partially done in the scope of the project "Segmentation, Tracking and Motion Analysis of Deformable (2D/3D) Objects using Physical Principles", with reference POSC/EEA-SRI/55386/2004, financially supported by FCT – Fundação para a Ciência e a Tecnologia from Portugal.

REFERENCES

Angelopoulou, A. N. and A. Psarrou (2004). Evaluating Statistical Shape Models for Automatic Landmark Generation on a Class of Human Hands. International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Istanbul.
Baker, S. and I. Matthews (2002). Automatic Construction of Active Appearance Models as an Image Coding Problem. IEEE Transactions on Pattern Analysis and Machine Intelligence 26: 1380-1384.
Baker, S. and I. Matthews (2004). Automatic Construction of Active Appearance Models as an Image Coding Problem. IEEE Transactions on Pattern Analysis and Machine Intelligence 26: 1380-1384.
Campadelli, P., et al. (2003). A color based method for face detection. International Symposium on Telecommunications, Isfahan, Iran.
Carvalho, F. J. S. and J. M. R. S. Tavares (2005). Metodologias para identificação de faces em imagens: Introdução e exemplos de resultados. Congresso de Métodos Numéricos en Ingeniería 2005, Granada, Espanha.
Cootes, T. F. (2004). Build_aam. http://www.wiau.man.ac.uk/~bim/software/am_tools_doc/download_win.html.
Cootes, T. F. (2004a). Talking Face. http://www.isbe.man.ac.uk/~bim/data/talking_face/talking_face.html.
Cootes, T. F., et al. (1998). Active Appearance Models. Proceedings of the European Conference on Computer Vision, Springer.
Cootes, T. F. and C. J. Taylor (1992a). Active Shape Models - 'Smart Snakes'. Proceedings of the British Machine Vision Conference, Leeds.
Cootes, T. F., et al. (1992). Training Models of Shape from Sets of Examples. Proceedings of the British Machine Vision Conference, Leeds.
Cootes, T. F., et al. (1994). Active Shape Models: Evaluation of a Multi-Resolution Method for Improving Image Search. British Machine Vision Conference, BMVA.
Hamarneh, G. (1999). ASM (MATLAB). http://www.cs.sfu.ca/~hamarneh/software/code/asm.zip.
Hicks, Y., et al. (2002). Automatic Landmarking for Building Biological Shape Models. International Conference on Image Processing, Rochester, USA, 2: 801-804.
Hill, A. and C. J. Taylor (1994). Automatic Landmark Generation for Point Distribution Models. Fifth British Machine Vision Conference, York, England, BMVA Press.
Jones, M. J. and J. M. Rehg (1999). Statistical Color Models with Application to Skin Detection. IEEE Conference on Computer Vision and Pattern Recognition, Ft. Collins, CO, USA.
Tien, F.-C., et al. (2004). Automated visual inspection for microdrills in printed circuit board production. International Journal of Production Research 42(12): 2477-2495.
Vasconcelos, M. J. (2005). Modelos Pontuais de Distribuição em Visão Computacional: Estudo, Desenvolvimento e Aplicação. MSc Thesis, Estatística Aplicada e Modelação, Universidade do Porto.
Zheng, H., et al. (2004). Blocking Adult Images Based on Statistical Skin Detection. Electronic Letters on Computer Vision and Image Analysis 4: 1-14.