Voxel-based 3D face representations for recognition

12th Int. Workshop on Systems, Signals & Image Processing, 22-24 September 2005, Chalkida, Greece


Voxel-based 3D face representations for recognition A. B. Moreno* Escuela Superior de CC. Experimentales y Tecnología Universidad Rey Juan Carlos, 28933 Móstoles (Madrid); Spain E-mail: [email protected] *Corresponding author

A. Sánchez Escuela Superior de CC. Experimentales y Tecnología Universidad Rey Juan Carlos, 28933 Móstoles (Madrid); Spain E-mail: [email protected]

J. F. Vélez Escuela Superior de CC. Experimentales y Tecnología Universidad Rey Juan Carlos, 28933 Móstoles (Madrid), Spain E-mail: [email protected]

Abstract: In this paper we present and analyze new 3D voxel-based face representations for face recognition. They have been tested using two standard matching schemes: Support Vector Machines, and Principal Component Analysis with a Euclidean distance classifier. Experiments were performed under both controlled and non-controlled acquisition conditions with respect to pose and facial expressions. The highest recognition rates achieved are 77.86% for non-controlled environments (considering the variability of the database images) and 90% for controlled environments, when a voxel-based representation is used. The dataset used for the experiments contains 427 3D facial images, consisting of meshes without texture, corresponding to 61 individuals with slight rotations and facial expressions.

Keywords: Face Recognition, 3D face modelling, PCA, SVM.

Reference to this paper should be made as follows: A. Belén Moreno, A. Sánchez and J. F. Vélez (2005) 'Voxel-based 3D face representations for recognition', Proceedings of the 12th International Workshop on Systems, Signals and Image Processing (IWSSIP'05), Chalkida, Greece, September 2005.

Biographical notes: A. B. Moreno received her PhD in Computer Science from Universidad Politécnica de Madrid in 2003. She is currently an Associate Professor at the Department of Informática, Estadística y Telemática, Universidad Rey Juan Carlos, Madrid, Spain. Her current research interests include Computer Vision, in particular Face Recognition using three-dimensional vision techniques. Á. Sánchez is an Associate Professor of Computer Science at the Universidad Rey Juan Carlos of Madrid, Spain. He received his PhD in Computer Science from the Universidad Politécnica de Madrid, Spain. His current research focuses on Computer Vision, Biometrics and Soft Computing. J. F. Vélez is an Assistant Professor of Computer Science at the Universidad Rey Juan Carlos of Madrid, Spain, where he is also a PhD student in Computer Science. His current research focuses on Computer Vision, Biometrics and Signature Recognition.

1 INTRODUCTION

The need for security and fraud-control applications to establish personal authentication has increased research in biometric systems (http://www.biometrics.org). Automatic face recognition is one of the least intrusive biometric modalities, which has increased interest in it. It has many applications in areas such as personal identification, security and law enforcement, among others (Pentland et al., 2000).

Copyright © 2005 Inderscience Enterprises Ltd.


Traditionally, research in face recognition focused on 2D intensity images. In this context, recognition accuracy is sensitive to lighting conditions, expressions, head rotation, and/or a variety of elements such as hair, moustaches, glasses, etc. Automatic face recognition using 2D images provides excellent results when the image acquisition conditions (illumination, pose and face variations) are controlled (Zhao, 2000). Recent efforts are oriented towards reducing these image acquisition restrictions (Zhao et al., 2003). Some methods have been proposed to tackle non-controlled variations of pose and illumination, but they do not work well under arbitrary conditions (Zhao, 1999).

Working with 3D face images has several advantages over using 2D face images: (1) more geometrical information can be obtained from 3D data than from 2D images, because 2D images lose depth information as they are formed through projections of 3D objects; (2) features measured from real 3D data are affected neither by scale nor by rotation; and (3) if the 3D face recognition system does not consider texture information, the recognition is immune to the effects of illumination variations. Advances in computational processing capabilities, as well as the reduction in both cost and size of 3D digitizers, have contributed to the development of 3D face recognition systems.

Recently, interest in model-based 3D automatic face recognition systems has also increased (Lee et al., 2003; Huang et al., 2003). One common technique in 3D object recognition is based on the matching between 2D image points and the corresponding points of a generic 3D object model, as a preliminary stage for inferring information. Another representation technique for 3D range face images using local information computes the point signature over selected 3D points in order to obtain local descriptors. The point signature has been used in the scope of 3D face recognition (Chua et al., 2000).
Differential geometry has been used for feature extraction in the context of free-form three-dimensional object recognition, and also by some authors for facial feature extraction (Hallinan et al., 1999). The local curvature of the 3D surface and the angle between two surface normal vectors have been proposed as descriptors in the scope of 3D face recognition.

Some recent approaches use principal component analysis (PCA) to obtain a low-dimensional representation of 3D images given by depth maps of the complete face (Chang et al., 2003; Tsalakanidou et al., 2003). The objective is to evaluate the influence of colour, depth and the combination of both on face recognition. These works have compared the use of 2D images and their corresponding 3D depth maps, both independently and combined, concluding that the combination of 2D and 3D information provides better performance: depth and texture play complementary roles in the coding of faces. These systems have not considered images presenting facial expressions, and pose variations have been limited (e.g. in Tsalakanidou et al., 2003) or not considered (e.g. in Chang et al., 2003).


Two aspects characterize a face recognition system: (1) the representation or face modelling, and (2) the recognition or matching technique. Face modelling transforms the information of the facial image into a set of characteristics that represent the original data. The matching scheme is the method used to select the best match from the set of identities in the face database. While the matching scheme can be efficiently implemented using standard machine learning techniques such as Neural Networks or Support Vector Machines, 3D face modelling is still an open problem (Abdelkader and Griffin, 2005).

In our previous work (Moreno et al., 2005), a set of local geometrical characteristics was extracted from 3D face images (after a segmentation stage in which curvature information was used) and tested in face recognition systems. In this paper, we present another kind of representation of the same faces and test it in face recognition systems employing similar matching schemes, comparing the recognition results with those presented in Moreno et al., 2005. In order to analyze the robustness of the models, we used both controlled and non-controlled environments for two pattern classification methods: (1) Support Vector Machines (SVM) and (2) Principal Component Analysis (PCA) in combination with a Euclidean distance classifier. Also, due to the lack of representative 3D face databases presenting a high degree of variability among the images of each individual, especially with respect to facial expressions, we have created our own 3D face database, named GavabDB. This database was described in Moreno and Sánchez, 2004, and its 3D images can be found (without texture) at http://gavab.escet.urjc.es.

2 DATABASE DESCRIPTION

A set of 427 3D facial surfaces corresponding to the 61 individuals of the GavabDB database (with 7 different meshes per individual) has been used for the experiments. Each set of seven 3D images of an individual contains: one facial image in which the individual is looking down (approximately +35º x-rotation), one facial image in which the individual is looking up (approximately -35º x-rotation), and five frontal views, three of which present facial expressions (a random gesture, a smile and a laugh, respectively).

Facial surfaces are represented by meshes (without colour) provided by the Konica-Minolta VI 700 3D digitizer. The cells of each mesh have four non-coplanar nodes, and occasionally three (on the contour). Two mesh resolutions have been tested: 1/1 and 1/4. The average number of points per face mesh is 2,186 at 1/4 of the scanning resolution. A pre-processing stage for noise removal and smoothing was carried out over all the 3D facial meshes using median and Gaussian filters, respectively. A pose normalization of the facial meshes has also been performed using automatically extracted 3D points. These points are the mass centres of the regions segmented in Moreno et al., 2005, by using the curvature-based HK segmentation algorithm presented in Trucco et al., 1998.
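The pre-processing step above (median filtering for impulsive noise, then Gaussian smoothing) can be sketched as follows. This is a minimal illustration operating on a 2D depth grid rather than on the actual 3D meshes, so the data and the parameter values (`median_size`, `gaussian_sigma`) are assumptions, not the paper's settings:

```python
import numpy as np
from scipy.ndimage import median_filter, gaussian_filter

def preprocess_depth(depth, median_size=3, gaussian_sigma=1.0):
    """Remove impulsive noise with a median filter, then smooth
    the result with a Gaussian filter."""
    denoised = median_filter(depth, size=median_size)
    return gaussian_filter(denoised, sigma=gaussian_sigma)

# Toy depth grid with a single spike of impulsive noise.
depth = np.ones((9, 9))
depth[4, 4] = 50.0
smoothed = preprocess_depth(depth)
print(smoothed[4, 4])  # the spike is removed by the median step
```

The median filter is applied first because it suppresses outliers that a Gaussian filter would only smear into their neighbourhood.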



3 VOXEL-BASED REPRESENTATION

The voxel representation of a face aims to transform the original 3D mesh into a discrete and regular representation made of a fixed number of volume elements (voxels). The process starts by defining a cube big enough to contain the relevant parts of the facial meshes; each edge of the cube is 150 mm long. In order to place the normalized facial meshes within the cube, the centre of one face of the cube is located at the nose tip (computed as the mass centre of the convex region of the nose, automatically extracted in Moreno et al., 2005) of the pose-normalized facial mesh. Next, the cube is divided into n×n×n equally spaced cubic voxels. Figure 1 presents an example in which the cube containing the face is divided into (a) 2x2x2 voxels and (b) 3x3x3 voxels.


Figure 1 Facial mesh inside a cube from two different viewpoints. The cube resolutions are 2x2x2 (a) and 3x3x3 (b).

Figure 2 (a) Volume elements (dark) containing a patch of facial mesh (light) and (b) example of the voxel values for a horizontal section placed at the nose level for a 30x30x30 cube resolution.

The intersection of the voxels with the mesh makes it possible to translate the facial mesh into a binary representation: if a voxel intersects the mesh (i.e., contains a vertex of the mesh, in our work), it is 1-valued; otherwise, it is 0-valued. Figure 2 presents (a) the voxels that are occupied by a patch of mesh and (b) the voxel values of the horizontal plane given by y=7.5 mm for a 30x30x30 cube resolution. We have used three different cube resolutions to represent the facial meshes (30x30x30, 45x45x45 and 55x55x55) and two different mesh resolutions (1/1 and 1/4). Higher cube resolutions suffer from the problem of holes: a lack of 1-voxels in voxels that contain a region of the facial mesh but no vertex of it.
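This vertex-based binary voxelization can be sketched as follows. The sketch centres the cube at the origin for simplicity (the paper anchors the centre of one cube face at the nose tip), so the coordinate shift is an assumption:

```python
import numpy as np

def voxelize(vertices, n, edge=150.0):
    """Map mesh vertices (in mm) onto an n x n x n binary occupancy
    grid inside a cube of side `edge` centred at the origin.

    A voxel is 1-valued if at least one mesh vertex falls inside it,
    and 0-valued otherwise -- the binary criterion described above.
    """
    grid = np.zeros((n, n, n), dtype=np.uint8)
    # Shift coordinates so the cube spans [0, edge) on each axis,
    # then quantize to voxel indices.
    idx = np.floor((vertices + edge / 2.0) / (edge / n)).astype(int)
    # Keep only vertices that actually fall inside the cube.
    inside = np.all((idx >= 0) & (idx < n), axis=1)
    grid[tuple(idx[inside].T)] = 1
    return grid

# Toy "mesh": three vertices, the last one outside the cube.
verts = np.array([[0.0, 0.0, 0.0],      # cube centre
                  [-70.0, 10.0, 20.0],  # inside the cube
                  [200.0, 0.0, 0.0]])   # outside, ignored
g = voxelize(verts, n=30)
print(int(g.sum()))  # 2 occupied voxels
```

The hole problem mentioned above follows directly from this criterion: at high `n` the voxels shrink below the mesh cell size, so many voxels crossed by the surface contain no vertex and stay 0-valued.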

4 PROPOSED REPRESENTATIONS

We have derived three kinds of voxel-based representations from the previous representation of the complete face: (1) those consisting of a single cut of the cube, (2) those consisting of a combination (concatenation) of single cuts of the cube, and (3) depth maps.

4.1 Single cuts

We have obtained five potential single-cut representations for modelling a face: (1) a horizontal cut at the level of the mouth (y=n/3), (2) a horizontal cut at the level of the nose (y=n/2), (3) a horizontal cut at the eye level (y=2n/3), (4) a vertical cut at the nose height (x=n/2) and (5) a vertical cut at the level of one of the eyes (x=2n/3). Figure 3 shows these representations for a 55x55x55 cube resolution and 1/1 mesh resolution.


Figure 3 Examples of horizontal cuts (a) at the mouth level, (b) at the nose base level and (c) at the eye level, and vertical cuts (d) at the nose level and (e) at the eye level.
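Given the binary voxel grid, each single cut is simply a 2D slice of the 3D array. A minimal sketch (the axis convention, with y as the second array axis, is an assumption of this illustration):

```python
import numpy as np

def horizontal_cut(grid, level):
    """Horizontal section y = level of a binary voxel grid (x, y, z)."""
    return grid[:, level, :]

def vertical_cut(grid, level):
    """Vertical section x = level of a binary voxel grid (x, y, z)."""
    return grid[level, :, :]

n = 30
grid = np.zeros((n, n, n), dtype=np.uint8)
grid[n // 2, 2 * n // 3, 5] = 1  # one occupied voxel

eye_cut  = horizontal_cut(grid, 2 * n // 3)  # y = 2n/3, eye level
nose_cut = vertical_cut(grid, n // 2)        # x = n/2, nose level
print(eye_cut.shape, int(eye_cut.sum()), int(nose_cut.sum()))
```

Each cut is thus an n x n binary image, which is what the matching schemes receive as input.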

4.2 Combination of cuts

We have extracted two potential representations consisting of combinations of cuts for modelling a face: (1) the combination (or concatenation) of two single cuts, namely the horizontal cut at the eye level and the vertical cut at the nose level, and (2) the combination of all the horizontal cuts between the nose (y=n/2) and the eyes (y=2n/3). Figure 4 shows an example of each model.
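Representation (1), the concatenation of two single cuts, amounts to flattening both slices and joining them into a single feature vector. A minimal sketch (the random grid merely stands in for a real voxelized face):

```python
import numpy as np

def combined_cut_features(grid, n):
    """Concatenate the eye-level horizontal cut and the nose-level
    vertical cut of a binary voxel grid into one flat feature vector."""
    eye = grid[:, 2 * n // 3, :].ravel()   # y = 2n/3 slice
    nose = grid[n // 2, :, :].ravel()      # x = n/2 slice
    return np.concatenate([eye, nose])

n = 45
grid = np.random.default_rng(0).integers(0, 2, size=(n, n, n)).astype(np.uint8)
features = combined_cut_features(grid, n)
print(features.shape)  # (4050,) = 2 * 45 * 45
```

The combined vector is twice the size of a single cut, which is the price paid for the extra discriminating information.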



Figure 4 Examples of two cut combinations: (a) horizontal cut at the eye level and vertical cut at the nose level for a 55x55x55 cube resolution and 1/1 mesh resolution, and (b) horizontal cuts between the eyes and the nose for a 30x30x30 cube resolution and 1/4 mesh resolution.

4.3 Depth maps

A depth map is a two-dimensional matrix in which each component represents the distance from a point of the face surface to a plane. In our case, we have placed the origin plane for the depth maps at the nose tip. This implies that around the nose the distance is small, while in other face areas it increases. We have measured the distance as the number of empty voxels from the plane y=0 to the facial mesh. In order to study the modelling capabilities of depth maps we have defined three classes (Figure 5): (1) the full facial depth map, (2) the upper-half facial depth map and (3) the left-side facial depth map.

Figure 5 Examples of depth maps corresponding to (a) the full face, (b) the upper half and (c) the left side of the face for a 55x55x55 cube resolution and 1/1 mesh resolution.

5 RECOGNITION EXPERIMENTS

We tested two matching schemes that have produced very good results for pattern recognition: Support Vector Machines (SVM) and Principal Component Analysis (PCA). In order to test the robustness of our face recognition system, experiments have been run under controlled and non-controlled conditions, to show how the non-controlled environment affects the recognition rate. In a controlled environment it is ensured that, for each test image, there exists an image in the training set with the same pose and gesture (a frontal image with neutral expression, in our case). In a non-controlled environment the training images are frontal with neutral expression, but the test images are selected randomly from the remaining images of the individual (including rotated images and images with gestures). The implementations of PCA and SVM used for the experiments are (Romdhani, 1996) and (Collobert and Bengio, 2002), respectively.

6 EXPERIMENTAL RESULTS

Next, the most relevant results of the experiments are summarized.

6.1 Non-controlled environment

Among the three horizontal single cuts, the best recognition rates are obtained with the horizontal cut at the eye level, which achieves up to 60% correct recognition when the 1/1 mesh resolution is used. We can also observe the problem of holes caused by the voxel resolution, which makes SVM perform better at the 30x30x30 voxel resolution than at 45x45x45. The horizontal cut at the mouth level does not produce very good results (30%), due to the sensitivity of this area to gestures. The horizontal cut at the nose base level also produces a low recognition rate (47%): the unconscious rotations of individuals during image capture cause occlusions at the nose base (which the digitizer automatically reconstructs using an interpolation algorithm), introducing errors.

The single vertical cuts do not produce, in general, very good results either (the best is a 60% correct recognition rate, obtained with the vertical cut at the nose level). In this case the results do not seem to be really affected by the different voxel and mesh resolutions.

Multiple contour models were designed to capture more discriminating information from each facial mesh in order to improve the recognition rate. The combination of the horizontal cut at the eye level and the vertical cut at the nose level correctly identifies 77.86% of the faces (at a 45x45x45 cube resolution and 1/1 mesh resolution, using PCA), an increase over the 50% and 60% recognition rates achieved by each representation independently. The combination of all the horizontal cuts between the nose and the eyes provides a 77.05% recognition rate, so it does not really improve on the previous result and, on the contrary, considerably increases the size of the images.

In this work, the full facial depth map does not produce very good recognition results (about 58% success rate). For this representation, SVM works much better than PCA as the matching scheme. Upper-half facial depth maps have been used because the mouth area is very sensitive to facial gestures. The left-side facial depth map provides very poor recognition rates. In the literature we found some references on 3D face recognition using PCA and depth maps (Chang et al., 2003) at higher resolutions than those used in this work (approximately 78x78), but in those approaches no facial expressions were considered and the capture conditions were not flexible.

6.2 Controlled environment

The representation that provided the best results (the combination of the horizontal cut at the eye level and the vertical cut at the nose level) was also tested in a controlled environment. Recognition rates improved by an average of 10%, and maximum recognition rates of 90.16% with PCA and 88.52% with SVM were achieved.
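The PCA-plus-Euclidean matching scheme used in these experiments can be sketched as follows. This is a generic illustration on synthetic vectors, not the implementation used in the paper (Romdhani, 1996); the vector dimensionality and the number of retained components are arbitrary choices:

```python
import numpy as np

def pca_fit(X, k):
    """Fit PCA on gallery rows X; return the mean and top-k components."""
    mean = X.mean(axis=0)
    # SVD of the centred data; rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def pca_project(X, mean, components):
    """Project rows of X onto the PCA subspace."""
    return (X - mean) @ components.T

def nearest_identity(test_vec, gallery_proj, labels):
    """Euclidean nearest-neighbour match in PCA space."""
    d = np.linalg.norm(gallery_proj - test_vec, axis=1)
    return labels[int(np.argmin(d))]

rng = np.random.default_rng(1)
# Synthetic "face vectors": 3 identities, 2 noisy samples each.
prototypes = rng.normal(size=(3, 100))
gallery = np.repeat(prototypes, 2, axis=0) + 0.05 * rng.normal(size=(6, 100))
labels = ["id0", "id0", "id1", "id1", "id2", "id2"]

mean, comps = pca_fit(gallery, k=4)
gal_proj = pca_project(gallery, mean, comps)
probe = prototypes[1] + 0.05 * rng.normal(size=100)  # noisy sample of id1
match = nearest_identity(pca_project(probe[None], mean, comps)[0],
                         gal_proj, labels)
print(match)  # id1
```

The SVM scheme replaces the nearest-neighbour step with one-vs-rest classifiers trained on the same projected (or raw) feature vectors.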

7 CONCLUSION AND FUTURE WORK

This paper has proposed and tested a set of voxel-based models to represent the individuals of a 3D face database. The best experimental results were obtained with the combination of the horizontal cut at the eye level and the vertical cut at the nose level, which provided a 77.86% correct recognition rate in a non-controlled environment and 90.16% in a controlled environment, respectively. The system was tested with two classical matching schemes: SVM and PCA combined with a Euclidean classifier. The results showed no significant difference between the recognition rates achieved by the two schemes.

The results also showed that the correct recognition rates are high enough to implement real face recognition applications, and that the proposed model is robust against variations in illumination (because no colour information was used), while the recognition rate drops by about 10% when gestures and facial expressions appear.

In our preceding work (Moreno et al., 2005), similar experiments were performed using as representations feature vectors consisting of the 30 most discriminating local geometrical features, selected from a set of 86 features automatically extracted from the regions produced by an HK segmentation algorithm. Two face recognition systems based on PCA and SVM as matching schemes, under controlled and non-controlled environments, were tested on a database of 420 3D facial images of 60 different individuals (also seven images per individual), which have also been used for the experiments presented here. Using SVM in a non-controlled environment produced a 77.9% correct recognition rate, while 90.16% was obtained under a controlled environment (results similar to those of the best voxel-based representation).

Future work will consist of improving the results by applying a 3D polygon-filling procedure to assign values to the voxels of the cube when its resolution is high, and of looking for combinations of new representations that can improve the achieved recognition rates.

REFERENCES

Abdelkader, C. B. and Griffin, P. A. (2005), "Comparing and combining depth and texture cues for face recognition", Image and Vision Computing 23, 339-352.

Chang, K. I., Bowyer, K. W. and Flynn, P. J. (2003), "Face Recognition Using 2D and 3D Facial Data", Workshop on Multimodal User Authentication (MMUA), Santa Barbara, CA.

Chua, C. S., Han, F. and Ho, Y. K. (2000), "3D Human Face Recognition Using Point Signature", Fourth IEEE Intl. Conf. on Automatic Face and Gesture Recognition.

Collobert, R. and Bengio, S. (2002), "SVMTorch: Support Vector Machines for Large-Scale Regression Problems", Journal of Machine Learning Research, vol. 1, pp. 143-160.

Hallinan, P., Gordon, G., Yuille, A. L., Giblin, P. and Mumford, D. (1999), Two- and Three-dimensional Patterns of the Face, A. K. Peters.

Huang, J., Heisele, T. and Blanz, V. (2003), "Component-based Face Recognition with 3D Morphable Models", 4th Conf. on Audio- and Video-based Person Authentication.

Lee, M. W. and Ranganath, S. (2003), "Pose-invariant face recognition using a 3D deformable model", Pattern Recognition 36, 1835-1846.

Moreno, A. B. and Sánchez, A. (2004), "GavabDB: a 3D Face Database", Workshop on Biometrics on the Internet COST275, Vigo, March 25-26, 77-85.

Moreno, A. B., Sánchez, A., Vélez, J. F. and Díaz, F. J. (2005), "Face Recognition Using 3D Local Geometrical Features: PCA vs. SVM", IEEE 4th International Symposium on Image and Signal Processing and Analysis (ISPA), Zagreb, Croatia, September 15-17.

Pentland, A. and Choudhury, T. (2000), "Face Recognition for Smart Environments", IEEE Computer, February, pp. 50-55.

Trucco, E. and Verri, A. (1998), Introductory Techniques for 3-D Computer Vision, Prentice-Hall.

Tsalakanidou, F., Tzovaras, D. and Strintzis, M. G. (2003), "Use of depth and colour eigenfaces for face recognition", Pattern Recognition Letters 24, 1427-1435.

Zhao, W. (1999), "Improving the Robustness of Face Recognition", Proc. Intl. Conf. on Audio- and Video-Based Person Authentication, 78-83.

Zhao, W. (2000), "Face Recognition: A Literature Survey", UMD CfAR, Technical Report CAR-TR-948.

Zhao, W., Chellappa, R., Phillips, P. J. and Rosenfeld, A. (2003), "Face Recognition: A Literature Survey", ACM Computing Surveys 35 (4), December, 399-458.

WEBSITES

The Biometric Consortium, http://www.biometrics.org

Grupo de algorítmica para la visión artificial y la biometría (Gavab), http://gavab.escet.urjc.es

Romdhani, S. (1996), "Face Recognition Using PCA", http://www.vision.im.usp.br/~teo/pca/