Face Recognition Using Range Images
Bernard Achermann, Xiaoyi Jiang, Horst Bunke
Institute of Computer Science and Applied Mathematics, University of Bern, Neubrückstrasse 10, CH-3012 Bern
email: {ackerman, jiang, [email protected]}
phone: +41 31 631 48 65, fax: +41 31 631 39 65
Abstract
A system for face recognition using range images as input data is described. The range data acquisition procedure is based on the coded light approach, merging range images that are recorded by two separate sensors. Two approaches, which are known from face recognition based on grey level images, have been extended to deal with range images. These approaches are based on eigenfaces and hidden Markov models, respectively. Experiments on a database with various range images from 24 persons yield very promising results for both recognition methods.
1 Introduction
Analysis of human faces is a very challenging research area which has gained much attention during the last years. Most of the work, however, has focused on intensity or color images of faces, and only few approaches deal with range images. But there is evidence that range images have the potential to overcome some problems of intensity and color images. Some advantages of range images are the explicit representation of 3D shape, invariance under change of illumination, and invariance under change of color and reflectance properties of the objects. An example of the last type of invariance are images of a person with and without makeup. While the basic biometric properties of the face represented by a range image remain the same, the visual appearance of a person can be altered with makeup in such a way that not even humans are able to recognize the person anymore. In the present paper we present our experiences with applying eigenfaces and hidden Markov models (HMMs) to range images of human faces for the purpose of person identification.

One of the first works based on 3D images of human faces was proposed by Lapreste et al. [5]. The authors analyzed the curvature on a human face in order to extract feature points on the profile line. The recognition of the face was then based on the characteristics of the profile. Lee and Milios [6] extracted the convex regions in range images of human faces. These regions form the set of features of a face. For all these convex regions the Extended Gaussian Image is calculated. The matching of facial features of two face images is based on these Extended Gaussian Images. Gordon [3, 4] presented a detailed study of range images in face recognition. She computed the curvatures on a face in order to find face-specific descriptors (e.g.,
nose ridge and eye features). These descriptors were used in the recognition stage. Yacoob and Davis [12] proposed a method for labeling the regions of a human face with the help of range images. Again, convex and concave points are calculated, but with a different method, namely a multistage diffusion process. The resulting regions are then labeled using a qualitative reasoning procedure. Lengagne et al. [7] created a depth map of the face with the help of a stereo pair of images. This depth map is improved by various processing steps, and, finally, feature extraction and segmentation are performed on the depth map. A recognition step is obviously planned by the authors, but not yet described.

The present paper describes two recognition procedures for images of human faces. One is based on the eigenface method [11], and the other on hidden Markov models [8]. We describe the processing steps and present some experimental results. The paper is structured in the following manner. In Section 2 we describe the data acquisition and range sensor setup. In Section 3 we present the preprocessing and merging steps. After the description of the recognition procedures in Section 4, we show experimental results in Section 5. Finally, we discuss them and present future developments in Section 6.
2 Data Acquisition
The acquisition of range images of human faces is not trivial. Very good images are provided by the range scanner of Cyberware (e.g., used in the work of Gordon [3, 4]). The system is based on a laser range finder and a rotation platform. Generally, the scanner yields data of very high accuracy and provides a panoramic, 360° view of the object. Disadvantages are the high cost and the long scan time, which is not suitable for the acquisition of face images. For our system, we decided to use the coded light approach for acquiring range images. Its disadvantages are missing data points due to occlusion or improperly reflected regions (e.g., dark regions such as eyebrows or beards, and specular regions such as the eyes). But the advantages of this method are its relatively high speed and the low cost of the equipment. The acquisition of range information with the coded light approach [1] requires a setup as depicted in Figure 1. A sequence of stripe patterns is projected onto the scene, and for each projection an image is taken
with the camera. This sequence of images results in a code for each pixel seen by the camera. The code also corresponds to one projection plane of the projector. If the parameters of the projector and the camera are known, it is possible to compute the 3D position of a given image point by triangulation. The camera and projector parameters are calculated in a calibration step. However, for any shadow region, as well as for regions that do not properly reflect the projected light (e.g., dark regions), no 3D data can be computed. Thus, the coded light approach results in range images with missing data.

First experiments with the sensor configuration given in Figure 1 showed that the number of missing data points due to occlusion was rather large, which had a negative effect on the recognition performance. Therefore, we chose a multi-sensor setup in order to get a better quality of the data. This setup is depicted in Figure 2. We use two fully equipped range sensors which are sensing the object under different views. Both sensors operate in the same coordinate system. By merging the data of these two range images into one, we can reduce the problem of missing data significantly. The merging procedure will be described in detail in Section 3. Images generated in the merging step will be referred to as "merged images", whereas the original range images of a single sensor will be called "base images".

We acquired a collection of range images from 24 persons. For each person we have ten images, each resulting from two base images. The direction of the head was restricted to five standard directions: two images with the person looking straight ahead, two looking to the right, two looking to the left, two looking upwards, and two looking downwards. Our database includes images of 21 men and three women; 15 wearers of glasses; three men with a moustache; two men with a beard; 21 Europeans; three Asians. Figures 3 (range) and 4 (grey level) show the images of one person from our database. The images in Figure 3 are visualized by applying a triangulation procedure and graphical rendering to the range data.
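To make the decoding and triangulation steps concrete, the following sketch shows one common way to realize them. It is a minimal illustration under our own assumptions (Gray-coded stripe patterns, thresholded binarization, and calibration data given as a camera ray per pixel and a projector plane per code); the paper does not prescribe these details, and all names are hypothetical.

```python
import numpy as np

def decode_stripe_codes(images, threshold=128):
    """Turn a stack of n stripe images (n, H, W) into a per-pixel code
    identifying the projector plane that illuminated each pixel.
    Assumes Gray-coded patterns, most significant bit projected first."""
    bits = (np.asarray(images) > threshold).astype(int)  # binarize patterns
    binary = np.zeros_like(bits)
    binary[0] = bits[0]
    for i in range(1, len(bits)):
        binary[i] = binary[i - 1] ^ bits[i]              # Gray -> plain binary
    weights = 2 ** np.arange(len(bits))[::-1]
    return np.tensordot(weights, binary, axes=1)         # (H, W) plane indices

def triangulate(cam_center, ray_dir, plane_point, plane_normal):
    """Intersect the viewing ray of a camera pixel with the projector
    stripe plane selected by its decoded code (ray-plane intersection)."""
    t = np.dot(plane_point - cam_center, plane_normal) / np.dot(ray_dir, plane_normal)
    return cam_center + t * ray_dir                      # 3D surface point
```

For pixels whose code is inconsistent across the pattern sequence (shadow or poorly reflecting regions), no plane can be assigned and the pixel is left without 3D data, which is exactly the missing-data effect described above.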
3 Merging and Preprocessing of Range Images

An essential step in our system is the merging of two range images (base images) that were simultaneously taken from different positions. As mentioned above, by merging two range images from different viewing points we get a higher data density, and thus a better range image quality. Our merging procedure aims at constructing a new range image out of an arbitrary collection of 3D points. Note that in our case we merge the data of only two base images, but the procedure can easily be expanded to merge data from any number of base images. The basic idea in merging is to compute images as they might be seen by a virtual sensor. In our case the best position for the virtual sensor, i.e., the one that leads to the highest data density, is the center point between the two real sensors (see Figure 2). For this virtual sensor position a range image is computed out of the collection of 3D points. The procedure consists of the following steps (a code sketch of step 5 is given at the end of this section):

1. Initialization: The input images and some user-defined parameters are read. After this step a set of range data, i.e., 3D coordinates, from both sensor systems is available.

2. Positioning the virtual sensor: The positioning of the virtual sensor is based on the set of 3D data points. First, the center of gravity g = (g_x, g_y, g_z) of the points is computed. The virtual sensor is then positioned at s = (g_x, g_y, g_z + d), where d is a displacement in the z direction to be specified by the user. The view direction is given by g − s (see Figure 2).

3. Rotation of the object: The user may specify rotation angles for the x, y and z axes in order to rotate the face into a canonical position. If such angles are specified, the object is rotated around the center of gravity by the specified angles. If no rotation is specified, this step is skipped.

4. Computation of the virtual sensor parameters: Based on the positioning information, the parameters of the virtual sensor (e.g., projection matrix, projection center in the new image, camera direction) are computed.

5. Computation of the merged image: For every original 3D data point, the coordinates in the merged range image are calculated based on the parameters of the virtual sensor. Since we are merging data from two different viewpoints, it might happen that 3D points of two different surfaces are mapped onto the same pixel in the new range image. In this case we apply a sort of z-buffering [2]. If the distance between the two competing points is beyond a predefined threshold, the point further away from the virtual sensor is discarded. If the distance is lower than the threshold, the range values of the points involved are averaged and taken for the range image.

In Figure 5 the results of the merging procedure for the base images shown in Figure 3 are presented. Again, a triangulation procedure and graphical rendering were applied for the purpose of visualization. After the merging procedure no further preprocessing of the data is done, except a smoothing step which was applied in some of our experiments. Smoothing is done with a standard Gaussian filter.
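The thresholded z-buffering of step 5 can be sketched as follows. This is a simplified orthographic version under our own assumptions; the actual system uses the full virtual sensor parameters computed in step 4, and the displacement, scaling and threshold values here are hypothetical.

```python
import numpy as np

def merge_to_virtual_sensor(points, width, height, d=500.0, z_thresh=5.0):
    """Render an unordered collection of 3D points (N, 3) into a range
    image seen by a virtual sensor placed at distance d behind the
    centroid, resolving pixel collisions by thresholded z-buffering."""
    g = points.mean(axis=0)                      # center of gravity
    depth = (g[2] + d) - points[:, 2]            # distance to virtual sensor
    # Simplified orthographic mapping of x/y onto pixel coordinates.
    u = np.round(points[:, 0] - g[0] + width / 2).astype(int)
    v = np.round(points[:, 1] - g[1] + height / 2).astype(int)
    img = np.full((height, width), np.nan)       # NaN marks missing data
    for ui, vi, zi in zip(u, v, depth):
        if not (0 <= ui < width and 0 <= vi < height):
            continue
        cur = img[vi, ui]
        if np.isnan(cur):
            img[vi, ui] = zi                     # first point for this pixel
        elif abs(cur - zi) > z_thresh:
            img[vi, ui] = min(cur, zi)           # keep the nearer surface
        else:
            img[vi, ui] = (cur + zi) / 2.0       # average close competitors
    return img
```

The averaging branch is what lets the two base images reinforce each other on the common surface, while the threshold branch suppresses points of a hidden surface seen only by one sensor.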
4 Recognition
A standard procedure for the analysis and recognition of faces is the so-called eigenface method described by Turk and Pentland [11]. Basically, a template matching is done after all images have been transformed into the so-called face space, which contains much less redundancy than the original image space.
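A minimal sketch of this idea follows; it is our own illustration of the PCA step with the small-matrix trick of Turk and Pentland, not the facerec program used in our experiments, and all names are illustrative.

```python
import numpy as np

def train_eigenfaces(train_imgs, k):
    """Compute the mean face and the k most significant eigenfaces from a
    list of equally sized training images."""
    X = np.stack([im.ravel().astype(float) for im in train_imgs])  # (N, D)
    mean = X.mean(axis=0)
    A = X - mean
    # Small-matrix trick: eigenvectors of the N x N matrix A A^T instead
    # of the huge D x D covariance matrix.
    evals, evecs = np.linalg.eigh(A @ A.T)
    order = np.argsort(evals)[::-1][:k]          # most significant first
    U = A.T @ evecs[:, order]                    # lift to image space, (D, k)
    U /= np.linalg.norm(U, axis=0)               # unit-length eigenfaces
    return mean, U

def project_to_face_space(img, mean, U):
    """Coordinates of an image in the face space; template matching then
    compares these low-dimensional vectors instead of whole images."""
    return U.T @ (img.ravel().astype(float) - mean)
```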
Figure 1: Range acquisition with the coded light approach (projector emitting stripe patterns, vertical projection planes, and camera)

Figure 2: Setup of the range sensors, as seen from above (range sensors 1 and 2, the virtual sensor between them, and the object)
Figure 3: Ten range images (base images) of the same person. The white dots represent missing data.
Figure 4: Grey level images of one person, corresponding to Figure 3.
Figure 5: Merged range images of the same person

The base vectors which span the face space are determined by a principal component analysis of a series of training images, and the vectors with the most significant eigenvalues are taken as base vectors. Test images are first projected into the face space, and then it is determined which person's training image is the most similar in the face space. If the similarity is above a certain threshold, this yields the recognition result; otherwise, the test image is rejected as unknown.

A second method which has gained some attention recently is face recognition based on hidden Markov models. It has been described by Samaria et al. [9, 10]. For a detailed description of HMMs see Rabiner [8]. Generally, HMMs work on one-dimensional signals, or feature vectors. Images, however, contain two-dimensional information. In order to make HMMs applicable to images, we reduce the image information to one-dimensional vectors by applying a sliding window. This window moves from the top of the image to the bottom and covers the whole width of the image. The step size is chosen so that two successive windows have a certain overlap. The reason for the overlap is to avoid cutting significant face features and to bring some context information into the process. The pixel values in the sliding window are given as a feature vector to the HMM. The idea underlying this method is the following. Intuitively, a human face consists of a number of regions, for example, forehead, eyes, nose, mouth, chin and so on. These regions remain identifiable for a human observer, even when the face image is cut into sliding windows as described above. With the HMM we try to make use of this property.
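The window extraction can be sketched as follows; this is a minimal illustration, with the window height of 4 pixels and the overlap of 3 pixels taken from the values used in our experiments (see Section 5).

```python
import numpy as np

def sliding_window_features(img, win_height=4, overlap=3):
    """Cut a range image into horizontal strips covering the full image
    width, moving from top to bottom with the given overlap. Each strip,
    flattened, is one observation vector of the HMM sequence."""
    step = win_height - overlap                  # 1-pixel step for 4/3 setup
    rows = range(0, img.shape[0] - win_height + 1, step)
    return [img[r:r + win_height, :].ravel() for r in rows]
```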
A human face is represented by a linear left-right model consisting of five states as shown in Figure 6. These states correspond to the face parts ("1" for the forehead, "2" for the eyes, "3" for the nose, "4" for the mouth and "5" for the chin). For every person in the database the parameters of the hidden Markov model are calculated in a training phase. If a test image is presented, the probability of producing this image is computed by means of the Viterbi algorithm for every person (i.e., model) in the database. The classifier returns a ranking of the persons in the database in ascending order of the score for each model. The score s is computed from the probability p of the model as s = −2 log p.
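A sketch of this scoring and ranking step, assuming the Viterbi algorithm returns log-probabilities (the dictionary layout is illustrative):

```python
def rank_persons(model_log_probs):
    """model_log_probs maps a person id to the Viterbi log-probability
    log p of the test image under that person's HMM. Returns the persons
    ranked by the score s = -2 log p, best match (smallest score) first."""
    scores = {person: -2.0 * lp for person, lp in model_log_probs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1])
```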
5 Experimental Results
All images in our database have a size of 75 × 150 pixels. Since hidden Markov models and the eigenface method require training, we had to divide our data set into a test and a training set. For every head direction we used one image as a training and one as a test image. E.g., the images "01", "03", "05", "07", and "09" of the person shown in Figure 5 were used as training images; the other five images belonged to the test set. In total, the database contains 120 training and 120 test images. The tests for the eigenface method were done with the program facerec (written by M. Turk at the Massachusetts Institute of Technology in Boston, USA), which was adapted for dealing with range images. The dimension of the eigenface space is 119. The computations involving the hidden Markov models, i.e., training and testing, were done with the ISADORA package (developed by E.G. Schukat-Talamazzini at the Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany). The HMMs are linear left-right models consisting of five states, as explained in Section 4.
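The split can be illustrated as follows; this is a sketch under the assumption that the ten images of a person are ordered "01" to "10" with consecutive pairs sharing a head direction.

```python
def split_train_test(person_images):
    """Split the ten images of one person: odd-numbered images go to the
    training set, even-numbered images to the test set, so each set holds
    one image per head direction."""
    train = person_images[0::2]   # "01", "03", "05", "07", "09"
    test = person_images[1::2]    # "02", "04", "06", "08", "10"
    return train, test
```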
Figure 6: Linear left-right human face model (states 1 to 5 with self-transitions a11...a55, forward transitions a12...a45, and output distributions b1...b5)

A sliding window of 4 pixels height was moved from the top to the bottom of the images, and the data within the window were fed to the HMMs. The window overlap for the slide operation was 3 pixels. The results of our experiments are shown in Table 1. For the experiments of the category "no preprocessing", no smoothing and no rotation were applied to the input images; thus we have images of five different head directions, and for each direction and each person we have one image belonging to the training set and one belonging to the test set. For the experiments reported under the category "smoothing", no rotation was done, but an additional smoothing step was applied. A Gaussian filter with σ = 0.5 and σ = 1.5 was used at this stage. Since the directions of the head were controlled to some degree, a rotation was applied in the experiments of the last category in order to bring the faces more or less into a canonical position (looking straight ahead). The rotation around the y axis was constantly 30° (head direction to the left or to the right side), and the rotation around the x axis 20° (head direction upwards or downwards).

preprocessing            eigenface procedure   HMM procedure
no preprocessing         97.50%                90.83%
smoothing (σ = 0.5)      98.33%                90.00%
smoothing (σ = 1.5)      98.33%                76.67%
rotation                 100.00%               89.17%

Table 1: Results of the experiments

The results for both recognition procedures are very promising. It seems that the eigenface strategy outperforms the HMM procedure. Since the database is rather small, however, we are not yet able to definitely conclude which of these procedures is superior to the other. We additionally ran some tests with ten states instead of five in the HMM in order to check the influence of the number of states on the recognition rate. We did not notice any significant differences in the results, however. The results for a smoothing step in the preprocessing are not as we expected for the HMM-based method. Since the data acquired with the range sensor are affected by white noise, we hoped to get rid of the effects of noise by applying a smoothing procedure. The experiments, however, showed that the smoothing obviously removes characteristics of the facial surface which are important for the recognition (oversmoothing). A similar effect was observed for rotation: the recognition rate for the eigenfaces gets higher, whereas the rate for the hidden Markov models slightly drops.
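For reference, the smoothing step can be sketched as follows. We use a standard Gaussian filter; since our range images contain missing data, the sketch additionally masks invalid pixels via normalized convolution, which is only one possible, illustrative way of handling them.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def smooth_range_image(img, sigma=0.5):
    """Gaussian smoothing of a range image. Missing data (NaN) is masked
    so that invalid pixels neither bleed into their neighbours nor get
    filled in by the filter."""
    valid = ~np.isnan(img)
    filled = np.where(valid, img, 0.0)
    num = gaussian_filter(filled, sigma)             # smoothed values
    den = gaussian_filter(valid.astype(float), sigma)  # smoothed mask
    return np.where(den > 0, num / den, np.nan)      # renormalize, keep gaps
```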
6 Conclusions and Future Work
In the present paper, a low-cost and fast acquisition method for 3D facial data has been described. Furthermore, two methods that are known from face recognition with grey level images have been extended to range images. These techniques are based on eigenfaces and on hidden Markov models, respectively. Though the data collection is not large (24 persons), we believe that the results are very promising. The eigenface approach outperforms the HMM strategy in our experiments.

The results of our experiments show clearly that face recognition with range images is a challenging and promising alternative to techniques based on intensity and color images. Procedures originally designed for grey level images work very well on range images, and the recognition rates are in a similar range. There is clear evidence that range images have some advantages over intensity images, for example the availability of explicit 3D information and invariance properties. The explicit 3D information is useful for the transformation of a face image into a canonical position, since the pose of the object can be determined more accurately. And the invariance properties under change of color and illumination potentially lead to more robustness and stability.

Another area which offers a large number of applications for range images is virtual reality. Our acquisition method is very well suited for the modeling of objects in VR systems, since in addition to the range image we also get an intensity image of the scene from our sensor. The combination of these two information sources by means of texture mapping yields object models that can be incorporated into virtual worlds.

In our future work, we plan to enlarge our collection of range images of human faces. We also plan to enhance the preprocessing of the images and incorporate additional recognition techniques in our system. Furthermore, we shall work on the combination of the classifiers involved, especially on the combination of range image and grey level image classification.
Acknowledgments
The software for the range image acquisition was originally provided by ETH Zürich, Switzerland, and has been adapted to our sensor setup. The visualization of the 3D data was done with BOOGA (Bern's Object Oriented Graphics Architecture), which was developed by the Research Group for Computational Geometry at the University of Bern, Switzerland, headed by Prof. H. Bieri.
References
[1] P.J. Besl. Active, Optical Range Imaging Sensors. Machine Vision and Applications, 1:127–152, 1988.

[2] J.D. Foley, A. van Dam, S.K. Feiner, and J.F. Hughes. Computer Graphics: Principles and Practice. Addison-Wesley, 2nd edition, 1990.
[3] G.G. Gordon. Face Recognition Based on Depth Maps and Surface Curvature. In Geometric Methods in Computer Vision, San Diego, CA, volume 1570 of SPIE Proceedings, July 1991.

[4] G.G. Gordon. Face Recognition Based on Depth and Curvature Features. In Proceedings IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Champaign, IL, pages 808–810. IEEE Computer Society Press, June 1992.

[5] J.T. Lapreste, J.Y. Cartoux, and M. Richetin. Face Recognition from Range Data by Structural Analysis. In G. Ferrate, T. Pavlidis, A. Sanfeliu, and H. Bunke, editors, Syntactic and Structural Pattern Recognition, NATO ASI Series, pages 303–314. 1988.

[6] J.C. Lee and E. Milios. Matching Range Images of Human Faces. In Third International Conference on Computer Vision (ICCV), Osaka, Japan, pages 722–726, December 1990.

[7] R. Lengagne, J.-P. Tarel, and O. Monga. From 2D Images to 3D Face Geometry. In Proceedings 2nd International Conference on Automatic Face and Gesture Recognition (ICAFGR), Killington, Vermont, pages 301–306, October 1996.

[8] L.R. Rabiner. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, 77(2):257–286, February 1989.

[9] F. Samaria and F. Fallside. Face Identification and Feature Extraction Using Hidden Markov Models. In G. Vernazza, A.N. Venetsanopoulos, and C. Braccini, editors, Image Processing: Theory and Applications, pages 295–298. Elsevier Science Publishers, 1993.

[10] F. Samaria and S. Young. HMM-Based Architecture for Face Identification. Image and Vision Computing, 12(8):537–543, October 1994.
[11] M.A. Turk and A.P. Pentland. Eigenfaces for Recognition. Journal of Cognitive Neuroscience, 3(1):71–86, 1991.

[12] Y. Yacoob and L.S. Davis. Labeling of Human Face Components from Range Data. CVGIP: Image Understanding, 60(2):168–178, September 1994.