A Framework for Performance Evaluation of Face Recognition Algorithms
John A. Black, Jr.*, M. Gargesha, K. Kahol, P. Kuchi, Sethuraman Panchanathan
Visual Computing and Communications Lab, Arizona State University

ABSTRACT
Face detection and recognition are becoming increasingly important in contexts such as surveillance, credit card fraud detection, and assistive devices for the visually impaired. A number of face recognition algorithms have been proposed in the literature, and a comprehensive face database is crucial for testing the performance of these algorithms. However, while existing publicly available face databases contain face images with a wide variety of pose angles, illumination angles, gestures, face occlusions, and illuminant colors, these images have not been adequately annotated, thus limiting their usefulness for evaluating the relative performance of face recognition algorithms. For example, many of the images in existing databases are not annotated with the exact pose angles at which they were taken. In order to compare the performance of the various face recognition algorithms presented in the literature, there is a need for a comprehensive, systematically annotated database populated with face images that have been captured (1) at a variety of pose angles (to permit testing of pose invariance), (2) with a wide variety of illumination angles (to permit testing of illumination invariance), and (3) under a variety of commonly encountered illumination color temperatures (to permit testing of illumination color invariance). In this paper, we present a methodology for creating such an annotated database that employs a novel set of apparatus for the rapid capture of face images from a wide variety of pose angles and illumination angles. Four different types of illumination are used: daylight, skylight, incandescent, and fluorescent. The entire set of images, as well as the annotations and the experimental results, is being placed in the public domain and made available for download over the World Wide Web.
1. INTRODUCTION
Most face recognition algorithms depend upon some type of similarity measure to retrieve candidate face images from a database. Many different similarity measures can be used. However, since human beings seem to be very good at recognizing faces, it seemed worthwhile to study which facial characteristics are most salient to humans as they evaluate subjective similarity between faces. In order to collect data for that study, an experiment was designed that used 75 frontal face images from the Purdue AR Face database9. Participants were presented with one of these 75 images (to be used as a reference image) and were asked to select the 19 "most similar" faces from the remaining 74 images, and to rank order those 19 images based on their degree of similarity to the reference image. That experiment produced results that are currently being used to evaluate the criteria that humans use to distinguish faces from each other, and will be the subject of a future paper. However, while designing that experiment, it became clear to us that the currently available face databases are not adequate for a comprehensive performance evaluation of face analysis and recognition systems. Specifically, the problem of evaluating a face recognition system's tolerance of variations in environmental parameters (such as pose angle, illumination angle, and illumination color) can only be addressed with a comprehensive set of face images that widely populate the multidimensional space defined by these parameter values. Such a set of face images can only be generated by methodical control of the environmental parameter values as the images are captured, and by subsequently using those values to annotate the images in the resulting database. By using subsets of this image set (in which all but one parameter is held constant) it becomes possible to independently evaluate a face recognition system's ability to tolerate
*[email protected]; phone 1 480 966-5936; fax 480 968-3446; http://cubic.asu.edu/VCCL.htm; PO Box 5406, Tempe, AZ 85287-5406, USA
changes in each type of environmental variable. Such a database also allows objective performance comparisons between face recognition algorithms when environmental parameter values are varied along different dimensions. Unfortunately, most of the currently available sets of face images do not widely populate this multidimensional environmental parameter space. In cases where variations in these environmental parameters were used during the capture of images, they were often not methodically controlled, accurately measured, or used to annotate the images within the resulting image set. For example, a face database might include face images taken from various pose angles, but those pose angles were only described qualitatively, rather than quantitatively in units such as degrees. Testing a face recognition algorithm with a set of images that includes only a few qualitative pose angles (such as frontal, profile, and 3/4 view) does not establish how many degrees of pose angle variation the algorithm can tolerate. Such coarsely quantized pose angles are also of limited value in evaluating incremental improvements in the algorithm's tolerance of pose angle variations.
2. RELATED WORK
As mentioned above, several research groups have built face databases for measuring and comparing the performance of face recognition algorithms. This section lists many of these face databases and provides a brief description of each.
2.1 The AT&T Database of Faces (formerly the ORL Database of Faces)
The AT&T Database of Faces1 contains face images of 40 persons, with 10 images of each. The subjects were asked to face the camera, but no restrictions were imposed on their facial expression, and some side movement and head rotation were tolerated. For most subjects, the 10 images were shot at different times and under different lighting conditions, but always against a dark background.
2.2 The Oulu Physics database
The Oulu Physics database2 includes frontal color images of 125 different faces. Each face was photographed 16 times, using 1 of 4 different illuminants (horizon, incandescent, fluorescent, and daylight) in combination with 1 of 4 different camera calibrations (color balance settings). (If a subject wore glasses, 2 sets of 16 images were captured – one with glasses, and one without.) The camera was white-balanced and linearized for one of the 4 illuminants, and then the same setting was used to photograph the face under all 4 illuminants. (This was repeated for each of the 4 illuminants, thus providing 16 images.) The images were captured under darkroom conditions, and a gray screen was placed behind the participant. The spectral reflectance (over the range from 400 nm to 700 nm) was measured at the forehead, left cheek, and right cheek of each person with a spectrophotometer. The spectral sensitivities of the R, G, and B channels of the camera, and the spectral power of the four illuminants, were also recorded over the same spectral range.
2.3 The XM2VTS database
The XM2VTS (Extended MultiModal Verification for Teleservices and Security) database3, 13, 14 consists of 1000 GBytes of video sequences and speech recordings taken of 295 subjects at one-month intervals over a period of 4 months (4 recording sessions). Since the data acquisition spanned a period of time, significant variability in the appearance of clients (such as changes of hairstyle, facial hair, shape, and presence or absence of glasses) is present in the recordings. During each of the 4 sessions a "speech" video sequence and a "head rotation" video sequence were captured. During the speech sequence the subject was asked to read three sentences, which were written on a board positioned just below the camera. During the head rotation sequence the subject was asked to rotate his/her head from the center to the left, to the right, then up, then down, and then finally back to the center. This database is designed to test multimodal (video + audio) systems that identify humans by facial and voice features. The objective of this multimodal identification is to limit access to central services to authorized clients, denying access to impostors while maintaining a low false rejection rate.
2.4 The Yale Face database
The Yale Face database4 contains frontal grayscale face images of 15 people, with 11 face images of each subject, giving a total of 165 images. Lighting variations include left-light, center-light, and right-light. Spectacle variations include with-glasses and without-glasses. Facial expression variations include normal, happy, sad, sleepy, surprised, and wink.
2.5 The Yale B database
The Yale B database5 contains grayscale images of 10 subjects with 64 different lighting angles and 9 different pose angles, for a total of 5760 images. Pose 0 is a frontal view, in which the subject directs his/her gaze directly into the camera lens. In poses 1, 2, 3, 4, and 5 the subject is gazing at 5 points on a semicircle about 12 degrees away from the camera lens, in the left visual field. In poses 6, 7, and 8 the subject is gazing at 3 different points on a semicircle about 24 degrees away from the camera lens, again in the left visual field. The exact head orientation (both vertically and horizontally) for each pose was controlled by the subject, presumably based on the point of gaze. Each image file in the database is annotated with the subject number, the pose number, and the azimuth and elevation of the light source. The images were captured with an overhead lighting structure (consisting of multiple interconnected struts) which was fitted with 64 computer-controlled xenon strobe lights. For each pose, 64 images were captured of each subject at a rate of 30 frames/sec, over a period of about 2 seconds. Thus, there are only minor variations in the pose angle and facial expression. Photographs of each of the 10 subjects were taken both with and without the strobe illumination, thus providing two sets of images – one with "ambient" illumination, and another with ambient illumination plus strobe illumination. (In 47 of the latter images, the strobe failed to fire, thus producing an image with ambient lighting only. In addition, 4 other images in the database were corrupted, due to technical difficulties.)
2.6 The MIT face database
The MIT face database6 was produced at the MIT Media Laboratory. The subjects were 16 males. Each subject sat on a couch and was photographed 27 times, while varying head orientation. The lighting direction and the camera zoom were also varied during the sequence. The resulting 480 x 512 grayscale images were then filtered and subsampled by factors of 2, to produce six levels of a binary Gaussian pyramid. The six "pyramid levels" are annotated by an X-by-Y pixel count, which ranged from 480x512 down to 15x16.
2.7 The CMU Pose, Illumination, and Expression (PIE) database
The CMU Pose, Illumination, and Expression (PIE) database7, 18 contains images of 68 subjects that were captured with 13 different poses, 43 different illumination conditions, and 4 different facial expressions, for a total of 41,368 color images with a resolution of 640 x 486. Nine of the 13 poses were taken by moving the camera in a semicircular path at eye level around the front of the subject, while taking snapshots – presumably at about every 22.5 degrees. Three of the 13 poses were taken from above eye level, and 1 was taken from below eye level. Two sets of images were captured – one set with ambient lighting present, and another set with ambient lighting absent.
2.8 The UMIST Face database
The UMIST Face database8 consists of 564 grayscale images of 20 people of both sexes and various races. (Image size is about 220 x 220.)
Various pose angles of each person are provided, ranging from profile to frontal views.
2.9 The Purdue AR Face database
The Purdue AR Face database9 contains over 4,000 color frontal-view images of 126 people's faces (70 men and 56 women), taken during two different sessions separated by 14 days. Facial illumination was provided by two independent light sources – left and right. Similar pictures were taken during the two sessions. No
restrictions on clothing, eyeglasses, make-up, or hair style were imposed upon the participants. Controlled variations include facial expressions (neutral, smile, anger, and screaming), illumination (left light on, right light on, all side lights on), and partial facial occlusions (sunglasses or a scarf). Images with facial occlusions were captured with the left light on, the right light on, and both side lights on.
2.10 The University of Stirling online database
The University of Stirling online database10 was created for use in psychology research, and contains pictures of faces, objects, drawings, textures, and natural scenes. A web-based retrieval system allows a user to select from among the 1591 face images of over 300 subjects based on several parameters, including male, female, grayscale, color, profile view, frontal view, or 3/4 view. (Subjective ratings of attractiveness, distinctiveness, and facial expression can also be used as criteria for retrieval.)
2.11 The FERET database
The FERET database11, 15, 16, 17 is a large database that contains face images of over 1000 people. It was created by the FERET program, which ran from 1993 through 1997. The database was assembled to support government-monitored testing and evaluation of face recognition algorithms using standardized tests and procedures. The final set of images consists of 14051 grayscale images of human heads with views that include frontal views, left and right profile views, and quarter left and right views. A particularly valuable feature of this database is that it contains many images of the same people taken with time gaps of one year or more, so that some facial features have changed. This is important for evaluating the robustness of face recognition algorithms over time.
3. LIMITATIONS OF EXISTING DATABASES
The following sections provide a short analysis of each of the face databases listed in Section 2, discussing the limitations of each.
3.1 The AT&T Database of Faces (formerly the ORL Database of Faces)
This database includes faces of a rather limited number of people (40), and the illumination conditions are not consistent from image to image. Also, the images are not annotated for different facial expressions, head rotation, or lighting conditions.
3.2 The Oulu Physics database
This database contains images captured under a good variety of illuminant colors, and the images are annotated for illuminant. However, all of the face images are basically frontal (with some variations in pose angle and distance from the camera), and there are no variations in the lighting angle.
3.3 The XM2VTS database
This huge video database of faces/voices is certainly impressive, containing a wide range of pose angle variations and a large number of subjects. However, it does not include any information about the image acquisition parameters, such as illumination angle, illumination color, or pose angle. For example, the subject is simply asked to rotate his/her head left and right and up and down, without any attempt to record the pose angle.
3.4 The Yale Face database
While the face images in this database were taken with 3 different lighting angles (left, center, and right), the precise positions of the light sources are not specified. Also, since all images are frontal, there are no pose angle variations. Environmental factors (such as the presence or absence of ambient light) are also not described.
3.5 The Yale B database
The 9 different pose angles in these images were not precisely controlled. (Subjects were asked to gaze at specified points in the visual field, but the exact head orientation, both vertically and horizontally, for each pose was chosen by the subject.) Also, the background in these images is not homogeneous, and is cluttered with images of the framework supporting the strobe lights, people who happened to be standing behind the apparatus when the images were captured, and even unshaded windows with sunlight filtering into the room. This indicates that little care was taken to prevent uncontrolled reflections, making control of the ambient lighting difficult, if not impossible. The resulting uncertainty of the ambient lighting conditions raises some questions about the efficacy of the strobe lighting. The fact that only 10 subjects were photographed is rather puzzling. It seems that with such an elaborate, automated, custom-built apparatus, the researchers would have captured images of many more subjects.
3.6 The MIT face database
Although this database contains images that were captured with a few different scale variations, lighting variations, and pose variations, these variations were not very extensive, and were not precisely measured. (The scale variations were annotated simply as "full", "medium", or "small". The lighting variations were annotated as "head on", "~45 degrees", and "~90 degrees". The head tilt variations were annotated as "upright", "right ~22.5 degrees", and "left ~22.5 degrees".) There was also apparently no effort made to prevent the subjects from moving between pictures.
3.7 The CMU Pose, Illumination, and Expression (PIE) database
Although this database makes an admirable attempt to capture a range of different poses, illumination conditions, and facial expressions, the clutter visible in the backgrounds of these images (which includes structures within the room, cabling, and cast shadows) indicates that there were probably many uncontrolled reflections of the illuminant, making control of the ambient lighting difficult, if not impossible. The exact pose angle for each image is not specified, and while it is reasonable to assume that the 9 pose angles around the front of the subject were spaced at intervals of 22.5 degrees, there is no indication of how accurate those angles are.
3.8 The UMIST Face database
The grayscale images taken of each person are numbered consecutively, in the order in which they were taken. While this provides some indication of the relative pose angle in each image, it is not always reliable – perhaps due to movement of the subject between image captures. No absolute pose angle is provided for each image. No information is provided about the illumination used – either its direction or its color temperature.
3.9 The Purdue AR Face database
Although the creator of this database says that the "pictures were taken... under strictly controlled conditions", those conditions are not specified.
Although the facial illumination was provided by two independent light sources (left and right), the placement of those light sources, their color temperature, and whether they were diffuse or point light sources are not specified. (The placement of the two light sources produces objectionable glare in the spectacles of some subjects.)
3.10 The University of Stirling online database
Given that the original purpose of this database was not to test face recognition algorithms, it is not surprising that no information is provided about the illumination used during image capture. Most of these images were also captured
in front of a black background, making it difficult to discern the boundaries of the head for subjects with dark hair. Presumably the pose angles (frontal, profile, and 3/4) are only approximate.
3.11 The FERET database
Although this database has some distinct advantages (such as a large number of individuals, and duplicate pictures of some people over time), it does not provide a very wide variety of pose variations, and there is no information about the lighting used to capture the images.
4. THEORY
Broadly speaking, images of faces contain two basic types of content: (1) content that is specific to the particular face contained in the image, and (2) content that is particular to the environment in which the image was captured. The challenge of face recognition is two-fold: (1) to distinguish between different faces based on the image content that is specific to each particular face, and (2) to disregard the image content that is dependent only upon the environment. As shown by the discussion of the various face databases in Sections 2 and 3 above, researchers have expended much effort in constructing face databases to test the ability of face recognition algorithms to distinguish between different faces. This effort has led to the construction of large databases with a wide variety of different faces. In several cases these databases consist largely of frontal face images taken under similar lighting conditions. Needless to say, such databases are not very useful for testing the ability of face recognition algorithms to disregard environmental variations, such as illumination. Few face databases are constructed specifically for the purpose of challenging face recognition algorithms with a carefully considered range of precisely calibrated environmental variations. Perhaps the most important environmental variables are the qualities of the illuminant, such as the color temperature of the source, the direction of the source, and whether the source is a point source or a diffuse source. These variables can make a dramatic difference in the low-level content of a face image. Two photographs taken of the same face can differ dramatically at the pixel level when the light source is moved from the left side of the face to the right side. However, because the human visual system subconsciously disregards changes in the illuminant, we are typically unaware of how dramatic these changes can be at the pixel level. This fact is illustrated below in Figures 1 through 4. The faces in Figures 1 and 2 are both illuminated with diffuse light. However, the face in Figure 1 is also illuminated by a point light source from the left side, while the face in Figure 2 is illuminated by a point light source from the right side. To the casual observer, no significant difference is seen, and there is certainly no difficulty in identifying both faces as the same person. However, if the pixel values in Figure 2 are subtracted from those in Figure 1 (with an offset of +128 to prevent negative results), the resulting difference image (Figure 3) highlights the dramatic differences at the pixel level. (Figure 4 is the corresponding difference image between Figure 1 and itself, which is simply a homogeneous level of 128.)
Figure 1
Figure 2
Figure 3
Figure 4
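To make the computation behind Figures 3 and 4 concrete, the following is a minimal sketch of the difference-image operation described above, written in Python with NumPy and Pillow; the file names are hypothetical.

```python
import numpy as np
from PIL import Image

# Load two aligned grayscale images of the same face, lit from the left
# and from the right (file names are hypothetical).
img_a = np.asarray(Image.open("face_lit_left.png").convert("L"), dtype=np.int16)
img_b = np.asarray(Image.open("face_lit_right.png").convert("L"), dtype=np.int16)

# Subtract pixel values with a +128 offset so that negative differences
# remain representable, then clip back into the 8-bit range.
diff = np.clip(img_a - img_b + 128, 0, 255).astype(np.uint8)
Image.fromarray(diff).save("difference.png")

# Differencing an image with itself yields a homogeneous level of 128,
# as in Figure 4; lighting changes appear as bright and dark regions.
```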
In order to provide robust face recognition, an algorithm must be sensitive to subtle differences in image content that are useful for distinguishing between faces. However, equally important is its ability to disregard the content that is particular to the environment in which the image was captured – such as the illuminant. Development of such an algorithm is not easy, and requires considerable empirical work. However, if the face images in a face database are not captured with a range of environmental variable values, and if each image is not annotated with information that precisely indicates the values of those environmental variables, it is difficult to correlate face recognition failure (or success) with changes in these variables, and to refine the face recognition algorithm to be more tolerant of such changes. A face database that does not include a set of images to represent each independent environmental variable also complicates comparisons between different face recognition algorithms, because two algorithms might give the same failure rates even though they have failed for totally different reasons.

To provide a truly useful tool, a face database should contain face images that were captured under carefully controlled illumination conditions. These conditions include both the ambient (diffuse) and direct (point source) illumination. This level of control is only possible in a "black room" where all exposed surfaces are painted black to eliminate uncontrolled reflections of the illuminant. The location of each light source in the black room (whether it be a diffuse or a point light source) should be recorded, along with its color temperature.

To allow researchers to determine the robustness of their face recognition algorithms to illumination variations, the database should include images spanning a range of lighting configurations. Natural lighting rarely consists of pure ambient or pure direct lighting, so the images in the database should include various combinations of lighting. In some image sets the ambient lighting might predominate, while in others direct lighting might predominate. Each image set should then be annotated with the ratio of ambient to direct lighting.

The location of the direct lighting has a large effect on the appearance of cast shadows. To allow researchers to test the ability of their algorithms to disregard differences in cast shadows, the database should include multiple images of the same face in the same pose angle, but with different illumination angles, and each of these images should be annotated with the precise angle of illumination. This annotation allows researchers to correlate changes in the illumination angle with the error rates of their face recognition algorithms.

Ideally, the face images should be captured in color, since the color of skin, eyes, and hair can be useful for face recognition. When capturing color images, the color temperature of the illuminant can have a large effect on the colors captured by a digital camera. For example, skin color takes on a striking reddish or bluish cast under illuminants of different color temperatures. The dramatic effect of differently colored illuminants (such as daylight, blue skylight, incandescent light, and fluorescent light) on faces can be seen on the University of Oulu's web site (http://www.ee.oulu.fi/research/imag/color/pbfd.html), which shows the resulting color shift in RGB images when a digital camera is not balanced to compensate for the illuminant.
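As an illustration of the kind of per-image annotation advocated above, each captured image might carry a record of its pose angle, illumination angle, ambient-to-direct lighting ratio, and illuminant color temperature. The following sketch shows one possible encoding; the field names and values are hypothetical, not part of any existing database.

```python
import json

# One hypothetical annotation record for a single captured face image.
annotation = {
    "subject_id": "S001",
    "image_file": "S001_pose+30_illum-40.png",
    "pose_angle_deg": 30.0,            # horizontal pose angle; 0 = frontal
    "illumination_angle_deg": -40.0,   # azimuth of the direct (point) source
    "ambient_to_direct_ratio": 1.0,    # equal ambient and direct light
    "illuminant_color_temp_K": 3200,   # color temperature of the source
}

# Store the annotation alongside the image for later retrieval.
with open("S001_pose+30_illum-40.json", "w") as f:
    json.dump(annotation, f, indent=2)
```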
Most digital cameras are equipped with a color balancing feature that can be used to compensate for a particular illuminant's color temperature. However, some face recognition applications require that a digital camera capture images under a wide variety of illuminants, without opportunity for color balancing. For example, recent work at Arizona State University has focused on providing portable cameras to assist blind people in identifying persons who are approaching them. The face recognition algorithm used for such an application might be called upon to recognize faces indoors or outdoors, regardless of the color shifts that occur under these conditions. Thus, it is important that a face database also include face images with the color shifts that will be encountered most often.

In order to allow researchers to test their algorithms against variations in pose angle, the database should include multiple pose angles of the same face, and each image should be annotated with the precise pose angle. If a wide range of pose angles is photographed (and if each of the resulting images is annotated with a precise pose angle), researchers will be able to correlate pose angles with the error rates of their face recognition algorithms. However, it is not possible to annotate face images with precise pose angles when the pose angles are obtained by simply asking each participant to rotate his/her head left, right, up, and down. A better strategy is to ask each participant to hold his/her head still while a camera is moved in a controlled trajectory around the participant, to capture images from precisely measured angles.
Efforts should also be made to limit the physical movement of participants when other environmental variables are being changed, in order to control the number of variables that are changing simultaneously. For example, if the location of a direct lighting source is changed (to cast different shadows on the face), it would be helpful if the face remained stationary during the entire sequence, since this allows precise recording of the illumination angle at every point in the sequence. If variations in the environmental variables can be independently controlled and recorded during the capture of face images, researchers will be able to use subsets of the database images to test the robustness of their face recognition algorithms against any of several independent variables. This would be helpful for analyzing (1) why a particular algorithm fails, (2) what kinds of changes will improve its performance with regard to a particular variable, (3) whether those changes have adverse effects with regard to other variables, and (4) how much an algorithm's robustness has improved with added refinements.

In summary, to be really useful, a face database should include face images that are captured under a variety of environmental conditions that widely populate the multidimensional environmental space that is typically encountered in real-world environments. At the same time, the values of these variables should be recorded as the images are captured, and should be used to annotate those images. Testing with the resulting annotated face database would give a more realistic measure of how a particular face recognition algorithm would perform in real-world environments.
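Given such annotations, a researcher can select a subset of images in which every environmental variable but one is held constant, and then plot recognition rate against the remaining variable. A minimal sketch of this procedure for pose angle follows, assuming the hypothetical annotation records shown earlier; recognize stands in for whatever face recognition algorithm is under test.

```python
from collections import defaultdict

def recognition_rate_vs_pose(annotations, recognize):
    """Group images by pose angle (with all other environmental
    parameters held constant) and return the fraction of images
    correctly recognized at each angle."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for rec in annotations:
        angle = rec["pose_angle_deg"]
        totals[angle] += 1
        # recognize() maps an image file to a predicted subject id.
        if recognize(rec["image_file"]) == rec["subject_id"]:
            correct[angle] += 1
    return {a: correct[a] / totals[a] for a in sorted(totals)}
```

A plot of these rates against pose angle reveals exactly where an algorithm's pose tolerance begins to break down, and whether a refinement has widened that tolerance.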
5. AN OVERVIEW OF OUR METHODOLOGY
The creation of a face database that satisfies the requirements discussed in Section 4 requires the construction of specialized apparatus. In our case this apparatus is built around an 8-foot-diameter dual rotating platform. This platform consists of two independently rotating rings that are each capable of supporting lighting and camera equipment, while moving it in a precisely controlled manner (with very little vibration) around the participant being photographed. This apparatus is shown in Figures 5 and 6. Both figures show two Photoflex Starlite diffused light sources in the foreground, in front of the 8-foot-diameter platform. (The precise location of both of these light sources is recorded and used to annotate captured images.) The two concentric rotating rings on the platform each ride on top of two concentric circular tracks of ball bearings that lie underneath the rings. Thus each ring comprises a very large "lazy Susan" bearing. At the rear of Figure 5 a vertical backdrop can be seen, which provides a white background for the photos, which are taken from the opposite side of the ring by a camera mounted on a tripod (not shown here). Figure 6 shows our virtual participant (Molly) seated in front of the backdrop in the center portion of the platform, which does not rotate. An electronic camera (either video or still) is normally mounted in front of Molly on the same ring as the backdrop. As that ring is rotated, both the camera and the backdrop rotate in synchronism with each other. Thus, as the camera rotates around Molly (to capture first a left profile, then a frontal view, and then a right profile), the background behind her simultaneously rotates around Molly, thus appearing unchanged in all of the images. A tennis ball (which can be seen in both Figures 5 and 6) is suspended from the ceiling over the center of the platform, and is used to verify that the participant's head is exactly aligned with the center of the platform. The rings are motorized, allowing a video camera to be swept around the front of the participant as it captures a continuous stream of video at 30 frames per second. This allows for quick capture over the entire range of pose angles, thus reducing the time that the participant is required to remain motionless. (The chair in Figure 6 is equipped with an adjustable head rest to stabilize the participant's head during image capture, but due to the short amount of time required to capture the images, this has not been necessary.) The rapid frame rate of the video camera ensures that, even with eye blinks, it will be possible to capture an open-eyed image from each angle. Frames can then be extracted from the video sequence at desired intervals (as sketched below, after Figure 6), or the entire video sequence can be used. Note: By having the participant remain stationary (and asking him/her to fixate on a motionless spot) while the camera moves, we avoid the problem of nystagmus, which is an involuntary tracking behavior characteristic of humans when they are confronted with a moving scene. For example, when looking out the window of a car, a passenger's eyes fixate on an object and track it for a time, and then jump to another object and track it for a time. If a video camera were mounted in a stationary position, and the participant rotated through a range of pose angles, the participant's eyes would appear to dart back and forth in the resulting video.
Figure 5
Figure 6
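The extraction of frames at fixed pose-angle intervals can be automated once the ring's rotation rate is known. The following is a minimal sketch using OpenCV; the 30 frames/sec rate is stated in the text, but the sweep rate and file names are assumptions.

```python
import cv2

FPS = 30.0               # video frame rate, as stated above
SWEEP_RATE_DEG_S = 9.0   # assumed ring rotation rate (hypothetical)
STEP_DEG = 10.0          # desired pose-angle interval
angle = -90.0            # starting pose angle of the sweep

# Number of video frames between successive 10-degree pose angles.
frames_per_step = round(FPS * STEP_DEG / SWEEP_RATE_DEG_S)

cap = cv2.VideoCapture("pose_sweep.avi")  # hypothetical file name
index = 0
while angle <= 90.0:
    ok, frame = cap.read()
    if not ok:
        break
    if index % frames_per_step == 0:
        # Save the frame whose capture time corresponds to this pose angle.
        cv2.imwrite(f"pose_{angle:+.0f}.png", frame)
        angle += STEP_DEG
    index += 1
cap.release()
```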
It is also possible to use this same apparatus to capture images with different illumination directions. To do this, a point light source (not shown) is mounted on the outer ring of the platform, while the camera and the backdrop are moved to the inner ring. The camera's ring can then be held stationary to capture frontal shots, while the point light source is rotated around Molly on the outer ring, thus providing direct illumination (first to the left side of the face, then to the front of the face, and then to the right side of the face) that casts shadows across the face. Since most lighting situations include both ambient and direct lighting, the two Photoflex Starlites are also typically used to provide diffuse (ambient) light. The ratio of lighting from the direct and diffuse light sources can then be adjusted to simulate different natural conditions, such as a clear day or a cloudy day. Notice that the two independent rings on the platform allow the camera motion to be independent of the light source motion. Alternatively, the point light source (or even the diffused light sources) could be mounted on the same ring as the camera. This would maintain constant lighting (from the point of view of the camera) as the camera rotates around the participant, capturing a range of pose angles.

Both of the diffused light sources, as well as the point light source, have a color temperature of 3200 kelvins. This allows the use of Roscolux color light filters (which are designed to be used with this color temperature) to produce light sources of other color temperatures. For example, a Rosco 65 filter (which simulates daylight) was used to capture the face images shown above. By color balancing the camera for one color temperature (such as daylight) and illuminating the participant with light of another color temperature (such as incandescent or fluorescent), it is possible to produce a face image set with a deliberate color shift. (Such an image set would be useful for testing the robustness of face recognition algorithms when a camera is carried from outside a building to inside the building.)

As can be seen in Figures 5 and 6, the entire platform is surrounded by black velvet curtains. These curtains absorb secondary light and, in doing so, help maintain control over the ambient lighting conditions. Thus, by recording the exact positions of all diffuse and point light sources, repeatability of lighting conditions can be assured. In the design of this apparatus the guiding principle was to capture images of a participant unobtrusively, while being able to independently control the pose angle, the angle of illumination, and the color of the illuminant. In order to reduce the time required of the participant, the system was designed to be automated. Video was chosen as the preferred medium because it allowed the capture of a large set of images in a short time, while still allowing the extraction of still images from the video stream. Four color temperatures are used to simulate incandescent, fluorescent, daylight, and blue skylight illumination. The resulting apparatus allows the acquisition of images with good consistency and repeatability.
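For preliminary testing, the deliberate color shift described above can also be approximated in software by applying a diagonal (von Kries) per-channel scaling to a color-balanced image. This is only a rough model of a camera balanced for one illuminant while capturing under another; the gain values below are illustrative assumptions, not measurements.

```python
import numpy as np
from PIL import Image

# Illustrative gains approximating a daylight-balanced camera
# photographing under warm (3200 K) incandescent light: red is
# boosted and blue suppressed (assumed values, not measured).
GAINS = np.array([1.3, 1.0, 0.7])

img = np.asarray(Image.open("face_frontal.png").convert("RGB"), dtype=np.float32)
shifted = np.clip(img * GAINS, 0, 255).astype(np.uint8)
Image.fromarray(shifted).save("face_frontal_incandescent.png")
```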
6. RESULTS
Figures 7 through 25 show 19 different pose angles captured at 10-degree intervals over a range of 180 degrees. (Both video and still images are captured in color, but the images shown here have been reduced to grayscale.)
Figure 7: – 90°
Figure 8: – 80°
Figure 9: – 70°
Figure 10: – 60°
Figure 11: – 50°
Figure 12: – 40°
Figure 13: – 30°
Figure 14: – 20°
Figure 15: – 10°
Figure 16: 0°
Figure 17: + 10°
Figure 18: + 20°
Figure 19: + 30°
Figure 20: + 40°
Figure 21: + 50°
Figure 22: + 60°
Figure 23: + 70°
Figure 24: + 80°
Figure 25: + 90°
Figures 26 through 44 show 19 frontal views with direct lighting angles ranging from – 90° to + 90° in 10-degree intervals. These images were all captured with equal amounts of direct and ambient light.
Figure 26: – 90°
Figure 27: – 80°
Figure 28: – 70°
Figure 29: – 60°
Figure 30: – 50°
Figure 31: – 40°
Figure 32: – 30°
Figure 33: – 20°
Figure 34: – 10°
Figure 35: 0°
Figure 36: + 10°
Figure 37: + 20°
Figure 38: + 30°
Figure 39: + 40°
Figure 40: + 50°
Figure 41: + 60°
Figure 42: + 70°
Figure 43: + 80°
Figure 44: + 90°
7. DISCUSSION
There are many challenges to building a comprehensive face database. Perhaps the most fundamental is the ability to capture consistent, high-quality, repeatable images. There are many environmental variables that must be controlled to produce repeatable results. While the most obvious variables involve the lighting, there are many other considerations, including the camera equipment, the computer software needed to process the images consistently, and the electromechanical apparatus needed to conduct the photo sessions expeditiously. In addition, there is a need for a dedicated laboratory where the entire image capture setup can be maintained without disruption for a period long enough to build a large database, and a commitment to maintain and improve that database as new challenges surface.
8. CONCLUSION
While the attention of face recognition researchers has largely been focused on methods for distinguishing between face images, robust face recognition will not be attained until researchers also learn how to deal with the environmental variations that are encountered in real-world applications. To accomplish this, it is necessary to create face databases with carefully calibrated environmental variations that can be used to challenge existing face recognition algorithms and expose the specific weaknesses that are inherent in each. Such databases can also be useful in validating new approaches to face recognition, and in comparing the performance of different face recognition algorithms. The methodology described in this paper provides one possible answer to the question of how to build such a database.
9. ONGOING WORK
The construction of a comprehensive face database is a large undertaking that involves the cooperation of many people. While we have constructed the basic apparatus needed to accomplish this task, there is still much work to be done, and many optimizations and refinements can be made to streamline the process further. Our plans include image capture from a large group of participants, processing and annotation of the resulting images, archiving of the resulting image set in an easy-to-use format, and making that archive available for download from the World Wide Web12. As this database is used by us (and, we hope, by other researchers) to explore the limitations of current face recognition algorithms, and to develop methods for overcoming those limitations, we will be looking for opportunities to further expand the database to cover emerging problems.
REFERENCES
1. The AT&T Database of Faces (http://www.uk.research.att.com/facedatabase.html)
2. The Oulu Physics database (http://www.ee.oulu.fi/research/imag/color/pbfd.html)
3. The XM2VTS database (http://www.ee.surrey.ac.uk/Research/VSSP/xm2vtsdb/)
4. The Yale database (http://cvc.yale.edu/)
5. The Yale B database (http://cvc.yale.edu/projects/yalefacesB/yalefacesB.html)
6. The MIT face database (ftp://whitechapel.media.mit.edu/pub/images/)
7. The CMU PIE database (http://www.ri.cmu.edu/projects/project_418.html)
8. The UMIST database (http://images.ee.umist.ac.uk/danny/database.html)
9. The Purdue AR Face database (http://rvl1.ecn.purdue.edu/~aleix/aleix_face_DB.html)
10. The University of Stirling online database (http://pics.psych.stir.ac.uk/)
11. The FERET database (http://www.itl.nist.gov/iad/humanid/feret/)
12. The FacePix reference image set is in the public domain, and may be downloaded at http://cubic.asu.edu/vccl/imagesets/facepix.
13. J. Matas, M. Hamouz, K. Jonsson, J. Kittler, Y. Li, et al., "Comparison of face verification results on the XM2VTS database," Proceedings of the 15th International Conference on Pattern Recognition (ICPR-2000), Barcelona, Spain, 2000.
14. K. Messer, J. Matas, J. Kittler, J. Luettin, and G. Maitre, "XM2VTSDB: The Extended M2VTS Database," Proceedings of the Second International Conference on Audio- and Video-based Biometric Person Authentication (AVBPA '99), Washington, DC, 1999.
15. H. Moon and P.J. Phillips, "The FERET verification testing protocol for face recognition algorithms," Proceedings of the Third IEEE International Conference on Automatic Face and Gesture Recognition, Nara, Japan, 1998.
16. P.J. Phillips, H. Moon, P. Rauss, and S.A. Rizvi, "The FERET evaluation methodology for face-recognition algorithms," Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Juan, Puerto Rico, 1997.
17. P.J. Phillips, H. Moon, S.A. Rizvi, and P.J. Rauss, "The FERET evaluation methodology for face-recognition algorithms," IEEE Transactions on Pattern Analysis and Machine Intelligence 22, pp. 1090-1104, 2000.
18. T. Sim, S. Baker, and M. Bsat, "The CMU Pose, Illumination, and Expression (PIE) Database," Proceedings of the International Conference on Automatic Face and Gesture Recognition, 2002.