Face Recognition in the Dark

Andrea Selinger†

Diego A. Socolinsky‡

†Equinox Corporation 9 West 57th St New York, NY 10019

‡Equinox Corporation 207 East Redwood St Baltimore, MD 21202

{andrea,diego}@equinoxsensors.com

Abstract

Previous research has established thermal infrared imagery of faces as a valid biometric and has shown high recognition performance in a wide range of scenarios. However, all these results were obtained using eye locations that were either manually marked or automatically detected in a coregistered visible image, making realistic use of thermal infrared imagery alone impossible. In this paper we present the results of an eye detector on thermal infrared imagery and analyze its impact on recognition performance. Our experiments show that although eyes cannot be detected as reliably in thermal images as in visible ones, some face recognition algorithms can still achieve adequate performance.

1 Introduction

Over the last few years, there has been a surge of interest in face recognition using thermal infrared imagery. While the volume of literature on the subject is notably smaller than that on visible face recognition, there is nonetheless a steady stream of research [1, 2, 3, 4, 5]. Although they mostly relied on databases limited in size and variability, these papers have established that thermal imagery of human faces constitutes a valid biometric signature. Thermal imagery has shown superior performance over visible imagery with a variety of algorithms [1, 6]. More recently, time-lapse recognition results were reported in [4, 7], and results in an operational scenario are presented in [8].

A necessary step toward automated face recognition, in any modality, is the detection of faces and facial features. Face detection in the thermal infrared has been reported in [9]. The most common facial features required by face recognition algorithms are the locations of both eyes. These are normally used to align the detected faces to a standard template prior to more complex feature extraction and comparison. To date, thermal infrared face recognition algorithms have relied on eye locations that were either manually marked by a human operator [3, 6, 1, 4, 7], or automatically detected in a coregistered visible image [8]. The results of such studies cannot be extrapolated to situations such as real-time face recognition at night, where a coregistered visible image may be unavailable or heavily degraded (at least without the use of external illumination).

In this paper, we present results of an eye detection algorithm applied to thermal infrared imagery. We compare these results with ground-truth data as well as with results obtained by a visible eye detector, and we observe that although the localization error is larger in the thermal infrared than in the visible, the distance between the detected and the actual eye center location stays within 15% of the eye size in both modalities.

Previous research [4] has shown that face recognition performance with thermal imagery is much more sensitive to eye location errors than its visible counterpart. As reported in that study, correct recognition rates using the PCA algorithm with Mahalanobis angle distance drop significantly when the eye locations are randomly perturbed in a 3x3 pixel window centered at the manually located eye center. The reported performance drop is considerably larger than that suffered when perturbing eye locations and performing recognition with visible imagery. The authors therefore conclude that face recognition from thermal imagery is inherently more sensitive to registration errors. That conclusion is not supported by our study. We use visible and thermal infrared eye detection algorithms to geometrically normalize face images before applying two recognition algorithms, PCA using Mahalanobis angle distance and the Equinox algorithm. We analyze the impact of eye localization errors on recognition performance in both visible and thermal infrared images of varying difficulty, indoors and outdoors. We observe that when the eye detection error increases, recognition performance decreases more abruptly for the weaker PCA algorithm and stays within reasonable bounds for the better performing Equinox algorithm. Contrary to [4], our finding is that visible and thermal infrared performance decrease by approximately the same amount. This may be due to the increased difficulty of our visible image set with respect to the one used in [4].

This research was supported in part by the DARPA Human Identification at a Distance (HID) program, contract # DARPA/AFOSR F49620-01C-0008.

2 The Eye Detection Algorithms

In order to detect eyes in thermal images, we rely on the face location found by the face detection and tracking algorithm in [9, 10]. We then look for the eye locations in the upper half of the face area using a slightly modified version of the object detector provided in the Intel Open Computer Vision Library [11]. Before detection we apply an automatic gain control algorithm to the search area. Although LWIR images are 12-bit, the temperature across a human face spans only a few degrees and is thus represented by at most 6 bits. We improve the contrast in the eye region by mapping the pixels in the area of interest to an 8-bit interval, from 0 to 255.

The detection algorithm is based on the rapid object detection scheme using a boosted cascade of simple feature classifiers introduced in [12] and extended in [13]. The OpenCV version of the algorithm [14] extends the Haar-like features with an efficient set of 45-degree rotated features and uses small decision trees instead of stumps as weak classifiers. Since we know that there is exactly one eye in each of the left and right halves of the face, we force the algorithm to return the best guess regarding its location.
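As an illustrative sketch (not the authors' code), the preprocessing and detection steps just described might look as follows in Python with OpenCV; the cascade file name, the min/max contrast stretch, and the largest-box selection rule are our own assumptions:

```python
import cv2
import numpy as np

def stretch_to_8bit(region_12bit):
    """Map a 12-bit LWIR region of interest onto the full 0-255 range.

    Facial temperatures span only a few degrees, so the useful signal
    occupies roughly 6 of the 12 bits; stretching the local min/max
    restores contrast for the detector.
    """
    lo, hi = float(region_12bit.min()), float(region_12bit.max())
    scaled = (region_12bit.astype(np.float32) - lo) / max(hi - lo, 1.0)
    return (scaled * 255.0).astype(np.uint8)

def best_eye_candidate(cascade, half_face_8bit):
    """Run a boosted cascade and keep a single best guess.

    Each half of the upper face contains exactly one eye, so rather
    than thresholding on confidence we keep the largest returned
    window (this selection rule is an assumption, not from the paper).
    """
    boxes = cascade.detectMultiScale(half_face_8bit,
                                     scaleFactor=1.1, minNeighbors=1)
    if len(boxes) == 0:
        return None
    x, y, w, h = max(boxes, key=lambda b: b[2] * b[3])
    return (x + w // 2, y + h // 2)  # estimated eye center

# Hypothetical usage; 'eye_cascade.xml' stands in for a trained cascade.
cascade = cv2.CascadeClassifier("eye_cascade.xml")
```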

Figure 1 shows an example of a face and eyes detected in a thermal infrared image.

Figure 1: Automatic detection of the face and eyes in a thermal infrared image

The drawback of the algorithm, and of eye detection in the thermal infrared in general, is that it fails to detect the eye center locations for subjects wearing glasses. Glasses are opaque in the thermal infrared spectrum and therefore show up black in thermal images, blocking the view of the eyes (see Figure 2). In these images the glasses can be easily segmented and the eye center location can be inferred from the shape of the lens. Unfortunately, the errors incurred by such inference are rather large. For the experiments reported in this paper, only images of subjects without glasses were used. Proper normalization of thermal images of subjects wearing glasses is an area of active research, and published results on it are forthcoming.

Figure 2: Thermal infrared image of a person wearing glasses

We do not use the OpenCV object detection algorithm for eye detection in visible images. While this method works reasonably well, we can obtain better localization results with the algorithm outlined below, simply because we can take advantage of the clear structure within the eye region and model it explicitly rather than depend on a generic object detector. We search for the center of the pupil of the open eye. The initial search area again relies on the position of the face as returned by a face detector [9]. Within this region, we look for a dark circle surrounded by a lighter background using an operator, similar to the Hough transform, widely used for detection in the iris recognition community [15]:

    \max_{(r, x_0, y_0)} \left| G_\sigma(r) * \frac{\partial}{\partial r} \oint_{r, x_0, y_0} \frac{I(x, y)}{2\pi r} \, ds \right|    (1)

This operator searches over the image domain (x, y) for the maximum in the smoothed partial derivative, with respect to increasing radius r, of the normalized contour integral of I(x, y) along a circular arc ds of radius r and center coordinates (x_0, y_0). The symbol * denotes convolution and G_\sigma(r) is a smoothing function such as a Gaussian of scale \sigma.
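A minimal sketch of this operator, assuming a grayscale numpy image: the discretization below (sampling each circle at fixed angles, smoothing the radial derivative with a 1-D Gaussian) is our own, and a practical implementation would restrict the search to plausible pupil radii and centers.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def circular_mean(img, x0, y0, r, n_angles=64):
    """Normalized contour integral: mean of img on the circle (x0, y0, r)."""
    theta = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    xs = np.clip((x0 + r * np.cos(theta)).astype(int), 0, img.shape[1] - 1)
    ys = np.clip((y0 + r * np.sin(theta)).astype(int), 0, img.shape[0] - 1)
    return img[ys, xs].mean()

def operator_response(img, x0, y0, radii, sigma=2.0):
    """|G_sigma(r) * d/dr of the contour average| from Eq. (1),
    evaluated at one candidate center over a range of radii."""
    means = np.array([circular_mean(img, x0, y0, r) for r in radii])
    deriv = np.gradient(means, radii)               # derivative in r
    return np.abs(gaussian_filter1d(deriv, sigma))  # smoothed magnitude

def find_pupil(img, centers, radii):
    """Maximize the operator over candidate centers and radii."""
    best = (-1.0, None, None)
    for x0, y0 in centers:
        resp = operator_response(img, x0, y0, radii)
        k = int(np.argmax(resp))
        if resp[k] > best[0]:
            best = (float(resp[k]), (x0, y0), radii[k])
    return best[1], best[2]  # (center, radius) of the strongest edge
```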

3 Experimental Results and Discussion

In order to validate our thermal eye detection algorithm, we performed two types of experiments. First, we compared the detected eye locations to those marked manually on a set of images. Then we used the detected eye locations to geometrically normalize the face images before applying two face recognition algorithms: PCA with Mahalanobis angle distance and the Equinox algorithm. We compared recognition performance in the visible and thermal infrared using eye detection results from both domains.
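Geometric normalization from two eye centers is typically a similarity transform (rotation, scale, translation) that maps the detected eyes onto fixed template coordinates. The sketch below illustrates the step; the template eye positions and output chip size are assumed for illustration and are not taken from the paper.

```python
import cv2
import numpy as np

# Assumed template eye coordinates in a 64x80 output chip (illustrative).
TPL_LEFT = np.array([18.0, 24.0])
TPL_RIGHT = np.array([46.0, 24.0])
CHIP_W, CHIP_H = 64, 80

def similarity_from_eyes(left_eye, right_eye):
    """Closed-form 2x3 similarity transform sending the detected eye
    centers onto the template positions."""
    src = np.asarray(right_eye, float) - np.asarray(left_eye, float)
    dst = TPL_RIGHT - TPL_LEFT
    scale = np.hypot(*dst) / np.hypot(*src)
    angle = np.arctan2(dst[1], dst[0]) - np.arctan2(src[1], src[0])
    c, s = scale * np.cos(angle), scale * np.sin(angle)
    m = np.array([[c, -s, 0.0], [s, c, 0.0]])
    m[:, 2] = TPL_LEFT - m[:, :2] @ np.asarray(left_eye, float)
    return m

def normalize_face(image, left_eye, right_eye):
    """Warp the image so both eyes land on the template coordinates."""
    m = similarity_from_eyes(left_eye, right_eye)
    return cv2.warpAffine(image, m, (CHIP_W, CHIP_H))
```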

3.1 Comparison to Ground Truth Data

For this experiment we used 3732 images of 207 subjects not wearing glasses, collected during several indoor sessions. We used the FBI mugshot standard light arrangement. The subjects were volunteers, none of whom was visibly agitated or perspiring (on their face, at least) during imaging. We used an uncooled sensor capable of acquiring coregistered visible and longwave thermal infrared (LWIR) video. The format consists of 240 × 320 pixel image pairs, coregistered to within 1/3 pixel, where the visible image (from a Pulnix 6701 camera, sensitive to approximately 0.9µm) has 8 bits of grayscale resolution and the LWIR has 12 bits. The LWIR microbolometer (Indigo Merlin) is sensitive through the range 8µm-12µm, with a noise-equivalent differential temperature (NEdT) of 100mK. Thermal images were radiometrically calibrated in order to compensate for non-uniformities in the microbolometer array.
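The paper does not describe the radiometric calibration procedure itself; the sketch below shows a standard two-point non-uniformity correction of the kind commonly used with microbolometer arrays. The two-blackbody setup and all names are our assumptions.

```python
import numpy as np

def two_point_nuc(cold_frame, hot_frame, t_cold, t_hot):
    """Per-pixel gain/offset maps from two uniform blackbody references.

    Each pixel's raw response is modeled as affine in scene temperature;
    solving the two-point system gives maps that flatten the array's
    fixed-pattern non-uniformity.
    """
    diff = cold_frame.astype(float), hot_frame.astype(float)
    gain = (t_hot - t_cold) / np.maximum(diff[1] - diff[0], 1e-6)
    offset = t_cold - gain * diff[0]
    return gain, offset

def apply_nuc(raw_frame, gain, offset):
    """Correct a raw 12-bit frame to approximate temperature units."""
    return gain * raw_frame + offset
```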

Figure 3 shows sample images from this set.

Figure 3: Sample images used for ground truth data comparison

For each coregistered image pair, the locations of the left and right pupils were semi-automatically located by a human operator. These locations were used as ground truth. After detecting the faces and eyes in both the visible and thermal infrared images, we removed the outliers from our set. These are images where the detected eye coordinates were at least 10 pixels away from the ground-truth location. Since the images are geometrically normalized using the detected eye locations, outliers can easily be detected by passing the normalized image through a face/non-face classifier. Also note that the face detector yields expected eye locations, which can be used to validate the feasibility of finer eye localization. Outliers amounted to 436 images (12%) in the visible domain and 79 images (2%) in the LWIR domain.

Table 1 shows the mean absolute error and the standard deviation of the error in the x and y coordinates for the left and right eye for detection in the visible domain, while Table 2 shows the equivalent quantities for the LWIR domain. While the number of outliers is much larger in the visible than in the LWIR, the means and standard deviations of the visible errors stay below 1 pixel. (Removing the outliers obviously reduces the standard deviation, so to some extent the lower variance is due to the large number of outliers removed.) The means of the absolute LWIR errors go up to 2.8 pixels, a 4.7 times increase over visible, and the standard deviations go up to 1.75, a 1.77 times increase over visible. Keep in mind, though, that at the resolution of our images the average eye is 20 pixels wide by 15 pixels high. So although the error increase from visible to LWIR is large, the LWIR values still stay within 15% of the eye size, quite a reasonable bound. We will see below how this error increase affects recognition performance.

(pixels)             left x     left y     right x    right y
Mean                 0.56735    0.55006    0.59527    0.56735
Standard deviation   1.1017     0.83537    1.1364     0.83601

Table 1: Means and standard deviations of visible eye detection errors

(pixels)             left x     left y     right x    right y
Mean                 1.9477     1.5738     2.8054     1.5338
Standard deviation   2.0254     1.6789     2.0702     1.6821

Table 2: Means and standard deviations of IR eye detection errors
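As an illustration of the outlier rule and of how the statistics in Tables 1 and 2 can be computed, assuming detected and ground-truth eye centers stored as (x, y) arrays (function and variable names are ours):

```python
import numpy as np

def eye_error_stats(detected, truth, outlier_px=10.0):
    """Per-coordinate error statistics for one eye across a set of images.

    detected, truth: float arrays of shape (n, 2) holding (x, y) centers.
    Detections at least `outlier_px` from ground truth (Euclidean
    distance) are excluded as outliers, as described in the text.
    """
    detected = np.asarray(detected, dtype=float)
    truth = np.asarray(truth, dtype=float)
    keep = np.linalg.norm(detected - truth, axis=1) < outlier_px
    err = detected[keep] - truth[keep]
    mae = np.abs(err).mean(axis=0)       # mean absolute error in x and y
    std = err.std(axis=0)                # standard deviation of the error
    return mae, std, int((~keep).sum())  # statistics plus outlier count
```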


3.2 Face Recognition Performance

The imagery used for this experiment was collected during eight separate day-long sessions spanning a two-week period. A total of 385 subjects participated in the collection. Four of the sessions were held indoors in a room with no windows and carefully controlled illumination. Subjects were imaged against a plain background some seven feet from the cameras, and illuminated by a combination of overhead fluorescent lighting and two photographic lights with umbrella-type diffusers positioned symmetrically on both sides of the cameras and about six feet up from the floor. Three of the four indoor sessions were held in different rooms.

The remaining four sessions were held outdoors at two different locations. During the four outdoor sessions, the weather included sun, partial clouds and moderate rain. All illumination was natural; no lights or reflectors were added. Subjects were always shaded by the side of a building, but were imaged against an unconstrained natural background which included moving vehicles, trees and pedestrians. Even during periods of rain, subjects were imaged outside and uncovered, in an earnest attempt to simulate true operational conditions.

For all sessions, subjects were cooperative, standing about seven feet from the cameras and looking directly at them when so requested. In half of the sessions (both indoors and outdoors), subjects were asked to speak while being imaged, in order to introduce some variation in facial expression into the data. For each subject and session, a four-second video clip was collected at ten frames per second in two simultaneous imaging modalities. We used the same sensor as in the previous experiment, so the image format again consists of 240×320 pixel image pairs, coregistered to within 1/3 pixel, where the visible image has 8 bits of grayscale resolution and the LWIR has 12 bits. Thermal images were radiometrically calibrated. Example visible images can be seen in Figure 4.

Figure 4: Example visible images of a subject from indoor and outdoor sessions

For each individual, the earliest available video sequence in each modality is used for gallery images, and all subsequent sequences from later sessions are used for probe images. Images of subjects wearing glasses were removed from both the gallery and the probe sets. The training set for all algorithms was completely disjoint from gallery and probe images in time, space and subjects; that is, the training set was collected at an earlier time, in a different location, and used a disjoint set of subjects. This ensures that the results reported below are indicative of real-world performance.

Since the data collection involved video data in both modalities, we evaluated recognition performance using 40-frame video sequences as input. The distance from a probe sequence to an individual in the gallery was defined to be the smallest distance between any frame in the sequence and any image of that individual in the gallery. Classification was based on nearest neighbors with respect to this distance.
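The matching rule just described reduces to a minimum over all frame/gallery-image pairs, followed by nearest-neighbor classification over gallery subjects. A minimal sketch, assuming a per-image distance function d supplied by the recognition algorithm:

```python
def sequence_distance(probe_frames, gallery_images, d):
    """Distance from a probe video sequence to one gallery individual:
    the smallest d(frame, image) over all frame/image pairs."""
    return min(d(f, g) for f in probe_frames for g in gallery_images)

def classify_sequence(probe_frames, gallery, d):
    """Nearest-neighbor label over gallery subjects.

    gallery: dict mapping subject id -> list of enrolled images.
    Frames whose faces were badly normalized produce large distances
    and are naturally ignored by the min, as noted in the text.
    """
    return min(gallery,
               key=lambda s: sequence_distance(probe_frames, gallery[s], d))
```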

We divided our test data into three categories: an indoor gallery with an indoor probe set consisting of 190 subjects, an outdoor gallery with an outdoor probe set consisting of 151 subjects, and an indoor gallery with an outdoor probe set consisting of 157 subjects. Eye detection for the gallery images was performed in the visible domain, a likely scenario for an access control system where users are enrolled under good visibility conditions. Eye detection for the probe images was performed in the visible as well as in the LWIR domain. For this experiment we did not remove the outliers after eye detection as described in the previous section. Outliers result in face images that are incorrectly normalized, so their distance to all gallery individuals is large. Since the distance to a gallery individual is the smallest distance between any frame in a 40-frame sequence and that individual, frames with incorrect eye locations are far from the top matches.

We performed recognition experiments on our three data sets using the PCA algorithm with Mahalanobis angle distance and the Equinox algorithm, in the visible and LWIR domains, using eye detection results from the visible and LWIR domains. For completeness, we also recorded the performance obtained in the visible domain using eyes detected in the LWIR domain. Although this is not a realistic scenario (if visible imagery is available we might as well detect the eyes there), the results show the sensitivity of visible imagery to errors in eye location. Top-match recognition performances are shown in Tables 3, 4, 5 and 6. Recognition performance with LWIR eye locations is followed in parentheses by the percentage of the corresponding performance with visible eye locations that it represents.

PCA performs very poorly on difficult data sets (outdoor probes), and the performance decreases even more when the eyes are detected in LWIR. The decrease in performance is about the same in both modalities (performance with LWIR eye locations is about 70% of the performance with visible eye locations). This is in contrast with the observation in [4], but is probably due to the difficulty of the data set as well as a lower error in the eye center location. The Equinox algorithm performs much better than PCA in general. The decrease in performance when using LWIR eyes is not as steep as in the case of PCA (about 90% of the visible-eyes performance in both modalities).

Gallery/Probe      Visible %   LWIR %
indoor/indoor      89          72
outdoor/outdoor    38          73
indoor/outdoor     29          54

Table 3: Performance of the PCA algorithm with eyes detected in the visible domain

Gallery/Probe      Visible %   LWIR %
indoor/indoor      69 (77)     48 (67)
outdoor/outdoor    34 (87)     66 (90)
indoor/outdoor     14 (49)     34 (63)

Table 4: Performance of the PCA algorithm with eyes detected in the LWIR domain. In parentheses: percentage of the corresponding performance with eyes detected in the visible domain

Gallery/Probe      Visible %   LWIR %
indoor/indoor      99          96
outdoor/outdoor    89          96
indoor/outdoor     88          90

Table 5: Performance of the Equinox algorithm with eyes detected in the visible domain

Gallery/Probe      Visible %   LWIR %
indoor/indoor      96 (96)     88 (92)
outdoor/outdoor    83 (93)     93 (97)
indoor/outdoor     73 (83)     78 (87)

Table 6: Performance of the Equinox algorithm with eyes detected in the LWIR domain. In parentheses: percentage of the corresponding performance with eyes detected in the visible domain

4 Conclusion

Thermal infrared face recognition algorithms have so far relied on eye center locations that were detected either manually or automatically in a coregistered visible image. In an attempt to solve the problem of real-time night-time face recognition, we performed eye detection on thermal infrared images of faces and used the detected eye center locations to geometrically normalize the images prior to applying face recognition algorithms.

We presented the results of applying a generic object detector to the problem of eye detection in thermal infrared images of faces. As expected, the problem is more difficult than its visible counterpart. The experiments we performed on over 3000 images with ground-truth data available show that the error in eye center location is much higher in the thermal infrared than in the visible domain, but it still stays within 15% of the size of the eye.

We analyzed the impact of eye locations detected in the visible and thermal infrared domains on two face recognition algorithms: PCA with Mahalanobis angle distance and the Equinox algorithm. Our test data consisted of 385 subjects divided into three gallery/probe set pairs: indoor/indoor, outdoor/outdoor and indoor/outdoor. We observed that while recognition performance drops for both algorithms, the drop is more significant for the already poorly performing PCA algorithm. For the Equinox algorithm, performance drops only about 10% when the eyes are detected in LWIR, and night-time thermal-infrared-only recognition performance stays comparable to day-time visible performance. Notably, our experiments show that the decay in performance due to poor eye localization is comparable across modalities. Based on our results we believe that, using the right algorithm, thermal infrared face recognition is a viable biometric not only when visible light is available, but also in the dark.



References

[1] D. A. Socolinsky and A. Selinger, "A comparative analysis of face recognition performance with visible and thermal infrared imagery," in Proceedings ICPR, Quebec, Canada, August 2002.
[2] D. A. Socolinsky and A. Selinger, "Face recognition with visible and thermal infrared imagery," Computer Vision and Image Understanding, July-August 2003.
[3] J. Wilder, P. J. Phillips, C. Jiang, and S. Wiener, "Comparison of visible and infra-red imagery for face recognition," in Proceedings of the 2nd International Conference on Automatic Face & Gesture Recognition, Killington, VT, 1996, pp. 182-187.
[4] X. Chen, P. Flynn, and K. Bowyer, "Visible-light and infrared face recognition," in Proceedings of the Workshop on Multimodal User Authentication, Santa Barbara, CA, December 2003, to appear.
[5] B. Abidi, "Performance comparison of visual and thermal signatures for face recognition," in The Biometrics Consortium Conference, Arlington, VA, September 2003.
[6] D. Socolinsky, L. Wolff, J. Neuheisel, and C. Eveland, "Illumination invariant face recognition using thermal infrared imagery," in Proceedings CVPR, Kauai, HI, December 2001.
[7] X. Chen, P. Flynn, and K. Bowyer, "PCA-based face recognition in infrared imagery: Baseline and comparative studies," in International Workshop on Analysis and Modeling of Faces and Gestures, Nice, France, October 2003.
[8] D. Socolinsky and A. Selinger, "Thermal face recognition in an operational scenario," in Proceedings CVPR, Washington, DC, June 2004, to appear.
[9] C. Eveland, Utilizing Visible and Thermal Infrared Video for the Fast Detection and Tracking of Faces, Ph.D. thesis, University of Rochester, 2003.
[10] D. A. Socolinsky, J. D. Neuheisel, C. E. Priebe, D. Marchette, and J. G. DeVinney, "A boosted CCCD classifier for fast face detection," Computing Science and Statistics, vol. 35, 2003.
[11] "Open Computer Vision Library," http://sourceforge.net/projects/opencvlibrary/.
[12] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," in Proceedings of IEEE CVPR, Kauai, HI, December 2001.
[13] R. Lienhart and J. Maydt, "An extended set of Haar-like features for rapid object detection," in Proceedings ICIP 2002, 2002, vol. 1, pp. 900-903.
[14] R. Lienhart, A. Kuranov, and V. Pisarevsky, "Empirical analysis of detection cascades of boosted classifiers for rapid object detection," Tech. Rep., Microprocessor Research Lab, Intel Labs, Intel Corporation, 2002.
[15] J. Daugman, "How iris recognition works," IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, no. 1, January 2004.
