Integrating Face Recognition into Security Systems

Volker Vetter1, Thomas Zielke1, and Werner von Seelen2

1 C-VIS Computer Vision und Automation GmbH, D-44799 Bochum, Germany
2 Institut für Neuroinformatik, Ruhr-Universität, D-44780 Bochum, Germany
[email protected]
Abstract. Automated processing of facial images has become a serious market for both hard- and software products. For the commercial success of face recognition systems it is most crucial that the process of face image capturing is very convenient for the people exposed to such systems. As a consequence, the whole imaging setup has to be carefully designed for each operational scenario. The paper mainly deals with the problem of face image capturing in real security applications. In this context, we use the term face recognition in a broader sense, including the important functionality of face spotting followed by face validation. The latter provides the front-end of video-based automated person authentication. We describe two examples for a successful integration of face recognition in a security system. One is a system for recognizing authorization to use a vehicle and the other one is an automatic "assistant" for a security desk officer. Both applications require techniques for face detection and face segmentation. Key problems are the camera setup and the size of the surveillance area. We propose an approach using a tracking camera with a high-speed pan-tilt unit.
1 Introduction

Convenience of operation is a primary prerequisite for the acceptance of a face recognition system as an access control device. The required user cooperation should be minimal, i.e. it should be limited to looking into a camera for a short period of time. The camera mounting position should therefore be in view of a person requesting access. A good camera position will not force the user to behave in an unnatural way. The observed area should be adapted to the environment. Using a fixed camera often leads to a small observation area because of the image resolution required for recognition. When integrating a face recognition system into a real-world application, the issues of face detection and face segmentation become as important as the recognition process itself. In section 3 we will present our approach to face candidate segmentation. It combines low-level model-based face candidate generation with a neural-net-based final check of the candidate regions. The methods were selected under the conditions of an unconstrained environment and real-time operation on limited hardware resources. The use of different image features supporting each other should improve robustness. Section 4 presents a face recognition system integrated into a car for driver access control. In section 5 a surveillance application at a security counter is described.

Fig. 1. Access control scenarios: (a) driver's face monitoring, (b) PIN/video input console, (c) service/reception counter surveillance
2 Integrating a Face Recognition System for Hands-Free Authorization Checks

For user comfort, security systems have to avoid as much user interaction as possible. Fig. 1 shows different access control scenarios. Inside a car the environment is quite constrained due to the fixed position of the driver's seat (Fig. 1 (a)). This restricts the region where the driver's head can be found. The driver usually has to cooperate when unlocking the system. During the trip he will be looking straight out of the front window, look into the mirrors, and monitor the panel instruments. A camera position in the vicinity of one of these points makes it possible to obtain frontal face images during vehicle operation. However, suitable moments for facial snapshots have to be selected by the recognition system. A PIN/video console is found as part of some existing security systems at a protected entrance or an automated teller machine (Fig. 1 (b)). These systems require a high degree of user interaction: at least a key card and/or a PIN code has to be entered. During these actions the user's position is approximately predefined, as the user has to look at the device. It seems reasonable to use that moment for face image acquisition. Such a setup usually tries to control the image capturing process by providing visual user feedback through a semi-transparent mirror. In the case of a simple image capturing device there is no other way to obtain high-quality facial images. In Fig. 1 (c) a surveillance situation at a clearance counter is depicted. To avoid unwanted attention, a face recognition system has to operate autonomously, without requiring any interaction by the monitored persons, in this operational environment. Usually the monitored person will look at the security officer during service. This situation is difficult to cope with: the operating environment is quite unconstrained, and the position variance of possible face locations is high. It is the task of the face recognition system to choose the right moment for face image capturing. In general, a fixed camera will not be able to obtain images with sufficient resolution while covering the complete surveillance area. We propose a tracking camera to overcome this difficulty.

Fig. 2. Face candidate segmentation: (a) scene from driver access control system, (b) Hough space after transformation with multi-scale facial template, (c) difference image from edge-filtered sequence, (d) finally segmented face.
3 Face Candidate Segmentation

Most face recognition algorithms show best performance when dealing with centered, frontal face images, normalized in size and illumination, with a uniform background [1]. The task of face candidate segmentation is to process real-world video images with respect to these conditions. Starting with image acquisition (see Fig. 2 (a)), the face candidate segmentation first has to decide whether a face is present in the digitized image or not. We solve this by three low-level processes looking in parallel for form, motion, and color properties of faces. Of course, none of these properties guarantees finding a face under all circumstances, but they are unlikely to fail all together. From the low-level processes candidate regions are assigned and ranked. Overlapping regions are merged if the overlap is more than 50%. The 2D position and size measurements are tracked by a steady-state Kalman filter. High-ranking regions which can be tracked over some time are checked by a neural net for the presence of a face. The segmented window with the highest neural net score, which must exceed a given threshold, is considered to contain the current face in front of the camera. This candidate window (Fig. 2 (d)) is passed to the face recognition module. A sketch of this selection logic follows.
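The selection logic can be summarized in a short sketch. The Candidate record, the detector list, the tracker, and net_score are hypothetical interfaces standing in for the modules described above; the 50% merge rule and the score threshold follow the text, while min_age and the accept value are illustrative assumptions:

    from dataclasses import dataclass

    @dataclass
    class Candidate:
        x: int
        y: int
        w: int
        h: int
        rank: float   # combined score from the low-level cues
        age: int = 0  # number of frames the region has been tracked

    def overlap(a, b):
        """Intersection area relative to the smaller of the two regions."""
        ix = max(0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
        iy = max(0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
        return ix * iy / min(a.w * a.h, b.w * b.h)

    def merge(candidates):
        """Merge candidate regions that overlap by more than 50%."""
        merged = []
        for c in sorted(candidates, key=lambda c: c.rank, reverse=True):
            hit = next((m for m in merged if overlap(c, m) > 0.5), None)
            if hit is not None:
                hit.rank += c.rank   # a second cue supports this region
            else:
                merged.append(c)
        return merged

    def select_face(frame, detectors, tracker, net_score,
                    min_age=5, accept=0.5):
        """One iteration of the candidate segmentation loop."""
        candidates = [c for detect in detectors for c in detect(frame)]
        tracked = tracker.update(merge(candidates))  # steady-state Kalman filter
        scored = [(net_score(frame, c), c) for c in tracked if c.age >= min_age]
        if not scored:
            return None
        best_score, best = max(scored, key=lambda s: s[0])
        return best if best_score > accept else None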
3.1 Model-Based Face Candidate Segmentation

The basic idea of model-based face candidate segmentation is to convolve predefined models with images containing faces. At matching face candidate positions the convolution result differs from that in the neighborhood. We limit our focus to algorithms which are able to work on non-uniform backgrounds. This group of algorithms often uses multi-scale template matching approaches [7, 13], sometimes supported by neural networks [10], and Hough-based techniques [6]. We have chosen a Hough-based technique because of its inherent property of statistical template matching at low computational cost. Although a multidimensional template is used, the search space has only dimension two. The acquired image is first appropriately scaled in size. Then a general Hough transformation with a facial template is performed (see Fig. 2 (b)). Local maxima in the Hough space are identified and a first candidate list is created. Since maxima in Hough space are ambiguous, a back-transformation is required to verify candidate positions. A critical issue in the Hough transformation is, of course, template scale variance. Candidate positions are most reliable when the variance of face sizes in the image is small, so that the template size variance can be kept small as well; small here means about 50% of a mean face size. Size estimates within this range are possible, for example, if persons are monitored at an approximately known distance. Our model-based face candidate segmentation will fail in cases of very poor face-background contrast and complex backgrounds containing shapes similar in form and size to our face model.
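The translation-only case of such a Hough accumulation can be written compactly as a correlation of an edge image with a face-outline template. The following is a minimal single-scale sketch; the elliptical contour, the template size, and the peak threshold are illustrative assumptions and stand in for the authors' multi-scale facial template:

    import numpy as np
    from scipy.signal import fftconvolve
    from scipy.ndimage import maximum_filter, sobel

    def ellipse_template(w, h):
        """Binary face-outline template: a thin elliptical contour."""
        yy, xx = np.mgrid[0:h, 0:w]
        r = ((xx - w / 2) / (w / 2)) ** 2 + ((yy - h / 2) / (h / 2)) ** 2
        return (np.abs(r - 1.0) < 0.1).astype(float)

    def hough_face_candidates(gray, face_w=50, face_h=65, n_max=5):
        """Accumulate edge evidence for a head-sized ellipse at every
        image position.  For a fixed template this generalized Hough
        transform reduces to correlating the edge image with the
        contour template (the template is symmetric, so convolution
        and correlation coincide)."""
        gray = np.asarray(gray, dtype=float)
        edges = np.hypot(sobel(gray, 0), sobel(gray, 1))
        acc = fftconvolve(edges, ellipse_template(face_w, face_h), mode="same")
        # local maxima in the 2-D accumulator ("Hough space")
        peaks = ((acc == maximum_filter(acc, size=face_w // 2)) &
                 (acc > acc.mean() + 2 * acc.std()))
        ys, xs = np.nonzero(peaks)
        order = np.argsort(acc[ys, xs])[::-1][:n_max]
        return [(int(xs[i]), int(ys[i]), float(acc[ys[i], xs[i]]))
                for i in order]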
3.2 Color-Based Face Candidate Segmentation

As a second face estimate we use color segmentation. If available, skin color is a strong feature for the presence of a face [2, 3]. Even allowing for varying skin colors does not lead to an unacceptable amount of misclassifications on complex backgrounds. Our algorithm is derived from the chroma keying technique. Each pixel of a color image in YCrCb space is classified as being of flesh tone color. A split-and-merge algorithm identifies connected regions and again a candidate list is created. The quality measure is derived from the color match and the shape of the region. In the chroma keying technique the Cr and Cb components of the video signal are used to specify a key color range. The Y component gives an additional limit. The color ranges permissible for faces have been empirically derived from sample images. Color-based face spotting works best with high-quality 3-chip CCD cameras. Some low-cost color cameras place flesh tone colors close to the origin of the Cr-Cb plane. By carefully selecting the camera type and the video digitizer, sufficiently good results have been achieved even with certain low-cost multimedia components. Color-based face segmentation is best used as support for other segmentation techniques.
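A minimal sketch of such a classifier follows. The numeric YCrCb bounds are commonly used skin-tone limits and merely stand in for the paper's empirically derived ranges, and an OpenCV connected-components pass stands in for the split-and-merge step:

    import numpy as np
    import cv2

    # Illustrative flesh-tone bounds in YCrCb; assumptions, not the
    # authors' calibration.
    Y_MIN, Y_MAX = 40, 250    # luminance limit rejects very dark/overexposed pixels
    CR_MIN, CR_MAX = 135, 175
    CB_MIN, CB_MAX = 80, 130

    def skin_mask(bgr):
        """Classify each pixel as flesh tone in YCrCb space (cf. chroma keying)."""
        y, cr, cb = cv2.split(cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb))
        mask = ((y >= Y_MIN) & (y <= Y_MAX) &
                (cr >= CR_MIN) & (cr <= CR_MAX) &
                (cb >= CB_MIN) & (cb <= CB_MAX))
        return mask.astype(np.uint8) * 255

    def skin_regions(bgr, min_area=400):
        """Connected flesh-tone regions as face candidates, ranked by
        area and by how close the aspect ratio is to a typical face."""
        n, labels, stats, _ = cv2.connectedComponentsWithStats(skin_mask(bgr))
        cands = []
        for i in range(1, n):                  # label 0 is the background
            x, y, w, h, area = stats[i]
            if area >= min_area:
                shape_score = max(1.0 - abs(w / h - 0.75), 0.1)
                cands.append((x, y, w, h, area * shape_score))
        return sorted(cands, key=lambda c: c[4], reverse=True)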
3.3 Motion-Based Face Candidate Segmentation

Although not very reliable on its own, motion is a good cue for figure-ground segmentation. Where a complex background may mislead form- and color-based face segmentation, motion is a good indicator for "living" persons. Motion detection requires image sequence processing. In a real-time system this limits the computational complexity of the algorithms that can be considered for implementation. Motion may be detected by optical flow computation, or by evaluating the image difference between subsequent images or between a current image and an accumulated background image. All methods are subject to distortions by illumination changes and non-stationary background. Our method is similar to that in [9]. We use differences of subsequent edge-filtered images (see Fig. 2 (c)) which are evaluated by a lateral histogram technique; a sketch follows below. The results are stabilized by considering only regions which can be tracked over time. The computational cost of the process is low, so it is also used to control the regions of interest for the form- and color-based segmentation modules.

Fig. 3. (a) Driver requesting access, (b) dashboard with user display, and (c) IR-spot, camera and user console
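A minimal sketch of the edge-difference/lateral-histogram evaluation, assuming grayscale frames and an illustrative threshold; the paper additionally tracks several regions over time, which is omitted here:

    import numpy as np
    from scipy.ndimage import sobel

    def motion_region(prev_gray, cur_gray, thresh=None):
        """Difference of edge-filtered frames, evaluated by lateral
        (row/column) histograms.  Returns one bounding box of the
        moving region, or None; a simplification of the multi-region
        tracking described in Sect. 3.3."""
        def edges(img):
            img = np.asarray(img, dtype=float)
            return np.hypot(sobel(img, 0), sobel(img, 1))
        diff = np.abs(edges(cur_gray) - edges(prev_gray))
        if thresh is None:
            thresh = diff.mean() + 2 * diff.std()
        active = diff > thresh
        # lateral histograms: project the active pixels onto both axes
        col_hist = active.sum(axis=0)
        row_hist = active.sum(axis=1)
        xs = np.nonzero(col_hist > col_hist.mean())[0]
        ys = np.nonzero(row_hist > row_hist.mean())[0]
        if xs.size == 0 or ys.size == 0:
            return None
        return (int(xs.min()), int(ys.min()),
                int(xs.max() - xs.min()), int(ys.max() - ys.min()))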
4 Driver Access Control

Due to an increasing rate of car thefts, electronic motor locks have become a standard feature of new cars. The weakness of these systems is the transferable key or chip which is required to start the engine. In order to assure the highest security even in case of attempted "carnapping", biometric identification can be used instead of, or in addition to, a conventional authorization check. Face recognition seems to be the only method that allows "authorization monitoring", i.e. repeated identification even during driving.
4.1 System Outline

We have combined our face recognition system with an electronic motor lock and integrated the complete system into a test car [12]. A driver requesting access has to look for a short moment straight at a camera in the dashboard. The identification process is automatically started when a person is detected in the camera's surveillance area. On-line recognition is performed on the basis of a pretrained driver database. The training process can only be started with a chip-card as a special ID. It is possible to enhance the data set of persons already known to the system. Thereby the acceptance level of identification can be increased and individual variation in appearance (like drastic changes of facial hair or make-up) can be dealt with. The car is equipped with a graphical cockpit display. It is used for the required dialogs in training mode. The interactive dialogs use an easy-to-use two-button interface for all settings.
4.2 Car Integration

Fig. 3 shows the installation of the face recognition system in the test car. Due to the constrained environment inside the car, monochrome form- and motion-based face segmentation is sufficient. This makes it possible to use a monochrome camera combined with an infrared spot. In this way, operation under difficult light conditions is possible without dazzling the driver. Direct sunlight exposure of the driver's face is still a problem. Some effort was required to find a good position for the camera and the IR-spot. Drivers of different sizes in multiple seat settings need to be covered. In addition, the window regions seen by the camera should be as small as possible to avoid sunlight interference. Finally, some free space at the mounting position should facilitate a neat integration without obstructing the driver's field of view. We chose a position behind the gearshift, which allows an unobstructed view of the driver's face at the required resolution. The covered window area in the image is rather small. The image processing hardware consists of two digital signal processors (DSPs) of type TMS320C40 on a TIM-40 motherboard. One DSP module contains an on-board video digitizer and a video output connected to the cockpit display. This module carries out the image acquisition, the motion-based face candidate segmentation, and parts of the recognition/training process. The second DSP board carries out in parallel the form-based face candidate segmentation and some preprocessing steps for the recognition process. The system is connected to the car communication bus (i.e. A-BUS). An exchange of information with other car devices is done via this interface: the door and engine status is requested, and the cockpit display is controlled. In the future, the system may also distribute information about the current driver to adjust seat and mirror positions.
5 Security Counter Application

As an example of an application where the environment is quite unconstrained, we describe a system which has been developed for border control stations. The task is to automatically capture facial images of persons standing at a counter, in the pose they assume during security questioning. The situation seen through the camera's lens is depicted in Fig. 4 (a).
[Fig. 4 (b) annotations: camera field of view, camera window, observation box, operational field for face spotting (approx. volume: 1 m width, 1 m height, 0.7 m depth), head of passenger being served, head of officer, queue of passengers, desk]
Fig. 4. Scenario of the security counter application: (a) scene and (b) geometric analysis

A closer look at the scenario (Fig. 4 (b)) shows that the variation of possible head positions is much larger in comparison to the driver access control system. In other words: the surveillance area is larger. Fig. 4 (b) actually is a geometric analysis of what can be achieved with a conventional surveillance camera on a fixed wall mount. Even with the assumption that the person being served stands close to the desk, the area within the field of view of the camera becomes quite small if a face resolution of 18 dpi is considered a lower bound. To obtain facial images of adequate resolution, we can either increase the camera resolution with non-standard equipment or use a tracking camera system. Due to previous experience with active vision systems [5, 11] and the advantage of using standard video hardware, we have chosen the second approach.
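A rough plausibility check illustrates the constraint; the sensor format is our assumption (a standard PAL camera of 768 x 576 pixels), not a figure from the installation. At the 18 dpi lower bound,

\[ \frac{768\ \mathrm{px}}{18\ \mathrm{px/in}} \approx 42.7\ \mathrm{in} \approx 1.08\ \mathrm{m}, \qquad \frac{576\ \mathrm{px}}{18\ \mathrm{px/in}} \approx 0.81\ \mathrm{m}, \]

so the camera window covers at most about 1.1 m x 0.8 m, i.e. barely the operational field itself, leaving no margin for head position variance or for persons standing further from the desk.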
5.1 Bifocal Camera Head

To fulfill the requirements of the application, a novel bifocal camera head has been developed. A fixed survey camera with a wide-angle lens is used to find regions of attention in the whole observed area. By means of a fast pan-tilt unit (PTU), a portrait camera with a telephoto lens is pointed at these regions to obtain high-resolution facial images. The effectiveness of this approach has recently been confirmed in [4]. The complete head is displayed in Fig. 5 (a). The distance between both cameras is small relative to the distance between the camera head and the observed persons. Under the assumption of a common projection center, this allows the guiding angle for the PTU to be derived without a three-dimensional model. Due to mechanical restrictions the assumption is only an approximation, of course. The guidance error is smaller than 2.5° at a distance of three meters. This is small enough to bring the region of interest into the field of view of the portrait camera. Once the head of a person is "seen" by the portrait camera, closed-loop tracking of the face from images of the portrait camera itself leads to higher-precision movements. The tracked face is kept in the center of the portrait camera by driving the error signal (distance of the face centroid from the image center) to zero.
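Under the common-projection-center approximation the guiding angles follow from the pinhole model alone. A minimal sketch; the focal length in pixels is a hypothetical calibration value:

    import math

    def ptu_angles(u, v, image_w, image_h, focal_px):
        """Pan/tilt angles that point the portrait camera at pixel
        (u, v) of the survey image, assuming a common projection
        center.  focal_px is the survey camera's focal length in
        pixels (a hypothetical calibration value)."""
        dx = u - image_w / 2.0        # horizontal offset from optical axis
        dy = v - image_h / 2.0        # vertical offset
        pan = math.degrees(math.atan2(dx, focal_px))
        tilt = math.degrees(math.atan2(-dy, focal_px))  # image y grows downward
        return pan, tilt

    # Example: a face 200 px right of center in a 768x576 survey image
    # with an assumed 400 px focal length -> pan by about 26.6 degrees.
    print(ptu_angles(584, 288, 768, 576, 400.0))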
Fig. 5. (a) Bifocal camera head, (b) FaceCheck user interface
[Fig. 6 block diagram, components: survey camera -> Vision S (context analysis, face/head/body) -> Geometric Transf. S; portrait camera -> Vision P -> Geometric Transf. P, plus face recognition; the position signals and the desired position enter a fusion stage whose output drives the controller and the PTU carrying the portrait camera.]
Fig. 6. Block diagram for active face tracking

5.2 Active Head Tracking

Fig. 6 shows the control scheme of the active tracking system. The system works in two modes, the survey mode and the portrait tracking mode:
- In survey mode the portrait camera is controlled in open loop by signals derived from the fixed survey camera.
- In portrait tracking mode the portrait camera is controlled by closed-loop feedback of its own video signal.
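The mode decision described in the following paragraphs might be sketched as below; the ModuleState record and its fields are hypothetical stand-ins for the state information exchanged in Fig. 6, not the system's actual data structures:

    from dataclasses import dataclass
    from typing import Optional, Tuple

    @dataclass
    class ModuleState:
        """Hypothetical state record standing in for a module's output."""
        face_present: bool = False
        face_position: Optional[Tuple[int, int]] = None
        other_person_position: Optional[Tuple[int, int]] = None
        done: bool = False  # recognition finished for the tracked person

    def select_mode(survey: ModuleState, portrait: ModuleState,
                    recognition: ModuleState):
        """Pick the signal that drives the pan-tilt unit."""
        if portrait.face_present and not recognition.done:
            # closed loop: track the face in the portrait camera's own image
            return "portrait_tracking", portrait.face_position
        if recognition.done and survey.other_person_position is not None:
            # person recognized: switch attention to another person in view
            return "survey", survey.other_person_position
        if survey.face_present:
            # open loop: point the portrait camera from survey-camera signals
            return "survey", survey.face_position
        return "idle", None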
Mode selection is performed by a data fusion module which judges state information from the survey vision module, the portrait vision module, and the face recognition module. The state information of the vision modules contains data about the presence and position of a face in view of the modules. The recognition module determines the duration of the closed-loop tracking process. If a person has been actively tracked and recognized, the attention of the portrait camera should be switched. Both vision modules run in parallel all the time, so even if portrait tracking mode is active, the complete scene in view of the survey
camera is monitored for other persons. From the position information the data fusion module can determine whether another person is in view of the survey camera. Both inputs on the left define the desired position of the tracked head in the image of the portrait camera. They carry a constant signal, the image center. The image coordinates are geometrically transformed to pan and tilt angles of the pan-tilt unit. The controller is implemented as an α-β tracking filter [8] which stabilizes these destination positions into a smooth movement in real-time. It is a steady-state Kalman filter:

\[ x_M(k) = x(k-1) + T\, v_x(k-1) \tag{1} \]
\[ x(k) = \alpha\, x_R(k) + (1-\alpha)\, x_M(k) \tag{2} \]
\[ v_x(k) = v_x(k-1) + \frac{\beta}{T}\,\bigl(x_R(k) - x_M(k)\bigr) \tag{3} \]

where x_R(k) is the raw data position, x(k) the filtered position, x_M(k) the model (predicted) position, and v_x(k) the estimated velocity of the tracked face. α and β are positive constants smaller than one, chosen in dependence on the sampling period T. Due to the constant coefficients, the α-β tracking filter is much simpler to compute than a full Kalman filter. The pan-tilt unit with the attached portrait camera is able to perform either a fast saccadic move to a fixed position or a smooth speed-controlled motion to track a moving object.
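A direct transcription of Eqs. (1)-(3) into code; the filter gains and the 25 Hz sampling period are illustrative values, not the system's tuning:

    class AlphaBetaFilter:
        """Steady-state Kalman (alpha-beta) tracking filter, Eqs. (1)-(3)."""

        def __init__(self, x0, alpha=0.5, beta=0.2, T=0.04):  # 25 Hz video
            self.x = x0      # filtered position
            self.v = 0.0     # estimated velocity
            self.alpha, self.beta, self.T = alpha, beta, T

        def update(self, x_raw):
            x_model = self.x + self.T * self.v                          # (1) predict
            self.x = self.alpha * x_raw + (1 - self.alpha) * x_model    # (2) position
            self.v += self.beta / self.T * (x_raw - x_model)            # (3) velocity
            return self.x

    # One filter per image axis smooths the face centroid before it is
    # transformed into pan/tilt commands.
    pan_filter, tilt_filter = AlphaBetaFilter(0.0), AlphaBetaFilter(0.0)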
6 Conclusions

On the research side, a lot of work has been done on face identification, face recognition, and face spotting in images. However, the integration of research results concerning facial image processing into practical application systems is still a very difficult and risky undertaking. Car security systems based on video authentication of the driver may become mass products in a couple of years. We built a functional prototype from which we learned valuable lessons on how to design robust algorithms for real-time face spotting. This problem had been underestimated at first, and a number of methods tried out from research papers were found to fail completely in the environment of the test car. The second application we describe has led to the development of a novel bifocal camera unit supporting high-speed camera pointing toward a human face. Again, the robustness of face image spotting and validation has turned out to be a crucial technical problem. In addition, the issue of user convenience has been a driving force for the technical development. If people are to accept computers watching their faces, they at least don't want to be bothered with "posing" for a standardized snapshot. People feel that choosing the right moment and the most suitable viewing direction is the computer's business.
7 Acknowledgement

We wish to thank Gerd-Jürgen Gieffing, Markus Ohlhofer, and all other contributors to the FaceCheck system (FaceCheck is a registered trademark of C-VIS Computer Vision und Automation GmbH). Hubert Weisser and Rudolf Mai of Volkswagen AG have been valuable partners in our joint efforts towards a driver-recognizing car.
References

1. R. Brunelli and T. Poggio, Face Recognition: Features versus Templates, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 15, No. 10, pp. 1042-1052, 1993.
2. T. C. Chang, T. S. Huang, and C. Novak, Facial Feature Extraction from Color Images, in Proc. International Conference on Pattern Recognition, Vol. II, pp. 39-43, 1994.
3. Y. Dai and Y. Nakano, Face-Texture Model Based on SGLD and its Application in Face Detection in a Color Scene, Pattern Recognition, Vol. 29, No. 6, pp. 1007-1017, 1996.
4. T. Darrell, B. Moghaddam, and A. Pentland, Active Face Tracking and Pose Estimation in an Interactive Room, in Proc. Computer Vision and Pattern Recognition, pp. 67-72, 1996.
5. W. Gillner, S. Bohrer, and V. Vetter, Objektverfolgung mit pyramidenbasierten optischen Flussfeldern, in Bildverarbeitung '93: Forschen, Entwickeln, Anwenden, Technische Akademie Esslingen, 1993 (in German).
6. V. Govindaraju, D. B. Sher, R. K. Srihari, and S. N. Srihari, Locating Human Faces in Newspaper Photographs, in Proc. Computer Vision and Pattern Recognition, pp. 549-554, 1989.
7. A. Jacquin and A. Eleftheriadis, Automatic Location Tracking of Faces and Facial Features in Video Sequences, in Proc. International Workshop on Automatic Face- and Gesture-Recognition, pp. 142-147, 1995.
8. C. L. Phillips and H. T. Nagle, Digital Control System Analysis and Design, Prentice Hall, 2nd ed., 1989.
9. C. Ponticos, A Robust Real Time Face Location Algorithm for Videophones, in Proc. British Machine Vision Conference, pp. 449-458, 1993.
10. H. A. Rowley, S. Baluja, and T. Kanade, Neural Network-Based Face Detection, in Proc. Computer Vision and Pattern Recognition, pp. 203-208, 1996.
11. W. von Seelen et al., A Neural Architecture for Autonomous Visually Guided Robots: Results of the NAMOS Project, Fortschr.-Ber. VDI Reihe 10 Nr. 388, VDI Verlag, 1995.
12. V. Vetter, G.-J. Gieffing, R. Mai, and H. Weisser, Driver Face Recognition as a Security and Safety Feature, in Law Enforcement Technologies: Identification Technologies and Traffic Safety, Proc. SPIE 2511, pp. 182-190, 1995.
13. G. Yang and T. S. Huang, Human Face Detection in a Complex Background, Pattern Recognition, Vol. 27, No. 1, pp. 53-63, 1994.