A Survey of Face Recognition

Thomas Fromherz, Peter Stucki, Martin Bichsel
University of Zurich, Department of Computer Science, MultiMedia Laboratory
Winterthurerstr. 190, 8057 Zurich-Irchel
Phone: +41 1 257 4321, Fax: +41 1 363 00 35
Email: [email protected], {stucki, bichsel}@ifi.unizh.ch

Keywords: Face recognition, identification, authentication, hybrid recognition, classifiers
Abstract

The development of face recognition over the past years allows an organization into three types of recognition algorithms, namely frontal, profile, and view-tolerant recognition, depending on the kind of imagery available and the corresponding recognition algorithms. While frontal recognition certainly is the classical approach, view-tolerant algorithms usually perform recognition in a more sophisticated fashion by taking into consideration some of the underlying physics, geometry, and statistics. Profile schemes as stand-alone systems have a rather marginal significance for identification. However, they are very practical either for fast coarse presearches of large face databases to reduce the computational load for a subsequent sophisticated algorithm, or as part of a hybrid recognition scheme. Such hybrid approaches have a special status among face recognition systems as they combine different recognition approaches in either a serial or parallel order to overcome the shortcomings of the individual components.
1. Introduction
The modern information age confronts humanity with various challenges that did not exist to the same extent in earlier times. Two such challenges are the organization of society and its security. The ever-increasing human population and its mobility in all its facets have caused a growing demand for enhanced ways of transferring data and sharing information, which in turn, through complex communication structures, gained its own mobility - a mobility that even outgrew its human counterpart. In this context of increased mobility and world population, security and organization have become important social issues. As mobility applies to both humans and information, security includes both the security of individuals and their valuables, and the integrity of data under external influence. Within this environment of increased importance of security and organization, identification and authentication methods have developed into a key technology in various areas: entrance control in buildings; access control for computers in general or for automatic teller machines in particular; day-to-day affairs like withdrawing money from a bank account or dealing with the post office; or in the prominent field of criminal investigation. These few examples illustrate the necessity of identification and authentication for the functioning of modern society.
Still, most of these methods, with all their legitimate applications in an expanding society, have a bothersome drawback. Except for face and voice recognition, these methods require the user to remember a password, to enter a PIN code, to carry a badge, or, in general, require a human action in the course of identification or authentication. In addition, the corresponding means (keys, badges, passwords, PIN codes) are prone to being lost or forgotten, whereas fingerprints and retina scans suffer from low user acceptance. Modern face recognition has reached an identification rate of greater than 90% for larger databases with well-controlled pose and illumination conditions. While this is a high rate for face recognition, it is by no means comparable to methods using keys, badges, or passwords, nor can it bear direct comparison with the recognition abilities of a human concierge. Still, face recognition as an identification or authentication means could be successfully employed in many of the aforementioned tasks to support other techniques, or even replace them in the case of lower security requirements.

Apart from these more classical face recognition applications, in the modern information age, where computers are common tools for daily work processes, new authentication tasks keep appearing. In bank offices, for example, or in other institutions whose security demands are higher than in ordinary offices, security breaches result from logged-in but temporarily unattended computers. Such security needs could be met nowadays, as many computers are already equipped with video cameras (or cameras could be easily installed). In this case, special face recognition software could constantly observe and match the face in front of the camera against the face of the user who claims to be logged in. In the future, this kind of constant observation could be employed in intelligent PCs, for instance to automatically bring up a user's individual desktop environment. Even more advanced, when employed in a gesture and facial expression recognition environment, it could turn "personal" computers into "personalized" computers able to interact with the user on a level higher than mouse clicks and keystrokes. Together with software that could "understand" gestures and facial expressions, such recognition would make possible intelligent man-machine interfaces and, in the future, intelligent interaction with robots.

Automatic recognition is a vast and modern research area of computer vision, ranging from recognition of faces, facial expressions, and gestures, over related topics such as automatically detecting, locating, and tracking faces, as well as extraction of face orientation and facial features, to such supporting fields as the handling of uncontrolled and uncontrollable conditions like illumination and shadows, the 3D reconstruction of faces in particular, and the generation of new views from given imagery in general. Against the background of such an abundant field of research, this survey paper shall restrict itself to face recognition techniques, with some remarks on related research topics. The development of face recognition over the past years allows an organization into three types of recognition algorithms, namely frontal, profile, and view-tolerant recognition, depending on both the kind of imagery (facial views) available and the corresponding recognition algorithms.
While frontal recognition certainly is the classical approach to tackle the problem at hand, view-tolerant algorithms usually treat it in a more sophisticated fashion by taking into consideration some of the underlying physics, geometry, and statistics. This survey paper reports on research results mostly published in 1995 and later, thus complementing the earlier surveys by [Samal & Iyengar 92] on nonconnectionist approaches, by [Valentin et al. 94] on connectionist schemes, and the comprehensive survey by [Chellappa et al. 95] on 20 years of face recognition. In addition to this organization, a special status is held by the rather novel idea of hybrid recognition algorithms, which combine different recognition approaches in either a serial or parallel order. In the parallel case, the final results are assessed by submitting all individual recognition scores to one of a number of possible classification procedures.
And finally, although not tackling the problem of face recognition itself, the supporting research topics mentioned above play an important role in many of the more advanced automatic recognition procedures. The corresponding routines supply the actual recognition process with valuable information, be it through locating and tracking faces in unconstrained environments, through illumination normalization, or through extracting the orientation of a face and generating an appropriate reference view for the matching step. This last element is especially important for view-tolerant approaches acting on a database of various views. However, if real-time recognition from video sequences is the future task, extraction of face orientation will have to be complemented with analysis of nonrigid motion, i.e. facial expressions, another vision area that has recently received a substantial amount of attention. The following chapter gives an overview of existing frontal, profile, and view-tolerant recognition approaches. The special class of hybrid recognition systems is treated in chapter 3. Some conclusions are given in chapter 4.
2. Face Recognition Today
In order to organize the vast field of face recognition, several approaches are conceivable. For instance, algorithms treating the face and its environment as uncontrolled systems could be distinguished from systems that control the lighting or background of the scene, or the orientation of the face. Or systems that use one or more still images for the recognition task could be distinguished from others that base their efforts on video sequences. While this "low-level" differentiation aims at immediate problems in face recognition, a rather "high-level" overview is proposed in this survey paper by a classification into frontal, profile, and view-tolerant face recognition. It is interesting to note that this classification goes along with a historical view of face recognition, with the time axis running from classical frontal recognition through profile approaches to modern view-tolerant algorithms. This historical classification is reflected in the last two major conferences on face and gesture recognition, 1995 in Zurich and 1996 in Killington, where the majority of direct contributions on face recognition chose a view-tolerant approach. In their paper about benchmark studies in face recognition, [Gutta et al. 95] independently came to the conclusion that holistic connectionist methods outperform discrete feature and correlation methods. The former are characteristic of approaches such as principal component analysis (PCA), eigenface systems, or radial basis functions (RBF) and are common in view-tolerant systems, while the latter use standard pattern recognition to match extracted facial features among faces, as is common in frontal algorithms. The same paper also describes in detail the DARPA/ARL FERET database of facial images used in some of the research work described below. Consisting of about 1100 image sets of two frontal views and pairs of quarter, half, three-quarter, and full profiles, totaling about 8500 images, this image collection aims at offering a representative database for training, testing, and evaluating alternative face recognition schemes.

2.1 Frontal Recognition

'Frontal recognition' denotes those approaches which, in a preprocessing step, find and extract facial features in head-on 2D face images, which then are matched against the corresponding features of a face database. Following facial heuristics, preprocessing steps usually are biologically motivated and serve the purpose of data reduction, removal of redundancies, and speed-up of parameter searches. Such a preprocessing step can be high-level in that it finds a list of facial features which then is searched for corresponding elements such as edge elements, lines, or circles. It can also be low-level in that filters extract and mark certain features, leaving them encoded in the imagery. The subsequent matching step has the goal of finding unknown
parameters such as positions and distances of facial features, and of subsequently assessing match values, which finally are used to assess an overall recognition rate. The following methods are all specialties in the realm of frontal recognition. [Daugman 1995], for instance, describes a system where the rejection of an independence hypothesis can be used for reliable recognition. If two stochastic signals have sufficiently high degrees of complexity and show only small amounts of similarity, i.e. a test of independence of the signals fails, the hypothesis of independence of the two signals can justifiably be rejected. While the face, with its common geometry and its fixed number of facial features, an advantage in other face recognition procedures, cannot be used in this approach, the iris of the human eye can. The random texture of the iris reveals an estimated complexity of several hundred degrees of freedom. Being as unique as fingerprints, differing even between twins and between the two eyes of one individual, human irises present themselves as unique identifying facial features. [Daugman 1995] describes a real-time algorithm that demodulates local phase variation in the texture using 2D Gabor wavelets, the results of which are converted into an 'Iris Code' about 173 bits long. The author claims very high-confidence recognition by exhaustive searches at a rate of about 10,000 persons per second. This approach shall be counted as a member of 'frontal recognition' since it obviously has to rely on frontal imagery and does not correct for location and scale variations. As an internal organ of the eye, the iris creates certain problems with respect to suitable imagery and with respect to the locating and extracting task. Unfortunately, the author fails to comment on image recording details and on the limits of imagery that can be used for reliable iris detection. The 'iris' approach is better described as person identification than as face recognition, as it is based on the biological properties of one facial feature - the iris - rather than the visual properties of primary features like eyes, noses, or mouths in video images.

In contrast, the approach by [Takács & Wechsler 96] describes classical frontal face recognition, although based on modern detection and recognition techniques. The authors present a procedure for the representation and detection of multiscale objects based on radially non-uniform sampling lattices. The feature detection and classification involves feature encoding using visual filter banks (receptive fields) defined through self-organizing feature maps. The combination of visual filters with radially non-uniform sampling allows illumination-insensitive calculation of feature codes that subsequently are collected into compact face codes for later identification. In a first step, multiresolution eye pair detectors find possible face regions which, in a second step, undergo a full face detection process to find the face location and to estimate its scale, thus allowing efficient normalization. Subsequently, these normalized face images are submitted to eye, eye pair, nose, and mouth visual filters observing simple average geometric models of the facial parts. The filter outputs are used to encode each individual feature. Using 216 frontal raw images of 100 subjects of the FERET database, the first step was tested assuming only one face embedded in a complex background.
The performance was 95% using only the strongest detection, and 100% when more than two eye pair candidates were allowed. The second step, the extraction of the facial features, achieved a 97% accuracy in finding at least 3 of the 5 major features, where a feature location was considered correct if the feature coordinates fell within ±3 pixels of a human estimate. Errors were due to glasses, smiling mouths, or facial hair covering part of the mouth. In contrast, eyes and noses seem relatively less sensitive to varying imaging constraints and facial expressions. The combined face code assembled from the individual feature codes was used to train a nearest neighbor classifier, thus allowing later code matching and identification. In a corresponding recognition test of 116 subjects, the identification algorithm correctly recognized 89.6% of the individuals.
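This nearest neighbor matching step lends itself to a compact illustration. The sketch below is not the authors' implementation; it merely shows the principle of matching a probe's face code against a gallery of stored codes, with the code length, the Euclidean metric, and all names being illustrative assumptions:

import numpy as np

def nearest_neighbor_identify(probe_code, gallery_codes, gallery_ids):
    # Return the identity whose stored face code is closest to the probe.
    #   probe_code    : 1D feature vector of the unknown face
    #   gallery_codes : 2D array, one stored face code per row
    #   gallery_ids   : identity label for each gallery row
    distances = np.linalg.norm(gallery_codes - probe_code, axis=1)
    best = int(np.argmin(distances))
    return gallery_ids[best], distances[best]

# Toy usage: a gallery of three 8-dimensional face codes
gallery = np.random.rand(3, 8)
ids = ["subject_a", "subject_b", "subject_c"]
print(nearest_neighbor_identify(gallery[1] + 0.01, gallery, ids))  # subject_b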
The next contribution, by [Wilder et al. 96], attempts to lessen the pose and illumination sensitivity of face recognition, not by a new matching algorithm but by testing the feasibility of infra-red (IR) against visible imagery. Although the three recognition algorithms, called in to ensure a fair comparison, usually would be assigned to the view-tolerant group, the comparison is based on frontal imagery recorded under fairly controlled pose, illumination, and temperature conditions. The acquisition of the imagery allowed small variations in pose, temperature, and geometry (talking and silent faces), but these were not truly corrected or normalized in any way, which is why this approach appears in this frontal recognition section. Since the emphasis of the paper by [Wilder et al. 96] was on recognition, the face was located manually by pointing at the eyes, nose tip, and mouth center. The following pre-processing step rotated and scaled the images for normalization purposes and then cropped and masked them to suit the individual extraction and recognition algorithms. A fair comparison of visible vs. IR imagery was guaranteed by three independent algorithms, namely transform coding of gray scale projections - an approach related to tomographic applications, eigenfaces [Turk & Pentland 91], and matching pursuit filters [Phillips & Vardi 95]. All the experiments involved both training on 'talking images' and recognition on 'silent images', and vice versa. The overall message of the experiments is that both visible and IR imagery perform similarly across algorithms, with a slight advantage of the first algorithm on IR images, and of the latter two on visible images. The paper also reports on examinations of the relative importance of different facial regions for recognition, basically several facial features plus the face, several features without the face, and the face alone. The result here is that the face contributes to recognition in the visible case, but decreases recognition in the IR case. The results for fusing visible and IR recognition are given in chapter 3. The main problem of visible imagery is the sensitivity of the recognition performance to illumination and to the influence that pose and illumination have on each other. In contrast, the performance of IR imagery is subject to variations in the temperature of the surrounding environment, and to variations in the facial heat patterns when moving between temperature environments. For use as an isolated application, the visible approach has to be favored since, in verification scenarios with cooperative subjects, illumination, in contrast to temperature, can be controlled.

2.2 Profile Recognition

While frontal images inherently contain more discrimination power than facial profiles, the analysis of such head-on images is computationally much more complex and analytically more sensitive to variations of illumination and pose. In contrast, facial profiles are relatively easy to analyze and therefore allow for fast algorithms, yet show a sufficiently high number of details to support face recognition. Although not much confidence has been put into profile recognition in the past, it presents itself as an obvious solution for two eminent problems of modern face recognition. For one, the fast profile procedure can be employed as a quick search of a large face database to index candidates which then enter the more accurate but computationally expensive frontal recognition.
On the other hand, profile recognition provides a simple means to support other approaches in a hybrid face recognition procedure. Pure profile algorithms are rare. One such approach, however, was recently published by [Yu et al. 95] as a preliminary step towards a later integration into a hybrid recognition system. The procedure begins with a rule-based extraction of fiducial marks on facial profiles, namely nose tip, chin point, nose bottom, mouth point, mouth bottom, upper lip, lower lip, forehead, and bridge. The initial search for the nose tip is a very sophisticated step involving the convexity of the profile as well as a location and an area constraint. This procedure results in a very reliable nose tip extraction from which the extraction of the remaining fiducial points is relatively easy and robust.
Subsequently, the marks are transformed into a canonical profile representation in which the profile can be considered a one-dimensional function, thus simplifying the following recognition task. As a preparation for the matching step, a variation model is calculated from a number of training profiles. The actual matching of a test profile with the variation models of all subjects happens by calculating the distance between the test profile points and the corresponding extremal points of the variation models. Scale invariance can easily be added to the translation and rotation invariance inherent in the approach by integrating the mean distances of certain fiducial points across training profiles into the canonical representation. The authors tested the algorithm on profiles of 33 subjects, which were recorded under controlled illumination and allowing a rotational pose variation around the vertical axis of ±15°. Five profiles were recorded per person, some of them with the head tilted up or down, three of them being used for training and two for testing. Under these ideal conditions, all profiles were recognized correctly. Still, some problems have to be reported. First, due to variable eyelashes and eyes being open or closed at recording time, eye areas are sensitive to temporal variations. Additionally, mouth and lip points are not always clearly observable for some subjects. The authors solved this problem by discarding these problematic points from the profile function.

Another contribution that may be mentioned in connection with profile approaches was published by [Gordon 95]. In an overall valuation, however, the approach would rather belong to the group of view-tolerant recognition and, by virtue of its testing of classical frontal recognition with and without additional profile constraints, to the group of hybrid procedures. In contrast to the approach by [Yu et al. 95], which organized recognition as a comparison of normalized one-dimensional functions of facial features, Gordon's approach extracts these features merely to perform normalization and define template regions, which later are used for combined recognition of frontal and profile regions in a classical template matching process. The extraction technique involves tangency constraints and heuristic knowledge about head structure. Additionally, facial features of both frontal and profile imagery can be used to precompute search indexes consisting of distances and angles of just these points. These indexes are used to speed up overall recognition by reducing the full set of database entries to a much smaller fraction of possible candidates.

2.3 View-Tolerant Recognition

View-tolerant algorithms describe approaches that employ various techniques to correct for perspective or pose-based effects due to illumination and to the three-dimensional nature of the head. Such techniques are mainly needed in situations where uncooperative or unaware subjects are to be authenticated or identified with no view consistency given. In a rough classification, these approaches can be organized into algorithms that construct some sort of 3D head model, algorithms that employ Hidden Markov Model (HMM) techniques to tackle real-time recognition, and algorithms that use deformable models of some kind (involving the Karhunen-Loeve transformation (KLT), eigenfaces, or wavelets) and morphs to model varying shape.
The KLT usually is employed when beards, baldness, or non-uniform backgrounds are present in the imagery, since it treats the entire image, face and background, as a pattern. Computationally complex eigenface concepts are used if global and local recognition aspects shall be addressed, but have only been proven useful on a relatively small number of images. In contrast, wavelets, mostly Gabor wavelets, extract facial features including an entire
neighborhood, similar to the receptive fields of the human visual system, thus representing the face as a collection of feature points.

2.3.1 Shape

All shape-based approaches need two or more images of the head from different viewing directions to produce 2½D or 3D models of the head. The system by [Gordon 95], mentioned in the previous section, works with one frontal and one profile image. After identifying the head bounds in the frontal view, eye candidates are extracted using general eye templates and valley-based pupil detection. From all the candidates, likely eye pairs are selected using geometric constraints on separation and angle. After normalizing the image view for each candidate pair, candidates with no evidence for nose and mouth are eliminated. The final pair is then selected based on bilateral symmetry, relative position within the head region, eye candidate scores, and scale. In the case of the profile image, first the profile line is extracted, then the nose and chin tip are estimated using tangency constraints and a priori knowledge of head structure, and finally these feature positions are refined using bitangency constraints. The subsequent normalization of both frontal and profile image, involving rotations in the image planes, general scale factors, and translations, puts the extracted feature points into conformance with standardized positions. The feature points only identify five regions, namely right and left eye, nose, mouth, and profile. These feature points enter the matching process, which consists of a cross correlation procedure with corrections for general lighting effects. Depending on the reliability and information content of the features, the various regions are weighted differently, leading to a scoring of the individual matching results. It is important to note that, in contrast to the algorithm by [Yu et al. 95] in the previous section, the profile enters the matching process by adding a limited region of the profile to the pattern matching process, and not by the geometry of the extracted feature points. The author tested her hypothesis that an additional profile view improves system performance with two frontals and two profiles of 194 subjects of the FERET database. One of each view was always used to build the reference database and the remaining two to build the query model. Adequate tests show that the addition of profile information reduces the error rates by 30% to 40% at R=5% (R denotes the rank threshold, the number of the most similar reference database subjects considered when judging correct identification). The automated system including the profiles performed at a recognition rate of 85% at R=5%. Although this method takes advantage of the shape information given in the frontal and profile images, it does not yet operate with an actual model of the head. In contrast, [Bichsel 95] presents a modular framework for shape from multiple views and varying illumination to estimate a 2½D head model represented by a depth map and a corresponding texture map. In an iterative process, a frontal depth map is estimated from a number of different views which, when rendered together with its corresponding texture map, shows the smallest deviation from the corresponding frontal view image. The framework consists of a renderer module which uses initial viewing parameters (depth, texture, calibration) to compute an image of this specific view.
The subsequent probability estimator module measures the deviation of this output image from the original image and calculates a probability value for the specific set of shape and texture parameters. The optimizer module finally tries to find a better set of parameters, using the history of the previous evaluations, to allow the renderer module to generate a better rendered image. From the resulting depth and texture map, any additional view of the head can be calculated within limitations. This, together with the ability of the probability estimator module to compute the probability of a specific set of shape and texture parameters, namely of a specific face model, makes this framework highly suitable for face recognition applications.
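The renderer/estimator/optimizer cycle can be summarized in a few lines. The sketch below only mirrors the modular structure described above and is not the published implementation; the rendering, probability, and optimization routines are trivial stand-ins (assumptions), with a random search in place of the actual optimizer:

import numpy as np

def render(params, shape=(64, 64)):
    # Stand-in for the renderer module: produce an image from the
    # current depth/texture/calibration parameters (placeholder only).
    rng = np.random.default_rng(int(abs(params).sum() * 1e6) % 2**31)
    return rng.random(shape)

def log_probability(rendered, original):
    # Stand-in for the probability estimator module: score the
    # deviation between the rendered view and the original image.
    return -np.sum((rendered - original) ** 2)

def fit_head_model(original, init_params, iterations=100, step=0.05):
    # Stand-in for the optimizer module: keep any parameter set that
    # lets the renderer explain the original image better.
    params = init_params.copy()
    best = log_probability(render(params, original.shape), original)
    for _ in range(iterations):
        candidate = params + step * np.random.randn(*params.shape)
        p = log_probability(render(candidate, original.shape), original)
        if p > best:
            params, best = candidate, p
    return params, best

# Toy usage with a 10-parameter model and a random "frontal view"
frontal = np.random.rand(64, 64)
params, score = fit_head_model(frontal, np.zeros(10))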
The author reports reconstruction experiments with images of ±30° viewing orientations recorded in 10° steps, and demonstrates the ability to calculate various different views from the resulting depth and texture map, including intermediate and vertically tilted views. Since this paper mainly deals with estimating head models, no recognition experiments or results are given. A similar approach to shape-enhanced face recognition is given by [Fromherz 96], who proposes a shape from multiple views and visual cues procedure to reconstruct a fully 3D representation of the head. The approach presents a combination of two algorithms that reconstruct head shape from contours and from local illumination distributions based on an image sequence of the head. These algorithms are embedded in a framework that allows consistent integration of further visual cues if needed. The image sequences used showed the heads at orientations of ±90° in steps of 10°. While during reconstruction the models are represented either as voxel clusters or as sets of depth maps of multiple orientations, the depth map representation is used for recognition. The actual recognition happens by adjusting the orientation of a particular depth map together with its corresponding texture map in an iterative process until a view is reached that, in a simple template matching, shows the best fit to the image to be recognized. [Fromherz 96] reports a significant decrease of the matching errors of around 1/5 for shape-enhanced matching compared to traditional matching. Alternatively, the depth and texture map could be entered into the probability module described above ([Bichsel 95]) to receive a probability value for the corresponding face model. In a step towards even more accurate head models than possible with shape-from-X techniques, [O'Toole et al. 95] employ range data recorded with a Cyberware rotational scanner to calculate an average head model represented by range and texture data. Assuming that the viewing direction of a recorded image is known, the model is used to compute new views of the head. Missing points of the head are filled in with values from the average range and texture maps. The paper sets out to show that overlapping visible regions of heads can support accurate recognition even with large viewpoint changes. Scans of 68 subjects were recorded, 51 of which were used to build up a 51×51 correlation matrix. Eigenvectors computed from this matrix form a vector basis in which any estimated novel view of the 68 heads can be represented. The quality of such an estimate can be assessed by a similarity metric, namely the cosine between the estimated and the original vector. The results show that reconstruction quality in general is better for faces which entered the correlation matrix ("learned faces") than for "unlearned" faces. Additionally, smaller changes in viewpoint between learned and novel test views result in better reconstruction than large changes, due to the larger overlapping regions between learned and test view. While the approaches mentioned so far work with images recorded under fairly controlled conditions, [Clergue et al. 95] work with multimedia documents that contain uncontrolled video segments, from which the body motion is to be analyzed and the identity of persons in motion determined. This initial paper shows the basics of how to achieve identification based on projective invariants and on model-based estimations of head pose.
In contrast to the methods above, which tried to find an accurate head model of a subject, this approach works with a generic 3D mesh model of a human body. From the preceding analysis of the body motion, the head location is estimated. Some characteristic feature points are selected manually in the first frame to allow automatic tracking of the head over the subsequent video frames. If sets of these features (e.g. the four corners of the eyes, or the center of the mouth together with the top and base of the nose and the top of the forehead) are aligned in an image, their cross-ratios are independent of any perspective transformation, i.e. these ratios can be used to identify a head. The identification approach by projective invariants is supported by a classification mechanism based on the pose of the head.
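The cross-ratio just mentioned is the classical projective invariant behind this idea: for four collinear points A, B, C, D it can be computed as (AC·BD)/(BC·AD). A minimal illustrative sketch (the unsigned Euclidean distances used here are a simplification, and all names are assumptions):

import numpy as np

def cross_ratio(a, b, c, d):
    # Cross-ratio of four collinear 2D points; invariant under any
    # perspective (projective) transformation of the image plane.
    dist = lambda p, q: np.linalg.norm(np.asarray(p, float) - np.asarray(q, float))
    return (dist(a, c) * dist(b, d)) / (dist(b, c) * dist(a, d))

# Four collinear points: the value characterizes their relative spacing
print(cross_ratio((0, 0), (1, 0), (3, 0), (6, 0)))  # (3*5)/(2*6) = 1.25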
Some feature points manually selected in the first frame, together with the generic head model, allow one to estimate the initial head pose, to track this pose over the video sequence, and to define the 3D symmetric face axis. Now, various 3D planes perpendicular or parallel to this face axis and passing through several facial feature points can be built and checked for intersections with other feature points. These measurements turn out to be significant for differentiating among people and allow a first classification of the persons appearing in the video.

2.3.2 Hidden Markov Modeling (HMM)

While Hidden Markov Models have been used in speech recognition for more than a decade ([Rabiner & Juang 86], [Huang et al. 90]), and were also promoted for gesture recognition in recent years ([Starner & Pentland 95]), only little work has been done on applying HMMs to face recognition ([Samaria & Young 94], [Nyffenegger 95], [Achermann & Bunke 96]). As an example of such work, some details of the HMM implementation by [Achermann & Bunke 96], being part of the hybrid system in chapter 3, shall be given. HMMs generally work on sequences of coherent 1D signals (feature vectors), while an image usually is represented by a simple 2D matrix. This dilemma can be solved by applying a sliding window that covers the entire width of the image and is moved from the top to the bottom of the image. The brightness values of the windows are passed as 1D feature vectors to the HMM process. Successive windows overlap to avoid cutting off significant facial features and to bring the missing context information into the sequence of feature vectors. The human face can be divided into horizontal regions like forehead, eyes, nose, etc., that are recognizable even when observed in isolation. Thus the face is modeled as a linear left-right HMM of five states, namely forehead, eyes, nose, mouth, and chin. The complete recognition system incorporates a preprocessing step followed by four processing steps. The preprocessing, involving a normalization of the image width, a histogram equalization, and a manual determination of the face location, is applied to both training and test images; steps 1, 2, and 3 are used for training, while recognition involves steps 1, 2, and 4:

1) Code book: The feature vectors, extracted with the sliding windows from a test set of images, are used to build a code book.

2) Vector quantization: Based on this code book, all feature vectors are quantized.

3) Training: Based on five images of each person, the HMM is trained for every person in the database by iterative computation of its parametrization. The iterations serve the fine adjustment of the HMM parameters.

4) Recognition: A test image that was not used for training first passes the preprocessing step and processing steps 1 and 2. Then the probability of producing this image is computed for every person, i.e. every model, in the database. This procedure produces a ranking of the potential individuals in descending order of the probability of each model.

This HMM implementation shows best results for an optimal number of training iterations, while suboptimal training, i.e. over- or undertrained systems, causes inferior results. Experiments were carried out on a database of 30 persons, each represented by 10 images of various views (2 straight views and 2 each of left, right, up, and down views). The sliding windows were of size 40×4 with a 3-pixel overlap.
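A minimal sketch of this sliding-window extraction, assuming the 40×4 window size and 3-pixel overlap just mentioned (the image size and all names are illustrative):

import numpy as np

def sliding_window_features(image, win_height=4, overlap=3):
    # Turn a 2D face image into a top-to-bottom sequence of 1D feature
    # vectors, as required by a 1D HMM. Each window spans the full
    # image width; successive windows overlap by `overlap` rows.
    step = win_height - overlap                   # 4 - 3: advance one row at a time
    rows = image.shape[0]
    vectors = []
    for top in range(0, rows - win_height + 1, step):
        window = image[top:top + win_height, :]   # full-width strip
        vectors.append(window.flatten())          # brightness values as 1D vector
    return np.array(vectors)

# Toy usage on a 112-row, 40-column face image
face = np.random.rand(112, 40)
print(sliding_window_features(face).shape)  # (109, 160)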
The recognition results for optimal training were 90% on the first rank, while a certain choice of suboptimal training produced an 87% accuracy.

2.3.3 Deformable Models

In contrast to the 'physical' approach of shape-based face recognition, the idea of deformable models is to encode regions around significant facial features in a mathematical description and
let a similarity function decide about the relations among the descriptions of different faces. Since it is impossible to include all possible parameters of facial imagery in a face recognition model, data-driven methods are employed to design key components of the model. One such method, used by various authors to automatically extract facial features, employs a wavelet decomposition of the images. [Phillips & Vardi 95], for instance, present an adapted wavelet decomposition as the key element of their matching pursuit filters to find the subtle differences between faces. The ability of wavelets to encode local information minimizes the effect of variations of background and facial expressions. The presented face recognition system consists of three modules in which two sets of matching pursuit filters play the key roles of locating facial features and of identifying unknown faces. The first module designs these two filter sets off-line, while the second module uses the first set of filters to determine the face position in the image and to report the locations of nose and eyes. Every person is now described by a 5-component vector with one coefficient for every feature (tip of nose, bridge of nose, left and right eye, interior of face). These feature vectors are used in the third module to measure the similarities between features in the probe - the unknown face - and their representations in the gallery (of known faces). The identification happens by computing a similarity score for each feature of the probe and the corresponding feature of one candidate of the gallery, followed by the computation of a total score for this candidate. Repeating this procedure for all images in the gallery finally leads to the identification of the probe. Two experiments were conducted on 172 frontal images from the FERET database. The two frontal images per person, which involved some variation of facial expression and hair style, were split into one gallery member and one member of the probe set. The first experiment, which assessed the performance of the isolated identification procedure, yielded 98% correct answers in the top match. The second experiment, involving face detection, facial feature extraction, and identification, achieved 97% in the top match. [Phillips & Vardi 95] also present another application of data-driven algorithms, illumination normalization. The presented algorithm corrects for both intra-face variations, where a face is not uniformly illuminated, and inter-face variations, where the illumination of one face is normalized, for instance, to a second face of standard illumination so that the two faces can be compared. The algorithm models the brightness distribution of the pixels of a restricted facial area as a probability distribution, based on which the illumination of the face is then transformed. In other words, the histogram of one image is transformed into the histogram of the other image, employing a general form of histogram equalization. Data-driven algorithms can be found in the majority of the deformable model approaches. [Konen & Schulze 95], [Wiskott et al. 95], and [Maurer & Malsburg 95], for instance, employ an elastic graph method that stores the faces as grids (graphs) with the characteristic facial features attached to the nodes of the graphs. The features are obtained by convolutions of the faces with Gabor wavelets computed at the node locations. Through the Gabor wavelets, not only local features but entire neighborhoods of the features are integrated.
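A sketch of computing such a feature vector (a 'jet') at one node location: convolve the image locally with a small bank of Gabor wavelets of different frequencies and orientations and collect the response magnitudes. The filter-bank parameters below are illustrative assumptions, not those of the cited systems:

import numpy as np

def gabor_kernel(freq, theta, size=21, sigma=4.0):
    # 2D Gabor wavelet: a complex plane wave windowed by a Gaussian
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return envelope * np.exp(1j * 2 * np.pi * freq * xr)

def gabor_jet(image, node, freqs=(0.1, 0.2, 0.3), n_orient=4):
    # Collect filter-response magnitudes at one node location,
    # one entry per (frequency, orientation) pair.
    r, c = node
    jet = []
    for f in freqs:
        for k in range(n_orient):
            kern = gabor_kernel(f, np.pi * k / n_orient)
            half = kern.shape[0] // 2
            patch = image[r - half:r + half + 1, c - half:c + half + 1]
            jet.append(abs(np.sum(patch * kern)))   # response at the node
    return np.array(jet)

# Toy usage: a 12-component jet at the center of a random image
face = np.random.rand(128, 128)
print(gabor_jet(face, node=(64, 64)).shape)  # (12,)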
Once the features have been extracted, the creation of a graph happens by matching a stored graph to the actual image in an optimization procedure where position, size, and inner structure of the graph are elastically varied to maximize the similarity of graph and image features. The resulting labeled graph is then compared with a number of precomputed reference graphs, the general face knowledge, and only if certain significance conditions are fulfilled is the match accepted. The elastic graph method is robust with respect to variations in pose, size, and facial expression of the head, and can deal with different lighting conditions. In their contribution, [Konen & Schulze 95] report on the face recognition system 'ZN-face', a real-world access control system. The system employs user-triggered semi-automatic image acquisition, allows fast identification in about 3.5 seconds, and can handle large databases of more than 1000 subjects by verifying instead of recognizing a subject's identity. Tested on
87 subjects with one or two images per person in the database, the system yields recognition rates of around 96%. [Wiskott et al. 95] modified the elastic graph approach so that the nodes in the different graphs always refer to the same local facial region. This allows one to enhance face recognition by additionally detecting certain attributes like gender, beards, or glasses in the image of the recognized person. This is accomplished by checking whether the best fitting graph in the general face knowledge is mostly male or female and did or did not show evidence for a beard or glasses. Tests on 112 neutral frontal images of the FERET database, 65% male, 19% bearded, 28% with glasses, yield an approximate 90% recognition rate for these attributes. The face database for testing the recognition system consisted of two sets of 300 frontal views differing in expression, and another 300 half profile views varying between 40° and 70°, all taken from the FERET database. Comparing the 300 frontal views against the other 300 frontal views achieved a high recognition rate of about 97%. In contrast, comparing the 300 frontal views with half profile views showed poor recognition of about 13%. The system is robust with respect to rotations of the head of up to 20°. If larger rotations are to be processed, as in the second test, new images of such views have to be added to the face knowledge. [Maurer & Malsburg 95] tackle the problem of single-view-based recognition across varying viewpoints by introducing a geometric transformation which acts on the encoded feature points (so-called jets) in the graph. Thus, if the surface normals are given at the graph nodes and if the angle between two views is known, the graphs can be transformed by simply multiplying every jet with its geometric transformation matrix. Assuming a fixed set of normals, the required pose and normals are found by optimizing recognition on a training set of frontal and half profile views. The system was tested on 160 pairs of frontal and half profile images of the FERET database, 70 for training and 90 for testing. The rotation angle varied significantly from the expected 45°. Matching the half profile faces to the frontal gallery without transformation of the jets yielded a 36% recognition rate, while with the transformation the results improved to 50%. For a rank threshold of 5%, the rates are 56% and 73%. Part of these low recognition rates is due to the fact that only one set of normals was used for the whole gallery. While the above methods are merely robust against variations in lighting, more sophisticated approaches decompose geometry and lighting to achieve better results under varying contrast. The contribution of [Craw et al. 95], for instance, is based on the original eigenface work by [Turk & Pentland 91] and decomposes a number of faces into configuration vectors (shape vectors) and shape-free faces (texture vectors). An ensemble of faces, manually coded into feature vectors containing the 34 most significant facial landmarks, is subjected to a principal component analysis which extracts shape variations, ordered by importance, into a new independent eigenface basis. It is now easy to project every face onto this basis and compute the components of the face along every eigenface. In the shape-free approach, each face of the ensemble is texture mapped to a standard shape, the average shape of the set of ensemble images.
A principal component analysis then decomposes the texture-mapped shape-free faces into eigenvectors describing the lighting variations. Again, the components of every face along these eigenvectors are computed. Finally, recognition of a coded test image is achieved by matching the component vector of the test image against the ensemble components, for instance using a Euclidean metric. Tests conducted on 14 images of each of 27 individuals, recorded under varying lighting conditions and on different days, showed that isolated matching based on either configuration or shape-free faces alone yielded only reasonable overall recognition, and that only the combination of both codings promised good recognition rates of over 90%. Another approach to recognition from a single view is given by [Lando & Edelman 95], who choose a representation of face images similar to [Konen & Schulze 95] and [Takács & Wechsler 96]. Following biological visual systems, where the sensitive region of a neuronal
unit is known as its receptive field, the employed model represents face images by vectors of activities of graded overlapping receptive fields (RF). Same-class objects like faces induce similar patterns (clustering) in RF space across varying viewing conditions (pose and illumination) and facial expressions. [Lando & Edelman 95] set out to answer the question of whether this clustering in RF space is class- or transformation-based, i.e. whether such clusters belong to representations of the same face or of the same transformation. It turns out that if only illumination is varied, clustering by face identity becomes better with increasing RF size, while clustering by viewing conditions becomes better with decreasing RF size. If the viewing position is varied in addition to illumination, this behavior is not as strong as before, class-based clustering still being ahead of transformation-based clustering. Since the viewpoint information is carried by the high-frequency RFs and the face identity information by the low-frequency RFs, both channels can be effectively combined. Hence, an appropriate face recognition scheme consists of three modules. The first employs radial basis function (RBF) classifiers which accept image representations in the high-frequency RF space, thus detecting the viewing conditions. In a class-specific transformation that corresponds to the detected viewing conditions, the second module transforms representations in the low-frequency RF space into a prototype representation predicted for the input image. In the final module, face identification happens by RBF classifiers which, in the low-frequency space, compare this predicted representation with those of known faces. Since both classifiers need training on two databases which can be updated with the resulting class-specific transformations, the system is capable of improving with experience. In an experiment assessing the capability of the system to generalize from a single view, the system was provided with training images of between 1 and 18 individuals, each recorded under 15 different viewpoints (up to 68°) or illuminations. Subsequently, the system had to generalize from one given face, yielding a correct recognition rate for novel views of between 59% (training on 1 individual) and 66% (training on 18 individuals). When the bilateral symmetry of faces was added to the training conditions, the latter performance rose to 76%. The authors assume that a larger database, comparable to an adult's lifetime experience with faces, would produce recognition rates of about 97%. A special position among deformable model approaches is taken by morphing techniques, which employ various methods to find suitable parameters (so-called morph fields) to represent the linear transition of one image into another. In this context, the recent algorithm by [Bichsel 96] generates optimum image morph fields based on distortions of geometry and illumination by maximizing the probability of the morph fields in a Bayesian framework. The method needs no training and is derived based on invariances and basic transformation groups. The author shows that, assuming rigid objects with Lambertian surfaces without self-shadowing and ignoring occlusions, all possible morphs among images of a single person are confined to a 5-dimensional subspace of the overall morph space.
Although these constraints are not always applicable to faces, one can assume that the five most significant eigenvectors of the morph measurement matrix, calculated from at least three images, govern the major interpersonal variations. Recognition now happens by computing the morphs from the unknown image to the reference images of all possible candidates, by then projecting the resulting morphs into the five-dimensional subspace of each person, and by finally using the length of the residual vector as match value. Tests show excellent morphing results for morphs between two persons of different view and illumination. Although no recognition results are given, preliminary recognition tests show that morph fields are well suited for recognition under varying view and illumination.
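The last step, projecting onto a person's five most significant eigenvectors and using the length of the residual as match value, is a generic subspace computation. A minimal sketch, assuming an orthonormal basis (all names and dimensions are illustrative):

import numpy as np

def residual_match_value(morph, basis):
    # Project a morph vector onto the person-specific subspace spanned
    # by the columns of `basis` (assumed orthonormal) and return the
    # length of the residual: a small residual indicates a good match.
    coeffs = basis.T @ morph
    reconstruction = basis @ coeffs
    return np.linalg.norm(morph - reconstruction)

# Toy usage: a 5-dimensional subspace of a 100-dimensional morph space
basis, _ = np.linalg.qr(np.random.randn(100, 5))   # orthonormal columns
inside = basis @ np.random.randn(5)                # lies in the subspace
print(residual_match_value(inside, basis))                # ~0: good match
print(residual_match_value(np.random.randn(100), basis))  # large: poor match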
3. Hybrid Systems
An area that has received significant attention in recent years is classifier combination and sensor fusion ([Dasarathy 1994]). Realized in so-called hybrid systems, sensor fusion is already commonly believed to be of advantage in shape reconstruction from video images ([Fromherz & Bichsel 95]), in optical character recognition ([Lam & Suen 95]), and in the combination of face and speech recognition for person identification ([Brunelli et al. 95]). Already as early as 1992, [Ghezal 92] proposed that connectionist methods could be interesting alternatives to common recognition approaches. Although not yet combined, three methods relying on the power spectrum of images, on profiles, and on concepts of multi-layered artificial neural networks were investigated and compared. In a recent study of automatic face recognition systems, [Gutta et al. 95] finally came to the conclusion that the future of face recognition lies in the research of hybrid recognition systems. Different recognition approaches succeed and fail under widely different viewing and illumination conditions. Usually, one method that is highly suitable for a certain kind of imagery cannot be used under different conditions. On the other hand, the image acquisition conditions needed for a useful day-to-day recognition system often cannot be constrained enough to suit one specific recognition approach. Due to this dilemma, it seems obvious to run various individual recognition classifiers on a problem, leading to an individual ranking of the results of every process, and to design a classification scheme to assess an overall recognition result. In contrast to this parallel approach of equivalent face classifiers, a rather serial one is conceivable where the output of one classifier is input to the next one. The latter approach can even go along with hybrid learning if the classifiers require training. As mentioned in section 2.3.1, the recognition method by [Gordon 95] already could be counted as a first attempt to build a hybrid system combining template-based frontal and profile recognition. Certainly, Gordon's profile approach only adds an additional template to the matching of the frontal templates and, therefore, does not constitute an independent recognition method. Still, the overall template matching is subject to a scoring of the five facial templates. The mouth template, for instance, is weighted at less than half that of the eye and nose templates due to its lower reliability under varying facial expressions. Furthermore, the profile template has a higher score than the eye and nose templates since, due to its larger size, it contains more information. Adequate tests on 194 subjects showed that the addition of profile information reduces the error rates by 30% to 40% at R=5%. Another recognition project that can be counted as a hybrid recognition attempt is the work by [Wilder et al. 96] on visible and IR imagery already presented in section 2.1. It has to be noted, however, that the term 'hybrid' here refers to the combination of different imagery rather than of different recognition approaches, and that the combination does not involve any ranking or classification scheme between the two image types.
A straightforward equal-weight combination of the results for the frontal images of 101 subjects, achieved under the conditions mentioned in section 2.1, raised the recognition performance from between 89-94% to 98-99%. An example of a proper serial hybrid classification scheme involving hybrid learning was presented by [Gutta et al. 96], who propose that intelligent hybrid systems include specific levels of knowledge, namely connectionist levels which can handle different sensory input, and symbolic levels which are able to fuse data from different sensory modalities and cognitive modes. The presented system for face and hand gesture recognition combines ensembles of radial basis functions (ERBF), a holistic template matching able to cluster similar images before classification, as the connectionist level, with inductive decision trees (DT), an abstractive matching using discrete features, as the symbolic level.
[Gutta et al. 96] proposed and tested two different ERBF architectures. ERBF1 separately trains the same three RBF nodes on three different sets of images: the original images, the original images distorted by Gaussian noise, and the original images distorted by a geometric transformation (rotation). ERBF2, on the other hand, trains the same three RBF nodes on a combination of the imagery of ERBF1. The output of both versions consists of objects described by a fixed set of attributes and their discrete attribute values, assigning each object to one of two classes. The goal of the following symbolic stage is to derive rules (the DT) for classifying these objects based on a collection of training objects with known class labels. The DT employed for the face recognition part was the C4.5 method by [Quinlan 86]. Two different experiments were conducted, the first an identification task like 'find person X with/without glasses', and the second a verification task, for instance for forensic verification. The first experiment is implemented in two hybrid stages: the match stage (only involving an original RBF network) finds the identity of the probe, while the second stage (DT) checks for the presence or absence of glasses. Tests on 200 images of the FERET database showed a 93% accuracy using cross validation. In the forensic experiment, a large number of candidates is tested against a predefined image gallery on which the system has already been trained. The hybrid character in this case also shows in that the training inputs for the DT method are generated from the already trained ERBF method. To compare the hybrid and traditional approaches, tests on 904 images of 350 subjects of the FERET database were conducted using original RBF, ERBF1, ERBF2, ERBF1 plus DT, and ERBF2 plus DT. Using cross validation, the tests yielded accuracies of 76%, 82%, 86%, 90%, and 96%, respectively. The results show that ERBF architectures outperform simple RBF approaches, that hybrid learning improves classification performance, and that training on a combination of original and distorted data (ERBF2) leads to improved performance compared with training on separate sets of training data (ERBF1). A proper parallel hybrid recognition system integrating three face classifiers, namely the profile approach by [Yu et al. 95], an HMM algorithm similar to the one in [Samaria & Young 94], and the eigenface approach by [Turk & Pentland 91], was successfully presented by [Achermann & Bunke 96]. All three face classifiers return a ranking of the possible persons in ascending order of their scores. For the subsequent combination classifier, which combines the results of the individual classifiers, three decision strategies are conceivable:

Voting: Every classifier has one vote, and a decision is reached through the majority of the votes. If the voting ends in a draw, no decision can be reached and the input is rejected.

Ranking: Depending on the ranks assigned by the individual classifiers, a ranking for the combination classifier is computed. For instance, the sum of ranks for every individual across the classifier combination can be computed, with the lowest rank sum giving the first rank in the combination classifier.

Scoring: Information from the score functions of the individual classifiers is integrated to assess a ranking for the combination classifier.
Returning to the combination strategies: before the integration, it is usually necessary to transform the scores of the different classifiers to a common range of values with a linear, logarithmic, exponential, or logistic transformation. The score-based strategy is usually favored since it exploits additional information not available through voting or ranking. In all tests, the ranking for the combination classifier was established according to ascending scores, and scoring was performed by score summation after applying a logarithmic transformation to the score values of the individual classifiers, as in the sketch below.
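The following minimal sketch illustrates the three decision strategies for a single probe. It assumes scores where lower means a better match, consistent with the ascending orderings described above; the function name and the exact normalization inside the scoring branch are illustrative assumptions.

```python
import numpy as np

def combine_classifiers(score_lists, strategy='scoring'):
    """Combine the outputs of several face classifiers for one probe.

    score_lists: one array per classifier, each of shape (n_gallery,),
    holding that classifier's score for every gallery person
    (lower score = better match, an assumed convention).
    Returns the index of the winning gallery person, or None if a
    voting draw forces rejection.
    """
    scores = np.stack(score_lists)        # (n_classifiers, n_gallery)

    if strategy == 'voting':
        # Each classifier votes for its best-scoring person; a unique
        # majority decides, a draw leads to rejection.
        votes = scores.argmin(axis=1)
        ids, counts = np.unique(votes, return_counts=True)
        winners = ids[counts == counts.max()]
        return int(winners[0]) if len(winners) == 1 else None

    if strategy == 'ranking':
        # Sum each person's ranks over all classifiers; the lowest
        # rank sum becomes rank one of the combination classifier.
        ranks = scores.argsort(axis=1).argsort(axis=1)
        return int(ranks.sum(axis=0).argmin())

    # Scoring: map each classifier's scores to a common range with a
    # logarithmic transformation, then sum them.
    log_scores = np.log1p(scores - scores.min(axis=1, keepdims=True))
    return int(log_scores.sum(axis=0).argmin())
```

The rejection on a voting draw (returning None) mirrors the rejection of undecidable inputs described in the voting strategy above.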
Tests were conducted on images of 30 subjects, with 10 frontal images of varying viewpoints and 5 profile images of varying size and orientation per subject. The frontal imagery included 2 straight views and 2 each of left, right, up, and down views. The lighting conditions were carefully controlled during image acquisition. The image database was divided into disjoint training sets (5 frontals, 3 profiles) and test sets (5 frontals, 2 profiles), which yields a total of 300 (30·5·2) possible input combinations for the combination classifier. The recognition rates for the individual face classifiers were 94.7% for the eigenface approach, 90% for HMM models with optimal training, 86.7% for HMM models with less training, and 85% for the profile method. The combination of profile and HMM methods through scoring yielded a recognition rate of 96.3%, regardless of whether optimally or less trained HMM models were used. Voting or ranking strategies resulted in no or only slight improvements over the individual classifiers. Profile and eigenface techniques together performed at 97.7% with the scoring strategy, while voting and ranking showed the same behavior as before. The combination of HMM and eigenface classifiers, which both rely only on frontal imagery, achieved 98.7% with scoring. Good results were also obtained for rank summation, but not for voting strategies. Finally, the combination of profile, HMM, and eigenface classifiers resulted in a recognition rate of 99.7%; this time, good results were also achieved with the voting and ranking strategies. The experiments show that the different recognition criteria of different face recognition approaches can be effectively combined to enhance the identification performance achievable with a given image gallery. The tests also show that, for the given images, the combination of three face classifiers is superior to a combination of two. Future research, however, has to investigate whether this behavior generalizes, and how much influence the quality of the images and the choice of face classifiers have.
4. Conclusions
A classification of face recognition algorithms into frontal, profile, and view-tolerant approaches was given. Naturally emerging from the imperfections of these algorithms, and already suggested by various authors, is the need to employ hybrid recognition systems that classify and combine the results of different algorithms. In this survey, several such hybrid approaches have been listed and discussed. In addition to the discussion given along with the presentation of each algorithm in the various chapters, the two following major observations shall conclude this survey:
First, most of the recognition systems available today have been demonstrated only on limited datasets, often recorded under restricted conditions. This is legitimate if commercial applications with special recording constraints are produced, or if completely new recognition algorithms are examined. In face recognition research, however, advantage should be taken more often of large public databases like the FERET database, to allow comparable results and to advance recognition beyond the point of specialized environments.
Second, future algorithms should add another classification level to the existing ones, namely the level of techniques that can handle nonrigid facial motions (facial expressions) and unconstrained real-time video sequences.
5. Bibliography

[Achermann & Bunke 96] Bernard Achermann, Horst Bunke, “Combination of Face Classifiers for Person Identification”, Proceedings of the 13th IAPR International Conference on Pattern Recognition (ICPR), 3:416-420, Vienna, 1996.

[Bichsel 95] Martin Bichsel, “Human Face Recognition: From Views to Models - From Models to Views”, Proceedings of the International Workshop on Automatic Face- and Gesture-Recognition, IWAFGR 95, 59-64, Zurich, 1995.

[Bichsel 96] Martin Bichsel, “Automatic Interpolation and Recognition of Face Images by Morphing”, Proceedings of the International Conference on Automatic Face and Gesture Recognition, ICAFGR 96, 128-135, Killington, 1996.

[Brunelli et al. 95] R. Brunelli, D. Falavigna, T. Poggio, L. Stringa, “Automatic Person Recognition by Acoustic and Geometric Features”, Machine Vision and Applications, 8:317-325, 1995.

[Chellappa et al. 95] Rama Chellappa, Charles L. Wilson, Saad Sirohey, “Human and Machine Recognition of Faces: A Survey”, Proceedings of the IEEE, 83(5):704-740, 1995.

[Clergue et al. 95] E. Clergue, M. Goldberg, N. Madrane, B. Merialdo, “Automatic Face and Gestual Recognition for Video Indexing”, Proceedings of the International Workshop on Automatic Face- and Gesture-Recognition, IWAFGR 95, 110-115, Zurich, 1995.

[Craw et al. 95] Ian Craw, Nicholas Costen, Takashi Kato, Graham Robertson, Shigeru Akamatsu, “Automatic Face Recognition: Combining Configuration and Texture”, Proceedings of the International Workshop on Automatic Face- and Gesture-Recognition, IWAFGR 95, 53-58, Zurich, 1995.

[Dasarathy 94] B. Dasarathy, “Decision Fusion”, IEEE Computer Society Press, 1994.

[Daugman 95] John Daugman, “Face Recognition by Feature Demodulation”, Proceedings of the International Workshop on Automatic Face- and Gesture-Recognition, IWAFGR 95, 350-355, Zurich, 1995.

[Fromherz & Bichsel 95] Thomas Fromherz, Martin Bichsel, “Shape from Multiple Cues: Integrating Local Brightness Information”, Proceedings of the Fourth International Conference for Young Computer Scientists, ICYCS 95, 855-862, Beijing, P.R. China, 1995.

[Fromherz 96] Thomas Fromherz, “Shape from Multiple Cues for 3D-Enhanced Face Recognition: A Contribution to Error Reduction by Employing Inexpensive 3D Reconstruction Methods”, Ph.D. Thesis, University of Zurich, Zurich, 1996.

[Ghezal 92] Abdelhakim Ghezal, “Automated Human Face Recognition by Computer Using Multiple Feature Criteria”, Ph.D. Thesis, University of Zurich, Zurich, 1992.

[Gordon 95] Gaile G. Gordon, “Face Recognition from Frontal and Profile Views”, Proceedings of the International Workshop on Automatic Face- and Gesture-Recognition, IWAFGR 95, 47-52, Zurich, 1995.

[Gutta et al. 95] Srinivas Gutta, Jeffrey Huang, Dig Singh, Imran Shah, “Benchmark Studies on Face Recognition”, Proceedings of the International Workshop on Automatic Face- and Gesture-Recognition, IWAFGR 95, 227-231, Zurich, 1995.

[Gutta et al. 96] Srinivas Gutta, Jeffrey Huang, Ibrahim F. Imam, Harry Wechsler, “Face and Hand Gesture Recognition Using Hybrid Classifiers”, Proceedings of the International Conference on Automatic Face and Gesture Recognition, ICAFGR 96, 164-169, Killington, 1996.

[Huang et al. 90] X. Huang, Y. Ariki, M. Jack, “Hidden Markov Models for Speech Recognition”, Edinburgh University Press, Edinburgh, 1990.

[Konen & Schulze 95] Wolfgang Konen, Ekkehard Schulze-Krüger, “ZN-Face: A System for Access Control Using Automated Face Recognition”, Proceedings of the International Workshop on Automatic Face- and Gesture-Recognition, IWAFGR 95, 18-23, Zurich, 1995.

[Lam & Suen 95] L. Lam, C. Suen, “Optimal Combinations of Pattern Classifiers”, Pattern Recognition Letters, 16:945-954, 1995.

[Lando & Edelman 95] Maria Lando, Shimon Edelman, “Generalization from a Single View in Face Recognition”, Proceedings of the International Workshop on Automatic Face- and Gesture-Recognition, IWAFGR 95, 80-85, Zurich, 1995.

[Maurer & Malsburg 95] Thomas Maurer, Christoph von der Malsburg, “Single-View Based Recognition of Faces Rotated in Depth”, Proceedings of the International Workshop on Automatic Face- and Gesture-Recognition, IWAFGR 95, 248-253, Zurich, 1995.

[Nyffenegger 95] C. Nyffenegger, “Gesichtserkennung mit Hidden-Markov-Modellen” (Face Recognition with Hidden Markov Models), Master's Thesis, IAM, University of Bern, 1995.

[O’Toole et al. 95] Alice J. O’Toole, Heinrich H. Bülthoff, Nikolaus F. Troje, Thomas Vetter, “Face Recognition across Large Viewpoint Changes”, Proceedings of the International Workshop on Automatic Face- and Gesture-Recognition, IWAFGR 95, 326-331, Zurich, 1995.

[Phillips & Vardi 95] P. Jonathon Phillips, Yehuda Vardi, “Data-Driven Methods in Face Recognition”, Proceedings of the International Workshop on Automatic Face- and Gesture-Recognition, IWAFGR 95, 65-69, Zurich, 1995.

[Quinlan 86] J.R. Quinlan, “The Effect of Noise on Concept Learning”, in Machine Learning: An Artificial Intelligence Approach 2, R.S. Michalski, J.G. Carbonell, T.M. Mitchell (Eds.), Morgan Kaufmann, 149-166, 1986.

[Rabiner & Juang 86] Lawrence R. Rabiner, Biing-Hwang Juang, “An Introduction to Hidden Markov Models”, IEEE ASSP Magazine, 4-16, 1986.

[Samal & Iyengar 92] Ashok Samal, Prasana A. Iyengar, “Automatic Recognition and Analysis of Human Faces and Facial Expressions: A Survey”, Pattern Recognition, 25(1):65-77, 1992.

[Samaria & Young 94] F. Samaria, S. Young, “HMM-Based Architecture for Face Identification”, Image and Vision Computing, 12(8):537-543, 1994.

[Starner & Pentland 95] Thad Starner, Alex Pentland, “Visual Recognition of American Sign Language Using Hidden Markov Models”, Proceedings of the International Workshop on Automatic Face- and Gesture-Recognition, IWAFGR 95, 189-194, Zurich, 1995.

[Takács & Wechsler 96] Barnabás Takács, Harry Wechsler, “Visual Filters for Face Recognition”, Proceedings of the International Conference on Automatic Face and Gesture Recognition, ICAFGR 96, 218-223, Killington, 1996.

[Turk & Pentland 91] Michael Turk, Alex Pentland, “Eigenfaces for Recognition”, Journal of Cognitive Neuroscience, 3(1):71-86, 1991.

[Valentin et al. 94] Dominique Valentin, Hervé Abdi, Alice J. O'Toole, Garrison W. Cottrell, “Connectionist Models of Face Processing: A Survey”, Pattern Recognition, 27:1209-1230, 1994.

[Wilder et al. 96] Joseph Wilder, P. Jonathon Phillips, Cunhong Jiang, Stephen Wiener, “Comparison of Visible and Infra-Red Imagery for Face Recognition”, Proceedings of the International Conference on Automatic Face and Gesture Recognition, ICAFGR 96, 182-187, Killington, 1996.

[Wiskott et al. 95] Laurenz Wiskott, Jean-Marc Fellous, Norbert Krüger, Christoph von der Malsburg, “Face Recognition and Gender Determination”, Proceedings of the International Workshop on Automatic Face- and Gesture-Recognition, IWAFGR 95, 92-97, Zurich, 1995.

[Yu et al. 95] Keren Yu, Xiaoyi Jiang, Horst Bunke, “Face Recognition by Facial Profile Analysis”, Proceedings of the International Workshop on Automatic Face- and Gesture-Recognition, IWAFGR 95, 208-213, Zurich, 1995.