Proceedings of the 32nd Hawaii International Conference on System Sciences - 1999

Design of a Vision System for Identity Verification

Massimo Tistarelli, Andrea Lagorio, Massimo Jentile and Enrico Grosso
University of Genova - DIST, Computer Vision Laboratory
Via Opera Pia 13 - 16145 Genova, Italy
e-mail: ftista,[email protected]

Abstract

The use of biometric data for automated identity verification is one of the major challenges in many application domains. This is certainly a formidable task, which requires the development of a complex system including several concurrent agents operating in real time. In this paper, a system for automated identity verification (currently under development within a European research project) encompassing the active vision paradigm is described. In our approach the amount of data to be processed is limited by selecting and analysing only a few areas within the face image. The number of pixels in each area is also reduced by applying a space-variant conformal mapping. The devised system does not require special hardware. On the other hand, robustness can be enforced by performing the final matching with more than a single image. This may require adopting a simple, coarse-scale, multi-processor architecture. The system is conceived for banking applications but can be ported to a variety of industrial applications. Several experiments on identity verification, performed on real images, are presented.

Keywords: Biometric Authentication, Recognition, Computer Vision, Surveillance, Artificial Intelligence

1 Introduction

The automatic verification of a person's identity is a very interesting issue in both social and industrial environments. Examples include surveillance, law enforcement, secure access control, smart interfaces and home marketing (World Wide Web). Many approaches have been proposed, ranging from fingerprint recognition to retinal scan and facial image analysis. Most of these methods are characterized by intrinsic limitations or low reliability and require active cooperation from the user. In principle the analysis of face images seems to be the best way to perform identity authentication and also the most acceptable for people: this is what every human being does in everyday life. On the other hand, many difficulties arise from the enormous dimensionality of the search space when dealing with natural images. This is certainly a formidable task, which requires the development of complex systems including several concurrent agents operating in real time.

In the literature, limited efforts have been devoted to limiting or circumventing the complexity of vision-based recognition/verification systems. On one side, several vision problems have been tackled using very sophisticated mathematical tools, ending with algorithms which can hardly be implemented in real-time systems. On the other side, industrial applications pushed the development of practical, simple vision systems based on many artificial constraints. In general, it is quite difficult to adopt this approach when dealing with natural environments, while it is almost impossible to interact with a non-cooperating subject. At present, few commercial systems exist for the recognition of human faces (see footnote 1).

A remarkable attempt to fill this gap, producing simple and practical vision systems by reducing the complexity of the problems, has been made by the research on active vision. The basic assumptions stem from the observation that there are specific mechanisms in natural perceptual systems which are purposively designed to:

1. reduce the complexity of the visual process;
2. optimize the resources required to accomplish a given task.

For example, in humans the capability to move and to perform planned fixations is very important to give a better description of the face, but also to reduce the amount of information analysed. This is also accomplished at the sensor level, by adopting an appropriate sampling of the data on the image plane.

In order to achieve an acceptable reliability level for the introduction of this technology in industry and in social life, it is necessary to realize flexible and robust systems. This paper presents encouraging results in this direction. A system for automated identity verification (currently under development within a European research project) encompassing the active vision paradigm is described.
The system is based on 4 sequential modules:

- detection of the customer's face;
- tracking of the face with 2 active (mobile) cameras (until the card is inserted in the smart card reader);
- detection of facial features within each image in a subset of the acquired sequence, and extraction of space-variant sub-images;
- matching of the model face (read from the personal smart card) with the extracted features.

Footnote 1: They are all based on neural networks, which have the intrinsic limitation of being quite inefficient when simulated on conventional computers. Moreover, neural nets can perform a good generalization, but the performance degrades significantly as the number of images to be handled increases.

0-7695-0001-3/99 $10.00 (c) 1999 IEEE

2 Recognition and Active Vision

One of the main reasons for the great complexity of recognition/verification tasks is the amount of information to be processed. To achieve any visual task, all natural perceptual systems are capable of interacting with the environment and getting as much information as needed, purposively controlling the flow of input data and limiting the amount of information acquired from the sensory system [1, 2, 3]. This is a key aspect in reducing the complexity of the "computing system". The anatomy of the human visual system is a clear example: despite the formidable acuity in the fovea centralis (1 minute of arc) and the wide field of view (about 140 x 200 degrees of solid angle), the optic nerve is composed of only 1 million nerve fibers. The space-variant distribution of the cells in the retina allows a formidable data flow reduction. In fact, the same resolution would result in a space-invariant sensor of about 600,000,000 pixels [4].

Another important perceptual mechanism related to the data acquisition process is the attention mechanism. Again, as not all input data (visual in our case) is relevant for a given task, the perceptual system must be capable of selecting the input signal along various dimensions: "signal space", depth, motion etc. The selection is controlled by a proper attention mechanism through ad-hoc band-limiting or focusing processes. The active vision paradigm takes into account these and other considerations related to existing perceptual systems, to realize complex artificial vision systems which are able to perform a given task under general assumptions in real time [5].

2.1 Space-variant imaging

It is generally assumed that, to recognize an object, it is necessary to provide a high resolution description of the most salient features of the object of interest. This is accomplished either by "capturing" these parts of the scene in rapid succession (see footnote 2) or by moving an interest window over a high resolution image [6]. On the other hand, it is not sufficient to scan the scene or the image with a high resolution window: it is also necessary to provide some information on the area around the window. A way to meet these requirements is to adopt a space-variant sampling strategy of the image, where the central part of the visual field is sampled at a higher resolution than the periphery. In this way the peripheral part of the visual field, coded at low resolution, can still be used to describe the context in which high resolution data is located.

Footnote 2: This mechanism implies an efficient motion control to quickly direct the gaze toward different areas of the scene.

Figure 1: Parameters of retino-cortical mapping. (Left) Any position P on the retinal plane can be expressed in polar $(\rho, \theta)$ or Cartesian $(x, y)$ coordinates. (Right) On the log-polar plane the same position is identified by $(\xi, \eta)$.

Many different models of space-variant image geometries have been proposed, like the truncated pyramid [6], the reciprocal wedge transform (RWT) [7], the complex logarithmic mapping (CLM) [8, 9] and the log-polar mapping [10, 11]. The analytical formulation of the log-polar mapping describes the mapping that occurs between the retina (retinal plane $(\rho, \theta)$) and the visual cortex (log-polar or cortical plane $(\xi, \eta)$). The derived logarithmic-polar law, taking into account the linear increment in size of the receptive fields going from the central region (fovea) towards the periphery, is given by:

$$ x = \rho \cos\theta, \quad y = \rho \sin\theta \qquad \text{and} \qquad \eta = q\,\theta, \quad \xi = \log_a \frac{\rho}{\rho_0} \tag{1} $$

where $a$ defines the amount of overlap among neighbouring receptive fields, $\rho_0$ is the radius of the innermost circle, $1/q$ corresponds to the minimum angular resolution of the log-polar layout, and $(\rho, \theta)$ are the polar coordinates (see figure 1). In the presented system the log-polar transformation is computed at frame rate by using special remapping software routines. This is possible because the mapping can be performed by addressing the image pixels through specially designed look-up tables. This approach has several advantages over special hardware devices, like space-variant sensors: it is more flexible and allows the use of conventional, low-cost cameras.
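The look-up-table remapping can be illustrated with a minimal sketch that inverts equation (1): for every log-polar cell $(\xi, \eta)$ the source pixel is computed once, and each new frame is then remapped by simple table addressing. The function names, the choice of the image centre as fixation point and the nearest-pixel sampling are assumptions made for illustration only; the paper's routines and the averaging over overlapping receptive fields are not reproduced here.

```python
import math

def build_logpolar_lut(width, height, n_rings, n_sectors, rho0, a):
    """Precompute the source pixel (x, y) for each log-polar cell (xi, eta)."""
    cx, cy = width // 2, height // 2      # fixation point: image centre (assumption)
    q = n_sectors / (2.0 * math.pi)       # eta = q * theta, so 1/q is the angular step
    lut = []
    for xi in range(n_rings):
        rho = rho0 * a ** xi              # inverse of xi = log_a(rho / rho0)
        ring = []
        for eta in range(n_sectors):
            theta = eta / q
            x = int(round(cx + rho * math.cos(theta)))
            y = int(round(cy + rho * math.sin(theta)))
            inside = 0 <= x < width and 0 <= y < height
            ring.append((x, y) if inside else None)   # None marks cells off the image
        lut.append(ring)
    return lut

def remap(image, lut, fill=0):
    """Apply a precomputed table to an image stored as a list of rows."""
    return [[image[p[1]][p[0]] if p is not None else fill for p in ring]
            for ring in lut]
```

Because the table depends only on the geometry, it is built once and reused for every frame, which is what makes a software-only implementation at frame rate plausible.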

3 A system for identity verification

When dealing with access verification from image data, there are two distinct problems: face detection (i.e. detecting a face within an image) and recognition or identity verification (i.e. identifying a person given an image of the face). There are many approaches to face detection and recognition/verification [12, 13, 14, 15, 16, 17, 18].


In real applications, where a huge number of customers have access to several stations, it is quite difficult to maintain a large database of facial images (or any useful representation of them) and to ensure efficient retrieval of the data for recognition. As an example, consider a marketing system based on the World Wide Web or the huge network of automatic teller machines of banks worldwide. The available bandwidth and the security constraints required for commercial transactions do not allow, at present, this solution to be pursued. For this reason, in real applications, it is more feasible to perform identity verification instead of recognition. Identity verification is based on a representation of the subject's appearance which is provided by the customer himself. This is possible by providing the access control system with a chip card reader (see footnote 3) and storing, on the customer's personal chip card, a representation of the person's face. It is like having an invisible mugshot printed within the card.

3.1 The system set-up

The developed system for person identification is sketched in figure 2. Color images are continuously acquired from the left camera and processed to detect faces based on the skin color. The area of the face within the image roughly determines the distance of the subject from the camera. When the customer is sufficiently close to the ATM, a color-based stereo face tracker is activated, which also moves the cameras to keep the face in the center of the images. The image coordinates of the detected face are used to direct the gaze of both cameras toward the face center and to start grabbing stereo images of the face. The grabbed face images are stored in a circular array (in RAM), containing the last 5 images acquired, until a card is inserted into the chip card reader.

The last face images are extracted from the circular array and processed to detect 3 facial features: the eyes and the mouth. The face images are cropped and resampled to obtain a log-polar space-variant representation of each facial feature. The sampled images (which will be referred to as "fixations") are used either to build/update a face representation or to verify the identity of the subject.

The identity verification system can be described by the diagram in figure 3. Even though the modules are arranged in a pipeline, data parallelism can be enforced in two ways:

- within each processing module. For example, the model matching may be decomposed into several modules, each one processing a single image feature;
- as the matching with the model has to be performed on several images (probably 5), all images may be processed in parallel.

In this way a simple multi-processor architecture may be sufficient to reach real-time performance.

Footnote 3: A smart card, or chip card, is a bank card with a small dynamic memory on it. This is used to store data about the customer's account and also data for identification.

Figure 2: The system set-up, conceived within a banking application: an automatic teller machine (ATM).
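The circular array holding the last 5 face images can be sketched with a bounded double-ended queue: once the buffer is full, each newly grabbed frame silently replaces the oldest one. The class and method names below are hypothetical and only illustrate the buffering behaviour described in this section, not the authors' implementation.

```python
from collections import deque

class FrameBuffer:
    """Keeps only the most recent `capacity` frames (5 in the described set-up)."""

    def __init__(self, capacity=5):
        self._frames = deque(maxlen=capacity)   # oldest frame is dropped automatically

    def push(self, frame):
        """Store a newly grabbed frame."""
        self._frames.append(frame)

    def latest(self, n=None):
        """Return the last n frames, oldest first (all stored frames if n is None)."""
        frames = list(self._frames)
        if n is None:
            return frames
        return frames[max(0, len(frames) - n):]
```

When the card is inserted, a call such as `buffer.latest()` would hand the stored frames to the feature extraction stage.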

3.2 Facial features detection

The technique applied to detect the facial features relies on the application of morphological operators in order to extract contours, valleys and peaks in the grey levels. This information is gathered together to make hypotheses about the presence of specific facial features. For example, the visible part of the sclera of the eye corresponds to a peak in the grey levels, while the nostrils correspond to valleys. The generalized Hough transform is applied for the rough localization of the facial features, and a final matching with adaptive templates is performed to refine the localization process.

Morphological operators have been successfully used in a number of applications involving classical problems of image processing (contour extraction, segmentation, shape analysis) and image restoration (noise suppression). Denoting by $\oplus$ the dilation operator and by $\ominus$ the erosion operator, the image contours are detected as:

$$ C_e(r, c) = I(r, c) \oplus M(h, k) - I(r, c) \tag{2} $$

where $M(h, k)$ is a two-dimensional mask, $I(r, c)$ is the input image and $C_e(r, c)$ is the contour image. The dimension of the mask is tailored to the frequency band of the features to be selected. In our case a 7x7 circular mask specially devised for the extraction of the image contours has been used [19]. Morphological operators are particularly suitable for the extraction of structural properties of the image, like peaks and valleys in the gray levels [20]:

$$ P(r, c) = I(r, c) - \left( I(r, c) \ominus M(h, k) \right) \oplus M(h, k) \tag{3} $$

$$ V(r, c) = \left( I(r, c) \oplus M(h, k) \right) \ominus M(h, k) - I(r, c) \tag{4} $$

where $P(r, c)$ denotes the peaks image, $V(r, c)$ denotes the valleys image (see figure 4) and $M(h, k)$ is the contour extraction mask.
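Equations (2)-(4) can be sketched in a few lines of plain Python. The sketch assumes a flat (binary), symmetric mask that contains its own centre, and it ignores neighbours falling outside the image; these border and mask conventions, like the function names, are assumptions of the example rather than details given in the paper.

```python
def _morph(image, mask, pick):
    """Grey-level dilation (pick=max) or erosion (pick=min) with a flat mask."""
    rows, cols = len(image), len(image[0])
    mh, mw = len(mask), len(mask[0])
    oh, ow = mh // 2, mw // 2
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # collect the in-image neighbours selected by the mask
            vals = [image[r + h - oh][c + k - ow]
                    for h in range(mh) for k in range(mw)
                    if mask[h][k]
                    and 0 <= r + h - oh < rows and 0 <= c + k - ow < cols]
            out[r][c] = pick(vals)
    return out

def dilate(image, mask):
    return _morph(image, mask, max)

def erode(image, mask):
    return _morph(image, mask, min)

def _diff(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def contours(image, mask):      # equation (2): dilation minus original
    return _diff(dilate(image, mask), image)

def peaks(image, mask):         # equation (3): original minus opening
    return _diff(image, dilate(erode(image, mask), mask))

def valleys(image, mask):       # equation (4): closing minus original
    return _diff(erode(dilate(image, mask), mask), image)
```

With a 3x3 mask of ones, an isolated bright pixel on a dark background survives only in the peaks image, and an isolated dark pixel only in the valleys image, which is the behaviour exploited for the sclera and the nostrils.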


Figure 3: Functional and computational diagram of the identity verification system. On the left is the image acquisition and detection loop. Once the face is located at a given distance from the camera, the tracking loop (in the middle) is activated. Face images are stored in a circular stack in RAM. As the card is inserted, facial features are extracted from the last stored images and the log-polar mapping is performed. Finally the feature images are matched against the model read from the chip card.

Each dilation/erosion step requires scanning the entire image and substituting each pixel value with the maximum or minimum within a given neighborhood. A fast dedicated DSP may be required to compute the peaks and valleys images at frame rate.

The generalized Hough transform is applied for the rough localization of the eyes. In fact, the region containing the eye is characterized by many curved contours with a strong concentricity. Two histograms are computed to locate the center of circular contour segments [21]:

$$ X_o(x_o) = \sum_{y_o=0}^{y_{max}} \; \sum_{r=r_{min}}^{r_{max}} \; \oint_{C[r,(x_o, y_o)]} I(x, y)\, ds $$

$$ Y_o(y_o) = \sum_{x_o=0}^{x_{max}} \; \sum_{r=r_{min}}^{r_{max}} \; \oint_{C[r,(x_o, y_o)]} I(x, y)\, ds $$

where $I(x, y)$ is the image intensity and $C[r, (x_o, y_o)]$ denotes the circle along which the integral is computed. This procedure is applied to the contours image, producing a starting point for the fitting of the eye template and also an initial estimate for the position of the nostrils.

Two models have been considered to accurately determine the position of the eyes. The former is composed of a circle and a pair of parabolas, with nine real parameters to be identified [22]. The latter is a very simple deformable model (an ellipse), which is described by four parameters (the 2 axes and the position of the center) to represent the iris of the subject. Both models are driven towards the final position by a cost function which is computed on the peaks/valleys images. Due to the high computational cost required for the first eye model, only the simplified model has been used.

As the shape of the nostrils strongly depends on the subject, the exact position is computed by looking for isolated spots on the valleys image. The mouth is searched for in the valleys image within a limited region below the position of the nostrils.

The extraction of facial features is certainly a time-consuming process. For social acceptability of the system it is mandatory to limit the time required for the identity verification process. Given the time required to read the smart card inserted by the customer, which is about 10 seconds, about 7 seconds can be devoted to the extraction of the facial features and 3 seconds to initializing the model matching process. As already mentioned, the use of a multiprocessor architecture, including 5 or more processors, and a DSP board to perform the morphological filtering stages, would allow the processing time to be kept within the required bounds.
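The two centre histograms used earlier in this section for the rough eye localization can be sketched as follows. The circular line integral is approximated by summing a fixed number of samples along each circle, and the centre estimate is taken as the pair of histogram maxima; the sample count, the exhaustive scan over all candidate centres and the function names are assumptions of this example, not details from [21] or from the described system.

```python
import math

def circle_sum(image, xo, yo, r, n_samples=32):
    """Approximate the integral of the image along the circle C[r, (xo, yo)]."""
    rows, cols = len(image), len(image[0])
    total = 0.0
    for k in range(n_samples):
        t = 2.0 * math.pi * k / n_samples
        x = int(round(xo + r * math.cos(t)))
        y = int(round(yo + r * math.sin(t)))
        if 0 <= x < cols and 0 <= y < rows:     # samples outside the image are skipped
            total += image[y][x]
    return total

def centre_histograms(contour_image, r_min, r_max):
    """Accumulate X_o(x_o) and Y_o(y_o) over all candidate centres and radii."""
    rows, cols = len(contour_image), len(contour_image[0])
    hist_x = [0.0] * cols
    hist_y = [0.0] * rows
    for yo in range(rows):
        for xo in range(cols):
            s = sum(circle_sum(contour_image, xo, yo, r)
                    for r in range(r_min, r_max + 1))
            hist_x[xo] += s     # marginal over y_o
            hist_y[yo] += s     # marginal over x_o
    return hist_x, hist_y

def estimate_centre(contour_image, r_min, r_max):
    """Rough centre estimate: the position of the maximum of each histogram."""
    hist_x, hist_y = centre_histograms(contour_image, r_min, r_max)
    return hist_x.index(max(hist_x)), hist_y.index(max(hist_y))
```

The exhaustive scan is costly, so in practice it would presumably be restricted to the image region where the face has already been detected.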

3.3 Visual identity verification

Identity verification is related to, but differs from, face recognition. Therefore, techniques that have been formulated and successfully applied for the former may not give good results with the latter, and vice versa. For face recognition the population of all possible subjects (not including the impostors) is known a priori (the database), while it is unknown for verification: everyone can be an impostor. For this reason face verification is, in principle, a more difficult task than recognition. It is necessary to bound the person's representation very tightly in the chosen feature space.

Figure 4: Results of the morphological processing: (from left to right) original image, contours, valleys, peaks.

Figure 5: Results concerning the facial features extraction for a subset of 4 images among 20 acquired. Distinct markers indicate the starting point for the eye localization procedure and the final localization of the eyes, while + denotes the final localization of the nostrils.

Figure 6: The face verification workspace. The leftmost blocks (up to the "Getinfo" block) represent the image acquisition and facial features coordinates extraction procedures. The rightmost blocks (from the 3 "Absolute diff" blocks to the end) represent the matching between the two subjects as the computation of the integral of the absolute differences. In the middle, the upper blocks represent the processing of the reference subject data, and the lower blocks the processing of the data from the new subject. Each row represents the processing steps applied to a single fixation.

In order to perform the identity verification, the customer is represented by a model (stored on the smart card), which is simply a collection of fixations from his face image. The matching is performed by computing the correlation between the model of the reference subject and the data extracted from the acquired images. The algorithm is based on the following steps:

1. Given the position of selected facial features (the eyes and the mouth), log-polar fixations are extracted from the acquired image of the subject.
2. The log-polar images are warped according to the pose and orientation of the reference subject's face (generally parallel to the image plane).
3. The gray levels of each log-polar fixation are normalized to the range of intensity values of the corresponding facial feature of the reference subject.
4. Corresponding fixations are compared by computing either the sum of the absolute values of the gray level differences, or the normalized correlation, or the sum of squared differences (SSD). In the presented experiments the absolute difference has been used.

This process is applied to several images acquired from the customer at the ATM. The biometric representation of the user (to be stored on the chip card) is composed of:

- 3 log-polar images representing the eyes and mouth (approximately 6 Kbytes);
- the max and min intensity values (2 bytes);
- the value of the interocular distance (2 bytes);
- the threshold value to verify the subject's identity (4 bytes).

This method has been implemented under the Khoros development environment (see footnote 4). The block diagram of the developed workspace is shown in figure 6.

In order to take into account small movements of the head, the images are first "warped" to obtain a view as close as possible to the reference face position. Given a stereo image pair of the face, with the cameras verging on the same point of the face, the image warping is performed by first computing the three-dimensional plane containing the eyes and the mouth of the imaged face.
The vector $\vec{n}$ normal to the plane containing the eyes and the mouth can be computed from the image coordinates of the extracted facial features. From the basic inverse perspective equations applied to the coordinates $(x_l, y_l)$ and $(x_r, y_r)$ of the feature points detected in the two images, the real coordinates $(X_l, Y_l, Z_l)$ in the world are computed:

$$ X_l = \frac{x_l}{f_l} Z_l, \qquad Y_l = \frac{y_l}{f_l} Z_l \tag{5} $$

[The closed-form expression for $Z_l$ that completes (5) is not legible in the available copy. It depends on $f_l$, $f_r$, $B$, the image coordinates $(x_l, y_l)$ and $(x_r, y_r)$, and the two relative camera angles.]

where the two angles represent the difference in the pan and tilt angles of the two cameras, $B$ is the baseline of the stereo cameras, $f_l$ and $f_r$ are the focal lengths of the left and right camera expressed in pixels, and $(x_l, y_l)$ and $(x_r, y_r)$ are the coordinates of a feature point on the left and right image respectively.

Footnote 4: Khoros is a C++ based programming environment which allows the user either to build a complete system for fast prototyping (a "workspace") by connecting, within a visual environment called "Cantata", functional modules of pre-designed C++ functions, or to pack a complete system in a command script to be executed within the unix shell.

The normal vector to the face plane is computed from the coordinates of the 3 feature points in space:

$$ a = X_1 - X_3, \quad b = Y_1 - Y_3, \quad c = Z_1 - Z_3 $$

$$ d = X_2 - X_3, \quad e = Y_2 - Y_3, \quad f = Z_2 - Z_3 $$

$$ \vec{N} = \begin{vmatrix} \hat{X} & \hat{Y} & \hat{Z} \\ a & b & c \\ d & e & f \end{vmatrix} \;\Longrightarrow\; (bf - ce,\; cd - af,\; ae - db), \qquad \vec{n} = \frac{\vec{N}}{\|\vec{N}\|} \tag{6} $$
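Equation (6) is the normalized cross product of two edge vectors of the triangle formed by the three feature points. A minimal sketch follows, with a hypothetical function name and points given as (X, Y, Z) tuples; the check for collinear points is an addition of the example.

```python
import math

def face_plane_normal(p1, p2, p3):
    """Unit normal of the plane through three 3D points, following equation (6)."""
    a, b, c = (p1[i] - p3[i] for i in range(3))
    d, e, f = (p2[i] - p3[i] for i in range(3))
    # components of the cross product (a, b, c) x (d, e, f)
    nx, ny, nz = b * f - c * e, c * d - a * f, a * e - d * b
    norm = math.sqrt(nx * nx + ny * ny + nz * nz)
    if norm == 0.0:
        raise ValueError("the three feature points are collinear")
    return (nx / norm, ny / norm, nz / norm)
```

For a face parallel to the image plane, for instance eyes and mouth all at the same depth, the resulting normal points along the optical axis.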

The normal vector $\vec{n}$ is computed to reproject each image feature on the image plane. Even though the planar approximation is not accurate for the whole face, it is satisfactory when considering a limited neighborhood of each facial feature. Therefore, the image warping is applied to each log-polar fixation separately. It is worth noting that rotations of the face on the image plane are inferred from the mutual position of the eyes on the image plane, while the scale is adjusted by comparing the interocular distance of the two subjects on the image plane. Given the scale and rotation invariance of the log-polar mapping, both the orientation and the scale are corrected by shifting the pixels along the two directions of the log-polar images.

From equations (5) it is possible to compute the real interocular distance of the imaged subject. This measure is compared with the same measurement from the reference subject as an additional check for identity verification.

One of the major problems in image matching is the change in the illumination conditions. Even though the shape of each feature generally does not change, the gray levels may change, introducing unexpected errors in the computed distance between the compared subjects. For this reason, two normalization procedures are applied:

- the mean gray level is computed for each fixation and subtracted to obtain a zero-mean distribution of the gray levels;
- the gray levels are "warped" to match the range of intensity values of the reference subject. This procedure has the effect of enhancing similarities in the local shape of the intensity values.

These procedures are enclosed in the "normalize" block of the Khoros workspace in figure 6.
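The normalization and matching steps can be illustrated with a short sketch. It assumes that the range matching is a linear stretch to the reference minimum and maximum (the two values stored on the card), that the stretch is applied before the mean is removed, and that the total distance is the sum of the absolute differences over all fixations; the paper does not specify these details, and all function names are hypothetical.

```python
def stretch_to_range(fixation, ref_min, ref_max):
    """Linearly map the gray levels of a fixation onto [ref_min, ref_max]."""
    lo = min(min(row) for row in fixation)
    hi = max(max(row) for row in fixation)
    if hi == lo:                                  # flat image: nothing to stretch
        return [[float(ref_min)] * len(row) for row in fixation]
    scale = (ref_max - ref_min) / (hi - lo)
    return [[(p - lo) * scale + ref_min for p in row] for row in fixation]

def zero_mean(fixation):
    """Subtract the mean gray level from every pixel."""
    pixels = [p for row in fixation for p in row]
    mean = sum(pixels) / len(pixels)
    return [[p - mean for p in row] for row in fixation]

def absolute_difference(reference, candidate):
    """Sum of absolute gray level differences after normalization."""
    ref_min = min(min(row) for row in reference)
    ref_max = max(max(row) for row in reference)
    ref = zero_mean(reference)
    cand = zero_mean(stretch_to_range(candidate, ref_min, ref_max))
    return sum(abs(r - c) for rr, rc in zip(ref, cand) for r, c in zip(rr, rc))

def verify(model_fixations, candidate_fixations, threshold):
    """Accept the claimed identity if the total distance does not exceed the threshold."""
    total = sum(absolute_difference(m, c)
                for m, c in zip(model_fixations, candidate_fixations))
    return total <= threshold
```

With this normalization, a candidate fixation that differs from the reference only by a uniform brightness offset or a contrast change yields a distance of zero, while a genuinely different gray level pattern does not.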

Figure 7: Output of the first verification experiment. On the X axis two poses for each of the 8 subjects are plotted. The first two positions on the abscissa represent the two poses of the subject coded in the database. The leftmost part represents the matching score for the first subject.

4 Experiments

In order to analyze the performance of the system, a sequence of 16 images including 2 views from each of 8 subjects was acquired. The resolution of the images is 512x512 pixels with 8 bits per pixel. For each view 3 log-polar images were extracted, centered on the eyes, nose and lips. The two views of each subject differ in the expression of the face or in a slight motion of the head. In figure 7, the results obtained by performing an identity verification experiment are shown.

Additional experiments were performed to test the matching technique on a subset of the FERET database [23], including 20 subjects and 2 frontal views for each subject. The subjects were chosen according to the following criteria:

1. at least two images of the same subject were available, to perform the verification test;
2. the size of the head within the images is comparable to the images acquired with the actual system (considering the average distance of the subject when the authentication should be performed).

The images are all black and white, with 8 bits of resolution in intensity and different sizes. Two example views of two subjects are shown in figure 8. In figure 9, the result of the process for identity verification applied to subject 5 against all other subjects is shown. The log-polar images have 34 eccentricities with 46 receptive fields for each circle. The overlap factor of the receptive fields is 1.6 in both directions. By applying the log-polar mapping the face representation only requires 4692 (1564x3) bytes. This results in a compression rate of 1:19.

The full verification test has been performed by enclosing the workspace shown in figure 6 within a loop under "Cantata" (the Khoros run-time environment) to compare all 20 subjects. A sample of the recognition results is displayed in figure 9, where the score obtained by matching subjects number 1, 5, 10 and 20 with all the other subjects is plotted. Two different images have been used to match each subject against himself. This experiment corresponds to performing a verification trial for these subjects. The same experiment has been performed for all 20 subjects, and all subjects were correctly identified. Nonetheless, the matching score of the correct subject is sometimes very close to others, as for subjects 4 and 20 or 9 and 10. Therefore, in order to tune the acceptance threshold, it will be necessary to train the system over a large number of face images. In this way it will also be possible to define statistically the false acceptance/false rejection ratio for each subject.

The experiments on the FERET database were performed within the Cantata programming environment of Khoros, on a variety of computing platforms including a Silicon Graphics O2, a Sun Ultra Sparc 1 and a Pentium Pro-based workstation. Even in this context (which includes the huge burden of the Khoros programming environment and the X11 display server), the system requires about one minute to perform the complete verification cycle. More effort is still required to pack the system into a set of binaries and verify the real processing time. Further experiments will also be performed on a multi-processor architecture.
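The threshold tuning mentioned above can be sketched as follows: given matching distances from genuine trials (a subject against himself) and from impostor trials, the false acceptance and false rejection rates are computed for a candidate threshold, and one simple criterion picks the threshold that minimizes their sum. The function names and the selection criterion are assumptions of this example, and any distances passed to it are purely illustrative; no such statistics are reported in the paper.

```python
def error_rates(genuine, impostor, threshold):
    """Return (false acceptance rate, false rejection rate) for one threshold.

    A trial is accepted when its distance does not exceed the threshold.
    """
    false_accepts = sum(1 for d in impostor if d <= threshold)
    false_rejects = sum(1 for d in genuine if d > threshold)
    return false_accepts / len(impostor), false_rejects / len(genuine)

def best_threshold(genuine, impostor):
    """Pick, among the observed distances, the threshold minimizing FAR + FRR."""
    candidates = sorted(set(genuine) | set(impostor))
    return min(candidates, key=lambda t: sum(error_rates(genuine, impostor, t)))
```

When the genuine and impostor distances overlap, as observed for some subject pairs, no threshold drives both rates to zero, which is why a larger training set is needed before fixing the value stored on the card.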

5 Conclusion

The identification of individuals from face images has been considered. The realization of complex computer vision systems requires limiting the amount of data to be processed. This is mandatory to ensure real-time performance, sometimes even using special-purpose hardware devices. The active vision approach, through the use of mobile cameras and hints from natural perceptual systems, makes it possible to reduce the amount of data to be processed and to design practical, real-time vision systems.

A system has been described, based on the active vision paradigm, for personal identification in banking applications. Some advantages of the "active" approach for identity verification have been presented. In particular, the motion of the cameras, associated with a space-variant representation of the images, allows a reduction in the size of the face model and consequently also in the time required for identity verification. Several aspects have been addressed explicitly and many are still under investigation. For example, one topic which still has to be studied more deeply is the parallel implementation of some of the modules composing the system. This analysis would be very important to further improve the real-time performance of the system.

Figure 8: Original images and log-polar fixations for subjects number 5 and 14, extracted from the FERET database.

Figure 9: Output of the matching performed between subjects (from top to bottom) 1, 5, 10, 20 and all other subjects. On the abscissa is the subject number and on the ordinate is the distance between the reference and each other subject.

Acknowledgements

We thank M. Perrone and S. Rimassa for their valuable contribution to the development of this paper. This work has been partially funded by the LTR Esprit Project 21894 "VIRSBS".

References

[1] D.H. Ballard. Animate vision. Artificial Intelligence, 48:57-86, 1991.
[2] G. Sandini and M. Tistarelli. Vision and space-variant sensing. In H. Wechsler, editor, Neural Networks for Perception: Human and Machine Perception. Academic Press, 1991.
[3] Y. Aloimonos. Purposive, qualitative, active vision. CVGIP: Image Understanding, 56(special issue on qualitative, active vision):3-129, July 1992.
[4] E. L. Schwartz, D. N. Greve, and G. Bonmassar. Space-variant active vision: definition, overview and examples. Neural Networks, 8(7/8):1297-1308, 1995.
[5] M. Swain and M. Stricker. Promising directions in active vision. Intern. Journal of Computer Vision, 11(2):109-126, 1993.
[6] P. J. Burt. Smart sensing in machine vision. In Machine Vision: Algorithms, Architectures, and Systems. Academic Press, 1988.
[7] F. Tong and Z.N. Li. The reciprocal-wedge transform for space-variant sensing. In 4th IEEE Intl. Conference on Computer Vision, pages 330-334, Berlin, 1993.
[8] R.C. Jain, S.L. Bartlett, and N. O'Brian. Motion stereo using ego-motion complex logarithmic mapping. IEEE Trans. on PAMI, PAMI-9(3):356-369, 1987.
[9] E. L. Schwartz. Spatial mapping in the primate sensory projection: Analytic structure and relevance to perception. Biological Cybernetics, (25):181-194, 1977.
[10] C. F. R. Weiman and G. Chaikin. Logarithmic spiral grids for image processing and display. Comp. Graphics and Image Processing, (11):197-226, 1979.
[11] G. Sandini and V. Tagliasco. An anthropomorphic retina-like structure for scene analysis. CGIP, 14 No.3:365-372, 1980.
[12] I. Craw, D. Tock, and A. Bennett. Finding face features. In Proc. of second European Conference on Computer Vision, pages 92-96, S. Margherita Ligure (Italy), 1992. Springer Verlag.
[13] V. Bruce, A. Coombes, and R. Richards. Describing the shapes of faces using surface primitives. Image and Vision Computing, 11(6):353-363, 1993.
[14] R. Brunelli and T. Poggio. Face recognition: Features versus templates. IEEE Trans. on PAMI, PAMI-15(10):1042-1052, 1993.
[15] L. Sirovich and M. Kirby. Application of the Karhunen-Loeve procedure for the characterization of human faces. IEEE Trans. on PAMI, PAMI-12(1):103-108, 1990.
[16] M. Turk and A. Pentland. Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3(1):71-79, March 1991.
[17] G. Robertson and I. Craw. Testing face recognition systems. In Proc. of 4th British Machine Vision Conference, pages 25-34, University of Surrey, Guildford (UK), 1993.
[18] M. Tistarelli and E. Grosso. Active face recognition with a hybrid approach. Pattern Recognition Letters, 18:933-946, 1997.
[19] J. Serra. Introduction to mathematical morphology. Computer Vision Graphics and Image Processing, 35:283-305, 1986.
[20] J. Serra. Image Analysis and Mathematical Morphology. Academic Press, London, 1982.
[21] D.E. Benn, M.S. Nixon, and J.N. Carter. Robust eye center extraction using the Hough transform. In Audio and Video based Person Authentication - AVBPA97, pages 3-9. Springer, 1997.
[22] Alan L. Yuille, Peter W. Hallinan, and David S. Cohen. Feature extraction from faces using deformable templates. International Journal of Computer Vision, 8:99-111, 1992.
[23] P. J. Phillips, H. Moon, P. Rauss, and S. A. Rizvi. The FERET September 1996 database and evaluation procedure. In Audio and Video based Person Authentication - AVBPA97, pages 395-402. IAPR, Springer, 1997.
