Multi-View Face Recognition By Nonlinear Dimensionality Reduction And Generalized Linear Models

Bisser Raytchev, Ikushi Yoda and Katsuhiko Sakaue
National Institute of Advanced Industrial Science and Technology (AIST)
[email protected] [email protected] [email protected]

Abstract

In this paper we propose a new general framework for real-time multi-view face recognition in real-world conditions, based on a novel nonlinear dimensionality reduction method, IsoScale, and Generalized Linear Models (GLMs). Multi-view face sequences of freely moving people are obtained from several stereo cameras installed in an ordinary room, and IsoScale is used to map the faces into a low-dimensional space where the manifold structure of the view-varied faces is preserved, but the face classes are forced to be linearly separable. Then a GLM-based linear map is learnt between the low-dimensional face representation and the classes, providing posterior probabilities of class membership for the test faces. The benefits of the proposed method are illustrated in a typical HCI application.

1. Introduction

It is often stated that face recognition (FR) is one of the most widely studied problems in computer vision and pattern recognition. This may be true; however, the FR meant in this statement is mainly concerned with benchmark-type problems, in which different algorithms compete for better performance on large (at least as far as the number of face classes is concerned) databases of predominantly frontal or near-frontal faces or, more recently, databases containing a few view-varied faces taken in static and well-controlled environments. This approach is of course important, and has led to both interesting theoretical insights and practical applications. In this paper, however, we will be concerned with the relatively less-studied problem of FR under unconstrained real-world conditions, "unconstrained" meaning that the subjects move freely in their environment, and neither the process of face image acquisition nor the actual face recognition requires them to adopt a certain pose at a certain distance or position from the camera(s), or even to be conscious of these processes. Interest in this direction can be expected to grow, since cheap cameras and ever more powerful computers are now readily available; it is also partly stimulated by the recent need for security-motivated surveillance in public areas such as subways, airports and parking lots, and by human-computer interfaces, in which unconstrained FR has an important role to play.

It has to be appreciated also that real-time multi-view face recognition in unconstrained real-world conditions has some peculiar requirements, which differ from those usually assumed in typical benchmark-type FR studies. Some of the most obvious ones are: (1) the low resolution and poor quality of the images provided by surveillance cameras, including imprecisely detected and cropped facial images; (2) the necessity for real-time processing of huge quantities of raw face images (multidimensional input data); and (3) unbalanced datasets in which certain views may be missing for some subjects but available for others.

With these requirements in mind, we propose a new general framework which seems better suited to the task at hand. The proposed framework consists of two building blocks. First, a novel nonlinear dimensionality reduction (NDR) method, IsoScale, is used to map the original high-dimensional input face images into a low-dimensional space where the manifold structure of the view-varied faces is preserved, but the face classes are forced to be linearly separable. The linear separability condition permits, at the second step, a fast Generalized Linear Models (GLM)-based linear map to be learnt between the low-dimensional face representation and the target classes, which additionally provides posterior probabilities of class membership for the test faces; this is important for obtaining a minimum-risk decision and for treating the uncertainties that inevitably arise in unconstrained FR in a principled and well-founded way. Further benefits of the proposed method are discussed in more detail in the relevant subsections and are also illustrated in a typical HCI application and on a large face database in section 4.
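The GLM classification stage itself is developed in section 3. Purely as an illustration of what a GLM-based linear map with posterior outputs can look like, the following minimal NumPy sketch fits a multinomial logistic regression (a standard GLM for multi-class problems) to low-dimensional face representations and returns posterior class probabilities for test samples; the function and variable names (fit_glm, posteriors, Q) are illustrative and not taken from the paper.

```python
# Minimal sketch of a GLM-style linear classifier with posterior outputs
# (multinomial logistic regression), NOT the exact formulation of section 3.
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)    # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_glm(Q, labels, n_classes, lr=0.1, n_iter=500):
    """Q: (N, q) low-dimensional face representations; labels: (N,) integer class ids."""
    N, q = Q.shape
    Qb = np.hstack([Q, np.ones((N, 1))])    # append a bias column
    W = np.zeros((q + 1, n_classes))        # weight matrix, including the bias row
    T = np.eye(n_classes)[labels]           # one-hot target matrix
    for _ in range(n_iter):
        P = softmax(Qb @ W)                 # current posterior estimates
        W -= lr * Qb.T @ (P - T) / N        # gradient step on the cross-entropy loss
    return W

def posteriors(W, Q_test):
    """Posterior probabilities p(class | face) for each row of Q_test."""
    Qb = np.hstack([Q_test, np.ones((Q_test.shape[0], 1))])
    return softmax(Qb @ W)
```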


2. Dimensionality reduction methods

First we briefly review the PCA-based linear subspace method for face representation (see [1] for more details on, and variations of, this method), which will be needed for comparison later, and also to introduce some relevant ideas and notation. Then in section 2.3 we propose a new method for nonlinear dimensionality reduction.

2.1. The linear subspace method

In this method, the training data necessary to build the face model is represented by the centered data matrix $\mathbf{X} = (\vec{x}_1 - \vec{x}_{av}, \ldots, \vec{x}_N - \vec{x}_{av})$, where $\vec{x}_i$ is the $i$th training sample and $\vec{x}_{av} = N^{-1}\sum_{i=1}^{N}\vec{x}_i$. We assume that $N$ training face samples are available, including samples from all available views of several different face classes (subjects). A linear subspace model $\mathbf{Y} = (\vec{y}_1, \vec{y}_2, \ldots, \vec{y}_P)^t$ is constructed from the top $P$ principal components (PCs) obtained by PCA, i.e. by solving the eigenvalue problem $\mathbf{X}\mathbf{X}^t\mathbf{Y} = \mathbf{Y}\boldsymbol{\Lambda}$. Thus, a high-dimensional face sample $\vec{x}_i$ is projected to a $P$-dimensional space by $\vec{q}_i = \mathbf{Y}(\vec{x}_i - \vec{x}_{av})$, or more generally $\mathbf{Q} = \mathbf{Y}\mathbf{X}$ for all training samples and $\mathbf{Q}_T = \mathbf{Y}\mathbf{X}_T$ for the test samples, where $\mathbf{X}_T$ contains the test samples, e.g. a multi-view face sequence for a subject whose identity has to be determined. Usually the class of the test subject is determined by the nearest neighbor (NN) method. Alternatively, a direct map between the dimensionality-reduced face representations and the corresponding face classes can be learnt. The advantages of the latter approach will be discussed in section 3.
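The projection step can be made concrete with the following minimal NumPy sketch, which assumes faces are stored column-wise; it is a generic eigenface-style implementation written for illustration, not code from the paper.

```python
# Sketch of the linear subspace model of section 2.1: center the training faces,
# take the top-P principal components, and project training and test samples.
import numpy as np

def linear_subspace(X_train, X_test, P):
    """X_train: (d, N) training faces stored column-wise; X_test: (d, M) test faces."""
    x_av = X_train.mean(axis=1, keepdims=True)   # average face
    X = X_train - x_av                           # centered data matrix
    # Eigendecomposition of XX^t (d x d). For very large d one would instead
    # decompose the smaller N x N Gram matrix X^t X (the usual eigenface trick).
    evals, evecs = np.linalg.eigh(X @ X.T)
    order = np.argsort(evals)[::-1][:P]          # indices of the top-P eigenvalues
    Y = evecs[:, order].T                        # (P, d) projection matrix
    Q = Y @ X                                    # projected training samples
    Q_T = Y @ (X_test - x_av)                    # projected test samples
    return Y, Q, Q_T
```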

2.2. Limitations of the linear subspace method

The vector space spanned by the subset of the PCs in $\mathbf{Y}$ corresponding to the $P$ largest eigenvalues provides an optimal representation (in the sense of optimal $L_2$ reconstruction) of the faces in a $P$-dimensional subspace. However, one limitation of this approach is that the view-varying face manifolds corresponding to different classes are not linearly separable in the resulting linear subspace; in fact, it is more likely that similar views of different people end up in the same neighborhood. Consequently, learning a linear map between the dimensionality-reduced face representations and the corresponding face classes becomes impossible. A nonlinear method like the nearest neighbor (NN) rule might work better, but in the case of multi-view face recognition it would fail for the following two reasons: (1) if a certain view of face class A is not available in the training set but is available for class B, then a test sample of that view of class A will be erroneously classified as class B; (2) the NN scheme is computationally unrealistic for real-time applications, especially in the multi-view case, where distances to thousands of samples have to be calculated just to classify a single test sample. This motivates us to propose the following nonlinear method (in combination with the GLMs described in section 3), which solves the above problems by projecting the data into a low-dimensional space in which the classes are forced to be linearly separable, thus permitting a fast linear map to be learnt between the low-dimensional face representations and their classes.
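To make reason (2) concrete, the short sketch below contrasts the per-test-sample cost of the two classification schemes: nearest-neighbor search over all $N$ stored training projections versus a single linear map to $C$ class scores. The names (classify_nn, classify_linear, W) are illustrative, not from the paper.

```python
# Illustrative cost comparison: NN search vs. a learnt linear map.
import numpy as np

def classify_nn(q_test, Q_train, labels):
    """Nearest neighbor: O(N*q) distance evaluations for every single test face."""
    d2 = ((Q_train - q_test[:, None]) ** 2).sum(axis=0)   # squared distances to all N samples
    return labels[np.argmin(d2)]

def classify_linear(q_test, W):
    """Linear map to C class scores: one (q x C) product per test face, independent of N."""
    return int(np.argmax(W.T @ q_test))
```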

2.3. Nonlinear dimensionality reduction with IsoScale

The nonlinear dimensionality reduction method IsoScale, proposed here, projects a face $\vec{x}_i$ (of dimension $d$) to its lower-dimensional representation $\vec{y}_i$ (of dimension $q$, where $q \ll d$).
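As a rough illustration of the kind of manifold-preserving embedding such a method builds on, the sketch below computes a standard Isomap-style embedding (k-nearest-neighbor graph, geodesic distances via shortest paths, classical MDS). It is not the IsoScale algorithm itself: in particular, the modification that forces the face classes to be linearly separable is not implemented here, and all names are illustrative.

```python
# Sketch of a plain Isomap-style embedding (NOT IsoScale): geodesic distances on a
# k-NN graph followed by classical MDS. Preserving geodesic rather than Euclidean
# distances is what retains the manifold structure of the view-varied faces.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import shortest_path
from scipy.spatial.distance import cdist

def isomap_embed(X, q, k=8):
    """X: (N, d) face samples stored row-wise; returns an (N, q) embedding.
    Assumes the k-NN graph is connected."""
    N = X.shape[0]
    D = cdist(X, X)                                   # pairwise Euclidean distances
    idx = np.argsort(D, axis=1)[:, 1:k + 1]           # k nearest neighbors of each point
    rows = np.repeat(np.arange(N), k)
    W = np.zeros((N, N))                              # sparse k-NN graph (distance-weighted)
    W[rows, idx.ravel()] = D[rows, idx.ravel()]
    DG = shortest_path(csr_matrix(W), method="D", directed=False)   # geodesic distances
    # classical MDS (metric scaling) applied to the geodesic distance matrix
    H = np.eye(N) - np.ones((N, N)) / N               # centering matrix
    B = -0.5 * H @ (DG ** 2) @ H
    evals, evecs = np.linalg.eigh(B)
    order = np.argsort(evals)[::-1][:q]
    return evecs[:, order] * np.sqrt(np.maximum(evals[order], 0.0))
```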
