Face Recognition Based on Nearest Linear Combinations
Stan Z. Li
School of EEE, Nanyang Technological University, Singapore 639798
[email protected]

Abstract

This paper proposes a novel pattern classification approach, called the nearest linear combination (NLC) approach, for eigenface-based face recognition. Assume that multiple prototypical vectors are available per class, each vector being a point in an eigenface space. A linear combination of prototypical vectors belonging to a face class is used to define a measure of distance from the query vector to the class, the measure being defined as the Euclidean distance from the query to the linear combination nearest to the query vector (hence NLC). This contrasts with nearest neighbor (NN) classification, where a query vector is compared with each prototypical vector individually. Using a linear combination of prototypical vectors, instead of each of them individually, extends the representational capacity of the prototypes by generalization through interpolation and extrapolation. Experiments show that it leads to better results than existing classification methods.

1 Introduction

Face recognition has a wide range of applications in which personal identification is required [3]. A face recognition system uses a computer to analyze a query face image, compares it with the prototypical face images recorded in a face database, and reports the identity of the person. A main concern in designing a face recognition system is that query face images are subject to changes in viewpoint, illumination, and expression when compared to the database records. To deal with such changes, two basic issues must be addressed: (i) how to represent a face class (individual) and (ii) how to classify a query face image using the selected representation.

There are two broad types of representations: geometric, feature-based and template-based. Geometric features such as the relative positions of eyes, nose and mouth in 3D are visually stable under various changes [6, 2]. When measured reliably and accurately, they provide powerful constraints for face recognition and produce good results [4]. However, automated detection of facial feature landmarks and measurement of their positions remains a challenge that has not yet been

well solved, and manual interaction has been needed [4]. Template-based representation is computationally more stable. Usually one or more prototypical templates are available for a class, and matching is performed by comparing the query with all or a subset of the templates. A crucial assumption made in template matching is that the prototypes are representative of query images under various conditions. Therefore, the location and scale of the query are normalized before the query is compared to the template. Multiple templates per person are normally used to represent changes in illumination and rotation, because such changes are difficult to compensate for by normalization operations.

To reduce the cost of representation and classification, a raw image is represented by a feature vector in a feature space. The Karhunen-Loeve (K-L) transform or principal component analysis (PCA) [5] provides the optimal reduction in the least squares sense. In [12], a set of eigenvectors is calculated from a set of prototypical (training) face images to form a basis, and any face image in the prototypical set is represented and reconstructed by a linear combination of the basis vectors. In the eigenface approach [13], which has been successfully used for face recognition, a face image, either query or prototypical, is represented by a point in the eigenface space, and a distance-based criterion is used for face recognition. Following the standard eigenface approach [13], variants have been proposed in several ways [11]. Given N individuals under M different views, a "view-based" set of M distinct eigenspaces may be constructed, each capturing the variation of the N individuals in a common view. On the other hand, the eigenface technique can be extended to represent facial features such as eyes, nose and mouth, yielding eigeneyes, eigennoses and eigenmouths. When the eigenfeatures are incorporated with the eigenfaces, a "modular" or layered eigen-representation results.
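The K-L/PCA reduction described above can be sketched in miniature. The following is our own toy illustration, not the paper's implementation: it extracts only the single dominant eigenvector of a 2-D sample set by power iteration, whereas a real eigenface basis is computed from high-dimensional vectorized face images and keeps many eigenvectors. All sample values are made up.

```python
# Toy sketch of the PCA step: find the dominant eigenvector of the sample
# covariance matrix by power iteration, then represent a (mean-centred)
# sample by its coefficient on that basis vector. Data are made-up 2-D
# "images"; a real eigenface basis would keep many eigenvectors.

def mean_vector(samples):
    n = len(samples)
    return [sum(s[i] for s in samples) / n for i in range(len(samples[0]))]

def covariance(samples, mu):
    d, n = len(mu), len(samples)
    cov = [[0.0] * d for _ in range(d)]
    for s in samples:
        c = [s[i] - mu[i] for i in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += c[i] * c[j] / n
    return cov

def power_iteration(mat, steps=200):
    # Repeatedly multiply a start vector by the matrix and renormalise;
    # this converges to the eigenvector of the largest eigenvalue.
    d = len(mat)
    v = [1.0] * d
    for _ in range(steps):
        w = [sum(mat[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

# Toy samples spread mainly along the direction (1, 1).
samples = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.1), (3.0, 2.9)]
mu = mean_vector(samples)
basis = power_iteration(covariance(samples, mu))

# Project one sample onto the basis: its 1-D "eigen-feature" coefficient.
s = samples[2]
coeff = sum((s[i] - mu[i]) * basis[i] for i in range(2))
```

For these samples the recovered basis vector points close to (0.72, 0.69), the direction of largest variance, and the mean-centred sample (0.5, 0.6) projects to a coefficient of about 0.78.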
Copyright 1998 IEEE. Published in the Proceedings of CVPR'98, June 1998, Santa Barbara, CA.

A recent direction is to model faces in terms of probabilities. When the sample images of similar faces, seen from a standard view and under standard illumination, are accurately aligned, the distribution can be modeled by a single Gaussian distribution [9]. However, the distribution of face images over varying conditions is highly nonconvex and complex [1]. In this case, a mixture-of-Gaussians model is proposed [9]. A problem with modeling every class probabilistically and separately is that the training set is often too small for computing a class. A solution is to compute two basic distributions instead, intra-personal and extra-personal, after warping the query image with each of the prototypical images [8]. Two distributions, instead of one per person, can be computed more reliably because the sets of intra-person and extra-person data can be sufficiently large.

In this paper, we are more interested in performing face recognition using non-parametric pattern recognition approaches. The nearest neighbor (NN) rule is often used for classification. The NN relies crucially on the assumption that the prototypes are representative of query images. The results depend on how the prototypes are chosen to account for possible image variations and also on how many prototypes are available. No matter how representative the prototypes may be, there are always un-prototyped viewing, expression and illumination conditions, because only a finite, often small, number of prototypes are available as compared to all possibilities. An approach is needed to cope with the missing conditions.

In [10], a spline interpolation method is presented for capturing varying viewing and illumination conditions in industrial inspection environments. A rigid object is imaged on a turntable (parameterized by a single parameter) under carefully controlled lighting (parameterized by another single parameter). The training points in an eigenspace, representing the training images of an inspected part, are ordered according to one of the parameters, e.g.
the viewing angle or the lighting angle. However, ordering according to a single parameter can be difficult in face recognition because (i) there are more parameters in face imaging than in rigid object imaging; (ii) a face imaged live is subject to position, rotation and expression changes, and thus obtaining strictly parameter-controlled face images is difficult; (iii) the parameters describing changes in viewpoint, illumination and facial expression, if known, are not easily separable for face recognition.

As far as imaging parameters are concerned, the linear combination approach for 3D object recognition from 2D images [14] is less restrictive. In that approach, a 3D object is represented by a linear combination of 2D boundary maps of the object, and knowledge of the imaging parameters is not required. An object in the image is considered an instance of the model object if it can be expressed as a linear combination of the model views for some set of coefficients. In another work [15], a linear combination of 3D prototypical views is used to synthesize new views of an object.

In this paper, a novel pattern classification approach, called the nearest linear combination (NLC) approach, is proposed for face representation and recognition. Inspired by both the eigenface [13] and linear combination [14] approaches, the NLC is a template-based approach in which eigenfaces are used as the prototypical templates and multiple prototypes (as few as one) belonging to the same person are linearly combined to infer a new image of the person. The idea is to expand the representational capacity [7] of the prototypes x_1, ..., x_m of each face class by linear combinations (LCs) x = a_1 x_1 + ... + a_m x_m of them. An LC interpolates or extrapolates the prototypes {x_k}. Variations in lighting, viewing angle and expression among the prototypical face images are accounted for by variations in the weights that determine the linear combinations. This virtually provides an infinite number of prototypical points, and thus accounts for more image changes than the original prototypes.

In the calculation of the distance between a query vector and a face class, the query is projected onto the subspace spanned by the prototypes. The projection point is the linear combination that is nearest to the query, hence called the NLC. The distance between the query and the NLC is used as the basis for the classification. Based on such distances, the conventional NN classification, which compares each prototype individually, is extended to the nearest NLC classification, which compares the NLC from the prototypes of each class: the query face is classified as belonging to the class having the minimum NLC distance. The NLC of a query image may also be used for inferring where the query face lies relative to the prototypical faces in terms of lighting, viewing angle and expression.
Experiments performed on data sets from six databases show that the NLC significantly improves the error rates as compared to the standard eigenface method and the NN (cf. demos at http://markov.eee.ntu.ac.sg:8000/~szli/demos.html). The rest of the paper is organized as follows: Section 2 introduces the NLC representation and describes NLC-based face recognition, and Section 3 presents experimental results.

2 The Nearest Linear Combination Approach

In NN classification, there are one or multiple prototypes per class, and the decision is made based on the distance between the query and each individual prototype. The results rely crucially on the assumption that the prototypes are representative of the query image. There should be a sufficient number of representative prototypes to account for as many changes in face images as possible. However, it is impractical to exhaust all possibilities; there are infinitely many of them. In practice, only a few, typically one to a couple of dozen, prototypes are available per class. The NLC aims to expand the representational capacity of the available prototypes to cope with possible changes.

2.1 Linear Combination

Assume that there are C classes and a set of N_c prototypes for class c, denoted {x_1, x_2, ..., x_{N_c}} (it is not necessary that all the classes have the same number N_c). A prototype x_i = (x_{1i}, ..., x_{Di}) is a row vector in a D-dimensional feature space (in this paper, the eigenface space). A linear combination (LC) of size m is a weighted sum of m prototypes

x = x(A) = \sum_{k=1}^{m} a_k x_k = A^T X    (1)

where X = (x_1, ..., x_m)^T is the matrix formed by stacking the m row vectors and A^T = (a_1, ..., a_m) is the vector of weights.
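Eq. (1) is simply a weighted sum of prototype vectors. As a minimal illustration (ours, with made-up prototype values, not the paper's data):

```python
# Sketch of Eq. (1): x = a_1 x_1 + ... + a_m x_m over prototype row vectors.
# Prototype values below are made up for illustration.

def linear_combination(weights, prototypes):
    d = len(prototypes[0])
    return [sum(a * x[i] for a, x in zip(weights, prototypes))
            for i in range(d)]

x1 = [1.0, 0.0, 0.0]
x2 = [0.0, 1.0, 0.0]
x = linear_combination([0.5, 0.5], [x1, x2])  # midpoint of x1 and x2
```

With equal weights 0.5 the combination interpolates the two prototypes at their midpoint (0.5, 0.5, 0.0); other weight choices extrapolate beyond them.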

In the proposed approach, the prototypes x_1, ..., x_m for constructing an LC must be of the same class; cross-class combinations are not considered. The within-class LCs are used to produce samples of that class. Given X, an LC is determined by the weights A. Define the linear combination space, denoted S(x_1, ..., x_m), as the space spanned by x_1, ..., x_m. When m < D, S(x_1, ..., x_m) is a subspace of the whole eigenface space. An LC interpolates or extrapolates the prototypes {x_k} in S(x_1, ..., x_m).

A constrained linear combination (CLC) is defined as a linear combination subject to the following constraint on the weights:

C(A) = \sum_{k=1}^{m} a_k = 1    (2)

Under this constraint, only m - 1 of the weights are free. Letting the first m - 1 be free, the last weight is expressed as a_m = 1 - \sum_{k=1}^{m-1} a_k. Let

x = x_m + \sum_{k=1}^{m-1} a_k x'_k = x_m + A'^T X'    (3)

where x'_k = x_k - x_m, X' = (x'_1, ..., x'_{m-1}), and A'^T = (a_1, ..., a_{m-1}). The constrained linear combination space, denoted S'(x_1, ..., x_m), is the space spanned by the basis vectors x'_1, ..., x'_{m-1}. When m <= D, it is a subspace of S(x_1, ..., x_m).

The subspaces S and S' are based on the prototypes belonging to a single class. They suggest two scopes of variation in the samples of the class. Variations in lighting, viewing angle and expression among the prototypical face images are accounted for by variations in the weights that determine the linear combination. This virtually provides an infinite number of prototypical points, and thus accounts for more image changes than the original prototypes. The representational capacity of the prototypes for a class is thus expanded.

2.2 Nearest Linear Combination

Let the feature vector of the query image be y (of the same dimensionality as x_i). Its Euclidean distance to an LC is

e(A) = ||y - x(A)||    (4)

which depends on the weights A, given y and X. We define the nearest linear combination (NLC) of the m points (x_1, ..., x_m) for y, without the constraint (2), as the linear combination that minimizes e(A). This is a least squares problem. When x_1, ..., x_m are linearly independent, the NLC weights can be calculated using y and the pseudo-inverse X^+:

A* = arg min_A e(A) = y X^+    (5)

When they are linearly dependent, the calculation can be done using singular value decomposition. The NLC is the projection of y onto the (unconstrained) linear combination space, with the projection point being

p = p(y; x_1, ..., x_m) = x(A*)    (6)

When the constraint (2) is imposed, we have the nearest CLC (NCLC). The constrained optimization can be converted to an unconstrained one determined by m - 1 of the m weights, say, A' formed by the first m - 1 weights. The minimal weights can be found as

A'* = arg min_{A'} e(A') = y' X'^+    (7)

where y' = y - x_m and X'^+ is the pseudo-inverse of X'. The NCLC can then be calculated using the same formula (6), with a_m = 1 - \sum_{k=1}^{m-1} a_k.

Figure 1: Schematic illustration of NLC and NCLC (see text) in a 3D feature space, with prototypes x_1 = (1, 0, 0) and x_2 = (0, 1, 0), query y = (1, 1, 1), and projection points p_NLC and p_NCLC.

The NLC is the projection of y onto the subspace spanned by x_1, ..., x_m, whereas the NCLC is the projection onto the subspace spanned by x'_1, ..., x'_{m-1}. Fig. 1 illustrates NLC and NCLC in a 3D feature space (D = 3) with m = 2 prototypes x_1 = (1, 0, 0) and x_2 = (0, 1, 0) and the query vector y = (1, 1, 1). The coordinates of the NLC are p_NLC = (1, 1, 0); it is the projection of y onto the subspace (plane) spanned by the vectors x_1 and x_2. The coordinates of the NCLC are p_NCLC = (0.5, 0.5, 0); it is the projection of y onto the subspace (line) spanned by the vector x_1 - x_2. The concept generalizes to higher-dimensional feature spaces and more than two prototypes.

Seemingly, NN classification is equivalent to NLC with m = 1, but this is not true. Let x_1 = (1, 0, 0) and y = (0.5, 0.5, 0.5). The projection point is p = (0.5, 0, 0) = 0.5 x_1. The NLC is based on d(y, p), whereas the NN is based on d(y, x_1) directly.

Since the LC space is a subspace of the (D-dimensional) eigenspace, as m (the number of prototypes) approaches D, the LC space approaches the whole eigenspace and the projection distance goes to zero. What is the optimal range of values of m in terms of accuracy? For the NN approach, the performance should improve as m increases, but that may not be the case for the NLC. More analysis is needed to investigate this issue. Here, we offer the following comments on m from the computational viewpoint. In practice, normally D = 40 and N_c < D. The range of the m value is {1, ..., N_c} for NLC and {2, ..., N_c} for NCLC. The total number of possible linear combinations of size m for class c is the binomial coefficient C(N_c, m). For example, when N_c = 5 and m = 2, C(N_c, m) = 10; when m = N_c, we have C(N_c, m) = 1.
For computational convenience, m = N_c is suggested, because only one NLC or NCLC then needs to be calculated. Our experiments suggest that the overall best improvements are gained when m = N_c.
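To make the NLC and NCLC computations concrete, the following sketch (ours, not code from the paper) reproduces the worked example of Fig. 1, assuming linearly independent prototypes so that the normal equations (X X^T) A = X y^T have a unique solution; this is equivalent to the paper's pseudo-inverse formulation, and linearly dependent prototypes would instead call for SVD.

```python
# Sketch of the NLC / NCLC of Section 2.2, solved via the normal equations
# with a small hand-rolled Gaussian-elimination solver. Assumes linearly
# independent prototypes (unique least-squares solution).

def solve(mat, rhs):
    # Gaussian elimination with partial pivoting for a small linear system.
    n = len(mat)
    a = [row[:] + [rhs[i]] for i, row in enumerate(mat)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (a[i][n] - sum(a[i][j] * x[j] for j in range(i + 1, n))) / a[i][i]
    return x

def nlc(y, protos):
    # Unconstrained NLC: weights A minimising ||y - sum_k a_k x_k||,
    # found from the Gram (normal-equation) system.
    m = len(protos)
    gram = [[sum(p * q for p, q in zip(protos[i], protos[j]))
             for j in range(m)] for i in range(m)]
    rhs = [sum(p * q for p, q in zip(protos[i], y)) for i in range(m)]
    A = solve(gram, rhs)
    p = [sum(A[k] * protos[k][i] for k in range(m)) for i in range(len(y))]
    return A, p

def nclc(y, protos):
    # Constrained NCLC: substitute a_m = 1 - sum(a_1..a_{m-1}) as in Eq. (3)
    # and solve the reduced least-squares problem in the primed variables.
    xm = protos[-1]
    primed = [[u - b for u, b in zip(p, xm)] for p in protos[:-1]]
    y0 = [v - b for v, b in zip(y, xm)]
    A_red, p0 = nlc(y0, primed)
    p = [b + v for b, v in zip(xm, p0)]
    return A_red + [1.0 - sum(A_red)], p

# The worked example of Fig. 1.
x1, x2 = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]
y = [1.0, 1.0, 1.0]
A_nlc, p_nlc = nlc(y, [x1, x2])      # projection onto the plane of x1, x2
A_nclc, p_nclc = nclc(y, [x1, x2])   # projection onto the line through x1, x2
err = sum((a - b) ** 2 for a, b in zip(y, p_nlc)) ** 0.5  # d(y, p_NLC)
```

The sketch recovers p_NLC = (1, 1, 0) and p_NCLC = (0.5, 0.5, 0) as in Fig. 1, with d(y, p_NLC) = 1. The NCLC weight a_1 = 0.5 lies in (0, 1), so by the reading of Section 2.4 the projection interpolates the two prototypes.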

2.3 Face Recognition Based on the NLC

A query face y is classified into one of C face classes based on a distance measure of how far y is from each class. In the NLC approach, the distance is the Euclidean distance between y and its projection point p = p(y; x_1, ..., x_m) (of either NLC or NCLC):

d(y, p) = ||y - p||    (8)

With m fixed, there are C(N_c, m) combinations for class c, and therefore a total of \sum_{c=1}^{C} C(N_c, m) distances to be calculated for classifying y. The face represented by y is classified as belonging to the class represented by the x_1, ..., x_m for which the distance d(y, p) is minimum.


The recognized class c, the minimized distance d(y, p), the position parameters {a_k}, and the retrieved prototype faces represented by {x_k} are given as the recognition result. The position parameters can be used to infer the position of the query face relative to the retrieved prototype faces, as will be explained shortly. The pseudo-inverse matrices can be computed off-line because they are determined by the prototypical vectors only, regardless of the query vector y. With X^+ available, the minimal weights A* = y X^+ can be calculated quickly using pre-stored coefficients.

In the standard eigenface method [13], the classification is based on the nearest distance to the class center (NC). It can be considered a special case of constrained linear combination: there, the mean of the prototypical vectors of class c, \bar{x} = (1/N_c) \sum_{k=1}^{N_c} x_k, is used to represent that class. This is equivalent to an LC with identical weights, a_k = 1/N_c (1 <= k <= N_c). However, this setting does not yield good results, not even as good as NN classification. Our experimental results show that the NN yields better results than the NC, and the NLC and NCLC better than the NN.
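The classification rule above can be sketched as follows (our illustration, not the paper's code): for each class, project the query onto the span of that class's prototypes and pick the class with the minimum projection distance. Here m = 2 prototypes per class, so the normal equations are solved directly with Cramer's rule; all class names and "eigenface" feature values are made up.

```python
# Sketch of Section 2.3's rule: classify a query y by the minimum NLC
# projection distance d(y, p) over classes, each with m = 2 prototypes.
# Class names and feature vectors are made up for illustration.

def nlc_distance(y, x1, x2):
    # Normal equations (Gram system) for the weights A = (a1, a2):
    #   [x1.x1  x1.x2] [a1]   [x1.y]
    #   [x2.x1  x2.x2] [a2] = [x2.y]
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    g11, g12, g22 = dot(x1, x1), dot(x1, x2), dot(x2, x2)
    r1, r2 = dot(x1, y), dot(x2, y)
    det = g11 * g22 - g12 * g12      # nonzero when x1, x2 are independent
    a1 = (r1 * g22 - r2 * g12) / det
    a2 = (g11 * r2 - g12 * r1) / det
    residual = [y[i] - (a1 * x1[i] + a2 * x2[i]) for i in range(len(y))]
    return dot(residual, residual) ** 0.5

classes = {
    "person_A": ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]),  # spans the xy-plane
    "person_B": ([0.0, 0.0, 1.0], [0.0, 1.0, 1.0]),  # spans the yz-plane
}
y = [1.0, 1.0, 0.1]  # query lying almost in person_A's subspace
best = min(classes, key=lambda c: nlc_distance(y, *classes[c]))
```

The query lies a distance 0.1 from person_A's subspace but distance 1.0 from person_B's, so the rule returns "person_A". In the paper's setting there would be C such classes and, for m < N_c, several prototype combinations per class.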

2.4 Linear Interpolation and Extrapolation of Face Images

Representationally, an NCLC can be considered a generalization of its prototypes. Denote a face image by z. Consider a change from z_1 to z_2 in the image space and the corresponding change from x_1 to x_2 in the feature space. The size of the change may be measured by the variation \Delta z = ||z_2 - z_1|| or \Delta x = ||x_2 - x_1||. When \Delta z \to 0 and thus \Delta x \to 0, the locus of x due to the change can be approximated well enough by a straight line segment between x_1 and x_2. Thus any change between the two can be interpolated by a point on the line, and a further small change beyond x_2 can be extrapolated using the linear model.

The NCLC weights {a_k^*}, denoted {a_k} hereafter for brevity, can be regarded as the position parameters of p relative to {x_k}. Consider the case where m = 2. The two prototypes, x_1 and x_2, are linearly combined as p = x_2 + a_1 (x_1 - x_2). When a_1 = 1, p = x_1. When a_1 = 0, p = x_2. When 0 < a_1 < 1, p is an interpolating point between x_1 and x_2. When a_1 < 0, p is a "forward" extrapolating point on the x_2 side. When a_1 > 1, p is a "backward" extrapolating point on the x_1 side.

This concept generalizes to situations where m > 2. In this case, a constrained linear combination is a point on the hyperplane spanned by x'_1, ..., x'_{m-1}. When a_k > 0 for all k and \sum_{k=1}^{m-1} a_k < 1, p is an interpolating point. When a_k < 0 for some k or \sum_{k=1}^{m-1} a_k > 1, it is an extrapolating point. Let y be a face image. When d(y, p) is small, the


relative position of p may be used to infer that of y. Fig. 2 illustrates the use of the NCLC weights for inferring the (viewpoint, illumination, or expression) position of y relative to two prototypes x_k (k = 1, 2). The value of the position parameter a_1 indicates how y is projected onto the line x_1 x_2 at the projection point p = x_1 + a_1 (x_2 - x_1). It can be used to infer the position of p, and of y if the projection distance is small, relative to x_1 and x_2, whether interpolating or extrapolating. For the top row of the figure, the parameter value is a_1 = 0.234, indicating that the query face is an interpolation of the two prototypical images. For the middle row, it is calculated as a_1 = 1.138, indicating that the query face is a forward extrapolation of the two prototypical faces. For the bottom row, it is a_1 = -0.519, indicating that the query face is a backward extrapolation of the two prototypical faces.

Figure 2: (Top row) Faces under viewpoint changes. The query face y (left) is at a center angle relative to the two prototypical faces x_1 and x_2, viewed at a right and a left angle, respectively. (Middle row) Faces under illumination changes. The query face y (left) is illuminated by a right light, as compared to the two prototypical faces x_1 and x_2 illuminated by a left and a center light, respectively. (Bottom row) Faces under expression changes.

3 Experiments

Results for two sets of experiments are given. The first set compares four classification methods: (1) the nearest center (NC) used in the standard eigenface method [13], whose performance is used as the baseline; (2) the nearest neighbor (NN) method; (3) the nearest linear combination (NLC), with m = N_c = 5; and (4) the nearest constrained linear combination (NCLC), also with m = N_c = 5. The second set compares the NCLC itself with varying parameter m = 2, 3, 4, 5. The eigenface approach is used as the bottom-level feature representation. The tests are performed on two databases in which the images are subject to variations in viewpoint, illumination, expression, race and gender. (1) In the first database, the set of prototypes contains a total of 600 images of 120 individuals (5 images each, i.e. N_c = 5) from four databases: 40 individuals from the Cambridge database, 30 individuals from the Bern database, 13 individuals from the MIT database, and 37 individuals from our own database. The query set contains a total of 200 images of 40 individuals from the Cambridge database. (2) The second database includes all the data in the first one, plus the following. For the prototypical set: 25 images of 5 individuals from the Harvard database and 75 images from the Yale database were added. For the query set: 25 images of 5 individuals from the Harvard database and 25 images of 5 individuals from the Harvard and Yale databases were added. The faces are located using a simple algorithm with manual interaction, which does not align the faces accurately but provides a basis for the relative comparisons.

For the first set of experiments, the error rates of the four compared methods are plotted in Fig. 3 as functions of the number of eigenfaces. We see that the methods can be ordered in descending error rates as NC, NN, NLC, NCLC. NLC and NCLC are significantly better than NC and NN.
When D is between 30 and 40, the error rate of NCLC is about 45% of that of NC for the first data set and 40% for the second; it is about 60% of that of NN for the first data set and 75% for the second. The analysis suggests that NLC and NCLC are better classification methods than NC and NN, and that between the two, NCLC is preferable. Therefore the use of NCLC is recommended.

The second set of experiments compares the error rates within NCLC as m increases from 2 to 5. The result is shown in Fig. 4. For both data sets, we see that the differences in error rates between different m settings are not significant when D is between 30 and 40. For data set 2, however, the error rate with m = 2 drops faster when D is increased from 15 to 20. But we

consider this phenomenon a coincidence with that data set. As commonly reported, the best classification is achieved with D around 40, suggesting the use of D = 40. With D = 40, the use of m = N_c is suggested for computational convenience because, in this case, only one combination need be computed. The NC has complexity O(M) and the NN has complexity N_c O(M). With the pseudo-inverses having been calculated, the NLC and NCLC have the same complexity as the NC, faster than the NN. This takes a fraction of a second to compute on an HP-9000/770 workstation.

Figure 3: Comparison of error rates of the four methods (NC, NN, NLC, NCLC) for data set 1 (left) and data set 2 (right); error rate (%) is plotted against the number of basis vectors (10 to 40).

Figure 4: Comparison of error rates of NCLC with varying combination numbers m = 2, 3, 4, 5, for data set 1 (left) and data set 2 (right); error rate (%) is plotted against the number of basis vectors (10 to 40).

4 Conclusions

The NLC approach is a novel classification approach which is shown to significantly reduce the error rates of the standard eigenface and NN classification approaches in eigenface-based face recognition. The improvement is due to the fact that the NLC representation expands the representational capacity of the available prototypes in the face database: variations in lighting, viewing angle and expression among the prototypical face images are accounted for by variations in the weights that determine the linear combinations. Currently, the feature vectors used in the paper are those in an eigenface space. The approach can be extended naturally to incorporate any other types of feature vectors, as long as there is more than one prototype for some classes and a distance-based criterion is suitable for classification. As a general method, the NLC has also produced good results for audio classification and retrieval.

Acknowledgments

This work was supported by NTU AcRP projects RG 43/95 and RG 51/97.

References

[1] M. Bichsel and A. P. Pentland. "Human face recognition and the face image set's topology". CVGIP: Image Understanding, 59:254-261, 1994.
[2] R. Brunelli and T. Poggio. "Face recognition: Features versus templates". IEEE Transactions on Pattern Analysis and Machine Intelligence, 15:1042-1052, 1993.
[3] R. Chellappa, C. Wilson, and S. Sirohey. "Human and machine recognition of faces: A survey". Proceedings of the IEEE, 83:705-740, 1995.
[4] I. J. Cox, J. Ghosn, and P. Yianilos. "Feature-based face recognition using mixture-distance". In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 209-216, 1996.
[5] K. Fukunaga. Introduction to Statistical Pattern Recognition. Academic Press, Boston, 2nd edition, 1990.
[6] A. J. Goldstein, L. D. Harmon, and A. B. Lesk. "Identification of human faces". Proceedings of the IEEE, 59(5):748-760, May 1971.
[7] S. Z. Li and J. Lu. "Generalizing capacity of face database for face recognition". In Proceedings of Third IEEE International Conference on Automatic Face and Gesture Recognition, Nara, Japan, April 14-16, 1998.
[8] B. Moghaddam, C. Nastar, and A. Pentland. "A Bayesian similarity measure for direct image matching". Media Lab Tech Report No. 393, MIT, August 1996.
[9] B. Moghaddam and A. Pentland. "Probabilistic visual learning for object representation". IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):696-710, July 1997.
[10] H. Murase and S. Nayar. "Visual learning and recognition of 3-D objects from appearance". International Journal of Computer Vision, 14:5-24, 1995.
[11] A. P. Pentland, B. Moghaddam, and T. Starner. "View-based and modular eigenspaces for face recognition". In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 84-91, 1994.
[12] L. Sirovich and M. Kirby. "Low-dimensional procedure for the characterization of human faces". Journal of the Optical Society of America A, 4(3):519-524, March 1987.
[13] M. A. Turk and A. P. Pentland. "Eigenfaces for recognition". Journal of Cognitive Neuroscience, 3(1):71-86, March 1991.
[14] S. Ullman and R. Basri. "Recognition by linear combinations of models". IEEE Transactions on Pattern Analysis and Machine Intelligence, 13:992-1006, 1991.
[15] T. Vetter and T. Poggio. "Linear object classes and image synthesis from a single example image". IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):733-742, 1997.