A Face Recognition Method Using Higher Order Local Autocorrelation And Multivariate Analysis

T. Kurita 1), N. Otsu 1) and T. Sato 2)

1) Electrotechnical Laboratory, 1-1-4 Umezono, Tsukuba, Japan 305
2) OITA-AIST Joint Research Center, 1977 Nakahanda, Oita, Japan 877-76

Proc. of 11th International Conf. on Pattern Recognition, The Hague, Aug. 30 - Sep. 3, 1992, Vol. II, pp. 213-216.
Abstract
This paper proposes a face recognition method which is characterized by structural simplicity, trainability, and high speed. The method consists of two stages of feature extraction. First, higher order local autocorrelation features, which are shift-invariant and additive, are extracted from an input image. Then those features are linearly combined on the basis of multivariate analysis methods so as to provide new effective features for face recognition, learned from examples.
1 Introduction
The face in an image taken by a TV camera moves at least a little even when the person is sitting on a chair. It is usually impossible to stop such motion. Therefore it is important to extract features which are invariant to translation (shift) of the face in the image. Otherwise, the matching process for features becomes complicated, necessitating preprocessing for segmentation. We have developed a prototype system for face identification on a conventional workstation with a video interface. Higher order local autocorrelation features were employed as the primitive features at the first stage of feature extraction. Those features are then linearly combined by Linear Discriminant Analysis or Multiple Regression Analysis to identify the person. The recognition rates were more than 99% for identification of 10 persons and more than 92% for 50 persons. The speed of recognition was about 2 images per second.
[Figure 1: The scheme for image recognition systems. Block labels: Input, Geometrical Feature Extraction (x), Statistical Feature Extraction by MVA (y), Output, Teacher, Adaptive Learning.]
2 Scheme for Practical Image Recognition Systems
We considered the following essential requirements for practical image recognition systems: 1) to be shift-invariant (the results do not depend on where the objects are located in the image); 2) to be additive (if there are two objects in an image, the feature values should be the sum of those of each object); and 3) to be trainable (the system can learn the task from the given training examples). Thus the following general scheme of feature extraction for image recognition systems was proposed [1]:

Geometrical Feature Extraction: General and primitive features which are shift-invariant and additive are extracted from the image.

Statistical Feature Extraction: Those features are linearly combined on the basis of multivariate analysis methods so as to provide new effective features.
Such systems can adaptively and automatically learn the task from the given supervised training samples. Fig. 1 shows the scheme of our image recognition systems. The basic idea of this scheme is similar to the Perceptron [2] or neural networks, but it is more practical, because it is based on primitive feature extraction which satisfies the essential requirements, and closed-form solutions for statistical performance criteria are obtained without slow iterative learning processes.
3 Primitive Features

3.1 Higher Order Local Autocorrelation Features
Let an image plane be denoted by P. Images on P are represented by functions f(r) ≥ 0 defined within P, where r ∈ P and the support of f, Supp(f) = {r | f(r) > 0}, is included in P: Supp(f) ⊆ P. Then a shift (translation) of f(r) within P is represented by
$$T(\mathbf{a})f(\mathbf{r}) = f(\mathbf{r} + \mathbf{a}),$$
where the displacement a ∈ R² is restricted such that the support does not exceed P. Let x[f] denote a feature of the image f(r) extracted over P. Then requirement 1), namely shift invariance of x[f], is represented by
$$x[T(\mathbf{a})f] = x[f] \quad \text{for all } \mathbf{a} \text{ such that } \mathrm{Supp}(T(\mathbf{a})f) \subseteq P.$$
Requirement 2), namely additivity of x[f], is represented by
$$x[f_1 + f_2] = x[f_1] + x[f_2] \quad \text{for } \mathrm{Supp}(f_1) \cap \mathrm{Supp}(f_2) = \emptyset.$$
These requirements lead us to features that are given by sums of local features over P. It is well known that the autocorrelation function is shift-invariant. Its extension to higher orders has been presented in [3]. The N-th order autocorrelation functions with N displacements a_1, ..., a_N are defined by
$$x_N^f(\mathbf{a}_1, \ldots, \mathbf{a}_N) = \int_P f(\mathbf{r}) f(\mathbf{r} + \mathbf{a}_1) \cdots f(\mathbf{r} + \mathbf{a}_N)\, d\mathbf{r}.$$
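As an illustration of this definition (a minimal sketch of ours, not code from the paper), the integral over P can be discretized as a sum over the pixels of a gray-level image, for an arbitrary list of displacements given as (row, column) offsets:

```python
import numpy as np

def nth_order_autocorrelation(image, displacements):
    """N-th order autocorrelation x_N^f(a_1, ..., a_N).

    `displacements` is a list of N (row, col) offsets; the integral over P is
    discretized as a sum, and the image is cropped so that every displaced
    copy stays inside the frame.
    """
    img = np.asarray(image, dtype=np.float64)
    h, w = img.shape
    drs = [0] + [d[0] for d in displacements]   # include the reference point r itself
    dcs = [0] + [d[1] for d in displacements]
    r0, r1 = -min(drs), h - max(drs)            # rows r for which every r + a_i is valid
    c0, c1 = -min(dcs), w - max(dcs)
    prod = np.ones((r1 - r0, c1 - c0))
    for dr, dc in zip(drs, dcs):
        prod *= img[r0 + dr:r1 + dr, c0 + dc:c1 + dc]
    return prod.sum()

# Example: a 2nd-order autocorrelation with displacements a_1 = (0, 1), a_2 = (1, 0):
# x = nth_order_autocorrelation(np.random.rand(300, 200), [(0, 1), (1, 0)])
```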
Since the number of these autocorrelation functions obtained by combinations of the displacements over the image f is enormous, we must reduce them for practical application. First, we restrict the order N up to the second (N = 0, 1, 2). The 0th-order autocorrelation just corresponds to the average gray level of the image f. We also restrict the range of displacements to within a local 3 × 3 window, the center of which is the reference point. By eliminating displacements that are equivalent under shifts, the number of displacement patterns is reduced to 25. Fig. 2 shows the patterns, where the symbol "*" represents "don't care". The primitive features x_j are obtained by scanning the image over P with the 25 local 3 × 3 masks and computing the sums of the products of the gray values at the pixels marked "1". These features are obviously shift-invariant and also additive for isolated objects on P.

[Figure 2: Local mask patterns for primitive feature extraction. The 25 masks comprise one 0th-order mask (the center pixel alone), four 1st-order masks (the center pixel and one neighbor), and twenty 2nd-order masks (the center pixel and two neighbors); * denotes "don't care".]
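A minimal sketch of this mask-based feature extraction (our own illustration, not the authors' implementation): each mask is written as a list of (row, column) offsets from the reference pixel, a few of which are taken from Fig. 2, and each feature is the sum over the image of the product of the gray values at those offsets.

```python
import numpy as np

# A few of the 25 local 3x3 masks of Fig. 2, written as (row, col) offsets from
# the reference (center) pixel; the full set has one 0th-order, four 1st-order
# and twenty 2nd-order masks.
EXAMPLE_MASKS = [
    [(0, 0)],                        # order 0: the center pixel alone
    [(0, 0), (0, 1)],                # order 1: center and right neighbor
    [(0, 0), (-1, 1)],               # order 1: center and upper-right neighbor
    [(0, 0), (-1, 0)],               # order 1: center and upper neighbor
    [(0, 0), (-1, -1)],              # order 1: center and upper-left neighbor
    [(0, 0), (0, -1), (0, 1)],       # order 2: horizontal triple
]

def hlac_features(image, masks=EXAMPLE_MASKS):
    """Higher order local autocorrelation features x_j for the given masks.

    Each feature is the sum over the image of the product of the gray values
    at the pixels marked "1"; a one-pixel border is cropped so that every
    shifted copy stays inside the image.
    """
    img = np.asarray(image, dtype=np.float64)
    h, w = img.shape
    features = []
    for offsets in masks:
        prod = np.ones((h - 2, w - 2))
        for dr, dc in offsets:
            prod *= img[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc]
        features.append(prod.sum())
    return np.array(features)
```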
3.2 Features on a Resolution Pyramid
What is the best spatial resolution for face recognition? The primitive features computed from the image of the highest (finest) resolution may capture only detailed information. Faces can often be recognized more easily in images of lower resolution, because faces are characterized by more global features. A pyramidal image data structure gives a set of images of different resolutions, from the highest downward [5]. A straightforward method to construct a pyramidal structure is the following. An image is partitioned into non-overlapping neighborhoods of equal size and shape, and each of those neighborhoods is replaced by its average. This operation is repeated until an image of the desired resolution is obtained. The set of higher order local autocorrelation features extracted from each of the images in the pyramidal structure contains information about the objects ranging from detailed to coarse and is still invariant to shifts of the objects. These features can therefore be good primitive features.
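A sketch of such a pyramid (our own illustration, under the assumption of square 2 × 2 averaging neighborhoods; the paper does not fix the neighborhood size):

```python
import numpy as np

def build_pyramid(image, levels=4, block=2):
    """Average-pooling pyramid: each level replaces non-overlapping
    block x block neighborhoods by their mean; trailing rows/columns that
    do not fill a complete neighborhood are dropped for simplicity."""
    img = np.asarray(image, dtype=np.float64)
    pyramid = [img]
    for _ in range(levels - 1):
        h, w = img.shape
        h, w = h - h % block, w - w % block
        img = img[:h, :w].reshape(h // block, block, w // block, block).mean(axis=(1, 3))
        pyramid.append(img)
    return pyramid

# The primitive features can then be extracted from every level and concatenated,
# e.g.  np.concatenate([hlac_features(level) for level in build_pyramid(image)]).
```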
4 Multivariate Analysis
Each of the higher-order local autocorrelation features extracted from an image would, by itself, be insufficient for the recognition task. However, those features are general and primitive, and, in total, they must carry enough information about the object to be recognized. To obtain new effective features for image recognition, it is necessary to combine them. Also, to satisfy requirement 3), trainability, we use multivariate data analysis methods for the second stage of feature extraction. In multivariate data analysis methods, new features $\mathbf{y} = (y_1, \ldots, y_N)^T$ are given by linear combinations of the primitive features $\mathbf{x} = (x_1, \ldots, x_M)^T$ with weights $A = [a_{ij}]$ and constants $\mathbf{b} = (b_1, \ldots, b_N)^T$ as follows:
$$\mathbf{y} = A^T \mathbf{x} + \mathbf{b}, \tag{1}$$
where the symbol $T$ denotes the transpose and M is the number of primitive features. The optimal parameters are determined so as to optimize a criterion function that evaluates the performance of the linear model for the given task on the learning samples. Since equation (1) is linear, the optimal parameters are usually obtained by solving linear equations or eigenvalue problems.
4.1 Linear Discriminant Analysis
A general-purpose image recognition system is constructed by using Linear Discriminant Analysis for the second stage of feature extraction of our scheme and by combining it with a classifier. Suppose that we have K classes $\{C_k\}_{k=1}^{K}$. Then the within-class and the between-class covariance matrices of the primitive features are computed from the training samples as
$$\Sigma_W = \sum_{k=1}^{K} \omega_k \Sigma_k, \qquad \Sigma_B = \sum_{k=1}^{K} \omega_k (\bar{\mathbf{x}}_k - \bar{\mathbf{x}}_T)(\bar{\mathbf{x}}_k - \bar{\mathbf{x}}_T)^T,$$
where $\omega_k$, $\bar{\mathbf{x}}_k$, $\bar{\mathbf{x}}_T$, and $\Sigma_k$ denote the a priori probability of class $C_k$ (usually set equally to $1/K$), the mean vector of class $C_k$, the total mean vector, and the covariance matrix of class $C_k$, respectively. The discriminant criterion $\mathrm{tr}(\hat{\Sigma}_W^{-1} \hat{\Sigma}_B)$ is used to evaluate the performance of the discrimination by the new features $\mathbf{y}$ and is maximized, where $\hat{\Sigma}_W$ and $\hat{\Sigma}_B$ are the within- and between-class covariance matrices defined similarly on $\mathbf{y}$. The optimal coefficient matrix $A$ is then given by solving the following eigen-equation:
$$\Sigma_B A = \Sigma_W A \Lambda \qquad (A^T \Sigma_W A = I),$$
where $\Lambda$ is a diagonal matrix of eigenvalues and $I$ denotes the unit matrix. The $j$-th column of $A$ is the eigenvector corresponding to the $j$-th largest eigenvalue. Thus, the importance of each of the $N$ new features $y_j$ for discrimination is measured by its eigenvalue. The maximum number $N$ is bounded by $\min(K - 1, M)$. To identify the class of the object, we can use a simple classifier which computes the distances from an input $\mathbf{y}$ to the class means $\{\bar{\mathbf{y}}_k\}$ and assigns the input to the class $C_k$ that gives the shortest distance.
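The following sketch (our own, with hypothetical function names, not the authors' code) estimates Σ_W and Σ_B from labelled primitive-feature vectors with equal priors ω_k = 1/K and solves the generalized eigen-equation with SciPy; if Σ_W were singular in practice, a small ridge term would have to be added.

```python
import numpy as np
from scipy.linalg import eigh

def lda_projection(X, labels, n_components=None):
    """Second-stage coefficients A for y = A^T x by Linear Discriminant Analysis.

    X: (n_samples, M) primitive features, labels: integer class labels.
    The columns of A are generalized eigenvectors of Sigma_B a = lambda Sigma_W a,
    ordered by decreasing eigenvalue and normalized so that A^T Sigma_W A = I.
    """
    classes = np.unique(labels)
    K, M = len(classes), X.shape[1]
    class_means = np.array([X[labels == k].mean(axis=0) for k in classes])
    x_total = class_means.mean(axis=0)          # total mean under equal priors
    Sigma_W = np.zeros((M, M))
    Sigma_B = np.zeros((M, M))
    for k, x_k in zip(classes, class_means):
        Xk = X[labels == k]
        Sigma_W += np.cov(Xk, rowvar=False, bias=True) / K
        Sigma_B += np.outer(x_k - x_total, x_k - x_total) / K
    eigvals, eigvecs = eigh(Sigma_B, Sigma_W)   # generalized symmetric eigenproblem
    order = np.argsort(eigvals)[::-1]
    N = n_components or min(K - 1, M)
    return eigvecs[:, order[:N]], eigvals[order[:N]]

# Classification: project an input x with A, then assign the class whose
# projected mean A^T x_k is closest in Euclidean distance.
```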
4.2 Linear Discriminant Regression Analysis
A general-purpose recognition system is also constructed by using Multiple Regression Analysis. We call this Linear Discriminant Regression Analysis. Suppose that we have K classes $\{C_k\}_{k=1}^{K}$. Let the representative vector of each class be $\mathbf{z}_k$. For example, we can set them to the orthonormal bases $\mathbf{e}_k$ of a K-dimensional vector space. The coefficients $A$ and the constants $\mathbf{b}$ are determined so as to minimize the mean square error between the desired representative vector $\mathbf{z}_k$ and the estimated vector $\mathbf{y}$. This criterion is the same as that of the back-propagation learning of multilayer neural networks. The closed-form solution is given by
$$A = \Sigma_X^{-1} \Sigma_{XZ}, \qquad \mathbf{b} = \bar{\mathbf{z}}_T - A^T \bar{\mathbf{x}}_T, \tag{2}$$
where $\Sigma_X = \sum_{k=1}^{K} \omega_k E_{C_k}\{(\mathbf{x} - \bar{\mathbf{x}}_T)(\mathbf{x} - \bar{\mathbf{x}}_T)^T\}$ and $\Sigma_{XZ} = \sum_{k=1}^{K} \omega_k E_{C_k}\{(\mathbf{x} - \bar{\mathbf{x}}_T)(\mathbf{z}_k - \bar{\mathbf{z}}_T)^T\}$. Compared with Discriminant Analysis, this requires only a matrix inversion. Thus, it is suitable for on-line face recognition. If the orthonormal bases $\mathbf{e}_k$ are used as the representative vectors, the mapping gives a linear approximation of the Bayesian posterior probabilities $\{P(C_k \mid \mathbf{x}) \mid k = 1, \ldots, K\}$ [4]. In this case, we can use a simple classifier which checks the components of $\mathbf{y}$ and assigns the input to the class $C_k$ that gives the largest value.
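A minimal sketch of Eq. (2) (our own illustration; for simplicity the prior-weighted class expectations of the paper are replaced by plain sample moments, and the optional ridge term is an assumption added for numerical stability):

```python
import numpy as np

def discriminant_regression(X, labels, ridge=0.0):
    """Closed-form fit of y = A^T x + b with orthonormal class targets z_k = e_k."""
    classes = np.unique(labels)
    K, M = len(classes), X.shape[1]
    Z = np.eye(K)[np.searchsorted(classes, labels)]   # one-hot representative vectors
    x_bar, z_bar = X.mean(axis=0), Z.mean(axis=0)
    Xc, Zc = X - x_bar, Z - z_bar
    Sigma_X = Xc.T @ Xc / len(X) + ridge * np.eye(M)
    Sigma_XZ = Xc.T @ Zc / len(X)
    A = np.linalg.solve(Sigma_X, Sigma_XZ)            # A = Sigma_X^{-1} Sigma_XZ
    b = z_bar - A.T @ x_bar
    return A, b

# To classify a new primitive-feature vector x:
#   y = A.T @ x + b
#   predicted_class = classes[np.argmax(y)]
```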
Table 1: The measure of discrimination and the recognition rate R for each resolution.

Resolution                  1      1/2    1/4    1/8    1/16   1/32   1/64
Measure of discrimination   1.14   1.46   1.40   1.29   1.29   0.84   0.67
R (%)                       95     100    98     98     98     98     100
5 Experiments

5.1 Preliminary Experiment by Discriminant Analysis
To assess the viability of this approach to face recognition, we performed a preliminary experiment with stored face images of 3 men. The number of images of each person was 20 and the size of the images was 300 by 200 pixels. After obtaining the higher order local autocorrelation features of the images at the highest resolution, new features y were extracted by Linear Discriminant Analysis. The measure of discrimination $\mathrm{tr}(\tilde{\Sigma}_T^{-1}\tilde{\Sigma}_W)$ was 1.143. With the simple classifier which checks the distances to the class means of the new features y, the recognition rate was 100%. To estimate the recognition rate for unknown data, we used the leave-one-out method. The estimated recognition rate was 95%. To investigate the influence of resolution on face recognition, we performed face recognition experiments at different resolutions. The same stored face images of 3 men were used. Pyramidal images were constructed for each image, and primitive features were extracted from the image at each resolution in the pyramid structure. The recognition rate at each resolution was measured by the leave-one-out method. Table 1 shows the measure of discrimination and the recognition rate R (%) for each resolution. Next we performed the experiment using the set of features extracted from all of the images in the pyramidal structure. In this case, the measure of discrimination was 1.917 and the recognition rate estimated by the leave-one-out method was 100%.
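A minimal sketch of the leave-one-out estimate for the nearest-class-mean classifier (our own illustration, applied here to already-projected features y; the paper does not state whether the discriminant mapping itself was refitted inside each split):

```python
import numpy as np

def loo_nearest_mean_rate(Y, labels):
    """Leave-one-out recognition rate of the nearest-class-mean classifier."""
    classes = np.unique(labels)
    n, correct = len(labels), 0
    for i in range(n):
        keep = np.arange(n) != i                     # leave sample i out
        means = np.array([Y[keep][labels[keep] == k].mean(axis=0) for k in classes])
        pred = classes[np.argmin(np.linalg.norm(means - Y[i], axis=1))]
        correct += int(pred == labels[i])
    return correct / n
```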
5.2 On-line Face Recognition System
We have developed a prototype system for on-line face recognition on a conventional workstation (SUN SparcStation) with a video interface. We used the technique described in Section 4.2 as the second stage of feature extraction. Images of the faces of 5 men and 5 women were recorded on video tape. The images were captured from the video, and primitive features were computed for about 1000 images. These were used as the learning samples. Similarly, another 1000 images were captured and used as the test samples. In these experiments we used features extracted from images at a fixed resolution. The recognition rate was 99.4%. Next we increased the number of persons to 50 and performed similar experiments. The recognition rate was then 92.2%. It could be increased by using the set of features extracted from all of the images in the pyramidal structure. Because of the simplicity of the system, the speed of learning and recognition is about 2 images per second, including the time necessary to read and display the images. This high speed implies the potential use of the method as a real-time face recognition system.
6 Conclusion
We proposed a face recognition method which consists of two stages of feature extraction (Geometrical Feature Extraction and Statistical Feature Extraction). The shift-invariant and additive feature extraction based on higher order local autocorrelation makes the face recognition system structurally simple, because a complicated matching process and preprocessing for segmentation are not necessary. Experimental results show that linear models obtained by multivariate analysis are sufficient as the second stage of feature extraction for face recognition. As a result, learning becomes very fast due to the closed-form solutions of the multivariate analysis, unlike the slow iterative learning of the Perceptron or neural networks.

References
[1] N. Otsu and T. Kurita, "A new scheme for practical flexible and intelligent vision systems," Proc. IAPR Workshop on Computer Vision, Tokyo, pp. 431-435, 1988.
[2] M. Minsky and S. Papert, Perceptrons, MIT Press, 1969.
[3] J. A. McLaughlin and J. Raviv, "Nth-order autocorrelations in pattern recognition," Information and Control, vol. 12, pp. 121-142, 1968.
[4] N. Otsu, "Mathematical Studies on Feature Extraction in Pattern Recognition" (in Japanese), Researches of the ETL, No. 818, 1981.
[5] D. H. Ballard and C. M. Brown, Computer Vision, Prentice-Hall, 1982.