Tensor-Based Face Representation and Recognition Using Multi-Linear Subspace Analysis
Hadis Mohseni, Member, IEEE, Shohreh Kasaei, Senior Member, IEEE, Sharif University of Technology, Tehran, Iran.
[email protected],
[email protected]. Abstract Discriminative subspace analysis is a popular approach for a variety of applications. There is a growing interest in subspace learning techniques for face recognition. Principal component analysis (PCA) and eigenfaces are two important subspace analysis methods have been widely applied in a variety of areas. However, the excessive dimension of data space often causes the curse of dimensionality dilemma, expensive computational cost, and sometimes the singularity problem. In this paper, a new supervised discriminative subspace analysis is presented by encoding face image as a high order general tensor. As face space can be considered as a nonlinear submanifold embedded in the tensor space, a decomposition method called Tucker tensor is used which can effectively decomposes this sparse space. The performance of the proposed method is compared with that of eigenface, Fisherface, tensor LPP, and ORO4×2 on ORL and Weizermann databases. Conducted experimental results show the superiority of the proposed method. Keywords — Tensor, Face Recognition, Subspace Analysis, Multi-linear Discriminant Analysis
1. INTRODUCTION
Face recognition has been a very active research topic in the pattern recognition and computer vision communities in recent years. It has a wide range of applications, such as identity authentication, access control, surveillance, and content-based indexing [1]. Despite remarkable progress, the general task of face recognition remains a challenging problem due to the complex patterns caused by variations in illumination conditions, facial expressions, and poses. Many face recognition methods have been proposed during the past 30 years. Face recognition is such a challenging yet interesting problem that it has attracted researchers with different backgrounds (e.g., psychology, pattern recognition, neural networks, computer vision, and computer graphics). At a high level, face recognition techniques are divided into three main groups [2]:

1. Holistic matching methods: such as the well-known PCA, which use the whole face region as the input to the recognition system.
2. Feature-based (structural) matching methods: which typically use local features such as the eyes, nose, and mouth, and their locations and statistics (geometric appearance) for recognition purposes.
3. Hybrid methods: which use a combination of local features and the whole face, just as the human perception system does. One can argue that these methods could potentially offer superior performance over the other two types of methods.

One of the most important issues in face recognition is to select an efficient representation of face images. Psychophysical studies have suggested that visual perception tasks such as similarity judgment tend to operate on a low dimensional representation of the sensory data. Many representation approaches for face recognition have been suggested (such as simple low resolution "thumbnail" images and geometric features). Another widely used methodology is holistic/local image decomposition with some special 2-D signals (so-called image kernels) such as eigenfaces [3]. Most previous work on statistical image analysis represents an image by a vector in a high dimensional space. However, an image is intrinsically a matrix, or a second order tensor. The relationship between the row vectors and the column vectors of this matrix may be important for finding a projection, especially when the number of training samples is small. Recently, multilinear algebra, the algebra of higher order tensors, has been applied for analyzing the multifactor structure of image ensembles.
In [5], a novel face representation algorithm called tensorface is proposed, which represents a set of face images by a higher order tensor and extends the SVD to the HOSVD [6]. This way, the multiple factors related to expression, illumination, and pose can be separated along different dimensions of the tensor. To appreciate the benefits of tensor-based approaches, one can consider the discrimination power of the different methods used for data projection and discrimination. Many such techniques have been proposed in the literature, including PCA, LDA, Fisher linear discriminant (FLD), and neural network classifiers. Despite the success of these methods in many applications, they often suffer from the small sample-size problem when dealing with high dimensional face data. This problem is exacerbated by the rasterization of 2-D image data into 1-D vectors prior to processing, which may conceal higher order structure in images, as mentioned previously (e.g., concatenation of rows can effectively obscure correlations along columns) [4]. From another point of view, these linear methods are often inaccurate, while nonlinear methods incur a high computational cost. Recently, multilinear models have been proposed as a solution to this problem, with higher accuracy than linear models and lower computational cost than nonlinear methods. In this paper, a new hybrid method is presented. Faces are considered as a high dimensional tensor, and a method based on the higher order singular value decomposition (HOSVD) is applied for subspace analysis. This tensor-image consists of both the original face image and a modified version of it. The modified face image exploits the natural symmetry that exists in face images, especially in the frontal view. In other words, to a good approximation, it is possible to divide the face region into two approximately identical left and right halves using a vertical line that crosses the nose tip. This symmetry diminishes gradually as the face rotates from the frontal view toward the profile view, where there is no symmetry at all.
Therefore, the proposed hybrid method attempts to utilize the information in both the global and symmetry-based face images to derive a more discriminative subspace. More details are provided in Section 3. The rest of the paper is organized as follows. In Section 2, the tensor space and tensor subspace analysis are introduced. In Section 3, the procedure and the characteristics of the proposed method are discussed. Section 4 is devoted to the conducted face recognition experiments on two standard databases and compares the obtained results with some traditional subspace learning methods. Finally, Section 5 discusses the conclusions and future work.
2. TENSOR SPACE AND TENSOR SUBSPACE ANALYSIS
Before describing our proposed method, we discuss in greater detail the context of supervised dimensional reduction with tensor data. To facilitate this discussion, we first review some fundamental definitions on tensors and describe a tensor-based subspace analysis (called HOSVD) and its properties.
2.1 Tensor Definition
In multidimensional linear algebra, a tensor of order n (or nD-tensor) is a multidimensional array $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_n}$. For example, a matrix $A \in \mathbb{R}^{I_1 \times I_2}$ is a tensor of order 2 (a 2D-tensor), and $a_{ij}$ denotes the element of the $i$th row and $j$th column. In the following, we introduce several definitions in tensor analysis which are essential to present the discriminative orthogonal tensor decomposition.

Definition 1 (k-mode product): In tensor algebra, the k-mode product of a tensor $\mathcal{A} \in \mathbb{R}^{I_1 \times \cdots \times I_k \times \cdots \times I_n}$ and a matrix $U \in \mathbb{R}^{J_k \times I_k}$ is the mapping

$$\times_k : \mathbb{R}^{I_1 \times \cdots \times I_k \times \cdots \times I_n} \times \mathbb{R}^{J_k \times I_k} \rightarrow \mathbb{R}^{I_1 \times \cdots \times J_k \times \cdots \times I_n} \quad (1)$$

such that

$$(\mathcal{A} \times_k U)_{i_1 \cdots i_{k-1}\, j\, i_{k+1} \cdots i_n} = \sum_{i_k=1}^{I_k} a_{i_1 \cdots i_k \cdots i_n}\, u_{j i_k}. \quad (2)$$
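The k-mode product in Eq. (2) is easy to realize through mode-k unfolding: move mode k to the front, flatten the remaining modes, multiply by U on the left, and fold back. Below is a minimal NumPy sketch; the helper names `unfold`, `fold`, and `mode_product` are ours for illustration, not part of any library.

```python
import numpy as np

def unfold(A, k):
    """k-mode unfolding: move axis k to the front, then flatten the rest."""
    return np.moveaxis(A, k, 0).reshape(A.shape[k], -1)

def fold(M, k, shape):
    """Inverse of unfold: reshape, then move the first axis back to position k."""
    full = [shape[k]] + [s for i, s in enumerate(shape) if i != k]
    return np.moveaxis(M.reshape(full), 0, k)

def mode_product(A, U, k):
    """k-mode product A x_k U for U of size (J_k x I_k), per Eq. (2)."""
    out_shape = list(A.shape)
    out_shape[k] = U.shape[0]
    return fold(U @ unfold(A, k), k, out_shape)

# For a 2D-tensor (matrix), the 1-mode product reduces to the plain product U @ A.
A = np.arange(6.0).reshape(2, 3)
U = np.random.randn(4, 2)
assert np.allclose(mode_product(A, U, 0), U @ A)
```

A useful identity to remember is that unfolding commutes with the product: the k-mode unfolding of `A x_k U` equals `U @ unfold(A, k)`.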
The k-mode product is generally denoted by $\mathcal{B} = \mathcal{A} \times_k U$.

Definition 2 (k-mode unfolding): The k-mode unfolding of an nth-order tensor $\mathcal{A} \in \mathbb{R}^{I_1 \times \cdots \times I_n}$ into a matrix $A_{(k)} \in \mathbb{R}^{I_k \times \prod_{i \neq k} I_i}$ is defined by

$$A_{(k)}(i_k, j) = a_{i_1 \cdots i_n}, \qquad j = 1 + \sum_{\substack{l=1 \\ l \neq k}}^{n} (i_l - 1) \prod_{\substack{m=l+1 \\ m \neq k}}^{n} I_m .$$

Fig. 1 depicts an example of tensor unfolding.

Definition 3 (tensor inner product and norm): If A and B are tensors of the same dimension, their inner product is defined by

$$\langle \mathcal{A}, \mathcal{B} \rangle = \sum_{i_1=1}^{I_1} \cdots \sum_{i_n=1}^{I_n} a_{i_1 \cdots i_n}\, b_{i_1 \cdots i_n}. \quad (3)$$

So, the norm of tensor A is defined as $\|\mathcal{A}\| = \sqrt{\langle \mathcal{A}, \mathcal{A} \rangle}$, and the distance between tensors X and Y can be calculated by $d(\mathcal{X}, \mathcal{Y}) = \|\mathcal{X} - \mathcal{Y}\|$.

Lemma 1: Take arbitrary tensors $\mathcal{X}, \mathcal{Y} \in \mathbb{R}^{I_1 \times \cdots \times I_n}$ and projection matrices $U_k \in \mathbb{R}^{I_k \times J_k}$, $k = 1, 2, \ldots, n$. Suppose X and Y are unfolded into matrices and vectorized, where x and y are the unfolded vectors of X and Y, respectively. Then, we have

$$\langle \mathcal{X} \times_1 U_1^T \cdots \times_n U_n^T,\; \mathcal{Y} \times_1 U_1^T \cdots \times_n U_n^T \rangle = x^T \Big( \bigotimes_{k=1}^{n} U_k \Big) \Big( \bigotimes_{k=1}^{n} U_k \Big)^T y, \quad (4)$$

where $\otimes$ denotes the Kronecker product.

Figure 1. Flattening a (3rd-order) tensor. The tensor can be flattened in 3 ways to obtain matrices comprising its mode-1, mode-2, and mode-3 vectors.

2.2 Tensor Decomposition

One of the standard tensor decompositions utilized in this paper is the Tucker decomposition. It approximates a tensor $\mathcal{A} \in \mathbb{R}^{I_1 \times \cdots \times I_n}$ by

$$\mathcal{A} \approx \mathcal{Z} \times_1 U_1 \times_2 U_2 \cdots \times_n U_n, \quad (5)$$

where $U_k \in \mathbb{R}^{I_k \times J_k}$, and $\mathcal{Z} \in \mathbb{R}^{J_1 \times \cdots \times J_n}$ is called a core tensor. This is the format that results from the Tucker decomposition, and such a tensor is therefore termed a Tucker tensor. The Tucker decomposition is a striking generalization of the matrix SVD (see Fig. 2). Typically, a least squares approach is used to compute it. The Tucker decomposition is not unique, but measures can be taken to correct this (see [7]). Tucker approximation of a tensor is useful for dimensionality reduction of large tensor datasets: the actual data analysis can then be carried out in a space of lower dimension. Very often, Tucker compression precedes other analyses such as parallel factor or block decomposition [7]. Tucker approximation is also important when one wishes to estimate signal subspaces from tensor data; applications include fuzzy modeling, harmonic retrieval, image processing, and classification.

In the discriminant tensor criterion, multiple interrelated projection matrices (subspaces) are pursued in a way that they maximize the interclass scatter and at the same time minimize the intraclass scatter measured in a tensor metric, described by

$$(U_1^*, \ldots, U_n^*) = \arg\max_{U_1, \ldots, U_n} \frac{\sum_{c} n_c \left\| (\bar{\mathcal{X}}_c - \bar{\mathcal{X}}) \times_1 U_1^T \cdots \times_n U_n^T \right\|^2}{\sum_{c} \sum_{i \in c} \left\| (\mathcal{X}_i - \bar{\mathcal{X}}_c) \times_1 U_1^T \cdots \times_n U_n^T \right\|^2}, \quad (6)$$

where $\bar{\mathcal{X}}_c$ is the average of the samples belonging to class c, $\bar{\mathcal{X}}$ is the total average tensor of all samples, and $n_c$ is the number of samples in class c. Equation (6) is equivalent to a higher order nonlinear optimization problem with a higher order nonlinear constraint; thus, it is difficult to find a closed-form solution for it. Alternatively, an iterative optimization approach is used. It is worth noticing that using the tensor representation, multilinear projections (e.g., bilinear for order-2 tensors) are pursued for discriminative subspace analysis. As mentioned in Section 1, tensor-based representation enjoys several advantages over vector-based representation. First, it has the potential to utilize the spatial structure of face images. Second, it suffers less from the curse of dimensionality because a multilinear projection has far fewer parameters to estimate than a normal linear vector projection. To give a concrete example, for face images of size 32×32, pursuing one discriminative projection in a vector-based representation requires estimating 32×32 = 1024 parameters, while in an order-2 tensor representation (the raw image), pursuing one bilinear projection only requires estimating 32+32 = 64 parameters. Third, because a multilinear projection has far fewer parameters to estimate, it is less likely to overfit the training data, especially when only a small number of training examples is available.
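One standard way to obtain a Tucker approximation in the sense of Eq. (5) is the truncated HOSVD: take each $U_k$ as the leading left singular vectors of the k-mode unfolding and form the core by projecting the tensor onto them. The paper computes its decomposition with the MATLAB Tensor Toolbox; the NumPy sketch below only illustrates the idea and is not the authors' implementation.

```python
import numpy as np

def unfold(A, k):
    # k-mode unfolding (Definition 2): axis k first, remaining modes flattened
    return np.moveaxis(A, k, 0).reshape(A.shape[k], -1)

def mode_product(A, U, k):
    # A x_k U via unfolding, for U of size (J_k x I_k)
    shape = list(A.shape)
    shape[k] = U.shape[0]
    out = (U @ unfold(A, k)).reshape(
        [shape[k]] + [s for i, s in enumerate(shape) if i != k])
    return np.moveaxis(out, 0, k)

def hosvd(A, ranks):
    """Truncated HOSVD: U_k = leading left singular vectors of the k-mode
    unfolding; core Z = A x_1 U_1^T ... x_n U_n^T, cf. Eq. (5)."""
    Us = []
    for k, r in enumerate(ranks):
        Uk, _, _ = np.linalg.svd(unfold(A, k), full_matrices=False)
        Us.append(Uk[:, :r])
    Z = A
    for k, Uk in enumerate(Us):
        Z = mode_product(Z, Uk.T, k)
    return Z, Us

# At full ranks the reconstruction Z x_1 U_1 ... x_n U_n is exact.
A = np.random.randn(4, 5, 6)
Z, Us = hosvd(A, (4, 5, 6))
A_hat = Z
for k, Uk in enumerate(Us):
    A_hat = mode_product(A_hat, Uk, k)
assert np.allclose(A_hat, A)
```

Choosing ranks $J_k < I_k$ turns this into the dimensionality reduction used throughout the paper: the core tensor is the compressed representation, and the $U_k$ are the per-mode subspaces.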
Figure 2. An N-mode SVD orthogonalizes the N vector spaces associated with an order-N tensor (the N = 3 case is illustrated).
3. PROPOSED METHOD AND DISCUSSIONS
In this section, we first build a tensor structure for face images. Each face image is represented by three images of the same size, from which a 3D-tensor of size (3 × image size) is made. The 2D matrix of the first dimension is the image itself; the second is the gradient of the image in the x direction, $I_x(i, j) = I(i, j+1) - I(i, j)$; and the third is the gradient of the image in the y direction, $I_y(i, j) = I(i+1, j) - I(i, j)$ (see Fig. 3). The gradient images provide information about the major changes in the image. One of the main problems of using gradients on face images is the resulting peaks at the borders of the head, face, and neck. As these changes are less important than the changes within the face region, one simple policy is to neglect them by applying a threshold to the gradient image: all gradient values over the threshold level are set to zero (therefore, the smaller values gain more importance). One effective novelty of our proposed method is to use a vertical mirror of the input image as a new input to the face recognition process. Obviously, a human face can be considered an object with vertical symmetry about a line that crosses the nose tip and divides it into two parts (see Fig. 4.a). Based on this evidence, it can be concluded that if I1 is an image of a face rotated θ degrees to the right and I2 is an image of the same face rotated θ degrees to the left, I1 is expected to be a vertical mirror of I2 (see Figs. 4.b and 4.c). Therefore, for every input image, we construct two 3D-tensors: the first consists of the main image and its vertical and horizontal gradients, and the second is the mirrored version of the main image with its corresponding gradient matrices. An obvious advantage of this idea is to provide the algorithm with more input data and to make it more robust to the usual changes in view.

(a) (b) (c) Figure 4. (a) Face vertical symmetry. (b) Main image. (c) Mirrored image. [Obviously, both (b) and (c) show valid face images.]

In the case of a frontal-view input image, its mirror brings no additional information; however, it has no disadvantage and does not obstruct the algorithm either. Now, we can derive a face recognition algorithm based on this tensor-based representation. Our proposed method has two phases: the training phase and the testing phase. Accordingly, the face database is divided into two separate partitions: for each individual in the database, some of his/her face images are used in the training process and the remaining images are judged in the testing phase. In order to prepare the training images, a 5D-tensor (called DBT) of size (number of individuals (n1) × number of samples per individual (n2) × 3 (n3) × number of image rows (n4) × number of image columns (n5)) is required. The training phase decomposes this tensor into 5 projection matrices by applying the Tucker decomposition method described in Section 2. This results in
$$DBT \approx \mathcal{Z} \times_1 U_1 \times_2 U_2 \times_3 U_3 \times_4 U_4 \times_5 U_5, \quad (7)$$

where Z is the core tensor holding the inter-relation among the 5 projection matrices, and each $U_k$ is a projection matrix of size $(n_k \times J_k)$ for a chosen subspace dimension $J_k$. In the testing phase, each probe image is first converted to a 3D-tensor using its gradients. The obtained tensor is then projected onto the projection matrices, and the result is compared with the projections of all training 3D-tensors to find the most similar training image. The comparison is performed using the tensor norm. Fig. 5 demonstrates the details of the testing procedure.
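The tensor construction described in this section (image, thresholded x- and y-gradients, and the mirrored counterpart) can be sketched as follows. The forward-difference gradient and the threshold value are our assumptions for illustration; the paper does not state the exact threshold it uses.

```python
import numpy as np

def clipped_gradients(img, thresh=0.2):
    """Forward-difference gradients; values above `thresh` are zeroed so that
    strong head/neck border edges do not dominate the in-face detail."""
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, :-1] = img[:, 1:] - img[:, :-1]   # x (column) direction
    gy[:-1, :] = img[1:, :] - img[:-1, :]   # y (row) direction
    gx[np.abs(gx) > thresh] = 0.0
    gy[np.abs(gy) > thresh] = 0.0
    return gx, gy

def face_tensors(img, thresh=0.2):
    """Return the (3, rows, cols) tensors of the image and of its mirror."""
    tensors = []
    for im in (img, img[:, ::-1]):          # original and vertical mirror
        gx, gy = clipped_gradients(im, thresh)
        tensors.append(np.stack([im, gx, gy]))
    return tensors

img = np.random.rand(112, 92)               # a hypothetical grayscale face image
T, T_mirror = face_tensors(img)
assert T.shape == (3, 112, 92)
```

Stacking such per-image tensors over individuals and samples yields the 5D DBT tensor that the training phase decomposes via Eq. (7).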
Figure 3. 3D-tensor of a typical input image.

4. EXPERIMENTAL RESULTS
In this section, two standard face databases, ORL and Weizmann, are used to evaluate the effectiveness
1. Training-Images-proj(i,j) = projection of each training 3D-tensor
2. I = input image
3. Ix = gradient_x(I); Iy = gradient_y(I)
4. In-3D-Tensor(1,:,:) = I
   In-3D-Tensor(2,:,:) = Ix
   In-3D-Tensor(3,:,:) = Iy
5. Proj-In = projection of In-3D-Tensor
6. for i in Individuals
     for j in samples per individual
       d = || Proj-In - Training-Images-proj(i,j) ||
       if (d < min_d)
         min_d = d; best-matched-individual = i
       end
     end
   end
7. output best-matched-individual

Figure 5. Proposed testing phase.
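The loop in Fig. 5 is a nearest-neighbor search in the projected tensor space, using the tensor (Frobenius) norm of Definition 3 as the distance. A NumPy transcription might look like the sketch below; the dictionary layout of the projected training set is our assumption, not the paper's data structure.

```python
import numpy as np

def best_match(proj_in, training_proj):
    """Nearest-neighbor search over projected training tensors.
    training_proj: dict mapping (individual, sample) -> projected tensor."""
    min_d, best = np.inf, None
    for (individual, sample), p in training_proj.items():
        d = np.linalg.norm(proj_in - p)     # tensor norm = Frobenius norm
        if d < min_d:
            min_d, best = d, individual
    return best

# Toy check: a probe identical to one of individual 1's projected samples
training = {(i, j): np.random.randn(2, 4, 4)
            for i in range(3) for j in range(2)}
probe = training[(1, 1)].copy()
assert best_match(probe, training) == 1
```

`np.linalg.norm` on a full ndarray computes the Frobenius norm over all entries, which matches the tensor norm defined in Section 2.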
of our proposed algorithm, which is based on the multilinear decomposition of higher order tensors, in terms of face recognition accuracy. All experiments were performed in Matlab 7.0.4 on a computer with a 2.4 GHz Core 2 Duo CPU and 2 GB of RAM. The Tensor Toolbox 2.2 [11] was used for the tensor-based representation and decomposition.
4.1 ORL Database
The ORL database contains 400 images of 40 individuals. These images were captured at different times and exhibit variations in expression (open or closed eyes, smiling or non-smiling) and facial details (glasses or no glasses). The images were taken with a tolerance for some tilting and rotation of the face of up to 20 degrees. All images are grayscale and were normalized in resolution, and histogram equalization was applied in the preprocessing step. Ten sample images of one person from the ORL database after scale normalization are displayed in Fig. 6. Four sets of experiments were conducted to compare the performance of our proposed method with eigenface, Fisherface, tensor LPP, and ORO4×2 [9]. In each experiment, the image set was partitioned into training and test sets of different sizes. For ease of presentation, the experiments are named TRm/TEn, meaning that m images per person are randomly selected for training and the n remaining images are used for testing. Table 1 lists the best face recognition accuracies of all the algorithms in our experiments with different training and test set partitions. The comparative results show that our method outperforms the others in all four sets of experiments, especially in the cases with a small number of training samples. Since our approach duplicates the number of training images provided by the database, higher recognition accuracy is obtained. According to [10], ORL is a small database, and training the algorithm even with 5 images per individual does not completely represent the variations of the test images and cannot capture the discriminative information. This explains why the eigenface and Fisherface approaches yield inferior recognition results. Using higher order data without vectorizing the images decreases the space dimension and improves the recognition results.
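A TRm/TEn partition as described above can be drawn per person as in the sketch below; the helper name `tr_te_split` is hypothetical, since the paper does not describe its sampling code.

```python
import random

def tr_te_split(n_samples_per_person, m, seed=0):
    """Randomly pick m training indices per person (TRm); the rest form TEn."""
    rng = random.Random(seed)
    idx = list(range(n_samples_per_person))
    rng.shuffle(idx)
    return sorted(idx[:m]), sorted(idx[m:])

# TR5/TE5 on ORL's 10 images per person: disjoint 5/5 split
train, test = tr_te_split(10, 5)
assert len(train) == 5 and len(test) == 5
assert not set(train) & set(test)
```

Fixing the seed makes the random partition reproducible across the compared methods, so all algorithms see identical splits.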
4.2 Weizmann Database
The Weizmann face database consists of 28 male subjects photographed in 15 different poses, under 4 illuminations, performing 3 different expressions. The original images have 512×352 pixels. Fig. 7.a shows 20 images of different people in this database, and Fig. 7.b shows some images of one person in different modes. As this database categorizes face images by pose, illumination, and expression, the database tensor is a little different from the ORL case. Here, DBT is a 6D-tensor of size (number of individuals (n1) × number of views (n2) × number of illumination conditions (n3) × number of expression modes (n4) × number of image rows (n5) × number of image columns (n6)). Although there would be no negative effect, it is acceptable to build the input data tensor without including the gradient images or the mirror image, as there are sufficient samples for each person. However, due to the large image size, it is almost impossible to vectorize the images for subspace analysis, because of the curse-of-dimensionality problem and the high computational cost.

Table 1. Recognition accuracy (%) comparison of eigenface, Fisherface, tensor LPP, ORO4×2, and the proposed method on the ORL database.
Method      | TR5/TE5 | TR4/TE6 | TR3/TE7 | TR2/TE8
Eigenface   |  85.9   |  82     |  75.4   |  66.3
Fisherface  |  92.2   |  89.5   |  84.2   |  72.9
Tensor LPP  |  95.8   |  90.3   |  87.1   |  84.1
ORO4×2      |  97     |  91.8   |         |
Proposed    |  97.5   |  92.3   |         |

Figure 6. Ten typical images in ORL.
(a) (b)
Figure 7. The facial image database (28 subjects × 45 images per subject). (a) 20 subjects are shown in expression 2 (smile), viewpoint 3 (frontal), and illumination 2 (frontal). (b) Some images for subject 1. Left to right, the three panels show images captured in illuminations 1, 2, and 3. [Within each panel, images of different expressions are shown horizontally while images from different viewpoints are shown vertically.]
These are the main reasons why most methods that experiment on such a database down-sample the images, if they do not skip the database altogether. However, it is clear that reducing the image size causes data loss and destroys many facial details. Higher order tensors have the capability of working with such large, high-resolution databases with no memory or time difficulty, while utilizing their available features. Based on the tensor decomposition, one can find the pose, illumination, and expression of the test image, in addition to the identity itself, using the projection matrices. By applying the proposed method to the preprocessed data of this database, above 90% recognition accuracy was achieved. Here, a more complex preprocessing stage was needed to separate each face region from its surrounding environment.
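A quick back-of-envelope computation, using only the stated 512×352 resolution, illustrates why vectorization is infeasible here while per-mode factors stay small.

```python
# Vectorizing a 512x352 image gives a 180224-dimensional vector; a scatter
# (covariance) matrix over such vectors has about 3.2e10 entries (roughly
# 260 GB in float64), while factor matrices for the two image modes need
# only 512^2 + 352^2 entries in the worst (full-rank) case.
rows, cols = 512, 352
d = rows * cols                 # vectorized image dimension
cov_entries = d ** 2            # entries of a d x d scatter matrix
mode_entries = rows ** 2 + cols ** 2
print(d, cov_entries, mode_entries)
```

The ratio between the two storage figures is roughly five orders of magnitude, which is the essence of the memory argument made above.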
5. CONCLUSION AND FUTURE WORK
A novel algorithm was proposed in this paper for supervised dimensional reduction using a general tensor representation. In the proposed method, image objects were encoded as nth-order tensors. An approach called the Tucker decomposition was used to iteratively learn multiple interrelated discriminative subspaces that reduce the dimension of the higher order tensor. Compared with traditional algorithms such as eigenface, Fisherface, and tensor LPP, our proposed algorithm effectively avoids the curse-of-dimensionality dilemma and alleviates the small sample-size problem by using the gradient version of the image and its vertical mirror. The algorithm is efficient on high dimensional databases and on those whose objects exhibit symmetry. In order to improve the accuracy of the method, an efficient preprocessing stage can be applied. Also, a face detection method that accurately separates the face region can enhance the recognition accuracy (as extraneous data in the input images decrease the subspace decomposition efficiency). Furthermore, there is a variety of feature-based properties in face images that can be introduced into the tensor space to enrich the available information and lead the decomposition process toward a near optimal solution.
6. REFERENCES

[1] K. Delac, M. Grgic and M. S. Bartlett, Recent Advances in Face Recognition, IN-TECH, Vienna, Austria, December 2008.
[2] W. Zhao, R. Chellappa, P. J. Phillips and A. Rosenfeld, "Face Recognition: A Literature Survey", ACM Computing Surveys, vol. 35, no. 4, pp. 399-458, December 2003.
[3] S. Rana, W. Liu, M. Lazarescu and S. Venkatesh, "Efficient tensor based face recognition", Proc. IEEE ICPR, pp. 1-4, December 2008.
[4] D. Xu, S. Lin, S. Yan and X. Tang, "Rank-one projections with adaptive margins for face recognition", IEEE Trans. Systems, Man, and Cybernetics, vol. 37, no. 5, pp. 1226-1236, October 2007.
[5] M. A. O. Vasilescu and D. Terzopoulos, "Multilinear subspace analysis for image ensembles: TensorFaces", Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Madison, WI, vol. 2, pp. 93-99, 2003.
[6] X. He, D. Cai and P. Niyogi, "Tensor subspace analysis", Advances in Neural Information Processing Systems, December 2005.
[7] L. De Lathauwer, "A survey of tensor methods", Technical Report, E.E. Dept., K.U. Leuven, Belgium.
[8] M. Turk and A. Pentland, "Eigenfaces for recognition", J. Cognitive Neuroscience, vol. 3, pp. 71-86, 1991.
[9] G. Hua, P. Viola and S. Drucker, "Face recognition using discriminatively trained orthogonal rank one tensor projections", Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Minneapolis, 2007.
[10] X. Jiang, B. Mandal and A. Kot, "Eigenfeature regularization and extraction in face recognition", IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 30, no. 3, pp. 383-393, March 2008.
[11] B. W. Bader and T. G. Kolda, Tensor Toolbox version 2.2, Sandia National Laboratories.
[12] S. Yan, D. Xu, Q. Yang, L. Zhang, X. Tang and H.-J. Zhang, "Multilinear discriminant analysis for face recognition", IEEE Trans. Image Processing, vol. 16, no. 1, pp. 212-219, January 2007.
[13] M. Thomason and J. Gregor, "Fusion of Multiple Images by Higher-Order SVD of Third-Order Image Tensors", Technical Report, EECS Dept., Univ. of Tennessee, November 2007.