
Face Recognition Using Nearest Feature Space Embedding

Ying-Nong Chen, Chin-Chuan Han, Member, IEEE, Cheng-Tzu Wang, and Kuo-Chin Fan

Abstract—Face recognition algorithms often have to cope with variations in facial pose, illumination, and expression (PIE). To reduce these effects, many researchers have tried to find the most discriminant transformation in eigenspaces, either linear or nonlinear, to obtain better recognition results. Various researchers have also designed novel matching algorithms to reduce the PIE effects. In this study, a nearest feature space embedding (called NFS embedding) algorithm is proposed for face recognition. The distance between a point and the nearest feature line (NFL) or the nearest feature space (NFS) is embedded in the transformation through discriminant analysis. Three factors, including class separability, neighborhood structure preservation, and the NFS measurement, were considered to find the most effective and discriminating transformation in eigenspaces. The proposed method was evaluated on several benchmark databases and compared with several state-of-the-art algorithms. According to the comparison results, the proposed method outperformed the other algorithms.

Index Terms—Face recognition, nearest feature line, nearest feature space, Fisher criterion, Laplacianface.

1 INTRODUCTION

Recently, the manifold-based learning approach to face recognition has attracted many researchers. He et al. [1] proposed an eigenspace method, called Laplacianface, to preserve the local structure of training samples. Using the locality preserving projection (LPP), the locality of a manifold structure is preserved by a nearest-neighbor graph (NN-graph) in face feature spaces. According to the results in [1], the local manifold structure preserved by the LPP method is more effective than the global euclidean structure (e.g., the PCA [2] or LDA [3] approach) and is more suitable for classification in a number of applications. Moreover, an orthogonal locality preserving projection (OLPP) method proposed by Cai et al. [4] preserves more local information. Other nonlinear manifold structures can be generated by the Isomap [5], [6], locally linear embedding (LLE) [7], [8], Laplacian Eigenmap [9], topology preserving nonnegative matrix factorization [10], and unsupervised discriminant projection (UDP) [11] approaches. In these unsupervised methods, the class information is not used in finding the transformation matrices. Similarly to the PCA approach, they work well for dimensionality reduction and sample reconstruction, but not for classification.

Y.-N. Chen and K.-C. Fan are with the Department of Computer Science and Information Engineering, National Central University, No. 300, Jhongda Rd., Jhongli City, Taoyuan County 32001, Taiwan (R.O.C.). E-mail: [email protected], [email protected].
C.-C. Han is with the Department of Computer Science and Information Engineering, National United University, No. 1, Lienda, Kung-ching Li, Miaoli City 36003, Taiwan (R.O.C.). E-mail: [email protected].
C.-T. Wang is with the Department of Computer Science, National Taipei University of Education, No. 134, Sec. 2, Heping E. Rd., Da-an District, Taipei City 106, Taiwan (R.O.C.). E-mail: [email protected].
Manuscript received 15 Nov. 2009; revised 2 May 2010; accepted 15 Sept. 2010; published online 9 Nov. 2010. Recommended for acceptance by S. Li. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TPAMI-2009-11-0763. Digital Object Identifier no. 10.1109/TPAMI.2010.197.

Prior class labels have been adopted to guide the discriminant analysis in many algorithms, such as LDA [3], F-LDA [12], D-LDA [13], K-DDA [14], FD-LDA [15], and WLDA [16]. The Fisher criterion is optimized by maximizing intraclass compactness and interclass separability. Modified within-class and between-class scatters were calculated based on LPP to preserve the local structures. Similarity matrices describing the neighborhood relations between sample points were calculated in the supervised LPP [1], the orthogonal neighborhood preserving discriminant analysis (ONPDA) [17], the kernel class-wise LPP [18], the marginal Fisher analysis (MFA) [19], etc. Yan et al. [19] proposed a general framework, called graph embedding, that provides a common perspective on these algorithms. Two graphs, an intrinsic graph (IG) and a penalty graph (PG), describe the intraclass point adjacency relationship and the interclass marginal point adjacency relationship under three extensions: linearization, kernelization, and tensorization. All of these approaches preserve both the local intraclass and interclass neighborhood structures.

In addition to discriminant analysis in eigenspaces, other problems in face recognition have been discussed. First, the small sample size (S3) problem arises, and many methods have been proposed to solve it, such as PCA plus LDA [20], the null space approach [21], discriminative common vectors (DCV) [22], and principal nonparametric subspace analysis (PNSA) [23]. Second, the traditional LDA approach assumes that the data in each class follow a Gaussian distribution, and its performance usually degrades for non-Gaussian distributions. Nonparametric discriminant analysis (NDA) [23], [24], MFA [19], and TMAF-SVM [31] use the feature points located at the class boundaries to remove this assumption. Next, many researchers try to extract the regularity of face features and eliminate unwanted noise from eigenspaces. Jiang et al. [25] decomposed the within-class scatter into three subspaces: the reliable subspace, the unstable subspace, and the null space.


Wang and Tang [26] assumed that the variations of face images in feature spaces are a mixture of intrinsic, transformation, and noise factors. Rajagopalan et al. [27] proposed an eigen-background space for background learning. While the eigenspace is related to representation, neural network [28], [29], attributed relational graph (ARG) [30], TMAF-SVM [31], and HMM [32] classifiers are methods for deriving distances between templates. The nearest linear combination (NLC) approach [33], which includes the nearest feature line (NFL) [34], [35] in its earlier development, explores the information contained in more than one feature point within the same class. Multiple feature points are linearly combined, yielding a linear subspace that represents the class, and the class of the subspace nearest to the query point is chosen as the final classification. This leads to the concept of the nearest feature subspace (NFS). Consider the simplest case, in which a feature line is generated by two feature points: it linearly interpolates or extrapolates each pair of feature points within the same class, so an infinite number of pseudoprototypes are generated for each class. Classification is done by selecting the minimal distance between the input and the feature lines. For the face recognition problem, a new image of a person is linearly approximated by multiple prototypes belonging to the same person in eigenspaces [33], and the variations among face images caused by pose, illumination, and expression (PIE) changes are accounted for by the weight variations. A performance evaluation in image classification and retrieval was also conducted in [35]. The NFL-based or nearest feature plane (NFP)-based classifier (a simple NLC-based classifier) has been utilized for face recognition with waveletface features [36]. According to that evaluation, the NFL-based method outperformed the NN-based classifier, but it was implemented and executed during the matching phase. The disadvantage of the NFL-based method is that it needs extra computation, which increases rapidly and exponentially with the number of feature points.

In this paper, a nearest feature space embedding (NFS embedding) method for face recognition is proposed. Unlike the point-based relationships constructed in traditional eigenspace approaches, the distance measurement of the NFL or NFS, inspired by [33], [34], [35], is embedded in the transformation through discriminant analysis. The comparison between the NFS embedding method and the traditional methods is briefly described as follows: The LPP-based or LLE-based methods try to keep the locality among feature points instead of the global euclidean structure. An adjacency matrix is constructed to represent the point-to-point (P2P) connectivity relationship among neighboring points, and the discriminant power of the transformation depends on the constructed matrix. Since only a small number of samples (prototypes) are collected during the training phase, this relationship is poorly modeled. Besides, many unprototyped samples with varied PIE conditions are unavailable in training; linear combinations of the original prototypes virtually generate such unprototyped samples, and the point-to-line (P2L) or point-to-space (P2S) measurement achieves better classification results [33], [34], [35], [36]. The NFS embedding therefore constructs a P2L or P2S adjacency matrix instead of the P2P matrix to obtain an effective transformation. This measurement is directly embedded in the transformation in the discriminant analysis, not in the classification phase.


The procedure of the proposed NFS embedding method is as follows: The PCA process is first performed to reduce the dimensionality and thus avoid the S3 problem. Second, the projection points of all points onto the possible feature spaces (lines, planes, or higher-dimensional subspaces) are obtained, and their corresponding distances are calculated. Next, the class labels are included in the scatter computation for class separability. Considering the vectors from a point to the feature spaces, the vectors with the $K_1$ smallest lengths within the same class and those with the $K_2$ smallest lengths belonging to different classes are used to calculate the within-class and the between-class scatters, respectively. The eigenvectors $w^*$ with the $r$ largest positive eigenvalues are found by maximizing the Fisher criterion, and the transformation $w = w_{PCA} w^*$ is generated. The main contributions of this study are summarized as follows: Three factors, the NFS measurement, neighborhood structure preservation, and class separability, are all considered in finding an effective and discriminating transformation matrix.
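As a minimal sketch of the two-stage projection just described, the following Python snippet composes the PCA basis with the discriminant projection and applies it to a raw sample. The names (project, w_pca, w_star) are illustrative, not from the paper, and both matrices are assumed to have been learned already.

import numpy as np

def project(z, w_pca, w_star):
    """Map a raw image vector z into the discriminant subspace.

    z      : (d,) raw image vector
    w_pca  : (d, m) PCA basis used to avoid the S3 problem
    w_star : (m, r) discriminant projection found by the NFS embedding
    Returns the r-dimensional feature (w_pca @ w_star).T @ z.
    """
    w = w_pca @ w_star          # composed transformation w = w_PCA w*
    return w.T @ z              # low-dimensional representation of the sample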

1.1 NFS-Based Measurement

The NFS-based distance measurement is embedded in the discriminant analysis. Since a feature line can linearly approximate the variation between two sample prototypes, the NFL-based distance measurement is more suitable than a point-based distance [8], [9], [10]. As mentioned above, much comparison time is otherwise needed during the matching phase; here, the distance computation from a point to a feature space is directly embedded in the projection transformation instead of being carried out during matching.

1.2 Neighborhood Structure Preservation

In the Laplacianface approach [1], the weighted distances between neighboring points are minimized and represented in a Laplacian matrix to preserve the locality of the structure. The NFS-based distance scatter, also a form of Laplacian matrix, is derived for locality preservation. The weighted relationships are set between points and NFSs instead of the P2P relationships under a Gaussian weighting, and the obtained transformation provides more effective discriminating power. Furthermore, the representations of LPP and LLE can be viewed as converged cases of the proposed NFS embedding.

1.3 Class Separability

The class information is adopted in both the intraclass and interclass scatter computation to maximize the Fisher criterion for class separability. Since a large number of possible vectors are generated, the discriminating feature vectors located at the interclass boundaries are used to compute the scatter matrices so that more effective transformed eigenvectors can be found, and no Gaussian distribution assumption is necessary. Two parameters, the $K_1$ intraclass NFS-based vectors and the $K_2$ interclass NFS-based vectors, are manually set to choose the discriminating feature vectors. The scatters of the Fisher criterion, using the point-to-NFS measurement, are more discriminating than the P2P ones.

The rest of this paper is organized as follows: Several eigenspace-based approaches are reviewed in Section 2. In Section 3, the NFL and NFS algorithms are first reviewed,


and then these NFS-based distances are embedded in the discriminant analysis. Several experiments, conducted to show the effectiveness of the proposed method, are presented in Section 4. Furthermore, comparisons with several state-of-the-art eigenspace-based methods are also made. Finally, conclusions and discussion of future work are given.

2 A REVIEW OF EIGENSPACE APPROACHES

Consider $N$ $d$-dimensional samples $x_1, x_2, \ldots, x_N$, which constitute $N_C$ classes of faces. Let $x_j^i$ denote a sample in space $\mathbb{R}^d$ representing the $j$th sample in the $i$th class of size $N_i$. The total sample mean vector $m$ and the class mean vector of the $i$th class $m_i$ are calculated from the training samples. New samples in the low-dimensional space can be obtained by the linear projection $y_i = w^T x_i$, where $w$ is a linear transformation matrix that needs to be found.

2.1 Local Structure Preserving Algorithm

LPP [1] and LLE [7], [8] are two popular and effective manifold learning approaches used to keep the local structure of samples. Unsupervised LPP represents the topological structure of the samples with an adjacency graph without using the class label information. The best transformation matrix $w$ is obtained by solving the following minimization problem:

$$w^* = \arg\min_w \sum_{i \neq j} \| y_i - y_j \|^2 S_{i,j} = \arg\min_w \sum_{i \neq j} \| w^T x_i - w^T x_j \|^2 S_{i,j} = \arg\min_w \operatorname{tr}( w^T X L X^T w ). \qquad (1)$$

Matrix $L = D - S$ is the Laplacian matrix, in which $D$ is the diagonal matrix of column sums of the similarity matrix $S$. The transformation matrix $w$ is given by the eigenvectors with the smallest eigenvalues of the generalized eigenvalue problem $X L X^T w = \lambda X D X^T w$. In these approaches, only the relationship between two points is considered in constructing the graph. The weight values are intuitively assigned as one (i.e., two points are connected) or as a Gaussian distance function $\exp(-\| x_i - x_j \|^2 / t)$ for two "close" samples $x_i$ and $x_j$; they are inversely proportional to the distances between points. The local structure of the training samples is preserved by the constructed graph. Furthermore, Cai et al. [4] generate orthogonal projection bases to obtain more discriminating power.

LLE is also an unsupervised learning algorithm for dimensionality reduction that preserves local neighborhood structures. The reconstruction error between point $y_i$ and its neighboring points is minimized as defined below:

$$\varepsilon = \sum_i \Big\| y_i - \sum_j M_{i,j} y_j \Big\|^2 = \operatorname{tr}\big( Y (I - M)^T (I - M) Y^T \big), \quad Y = [\, y_1\; y_2\; \ldots\; y_N \,]. \qquad (2)$$

According to the results in [19], the minimization of (2) in the LLE algorithm can be formulated as $Y^* = \arg\min \operatorname{tr}\big( Y^T (I - M^T)(I - M) Y \big) = \arg\min \operatorname{tr}\big( Y^T (D - W) Y \big)$, in which the similarity matrix $W_{i,j} = (M + M^T - M^T M)_{i,j}$ if $i \neq j$, and $0$ otherwise. The matrix $(I - M^T)(I - M)$ is thus represented as a Laplacian matrix. The LLE algorithm minimizes the reconstruction errors, while LPP minimizes the summed distances between neighboring points; both can be represented with a Laplacian matrix. Similarly to the PCA approach, the LLE and unsupervised LPP algorithms work well for dimensionality reduction and sample reconstruction, but not for classification. In addition, selecting an appropriate parameter $t$ in the similarity matrix is still an open problem.
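The following Python sketch illustrates the unsupervised LPP step just described: it builds a heat-kernel similarity graph, forms the Laplacian L = D - S, and keeps the eigenvectors of the generalized problem X L X^T w = lambda X D X^T w with the smallest eigenvalues. The parameter names and the use of a pseudo-inverse are implementation choices, not prescribed by [1].

import numpy as np

def lpp(X, t=1.0, k=5, r=10):
    """Minimal unsupervised LPP sketch, assuming X is d x N (one sample per column)."""
    d, N = X.shape
    dist2 = np.sum((X[:, :, None] - X[:, None, :]) ** 2, axis=0)  # pairwise squared distances
    S = np.zeros((N, N))
    for i in range(N):
        nbrs = np.argsort(dist2[i])[1:k + 1]                      # k nearest neighbors of sample i
        S[i, nbrs] = np.exp(-dist2[i, nbrs] / t)                  # heat-kernel weights
    S = np.maximum(S, S.T)                                        # symmetrize the adjacency graph
    D = np.diag(S.sum(axis=1))
    L = D - S
    A, B = X @ L @ X.T, X @ D @ X.T
    evals, evecs = np.linalg.eig(np.linalg.pinv(B) @ A)           # generalized eigenproblem via pseudo-inverse
    order = np.argsort(evals.real)                                # smallest eigenvalues first
    return evecs[:, order[:r]].real                               # d x r projection matrix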

2.2 Optimization of the Fisher Criterion

When the class information is considered in constructing the adjacency matrix, the intraclass (within-class) scatter and the interclass (between-class) scatter are first calculated and then the Fisher criterion is maximized. As mentioned in [1], the class labels of samples within the same class can be adopted in the supervised LPP method. Li et al. [18] also defined three possible sample relationships within a class. When two samples $x_i$ and $x_j$ belong to the same class, the values in the similarity matrix $S$ are respectively defined as $1$, $\exp(-\| x_i - x_j \|^2 / t)$, and $x_i^T x_j / (\| x_i \| \, \| x_j \|)$; otherwise, the values are assigned as $0$. In their approach, only the sample similarities within the same class are measured, whereas the sample measurements between different classes are not assessed. Second, the class labels can be used in both the adjacency matrix and Fisher's criterion simultaneously [17]. The parameters in matrix $S$ are set as $S_{i,j} = \exp(-\| x_i - x_j \|^2 / (\| x_i \| \, \| x_j \|))$. In addition, two weighting matrices, $R^W_{i,j}$ and $R^B_{i,j}$, are embedded into the computation of the within-class and between-class scatter matrices. Yan et al. [19] designed a marginal Fisher criterion (named marginal Fisher analysis, MFA) to avoid the limitation on the data distribution; only the feature points near the marginal boundary are used to compute the interclass and intraclass scatters when maximizing the Fisher criterion. In these approaches, only simple P2P relationships with a Gaussian weighting function are intuitively considered. More details can be found in [17], [18], [19].

3 NEAREST FEATURE SPACE EMBEDDING (NFS EMBEDDING)

In this study, an NFS embedding discriminant analysis is proposed by maximizing Fisher’s criterion. Three factors, namely, class separability, neighborhood structure preservation, and nearest feature space measurement, are all taken into consideration to find the effective and discriminating projection from the original image space into the transformed space.

3.1 Nearest Feature Space (NFS) Strategy

In pattern classification, a new sample $x_i$ is assigned the identity of its nearest neighbor (NN), i.e., the training sample with the smallest distance to it. Li and Lu [34], [35] designed an NFL classifier for face recognition in which a feature line is generated by any two feature points. The smallest distance is selected from the $C_2^{N-1}$ distances between a query point and its projection points on the $C_2^{N-1}$ feature lines.


Furthermore, the general NLC-based classifier (e.g., the two simple NFL and NFP classifiers) enhances the classification performance [33], [36]. The main function of the NFL, NFP, or NFS classifiers is to increase the capacity of the prototypes by linearly approximating the original feature points, and their performance is much better than that of NN classifiers. However, the number of distance calculations rapidly increases to $C_3^{N-1}$ and $C_P^{N-1}$ for the NFP and $P$-NFS classifiers, so a significant amount of computational time is needed during the matching phase.
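A small sketch of the NFL matching rule described above, assuming prototypes is a list of feature vectors with known labels; the helper names are hypothetical, and the loop over all prototype pairs makes the computational burden of the matching phase explicit.

import numpy as np
from itertools import combinations

def nfl_distance(x, xm, xn):
    """Distance from a query x to the feature line through prototypes xm and xn.

    The projection point is xm + s * (xn - xm) with
    s = (x - xm).(xn - xm) / ||xn - xm||^2, which linearly interpolates or
    extrapolates the two prototypes.
    """
    direction = xn - xm
    s = np.dot(x - xm, direction) / np.dot(direction, direction)
    proj = xm + s * direction
    return np.linalg.norm(x - proj)

def nfl_classify(x, prototypes, labels):
    """Assign x to the class owning the nearest feature line (illustrative only)."""
    best = (np.inf, None)
    for m, n in combinations(range(len(prototypes)), 2):
        if labels[m] != labels[n]:
            continue                                   # feature lines are built within a class only
        d = nfl_distance(x, prototypes[m], prototypes[n])
        if d < best[0]:
            best = (d, labels[m])
    return best[1]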

3.2 Scatter Computation of the Nearest Feature Space

Considering a training set in the transformed space $T = \{y_1, y_2, \ldots, y_N\}$, the distance from a specified point $y_i$ to a feature space is defined as $\| y_i - f^{(P)}(y_i) \|$, in which $f^{(P)}$ is a subspace generated by $P$ points, $f^{(P)}(y_i)$ is the projection point, and $T^{(P)}$ is the set of $C_P^N$ possible subspaces $f^{(P)}$ generated from set $T$. The scatters between feature points and feature spaces are computed and embedded in the discriminant analysis; this approach is called the NFS embedding. Two possible objective functions, given in the following equations, are minimized:

$$F_1 = \sum_i \Big\| \sum_{f^{(P)} \in T^{(P)}} \big( y_i - f^{(P)}(y_i) \big)\, w^{(P)}(y_i) \Big\|^2, \quad \text{and} \qquad (3)$$

$$F_2 = \sum_i \sum_{f^{(P)} \in T^{(P)}} \big\| y_i - f^{(P)}(y_i) \big\|^2 w^{(P)}(y_i). \qquad (4)$$

There are $C_P^{N-1}$ possible combinations for a specified point $y_i$. The weight values $w^{(P)}(y_i)$ constitute a connectivity relationship matrix of size $N \times C_P^{N-1}$ for the $N$ feature points and their corresponding projection points $f^{(P)}(y_i)$, $i = 1, 2, \ldots, N$. Function $F_1$ in (3) is the squared norm of the summed terms for point $y_i$, while function $F_2$ in (4) is the sum of the individual squared terms; the difference between $F_1$ and $F_2$ is thus the order of the "add" operation. In both (3) and (4), the discriminant vectors from a specified point $y_i$ to its $K$ nearest feature spaces are chosen in the discriminant analysis. If the $K$ nearest feature spaces are chosen, the vectors inside the norm of $F_1$ are added first and the summed vector is then squared in (3); this is a point-reconstruction form like that of the LLE algorithm at $P = 1$. On the other hand, the "square" operation is executed before the $K$ "add" operations for each point $y_i$ in $F_2$, which is a form like that of the LPP algorithm when $P = 1$. To simplify the representation, the weight values $w^{(P)}(y_i)$ are extended to a new matrix of size $N \times N^P$ in which some combinations do not exist. For example, when $P = 2$, $C_2^{N-1} = (N-1)(N-2)/2$ projection points are found for the P2L distances of point $y_i$. The weights of feature lines $\overline{x_m x_n}$ and $\overline{x_n x_m}$ are two different entries even though they denote the same line. In addition, feature lines $\overline{x_m x_m}$, $\overline{x_i x_m}$, and $\overline{x_m x_i}$ do not exist, and their weights are forcibly assigned as 0. In the following, the formulas of functions $F_1$ and $F_2$ for various $P$ values are derived and represented in terms of a Laplacian matrix.
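The following toy computation, with made-up two-dimensional vectors, illustrates the difference in the order of operations between (3) and (4) for a single point and K = 2 chosen feature spaces (here degenerate points, P = 1): F1 sums the weighted residual vectors first and then squares, while F2 sums the weighted squared norms.

import numpy as np

# Toy illustration only; the numbers are arbitrary.
y_i = np.array([1.0, 0.0])
neighbors = [np.array([0.5, 0.5]), np.array([0.5, -0.5])]
weights = [0.5, 0.5]

residuals = [(y_i - f) * w for f, w in zip(neighbors, weights)]

F1_i = np.linalg.norm(sum(residuals)) ** 2                # vectors summed first, then squared (Eq. (3))
F2_i = sum(np.linalg.norm(y_i - f) ** 2 * w
           for f, w in zip(neighbors, weights))           # squared first, then summed (Eq. (4))

print(F1_i, F2_i)   # 0.25 vs 0.5 here: the two objectives generally differ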

Fig. 1. The weight setting for (a) P = 2 and (b) P = 3.

3.2.1 P = 1, Nearest Feature Point Embedding

The feature space $f^{(P)}$ degenerates to a point when $P = 1$, which is the simplest case. If the $K$ nearest neighbors are chosen, function $F_1$ in (3) is a standard form of LLE: $\min \sum_i \| y_i - \sum_j M_{i,j} y_j \|^2$. The weights in matrix $M$ can be obtained by solving a linear system that minimizes the reconstruction errors [7], [8]. From the results in [19], it can be represented as $w^T X L X^T w$. Next, consider the objective function $F_2$ in (4); the similarity matrix for point $y_i$ and its $K$ nearest neighbors, similar to the Laplacianface, can be defined as $w_j^{(1)}(y_i) = S_{i,j} = \exp(-\| x_i - x_j \|^2 / t)$. The objective function $F_2$ is then also represented as $w^T X L X^T w$. Moreover, the connections of the Laplacian matrix to PCA and LDA were described in [19].

3.2.2 P = 2, Nearest Feature Line Embedding

The distance from feature line $\overline{y_m y_n}$ to point $y_i$ is calculated as $\| y_i - f^{(2)}_{m,n}(y_i) \|$, in which line $\overline{y_m y_n}$ passes through the two points $y_m$ and $y_n$, as shown in Fig. 1a. The projection point $f^{(2)}_{m,n}(y_i)$ can be written as a linear combination of points $y_m$ and $y_n$ [34], [35]. The vector from feature line $\overline{y_m y_n}$ to point $y_i$ is obtained as $y_i - f^{(2)}_{m,n}(y_i) = y_i - s_{n,m} y_m - s_{m,n} y_n$, in which $s_{m,n}$ and $s_{n,m}$ are two weight values in the transformed space and $s_{m,n} + s_{n,m} = 1$. If the $K$ nearest feature lines are chosen from the $C_2^{N-1}$ possible combinations, the objective function $F_1$ in (3) can be represented in terms of a Laplacian matrix as follows:

$$F_1 = \sum_i \Big\| \sum_{m \neq n} \big( y_i - f^{(2)}_{m,n}(y_i) \big)\, w^{(2)}_{m,n}(y_i) \Big\|^2 = \sum_i \Big\| y_i - \sum_j M_{i,j} y_j \Big\|^2 = \operatorname{tr}( w^T X L X^T w ), \qquad (5)$$


in which the weight $w^{(2)}_{m,n}(y_i)$ and the matrix $M$ are assigned as given in the Appendix. On the other hand, the objective function $F_2$ in (4) is first decomposed into $K$ components. Each component denotes the summation of squared distances from point $y_i$ to its $k$th nearest feature line, and each can be written in terms of a Laplacian matrix. Therefore, function $F_2$ becomes

$$F_2 = \sum_i \sum_{m \neq n} \big\| y_i - f^{(2)}_{m,n}(y_i) \big\|^2 w^{(2)}_{m,n}(y_i) = \sum_i \Big\| y_i - \sum_j M_{i,j}(1) y_j \Big\|^2 + \sum_i \Big\| y_i - \sum_j M_{i,j}(2) y_j \Big\|^2 + \cdots + \sum_i \Big\| y_i - \sum_j M_{i,j}(K) y_j \Big\|^2 = \operatorname{tr}( w^T X L(1) X^T w ) + \operatorname{tr}( w^T X L(2) X^T w ) + \cdots + \operatorname{tr}( w^T X L(K) X^T w ) = \operatorname{tr}( w^T X L X^T w ). \qquad (6)$$

Function $F_2$ can thus be represented in terms of a Laplacian matrix; more details can be found in the Appendix.

Fig. 2. The computation of (a) the within-class and (b) the between-class scatters.

3.2.3 P ≥ 3, Nearest Feature Plane and Space (NFP and NFS)

When $P = 3$, the vector from a feature plane $E_{q,m,n}$ to point $y_i$ is represented as $y_i - f^{(3)}_{q,m,n}(y_i)$, where plane $E_{q,m,n}$ is spanned by three given feature points $y_q$, $y_m$, and $y_n$, as shown in Fig. 1b. The projection point $f^{(3)}_{q,m,n}(y_i)$ is obtained from the following matrix operations [37]:

$$f^{(3)}_{q,m,n}(y_i) = Y_{q,m,n} \big( Y_{q,m,n}^T Y_{q,m,n} \big)^{-1} Y_{q,m,n}^T (y_i - y_q) + y_q = [\, (y_m - y_q)\; (y_n - y_q) \,] [\, s_m\; s_n \,]^T + y_q = (1 - s_m - s_n) y_q + s_m y_m + s_n y_n = s_q y_q + s_m y_m + s_n y_n. \qquad (7)$$

Here, $Y_{q,m,n} = [\, (y_m - y_q)\; (y_n - y_q) \,]$ is a matrix of size $r \times 2$, and the partial term $\big( Y_{q,m,n}^T Y_{q,m,n} \big)^{-1} Y_{q,m,n}^T (y_i - y_q)$ is represented as $[\, s_m\; s_n \,]^T$. Besides, the constraint $s_q + s_m + s_n = 1$ is satisfied, so the projection point $f^{(3)}_{q,m,n}(y_i)$ is represented as a linear combination of the feature points $y_q$, $y_m$, and $y_n$. Similarly to the representation of the NFL, the vectors from point $y_i$ to the $K$ nearest feature planes are summed and represented as $y_i - \sum_j M_{i,j} y_j$, whose weight values $(M_{i,q}, M_{i,m}, M_{i,n})$ in matrix $M$ are assigned as follows:

$$(M_{i,q}, M_{i,m}, M_{i,n}) = (t_q / K,\; t_m / K,\; t_n / K), \quad [\, t_m\; t_n \,]^T = \big( X_{q,m,n}^T X_{q,m,n} \big)^{-1} X_{q,m,n}^T (x_i - x_q), \quad X_{q,m,n} = [\, (x_m - x_q)\; (x_n - x_q) \,], \quad t_q + t_m + t_n = 1. \qquad (8)$$

If the $K$ nearest feature planes are selected from the $C_3^{N-1}$ possible combinations and the weight matrix $M$ is set accordingly, the objective function $F_1$ in (3) becomes a form of Laplacian matrix, $w^T X L X^T w$. Similarly, the objective function $F_2$ in (4) is also decomposed into $K$ components and becomes $w^T X L X^T w$, too. Generally, when $P > 3$, the projection point of feature point $y_i$ onto the subspace spanned by points $y_1, y_2, \ldots, y_P$ is generated as follows:

$$f^{(P)}(y_i) = Y_{(1:P)} \big( Y_{(1:P)}^T Y_{(1:P)} \big)^{-1} Y_{(1:P)}^T (y_i - y_1) + y_1 = \sum_{j=1}^{P} s_j y_j, \quad \text{and} \quad Y_{(1:P)} = [\, (y_2 - y_1)\; (y_3 - y_1)\; \cdots\; (y_P - y_1) \,]. \qquad (9)$$

Matrix $Y_{(1:P)}$ is of size $r \times (P - 1)$ and $\sum_{j=1}^{P} s_j = 1$. The vector from the projection point to point $y_i$ is represented as $y_i - f^{(P)}(y_i) = y_i - \sum_j M_{i,j} y_j$. Finally, both objective functions $F_1$ and $F_2$ can be represented in terms of a Laplacian matrix.

3.3 Maximization of the Fisher Criterion

Similarly to the MFA approach, only the feature spaces with the smallest distances from a specified point are used to calculate the scatters. Two parameters, $K_1$ and $K_2$, are manually determined for the within-class scatter $S_W^{(P)}$ and the between-class scatter $S_B^{(P)}$, respectively. The Fisher criterion $S_B^{(P)} / S_W^{(P)}$ is maximized, as shown in Fig. 2. Here, the training set consists of $N_C$ classes and sample $x_j^i$ denotes the $j$th sample in the $i$th class of size $N_i$. The sample in the low-dimensional space is obtained by the projection $y_j^i = w^T x_j^i$. The within-class and between-class scatters for the objective function $F_1$ are calculated as

$$S_W^{(P)} = \sum_{i=1}^{N_C} \sum_{j=1}^{N_i} \Bigg[ \bigg( \sum_{f^{(P)} \in F_{K_1}^{(P)}(x_j^i)} Z \bigg) \bigg( \sum_{f^{(P)} \in F_{K_1}^{(P)}(x_j^i)} Z \bigg)^T \Bigg], \qquad (10)$$

and

$$S_B^{(P)} = \sum_{i=1}^{N_C} \sum_{j=1}^{N_i} \Bigg[ \bigg( \sum_{f^{(P)} \in B_{K_2}^{(P)}(x_j^i)} Z \bigg) \bigg( \sum_{f^{(P)} \in B_{K_2}^{(P)}(x_j^i)} Z \bigg)^T \Bigg], \qquad (11)$$

where $Z = w^T x_j^i - f^{(P)}(w^T x_j^i)$.
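A sketch of the projection of a point onto the affine subspace spanned by P feature points, following (7)-(9); numpy's least-squares routine plays the role of the term (Y^T Y)^{-1} Y^T, and the example values are arbitrary.

import numpy as np

def nfs_projection(y_i, points):
    """Project y_i onto the affine subspace spanned by P feature points (Eq. (9)).

    points : list of P vectors [y_1, ..., y_P]. Illustrative helper, not code
    from the paper. Returns the projection f(y_i) and coefficients s (sum to 1).
    """
    y1 = points[0]
    Y = np.stack([p - y1 for p in points[1:]], axis=1)     # r x (P-1) basis matrix Y_(1:P)
    coef, *_ = np.linalg.lstsq(Y, y_i - y1, rcond=None)    # solves (Y^T Y) s = Y^T (y_i - y_1)
    proj = Y @ coef + y1                                   # f^(P)(y_i)
    s = np.concatenate(([1.0 - coef.sum()], coef))         # s_1 = 1 - sum of the others
    return proj, s

# Example: distance from a point to the plane through three prototypes (P = 3)
y = np.array([1.0, 2.0, 3.0])
plane = [np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
f_y, s = nfs_projection(y, plane)
print(np.linalg.norm(y - f_y))   # point-to-plane distance, here 3.0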


TABLE 1 The Experimental Parameters for Various Data Sets

In (10) and (11), $F_{K_1}^{(P)}(x_j^i)$ indicates the $K_1$ nearest feature subspaces generated by $P$ feature points within the same class as point $x_j^i$, and $B_{K_2}^{(P)}(x_j^i)$ is the set of $K_2$ nearest feature subspaces generated by $P$ feature points belonging to classes different from that of $x_j^i$. Similarly, the scatters $S_W^{(P)}$ and $S_B^{(P)}$ for function $F_2$ are calculated as

$$S_W^{(P)} = \sum_{i=1}^{N_C} \sum_{j=1}^{N_i} \bigg( \sum_{f^{(P)} \in F_{K_1}^{(P)}(x_j^i)} Z Z^T \bigg), \qquad (12)$$

and

$$S_B^{(P)} = \sum_{i=1}^{N_C} \sum_{j=1}^{N_i} \bigg( \sum_{f^{(P)} \in B_{K_2}^{(P)}(x_j^i)} Z Z^T \bigg). \qquad (13)$$

Thereafter, the optimal transformation matrix $w^*$ is obtained by maximizing the criterion $w^* = \arg\max_w S_B^{(P)} / S_W^{(P)}$. The procedure of the NFS embedding algorithm is given below.

3.4 The NFS Embedding Algorithm

Input: $N$ training samples $z_1, z_2, \ldots, z_N$.
Output: The transformation matrix $w = w_{PCA} w^*$.
Step 1: Initialize the four parameters $P$, $K_1$, $K_2$, and $r$.
Step 2: Find the projection matrix $w_{PCA}$ by the PCA method. The sample data are transformed by matrix $w_{PCA}$: $x_i = w_{PCA}^T z_i$, $i = 1, 2, \ldots, N$.
Step 3: Projection point generation. 1) Obtain the projection points of all feature points onto the possible feature spaces $f^{(P)}(x_i)$, $i = 1, 2, \ldots, N$. 2) Calculate and sort the distances $\| x_i - f^{(P)}(x_i) \|$.
Step 4: Compute the within-class and between-class scatters. 1) Select the $K_1$ vectors with the smallest distances from a specified point to the feature lines within the same class; the within-class scatter $S_W^{(P)}$ is calculated by (10) or (12). 2) Choose the $K_2$ vectors with the smallest distances from a point to the feature lines belonging to different classes; the between-class scatter $S_B^{(P)}$ is calculated by (11) or (13).
Step 5: Maximize the Fisher criterion to obtain the transformation matrix $w^* = \arg\max_w S_B^{(P)} / S_W^{(P)}$, which is composed of the $r$ eigenvectors with the largest eigenvalues.
Step 6: Output the transformation matrix $w_{PCA} w^*$.
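The following Python sketch puts the training procedure together for the feature-line case (P = 2) using the F2-style scatters of (12) and (13). It is a simplified, illustrative implementation: between-class lines are restricted to pairs drawn from a single other class, the generalized eigenproblem is solved through a pseudo-inverse, and the parameter defaults are arbitrary rather than taken from the paper.

import numpy as np
from itertools import combinations

def line_residual(x, xm, xn):
    """Vector from the projection of x on the line through (xm, xn) to x, i.e. x - f(x)."""
    d = xn - xm
    s = np.dot(x - xm, d) / np.dot(d, d)
    return x - (xm + s * d)

def nfs_embedding_train(Z, labels, K1=3, K2=9, r=20, pca_dim=60):
    """Sketch of NFS embedding training for P = 2 (brute force, for illustration).

    Z      : (N, d) raw training images, one row per sample
    labels : (N,) class labels
    Returns the composed d x r transformation w = w_PCA @ w_star.
    """
    Z = np.asarray(Z, dtype=float)
    mean = Z.mean(axis=0)
    # Step 2: PCA to avoid the small-sample-size (S3) problem
    U, svals, _ = np.linalg.svd(Z - mean, full_matrices=False)
    w_pca = ((Z - mean).T @ U[:, :pca_dim]) / svals[:pca_dim]   # d x pca_dim basis
    X = (Z - mean) @ w_pca                                      # N x pca_dim reduced samples

    dim = X.shape[1]
    S_W = np.zeros((dim, dim))
    S_B = np.zeros((dim, dim))
    for i, (x, c) in enumerate(zip(X, labels)):
        within, between = [], []
        for m, n in combinations(range(len(X)), 2):
            if i in (m, n):
                continue                                        # lines through x_i itself are excluded
            z = line_residual(x, X[m], X[n])
            if labels[m] == c and labels[n] == c:
                within.append(z)                                # candidate within-class feature line
            elif labels[m] == labels[n] and labels[m] != c:
                between.append(z)                               # line built inside one other class (assumption)
        within.sort(key=np.linalg.norm)
        between.sort(key=np.linalg.norm)
        for z in within[:K1]:
            S_W += np.outer(z, z)                               # Eq. (12)
        for z in between[:K2]:
            S_B += np.outer(z, z)                               # Eq. (13)

    # Step 5: maximize the Fisher criterion S_B / S_W
    evals, evecs = np.linalg.eig(np.linalg.pinv(S_W) @ S_B)
    order = np.argsort(-evals.real)
    w_star = evecs[:, order[:r]].real                           # pca_dim x r
    return w_pca @ w_star                                       # composed d x r transform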

4 EXPERIMENTAL RESULTS

Experiments were conducted to show the effectiveness of the proposed algorithm. Several face benchmark databases, including ORL [38], CMU [39], IIS [40], and XM2VTS [41], were used to evaluate the recognition performance. These data sets are briefly described as follows: The CMU face data set is composed of 68 people with PIE variations; in this study, 170 images per individual were selected for the experiments, and lighting, expression, and the wearing of spectacles were taken into consideration in the evaluation. Next, the ORL data set consists of 400 images, 40 individuals with 10 images per person, with a frontal view and a neutral expression; the images were grabbed under well-controlled conditions and are frequently used in the exercises of many pattern recognition classes. The IIS data set contains a large number of image samples: 12,000 images of 400 people with pose and expression changes, of which 128 people were selected for the performance evaluation. The XM2VTS database contains 295 individuals, and 12 frontal face images of each individual (three subsets, CDS001, CDS002, and CDS008) were taken in four different sessions; the images in subset CDS008 were acquired under illumination changes. More details can be found in [38], [39], [40], [41]. The training and testing samples for the various databases are summarized in Table 1.

Before the comparison of recognition performance, a simple example is given to illustrate class separability, neighborhood structure preservation, and the NFS measurement. For three classes of the CMU database, the distributions of 510 samples are shown in Figs. 3 and 4, projected onto the first two axes by the unsupervised LPP and the NFL embedding algorithms, respectively. Since the prior class information is not used in the unsupervised LPP method, the samples of each class are mixed, as seen in Fig. 3. On the other hand, the samples belonging to the same class are grouped together in Fig. 4; they are classified into three groups, and samples with similar poses and illumination conditions are distributed within a small region. Compared with Fig. 3, class separability and locality preservation can be observed from the sample distribution. Since the NFL measurement is embedded in the transformation, the illumination of the sample images located at the boundary


Fig. 3. The sample distribution on the first two axes using the unsupervised LPP approach.

Fig. 4. The sample distribution on the first two axes using the NFL embedding approach.

Fig. 5. The sample distributions of four class samples on the first two axes using (a) the supervised LPP and (b) the NFL embedding methods.

regions of each class changes gradually. This is a consequence of the linear approximation of samples by the NFL measurement. As mentioned in [1], the class label of each sample can be adopted during the training process in the supervised LPP method; for a specified point, the samples belonging to different classes are not considered in the similarity matrix even if they are very close. The sample distributions of four classes on the first two projected axes are depicted in Fig. 5 for the supervised LPP and the NFL embedding methods. These classes are successfully separated into four groups. Since only the within-class scatter is considered in the LPP method, the classes generated by the NFL method (see Fig. 5b) spread more widely apart than those generated by LPP (see Fig. 5a). The wider the interclass scatter, the better the performance that can be achieved; this is verified by the experimental results. For the sake of fair comparison, the class labels are used for all compared algorithms in the experiments. The experimental results are divided into two parts. The first part shows the discriminant power of the NFL embedding method using the NN matching rule, which is commonly used in many traditional classification methods. In the second part, the NFL measurement is adopted both in the discriminant analysis and in the matching phase. In the first part, four state-of-the-art methods, locality preserving projection (PCA+LPPFace) [1],1 orthogonal LPP (PCA+OLPPFace)
1. The source codes are available from the web site: http://www.zjucadcg.cn/dengcai/Data/ORL/results_new.html.

Fig. 6. The illustrated face-only images of size 32 by 32 for the CMU benchmark database.

[4], orthogonal neighborhood preserving discriminant analysis (PCA+ONPDA) [17], and marginal Fisher analysis (PCA+MFA) [19], were implemented for performance comparison. In the experiments, several images were randomly selected from the data sets for training, and the others were used for testing. Face-only images of size 32 by 32 were cropped from the original ones to eliminate the influence of hair and background, as shown in Fig. 6. In addition, the PCA process was first performed to reduce the dimensionality in order to avoid the S3 problem. The sample dimensions for the ORL, XM2VTS, CMU, and IIS databases were reduced so as to keep more than 99 percent of the feature energy in the PCA process; the parameters are tabulated in Table 1. After the PCA transformation $w_{PCA}$, the optimal transformation matrices $w^*$ were obtained for the four state-of-the-art algorithms and the NFL embedding approach (i.e., $P = 2$). All testing samples were matched with the training samples using the NN matching rule. Each algorithm was run 10 times to obtain the average rates. The highest recognition rates for various numbers of training samples, for the implemented algorithms and the four benchmark databases, are tabulated in Tables 2, 3, and 4.
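A small helper mirroring the PCA step of the experimental setup, in which enough principal components are kept to retain 99 percent of the variance before the discriminant projection is learned; the function is hypothetical and not part of the paper's code.

import numpy as np

def pca_keep_energy(Z, energy=0.99):
    """Return a PCA basis keeping the given fraction of the total variance."""
    Zc = Z - Z.mean(axis=0)
    _, svals, Vt = np.linalg.svd(Zc, full_matrices=False)
    var = svals ** 2
    k = int(np.searchsorted(np.cumsum(var) / var.sum(), energy)) + 1
    return Vt[:k].T, k          # (d x k) projection basis and the retained dimension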


TABLE 2 The Recognition Performance on the ORL and XM2VTS Databases (Percent)

TABLE 3 The Recognition Performance on the CMU Database (Percent)

TABLE 4 The Recognition Performance on the IIS Database (Percent)

Fig. 7. Recognition rates versus dimensionality reduction with various training samples on the ORL database.

The sample dimensions were reduced by the optimal transformation $w^*$, as listed in the parentheses in these tables. From these tables, the proposed NFL embedding outperformed the other four algorithms. Moreover, the average recognition rates versus the reduced dimensions on the benchmark databases are shown in Figs. 7, 8, 9, and 10. The standard deviations for these data sets are also tabulated in Table 5 to demonstrate the robustness of the proposed method. From this table, the standard deviations of the compared methods are almost the same, while the performance of the NFL embedding method is better than that of the other algorithms. In addition to the results for $P = 2$, the recognition rates versus the reduced dimensions for $P \geq 3$ on the various benchmark databases are shown in Fig. 11. From these results, the recognition rates are almost the same; there is no evidence that the performance for $P \geq 3$ must be better than that for $P = 2$.

In addition to the eigenspace-based approaches, the NFL algorithm proposed by Chien and Wu [36], which improves the matching phase, was implemented for comparison in the second part. The results are shown in Fig. 12 and Table 6. In their method, the input images were transformed to dimensionality 256 by the wavelet and LDA transforms; the $\sum_{i=1}^{N_C} C_2^{N_i}$ distances from the query image to the feature lines were then calculated, and the class ID with the smallest distance was output. In the proposed method, the PCA process was executed to reduce the dimensions of the samples in the ORL, XM2VTS, CMU, and IIS databases to 256, the NFL measurement was embedded into the discriminant analysis, and the NFL rule was used in the matching phase. More discriminating power was obtained at low dimensions,

Fig. 8. Recognition rates versus dimensionality reduction with various training samples on the XM2VTS database.


Fig. 9. Recognition rates versus dimensionality reduction with various training samples on the CMU database.

Fig. 10. Recognition rates versus dimensionality reduction with various training samples on the IIS database.


TABLE 5 The Average Recognition Rates and the Standard Deviations on Various Databases (Percent)

e.g., fewer than 40. Since Chien's method uses the conventional LDA transformation, the discriminant power at low dimensions was low, and the recognition performance was poor even though many pseudoprototypes were generated. In our approach, the NFL-based matching strategy can be used in both the discriminant analysis and the matching processes. From the recognition results shown in Fig. 12 and Table 6, the proposed method outperformed that of Chien and Wu.

The training and testing times are analyzed with respect to the number of training samples, as shown in Fig. 13. As mentioned before, $C_P^{N-1}$ possible combinations must be checked for a specified feature point; the calculation of the projection points and the sorting of the distances are performed during training. When $N_C$ classes, each of


size $N_i$, $i = 1, 2, \ldots, N_C$, constitute the training set, $C_P^{N_i - 1}$ and $\sum_{j=1, j \neq i}^{N_C} C_P^{N_j}$ combinations are generated for finding the $K_1$ and $K_2$ nearest feature spaces of a point in the $i$th class. In total, $\sum_{i=1}^{N_C} N_i C_P^{N_i - 1}$ and $\sum_{i=1}^{N_C} N_i \sum_{j=1, j \neq i}^{N_C} C_P^{N_j}$ combinations are considered to compute the within-class and between-class scatters, respectively. When point-based scatters are constructed, only $\sum_{i=1}^{N_C} N_i^2$ and $\sum_{i=1}^{N_C} N_i \sum_{j=1, j \neq i}^{N_C} N_j$ combinations are considered to obtain the intraclass and interclass scatters, i.e., the special case $P = 1$. During the testing phase, the matching time is the same as that of the traditional methods using the NN matching rule, i.e., $\sum_{i=1}^{N_C} N_i$ comparisons. If the NFS matching rule is adopted, $\sum_{i=1}^{N_C} C_P^{N_i}$ combinations have to be verified; this vast number of combinations makes the process time-consuming. In order to boost the efficiency, Chien [36] proposed an improvement that filters out impossible feature spaces: the 10 nearest classes are first chosen by comparing the distances from a probe point to the class centers. Using this strategy, there is a risk of about a 2 percent drop in the recognition rate, and from the empirical results, at least half of the total classes should be verified to maintain the performance. The main disadvantage of this study is that it requires more training time than the traditional eigenspace-based approaches.
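The combination counts quoted above can be reproduced with a few lines of Python; the class sizes used in the example are arbitrary.

from math import comb

def scatter_combinations(class_sizes, P=2):
    """Count the candidate subspaces examined when building the scatters.

    within-class : sum_i N_i * C(N_i - 1, P)
    between-class: sum_i N_i * sum_{j != i} C(N_j, P)
    """
    within = sum(n * comb(n - 1, P) for n in class_sizes)
    between = sum(
        n * sum(comb(m, P) for j, m in enumerate(class_sizes) if j != i)
        for i, n in enumerate(class_sizes)
    )
    return within, between

# e.g. 68 classes with 10 training samples each, feature lines (P = 2)
print(scatter_combinations([10] * 68, P=2))   # (24480, 2050200)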

Fig. 11. Recognition rates versus dimensionality reduction with various databases for P = 2, P = 3, and P = 4. (a) ORL database. (b) XM2VTS database. (c) CMU database. (d) IIS database.


Fig. 12. Recognition rates versus dimensionality reduction with various databases for NFL embedding and Chien and Wu [36] approaches. (a) ORL database. (b) XM2VTS database. (c) CMU database. (d) IIS database.

Two parameters, $K_1$ and $K_2$, are empirically selected as tabulated in Table 1; they highly depend on the sample distributions. Basically, the value of $K_1$ is set to the number of possible line combinations when the samples within a class are few. If the training samples within a class are many, the maximal value of $K_1$ is set to 10 because of the time-consuming computation. On the other hand, the value of $K_2$ is set as a multiple of $K_1$, according to extensive experimentation; in this study, $K_2$ was set to 2 to 10 times $K_1$. When the ratio between $K_2$ and $K_1$ is larger than 10, much more training time is needed. Moreover, a test on a simple simulated example was conducted to demonstrate the discriminant behavior of the proposed method when multiple subclasses exist.

TABLE 6 The Recognition Performance on Four Benchmark Databases (Percent)

Fig. 13. The (a) training and (b) testing time on the CMU database. (a) For 68 classes. (b) For a probe sample.


Two classes composed of 800 samples are given in a 2D space, as shown in Fig. 14. Each class is represented by two separated Gaussians. They are trained by the LDA, supervised LPP, and NFL embedding methods, and the first projection axes of these algorithms are shown in Fig. 14. The figure reveals that the proposed method also works well when multiple subclasses exist. On the other hand, multiple-subclass conditions seldom occur in the experiments because the training samples within a class are few, so their influence is small. Since only a few line combinations are selected for the within-class and between-class scatters, the discriminant vectors with larger lengths, which span different subclasses, can be ignored.

Fig. 14. The discriminant results generated by the LDA, supervised LPP, and NFL embedding methods when multiple subclasses occur.

5 CONCLUSIONS

In this study, the NFS distance measurement has been embedded in the discriminant analysis. The scatters of points to NFSs are represented as a Laplacian matrix for local structure preservation. Although the representation is the same as those of LLE and LPP, the weight setting based on the NFS embedding approach is more systematic. In addition, the Fisher criterion is maximized using the prior class information, and the resulting transformation provides more discriminating power. The experimental results show that not only is the structure locality preserved, but class separability is also achieved. The matching time is the same as that of the traditional eigenspace-based approaches using the NN matching rule. In the future, the use of kernel functions for nonlinear manifold learning will be addressed, and more discriminating vectors need to be found and analyzed to obtain more reliable results.

APPENDIX

In this Appendix, we show that the objective functions (3) and (4) with the NFL-based distance measure, $P = 2$, can be represented in terms of a Laplacian matrix. The distance from feature line $\overline{y_m y_n}$ to point $y_i$ is calculated as $\| y_i - f^{(2)}_{m,n}(y_i) \|$, as shown in Fig. 1a. The projection point can be written as $f^{(2)}_{m,n}(y_i) = y_m + s_{m,n}(y_n - y_m)$, in which $s_{m,n} = (y_i - y_m)^T (y_n - y_m) / \big( (y_n - y_m)^T (y_n - y_m) \big)$ and $i \neq m \neq n$. The vector from feature line $\overline{y_m y_n}$ to point $y_i$ is obtained as follows:

$$y_i - f^{(2)}_{m,n}(y_i) = y_i - y_m - s_{m,n}(y_n - y_m) = y_i - (1 - s_{m,n}) y_m - s_{m,n} y_n = y_i - s_{n,m} y_m - s_{m,n} y_n, \qquad (14)$$

with $s_{m,n} + s_{n,m} = 1$. If the $K$ nearest feature lines are chosen from the $C_2^{N-1}$ possible combinations, the objective function $F_1$ in (3) becomes, by simple algebraic operations,

$$F_1 = \sum_i \Big\| \sum_{m \neq n} \big( y_i - f^{(2)}_{m,n}(y_i) \big)\, w^{(2)}_{m,n}(y_i) \Big\|^2 = \sum_i \Big\| \sum_{m \neq n} \big( y_i - s_{n,m} y_m - s_{m,n} y_n \big)\, w^{(2)}_{m,n}(y_i) \Big\|^2 = \sum_i \Big\| y_i - \sum_j M_{i,j} y_j \Big\|^2 = \operatorname{tr}\big( Y (I - M)^T (I - M) Y^T \big) = \operatorname{tr}\big( Y (D - W) Y^T \big) = \operatorname{tr}( w^T X L X^T w ). \qquad (15)$$

In the above formula, the weights $w^{(2)}_{m,n}$ and the matrix $M$ are respectively defined as

$$w^{(2)}_{m,n}(y_i) = \begin{cases} \frac{1}{K} & m \neq n \neq i, \text{ if line } \overline{x_m x_n} \text{ is among the } K \text{ nearest feature lines of point } x_i, \\ 0 & \text{otherwise,} \end{cases} \qquad (16)$$

$$M_{i,n} = \begin{cases} \frac{1}{K} \sum_m t_{m,n} & \text{if } w^{(2)}_{m,n}(y_i) \neq 0,\; m \neq n \neq i, \\ 0 & \text{otherwise,} \end{cases} \qquad (17)$$

where $t_{m,n} = (x_i - x_m)^T (x_n - x_m) / \big( (x_n - x_m)^T (x_n - x_m) \big)$ and $\sum_j M_{i,j} = 1$. It should be noted again that when line $\overline{x_m x_n}$ is the nearest feature line of point $x_i$, i.e., lines $\overline{x_n x_m}$ and $\overline{x_m x_n}$ are the same, the weights $w^{(2)}_{m,n}(y_i)$ and $w^{(2)}_{n,m}(y_i)$ are both set to $1/K$. From the results in [19], matrix $W$ is set as $W_{i,j} = (M + M^T - M^T M)_{i,j}$ when $i \neq j$, and $0$ otherwise. The objective function $F_1$ in (3) can therefore be represented in terms of a Laplacian matrix. On the other hand, the objective function $F_2$ in (4) is first decomposed into $K$ components, where each component denotes the summation of squared distances from point $y_i$ to its $k$th nearest feature line. Consider the first component: matrix $M_{i,j}(1)$ denotes the connectivity relationship between point $x_i$ and its nearest feature line $\overline{x_m x_n}$, $i, m, n = 1, 2, \ldots, N$ and $i \neq m \neq n$. Two nonzero terms, $M_{i,n}(1) = t_{m,n}$ and $M_{i,m}(1) = t_{n,m}$, exist in each row of matrix $M(1)$, and they satisfy $\sum_j M_{i,j}(1) = 1$. In general, $M_{i,n}(k) = t_{m,n}$ and $M_{i,m}(k) = t_{n,m}$, $i \neq m \neq n$, if line $\overline{x_m x_n}$ is the $k$th nearest feature line of point $x_i$, and $0$ otherwise. All components can be written in terms of a Laplacian matrix. Therefore, function $F_2$ becomes


$$F_2 = \sum_i \sum_{m \neq n} \big\| y_i - f^{(2)}_{m,n}(y_i) \big\|^2 w^{(2)}_{m,n}(y_i) = \sum_i \sum_{m \neq n} \big\| y_i - s_{n,m} y_m - s_{m,n} y_n \big\|^2 w^{(2)}_{m,n}(y_i) = \sum_i \Big\| y_i - \sum_j M_{i,j}(1) y_j \Big\|^2 + \sum_i \Big\| y_i - \sum_j M_{i,j}(2) y_j \Big\|^2 + \cdots + \sum_i \Big\| y_i - \sum_j M_{i,j}(K) y_j \Big\|^2 = \operatorname{tr}\big( Y (I - M(1))^T (I - M(1)) Y^T \big) + \operatorname{tr}\big( Y (I - M(2))^T (I - M(2)) Y^T \big) + \cdots + \operatorname{tr}\big( Y (I - M(K))^T (I - M(K)) Y^T \big) = \operatorname{tr}\big( Y (D(1) - W(1)) Y^T \big) + \operatorname{tr}\big( Y (D(2) - W(2)) Y^T \big) + \cdots + \operatorname{tr}\big( Y (D(K) - W(K)) Y^T \big) = \operatorname{tr}\big( Y (D - W) Y^T \big) = \operatorname{tr}( w^T X L X^T w ). \qquad (18)$$

Each $W_{i,j}(k) = (M(k) + M(k)^T - M(k)^T M(k))_{i,j}$, $D = \frac{1}{K} (D(1) + D(2) + \cdots + D(K))$, and $W = \frac{1}{K} (W(1) + W(2) + \cdots + W(K))$. Thus, function $F_2$ can be represented in terms of a Laplacian matrix.
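A sketch of the Appendix construction for the feature-line case, ignoring class labels as the derivation itself does: for each point, the K nearest feature lines contribute the interpolation weights t to matrix M, from which W, D, and the Laplacian L are formed as in (15)-(17). The helper is illustrative only.

import numpy as np
from itertools import combinations

def nfl_laplacian(X, K=3):
    """Build the Laplacian L = D - W used in tr(w^T X L X^T w) for P = 2.

    X : (N, d) samples. For each point x_i the K nearest feature lines set
    M[i, n] += t/K and M[i, m] += (1 - t)/K, then W = M + M^T - M^T M
    (off-diagonal) and D = diag(row sums of W).
    """
    N = len(X)
    M = np.zeros((N, N))
    for i in range(N):
        cands = []
        for m, n in combinations(range(N), 2):
            if i in (m, n):
                continue
            d = X[n] - X[m]
            t = np.dot(X[i] - X[m], d) / np.dot(d, d)      # position parameter on the line
            proj = X[m] + t * d
            cands.append((np.linalg.norm(X[i] - proj), m, n, t))
        cands.sort(key=lambda c: c[0])
        for _, m, n, t in cands[:K]:                        # K nearest feature lines of x_i
            M[i, n] += t / K                                # M_{i,n} accumulates t_{m,n} / K
            M[i, m] += (1.0 - t) / K                        # M_{i,m} accumulates t_{n,m} / K
    W = M + M.T - M.T @ M
    np.fill_diagonal(W, 0.0)
    D = np.diag(W.sum(axis=1))
    return D - W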


ACKNOWLEDGMENTS

The authors thank Professor Stan Z. Li and the anonymous reviewers for providing valuable comments which considerably improved the quality of this paper. The work was supported by the National Science Council under grant no. NSC 97-2221-E-239-023-MY2, and by the Technology Development Program for Academia of DOIT, MOEA, Taiwan, under grant no. 98-EC-17-A-02-S1-032.


REFERENCES

[1] X. He, S. Yan, Y. Hu, P. Niyogi, and H.J. Zhang, "Face Recognition Using Laplacianfaces," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 27, no. 3, pp. 328-340, Mar. 2005.
[2] M. Turk and A.P. Pentland, "Face Recognition Using Eigenfaces," Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 586-591, 1991.
[3] P.N. Belhumeur, J.P. Hespanha, and D.J. Kriegman, "Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 711-720, July 1997.
[4] D. Cai, X. He, J. Han, and H. Zhang, "Orthogonal Laplacianfaces for Face Recognition," IEEE Trans. Image Processing, vol. 15, no. 11, pp. 3608-3614, Nov. 2006.
[5] J. Tenenbaum, V. Silva, and J. Langford, "A Global Geometric Framework for Nonlinear Dimensionality Reduction," Science, vol. 290, no. 22, pp. 2319-2323, 2000.
[6] H. Zha and Z. Zhang, "Isometric Embedding and Continuum ISOMAP," Proc. 20th Int'l Conf. Machine Learning, pp. 864-871, 2003.
[7] S.T. Roweis and L.K. Saul, "Nonlinear Dimensionality Reduction by Locally Linear Embedding," Science, vol. 290, no. 22, pp. 2323-2326, 2000.
[8] L.K. Saul and S.T. Roweis, "Think Globally, Fit Locally: Unsupervised Learning of Low Dimensional Manifolds," J. Machine Learning Research, vol. 4, pp. 119-155, 2003.


[9] M. Belkin and P. Niyogi, "Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering," Proc. Advances in Neural Information Processing Systems, vol. 14, pp. 585-591, 2001.
[10] T. Zhang, B. Fang, Y. Tang, G. He, and J. Wen, "Topology Preserving Non-Negative Matrix Factorization for Face Recognition," IEEE Trans. Image Processing, vol. 17, no. 4, pp. 574-584, Apr. 2008.
[11] J. Yang, D. Zhang, J.Y. Yang, and B. Niu, "Globally Maximizing, Locally Minimizing: Unsupervised Discriminant Projection with Applications to Face and Palm Biometrics," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 29, no. 4, pp. 650-664, Apr. 2007.
[12] R. Lotlikar and R. Kothari, "Fractional-Step Dimensionality Reduction," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 6, pp. 623-627, June 2000.
[13] J. Yang, D. Zhang, A. Frangi, and J. Yang, "Two-Dimensional PCA: A New Approach to Appearance-Based Face Representation and Recognition," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 26, no. 1, pp. 131-137, Jan. 2004.
[14] K. Etemad and R. Chellappa, "Discriminant Analysis for Recognition of Human Face Images," J. Optical Soc. Am., vol. 14, no. 8, pp. 1724-1733, 1997.
[15] J. Fortuna and D. Capson, "Improved Support Vector Classification Using PCA and ICA Feature Space Modification," Pattern Recognition, vol. 37, no. 6, pp. 1117-1129, 2004.
[16] M. Loog, R.P.W. Duin, and R. Haeb-Umbach, "Multiclass Linear Dimension Reduction by Weighted Pairwise Fisher Criteria," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 7, pp. 762-766, July 2001.
[17] H.F. Hu, "Orthogonal Neighborhood Preserving Discriminant Analysis for Face Recognition," Pattern Recognition, vol. 41, no. 6, pp. 2045-2054, 2008.
[18] J.B. Li, J.S. Pan, and S.C. Chu, "Kernel Class-Wise Locality Preserving Projection," Information Sciences, vol. 178, no. 7, pp. 1825-1835, 2008.
[19] S. Yan, D. Xu, B. Zhang, H.J. Zhang, Q. Yang, and S. Lin, "Graph Embedding and Extensions: A General Framework for Dimensionality Reduction," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 29, no. 1, pp. 40-51, Jan. 2007.
[20] D. Swets and J. Weng, "Using Discriminant Eigenfeatures for Image Retrieval," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 18, no. 8, pp. 831-836, Aug. 1996.
[21] L.F. Chen, H.Y.M. Liao, M.T. Ko, J.C. Lin, and G.J. Yu, "A New LDA-Based Face Recognition System Which Can Solve the Small Sample Size Problem," Pattern Recognition, vol. 33, no. 10, pp. 1713-1726, 2000.
[22] H. Cevikalp, M. Neamtu, M. Wilkes, and A. Barkana, "Discriminative Common Vectors for Face Recognition," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 27, no. 1, pp. 4-13, Jan. 2005.
[23] Z. Li, D. Lin, and X. Tang, "Nonparametric Discriminant Analysis for Face Recognition," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 31, no. 4, pp. 755-761, Apr. 2009.
[24] M. Bressan and J. Vitria, "Nonparametric Discriminant Analysis and Nearest Neighbor Classification," Pattern Recognition Letters, vol. 24, no. 15, pp. 2743-2749, 2003.
[25] X. Jiang, B. Mandal, and A. Kot, "Eigenfeature Regularization and Extraction in Face Recognition," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 30, no. 3, pp. 383-394, Mar. 2008.
[26] X.G. Wang and X. Tang, "A Unified Framework for Subspace Face Recognition," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 26, no. 9, pp. 1222-1228, Sept. 2004.
[27] A.N. Rajagopalan, R. Chellappa, and N. Koterba, "Background Learning for Robust Face Recognition with PCA in the Presence of Clutter," IEEE Trans. Image Processing, vol. 14, no. 6, pp. 832-843, June 2005.
[28] J. Lu, X. Yuan, and T. Yahagi, "A Method of Face Recognition Based on Fuzzy C-Means Clustering and Associated Sub-NNs," IEEE Trans. Neural Networks, vol. 18, no. 1, pp. 150-160, Jan. 2007.
[29] X. Geng, Z. Zhou, and K. Smith-Miles, "Individual Stable Space: An Approach to Face Recognition under Uncontrolled Conditions," IEEE Trans. Neural Networks, vol. 19, no. 8, pp. 1354-1368, Aug. 2008.
[30] B. Park, K. Lee, and S. Lee, "Face Recognition Using Face-ARG Matching," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 27, no. 12, pp. 1982-1988, Dec. 2005.


[31] Y. Liu and Y. Chen, "Face Recognition Using Total Margin-Based Adaptive Fuzzy Support Vector Machines," IEEE Trans. Neural Networks, vol. 18, no. 1, pp. 178-192, Jan. 2007.
[32] J. Chien and C. Liao, "Maximum Confidence Hidden Markov Modeling for Face Recognition," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 30, no. 4, pp. 606-616, Apr. 2008.
[33] S.Z. Li, "Face Recognition Based on Nearest Linear Combinations," Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 839-844, 1998.
[34] S.Z. Li and J. Lu, "Face Recognition Using the Nearest Feature Line Method," IEEE Trans. Neural Networks, vol. 10, no. 2, pp. 439-443, Mar. 1999.
[35] S.Z. Li, K.L. Chan, and C.L. Wang, "Performance Evaluation of the Nearest Feature Line Method in Image Classification and Retrieval," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 11, pp. 1335-1339, Nov. 2000.
[36] J.T. Chien and C.C. Wu, "Discriminant Waveletfaces and Nearest Feature Classifiers for Face Recognition," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 12, pp. 1644-1649, Dec. 2002.
[37] G. Strang and K. Borre, Linear Algebra, Geodesy, and GPS, pp. 165-171, Wellesley-Cambridge Press, 1997.
[38] The Olivetti & Oracle Research Laboratory Face Database of Faces, http://www.cam-orl.co.uk/facedatabase.html, 1994.
[39] T. Sim, S. Baker, and M. Bsat, "The CMU Pose, Illumination, and Expression Database," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 25, no. 12, pp. 1615-1618, Dec. 2003.
[40] http://www.iis.sinica.edu.tw, 2009.
[41] J. Luettin and G. Maitre, "Evaluation Protocol for the Extended M2VTS Database (XM2VTS)," Dalle Molle Inst. for Perceptual Artificial Intelligence (IDIAP), 1998.

Ying-Nong Chen received the BS and the MS degrees in information management and informatics from Nan Hua University and Fo Guang University, Taiwan, in 2000 and 2003, respectively. He is currently pursuing the PhD degree in computer science and information engineering at National Central University, Taiwan. His research interests include pattern recognition, computer vision, and machine learning.


Chin-Chuan Han received the BS degree in computer engineering from National Chiao-Tung University in 1989, and the MS and PhD degrees in computer science and electronic engineering from National Central University in 1991 and 1994, respectively. From 1995 to 1998, he was a postdoctoral fellow in the Institute of Information Science, Academia Sinica, Taipei, Taiwan. He was an assistant research fellow in the Telecommunication Laboratories, Chunghwa Telecom Co., in 1999. From 2000 to 2004, he worked with the Department of Computer Science and Information Engineering, Chung Hua University, Taiwan. In 2004, he joined the Department of Computer Science and Information Engineering, National United University, Taiwan, where he became a professor in 2007. He is a member of the IEEE, the SPIE, and the IPPR in Taiwan. His research interests are in the areas of face recognition, biometrics authentication, video surveillance, image analysis, computer vision, and pattern recognition.

Cheng-Tzu Wang received the MS and PhD degrees from the Center for Advanced Computer Studies, University of Louisiana, in 1991 and 1994, respectively. He is currently an associate professor in the Department of Computer Science at National Taipei University of Education, Taiwan. His current interests include image processing, hybrid soft computing models, and software engineering.

Kuo-Chin Fan received the BS degree in electrical engineering from the National TsingHua University, Hsinchu, in 1981, and the MS and PhD degrees from the University of Florida (UF), Gainesville, in 1985 and 1989, respectively. In 1983, he joined the Electronic Research and Service Organization (ERSO), Taiwan, as a computer engineer. From 1984 to 1989, he was a research assistant with the Center for Information Research at UF. In 1989, he joined the Institute of Computer Science and Information Engineering, National Central University, Chung-Li, Taiwan, where he became a professor in 1994, and from 1994 to 1997, he was the chairman of the department. Currently, he is the director of the Computer Center. His current research interests include image analysis, optical character recognition, and document analysis. He is a member of the SPIE.

