Soft Comput (2012) 16:77–87 DOI 10.1007/s00500-011-0735-y

ORIGINAL PAPER

Fuzzy local maximal marginal embedding for feature extraction

Cairong Zhao · Zhihui Lai · Chuancai Liu · Xingjian Gu · Jianjun Qian



Published online: 21 May 2011
© Springer-Verlag 2011

Abstract In graph-based linear dimensionality reduction algorithms, it is crucial to construct a neighbor graph that correctly reflects the relationships between samples. This paper presents an improved algorithm called fuzzy local maximal marginal embedding (FLMME) for linear dimensionality reduction. What significantly distinguishes FLMME from existing graph-based algorithms is that two novel fuzzy gradual graphs are constructed, which help to pull near neighbor samples of the same class closer together and to push the far neighbor samples on the margins between different classes farther apart when they are projected to the feature subspace. Through the fuzzy gradual graphs, the FLMME algorithm has lower sensitivity to sample variations caused by varying illumination, expression, viewing conditions and shapes. The proposed FLMME algorithm is evaluated through experiments on the WINE database, the Yale and ORL face image databases and the USPS handwriting digital database. The results show that FLMME outperforms PCA, LDA, LPP and local maximal marginal embedding (LMME).

Keywords Fuzzy gradual graph · Graph-based learning · Feature extraction

C. Zhao (✉) · Z. Lai · C. Liu · X. Gu · J. Qian
School of Computer Science, Nanjing University of Science and Technology, Nanjing 210094, Jiangsu, China
e-mail: [email protected]

C. Zhao
Department of Physics and Electronics, Minjiang College, Fuzhou 350108, Fujian, China

1 Introduction

Techniques for dimensionality reduction (Batur and Hayes 2001; Yang et al. 2007; Jain et al. 2000; Belhumeur et al. 1997; Martinez and Kak 2001; Turk and Pentland 1991; Fukunaga 1991; Ye et al. 2004; Yu and Yang 2001) have achieved remarkable success in computer vision and pattern recognition. From the perspective of pattern recognition, dimensionality reduction is an effective way of avoiding the "curse of dimensionality" (Jain et al. 2000) and improving the computational efficiency of pattern matching.

Non-linear dimensionality reduction methods (Tenenbaum et al. 2000; Roweis and Saul 2000; Belkin and Niyogi 2003), which use local neighborhood information, are important techniques presented in recent years. These methods typically rely on some local geometry, defined in different ways, to directly find the intrinsic low-dimensional data structures hidden in the observation space. However, they may be unsuitable for pattern recognition tasks, for two reasons. First, the non-linear methods have no direct connection to classification. Second, the low-dimensional embeddings are defined only on the training set and thus cannot be directly applied to test samples. In contrast to most non-linear methods, locality preserving projections (LPP; He et al. 2005; He and Niyogi 2003), the linear extension of Laplacian eigenmaps (LE), has the remarkable advantage that it generates an explicit map, so test samples can be directly mapped into the low-dimensional subspace. However, like most graph-based algorithms, LPP shares the weakness of having no direct connection to classification, and its classification accuracy may therefore be degraded in pattern recognition.

Motivated by the idea of classification-oriented multi-manifold learning, Yang et al. (2007) proposed UDP (unsupervised discriminant projection) for feature extraction, taking into account both local and non-local quantities.


Zhao et al. (2008) proposed local maximal marginal embedding (LMME) for linear dimensionality reduction, which constructs a local interclass graph and an intraclass graph. However, in these methods, samples on the margins between different classes may be projected to neighboring points in the low-dimensional space, which may lead to misclassification. To overcome this disadvantage, Kokiopoulou and Saad (2009) proposed a repulsion graph to describe these marginal samples, which yields encouraging results on face recognition. Still, a major drawback remains in Kokiopoulou and Saad (2009) and previous graph-based methods: the affinity graphs are constructed under the assumption that every sample has the same membership degree to its corresponding class. Since the distances between samples within a local k-nearest neighborhood may vary over a wide range, graphs constructed in this way have the potential disadvantage that the weights are not in accordance with the natural relations of the samples in actual applications.

To better describe the relations among samples, several fuzzy pattern methods have been proposed in recent years. Laskaris and Zafeiriou (2008) suggested a fuzzy connectivity graph to represent the relative distribution of the data and obtained remarkable results for data clustering. Kwak and Pedrycz (2005) proposed a fuzzy Fisherface classifier based on the fuzzy k-nearest neighbor rule (FKNN; Keller et al. 1985) and improved the recognition rate on several face databases, testifying to the effectiveness of the fuzzy approach.

In general, there are two main problems in graph-based methods: describing the marginal samples between different classes, and assigning the membership degree of each sample to its corresponding class. To address these problems, this paper presents a new approach called fuzzy local maximal marginal embedding (FLMME). In FLMME, an enhanced FKNN is implemented to model the natural distribution information of the original samples, and this information is used to redefine the affinity weights of the neighborhood graphs in place of binary or Gaussian-weighted ones. Through this fuzzy gradual pattern, two novel fuzzy gradual graphs (intraclass and interclass) are constructed. For the fuzzy gradual intraclass graph, the main idea is that the nearer the neighbors are, the greater the weights; for the fuzzy gradual interclass graph, the farther the neighbors are, the greater the weights. This helps to pull near neighbor samples of the same class closer together and to push the far neighbor samples on the margins between different classes farther and farther apart when they are projected to the low-dimensional subspace.


The remainder of this paper is organized as follows: Sect. 2 outlines FKNN and LMME; Sect. 3 develops the idea of FLMME and the relevant theory and algorithm; Sect. 4 describes the related experiments; Sect. 5 offers our conclusions.

2 Outline of FKNN and LMME

2.1 FKNN

Fuzzy set theory is a generalization of classical set theory, and fuzzy pattern recognition applies fuzzy logic to classical pattern recognition problems. The FKNN algorithm (Keller et al. 1985) is adopted to calculate fuzzy membership degrees. Let $U = [\mu_{ij}]$, $i = 1, 2, \ldots, C$, $j = 1, 2, \ldots, M$, where $\mu_{ij}$ indicates the degree to which the $j$th sample belongs to class $i$. The fuzzy membership degrees $\mu_{ij}$ are computed by the following procedure (Keller et al. 1985; Kwak and Pedrycz 2005):

Step 1 Compute the Euclidean distance matrix between all pairs of feature vectors in the training set.
Step 2 Set the diagonal elements of this matrix to infinity.
Step 3 Sort each column of the distance matrix in ascending order and collect the class labels of the patterns located in the closest k-neighborhood.
Step 4 Compute the membership degrees of the samples using Eq. 1:

$$\mu_{ij} = \begin{cases} 0.51 + 0.49\,(n_{ij}/k), & \text{if the } j\text{th sample} \in \text{class } i \\ 0.49\,(n_{ij}/k), & \text{otherwise} \end{cases} \tag{1}$$

where $n_{ij}$ stands for the number of neighbors of the $j$th pattern that belong to class $i$.
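For concreteness, the following is a minimal NumPy sketch of this procedure; the function name and the row-wise array layout are ours, not part of the paper.

```python
import numpy as np

def fknn_membership(X, labels, k):
    """FKNN membership degrees of Eq. 1 (Keller et al. 1985).

    X: (M, n) training samples in rows; labels: (M,) class ids.
    Returns U of shape (C, M), where U[i, j] is the degree to which
    sample j belongs to class i.
    """
    labels = np.asarray(labels)
    M = X.shape[0]
    classes = np.unique(labels)
    # Step 1: pairwise Euclidean distance matrix.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Step 2: exclude self-matches.
    np.fill_diagonal(D, np.inf)
    U = np.zeros((len(classes), M))
    for j in range(M):
        # Step 3: class labels of the k nearest neighbors of sample j.
        neighbor_labels = labels[np.argsort(D[:, j])[:k]]
        for i, c in enumerate(classes):
            n_ij = np.sum(neighbor_labels == c)
            # Step 4: Eq. 1 -- the 0.51 offset marks the labeled class.
            base = 0.51 if labels[j] == c else 0.0
            U[i, j] = base + 0.49 * n_ij / k
    return U
```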


2.2 LMME

LMME (Zhao et al. 2008) creates an intrinsic graph to characterize intraclass compactness and a penalty graph to characterize interclass separability. The intrinsic graph describes the intraclass point adjacency relationship: each sample is connected to its k nearest neighbors of the same class. The penalty graph depicts the interclass marginal point adjacency relationship. Given a set of training samples $\{x_i\}_{i=1}^{M}$ from $C$ known pattern classes, where $x_i \in \mathbb{R}^n$, let $X = [x_1, x_2, \ldots, x_M]$ be the data matrix holding all training samples in its columns.

By following the graph embedding formulation, interclass separability is characterized by a penalty graph:

$$S_p = \sum_i \sum_{i \in N_{k_1}(j)\ \text{or}\ j \in N_{k_1}(i)} \|w^T x_i - w^T x_j\|^2 W^p_{ij} = 2\,w^T X (D^p - W^p) X^T w \tag{2}$$

$$W^p_{ij} = \begin{cases} \mu_{ij}, & \text{if } i \in N_{k_1}(j) \text{ or } j \in N_{k_1}(i) \\ 0, & \text{else} \end{cases}$$

where $W^p$ characterizes the affinity weights between classes, each element $W^p_{ij}$ being the weight of the edge between $x_i$ and $x_j$ in different classes; $D^p$ is a diagonal matrix with diagonal elements $D^p_{ii} = \sum_j W^p_{ij}$; $k_1$ denotes the number of nearest neighbors of the sample $x_i$; and $N_{k_1}(i)$ is the index set of the $k_1$ nearest neighbors of $x_i$ in the other classes.

Intraclass compactness is characterized from the intrinsic graph by:

$$S_I = \sum_i \sum_{i \in N^+_{k_2}(j)\ \text{or}\ j \in N^+_{k_2}(i)} \|w^T x_i - w^T x_j\|^2 W^I_{ij} = 2\,w^T X (D^I - W^I) X^T w \tag{3}$$

$$W^I_{ij} = \begin{cases} \mu_{ij}, & \text{if } i \in N^+_{k_2}(j) \text{ or } j \in N^+_{k_2}(i) \\ 0, & \text{else} \end{cases}$$

where $W^I$ characterizes the affinity weights within the same class, each element $W^I_{ij}$ being the weight of the edge between $x_i$ and $x_j$ in the same class; $D^I$ is a diagonal matrix with diagonal elements $D^I_{ii} = \sum_j W^I_{ij}$; $k_2$ denotes the number of nearest neighbors of the sample $x_i$; and $N^+_{k_2}(i)$ is the index set of the $k_2$ nearest neighbors of $x_i$ in the same class.

With the intrinsic graph and the penalty graph, the optimal projections of LMME are obtained by solving the generalized eigen-equation:

$$X(D^I - W^I)X^T w = \lambda\, X(D^p - W^p)X^T w \tag{4}$$

3 FLMME

3.1 Basic idea

The construction of an affinity graph that correctly reflects the relationships among samples is of extreme importance for graph-based algorithms. Graphs have a powerful ability to model geometric structure, since their edge sets describe the proximity and distribution of the data. The difficulty is how best to measure the relations among the data. The problem in existing graph-based methods lies in the assumption that the membership level of each sample to its corresponding class (category) is the same, which is not in accordance with the natural class relations of the data. Furthermore, samples on the margins between different classes may be projected to neighboring points in the low-dimensional space, which may lead to misclassification. It is therefore important to investigate how to construct affinity graphs that help pull such samples away from one another when they are projected to the low-dimensional feature subspace.

Based on these considerations, we present a new approach called FLMME. In FLMME, the enhanced FKNN is implemented to model the natural distribution information of the original samples. Through the enhanced FKNN criterion, two novel fuzzy gradual graphs (intraclass and interclass) are constructed. For the intraclass graph, the main idea is that the nearer the neighbors are, the greater the weights; for the interclass graph, the farther the neighbors are, the greater the weights. We realize these ideas with fuzzy gradual weights, obtaining a fuzzy gradually decreasing intraclass graph that describes intraclass compactness and a fuzzy gradually increasing interclass graph that characterizes interclass separability. Together, these two graphs help pull near neighbor samples of the same class closer and closer together and push the far neighbor samples on the margins between different classes farther and farther apart when they are projected to the low-dimensional subspace; more details are presented in Sect. 3.2. These procedures have low sensitivity to substantial sample variations caused by varying illumination, expression and viewing conditions, since the improved graphs capture more natural information about the proximity of the data. For convenience of description, we use the WINE dataset from UCI to show the effectiveness of the proposed fuzzy graphs, and we measure the robustness of the proposed algorithm to outliers using the Yale face database in Sect. 4.

3.2 Computation of the fuzzy membership degree

FLMME designs fuzzy gradual intraclass and interclass membership degrees. For the fuzzy gradual intraclass membership degree, the k nearest neighbors are weighted by their contribution to intraclass compactness: the nearer the neighbor, the greater the weight. The fuzzy gradual interclass membership degree relates to interclass separability: the farther the neighbor, the greater the weight. The fuzzy gradual intraclass membership degree is defined as


$$\tilde\mu_{ij} = \begin{cases} 0.51 + 0.49\,(n_{ij}/k_c), & \text{if the } i\text{th sample} \in \text{the } j\text{th class} \\ 0.49\,(n_{ij}/k_c), & \text{if the } i\text{th sample} \notin \text{the } j\text{th class} \end{cases}, \qquad n_{ij} = \sum_{q \in c(j)} (1+\varepsilon)^{k_c - q}, \quad 0 < \varepsilon < 0.1 \tag{5}$$

where $c(j)$ denotes the index set of the $j$th class, and $q \in c(j)$ means that the $q$th nearest neighbor of the $i$th sample belongs to $c(j)$; $k_c$ stands for the number of neighbors of the $i$th sample; $n_{ij}$ is the sum of the gradual weights of those neighbors of the $i$th sample (pattern) that belong to the $j$th class; and $\varepsilon$ is a weight parameter that adjusts the degree of compactness within classes. The value 0.02 is assigned to $\varepsilon$ in this study (details on setting this parameter are given in Sect. 3.6). During construction of the intraclass compactness graph, the nearer a neighbor is, the more it contributes: the nearest neighbor carries the biggest weight, $(1+\varepsilon)^{k_c-1}$, the weights of the remaining neighbors decrease with their distance to the sample, and the weight of the $k_c$th neighbor is 1. In this way, close samples within the same class become more compact, and a more reliable intraclass membership description is achieved, which helps construct an intraclass graph that better reflects the intraclass data structure.

The fuzzy gradual interclass membership degree is defined in a similar form:

$$p_{ij} = \begin{cases} 0.51 + 0.49\,(n_{ij}/k_p), & \text{if the } i\text{th sample} \notin \text{the } j\text{th class} \\ 0.49\,(n_{ij}/k_p), & \text{if the } i\text{th sample} \in \text{the } j\text{th class} \end{cases}, \qquad n_{ij} = \sum_{q \notin c(j)} (1+\varepsilon)^{q-1}, \quad 0 < \varepsilon < 0.1 \tag{6}$$

where $q \notin c(j)$ means that the $q$th nearest neighbor does not belong to $c(j)$; $k_p$ stands for the number of neighbors of the $i$th sample; $n_{ij}$ is the sum of the gradual weights of those neighbors that lie in classes other than the $j$th; and $\varepsilon$ is a weight parameter modulating the degree of separation between different classes, again set to 0.02 in this study (see Sect. 3.6). To define an interclass membership degree, intuitively, a farther neighbor contributes more to the interclass separability graph: the farthest, i.e., the $k_p$th, neighbor has the biggest weight, $(1+\varepsilon)^{k_p-1}$. As a result, the marginal samples of different classes become more separable, and a more reliable interclass membership description is obtained, creating an interclass graph that better reflects the interclass data structure.
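The following is a minimal NumPy sketch of Eqs. 5 and 6 under our reading of the garbled originals; in particular, we take the interclass exponent to be $q-1$ so that the farthest of the $k_p$ neighbors carries the stated largest weight $(1+\varepsilon)^{k_p-1}$. Names and the row-wise array layout are ours.

```python
import numpy as np

def fuzzy_gradual_memberships(X, labels, kc, kp, eps=0.02):
    """Fuzzy gradual intraclass (Eq. 5) and interclass (Eq. 6) degrees.

    Returns mu, p of shape (M, C): mu[i, c] is the intraclass degree of
    sample i to class c, and p[i, c] the interclass degree. The q-th
    nearest neighbor contributes (1 + eps)**(kc - q) to the intraclass
    count (nearer neighbors weigh more) and (1 + eps)**(q - 1) to the
    interclass count (farther neighbors weigh more).
    """
    labels = np.asarray(labels)
    M = X.shape[0]
    classes = np.unique(labels)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)
    order = np.argsort(D, axis=1)      # order[i, q-1]: q-th neighbor of i
    mu = np.zeros((M, len(classes)))
    p = np.zeros((M, len(classes)))
    for i in range(M):
        for c_idx, c in enumerate(classes):
            # Eq. 5: gradually decreasing weights over the kc neighbors.
            n_c = sum((1 + eps) ** (kc - q) for q in range(1, kc + 1)
                      if labels[order[i, q - 1]] == c)
            mu[i, c_idx] = (0.51 if labels[i] == c else 0.0) + 0.49 * n_c / kc
            # Eq. 6: gradually increasing weights over the kp neighbors.
            n_p = sum((1 + eps) ** (q - 1) for q in range(1, kp + 1)
                      if labels[order[i, q - 1]] != c)
            p[i, c_idx] = (0.51 if labels[i] != c else 0.0) + 0.49 * n_p / kp
    return mu, p
```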

3.3 Construction of fuzzy gradual neighborhood graphs

By following the graph embedding formulation, intraclass compactness is calculated from the fuzzy gradual intraclass graph:

$$S_C = \sum_i \sum_j \|\omega^T x_i - \omega^T x_j\|^2 W^C_{ij} = 2\,\omega^T X (D^C - W^C) X^T \omega \tag{7}$$

$$W^C_{ij} = \begin{cases} \tilde\mu_{ij}, & \text{if } i \in N^+_{k_c}(j) \text{ or } j \in N^+_{k_c}(i), \text{ and } i, j \in \text{Class } m \\ 0, & \text{else} \end{cases} \tag{8}$$

where $W^C$ characterizes the affinity weights within the same class, each element $W^C_{ij}$ being the weight of the edge between $x_i$ and $x_j$ in the same class; $D^C$ is a diagonal matrix with diagonal elements $D^C_{ii} = \sum_j W^C_{ij}$; and $N^+_{k_c}(i)$ indicates the index set of the $k_c$ nearest neighbors of the sample $x_i$ in the same class.

Interclass separability, namely the margin, is characterized by a fuzzy gradual interclass graph:

$$S_S = \sum_i \sum_j \|\omega^T x_i - \omega^T x_j\|^2 W^s_{ij} = 2\,\omega^T X (D^s - W^s) X^T \omega \tag{9}$$


$$W^s_{ij} = \begin{cases} p_{ij}, & \text{if } i \in N_{k_p}(j),\ i \in \text{Class } m \ \text{or}\ j \in N_{k_p}(i),\ j \in \text{Class } n \\ 0, & \text{else} \end{cases} \tag{10}$$

where $W^s$ characterizes the affinity weights between classes, each element $W^s_{ij}$ being the weight of the edge between $x_i$ and $x_j$ in different classes; $D^s$ is a diagonal matrix with diagonal elements $D^s_{ii} = \sum_j W^s_{ij}$; and $N_{k_p}(i)$ indicates the index set of the $k_p$ nearest neighbors of the sample $x_i$ in different classes.

Based on the criteria in Eqs. 6 and 10, the weighting function has a kind of discontinuity, since the weight of the $(k_p+1)$th neighbor shrinks to 0. We consider this discontinuity reasonable: farther sample points with different labels are assigned larger weights, which yields larger scatter and prevents farther points of different classes from being mapped close together in the low-dimensional subspace, so higher classification accuracy can be expected. Because the relevant samples lie on the local margin, farther neighbors contribute more to interclass separability, and the proposed algorithm performs well as long as $k_p$ is not too large; this is also confirmed by the experimental result (Fig. 1b) discussed in Sect. 3.6. It is therefore reasonable to set the weight of the $(k_p+1)$th nearest neighbor to 0 when constructing the interclass graph.

3.4 Objective function of the proposed method

The proposed algorithm seeks the optimal projections that minimize the fuzzy gradual intraclass graph and simultaneously maximize the fuzzy gradual interclass graph. This gives the following constrained optimization problem:

$$\text{Maximize } J(\omega) = \sum_i \sum_j \|\omega^T x_i - \omega^T x_j\|^2 W^s_{ij} \quad \text{subject to} \quad \sum_i \sum_j \|\omega^T x_i - \omega^T x_j\|^2 W^c_{ij} = 1 \tag{11}$$

The above criterion is formally similar to the Fisher criterion, since both are Rayleigh quotient problems. Therefore, its optimal solutions can be obtained by solving a generalized eigen-equation:

$$X(D^s - W^s)X^T \omega_i = \lambda\, X(D^C - W^C)X^T \omega_i \tag{12}$$

where $\omega_i$ is the generalized eigenvector corresponding to the generalized eigenvalue $\lambda$. We then select the eigenvectors associated with the $d$ largest eigenvalues to form the optimal projection matrix $P_{\text{FLMME}} = [\omega_1, \omega_2, \ldots, \omega_d]$.
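The paper does not spell out the step from Eq. 11 to Eq. 12; for completeness, the standard Lagrangian argument (our addition) reads:

$$\mathcal{L}(\omega, \lambda) = \omega^T X(D^s - W^s)X^T \omega - \lambda\left(\omega^T X(D^C - W^C)X^T \omega - 1\right),$$
$$\frac{\partial \mathcal{L}}{\partial \omega} = 0 \;\Longrightarrow\; X(D^s - W^s)X^T \omega = \lambda\, X(D^C - W^C)X^T \omega,$$

and at a stationary point $J(\omega) = \lambda$, so the $d$ largest generalized eigenvalues give the best projections.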

3.5 FLMME algorithm

The FLMME algorithm is summarized by the following steps:

Step 1 Perform a PCA transformation on the data; let $P_{\text{PCA}}$ denote the PCA transformation matrix.
Step 2 Compute the fuzzy gradual intraclass and interclass membership degree matrices using Eqs. 5 and 6, respectively.
Step 3 Compute the affinity weights using Eqs. 8 and 10, respectively. An edge is added between $x_i$ and $x_j$ of the same class if $x_j$ is one of $x_i$'s $k_c$ nearest neighbors; for the interclass graph, $x_i$ is connected with $x_j$ of a different class if $x_j$ is one of $x_i$'s $k_p$ nearest neighbors.
Step 4 Create the fuzzy gradual intraclass compactness graph and the fuzzy gradual interclass separability graph using Eqs. 7 and 9, respectively.
Step 5 Solve the generalized eigen-equation of Eq. 12 and obtain the optimal projection matrix $P_{\text{FLMME}}$.
Step 6 Output the final linear projection matrix

$$P = P_{\text{PCA}} \cdot P_{\text{FLMME}} \tag{13}$$

Once the projection matrix $P$ is obtained, nearest neighbor classification can be performed in the projected space.
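The steps above can be assembled into a short NumPy/SciPy sketch. This is our illustrative implementation, not the authors' code: it reuses the fuzzy_gradual_memberships sketch from Sect. 3.2, symmetrizes the edge sets for simplicity, and adds a small ridge term because the intraclass matrix can be singular after the PCA phase.

```python
import numpy as np
from scipy.linalg import eigh

def flmme_fit(X, labels, kc, kp, d, pca_dim=40, eps=0.02, ridge=1e-6):
    """Sketch of the FLMME steps of Sect. 3.5 (our code, not the authors').

    X: (M, n) samples in rows (the paper stores samples in columns);
    pca_dim must not exceed min(M, n).
    """
    labels = np.asarray(labels)
    # Step 1: PCA phase.
    mean = X.mean(axis=0)
    Xc = X - mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P_pca = Vt[:pca_dim].T                         # (n, pca_dim)
    Y = Xc @ P_pca
    # Step 2: fuzzy gradual membership degrees (Eqs. 5 and 6).
    mu, p = fuzzy_gradual_memberships(Y, labels, kc, kp, eps)
    col = {c: k for k, c in enumerate(np.unique(labels))}
    # Step 3: affinity weights (Eqs. 8 and 10) on k-nearest-neighbor edges.
    M = Y.shape[0]
    D = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)
    order = np.argsort(D, axis=1)
    Wc = np.zeros((M, M))
    Ws = np.zeros((M, M))
    for i in range(M):
        same = [j for j in order[i] if labels[j] == labels[i]][:kc]
        diff = [j for j in order[i] if labels[j] != labels[i]][:kp]
        for j in same:                             # intraclass edges, Eq. 8
            Wc[i, j] = Wc[j, i] = mu[i, col[labels[j]]]
        for j in diff:                             # interclass edges, Eq. 10
            Ws[i, j] = Ws[j, i] = p[i, col[labels[j]]]
    # Step 4: Laplacians of the two fuzzy gradual graphs (Eqs. 7 and 9).
    Lc = np.diag(Wc.sum(axis=1)) - Wc
    Ls = np.diag(Ws.sum(axis=1)) - Ws
    A = Y.T @ Ls @ Y                               # interclass scatter
    B = Y.T @ Lc @ Y + ridge * np.eye(pca_dim)     # intraclass, regularized
    # Step 5: generalized eigenproblem of Eq. 12; keep the d largest pairs.
    vals, vecs = eigh(A, B)
    P_flmme = vecs[:, np.argsort(vals)[::-1][:d]]
    # Step 6: final projection matrix of Eq. 13.
    return P_pca @ P_flmme, mean
```

Test samples can then be classified by projecting them with `(x_test - mean) @ P` and applying the nearest neighbor rule in the projected space, as the paper prescribes.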

3.6 Analysis of parameters

In FLMME, the number of intraclass neighbors can be chosen as $k_c = l - 1$ during construction of the intraclass graph, where $l$ denotes the number of training samples per class.

Fig. 1 a The average recognition rate versus the weight adjusting parameter ε. b The average recognition rate versus the number of interclass nearest neighbors kp


This choice was shown to be reasonable in the observation space in Yang et al. (2007). Besides the intraclass neighbor parameter, FLMME has two main parameters: the weight adjusting parameter ε and the number of interclass nearest neighbors $k_p$. In this section we illustrate the behavior of the proposed algorithm with respect to both. The experiment used four images per class for training and the rest for testing on the Yale face dataset; details of the dataset are given in Sect. 4.2. To reduce the disturbance from any single training set, we performed 50 random experiments and show the results in Fig. 1: the average recognition rate of FLMME versus ε and versus $k_p$ is illustrated in Fig. 1a and b, respectively. Figure 1a indicates that the top recognition rate is achieved when ε is 0.02 and that performance varies very little once ε ≥ 0.06, so we choose ε = 0.02 for all the experiments in this paper. Figure 1b shows that the top recognition rate is achieved when $k_p$ is between 2.5 and 3.5 times $l$ (the number of training samples per class), so we choose $k_p = 3l$ in our experiments; as $k_p$ increases further, the performance of FLMME generally steps down. For samples lying on the margin, farther neighbors contribute more to interclass separability; conversely, for samples not on the margin, overly distant neighbors may disturb the construction of interclass separability and should instead contribute to intraclass compactness.
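As an illustration of this parameter study (our code; the value grids and the single held-out split are assumptions, not the paper's exact protocol), one can sweep ε and $k_p$ and score each setting with the 1-NN rule:

```python
import numpy as np

def one_nn_accuracy(P, mean, X_train, y_train, X_test, y_test):
    # Project both sets with Eq. 13's matrix P, then apply the 1-NN rule.
    Ztr, Zte = (X_train - mean) @ P, (X_test - mean) @ P
    d = np.linalg.norm(Zte[:, None, :] - Ztr[None, :, :], axis=2)
    return np.mean(np.asarray(y_train)[d.argmin(axis=1)] == np.asarray(y_test))

def sweep(X_train, y_train, X_test, y_test, l, d=20, pca_dim=40):
    """Grid search over eps and kp; kc is fixed to l - 1 as in the text."""
    best = None
    for eps in (0.01, 0.02, 0.04, 0.06, 0.08):
        for kp in (2 * l, 3 * l, 4 * l):           # the paper settles on 3l
            P, mean = flmme_fit(X_train, y_train, kc=l - 1, kp=kp,
                                d=d, pca_dim=pca_dim, eps=eps)
            acc = one_nn_accuracy(P, mean, X_train, y_train, X_test, y_test)
            if best is None or acc > best[0]:
                best = (acc, eps, kp)
    return best   # (accuracy, eps, kp)
```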

4 Experiments and analysis

To evaluate the performance of the proposed algorithm, FLMME is compared with PCA, LDA, LPP and LMME in experiments on the Wine dataset from UCI and on the Yale, ORL and USPS databases. The Wine dataset from UCI is used as a toy example to show the effectiveness of the proposed fuzzy gradual graphs and to test the clustering performance of the proposed algorithm. The robustness of FLMME was evaluated on the Yale database, with variations in both facial expression and illumination. The ORL database was used to examine the performance of the algorithm under varying pose and sample size.


The USPS handwriting digital database was used to estimate the effectiveness of FLMME in dealing with shape variation in handwriting. The nearest neighbor classifier with Euclidean distance was used in all experiments, and random splits were repeated to reduce the disturbance from any single training set.

4.1 Experiment using the WINE dataset from UCI: a toy example

We use the Wine dataset, a real-life dataset from the UCI machine learning repository (http://archive.ics.uci.edu/ml), to show the effectiveness of FLMME in constructing the two novel graphs. The Wine dataset consists of 178 samples from 3 classes, each sample with 13 features. We select 48 samples per class, of which 5 per class are used for training. We apply PCA, LDA, LPP, LMME and the proposed FLMME for feature extraction and project all samples onto the learned 2D subspaces, shown in Fig. 2 together with the corresponding classification accuracies in each 2D subspace. As Fig. 2 shows, the data points projected onto the 2D subspace learned by the proposed algorithm (FLMME) are clearly separated compared with those of PCA, LDA, LPP and LMME, indicating that FLMME captures a more reasonable structure of the data. These 2D examples show that the construction of the fuzzy gradual intraclass and interclass graphs actually works, and that the fuzzy gradual membership degree is effective and reasonable even though the weighting function is discontinuous. Built from the fuzzy gradual membership degrees, the novel fuzzy gradual graphs characterize marginal separability and within-class compactness better than other affinity graphs, helping to pull the marginal points of different classes away from one another when they are projected to the low-dimensional subspace; higher classification accuracy and better clustering in the 2D subspace are thus obtained.
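A sketch of how this toy experiment can be reproduced with the pipeline above (our code; scikit-learn is used only to fetch the Wine data, and the parameter choices other than 5 training samples per class are illustrative):

```python
import numpy as np
from sklearn.datasets import load_wine

rng = np.random.default_rng(0)
X, y = load_wine(return_X_y=True)
# 5 training samples per class, so kc = 4; kp = 10 and pca_dim = 10
# are our illustrative choices, not the paper's exact values.
train = np.hstack([rng.choice(np.where(y == c)[0], 5, replace=False)
                   for c in np.unique(y)])
P, mean = flmme_fit(X[train], y[train], kc=4, kp=10, d=2, pca_dim=10)
Z = (X - mean) @ P   # all 178 samples projected onto the learned 2D subspace
```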


Fig. 2 The points projected onto the 2D subspace learned by the five methods and the corresponding recognition rates (shown in parentheses)

4.2 Experiment using the Yale database

The Yale face database contains 165 images of 15 individuals (11 images per person) with various facial expressions and lighting conditions. Each image was originally cropped and resized to 100 × 80 pixels; Fig. 3 shows the sample images of one person. For computational efficiency, the images were downsampled to 50 × 40 pixels in this experiment.

4.2.1 The robustness to outliers

In this experiment we test the robustness of FLMME, focusing on the case where outliers (left-light, right-light and surprised images can be viewed as outliers) are present in both the training and the test set. The experiment used the first four images (center-light, with glasses, happy, left-light) per class for training, and the remaining seven images (without glasses, normal, right-light, sad, sleepy, surprised and winking) for testing; thus there are outliers in both sets. For feature extraction we used PCA, LDA, LPP, LMME and the proposed FLMME. The recognition rates of these methods are shown in Fig. 4.

Since face images lie on different sub-manifolds, and facial expressions and lighting conditions vary, some face images of different individuals (such as left-light images) are close to one another. The fuzzy gradual graphs, which assign gradual weights according to the fuzzy gradual membership degrees, characterize the inner structure of the data better by introducing the fuzzy pattern. As can be seen from Fig. 4, FLMME performs best among the five methods, indicating that FLMME is more robust to outliers than the other methods. Comparing LMME with FLMME, FLMME obtains better classification results; the only difference between the two is the way the graphs are constructed, so the different graph constructions account for the different classification accuracies. The higher recognition rate of FLMME indicates that the proposed fuzzy gradual graphs better characterize compactness and separability. In other words, the proposed method is not very sensitive to outliers, because the fuzzy gradual graphs help repel the marginal points of different classes from one another when they are projected to the low-dimensional subspace.


Table 1 The maximal average recognition accuracy (%) of different algorithms and the corresponding dimension (shown in parentheses)

Method   2 Train      3 Train      4 Train      5 Train      6 Train
PCA      78.49 (29)   81.46 (40)   85.37 (36)   85.95 (40)   87.01 (40)
LDA      81.92 (14)   85.62 (14)   88.03 (14)   88.84 (14)   89.36 (14)
LPP      81.45 (24)   85.96 (24)   88.57 (21)   89.00 (18)   89.00 (18)
LMME     81.23 (21)   83.90 (38)   85.92 (40)   89.31 (39)   91.31 (40)
FLMME    81.26 (30)   86.20 (30)   89.69 (29)   92.53 (22)   94.53 (29)

Fig. 3 The sample images of one person from Yale face database

Fig. 4 The recognition rates (%) of different algorithms versus the dimensions when the first four images per person were used for training on the Yale face database

4.2.2 Random experiment

In this section, for each individual, l (= 2, 3, 4, 5, 6) images were randomly selected as training samples and the remaining images were used for testing. All five methods, PCA (Eigenface), LDA (Fisherface), LPP, LMME and FLMME, were used for feature extraction. LDA, LPP, LMME and FLMME each included a PCA phase in which 40 principal components were selected. The procedure was repeated 50 times and the average recognition rate was calculated. In general, the recognition rates varied with the dimension of the face subspace. Table 1 lists the maximal average recognition accuracies, and Fig. 5 shows the average recognition rates of the different algorithms across dimensions for the case where six images per person were selected for training. As can be seen from Table 1 and Fig. 5, the recognition rates of FLMME are consistently higher than those of the other methods, indicating that the proposed fuzzy neighborhood graphs are better suited to reflecting the local neighborhood structure of face images of different individuals under various facial expressions and lighting conditions.

4.3 Experiment using the ORL database

The ORL database was used to evaluate the performance of FLMME under variations in pose and facial expression. The ORL face database contains images of 40 individuals, each providing 10 different images. The facial expressions and facial details (glasses or no glasses) also vary.


Fig. 5 The average recognition rates (%) of different algorithms versus the dimensions when six images per person were randomly selected for training on the Yale face database (50 independent runs)


Fig. 6 The sample images of one person from ORL face database

The images were taken with a tolerance for some tilting and rotation of the face of up to 20°, and there is also scale variation of up to about 10%. Figure 6 shows the sample images of one person. All images were normalized to a resolution of 56 × 46 in the experiments. LDA, LPP, LMME and FLMME each have a PCA phase, in which 50 principal components were selected. For each individual, l (= 2, 3, 4, 5, 6) images were randomly selected as training samples and the remaining images were used for testing; this process was repeated 50 times and the average recognition rate was calculated. In general, the recognition rates varied with the dimension of the face subspace. The maximal average recognition accuracy of the different algorithms is presented in Table 2, and the average recognition rates when five images per person were randomly selected for training are shown in Fig. 7.

Table 2 The maximal average recognition accuracy (%) of different algorithms and the corresponding dimension (shown in parentheses)

Method   2 Train      3 Train      4 Train      5 Train      6 Train
PCA      74.90 (50)   82.24 (46)   84.98 (46)   86.71 (23)   87.50 (50)
LDA      77.40 (38)   85.09 (39)   86.16 (39)   87.41 (18)   87.87 (21)
LPP      72.50 (44)   81.78 (46)   87.42 (36)   87.42 (36)   93.21 (32)
LMME     72.72 (50)   87.30 (50)   91.42 (46)   94.65 (42)   96.22 (46)
FLMME    75.24 (50)   89.17 (50)   93.62 (46)   96.61 (38)   98.14 (46)

Fig. 7 The average recognition rates (%) of different algorithms versus the dimensions when five images per person were randomly selected for training on the ORL face database (50 independent runs)

4.4 Experiment using the USPS handwriting digital database

The USPS handwriting digital data includes ten classes, the digits "0" to "9", each with 1,100 examples. In this experiment a subset was selected from the original database: each image was cropped to 16 × 16, and 100 images per class were kept, 1,000 images in total. Figure 8 displays a subset of the digit "2" images from the original USPS handwriting digital database. For each class, l (= 20, 30, 40, 50, 60) images were randomly selected as training samples and the rest were used for testing. For feature extraction, PCA (Eigenface), LDA (Fisherface), LPP, LMME and the proposed FLMME were used. Note that LDA, LPP, LMME and FLMME all involve a PCA phase, and the optimal PCA dimension may differ between algorithms; choosing the optimal dimension of PCA is still an open problem.

Fig. 8 The sample digit "2" images from the USPS handwriting database


For fair comparison, we kept nearly 95% of the image energy in this phase and selected 30 principal components for each method. For each l, we independently ran the system 10 times and calculated the average recognition rate. In general, the recognition rates varied with the dimension of the subspace. The maximal average recognition accuracy of the different algorithms is shown in Table 3.

Table 3 The maximal average recognition accuracy (%) of different algorithms and the corresponding dimension (shown in parentheses)

Method   20 Train     30 Train     40 Train     50 Train     60 Train
PCA      80.88 (20)   84.56 (20)   86.72 (29)   87.96 (26)   88.90 (27)
LDA      82.72 (7)    85.83 (9)    86.80 (8)    88.00 (9)    88.57 (9)
LPP      78.93 (28)   82.75 (30)   85.70 (29)   86.78 (30)   88.82 (30)
LMME     82.03 (12)   86.17 (15)   88.25 (20)   89.46 (20)   90.40 (30)
FLMME    83.80 (22)   87.84 (18)   89.70 (17)   91.24 (19)   92.05 (23)

4.5 Discussions

Based on the experimental results on the WINE dataset, the Yale and ORL face image databases and the USPS handwriting digital database, the following findings and conclusions are made:

1. From the WINE experiments we can see intuitively that the data points projected onto the 2D subspace learned by the proposed algorithm (FLMME) are clearly separated compared with those of PCA, LDA, LPP and LMME. This is because the novel fuzzy gradual graphs help pull the marginal points of different classes away from one another when they are projected to the low-dimensional subspace; higher classification accuracy and better clustering in the 2D subspace are thus obtained.

2. From Tables 1, 2 and 3 we can see that the recognition rates of FLMME are generally higher than those of the other methods. As shown in Figs. 4, 5 and 7, the performance of FLMME is significantly improved, and FLMME is more robust to outliers than the other methods. These results indicate that the construction of the fuzzy intraclass and interclass graphs actually works, and that the proposed fuzzy gradual neighborhood graphs are better suited to reflecting the local neighborhood structure of these databases.

3. The advantage of FLMME comes from the fact that the fuzzy gradual membership degree can efficiently handle the vagueness and ambiguity of samples degraded by poor illumination and by variations in shape and facial expression. In other words, the fuzzy gradual membership degree helps pull near neighbor samples of the same class closer together and push the far neighbor samples on the margins between different classes farther apart, so the novel fuzzy gradual graphs based on it can better characterize compactness and separability. These factors contribute to the superior effectiveness of the FLMME method.

In general, the results show that FLMME is superior to LMME, LPP, LDA and PCA. Nevertheless, one inconsistency worth remarking on concerns the comparison of FLMME and LDA: LDA may outperform FLMME when l is very small, such as l = 2 on the Yale and ORL databases. The likely reason is that the fuzzy membership relation cannot adequately characterize the "locality" when l is very small; as the number of training samples increases, the locality is characterized more and more accurately, and the proposed method outperforms LDA more significantly.

5 Conclusions

This paper presents a new approach called FLMME. FKNN is adopted to characterize the natural distribution information of the original samples, and based on this information the affinity weights of the neighborhood graphs (intraclass and interclass) are redefined in place of binary or Gaussian-kernel weights. A comparative study between the proposed algorithm and other algorithms (PCA, LDA, LPP and LMME) was conducted to evaluate the effectiveness of the new algorithm. The experimental results show that FLMME outperforms PCA, LDA, LPP and LMME because of its fuzzy discriminating characteristics. In the future, we will run more tests on other types of datasets and further improve the objective function of the proposed algorithm.

Acknowledgments This work is partially supported by the Fujian Provincial Department of Science and Technology of China under grants no. JK2010046, JB10135, JA10226 and 2009I0020. It is also partially supported by the National Science Foundation of China under grants no. 60472061, 60632050 and 90820004, the Hi-Tech Research and Development Program of China under grant no. 2006AA04Z238, and the Ministry of Industry and Information Technology of China under grant no. E0310/1112/JC01.

References

Batur A, Hayes M (2001) Linear subspaces for illumination robust face recognition. Proc IEEE Int Conf Comput Vis Pattern Recogn 2(1):296–301
Belhumeur PN, Hespanha JP, Kriegman DJ (1997) Eigenfaces versus Fisherfaces: recognition using class specific linear projection. IEEE Trans Pattern Anal Mach Intell 19(7):711–720
Belkin M, Niyogi P (2003) Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput 15:1373–1396
Fukunaga K (1991) Introduction to statistical pattern recognition, 2nd edn. Academic Press, London
He X, Niyogi P (2003) Locality preserving projections. In: Proceedings of the 16th conference on neural information processing systems
He X, Yan S, Hu Y, Niyogi P, Zhang H (2005) Face recognition using Laplacianfaces. IEEE Trans Pattern Anal Mach Intell 27(3):328–340
Jain AK, Duin RPW, Mao J (2000) Statistical pattern recognition: a review. IEEE Trans Pattern Anal Mach Intell 22(1):4–37
Keller JM, Gray MR, Givens JR (1985) A fuzzy k-nearest neighbor algorithm. IEEE Trans Syst Man Cybern 15(4):580–585
Kokiopoulou E, Saad Y (2009) Enhanced graph-based dimensionality reduction with repulsion Laplaceans. Pattern Recogn 42(11):2392–2402
Kwak KC, Pedrycz W (2005) Face recognition using a fuzzy Fisherface classifier. Pattern Recogn 38:1717–1732
Laskaris NA, Zafeiriou SP (2008) Beyond FCM: graph-theoretic post-processing algorithms for learning and representing the data structure. Pattern Recogn 41(8):2630–2644
Martinez AM, Kak AC (2001) PCA versus LDA. IEEE Trans Pattern Anal Mach Intell 23(2):228–233
Roweis ST, Saul LK (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290:2323–2326
Tenenbaum JB, de Silva V, Langford JC (2000) A global geometric framework for nonlinear dimensionality reduction. Science 290:2319–2323
Turk M, Pentland A (1991) Face recognition using eigenfaces. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 586–591
Yang J, Zhang D, Yang JY (2007) Globally maximizing, locally minimizing: unsupervised discriminant projection with applications to face and palm biometrics. IEEE Trans Pattern Anal Mach Intell 29(4):650–664
Ye J, Janardan R, Park C, Park H (2004) An optimization criterion for generalized discriminant analysis on under-sampled problems. IEEE Trans Pattern Anal Mach Intell 26(8):982–994
Yu H, Yang J (2001) A direct LDA algorithm for high-dimensional data, with application to face recognition. Pattern Recogn 34(10):2067–2070
Zhao C, Lai Z, Sui Y, Chen Y (2008) Local maximal marginal embedding with application to face recognition. Proc 2nd Chin Conf Pattern Recogn 1(1):215–220
