Multiple Kernel Learning in Fisher Discriminant ... - Semantic Scholar

ARTICLE International Journal of Advanced Robotic Systems

Multiple Kernel Learning in Fisher Discriminant Analysis for Face Recognition Regular Paper

Xiao-Zhang Liu1,* and Guo-Can Feng2

1 School of Computer Science, Dongguan University of Technology, Dongguan, Guangdong, China
2 Faculty of Mathematics and Computing, Sun Yat-sen University, Guangzhou, China
* Corresponding author E-mail: [email protected]

Received 14 Jun 2012; Accepted 14 Aug 2012
DOI: 10.5772/52350
Int J Adv Robotic Sy, 2013, Vol. 10, 142:2013
© 2013 Liu and Feng; licensee InTech. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract Recent applications and developments based on support vector machines (SVMs) have shown that using multiple kernels instead of a single one can enhance classifier performance. However, there are few reports on the performance of the kernel-based Fisher discriminant analysis (kernel-based FDA) method with multiple kernels. This paper proposes a multiple kernel construction method for kernel-based FDA. The constructed kernel is a linear combination of several base kernels with a constraint on their weights. Using the maximum margin criterion (MMC) as the objective, we present an iterative scheme for weight optimization. Experiments on the FERET and CMU PIE face databases show that our multiple kernel Fisher discriminant analysis (MKFD) achieves higher recognition performance than single-kernel-based FDA. The experiments also show that the constructed kernel relaxes parameter selection for kernel-based FDA to some extent.

Keywords Multiple Kernel Learning (MKL), Kernel-based Fisher Discriminant Analysis (kernel-based FDA), Maximum Margin Criterion (MMC), Weight Optimization

1. Introduction
Because of the many image variations such as pose, illumination and facial expression, face recognition is a highly complex and nonlinear problem that cannot be sufficiently handled by linear methods such as principal components analysis (PCA) [1] and Fisher discriminant analysis (FDA) [2]. It is therefore reasonable to assume that a better solution to this inherently nonlinear problem could be achieved using nonlinear methods, such as the so-called kernel machine techniques [3]. Following the success of the kernel trick in SVMs, many kernel-based PCA and FDA methods have been developed and applied in pattern recognition tasks, such as kernel PCA (KPCA) [4], kernel Fisher discriminant (KFD) [5], generalized discriminant analysis (GDA) [6], and kernel direct discriminant analysis (KDDA) [7]. It has been shown that the kernel-based FDA method is a feasible approach to the nonlinear problems in face recognition.

However, the performance of the kernel-based FDA method is sensitive to the selection of the kernel function and its parameters. To date, kernel parameter selection is mainly achieved by cross validation [8], which is computationally expensive, and the selected
kernel parameters cannot be guaranteed to be optimal. Furthermore, a single fixed kernel can only characterize some aspects of the geometrical structure of the input data and is thus not always suited to applications that involve data from multiple, heterogeneous sources [9][10], such as face images under broad variations of pose, illumination, facial expression, aging, etc.

Recent applications and developments based on SVMs [11][12] have shown that using multiple kernels (i.e., a combination of several "base kernels") instead of a single fixed one can enhance classifier performance, which gave rise to the so-called multiple kernel learning (MKL) method. With m kernels, input data can be mapped into m feature spaces, where each feature space can be taken as one view of the original input data [10]. Each view is expected to exhibit some geometrical structures of the original data from its own perspective, such that all the views complement one another in the subsequent learning task. It has been shown that MKL offers needed flexibility and handles well the case that involves multiple, heterogeneous data sources [9][13][14].

However, MKL was proposed for SVMs, and there have been few reports on the performance of the kernel-based FDA method with multiple kernels. In this paper, we propose multiple kernel Fisher discriminant analysis (MKFD), in which the constructed kernel is a linear combination of several base kernels with a constraint on their weights, and we give an iterative scheme for weight optimization.

The rest of this paper is organized as follows. First we describe the kernel construction for MKFD in section 2. Then, in section 3, the optimization scheme for the multi-kernel weights is presented. The experimental results are reported in section 4, and we draw our conclusion in section 5.

2. Kernel construction for MKFD
Given $M$ Mercer kernel functions $k^{(m)}(\mathbf{x}, \mathbf{y})$, $m = 1, 2, \dots, M$, defined on $\mathbb{R}^d \times \mathbb{R}^d$, which serve as $M$ base kernels, we construct the multiple kernel function as the following linear combination:

$$k(\mathbf{x}, \mathbf{y}) = \sum_{m=1}^{M} \gamma_m^2\, k^{(m)}(\mathbf{x}, \mathbf{y}), \quad \text{s.t.} \quad \sum_{m=1}^{M} \gamma_m = 1, \tag{1}$$

where $\gamma_m^2$ is the weight of the base kernel $k^{(m)}(\mathbf{x}, \mathbf{y})$. Since the weights $\gamma_m^2$ are nonnegative, and a nonnegative linear combination of Mercer kernels is again a Mercer kernel, $k(\mathbf{x}, \mathbf{y})$ is also a Mercer kernel defined on $\mathbb{R}^d \times \mathbb{R}^d$. We apply the multiple kernel function (1) in kernel-based FDA, which produces what we call MKFD. To achieve good performance of MKFD for face recognition, we consider the problem of learning proper weights of the base kernels, i.e., choosing the best $\boldsymbol{\gamma} = (\gamma_1, \gamma_2, \dots, \gamma_M)^T$, without regard to the specific structures of the base kernel functions.

3. Weight optimization for MKFD

3.1 Some notations on MKFD
Let $\mathbb{R}^d$ be the original sample space. Let $X$ be a training set of $N$ samples and $C$ be the number of sample classes. Assume the $i$-th class $X_i$ contains $N_i$ samples, i.e., $X_i = \{\mathbf{x}_{i1}, \mathbf{x}_{i2}, \dots, \mathbf{x}_{iN_i}\}$, $i = 1, 2, \dots, C$, so $N = \sum_{i=1}^{C} N_i$.

Denote $M$ nonlinear mappings as $\phi_m : \mathbf{x} \in \mathbb{R}^d \to \phi_m(\mathbf{x}) \in \mathbb{F}$, $m = 1, 2, \dots, M$, where $\mathbb{F}$ is the mapped feature space, with $d_f = \dim \mathbb{F}$ (the dimensionality of $\mathbb{F}$). Denote $M$ base kernel matrices ($N \times N$) as

$$K^{(m)} = \left[ k^{(m)}(\mathbf{x}_{ij}, \mathbf{x}_{lh}) \right]_{\substack{i = 1, \dots, C,\ j = 1, \dots, N_i \\ l = 1, \dots, C,\ h = 1, \dots, N_l}}, \quad m = 1, \dots, M, \tag{2}$$

where $k^{(m)}(\mathbf{x}_{ij}, \mathbf{x}_{lh}) = \phi_m(\mathbf{x}_{ij})^T \phi_m(\mathbf{x}_{lh})$, each base kernel corresponding to one nonlinear mapping. Given $\boldsymbol{\gamma} = (\gamma_1, \gamma_2, \dots, \gamma_M)^T$ subject to $\sum_{m=1}^{M} \gamma_m = 1$, the $N \times N$ multiple kernel matrix is

$$K = \sum_{m=1}^{M} \gamma_m^2 K^{(m)}. \tag{3}$$

We call the mapping from $\mathbb{R}^d$ to $\mathbb{F}$ the multiple nonlinear mapping, denoted as $\phi$, which is implicitly defined by the multiple kernel matrix $K$ and can be understood as the compound of $\phi_m$, $m = 1, \dots, M$.

Under the multiple nonlinear mapping $\phi$, the $i$-th mapped class and the mapped sample set are respectively given by

$$\phi(X_i) = \{\phi(\mathbf{x}_{i1}), \phi(\mathbf{x}_{i2}), \dots, \phi(\mathbf{x}_{iN_i})\}, \qquad \phi(X) = \{\phi(X_1), \phi(X_2), \dots, \phi(X_C)\}.$$

Also, the mean of the mapped class $\phi(X_i)$ and that of the mapped sample set $\phi(X)$ are respectively given by

$$\mathbf{m}_i^{\phi} = \frac{1}{N_i} \sum_{j=1}^{N_i} \phi(\mathbf{x}_{ij}), \qquad \mathbf{m}^{\phi} = \frac{1}{N} \sum_{i=1}^{C} \sum_{j=1}^{N_i} \phi(\mathbf{x}_{ij}).$$

In the kernel feature space $\mathbb{F}$, the within-class scatter matrix $S_w^{\phi}$ and the between-class scatter matrix $S_b^{\phi}$ are respectively defined as

$$S_w^{\phi} = \frac{1}{N} \sum_{i=1}^{C} \sum_{\mathbf{x} \in X_i} (\phi(\mathbf{x}) - \mathbf{m}_i^{\phi})(\phi(\mathbf{x}) - \mathbf{m}_i^{\phi})^T = \Phi_w \Phi_w^T, \tag{4}$$

$$S_b^{\phi} = \frac{1}{N} \sum_{i=1}^{C} N_i (\mathbf{m}_i^{\phi} - \mathbf{m}^{\phi})(\mathbf{m}_i^{\phi} - \mathbf{m}^{\phi})^T = \Phi_b \Phi_b^T, \tag{5}$$

where

$$\Phi_w = [\tilde{\phi}_{11}, \dots, \tilde{\phi}_{1N_1}, \tilde{\phi}_{21}, \dots, \tilde{\phi}_{2N_2}, \dots, \tilde{\phi}_{C1}, \dots, \tilde{\phi}_{CN_C}]_{d_f \times N}, \quad \tilde{\phi}_{ij} = \frac{1}{\sqrt{N}} (\phi(\mathbf{x}_{ij}) - \mathbf{m}_i^{\phi}),$$

$$\Phi_b = [\bar{\phi}_1, \dots, \bar{\phi}_C]_{d_f \times C}, \quad \bar{\phi}_i = \sqrt{\frac{N_i}{N}}\, (\mathbf{m}_i^{\phi} - \mathbf{m}^{\phi}). \tag{6}$$

The kernel Fisher criterion is defined as

$$J^{\phi}(W) = \frac{\operatorname{tr}(W^T S_b^{\phi} W)}{\operatorname{tr}(W^T S_w^{\phi} W)},$$

where $W = \{\mathbf{w}_1, \dots, \mathbf{w}_q\}$ is a $d_f \times q$ ($d_f \ge q$) projection matrix. MKFD is to find an optimal projection matrix $W^* : \mathbb{R}^{d_f} \to \mathbb{R}^q$ in the mapped feature space $\mathbb{F}$, such that $W^* = \arg\max_{W} J^{\phi}(W)$.

3.2 Diagonalization strategy
We use the same diagonalization strategy as KDDA [7] to deal with the small sample size (SSS) problem in our MKFD, i.e., first diagonalizing $S_b^{\phi}$ to $I$ (the identity matrix) and then diagonalizing $S_w^{\phi}$ to $\Lambda_w$, which is briefly expressed using the MKFD notations as follows.

3.2.1 Eigen-analysis of $S_b^{\phi}$ in the feature space
$\Phi_b^T \Phi_b$ can be expressed using the multiple kernel matrix $K$ as follows:

$$\Phi_b^T \Phi_b = \frac{1}{N} D \left( A_{NC}^T K A_{NC} - \frac{1}{N} A_{NC}^T K \mathbf{1}_{NC} - \frac{1}{N} \mathbf{1}_{NC}^T K A_{NC} + \frac{1}{N^2} \mathbf{1}_{NC}^T K \mathbf{1}_{NC} \right) D, \tag{7}$$

where $D = \operatorname{diag}(\sqrt{N_1}, \dots, \sqrt{N_C})$ (a $C \times C$ diagonal matrix), $\mathbf{1}_{NC}$ is an $N \times C$ matrix with all terms equal to one, $A_{NC} = \operatorname{diag}(\mathbf{a}_{N_1}, \dots, \mathbf{a}_{N_C})$ is an $N \times C$ block diagonal matrix, and $\mathbf{a}_{N_i}$ is an $N_i \times 1$ vector with all terms equal to $1/N_i$.

Let $\lambda_i$ and $\mathbf{e}_i$ ($i = 1, \dots, C$) be the $i$-th largest eigenvalue and the corresponding eigenvector of $\Phi_b^T \Phi_b$. Let $r$ ($\le C - 1$) be the rank of $S_b^{\phi}$ ($= \Phi_b \Phi_b^T$), which is also the rank of $\Phi_b^T \Phi_b$. Denote $E_r = (\mathbf{e}_1, \dots, \mathbf{e}_r)$ and $V = (\mathbf{v}_1, \dots, \mathbf{v}_r) = \Phi_b E_r$. It can be derived that $V^T S_b^{\phi} V = \Lambda_b$, with $\Lambda_b = \operatorname{diag}(\lambda_1^2, \dots, \lambda_r^2)$, a nonsingular diagonal matrix. Let $U = V \Lambda_b^{-1/2}$. Then $U^T S_b^{\phi} U = I$.

3.2.2 Eigen-analysis of $S_w^{\phi}$ in the feature space
Based on the analysis in section 3.2.1, it can be seen that

$$U^T S_w^{\phi} U = (E_r \Lambda_b^{-1/2})^T (\Phi_b^T S_w^{\phi} \Phi_b)(E_r \Lambda_b^{-1/2}),$$

where $\Phi_b^T S_w^{\phi} \Phi_b$ can be expressed using $K$, with details similar to those in [7].

Let $\mathbf{z}_j$ be the eigenvector of $U^T S_w^{\phi} U$ corresponding to the $j$-th smallest eigenvalue $\mu_j$, $j = 1, \dots, r$. Denote $Z = (\mathbf{z}_1, \dots, \mathbf{z}_r)$. Defining $Y = UZ$, it can be derived that $Y^T S_w^{\phi} Y = \Lambda_w$, with $\Lambda_w = \operatorname{diag}(\mu_1, \dots, \mu_r)$. Based on the derivation presented in sections 3.2.1 and 3.2.2, an optimal projection matrix for MKFD is obtained as

$$W^* = Y \Lambda_w^{-1/2} = \Phi_b E_r \Lambda_b^{-1/2} Z \Lambda_w^{-1/2}. \tag{8}$$

Certainly, as the multiple nonlinear mapping $\phi$ is implicitly defined by the multiple kernel function (or matrix), $\Phi_b$ (defined by Eq. (6)) remains unknown, and $W^*$ cannot be evaluated explicitly. The real meaning of Eq. (8) is obtaining the matrix $E_r \Lambda_b^{-1/2} Z \Lambda_w^{-1/2}$, which can be computed from the multiple kernel matrix $K$. This is the core result of diagonalization for MKFD.
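As a sanity check on Eq. (7), the sketch below computes $\Phi_b^T \Phi_b$ from a kernel matrix alone and compares it with the explicit definition in Eq. (6) for the linear kernel, where the mapping is known. It is an illustrative sketch and not the authors' implementation: the function names, the toy data, and the factorised form $B^T K B$ with $B = (A_{NC} - \mathbf{1}_{NC}/N) D / \sqrt{N}$ (which expands to the four terms of Eq. (7)) are assumptions introduced here.

```python
from math import sqrt

def between_class_gram(K, class_sizes):
    """Return Phi_b^T Phi_b (C x C) from an N x N kernel matrix K.

    Assumes samples are stored class by class, as in Eq. (2).
    Uses B^T K B with B = (A_NC - 1_NC / N) D / sqrt(N), i.e. Eq. (7) factorised.
    """
    N, C = sum(class_sizes), len(class_sizes)
    labels = [i for i, n_i in enumerate(class_sizes) for _ in range(n_i)]
    B = []
    for n in range(N):
        row = []
        for i in range(C):
            indicator = 1.0 / class_sizes[i] if labels[n] == i else 0.0
            row.append(sqrt(class_sizes[i] / N) * (indicator - 1.0 / N))
        B.append(row)
    # KB = K B (N x C), then B^T (K B) (C x C)
    KB = [[sum(K[n][t] * B[t][i] for t in range(N)) for i in range(C)] for n in range(N)]
    return [[sum(B[n][i] * KB[n][j] for n in range(N)) for j in range(C)] for i in range(C)]

def explicit_between_class_gram(X, class_sizes):
    """Same quantity built directly from Eq. (6), valid for the linear kernel only."""
    N, d = len(X), len(X[0])
    mean = [sum(x[a] for x in X) / N for a in range(d)]
    cols, start = [], 0
    for n_i in class_sizes:
        block = X[start:start + n_i]
        m_i = [sum(x[a] for x in block) / n_i for a in range(d)]
        cols.append([sqrt(n_i / N) * (m_i[a] - mean[a]) for a in range(d)])
        start += n_i
    return [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]

# Made-up toy data: two classes with 2 and 3 samples in R^2.
X = [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0], [3.0, 1.0], [2.0, 3.0]]
sizes = [2, 3]
K_lin = [[sum(a * b for a, b in zip(x, y)) for y in X] for x in X]
P_kernel = between_class_gram(K_lin, sizes)
P_direct = explicit_between_class_gram(X, sizes)
```

For this toy set the two matrices agree to rounding error, and the $2 \times 2$ result is singular, consistent with the rank bound $r \le C - 1$ stated in section 3.2.1.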

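3.3 Optimization criterion and objective
We adopt the maximum margin criterion (MMC) [15] as the objective function to optimize the weight vector $\boldsymbol{\gamma}$:

$$F(W, \boldsymbol{\gamma}) = \operatorname{tr}(W^T S_b^{\phi} W) - \operatorname{tr}(W^T S_w^{\phi} W), \tag{9}$$

where $W$ is a projection matrix and $\boldsymbol{\gamma} = (\gamma_1, \gamma_2, \dots, \gamma_M)^T$, subject to $\mathbf{1}^T \boldsymbol{\gamma} = \sum_{m=1}^{M} \gamma_m = 1$, with $\gamma_m^2$ being the weight of the $m$-th base kernel matrix $K^{(m)}$.

Based on the result (8) in section 3.2, the optimal projection matrix is $W^* = \Phi_b E_r \Lambda_b^{-1/2} Z \Lambda_w^{-1/2}$. Denote $G = E_r \Lambda_b^{-1/2} Z \Lambda_w^{-1/2}$, which can be computed from the multiple kernel matrix $K$. Then the objective function (9) can be reformulated as

$$
\begin{aligned}
F(\boldsymbol{\gamma}) &= \operatorname{tr}(W^{*T} S_b^{\phi} W^* - W^{*T} S_w^{\phi} W^*) \\
&= \operatorname{tr}(G^T \Phi_b^T \Phi_b \Phi_b^T \Phi_b G - G^T \Phi_b^T \Phi_w \Phi_w^T \Phi_b G) \\
&= \operatorname{tr}(G^T P P^T G - G^T Q Q^T G),
\end{aligned} \tag{10}
$$

where $P = \Phi_b^T \Phi_b$ and $Q = \Phi_b^T \Phi_w$ can be expressed in terms of the multiple kernel matrix $K$ as follows.

$$P = \frac{1}{N} D \left( A_{NC}^T K A_{NC} - \frac{1}{N} A_{NC}^T K \mathbf{1}_{NC} - \frac{1}{N} \mathbf{1}_{NC}^T K A_{NC} + \frac{1}{N^2} \mathbf{1}_{NC}^T K \mathbf{1}_{NC} \right) D = \sum_{m=1}^{M} \gamma_m^2 P^{(m)}, \tag{11}$$

where

$$P^{(m)} = \frac{1}{N} D \left( A_{NC}^T K^{(m)} A_{NC} - \frac{1}{N} A_{NC}^T K^{(m)} \mathbf{1}_{NC} - \frac{1}{N} \mathbf{1}_{NC}^T K^{(m)} A_{NC} + \frac{1}{N^2} \mathbf{1}_{NC}^T K^{(m)} \mathbf{1}_{NC} \right) D, \quad m = 1, 2, \dots, M,$$

with $D$, $A_{NC}$ and $\mathbf{1}_{NC}$ defined the same as in (7);

$$Q = \frac{1}{N} D \left( A_{NC}^T K - A_{NC}^T K H_{NN} - \frac{1}{N} \mathbf{1}_{NC}^T K + \frac{1}{N} \mathbf{1}_{NC}^T K H_{NN} \right) = \sum_{m=1}^{M} \gamma_m^2 Q^{(m)}, \tag{12}$$

where

$$Q^{(m)} = \frac{1}{N} D \left( A_{NC}^T K^{(m)} - A_{NC}^T K^{(m)} H_{NN} - \frac{1}{N} \mathbf{1}_{NC}^T K^{(m)} + \frac{1}{N} \mathbf{1}_{NC}^T K^{(m)} H_{NN} \right), \quad m = 1, 2, \dots, M,$$

$H_{NN} = \operatorname{diag}(\mathbf{h}_{N_1}, \dots, \mathbf{h}_{N_C})$ is an $N \times N$ block diagonal matrix, and $\mathbf{h}_{N_i}$ is an $N_i \times N_i$ matrix with all terms equal to $1/N_i$.

Therefore, to find the best weights $(\gamma_1, \gamma_2, \dots, \gamma_M)$ for the multiple kernel matrix $K$ defined in Eq. (3), we need to solve the following constrained optimization problem:

$$\max_{\boldsymbol{\gamma}}\ F(\boldsymbol{\gamma}) = \operatorname{tr}(G^T P P^T G - G^T Q Q^T G) \quad \text{s.t.} \quad \mathbf{1}^T \boldsymbol{\gamma} = \sum_{m=1}^{M} \gamma_m = 1. \tag{13}$$

3.4 Solving the optimization problem
We introduce a Lagrangian

$$L(\boldsymbol{\gamma}, \lambda) = F(\boldsymbol{\gamma}) + \lambda \left( \sum_{m=1}^{M} \gamma_m - 1 \right), \tag{14}$$

with one multiplier $\lambda$. From Eqs. (11) and (12), we can obtain

$$\frac{\partial P}{\partial \gamma_m} = 2 \gamma_m P^{(m)}, \qquad \frac{\partial Q}{\partial \gamma_m} = 2 \gamma_m Q^{(m)}.$$

Moreover, temporarily regarding $G$ as constant, we have

$$
\begin{aligned}
\frac{\partial F(\boldsymbol{\gamma})}{\partial \gamma_m}
&= \frac{\partial}{\partial \gamma_m} \operatorname{tr}(G^T P P^T G - G^T Q Q^T G) \\
&= \operatorname{tr}\!\left( \frac{\partial}{\partial \gamma_m} (G^T P P^T G - G^T Q Q^T G) \right) \\
&= \operatorname{tr}\!\left( \left( G^T \frac{\partial P}{\partial \gamma_m} P^T G + G^T P \frac{\partial P^T}{\partial \gamma_m} G \right) - \left( G^T \frac{\partial Q}{\partial \gamma_m} Q^T G + G^T Q \frac{\partial Q^T}{\partial \gamma_m} G \right) \right) \\
&= 2 \gamma_m \sum_{k=1}^{M} \gamma_k^2\, \mathrm{tr}_{m,k}, \qquad m = 1, 2, \dots, M,
\end{aligned} \tag{15}
$$

where

$$\mathrm{tr}_{m,k} = \operatorname{tr}\!\left( (G^T P^{(m)} P^{(k)T} G + G^T P^{(k)} P^{(m)T} G) - (G^T Q^{(m)} Q^{(k)T} G + G^T Q^{(k)} Q^{(m)T} G) \right), \quad m, k = 1, 2, \dots, M.$$

Now, differentiating $L(\boldsymbol{\gamma}, \lambda)$ with respect to $\gamma_1, \dots, \gamma_M$ and $\lambda$ gives the following partial derivatives:

$$\frac{\partial L(\boldsymbol{\gamma}, \lambda)}{\partial \gamma_m} = 2 \gamma_m \sum_{k=1}^{M} \gamma_k^2\, \mathrm{tr}_{m,k} + \lambda, \quad m = 1, \dots, M, \tag{16}$$

$$\frac{\partial L(\boldsymbol{\gamma}, \lambda)}{\partial \lambda} = \sum_{m=1}^{M} \gamma_m - 1. \tag{17}$$

Setting these partial derivatives to zero, we get the following set of $M + 1$ equations:

$$
\begin{cases}
2 \gamma_m \sum_{k=1}^{M} \gamma_k^2\, \mathrm{tr}_{m,k} + \lambda = 0, & m = 1, \dots, M, \\
\sum_{m=1}^{M} \gamma_m - 1 = 0.
\end{cases} \tag{18}
$$

We use Newton's iteration method to solve these nonlinear equations. Let

$$
\Psi(\tilde{\boldsymbol{\gamma}}) =
\begin{pmatrix}
2 \gamma_1 \sum_{k=1}^{M} \gamma_k^2\, \mathrm{tr}_{1,k} + \lambda \\
\vdots \\
2 \gamma_M \sum_{k=1}^{M} \gamma_k^2\, \mathrm{tr}_{M,k} + \lambda \\
\sum_{m=1}^{M} \gamma_m - 1
\end{pmatrix}, \tag{19}
$$

where $\tilde{\boldsymbol{\gamma}} = (\gamma_1, \dots, \gamma_M, \lambda)^T$. Then the iteration formula is

$$\tilde{\boldsymbol{\gamma}}^{(i+1)} = \tilde{\boldsymbol{\gamma}}^{(i)} - [\Psi'(\tilde{\boldsymbol{\gamma}}^{(i)})]^{-1}\, \Psi(\tilde{\boldsymbol{\gamma}}^{(i)}), \tag{20}$$

where $\Psi'(\tilde{\boldsymbol{\gamma}}^{(i)})$ is the Jacobian matrix of $\Psi(\tilde{\boldsymbol{\gamma}})$ at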
γ (i )  ( 1(i ) ,...,  M( i ) ,  ( i ) )T , i  0,1, 2,... . 3.5 Weight optimization procedure Based on the analysis above, the detailed weight optimization procedure for MKFD is described as follows. Input： K

(m)

 [k ( m ) (x r , xt )]N  N , m  1,..., M , i.e., M

base kernel matrices. Output： γ  ( 1 ,...,  M ) , with  m being the weight of T

2

(m)

base kernel K S1. Given   0 . Initialize iteration counter i  0 and

(

(0) 1

,..., 

(0) M

,

(0)

) , subject to 

M



(0) m 1 m

 1 ;

constructed multiple kernel matrix K 

 m 1

(i ) 2 m

K ( m ) ,

find an optimal projection matrix in the i ‐th iteration

W *(i )  Φb G (i )  arg max J  ( W ) ; W

S3. Regarding G optimization S4.

(i )

as constant, construct the constrained

max F ( γ )  tr(G ( i ) T PP T G (i )  G (i ) T QQ T G ( i ) ) , γ

T

s.t. 1 γ 



and calculate matrix trm , k



M

 m 1

M M

m

 1 ,

, where

trm, k  tr((G (i ) T P ( m ) P ( k ) T G ( i )  G ( i ) T P ( k ) P ( m ) T G (i ) )

( i 1) Eqs.(19)(20). If γ  γ (i )   , then i  i  1 , go to

S2; else stop. In our experiments reported in Section 4, for the initial (0)

 

2 M4

M

M

 tr m 1 k 1

m,k

 ...   M(0) 

1 , M

, and  is set to 5e  3 .
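The weight update of step S5 can be sketched in a few lines once the matrix $[\mathrm{tr}_{m,k}]$ is available. The code below is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the Jacobian is derived analytically from Eq. (19), the small Gaussian-elimination solver stands in for any linear solver, and the matrix `tr_demo` is made-up data. Unlike the full procedure, which recomputes $G^{(i)}$ and $\mathrm{tr}_{m,k}$ in S2 to S4 before every update, this sketch keeps $\mathrm{tr}_{m,k}$ fixed and simply repeats the Newton step of Eq. (20).

```python
def psi(gamma, lam, tr):
    """Left-hand sides of the M + 1 equations (18), i.e. the vector in Eq. (19)."""
    M = len(gamma)
    s = [sum(gamma[k] ** 2 * tr[m][k] for k in range(M)) for m in range(M)]
    return [2.0 * gamma[m] * s[m] + lam for m in range(M)] + [sum(gamma) - 1.0]

def jacobian(gamma, tr):
    """Jacobian of psi with respect to (gamma_1, ..., gamma_M, lambda)."""
    M = len(gamma)
    s = [sum(gamma[k] ** 2 * tr[m][k] for k in range(M)) for m in range(M)]
    J = [[4.0 * gamma[m] * gamma[j] * tr[m][j] + (2.0 * s[m] if m == j else 0.0)
          for j in range(M)] + [1.0] for m in range(M)]
    J.append([1.0] * M + [0.0])  # derivative of the constraint row
    return J

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x

def newton_weights(tr, eps=5e-3, max_iter=50):
    """Repeat Eq. (20) for a fixed tr matrix, starting from the initial values of section 3.5."""
    M = len(tr)
    gamma = [1.0 / M] * M
    lam = -2.0 / M ** 4 * sum(sum(row) for row in tr)
    for _ in range(max_iter):
        step = solve(jacobian(gamma, tr), psi(gamma, lam, tr))
        gamma = [g - d for g, d in zip(gamma, step[:M])]
        lam -= step[M]
        if sum(d * d for d in step) ** 0.5 <= eps:
            break
    return gamma, lam

# Made-up 2 x 2 tr matrix, for illustration only.
tr_demo = [[4.0, 0.0], [0.0, 1.0]]
gamma_demo, lam_demo = newton_weights(tr_demo, eps=1e-12)
```

Because the last equation in (18) is linear, every Newton update returns weights that sum to one (up to rounding), so the constraint never needs to be re-imposed by hand.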

4. Experiments
To evaluate the performance of our MKFD for face recognition, we have made experimental comparisons with KDDA based on single kernels, in terms of low-dimensional representation and image recognition. Images are from two face databases, namely the FERET and the CMU PIE databases.

In our experiments, three base kernels ($M = 3$) are adopted to construct the multiple kernel function: the linear kernel $k_1(\mathbf{x}_i, \mathbf{x}_j) = \mathbf{x}_i^T \mathbf{x}_j$; the Gaussian RBF kernel $k_2(\mathbf{x}_i, \mathbf{x}_j) = \exp\!\left( -\frac{\|\mathbf{x}_i - \mathbf{x}_j\|^2}{2 \sigma^2} \right)$, where $\sigma$ is set to the average of all pairwise distances between the original samples, $\frac{1}{N(N-1)} \sum_{i=1}^{N} \sum_{j=1}^{N} \|\mathbf{x}_i - \mathbf{x}_j\|$; and the polynomial kernel $k_3(\mathbf{x}_i, \mathbf{x}_j) = (\mathbf{x}_i^T \mathbf{x}_j + 1)^d$, where $d$ is set to 0.5. Thus the multiple kernel is $k(\mathbf{x}_i, \mathbf{x}_j) = \sum_{m=1}^{3} \gamma_m^2\, k_m(\mathbf{x}_i, \mathbf{x}_j)$, with $\sum_{m=1}^{3} \gamma_m = 1$. We demonstrate the effectiveness of the multiple kernel by comparing its performance with the single base kernels.

4.1 Face image datasets
From the FERET database [16], we select 72 people, with 6 frontal-view images for each individual. Face image variations in these 432 images include illumination, facial expression, the wearing of glasses, and aging. All the images are aligned by the centers of the eyes and the mouth and then normalized to a resolution of 92×112. The pixel values of each image are normalized to between 0 and 1. The original images with resolution 92×112 are reduced to wavelet feature faces with resolution 49×59 after 1-level Daubechies-4 (Db4) wavelet decomposition. Images from one individual are shown in Fig. 1.
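The three base kernels and their weighted combination introduced at the start of Section 4 can be sketched as follows. This is an illustrative pure-Python sketch, not the authors' code: the function names and the three toy samples are assumptions, and the data are kept nonnegative (like pixel values normalized to [0, 1]) so that the base of the fractional power in the polynomial kernel stays positive.

```python
from math import exp, sqrt

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def dist(x, y):
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def base_kernel_matrices(X, d=0.5):
    """Return [K1, K2, K3]: linear, Gaussian RBF and polynomial kernel matrices."""
    N = len(X)
    # sigma: average distance over the N(N-1) ordered pairs (the i == j terms are zero)
    sigma = sum(dist(X[i], X[j]) for i in range(N) for j in range(N)) / (N * (N - 1))
    K1 = [[dot(x, y) for y in X] for x in X]
    K2 = [[exp(-dist(x, y) ** 2 / (2.0 * sigma ** 2)) for y in X] for x in X]
    K3 = [[(dot(x, y) + 1.0) ** d for y in X] for x in X]
    return [K1, K2, K3]

def combine(kernels, gamma):
    """K = sum_m gamma_m^2 K^(m), with the weights gamma constrained to sum to 1."""
    assert abs(sum(gamma) - 1.0) < 1e-12
    N = len(kernels[0])
    return [[sum(g * g * Km[i][j] for g, Km in zip(gamma, kernels)) for j in range(N)]
            for i in range(N)]

# Made-up toy samples with entries in [0, 1].
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
kernels = base_kernel_matrices(X)
K = combine(kernels, [1 / 3, 1 / 3, 1 / 3])
```

With equal weights $\gamma_m = 1/3$, each base kernel contributes with weight $\gamma_m^2 = 1/9$, which corresponds to the "multi-kernel with same weights" setting used as a baseline in the experiments.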

Figure 1. Images of one person from the FERET database

In the CMU PIE face database [17], there are 68 people in total, and each person has 13 pose variations, ranging from the full right profile image to the full left profile image, and 43 different lighting conditions (21 flashes with ambient light on or off). In our experiments, for each person, we select 56 images, including 13 poses with neutral expression and 43 different lighting conditions in the frontal view. For all frontal-view images, we apply alignment based on the two eye centers and the nose center, and no alignment is applied to the other images with poses. All the segmented images are rescaled to a resolution of 92×112 and then reduced to wavelet feature faces with resolution 49×59 after 1-level Daubechies-4 (Db4) wavelet decomposition. Some images of one person are shown in Fig. 2.
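The reduced resolution of 49×59 is consistent with one level of a padded (non-periodic) discrete wavelet transform, whose output length is $\lfloor (n + L - 1)/2 \rfloor$ for an input of length $n$ and a filter of length $L$. The short check below assumes that convention, which the text does not state explicitly, and uses the fact that the Db4 filter has 8 taps; the function name is hypothetical.

```python
def dwt_output_length(n, filter_length):
    """Approximation-band length after one DWT level with boundary padding
    (non-periodic signal extension): floor((n + filter_length - 1) / 2)."""
    return (n + filter_length - 1) // 2

DB4_FILTER_LENGTH = 8  # Daubechies-4 has 8 filter taps
width, height = 92, 112
reduced = (dwt_output_length(width, DB4_FILTER_LENGTH),
           dwt_output_length(height, DB4_FILTER_LENGTH))
print(reduced)  # (49, 59)
```

Under this assumption each face is therefore represented by a 49 × 59 = 2891-dimensional feature vector instead of the 92 × 112 = 10304 original pixels.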



Figure 2. Some images of one person from the CMU PIE face database

4.2 Distribution of extracted features
This section aims to provide insights into how the proposed MKFD simplifies the face pattern distribution, compared with KDDA based on single kernels, when the patterns are subject to pose and illumination variations. We select five subjects, 56 images per subject, with varying pose and illumination, from our CMU PIE face dataset determined above (56 × 5 = 280 images in all). Four types of feature bases are generated from the images by utilizing KDDA with linear kernel, KDDA with Gaussian RBF kernel, KDDA with polynomial kernel, and our MKFD, respectively. Subsequently, all the 280 images are projected onto the four subspaces. For each image, its projections onto the two most significant feature bases of each subspace are visualized in Fig. 3.

[Figure 3 panels: a) Linear-kernel KDDA based subspace; b) RBF-kernel KDDA based subspace; c) Polynomial-kernel KDDA based subspace; d) MKFD based subspace]

Figure 3. Distribution of 280 images of five subjects under varying pose and illumination in four types of subspaces

Fig. 3(a)-(d) depict the two most discriminant features extracted by utilizing KDDA with linear kernel, KDDA with Gaussian RBF kernel, KDDA with polynomial kernel and MKFD, respectively. As can be seen, our MKFD extracts the most discriminant features.

4.3 Recognition results
This section reports the recognition results of MKFD and KDDA with single kernels on the FERET and the CMU PIE datasets. For each subject in the FERET dataset, we randomly select n (n = 2 to 5) out of 6 images for training, with the rest for testing. In the CMU PIE dataset, the number of randomly selected training images ranges from 10 to 18 out of 56 for each individual, while the rest are testing images. The average recognition accuracies over 10 runs on the FERET and CMU PIE datasets are shown in Fig. 4(a)-(b), respectively.

Figure 4. Comparison of accuracies obtained by MKFD and KDDA with single kernels

Table 1 shows the average and standard deviation of the accuracies for FERET (n = 3: 3 images per subject for training with the rest for testing) and CMU PIE (n = 14: 14 images per subject for training with the rest for testing), respectively.

Type of kernel                              FERET (n=3)   CMU PIE (n=14)
Linear kernel                        Mean   84.94%        71.79%
                                     Std    0.015         0.014
RBF kernel                           Mean   73.84%        69.03%
                                     Std    0.040         0.010
Polynomial kernel                    Mean   84.80%        76.50%
                                     Std    0.017         0.010
Multi-kernel with same weights       Mean   80.84%        71.81%
                                     Std    0.030         0.014
Multi-kernel with optimized weights  Mean   86.34%        78.74%
                                     Std    0.015         0.007

Table 1. Performance comparison between MKFD and KDDA with single kernels

From the results in Fig. 4 and Table 1, it can be seen that blending multiple kernels in the proposed MKFD can achieve higher accuracies than any of the three single kernels. A simple equal-weight summation of the kernels, however, is hardly a good idea for improving classification performance: in Table 1 it falls below the linear and polynomial kernels on FERET (80.84% versus 84.94% and 84.80%) and below the polynomial kernel on CMU PIE (71.81% versus 76.50%). In contrast, our constructed multiple kernel function with optimized kernel weights gives the best mean accuracy on both datasets (86.34% and 78.74%). Note that in the experiments, neither the parameter of the RBF kernel nor that of the polynomial kernel is optimally selected, and the linear kernel has no parameter at all. This

means that, to a certain extent, the multiple kernel function relaxes parameter selection for the base kernels.

5. Conclusion
In this paper, on the assumption that multiple kernels can characterize geometrical structures of the original data from multiple views that complement one another to improve recognition performance, we apply kernel-based FDA with multiple kernels, which we call MKFD, to the recognition of face images under variations of pose, illumination, facial expression, etc. The constructed kernel for MKFD is a linear combination of several base kernels with a constraint on their weights. Using the maximum margin criterion as the objective, we propose an iterative scheme based on the method of Lagrange multipliers for the weight optimization, which yields updated kernel weights resulting in higher recognition accuracy on the FERET and CMU PIE face databases than the single kernels. The experiments also demonstrate that the multiple kernel function relaxes parameter selection to some extent. It is important to point out that the proposed weight optimization scheme is generic and, with minor modifications, can be applied to all kernel-based FDA algorithms.

6. Acknowledgement
This work is partially supported by the National Natural Science Foundation of China under grant No. 60975083.

7. References
[1] M. Turk and A. Pentland. Eigenfaces for recognition. J. Cogn. Neurosci., 3(1):71-86, 1991.
[2] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman. Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. IEEE Trans. Pattern Anal. Mach. Intell., 19(7):711-720, Jul. 1997.
[3] A. Ruiz and P. E. López de Teruel. Nonlinear kernel-based statistical pattern analysis. IEEE Trans. Neural Netw., 12(1):16-32, Jan. 2001.
[4] B. Schölkopf, A. Smola, and K. Müller. Nonlinear component analysis as a kernel eigenvalue problem. MPI für biologische Kybernetik, Tübingen, Germany, Tech. Rep. 44, 1996.
[5] S. Mika, G. Rätsch, J. Weston, B. Schölkopf, and K. R. Müller. Fisher discriminant analysis with kernels. Proc. IEEE Workshop Neural Netw. Signal Process. IX, 1999, pp. 41-48.
[6] G. Baudat and F. Anouar. Generalized discriminant analysis using a kernel approach. Neural Comput., 12(10):2385-2404, 2000.
[7] J. W. Lu, K. Plataniotis, and A. N. Venetsanopoulos. Face recognition using kernel direct discriminant analysis algorithms. IEEE Trans. Neural Netw., 14(1):117-126, Jan. 2003.
[8] O. Chapelle, V. Vapnik, O. Bousquet, and S. Mukherjee. Choosing multiple parameters for support vector machines. Machine Learning, 46(1-3):131-159, 2002.
[9] S. Sonnenburg, G. Rätsch, and C. Schäfer. A general and efficient multiple kernel learning algorithm. Neural Information Processing Systems, 2005.
[10] Z. Wang, S. Chen, and T. Sun. MultiK-MHKS: A novel multiple kernel learning algorithm. IEEE Trans. Pattern Anal. Mach. Intell., 30(2):348-353, Feb. 2008.
[11] J. Bi, T. Zhang, and K. Bennett. Column-generation boosting methods for mixture of kernels. Proc. Int'l Conf. Knowledge Discovery and Data Mining, pp. 521-526, 2004.
[12] G. R. G. Lanckriet, N. Cristianini, P. Bartlett, L. E. Ghaoui, and M. I. Jordan. Learning the kernel matrix with semidefinite programming. J. Machine Learning Research, vol. 5, pp. 27-72, 2004.
[13] F. Bach, G. R. G. Lanckriet, and M. I. Jordan. Multiple kernel learning, conic duality, and the SMO algorithm. Proc. 21st Int'l Conf. Machine Learning, 2004.
[14] K. P. Bennett, M. Momma, and M. J. Embrechts. MARK: A boosting algorithm for heterogeneous kernel models. Proc. ACM SIGKDD, pp. 24-31, 2002.
[15] H. Li, T. Jiang, and K. Zhang. Efficient and robust feature extraction by maximum margin criterion. Advances in Neural Information Processing Systems 16, S. Thrun, L. Saul, and B. Schölkopf, Eds. Cambridge, MA, MIT Press, pp. 157-165, 2004.
[16] P. J. Phillips, H. Moon, S. A. Rizvi, and P. J. Rauss. The FERET evaluation methodology for face-recognition algorithms. IEEE Trans. Pattern Anal. Mach. Intell., 22(10):1090-1104, Oct. 2000.
[17] T. Sim, S. Baker, and M. Bsat. The CMU pose, illumination, and expression (PIE) database. Proc. 5th IEEE Int. Conf. Autom. Face and Gesture Recognition, May 2002.

