A Linear Support Higher-Order Tensor Machine for Classification

Zhifeng Hao, Lifang He, Student Member, IEEE, Bingqian Chen, and Xiaowei Yang
Abstract— There has been growing interest in developing more effective learning machines for tensor classification. At present, most of the existing learning machines, such as the support tensor machine (STM), involve nonconvex optimization problems and need to resort to iterative techniques, which are time-consuming and may suffer from local minima. In order to overcome these two shortcomings, in this paper we present a novel linear support higher-order tensor machine (SHTM) which integrates the merits of the linear C-support vector machine (C-SVM) and tensor rank-one decomposition. Theoretically, SHTM is an extension of the linear C-SVM to tensor patterns. When the input patterns are vectors, SHTM degenerates into the standard C-SVM. A set of experiments is conducted on nine second-order face recognition datasets and three third-order gait recognition datasets to illustrate the performance of the proposed SHTM. The statistical test shows that, compared with STM and C-SVM with the RBF kernel, SHTM provides significant performance gain in terms of test accuracy and training speed, especially in the case of higher-order tensors.

Index Terms— Higher-order tensor, support tensor machine (STM), support vector machine (SVM), tensor classification, tensor rank-one decomposition.
Manuscript received September 19, 2012; revised January 24, 2013; accepted March 2, 2013. Date of publication March 20, 2013; date of current version May 22, 2013. This work was supported in part by the National Science Foundation of China under Grant 61273295 and Grant 61070033, the Major Project of the National Social Science Foundation of China under Grant 11&ZD156, the Open Project of the Key Laboratory of Symbolic Computation and Knowledge Engineering of the Chinese Ministry of Education under Grant 93K-17-2009-K04, and the China Scholarship Council. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Marios S. Pattichis.

Z. Hao is with the Faculty of Computer, Guangdong University of Technology, Guangzhou 510006, China, and also with the School of Computer Science and Engineering, South China University of Technology, Guangzhou 510641, China (e-mail: [email protected]).

L. He is with the School of Computer Science and Engineering, South China University of Technology, Guangzhou 510641, China (e-mail: [email protected]).

B. Chen is with the Department of Mathematics, School of Sciences, South China University of Technology, Guangzhou 510641, China (e-mail: [email protected]).

X. Yang is with the Department of Mathematics, School of Sciences, South China University of Technology, Guangzhou 510641, China, and also with the Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TIP.2013.2253485

I. INTRODUCTION

THERE are two main topics of concern in the fields of pattern recognition, computer vision and image processing: data representation and classifier design. In the past decades, numerous state-of-the-art classification algorithms
have been proposed and have achieved great success in many applications. Among these algorithms, the most prominent representative is the support vector machine (SVM) [1], which is particularly attractive in the pattern recognition, computer vision and image processing communities [2]–[7] due to a number of theoretical and computational merits. Unfortunately, the standard SVM model is based on vector space and cannot directly deal with non-vector patterns, whereas real-world image and video data are more naturally represented as matrices (second-order tensors) or higher-order tensors. For example, grey-level face images [8], [9] are inherently represented as matrices. Color images [2], [10], gray-level video sequences [11]–[13], gait silhouette sequences [14], [15] and hyperspectral cubes [16] are commonly represented as third-order tensors. Color video sequences [17], [18] are usually represented as fourth-order tensors. Although tensor objects can be reshaped into vectors beforehand to comply with the input requirements of SVM, several studies have indicated that this direct reshaping breaks the natural structure and correlation in the original data [15], [19]–[21], and leads to the curse of dimensionality and small sample size (SSS) problems [15], [22], where SVM performs poorly [6], [23]–[26].

Within the last decade, researchers have mainly focused on data representation to address the above problems, such as tensor decomposition [27]–[31] and multilinear subspace learning [15], [19], [22], [32]–[34]. Recently, several researchers [35]–[41] have suggested constructing multilinear models to extend the SVM learning framework to tensor patterns. In [35], Tao et al. presented a supervised tensor learning (STL) framework by applying a combination of convex optimization and multilinear operators, in which the weight parameter was decomposed into a rank-1 tensor [27]. Based on this learning framework, Cai et al. [36] studied second-order tensors and presented a linear tensor least square (TLS) classifier, and Tao et al. [37] extended the classical linear C-support vector machine (C-SVM) [42], ν-SVM [43] and least squares SVM [44] to general tensor patterns. Based on the SVM methodology within the STL framework, Liu et al. [38] used dimension-reduced tensors as input for video analysis. Kotsia et al. [39] adopted a Tucker decomposition of the weight parameter instead of a rank-1 tensor to retain more structural information. Wolf et al. [40] proposed to minimize the rank of the weight parameter with orthogonality constraints on its columns instead of the classical maximum-margin criterion, and Pirsiavash et al. [41] relaxed the orthogonality constraints to further improve Wolf's method.
A potential disadvantage of the STL framework is that it gives rise to a non-convex optimization problem. As a result, STL-based methods have two main drawbacks. On the one hand, they may suffer from the local minima problem. On the other hand, for the non-convex optimization problem one usually resorts to iterative techniques, which is very time-consuming. In this paper, we propose a novel linear support higher-order tensor machine (SHTM) to overcome these two shortcomings. Specifically, we first reformulate the linear C-SVM model from the viewpoint of multilinear algebra based on the support tensor machine (STM) [37] and obtain a tensor space model, which works on the same principles as the linear C-SVM but operates directly on tensor input. More importantly, different from the traditional STL-based methods, which involve an iterative optimization procedure, the global solution of this model can be obtained through the optimization algorithms for SVM without any extra matrix storage and without invoking an iterative numerical routine. Second, we integrate tensor rank-one decomposition [45] into the model to assist its inner product computation. There are two main reasons for doing this: 1) The original inner product cannot exert its normal capability for capturing structural information of tensor objects because of the curse of dimensionality and SSS problems. 2) Motivated by the successes of the tensor rank-one decomposition in tensor representation and classification [18], [27], [46], we observe that the tensor rank-one decomposition is able to obtain more compact and meaningful representations of the tensor objects, especially in the case of higher-order tensors [18], thus improving the effectiveness of the inner product computation and also saving storage space and computational time. Finally, we conduct a set of experiments on twelve tensor classification datasets to examine the effectiveness and efficiency of SHTM.

The rest of this paper is organized as follows. Section II covers some preliminaries, including notation, basic definitions and a brief review of STM. In Section III, the proposed SHTM is discussed for classification. The differences of SHTM vs. C-SVM and STM are also illustrated in this section. The experimental evaluation is presented in Section IV. Finally, Section V gives conclusions and future work.

II. PRELIMINARIES

Before presenting our work, we first briefly introduce some notation and basic definitions used throughout the paper and review the STM algorithm.
A. Notation and Basic Definitions

For convenience, we will follow the conventional notation and definitions in the areas of multilinear algebra, pattern recognition and signal processing [15], [29], [45], [47]. Thus, in this study, vectors are denoted by boldface lowercase letters, e.g., a; matrices by boldface capital letters, e.g., A; and tensors by calligraphic letters, e.g., $\mathcal{A}$. Their elements are denoted by indices, which typically range from 1 to the capital letter of the index, e.g., n = 1, ..., N. To make it more clear, Table I lists the fundamental symbols defined in this study.

TABLE I
LIST OF SYMBOLS

$M$: the total number of tensor samples
$L$: the number of classes
$\{\mathcal{X}_m, y_m\}_{m=1}^{M}$: a set of tensor samples
$\mathcal{X}_m$: the mth input tensor sample
$y_m$: the label of $\mathcal{X}_m$
$N$: the order of $\mathcal{X}_m \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$
$R = \mathrm{rank}(\mathcal{X}_m)$: the rank of $\mathcal{X}_m$
$\mathcal{W}$: the weight parameter
$b$: the bias variable
$C$: the trade-off parameter
$\xi$: the slack variables
$\alpha, \beta$: the Lagrange multipliers
$\varepsilon$: the threshold parameter
$\mathbf{w}^{(1)} \circ \mathbf{w}^{(2)} \circ \cdots \circ \mathbf{w}^{(N)}$: rank-1 tensor
$\mathcal{X} \times_n \mathbf{w}$: n-mode product of $\mathcal{X}$ and $\mathbf{w}$
$\|\cdot\|_F$: Frobenius norm

Definition 1 (Tensor): A tensor, also known as an Nth-order tensor, multidimensional array, N-way or N-mode array, is an element of the tensor product of N vector spaces, which is a higher-order generalization of a vector (first-order tensor) and a matrix (second-order tensor), denoted as $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$, where N is the order of $\mathcal{A}$, also called ways or modes. The elements of $\mathcal{A}$ are denoted by $a_{i_1, i_2, \cdots, i_N}$, $1 \le i_n \le I_n$, $1 \le n \le N$.

Definition 2 (Tensor Product or Outer Product): The tensor product $\mathcal{X} \circ \mathcal{Y}$ of a tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ and another tensor $\mathcal{Y} \in \mathbb{R}^{I'_1 \times I'_2 \times \cdots \times I'_M}$ is defined by

$$(\mathcal{X} \circ \mathcal{Y})_{i_1, i_2, \cdots, i_N, i'_1, i'_2, \cdots, i'_M} = x_{i_1, i_2, \cdots, i_N}\, y_{i'_1, i'_2, \cdots, i'_M} \qquad (1)$$

for all values of the indices.

Definition 3 (Inner Product): The inner product of two same-sized tensors $\mathcal{X}, \mathcal{Y} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ is defined as the sum of the products of their entries, i.e.,

$$\langle \mathcal{X}, \mathcal{Y} \rangle = \sum_{i_1=1}^{I_1} \sum_{i_2=1}^{I_2} \cdots \sum_{i_N=1}^{I_N} x_{i_1, i_2, \cdots, i_N}\, y_{i_1, i_2, \cdots, i_N}. \qquad (2)$$

Definition 4 (n-Mode Product): The n-mode product of a tensor $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ and a matrix $\mathbf{U} \in \mathbb{R}^{J_n \times I_n}$, denoted by $\mathcal{A} \times_n \mathbf{U}$, is a tensor in $\mathbb{R}^{I_1 \times \cdots \times I_{n-1} \times J_n \times I_{n+1} \times \cdots \times I_N}$ given by

$$(\mathcal{A} \times_n \mathbf{U})_{i_1, \cdots, i_{n-1}, j_n, i_{n+1}, \cdots, i_N} = \sum_{i_n=1}^{I_n} a_{i_1, i_2, \cdots, i_N}\, u_{j_n, i_n} \qquad (3)$$

for all index values.

Remark: Given a tensor $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ and a sequence of matrices $\mathbf{U}^{(n)} \in \mathbb{R}^{J_n \times I_n}$, $J_n < I_n$, $n = 1, \cdots, N$, the projection of $\mathcal{A}$ onto the tensor subspace $\mathbb{R}^{J_1 \times J_2 \times \cdots \times J_N}$ is defined as $\mathcal{A} \times_1 \mathbf{U}^{(1)} \times_2 \mathbf{U}^{(2)} \times \cdots \times_N \mathbf{U}^{(N)}$. Given a tensor $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ and two matrices $\mathbf{F} \in \mathbb{R}^{J_n \times I_n}$, $\mathbf{G} \in \mathbb{R}^{J_m \times I_m}$, one has $(\mathcal{A} \times_n \mathbf{F}) \times_m \mathbf{G} = (\mathcal{A} \times_m \mathbf{G}) \times_n \mathbf{F} = \mathcal{A} \times_n \mathbf{F} \times_m \mathbf{G}$.

Definition 5 (Frobenius Norm): The Frobenius norm of a tensor $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ is the square root of the sum of the squares of all its elements, i.e.,

$$\|\mathcal{A}\|_F = \sqrt{\langle \mathcal{A}, \mathcal{A} \rangle} = \sqrt{\sum_{i_1=1}^{I_1} \sum_{i_2=1}^{I_2} \cdots \sum_{i_N=1}^{I_N} a_{i_1, i_2, \cdots, i_N}^2}. \qquad (4)$$

Remark: Given two same-sized tensors $\mathcal{A}, \mathcal{B} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$, the distance between tensors $\mathcal{A}$ and $\mathcal{B}$ is defined as $\|\mathcal{A} - \mathcal{B}\|_F$. Note that the Frobenius norm of the difference between two tensors equals the Euclidean distance of their vectorized representations [48].
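The operations in Definitions 2-5 map directly onto standard dense-array routines. As a minimal illustration of ours (not part of the paper; numpy is assumed as the array library and the function names are our own), the inner product, the n-mode product, and the Frobenius norm can be sketched as follows:

```python
import numpy as np

def inner_product(X, Y):
    """Inner product of two same-sized tensors (Definition 3)."""
    return float(np.sum(X * Y))

def mode_n_product(A, U, n):
    """n-mode product A x_n U of a tensor A with a matrix U (Definition 4).

    U has shape (J_n, I_n); the result replaces dimension n of A by J_n."""
    # Contract U's second index with A's n-th index, then move the new
    # axis back to position n.
    out = np.tensordot(A, U, axes=([n], [1]))   # new axis appended at the end
    return np.moveaxis(out, -1, n)

def frobenius_norm(A):
    """Frobenius norm of a tensor (Definition 5)."""
    return float(np.sqrt(inner_product(A, A)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 5, 6))      # a third-order tensor
    U = rng.standard_normal((3, 5))         # projects mode 2 from size 5 to 3
    B = mode_n_product(A, U, n=1)
    print(B.shape)                          # (4, 3, 6)
    print(abs(frobenius_norm(A) - np.linalg.norm(A.ravel())) < 1e-12)
```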
Definition 6 (Tensor Rank-One Decomposition): Let $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ be a tensor. If it can be written as

$$\mathcal{A} = \sum_{r=1}^{R} \mathbf{u}_r^{(1)} \circ \mathbf{u}_r^{(2)} \circ \cdots \circ \mathbf{u}_r^{(N)} = \sum_{r=1}^{R} \mathop{\circ}_{n=1}^{N} \mathbf{u}_r^{(n)} \qquad (5)$$

we call (5) the tensor rank-one decomposition of $\mathcal{A}$ with length R, also known as the CANDECOMP/PARAFAC (CP) decomposition. Particularly, if R = 1, it is called a rank-1 tensor. If R is the minimum number of rank-1 tensors that yield $\mathcal{A}$ in a linear combination, R is defined as the rank of $\mathcal{A}$, denoted by $R = \mathrm{rank}(\mathcal{A})$. Moreover, if $\mathbf{u}_i^{(n)}$ and $\mathbf{u}_j^{(n)}$ are mutually orthonormal for all $i \neq j$, $1 \le i, j \le R$, $n = 1, \ldots, N$, $\mathcal{A}$ is called orthogonally decomposable [49], [50].
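To make Definition 6 concrete, the following sketch of ours (not code from the paper) computes a length-R CP approximation of a dense tensor by alternating least squares (ALS), the decomposition strategy the paper adopts later in Section IV; the unfolding and Khatri-Rao conventions below are our own choices, and column r of the n-th factor matrix plays the role of the vector $\mathbf{u}_r^{(n)}$ in (5):

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: rows indexed by `mode`, columns by the remaining
    modes in their original order (last one varying fastest)."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def khatri_rao(mats):
    """Column-wise Khatri-Rao product of a list of factor matrices."""
    R = mats[0].shape[1]
    out = mats[0]
    for M in mats[1:]:
        out = np.einsum('ir,jr->ijr', out, M).reshape(-1, R)
    return out

def cp_als(T, R, n_iter=200, seed=0):
    """Length-R CP (rank-one) decomposition of a dense tensor T via ALS.

    Returns one factor matrix of shape (I_n, R) per mode."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((dim, R)) for dim in T.shape]
    for _ in range(n_iter):
        for n in range(T.ndim):
            others = [factors[m] for m in range(T.ndim) if m != n]
            gram = np.ones((R, R))
            for M in others:
                gram *= M.T @ M              # (KR^T KR) as a Hadamard product of Grams
            kr = khatri_rao(others)          # matches the unfolding convention above
            factors[n] = unfold(T, n) @ kr @ np.linalg.pinv(gram)
    return factors

def cp_reconstruct(factors):
    """Rebuild the tensor from its CP factors (sum of R outer products)."""
    R = factors[0].shape[1]
    T = np.zeros(tuple(f.shape[0] for f in factors))
    for r in range(R):
        comp = factors[0][:, r]
        for f in factors[1:]:
            comp = np.multiply.outer(comp, f[:, r])
        T += comp
    return T

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((10, 8, 6))
    factors = cp_als(X, R=5)
    err = np.linalg.norm(X - cp_reconstruct(factors)) / np.linalg.norm(X)
    print(f"relative approximation error: {err:.3f}")
```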
B. Support Tensor Machine

Considering a training set of M pairs of samples $\{\mathcal{X}_m, y_m\}_{m=1}^{M}$ for the binary classification problem, where $\mathcal{X}_m \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ are the input data and $y_m \in \{-1, +1\}$ are the corresponding class labels of $\mathcal{X}_m$, STM for binary classification is composed of N quadratic programming (QP) problems with inequality constraints, and the nth QP problem can be described in the following way [37]:

$$\min_{\mathbf{w}^{(n)}, b^{(n)}, \boldsymbol{\xi}^{(n)}} J(\mathbf{w}^{(n)}, b^{(n)}, \boldsymbol{\xi}^{(n)}) = \frac{1}{2} \|\mathbf{w}^{(n)}\|_F^2 \prod_{\substack{1 \le i \le N \\ i \neq n}} \|\mathbf{w}^{(i)}\|_F^2 + C \sum_{m=1}^{M} \xi_m^{(n)}, \qquad (6)$$

subject to

$$y_m \Big( (\mathbf{w}^{(n)})^T \Big( \mathcal{X}_m \mathop{\times_i}_{\substack{1 \le i \le N \\ i \neq n}} \mathbf{w}^{(i)} \Big) + b^{(n)} \Big) \ge 1 - \xi_m^{(n)}, \qquad (7)$$

$$\xi_m^{(n)} \ge 0, \quad m = 1, \ldots, M, \qquad (8)$$

where $\mathbf{w}^{(n)}$ is the normal vector (or weight vector) of the nth hyperplane, $b^{(n)}$ is the bias, $\xi_m^{(n)}$ is the error of the mth training sample corresponding to $\mathbf{w}^{(n)}$, and C is the trade-off between the classification margin and misclassification error. Obviously, each optimization problem defined in (6)-(8) is the standard C-SVM. However, these N optimization models have no closed-form solution, so we need to use the alternating projection algorithm to solve them. The detailed procedures are described in Table II.

TABLE II
ALTERNATING PROJECTION FOR STM

Input: A set of tensor samples $\{\mathcal{X}_m \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}, y_m \in \{-1, 1\}\}_{m=1}^{M}$, the threshold parameter $\varepsilon$.
Output: $\mathbf{w}^{(n)} \in \mathbb{R}^{I_n}$, $n = 1, 2, \ldots, N$, and $b^{(N)}$.
Step 1: Set $\mathbf{w}^{(n)}$ equal to a random unit vector in $\mathbb{R}^{I_n}$, $n = 1, 2, \ldots, N$.
Step 2: Run Step 3 iteratively until convergence.
Step 3: For $n = 1, 2, \ldots, N$: fix $\mathbf{w}^{(i)}$ ($i \neq n$) and solve the optimization problem (6)-(8) to obtain $\mathbf{w}^{(n)}$.
Step 4: If $\sum_{n=1}^{N} \big( (\mathbf{w}_t^{(n)})^T \mathbf{w}_{t-1}^{(n)} \, \|\mathbf{w}_t^{(n)}\|_F^{-2} - 1 \big) < \varepsilon$, output $\mathbf{w}^{(n)} \in \mathbb{R}^{I_n}$, $n = 1, 2, \ldots, N$, and stop; otherwise, go to Step 3. Here $\mathbf{w}_t^{(n)}$ and $\mathbf{w}_{t-1}^{(n)}$ are the current projection weight vector and the previous projection weight vector, respectively.

Once the STM model has been solved, the class label of a testing example $\mathcal{X}$ can be predicted as follows:

$$y(\mathcal{X}) = \mathrm{sign}\Big( \mathcal{X} \prod_{n=1}^{N} \times_n \mathbf{w}^{(n)} + b \Big). \qquad (9)$$

III. LINEAR SUPPORT HIGHER-ORDER TENSOR MACHINE

In this section, we first introduce the proposed SHTM for classification, and then analyze the differences of SHTM vs. SVM and STM.

A. SHTM for Binary Classification

Before deriving our optimization model, we first reformulate the STM model using concepts of multilinear algebra and obtain a base model, which happens to be identical to the linear C-SVM model but is an independent implementation from a multilinear algebra viewpoint. Second, we integrate tensor rank-one decomposition into the base model and present our optimization model.

Let $\mathcal{W} = \mathbf{w}^{(1)} \circ \mathbf{w}^{(2)} \circ \cdots \circ \mathbf{w}^{(N)}$ and $\xi_m = \max_{n=1,2,\ldots,N} \xi_m^{(n)}$. Using the definitions of the outer product and Frobenius norm of tensors, we have

$$
\begin{aligned}
\|\mathcal{W}\|_F^2
&= \sum_{i_1=1}^{I_1} \sum_{i_2=1}^{I_2} \cdots \sum_{i_N=1}^{I_N} w_{i_1, i_2, \cdots, i_N}^2
 = \sum_{i_1=1}^{I_1} \sum_{i_2=1}^{I_2} \cdots \sum_{i_N=1}^{I_N} \big( w_{i_1}^{(1)} w_{i_2}^{(2)} \cdots w_{i_N}^{(N)} \big)^2 \\
&= \langle \mathbf{w}^{(1)}, \mathbf{w}^{(1)} \rangle \langle \mathbf{w}^{(2)}, \mathbf{w}^{(2)} \rangle \cdots \langle \mathbf{w}^{(N)}, \mathbf{w}^{(N)} \rangle
 = \prod_{n=1}^{N} \|\mathbf{w}^{(n)}\|_F^2. \qquad (10)
\end{aligned}
$$
From the definitions of the n-mode product and inner product of tensors, we can write

$$
\begin{aligned}
(\mathbf{w}^{(n)})^T \Big( \mathcal{X}_m \mathop{\times_i}_{\substack{1 \le i \le N \\ i \neq n}} \mathbf{w}^{(i)} \Big)
&= \mathcal{X}_m \times_1 \mathbf{w}^{(1)} \times_2 \mathbf{w}^{(2)} \times \cdots \times_{n-1} \mathbf{w}^{(n-1)} \times_n \mathbf{w}^{(n)} \times_{n+1} \mathbf{w}^{(n+1)} \times \cdots \times_N \mathbf{w}^{(N)} \\
&= \sum_{i_1=1}^{I_1} \sum_{i_2=1}^{I_2} \cdots \sum_{i_N=1}^{I_N} x_{m, i_1, i_2, \ldots, i_N}\, w_{i_1}^{(1)} w_{i_2}^{(2)} \cdots w_{i_N}^{(N)} \\
&= \sum_{i_1=1}^{I_1} \sum_{i_2=1}^{I_2} \cdots \sum_{i_N=1}^{I_N} x_{m, i_1, i_2, \ldots, i_N}\, w_{i_1, i_2, \ldots, i_N} \\
&= \langle \mathcal{W}, \mathcal{X}_m \rangle. \qquad (11)
\end{aligned}
$$
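The identities (10) and (11) are easy to verify numerically. A small check of ours (not from the paper), for a random third-order example:

```python
import numpy as np

# Numerical check of Eqs. (10) and (11) for a third-order example:
# with W = w1 o w2 o w3, <W, X> equals the multilinear projection of X,
# and ||W||_F^2 equals the product of the squared vector norms.
rng = np.random.default_rng(0)
I1, I2, I3 = 4, 5, 6
X = rng.standard_normal((I1, I2, I3))
w1, w2, w3 = rng.standard_normal(I1), rng.standard_normal(I2), rng.standard_normal(I3)

W = np.einsum('i,j,k->ijk', w1, w2, w3)            # rank-1 tensor w1 o w2 o w3
proj = np.einsum('ijk,i,j,k->', X, w1, w2, w3)     # X x_1 w1 x_2 w2 x_3 w3
print(np.isclose(np.sum(W * X), proj))             # Eq. (11)
print(np.isclose(np.sum(W * W),
                 (w1 @ w1) * (w2 @ w2) * (w3 @ w3)))   # Eq. (10)
```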
Using (10) and (11), the N QP problems arising in STM can be transformed into the following optimization problem:

$$\min_{\mathcal{W}, b, \boldsymbol{\xi}} \; J(\mathcal{W}, b, \boldsymbol{\xi}) = \frac{1}{2} \|\mathcal{W}\|_F^2 + C \sum_{m=1}^{M} \xi_m, \qquad (12)$$

subject to

$$y_m (\langle \mathcal{W}, \mathcal{X}_m \rangle + b) \ge 1 - \xi_m, \qquad (13)$$

$$\xi_m \ge 0, \quad m = 1, \ldots, M. \qquad (14)$$

The Lagrangian function of the optimization problem (12)-(14) is

$$L(\mathcal{W}, b, \boldsymbol{\alpha}, \boldsymbol{\beta}, \boldsymbol{\xi}) = \frac{1}{2} \|\mathcal{W}\|_F^2 + C \sum_{m=1}^{M} \xi_m - \sum_{m=1}^{M} \alpha_m \big( y_m (\langle \mathcal{W}, \mathcal{X}_m \rangle + b) - 1 + \xi_m \big) - \sum_{m=1}^{M} \beta_m \xi_m. \qquad (15)$$

Setting the partial derivatives of $L(\mathcal{W}, b, \boldsymbol{\alpha}, \boldsymbol{\beta}, \boldsymbol{\xi})$ with respect to $\mathcal{W}$, $b$ and $\xi_m$ to zero, respectively, we have

$$\mathcal{W} = \sum_{m=1}^{M} \alpha_m y_m \mathcal{X}_m, \qquad (16)$$

$$\sum_{m=1}^{M} \alpha_m y_m = 0, \qquad (17)$$

$$\alpha_m + \beta_m = C, \quad m = 1, \ldots, M. \qquad (18)$$

Substituting (16)-(18) into (15) yields the dual of the optimization problem (12)-(14) as follows:

$$\max_{\boldsymbol{\alpha}} \; \sum_{m=1}^{M} \alpha_m - \frac{1}{2} \sum_{i,j=1}^{M} \alpha_i \alpha_j y_i y_j \langle \mathcal{X}_i, \mathcal{X}_j \rangle, \qquad (19)$$

subject to

$$\sum_{m=1}^{M} \alpha_m y_m = 0, \qquad (20)$$

$$0 \le \alpha_m \le C, \quad m = 1, \ldots, M, \qquad (21)$$

where $\alpha_m$ are the Lagrange multipliers and $\langle \mathcal{X}_i, \mathcal{X}_j \rangle$ are the inner products of $\mathcal{X}_i$ and $\mathcal{X}_j$. It is obvious that when the input samples $\mathcal{X}_i$ are vectors, the optimization model (12)-(14) degenerates into the support vector machine. Moreover, if we adopt the original input tensors to compute $\langle \mathcal{X}_i, \mathcal{X}_j \rangle$, then the optimal solutions of (12)-(14) are the same as those of the linear C-SVM. Although it is not our intention to produce the same solution for different data space models, the work reported here is a first step towards our approach and also establishes a link between C-SVM and STM.

As mentioned previously, due to the so-called curse of dimensionality and SSS problems, SVM cannot handle tensor objects effectively, and thus the optimization model (12)-(14) also suffers from these problems. More precisely, the inner product computation in (19) fails to capture structural information of the data, since the dual formulation of the optimization model (12)-(14) depends on the data only through inner products. Considering that the tensor rank-one decomposition can obtain more compact and meaningful representations of the tensor objects, especially for higher-order tensors, we embed it into (19) to assist the inner product computation.

Let the rank-one decompositions of $\mathcal{X}_i$ and $\mathcal{X}_j$ be $\mathcal{X}_i \approx \sum_{r=1}^{R} \mathbf{x}_{ir}^{(1)} \circ \mathbf{x}_{ir}^{(2)} \circ \cdots \circ \mathbf{x}_{ir}^{(N)}$ and $\mathcal{X}_j \approx \sum_{r=1}^{R} \mathbf{x}_{jr}^{(1)} \circ \mathbf{x}_{jr}^{(2)} \circ \cdots \circ \mathbf{x}_{jr}^{(N)}$, respectively; then the inner product of $\mathcal{X}_i$ and $\mathcal{X}_j$ is calculated as follows:

$$\langle \mathcal{X}_i, \mathcal{X}_j \rangle \approx \Big\langle \sum_{r=1}^{R} \mathbf{x}_{ir}^{(1)} \circ \mathbf{x}_{ir}^{(2)} \circ \cdots \circ \mathbf{x}_{ir}^{(N)}, \; \sum_{r=1}^{R} \mathbf{x}_{jr}^{(1)} \circ \mathbf{x}_{jr}^{(2)} \circ \cdots \circ \mathbf{x}_{jr}^{(N)} \Big\rangle = \sum_{p=1}^{R} \sum_{q=1}^{R} \langle \mathbf{x}_{ip}^{(1)}, \mathbf{x}_{jq}^{(1)} \rangle \langle \mathbf{x}_{ip}^{(2)}, \mathbf{x}_{jq}^{(2)} \rangle \cdots \langle \mathbf{x}_{ip}^{(N)}, \mathbf{x}_{jq}^{(N)} \rangle. \qquad (22)$$

Substituting (22) into (19), we get

$$\max_{\boldsymbol{\alpha}} \; \sum_{m=1}^{M} \alpha_m - \frac{1}{2} \sum_{i,j=1}^{M} \alpha_i \alpha_j y_i y_j \sum_{p=1}^{R} \sum_{q=1}^{R} \prod_{n=1}^{N} \langle \mathbf{x}_{ip}^{(n)}, \mathbf{x}_{jq}^{(n)} \rangle, \qquad (23)$$

subject to

$$\sum_{m=1}^{M} \alpha_m y_m = 0, \qquad (24)$$

$$0 \le \alpha_m \le C, \quad m = 1, \ldots, M. \qquad (25)$$

From (16), we know that the normal of the hyperplane can be expressed as a linear combination of the training samples in tensor space. So, we call the optimization model (23)-(25) the linear support higher-order tensor machine (SHTM). It can be solved by the sequential minimal optimization algorithm [51]. The class label of a testing example $\mathcal{X}$ is predicted as follows:

$$y(\mathcal{X}) = \mathrm{sign}\Big( \sum_{m=1}^{M} \alpha_m y_m \sum_{p=1}^{R} \sum_{q=1}^{R} \prod_{n=1}^{N} \langle \mathbf{x}_{mp}^{(n)}, \mathbf{x}_{q}^{(n)} \rangle + b \Big), \qquad (26)$$

where $\mathbf{x}_{mp}^{(n)}$ and $\mathbf{x}_{q}^{(n)}$ are the elements of the rank-one decompositions of $\mathcal{X}_m$ and $\mathcal{X}$, respectively.
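To make the training pipeline implied by (22)-(26) concrete, the sketch below (our own illustration; the paper's actual implementation is an SMO solver written in C++) precomputes the rank-one-decomposition-based inner products of (22) as a Gram matrix and hands it to an off-the-shelf SVM through scikit-learn's precomputed-kernel interface, which is an assumption rather than the authors' code. Each sample is represented by its list of CP factor matrices, e.g., as returned by the cp_als sketch given after Definition 6:

```python
import numpy as np
from sklearn.svm import SVC

def cp_inner_product(fac_i, fac_j):
    """Eq. (22): inner product of two tensors through their CP factors.

    fac_i, fac_j are lists of N factor matrices, each of shape (I_n, R)."""
    R = fac_i[0].shape[1]
    prod = np.ones((R, R))
    for A, B in zip(fac_i, fac_j):        # runs over the N modes
        prod *= A.T @ B                   # entry (p, q) accumulates <x_ip^(n), x_jq^(n)>
    return float(prod.sum())              # sum over p and q

def cp_gram_matrix(factors_a, factors_b):
    """Gram matrix of CP-decomposed tensors (rows: set a, columns: set b)."""
    K = np.zeros((len(factors_a), len(factors_b)))
    for i, fi in enumerate(factors_a):
        for j, fj in enumerate(factors_b):
            K[i, j] = cp_inner_product(fi, fj)
    return K

def fit_shtm_like(train_factors, y_train, C=1.0):
    """Fit an SVM on the CP-based Gram matrix, in the spirit of (23)-(25)."""
    K_train = cp_gram_matrix(train_factors, train_factors)
    clf = SVC(C=C, kernel="precomputed")  # SMO-type solver under the hood
    clf.fit(K_train, np.asarray(y_train))
    return clf

def predict_shtm_like(clf, train_factors, test_factors):
    """Predict labels of CP-decomposed test tensors, in the spirit of (26)."""
    K_test = cp_gram_matrix(test_factors, train_factors)   # shape (n_test, n_train)
    return clf.predict(K_test)
```

Note that with a precomputed kernel, scikit-learn's SVC handles more than two classes by one-against-one voting, which matches the OAO-SHTM strategy described in the next subsection.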
B. SHTM for Multi-Classification

For an L-class classification problem, the one-against-one support vector machine (OAO-SVM) [52] needs to construct L(L-1)/2 binary classification SVM models, where each one is trained on data points from two classes. Inspired by this idea, for the tensor samples $\mathcal{X}_m$ of the ith and the jth classes, if $\mathcal{X}_m$ belongs to the ith class, then $y_m = 1$, otherwise $y_m = -1$. We solve the following binary classification problem:

$$\min_{\mathcal{W}^{ij}, b^{ij}, \boldsymbol{\xi}} \; J(\mathcal{W}^{ij}, b^{ij}, \boldsymbol{\xi}) = \frac{1}{2} \|\mathcal{W}^{ij}\|_F^2 + C \sum_{m=1}^{M} \xi_m^{ij}, \qquad (27)$$

subject to

$$y_m (\langle \mathcal{W}^{ij}, \mathcal{X}_m \rangle + b^{ij}) \ge 1 - \xi_m^{ij}, \qquad (28)$$

$$\xi_m^{ij} \ge 0, \quad m = 1, \ldots, M. \qquad (29)$$
We call this model the one-against-one support higher-order tensor machine (OAO-SHTM). Once the OAO-SHTM model has been solved, the class label of a testing example $\mathcal{X}$ can be predicted by applying the majority voting strategy, i.e., the vote counting takes into account the outputs of all binary classifiers. If $\mathcal{X}$ belongs to the ith class, then the ith class gets one vote; otherwise the jth class gets one vote. $\mathcal{X}$ is labeled by the class with the most votes.

C. Analysis of SHTM Versus SVM and STM

In this section, we discuss the differences of SHTM vs. SVM and STM as follows:
1) Naturally, STM is a multilinear support vector machine and constructs N different hyperplanes in the N mode spaces. SHTM is a linear support tensor machine and constructs a single hyperplane in the tensor space.
2) The optimization problem arising in STM needs to be solved iteratively, while the optimization problem of SHTM only needs to be solved once.
3) For the same training sample, the slack variables $\xi_m^{(n)}$ obtained by STM are often unequal in different mode spaces, while SHTM only obtains one slack value in the tensor space. In addition, for different mode spaces, a support vector in one mode space may no longer be a support vector in another mode space for STM.
4) For the weight parameter $\mathcal{W}$, STM only obtains its rank-1 tensor, while SHTM obtains a more accurate representation, so the generalization performance of SHTM is usually higher than that of STM.
5) For a tensor, STM directly uses it as input, SVM reshapes it into a vector as input, while SHTM adopts its more compact representation, namely R rank-one tensors, as input, which makes SHTM more effective than SVM and STM.
6) For a set of tensor samples $\{\mathcal{X}_m \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}, y_m\}_{m=1}^{M}$, SVM requires $O\big((M+1)\prod_{n=1}^{N} I_n + 1\big)$ memory space, STM requires $O\big(M \prod_{n=1}^{N} I_n + \sum_{n=1}^{N} I_n + 1\big)$ memory space, while SHTM only requires $O\big((M+1) R \sum_{n=1}^{N} I_n + 1\big)$ memory space, which is desirable from a storage perspective.
7) From the previous work, we know that the computational complexity of SVM is $O\big(M^2 \prod_{n=1}^{N} I_n\big)$ [53]; thus the computational complexity of STM is $O\big(M^2 N T \sum_{n=1}^{N} I_n\big)$, where T is the loop number, and the computational complexity of SHTM is $O\big(M^2 R^2 \sum_{n=1}^{N} I_n\big)$, which indicates that SHTM is more efficient than SVM and STM.

IV. EXPERIMENTAL EVALUATION

In this section, we evaluate the performance of SHTM with experiments on four benchmark databases (Yale-B, ORL, CMU PIE and USF HumanID).

A. Baselines and Metrics

For the proposed SHTM, we choose the most popular and widely used alternating least squares (ALS) [50], [54] as its tensor rank-one decomposition strategy. In order to establish a comparative study, we use C-SVM and STM as baselines. Specifically, the RBF kernel is used in C-SVM, since it has been demonstrated that the RBF kernel usually outperforms other kernels [55], [56]:

$$K(\mathbf{x}, \mathbf{x}_i) = \exp(-\|\mathbf{x} - \mathbf{x}_i\|^2 / 2\sigma^2). \qquad (30)$$

In order to achieve our goal, we compare the performance of SHTM with those of C-SVM and STM in terms of both effectiveness and efficiency. The effectiveness is measured by test accuracy, which is the proportion of correctly classified samples to the total number of test samples. The efficiency is measured by the training time of the classifier, which is the CPU time for training the classifier; the time for reading the data into main memory is not included.

In statistical analysis [57], [58], the Hochberg step-up method of the Friedman test (F-test) is usually suggested to check whether a newly proposed algorithm is better than existing algorithms. For this purpose, we also use the Hochberg step-up method to conduct a statistical comparison of SHTM vs. C-SVM and STM. Assume that we compare K existing learning machines on Q experimental datasets. Let $d_q^k$ be the performance score of the kth compared learning machine on the qth experimental dataset, $\mathrm{rank}(d_q^k)$ be the rank of $d_q^k$, and $R_k = \frac{1}{Q} \sum_{q=1}^{Q} \mathrm{rank}(d_q^k)$ be its average rank; then the test statistic for comparing the kth and new learning machines is computed as follows:

$$Z = (R_k - R_{new}) \Big/ \sqrt{\frac{(K+1)(K+2)}{6Q}}, \qquad (31)$$

which is used to find the corresponding probability p with significance level δ from the table of the normal distribution, so as to calculate $\bar{p} = 2(1-p)$. After obtaining $\bar{p}$, we need to compare $\bar{p}$ with $\delta/(K-s+1)$, starting with the largest $\bar{p}$ value, until a hypothesis that can be rejected is encountered, where s indexes the values of Z sorted in descending order.
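As a small numerical sketch of this procedure (ours, not the authors' code; the ranking direction, the two-sided computation of $\bar{p}$, the tie handling, and the use of scipy are assumptions), the Z statistic of (31) and the step-up comparison against $\delta/(K-s+1)$ can be written as:

```python
import numpy as np
from scipy.stats import norm

def hochberg_stepup(scores, new_col, delta=0.05):
    """Friedman-type Z statistics (Eq. 31) plus Hochberg's step-up comparison.

    scores: (Q, K+1) array of performance scores, one column per learning
            machine (higher is better); new_col is the column of the new machine.
    Returns (machine index, Z, p_bar, rejected) tuples sorted by Z descending."""
    Q, n_machines = scores.shape
    K = n_machines - 1                               # number of existing machines
    # rank 1 = best machine on each dataset (no tie handling in this sketch)
    ranks = np.argsort(np.argsort(-scores, axis=1), axis=1) + 1.0
    avg_rank = ranks.mean(axis=0)                    # R_k in the paper's notation
    denom = np.sqrt((K + 1) * (K + 2) / (6.0 * Q))
    stats = []
    for k in range(n_machines):
        if k == new_col:
            continue
        Z = (avg_rank[k] - avg_rank[new_col]) / denom
        p_bar = 2.0 * (1.0 - norm.cdf(Z))            # p_bar = 2(1 - p)
        stats.append([k, Z, p_bar])
    stats.sort(key=lambda t: t[1], reverse=True)     # s = 1 for the largest Z
    rejected = [False] * K
    # step up: start with the largest p_bar (s = K); once one hypothesis is
    # rejected, every hypothesis with a smaller p_bar is rejected as well.
    for s in range(K, 0, -1):
        if stats[s - 1][2] < delta / (K - s + 1):
            for t in range(s):
                rejected[t] = True
            break
    return [(k, Z, p_bar, rej) for (k, Z, p_bar), rej in zip(stats, rejected)]
```

Feeding it a twelve-dataset score matrix (accuracies, or negated training times so that higher is better) with delta = 0.05 would reproduce the kind of comparison reported later in Tables V and VI.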
B. Experimental Datasets

In the experiments, we use a total of twelve tensor datasets, where nine of them (Yale32×32, Yale64×64, ORL32×32, ORL64×64, C05, C07, C09, C27, and C29) are second-order face recognition datasets obtained from http://www.zjucadcg.cn/dengcai/Data/FaceData.html, and the others (USFGait17_32×22×10, USFGait17_64×44×20, and USFGait17_128×88×20) are third-order gait recognition datasets obtained from https://sites.google.com/site/tensormsl/. The detailed information about these twelve datasets is listed in Table III, in which the data source of each dataset is also given in the first column. To better understand the tensor structures of the experimental data, we illustrate one example for each database, as shown in Figs. 1 and 2.

Fig. 1. Second-order face recognition datasets. (a) Yale64×64 samples. (b) ORL64×64 samples. (c) C05 samples.

Fig. 2. Gait silhouette sequence for third-order gait recognition datasets.

TABLE III
DETAILED INFORMATION OF EXPERIMENTAL DATASETS

Data Source (Database) | Datasets | Number of Samples | Number of Classes | Size
Yale-B | Yale32×32 | 165 | 15 | 32×32
Yale-B | Yale64×64 | 165 | 15 | 64×64
ORL | ORL32×32 | 400 | 40 | 32×32
ORL | ORL64×64 | 400 | 40 | 64×64
CMU PIE | C05 | 3332 | 68 | 64×64
CMU PIE | C07 | 1629 | 68 | 64×64
CMU PIE | C09 | 1632 | 68 | 64×64
CMU PIE | C27 | 3329 | 68 | 64×64
CMU PIE | C29 | 1632 | 68 | 64×64
USF HumanID | USFGait17_32×22×10 | 731 | 71 | 32×22×10
USF HumanID | USFGait17_64×44×20 | 731 | 71 | 64×44×20
USF HumanID | USFGait17_128×88×20 | 731 | 71 | 128×88×20

As a preprocessing step, we scale each attribute to the range [0, 1] in order to facilitate a fair comparison. We divide each dataset randomly into ten parts of approximately equal size while keeping the proportion of samples in each class, use nine parts as the training set, and take the remaining part as the test set. All the learning machines use the same training set and test set. Using the aforementioned ALS algorithm, we decompose each tensor into R rank-1 tensors for SHTM, and reshape each tensor into a vector for C-SVM.

C. Parameter Settings

All three methods select the optimal trade-off parameter from $C \in \{2^0, 2^1, 2^2, \ldots, 2^9\}$. Considering the fact that there is no known closed-form solution to determine the rank R of a tensor a priori [59], and rank determination of a tensor is still an open problem [60], [61], in SHTM we use grid search to determine the optimal rank and the optimal trade-off parameter together, where the rank $R \in \{3, 4, 5, 6, 7, 8\}$. The influence of different rank parameters on the classification performance of SHTM is also given in our experiments. In C-SVM, the optimal width of the RBF kernel is selected from $\sigma \in \{2^{-4}, 2^{-3}, 2^{-2}, \ldots, 2^9\}$. In STM, the threshold parameter ε is set to $10^{-3}$. To obtain an unbiased statistical result, all the optimal parameters are searched using ten-fold cross validation. In addition, the significance level δ of the Hochberg step-up method for the F-test is set to 0.05.

All the programs are written in C++ and compiled using the Microsoft Visual Studio 2008 compiler. All the experiments are conducted on a computer with an Intel(R) Core(TM)2 1.8 GHz processor and 3.5 GB RAM running Microsoft Windows XP.
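The joint search over R and C described above can be sketched as follows (our own illustration; it reuses the hypothetical cp_als, fit_shtm_like and predict_shtm_like helpers from the earlier sketches and scikit-learn's StratifiedKFold for the ten stratified folds, none of which comes from the paper):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def grid_search_shtm(tensors, labels, ranks=(3, 4, 5, 6, 7, 8),
                     Cs=tuple(2.0 ** k for k in range(10)), n_folds=10, seed=0):
    """Pick (R, C) by ten-fold cross-validated accuracy.

    Relies on cp_als, fit_shtm_like and predict_shtm_like from the earlier
    sketches (hypothetical helpers, not functions from the paper)."""
    labels = np.asarray(labels)
    skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)
    best = (None, None, -1.0)
    for R in ranks:
        # the decomposition depends only on R, so reuse it across folds and C values
        factors = [cp_als(X, R) for X in tensors]
        for C in Cs:
            accs = []
            for tr, va in skf.split(np.zeros(len(labels)), labels):
                clf = fit_shtm_like([factors[i] for i in tr], labels[tr], C=C)
                pred = predict_shtm_like(clf, [factors[i] for i in tr],
                                         [factors[i] for i in va])
                accs.append(np.mean(pred == labels[va]))
            score = float(np.mean(accs))
            if score > best[2]:
                best = (R, C, score)
    return best   # (best R, best C, cross-validated accuracy)
```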
D. Classification Performance

In this section, we first conduct the experiments on the twelve datasets to compare the performance of SHTM with those of C-SVM and STM. To conduct a fair comparison, the OAO-SVM strategy is also used for multi-classification in C-SVM and STM.

TABLE IV
COMPARISON OF THE RESULTS OF SHTM, C-SVM AND STM ON TWELVE EXPERIMENTAL DATASETS

Datasets | Learning Machines | R | C | σ | Test Accuracy (%) | Training Time (Seconds)
Yale32×32 | C-SVM | – | 128 | 16 | 77.33 | 0.642
Yale32×32 | STM | – | 2 | – | 74.00 | 1.383
Yale32×32 | SHTM | 4 | 16 | – | 79.00 | 0.078
Yale64×64 | C-SVM | – | 256 | 32 | 84.33 | 1.708
Yale64×64 | STM | – | 128 | – | 82.33 | 6.466
Yale64×64 | SHTM | 8 | 512 | – | 85.33 | 0.544
ORL32×32 | C-SVM | – | 512 | 8 | 97.75 | 5.311
ORL32×32 | STM | – | 512 | – | 97.00 | 7.314
ORL32×32 | SHTM | 3 | 1 | – | 98.00 | 0.413
ORL64×64 | C-SVM | – | 512 | 32 | 97.75 | 17.997
ORL64×64 | STM | – | 8 | – | 96.50 | 34.299
ORL64×64 | SHTM | 6 | 256 | – | 98.50 | 3.208
C05 | C-SVM | – | 64 | 8 | 98.59 | 2398.530
C05 | STM | – | 1 | – | 98.06 | 3129.298
C05 | SHTM | 7 | 8 | – | 98.76 | 203.475
C07 | C-SVM | – | 512 | 32 | 96.47 | 324.912
C07 | STM | – | 1 | – | 95.44 | 648.103
C07 | SHTM | 5 | 1 | – | 96.74 | 34.158
C09 | C-SVM | – | 512 | 8 | 97.40 | 584.664
C09 | STM | – | 64 | – | 96.23 | 655.519
C09 | SHTM | 6 | 1 | – | 97.45 | 49.128
C27 | C-SVM | – | 128 | 32 | 96.69 | 348.773
C27 | STM | – | 2 | – | 95.10 | 653.308
C27 | SHTM | 7 | 2 | – | 96.72 | 68.924
C29 | C-SVM | – | 128 | 32 | 96.62 | 298.991
C29 | STM | – | 64 | – | 94.75 | 631.321
C29 | SHTM | 8 | 2 | – | 96.64 | 90.223
USFGait17_32×22×10 | C-SVM | – | 512 | 64 | 76.39 | 265.730
USFGait17_32×22×10 | STM | – | 32 | – | 78.79 | 834.333
USFGait17_32×22×10 | SHTM | 8 | 32 | – | 79.60 | 19.294
USFGait17_64×44×20 | C-SVM | – | 512 | 512 | 77.53 | 2896.670
USFGait17_64×44×20 | STM | – | – | – | – | –
USFGait17_64×44×20 | SHTM | 7 | 8 | – | 81.55 | 28.980
USFGait17_128×88×20 | C-SVM | – | 256 | 512 | 77.53 | 8940.456
USFGait17_128×88×20 | STM | – | – | – | – | –
USFGait17_128×88×20 | SHTM | 7 | 128 | – | 82.60 | 55.298

Fig. 3. Test accuracy versus R on (a) Yale32×32, Yale64×64, (b) ORL32×32, ORL64×64, (c) C05, C09, (d) USFGait17_32×22×10, and USFGait17_128×88×20, where the red triangles indicate the peak positions.

Table IV shows the experimental results for the twelve datasets, including test accuracy, training time, and the corresponding optimal parameters. Test accuracy and training time are the averages over 10 trials, respectively. The best test accuracy and training time among these three learning machines are highlighted in bold type. Note that STM fails to find the optimal parameter values on USFGait17_64×44×20 and USFGait17_128×88×20
for more than two weeks, and the corresponding results are denoted by "–".

Next, using test accuracy and training time, we conduct the Hochberg step-up method of the F-test for further comparison of SHTM vs. C-SVM and STM. The sorted results and the corresponding values $\bar{p}$ and $\delta/(K-s+1)$ on test accuracy and training time are reported in Table V and Table VI, respectively.

Fig. 4. Training time versus R on (a) Yale32×32, Yale64×64, (b) ORL32×32, ORL64×64, (c) C05, C09, (d) USFGait17_32×22×10, and USFGait17_128×88×20, where the red diamonds indicate the optimal values.

TABLE V
SORTED RESULTS OF SIGNIFICANCE OF SHTM VERSUS C-SVM AND STM ON TEST ACCURACY

s.no. | Learning Machines | $Z = (R_s - R_{SHTM})/\sqrt{(K+1)(K+2)/(6Q)}$ | $\bar{p}$ | $\delta/(K-s+1)$
1 | STM | (2.917-1)/0.408 = 4.699 | 0.000 | 0.025
2 | C-SVM | (2.083-1)/0.408 = 2.654 | 0.008 | 0.05

TABLE VI
SORTED RESULTS OF SIGNIFICANCE OF SHTM VERSUS C-SVM AND STM ON TRAINING TIME

s.no. | Learning Machines | $Z = (R_s - R_{SHTM})/\sqrt{(K+1)(K+2)/(6Q)}$ | $\bar{p}$ | $\delta/(K-s+1)$
1 | STM | (3-1)/0.408 = 4.902 | 0.000 | 0.025
2 | C-SVM | (2-1)/0.408 = 2.451 | 0.014 | 0.05

From Table IV, we have the following observations:
1) In terms of test accuracy, STM outperforms C-SVM only on one dataset, while SHTM outperforms C-SVM and STM on all twelve datasets, especially for the third-order tensor datasets. For example, the test accuracy of
SHTM on USFGait17_64×44×20 is 4% higher than that of C-SVM and 5% higher on USFGait17_128×88×20.
2) In terms of training time, C-SVM is faster than STM on all the datasets, while SHTM is significantly faster than C-SVM on all the datasets, particularly for the third-order tensor datasets. For example, the training speed of SHTM on USFGait17_128×88×20 is about 160 times faster than that of C-SVM.

From Table V, we know that for the last hypothesis, 0.008 < 0.05. Therefore, we should reject this null hypothesis, which indicates that SHTM is significantly better than C-SVM and STM in terms of test accuracy. Table VI shows the same result as Table V.
So far we have compared all experimental results. The comparison of test accuracy and training time for SHTM, C-SVM and STM demonstrates that SHTM is significantly more efficient and effective for tensor classification. Note that, in this study, we adopt the ALS decomposition method for SHTM; other decomposition methods can also be used.

E. Parameter Sensitivity

Although the optimal rank parameter R and the optimal trade-off parameter C are found by a grid search in SHTM, it is still interesting to see the sensitivity of SHTM to the rank parameter R. For this purpose, we demonstrate a sensitivity study over different R in this section, where the optimal trade-off parameter is still selected from $C \in \{2^0, 2^1, 2^2, \cdots, 2^9\}$. Here, we consider two examples from each database. Figs. 3 and 4 demonstrate the variation in test accuracy and training time over different R on eight datasets, respectively. From Fig. 3, we can see that the rank parameter R has a significant effect on the test accuracy and that the optimal value of R depends on the data. Fig. 4 shows that the efficiency of SHTM is reduced when R is increased. This is a very natural result, because a higher value of R implies that more items are included in the inner product computations. Taken together, we can observe that the optimal value of R (marked by a red diamond in Fig. 4) lies in the range 2 ≤ R ≤ 8 except on USFGait17_128×88×20, which may provide good guidance for the selection of the optimal value of R.

In summary, the parameter sensitivity study indicates that the classification performance of SHTM relies on the parameter R and that it is difficult to specify an optimal value for R in advance. However, in most cases the optimal value of R lies in the range 2 ≤ R ≤ 8, and it is not time-consuming to find it using the grid search strategy in practical applications.

V. CONCLUSION

In this paper, a novel linear support higher-order tensor machine (SHTM) has been presented for classification. In SHTM, the linear C-SVM model is reformulated from a multilinear algebra viewpoint, which shows a link between C-SVM and STM. Furthermore, the model uses the more compact R rank-one tensors as input instead of the original tensors, which gives SHTM strong capabilities for capturing essential information from tensor objects while saving storage space and computational time. The experiments have been conducted on nine second-order face recognition datasets and three third-order gait recognition datasets to test the performance of SHTM. The results show that SHTM is more effective and efficient for tensor classification than C-SVM and STM, especially in the case of higher-order tensors. In future work, we will investigate reconstruction techniques for tensor data, so that SHTM can handle high-dimensional vector data more effectively. Another interesting topic would be to design tensor kernels for SHTM so as to generalize it to the nonlinear case. Further study on this topic will also include many applications of SHTM in real-world classification with tensor representations.
R EFERENCES [1] V. Vapnik, The Nature of Statistical Learning Theory. New York, NY, USA: Springer-Verlag, 1995. [2] D. Gavrila, “The visual analysis of human movement: A survey,” Comput. Vis. Image Understand., vol. 73, no. 1, pp. 82–92, 1999. [3] D. Ramanan and D. Forsyth, “Automatic annotation of everyday movements,” in Advances in Neural Information Processing Systems. Cambridge, MA, USA: MIT Press, 2003, pp. 1–8. [4] Z. Li and X. Tang, “Bayesian face recognition using support vector machine and face clustering,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., Jun.–Jul. 2004, pp. 259–265. [5] J. Liu and M. Shah, “Learning human actions via information maximization,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., Jun. 2008, pp. 1–8. [6] J. Li, N. Allinsion, D. Tao, and X. Li, “Multitraining support vector machine for image retrieval,” IEEE Trans. Image Process., vol. 15, no. 11, pp. 3597–3601, Nov. 2006. [7] F. Bovolo, L. Bruzzone, and L. Carlin, “A novel technique for subpixel image classification based on support vector machine,” IEEE Trans. Image Process., vol. 19, no. 11, pp. 2983–2999, Nov. 2010. [8] M. Turk and A. Pentland, “Face recognition using eigenfaces,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., Jun. 1991, pp. 586–591. [9] M. Felsberg, “Low-level image processing with the structure multivector,” Ph.D. dissertation, Inst. Computer Science and Applied Mathematics, Christian-Albrechts-Univ. Kiel, Kiel, Germany, 2002. [10] K. Plataniotis and A. Venetsanopoulos, Color Image Processing and Applications, Berlin, Germany: Springer-Verlag, 2000. [11] R. Green and L. Guan, “Quantifying and recognizing human movement patterns from monocular video images-part II: Applications to biometrics,” IEEE Trans. Circuits Syst. Video Technol., vol. 14, no. 2, pp. 191–198, Feb. 2004. [12] R. Chellappa, A. Roy-Chowdhury, and S. Zhou, Recognition of Humans and their Activities Using Video. San Rafael, CA, USA: Morgan & Claypool, 2005. [13] P. Negi and D. Labate, “3-D discrete shearlet transform and video processing,” IEEE Trans. Image Process., vol. 21, no. 6, pp. 2944–2954, Jun. 2012. [14] S. Sarkar, P. Phillips, Z. Liu, I. Vega, P. Grother, and K. Bowyer, “The humanID gait challenge problem: Data sets, performance, and analysis,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 2, pp. 162–177, Feb. 2005. [15] H. Lu, K. Plataniotis, and A. Venetsanopoulos, “MPCA: Multilinear principal component analysis of tensor objects,” IEEE Trans. Neural Netw., vol. 19, no. 1, pp. 18–39, Jan. 2008. [16] N. Renard and S. Bourennane, “Dimensionality reduction based on tensor modeling for classification methods,” IEEE Trans. Geosci. Remote Sens., vol. 47, no. 4, pp. 1123–1131, Apr. 2009. [17] M. Kim, J. Jeon, J. Kwak, M. Lee, and C. Ahn, “Moving object segmentation in video sequences by user interaction and automatic object tracking,” Image Vis. Comput., vol. 19, no. 5, pp. 245–260, 2001. [18] H. Wang and N. Ahuja, “Compact representation of multidimensional data using tensor rank-one decomposition,” in Proc. 17th Int. Conf. Pattern Recognit., vol. 1. 2004, pp. 44–47. [19] X. Geng, K. Smith-Miles, Z.-H. Zhou, and L. Wang, “Face image modeling by multilinear subspace analysis with missing values,” IEEE Trans. Syst., Man, Cybern B, Cybern., vol. 41, no. 3, pp. 881–892, Jun. 2011. [20] J. Ye, R. Janardan, and Q. Li, “GPCA: An efficient dimension reduction scheme for image compression and retrieval,” in Proc. 10th ACM SIGKDD Int. Conf. Knowl. 
Discovery Data Mining, 2004, pp. 354–363. [21] H. Lu, K. Plataniotis, and A. Venetsanopoulos, “Uncorrelated multilinear discriminant analysis with regularization and aggregation for tensor object recognition,” IEEE Trans. Neural Netw., vol. 20, no. 1, pp. 103–123, Jan. 2009. [22] S. Yan, D. Xu, Q. Yang, L. Zhang, X. Tang, and H.-J. Zhang, “Multilinear discriminant analysis for face recognition,” IEEE Trans. Image Process., vol. 16, no. 1, pp. 212–220, Jan. 2007. [23] X. Zhang, X. Lu, Q. Shi, X.-Q. Xu, H.-C. Leung, L. Harris, J. Iglehart, A. Miron, J. Liu, and W. Wong, “Recursive SVM feature selection and sample classification for mass-spectrometry and microarray data,” BMC Bioinformat., vol. 7, p. 197, Apr. 2006. [24] T. Cooke, “Two variations on Fisher’s linear discriminant for pattern recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 2, pp. 268–273, Feb. 2002.
[25] P. Hall, J. Marron, and A. Neeman, “Geometric representation of high dimension, low sample size data,” J. Royal Stat. Soc. Ser. B, vol. 67, no. 18, pp. 427–444, Jun. 2005. [26] D. Tao, X. Tang, X. Li, and X. Wu, “Asymmetric bagging and random subspace for support vector machines-based relevance feedback in image retrieval,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 7, pp. 1088–1099, Jul. 2006. [27] A. Shashua and A. Levin, “Linear image coding for regression and classification using the tensor-rank principle,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., vol. 1. Dec. 2001, pp. 42–49. [28] T. Hazan, S. Polak, and A. Shashua, “Sparse image coding using a 3D non-negative tensor factorization,” in Proc. 10th IEEE Int. Conf. Comput. Vis., vol. 1. Oct. 2005, pp. 50–57. [29] T. Kolda and B. Bader, “Tensor decompositions and applications,” SIAM Rev., vol. 51, no. 3, pp. 455–500, 2009. [30] M. Liu, N. Yu, and W. Li, “Camera model identification for JPEG images via tensor analysis,” in Proc. 6th Int. Conf. Intell. Inf. Hiding Multimedia Signal Process., 2010, pp. 462–465. [31] S. Bourennane, C. Fossati, and A. Cailly, “Improvement of classification for hyperspectral images based on tensor modeling,” IEEE Geosci. Remote Sens. Lett., vol. 7, no. 4, pp. 801–805, Oct. 2010. [32] S. Yan, D. Xu, Q. Yang, L. Zhang, X. Tang, and H.-J. Zhang, “Discriminant analysis with tensor representation,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., Jun. 2005, pp. 526–532. [33] Y. Fu and T. Huang, “Image classification using correlation tensor analysis,” IEEE Trans. Image Process., vol. 17, no. 2, pp. 226–234, Feb. 2008. [34] H. Lu, K. Plataniotis, and A. Venetsanopoulos, “A survey of multilinear subspace learning for tensor data,” Pattern Recogn., vol. 44, no. 7, pp. 1540–1551, 2011. [35] D. Tao, X. Li, W. Hu, S. Maybank, and X. Wu, “Supervised tensor learning,” in Proc. 5th IEEE Int. Conf. Data Mining, Houston, TX, USA, Nov. 2005, pp. 450–457. [36] D. Cai, X. He, and J. Han, “Learning with tensor representation,” Dept. Comput. Sci., Univ. Illinois at Urbana-Champaign, Urbana, IL, USA, Tech. Rep. UIUCDCS-R-2006-2716, Apr. 2006. [37] D. Tao, X. Li, X. Wu, W. Hu, and S. Maybank, “Supervised tensor learning,” Knowl. Inf. Syst., vol. 13, no. 1, pp. 1–42, 2007. [38] Y. Liu, F. Wu, Y. Zhuang, and J. Xiao, “Active post-refined multimodality video semantic concept detection with tensor representation,” in Proc. ACM Conf. Multimedia, 2008, pp. 91–100. [39] I. Kotsia and I. Patras, “Support tucker machines,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., Jun. 2001, pp. 633–640. [40] L. Wolf, H. Jhuang, and T. Hazan, “Modeling appearances with lowrank SVM,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., Jun. 2007, pp. 1–6. [41] H. Pirsiavash, D. Ramanan, and C. Fowlkes, “Bilinear classifiers for visual recognition,” in Advances in Neural Information Processing Systems. Cambridge, MA, USA: MIT Press, 2009, pp. 1482–1490. [42] C. Cortes and V. Vapnik, “Support vector networks,” Mach. Learn., vol. 20, no. 3, pp. 273–297, 1995. [43] B. Schölkopf, A. Smola, R. Williamson, and P. Bartlett, “New support vector algorithms,” Neural Comput., vol. 12, no. 5, pp. 1207–1245, 2000. [44] J. Suykens and J. Vandewalle, “Least squares support vector machine classifiers,” Neural Process. Lett., vol. 9, no. 3, pp. 293–300, 1999. [45] P. Savicky and J. Vomlel, “Exploiting tensor rank-one decomposition in probabilistic inference,” Kybernetika, vol. 43, no. 
5, pp. 747–764, 2007. [46] H. Wang and N. Ahuja, “Rank-R approximation of tensors using imageas-matrix representation,” in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., Jun. 2005, pp. 346–353. [47] L. Lathauwer, “Signal processing based on multilinear algebra,” Ph.D. dissertation, Dept. Elektrotechniek, Katholieke Univ., Leuven, Belgium, Sep. 1997. [48] H. Lu, K. Plataniotis, and A. Venetsanopoulos, “A taxonomy of emerging multilinear discriminant analysis solutions for biometric signal recognition,” in Biometrics: Theory, Methods, and Applications, N. V. Boulgouris, K. Plataniotis, and E. Micheli-Tzanakou, Eds. New York, NY, USA: Wiley, 2009. [49] S. Goreinov, E. Tyrtyshnikov, and N. Zamarashkin. “A theory of pseudoskeleton approximations,” Linear Algebra Appl., vol. 261, nos. 1–3, pp. 1–21, 1997. [50] T. Zhang and G. Golub, “Rank-one approximation to high order tensors,” SIAM J. Matrix Anal. Appl., vol. 23, no. 2, pp. 534–550, 2001. [51] S. Keerthi, S. Shevade, C. Bhattacharyya, and K. Murthy, “Improvement to Platt’s SMO algorithm for SVM classifier design,” Neural Comput., vol. 13, no. 3, pp. 637–649, 2001.
[52] U. Kreßel, “Pairwise classification and support vector machines,” in Advances in Kernel Methods: Support Vector Learning. Cambridge MA, USA: MIT Press, 1999, pp. 255–268. [53] C.-T. Chu, S. Kim, Y.-A. Lin, Y. Yu, G. Bradski, A. Ng, and K. Olukotun, “Map-reduce for machine learning on multicore,” in Advances in Neural Information Processing Systems. Cambridge MA, USA: MIT Press, 2006, pp. 281–288. [54] P. Kroonenberg and J. Leeuw, “Principal component analysis of threemode data by means of alternating least squares algorithms,” Psychometrika, vol. 45, no. 1, pp. 69–97, Mar. 1980. [55] P. Clarkson and P. Moreno, “On the use of support vector machines for phonetic classification,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., Mar. 1999, pp. 585–588. [56] S. Maji, A. Berg, and J. Malik, “Efficient classification for additive kernel SVMs,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 1, pp. 66–77, Jan. 2013. [57] J. Demšar, “Statistical comparisons of classifiers over multiple data sets,” J. Mach. Learn. Res., vol. 7, pp. 1–30, Jan. 2006. [58] Y. Hochberg, “A sharper Bonferroni procedure for multiple tests of significance,” Biometrika, vol. 75, no. 4, pp. 800–802, 1988. [59] M. Kilmer and C. Martin, “Factorization strategies for third-order tensors,” Linear Algebra Appl., vol. 435, no. 3, pp. 641–658, Aug. 2011. [60] V. Silva and L.-H. Lim, “Tensor rank and the ill-posedness of the best low-rank approximation problem,” SIAM J. Matrix Anal. Appl., vol. 30, no. 3, pp. 1084–1127, 2008. [61] C. Martin, “The rank of a 2×2×2 tensor,” Linear and Multilinear Algebra, vol. 59, no. 8, pp. 943–950, 2011. Zhifeng Hao received the B.S. degree from Sun YatSen University, Guangzhou, China, and the Ph.D. degree in mathematics from Nanjing University, Nanjing, China, in 1990, and 1995, respectively. He is currently a Professor with the Faculty of Computer Science, Guangdong University of Technology and School of Computer Science and Engineering, South China University of Technology, Guangzhou. His current research interests include algebra, machine learning, data mining, and evolutionary algorithms. Lifang He (S’12) received the B.S. degree in information and computer science from Northwest Normal University, Lanzhou, China, in 2009. She is currently pursuing the Ph.D. degree with the School of Computer Science and Engineering, South China University of Technology, Guangzhou, China. Her current research interests include machine learning, pattern recognition and their applications in the field of computer vision.
Bingqian Chen received the B.S. degree in mathematics and applied mathematics from the South China University of Technology, Guangzhou, China, in 2012, where she is currently pursuing the graduate degree with the Department of Mathematics. Her current research interests include tensor learning, machine learning, and pattern recognition and their applications in the field of computer vision.
Xiaowei Yang received the B.S. degree in theoretical and applied mechanics, the M.Sc. degree in computational mechanics, and the Ph.D. degree in solid mechanics from Jilin University, Changchun, China, in 1991, 1996, and 2000, respectively. He is currently a Professor with the Department of Mathematics, South China University of Technology, Guangzhou, China. He has authored more than 80 journal and refereed international conference articles. His current research interests include the design and analysis of algorithms for large-scale pattern recognition, imbalanced learning, semi-supervised learning, and evolutionary computation.