Neural Comput & Applic (2014) 25:975–982 DOI 10.1007/s00521-013-1524-6

REVIEW

An overview on nonparallel hyperplane support vector machine algorithms

Shifei Ding · Xiaopeng Hua · Junzhao Yu

Received: 20 August 2013 / Accepted: 26 November 2013 / Published online: 15 December 2013
© Springer-Verlag London 2013

Abstract Support vector machine (SVM) has attracted substantial interest in the machine learning community. As extensions of SVM, nonparallel hyperplane SVM (NHSVM) classification algorithms have become a research hot spot in machine learning during the last few years. For binary classification tasks, the idea of NHSVM algorithms is to find a hyperplane for each class, such that each hyperplane is proximal to the data points of its own class and far from the data points of the other class. Compared with the classical SVM, NHSVM algorithms have lower computational complexity, work better on XOR problems and can achieve better generalization performance. This paper reviews three representative NHSVM algorithms, namely generalized eigenvalue proximal SVM (GEPSVM), twin SVM (TWSVM) and projection twin SVM (PTSVM), and describes the research progress on each. The aim of this overview is to provide an insightful organization of current developments of NHSVM algorithms, identify their limitations and give suggestions for further research.

S. Ding (✉) · X. Hua · J. Yu
School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China
e-mail: [email protected]

S. Ding
Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China

X. Hua
School of Information Engineering, Yancheng Institute of Technology, Yancheng 224001, China

Keywords Support vector machine · Generalized eigenvalue proximal support vector machine · Twin support vector machine · Projection twin support vector machine

1 Introduction

In the past decade, support vector machine (SVM) [4, 7] has become a hot topic of research in machine learning. Based on the structural risk minimization principle, the classical SVM finds the maximum margin between two classes of data points by solving a quadratic programming problem (QPP) in the dual space [42, 43]. Within a few years of its introduction, SVM had already outperformed most other systems in a wide variety of applications [16, 19, 20, 46]. However, one of the main challenges in the classical SVM is the large computational cost of the QPP. The long training time of the QPP not only causes the classical SVM to take a long time to train on a large database, but also prevents it from locating the optimal parameter set from a very fine grid of parameters over a large span [31]. Another shortcoming of the classical SVM is that it usually pays more attention to the separation between classes than to the prior information within classes in the data. In fact, for different real-world problems, different classes may have different underlying prior information. Recently, nonparallel hyperplane support vector machine (NHSVM) classification algorithms, as extensions of the classical SVM, have become a research hot spot in the field of machine learning. The study of NHSVM classification algorithms originates from generalized eigenvalue proximal SVM (GEPSVM) proposed by Mangasarian and Wild [21], which is an extension of proximal SVM (PSVM) [9]. GEPSVM obtains each of the nonparallel hyperplanes by


computing the eigenvector corresponding to the smallest eigenvalue of a generalized eigenvalue problem, so that each hyperplane is as close as possible to the points of its own class and at the same time as far as possible from the points of the other class. Besides its lower computational complexity, this method also performs better on XOR problems in comparison with the classical SVM. During the last few years, GEPSVM has been extended into a family of novel NHSVM algorithms for solving data mining problems, which can be roughly divided into two categories: learning algorithms based on generalized eigenvalues, which improve the generalization capability or computational cost of GEPSVM, and SVM-like methods, which obtain two nonparallel hyperplanes by solving two smaller QPPs. In the second category, twin support vector machine (TWSVM) [14] and projection twin support vector machine (PTSVM) [1] are two representative algorithms. In the last few years, NHSVM classification algorithms have developed rapidly on the basis of the above three representative methods. In this paper, we provide an insightful organization of current developments of NHSVM algorithms, identify their limitations and give suggestions for further research. The remainder of this paper proceeds as follows. In Sect. 2, we review the theory and the ideas of three representative NHSVM algorithms: GEPSVM, TWSVM and PTSVM. Section 3 describes the research progress on them. Then, in Sect. 4, we list specific applications of NHSVM algorithms in recent years. Finally, we provide concluding remarks and discuss directions for future research.


2 Review of three representative NHSVM algorithms: GEPSVM, TWSVM and PTSVM

For binary data classification problems, NHSVM algorithms aim to find a hyperplane for each class, such that each hyperplane is proximal to the data points of one class and far from the data points of the other class. GEPSVM, TWSVM and PTSVM are three representative NHSVM algorithms, and all the other NHSVM methods are improved versions based on them. We therefore first introduce the ideas behind these three representative algorithms in this section.

2.1 Model of GEPSVM

Consider a binary classification problem of classifying $m_{1}$ training points belonging to class $+1$ and $m_{2}$ training points belonging to class $-1$ in the $n$-dimensional real space $R^{n}$. Let matrix $A\in R^{m_{1}\times n}$ represent the training points of class $+1$, and matrix $B\in R^{m_{2}\times n}$ represent the training points of class $-1$. The central idea of GEPSVM is to seek the following two nonparallel hyperplanes in the $n$-dimensional input space:

$$x^{T}w_{1}+b_{1}=0 \quad\text{and}\quad x^{T}w_{2}+b_{2}=0, \qquad (1)$$

where the superscript $T$ denotes transposition and $(w_{i},b_{i})\in R^{n}\times R$ ($i=1,2$) are the direction and bias of each hyperplane. The algorithm requires each hyperplane to be as close as possible to the points of its own class and as far as possible from the points of the other class. This leads to the following two optimization problems:

$$(\mathrm{GEPSVM1})\quad \min\ \frac{\|Aw_{1}+e_{1}b_{1}\|^{2}+\delta\,\|[w_{1};\,b_{1}]\|^{2}}{\|Bw_{1}+e_{2}b_{1}\|^{2}} \qquad (2)$$

and

$$(\mathrm{GEPSVM2})\quad \min\ \frac{\|Bw_{2}+e_{2}b_{2}\|^{2}+\delta\,\|[w_{2};\,b_{2}]\|^{2}}{\|Aw_{2}+e_{1}b_{2}\|^{2}}, \qquad (3)$$

where $\delta>0$ is a regularization constant, $\|\cdot\|$ denotes the 2-norm and $e_{i}$ is a column vector of ones of dimension $m_{i}$. The optimization problem (2) finds the first hyperplane by minimizing the distance to the points of class $+1$ while maximizing the distance from the points of class $-1$. Conversely, the optimization problem (3) seeks the hyperplane that is closest to the points of class $-1$ and furthest from the points of class $+1$. By making the definitions

$$G=[A\ \ e_{1}]^{T}[A\ \ e_{1}]+\delta I,\quad H=[B\ \ e_{2}]^{T}[B\ \ e_{2}],\quad z_{1}=\begin{bmatrix}w_{1}\\ b_{1}\end{bmatrix},$$
$$L=[B\ \ e_{2}]^{T}[B\ \ e_{2}]+\delta I,\quad M=[A\ \ e_{1}]^{T}[A\ \ e_{1}],\quad z_{2}=\begin{bmatrix}w_{2}\\ b_{2}\end{bmatrix}, \qquad (4)$$

the optimization problems (2) and (3) become

$$(\mathrm{GEPSVM1})\quad \min\ \frac{z_{1}^{T}Gz_{1}}{z_{1}^{T}Hz_{1}} \qquad (5)$$

and

$$(\mathrm{GEPSVM2})\quad \min\ \frac{z_{2}^{T}Lz_{2}}{z_{2}^{T}Mz_{2}}. \qquad (6)$$

With the help of the well-known properties of the Rayleigh quotient (RQ), the solutions of (5) and (6) can be obtained by solving the two generalized eigenvalue problems

$$Gz_{1}=\lambda_{1}Hz_{1}, \qquad (7)$$

$$Lz_{2}=\lambda_{2}Mz_{2}, \qquad (8)$$

where $z_{i}\neq 0$, $i=1,2$. Compared with the classical SVM, GEPSVM relaxes the universal requirement that bounding or proximal hyperplanes be parallel, whether in the input space for linear kernel classifiers or in the higher-dimensional feature space for nonlinear kernel classifiers. This makes GEPSVM work better than SVM on XOR problems in the linear classification case. In addition, each of the nonparallel proximal hyperplanes in GEPSVM is easily obtained with a single MATLAB command that solves the classical generalized eigenvalue problem. However, the generalization performance of GEPSVM still needs to be improved further. On the other hand, the constraints $z_{i}\neq 0$, $i=1,2$, require the matrices $H$ and $M$ to be nonsingular.
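As a concrete illustration, the following sketch (our own minimal example in Python with NumPy/SciPy, not code from [21]; the function names and the rule of assigning a point to the class of the nearer hyperplane are our additions, consistent with the proximal idea above) solves the generalized eigenvalue problems (7) and (8) directly:

```python
import numpy as np
from scipy.linalg import eig

def gepsvm_train(A, B, delta=1e-3):
    """Linear GEPSVM sketch: solve Eqs. (7) and (8) and return the two
    hyperplanes (w1, b1), (w2, b2) of Eq. (1)."""
    E = np.hstack([A, np.ones((A.shape[0], 1))])  # [A  e1]
    F = np.hstack([B, np.ones((B.shape[0], 1))])  # [B  e2]
    I = np.eye(E.shape[1])
    G, H = E.T @ E + delta * I, F.T @ F           # Eq. (4)
    L, M = F.T @ F + delta * I, E.T @ E

    def min_gen_eigvec(P, Q):
        # eigenvector of P z = lambda Q z belonging to the smallest eigenvalue
        vals, vecs = eig(P, Q)
        z = np.real(vecs[:, np.argmin(np.real(vals))])
        return z[:-1], z[-1]                      # split z = [w; b]

    return min_gen_eigvec(G, H), min_gen_eigvec(L, M)

def gepsvm_predict(x, planes):
    # assign x to the class whose hyperplane of Eq. (1) is nearer (our choice)
    (w1, b1), (w2, b2) = planes
    d1 = abs(x @ w1 + b1) / np.linalg.norm(w1)
    d2 = abs(x @ w2 + b2) / np.linalg.norm(w2)
    return 1 if d1 <= d2 else -1
```

On XOR-style data such as A = np.array([[0, 0], [1, 1]]) and B = np.array([[0, 1], [1, 0]]), each fitted plane passes through the points of its own class (x − y = 0 and x + y − 1 = 0), which is exactly why GEPSVM handles XOR where a single separating hyperplane fails.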


2.2 Model of TWSVM

In order to improve the generalization performance of GEPSVM, Jayadeva et al. [14] introduced a stand-alone NHSVM algorithm, called twin support vector machine (TWSVM). This algorithm seeks two nonparallel hyperplanes by solving two related SVM-type problems, each of which is smaller than the problem in the classical SVM. The formulation of TWSVM can be expressed as:

$$(\mathrm{TWSVM1})\quad \min\ \tfrac{1}{2}(Aw_{1}+e_{1}b_{1})^{T}(Aw_{1}+e_{1}b_{1})+c_{1}e_{2}^{T}\xi \quad \mathrm{s.t.}\ -(Bw_{1}+e_{2}b_{1})+\xi\geq e_{2},\ \ \xi\geq 0 \qquad (9)$$

and

$$(\mathrm{TWSVM2})\quad \min\ \tfrac{1}{2}(Bw_{2}+e_{2}b_{2})^{T}(Bw_{2}+e_{2}b_{2})+c_{2}e_{1}^{T}\eta \quad \mathrm{s.t.}\ (Aw_{2}+e_{1}b_{2})+\eta\geq e_{1},\ \ \eta\geq 0, \qquad (10)$$

where $c_{1}$ and $c_{2}$ are tradeoff constants, and $\xi$ and $\eta$ are nonnegative slack variables. The objective functions of (9) and (10) measure the squared distance from the points to the hyperplane and minimize it, ensuring that each hyperplane is as close as possible to its own class. The inequality constraints ensure that the distance from the points of the other class to the hyperplane is at least 1. Let $H=[A\ \ e_{1}]$, $G=[B\ \ e_{2}]$ and $v_{i}=[w_{i}^{T}\ \ b_{i}]^{T}$, $i=1,2$. The Wolfe duals of QPPs (9) and (10) are given by (11) and (12) in terms of the Lagrange multipliers $\alpha\in R^{m_{2}}$ and $\gamma\in R^{m_{1}}$, respectively:

$$(\mathrm{DTWSVM1})\quad \min\ \tfrac{1}{2}\alpha^{T}G(H^{T}H)^{-1}G^{T}\alpha-e_{2}^{T}\alpha \quad \mathrm{s.t.}\ 0\leq\alpha\leq c_{1}e_{2}, \qquad (11)$$

$$(\mathrm{DTWSVM2})\quad \min\ \tfrac{1}{2}\gamma^{T}H(G^{T}G)^{-1}H^{T}\gamma-e_{1}^{T}\gamma \quad \mathrm{s.t.}\ 0\leq\gamma\leq c_{2}e_{1}. \qquad (12)$$

After solving the QPPs (11) and (12), the two nonparallel hyperplanes of (1) are, respectively, produced by

$$v_{1}=\begin{bmatrix}w_{1}\\ b_{1}\end{bmatrix}=-(H^{T}H)^{-1}G^{T}\alpha \qquad (13)$$

and

$$v_{2}=\begin{bmatrix}w_{2}\\ b_{2}\end{bmatrix}=(G^{T}G)^{-1}H^{T}\gamma. \qquad (14)$$

In order to deal with the case when $H^{T}H$ or $G^{T}G$ is singular and to avoid possible ill-conditioning, the inverse matrices $(H^{T}H)^{-1}$ and $(G^{T}G)^{-1}$ are approximately replaced by $(H^{T}H+\varepsilon I)^{-1}$ and $(G^{T}G+\varepsilon I)^{-1}$, respectively [14], where $I$ is an identity matrix of appropriate dimensions and $\varepsilon$ is a small positive scalar chosen to preserve the structure of the data. From (11) and (12), we notice that only the constraints of the other class appear, implying that QPP (11) has $m_{2}$ parameters and QPP (12) has $m_{1}$ parameters, as opposed to $m=m_{1}+m_{2}$ parameters in the standard SVM. The strategy of solving a pair of smaller-sized QPPs instead of one large QPP, as in the classical SVM, makes the learning speed of TWSVM approximately four times faster than that of the classical SVM. In terms of generalization, TWSVM compares favorably with the classical SVM because it fully considers the prior information within classes in the data. However, the augmented vectors of TWSVM lose sparsity. In addition, in the classical SVM the structural risk is minimized, whereas in the primal problems of TWSVM only the empirical risk is minimized.
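The following sketch makes the training procedure concrete (again our own illustration rather than code from [14]; SciPy's generic box-constrained L-BFGS-B routine stands in for the dedicated QP solver a production implementation would use). It solves the duals (11) and (12) with the ε-regularized inverses described above and recovers the hyperplanes via (13) and (14):

```python
import numpy as np
from scipy.optimize import minimize

def solve_box_qp(Q, c, m):
    """min 0.5 a'Qa - e'a  s.t. 0 <= a <= c, via box-constrained L-BFGS-B."""
    e = np.ones(m)
    res = minimize(lambda a: 0.5 * a @ Q @ a - e @ a, np.zeros(m),
                   jac=lambda a: Q @ a - e,
                   bounds=[(0.0, c)] * m, method='L-BFGS-B')
    return res.x

def twsvm_train(A, B, c1=1.0, c2=1.0, eps=1e-5):
    """TWSVM sketch: solve duals (11), (12); recover planes via (13), (14)."""
    H = np.hstack([A, np.ones((A.shape[0], 1))])  # H = [A  e1]
    G = np.hstack([B, np.ones((B.shape[0], 1))])  # G = [B  e2]
    I = np.eye(H.shape[1])
    HtH_inv = np.linalg.inv(H.T @ H + eps * I)    # (H'H + eps I)^-1, see above
    GtG_inv = np.linalg.inv(G.T @ G + eps * I)
    alpha = solve_box_qp(G @ HtH_inv @ G.T, c1, B.shape[0])  # Eq. (11)
    gamma = solve_box_qp(H @ GtG_inv @ H.T, c2, A.shape[0])  # Eq. (12)
    v1 = -HtH_inv @ G.T @ alpha                   # Eq. (13): v1 = [w1; b1]
    v2 = GtG_inv @ H.T @ gamma                    # Eq. (14): v2 = [w2; b2]
    return (v1[:-1], v1[-1]), (v2[:-1], v2[-1])
```

Note how the two QPs have only m2 and m1 variables, respectively, which is the source of the speedup over a single SVM QPP with m = m1 + m2 variables.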

2.3 Model of PTSVM

Different from GEPSVM and TWSVM, the central idea of PTSVM is to find a projection axis for each class, such that the within-class variance of the projected samples of its own class is minimized, while the projected samples of the other class scatter away as far as possible. This leads to the following primal QPPs:

$$(\mathrm{PTSVM1})\quad \min\ \tfrac{1}{2}w_{1}^{T}S_{1}w_{1}+c_{1}e_{2}^{T}\xi \quad \mathrm{s.t.}\ -\Big(Bw_{1}-\tfrac{1}{m_{1}}e_{2}e_{1}^{T}Aw_{1}\Big)+\xi\geq e_{2},\ \ \xi\geq 0, \qquad (15)$$

and

$$(\mathrm{PTSVM2})\quad \min\ \tfrac{1}{2}w_{2}^{T}S_{2}w_{2}+c_{2}e_{1}^{T}\eta \quad \mathrm{s.t.}\ \Big(Aw_{2}-\tfrac{1}{m_{2}}e_{1}e_{2}^{T}Bw_{2}\Big)+\eta\geq e_{1},\ \ \eta\geq 0, \qquad (16)$$

where $S_{1}$ and $S_{2}$, corresponding to class $+1$ and class $-1$, respectively, are defined as follows:

$$S_{1}=\Big(A-\tfrac{1}{m_{1}}e_{1}e_{1}^{T}A\Big)^{T}\Big(A-\tfrac{1}{m_{1}}e_{1}e_{1}^{T}A\Big) \quad\text{and}\quad S_{2}=\Big(B-\tfrac{1}{m_{2}}e_{2}e_{2}^{T}B\Big)^{T}\Big(B-\tfrac{1}{m_{2}}e_{2}e_{2}^{T}B\Big). \qquad (17)$$

The first terms in the objective functions of (15) and (16) make the within-class variance of the projected data points of each class as small as possible; that is, the projected data points are clustered around their mean. The inequality constraints require the projected samples of the other class to be at a distance of at least 1 from the corresponding projected center. Through this analysis, it is not difficult to see that PTSVM, similar to GEPSVM and


TWSVM, also aims to seek a hyperplane for each class, such that each hyperplane is proximal to the data points of one class and far from the data points of the other one. Different from GEPSVM and TWSVM, the bias $b_{1}$ of the first hyperplane is the negative of the projected mean of class $+1$, and the bias $b_{2}$ of the second hyperplane is the negative of the projected mean of class $-1$. The Wolfe duals of QPPs (15) and (16) are given by (18) and (19) in terms of the Lagrange multipliers $\alpha\in R^{m_{2}}$ and $\beta\in R^{m_{1}}$, respectively:

$$(\mathrm{DPTSVM1})\quad \min\ \tfrac{1}{2}\alpha^{T}\Big(B-\tfrac{1}{m_{1}}e_{2}e_{1}^{T}A\Big)S_{1}^{-1}\Big(B-\tfrac{1}{m_{1}}e_{2}e_{1}^{T}A\Big)^{T}\alpha-e_{2}^{T}\alpha \quad \mathrm{s.t.}\ 0\leq\alpha\leq c_{1}e_{2}, \qquad (18)$$

and

$$(\mathrm{DPTSVM2})\quad \min\ \tfrac{1}{2}\beta^{T}\Big(A-\tfrac{1}{m_{2}}e_{1}e_{2}^{T}B\Big)S_{2}^{-1}\Big(A-\tfrac{1}{m_{2}}e_{1}e_{2}^{T}B\Big)^{T}\beta-e_{1}^{T}\beta \quad \mathrm{s.t.}\ 0\leq\beta\leq c_{2}e_{1}. \qquad (19)$$

After solving the QPPs (18) and (19), the projection axis of class $+1$ is given by

$$w_{1}=-S_{1}^{-1}\Big(B-\tfrac{1}{m_{1}}e_{2}e_{1}^{T}A\Big)^{T}\alpha \qquad (20)$$

and that of class $-1$ by

$$w_{2}=S_{2}^{-1}\Big(A-\tfrac{1}{m_{2}}e_{1}e_{2}^{T}B\Big)^{T}\beta. \qquad (21)$$

The biases $b_{1}$ and $b_{2}$ can then be defined as follows:

$$b_{1}=-\frac{1}{m_{1}}e_{1}^{T}Aw_{1} \quad\text{and}\quad b_{2}=-\frac{1}{m_{2}}e_{2}^{T}Bw_{2}. \qquad (22)$$

Similar to TWSVM, the strategy of solving a pair of smaller-sized QPPs also makes the learning speed of PTSVM approximately four times faster than that of the classical SVM. Different from TWSVM, which minimizes the distance from each hyperplane to its own class, PTSVM ensures that the projected data points of the same class are clustered as tightly as possible around their mean. This makes PTSVM work better than TWSVM on some complex XOR problems. However, PTSVM suffers from the same drawbacks as TWSVM.
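Analogously, a minimal PTSVM training sketch (our illustration, not code from [1]; it repeats the hedged box-constrained QP helper from the TWSVM sketch so the block is self-contained, and it adds a small ridge ε to S1 and S2 to keep them invertible, an assumption beyond Eq. (17)) computes the scatter matrices, solves the duals (18) and (19), and recovers the axes and biases via (20)–(22):

```python
import numpy as np
from scipy.optimize import minimize

def solve_box_qp(Q, c, m):
    # same box-constrained QP helper as in the TWSVM sketch above
    e = np.ones(m)
    res = minimize(lambda a: 0.5 * a @ Q @ a - e @ a, np.zeros(m),
                   jac=lambda a: Q @ a - e,
                   bounds=[(0.0, c)] * m, method='L-BFGS-B')
    return res.x

def ptsvm_train(A, B, c1=1.0, c2=1.0, eps=1e-5):
    """PTSVM sketch: duals (18), (19); axes (20), (21); biases (22)."""
    m1, m2, n = A.shape[0], B.shape[0], A.shape[1]
    Ac, Bc = A - A.mean(axis=0), B - B.mean(axis=0)
    S1 = Ac.T @ Ac + eps * np.eye(n)   # Eq. (17) plus small ridge (assumption)
    S2 = Bc.T @ Bc + eps * np.eye(n)
    F1 = B - A.mean(axis=0)            # B - (1/m1) e2 e1' A, via broadcasting
    F2 = A - B.mean(axis=0)            # A - (1/m2) e1 e2' B
    alpha = solve_box_qp(F1 @ np.linalg.solve(S1, F1.T), c1, m2)  # Eq. (18)
    beta = solve_box_qp(F2 @ np.linalg.solve(S2, F2.T), c2, m1)   # Eq. (19)
    w1 = -np.linalg.solve(S1, F1.T @ alpha)   # Eq. (20)
    w2 = np.linalg.solve(S2, F2.T @ beta)     # Eq. (21)
    b1 = -A.mean(axis=0) @ w1                 # Eq. (22)
    b2 = -B.mean(axis=0) @ w2
    return (w1, b1), (w2, b2)
```

The broadcast expressions for F1 and F2 exploit the fact that (1/m1) e2 e1' A simply stacks the class-(+1) mean row m2 times, so no explicit rank-one products are needed.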

3 Improvements on NHSVM algorithms

In recent years, many scholars have probed into NHSVM methods and proposed many improved algorithms based on GEPSVM, TWSVM and PTSVM, as shown in Fig. 1.

Fig. 1 Improvements on NHSVM algorithms. The figure organizes the improved methods as a tree: GEPSVM → ReGEC, MHPSVM, FMGEPSVM; TWSVM → classification (NPPC, STSVM, LCTSVM, v-TSVM, Lap-TSVM, TMSVM, UNH-MTSVM, BDTWSVM, R-TWSVM, S-TWSVM, DTTSVM, STPMSVM, RMCV-TWSVM, NMTWSVM, TWSVMLG) and regression (TSVR, PTSVR, RTSVR, ε-TSVR, L1-ε-TSVR, TLSTSVR); PTSVM → LSPTSVM, RPTSVM, NLSPTSVM, MPPTSVM.

3.1 Improvements on GEPSVM

Guarracino et al. [11] proposed the regularized general eigenvalue classifier (ReGEC) by introducing a new regularization technique into GEPSVM. The solution of ReGEC is attained by solving a single generalized eigenvalue problem rather than the two in GEPSVM. Experimental results show that ReGEC has classification accuracy comparable to GEPSVM and lower computational complexity. Yang and Chen [50] proposed proximal SVM based on prototypal multiclassification hyperplanes (MHPSVM) by introducing a new optimization criterion into GEPSVM. Compared with GEPSVM, MHPSVM can directly obtain multi-prototypal hyperplanes for multiple-class classification and achieves significantly higher classification performance. Jayadeva et al. [15] extended GEPSVM to fuzzy multicategory classification problems by using one-from-rest separation for each class and proposed fuzzy

multicategory GEPSVM (FMGEPSVM). FMGEPSVM requires the solution of M generalized eigenvalue problems, as against the solution of M quadratic programming problems in the case of the classical SVM. Experimental results demonstrate that FMGEPSVM provides the ability to use knowledge of the ambiguity associated with the data set, or a subset of patterns, to improve generalization.

3.2 Improvements on TWSVM

Many improvements on TWSVM have been surveyed by Ding et al. [6]. Here, we supply some further new improvements on TWSVM from the last few years, covering the two aspects of classification and regression.

3.2.1 Improvements on TWSVM for classification

Ghorai et al. [10] formulated a much simpler nonparallel plane proximal classifier (NPPC) that speeds up the training of TWSVM by introducing a technique used in the PSVM classifier. Khemchandani et al. [17] improved the efficiency of TWSVM by treating the kernel selection problem for TWSVM as an optimization problem over the convex set of finitely many basic kernels, formulated as an iterative alternating optimization problem. Peng [26] proposed a rapid sparse twin support vector machine (STSVM) classifier in primal space to improve the sparsity and robustness of TWSVM. Ye et al. [52] proposed a reduced algorithm termed localized twin SVM via convex minimization (LCTSVM), which effectively reduces the space complexity of TWSVM by constructing two nearest-neighbor graphs in the input space. Xu et al. [48] proposed a rough margin-based v-TSVM, which can avoid the overfitting problem to a certain extent by introducing rough set theory into TWSVM. A novel Lap-TSVM [2, 32] was presented for the semi-supervised classification problem, which exploits the geometry information of the marginal distribution embedded in unlabeled data to construct a more reasonable classifier. Peng and Xu [27] proposed a twin Mahalanobis distance-based SVM classifier, called TMSVM, in which two Mahalanobis distance-based kernels are constructed according to the covariance matrices of the two classes of data to optimize the nonparallel hyperplanes. Ye et al. [51] proposed a novel weighted twin support vector machine with local information (WLTSVM), which can mine as much underlying similarity information within samples as possible. Shao and Deng [36] considered unity norm constraints using the Euclidean norm, added a regularization term with the idea of maximizing some margin in TWSVM, and proposed a novel margin-based TWSVM with unity norm hyperplanes (UNH-MTSVM). Peng and Xu [28] presented a classifier called bi-density twin support vector machine (BDTWSVM) by introducing the relative density


degrees for all training points using the intra-class graph. Qi et al. [33] proposed a new robust TWSVM (R-TWSVM) via second-order cone programming formulations for classification, which can deal efficiently with data containing measurement noise. Qi et al. [34] designed a new structural TWSVM (S-TWSVM) by introducing within-class structural information into TWSVM. Shao et al. [35, 38, 39] proposed a decision tree twin support vector machine (DTTSVM) for multi-class classification by constructing a binary tree based on the best separating principle. Peng et al. [31] presented a structural twin parametric-margin support vector machine (STPMSVM) by introducing the prior structural information of the data into TPMSVM. Wang et al. [44, 45] increased the efficiency of TPMSVM by introducing a quadratic function and suggesting a genetic algorithm (GA)-based model selection for TPMSVM. To overcome the shortcoming that only the empirical risk minimization principle is considered in TWSVM, Peng and Xu [29] formulated a robust minimum class variance twin support vector machine (RMCV-TWSVM) by introducing a pair of uncertain class variance matrices into its objective functions. Peng and Xu [30] presented a novel TWSVM classifier, called norm-mixed TWSVM (NMTWSVM), which uses two groups of equality constraints in its pair of primal optimization problems and considers L1-norm losses for the slack variables in the equality constraints. Wang et al. [44, 45] proposed a new twin SVM algorithm based on local and global regularization (TWSVMLG), which combines the results of all local classifiers with a global smoothness regularizer.

3.2.2 Improvements on TWSVM for regression

The learning speed of classical support vector regression (SVR) is low, since it is constructed by minimizing a convex quadratic function subject to two groups of linear inequality constraints over all training samples [24]. Peng [24] extended TWSVM to regression problems and proposed twin support vector regression (TSVR). Experimental results show that TSVR is not only fast, but also shows good generalization performance. Peng [25] presented a primal TSVR (PTSVR) by introducing a quadratic function to approximate its loss function. Singh et al. [41] proposed the reduced TSVR (RTSVR), which uses the notion of rectangular kernels to obtain significant improvements in execution time over TSVR. Xu and Wang [47] proposed a weighted TSVR, in which samples at different positions are given different penalties. Shao et al. [35, 38, 39] proposed a new ε-TSVR based on TSVR by introducing a regularization term and the successive overrelaxation technique. Zhao et al. [54] and Huang et al. [13] presented twin least squares support vector regression (TLSTSVR), which owns faster


computational speed than TSVR. A novel feature selection approach, L1-norm ε-TSVR (L1-ε-TSVR), was proposed by Ye et al. [53] to investigate the determinants of cost-push inflation in China. Compared with L2-ε-TSVR, L1-ε-TSVR not only fits functions well, but can also perform feature ranking.

3.3 Improvements on PTSVM

In order to further enhance the performance of PTSVM, Shao et al. [37] proposed a least squares version of PTSVM, called least squares PTSVM (LSPTSVM). LSPTSVM works extremely fast compared with PTSVM because its solutions are attained by solving two systems of linear equations, whereas PTSVM needs to solve two QPPs. In addition, the regularization term ensures that the optimization problems in LSPTSVM are positive definite, which results in better generalization ability. Shao et al. [35, 38, 39] proposed a simple and reasonable variant of PTSVM from a theoretical point of view, called PTSVM with regularization term (RPTSVM). As a result, the possible singularity in RPTSVM is avoided, and the regularized risk principle is implemented. In addition, nonlinear classification, which is ignored in PTSVM, is also considered in RPTSVM. Ding and Hua [5] formulated a nonlinear version of LSPTSVM for binary nonlinear classification by introducing a nonlinear kernel into LSPTSVM. This formulation leads to a novel nonlinear algorithm, called nonlinear LSPTSVM (NLSPTSVM). Additionally, to promote its generalization capability, the recursive learning method used for further boosting the performance of PTSVM and LSPTSVM is also extended to the nonlinear case. Based on PTSVM, a novel matrix-pattern-based PTSVM (MPPTSVM) [12] was presented by introducing the matrix-pattern-based within-class scatter matrix into PTSVM and constructing matrix-pattern-based constraints. In contrast to PTSVM, MPPTSVM theoretically solves the singularity problem of the within-class scatter matrix in the small sample size case, efficiently reduces the space complexity of storing the within-class scatter matrices and weight vectors, and can directly operate on matrix patterns such as image data.
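To make the computational advantage of LSPTSVM concrete, the sketch below illustrates the least-squares mechanics under a simplifying assumption of ours: the inequality constraints of (15) and (16) are replaced by equality constraints with a squared-loss penalty, so that each projection axis comes from a single linear system rather than a QPP. The exact LSPTSVM formulation, including its regularization terms, is given in [37]; this is only an indicative sketch:

```python
import numpy as np

def lsptsvm_axes(A, B, c1=1.0, c2=1.0, eps=1e-5):
    """Least-squares sketch of the LSPTSVM idea: one linear system per
    class instead of one QPP (simplified; see [37] for the exact model)."""
    n = A.shape[1]
    Ac, Bc = A - A.mean(axis=0), B - B.mean(axis=0)
    S1 = Ac.T @ Ac + eps * np.eye(n)   # within-class scatters, Eq. (17)
    S2 = Bc.T @ Bc + eps * np.eye(n)
    F1 = B - A.mean(axis=0)            # B - (1/m1) e2 e1' A
    F2 = A - B.mean(axis=0)            # A - (1/m2) e1 e2' B
    e1 = np.ones(A.shape[0])
    e2 = np.ones(B.shape[0])
    # squared-loss versions of the constraints of (15) and (16) give
    # stationarity conditions that are plain linear systems (our derivation)
    w1 = np.linalg.solve(S1 + c1 * F1.T @ F1, -c1 * F1.T @ e2)
    w2 = np.linalg.solve(S2 + c2 * F2.T @ F2, c2 * F2.T @ e1)
    b1, b2 = -A.mean(axis=0) @ w1, -B.mean(axis=0) @ w2   # Eq. (22)
    return (w1, b1), (w2, b2)
```

Replacing the box-constrained QPs by two solves of n × n linear systems is exactly the trade that gives the least-squares variants their speed.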

4 Applications of NHSVM algorithms

Intrusion detection has become an important component of network security. Many intelligent intrusion detection models have been proposed, but their performance and efficiency are not satisfactory for real computer network systems. Ding et al. [8] extended these works by applying TWSVM to


intrusion detection. The experimental results indicate that the proposed TWSVM-based models are more efficient and have a higher detection rate than the conventional SVM-based model and other models. Speaker recognition is used to solve the problem of 'who is speaking.' It means extracting the characteristics of personal identity from the speech signal and then identifying the speaker, while the content of the speech is irrelevant. Currently, GMM and SVM are frequently combined for speaker recognition. Cong et al. [3] suggested a method for applying TWSVM to speaker recognition. This approach uses a GMM to extract the characteristic parameters as the input to the TWSVM model. It not only works with insufficient training samples, but can also identify large-scale sample sets with better robustness. Experimental results show that this method achieves better recognition performance in speaker recognition. Speech emotion recognition is used to solve the problem of 'how to speak.' Automatic emotion recognition (AER) has been an interesting, useful and researchable topic in the human–computer interaction field. Yang et al. [49] applied TWSVM to AER. Experiments show that more efficient and accurate results can be achieved by using TWSVM. Automatic text categorization (TC) is a supervised learning problem, which involves training a classifier with some labeled documents and then using the classifier to predict the labels of unlabeled documents. Kumar and Gopal [18] applied the least squares twin support vector machine (LSTSVM) to TC. The experimental results show that LSTSVM achieves better classification performance than other methods. Breast cancer is the second major cause of death from cancer among women. Mammographic screening is one of the most effective ways to detect breast cancer at an early stage and reduce mortality. A mass in a mammogram can be an indicator of breast cancer. Si and Jing [40] applied TWSVM to the automated detection of masses in digital mammograms. Experimental results demonstrate that the TWSVM-based system provides higher classification accuracy and computational speed than an SVM-based classifier. Classification of the surface electromyogram (sEMG) signal is important for various applications such as prosthetic control and human–computer interfaces. Surface EMG provides insight into the strength of muscle contraction, which can be used as a control signal for different applications. Naik et al. [23] presented a novel method for machine-learning-based classification of fractal features of sEMG using TWSVM. Experimental results indicate that an appropriately optimized TWSVM can accurately identify hand gestures and actions from sEMG using fractal features. Such a system overcomes the weakness of


previous sEMG-based hand gesture identification systems, where reliability was an issue. Human activity recognition is an important research branch in computer vision. Automatic analysis of ongoing activities in an unknown video is the main goal of human activity recognition (HAR). Mozafari et al. [22] proposed a new framework for HAR combining local space–time features and LSTSVM. Experimental results show that LSTSVM achieves a high accuracy rate with low time complexity. In the case of image recognition, Qi et al. [32] and Chen et al. [2] applied Lap-TSVM to handwritten symbol recognition, and Peng and Xu [27] and Qi et al. [33] applied TMSVM and R-TWSVM, respectively, to human face recognition. Although these NHSVM methods improve the accuracy significantly and at the same time decrease the time complexity dramatically, they can only operate directly on patterns represented by vectors; that is, before applying them, any nonvector pattern such as an image has first to be vectorized by some technique. However, some implicit structural or local contextual information may be lost in this transformation. Hua and Ding [12] proposed MPPTSVM and applied it directly to matrix-based image data rather than to vector patterns. Compared with similar algorithms, the advantages of MPPTSVM are obvious.

5 Conclusions and prospects

Compared with the classical SVM, NHSVM algorithms achieve higher training speed and classification accuracy by solving generalized eigenvalue problems or two smaller QPPs and by paying more attention to the prior information within classes in the data. Although many improved NHSVM algorithms have been proposed in the last few years, there are still many problems that need to be considered. Firstly, data usually become available gradually in many application fields; this fact requires data analysis systems to have the capability to learn information incrementally. However, the existing NHSVM algorithms cannot deal with tasks where the learning environment changes steadily or training samples become available one after another over time. Secondly, the strategy of solving pairs of small QPPs increases the training speed of NHSVM algorithms compared with that of the classical SVM, but the calculation of their augmented vectors results in the disappearance of sparsity. How to restore the sparsity of NHSVM algorithms is therefore still a topic worth studying. Finally, it is well known that one significant advantage of the classical SVM is its implementation of the structural risk minimization principle. However, only the empirical risk is considered in the existing NHSVMs. Although some improved NHSVM


algorithms state that the structural risk minimization principle can be realized by adding a regularization term, this kind of statement lacks a theoretical basis. In addition, the introduction of the regularization term changes the original idea of NHSVM algorithms.

Acknowledgments This work was supported in part by the National Key Basic Research and Development Program (973 Program) under Grant No. 2013CB329502, the National Natural Science Foundation of China under Grant No. 61379101 and the Natural Science Foundation of Jiangsu Province under Grant No. BK2011417.

References

1. Chen XB, Yang J, Ye QL, Liang J (2011) Recursive projection twin support vector machine via within-class variance minimization. Pattern Recogn 44(10–11):2643–2655
2. Chen WJ, Shao YH, Ning H (2013) Laplacian smooth twin support vector machine for semi-supervised classification. Int J Mach Learn Cyber. doi:10.1007/s13042-013-0183-3
3. Cong HH, Yang CF, Pu XR (2008) Efficient speaker recognition based on multi-class twin support vector machines and GMMs. In: 2008 IEEE conference on robotics, automation and mechatronics, pp 348–512
4. Cristianini N, Taylor JS (2004) An introduction to support vector machines and other kernel-based learning methods (trans: Li G, Wang M, Zeng H). Publishing House of Electronics Industry, Beijing
5. Ding SF, Hua XP (2013) Recursive least squares projection twin support vector machines for nonlinear classification. Neurocomputing. doi:10.1016/j.neucom.2013.02.046
6. Ding SF, Yu JZ, Qi BJ, Huang HJ (2012) An overview on twin support vector machines. Artif Intell Rev. doi:10.1007/s10462-012-9336-0
7. Ding SF, Qi BJ, Tan HY (2011) An overview on theory and algorithm of support vector machines. J Univ Electron Sci Technol China 40(1):2–10
8. Ding XJ, Zhang GL, Ke YZ, Ma BL, Li ZC (2008) High efficient intrusion detection methodology with twin support vector machines, pp 560–564. doi:10.1109/ISISE.2008.278
9. Fung G, Mangasarian OL (2001) Proximal support vector machine classifiers. In: Proceedings of the 7th ACM SIGKDD international conference on knowledge discovery and data mining, pp 77–86
10. Ghorai S, Mukherjee A, Dutta PK (2009) Nonparallel plane proximal classifier. Signal Process 89:510–522
11. Guarracino MR, Cifarelli C, Seref O, Pardalos PM (2007) A classification method based on generalized eigenvalue problems. Optim Method Softw 22(1):73–81
12. Hua XP, Ding SF (2012) Matrix pattern based projection twin support vector machines. Int J Digit Content Technol Appl 6(20):172–181
13. Huang HJ, Ding SF, Shi ZZ (2013) Primal least squares twin support vector regression. J Zhejiang Univ Sci C 14(9):722–732
14. Jayadeva, Khemchandani R, Chandra S (2007) Twin support vector machines for pattern classification. IEEE Trans Pattern Anal Mach Intell 29(5):905–910
15. Jayadeva, Khemchandani R, Chandra S (2007) Fuzzy multi-category proximal support vector classification via generalized eigenvalues. Soft Comput 11(7):685–769
16. Khan NM, Ksantini R, Ahmad IS, Boufama B (2012) A novel SVM plus NDA model for classification with an application to face recognition. Pattern Recogn 45(1):66–79


17. Khemchandani R, Jayadeva, Chandra S (2009) Optimal kernel selection in twin support vector machines. Optim Lett 3(1):77–88
18. Kumar MA, Gopal M (2009) Least squares twin support vector machines for pattern classification. Expert Syst Appl 36(4):7535–7543
19. Lin KB, Wang ZJ (2006) The method of fax receiver's name recognition based on SVM. Comput Eng Appl 42(7):156–158
20. Liu XL, Ding SF (2010) Appropriateness in applying SVMs to text classification. Comput Eng Sci 32(6):106–108
21. Mangasarian OL, Wild EW (2006) Multisurface proximal support vector machine classification via generalized eigenvalues. IEEE Trans Pattern Anal Mach Intell 28(1):69–74
22. Mozafari K, Nasiri JA, Charkari NM, Jalili S (2011) Action recognition by space-time features and least squares twin SVM. In: 2011 first international conference on informatics and computational intelligence, pp 287–292
23. Naik GR, Kumar DK, Jayadeva (2010) Twin SVM for gesture classification using the surface electromyogram. IEEE Trans Inf Technol Biomed 14(2):301–308
24. Peng XJ (2010) TSVR: an efficient twin support vector machine for regression. Neural Netw 23(3):365–372
25. Peng XJ (2010) Primal twin support vector regression and its sparse approximation. Neurocomputing 73(16–18):2846–2858
26. Peng XJ (2011) Building sparse twin support vector machine classifiers in primal space. Inf Sci 181:3967–3980
27. Peng XJ, Xu D (2012) Twin Mahalanobis distance-based support vector machines for pattern recognition. Inf Sci 200(10):22–37
28. Peng XJ, Xu D (2013) Bi-density twin support vector machines for pattern recognition. Neurocomputing 99:134–143
29. Peng XJ, Xu D (2013) Robust minimum class variance twin support vector machine classifier. Neural Comput Appl 22:999–1011
30. Peng XJ, Xu D (2013) Norm-mixed twin support vector machine classifier and its geometric algorithm. Neurocomputing 99:486–495
31. Peng XJ, Wang YF, Xu D (2013) Structural twin parametric-margin support vector machine for binary classification. Knowl-Based Syst 49:63–72
32. Qi ZQ, Tian YJ, Shi Y (2012) Laplacian twin support vector machine for semi-supervised classification. Neural Netw 35:46–53
33. Qi ZQ, Tian YJ, Shi Y (2013) Robust twin support vector machine for pattern classification. Pattern Recogn 46(1):305–316
34. Qi ZQ, Tian YJ, Shi Y (2013) Structural twin support vector machine for classification. Knowl-Based Syst 43:74–81
35. Shao YH, Chen WJ, Huang WB, Yang ZM, Deng NY (2013) The best separating decision tree twin support vector machine for multi-class classification. Procedia Comput Sci 17:1032–1038


36. Shao YH, Deng NY (2013) A novel margin-based twin support vector machine with unity norm hyperplanes. Neural Comput Appl 22(7–8):1627–1635
37. Shao YH, Deng NY, Yang ZM (2012) Least squares recursive projection twin support vector machine for classification. Pattern Recogn 45(6):2299–2307
38. Shao YH, Wang Z, Chen WJ, Deng NY (2013) A regularization for the projection twin support vector machine. Knowl-Based Syst 37:203–210
39. Shao YH, Zhang CH, Yang ZM, Jing L, Deng NY (2013) An ε-twin support vector machine for regression. Neural Comput Appl 23(1):175–185
40. Si X, Jing L (2009) Mass detection in digital mammograms using twin support vector machine-based CAD system. In: 2009 WASE international conference on information engineering, pp 240–243
41. Singh M, Chadha J, Ahuja P, Jayadeva, Chandra S (2011) Reduced twin support vector regression. Neurocomputing 74(9):1471–1477
42. Vapnik VN (2000) The nature of statistical learning theory (trans: Zhang X). Tsinghua University Press, Beijing
43. Vapnik VN (2004) Statistical learning theory (trans: Xu J, Zhang X). Publishing House of Electronics Industry, Beijing
44. Wang YN, Zhao X, Tian YJ (2013) Local and global regularized twin SVM. Procedia Comput Sci 18:1710–1719
45. Wang Z, Shao YH, Wu TR (2013) A GA-based model selection for smooth twin parametric-margin support vector machine. Pattern Recogn 46:2267–2277
46. Xie SQ, Shen FM, Qiu XN (2009) Face recognition using support vector machines. Comput Eng 35(16):186–188
47. Xu YT, Wang LS (2012) A weighted twin support vector regression. Knowl-Based Syst 33:92–101
48. Xu YT, Wang LS, Zhong P (2012) A rough margin-based v-twin support vector machine. Neural Comput Appl 21(6):1307–1317
49. Yang CF, Ji LP, Liu GS (2009) Study to speech emotion recognition based on TWINsSVM. In: 2009 fifth international conference on natural computation, pp 312–316
50. Yang XB, Chen SC (2006) Proximal support vector machine based on prototypal multiclassification hyperplanes. J Comput Res Dev 43(10):1700–1705
51. Ye QL, Zhao CX, Gao SB, Zheng H (2012) Weighted twin support vector machines with local information and its application. Neural Netw 35:31–39
52. Ye QL, Zhao CX, Ye N, Chen XB (2011) Localized twin SVM via convex minimization. Neurocomputing 74(4):580–587
53. Ye YF, Cao H, Bai L, Wang Z, Shao YH (2013) Exploring determinants of inflation in China based on L1-ε-twin support vector regression. Procedia Comput Sci 17:514–522
54. Zhao YP, Zhao J, Zhao M (2013) Twin least squares support vector regression. Neurocomputing 118:225–236
