Speeding Up Multi-class SVM Evaluation by PCA and Feature Selection

Hansheng Lei, Venu Govindaraju
CUBS, Center for Unified Biometrics and Sensors
State University of New York at Buffalo
Amherst, NY 14260
Email: {hlei,[email protected]}

Abstract

Support Vector Machine (SVM) is a state-of-the-art learning machine that has been fruitful not only in pattern recognition, but also in data mining areas such as feature selection on microarray data, novelty detection, and the scalability of algorithms. In particular, SVM has been extensively and successfully applied in feature selection for genetic diagnosis. In this paper, we do the contrary, i.e., we use the fruits achieved in the applications of SVM in feature selection to improve SVM itself.
By reducing redundant and non-discriminative features, the computational time of SVM is greatly saved and thus the evaluation speeds up. We propose combining Principal Component Analysis (PCA) and Recursive Feature Elimination (RFE) into multi-class SVM. We found that SVM is invariant under the PCA transform, which qualifies PCA as a desirable dimension reduction method for SVM. RFE, in turn, is a suitable feature selection method for binary SVM. However, RFE requires many iterations, and each iteration needs to train the SVM once. This makes RFE infeasible for multi-class SVM without PCA dimension reduction, especially when the training set is large. Therefore, combining PCA with RFE is necessary. Our experiments on the benchmark database MNIST and other commonly used datasets show that PCA and RFE can speed up the evaluation of SVM by an order of 10 while maintaining comparable accuracy.

1 Introduction

The Support Vector Machine (SVM) was originally designed for binary classification problems [1]. It separates two classes with maximum margin. The margin is described by the Support Vectors (SVs), which are determined by solving a Quadratic Programming (QP) optimization problem. The training of SVM, dominated by the QP optimization, used to be very slow and lacked scalability. Much effort has gone into cracking the QP problem and enhancing its scalability [17, 13, 14]. The bottleneck lies in the kernel matrix: given N training points, the kernel matrix has size N × N. When N reaches the thousands (say, N = 5000), the kernel matrix is too big to stay in the memory of a common personal computer (at 8 bytes per entry, a 5000 × 5000 matrix already occupies about 200MB). This had been a challenge for SVM until the Sequential Minimal Optimization (SMO) algorithm was invented [14]. SMO dramatically brings the space complexity of SVM training down to O(1). Thus, the training problem is largely solved, although more powerful solutions may yet emerge. With the support of SMO, the scalability of SVM has demonstrated promising potential in data mining [19]. In the past decade, SVM has been widely applied in pattern recognition as well as data mining, with fruitful results. However, SVM itself still needs improvement in both training and testing (evaluation). Much work has been done to improve SVM training, and SMO can be considered the state-of-the-art solution for that. Comparatively, only a few efforts have been devoted to the evaluation side of SVM [2, 4, 10].

In this paper, we propose a method for enhancing SVM evaluation via Principal Component Analysis (PCA) and Recursive Feature Elimination (RFE). PCA is an orthogonal transformation of the coordinate system that preserves the Euclidean distances between the original points (each point being a vector of features, or components). Under the PCA transform, the energy of the points is concentrated into the first few components, which enables dimension reduction.
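To make these two properties concrete, the following is a minimal sketch in Python/NumPy (our illustration, not the authors' code; the low-rank toy data and the 95% energy threshold are assumptions chosen for demonstration). It verifies that the full PCA rotation preserves Euclidean distances and shows how the energy concentrates in the first few components:

    import numpy as np

    rng = np.random.default_rng(0)
    # Toy data: 64 observed features driven by 8 latent factors, so the
    # energy is expected to concentrate in roughly 8 principal components.
    latent = rng.normal(size=(200, 8))
    X = latent @ rng.normal(size=(8, 64)) + 0.01 * rng.normal(size=(200, 64))

    Xc = X - X.mean(axis=0)                    # center the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt.T                              # full PCA transform (a rotation)

    # Orthogonal transform: pairwise Euclidean distances are unchanged.
    assert np.isclose(np.linalg.norm(Xc[0] - Xc[1]),
                      np.linalg.norm(Z[0] - Z[1]))

    # Energy concentration: keep the first k components covering 95% energy.
    energy = np.cumsum(S**2) / np.sum(S**2)
    k = int(np.searchsorted(energy, 0.95)) + 1
    Z_reduced = Z[:, :k]                       # dimension reduced from 64 to k
    print(k, "components retain 95% of the energy")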

Feature selection has been heavily studied, especially for the purpose of gene selection on microarray data. The common situation in gene-related classification problems is that there are thousands of genes but no more than a few hundred samples, i.e., the number of dimensions far exceeds the number of samples. Under this condition, the problem of overfitting arises. Among those genes, which are discriminative? Finding the minimum subset of genes that interact can help cancer diagnosis. RFE in the context of SVM has achieved excellent results on feature selection [5]. Here, we do the contrary, i.e., we use the fruits of the application of SVM in feature selection to improve SVM itself.

The rest of this paper is organized as follows. After the introduction, we briefly discuss the background of SVM, PCA and RFE, as well as some related works, in §2. Then, we prove that SVM is invariant under PCA and describe how to incorporate PCA and RFE into SVM to speed up its evaluation in §3. Experimental results on benchmark datasets are reported in §4. Finally, the conclusion is drawn in §5.

2 Background and Related Works

In this section, we discuss the basic concepts of SVM and how RFE is incorporated into SVM for feature selection on gene expressions. In addition, PCA is introduced. We prove that SVM is invariant under the PCA transformation and then propose combining PCA and RFE to improve SVM evaluation.
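As a concrete preview of the RFE procedure discussed below, here is a minimal sketch in the spirit of [5] (our illustration, not the authors' implementation; it assumes scikit-learn's LinearSVC and eliminates one feature per iteration, whereas in practice chunks of features can be removed for speed):

    import numpy as np
    from sklearn.svm import LinearSVC

    def svm_rfe(X, y, n_keep):
        # Return the indices of the n_keep surviving features.
        active = np.arange(X.shape[1])
        while len(active) > n_keep:
            clf = LinearSVC(dual=False).fit(X[:, active], y)  # retrain each round
            w = clf.coef_.ravel()              # weight vector of the linear SVM
            worst = np.argmin(w ** 2)          # least discriminative feature
            active = np.delete(active, worst)  # eliminate it and repeat
        return active

Note that every iteration trains the SVM once; this per-iteration training cost is exactly what makes RFE infeasible for multi-class SVM on high-dimensional data unless PCA first reduces the dimension.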

2.1 Support Vector Machines (SVM)

The basic form of an SVM classifier can be expressed as

(2.1)    g(x) = w · φ(x) + b,

where input vector x ∈