Multi-Class SVM for Large Data Sets Considering Models of Classes Distribution

Jair Cervantes 1, Xiaoou Li 1, Wen Yu 2

1 Department of Computer Science, CINVESTAV, Av. Instituto Politécnico Nacional 2508, 07360 México D.F., México
2 Department of Automatic Control, CINVESTAV, Av. Instituto Politécnico Nacional 2508, 07360 México D.F., México

Abstract—Support Vector Machines (SVM) have gained profound interest among researchers. One important open issue concerning SVM is its application to large data sets, since SVM training is computationally very intensive. This paper presents a novel multi-class SVM classification approach for large data sets that uses a sketch of the classes distribution, obtained by combining SVM with the minimum enclosing ball (MEB) method. Our approach has distinctive advantages when dealing with huge data sets. Experiments on several large synthetic and real-world data sets show good performance in terms of both computational cost and accuracy.

keywords: support vector machines, multi-classification, large data sets.

1. INTRODUCTION

There are many standard classification techniques in the literature, such as simple rule-based and nearest-neighbor classifiers, Bayesian classifiers, artificial neural networks, decision trees, support vector machines (SVM), ensemble methods, etc. [18]. Among these techniques, SVM is one of the best known for its optimization-based solution [9]. SVM has emerged as a powerful classification technique and has achieved excellent generalization performance in a wide variety of applications, such as handwritten digit recognition [3], categorization of Web pages [11], face detection [21] and many applications in medicine [15][17]. However, despite its good theoretical foundation and generalization performance, SVM is not suitable for classifying large data sets, since the training kernel matrix grows quadratically with the size of the data set, which makes training support vector machines on large data sets a very slow process [8][10][16].

Many researchers have tried to find methods to apply SVM classification to large data sets. Sequential Minimal Optimization (SMO) is a fast method to train SVM [19]; it breaks the large Quadratic Programming (QP) problem into a series of smallest possible QP problems. Core Vector Machines (CVM) [22] are a suitable technique for scaling a two-class SVM to handle large data sets. In CVM, the quadratic optimization problem involved in SVM is formulated as an equivalent Minimum Enclosing

Ball (MEB) problem, which is then solved with the fast approximation algorithm introduced in [2]. On the other hand, clustering techniques are an effective tool to reduce SVM training time. For example, in hierarchical clustering approaches, a hierarchical clustering tree is built iteratively for each class in the data set over several epochs, and an SVM is trained on the nodes of each tree. This method is scalable to large data sets, but the algorithm is susceptible to noisy and incomplete data sets [1]. Alternatively, c-means clustering [5] is used to select and combine a given number of member classifiers, taking into account accuracy and efficiency. The accuracies obtained with this algorithm are comparable to those of other SVM implementations; however, the method works only with small and medium-sized data sets. Clustering-based methods can reduce the computational burden of SVM, but they are themselves very expensive on large data sets. These disadvantages of binary classification become even more problematic in multi-class classification, because the training time is larger than in the binary case even when both use the same amount of training data [12].

In this paper, a new approach for classifying large data sets is proposed by combining SVM with the MEB method. First, the original data set is reduced by a simple random selection method, so that an approximate hyperplane can be obtained by applying SVM to the selected training data. Then, we use the minimum enclosing ball idea [2][14] to remove data points that are far away from the decision hyperplane; the data points included in the balls constitute the classes distribution, i.e., a sketch of the boundary of each class and the most important data points for training the classifier. Finally, a second-stage SVM on these "important" data yields a more precise hyperplane. Multi-class classification is handled by decomposing the k-class problem into a series of binary-class problems. The main contribution of this paper is the demonstration that the speed of SVM can be increased by drastically reducing the input data set with a preliminary SVM stage, with gains in both speed

and accuracy. Compared with other SVM implementations, the main advantage of this mechanism is that it quickly removes most non-support vectors and collects a useful training set for classification.

The rest of the paper is organized as follows. Section 2 reviews SVM classification and introduces the MEB and random selection methods that are used to select the most important input data from the original data set. Section 3 explains the multi-class SVM classification using the classes distribution. Simulation results are shown in Section 4, and conclusions are given in Section 5.

2. PRELIMINARIES

In this section, we concisely review the basic principles of SVMs, minimum enclosing balls, and the random selection algorithm used for selecting sample data. For more details, please refer to [19] and [14].

2.1 Support Vector Machines Classification

Assume that a training set X is given as

(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)                                              (1)

i.e., X = {x_i, y_i}_{i=1}^n, where x_i ∈ R^d and y_i ∈ {+1, −1}. Training the SVM amounts to solving the following quadratic programming problem:

max_α   − (1/2) Σ_{i,j=1}^l α_i y_i α_j y_j K(x_i, x_j) + Σ_{i=1}^l α_i              (2)

subject to

Σ_{i=1}^l α_i y_i = 0,   C ≥ α_i ≥ 0,   i = 1, 2, ..., l                             (3)

where C > 0 and α = [α_1, α_2, ..., α_l]^T, α_i ≥ 0, i = 1, 2, ..., l, are the coefficients corresponding to the x_i; an x_i with nonzero α_i is called a Support Vector (SV). The function K is the Mercer kernel, which must satisfy the Mercer condition [9]. Let S be the index set of the SVs; then the optimal hyperplane is

Σ_{i∈S} (α_i y_i) K(x_i, x) + b = 0                                                  (4)

and the optimal decision function is defined as

f(x) = sign( Σ_{i∈S} (α_i y_i) K(x_i, x) + b )                                       (5)

where x is a new input vector and α_i, y_i are the Lagrange multipliers and labels of the support vectors. A new object x can be classified using (5). The vectors x_i appear only through inner products. There is a Lagrange multiplier α_i for each training point; once the maximum-margin hyperplane is found, only the points closest to the hyperplane have α_i > 0. These points are called support vectors (SV); all other points have α_i = 0.
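To make (4)-(5) concrete, the following is a minimal sketch (not taken from the paper) of evaluating the decision function once the multipliers α_i, the labels, the support vectors and the bias b are known; the RBF kernel and all variable names are assumptions for illustration.

import numpy as np

def rbf_kernel(xi, x, gamma=0.5):
    # K(x_i, x) = exp(-gamma * ||x_i - x||^2), an assumed RBF kernel
    return np.exp(-gamma * np.sum((xi - x) ** 2))

def decision_function(x, support_vectors, alphas, labels, b, gamma=0.5):
    # Eq. (5): f(x) = sign( sum_{i in S} alpha_i y_i K(x_i, x) + b )
    s = sum(a * y * rbf_kernel(sv, x, gamma)
            for sv, a, y in zip(support_vectors, alphas, labels))
    return np.sign(s + b)

# Hypothetical usage with toy values:
svs = np.array([[0.0, 1.0], [1.0, 0.0]])
alphas = np.array([0.7, 0.7])
labels = np.array([+1, -1])
print(decision_function(np.array([0.2, 0.9]), svs, alphas, labels, b=0.0))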

2.2 Random Selection Algorithm (RS)

In this paper, we use a random selection algorithm to select a small percentage of data from the original large data set, so that the SVM algorithm can be applied directly. Let (X, Y) be the training pattern set, where X = {x_1, x_2, ..., x_n} is the input data set, each x_i is a p-dimensional vector, i.e., x_i = (x_i1, ..., x_ip)^T, and Y = {y_1, y_2, ..., y_n} is the label set with y_i ∈ {−1, 1}. Our objective is to select a sample set C = (c_1, c_2, ..., c_l), l ≪ n, from X. Note that l needs to be predefined and should be an integer suitable as a training data size for standard SVM algorithms. First, we divide the input data X into two groups according to their labels Y: let q be the number of positively labeled data and m the number of negatively labeled data, with n = q + m, and define the positive and negative input data in array form as X⁺ and X⁻, corresponding to their labels Y⁺ and Y⁻ respectively, i.e.,

X⁺ = [x⁺(1), ..., x⁺(q)]                                                             (6)

X⁻ = [x⁻(1), ..., x⁻(m)]                                                             (7)

Y⁺ = [y⁺(1), ..., y⁺(q)] = [1, ..., 1]                                               (8)

Y⁻ = [y⁻(1), ..., y⁻(m)] = [−1, ..., −1]                                             (9)

So the original input data set is the union of X⁺ and X⁻, i.e., X = X⁺ ∪ X⁻. Second, we select sample data by sampling the subsets X⁺ and X⁻ independently. We use a swapping method during the selection process, and the selection is done in the following way: the first sample c_1 is chosen from X⁺ (or X⁻) uniformly at random and is then exchanged with the last element x⁺(q) (or x⁻(m)), i.e., Swap(c_1, x⁺(q)) (or Swap(c_1, x⁻(m))). The second sample c_2 is selected from the remaining data {x⁺(1), ..., x⁺(q − 1)} and is then exchanged with the last remaining element x⁺(q − 1) (or x⁻(m − 1)), i.e., Swap(c_2, x⁺(q − 1)) (or Swap(c_2, x⁻(m − 1))), and so on, until the required number of data from X⁺ (or X⁻) has been selected. Generally, we may select l/2 samples from each labeled subset, i.e., l/2 samples from X⁺ and l/2 from X⁻, if the labels are distributed evenly. If not, we may predefine a sampling proportion for each label according to its distribution in the original data, or as desired; how this proportion is chosen is beyond the scope of this paper. This swapping method can also be regarded as generating the first l entries of a random permutation. The above random selection can be stated as the following algorithm:

For i := n downto n − l + 1 do
    generate a uniform [0, 1] random variate u
    s ← ⌈i · u⌉
    Swap(x[s], x[i])
Return x[n − l + 1], ..., x[n]
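The following is a minimal sketch (not the authors' code) of this partial Fisher-Yates selection in Python; the function name and the per-class l/2 split in the usage are assumptions for illustration.

import random

def random_select(data, l):
    """Select l items from `data` by swapping, as in the algorithm above
    (a partial Fisher-Yates shuffle); works on a copy of the data."""
    x = list(data)
    n = len(x)
    for i in range(n - 1, n - l - 1, -1):   # i = n-1 down to n-l (0-based indices)
        s = random.randint(0, i)            # uniform index in x[0..i]
        x[s], x[i] = x[i], x[s]             # Swap(x[s], x[i])
    return x[n - l:]                        # the last l entries are the sample

# Hypothetical usage: sample l/2 points from each class independently.
X_pos = [[0.1, 0.2], [0.3, 0.1], [0.9, 0.8], [0.7, 0.6]]
X_neg = [[1.1, 1.2], [1.3, 1.1], [1.9, 1.8], [1.7, 1.6]]
l = 4
C = random_select(X_pos, l // 2) + random_select(X_neg, l // 2)
print(C)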

2.3 Minimum Enclosing Ball (MEB)

Clustering is an unsupervised classification that partitions a data set into subsets (clusters), so that the patterns belonging to one cluster are similar and the patterns of different clusters are as dissimilar as possible. The MEB clustering used in this paper relies on the concept of core-sets, defined as follows. A ball with center c and radius r is denoted B(c, r). Given a set of points S = {x_1, ..., x_m} with x_i ∈ R^d, the minimum enclosing ball of S, denoted MEB(S), is the smallest ball that contains all points of S. A (1 + ε)-approximation of MEB(S) is a ball B(c, (1 + ε)r), ε > 0, with r ≤ r_MEB(S) and S ⊂ B(c, (1 + ε)r). A set of points Q is a core-set of S if MEB(Q) = B(c, r) and S ⊂ B(c, (1 + ε)r).

3. MULTI-CLASSIFICATION USING CLASSES DISTRIBUTION

The multi-class classification problem for SVM does not have an easy solution. In order to apply SVM to multi-class problems, it is necessary to decompose the problem into multiple binary classification problems. There are two popular SVM-based approaches. 1) The "one versus one" method (OVO) constructs k(k − 1)/2 hyperplanes for a k-class problem, where each one is trained with data from just two classes. 2) The "one versus all" method (OVA) constructs k hyperplanes for a k-class problem; the i-th hyperplane is trained with the samples of the i-th class against the rest of the data. After the hyperplanes are obtained, a test sample is labeled by the maximum output among the k hyperplanes (classifiers). There are different ways to measure the maximum output; the principal difference between them is how the classifiers are fused.

We consider the following multi-class classification problem using OVA. Given n training data (x_1, y_1), ..., (x_n, y_n), where x_i ∈ R^d, i = 1, ..., n, and y_i ∈ {1, ..., k} is the class of x_i, the i-th SVM solves the problem

min_{w_i, b_i, ξ_i}   (1/2)(w_i)^T w_i + C Σ_{j=1}^n ξ_j^i
subject to   (w_i)^T φ(x_j) + b_i ≥ 1 − ξ_j^i,   if y_j = i
             (w_i)^T φ(x_j) + b_i ≤ −1 + ξ_j^i,  if y_j ≠ i
             ξ_j^i ≥ 0,   j = 1, ..., n

where φ maps x_j into a high-dimensional feature space, the associated kernel satisfies Mercer's condition [9], and C is the penalty parameter. We solve the i-th binary classification problem as follows. Let (X_i, Y_i) be the training pattern set

X_i = {X_i^k, X_R} = {x_1, ..., x_m, x_{m+1}, ..., x_n}
Y_i = {Y_i^k, Y_R} = {y_1, ..., y_m, y_{m+1}, ..., y_n}

with y_{1,...,m} = +1 and y_{m+1,...,n} = −1 in the i-th binary problem.
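As background for the (1 + ε)-approximate MEB defined in Section 2.3, the following is a minimal sketch of a core-set style approximation in the spirit of [2][14] (not the authors' implementation); the simple center-update rule and the iteration count are standard choices but are stated here as assumptions.

import numpy as np

def approx_meb(points, eps=0.1):
    """Core-set style (1 + eps)-approximate minimum enclosing ball:
    repeatedly move the center a small step toward the farthest point."""
    S = np.asarray(points, dtype=float)
    c = S[0].copy()                          # start from an arbitrary point
    iterations = int(np.ceil(1.0 / eps**2))
    for i in range(1, iterations + 1):
        d = np.linalg.norm(S - c, axis=1)    # distances to current center
        far = S[np.argmax(d)]                # farthest point (a core-set candidate)
        c += (far - c) / (i + 1)             # shrinking step toward it
    r = np.linalg.norm(S - c, axis=1).max()  # enclosing radius for this center
    return c, r

# Hypothetical usage:
pts = np.random.rand(200, 2)
center, radius = approx_meb(pts, eps=0.1)
print(center, radius)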

Fig. 1. Binary classification for the i-th class.

Fig. 2. a) Input data set, b) sampling data, c) first stage of SVM.

The training task of SVM classification is to find, from the input X_i and the output Y_i, the optimal hyperplane that maximizes the margin between the i-th class and the rest of the data. By the sparseness property of SVM, the data that are not support vectors do not contribute to the optimal hyperplane. The input data points that are far away from the decision hyperplane should therefore be eliminated, while the data points that are possible support vectors should be kept.

3.1 Binary-class SVM Algorithm

Fig. 1 shows our binary classification algorithm. First, we select a predefined number of data from the original data set using the random selection method, and these selected data are used in the first-stage SVM to obtain approximate support vectors. Second, the algorithm detects the closest support vectors (SVs) with different class labels and constructs a ball from each such pair of SVs using the minimum enclosing ball method. The points enclosed in these balls are then collected; we call this process data recovery, and the recovered data constitute the class distribution that is used to train the final SVM. Third, the second-stage SVM is applied to the recovered data to get a more precise classification result.
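The following is a minimal end-to-end sketch of this two-stage procedure (not the authors' code): it assumes scikit-learn's SVC as the SMO-based solver for both stages, stands in for the random selection of Section 2.2 with uniform subsampling, and builds one ball per closest opposite-label SV pair (the MEB of a pair is the ball centered at the midpoint with radius half the pair distance); the mda filtering of Section 3.1.3 is omitted for brevity.

import numpy as np
from sklearn.svm import SVC

def two_stage_svm(X, y, sample_size=500, gamma=0.5, C=100.0, rng=None):
    rng = np.random.default_rng(0) if rng is None else rng

    # Stage 0: random selection of a small training subset (Section 2.2).
    idx = rng.choice(len(X), size=min(sample_size, len(X)), replace=False)
    Xs, ys = X[idx], y[idx]

    # Stage 1: approximate hyperplane from the reduced set (SMO-based solver).
    svm1 = SVC(kernel="rbf", gamma=gamma, C=C).fit(Xs, ys)
    sv, sv_y = svm1.support_vectors_, ys[svm1.support_]

    # Data recovery: pair each positive SV with its nearest negative SV and
    # collect the original points inside the ball built on the pair.
    pos, neg = sv[sv_y == 1], sv[sv_y == -1]
    if len(pos) == 0 or len(neg) == 0:
        return svm1
    recovered = np.zeros(len(X), dtype=bool)
    for p in pos:
        d = np.linalg.norm(neg - p, axis=1)
        q = neg[np.argmin(d)]
        center, radius = (p + q) / 2.0, d.min() / 2.0    # MEB of the pair
        recovered |= np.linalg.norm(X - center, axis=1) <= radius

    Xr, yr = X[recovered], y[recovered]
    if len(np.unique(yr)) < 2:        # fall back to the sample if recovery is degenerate
        Xr, yr = Xs, ys

    # Stage 2: refined hyperplane trained only on the recovered data.
    return SVC(kernel="rbf", gamma=gamma, C=C).fit(Xr, yr)

# Hypothetical usage on toy binary data with labels +1 / -1:
X = np.vstack([np.random.randn(1000, 2) + 2, np.random.randn(1000, 2) - 2])
y = np.hstack([np.ones(1000), -np.ones(1000)])
model = two_stage_svm(X, y)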

3.1.1 Selecting Initial Data: After random selection, we denote the set of selected samples as C. We may divide C into two subsets, one containing the positive labels and the other the negative labels, i.e.,

C⁺ = {∪ C_i | y = +1},   C⁻ = {∪ C_i | y = −1}                                       (10)

Fig. 2-a) shows the original data, where the red circles represent positively labeled data and the green squares represent negatively labeled data. The sample data selected by random selection are shown in Fig. 2-b). Then C⁺ consists of the data represented by red circles in Fig. 2-b), and C⁻ consists of the data represented by green squares in Fig. 2-b). Only these selected data are used as training data in the first-stage SVM classification.

3.1.2 The first stage SVM classification: In the first stage we use SVM classification with the SMO algorithm to get the decision hyperplane

Σ_{k∈V_1} y_k α*_{1,k} K(x_k, x) + b*_1 = 0                                          (11)

where V_1 is the index set of the support vectors in the first stage. Here, the training data set is C⁺ ∪ C⁻, obtained by random selection in the previous subsection. Fig. 2 b) and c) illustrate the training data set and the support vectors obtained in the first-stage SVM classification, respectively.

3.1.3 Data recovery: Note that the original data set is reduced significantly by random selection, so the training data of the first-stage SVM is only a small percentage of the original data. This may affect the classification precision, i.e., the obtained decision hyperplane may not be precise enough and a refinement process is necessary. However, it at least gives us a reference on which data can be eliminated. Some useful data have not been selected during the random selection stage, so a natural idea is to recover those data which are located between two support vectors with different labels and to classify again using the recovered data.

First, we propose to recover the data by using a set of support vector pairs with different labels, where each chosen pair is within a minimum distance allowed (mda). The algorithm finds the minimum distance (md) between two support vectors with different labels and then computes the mda. In this paper, in order to select only the closest support vector pairs, we calculate the mda as

mda = md + δ                                                                         (12)

where

δ = 4 / ‖w‖ = 4 / ‖Σ_{i∈V} y_i α_i x_i‖                                              (13)

Second, we construct the balls using core-sets from each selected pair of support vectors, and we use these balls to catch all the data points (near the classification hyperplane) contained in the original input data set. Fig. 3 a) and b) illustrate the recovery process; the data in circles in Fig. 3 b) are the recovered data. The amount of recovered data depends on the mda and is therefore controlled by the selected support vector pairs. The recovered data set is ∪_{C_i∈V} {B_i(c_i, r)}. In the next section, Example 1 illustrates this process.

Fig. 3. a) Balls from SV pairs, b) recovering input data, c) final SVM.

3.1.4 The second stage SVM classification: Taking the recovered data as the new training set, we again use SVM classification with the SMO algorithm to get the final decision hyperplane

Σ_{k∈V_2} y_k α*_{2,k} K(x_k, x) + b*_2 = 0                                          (14)

where V_2 is the index set of the support vectors in the second stage. Generally, the hyperplane (11) is close to the hyperplane (14). Note that the original data set has been reduced significantly by random selection and the first-stage SVM. The recovered data should consist of the most useful data from the SVM optimization viewpoint, because the data far away from the decision hyperplane have been eliminated. On the other hand, since almost all original data near the hyperplane (11) are included by our data recovery technique, the hyperplane (14) should approximate the hyperplane that a normal SMO would obtain with the entire original training set. Fig. 3 c) illustrates the training data and hyperplane of the second-stage SVM classification.

3.2 Multi-class SVM Algorithm

Training samples are X = {x_1, x_2, ..., x_n} with class labels Y = {y_1, y_2, ..., y_n}; testing samples are X_t = {x_t1, x_t2, ..., x_tn} with class labels Y_t = {y_t1, y_t2, ..., y_tn}; k is the number of classes.

Step 1. Obtain the k optimal hyperplanes using the Binary-class SVM Algorithm.
Step 2. Given a sample x to classify, choose the label of the class whose decision function has the largest value:

class of x = arg max_{i=1,2,...,k} (w_i · x_t + b_i)

where w_i and b_i denote the hyperplane of the i-th SVM.
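A minimal sketch of Step 2, the one-versus-all decision rule, follows (not the authors' code). In the paper each binary model would be the two-stage classifier of Section 3.1; a plain SVC stands in here for brevity, and its decision_function supplies the signed value w_i · x + b_i in feature space. Class labels 0, ..., k−1 are an assumption of the sketch.

import numpy as np
from sklearn.svm import SVC

def ova_train(X, y, num_classes, **svm_params):
    # One binary SVM per class: class i against the rest (labels +1 / -1).
    models = []
    for i in range(num_classes):
        yi = np.where(y == i, 1, -1)
        models.append(SVC(kernel="rbf", **svm_params).fit(X, yi))
    return models

def ova_predict(models, x):
    # Step 2: pick the class whose hyperplane gives the largest decision value,
    # i.e. arg max_i (w_i . x + b_i).
    scores = [m.decision_function(x.reshape(1, -1))[0] for m in models]
    return int(np.argmax(scores))

# Hypothetical usage:
X = np.random.randn(300, 2)
y = np.random.randint(0, 3, size=300)
models = ova_train(X, y, num_classes=3, gamma=0.5, C=100.0)
print(ova_predict(models, X[0]))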

4. EXPERIMENTAL RESULTS

The proposed algorithm was evaluated on a synthetic checkerboard data set and on three real test sets, the Shuttle, Covtype and SensIT Vehicle (combined) data sets, and was compared with Simple SVM [23] and LIBSVM [6], which is based on SMO; all data sets are available at [6]. In order to clarify and emphasize the basic idea of our approach, let us first consider a very simple case of classification.

Example 1: The 4 × 4 checkerboard data set is commonly used for evaluating SVM implementations on large data sets. In this case the input data set has four classes, as shown in Fig. 4. We use training data sizes of 1,000,000, 500,000, 100,000 and 50,000 in our experiments, and 5,000 data are used for testing. The input data set of 500,000 points (Fig. 5 a)) is reduced to 7,291 points (Fig. 5 b)) after applying the random sampling, the first-stage SVM and MEB on the closest SVs with different class labels, so the original data set is reduced considerably. Table I shows the results of our experiments using the RBF kernel in all SVM implementations. For the 50,000-point data set, our classifier needs 93.6 seconds, while LIBSVM and Simple SVM need about 4657 and 107.91 seconds respectively, with the same accuracy. For 1,000,000 points, our classifier needs 517 seconds, while LIBSVM and Simple SVM take a very long time.

Fig. 4. The 4 × 4 checkerboard data with 4 classes.

Fig. 5. a) Original data set for binary classification, b) reduced data set after applying RS, SVM and MEB.

Table I
Checkerboard data set

              our approach        LIBSVM            Simple SVM
    #         t        Acc        t        Acc      t         Acc
   50,000     93.6     99.9       4657     99.9     107.91    99.9
  100,000     145      99.9       -        -        257.1     99.9
  500,000     316      99.9       -        -        685.6     99.9
1,000,000     517      99.9       -        -        -         -

t = time in seconds, Acc = accuracy.
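As an illustration of the synthetic setup in Example 1, a 4 × 4 multi-class checkerboard set could be generated roughly as follows (a sketch only; the 4-class labeling of the 16 cells and the uniform sampling are assumptions, not the authors' exact generator).

import numpy as np

def make_checkerboard(n_samples, grid=4, seed=0):
    """Sample points uniformly in [0, grid)^2 and assign one of 4 labels per cell
    in a 2x2 repeating (checkerboard-like) pattern."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, grid, size=(n_samples, 2))
    cell = np.floor(X).astype(int)                      # (row, col) of the cell
    y = 2 * (cell[:, 0] % 2) + (cell[:, 1] % 2)         # assumed 4-class tiling
    return X, y

# Hypothetical usage matching the sizes in Example 1:
X_train, y_train = make_checkerboard(500_000)
X_test, y_test = make_checkerboard(5_000, seed=1)
print(X_train.shape, np.bincount(y_train))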

Example 2: The Shuttle data set consists of 7 classes; there are 43,500 data with 9 features each. We use 22,000 and 43,500 records for training and the next 14,500 records for testing. We choose the RBF kernel with γ = 0.5.

Example 3: The Covtype data set consists of 7 classes with 54 input features. There are 581,012 data; we use 10,000, 25,000, 50,000, 100,000 and 200,000 records for training and the next 381,012 records for testing.

Example 4: The SensIT Vehicle (combined) data set consists of 3 classes with 100 features. There are 78,823 training data and 19,705 testing data; we use 9,000, 18,500, 39,000 and 78,823 records for training.

Table II shows the comparison with other SVM implementations in terms of training time and accuracy. In the experiments of Example 2 we use C = 10^5. For 22,000 data, our classifier needs about 15 seconds while LIBSVM needs about 50 seconds, and the accuracies are almost the same; for 43,500 data, our classifier needs 20.7 seconds while LIBSVM and Simple SVM take much longer. In the experiments of Example 3 we use γ = 0.5 and C = 100. For 10,000 data, our classifier needs 13.7 seconds while LIBSVM needs about 328 seconds, with almost the same accuracy; for 25,000 data, our classifier needs 18.3 seconds while LIBSVM takes a very long time, again with almost no difference in accuracy. For the entire input data of Example 4, our classifier needs far less time than LIBSVM, again with almost no difference in accuracy. This implies that our algorithm has a great advantage when the input data set is very large. Some results are not reported because the corresponding training time was excessively long.

Table II
Comparison with LIBSVM and Simple SVM

                        our approach      LIBSVM            Simple SVM
Data set     #          t        Acc      t        Acc      t        Acc
Shuttle      22,000     14.9     99.6     50.5     99.6     60.0     99.6
             43,500     20.7     99.6     158      99.7     165      99.8
CovType      10,000     13.7     67.8     328.3    68.1     16598    68.7
             25,000     18.3     67.9     3104     70.2     -        -
             50,000     35.7     69.3     -        -        -        -
             100,000    93.9     70.1     -        -        -        -
             200,000    651      70.3     -        -        -        -
Vehicle      9,000      23.7     82.7     56.2     83.3     19108    83.2
             18,500     31.8     83.9     469      84.6     -        -
             39,000     69.7     84.2     1842     84.6     -        -
             78,823     258      84.6     -        -        -        -

# = size of input data, t = time in seconds, Acc = accuracy.

5. CONCLUSIONS

In this paper we present a novel multi-class SVM classification approach for large data sets using the classes distribution. In order to reduce SVM training time on large data sets, a fast first SVM stage is introduced to reduce the training data. A minimum enclosing ball recovery process overcomes the drawback of training on only a small sample by bringing back the original data near the support vectors. Experiments with several large synthetic and real-world data sets show that the proposed technique has a great advantage when the input data sets are very large.

REFERENCES

[1] M. Awad, L. Khan, F. Bastani and I. L. Yen, "An effective support vector machine (SVM) performance using hierarchical clustering," Proceedings of the 16th IEEE International Conference on Tools with Artificial Intelligence (ICTAI'04), pp. 663-667, 2004.
[2] M. Badoiu, S. Har-Peled and P. Indyk, "Approximate clustering via core-sets," Proceedings of the 34th Symposium on Theory of Computing, 2002.
[3] A. Bellili, M. Gilloux and P. Gallinari, "An MLP-SVM combination architecture for offline handwritten digit recognition: Reduction of recognition errors by Support Vector Machines rejection mechanisms," International Journal on Document Analysis and Recognition, Vol. 5, No. 4, pp. 244-252, 2003.
[4] J. Cervantes, X. Li and W. Yu, "Multi-class support vector machines for large data sets via minimum enclosing ball clustering," ICEEE 2007, 4th International Conference on Electrical and Electronics Engineering, pp. 146-149, 2007.
[5] J. Cervantes, X. Li and W. Yu, "Support vector machine classification for large data sets via minimum enclosing ball clustering," Neurocomputing, Vol. 71, Issues 4-6, pp. 611-619, 2008.
[6] C.-C. Chang and C.-J. Lin, LIBSVM: a library for support vector machines, http://www.csie.ntu.edu.tw/~cjlin/libsvm, 2001.
[7] J.-H. Chen and C.-S. Chen, "Reducing SVM classification time using multiple mirror classifiers," IEEE Transactions on Systems, Man, and Cybernetics, Part B, Vol. 34, No. 2, pp. 1173-1183, April 2004.
[8] P.-H. Chen, R.-E. Fan and C.-J. Lin, "A study on SMO-type decomposition methods for support vector machines," IEEE Trans. Neural Networks, Vol. 17, No. 4, pp. 893-908, 2006.
[9] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, 2000.

[10] J.-X. Dong, A. Krzyzak and C. Y. Suen, "Fast SVM training algorithm with decomposition on very large data sets," IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 27, No. 4, pp. 603-618, 2005.
[11] Q. Duan, D. Miao and K. Jin, "A rough set approach to classifying Web page without negative examples," Advances in Knowledge Discovery and Data Mining, LNCS, Vol. 4426, pp. 481-488, 2007.
[12] C.-W. Hsu and C.-J. Lin, "A comparison of methods for multi-class support vector machines," IEEE Transactions on Neural Networks, Vol. 13, No. 2, pp. 415-425, March 2002.
[13] L. Khan, M. Awad and B. Thuraisingham, "A new intrusion detection system using support vector machines and hierarchical clustering," The VLDB Journal, Vol. 16, No. 4, pp. 507-521, 2007.
[14] P. Kumar, J. S. B. Mitchell and A. Yildirim, "Approximate minimum enclosing balls in high dimensions using core-sets," ACM Journal of Experimental Algorithmics, Vol. 8, January 2003.
[15] M. E. Matheny, F. S. Resnic, N. Arora and L. Ohno-Machado, "Effects of SVM parameter optimization on discrimination and calibration for post-procedural PCI mortality," Journal of Biomedical Informatics, Vol. 40, No. 6, pp. 688-697, 2007.
[16] H. Moon, H. Ahn, R. L. Kodell, S. Baek, C. Lin and J. J. Chen, "A geometric approach to support vector machine classification," IEEE Trans. Neural Networks, Vol. 17, No. 3, pp. 671-682, 2006.
[17] M. E. Mavroforakis and S. Theodoridis, "Ensemble methods for classification of patients for personalized medicine with high-dimensional data," Artificial Intelligence in Medicine, Vol. 41, No. 3, pp. 197-207, 2007.
[18] A. Nurnberger, W. Pedrycz and R. Kruse, "Data mining tasks and methods: classification: neural network approaches," Oxford University Press, New York, NY, 2002.
[19] J. Platt, "Fast training of support vector machines using sequential minimal optimization," Advances in Kernel Methods: Support Vector Learning, MIT Press, Cambridge, MA, 1998.
[20] J. Saketha Nath, C. Bhattacharyya and M. N. Murty, "Clustering based large margin classification: a scalable approach using SOCP formulation," Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, PA, USA, August 20-23, 2006.
[21] P. Shih and C. Liu, "Face detection using discriminating feature analysis and support vector machines," Pattern Recognition, Vol. 39, No. 2, 2006.
[22] I. W. Tsang, J. T. Kwok and P.-M. Cheung, "Core vector machines: fast SVM training on very large data sets," Journal of Machine Learning Research, Vol. 6, pp. 363-392, 2005.
[23] S. Vishwanathan, A. Smola and M. Murty, "SimpleSVM," Proceedings of the Twentieth International Conference on Machine Learning, pp. 760-767, Washington, D.C., USA, 2003.
[24] H. Yu, J. Yang and J. Han, "Classifying large data sets using SVMs with hierarchical clusters," Proc. of the 9th ACM SIGKDD, Washington, DC, USA, 2003.
