2012 11th International Conference on Machine Learning and Applications
Minimal Norm Support Vector Machines for Large Classification Tasks

Robert Strack
Computer Science Department, Virginia Commonwealth University, Richmond, VA 23284-3019
[email protected]

Vojislav Kecman
Computer Science Department, Virginia Commonwealth University, Richmond, VA 23284-3019
[email protected]
Abstract—This paper introduces Minimal Norm Support Vector Machines (MNSVM), a new fast classification algorithm that originates from the minimal enclosing ball approach and combines state-of-the-art minimal norm problem solvers with probabilistic techniques. Our approach significantly improves the time performance of the SVM training phase. Moreover, a comparison with other SVM classification techniques based on the Sequential Minimal Optimization algorithm, over several large real data sets within the strict validation frame of a double (nested) cross-validation, reveals very similar classification accuracy. The results promote MNSVM as an outstanding alternative for handling large and ultra-large datasets in a reasonable time without switching to the various parallelization schemes for SVM algorithms proposed recently.

Keywords-support vector machines; core vector machines; minimum enclosing ball; minimal norm problem; large datasets; classification
I. INTRODUCTION

Support Vector Machines are considered to be among the best classification tools available today. Many experimental results achieved on a variety of classification (and regression) tasks complement the highly appreciated theoretical properties of SVMs. However, there is one property of the SVM learning algorithm that has required, and still requires, special attention: the learning phase of SVMs scales with the number of training data points. Hence, as dataset sizes increase, learning can become a quite slow process. The first successful attempts to deal with this matter include the decomposition approaches that have led to several efficient pieces of software, the most popular being SVMlight [1] and LIBSVM [2]. However, the ever increasing size of datasets has driven the SVM training time beyond acceptable limits. The two remedial avenues for overcoming the issues of large datasets employed during the last decade include various parallelization attempts (including the newest GPU embedded implementations [3], [4]) and the use of geometric approaches. The latter include solving the SVM learning problem by both convex hull and enclosing ball approaches [5], [6]. The most recent and advanced method, known as the Ball Vector Machine [7], has displayed a high capacity for handling large datasets.

The Minimal Norm Support Vector Machine proposed here originates from the minimal enclosing ball approach [5], [7]. Namely, it solves the same optimization problem, but by application of a minimal norm problem solver (an approach previously used, among others, in SVM training based on convex hulls [8]). While keeping the same level of accuracy, it achieves a significant speedup with respect to all three of L1 LIBSVM, L2 LIBSVM and BVM. Although the most popular SVM solvers (such as Platt's SMO [9]) are based on the Lagrange multipliers method and search for the solution in the dual space, there is a lot of research conducted towards finding efficient algorithms that work directly in the feature space. These algorithms are mostly based on the geometric interpretation of maximum margin classifiers. The geometric properties of hard margin SVM classifiers have been known for a long time. Recently, Keerthi et al. [10] and Franc [11] proposed algorithms based on the geometric interpretation of the SVM algorithm for solving cases with separable classes. Their approach treats the problem of finding the maximum margin between two classes as a problem of finding the two closest points belonging to the convex polytopes covering the classes. Crisp et al. analyzed the geometric properties of the ν-SVM algorithm [12] and, based on this work, Mavroforakis introduced the Reduced Convex Hulls (RCH) [13]. Another field of research involves algorithms based on the Minimal Enclosing Ball (MEB) problem. Tsang et al. [5], [14] formulated the SVM problem as an MEB problem and proposed the Core Vector Machine (CVM) algorithm as an approach suitable for very large SVM training. Their algorithm is an application of Badoiu and Clarkson's work [15], which investigates the use of coresets in finding an approximation of the MEB. Furthermore, Tsang et al. [7] improved the idea of Core Vector Machines by introducing a new algorithm not requiring a QP solver, the Ball Vector Machine (BVM). This algorithm was further enhanced by Strack et al. [16], who proposed a new update scheme resulting in faster convergence. In this paper we propose a new algorithm that improves the BVM by applying ideas previously used in SVM learning based on RCH. The first SVM solver involving finding the two
closest points on non-overlapping RCHs was introduced by Mavroforakis and Theodoridis [13]. It was further improved by López et al. [17], who replaced the SK algorithm (Kozinec [18]), used for searching for the closest points, with the faster MDM algorithm introduced by Mitchell, Demyanov and Malozemov in [19]. Here, we show that the optimization problem solved by both the CVM and BVM algorithms can be treated as a minimal norm problem. Since the MDM algorithm has proven to be an efficient approach for solving this type of minimization task, we propose a novel SVM training technique that applies this procedure to finding the solution of the optimization problem used by Core and Ball Vector Machines.
II. MINIMAL NORM SUPPORT VECTOR MACHINES

Since the modified L2 SVM criterion (1) can be transformed into a minimal norm problem (MNP), it is possible to solve the SVM problem by applying one of the existing MNP solvers. Therefore, we propose a novel Minimal Norm Support Vector Machines (MNSVM) algorithm that uses the well-known MDM approach for training an SVM model. Here, instead of searching for an ε-approximation of the minimum enclosing ball, as is done in the BVM algorithm, we search for the point that is closest to the origin and belongs to the convex hull spanned by the data points x̃_i in the feature space Φ̃. Because both BVM and MNSVM originate from the same optimization problem (1), they yield the same solution. In other words, the center of the minimum enclosing ball found by BVM and the closest-to-the-origin point on the convex hull obtained by the MNSVM algorithm coincide. While creating the MNSVM algorithm we followed the MDM approach [19], [20], which is a very effective method for solving the minimal norm problem.
A. Background

It has been shown in [5] that, for normalized kernels¹, the learning setting of the L2 SVM defined as

    \arg\min_{\mathbf{w},b,\boldsymbol{\zeta},\rho} \frac{1}{2}\|\mathbf{w}\|^2 + \frac{b^2}{2} - \rho + \frac{C}{2}\sum_{i=1}^{m}\zeta_i^2,    (1)

subject to y_i(x_i · w + b) ≥ ρ − ζ_i for i = 1, …, m, can be rewritten as a minimization task equivalent to the problem of finding the minimal enclosing ball with center z and radius R,

    \arg\min_{R,\mathbf{z}} R^2,    (2)

subject to ‖z − x̃_i‖² ≤ R² for i = 1, …, m, in the feature space Φ̃ defined by the kernel k̃

    \tilde{k}_{ij} = y_i y_j k_{ij} + y_i y_j + \frac{\delta_{ij}}{C},    (3)

where k_ij = φ(x_i) · φ(x_j) is the kernel used in the original L2 SVM problem, x̃_i = φ̃(x_i) is the image of the vector x_i in the feature space Φ̃, and δ_ij is the Kronecker delta. In other words, solving the enclosing ball problem in the feature space Φ̃ produces a solution of the L2 SVM task.

The optimization task (1) can be expressed in the dual space as the following QP optimization problem

    \arg\min_{\boldsymbol{\alpha}} \sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_i\alpha_j\tilde{k}_{ij},    (4)

subject to Σ_{i=1}^m α_i = 1 and α_i ≥ 0 for i = 1, …, m. Therefore, it is clear that the problem stated in (1) can be solved as a minimal norm problem in the same kernel space k̃,

    \arg\min_{\mathbf{c}\in H(X)} \|\mathbf{c}\|^2,    (5)

where c is a point belonging to the convex hull H(X) = {Σ_{i=1}^m α_i x̃_i}, with Σ_{i=1}^m α_i = 1 and α_i ≥ 0 for i = 1, …, m (see Figure 1). This way, the solution of the relatively complex problem expressed in (1) can be obtained by solving a fundamental computational geometry problem, the minimal norm problem (5).

¹ Kernels satisfying the condition that k_ii = φ(x_i) · φ(x_i) = τ is constant, e.g., for a Gaussian kernel k_ii = 1.
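As a concrete illustration of the transformation in (3) and the objective in (4), the following sketch builds the matrix K̃ from a Gaussian base kernel and evaluates the minimal norm objective for a given weight vector α. This is our own illustration, not code from the paper, and the function names are ours.

    import numpy as np

    def gaussian_kernel(X, gamma):
        """Gaussian kernel matrix k_ij = exp(-gamma * ||x_i - x_j||^2); note k_ii = 1."""
        sq = np.sum(X ** 2, axis=1)
        d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
        return np.exp(-gamma * np.maximum(d2, 0.0))

    def build_tilde_kernel(X, y, C, gamma):
        """Transformed kernel of Eq. (3): k~_ij = y_i y_j k_ij + y_i y_j + delta_ij / C."""
        K = gaussian_kernel(X, gamma)
        yy = np.outer(y, y)                          # y_i * y_j
        return yy * K + yy + np.eye(len(y)) / C      # Kronecker delta term on the diagonal

    def mnp_objective(alpha, K_tilde):
        """Objective of Eq. (4): sum_ij alpha_i alpha_j k~_ij, i.e. ||c||^2 in feature space."""
        return float(alpha @ K_tilde @ alpha)

    # Toy usage on a small random binary problem.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 2))
    y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)
    K_tilde = build_tilde_kernel(X, y, C=10.0, gamma=1.0)
    alpha = np.full(20, 1.0 / 20)                    # a feasible point of the simplex
    print(mnp_objective(alpha, K_tilde))             # ||c||^2 at the simplex barycenter

Note that the diagonal of K̃ equals τ + 1 + 1/C, which is the constant norm ‖x̃_i‖² used later when relating the MNSVM and BVM selection criteria.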
B. Steps of the Minimal Norm SVM Algorithm

Algorithm 1 contains simplified pseudocode of the MNSVM method.

Algorithm 1 Minimal Norm SVM algorithm
Require: ε ∈ (0, 1) {used in the stopping criterion}
Ensure: c = Σ_{i=1}^m α_i x̃_i {the ε-approximation of the point closest to the origin}
 1: α ← 0, α_0 ← 1
 2: repeat
 3:   X_r ← random subset of X
 4:   v ← arg min_{i: x̃_i ∈ X_r ∧ V(x̃_i) > ε} x̃_i · c
 5:   u ← arg max_{i: α_i > 0} x̃_i · c
 6:   if v ≠ ∅ then
 7:     β̂ ← c · (x̃_u − x̃_v) / ‖x̃_u − x̃_v‖²
 8:     β ← min(β̂, α_u)
 9:     α_v ← α_v + β
10:     α_u ← α_u − β
11:   end if
12: until v = ∅

As a result of the algorithm we obtain a vector c = Σ_{i=1}^m α_i x̃_i, belonging to the convex hull spanned by the data points x̃_i, that is closest to the origin. The MNSVM algorithm is divided into several stages. First, the vector α is initialized. Then the selection of the violating vectors, followed by an update of the current solution, is performed within a loop. The loop ends when the stopping criterion is satisfied.
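The following sketch is our reading of Algorithm 1, expressed entirely through the kernel matrix K̃ of (3)-(4): the projections x̃_i · c become entries of K̃α and ‖c‖² = αᵀK̃α. The violation measure V(x̃_i) is the one defined in the stopping-criterion paragraph below; the subset size and the selection of x̃_u among the current support vectors follow our interpretation of the pseudocode, not the authors' released implementation.

    import numpy as np

    def mnsvm_mdm(K_tilde, eps=1e-3, subset_size=59, max_iter=100_000,
                  alpha0=None, rng=None):
        """MDM-style iteration for the minimal norm problem (4)-(5); our sketch of
        Algorithm 1, not the authors' code. Returns weights alpha with
        sum(alpha) = 1 and alpha >= 0."""
        rng = np.random.default_rng(rng)
        m = K_tilde.shape[0]
        if alpha0 is None:
            alpha = np.zeros(m)
            alpha[0] = 1.0                        # line 1: start from a single vertex
        else:
            alpha = np.asarray(alpha0, dtype=float).copy()
        g = K_tilde @ alpha                       # g_i = x~_i . c, kept incrementally
        for _ in range(max_iter):
            c2 = float(alpha @ g)                 # ||c||^2
            # line 3: random subset (subset size is our choice, not stated here)
            subset = rng.choice(m, size=min(subset_size, m), replace=False)
            # line 4: candidates with V(x~_i) > eps, i.e. x~_i . c < (1 - eps)||c||^2
            viol = subset[g[subset] < (1.0 - eps) * c2]
            if viol.size == 0:                    # line 12: no violator found -> stop
                break
            v = viol[np.argmin(g[viol])]
            # line 5 (our reading): the "behind" violator among current support vectors
            sv = np.flatnonzero(alpha > 0)
            u = sv[np.argmax(g[sv])]
            # lines 7-8: optimal step along x~_v - x~_u, clipped so alpha_u stays >= 0
            denom = K_tilde[v, v] + K_tilde[u, u] - 2.0 * K_tilde[v, u]
            beta = min((g[u] - g[v]) / denom, alpha[u])
            # lines 9-10: move weight beta from u to v and update the projections
            alpha[v] += beta
            alpha[u] -= beta
            g += beta * (K_tilde[:, v] - K_tilde[:, u])
        return alpha

Given a matrix K̃ built as in the previous sketch, a call such as alpha = mnsvm_mdm(K_tilde) returns the weights α of (4); the support vectors are the points with non-zero α_i.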
Figure 1. Update step of the MNSVM algorithm.

1) Initialization: The initialization of the MNSVM algorithm is almost the same as in the BVM method. A random vector x̃_i is chosen (for example, the one with index 0, as shown in line 1 of Algorithm 1) and its weight is set to 1. In contrast to BVM, our approach does not require an initial estimate of the enclosing ball radius. This is a very important improvement that eliminates the possibility of an inaccurate radius estimate, which is very likely to occur when the number of data points m is small or the dimensionality of the feature space Φ̃ is low.

2) Selection of the Violating Vectors: In the BVM algorithm, during each iteration a violating vector x̃_v lying outside of the enclosing ball is found. In MNSVM, however, there are two types of violators: x̃_v and x̃_u. The violator x̃_v is a vector lying "in front" of the current solution c, and x̃_u is a support vector lying "behind" it (see Figure 1 for clarification). It is worth mentioning that, for the minimal norm problem, all support vectors constituting the solution must fulfill x̃_i · c = ‖c‖². In other words, the support vectors must lie neither "in front" of nor "behind" the solution c.

Although the interpretation of the violating vectors is different for the two algorithms, it turns out that in both approaches the vectors x̃_v are literally the same. One of the assumptions of the BVM method is that the kernel is normalized and all vectors are mapped onto a sphere. Therefore, in the equation describing the distance from the center of the enclosing ball to a vector x̃_i,

    \|\mathbf{z}-\tilde{\mathbf{x}}_i\|^2 = \|\mathbf{z}\|^2 - 2\,\tilde{\mathbf{x}}_i\cdot\mathbf{z} + \|\tilde{\mathbf{x}}_i\|^2,    (6)

both ‖z‖ and ‖x̃_i‖² = τ + 1 + 1/C are constant for all vectors. This means that for two different samples x_i the difference between their distances from the center depends only upon the term x̃_i · c (please note that the center of the enclosing ball z used in the BVM algorithm and the vector c used in the MNSVM method are equivalent and can be used interchangeably). Hence, the criteria x̃_v = arg min_{i: x̃_i ∈ X_r} x̃_i · c and x̃_v = arg max_{i: x̃_i ∈ X_r} ‖z − x̃_i‖, used for selecting the violator x̃_v in the MNSVM and BVM algorithms respectively, are equivalent.

3) Update Procedure: In the BVM algorithm the update of the current solution c was performed by shifting it along the vector x̃_v − z. In MNSVM it is implemented as a translation of the current solution c parallel to the line connecting the violators x̃_v and x̃_u, in such a way that the norm of c is minimized. Knowing that the updated solution is equal to

    \mathbf{c}' = \mathbf{c} + \beta(\tilde{\mathbf{x}}_v - \tilde{\mathbf{x}}_u),    (7)

it is possible to show that the optimal value of β that minimizes ‖c′‖² is equal to

    \hat{\beta} = \frac{\mathbf{c}\cdot(\tilde{\mathbf{x}}_u - \tilde{\mathbf{x}}_v)}{\|\tilde{\mathbf{x}}_u - \tilde{\mathbf{x}}_v\|^2}.    (8)

In order to satisfy the non-negativity conditions for the weights α_i, the value of β must be limited from above by α_u, therefore β = min(β̂, α_u).

4) Stopping criterion: In MNSVM, instead of using a stopping criterion based on the radius of the enclosing ball, it is possible to apply one of the criteria suggested by López [21]. Our algorithm stops when further progress is not profitable and the current solution c approximates the true solution well enough; namely, when the probability of finding a vector x̃_i satisfying the condition V(x̃_i) > ε is small. The function V(x̃_i) = (‖c‖² − x̃_i · c) / ‖c‖² shows to what extent a given vector x̃_i violates the current solution c, and ε is the parameter of the stopping criterion. If the solution is optimal, then V(x̃_i) ≤ 0 for all x̃_i. Although the algorithm finds just an approximate solution, one can control the accuracy of this approximation using the parameter ε: the smaller its value, the more accurate the solution. Unfortunately, decreasing the value of ε increases the time required to find the solution. It is also possible to use the same stopping criterion as in the BVM algorithm; this is a valid approach if the two algorithms are to be compared.

C. Performance tweaks

Additional speedup was obtained by using the "probabilistic speedup" approach proposed by Smola and Schölkopf [22]. Instead of finding the worst violator, which requires the calculation of V(x̃_i) for all x_i ∈ X, only a small subset X_r ⊂ X is used. Unfortunately, while this technique yields a significant speedup, it slightly decreases the accuracy of the models.

Similarly as in the BVM algorithm, we use the multi-scale approach. Briefly speaking, the procedure starts from a relatively large working tolerance ε̂ (e.g., equal to 1/2) and in each iteration tries to find a violator x̃_v. If it is not possible to find a violator, the value of ε̂ is decreased and the search for a violating vector is repeated. The decreasing of ε̂ is repeated until its value is smaller than the original value ε, at which point the algorithm stops. The justification for the multi-scale approach is an increase in the accuracy of the obtained models together with a small improvement in time performance.
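The multi-scale schedule described above can be combined with the MDM iteration as in the sketch below, which assumes the mnsvm_mdm function from the earlier sketch (the random subset inside it already provides the probabilistic speedup). The halving factor is our assumption; the paper only states that ε̂ starts at a large value such as 1/2 and is decreased whenever no violator can be found, until it drops below the target ε.

    def mnsvm_multiscale(K_tilde, eps_target=1e-3, eps_start=0.5, shrink=0.5, rng=None):
        """Multi-scale driver (our sketch, assuming mnsvm_mdm from the previous sketch):
        run the MDM iteration with a loose working tolerance eps_hat and tighten it,
        warm-starting from the previous weights, until eps_hat drops below eps_target."""
        alpha = None
        eps_hat = eps_start
        while eps_hat >= eps_target:
            alpha = mnsvm_mdm(K_tilde, eps=eps_hat, alpha0=alpha, rng=rng)  # warm start
            eps_hat *= shrink      # decrease the working tolerance (factor is assumed)
        return alpha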
III. DATASETS AND EXPERIMENTAL ENVIRONMENT

All results presented in this section were obtained using the double cross-validation procedure [23]. Double cross-validation is a very rigorous scheme for assessing a classification model's performance. It ensures that the class labels of the test data are not seen when tuning the hyper-parameters, which is consistent with real-world application scenarios. In our experiments we used five folds in the outer loop of the nested cross-validation procedure.

The datasets were first normalized by a linear transformation of the feature values into the [0, 1] range. Then the training process, which involved searching for the best model parameters using a grid search method, was performed. The parameters were selected among 64 possible combinations of the regularization parameter C and the coefficient γ of the Gaussian kernel φ(x_i) · φ(x_j) = exp(−γ‖x_i − x_j‖²). There were eight possible values of the C parameter, C ∈ {4^n}, n = −2, …, 5, and eight possible γ values, γ ∈ {4^n}, n = −5, …, 2.

The tolerance parameter ε used in the stopping criterion was set to ε = 10⁻³ for the L1 SVM and L2 SVM algorithms. As shown in [4], for SMO-based algorithms the value of ε does not affect accuracy or time performance significantly. In other words, decreasing ε does not improve accuracy, nor does a reasonable increase of its value speed up the training procedure in a way that could change the results substantially. Therefore, we believe that this setting is a good trade-off between accuracy and the time required to train the model. In order to perform a fair comparison between the BVM and MNSVM algorithms, we decided to use the same stopping criterion for both. The value of ε was calculated using a heuristic proposed by Tsang, which permitted us to estimate the parameter ε based on the value of the regularization coefficient C. In our case, the values of ε were in the range [10⁻⁶, 10⁻³], depending on the value of C (smaller ε for larger C). More information regarding the properties of the parameter ε for both methods can be found in [24], [25].

The selection of the best model parameters was done using 5-fold cross-validation applied to the previously selected training sets. After the best parameters were chosen, one additional SVM model was trained using the entire training dataset. This model was then assessed on the test dataset.

We used (almost) the same datasets as in [7], where the BVM algorithm was introduced. However, the experimental environment in [7] was not as strict as the double CV used here. Similarly to Tsang's work, we applied the one-versus-one scheme for multi-class classification problems.
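The experiments themselves were run with the LIBSVM and LibCVM packages; purely as an illustration of the double cross-validation protocol described above (outer 5-fold estimation, inner 5-fold grid search over C ∈ {4⁻², …, 4⁵} and γ ∈ {4⁻⁵, …, 4²}, features scaled into [0, 1]), a scikit-learn sketch might look as follows. The dataset and solver here are stand-ins, not the ones used in the paper.

    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import MinMaxScaler
    from sklearn.svm import SVC

    # Hyper-parameter grid used in the paper: 8 values of C and 8 values of gamma.
    param_grid = {
        "svc__C": [4.0 ** n for n in range(-2, 6)],      # C in {4^-2, ..., 4^5}
        "svc__gamma": [4.0 ** n for n in range(-5, 3)],  # gamma in {4^-5, ..., 4^2}
    }

    # Pipeline: scale features into [0, 1], then a Gaussian-kernel SVM
    # (SVC uses one-vs-one for multi-class problems by default).
    model = make_pipeline(MinMaxScaler(), SVC(kernel="rbf", tol=1e-3))

    inner_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
    search = GridSearchCV(model, param_grid, cv=inner_cv, n_jobs=-1)

    # Outer loop: accuracy is estimated on folds never seen during parameter tuning.
    X, y = load_digits(return_X_y=True)   # small stand-in for the datasets in Table I
    scores = cross_val_score(search, X, y, cv=outer_cv)
    print("nested CV accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))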
Dataset      # of classes   Dim.   Total # of patterns   Avg. # of training patterns
optdigits    10             64     5,620                  719
satimage     6              36     6,435                  1,373
usps         10             256    9,298                  1,190
pendigits    10             16     10,992                 1,407
reuters      2              8315   11,069                 7,084
letter       26             16     20,000                 985
adult        2              123    48,842                 31,259
w3a          2              300    49,749                 31,839
shuttle      7              7      58,000                 10,606
web (w8a)    2              300    64,700                 41,408
ijcnn1       2              22     141,691                90,682
intrusion    2              127    5,209,460              3,334,054

Table I. Datasets used in the experiments: number of classes, number of features (dimension), total number of patterns, and average number of patterns used in one training run (within a nested 5 × 5 cross-validation and with pairwise classification for multi-class datasets).
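The "Avg. # of training patterns" column appears consistent with the protocol: one pairwise run inside the nested 5 × 5 cross-validation sees roughly N · (4/5) · (4/5) · (2/k) patterns for a k-class dataset of N patterns, assuming roughly balanced classes. A quick check of a few rows (our own illustration):

    # Rough consistency check of Table I (our reading of the protocol).
    datasets = {            # name: (n_classes, total_patterns, reported_average)
        "optdigits": (10, 5620, 719),
        "satimage": (6, 6435, 1373),
        "letter": (26, 20000, 985),
        "adult": (2, 48842, 31259),
    }
    for name, (k, n, reported) in datasets.items():
        estimate = n * 0.8 * 0.8 * (2.0 / k)   # outer 4/5, inner 4/5, one class pair
        print(f"{name:10s} estimate={estimate:9.0f} reported={reported}")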
All the datasets can be downloaded from the LIBSVM² and LibCVM³ sites. Our experiments were performed using a computer cluster equipped with Intel Xeon E5520 CPUs. The double cross-validation was performed by parallel execution of the independent training and testing processes. We used the LIBSVM [2] package as the reference implementation of the L1 and L2 SVMs based on the SMO approach. The BVM implementation is taken from the LibCVM [5], [7] software.

² Available at http://www.csie.ntu.edu.tw/~cjlin/libsvm/
³ Available at http://c2inet.sce.ntu.edu.sg/ivor/cvm.html

IV. PERFORMANCE OF MNSVM

Figure 2 shows the total time of the nested cross-validation procedure. The results allow us to conclude that MNSVM is approximately two times more efficient than BVM. In the case of the w3a and web datasets our approach is even more than 35 times faster.

Figure 2. Total nested cross validation time.

Note that we do not present results for the L1
and L2 SVM obtained for the intrusion dataset. The reason is that both SVM algorithms based on SMO were unable to complete the training within a reasonable time. Since the algorithms were not able to finish cross-validation even for a single set of parameters C and γ, we were forced to abort their training (after approximately 60 hours). A simple calculation leads to a rough estimate that the learning process would not have finished within 160 days, which is a huge difference compared to the 7 days required by MNSVM.

If we analyze the training times for the optimal parameters (see Figure 3), we see that the MNSVM algorithm is almost always faster than its competitors. It loses to L1 SVM only for shuttle and adult (where BVM is slightly faster as well). Usually, MNSVM is approximately two times faster than the fastest of the SMO-based algorithms, L1 SVM. In the case of the web dataset a sevenfold speedup was achieved.
Figure 3. Training time for optimal parameters.

Figure 4 presents the accuracies obtained for all datasets during the nested cross-validation. It can be readily seen that the results of all four models are very similar in terms of accuracy. The error rate for MNSVM is usually higher than that of the L1 and L2 SVM, but the difference is almost always smaller than 0.5% (excluding the adult dataset⁴). Within the standard error range of our measurements, a significant difference can be observed for the datasets usps, letter, adult, shuttle and ijcnn1; here, both the BVM and MNSVM algorithms have an evidently higher error rate. Compared to MNSVM, BVM resulted in significantly better models only for the datasets letter and adult; on the other hand, MNSVM was more accurate for the usps dataset. Since both algorithms optimize the same cost function, we strongly believe that further experiments will lead to the conclusion that the accuracy of both algorithms is the same. Fortunately, the small differences in accuracy are compensated by the shorter training time required by our geometrical method.

⁴ The reason for that is the too large value of the tolerance parameter ε. We obtained competitive accuracy after decreasing its value.

Figure 4. Accuracy obtained during nested cross validation.

The average percent of support vectors⁵ is shown in Figure 5. All the models obtained by the different algorithms have a similar number of support vectors. It can be observed that the models created by the MNSVM approach are always smaller than the ones generated by the BVM algorithm. The reason for that is the way the update step is performed: the MNSVM algorithm makes it possible to reduce the number of support vectors (by decreasing the weight α_u of the violator x̃_u to 0), whereas in the case of the BVM algorithm it is impossible to remove a support vector from the coreset.

⁵ By the percent of support vectors we mean the ratio of the number of non-zero coefficients α_i to the total number of training patterns used in one-vs-one training.

Figure 5. Average percent of support vectors obtained in one-vs-one training.
V. CONCLUSIONS

We presented a novel L2 SVM classification approach called the Minimal Norm SVM that aims at classifying very large datasets. It was shown that it achieves a significant performance improvement compared to its predecessor, the Ball Vector Machine, and is competitive with other modern SVM solvers such as the ones implemented in the LIBSVM software. While achieving a significant speedup, MNSVM still attains accuracy comparable to the other presented approaches. The tests were performed using the double cross-validation procedure, and therefore the classification error estimates are obtained on samples not seen by the classifiers during the training phase. By using such rigorous experimental tools we can ensure that the obtained accuracy estimates are close to the ones that would be obtained in real-life applications. Further, we would like to emphasize the fact that MNSVM usually generates smaller models than the BVM algorithm, which is a consequence of the efficient elimination of support vectors during the update step. In conclusion, for massive datasets, when the number of training samples goes beyond a few million, MNSVM seems to be the only alternative able to handle classification tasks in a reasonable time.
REFERENCES

[1] T. Joachims, "Making large-scale support vector machine learning practical," in Advances in Kernel Methods. MIT Press, 1999, pp. 169–184.

[2] C.-C. Chang and C.-J. Lin, "LIBSVM: A library for support vector machines," ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, pp. 27:1–27:27, 2011.

[3] Q. Li, R. Salman, E. Test, R. Strack, and V. Kecman, "GPUSVM: a comprehensive CUDA based support vector machine package," Central European Journal of Computer Science, vol. 1, no. 4, pp. 387–405, 2011.

[4] Q. Li, "Fast parallel machine learning algorithms for large datasets using Graphic Processing Unit," Ph.D. dissertation, Virginia Commonwealth University, 2011.

[5] I. W. Tsang, J. T. Kwok, and P.-M. Cheung, "Core Vector Machines: Fast SVM Training on Very Large Data Sets," Journal of Machine Learning Research, vol. 6, pp. 363–392, 2005.

[6] K. P. Bennett and E. J. Bredensteiner, "Duality and Geometry in SVM Classifiers," in Proc. 17th International Conf. on Machine Learning, 2000, pp. 57–64.

[7] I. W. Tsang, A. Kocsor, and J. T. Kwok, "Simpler core vector machines with enclosing balls," in Proceedings of the 24th International Conference on Machine Learning (ICML '07). New York, NY, USA: ACM Press, 2007, pp. 911–918.

[8] Z. Liu, J. Liu, and Z. Chen, "A generalized Gilbert's algorithm for approximating general SVM classifiers," Neurocomputing, vol. 73, no. 1-3, pp. 219–224, Dec. 2009.

[9] J. C. Platt, "Sequential minimal optimization: A fast algorithm for training support vector machines," Advances in Kernel Methods - Support Vector Learning, vol. 208, no. MSR-TR-98-14, pp. 1–21, 1998.

[10] S. S. Keerthi, S. K. Shevade, C. Bhattacharyya, and K. K. Murthy, "A fast iterative nearest point algorithm for support vector machine classifier design," IEEE Transactions on Neural Networks, vol. 11, no. 1, pp. 124–136, Jan. 2000.

[11] V. Franc and V. Hlaváč, "An iterative algorithm learning the maximal margin classifier," Pattern Recognition, vol. 36, no. 9, pp. 1985–1996, Sep. 2003.

[12] D. J. Crisp and C. J. C. Burges, "A Geometric Interpretation of nu-SVM Classifiers," in Advances in Neural Information Processing Systems, vol. 12, 2000, pp. 223–229.

[13] M. E. Mavroforakis and S. Theodoridis, "A geometric approach to support vector machine (SVM) classification," IEEE Transactions on Neural Networks, vol. 17, no. 3, pp. 671–682, May 2006.

[14] I. W. Tsang, J. T. Kwok, and P.-M. Cheung, "Very large SVM training using core vector machines," in Proc. 10th Int. Workshop on Artificial Intelligence and Statistics, 2005, pp. 349–356.

[15] M. Badoiu and K. L. Clarkson, "Smaller core-sets for balls," in Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, 2003, pp. 1–2.

[16] R. Strack, V. Kecman, B. Strack, and Q. Li, "Sphere Support Vector Machines for large classification tasks," Neurocomputing, Aug. 2012, DOI: 10.1016/j.neucom.2012.07.025.

[17] J. López, A. Barbero, and J. R. Dorronsoro, "An MDM solver for the nearest point problem in Scaled Convex Hulls," in Neural Networks (IJCNN), The 2010 International Joint Conference on. IEEE, 2010, pp. 1–8.

[18] B. N. Kozinec, "Recurrent algorithm separating convex hulls of two sets," Learning Algorithms in Pattern Recognition, pp. 43–50, 1973.

[19] B. F. Mitchell, V. F. Demyanov, and V. N. Malozemov, "Finding the point of a polyhedron closest to the origin," SIAM Journal on Control, vol. 12, pp. 19–26, 1974.

[20] A. Barbero, J. López, and J. R. Dorronsoro, "An accelerated MDM algorithm for SVM training," in Advances in Computational Intelligence and Learning, Proceedings of the ESANN 2008 Conference, Apr. 2008, pp. 421–426.

[21] J. López, "On the Relationship among the MDM, SMO and SVM-Light Algorithms for Training Support Vector Machines," Ph.D. dissertation, 2008.

[22] A. J. Smola and B. Schölkopf, "Sparse Greedy Matrix Approximation for Machine Learning," in Proceedings of the Seventeenth International Conference on Machine Learning, 2000, pp. 911–918.

[23] S. Varma and R. Simon, "Bias in error estimation when using cross-validation for model selection," BMC Bioinformatics, vol. 7, p. 91, 2006.

[24] G. Loosli and S. Canu, "Comments on the 'Core Vector Machines: Fast SVM training on very large data sets'," Journal of Machine Learning Research, vol. 8, pp. 291–301, 2007.

[25] I. W. Tsang and J. T. Kwok, "Authors' Reply to the 'Comments on the Core Vector Machines: Fast SVM Training on Very Large Data Sets'," Journal of Machine Learning Research, pp. 1–14, 2007.