Enhancing SVM with Visualization

Thanh-Nghi Do and François Poulet

ESIEA Recherche
38, rue des Docteurs Calmette et Guérin
Parc Universitaire de Laval-Changé
53000 Laval - France
{dothanh,poulet}@esiea-ouest.fr
Abstract. Understanding the result produced by a data-mining algorithm is as important as its accuracy. Unfortunately, support vector machine (SVM) algorithms behave as a "black box": they classify the data efficiently and with good accuracy, but provide only the support vectors as output. This paper presents a cooperative approach using SVM algorithms and visualization methods to gain insight into the model construction task with SVM. We show how the user can interactively use cooperative tools to support the construction of SVM models and to interpret them. A pre-processing step is also used for dealing with large datasets. The experimental results on Delve, Statlog, UCI and bio-medical datasets show that our cooperative tool is comparable to the automatic LibSVM algorithm, but gives the user a better understanding of the obtained model.
1 Introduction

The SVM algorithms proposed by Vapnik [22] are a well-known class of data-mining algorithms based on the idea of kernel substitution. SVM and kernel-related methods have been shown to build accurate models, but the support vectors found by the algorithms provide limited information. Most of the time, the user only obtains information regarding the support vectors and the accuracy. It is impossible to explain, or even to understand, why a model constructed by SVM predicts better than many other algorithms. Yet understanding the model obtained by the algorithm is as important as its accuracy: a good comprehension of the discovered knowledge can help the user reduce the risk of wrong decisions. Very few papers have been published about methods trying to explain SVM results ([3], [20]).

Our investigation aims at using visualization methods to involve the user more intensively in the construction of the SVM model and to help explain its results. A new cooperative method, based on a set of different visualization techniques and the large-scale Mangasarian SVM algorithms [10], [16], gives insight into the classification task with SVM. We will illustrate how to combine the strengths of different visualization methods with automatic SVM algorithms to help the user and improve the comprehensibility of SVM models. The experimental performance of this approach is evaluated on the Delve [8], Statlog [18], UCI [2] and bio-medical [13] datasets. The results show that our cooperative method is comparable with LibSVM (a high-performance automatic SVM algorithm [4]). We also use a pre-processing step to deal with very large datasets.
Feature selection with a 1-norm SVM [11] can select a small subset from a very large number of dimensions (thousands of dimensions). When the number of data points is very large, we sample the data points from clusters created by the SOM [15] or K-means [17] algorithms to reduce the dataset size; our tool then works only on this reduced dataset (a code sketch of this pre-processing step is given at the end of this section). A case study on the UCI Forest cover type dataset shows that this approach provides an accurate model.

In section 2, we briefly introduce supervised classification with SVM algorithms. In section 3, we present the cooperative algorithm using visualization methods and SVM algorithms for classification tasks. In section 4, we propose a multiple-view approach based on different visualization methods to interpret SVM results. We present the experimental results in section 5, before the conclusion and future work.
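Below is a minimal, hedged sketch of the two reduction ideas in the pre-processing step described above. It is not the authors' implementation: it assumes scikit-learn is available, uses LinearSVC with an l1 penalty as a stand-in for the 1-norm SVM of [11] and KMeans in place of SOM, and all function names, parameters and sample sizes are illustrative choices of ours.

```python
# Sketch of the pre-processing step: 1-norm feature selection, then
# cluster-based sampling (assumes scikit-learn; stand-ins, not the
# authors' code).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def select_features_l1(X, y, C=1.0):
    """Keep only the dimensions given a non-zero weight by a 1-norm SVM."""
    svm = LinearSVC(penalty='l1', dual=False, C=C).fit(X, y)
    mask = np.any(svm.coef_ != 0, axis=0)  # the l1 penalty zeroes out weights
    return X[:, mask], mask

def sample_from_clusters(X, y, n_clusters=100, per_cluster=10, seed=0):
    """Reduce the number of data points by sampling from K-means clusters."""
    rng = np.random.default_rng(seed)
    labels = KMeans(n_clusters=n_clusters, random_state=seed).fit_predict(X)
    idx = []
    for c in range(n_clusters):
        members = np.flatnonzero(labels == c)
        take = min(per_cluster, members.size)
        idx.extend(rng.choice(members, size=take, replace=False))
    idx = np.asarray(idx)
    return X[idx], y[idx]
```

Running the feature selection first and the cluster sampling second would yield a dataset reduced in both dimensions and points, which is the kind of reduced dataset the visualization tool then works on.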
2 Support Vector Machines

Let us consider a linear binary classification task, as depicted in figure 1, with m data points x_i (i = 1, …, m) in the n-dimensional input space R^n, each having a corresponding label y_i = ±1.
Fig. 1. Linear separation of the data points into two classes.
For this problem, the SVM tries to find the best separating plane, i.e., the one furthest from both class +1 and class −1. To do so, it maximizes the distance, or margin, between the supporting planes of the two classes (x·w − b = +1 for class +1, x·w − b = −1 for class −1). The margin between these supporting planes is 2/||w|| (where ||w|| is the 2-norm of the vector w). Any point falling on the wrong side of its supporting plane is considered to be an error. The SVM therefore has to simultaneously maximize the margin and minimize the error. The standard SVM formulation with a linear kernel is given by the following quadratic program (1):
min Ψ(w, b, z) = (1/2)||w||² + C ∑_{i=1}^{m} z_i
s.t. y_i(w·x_i − b) + z_i ≥ 1, z_i ≥ 0 (i = 1, …, m)     (1)
where the slack variables z_i ≥ 0 and the constant C > 0 are used to tune the errors and the margin size. The plane (w, b) is obtained by solving the quadratic program (1). The classification function for a new data point x, based on this plane, is then:

f(x) = sign(w·x − b)

SVM can use other classification functions, for example a polynomial function of degree d, an RBF (Radial Basis Function) or a sigmoid function. To change from a linear to a non-linear classifier, one must only substitute a kernel evaluation in (1) for the original dot product. More details about SVM and other kernel-based learning methods can be found in [1].

Recent developments for massive linear SVM algorithms proposed by Mangasarian [10], [16] reformulate the classification as an unconstrained optimization problem; these algorithms thus require only the solution of a system of linear equations in (w, b) instead of a quadratic program. If the dimension of the input space is small enough (less than 10^4), the new SVM algorithms can classify even millions of data points in minutes on a Pentium. The algorithms can also deal with non-linear classification tasks; however, the m×m kernel matrix then requires a very large memory size and execution time. The reduced support vector machine (RSVM) [16] instead creates a rectangular m×s kernel matrix with s << m, built from only a small subset of the data points.
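To make program (1) and the kernel substitution concrete, here is a minimal sketch using scikit-learn's SVC. Note the assumptions: SVC solves an equivalent soft-margin formulation by quadratic programming (not the unconstrained reformulation of [10], [16]), and the toy data and parameter values are ours.

```python
# Fitting program (1) with a linear kernel and applying f(x) = sign(w.x - b)
# (assumes scikit-learn; SVC solves an equivalent soft-margin formulation).
import numpy as np
from sklearn.svm import SVC

# Toy data: m = 6 points in R^2 with labels y_i = +/-1.
X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0],
              [6.0, 5.0], [7.0, 8.0], [8.0, 6.0]])
y = np.array([-1, -1, -1, +1, +1, +1])

clf = SVC(kernel='linear', C=1.0).fit(X, y)  # C tunes errors vs. margin size
w = clf.coef_[0]           # normal vector of the separating plane
b = -clf.intercept_[0]     # scikit-learn uses w.x + intercept; the paper, w.x - b

x_new = np.array([5.0, 5.0])
print('f(x)   =', np.sign(w @ x_new - b))    # classification function
print('margin =', 2.0 / np.linalg.norm(w))   # margin 2/||w|| between planes

# Kernel substitution: replacing the dot product, e.g. by an RBF kernel,
# turns the linear classifier into a non-linear one.
clf_rbf = SVC(kernel='rbf', C=1.0).fit(X, y)
print(clf_rbf.predict([x_new]))
```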