IEEE International Conference On Recent Trends In Electronics Information Communication Technology, May 20-21, 2016, India
An Assessment of Support Vector Machine Kernel Parameters using Remotely Sensed Satellite Data

Vikas Sharma, Diganta Baruah, Dibyajyoti Chutia, PLN Raju, DK Bhattacharya

Abstract—This paper reviews the comparative performance of the Support Vector Machine (SVM) using four different kernels, i.e., linear, polynomial, Radial Basis Function (RBF) and sigmoid. Overall Accuracy (OA), Kappa Index Analysis (KIA), Receiver Operating Characteristic (ROC) and Precision (P) have been considered as evaluation parameters in order to assess the predictive accuracy of the SVM. Both high resolution QuickBird sensor data and moderate resolution Landsat Enhanced Thematic Mapper Plus (ETM+) remotely sensed satellite data have been used in the investigation. It was observed that the SVM with the polynomial kernel (SVM-Poly) achieves the highest classification accuracy, followed by the SVM with the linear kernel (SVM-Linear), the SVM with the RBF kernel (SVM-RBF) and the SVM with the sigmoid kernel (SVM-Sig), while classifying the QuickBird data. On the other hand, SVM-RBF achieves the highest accuracy, followed by SVM-Poly, SVM-Linear and SVM-Sig, in the case of the Landsat ETM+ data. However, SVM-RBF incurs far greater computational expense in classifying both the QuickBird and the Landsat ETM+ data as compared to the SVMs with the other three kernels. SVM-Poly was found computationally the most efficient, with satisfactory predictive ability, over all the kernels investigated here. The performance of SVM-Sig was found to be very sensitive to the training data and produced inconsistent results when the number of training samples was limited. However, the performance of the SVM-Poly kernel was found to be consistent and not affected by the size of the training data set. Keywords—SVM, Kernels, Satellite data, Classification, Accuracy.
I. Introduction

Classification of Remote Sensing (RS) data is considered an important process for the recognition of spatial features in satellite images. It requires consideration of many factors, i.e., selection of a suitable classification system, training samples, feature extraction and evaluation of classification results [1]. A large number of RS classification techniques have been developed since 1980 in which the pixel value is treated as the basic unit of analysis [2]. The literature on RS data analysis approaches and a few survey articles have pointed out the recent advancements of classification algorithms and associated feature extraction techniques. Chutia et al. [3] categorized RS classification algorithms into six major groups, i.e., (1) supervised, (2) unsupervised, (3) semisupervised, (4) hybrid, (5) ensemble and (6) multistage approaches. Most of the traditional supervised and unsupervised classifiers are parametric. Recently, the non-parametric artificial neural network

Vikas Sharma, Information Technology Department, Sikkim Manipal Institute of Technology, Majitar, East Sikkim-737136, India
Diganta Baruah, Information Technology Department, Sikkim Manipal Institute of Technology, Majitar, East Sikkim-737136, India
Dibyajyoti Chutia, North Eastern Space Applications Centre, Department of Space, Govt. of India, Shillong, Meghalaya-793103, India
PLN Raju, North Eastern Space Applications Centre, Department of Space, Govt. of India, Shillong, Meghalaya-793103, India
DK Bhattacharya, Department of Computer Science, Tezpur University, Napaam, Assam-784028, India
978-1-5090-0774-5/16/$31.00 © 2016 IEEE
(ANN, [4]), decision tree (DT, [5]) and support vector machine (SVM, [6]) approaches are becoming important for the classification of RS data. On the other hand, hybrid or multiple classifiers usually perform better in terms of accuracy than a single classifier [7]. Multistage classification is normally supported by image pre-processing techniques in order to improve the classification performance [3]. Ensemble approaches are gaining popularity in machine intelligence and pattern recognition applications [8]. Specifically, random forests (RF, [9]) have been found to be among the most powerful approaches in the classification of RS data. In remote sensing applications, SVMs have been effectively used for classification and feature extraction. SVMs are currently one of the best known classification techniques, with computational advantages over their counterparts, and they can provide satisfactory results with a minimal number of training samples [6]. However, selection of an appropriate kernel and its parameters is a crucial task in improving the predictive ability of the SVM. Kaya et al. [10] proposed an improved SVM which is applicable to both linearly and non-linearly separable data without choosing any kernel. The SVM with the polynomial kernel was found more appropriate than the SVM with the radial basis function in the classification of Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) data [11]. On the other hand, the SVM with the radial basis kernel was found more effective than the SVM with the linear and polynomial kernels in the classification of multi-temporal polarimetric Synthetic Aperture Radar (SAR) data [12]. In the classification of Landsat Enhanced Thematic Mapper Plus (ETM+) and Terra ASTER images, the SVM with the radial basis function produced comparatively better accuracy than the SVM with the polynomial kernel [13]. A number of studies have thus been reported on the assessment of the SVM classifier with different kernels.
However, a comprehensive assessment of the predictive ability and time complexity of the SVM classifier with the most widely used kernels has not yet been reported. The main objective of this work is to assess the performance of the SVM classifier using four different kernels, i.e., linear, polynomial, Radial Basis Function (RBF) and sigmoid, using a diverse set of evaluation parameters.

II. DATA SETS USED

In this investigation, both moderate resolution and high resolution data sets of the north eastern part of India have been used to assess the behavior of the SVM classifier with different kernels.

A. Landsat ETM+ data set

Five bands of the Landsat ETM+ sensor, covering the spectral range 0.45µm - 1.75µm at a 30m spatial resolution, were considered in the investigation. The test site comprises five land cover classes pertaining to the plain areas of Sonitpur, Assam, India. The information on land cover classes and train-test samples is depicted in Table I, and the Landsat
ETM+ image of Sonitpur, Assam, India is given in Figure 1.

TABLE I. LAND COVER CLASSES AND TRAIN-TEST SAMPLES OF THE LANDSAT ETM+ DATASET

No | Class Name                    | Train | Test
1  | River/Waterbody-Perennial     | 572   | 551
2  | Forest tree clad area         | 1169  | 1033
3  | Agriculture crop land         | 1266  | 1662
4  | Waterbody/River non-perennial | 1071  | 1076
5  | Grass lands                   | 851   | 991
   | Total                         | 4929  | 5313

Fig. 1. Landsat ETM+ image of Sonitpur, Assam, India

B. QuickBird data set

Four bands of the high resolution QuickBird data set, covering the spectral range 0.45µm - 0.90µm at a 0.60m spatial resolution, were used. The test site comprises five land cover classes pertaining to the hilly areas of Shillong, Meghalaya, India. A summary of the land cover classes and train-test samples of the QuickBird data set is presented in Table II, and the QuickBird image of Shillong, Meghalaya, India is presented in Figure 2. The train and test data sets for each of the test sites have been generated independently in order to eliminate bias in the evaluation.

TABLE II. LAND COVER CLASSES AND TRAIN-TEST SAMPLES OF THE QUICKBIRD DATASET

No | Class Name              | Train | Test
1  | Urban residential areas | 400   | 599
2  | Pine trees              | 520   | 655
3  | Shadows                 | 334   | 512
4  | Tree clad areas         | 422   | 664
5  | Bare soil               | 414   | 418
   | Total                   | 2090  | 2848

Fig. 2. QuickBird image of Shillong, Meghalaya, India

III. SUPPORT VECTOR MACHINE

The SVM performs classification by constructing an N-dimensional hyperplane that maximizes the margin between classes in order to obtain the best classification performance. SVMs are based on the idea of hyperplane classifiers, or linear separability [14]. Suppose we have n training data points (x1, y1), (x2, y2), ..., (xn, yn), where each xi belongs to the m-dimensional data space and yi ∈ {-1, +1}. Consider a hyperplane defined by (w, b), where w is a weight vector and b is a bias. We can classify a new object x with

    f(x) = sign(w · x + b)                                  (1)

where w · x + b = 0 represents the hyperplane, and the sign of the decision function gives the predicted class. The data points that lie nearest to the hyperplane, i.e., those at minimum distance from the decision boundary, are called the support vectors (see Figure 3).

Fig. 3. Concept of decision boundary in SVM with a single hyperplane

A limitation of the SVM is that it incurs large computational expense and produces inconsistent results when the data set is characterized by a large feature space and the train data set is limited. This can be overcome by introducing a kernel function in place of the inner product of two transformed data vectors in the feature space. A kernel function is defined as K(xi, xj) = Φ(xi) · Φ(xj), which corresponds to a dot product of two feature vectors in some expanded feature space [6]. The linear, polynomial, RBF and sigmoid kernels have been widely utilized and explored to enhance the performance of the SVM classifier. These kernels can be defined as:

    Linear kernel:      K(xi, xj) = xi^T xj                           (2)
    Polynomial kernel:  K(xi, xj) = (γ xi^T xj + r)^d,  γ > 0         (3)
    RBF kernel:         K(xi, xj) = exp(-γ ||xi - xj||^2),  γ > 0     (4)
    Sigmoid kernel:     K(xi, xj) = tanh(γ xi^T xj + r)               (5)

where xi is a support vector of length m, each yi belongs to one of the two classes (i.e., +1 and -1), and T denotes transposition. γ is the gamma parameter for all kernels except the linear kernel, and d is the degree defined in the polynomial kernel. On the other hand, r is the bias defined in the kernel function of both the polynomial and sigmoid kernels.
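The kernel functions of Eqs. (2)-(5) can be written directly in NumPy. The following is an illustrative sketch; the function names and the parameter values γ=0.5, r=1, d=2 are our own choices for demonstration, not values used in this study.

```python
import numpy as np

def linear_kernel(xi, xj):
    # Eq. (2): K(xi, xj) = xi^T xj
    return xi @ xj

def polynomial_kernel(xi, xj, gamma=0.5, r=1.0, d=2):
    # Eq. (3): K(xi, xj) = (gamma * xi^T xj + r)^d
    return (gamma * (xi @ xj) + r) ** d

def rbf_kernel(xi, xj, gamma=0.5):
    # Eq. (4): K(xi, xj) = exp(-gamma * ||xi - xj||^2)
    return np.exp(-gamma * np.sum((xi - xj) ** 2))

def sigmoid_kernel(xi, xj, gamma=0.5, r=1.0):
    # Eq. (5): K(xi, xj) = tanh(gamma * xi^T xj + r)
    return np.tanh(gamma * (xi @ xj) + r)

x = np.array([1.0, 2.0])
z = np.array([2.0, 0.5])
print(linear_kernel(x, z))      # 3.0
print(polynomial_kernel(x, z))  # (0.5*3 + 1)^2 = 6.25
print(rbf_kernel(x, x))         # identical inputs -> exp(0) = 1.0
```

Note that only the non-linear kernels depend on γ, matching the parameter description above.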
More details on the SVM classifier can be found in Cortes and Vapnik [6]. SVM-Poly was realized with fixed values of the degree d and the parameter γ; SVM-RBF was defined by γ alone; and SVM-Sig was executed with γ and the bias r.

IV. RESULTS AND DISCUSSIONS

OA, KIA, ROC and P have been used for the evaluation of the SVM classifier with the four different types of kernels. In addition, the training time (t, in sec) required to build the model was used to assess the computational complexity of the SVM classifiers. In the classification of the Landsat ETM+ data set, SVM-RBF achieved the highest accuracy (see Table III) with OA=85.34%, KIA=0.81, ROC=0.94 and P=0.87, followed by SVM-Poly (OA=83.34%, KIA=0.79, ROC=0.94 and P=0.86), SVM-Linear (OA=80.78%, KIA=0.75, ROC=0.93 and P=0.84) and SVM-Sig (OA=80.61%, KIA=0.75, ROC=0.92 and P=0.83). Similarly, in the classification of the QuickBird data set, SVM-Poly outperformed all the others (see Table IV) with OA=89.43%, KIA=0.87, ROC=0.96 and P=0.90, followed by SVM-Linear (OA=84.90%, KIA=0.81, ROC=0.95 and P=0.86), SVM-RBF (OA=82.48%, KIA=0.78, ROC=0.94 and P=0.85) and SVM-Sig (OA=81.57%, KIA=0.77, ROC=0.95 and P=0.84). It was observed that the performance of SVM-Poly was comparable with that of SVM-RBF in the classification of the Landsat data set. On the other hand, the performance of SVM-Sig was not satisfactory on either data set (see Tables III and IV). SVM-RBF has been found to produce satisfactory results on many benchmark data sets as compared to the other kernels. However, SVM-RBF requires optimization of the gamma (width) parameter and the complexity parameter (C) in order to achieve higher performance. Implementation of SVM-Linear is relatively simple, and it performs well when the data set has a large number of features, but its performance is highly affected if there is noise in the training data. SVM-Sig requires fitting a sigmoid function to perform the mapping, which increases the training time. Moreover, the accuracy of the probability estimates it yields depends on the efficiency of the estimation process which, as expected, cannot be perfect.

TABLE III. COMPARATIVE EVALUATION OF SVM KERNELS USING THE LANDSAT ETM+ DATASET

SVM classifier | OA (%) | KIA  | ROC  | P    | t (sec)
SVM-Linear     | 80.78  | 0.75 | 0.93 | 0.84 | 10.89
SVM-Poly       | 83.34  | 0.79 | 0.94 | 0.86 |  2.04
SVM-Sig        | 80.61  | 0.75 | 0.92 | 0.83 | 18.02
SVM-RBF        | 85.34  | 0.81 | 0.94 | 0.87 | 24.43

TABLE IV. COMPARATIVE EVALUATION OF SVM KERNELS USING THE QUICKBIRD DATASET

SVM classifier | OA (%) | KIA  | ROC  | P    | t (sec)
SVM-Linear     | 84.90  | 0.81 | 0.95 | 0.86 |  1.00
SVM-Poly       | 89.43  | 0.87 | 0.96 | 0.90 |  0.18
SVM-Sig        | 81.57  | 0.77 | 0.95 | 0.84 |  0.91
SVM-RBF        | 82.48  | 0.78 | 0.94 | 0.85 |  2.63

A. Analysis of computational expenses

The investigation was carried out on a 4-core Intel(R) Xeon(R) workstation. The computational complexity of any classifier depends highly on the training phase. SVMs have been found computationally very effective. The complexity of an SVM depends on the choice of kernel and is typically proportional to the number of support vectors. The training time (t) for each of the SVM classifiers is presented in Table III and Table IV for the Landsat ETM+ and QuickBird data sets respectively. It was observed that SVM-RBF incurs more computational expense in classifying both the Landsat ETM+ (t = 24.43 sec) and the QuickBird (t = 2.63 sec) data as compared to the SVMs with the other three kernels. On the other hand, SVM-Poly was found computationally the most efficient over all the kernels investigated here.

B. Training size versus overall accuracy

A common practice of evaluation is to split the training data set randomly, with 66% used as the train data set and the remaining 34% for test. However, this cannot be treated as an unbiased evaluation, and it produces a higher OA than using independently generated train and test data sets. An investigation was made to assess the performance of the SVM with all the kernels for varied sizes of the train and test data samples. Each of the SVM classifiers was initiated with 10% of the data as the train set (the remainder as test) and executed with increments of 5% of train data up to a maximum of 66%. It was observed that the predictive ability of SVM-Sig was very poor (i.e., OA < 80%) while the train data size was less than 20%. It was also observed that the OA of the SVM with all kernels increased with the size of the training data set. The performance of SVM-Poly was found more consistent compared to the SVM with the other three kernels on both data sets.
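The evaluation parameters OA, KIA and P reported in Tables III and IV are standard confusion-matrix statistics. The sketch below computes them for a small hypothetical 3-class confusion matrix; the matrix values are invented for illustration and are not this study's data.

```python
import numpy as np

def overall_accuracy(cm):
    # OA: correctly classified samples (diagonal) over all samples
    return np.trace(cm) / cm.sum()

def kappa_index(cm):
    # KIA: agreement corrected for chance, kappa = (po - pe) / (1 - pe)
    n = cm.sum()
    po = np.trace(cm) / n                                 # observed agreement
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2   # chance agreement
    return (po - pe) / (1 - pe)

def mean_precision(cm):
    # P: per-class precision (diagonal over column sums), averaged over classes
    return np.mean(np.diag(cm) / cm.sum(axis=0))

# Hypothetical confusion matrix: rows = reference classes, columns = predicted
cm = np.array([[50,  5,  0],
               [ 4, 60,  6],
               [ 1,  5, 69]])
print(round(overall_accuracy(cm), 4))   # 0.895
print(round(kappa_index(cm), 4))
print(round(mean_precision(cm), 4))
```

Kappa is lower than OA because it discounts the agreement expected by chance, which is why KIA is consistently below OA in Tables III and IV.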
Fig. 4. (a) OA of the SVM with each of the kernels versus train size for the Landsat ETM+ data set and (b) OA of the SVM with each of the kernels versus train size for the QuickBird data set
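The experiment behind Figure 4 can be sketched as a loop over increasing train fractions. Since the satellite samples and the SVM implementation used in the study are not reproduced here, the sketch below substitutes synthetic two-class data and a toy linear SVM trained by hinge-loss subgradient descent (Pegasos-style); it illustrates the evaluation protocol only, not the reported results.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    # Synthetic two-class data: two well-separated Gaussian blobs, labels in {-1, +1}
    X = np.vstack([rng.normal(-2, 1, (n // 2, 2)), rng.normal(+2, 1, (n - n // 2, 2))])
    y = np.hstack([-np.ones(n // 2), np.ones(n - n // 2)])
    idx = rng.permutation(n)
    return X[idx], y[idx]

def train_linear_svm(X, y, lam=0.01, epochs=50):
    # Toy linear SVM: subgradient descent on the regularized hinge loss
    w, b, t = np.zeros(X.shape[1]), 0.0, 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            eta = 1.0 / (lam * t)
            if y[i] * (X[i] @ w + b) < 1:     # margin violated: move toward sample
                w = (1 - eta * lam) * w + eta * y[i] * X[i]
                b += eta * y[i]
            else:                             # margin satisfied: shrink weights only
                w = (1 - eta * lam) * w
    return w, b

X, y = make_data(400)
for frac in (0.10, 0.25, 0.50, 0.66):  # growing train split, remainder used as test
    n_train = int(frac * len(y))
    w, b = train_linear_svm(X[:n_train], y[:n_train])
    oa = np.mean(np.sign(X[n_train:] @ w + b) == y[n_train:])
    print(f"train fraction {frac:.2f}: OA = {oa:.3f}")
```

Plotting OA against the train fraction for each kernel would reproduce the shape of Figure 4.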
V. CONCLUSION

An assessment of the SVM classifier using four different types of kernels has been reported here for the classification of Landsat ETM+ and QuickBird data sets. The investigation was carried out using both independently generated train and test data sets and a random split of the training data set, for both data sets. In the case of the high resolution data, the polynomial kernel was found effective in the context of both predictive ability and time complexity. Similarly, for the moderate resolution data, the RBF kernel was found more effective in terms of predictive accuracy, but it incurs considerable computational expense. Optimization of
the parameters defining each of the kernels plays a major role in improving the performance of the SVM classifier.

Acknowledgment

The authors would like to thank the North Eastern Space Applications Centre, Department of Space, Government of India, Umiam, Meghalaya, India for providing the necessary scientific tools and supervision during the investigation. The authors would also like to acknowledge the Sikkim Manipal Institute of Technology, Majitar, Sikkim, India for its support and guidance.

References

[1] Lu, D. & Weng, Q., "A survey of image classification methods and techniques for improving classification performance", International Journal of Remote Sensing, 28(5), 823-870, 2007.
[2] Li, M., Zang, S., Zhang, B. & Wu, S.L.C., "A Review of Remote Sensing Image Classification Techniques: the Role of Spatio-contextual Information", European Journal of Remote Sensing, 47, 389-411, 2014.
[3] Chutia, D., Bhattacharyya, D.K., Sarma, K.K., Kalita, R. & Sudhakar, S., "Hyperspectral Remote Sensing Classifications: A Perspective Survey", Transactions in GIS, doi:10.1111/tgis.12164, 2015.
[4] McCulloch, W. & Pitts, W., "A Logical Calculus of the Ideas Immanent in Nervous Activity", Bulletin of Mathematical Biophysics, 5(4), 115-133, 1943, doi:10.1007/BF02478259.
[5] Hansen, M., Dubayah, R. & DeFries, R., "Classification trees: An alternative to traditional land cover classifiers", International Journal of Remote Sensing, 17, 1075-1081, 1996.
[6] Cortes, C. & Vapnik, V., "Support-vector networks", Machine Learning, 20(3), 273-297, 1995, doi:10.1007/BF00994018.
[7] Chang, Y.L., Fang, J.P., Hsu, W.L., Chang, L. & Liang, W.Y., "Simulated annealing band selection approach for hyperspectral imagery", Journal of Applied Remote Sensing, 4, 2010.
[8] Rokach, L., "Ensemble-based classifiers", Artificial Intelligence Review, 33(1-2), 1-39, 2010.
[9] Breiman, L., "Random Forests", Machine Learning, 45(1), 5-32, 2001.
[10] Kaya, G.T., Ersoy, O.K. & Kamaşak, M.E., "Support Vector Selection and Adaptation for Remote Sensing Classification", IEEE Transactions on Geoscience and Remote Sensing, 49(6), 2011.
[11] Akbari, E., Amiri, N. & Azizi, H., "Remote Sensing and Land Use Extraction for Kernel Functions Analysis by Support Vector Machines with ASTER Multispectral Imagery", Iranian Journal of Earth Sciences, 4, 75-84, 2012.
[12] Yekkehkhany, B., Safari, A., Homayouni, S. & Hasanlou, M., "A Comparison Study of Different Kernel Functions for SVM-based Classification of Multi-temporal Polarimetry SAR Data", The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, XL-2/W3, 281-285, 2014.
[13] Kavzoglu, T. & Colkesen, I., "A kernel functions analysis for support vector machines for land cover classification", International Journal of Applied Earth Observation and Geoinformation, 11(5), 352-359, 2009.
[14] Vapnik, V., "The Nature of Statistical Learning Theory", NY: Springer-Verlag, 1995.