Silvilaser 14th - 17th September 2010, Freiburg - Session 3
A comparison of support vector and linear classification of tree species
Johannes Heinzel§*, Olaf Ronneberger‡ and Barbara Koch§
[email protected]
§University of Freiburg, Department of Remote Sensing and Landscape Information Systems, 79106 Freiburg, Germany
‡University of Freiburg, Department of Computer Science and Centre for Biological Signalling Studies (BIOSS), 79110 Freiburg, Germany
Abstract Data from full-waveform airborne LiDAR are becoming increasingly available. At the same time, high-dimensional feature spaces are increasingly used for classification in remote sensing. For such data sets, advanced classification techniques are often regarded as more accurate than less complex methods. This paper presents an empirical comparison of classification results from linear discriminant analysis (LDA) and support vector machines (SVM). For this purpose we used a comprehensive set of reflection-specific full-waveform LiDAR features. Altogether 231 features were available, which are compositions of the geometric reflection properties, their statistical distribution, the position within the laser beam and the spatial position. The LDA was conducted with a stepwise inclusion of variables to obtain a ranking of their importance. SVMs were tested with different configurations of the multiclass classification method and the kernel. The SVM parameters were optimized using a grid search. No generally accepted method for deriving a nonlinear variable ranking from an SVM is available so far. Therefore, both classifiers were additionally run with the most important variables from the stepwise LDA and compared with runs using all variables. Verification was done by a 24-fold cross-validation in which complete, independent sample plots were left out. Using the complete feature set, SVM outperforms LDA for the four main species pine, spruce, oak and beech (79.22%) and for the deciduous-coniferous classification (94.43%). When the variables are restricted to the top-ranked ones, the LDA results improve and come very close to the SVM accuracies, while the SVM accuracies remain nearly constant. Surprisingly, LDA with an optimized feature set even outperforms SVM in the case of six species (61.81%). This indicates a need to develop specific nonlinear variable ranking mechanisms for SVMs and to test their influence on SVM classification results.
1. Introduction The automated classification of tree species from airborne survey data is a major scientific and practical aim of forestry-related remote sensing applications. In general there are two ways in which the outcome of the classification can be influenced: first, the selection of the features and, second, the choice of the classifier. Concerning the first option, early attempts mainly used colour infrared (CIR) images (Pinz and Bischof 1990, Meyer et al. 1996), whose capability was often more or less limited to the distinction of coniferous and deciduous trees when working at the single-tree scale (Holmgren and Persson 2004). Most recent studies try to exploit the potential of light detection and ranging (LiDAR) data, either by using the spatial distribution pattern of the point cloud (Brandtberg et al. 2003, Brandtberg 2007, Tokola et al. 2008) or, still less commonly, by inspecting the geometric properties of the single reflections (Litkey et al. 2007, Hollaus et al. 2009, Heinzel and Koch 2010).
The second option, which concerns the classifier, has also experienced major technical improvements. If LiDAR data is the last decades’ innovation on the feature and data side, then support vector machines (SVMs) appear to be one of the most innovative methods on the classifier side. Their introduction to remote sensing is still in its infancy and most of the studies date from the last five years (Pal and Mather 2005, Waske and Benediktsson 2007, Ørka et al. 2009b). While some authors indicate their advantage over other classifiers in remote sensing applications, numerous studies still rely on more basic but well-proven methods such as linear discriminant analysis (LDA) (Lucas et al. 2008, Kim et al. 2009, Ørka et al. 2009a). This study presents a direct empirical comparison of the results from LDA and SVM classifiers using the same dataset for tree species classification. Furthermore, it considers different configurations of the SVM concerning the kernel and the multiclass method. It also explains a complete SVM classification procedure, which comprises the optimization of parameters and the t-fold cross-validation. The dataset consists of features which are compositions of the geometric properties of LiDAR reflections, different values for their statistical distribution within a grid cell and their location within the laser beam as well as in space. Altogether, 231 compositions represent the feature space for both classifiers. In contrast to several other studies, which aim to distinguish coniferous from deciduous trees or a maximum of three species (Holmgren et al. 2008, Ørka et al. 2009a), we present results for different numbers of classes, which are referred to as classification depths. The discrimination of up to six species was tested under the conditions of a mixed temperate forest. Furthermore, we describe a grid-based classification in which no information about the delineation of the tree crowns is used. The classification of automatically delineated crowns, e.g. Holmgren and Persson (2004) and Reitberger et al. (2008), bears the risk that failures in single tree detection increase the error in species classification.
Data
Study area The study area is located in south-western Germany close to the city of Karlsruhe, as shown in Figure 1. It comprises a rectangular area of about 10 km² and has an even terrain. The forest stands are characterized by a high diversity concerning the deciduous-coniferous composition, the tree age, the layering and the density of the stands. The main species in a typical temperate forest of this type are pine (Pinus sylvestris), spruce (Picea abies), oak (Quercus petraea) and beech (Fagus sylvatica). Additionally, there is a considerable number of trees of the secondary species hornbeam (Carpinus betulus) and cherry (Prunus avium) within our study area.
LiDAR data
LiDAR data was captured during a flight campaign in the summer of 2007 with a Harrier 56 system by TopoSys® GmbH. The recorded full waveforms were preprocessed with the software RiAnalyze 560© from Riegl LMS GmbH to extract single reflections. In addition to the pure three-dimensional coordinates we extracted the reflection-specific amplitude, the width and the intensity of the signal. The intensity is defined as the total power received over time by the reflected signal. Furthermore, we obtained the position of each reflection and the total number of reflections within the corresponding beam. Due to special flight planning we achieved a point density of at least 16 points/m².
Figure 1. Location and overview of the study area. The image shows a CIR aerial photograph with sample plots indicated as white squares.
Reference data Tree species specific reference data was collected during multiple phases in 2008, 2009 and 2010. Within 24 square plots of 30 m side length the position of the crown centre was measured for each individual tree and recorded together with the corresponding tree species. Additionally the crown class was estimated; only dominant, co-dominant and intermediate trees were selected because they are visible from above. According to Hildebrandt (1996) these three upper classes account for up to 97% of the whole timber volume. Training data was manually selected as explained in Heinzel and Koch (2010). The tree top positions were allocated to single crowns, which were delineated by following their outline on a normalized digital surface model (nDSM) and colour infrared (CIR) images. Afterwards these polygons were intersected with the feature grids to obtain sample cells of known tree species. Altogether the plots were selected to enclose an equal number of samples for each species.
Feature Extraction From the LiDAR point data a comprehensive set of composed features was derived. The features were originally computed for a slightly different dataset to exploit the information of secondary parameters from the waveform geometry and are comprehensively explained in Heinzel and Koch (2010). The point cloud data is projected onto a grid with a cell size of 1 m. Each grid layer represents one feature, which consists of four components. The first is the primary geometric information, such as the width, the amplitude and the intensity of the reflection as well as the total number of reflections within a laser beam. The second is a measure of the statistical distribution of the reflection-specific values within a grid cell. The third and fourth are the position of the reflection within the ray and the position in space, respectively. Altogether 231 variables were available.
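The projection of reflections onto a 1 m grid with a per-cell statistic can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record layout (`x`, `y`, `intensity`, `n_targets`), the function name `grid_mean_feature` and the sample values are assumptions made only for illustration.

```python
import math
from collections import defaultdict
from statistics import mean


def grid_mean_feature(points, cell_size=1.0, value_key="intensity",
                      keep=lambda p: True):
    """Average one reflection attribute per grid cell.

    points: iterable of dicts with 'x', 'y' and the attribute named by
    value_key (hypothetical record layout, not taken from the paper).
    keep: predicate selecting the reflections that enter the feature.
    """
    cells = defaultdict(list)
    for p in points:
        if not keep(p):
            continue
        # Index of the grid cell that contains the reflection.
        col = math.floor(p["x"] / cell_size)
        row = math.floor(p["y"] / cell_size)
        cells[(col, row)].append(p[value_key])
    return {cell: mean(values) for cell, values in cells.items()}


# Invented sample reflections; n_targets is the number of reflections
# recorded within the same laser beam.
points = [
    {"x": 0.2, "y": 0.7, "intensity": 10.0, "n_targets": 1},
    {"x": 0.9, "y": 0.1, "intensity": 14.0, "n_targets": 1},
    {"x": 0.5, "y": 0.5, "intensity": 99.0, "n_targets": 3},
    {"x": 1.4, "y": 0.3, "intensity": 20.0, "n_targets": 1},
]

# One feature layer: mean intensity from beams with only one reflection.
single = grid_mean_feature(points, keep=lambda p: p["n_targets"] == 1)
print(single)
```

Other layers would follow the same pattern with a different attribute, statistic or selection predicate, for example a height threshold for the position in space.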
2. Methods
Within this comparative study two major classification methods are used: first the LDA and second the SVM classification.
Linear discriminant analysis LDA is a widely used classification method in remote sensing image classification. In this study we use the software package SPSS to conduct all LDA classifications. Because the principles and functionality of LDA are well known, we only give a brief summary of the settings and framework we use. LDA aims to find new variables that allow an optimized linear separation of the groups. The general form of the corresponding discriminant function Y is a linear combination of the feature variables X_i with i = 1,...,d:
$$Y = a_0 + a_1 X_1 + a_2 X_2 + \dots + a_d X_d \qquad (1)$$
The discriminant coefficients a_i are determined by maximizing the ratio of the explained sum of squares to the unexplained sum of squares. To obtain a ranking of the original 231 variables according to their importance for discrimination we conduct a forward stepwise LDA. In each step, the variable which decreases Wilks’ lambda most is included in the analysis. We use an F-value of 3.84 as threshold for inclusion and 2.71 for exclusion according to Pollar et al. (2007). In summary, LDA tries to find the average object of each group and allocates new objects to the group with the most similar mean value. Furthermore, LDA aims at reducing the number of variables used for classification and is limited to the linear discrimination of groups.
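The selection criterion can be made concrete with a small Python sketch of the first step of a forward selection. It is a simplification and not the SPSS procedure used in the study: with no variable included yet, Wilks' lambda of a single variable reduces to the ratio of the within-group to the total sum of squares, and the F-to-enter follows from it. Later steps of a real stepwise LDA use a partial lambda conditional on the variables already included, which is not shown. The function names, feature names and data are hypothetical.

```python
def wilks_lambda_univariate(values, labels):
    """Wilks' lambda of one variable: within-group SS / total SS."""
    n = len(values)
    grand_mean = sum(values) / n
    groups = {}
    for v, g in zip(values, labels):
        groups.setdefault(g, []).append(v)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        return 1.0  # a constant variable carries no group information
    ss_within = 0.0
    for members in groups.values():
        m = sum(members) / len(members)
        ss_within += sum((v - m) ** 2 for v in members)
    return ss_within / ss_total


def f_to_enter(lam, n, k):
    """F statistic of the first step for n samples and k groups."""
    if k < 2:
        raise ValueError("at least two groups are required")
    if lam == 0:
        return float("inf")
    return ((1.0 - lam) / lam) * ((n - k) / (k - 1))


def first_step(features, labels, f_enter=3.84):
    """Return the variable with the smallest lambda, or None if its
    F value stays below the inclusion threshold."""
    n, k = len(labels), len(set(labels))
    best_name, best_lam = None, None
    for name, values in features.items():
        lam = wilks_lambda_univariate(values, labels)
        if best_lam is None or lam < best_lam:
            best_name, best_lam = name, lam
    if best_name is None or f_to_enter(best_lam, n, k) < f_enter:
        return None
    return best_name


# Invented toy data with two groups and two candidate variables.
labels = ["pine", "pine", "pine", "beech", "beech", "beech"]
features = {
    "mean_intensity": [1.0, 2.0, 3.0, 7.0, 8.0, 9.0],
    "mean_width": [1.0, 5.0, 9.0, 2.0, 5.0, 8.0],
}
print(first_step(features, labels))
```

In this toy data the group means of `mean_width` coincide, so its lambda is 1 and it would never pass the inclusion threshold of 3.84.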
Support vector machines SVMs use a different classification principle. They try to find those training objects (vectors) of a group which are most similar to the objects of the other group. These objects are called support vectors because they define the decision boundary for classification. A new object is then compared to each of these support vectors. An SVM is primarily a two-class separation tool. To handle multiple classes there are two major methods: the separation of the multiclass problem into all possible two-class pairs (one versus one) or the comparison of each class against the rest (one versus rest). The theory of support vector machines is explained in several scientific papers and tutorials (Burges 1998, Bennett and Campbell 2000), and a separate description would go beyond the scope of this paper. Instead we explain the framework in which the SVM classification should be used to obtain optimal results. For fast SVM computations we use the LIBSVMTL programming library (Ronneberger 2004). The general form of the decision function of the SVM is given in equation (2).
$$f(x) = \operatorname{sign}\left(\sum_{i=1}^{l} y_i \alpha_i K(x, x_i) + b\right) \qquad (2)$$
Here x is the vector to be classified, x_i is the ith support vector, y_i is the known class of x_i, α_i is the Lagrange multiplier, b is the bias and l is the number of support vectors. K(x, x_i) is a kernel function that allows a nonlinear classification while all computations remain in the input feature space. One of the most universal kernels is the radial basis function (RBF) shown in equation (3).
$$K(x, x_i) = \exp\left(-\gamma \lVert x - x_i \rVert^2\right), \quad \gamma > 0 \qquad (3)$$
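Equations (2) and (3) translate directly into a few lines of Python. The sketch below only evaluates the decision function for given support vectors; it does not train an SVM, and the support vectors, multipliers, bias and γ in the example are invented values, not results from this study. Mapping a sum of exactly zero to +1 is a convention chosen here.

```python
import math


def rbf_kernel(x, xi, gamma):
    """Equation (3): exp(-gamma * ||x - xi||^2) with gamma > 0."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, xi))
    return math.exp(-gamma * squared_distance)


def svm_decision(x, support_vectors, labels, alphas, bias, gamma):
    """Equation (2): sign of the kernel-weighted sum plus bias.

    labels are +1 or -1; a sum of exactly zero is mapped to +1 here.
    """
    total = bias
    for sv, y, alpha in zip(support_vectors, labels, alphas):
        total += y * alpha * rbf_kernel(x, sv, gamma)
    return 1 if total >= 0 else -1


# Invented two-class example with one support vector per class.
support_vectors = [(0.0, 0.0), (2.0, 0.0)]
labels = [-1, 1]
alphas = [1.0, 1.0]

print(svm_decision((1.8, 0.0), support_vectors, labels, alphas, 0.0, 0.5))
print(svm_decision((0.1, 0.0), support_vectors, labels, alphas, 0.0, 0.5))
```

A larger γ makes the kernel value decay faster with distance, so each support vector influences only its close neighbourhood.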
A complete SVM classification procedure can be subdivided into three hierarchical levels (Figure 2). To obtain meaningful results it is important to follow this scheme. The upper level comprises the grid search, which optimizes the kernel parameter γ and the cost factor C that is used for estimating α_i (Tso and Mather 2009). This is important because both parameters differ for each classification problem and therefore no standard values exist. Starting at an initial value, they are exponentially increased for a defined number of steps. In the case of a linear SVM only C has to be optimized. The mid level contains an accuracy assessment for each parameter pair determined by the grid search. We use a 24-fold cross-validation corresponding to the 24 sample plots on the test site. This yields a mean accuracy for each of the parameter settings. The lower level conducts the actual multiclass SVM classification. When using the one versus one method, altogether k(k – 1)/2 SVMs are required for k classes.
Figure 2. The hierarchical levels which are required to conduct an optimal SVM classification.
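The number of pairwise classifiers and the way their outputs can be combined are sketched below in Python. Majority voting is a common way to combine one versus one classifiers and is assumed here; the paper does not state how the library combines them. The toy pairwise classifiers and their prototype values are invented stand-ins for trained two-class SVMs.

```python
from collections import Counter
from itertools import combinations


def ovo_pairs(classes):
    """All unordered class pairs: k * (k - 1) / 2 for k classes."""
    return list(combinations(classes, 2))


def ovo_predict(x, classes, binary_classifiers):
    """Majority vote over all pairwise classifiers.

    binary_classifiers maps each pair (a, b) to a callable that
    returns either a or b for the object x. Ties are resolved in
    favour of the class that received its first vote earliest.
    """
    votes = Counter(binary_classifiers[pair](x) for pair in ovo_pairs(classes))
    return votes.most_common(1)[0][0]


def make_toy_classifier(a, b, prototypes):
    """Stand-in for a trained two-class SVM (illustration only)."""
    def classify(x):
        return a if abs(x - prototypes[a]) <= abs(x - prototypes[b]) else b
    return classify


six_species = ["pine", "spruce", "oak", "beech", "hornbeam", "cherry"]
main_species = ["pine", "spruce", "oak", "beech"]
print(len(ovo_pairs(six_species)), len(ovo_pairs(main_species)))

# Invented one-dimensional prototypes for the four main species.
prototypes = {"pine": 1.0, "spruce": 2.0, "oak": 5.0, "beech": 6.0}
classifiers = {
    (a, b): make_toy_classifier(a, b, prototypes)
    for a, b in ovo_pairs(main_species)
}
print(ovo_predict(5.2, main_species, classifiers))
```

For the three classification depths of this study the counts are 15 pairwise SVMs for six species, 6 for the four main species and 1 for the deciduous-coniferous case.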
After plotting the accuracies into the corresponding parameter grid, the optimal parameterization can easily be determined as the global maximum of the grid values. These values are then used to train the final SVM model. We tested SVM classifications following the method described above with different configurations. These include the kernel, where we used both an RBF and a linear SVM, and the multiclass methods one versus one and one versus rest. In addition to using all features we conducted all SVM classifications with the same ranked features as used for the LDA. This way of feature ranking can be described as a filter ranking, which means that the ranking is done by a method separate from and prior to the actual classifier (Kohavi and John 1997).
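The upper and mid levels of this scheme can be sketched in plain Python as an exponential parameter grid combined with a cross-validation that leaves out one complete plot per fold. This is an illustration under assumptions, not the LIBSVMTL procedure: the start values, growth factor and number of steps are hypothetical because the paper does not report them, and `nearest_mean_trainer` is a stand-in that ignores C and γ so that the example runs without an SVM library.

```python
def exponential_grid(start, factor, steps):
    """Values start, start*factor, start*factor**2, ..."""
    return [start * factor ** i for i in range(steps)]


def leave_one_plot_out(plot_ids):
    """Yield (train_indices, test_indices), one fold per plot."""
    for held_out in sorted(set(plot_ids)):
        test = [i for i, p in enumerate(plot_ids) if p == held_out]
        train = [i for i, p in enumerate(plot_ids) if p != held_out]
        yield train, test


def grid_search(samples, labels, plot_ids, c_values, gamma_values, train_fn):
    """Return (mean accuracy, C, gamma) of the best grid point.

    train_fn(train_x, train_y, c, gamma) must return a callable that
    predicts a label for one sample.
    """
    best = None
    for c in c_values:
        for gamma in gamma_values:
            fold_accuracies = []
            for train, test in leave_one_plot_out(plot_ids):
                model = train_fn([samples[i] for i in train],
                                 [labels[i] for i in train], c, gamma)
                hits = sum(model(samples[i]) == labels[i] for i in test)
                fold_accuracies.append(hits / len(test))
            mean_accuracy = sum(fold_accuracies) / len(fold_accuracies)
            if best is None or mean_accuracy > best[0]:
                best = (mean_accuracy, c, gamma)
    return best


def nearest_mean_trainer(train_x, train_y, c, gamma):
    """Stand-in classifier (not an SVM); c and gamma are ignored."""
    sums = {}
    for x, y in zip(train_x, train_y):
        total, count = sums.get(y, (0.0, 0))
        sums[y] = (total + x, count + 1)
    means = {y: total / count for y, (total, count) in sums.items()}
    return lambda x: min(means, key=lambda y: abs(x - means[y]))


# Invented toy data: one feature, two species, three plots.
samples = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8]
labels = ["pine", "pine", "pine", "oak", "oak", "oak"]
plot_ids = [1, 2, 3, 1, 2, 3]

c_values = exponential_grid(2 ** -5, 4, 3)      # hypothetical range
gamma_values = exponential_grid(2 ** -7, 4, 3)  # hypothetical range
print(grid_search(samples, labels, plot_ids, c_values, gamma_values,
                  nearest_mean_trainer))
```

Splitting by plot rather than by cell keeps all cells of one plot on the same side of the split, which is what makes the held-out plots independent of the training data.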
3. Results
The classification results of different SVM configurations are shown in Table 1. The columns distinguish the use of an RBF kernel and a linear SVM as well as the two different multiclass methods. The rows show the overall accuracies for three classification depths. The RBF kernel with a one versus one (ovo) method achieves the best results for all classification depths. We therefore chose this SVM setting for further comparison with the LDA.
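Table 1. Overall accuracy (%) for SVM classification with different configurations

| Classification depth | One versus one, RBF | One versus one, Linear | One versus rest, RBF | One versus rest, Linear |
|---|---|---|---|---|
| Six species | 56.10 | 55.90 | 53.84 | 47.67 |
| Main species | 79.22 | 78.33 | 77.40 | 73.51 |
| Con/Dec | 94.43 | 93.90 | 94.43 | 93.90 |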
Table 2 gives an overview of these comparative classification results. The columns distinguish LDA and SVM accuracies and are further subdivided into the use of all 231 variables of the feature set and the use of those variables selected by the stepwise discriminant analysis. The number of variables selected by the stepwise LDA is 64 for the six species, 44 for the main species and 48 for the deciduous-coniferous classification. It can be seen that the SVM outperforms LDA for the main species and the deciduous-coniferous classification when using all features. LDA is better in the case of six species, most clearly with selected features (61.81% versus 55.26%). A further difference between the classifiers is that LDA accuracies increase when the feature space is limited to the stepwise included variables, whereas SVM accuracies stay nearly constant.
Table 2. LDA and SVM classification accuracies (%) in comparison for rank selected and non-ranked features as well as for different numbers of classes.

| Classification depth | LDA, all features | LDA, top features | SVM (RBF/ovo), all features | SVM (RBF/ovo), top features |
|---|---|---|---|---|
| Six species | 57.89 | 61.81 | 56.10 | 55.26 |
| Main species | 75.07 | 79.59 | 79.22 | 79.83 |
| Con/Dec | 90.18 | 93.10 | 94.43 | 94.73 |
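The differences discussed in the text follow from simple arithmetic on the accuracies in the table above. The short Python sketch below recomputes them from the published values; only the variable names are introduced here.

```python
# Overall accuracies (%) from the table above: (all features, top features).
accuracies = {
    "Six species": {"LDA": (57.89, 61.81), "SVM": (56.10, 55.26)},
    "Main species": {"LDA": (75.07, 79.59), "SVM": (79.22, 79.83)},
    "Con/Dec": {"LDA": (90.18, 93.10), "SVM": (94.43, 94.73)},
}

differences = {}
for depth, acc in accuracies.items():
    differences[depth] = {
        # SVM minus LDA with all 231 features, in percentage points.
        "svm_minus_lda_all": round(acc["SVM"][0] - acc["LDA"][0], 2),
        # SVM minus LDA with the stepwise selected features.
        "svm_minus_lda_top": round(acc["SVM"][1] - acc["LDA"][1], 2),
        # Change of each classifier when switching to selected features.
        "lda_gain": round(acc["LDA"][1] - acc["LDA"][0], 2),
        "svm_gain": round(acc["SVM"][1] - acc["SVM"][0], 2),
    }

for depth, values in differences.items():
    print(depth, values)
```

With all features the SVM leads by 4.15 and 4.25 percentage points for the main species and the deciduous-coniferous case, while LDA leads by 1.79 points for six species. Feature selection raises the LDA accuracies by 2.92 to 4.52 points and changes the SVM accuracies by less than one point.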
Additionally we observed the classification accuracies when the top ten variables were included in a stepwise procedure. For this purpose we determined the variables which occur most often on each of the upper ten ranks during the 24-fold stepwise discriminant analysis. It can be seen from Figure 3 that the most relevant improvement of accuracy occurs with the first three variables. This is the case for both classifiers and all classification depths. The three most important variables are very similar for all classification depths. For the six species and the main species, the first is the arithmetic mean of the intensity within a grid cell from laser beams with only one reflection, the second is the mean of the signal width using all targets within the beam and the third is the mean of the total number of targets within a beam. In all cases the considered reflections were limited to the upper 5 m of the canopy. The deciduous-coniferous classification has the same variables within the top three but ranks the number of targets first and the intensity variable third.
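Determining the variable that occurs most often on each rank across the cross-validation folds can be sketched in Python as follows. The fold rankings and variable names are invented, and the sketch makes two assumptions the text does not settle: ties are resolved in favour of the variable seen first, and a variable may appear on more than one rank.

```python
from collections import Counter


def consensus_ranking(fold_rankings, top_n=10):
    """Most frequent variable on each of the first top_n ranks.

    fold_rankings: one ranked list of variable names per fold.
    """
    consensus = []
    for rank in range(top_n):
        counts = Counter(r[rank] for r in fold_rankings if len(r) > rank)
        if not counts:
            break
        consensus.append(counts.most_common(1)[0][0])
    return consensus


# Invented rankings from four folds (the study used 24 folds).
fold_rankings = [
    ["intensity_mean", "width_mean", "n_targets_mean"],
    ["intensity_mean", "width_mean", "n_targets_mean"],
    ["n_targets_mean", "width_mean", "intensity_mean"],
    ["intensity_mean", "n_targets_mean", "width_mean"],
]
print(consensus_ranking(fold_rankings, top_n=3))
```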
Figure 3. Classification accuracies for the averaged top ten variables. The accuracy is given for each number of topmost ranked variables together with the improvement in comparison to the next lower number.
4. Discussion
The initial test of different SVM configurations reveals the advantage of the RBF kernel over a linear SVM and the superiority of the one versus one multiclass method. These results confirm the study of Pal and Mather (2005), who classified a multi- and hyperspectral remote sensing dataset and came to the same conclusions. We have also pointed out the grid search, which is required to obtain optimal SVM results. This step often seems to be neglected in application-oriented studies which use SVMs as a black box tool.
The comparative classification results show that for the main species and the deciduous-coniferous classification, SVM with the complete feature set outperforms LDA by more than 4 percentage points in overall accuracy. After limiting the feature set to the features selected by stepwise LDA, the LDA accuracies for the main species and deciduous-coniferous trees come very close to the SVM results (see Table 2). In the case of six species we observe the unexpected phenomenon that LDA even outperforms SVM. However, it should be considered that SVM results may be subject to small variations due to the step size in the grid search. Nevertheless, step sizes should not be too small, to avoid overfitting of the model. The increase in accuracy after removing noise-producing variables reflects the general character of LDA. All three classification depths clearly show this behaviour. In contrast, SVM results stay more or less constant, which points to a higher stability of SVM in high-dimensional feature spaces (Chen and Ho 2008). The iterative inclusion of the ten uppermost ranked features shows a very similar development of the results for both classifiers. Within all classification depths the major increase in accuracy is restricted to the top two or three variables. The inclusion of further variables only results in minor changes of accuracy. Since the SVM classification uses the rank-selected features derived by stepwise LDA, it can be assumed that the ranking is not optimal for the SVM. Therefore we see the necessity for future work on developing an SVM-specific ranking and testing the classifier with a customized feature selection. Altogether, the often indicated superiority of nonlinear SVMs over simpler classifiers (Gokcen and Peng 2002, Chen and Ho 2008, Ørka et al. 2009b) is only partly confirmed by this study, namely when using the complete feature set. With an LDA-optimized set of variables we achieve surprisingly good accuracies for this classifier. Even if our restriction to a single example does not allow any generalization, it indicates that the differences in performance depend on how well the classifiers fit the problem at hand (Gokcen and Peng 2002). It can be concluded that even if one classifier outclasses another in theory, as shown for SVM and LDA by Gokcen and Peng (2002), this superiority does not necessarily hold in the individual case.
Acknowledgements This work was financed by the Deutsche Forschungsgemeinschaft (DFG). We would also like to thank TopoSys® GmbH for good cooperation and providing the high quality LiDAR data.
References
BENNETT, K. P., and CAMPBELL, C., 2000, Support vector machines: hype or hallelujah? ACM SIGKDD Explorations Newsletter, 2, pp. 1-13.
BRANDTBERG, T., 2007, Classifying individual tree species under leaf-off and leaf-on conditions using airborne lidar. ISPRS Journal of Photogrammetry and Remote Sensing, 61, pp. 325-340.
BRANDTBERG, T., WARNER, T. A., LANDENBERGER, R. E., and MCGRAW, J. B., 2003, Detection and analysis of individual leaf-off tree crowns in small footprint, high sampling density lidar data from the eastern deciduous forest in North America. Remote Sensing of Environment, 85, pp. 290-303.
BURGES, C. J. C., 1998, A tutorial on Support Vector Machines for pattern recognition. Data Mining and Knowledge Discovery, 2, pp. 121-167.
CHEN, C. H., and HO, P. G. P., 2008, Statistical pattern recognition in remote sensing. Pattern Recognition, 41, pp. 2731-2741.
GOKCEN, I., and PENG, J., 2002, Comparing linear discriminant analysis and support vector machines. Lecture Notes in Computer Science, 2457, pp. 104-113.
HEINZEL, J., and KOCH, B., 2010, Exploring full-waveform LiDAR parameters for tree species classification. International Journal of Applied Earth Observation and Geoinformation (submitted).
HILDEBRANDT, G., 1996, Fernerkundung und Luftbildmessung (Heidelberg: Wichmann).
HOLLAUS, M., MÜCKE, W., HÖFLE, B., DORIGO, W., PFEIFER, N., WAGNER, W., BAUERHANSL, C., and REGNER, B., 2009, Tree species classification based on full-waveform airborne laser scanning data. In Silvilaser 2009, 14-16 October 2009, College Station, USA, pp. 54-62.
HOLMGREN, J., and PERSSON, A., 2004, Identifying species of individual trees using airborne laser scanner. Remote Sensing of Environment, 90, pp. 415-423.
HOLMGREN, J., PERSSON, A., and SODERMAN, U., 2008, Species identification of individual trees by combining high resolution LIDAR data with multi-spectral images. International Journal of Remote Sensing, 29, pp. 1537-1552.
KIM, S., MCGAUGHEY, R. J., ANDERSEN, H. E., and SCHREUDER, G., 2009, Tree species differentiation using intensity data derived from leaf-on and leaf-off airborne laser scanner data. Remote Sensing of Environment, 113, pp. 1575-1586.
KOHAVI, R., and JOHN, G. H., 1997, Wrappers for feature subset selection. Artificial Intelligence, 97, pp. 273-324.
LITKEY, P., RÖNNHOLM, P., LUMME, J., and LIANG, X., 2007, Waveform features for tree identification. In P. Rönnholm, H. Hyyppä, and J. Hyyppä (Eds.), International Archives of Photogrammetry, Remote Sensing and Spatial Information, Vol. 36 Part 3/W52, Proceedings of the ISPRS Workshop ‘Laser Scanning 2007 and SilviLaser 2007’, 12-14 September 2007, Espoo, Finland, pp. 258-263.
LUCAS, R., BUNTING, P., PATERSON, M., and CHISHOLM, L., 2008, Classification of Australian forest communities using aerial photography, CASI and HyMap data. Remote Sensing of Environment, 112, pp. 2088-2103.
MEYER, P., STAENZ, K., and ITTEN, K. I., 1996, Semi-automated procedures for tree species identification in high spatial resolution data from digitized colour infrared-aerial photography. ISPRS Journal of Photogrammetry and Remote Sensing, 51, pp. 5-16.
ØRKA, H. O., NÆSSET, E., and BOLLANDSÅS, O. M., 2009a, Classifying species of individual trees by intensity and structure features derived from airborne laser scanner data. Remote Sensing of Environment, 113, pp. 1163-1174.
ØRKA, H. O., NÆSSET, E., and BOLLANDSÅS, O. M., 2009b, Comparing classification strategies for tree species recognition using airborne laser scanner data. In Silvilaser 2009, 14-16 October 2009, College Station, USA, pp. 46-53.
PAL, M., and MATHER, P. M., 2005, Support vector machines for classification in remote sensing. International Journal of Remote Sensing, 26, pp. 1007-1011.
PINZ, A. J., and BISCHOF, H., 1990, Constructing a neural network for the interpretation of species of trees in aerial photographs. In 10th International Conference on Pattern Recognition, 16-21 June 1990, Atlantic City, USA (Omaha, USA: IEEE Computer Society), pp. 755-757.
POLLAR, M., JAROENSUTASINEE, M., and JAROENSUTASINEE, K., 2007, Morphometric analysis of Tor tambroides by stepwise discriminant and neural network analysis. Engineering and Technology, 33, pp. 16-20.
REITBERGER, J., KRZYSTEK, P., and STILLA, U., 2008, Analysis of full waveform LIDAR data for the classification of deciduous and coniferous trees. International Journal of Remote Sensing, 29, pp. 1407-1431.
RONNEBERGER, O., 2004, LIBSVMTL - a Support Vector Machine Template Library. Available online at: http://lmb.informatik.uni-freiburg.de/lmbsoft/libsvmtl/index.en.html (accessed 16/10/2010).
TOKOLA, T., VAUHKONEN, J., LEPPÄNEN, V., PUSA, T., MEHTÄTALO, L., and PITKÄNEN, J., 2008, Applied 3D texture features in ALS based tree species segmentation. In G. J. Hay, T. Blaschke, and D. Marceau (Eds.), International Archives of Photogrammetry, Remote Sensing and Spatial Information, Vol. 38 Part 4/C1, GEOBIA 2008, 5-8 August 2008, Calgary, Canada (Calgary, Canada: International Society for Photogrammetry and Remote Sensing).
TSO, B., and MATHER, P. M., 2009, Classification Methods for remotely sensed data (Boca Raton, London, New York: CRC Press).
WASKE, B., and BENEDIKTSSON, J. A., 2007, Fusion of support vector machines for classification of multisensor data. IEEE Transactions on Geoscience and Remote Sensing, 45, pp. 3858-3866.