Feature Selection for Classification of Remote Sensed Hyperspectral Images: A Filter approach using Genetic Algorithm and Cluster Validity

A. B. Santos (1), C. S. F. de S. Celes (1), A. de A. Araújo (1), and D. Menotti (2)

(1) Computer Science Department, UFMG - Federal University of Minas Gerais, Belo Horizonte, MG, Brazil
(2) Computing Department, UFOP - Federal University of Ouro Preto, Ouro Preto, MG, Brazil
Abstract— In this paper, we investigate the advantages of using feature selection approaches for the classification of remote sensed hyperspectral images. We propose a new filter feature selection approach based on genetic algorithms (GA) and cluster validity measures that searches for the subset of features maximizing inter-cluster distances and minimizing intra-cluster distances. With such an optimal, or sub-optimal, subset of features, classifiers can build more accurate decision boundaries. Dunn's index is used to estimate the quality of the clusters obtained with a given subset of features. Experiments were carried out with two well-known datasets: AVIRIS - Indian Pines and ROSIS - Pavia University. Three different classifiers were used to evaluate our proposal: Support Vector Machines (SVM), Multi-layer Perceptron Neural Networks (MLP) and K-Nearest Neighbor (KNN). Moreover, we compare the accuracy of our proposal with that of two other approaches: the traditional Pixelwise approach, which uses no feature selection/extraction, and the widely used Singular Value Decomposition Band Subset Selection (SVDSS). Experiments show that our feature selection approach produces a small subset of features that retains sufficient discriminative power, and that the resulting classification accuracies are similar to those obtained with SVDSS.

Keywords: filter feature selection, hyperspectral image, pattern classification, genetic algorithm, cluster validity
1. Introduction

One of the many tasks in remote sensing is land cover classification, which is concerned with identifying areas of vegetation, water bodies, artificial cover (plantations, urban areas, reforestation areas, etc.) and all other types of cover on the earth's surface [1], [2]. Hyperspectral images carry information about materials on the earth's surface expressed in hundreds of bands/wavelengths [1]. This data allows us to identify and discriminate materials more accurately [1], [2]. With such a data representation, classifiers can improve their performance in terms of accuracy and precision. For instance, recently, classification methods using Support Vector Machines (SVM) have shown greater accuracy when
dealing with hyperspectral data than methods based on Maximum Likelihood (ML), k-nearest neighbors (KNN), and other classifiers [3], [4], [2]. Although the high dimensionality of hyperspectral images provides great discriminative power, their classification is still a challenging task due to the large amount of spectral information and the small amount of labeled reference data [1], [2], [5], [6]. This is also known as the Hughes phenomenon [7] or the "curse of dimensionality". Another constraint mentioned in the literature for data in a high-dimensional space is density estimation [2], which is more difficult to compute than in a lower-dimensional space because the space is quite sparse [2]. In order to surmount such difficulties, some approaches apply feature extraction/selection/representation techniques [3], [8], [9]. Thus, feature dimension reduction approaches are still required in order to improve the generalization power of the classifiers and reduce their computational overhead [3]. In [8], a wrapper approach using genetic algorithms (GA) and an SVM classifier tackles this issue. Wrapper approaches, however, have high computational costs [10], since every candidate subset must be evaluated by training a classifier. For this reason, in this paper, we propose a filter approach for feature selection. The search for a smaller subset of features is also based on a GA, in which candidate subsets are "evolved" according to cluster analysis measures, trying to reach a minimal number of features without loss of discriminative power. Experiments were carried out using two well-known data sets: Indian Pines and Pavia University, obtained by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) [11] and Reflective Optics System Imaging Spectrometer (ROSIS) [12] sensors, respectively.
Three different classifier algorithms (i.e., KNN, Multi-layer Perceptron Neural Networks (MLP), and SVM) were used to compare the accuracies obtained by our approach with those of the Pixelwise approach, which does not use feature selection, and of SVDSS, which is widely used for feature selection in the remote sensing community [13], [14], [15].

The remainder of this paper is organized as follows. Section 2 describes the classification process and presents some well-known algorithms for this task. Section 3 introduces the problem of feature selection and the SVDSS approach. In Section 4, our proposed approach for feature selection is presented. Finally, the experiments and conclusions are presented in Sections 5 and 6, respectively.
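To make the filter idea from the introduction concrete, the following is a minimal sketch, not the implementation used in this paper. It assumes that clusters are given by the class labels of the reference pixels, that Dunn's index is computed in its classic form (smallest distance between points of different clusters divided by the largest cluster diameter), and that a very simple GA (binary tournament selection, uniform crossover, bit-flip mutation, elitism) searches over band masks. The names dunn_index and select_bands, all parameter values, and the synthetic data are hypothetical. The fitness here uses Dunn's index only and does not penalize subset size.

```python
import numpy as np

def dunn_index(X, labels, mask):
    """Classic Dunn index of the clusters given by `labels`,
    computed only on the bands (columns) where `mask` is True.
    Assumes at least two distinct labels."""
    Xs = X[:, mask]
    diff = Xs[:, None, :] - Xs[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=2))        # pairwise Euclidean distances
    same = labels[:, None] == labels[None, :]
    max_diameter = D[same].max()                # largest intra-cluster distance
    min_separation = D[~same].min()             # smallest inter-cluster distance
    return min_separation / max_diameter if max_diameter > 0 else 0.0

def select_bands(X, labels, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Toy GA over boolean band masks, with the Dunn index as fitness."""
    rng = np.random.default_rng(seed)
    n_bands = X.shape[1]
    pop = rng.random((pop_size, n_bands)) < 0.5
    pop[0] = True  # include the full band set as one individual
    for _ in range(generations):
        scores = np.array([dunn_index(X, labels, m) for m in pop])
        # Binary tournament selection.
        a, b = rng.integers(pop_size, size=(2, pop_size))
        parents = np.where((scores[a] >= scores[b])[:, None], pop[a], pop[b])
        # Uniform crossover with a randomly permuted partner.
        partners = parents[rng.permutation(pop_size)]
        take_first = rng.random((pop_size, n_bands)) < 0.5
        children = np.where(take_first, parents, partners)
        # Bit-flip mutation.
        children ^= rng.random((pop_size, n_bands)) < p_mut
        # Elitism: the best individual so far always survives.
        children[0] = pop[scores.argmax()]
        pop = children
    scores = np.array([dunn_index(X, labels, m) for m in pop])
    return pop[scores.argmax()]

# Usage on synthetic data (hypothetical): 2 informative bands, 6 noisy bands.
rng = np.random.default_rng(1)
labels = np.repeat([0, 1, 2], 15)
informative = labels[:, None] * 5.0 + rng.normal(0, 0.3, (45, 2))
noise = rng.normal(0, 3.0, (45, 6))
X = np.hstack([informative, noise])
mask = select_bands(X, labels)
print("selected bands:", np.flatnonzero(mask))
print("Dunn index (selected):", dunn_index(X, labels, mask))
print("Dunn index (all bands):", dunn_index(X, labels, np.ones(8, dtype=bool)))
```

Because the full band set is placed in the initial population and the best individual is always kept, the returned mask never scores below the all-bands baseline in this sketch. No classifier is trained during the search, which is what distinguishes a filter approach from the wrapper approach of [8].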
2. Classification Algorithms

First, let us mathematically define the problem of classification of hyperspectral images. Let δ ≡ {1, ..., n} be an integer set which indexes the n pixels of a hyperspectral image. Let ψ ≡ {1, ..., K_c} be a set of K_c available classes and X ≡ (X_1, ..., X_n) ∈