Feature Selection of Hyperspectral Data Through Local Correlation and SFFS for Crop Classification

L. Gomez-Chova, J. Calpe, G. Camps-Valls, J. D. Martín, E. Soria, J. Vila, L. Alonso-Chorda, J. Moreno
Department of Electronic Engineering (GPDS) & Department of Thermodynamics, University of Valencia, Spain
[email protected]

Abstract – In this paper, we propose a procedure to reduce the dimensionality of hyperspectral data while preserving the information relevant for subsequent crop cover classification. One of the main problems in hyperspectral image processing is the huge amount of data involved. In addition, pattern recognition methods are sensitive to the problems associated with high-dimensional feature spaces (referred to as the Hughes phenomenon or the curse of dimensionality). We propose a dimensionality reduction strategy that eliminates redundant information by means of a local correlation criterion between contiguous spectral bands, followed by a selection of the most discriminative features based on a Sequential Floating Forward Selection (SFFS) algorithm. The method is tested on a crop cover recognition application with six hyperspectral images of the same area acquired with the 128-band HyMap spectrometer during the DAISEX99 campaign. In the experiments, we analyze the dependence on the dimension and on the employed metrics. The results obtained using Gaussian Maximum Likelihood improve the classification accuracy and confirm the validity of the proposed approach. Finally, we analyze the selected bands of the input space in order to gain knowledge of the problem and to give a physical interpretation of the results.

Keywords — feature selection; dimensionality reduction; crop classification; hyperspectral imaging; remote sensing.

I. INTRODUCTION

The information contained in hyperspectral images allows the reconstruction of the energy radiated by the Earth's surface throughout the electromagnetic spectrum. The reflection, absorption, and emission properties of materials at certain wavelengths, which result from their composition and molecular structure, make it possible to characterize and identify the observed materials from their spectral curves [1], [2]. The data set can be interpreted as a hypercube in which two dimensions represent the spatial coordinates of the image, a third corresponds to the spectral coordinate, and the fourth is the radiance; we therefore have a complete image for each spectral band. The value at a given point of the hypercube represents the reflectance at a given wavelength as measured by the sensor for that pixel. The ultimate objective is to process these data in order to extract the underlying structure.

Classification of surface features in satellite imagery is one of the most important applications of remote sensing, and pattern recognition methods have proven to be effective techniques for this kind of application. However, one of the main problems in hyperspectral image processing is the huge amount of data involved. This is a major obstacle for pattern recognition methods, since they are sensitive to the problems associated with high-dimensional feature spaces, known as the Hughes phenomenon or the curse of dimensionality [7]. The close relationship between the complexity of the classifier and the size of the training set suggests a prior reduction of the input space in order to avoid wrong estimations of the classifier parameters [5].

This reduction can be achieved in essentially two different ways. The first is to identify the variables that do not contribute to the classification task and ignore them (feature selection). The second is to find a transformation to a lower-dimensional feature space (feature extraction). In this work, we analyze feature selection methods, since they have several advantages: reduced data transmission (only the selected bands need to be transmitted from the satellite); interpretability of results (the selected features are spectral bands with a physical meaning); and extrapolation of results to other spectrometers with different spectral bands. The main risk is the loss of information if the feature selection is wrong.

We propose a dimensionality reduction strategy that eliminates redundant information by means of a local correlation criterion between contiguous spectral bands, followed by a selection of the most discriminative features based on a Sequential Floating Forward Selection (SFFS) algorithm.

II. HYPERSPECTRAL DATA

This work is a contribution to the Digital Airborne Imaging Spectrometer Experiment (DAISEX) project, funded by the European Space Agency (ESA) within the framework of its Earth Observation Preparatory Program during 1998, 1999, and 2000 [3]. Three data acquisition campaigns were carried out under controlled conditions in the area of Barrax (Spain). We have used six hyperspectral images acquired with the HyMap spectrometer during the DAISEX99 campaign. HyMap is a 128-band scanner with a discontinuous spectral range (0.4 µm - 2.5 µm). The images were acquired on two consecutive days; each flight consisted of two overpasses, one in the North-South direction and the other in the East-West direction, which yielded six images of the same area. Calibration and atmospheric correction of the images were carried out. Simultaneously with the aerial campaign, a ground campaign was conducted to acquire vegetation samples [9].

This research has been partially supported by the Information Society Technologies (IST) programme of the European Community. The results of this work will be applied in the "Smart Multispectral System for Commercial Applications" project (SmartSpectra, http://www.smartspectra.com).

0-7803-7930-6/$17.00 (C) 2003 IEEE

III. REDUCTION OF DIMENSIONALITY

The aim is to reduce dimensionality while preserving the information relevant for subsequent classification; methods of this kind are referred to as feature selection techniques. The SFFS algorithm identifies the bands that best discriminate among classes in a two-stage feature selection process: first, a search strategy for feature group selection is carried out; second, an objective function that evaluates the candidate subgroups is calculated. In our case, we use an objective function that maximizes the mean probabilistic distance between classes.

Although the SFFS method does not require variables to be selected in advance, hyperspectral data sets frequently have a higher spectral resolution than the problem requires. This redundant information between features makes the analysis of the results difficult. In addition, the Hughes phenomenon (high dimensionality with a small training set) produces wrong estimations of our objective function. These drawbacks are addressed by a local correlation criterion that reduces the redundant information before the SFFS algorithm selects the features with the highest discriminative capability.

A. Redundant Information

The large amount of redundant information is due to the high spectral resolution of the instruments. This redundancy aggravates the problems related to the curse of dimensionality, since crop classification results do not improve beyond a few tens of bands. We use the correlation matrix between all the spectral bands (no class labelling is assumed) to identify similarities and redundant information. For our data set, high correlation between subsets of contiguous bands can be observed (Fig. 1). Each block of contiguous bands along the principal diagonal of the correlation matrix whose mutual correlation exceeds a threshold can be represented by a single band, since the information it contains is essentially shared by all the bands of the block.
In order to select the most discriminative spectral band of each block, we look for the one that best separates the class distributions, as measured by the Bhattacharyya distance between classes, which is given by

Figure 1. Left: cross-correlation matrix between all the spectral bands. Right: blocks of mutually correlated contiguous spectral bands after applying a 0.99 threshold.
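The local correlation grouping described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name is ours, the input is assumed to be a pixels-by-bands matrix of reflectances, and for simplicity we keep the middle band of each block as its representative, whereas the paper selects the most discriminative band of each block via the Bhattacharyya distance.

```python
import numpy as np

def group_correlated_bands(X, threshold=0.99):
    """Group contiguous spectral bands whose mutual correlation exceeds
    `threshold`, and keep one representative band per group.

    X : (n_pixels, n_bands) array of reflectance spectra.
    Returns (groups, representatives).
    """
    corr = np.corrcoef(X, rowvar=False)  # (n_bands, n_bands) correlation matrix
    groups = [[0]]
    for b in range(1, X.shape[1]):
        # A band joins the current block only if it correlates strongly
        # with every band already in the block (local correlation criterion).
        if all(corr[b, g] >= threshold for g in groups[-1]):
            groups[-1].append(b)
        else:
            groups.append([b])
    # Hypothetical choice: middle band of each block as its representative.
    representatives = [g[len(g) // 2] for g in groups]
    return groups, representatives
```

With three nearly identical bands followed by three independent ones, the function returns one block of three bands plus three singleton blocks.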

B. Feature Selection

After eliminating the redundant information, we obtain a set of bands with self-contained information. The task is now to identify the bands that best discriminate among classes, using a two-stage feature selection algorithm. We chose the Sequential Floating Forward Selection algorithm due to the following characteristics [4]:

a) Sequential: sequential algorithms have better computational requirements than exponential algorithms, which perform an exhaustive search; the latter are infeasible when working with 128 spectral features.

b) Floating: a floating sequential search is more versatile, since the number of added or removed features is adapted during the process.

c) Forward: the forward search begins with one variable and continues adding features. It presents two practical advantages with respect to the backward search. First, if the initial dimension is very high, the calculation of the criterion can be complex or impossible (for example, the estimation of the covariance matrix from few samples). Second, if the process is long and is interrupted, the best subsets for small dimensions have already been obtained.

d) Selection: feature selection is well suited to our problem, since the obtained results can be directly interpreted and extrapolated to other sensors.

The main advantage of this method is that it produces a hierarchy of feature subsets with the best selection for each dimension. Therefore, we can represent the value of the objective function for each number of features (Fig. 2) and check monotonicity (no local minima when adding features).
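The floating search itself can be sketched compactly. This is an illustrative reimplementation under our own naming, not the authors' code; `score` stands for the objective function (e.g., the interclass Bhattacharyya distance), and the returned dictionary is the hierarchy of best subsets per dimension.

```python
def sffs(n_features, score, target_dim):
    """Sequential Floating Forward Selection sketch.

    score(subset_tuple) -> float.  Returns a dict mapping each
    dimension to the best feature subset found for it.
    """
    best = {}   # best subset recorded for each dimension
    cur = ()

    def record(s):
        if len(s) not in best or score(s) > score(best[len(s)]):
            best[len(s)] = s

    while len(cur) < target_dim:
        # Forward step: add the single feature that maximizes the objective.
        cur = max((cur + (f,) for f in range(n_features) if f not in cur),
                  key=score)
        record(cur)
        # Floating step: conditionally remove features while doing so
        # beats the best subset already recorded at that dimension.
        while len(cur) > 2:
            reduced = max((tuple(x for x in cur if x != r) for r in cur),
                          key=score)
            if score(reduced) > score(best[len(reduced)]):
                cur = reduced
                best[len(cur)] = cur
            else:
                break
    return best
```

With a simple additive score the best subset of each size is just the top-weighted features, which the floating search recovers.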

For feature selection and classification purposes, six different classes were considered in the area: corn, wheat, sugar beet, barley, alfalfa, and soil. A 900-sample training set and a 900-sample validation set were defined for classification, using spectra from ground-checked points to ensure their reliability. The number of samples in each of the six classes was identical (25 samples per image and per class, i.e., 150 samples per class). Finally, a ground-truth map of the scene was used to verify the classification results on the complete images (test set). In each of the six images (700x670 pixels), the total number of test samples is 327336 (corn 31269; sugar beet 11322; barley 124768; wheat 53400; alfalfa 24726; bare soil 81851), and the rest is considered unknown.

J_B = \frac{1}{4} (\mu_2 - \mu_1)^T \left[ \Sigma_1 + \Sigma_2 \right]^{-1} (\mu_2 - \mu_1) + \frac{1}{2} \log \frac{\left| (\Sigma_1 + \Sigma_2)/2 \right|}{\left( |\Sigma_1| \, |\Sigma_2| \right)^{1/2}}    (1)


where a Gaussian distribution of classes is assumed, with mean µi and covariance matrix Σi for class i.
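Under the Gaussian assumption, Eq. (1) can be evaluated directly from the per-class sample means and covariances; a minimal NumPy sketch (the helper name is ours):

```python
import numpy as np

def bhattacharyya(mu1, cov1, mu2, cov2):
    """Interclass Bhattacharyya distance J_B of Eq. (1) for two
    Gaussian class distributions N(mu_i, Sigma_i)."""
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    cov1, cov2 = np.asarray(cov1, float), np.asarray(cov2, float)
    d = mu2 - mu1
    s = cov1 + cov2
    # First term: mean-separation (Mahalanobis-like) part.
    term1 = 0.25 * d @ np.linalg.inv(s) @ d
    # Second term: covariance-dissimilarity part.
    term2 = 0.5 * np.log(np.linalg.det(s / 2.0) /
                         np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2)))
    return term1 + term2
```

For two univariate Gaussians with unit variance and means 0 and 2, the first term gives 0.5 and the second vanishes, so J_B = 0.5.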

Figure 2. Objective function value (interclass Bhattacharyya distance) versus the number of selected features (DIM), showing the SFFS best selection for each dimension.


TABLE II. CHARACTERISTICS OF THE SELECTED BANDS.

Band | Wavelength (nm) | Absorption Characteristics
18 | 685 (∆ω = 14.7) | Chlorophyll-a maximum absorption.
24 | 777 (∆ω = 16.4) | Beginning of the near infrared, with higher reflectance and lower absorbance due to leaf structure.
102 | 2042 (∆ω = 20.5) | Foliar starch absorption; protein and nitrogen.
99 | 1986 (∆ω = 21.5) | Water absorption due to soil moisture and leaf water content.
72 | 1500 (∆ω = 16.2) | Cellulose and sugar; protein and nitrogen.

Figure 3. Hughes phenomenon in a Gaussian Maximum Likelihood classifier (150 samples per class), shown for the training and validation sets using different numbers of bands uniformly distributed over the spectrum (success rate [%] versus number of bands used).

IV. RESULTS

In the experiments, we analyze the dependence on the dimension and on the employed metrics. The results obtained using the Gaussian Maximum Likelihood (GML) classifier without feature selection show that the classification success rate falls as the data dimension increases (Fig. 3). Results do not improve beyond 20 bands due to the dimensionality of the spectral information. Moreover, the success rate decreases beyond 100 bands due to the Hughes phenomenon (a covariance matrix of dimension 100 is estimated with only 150 samples).

With the procedure explained in Section III, we reduce the dimension from 128 to 36 bands, which demonstrates the high level of oversampling in the spectrum for the present crop classification problem. The SFFS algorithm then produces a feature selection giving the best subset of features for each dimension. We used the Bhattacharyya (or Mahalanobis) metric as the objective function to be maximized. This metric performs better than the Euclidean metric, since it considers the first- and second-order statistics of the data (the variability inside each class) and is not scale dependent.

When these most discriminative selected bands are used, the classification accuracy improves, confirming the validity of the proposed approach. Moreover, this methodology, commonly known as a data mining process, yields valuable information on the relevance of the input features by providing a ranking of variables, as shown in Table I. Finally, we analyze the selected bands of the input space in order to gain knowledge of the problem and to give a physical interpretation of the results. The selected features can be related to the pertinent absorption spectral bands: cellular pigments, red shift, leaf structure, and water absorption bands (Table II).
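As an illustration of the GML classifier used in these experiments, the following is a minimal NumPy sketch under the equal-priors Gaussian assumption; the class and method names are ours, not from the paper.

```python
import numpy as np

class GaussianML:
    """Gaussian Maximum Likelihood classifier: fit one mean and
    covariance per class, then assign each pixel to the class with
    the highest Gaussian log-likelihood (equal priors assumed)."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.params_ = {}
        for c in self.classes_:
            Xc = X[y == c]
            cov = np.cov(Xc, rowvar=False)
            self.params_[c] = (Xc.mean(axis=0),
                               np.linalg.inv(cov),
                               np.linalg.slogdet(cov)[1])
        return self

    def predict(self, X):
        scores = []
        for c in self.classes_:
            mu, icov, logdet = self.params_[c]
            d = X - mu
            # log N(x; mu, Sigma) up to a constant shared by all classes.
            scores.append(-0.5 * (np.einsum('ij,jk,ik->i', d, icov, d) + logdet))
        return self.classes_[np.argmax(scores, axis=0)]
```

Note that inverting the per-class covariance is exactly the step that breaks down when the dimension approaches the number of training samples per class, which is the Hughes phenomenon observed in Fig. 3.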

V. CONCLUSIONS

In this communication, we have proposed a procedure to reduce dimensionality while preserving the information relevant for subsequent classification. This feature selection procedure yields the following benefits:
• It increases the performance of the classifier by mitigating the Hughes phenomenon.
• It reduces the amount of data, which allows faster calculations and makes feasible the training of iterative methods such as neural networks.
• The selected bands optimize the class separation, which makes it easy to recognize the natural structure in the data.
• The physical meaning of the selected spectral bands can be identified.

These properties, in turn, offer excellent advantages for building a new classifier. By choosing the Bhattacharyya distance, one assumes a Gaussian distribution of the classes, which could be an a priori drawback. However, good results have been obtained using the GML classifier, since it extracts the underlying structure in the data, and the Gaussianity assumption has been commonly used in other works [6], [8]. An alternative approach is to consider an objective function that pays direct attention to the classification results.

TABLE I. RELATIVE IMPORTANCE OF THE VARIABLES SELECTED BY THE SFFS ALGORITHM, AND RECOGNITION RATE OVER THE WHOLE IMAGES WHEN ADDING EACH NEW VARIABLE TO THE EMPLOYED FEATURE SET.

Added Feature | Recognition Rate [%]
Band 18 | 55.71
Band 24 | 66.03
Band 102 | 87.71
Band 99 | 91.49
Band 72 | 93.85
Band 9 | 94.79
Band 21 | 95.03
Band 12 | 95.15
Band 44 | 95.19
Band 20 | 95.21

REFERENCES

[1] D. E. Bowker, R. E. Davis, D. L. Myrick, K. Stacy, and W. T. Jones, "Spectral reflectances of natural targets for use in remote sensing studies," NASA Reference Publication 1139, 1985.
[2] R. N. Clark, "Spectroscopy and Principles of Spectroscopy," in Manual of Remote Sensing, A. Rencz, Ed., John Wiley and Sons, Inc., 1999. http://speclab.cr.usgs.gov.
[3] DAISEX, "Digital Airborne Imaging Spectrometer Experiment," http://io.uv.es/projects/daisex/, ESA, 2000.
[4] J. Doak, "Intrusion Detection: The Application of Feature Selection, A Comparison of Algorithms, and the Application of a Wide Area Network Analyzer," Master's thesis, University of California, Davis, Dept. of Computer Science, 1992.
[5] K. Fukunaga and R. R. Hayes, "Effects of sample size in classifier design," IEEE Trans. on Pattern Analysis and Machine Intelligence, 11(8): 873-885, 1989.
[6] L. Gómez-Chova, J. Calpe, E. Soria, J. Moreno, M. C. González, L. Alonso, and J. D. Martín, "Improvements in land surface classification with hyperspectral HyMap data at Barrax," DAISEX Workshop, ESA-ESTEC, Noordwijk, The Netherlands, ESA Publications Division, 2001.
[7] G. F. Hughes, "On the mean accuracy of statistical pattern recognizers," IEEE Trans. on Information Theory, 14(1): 55-63, 1968.
[8] D. Landgrebe, "Hyperspectral image data analysis as a high dimensional signal processing problem," IEEE Signal Processing Magazine, vol. 19, no. 1, pp. 17-28, January 2002.
[9] J. Moreno, V. Caselles, J. A. Martinez-Lozano, J. Melia, J. Sobrino, A. Calera, F. Montero, and J. M. Cisneros, "The measurement programme at Barrax," Final Results Workshop on DAISEX, ESA/ESTEC, Noordwijk, The Netherlands, 2001.
