
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING


Statistical Classification for Assessing PRISMA Hyperspectral Potential for Agricultural Land Use

Umberto Amato, Anestis Antoniadis, Maria Francesca Carfora, Paolo Colandrea, Vincenzo Cuomo, Monica Franzese, Stefano Pignatti, and Carmine Serio

Abstract: The upcoming launch of the next generation of hyperspectral satellites (PRISMA, EnMap, HyspIRI, etc.) will meet the agriculture community's increasing demand for accessible hyperspectral information on agricultural land use. To this end, algorithms for the classification of remotely sensed images with high spectral resolution are considered here for monitoring cultivated areas. Classification is accomplished by procedures based on discriminant analysis tools that are well suited to hyperspectral data and circumvent what in statistics is called “the curse of dimensionality”. As a byproduct of classification, a full assessment of the spectral bands of the sensor is obtained, ranking them in order to understand their role in segmentation and classification. The methodology has been validated on two independent image datasets gathered by the MIVIS (Multispectral Infrared and Visible Imaging Spectrometer) sensor, for which ground validations were available. A comparison with the popular multiclass SVM (Support Vector Machines) classifier is also presented. Results show that a good classification (minimum global success rate of 95% across all experiments) is achieved by using the 10 spectral bands selected as the most discriminant by the proposed procedure; moreover, nonparametric techniques generally outperform parametric ones. The present study confirms that the new generation of hyperspectral satellite data, such as PRISMA, can support a mature end-user application for agricultural land use of cultivated areas.

Index Terms: Hyperspectral data, land use, discriminant analysis, independent components.

I. INTRODUCTION

HYPERSPECTRAL airborne and spaceborne sensors are widely used for Earth surface remote sensing. In particular, the exploitation of remotely sensed land images for agriculture applications benefits greatly from multi- and hyperspectral sensors: information useful to support agriculture applications includes the boundaries of cultivated fields, the vegetation types and status, the soil moisture content, and so on.

Manuscript received July 16, 2012; revised October 01, 2012, February 22, 2013; accepted February 27, 2013. This work was supported by the Italian Space Agency (ASI) under grant I/019/11/10. U. Amato, M. F. Carfora, and M. Franzese are with Istituto per le Applicazioni del Calcolo ‘Mauro Picone’ CNR, Napoli, Italy (e-mail: [email protected]). A. Antoniadis is with Institute J. Kuntzmann, Université Joseph Fourier, Grenoble, France. P. Colandrea is with Compagnia Generale per lo Spazio S.p.A., Milano, Italy. V. Cuomo and S. Pignatti are with Istituto di Metodologie per Analisi Ambientale CNR, Tito Scalo (Potenza), Italy. C. Serio is with Dipartimento di Ingegneria e Fisica Ambientale, Università della Basilicata, Potenza, Italy. Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/JSTARS.2013.2255981

In this framework ASI (Italian Space Agency) planned the PRISMA (PRecursore IperSpettrale della Missione Applicativa) mission which, according to the mission requirements, will allow a detailed characterization and parameter extraction supporting a wide variety of land applications such as agriculture and forestry. The PRISMA satellite will be on a sun-synchronous orbit. The baseline orbit will be at about 650 km, with an inclination of 98.19°, a repeat cycle of 25 days, and an equator crossing time (LTDN) of 10:30 hours in descending mode. The satellite will primarily operate in a “user driven” targeting mode, with a re-targeting period of less than 7 days and a response time of less than 14 days, with a nominal swath width of 30 km at nadir. The payload will be composed of a hyperspectral pushbroom scanner (HYP) with a 30 m ground spatial resolution (GSD) on a swath of 30 km, coupled with a panchromatic camera (PAN) with a GSD of 5 m. The HYP spectral resolution is about 12 nm over the spectral range 400–2500 nm (VNIR and SWIR spectral regions).

Even though satellite hyperspectral datasets will be widely available in the near future, a full exploitation of these hyperspectral images is still far from being achieved, as their management and processing is not a straightforward extension of the low spectral resolution setup. In this paper some specific methods for processing hyperspectral images aimed at supporting land use/crop classification applications are presented. In particular, pixel classification methods have been developed for the purpose of exploiting the unique information content of hyperspectral images, tailoring them to support specific agriculture applications in the frame of precision farming. Classification aims at understanding the type of land cover associated with a pixel, through its unique spectral signature.
Therefore, medium or high resolution spectral data open new avenues for applications; actually, coverage of a wider fraction of the electromagnetic spectrum at a higher spectral resolution means to better represent the spectral signature corresponding to each pixel and then to accurately pick the unique spectral features of land categories. At present there are several methodologies for classifying vegetation multispectral images. First, we recall the classical maximum likelihood, minimum distance, parallelepiped and Fisher classifiers [24]. Lu et al. [16] proposed a linear mixture model to classify the different vegetation species in a forest; it was claimed to be effective to handle transition areas between homogeneous regions having mixed vegetation. A stepwise optimization model with genetic algorithms was developed by Luo et al. [17]. This method relies on tools from Gaussian Mixture Modelling and Decomposition and parametric statistical

1939-1404/$31.00 © 2013 IEEE


models. It has several distinct advantages, such as robustness with respect to statistical hypotheses, and flexibility with respect to density functions. Advanced statistical classifiers for feature extraction [15] and distribution-free approaches such as neural networks and support vector machines (SVM) have also been proposed [18]. Recently, active learning techniques have been applied to the classification of remotely sensed data [19], also in combination with semisupervised techniques [23]. Standard univariate (i.e., single spectral band) classification techniques can be adapted for application to hyperspectral data, but in this case their effectiveness is often reduced by the presence of redundant or irrelevant information in the multivariate data set. Processing a large number of bands can paradoxically result in a higher classification error than processing a subset of relevant bands without redundancy, if the huge quantity of information and the very high spectral resolution are not properly taken into account [11]. A disadvantage of such methods is also a lack of parsimony in the final classification and a high sensitivity to the so-called “curse of dimensionality” when the dimension is large and the sample size is moderate. This states in a formal way the fact that a model including a large number of spectral bands does not automatically imply an improvement of the classification procedures. To cope with performance and accuracy problems associated with high dimensionality, it is commonplace to apply a preprocessing step that transforms the data to a space of significantly lower dimension with limited loss of the information present in the original data. Besides classical Principal Component Analysis (PCA), the most notable techniques are Minimum Noise Fraction (MNF) [10], [3] and Singular Value Decomposition (SVD) [7] (e.g., [21]). 
MNF differs from PCA in that the components are ranked according to their signal-to-noise ratio instead of their variance; SVD reduces data without discarding their average, unlike PCA. One may also cite the Rank Ordered With Accuracy Selection (ROWAS) technique developed in [11] that involves ranking bands based on their information content and redundancy, and evaluating a varying number of top ranked bands. However, since relying only on rank-based schemes tends to ignore correlations between bands, ROWAS is coupled with some unsupervised dimension reduction methods, such as PCA, and some supervised classification methods like Naïve Bayes. This paper proposes a feature extraction and dimension reduction algorithm based on discriminant analysis for supervised classification. In the context of a tree species separability study, linear discriminant analysis with hyperspectral data has been used by van Aardt and Wynneb [26]; see also Bandos et al. [4] for a more general and deeper discussion. The discriminant procedure proposed hereafter employs as a preprocessing step a linear transform of the original components into principal or independent components. Independent Component Analysis (ICA) is a statistical technique based on a generative model, whose purpose is to reveal hidden factors that underlie sets of random variables, measurements, or signals that are assumed to be a mixture of several underlying sources called independent components [13]. These source variables are also assumed to be non-Gaussian and mutually independent of each other. ICA has already been used for remote sensing applications [28]; as a precursor to discriminant analysis we refer, e.g., to Zhang and

Huang [30]. The Independent Component Discriminant Analysis proposed by Amato et al. [2] has already been applied to cloud detection [1] and to hyperspectral data in a recent paper by Villa et al. [27]. The features produced by our procedure in the transformed space are uncorrelated or independent, so that the multivariate density estimates characterizing each class are replaced by univariate product estimators; moreover, since parametric densities do not adequately represent land categories, we estimate these univariate densities nonparametrically.

The effectiveness of the proposed procedure will be demonstrated on PRISMA-like data by using two different datasets obtained by the airborne MIVIS sensor. Its spectral resolution (about 100 bands in the range 0.43–12.7 μm) demands methodologies specialized for moderately high dimensions. MIVIS data have been chosen to represent PRISMA data because the present procedures for simulating PRISMA appear inadequate for the aims of this work. In fact, the available simulation procedures use either airborne hyperspectral data with similar spectral settings (e.g., MIVIS) or classification maps, derived from higher resolution images, combined with the spectral signatures of each class [5]. Both procedures show critical aspects: the former requires correctly decorrelating the signal from the sensor noise, while the latter cannot properly simulate areas characterized by highly fragmented scenarios, such as the ones involved in the present study.

The paper is organized as follows. Section II briefly discusses the methods used in the paper for the classification algorithm. Section III describes the MIVIS datasets used for the present analysis and the implementation of the methods of Section II. Results of the experiments are shown and discussed in Section IV. Finally, Section V offers conclusions and further discussion.

II. THE MODEL

Classification of hyperspectral imagery aims at discriminating different objects (in our case different land cover types) using information coming from several spectral bands. Thus the problem is to assign an unknown subject to one of $K$ classes $C_1, \dots, C_K$ on the basis of a multivariate observation $\mathbf{x} = (x_1, \dots, x_p)^T$, where $p$ represents the number of variables (spectral bands in our context). In discriminant analysis the distribution of the observations in each class is characterized by the so-called class conditional probability density function $f_k(\mathbf{x})$. Denoting by $\pi_k$ the a priori probability of observing an individual from population $C_k$, the Bayes decision rule suggests allocating $\mathbf{x}$ to the population which maximizes $\pi_k f_k(\mathbf{x})$ with respect to $k$. If the class conditional densities are multivariate Gaussian, the above rule simply yields the well-known linear or quadratic discriminant functions according to whether the condition of homoscedasticity is fulfilled or not. In this case, if in addition we adopt uniform a priori probabilities, we retrieve the ML classifier, as discussed in Hastie and Tibshirani [12]. However, in most applications neither $f_k$ nor $\pi_k$, $k = 1, \dots, K$, are known and the recourse to the Gaussian based approach may be strongly misleading.
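The Bayes decision rule with Gaussian class conditional densities can be illustrated by a minimal sketch for a single spectral band. It is not the authors' implementation: the function names and the toy radiance values are hypothetical, the priors are taken as relative class frequencies, and each class gets its own variance, which corresponds to the quadratic (heteroscedastic) case.

```python
import math
from collections import defaultdict

def fit_gaussian_classes(values, labels):
    """Estimate (prior, mean, variance) of each class from 1-D training data."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[y].append(v)
    model = {}
    for y, vs in groups.items():
        mean = sum(vs) / len(vs)
        var = sum((v - mean) ** 2 for v in vs) / len(vs)  # ML variance estimate
        model[y] = (len(vs) / len(values), mean, var)     # prior = relative frequency
    return model

def gaussian_pdf(x, mean, var):
    """Univariate Gaussian density with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def bayes_classify(x, model):
    """Allocate x to the class maximizing prior * class conditional density."""
    return max(model, key=lambda y: model[y][0] * gaussian_pdf(x, model[y][1], model[y][2]))

# Hypothetical single-band radiances for two land cover classes.
values = [0.9, 1.0, 1.1, 1.2, 2.8, 3.0, 3.2]
labels = ["soil", "soil", "soil", "soil", "crop", "crop", "crop"]
model = fit_gaussian_classes(values, labels)
print(bayes_classify(1.05, model), bayes_classify(3.1, model))
```

Replacing the class-specific variances by a pooled variance would give the linear rule, and replacing the estimated priors by equal values would give the ML classifier mentioned in the text.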


The a priori probabilities can be easily estimated as the relative frequencies of the classes in the labeled data, which fully exploits the information in the training dataset; alternatively, they can be assumed uniform when the size or representativeness of the labeled information is not adequate. The estimation of the unknown class conditional densities from the training dataset is a more complex task. We therefore resort to nonparametric density estimates as in [2]: the class-conditional densities are estimated with multivariate kernel density estimators of the form

$$\hat{f}_k(\mathbf{x}) = \frac{1}{n_k} \sum_{i=1}^{n_k} K_h\left(\mathbf{x} - \mathbf{x}_i^{(k)}\right) \qquad (1)$$

where $\{\mathbf{x}_i^{(k)}\}_{i=1}^{n_k}$ is the training sample from class $k$, $n_k$ denotes the size of the training set for the same class, and $K_h$ is the kernel function scaled by a bandwidth $h$ (see, e.g., Wand and Jones [29] for a comprehensive introduction). A popular choice for the kernel function is the product of univariate Gaussian kernels, but an efficient estimate of a multivariate density function as the product of univariate functions requires an assumption of independence of its margins that does not always hold in practice. While in the case of Gaussian distributions this can be fixed by transforming the original multivariate data into principal components by PCA and then proceeding with classification, for general distributions the application of PCA to the data only decorrelates them, without yielding full independence. A possible remedy is to seek a transform that makes the margins mutually independent irrespective of their distribution. ICA achieves such a task. It is a statistical method for linearly transforming an observed multidimensional random vector into a random vector whose components are stochastically as independent from each other as possible. The present paper considers Hyvärinen’s maximum negentropy approach [13] for the estimation of the independent components and relies on the Matlab package fastICA [14], available at http://www.cis.hut.fi/projects/ica/fastica/, for its implementation.
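A nonparametric discriminant rule built on a product of univariate Gaussian kernel estimates can be sketched as follows. This is only an illustration under stated assumptions: the function names and toy pixels are hypothetical, the bandwidth uses a normal-reference rule of thumb (the source does not state which bandwidth selector was used), and the pixels are taken in their original bands. For a PCDA- or ICDA-like variant the same code would be applied after a linear transform of the pixels into principal or independent components.

```python
import math

def rule_of_thumb_bandwidth(sample):
    """Normal-reference bandwidth 1.06 * std * n^(-1/5) (an assumed choice)."""
    n = len(sample)
    mean = sum(sample) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in sample) / max(n - 1, 1))
    return 1.06 * max(std, 1e-12) * n ** (-0.2)

def univariate_kde(x, sample, h):
    """Gaussian kernel density estimate at x from a 1-D sample."""
    norm = len(sample) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in sample) / norm

def fit_product_kde(pixels, labels):
    """Per class: prior and one (sample, bandwidth) pair for each band."""
    by_class = {}
    for p, y in zip(pixels, labels):
        by_class.setdefault(y, []).append(p)
    model = {}
    for y, rows in by_class.items():
        bands = [list(col) for col in zip(*rows)]  # one list of values per band
        margins = [(b, rule_of_thumb_bandwidth(b)) for b in bands]
        model[y] = (len(rows) / len(pixels), margins)
    return model

def log_score(pixel, prior, margins):
    """log(prior) plus the sum of log univariate densities (product estimator)."""
    total = math.log(prior)
    for x, (sample, h) in zip(pixel, margins):
        total += math.log(univariate_kde(x, sample, h) + 1e-300)  # guard log(0)
    return total

def classify(pixel, model):
    """Assign the pixel to the class with the largest log score."""
    return max(model, key=lambda y: log_score(pixel, *model[y]))

# Hypothetical two-band training pixels for two classes.
pixels = [(1.0, 5.0), (1.2, 5.5), (0.9, 4.8), (3.0, 2.0), (3.3, 2.2), (2.9, 1.7)]
labels = ["A", "A", "A", "B", "B", "B"]
model = fit_product_kde(pixels, labels)
print(classify((1.1, 5.1), model), classify((3.1, 2.0), model))
```

Working with log densities avoids numerical underflow when many bands are multiplied together.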
Summarizing, the following two parametric and three nonparametric discriminant analysis methods for multispectral vegetation classification have been considered:
LDA (Linear Discriminant Analysis), based on Gaussian density functions with common covariance among classes;
QDA (Quadratic Discriminant Analysis), based on Gaussian density functions with general covariance;
NPDA (NonParametric Discriminant Analysis), where the nonparametric estimate (1) of the density functions is made for each component separately;
PCDA (Principal Component Discriminant Analysis), where the original components are transformed into principal components prior to nonparametric density estimation;
ICDA (Independent Component Discriminant Analysis, [2]), where the original components are transformed into independent components prior to nonparametric density estimation.
The popular SVM classifier in its multi-class implementation LIBSVM [9] has also been considered for comparison since, as shown in [8], it provides very good results in terms of accuracy and robustness, albeit at a significantly higher computational cost.


III. EXPERIMENTAL SETUP

The Multispectral Infrared and Visible Imaging Spectrometer (MIVIS) images show similarities with hyperspectral PRISMA images; MIVIS is a modular system consisting of four spectrometers that provide 102 spectral channels. MIVIS covers the ranges 0.43 to 0.83 μm in 20 bands (20 nm resolution), 1.15 to 1.55 μm in 8 bands (50 nm resolution), 2.0 to 2.5 μm in 64 bands (8 nm resolution) and 8.2 to 12.7 μm in 10 bands (400 to 500 nm resolution). The IFOV is 2.0 mrad with a digitized FOV of 71° covering 755 pixels per line. Ground resolution varies with flight altitude. The system includes an integrated GPS receiver and roll, pitch, and heading sensors. Two blackbody sources are used as thermal reference sources. MIVIS has been operated by the Italian National Research Council since 1993.

Two datasets have been considered for the present analysis. For both datasets the data pre-processing pipeline included a radiometric calibration, and atmospheric, geometric, and topographic corrections, depending on the sensor characteristics and acquisition date. Atmospheric correction of the MIVIS data was performed as in Moran et al. [20] by simulating the primary atmospheric parameters with the MODTRAN4 radiative transfer code [6].

Dataset 1. Fig. 1 shows the image of the Tessera area, near Venice, Italy, that was used in our study, obtained from the MIVIS channels at 0.44 μm, 0.56 μm and 0.68 μm. Data were taken on July 24, 2001 at an altitude of 4000 m a.s.l., providing a spatial resolution of 8 m, and were georeferenced and calibrated. The actual number of pixels included is 241 001. A field campaign for validation purposes was carried out concurrently with the MIVIS survey over a wide zone of the image to produce a land cover map describing the crop distribution at the time of the flight. Seven classes are defined (1–7 in the left part of Table I) corresponding to the different vegetation categories present on the land.
In addition two more classes were included, namely water and urban (8 and 9 in Table I), because their spectral signatures are completely different from the other classes; since discriminant analysis always assigns a pixel to one of the input classes, their exclusion would unavoidably result in several misclassifications of pixels belonging to those two classes. The training dataset was defined by choosing non-overlapping box samples, having roughly the same size within each vegetation class, inside the experimental ground vegetation field. Such a training dataset is also shown in Fig. 1. The total size of the training dataset is 30 553 pixels; the number of pixels for each land class is shown in Table I.

Dataset 2. Fig. 2 shows the image of the Pollino mountain, in the district of Basilicata, Southern Italy. It was taken on November 9, 1998 at an altitude of 4000 m a.s.l., providing a spatial resolution ranging from 3.5 m to 7 m. Since this dataset was gathered over an elevated area, the radiative effects induced by the varying ground elevation of the area were accounted for by using a Digital Elevation Model (DEM; 6 m/pixel resolution) to calculate the atmospheric parameters along each sensor scan line. The actual number of pixels included is 1 198 798. Thirteen land cover classes are defined as reported in the right part of Table I. The areas identified to train the algorithms and



Fig. 1. The training dataset for Dataset 1 (Tessera) superimposed on the RGB image of the entire Dataset 1.

TABLE I DEFINITION OF THE LAND CLASSES, WITH CORRESPONDING SIZE OF THE TRAINING DATASET, FOR BOTH DATASETS

Fig. 2. RGB image of the Pollino area (Dataset 2 in the study), as taken by the MIVIS sensor.

validate the results were defined using the information gathered from field surveys and digital aerial photos (1 m ground resolution) acquired at the time of the MIVIS flight. For this dataset, such a ground truth consisting of 6566 pixels is much less extensive than the one for the Tessera dataset, therefore it is not shown on the image. The two datasets involve different types of land cover (mainly cultivated vegetation and urban in the former, and uncultivated vegetation and bare ground in the latter). Therefore they represent the most interesting categories of land cover for agricultural purposes. In particular the Tessera area has an abundance of small fields, that characterize the typical rural area of North Italy having sharp boundaries and which are intensively

cultivated. The spatial dimensions of the fields allow one to select wide homogeneous areas as statistically significant training areas. The Pollino area exhibits a very fragmented landscape with various ecosystems ranging from Mediterranean to alpine habitats. The natural fragmentation of the land cover classes of the selected areas reduces the possibility of identifying the wide training areas required to describe a representative spectral signature of each land class [22].

Even if, in principle, all of the methods presented in Section II could be applied to all the 102 spectral bands, there are two main reasons for selecting a smaller number of spectral bands in classification. First of all, data from different bands do not have the same quality, both for instrumental reasons connected with the low signal-to-noise ratio of the higher frequency bands, and because of the noise induced by the atmosphere; such noise is noticeable both in the bands pertaining to the SWIR region, owing to the relatively low irradiance values, and in the bands close to the window regions. A preliminary qualitative analysis of the radiance data is therefore applied, concentrating especially on the atmospheric absorption regions. Following this analysis, a certain number of bands have been rejected because of the high number of missing pixels. However, experiments have been carried out both including and excluding these bands in order to check the robustness of the procedure. Indeed, these bands have been “detected” by the procedure, always being ranked last in the band selection part of the following experiments and having practically no influence on the resulting classification.
On the contrary, the pixels showing negative radiances in some of the remaining spectral bands have been removed during the band selection procedure and not globally from the beginning: since the band selection procedure naturally yields the most informative bands, we reach the objective of getting significant (i.e., positive) spectral radiances with a negligible reduction of the sample size.

The second reason for selecting a smaller number of spectral bands in classification is to realize a good tradeoff between classification accuracy and computational effort, reducing data redundancy. Indeed, the methodologies considered in the present paper naturally circumvent the curse of dimensionality in that decorrelation or independence is guaranteed through proper transforms. Nevertheless, we shall investigate a progressive inclusion of the bands in the classification analysis and estimate their effectiveness in classifying the data, with the main objective of quantitatively assessing their role in the overall accuracy. To avoid an exhaustive investigation of all subsets of bands, which is computationally unfeasible as well as unnecessary, we consider the following three strategies for band selection:
Simple forward. The first band is selected by an exhaustive search over all bands as the one that gives the best classification performance on the training dataset; each of the other bands is recursively selected by the same criterion among all the remaining bands.
Forward-backward. At each step of the above forward procedure, a check is made whether eliminating one of the already selected bands improves the performance of the classification; this strategy is intended to limit the bias introduced by the forward recursive procedure.
Double start. The procedure is the same as forward-backward or simple forward, except that the selection of the first two bands is made by an exhaustive search over all pairs of bands. The reason for this strategy is that the first few selected bands are the most informative ones, so that the procedure becomes exhaustive at least with respect to them.

In order to estimate the performance of the classification methods quantitatively, the following indicators have been considered:
$S_k$, percentage of correctly classified pixels in the vegetation class $k$;
$FP_k$, false positive rate of vegetation class $k$;
$FN_k$, false negative rate of vegetation class $k$;
$\kappa_k$, kappa-statistic coefficient, the chance-corrected measure of agreement for each class, defined as $\kappa = (p_o - p_e)/(1 - p_e)$, where $p_o$ is the above mentioned observed percentage of agreement and $p_e$ is the percentage agreement that would occur by chance alone; high values of $\kappa$ are claimed to indicate a good classification;
$S$, global success percentage of correctly classified pixels for the whole volume of data;
$\kappa$, global $\kappa$-statistic coefficient for the whole set of data.

Experiments have been conducted with the three band selection strategies described above; however, results are shown only for the simple forward procedure, because there are no significant differences with respect to the other procedures. Three different numerical experiments have been conducted in order to evaluate the accuracy of the classification methods and their robustness with respect to the training dataset:

Experiment 1. A training dataset has been defined starting from the ground validation (see Fig. 1 for Dataset 1, Tessera, and Table I for Dataset 2, Pollino). Classification has been performed on each whole dataset and performance indicators evaluated for the same training set (test dataset coinciding with the training dataset). Only one band at a time has been considered. This experiment aims at picking the main physical features of the spectral bands related to land cover detection.
In addition it is the basis for evaluating the role of multispectrality. However, to analyze the impact of the unequal number of pixels per class, we repeated the classification several times on randomly chosen subsets of the training set having the same number of pixels per class.

Experiment 2. Carried out as in Experiment 1, but performed on each entire dataset (i.e., all spectral bands) through the band selection procedure previously discussed.

Experiment 3. To assess the performance of the proposed procedures on external datasets, the experimental training dataset of Experiments 1 and 2 has been split into three parts by randomly partitioning the box samples belonging to each class into three groups having roughly the same total number of pixels. This choice significantly reduces the effects of spatial autocorrelation on the procedure to be tested. Then the first subset is used for band selection, the second one is used as a training dataset, whereas performance indicators are evaluated on the third one. This experiment aims at quantitatively assessing the robustness of the classification methods with respect to the choice of the training dataset. In order not to be sensitive to the particular random realization of the training dataset, the experiment is repeated 20 times with different random choices of the partitioning, and the results are averaged.
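The simple forward band selection strategy described earlier in this section can be sketched as a greedy search. The sketch below is an assumption-laden illustration: `forward_band_selection` takes any scoring function that returns the success rate for a list of band indices, and the nearest-class-mean scorer is only a lightweight stand-in for the discriminant classifiers of Section II. The toy pixels are hypothetical.

```python
def forward_band_selection(n_bands, n_select, score):
    """Greedy simple-forward search: at each step add the not-yet-selected band
    that maximizes score(selected + [band])."""
    selected, history = [], []
    remaining = list(range(n_bands))
    while remaining and len(selected) < n_select:
        best = max(remaining, key=lambda b: score(selected + [b]))
        selected.append(best)
        remaining.remove(best)
        history.append(score(selected))  # success rate after each inclusion
    return selected, history

def nearest_mean_success_rate(pixels, labels, bands):
    """Stand-in scorer (not the paper's classifiers): fraction of training pixels
    assigned to their own class by a nearest-class-mean rule on the given bands."""
    sums, counts = {}, {}
    for p, y in zip(pixels, labels):
        acc = sums.setdefault(y, [0.0] * len(bands))
        for j, b in enumerate(bands):
            acc[j] += p[b]
        counts[y] = counts.get(y, 0) + 1
    means = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

    def predict(p):
        return min(means, key=lambda y: sum((p[b] - m) ** 2
                                            for b, m in zip(bands, means[y])))

    return sum(predict(p) == y for p, y in zip(pixels, labels)) / len(pixels)

# Hypothetical 3-band pixels: band 0 is uninformative, bands 1 and 2 are
# complementary (band 1 separates C, band 2 separates A).
pixels = [(5, 10, 10), (5, 12, 14), (5, 10, 30), (5, 14, 34), (5, 40, 33), (5, 44, 41)]
labels = ["A", "A", "B", "B", "C", "C"]
selected, history = forward_band_selection(
    3, 2, lambda bands: nearest_mean_success_rate(pixels, labels, bands))
print(selected, history)
```

On this toy data the search picks band 2 first and band 1 second, and the uninformative band 0 is never selected. A forward-backward variant would add, after each inclusion, a check on whether dropping one of the already selected bands improves the score.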


Fig. 3. Global success rate of classification for Dataset 1 when only one band at a time is considered (Experiment 1—Section IV-A). Results are shown for the most representative methods in Section II: Quadratic (QDA) and Non Parametric (NPDA) Discriminant Analysis. SVM results are also shown for comparison.

IV. EXPERIMENTS RESULTS AND DISCUSSION

A. Single-Band Land Cover Detection: Experiment 1

Fig. 3 shows the global success rate obtained when a single spectral band is used to retrieve the land cover (Dataset 1, Tessera). Of course, NPDA, PCDA and ICDA give the same results in this case. The improvement due to the nonparametric density estimate is very clear across all wavelengths; however, the more time-consuming SVM algorithm attains an even better global success rate on almost every spectral band. The same results sorted by land cover are shown in the left part of Tables II and III. Values are very good in general for all land classes (even if the figures for Dataset 2, Pollino, shown in Table III are lower due to the poorer training dataset), confirming the role of multispectrality in detecting different classes. However, a certain bias in the success rate of some of the methods towards the most populated classes can be observed. In particular, it should be noted that SVM is more sensitive than the other techniques to the differences in size of the training set for the different classes: indeed, the least populated ones in both datasets are not detected by the SVM classifier. We therefore decided to repeat the same experiment adopting a bootstrapping procedure that randomly chooses a training subset composed of $n$ pixels per class (with $n$ set separately for Dataset 1 and for Dataset 2) and repeats the classification over 20 replicas. Results reported in the right part of the same tables show little variation in the success rates of QDA and NPDA; on the contrary, SVM results dramatically improve. Finally, looking at the selected wavelengths, it can be observed that the Red region (0.62–0.70 μm), showing the strongest contrast to soil reflection due to high chlorophyll absorption (which is related to the status and characteristics of vegetation), performs very well in terms of success rate.
Moreover, some wavelengths pertaining to the SWIR region have a good success rate as they allow a clear discrimination between soil and vegetation. For the Tessera site, however, the lower success rate of wavelengths pertaining to the Green region (0.5–0.6 μm) could be explained by the fact that crops on



TABLE II 1-BAND TOP SUCCESS RATE FOR THE DIFFERENT METHODS ON DATASET 1. RESULTS FOR THE ENTIRE TRAINING DATASET WITH CORRESPONDING WAVELENGTHS (LEFT) AND AVERAGE VALUES WITH STANDARD DEVIATION ON A RANDOM SUBSAMPLE OVER 20 REPLICAS (RIGHT)

TABLE III 1-BAND TOP SUCCESS RATE FOR THE DIFFERENT METHODS ON DATASET 2. RESULTS FOR THE ENTIRE TRAINING DATASET WITH CORRESPONDING WAVELENGTHS (LEFT) AND AVERAGE VALUES WITH STANDARD DEVIATION ON A RANDOM SUBSAMPLE OVER 20 REPLICAS (RIGHT)

Fig. 4. Global success percentage of the considered classification methods as a function of the number of spectral bands, when the whole training dataset is considered (Experiment 2, Dataset 1); only the first 20 selected bands are shown.

the scene are all characterized by a high pigment concentration that affects the Green region by producing a low variability of reflectance values among the crop land cover classes (i.e., a high pigment concentration lowers the chlorophyll reflectance peak).

B. Land Cover Detection: Experiment 2

Fig. 4 shows the global percentage of success for the classification methods of Section II when the spectral bands are progressively chosen by the simple forward procedure (only the first 20 selected bands are shown for readability). The figure shows evidence of all the theoretical issues of the statistical methodologies. LDA and QDA involve a parametric density estimation and therefore they exhibit a greater robustness with respect to the density estimation. PCDA and ICDA are able to circumvent the “curse of dimensionality” since they are based on a prior transform of the data into decorrelated or independent components. NPDA does not possess either of the two features and actually it performs the worst; in particular its success rate quickly decreases with the number of spectral bands. Moreover, QDA has a better success rate than LDA, as expected due to its more accurate estimate of variance for each class; its performance almost approaches that of PCDA and ICDA, showing that a robust estimate of density is somewhat able to address the “curse of dimensionality”. The multi-class SVM classifier performance, as already observed in the results of the previous experiment, is affected by the differences in size of the considered training sets; however, we defer a more detailed analysis of its results to the following discussion.

Indeed, the results of Fig. 4 are global, in the sense that they refer to all pixels and land cover classes; as such, they could be misleading when we consider each class separately, since the extension of the different classes inside the image varies very much and therefore the global indicator practically only takes account of the most populated ones. Therefore, Table IV shows the statistical indicators introduced in Section III for QDA, PCDA, ICDA and SVM, when the 10 best channels are selected. We observe that the wheat stubbles class is characterized by the lowest value of accuracy because it poorly meets the Gaussianity hypothesis. In fact, wheat stubble fields are composed of soil and stubble, which have different spectral behaviour, mixed together in different concentrations. The ICDA method globally works in the same manner as PCDA; however, ICDA gives a better trade-off between false positive and false negative indicators, therefore often



TABLE IV CLASS-SPECIFIC AND GLOBAL PERFORMANCE INDICATORS FOR THE BEST PERFORMING METHODS WHEN THE WHOLE TRAINING DATASET IS CONSIDERED (EXPERIMENT 2). CLASSIFICATION WAS PERFORMED USING 10 SPECTRAL BANDS. RESULTS REFER TO DATASET 1

TABLE V CLASS-SPECIFIC AND GLOBAL PERFORMANCE INDICATORS FOR THE BEST PERFORMING METHODS WHEN THE WHOLE TRAINING DATASET IS CONSIDERED (EXPERIMENT 2). CLASSIFICATION WAS PERFORMED USING 10 SPECTRAL BANDS. RESULTS REFER TO DATASET 2

achieving higher values of the κ-statistic. Table IV also helps clarify the weakness of the SVM classifier on this dataset: while its accuracy is very high on most land classes, it completely fails to identify Mix Woods and hardly detects Urban, the least populated classes, as already observed in Table II for the previous Experiment. Table V shows the same statistical indicators of Section III for the Pollino dataset. Not surprisingly, in this case the less numerous (and therefore less significant) training dataset leads to better performance for the Discriminant Analysis techniques, while at the same time exacerbating the already noted weakness of the SVM classifier, which completely fails to detect two land classes. To better clarify how the proposed procedures discriminate the different land cover classes, Fig. 5 shows a 2D representation of radiance in the MIVIS feature space for the Tessera training dataset. The top panel shows a scatterplot of radiance for the two best spectral channels as selected by the NPDA step-forward procedure. The lower panels show the same scatterplot for the two principal components (left) and the two independent components (right) selected by the PCDA and ICDA step-forward procedures, respectively. In all panels, different colors refer to the different land classes, as reported in the top panel legend. It is clear that a transform into principal or independent components

dramatically separates the land classes in the feature space, improving their detectability; in addition, as already reported in Table IV, ICDA yields better false positive and false negative rates. Fig. 6 shows the land classes predicted on the whole image of Fig. 1, using the full training dataset of Fig. 1, by PCDA and by SVM, respectively, on the first 10 spectral bands. For PCDA, we notice poor classification along the left and right boundaries of the image. This is due to the residual atmospheric path radiance effect, which can be relevant in all MIVIS images because of the sensor's wide FOV (about 35°). Therefore, the classification performance can be affected by this atmospheric noise. This effect is not present in the SVM classification, where instead the Urban class is almost undetected and several artifacts (stripes) reveal unrealistic mixing between land classes. Fig. 7 shows the classification of the whole Pollino image obtained by ICDA and SVM, respectively, using the best 10 spectral bands. On such a fragmented landscape, with a very limited available training set, a quantitative assessment of the compared techniques, as reported in Table VII for the case of coincident training and test datasets, is to be preferred to a visual assessment. However, the classification maps confirm a complete failure of the SVM classifier; by contrast, the ICDA classification results look more realistic.
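The caveat raised above about global indicators can be made concrete with a small numerical sketch. The Python fragment below is illustrative only: the function name `class_indicators`, the toy labels, and the specific definitions (per-class detection rate, false negative rate, false positive rate, and Cohen's kappa as the agreement statistic) are assumptions made for this example and may differ from the indicators defined in Section III.

```python
from collections import Counter


def class_indicators(y_true, y_pred):
    """Return per-class rates, the global success rate, and Cohen's kappa.

    All definitions here are assumptions chosen for illustration.
    """
    n = len(y_true)
    labels = sorted(set(y_true) | set(y_pred))
    pairs = Counter(zip(y_true, y_pred))  # confusion counts keyed by (true, predicted)
    true_tot = Counter(y_true)            # pixels per true class
    pred_tot = Counter(y_pred)            # pixels per predicted class

    per_class = {}
    for c in labels:
        tp = pairs[(c, c)]
        fn = true_tot[c] - tp
        fp = pred_tot[c] - tp
        tn = n - tp - fn - fp
        per_class[c] = {
            "detected": tp / true_tot[c] if true_tot[c] else 0.0,
            "false_neg_rate": fn / true_tot[c] if true_tot[c] else 0.0,
            "false_pos_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        }

    # Global success rate: fraction of pixels on the confusion-matrix diagonal.
    observed = sum(pairs[(c, c)] for c in labels) / n
    # Agreement expected by chance, from the row and column totals.
    expected = sum(true_tot[c] * pred_tot[c] for c in labels) / (n * n)
    kappa = (observed - expected) / (1 - expected) if expected < 1 else 0.0
    return per_class, observed, kappa


# Toy imbalanced scene: 95 "crop" pixels and 5 "urban" pixels,
# with a classifier that labels every pixel as "crop".
y_true = ["crop"] * 95 + ["urban"] * 5
y_pred = ["crop"] * 100

per_class, success, kappa = class_indicators(y_true, y_pred)
print(f"global success = {success:.2f}, kappa = {kappa:.2f}")
print("urban:", per_class["urban"])
```

In this toy scene the global success rate is 0.95 even though the minority class is never detected, while the agreement statistic drops to zero; this is the kind of behaviour that class-specific indicators are meant to expose.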



C. Spectral Band Selection

Fig. 5. (a) Radiance measured on the MIVIS channels 20 (wavelength 0.83 μm) and 15 (wavelength 0.72 μm) for all the training data in Dataset 1; (b) Principal Components and (c) Independent Components in the MIVIS feature space for the same data.

Fig. 6. Results of Principal Components Discriminant Analysis (left) and Support Vector Machine (right) classification for the Dataset 1 (Tessera), when all the training data are considered and the best 10 spectral bands are selected.

Finally, we discuss the selection of spectral bands accomplished by the methods. Tables VI and VII show the bands progressively chosen and the corresponding global success rate for the discriminant analysis methods considered in Section II (Tessera and Pollino areas, respectively). The wavebands listed in Tables VI and VII are highly relevant for characterizing vegetation and crops, as reported in many literature references (see [25] for a comprehensive bibliography relating wavelengths to physical vegetation characteristics). In particular, each of the five wavelengths selected as most relevant by the different methodologies has been paired with a label denoting its physical meaning in terms of vegetation characteristics. More specifically, TPI indicates wavelengths that are sensitive to the total (chlorophyll + carotenoids) pigment concentration; PI is mainly related to chlorophyll reflective peaks; RE stands for red edge, which is sensitive to plant stress but also provides additional information about the chlorophyll and nitrogen status of plants; PM denotes spectral regions related to moisture absorption, which are therefore sensitive to plant moisture; BLAI stands for regions influenced by Biomass and LAI (Leaf Area Index); T corresponds to TIR bands and is therefore related to soil/vegetation contrast; ABS corresponds to wavelengths related to vegetation absorption maxima, where the soil/crop differences are highest for most crops in most growing conditions; MLIG identifies wavelengths related to plant moisture and sensitive to lignin, cellulose and starch. The most important conclusion is that the spectral bands selected by the considered discriminant analysis methodologies as the best performing in terms of success rate are not exactly the same for the two sites; however, all the highest-scoring bands fall in the VNIR spectral region, while just a few bands are in the SWIR region.
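As a rough illustration of how a forward band ranking of this kind can be produced, the following Python sketch adds, at each step, the band that most improves a global success rate. It is a minimal sketch under stated assumptions, not the procedure of Section II: the helper names `forward_select` and `centroid_success`, the toy data, and the nearest-centroid rule used as a stand-in for the discriminant analysis classifiers are all hypothetical, and the score is computed on the training pixels themselves.

```python
def forward_select(n_bands, score, k):
    """Greedy forward selection: at each step add the band that
    maximizes score(selected + [band]). Returns the chosen bands
    and the score reached after each addition."""
    selected, history = [], []
    remaining = list(range(n_bands))
    for _ in range(min(k, n_bands)):
        scores = {b: score(selected + [b]) for b in remaining}
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
        history.append(scores[best])
    return selected, history


def centroid_success(X, y, bands):
    """Success rate of a nearest-centroid rule restricted to `bands`.
    This is only a stand-in classifier for the example."""
    classes = sorted(set(y))
    centroids = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        centroids[c] = [sum(r[b] for r in rows) / len(rows) for b in bands]
    hits = 0
    for x, label in zip(X, y):
        dist = {
            c: sum((x[b] - m) ** 2 for b, m in zip(bands, centroids[c]))
            for c in classes
        }
        if min(dist, key=dist.get) == label:
            hits += 1
    return hits / len(y)


# Toy data with three "bands" (hypothetical values):
#   band 0 carries no class information,
#   band 1 separates class "a" from "b" and "c",
#   band 2 separates "b" from "c" but is ambiguous for "a".
X = [
    [5, 0, 1], [4, 1, 9], [6, 0, 5],        # class a
    [5, 10, 0], [6, 9, 1], [4, 10, 0],      # class b
    [5, 10, 10], [4, 9, 9], [6, 10, 10],    # class c
]
y = ["a"] * 3 + ["b"] * 3 + ["c"] * 3

selected, history = forward_select(
    3, lambda bands: centroid_success(X, y, bands), k=2
)
print("selected bands:", selected)
print("success after each step:", history)
```

On this toy data the uninformative band is never selected, and two bands already suffice to separate the three classes. Any classifier that returns a success rate for a given band subset could be passed as `score` in place of the nearest-centroid stand-in.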
The predominance of VNIR bands confirms that in both test areas (mainly dominated by vegetation land cover) the reflectance properties of leaf components in the VNIR (pigments, chlorophyll, water content) are of primary importance for discriminating vegetation, as is brightness temperature for discriminating vegetation from soil. In particular, on the Pollino test site, where soil and rock outcrop significantly over the entire image, some bands in the SWIR region, which is typically characterized by a lower SNR for the MIVIS sensor, were selected by the three nonparametric discriminant analysis methods, according to their capacity to decorrelate signal from noise.

D. Land Cover Detection—Experiment 3

Fig. 7. Results of Independent Components Discriminant Analysis (top) and Support Vector Machine (bottom) classification for Dataset 2 (Pollino), when all the training data are considered and the best 10 spectral bands are selected.

The results of this last Experiment are intended to assess the performance of the proposed procedures when different datasets (or different parts of the same image) are used for band selection, training and testing. Table VIII shows the rate of success for the discriminant analysis methods of Section II in the setup of Experiment 3 for the Tessera dataset. The rate of success is computed as the average over 20 replicas, each with a different random partition of the blocks composing the three sub-datasets. In this experiment, for some classes the number of data in each subset is less than the number of spectral bands; this



TABLE VI BANDS PROGRESSIVELY CHOSEN FOR THE CLASSIFICATION BY ALL THE METHODS WHEN THE ENTIRE TRAINING DATASET IS CONSIDERED (EXPERIMENT 2). FOR THE FIRST FIVE SELECTED BANDS, THE WAVELENGTHS, GLOBAL SUCCESS PERCENTAGE, AND CORRESPONDING PHYSICAL MEANING, AS SPECIFIED IN SECTION IV.C, ARE REPORTED. VALUES REFER TO DATASET 1

TABLE VII BANDS PROGRESSIVELY CHOSEN FOR THE CLASSIFICATION BY ALL THE METHODS WHEN THE ENTIRE TRAINING DATASET IS CONSIDERED (EXPERIMENT 2). FOR THE FIRST FIVE SELECTED BANDS, THE WAVELENGTHS, GLOBAL SUCCESS PERCENTAGE, AND CORRESPONDING PHYSICAL MEANING, AS SPECIFIED IN SECTION IV.C, ARE REPORTED. VALUES REFER TO DATASET 2

TABLE VIII GLOBAL SUCCESS PERCENTAGE AS A FUNCTION OF THE NUMBER OF SELECTED BANDS, WHEN THE TRAINING DATASET IS RANDOMLY SPLIT INTO THREE SUBSETS (EXPERIMENT 3—DATASET 1). RESULTS ARE GIVEN AS MEAN VALUE ± STANDARD DEVIATION OVER THE REPLICAS

is the main reason for the slightly worse performance of the nonparametric techniques, since reducing the training set degrades the accuracy of the density estimation. In particular, with the reduced training set both fastICA (in retrieving independent components) and LIBSVM are often ineffectual, which is why ICDA and SVM results are omitted. However, a comparison with Table VI shows that the performance is still very good (between 95% and 97% for both the “best” methods, PCDA and QDA, as identified in Experiment 2). Analogous results (not shown for the sake of brevity) are also obtained for the full error indicators of Table IV. Finally, Table IX shows the same results for the Pollino dataset, where the decrease in performance is larger for a small number of spectral bands because, as already outlined, the training dataset is less numerous and the density estimation is therefore significantly degraded.

V. CONCLUSION

This paper has demonstrated the validity of discriminant analysis for classifying hyperspectral vegetation images and, in particular, the superiority of prior transforms of the data aimed at making them independent or, at least, uncorrelated.

TABLE IX GLOBAL SUCCESS PERCENTAGE AS A FUNCTION OF THE NUMBER OF SELECTED BANDS, WHEN THE TRAINING DATASET IS RANDOMLY SPLIT INTO THREE SUBSETS (EXPERIMENT 3—DATASET 2). RESULTS ARE GIVEN AS MEAN VALUE ± STANDARD DEVIATION OVER THE REPLICAS

To simulate the future availability of PRISMA data, the study was carried out on airborne hyperspectral data (MIVIS) from two test sites, Tessera (near Venice) and the Pollino mountain. Notably, in-situ validations of the vegetation type were made during the campaigns, which allowed us to estimate the accuracy rates of the classification methods. Classification was accomplished through some classical discriminant analysis methods enhanced with prior transforms of the data that estimate their principal or independent components. Both PCA and ICA perform better than methods that are not based on a prior transform of the data. In addition, the ICA transform better balances the false positive and false negative rates. Moreover, both these methods compare favorably with the multi-class SVM: with larger training sets, they attain nearly the same accuracy at an appreciably lower computational cost, while with a smaller or inhomogeneous training set they show greater robustness and a better capability to identify all classes. An assessment of the role of the spectral regions and of the different vegetation classes has also been made. It has been shown that a few well-selected spectral bands suffice to give performance close to the best reachable with more bands. Moreover, the performance is very good on all the vegetation types. The analysis of the results shows that, even though with differences in the scoring of the spectral channels also related to the



sensor characteristics used for this study, all methods rely basically on the Visible and NIR spectral ranges to provide a good classification. Moreover, the SWIR spectral region can benefit the classification accuracy, depending on the method chosen and on the spatial consistency of the selected training areas. The performance of the applied methods and the suitability of the results need to be verified on simulated PRISMA data once ASI distributes data sets for agricultural test sites. The results of the selection procedure for the spectral bands also give us information on the most significant ones to be considered for image segmentation. Future work will be devoted to integrating a multispectral segmentation module into the proposed procedure, in order to retrieve the different vegetation boundaries more accurately than usual edge detection procedures and at a reasonable computational cost. Such segmentation results could be useful for improving the classification by post-processing the unclassified pixels.

REFERENCES

[1] U. Amato, A. Antoniadis, V. Cuomo, L. Cutillo, M. Franzese, L. Murino, and C. Serio, “Statistical cloud detection from SEVIRI multispectral images,” Remote Sens. Environ., vol. 112, pp. 750–766, 2008.
[2] U. Amato, A. Antoniadis, and G. Gregoire, “Independent component discriminant analysis,” Int. J. Math., vol. 3, pp. 735–753, 2003.
[3] U. Amato, R. M. Cavalli, A. Palombo, S. Pignatti, and F. Santini, “Experimental approach to the selection of the components in the minimum noise fraction,” IEEE Trans. Geosci. Remote Sens., vol. 47, no. 1, pp. 153–160, 2009.
[4] T. V. Bandos, L. Bruzzone, and G. Camps-Valls, “Classification of hyperspectral images with regularized linear discriminant analysis,” IEEE Trans. Geosci. Remote Sens., vol. 47, no. 3, pp. 826–873, 2009.
[5] A. Barducci, D. Guzzi, C. Lastri, P. Marcoionni, V. Nardino, and I. Pippi, “Simulating the performance of the hyperspectral payload of the PRISMA mission,” in Proc. IGARSS 2012, Munich, Germany, 2012, pp. 5013–5016.
[6] L. S. Bernstein, A. Berk, D. C. Robertson, P. K. Acharya, G. P. Anderson, and J. H. Chetwynd, “Addition of a correlated-k capability to MODTRAN,” in Proc. 1996 IRIS Targets, Backgrounds, and Discrimination Meeting, 1996.
[7] J. W. Boardman, “Inversion of imaging spectrometry data using singular value decomposition,” in Proc. IGARSS 1989, Vancouver, BC, Canada, 1989, pp. 2069–2072.
[8] G. Camps-Valls and L. Bruzzone, “Kernel-based methods for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 43, no. 6, pp. 1351–1362, 2005.
[9] C. C. Chang and C. J. Lin, LIBSVM: A Library for Support Vector Machines, 2007 [Online]. Available: http://www.csie.ntu.edu.tw/~cjlin/libsvm
[10] A. A. Green, M. Berman, P. Switzer, and M. D. Craig, “A transformation for ordering multispectral data in terms of image quality with implications for noise removal,” IEEE Trans. Geosci. Remote Sens., vol. 26, no. 1, pp. 65–74, 1988.
[11] P. Groves and P. Bajcsy, “Methodology for hyperspectral band and classification model selection,” in Proc. IEEE Workshop on Advances in Techniques for Analysis of Remotely Sensed Data, 2003, pp. 120–128.
[12] T. Hastie and R. Tibshirani, “Discriminant analysis by Gaussian mixtures,” J. Royal Statist. Soc. B, vol. 58, pp. 155–176, 1996.
[13] A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component Analysis. New York, NY, USA: Wiley, 2001.
[14] A. Hyvärinen, “Fast and robust fixed-point algorithms for independent component analysis,” IEEE Trans. Neural Networks, vol. 10, no. 3, pp. 626–634, 1999.
[15] D. A. Landgrebe, Signal Theory Methods in Multispectral Remote Sensing. Hoboken, NJ, USA: Wiley, 2003.
[16] D. Lu, E. Moran, and M. Batistella, “Linear mixture model applied to Amazonian vegetation classification,” Remote Sens. Environ., vol. 87, pp. 456–469, 2003.
[17] J. C. Luo, J. Zheng, Y. Leung, and C. H. Zhou, “A knowledge-integrated stepwise optimization model for feature mining in remotely sensed images,” Int. J. Remote Sens., vol. 24, no. 23, pp. 4661–4680, 2003.
[18] F. Melgani and L. Bruzzone, “Classification of hyperspectral remote sensing images with support vector machines,” IEEE Trans. Geosci. Remote Sens., vol. 42, no. 8, pp. 1778–1790, 2004.
[19] P. Mitra, B. U. Shankar, and S. K. Pal, “Segmentation of multispectral remote sensing images using active support vector machines,” Pattern Recognition Lett., vol. 25, no. 9, pp. 1067–1074, 2004.
[20] M. S. Moran, J. D. Jackson, P. N. Slater, and P. M. Teillet, “Evaluation of simplified procedures for retrieval of land surface reflectance factors from satellite sensor output,” Remote Sens. Environ., vol. 41, pp. 169–184, 1992.
[21] R. D. Phillips, L. T. Watson, R. H. Wynne, and C. E. Blinn, “Feature reduction using a singular value decomposition for the iterative guided spectral class rejection hybrid classifier,” ISPRS J. Photogramm. Remote Sens., vol. 64, pp. 107–116, 2009.
[22] S. Pignatti, R. M. Cavalli, V. Cuomo, L. Fusilli, S. Pascucci, M. Poscolieri, and F. Santini, “Evaluation of hyperion capability for land covers mapping in a fragmented ecosystem: Pollino National Park (Italy) case study,” Remote Sens. Environ., vol. 113, no. 3, pp. 622–634, 2009.
[23] S. Rajan, J. Ghosh, and M. Crawford, “An active learning approach to hyperspectral data classification,” IEEE Trans. Geosci. Remote Sens., vol. 46, no. 4, pp. 1231–1242, 2008.
[24] J. A. Richards and X. Jia, Remote Sensing Digital Image Analysis: An Introduction, 4th ed. Berlin, Germany: Springer, 2006.
[25] P. S. Thenkabail, E. A. Enclona, M. S. Ashton, and B. Van Der Meer, “Accuracy assessments of hyperspectral waveband performance for vegetation analysis applications,” Remote Sens. Environ., vol. 91, no. 3, pp. 354–376, 2004.
[26] J. A. N. van Aardt and R. H. Wynne, “Examining pine spectral separability using hyperspectral data from an airborne sensor: An extension of field-based results,” IEEE Trans. Geosci. Remote Sens., vol. 28, no. 2, pp. 431–436, 2007.
[27] A. Villa, J. A. Benediktsson, J. Chanussot, and C. Jutten, “Hyperspectral image classification with independent component discriminant analysis,” IEEE Trans. Geosci. Remote Sens., vol. 49, no. 12, pp. 4865–4876, 2011.
[28] T. Wachtler, T. W. Lee, and T. J. Sejnowski, “Chromatic structure of natural scenes,” J. Optical Society of America A, vol. 18, no. 1, pp. 65–77, 2001.
[29] M. P. Wand and M. C. Jones, Kernel Smoothing. London, UK: Chapman & Hall, 1995.
[30] L. Zhang and X. Huang, “Object-oriented subspace analysis for airborne hyperspectral remote sensing imagery,” Neurocomputing, vol. 73, pp. 927–936, 2010.

Umberto Amato received the M.Sc. degree in physics from the University of Naples, Naples, Italy, in 1981. He was a Researcher with the Company for Energy Saving (A.P.R.E.), Naples, and a Professor with high schools until 1986, when he joined the Institute for Mathematics Applications (IAC), Italian National Research Council (CNR), Naples, as a Researcher. He is currently the Director of Research and Manager of IAC, CNR, Naples. He has co-authored nearly 100 peer-reviewed papers in international journals. His current research interests include development of statistical methods for nonparametric regression, classification, and dimension reduction applied to problems arising from remote sensing (analysis of images measured by sensors on board aircraft or satellites) and medicine (magnetic resonance imaging).

Anestis Antoniadis received the Doctorate degree in applied mathematics from the University Joseph Fourier, Grenoble, France, in 1983. He is a University Distinguished Professor with the Department of Applied Mathematics (Laboratoire Jean Kuntzmann), University Joseph Fourier, Grenoble. His research interests include wavelet theory, nonparametric function estimation, abstract inference of stochastic processes, statistical pattern recognition, and statistical methodology in meteorology and crystallography. He co-edited the book Wavelet in Statistics (Springer-Verlag, 1995). He was a joint Editor-in-Chief of the journal ESAIM: Probability and Statistics from 2001 to 2005. Dr. Antoniadis is a Fellow of the American Statistical Association and the Institute of Mathematical Statistics.

Maria Francesca Carfora received the M.Sc. degree in mathematics from the University of Naples “Federico II,” Naples, Italy, in 1988, and a two-year fellowship from the Italian National Research Council (CNR), Naples. She has been a Researcher with the Institute for Mathematics Applications (IAC), CNR, Naples, Italy, since 1996. Her current research interests include development and implementation of numerical methods for partial differential equations on structured and unstructured grids with applications in medical and biological imaging, atmospheric physics, fluid dynamics, and biological systems; numerical solution of inverse problems; data analysis; statistical modeling; and parameter estimation in ordinary/partial/stochastic differential equations.

Paolo Colandrea received the M.Sc. degree in oceanography and remote sensing from the University of Napoli Parthenope, Italy, in 1997. Since 2008 he has worked in small companies involved in the development of software and applications for remote sensing for national and international agencies and industries. In 2008 he joined CGS (Compagnia Generale per lo Spazio, formerly Carlo Gavazzi Space) and he is currently involved in the development of the main Italian optical space missions. He has extensive experience in developing algorithms for active and passive satellite data processing in the optical and microwave ranges, and services for satellite data exploitation for civil use. He has long experience in project management at the national and international level, in both industrial and research and development fields.

Vincenzo Cuomo received the M.Sc. degree in physics from the University of Naples, Naples, Italy, in 1972. He was a Researcher from 1973 to 1976, then Associate Professor of physics from 1976 to 1987 with the Engineering Faculty, University of Naples. From 1987 to 2012 he was a Full Professor with the University of Basilicata, Italy, and from 1993 to 2010 he was Director of the Institute of Methodologies for Environmental Analysis/National Research Council, Italy. He has dealt with environmental issues since the mid-1970s, his particular focus being on the monitoring and control of environmental processes; relations between human activities and natural systems; protection against and prevention of natural risks; energy and environmental planning; and Earth Observations. In all these fields, he is also very involved in the development of applications. He has variously acted as a member of steering committees issued by national and European organizations and research institutions on problems of environmental protection and monitoring from satellite, land uses, and next-generation space-borne sensors. He has been responsible for many research projects funded by EUMETSAT (European Organization for the Exploitation of Meteorological Satellites), ESA, the European Union, the Italian Space Agency, the Italian Ministry of Research, and CNR. He is currently responsible for the SAP4PRISMA Project, funded by the Italian Space Agency to study the hyperspectral mission PRISMA.

Monica Franzese received the M.Sc. degree in mathematics from the II University of Naples S.U.N., Caserta, Italy, in 1999, and, after a one-year research fellowship from the Italian National Research Council (CNR), she received her Ph.D. in 2008 from the Engineering Faculty of the University of Potenza, Italy. Her research activity during the research fellowship and the doctorate was in the field of remote sensing and involved the development of statistical methods for parametric and nonparametric classification applied to remote sensing problems. She currently holds a research fellowship at the Institute for Mathematics Applications (IAC), Italian National Research Council (CNR), Naples. Her current research interests include development and implementation of statistical methods for bioinformatics and systems biology.

Stefano Pignatti received the M.Sc. degree in geology from the University of Rome “La Sapienza,” Rome, Italy, in 1988. Since 1995, he has worked as a researcher with the Institute of Methodologies for Environmental Analysis of the Italian National Research Council. His interests include data calibration and analysis of hyperspectral sensors onboard aircraft or satellites in the field of environmental applications. He is currently the coordinator of the SAP4PRISMA scientific study for supporting the exploitation of the next Italian hyperspectral mission PRISMA (ASI).

Carmine Serio received his degree with honours in physics from the University of Naples, Italy, in February 1978. Since joining the University of Naples as an Assistant Professor in 1984, he has led projects and studies in the area of radiative transfer modelling applied to the remote sensing of the Earth's atmosphere. From 1990 to 1992 he acted as temporary Professor of General Physics at the University of Naples. In 1992 he joined the University of Basilicata, Italy, where he currently leads a research group in infrared and visible spectroscopy applied to remote sensing of aerosol, atmospheric gas constituents and temperature, and surface parameters. He is Principal Investigator in various projects in the area of high spectral resolution infrared sounders from satellite. These projects include the interferometric monitoring for greenhouse gases (IMG of the Japanese NASDA), the infrared atmospheric sounder interferometer (IASI of the French Space Agency, CNES, and EUMETSAT), the radiation explorer in the far infrared (REFIR, a three-year EU project supported within the 4th framework programme), the Meteosat Third Generation Infrared Sounder (MTG-IRS, a joint ESA/EUMETSAT programme), and IASI-next-generation (IASI-NG of CNES and EUMETSAT). He is a permanent member of the TOVS working group, and was an elected member and then Secretary of the International Radiation Commission (term 2001–2008). Because of his expertise he has acted as a reviewer of satellite experiments and projects. He teaches at the School of Engineering of the University of Basilicata, where he is currently Full Professor of General Physics.