Copyright © 2009 IEEE

Reprinted from: Tuia, D.; Pacifici, F.; Kanevski, M.; Emery, W. J., "Classification of Very High Spatial Resolution Imagery Using Mathematical Morphology and Support Vector Machines," IEEE Transactions on Geoscience and Remote Sensing, vol. 47, no. 11, pp. 3866–3879, Nov. 2009. doi: 10.1109/TGRS.2009.2027895. URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5256162&isnumber=5291960

This material is posted here with permission of the IEEE. Such permission of the IEEE does not in any way imply IEEE endorsement of any products or services. Internal or personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE by writing to [email protected]. By choosing to view this document, you agree to all provisions of the copyright laws protecting it.

Classification of Very High Spatial Resolution Imagery Using Mathematical Morphology and Support Vector Machines

Devis Tuia, Student Member, IEEE, Fabio Pacifici, Student Member, IEEE, Mikhail Kanevski, and William J. Emery, Fellow, IEEE

Abstract—We investigate the relevance of morphological operators for the classification of land use in urban scenes using submetric panchromatic imagery. A support vector machine is used for the classification. Six types of filters have been employed: opening and closing, opening and closing by reconstruction, and opening and closing top hat. The type and scale of the filters are discussed, and a feature selection algorithm called recursive feature elimination is applied to decrease the dimensionality of the input data. The analysis performed on two QuickBird panchromatic images showed that simple opening and closing operators are the most relevant for classification at such a high spatial resolution. Moreover, mixed sets combining simple and reconstruction filters provided the best performance. Tests performed on both images, having areas characterized by different architectural styles, yielded similar results for both feature selection and classification accuracy, suggesting the generalization of the feature sets highlighted.

Index Terms—Mathematical morphology, recursive feature elimination (RFE), support vector machines (SVMs), urban land use, very high resolution imagery.

Manuscript received October 1, 2008; revised March 4, 2009. First published September 22, 2009; current version published October 28, 2009. This work was supported in part by the Swiss National Foundation under Grants 100012-113506 and 200021-113944. D. Tuia and M. Kanevski are with the Institute of Geomatics and Analysis of Risk, University of Lausanne, 1015 Lausanne, Switzerland (e-mail: Devis. [email protected]; [email protected]). F. Pacifici is with the Earth Observation Laboratory, Department of Computer, Systems and Production Engineering (DISP), "Tor Vergata" University, 00133 Rome, Italy (e-mail: [email protected]). W. J. Emery is with the Department of Aerospace Engineering Sciences, University of Colorado, Boulder, CO 80309 USA (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TGRS.2009.2027895

I. INTRODUCTION

During the last two decades, significant progress has been made in developing and launching satellites with instruments well suited for Earth observation with increasingly finer resolutions in both the spatial and spectral domains. Improvements in the spatial resolution of optical sensors opened a wide range of opportunities for remote sensing image analysis. Specifically, remote sensing data from panchromatic systems such as QuickBird and WorldView-1 have a potential for more detailed and accurate mapping of the urban environment with details of submeter ground resolution. At the same time, they present additional problems for information mining and are

rarely exploited for urban classification due to their lack of spectral information. Therefore, panchromatic imagery has often been fused with multispectral imagery using pansharpening methods [1]–[3] in order to exploit the spectral information at higher spatial resolution.

In the past, adding textural or morphological features for the classification of satellite images has been shown to overcome the lack of spectral information [4]. Texture is the term used to characterize the tonal or gray-level variations in an image. A widely used technique to extract textural information is to build a so-called gray-level cooccurrence matrix (GLCM), proposed by Haralick et al. [5], [6]. In [7], six textural parameters associated with GLCM statistics have been investigated to perform image clustering on an Advanced Very High Resolution Radiometer image. Two of them, namely, energy and contrast, are highlighted as the most significant. In [8], the authors exploited a linear discriminant analysis to classify a mixed spectral-textural feature set, resulting in a general improvement of the classification results. Airborne photography has been used as well to extract first- and second-order textural features to perform supervised classification of forest regions [9]. Pacifici et al. [10] used GLCM to classify very high resolution QuickBird and WorldView-1 urban scenes with neural networks and proposed extended pruning to reduce the number of features. Other techniques to extract spatial information have been proposed in the literature, including Markov random fields [11]–[14] or Gabor filters [15]–[17].

A different way to integrate contextual information is to extract shape information of single objects using mathematical morphology [18], [19]. Mathematical morphology provides a collection of image filters (called operators) based on set theory. In remote sensing, morphological operators have been used to classify remotely sensed images at decametric and metric resolutions and have been highlighted as very promising tools for data analysis [20], [21]. The effective results obtained with panchromatic imagery [22], [23] using basic operators such as opening and closing have focused the attention of the scientific community on the use of morphological analysis for image classification. In [24] and [25], the authors use morphological operators for image segmentation. Pesaresi and Benediktsson [20] proposed building differential morphological profiles (DMPs) to account for differences in the values of the morphological profiles at different scales. Such profiles have been used in [20] and [26] for Indian Remote Sensing Satellite (IRS)-1C panchromatic

image segmentation, using the maximal derivative of the DMP, because this value shows the optimal size of the structure of the object. In [27], complete DMPs have been used for both IRS-1C and IKONOS panchromatic image classification. Particularly, reconstruction DMPs have been used with a neural network classifier, and two linear feature extraction methods have been proposed in order to reduce the redundancy in the information. More recently, the extraction of morphological information from hyperspectral imagery has been discussed. In [28]–[30], the first principal component has been used to extract the morphological images. In [31], morphological operators have been proposed for multichannel images. In [32], extended opening and closing operators by reconstruction are used for target detection on both Airborne Visible/Infrared Imaging Spectrometer and Reflective Optics System Imaging Spectrometer data. In [33], multichannel and directional morphological filters are proposed for hyperspectral image classification, showing the ability to account simultaneously for scale and orientation of objects.

The use of morphological operators may result in a very high input space dimensionality, since it is possible to extract many morphological features using different filters or by changing the size and the shape of the structuring elements (SEs). This may result in largely empty spaces and can harm the quality of class statistics, which may become highly variable. This effect is known in the literature as the Hughes phenomenon [34], [35]. Support vector machines (SVMs) [36], [37] are well known for their capability to handle high-dimensional spaces. For this reason, they are often used for classification of hyperspectral imagery [38]–[42]. Although SVMs are effective in extracting the relevant morphological information from the feature set, the large number of features extracted leads to expensive computations, and the presence of noisy or redundant features can degrade the classification performance. Thus, feature extraction/selection is a central problem when dealing with such input spaces.

Feature extraction reduces the dimensionality of the data by combination of the input features. The aim of these methods is to extract new features while maximizing discrimination between classes. Linear feature extraction methods for morphological operators have been studied in [27], [29], and [33], where algorithms such as decision boundary feature extraction [43] and nonparametric weighted feature extraction [44] have been shown to effectively reduce the size of the feature space. Despite the effectiveness of these methods, the extracted features are combinations of the original features, and all of them are necessary to build the new feature set.

On the contrary, feature selection is performed to select informative and relevant input features by analyzing the relevance of each of the input features for a certain classification problem. Pacifici et al. [10] showed this principle for textural features, identifying the ten most informative parameters for urban land-cover classification. The results obtained for an independent test case indicated the value of reducing the input set down to ten features and the capability of these ten to generalize to new scenes.

Feature selection algorithms can be divided into three classes: filters, wrappers, and embedded methods [45], [46]. Filters rank features based on feature relevance measures. Such measures can be computed either using a similarity measure,

such as correlation, or using a measure of distance between distributions (for instance, the Kullback–Leibler divergence between the joint and product distribution). Then, a search strategy such as forward or backward selection is applied in order to select the relevant subset of features. Filters may be considered as a preprocessing step and are generally independent of the selection phase. Remote sensing applications of filters can be found in [4] and [47]–[51].

Wrappers utilize the predictor as a black box to score subsets of features according to their predicting power. In a wrapper selector, a set of predictors receives subsets of the complete feature set and gives feedback on the quality of the prediction (represented, for instance, by accuracy or kappa statistics) using each subset.

Feature selection and learning phases interact in embedded methods, where the feature selection is part of the learning algorithm and is performed by using the structure of the classifying function itself. This is different from wrappers where, as pointed out, the learning algorithm is accessed as a black box and only the output is used to build the selection criterion. Several specific methods have been proposed for SVMs, either wrappers or embedded, depending on the degree of interaction between the classifier and the feature selection criterion. A well-known embedded backward selection algorithm is the recursive feature elimination (RFE), which uses the changes in the decision function of the SVM as a criterion for the selection [52], [53]. In [54], an embedded feature selection is proposed. The algorithm, based on boosting, finds an optimal weighting and eliminates the bands linked with small weights. In [55], a genetic algorithm is used to select an optimal subset of features for a successive SVM classification. Experiments using the radius margin bound and the number of support vectors as a fitness function to select the best subset have been carried out in [56]. Pal [57] uses the bounds of generalization of the SVM as a fitness function for a genetic algorithm in order to perform the selection.

In this paper, we report the results of an extensive analysis of different morphological operators with the goal of investigating the relevance of the most contributing features when applied to submetric panchromatic imagery for land-cover classification of urban scenes. To account for the spatial setting of different cities, these parameters have been calculated over a range of window scales. The effect of different filters, as well as their scale, is addressed and discussed in detail. The novelty of this paper is the extensive analysis of morphological features and their comparison when applied to scenes accounting for different urban settlements imaged at very high spatial resolution. The relevance of the analyzed features was assessed by the RFE feature selection algorithm. The comparison of the results highlighted a reduced set of features, giving insight into the types of morphological features and their scales that may help other researchers or urban planners to choose their input parameters more effectively for the classification of urban areas.

This paper is organized as follows. Section II recalls the basic theory of mathematical morphology, while Section III illustrates the RFE feature selection algorithm. In Section IV,

Fig. 1. (Left) Erosion and (right) dilation using a circular SE.

the data sets are presented, while in Section V, we describe the setup of the experiments. Results are reported and discussed in Section VI. Conclusions are in Section VII.

Fig. 2. Progressive opening and closing using a diamond-shaped SE. On the left end is the closing image produced using an 11-pixel SE, in the middle is the original image, and on the right end is the opening image produced using an 11-pixel SE.

II. MORPHOLOGICAL OPERATORS

Morphological operators are a collection of filters based on set theory. These operators are basically applied to two ensembles, the image g to analyze and an SE B, which is a set with known size and shape that is applied over the image and acts as a filter. When centered on a pixel x, B is a vector that takes into account all the values x_b of the pixels of g covered by the SE B. Morphological operators can be summarized into two fundamental operations: erosion \varepsilon_B(g) and dilation \delta_B(g) [19], whose principles are shown in Fig. 1 for binary images. Basically, erosion deletes all pixels whose neighborhood cannot contain a certain SE, i.e., it performs an intersection between the binary image g and B. On the contrary, dilation provides an expansion by addition of the pixels contained in the SE, i.e., a union between g and B. Mathematically, erosion and dilation can be represented as

\varepsilon_B(g) = \bigwedge_{b \in B} g_{-b}, \qquad \delta_B(g) = \bigvee_{b \in B} g_{-b}.   (1)

Binary morphology has been extended to grayscale images by considering them as a topographic relief, where brighter tones correspond to higher elevation [22], [33]. In grayscale morphology, intersection ∩ and union ∪ become the pointwise minimum ∧ and maximum ∨ operators, respectively [21]. In the following sections, we briefly discuss the filters used in this paper, which are composed of combinations of erosion and dilation operators.

A. Opening and Closing

Two of the most common morphological operators are the opening \gamma_B(g) and the closing \phi_B(g) operators. Opening is the dilation of an eroded image and is widely used to filter brighter (compared to surrounding features) structures in grayscale images. On the contrary, closing is the erosion of a dilated image and allows one to filter out darker structures [22]. Opening and closing operators can be represented as

\gamma_B(g) = \delta_B \circ \varepsilon_B(g), \qquad \phi_B(g) = \varepsilon_B \circ \delta_B(g).   (2)

Fig. 3. Progressive opening and closing by reconstruction using a diamond-shaped SE. On the left end is the CR image produced using an 11-pixel SE, in the middle is the original image, and on the right end is the OR image produced using an 11-pixel SE.

Fig. 2 shows a series of openings (and closings) obtained by applying SEs of increasing sizes, showing how the shapes of the objects in the image are missed with classical opening (and closing) operators. In general, this is not a desirable behavior in image classification, where pixels belonging to the same object should be grouped in the same class. In order to preserve original shapes in the morphological images, Crespo et al. [58] proposed to use opening (and closing) by reconstruction operators instead of simple opening and closing operators. These filters are recalled in the following section.
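As a concrete illustration of these filters, the short sketch below builds a multiscale opening/closing profile with scikit-image. This is only an assumption about tooling (the experiments in this paper used a dedicated C# application, see Section V-B), and the file name is a placeholder; the SE sizes mirror the 9-to-25-pixel range adopted in Section V-A.

```python
import numpy as np
from skimage import io
from skimage.morphology import diamond, opening, closing

# Placeholder input: a single panchromatic band as a 2-D array.
g = io.imread("panchromatic.tif").astype(np.float32)

# Diamond SEs with diameters of 9 to 25 pixels in steps of 2 (radii 4 to 12),
# as used for the Rome scene; a square SE would be used for Las Vegas.
features = []
for radius in range(4, 13):
    se = diamond(radius)
    features.append(opening(g, se))   # removes bright structures smaller than the SE
    features.append(closing(g, se))   # removes dark structures smaller than the SE

stack = np.dstack(features)           # one band per morphological feature
```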

B. Reconstruction Filtering

Reconstruction filters provide an iterative reconstruction of the original image g starting from a mask I. If the mask I is the erosion \varepsilon_B(g), the original brighter features are filtered by geodesic dilation [opening by reconstruction (OR)]. On the contrary, if the mask I is the dilation \delta_B(g), the original darker features are filtered by geodesic erosion [closing by reconstruction (CR)]. A geodesic dilation (respectively, erosion) is the pointwise minimum (maximum) between the dilation (erosion) of the marker image and the original image. Equations (3) and (4) illustrate OR and CR, respectively

\rho^{\delta}[\varepsilon_B(g)] = \rho^{\delta}(I) = \min\left\{x_g, \delta_B^k(I)\right\} \,\big|\, \delta_B^k(I) = \delta_B^{k-1}(I)   (3)

\rho^{\varepsilon}[\delta_B(g)] = \rho^{\varepsilon}(I) = \max\left\{x_g, \varepsilon_B^k(I)\right\} \,\big|\, \varepsilon_B^k(I) = \varepsilon_B^{k-1}(I).   (4)

The reconstruction process, whose principle is shown in Fig. 3, is iterated until the reconstructed image at iteration k is identical to the image obtained at iteration k − 1 [\delta_B^k(I) = \delta_B^{k-1}(I) in (3) and \varepsilon_B^k(I) = \varepsilon_B^{k-1}(I) in (4)]. By comparison

Fig. 4. Progressive opening and closing top hat using a diamond-shaped SE. On the left end is the CTH image produced using an 11-pixel SE, in the middle is the original image, and on the right end is the OTH image produced using an 11-pixel SE.

with Fig. 2, the difference is rather obvious. Using OR, the shape of the objects is preserved, and the progressive increase of the size of the SE results in a progressive disappearance of objects whose pixels are eliminated by erosions using larger SEs. These objects cannot be recovered during the reconstruction. The size of the SE used in the reconstruction controls the strength of the reconstruction. By using larger size SEs, peaks and valleys of the marker are filled (or taken out, respectively) quickly, and only very bright (or dark, respectively) structures are recovered. On the contrary, by using smaller size SEs in the reconstruction phase, the reconstruction is reached gradually, and gray structures are also recovered.

C. Top Hat

Top-hat operators are the residuals of an opening (or a closing) image, when compared to the original image. Therefore, top-hat images show the peaks [opening top hat (OTH)] or valleys [closing top hat, or bottom hat (CTH)] of the image (Fig. 4). This corresponds to the following:

OTH = g - \gamma_B(g), \qquad CTH = \phi_B(g) - g.   (5)

The OTH operator represents the bright peaks of the image. On the contrary, the CTH operator represents the dark peaks (or valleys) of the image.
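Continuing the earlier scikit-image sketch (it reuses the panchromatic array g and is again only an illustrative assumption about tooling), the reconstruction and top-hat filters could be computed as follows. In scikit-image's API, the eroded (dilated) image is passed as the seed and the original image as the mask, and the default geodesic step uses a 3-pixel cross, consistent with the small reconstruction SE mentioned in Section V-A.

```python
from skimage.morphology import (diamond, erosion, dilation, reconstruction,
                                white_tophat, black_tophat)

se = diamond(5)   # 11-pixel diamond SE, as in the examples of Figs. 3 and 4

# Opening/closing by reconstruction: geodesic dilation (erosion) of the
# eroded (dilated) image, bounded pointwise by the original image g.
or_img = reconstruction(erosion(g, se), g, method="dilation")   # OR, eq. (3)
cr_img = reconstruction(dilation(g, se), g, method="erosion")   # CR, eq. (4)

# Top hats: residuals of the opening and closing with respect to g.
oth = white_tophat(g, se)   # OTH = g - opening(g), bright peaks, eq. (5)
cth = black_tophat(g, se)   # CTH = closing(g) - g, dark peaks/valleys, eq. (5)
```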

III. RFE

RFE is an embedded feature selector whose selection criterion is based upon the analysis of the classification function of the predictor. Two versions of the algorithm have been proposed in the literature so far. The first one [52] takes into account linearly separable data, while the other [53] is suitable for linearly nonseparable data. In this section, we briefly summarize both versions of the RFE algorithm.

A. RFE for Linearly Separable Data

In a linearly separable case and for a binary problem, the SVM decision function for a Q-dimensional input vector x \in \mathbb{R}^Q is given by

D(x) = w \cdot x + b   (6)

with

w = \sum_k \alpha_k y_k x_k.   (7)

The weight vector w is a linear combination of the training points, the \alpha's are the support vector coefficients, and the y's are the labels. In this case and for a set of variables Q, the width of the margin is 2/\|w\|^2 = 2/\sum_{i=1}^{Q} w_i^2. By computing the weight c_i = (w_i)^2 for each feature in the feature set Q, it is possible to rank the features by their contribution to the total decision function and evaluate the selection criterion f as shown in the following equation:

f = \arg\min_{i \in Q} |c_i|.   (8)

In this way, the feature that provides the smallest change in the objective function is selected and removed.

B. RFE for Nonlinearly Separable Data

In a nonlinearly separable case, it is not possible to compute the weights c_i. This can be explained as follows. The decision function is defined in a feature space, which is the mapping of the original patterns x in a higher dimensional space. The mapping \Phi is not computed explicitly but only expressed in terms of dot products in the feature space by the kernel function K(x_k, x_l). This way, only distances between the mapped patterns \Phi(x) are used, and their position in the feature space is not computed explicitly. Therefore, it is not possible to apply (7) directly. In order to take into account linearly inseparable data sets, in [52], it has been proposed to use the quantity W^2(\alpha) expressed in the following:

W^2(\alpha) = \|w\|^2 = \sum_{k,l} \alpha_k \alpha_l y_k y_l K(x_k, x_l).   (9)

Such a quantity is a measure of the predictive ability of the model and is inversely proportional to the SVM margin. By using this property and assuming that the \alpha coefficients remain unchanged by removing the less informative feature, it is possible to compute W^2_{(-i)}(\alpha) for all the feature subsets counting all the features minus the considered feature i. This quantity is computed without retraining the model. Successively, the feature whose removal minimizes the change of the margin is removed, as shown in the following equation:

f = \arg\min_{i \in Q} \left( W^2(\alpha) - W^2_{(-i)}(\alpha) \right).   (10)

For the sake of simplicity, the RFE algorithms described earlier [see (8) and (10)] were discussed for binary problems only. To take into account multiclass data, the quantities W^2(\alpha) and W^2_{(-i)}(\alpha) are computed separately for each class. Then, as proposed in [53], the selection criterion of (10) is evaluated for the sum over the classes of the W^2(\alpha)'s

f = \arg\min_{i \in Q} \sum_{cl} \left( W_{cl}^2(\alpha_{cl}) - W_{cl,(-i)}^2(\alpha_{cl}) \right).   (11)
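To make the criterion in (9) and (10) concrete, the following is a minimal sketch for a binary problem with an RBF kernel, written with NumPy and scikit-learn rather than the Torch 3 and Matlab implementations used for the experiments (see Section V-B); the hyperparameters are illustrative, and the multiclass case of (11) would simply sum the same quantity over the one-against-all classifiers.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

def w2(dual_coef, sv, gamma):
    """W^2(alpha) of eq. (9); dual_coef holds alpha_k * y_k for the support vectors."""
    K = rbf_kernel(sv, sv, gamma=gamma)
    return (dual_coef @ K @ dual_coef.T).item()

def rfe_ranking(X, y, gamma=0.1, C=10.0):
    """Rank features by recursive elimination with the margin-based criterion
    of eq. (10); binary labels and fixed kernel parameters are assumed."""
    remaining = list(range(X.shape[1]))
    removed = []                                  # least relevant features first
    while len(remaining) > 1:
        clf = SVC(kernel="rbf", gamma=gamma, C=C).fit(X[:, remaining], y)
        a, sv = clf.dual_coef_, clf.support_vectors_
        w2_full = w2(a, sv, gamma)
        # Change in W^2 when each remaining feature is left out, keeping the
        # alpha coefficients fixed (i.e., without retraining the model).
        deltas = [abs(w2_full - w2(a, np.delete(sv, j, axis=1), gamma))
                  for j in range(len(remaining))]
        removed.append(remaining.pop(int(np.argmin(deltas))))   # eq. (10)
    removed.append(remaining[0])                  # the most relevant feature
    return removed
```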

Fig. 5. (a) Las Vegas panchromatic image and (b) ground survey used (orange = residential buildings, red = commercial buildings, black = roads, dark gray = highway, light gray = parking lots, light green = vegetation, dark green = trees, light brown = soil, dark brown = bare soil, blue = water, and cyan = drainage channel).

Fig. 6. (a) Rome panchromatic image and (b) ground survey used (orange = buildings, yellow = apartment blocks, black = roads, gray = railway, light green = vegetation, dark green = trees, dark brown = bare soil, light brown = soil, and red = towers).

RFE runs iteratively, removing a single feature at each epoch. Therefore, a prior knowledge about the number of features to select is required.

IV. DATA SETS

Two panchromatic QuickBird images have been used for the morphological analysis. The first is a 755 × 722 pixels image taken over Las Vegas (U.S.) in 2002, while the second is a 1188 × 973 pixels image acquired over Rome (Italy) in 2004. Both scenes are imaged at very high spatial resolution (∼0.6 m). These scenes are representative of, respectively, a common American suburban landscape, including small houses and large roads, and the European style of old cities, built with more complex structures mixing older and more modern constructions. The ground references for each scene, reported in Figs. 5(b) and 6(b), have been obtained by careful visual inspection of separate data sources, including aerial imagery, cadastral maps, and in situ inspections (for the Rome scene only).

An additional consideration regards objects within shadows that reflect

little radiance because the incident illumination is occluded. Morphological features can potentially be used to characterize these areas as if they were not within shadow. Therefore, these surfaces were assigned to one of the corresponding classes of interest described earlier.

To select training and validation samples that are both statistically significant and free of correlated neighboring pixels, we have adopted a stratified random sampling (SRS) method, ensuring that even small classes were adequately represented, as reported by Chini et al. [59]. In SRS, the population of N pixels is divided into k subpopulations of sampling units N_1, N_2, ..., N_k, which are termed "strata." Therefore, we have randomly sampled the pixels in each of those classes according to their extent in area, based on the produced ground reference (a short code sketch of such a split is given after the following paragraph). A more detailed description of the scenes follows.

A. Las Vegas

The Las Vegas scene, shown in Fig. 5(a), shows regular crisscrossed roads and different examples of buildings characterized by similar heights (about one or two floors) but different dimensions, from small residential houses to large commercial buildings. This scene was chosen due to its simplicity and regularity, which allow an easier analysis and interpretation of the morphological features.
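As referenced above, a class-proportional split in the spirit of the SRS procedure can be sketched with scikit-learn's stratified splitting; this is only an illustration and not the authors' implementation, the file name is a placeholder, and the split fractions follow the Las Vegas proportions reported below.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder: one class label per labeled ground-reference pixel.
labels = np.load("ground_reference_labels.npy")
pixel_idx = np.arange(len(labels))

# Split off ~8% of the labeled pixels for training, then ~6.7% of the total
# for validation, stratifying on the class label so that every class keeps
# its proportion in area.
train_idx, rest_idx = train_test_split(
    pixel_idx, train_size=0.08, stratify=labels, random_state=0)
val_idx, test_idx = train_test_split(
    rest_idx, train_size=0.067 / 0.92, stratify=labels[rest_idx], random_state=0)
```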

TABLE I GROUND SURVEY USED FOR THE LAS VEGAS DATASET AND NUMBER OF SAMPLES FOR TRAINING, VALIDATION, AND TEST

Eleven different surfaces of interest have been recognized, paying special attention to the specific peculiarities of each scene. Particularly, for this test case, the goal was to distinguish the different uses of the asphalted surfaces, which included residential roads (i.e., roads that link different residential houses), highways (i.e., roads with more than two lanes), and parking lots. An unusual structure within the scene was a drainage channel located in the upper part of the image. This construction showed a shape similar to roads, but with higher brightness since it was built with concrete. A further discrimination was made between residential houses and commercial buildings due to their different extents and between bare soil (terrain not in use) and soil (generally, house backyards with no vegetation cover). Finally, more traditional classes such as trees, short vegetation, and water were added. The areas of shadow were very limited in the scene due to the modest heights of buildings and relatively low sun elevations.

A reference ground survey, shown in Fig. 5(b), of 373 023 pixels has been randomly split into the following: 1) a training set of 30 000 pixels (∼8% of the labeled pixels); 2) a validation set of 25 000 pixels (∼6.7%); and 3) a test set of 318 023 pixels (∼85.3%). Specific details on the number of samples per class used are reported in Table I.

B. Rome

Due to the dual nature of the architecture [older buildings to the upper right and newer buildings such as apartment blocks in the lower left, see Fig. 6(a)] and to the high off-nadir angle, the selection of the classes for the scene of Rome was made to investigate the potential of discriminating between structures with different heights, including buildings (structures with a maximum of five floors), apartment blocks (rectangular structures with a maximum of eight floors), and towers (more than eight floors). The surfaces of interest were roads, trees, short vegetation, soil, and the peculiar railway in the middle of the scene, for a total of nine classes. In contrast to the previous case, in this scene, shadow occupies a considerable portion of the image. Objects within shadows reflect little radiance because of the occlusion of the incident illumination, but the interest of this case study lies in the recognition of the correct class, even when the object is covered by shadows. Therefore, shadowed surfaces were recognized by visual inspection of separate data sources, including aerial imagery, cadastral maps,

TABLE II GROUND SURVEY USED FOR THE ROME DATASET AND NUMBER OF SAMPLES FOR TRAINING, VALIDATION, AND TEST

and in situ inspections, and assigned to the corresponding classes of interest. A reference ground survey [Fig. 6(b)] of 775 411 labeled pixels was created for this data set. In light of the complexity of the scene and of the significant overlap of the classes, 50 000 pixels (∼6.5%) have been retained for training, 30 000 (∼4%) have been used for model selection, and the remaining 695 411 (∼89.5%) have been used for test. Details on the number of samples per class used are reported in Table II.

V. EXPERIMENTAL SETUP

This section briefly discusses the feature sets and the setup of the experiments. Details on the classifier used, as well as the strategy followed to select the optimal number of features in the RFE selection, are also reported.

A. Experiments

Eight experiments have been investigated using different morphological sets, as reported in Table III. Specifically, the following six morphological filters have been considered: 1) opening (O); 2) closing (C); 3) OR; 4) CR; 5) OTH; and 6) CTH. As stated earlier, these are the most frequently used morphological filters. For each of these filters, we used an SE whose dimensions increased from 9 to 25 pixels with steps of 2 pixels, resulting in nine morphological features per filter. The size of the SEs has been chosen according to the image resolution. For the Las Vegas data set, a square SE has been used in order to take into account the major directions of the objects in the image, which are 0° and 90° (see Fig. 5). For the Rome data set, which is characterized by an overall 45° angle in the disposition of the objects (see Fig. 6), a diamond-shaped SE has been used instead. This shape allows a better reconstruction of the borders of the objects. For the reasons stated in Section II-B, the reconstruction has been performed using a small (3-pixel diameter) SE.

B. Classifier

The classification has been performed using a one-against-all SVM implemented using the Torch 3 library [60]. A radial basis

TABLE III FEATURE SETS CONSIDERED (O = OPENING, C = CLOSING, OR = OPENING BY RECONSTRUCTION, CR = CLOSING BY RECONSTRUCTION, OTH = OPENING TOP HAT, AND CTH = CLOSING TOP HAT)

TABLE IV CLASSIFICATION ACCURACIES (IN PERCENT) AND KAPPA INDEX FOR LAS VEGAS DATASET (∗ = SIGNIFICANT DIFFERENCE FROM THE OC-OCR RESULT BY THE McNEMAR TEST [61], SEE SECTION V-B)

TABLE V CLASSIFICATION ACCURACIES (IN PERCENT) AND KAPPA INDEX FOR RFE EXPERIMENTS USING THE LAS VEGAS DATASET. IN BOLD ARE THE RESULTS OUTPERFORMING THE OC-OCR SET (∗ = SIGNIFICANT DIFFERENCE FROM THE OC-OCR RESULT BY THE McNEMAR TEST [61], SEE SECTION V-B)

function kernel has been used in all the experiments. Kernel parameters θ = {σ, C} have been optimized by a grid search in the ranges σ = {0.01, . . . , 1.3}, C = {1, . . . , 51} based on previous experiments. Labels of a model selection set were predicted by each model, and the one showing the highest accuracy was retained. The optimal model was then used to predict a new unseen data set, the test data. See Tables I and II for the composition of the different sets for both case studies.

To evaluate the performances of the models, both accuracy and Kappa statistics have been used. Since overall accuracies of the models were often similar, the statistical significance of

the difference between the results has been assessed using the McNemar test. This test compares classification results for related samples by assessing the standardized normal test statistic z for two thematic maps [61]. For a significance level of α = 5%, the first map result is significantly different from the second when |z| > 1.96. In Tables IV–VII, the sign "∗" has been added alongside the overall accuracy when the result is significantly different from the best result reported in the table (the OC-OCR in each case).

A C# application has been developed by the authors to extract the morphological images. RFE feature selection has been implemented using Matlab 7.
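For reference, the z statistic can be computed directly from the pixels on which the two maps differ in correctness; the sketch below reflects our reading of the standardized McNemar statistic used in [61], with illustrative array names.

```python
import numpy as np

def mcnemar_z(reference, map_a, map_b):
    """Standardized McNemar statistic between two thematic maps evaluated on
    the same labeled pixels; |z| > 1.96 indicates a significant difference
    at the 5% level (two-sided)."""
    correct_a = (map_a == reference)
    correct_b = (map_b == reference)
    f_ab = np.sum(correct_a & ~correct_b)   # pixels that only map A classifies correctly
    f_ba = np.sum(~correct_a & correct_b)   # pixels that only map B classifies correctly
    return (f_ab - f_ba) / np.sqrt(f_ab + f_ba)
```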

TABLE VI CLASSIFICATION ACCURACIES (IN PERCENT) AND KAPPA INDEX FOR THE ROME DATASET (∗ = SIGNIFICANT DIFFERENCE FROM THE OC-OCR RESULT BY THE McNEMAR TEST [61], SEE SECTION V-B)

TABLE VII CLASSIFICATION ACCURACIES (IN PERCENT) AND KAPPA INDEX FOR RFE EXPERIMENTS USING THE ROME DATASET. IN BOLD ARE THE RESULTS OUTPERFORMING THE OC-OCR SET (∗ = SIGNIFICANT DIFFERENCE FROM THE OC-OCR RESULT BY THE McNEMAR TEST [61], SEE SECTION V-B)

Fig. 7. Application of the three criteria for the selection of the optimal number of features. Example on the Las Vegas image (37 initial features). Prior knowledge results in the sets RFE-33 and RFE-29. (Blue dot) Validation error minimum results in the RFE-24 set. (Red dot) Representation entropy maximum results in the RFE-15 set.

C. On the Optimal Number of Features to Select

One drawback of the RFE is that it does not provide a straightforward stopping criterion. This means that the algorithm runs until all the features have been removed. Thus, a prior knowledge about the desired number of features is mandatory. In the analysis reported, the selection of the number of features has been based on the following three criteria (see also Fig. 7).

1) Prior knowledge: The number of selected features is decided a priori. In the experiments thereafter, four and eight features are to be removed. These solutions are conservative because they preserve most of the features.

2) Test error: A test error is computed at each iteration using the new feature set. The feature set related to the minimal test error is selected.

3) Representation entropy [62]: At each RFE iteration, the d eigenvalues \tilde{\lambda}_i = \lambda_i / \sum_{j=1}^{d} \lambda_j of the covariance matrix of the current d-dimensional feature set show the distribution of the information between the d features. If the distribution of the eigenvalues is uniform, the maximal degree of compression possible using these features is achieved. An entropy function can be computed to evaluate the degree of compression [63]

H_R = -\sum_{i=1}^{d} \tilde{\lambda}_i \log \tilde{\lambda}_i.   (12)

This quantity is called representation entropy. It has a minimum (zero) when all the eigenvalues except one are zero and has a maximum when all the eigenvalues are equal (uniform distribution).
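A minimal NumPy sketch of (12), as we would evaluate it on the current feature subset at each RFE iteration, is given below.

```python
import numpy as np

def representation_entropy(features):
    """H_R of eq. (12) for an (n_samples, d) feature matrix: eigenvalues of the
    covariance matrix are normalized to sum to one and plugged into an entropy;
    the value is maximal when the normalized eigenvalues are uniform."""
    eigvals = np.linalg.eigvalsh(np.cov(features, rowvar=False))
    lam = eigvals / eigvals.sum()
    lam = lam[lam > 0]                      # discard numerically null eigenvalues
    return float(-np.sum(lam * np.log(lam)))
```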

As a comparison, classical principal component analysis extraction of ten features (accounting for 99.5% of the original information in both cases) has been added.

VI. RESULTS AND DISCUSSION

In this section, we discuss the results obtained for the two QuickBird scenes.

Fig. 8. Classification maps for the Las Vegas image using (a) only the panchromatic band, (b) the OC set, and (c) the RFE-24 set (orange = residential buildings, red = commercial buildings, black = roads, dark gray = highway, light gray = parking lots, light green = vegetation, dark green = trees, light brown = soil, dark brown = bare soil, blue = water, and cyan = drainage channel).

A. Las Vegas The results for the Las Vegas scene are reported in Table IV. Classification maps are shown in Fig. 8. For this scene, the single panchromatic image cannot correctly classify the 11 classes, resulting in an overall accuracy of 44.42% with a Kappa index of 0.321. Only the classes “Residential,” “Roads,” and “Bare soil” are recognized by the classifier, as is shown by the classification map in Fig. 8(a). The integration of morphological features improves the classification results for all the feature sets considered. Hereafter, all the land-use classes are detected. The reconstruction filters (OCR) result in an overall accuracy of 82.73% with a Kappa index of 0.797. Even if excellent results are obtained for the class “Commercial” (93.02% accuracy), the class “Parking lots” is markedly confused with roads (accuracy: 55.27%). Major confusion is also observed between the classes “Soil” and “Residential buildings.” Finally, the class “Trees” shows the smallest accuracy (55.17%) because of the high confusion with residential buildings, roads, and vegetation. This can be explained by the fact that reconstruction filtering smoothes small structures whose luminance is not sharply darker than the surrounding objects.

Top-hat filters (TH) lead to better global results for the main classes of the image. Residential building, road, and highway results are improved by these features, achieving an overall accuracy of 92.37% (Kappa: 0.911). Parking lot accuracy improves by 25% because of the better separability obtained by using large SE CTH filters.

Surprisingly, the use of simple opening and closing filtering (OC) leads to the best results among the single filter sets, with an overall accuracy of 95.14% and a Kappa index of 0.943. These simple filters allow a significant improvement for the classes badly treated by the previous operators. Results for the classes "Short vegetation," "Trees," and "Soil" are improved by 10%–15%, and the best result is reached for all the classes. The classification map obtained is shown in Fig. 8(b).

Combination of the sets (OC-TH, OCR-TH, OC-OCR, and OC-OCR-TH) leads to overall accuracies around 94%–95%, equaling the OC results. The OC-OCR set shows the best results, slightly improving the OC results for each class (the class "Short vegetation" is the only one showing a small decrease in the accuracy). This set achieves an overall accuracy of 95.93% with a Kappa index of 0.952. The McNemar test shows the superiority of this result with respect to all the others. Summing up, adding the sets into a stacked vector not only improves

Fig. 9. Feature selection by the RFE algorithm for the Las Vegas image. Each bar represents the iteration when the feature has been removed. White bars represent features removed during iterations 1–4; yellow bars represent features removed during iterations 5–8; orange bars represent features removed during iterations 9–13; red bars represent features removed during iterations 14–22; and black bars are the features maintained in each RFE result.

the results (with the best results obtained for the set combining OC and OCR features) but also adds noise and redundancy that prevent the combination from significantly improving the solution found using the simple feature sets.

In order to reduce such redundancy, RFE feature selection is applied. As stated in Section V-C, four different stages of the feature selection are considered, accounting for different sizes of the final feature set. Solutions after the removal of 4 (RFE-33), 8 (RFE-29), 13 (RFE-24), and 22 features (RFE-15) are considered here. The RFE-24 feature set is chosen by taking into account the minimum test error criterion [see the blue curve in Fig. 7(a)], and the RFE-15 is chosen by taking into account the maximum of representation entropy [red curve in Fig. 7(a)]. As stressed earlier, this last solution is optimal in terms of information compression. In general, none of the results obtained significantly outperforms the OC-OCR result (by the McNemar test), even if small increases in the accuracies can be observed in Table V. Since the SVM is already robust to the problems of dimensionality, the benefits of RFE have to be interpreted in terms of image compression more than in terms of improvement of the classification accuracy, because they highlight the relevant features for the success of the classification.

The sequential removal of features by RFE is shown in Fig. 9. During the first iterations (RFE-33, RFE-29), only features from the OCR set are removed: large-scale CR and small-scale OR features are the ones selected for removal at this stage. By looking at these features, the changes between them are minor, and their redundancy is strong. The RFE-33 and RFE-29 columns in Table V show small improvements in the results of several classes. Successively (iteration 9, RFE-24), the panchromatic image is removed. The panchromatic band is related to very detailed information that can bring noise to the class definition. In this sense, the O-9 and C-9 features are very similar to the panchromatic band and act as a smoothing filter. Moreover, RFE-24 shows the removal of OCR features related to small SEs. Thus, the algorithm selects OCR features adding information to the OC features that are conserved up to this point. RFE-24 provides the

best results, showing an overall accuracy of 96.11% with a Kappa index of 0.955.

At this point (RFE-15), OC features start to be removed. Fig. 9 shows the removal of opening features (in red). From this figure, we can observe that the features are removed at regular intervals and that the features O-9, O-15, O-19, and O-25 are preserved. We can interpret such a removal as the redundancy between features extracted using too similar SEs: a 2-pixel step in the extraction of features seems to have been too short an interval. Thus, only a small number of features carrying very different information is kept. Nonetheless, a decrease in the performance of the SVM was obtained (accuracy of 95.67% and Kappa index of 0.949). The representation entropy is a criterion useful for information compression and does not account for classification accuracy. Therefore, the RFE-15 set represents an optimal solution for the reduction of the size of the data set, even if it shows a small decrease in the performances. In fact, after the removal of 22 features, the SVM result is degraded only by 0.003 in terms of Kappa index.

Regarding computational burden, most of the computational complexity of the algorithm is taken by the SVM model selection. The generation of the morphological features takes only a few seconds, while the SVM shows a complexity which is quadratic with respect to the number of examples. For the Las Vegas case and for the most complex experiment (the OC-OCR-TH), such a calibration with 25 parameter sets has taken 16 h. The complete RFE relies on the evaluation of Q · (Q − 1) SVMs. Nonetheless, the optimal model parameters of the OC-OCR set are kept so that the model selection computational burden is avoided. Each iteration speeds up the SVM evaluation because a feature is removed, but the overall computational burden still remains heavy. Note that, from this perspective, the definition of a task-based stopping criterion for the RFE becomes important.

B. Rome

The results for the Rome scene are reported in Table VI. Classification maps are shown in Fig. 10. Confirming the results obtained for the Las Vegas scene, the panchromatic information alone does not allow one to distinguish the nine classes of the ground survey. This results in an overall accuracy of 33.84% with a Kappa index of 0.229. Only the classes "Buildings," "Roads," "Trees," and "Bare soil" are retained by the classifier. Nonetheless, the Pan set provided the best results in terms of accuracy for the class "Bare soil" (97.20%). This is due to the fact that the model has the tendency to classify most of the pixels as "Bare soil," including the ones that are, in fact, bare soil [see the classification map in Fig. 10(a)]. Since the overall accuracy criterion does not take into account commission errors, the accuracy for this class is very high.

Similarly to what was observed in the previous scene, the OCR set shows better results than the Pan experiment (accuracy of 74.21% and Kappa index of 0.694). Moreover, the difference in overall performances with the TH experiment is smaller. This can be interpreted as follows. The OCR set provides the best performance for the classes "Railway" and "Tower." This is because the OCR features preserve the shape of the objects [see the classification map in Fig. 10(b)]. Specifically, for towers (note that only this set can handle this class correctly), the

Fig. 10. Classification maps for the Rome image using (a) only the panchromatic band, (b) OCR set, (c) OC set, and (d) RFE-33 (orange = buildings, yellow = apartment blocks, black = roads, gray = railway, light green = vegetation, dark green = trees, dark brown = bare soil, light brown = soil, and red = towers). Images have been visually enhanced.

reproduction of the shape is far more accurate than the one obtained using the other sets [see the results for the class "Tower" for the OC and TH sets and the classification maps in Fig. 10(b) and (c)]. Nonetheless, looking at the per-class accuracy of Table VI, the OCR model confirms the poor performances on the classes "Trees" (accuracy 48.29%, strongly confused with "Roads") and "Vegetation," which make this model worse than the TH on average. Moreover, OCR provides a noisier solution than the OC set in the residential building areas.

The TH set provides better overall results, resulting in an accuracy of 78.10% and a Kappa index of 0.738. Even if the overall result is better for this data set, the TH set fails to recognize the towers, which are poorly classified (58.04% accuracy).

The best results for single sets are obtained again by the OC set (accuracy of 83.10% and Kappa index of 0.799). The building area reconstruction is less noisy, and the class "Trees" is slightly confused with the class "Road." Despite the higher accuracies, the lower half of the classification map in Fig. 10(c) shows a less desirable result for apartment blocks than the one obtained with the OCR set.

Mixed sets (OC-TH, OCR-TH, OC-OCR, and OC-OCR-TH, reported in Table VI) show a general improvement over the results obtained by the nonmixed sets. Again, the OC-OCR set provides the best results (overall accuracy of 86.48% and Kappa index of 0.839), as for the Las Vegas scene. Results for all the classes are improved with respect to the results obtained by the single sets.

Fig. 11. Feature selection by the RFE algorithm for the Rome image. Each bar represents the iteration when the feature has been removed. White bars represent features removed during iterations 1–4; yellow bars represent features removed during iterations 5–8; orange bars represent features removed during iterations 9–26; and black bars are the features maintained in each result.

RFE feature selection is shown in Fig. 11. Only three subsets are evaluated because the minimum for test error is achieved by the RFE-33 set, when four features are removed. Table VII shows the per-class and global accuracies for the Rome data set. At first glance, the feature selection does not improve significantly the classification accuracy for this data set. The best result achieved is provided by the RFE-33 set,

which achieves an overall accuracy of 86.54% with a related Kappa index of 0.840. This can be explained by the good generalization capabilities of the SVM, which is known to be able to easily handle high-dimensional spaces up to several tens of dimensions. In terms of classification accuracy, the OC-OCR set is already optimal, and the feature selection must be seen here as a way to rank the features in terms of information. Note that the RFE-33 and RFE-29 results, even if accounting for fewer features, are not significantly different from the OC-OCR result (McNemar test).

As observed for the previous image, most of the features removed during the first iterations are part of the OCR set. OR and CR features related to small SEs are removed at the RFE-33 stage. For the Rome data set, the panchromatic image is removed early (at iteration 5, RFE-29), again showing its redundancy with the OC features extracted using small SEs. CR features extracted using small SEs are removed rapidly, mainly because they highlight small shadowed areas and tend to smooth the differences between the other structures in the image. Finally, OR features are also rapidly removed. These features filter small-scale structures such as details of the roofs while leaving the main structures of the image unchanged. On the contrary, features constructed using larger SEs take into account large-scale structures such as the entire building. These features show higher variability and are more valuable for the land-use classification because they provide the information necessary for the recognition of the towers, which differ from the buildings only by their height, i.e., by the neighborhood information of the pixels.

The set selected by the representation entropy criterion (RFE-12) is the set resulting in the best compression rate. All the CR information has been collected into a single feature, the CR-25. The same holds for the OR features, for which only OR-23 and OR-25 are selected. Regarding the O and C features, the same scheme observed for the Las Vegas image is found: small-, medium-, and large-size SEs are selected, and intermediate steps are removed from the data set. In terms of classification accuracy, even if a decrease is visible, it is only 2% of the overall accuracy. By the McNemar test, the result is significantly inferior to the OC-OCR result. By keeping only 12 features, the classification result still outperforms all the results obtained by the single sets.

Summing up, the same feature selection scheme is observed, giving much importance to the OC features and selecting them at regular SE size intervals. A few OCR features appear to contain all the OCR information.

VII. CONCLUSION

In this paper, morphological features have been used for the classification of land use from panchromatic very high resolution satellite images. The study aimed at showing the potential of such features for extracting relevant information about the structures and shapes in the scene. Six types of filters have been considered: opening, closing, OR, CR, OTH, and CTH. Each of them highlights a different aspect of the information contained in the image and has been considered at different scales in order to produce a classification result dependent on the scale of the objects.

The SVM has been used for the classification of the morphological features. This classifier is well known for its good generalization capabilities, particularly in high-dimensional spaces. RFE feature selection has been proposed in order to decrease the dimensionality of the input information.

Two QuickBird images have been analyzed, both at the spatial resolution of 0.6 m, containing, respectively, 11 and 9 classes of land use. Both images are very challenging because they imply the differentiation of land-use classes indistinguishable at the pure panchromatic pixel level. In both experiments, the simple feature sets constructed with opening and closing operators showed the best classification accuracies. Nonetheless, each set of operators showed specific peculiarities that made the use of a mixed set suitable. The mixed set strongly improved the results but could not fully take advantage of the added features for the classes where a single set failed. RFE feature selection was used to remove features increasing redundancy and resulted in optimal sets of features either in terms of classification accuracy or of data redundancy.

Even if characterized by large differences between urban structures, the same feature sets have been found to be the most valuable for the classification of the images. Moreover, the RFE selection has led to the same conclusions for both scenes in terms of importance of the features, showing the possibility to define a family of features which is optimal for the classification of land use using very high spatial resolution panchromatic imagery. More experiments on new case studies may be desirable. However, the results obtained in this paper, as well as the ones reported in [10], seem to confirm this hypothesis.

At present, the method suffers from two major drawbacks. First, in light of the high complexity of the problem, a substantial training set has to be provided to the machine. In our experiments, 30 000 and 50 000 pixels have been used for training, which, even if they represented only 5% of the available ground truth, is a very large number of pixels. In this sense, active learning methods [64]–[66] selecting relevant ensembles of training examples could bring a solution to this problem. The second problem is related to the feature selection routine used. Even if resulting in good performances, RFE is a greedy method, whose computational cost depends heavily on the number of support vectors found by the SVM. A solution for this problem could again be a technique selecting the most relevant training samples. Otherwise, faster feature selection methods can be considered or designed [46].

ACKNOWLEDGMENT

The authors would like to thank DigitalGlobe for providing data which have been used in this paper and C. Kaiser and G. Matasci (Institute of Geomatics and Analysis of Risk) for the technical support.

REFERENCES

[1] P. K. Varshney, "Multisensor data fusion," Electron. Commun. Eng. J., vol. 6, no. 9, pp. 245–253, 1997.
[2] C. Pohl and J. L. Van Genderen, "Multisensor image fusion in remote sensing: Concepts, methods and applications," Int. J. Remote Sens., vol. 19, no. 5, pp. 6–23, 1998.

[3] G. Simone, A. Farina, F. C. Morabito, S. B. Serpico, and L. Bruzzone, “Image fusion techniques for remote sensing applications,” Inf. Fus., vol. 3, no. 1, pp. 3–15, Mar. 2002. [4] A. P. Carleer and E. Wolff, “Urban land cover multi-level region-based classification of VHR data by selecting relevant features,” Int. J. Remote Sens., vol. 27, no. 6, pp. 1035–1051, Mar. 2006. [5] R. M. Haralick, K. Shanmugam, and I. Dinstein, “Textural features for image classification,” IEEE Trans. Syst., Man, Cybern., vol. SMC-3, no. 6, pp. 610–621, Nov. 1973. [6] R. M. Haralick, “Statistical and structural approaches to texture,” Proc. IEEE, vol. 67, no. 5, pp. 786–804, May 1979. [7] A. Baraldi and F. Parmiggiani, “An investigation of the textural characteristics associated with gray level cooccurrence matrix statistical parameters,” IEEE Trans. Geosci. Remote Sens., vol. 33, no. 2, pp. 293–304, Mar. 1995. [8] A. Puissant, J. Hirsch, and C. Weber, “The utility of texture analysis to improve per-pixel classification for high to very high spatial resolution imagery,” Int. J. Remote Sens., vol. 26, no. 4, pp. 733–745, 2005. [9] C. A. Coburn and A. C. B. Roberts, “A multiscale texture analysis procedure for improved forest and stand classification,” Int. J. Remote Sens., vol. 25, no. 20, pp. 4287–4308, Oct. 2004. [10] F. Pacifici, M. Chini, and W. J. Emery, “A neural network approach using multi-scale textural metrics from very high-resolution panchromatic imagery for urban land-use classification,” Remote Sens. Environ., vol. 113, no. 6, pp. 1276–1292, Jun. 2009. [11] A. Lorette, X. Descombes, and J. Zerubia, “Texture analysis through a Markovian modelling and fuzzy classification: Application to urban area extraction from satellite images,” Int. J. Comput. Vis., vol. 36, no. 3, pp. 221–236, Feb./Mar. 2000. [12] G. Rellier, X. Descombes, F. Falzon, and J. Zerubia, “Texture feature analysis using a gauss-markov model in hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 42, no. 7, pp. 1543–1551, Jul. 2004. [13] D. A. Clausi and B. Yue, “Comparing cooccurrence probabilities and Markov random fields for texture analysis of SAR sea ice imagery,” IEEE Trans. Geosci. Remote Sens., vol. 42, no. 1, pp. 215–228, Jan. 2004. [14] Y. Zhao, L. Zhang, P. Li, and B. Huang, “Classification of high spatial resolution imagery using improved Gaussian Markov random-field-based texture features,” IEEE Trans. Geosci. Remote Sens., vol. 45, no. 5, pp. 1458–1468, May 2007. [15] P. Kruizinga, N. Petkov, and S. E. Grigorescu, “Comparison of texture features based on Gabor filters,” in Proc. Int. Conf. Image Anal. Process., 1999, pp. 142–147. [16] D. A. Clausi and H. Deng, “Design-based texture feature fusion using Gabor filters and co-occurrence probabilities,” IEEE Trans. Image Process., vol. 14, no. 7, pp. 925–936, Jul. 2005. [17] U. Kandaswamy, D. A. Adjeroh, and M. C. Lee, “Efficient texture analysis of SAR imagery,” IEEE Trans. Geosci. Remote Sens., vol. 43, no. 9, pp. 2075–2083, Sep. 2005. [18] J. Serra, Image Analysis and Mathematical Morphology. New York: Academic, 1982. [19] P. Soille, Morphological Image Analysis. Berlin, Germany: SpringerVerlag, 2004. [20] M. Pesaresi and J. A. Benediktsson, “A new approach for the morphological segmentation of high-resolution satellite images,” IEEE Trans. Geosci. Remote Sens., vol. 39, no. 2, pp. 309–320, Feb. 2001. [21] P. Soille and M. Pesaresi, “Advances in mathematical morphology applied to geoscience and remote sensing,” IEEE Trans. Geosci. 
Remote Sens., vol. 40, no. 9, pp. 2042–2055, Sep. 2002. [22] S. R. Sternberg, “Grayscale morphology,” Comput. Vis. Graph. Image Process., vol. 35, no. 3, pp. 333–355, Sep. 1986. [23] D. Wang, D. C. He, and D. Morin, “Classification of remotely sensed images using mathematical morphology,” in Proc. IGARSS, 1994, pp. 1615–1617. [24] P. Pina and T. Barata, “Classification by mathematical morphology,” in Proc. IGARSS, 2003, pp. 3516–3518. [25] I. Epifanio and P. Soille, “Morphological texture features for unsupervised and supervised segmentations of natural landscapes,” IEEE Trans. Geosci. Remote Sens., vol. 45, no. 4, pp. 1074–1083, Apr. 2007. [26] M. Pesaresi and I. Kannellopoulos, “Detection of urban features using morphological based segmentation and very high resolution remotely sensed data,” in Machine Vision and Advanced Image Processing in Remote Sensing. Berlin, Germany: Springer-Verlag, 1999. [27] J. A. Benediktsson, M. Pesaresi, and K. Arnason, “Classification and feature extraction for remote sensing images from urban areas based on morphological transformations,” IEEE Trans. Geosci. Remote Sens., vol. 41, no. 9, pp. 1940–1949, Sep. 2003.

[28] J. A. Palmason, J. A. Benediktsson, and K. Arnason, “Morphological transformations and feature extraction for urban data with high spectral and spatial resolution,” in Proc. IGARSS, 2003, pp. 470–472.
[29] J. A. Benediktsson, J. A. Palmason, and J. R. Sveinsson, “Classification of hyperspectral data from urban areas based on extended morphological profiles,” IEEE Trans. Geosci. Remote Sens., vol. 43, no. 3, pp. 480–490, Mar. 2005.
[30] M. Fauvel, J. A. Benediktsson, J. Chanussot, and J. R. Sveinsson, “Spectral and spatial classification of hyperspectral data using SVMs and morphological profiles,” IEEE Trans. Geosci. Remote Sens., vol. 46, no. 11, pp. 3804–3814, Nov. 2008.
[31] G. Louverdis, M. I. Vardavoulia, I. Andreadis, and P. Tsalides, “New approach to morphological color image processing,” Pattern Recognit., vol. 35, no. 8, pp. 1733–1741, Aug. 2002.
[32] A. Plaza, P. Martinez, R. Perez, and J. Plaza, “A new method for target detection in hyperspectral imagery based on extended morphological profiles,” in Proc. IGARSS, 2003, pp. 3772–3774.
[33] A. Plaza, P. Martinez, J. Plaza, and R. Perez, “Dimensionality reduction and classification of hyperspectral image data using sequences of extended morphological transformations,” IEEE Trans. Geosci. Remote Sens., vol. 43, no. 3, pp. 466–479, Mar. 2005.
[34] G. F. Hughes, “On the mean accuracy of statistical pattern recognition,” IEEE Trans. Inf. Theory, vol. IT-14, no. 1, pp. 55–63, Jan. 1968.
[35] D. W. Scott, “The curse of dimensionality and dimension reduction,” in Multivariate Density Estimation: Theory, Practice, and Visualization. New York: Wiley, 1992, pp. 195–217.
[36] B. Boser, I. Guyon, and V. Vapnik, “A training algorithm for optimal margin classifiers,” in Proc. 5th ACM Workshop Comput. Learn. Theory, 1992, pp. 144–152.
[37] V. Vapnik, The Nature of Statistical Learning Theory. Berlin, Germany: Springer-Verlag, 1995.
[38] F. Melgani and L. Bruzzone, “Classification of hyperspectral remote sensing images with support vector machines,” IEEE Trans. Geosci. Remote Sens., vol. 42, no. 8, pp. 1778–1790, Aug. 2004.
[39] G. Camps-Valls, L. Gomez-Chova, J. Calpe, and E. Soria, “Robust support vector method for hyperspectral data classification and knowledge discovery,” IEEE Trans. Geosci. Remote Sens., vol. 42, no. 7, pp. 1530–1542, Jul. 2004.
[40] G. Camps-Valls and L. Bruzzone, “Kernel-based methods for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 43, no. 6, pp. 1351–1362, Jun. 2005.
[41] M. Chi, R. Feng, and L. Bruzzone, “Classification of hyperspectral remote-sensing data with primal SVM for small-sized training dataset problem,” Adv. Space Res., vol. 41, no. 11, pp. 1793–1799, 2008.
[42] A. Plaza, J. A. Benediktsson, J. Boardman, J. Brazile, L. Bruzzone, G. Camps-Valls, J. Chanussot, M. Fauvel, P. Gamba, A. Gualtieri, M. Marconcini, J. C. Tilton, and G. Trianni, “Recent advances in techniques for hyperspectral image processing,” Remote Sens. Environ., vol. 113, suppl. 1, pp. S110–S122, Sep. 2009.
[43] C. Lee and D. A. Landgrebe, “Feature extraction based on decision boundaries,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 15, no. 4, pp. 388–400, Apr. 1993.
[44] B. C. Kuo and D. A. Landgrebe, “A robust classification procedure based on mixture classifiers and nonparametric weighted feature extraction,” IEEE Trans. Geosci. Remote Sens., vol. 40, no. 11, pp. 2486–2494, Nov. 2002.
[45] I. Guyon and A. Elisseeff, “An introduction to variable and feature selection,” J. Mach. Learn. Res., vol. 3, no. 7/8, pp. 1157–1182, Mar. 2003.
[46] I. Guyon, S. Gunn, M. Nikravesh, and L. A. Zadeh, Eds., Feature Extraction: Foundations and Applications. Berlin, Germany: Springer-Verlag, 2006.
[47] T. A. Warner, K. Steinmaus, and H. Foote, “An evaluation of spatial autocorrelation feature selection,” Int. J. Remote Sens., vol. 20, no. 8, pp. 1601–1616, May 1999.
[48] J. S. Borak, “Feature selection and land cover classification of a MODIS-like data set for a semiarid environment,” Int. J. Remote Sens., vol. 20, no. 5, pp. 919–938, Mar. 1999.
[49] L. Bruzzone and S. B. Serpico, “A technique for feature selection in multiclass problems,” Int. J. Remote Sens., vol. 21, no. 3, pp. 549–563, Feb. 2000.
[50] T. Kavzoglu and P. M. Mather, “The role of feature selection in artificial neural network applications,” Int. J. Remote Sens., vol. 23, no. 15, pp. 2919–2937, Aug. 2002.
[51] B. Demir and S. Ertürk, “Phase correlation based redundancy removal in feature weighting band selection for hyperspectral images,” Int. J. Remote Sens., vol. 29, no. 6, pp. 1801–1807, Mar. 2008.

[52] I. Guyon, J. Weston, S. Barnhill, and V. Vapnik, “Gene selection for cancer classification using support vector machines,” Mach. Learn., vol. 46, no. 1–3, pp. 389–422, Jan. 2002.
[53] J. Weston, A. Elisseeff, B. Schölkopf, and M. Tipping, “Use of the zero-norm with linear models and kernel methods,” J. Mach. Learn. Res., vol. 3, no. 7/8, pp. 1439–1461, Mar. 2003.
[54] R. Archibald and G. Fann, “Feature selection and classification of hyperspectral images with support vector machines,” IEEE Geosci. Remote Sens. Lett., vol. 4, no. 4, pp. 674–679, Oct. 2007.
[55] Y. Bazi and F. Melgani, “Toward an optimal SVM classification system for hyperspectral remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 44, no. 11, pp. 3374–3376, Nov. 2006.
[56] J. Weston, S. Mukherjee, O. Chapelle, M. Pontil, T. Poggio, and V. Vapnik, “Feature selection for SVMs,” in Advances in Neural Information Processing Systems (NIPS), vol. 13. Cambridge, MA: MIT Press, 2001, pp. 668–674.
[57] M. Pal, “Support vector machine-based feature selection for land cover classification: A case study with DAIS hyperspectral data,” Int. J. Remote Sens., vol. 27, no. 14, pp. 2877–2894, Jul. 2006.
[58] J. Crespo, J. Serra, and R. Schafer, “Theoretical aspects of morphological filters by reconstruction,” Signal Process., vol. 47, no. 2, pp. 201–225, Nov. 1995.
[59] M. Chini, F. Pacifici, W. J. Emery, N. Pierdicca, and F. Del Frate, “Comparing statistical and neural network methods applied to very high resolution satellite images showing changes in man-made structures at Rocky Flats,” IEEE Trans. Geosci. Remote Sens., vol. 46, no. 6, pp. 1812–1821, Jun. 2008.
[60] R. Collobert, S. Bengio, and J. Mariéthoz, “Torch: A modular machine learning software library,” IDIAP, Martigny, Switzerland, Tech. Rep. RR 02-46, 2002.
[61] G. M. Foody, “Thematic map comparison: Evaluating the statistical significance of differences in classification accuracy,” Photogramm. Eng. Remote Sens., vol. 70, no. 5, pp. 627–633, 2004.
[62] P. A. Devijver and J. Kittler, Pattern Recognition: A Statistical Approach. Englewood Cliffs, NJ: Prentice-Hall, 1982.
[63] P. Mitra, C. A. Murthy, and S. K. Pal, “Unsupervised feature selection using feature similarity,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 3, pp. 301–312, Mar. 2002.
[64] D. Cohn, L. Atlas, and R. Ladner, “Improving generalization with active learning,” Mach. Learn., vol. 15, no. 2, pp. 201–221, May 1994.
[65] P. Mitra, B. Uma Shankar, and S. K. Pal, “Segmentation of multispectral remote sensing images using active support vector machines,” Pattern Recognit. Lett., vol. 25, no. 9, pp. 1067–1074, Jul. 2004.
[66] D. Tuia, F. Ratle, F. Pacifici, M. Kanevski, and W. J. Emery, “Active learning methods for remote sensing image classification,” IEEE Trans. Geosci. Remote Sens., vol. 47, no. 7, pp. 2218–2232, Jul. 2009.

Devis Tuia (S’07) received the Diploma in geography from the University of Lausanne, Lausanne, Switzerland, in 2004 and the Master of Advanced Studies in environmental engineering from the Federal Institute of Technology, Lausanne, in 2005. He is currently working toward the Ph.D. degree in machine learning and its applications to urban remote sensing images at the Institute of Geomatics and Analysis of Risk, University of Lausanne. His research interests include the development of algorithms for feature selection and classification of very high resolution images using kernel methods. In particular, his studies have focused on the use of unlabeled samples and on the interaction between the user and the machine to increase classification performance. Mr. Tuia was one of the winners of the 2008 IEEE Geoscience and Remote Sensing Data Fusion Contest. In 2009, he took second place in the Student Paper Competition of the Joint Urban Remote Sensing Event.

Fabio Pacifici (S’03) was born in Rome, Italy, in 1980. He received the Laurea (B.S.) (cum laude) and Laurea Specialistica (M.S.) (cum laude) degrees in telecommunication engineering from “Tor Vergata” University, Rome, in 2003 and 2006, respectively, where he is currently working toward the Ph.D. degree in geoinformation in the Earth Observation Laboratory. Since 2005, he has been collaborating with the Department of Aerospace Engineering Sciences, University of Colorado, Boulder. He is currently involved in various remote sensing projects supported by the European Space Agency and the Italian Space Agency. His research activities include remote sensing image processing, analysis of multitemporal data, data fusion, and feature extraction. In particular, his research interests concern the development and validation of novel classification and change-detection techniques for urban remote sensing applications using very high resolution optical and synthetic aperture radar imagery, with special emphasis on neural networks. Mr. Pacifici won the 2009 Joint Urban Remote Sensing Event Student Paper Competition and took first prize in both the 2007 and 2008 IEEE Geoscience and Remote Sensing Data Fusion Contests. He serves as a Referee for the IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, IEEE TRANSACTIONS ON IMAGE PROCESSING, EURASIP Journal on Image and Video Processing, and Journal of Urban Ecosystems.

Mikhail Kanevski received the Ph.D. degree in plasma physics from Moscow State University, Moscow, Russia, in 1984 and the Doctoral degree in computer science from the Institute of Nuclear Safety (IBRAE), Russian Academy of Sciences, Moscow, in 1996. Until 2000, he was a Professor with the Moscow Physico–Technical Institute (Technical University) and the Head of a Laboratory with the Institute of Nuclear Safety (IBRAE), Russian Academy of Sciences, Moscow. Since 2004, he has been a Professor with the Institute of Geomatics and Analysis of Risk, University of Lausanne, Lausanne, Switzerland. He is a Principal Investigator of several national and international grants. His research interests include geostatistics for spatio-temporal data analysis, environmental modeling, computer science, numerical simulations, and machine learning algorithms. Remote sensing image classification, natural hazard assessment (forest fires, avalanches, and landslides), and time-series prediction are the main applications considered at his laboratory.

William J. Emery (M’90–SM’01–F’02) received the Ph.D. degree in physical oceanography from the University of Hawaii, Mānoa, in 1975. After a period with Texas A&M University, College Station, he joined the University of British Columbia, Vancouver, BC, Canada, in 1978, where he created a satellite oceanography facility and an education/research program. Since 1987, he has been a Full Professor with the Department of Aerospace Engineering Sciences, University of Colorado, Boulder. He is active in the analysis of satellite data for oceanography, meteorology, and terrestrial physics (vegetation, forest fires, sea ice, etc.). His research focus areas include satellite sensing of sea surface temperature, mapping of ocean surface currents (imagery and altimetry), sea ice characteristics/motion, and terrestrial surface processes. He has recently started working on urban change detection using high-resolution optical imagery and synthetic aperture radar data, in collaboration with students from various universities in Rome, where he is an Adjunct Professor in geoinformation with “Tor Vergata” University. He also works with passive microwave data for polar applications to ice motion and ice concentration and to atmospheric water vapor studies. In addition, his group writes image navigation and analysis software and has established and operated data systems for the distribution of satellite data received by their own antennas. He is an associate member of the Laboratory for Atmospheric and Space Physics, an affiliate member of NOAA’s Cooperative Institute for Research in Environmental Sciences, and a founding member of the Program in Oceanic and Atmospheric Sciences, which is now the Department of Atmospheric and Ocean Sciences. He is a coauthor of two textbooks on physical oceanography, has translated three oceanographic books from German to English, and is the author of over 130 published articles. Dr. Emery is a member of the Administrative Committee of the IEEE Geoscience and Remote Sensing Society and the Founding Editor and currently an Associate Editor of IEEE GEOSCIENCE AND REMOTE SENSING LETTERS.
