

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 41, NO. 10, OCTOBER 2003

Multitemporal/Multiband SAR Classification of Urban Areas Using Spatial Analysis: Statistical Versus Neural Kernel-Based Approach

Tiziana Macrì Pellizzeri, Student Member, IEEE, Paolo Gamba, Senior Member, IEEE, Pierfrancesco Lombardo, Member, IEEE, and Fabio Dell’Acqua, Member, IEEE

Abstract—In this paper, we derive two techniques for the classification of multifrequency/multitemporal polarimetric SAR images, based respectively on a statistical and on a neural approach. Both techniques are especially designed to exploit the spatial structure of the observed scene, thus allowing more stable classification results. Such techniques are useful when looking at medium- to large-scale features, like the boundaries between urban and nonurban areas. They are applied to a set of SIR-C images of an urban area, to test their effectiveness in the identification of the different classes that compose the observed scene. A lower and an upper bound to the classification performance are introduced to characterize their limits. They correspond respectively to pixel-by-pixel classification and to the joint classification of the pixels belonging to the different classes identified in the ground truth. The results achieved with the two approaches are quantitatively analyzed by comparing them to the ground truth. Moreover, a hybrid approach is presented, where the homogeneous regions identified through statistical segmentation are classified using a neurofuzzy technique. Finally, a quantitative analysis of the results achieved with all the proposed techniques is carried out, showing that their classification performance is much higher than the lower bound and reasonably close to the upper bound. This is a consequence of their effectiveness in the exploitation of the spatial information.

Index Terms—Image processing, synthetic aperture radar (SAR), spatial analysis, urban areas.

I. INTRODUCTION

Manuscript received September 27, 2002; revised August 26, 2003.
T. Macrì Pellizzeri and P. Lombardo are with INFOCOM Department, University of Rome “La Sapienza,” 00184 Rome, Italy (e-mail: t.macri.pellizeri@infocom.uniroma1.it).
P. Gamba and F. Dell’Acqua are with Dipartimento di Elettronica, Università di Pavia, I-27100 Pavia, Italy.
Digital Object Identifier 10.1109/TGRS.2003.818762

Due to the recent introduction of multiparametric operational systems like ENVISAT, COSMO-SkyMed, etc., and to the variety of possible applications, the development of multipolarization/multifrequency/multitemporal synthetic aperture radar (SAR) image processing techniques is a key element for the exploitation of high-quality remote sensing data. It is well known that multiparametric SAR images carry a much larger information content than a single SAR image; thus, their exploitation allows a more accurate identification of homogeneous regions on the basis of their temporal or spectral characteristics. In this paper, we compare two different supervised techniques for multipolarization/multifrequency/multitemporal SAR data classification based on contextual analysis, with the aim of providing an assessment of the ability of each one to exploit the spatial information for identifying the different classes. The proposed techniques are especially designed to exploit the spatial characteristics of medium- to high-resolution images. This is useful, for instance, when looking at high-density urban areas, at rural land characterization, as well as at medium- to large-scale features, like the boundaries between urban and nonurban areas. Note that the capability to provide reliable classification of terrain and land cover types is crucial to the successful exploitation of remotely sensed images for practical applications in earth observation, monitoring, and control.

Both the proposed techniques are applied to the same set of polarimetric C- and L-band, and single-channel X-band images of the town of Pavia, in Northern Italy, acquired on different dates. To investigate the capability of the different techniques to exploit the temporal and spectral information content of the processed data, different subsets of the dataset are considered, composed of mono- or multitemporal images, and mono- or multifrequency images. The classification results are quantitatively analyzed by considering classification accuracy as the quantitative measure of performance. This is evaluated by comparing classification results to the ground truth available for the considered area.

To characterize the limits that can be achieved, both a lower bound and an upper bound to the classification performance are introduced in this paper, corresponding respectively to single-pixel classification and to joint classification of all pixels in the homogeneous regions defined by the ground truth test set. The difference between the upper and lower bounds can be used to evaluate the maximum performance improvement theoretically available by the exploitation of the contextual spatial information.
Namely, it represents the improvement available moving from classifying single pixels to classifying together all pixels in the largest possible homogeneous segments. The two proposed techniques are both interpreted as replacing the (ideally) known region shapes with their estimates, performed either before or after the classification operation. We then demonstrate a significant performance improvement by following the proposed approaches, and we achieve performance that lies reasonably close to the upper bound. Specifically, the first technique we propose is based on a statistical analysis, and it is composed of the cascade of two stages: 1) an unsupervised segmentation stage and 2) a maximum-likelihood (ML) classification stage. The second technique is based on a neurofuzzy classification scheme that operates in two stages: 1) a pixel-by-pixel preclassification and 2) a second classification stage that takes into account the spatial neighborhood. A hybrid approach is also devised, where

0196-2892/03$17.00 © 2003 IEEE


the homogeneous regions are identified through statistical segmentation, and then classified using a neurofuzzy technique that operates on the statistically segmented data. Other than introducing the upper and lower bounds as means to evaluate the performance in the exploitation of the spatial contextual information, the proposed schemes show innovative characteristics in three directions: 1) the approach used by the proposed statistical technique to exploit the spatial information; 2) the approach used by the proposed neurofuzzy classification technique, and the possibility to combine statistical and neurofuzzy solutions; and 3) the approach used to address the classification of the specific urban environment. For each of the three cases, a brief literature review is presented, which allows us to discuss our innovative contribution.

A. Proposed Statistical Technique

The literature records numerous examples of classifiers based on the spectral analysis of individual pixels for the classification of remotely sensed images. However, when comparing these techniques with the work of a human photointerpreter, their limitations are apparent, since specialists, by implicitly using both spectral and contextual image information, can reach higher classification accuracies. The contextual classification model deals with the problem of incorporating into the classification process the contextual information, expressed in terms of the spatial relations existing between one pixel and the pixels in the rest of the scene. In general, the fundamental assumption of contextual techniques is that geographical phenomena display an order or a structure, so that the characteristics of a given pixel can be more easily understood if this pixel is considered in the context of its neighbors.
Several techniques have been derived to exploit the contextual information to improve classification accuracy, which can be roughly subdivided into two main approaches: the neurofuzzy approach and the statistical approach. In [1], classification is considered as a cognitive process, and modeled within a fuzzy logic framework. The proposed classification technique is designed to exploit multisource and contextual information, and it is based on a two-level classification strategy. At first, multisource level rules are used to express the degrees of conviction in assigning a single pixel to a corresponding class. Then, contextual level rules are interpreted by integrating the results of the multisource classification processes in the eight-neighborhood of the considered pixel. The technique is then applied to the identification of the glacier equilibrium line using a Landsat TM image and a digital terrain model (DTM). The neurofuzzy technique presented in our work adopts a similar sequentialization of the classification problem into two separate steps, with the difference that in our case multifrequency/multitemporal classification of polarimetric SAR images is considered, instead of multisource classification of optical multispectral images and a DTM. In both cases, an eight-neighborhood of the considered pixel is used for taking into account the spatial context. In [2], a statistical approach is adopted, and the contextual information is incorporated through the modeling of the scene. In particular, the Gibbs random field is used to model the inherent coherence of class labels of spatially adjacent pixels in terms of spatial prior


probabilities. Moreover, class transition probabilities are used to convey temporal interpixel class dependency context into the classification process, and changes of classes over time are allowed. The technique presented in [3] is also based on a statistical approach to exploit temporal and contextual information, but in this case the authors assume that the observed process is the sum of two independent processes, one having a class-dependent structure and the other being an autocorrelated noise process. The noise process accounts for both spatial and temporal correlations, and the conditional joint probabilities of spatial and temporal neighbors are computed assuming a given structure for the covariance matrix. In both of the above-mentioned works, the spatial context is taken into account by exploiting the pixels located north, south, east, and west of the considered pixel. In contrast, the statistical technique presented in our work is based on a region-finding process, where the observed scene is divided into distinct statistically homogeneous regions that are subsequently classified. Thus, in this case the exploitation of the context is considered separately from the classification problem, and the context taken into account does not have a predefined structure. Moreover, in our case no changes of class over time are allowed (due to the close temporal proximity of the different images in our dataset), and the absence of temporal correlation is assumed. The same approach was proposed by some of the authors in [4], for the case of SAR and optical image fusion.

B. Proposed Neurofuzzy Classification Technique

The comparison between neurofuzzy and statistical approaches for the same task has also been studied in the literature. In [5], the two approaches were compared for the case of multisource classification (Landsat MSS, elevation, slope, and aspect data).
In particular, the authors compared a supervised Bayesian classifier, modified to include information about the reliability of the different data sources, to two different neural networks, based on the delta rule and on the generalized delta rule. Results showed that the statistical approach has a better generalization capability, thus yielding higher accuracies on the test data, while the neural network approach is more accurate on the training data, as a consequence of its distribution-free characteristics, which allow it to better learn the characteristics of the training data. A detailed comparison between the two approaches for the specific case of urban land use classification was presented in [6], considering the application to two Landsat TM scenes. In particular, the authors compared a standard supervised ML classifier to a backpropagation neural network. Results showed that the neural network approach has a higher computational cost, since it requires the training of the network, even though it is faster than the ML classifier in classifying the considered data. In contrast, the nonparametric neural network approach is more robust to training site selection and to the purity of the class signatures, whereas the ML classifier has more difficulty in recognizing the intrinsically heterogeneous classes (e.g., “urban residential”) that are commonly present in the urban environment. In our work, the comparison is extended to the case of a fuzzy approach, whose capability to emulate human reasoning has shown great potential in the interpretation and classification of SAR



images. The main contribution of our work with respect to the previously mentioned ones is in the derivation of a combined approach, where the identification of the homogeneous regions is carried out by means of unsupervised statistical segmentation, and the neurofuzzy classifier is then applied to the identified segments. This was made possible by the separation of the spatial analysis stage from the classification stage that characterizes both the proposed techniques.

C. Classification of the Specific Urban Environment

Although the analysis of SAR images has been widely studied, only a few applications are related to urban area characterization. One of the few examples is [7], where a feature fusion approach based on evidential reasoning is proposed to extract the elements of a SAR scene. Multiple spatial operators are used to characterize the urban area, and the approach combines their results in a comprehensive and consistent framework. For built-up area characterization three operators are used, based on texture analysis and density of bright points. Road and river operators add useful context information. In particular, the image segmentation available using the operators' outputs is fused with road and river network hypotheses using Dempster–Shafer theory to improve scene characterization. Another way to consider spatial characteristics of SAR images is by exploiting interferometric information. In [8] it is shown that the joint use of SAR backscattering and interferometric SAR coherence images provides useful results in land cover characterization. The approach is based on the principal component transform, a feature extraction/discarding step based on keeping the components with eigenvalues significantly higher than the smallest one (the so-called Bartlett test), and an unsupervised fuzzy clustering with multiple cluster validity checks.
Moreover, texture measures on SAR polarimetric data using both single-channel and intrachannel statistics to improve the classification accuracy are considered in [9], where a classification based on logistic regression shows that 90% accuracy on urban area extraction may be obtained, provided that suitable high-resolution data are used. Finally, we would like to note that in [10] it has been shown that, using satellite SAR textural features based on the co-occurrence matrix, it is possible to capture not only the urban area, but also the different parts and environments in it. Unlike the above-mentioned works, the techniques proposed in our paper are applied directly to the speckled data, without requiring the extraction of any additional feature. Thus, they do not require any particular preprocessing of the considered data, which makes them more easily applicable in a wide set of cases. Finally, an extended review of the different SAR processing techniques and applications over urban areas can be found in [11].

D. Organization of Paper

The paper is organized as follows. The dataset and the corresponding ground truth are described in detail in Section II, together with the statistical analysis of the different classes. The lower bound and the upper bound to the classification performance are introduced in Section III. The derivation of the statistical techniques and the corresponding results are discussed

Fig. 1. Total power C-band April 16 image.

in Section IV, while the description and the results obtained with the neurofuzzy approach are discussed in Section V. Finally, the different approaches are compared in Section VI, to assess the comparative performance and their ability to exploit the spatial information. Our main conclusions are summarized in Section VII.

II. STATISTICAL ANALYSIS OF THE DATASET

For the analyses of this paper, we considered a set of SIR-C/X-SAR polarimetric images of the town of Pavia, in Northern Italy, acquired on four different days of April 1994. Namely, the set is composed of the following images:
• April 14: C- and L-band, HH, HV, and VV polarization;
• April 16–18: C- and L-band, HH and HV polarization; X-band, VV polarization.

As an example, the C-band total power image of April 16 is reported in Fig. 1. All the images have been coregistered before being processed, so that pixels in the same position represent the same resolution cell. The April 14 images were collected under SIR-C mode 16 (multilook complex, quad pol), while the other SIR-C images were collected under mode 11 (single-look complex, dual pol). Thus, all the polarimetric channels are available for the April 14 image, while for the other images only the HH and the HV channels are available. The incidence angles for the different images are not exactly the same; however, their values span a limited interval (44° to 57°). The images provided in slant range were resampled to the ground range. Then, all the images were registered to a technical regional map, with a spatial resolution of about 10.8 m both in azimuth and in range. The images have been resampled using the nearest neighbor technique, to avoid modifying the radiometric and the statistical characteristics of the images. Finally, it is worth pointing out that



no other preprocessing was performed, and in particular no despeckling filters were applied to the data. Consequently, all the speckle noise present in the original images is still present in the data used as input for the techniques proposed in the following.

A partial knowledge of the ground truth is available for the considered dataset, composed of three different classes: 1) water; 2) agricultural area; and 3) built-up area. The ground truth was extracted from the technical regional map of the area and validated with in situ observations. The whole ground truth was split into a training set and a test set, respectively used for setting up the classifiers and for testing classification results. In order to provide a fair comparison of the results achievable with the different techniques and datasets, particular attention has been devoted to the choice of the reference training set and test set for all cases. The training set is reported in Fig. 2(a), while the test set is reported in Fig. 2(b).

To provide a quantitative measure of classification performance for each class we considered two different accuracies. For each class, the omission accuracy records the fraction of each defined class that is correctly assigned to that category, while the commission accuracy records the fraction of each assigned class that has been correctly classified. Moreover, a useful general performance measure is provided by the overall accuracy, which represents the fraction of the pixels of the test set that has been correctly classified.

As in [12] and [13], to derive a statistically optimized processing technique we start from the identification of a statistical model that provides a useful description of the characteristics of the considered dataset. To this purpose, the histograms of each class of the training set are evaluated and their shape is compared to the theoretical distributions.
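As an illustration, the omission, commission, and overall accuracies defined in this section can all be read off a confusion matrix built on the test set. The following is only a minimal sketch: the function name `accuracy_measures`, the integer label coding, and the NaN convention for empty classes are assumptions made here for illustration and are not part of the processing chain described in the paper.

```python
import numpy as np

def accuracy_measures(truth, predicted, n_classes):
    """Return (omission, commission, overall) accuracies.

    truth, predicted: integer class labels (0 .. n_classes-1) of the test pixels.
    """
    truth = np.asarray(truth).ravel()
    predicted = np.asarray(predicted).ravel()
    # confusion[i, j] counts pixels of true class i that were assigned to class j
    confusion = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(confusion, (truth, predicted), 1)
    correct = np.diag(confusion).astype(float)
    per_true = confusion.sum(axis=1)      # pixels defined as class i
    per_assigned = confusion.sum(axis=0)  # pixels assigned to class j
    # Empty classes yield NaN instead of a division error (an assumed convention)
    omission = np.divide(correct, per_true,
                         out=np.full(n_classes, np.nan), where=per_true > 0)
    commission = np.divide(correct, per_assigned,
                           out=np.full(n_classes, np.nan), where=per_assigned > 0)
    overall = correct.sum() / confusion.sum()
    return omission, commission, overall
```

With this convention, a class that is rarely missed but often wrongly assigned to other pixels shows a high omission accuracy and a low commission accuracy, which is why the two measures are reported separately.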
As is well known from the theory [14], the probability density function (pdf) of SAR data is dominated by the presence of speckle, namely multiplicative noise due to the effect of coherent imaging, which tends to mask the backscattering characteristics of the observed objects. A phenomenological representation of the interference between coherent scatterers leads to the K-distribution, which is well known in the literature and has been widely validated for different SAR images [14]. Following this model, the joint distribution of the pixel intensities $I_k$, $k = 1, \ldots, N$, of a single-channel SAR image for a single region of $N$ pixels is

$$p(I_1, \ldots, I_N) = \prod_{k=1}^{N} \frac{2}{\Gamma(L)\,\Gamma(\nu)} \left(\frac{L\nu}{\mu}\right)^{(L+\nu)/2} I_k^{(L+\nu-2)/2}\, K_{\nu-L}\!\left(2\sqrt{\frac{L\nu I_k}{\mu}}\right) \tag{1}$$

where $\Gamma(\cdot)$ is the Gamma function; $L$ is the number of looks; and $K_{\nu-L}(\cdot)$ is the Bessel K function of order $\nu - L$ [15]. The shape of the pdf in (1) depends on the mean value $\mu$, which represents the mean reflectivity, and on the order parameter $\nu$, which characterizes the non-Gaussianity of the backscattering mechanism. Specifically, the order parameter value spans the range from $\nu \to \infty$ for Gaussian backscattering from uniform agricultural fields or grass (pure speckle), to smaller values for a well-developed forest region, and finally to the smallest values for urban areas. The excess texture associated with small values of $\nu$ increases

Fig. 2. (a) Training set and (b) test set for the considered area.

the level of fluctuation of the measured intensities to values higher than pure speckle. Since the K-distribution is not easy to handle analytically, especially when dealing with multidimensional statistics, it is usual to approximate it with other distributions. In particular, when the number of looks is high enough, it is well known that the lognormal distribution is an acceptable approximation to the K-distribution. This is usually the case for the multilook SAR images that are commonly used for earth observation and land cover classification. Therefore, the joint distribution of the pixel intensities $I_k$, $k = 1, \ldots, N$, of a single-channel SAR image for a single region of $N$ pixels will be written as

$$p(I_1, \ldots, I_N) = \prod_{k=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma I_k} \exp\left[-\frac{(\ln I_k - m)^2}{2\sigma^2}\right] \tag{2}$$

where $m$ and $\sigma$ encode the statistical characteristics of the SAR reflectivity from the scene. It is also interesting to observe that the theoretical speckle model does not rigorously apply in dense built-up areas, since different scattering mechanisms might become dominant, like edge diffraction, specular returns, multibounce effects, etc. In these conditions, the speckle theory with the resulting K-distribution statistical model fails; however, the lognormal pdf is still appropriate to fit the spiky behavior caused by the mentioned scattering mechanisms, which give rise to high tails.

Fig. 3 shows the histograms and the corresponding fitting lognormal pdfs for the pixels of the training set and the L-band HH, C-band HH, and X-band VV images of April 16. As is apparent, the lognormal pdf provides an adequate fit to the data histogram for all the classes that compose the ground truth. Moreover, the fit is adequate for both the distribution body and the distribution tails. Therefore, the lognormal pdf can be effectively used as the basis for the development of processing techniques based on a statistical approach. The values of the parameters providing the best fit in the logarithmic domain for the different classes on the various channels are reported in Table I. These values are used to train the supervised ML classification technique.

III. STATISTICAL UPPER AND LOWER BOUND FOR CLASSIFICATION PERFORMANCE

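Before introducing the practical statistical and neural kernel-based classification schemes, we consider the upper and lower bounds to the achievable classification performance. To derive these bounds we consider an ML classification scheme, obtained as described below. We start from the pdf of the pixel intensities in the SAR image. When considering the logarithmic values $x_k = \ln I_k$, $k = 1, \ldots, N$, the joint pdf of the pixel intensities becomes the product of $N$ identical Gaussian pdfs, with mean value $m$ and standard deviation $\sigma$

$$p(x_1, \ldots, x_N) = \prod_{k=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(x_k - m)^2}{2\sigma^2}\right] \tag{3}$$

Thus, the joint pdf for an $N$-pixel homogeneous region over a set of $M$ images becomes

$$p(\mathbf{x}_1, \ldots, \mathbf{x}_N) = (2\pi)^{-NM/2}\, |\mathbf{C}|^{-N/2} \exp\left\{-\frac{1}{2}\,\mathrm{Tr}\left[\mathbf{C}^{-1} \sum_{k=1}^{N} (\mathbf{x}_k - \mathbf{m})(\mathbf{x}_k - \mathbf{m})^{T}\right]\right\} \tag{4}$$

having arranged the logarithmic intensities of the pixels from the $k$th resolution cell into the $M$-dimensional vector $\mathbf{x}_k$, and having defined the logarithmic mean vector $\mathbf{m}$ and the covariance matrix $\mathbf{C}$. We also recall that $\mathrm{Tr}[\mathbf{A}]$ and $|\mathbf{A}|$ stand for the trace and the determinant of matrix $\mathbf{A}$, respectively.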
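As a numerical aid, the logarithm of the region likelihood in (4) can be evaluated directly in its trace form. The sketch below assumes that the $N$ log-intensity vectors are stored as the rows of an `(N, M)` array; the function name `region_log_likelihood` and this array layout are assumptions introduced here, not notation from the paper.

```python
import numpy as np

def region_log_likelihood(x, mean, cov):
    """Logarithm of (4) for an N-pixel region observed over M images.

    x: (N, M) array of log-intensities, mean: (M,) vector, cov: (M, M) matrix.
    """
    x = np.asarray(x, dtype=float)
    n, m = x.shape
    d = x - mean
    scatter = d.T @ d                      # sum_k (x_k - m)(x_k - m)^T
    _, logdet = np.linalg.slogdet(cov)     # ln|C|, computed stably
    trace_term = np.trace(np.linalg.solve(cov, scatter))  # Tr[C^-1 * scatter]
    return -0.5 * (n * m * np.log(2.0 * np.pi) + n * logdet + trace_term)
```

Because the pixels are treated as independent, this value coincides with the sum of the $N$ single-pixel Gaussian log-densities, which provides a simple way to check the implementation.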

Fig. 3. Histograms and fitting Lognormal distribution on each class for (a) C-band HH image, (b) L-band HH image, and (c) X-band VV image of April 16 on a logarithmic scale.



TABLE I MEAN VALUES AND VARIANCES OF THE LOGARITHMIC INTENSITIES FOR THE BEST FITTING DISTRIBUTION

Since the joint pdf in (4) also represents the likelihood that the logarithmic values of the considered $N$-pixel region belong to a Gaussian pdf with mean vector $\mathbf{m}$ and covariance matrix $\mathbf{C}$, the ML classification technique is directly obtained by comparing the values of the likelihood function for the different classes. Namely, the pixels of each segment are assigned to the class $\hat{c}$ that satisfies the condition

$$\hat{c} = \arg\min_{c} \left\{ N \ln|\mathbf{C}_c| + \mathrm{Tr}\left[\mathbf{C}_c^{-1} \sum_{k=1}^{N} (\mathbf{x}_k - \mathbf{m}_c)(\mathbf{x}_k - \mathbf{m}_c)^{T}\right] \right\} \tag{5}$$

where the parameters $\mathbf{m}_c$ and $\mathbf{C}_c$ statistically characterize each class and are estimated from the training set. To allow the evaluation of the generalization capability of the classifier, we report in the first column of Table II the accuracies obtained by applying the classifier to the pixels of the training set individually.

TABLE II STATISTICAL APPROACH: TRAINING SET ACCURACY, LOWER AND UPPER BOUNDS

To assess the upper and lower bounds for the performance of the ML classification of remotely sensed images we base our considerations on the uncertainty present in the classification of each pixel of the image. The maximum level of uncertainty is present when applying the classification at the single-pixel level on the original image. In fact, only the intensity value of the single pixel is used to define the class, thus yielding the simplest scheme with minimum computational demand, but with the worst possible performance due to the presence of the multiplicative speckle noise. This gives the lower bound for the classification performance. Using a larger number of pixels reduces the uncertainty present in the classification; thus, we expect the classification performance to increase. Eventually, the best performance would be expected when using the largest possible regions, consistent with the scene structure, to classify the image.
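The reasoning above can be illustrated with a small simulation: the same ML rule is applied once to single pixels and once to whole regions, and the accuracy is expected to grow with the number of pixels classified together. This is a minimal sketch under stated assumptions: the function names, the two synthetic single-channel classes, and their log-domain statistics are hypothetical and are not taken from the Pavia dataset or from Table I.

```python
import numpy as np

def fit_class(log_samples):
    """Estimate (mean vector, covariance matrix) from (n, M) training log-intensities."""
    log_samples = np.asarray(log_samples, dtype=float)
    mean = log_samples.mean(axis=0)
    d = log_samples - mean
    return mean, d.T @ d / len(log_samples)   # ML (biased) covariance estimate

def classify_region(x, class_params):
    """Jointly assign the N pixels in x (N, M) to the class minimizing the discriminant in (5)."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    scores = []
    for mean, cov in class_params:
        d = x - mean
        _, logdet = np.linalg.slogdet(cov)
        scores.append(n * logdet + np.trace(np.linalg.solve(cov, d.T @ d)))
    return int(np.argmin(scores))

def demo(seed=0, n_trials=200, region_size=25):
    """Return (single-pixel accuracy, region accuracy) on synthetic data."""
    rng = np.random.default_rng(seed)
    means, std = [0.0, 1.0], 1.0              # hypothetical log-domain class statistics
    params = [fit_class(rng.normal(m, std, size=(500, 1))) for m in means]
    hits_pixel = hits_region = 0
    for label, m in enumerate(means):
        for _ in range(n_trials):
            region = rng.normal(m, std, size=(region_size, 1))
            hits_pixel += classify_region(region[:1], params) == label   # one pixel only
            hits_region += classify_region(region, params) == label     # all pixels jointly
    total = n_trials * len(means)
    return hits_pixel / total, hits_region / total
```

Calling `demo()` returns the two accuracies; in this toy setting the region-level figure plays the role of the upper bound, since the region shapes are known exactly, while the single-pixel figure plays the role of the lower bound.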

The classified test regions, defined by the ground truth, could provide this maximum region size. Of course, the shape of actual segments is unknown in practice. However, using the test regions establishes the potential upper bound to classification performance. In practice, due to the large number of very small segments that compose class 3 in our ground truth, for this class the upper bound is obtained by combining all the pixels of segments belonging to the same class that are spatially closer than five pixels.

Fig. 4. Overlay of the ground truth boundaries on the single-pixel ML classifications of the April 16–18 image for (a) C-band, (b) L-band, (c) X-band, and (d) C-, L-, and X-bands.

A. Lower Bounds

The lowest limit to performance is obtained by classifying single pixels. As an example, Fig. 4 shows the overlay of the ground truth boundaries on the single-pixel ML classifications obtained using the multitemporal dataset of the April 16–18 images for (a) C-band, (b) L-band, (c) X-band, and (d) the multifrequency combination of C-, L-, and X-bands. A complete

characterization of the overall classification performance is reported in the second column of Table II, for all combinations of frequencies and collection dates. As apparent, the classification results are affected by the presence of the speckle noise, especially for L- and X-band. This results in very low values of the average probability of correct classification, with an overall accuracy of 66.7% for the classification of the April 16 L-band images, and 48.3% for the X-band image of the same date. Moreover, the borders of the Ticino river are in general not clearly identified. From the analysis of Table II it is also apparent that


the use of a larger number of images (namely, a multitemporal dataset) yields a clear performance improvement, but the results are still not satisfactory, since the overall accuracy does not exceed 85.6% when jointly classifying all the bands of the multitemporal dataset. Under these conditions, the need is apparent for a classification technique that takes into account the spatial properties of the observed scene, so that the results can be generated over a larger number of pixels, averaging over the speckle fluctuations.

B. Upper Bounds

As previously stated, the upper bound is obtained by classifying together all the pixels of each segment of the ground truth. A complete characterization of the overall classification performance is reported in the last column of Table II for all combinations of frequencies and collection dates. The large increase in classification performance is apparent for all combinations of channels, showing that good classification performance can be achieved if the pixels are correctly grouped together and jointly classified. In particular, the best results are obtained by using all available bands and dates, which yields a maximum overall accuracy of 95.6%, against the maximum value of 85.6% for the single-pixel classification. It is also to be observed that, while for the single-pixel classification the addition of more bands or dates provides a significant performance improvement, in the ideal case of the upper bound the performance can be significantly high even with few channels. In this case the improvement due to the multiple bands and collection dates might be less apparent, since the system is already close to having exploited all of the available information. When considering the classification performance for each class individually, it is useful to notice that even in the upper bound class 1 (water) is very difficult to identify and in particular it has a very low commission accuracy.
This is mainly a consequence of a residual coregistration error, which prevents the Ticino river from overlapping perfectly in the different images. The coregistration error does not appreciably affect the results for classes 2 and 3, since they are composed of a larger number of pixels, and they have a more regular shape.

The comparative analysis of lower and upper bounds demonstrates the impact of spatial analysis on the achievable performance when dealing with SAR images of urban areas. The significant classification improvement achievable by ideally exploiting the spatial characteristics of the data shows that the correct use of the spatial distribution of the data is essential to achieve high-performance classification of urban areas with SAR images. Unfortunately, the spatial distribution of the SAR images is not known a priori, as implied by the upper bound. However, different techniques can be used to estimate the spatial distribution of the data and to classify groups of homogeneous pixels together, instead of single pixels. In the following, we propose two practical techniques to make use of the spatial characteristics of the data to obtain the best possible classification performance in the difficult urban environment:
• a statistical approach, based on the cascade of an optimized segmentation stage and an ML classification stage;
• a neural kernel-based approach, which uses the spatial characteristics as implied by the appropriate “learning” on the training set.

The first approach is an attempt to estimate the largest homogeneous regions from the SAR images themselves and classify together all the pixels of the identified segments. This follows directly from the definition of the performance upper bound, with the idea to estimate the homogeneous segments and use this estimate in place of the known segments. The second approach instead is an attempt to directly and jointly classify groups of spatially homogeneous pixels, by means of a neurofuzzy classifier based on a combination of spectral and spatial characteristics.

IV. STATISTICAL SEGMENTATION AND CLASSIFICATION

To derive a statistical segmentation technique that is able to identify the largest possible homogeneous regions of pixels to be classified together, we aim at the derivation of an appropriate quality function. This function should be applicable to any given mask of segments, and it should yield a measure of the effectiveness of this given set of segments in encoding the information contained in the original SAR image. The proposed derivation of such a function is based on the likelihood of the image pixels. As noticed before, the function in (4) also represents the likelihood that the $N$ pixels belong to a homogeneous region with known mean vector $\mathbf{m}$ and covariance matrix $\mathbf{C}$. Since the data at the different frequencies and days are generally largely uncorrelated, to simplify the structure of the statistical segmentation, we consider a diagonal covariance matrix $\mathbf{C} = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_M^2)$. However, in general $\mathbf{m}$ and $\mathbf{C}$ are not known a priori, and thus the likelihood function cannot be used directly. Therefore, we resort to the generalized likelihood, by replacing the unknown parameters with their ML estimates

$$\hat{\mathbf{m}} = \frac{1}{N} \sum_{k=1}^{N} \mathbf{x}_k \tag{6}$$

for the mean value and

$$\hat{\sigma}_i^2 = \frac{1}{N} \sum_{k=1}^{N} \left(x_{k,i} - \hat{m}_i\right)^2, \qquad i = 1, \ldots, M \tag{7}$$

for the elements of the covariance matrix.
Thus, the generalized likelihood of a homogeneous region of $N$ pixels is

$$p(\mathbf{x}_1, \ldots, \mathbf{x}_N) = (2\pi e)^{-NM/2}\, |\hat{\mathbf{C}}|^{-N/2} = (2\pi e)^{-NM/2} \prod_{i=1}^{M} \hat{\sigma}_i^{-N} \tag{8}$$

Assuming that the images can be split into a disjoint set of $H$ regions with $N_1, \ldots, N_H$ pixels respectively (being $\sum_{h=1}^{H} N_h = N_{\mathrm{tot}}$), we can write the joint generalized logarithmic likelihood (JGLL) of the whole image, just by adding the logarithmic likelihood functions, as

$$\lambda = -\frac{M N_{\mathrm{tot}}}{2} \ln(2\pi e) - \frac{1}{2} \sum_{h=1}^{H} N_h \ln|\hat{\mathbf{C}}_h| \tag{9}$$

where $\hat{\mathbf{C}}_h$ is the estimated covariance matrix, evaluated on the $N_h$ pixels of region $h$. Equation (9) is the key expression of the statistical segmentation approach, since it yields the desired



Fig. 5. Overlay of the ground truth boundaries on the ML joint classification of the April 16–18 images after segmentation. (a) C-band, (b) L-band, (c) X-band, and (d) C-, L-, and X-bands.

quality function to measure how appropriate any given set of segments is for the considered group of SAR images. When looking for an optimal segmentation, we wish to find the specific set of segments that maximizes the joint generalized logarithmic likelihood function in (9). A possible way to obtain this result is to modify the set of segments until the maximum of the JGLL is obtained. Since the number of possible changes is very large and there are possibly many local maxima, a statistical optimization technique based on simulated annealing is used to identify the best set of homogeneous segments for

the considered dataset [16]. Namely, during the segmentation process, the objective function is maximized with respect to the number of segments and their borders. To encourage the algorithm to obtain regions with smooth borders, a penalty function is added to the JGLL in (9), which is a function of the considered configuration of segments, with negative values. The influence of the penalty function on the segmentation results is controlled by means of a multiplicative shape penalty factor, which can be set by the user. Moreover, the user can set the size of the regions into which the image is subdivided during the



TABLE III STATISTICAL SEGMENTATION AND CLASSIFICATION RESULTS

initialization of the algorithm. The stability of the results with respect to changes in these two user-defined parameters (the shape penalty factor and the initial region size) was investigated by considering different values of each. In particular, we tested several values of both parameters (among them 4, 9, 16, and 25). The results achieved for the different values showed that the classification accuracy is not strongly influenced by the values of these user-defined parameters. In particular, the segmentation algorithm tends to converge approximately to the same set of segments for each value of the initial region size, while changing the value of the shape penalty factor can lead to appreciable changes in the shape of the identified segments. However, for moderate variations of the penalty factor the change in the border shape is also limited, and the accurate selection of its value has little impact on the classification results. Moreover, a group of adjacent segments is often assigned to the same class; therefore the shape of the internal borders is not relevant for the classification results, especially for segments with very close characteristics.

It is important to notice that this segmentation technique makes full use of the backscattering characteristics of the observed scene at the different times of the SAR acquisitions. This technique operates on the original data and is expected to yield the highest possible geometrical resolution, while fully exploiting the contextual information, to increase the discrimination capability between adjacent regions with slightly different characteristics. More details about the penalty function and the role of the different parameters inside the annealing procedure can be found in [14].

This segmentation technique is applied to the dataset of Pavia with fixed values of these parameters, considering different combinations of bands and dates. Then the ML classifier is applied over the identified segments. Fig. 5 reports the overlay of the ground truth boundaries on the classification of the segments identified by jointly segmenting the multitemporal images (April 16–18),

considering both the individual bands and the joint processing of C-, L-, and X-band. From the visual comparison of Figs. 4 and 5, the improvement achieved by exploiting the contextual information is apparent. In particular, the boundaries of the river are more clearly identified in Fig. 5, the distinction between built-up and agricultural areas is sharper, and the green areas inside the town have been preserved. Finally, the classification results are in general more stable, since the homogeneous regions in Fig. 5 are uniformly classified, while Fig. 4 shows isolated misclassified pixels within the same homogeneous region.

The complete set of numerical results, namely omission accuracy, commission accuracy, and overall accuracy for all the combinations of frequencies and collection dates, is reported in Table III. As apparent, classification performance is generally higher than the single-pixel lower bound, but it does not reach the values of the upper bound. From the analysis of Table III, it is possible to draw some conclusions about the effectiveness of the different individual bands for our classification purposes, as well as on the impact of performing a joint multifrequency processing. The following considerations are in order.
• C-band yields acceptable results for the classification of water or agricultural areas, while the results are not satisfactory when classifying urban areas. In particular, the percentage of correctly classified pixels for class 3 is 63.9% for the monotemporal case, while it increases to 68.5% and 80.3% when using three and four multitemporal images, respectively.
• Using L-band only, in contrast, it is possible to achieve an acceptable discrimination of class 3, while the results are



TABLE IV NONLINEAR STATISTICAL SEGMENTATION AND CLASSIFICATION RESULTS

poorer for classes 1 and 2. In particular, the identification of water is rather inaccurate, with no more than 64.3% of pixels correctly classified for both the mono- and the multitemporal case.
• Finally, X-band provides acceptable results for water and agricultural areas, while being highly inaccurate for the identification of urban areas. In particular, the percentage of correctly classified pixels for class 3 is about 47.6% for the monotemporal case, and 51.7% for the multitemporal case.

The joint use of different bands is expected to yield improved performance, since the proposed technique is designed to optimally exploit the information content of the different bands, and to integrate their different contributions. As a matter of fact, an improvement in the classification performance is generally obtained when considering a multiband processing scheme, as can be observed from Table IV. In particular, the joint use of C- and L-band yields an improvement in the percentage of correctly classified pixels, which is mainly due to an increased accuracy in the classification of agricultural areas, for both the mono- and the multitemporal case. When the X-band is also added, the overall accuracy increases to 84.2% for the monotemporal case (the maximum value achievable with a single band in the same case is 82.7%), and to 88.9% in the multitemporal case (where the maximum single-band probability of correct classification is 86.6%).

When considering the effects of the use of multitemporal sequences, it is apparent that the optimum exploitation of the information content of the whole sequence yields a performance improvement with respect to the monotemporal case. The percentage of correctly classified pixels achieved from the joint processing of C-, L-, and X-band rises from 84.2% when considering only the April 16 images to 87.3% when considering three different dates (April 16–18).
Moreover, the addition of a fourth date (April 14) yields a further improvement, with an overall accuracy of 88.9%. This suggests that the use of longer sequences may further increase the classification accuracy. As apparent, the use of multitemporal sequences does not yield any improvement in the identification of class 1 (water), which suffers from poor definition and coregistration errors.

Finally, it is worth noting that the optimum multidimensional segmentation (used for both multifrequency and multitemporal sets) of the considered dataset is obtained through the maximization of the joint generalized likelihood function by means of a simulated annealing procedure. The number of iterations required for the convergence of the algorithm significantly decreases when increasing the size of the dataset. Thus, the CPU time required for the segmentation of the set of images under analysis is largely independent of the number of images in the dataset. The average time required to run the segmentation routine on a Pentium IV 1 GHz is 5 min, while the classification routine takes approximately 10 min on the same machine. It is important to notice that the only preliminary processing required to apply the statistical segmentation + classification scheme is the estimation of the statistical characteristics of the classes from the training set, which has a negligible computational cost.

To complete the analysis of the results achieved using a statistical approach, we notice that the omission accuracies obtained for the different classes are different when processing individual bands. In particular, it is apparent from Table III that C-band yields the best results for the classification of water, L-band yields the best results for the classification of built-up areas, and X-band yields the best results for the classification of agricultural areas. Thus, a classification that uses the different bands separately to classify the different classes is expected to give the best possible results, since it makes the best use of the discrimination capability of each band. Following this idea as an alternative possibility, we derive a nonlinear classifier as follows:
1) for a given segmentation, the identified regions are classified using one of the bands;
2) the regions that have not been assigned to the class corresponding to the considered band are classified using another band;
3) the remaining regions are classified using the third band.
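The three-step cascade above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function `classify_cascade`, the per-band classifier callables, and the toy lookup tables are hypothetical names introduced here, and each per-band classifier stands in for an ML classification of a segment that uses one band only.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

# A per-band classifier maps a segment identifier to a class label.
# Hypothetical stand-in for an ML classification that uses one band only.
BandClassifier = Callable[[Hashable], int]


def classify_cascade(
    segments: Sequence[Hashable],
    band_order: Sequence[Tuple[BandClassifier, int]],
) -> Dict[Hashable, int]:
    """Label every segment with a cascade of single-band classifiers.

    band_order lists (classifier, target_class) pairs. At every step but
    the last, a segment is accepted only if the band assigns it to that
    band's own target class; the last band labels all remaining segments.
    """
    labels: Dict[Hashable, int] = {}
    remaining: List[Hashable] = list(segments)
    for step, (classifier, target_class) in enumerate(band_order):
        is_last = step == len(band_order) - 1
        still_open: List[Hashable] = []
        for seg in remaining:
            predicted = classifier(seg)
            if is_last or predicted == target_class:
                labels[seg] = predicted
            else:
                still_open.append(seg)
        remaining = still_open
    return labels


# Toy example (invented labels): 1 = water, 2 = agricultural, 3 = built-up.
c_band = {"a": 1, "b": 2, "c": 3, "d": 2}.get
l_band = {"a": 2, "b": 3, "c": 3, "d": 2}.get
x_band = {"a": 1, "b": 2, "c": 2, "d": 2}.get
result = classify_cascade("abcd", [(c_band, 1), (l_band, 3), (x_band, 2)])
```

In this toy run the C-band stage keeps only the segment it labels as water, the L-band stage keeps the segments it labels as built-up, and the X-band stage labels whatever is left. Trying all orderings of `band_order` and keeping the best one would correspond to the sequence selection described in the text.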
The use of different sequences of the possible bands yields different classification performance. We considered all the possible sequences of C-, L-, and X-band, and we selected for each case the sequence that yields the highest average probability of correct classification. The results are reported in Table IV, for both the monotemporal and the multitemporal case. From the comparison of the values of the overall accuracy achieved in this case with the corresponding values achieved with the joint ML classifier it is apparent that the two methods yield very similar performance. In other words, the joint ML classifier is able to retain the best discrimination capability of each band, but it does not need any a priori information about the correspondence between the different bands and the different classes.
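To make the quality function of this section concrete, the sketch below evaluates a joint generalized log-likelihood for a given segment mask under the diagonal-covariance Gaussian assumption: each segment contributes minus half its pixel count times the log-determinant of its estimated covariance matrix, and additive constants that do not depend on the segmentation are dropped. The function name `jgll`, the data layout, and the small variance floor are assumptions made for illustration; the shape penalty term and the simulated annealing search are not included.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def jgll(pixels: Sequence[Sequence[float]],
         mask: Sequence[Hashable],
         var_floor: float = 1e-12) -> float:
    """Joint generalized log-likelihood of a segmentation, up to a constant.

    pixels[k] holds the channel values of pixel k; mask[k] is the identifier
    of the segment that pixel k belongs to. var_floor (an assumption of this
    sketch) avoids log(0) for perfectly uniform segments.
    """
    groups: Dict[Hashable, List[Sequence[float]]] = defaultdict(list)
    for values, seg in zip(pixels, mask):
        groups[seg].append(values)

    total = 0.0
    for members in groups.values():
        n = len(members)
        n_channels = len(members[0])
        log_det = 0.0  # log-determinant of the diagonal covariance estimate
        for i in range(n_channels):
            mean = sum(p[i] for p in members) / n
            var = sum((p[i] - mean) ** 2 for p in members) / n  # ML estimate
            log_det += math.log(max(var, var_floor))
        total += -0.5 * n * log_det
    return total


# Two clearly different populations in a single channel (invented values).
pixels = [[0.0], [0.2], [0.1], [5.0], [5.2], [5.1]]
one_segment = jgll(pixels, [0, 0, 0, 0, 0, 0])
two_segments = jgll(pixels, [0, 0, 0, 1, 1, 1])
```

With these toy values the mask that separates the two populations scores higher than the single-segment mask, which is the behavior a segmentation search driven by this criterion relies on.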



Fig. 6. Neural kernel-based classification. (a) Scheme of a supervised fuzzy ARTMAP neural network. (b) Pixel-by-pixel spectral classification. (c) Spatial analysis.

V. NEURAL KERNEL-BASED CLASSIFICATION

A different approach to multiband classification, still taking into account spatial analysis, is the use of a two-step neural kernel-based classifier, which is described in detail in [17]–[19]. The procedure first performs an image preclassification on a pixel-by-pixel basis, using a supervised fuzzy ARTMAP neural network. The scheme of the classifier is illustrated in Fig. 6(a). Spectral analysis is performed by classifying the “patterns” representing the spectral response of each pixel in different bands. Since normalization is performed, a scaling value (impmod) is added [Fig. 6(b)]. Then, the spatial analysis is carried out by means of a second classification with the same neural classifier, but now using as input vector the percentages of pixels in a 3 × 3 window around the current pixel position assigned to each class by the first step [Fig. 6(c)].

With respect to its unsupervised counterpart presented in [17], which corresponds to an ART-2A network followed by a fuzzy C-means clustering, the supervised fuzzy ARTMAP shows superior performance as far as the classification maps are concerned. The training phase, usually critical in terms of both time and choice of the data subset, is far less critical with multiband data, since clusters are more easily discriminated in a multidimensional space. The more bands there are, the shorter the time the neural network requires to learn, by means of a training set, how to group inputs into consistent clusters of similar vectors. The overall accuracies obtained by the proposed scheme on the training set are reported in the first column of Table V.
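The input to the second (spatial) stage can be sketched as below: for a given pixel, the fraction of pixels in the surrounding 3 × 3 window that the first stage assigned to each class. The function name `class_fractions`, the label-map layout, and the handling of image borders (windows are clipped at the edges) are assumptions made here; the fuzzy ARTMAP classifier itself is not shown.

```python
from typing import List, Sequence


def class_fractions(labels: Sequence[Sequence[int]],
                    row: int, col: int,
                    n_classes: int) -> List[float]:
    """Fractions of each class in the 3 x 3 window centred on (row, col).

    labels is the first-stage label map with classes numbered 1..n_classes.
    Windows are clipped at the image border (an assumption of this sketch).
    """
    counts = [0] * n_classes
    n_rows, n_cols = len(labels), len(labels[0])
    for r in range(max(0, row - 1), min(n_rows, row + 2)):
        for c in range(max(0, col - 1), min(n_cols, col + 2)):
            counts[labels[r][c] - 1] += 1
    window_size = sum(counts)
    return [count / window_size for count in counts]


# Invented first-stage output with one isolated class-1 pixel in the centre.
label_map = [
    [3, 3, 2],
    [3, 1, 2],
    [3, 3, 2],
]
centre = class_fractions(label_map, 1, 1, n_classes=3)
corner = class_fractions(label_map, 0, 0, n_classes=3)
```

For the centre pixel, five of the nine window pixels belong to class 3, so the feature vector points toward class 3 even though the first stage labeled the pixel itself as class 1. This is the kind of neighborhood evidence a second-stage classifier can use against isolated misclassified pixels.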

TABLE V NEUROFUZZY APPROACH. TRAINING SET PERFORMANCE AND LOWER BOUND

The second column of Table V shows the results obtained in the preclassification, namely by considering only a supervised



TABLE VI NEUROFUZZY SPECTRAL + SPATIAL

fuzzy ARTMAP classification on a pixel-by-pixel basis (first stage), while Table VI shows the output of the second stage, thus including the use of the spatial characteristics to refine the classification. The results in Tables V and VI show some differences between spectral and spectral+spatial classification. As a matter of fact, if we consider only one supervised fuzzy ARTMAP classification, on a pixel-by-pixel basis, we have lower accuracies, especially when considering single-date SAR images. In contrast, the second step of the procedure presented above increases both the omission and commission accuracies by reducing the “salt-and-pepper” effect that is usual in pointwise classification. In general, we should also note that the use of the spatial classification after the spectral one improves the results, even if this is less true for multitemporal sets, where the temporal redundancy of the data is perhaps sufficient to better discriminate among classes. This is also an effect of the nature of ART neural networks, developed to deal with pattern recognition problems. In the case of multitemporal data, the pattern is the vector of temporally different values assumed by the same pixel.

A slightly different behavior may be found by classifying the presegmented images by using the mean and variance of each segment (first column of Table VII) or the mean and the inverse of the normalized variance (second column of Table VII). In both cases classification was performed by considering the presegmented data as inputs to a neurofuzzy spectral classification. Because of the homogeneous values assigned to the pixels in each segmented region, the effect of this classification is that each of the segments is totally assigned to an output class. It seems that the use of simple statistical indicators, like mean and variance, is sufficient to obtain results similar to, and sometimes better than, those obtained using more refined indicators.
Again, this difference disappears using a larger number of bands. However, the overall accuracies for the mean+shape factor case show a higher correlation with the statistical classifier results than with the neurofuzzy ones. This is consistent with the fact that we use exactly the same information exploited for the segmentation step to assign segments to output classes.

TABLE VII OVERALL ACCURACY FOR THE HYBRID APPROACHES

In general, from the comparison of the results in Tables VI and VII we should also note that the use of the spatial classification after the spectral one is almost equivalent to applying the spectral classification to the segmented images obtained by means of the statistical approaches. In a sense, this means that the information added by considering the neighborhood of a pixel is “independent” of the information carried by the values of that pixel in the multipolarization/multiband dataset. As a result, the combination of the statistical and neurofuzzy approaches does not improve to a large extent the original classification maps.

Finally, it is worth stressing that the CPU time required by the neurofuzzy approach to train the fuzzy ARTMAP module decreases dramatically as the input dimensionality increases. This behavior depends on the greater ease with which the network finds the boundaries between different land cover classes in a space with more dimensions, especially when the new dimensions are “orthogonal” or “almost orthogonal” to the previously considered ones. This is the case, for instance, when adding data at different wavelengths, which are more or less uncorrelated with one another. The average time required by the whole technique is about 20 s, 10 for the spectral classification and 10 for the spatial classification, on a Pentium II 1 GHz. Moreover, a preliminary stage is necessary to optimize the input parameters, which takes approximately 35 min on the same machine. The optimization is performed only once, and the resulting parameters are used in all the subsequent classifications.
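The per-segment inputs used by the hybrid approach of this section (mean and variance of each segment) can be sketched as follows for one image channel. The function name `segment_features` and the data layout are hypothetical, and the variance is computed with the ML normalization (division by the number of pixels), which is an assumption since the text does not state the normalization used.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def segment_features(
    values: Sequence[float],
    mask: Sequence[Hashable],
) -> Dict[Hashable, Tuple[float, float]]:
    """Return {segment id: (mean, variance)} for one image channel."""
    groups: Dict[Hashable, List[float]] = defaultdict(list)
    for value, seg in zip(values, mask):
        groups[seg].append(value)

    features: Dict[Hashable, Tuple[float, float]] = {}
    for seg, members in groups.items():
        mean = sum(members) / len(members)
        variance = sum((v - mean) ** 2 for v in members) / len(members)
        features[seg] = (mean, variance)
    return features


# Invented pixel values for two segments "a" and "b".
features = segment_features([1.0, 3.0, 10.0, 10.0], ["a", "a", "b", "b"])
```

Because every pixel of a segment receives the same feature pair, a pixel-wise classifier fed with these values labels each segment as a whole, as noted in the text.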

VI. COMPARISON OF THE DIFFERENT APPROACHES

The comparative analysis of the statistical and the neurofuzzy approaches (namely, the comparison of Tables III and VI) shows that the complete neurofuzzy chain slightly outperforms the statistical approach when classifying single-date images. This is especially true when considering L-band and X-band data. The slightly better performance is a consequence of the larger omission accuracies of all the classes. When the dataset is larger, and combines more frequencies and/or more dates, the difference between neurofuzzy and statistical approaches tends to become negligible. In the multiparametric case neither approach yields uniformly better performance than the other, and the best classifier depends on the considered combination of channels. Moreover, the comparison of Tables VI and VII shows that using presegmented images as inputs to the neurofuzzy classifier is almost equivalent to applying the spectral+spatial classification. In a sense, this means that the information added by considering the neighborhood of a pixel is “independent” of the information carried by the values of that pixel in the multipolarization/multiband dataset. As a result, the combination of the statistical and neurofuzzy approaches does not improve to a large extent the original classification maps. Thus, we conclude that both approaches are able to fully exploit the spatial characteristics of the SAR images, together with their spectral properties.

It is also interesting to observe that the classification performance is more sensitive to the spatial analysis than to the multifrequency/multitemporal information. This is apparent by comparing the variability along the columns of the different tables (different combinations of channels) with the variability with respect to different ways to exploit the spatial information. In particular, while the statistical upper bound to the achievable


performance shows a limited sensitivity to the number of frequencies and collection dates used for the classification, the difference between upper and lower bounds is remarkable. However, especially for the practical approaches (both statistical and neurofuzzy), there is a clear trend of performance increase with the number of considered channels. Specifically, the multifrequency classification shows performance very close to the best single-frequency classification for each class. Therefore, it acts as if it automatically selected the best band (which is not known a priori). Moreover, the use of multitemporal sequences rather than single-time images yields a better classification accuracy.

In terms of implementation, we notice that the neurofuzzy approach is strongly supervised; therefore, a wise choice of the training set makes it possible to overcome the problems due to the scarce discriminability between different land cover classes for all datasets. However, this is also a disadvantage, since it requires a skilled operator capable of defining such a training set. Moreover, the nonparametric nature of neural networks allows them to mimic the statistics of any input combination better than the proposed statistical model. In contrast, the statistical analysis is composed of an unsupervised segmentation step, followed by a supervised ML classifier. The former is based on a statistical model (lognormal pdf) generally applicable to SAR and optical remotely sensed images. This stage does not require any type of operator intervention and is totally automatic. The latter stage, because of its supervised nature, requires the definition of a training set, to extract the model parameters (mean and variance of the logarithm of intensity) to be used by the ML classifier. This stage is straightforward and can be made fully automatic, so that any human intervention can be totally avoided.

VII. CONCLUSION

In this paper, we considered the impact of the spatial analysis on the classification of SAR images of urban areas. In particular, we introduced two bounds on the achievable classification performance: a lower bound, obtained by the pixel-by-pixel classification of the image, and an upper bound, obtained by classifying together all the pixels that belong to homogeneous ground segments. To obtain the upper bound, these segments are defined by using the a priori knowledge on the structure of the image (from ground truth) so that they are exactly identified. The significant difference between upper and lower bounds demonstrates the potential improvement achievable by the ideal exploitation of the spatial structure of the images. This is equally valid for single-channel data and for the multitemporal/multipolarization datasets considered in this work.

As apparent, the upper bound yields only ideal performance values, since it requires a priori knowledge on the image. To achieve in practice the performance improvement related to the spatial analysis, we follow two different approaches: 1) a statistical approach, given by the cascade of an optimized segmentation stage and a classification stage (either ML or nonlinear), and 2) a neural kernel-based approach. The former operates with the same strategy as the upper bound, by replacing the unknown segment structure with its estimate obtained by the proposed statistical segmentation. The latter approach makes use of some spatial characteristics in its operation and is appropriately trained, while keeping a strict model-free structure.



The individual analysis of the two approaches shows that both of them are able to exploit a significant fraction of the ideally available improvement, yielding classification performance much higher than the lower bound and reasonably close to the upper bound. The comparative performance analysis shows that they behave differently with respect to the different combinations of frequencies and collection dates, but their overall classification performance is essentially comparable. Moreover, for both approaches, the computational cost does not largely increase with the dimensionality of the dataset to be processed, so that both of them are suitable for the analysis of large numbers of images. Thus, both approaches can be used as alternative solutions for the full exploitation of multichannel SAR images of urban areas, including the full benefits coming from the spatial analysis.

From the analysis of the results, it is apparent that classification performance is affected by the number of frequencies and/or collection dates used (even though it is less sensitive to this dimensionality than to the use of spatial structure). In particular, the following points are in order.
• The multiband processing scheme shows performance very close to the single-band processing scheme applied to the best channel for each class. Therefore, it is an effective fusion technique, since it acts as if it automatically selected the best band (which is not known a priori).
• The use of multitemporal sequences rather than single-time images yields a better classification accuracy. Moreover, the achieved results suggest that a further improvement could be achieved using longer sequences.

The contribution of this paper can be summarized in the following points: 1) it presents a segmentation technique that exploits the spatial information in a context whose size is not defined a priori; 2) it provides a first comparison between statistical and fuzzy approaches, also proposing a first combined approach (the derivation of a combined technique was made possible by the separation of the spatial analysis stage from the classification stage in both the proposed approaches); and 3) the proposed technique is simpler than the majority of the techniques especially designed for urban areas, since it does not require the preliminary extraction of features from the considered dataset.

ACKNOWLEDGMENT

P. Lombardo and T. Macrì Pellizzeri are grateful to InfoSAR, Liverpool, U.K., for making available the InfoPACK SAR image processing software. They also thank C. J. Oliver for many fruitful discussions on this and similar topics.

REFERENCES

[1] E. Binaghi, P. Madella, M. G. Montesano, and A. Rampini, “Fuzzy contextual classification of multisource remote sensing images,” IEEE Trans. Geosci. Remote Sensing, vol. 35, pp. 326–340, Mar. 1997.
[2] B. Jeon and D. Landgrebe, “Classification with spatio-temporal interpixel class dependency contexts,” IEEE Trans. Geosci. Remote Sensing, vol. 30, pp. 663–672, July 1992.
[3] N. Khazenie and M. M. Crawford, “Spatial-temporal autocorrelated model for contextual classification,” IEEE Trans. Geosci. Remote Sensing, vol. 28, pp. 529–539, July 1990.
[4] T. Macrì Pellizzeri, P. Lombardo, and C. J. Oliver, “A new maximum likelihood classification technique for multitemporal SAR and multiband optical images,” in Proc. IGARSS, Toronto, ON, Canada, June 2002, pp. 24–28.
[5] J. D. Paola and R. A. Schowengerdt, “A detailed comparison of backpropagation neural network and maximum-likelihood classifiers for urban land use classification,” IEEE Trans. Geosci. Remote Sensing, vol. 33, pp. 981–996, July 1995.
[6] J. A. Benediktsson, P. H. Swain, and O. K. Ersoy, “Neural network approaches versus statistical methods in classification of multisource remote sensing data,” IEEE Trans. Geosci. Remote Sensing, vol. 28, pp. 540–552, July 1990.
[7] F. Tupin, I. Bloch, and H. Maitre, “A first step toward automatic interpretation of SAR images using evidential fusion of several structure detectors,” IEEE Trans. Geosci. Remote Sensing, vol. 37, pp. 1327–1343, May 1999.
[8] P. B. G. Dammert, J. I. H. Askne, and S. Kulmann, “Unsupervised segmentation of multitemporal interferometric SAR images,” IEEE Trans. Geosci. Remote Sensing, vol. 37, pp. 2259–2271, Sept. 1999.
[9] D. Borghys, C. Perneel, and M. Acheroy, “Automatic detection of built-up areas in high-resolution polarimetric SAR images,” Patt. Recognit. Letters, vol. 23, pp. 1085–1093, 2002.
[10] F. Dell’Acqua and P. Gamba, “Texture-based characterization of urban environments on satellite SAR images,” IEEE Trans. Geosci. Remote Sensing, vol. 41, pp. 153–159, Jan. 2003.
[11] F. M. Henderson and Z. G. Xia, “SAR applications in human settlement detection, population estimation and urban land use pattern analysis: A status report,” IEEE Trans. Geosci. Remote Sensing, vol. 35, pp. 79–85, Jan. 1997.
[12] P. Lombardo and C. J. Oliver, “Maximum likelihood approach to the detection of changes between multitemporal SAR images,” Proc. Inst. Elect. Eng. Radar, Sonar Navigat., vol. 148, no. 4, pp. 200–210, Aug. 2001.
[13] P. Lombardo and T. Macrì Pellizzeri, “Maximum likelihood signal processing techniques to detect a step pattern of change in multitemporal SAR images,” IEEE Trans. Geosci. Remote Sensing, vol. 40, pp. 853–870, Apr. 2002.
[14] C. J. Oliver and S. Quegan, Understanding SAR Images. Boston, MA: Artech House, 1998.
[15] M. Abramowitz and I. A. Stegun, Handbook of Mathematical Functions. New York: Dover, 1964.
[16] P. Lombardo and C. J. Oliver, “Optimum detection and segmentation of oil-slicks using polarimetric SAR data,” Proc. Inst. Elect. Eng. Radar, Sonar Navigat., vol. 147, no. 6, pp. 309–321, Dec. 2000.
[17] P. Gamba and B. Houshmand, “An efficient neural classification chain for optical and SAR urban images,” Int. J. Remote Sens., vol. 22, no. 8, pp. 1535–1553, May 2001.
[18] G. Amici, F. Dell’Acqua, P. Gamba, and G. Pulina, “Fuzzy, neural and neuro-fuzzy classification of pre- and post-event SAR images for flood monitoring and disaster mitigation,” in Proc. Conf. Analysis of Multi-Temporal Remote Sensing Images, L. Bruzzone and P. Smits, Eds. Trento, Italy, Sept. 13–14, 2001, pp. 100–107.
[19] P. Gamba and F. Dell’Acqua, “Improved multiband urban classification using a neurofuzzy classifier,” Int. J. Remote Sens., vol. 24, no. 4, pp. 827–834, Feb. 2003.

Tiziana Macrì Pellizzeri (S’03) was born in Rome, Italy, in December 1975. She graduated with distinction in communication engineering from the University of Rome “La Sapienza,” in July 2000. She is currently pursuing the Ph.D. degree in remote sensing at the University of Rome “La Sapienza.” Her research activity includes multiparametric image processing, in particular the fusion of SAR and optical images, and multifrequency polarimetric image processing. In 2000 and 2001, she was involved in a national research project on the fusion of SAR and optical images funded by the Italian Ministry of University and Research. Ms. Macrì Pellizzeri is a member of the IEEE Geoscience and Remote Sensing Society Data Fusion Committee. She was the winner of the Student Paper Prize Competition at the 2001 IEEE Radar Conference (Atlanta, GA) in May 2001.


Paolo Gamba (S’91–M’93–SM’00) received the laurea (cum laude) and the Ph.D. degrees in electronic engineering from the University of Pavia, Pavia, Italy, in 1989 and 1993, respectively. He is currently an Associate Professor of telecommunications at the University of Pavia. From 1992 to 1994, he was a Research and Development Engineer in the Microwave Laboratory of Siemens Telecomunicazioni, Cassina de’ Pecchi, Milano, Italy. In 1994, he joined the Department of Electronics, University of Pavia, first as a Teaching Assistant, later as an Assistant Professor, and now as an Associate Professor. Since 1997, he has been teaching radiocommunications systems, electrical communications, and remote sensing image processing. He has been involved in a number of projects funded by the European Community, the Italian Ministry for Scientific Research, the Italian Space Agency (ASI), and the Italian Research Council (CNR). From 1998 to 2001, he acted as a Consultant for the Radar Science and Technology Section, Jet Propulsion Laboratory (JPL), Pasadena, CA, concerning SAR over urban areas. He is the Guest Editor of a special issue of the ISPRS Journal of Photogrammetry and Remote Sensing on “Algorithm and Techniques for Multi-Source Data Fusion in Urban Areas” and a special issue of the International Journal of Information Fusion on “Fusion of Urban Remotely Sensed Features.” His current research interests include remote sensing data processing for urban applications, especially SAR urban analysis, multitemporal/polarization/frequency and multispectral data classification by neural and fuzzy classification tools, satellite image interpretation for civil protection purposes, weather radar, and meteorological satellite data interpretation. Dr. Gamba was the recipient (first place) of the 1999 ESRI Award for Best Scientific Paper in Geographic Information Systems. He was also awarded a Fulbright grant for the academic year 1999/2000. He was the Organizer and Technical Chair of the IEEE/ISPRS Joint Workshop on “Remote Sensing and Data Fusion over Urban Areas,” whose first edition was held in Rome, Italy, in November 2001, and whose second is in Berlin, Germany, in May 2003. He is a Guest Editor for this special issue of the IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING on “Urban Remote Sensing by Satellites.” He is currently Chair of Technical Committee 7 “Pattern Recognition in Remote Sensing” of the International Association for Pattern Recognition (IAPR) and is a member of the Data Fusion Committee of the IEEE Geoscience and Remote Sensing Society.

Pierfrancesco Lombardo (S’93–M’95) received the laurea degree in electronic engineering and the Ph.D. degree from the University of Rome “La Sapienza,” Rome, Italy, in 1991 and 1995, respectively. In 1994, he was a Research Associate at the University of Birmingham, Birmingham, U.K., while also with the SAR team of the Defense Research Agency, Malvern, U.K. He was involved in research on space–time adaptive processing for AEW at Syracuse University, Syracuse, NY, where he was a Research Associate in 1996. In June 1996, he joined the University of Rome “La Sapienza,” where he has been an Associate Professor since 1998. He is involved in research projects funded by the Italian Space Agency on multiparametric SAR image processing, and in projects on advanced radar detection, data fusion, and radiolocalization. He is also Co-Investigator of the radar sounding instruments for the space exploration missions Rosetta (ESA/ASI) and Mars Express (ASI/NASA). His research has been reported in over 80 publications in international technical journals and conferences. Dr. Lombardo served on the paper selection committee of the IEEE Radar Conference 2001 (Atlanta, GA) and as Technical Chairman of the IEEE/ISPRS Joint Workshop on Remote Sensing and Data Fusion over Urban Areas (Rome 2001). He is Associate Editor for Radar Systems for the IEEE TRANSACTIONS ON AEROSPACE AND ELECTRONIC SYSTEMS. In 2001, he received the Barry Carlton Award for the best paper published in the IEEE TRANSACTIONS ON AEROSPACE AND ELECTRONIC SYSTEMS.

Fabio Dell’Acqua (M’01) was born in Pavia, Italy, on March 28, 1971. He received the first-class honors and Ph.D. degrees in electronics engineering from the University of Pavia, Pavia, Italy. He passed the qualification exam for practicing the profession of engineer in November 1996. He is currently working on processing and interpretation of satellite images. From 1996 to 1999, he attended a Ph.D. course in electronics and computer science engineering at the University of Pavia, investigating shape analysis techniques to process meteorological images. From September to December 1998, he was a Visiting Researcher at Colorado State University, Fort Collins, where he studied weather satellite imagery analysis. In the first half of 2000, he worked on analysis and reconstruction of range data (EU TMR—CAMERA) as a Research Associate at the Vision Laboratory, University of Edinburgh, Edinburgh, U.K. From July 2000 to November 2001, he was a Post Doctorate at the University of Pavia, where in December 2001, he received a permanent position as an Assistant Professor. His fields of interest include shape analysis, retrieval of images from archives, range data analysis, neural networks, and remote sensing.