IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 49, NO. 12, DECEMBER 2011
Data Fusion of Different Spatial Resolution Remote Sensing Images Applied to Forest-Type Mapping

Pieter Kempeneers, Fernando Sedano, Lucia Seebach, Peter Strobl, and Jesús San-Miguel-Ayanz
Abstract—A data fusion method for land cover (LC) classification is proposed that combines remote sensing data at fine and coarse spatial resolutions. It is a two-step approach, based on the assumption that some of the LC classes can be merged into a more generalized LC class. Step one creates a generalized LC map, using only the information available at the fine spatial resolution. In the second step, a new classifier refines each generalized LC class into distinct subclasses, using the generalized LC map as a mask. This classifier uses all image information (bands) available at both fine and coarse spatial resolutions. We followed a simple data fusion technique by stacking the individual image bands into a multidimensional vector. The advantage of the proposed approach is that the spatial detail of the generalized LC classes is retained in the final LC map. The method has been designed for operational LC mapping over large areas. This paper shows that the proposed data fusion approach increased the robustness of forest-type mapping within Europe. Robustness is particularly important when creating continental LC maps at fine spatial resolution. Such maps are becoming more popular now that remote sensing data at fine resolution are easier to access.

Index Terms—Data fusion, forest types, land cover (LC) classification.
I. INTRODUCTION
Until recently, remote-sensing-based land cover (LC) mapping of large areas has been performed using coarse spatial resolution data (250 m–1 km [1]–[3]). Yet, fine spatial resolution LC maps have clear advantages: Complex LC patterns are finely resolved, and map outputs can be validated more properly than coarse-resolution maps [4]. As public access to fine spatial resolution satellite imagery improves, detailed LC mapping for large areas becomes feasible [5]. The national LC database, based on Landsat 5 [Thematic Mapper (TM)] and Landsat 7 [Enhanced Thematic Mapper Plus (ETM+)] imagery, provides a full-scale LC map of the United States with 16 classes at a resolution of 30 m [6], [7]. In Europe, there is the pan-European forest/nonforest map for the year 2000, also based on Landsat (ETM+) imagery [8], and the Corine LC map (CLC) [9]. A similar approach was used for a forest-type map in Sweden [10].

Manuscript received May 30, 2010; revised October 29, 2010 and January 27, 2011; accepted May 22, 2011. Date of publication July 28, 2011; date of current version November 23, 2011.

P. Kempeneers, F. Sedano, P. Strobl, and J. San-Miguel-Ayanz are with the Institute for Environment and Sustainability, Joint Research Centre, European Commission, LMNH-FOREST (TP 261), 21027 Ispra, Italy (e-mail: [email protected]; [email protected]; [email protected]; [email protected]).

L. Seebach is with the Forest and Landscape Department, University of Copenhagen, 1958 Copenhagen, Denmark (e-mail: [email protected]).

Digital Object Identifier 10.1109/TGRS.2011.2158548
The objective of this paper was to develop an operational method for creating a fine spatial resolution LC map at continental scale (e.g., pan-European). The method had to be automatic and robust to suboptimal input data. Rejecting all but optimal input data would result in an incomplete LC map, since not all scenes are acquired under optimal conditions, due to vegetation phenology (seasonal vegetation growth and senescence over time), low sun angles, or atmospheric distortion. In some cases, scenes must be used from a sensor that is less suited for the classification at hand, in order to create LC products at continental scale. As an example, since May 2003, all Landsat 7 ETM+ acquisitions have suffered from a scan line corrector malfunction (SLC-off), which causes along-scan no-data gaps affecting 22% of each acquisition [11]. Operational sensors with a similar spatial resolution exist, but their spectral coverage mostly does not match. The fine spatial resolution images for our study were acquired with the Linear Imaging Self-Scanner (LISS-3) sensor on board the Indian Remote Sensing satellite (IRS-P6). Unlike Landsat TM or ETM+, LISS-3 has only one band in the shortwave infrared (SWIR) and no thermal bands. This can have a negative impact on the classification accuracy, as shown in [12]. The spectral coverage is even more limited for the new series of satellites within the Disaster Monitoring Constellation, as there is no SWIR band at all [only green, red, and near infrared (NIR)] [13]. One way to improve robustness is to combine information from different sensors (multisensor data fusion). Acquiring another data set at fine spatial resolution with full coverage is, in most cases, not realistic. Coarse spatial resolution satellite imagery at 250 m or more, however, is widely available.
Within this project, the challenge was to combine fine and coarse spatial resolution sensors without losing the advantage of the fine spatial resolution in the final LC map. The proposed data fusion method deals with this issue. On the other hand, combining sensors with different spatial resolutions offers a unique opportunity: Coarse spatial resolution sensors often provide information that is complementary to fine spatial resolution sensors. MODIS, the Moderate Resolution Imaging Spectroradiometer, for example, provides more spectral information as the spatial resolution gets coarser; there are two spectral bands at 250 m, five at 500 m, and 29 at 1000 m [14]. The spatial resolution of the LC map we aim for is 25 m. We therefore used the two MODIS bands at the finest spatial resolution (250 m): red and NIR. However, both spectral bands are already covered by the sensor we used at finer spatial resolution. Hence, the coarse spatial resolution data did not provide any extra spectral information. Therefore, instead of a single image, we selected a composite product (Vegetation Indices 16-Day
L3 Global 250 m (MOD13Q1) [15]) for each month in 2006. We thus obtained a multitemporal image with 12 bands. Instead of extra spectral bands, the complementary information was the temporal aspect of the spectral reflectance. This can describe phenology, which is a potential indicator for LC types [16], [17]. The main problem with data fusion from sensors at different spatial resolutions is that spatial resolution is lost in the final LC map. The solution was found in the hierarchical structure that exists in the legend of many LC maps, suggesting a cascade of two different classifiers. In step one, the classifier creates a generalized LC map. In step two, a new classifier further refines some of the generalized LC classes, using the generalized LC map as a mask. The idea is that, as the classes are refined, the complexity of the classification increases. At this point, the classifier can benefit most from the added information obtained from data fusion. The generalized LC classes should be easier to classify accurately using only the information at fine spatial resolution. For those classes, the LC map will therefore retain the finest spatial resolution possible. As an example, the application in our paper was the classification of a forest-type map with four classes: broadleaved forest, coniferous forest, nonforest, and inland water. The generalized LC classes consisted of nonforested land, inland water bodies, and forest. Broadleaved and coniferous forests were introduced by refining the forest class. In this case, spatial detail was most interesting for the generalized LC classes. Using fine spatial resolution image data, small inland water bodies were still detected. In addition, spatial detail was of considerable importance for forest and nonforested (NF) lands.
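The cascade described above can be made concrete with a short sketch. This is an illustrative Python version on toy arrays, not the authors' C++ implementation: the threshold rules in `classify_generalized` and `refine_forest` are hypothetical stand-ins for the trained classifiers, and all band indices and thresholds are assumptions.

```python
import numpy as np

# Toy class labels for the sketch.
FOREST, NONFOREST, WATER = 0, 1, 2
BROADLEAVED, CONIFEROUS = 10, 11

def classify_generalized(X):
    """Step one: generalized classes from the fine-resolution bands X only.
    X has shape (rows, cols, 4); a threshold on band 4 stands in for the
    trained classifier (hypothetical thresholds)."""
    labels = np.full(X.shape[:2], NONFOREST)
    labels[X[..., 3] < 0.2] = FOREST
    labels[X[..., 3] < 0.05] = WATER
    return labels

def refine_forest(X, Y, generalized):
    """Step two: only pixels inside the forest mask are reclassified,
    using the 16-dimensional stacked vector Z = [x1..x4, y1..y12]."""
    Z = np.concatenate([X, Y], axis=-1)        # raw data fusion by stacking
    refined = generalized.copy()
    forest = generalized == FOREST             # mask: only forest is refined
    # toy refinement rule: mean multitemporal NIR separates the two types
    nir_mean = Z[..., 4:].mean(axis=-1)
    refined[forest & (nir_mean >= 0.3)] = BROADLEAVED
    refined[forest & (nir_mean < 0.3)] = CONIFEROUS
    return refined
```

Note how the nonforest and water pixels pass through step two untouched, which is how the fine spatial detail of the generalized classes survives the fusion with coarse data.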
Within the forest areas, there is a less defined boundary between individual forest types, whereas forest boundaries can be crisp due to forest management (forest plantation, harvesting practices, etc.) and natural disturbances (fire and storms). To be able to detect these boundaries, a fine spatial resolution forest/nonforest map was preferred. The natural gradient that exists between forest types within a forest area not only complicates the classification but also makes the definition of edges between the discrete classes less meaningful. The main contribution of this paper is a method for using multisensor data fusion for LC classification when the sensors have different spatial resolutions. The method has been designed for creating fine spatial resolution LC maps at continental scale. A second contribution is a technique for dealing with training data over large areas. At a local or regional scale, collecting data in the field is a common practice to create a training set for a supervised classifier. At continental or global scale, it becomes too expensive to create a training sample that represents the entire coverage. If a large database of ground reference data is already available, the sampling unit is often too small to match the coarse pixels in the input image. As a consequence, the spectral reflectance of the pixel will be affected by LC types other than the one labeled in the field. These mixed pixels impede the training process. On the other hand, large-scale LC maps do already exist. Even though they might not meet the requirements of a final LC map for some particular user (e.g., outdated, poor accuracy, or too coarse spatial resolution), there are several advantages of using them for training. Apart from the abundance of available
Fig. 1. Selected test scenes across the European continent: (MD) Mediterranean dry area, (MH) Mediterranean humid area, (FR) fragmented landscape, (AR) Alpine region, and (BR) Boreal region.
training pixels, the LC map can be adjusted to the scale that corresponds to the remote sensing input image. We present an implementation for selecting a training set that is based on a spatially distributed random subsample of the existing LC map. The proposed methods are applied to create a forest-type map at 25-m resolution. We used the CLC map, which has a minimum mapping unit (MMU) of 25 ha, as a basis for training. It is shown that data fusion increased robustness, while the proposed method preserved most of the fine spatial detail in the obtained LC map. When comparing it to the existing CLC, the following has been observed: 1) the accuracy with respect to reference data is improved, and 2) spatial detail is increased.

II. STUDY AREA AND DATA

The proposed method has been applied to five test scenes, selected across the European continent (Fig. 1). The selection criterion was to address the different challenges of forest-type mapping toward a wall-to-wall European product. Each scene covers 141 km × 141 km, corresponding to a LISS-3 footprint. The four LC classes (broadleaved and coniferous forests, nonforested land, and water) are represented in all the scenes, but the distribution of the two forest types differs from scene to scene. One scene with a more balanced distribution is located in the Mediterranean area, in central-western Spain. The tree species diversity within this scene is due to the altitude gradient. In addition, the area has tree cover of various densities, due to its long history of agroforestry (dehesas). Another scene was selected in the south of Spain, also in the Mediterranean area. It represents a dry area with sclerophyllous vegetation. Trees are mostly broadleaved but evergreen and thus could easily be confounded with conifers. The third scene represents a highly fragmented landscape within the Atlantic
TABLE I: REMOTE SENSING DATA
region in the southwest of France. It covers agricultural fields and fragmented forests. There is a mixture of tree species, dominated by broadleaved forests with some scattered conifer plantations. A fourth scene is located in the Alpine region. Different illumination conditions due to topographic shadows increased the complexity of the forest mapping task in this scene, where the forests are primarily coniferous. The last scene is located in the Boreal region, in the south of Sweden. Conifers are in the majority. The main challenges for the image classifier here were peat bogs and other wooded land, because they are easily confused with forested land due to their similar spectral signatures.

A. Fine Spatial Resolution Remote Sensing Data

Fine spatial resolution images were acquired in 2006 from the medium-resolution LISS-3 sensor on board the IRS-P6. This sensor has four spectral bands: three in the visible and NIR and one in the SWIR (see Table I). The scenes used in this work were provided by Euromap and preprocessed by the Deutsches Zentrum für Luft- und Raumfahrt (DLR). The scenes were orthorectified and geometrically corrected using ground control points and a digital elevation model (DEM). The orthoimages obtained have 25-m resolution and were projected to the Ellipsoidal Coordinate Reference System/Lambert Azimuthal Equal Area (ETRS89/LAEA), the standard projection for Europe [18]. The reported root mean square errors in both horizontal directions were less than a pixel. The images were only available as top-of-atmosphere (TOA) radiances (not atmospherically corrected).

B. Coarse Spatial Resolution Remote Sensing Data

We selected MOD13Q1, the 16-day composites at 250-m spatial resolution from MODIS, for the coarse spatial resolution information. They were downloaded from the Land Processes Distributed Active Archive Center at the U.S. Geological Survey. The 16-day composite images at 250-m spatial resolution reduce the volume of the data while still retaining the temporal variability of the LC signal [17]. The MOD13Q1 products contain red and NIR surface reflectances, adjusted for bidirectional effects. In addition, they provide two vegetation indices [normalized difference vegetation index (NDVI) and enhanced vegetation index (EVI)]. We downloaded 12 MOD13Q1 composites, one for each month in 2006. We reprojected and resampled them to a grid common with the fine spatial resolution scenes (25 m), using a cubic convolution interpolation method. Four different time series, covering a 16-day period of each month in 2006, were thus obtained: two surface reflectances (red and NIR) and two vegetation indices (NDVI and EVI). Their performance was compared with respect to the classification accuracy of the final LC map. Time series based on the NIR surface reflectance performed best, although the differences were small. For the sake of simplicity, those results are not presented here; all results in this paper are based on the NIR time series.

C. Training Data
Consistent and accurate training data that cover a large area are difficult to obtain. One of the few European LC data sets that are both homogeneous and publicly available is the CLC map. It is available for the reference years 1990, 2000, and 2006. The CLC classifies the European territory into 44 LC classes. It is based on a visual interpretation of 25-m spatial resolution Earth observation satellite images [Satellite Pour l'Observation de la Terre (SPOT) High Resolution Visible (HRV) and LISS-3], the same data set from which the LISS-3 scenes for this case study were selected. All 27 countries of the European Union (EU27) are covered, in addition to the following: Albania, Bosnia-Herzegovina, Serbia, Croatia, the former Yugoslav Republic of Macedonia, and Liechtenstein. There are three forest classes, based on species dominance: broadleaved, coniferous, and mixed. Crown cover needs to be at least 30% for land to be labeled as forest. The CLC map is a valuable data set for training a supervised classifier. Its main limitation is the MMU, which is 25 ha. This corresponds to 400 pixels at 25-m resolution. As a consequence, pixels representing a smaller area are merged into a larger unit that can be of a different class. Since the CLC map has to be overlaid with medium-resolution data (250 m), pixels near the class boundaries were excluded from the training selection. This was done using a morphological filter (erosion) with a kernel size of 11 × 11 pixels (see also Section III-B).

D. Validation Data

Validation of forest maps requires first a definition of "forest." According to some national forest inventories, a forest area does not always have to be covered by trees (unstocked forest), and tree cover does not always imply forest (fruit trees and public parks). In this paper, we concentrated on LC, defining forested land by its tree crown cover.
Two different reference data sets were available for validating the proposed method on forest-type mapping. The first was obtained from visual interpretation of very high (10 m or better) spatial resolution images available through the Google Earth virtual globe. It enabled us to assess the four thematic classes of the forest-type map in all scenes. To facilitate the visual interpretation, we first created a Keyhole Markup Language file from the randomly selected validation points. It contained a circular polygon with a diameter of 25 m, centered at the location of each validation point. When overlaid in the Google Earth virtual globe, we could visualize, on top of the high spatial resolution imagery, the corresponding pixel area in the LC map. The validation point was assigned to the class with the majority
TABLE II: NUMBER OF VALIDATION POINTS OBTAINED FROM VISUAL INTERPRETATION IN EACH AREA
cover. It was labeled as forested if the crown cover within the pixel boundary was estimated to be larger than 50%. This differs from the Food and Agriculture Organization of the United Nations (FAO) definition, which requires a minimum land area of 0.5 ha with a tree crown cover of more than 10%–30%. We selected a stratified random sampling design, based on LUCAS [19], the Land Use/Cover Area frame Statistical survey. Within the first phase of LUCAS, a data set is built that consists of the intersection points of a 2-km grid over the European continent (EU27). The points are available in seven strata, based on the interpretation of orthophotos. The main focus of LUCAS was the agricultural domain, but it also covers other LC types: arable land (stratum 1), permanent crops (stratum 2), grassland (stratum 3), wooded areas and shrubland (stratum 4), bare land and rare vegetation (stratum 5), artificial land (stratum 6), and water (stratum 7). Since wooded areas and shrubland were merged into a single stratum (stratum 4), the strata were not suitable as such for the validation process in this study. We therefore visually interpreted all validation points and retained only those points for which Google Earth imagery was available at a spatial resolution of 10 m or better. Validation points for water were only selected from inland waters. The sample sizes for nonforested land (NF), broadleaved forest (BF), coniferous forest (CF), and water (W) are shown per scene in Table II. The second reference data set was derived from the Spanish National Forest Inventory (NFI) data, which are based on field visits. It was available only for the scenes in Spain. Because its sampling design focused on forest, this second reference data set was used to validate the forest types (broadleaved and coniferous forests) within the forested land. NFI data contain very detailed forest information and are a valuable source of information for the validation of remote sensing maps.
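The majority-rule labeling used for the reference data (dominant class within the plot; forested only above the 50% crown-cover threshold) can be sketched as follows. The function name and its dict-based interface are our own illustrative choices, not part of the published workflow.

```python
def majority_label(cover_fractions, forest_classes=("BF", "CF"),
                   crown_threshold=0.5):
    """Assign a validation plot to the class with the largest cover
    fraction; the plot counts as forested only if the total tree crown
    cover exceeds the threshold (50% in this study, stricter than the
    FAO 10%-30% criterion).
    cover_fractions: dict mapping class label -> cover fraction in the plot.
    """
    crown = sum(f for c, f in cover_fractions.items() if c in forest_classes)
    if crown > crown_threshold:
        # forested plot: pick the dominant forest type
        return max(forest_classes, key=lambda c: cover_fractions.get(c, 0.0))
    # nonforested plot: pick the dominant nonforest class
    nonforest = {c: f for c, f in cover_fractions.items()
                 if c not in forest_classes}
    return max(nonforest, key=nonforest.get) if nonforest else "NF"
```

For example, a plot with 40% broadleaved, 20% coniferous, and 40% nonforest cover has 60% total crown cover and is labeled BF, even though NF ties BF as the single largest class.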
The field inventory data from the Spanish NFI were available to further assess the strength of the method. Field data are taken from field sample plots, located over the "forest area" by means of a Universal Transverse Mercator 1 × 1 km² grid [20]. Each forest plot contains basic forest structure information, such as the dominant species, from which the forest types were derived (majority rule).

III. METHODS

A general overview of the method is given in Fig. 2, showing a two-step approach. In step one, we create a generalized LC map K. Suppose there are C generalized LC classes: Ω_K = {ω_1, . . . , ω_C}. K can then be written as the union of the masks K_i, one for each single class ω_i: K = ∪_{i=1}^{C} K_i. Only the fine spatial resolution image X is used so far. The
pixels X(x, y) can contain any type of fine spatial resolution information: panchromatic, multispectral, contextual (texture), and DEM or some other elevation information. In our paper, we used the multispectral information from the LISS-3 sensor (X(x, y) = [x_1, . . . , x_4]^T). The second step involves a data fusion process with the coarse spatial resolution information (image Y). Here, we used the multitemporal information corresponding to the 12 NIR bands in the MOD13Q1 composites (Y(x, y) = [y_1, . . . , y_12]^T). Each generalized LC class ω_i is then further refined into C_i distinct classes (Ω_i = {ω_i1, . . . , ω_iC_i}), obtaining an LC map L_i. In our example, the generalized LC class is "forest," which is refined into broadleaved and coniferous forests (C_1 = 2). The classifier that refines class ω_i only operates on the subset Z_i of the image, by masking out the pixels of all classes ω_j ≠ ω_i (mask K \ K_i = ∪_{j≠i} K_j). The final LC map is obtained by merging all refined LC maps: L = ∪_{i=1}^{C} L_i. The important steps are subsequently described in more detail.

A. Data Fusion

The first step in multisensor data fusion is data alignment, i.e., the observations from the individual sensors must be processed to a form that is suitable for subsequent processing. The coarse spatial resolution image Y must therefore be resampled to the same grid as the fine spatial resolution image X. Based on a common grid, the resampled coarse spatial resolution image Y can then be combined (or fused) with X. There are several techniques for combining the information from X and Y: before or at the decision level of the classifier. One technique is decision fusion. This involves fusion of sensor information after a preliminary decision (classification) has been made based on a single source of information only (either X(x, y) or Y(x, y)).
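Decision fusion as just described, i.e., combining preliminary per-sensor classifications, can be sketched with a weighted vote. This is a generic illustration of the voting variant only (Bayesian and Dempster–Shafer combination are not shown), and it is not the technique adopted in this paper, which fuses the data before the decision level.

```python
from collections import Counter

def decision_fusion(votes, weights=None):
    """Combine preliminary class decisions, one per sensor/classifier,
    by an (optionally weighted) majority vote."""
    if weights is None:
        weights = [1.0] * len(votes)
    score = Counter()
    for label, weight in zip(votes, weights):
        score[label] += weight
    # the class with the highest accumulated weight wins
    return score.most_common(1)[0][0]
```

For instance, two fine-resolution votes for "forest" outweigh one coarse-resolution vote for "water" under equal weights; unequal weights can encode differing confidence in the sensors.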
The final class decision can be based on a weighted decision method (voting techniques), classical inference, Bayesian inference, or Dempster–Shafer's method [21]. We chose a simple data fusion technique before the decision level. By stacking (concatenating) the vectors X(x, y) and Y(x, y), a new vector Z(x, y) is obtained, which is then fed into the second classifier that refines the LC classes. This is also known as raw data fusion. The combination of spectral and multitemporal information in our case study resulted in the 16-dimensional vector Z(x, y) = [x_1, . . . , x_4, y_1, . . . , y_12]^T. Alternatively, feature-level fusion involves a feature extraction step [22] before stacking. A multidimensional feature vector, Z(x, y) = [z_1, . . . , z_16]^T, is then obtained that can be used as input to the classifier. In this case, the image bands are typically projected to a feature space by using some transformation (e.g., principal component analysis or the wavelet transform). Regardless of the data fusion technique, the spatial detail of an LC map can only decrease by introducing coarse spatial information in the classification process. The advantage of the two-step approach is that all generalized LC classes for which only fine spatial resolution information was used keep their spatial detail. Even after refining an LC class ω_i into C_i subclasses, the spatial detail of the generalized LC classes remains intact. By masking out all other classes, they are not involved
Fig. 2. Schematic overview of the two-step approach for forest-type mapping.
in the classification. For the refined LC classes, we are willing to trade off some of the spatial detail for the extra robustness that the data fusion can offer. The classification of the refined LC classes is performed separately for every class ω_i, each with a different mask K \ K_i. The images X and Y are not restricted to optical remote sensing data. The proposed approach can also be applied to topographic, radar, and LiDAR information. High-density LiDAR data are very efficient, for example, for creating a fine spatial resolution forest/nonforest classification. To refine these classes, the data can then be fused in step two with spectral information from an optical sensor that has a coarser spatial resolution. As an example, the spectral information could help to refine "nonforest" into land and water, and "forest" into forest types or species.

B. Training

The morphological filter in Fig. 3 excludes all "boundary" pixels from the potential training data set, i.e., pixels within some distance of a class boundary. The distance is matched to the spatial resolution of the coarsest remote sensing input image (e.g., half the pixel size of the coarse spatial resolution image). We used a kernel size of 11 × 11 pixels,
because the LISS-3 images have a spatial resolution that is ten times finer than that of the MOD13Q1 products. This does not mean that the resulting training set in our case study contained only large homogeneous regions. Because of the MMU of 25 ha in the CLC, patches smaller than this are merged into a different class. Therefore, some of the pixels selected for a class can actually correspond to a different ground cover. The next step is to use a nearest neighbor interpolation to match the ground sampling distance of the coarse spatial resolution image. This avoids using the same coarse spatial resolution information more than once. The selection process for training then iterates through all pixels of this reduced LC map. Pixels are randomly selected for each class, ignoring cloudy pixels. The random selection is based on a function that produces a uniform random number between zero and one. Our implementation was based on the rand() function that is part of the C++ standard library. A pixel is selected for training class ω_i if the random output is smaller than the proportion p(ω_i). If not, the pixel is skipped, and the selection process proceeds with the next pixel in the reduced LC map. The proportion p(ω_i) is calculated as the number of desired training pixels of class ω_i divided by the total number of pixels of class ω_i in the reduced LC map (counting only cloud-free pixels).
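The training-pixel selection just described — erosion of class-boundary pixels with an 11 × 11 kernel, nearest-neighbor reduction to the coarse grid, and per-class random selection with probability p(ω_i) — can be sketched as follows. This is an illustrative numpy version (the erosion is written out directly, and cloud masking is omitted), not the C++ implementation used by the authors.

```python
import numpy as np

def erode_class_mask(labels, cls, k=11):
    """Keep only pixels of class `cls` whose full k x k neighbourhood is
    also `cls` (morphological erosion: drops class-boundary pixels)."""
    m = (labels == cls).astype(np.uint8)
    r = k // 2
    pad = np.pad(m, r, mode="constant")
    out = np.ones_like(m)
    for dy in range(k):
        for dx in range(k):
            out &= pad[dy:dy + m.shape[0], dx:dx + m.shape[1]]
    return out.astype(bool)

def select_training(labels, cls, n_wanted, coarse_factor=10, rng=None):
    """Erode, subsample to the coarse grid (nearest neighbour, one pixel
    per coarse cell), then keep each remaining pixel with probability
    p = n_wanted / n_available, as in the paper's selection scheme."""
    rng = rng or np.random.default_rng(0)
    core = erode_class_mask(labels, cls)
    reduced = core[::coarse_factor, ::coarse_factor]
    ys, xs = np.nonzero(reduced)
    if len(ys) == 0:
        return []
    p = min(1.0, n_wanted / len(ys))
    keep = rng.random(len(ys)) < p
    # map the kept coarse-grid positions back to fine-grid coordinates
    return list(zip(ys[keep] * coarse_factor, xs[keep] * coarse_factor))
```

Because p(ω_i) is a per-class constant, every eligible pixel of a class has the same selection probability, which is what makes the sample spatially distributed over the whole scene.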
Fig. 3. Flow chart of the training scheme. Input is an existing LC map from which training data are randomly selected.
Given that the proportion p(ω_i) is based on the total number of pixels of class ω_i, all potential areas in the existing LC map are equally likely to be selected. The selected training sample is therefore likely to represent the cloud-free scene, which reduces spatial correlation. To train the classifier in step one for our case study, the training samples for broadleaved and coniferous forests were merged into a single training class "forest." The training sample for "nonforested land" also merged several CLC classes, namely, Agricultural areas, Scrub and/or herbaceous associations, and Wetlands. Mixed forest, which is part of the CLC, was not retained as a class label in the forest-type map. The CLC does not define mixed forest at the 25-m pixel size but at the MMU of 25 ha. The individual pixels in our forest map are comparable to a single crown area, and the definition of mixed forest at the 25-m pixel level is therefore questionable. Some kind of spatial aggregation is needed to introduce mixed forest into the forest-type map, either before (image segmentation) or after the classification process (aggregation of forested land pixels). In this paper, only a separation between pure forest types was sought, leaving the aggregation step needed to obtain mixed forest for future study. In our case, a separate classifier was trained for each image scene. That enabled us to use the LISS-3 images without any radiometric normalization. Using an existing LC map such as the CLC for training makes separate classifiers feasible, because enough training data are available.

C. Supervised Classification

The objective of this study was to create an operational method for a fine-resolution LC map at continental scale.
A classifier had to be selected that could meet this objective. We selected a multilayer perceptron artificial neural network (ANN), using backpropagation for training [23]. ANNs have been shown to combine two excellent classification properties: high accuracy [24], [25] and robustness to training site heterogeneity [26]. This was confirmed in a study on mapping LC modifications over large areas [27]. The ANN appeared to be robust and accurate for automated large-area change monitoring. The study concluded that it performed equally well across the diverse study areas with minimal human intervention in the classification process. This matched well with our requirements for an operational method. Also important is that an ANN, once trained, is very fast. In relation to data fusion, ANNs have the nice property that they deal well with integrated data from different sources [28]. A condition is that the data are properly scaled [26]. In our case, the multispectral information (bands 1–4) had values ranging from 0 to 100, whereas the multitemporal information (bands 5–16) ranged from 0 to 10 000. Without rescaling, bands 5–16 would quickly saturate the ANN nodes. Saturated nodes produce constant output values corresponding to the maximum of their dynamic range, causing the learning process to stop. Therefore, all input values had to be rescaled to match the range of the activation function [29]. The implemented ANN architecture was a fully connected backpropagation neural network with two hidden layers. The number of input nodes in the ANN for the classifiers in steps one and two was equal to the dimension of the respective input image data. The output nodes in steps one and two corresponded to the number of generalized and refined LC classes, respectively. It is difficult to rely on any heuristics regarding the number of hidden layers and nodes of an ANN. A trial-and-error strategy is therefore frequently applied, as was done in this study.
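The rescaling step can be sketched as a per-band division by the native dynamic range, so that every input falls within the range of the activation function. The [0, 1] target range and the function name are illustrative assumptions; the paper only states that inputs were rescaled to match the activation range.

```python
import numpy as np

def rescale_bands(Z, native_max):
    """Divide each band by its native maximum so all inputs lie in [0, 1]
    and no single band saturates the ANN nodes.
    Z: (n_samples, 16) stacked input vectors.
    native_max: per-band maxima, e.g. [100] * 4 for the LISS-3 bands and
    [10000] * 12 for the monthly MODIS NIR bands (assumed ranges)."""
    return Z / np.asarray(native_max, dtype=float)
```

Without this step, the bands with the 0–10 000 range would dominate the weighted sums at the hidden nodes and drive the activations into saturation.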
Consistent results were obtained with two hidden layers (ten and five nodes), based on a combination of visual inspection of the classification maps and cross-validation on the available training data (CLC). Finally, bootstrap aggregation was introduced. Bootstrap aggregation, or bagging [30], generates an ensemble of individual classifiers by bootstrap sampling the training data set. The classification result can be obtained from the majority vote over the individual classifier outputs. Bagging can minimize the sensitivity of the classification algorithm to noise in feature data and to labeling errors in training data. This is particularly useful in case an existing LC map with poor or unknown accuracy is used for training. In this respect, bagging has been shown to be more accurate than boosting [31], another successful ensemble classifier technique. We used three bootstrap aggregations, with each bootstrap sample consisting of a random selection of 50% of the training samples (with replacement). The methods presented in Figs. 2 and 3 were implemented in C++. For the ANN, a free open source library, Fast Artificial Neural Network [32], was used. The programs can run in a grid computing environment (Linux operating system, 36 nodes available). A single node with four CPUs of 1.56 GHz and 5 GB of memory (RAM) classified a generalized LC map based on one scene (approximately 7000 × 8000 pixels in the European LAEA ETRS89 [18] projection) in approximately
KEMPENEERS et al.: DATA FUSION OF REMOTE SENSING IMAGES APPLIED TO FOREST-TYPE MAPPING
4983
Fig. 4. Effect of the two-step approach in the generalized LC map. Left image shows band 1 of the fine spatial resolution image. The generalized LC map (dark is forest; light is nonforest) is shown (center) without and (right) with the two-step approach.
TABLE III C ONFUSION M ATRIX BASED ON V ISUAL I NTERPRETATION (R EFERENCE DATA IN ROWS ): κ = 0.71 AND OVERALL ACCURACY I S 88%
11 min. This included three bootstrap aggregations, each with different training sets (sample size of 25 000). For the refined LC map, the time needed was only half of that, because the training sample size was smaller (5000 for broadleaved forest and 5000 for coniferous forest), and nonforest was masked out for classification. IV. R ESULTS AND D ISCUSSION A. Classification Accuracy The overall accuracy of the generalized LC map (three classes: nonforested land, forest, and water) was 88%. The classifier used the multispectral information at fine spatial resolution (bands 1–4) only. Adding the multitemporal information at coarse spatial resolution (bands 5–16) for creating the forest mask could have helped in distinguishing the larger forest areas, but forest patches, roads, clear cuts, and water bodies smaller than the medium-resolution pixel size would be missed. This is illustrated for a small extract of 7 km by 7 km of a forest area in the Alps (Fig. 4), where two forest masks are shown. The first (left image) was created using bands 1–4 only. The second (right image) was created including bands 5–16. Most of the forest area (in dark) is captured by both forest masks. The first forest mask better preserves the detailed information. This can be seen from the road in Fig. 4, which is less than 50 m wide and cuts the forest from north to south. The forested land was further refined into forest types, resulting in a final LC map with four classes: nonforested land (NF), broadleaved forest (BF), coniferous forest (CF), and water (W). The overall accuracy was 87%, based on the first reference data set (visual interpretation). Another measure for classification accuracy is the kappa coefficient [33]. It was calculated as 0.71 and takes into account the expected agreements (correct
classification by chance). A confusion matrix is shown in Table III. Rows represent the reference data. Classification results (in columns) were produced by the classifiers in the two-step approach discussed in Section III-A. User's and producer's accuracies are shown in the last two rows.

TABLE IV
CONFUSION MATRIX BASED ON NFI (REFERENCE DATA IN ROWS): κ = 0.75 AND OVERALL ACCURACY IS 88%

The NFI of Spain, the second reference data set, allowed us to intercompare the validation results in the combined Mediterranean scenes. Based on the first reference data set (visual interpretation), the overall accuracies of the forest mask and the forest-type map for these scenes were 93% and 92%, respectively. The NFI data did not include nonforested land and water and thus did not allow us to validate the forest mask. Broadleaved and coniferous forests were derived from the dominant species in each plot. The calculated overall accuracy of the forest-type map based on the two reference data sets was almost identical (87% and 88%). Given that the NFI is the most extensive and accepted source of information for forest resource assessment at the national level, this is an important result. The rows of the confusion matrix in Table IV show the two forest types derived from the NFI data. The columns represent the classification result. The difficulties were different for the two scenes MH and MD. In the dry area (MD), most errors occurred in the coniferous forest, which was the minority forest type there. Broadleaves and conifers were more balanced in the humid area (MH). There were only a few omission errors for coniferous forest; instead, most errors were due to misclassification of broadleaved forest as coniferous forest.

B. Benefits of Data Fusion

If the acquisition date of one multispectral observation is optimal, it is difficult to further improve the classification accuracy with the proposed data fusion. Classes that can be distinguished
with a time series of a single spectral band are likely separable by one multispectral observation, if acquired at the right point in time. Little added value of multitemporal information can be expected in this case. However, the optimal acquisition date, which is location and class dependent, is often not known a priori. Even if it is known, the corresponding observation might not be available. The main contribution of the proposed method is to be less dependent on the optimal acquisition date, i.e., to increase robustness. This is particularly important for large-area LC mapping. A consistent wall-to-wall LC map can then be obtained without putting too many constraints on the single-date scenes. As an example, the only cloud-free LISS-3 scene for the Boreal region in this study is dated May 5, 2006. Although this date might be well suited for forest-type classification in other regions of Europe, many broadleaved trees at this latitude (57°3′) still have few leaves on that date. Because the data fusion included multitemporal data, we were nevertheless able to classify the broadleaved forest. More similar cases can be expected when producing a wall-to-wall forest-type map.

Fig. 5. TOC reflectance based on MODIS in NIR for broadleaved and coniferous forests. The optimal acquisition window for forest-type discrimination for the scene in the Boreal region is between June and July.

To show the potential of the proposed data fusion, we selected two random samples from the classification result (with data fusion) in the Boreal region. One sample was selected from the pixels classified as coniferous forest (5000 pixels), and the same amount was selected from broadleaved forest. We then calculated the mean and standard deviation of the corresponding reflectance in NIR for the two samples. This was done for each month in the MODIS multitemporal data. The result is shown in Fig. 5. The two curves represent the mean top of canopy (TOC) reflectance (NIR) of the corresponding pixels (nearest neighbor) in the MODIS image. In May, the mean reflectance in NIR of pixels in broadleaved forest is only slightly higher than for pixels in coniferous forest.
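The per-class sampling and monthly statistics just described can be sketched as follows. Array names and shapes are illustrative, assuming the monthly MODIS NIR values of each classified pixel are stacked along the last axis.

```python
import numpy as np

def sample_class_stats(nir, labels, class_id, n=5000, seed=0):
    """Draw a random sample of pixels classified as `class_id` and return the
    per-month mean and standard deviation of their NIR reflectance.
    `nir` has shape (n_pixels, n_months); `labels` holds the classified
    forest type of each pixel."""
    rng = np.random.default_rng(seed)
    pool = np.flatnonzero(labels == class_id)          # pixels of this class
    idx = rng.choice(pool, size=min(n, pool.size), replace=False)
    sample = nir[idx]
    return sample.mean(axis=0), sample.std(axis=0)
```

Plotting the two mean curves with the standard deviations as error bars, one per forest type, reproduces the kind of comparison shown in Fig. 5.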
Both classes largely overlap, as indicated by the error bars representing the standard deviations. Including other spectral bands is of little added value, as shown in Fig. 6. Again, the mean reflectances are shown for broadleaved and coniferous forests, using the same random sampling, but now selected from the LISS-3 scene in May. Although the mean reflectances in the SWIR seem more distinct than in the NIR, there is still a large overlap. Ideally, the multispectral image should have been acquired in June or July (Fig. 5).

Fig. 6. Spectral overlap between the TOA reflectances of broadleaved and coniferous forests in the Boreal region, acquired by the LISS-3 sensor in May.

The Boreal scene in May was classified once with and once without data fusion. The classification result in Fig. 7 shows that data fusion improves the classification of scenes acquired in suboptimal conditions. We compared the result to the original LC map used for training (CLC). The comparison shows that the classification result without data fusion largely underrepresents broadleaved forest. With data fusion, the distribution of forest types was similar to that of the CLC. The updated map showed more spatial detail in the LC types (due to the 25-ha MMU of the CLC). The classification accuracy of the original CLC was also lower (overall accuracy of 85% and kappa of 0.68). We expect the actual accuracy of the CLC to be even slightly lower, because mixed pixels were left out. Mixed pixels are present in the CLC [see Fig. 7(a)] but not in the reference data. They are more difficult to classify because their reflectances potentially overlap with the spectral signatures of pure broadleaved and coniferous forests.

V. CONCLUSION

An operational method for LC mapping has been proposed, introducing a new data fusion method. It combines remote sensing data at fine and coarse spatial resolutions. The method was designed for fine spatial resolution LC maps at the continental scale, where acquisition conditions are likely to be suboptimal for some scenes. By introducing a new information source, the robustness of the classifier can be increased. Even though the spatial resolution of the new information source is coarser, the proposed approach can still retain much of the fine spatial detail in the final LC map. A case study on forest-type mapping in Europe showed promising results. Multitemporal data at 250-m spatial resolution (MODIS) were fused with multispectral data at 25-m spatial resolution (LISS-3).
The input data at different spatial resolutions can be extended with meteorological and digital elevation data, and other LC maps could also benefit from the presented method. Future work consists of applying the method to more than 3000 scenes for a wall-to-wall pan-European forest-type map. National inventory data of different countries will also be included in the validation process.

Fig. 7. Forest-type maps (a) using the CLC map for training, (b) without data fusion, and (c) with data fusion.
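As background for the validation reported in Section IV, the overall accuracy and kappa coefficient [33] can be computed from a confusion matrix (reference data in rows, classification results in columns). A minimal sketch:

```python
import numpy as np

def accuracy_measures(cm):
    """Overall accuracy and kappa coefficient from a confusion matrix
    (reference data in rows, classification results in columns)."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    p_observed = np.trace(cm) / n
    # expected agreement by chance, from the row and column marginals
    p_chance = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2
    kappa = (p_observed - p_chance) / (1.0 - p_chance)
    return p_observed, kappa
```

The kappa coefficient discounts the agreement expected by chance, which is why it is lower than the overall accuracy for the same matrix.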
REFERENCES

[1] M. Hansen, R. DeFries, J. Townshend, and R. Sohlberg, “Global land cover classification at 1 km spatial resolution using a classification tree approach,” Int. J. Remote Sens., vol. 21, no. 6/7, pp. 1331–1364, 2000.
[2] M. Friedl, D. McIver, J. Hodges, X. Zhang, D. Muchoney, A. Strahler, C. Woodcock, S. Gopal, A. Schneider, A. Cooper, A. Baccini, F. Gao, and C. Schaaf, “Global land cover mapping from MODIS: Algorithms and early results,” Remote Sens. Environ., vol. 83, no. 1/2, pp. 287–302, Nov. 2002.
[3] E. Bartholomé and A. S. Belward, “GLC2000: A new approach to global land cover mapping from Earth observation data,” Int. J. Remote Sens., vol. 26, no. 9, pp. 1959–1977, May 2005.
[4] P. Mayaux and E. Lambin, “Estimation of tropical forest area from coarse spatial resolution data: A two-step correction function for proportional errors due to spatial aggregation,” Remote Sens. Environ., vol. 53, no. 1, pp. 1–15, Jul. 1995.
[5] J. Cihlar, “Land cover mapping of large areas from satellites: Status and research priorities,” Int. J. Remote Sens., vol. 21, no. 6/7, pp. 1093–1114, 2000.
[6] C. Homer, J. Dewitz, J. Fry, M. Coan, N. Hossain, C. Larson, N. Herold, A. McKerrow, J. VanDriel, and J. Wickham, “Completion of the 2001 national land cover database for the conterminous United States,” Photogramm. Eng. Remote Sens., vol. 73, no. 4, pp. 337–341, Apr. 2007.
[7] G. Xian, C. Homer, and J. Fry, “Updating the 2001 national land cover database land cover classification to 2006 by using Landsat imagery change detection methods,” Remote Sens. Environ., vol. 113, no. 6, pp. 1133–1147, 2009.
[8] A. Pekkarinen, L. Reithmaier, and P. Strobl, “Pan-European forest/nonforest mapping with Landsat ETM+ and CORINE Land Cover 2000 data,” ISPRS J. Photogramm. Remote Sens., vol. 64, no. 2, pp. 171–183, Mar. 2009.
[9] M. Bossard, J. Feranec, and J. Otahel, “CORINE land cover technical guide—Addendum 2000,” Eur. Environ. Agency, Copenhagen, Denmark, Tech. Rep. 40, May 2000.
[10] O. Hagner and H. Reese, “A method for calibrated maximum likelihood classification of forest types,” Remote Sens. Environ., vol. 110, no. 4, pp. 438–444, Oct. 2007.
[11] T. Arvidson, S. Goward, J. Gasch, and D. Williams, “Landsat-7 long-term acquisition plan: Development and validation,” Photogramm. Eng. Remote Sens., vol. 72, no. 10, pp. 1137–1146, Oct. 2006.
[12] J. Gao, “A comparative study on spatial and spectral resolutions of satellite data in mapping mangrove forests,” Int. J. Remote Sens., vol. 20, no. 14, pp. 2823–2833, 1999.
[13] G. Chander, S. Saunier, M. Choate, and P. Scaramuzza, “SSTL UK-DMC SLIM-6 data quality assessment,” IEEE Trans. Geosci. Remote Sens., vol. 47, no. 7, pp. 2380–2391, Jul. 2009.
[14] C. O. Justice, E. Vermote, J. R. G. Townshend, R. Defries, D. P. Roy, D. K. Hall, V. V. Salomonson, J. L. Privette, G. Riggs, A. Strahler, W. Lucht, R. B. Myneni, Y. Knyazikhin, S. W. Running, R. R. Nemani, Z. Wan, A. R. Huete, W. van Leeuwen, R. E. Wolfe, L. Giglio, J. Muller, P. Lewis, and M. J. Barnsley, “The Moderate Resolution Imaging Spectroradiometer (MODIS): Land remote sensing for global change research,” IEEE Trans. Geosci. Remote Sens., vol. 36, no. 4, pp. 1228–1249, Jul. 1998.
[15] A. Huete, K. Didan, T. Miura, E. Rodriguez, X. Gao, L. Ferreira et al., “Overview of the radiometric and biophysical performance of the MODIS vegetation indices,” Remote Sens. Environ., vol. 83, no. 1/2, pp. 195–213, Nov. 2002.
[16] R. DeFries and J. Townshend, “NDVI-derived land cover classifications at a global scale,” Int. J. Remote Sens., vol. 15, no. 17, pp. 3567–3586, 1994.
[17] M. Hansen, J. Townshend, R. DeFries, and M. Carroll, “Estimation of tree cover using MODIS data at global, continental and regional/local scales,” Int. J. Remote Sens., vol. 26, no. 19, pp. 4359–4380, 2005.
[18] A. Annoni, C. Luzet, E. Gubler, and J. Ihde, “Map projections for Europe,” Inst. Environ. Sustainability, Ispra, Italy, EUR 20120 EN, 2003.
[19] P. Jacques and F. Gallego, The LUCAS 2006 Project—A New Methodology. [Online]. Available: http://mars.jrc.it/mars/content/download/567/4122/file/Lucas%20new%20me%thodology.pdf
[20] E. Tomppo, T. Gschwantner, M. Lawrence, and R. E. McRoberts, Eds., National Forest Inventories: Pathways for Common Reporting. Heidelberg, Germany: Springer-Verlag, 2010.
[21] D. Hall and S. McMullen, Mathematical Techniques in Multisensor Data Fusion. Norwood, MA: Artech House, 2004.
[22] R. Duda, P. Hart, and D. Stork, Pattern Classification, 2nd ed. New York: Wiley, 2001.
[23] D. Rumelhart and J. McClelland, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1: Foundations. Cambridge, MA: MIT Press, 1986.
[24] M. Chini, F. Pacifici, W. Emery, N. Pierdicca, and F. Del Frate, “Comparing statistical and neural network methods applied to very high resolution satellite images showing changes in man-made structures at Rocky Flats,” IEEE Trans. Geosci. Remote Sens., vol. 46, no. 6, pp. 1812–1821, Jun. 2008.
[25] G. Licciardi, F. Pacifici, D. Tuia, S. Prasad, T. West, F. Giacco, C. Thiel, J. Inglada, E. Christophe, J. Chanussot, and P. Gamba, “Decision fusion for the classification of hyperspectral data: Outcome of the 2008 GRS-S data fusion contest,” IEEE Trans. Geosci. Remote Sens., vol. 47, no. 11, pp. 3857–3865, Nov. 2009.
[26] J. Paola and R. Schowengerdt, “A review and analysis of backpropagation neural networks for classification of remotely-sensed multi-spectral imagery,” Int. J. Remote Sens., vol. 16, no. 16, pp. 3033–3058, 1995.
[27] J. Rogan, J. Franklin, D. Stow, J. Miller, C. Woodcock, and D. Roberts, “Mapping land-cover modifications over large areas: A comparison of machine learning algorithms,” Remote Sens. Environ., vol. 112, no. 5, pp. 2272–2283, May 2008.
[28] J. Benediktsson, P. Swain, and O. Ersoy, “Neural network approaches versus statistical methods in classification of multisource remote sensing data,” IEEE Trans. Geosci. Remote Sens., vol. 28, no. 4, pp. 540–552, Jul. 1990.
[29] B. Widrow and M. Lehr, “Thirty years of adaptive neural networks: Perceptron, madaline and backpropagation,” Proc. IEEE, vol. 78, no. 9, pp. 1415–1442, Sep. 1990.
[30] L. Breiman, “Bagging predictors,” Mach. Learn., vol. 24, no. 2, pp. 123–140, 1996.
[31] Y. Freund, “Boosting a weak learning algorithm by majority,” Inf. Comput., vol. 121, no. 2, pp. 256–285, Sep. 1995.
[32] S. Nissen and E. Nemerson, Fast Artificial Neural Network Library (FANN). [Online]. Available: http://fann.sourceforge.net
[33] R. G. Congalton, “A review of assessing the accuracy of classifications of remotely sensed data,” Remote Sens. Environ., vol. 37, no. 1, pp. 35–46, Jul. 1991.
Pieter Kempeneers received the M.S. degree in electronic engineering from Ghent University, Ghent, Belgium, in 1994 and the Ph.D. degree in physics from Antwerp University, Antwerp, Belgium, in 2007. He was a Researcher with the Department of Telecommunications and Information Processing, Ghent University, and with Siemens (private industry mobile communication systems). In 1999, he was with the Centre for Remote Sensing and Earth Observation Processes (TAP), Flemish Institute for Technological Research (VITO), as a Scientist. Since 2008, he has been a Scientist with the Joint Research Centre, European Commission, Ispra, Italy. His research focus is on image processing, pattern recognition, and multi- and hyperspectral image analysis.
Fernando Sedano received the Ph.D. degree from the Department of Environmental Science, Policy and Management, University of California, Berkeley, in 2008. He was a National Aeronautics and Space Administration Earth System Science Fellow in 2005–2008. From 1998 to 2003, he was a Forest Consultant with Stora Enso Forest Consulting in Finland and Mozambique. Since November 2008, he has been a Postdoctoral Researcher with the Institute for Environment and Sustainability, Joint Research Centre, European Commission, Ispra, Italy. His research interests include the development of remote sensing applications for the study of forest processes and land cover dynamics.
Lucia Seebach received the Diploma degree (equivalent to M.S. degree) in geoecology from the University of Bayreuth, Bayreuth, Germany, in September 2003. After her graduation, she was a Scientific Officer with the Joint Research Centre, European Commission, Ispra, Italy. She is currently a Ph.D. Fellow with the Forest and Landscape Department, University of Copenhagen, Copenhagen, Denmark. Her main areas of scientific interest are monitoring and modeling of forest resources, uncertainty analysis, and assessment of applicability of remote-sensing-derived maps.
Peter Strobl received the M.Sc. degree in geophysics from the University of Munich, Munich, Germany, in 1991 and the Ph.D. degree in geosciences from the University of Potsdam, Potsdam, Germany, in 2000. Between 1991 and 2004, he was a Research Scientist with the German Aerospace Centre (DLR), Germany, the Joint Research Centre (JRC), European Commission, Ispra, Italy, and the University of Munich. Since 2004, he has been with the Institute for Environment and Sustainability, JRC, as a Scientific Officer. He has 20 years of remote sensing experience, during which he has been involved in preprocessing, calibration, sensor design and operations, and various use cases building on statistical analysis of multisensor and multitemporal data sets. His current focus is on data quality and standardization issues in conjunction with large-area applications such as pan-European forest mapping.
Jesús San-Miguel-Ayanz received the B.S. degree in forest engineering from the Polytechnic University of Madrid, Madrid, Spain, in 1987 and the M.Sc. and Ph.D. degrees in remote sensing and geographic information systems from the University of California, Berkeley, in 1989 and 1993, respectively. He was with the University of Cordoba, Cordoba, Spain, where he was an Assistant Professor from 1994 to 1995 and an Associate Professor of forest inventory, forest mensuration, and remote sensing from 1995 to 1997. He was also a Research Fellow with the European Space Agency, Noordwijk, The Netherlands, from 1993 to 1994, and with the University of California from 1989 to 1993. He is a Senior Researcher with the Institute for Environment and Sustainability, Joint Research Centre, European Commission, Ispra, Italy, in the field of forestry. He is also a Leader of the FOREST Team.