IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, VOL. 12, NO. 12, DECEMBER 2015
Fine Land Cover Classification Using Daily Synthetic Landsat-Like Images at 15-m Resolution

Bin Chen, Bo Huang, and Bing Xu
Abstract—There is currently no unified remote sensing system available that can simultaneously produce images with fine spatial, temporal, and spectral resolutions. This letter proposes a unified spatiotemporal–spectral blending model that uses Landsat Enhanced Thematic Mapper Plus and Moderate Resolution Imaging Spectroradiometer images to predict synthetic daily Landsat-like data at a 15-m resolution. The results of tests using both simulated and actual data over the Poyang Lake Nature Reserve show that the model can accurately capture the general trend of changes over the predicted period and can enhance the spatial resolution of the data while preserving the original spectral information. The proposed model is also applied to improve land cover classification accuracy. The application in Wuhan, Hubei Province, shows that the overall classification accuracy is markedly improved. With the integration of dense temporal characteristics, the user and producer accuracies for land cover types are also improved.

Index Terms—Improved adaptive intensity–hue–saturation (IAIHS), land cover classification, spatiotemporal–spectral fusion, spatial and temporal adaptive reflectance fusion model (STARFM).

Manuscript received December 30, 2014; revised April 14, 2015 and June 12, 2015; accepted June 17, 2015. Date of publication October 14, 2015; date of current version November 11, 2015. This work was supported in part by the Ministry of Science and Technology of China under National Research Programs under Grant 2012AA12A407, Grant 2012CB955501, and Grant 2013AA122003 and in part by the National Natural Science Foundation of China under Grant 41271099 and Grant 41371417. (Corresponding author: Bing Xu.)

B. Chen is with Global Change and Earth System Science, Beijing Normal University, Beijing 100875, China (e-mail: [email protected]).

B. Huang is with the Department of Geography and Resource Management, The Chinese University of Hong Kong, Shatin, Hong Kong (e-mail: [email protected]).

B. Xu is with the Ministry of Education Key Laboratory for Earth System Modelling, Center for Earth System Science, Tsinghua University, Beijing 100084, China (e-mail: [email protected]).

Digital Object Identifier 10.1109/LGRS.2015.2453999
I. INTRODUCTION
For remotely sensed data, spatial resolution is the ground-surface extent captured by one pixel, temporal resolution denotes the revisit cycle of a sensor, and spectral resolution is the electromagnetic bandwidth of the spectral signatures captured by the onboard sensor. However, given the tradeoff among spatial, temporal, and spectral resolutions, no unified sensor so far can produce images that are simultaneously fine in all three. For example, the Landsat Enhanced Thematic Mapper Plus (ETM+) and Satellite Pour l'Observation de la Terre (SPOT)-5 cannot produce remotely sensed data with both high spatial and spectral resolutions.
Instead, they provide a panchromatic (Pan) image with relatively high spatial resolution (e.g., 15 m for Landsat ETM+ and 2.5–5 m for SPOT-5) but low spectral resolution, and multispectral (MS) images with a lower spatial resolution (30 m for Landsat ETM+ and 10 m for SPOT-5). Both have proven highly useful in monitoring ecosystem dynamics [1], [2] and in land cover change detection [3], [4]. However, the long revisit cycle (16 days for the Landsat series and 26 days for the SPOT series) and frequent cloud contamination [5] make it difficult to acquire continuous remotely sensed images over the same regions and have limited their use in monitoring phenology disturbance and rapid change detection [6]. The Advanced Very High Resolution Radiometer, SPOT-Vegetation, and the Moderate Resolution Imaging Spectroradiometer (MODIS) provide daily revisit data, but their relatively coarse spatial resolution (250 m to 1 km) limits their application in quantitative monitoring and accurate remote sensing mapping. Thus, an image fusion technique that blends the characteristics of multiple sensors to generate synthetic data with fine resolutions would be a cost-efficient solution of great interest within the remote sensing community [7].

Remotely sensed data fusion can be generally divided into two major groups: spatial–spectral fusion and spatial–temporal fusion [8]. A number of image fusion algorithms addressing these two domains have been proposed in recent decades. Spatiospectral fusion, or pan-sharpening, aims to blend a lower spatial resolution MS image with a higher spatial resolution Pan image to obtain an MS image with a spatial resolution as high as that of the original Pan image. The methods developed in this field fall into several basic categories: arithmetic combination methods, projection–substitution methods, Amélioration de la Résolution Spatiale par Injection de Structures methods, and model-based methods [9]–[11]. The key challenge for these methods is controlling spectral distortion while preserving spatial details, and a number of improved algorithms have recently been proposed to reduce spectral distortion [11]–[13]. Meanwhile, spatiotemporal fusion aims to enhance spatial resolution and temporal frequency simultaneously and includes the following: 1) the spatial and temporal adaptive reflectance fusion model (STARFM) [6], the enhanced STARFM (ESTARFM) [12], and the customized ESTARFM [7]; 2) the spatial temporal adaptive algorithm for mapping reflectance change [13]; 3) a semiphysical approach using a bidirectional reflectance distribution function spectral model for Landsat gap filling and relative radiometric normalization [14];
and 4) the sparse representation-based spatiotemporal reflectance fusion model [15]. Although considerable advances have been achieved in each of these areas, there has been limited work addressing their integration.

In this letter, we propose a unified spatiotemporal–spectral fusion model that predicts daily synthetic Landsat-like data with a 15-m spatial resolution. The model is implemented in two stages. First, the spatial resolution of the Landsat ETM+ multispectral images is enhanced based on an improved adaptive intensity–hue–saturation (IAIHS) method. Second, the MODIS and enhanced Landsat ETM+ data are fused using STARFM to generate the final synthetic predictions.

Fig. 1. Flowchart of the proposed fusion approach integrating (a) spatial–spectral details and (b) spatiotemporal information.
Fig. 2. Fusion results of the simulated experiments. (a) Downscaled MS image of Landsat ETM+ with a 60-m spatial resolution. (b) Downscaled Pan image with a 30-m spatial resolution. (c) Fused image using (a) and (b), based on the method described in Section II-A. (d) NDVI difference during the prediction period. (e) and (f) Upscaled MODIS images with a 30-m spatial resolution on February 20, 2003 and March 8, 2003, respectively. (g) Final synthetic fusion image on March 8, 2003 with a 30-m spatial resolution. (h) Corresponding actual observation.
II. PROPOSED METHOD

A flowchart integrating the spatial–spectral details and the spatiotemporal information is presented in Fig. 1. All the retrieved Landsat ETM+ and MODIS data were preprocessed to maintain the same image extent and to improve the image quality. First, the ETM+ panchromatic band was used to sharpen the multispectral imagery, producing a synthetic multispectral image with a spatial resolution as high as that of the original panchromatic band. Then, the MODIS data acquired at the prior and predicted dates were fused with the enhanced ETM+ image via spatiotemporal fusion to predict the final synthetic product at fine resolutions.

A. Integration of Spatial and Spectral Details

For this stage, we employed the IAIHS method [16] to fuse the Pan and MS images. This method was developed from the adaptive IHS proposed by Rahmani et al. [17]. The main idea of the IAIHS is to transfer the edges of both the Pan and MS images to the fused image using a weighting function; more details can be found in [16] and [18].

B. Fusion of Spatial and Temporal Information

After the integration of spatial and spectral details, we obtained a synthetic MS image with a spatial resolution as high as that of the Pan image, which subsequently underwent spatiotemporal fusion. Here, we chose STARFM because of: 1) its minimal input requirements; 2) its open-source software and simple workflow; and 3) its wide use in the remote sensing community [19]. STARFM predicts the surface reflectance using a combined weight function that incorporates spectral information from both the fine- and coarse-resolution data; details of STARFM can be found in [6]. A simplified sketch of the two stages is given below.
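To make the two-stage pipeline concrete, the following Python/NumPy sketch illustrates the core idea of each stage under strong simplifications. It is not the authors' implementation: the IAIHS edge-adaptive weighting function of [16] and STARFM's moving-window search for spectrally similar neighbors [6] are both reduced to their essence, and all function names, array layouts, and parameters are illustrative.

```python
import numpy as np

def iaihs_sharpen_sketch(ms, pan, eps=1e-6):
    """Simplified IAIHS-style pan-sharpening.

    ms  : (bands, H, W) MS image already resampled to the Pan grid.
    pan : (H, W) panchromatic band covering the MS bandwidths.
    Injects the Pan detail (Pan minus an intensity component) into each
    band, modulated by an edge map so that spatial detail is transferred
    mainly along edges, which is the IAIHS idea.
    """
    intensity = ms.mean(axis=0)          # crude intensity component
    detail = pan - intensity             # spatial detail to inject
    gy, gx = np.gradient(pan)
    edge_w = 1.0 - np.exp(-(gx**2 + gy**2) / (np.var(pan) + eps))
    return np.stack([band + edge_w * (band / (intensity + eps)) * detail
                     for band in ms])

def starfm_predict_sketch(fine_t0, coarse_t0, coarse_tp):
    """Minimal STARFM-like temporal prediction (no search window).

    fine_t0              : sharpened fine image at the base date.
    coarse_t0, coarse_tp : MODIS images resampled to the fine grid at
                           the base and prediction dates.
    Predicts the fine image at the prediction date by adding the
    coarse-scale temporal change, STARFM's central assumption.
    """
    return fine_t0 + (coarse_tp - coarse_t0)
```

The full STARFM additionally weights multiple candidate pixels within a moving window by their spectral, temporal, and spatial distances, which makes the prediction far more robust in heterogeneous landscapes than the one-line update above.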
Fig. 3. Comparison of the fusion results from the actual experiments. (a) Final synthetic image on March 8, 2003 with a 15-m spatial resolution, fused using the actual Landsat ETM+ MS and Pan images. (b)–(d) Comparison between the observation and the predictions in a subregion of our study area, where (b) is the prediction from the actual experiments, (c) is the prediction from the simulated experiments, and (d) is the actual Landsat ETM+ image.
III. EXPERIMENTAL RESULTS AND ANALYSIS

To test the performance of the proposed blending framework, we applied it to both simulated and actually observed Landsat ETM+ and MODIS data. Landsat ETM+ provides a Pan image at a spatial resolution of 15 m and MS images (here, we consider the green, red, and NIR bands, whose bandwidths are covered by the Pan band) at a spatial resolution of 30 m, downloaded from http://earthexplorer.usgs.gov/. The MOD09GA product provides daily multispectral images at a spatial resolution of 500 m; the corresponding MODIS bands were downloaded from http://reverb.echo.nasa.gov/. For our experiments, we selected part of the Poyang Lake Nature Reserve as the testing region, whereas a study site in the southern part of Hubei Province, China, was chosen for the fine land cover classification.

A. Experiments With Simulated Data

The proposed method was first tested with simulated data following the Wald protocol [20], which helps to assess its accuracy and reliability.
TABLE I. Assessment of Fusion Results in Figs. 2 and 3
TABLE II. Number of ROIs and Pixels in Each Land Cover Type for Training and Validating Accuracy
Fig. 4. Flowchart of fine land cover classification using daily synthetic Landsat-like images at 15-m resolution.

Fig. 5. Comparison of land cover classification results using (left) the original Landsat data and (right) the composited Landsat data with temporal features at a 15-m spatial resolution.
According to this protocol, the original Pan and MS images of Landsat ETM+ were first degraded by a factor of 2 (a minimal sketch of this degradation step follows this subsection). The degraded images were then fused to enhance the spatial resolution of the MS images. In the second step, the MODIS images were upsampled to the same spatial resolution as the enhanced MS image, and the MODIS and enhanced Landsat ETM+ images were fused by STARFM to generate the final synthetic data for the prediction date. The original Landsat ETM+ MS image for the prediction date thus served as the reference image.

A visual comparison between the spatiospectral fusion result [see Fig. 2(c)] and the two original images [see Fig. 2(a) and (b)] shows that the fusion preserved the spectral information of the original MS image while integrating the spatial details of the Pan image well. Moreover, the normalized difference vegetation index (NDVI) difference image [see Fig. 2(d)] indicates a significant phenology change from February 20, 2003 to March 8, 2003, and the final fusion result still captures the general change information during the prediction period despite the large difference in spatial resolution between Landsat and MODIS.
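As an illustration of the Wald-protocol setup, the degradation by a factor of 2 can be sketched with simple block averaging. This `degrade` helper is hypothetical, not the authors' preprocessing code; a Gaussian prefilter matched to the sensor's point spread function would be a common alternative.

```python
import numpy as np

def degrade(img, factor=2):
    """Degrade an (H, W) image by block-averaging with the given factor.

    H and W are assumed to be multiples of `factor`; the result has
    shape (H // factor, W // factor).
    """
    h, w = img.shape
    return img.reshape(h // factor, factor,
                       w // factor, factor).mean(axis=(1, 3))

# Wald protocol: fuse the degraded 30-m Pan and 60-m MS images, then
# compare the fused product against the original 30-m MS reference.
# pan_15m, ms_30m = ...                               # original observations
# pan_30m = degrade(pan_15m)                          # 15 m -> 30 m
# ms_60m = np.stack([degrade(b) for b in ms_30m])     # 30 m -> 60 m
```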
B. Experiments With Observed Data

For the actual experiments, the original Pan and MS images were fused, and the enhanced MS image was then blended with two MODIS images to produce the synthetic prediction for the target date. The spectral characteristics of the two synthetic fusions [see Fig. 3(b) and (c)] are quite similar to those of the actual observation [see Fig. 3(d)], which indicates that the procedure suppresses spectral distortion while capturing change information. Meanwhile, compared with the actual image, the spatial details are clearly enhanced by our unified fusion procedure.

C. Quality Assessment

To assess the unified fusion performance quantitatively, we used several evaluation indexes, sketched in code below. In the simulated experiments, the correlation coefficient (CC), the absolute average difference (AAD), and the root-mean-square error (RMSE) were calculated for each band, and the Erreur Relative Globale Adimensionnelle de Synthèse (ERGAS) [20] was computed to evaluate the overall fusion result. In the real experiments, because actual reference images were lacking, we primarily used the upscaled Landsat ETM+ image for the target date as a substitute to check the global spatial and spectral details of the predicted MS image. From the five quantitative indexes in Table I, we conclude that all three selected bands achieved fine prediction accuracy; however, the prediction accuracy of the visible bands was clearly better than that of the near-infrared band.
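As a reference for these indexes, a minimal NumPy sketch follows. The ERGAS definition uses the standard form from [20]; the function and variable names are ours, not from the letter.

```python
import numpy as np

def cc(pred, ref):
    """Pearson correlation coefficient between two bands."""
    return np.corrcoef(pred.ravel(), ref.ravel())[0, 1]

def aad(pred, ref):
    """Absolute average difference."""
    return np.abs(pred - ref).mean()

def rmse(pred, ref):
    """Root-mean-square error."""
    return np.sqrt(((pred - ref) ** 2).mean())

def ergas(pred_bands, ref_bands, ratio):
    """ERGAS [20]: 100 * (h/l) * sqrt(mean_k (RMSE_k / mu_k)^2).

    pred_bands, ref_bands : (bands, H, W) arrays.
    ratio                 : ratio h/l of the fused pixel size to the
                            original MS pixel size (1/2 here, since the
                            MS resolution is sharpened by a factor of 2).
    """
    terms = [(rmse(p, r) / r.mean()) ** 2
             for p, r in zip(pred_bands, ref_bands)]
    return 100.0 * ratio * np.sqrt(np.mean(terms))
```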
TABLE III. Confusion Matrix for Land Cover Type Classification
D. Applications With a Fine Land Cover Classification

A fine land cover classification requires remotely sensed data with fine resolutions and, in particular, the integration of dense time-series data, which provide temporal change features that can significantly improve the classification accuracy. Jia et al. [21] conducted a 30-m land cover classification experiment with the help of STARFM. Here, we have extended their work to a finer classification that makes full use of the available satellite images. Fig. 4 provides the flowchart of fine land cover classification using the synthetic fusions produced by our proposed framework.

1) EVI Calculation From Fused Time-Series Landsat-Like Data: The enhanced vegetation index (EVI) was extracted from the time-series synthetic fine-resolution fusions according to

EVI = 2.5 × (ρ_nir − ρ_red) / (ρ_nir + 6.0 · ρ_red − 7.5 · ρ_blue + 1)    (1)
We chose EVI for a number of reasons. First, EVI is a composite index of the blue, red, and near-infrared bands, which correspond to the synthetic bands predicted by our method. Second, EVI is more sensitive than NDVI in regions with a high leaf area index (LAI), particularly when the LAI is greater than 2. Moreover, selecting specific features is an effective strategy for redundancy reduction and computational efficiency [22]. Thus, the maximum and minimum values of the synthetic time-series EVI data were extracted for integration with the synthetic Landsat data. The composited Landsat data consist of both the spectral features of the original Landsat data at a finer spatial resolution and the temporal features extracted from the synthetic time-series EVI data at a fine resolution, providing additional supporting information for the subsequent classification (see the sketch below).
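The EVI extraction of (1) and the min/max temporal compositing can be sketched as follows; the band order and the reflectance scaling are assumptions about the data layout, not details given in the letter.

```python
import numpy as np

def evi(blue, red, nir):
    """EVI from surface reflectance bands, following (1).

    Assumes reflectance scaled to [0, 1]; integer-scaled products
    (e.g., reflectance * 10000) must be rescaled first.
    """
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)

def evi_min_max(stack):
    """Temporal EVI features from a (T, 3, H, W) time series.

    The band order (blue, red, nir) within each date is an assumption.
    Returns two (H, W) layers, the per-pixel minimum and maximum EVI,
    which are appended to the Landsat spectral bands for classification.
    """
    series = np.stack([evi(t[0], t[1], t[2]) for t in stack])  # (T, H, W)
    return series.min(axis=0), series.max(axis=0)
```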
2) Supervised Classification Using Maximum Likelihood: Prior knowledge of the general characteristics of the land cover type distribution in a given region is a prerequisite for designating the land cover types and their number. Six classification types were identified: impervious, water, bare land, arable land, shrubs (grass), and forest. All of the regions of interest (ROIs) used for training samples and for validating the land cover classification accuracy were processed in the ENVI 4.8 software through visual interpretation of Landsat ETM+ images and Google Earth images. All of the selected samples were easily identified, with each category in a homogeneous region. The number of ROIs for each type is summarized in Table II. We chose the maximum likelihood (ML) method as our classifier because of its effectiveness in identifying land cover type clusters and in assigning pixels to classes through probability calculations.

3) Classification Accuracy Assessment: For our study, the land cover classification results using the observed Landsat data and the composited Landsat data with temporal features (i.e., EVI) extracted from the synthetic time-series Landsat-like data were compared. Quantitative criteria, including producer accuracy, user accuracy, overall accuracy, and the kappa coefficient, were adopted for assessing the classification accuracy. The land cover classification maps derived from the original Landsat data upscaled to a 15-m spatial resolution and from the composited Landsat data with temporal features at a 15-m spatial resolution are shown in Fig. 5. A visual inspection showed that each land cover type could be generally identified in both maps; the shapes of water bodies were particularly apparent. However, the classification that used the composited Landsat data with temporal features captured phenology changes better than the one that used only the original Landsat data. The corresponding confusion matrices are summarized in Table III.
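To illustrate the classifier and these accuracy criteria, here is a compact sketch of a Gaussian maximum likelihood classifier together with the overall accuracy and kappa computation. It stands in for the ENVI workflow used in the letter and makes the usual ML assumption of one multivariate normal distribution per class with equal priors.

```python
import numpy as np

def train_ml(samples):
    """Fit per-class Gaussians. samples: {class_id: (n_i, d) array}."""
    return {c: (x.mean(axis=0), np.cov(x, rowvar=False))
            for c, x in samples.items()}

def classify_ml(pixels, model):
    """Assign each (n, d) pixel vector to the class with the highest
    Gaussian log-likelihood (equal priors assumed)."""
    classes = sorted(model)
    scores = []
    for c in classes:
        mu, cov = model[c]
        inv, (_, logdet) = np.linalg.inv(cov), np.linalg.slogdet(cov)
        diff = pixels - mu
        # -0.5 * (log|cov| + Mahalanobis distance squared)
        scores.append(-0.5 * (logdet +
                              np.einsum('nd,dk,nk->n', diff, inv, diff)))
    return np.array(classes)[np.argmax(scores, axis=0)]

def confusion_matrix(truth, pred, n_classes):
    """Rows index the reference labels, columns the predicted labels."""
    m = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(truth, pred):
        m[t, p] += 1
    return m

def overall_accuracy_and_kappa(m):
    """Overall accuracy and Cohen's kappa from a confusion matrix."""
    n = m.sum()
    po = np.trace(m) / n                                # observed agreement
    pe = (m.sum(axis=0) * m.sum(axis=1)).sum() / n**2   # chance agreement
    return po, (po - pe) / (1.0 - pe)
```

For the letter's setup, `pixels` would hold the composited feature vectors (spectral bands plus the EVI minimum and maximum), while `truth` and `pred` would hold the validation-ROI labels and the classification output.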
From a quantitative perspective, the result that integrated temporal features showed better classification accuracy (overall accuracy of 92.6%, kappa coefficient of 0.91) than the one using only the original Landsat spectral information (overall accuracy of 87.8%, kappa coefficient of 0.85). Moreover, both the producer and user accuracies were significantly improved relative to the spectral-only result, particularly for the impervious and bare land types.

IV. CONCLUSION

We have proposed a unified framework for spatiotemporal–spectral fusion that makes better use of available satellite data. To our knowledge, this is the first attempt to construct daily Landsat-like images with a 15-m spatial resolution. With both simulated and observed remotely sensed data, we have demonstrated that the synthetic fusion adequately preserves both spatial and spectral details while still capturing phenology change during the prediction period. A fine land cover classification using the proposed method was also performed, and the results showed that the classification accuracy was noticeably improved after combining temporal features, demonstrating the benefit of integrating spatiotemporal–spectral information in classification.

Nevertheless, some potential limitations of our proposed method should be noted. First, only the spectral bands covered by the corresponding panchromatic band are employed in the fusion framework. Second, land cover is assumed to remain unchanged across the MODIS time series. Our future research will therefore focus on constructing a hyperspectral, fine-spatial, and frequent-temporal fusion model that considers both phenology and land cover changes and produces land cover classification products of higher accuracy. Meanwhile, it is critical to maintain spectral fidelity, particularly when actual hyperspectral images are used [23], [24], and further minimizing the accumulated geolocation biases among different satellite images should also be addressed.

REFERENCES

[1] W. B. Cohen and S. N. Goward, “Landsat’s role in ecological applications of remote sensing,” BioScience, vol. 54, no. 6, pp. 535–545, Jun. 2004.
[2] J. G. Masek et al., “North American forest disturbance mapped from a decadal Landsat record,” Remote Sens. Environ., vol. 112, no. 6, pp. 2914–2926, Jun. 2008.
[3] M. A. Wulder et al., “Landsat continuity: Issues and opportunities for land cover monitoring,” Remote Sens. Environ., vol. 112, no. 3, pp. 955–969, Mar. 2008.
[4] R. Michishita, Z. Jiang, and B. Xu, “Monitoring two decades of urbanization in the Poyang Lake area, China, through spectral unmixing,” Remote Sens. Environ., vol. 117, pp. 3–18, Feb. 2012.
[5] J. Ju and D. P. Roy, “The availability of cloud-free Landsat ETM+ data over the conterminous United States and globally,” Remote Sens. Environ., vol. 112, no. 3, pp. 1196–1211, Mar. 2008.
[6] F. Gao, J. Masek, M. Schwaller, and F. Hall, “On the blending of the Landsat and MODIS surface reflectance: Predicting daily Landsat surface reflectance,” IEEE Trans. Geosci. Remote Sens., vol. 44, no. 8, pp. 2207–2218, Aug. 2006.
[7] R. Michishita, L. Chen, J. Chen, X. Zhu, and B. Xu, “Spatiotemporal reflectance blending in a wetland environment,” Int. J. Digit. Earth, vol. 8, no. 5, pp. 364–382, Mar. 2014.
[8] B. Huang, H. Zhang, H. Song, J. Wang, and C. Song, “Unified fusion of remote-sensing imagery: Generating simultaneously high-resolution synthetic spatial–temporal–spectral earth observations,” Remote Sens. Lett., vol. 4, no. 6, pp. 561–569, Jun. 2013.
[9] S. Li and B. Yang, “A new pan-sharpening method using a compressed sensing technique,” IEEE Trans. Geosci. Remote Sens., vol. 49, no. 2, pp. 738–746, Feb. 2011.
[10] L. Zhang, H. Shen, W. Gong, and H. Zhang, “Adjustable model-based fusion method for multispectral and panchromatic images,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 42, no. 6, pp. 1693–1704, Dec. 2012.
[11] W. Wang, L. Jiao, and S. Yang, “Fusion of multispectral and panchromatic images via sparse representation and local autoregressive model,” Inf. Fusion, vol. 20, pp. 73–87, Nov. 2014.
[12] X. Zhu, J. Chen, F. Gao, X. Chen, and J. G. Masek, “An enhanced spatial and temporal adaptive reflectance fusion model for complex heterogeneous regions,” Remote Sens. Environ., vol. 114, no. 11, pp. 2610–2623, Nov. 2010.
[13] T. Hilker et al., “A new data fusion model for high spatial- and temporal-resolution mapping of forest disturbance based on Landsat and MODIS,” Remote Sens. Environ., vol. 113, no. 8, pp. 1613–1627, Aug. 2009.
[14] D. P. Roy et al., “Multi-temporal MODIS–Landsat data fusion for relative radiometric normalization, gap filling, and prediction of Landsat data,” Remote Sens. Environ., vol. 112, no. 6, pp. 3112–3130, Jun. 2008.
[15] B. Huang and H. Song, “Spatiotemporal reflectance fusion via sparse representation,” IEEE Trans. Geosci. Remote Sens., vol. 50, no. 10, pp. 3707–3716, Oct. 2012.
[16] Y. Leung, J. Liu, and J. Zhang, “An improved adaptive intensity–hue–saturation method for the fusion of remote sensing images,” IEEE Geosci. Remote Sens. Lett., vol. 11, no. 5, pp. 985–989, May 2014.
[17] S. Rahmani, M. Strait, D. Merkurjev, M. Moeller, and T. Wittman, “An adaptive IHS pan-sharpening method,” IEEE Geosci. Remote Sens. Lett., vol. 7, no. 4, pp. 746–750, Oct. 2010.
[18] B. Chen and B. Xu, “A unified spatial–spectral–temporal fusion model using Landsat and MODIS imagery,” in Proc. IEEE 3rd Int. Workshop EORSA, 2014, pp. 256–260.
[19] I. V. Emelyanova, T. R. McVicar, T. G. Van Niel, L. Li, and A. I. J. M. Van Dijk, “Assessing the accuracy of blending Landsat–MODIS surface reflectances in two landscapes with contrasting spatial and temporal dynamics: A framework for algorithm selection,” Remote Sens. Environ., vol. 133, pp. 193–209, Jun. 2013.
[20] L. Wald, T. Ranchin, and M. Mangolini, “Fusion of satellite images of different spatial resolutions: Assessing the quality of resulting images,” Photogramm. Eng. Remote Sens., vol. 63, no. 6, pp. 691–699, Dec. 1997.
[21] K. Jia et al., “Land cover classification of finer resolution remote sensing data integrating temporal features from time series coarser resolution data,” ISPRS J. Photogramm. Remote Sens., vol. 93, pp. 49–55, Jul. 2014.
[22] C. Li et al., “A circa 2010 thirty meter resolution forest map for China,” Remote Sens., vol. 6, no. 6, pp. 5325–5343, Jun. 2014.
[23] B. Xu and P. Gong, “Land use/cover classification with multispectral and hyperspectral EO-1 data,” Photogramm. Eng. Remote Sens., vol. 73, no. 8, pp. 955–965, Aug. 2007.
[24] B. Xu and P. Gong, “Noise estimation in a noise-adjusted principal component transformation and hyperspectral image restoration,” Can. J. Remote Sens., vol. 34, no. 3, pp. 271–286, Jun. 2008.