Hyperspectral Data Dimensionality Reduction and the Impact of Multi-seasonal Hyperion EO-1 Imagery on Classification Accuracies of Tropical Forest Species
Manjit Saini, Binal Christian, Nikita Joshi, Dhaval Vyas, Prashanth Marpu, and N.S.R Krishnayya
Abstract
Synchronizing hyperspectral data acquisition with phenological changes in a tropical forest can generate comprehensive information for its effective management. The present study was performed to identify a suitable dimensionality reduction method for better classification and to evaluate the impact of seasonality on the classification accuracy of tropical forest cover. EO-1 Hyperion images were acquired for three different seasons: summer (April), monsoon (October), and winter (January). Spectral signatures of pure patches of Teak, Bamboo, and mixed species covers differed significantly across the three seasons, indicating a distinctive phenology for each cover. Kernel Principal Component Analysis (k-PCA) was found to be more suitable for dimensionality reduction for these covers. Among the images of the three seasons, the best classification accuracies for the three vegetation covers were achieved using k-PCA with a maximum likelihood classifier on the monsoon image, with overall accuracies of 83 to 100 percent for single species, 74 to 81 percent for two species, and 72 percent for three species.
Introduction
The Importance of Tropical Forests
Tropical forests constitute about half of the world’s forests and are extremely rich in species richness and diversity (Bradshaw et al., 2009; Gibson et al., 2011). They store 40 to 50 percent of the carbon in terrestrial vegetation and are responsible for one third of global terrestrial primary productivity (Beer et al., 2010). Over the past century tropical forests have undergone exceptional rates of change as they are destroyed by human activities and climate change (Achard et al., 2004; Féret and Asner, 2013; Morris, 2010). The global character of tropical deforestation and its consequences for climate change and biodiversity make it an important emerging global concern that increasingly transcends individual nations and their boundaries (Fuller, 2006). Tropical forest destruction is likely to continue in the future, causing an extinction crisis among tropical forest species (Bradshaw et al., 2009). Therefore, comprehensive information on the spatial distribution and composition of existing plant species is fundamental to designing effective strategies for the conservation and management of increasingly fragmented tropical forests (Gillespie et al., 2008; Rodriguez et al., 2007). Hence, strong preference has been given to acquiring updated data on vegetation cover changes regularly or annually so as to better assess the environment and ecosystem (Knight et al., 2006). Unfortunately, such information cannot be obtained exclusively from traditional survey techniques, due to logistical difficulties and the costs involved. For a subcontinent like India, surveys for mapping vegetation and other land covers using conventional techniques are too complex and demand a huge amount of human resources and time (Roy and Joshi, 2002). In contrast, forest vegetation mapping using remotely sensed observations is efficient and cost effective. Currently, hyperspectral remote sensing is fast emerging as a key technology for advanced and improved understanding, classification, modeling, and monitoring of complex forest vegetation.
The Importance of Phenological Variation in Species Discrimination
Vegetation phenology can provide a useful signal for classifying forest cover. Seasonal phenological changes are mainly caused by inter-annual climatic variability and are reflected through an increase or decrease in green biomass (Pettorelli et al., 2005). Phenological changes significantly influence spectral reflectance curves. Changes in vegetation spectral response caused by phenology can conceal long term changes in the landscape (Hobbs, 1989; Lambin and Ehrlich, 1996). An understanding of vegetation phenology is a prerequisite to inter-annual studies and predictive modeling of land surface responses to climate change (Myneni et al., 1997; Vina et al., 2004). However, the phenology of tropical forests and their interactions with environmental, climatic, and anthropogenic factors are not well understood. Synchronizing hyperspectral data acquisition with phenological changes in tropical trees is a daunting task and, at times, practically infeasible. Identifying appropriate endmembers for classifying tropical trees with diverse phenologies is therefore an important consideration.
Manjit Saini, Binal Christian, Nikita Joshi, Dhaval Vyas, and N.S.R Krishnayya are with the Ecology Laboratory, Department of Botany, Faculty of Science, The M.S. University of Baroda, Gujarat, India ([email protected]). Prashanth Marpu is with the Institute Center for Water and Environment (iWATER), Masdar Institute of Science and Technology, PO Box 54224, Abu Dhabi, United Arab Emirates.
PHOTOGRAMMETRIC ENGINEERING & REMOTE SENSING
Photogrammetric Engineering & Remote Sensing Vol. 80, No. 8, August 2014, pp. 773–784. 0099-1112/14/8007–773 © 2014 American Society for Photogrammetry and Remote Sensing doi: 10.14358/PERS.80.8.773
It is equally important to verify the suitability of endmember spectra derived from one season for classifying an image of a different season. Larger spatial heterogeneity in the tropics suggests more complex control of phenology (Maignan et al., 2008). Thus, attention to phenology is vital for the characterization and mapping of forests from spaceborne sensors. In several recent studies, phenological differences in leaf traits (Castro-Esau et al., 2006) and canopy characteristics (Castro-Esau and Kalacska, 2008) allowed discrimination of vegetation at the species level based on leaf spectral signatures. Extending these findings to canopy-scale studies in the tropics remains to be addressed.
Merits and Demerits of Hyperspectral Sensors for Vegetation Mapping
Numerous studies (Datt et al., 2003; Wu et al., 2010) have shown that the Hyperion imaging spectrometer onboard the Earth Observing One (EO-1) satellite provides significantly enhanced data over conventional multi-spectral remote sensing systems (Thenkabail et al., 2013). The ability of hyperspectral data to significantly improve the characterization, discrimination, modeling, and mapping of forested vegetation is well known (Thenkabail et al., 2011). This has led to improved modeling and mapping of forest vegetation characteristics such as biochemical-biophysical quantities (Asner and Martin, 2008; Haboudane et al., 2008; Vyas et al., 2013) and species discrimination (Cho et al., 2008; Cochrane, 2000; Vaiphasa et al., 2005). Development of baseline spectral data and tools for the characterization and detection of tree species as a function of hyperspectral characteristics was identified as a priority area of remote sensing research in tropical forests more than a decade ago (Sanchez-Azofeifa, 2003). However, the main problem with hyperspectral image processing is the huge amount of data involved. It is imperative that new methods and techniques be developed to handle these high dimensional datasets.
At the same time, it will be important to optimize future hyperspectral sensors by identifying and dropping redundant bands (Thenkabail et al., 2004). Optimizing hyperspectral sensors will help to reduce the volume and dimensionality of the datasets and make it feasible to classify vegetation in a pertinent manner.
Dimensionality Reduction Techniques and Endmember Extraction
A simple and effective way of dealing with high dimensional data is to reduce the number of dimensions (Benediktsson et al., 1995; Landgrebe, 2001; Lee and Landgrebe, 1993). Dimensionality reduction is mostly done by band selection. Feature extraction by band selection can minimize computational time for high dimensional hyperspectral data. Efficient statistical methods are required to reduce dimensionality (Chan and Palinckx, 2008). A large number of linear and nonlinear dimensionality reduction methods, such as Principal Component Analysis (PCA), Minimum Noise Fraction (MNF), Discriminant Analysis (DA), Decision Boundary Feature Extraction (DBFE), Independent Component Analysis (ICA), Fisher’s Linear Discriminant Analysis (LDA), Kernel Principal Component Analysis (k-PCA), and ISOMAP, have been used for data reduction (Bachmann et al., 2006; Chang et al., 2002; Féret and Asner, 2013; Marpu et al., 2012; Thenkabail, 2004). Each method works differently. PCA is an orthogonal linear transformation that projects the data into a new coordinate system, such that the greatest amount of variance of the original data is contained in the first few principal components (Richards, 2005). ICA identifies components that share minimal mutual information to ensure statistical independence of the derived components. ICA can be viewed as an improvement over PCA as an unsupervised feature extraction tool. PCA and ICA are linear projection methods and work well on linear data. However, real world data are often nonlinear, in which case linear techniques are not appropriate. k-PCA (Mika et al., 1999; Schölkopf et al., 1998) is a kernel version of standard PCA. It is a method of nonlinear feature extraction, closely related to methods applied in Support Vector Machines (Schölkopf et al., 1999). All these methods have been commonly employed for optimum feature extraction and band identification in different applications (Fauvel et al., 2013; Plaza et al., 2009; Tsai et al., 2007; Wang and Chang, 2006). Similarly, endmember selection is essential to accurately classify tree species in heterogeneous tropical forests. It involves the identification of appropriate endmembers and their corresponding spectral signatures. Reference endmember spectra can be derived manually from the image data themselves using ground truth information. An important advantage of manual endmember selection is the capability of deriving endmembers whose spectra reflect some of the scene-specific processes affecting the signal received by the sensor (Bateson and Curtiss, 1996). Manual endmember extraction is influenced by the proportion of homogeneous occupancy of a species. A combination of suitable endmembers and an appropriate dimensionality reduction technique can improve the classification accuracy of forest cover.
Forest Ecosystems and Species Mapping
A number of studies have indicated the advantages of narrow band data for obtaining the most sensitive information for species level discrimination using lab spectra (Castro-Esau et al., 2006; Zhang et al., 2006), airborne spectra (Carlson et al., 2007; Skoupý et al., 2011), and spaceborne spectra (Christian and Krishnayya, 2009; Mitri and Gitas, 2010; Vyas et al., 2011) by applying different classification algorithms.
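To make the kernel PCA idea from the dimensionality reduction discussion above more concrete, the following is a minimal sketch and not the implementation used in this study. It computes only the first kernel principal component with a Gaussian (RBF) kernel, using plain Python and power iteration. The function names, the six two-band "pixel spectra", and the kernel width `gamma` are all illustrative assumptions.

```python
import math

def rbf_kernel_matrix(pixels, gamma):
    """Gaussian (RBF) kernel value between every pair of pixel spectra."""
    n = len(pixels)
    return [[math.exp(-gamma * sum((a - b) ** 2
                                   for a, b in zip(pixels[i], pixels[j])))
             for j in range(n)] for i in range(n)]

def center_kernel(K):
    """Center the kernel matrix, i.e. subtract the mean in feature space."""
    n = len(K)
    row_mean = [sum(row) / n for row in K]  # equals column means, K is symmetric
    grand_mean = sum(row_mean) / n
    return [[K[i][j] - row_mean[i] - row_mean[j] + grand_mean
             for j in range(n)] for i in range(n)]

def leading_eigenpair(M, iterations=200):
    """Power iteration for the largest eigenpair of a symmetric PSD matrix."""
    n = len(M)
    v = [float(i + 1) for i in range(n)]  # non-constant starting vector
    for _ in range(iterations):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Mv = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v[i] * Mv[i] for i in range(n))
    return eigenvalue, v

def first_kpca_scores(pixels, gamma):
    """Projection of each training pixel onto the first kernel principal component."""
    Kc = center_kernel(rbf_kernel_matrix(pixels, gamma))
    eigenvalue, v = leading_eigenpair(Kc)
    # For a unit eigenvector v, the projection of pixel i is sqrt(eigenvalue) * v[i].
    return [math.sqrt(eigenvalue) * x for x in v]

# Hypothetical (red, NIR) reflectances for two groups of pixels (made-up values).
pixels = [(0.05, 0.45), (0.06, 0.47), (0.04, 0.44),
          (0.10, 0.30), (0.11, 0.29), (0.09, 0.31)]
scores = first_kpca_scores(pixels, gamma=50.0)  # gamma is an assumed kernel width
print([round(s, 3) for s in scores])
```

In this toy case the first component takes one sign for the first three pixels and the opposite sign for the last three. In practice, several components would be retained and passed to the classifier, and the kernel and its parameters would need to be tuned to the data.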
In the past few years many advanced methods have been developed and used for species level classification, such as Maximum Likelihood (ML) classification (Buddenbaum et al., 2005; Clark et al., 2005; Govender et al., 2008), Spectral Angle Mapper (SAM) (Buddenbaum et al., 2005; Christian and Krishnayya, 2009; Clark et al., 2005), and Artificial Neural Networks (ANN) (Filippi and Jenson, 2007; Skoupý et al., 2011). Most of these hyperspectral studies were carried out in boreal and temperate forest ecosystems. These areas are relatively more homogeneous than tropical covers. Temperate landscapes offer a more manageable location for such studies, with a relatively small number of habitat types and, within each type, a greater predominance of a few dominant species. Phenological changes are mostly uniform across these regions. This minimizes the complexity of studying these covers through remote sensing. The tropics, on the other hand, offer a challenge of an altogether greater magnitude, with far greater numbers of landscapes, habitats, and species, distributed across a variety of stages of growth and succession, and with far more complex canopy structures (Kalacska et al., 2004; Nagendra, 2001). Although there have been major advances in remote sensing research of boreal and temperate ecosystems, tropical remote sensing research currently lacks the fundamental scientific understanding and potential for routine applications observed in these parts of the globe. This knowledge gap is due in part to the complexity of tropical forest ecosystems, the underdevelopment of scientific and engineering infrastructure in the tropics, and the tendency of many countries to treat tools for conservation and resource management as a low priority relative to immediate economic needs (Sanchez-Azofeifa et al., 2003). Townsend et al. (2008) noted that heterogeneity has not been well captured in large-scale estimates of tropical ecosystem function. They suggested that new developments in remote sensing can help bridge the gap. These issues pose major hurdles in the classification of hyperspectral data of the tropics. The heterogeneity and diverse phenological conditions of the tropics add to the complexity. The present study has been carried out to address these issues with the following objectives:
1. To select an appropriate dimensionality reduction technique to improve classification accuracy, and
2. To assess the impact of phenological changes on classification accuracy.
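Both objectives are assessed through classification accuracy, and the maximum likelihood (ML) classifier is one of the methods named above. The following is a minimal sketch of a Gaussian ML classifier on two features, together with the overall accuracy measure. It is an illustration under stated assumptions, not the procedure used in this study. The training and test values stand in for hypothetical reduced features (for example, two component scores), equal prior probabilities are assumed for both classes, and all names are illustrative.

```python
import math

def fit_gaussian(samples):
    """Estimate the class mean and 2x2 covariance (sxx, sxy, syy) from samples."""
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    sxx = sum((x - mx) ** 2 for x, _ in samples) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in samples) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in samples) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def log_likelihood(pixel, mean, cov):
    """Gaussian log-likelihood up to an additive constant (equal priors assumed)."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = pixel[0] - mean[0], pixel[1] - mean[1]
    # Mahalanobis distance using the closed-form inverse of a 2x2 covariance.
    mahalanobis = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return -0.5 * (math.log(det) + mahalanobis)

def classify(pixel, models):
    """Assign the class whose Gaussian model gives the highest likelihood."""
    return max(models, key=lambda name: log_likelihood(pixel, *models[name]))

def overall_accuracy(predicted, reference):
    """Fraction of pixels whose predicted label matches the reference label."""
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)

# Hypothetical two-feature training samples per cover (made-up values).
training = {
    "Teak": [(0.50, 0.20), (0.60, 0.25), (0.55, 0.15), (0.65, 0.22)],
    "Bamboo": [(-0.50, -0.30), (-0.60, -0.20), (-0.40, -0.25), (-0.55, -0.35)],
}
models = {name: fit_gaussian(samples) for name, samples in training.items()}

test_pixels = [(0.58, 0.20), (0.52, 0.18), (-0.50, -0.28), (-0.45, -0.22)]
reference = ["Teak", "Teak", "Bamboo", "Bamboo"]
predicted = [classify(p, models) for p in test_pixels]
accuracy = overall_accuracy(predicted, reference)
print(predicted, accuracy)
```

With real imagery, the class statistics would be estimated from endmember or training pixels in the reduced feature space, and accuracy would be assessed against independent reference data.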
Materials and Methods
Study Area
Tropical dry deciduous forests of the Shoolpaneshwar Wildlife Sanctuary (SWS) (Gujarat, India) were selected as the study area (Figure 1). The sanctuary is located between latitudes 21° 29'N and 21° 52'N and longitudes 73° 29'E and 73° 54'E. The hilly tract of the sanctuary bordering the Narmada River supports some of the deciduous forests in Gujarat and is one of the important, naturally protected regions nurturing a sizeable biota. Annual precipitation of the area is in the range of 900 to 1,200 mm. Minimum and maximum annual mean temperatures are 8°C and 42°C, respectively. The topography of the study area is undulating, with continuous and discontinuous hilly tracts up to an elevation of ~800 m ASL intermingled with valleys, streams, and sporadic clearings for agriculture. The area is important for its support of wildlife and the tribal population, and as a catchment area for local water bodies (Sabnis and Amin, 1992).
Field Data Collection
An extensive field survey was done to collect information on homogeneous vegetation patches, heterogeneous covers, and signs of human disturbance. Field surveys were carried out on all three dates coinciding with the EO-1 Hyperion data acquisition times. This was done to record seasonal differences in the phenology and growth of the vegetation cover. As in any tropical area, heterogeneity in species distribution is characteristic of the study area. The major types found in the sanctuary include covers of Tectona grandis L. f., patches of mixed deciduous trees, Dendrocalamus strictus (Roxb.) Nees, and dry tropical riverine forests. Large tracts of land (≈100 m × 100 m) covered with pure patches of Teak and Bamboo and mixed patches of other dry deciduous species are spread across the sanctuary (Pradeepkumar, 1993). Mixed species cover consisted of species such as Anogeissus latifolia (Roxb. ex DC.) Wall. ex Bedd., Butea monosperma (Lam.) Taub., Mitragyna parvifolia (Roxb.) Korth., Garuga pinnata Roxb., Lagerstromia parvifolia Roxb., Wrightia tinctoria R.Br., Terminalia crenulata Roth., and Ficus glomerata Roxb. A few individuals of Teak and Bamboo were also present in this mixed species cover. Phenological cycles of the three covers were distinct. Leaf longevity was relatively shorter in Teak than in Dendrocalamus cover in the study area. Mixed cover showed wider variation, correlated with the species composition of the quadrat. A total of 120 quadrats were laid down across the study site (91.50 km2): 58 for Teak, 26 for Bamboo, and 36 for mixed species cover. The number of quadrats laid down for each cover was proportional to its distribution in the study site. Quadrats were randomly spread across the selected area. Each quadrat was 30 m × 30 m in size, matching the spatial resolution of the Hyperion sensor. Each quadrat of a vegetation cover was located within a 2 × 2 or 3 × 3 pixel window of the same cover. In all the quadrats, biophysical parameters such as Diameter at Breast Height, density, height, and canopy spread (Vyas et al., 2010) were recorded. Percentage occupancy of each vegetation cover was calculated. Global Positioning System (GPS) locations of all the quadrats were recorded within ±5 m error using a GPS receiver (Magellan Explorist 600) and used as Ground Control Points (GCPs).
Image Acquisition and Preprocessing of Hyperion Data
Three narrow-band Hyperion images were acquired, one for each season: summer (March to June), monsoon (July to October), and winter (November to February).
Figure 1. The location of the study area in western India (21.7017N, 73.735E) and the area where trees were sampled (displayed in a Google® map image with UTM projection). The selected forest covers are Teak, Bamboo, and Mixed species.
Images were acquired on 03 April 2006 (summer-dry season), 21 October 2006 (monsoon-wet season), and 22 January 2011 (winter-middle of fall season) (Figure 2). The spatial resolution of the sensor was 30 m and the spectral resolution was 10 nm, with a wavelength range of 356 to 2578 nm. At the time of image acquisition, the study area had less than 25 percent cloud cover. Characteristics of the three Hyperion images are given in Table 1. The delivered USGS Hyperion product contains 242 hyperspectral narrow bands (HNBs). Of these, bands 1 through 7 (356 to 417 nm) and bands 225 through 242 (2406 to 2578 nm) are not calibrated, and bands 56, 57, 77, and 78 fall in the overlap region of the two spectrometers (VNIR and SWIR). Along with the uncalibrated HNBs, bands 77 and 78 were removed because they have usually been found to be noisier than the corresponding bands (56 and 57) in the VNIR region. Strong water vapor absorption bands at 1356 nm, 1366 nm, 1406 nm, and 1417 nm, and between 1820 and 1931 nm, were also excluded. After the removal of all these uncalibrated, overlapping, and water absorption HNBs, a spectral subset of 179 HNBs was retained for further processing from each of the three datasets. A “destreaking” routine was applied to minimize the striping artifact present in the Hyperion datasets. For destreaking, the ENVI add-on tool “workshop” (CSIRO Office of Space Science and Applications, Earth Observation Centre, Australia) was used. The Hyperion radiance values from the 179 HNBs were converted to surface reflectance using FLAASH (Fast Line-of-sight Atmospheric Analysis of Spectral Hypercubes, USA), a MODTRAN4-based program, to remove atmospheric scattering and absorption effects. We estimated the amount of aerosols and the scene average visibility (40 km) using the 2-Band (K-T) method (Kaufman et al., 1997), which uses a dark pixel reflectance ratio approach with bands placed around 660 and 2100 nm. Precipitable water vapor was derived on a per pixel basis from the 1130 nm spectral feature. A correction for adjacency effects was also applied to the data. Model parameters included a tropical atmosphere with a rural aerosol model. The atmospherically corrected images were geometrically registered using the nearest neighbor resampling method and a second order polynomial model, with a resultant RMSE of ≤0.5 pixels for all three datasets. For the classification and analysis of the three vegetation covers, a spatial subset of 345 × 256 pixels (91.50 km2) was generated within the scene. GPS positions of all the field quadrats fall within this subset. NDVI values were calculated using the bands at 620 nm (red) and 800 nm (NIR). The image was masked using threshold NDVI values so as to exclude water bodies. Temporal variation in the spectral characteristics of the study area was assessed based on two hyperspectral narrow band vegetation indices (HVIs), NDVI (Normalized Difference Vegetation Index) and PRI (Photochemical Reflectance Index). PRI was calculated following the formula given by Gamon et al. (1997).
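The two narrow band indices described in this section (NDVI and PRI) can be sketched as follows. This is an illustration under stated assumptions, not the processing chain used in this study. NDVI uses the 620 nm and 800 nm reflectances named in the text. PRI is assumed here to take its commonly cited form, (R531 − R570) / (R531 + R570), with the Hyperion bands nearest to 531 nm and 570 nm standing in for those wavelengths. The masking threshold of 0.0 and the reflectance values are hypothetical, since the text does not report the threshold used.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0

def ndvi(red_620, nir_800):
    """NDVI from surface reflectance near 620 nm (red) and 800 nm (NIR)."""
    return normalized_difference(nir_800, red_620)

def pri(r_531, r_570):
    """PRI from reflectance near 531 nm and 570 nm (assumed form, see lead-in)."""
    return normalized_difference(r_531, r_570)

def vegetation_mask(ndvi_values, threshold):
    """True where a pixel is kept, False where it is masked out (e.g. water)."""
    return [value > threshold for value in ndvi_values]

# Hypothetical reflectances: one forest pixel and one water pixel as (red, NIR).
pixels = [(0.05, 0.45), (0.04, 0.02)]
ndvi_values = [ndvi(red, nir) for red, nir in pixels]
mask = vegetation_mask(ndvi_values, threshold=0.0)  # threshold is an assumption
print(ndvi_values, mask, pri(0.04, 0.05))
```

Applied per pixel to the reflectance image of each season, the same functions would give the seasonal NDVI and PRI distributions that are compared across the three dates.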
Endmember Extraction
Endmember spectra for the three identified covers were selected manually from the hyperspectral image using ground truth data. Quadrats with the highest ground occupancy (>80 percent) of each vegetation cover were considered purer, and spectra from these quadrats were extracted for further analysis. The underlying assumption was that the influence of other vegetation (0.5; Figure 3a). Figure 3b shows a decline in foliage cover during the winter season, with NDVI values ranging from 0.2 to 0.8. Maximum number of pixels (72 percent) showed NDVI values