Remote Sensing of Environment 77 (2001) 241 – 250 www.elsevier.com/locate/rse
Forest mapping with a generalized classifier and Landsat TM data Mary Pax-Lenney*, Curtis E. Woodcock, Scott A. Macomber, Sucharita Gopal, Conghe Song Department of Geography, Boston University, Boston, MA 02215, USA Received 30 August 2000; accepted 24 February 2001
Abstract
Monitoring landcover and landcover change at regional and global scales often requires Landsat data to identify and map landscape features and patterns with sufficient detail. Analytical methods based on image-by-image interpretation are too time-consuming and labor-intensive for studies of large areas to be undertaken with any degree of frequency. One potential solution is to develop algorithms or classifiers that can be generalized beyond the arena of the initial training to new images from different spatial, temporal or sensor domains. Building upon earlier success with a generalized classifier to monitor forest change, we now address the question of generalization for classifications of stable landcovers. We evaluate the ability of a supervised neural network, Fuzzy ARTMAP, to identify conifer forest across time and space with Landsat Thematic Mapper (TM) images for a region in northwest Oregon. We also assess the effects of atmospheric corrections on generalized classification accuracies. Using midsummer images atmospherically corrected with a simple dark-object-subtraction (DOS) method, there is no statistically significant loss of accuracy as the classification is extended from the initial training image to other images from the same scene (path and row): temporal generalization is successful. Extending the classifier across space and time to nearby scenes results in a mean decline of 8–13% accuracy depending on the atmospheric correction used. Obvious sources of error, such as seasonality, solar angle variation, and complexity of landcover identification, do not explain the decline in accuracy. Additionally, the patterns in generalization accuracies are complex, and the relationship between pairs of training and testing images is not necessarily reciprocal, i.e., good training data are not necessarily good testing data.
Simple DOS atmospheric corrections produce classifications with accuracies comparable to those of classifications from the more complex radiative transfer corrections. These findings are based on over 200 classifications. A high degree of variability in the classification accuracies underscores the importance of extensive, in-depth analysis of remote sensing techniques and applications, and highlights the potential for misleading results based on just a few tests. Generalization is well suited for multitemporal classifications of one Landsat scene. Using simple DOS and midsummer images, generalization offers the opportunity for frequent landcover mapping of a Landsat scene without having to retrain the classifier for each time period of interest. However, at this point, the utility of regional landcover mapping with a generalized classifier remains limited. © 2001 Elsevier Science Inc. All rights reserved.
Keywords: Landcover classification; Generalization; Landsat TM and ETM+; Atmospheric correction
* Corresponding author. 3930 Lawn Avenue, Western Springs, IL 60558, USA.
0034-4257/01/$ – see front matter © 2001 Elsevier Science Inc. All rights reserved. PII: S0034-4257(01)00208-5

1. Introduction
With the launch of ERTS 1 in 1972, remote sensing research and applications grew from intermittent acquisition of meteorological and photographic satellite imagery into a program designed to acquire systematic, repetitive, and multispectral data for the entire land area of Earth. For nearly 30 years, the remote sensing community has spent much time and effort in developing methods to extract increasingly detailed levels of information about the landscape from Landsat data. Research has often had a very local or regional focus. In 1994, the Landsat program was added to NASA’s Earth Observing System, which is an integrated network of satellites devoted to the study of global change. For Landsat to contribute to such global analyses, methods are sought that are more automated and generalizable, permitting frequent monitoring of large areas.
Global monitoring requires both a synoptic view of the Earth to identify broad landcover patterns and a concurrent detailed view of specific locations to allow description and quantification of the type and extent of change occurring at the local level. This points to the need for a collaborative approach combining satellite data of multiple sensors, for instance, combining information about the general status of broad landcover classes from the quarterly landcover product from the Moderate Resolution Imaging Spectroradiometer (MODIS; Strahler et al., 1996) with detailed information about the types and extent of change from the Landsat Thematic Mapper (TM) and Enhanced Thematic Mapper Plus (ETM+) sensors. To use Landsat data successfully in such a collaborative scheme, we need an approach to processing and interpretation of Landsat data that includes classification models that can be generalized beyond the scope of the initial training data. To this end, we are developing an automated method to monitor temperate conifer forests, ultimately at the global scale, with Landsat TM and ETM+ data. Within this context, generalization describes the concept whereby a classifier is trained with data from one domain and applied to data from different domains, be they geographic location, time, and/or imaging sensors.
Previously, we demonstrated that it is possible to identify manifest forest change with a generalized classifier with the same degree of accuracy as a conventional, image-by-image based method, and in a fraction of the time (Woodcock et al., 2001). A neural network classifier was trained with examples of forest change and no-forest change from a two-date composite of Landsat data. The network was then applied to six different two-date composites of 1991–1995 images covering the Cascade Range in Oregon. An accuracy assessment of 536 sites showed the results from the generalized classification to be comparable to one made with traditional methods by Cohen et al. (1998).
In this paper, we address how well the landcover conifer forest can be identified using a generalized classification method. Of particular interest here is the effect of atmospheric correction on classification accuracies as the level of generalization increases.
Based on over 200 classification tests, we evaluate the stability of conifer – nonconifer classification accuracies for a portion of the Cascade and Coastal Ranges in Oregon as the spatial and temporal separation between training and testing data is increased.
2. Background
Landsat users have approached the task of monitoring landcover over large areas from two directions. One is a sampling strategy, the other a wall-to-wall analysis of hundreds of images. For the 1990 World Forest Resources Assessment, the United Nations Food and Agriculture Organization (FAO) analyzed two dates of Landsat images representing a 10% sample of the world’s tropical forests. Results were extrapolated to predict the state of global tropical forests (FAO, 1995). Alternatively, as part of the NASA Pathfinder Program, Skole and Tucker (1993) visually interpreted over 200 Landsat Band 5 images to monitor forest change in the Brazilian Amazon. Brazil’s space agency Instituto Nacional de Pesquisas Espaciais (INPE) monitors Amazonia deforestation by analyzing hundreds of color composites (Bands 3, 4, and 5) of Landsat TM data (Instituto Nacional de Pesquisas Espaciais, 1992). The Multiresolution
Land Characteristics (MRLC) Interagency Consortium produces landcover products based on unsupervised classifications of image mosaics covering the conterminous United States (Loveland and Shaw, 1996; see Vogelmann et al., 1998 for a regional application). In contrast to these labor-intensive approaches, generalization offers the possibility of monitoring large areas more quickly and with less effort. In the broad sense, all supervised classification is based on generalization: a classifier is trained with examples of the features of interest and then applied to previously unseen data. The question is how far, and to what domains, the features’ signatures can be extended away from the training or calibration data. The generalization concept is not new. The concept of unique spectral signatures of surface features or a library of such signatures is frequently associated with the early work of geologists who measured and compiled the reflectance spectra of hundreds of minerals and rock types in the laboratory, in the field, and eventually with remote sensors (e.g., Cronin, 1967; Goetz et al., 1982; Hunt, 1977). Examples or libraries of hyperspectral vegetation signatures also exist (Bowker et al., 1985; Price, 1995), but their utility vis-à-vis remotely sensed data is limited by the relatively broad bands and large instantaneous field of view (IFOV) of the satellite sensors in comparison to the sensors used to create the spectral libraries. Atmospheric influences on satellite data also increase the difficulty of comparing broadband remote sensing data with laboratory or field spectra. Early tests of signature generalization with Landsat data focused on agricultural crop identification for yield predictions. One of the goals of the 1973 Crop Identification Technology Assessment for Remote Sensing (CITARS) experiment was to test the feasibility of extending corn and soybean classifiers beyond the region from which the training data were derived.
Classification accuracies and area proportion estimates decreased by 22% and 23%, respectively, when training and testing data were developed from different locations or dates. The decline was correlated with differences in atmospheric conditions (Myers, 1983). During the mid-1970s, the Large Area Crop Inventory Experiment (LACIE) evaluated the feasibility of spectral extensibility for determining wheat acreage. Classifiers trained with data from one segment within the US wheat-growing region were tested against nearby segments. While haze and sun-angle corrections improved cross-segment classification, in general, the approach was considered untenable (Minter, 1978). Signature extension has also been tested with combinations of coarse and medium resolution data. Fazakas and Nilsson (1996) calculated a statistical relationship between AVHRR spectral signatures and forest cover information derived from TM data to estimate volume and forest cover for unseen images covering southern Sweden. In four of 17 counties, aggregated forest cover estimates were within 10% of official county estimates. Accuracies declined as the
spatial separation between training and testing areas increased. Similarly, Iverson et al. (1994) developed a regression between AVHRR spectral bands and percent forest cover as determined from TM data, and then used it to extrapolate forest cover in two areas several hundred kilometers away from the training site. The correlation between these forest cover estimates and independent ground-based estimates was high for nearby areas (0.89), but decreased as the distance from the training data and difference in ecological landscapes increased. Zhu and Evans (1994) developed and applied regionally based statistical regressions between AVHRR spectral bands and TM-derived forest–nonforest maps to predict percent forest cover in AVHRR images for the conterminous US. There are few recent studies of classification generalization based solely on medium resolution data. Hall et al. (1991) applied temporal and sensor generalization to identify changes in a forested landscape. Spectral signatures of forest ecological states derived from Bands 1 and 4 of a 1983 Landsat 4 Multispectral Scanner (MSS) image were applied to a radiometrically calibrated 1973 Landsat 1 MSS image. Estimates of per-pixel changes in ecological states between the two dates were evaluated with transition matrices. The authors did construct an error model of the matrices, but did not undertake a direct assessment of accuracy of the generalization. Cohen et al. (2001) developed a method called applied radiometric normalization to minimize boundary effects in regional maps of western Oregon of continuous vegetation cover parameters created from mosaicked TM images. A centrally located source image was translated into vegetation cover information using statistical models of spectral and reference vegetation data.
The vegetation cover information was then extended to neighboring images with new models using the reference data from sites in overlapping areas between scenes and the neighboring image spectral data. Cohen et al. report good success with their methods, but do not specifically address accuracy vis-à-vis degree of separation between training and testing data. There are several reasons to believe that the concept of generalization with Landsat data should be addressed once again. First, the types of questions currently being raised about global landcover patterns and interactions between terrestrial ecosystems require new methods of data interpretation. While the general status of such patterns can be monitored with such sensors as MODIS, the extent and types of changes in landcover still need to be identified using data with the spatial and spectral resolutions of Landsat (Collins and Woodcock, 1999). However, the handcrafted approach to image analysis using methods calibrated with local data is not appropriate for these new global studies; such methods are simply too time-consuming. We need automated methods that can be generalized across time, space, and sensors. Secondly, atmospheric correction methods developed over the past 20 years may mitigate some of the earlier problems of spectral extensibility.
Thirdly, new classification methods such as artificial neural networks have been developed that may be more adept at defining robust feature signatures. Indeed, as stated previously, we have applied the concept of generalized classification successfully to monitoring forest change (Woodcock et al., 2001). In this current study, we apply the same method to the task of landcover identification. We begin with a relatively simple analysis: the identification of conifer forests versus all nonconifer forest landcover in northwest Oregon.
3. Methods
3.1. The generalization scheme
In terms of image classification, generalization signifies an increasing separation in time and/or space between training and testing data. Here, generalization levels are based on Landsat’s World Reference System (WRS). For the base level of no generalization, training and testing data are derived from a single TM image. This level is called within-image and has long been the standard approach to classification in remote sensing. The within-image classification accuracies serve as the baseline against which results from higher generalization levels are compared. The first level of generalization, within-scene, is temporal. Training data are derived from one Landsat image of a scene (path/row), and testing is performed on a different image from the same scene but a different date. At the second level, within-region, training and testing data are derived from nearby scenes, and this generally involves temporal generalization as well, particularly with Landsat 5 data. This concept could be extended to across region and even across continent. The term region is somewhat loosely defined and, should this method prove applicable to larger and larger areas, a stricter definition of region would be necessary. It is easy to picture regional boundaries being defined by ecological parameters. In this paper, scene refers to a WRS Landsat path-row designation; image refers to a specific image acquisition. Thus, there may be many images for one scene. The results presented in this paper are from classification tests of the first two generalization levels: within-scene and within-region.
3.2. The study area and data
The study region is located in northwest Oregon. Franklin and Dyrness (1988) describe three major forested zones west of the Cascades. The coastal zone is dominated by dense, tall conifer forest stands, the most common species being Sitka spruce, western hemlock, and western redcedar.
A second zone, divided by the Willamette Valley, encompasses areas in both the eastern Coastal Range and the western and high Cascade Range. Dominant tree species are western hemlock and Douglas-fir. The third zone occurs in the higher elevations of the western Cascade Range where forest composition is more varied. Typical species are Pacific silver fir, western hemlock, noble fir, Douglas-fir, western redcedar, and western white pine. At the crest of the Cascades is a mosaic of tree patches interspersed with shrubs and a herbaceous understory. Ponderosa and lodgepole pines dominate the drier, eastern side of the Cascades although Douglas-fir and western larch are also present (Edmonds, 1982).
The satellite data consist of 17 Landsat TM images from four scenes (Table 1). The images were coregistered by scene. Field observations of general land cover characteristics were collected for 190 sites in 1997 and 1998. For forested sites, information includes the percent conifer and hardwood cover (determined visually), general understory characteristics (herbaceous and woody shrub prevalence), and key species. General land cover classes, such as agriculture, barren, or clear-cut, were identified for nonforested sites. The distribution of sites by WRS scene is shown in Table 2. The total number of sites in this table is greater than 190 because sites may occur in more than one scene. The class conifer is defined as forested land with at least 40% needle leaf, evergreen cover, i.e., there is a significant conifer component to the forest but it need not be exclusively conifer. All other land covers, including hardwood forests and forested land with < 40% evergreen, needle leaf cover (including larch trees), and all nonforested lands such as water, agriculture, and grasslands are defined as nonconifer.
The classifier is Fuzzy ARTMAP, a match-based learning neural network (Carpenter et al., 1997). It is a nonparametric supervised classifier. One of the principal advantages of this classifier is that it permits “many-to-one” mapping, i.e., many spectral subclasses may be contained within one larger class without having to explicitly identify each subclass in the classification process. All examples of the subclasses must be included in the training data, but need not be identified by subclass. In this case, all nonconifer subclasses (hardwood forest, agriculture, water, sparse conifer forest, etc.) are grouped into one nonconifer class for training. Inputs to Fuzzy ARTMAP are the brightness, greenness, and wetness transformations (BGW) created using published coefficients (Crist, 1985) applied to the atmospherically corrected reflectance data.

Table 1
List of Landsat TM images
Scene   Images
4628    870626, 870930, 920508, 961008
4629    840617, 880409, 920607, 920810, 961008
4528    850816, 860718, 871009, 931025
4529    880723, 880909, 920803, 950913
Scene = WRS path–row; Image = yymmdd.

Table 2
Distribution of conifer (con) and nonconifer (non) sites by WRS Scene
Scene   Con   Non   Total
4528    30    24    54
4529    59    58    117
4628    44    29    73
4629    25    9     34

3.3. Atmospheric corrections
We anticipated that atmospheric correction would be an important preprocessing step for generalization. However, there is no commonly accepted correction for Landsat data for operational applications at the regional scale. Due to the lack of detailed atmospheric data for most of the Landsat archive, atmospheric corrections are limited primarily to image-based methods that do not rely on the concurrent collection of atmospheric parameters and imagery. We evaluated the effect of several dark-object-subtraction (DOS) methods and, for comparison, one radiative transfer method on classification accuracies within the framework of generalization. Within-image classifications made from the raw, uncorrected data (digital number, DN) served as the basis against which classification accuracies from higher order generalizations with atmospherically corrected data are compared. The atmospheric corrections are listed in Table 3.

Table 3
Atmospheric corrections
DN      Digital numbers, no correction
DOS1    Simple DOS (Chavez, 1989)
DOS1M   A histogram match approach applied to DOS1-corrected bands
DOS2    Improved DOS (Chavez, 1996)
DOS3    DOS1 with a Rayleigh atmosphere (Song et al., 2001)
RTC     Radiative Transfer Code 6S (Vermote et al., 1997)

After Moran et al. (1992), the equation to calculate reflectance is:
\[ \rho = \frac{\pi\,(L_{sat} - L_d)/t_v}{E_o \cos\theta \; t_z + E_d} \qquad (1) \]
where: Lsat = spectral radiance at satellite, Ld = upwelling atmospheric radiance, tv = atmospheric transmittance along the target–sensor path, tz = atmospheric transmittance along the sun–target path, Eo = exoatmospheric solar spectral irradiance, cos θ = cosine of the solar zenith angle, and Ed = scattered downwelling spectral irradiance.
Some of these variables can be derived from the images themselves or from published data. Lsat was calculated as Lsat = DN*Gain + Bias using the time delayed calculations for gains of Thome et al. (1993) and sensor biases of Markham and Barker (1986). Eo is calculated from 6S (Vermote et al., 1997) and cos θ is given with the image data. The remaining variables, Ld, tv, tz, and Ed, need to be determined.

Table 4
Parameters for three DOS corrections
Method   Tz                Tv                Ed
DOS1     1.0               1.0               0.0
DOS2     1.0               cos θz            0.0
DOS3     e^(−τr/cos θz)    e^(−τr/cos θv)    6S

The basic assumption of DOS correction is that some ground features are so dark that the energy received at the sensor is due almost entirely to scattered path radiance (Ld). However, because so few surfaces reflect absolutely nothing, Ld is modified such that 1% of the radiance at the sensor is assumed to be from the surface. Thus, Ld is set to the top-of-atmosphere radiance of a dark object less the 1% attributed to the surface. For this study, the dark object radiance for each band for each image was determined from histogram minima rather than from a specific dark feature such as water (Teillet and Fedosejevs, 1995). The parameters for tz, tv, and Ed for the three DOS corrections are shown in Table 4. For DOS1, transmittance
along the sun–target and target–sensor paths is assumed to be 100% (tv and tz = 1), and the downwelling solar spectral diffuse irradiance is ignored (Ed = 0) (Chavez, 1989). For DOS2, cos θ is used as a surrogate for the sun–target path transmittance (tv), and tz and Ed are the same as DOS1 (Chavez, 1996). For DOS3, Song et al. (2001) suggested a more realistic interpretation of path transmittances would be to assume a Rayleigh scattering atmosphere with tz defined as e^(−τr/cos θz) and tv defined as e^(−τr/cos θv). Optical thickness for such an atmosphere is defined by Kaufman (1989) as:
\[ \tau_r = 0.008569\,\lambda^{-4}\,\left(1 + 0.0113\,\lambda^{-2} + 0.00013\,\lambda^{-4}\right) \qquad (2) \]
where λ is wavelength in micrometers (μm). For DOS3, Ed is calculated for a Rayleigh atmosphere from the radiative transfer code (RTC) 6S (Vermote et al., 1997).
Once a DOS correction is applied to all the images from one scene, it is expected that calculated reflectances of stable features would be constant throughout the images. And, as the proportion of change pixels to stable pixels in all images is small, one would expect band histograms across corrected images to be similar. In particular, because conifer forests are relatively dark, the left edge of the histograms is expected to align. In fact, they do not, indicating that the DOS corrections do not correct the data sufficiently. Therefore, we created an additional DOS1 correction, DOS1Match (DOS1M), by aligning or matching the left edges of the band histograms within each scene. Figure 1 demonstrates this correction.

Fig. 1. Histograms of atmospherically corrected (DOS1) Band 4 from four images of one scene are shown in the top figure. Even after the DOS1 correction, there is variability in the left edge of the histograms, particularly for the late October image (931025). The lower figure shows the alignment in histograms after a simple shift has been applied to the DOS1 values.

Table 5
Combination of atmospheric corrections and generalization levels tested
Generalization levels: Within image; Within scene; Within region
Atmospheric correction   Levels tested (X)
DN                       X X X
DOS1                     X X
DOS1M                    X X
DOS2                     X X
DOS3                     X X
RTC                      X
See text for description of atmospheric corrections.

In addition to the DOS approaches, we evaluated a radiative transfer correction, 6S, using the atmospheric parameters from the code’s internal standard meteorological libraries. The atmospheric model describing pressure, temperature, water vapor, and ozone densities was set to “midlatitude summer”; the aerosol model, which partitions aerosols into dust-like, water soluble, oceanic, and soot proportions, was set to “maritime”; and a Lambertian surface was assumed. Visibility data were derived from point data collected from meteorological stations near the time of satellite overpass. DN-to-reflectance conversions using an RTC are nonlinear. We evaluated the degree of nonlinearity by comparing a plot of reflectances for each unique DN in an image against a line connecting the reflectances for DN = 0 and DN = 255 for each band. There was a negligible difference between the points. Therefore, in this study, linear corrections based on reflectances calculated from 6S for DNs 0 and 255 were applied to each band in each image.
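The reflectance equation (1), the three DOS parameter sets, and the Rayleigh optical thickness of Eq. (2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the processing code used in the study: the function names, the input radiance and irradiance values, and the 0.83 μm band-4 wavelength are hypothetical, and the 6S-derived Ed for DOS3 is left as a caller-supplied placeholder.

```python
import math


def rayleigh_optical_thickness(wavelength_um):
    """Eq. (2): Rayleigh optical thickness for a wavelength in micrometers."""
    inv2 = wavelength_um ** -2
    inv4 = wavelength_um ** -4
    return 0.008569 * inv4 * (1.0 + 0.0113 * inv2 + 0.00013 * inv4)


def dos_parameters(method, sun_zenith_deg, wavelength_um=None,
                   view_zenith_deg=0.0, e_d_rayleigh=0.0):
    """Return (t_z, t_v, E_d) for one band and one DOS variant.

    e_d_rayleigh stands in for the 6S Rayleigh-atmosphere irradiance used
    by DOS3; it is a hypothetical placeholder, not computed here.
    """
    cos_z = math.cos(math.radians(sun_zenith_deg))
    if method == "DOS1":
        return 1.0, 1.0, 0.0
    if method == "DOS2":
        # The text assigns cos(theta) to t_v and keeps t_z and E_d as in DOS1.
        return 1.0, cos_z, 0.0
    if method == "DOS3":
        if wavelength_um is None:
            raise ValueError("DOS3 needs a band wavelength")
        tau_r = rayleigh_optical_thickness(wavelength_um)
        cos_v = math.cos(math.radians(view_zenith_deg))
        return math.exp(-tau_r / cos_z), math.exp(-tau_r / cos_v), e_d_rayleigh
    raise ValueError("unknown method: " + method)


def reflectance(l_sat, l_d, e_o, sun_zenith_deg, t_z, t_v, e_d):
    """Eq. (1): reflectance from at-satellite radiance and path radiance."""
    cos_z = math.cos(math.radians(sun_zenith_deg))
    return (math.pi * (l_sat - l_d) / t_v) / (e_o * cos_z * t_z + e_d)


# Hypothetical single-band inputs (radiance and irradiance in matching units).
L_SAT, L_D, E_O, SUN_ZENITH = 60.0, 10.0, 1040.0, 60.0

rho = {}
for name in ("DOS1", "DOS2", "DOS3"):
    t_z, t_v, e_d = dos_parameters(name, SUN_ZENITH, wavelength_um=0.83)
    rho[name] = reflectance(L_SAT, L_D, E_O, SUN_ZENITH, t_z, t_v, e_d)
```

With Ed = 0, Eq. (1) makes the DOS2 reflectance equal to the DOS1 reflectance divided by cos θ, so the two diverge most for images acquired at large solar zenith angles.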
3.4. The test
The combination of atmospheric corrections and generalization levels tested is shown in Table 5. For each within-scene and within-region test, the neural net was trained with the BGW data from the field sites. Then, the trained net was applied, or tested, on data from the sites of all other images. Thus, for each atmospheric correction and generalization level, there are multiple classification tests. For the within-image tests, the neural net was trained with 80% of the data and tested on the remaining 20% of unseen data. This 80/20 analysis was repeated four more times until all sites in the image had been used as test cases, but in all instances the test data had not been seen during training. The output from the neural net is a per-pixel, conifer–nonconifer classification. A majority rule determined the class label of each field site. Thus, accuracies are site-based, not pixel-based.
To judge the success of generalization and to assess the effectiveness of the atmospheric corrections, we tested for statistical significance of difference between mean accuracies for combinations of atmospheric corrections and generalization levels with one-way analysis of variance (ANOVA) and a posteriori Student–Newman–Keuls t tests (Montgomery, 1984).

4. Results
The full dataset includes images from May to October. The range of acquisition dates was intentionally broad to evaluate the effectiveness of atmospheric corrections to account for seasonal effects. Figures 2 and 3 show that mean accuracies are lower and variability is greater for the full dataset than for a subset of images restricted to midsummer dates (June–August). There is no clear association between accuracy and seasonality as indicated by either an extreme acquisition date or by a large difference between training and testing dates (Fig. 4). Poor accuracies usually result from images with these seasonal indicators, but not all classifications from such images produce poor accuracies. However, restricting the analysis to midsummer images improves mean accuracies and reduces variability. While this finding is not a surprise, it clearly shows that atmospheric corrections alone are not sufficient to remove all seasonal effects. Phenological differences and possibly sun angle are also important even for the identification of evergreen forests.

Fig. 2. The mean classification accuracies (±1σ) for the full dataset of images are shown for each atmospheric correction and generalization level. The atmospheric corrections are: DN = digital number, no correction; D1 = DOS1; D1M = DOS1-Match; D2 = DOS2; D3 = DOS3; RTC = radiative transfer code. See text for atmospheric correction details.

Fig. 3. Mean classification accuracies (±1σ) for the midsummer dataset of images by atmospheric correction and generalization level. See Fig. 2 for the explanation of atmospheric corrections.

We compared the accuracies of the 11 midsummer combinations of atmospheric corrections and generalization levels with ANOVA. The F value was 7.02 (df = 10,205; P = 0.0001), indicating that the null hypothesis that all means are equal should be rejected. The next question is: Which means are different and which are similar? To answer this question, and account for the increased likelihood of Type I errors associated with multiple t tests, we used a posteriori Student–Newman–Keuls t tests with α = 0.1. Results are shown in Table 6. There are several observations to make. The Student–Newman–Keuls t tests show three groups: A, B, and C. There is no significant difference in mean accuracies for members within a group, and there is such a difference for members across groups. Thus, mean accuracies for members within Group A are not different from each other, but they are different from members in Group C. The complicating pattern is Group B, which contains some but not all members of both Groups A and C.
From the patterns in Table 6, we can now answer the questions posed by this research: Is generalization a viable remote sensing method of landcover mapping? Is atmospheric correction necessary, and, if so, which method produces the highest classification accuracies?
First, Group A shows that for first-order, temporal generalization (from within-image to within-scene) of midsummer images, generalization is possible with any of the atmospheric corrections tested. There is no significant decline in mean accuracies between within-image DN results and within-scene atmospherically corrected results.
Second, for second-order, within-region generalization, there is a statistically significant decline in mean accuracies for the midsummer dataset for all three atmospheric corrections (Groups A vs. C). There is an 8–13% decline in accuracy when generalizing from within-image DN to within-region atmospherically corrected classifications. The decline in mean accuracies is due primarily to classifications of the images from one scene (Path 45, Row 28). As test images, the 45/28 images produce low accuracies, but the reverse is not true: accuracies are not low when these images are used as training images. At this point, there is no clear explanation for this pattern.
Third, the effect of atmospheric corrections is related to generalization levels. For temporal, within-scene generalization, atmospheric correction is necessary: the mean accuracy for within-scene DN (uncorrected) classifications falls into Group C, whereas all atmospheric corrections are in Group A. While all the within-scene atmospheric corrections are within Group A, the DOS1M and DOS2 corrections have the highest mean accuracies and do not fall into Group B as do the other atmospheric corrections. These results show that within-scene generalization is possible and that the best choices of atmospheric correction are DOS1M and DOS2 for this application. In contrast, for the within-region generalization level, atmospheric correction does not produce as clear a picture. Both the uncorrected (DN) and corrected classifications belong within Group C, indicating no significant difference in mean accuracies. However, the DN and DOS1M corrections produce the poorest accuracies and they do not belong to Group B. Thus, DOS1 and DOS2 corrections are the better choices for within-region classifications.
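The one-way ANOVA used in this section to compare mean accuracies can be illustrated with a short, dependency-free sketch. The accuracy values below are hypothetical and chosen only to show the computation; the function name is an assumption, and the a posteriori Student–Newman–Keuls grouping is not implemented here.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of individual values around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical classification accuracies for three test combinations.
accuracies = [
    [0.95, 0.94, 0.96],  # e.g. a within-image baseline
    [0.93, 0.92, 0.94],  # e.g. a within-scene correction
    [0.85, 0.83, 0.87],  # e.g. a within-region correction
]
f_value, df_between, df_within = one_way_anova(accuracies)
```

A large F relative to the critical value for (df_between, df_within) leads to rejecting the hypothesis that all means are equal, which is the step that precedes the pairwise grouping.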
It should be noted that the atmospheric correction methods evaluated here are simply linear conversions of DNs to reflectances based on the assumption of a spatially homogeneous atmosphere over the entire image.
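The linear DN-to-reflectance conversion described above can be illustrated with a minimal sketch. It assumes the generic dark-object-subtraction form (subtract a dark-object DN, scale by a sensor gain, normalize by solar irradiance and the cosine of the solar zenith angle) and adds one optional multiplicative transmittance term so that a cosine-based transmittance, of the kind the text attributes to DOS2, can be compared with no transmittance correction. The function names and the gain, irradiance, and DN values are hypothetical placeholders, not the calibration constants or exact equations used in this study.

```python
import math


def dos_reflectance(dn, dark_dn, gain, esun, sun_zenith_deg,
                    earth_sun_au=1.0, transmittance=1.0):
    """Simplified dark-object-subtraction conversion of a DN to reflectance.

    Assumption: path radiance is approximated by the radiance of the
    dark object, so the sensor bias cancels and the result is linear in DN.
    """
    path_corrected_radiance = gain * (dn - dark_dn)
    cos_zenith = math.cos(math.radians(sun_zenith_deg))
    return (math.pi * earth_sun_au ** 2 * path_corrected_radiance
            / (transmittance * esun * cos_zenith))


def cosine_transmittance(sun_zenith_deg):
    """Hypothetical transmittance estimate: cosine of the solar zenith angle."""
    return math.cos(math.radians(sun_zenith_deg))


# Illustrative placeholder values only (not real TM calibration constants).
GAIN, ESUN, DARK_DN = 0.8, 1000.0, 10

for label, zenith in (("midsummer", 30.0), ("fall", 60.0)):
    plain = dos_reflectance(120, DARK_DN, GAIN, ESUN, zenith)
    scaled = dos_reflectance(120, DARK_DN, GAIN, ESUN, zenith,
                             transmittance=cosine_transmittance(zenith))
    print(f"{label}: plain={plain:.3f} cosine-scaled={scaled:.3f} "
          f"ratio={scaled / plain:.2f}")
```

Under these assumptions the cosine-scaled result exceeds the plain result by a factor of 1/cos(zenith), so the brightening grows as the sun gets lower. This is consistent with the larger DOS2 reflectances reported for spring and fall images.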
Fig. 4. Classification accuracies vs. the difference in Julian dates between training and testing images for both within-scene and within-region DOS1 classification trials.

Table 6. Student-Newman-Keuls test results

Grouping   Mean    N    Classification test
A          0.948    8   Within image, DN
A          0.936   10   Within scene, DOS1M
A          0.930   10   Within scene, DOS2
A B        0.911   10   Within scene, DOS1
A B        0.911   10   Within scene, DOS3
A B        0.906    8   Within scene, RTC
B C        0.863   46   Within region, DOS1
B C        0.859   46   Within region, DOS2
B C        0.850   10   Within scene, DN
C          0.830   46   Within region, DN
C          0.821   12   Within region, DOS1M

Means with the same group letter (A, B, or C) are not significantly different (α = 0.1, df = 205). Results are sorted by mean accuracy.
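As a reading aid for Table 6, the following sketch encodes the means and group letters and applies the rule in the table note: two tests differ significantly only if they share no group letter. It also recomputes the decline from the within-image DN baseline to each within-region test. The group letters follow the memberships described in the text. The data structure and function names are illustrative and not part of the original analysis.

```python
# (classification test, mean accuracy, group letters), sorted by mean accuracy.
TABLE6 = [
    ("within-image DN",     0.948, "A"),
    ("within-scene DOS1M",  0.936, "A"),
    ("within-scene DOS2",   0.930, "A"),
    ("within-scene DOS1",   0.911, "AB"),
    ("within-scene DOS3",   0.911, "AB"),
    ("within-scene RTC",    0.906, "AB"),
    ("within-region DOS1",  0.863, "BC"),
    ("within-region DOS2",  0.859, "BC"),
    ("within-scene DN",     0.850, "BC"),
    ("within-region DN",    0.830, "C"),
    ("within-region DOS1M", 0.821, "C"),
]

MEANS = {name: mean for name, mean, _ in TABLE6}
GROUPS = {name: set(letters) for name, _, letters in TABLE6}


def significantly_different(test_a, test_b):
    """Two means differ significantly only if they share no group letter."""
    return not (GROUPS[test_a] & GROUPS[test_b])


def decline_from_baseline(test, baseline="within-image DN"):
    """Drop in mean accuracy relative to the within-image DN baseline."""
    return MEANS[baseline] - MEANS[test]


for name in MEANS:
    if name.startswith("within-region"):
        drop = 100 * decline_from_baseline(name)
        differs = significantly_different("within-image DN", name)
        print(f"{name}: decline of {drop:.1f} points, "
              f"significantly different from baseline: {differs}")
```

The within-region declines computed this way range from about 8.5 to 12.7 percentage points, which matches the 8–13% range quoted in the text.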
M. Pax-Lenney et al. / Remote Sensing of Environment 77 (2001) 241–250
Fig. 5. Example of DN-reflectance conversions for Band 4 for a June image (left) and an October image (right). Both images are from scene 46/29. The patterns of conversions for the June image are representative of most bands of most images: There is very little difference between atmospheric corrections particularly for the most populated range of DNs in most images. Overall, the DOS2 correction tends to brighten reflectances more than the other corrections. The October example shows the exaggerated correction of DOS2, which occurs in spring and fall images.
They do not account for adjacency or topographic effects. Except for DOS2, the conversions are very similar. As an example, Fig. 5 shows the DN-to-reflectance conversions for Band 4 for two images. DOS2 uses the cosine of the solar zenith angle in its correction of path transmittance in the viewing direction, resulting in comparatively larger reflectance values, particularly for spring and fall images. If such images cannot be avoided, then the DOS2 correction produces the best results (i.e., better accuracies) because of the exaggerated path-transmittance correction.

The point of this analysis was not to compare image-based corrected reflectances to ground-based reflectances but rather to assess the effect of such corrections in a simple applications test. Based on the within-scene results, there is no indication that other broad image-based corrections, such as the combined dark-object and radiative-transfer method proposed by Teillet and Fedosejevs (1995), would produce better data for this generalization question than did the corrections tested here. In this study, neither the subtle changes between the DOS1, DOS3, and RTC corrections nor the more extreme DOS2 correction produced any significant difference in within-scene mean classification accuracies for midsummer images. Therefore, the DOS3 and RTC corrections were not extended to the within-region tests.

It is possible that an atmospheric correction that incorporates the spatial inhomogeneity of the atmosphere across an image, along with adjacency and topographic effects, would produce different corrections and thus more accurate classifications. For the foreseeable future, however, such a correction is likely to be very time consuming, thus defeating the purpose of generalization. The DOS1M correction also has limited utility within the context of generalization in that it is tied to a regional
landcover pattern and cannot necessarily be extended to other regions. However, given the known imperfections of the DOS correction, we tested this relative correction to find out if a simple adjustment to DOS would be beneficial. The lower within-region classification accuracies for DOS1M indicate that this correction does not warrant further study. Although DOS corrections are known to be imperfect, there are no other operational corrections suitable for regional scale analyses. Future research and development of an operational, image-based atmospheric correction would be beneficial to the entire Landsat community. There is a surprising degree of variability in accuracies for all combinations of atmospheric corrections and generalization levels. Fig. 6 shows the accuracies for each of the 46
Fig. 6. There is significant variability in accuracies for each series of classification tests. This figure shows the classification accuracies for each of the 46 within-region DOS1 classification trials. Each data point corresponds to a trial based on training from one image and testing on an image from a different date and scene. The lower accuracies result most frequently from classifications in which images from scene 45/28 serve as test images.
within-region DOS1 classifications. This figure shows that a single classification test is insufficient to measure the effects of generalization or atmospheric correction on classification accuracies. Results from a single classification, or even several, could be very misleading. One of the strengths of this study is that results are based on a large number of tests.

Not only is there significant variability in accuracies, there is also a pattern to the variability. Low accuracies are often, but not exclusively, associated with images from one scene (45/28). The source of lower accuracies for scene 45/28 is not apparent. Obvious sources of error such as seasonality, solar angle variation, complexity of landcover identification, and geographic distribution of sites do not explain the pattern of lower accuracies for this scene. Nor is this error pattern related to whether or not sites coexist in training and testing images. It appears that underlying factors in the generalization process are not yet well understood. More work will be needed to understand the interactions between training and testing data. It is also important to note that there are "good" and "poor" training and testing images, and that this quality is nonreciprocal: "good" trainers do not necessarily serve as "good" testers. Understanding this dichotomy better will be necessary in order to utilize generalization most effectively.

One of the more surprising results from this study is the finding of no statistically significant difference between classifications of uncorrected (DN) and atmospherically corrected data for within-region generalizations: atmospheric corrections did not improve classification results. This finding is unexpected because it was assumed that atmospheric corrections are a critical component of spatial generalization.
Once we have a better understanding of both the variability in accuracies and the interaction between training and testing data, it will be important to reevaluate this finding.

At this point, we can ask what is the role of generalization vis-à-vis monitoring and mapping landcover and landcover change. As mentioned previously, we demonstrated a successful application of generalization to monitor manifest forest change: within-region generalization was successful (Woodcock et al., 2001). In contrast, in this current study of landcover mapping, we show a significant decrease in within-region mean accuracies. With generalization, it was more difficult to map the stable conifer landcover at the regional scale than to identify manifest forest change at the same scale. However, temporal generalization (within-scene) was successful for both landcover and landcover change mapping if simple atmospheric corrections and a limited seasonal distribution of images were used. More tests need to be conducted in different geographic regions and in regions with different landcovers to assess the constancy of these initial results.

In this study, the within-region analysis incorporated both temporal and spatial generalization simultaneously. In part, this is an artifact of the sparsely populated Landsat 5
archive, and in part, the temporal spread was intentionally broad to evaluate the limitations of regional generalization. While we found no clear association between classification accuracies and seasonality or separation of training and testing dates, it would be beneficial to test this further. The Landsat 7 ETM+ archive provides the opportunity to separate temporal and spatial generalization effects because of the greatly increased rate and extent of ETM+ coverage. It is much more feasible to construct a regional database of anniversary-date images with Landsat 7 data than was possible with Landsat 5 data. With such a regional database, the spatial extent of generalization may be clearer.
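The generalization levels discussed above, and the date separation plotted in Fig. 4, can be made concrete with a short sketch that labels a training/testing image pair by level and computes the day-of-year gap between the two acquisitions. The `Image` record, the function names, and the acquisition dates are hypothetical. Only the scene identifiers 45/28 and 46/29 come from the text. Treating a pair from different scenes on the same date as within-region is an assumption, since in this study within-region trials differed in both date and scene.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Image:
    path: int
    row: int
    acquired: date


def generalization_level(train: Image, test: Image) -> str:
    """Label a train/test pair by the generalization level it represents."""
    same_scene = (train.path, train.row) == (test.path, test.row)
    if same_scene and train.acquired == test.acquired:
        return "within-image"
    if same_scene:
        return "within-scene"   # temporal generalization only
    return "within-region"      # spatial (and usually temporal) generalization


def day_of_year_gap(train: Image, test: Image) -> int:
    """Absolute difference in day of year, ignoring the acquisition year."""
    return abs(train.acquired.timetuple().tm_yday
               - test.acquired.timetuple().tm_yday)


# Hypothetical acquisition dates for illustration only.
trainer = Image(46, 29, date(1991, 7, 1))
same_scene_tester = Image(46, 29, date(1991, 7, 31))
other_scene_tester = Image(45, 28, date(1991, 7, 31))

print(generalization_level(trainer, trainer))
print(generalization_level(trainer, same_scene_tester))
print(generalization_level(trainer, other_scene_tester))
print(day_of_year_gap(trainer, same_scene_tester))
```

With a regional database of anniversary-date images, pairs from different scenes would have a day-of-year gap near zero. That would allow the spatial component of within-region generalization to be examined on its own.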
5. Conclusions

The ability to identify landcover classes with a generalizable classifier offers the possibility of more frequent monitoring over larger areas than is feasible with common labor-intensive image-by-image methods. This first attempt to apply generalization to a simple landcover classification produced mixed results. First-order, temporal generalization was successful. There was no loss of mean accuracy from within-image to within-scene classification trials when using midsummer images with DOS atmospheric corrections to identify conifer forest in western Oregon. Furthermore, simple DOS atmospheric corrections produced classifications with accuracies statistically comparable to those from the more complex radiative transfer corrections, indicating that the added effort of the radiative transfer correction is not warranted. However, extending the classifier across space (from within-image to within-region) shows an 8–13% loss of mean accuracy, depending on the atmospheric correction applied. Although within-region mean accuracies are high for DOS1 and DOS2 atmospheric corrections (86%), the usefulness of generalization across space to map stable landcovers is limited because of the decline in accuracy.

The patterns in accuracies are complex and, as yet, not well understood. Errors in the classifications cannot be attributed to commonly expected sources such as variation in acquisition dates, solar zenith angles, or complexity of landcovers. Furthermore, the relationship between training and testing images is not necessarily reciprocal: good training images are not necessarily good testing images and vice versa. As an initial application of generalization, these results are promising, particularly for within-scene classifications.
Improvements are expected from work in progress to develop more sophisticated decision rules in the neural network and from developing a better understanding of the interaction between training and testing images. Future work will also investigate the significance of the quantity and types of training data on accuracies.
Acknowledgments

This research was funded by the NASA Landsat 7 Science Team, NASA Grant NAS5-3439.
References

Bowker, D. E., Davis, R. E., Myrick, D. L., Stacy, K., & Jones, W. T. (1985). Spectral reflectances of natural targets for use in remote sensing. Washington, DC: NASA Reference Publication 1139.
Carpenter, G. A., Gjaja, M. N., Gopal, S., & Woodcock, C. E. (1997). ART neural networks for remote sensing: vegetation classification from Landsat TM and terrain data. IEEE Transactions on Geoscience and Remote Sensing, 35 (2), 308–325.
Chavez, P. S. Jr. (1989). Radiometric calibration of Landsat Thematic Mapper multispectral images. Photogrammetric Engineering and Remote Sensing, 55 (9), 1285–1294.
Chavez, P. S. Jr. (1996). Image-based atmospheric corrections - revised and improved. Photogrammetric Engineering and Remote Sensing, 62 (9), 1025–1036.
Cohen, W. B., Fiorella, M., Gray, J., Helmer, E., & Anderson, K. (1998). An efficient and accurate method for mapping forest clearcuts in the Pacific Northwest using Landsat imagery. Photogrammetric Engineering and Remote Sensing, 64 (4), 293–300.
Cohen, W. B., Maiersperger, T. K., Spies, T. A., & Oetter, D. R. (2001). Modeling forest cover attributes as continuous variables in a regional context with Thematic Mapper data. International Journal of Remote Sensing (in press).
Collins, J. B., & Woodcock, C. E. (1999). Personal Communication.
Crist, E. P. (1985). A TM Tasseled Cap equivalent transformation for reflectance factor data. Short communication. Remote Sensing of Environment, 17, 301–306.
Cronin, J. F. (1967). Terrestrial multispectral photography: Terrestrial Sciences Laboratory (Project 7628), Bedford, MA: Air Force Cambridge Research Laboratories, Special Reports, No. 56 (AFCRL-67-0076), January, 46 pp.
Edmonds, R. L. (1982). Analysis of coniferous forest ecosystems in the western United States. Stroudsburg, PA: Hutchinson Ross Publication (US/IBP Synthesis Series #14).
FAO (1995). Forest resources assessment 1990: global synthesis. FAO Forestry Paper 124, Food and Agriculture Organization of the United Nations, Rome, 44 pp.
Fazakas, Z., & Nilsson, M. (1996). Volume and forest cover estimation over southern Sweden using AVHRR data calibrated with TM data. International Journal of Remote Sensing, 17 (9), 1701–1709.
Franklin, J. F., & Dyrness, C. T. (1988). Natural vegetation of Oregon and Washington. Corvallis, Oregon: Oregon State Univ. Press.
Goetz, A. F. H., Rowan, L. C., & Kingston, J. J. (1982). Mineral identification from orbit: initial results from the shuttle multispectral infrared radiometer. Science, 218 (4676), 1020–1024.
Hall, F. G., Strebel, D. E., Nickeson, J. E., & Goetz, S. J. (1991). Radiometric rectification: toward a common radiometric response among multidate, multisensor images. Remote Sensing of Environment, 35, 11–27.
Hunt, G. R. (1977). Spectral signatures of particulate minerals in the visible and near infrared. Geophysics, 42 (3), 510–513.
Instituto Nacional de Pesquisas Espaciais. (1992). Deforestation in Brazilian Amazonia. Sao Jose dos Campos, Brazil: Instituto Nacional de Pesquisas Espaciais (INPE).
Iverson, L. R., Cook, E. A., & Graham, R. L. (1994). Regional forest cover estimation via remote sensing - the calibration center concept. Landscape Ecology, 9 (3), 159–174.
Kaufman, Y. J. (1989). The atmospheric effect on remote sensing and its correction. In: G. Asrar (Ed.), Theory and application of optical remote sensing (p. 341). New York: Wiley.
Loveland, T. R., & Shaw, D. M. (1996). Multiresolution land characterization: building collaborative partnerships. In: J. M. Scott, T. Tear, & F. Davis (Eds.), Gap analysis: a landscape approach to biodiversity planning. Proceedings of the ASPRS/GAP Symposium, Charlotte, NC (pp. 83–89). Moscow, ID: Charlotte National Biological Service.
Markham, B. L., & Barker, B. L. (1986). Landsat MSS and TM postcalibration dynamic ranges, exoatmospheric reflectances and at-satellite temperature. EOSAT Landsat Technical Notes.
Minter, T. C. (1978). Methods of extending crop signatures from one area to another. In: Proceedings, The LACIE Symposium, A Technical Description of the Large Area Crop Inventory Experiment (LACIE), October 23–26, 1978, Houston, TX.
Montgomery, D. (1984). Design and analysis of experiments (2nd ed.). New York: Wiley.
Moran, M. S., Jackson, R. D., Slater, P. N., & Teillet, P. M. (1992). Evaluation of simplified procedures for retrieval of land surface reflectance factors from satellite sensor output. Remote Sensing of Environment, 41, 169–184.
Myers, V. I. (1983). Remote sensing applications in agriculture. In: J. E. Colwell, & R. N. Colwell (Eds.), Manual of remote sensing, vol. 2 (pp. 2111–2228). Falls Church, VA: American Society of Photogrammetry.
Price, J. C. (1995). Examples of high resolution visible to near-infrared reflectance spectra and a standardized collection for remote sensing studies. International Journal of Remote Sensing, 16 (6), 993–1000.
Skole, D., & Tucker, C. J. (1993). Tropical deforestation and habitat fragmentation in the Amazon: satellite data from 1978 to 1988. Science, 260, 1905–1910.
Song, C., Woodcock, C. E., Seto, K. C., Pax-Lenney, M., & Macomber, S. A. (2001). Classification and change detection using Landsat TM data: when and how to correct atmospheric effects. Remote Sensing of Environment, 75, 230–244.
Strahler, A. H., Townshend, J. R. G., Muchoney, D., Borak, J., Friedl, M., Gopal, S., Hyman, A., Moody, A., & Lambin, E. (1996). MODIS Land Cover Product Algorithm Theoretical Basis Document (ATBD), Version 4.1. Washington, DC: National Aeronautics and Space Administration.
Teillet, P. M., & Fedosejevs, G. (1995). On the dark target approach to atmospheric corrections of remotely sensed data. Canadian Journal of Remote Sensing, 21 (4), 374–387.
Thome, K. J., Gellmann, D. I., Parada, R. J., Biggar, S. F., Slater, P. N., & Moran, S. M. (1993). In-flight radiometric calibration of Landsat-5 Thematic Mapper from 1984 to present. In: Proc. of Photo-Optical Instrum. Eng. Symp, 12–16 April, Orlando, FL.
Vermote, E. F., Tanré, D., Deuzé, J. L., Herman, M., & Morcrette, J. J. (1997). Second simulation of the satellite signal in the solar spectrum: an overview. IEEE Transactions on Geoscience and Remote Sensing, 35 (3), 675–686.
Vogelman, J. E., Sohl, T., & Howard, S. M. (1998). Regional characterization of land cover using multiple sources of data. Photogrammetric Engineering and Remote Sensing, 64 (1), 45–57.
Woodcock, C. E., Macomber, S. A., Pax-Lenney, M., & Cohen, W. B. (2001). Monitoring large areas for forest change using Landsat: generalization across sensors, space and time. Remote Sensing of Environment (in press).
Zhu, Z., & Evans, D. L. (1994). US forest types and predicted percent forest cover from AVHRR data. Photogrammetric Engineering and Remote Sensing, 60 (5), 525–531.