International Journal of Remote Sensing Vol. 31, No. 20, 20 October 2010, 5449–5464

Downloaded By: [University of Maryland College Park] At: 17:27 27 October 2010

Automated masking of cloud and cloud shadow for forest change analysis using Landsat images

CHENGQUAN HUANG*†, NANCY THOMAS†, SAMUEL N. GOWARD†, JEFFREY G. MASEK‡, ZHILIANG ZHU§, JOHN R. G. TOWNSHEND† and JAMES E. VOGELMANN¶

†Department of Geography, University of Maryland, College Park, MD 20742, USA
‡Biospheric Sciences Branch, NASA Goddard Space Flight Center, Greenbelt, MD 20771, USA
§US Geological Survey, 12201 Sunrise Valley Drive, Reston, VA 20771, USA
¶USGS Earth Resources Observation and Science (EROS) Center, Sioux Falls, SD 57198, USA

(Received 8 April 2009; in final form 6 August 2009)

Accurate masking of cloud and cloud shadow is a prerequisite for reliable mapping of land surface attributes. Cloud contamination is particularly a problem for land cover change analysis, because unflagged clouds may be mapped as false changes, and the level of such false changes can be comparable to or many times more than that of actual changes, even for images with small percentages of cloud cover. Here we develop an algorithm for automatically flagging clouds and their shadows in Landsat images. This algorithm uses clear view forest pixels as a reference to define cloud boundaries for separating cloud from clear view surfaces in a spectral–temperature space. Shadow locations are predicted according to cloud height estimates and sun illumination geometry, and actual shadow pixels are identified by searching for the darkest pixels surrounding the predicted shadow locations. This algorithm produced omission errors of around 1% for the cloud class, although the errors were higher for an image that had very low cloud cover and one acquired in a semiarid environment. While higher values were reported for other error measures, most of the errors were found around the edges of detected clouds and shadows, and many were due to difficulties in flagging thin clouds and the shadows cast by them, both by the developed algorithm and by the image analyst in deriving the reference data. We conclude that this algorithm is especially suitable for forest change analysis, because the commission and omission errors of the derived masks are not likely to significantly bias change analysis results.

1.

Introduction

*Corresponding author. Email: [email protected]

International Journal of Remote Sensing, ISSN 0143-1161 print/ISSN 1366-5901 online, © 2010 Taylor & Francis, http://www.tandf.co.uk/journals, DOI: 10.1080/01431160903369642

The usefulness of optical remote sensing images for land surface studies is often hindered by the presence of cloud and shadow. Little or no information about the surface can be derived from observations contaminated by cloud or shadow. To minimize the impact of such contamination, cloud-free images are often preferred over cloudy images in land remote sensing applications. For areas that suffer constantly cloudy conditions, however, cloud-free images are not always available. This problem is


further exacerbated by historical issues that dictated the data acquisition strategies of Landsat-class satellites, including limited acquisition capabilities, lack of ground receiving stations, and commercial considerations (Goward et al. 2006, Green 2006). As a result, cloud-free images may not exist within a specified time window for a particular study area, and the best available images, which may be partly cloudy, have to be used.

To avoid deriving erroneous information about the land surface from pixels contaminated by cloud or shadow, such pixels should be flagged properly. It has been demonstrated that the inability to mask cloud and shadow accurately can lead to decreased performance in land cover classification, estimation of snow cover (Wang et al. 2008), and detection of active fire (Li et al. 2003). Forest change analysis is especially sensitive to cloud contamination, because unflagged clouds over forest are likely to be mapped as change. Given that in most areas the rates of forest change are on the order of a few percentage points or less (e.g. Lunetta et al. 2004, Masek et al. 2008), even a small level of error in cloud detection could lead to significant errors in downstream forest change analysis.

Methods for masking cloud using satellite images can be grouped into classification-based or rule-based approaches. In the classification approach, cloud can be treated as one of the classes in a land cover classification task and mapped as part of a land cover classification effort, or it can be classified in a separate cloud masking task (e.g. Simpson and Gobat 1996, Amato et al. 2008). Full automation of this approach is generally difficult, because most classification methods require a certain level of local knowledge of cloud and surface conditions within the concerned images. In the rule-based approach, cloudy pixels are identified using a set of decision rules.
Automation of this approach is straightforward if the decision rules have been derived based on known properties of cloud. Rule-based cloud masking algorithms have been developed for images acquired by the Advanced Very High Resolution Radiometer (AVHRR) (Stowe et al. 1999), the Moderate Resolution Imaging Spectroradiometer (MODIS) (Ackerman et al. 1998, Luo et al. 2008), and the planned Visible/Infrared Imager Radiometer Suite (VIIRS) instrument (Hutchison et al. 2005).

Because many forest changes, especially those of anthropogenic origin, often occur at relatively small scales, reliable detection of such changes requires Landsat images or datasets with similar or finer spatial resolutions (Townshend and Justice 1988). An automated cloud cover assessment (ACCA) algorithm has been developed for estimating the percentage of cloud cover in Landsat 7 Enhanced Thematic Mapper Plus (ETM+) images (Irish et al. 2006). At the image level the derived cloud cover was found to be within 10% of reference data 75% of the time (Irish 2000). Without pixel level accuracy measures it is difficult to assess the suitability of this algorithm for forest change analysis. The reported image level accuracy values, however, suggest that the cloud masks derived using this algorithm could have substantial errors under certain circumstances, which can lead to significant commission or omission errors in downstream forest change analysis.

Here we develop a new algorithm for masking cloud and cloud shadow using Landsat Thematic Mapper (TM) and ETM+ images. This automated algorithm first identifies cloud pixels in a spectral–temperature space. The shadow of an identified cloud pixel is then determined based on spectral values and its location projected according to a temperature-based cloud height estimate, sun illumination geometry, and a digital elevation model (DEM).
In this paper we provide a detailed description of the developed algorithm and comprehensive assessments of the masks derived using this algorithm, including pixel level accuracy assessments and a comparison


of those masks with the ones produced using a cloud algorithm developed by Luo et al. (2008). Strengths and limitations of the developed algorithm are discussed in the context of forest change analysis.

2.


Algorithm description

2.1

Overview of the cloud algorithm

Most existing cloud detection algorithms developed for coarse spatial resolution images use a sequence of multispectral contrast and spectral and spatial signature threshold tests to identify cloudy pixels (Ackerman et al. 1998, Stowe et al. 1999, Hutchison et al. 2005). Similar tests are also used in the ACCA algorithm developed for Landsat ETM+ images (Irish 2000, Irish et al. 2006). The physical bases for many of these tests are that clouds are spectrally bright (figure 1(a)) and cold in the thermal band (figure 1(b)). Therefore, threshold values can be used to define cloud boundaries for separating cloud from non-cloudy surface in a spectral–temperature space (figure 1(c)). The threshold values, however, are likely to vary from image to image, because temperature is a function of many factors, including geographic location, day of year and elevation, while remotely sensed spectral values are affected by atmospheric conditions, vegetation phenology, and other surface conditions. Therefore, image-specific threshold values should be more robust than fixed global threshold values for cloud detection.

The algorithm developed here uses clear view forest pixels to define a cloud boundary in a spectral–temperature space as shown in figure 1(c). Here, and throughout this paper, clear view means that a pixel is not contaminated by either cloud or shadow; it does not exclude other atmospheric effects. This algorithm first identifies clear view forest pixels. The temperature values of those pixels calculated using the thermal band are used as a surrogate for the temperature of surface air at the acquisition time of the concerned image, and are used to normalize the spatial and temporal variations of temperature. Furthermore, a digital elevation model (DEM) is used to remove the elevation dependency of temperature (Smithson et al. 2008). Finally, threshold values are derived to define a cloud boundary based on the spectral–temperature characteristics of the known forest pixels.
As in most land surface applications of satellite images, accurate georeferencing of all required geospatial datasets is a requirement of the cloud and shadow algorithm described here. In addition, the raw radiometric values of Landsat images need to be calibrated and converted to top-of-atmosphere (TOA) reflectance for the spectral bands and to TOA brightness temperature for the thermal band. Details of the geometric and radiometric preprocessing of the Landsat images used in this study have been provided by Huang et al. (2009). Methods for converting raw Landsat radiometry to TOA reflectance and brightness temperature have also been described by Markham and Barker (1986), the Landsat Project Science Office (2000), and Chander et al. (2004).

2.2

Delineation of confident forest pixels

Due to strong chlorophyll absorption and substantial shadows cast by uneven tree canopy, forests are generally darker than other vegetated surface types, especially in the visible bands (Colwell 1974, Goward et al. 1994, Huemmrich and Goward 1997), and are among the most easily distinguished features in remote sensing imagery (Dodge and Bryant 1976). This well-recognized observation serves as the basis for


Figure 1. Images showing that clouds are (a) bright in the spectral bands (Landsat bands 5, 4, and 3 are in red, green, and blue); and (b) cold in the thermal band. (c) A cloud boundary can be defined in a reflectance–temperature space, and (d) can be used to identify cloudy pixels (in red). The image window covered a ground area of 11.4 km × 11.4 km near Boston, MA, USA (WRS2 path 12/row 31), and was acquired on 29 August 2004. North is towards the top for all images and maps shown in this paper. The colour of each point in (c) represents the number of pixels plotted at that point. Low numbers are in purple and blue and high numbers in yellow and red.

automated delineation of some forest pixels, which are called confident forest pixels in this paper. Such pixels often represent dense, mature forests that typically appear dark and green in a true colour composite of Landsat images. Delineation of confident forest pixels is achieved using histograms created using local image windows (e.g. 5 km by 5 km). Because forest pixels are typically the darkest among vegetated pixels, they are generally located towards the lower end of each histogram. When a local image window has a significant portion of forest pixels, those pixels form a peak called the forest peak in the histogram. In the absence of water, dark soil and other dark non-vegetated surfaces, which are masked out using appropriately defined greenness and brightness threshold values, forest pixels are delineated using threshold values

Figure 2. Use of DEM and confident forest pixels to normalize band 6 temperature (T). [Flowchart: a DEM is used to delineate 100 m isometric zones; the band 6 T of confident forest pixels gives the mean forest T within each isometric zone; subtracting the mean forest T from the band 6 T within each zone yields the normalized T.]

defined by the forest peak. A detailed description of this approach has been provided by Huang et al. (2008).

2.3

Temperature normalization

While figure 1 shows the usefulness of temperature for cloud delineation, surface temperature can vary spatially, temporally and along elevation gradients. The threshold values for separating cloud from non-cloudy surfaces are likely to vary similarly, unless such variability of the temperature is normalized. Here we use the confident forest pixels and a DEM to normalize the temperature (figure 2). Specifically, a DEM is used to delineate isometric zones at an interval of 100 m in the vertical direction. Within each isometric zone, the temperature values of confident forest pixels calculated using the thermal band are used to calculate the mean forest temperature for that zone. For each pixel within a 100-m isometric zone, the mean forest temperature of that zone is subtracted from the pixel's temperature to calculate the normalized temperature.

Use of forest pixels to normalize the temperature can greatly reduce its spatial and temporal variability, because the temperature values of forest and the target pixel are measured at almost the same time using the same instrument. The elevation dependency of temperature is removed by using the DEM. The normalized temperature value of forest pixels will be around 0, although the actual temperature values of forest pixels may vary with elevation, geographic region and acquisition date. In addition, because the temperature values of forest pixels and those of the pixels to be normalized are acquired simultaneously by the same Landsat instrument, this normalization process can also minimize potential biases in satellite measurements that may arise from instrument or calibration errors.

2.4

Defining the cloud boundary

Given the very different spectral–temperature characteristics of clouds, a boundary for separating cloud from non-cloudy surfaces could be defined in many different ways. For the purpose of automation, however, simple linear boundaries that can be defined automatically are preferred over complex nonlinear boundaries that may require fine tuning. We experimented with many Landsat images acquired in different areas, and found that a decision boundary consisting of four linear segments was adequate for flagging clouds in most cases. As shown in figure 1(c), the four segments are defined and connected by three points in the spectral–temperature space. Segment 1 defines the lower reflectance boundary of cloudy pixels, while segment 4 defines their upper temperature limit. These two segments are joined by segments 2 and 3 at points A, B and C. A pixel is a cloudy pixel if it is located below segment 4 and to the


right of segments 1, 2 and 3. Notice that although most clouds appear bright in all six spectral bands of Landsat images, because thin clouds and other atmospheric contaminants have a stronger brightening effect in the visible bands, these bands are preferred over the infrared bands in constructing the spectral–temperature space shown in figure 1(c). The red band is used in the algorithm described here.

As discussed earlier, the cloud boundary most likely will vary from one image to another due to spatial and temporal variability of surface and atmospheric conditions. To minimize the impact of such variabilities, the points A, B and C shown in figure 1(c) are located based on previously delineated confident forest pixels. Let (R_A, T_A), (R_B, T_B) and (R_C, T_C) be the coordinates of points A, B and C in the spectral–temperature space, m_R and m_T the mean reflectance and temperature values of the delineated confident forest pixels, and s_R and s_T their standard deviations. A, B and C are located as follows:

R_A = m_R + s_R      (1a)
T_A = m_T - 6 s_T    (1b)
R_B = m_R + 3 s_R    (1c)
T_B = m_T - 3 s_T    (1d)
R_C = m_R + 8 s_R    (1e)
T_C = m_T + 2 s_T    (1f)

Here the multipliers of the standard deviation values were derived empirically. One can obtain slightly different results by changing these multipliers slightly, but the differences most likely will appear not at the cloud patch level but along cloud edges. Points A and B are placed to detect thin cirrus and other high altitude thin clouds over forest, which are not necessarily much brighter than the forest beneath them but are typically very cold, due to the rapid decrease in temperature with increasing altitude in the troposphere (see the right side of the image example in figure 1). Segments 3 and 4 are placed to detect other cloud types, which are often bright and cold. Clouds that are bright but not cold, including some near-surface clouds, low fog, and smoke, are not likely to be detectable using this cloud boundary.

Clouds typically become less bright and less cold towards their edges. As a result, those edge pixels may not be detected by the above defined cloud boundary (figure 1(d)). In order to flag those pixels as cloudy pixels, the cloud boundary is relaxed for pixels neighbouring previously flagged cloudy pixels as follows:

R_A' = R_A           (2a)
T_A' = T_A           (2b)
R_B' = R_B - s_R     (2c)
T_B' = T_B + s_T     (2d)
R_C' = R_C - 2 s_R   (2e)
T_C' = T_C + s_T     (2f)

Here no relaxation is applied to point A, because the reflectance value at this point is only one standard deviation above the mean value of forest. Flagging pixels with reflectance values lower than this point as cloud is unnecessary, because the impact of such clouds is likely to be within the normal variation of forest pixels anyway.
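To make the procedure of §§2.3 and 2.4 concrete, the temperature normalization and the four-segment boundary test can be sketched as follows. This is a simplified sketch under stated assumptions (NumPy arrays, hypothetical function names, every elevation zone assumed to contain forest pixels, and a population standard deviation); it is not the authors' implementation, and the relaxation of point B's temperature is reconstructed to follow the pattern of equation (2f):

```python
import numpy as np

def normalize_temperature(band6_temp, dem, forest_mask, zone_size=100.0):
    """Subtract the mean confident-forest temperature of each 100 m
    isometric elevation zone from every pixel in that zone (section 2.3)."""
    zones = np.floor(dem / zone_size).astype(int)
    normalized = np.empty_like(band6_temp)
    for z in np.unique(zones):
        in_zone = zones == z
        # Assumes every zone contains at least one confident forest pixel.
        mean_forest_t = band6_temp[in_zone & forest_mask].mean()
        normalized[in_zone] = band6_temp[in_zone] - mean_forest_t
    return normalized

def boundary_points(forest_refl, forest_temp):
    """Points A, B and C of the cloud boundary (equation 1)."""
    mR, sR = forest_refl.mean(), forest_refl.std()
    mT, sT = forest_temp.mean(), forest_temp.std()
    A = (mR + 1 * sR, mT - 6 * sT)
    B = (mR + 3 * sR, mT - 3 * sT)
    C = (mR + 8 * sR, mT + 2 * sT)
    return A, B, C

def relaxed_points(A, B, C, sR, sT):
    """Relaxed boundary for pixels adjacent to detected cloud (equation 2)."""
    (RA, TA), (RB, TB), (RC, TC) = A, B, C
    return (RA, TA), (RB - sR, TB + sT), (RC - 2 * sR, TC + sT)

def is_cloud(red_refl, norm_temp, A, B, C):
    """A pixel is cloudy if it lies below segment 4 (the upper temperature
    limit at T_C) and to the right of segments 1 (vertical at R_A below
    T_A), 2 (A-B) and 3 (B-C)."""
    (RA, TA), (RB, TB), (RC, TC) = A, B, C
    # Reflectance threshold as a function of normalized temperature:
    # constant at R_A below T_A, then linear interpolation A -> B -> C.
    r_thr = np.interp(norm_temp, [TA, TB, TC], [RA, RB, RC])
    return (norm_temp <= TC) & (red_refl >= r_thr)
```

Because the boundary is built entirely from forest pixel statistics, the same code adapts itself to each image without hand-tuned global thresholds, which is the point of the design.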


2.5

Cloud shadow detection

Once the cloud pixels are identified, their shadows are detected geometrically and spectrally. Specifically, the projected location of each detected cloudy pixel is calculated according to its location, cloud height and solar illumination geometry. Cloud height is estimated by applying a normal lapse rate of 6.4°C km⁻¹ (Christopherson 2002, p. 67, Smithson et al. 2008) to the temperature difference between a cloudy pixel and nearby surface air. Here we use the mean temperature of previously identified confident forest pixels within an isometric zone to represent the surface air temperature at the elevation level of that zone. For a cloudy pixel, this temperature difference is the normalized temperature calculated in §2.3. Because all pixels within a satellite image are measured by the same instrument at almost the same time, calculating the temperature difference this way can minimize two potential sources of bias that may arise if the difference is calculated using ground measurements of air temperature. One is temporal discrepancies that may exist between ground measurements and satellite observations. The other is measurement bias that may arise from instrument errors, calibration issues and the impacts of the atmosphere and surface emissivity.

Figure 3 shows a comparison of predicted and actual shadow locations. While in some cases the predicted shadow location matches its actual location, in other cases it can be slightly off. The mismatch is likely due to uncertainties in estimating cloud height from temperature: the actual (or environmental) lapse rate can differ from the normal lapse rate of 6.4°C km⁻¹; cloud temperature can differ from the temperature of the surrounding air; and a single height value may not adequately represent the range of heights within clouds with substantial vertical development (Christopherson 2002).
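The height estimate and geometric projection described above can be sketched as follows (a simplified flat-terrain sketch with hypothetical function names; the algorithm as described also uses a DEM when projecting shadow locations):

```python
import math

LAPSE_RATE = 6.4  # normal lapse rate, degrees C per km

def cloud_height_km(normalized_temp):
    """Estimate cloud height from the normalized temperature, i.e. the
    cloudy pixel's temperature minus the mean confident-forest temperature
    of its isometric zone (negative for cloud, which is colder)."""
    return -normalized_temp / LAPSE_RATE

def projected_shadow_offset(height_km, sun_elevation_deg, sun_azimuth_deg):
    """Ground offset (east, north) in km from a cloud pixel to its
    predicted shadow, cast on the anti-solar side along the sun azimuth."""
    d = height_km / math.tan(math.radians(sun_elevation_deg))
    az = math.radians(sun_azimuth_deg)
    # The shadow falls away from the sun, hence the negated components.
    return (-d * math.sin(az), -d * math.cos(az))
```

For example, a cloud 32°C colder than nearby forest would be placed at about 5 km altitude, and with a 45° sun elevation its shadow would be predicted about 5 km away on the anti-solar side.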
Therefore, pixels at the predicted shadow locations cannot be taken as actual shadow pixels; instead, they are used to guide the search for shadow pixels. Specifically, the darkest pixels within a neighbourhood of the predicted shadow locations are identified as shadow pixels. Here darkness is defined according to the near-infrared (band 4) or shortwave infrared (band 5) band. The visible bands and the mid-infrared band (band 7) are less useful for this purpose because vegetation also appears dark in those bands. In addition, thin clouds may not cast spectrally distinguishable shadows, while part or all of the shadow of a cloud can be obscured by the cloud itself or by other clouds and therefore may be absent in the concerned image. Therefore, a shadow will be delineated for an identified cloud pixel only when dark pixels can be found within a neighbourhood of the predicted shadow location.

3.

Algorithm assessment

3.1

Pixel level accuracy assessment

The above cloud-shadow algorithm has been applied to hundreds of Landsat images that were acquired as part of the North American Forest Dynamics (NAFD) project (Goward et al. 2008). Those images were acquired for sample sites selected across the

Figure 3. Landsat images showing the actual location of cloud shadows (row (i), bands 5, 4 and 3 in red, green and blue; clouds appear bright and their shadows are dark) and their locations predicted by the shadow algorithm (row (ii), predicted location in black). The actual shadow may be found at the predicted location (column a), to its right (column b), or to its left (column c). The image windows in columns (a) and (b) were acquired in North Carolina (WRS2 path 18/row 35) on 21 July 1992, and that in column (c) was acquired in southern Mississippi (WRS2 path 21/row 39) on 25 August 1997. The size of each image window is 2.85 km by 2.85 km. The white cursors in each column have the same geographic coordinates.

USA throughout much of the TM and ETM+ era of Landsat history (i.e. 1984–2006) (Huang et al. 2009). The derived masks of most of these images were assessed qualitatively through visual inspection. In addition, four of those images, each covering an area in the eastern, southern, northern or western USA, were selected for per-pixel accuracy assessment. Characteristics of the four images are listed in table 1.

Table 1. Characteristics of the four Landsat images for which the cloud-shadow masks derived using the developed algorithm were assessed through accuracy assessment.

Path/row  Location        Acquisition date  Surface characteristics                                  Cloud cover (%)*
18/35     North Carolina  21 July 1992      Mostly forest and agriculture land uses;                 15
                                            high terrain relief
21/39     Mississippi     25 August 1997    Mostly forest, coastal; low terrain relief               10
22/28     Michigan        3 August 2001     Mostly forest, near the Great Lakes;                     5
                                            low terrain relief
36/37     Arizona         4 August 1992     Semiarid, mostly shrub and grassland, cloud mainly       7
                                            in non-forest area; high terrain relief

*Note: Cloud cover was estimated based on a visual assessment of the image.


For each selected Landsat image, a reference cloud-shadow mask (referred to as the reference mask hereafter) was derived as follows. First, the ISODATA clustering algorithm (Tou and Gonzalez 1974) was used to partition the image into a large number of clusters. The clusters were then grouped into three categories by an image analyst: cloud, shadow and clear surface, where clear surface refers to pixels not contaminated by cloud or shadow. Lastly, this classification was edited to remove errors due to spectral confusion between clear surfaces and the other two categories. To reduce the editing effort, we randomly selected three windows of 530 by 530 pixels (or 15.1 km by 15.1 km) within each image and picked the one with the most cloud cover for hand editing. The derived reference cloud-shadow masks are shown in figure 4 (middle column) along with the Landsat image windows (left column) and the masks produced by the algorithm developed in this study (right column).

For each cloud-shadow mask derived using the algorithm developed in this study (referred to as the derived mask hereafter), several accuracy measures were calculated according to Stehman (1997) and Congalton (1991), including the overall accuracy and the per-class commission and omission errors for the cloud and shadow classes (table 2). The overall accuracy was defined as the proportion of pixels for which a derived mask agreed with the corresponding reference mask across all classes. For each of the concerned classes, the commission error was calculated as the proportion of pixels that were labelled with that class in the derived mask but not in the reference mask, whereas the omission error was the proportion of pixels that were labelled with that class in the reference mask but not in the derived mask. These accuracies were calculated using all pixels of each test site. The overall accuracies of the derived masks are generally high, ranging from 86.1% to 98.8% over the four test sites.
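The accuracy measures defined above can be computed from a pair of masks as follows (a minimal sketch assuming integer-coded mask arrays with hypothetical class codes; not the exact implementation used in the study):

```python
import numpy as np

CLOUD, SHADOW, CLEAR = 1, 2, 0  # assumed class codes

def mask_accuracy(derived, reference, cls):
    """Per-class commission and omission errors plus overall accuracy,
    following the definitions of Stehman (1997) and Congalton (1991)."""
    derived = np.asarray(derived)
    reference = np.asarray(reference)
    in_derived = derived == cls
    in_reference = reference == cls
    # Commission: labelled cls in the derived mask but not in the reference.
    commission = (in_derived & ~in_reference).sum() / in_derived.sum()
    # Omission: labelled cls in the reference mask but not in the derived one.
    omission = (in_reference & ~in_derived).sum() / in_reference.sum()
    # Overall: proportion of pixels on which the two masks agree.
    overall = (derived == reference).mean()
    return commission, omission, overall
```

Note that a small reference class makes the omission error highly sensitive to a few disagreeing pixels, which is the denominator effect invoked below for the Michigan site.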
For forest change studies the commission and omission errors of the cloud and shadow classes and their spatial distributions are particularly relevant, because they can directly affect the derived change results. As discussed previously, an unflagged cloud over forest (omission) likely will be mapped as non-forest, and likely will be mapped as a change if the forest pixel is not contaminated by cloud on a different date. On the other hand, flagging a harvested forest area as cloud (commission) will leave the harvest event undetected.

The developed algorithm had very low omission errors for the cloud class: 1.6% and 0.9% for the North Carolina and Mississippi images respectively. The omissions were substantially higher for the Michigan and Arizona images. However, visual assessments of the masks against the Landsat images revealed that most cloud patches were flagged properly in the derived masks (figure 4). The relatively high omission error in the Michigan site (22/28) was likely the result of a very small denominator value in the calculation (i.e. the total number of cloudy pixels is very low), while the error in the Arizona site (36/37) was probably due to the difficulty of flagging thin clouds along cloud edges against a bright surface background.

Except for the Arizona site, the commission errors for the cloud class ranged from near 30% to over 40%. A detailed visual assessment revealed that the majority of the errors were along the edges of actual cloud patches (figure 4). Depending on the stretching of the images, some image analysts may flag some of the edge pixels as cloud while others may not. In our opinion cloud edges should be labelled as cloud, because most of them are contaminated by cloud and often have substantially higher reflectance values than neighbouring pixels that are not contaminated by cloud. Therefore, the algorithm developed here was designed to include those edge pixels as cloud.
In addition, it was difficult to determine whether some thin clouds should be flagged as

Figure 4. Visual comparison of the cloud-shadow masks derived using the algorithm developed in this study (c) against the reference masks (b) and Landsat images (a) for the four accuracy assessment images listed in table 1. Each of the images in (a) covers an area of 15.1 km by 15.1 km and is shown with bands 5, 4, and 3 in red, green, and blue. In the masks (middle and right columns), cloud, shadow and clear view surfaces are shown in white, black and green colours, respectively. The red circles and ellipses in the top row highlight the thin clouds that were flagged by the developed algorithm but were mostly missed in the reference mask.


Table 2. Overall accuracies of the cloud-shadow masks derived using the developed algorithm and the commission and omission errors for the cloud and shadow classes. All accuracy and error values are percentages (%).

                    Cloud                   Shadow
Path/row  Commission  Omission   Commission  Omission   Overall accuracy
18/35     45.0        1.6        16.1        15.6       88.7
21/39     28.9        0.9        23.7        18.7       86.1
22/28     34.8        11.8       17.8        19.6       98.8
36/37     0.2         29.4       4.6         21.5       93.8

clouds even by an experienced image analyst. In the North Carolina image, some thin clouds were not flagged as clouds in the reference mask but were identified properly by the developed algorithm. Those pixels also contributed to the commission errors reported in table 2 (figure 4, 18/35, highlighted in red circles and ellipses).

The commission and omission errors of the shadow class were mostly between 16% and 24% (table 2). As with those of the cloud class, visual assessments revealed that most of those errors were along the edges of actual shadow patches (figure 4). Again, difficulties in detecting shadows cast by thin clouds or cloud edges, both by the developed algorithm and by the image analyst in developing the reference masks, contributed to most of the errors. Another source of error was confusion between shadow and water, which were found to be spectrally inseparable in many cases.

3.2

Comparison with other cloud algorithms

The developed cloud algorithm was compared with an algorithm developed by Luo et al. (2008). While many other algorithms have been published for masking clouds using satellite imagery (e.g. Gao et al. 1993, Rossow and Garder 1993, Simpson and Gobat 1996, Chen et al. 2003, Hutchison et al. 2005), due to the different spectral and spatial characteristics of different sensors, few of them can be applied directly to Landsat images. Although the ACCA was designed specifically for use with Landsat ETM+ imagery (Irish 2000, Irish et al. 2006), due to the complex nature of this algorithm, we were not confident that we could replicate it in our study.

The cloud algorithm of Luo et al. (2008) was originally designed for use with MODIS TOA reflectance images as part of a compositing method for producing clear-sky composites. It can be adapted for use with Landsat images because it only uses spectral bands (i.e. MODIS bands 1, 2, 3 and 6) that are very similar to those available in Landsat images (i.e. Landsat bands 1, 3, 4 and 5). This algorithm uses four sets of decision rules to flag a pixel as non-vegetated land, snow/ice, water, cloud, or vegetated land (figure 7 of Luo et al. (2008)). By simply replacing the MODIS bands 1, 2, 3 and 6 with Landsat bands 3, 4, 1 and 5, we implemented the cloud portion of this algorithm for use with Landsat images, which is referred to as the Luo algorithm hereafter. The shadow part of the Luo algorithm was not implemented here because we were not confident we could replicate it. Since the MODIS version of the Luo algorithm was applied to TOA reflectance images, here it was also applied to the TOA reflectance of the Landsat images listed in table 1.

A comparison of the cloud masks generated by the Luo algorithm and the algorithm described in this paper is provided in figure 5. It shows that the two

Figure 5. Cloud masks produced by the algorithm of Luo et al. (2008) (b) and the algorithm developed in this paper (c), for the image windows shown in (a) with bands 5, 4 and 3 displayed in red, green and blue, from the North Carolina (18/35, 1992-07-21, upper row) and Mississippi (21/39, 1997-08-25, lower row) images. The size of each image window is 11.4 km by 11.4 km. The colour keys for the masks in (b) and (c) are the same as those used in figure 4. The red ellipses were drawn to aid visual comparison between the images and the masks.

algorithms performed similarly in identifying bright clouds and clear view land surfaces. The Luo algorithm, however, failed to identify the less bright clouds, including cloud edges. By using the thermal band and relaxing the cloud boundary for pixels neighbouring previously identified cloud pixels, the algorithm developed in this paper successfully flagged most of those less bright cloudy pixels as cloud (figure 5, highlighted using red ellipses).

4. Discussion

4.1 Applicability of the developed algorithm

Land cover change analysis using satellite imagery is sensitive to clouds and shadows that are not flagged properly. A forest patch covered by cloud on one acquisition date but not on another will likely be mapped as a false change. While in many areas only small percentages of forests experience stand clearing disturbances within a given time period (e.g. Masek et al. 2008), for many cloudy regions it is not uncommon to use images with 5% or more cloud cover in land cover change analysis (table 3 of Huang et al. (2009)), because in those areas less cloudy images may not exist at all. In such cases, unflagged clouds can easily result in false changes comparable to or even many times greater than the actual amount of change. The cloud and shadow algorithm described here is designed to address this issue. Specifically, by having very low omission errors for the cloud class, the cloud
algorithm can greatly reduce false changes caused by unflagged clouds. On the other hand, despite the high commission error of the cloud class, such errors rarely occur in areas where forest harvest, fire, or other disturbances actually occurred. In fact, such disturbance events typically cause the disturbed area to have substantially higher temperatures than neighbouring forests, so disturbed areas are not likely to be flagged as cloud by the developed algorithm. In other words, the commission error of the cloud class is not likely to lead to significant omissions in detecting forest changes. Therefore, the errors in the cloud mask generated using the developed algorithm are not likely to have a significant impact on the results of downstream forest change analysis.

Because the algorithm described here was designed for forest change analysis, it has several limitations. First, because it requires forest pixels to determine the cloud boundaries, it is only applicable to images where forest pixels exist. We are investigating whether it can be adapted for use with images having few or no forest pixels. Second, most snow pixels will also be flagged as cloud because snow is also bright and cold. Because forest change studies typically require images acquired during the summer leaf-on season, the snow pixels in such images, if any, are likely to be limited to high elevation or high latitude regions beyond the upper or northern limits of timber lines. Therefore, this cloud/snow confusion is not likely to significantly affect forest change analysis results. In addition, algorithms exist for separating snow from cloud (e.g. Hall et al. 2002, Choi and Bindschadler 2004). As discussed earlier, the developed algorithm can confuse shadow and water. While flagging water as shadow will not have much impact on forest change analysis results, flagging a shadowed forest patch as water can lead to an underestimate of forest.
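As a hedged illustration of how snow can be separated from cloud, a normalized difference snow index (NDSI) test in the spirit of Hall et al. (2002) can be sketched as below. The 0.4 threshold and the use of green and shortwave infrared reflectance are commonly cited choices, assumed here rather than taken from the present paper:

```python
import numpy as np

# Sketch of NDSI-based snow/cloud separation (after Hall et al. 2002): snow
# is bright in the green band but dark in the shortwave infrared, while most
# clouds remain bright in both, so a high NDSI suggests snow rather than
# cloud. The 0.4 threshold is a commonly used value, assumed here.

def ndsi(green, swir):
    """Normalized Difference Snow Index from green and SWIR reflectance."""
    green = np.asarray(green, dtype=float)
    swir = np.asarray(swir, dtype=float)
    return (green - swir) / (green + swir)

def looks_like_snow(green, swir, threshold=0.4):
    """True where a bright, cold pixel is more plausibly snow than cloud."""
    return ndsi(green, swir) > threshold

# Snow: bright green, dark SWIR.  Cloud: bright in both bands.
snow = looks_like_snow(0.8, 0.1)   # NDSI ~ 0.78
cloud = looks_like_snow(0.8, 0.7)  # NDSI ~ 0.07
```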
Further investigations are needed to better separate shadow from water. Finally, the algorithm may not be able to detect low altitude clouds, smoke, and fog that can be as warm as surface air. A machine learning algorithm trained using the cloud pixels flagged by the developed algorithm may be used to identify such low altitude atmospheric contaminants. It should be noted that although the developed algorithm was designed for use with Landsat TM and ETM+ images, it can be adapted for use with images acquired by other satellite instruments that have both spectral and thermal bands, such as the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) and MODIS. As a test, we modified the developed algorithm by replacing the Landsat bands with comparable ASTER bands. The results derived by applying the modified algorithm to ASTER images were similar to those reported here for Landsat images.

4.2 Usefulness of thermal band for cloud detection

The comparison of the Luo algorithm and the cloud algorithm developed in this study highlights the usefulness of the thermal band in cloud detection. Because air temperature in the troposphere, where most clouds occur, generally decreases as altitude increases, most clouds are colder than the land or water surfaces underneath them. Such temperature differences are especially significant for mid- to high-altitude clouds, and can be very effective in identifying such clouds. For example, because thin cirrus clouds are not very bright spectrally, they are often very difficult to detect using the spectral bands of the TM and ETM+ instruments. But because these are high-altitude clouds, they are very cold, and can be identified relatively easily using the thermal band (figure 1). In addition, mostly due to the use of the thermal band in
the algorithm developed in this study, the decision rules for identifying clouds are much simpler than those used in other cloud algorithms (e.g. Irish et al. 2006, Luo et al. 2008). Therefore, given that frequently cloudy conditions are common to many regions of the Earth, satellite systems aimed at monitoring the land surface should have at least one thermal band for effective cloud detection.
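The link between cloud-top temperature, cloud height and shadow location can be sketched numerically. The following is an illustrative sketch, not the paper's implementation: the standard tropospheric lapse rate and the flat-terrain geometry are simplifying assumptions introduced here.

```python
import numpy as np

# Illustrative sketch: a cloud's brightness-temperature deficit relative to
# clear-view land implies a height via the tropospheric lapse rate, and that
# height plus the sun geometry implies where the shadow falls on the ground.

LAPSE_RATE_K_PER_M = 0.0065  # ~6.5 K per km, an assumed standard value

def cloud_height_m(t_surface_k, t_cloud_k, lapse=LAPSE_RATE_K_PER_M):
    """Cloud height implied by how much colder the cloud is than the surface."""
    return max(t_surface_k - t_cloud_k, 0.0) / lapse

def shadow_offset_m(height_m, sun_zenith_deg, sun_azimuth_deg):
    """Ground offset (east, north) of the shadow cast by a cloud.

    The shadow lies on the anti-solar side of the cloud, at a horizontal
    distance of height * tan(solar zenith angle).
    """
    d = height_m * np.tan(np.radians(sun_zenith_deg))
    az = np.radians(sun_azimuth_deg)
    # Negate the unit vector pointing toward the sun's azimuth.
    return (-d * np.sin(az), -d * np.cos(az))

# Example: a cloud 13 K colder than the surface (about 2 km high) under a
# 45 degree solar zenith, sun in the southeast: shadow cast to the northwest.
h = cloud_height_m(300.0, 287.0)
dx, dy = shadow_offset_m(h, 45.0, 135.0)
```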

5. Conclusions

An algorithm for automatically flagging clouds and their shadows in Landsat images has been developed. This algorithm uses clear view forest pixels as a reference to define cloud boundaries for separating cloud from clear view surfaces in a spectral–temperature space. Shadow locations are predicted according to cloud location, cloud height estimates and sun illumination geometry, and shadow pixels are identified by searching the darkest pixels surrounding the predicted shadow location. Accuracy assessments revealed that the masks derived using this algorithm had overall accuracies ranging from 86% to 99%, while visual assessments showed that most cloud and shadow were detected properly at the patch level. Omissions of clouds for the two test images having 10% or more cloud cover were around 1%, although the errors were higher for the other two images having less cloud cover, including one image acquired in a semiarid environment. Commission errors for the cloud class were much higher than the omissions. However, visual assessments revealed that most of the commission errors occurred around the edges of detected clouds, or were thin clouds that were not flagged properly in the reference cloud masks. The shadow class had omission errors between 16% and 22% and commission errors mostly between 16% and 24%. Again, most of the errors were found along shadow edges or were due to difficulties in detecting shadows cast by thin clouds or cloud edges. A comparison of this algorithm with a cloud algorithm developed by Luo et al. (2008) shows that the two performed similarly in identifying bright clouds and clear view land surfaces. But they differed in identifying the less bright clouds, including cloud edges. Most of those less bright cloudy pixels were flagged as cloud by the algorithm developed in this study, but were not by the Luo algorithm. 
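The "darkest pixel" shadow search summarized above can be sketched as follows; the window radius and the use of a single near-infrared band are assumptions made for illustration, not the paper's exact procedure:

```python
import numpy as np

# Sketch of the shadow search: around each predicted shadow location, take
# the darkest near-infrared pixel within a small window as the actual shadow
# pixel, which absorbs errors in the cloud height estimate.

def find_shadow_pixel(nir_band, pred_row, pred_col, radius=3):
    """Return (row, col) of the darkest pixel near a predicted shadow location."""
    r0 = max(pred_row - radius, 0)
    c0 = max(pred_col - radius, 0)
    window = nir_band[r0:pred_row + radius + 1, c0:pred_col + radius + 1]
    dr, dc = np.unravel_index(np.argmin(window), window.shape)
    return r0 + dr, c0 + dc

# Toy example: one dark (shadow) pixel slightly off the predicted location.
nir = np.full((10, 10), 0.3)
nir[5, 6] = 0.02
row, col = find_shadow_pixel(nir, 4, 5)  # -> (5, 6)
```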
The developed algorithm is especially suitable for forest change analysis because the errors in the derived mask are unlikely to significantly bias change analysis results. The low omission errors for the cloud class greatly reduce the risk of a cloud over forest being mapped as a forest change, while the use of temperature to define the cloud boundaries minimizes the chance of disturbed forests being flagged as clouds. Because the algorithm relies on forest pixels to locate the cloud boundaries, its current version can only be applied to images containing clear views of forest. Also, it does not separate snow from cloud, because forest change detection algorithms generally require images acquired during the summer leaf-on season, which typically contain few or no snow pixels. Warm atmospheric contaminants such as low altitude clouds, smoke and fog are also likely to be missed by the developed algorithm. Some of them should be detectable using a machine learning algorithm trained with cloudy pixels flagged by the developed algorithm. While the algorithm described here was designed for use with Landsat images, it can be adapted for use with images acquired by ASTER, MODIS or other satellite instruments that have both spectral and thermal bands.
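To make the forest-reference idea concrete, the sketch below flags pixels that are much brighter and colder than clear-view forest. It is a deliberate simplification under assumed thresholds (three standard deviations, a single brightness band) and is not the published set of decision rules:

```python
import numpy as np

# Simplified sketch: statistics of clear-view forest pixels define how much
# brighter and colder a pixel must be before it is called cloud. The 3-sigma
# thresholds are assumptions for illustration only.

def flag_clouds(brightness, temperature, forest_mask, k=3.0):
    """Flag pixels far brighter and colder than clear-view forest.

    brightness, temperature: 2-D arrays (e.g. a visible band and the thermal
    band); forest_mask: boolean array marking clear-view forest pixels.
    """
    b_mean, b_std = brightness[forest_mask].mean(), brightness[forest_mask].std()
    t_mean, t_std = temperature[forest_mask].mean(), temperature[forest_mask].std()
    return (brightness > b_mean + k * b_std) & (temperature < t_mean - k * t_std)

# Toy scene: uniform forest plus one bright, cold (cloudy) pixel.
rng = np.random.default_rng(0)
bright = rng.normal(0.05, 0.01, (20, 20))
temp = rng.normal(295.0, 1.0, (20, 20))
forest = np.ones((20, 20), dtype=bool)
bright[0, 0], temp[0, 0] = 0.6, 270.0  # cloudy pixel, excluded from forest
forest[0, 0] = False
mask = flag_clouds(bright, temp, forest)  # mask[0, 0] should be flagged
```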


Acknowledgement

This study was supported by grants from NASA's Terrestrial Ecology, Carbon Cycle Science, Applied Sciences, and Land Cover and Land Use Change Programs, by NASA's funding opportunities NNH06ZDA001N-EOS and NNH06ZDA001N-MEASURES, and by funding from the US Geological Survey.


References

ACKERMAN, S.A., MOELLER, C.C., GUMLEY, L.E., STRABALA, K.I., MENZEL, W.P. and FREY, R.A., 1998, Discriminating clear sky from clouds with MODIS. Journal of Geophysical Research D: Atmospheres, 103, pp. 32141–32157.
AMATO, U., CUTILLO, L., FRANZESE, M., MURINO, L., ANTONIADIS, A., CUOMO, V. and SERIO, C., 2008, Statistical cloud detection from SEVIRI multispectral images. Remote Sensing of Environment, 112, pp. 750–766.
CHANDER, G., HELDER, D.L., MARKHAM, B.L., DEWALD, J.D., KAITA, E., THOME, K.J., MICIJEVIC, E. and RUGGLES, T.A., 2004, Landsat-5 TM reflective-band absolute radiometric calibration. IEEE Transactions on Geoscience and Remote Sensing, 42, pp. 2747–2760.
CHEN, P.Y., SRINIVASAN, R. and FEDOSEJEVS, G., 2003, An automated cloud detection method for daily NOAA 16 advanced very high resolution radiometer data over Texas and Mexico. Journal of Geophysical Research D: Atmospheres, 108, pp. AAC 10-1–AAC 10-9.
CHOI, H. and BINDSCHADLER, R., 2004, Cloud detection in Landsat imagery of ice sheets using shadow matching technique and automatic normalized difference snow index threshold value decision. Remote Sensing of Environment, 91, pp. 237–242.
CHRISTOPHERSON, R.W., 2002, Geosystems: An Introduction to Physical Geography (Upper Saddle River, New Jersey: Prentice Hall).
COLWELL, J.E., 1974, Vegetation canopy reflectance. Remote Sensing of Environment, 3, pp. 174–183.
CONGALTON, R., 1991, A review of assessing the accuracy of classifications of remotely sensed data. Remote Sensing of Environment, 37, pp. 35–46.
DODGE, A. and BRYANT, E., 1976, Forest type mapping with satellite data. Journal of Forestry, 74, pp. 23–40.
GAO, B.-C., GOETZ, A.F.H. and WISCOMBE, W.J., 1993, Cirrus cloud detection from airborne imaging spectrometer data using the 1.38 μm water vapor band. Geophysical Research Letters, 20, pp. 301–304.
GOWARD, S.N., HUEMMRICH, K.F. and WARING, R.H., 1994, Visible-near infrared spectral reflectance of landscape components in western Oregon. Remote Sensing of Environment, 47, pp. 190–203.
GOWARD, S., IRONS, J., FRANKS, S., ARVIDSON, T., WILLIAMS, D. and FAUNDEEN, J., 2006, Historical record of Landsat global coverage: Mission operations, NSLRSDA, and international cooperator stations. Photogrammetric Engineering and Remote Sensing, 72, pp. 1155–1169.
GOWARD, S.N., MASEK, J.G., COHEN, W., MOISEN, G., COLLATZ, G.J., HEALEY, S., HOUGHTON, R., HUANG, C., KENNEDY, R., LAW, B., TURNER, D., POWELL, S. and WULDER, M., 2008, Forest disturbance and North American carbon flux. EOS Transactions, American Geophysical Union, 89, pp. 105–106.
GREEN, K., 2006, Landsat in context: the land remote sensing business model. Photogrammetric Engineering and Remote Sensing, 72, pp. 1147–1155.
HALL, D.K., RIGGS, G.A., SALOMONSON, V.V., DI GIROLAMO, N.E. and BAYR, K.J., 2002, MODIS snow-cover products. Remote Sensing of Environment, 83, p. 181.
HUANG, C., SONG, K., KIM, S., TOWNSHEND, J.R.G., DAVIS, P., MASEK, J. and GOWARD, S.N., 2008, Use of a dark object concept and support vector machines to automate forest cover change analysis. Remote Sensing of Environment, 112, pp. 970–985.


HUANG, C., GOWARD, S.N., MASEK, J.G., GAO, F., VERMOTE, E.F., THOMAS, N., SCHLEEWEIS, K., KENNEDY, R.E., ZHU, Z., EIDENSHINK, J.C. and TOWNSHEND, J.R.G., 2009, Development of time series stacks of Landsat images for reconstructing forest disturbance history. International Journal of Digital Earth, 2, pp. 195–218.
HUEMMRICH, K.F. and GOWARD, S.N., 1997, Vegetation canopy PAR absorptance and NDVI: an assessment for ten tree species with the SAIL model. Remote Sensing of Environment, 61, pp. 254–269.
HUTCHISON, K.D., ROSKOVENSKY, J.K., JACKSON, J.M., HEIDINGER, A.K., KOPP, T.J., PAVOLONIS, M.J. and FREY, R., 2005, Automated cloud detection and classification of data collected by the Visible Infrared Imager Radiometer Suite (VIIRS). International Journal of Remote Sensing, 26, pp. 4681–4706.
IRISH, R.R., 2000, Landsat 7 automatic cloud cover assessment. In Proceedings of SPIE: Algorithms for Multispectral, Hyperspectral, and Ultraspectral Imagery VI, 24–26 April 2000, Orlando, FL, USA, pp. 348–355 (Bellingham, WA: SPIE).
IRISH, R.R., BARKER, J.L., GOWARD, S.N. and ARVIDSON, T., 2006, Characterization of the Landsat-7 ETM+ automated cloud-cover assessment (ACCA) algorithm. Photogrammetric Engineering and Remote Sensing, 72, pp. 1179–1188.
LANDSAT PROJECT SCIENCE OFFICE, 2000, Landsat 7 Science Data User's Handbook (Greenbelt, MD: National Aeronautics and Space Administration).
LI, Z., JIN, J., GONG, P., FRASER, R., ABUELGASIM, A.A., CSISZAR, I., PU, R. and HAO, W., 2003, Evaluation of algorithms for fire detection and mapping across North America from satellite. Journal of Geophysical Research D: Atmospheres, 108, pp. ACL 20-1–ACL 20-14.
LUNETTA, R.S., JOHNSON, D.M., LYON, J.G. and CROTWELL, J., 2004, Impacts of imagery temporal frequency on land-cover change detection monitoring. Remote Sensing of Environment, 89, pp. 444–454.
LUO, Y., TRISHCHENKO, A.P. and KHLOPENKOV, K.V., 2008, Developing clear-sky, cloud and cloud shadow mask for producing clear-sky composites at 250-meter spatial resolution for the seven MODIS land bands over Canada and North America. Remote Sensing of Environment, 112, pp. 4167–4185.
MARKHAM, B.L. and BARKER, J.L., 1986, Landsat MSS and TM post-calibration dynamic ranges, exoatmospheric reflectances and at-satellite temperatures. EOSAT Landsat Technical Notes, 1, pp. 3–8.
MASEK, J.G., HUANG, C., COHEN, W., KUTLER, J., HALL, F. and WOLFE, R.E., 2008, Mapping North American forest disturbance from a decadal Landsat record: methodology and initial results. Remote Sensing of Environment, 112, pp. 2914–2926.
ROSSOW, W.B. and GARDER, L.C., 1993, Cloud detection using satellite measurements of infrared and visible radiances for ISCCP. Journal of Climate, 6, pp. 2341–2369.
SIMPSON, J.J. and GOBAT, J.I., 1996, Improved cloud detection for daytime AVHRR scenes over land. Remote Sensing of Environment, 55, pp. 21–49.
SMITHSON, P., ADDISON, K. and ATKINSON, K., 2008, Fundamentals of the Physical Environment (London: Routledge).
STEHMAN, S.V., 1997, Selecting and interpreting measures of thematic classification accuracy. Remote Sensing of Environment, 62, pp. 77–89.
STOWE, L.L., DAVIS, P.A. and MCCLAIN, E.P., 1999, Scientific basis and initial evaluation of the CLAVR-1 global clear/cloud classification algorithm for the Advanced Very High Resolution Radiometer. Journal of Atmospheric and Oceanic Technology, 16, pp. 656–681.
TOU, J.T. and GONZALEZ, R.C., 1974, Pattern Recognition Principles (London: Addison-Wesley Publishing Company).
TOWNSHEND, J.R.G. and JUSTICE, C.O., 1988, Selecting the spatial resolution of satellite sensors required for global monitoring of land transformations. International Journal of Remote Sensing, 9, pp. 187–236.
WANG, X., XIE, H. and LIANG, T., 2008, Evaluation of MODIS snow cover and cloud mask and its application in Northern Xinjiang, China. Remote Sensing of Environment, 112, pp. 1497–1513.