Statistical Methods in Remote Sensing
Presented at the Enhanced Resource Assessment workshop, Brisbane, 16-18 November 1998
Fiona Evans
CSIRO Mathematical and Information Sciences, Private Bag, PO Wembley WA 6014
Email: [email protected]
Abstract
This presentation will discuss statistical methods for use in the analysis of remotely sensed data. Many of the methods represent enhancements, developed by the CSIRO Mathematical and Information Sciences Remote Sensing and Monitoring Group (CMIS-RSM), of traditional means for processing remotely sensed data. The presentation will discuss statistical methods in the context of particular applications, with the aim of explaining the motivation for careful application of these methods.
1 Introduction to CMIS-WA
The Remote Sensing Group conducts research and applications work to provide information for resource assessment and management. The group focuses on the analysis and processing of remotely sensed and other spatial data and has been working in this field since 1979. The emphasis of the group is on the analysis of long term sequences of images to determine trends in land condition, and the integration with related spatial data sets to predict areas at risk from degradation. The group has established close links with major end-users in the renewable resource sector and has received significant support from national funding bodies for collaborative projects with resource management agencies. Collaborators include:
• Agriculture Western Australia
• WA Department of Land Administration
• WA Conservation and Land Management
• WA Water and Rivers
• WA shires (local government)
• Department of Environmental Protection
• LWRRDC, NDSP, Environment Australia
In recent years, the Remote Sensing Group has developed methods for mapping and monitoring land condition in rangelands, forests and agricultural areas. The agricultural work focuses on the spread of salinity, the effect of seasonal waterlogging, the occurrence of wind erosion and the condition of remnant vegetation.
The applications of the group include:
• Monitoring wind erosion.
• Monitoring waterlogging.
• Monitoring the condition of remnant vegetation and native forests.
• Monitoring the condition of rangelands.
• Cereal crop yield mapping.
• Mapping and monitoring salinity.
• Predicting areas at risk from salinity.

2 Image registration and rectification
Rectification is the process by which images are aligned to the AMG (Australian Map Grid) coordinate system, enabling pixels to be tracked through time and integrated with other spatial data. Registration describes the process by which two or more images are geometrically aligned to each other. This step is sometimes considered a trivial part of image pre-processing, since the procedures for image rectification are fairly well defined. However, small errors in the registration process can result in large errors in products derived using multiple images. For instance, the clearing map shown in Figure 1(i) contains errors along the boundaries of bush reserves and roadside verges that have been caused by mis-registration. The map shown in Figure 1(ii) shows the accurate clearing map produced using carefully registered images.
Figure 1 Clearing maps produced using (i) mis-registered images and (ii) well-registered images.
Registration and rectification are usually performed using least squares regressions, where the eastings and northings for known ground control points (GCPs) are regressed against their (x, y) image positions (Richards, 1986). The regressions take the form:

N = a0 + a1 x + a2 y + ...    (1)
E = b0 + b1 x + b2 y + ...    (2)
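As a concrete sketch, the first-order model in equations (1) and (2) can be fitted by ordinary least squares. The GCP positions and northings below are invented purely for illustration:

```python
import numpy as np

# Hypothetical ground control points: image (x, y) positions and their
# known AMG northings N (all values invented for illustration).
xy = np.array([[10.0, 20.0], [200.0, 35.0], [150.0, 400.0],
               [30.0, 300.0], [350.0, 250.0]])
N = np.array([6_450_000.0, 6_450_210.0, 6_449_300.0,
              6_449_850.0, 6_449_600.0])

# Design matrix for the first-order model N = a0 + a1*x + a2*y;
# quadratic terms (x**2, x*y, y**2) could be appended and tested.
X = np.column_stack([np.ones(len(xy)), xy[:, 0], xy[:, 1]])

# Least squares estimate of the coefficients a0, a1, a2.
a, residuals, rank, _ = np.linalg.lstsq(X, N, rcond=None)

# Predicted northing for a new image position (x, y) = (100, 100).
N_hat = np.array([1.0, 100.0, 100.0]) @ a
```

The eastings model is fitted in the same way with E in place of N.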
The regressions may be linear or include additional quadratic or higher-order terms. It is important to test carefully which terms are required in the regression models. Traditionally, each overpass scene is registered to a base scene. However, when making scene mosaics, mis-registrations often occur in scene overlap areas. A recent modification to the registration process implemented by CMIS-RSM incorporates ground control points in scene overlap areas. Simultaneous regressions of overlapping images aim to minimise the differences between the pixel positions of the ground control points in the overlapping areas. The regression model for the simplified two-image case becomes:

rss = rss_scene1 + rss_scene2 + rss_overlap
    = (N1 − X1 a1)^T (N1 − X1 a1) + (N2 − X2 a2)^T (N2 − X2 a2) + (X01 a1 − X02 a2)^T (X01 a1 − X02 a2),
where X1 defines the pixel positions of the GCPs in the first image and N1 the corresponding northing values, X2 defines the pixel positions of the GCPs in the second image and N2 the corresponding northing values, and X01, X02 define the pixel positions in the first and second images for the overlap GCPs. The model for the eastings is described similarly.

Another new technique, based on cross-correlation, enables a given set of ground control points located in the base image to be matched automatically in overpass and overlapping images. This technique removes the need for manual selection of ground control points in all but the base scenes. Sub-pixel matching allows for offsets, scaling and rotation in the overpass image. The procedure is as follows. Given located edge or boundary features in the base image, the position in the overpass image is estimated as a function of six parameters:

estimated line = best whole-line match + line shift + i × line scaling × cos(line rotation angle) − j × line scaling × sin(line rotation angle)
estimated pixel = best whole-pixel match + pixel shift + j × pixel scaling × cos(pixel rotation angle) − i × pixel scaling × sin(pixel rotation angle)

Cross-correlations are calculated between the values of the base image and interpolated values of the overpass image at the estimated locations, summing over all pixels in a window around the feature. The best match maximises the correlation between the image windows.
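A much-simplified, integer-shift version of this matching step can be sketched as follows; the scaling, rotation and sub-pixel interpolation terms described above are omitted, and the images are synthetic:

```python
import numpy as np

def best_shift(base_win, overpass, center, search=3):
    """Slide a window around `center` in the overpass image and keep
    the integer shift whose values correlate best with the base
    window. (A simplified sketch: real matching also estimates
    scaling, rotation and sub-pixel offsets via interpolation.)"""
    h, w = base_win.shape
    ci, cj = center
    best, best_r = (0, 0), -2.0
    for di in range(-search, search + 1):
        for dj in range(-search, search + 1):
            win = overpass[ci + di:ci + di + h, cj + dj:cj + dj + w]
            r = np.corrcoef(base_win.ravel(), win.ravel())[0, 1]
            if r > best_r:
                best_r, best = r, (di, dj)
    return best, best_r

# Synthetic demonstration: the overpass image is the base image
# shifted by 2 lines and 1 pixel.
rng = np.random.default_rng(1)
base = rng.uniform(0, 255, (40, 40))
overpass = np.roll(np.roll(base, 2, axis=0), 1, axis=1)
shift, r = best_shift(base[10:20, 10:20], overpass, (10, 10))
# shift → (2, 1), with correlation r = 1.0 at the true offset
```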
3 Image calibration
An important step in the development of methods for monitoring change is the ability to compare images from different dates and sites in different scenes. These comparisons require the digital counts from each scene to be calibrated to common reference values. The calibration process is applied to the raw data to remove time- and scene-dependent effects on the digital counts of the images. Calibration enables temporal analyses to be conducted, so that images showing change through time can be produced. For instance, Figure 2 shows a condition change map for the Stirling Range National Park (south-west WA) produced using seven dates of Landsat imagery between 1988 and 1998. In this map, red areas have shown a linear increase in cover density over the ten-year period. These areas correspond to the recovery of ground burnt in a bushfire that occurred in 1988. The blue areas have decreased in condition during the ten-year period, and the green and yellow areas have shown a quadratic trend in cover density; i.e. cover increased up until a fire event and decreased after the fire.
Figure 2 Condition change map for the Stirling Range National Park, 1988 - 1998.
Carefully calibrated images can also be used to extract time-traces for particular pixels, enabling us to quantitatively examine how the cover density is changing for particular locations. For instance, in rangelands areas time-traces for particular pixels, such as those shown in Figure 3, can be used to help determine whether the dominant cover type is of annual or perennial species. This plot shows time traces of calibrated image values from good (black) and poor (red) sites from tropical grasslands. The dynamic zig-zag response indicates annual-dominated sites (poor condition rangelands) while the more steady response indicates perennial-dominated sites (good condition rangelands).
Figure 3 Time-traces for rangelands sites showing good condition sites in black and poor condition sites in red.
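The trend labels used in maps such as Figure 2 can be illustrated by fitting linear and quadratic models to a single pixel's calibrated time-trace. The years and cover-density values below are invented, and the labelling rule shown is only one simple possibility:

```python
import numpy as np

# Hypothetical calibrated cover-density trace for one pixel over
# seven image dates (values invented: burnt early, then recovering).
years = np.array([1988, 1989, 1991, 1993, 1995, 1996, 1998], dtype=float)
trace = np.array([55.0, 30.0, 38.0, 45.0, 52.0, 57.0, 63.0])

t = years - years[0]

# Fit linear and quadratic trends to the trace; comparing residual
# sums of squares is one simple way to choose between the two.
lin = np.polyfit(t, trace, 1)
quad = np.polyfit(t, trace, 2)
lin_rss = np.sum((trace - np.polyval(lin, t)) ** 2)
quad_rss = np.sum((trace - np.polyval(quad, t)) ** 2)

increasing = lin[0] > 0   # positive slope => cover density increasing
```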
Ideally, all images would be calibrated to reflectances. Reflectance, defined as the percentage of incident radiation reflected by the surface material, is a physical property of every substance and can be measured under laboratory conditions. If the images are calibrated to reflectances, image pixel values can be compared directly with field and laboratory measurements, and changes can be related to physical properties of the ground cover. In practice, linear regression techniques are used to convert the digital values of an image to those of a reference image. This method is referred to as like-values calibration. Invariant targets are features which have constant reflectance over time. These may include:
• Deep ocean.
• Deep lakes.
• Rock outcrops.
• Airfields.
• Quarries, gravel pits and mines.
• Roaded bitumen catchments.
• Bare sands.
The data values for invariant targets are used to define linear functions that transform each overpass image to the reference image, by assuming these targets should have the same digital count values in each image. Targets must be selected to span bright, mid-range and dark values, with a balanced number of bright and dark targets. The digital counts of these targets are extracted from both images and robust regression techniques are used to estimate the gains and offsets. Robust regression is essential, since changes in targets assumed to be invariant can overly influence standard least squares regression. Figure 4 shows the results of image calibration using standard least squares and robust regression techniques. The salt lake chain in the lower left-hand corner of the image would be expected not to change spectrally through time unless the water content changed considerably. In the image calibrated using least squares regression, this salt lake chain appears to be a different colour than in the base image. However, the colour of the salt lake area in the image calibrated using robust regression is very similar to that of the base image, suggesting that this is the better calibration. The calibration plot in Figure 5 shows the effect of clouds on the regression line. The presence of cloud in the overpass image has caused a cluster of points that lie some distance from the correctly estimated weighted least squares line. The next step is to calculate the regression coefficients which relate the overpass images to the reference image. Robust regression techniques, such as s-estimation (Rousseeuw and Leroy, 1984) and weighted least squares estimation procedures, are applied. The s-estimation method fits a line to 58% of the data in each band separately and assigns a weight to each point.
The weighted least squares method uses the minimum weight from all bands of each point to determine the weighted least squares line.
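A simple robust fit in this spirit can be sketched with iteratively reweighted least squares using Huber weights. This is a generic stand-in for, not a reimplementation of, the s-estimation and weighted least squares procedure described above, and all digital counts are invented:

```python
import numpy as np

def robust_gain_offset(overpass_dc, base_dc, n_iter=10, k=1.345):
    """Fit base = offset + gain * overpass by iteratively reweighted
    least squares with Huber weights, so that targets that have
    changed (e.g. under cloud) are progressively down-weighted."""
    X = np.column_stack([np.ones_like(overpass_dc), overpass_dc])
    w = np.ones_like(base_dc)
    for _ in range(n_iter):
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(X * sw[:, None], base_dc * sw,
                                   rcond=None)
        resid = base_dc - X @ beta
        scale = np.median(np.abs(resid)) / 0.6745 + 1e-12
        u = np.abs(resid) / scale
        w = np.where(u <= k, 1.0, k / u)   # down-weight outlying targets
    return beta   # (offset, gain)

# Invariant-target digital counts, with two "changed" targets at the
# end (e.g. cloud-affected) that should be rejected by the fit.
overpass = np.array([12., 30., 55., 80., 120., 160., 40., 90.])
base = 5.0 + 1.2 * overpass
base[-2:] += np.array([60.0, -55.0])   # contaminated points
offset, gain = robust_gain_offset(overpass, base)
```

On this synthetic data the robust fit recovers the true gain and offset despite the two contaminated targets, whereas ordinary least squares would be pulled towards them.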
(i) Base scene.
(ii) Raw overpass scene.
(iii) Calibrated overpass scene using least squares regression.
(iv) Calibrated overpass scene using weighted least squares regression.
Figure 4 The effects of using robust and non-robust regression techniques for image calibration.
Figure 5 Calibration plot showing weighted least squares regression line.
4 Maximum likelihood classification and canonical variate analysis
Given a set of objects, X, where each object is described by a set of numerical measures or attributes and has an associated class, the pattern classification problem is to determine the class of a new object given its attribute values. For an example that illustrates the construction of the classification problem, see Alder (1994). A classifier is a method for assigning a class to an object according to its attribute values. A classifier can be defined as a function d(x) on X such that for every x in X, d(x) = j if and only if x has class j. A classifier partitions X into disjoint subsets whose members share the same class. Usually, a subset T of X is used to train the classifier, and the aim is to determine the class of a new object.

Maximum likelihood classification (MLC) has traditionally been used as a baseline for the classification of remotely sensed data. It is based upon the assumption that there exist statistical models describing the distribution of the classes in the attribute space. Given these models, the class of a new object is determined by calculating which of the models is most likely to describe that object. In other words, the model with maximum likelihood is selected. Maximum likelihood classification usually assumes multivariate normal (Gaussian) models. For a set of M n-dimensional objects (x_1, ..., x_M), where x_i = (x_{i,1}, ..., x_{i,n})^T, the Gaussian probability density function (pdf) is defined to be

g[m, C] = (2π)^{−n/2} det(C)^{−1/2} exp( −(x − m)^T C^{−1} (x − m) / 2 ),
where the vector of means, m, is given by

m = (1/M) Σ_{i=1}^{M} x_i
and the covariance matrix, C, is given by

C = (1/(M − 1)) Σ_{i=1}^{M} (x_i − m)(x_i − m)^T .
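Using these estimates, maximum likelihood classification reduces to evaluating each class's Gaussian (log-)density at a pixel and taking the largest. The two-band class statistics below are invented for illustration, and equal priors are assumed:

```python
import numpy as np

def gaussian_log_likelihood(x, m, C):
    """Log of the Gaussian density g[m, C] at x; logs avoid the
    numerical underflow typical of small probabilities."""
    d = x - m
    n = len(m)
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (n * np.log(2 * np.pi) + logdet
                   + d @ np.linalg.solve(C, d))

def ml_classify(x, class_stats):
    """Assign x to the class whose fitted Gaussian gives the highest
    likelihood (equal prior probabilities assumed)."""
    return max(class_stats,
               key=lambda k: gaussian_log_likelihood(x, *class_stats[k]))

# Illustrative two-band training statistics (mean vector m and
# covariance matrix C for each class); all values invented.
stats = {
    "crop": (np.array([80.0, 40.0]), np.array([[25.0, 5.0], [5.0, 16.0]])),
    "bare": (np.array([120.0, 90.0]), np.array([[36.0, 8.0], [8.0, 30.0]])),
}
label = ml_classify(np.array([118.0, 88.0]), stats)   # → "bare"
```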
The following images (Figure 6) show the calibrated bands 4, 5 and 7 display for Landsat TM scenes for two successive spring seasons. In these images, the near infrared band (band 4) shows differences in the density of vegetative structure in the ground cover. Hence, the darker red areas show cropped paddocks, the lighter red and orange areas show pastures, and the lighter blue and grey areas have little vegetative cover. These bare areas may be salt-affected soils, or they may be bare for other reasons, including management. Clearly, some bare areas are fallow paddocks, since they support a good cover of crop or pasture in one of the two seasons. The similar colours in this image are evidence that the spectral signatures of bare soil and bare salt-affected soil are similar, implying that a single Landsat scene may not contain sufficient information for classifying salinity.
Figure 6 Calibrated (i) 1989 and (ii)1990 Landsat images with bands 4, 5, 7 in R, G, B.
Canonical variate analysis provides a method for transforming input attribute data in such a way that the separation between training classes is maximised. Plots of canonical variate means for the training sites provide a simple tool for examining the separability of the classes. Canonical variate analysis can be considered as a two-stage rotation of the attribute data (Campbell and Atchley, 1981). The first stage consists of a principal component analysis (Richards, 1986, pp. 127-130) of the attribute data. The second stage consists of an eigenanalysis of the group means for the principal component scores from the first stage. In this way, the differences between the classes are maximised relative to the differences within the classes. This is particularly relevant to remote sensing applications, where training sites are composed of regions of many pixels, since the spectral values of pixels belonging to the same training site may cover a range of values.

Given g classes, each with n_k training objects x_ki = (x_{1,ki}, ..., x_{p,ki})^T for k = 1, ..., g and i = 1, ..., n_k, a canonical variate analysis forms a linear combination, y_ki = c^T x_ki, of the input attributes such that the ratio of the between-groups sum of squares,

SS_B = Σ_{k=1}^{g} n_k (ȳ_k − ȳ_T)²,
8
and the within-groups sum of squares,

SS_W = Σ_{k=1}^{g} Σ_{i=1}^{n_k} (y_ki − ȳ_k)²,

is maximised,
where ȳ_k = (1/n_k) Σ_{i=1}^{n_k} y_ki is the mean of the k-th class, ȳ_T = (1/n_T) Σ_{k=1}^{g} n_k ȳ_k is the overall mean, and n_T = Σ_{k=1}^{g} n_k is the total number of training objects.
Substituting y_ki = c^T x_ki gives

SS_W = c^T [ Σ_{k=1}^{g} Σ_{i=1}^{n_k} (x_ki − x̄_k)(x_ki − x̄_k)^T ] c = c^T W c

and

SS_B = c^T [ Σ_{k=1}^{g} n_k (x̄_k − x̄_T)(x̄_k − x̄_T)^T ] c = c^T B c.
Thus, maximising

f = SS_B / SS_W = (c^T B c) / (c^T W c)

requires an eigenanalysis Bc = Wcf.
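The construction can be sketched numerically: build W and B from training groups and solve the eigenanalysis via W⁻¹B. The three-class, four-band training data below are synthetic:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic training data: g = 3 classes, p = 4 bands, n_k = 30
# pixels per class, with class means chosen to be separable.
groups = [rng.normal(loc, 1.0, size=(30, 4))
          for loc in ([0, 0, 0, 0], [3, 0, 1, 0], [0, 3, 0, 1])]
x_bar_T = np.vstack(groups).mean(axis=0)   # overall mean

# Within-groups (W) and between-groups (B) sums of squares and products.
W = sum((grp - grp.mean(0)).T @ (grp - grp.mean(0)) for grp in groups)
B = sum(len(grp) * np.outer(grp.mean(0) - x_bar_T, grp.mean(0) - x_bar_T)
        for grp in groups)

# Bc = Wcf is equivalent to an eigenanalysis of W^{-1} B; the
# eigenvectors are the canonical vectors c, the eigenvalues the
# canonical roots f, sorted into decreasing order.
f, C = np.linalg.eig(np.linalg.solve(W, B))
f, C = f.real, C.real
order = np.argsort(f)[::-1]
f, C = f[order], C[:, order]

h = min(4, 3 - 1)   # h = min(p, g - 1) non-zero canonical roots
```

With g = 3 classes only the first two canonical roots are non-zero, matching h = min(p, g − 1).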
Given p attributes, there are h = min(p, g−1) canonical vectors with non-zero canonical roots. If C = (c_1, ..., c_h) and F = diag(f_1, ..., f_h), then the eigenanalysis becomes BC = WCF. The canonical variate plot of the first canonical mean against the second canonical mean for each training site in 1989 is shown in Figure 7. The plot shows that the training sites corresponding to different ground cover types are clustered; however, there is a considerable degree of overlap between the different cover types. The agricultural sites contained in the overlap region have been inspected in the Landsat image; they tend to be cropped or pastured areas where growth is poor because of late germination, poor conditions or management effects. The plot also shows that some saline sites are spectrally similar to bare soil sites and other saline sites are spectrally similar to remnant vegetation sites.
Figure 7 Canonical variate plot for August 1989.
The effect of the overlap region can be seen in the classification maps (Figure 8), where:
1. Salinity is over-estimated in both years (particularly in 1989, where whole paddocks are inaccurately mapped as saline).
2. A greater proportion of the image is mapped as saline in 1989 than in 1990, despite it being unlikely that on-ground changes have occurred within this time period.
In addition, a third type of error is apparent: many areas mapped as salt occur outside the valleys where salinity is more likely to occur.
Figure 8 (i) 1989 and (ii) 1990 Landsat classifications.
5 MLC and posterior probabilities
Bayes’ theorem provides a framework for allocating probabilities of class labels to a new object. For instance, an object located in the overlap region of the attribute space shown in Figure 7 may belong to class 1 with a probability of 0.55 and to class 2 with a probability of 0.45. Bayes’ theorem requires that we know the prior probability of an object belonging to each of the classes; that is, the proportion of objects belonging to each class over the population. The prior probability for class C_k is written P(C_k). Given a set of attribute values, we can form the joint probability P(C_k, x): the probability that an object has attribute values given by x and belongs to class C_k. The conditional probability P(x | C_k) is the probability that the object has attribute values equal to x given that it belongs to class C_k. Probability theory states that P(C_k, x) = P(x | C_k) P(C_k). Bayes’ rule can then be applied to give:
P(C_k | x) = P(x | C_k) P(C_k) / P(x).
The probability P(C_k | x) is called the posterior probability of the object belonging to class C_k given that it has attribute values x. Maximum likelihood classification uses the estimated Gaussian distributions to calculate the posterior probabilities for each class, and assigns a new object to the class with the highest posterior probability. However, the posterior probabilities provide a continuous measure of class membership that can be combined with posterior probabilities from multiple dates using probability theory. The following section describes how this can be done.
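As a small numerical illustration of Bayes' rule (the likelihood and prior values below are invented):

```python
import numpy as np

# Class-conditional likelihoods P(x | C_k) for two classes at an
# observed pixel, and prior probabilities P(C_k); values invented.
likelihood = np.array([0.012, 0.008])   # P(x | C_1), P(x | C_2)
prior = np.array([0.3, 0.7])            # P(C_1), P(C_2)

# Bayes' rule: P(C_k | x) = P(x | C_k) P(C_k) / P(x), with P(x) the
# normalising sum of the joint probabilities over all classes.
joint = likelihood * prior
posterior = joint / joint.sum()   # continuous class-membership measure
```

Here the larger prior for class 2 outweighs its smaller likelihood, so the posterior favours class 2 even though class 1 fits the pixel values better.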
6 Conditional probabilistic networks
Conditional probabilistic networks (CPNs), also called Bayesian networks and causal probabilistic networks, provide a framework for describing probabilistic relationships between a number of different variables. A CPN is a graphical model that describes the joint probability distribution for a number of variables via conditional independence assumptions and local probability distributions (Heckerman, 1996). The benefit of this type of model is that the graphical structure is used to describe the way that input attributes inter-relate. In this way, the structure of the CPN is used to incorporate prior knowledge about the relationships between data layers. The network structure of a conditional probabilistic network is a directed acyclic graph. The nodes in the graph correspond to the variables of interest. The edges joining nodes correspond to joint probability distributions between the variables represented by those nodes. More detail about graph theory and its application in CPNs can be found in Lauritzen and Spiegelhalter (1988) and Neapolitan (1990). Given a set of variables X = {x_1, ..., x_n}, each with parents P_i, the joint probability distribution of X is given by

p(X) = Π_{i=1}^{n} p(x_i | P_i).
The local probability distributions correspond to the conditional distributions in the product on the right-hand side of the above equation. If the conditional distributions are known, then the joint probability distribution can be calculated using Bayes’ rule.
Construction of a conditional probabilistic network requires that the variables are ordered, and the relationships between variables are examined, so that conditional probability distributions can be defined for subsets of variables that are conditionally dependent. For example, Figure 9 shows a simple network which aims to map salinity using a two-year sequence of landcover maps produced using the posterior probabilities from maximum likelihood classification of Landsat TM data (with associated accuracy statistics) and a landform map. In this graph, the square nodes represent the input attribute data (y1 = posterior probabilities for year 1, y2 = posterior probabilities for year 2 and lf = landform type), and the circular nodes represent the outputs (s1 = salinity in year 1 and s2 = salinity in year 2).
Figure 9 A simple CPN for mapping salinity.
The network contains four cliques of child nodes and their parents: (lf, s1), (lf, s1, s2), (s1, y1) and (s2, y2). This structure represents the following assumptions:
1. The mapped landcover type depends on the true salinity status at any time.
2. The salinity status for year 1 depends upon the landform type.
3. The salinity status at year 2 depends on both landform and whether that area was salt-affected in the previous year.
Conditional probability distributions must be defined for each of these cliques such that

P(X) = p(s1 | lf) p(s2 | lf, s1) p(y1 | s1) p(y2 | s2) p(lf).

Neighbourhood information can be included in conditional probabilistic networks with the addition of extra nodes. If we consider the previously described model, we can write the neighbourhood values as s1n and s2n. Figure 10 shows the network in graphical format. The model is then extended so that the effects of neighbourhood pixels (modelled using Markov random fields) are included:

P(X) = p(s1 | lf, s1n) p(s2 | lf, s1, s2n) p(y1 | s1) p(y2 | s2) p(lf).
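The first factorisation can be illustrated with toy probability tables for the five binary variables. All numbers below are invented; in practice the tables would come from classification error estimates and expert knowledge:

```python
# Toy conditional probability tables for the salinity network; all
# variables are binary (1 = saline / valley floor / mapped saline)
# and all probabilities are invented for illustration.
p_lf = {0: 0.8, 1: 0.2}                     # p(lf)
p_s1_lf = {0: {0: 0.95, 1: 0.05},           # p(s1 | lf)
           1: {0: 0.70, 1: 0.30}}
p_s2 = {(0, 0): {0: 0.97, 1: 0.03},         # p(s2 | lf, s1)
        (0, 1): {0: 0.20, 1: 0.80},
        (1, 0): {0: 0.75, 1: 0.25},
        (1, 1): {0: 0.05, 1: 0.95}}
p_y_s = {0: {0: 0.9, 1: 0.1},               # p(y | s), same both years
         1: {0: 0.2, 1: 0.8}}

def joint(lf, s1, s2, y1, y2):
    """P(X) as the product of the local distributions in the CPN:
    p(s1|lf) p(s2|lf,s1) p(y1|s1) p(y2|s2) p(lf)."""
    return (p_lf[lf] * p_s1_lf[lf][s1] * p_s2[(lf, s1)][s2]
            * p_y_s[s1][y1] * p_y_s[s2][y2])
```

Because each local table is normalised, the joint probabilities over all 32 configurations sum to one.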
Figure 10 A simple CPN with neighbourhood effects included.
The model can then be extended to use as many dates of imagery as required. In Figure 11, input attributes are represented by boxes. Nodes 5 to 8 represent the posterior probabilities images and node 0 represents the landform type. The influence on any pixel of the labels of neighbouring pixels is represented by nodes 9 to 12; the effects are included in an iterative manner similar to the methods used in section 3. Output salinity maps are produced at nodes 1 to 4.
Figure 11 CPN for mapping salinity using four dates of Landsat TM imagery.
The conditional probability distributions are supplied to the CPN in the form of tables. The probability tables can be initialised using the error estimates from the neighbourhood-modified maximum likelihood classifications and expert knowledge (i.e. the best judgement of the author) of the probabilities of different cover types occurring in each landform type.
Figure 12 shows the salinity maps produced using the conditional probabilistic network. Marked improvements can be seen over any of the maximum likelihood classifications; in particular:
• Mapped saline areas are constrained to valleys and depressions, eliminating noise in the form of saline patches mapped on slopes and hilltops.
• Salinity is mapped consistently through time; i.e. no significant changes in the areas mapped as saline occur within any single-year time interval.
The conditional probabilistic network has been used to include prior knowledge about the relationships between the input attributes and their relationship with salinity. This is particularly useful when considering a time series of Landsat images since it enables the production of salinity maps which are consistent through time.
(i) 1989
(ii) 1990
(iii) 1993
(iv) 1994
Figure 12 Salinity maps produced using the conditional probabilistic network.
7 CMIS-RSM research and development
Other areas being investigated by CMIS-RSM include:
• Brightness correction of airborne photography and videography.
• Correction of artefacts in automatically generated, high-density digital elevation models.
• Using neighbourhood configurations in image classification.
• Developing spatial-temporal models for data integration.
• Using multi-temporal sequences of Radarsat imagery for monitoring land condition.

8 For more information
On CMIS-RSM research and previous projects, see http://www.cmis.csiro.au/rsm. On the CMIS-RSM operational monitoring project, see http://www.landmonitor.wa.gov.au.
9 References
Alder, M. D. (1994), Principles of pattern classification: statistical, neural net and syntactic methods for getting robots to see and hear, University of Western Australia Centre for Intelligent Information Processing Systems.

Caccetta, P. C. (1997), Remote Sensing, GIS and Bayesian Knowledge-based Methods for Monitoring Land Condition, PhD thesis, Faculty of Computer Science, Curtin University.

Campbell, N. A. and Atchley, W. R. (1981), ‘The geometry of canonical variate analysis’, Syst. Zoology, Vol. 30, No. 3, pp. 268-280.

Evans, F. H. (1998), An investigation into the use of maximum likelihood classifiers, decision trees, neural networks and conditional probabilistic networks for mapping and predicting salinity, MSc thesis, Faculty of Computer Science, Curtin University.

Furby, S. L., Campbell, N. A. and Palmer, M. J. (1997), ‘Calibrating images from different dates to like value digital counts’, submitted to Remote Sensing of Environment.

Heckerman, D. (1996), A tutorial on learning with Bayesian networks, Microsoft Research Technical Report No. MSR-TR-95-06.

Lauritzen, S. L. and Spiegelhalter, D. J. (1988), ‘Local computations with probabilities on graphical structures and their application to expert systems’, Journal of the Royal Statistical Society, Series B, Vol. 50, No. 2, pp. 157-224.

Neapolitan, R. E. (1990), Probabilistic reasoning in expert systems, John Wiley and Sons, USA.

Richards, J. A. (1986), Remote sensing digital image analysis: an introduction, Springer-Verlag, New York.
Rousseeuw, P. J. and Leroy, A. M. (1984), ‘Robust regression by means of S-estimators’, in Robust and Nonlinear Time Series Analysis, ed. Franke, J., Hardle, W. and Martin, R. D., Lecture Notes in Statistics, Springer-Verlag, pp. 256-272.