01-130.qxd 10/8/03 12:15 PM Page 1271
Evaluating the Performance of Spatially Explicit Models
Robert Walker
Abstract Spatially explicit models are now widely used for conducting ecological research and for managing natural resources, due in part to the difficulty of undertaking empirical work at landscape scale. Unfortunately, error assessment and analysis of the predictive ability of such models is not well-developed, and has relied on the Kappa statistic and information-based measures. As has been pointed out, however, such approaches are limited by virtue of their global nature and weak hypotheses. As it turns out, the literature on map accuracy does provide a way of assessing model performance, and the goal of this paper is to adapt this literature to the need for evaluating the predictive ability of spatially explicit models. To this end, the paper first considers inference using the Kappa statistic. This is followed by a commentary on information theory, and a critique of both the Kappa statistic and information-based approaches given their global structure and underlying null hypotheses. A probabilistic treatment of alternative measures recently suggested follows, as does a direct adaptation of map inference to the modeling case. Examples of the proposed measures are given, using an application of logistic regression applied to land-cover changes that have recently occurred in the Muskegon River watershed of the State of Michigan.
Department of Geography, Michigan State University, 314 Natural Science Building, East Lansing, MI 48824 ([email protected]).
Introduction
Landscape models have become popular as tools for conducting ecological research (Sklar and Costanza, 1991) and for managing natural resources (Sklar et al., 2001), due in part to the difficulty of undertaking empirical work at landscape scale (Baker, 1989). Of special interest are stochastic spatial landscape approaches, particularly the transition probability model, which has been extensively deployed in studies of land-use and land-cover change. The fundamental concept is represented by a Markovian process (Çinlar, 1975), and land-cover change is taken simply as a stochastic transition from one state to another, governed by the equation
$$n^{t+1} = P\,n^{t}, \tag{1}$$
where $n$ is a vector, superscripted in time $t$, of $I$ land covers, $lc_i$, $i \in [1, \ldots, I]$, for some arbitrary parcel, or $n = [lc_1 \; lc_2 \; \ldots \; lc_I]^T$, and $P$ is an $I \times I$ matrix of transition probabilities (between land covers) over the period $t$ to $t + 1$. The spatially explicit model, a widely applied type of stochastic landscape model, focuses on the statistical estimation of transition probabilities for highly disaggregated land-cover units, which may be measured at pixel scales. Presently, ecologists, economists, and geographers are engaged in specifications of such models, and applications range across
temperate and tropical ecosystems (Ludeke et al., 1990; Turner et al., 1996; Wear et al., 1996; Bockstael, 1996; Chomitz and Gray, 1996; Nelson and Hellerstein, 1997; Mertens and Lambin, 1997; Wear and Bolstad, 1998; Mertens and Lambin, 2000; Geoghegan et al., 2001). In such modeling efforts, data for dependent variables are often derived through remote sensing applications, and the independent variables are generated with geographic information system (GIS) software.
One challenge that has emerged in the development and use of spatially explicit models is the assessment of goodness-of-fit. Because the variable to be explained is a discrete state of land cover, the usual regression measure, $R^2$, with its extremely useful interpretation of accounting for unexplained variation, does not apply (Hosmer and Lemeshow, 1989; Nelson and Hellerstein, 1997; Wear and Bolstad, 1998; Geoghegan et al., 2001). Often implemented are measures of pseudo-$R^2$ (Nelson and Hellerstein, 1997) such as the so-called likelihood ratio index, or $\rho^2$ (Ben-Akiva and Lerman, 1985), which has nothing to do with variance and reflects instead a ratio of likelihood functions (McFadden, 1974; Greene, 2000). This is given as $1 - \mathcal{L}(X)/\mathcal{L}(O)$, where the quotient is a ratio of log-likelihoods ($\mathcal{L}$) associated with the estimated model, $\mathcal{L}(X)$, and with a single-constant, or intercept, model, $\mathcal{L}(O)$.
Another aspect of goodness-of-fit that spatially explicit models often overlook is predictive ability. While a model may have a high value of the likelihood ratio index, this need not mean it also predicts well. Standard reports provided by statistical software typically include a measure indicating the percentage of observations correctly classified, which compares observed to predicted outcomes. If observations are taken on spatial units, the norm for spatially explicit models, such a statistic provides some indication of predictive ability.
Unfortunately, this statistic depends entirely on how probabilities predicted by the model are translated into the discrete outcomes of the phenomena being modeled (Greene, 2000). In addition, the inferential properties of such measures typically remain unreported in standard software. The literature on map accuracy does provide a way of assessing model performance (Nelson and Hellerstein, 1997; Pontius, 2000), and the goal of this paper is to adapt the work on map accuracy to the need for evaluating the predictive ability of spatially explicit models. The relevance of the work on map accuracy in the present context is easy to see by reference to the Kappa statistic, which is often used in map accuracy assessments. The Kappa statistic is based on a contingency table with frequency data organized into map classifications and ground-truthed reference conditions. The data themselves are observations on land units that present an actual cover as well as an indication of what that cover is, as provided by a map. If the map is interpreted as the model in this case, and the reference conditions as the true empirical situation that the model seeks to represent, then the Kappa statistic provides a measure of model accuracy (Nelson and Hellerstein, 1997).
Photogrammetric Engineering & Remote Sensing, Vol. 69, No. 11, November 2003, pp. 1271–1278. 0099-1112/03/6911–1271/$3.00/0 © 2003 American Society for Photogrammetry and Remote Sensing.
This paper attempts to provide a basis for inference in assessments of spatially explicit models. To this end, inference using the Kappa statistic is first considered, which is followed by a commentary on information theory, and a critique of both the Kappa statistic and information-based approaches. A probabilistic treatment of alternative measures put forward by Pontius (2000), and an adaptation of map inference to the modeling case (Card, 1982; Nelson and Hellerstein, 1997) are then given. Several applications of the goodness-of-fit measures proposed conclude the paper.
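The two goodness-of-fit ideas raised in the Introduction, the likelihood ratio index and the dependence of the percent correctly classified on the rule used to convert probabilities into discrete outcomes, can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not code from the study: the ten-cell data set and all function names are hypothetical, the intercept-only model is assumed to assign every cell the observed share of change, and the rank-based rule anticipates the perfect quantity convention discussed later in the paper.

```python
import math

def log_likelihood(y, p):
    """Bernoulli log-likelihood of 0/1 outcomes y under probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))

def likelihood_ratio_index(y, p):
    """1 - L(X)/L(O), with L(O) taken from an intercept-only model (assumption:
    that model predicts the observed share of 1's for every cell)."""
    base = sum(y) / len(y)
    return 1.0 - log_likelihood(y, p) / log_likelihood(y, [base] * len(y))

def classify_by_cutoff(p, cutoff=0.5):
    """Predict change wherever the probability reaches the cutoff."""
    return [1 if pi >= cutoff else 0 for pi in p]

def classify_by_rank(p, n_change):
    """Predict change for the n_change cells with the highest probabilities."""
    order = sorted(range(len(p)), key=lambda k: p[k], reverse=True)
    chosen = set(order[:n_change])
    return [1 if k in chosen else 0 for k in range(len(p))]

def fraction_correct(y, yhat):
    """Share of cells whose predicted state matches the observed state."""
    return sum(1 for a, b in zip(y, yhat) if a == b) / len(y)

# Hypothetical data: 10 cells, 3 of which actually change.
y = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
p = [0.45, 0.40, 0.35, 0.10, 0.05, 0.20, 0.30, 0.15, 0.10, 0.05]

by_cutoff = classify_by_cutoff(p)       # no probability reaches 0.5
by_rank = classify_by_rank(p, sum(y))   # the three highest probabilities

print(likelihood_ratio_index(y, p))
print(fraction_correct(y, by_cutoff), fraction_correct(y, by_rank))
```

The same fitted probabilities yield different percentages correctly classified under the two conversion rules, which is the sense in which the statistic depends on how probabilities are translated into outcomes.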
Implicit Probability Models and Goodness-of-Fit
The Kappa statistic is defined as
$$\kappa = \frac{P_0 - P_c}{1 - P_c}, \tag{2}$$
where $P_0$ is the relative frequency of correct classifications (summed over classes) in a comparison between a test map and some reference taken as the truth condition, and $P_c$ is an estimate of the relative frequency of correct classifications that would occur on the basis of chance (Cohen, 1960). $P_c$ is the sum of the products of the marginal probabilities (or relative frequencies) for the same classes, calculated for the test map and reference. Kappa is suggestive of the chi-square ($\chi^2$) statistic used in contingency table analysis, but the components of $P_0$ and $P_c$ are not individually summed as differences. Like chi-square ($\chi^2$), the Kappa statistic tests differences between observed frequencies and what would be expected under a process of “pure chance.”
But as Pontius (2000) pointed out, a pure chance referent may not be very useful in assessing how well a model predicts the actual configuration of a landscape. Such models typically give as output some land cover, or land-cover change, associated with individual spatial units, and defined on categories reflecting the real world (for which there are map data). If the model performs according to chance, then the probability that an arbitrary parcel is predicted to possess some cover, given it actually does, is equal to the fraction of that land cover predicted to occur in the landscape at large, which could be quite low. Testing model performance against such an assumption may well lead to its rejection, which may not be very informative. Just because a model predicts better than the toss of dice does not mean it predicts well.
Another problem is interpretive, and arises by virtue of the global nature of the statistic, based as it is on all the categories in a data set. Considering the case of land-cover change analysis, certain changes may be relatively infrequent in a map data set, but of critical importance to the application.
This happens with deforestation studies when affected areas are small relative to the entire map area used for model fit and estimation. Because the Kappa statistic uses all categories, inference will be heavily weighted by the non-forest change categories, which may not be of much interest. Thus, a model can test well overall, but only because it is able to reproduce a static landscape. Comparison of results to reality on a disaggregated basis might suggest a different assessment altogether for performance in the categories that matter.
Similar arguments can be made regarding information-based statistics recently implemented in land-cover change analysis (Wear and Bolstad, 1998). Consider the entropy of a system ($H$) defined on a set of $I$ possible states $i$, $i \in I$: i.e.,
$$H = -\sum_{i=1}^{I} p(i)\,\ln[p(i)], \tag{3}$$
where $p$, the probability of observing state $i \in I$ under a null model, is set at $1/I$ for each of $I$ states (Hauser, 1978). For illustrative purposes, let the possible states be types of land cover observed in a landscape consisting of a grid of $n$ cells (or pixels). Let there also exist some model, represented by $X$, producing estimates of probabilities within a mutually exclusive, collectively exhaustive partition of the grid. Then, the information added by the model, $X$, is written as
$$\mathrm{Inf}(I, X) = \frac{1}{n}\sum_{k=1}^{n}\sum_{i=1}^{I} \delta_{ki}\,\ln\!\left[\frac{p_k(i \mid X)}{p(i)}\right], \tag{4}$$
where $p_k(i \mid X)$ is the predicted (or conditional) probability of observing $i$, calculated from the model (for each grid cell, $k$), $p(i)$ is the same as in Equation 3, and $\delta_{ki}$ is a weight of 1 or 0, such that $\delta_{ki} = 1$ for the state $i$ indicating the observed land cover in grid cell $k$ (0 otherwise). This formulation (Hauser, 1978) has been used to test whether a landscape model possesses explanatory power through its ability to reduce the level of entropy in map data (Wear and Bolstad, 1998). However, the statistic is global in nature, and the operative null is complete entropy. Even if the global result were of interest, a test against an entropy hypothesis may not be very useful. Concluding that the model does better than predicting a probability of $1/I$ for observing some land cover $i$ in an arbitrary pixel does not mean that the model is a good one.
In summary, the problems with the Kappa and information-based statistics in testing model performance are twofold. First, the underlying null hypotheses may be so weak that they lead to tests that are misleadingly strong. Although more credible than the information-based null hypothesis, Kappa’s assumption of independence between reality and model predictions can still overstate the utility of a bad model at predicting land-cover change. The second problem involves the global nature of the tests, which may conceal poor performance for categories that matter in any given analysis.
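The two global statistics discussed in this section can be computed in a few lines. The following Python sketch is illustrative only: the function names are hypothetical, the 2 x 2 counts are those reported in Table 1a of the application later in the paper, and the four-cell probability data used for the information measure are invented for the example.

```python
import math

def kappa(table):
    """Kappa, (P0 - Pc)/(1 - Pc), from a square table of counts."""
    size = len(table)
    n = sum(sum(row) for row in table)
    p0 = sum(table[i][i] for i in range(size)) / n
    row_share = [sum(table[i]) / n for i in range(size)]
    col_share = [sum(table[r][j] for r in range(size)) / n for j in range(size)]
    pc = sum(a * b for a, b in zip(row_share, col_share))
    return (p0 - pc) / (1.0 - pc)

def entropy_uniform(num_states):
    """Equation 3 with p(i) = 1/I for every state; this equals ln(I)."""
    p = 1.0 / num_states
    return -sum(p * math.log(p) for _ in range(num_states))

def information_added(observed, predicted):
    """Equation 4: average of ln[p_k(i|X) / p(i)] at the observed state of
    each cell, with the null p(i) = 1/I.

    observed[k]  index of the state observed in cell k
    predicted[k] list of model probabilities over the I states for cell k
    """
    null_p = 1.0 / len(predicted[0])
    total = sum(math.log(pk[obs] / null_p) for obs, pk in zip(observed, predicted))
    return total / len(observed)

# Counts from Table 1a (both margins are 205462 and 9103).
table_1a = [[197662, 7800], [7800, 1303]]
print(kappa(table_1a))

# Hypothetical four-cell example with two states.
observed = [0, 0, 1, 0]
model = [[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.7, 0.3]]
naive = [[0.5, 0.5]] * 4
print(information_added(observed, model), information_added(observed, naive))
```

A model that reproduces the null probability $1/I$ in every cell adds no information, while any model that places more than $1/I$ on the observed states scores above zero, which illustrates how weak the entropy null is.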
Extending and Disaggregating the Hypothesis Set
In criticizing the Kappa statistic, Pontius (2000) presents a set of baselines that originate from a consideration of its foundational null hypothesis, which is that a process of pure chance governs the classification (or prediction) of observational units into map (or model) categories. Pontius (2002) defines his baselines as expected proportions of correct classifications, and organizes them along two dimensions, namely, the ability to predict quantity (i.e., extents of land cover or types of land-cover changes in particular classes) and the ability to predict location, respectively. The locational hypothesis may be thought of as referring to individual points on a map or grid cells. Given an actual land cover at some point, locational ability could be taken as the model’s success at predicting the land cover at that point, which is analogous to so-called producer’s accuracy (Nelson and Hellerstein, 1997; Congalton and Green, 1999). By way of contrast, quantity ability is an aggregate characteristic referring to a model’s ability to predict the actual magnitude of some land cover (or land-cover change category) in the landscape. The distinction between locational and quantity ability is based on the observation that a model may be able to correctly place the location of a particular land cover, at the same time that it does not predict well the actual amount of the land cover occurring in the landscape at large.
Preliminary Setting
Pontius (2000) does not develop his baselines as explicit probabilistic hypotheses, which compromises their utility as a tool for inference, and it is a goal of the paper to fill this gap. To this end, consider two landscape maps for the identical area, one giving the actual land cover (r) and the other, a simulated
land cover (s). Consistent with Pontius (2000), the simulated land cover map is assumed to be created by the distribution of a fixed number of predicted outcomes across a grid with $n$ locations, as opposed to allowing prediction realizations to occur at each cell according to some fixed probability. To facilitate the exposition of the sampling space and stochastic process, let there be two possible states of land cover (or land-cover change) belonging to a set, $I$. Without loss of generality, let $I = \{0, 1\}$. Then each location, $k$, in the two maps has two possible realizations, $C_{rk}$ and $C_{sk}$, where $C$ is a Bernoulli variable indicating by subscript an outcome for (1) the real land-cover map at the given location or (2) the simulated land cover. Thus, $C_{rk} = 0_r$ or $1_r$, and $C_{sk} = 0_s$ or $1_s$, where $r$ indicates the variable and outcome for the real map, and $s$, for the simulated map. Let there be $v$ $1_r$’s and $m$ $1_s$’s, in which case the probability of sampling a pixel from the real map that shows land cover 1 is $v/n$ and, from the simulated map, $m/n$. Define these as $R_1 = \Pr(C_{rk} = 1)$ and $S_1 = \Pr(C_{sk} = 1)$. Hence, $R_1 = v/n$ and $S_1 = m/n$. The locations of the $1_r$’s are fixed, while the $1_s$’s are distributed randomly over the landscape grid. For both realizations, the 0’s and 1’s exhaust all locations, and are mutually exclusive. The number of $0_r$’s is therefore $n - v$, and the number of $0_s$’s, $n - m$. Randomness arises from the manner in which the fixed number ($m$) of $1_s$’s is distributed over the grid locations; this depends, in turn, on assumptions regarding model performance. The stochastic setting just described conforms to a non-free, or sampling without replacement, framework because the number of realizations for both the reality and simulated states are fixed (Cliff and Ord, 1973).
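The non-free sampling setting can be mimicked by simulation. The sketch below is a hedged illustration rather than part of the original analysis: the grid size, the counts $v$ and $m$, the number of trials, and the function name are all hypothetical, and the placement of the $m$ simulated 1's is purely random, which corresponds to a model with no locational ability. The simulated mean fraction of matching cells is compared with $R_0 S_0 + R_1 S_1$, the expectation derived later in the paper for this case.

```python
import random

def simulate_fraction_correct(n, v, m, trials, seed=0):
    """Average share of cells where the real and simulated maps agree when
    m simulated 1's are placed at random, without replacement, on n cells."""
    rng = random.Random(seed)
    real = [1] * v + [0] * (n - v)            # fixed locations of the real 1's
    total = 0.0
    for _ in range(trials):
        ones = set(rng.sample(range(n), m))   # exactly m simulated 1's per map
        hits = sum(1 for k in range(n) if real[k] == (1 if k in ones else 0))
        total += hits / n
    return total / trials

# Hypothetical landscape: 200 cells, 40 real changes, 40 predicted changes.
n, v, m = 200, 40, 40
r1, s1 = v / n, m / n
expected = (1 - r1) * (1 - s1) + r1 * s1      # R0*S0 + R1*S1
simulated = simulate_fraction_correct(n, v, m, trials=2000)
print(expected, simulated)
```

Because every simulated map contains exactly $m$ predicted changes, the matches at different cells are dependent, which is why the text treats the setting as sampling without replacement.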
Forming the Hypotheses
Distributional characteristics can now be ascertained to develop tests on matching between the reality and simulated conditions. However, it is necessary to identify the statistic of interest, which is taken here as the fraction correctly classified, or $f_n$, given as
$$f_n = \frac{\sum_{k=1}^{n} X_k}{n} \quad \text{for } n \text{ locations}, \tag{5}$$
where $X_k$ is a random variable such that $X_k = 1$ when $C_{rk} = C_{sk}$, for grid cell $k$ (and 0 otherwise). The expectation is
$$E(f_n) = \frac{E\!\left(\sum_{k=1}^{n} X_k\right)}{n} = \frac{\sum_{k=1}^{n} E(X_k)}{n}. \tag{6}$$
Because the $X_k$, $k \in [1, \ldots, n]$, are identically distributed (but dependent), we have
$$E(f_n) = \frac{1}{n}\,n\,E(X_k), \quad \text{or} \quad \Pr(\text{correctly classified}) = P = \sum_{i=1}^{I} \Pr(C_{rk} = i \cap C_{sk} = i), \tag{7}$$
where Pr stands for probability. This is consistent with Card (1982) and Congalton and Green (1999). Let $E(X_k) = P$.
The development begins with the baselines of Pontius (2000), which are mixed with varying degrees of locational and quantity ability. A model can have perfect, partial, or no ability to predict the location of a particular land cover, and similarly for its ability to predict magnitudes of land cover. To build an inferential apparatus, these baselines must be translated into corresponding null hypotheses. To this end, a single operational hypothesis will consist of a combination of claims on locational and quantity ability. To simplify the exposition, only two locational baselines are considered, namely, no locational ability and perfect locational ability, while the quantity baseline is restricted to the perfect case (Pijanowski et al., 2001; Pijanowski et al., 2002). The two locational baselines selected present opposite ends of the hypothesis continuum regarding model predictive capability, and will be developed in the sequel as forms of producer’s and user’s accuracy, which are conditional probabilities involving the random variables, $C_{rk}$ and $C_{sk}$. An approach is presented that can be used to specify these probabilities (with confidence intervals), describing locational ability as defined (Card, 1982).
The probability that an arbitrary cell shows the same state under simulation as observed empirically is the probability of the intersection event, $\Pr(C_{rk} = i \cap C_{sk} = i)$, where $i \in \{0, 1\}$. By Bayes’ rule, this can be written as $\Pr(C_{rk} = i)\,\Pr(C_{sk} = i \mid C_{rk} = i)$. The null hypotheses for locational ability (Pontius, 2000) will be defined here on the above conditional probability, as well as the converse, $\Pr(C_{rk} = i \mid C_{sk} = i)$.
The quantity ability hypothesis is embedded in the unconditional probability that the simulated state for an arbitrary map cell is $i$, or $\Pr(C_{sk} = i)$, where $S_i = \Pr(C_{sk} = i)$ for $i \in \{0, 1\}$. Recall also that $R_i = \Pr(C_{rk} = i)$. The predicted probability, $S_i$, can be assumed known as in Pontius (2000), an assumption that ultimately will be relaxed. Knowledge of $R_i$ has strong precedent in the map marginal probabilities of Card (1982) and Congalton and Green (1999). With no quantity ability, a completely naïve model predicts an arbitrary cell is in state $i$ with probability $1/I$, given $I$ possible states (Hauser, 1978; Pontius, 2000), or $\Pr(C_{sk} = i) = 1/I$ ($= 0.5$, for $I = \{0, 1\}$). With partial predictive ability, it is $S_i$ [$\Pr(C_{sk} = i) = S_i$], while perfect ability yields $R_i$, and $\Pr(C_{sk} = i) = R_i$. From Equation 7,
$$P = \sum_{i=1}^{I} \Pr(C_{rk} = i \cap C_{sk} = i),$$
where the summation is over all states. As noted, the intersection terms may be rewritten by Bayes’ rule as
$$\Pr(C_{rk} = i)\,\Pr(C_{sk} = i \mid C_{rk} = i) \quad \text{or} \quad \Pr(C_{sk} = i)\,\Pr(C_{rk} = i \mid C_{sk} = i).$$
The locational hypotheses may now be stated in terms of the conditional probability, in conjunction with the quantity ability specifications on the unconditional probabilities, $1/I$, $S_i$, and $R_i$. If a model possesses no locational ability, it is asserted that $\Pr(C_{sk} = i \mid C_{rk} = i) = \Pr(C_{sk} = i)$ and $\Pr(C_{rk} = i \mid C_{sk} = i) = \Pr(C_{rk} = i)$. Thus, the probability of correct classification for the naïve case is
$$P = \sum_{i=1}^{I} \Pr(C_{rk} = i)\,\Pr(C_{sk} = i \mid C_{rk} = i) = \frac{1}{I}\sum_{i=1}^{I} \Pr(C_{rk} = i) = \frac{1}{I}. \tag{8}$$
For the partial quantity model, we have
$$P = \sum_{i=1}^{I} \Pr(C_{rk} = i)\,\Pr(C_{sk} = i \mid C_{rk} = i) = \sum_{i=1}^{I} R_i S_i \tag{9}$$
and, for the perfect quantity model,
$$P = \sum_{i=1}^{I} \Pr(C_{rk} = i)\,\Pr(C_{sk} = i \mid C_{rk} = i) = \sum_{i=1}^{I} R_i^2. \tag{10}$$
These probabilities are the baselines given in Table 2 of Pontius (2000) for the same assumptions on model performance, in particular, no quantity/no locational ability (NQNL), medium quantity/no locational ability (MQNL), and perfect quantity/no locational ability (PQNL). For the case of perfect locational ability, the conditional probabilities must be specified for each of the quantity assumptions. Given a sampling without replacement framework, the perfect quantity assumption yields $\Pr(C_{sk} = i \mid C_{rk} = i) = 1$. This is because perfect locational ability will place the fixed number of predicted cells with type $i$ cover in the same set of locations that actually show type $i$ cover. Hence,
$$P = \sum_{i=1}^{I} \Pr(C_{rk} = i \cap C_{sk} = i) = \sum_{i=1}^{I} \Pr(C_{rk} = i)\,\Pr(C_{sk} = i \mid C_{rk} = i) = \sum_{i=1}^{I} \Pr(C_{rk} = i) = \sum_{i=1}^{I} R_i = 1. \tag{11}$$
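The baselines obtained so far (Equations 8 through 11) reduce to simple sums over the marginal shares. The Python sketch below evaluates them; the function names and the two-state shares are hypothetical values chosen for illustration, not data from the application.

```python
def no_location_baselines(real_shares, simulated_shares):
    """Expected fraction correct with no locational ability.

    real_shares      R_i, share of each state in the real map
    simulated_shares S_i, share of each state in the simulated map
    """
    num_states = len(real_shares)
    nqnl = sum(r * (1.0 / num_states) for r in real_shares)           # Equation 8
    mqnl = sum(r * s for r, s in zip(real_shares, simulated_shares))  # Equation 9
    pqnl = sum(r * r for r in real_shares)                            # Equation 10
    return {"NQNL": nqnl, "MQNL": mqnl, "PQNL": pqnl}

def perfect_quantity_perfect_location(real_shares):
    """Equation 11: every conditional probability is 1, so P = sum of R_i."""
    return sum(real_shares)

# Hypothetical shares for two states (0 = no change, 1 = change).
R = [0.75, 0.25]
S = [0.875, 0.125]
baselines = no_location_baselines(R, S)
print(baselines, perfect_quantity_perfect_location(R))
```

With these shares the naïve baseline is $1/I = 0.5$, while the perfect quantity baseline rises to 0.625 without any locational ability, which shows how much apparent accuracy a dominant category can generate on its own.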
In the partial quantity case, similar reasoning yields $\Pr(C_{sk} = i \mid C_{rk} = i) = 1$ if $S_i \ge R_i$ and $\Pr(C_{rk} = i \mid C_{sk} = i) = 1$ if $R_i > S_i$. For a correct classification event, we have
$$\Pr(C_{rk} = i \cap C_{sk} = i) = \Pr(C_{rk} = i \cap C_{sk} = i \cap S_i \ge R_i) + \Pr(C_{rk} = i \cap C_{sk} = i \cap R_i > S_i). \tag{12}$$
Observe that $S_i \ge R_i$ and $R_i > S_i$ are mutually exclusive and known for each category and, hence, are independent of the other events in the two intersections of Equation 12. Consequently, by Bayes’ rule we have
$$\Pr(C_{rk} = i \cap C_{sk} = i) = \Pr(C_{rk} = i)\,\Pr(C_{sk} = i \mid C_{rk} = i)\,\Pr(S_i \ge R_i) + \Pr(C_{sk} = i)\,\Pr(C_{rk} = i \mid C_{sk} = i)\,\Pr(R_i > S_i). \tag{13}$$
Further, given $S_i$ and $R_i$ are non-random, $P(S_i \ge R_i)$ and $P(R_i > S_i)$ equal 0 or 1, and the above term yields $R_i$ when $S_i \ge R_i$ and $S_i$ when $R_i > S_i$, or $\min(R_i, S_i)$. Similar reasoning holds for the naïve model, substituting $1/I$ everywhere for $S_i$, in which case $P(C_{rk} = i \cap C_{sk} = i) = \min(1/I, R_i)$. Collecting terms, we have
$$P = \sum_{i=1}^{I} \min(1/I, R_i) \tag{14}$$
for the naïve model, and
$$P = \sum_{i=1}^{I} \min(S_i, R_i) \tag{15}$$
for the partial quantity model. Note that $P$ is 1 when perfect quantity ability ($S_i = R_i$) is combined with perfect locational ability, as in Equation 11. As with the case for no locational ability, these probabilities may also be observed as baselines for perfect quantity/perfect locational ability (PQPL), perfect quantity/no locational ability (PQNL), and perfect quantity/medium locational ability (PQML) in Pontius (2000).
The Hypothesis Testing Framework
The development to this point has focused on providing a probabilistic rationale for certain of the baselines of Pontius (2000) in the interest of developing a formal approach to hypothesis testing. Inference requires information on variance, however. At this point, attention is restricted to only one hypothesis, the case of perfect quantity/no locational (PQNL) ability. This is because the quantity component of the hypothesis has precedent in land-cover change modeling (Pijanowski et al., 2001; Pijanowski et al., 2002). In such applications, an approach taken to converting estimated probabilities that a land-cover change occurs to actual outcomes, which are discrete, is to (1) rank the spatial units by probability of change, and then to (2) indicate change cells by their probability ranking, selecting as many as the number of cells actually showing change in the landscape. This amounts to a perfect quantity assumption. The expected probability values for percent correctly classified can be stated directly at this point from the previous derivations. Thus, for the case with two possible land covers, $P$ for PQNL is $R_0 R_0 + R_1 R_1$. Expression of variance necessitates explicit treatment of the probability distributions. This is now done, with a slight generalization to the case of partial
quantity. Thus, continuing with the earlier discussion on the probability process, let there be $n$ grid cells, $v$ fixed locations of actual change (e.g., deforestation), and $m$ locations that are predicted to change (to be distributed over the $n$ locations). Without loss of generality, assume $m \le v$, or $R_0 \le S_0$ and $R_1 \ge S_1$. Note that the perfect quantity case is then given by the equality condition, or $R_0 = S_0$ and $R_1 = S_1$. Let $x$ be the number of hits, or correct predictions for land-cover state 1, where a predicted change occurs at the same location as an actual change. Correct predictions (i.e., successes) can also be defined on random locations of the no change category, 0. Let $y$ be the number of “hits” in this regard, in which case $y = n - v - (m - x)$.
Consider the probability that $x$ correct predictions of change will be made under PQNL assumptions. There are $\binom{n}{m}$ ways to locate the $m$ predicted changes over the entire sample space, $\binom{v}{x}$ ways to distribute $x$ predicted changes across the $v$ locations with actual change, and $\binom{n-v}{m-x}$ ways to distribute the remainder over the non-change partition. Hence, under the PQNL assumptions, $X$ is a random variable following the hypergeometric distribution, which gives the probability of observing $x$ correct predictions of change as
$$P(X = x) = \frac{\binom{v}{x}\binom{n-v}{m-x}}{\binom{n}{m}}.$$
The expected value of $X$ is $EX = (mv)/n$, and, therefore, the required expectation of the percent correctly classified for the change state ($i = 1$) is $(mv)/n^2 = R_1 S_1$. Alternatively, consider the no-change state 0. Given $X$ is a random variable, so is $Y$, and $Y = X + n - (v + m)$. To arrive at the fraction correctly classified, divide by $n$, or $Y/n = X/n + 1 - (v + m)/n$. The expectation of correct classifications for this state is then $E(Y/n) = R_1 S_1 + 1 - (S_1 + R_1) = (1 - R_1)(1 - S_1) = R_0 S_0$. The distribution of interest, of course, is $(Y + X)/n$, whose expectation is the sum of the individual fractions, or
$$E[(Y + X)/n] = R_0 S_0 + R_1 S_1, \tag{16}$$
which is the same as Equation 10 for two states when $R_0 = S_0$ and $R_1 = S_1$. Variance for the total percent correctly classified, $[(X + Y)/n]$, may be obtained by observing
$$\mathrm{Var}(X) = \left[\frac{mv}{n}\right]\left[1 - \frac{v}{n}\right]\left[1 - \frac{m - 1}{n - 1}\right], \tag{17}$$
given it possesses a hypergeometric distribution (Hoel et al., 1971). To facilitate the exposition, let $EX = \mu$ and $\mathrm{Var}(X) = \sigma^2$, where $\mu$ and $\sigma^2$ are constants associated with the distribution of $X$. In addition, define the probability of a correct classification for outcome 1 as $P_{11} = \Pr(C_{rk} = 1 \cap C_{sk} = 1)$ for some arbitrary location. Its estimator is $\hat{P}_{11} = X/n$, and
$$\mathrm{Var}\,\hat{P}_{11} = \mathrm{Var}(X/n) = (1/n^2)\,\mathrm{Var}(X). \tag{18}$$
Because $Y = X + c$, where $c = n - (v + m)$, $\mathrm{Var}(Y) = \mathrm{Var}(X)$. Furthermore, $\mathrm{Cov}(X, Y) = E\{(X - EX)(Y - EY)\} = E(XY) - EX\,EY$. By the definition of $Y$ ($= X + c$, with $c$ constant), this is equal to $E\{X(X + c)\} - EX\{E(X + c)\}$. Thus, $\mathrm{Cov}(X, Y) = EX^2 + c\mu - \mu^2 - c\mu = EX^2 - \mu^2 = \mathrm{Var}(X)$. Furthermore, $\mathrm{Var}[(X + Y)/n] = (1/n^2)\{\mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y)\} = (4\sigma^2)/n^2$.
It is of some interest to note the degenerate case presented by PQPL, also given in Pontius (2000). In this case, $m = v$ (or $R_1 = S_1$), and perfect locational ability will always achieve precisely $v$ correct classifications of class 1, and $n - v$ correct
classifications of class 0. Thus, the percentage correctly classified will always be $[v + (n - v)]/n = R_1 + R_0 = 1$. Because the percentage correctly classified does not vary, variance under this assumption is zero.
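The moments used in this section can be checked by exact enumeration on a small grid. The sketch below is a verification aid under stated assumptions, not part of the original derivation: the values of $n$, $v$, and $m$ and the function names are hypothetical, and the hypergeometric probabilities are computed directly from binomial coefficients.

```python
from math import comb, isclose

def hit_pmf(n, v, m):
    """P(X = x): x of the m predicted changes fall on the v actual changes."""
    lo, hi = max(0, m - (n - v)), min(m, v)
    return {x: comb(v, x) * comb(n - v, m - x) / comb(n, m)
            for x in range(lo, hi + 1)}

def moments(pmf):
    """Mean and variance of a discrete distribution given as {value: prob}."""
    mean = sum(x * p for x, p in pmf.items())
    var = sum((x - mean) ** 2 * p for x, p in pmf.items())
    return mean, var

# Hypothetical grid with m <= v.
n, v, m = 50, 12, 9
pmf = hit_pmf(n, v, m)
mean_x, var_x = moments(pmf)

closed_mean = m * v / n                                            # EX
closed_var = (m * v / n) * (1 - v / n) * (1 - (m - 1) / (n - 1))   # Equation 17

# Fraction correct is (X + Y)/n with Y = X + c and c = n - (v + m).
c = n - (v + m)
frac_pmf = {(2 * x + c) / n: p for x, p in pmf.items()}
mean_f, var_f = moments(frac_pmf)

r1, s1 = v / n, m / n
print(mean_x, closed_mean)
print(var_x, closed_var)
print(mean_f, (1 - r1) * (1 - s1) + r1 * s1)    # Equation 16
print(var_f, 4 * closed_var / n ** 2)           # Var[(X + Y)/n]
```

The enumerated and closed-form values agree to floating-point precision, including the factor of 4 that arises because $X$ and $Y$ move together.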
An Application and Extension
Table 1a presents pixel-level output from a logistic model predicting land-cover change, from natural and agricultural lands to urban, between 1978 and 1998, in the lower Muskegon River watershed of Michigan (Figure 1). This application is presented for illustrative purposes, and uses as independent variables distance to urban areas, to single family homes, to county roads, to state highways, to rivers, and to lakes. The data were developed through a land-cover change-detection analysis for the period between 1978 and 1998. The 1978 data were produced by the Michigan State Department of Natural Resources using color-infrared aerial photography. The 1998 data came from the National Aerial Photography Program (NAPP), and were also based on color-infrared images. A modified Anderson classification was used to identify the land-use categories. The GIS grid unit for statistical analysis was taken as a 100-m² pixel. Standard measures of model performance from the statistical software (SAS) indicate that 70 percent of the pixels are correctly classified, and the likelihood ratio index is 0.07. Figures 2 and 3 show the spatial outcomes for error and correct predictions, respectively.

Figure 1. State of Michigan: Study area in Muskegon River Watershed.

TABLE 1A. ERROR MATRIX (Total Area: 214565 Pixels)
                           Actual
Model Output          0          1          ∑          S
0                197662       7800     205462      0.957
1                  7800       1303       9103      0.042
∑                205462       9103     214565
R                 0.957      0.042
Note: The off-diagonal elements are equal, given the assumption that actual change and predicted change are the same.

TABLE 1B. HYPOTHESIS TEST, PQNL
H0: $P_{11} = (R_1)^2$            0.001798
$\hat{P}_{11}$                    0.006073
$[\mathrm{Var}\,\hat{P}_{11}]^{1/2}$ (1)        0.0000881
Confidence Interval (2)           (0.005897, 0.006249)
(1) Calculated from Equation 18. (2) Taken as two standard deviations.

TABLE 1C. PRODUCER’S ACCURACY
$\hat{\theta}_{11}$               0.143140
$[\mathrm{Var}\,\hat{\theta}_{11}]^{1/2}$ (3)   0.0036613
Confidence Interval (2)           (0.135818, 0.150462)
(2) Taken as two standard deviations. (3) Calculated from Equation 19.

To consider the Pontius approach to error assessment, a perfect quantity assumption was implemented to produce the predicted pixel changes (Pijanowski et al., 2001; Pijanowski et al., 2002). As can be observed in Table 1a, the model predicts that 4.2 percent of the map pixels are in class one ($\Pr[C_{sk} = 1] = 0.042$), which by assumption is the same as $R_1$. Table 1b presents relevant information for a test on the PQNL hypothesis, for which only single category performance is considered (i.e., predictive success for class 1). The null hypothesis for this case is that the probability land cover is predicted to
change in a cell where change occurs is 0.001798; the observed fraction is 0.006073, and the confidence interval clearly counsels rejection of the null hypothesis. Thus, the model appears to be a “good” one in the sense that it improves on the hypothesis of no locational ability.
Of course, this raises the question, how good? The Pontius (2000) critique does not provide a clear way of answering this question, because hypothesized probabilities are specified in advance, in which case the goodness-of-fit measures are fixed as reference points to test against. Nevertheless, it is possible to move directly to estimators of two useful conditional probabilities, namely, $\Pr(C_{sk} = i \mid C_{rk} = i)$, so-called producer’s accuracy, and $\Pr(C_{rk} = i \mid C_{sk} = i)$, known as user’s accuracy. This is accomplished through an application of maximum-likelihood estimation to the appropriate probabilities in a contingency table (Card, 1982).
It is important to point out differences between the present formulation and that of Card (1982), whose truth category functions as the model output category here. This is because Card’s truth category possesses a random component given that ground points with “true” data are sampled from map locations (given sampling cost). By way of contrast to Pontius (2000), the stochastic process underlying the map and prediction realizations in the present formulation is free, at least partly. The $R_i$ for the real map categories are fixed as before (as in Card (1982)), but the outcomes of the individual predictions of land cover (or land-cover change) are based on an underlying probability, $P_i$, in which case the marginal values of $S_i$ are not fixed, as would be consistent with a spatially explicit model with a random dependent variable. To facilitate the exposition, represent $\Pr(C_{sk} = i \cap C_{rk} = j)$ as $P_{ij}$, so that the first index refers to the model output and the second to the real map, as in Table 1a. Note that when $i = j$, we observe the probability of a correct prediction. Let the log likelihood function for the $P_{ij}$’s
associated with the contingency Table 1a be $\mathcal{L} = \sum_{i}^{I}\sum_{j}^{I} n_{ij} \log P_{ij}$, where the $n_{ij}$ are the counts of the map units, pixels, or grid cells counted in the $ij$ box of the table, and where $i, j$ are elements of the set $I$ (Card, 1982). Maximizing this function with the global constraint $\sum_{i}\sum_{j} P_{ij} = 1$, and the constraint on known map probabilities for the actual land covers, $\sum_{i}^{I} P_{ij} = R_j, \forall j$, yields the estimator $n_{ij}/n$ for $P_{ij}$. This is the same as would be given without the constraint on the “map” probabilities, because $n_{\cdot j}/n = R_j$, where $n_{\cdot j} = \sum_{i}^{I} n_{ij}$, which implies that the sum of the Lagrange multipliers used for maximization is equal to $n$ (see Equation A3 in Card (1982)).

Figure 2. (a) Lower Basin Muskegon River Watershed: Underpredicted pixels. (b) Lower Basin Muskegon River Watershed: Overpredicted pixels.
Figure 3. (a) Lower Basin Muskegon River Watershed: Correctly classified pixels with change. (b) Lower Basin Muskegon River Watershed: Correctly classified pixels with no change.

Under the invariance property of the maximum-likelihood estimates (Card, 1982:437), it is then possible to state estimators of the probabilities of interest such as producer’s accuracy, $\Pr(C_{sk} = i \mid C_{rk} = i)$. Here, producer’s accuracy is taken as in Nelson and Hellerstein (1997), in which the simulated, or predicted, output is conditioned on the actual land cover, as determined by a map. By way of contrast, in Card (1982) and Congalton and Green (1999), the map is the equivalent to a model output, and producer’s accuracy is the probability that
01-130.qxd 10/8/03 12:16 PM Page 1277
the map indicates a land cover, given that ground truth data also show this cover. In particular, define ii Pr(Csk iCrk i) PiiRi. Then ˆii Pˆ iiRi, and the asymptotic variance is Var ˆii Pii (Ri Pii)nRi3,
(19)
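where $\operatorname{Var}\hat{P}_{ii} = P_{ii}(R_i - P_{ii})/(nR_i)$. The probability $\Pr(C_{rk} = i \mid C_{sk} = i)$, or user’s accuracy, is also of interest and is characterized as follows. Let $\lambda_{ii} = \Pr(C_{rk} = i \mid C_{sk} = i) = P_{ii}/P_i$. Then $\hat{\lambda}_{ii} = \hat{P}_{ii}/\hat{P}_i$, and its asymptotic variance can be stated by considering a first-order Taylor series expansion around $\lambda_{ii}$. Define $f(a, b) = a/b$. Then $f(\hat{P}_{ii}, \hat{P}_i) = \hat{P}_{ii}/\hat{P}_i$, $f(P_{ii}, P_i) = P_{ii}/P_i$, and the Taylor series is
$$f(\hat{P}_{ii}, \hat{P}_i) \approx f(P_{ii}, P_i) + \frac{\partial f}{\partial P_{ii}}(P_{ii}, P_i)\,(\hat{P}_{ii} - P_{ii}) + \frac{\partial f}{\partial P_i}(P_{ii}, P_i)\,(\hat{P}_i - P_i).$$
Because $\frac{\partial f}{\partial P_{ii}}(P_{ii}, P_i) = \frac{1}{P_i} = \frac{\lambda_{ii}}{P_{ii}}$ and $\frac{\partial f}{\partial P_i}(P_{ii}, P_i) = -\frac{P_{ii}}{P_i^{2}} = -\frac{\lambda_{ii}}{P_i}$, the approximation is
$$\hat{\lambda}_{ii} \approx \lambda_{ii} + \frac{\lambda_{ii}}{P_{ii}}(\hat{P}_{ii} - P_{ii}) - \frac{\lambda_{ii}}{P_i}(\hat{P}_i - P_i),$$
from which it follows that
$$\operatorname{Var}\hat{\lambda}_{ii} = E(\hat{\lambda}_{ii} - \lambda_{ii})^{2} = \left(\frac{\lambda_{ii}}{P_{ii}}\right)^{2}\operatorname{Var}\hat{P}_{ii} + \left(\frac{\lambda_{ii}}{P_i}\right)^{2}\operatorname{Var}\hat{P}_i - \frac{2\lambda_{ii}^{2}}{P_{ii}P_i}\,E(\hat{P}_{ii} - P_{ii})(\hat{P}_i - P_i). \tag{20}$$
Substituting for the variances of $\hat{P}_{ii}$ and $\hat{P}_i$, where $\operatorname{Var}\hat{P}_i = \operatorname{Var}\hat{P}_{ii} + \sum_{j \neq i}^{r}\operatorname{Var}\hat{P}_{ij}$ (with $r$ the number of categories in $I$), $\operatorname{Var}\hat{P}_{ii} = P_{ii}(R_i - P_{ii})/(nR_i)$, and $\operatorname{Var}\hat{P}_{ij} = P_{ij}(R_j - P_{ij})/(nR_j)$ (Card, 1982), it follows that
$$\operatorname{Var}\hat{\lambda}_{ii} = \left(\frac{\lambda_{ii}}{P_{ii}}\right)^{2}\frac{P_{ii}(R_i - P_{ii})}{nR_i} + \left(\frac{\lambda_{ii}}{P_i}\right)^{2}\left[\frac{P_{ii}(R_i - P_{ii})}{nR_i} + \sum_{j \neq i}\frac{P_{ij}(R_j - P_{ij})}{nR_j}\right] - \frac{2\lambda_{ii}^{2}}{P_{ii}P_i}\left[E\hat{P}_{ii}\hat{P}_i - P_{ii}P_i\right]. \tag{21}$$
By definition, $E\hat{P}_{ii}\hat{P}_i = E\sum_{j=1}^{I}\hat{P}_{ii}\hat{P}_{ij}$. Because $\operatorname{cov}(\hat{P}_{ij}, \hat{P}_{ik}) = 0$ for $j \neq k$ (Card, 1982), it follows that $E\hat{P}_{ii}\hat{P}_{ij} = P_{ii}P_{ij}$ for $j \neq i$. Hence, $E\hat{P}_{ii}\hat{P}_i = P_{ii}\{P_{i1} + P_{i2} + \dots + P_{ir}\} + \operatorname{Var}\hat{P}_{ii}$, because $E\hat{P}_{ii}\hat{P}_{ii} = \operatorname{Var}\hat{P}_{ii} + P_{ii}P_{ii}$, by the definition of variance. Further, because $\sum_{j}^{I} P_{ij} = P_i$, $E\hat{P}_{ii}\hat{P}_i = P_{ii}P_i + \operatorname{Var}\hat{P}_{ii}$, and
$$E(\hat{\lambda}_{ii} - \lambda_{ii})^{2} = \left[\left(\frac{\lambda_{ii}}{P_{ii}}\right)^{2} + \left(\frac{\lambda_{ii}}{P_i}\right)^{2}\right]\frac{P_{ii}(R_i - P_{ii})}{nR_i} + \left(\frac{\lambda_{ii}}{P_i}\right)^{2}\sum_{j \neq i}\frac{P_{ij}(R_j - P_{ij})}{nR_j} - \frac{2\lambda_{ii}^{2}}{P_{ii}P_i}\,\frac{P_{ii}(R_i - P_{ii})}{nR_i}$$
$$= \left[\left(\frac{\lambda_{ii}}{P_{ii}}\right)^{2} + \left(\frac{\lambda_{ii}}{P_i}\right)^{2} - \frac{2\lambda_{ii}^{2}}{P_{ii}P_i}\right]\frac{P_{ii}(R_i - P_{ii})}{nR_i} + \left(\frac{\lambda_{ii}}{P_i}\right)^{2}\sum_{j \neq i}\frac{P_{ij}(R_j - P_{ij})}{nR_j}. \tag{22}$$
Upon observing that $\lambda_{ii} = P_{ii}/P_i$, further manipulation yields
$$E(\hat{\lambda}_{ii} - \lambda_{ii})^{2} = P_{ii}P_i^{-4}\left[P_{ii}\sum_{j \neq i}^{I}\frac{P_{ij}(R_j - P_{ij})}{nR_j} + \frac{(R_i - P_{ii})(P_i - P_{ii})^{2}}{nR_i}\right], \tag{23}$$
which is identical to Equation 28 in Card (1982), the asymptotic variance for the case of simple random sampling. The statistic, $\hat{\lambda}_{ii}$, however, differs by virtue of the estimate of $\hat{P}_{ii}$. Indeed, Card’s estimators of the various probabilities and those of the present paper differ mainly by the maximum-likelihood estimate for $\hat{P}_{ii}$, and can be converted to those presented here by taking marginal counts for the map categories as non-random, and equal to $nR_j$ for column $j$. The present presentation takes $\lambda_{ii}$, rather than $\theta_{ii}$, to symbolize user’s accuracy.
Table 1c uses the data from Table 1a to provide a calculation of producer’s accuracy, $\hat{\theta}_{ii}$ ($= \hat{P}_{ii}/R_i$), and associated variance. Note that Equation 23 could be used to provide similar statistics for user’s accuracy, $\lambda_{ii} = \Pr(C_{rk} = i \mid C_{sk} = i)$. The estimate for $\hat{\theta}_{ii}$ is 0.14, with a confidence interval of (0.135818, 0.150462). In other words, the model has a 14 percent chance of predicting class 1, given a grid cell is in class 1. In the sample at large, the model predicts class 1 cover only 4.2 percent of the time, in which case the model adds appreciably to explanation. Nevertheless, 14 percent is not as large as might be hoped for, in which case the model may not be such a “good” one after all.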
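As an illustration only, the estimators of producer’s and user’s accuracy and their asymptotic variances (Equations 19 and 23) might be computed along the following lines. This is a minimal sketch under stated assumptions: the function name `accuracy_estimates` is hypothetical, the counts are invented and are not those of Table 1a, rows are assumed to index the simulated class and columns the real (map) class, the unknown probabilities in the variance formulas are replaced by their estimates, and the interval uses a normal approximation with the 1.96 multiplier.

```python
import numpy as np

def accuracy_estimates(counts):
    """Producer's and user's accuracy with plug-in asymptotic variances.

    counts[i, j] is the number of cells predicted as class i (rows, simulated)
    whose mapped class is j (columns, real). This layout is an assumption.
    """
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    P = counts / n                  # P_hat_ij = n_ij / n
    R = P.sum(axis=0)               # R_j = n_j / n, treated as fixed map marginals
    P_row = P.sum(axis=1)           # P_i = sum_j P_ij, marginal of the predictions
    d = np.diag(P)                  # P_ii

    theta = d / R                                   # producer's accuracy
    var_theta = d * (R - d) / (n * R**3)            # Equation 19

    lam = d / P_row                                 # user's accuracy
    cell_var = P * (R - P) / (n * R)                # Var P_hat_ij; column j uses R_j
    off_diag = cell_var.sum(axis=1) - np.diag(cell_var)   # sum over j != i
    var_lam = d / P_row**4 * (
        d * off_diag + (R - d) * (P_row - d)**2 / (n * R)
    )                                               # Equation 23
    return theta, var_theta, lam, var_lam


# Hypothetical 2 x 2 table (class 0 = change, class 1 = no change).
counts = np.array([[40, 60],
                   [160, 740]])
theta, var_theta, lam, var_lam = accuracy_estimates(counts)

# Normal-approximation 95 percent interval for producer's accuracy of class 0.
half_width = 1.96 * np.sqrt(var_theta[0])
print(theta[0], (theta[0] - half_width, theta[0] + half_width))
print(lam[0], var_lam[0])
```

The sketch assumes every class has at least one mapped and one predicted cell; a class with an empty row or column would lead to division by zero and would need separate handling.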
Discussion and Conclusions
This paper began by considering a recent critique of traditional statistics used in assessments of map and model accuracy, and several new ones that have been proposed, based on concepts of a model’s locational and quantity ability (Pontius, 2000). These new measures, however, lack the probabilistic treatment necessary for inference. The present paper rectifies this, and by way of illustration presents a hypothesis test using derivations of the appropriate distributional properties. The approach taken here was disaggregated by land-cover change category, although tests for aggregate performance can easily be structured by defining the percent correctly classified on all map categories, not just one, following the theoretical development of the paper. It was noted that the new measures are limited in that they do not provide actual estimators of key conditional probabilities. Consequently, the paper developed an alternative based on the map accuracy literature (Card, 1982; Congalton and Green, 1999). Measures of producer’s and user’s accuracy were described, and distributional properties stated, for the case of spatially explicit models. One issue not directly addressed by the paper concerns the role of “quantity” prediction by spatially explicit models. Given that such models typically produce probability measures for land cover or land-cover change (as opposed to a discrete outcome), the modeler is left with the task of converting the model’s predicted probabilities, which are continuous, to actual outcomes, which are discrete. Typically, this involves a rather arbitrary approach, using some threshold probability (Greene, 2000) or a perfect quantity assumption (Pijanowski et al., 2001; Pijanowski et al., 2002).
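The two conversion rules just mentioned can be contrasted in a short sketch. The function names and the probabilities are hypothetical, and the perfect quantity rule is assumed here to mean that the cells with the highest predicted probabilities are labeled as change until the observed number of changed cells is reached; the cited papers may implement it differently.

```python
import numpy as np

def classify_by_threshold(prob, threshold=0.5):
    """Label a cell as change (1) when its predicted probability reaches the threshold."""
    return (np.asarray(prob, dtype=float) >= threshold).astype(int)

def classify_perfect_quantity(prob, n_change):
    """Label the n_change cells with the highest predicted probabilities as change (1)."""
    prob = np.asarray(prob, dtype=float)
    labels = np.zeros(prob.shape, dtype=int)
    # Flat indices of the cells, ordered from highest to lowest probability.
    ranked = np.argsort(prob, axis=None, kind="stable")[::-1]
    labels.flat[ranked[:n_change]] = 1
    return labels

# Hypothetical predicted probabilities of change for six cells.
prob = np.array([0.02, 0.40, 0.10, 0.35, 0.05, 0.01])
by_threshold = classify_by_threshold(prob, threshold=0.5)
by_quantity = classify_perfect_quantity(prob, n_change=2)
print(by_threshold.sum(), by_quantity.sum())
```

With rare change and small predicted probabilities, as in this invented example, a fixed threshold may predict no change at all, whereas the quantity rule reproduces the observed amount of change by construction and leaves only location to be tested.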
For the model presented in the paper, in which a perfect quantity assumption is used to generate the land-cover outcomes, a Pontius (2000) test against the perfect quantity/no locational (PQNL) ability hypothesis is necessarily restricted to a test on the locational dimension of model performance. Presumably, producer’s accuracy in Table 1c is also affected by the quantity assumption. One approach to resolving the issue in this case would be to treat the landscape in an experimental fashion, and generate landscape outcomes based on Bernoulli trials for each pixel, using its estimated probability. If the estimates of the conditional probabilities were then calculated for each landscape outcome, empirical distributions could be generated, allowing for insight into estimated magnitudes and variability.
Spatially explicit models are new creations, enabled by the felicitous combination of GIS software and easily accessible remote sensing data. Interest in them is likely to grow, given our increasing awareness of the role played by land-cover change in environmental problems. As their novelty wears off, however, questions are bound to arise about how good they are. The approach proposed in this paper is meant as a first step toward providing an answer.
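The Bernoulli-trial experiment suggested in the discussion above could be sketched as follows. This is an illustration under assumptions rather than a procedure carried out in the paper: the function name, the probabilities, the observed outcomes, the number of draws, and the seed are all hypothetical, and only producer’s accuracy for the change class is tracked.

```python
import numpy as np

def simulate_producers_accuracy(prob, observed, n_draws=1000, seed=0):
    """Empirical distribution of producer's accuracy for the change class.

    prob: estimated probability of change for each pixel.
    observed: 1 where change actually occurred, 0 otherwise.
    """
    prob = np.asarray(prob, dtype=float).ravel()
    observed = np.asarray(observed, dtype=bool).ravel()
    n_real_change = observed.sum()
    if n_real_change == 0:
        raise ValueError("no observed change cells to condition on")
    rng = np.random.default_rng(seed)
    # One row per simulated landscape, one Bernoulli trial per pixel.
    draws = rng.random((n_draws, prob.size)) < prob
    # Count pixels simulated as change among those where change occurred.
    hits = (draws & observed).sum(axis=1)
    return hits / n_real_change

# Hypothetical estimated probabilities and observed outcomes for eight pixels.
prob = np.array([0.60, 0.30, 0.05, 0.10, 0.02, 0.01, 0.20, 0.03])
observed = np.array([1, 1, 0, 0, 0, 0, 1, 0])
acc = simulate_producers_accuracy(prob, observed, n_draws=2000, seed=42)
print(acc.mean(), np.percentile(acc, [2.5, 97.5]))
```

The mean and percentiles of the simulated values give a rough picture of magnitude and variability; for a real landscape the draws would likely need to be generated in batches to keep memory use manageable.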
Acknowledgments
This research was mostly supported by the National Aeronautics and Space Administration under the project, “Pattern to Process: Research and Applications for Understanding Multiple Interactions and Feedbacks on Land Cover Change (NAG 5-9232),” and by the Wege Foundation under the project entitled, “Building A Sustainable Future For The Muskegon River Watershed: A Decentralized Approach.”
References
Baker, W., 1989. A review of models of landscape change, Landscape Ecology, 2:111–33.
Ben-Akiva, M., and S.R. Lerman, 1985. Discrete Choice Analysis: Theory and Application to Travel Demand, The MIT Press, Cambridge, Massachusetts, 390 p.
Bockstael, N.E., 1996. Modeling economics and ecology: The importance of a spatial perspective, American Journal of Agricultural Economics, 78:1168–1180.
Card, D.H., 1982. Using known map category marginal frequencies to improve estimates of thematic map accuracy, Photogrammetric Engineering & Remote Sensing, 48(3):431–439.
Chomitz, K.M., and D.A. Gray, 1996. Roads, land use, and deforestation: A spatial model applied to Belize, The World Bank Economic Review, 10(3):487–512.
Çinlar, E., 1975. Introduction to Stochastic Processes, Prentice-Hall, Englewood Cliffs, New Jersey, 402 p.
Cliff, A.D., and J.K. Ord, 1973. Spatial Autocorrelation, Pion, London, United Kingdom, 78 p.
Cohen, J., 1960. A coefficient of agreement for nominal scales, Educational and Psychological Measurement, 20(1):37–46.
Congalton, R.G., and K. Green, 1999. Assessing the Accuracy of Remotely Sensed Data: Principles and Practices, Lewis Publishers, Boca Raton, Florida, 137 p.
Geoghegan, J., S.C. Villar, P. Klepeis, P.M. Mendoza, Y. Ogneva-Himmelberger, R.R. Chowdhury, B.L. Turner, and C. Vance, 2001. Modeling tropical deforestation in the southern Yucatán peninsular region: Comparing survey and satellite data, Agriculture, Ecosystems, and Environment, 1795:1–22.
Greene, W.H., 2000. Econometric Analysis, Fourth Edition, Prentice Hall, New Jersey, 1004 p.
Hauser, J.R., 1978. Testing the accuracy, usefulness, and significance of probabilistic choice models: An information-theoretic approach, Operations Research, 26:406–421.
Hoel, P.G., S.C. Port, and C.J. Stone, 1971. Introduction to Probability Theory, Houghton Mifflin Company, Boston, Massachusetts, 258 p.
Hosmer, D.W., and S. Lemeshow, 1989. Applied Logistic Regression, John Wiley & Sons, New York, N.Y., 307 p.
Ludeke, A.K., R.C. Magio, and L.M. Reid, 1990. An analysis of anthropogenic deforestation using logistic regression and GIS, Journal of Environmental Management, 32:247–259.
McFadden, D., 1974. The measurement of urban travel demand, Journal of Public Economics, 3:303–28.
Mertens, B., and E. Lambin, 1997. Spatial modeling of deforestation in southern Cameroon: Spatial disaggregation of diverse deforestation processes, Applied Geography, 17(2):143–162.
Mertens, B., and E. Lambin, 2000. Land-cover-change trajectories in southern Cameroon, Annals of the Association of American Geographers, 90(3):467–494.
Nelson, G.C., and D. Hellerstein, 1997. Do roads cause deforestation? Using satellite images in econometric analysis of land use, American Journal of Agricultural Economics, 79:80–8.
Pijanowski, B.C., M. Bauer, K. Sawaya, and B. Shellito, 2001. Using remote sensing to parameterize the land transformation model for the Twin Cities, Proceedings, American Society for Photogrammetry and Remote Sensing (ASPRS) Meeting, 21–24 April, St. Louis, Missouri, CD ROM.
Pijanowski, B.C., D.G. Brown, G. Manik, and B. Shellito, 2002. Using neural nets and GIS to forecast land use changes: A land transformation model, Computers, Environment, and Urban Systems, 26(6):553–575.
Pontius, R.G., 2000. Quantification error versus location error in comparison of categorical maps, Photogrammetric Engineering & Remote Sensing, 66(8):1011–1016.
Sklar, F., and R. Constanza, 1991. The development of dynamic spatial models for landscape ecology: A review and prognosis, Quantitative Methods in Landscape Ecology (M.G. Turner and R.H. Gardner, editors), Springer, New York, N.Y., pp. 239–288.
Sklar, F.H., H.C. Fitz, Y. Wu, R. Van Zee, and C. McVoy, 2001. The design of ecological landscape models for Everglades restoration, Ecological Economics, 37(3):379–401.
Turner, M.G., D.N. Wear, and R.O. Flamm, 1996. Land ownership and land-cover change in the Southern Appalachian Highlands and the Olympic Peninsula, Ecological Applications, 6(4):1150–1172.
Wear, D.N., M.G. Turner, and R.O. Flamm, 1996. Ecosystem management with multiple owners: Landscape dynamics in a Southern Appalachian watershed, Ecological Applications, 6(4):1173–1188.
Wear, D.N., and P. Bostad, 1998. Land-use change in Southern Appalachian landscapes: Spatial analysis and forecast evaluation, Ecosystems, 1:575–594.
(Received 25 October 2001; accepted 15 November 2002; revised 17 December 2002)