Geographical Analysis ISSN 0016-7363
Assessing Spatial Dependence in Count Data: Winsorized and Spatial Filter Specification Alternatives to the Auto-Poisson Model
Daniel A. Griffith
Ashbel Smith Professor, School of Social Sciences, University of Texas at Dallas, Richardson, TX
The auto-Poisson probability model furnishes an obvious tool for modeling counts of geographically distributed rare events. Unfortunately, its original specification can accommodate only negative spatial autocorrelation, which itself is a rare event. More recent alternative reformulations, namely, the Winsorized and spatial filter specifications, circumvent this drawback. A comparison of their performances presented in this article reveals some of their relative advantages and disadvantages.
Correspondence: Prof. Daniel Griffith, School of Social Sciences, University of Texas at Dallas, P.O. Box 830688, GR31, Richardson, TX 75083-0688 e-mail: [email protected]
Submitted: March 11, 2004. Revised version accepted: February 4, 2005.
Geographical Analysis 38 (2006) 160–179 © 2006 The Ohio State University

Introduction
For many problems the available data consist of georeferenced counts, or counts tagged to geographic locations. "The central role of the Poisson distribution with respect to the analysis of counts is analogous to the position of the normal distribution in the context of models for continuous data" (Upton and Fingleton 1989, p. 71). The conventional Poisson random variable describes a count of the number of events that have very many opportunities to happen in, for example, geographic areas but nevertheless are rare. One feature of a Poisson random variable is that its mean and variance are equal (equidispersion), a property frequently violated by georeferenced real-world data, in part because of latent spatial autocorrelation, which one would expect to be captured by an automodel specification. But Besag (1974) points out that the initial specification of the auto-Poisson model is incapable of capturing positive spatial autocorrelation, a severe drawback to applying this model when virtually all georeferenced data display positive spatial autocorrelation. Recent developments by Kaiser and Cressie (1997) and Griffith (2002) circumvent this drawback, allowing positive spatial autocorrelation to be
accommodated. The purpose of this article is to compare these two specifications empirically using Irish drumlin counts by quadrat.

Poisson data analysis
Georeferenced counts data point to the Poisson probability model, and hence Poisson regression, which assumes independent counts, say $n_i$, taken at $n$ locations (or collections of points aggregated into $n$ areal units) $i = 1, 2, \ldots, n$. These counts can be described by a set of explanatory variables denoted by matrix $X_i$ and, for a constant areal unit size, have the expected value

$$\mu_i(X_i) = \exp(X_i \beta) \tag{1}$$

where $\beta$ is a vector of parameter values, exp denotes the exponential function (the inverse of the natural logarithm, ln), $\exp(X_i\beta)$ is both the mean and the variance of the Poisson distribution for location $i$, and $\mu_i(X_i)$ denotes an intensity. The linearizing transformation $\ln[\mu_i(X_i)] = X_i\beta$ is why the logarithm is the commonly used link function in generalized linear models for a Poisson distribution. A Poisson random variable can be approximated by a normal random variable in certain situations. If $\mu_i(X_i)$ is sufficiently large (e.g., $> 10$), the frequency distribution for a Poisson random variable resembles a discretized normal frequency distribution. Often, however, $\mu_i(X_i)$ is sufficiently small that the resulting frequency distribution is highly positively skewed (i.e., skewed to the right). Many times this situation can still be related to a normal probability model by subjecting the counts to a logarithmic transformation; in other words, a log-normal approximation to a Poisson random variable can be made. When zero counts are present, this log-normal approximation must include a nonzero translation parameter, $\delta$, with the transformation becoming $\ln(\text{counts} + \delta)$. This notion is illustrated in Fig. 1, whose graphics are based upon a simulated Poisson random variable for which $n = 10{,}000$ and $\mu = 1$ ($\bar{y} = 1.0281$, $s^2 = 1.0494^2$). The three-parameter logarithmic transformation $\ln(\text{counts} + 2.5)$ optimally aligns these counts with a normal frequency
Figure 1. Quantile plots for simulated Poisson counts, $\mu = 1$. (a) Left: independent and identically distributed (IID) raw counts. (b) Right: log-normal approximation for IID counts.
Geographical Analysis
Special Issue
distribution. This transformation increases the Ryan–Joiner normality diagnostic statistic from a value that is markedly significantly less than 1 ($P < 0.0100$) to a value that is marginally significantly less than 1 ($P = 0.0463$). In other words, the approximation here is reasonable, but not perfect. A Poisson random variable also can be approximated by a binomial random variable in certain situations. One difference between these two probability models is that the Poisson describes counts for which the upper limit is infinity, whereas the binomial describes counts for which the upper limit is some finite number $N$. The mean and the variance of a binomial random variable are $NP$ and $NP(1 - P)$, $0 < P < 1$, respectively, for some probability $P$ of a counted item occurring. If $P$ is extremely small (i.e., as $P$ approaches 0), $(1 - P)$ approaches 1. Accordingly, the mean and the variance approach equality. This approximation allows Poisson counts to be treated like binomial counts by artificially introducing a very large $N$ into an analysis. For example, the Poisson data used to construct Fig. 1 range from 0 to 7. If $N$ is set to 100,000, then $P = 0.00001$ (i.e., $100{,}000 \times 0.00001 = 1$), and the counts can be treated as though they are binomially distributed. Accordingly,

$$\ln\left[\frac{\mu/N}{1 - \mu/N}\right] \approx \ln\left(\frac{\mu}{N}\right) = \ln(\mu) - \ln(N).$$

In introducing an artificially large value for $N$, both a Poisson and a binomial analysis will yield equivalent estimation results, although Poisson regression estimation results obtained in this indirect manner are not as numerically precise.
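The binomial approximation described above can be checked numerically. The following is a minimal sketch, not part of the original analysis: the function name `binomial_logit_intercept` is hypothetical, and $\mu = 1$ is taken from the simulated example in the text. It compares the logit of $P = \mu/N$ with $\ln(\mu) - \ln(N)$, and the binomial variance $NP(1 - P)$ with the Poisson variance $\mu$, as the artificial total $N$ grows.

```python
import math


def binomial_logit_intercept(mu: float, big_n: int) -> float:
    """Logit of P = mu / N for an artificial binomial total N (hypothetical helper)."""
    p = mu / big_n
    return math.log(p / (1.0 - p))


mu = 1.0  # Poisson mean of the simulated counts discussed in the text
for big_n in (10, 1_000, 100_000):
    logit = binomial_logit_intercept(mu, big_n)
    approx = math.log(mu) - math.log(big_n)
    # Binomial variance N * P * (1 - P), to be compared with the Poisson variance mu
    variance = big_n * (mu / big_n) * (1.0 - mu / big_n)
    print(f"N={big_n:>7}  logit={logit:.6f}  ln(mu)-ln(N)={approx:.6f}  variance={variance:.6f}")
```

For small $N$ the two quantities differ visibly; by $N = 100{,}000$ the gap is of the order of $\mu/N$, which is the sense in which the binomial and Poisson analyses become equivalent.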
Spatial autocorrelation in georeferenced counts data
Connections between the Poisson distribution and spatial point processes for geographically aggregated data are well known (e.g., Diggle 1983). The notion of complete spatial randomness (that is, a set of observed points could have been located anywhere in a given study area, with these locations being mutually independent) in this context is reflected by a homogeneous Poisson distribution (i.e., $\mu_i = \mu$); various methods for assessing complete spatial randomness have been developed for both geographically aggregated and unaggregated data situations (e.g., Ripley 1981; Diggle 1983; Cressie 1993). But proceeding from a rejection of complete spatial randomness to an adequate conceptualization of some presumed underlying demographic, ecological, economic, epidemiological, geographic, geological, and/or social process of interest can be very difficult. Little guidance is available on how to develop models for inhomogeneous spatial Poisson processes, particularly in the absence of what are believed to be important covariates. One popular approach is to model observed counts as a set of conditionally independent Poisson random variables that have parameters depending on some underlying latent spatial process (e.g., Clayton and Kaldor 1987). In doing so, assessing the type of dependence exhibited in geographic patterns of counts across a landscape, or modeling the process that leads to counts directly, rather than relying on a latent process concept, offers one desirable way to move forward.
Retaining the pairwise-only spatial dependence assumption often invoked when specifying pure automodels, the Poisson regression parameter estimation problem becomes one of evaluating the following log-probability mass function term:

$$\sum_{i=1}^{n} \alpha_i n_i + \rho \sum_{i=1}^{n}\sum_{j=1}^{n} c_{ij} n_i n_j - \sum_{i=1}^{n} \ln(n_i!) \tag{2}$$
where $\alpha_i$ is the parameter capturing large-scale variation (and hence could be specified in terms of vector $X_i$), $\rho$ is the spatial autocorrelation parameter (capturing small-scale variation), and $c_{ij}$ is the geographic configuration weight for location pairs $i$ and $j$ contained in a geographic connectivity matrix $C$; in its simplest form, $c_{ij} = 1$ if areal units $i$ and $j$ are neighbors, and 0 otherwise. The autobinomial approximation term corresponding to Equation (2) is

$$\sum_{i=1}^{n} \alpha_i n_i + \rho \sum_{i=1}^{n}\sum_{j=1}^{n} c_{ij} n_i n_j + \sum_{i=1}^{n} \ln\binom{N}{n_i} \tag{3}$$
where the third term denotes the combinatorial notation of $N$ things taken $n_i$ at a time. The principal drawback to employing the auto-Poisson specification is that it cannot accommodate cases of positive spatial autocorrelation because of the summability condition imposed on its normalizing factor (that is, all possible probabilities must sum to 1); this normalizing factor also is intractable (see Cressie 1993, for the mathematical expression), and hence the model is of limited use. Of note is that the autobinomial model, which also has an intractable normalizing constant (see Cressie 1993, for the mathematical expression), can capture positive spatial autocorrelation. The auto-Poisson restriction to negative spatial autocorrelation is at odds with the real world, in which most georeferenced data exhibit positive spatial autocorrelation: socioeconomic/demographic data tend to display moderate levels, whereas remotely sensed data tend to display markedly strong levels. These more pronounced levels of spatial autocorrelation materialize because, in part, these latter data can be described with normal random variables that have an unconstrained upper limit on their variances. As $N$ goes to infinity, both the Poisson and the binomial distributions converge upon a normal distribution. The rate of this convergence decreases as the probability of an event occurring approaches 0. Meanwhile, the case of $N = 1$ relates to a binary autologistic setting, for which a number of articles have been published (e.g., Gumpertz, Graham, and Ristaino 1997; Huffer and Wu 1998; Griffith 2004). Conceptually, then, the combination of $N$ and $P$ tends to impose an upper limit on positive spatial autocorrelation, resulting in the binomial approximation to a Poisson distribution often being able to capture no more than modest levels of positive spatial autocorrelation (e.g., see Table 2).
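To make the log-probability mass function term of Equation (2) concrete, the following minimal sketch evaluates it for a small map. It is illustrative only: the helper names `rook_neighbors` and `auto_poisson_log_term`, the 2 × 2 grid, the rook definition of neighbors, the constant $\alpha_i = \alpha$, and the parameter values are all assumptions made for the example. The normalizing factor, which the text notes is intractable, is not computed.

```python
import math


def rook_neighbors(rows, cols):
    """Return {cell: [neighboring cells]} for a rows x cols grid, so that c_ij = 1 for rook neighbors."""
    nbrs = {}
    for r in range(rows):
        for c in range(cols):
            nbrs[r * cols + c] = [
                rr * cols + cc
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < rows and 0 <= cc < cols
            ]
    return nbrs


def auto_poisson_log_term(counts, nbrs, alpha, rho):
    """Unnormalized log-mass term of Equation (2), assuming a constant alpha_i = alpha."""
    large_scale = alpha * sum(counts)
    # Double sum over i and j as written in Equation (2): each neighbor pair is counted twice
    pairwise = sum(counts[i] * counts[j] for i in nbrs for j in nbrs[i])
    log_factorials = sum(math.lgamma(n + 1) for n in counts)  # ln(n_i!)
    return large_scale + rho * pairwise - log_factorials


counts = [0, 1, 2, 1]  # hypothetical quadrat counts on a 2 x 2 grid, listed row by row
print(auto_poisson_log_term(counts, rook_neighbors(2, 2), alpha=0.5, rho=0.1))
```

Setting `rho=0` recovers the sum of independent Poisson log-mass terms (without the $-e^{\alpha}$ constants), which shows how the pairwise term is the only place where spatial dependence enters.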
A nonzero value for the spatial autocorrelation parameter $\rho$ in Equation (2) has certain detectable consequences. Foremost is overdispersion (i.e., extra-Poisson variation), resulting in $\sigma_i^2(X_i) > \mu_i(X_i)$, when spatial autocorrelation is ignored. This effect is discussed in detail in Griffith (2003, Chapter 6). Based upon Equation (2) for a constant mean parameter (i.e., $\alpha_i = \alpha$), this type of variance inflation can be observed in the following Poisson probability mass function term:

$$P(Y) \propto e^{-e^{\alpha}\left(e^{\rho\sum_{j=1}^{n} c_{ij} n_j} - 1\right)} \left(e^{\rho\sum_{j=1}^{n} c_{ij} n_j}\right)^{n_i} \left[\frac{e^{-e^{\alpha}} (e^{\alpha})^{n_i}}{n_i!}\right] \tag{4}$$

where the right-hand bracketed term is the conventional Poisson probability quantity; expression (4) is the numerator of the function, and is divided by a complicated normalizing denominator term (see Cressie 1993, for the mathematical expression) to convert it into a probability. Fig. 2a displays the histogram for the independent and identically distributed Poisson results used to construct Fig. 1, and Fig. 2b displays comparable results (i.e., $\mu = 1$) for an extremely strong, positive spatially autocorrelated numerical example. The manifestation of spatial autocorrelation in Fig. 2 is more pronounced skewness (e.g., more zero and near zero, and very large counts), a sample mean of 0.9995, a sample variance of 1.2100 (i.e., overdispersion), and a constant intensity term that decreases from 1 to 0.7492. Besag (1974) discusses pseudo-likelihood estimation (PLE), which involves computing standard logistic regression results while including the autoregressive term $\sum_{j=1}^{n} c_{ij} n_j$ as a covariate. Parameter estimates based upon this estimation procedure are unbiased but inefficient, and have incorrect standard errors. The autobinomial approximation to the auto-Poisson discussed in "Poisson data analysis" supports an extension of these results to Poisson random variables.
But the weakness of this approach, namely assuming conditionally independent observations, remains; in other words, the likelihood function still is constructed with incorrect
Figure 2. Histograms of simulated Poisson counts. (a) Left: independent and identically distributed raw counts. (b) Right: counts containing strong, positive spatial autocorrelation.
product terms. Furthermore, for a fixed number of quadrats, quantitative studies in image analysis reveal that increasing the number of parameters to be estimated tends to degrade PLE's performance, as does increasing the maximum number of counts, with this degradation being moderated with increasing $n$. Approaches to proper model specifications that explicitly incorporate spatial autocorrelation include: (1) retaining simplicity by slightly modifying the Poisson probability model via Winsorizing, (2) constructing a spatial filter that removes spatial autocorrelation from the response residuals, and (3) switching the perspective to one for a Bayesian hierarchical generalized linear model that incorporates spatial autocorrelation via its parameter specifications. These approaches emphasize the absence of a unique decomposition for spatial pattern into a global mean structure (i.e., large-scale variation) and a dependence structure (i.e., small-scale variation) (Cressie 1993). This article addresses the first two of these approaches, both of which are frequentist.

Winsorizing of georeferenced counts data
Winsorizing Poisson counts data involves systematically replacing extremely high counts with the value of some cutoff criterion (after Barnett and Lewis 1978). It is a compromise between the infinite sum of a Poisson probability model and utilizing all of the Poisson-type counts information in some data set while establishing a most extreme acceptable count. For example, substituting 7 as a maximum count for the Poisson model used to generate Figs. 1 and 2 renders:

$$P(0 \le Y \le 7) = \sum_{h=0}^{7} P(Y = h) = \sum_{h=0}^{7} \frac{e^{-1}}{h!} = 0.367879442\,(1 + 1 + 1/2 + 1/6 + 1/24 + 1/120 + 1/720 + 1/5040) = 0.999990$$

which essentially is 1. Kaiser and Cressie (1997) suggest setting the maximum count at three times the expected value of the non-Winsorized Poisson random variable (i.e., $3\hat{\mu}$); the suggestion put forth in this article is three times the maximum count in a data set (i.e., $3y_{\max}$). For the simulation example inspected here, the former criterion would result in $\sum_{h=0}^{3} e^{-1}/h! = 0.98101$, whereas the latter criterion would result in $\sum_{h=0}^{21} e^{-1}/h!$, a number that for all practical purposes is indistinguishable from 1. In other words, the probabilities of excessively large counts, whose Poisson probabilities of occurring essentially are 0, are set to exactly 0. The purpose of Winsorizing is to explicitly model spatial dependence with relatively few parameters and a distribution exhibiting Poisson-like behavior so that the case of positive spatial autocorrelation can be accommodated. Spatial autocorrelation effects are captured by a single parameter (or just a few parameters), furnishing a parsimonious result. The form of the mean response is given by Equation (2). Kaiser and Cressie (1997) show that the Winsorized alternative to the auto-Poisson model is capable of capturing positive spatial autocorrelation, has a relatively simple mean response specification, and has an expected value that is near
that of the regular auto-Poisson version (i.e., these expectations are nearly the same).
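The retained probability mass under the different cutoff criteria can be reproduced with a few lines of code. This is a minimal sketch under stated assumptions: the function names are hypothetical, $\mu = 1$ is the simulated example from the text, and `winsorized_pmf` follows the description of Winsorizing given above (counts above the cutoff are replaced by the cutoff value, so the upper-tail mass moves to the cutoff).

```python
import math


def poisson_pmf(h, mu):
    """Poisson probability P(Y = h) for mean mu."""
    return math.exp(-mu) * mu ** h / math.factorial(h)


def retained_mass(cutoff, mu):
    """P(0 <= Y <= cutoff) for a Poisson random variable with mean mu."""
    return sum(poisson_pmf(h, mu) for h in range(cutoff + 1))


def winsorized_pmf(cutoff, mu):
    """Mass function on 0..cutoff after counts above the cutoff are set to the cutoff."""
    pmf = [poisson_pmf(h, mu) for h in range(cutoff)]
    pmf.append(1.0 - sum(pmf))  # all upper-tail mass is assigned to the cutoff value
    return pmf


print(f"cutoff  7 (maximum observed count): {retained_mass(7, 1.0):.6f}")
print(f"cutoff  3 (three times the mean):   {retained_mass(3, 1.0):.5f}")
print(f"cutoff 21 (three times the maximum): {retained_mass(21, 1.0):.15f}")
```

The first two printed values correspond to the 0.999990 and 0.98101 figures quoted in the text; the third shows why a cutoff of 21 is practically indistinguishable from the untruncated Poisson model.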
Spatial filtering of georeferenced counts data
Spatially filtering counts data involves specifying a geographically heterogeneous mean and variance in order to capture spatial autocorrelation. In this sense, it is similar to a Bayesian approach in that spatial autocorrelation is captured through parameters rather than through the counts themselves; but it still is a frequentist perspective. Spatial filtering seeks to transform a variable containing spatial dependence into one free of spatial dependence by partitioning the original georeferenced attribute variable into two synthetic variates: a spatial filter variate capturing latent spatial dependency that otherwise would remain in the response residuals and a nonspatial variate that is free of spatial dependence. Griffith (2000a) proposes a transformation procedure that depends on the eigenfunctions of matrix $(I - \mathbf{1}\mathbf{1}^T/n)C(I - \mathbf{1}\mathbf{1}^T/n)$, where $I$ denotes the identity matrix, $\mathbf{1}$ is an $n \times 1$ vector of ones, and $T$ denotes matrix transpose. This matrix is a term appearing in the numerator of the Moran Coefficient (MC) spatial autocorrelation index, and the procedure is based on the following theorem (Griffith 2003): The first eigenvector, say $E_1$, is the set of numerical values that has the largest MC achievable by any set for the spatial arrangement defined by the geographic connectivity matrix $C$. The second eigenvector is the set of values that has the largest achievable MC by any set that is uncorrelated with $E_1$. The third eigenvector is the third such set of values. And so on. This sequential construction of eigenvectors continues through $E_n$, the set of values that has the largest negative MC achievable by any set that is uncorrelated with the preceding $(n - 1)$ eigenvectors. As such, Griffith (2000a) argues that these eigenvectors furnish distinct map pattern descriptions of latent spatial autocorrelation in georeferenced variables.
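Because the theorem is stated in terms of the MC achieved by a set of values, a small computation of the index may help. The sketch below uses the standard form of the Moran Coefficient, $MC = \frac{n}{\sum_i\sum_j c_{ij}} \cdot \frac{\sum_i\sum_j c_{ij}(x_i - \bar{x})(x_j - \bar{x})}{\sum_i (x_i - \bar{x})^2}$; the helper names, the 4 × 4 grid, the rook definition of neighbors, and the two test patterns are assumptions made for illustration, not data from this article.

```python
def rook_pairs(rows, cols):
    """Ordered pairs (i, j) with c_ij = 1 under rook adjacency on a rows x cols grid."""
    pairs = []
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:
                pairs += [(i, i + 1), (i + 1, i)]
            if r + 1 < rows:
                pairs += [(i, i + cols), (i + cols, i)]
    return pairs


def moran_coefficient(values, pairs):
    """Moran Coefficient for values observed on the units linked by pairs."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]  # centering plays the role of (I - 11^T/n)
    cross = sum(z[i] * z[j] for i, j in pairs)
    return (n / len(pairs)) * cross / sum(d * d for d in z)


pairs = rook_pairs(4, 4)
checkerboard = [(r + c) % 2 for r in range(4) for c in range(4)]
gradient = [c for r in range(4) for c in range(4)]
print(moran_coefficient(checkerboard, pairs))  # alternating pattern: negative autocorrelation
print(moran_coefficient(gradient, pairs))      # smooth trend: positive autocorrelation
```

The alternating pattern produces a strongly negative MC and the smooth trend a positive one, which is the ordering of map patterns that the eigenvector sequence $E_1, \ldots, E_n$ formalizes.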
The spatial filter is constructed by using judiciously selected eigenvectors as regressors (e.g., selected with a stepwise Poisson regression routine), which results in spatial autocorrelation being filtered out of the residuals of georeferenced variables, with the regression residuals representing spatially independent variable components. Of note is that these eigenvectors representing distinct map patterns are both mutually orthogonal and uncorrelated in their numerical form, a property that is corrupted by the weighting involved in computing Poisson regression parameter estimates. The spatial filtering specification alternative to the auto-Poisson model, whose mean for location $i$ is specified by Equation (1), is given by the linear predictor $\alpha + E_i\beta$. In other words, the $\rho\sum_{j=1}^{n} c_{ij} n_j$ autoregressive term is dispensed with by shifting spatial dependence effects to the large-scale variation term represented by $E_i\beta$, which is a special case of the term $X_i\beta$ contained in Equation (1). A complete model
formulation contains $(K + 1)$ parameters to be estimated, namely the scalar $\alpha$ and the $K \times 1$ vector $\beta$.

Theoretical eigenvectors for regular square tessellations
When georeferencing is constituted by quadrats that form a regular square tessellation overlaying a rectangular-shaped region, as often is the case for point pattern analyses, analytical eigenvectors are easily calculated. Griffith (2000b) proves that the asymptotic spatial filter eigenvectors for a complete $P \times Q$ rectangular geographic region, where $P$ is the number of units in a horizontal axis direction and $Q$ is the number of units in a vertical axis direction, are given by:

$$E_{pq} = \left\langle \frac{2}{\sqrt{(P + 1)(Q + 1)}} \sin\left(\frac{g p \pi}{P + 1}\right) \sin\left(\frac{h q \pi}{Q + 1}\right) \right\rangle, \tag{5}$$

$g = 1, 2, \ldots, P$; $h = 1, 2, \ldots, Q$; $p = 1, 2, \ldots, P$; and $q = 1, 2, \ldots, Q$.

Of note is that these eigenvectors do not change when geographic connectivity is defined in terms of shared common boundaries of nonzero length only (i.e., the rook's chess move analogy) or both zero and nonzero length (i.e., the queen's chess move analogy) (see Griffith 2003). And, although these eigenvectors relate to an inverse covariance matrix, in an ideal situation, distance-based geostatistical semivariogram models would yield the corresponding covariance matrix, whose eigenvectors would be the same. This latter approach is being formally developed by Legendre and his colleagues (e.g., Méot, Legendre, and Borcard 1998; Borcard and Legendre 2002). Equation (5) results actually are for matrix $C$ and are asymptotic for matrix $(I - \mathbf{1}\mathbf{1}^T/n)C(I - \mathbf{1}\mathbf{1}^T/n)$. Because repeated eigenvalues occur for a regular square tessellation forming a complete rectangular region, most standard computer algorithms fail to produce the correct theoretical vectors for these pairs of eigenvalues.
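The analytical form in Equation (5) can be verified directly for a small grid. The following is a minimal sketch: the function names, the 4 × 3 grid, and the rook connectivity are illustrative assumptions, and the eigenvalue expression $2\cos[p\pi/(P+1)] + 2\cos[q\pi/(Q+1)]$ is the standard result for a rook-connected rectangular lattice rather than a formula quoted from this article. The check confirms that the vector has unit length and satisfies $CE_{pq} = \lambda_{pq}E_{pq}$.

```python
import math


def analytic_eigenvector(p, q, P, Q):
    """Equation (5) vector for indices (p, q), listed over grid cells (g, h) with g varying slowest."""
    scale = 2.0 / math.sqrt((P + 1) * (Q + 1))
    return [
        scale * math.sin(g * p * math.pi / (P + 1)) * math.sin(h * q * math.pi / (Q + 1))
        for g in range(1, P + 1)
        for h in range(1, Q + 1)
    ]


def apply_rook_c(vec, P, Q):
    """Multiply the rook connectivity matrix C by vec without storing C."""
    out = []
    for g in range(P):
        for h in range(Q):
            total = 0.0
            for gg, hh in ((g - 1, h), (g + 1, h), (g, h - 1), (g, h + 1)):
                if 0 <= gg < P and 0 <= hh < Q:
                    total += vec[gg * Q + hh]
            out.append(total)
    return out


P, Q = 4, 3  # hypothetical small grid
vec = analytic_eigenvector(1, 1, P, Q)
eigenvalue = 2 * math.cos(math.pi / (P + 1)) + 2 * math.cos(math.pi / (Q + 1))
residual = max(abs(a - eigenvalue * b) for a, b in zip(apply_rook_c(vec, P, Q), vec))
print(f"squared length = {sum(x * x for x in vec):.12f}, max |Cv - lambda v| = {residual:.2e}")
```

Working from the closed form sidesteps the numerical eigen-decomposition, which is the step that the text notes can return incorrect vectors when eigenvalues are repeated.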
Fortunately, these theoretical vectors can be accurately approximated by subjecting the $n$ vectors generated by Equation (5) to a varimax rotated principal axis factor analysis.

An empirical example: Hill's (1973) Irish drumlins data
Quadrat counts were tabulated for three 64-km² square-shaped regions extracted from Hill's (1973) famous County Down (Northern Ireland) drumlins data set. The optimal quadrat size (see Taylor 1977, p. 147) resulted in each region being overlaid with an 11 × 11 grid of quadrats (see Fig. 3). A preliminary analysis of these data reveals that each region displays significant, weak positive spatial autocorrelation (see Table 1). The log-normal approximation seeks translation parameter values, $\hat{\delta}$, that are greater than the largest count in two of the three data sets and that make little difference in the resulting quantile plots; thus, because 0 counts are present, for simplicity $\delta$ was set to 1. Both the raw counts and the autonormal simultaneous autoregressive (SAR) model estimation
Figure 3. Subregional sets of Hill’s (1973) Irish drumlins data. (a) Left: part of the Upper Ards peninsula. (b) Center: part of the county west of Strangford Lough. (c) Right: part of the county east of Slieve Croob.
results imply the presence of positive spatial autocorrelation in these drumlins data sets. Next, a constant mean parameter $\alpha = \ln(\mu)$ was estimated using a generalized linear model algorithm whose response distribution was Poisson, with a log link function. Estimation results appear in Table 1. In addition, a binomial logistic regression model was estimated, using an artificial total, $N$ (see "Poisson data analysis"), whose assigned values were the set {10, 50, 100, 500, 1000, 5000, 10,000, 50,000, 100,000, 500,000}. Convergence of the logistic estimate $\hat{\alpha}$ followed these trajectories, whose descriptions have relative error sums of squares less than $3.4 \times 10^{-11}$:

Drumlins region #1: $\hat{\alpha} = 0.6595 + \dfrac{1.9326}{N^{1.0073}}$,

Drumlins region #2: $\hat{\alpha} = 0.6638 + \dfrac{1.9409}{N^{1.0118}}$, and

Drumlins region #3: $\hat{\alpha} = 0.2346 + \dfrac{1.2641}{N^{0.6489}}$.
Table 1 Preliminary Maximum Likelihood Estimate Statistical Analysis Results for Parts of Hill's Irish Drumlins Data

                                     Drumlins region
Statistic                            Region #1 (n = 234)   Region #2 (n = 235)   Region #3 (n = 153)
MC ($\sigma_{MC} = 0.067$)           0.2698                0.2412                0.1673
Geary Ratio                          0.7012                0.7580                0.8147
Log-normal $\hat{\delta}$            11.2                  10.6                  3.7
Log-normal SAR $\hat{\rho}$          0.4495                0.3897                0.2675
Poisson $\hat{\mu}$                  0.6595                0.6638                0.2346
Poisson $\sigma_{\hat{\mu}}$         0.0654                0.0652                0.0808

MC, Moran Coefficient.
Table 2 Pseudo-Likelihood Estimations for Parts of Hill's Irish Drumlins Data

                      Drumlins region
                      Region #1 (n = 234)         Region #2 (n = 235)         Region #3 (n = 153)
N        Parameter    Estimate   Standard Error   Estimate   Standard Error   Estimate   Standard Error

Generalized linear model specification output
$10^2$   $\alpha$     0.0251     0.1885           0.1090     0.1746           0.2357     0.1897
         $\rho$       0.0883     0.0223           0.0776     0.0208           0.0970     0.0325
$10^3$   $\alpha$     0.0187     0.1840           0.1012     0.1732           0.2417     0.1887
         $\rho$       0.0867     0.0220           0.0762     0.0206           0.0959     0.0323
$10^4$   $\alpha$     0.0181     0.1838           0.1004     0.1731           0.2423     0.1886
         $\rho$       0.0865     0.0220           0.0760     0.0206           0.0958     0.0323
$10^5$   $\alpha$     0.0180     0.1838           0.1004     0.1730           0.2423     0.1886
         $\rho$       0.0865     0.0220           0.0760     0.0206           0.0957     0.0323
$10^6$   $\alpha$     0.0180     0.1838           0.1003     0.1730           0.2424     0.1886
         $\rho$       0.0865     0.0220           0.0760     0.0206           0.0957     0.0323
$10^7$   $\alpha$     0.0180     0.1838           0.1003     0.1730           0.2424     0.1886
         $\rho$       0.0865     0.0220           0.0760     0.0206           0.0957     0.0323

Nonlinear regression Kaiser–Cressie specification output
$10^7$   $\alpha$     0.6142     0.1341           0.6260     0.1368           0.1635     0.1379
         $\rho$       0.0832     0.0194           0.0787     0.0193           0.0949     0.0295

N for which convergence achieved is highlighted in bold typeface.
In all three cases, Poisson and binomial estimates essentially are equivalent for $N = 100{,}000$, at which value the second term on the right-hand side of these preceding equations approximately equals 0. Of note is that, although convergence has not occurred, little practical difference in parameter estimates comes about by increasing $N$ beyond 100 (see Table 4). PLEs based upon both Poisson and binomial regression analyses, setting $N = 100{,}000$, are reported in Table 2; exactly the same estimates are obtained with both specifications. Kaiser and Cressie (1997, p. 428) rewrite the mean suggested by Equation (2) as

$$\alpha_i + \rho \sum_{j=1}^{n} c_{ij} n_j - \rho \sum_{j=1}^{n} c_{ij} e^{\alpha_j} \tag{6}$$
in order to obtain an estimate for $\alpha_i$ that is more directly comparable with its nonspatial statistical estimate. PLEs for Equation (6), using a nonlinear regression implementation, also appear in Table 2. These estimates are very similar to their generalized linear model counterparts and reveal that the spatial autocorrelation parameter estimation has little impact upon the constant mean estimate. Based on the maximum eigenvalue of matrix $C$, namely $\lambda_1 = 3.83797$, the maximum value of $\hat{\rho}$ prior to encountering any phase transitions is $1/\lambda_1 = 0.26055$; hence, the standardized estimates, given by $\hat{\rho}$ relative to this maximum (i.e., $\hat{\rho}\lambda_1$), are of the order 0.33, indicating weak-to-moderate positive spatial autocorrelation. One important outcome of this pseudo-likelihood analysis, which is consistent with findings reported in Table 1, is the detection of positive spatial autocorrelation in the observed Poisson counts. Therefore, all preliminary evidence gleaned from the three drumlins data sets implies the presence of weak-to-moderate positive spatial autocorrelation and thus demands an auto-Poisson type of specification that can capture this nature and degree of correlation.

Estimation results for the Winsorized alternative to the auto-Poisson model
The proposition pertaining to the Winsorized alternative to the auto-Poisson model proved by Kaiser and Cressie (1997) establishes the feasibility and validity of a mathematical Poisson counts description that captures positive spatial autocorrelation. The preceding PLEs are for the parameters of conditional Poisson mass functions. Of interest here is the estimation of parameters for joint multivariate Poisson mass functions. These estimates can be obtained with Markov Chain Monte Carlo (MCMC; see Gilks, Richardson, and Spiegelhalter 1996; Hubbell et al. 2001) techniques employing a Gibbs sampler and, when necessary, a Metropolis–Hastings algorithm. Although some theoretical work has been completed on the autobinomial model (e.g., Besag 1974; Cressie 1993), in practice most implementations of it to date have employed PLE or Besag-coding estimation of parameters, or a Bayesian hierarchical modeling perspective in which autocorrelation is introduced with an autonormal prior. Here the Gibbs sampler involves sampling from a Poisson distribution whose log-mean is given by the expression

$$\hat{\alpha} + \hat{\rho} \sum_{j=1}^{n} c_{ij} n_j$$

where $\hat{\alpha}$ and $\hat{\rho}$ are the PLEs. An initial geographic distribution of quadrat counts is obtained by randomly sampling from $n$ independent Poisson distributions with mean $\exp(\hat{\alpha})$; this set of computations allows $\rho\sum_{j=1}^{n} c_{ij} n_j$ to be calculated in the second iteration. During each subsequent sampling iteration, each of the 121 quadrats of a drumlins region is visited in turn, but in a random order, and the quadrat count is replaced by sampling from a Winsorized Poisson distribution with mean $\exp\left(\alpha + \rho\sum_{j=1}^{n} c_{ij} n_j\right)$; truncating drawings when they exceed 7 (i.e., Winsorization) serves a function similar to that of a Metropolis–Hastings algorithm. These iterations are repeated until convergence of the sufficient statistics (i.e., $\sum_{i=1}^{n} n_i$ for $\hat{\alpha}$ and $\sum_{i=1}^{n}\sum_{j=1}^{n} c_{ij} n_i n_j$ for $\hat{\rho}$) is attained. Of a total of 25,000 iterations executed here, the first 20,000 were discarded to remove transient states toward the equilibrium distribution for $\hat{\alpha}$ and for $\hat{\rho}$; these first 20,000 results are the "burn-in" period. Time series plots of the two sufficient statistics were used to confirm equilibrium attainment. The remaining 5000 iterations were weeded such that the sufficient statistics for only every fifth simulated map were retained. Time series
correlograms of these sufficient statistics were used to confirm independence of retained maps. MCMC maximum likelihood estimates (MLEs) and their standard errors were computed with the retained sufficient statistics, and are reported in Table 3. As expected, almost without exception the standard errors are much smaller than their PLE counterparts. The spatial autocorrelation parameter estimates are positive and roughly equivalent to their PLE counterparts. Of note is that the results for drumlins region #3 are somewhat unstable. The MCMC "Winsorized" auto-Poisson results were checked with an MCMC analysis using an autobinomial approximation for which $N$ was set to 100,000. This check involved the same two sufficient statistics as for the preceding Poisson MCMC analysis (i.e., $\sum_{i=1}^{n} n_i$ for $\hat{\alpha}$ and $\sum_{i=1}^{n}\sum_{j=1}^{n} c_{ij} n_i n_j$ for $\hat{\rho}$), and also required the use of a Metropolis–Hastings algorithm, whose goal was to construct a Markov Chain that has a specified equilibrium distribution, as sampling counts from $N = 100{,}000$ can result in sizable numbers and hence a phase transition (all locations on a map end up taking on the same value). In keeping with the Winsorized upper limit, a maximum value of 7 (i.e., the maximum observed number of counts) was set in order to define the acceptance probability for this algorithm. Each sample value selected from a binomial distribution was evaluated, and retained if its probability of acceptance was appropriate; otherwise, the count at the quadrat location in question was retained. When a sample count is rejected, the Markov Chain has a repeat in its sequence. Nevertheless, under rather general conditions, the final sequence is a Markov Chain with the desired equilibrium distribution. The MCMC-MLEs obtained with this procedure also appear in Table 3. They are of the same order of magnitude as their MCMC-MLE Poisson counterparts, thus corroborating the Poisson results.
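The Gibbs sampling scheme described above can be sketched in a few lines. This is a simplified illustration, not the implementation used for Table 3: the function names, the rook connectivity, the seed, and the parameter values in the usage lines are assumptions; the initial map is drawn with the same Winsorized sampler rather than an unrestricted Poisson sampler; and neither the Metropolis–Hastings step of the autobinomial check nor the MCMC-MLE computation is included. Each draw uses the inverse cumulative distribution, with every draw above the cutoff set to the cutoff.

```python
import math
import random


def winsorized_poisson_draw(mean, cutoff, rng):
    """Inverse-CDF Poisson draw; any value above cutoff is replaced by cutoff."""
    u = rng.random()
    prob = math.exp(-mean)  # P(Y = 0)
    cumulative = prob
    h = 0
    while u > cumulative and h < cutoff:
        h += 1
        prob *= mean / h  # P(Y = h) from P(Y = h - 1)
        cumulative += prob
    return h


def gibbs_sweeps(alpha, rho, rows, cols, cutoff, n_sweeps, seed=0):
    """Run n_sweeps Gibbs sweeps; return the final map and the sufficient statistics per sweep."""
    rng = random.Random(seed)
    n = rows * cols
    nbrs = [[] for _ in range(n)]
    for r in range(rows):
        for c in range(cols):
            for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= rr < rows and 0 <= cc < cols:
                    nbrs[r * cols + c].append(rr * cols + cc)
    # Initial map: independent draws with mean exp(alpha) (Winsorized here for simplicity)
    counts = [winsorized_poisson_draw(math.exp(alpha), cutoff, rng) for _ in range(n)]
    history = []
    for _ in range(n_sweeps):
        order = list(range(n))
        rng.shuffle(order)  # visit every quadrat once per sweep, in random order
        for i in order:
            log_mean = alpha + rho * sum(counts[j] for j in nbrs[i])
            counts[i] = winsorized_poisson_draw(math.exp(log_mean), cutoff, rng)
        total = sum(counts)  # sufficient statistic associated with alpha
        pairwise = sum(counts[i] * counts[j] for i in range(n) for j in nbrs[i])  # for rho
        history.append((total, pairwise))
    return counts, history


# Illustrative values only: an 11 x 11 grid, cutoff 7, short run with burn-in and thinning
final_map, stats = gibbs_sweeps(alpha=0.0, rho=0.08, rows=11, cols=11, cutoff=7, n_sweeps=300)
retained = stats[200::5]  # discard a burn-in period, then keep every fifth sweep
print(len(retained), retained[0])
```

The run length, burn-in, and thinning interval above are far shorter than the 25,000 iterations, 20,000 burn-in iterations, and every-fifth-map weeding reported in the text; in practice the time series plots and correlograms of the two sufficient statistics would guide those choices.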
These results can be reexpressed using the Kaiser–Cressie specification given in Equation (6), in order to be more directly comparable with aspatial statistical results (see Table 3). Regardless, the pooled measures of spatial autocorrelation account for about 10% of the variation in drumlin counts across each region. The autoregressive terms account for virtually all latent spatial autocorrelation, yielding residuals that are spatially unautocorrelated. In their generalizations to the wider class of generalized linear mixed models, Lee and Nelder (2001) assess the normality of Poisson regression residuals using normal quantile plots. Using this model-checking technique, the residuals here show evidence of non-normality (see the probabilities of Shapiro–Wilk statistics, P[Shapiro–Wilk], reported in Table 3).

Estimation results for the spatial filter alternative to the auto-Poisson model
The motivation for spatial filtering is to allow spatial analysts to employ traditional generalized linear models regression techniques while ensuring that regression residuals behave according to required model assumptions, such as uncorrelated errors. When georeferenced data are used, residuals are assumed not to be spatially autocorrelated; otherwise, the regression model is said to be misspecified. This is the reason why PLEs are inefficient. Spatial filtering decomposes Poisson counts
Geographical Analysis
Special Issue
Table 3 Markov Chain Monte Carlo–Maximum Likelihood Estimates for Parts of Hill’s Irish Drumlins Data Drumlins region
Region #1 (n 5 234)
Region #2 (n 5 235)
Region #3 (n 5 153)
Estimate
Estimate
Estimate
Standard Error
0.6050 0.0992
0.1142 0.0606
Standard Error
Standard Error
Chains generated with a Poisson conditional mass function a 0.2056 0.0247 0.1158 0.0327 r 0.0866 0.0191 0.0767 0.0131
Nonlinear regression Kaiser–Cressie specification output for the Poisson mass function a 0.0615 0.1836 0.6312 r 0.0867 0.0767 0.0992 Chains generated with a binomial conditional mass function a 0.2139 0.0138 0.1243 0.0267 r 0.0867 0.0229 0.0771 0.0155
0.6307 0.1043
Diagnostic statistics for conditional Poisson mass function results Pseudo-R2 0.1359 0.1155 P (Shapiro-Wilk) 0.0348 0.0655 MC 0.0650 0.0740 Geary Ratio 1.0285 1.0633
0.0842 0.0006 0.0800 1.0661
0.1393 0.0634
Denotes not calculated.
that are spatially autocorrelated into a spatial and an aspatial (i.e., geographically P ^ nj¼1 cij nj term in an independent) component. The spatial term functions like the r automodel, comprises synthetic variates that are a linear combination of some subset of the eigenfunctions of matrix (I 11T/n)C(I 11T/n), and does not require further reexpression (e.g., like is done with the Kaiser–Cressie expression (6)) because all of the eigenvectors have a mean of 0. Because latent spatial autocorrelation is positive, attention can be restricted to those eigenvectors representing marked positive spatial autocorrelation (e.g., those with a MC/MCmax40.25). Conventional supervised stepwise selection of the K candidate eigenvectors is a useful and effective approach to identifying the subset of important vectors for best describing latent spatial autocorrelation in a particular georeferenced counts variable. This procedure begins with only the intercept, a, included in the Poisson regression specification. Then, at each step, a candidate eigenvector is considered for addition to the model specification. The one that produces the greatest reduction in the log-likelihood function w2-test statistic is selected, but only if it produces at least a prespecified minimum reduction. At each step all eigenvectors previously entered into the model specification are reassessed, possibly removing ones added at an earlier step. The procedure terminates automatically when prespecified threshold residual spatial autocorrelation and/or prespecified threshold levels of significance for eigenvector coefficients are 172
encountered for entry and removal of all candidate eigenvectors. A forward/backward combination is needed here because the orthogonality and uncorrelatedness properties of the eigenvectors are corrupted to some degree by weights attached to observations in the maximum likelihood estimation process utilized in generalized linear model estimation; the weighted eigenvectors are neither orthogonal nor uncorrelated. This stepwise procedure can be implemented either with a stepwise Poisson regression routine or, in its absence, a pseudo-logistic regression routine (i.e., artificially setting N to a very large number). Of note is that this stepwise procedure results in a differential set of estimated coefficients being attached to the eigenvectors, whereas the Winsorized model attaches a single common estimated coefficient to all n differentially weighted eigenvectors (i.e., $\rho \mathbf{C}\mathbf{Y} = \rho \mathbf{E}\boldsymbol{\Lambda}\mathbf{E}^{T}\mathbf{Y} = \mathbf{E}\rho\boldsymbol{\beta}$, where $\boldsymbol{\Lambda}$ is a diagonal matrix of eigenvalues and $\boldsymbol{\beta}$ is an n × 1 vector of coefficients).

Spatial filter estimation results appear in Table 4. Generalized linear model estimation results are commonly assessed, in part, by dividing calculated deviance by the number of degrees of freedom (df). Deviance/df values of nearly 1 obtained here imply that Poisson spatial filter models appear to furnish good descriptions of drumlins counts in these three geographic landscapes, characterizing them with a pronounced small-scale variation component (i.e., spatial dependence) and a sizable random component that conforms to a Poisson distribution. Spatial autocorrelation accounts for roughly a quarter of the geographic variability in counts across quadrats in all three drumlins regions. The estimates of $\alpha$ are very similar to their counterparts in Table 1. The counts residuals tend to conform to a normal distribution and basically are free of spatial autocorrelation; trace positive spatial autocorrelation appears to remain in drumlins region #1 counts residuals (e.g., E(MC) for the linear regression counterpart for this model is 0.0425); the formal statistical distribution theory still needs to be developed for this problem in order to properly evaluate these particular residual results.

Table 4 Maximum Likelihood Estimate Spatial Filtering Results for Parts of Hill’s Irish Drumlins Data

| Eigenvector | Region #1 (n = 234) Estimate | Region #1 Standard Error | Region #2 (n = 235) Estimate | Region #2 Standard Error | Region #3 (n = 153) Estimate | Region #3 Standard Error |
|---|---|---|---|---|---|---|
| Intercept | 0.5937 | 0.0698 | 0.5769 | 0.0709 | 0.1536 | 0.0873 |
| $E_{1,2}$ | 0 | 0 | 0.1552 | 0.0646 | 0 | 0 |
| $E_{1,3}$ | 0.1689 | 0.0640 | 0.1156 | 0.0688 | 0 | 0 |
| $E_{1,8}$ | 0 | 0 | 0.1135 | 0.0654 | 0 | 0 |
| $E_{2,1}$ | 0 | 0 | 0 | 0 | 0.1702 | 0.0828 |
| $E_{2,2}$ | 0.2330 | 0.0689 | 0 | 0 | 0.2877 | 0.0826 |
| $E_{2,3}$ | 0.1359 | 0.0663 | 0.2272 | 0.0672 | 0 | 0 |
| $E_{2,4}$ | 0 | 0 | 0 | 0 | 0.2296 | 0.0818 |
| $E_{3,6}$ | 0.1443 | 0.0666 | 0 | 0 | 0 | 0 |
| $E_{4,2}$ | 0.1306 | 0.0679 | 0.1957 | 0.0654 | 0 | 0 |
| $E_{4,4}$ | 0 | 0 | 0.1941 | 0.0667 | 0 | 0 |

| Diagnostic statistic | Region #1 | Region #2 | Region #3 |
|---|---|---|---|
| Deviance/df | 0.7890 | 0.8560 | 0.8369 |
| Pseudo-$R^2$ | 0.2412 | 0.3199 | 0.2502 |
| P (Shapiro–Wilk) | 0.0411 | 0.2103 | 0.0123 |
| MC | 0.0927 | 0.0131 | 0.0654 |
| Geary Ratio | 0.8682 | 0.9994 | 1.0451 |

Denotes a significant $\chi^2$-statistic at the 10% level. Coefficients of 0 were set and not estimated numerically. MC, Moran Coefficient.

The spatial filtering results reveal that complex patterns of spatial autocorrelation underpin these geographic distributions of counts. No eigenvector is common to all three regions, indicating that the underlying map patterns are quite distinct. This finding refutes using a single estimated parameter to capture spatial autocorrelation effects, suggesting that the Winsorized alternative to the auto-Poisson model specification may be too simplistic. However, spatial filtering requires three to six estimated parameters in order to characterize the spatial dependence captured by a single parameter with the Winsorized model.

Model assessment

Model assessment is presented in terms of mean squared prediction error, computed with the cross-validation technique, and scatterplots of predicted versus observed counts. Cross-validation was implemented in a jackknife fashion: each observed count was removed from an analysis, in turn, and then the analysis was repeated.
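The jackknife scheme just described can be sketched in a few lines of Python. This is a minimal sketch, assuming a deliberately simple stand-in model: an intercept-only Poisson regression, whose maximum likelihood estimate of the mean is the sample mean of the retained counts. The article's analyses refit spatial filter and Winsorized specifications instead; the function names, the `fit`/`predict` interface, and the toy counts are hypothetical. The summary mirrors the rows of Table 5 (mean difference, mean squared prediction error, minimum, and maximum).

```python
def jackknife_differences(counts, fit, predict):
    """Leave each count out in turn, refit, and return observed minus predicted values."""
    diffs = []
    for i, observed in enumerate(counts):
        training = counts[:i] + counts[i + 1:]  # all counts except the i-th
        model = fit(training)
        diffs.append(observed - predict(model, i))
    return diffs


def summarize(diffs):
    """Summarize prediction differences: mean, MSPE, minimum, and maximum."""
    n = len(diffs)
    return {
        "mean": sum(diffs) / n,
        "mspe": sum(d * d for d in diffs) / n,
        "min": min(diffs),
        "max": max(diffs),
    }


def fit_mean(training):
    """Stand-in model (an assumption): the intercept-only Poisson MLE is the sample mean."""
    return sum(training) / len(training)


def predict_mean(model, i):
    """The stand-in model predicts the same mean count at every location."""
    return model


toy_counts = [0, 1, 2, 1, 0, 3, 1, 2]  # hypothetical quadrat counts
summary = summarize(jackknife_differences(toy_counts, fit_mean, predict_mean))
print(summary)
```

Swapping `fit_mean` and `predict_mean` for routines that refit a spatial model on the retained quadrats would give the kind of comparison the text reports, at the cost of one model fit per held-out observation.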
The full set of eigenvectors was retained, as spatial filter eigenfunctions are independent of the geographically distributed counts; 121 spatial filter Poisson regression analyses were performed, each one containing a missing value. Next, a PLE was obtained for each subset of 121 observations. These PLEs were the parameter estimates used in computing 121 MCMC–MLEs. Each generated chain had 10,000 iterations, with the first 5000 removed as burn-in iterations, and the last 5000 weeded to 1000. Once the predicted values were computed with results from these procedures, differences between observed and predicted values were evaluated. The outcomes of these evaluations are summarized in Table 5. Cross-validation findings reveal that the spatial filter alternative to the auto-Poisson specification performs the best overall. MCMC–MLE results for the Winsorized auto-Poisson specification are an improvement over the PLE results, although they may contain some bias.

Table 5 Cross-Validation Maximum Likelihood Estimate (MLE) Results: Differences Between Observed and Predicted Values

| Statistic | Region #1 (n = 234) | Region #2 (n = 235) | Region #3 (n = 153) |
|---|---|---|---|
| PLE | | | |
| Mean | 0.0042 | 0.0006 | 0.0027 |
| MSPE | 1.3777 | 1.6408 | 1.0096 |
| Minimum | −2.4575 | −2.6088 | −2.1784 |
| Maximum | 3.2892 | 3.5383 | 2.7426 |
| Winsorized specification | | | |
| Mean | 0.3668 | 0.3587 | 0.3596 |
| MSPE | 1.3688 | 1.6161 | 0.9889 |
| Minimum | −1.9490 | −2.0475 | −1.4356 |
| Maximum | 3.6112 | 3.8631 | 3.1044 |
| Spatial filter specification | | | |
| Mean | 0.0009 | 0.0012 | 0.0011 |
| MSPE | 1.3289 | 1.3919 | 0.8547 |
| Minimum | −2.2482 | −2.8618 | −1.9391 |
| Maximum | 3.1786 | 3.6593 | 2.0722 |

PLE, pseudo-likelihood estimation; MSPE, mean squared prediction error.

Meanwhile, scatterplots of the observed versus predicted counts are portrayed in Fig. 4. Visual inspection of these plots reveals slight positive association trends, with those displayed by the spatial filter specification alternative to the auto-Poisson model being more pronounced and hence more conspicuous. Comparing the pseudo-$R^2$ values reported in Tables 3 and 4 corroborates this perceived difference. In addition, the spatial filter specification yields predicted value ranges that more closely agree with those for their corresponding observed counts. In all cases, considerable variation about the hypothetical regression lines remains (e.g., correlations between the spatial filter predicted and observed counts are roughly 0.5), again emphasizing the sizable random component associated with these drumlins data, a component also uncovered by Hill (1973). Failure to detect systematic trends in diagnostic statistics further supports model-based findings reported here.

Figure 4. Observed vs. predicted scatterplots. Markov Chain Monte Carlo–maximum likelihood estimation (MCMC–MLE) produces integer predicted values, whereas spatial filtering produces noninteger predicted values; multiple predicted values are overlaid in the scatterplots. (a) Top left: MCMC–MLE-based predictions for drumlins region #1. (b) Top center: MCMC–MLE-based predictions for drumlins region #2. (c) Top right: MCMC–MLE-based predictions for drumlins region #3. (d) Bottom left: spatial filter-based predictions for drumlins region #1. (e) Bottom center: spatial filter-based predictions for drumlins region #2. (f) Bottom right: spatial filter-based predictions for drumlins region #3.

Of note is that the deviance statistics, commonly used for model assessment, that are reported in Table 4 are less than 1, suggesting the possibility of underdispersion of the residuals. One source of underdispersion is a competitive situation. But drumlins are not territorial in a way that spreads them very evenly over the Irish landscape. Another source is overfitting of the data, which in a spatial filter model specification would result in overcorrecting for spatial autocorrelation. But neither the MC nor the Geary Ratio residual values are excessively negative. A third source is excessive zeroes, which would invalidate the $\chi^2$-statistical basis of the deviance statistic (Pigeon and Heyse 1999). But the $\alpha$ values suggest that the observed and expected numbers of zeroes are roughly equivalent. And a fourth source is random variation. Two-tailed 10%, 5%, and 1% confidence intervals contain virtually all the reported values; 0.7890 for region #1 is significant at the 10% level only. Therefore, the deviance statistics fail to imply any obvious model specification problems.
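For readers who wish to reproduce a deviance/df diagnostic of the kind discussed above, the following Python sketch computes the standard Poisson deviance, $D = 2 \sum_i [\, y_i \ln(y_i/\hat{\mu}_i) - (y_i - \hat{\mu}_i) \,]$ with $y_i \ln y_i$ taken as 0 when $y_i = 0$, and divides it by the residual degrees of freedom. It is a sketch only: the function names and the toy values are hypothetical, and the fitted means are assumed to come from a Poisson regression estimated elsewhere.

```python
import math


def poisson_deviance(observed, fitted):
    """Poisson deviance: 2 * sum[y * ln(y / mu) - (y - mu)], with y * ln(y) = 0 at y = 0."""
    total = 0.0
    for y, mu in zip(observed, fitted):
        log_term = y * math.log(y / mu) if y > 0 else 0.0
        total += log_term - (y - mu)
    return 2.0 * total


def deviance_per_df(observed, fitted, n_params):
    """Deviance divided by residual degrees of freedom (n minus estimated parameters)."""
    return poisson_deviance(observed, fitted) / (len(observed) - n_params)


# Hypothetical counts and fitted means from an intercept-only fit (one parameter).
y = [0, 2, 1, 3, 0, 1, 2, 1]
mu = [sum(y) / len(y)] * len(y)
print(round(deviance_per_df(y, mu, n_params=1), 4))
```

Ratios near 1 are consistent with the equidispersion of a Poisson variable, whereas ratios noticeably below 1, like those in Table 4, point toward possible underdispersion and motivate the checks enumerated above.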
Conclusions

The goal of this article is to present results of an empirical comparison of the performance of a Winsorized auto- and a spatial-filter-specified Poisson probability model for geographically correlated data using Irish drumlin counts by quadrat. The conceptual trade-off between these two specifications concerns simplicity. The Winsorized specification involves what appears to be merely a mathematical trick to force the Poisson probability mass function to accommodate positive spatial autocorrelation, and with a single spatial autocorrelation parameter. However, this Winsorized specification better recognizes the finiteness of physical reality, because the number of drumlins in a quadrat in Ireland could never approach infinity. In contrast, the spatial filter specification involves a likelihood function comprising mutually independent Poisson probability mass functions with a constant intensity parameter coupled with more than one spatial autocorrelation parameter. This increased number of parameters reveals far more detail about the complexity of underlying spatial dependencies. Of note is that a more comprehensive spatial filter analysis of these drumlins data, focusing on increasing scale effects and autonormal model estimate comparisons, appears in Griffith (2003).

Computational effort furnishes a second comparison criterion of interest to practitioners. The Winsorized specification requires a sequence of calculations that together are numerically intensive. First, PLEs need to be calculated. Second, MCMC chains based upon these PLEs, and using a Gibbs sampler, need to be generated and checked for convergence. An autobinomial approximation requires the additional inclusion of a Metropolis–Hastings algorithm. Third, MCMC–MLEs need to be calculated. These last calculations involve nonlinear estimation. Furthermore, because of the magnitude of numbers involved and embedded nonlinearities, the MCMC–MLEs can be unstable. In contrast, the spatial filter specification requires computation of eigenfunctions. For regular square tessellations forming complete rectangular regions, Equation (5) provides an analytical solution to this problem. For incomplete rectangular regions and irregular tessellations, current computer technology supports eigenfunction computations for n as large as about 10,000. But the eigenfunctions need to be calculated only once for a given surface partitioning. And estimation can be done with standard generalized linear model statistical software packages.

Description of the selected empirical data sets furnishes another comparison criterion. The geographic distributions of drumlins by quadrat in three particular geographic landscapes exhibit weak-to-moderate positive spatial autocorrelation. Preliminary inspections of east–west and north–south Moran scatterplots for these three landscapes found no evidence of anisotropy (see Fig. 5), supporting the use of isotropic model specifications. The Winsorized specification accounts for all but trace spatial autocorrelation in all three of these landscapes. Its spatial autoregressive term accounts for roughly 10% of the variation in drumlin quadrat counts across these three landscapes. But the counts residuals noticeably deviate from a normal frequency distribution. In contrast, the spatial filter specification accounts for more than 25% of the variation in drumlin quadrat counts across these three landscapes. The counts residuals conform reasonably well to a normal frequency distribution.
But this alternative version of an auto-Poisson model fails to fully capture all spatial dependency effects in one of the landscapes. This failure raises the question of whether the stepwise eigenvector selection rule should seek the greatest reduction in the log-likelihood function $\chi^2$-test statistic or the greatest reduction in spatial autocorrelation among counts residuals. Utilizing this latter rule calls for software development, because presently it is not an option in standard statistical software packages, as well as statistical distribution theory development for residual spatial autocorrelation for Poisson regression.

Figure 5. Anisotropic Moran scatterplots superimposed on isotropic Moran scatterplots. (a) Left: drumlins region #1. (b) Middle: drumlins region #2. (c) Right: drumlins region #3.

Predictive capabilities furnish a final comparison criterion to be discussed here. A cross-validation analysis of empirical results reveals that the MCMC–MLEs are improvements over the PLEs, a finding that is corroborated by the expected accompanying general reduction in standard errors. But the Winsorized version appears to have some prediction bias. The spatial filter version outperforms the Winsorized version in both these ways. Furthermore, the spatial filter version renders a wider range of predicted values than does the Winsorized version, better spanning the observed range of counts.

In conclusion, the Winsorized and spatial filter specification alternatives to the auto-Poisson probability model for describing georeferenced counts data each have strengths and weaknesses. The spatial filter specification portrays spatial autocorrelation as a more complex phenomenon, and by doing so tends to outperform the Winsorized specification according to a number of different criteria.
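The remark in the computational comparison above, that eigenfunctions are available analytically for complete rectangular regions of square cells, can be illustrated with a short Python sketch. It assumes the rook adjacency matrix $\mathbf{C}$ of a P-by-Q lattice and uses the well-known closed form for that matrix: eigenvalues $2[\cos(p\pi/(P+1)) + \cos(q\pi/(Q+1))]$ and eigenvectors with entries proportional to $\sin(p\pi r/(P+1)) \sin(q\pi c/(Q+1))$. Whether this coincides exactly with the article's Equation (5), and whether the index pairs match the $E_{p,q}$ labels of Table 4, are assumptions; these are also eigenvectors of $\mathbf{C}$ itself rather than of its centered version, and the function names are hypothetical. The sketch verifies the eigen-equation numerically and computes the Moran Coefficient of the resulting map pattern.

```python
import math


def lattice_eigenpair(P, Q, p, q):
    """Closed-form eigenvalue and unit-length eigenvector (p, q) of rook adjacency on a P-by-Q lattice."""
    vec = [math.sin(p * math.pi * (r + 1) / (P + 1)) *
           math.sin(q * math.pi * (c + 1) / (Q + 1))
           for r in range(P) for c in range(Q)]
    norm = math.sqrt(sum(v * v for v in vec))
    value = 2.0 * (math.cos(p * math.pi / (P + 1)) + math.cos(q * math.pi / (Q + 1)))
    return value, [v / norm for v in vec]


def apply_rook_adjacency(P, Q, x):
    """Return C x: each cell receives the sum of x over its rook neighbors."""
    out = []
    for r in range(P):
        for c in range(Q):
            s = 0.0
            if r > 0:
                s += x[(r - 1) * Q + c]
            if r < P - 1:
                s += x[(r + 1) * Q + c]
            if c > 0:
                s += x[r * Q + c - 1]
            if c < Q - 1:
                s += x[r * Q + c + 1]
            out.append(s)
    return out


def moran_coefficient(P, Q, x):
    """Moran Coefficient of map x under rook adjacency: (n / 1'C1) * z'Cz / z'z."""
    n = P * Q
    mean = sum(x) / n
    z = [v - mean for v in x]
    cz = apply_rook_adjacency(P, Q, z)
    total_weight = sum(apply_rook_adjacency(P, Q, [1.0] * n))  # 1'C1
    return (n / total_weight) * sum(a * b for a, b in zip(z, cz)) / sum(a * a for a in z)


value, vector = lattice_eigenpair(5, 4, 1, 2)
residual = max(abs(a - value * b)
               for a, b in zip(apply_rook_adjacency(5, 4, vector), vector))
print("eigenvalue:", value, "max |Cx - lambda x|:", residual)
print("Moran Coefficient of this map pattern:", moran_coefficient(5, 4, vector))
```

Because each eigenvector is obtained from a formula rather than a numerical decomposition, the cost grows only linearly with the number of quadrats for each candidate vector, which is consistent with the article's point that this step is inexpensive for complete rectangular regions.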
References

Barnett, V., and T. Lewis. (1978). Outliers in Statistical Data. New York: Wiley.
Besag, J. (1974). “Spatial Interaction and the Statistical Analysis of Lattice Systems.” Journal of the Royal Statistical Society 36B, 192–225.
Borcard, D., and P. Legendre. (2002). “All-Scale Spatial Analysis of Ecological Data by Means of Principal Coordinates of Neighbour Matrices.” Ecological Modelling 153, 51–68.
Clayton, D., and J. Kaldor. (1987). “Empirical Bayes Estimates of Age-Standardized Relative Risks for Use in Disease Mapping.” Biometrics 43, 671–81.
Cressie, N. (1993). Statistics for Spatial Data, 2nd ed. New York: Wiley.
Diggle, P. (1983). Statistical Analysis of Spatial Point Patterns. London: Academic Press.
Gilks, R., S. Richardson, and J. Spiegelhalter, eds. (1996). Markov Chain Monte Carlo in Practice. New York: Chapman & Hall.
Griffith, D. (2000a). “A Linear Regression Solution to the Spatial Autocorrelation Problem.” Journal of Geographical Systems 2, 141–56.
Griffith, D. (2000b). “Eigenfunction Properties and Approximations of Selected Incidence Matrices Employed in Spatial Analyses.” Linear Algebra and its Applications 321, 95–112.
Griffith, D. (2002). “A Spatial Filtering Specification for the Auto-Poisson Model.” Statistics and Probability Letters 58, 245–51.
Griffith, D. (2003). Spatial Autocorrelation and Spatial Filtering: Gaining Understanding through Theory and Scientific Visualization. Berlin, Germany: Springer-Verlag.
Griffith, D. (2004). “A Spatial Filtering Specification for the Auto-Logistic Model.” Environment and Planning A 36, 1791–1811.
Gumpertz, M., J. Graham, and J. Ristaino. (1997). “Autologistic Model of Spatial Pattern of Phytophthora Epidemic in Bell Pepper: Effects of Soil Variables on Disease Presence.” Journal of Agricultural, Biological, and Environmental Statistics 2, 131–56.
Hill, A. (1973). “The Distribution of Drumlins in County Down, Ireland.” Annals of the Association of American Geographers 63, 226–40.
Hubbell, S., J. Ahumada, R. Condit, and R. Foster. (2001). “Local Neighborhood Effects on Long-Term Survival of Individual Trees in a Neotropical Forest.” Ecological Research 16, 859–75.
Huffer, F., and H. Wu. (1998). “Markov Chain Monte Carlo for Autologistic Regression Models with Application to the Distribution of Plant Species.” Biometrics 54, 509–24.
Kaiser, M., and N. Cressie. (1997). “Modeling Poisson Variables with Positive Spatial Dependence.” Statistics and Probability Letters 35, 423–32.
Lee, Y., and J. Nelder. (2001). “Modelling and Analysing Correlated Non-Normal Data.” Statistical Modelling 1, 3–16.
Méot, A., P. Legendre, and D. Borcard. (1998). “Partialling out the Spatial Component of Ecological Variation: Questions and Propositions in the Linear Modelling Framework.” Environmental and Ecological Statistics 5, 1–27.
Pigeon, J., and J. Heyse. (1999). “A Cautionary Note About Assessing the Fit of Logistic Regression Models.” Journal of Applied Statistics 26, 847–53.
Ripley, B. (1981). Spatial Statistics. New York: Wiley.
Taylor, P. (1977). Quantitative Methods in Geography. New York: Houghton Mifflin.
Upton, G., and B. Fingleton. (1989). Spatial Data Analysis by Example, Vol. 2. New York: Wiley.