Journal of Statistical Computation and Simulation Vol. 74, No. 12, December 2004, pp. 855–866

FASTER MAXIMUM LIKELIHOOD ESTIMATION OF VERY LARGE SPATIAL AUTOREGRESSIVE MODELS: AN EXTENSION OF THE SMIRNOV–ANSELIN RESULT∗

DANIEL A. GRIFFITH†

Department of Geography, College of Arts and Science, Syracuse University, 144 Eggers Hall, Syracuse, NY 13244-1020, USA

(Received 25 October 2001; In final form 24 September 2002)

Maximization of an auto-Gaussian log-likelihood function when spatial autocorrelation is present requires numerical evaluation of an n × n matrix determinant. Griffith and Sone proposed a solution to this problem. This article simplifies and then evaluates an alternative approximation that can also be used with massively large georeferenced data sets based upon a regular square tessellation; this makes it particularly relevant to remotely sensed image analysis. Estimation results reported for five data sets found in the literature confirm the utility of this newer approximation.

Keywords: Spatial statistics; Jacobian; Georeferenced data; Spatial autocorrelation; Normalizing constant; Maximum likelihood; Auto-Gaussian

1 INTRODUCTION

One impediment hindering the use of spatial autoregressive models to describe massively large georeferenced data sets is calculating the Jacobian of the transformation from an autocorrelated to an unautocorrelated probability space. This Jacobian term is a normalizing constant, and involves an n × n matrix. Griffith identifies pronounced patterns in the covariation between this Jacobian term for a simultaneous autoregressive (SAR) or autoregressive response (AR) auto-Gaussian model – which is of the matrix form |I − ρW| – and the nature and degree of spatial autocorrelation, which is denoted here by the autoregressive parameter ρ (Griffith, 1992; Griffith and Sone, 1995). I is an n × n identity matrix, and W is an n × n geographic weights matrix indicating which locations are nearby. Matrix W is frequently the row-standardized version of a binary connectivity matrix C whose elements are defined as $c_{ij} = 1$ if areal units i and j are nearby, and $c_{ij} = 0$ otherwise. Suppose D is a diagonal matrix whose ith entry is $\sum_{j=1}^{n} c_{ij}$, the ith row sum of matrix C upon which matrix W is based. Then $W = D^{-1}C$.
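The definitions above can be made concrete with a minimal sketch, assuming a rook's move adjacency on a P × Q lattice and pixels numbered row by row. The function names `rook_weights` and `log_det_jacobian` are hypothetical, and dense Gaussian elimination is used only because it is easy to read; it is exactly the brute-force computation that becomes impractical for very large n.

```python
import math

def rook_weights(P, Q):
    """Row-standardized W = D^{-1} C for a P x Q lattice, rook adjacency.

    Returns one dict per pixel mapping neighbor index -> weight.
    Pixel (r, c) has index r * Q + c (an assumed numbering)."""
    W = []
    for r in range(P):
        for c in range(Q):
            nbrs = [(r + dr, c + dc)
                    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                    if 0 <= r + dr < P and 0 <= c + dc < Q]
            # Each neighbor receives 1 / (row sum of C).
            W.append({rr * Q + cc: 1.0 / len(nbrs) for rr, cc in nbrs})
    return W

def log_det_jacobian(W, rho):
    """ln|I - rho W| by Gaussian elimination without pivoting.

    For |rho| < 1 the matrix is strictly diagonally dominant by rows,
    so every pivot is positive and the logarithm is defined."""
    n = len(W)
    A = [[(1.0 if i == j else 0.0) - rho * W[i].get(j, 0.0)
          for j in range(n)] for i in range(n)]
    logdet = 0.0
    for k in range(n):
        pivot = A[k][k]
        logdet += math.log(pivot)
        for i in range(k + 1, n):
            if A[i][k] != 0.0:
                f = A[i][k] / pivot
                for j in range(k + 1, n):
                    A[i][j] -= f * A[k][j]
    return logdet

W = rook_weights(3, 4)
print(log_det_jacobian(W, 0.5))
```

The cost of the elimination grows as n³, which is why the remainder of the article replaces it with a closed-form approximation.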

∗ This research was supported by the National Science Foundation, research grant #BCS-9905213.
† Corresponding author. E-mail: [email protected]

© 2004 Taylor & Francis Ltd. ISSN 0094-9655 print; ISSN 1563-5163 online. DOI: 10.1080/00949650410001650126


Smirnov and Anselin (2001) exploit the polynomial equation that produces a scalar quantity that is the determinant, which may be written in terms of permutations as

$$|I - \rho W| = \sum_{k=1}^{n!} (-1)^{p} \left[\omega_{1,P_{k,1}}\, \omega_{2,P_{k,2}} \cdots \omega_{n,P_{k,n}}\right] = \text{scalar determinant value}, \tag{1}$$

where $\omega_{i,j}$ are the (i, j)-cell entries of matrix (I − ρW), $\{P_{k,j},\ j = 1, 2, \ldots, n\}$ is a permutation of the second subscript of the cell entries, and p is a count of the number of cell pair exchanges required to permute the consecutive set of integers {1, 2, . . . , n} to the permutation set $\{P_{k,j}\}$. There are n! possible permutations of n numbers, resulting in a determinant being the sum of n! products. Denote the coefficients of the resulting polynomial by $q_j$, j = 0, 1, . . . , n. Then the only product containing no ρ term is the one coming from the product of the diagonal elements of the matrix, or

$$q_0 = (-1)^{0} \prod_{k=1}^{n} \omega_{k,k} = 1(1^{n}) = 1,$$

since all diagonal entries of matrix W are 0, and hence all diagonal entries of matrix (I − ρW) are 1. The exponent of −1 is 0 because the permutation used is the original one (i.e., the set of consecutive integers). Because all nondiagonal entries $\omega_{i,j}$ are either 0 or a multiple of −ρ, switching only two $\omega_{i,j}$ values in the original permutation of subscripts means the product $(-1)^{p} \prod_{j=1}^{n} \omega_{j,P_{k,j}}$ will always render either 0 or a $\rho^{2}$ term. In other words, $q_1 = 0$ in every case. In addition, because matrix W can be rewritten as $D_{PQ}^{-1}[I_P \otimes C_Q + C_P \otimes I_Q]$, where ⊗ denotes the Kronecker product and $C_k$ is a k × k bi-diagonal matrix, the resulting tridiagonal partitioned matrix suggests that only even powers of ρ will appear in the determinant polynomial. Algebraically organizing these terms yields

$$q_0 + q_2\rho^{2} + \cdots + q_{2k-2}\rho^{2k-2} + q_{2k}\rho^{2k}, \qquad 2k \le n. \tag{2}$$

Eq. (2) thus contains only even powers of ρ. The purpose of this article is to outline how a relatively simple version of Eq. (2) can have its coefficients estimated so that it can be employed in spatial autoregressive analyses involving regular square tessellations, such as the pixels of a remotely sensed image or the field plots of many agricultural experiments. For the analysis presented here, only pixels sharing nonzero length boundaries are considered juxtaposed (i.e., the rook's move case in chess).
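As a small worked check of the even-powers pattern in Eq. (2), consider the 2 × 2 rook's move tessellation. This example is an illustration added here rather than one taken from the article; the pixel ordering (row by row) is an assumption. Every pixel has exactly two neighbors, so each row sum of C is 2:

```latex
W = \frac{1}{2}
\begin{pmatrix}
0 & 1 & 1 & 0 \\
1 & 0 & 0 & 1 \\
1 & 0 & 0 & 1 \\
0 & 1 & 1 & 0
\end{pmatrix},
\qquad
\lambda(W) = \{1,\ 0,\ 0,\ -1\},
\qquad
|I - \rho W| = \prod_{j=1}^{4} (1 - \rho\lambda_j) = (1 - \rho)(1 + \rho) = 1 - \rho^{2}.
```

Here $q_0 = 1$, $q_1 = 0$, and the only other nonzero coefficient belongs to $\rho^{2}$, in line with the general pattern.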

2 POTENT DETAILS OF MATRIX W

Because matrix W is stochastic (i.e., all of its entries are nonnegative and each of its rows sums to 1), we know analytically that $\lambda_1 \equiv 1$. Furthermore, the eigenvalues of matrix W are known to be real, and constitute a symmetric set. Although matrix W is almost always nonsymmetric, it has a corresponding similarity matrix that is symmetric; $D^{-1/2} C D^{-1/2}$ is a symmetric matrix whose eigenvalues are identical to those of matrix W. Summing the squared entries of this matrix for a regular square tessellation constituting a P × Q rectangular


FIGURE 1 Jacobian term plots for the data reported by Isaaks and Srivastava (), Bailey and Gatrell (), Mercer and Hall (), Wiebe (), and Long et al. (•).

region renders the quantity (18PQ + 11P + 11Q + 12)/72, which is the sum of the squared eigenvalues of matrix W (see Griffith, 2000).

The following five empirical data sets are analyzed here for illustrative purposes: digital elevation reported by Isaaks and Srivastava (1989) for a 10 × 10 region of Walker Lake; LANDSAT TM spectral bands reported by Bailey and Gatrell (1995) for a 30 × 30 region of the High Peak; grain and straw production reported by Mercer and Hall (1911) for a 20 × 25 agricultural field; grain production reported by Wiebe (1935) for a 12 × 125 agricultural field; and grain production reported by Long et al. (1993) for a 36 × 108 agricultural field. These data sets were selected because estimates of ρ are available for them. Plots of their Jacobian terms appear in Figure 1. The aforementioned symmetry is apparent in this figure, as the plots for values of ρ between 0 and 1 are mirror images of those between −1 and 0. The importance of this property is that the Jacobian term can be described by a relatively simple function.

Griffith (2000) outlines an algorithm for numerically approximating the eigenvalues of matrix W for any size P × Q region. These approximate eigenvalues, which are very close to their actual counterparts, can be used to construct plots like those appearing in Figure 1 when n is very large, and to estimate coefficients of the Griffith and Sone (1995) approximation.
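The closed form (18PQ + 11P + 11Q + 12)/72 can be checked without computing any eigenvalues, because the sum of the squared eigenvalues of W equals the trace of W², which in turn is the sum of $1/(d_i d_j)$ over ordered pairs of neighboring pixels, where $d_i$ is the ith row sum of C. The sketch below is an added illustration with hypothetical function names; it uses exact rational arithmetic. The agreement was worked out here only for P, Q ≥ 3; narrower lattices have no interior pixels and do not follow the formula.

```python
from fractions import Fraction

def sum_squared_eigenvalues(P, Q):
    """tr(W^2) for a P x Q rook lattice, i.e. the sum of squared eigenvalues of W.

    tr(W^2) = sum over ordered neighbor pairs (i, j) of 1 / (d_i * d_j)."""
    def degree(r, c):
        # Number of rook neighbors that fall inside the lattice.
        return (r > 0) + (r < P - 1) + (c > 0) + (c < Q - 1)

    total = Fraction(0)
    for r in range(P):
        for c in range(Q):
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < P and 0 <= cc < Q:
                    total += Fraction(1, degree(r, c) * degree(rr, cc))
    return total

def closed_form(P, Q):
    """The quantity (18PQ + 11P + 11Q + 12) / 72 quoted in the text."""
    return Fraction(18 * P * Q + 11 * P + 11 * Q + 12, 72)

for P, Q in ((10, 10), (30, 30), (20, 25), (12, 125), (36, 108)):
    print(P, Q, sum_squared_eigenvalues(P, Q) == closed_form(P, Q))
```

The five sizes in the loop are those of the empirical data sets listed above.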

3 PROMINENT COEFFICIENTS OF EQ. (2)

The five empirical data sets were augmented by a set of 11 supplemental regular square tessellation surface partitionings ranging in size from 5 × 5 to 50 × 50. Analysis of numerical results for the entire set of 16 surfaces reveals that only four coefficients of Eq. (2) are meaningful. Regression results corroborate that $q_0 = 1$ and $q_{2k+1} = 0$, k = 0, 1, . . . . A stepwise regression procedure was used to select prominent powers of ρ when describing the Jacobian term. In every case the dominant term is $\rho^{4}$, which accounts for at least 95.5% of the variance of the Jacobian term for the 5 × 5 tessellation, increasing to more than 98.5% for the largest tessellations. For all but the smallest tessellations, the second prominent term is $\rho^{2}$, which for most tessellations accounts for about 1% of the variance of the Jacobian term. The other competing prominent term is $\rho^{20}$, which for most tessellations accounts for about 0.5% of the variance of the Jacobian term. No other powers of ρ are repeatedly selected by the stepwise regression procedure; in all cases these three terms account for nearly all of the variance.


Estimates of the $q_j$ coefficients can be calculated in two different ways. The first is to fit trend lines to the regression results. Maximum likelihood estimation often involves the log-likelihood function. In keeping with this likelihood function transformation, estimates for coefficients $q_2$, $q_4$, and $q_{20}$ were calculated by fitting the following function with a nonlinear regression procedure:

$$-\frac{\sum_{j=1}^{n} \ln(1 - \rho\lambda_j)}{n} = \ln(1 + q_2\rho^{2} + q_4\rho^{4} + q_{20}\rho^{20}) + \varepsilon, \tag{3}$$

where the left-hand term is the log-Jacobian term −ln[|I − ρW|]/n appearing in the Gaussian likelihood function and written in terms of its eigenvalues, ln denotes the natural logarithm, and ε is a random error term. Of note is that $q_0$ is set to 1. The estimates of $q_2$, $q_4$, and $q_{20}$ obtained with Eq. (3) covary with P and Q, the horizontal and vertical Cartesian dimensions of a region. The resulting equations, again obtained with a nonlinear regression procedure, are

$$\hat{q}_2 = 0.11735 + 0.10091\left(\frac{1}{P^{5/4}} + \frac{1}{Q^{5/4}}\right) + \frac{0.42844}{n}, \tag{4}$$

$$\hat{q}_4 = 0.07421 + 0.05730\left(\frac{1}{P^{2/3}} + \frac{1}{Q^{2/3}}\right) - \frac{0.66001}{n}, \quad\text{and} \tag{5}$$

$$\hat{q}_{20} = 0.05521 + 0.52467\left(\frac{1}{P^{7/4}} + \frac{1}{Q^{7/4}}\right) + \frac{2.48015}{n}. \tag{6}$$

For massively large tessellations that form approximately square regions, such as those for a remotely sensed image, these estimates converge on $\hat{q}_2 = 0.11735$, $\hat{q}_4 = 0.07421$, and $\hat{q}_{20} = 0.05521$. Of note is that even for a relatively small tessellation, such as the 10 × 10 region used by Isaaks and Srivastava, these asymptotic values render a quite respectable estimate of ρ. The relationship between actual and predicted Jacobian terms based upon these estimates is portrayed in Figure 2; a modest deviation is detectable only for the smallest landscape (i.e., the surface from Isaaks and Srivastava), and only for very large values of ρ.

Rather than fitting trend lines to estimated coefficients, the estimated coefficients can be written as functions of the quantity (18PQ + 11P + 11Q + 12)/72. The conceptual basis for positing such a functional form rests upon the coefficients of Eq. (2) being sums of products of

FIGURE 2 Jacobian term observed and predicted comparisons for the data reported by Isaaks and Srivastava (), Bailey and Gatrell (), Mercer and Hall (), Wiebe (), and Long et al. (•).


n-tuples of the entries in matrix $D^{-1/2} C D^{-1/2}$. This quantity can be calculated quickly for any size region, and as noted previously, relates directly to the sum of the squared eigenvalues of matrix W. The resulting equations, once more obtained with a nonlinear regression procedure, are

$$\hat{q}_2 = 0.09244 + 0.02900\left[\frac{18PQ + 11P + 11Q + 12}{72} + 0.73812\right]^{16}, \tag{7}$$

$$\hat{q}_4 = -0.11558 + 0.79785\left[\frac{18PQ + 11P + 11Q + 12}{72}\right] - 4933.42818\left[\frac{18PQ + 11P + 11Q + 12}{72}\right]^{10}, \quad\text{and} \tag{8}$$

$$\hat{q}_{20} = 0.02052 + 1372.00643\left[\frac{18PQ + 11P + 11Q + 12}{72} + 0.21400\right]^{14}. \tag{9}$$

For each of these three coefficients the right-hand side of the equation accounts for well over 99% of the variation in the estimated coefficients.
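The trend-line route of Eqs. (3)–(6) can be sketched in a few lines. The coefficient formulas below follow one reading of Eqs. (4)–(6), namely a constant, plus a multiple of $P^{-a} + Q^{-a}$, plus or minus a multiple of 1/n; that grouping of terms is an assumption, as are the function names. A brute-force evaluation of −ln|I − ρW|/n is included only so the approximation can be compared with the exact value on a small lattice.

```python
import math

def q_hat(P, Q):
    """Coefficient estimates under the assumed reading of Eqs. (4)-(6)."""
    n = P * Q
    q2 = 0.11735 + 0.10091 * (P ** -1.25 + Q ** -1.25) + 0.42844 / n
    q4 = 0.07421 + 0.05730 * (P ** (-2 / 3) + Q ** (-2 / 3)) - 0.66001 / n
    q20 = 0.05521 + 0.52467 * (P ** -1.75 + Q ** -1.75) + 2.48015 / n
    return q2, q4, q20

def approx_jacobian(rho, P, Q):
    """Right-hand side of Eq. (3): ln(1 + q2 rho^2 + q4 rho^4 + q20 rho^20)."""
    q2, q4, q20 = q_hat(P, Q)
    return math.log(1.0 + q2 * rho ** 2 + q4 * rho ** 4 + q20 * rho ** 20)

def exact_jacobian(rho, P, Q):
    """-ln|I - rho W| / n by dense elimination (small lattices only)."""
    n = P * Q
    A = [[0.0] * n for _ in range(n)]
    for r in range(P):
        for c in range(Q):
            i = r * Q + c
            nbrs = [(r + dr) * Q + (c + dc)
                    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                    if 0 <= r + dr < P and 0 <= c + dc < Q]
            A[i][i] = 1.0
            for j in nbrs:
                A[i][j] = -rho / len(nbrs)  # entry of I - rho W
    logdet = 0.0
    for k in range(n):
        logdet += math.log(A[k][k])  # pivots stay positive for |rho| < 1
        for i in range(k + 1, n):
            if A[i][k] != 0.0:
                f = A[i][k] / A[k][k]
                for j in range(k + 1, n):
                    A[i][j] -= f * A[k][j]
    return -logdet / n

for rho in (0.1, 0.5, 0.9):
    print(rho, approx_jacobian(rho, 10, 10), exact_jacobian(rho, 10, 10))
```

Evaluating `approx_jacobian` costs a handful of arithmetic operations regardless of n, whereas `exact_jacobian` scales as n³; that contrast is the practical point of the approximation.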

4 COMPARISONS OF EMPIRICAL ESTIMATES OF ρ

The five data sets furnish eleven georeferenced variables. Spectral bands #4 and #5 of the High Peak data are not analyzed here because they require a more sophisticated spatial autocorrelation specification. Independent and identically distributed random numbers drawn from a normal distribution, with mean of 0 and variance of 1, were added to the Isaaks and Srivastava data set in order to have an example of a value for ρ near zero. Figure 3 displays a scatterplot of the 12 empirical estimate pairs. The dotted line represents a perfect correspondence. The approximations are based upon Eqs. (4)–(6); there is virtually no difference between this plot and one based upon Eqs. (7)–(9). As this figure indicates, the approximations are not noticeably different from their exact counterparts.

FIGURE 3 Exact and approximate [from Eqs. (4)–(6)] estimates of ρ for the data reported by Isaaks and Srivastava, Bailey and Gatrell, Mercer and Hall, Wiebe, and Long et al.


FIGURE 4 Goodness-of-fit of Jacobian approximations calculated with Eqs. (4)–(6).

5 ESTIMATING ρ FOR TWO VERY LARGE DATA SETS

Asymptotics appear to play an important role in this estimation situation. Figure 4 depicts the relationship between goodness-of-fit of the Jacobian term approximations and n = PQ; as n increases, the correspondence between the exact and approximate Jacobian plots (e.g., see Fig. 1) becomes near perfect. Equation (3) results coupled with Eqs. (4)–(6) and then with Eqs. (7)–(9) were used to estimate the autoregressive parameter ρ for all seven spectral bands from a 500 × 500 pixels LANDSAT TM image of the Adirondack Park, NY, explored in Griffith and Fellows (1999). These equations also were used to estimate ρ for a 268 × 276 pixels AVHRR (advanced very high resolution radiometer) radar image of part of Alaska. These estimation results, together with those for the equation reported by Griffith and Sone (1995), appear in Table I. Denoting the Jacobian term by J, this latter equation may be written as

$$\ln(J) = \alpha[2\ln(\delta) - \ln(\delta + \rho) - \ln(\delta - \rho)], \tag{10}$$

where for the Adirondack Park data $\hat{\alpha} = 0.15278$ and $\hat{\delta} = 1.14543$, and for the Alaska data $\hat{\alpha} = 0.15260$ and $\hat{\delta} = 1.14411$. The correspondence between values produced by Eq. (10) and the actual Jacobian term values also looks like that portrayed in the graph appearing in Figure 2. Inspection of these tabulated results reveals that the estimates rendered by Eq. (3) are extremely good.

TABLE I Estimates of the Spatial Autoregressive Parameter ρ for an AR or SAR Model.

Data set                                    Griffith/Sone equation   Eqs. (3)–(6)   Eqs. (3) and (7)–(9)
Adirondack, NY
  Spectral band #1                          0.8048                   0.8128         0.8207
  Spectral band #2                          0.8864                   0.8883         0.8961
  Spectral band #3                          0.9130                   0.9108         0.9179
  Spectral band #4                          0.9969                   0.9843         0.9883
  Spectral band #5                          0.9937                   0.9815         0.9856
  Spectral band #6                          0.9842                   0.9782         0.9805
  Spectral band #7                          0.9677                   0.9575         0.9626
Alaska
  Visible/near-/thermal infrared spectrum   0.7982                   0.8054         0.8125
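For a quick numerical feel, Eq. (10) can be evaluated alongside the right-hand side of Eq. (3). The sketch below is an added illustration: it uses the Adirondack Park values of α and δ quoted above for Eq. (10), and the asymptotic coefficients 0.11735, 0.07421, and 0.05521 for Eq. (3), on the assumption that a 500 × 500 image is large enough for the asymptotic values to apply. The function names are hypothetical.

```python
import math

def griffith_sone(rho, alpha, delta):
    """Eq. (10): ln(J) = alpha * [2 ln(delta) - ln(delta + rho) - ln(delta - rho)]."""
    return alpha * (2.0 * math.log(delta)
                    - math.log(delta + rho) - math.log(delta - rho))

def eq3_asymptotic(rho, q2=0.11735, q4=0.07421, q20=0.05521):
    """Right-hand side of Eq. (3) with the large-n coefficient values."""
    return math.log(1.0 + q2 * rho ** 2 + q4 * rho ** 4 + q20 * rho ** 20)

ADIRONDACK = (0.15278, 1.14543)  # (alpha-hat, delta-hat) from the text

for rho in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(rho, griffith_sone(rho, *ADIRONDACK), eq3_asymptotic(rho))
```

Both functions are even in ρ and vanish at ρ = 0, which is the symmetry visible in Figure 1.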

6 A SIMULATION EXPERIMENT

A Monte Carlo experiment was conducted in order to illustrate the utility of the new Jacobian approximation. The factors of this experiment are three levels of spatial autocorrelation (ρ = 0.1, 0.5, 0.9), and three regular square tessellations (30 × 30, 49 × 51, 70 × 78). For each cross-factor, n (= 900, 2499, or 5460) iid random numbers, $\xi_i$ (i = 1, 2, . . . , n), were selected from a normal distribution, for which µ = 0 and σ = 1, using the IMSL function RNNOR and a seed selected with the computer system clock. Each selection was replicated 250 times. Each set of n random numbers was converted to a spatially autocorrelated variable Y as follows:

$$Y = (I - \rho W)^{-1}\xi. \tag{11}$$
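One replicate of the transformation in Eq. (11) can be sketched as follows. This is not the IMSL-based procedure used in the experiment: Python's `random` module with a fixed seed stands in for RNNOR, and the system (I − ρW)Y = ξ is solved by fixed-point iteration instead of forming the inverse. All function names are hypothetical, and the lattice is assumed to contain more than one pixel.

```python
import random

def rook_neighbors(P, Q):
    """Neighbor index lists for a P x Q lattice, pixels numbered row by row."""
    nbrs = []
    for r in range(P):
        for c in range(Q):
            nbrs.append([(r + dr) * Q + (c + dc)
                         for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                         if 0 <= r + dr < P and 0 <= c + dc < Q])
    return nbrs

def apply_W(nbrs, x):
    """W x for row-standardized W: the mean of each pixel's rook neighbors."""
    return [sum(x[j] for j in nb) / len(nb) for nb in nbrs]

def simulate_sar(P, Q, rho, seed=0, tol=1e-12, max_iter=10_000):
    """Draw xi ~ N(0, 1) and solve (I - rho W) Y = xi.

    The iteration Y <- xi + rho * W Y sums the series sum_k rho^k W^k xi,
    which converges for |rho| < 1 because each row of W sums to 1."""
    rng = random.Random(seed)
    nbrs = rook_neighbors(P, Q)
    xi = [rng.gauss(0.0, 1.0) for _ in range(P * Q)]
    y = xi[:]
    for _ in range(max_iter):
        wy = apply_W(nbrs, y)
        y_new = [e + rho * w for e, w in zip(xi, wy)]
        if max(abs(a - b) for a, b in zip(y_new, y)) < tol:
            return y_new, xi
        y = y_new
    raise RuntimeError("fixed-point iteration did not converge")

y, xi = simulate_sar(30, 30, 0.9, seed=1)
print(len(y))
```

Each iteration touches only the four neighbors of every pixel, so the n × n matrix is never stored; convergence slows as ρ approaches 1.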

These simulated data are intended both to mimic the nature of a realistic situation, and to define a problem for which exact computation can be done rapidly enough to investigate the utility of the new Jacobian approximation. Maximum likelihood estimates of ρ were computed for each of the 9 × 250 simulated square tessellation surfaces, first using the Griffith–Sone Jacobian approximation, and second using Eqs. (3) and (7)–(9). Summary statistics for these estimates appear in Table II. The mean Kolmogorov–Smirnov statistics suggest that the individual sets of initial random numbers, denoted by ξ in Eq. (11), conform well to normal distributions, with this conformity improving with increasing lattice size. Mean estimates of ρ closely correspond to their population parameter values, regardless of the Jacobian approximation used. Boxplots of the estimates of ρ are very similar for the two Jacobian approximations, although slightly less variation appears to be present in the estimates computed using Eqs. (3) and (7)–(9). Scatterplots comparing these estimates appear in Figure 5. These plots confirm that the estimates are essentially the same across lattice size and value of ρ. They also reveal that the variance in $\hat{\rho}$ tends to decrease as ρ approaches its upper limit of 1; this same property characterizes the conventional Pearson product moment correlation coefficient. A more comprehensive qualitative comparison of the simulation results appears in Table III. Because both $\hat{\rho}_{\text{Griffith–Sone}}$ and $\hat{\rho}_{\text{Eqs. (3) and (7)–(9)}}$ were calculated for each data vector, Y, the difference between pairs of values is evaluated in order to account for their being correlated. Diagnostics computed for these differences suggest that they deviate from normality, especially when ρ = 0.1, and that variance is not constant across the lattice size × ρ cross-classifications.
Evaluation of a weighted linear combination of these pairs of values, in order to maximize conformity with the normal distribution and remove variance heterogeneity, implies that the transformed differences fluctuate only with regard to ρ. Assessment of the raw differences using nonparametric techniques corroborates this finding. Formal hypothesis test results are not reported for the raw differences because of assumption violations when performing an ANOVA, and because the two-way nonparametric analysis may well have a significant interaction term. Nevertheless, adjustments for assumption violations as well as relative factor orderings based upon decompositions of variance for ranks suggest that very large lattice datasets containing marked levels of spatial autocorrelation can be successfully analyzed using the Jacobian approximation developed in this article.

TABLE II Selected Summary Statistics Calculated for Results from the Simulation Experiments.

                                             Surface tessellation size
ρ     Statistic                              30 × 30   49 × 51   70 × 78
0.9   Mean Kolmogorov–Smirnov                0.021     0.013     0.009
      Mean ρ̂, Griffith–Sone                 0.898     0.898     0.899
      Mean ρ̂, Eqs. (3) and (7)–(9)          0.896     0.898     0.900
      Boxplot                                [graphic]
0.5   Mean Kolmogorov–Smirnov                0.021     0.013     0.009
      Mean ρ̂, Griffith–Sone                 0.494     0.495     0.496
      Mean ρ̂, Eqs. (3) and (7)–(9)          0.490     0.492     0.493
      Boxplot                                [graphic]
0.1   Mean Kolmogorov–Smirnov                0.021     0.013     0.008
      Mean ρ̂, Griffith–Sone                 0.097     0.102     0.099
      Mean ρ̂, Eqs. (3) and (7)–(9)          0.101     0.106     0.103
      Boxplot                                [graphic]

FIGURE 5 Bivariate plots of paired ρˆ values obtained from the simulation experiments. The dotted line denotes the theoretical values; solid circles denote simulation results for ρ = 0.9 (upper right-hand corner cluster); asterisks denote simulation results for ρ = 0.5 (center cluster); solid squares denote simulation results for ρ = 0.1 (lower left-hand corner cluster).

TABLE III Diagnostic and Evaluation Statistics for the Simulation Experiment Results.

Column (a): $\hat{\rho}_{\text{Griffith–Sone}} - \hat{\rho}_{\text{Eqs. (3) and (7)–(9)}}$
Column (b): $(1.1\hat{\rho}_{\text{Griffith–Sone}} - 0.1\hat{\rho}_{\text{Eqs. (3) and (7)–(9)}})/S_{\text{lattice} \times \rho}$

Statistic                                         (a)        (b)
Shapiro–Wilk test statistic for normality
  30 × 30, ρ = 0.1                                0.77855    0.99712
  30 × 30, ρ = 0.5                                0.95529    0.99288
  30 × 30, ρ = 0.9                                0.94510    0.99257
  49 × 51, ρ = 0.1                                0.55145    0.99602
  49 × 51, ρ = 0.5                                0.94624    0.99413
  49 × 51, ρ = 0.9                                0.95283    0.99551
  70 × 78, ρ = 0.1                                0.40363    0.98978
  70 × 78, ρ = 0.5                                0.95042    0.97630
  70 × 78, ρ = 0.9                                0.94997    0.99054
Bartlett's homogeneity of variance test (χ²)      698.74     0.00
2-way ANOVA F-ratio
  Lattice size                                    12.22      0.00
  ρ                                               776.21     119.90
  Lattice size × ρ                                7.54       0.00
Kruskal–Wallis nonparametric ANOVA (χ²)           970.30     214.36
2-way ANOVA F-ratio for ranked data
  Lattice size                                    10.33      0.04
  ρ                                               829.25     117.87
  Lattice size × ρ                                5.34       0.07

7 COMPUTER CODING CONSIDERATIONS

All of the computations reported in this article were executed with the SAS software package; any package with a nonlinear regression procedure should be usable. An important detail to get correct when programming the spatial autoregressive model in such a package is the partial derivative with respect to the autoregressive parameter ρ. A potential error arises because of the presence of the Jacobian term on both sides of the equation. The correct calculus is as follows, for a constant mean spatial autoregressive model specification:

$$J = 1 + q_2\rho^{2} + q_4\rho^{4} + q_{20}\rho^{20},$$

$$\frac{\partial J}{\partial \rho} = 2q_2\rho + 4q_4\rho^{3} + 20q_{20}\rho^{19};$$

the nonlinear regression model (see Griffith, 1988) is

$$Y \cdot J = [\rho WY + \mu(1 - \rho)\mathbf{1} + \varepsilon]J,$$

where $\mathbf{1}$ is an n × 1 vector of ones and µ denotes the population mean;

$$\frac{\partial (Y \cdot J)}{\partial \rho} = J\frac{\partial Y}{\partial \rho} + Y\frac{\partial J}{\partial \rho};$$

hence,

$$J\frac{\partial Y}{\partial \rho} = [\rho WY + \mu(1 - \rho)\mathbf{1} - Y]\frac{\partial J}{\partial \rho} + (WY - \mu\mathbf{1})J.$$

This last equation is not calculated correctly by automated derivative procedures such as the one available in the latest versions of SAS.
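Because the derivative has to be coded by hand, a finite-difference check is a cheap safeguard. The sketch below is an added illustration with hypothetical function names; it codes J, ∂J/∂ρ, and the right-hand side of the final derivative expression elementwise, treating Y and WY as fixed data, and uses the asymptotic coefficient values purely as example inputs.

```python
def jacobian_poly(rho, q2, q4, q20):
    """J = 1 + q2 rho^2 + q4 rho^4 + q20 rho^20."""
    return 1.0 + q2 * rho ** 2 + q4 * rho ** 4 + q20 * rho ** 20

def d_jacobian_poly(rho, q2, q4, q20):
    """dJ/drho = 2 q2 rho + 4 q4 rho^3 + 20 q20 rho^19."""
    return 2.0 * q2 * rho + 4.0 * q4 * rho ** 3 + 20.0 * q20 * rho ** 19

def rhs_derivative(rho, y, wy, mu, q):
    """[rho WY + mu (1 - rho) 1 - Y] dJ/drho + (WY - mu 1) J, elementwise.

    y and wy are sequences holding Y and WY; q is the tuple (q2, q4, q20)."""
    J = jacobian_poly(rho, *q)
    dJ = d_jacobian_poly(rho, *q)
    return [(rho * w + mu * (1.0 - rho) - yi) * dJ + (w - mu) * J
            for yi, w in zip(y, wy)]

q = (0.11735, 0.07421, 0.05521)  # asymptotic values, used here as an example
h = 1e-6
central = (jacobian_poly(0.7 + h, *q) - jacobian_poly(0.7 - h, *q)) / (2 * h)
print(central, d_jacobian_poly(0.7, *q))
```

The two printed numbers should agree to several decimal places; a mismatch usually signals a dropped term such as the $\rho^{19}$ contribution.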


A coding trick that can be used to avoid constructing the n × n matrix C, which then is converted to matrix W, may be summarized as follows:

Step 1. Generate a horizontal Cartesian coordinate value U by numbering the pixel locations from, say, west to east using the consecutive integers 1, 2, . . . , Q.

Step 2. Generate a vertical Cartesian coordinate value V by numbering the pixel locations from, say, north to south using the consecutive integers 1, 2, . . . , P.

Step 3. Attach a buffer zone along the east side of the P × Q rectangular region by creating coordinates Q + 0.1, and along the south side of the region by creating coordinates P + 0.1, and then assign missing values to each of these P + Q − 1 artificial buffer pixels.

Step 4. Sort the data simultaneously in ascending order on U followed by V, and use a time series LAG1 function to retrieve the north neighbor values; sort the data simultaneously in ascending order on V followed by U, and use a time series LAG1 function to retrieve the west neighbor values; sort the data simultaneously in descending order on U followed by V, and use a time series LAG1 function to retrieve the south neighbor values; sort the data simultaneously in descending order on V followed by U, and use a time series LAG1 function to retrieve the east neighbor values.

Step 5. Count the number of nonmissing values to determine the row sums of matrix C that can be used as a divisor with CY to convert it to WY.

This procedure can be expanded using lag orders greater than 1 to identify other nearby neighbors if the geographic configuration differs from a rook's move structure (e.g., a queen's move structure).
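The same idea carries over to an array language. The sketch below is not the sort-and-LAG1 procedure itself: it replaces the sorting and buffer pixels with direct index shifts on a P × Q grid stored as a list of rows, which yields the same north, south, west, and east values and the same nonmissing count for Step 5. The function name `spatial_lag` is hypothetical, and the grid is assumed to hold more than one pixel.

```python
def spatial_lag(grid):
    """Return WY for a P x Q grid under rook adjacency, without building C.

    For each pixel, collect whichever of the north, south, west and east
    values exist, then divide their sum by how many were found (the row
    sum of C for that pixel)."""
    P, Q = len(grid), len(grid[0])
    wy = [[0.0] * Q for _ in range(P)]
    for v in range(P):          # v runs north to south
        for u in range(Q):      # u runs west to east
            vals = []
            if v > 0:
                vals.append(grid[v - 1][u])   # north neighbor
            if v < P - 1:
                vals.append(grid[v + 1][u])   # south neighbor
            if u > 0:
                vals.append(grid[v][u - 1])   # west neighbor
            if u < Q - 1:
                vals.append(grid[v][u + 1])   # east neighbor
            wy[v][u] = sum(vals) / len(vals)
    return wy

y = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 9.0]]
for row in spatial_lag(y):
    print(row)
```

Storage stays proportional to n instead of n², which is what makes the lag computation feasible for full-size images.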

8 DISCUSSION

The advantage of Eq. (3) over the approximation promoted by Smirnov and Anselin (2001) is that it does not require manipulating large matrices. In addition, including terms having $q_j$ coefficients of 0 (the odd powers), or near zero (even powers other than 2, 4, and 20), increases computational time with little or no gain in estimation precision. The advantage of Eq. (3) over the approximation promoted by Griffith and Sone (1995) is that it does not require knowledge of the extreme eigenvalues of matrix W, although these eigenvalues are known for a P × Q regular square tessellation to be ±1. Of note is that this second approach has been used to calculate estimates of ρ for spatial data sets as large as 25 million observations (Griffith, 2002). In conclusion, estimates obtained with Eq. (3) are sufficiently precise for practical purposes, deviating from their exact counterparts by only about 1%. They also furnish a relatively simple way to implement spatial autoregressive models to analyze massively large georeferenced data sets.

References

Bailey, T. and Gatrell, A. (1995). Interactive Spatial Data Analysis. Longman Scientific, Harlow.
Griffith, D. (1988). Estimating spatial autoregressive model parameters with commercial statistical packages. Geographical Analysis, 20, 176–186.
Griffith, D. (1992). Simplifying the normalizing factor in spatial autoregressions for irregular lattices. Papers in Regional Science, 71, 71–86.
Griffith, D. (2000). Eigenfunction properties and approximations of selected incidence matrices employed in spatial analyses. Linear Algebra & Its Applications, 321, 95–112.
Griffith, D. (2002). Modeling spatial dependence in high spatial resolution hyperspectral data sets. J. of Geographical Systems (forthcoming).
Griffith, D. and Fellows, P. (1999). Pixels and eigenvectors: classification of LANDSAT TM imagery using spectral and locational information. In: Lowell, K. and Jaton, A. (Eds.), Spatial Accuracy Assessment: Land Information Uncertainty in Natural Resources. Ann Arbor Press, Ann Arbor, pp. 309–317.
Griffith, D. and Sone, A. (1995). Trade-offs associated with normalizing constant computational simplifications for estimating spatial statistical models. J. of Statistical Computation and Simulation, 51, 165–183.
Isaaks, E. and Srivastava, R. (1989). An Introduction to Applied Geostatistics. Oxford University Press, Oxford, U.K.
Long, D., DeGloria, S., Carlson, G. and Nielsen, G. (1993). Spatial regression analysis of crop and soil variability within an experimental research field. In: Robert, P., et al., Proceedings of the First Workshop on Soil Specific Crop Management, Minneapolis, MN, April 14–16, 1992. ASA-CSSA-SSSA, Madison, WI, pp. 365–366.
Mercer, W. and Hall, A. (1911). The experimental error of field trials. J. of Agricultural Science (Cambridge), 4, 107–132.
Smirnov, O. and Anselin, L. (2001). Fast maximum likelihood estimation of very large spatial autoregressive models: a characteristic polynomial approach. Computational Statistics & Data Analysis, 35, 301–319.
Wiebe, G. (1935). Variation and correlation among 1500 wheat nursery plots. J. of Agricultural Research, 50, 331–357.