Comparing Exact and Approximate Spatial Auto-Regression Model Solutions for Spatial Data Analysis

Baris M. Kazar(1), Shashi Shekhar(2), David J. Lilja(1), Ranga R. Vatsavai(2), and R. Kelley Pace(3)

(1) Electrical and Computer Engineering Department, University of Minnesota, Twin Cities, MN 55455, {Kazar, Lilja}@ece.umn.edu
(2) Computer Science and Engineering Department, University of Minnesota, Twin Cities, MN 55455, {Shekhar, Vatsavai}@cs.umn.edu
(3) LREC Endowed Chair of Real Estate, 2164B CEBA, Department of Finance, E.J. Ourso College of Business, Louisiana State University, Baton Rouge, LA 70803-6308, [email protected]

Abstract. The spatial auto-regression (SAR) model is a popular spatial data analysis technique, which has been used in many applications with geo-spatial datasets. However, exact solutions for estimating SAR parameters are computationally expensive due to the need to compute all the eigenvalues of a very large matrix. Recently we developed a dense-exact parallel formulation of the SAR parameter estimation procedure using data parallelism and a hybrid programming technique. Though this parallel implementation showed scalability up to eight processors, the exact solution still suffers from high computational complexity and memory requirements. These limitations have led us to investigate approximate solutions for SAR model parameter estimation with the main objective of scaling the SAR model for large spatial data analysis problems. In this paper we present two candidate approximate-semi-sparse solutions of the SAR model based on Taylor series expansion and Chebyshev polynomials. Our initial experiments showed that these new techniques scale well for very large data sets, such as remote sensing images having millions of pixels. The results also show that the differences between exact and approximate SAR parameter estimates are within 0.7% and 8.2% for Chebyshev polynomials and Taylor series expansion, respectively, and have no significant effect on the prediction accuracy.

1 Introduction

Explosive growth in the size of spatial databases has highlighted the need for spatial data analysis and spatial data mining techniques to mine the interesting but implicit spatial patterns within these large databases. Extracting useful and interesting patterns from massive geo-spatial datasets is important for many application domains, such as regional economics, ecology and environmental management, public safety, transportation, public health, business, and travel and tourism [8,34,35]. Many classical data mining algorithms, such as linear regression, assume that the learning samples are independently and identically distributed (i.i.d.). This assumption is violated in the case of spatial data due to spatial autocorrelation [2,34], and in such cases classical linear regression yields a weak model with not only low prediction accuracy [35] but also residual error exhibiting spatial dependence. Modeling spatial dependencies improves overall classification and prediction accuracies.

The spatial auto-regression (SAR) model [10,14,34] is a generalization of the linear regression model that accounts for spatial autocorrelation. It has been successfully used to analyze spatial datasets in regional economics and ecology [8,35], and it yields better classification and prediction accuracy [8,35] for many spatial datasets exhibiting strong spatial autocorrelation. However, it is computationally expensive to estimate the parameters of SAR. For example, it can take an hour of computation for a spatial dataset with 10K observation points on a single IBM Regatta processor using a 1.3 GHz pSeries 690 Power4 architecture with 3.2 GB memory. This has limited the use of SAR to small problems, despite its promise to improve classification and prediction accuracy for larger spatial datasets. For example, SAR was applied to accurately estimate crop parameters [37] using airborne spectral imagery; however, the study was limited to 74 pixels. A second study, reported in [21], was limited to 3888 observation points.

Table 1. Classification of algorithms solving the serial spatial auto-regression model

Maximum Likelihood, Exact: Applying Direct Sparse Matrix Algorithms [25]; Eigenvalue based 1-D Surface Partitioning [16]
Maximum Likelihood, Approximate: ML based Matrix Exponential Specification [26]; Graph Theory Approach [32]; Taylor Series Approximation [23]; Chebyshev Polynomial Approximation Method [30]; Semiparametric Estimates [27]; Characteristic Polynomial Approach [36]; Double Bounded Likelihood Estimator [31]; Upper and Lower Bounds via Divide & Conquer [28]; Spatial Autoregression Local Estimation [29]
Bayesian, Exact: None
Bayesian, Approximate: Bayesian Matrix Exponential Specification [19]; Markov Chain Monte Carlo (MCMC) [3,17]

The high computational complexity of the SAR model has attracted a number of researchers, who have proposed efficient methods of solving the model. These solutions, summarized in Table 1, can be classified into exact and approximate solutions, based on how they compute certain compute-intensive terms in the SAR solution procedure. Exact solutions suffer from high computational complexity and memory requirements. Approximate solutions are computationally feasible, but many of these formulations still suffer from large memory requirements. For example, a standard remote sensing image consisting of 3000 lines (rows) by 3000 pixels (columns) and six bands (dimensions) leads to a large neighborhood (W) matrix of size 9 million rows by 9 million columns. (The details for forming the neighborhood matrix W can be found in Sect. 2.) Thus, the exact implementations of SAR are simply not capable of processing such large images, and approximate solutions must be found. We chose the Taylor and Chebyshev approximations for two reasons: first, the solutions are scalable to large problems, and second, these methods provide bounds on the errors.

Major contributions of this study include scalable implementations of the SAR model for large geospatial data analysis, characterization of errors between exact and approximate solutions of the SAR model, and experimental comparison of the proposed solutions on real satellite remote sensing imagery having millions of pixels. Most importantly, our study shows that the SAR model can be efficiently implemented without loss of accuracy, so that large geospatial datasets which are spatially auto-correlated can be analyzed in a reasonable amount of time on general purpose computers with modest memory requirements. We are using an IBM Regatta in order to implement parallel versions of the software using the open source ScaLAPACK [7] linear algebra libraries. However, the software can also be ported onto general-purpose computers after replacing ScaLAPACK routines with the serial equivalent open source LAPACK [1] routines. Please note that, even though we are using a parallel version of ScaLAPACK, the computational timings presented in the results section (Table 7) are based on serial execution of all SAR model solutions on a single processor.

The remainder of the paper is organized as follows: Section 2 presents the problem statement, and Section 3 explains the exact algorithm for the SAR solution. Section 4 discusses approximate SAR model solutions using Taylor series expansion and Chebyshev polynomials respectively. The experimental design is provided in Section 5. Experimental results are discussed in Section 6. Finally, Section 7 summarizes and concludes the paper with a discussion of future work.

2 Problem Statement

We first present the problem statement and the notation in Table 2, and then explain the exact and approximate SAR solutions based on maximum-likelihood (ML) theory [12]. The problem studied in this paper is defined as follows. Given the exact solution procedure described in the Dense Matrix Approach [16] for one-dimensional geo-spatial datasets, we need to find a solution that scales well for large multi-dimensional geo-spatial datasets. The constraints are as follows: the spatial auto-regression parameter ρ varies in the range [0,1); the error is normally distributed, that is, ε ~ N(0, σ²I) i.i.d.; the input spatial dataset is composed of normally distributed random variables; and the size of the neighborhood matrix W is n. The objective is to implement scalable and portable software for analyzing large geo-spatial datasets.

Table 2. The notation used in this study

Variable   Definition
ρ          The spatial auto-regression (autocorrelation) parameter
y          n-by-1 vector of observations on the dependent variable
x          n-by-k matrix of observations on the explanatory variables
W          n-by-n neighborhood matrix that accounts for the spatial relationships (dependencies) among the spatial data
k          Number of features
β          k-by-1 vector of regression coefficients
n          Problem size (also the number of observation points or pixels)
p          Row dimension of the spatial framework (image)
q          Column dimension of the spatial framework (image)
C          n-by-n binary neighborhood matrix
D          n-by-n diagonal matrix with elements 1/s_i, where s_i is the row-sum of row i of C
W~         n-by-n symmetric eigenvalue-equivalent of the W matrix
ε          n-by-1 vector of unobservable error
σ²         The common variance of the error
Σ          n-by-n diagonal variance matrix of the error, defined as σ²I
Ψ          Current pixel in the spatial framework (image) with "s" neighbors
q          The highest degree of the Chebyshev polynomials
I          Identity matrix
N          n-by-n binary neighborhood matrix from Delaunay triangulation
λ          Eigenvalue of a matrix
i, k       Index variables (i also runs over the degree of T_i(.))
ln(.)      Natural logarithm operator
exp(.)     Exponential operator, i.e., e^(.)
cos(.)     Cosine trigonometric operator
tr(.)      Trace of the "." matrix
|.|        Determinant of the "." matrix
(.)^(-1)   Inverse of the "." matrix
(.)^T      Transpose of the "." matrix/vector
(.)_ij     ij-th element of the "." matrix
T_i(.)     A Chebyshev polynomial of degree i; "." can be a matrix or a scalar
Σ(.)       Summation operation on a matrix/vector
Π(.)       Product operation on a matrix/vector
O(.)       "O" notation for complexity analysis of algorithms
π          The constant pi (approximately 3.14)

2.1 Basic SAR Model

The spatial auto-regression model (SAR) [10], also known in the literature as a spatial lag model or mixed regressive model, is an extension of the linear regression model and is given in equation 1:

    y = ρWy + xβ + ε .    (1)

Here the parameters are defined in Table 2. The main point to note is that a spatial autocorrelation term ρWy is added to the linear regression model in order to model the strength of the spatial dependencies among the elements of the dependent variable y.

2.2 Example Neighborhood Matrix (W)

The neighborhood matrices used by the spatial auto-regression model encode the neighborhood relationships on a one-dimensional regular grid space with two neighbors and on a two-dimensional regular grid space with "s" neighbors, where "s" is four, eight, sixteen, twenty-four and so on, as shown in Fig. 1. This structure is also known as regular square tessellation one-dimensional and two-dimensional planar surface partitioning [12].

Fig. 1. The neighborhood structures of the pixel Ψ on one-dimensional and two-dimensional regular grid space (1-D 2-neighbors; 2-D 4-, 8-, 16-, and 24-neighbors)

2.3 Illustration of the Neighborhood Matrix Formation on a 4-by-4 Regular Grid Space

As noted earlier, modeling spatial dependency (or context) improves the overall classification (prediction) accuracy. Spatial dependency can be defined by the relationships among spatially adjacent pixels in a small neighborhood within a spatial framework that is a regular grid space. The following paragraph explains how W in the SAR model is formed. For the four-neighborhood case, the neighbors of the (i,j)-th pixel of the regular grid are shown in Fig. 2.

    neighbors(i,j) = { (i-1, j)   2 ≤ i ≤ p, 1 ≤ j ≤ q     (NORTH)
                       (i, j+1)   1 ≤ i ≤ p, 1 ≤ j ≤ q-1   (EAST)
                       (i+1, j)   1 ≤ i ≤ p-1, 1 ≤ j ≤ q   (SOUTH)
                       (i, j-1)   1 ≤ i ≤ p, 2 ≤ j ≤ q     (WEST) }

Fig. 2. The four neighbors of the (i,j)-th pixel on the regular grid

The (i,j)-th pixel of the surface will fill in the (p(i-1)+j)-th row of the non-row-standardized neighborhood matrix C. The following entries of C, i.e., {(p(i-1)+j), (p(i-2)+j)}, {(p(i-1)+j), (p(i-1)+j+1)}, {(p(i-1)+j), (p(i)+j)} and {(p(i-1)+j), (p(i-1)+j-1)}, will be "1"s and all the others zeros. The row-standardized neighborhood matrix W is formed by first finding each row-sum (there are pq, or n, row-sums since W is pq-by-pq) and then dividing each element in a row by its corresponding row-sum. In other words, W = D^(-1) C, where the elements of the diagonal matrix D are defined as

    d_ii = Σ_{j=1}^{n} c_ij   and   d_ij = 0 for i ≠ j .

Fig. 3 illustrates the spatial framework and the matrices. Thus, the rows of matrix W sum to 1, which means that W is row-standardized, i.e., row-normalized or row-stochastic. A non-zero entry in the j-th column of the i-th row of matrix W indicates that the j-th observation will be used to adjust the prediction of the i-th row, where i is not equal to j. We have described forming a W matrix for a regular grid, which is appropriate for satellite images; however, W can also be formed for irregular (or vector) datasets, as discussed further in the Appendix [12].

Fig. 3. (a) The spatial framework, which is p-by-q, where p may or may not be equal to q, (b) the pq-by-pq non-normalized neighborhood matrix C with 4 nearest neighbors, and (c) the normalized (row-standardized) version W, which is also pq-by-pq. The product pq is equal to n, the problem size

3 Exact SAR Model Solution

The estimates for the parameters ρ and β in the SAR model (equation 1) can be found using either maximum likelihood theory or Bayesian statistics. In this paper we consider the maximum likelihood approach for estimating the parameters of the SAR model, whose mechanics are presented in Fig. 4.

Fig. 4. System diagram of the serial exact algorithm for the SAR model solution, composed of three stages: stage A computes the eigenvalues of W, stage B performs a golden section search over the range of ρ using the ML function to find the best-fit ρ̂, and stage C computes the SSE term, yielding the estimates ρ̂, β̂, and σ̂²

Fig. 4 highlights the three stages of the exact algorithm for the SAR model solution. It is based on maximum-likelihood (ML) theory, which requires computing the logarithm of the determinant (i.e., log-determinant) of the large (I − ρW) matrix. The first term of the logarithm of the likelihood function (equation 2) clearly shows why we need to compute the (natural) logarithm of the determinant of a large matrix. In equation 2, "I" denotes an n-by-n identity matrix, "T" denotes the transpose operator, "ln" denotes the natural logarithm operator, and σ² is the common variance of the error:

    ln(L) = ln|I − ρW| − (n/2) ln(2π) − (n/2) ln(σ²) − SSE / (2σ²) ,    (2)

    where SSE = y^T (I − ρW)^T [I − x(x^T x)^(-1) x^T]^T [I − x(x^T x)^(-1) x^T] (I − ρW) y .

Therefore, Fig. 4 can be viewed as an implementation of ML theory. We now describe each stage. Stage A is composed of three sub-stages: pre-processing, Householder transformation [33], and QL transformation [9]. The pre-processing sub-stage not only forms the row-standardized neighborhood matrix W, but also converts it to its symmetric eigenvalue-equivalent matrix W~. The Householder transformation and QL transformation sub-stages are used to find all of the eigenvalues of the neighborhood matrix: the Householder transformation sub-stage takes W~ as input and forms a tri-diagonal matrix, whose eigenvalues are then computed by the QL transformation sub-stage. Computing all of the eigenvalues of the neighborhood matrix takes approximately 99% of the total serial response time, as shown in Table 3.

Stage B computes the best estimates for the spatial auto-regression parameter ρ and the vector of regression coefficients β for the SAR model. While these estimates are being found, the logarithm of the determinant of (I − ρW) needs to be computed at each step of the non-linear one-dimensional parameter optimization. This step uses the golden section search [9] and updates the auto-regression parameter at each step. There are three ways to compute the value of the logarithm of the likelihood function: (1) compute the eigenvalues of the large dense matrix W once; (2) compute the determinant of the large dense matrix (I − ρW) at each step of the non-linear optimization; or (3) approximate the log-determinant term. For small problem sizes, the first two methods work well; however, for large problem sizes approximate solutions are needed. Equation 3 expresses the relationship between the eigenvalues of the W matrix and the log-determinant of the (I − ρW) matrix; once the eigenvalues are available, each evaluation of this term is of O(n) complexity:

    |I − ρW| = Π_{i=1}^{n} (1 − ρλ_i)   →(taking the logarithm)   ln|I − ρW| = Σ_{i=1}^{n} ln(1 − ρλ_i) .    (3)

The eigenvalue algorithm applied in this study cannot find the eigenvalues of an arbitrary dense matrix. The matrix W has to be converted to its symmetric version W~, whose eigenvalues are the same as those of the original matrix W. The conversion is derived as shown in Fig. 5.

Starting from the binary neighborhood matrix C, the diagonal matrix D is defined as

    d_ii = Σ_{j=1}^{n} c_ij   and   d_ij = 0 if i ≠ j ,

so that

    W = D^(-1) C                                  (row-normalized, or row-stochastic, version of C)
    W~ = D^(1/2) W D^(-1/2) = D^(-1/2) C D^(-1/2)  (symmetric equivalent of W in terms of eigenvalues)

Fig. 5. Derivation of the W~ matrix, the symmetric eigenvalue-equivalent of the W matrix
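The following small sketch illustrates the Fig. 5 derivation numerically: it builds C for a p-by-q grid via a Kronecker-product construction (our own shortcut, not the paper's), forms W = D^(-1)C and W~ = D^(-1/2) C D^(-1/2), and checks that the asymmetric W and the symmetric W~ share the same eigenvalues. Python/NumPy is used purely for illustration.

import numpy as np

def path_adjacency(m):
    # adjacency of a 1-D chain of m pixels (the 2-neighbor structure of Fig. 1)
    A = np.zeros((m, m))
    idx = np.arange(m - 1)
    A[idx, idx + 1] = 1.0
    A[idx + 1, idx] = 1.0
    return A

p, q = 4, 4
# 4-neighbor adjacency of a p-by-q grid: neighbors within a row plus neighbors across rows
C = np.kron(np.eye(p), path_adjacency(q)) + np.kron(path_adjacency(p), np.eye(q))
s = C.sum(axis=1)                              # row-sums s_i
W = C / s[:, None]                             # W = D^-1 C, row-stochastic
W_tilde = C / np.sqrt(np.outer(s, s))          # W~ = D^-1/2 C D^-1/2, symmetric

lam_sym = np.linalg.eigvalsh(W_tilde)          # real eigenvalues of the symmetric matrix
lam_w = np.sort(np.linalg.eigvals(W).real)     # eigenvalues of the asymmetric W
assert np.allclose(lam_sym, lam_w)             # same spectrum, as claimed in Fig. 5

In the actual implementation this symmetrization is performed on the sparse W by a dedicated subroutine; the dense NumPy version above is only meant to make the algebra concrete.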

The matrix W~ (i.e., D^(-1/2) C D^(-1/2)) is symmetric and has the same eigenvalues as W.

The row standardization can thus be expressed as W = D^(-1) C, where D is a diagonal matrix with elements 1/s_i, and s_i is the row-sum of row i of C. The symmetrization subroutine is the part of the code that does this job. Finally, stage C computes the sum of the squared error, i.e., the SSE term, which has O(n²) complexity.

Table 3 shows our measurements of the serial response times of the stages of the exact SAR model solution based on ML theory. Each response time given in this study is the average of five runs. As can be seen, computing the eigenvalues (stage A) takes a large fraction of the total time.

We now outline the derivation of the maximum likelihood function. This derivation not only shows the link between the need for eigenvalue computation and the fitting of the spatial auto-regression model parameters, but also explains how the spatial auto-regression model works; it can be read as an execution trace of the algorithm. We begin the derivation by choosing a SAR model that is described by equation 1. Ordinary least squares is not appropriate for solving the models described by equation 1; one way to solve them is to use the maximum likelihood procedure. In probability, there are essentially two classes of problems: the first is to generate a data sample given a probability distribution, and the second is to estimate the parameters of a probability distribution given data. Obviously, in our case we are dealing with the latter problem.

Table 3. Measured serial response times (in seconds) of the stages of the exact SAR model solution for problem sizes of 2500, 6400 and 10K. Problem size denotes the number of observation points

Problem size (n)   Machine        Stage A: Computing Eigenvalues   Stage B: ML Function   Stage C: Least Squares
2500               SGI Origin     78.10                            0.41                   0.06
2500               IBM SP         69.20                            1.30                   0.07
2500               IBM Regatta    46.90                            0.58                   0.06
6400               SGI Origin     1735.41                          5.06                   0.51
6400               IBM SP         1194.80                          17.65                  0.44
6400               IBM Regatta    798.70                           6.19                   0.42
10000              SGI Origin     6450.90                          11.20                  1.22
10000              IBM SP         6546.00                          66.88                  1.63
10000              IBM Regatta    3439.30                          24.15                  0.93

It is assumed that ε is generated from a normal distribution, which has to be formally defined in order to go further in the derivation. The normal density function is given in equation 4:

    N(ε) ≡ (2π)^(-n/2) |Σ|^(-1/2) exp( -(1/2) ε^T Σ^(-1) ε ) ,    (4)

where Σ = σ²I and "T" means the transpose of a vector or matrix. It is worth noting again that we are assuming in our derivation that the error vector ε is governed by a normal distribution with zero mean and variance matrix Σ. The prediction of the spatial auto-regression model solution heavily depends on the quality of the normally distributed random numbers generated. The term |dε/dy|, given by equation 5, is needed in order to find the probability density function of the variable y; here |x| denotes the determinant of the matrix x. The probability density function of the observed dependent variable (the y vector) is then given by equation 6, and when ε is replaced by ((I − ρW)y − xβ) in equation 6, the explicit form of the probability density function of y given by equation 7 is obtained. It should be noted that |Σ| = |σ²I| = (σ²)^n, where n is the rank (i.e., row-size and column-size) of the identity matrix I.

    |dε/dy| = |I − ρW| .    (5)

    L(y) = N((I − ρW)y − xβ) |dε/dy| .    (6)

    L(y) = (2πσ²)^(-n/2) |I − ρW| exp( -(1/(2σ²)) [(I − ρW)y − xβ]^T [(I − ρW)y − xβ] ) .    (7)

1 n ln(2π ) n ln(σ 2 ) − − 2 {[(I − ρW)y − xβ]T [(I − ρW)y − xβ]} . 2 2 2σ

(8)

(9a)

βˆ = (xT x) −1 xT (I − ρW)y .

(9b)

σˆ 2 = y T (I − ρW)T [I − x(xT x) −1 xT ](I − ρW)y / n .

The term [(I − ρW )y − xβ] is equivalent to [I − x(x T x) −1 x T ](I − ρW )y after replacing βˆ given by equation 9a with β in equation 8. That leads us to equation 10 for the log-likelihood function (i.e., the logarithm of the maximum likelihood function) to be optimized for ρ . n ln(2π ) n ln(σ 2 ) − − 2 2 1 (y T (I − ρW)T [I − x(xT x) −1 xT ]T [I − x(xT x) −1 xT ](I − ρW)y) 2

ln(L) = ln I − ρW − 2σ

{

(10)

}

The first term of equation 10 (i.e., the log-determinant) is nothing but the logarithm of the sum of a collection of scalar values including all of the eigenvalues of the neighborhood matrix W. The first term transforms from a multiplication to a sum as shown by equation 3. That is why all of the eigenvalues of W matrix are needed.

{

}

n (11) 1 (y T (I − ρW ) T [I − x(x T x) −1 x T ]T [I − x(x T x) −1 x T ](I − ρW )y ) . MIN ∑ ln(1 − ρλi ) − 2 |ρ | 10K) one can use geometric series expansion to compute the inverse matrix in equation 13. (For more details see lemma 2.3.3 [11].) In the next section we show how the complexity of these calculations can be reduced using approximate solutions.
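To make the three stages of Fig. 4 concrete, the following sketch carries out the exact ML estimation for a small dense W: it computes all eigenvalues once (stage A), performs a bounded one-dimensional search over ρ on the concentrated log-likelihood (stage B; SciPy's bounded scalar minimizer is used here in place of the paper's golden section search), and then recovers β̂ and σ̂² (stage C). This is an illustrative Python sketch of the procedure, not the authors' f77/ScaLAPACK implementation, and the function and variable names are our own.

import numpy as np
from scipy.optimize import minimize_scalar

def sar_exact_ml(y, x, W):
    n = len(y)
    lam = np.linalg.eigvals(W).real                     # stage A: all eigenvalues of W
    M = np.eye(n) - x @ np.linalg.solve(x.T @ x, x.T)   # I - x (x^T x)^-1 x^T

    def neg_concentrated_loglik(rho):
        r = M @ ((np.eye(n) - rho * W) @ y)             # residual appearing in the SSE of eq. 10
        sse = float(r @ r)
        logdet = np.sum(np.log(1.0 - rho * lam))        # eq. 3
        return -(logdet - 0.5 * n * np.log(sse / n))    # sigma^2 profiled out as SSE/n

    # stage B: one-dimensional search over rho in [0, 1)
    rho = minimize_scalar(neg_concentrated_loglik, bounds=(0.0, 0.999),
                          method="bounded").x
    A = np.eye(n) - rho * W
    beta = np.linalg.solve(x.T @ x, x.T @ (A @ y))      # eq. 9a
    sigma2 = float((M @ (A @ y)) @ (M @ (A @ y))) / n   # stage C / eq. 9b
    return rho, beta, sigma2

Computing the full eigenvalue set is exactly the step that dominates the stage A times in Table 3 and makes the exact method intractable for the 2.1M-pixel problem.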

4 Two Approximate SAR Model Solutions

Since the exact SAR model solution is both memory- and compute-intensive, we need approximate solutions that do not sacrifice accuracy and can handle very large datasets. We propose two different approximations for solving the SAR model: Taylor series expansion and Chebyshev polynomials. The purpose of these approximations is to calculate the logarithm of the determinant of (I − ρW).

4.1 Approximation by Taylor's Series Expansion

Martin [23] suggests an approximation of the log-determinant by means of the traces of the powers of the neighborhood matrix W. He basically finds the trace of the matrix logarithm, which is equal to the log-determinant. In this approach, the Taylor series is used to approximate the function Σ_{i=1}^{n} ln(1 − ρλ_i), where λ_i represents the i-th eigenvalue, which lies in the interval [-1, +1], and ρ is the scalar parameter from the interval (-1, +1). The term −Σ_{i=1}^{n} ln(1 − ρλ_i) can be expanded as Σ_{k=1}^{∞} Σ_{i=1}^{n} (ρλ_i)^k / k, provided that |ρλ_i| < 1, which holds for all i if |ρ| < |λ_max|^(-1). Equation 13, which states the approximation used for the log-determinant term of the maximum likelihood function, is obtained using the relationship between eigenvalues and the trace of a matrix, i.e., Σ_{i=1}^{n} λ_i^k = tr(W^k):

    -(1/n) ln|I − ρW| = (1/n) Σ_{k=1}^{∞} { tr(W^k) ρ^k / k } .    (13)

The approximation comes into the picture when we sum up to a finite value r instead of infinity. Equation 13 is then relatively much faster to evaluate, because it eliminates the compute-intensive eigenvalue computation in the log-determinant. The overall solution is shown in Fig. 6.

Fig. 6. The system diagram for the Taylor series expansion approximation of the SAR model solution. The log-determinant term ln|I − ρW| is approximated by the Taylor series expansion and fed, together with the SSE stage (stage C of Fig. 4), into the golden section search that calculates the ML function value and the best-fit ρ̂, β̂, and σ̂². The inner structure of the Taylor series expansion is similar to that of the Chebyshev polynomial approximation except that there is one more vector sum operation, which is very cheap to compute
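A minimal sketch of the truncated form of equation 13 is given below, using sparse powers of W; the truncation order r is a free choice here, not a value prescribed by the paper, and the code is illustrative Python rather than the authors' implementation.

import numpy as np
from scipy.sparse import csr_matrix, identity

def logdet_taylor(W, rho, r=10):
    # truncated equation 13: ln|I - rho W| ~= -sum_{k=1..r} tr(W^k) rho^k / k
    W = csr_matrix(W)
    Wk = identity(W.shape[0], format="csr")
    logdet = 0.0
    for k in range(1, r + 1):
        Wk = Wk @ W                             # sparse power W^k
        logdet -= Wk.diagonal().sum() * rho**k / k
    return logdet

# sanity check against the exact dense value on a small W:
#   exact = np.linalg.slogdet(np.eye(W.shape[0]) - rho * W.toarray())[1]

Because W is sparse, the traces of the first few powers are cheap; for high truncation orders the powers of W fill in, which is one reason the Chebyshev approach of the next subsection, needing only tr(W~) and tr(W~²), is attractive.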

4.2 Approximation by Chebyshev Polynomials

This approach uses the symmetric equivalent W~ of the neighborhood matrix W, as discussed in Sect. 3. The eigenvalues of the symmetric W~ are the same as those of the neighborhood matrix W. The following lemma leads to a very efficient and accurate approximation of the first term on the right-hand side of the log-likelihood function shown in equation 2.

Lemma 1. The Chebyshev solution approximates the logarithm of the determinant of (I − ρW), via the symmetric neighborhood matrix W~, as in equation 14. The first three terms are sufficient for approximating the log-determinant term with an accuracy of 0.03%:

    ln|I − ρW| ≡ ln|I − ρW~| ≅ Σ_{j=1}^{q+1} c_j(ρ) tr(T_{j-1}(W~)) − (1/2) c_1(ρ) n .    (14)

Proof. It is available in [33].

The value of q is 2, which is the highest degree of the Chebyshev polynomials. Therefore, only T_0(W~), T_1(W~) and T_2(W~) have to be computed, where

    T_0(W~) = I;  T_1(W~) = W~;  T_2(W~) = 2W~² − I;  ...;  T_{k+1}(W~) = 2 W~ T_k(W~) − T_{k-1}(W~) .

The Chebyshev polynomial coefficients c_j(ρ) are given in equation 15:

    c_j(ρ) = (2/(q+1)) Σ_{k=1}^{q+1} ln[ 1 − ρ cos( π(k − 1/2)/(q+1) ) ] cos( π(j−1)(k − 1/2)/(q+1) ) .    (15)

In Fig. 7, the maximum likelihood function is computed by taking the maximum of the sum of the log-determinant values and the SSE term. The spatial auto-regression parameter ρ that achieves this maximum value is the desired value that makes the classification most accurate. The parameter q is the highest degree of the Chebyshev polynomial used to approximate the term ln|I − ρW|. The system diagram of the Chebyshev polynomial approximation is presented in Fig. 7. The following lemma reduces the computational complexity of the Chebyshev polynomial approximation from O(n³) to approximately O(n²).

Lemma 2. For regular grid-based nearest-neighbor symmetric neighborhood matrices, the relationship shown in equation 16 holds. This relationship saves a tremendous amount of execution time:

    trace(W~²) = Σ_{i=1}^{n} Σ_{j=1}^{n} (w~_ij)² ,   where w~_ij is the (i,j)-th element of W~ .    (16)

Proof. The equality in equation 16 follows from the symmetry of the symmetrized neighborhood matrix; in other words, it is valid for all symmetric matrices. The trace operator sums the diagonal elements of the square of the symmetric matrix W~. This is equivalent to saying that the trace operator multiplies the i-th row with the i-th column of the symmetric matrix, where the i-th row and the i-th column have the same entries. This amounts to squaring and summing the elements of the symmetric neighborhood matrix W~, and equation 16 states this shortcut for computing the trace of the square of the symmetric neighborhood matrix.

In Fig. 8, the powers of the W~ matrices whose traces are to be computed go up to 2. The parameter q is the highest degree of the Chebyshev polynomial used to approximate the term ln|I − ρW|. The ML function is computed by calculating the maximum of the likelihood function (i.e., the log-determinant term plus the SSE term). The pseudo-code of the Chebyshev polynomial approximation approach is presented in Fig. 8.

Fig. 7. System diagram of the approximate SAR model solution, where ln|I − ρW| is expressed as a Chebyshev polynomial of W~: the Chebyshev coefficients c_j(ρ), q−1 dense n-by-n matrix-matrix multiplications and the trace of an n-by-n dense matrix feed the golden section search that calculates the ML function value and the best-fit ρ̂, followed by the SSE stage (stage C of Fig. 4) yielding ρ̂, β̂, and σ̂². The term q is the degree of the Chebyshev polynomial

Input: W~, n, ρ̂, q (= nposs)
Output: the estimate of ln|I − ρW|
CHEBYSHEV-APPROXIMATION-TO-LOGDET
 1  td1 ← 0
 2  td2 ← TRACE(W~²)
 3  cheby_poly_coeffs ← [1 0 0; 0 1 0; −1 0 2]
 4  nposs ← 3
 5  seq1nposs ← [1 2 3]^T
 6  xk ← cos(π * (seq1nposs − 0.5)/nposs)
 7  for j ← 1 to nposs do
 8      cposs[i, j] ← (2/nposs) * sum( log(1 − ρ .* xk) .* cos(π(j−1) .* (seq1nposs − 0.5)/nposs) )
 9  tdvec ← [n  td1  td2 − 0.5*n]^T
10  comboterm ← cposs_(length(ρ_vec)×nposs) * cheby_poly_coeffs_(3×3)
11  cheby_logdet_approx ← comboterm_(length(ρ_vec)×3) * tdvec_(3×1)

Fig. 8. The pseudo-code of the Chebyshev polynomial approximation of ln|I − ρW|
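The sketch below is a Python rendering of the Chebyshev approximation of equations 14-15 for q = 2. It follows the standard Chebyshev-series construction from Numerical Recipes [33] (the halved leading coefficient is applied once per eigenvalue, hence the factor n) rather than transcribing Fig. 8 line by line, and it uses the Lemma 2 shortcut for tr(W~²); treat it as an illustrative sketch, not the authors' code.

import numpy as np
from scipy.sparse import csr_matrix

def logdet_chebyshev(W_sym, rho, q=2):
    # W_sym is the symmetric W~ (zero diagonal, eigenvalues in [-1, 1]); only q = 2 handled here
    assert q == 2
    W_sym = csr_matrix(W_sym)
    n = W_sym.shape[0]
    tr_T0 = float(n)                               # T0(W~) = I
    tr_T1 = W_sym.diagonal().sum()                 # T1(W~) = W~ (trace is 0 for a proper W~)
    tr_W2 = (W_sym.multiply(W_sym)).sum()          # tr(W~^2) via Lemma 2: sum of squared entries
    tr_T2 = 2.0 * tr_W2 - n                        # T2(W~) = 2 W~^2 - I
    traces = np.array([tr_T0, tr_T1, tr_T2])
    k = np.arange(1, q + 2)
    xk = np.cos(np.pi * (k - 0.5) / (q + 1))       # Chebyshev nodes
    c = np.array([(2.0 / (q + 1)) * np.sum(np.log(1.0 - rho * xk)
                  * np.cos(np.pi * (j - 1) * (k - 0.5) / (q + 1)))
                  for j in range(1, q + 2)])       # c_1(rho), c_2(rho), c_3(rho) of eq. 15
    return float(c @ traces - 0.5 * c[0] * n)      # eq. 14

Only tr(W~) and tr(W~²) are needed, so no dense matrix ever has to be formed; this is what brings the cost of the log-determinant down to roughly O(n²), and far less when W~ is stored sparsely.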

5 Experimental Design

We evaluate our solution methods using satellite remote sensing data. We first present the system setup and then introduce our real dataset along with the comparison metrics.

System setup. The control parameters for our experiments are summarized in Table 4. One of our objectives is to make spatial analysis tools available to the GIS user community. Notable solutions for the SAR model have been implemented in Matlab [18]. These approaches have two limitations: first, the user cannot operate without these packages, and second, these methods do not scale to the application size. Our approach is to implement a general-purpose package that works independently and scales well to the application size. All solutions described in this paper have been implemented in a general-purpose programming language, f77, and use open source matrix algebra packages (ScaLAPACK [7]). All experiments were carried out using the same common experimental setup summarized in Table 4.

Table 4. The experimental design

Factor Name                  Parameter Domain
Language                     f77
Problem Size (n)             2500, 10K and 2.1M observation points
Neighborhood Structure       2-D with 4-neighbors
Method                       Maximum Likelihood for Exact & Approximate SAR Model
Auto-regression Parameter    [0,1)
Hardware Platform            IBM Regatta with 1.3 GHz Power4 architecture processor
Data set                     Remote Sensing Imagery Data

Dataset. We used real datasets from satellite remote-sensing image data in order to evaluate the approximations to SAR. The study site encompasses Carlton County, Minnesota, which is approximately 20 miles southwest of Duluth, Minnesota. The region is predominantly forested, composed mostly of upland hardwoods and lowland conifers, with a scattering of agriculture throughout. The topography is relatively flat, with the exception of the eastern portion of the county containing the St. Louis River. Wetlands, both forested and non-forested, are common throughout the area. The largest city in the area is Cloquet, a town of about 10,000. For this study we used a spring Landsat 7 scene, taken May 31, 2000. This scene was clipped to the Carlton county boundaries, which resulted in an image of size 1343 lines by 2043 pixels with 6 bands. Out of this we took a subset image of 1200 by 1800 to eliminate boundary zero-valued pixels. This translates to a W matrix of size 2.1 million by 2.1 million (2.1M x 2.1M). The observed variable x is a matrix of size 2.1M by 6. We chose nine thematic classes for the classification.

Comparison metrics. We measured the performance of our implementation in terms of accuracy, scalability (computational time), and memory usage. We first calculated the percentage error of the estimates of the spatial auto-regression parameter ρ and the vector of regression coefficients β obtained from the approximate SAR model solutions with respect to the exact solution. Next, we calculated another accuracy metric using the standard root-mean-square (RMS) error: we computed the RMS error of the estimates of the observed dependent variable (the y vectors, or ŷ), i.e., the thematic classes, from the approximate and exact SAR model solutions. Scalability is reported in terms of computation (wall-clock) time on an IBM Regatta 1.3 GHz Power4 processor. Memory usage is determined by the total memory required by the program (which includes data and instruction space).

6 Results and Discussion

Since the main focus of this study is to find a scalable approximate method for solving the SAR model for very large problem sizes, the first evaluation compares the estimates of the spatial auto-regression parameter ρ and the vector of regression coefficients β from the approximate methods with the estimates obtained from the exact SAR model. Using the percentage error formula, Table 5 presents the comparison of accuracies of ρ and β obtained from the exact and the approximate (Chebyshev polynomial and Taylor series expansion based) SAR model solutions for the 2500 problem size. The estimates from the approximate methods are very close to the estimates obtained from the exact SAR model solution: there is an error of only 0.57% for the ρ estimate obtained from the Chebyshev polynomial approximation and an error of 7.27% for the ρ estimate from the Taylor series expansion approximation. A similar situation exists for the β estimates. The maximum error among the β estimates is 0.7% for the Chebyshev polynomial approximation and 8.2% for the Taylor series expansion approximation. The magnitudes of the errors for the ρ and β estimates are of the same order for each method.

Lemma 3. The Taylor series approximation performs worse than the Chebyshev polynomial approximation because the Chebyshev polynomial approximation has an error-canceling feature for the logarithm of the determinant (log-determinant) of a matrix. The Taylor series expansion produces different error magnitudes for positive versus negative eigenvalues λ_i, whereas the Chebyshev polynomials tend to produce errors of more equal maximum magnitude [30].

Proof. The main reason behind this phenomenon is that the Taylor series approximation does better than the Chebyshev polynomial approximation for values of ρ near zero, but far worse for extreme ρ (see Sect. 2.3 of [30]). Since the value of ρ is far greater than zero in our case, our experiments also verify this phenomenon, as shown in Table 5.

Table 5. The comparison of accuracies of ρ, the spatial auto-regression parameter, and β, the vector of regression coefficients, obtained from the exact and the approximate (Chebyshev polynomial and Taylor series expansion) SAR model solutions for the 2,500 problem size

Problem Size    Solution     ρ        β1       β2       β3      β4       β5        β6
50x50 (2500)    Exact        0.4729   -2.473   -0.516   3.167   0.0368   -0.4541   3.428
50x50 (2500)    Chebyshev    0.4702   -2.478   -0.520   3.176   0.0368   -0.456    3.440
50x50 (2500)    Taylor       0.4385   -2.527   -0.562   3.291   0.0374   -0.476    3.589

The second evaluation is to compute the RMS (root-mean-square) error of the estimates of the observed dependent variable (y vectors or yˆ ) i.e., the thematic classes. The RMS error is given in equation 17 to show how we use it in our formulation.

    RMSerror_cp = sqrt( Σ (ŷ_cp − ŷ_ee)² / (n−2) ) ,    RMSerror_ts = sqrt( Σ (ŷ_ts − ŷ_ee)² / (n−2) ) ,    (17)

where ŷ_ee, ŷ_cp and ŷ_ts denote the estimates of the dependent variable obtained from the exact, Chebyshev polynomial and Taylor series expansion solutions, respectively. Table 6 presents the RMS values for all thematic classes. A representative RMS error value for the Taylor method is 2.0726 and for the Chebyshev method, 0.1686.

The values of the RMS error suggest that the estimates of the observed dependent variable (the y vector, or thematic classes) from the Chebyshev polynomial approximated SAR model solution are better than those of the Taylor series expansion approximated SAR model solution. This result agrees with the estimates for the spatial auto-regression parameter ρ and the vector of regression coefficients β shown in Table 5.

Table 6. RMS error values for each thematic class of a dataset of problem size 2500

Thematic class   Training: Chebyshev   Training: Taylor   Testing: Chebyshev   Testing: Taylor
y1               0.1686                2.0726             0.1542               1.9077
y2               0.2945                2.0803             0.2762               2.0282
y3               0.5138                3.3870             0.5972               4.0806
y4               1.0476                6.9898             1.4837               9.6921
y5               0.3934                2.4642             0.6322               3.9616
y6               0.3677                2.3251             0.4308               2.8299
y7               0.2282                1.5291             0.2515               1.7863
y8               0.6311                4.3484             0.5927               4.0524
y9               0.3866                3.8509             0.4527               4.4866

Table 7. The execution time in seconds and the memory usage in megabytes (MB)

Problem Size (n)      Time (seconds)                          Memory (MB)
                      Exact         Taylor    Chebyshev       Exact        Taylor   Chebyshev
50x50 (2500)          38            0.014     0.013           50           1.0      1.0
100x100 (10K)         5100          0.117     0.116           2400         4.5      4.5
1200x1800 (2.1M)      Intractable   17.432    17.431          ~32*10^6     415      415

The predicted images (50 rows by 50 columns) using exact and approximate solutions are shown in Fig. 9. Although the differences between the images predicted by the exact and approximate solutions are hard to notice, there is a huge difference between these methods in terms of computation time and memory usage. As can be seen in Table 7, even for large problem sizes the run-times of the approximate methods are quite small, thanks to the fast log-determinant calculation offered by the Chebyshev and Taylor series approximations. By contrast, with the exact approach it is impossible to solve any problem having more than 10K observation points. Even if we used sparse matrix determinant computation, it is clear that the approximate solutions would still be faster.

The approximate solutions also manage to provide close estimates and fast execution times while using very little memory. Such fast execution times make it possible to scale the solutions to large problems consisting of billions of observation points. The memory usage is very low due to the sparse storage techniques applied to the neighborhood matrix W. Sparse techniques also cause speedup, since the computational complexity of the linear algebra operations decreases with the small number of non-zero elements within the W matrix. As seen from Figures 6 and 7, the most complex operation for the Taylor series expansion and Chebyshev polynomial approximated SAR model solutions is computing the traces of powers of the symmetric neighborhood matrix W~, which would require matrix-matrix multiplications; these operations are reduced to around O(n²) complexity by Lemma 2 given in Sect. 4.2. All linear algebra matrix operations are efficiently implemented using the ScaLAPACK [7] libraries.

We fitted the SAR model for each observed dependent variable (y vector). For each pixel, a thematic class label was assigned by taking the maximum of the predicted values. Fig. 9 shows a set of labeled images for a problem size of 2500 pixels (50 rows x 50 columns). For a learning (i.e., training) dataset of problem size 2500, the prediction accuracies of the three methods were similar: 59.4% for the exact SAR model, 59.6% for the Chebyshev polynomial approximated SAR model, and 60.0% for the Taylor series expansion approximated SAR model. We also observed a similar trend on another (testing) dataset of problem size 2500, where the prediction accuracies were 48.32%, 48.4% and 50.4% for the exact solution, the Chebyshev polynomial approximation and the Taylor series expansion approximation, respectively. This is an interesting result: even though the estimates of the observed dependent variables (y vectors, or thematic classes) are more accurate for the Chebyshev polynomial based approximate SAR model than for the Taylor series expansion approximated SAR model, the classification accuracy of the Taylor series expansion approximated SAR model turns out to be better than that of not only the Chebyshev polynomial based approximate SAR model but even the exact SAR model solution. We think that the opposite trend will be observed for larger images, because SAR might need more samples to be trained better. Even though we do not suggest a new exact SAR model solution, further research and experimentation are needed to fully understand the SAR model's training needs and their impact on prediction accuracy with the solution methods discussed in this paper.
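The per-pixel labeling step described above can be sketched as follows. The paper does not spell out the prediction formula; the reduced form ŷ = (I − ρ̂W)^(-1) x β̂ used below is a common choice for generating SAR predictions and should be read as our assumption, as should the function names.

import numpy as np
from scipy.sparse import csr_matrix, identity
from scipy.sparse.linalg import spsolve

def predict_sar(W, x, rho, beta):
    # reduced-form prediction yhat = (I - rho W)^-1 x beta (assumed form, see lead-in)
    n = W.shape[0]
    A = (identity(n, format="csc") - rho * csr_matrix(W)).tocsc()
    return spsolve(A, x @ beta)

def classify_pixels(W, x, fits):
    # fits: one (rho, beta) pair per thematic class; label = class with the largest prediction
    scores = np.column_stack([predict_sar(W, x, rho, beta) for rho, beta in fits])
    return np.argmax(scores, axis=1)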

7 Conclusions and Future Work

Linear regression is one of the best-known classical data mining techniques. However, it assumes independently and identically distributed (i.i.d.) learning samples, which does not hold for geo-spatial data, which is often characterized by spatial autocorrelation. In the spatial auto-regression (SAR) model, spatial dependencies within the data are captured by the autocorrelation term, and the linear regression model thus becomes a spatial auto-regression model.

Fig. 9. The images (50x50) using exact and approximate solutions

Incorporating the autocorrelation term enables better prediction accuracy. However, the computational complexity increases due to the need to compute the logarithm of the determinant of the large matrix (I − ρW), which in the exact method is obtained by finding all of the eigenvalues of the W~ matrix. This paper applies one exact and two approximate methods to the SAR model solution using remote sensing imagery of various sizes, i.e., 2500, 10K and 2.1M observations. The approximate methods applied are the Chebyshev polynomial and Taylor series expansion approximations. We observe that the approximate methods not only consume very little memory but also execute very fast, while providing very accurate results. Although the software is written using a parallel version of ScaLAPACK [7], the SAR model solutions presented in this paper can be run either sequentially on a single processor of a node or in parallel on single or multiple nodes. All the results presented in Sect. 6 (Table 7) are based on sequential runs on the same (single) node of an IBM Regatta machine. It should be noted that the software can easily be ported onto general-purpose computers and workstations by replacing the open source ScaLAPACK routines with the serial equivalent routines in the open source LAPACK [1,13] library. Currently, LAPACK libraries can be compiled on Windows 98/NT, VAX, and several variants of UNIX. In a future release of the SAR software, we plan to provide both ScaLAPACK and LAPACK versions.

In this study we focused on the scalability of the SAR model for large geospatial data analysis using approximate solutions, and compared the quality of the exact and approximate solutions. Though we focused only on the quality of the parameter estimates, we recognize that training and prediction errors are also important for these methods to be widely applied in various geospatial application domains. Towards this goal we are conducting experiments on several geospatial datasets from diverse geographic settings. Our future studies will also compare SAR model predictions against competing models such as Markov Random Fields. We are also developing algebraic cost models to further characterize performance and scalability issues.

8 Acknowledgments

This work was partially supported by the Army High Performance Computing Research Center (AHPCRC) under the auspices of the Department of the Army, Army Research Laboratory (ARL) under contract number DAAD19-01-2-0014. The content of this work does not necessarily reflect the position or policy of the government and no official endorsement should be inferred. The authors would like to thank the University of Minnesota Digital Technology Center and Minnesota Supercomputing Institute for the use of their computing resources. The authors would also like to thank the members of the Spatial Database Group and the ARCTiC Labs Group for valuable discussions. The authors thank Kim Koffolt for helping improve the readability of this paper and the anonymous reviewers for their useful comments.

References

1. Anderson, E., Bai, Z., Bischof, C., Blackford, S., Demmel, J., Dongarra, J., Du Croz, J., Greenbaum, A., Hammarling, S., McKenney, A., Sorensen, D.: LAPACK User's Guide, 3rd Edition, Society for Industrial and Applied Mathematics, Philadelphia, PA (1999)
2. Anselin, L.: Spatial Econometrics: Methods and Models, Kluwer Academic Publishers, Dordrecht (1988)
3. Barry, R., Pace, R.: Monte Carlo Estimates of the Log-Determinant of Large Sparse Matrices. Linear Algebra and its Applications, Vol. 289 (1999) 41-54
4. Bavaud, F.: Models for Spatial Weights: A Systematic Look, Geographical Analysis, Vol. 30 (1998) 153-171
5. Besag, J. E.: Spatial Interaction and the Statistical Analysis of Lattice Systems, Journal of the Royal Statistical Society, B, Vol. 36 (1974) 192-225
6. Besag, J. E.: Statistical Analysis of Nonlattice Data, The Statistician, Vol. 24 (1975) 179-195
7. Blackford, L. S., Choi, J., Cleary, A., D'Azevedo, E., Demmel, J., Dhillon, I., Dongarra, J., Hammarling, S., Henry, G., Petitet, A., Stanley, K., Walker, D., Whaley, R. C.: ScaLAPACK User's Guide, Society for Industrial and Applied Mathematics, Philadelphia, PA (1997)

8. Chawla, S., Shekhar, S., Wu, W., Ozesmi, U.: Modeling Spatial Dependencies for Mining Geospatial Data, Proc. of the 1st SIAM International Conference on Data Mining, Chicago, IL (2001)
9. Cheney, W., Kincaid, D.: Numerical Mathematics and Computing, 3rd edn. (1999)
10. Cressie, N. A.: Statistics for Spatial Data (Revised Edition). Wiley, New York (1993)
11. Golub, G. H., Van Loan, C. F.: Matrix Computations. Johns Hopkins University Press, 3rd edn. (1996)
12. Griffith, D. A.: Advanced Spatial Statistics, Kluwer Academic Publishers (1988)
13. Information about Freely Available Eigenvalue-Solver Software: http://www.netlib.org/utk/people/JackDongarra/la-sw.html
14. Kazar, B., Shekhar, S., Lilja, D.: Parallel Formulation of Spatial Auto-Regression, AHPCRC Technical Report No: 2003-125 (August 2003)
15. Kazar, B. M., Shekhar, S., Lilja, D. J., Boley, D.: A Parallel Formulation of the Spatial Auto-Regression Model for Mining Large Geo-Spatial Datasets, Proc. of 2004 SIAM International Conf. on Data Mining Workshop on High Performance and Distributed Mining (HPDM2004), Orlando, FL, USA (2004)
16. Li, B.: Implementing Spatial Statistics on Parallel Computers, In: Arlinghaus, S. (ed.): Practical Handbook of Spatial Statistics, CRC Press, Boca Raton, FL (1996) 107-148
17. LeSage, J.: Solving Large-Scale Spatial Autoregressive Models, presented at the Second Workshop on Mining Scientific Datasets, AHPCRC, University of Minnesota (July 2000)
18. LeSage, J. P.: Econometrics Toolbox for MATLAB. http://www.spatial-econometrics.com/
19. LeSage, J., Pace, R. K.: Using Matrix Exponentials to Explore Spatial Structure in Regression Relationships (Bayesian MESS), Technical Report (October 2000) http://www.spatialstatistics.com
20. LeSage, J., Pace, R. K.: Spatial Dependence in Data Mining, in Data Mining for Scientific and Engineering Applications, R. L. Grossman, C. Kamath, P. Kegelmeyer, V. Kumar, and R. R. Namburu (eds.), Kluwer Academic Publishing (2001) 439-460
21. Long, D. S.: Spatial Autoregression Modeling of Site-Specific Wheat Yield. Geoderma, Vol. 85 (1998) 181-197
22. Marcus, M., Minc, H.: A Survey of Matrix Theory and Matrix Inequalities, New York: Dover (1992)
23. Martin, R. J.: Approximations to the Determinant Term in Gaussian Maximum Likelihood Estimation of Some Spatial Models, Communications in Statistical Theory Models, Vol. 22, Number 1 (1993) 189-205
24. Ord, J. K.: Estimation Methods for Models of Spatial Interaction, Journal of the American Statistical Association, Vol. 70 (1975) 120-126
25. Pace, R. K., Barry, R.: Quick Computation of Spatial Auto-regressive Estimators. Geographical Analysis, Vol. 29 (1997) 232-246
26. Pace, R. K., LeSage, J.: Closed-Form Maximum Likelihood Estimates for Spatial Problems (MESS), Technical Report (September 2000) http://www.spatial-statistics.com
27. Pace, R. K., LeSage, J.: Semiparametric Maximum Likelihood Estimates of Spatial Dependence, Geographical Analysis, Vol. 34, No. 1, The Ohio State University Press (Jan 2002) 76-90
28. Pace, R. K., LeSage, J.: Simple Bounds for Difficult Spatial Likelihood Problems, Technical Report (2003) http://www.spatial-statistics.com
29. Pace, R. K., LeSage, J.: Spatial Auto-regressive Local Estimation (SALE), Spatial Statistics and Spatial Econometrics, Edited by Art Getis, Palgrave (2003)
30. Pace, R. K., LeSage, J.: Chebyshev Approximation of Log-Determinant of Spatial Weight Matrices, Computational Statistics and Data Analysis, Technical Report, Forthcoming

31. Pace, R. K., LeSage, J.: Closed-Form Maximum Likelihood Estimates of Spatial Autoregressive Models: the Double Bounded Likelihood Estimator (DBLE), Geographical Analysis, Forthcoming
32. Pace, R. K., Zou, D.: Closed-Form Maximum Likelihood Estimates of Nearest Neighbor Spatial Dependence, Geographical Analysis, Vol. 32, Number 2, The Ohio State University Press (April 2000)
33. Press, W., Teukolsky, S. A., Vetterling, W. T., Flannery, B. P.: Numerical Recipes in Fortran 77, 2nd edn. Cambridge University Press (1992)
34. Shekhar, S., Chawla, S.: Spatial Databases: A Tour, Prentice Hall (2003)
35. Shekhar, S., Schrater, P., Raju, R., Wu, W.: Spatial Contextual Classification and Prediction Models for Mining Geospatial Data, IEEE Transactions on Multimedia, Vol. 4, Number 2 (June 2002) 174-188
36. Smirnov, O., Anselin, L.: Fast Maximum Likelihood Estimation of Very Large Spatial Auto-regressive Models: A Characteristic Polynomial Approach, Computational Statistics & Data Analysis, Vol. 35, Issue 3 (2001) 301-319
37. Timlin, J., Walthall, C. L., Pachepsky, Y., Dulaney, W. P., Daughtry, C. S. T.: Spatial Regression of Crop Parameters with Airborne Spectral Imagery. Proceedings of the 3rd Int. Conference on Geospatial Information in Agriculture and Forestry, Denver, CO (November 2001)

Appendix: Constructing the Neighborhood Matrix W for an Irregular Grid

Spatial statistics requires some means of specifying the spatial dependence among observations [12]. The neighborhood matrix, i.e., the spatial weight matrix, fulfills this role for lattice models [5,6] and can be formed on both regular and irregular grids. This appendix shows a way to form the neighborhood matrix on an irregular grid based on the Delaunay triangulation algorithm [28,29]; [30] describes another method based on nearest neighbors.

One specification of the spatial weight matrix begins by forming the binary adjacency matrix N, where N_ij = 1 when observation j is a neighbor of observation i (i ≠ j). The neighborhood can be defined using the computationally very expensive Delaunay triangulation algorithm [18]. These elements may be further weighted to give closer neighbors higher weights and to incorporate whatever spatial information the user desires. By itself, N is usually asymmetric. To ensure symmetry, we can rely on the transformation C = (N + N^T)/2. The rest of the construction of the neighborhood matrix on an irregular grid follows the same procedure discussed in Sect. 3 (see Fig. 5). Users often re-weight the adjacency matrix to create a row-normalized, i.e., row-stochastic, matrix or a matrix similar to a row-stochastic matrix. This can be accomplished in the following way. Let D represent a diagonal matrix whose i-th diagonal entry is the row-sum of the i-th row of matrix C. The matrix W = D^(-1/2) D^(-1/2) C = D^(-1) C is row-stochastic (see Fig. 5), where D^(-1/2) is a diagonal matrix whose i-th entry is the inverse of the square root of the i-th row-sum of matrix C. Note that the eigenvalues of the matrix W do not exceed 1 in absolute value, as noted in Sect. 4.1, and the maximum eigenvalue equals 1 via the properties of row-stochastic matrices (see Sect. 5.13.3 in [22]). Despite the symmetry of C, the matrix W will be asymmetric in the irregular grid case as well. One can, however, invoke a similarity transformation, as shown in equation 18 (see Sect. 3, Fig. 5 of this study).

    W~ = D^(1/2) W D^(-1/2) = D^(-1/2) C D^(-1/2) .    (18)

This results in W~ having eigenvalues λ equal to those of W [24]; that is why we call W~ the symmetric eigenvalue-equivalent matrix of W. The eigenvalues of W do not exceed 1 in absolute value via the properties of row-stochastic matrices (Sect. 5.13.3 of [22]), and because W~ is similar to W, with equivalent eigenvalues, we have −1 ≤ λ_i(W~) ≤ 1. From a statistical perspective, one can view W as a spatial averaging operator: given the vector y, the row-stochastic normalization, i.e., Wy, results in a form of local averaging or smoothing of y. In this context, one can view the elements in the rows of W as the coefficients of a linear filter. From a numerical standpoint, the symmetry of W~ simplifies computing the logarithm of the determinant and has theoretical advantages as well. (See [4,28,29,30] for more information on spatial weight matrices.)
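For completeness, the appendix procedure can be sketched as follows in Python: SciPy's Delaunay triangulation supplies the adjacency N, which is symmetrized to C and then row-standardized and symmetrized exactly as in Fig. 5. The function name and the use of SciPy are our own illustrative choices.

import numpy as np
from scipy.sparse import csr_matrix, diags
from scipy.spatial import Delaunay

def irregular_W(points):
    # points: m-by-2 array of observation coordinates
    tri = Delaunay(points)
    m = len(points)
    rows, cols = [], []
    for simplex in tri.simplices:          # each triangle links its three vertices
        for a in simplex:
            for b in simplex:
                if a != b:
                    rows.append(a)
                    cols.append(b)
    N = csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(m, m))
    N.data[:] = 1.0                        # binary adjacency (duplicates collapsed to 1)
    C = (N + N.T) / 2.0                    # symmetric adjacency, C = (N + N^T)/2
    s = np.asarray(C.sum(axis=1)).ravel()  # row-sums
    W = diags(1.0 / s) @ C                 # row-stochastic spatial weight matrix
    W_sym = diags(1.0 / np.sqrt(s)) @ C @ diags(1.0 / np.sqrt(s))   # eq. 18
    return W, W_sym

# usage: W, W_sym = irregular_W(np.random.rand(500, 2))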
