A Hitchhiker's View on Spatial Statistics and Spatial Econometrics for Lattice Data

Göran Kauermann, Harry Haupt, Nadeshda Kaufmann

16th June 2011
Abstract

Spatial statistics and spatial econometrics fill an impressive body of literature, clearly demonstrating the importance of the corresponding methods. Although the two fields are often mentioned in the same breath, they differ substantially: they pursue quite different objectives and hence lead to different interpretations. In this article we shed some light on the nature and relevance of these differences from a practitioner's point of view and demonstrate by example how the models can be employed. The discussion covers models for continuous responses and count data, real data examples, and Monte Carlo simulations.
1 Introduction
Spatial statistics is a wide and well-established field, with Cressie (1993) marking a milestone in its development; see also Ripley (2004). The field has seen impressive dynamics over the last two decades, as mirrored in the historical overview of spatial statistics given by Diggle (2010). For lattice data, as dealt with in this paper, the seminal work of Besag (1974) has been, and still is, influential. The underlying idea is that a lattice has a neighborhood structure and that the observation recorded at a lattice point is conditionally independent of the remaining lattice points given the observations at the neighboring lattice points. See Rue & Held (2005) for a comprehensive collection of applications and theory in this field, and also Gaetan & Guyon (2010). Somewhat in parallel to spatial statistics, the field of spatial econometrics has evolved over recent years. Initiated by Paelinck & Klaassen (1979), it has seen a wide range of developments, as mirrored in Anselin (2009). We refer to LeSage & Pace (2009) for a comprehensive introduction; see also Anselin (1988) or Pace & LeSage (2010). Spatial econometric models aim, among other things, to capture spill-over effects and spatial dependence. To this end, spatial association is assumed for the residuals, and spatial influence of covariate effects is incorporated in the model. In this paper we contrast the two strands of modeling from a pragmatic, practitioner's point of view. A comparison of the two fields, including a historical view, is found in Griffith & Paelinck (2007). They argue that "in practice, spatial econometrics and spatial statistics reflect traditions of their respective parent disciplines, namely econometrics and statistics." While this is certainly true, it offers no guidance on which model class to use when analyzing data with spatial components.
The most striking difference between spatial statistics and spatial econometrics becomes obvious when leafing through books in the two fields, e.g. Cressie (1993) or LeSage & Pace (2009). While spatial statistics produces maps, i.e. spatial estimates are plotted to visualize the spatial structure, there are no (or hardly any) maps in the spatial econometrics literature. In terms of modeling there is a simple reason for this: in spatial econometrics the spatial structure is not included additively in the model but implicitly in the correlation structure. One may boil this difference down to the simplified observation that spatial statisticians focus on the question "where?", while spatial econometricians are more interested in the question "how strong?". We will make this distinction clearer in the course of the paper. And even though most books on spatial econometrics and spatial statistics, respectively, endeavor to cover both fields in some way, for a practitioner it is far from obvious which models to use and which conclusions to draw. This extends to the software available for fitting either of the two model classes, as can be seen from Bivand, Pebesma & Gómez-Rubio (2008). In this paper we distinguish spatial statistical models and spatial econometric models as follows. In spatial statistical models we include an additive spatial effect. In contrast, in spatial econometric models the spatial structure is included directly in the correlation structure. More specifically, let Y = (Y_1, …, Y_N)^T be a vector of spatial random variables recorded on a lattice with a given neighborhood structure. The classical and simplest spatial statistics model takes the form

$$ Y = s + \varepsilon, \tag{1} $$
where s is an unobserved spatial random effect capturing the spatial association in the data and ε are uncorrelated and unstructured residuals. A major focus of interest is to predict s given the data, i.e. to find ŝ = E(s|Y). This estimate can be plotted and thus answers "where" spatial association (that is, spatial dependence and spatial heterogeneity) can be observed empirically. In contrast, a class of spatial econometric models takes the form

$$ Y = \rho W Y + \varepsilon, \tag{2} $$
which we refer to as the spatial autoregressive model (SAR) in what follows. The matrix W reflects the given neighborhood structure of the lattice, and ε are uncorrelated and unstructured residuals. The focus of interest lies in the estimation of the scalar ρ. In particular, the basic assumption underlying model (2) is that spatial association can be captured, and hence estimated, via the scaling parameter ρ, which answers the question of "how strong" the spatial "dependence" is. At first sight models (1) and (2) look similar, but they provide completely different insights into the data, as we demonstrate by example in this paper. Models (1) and (2) are implicitly (or explicitly) based on the assumption of normality. This assumption is void if Y are count data, for example the number of patent registrations in a region or the number of disease cases in a district. The latter data constellation is typically found in the field of disease mapping (see e.g. Lawson et al. 1999, Lawson & Williams 2001, or Wakefield 2007). Disease mapping emerged from spatial statistics and has seen numerous developments in the last decades. A survey and comparison of different models for the spatial structure is provided in Best, Richardson & Thomson (2005); see also Pascutto et al. (2000). The underlying model has a Poisson structure

$$ Y \mid s \sim \text{Poisson}\big(\exp\{\eta + o + s\}\big), \tag{3} $$

where η is a linear predictor depending on the available covariates, o is an offset, that is, a fixed term (which in disease mapping refers to the log-standard mortality ratio), and s is an unobserved random effect capturing the spatial structure. Model (3) thus extends the spatial statistical model (1) to count data. While the literature on spatial statistical models for disease mapping is vast, the literature on spatial econometric models for limited dependent variables in general is comparably small. Indeed, most of this literature has focused on discrete choice models while neglecting count data problems.
Only recently have Bhati (2008) and Lambert, Brown & Florax (2010) confirmed this lack of attention. Interestingly, these authors adapt and build on work from spatial statistics due to Kaiser & Cressie (1997) to formulate a coherent SAR-Poisson model and discuss its estimation. The paper is organized as follows. In Section 2 we introduce the model classes considered in this paper. Section 3 gives two data examples and reports the results of a simulation study. In Section 4 we extend both model classes to non-normal count data responses. Section 5 concludes the paper with a discussion.
2 Spatial Econometric and Statistical Models

2.1 Spatial Statistical Models
Assume a lattice structure for the locations at which the observations Y = (Y_1, …, Y_N) are recorded, and let 𝒩 = {N_1, …, N_N} be the index sets of neighbors, i.e. j ∈ N_i ⊂ {1, …, N} means that j is a neighbor of i, in which case i ∈ N_j as well. The neighborhood structure 𝒩 can be summarized in matrix form by defining the N × N matrix W with elements W_ij = 1/|N_i| for j ∈ N_i and W_ij = 0 otherwise. Let further X be the N × p matrix of available covariates. A classical multiple linear regression model relates Y and X through Y = Xβ + ε, where the errors ε have zero mean but may carry spatial correlation. In the vein of model (1) we may postulate

$$ \varepsilon = s + e \tag{4} $$
with s ∼ N(0, σ_s² Ω) and e ∼ N(0, σ_e² I_N). We refer to this as the spatial statistical model. The error is decomposed additively into a structured component and an unstructured part. The covariance matrix Ω thereby mirrors the spatial association. In principle Ω may carry any reasonable structure, but two types are most common in practice, the so-called conditional and the simultaneous model. The conditional model is better known as the Gaussian Markov Random Field (GMRF), initially proposed in Besag (1974) and extensively discussed in Rue & Held (2005, 2010). In this case we define the spatial concentration matrix Q = Ω^{-1} through Q_ij = −1 for j ∈ N_i, Q_ij = 0 otherwise for j ≠ i, and Q_ii = |N_i|. Conditional on the neighboring lattice points (s_j, j ∈ N_i), the spatial effect s_i is independent of the remaining lattice points. This yields the model

$$ s_i \mid (s_j, j \in N_i) \sim N\left( \frac{1}{|N_i|} \sum_{j \in N_i} s_j,\; \frac{\sigma_s^2}{|N_i|} \right), \qquad i = 1, \ldots, N. \tag{5} $$
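The following sketch builds the weight matrix W and the GMRF concentration matrix Q for a regular lattice and checks numerically that Q = V^{-1}(I_N − W) with V = diag(1/|N_i|), and that the conditional mean implied by Q is the neighbor average in (5). The rook-contiguity neighborhood on a 20 × 20 grid is a hypothetical choice for illustration (the districts in the examples below have irregular neighborhoods). The last check shows that the rows of Q sum to zero, so Q is rank-deficient; this is why the conditional model is usually treated as an intrinsic GMRF (see Rue & Held, 2005).

```python
import numpy as np

def rook_neighbours(nrow, ncol):
    """Index sets N_i on an nrow x ncol lattice (rook contiguity, hypothetical choice)."""
    nbrs = []
    for r in range(nrow):
        for c in range(ncol):
            nbrs.append([(r + dr) * ncol + (c + dc)
                         for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                         if 0 <= r + dr < nrow and 0 <= c + dc < ncol])
    return nbrs

def weight_matrix(nbrs):
    """W_ij = 1/|N_i| for j in N_i and 0 otherwise (row-standardised)."""
    W = np.zeros((len(nbrs), len(nbrs)))
    for i, cell in enumerate(nbrs):
        W[i, cell] = 1.0 / len(cell)
    return W

def gmrf_precision(nbrs):
    """Q_ii = |N_i|, Q_ij = -1 for j in N_i, 0 otherwise."""
    Q = np.zeros((len(nbrs), len(nbrs)))
    for i, cell in enumerate(nbrs):
        Q[i, i] = len(cell)
        Q[i, cell] = -1.0
    return Q

nbrs = rook_neighbours(20, 20)
W = weight_matrix(nbrs)
Q = gmrf_precision(nbrs)
I = np.eye(len(nbrs))
V_inv = np.diag([float(len(cell)) for cell in nbrs])   # V^{-1} = diag(|N_i|)

# Q coincides with V^{-1}(I - W)
print(np.allclose(Q, V_inv @ (I - W)))                 # True

# Conditional mean implied by Q, cf. (5): -sum_{j != i} Q_ij s_j / Q_ii
rng = np.random.default_rng(1)
s = rng.normal(size=len(nbrs))
i = 21                                                 # an interior lattice point
cond_mean = -(Q[i] @ s - Q[i, i] * s[i]) / Q[i, i]
print(np.isclose(cond_mean, s[nbrs[i]].mean()))         # True

# Rows of Q sum to zero, so Q is singular (intrinsic GMRF)
print(np.allclose(Q.sum(axis=1), 0.0))                 # True
```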
As a consequence, the spatial association can be captured via the covariance matrix Ω = (I_N − W)^{-1} V with V = diag(1/|N_i|). Alternatively, one may define Ω through a simultaneous model as originally proposed in Whittle (1954); see also Cressie (1993). The structure of this model is related to (2), but formulated here for the spatial effect. We set

$$ s = W s + u, \tag{6} $$
where u ∼ N(0, V). In this case we set Ω = (I_N − W)^{-1} V (I_N − W^T)^{-1}. These are the two traditional and common settings, though other specifications are possible and may be suitable as well. Note that with the spatial component model (4) the covariance structure of the errors is given by

$$ \operatorname{Cov}(Y \mid X) = \operatorname{Cov}(\varepsilon) = \sigma_e^2 I_N + \sigma_s^2 \Omega. \tag{7} $$
Since the covariance decomposes additively into two components, we can derive the posterior distribution of s given the observations Y (in the regression setting, given Y − Xβ) as

$$ s \mid Y \sim N\left( \hat s,\; \sigma_s^2\, \Omega \left( I + \frac{\sigma_s^2}{\sigma_e^2}\, \Omega \right)^{-1} \right) \quad \text{with} \quad \hat s = \left( I + \frac{\sigma_e^2}{\sigma_s^2}\, \Omega^{-1} \right)^{-1} Y. $$

This relates the model to the original geostatistical tools proposed by Krige (1951). Note also that σ_s² is the scale parameter carrying the spatial association, and σ_s² = 0 yields independent residuals. Spatial statistical models in the form described above can be cast as linear mixed models and hence be fitted with well-developed theory and software; see e.g. Pinheiro & Bates (2000). To do so, we reformulate the regression model Y = Xβ + ε with ε ∼ N(0, σ_s² Ω + σ_e² I_N) as the linear mixed model

$$ Y = X\beta + \Omega^{1/2} \tilde s + e, \tag{8} $$

where s̃ ∼ N(0, σ_s² I_N) and e ∼ N(0, σ_e² I_N). This model is easily fitted, e.g. with the linear mixed model software available in R. Alternatively, more general Bayesian estimates are available based on MCMC routines; see e.g. Fahrmeir, Kneib & Lang (2004) or Belitz et al. (2009, www.stat.uni-muenchen.de/~bayesx/bayesx.html).
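A minimal numerical check of the posterior mean ŝ and of the square-root construction in (8). Because Q in the conditional model is singular, the sketch assumes, purely for illustration, the proper covariance Ω = (I − ρW)^{-1}(I − ρW^T)^{-1} with a hypothetical ρ = 0.5 on a 10 × 10 rook lattice; the variance values are likewise hypothetical. It verifies that (I + σ_e²/σ_s² Ω^{-1})^{-1}Y equals the familiar kriging/BLUP form Ω(Ω + σ_e²/σ_s² I)^{-1}Y, and that the posterior covariance matches the standard Gaussian conditioning formula.

```python
import numpy as np

def grid_weights(nrow, ncol):
    """Row-standardised rook-contiguity W on an nrow x ncol lattice."""
    n = nrow * ncol
    W = np.zeros((n, n))
    for r in range(nrow):
        for c in range(ncol):
            nbrs = [(r + dr) * ncol + (c + dc)
                    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                    if 0 <= r + dr < nrow and 0 <= c + dc < ncol]
            W[r * ncol + c, nbrs] = 1.0 / len(nbrs)
    return W

W = grid_weights(10, 10)
n = W.shape[0]
I = np.eye(n)
rho = 0.5                                  # hypothetical; makes Omega positive definite
A_inv = np.linalg.inv(I - rho * W)
Omega = A_inv @ A_inv.T                    # (I - rho W)^{-1} (I - rho W^T)^{-1}
sigma_s2, sigma_e2 = 0.3 ** 2, 0.4 ** 2    # hypothetical variances

# Symmetric square root Omega^{1/2} as used in the mixed model form (8)
vals, vecs = np.linalg.eigh(Omega)
Omega_half = vecs @ np.diag(np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

rng = np.random.default_rng(8)
s = Omega_half @ rng.normal(scale=np.sqrt(sigma_s2), size=n)   # s = Omega^{1/2} s_tilde
y = s + rng.normal(scale=np.sqrt(sigma_e2), size=n)            # model (1)

# Posterior mean s_hat = (I + sigma_e2/sigma_s2 * Omega^{-1})^{-1} Y
s_hat = np.linalg.solve(I + (sigma_e2 / sigma_s2) * np.linalg.inv(Omega), y)
# The same quantity in kriging/BLUP form: Omega (Omega + c I)^{-1} Y
s_hat_blup = Omega @ np.linalg.solve(Omega + (sigma_e2 / sigma_s2) * I, y)

# Posterior covariance sigma_s2 * Omega (I + sigma_s2/sigma_e2 * Omega)^{-1}
post_cov = sigma_s2 * Omega @ np.linalg.inv(I + (sigma_s2 / sigma_e2) * Omega)
print(np.allclose(s_hat, s_hat_blup))      # True
```

Working with the symmetric square root from an eigendecomposition is one convenient choice; a Cholesky factor of Ω would serve equally well in (8), since only the product Ω^{1/2}(Ω^{1/2})^T = Ω matters.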
2.2 Spatial Econometric Model
In the spatial econometric approach two central models are in use. First, again building on the classical regression model Y = Xβ + ε, it is assumed that the spatial association in the data can be captured by the covariance structure

$$ \operatorname{Cov}(Y \mid X) = \operatorname{Cov}(\varepsilon) = \sigma_\varepsilon^2 \Omega. \tag{9} $$
This model class is called the spatial error model (SEM); see LeSage & Pace (2009). In particular, the covariance matrix does not have an additive structure as in (7); instead, the spatial structure is imposed directly on ε. In principle there are numerous possibilities to choose Ω, analogously to the previous section, but in spatial econometrics the simultaneous model is found most often; see LeSage & Pace (2009). The latter can be motivated by writing ε = ρWε + v with v ∼ N(0, σ_ε² I_N) and ρ as a single scale parameter of spatial association. Note that ρ = 0 leads to the case of no spatial association in the covariance structure, i.e. Ω = I_N, while in general Ω = Ω(ρ) = (I − ρW)^{-1}(I − ρW^T)^{-1}. Contrasting the covariance structure (9) with the additive version (7), we observe that the former does not allow the spatial structure to be predicted. That is to say, no spatial maps result from the fitting procedure. It is easily seen that, regardless of the model used for the covariance of Y, the simple least squares estimate of β is consistent in the SEM defined in (9), though not efficient. Hence, if the focus is purely on the estimation (and interpretation) of β, the use of any of the spatial models discussed above only serves to increase the efficiency of the estimate β̂. This is not the case for the spatial autoregressive model (SAR), which is defined through

$$ Y = \rho W Y + X\beta + \varepsilon \quad \Leftrightarrow \quad Y = \tilde X \beta + \tilde\varepsilon, \tag{10} $$

where ε ∼ N(0, σ_ε² I_N), so that ε̃ = (I − ρW)^{-1}ε ∼ N(0, σ_ε² Ω) with Ω = (I − ρW)^{-1}(I − ρW^T)^{-1} as above, and X̃ = (I − ρW)^{-1}X is the modified design matrix. Clearly, in this model the parameter estimate of β and its interpretation are different, as it contains spill-over effects transmitted through the covariates; see LeSage & Pace (2009) for more details. Finally, the latter two models can be extended to a spatial Durbin model, which, however, will not be discussed further in this paper. Today, spatial econometric models are routinely estimated by maximum likelihood, and routines are available in R; see Bivand, Pebesma & Gómez-Rubio (2008).
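To make the SAR representation (10) concrete, the sketch below simulates data from it on a 20 × 20 rook lattice, estimates ρ by maximizing the concentrated (profile) log-likelihood over a grid, and summarizes the spill-overs contained in (I − ρW)^{-1}. All parameter values, the lattice and the grid search are assumptions for illustration; the log-determinant uses the eigenvalue identity log|I − ρW| = Σ_i log(1 − ρλ_i), a standard device that is not necessarily what the R routines mentioned above do internally. The "direct" and "total" averages follow the spirit of the impact summaries in LeSage & Pace (2009).

```python
import numpy as np

def grid_weights(nrow, ncol):
    """Row-standardised rook-contiguity W on an nrow x ncol lattice."""
    n = nrow * ncol
    W = np.zeros((n, n))
    for r in range(nrow):
        for c in range(ncol):
            nbrs = [(r + dr) * ncol + (c + dc)
                    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                    if 0 <= r + dr < nrow and 0 <= c + dc < ncol]
            W[r * ncol + c, nbrs] = 1.0 / len(nbrs)
    return W

rng = np.random.default_rng(2011)
W = grid_weights(20, 20)
n = W.shape[0]
rho, beta, sigma = 0.4, np.array([0.5, 1.0]), 0.4     # hypothetical true values
X = np.column_stack([np.ones(n), rng.binomial(1, 0.5, size=n)])
A = np.eye(n) - rho * W
eps = rng.normal(scale=sigma, size=n)

# SAR (10): Y = rho W Y + X beta + eps  <=>  Y = X_tilde beta + A^{-1} eps
X_tilde = np.linalg.solve(A, X)
y = X_tilde @ beta + np.linalg.solve(A, eps)

# Concentrated log-likelihood in rho (beta and sigma^2 profiled out)
lam = np.linalg.eigvals(W).real                       # eigenvalues of W are real here

def profile_loglik(r):
    Ay = y - r * (W @ y)
    b, *_ = np.linalg.lstsq(X, Ay, rcond=None)
    rss = np.sum((Ay - X @ b) ** 2)
    return np.sum(np.log(1.0 - r * lam)) - 0.5 * n * np.log(rss / n)

grid = np.linspace(-0.9, 0.9, 181)
rho_hat = grid[np.argmax([profile_loglik(r) for r in grid])]

# Spill-overs: dE(Y_i)/dx_j = [(I - rho W)^{-1}]_ij * beta_x
S = np.linalg.inv(A) * beta[1]
direct = np.mean(np.diag(S))          # average own-district effect
total = np.mean(S.sum(axis=1))        # average total effect
print(rho_hat, direct, total)
```

For a row-standardized W each row of (I − ρW)^{-1} sums to 1/(1 − ρ), so the average total effect of x is β/(1 − ρ) rather than β; this is the sense in which β in the SAR model carries a different interpretation than in the SEM.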
2.3 Model Comparison
Spatial statistical and spatial econometric models are substantially different, and a fair comparison must be driven by subject-matter questions. Nonetheless, a statistical comparison is helpful since it is data driven. We pursue this idea by using the Akaike Information Criterion (AIC); see e.g. Claeskens & Hjort (2008). For the spatial statistical models we therefore use the mixed model formulation (8). Integrating out the random spatial effect leaves us with the so-called marginal model

$$ Y \sim N\left( X\beta,\; \sigma_e^2 I_N + \sigma_s^2 \Omega \right). \tag{11} $$
Let l(β, σ_e², σ_s²) denote the log-likelihood resulting from (11); the AIC is then defined through

$$ \mathrm{AIC} := -2\, l(\hat\beta, \hat\sigma_e^2, \hat\sigma_s^2) + 2\, \mathrm{df}, \tag{12} $$

where df denotes the degrees of freedom of the model, defined as the number of parameters, i.e. the number of components of β plus 2 for the variance parameters σ_e² and σ_s²; see Vaida & Blanchard (2005) or Wager, Vaida & Kauermann (2007). Accordingly, let l(β, σ_ε², ρ) be the log-likelihood for the spatial econometric models, i.e. the spatial error model and the spatial autoregressive model, respectively. The AIC for these models is then defined as −2 l(β̂, σ̂_ε², ρ̂) + 2 df, with df as above. The model with the smallest AIC value is preferred. Note that in all models the structure of the spatial association is taken as fixed, while its strength is estimated from the data. That is to say, we assume that the neighborhood matrix W is defined as stated above and is not within the scope of model selection.
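The AIC (12) for the marginal model (11) only requires the Gaussian log-likelihood with covariance σ_e² I_N + σ_s² Ω. A sketch, assuming the estimates have already been obtained by maximum likelihood elsewhere; the function names are hypothetical. For the SEM the same log-likelihood applies with covariance σ_ε² Ω(ρ̂), and for the SAR additionally with X replaced by X̃ = (I − ρ̂W)^{-1}X; in both cases df again equals p + 2.

```python
import numpy as np

def gaussian_loglik(y, mean, cov):
    """Log-density of N(mean, cov) at y, computed via a Cholesky factor of cov."""
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, y - mean)            # z = L^{-1}(y - mean)
    logdet = 2.0 * np.sum(np.log(np.diag(L)))   # log|cov|
    return -0.5 * (y.size * np.log(2.0 * np.pi) + logdet + z @ z)

def aic_marginal(y, X, beta_hat, sigma_e2_hat, sigma_s2_hat, Omega):
    """AIC (12) for the marginal model (11); df = p + 2 (beta plus two variances)."""
    n, p = X.shape
    cov = sigma_e2_hat * np.eye(n) + sigma_s2_hat * Omega
    loglik = gaussian_loglik(y, X @ beta_hat, cov)
    return -2.0 * loglik + 2.0 * (p + 2)

# Tiny usage example (made-up numbers); with sigma_s2 = 0 this is the iid case
y = np.array([1.0, 2.0, 0.5, 1.5])
X = np.column_stack([np.ones(4), [0.0, 1.0, 0.0, 1.0]])
print(aic_marginal(y, X, np.array([0.75, 1.0]), 0.5, 0.0, np.eye(4)))
```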
3 Examples and Simulations
Our intention with this paper is to give a practical view on choosing between a spatial econometric and a spatial statistical approach and to highlight their interpretative differences. We pursue this aim by discussing two real data examples and a simulation study.
3.1 Regional Growth of High-Skilled Employees
In the first data example we analyze the spatial distribution and convergence of high-skilled employees. The dependent variable in this example is grschool_i := log(school_{i,2005}) − log(school_{i,1996}), where school_{i,t} is the number of employees liable for social security insurance in district i (as a place of work) and year t who have at least eleven years of schooling (and a degree). The latter serves as a proxy for the share of high-skilled employees. Hence, the response variable is interpreted as the log ratio of high-skilled employees in 2005 and 1996. Data are available for all 439 administrative districts of Germany. The single explanatory variable is school0_i := school_{i,1996}, the percentage of high-skilled employees in district i in 1996. We adapt the modeling approach of Mankiw, Romer & Weil (1992) and Barro & Sala-i-Martin (2004) and analyze the so-called unconditional β-convergence, adapted here to a regional context. The convergence regression model is defined through

$$ \mathit{grschool}_i = \alpha + \beta \log(\mathit{school0}_i) + \varepsilon_i. \tag{13} $$
In (13) the coefficient β expresses the β-convergence, and convergence takes place if β > 0. In this case districts with a lower concentration of high-skilled employees increase their concentration faster than districts with a higher concentration. As a consequence, convergence is said to be present if skill concentration differences across the districts decrease. The spatial structure of the response variable and of the explanatory variable is displayed in Figure 1. The plot suggests including an east/west indicator variable west_i to control for political and structural effects of Germany's reunification. This is done by extending model (13) to grschool_i = α + β_1 log(school0_i) + β_2 west_i + ε_i, where west_i takes the value 1 if the i-th district is in former West Germany and 0 otherwise. The latter model is a simple form of a conditional convergence regression model, though the interpretation of β remains the same. Phillips & Sul (2007) recently argued that the estimation of models in the vein of (13) is inconsistent due to neglected heterogeneity
over districts.¹ Using the methods outlined in the previous section, we try to assess whether (part of) this potential heterogeneity may be due to spatial association. We fit model (13) using ordinary least squares (OLS), the conditional and simultaneous spatial statistical models discussed in Section 2.1, and the spatial error model (SEM) and spatial autoregressive model (SAR) from Section 2.2. From the coefficient estimates for model (13) listed in Table 1 (top half) we see little difference with respect to the economic implications; that is, β-convergence cannot be rejected for any of the models. This still holds true when an east/west indicator is introduced, though β̂ is markedly reduced in magnitude. The fitted coefficients are shown in the bottom half of Table 1. The inclusion of the east/west indicator reveals a significant positive growth effect for districts in former West Germany. While the convergence parameter is very similar across models, the estimate of the "West" effect is considerably larger for the spatial statistical models. Especially interesting are the estimated spatial effects ŝ, visualized in Figure 2. Once we have controlled for the "West" effect, we observe that, although the spatial effect is much weaker, it has a distinctive spatial pattern. This pattern is smooth, whereas the patterns generated by model (13) are still dominated by the large economic impact of the shift due to the "West" effect. In summary, the results indicate that there is spatial association in growth patterns. Thus, an analysis of growth convergence based on the spatial statistical mixed model (8), which delivers smooth spatial effects, seems to provide additional reliable insights into the heterogeneity of growth and convergence patterns.

Looking at the AIC as a criterion for model selection, we see a remarkable drop in the AIC values when the east/west indicator is included, but no strong discrepancy between the models, with the minimum AIC value attained by the spatial error model.

¹ The same argument applies to conditional growth models, that is, extended versions of (13) including a potentially huge number of additional covariates.
3.2 Employment in Business and Service Industry
As a second example we look at the number of employees in Germany, focusing here exclusively, by way of example, on employees in the branch "business and service" in 2007. We denote by log.emp the logarithm of the proportion of individuals in a district working in the business and service sector. We assume that this quantity depends on a number of variables describing the economic condition of a district: log.pop, the logarithm of the population density; unemp, the unemployment rate in a district; tax, the district-specific industrial tax; and, as before, an east/west indicator west. We fit the same collection of models as in the previous example and list the fitted results in Table 2. Let us first look at the models in which the east/west indicator is excluded. It is striking that the unemployment rate has a significant positive effect in the spatial autoregressive model as well as in the simple OLS model. This is counterintuitive, since an increased unemployment rate reflects weaker economic conditions in a district, which in turn is likely to reduce the number of employees working in the business and service sector. In contrast to the previous example, we now observe substantial differences in AIC values, clearly favoring the spatial statistical models. These models allow us to gain deeper insight into the spatial association structure and to visualize spatial patterns by plotting the spatial effect ŝ, as shown in Figure 3. The upper row shows the results fitted without the east/west indicator for the conditional model (left) and the simultaneous model (right). We can clearly identify spatial heterogeneity, expressed in increased employment rates in the eastern part of Germany. Including an east/west indicator to capture this effect yields the estimates listed in the bottom part of Table 2. Clearly, the AIC decreases remarkably for all models when the east/west indicator is included. In the SAR model hardly any spatial effect remains, since ρ is estimated to be 0.001. Moreover, the effect of the unemployment rate becomes negative and hence meaningfully interpretable. For the remaining coefficients the different models now provide comparable results, and the spatial statistical models hardly differ. This also applies to their fitted spatial effects, shown in the bottom row of Figure 3. Overall, we conclude that the east/west indicator is important and that spatial statistical models are able to visualize the necessity of including it.
3.3 Simulations
To further explore the differences between the models and to evaluate the AIC as a model selection criterion, we simulate data on a regular grid of 20 by 20 lattice points as sketched in Figure 4. We simulate from each of the four models introduced above and fit the resulting data using, again, all four models. The covariate x is binary with P(x = 1) = P(x = 0) = 0.5, and the parameter β is set to 1. The residual variance is fixed at σ_ε² = 0.4². The spatial variance is set to σ_s² = 0.1², 0.2² and 0.4² for the spatial statistical models and, accordingly, ρ = 0.1, 0.2 and 0.4 for the spatial econometric models, leading to 12 simulation scenarios (4 models times 3 dependence strengths). Figure 5 displays the resulting AIC values as boxplots, each based on 50 simulations. Row-wise we show the simulations drawn from the conditional model (GMRF, row 1), the simultaneous model (SIM, row 2), the spatial error model (SEM, row 3) and the spatial autoregressive model (SAR, row 4). Column-wise we show the simulations for the three settings of the strength of spatial correlation. Each plot displays the AIC values for the fitted models GMRF, SIM, SEM and SAR, respectively. We would therefore expect that in the first row, with simulations based on the GMRF model, the GMRF fit performs best, and accordingly for the other rows. We see that the simultaneous model (SIM) is best identified based on the AIC value, even for small spatial variances σ_s². Similarly, the SAR model is well identified for large values of ρ.
In Figure 6 we show the fitted coefficients β̂, with the true value β = 1 indicated as a dashed horizontal line. As can be seen, the estimates are unbiased regardless of the simulation model, except for the SAR simulations in the last row. While for ρ = 0.4 the estimate is biased downwards for the fitted GMRF and SEM models, interestingly this is not the case for the (incorrect) simultaneous model SIM, where the fit β̂ behaves reasonably.
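The data-generating side of the simulation design can be sketched as below for the two econometric models (SEM and SAR), with β = 1, binary x and σ_ε = 0.4 on the 20 × 20 grid and 50 replications per value of ρ. As a simplifying assumption the sketch fits only OLS rather than the four models of Figures 5 and 6, and the rook neighborhood and random seeds are hypothetical. It illustrates the point from Section 2.2 that least squares remains consistent for β under the SEM, whereas under the SAR the OLS slope no longer targets β alone.

```python
import numpy as np

def grid_weights(nrow, ncol):
    """Row-standardised rook-contiguity W on an nrow x ncol lattice."""
    n = nrow * ncol
    W = np.zeros((n, n))
    for r in range(nrow):
        for c in range(ncol):
            nbrs = [(r + dr) * ncol + (c + dc)
                    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                    if 0 <= r + dr < nrow and 0 <= c + dc < ncol]
            W[r * ncol + c, nbrs] = 1.0 / len(nbrs)
    return W

def ols_slope(x, y):
    """Slope of the OLS fit of y on an intercept and x."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

def simulate(W, rho, n_rep=50, beta=1.0, sigma=0.4, seed=0):
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    A_inv = np.linalg.inv(np.eye(n) - rho * W)
    est = {"SEM": [], "SAR": []}
    for _ in range(n_rep):
        x = rng.binomial(1, 0.5, size=n).astype(float)
        v = rng.normal(scale=sigma, size=n)
        est["SEM"].append(ols_slope(x, beta * x + A_inv @ v))      # eps = rho W eps + v
        est["SAR"].append(ols_slope(x, A_inv @ (beta * x + v)))    # Y = rho W Y + x beta + v
    return {name: np.array(vals) for name, vals in est.items()}

W = grid_weights(20, 20)
results = {rho: simulate(W, rho, seed=k) for k, rho in enumerate((0.1, 0.2, 0.4))}
for rho, est in results.items():
    print(rho, est["SEM"].mean().round(3), est["SAR"].mean().round(3))
```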
4 Spatial Models for Count Data

4.1 Spatial Statistical Model
Let us now assume that Y consists of count data, e.g. the number of patent registrations in a district, as in the example discussed later. Again denoting by X the matrix of covariates, we extend the spatial statistical model (4) to

$$ Y \mid s, e \sim \text{Poisson}\big(\lambda = \exp\{\eta + o + s + e\}\big), \qquad s \sim N(0, \sigma_s^2 \Omega), \qquad e \sim N(0, \sigma_e^2 I), \tag{14} $$
where η = Xβ is the linear predictor and o is a fixed term, the so-called offset, to be specified later; see Besag, York & Mollié (1991). As before, s is the spatial effect with variance σ_s² exhibiting the spatial structure, while e is an unstructured effect that can account for overdispersion or unobserved heterogeneity (see Pascutto et al. 2000). Analogous to the spatial statistical model for normal data, we can structure the spatial variance matrix Ω to follow a Gaussian Markov Random Field, referred to as the conditional model. Alternatively, one may carry the simultaneous model used above over to the non-normal case. The latter, however, is less commonly found in the literature, so we restrict the presentation to the conditional model in what follows. In model (14) it is not uncommon to assume that σ_e² ≡ 0, so that spatial association is captured exclusively by the effect s; see Clayton & Kaldor (1987) or Besag & Kooperberg (1995). Combinations of these model variants have been suggested, e.g. by MacNab (2003), but will not be pursued further here. For fitting, Bayesian procedures have become standard, which may be explained by the fact that the marginal likelihood of (14), obtained by integrating out s and e, is not available analytically. Therefore a fully Bayesian formulation and computation is commonly used; see e.g. Best, Richardson & Thomson (2005). To avoid the computational burden of MCMC routines, Rue, Martino & Chopin (2009) have recently suggested and promoted approximate methods based on integrated nested Laplace approximations (INLA). The procedure is computationally efficient while maintaining accurate performance. We follow the latter approach and provide some details in the Appendix.
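The role of the unstructured effect e in (14) as a source of overdispersion can be checked by simulation. Holding η + o + s fixed at a hypothetical value, so that μ_0 = exp(η + o + s), the resulting Poisson-lognormal mixture has mean μ_0 exp(σ_e²/2) and variance E(Y) + μ_0² exp(σ_e²){exp(σ_e²) − 1}, which exceeds the Poisson variance E(Y). The sketch compares these formulas with Monte Carlo moments; μ_0 = 5 and σ_e = 0.5 are arbitrary choices.

```python
import numpy as np

# Hypothetical values: mu0 stands for exp(eta + o + s) of one district, held fixed
mu0, sigma_e = 5.0, 0.5
rng = np.random.default_rng(14)
e = rng.normal(scale=sigma_e, size=200_000)   # unstructured effect e
y = rng.poisson(mu0 * np.exp(e))              # Y | e ~ Poisson(mu0 * exp(e))

# Moments of the Poisson-lognormal mixture
mean_theory = mu0 * np.exp(sigma_e**2 / 2)
var_theory = mean_theory + mu0**2 * np.exp(sigma_e**2) * (np.exp(sigma_e**2) - 1)

print(y.mean(), mean_theory)   # close to each other
print(y.var(), var_theory)     # variance clearly exceeds the mean
```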
4.2 Spatial Econometric Model
As already mentioned, econometric models for spatial count data are less developed. We refer, by way of example, to Bhati (2008) and Lambert, Brown & Florax (2010). LeSage, Fischer & Scherngell (2007) model spill-over effects in patent data; see also Fischer, Scherngell & Jansenberger (2005). Simplifying their model to make it comparable to our framework, we assume the structure

$$ Y \mid s \sim \text{Poisson}\big(\lambda = \exp\{\eta + o + s\}\big), \qquad s = \rho W s + e, \qquad e \sim N(0, \sigma_e^2 I). \tag{15} $$
In analogy to the spatial statistical model introduced above, (15) implies a spatial effect of the form s ∼ N(0, σ_e² Ω(ρ)) with Ω(ρ) = (I − ρW)^{-1}(I − ρW^T)^{-1}. Comparing (15) with (14) shows that in (15) the coefficient ρ carries the spatial correlation, while σ_e² accounts for spatial heterogeneity approximated by district-specific variation. Hence, when modeling count data, the spatial econometric model and the spatial statistical model are again comparable, but nonetheless different. Estimation again requires numerical routines like MCMC or, as used in the example below, approximate methods like the INLA approach. Note that, unlike in the normal model, we can now identify s, in the sense that we can calculate ŝ = E(s|Y). Hence, for non-normal spatial econometric models we can produce maps to visualize the spatial dependence structure. Since the AIC is not a suitable criterion for model comparison in Bayesian statistics, we subsequently use the Deviance Information Criterion (DIC) introduced by Spiegelhalter et al. (2002).
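A sketch of how data from the SAR-Poisson model (15) can be generated: the spatial effect solves s = ρWs + e, so s = (I − ρW)^{-1}e with covariance σ_e² Ω(ρ), and the counts are then drawn given s. The lattice, ρ, σ_e, the linear predictor and the offsets (standing in for log exposures such as log employee counts) are all hypothetical.

```python
import numpy as np

def grid_weights(nrow, ncol):
    """Row-standardised rook-contiguity W on an nrow x ncol lattice."""
    n = nrow * ncol
    W = np.zeros((n, n))
    for r in range(nrow):
        for c in range(ncol):
            nbrs = [(r + dr) * ncol + (c + dc)
                    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                    if 0 <= r + dr < nrow and 0 <= c + dc < ncol]
            W[r * ncol + c, nbrs] = 1.0 / len(nbrs)
    return W

rng = np.random.default_rng(15)
W = grid_weights(20, 20)
n = W.shape[0]
rho, sigma_e = 0.5, 0.3                              # hypothetical parameter values
A = np.eye(n) - rho * W

offset = np.log(rng.integers(1_000, 50_000, size=n))  # hypothetical log exposure o
x = rng.normal(size=n)
eta = -7.0 + 0.2 * x                                  # hypothetical linear predictor

e = rng.normal(scale=sigma_e, size=n)
s = np.linalg.solve(A, e)                   # s = rho W s + e
y = rng.poisson(np.exp(eta + offset + s))   # Y | s ~ Poisson(exp{eta + o + s})

A_inv = np.linalg.inv(A)
cov_s = sigma_e**2 * A_inv @ A_inv.T        # Cov(s) = sigma_e^2 Omega(rho)
print(y[:10])
```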
4.3 Example
To demonstrate the application of spatial models for count data, we consider the number of patent applications in Germany in 2005, assigned to 422 administrative districts. Analogous to Example 3.2, we employ as explanatory variables the logarithm of the population density (popdens), the district-specific industrial tax rate (tax), the unemployment rate (unemployed) and an east/west dummy (west). The logarithm of the number of employees per district is taken as offset, reflecting the simple fact that more patent applications are expected in larger districts. We fit the models using the INLA approach proposed in Rue, Martino & Chopin (2009) and sketched in the Appendix. For the spatial statistical model we fit model (14) with σ_e² ≡ 0, i.e. ignoring the unstructured effect (model A), and in its original form (model B). Finally, we fit model (15) (model C). To keep things simple, we select the parameter ρ in a non-Bayesian way by maximizing the marginal likelihood. The results are listed in Table 3, again showing models with and without the east/west indicator. The first two columns give the estimates for model A (without and with east/west indicator, respectively). The estimates for model B, which includes the unstructured effect e, are shown in the middle two columns. The estimates for the spatial econometric model (model C) are shown in the right two columns. We report the posterior mean and standard deviation. The estimates in this example are comparable regardless of the model being fitted, and the inclusion of the east/west indicator generally decreases the strength of the effect of the unemployment rate. The negative sign is plausible in that the unemployment rate is an indicator of the economic circumstances of a district: the higher the unemployment rate, the worse the economic situation and the fewer the patent applications in that district. If we account for unobserved heterogeneity, that is, if we allow σ_e² > 0 in model (14), the negative effect of the unemployment rate is clearly strengthened. The estimated spatial effects ŝ for both the statistical and the econometric models described above, including the east/west indicator, are visualized in Figure 7. The plots look comparable.
5 Discussion
The paper discussed and illuminated different modeling aspects and viewpoints of spatial statistics and spatial econometrics. We restricted the discussion to data collected on a lattice or grid. The hitchhiker's ride has come to its end, and we may ask where we stand. Clearly, the aims and scopes of spatial statistical models and spatial econometric models are different, though they share some common aspects. The fundamental difference is that spatial statistical models for normal data allow spatial effects to be visualized, while spatial econometric models aim to infer the presence of mutual spill-over effects. Leaving the normal response scenario, this conceptual difference becomes weaker and the models themselves become more comparable. All in all, however, the two model classes remain distinct, and the two strands of modeling are preferably referred to in two breaths instead of a single one. Generally, the inclusion of spatial effects requires interpreting parametric effects conditional on the spatial structure. In fact, the inclusion of spatial effects can "mess up" fixed effects, as Hodges & Reich (2010) state. As this applies to both spatial statistical and spatial econometric models, it has not been a particular focus of this paper.
A Integrated Nested Laplace Approximation (INLA)
Rue, Martino & Chopin (2009) suggest circumventing computer-intensive MCMC methods by making use of Laplace approximations; see also Schrödle & Held (2010). First, one assumes a fully Bayesian framework for model (14) by imposing a (flat) prior on the coefficient vector β and Gamma priors on the inverse variances σ_s^{-2} and σ_e^{-2}. Let θ = (β^T, s^T, e^T) and denote σ² = (σ_s², σ_e²). Bayesian reasoning then requires calculating the posterior

$$ p(\theta \mid Y) = \int_0^\infty p(\theta \mid Y, \sigma_s^2, \sigma_e^2)\, p(\sigma^2 \mid Y)\, d\sigma^2 . $$

The integral is not analytically tractable but may be approximated by the quadrature

$$ \tilde p(\theta \mid Y) = \sum_j \tilde p(\theta \mid Y, \sigma_j^2)\, \tilde p(\sigma_j^2 \mid Y)\, \Delta_j , $$

where σ_j² are quadrature points with Δ_j as appropriate weights, and both p̃(θ|Y, σ_j²) and p̃(σ_j²|Y) are approximations of the corresponding distributions. To be specific, p̃(θ|Y, σ_j²) denotes a Gaussian approximation centered around the posterior mode θ̂(σ²), say, which is equivalent to the penalized quasi-likelihood (PQL) estimate when σ² is considered fixed (see Breslow & Clayton, 1993). Accordingly, p̃(σ_j²|Y) is proportional to the ratio

$$ \tilde p(\sigma^2 \mid Y) \propto \left. \frac{p(Y, \theta, \sigma^2)}{\tilde p(\theta \mid Y, \sigma^2)} \right|_{\theta = \hat\theta(\sigma^2)} . $$
Fong, Rue & Wakefield (2010) demonstrate the flexibility of the approach, and the R package available at www.r-inla.org provides the tools to fit the models.
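To make the nested structure concrete, the following sketch applies the same steps to a deliberately small model: counts $y_i \mid \theta \sim \text{Poisson}(e^{\theta})$ with a single latent log-rate $\theta \mid \tau \sim N(0, 1/\tau)$ and a Gamma prior on the precision $\tau = 1/\sigma^2$. It is a toy under stated assumptions (scalar latent variable, no spatial structure, an equally spaced grid in $\log\tau$ as quadrature points, a Gamma(1, 0.01) prior), not the implementation in the R package; the function name and its defaults are hypothetical. For each grid value it finds the mode $\hat\theta(\tau)$ by Newton's method, forms the Gaussian approximation $\tilde p(\theta \mid Y, \tau)$, evaluates the ratio $p(Y, \theta, \tau)/\tilde p(\theta \mid Y, \tau)$ at the mode, and mixes the Gaussians with the normalized weights. On a $\log\tau$ grid the weight $\Delta_j$ is proportional to $\tau_j$ (the Jacobian of the log transform).

```python
import math


def nested_laplace_toy(y, a=1.0, b=0.01, log_tau_grid=None):
    """Toy nested Laplace approximation (hypothetical helper) for
         y_i | theta ~ Poisson(exp(theta)),  theta | tau ~ N(0, 1/tau),
         tau = 1/sigma^2 ~ Gamma(shape=a, rate=b).
    Returns the approximate posterior mean and variance of theta and the
    normalized quadrature weights p~(tau_j | Y) * Delta_j.
    """
    n, sy = len(y), sum(y)
    log_fact = sum(math.lgamma(k + 1) for k in y)
    if log_tau_grid is None:
        log_tau_grid = [-5.0 + 0.05 * j for j in range(201)]  # equally spaced in log(tau)
    log_w, modes, variances = [], [], []
    for log_tau in log_tau_grid:
        tau = math.exp(log_tau)
        # Inner step: mode theta_hat(tau) of p(theta | Y, tau) via Newton's method
        theta = 0.0
        for _ in range(100):
            grad = sy - n * math.exp(theta) - tau * theta
            hess = -n * math.exp(theta) - tau
            step = grad / hess
            theta -= step
            if abs(step) < 1e-12:
                break
        var = 1.0 / (n * math.exp(theta) + tau)  # variance of the Gaussian approximation
        log_joint = (sy * theta - n * math.exp(theta) - log_fact       # log p(Y | theta)
                     + 0.5 * log_tau - 0.5 * math.log(2 * math.pi)
                     - 0.5 * tau * theta ** 2                          # log p(theta | tau)
                     + a * math.log(b) - math.lgamma(a)
                     + (a - 1) * log_tau - b * tau)                    # log p(tau)
        log_gauss_at_mode = -0.5 * math.log(2 * math.pi * var)
        # log p~(tau | Y) up to a constant, plus log(tau): Delta_j is proportional
        # to tau_j on a log grid (the constant grid spacing cancels on normalization)
        log_w.append(log_joint - log_gauss_at_mode + log_tau)
        modes.append(theta)
        variances.append(var)
    m = max(log_w)
    w = [math.exp(v - m) for v in log_w]
    total = sum(w)
    w = [v / total for v in w]
    # Posterior of theta is the mixture sum_j w_j * N(theta_hat_j, var_j)
    mean = sum(wj * mj for wj, mj in zip(w, modes))
    second = sum(wj * (vj + mj ** 2) for wj, mj, vj in zip(w, modes, variances))
    return mean, second - mean ** 2, w


mean, var, w = nested_laplace_toy([3, 5, 4, 6, 2])
print(f"posterior mean {mean:.3f}, posterior sd {math.sqrt(var):.3f}")
```

With these five counts the approximate posterior mean of $\theta$ lies somewhat below $\log 4$, the value suggested by the data alone, reflecting shrinkage toward the prior mean 0.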
References

Anselin, L. (1988). Spatial Econometrics: Methods and Models. Boston: Kluwer Academic Publishers.

Anselin, L. (2009). Thirty years of spatial econometrics. Papers in Regional Science 89, 3–25.

Barro, R. and Sala-i-Martin, X. (2004). Economic Growth. The MIT Press.

Belitz, C., Brezger, A., Kneib, T., and Lang, S. (2009). BayesX: Bayesian inference in structured additive regression models. [Online: http://www.stat.uni-muenchen.de/~bayesx/bayesx.html].

Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems (with discussion). Journal of the Royal Statistical Society, Series B 36, 192–236.

Besag, J. and Kooperberg, C. (1995). On conditional and intrinsic autoregressions. Biometrika 82, 733–746.

Besag, J., York, J., and Mollié, A. (1991). Bayesian image restoration, with two applications in spatial statistics. Annals of the Institute of Statistical Mathematics 43, 1–59.

Best, N., Richardson, S., and Thomson, A. (2005). A comparison of Bayesian spatial models for disease mapping. Statistical Methods in Medical Research 14, 35–59.

Bhati, A. (2008). A generalized cross-entropy approach for modeling spatially correlated counts. Econometric Reviews 27, 574–595.

Bivand, R., Pebesma, E., and Gómez-Rubio, V. (2008). Applied Spatial Data Analysis with R. New York: Springer Verlag.

Breslow, N. E. and Clayton, D. G. (1993). Approximate inference in generalized linear mixed models. Journal of the American Statistical Association 88, 9–25.

Claeskens, G. and Hjort, N. L. (2008). Model Selection and Model Averaging. Cambridge: Cambridge University Press.

Clayton, D. and Kaldor, J. (1987). Empirical Bayes estimates of age-standardized relative risks for use in disease mapping. Biometrics 43, 671–681.

Cressie, N. A. (1993). Statistics for Spatial Data. New York: Wiley.

Diggle, P. J. (2010). Historical introduction. In A. E. Gelfand, P. J. Diggle, M. Fuentes, & P. Guttorp (Eds.), Handbook of Spatial Statistics, pp. 3–16. Boca Raton: Chapman & Hall/CRC.

Fahrmeir, L., Kneib, T., and Lang, S. (2004). Penalized structured additive regression for space-time data: A Bayesian perspective. Statistica Sinica 14, 715–745.

Fischer, M. M., Scherngell, T., and Jansenberger, E. (2005, August). The geography of knowledge spillovers between high-technology firms in Europe: Evidence from a spatial interaction modelling perspective. ERSA conference papers ersa05p5, European Regional Science Association.

Fong, Y., Rue, H., and Wakefield, J. (2010). Bayesian inference for generalized linear mixed models. Biostatistics 11, 397–412.

Gaetan, C. and Guyon, X. (2010). Spatial Statistics and Modeling. Berlin: Springer.

Griffith, D. A. and Paelinck, J. H. (2007). An equation by any other name is still the same: on spatial econometrics and spatial statistics. The Annals of Regional Science 41, 209–227.

Hodges, J. and Reich, B. (2010). Adding spatially-correlated errors can mess up the fixed effect you love. The American Statistician 64, 335–344.

Kaiser, M. and Cressie, N. (1997). Modeling Poisson variables with positive spatial dependence. Statistics and Probability Letters 35, 423–432.

Krige, D. (1951). A statistical approach to some mine valuation and allied problems on the Witwatersrand. Master’s thesis, University of Witwatersrand.

Lambert, D., Brown, J., and Florax, R. (2010). A two-step estimator for a spatial lag model of counts: Theory, small sample performance and an application. Regional Science and Urban Economics 40, 241–252.

Lawson, A., Biggeri, A., Böhning, D., Lesaffre, E., Viel, J.-F., and Bertollini, R. E. (1999). Disease Mapping and Risk Assessment for Public Health. Chichester, UK: Wiley.

Lawson, A. B. and Williams, F. L. R. (2001). An Introductory Guide to Disease Mapping. New York: Wiley Medical Sciences.

LeSage, J., Fischer, M., and Scherngell, T. (2007). Knowledge spillovers across Europe: Evidence from a Poisson spatial interaction model with spatial effects. Papers in Regional Science 86, 393–421.

LeSage, J. and Pace, K. (2009). Introduction to Spatial Econometrics. London: Taylor and Francis.

MacNab, Y. (2003). Hierarchical Bayesian spatial modelling of small-area rates of non-rare disease. Statistics in Medicine 22, 1761–1773.

Mankiw, N., Romer, D., and Weil, D. (1992). A contribution to the empirics of economic growth. Quarterly Journal of Economics 107, 407–437.

Pace, R. K. and LeSage, J. (2010). Spatial econometrics. In A. E. Gelfand, P. J. Diggle, M. Fuentes, & P. Guttorp (Eds.), Handbook of Spatial Statistics, pp. 245–262. Boca Raton: Chapman & Hall/CRC.

Paelinck, J. H. P. and Klaassen, L. H. (1979). Spatial Econometrics. Farnborough, Hants: Saxon House.

Pascutto, C., Wakefield, J., Best, N., Richardson, S., Bernardinelli, L., Staines, A., and Elliott, P. (2000). Statistical issues in the analysis of disease mapping data. Statistics in Medicine 19, 2493–2519.

Phillips, P. and Sul, D. (2007). Transition modeling and econometric convergence tests. Econometrica 75, 1771–1855.

Pinheiro, J. and Bates, D. (2000). Mixed-Effects Models in S and S-PLUS. New York: Springer Verlag.

Ripley, B. (2004). Spatial Statistics. New York: John Wiley & Sons.

Rue, H. and Held, L. (2005). Gaussian Markov Random Fields: Theory and Applications. Boca Raton: Chapman & Hall/CRC.

Rue, H. and Held, L. (2010). Discrete spatial variation. In A. E. Gelfand, P. J. Diggle, M. Fuentes, & P. Guttorp (Eds.), Handbook of Spatial Statistics, pp. 171–200. Boca Raton: Chapman & Hall/CRC.

Rue, H., Martino, S., and Chopin, N. (2009). Approximate Bayesian inference for latent Gaussian models using integrated nested Laplace approximations. Journal of the Royal Statistical Society, Series B 71, 319–392.

Schrödle, B. and Held, L. (2010). A primer on disease mapping and ecological regression using INLA. Computational Statistics 26, 241–258.

Spiegelhalter, D. J., Best, N. G., Carlin, B. P., and van der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society, Series B 64, 583–639.

Vaida, F. and Blanchard, S. (2005). Conditional Akaike information for mixed-effects models. Biometrika 92, 351–370.

Wager, C., Vaida, F., and Kauermann, G. (2007). Model selection for P-spline smoothing using Akaike information criteria. Australian and New Zealand Journal of Statistics 49, 173–190.

Wakefield, J. (2007). Disease mapping and spatial regression with count data. Biostatistics 8, 158–183.

Whittle, P. (1954). On stationary processes in the plane. Biometrika 41, 434–449.
Figure 1: Spatial distribution of response variable grschool (left display) and explanatory quantity school0 (right display).
Figure 2: Fitted spatial effects $\hat s$ for regional growth data in the model without east/west indicator (upper row) and with east/west indicator (bottom row). The left column displays the conditional model, the right column the simultaneous model.
Figure 3: Fitted spatial effects $\hat s$ for employment share data in the model without east/west indicator (upper row) and with east/west indicator (bottom row). The left column displays the conditional model, the right column the simultaneous model.
Figure 4: Sketch of grid structure used for simulations.
Figure 5: Akaike Information Criterion (AIC) for simulations drawn from the conditional model (GMRF, first row), the simultaneous model (SIM, second row), the spatial error model (SEM, third row) and the spatial autoregressive model (SAR, fourth row); each panel compares the fitted models GMRF, SIM, SEM and SAR. Spatial correlation is set to $\sigma_s = 0.1$ and $\rho = 0.1$, respectively (first column), $\sigma_s = \rho = 0.2$ (second column) and $\sigma_s = \rho = 0.4$ (third column).
Figure 6: Fitted $\beta$ coefficients for different simulation scenarios. Order of plots as in Figure 5.
Figure 7: Fitted spatial effects $\hat s$ for patent data in model A (left), model B (center) and model C (right), including the east/west indicator.
| effect | OLS | conditional (spatial statistics) | simultaneous (spatial statistics) | SEM (spatial econometrics) | SAR (spatial econometrics) |
|---|---|---|---|---|---|
| Model without east/west indicator | | | | | |
| grschool | -0.156 (0.012) | -0.101 (0.013) | -0.090 (0.013) | -0.121 (0.014) | -0.110 (0.013) |
| $\hat\sigma_s^2$ or $\hat\rho$ | - | $\hat\sigma_s^2 = 0.042$ | $\hat\sigma_s^2 = 0.112$ | $\hat\rho = 0.096$ | $\hat\rho = 0.055$ |
| $\hat\sigma^2$ | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 |
| AIC | -551 | -648 | -653 | -636 | -598 |
| Model with east/west indicator | | | | | |
| grschool | -0.063 (0.012) | -0.064 (0.012) | -0.063 (0.012) | -0.064 (0.013) | -0.056 (0.012) |
| west | 0.198 (0.013) | 0.236 (0.022) | 0.262 (0.025) | 0.197 (0.016) | 0.184 (0.015) |
| $\hat\sigma_s^2$ or $\hat\rho$ | - | $\hat\sigma_s^2 = 0.012$ | $\hat\sigma_s^2 = 0.062$ | $\hat\rho = 0.050$ | $\hat\rho = 0.015$ |
| $\hat\sigma^2$ | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 |
| AIC | -726 | -732 | -729 | -739 | -728 |

Table 1: Fitted coefficients for regional growth data. Standard deviations in parentheses. Coefficient estimates are typed in bold if they are significant on a 5 % level overall.
| effect | OLS | conditional (spatial statistics) | simultaneous (spatial statistics) | SEM (spatial econometrics) | SAR (spatial econometrics) |
|---|---|---|---|---|---|
| Model without east/west indicator | | | | | |
| log.pop | 0.28 (0.02) | 0.29 (0.02) | 0.28 (0.02) | 0.28 (0.02) | 0.27 (0.02) |
| unemp | 0.68 (0.22) | -0.83 (0.34) | -1.09 (0.20) | 0.30 (0.28) | 0.66 (0.22) |
| tax | 3e-4 (4e-4) | 6e-4 (4e-4) | 8e-4 (4e-4) | 1e-3 (4e-4) | 4e-4 (4e-4) |
| $\hat\sigma_s^2$ or $\hat\rho$ | - | $\hat\sigma_s^2 = 0.23$ | $\hat\sigma_s^2 = 0.08$ | $\hat\rho = 0.07$ | $\hat\rho = 0.004$ |
| $\hat\sigma^2$ | 0.09 | 0.06 | 0.06 | 0.08 | 0.08 |
| AIC | 175 | 123 | 118 | 150 | 175 |
| Model with east/west indicator | | | | | |
| log.pop | 0.28 (0.02) | 0.29 (0.02) | 0.28 (0.02) | 0.28 (0.02) | 0.28 (0.02) |
| unemp | -0.45 (0.30) | -1.37 (0.35) | -1.42 (0.35) | -0.87 (0.33) | -0.44 (0.30) |
| tax | 6e-4 (4e-4) | 6e-4 (4e-4) | 7e-4 (4e-4) | 1e-3 (4e-4) | 6e-4 (4e-4) |
| west | -0.24 (0.04) | -0.30 (0.06) | -0.31 (0.07) | -0.32 (0.05) | -0.23 (0.04) |
| $\hat\sigma_s^2$ or $\hat\rho$ | - | $\hat\sigma_s^2 = 0.19$ | $\hat\sigma_s^2 = 0.062$ | $\hat\rho = 0.07$ | $\hat\rho = 0.001$ |
| $\hat\sigma^2$ | 0.08 | 0.06 | 0.06 | 0.07 | 0.08 |
| AIC | 150 | 104 | 103 | 115 | 151 |

Table 2: Fitted coefficients for employment share data. Coefficient estimates are typed in bold if they are significant on a 5 % level overall.
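The table captions mark estimates that are significant at the 5 % level. From a printed estimate and its standard deviation this can be checked with a Wald statistic $z = \hat\beta / \text{sd}(\hat\beta)$. The sketch below assumes a two-sided test against a standard normal reference (the paper does not spell out the test used) and takes the rounded unemp row of Table 2 (model without east/west indicator) as input; the dictionary and function names are hypothetical. Because the inputs are rounded, borderline cases with $|z|$ close to 1.96 may be classified differently than in the original fits.

```python
# Hypothetical helper: approximate two-sided 5% Wald check |estimate / sd| > 1.96,
# applied to the rounded unemp row of Table 2 (model without east/west indicator).
Z_CRIT = 1.96  # 97.5% quantile of the standard normal distribution

UNEMP_NO_WEST = {
    "OLS": (0.68, 0.22),
    "conditional": (-0.83, 0.34),
    "simultaneous": (-1.09, 0.20),
    "SEM": (0.30, 0.28),
    "SAR": (0.66, 0.22),
}


def wald_z(estimate, sd):
    """Wald statistic for H0: coefficient = 0."""
    return estimate / sd


def significant(estimate, sd, z_crit=Z_CRIT):
    """True if the two-sided normal-reference test rejects H0 at the 5% level."""
    return abs(wald_z(estimate, sd)) > z_crit


flags = {model: significant(est, sd) for model, (est, sd) in UNEMP_NO_WEST.items()}

for model, (est, sd) in UNEMP_NO_WEST.items():
    print(f"{model:>12}: z = {wald_z(est, sd):6.2f}, significant: {flags[model]}")
```

Under these assumptions the unemployment effect is significant in every column except the SEM fit, where $z \approx 1.07$.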
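Models A and B are spatial statistical models and model C is a spatial econometric model; for each model, the first column excludes and the second includes the east/west indicator.

| effect | model A | model A with west | model B | model B with west | model C | model C with west |
|---|---|---|---|---|---|---|
| popdens | 0.25 (0.06) | 0.21 (0.06) | 0.31 (0.06) | 0.28 (0.06) | 0.28 (0.06) | 0.27 (0.06) |
| tax | -6e-3 (9e-4) | -6e-3 (9e-4) | -5e-3 (9e-4) | -5e-3 (9e-4) | -6e-3 (8e-4) | -6e-3 (8e-4) |
| unemployed | -2.05 (0.95) | -1.41 (0.94) | -3.01 (0.98) | -2.01 (0.95) | -1.95 (0.88) | -1.89 (0.88) |
| west | - | 0.96 (0.22) | - | 1.02 (0.21) | - | 1.05 (0.19) |
| $\hat\sigma_s^2$ or $\hat\rho$ | 0.53 | 0.56 | 0.96 | 1.132 | 0.8 | 0.7 |
| $\hat\sigma_e^2$ | 0 | 0 | 6.252 | 5.772 | 2.6 | 2.52 |
| DIC | 3267 | 3267 | 3264 | 3263 | 3266 | 3264 |

Table 3: Fitted coefficients for patent data. Coefficient estimates are typed in bold if they are significant on a 5 % level overall.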