Estimation and Model Selection of Higher-order Spatial Autoregressive Model: An Efficient Bayesian Approach

Xiaoyi Han^a,∗, Chih-Sheng Hsieh^b, Lung-fei Lee^c

^a Wang Yanan Institute for Studies in Economics (WISE), Xiamen University, Xiamen, 361005 China
^b Department of Economics, The Chinese University of Hong Kong, 9/F, Esther Lee Building, CUHK, Shatin, HK
^c Department of Economics, Ohio State University, 1945 N. High St., Columbus, OH 43210 USA
Abstract

In this paper we consider estimation and model selection of the higher-order spatial autoregressive model by an efficient Bayesian approach. Based upon the exchange algorithm, we develop an efficient MCMC sampler, which does not rely on special features of spatial weights matrices and does not require the evaluation of the Jacobian determinant in the likelihood function. We also propose a computationally simple procedure to tackle nested model selection issues of higher-order spatial autoregressive models. We find that the exchange algorithm can be utilized to simplify the computation of the Bayes factor through the Savage-Dickey density ratio. We apply the efficient estimation algorithm and the model selection procedure to study the "tournament competition" across Chinese cities and the spatial dependence of county-level voter participation rates in the 1980 U.S. presidential election.

Keywords: Higher-order spatial autoregressive model, Exchange algorithm, Bayesian estimation, Bayes factor, Savage-Dickey density ratio

JEL classification: C11, C21, C33
1. Introduction

Higher-order spatial autoregressive (SAR) models include more than one spatial weights matrix. While a SAR model with a single spatial weights matrix can capture one type of dependence relation, higher-order SAR models introduce different kinds of neighbors and can characterize different types of spatial dependence. For instance, in the context of strategic interaction among local governments, a government's expenditure on schooling might be affected by both its geographically close and its economically similar neighbors (Tao, 2005). In this setting, two spatial weights matrices are needed: one is based upon geographical contiguity,

We are grateful to the Co-Editor and two anonymous referees for their helpful comments. We also thank Ming Lin and Andrew Pua for helpful suggestions. Han gratefully acknowledges the financial support of the Chinese Natural Science Fund (No. 71501163).
∗ Corresponding author. Tel: 86-18350281433
Email addresses: [email protected] (Xiaoyi Han), [email protected] (Chih-Sheng Hsieh), [email protected] (Lung-fei Lee)
Preprint submitted to Regional Science and Urban Economics
November 20, 2016
while the other is based upon economic similarity. Additionally, in the context of social interactions, reciprocated, unreciprocated, and unchosen friends might exert heterogeneous influences on one's behavior (Lin and Weinberg, 2014). In this case, three spatial weights (network) matrices are specified to capture different peer effects from the three types of friendships. Due to the increasing popularity of higher-order SAR models, there is a growing literature in spatial econometrics on the specification and estimation of the model. For the classical approach, unlike the SAR model with a single spatial weights matrix, maximum likelihood (ML) estimation cannot be easily implemented because with multiple spatial weights matrices, it is computationally demanding to evaluate the Jacobian determinant in the likelihood function, especially with a large sample size. So instead, researchers rely on IV or GMM methods to tackle the estimation issue. In the cross-section setting, Lee and Liu (2010) extend the GMM method in Lee (2007) to estimate higher-order SAR models with autoregressive disturbances. Their best GMM estimator can be asymptotically as efficient as the ML estimator under normality. Gupta and Robinson (2015) consider the least squares and IV estimation of higher-order SAR models, in which the number of spatial weights matrices can approach infinity slowly as the sample size increases. In the panel data setting, Lee and Yu (2014) investigate the GMM estimation of spatial dynamic panel data (SDPD) models with multiple spatial weights matrices. For the Bayesian approach, evaluation of the Jacobian determinant would usually be required, as Bayesian analysis is likelihood-based. So researchers try to develop efficient algorithms for computing the Jacobian determinant. LeSage and Pace (2008) apply a higher-order SAR model to study inter-state migration flows in the U.S.
The three large spatial weights matrices in their model are derived through two Kronecker products of a small spatial weights matrix and an identity matrix of the same dimension, where one product puts the spatial weights matrix at the front while the other puts the identity matrix at the front, and a Kronecker product of the same weights matrix and itself. By exploiting these special structures, they manage to derive an efficient algorithm to calculate the Jacobian determinant.1 However, without these special structures, their algorithm is not applicable to higher-order SAR models with a finite number of general spatial weights matrices. In this paper, we develop an efficient algorithm for the higher-order SAR model, in order to tackle the computational issue of the Jacobian determinant when the number of cross-sectional units is large, within a Bayesian framework. Unlike LeSage and Pace (2008)'s approach, which relies on special features of spatial weights matrices to simplify computation, we suggest using the exchange algorithm in Murray et al. (2006) to estimate the spatial parameters. By introducing auxiliary samples into the MCMC sampler, the Jacobian determinant does NOT show up in the expression of the acceptance probability. This makes our sampler more computationally efficient than a typical MCMC method, which needs to evaluate the Jacobian determinant.

1. They utilized the traces of powers of the small weights matrix to compute the corresponding traces of powers of the large weights matrices. Then the traces of powers of the large weights matrices can be used to approximate the Jacobian determinant.
Another contribution of the paper is that we propose a computationally simple procedure to tackle nested model selection issues of higher-order SAR models that empirical researchers are interested in. For example, in Tao (2005), one might want to test whether a local government's spending on education is only affected by the spending of its "geographical neighbors" but not by that of its "economic neighbors." In Lin and Weinberg (2014), one might wonder whether peer effects induced by different types of friends are indeed heterogeneous. These empirical questions correspond to model selection between competing nested spatial models. In the Bayesian context, the Savage-Dickey density ratio advocated by Verdinelli and Wasserman (1995) can be exploited to simplify the computation of the Bayes factor for competing nested models. Interestingly, we find that the exchange algorithm can also simplify the computation of the Bayes factor through the Savage-Dickey density ratio. It is worth emphasizing that our efficient estimation algorithm and the corresponding model selection procedure do NOT rely on any special features of spatial weights matrices. Therefore, they are applicable to higher-order SAR models with general spatial weights in different settings. Two empirical studies based upon the efficient algorithm and the model selection procedure are considered. The first study examines the spatial competition on total investments across 239 Chinese prefectural-level cities in a panel data setting. According to Xu (2011) and Yu et al. (2016), Chinese cities engage in a "tournament competition" regarding the promotion of their local leaders. Since the upper-level government tends to evaluate their performance based upon local economic growth, cities compete intensively with their same-level rivals on total investment, which fuels short-run economic growth. However, given that China has a hierarchical political system and the one-level-up government for prefectural cities is the provincial government, cities might compete differently with neighbors within and outside the province. Hence, at least two spatial weights matrices may be needed to capture the heterogeneous competition intensities for different types of neighbors. Furthermore, one might wonder whether a "border effect" exists, i.e., whether cities only compete with their neighbors within the same province. This corresponds to a nested model selection issue on whether the spatial parameter of the spatial weights matrix characterizing neighbors outside the province is equal to zero. Yu et al. (2016) adopt a higher-order spatial panel model to study the "tournament competition" across Chinese cities and justify the "border effect."2 Here we apply a more general higher-order spatial panel model to re-examine the issue.3 The second empirical application studies the spatial dependence of county-level voter participation rates in the 1980 U.S. presidential election for 3107 counties in a cross-sectional setting. We would like to

2. Yu et al. (2016) also find that within the same province, the spatial competition effect mainly exists for cities with similar economic rankings but not for cities that are geographically close. Besides, they investigate the "age effect" of city leaders on the spatial effect.
3. The spatial weights matrices in the higher-order spatial panel model employed by Yu et al. (2016) are time-invariant. In addition, their model does not include higher-order spatial-time lags. In this paper, we incorporate both time-varying spatial weights matrices and higher-order spatial-time lags in our model.
know whether the spatial correlation of participation rates can extend beyond first-order contiguous counties and whether the magnitude of the correlation with relatively close neighboring counties differs from that with farther-away neighboring counties. Thus, a higher-order SAR model with multiple spatial weights matrices arises naturally. Our efficient algorithm also serves as a complement to the existing literature on computing the log Jacobian determinant in the likelihood function of spatial models. According to LeSage and Pace (2009), there are currently two main ways to handle the computational issue of the Jacobian determinant. One way is to decompose the Jacobian matrix into a "simpler" matrix, i.e., a diagonal or triangular matrix, and use the main diagonal elements of the simpler matrix to evaluate the determinant.4 The other way is to approximate the log determinant by a finite-order series of traces of the spatial weights matrices.5 Compared with those approaches, our approach does not utilize features of the spatial weights matrices or the data to boost computational speed. It is robust to different features, and it is simpler and easier to implement: one merely needs to simulate auxiliary samples of the dependent variables and incorporate those samples in the MCMC algorithm. Hence, our approach is more appealing for spatial models with general spatial weights matrices. The paper is organized as follows: Section 2 presents the model. In particular, the higher-order SAR model in both cross-section and panel data settings is considered. Section 3 specifies prior distributions and discusses the efficient Bayesian MCMC algorithm. Section 4 studies model selection issues. Section 5 provides simulation results on sampling properties of our Bayesian estimation method and model selection procedures. Section 6 summarizes results of our empirical studies. Conclusions are drawn in Section 7.
2. The model

2.1. Higher-order SAR model in the cross-section setting

Consider a higher-order SAR model with p1 spatial weights matrices in the main equation and p2 spatial weights matrices in the disturbance,

    Y_n = Σ_{r1=1}^{p1} λ_{r1} W_{r1,n} Y_n + X_n β + U_n,    U_n = Σ_{r2=1}^{p2} ρ_{r2} M_{r2,n} U_n + V_n,    (2.1)
where n is the total number of cross-sectional units, Y_n is an n × 1 vector of dependent variables, X_n is the n × k matrix of exogenous regressors, and V_n = (v_1, v_2, ..., v_n)′ with the v_i's i.i.d. normally distributed N(0, σ²). W_{1,n}, W_{2,n}, ..., W_{p1,n} and M_{1,n}, M_{2,n}, ..., M_{p2,n} are, respectively, the p1 and p2 exogenous n × n spatial weights matrices in the main equation and the disturbance. In particular, W_{r1,n} ≠ W_{r̃1,n} if r1 ≠ r̃1 and M_{r2,n} ≠ M_{r̃2,n} if r2 ≠ r̃2. All W_{r1,n}'s and M_{r2,n}'s may or may not be row-normalized. Furthermore, some W_{r1,n}'s and M_{r2,n}'s may or may not be the same in practice.

4. See, for example, Pace and LeSage (2009) and Chapters 4.2 and 4.3 in LeSage and Pace (2009).
5. See, among others, Barry and Pace (1999) and Chapter 4.4 in LeSage and Pace (2009).
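To make the structure of (2.1) concrete, the following minimal sketch simulates one draw of Y_n through the reduced form Y_n = S_n^{-1}(λ)(X_n β + U_n) with U_n = R_n^{-1}(ρ) V_n, where S_n(λ) and R_n(ρ) are defined in Subsection 2.3. The weights matrices, dimensions, and parameter values below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3

def row_normalize(W):
    """Row-normalize a nonnegative weights matrix (zero diagonal assumed)."""
    s = W.sum(axis=1, keepdims=True)
    s[s == 0] = 1.0
    return W / s

# Two illustrative weights matrices in the main equation (p1 = 2) and one
# in the disturbance (p2 = 1); the entries here are arbitrary.
W1 = row_normalize(rng.random((n, n)) * (1 - np.eye(n)))
W2 = row_normalize(rng.random((n, n)) * (1 - np.eye(n)))
M1 = W1.copy()

lam = np.array([0.3, 0.2])    # lambda_1, lambda_2 (satisfy |0.3| + |0.2| < 1)
rho = np.array([0.25])        # rho_1
beta = np.array([1.0, -0.5, 0.8])
sigma = 1.0

X = rng.standard_normal((n, k))
V = sigma * rng.standard_normal(n)

# S_n(lambda) = I_n - sum_r lambda_r W_{r,n};  R_n(rho) = I_n - sum_r rho_r M_{r,n}
S = np.eye(n) - lam[0] * W1 - lam[1] * W2
R = np.eye(n) - rho[0] * M1

# Reduced form: U_n = R^{-1} V_n,  Y_n = S^{-1}(X_n beta + U_n)
U = np.linalg.solve(R, V)
Y = np.linalg.solve(S, X @ beta + U)
```

Because the weights are row-normalized and the spatial coefficients sum to less than one in absolute value, both S and R are strictly diagonally dominant and hence invertible, matching the stability condition discussed in Subsection 2.3.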
2.2. Higher-order SAR model in the panel data setting

The model in (2.1) can be generalized to the panel data setting. Denote the time-varying n × 1 vector of dependent variables as Y_{nt} = (y_{1t}, ..., y_{nt})′, the n × k exogenous regressors as X_{nt}, the n × 1 disturbances as V_{nt} = (v_{1t}, ..., v_{nt})′, and the n × n spatial weights matrices as W_{nt} (M_{nt}) for t = 1, 2, ..., T. Let C_n = (c_1, c_2, ..., c_n)′ be the n × 1 vector of individual effects, with c_i being the individual effect of unit i. Also let α_t be the scalar time effect at period t for t = 1, 2, ..., T, and l_n be an n × 1 column vector of ones. Note that α_1 is dropped as a normalization to ensure the identification of the time effects. Assume v_{it} follows an i.i.d. normal distribution N(0, σ²) across all i's and t's, and the W_{r1,nt}'s and M_{r2,nt}'s are all exogenous. The model in the static panel setting is given by

    Y_{nt} = Σ_{r1=1}^{p1} λ_{r1} W_{r1,nt} Y_{nt} + X_{nt} β + C_n + l_n α_t + U_{nt},    U_{nt} = Σ_{r2=1}^{p2} ρ_{r2} M_{r2,nt} U_{nt} + V_{nt},    t = 1, 2, ..., T.    (2.2)
Notice that the weights matrices W_{r1,nt}'s and M_{r2,nt}'s could be time-varying because they may be constructed based upon economic/socioeconomic characteristics of different regions, which may vary over time (Lee and Yu, 2012). Moreover, by introducing the lagged dependent variable Y_{n,t−1} and lagged spatial weights matrices W_{r1,n,t−1}'s into the model, we further extend (2.2) to a higher-order spatial dynamic panel data (SDPD) model with additive individual and time effects:

    Y_{nt} = Σ_{r1=1}^{p1} λ_{r1} W_{r1,nt} Y_{nt} + γ Y_{n,t−1} + Σ_{r1=1}^{p1} μ_{r1} W_{r1,n,t−1} Y_{n,t−1} + X_{nt} β + C_n + l_n α_t + U_{nt},
    U_{nt} = Σ_{r2=1}^{p2} ρ_{r2} M_{r2,nt} U_{nt} + V_{nt},    t = 1, 2, ..., T,    (2.3)
where the dynamic term Y_{n,t−1} captures the persistence of Y_{nt} and the spatial lag terms at t−1, the W_{r1,n,t−1} Y_{n,t−1}'s, capture dynamic diffusions (spillovers). Assume the initial values in Y_{n,0} and the corresponding weights matrices W_{r1,n,0}'s are exogenously given. With ρ_{r2} = 0 for r2 = 1, 2, ..., p2 and W_{nt} being time-invariant for t = 1, 2, ..., T, (2.3) reduces to the higher-order SDPD model in Lee and Yu (2014). The models in (2.2) and (2.3) assume that all elements of the spatial weights matrices are known constants. This assumption could also be relaxed by incorporating some unknown parameters into the spatial weights. For instance, let Z_{it} be city i's GDP per capita at time t and E_{ijt} = |Z_{it} − Z_{jt}| be the "economic distance" between city i and city j. Let W_{1,nt} be a row-normalized spatial weights matrix constructed from economic distance, and let w_{ijt|1} be the ij-th (i ≠ j) element of W_{1,nt}. Following Han and Lee (2016),

    ŵ_{ijt|1} = E_{ijt}^{−φ1},    w_{ijt|1} = ŵ_{ijt|1} / Σ_{j=1}^{n} ŵ_{ijt|1},    i = 1, 2, ..., n, j = 1, 2, ..., n, t = 1, 2, ..., T,    (2.4)
with φ1 being a scalar parameter to be estimated. Thus, the model in (2.3) can be rewritten as

    Y_{nt} = λ1 W_{1,nt}(φ1) Y_{nt} + Σ_{r1=2}^{p1} λ_{r1} W_{r1,nt} Y_{nt} + γ Y_{n,t−1} + μ1 W_{1,n,t−1}(φ1) Y_{n,t−1} + Σ_{r1=2}^{p1} μ_{r1} W_{r1,n,t−1} Y_{n,t−1}
        + X_{nt} β + C_n + l_n α_t + U_{nt},
    U_{nt} = Σ_{r2=1}^{p2} ρ_{r2} M_{r2,nt} U_{nt} + V_{nt},    t = 1, 2, ..., T,    (2.5)
where W_{1,nt}(φ1) is known up to one parameter φ1. In principle, we can allow all W_{r1,nt}'s and M_{r2,nt}'s to include some parameters. However, as suggested by Corrado and Fingleton (2012), it is usually difficult to estimate such parameters in the spatial weights matrices. When evaluating the Jacobian determinant, the nonlinearity induced by those parameters might give rise to an additional computational burden. So a common practice in the empirical literature is to fix the values of those parameters. In this paper, we focus on the one-parameter spatial weights case in (2.4) and (2.5) as an extension. We are interested in whether the exchange algorithm can also simplify the computation of the model with flexible spatial weights.

2.3. The parameter space of spatial parameters and the corresponding computational issue

Consider first the simpler model in the cross-sectional setting. Let S_n(λ) = I_n − Σ_{r1=1}^{p1} λ_{r1} W_{r1,n} and R_n(ρ) = I_n − Σ_{r2=1}^{p2} ρ_{r2} M_{r2,n}, with λ = (λ1, λ2, ..., λ_{p1})′ and ρ = (ρ1, ρ2, ..., ρ_{p2})′. Restrictions need to be imposed on λ and ρ to ensure that S_n(λ) and R_n(ρ) are invertible. According to Horn and Johnson (1985), a sufficient condition is ||Σ_{r1=1}^{p1} λ_{r1} W_{r1,n}|| < 1 and ||Σ_{r2=1}^{p2} ρ_{r2} M_{r2,n}|| < 1, where ||·|| denotes any matrix norm. This condition can be imposed in the sampling step for λ and ρ. Specifically, one may first pick a matrix norm that is relatively small in value for given λ (ρ) and W_n's (M_n's), to ensure a wider parameter space for λ (ρ), but that is also computationally simple. A proper candidate would be the matrix row-sum norm ||·||_∞ or the column-sum norm ||·||_1; in this paper we choose ||·||_∞. Then one may use a Metropolis-Hastings (M-H) algorithm to sample λ and ρ. The row-sum norms ||Σ_{r1=1}^{p1} λ_{r1} W_{r1,n}||_∞ and ||Σ_{r2=1}^{p2} ρ_{r2} M_{r2,n}||_∞ are calculated based upon the new draws of λ and ρ and compared with 1 in each iteration. If either norm is not less than 1, λ and ρ have to be redrawn until both norms are less than 1.

However, one drawback of this condition is that it might be computationally demanding for large n. This is so because one needs to compute the absolute values of all elements, sum them up for each row, and take the maximum over rows; all these steps involve many elementary operations for large n. Moreover, the row-sum norms need to be evaluated at each iteration of the MCMC sampler. To simplify computation, one may consider a more restrictive parameter space. Notice that ||Σ_{r1=1}^{p1} λ_{r1} W_{r1,n}||_∞ ≤ (Σ_{r1=1}^{p1} |λ_{r1}|) × max_{r1=1,...,p1} ||W_{r1,n}||_∞ and ||Σ_{r2=1}^{p2} ρ_{r2} M_{r2,n}||_∞ ≤ (Σ_{r2=1}^{p2} |ρ_{r2}|) × max_{r2=1,...,p2} ||M_{r2,n}||_∞. Thus, as long as (Σ_{r1=1}^{p1} |λ_{r1}|) × max_{r1} ||W_{r1,n}||_∞ < 1 and (Σ_{r2=1}^{p2} |ρ_{r2}|) × max_{r2} ||M_{r2,n}||_∞ < 1, S_n(λ) and R_n(ρ) are invertible. Since the two norms max_{r1} ||W_{r1,n}||_∞ and max_{r2} ||M_{r2,n}||_∞ only need to be calculated once before the MCMC sampler, the computational burden is greatly reduced. When the W_{r1,n}'s and M_{r2,n}'s are row-normalized, the condition reduces to Σ_{r1=1}^{p1} |λ_{r1}| < 1 and Σ_{r2=1}^{p2} |ρ_{r2}| < 1, which is just the stability condition in Lee and Liu (2010).
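The two layers of checks above can be sketched in a few lines. This is an illustrative sketch (the function names are ours): the exact row-sum norm check that must be re-evaluated at each draw, and the cheaper sufficient condition that reuses norms precomputed once before the sampler starts.

```python
import numpy as np

def row_sum_norm(A):
    """Matrix row-sum (infinity) norm ||A||_inf = max_i sum_j |a_ij|."""
    return np.abs(A).sum(axis=1).max()

def satisfies_exact_condition(lams, W_list):
    """Direct check || sum_r lambda_r W_{r,n} ||_inf < 1.
    Must be recomputed for every new draw of the spatial parameters."""
    A = sum(l * W for l, W in zip(lams, W_list))
    return row_sum_norm(A) < 1.0

def satisfies_sufficient_condition(lams, W_list):
    """More restrictive but cheaper condition:
    (sum_r |lambda_r|) * max_r ||W_{r,n}||_inf < 1.
    max_r ||W_{r,n}||_inf can be precomputed once before the MCMC run."""
    max_norm = max(row_sum_norm(W) for W in W_list)
    return np.sum(np.abs(lams)) * max_norm < 1.0
```

For row-normalized weights each ||W_{r,n}||_∞ equals one, so the sufficient condition collapses to Σ_r |λ_r| < 1, the Lee and Liu (2010) stability condition mentioned above.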
In the dynamic panel data setting, more conditions are required to ensure the stability of the model. For instance, consider the higher-order SDPD model in (2.3). Let μ = (μ1, ..., μ_{p1})′ and A_{nt}(λ, γ, μ) = S_{nt}^{−1}(λ)(γ I_n + Σ_{r1=1}^{p1} μ_{r1} W_{r1,n,t−1}), where S_{nt}(λ) = I_n − Σ_{r1=1}^{p1} λ_{r1} W_{r1,nt} and R_{nt}(ρ) = I_n − Σ_{r2=1}^{p2} ρ_{r2} M_{r2,nt}. The reduced form of (2.3) is

    Y_{nt} = A_{nt}(λ, γ, μ) Y_{n,t−1} + S_{nt}^{−1}(λ)(X_{nt} β + C_n + l_n α_t + R_{nt}^{−1}(ρ) V_{nt}),    t = 1, 2, ..., T.    (2.6)

Thus, the corresponding sufficient conditions for stability are ||Σ_{r2=1}^{p2} ρ_{r2} M_{r2,nt}||_∞ < 1, ||Σ_{r1=1}^{p1} λ_{r1} W_{r1,nt}||_∞
for j > j0, where

    Ω(Ξ̃ | Ξ^(0), Ξ^(1), ..., Ξ^(j−1)) = (1 − δ) N_p(Ξ^(j−1), 2.38² η_cov(Ξ^(0), ..., Ξ^(j−1))/p) + δ N_p(Ξ^(j−1), 0.1² I_p/p),    (3.4)

η_cov(Ξ^(0), Ξ^(1), ..., Ξ^(j−1)) = (1/j) Σ_{i=0}^{j−1} Ξ^(i) Ξ^(i)′ − Ξ̄^(j−1) Ξ̄^(j−1)′, with Ξ̄^(j−1) = (1/j) Σ_{i=0}^{j−1} Ξ^(i), is the empirical variance matrix, j0 is the length of the initial sampling period, and the scaling factor 2.38² optimizes the mixing properties of the Metropolis search for Gaussian proposals (Gelman et al., 1996). In particular, the AM proposal is a mixture of two normal distributions with a ratio parameter δ when the number of iterations exceeds j0. The second component of the mixture, δ N_p(Ξ^(j−1), 0.1² I_p/p), prevents us from generating a singular covariance matrix due to problematic values of η_cov(Ξ^(0), Ξ^(1), ..., Ξ^(j−1)). In this

6. We thank one referee for pointing this out.
paper, δ is set to 0.05 following Roberts and Rosenthal (2009). Finally, in our MCMC sampler, we apply the AM step only to the burn-in draws. After burn-in, we fix the value of the variance-covariance matrix and use a normal proposal (with that variance-covariance matrix) to continue the M-H step for Ξ.7

Step 1: Sample Ξ = (λ′, ρ′, γ, μ′)′ according to p(Ξ|{Y_nt}, C_n, {α_t}, β, σ²). By Bayes' theorem,

    p(Ξ|{Y_nt}, C_n, {α_t}, β, σ²) ∝ π(Ξ) × f({Y_nt}|C_n, {α_t}, Ξ, β, σ²),    (3.5)

where f({Y_nt}|C_n, {α_t}, Ξ, β, σ²) ∝ Π_{t=1}^{T} |R_{nt}(ρ)| × |S_{nt}(λ)| × exp(−(1/2σ²) H_{nt}(C_n, α_t, θ)′ H_{nt}(C_n, α_t, θ)). Let j_b be the length of the burn-in period for the MCMC sampler, where j0 < j_b.

(1.1: burn-in stage) For j ≤ j_b, generate a new candidate Ξ̃ based upon the AM proposal Ω(Ξ|Ξ^(0), Ξ^(1), ..., Ξ^(j−1)) in (3.4), which is a symmetric density. Check whether Ξ̃ satisfies the stability conditions in Subsection 2.3. If not, redraw Ξ̃ until it meets those conditions. With the acceptance probability equal to

    Pr(Ξ^(j−1), Ξ̃) = min{1, [f({Y_nt}|C_n^(j−1), {α_t^(j−1)}, Ξ̃, β^(j−1), σ^{2(j−1)}) / f({Y_nt}|C_n^(j−1), {α_t^(j−1)}, Ξ^(j−1), β^(j−1), σ^{2(j−1)})] × [π(Ξ̃)/π(Ξ^(j−1))]},

set Ξ^(j) equal to Ξ̃; else set it equal to Ξ^(j−1).

(1.2: after burn-in) For j > j_b, fix the variance-covariance matrix of the proposal at Var_Ξ = (1 − δ)² η_cov(Ξ^(0), Ξ^(1), ..., Ξ^(j_b)) 2.38²/p + δ² 0.1² I_p/p. Generate Ξ̃ from N_p(Ξ^(j−1), Var_Ξ). Check whether Ξ̃ satisfies the stability conditions in Subsection 2.3. If not, redraw Ξ̃ until it meets those conditions. With the acceptance probability specified in Step 1.1, set Ξ^(j) equal to Ξ̃; else set it equal to Ξ^(j−1).

Step 2: Sample β from p(β|{Y_nt}, C_n, {α_t}, Ξ, σ²). By Bayes' theorem,

    p(β|{Y_nt}, C_n, {α_t}, Ξ, σ²) ∝ π(β) × f({Y_nt}|C_n, {α_t}, Ξ, β, σ²)
        ∝ exp(−(1/2)(β − β_O)′ B_O^{−1} (β − β_O)) × exp(−(1/2σ²) Σ_{t=1}^{T} H_{nt}(C_n, α_t, θ)′ H_{nt}(C_n, α_t, θ))
        ∼ N(T_β, Σ_β),    (3.6)

with

    Σ_β = (B_O^{−1} + Σ_{t=1}^{T} X_{nt}(ρ)′ X_{nt}(ρ)/σ²)^{−1},
    T_β = Σ_β {B_O^{−1} β_O + Σ_{t=1}^{T} X_{nt}(ρ)′ [Y_{nt}(ρ, λ) − R_{nt}(ρ)(A_{nt}(γ, μ) Y_{n,t−1} + C_n + l_n α_t)]/σ²},

7. This normal proposal with a fixed variance-covariance matrix after burn-in facilitates the computation of the Bayes factor through the Savage-Dickey density ratio for some nested model selection issues. See Section 4 for more discussion.
where Y_{nt}(ρ, λ) = R_{nt}(ρ) S_{nt}(λ) Y_{nt} and X_{nt}(ρ) = R_{nt}(ρ) X_{nt}.

Step 3: Sample σ² from p(σ²|{Y_nt}, C_n, {α_t}, Ξ, β):

    p(σ²|{Y_nt}, C_n, {α_t}, Ξ, β) ∝ π(σ²) × f({Y_nt}|C_n, {α_t}, Ξ, β, σ²)
        ∝ (σ²)^{−(a/2+1)} exp(−b/(2σ²)) × (σ²)^{−nT/2} exp(−(1/2σ²) Σ_{t=1}^{T} H_{nt}(C_n, α_t, θ)′ H_{nt}(C_n, α_t, θ))
        ∼ IG(a_p/2, b_p/2),    (3.7)

with a_p = a + nT and b_p = b + Σ_{t=1}^{T} H_{nt}(C_n, α_t, θ)′ H_{nt}(C_n, α_t, θ).
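The conjugate updates in Steps 2 and 3 can be sketched for a stripped-down regression y* = X*β + V with V ∼ N(0, σ²I). Here y_star and X_star stand in for the transformed quantities of the full model (Y_nt(ρ, λ) net of the other terms, and X_nt(ρ)); the function names and the reduction to a single cross-section are our simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def draw_beta(y_star, X_star, sigma2, beta_O, B_O):
    """One draw from N(T_beta, Sigma_beta) as in (3.6),
    under the normal prior beta ~ N(beta_O, B_O)."""
    B_O_inv = np.linalg.inv(B_O)
    Sigma_beta = np.linalg.inv(B_O_inv + X_star.T @ X_star / sigma2)
    T_beta = Sigma_beta @ (B_O_inv @ beta_O + X_star.T @ y_star / sigma2)
    return rng.multivariate_normal(T_beta, Sigma_beta)

def draw_sigma2(resid, a, b):
    """One draw from IG(a_p/2, b_p/2) as in (3.7), with a_p = a + n and
    b_p = b + resid'resid.  An inverse-gamma variate is the reciprocal of a
    Gamma(shape = a_p/2, scale = 2/b_p) variate."""
    a_p = a + resid.size
    b_p = b + resid @ resid
    return 1.0 / rng.gamma(shape=a_p / 2.0, scale=2.0 / b_p)
```

Both steps are standard conjugate draws, so no M-H accept/reject decision is needed, unlike the update for Ξ in Step 1.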
Step 4: Sample α_t from p(α_t|{Y_nt}, C_n, Ξ, β, σ²) for t = 2, 3, ..., T. By Bayes' theorem,

    p(α_t|{Y_nt}, C_n, Ξ, β, σ²) ∝ π(α_t) × f(Y_nt|C_n, α_t, Ξ, β, σ²)
        ∝ exp(−(α_t − α_O)²/(2σ_α²)) × exp(−(1/2σ²) H_{nt}(C_n, α_t, θ)′ H_{nt}(C_n, α_t, θ))
        ∼ N(T_{α_t}, Σ_{α_t}),    (3.8)

with

    Σ_{α_t} = (1/σ_α² + l_n′ R_{nt}(ρ)′ R_{nt}(ρ) l_n/σ²)^{−1},
    T_{α_t} = Σ_{α_t} {α_O/σ_α² + l_n′ R_{nt}(ρ)′ [Y_{nt}(ρ, λ) − R_{nt}(ρ)(A_{nt}(γ, μ) Y_{n,t−1} + X_{nt} β + C_n)]/σ²}.
Step 5: Sample C_n from p(C_n|{Y_nt}, {α_t}, Ξ, β, σ²):

    p(C_n|{Y_nt}, {α_t}, Ξ, β, σ²) ∝ π(C_n) × f({Y_nt}|C_n, {α_t}, Ξ, β, σ²)
        ∝ exp(−(1/2)(C_n − C_O)′ Σ_{C_O}^{−1} (C_n − C_O)) × exp(−(1/2σ²) Σ_{t=1}^{T} H_{nt}(C_n, α_t, θ)′ H_{nt}(C_n, α_t, θ))
        ∼ N(T_{C_n}, Σ_{C_n}),    (3.9)

with

    Σ_{C_n} = [Σ_{C_O}^{−1} + Σ_{t=1}^{T} R_{nt}(ρ)′ R_{nt}(ρ)/σ²]^{−1},
    T_{C_n} = Σ_{C_n} {Σ_{C_O}^{−1} C_O + Σ_{t=1}^{T} R_{nt}(ρ)′ [Y_{nt}(ρ, λ) − R_{nt}(ρ)(A_{nt}(γ, μ) Y_{n,t−1} + X_{nt} β + l_n α_t)]/σ²}.
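Putting Step 1 together, the following schematic shows one M-H update with the AM mixture proposal and the redraw-until-stable rule. The log-likelihood, log-prior, and stability check are passed in as callables, and all names are ours; in practice the log-likelihood would contain the Jacobian terms |R_nt(ρ)| and |S_nt(λ)| (or be replaced by the exchange-algorithm step that avoids them). We assume a fixed narrow proposal before iteration j0.

```python
import numpy as np

rng = np.random.default_rng(2)

def am_propose(xi_prev, hist, j, j0, delta=0.05):
    """Adaptive-Metropolis mixture proposal in the spirit of (3.4):
    (1-delta) N(xi_prev, 2.38^2 Cov_hat/p) + delta N(xi_prev, 0.1^2 I_p/p).
    hist: (j x p) array of past draws; before j0, use only the narrow
    fixed component (an assumption of this sketch)."""
    p = xi_prev.size
    if j <= j0 or rng.random() < delta:
        cov = 0.1**2 * np.eye(p) / p
    else:
        cov = 2.38**2 * np.cov(hist.T, bias=True) / p
    return rng.multivariate_normal(xi_prev, cov)

def mh_step(xi_prev, hist, j, j0, loglik, logprior, is_stable):
    """One M-H update for Xi: redraw the candidate until it satisfies the
    stability conditions of Subsection 2.3, then accept with the
    probability in Step 1.1 (the proposal is symmetric, so it cancels)."""
    xi_new = am_propose(xi_prev, hist, j, j0)
    while not is_stable(xi_new):
        xi_new = am_propose(xi_prev, hist, j, j0)
    log_acc = (loglik(xi_new) + logprior(xi_new)
               - loglik(xi_prev) - logprior(xi_prev))
    if np.log(rng.random()) < log_acc:
        return xi_new
    return xi_prev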
The MCMC algorithm for the model in (2.4) and (2.5) can be derived based upon the above algorithm. One just needs to use an additional M-H step to sample φ1 . Assume a prior N (φ10 , σφ2 1 ) for φ1 .8 Note that p(φ1 |{Ynt }, Cn , {αt }, Ξ, β, σ 2 ) ∝ π(φ1 ) × f ({Ynt }|Cn , {αt }, Ξ, φ1 , β, σ 2 ). One may propose a new candidate φ˜1 according to the AM proposal in (3.4) and impose the stability conditions on it. At the jth iteration, the acceptance probability is (j−1)
(j−1)
P r(φ1
, φ˜1 ) = min{1,
(j−1)
}, Ξ(j−1) , φ˜1 , β (j−1) , σ 2(j−1) ) (j−1) (j−1) (j−1) f ({Ynt }|Cn , {αt }, Ξ(j−1) , φ1 , β (j−1) , σ 2(j−1) ) f ({Ynt }|Cn
, {αt
×
π(φ˜1 ) (j−1)
π(φ1
}. )
3.2. Some discussions about the baseline algorithm There might be some concerns about the baseline algorithm in Subsection 3.1, especially on the direct sampling steps of Cn and αt ’s.9 The first concern is about the potential computational cost of generating posterior draws for Cn and {αt }’s. With a large n or T , it will take time to produce posterior samples for those individual and time effects. The second concern is about the asymptotic validity of likelihood-based estimators for the SDPD model. In the classical approach, one popular way is to treat Cn and {αt }’s as fixed effects (unknown fixed parameters). But in the fixed effect specification, when T is short, one might encounter the incidental parameter problem discussed by Neyman and Scott (1948) because the time dimension does not provide enough information to consistently estimate individual effects Cn (Lee and Yu, 2010b). For spatial panels, Yu et al. (2008) show that the quasi-maximum likelihood estimates (QMLE) of the SDPD model with only individual fixed effects have an asymptotic bias of O( T1 ).10 Hence, one might question the validity of the Bayesian estimator under the fixed effect specification, which is likelihood-based, when T is small. Due to the above two concerns, a way to handle the fixed effect specification is to follow the spatial panel data model literature (Lee and Yu, 2010b,c) to eliminate fixed effects and work with a transformed SDPD model, which is free of fixed effects. Unfortunately, when Wr1 nt ’s are not row-normalized and are time-varying, a transformed model may no longer preserve structures of the SDPD model and the corresponding likelihood function might not be feasible. For instance, to eliminate the time fixed effect αt ’s, one may consider the transformation based upon the demean projector Jn = In − n1 ln ln0 , or En,n−1 , 8 Since
a M-H step is used to sample φ1 , its prior does not play much a specific role (not a conjugate prior). According to
(2.4), a more intuitive way is to impose a non-negative constraint on φ1 by specifying its prior as a truncated normal defined on [0, ∞]. We have tried both unconstrained and constrained priors. Without the non-negative constraint, we adopt a normal prior for φ1 and just let the data (likelihood) to determine whether φ1 should be non-negative or not. We have also tried out the truncated normal prior for φ1 in simulation and empirical application. The estimation results turn out to be very similar to the results with a unconstrained normal prior. 9 We thank one referee for raising these issues, which help us better understand possible limitations of our sampling algorithm. 10 See Lee and Yu (2010b) for a detailed discussion about the incidental parameter problem in spatial panel data models.
12
which is a submatrix of the orthonormal matrix of eigenvectors for Jn 11 as in Lee and Yu (2010c). But only when Wr1 nt ’s are row-normalized can the structures of the SDPD model be preserved after transformation. Furthermore, to eliminate the individual fixed effect ci ’s, the transformation based upon the demean operator JT = IT −
1 0 T lT lT
would work for time invariant spatial weights matrix in a static panel model, but would
not preserve the SDPD structure, in particular, when Wr1 nt ’s are time-varying. The transformation based on ET,T −1 , which is a submatrix of JT ’s orthonormal matrix of eigenvectors would not form a SDPD process because the transformed time-lagged dependent variables are no longer a direct transformation of Yn,t−1 with ET,T −1 (Lee and Yu, 2014).12 To the best of our knowledge, one possible scenario that both αt ’s and Cn can be eliminated is when Wr1 nt ’s are row-normalized in a static panel model. Therefore, to ensure our estimation procedure to have wider applications, we do not work with a transformed SDPD model. What we do in the baseline algorithm is to follow the common practice for panel (longitudinal) data models in the Bayesian literature (Greenberg, 2007; Koop et al., 2007; Chib, 2008) to regard Cn and αt ’s as random parameters and directly sample them. In the Bayesian paradigm, all parameters are viewed as random variables, instead of unknown constants (Greenberg, 2007). Hence, Cn can be viewed as a random vector, which is distributed as Nn (CO , ΣCO ) and independently of Xnt ’s and Vnt ’s. Similarly, αt can be viewed as a normally distributed random variable with mean αO and variance σα2 and independently of exogenous variables and disturbances.13 We combine priors of Cn , αt ’s and θ with the likelihood function to derive the exact joint posterior distribution p(θ, Cn , {αt }|{Ynt }). Drawing posterior samples from p(θ, Cn , {αt }|{Ynt }) also provides samples from marginal posterior distributions p(θ|{Ynt }), p(Cn |{Ynt }), and p({αt }|{Ynt }), without evaluating any integrals (Greenberg, 2007). In particular, sampling Cn and αt ’s actually helps to integrate out those random effects and obtain the exact marginal posterior density p(θ|{Ynt }). Thus, the incidental parameter problem and possible asymptotic bias of the MLE should not be a concern here. 
As for the computational cost of sampling Cn and the αt's when n or T is large, we argue that the corresponding benefits of the direct sampling approach may warrant its application: we are able to obtain posterior samples from p(θ|{Ynt}) without computing high-dimensional integrals with respect to Cn and the αt's. Thus the direct sampling approach actually facilitates our computation. Though the pure random effect specification is natural in the Bayesian approach for panel data models, it might be restrictive, especially for Cn, because Cn is likely to be correlated with the Xnt's. To address this, we may follow Mundlak (1978) and consider a correlated random effect specification for Cn,

Cn = X̄n ψ + ζn, ζn ∼ Nn(0, σc² In),   (3.10)

11 As in Lee and Yu (2010c), (En,n−1, ln/√n) is the orthonormal matrix of eigenvectors of Jn, where En,n−1 corresponds to the eigenvalues of one while ln/√n corresponds to the eigenvalue of zero.
12 The transformation based upon ET,T−1 would be applicable for static spatial panel models with γ = 0 and time-invariant Wn (Lee and Yu, 2010a). But for an SDPD model with spatial weights matrices that may not be row-normalized, the likelihood function approach might not be feasible after eliminating fixed effects by ET,T−1, so a GMM approach might be preferable. See Lee and Yu (2014) for more discussion.
13 In a previous version of the manuscript, we used the term "fixed effects" for Cn and the αt's. But those "fixed effects" are defined under the Bayesian framework, in the spirit of non-hierarchical priors (Lindley and Smith, 1972; Smith, 1973; Rendon, 2013); Cn and the αt's are still random parameters. The only "fixed" parameters are the hyperparameters of their priors: we update the posterior distributions of Cn and the αt's, but we do NOT update posterior distributions of CO, ΣCO, αO and σα². This is different from the fixed-effect specification in the classical approach, which treats Cn and the αt's as unknown fixed parameters. To avoid confusion, we remove the terminology "fixed effect" for Cn and the αt's in the manuscript, and use "random effect", "individual effect" and "time effect" instead.
where X̄n is an n × k matrix consisting of the over-time means of the Xnt's, ψ is a k × 1 vector of coefficients, ζn is an n × 1 normal random vector independent of the Vnt's, and σc² is a scalar variance. By assuming priors on ψ and σc², we obtain a hierarchical prior structure for Cn.14 In this way, possible correlations between Cn and the Xnt's can be captured.15

We conclude this subsection by summarizing the MCMC algorithm for the SDPD model with hierarchical priors on Cn. The sampling steps for Ξ, the αt's and σ² are the same as in the baseline algorithm. Thus, in principle, the baseline algorithm can be easily extended to incorporate sampling steps for ψ and σc². However, it is preferable to sample Cn, β and ψ in one block due to possible correlations between them (Chib and Carlin, 1999; Greenberg, 2007).16 Let B = (β′, ψ′)′ and 𝐗nt = (Xnt, X̄n). Substituting (3.10) into (2.3), the model becomes

Ynt = Σ_{r1=1}^{p1} λr1 Wr1,nt Ynt + γYn,t−1 + Σ_{r1=1}^{p1} µr1 Wr1,n,t−1 Yn,t−1 + 𝐗nt B + ln αt + ζn + Rnt^{−1}(ρ)Vnt.   (3.11)

Multiplying both sides of (3.11) by Rnt(ρ), we have

Ynt(ρ, λ) = Rnt(ρ)Ant(µ, γ)Yn,t−1 + 𝐗nt(ρ)B + Rnt(ρ)ln αt + Vnt(ρ),   (3.12)

where 𝐗nt(ρ) = Rnt(ρ)𝐗nt and Vnt(ρ) = Rnt(ρ)ζn + Vnt ∼ Nn(0, σ²In + σc²Rnt(ρ)Rnt(ρ)′). Denote Hnt(αt, Ξ, B, σ²) = Ynt(ρ, λ) − Rnt(ρ)Ant(µ, γ)Yn,t−1 − Rnt(ρ)ln αt − 𝐗nt(ρ)B and ΣVnt = σ²In + σc²Rnt(ρ)Rnt(ρ)′.

14 In the Bayesian literature on panel data models, researchers often assume hierarchical priors on individual effects. See, for example, Chib (2008), Greenberg (2007) and Koop et al. (2007).
15 One might argue that (3.10) is still restrictive since it specifies a particular dependence structure between Cn and the Xnt's. An alternative is to treat Cn as fixed effects (unknown fixed parameters), which would be robust against all possible correlations between Cn and the Xnt's. But the difficulty is that one then needs an alternative method (other than eliminating those fixed effects) to tackle the asymptotic bias of the MLE, at least with small T, in a Bayesian framework. We leave this issue for future research.
16 Due to the hierarchical priors of Cn, a simpler approach is to first sample Cn according to p(Cn|{Ynt}, {αt}, Ξ, β, σ², ψ, σc²), then assume conjugate priors for ψ and σc² and sample them from p(ψ|Cn, σc²) and p(σc²|Cn, ψ). But this algorithm might suffer from poor mixing because of possible correlations among Cn, β and ψ (Chib and Carlin, 1999). Hence, we follow Chib and Carlin (1999) and Greenberg (2007) to sample them in one block.
According to (3.12), the likelihood function with Cn integrated out is

f(Ynt|αt, Ξ, B, σ², σc²) ∝ (σ²)^{−n/2} × |Rnt(ρ)| × |Snt(λ)| × exp(−½ Hnt(αt, Ξ, B, σ²)′ ΣVnt^{−1} Hnt(αt, Ξ, B, σ²)).   (3.13)

Assume the priors of B and σc² to be, respectively, B ∼ N2k(BO, ΣBO) and σc² ∼ IG(ac/2, bc/2). Following Chib and Carlin (1999), we sample B and Cn in one block. Note that p(B, Cn|{Ynt}, {αt}, Ξ, σ², σc²) ∝ p(B|{Ynt}, {αt}, Ξ, σ², σc²) × p(Cn|{Ynt}, {αt}, Ξ, B, σ², σc²). The sampling steps for B and Cn from p(B, Cn|{Ynt}, {αt}, Ξ, σ², σc²) are:

Sub-step 1: Sample B from p(B|{Ynt}, {αt}, Ξ, σ², σc²). By Bayes' theorem,

p(B|{Ynt}, {αt}, Ξ, σ², σc²) ∝ π(B) × f({Ynt}|{αt}, Ξ, B, σ², σc²)
∝ exp(−½ (B − BO)′ ΣBO^{−1} (B − BO)) × exp(−½ Σ_{t=1}^{T} Hnt(αt, Ξ, B, σ²)′ ΣVnt^{−1} Hnt(αt, Ξ, B, σ²))
∼ N(TB, ΣB),   (3.14)

with

ΣB = (ΣBO^{−1} + Σ_{t=1}^{T} 𝐗nt(ρ)′ ΣVnt^{−1} 𝐗nt(ρ))^{−1},
TB = ΣB {ΣBO^{−1} BO + Σ_{t=1}^{T} 𝐗nt(ρ)′ ΣVnt^{−1} [Ynt(ρ, λ) − Rnt(ρ)Ant(γ, µ)Yn,t−1 − Rnt(ρ)ln αt]}.

Sub-step 2: Sample Cn from p(Cn|{Ynt}, {αt}, Ξ, B, σ², σc²). By Bayes' theorem,

p(Cn|{Ynt}, {αt}, Ξ, B, σ², σc²) ∝ π(Cn|ψ, σc²) × f({Ynt}|Cn, {αt}, Ξ, β, σ²)
∝ exp(−(1/2σc²)(Cn − X̄n ψ)′(Cn − X̄n ψ)) × exp(−(1/2σ²) Σ_{t=1}^{T} Hnt(Cn, αt, θ)′ Hnt(Cn, αt, θ))
∼ N(TCn, ΣCn),   (3.15)

with

ΣCn = [In/σc² + Σ_{t=1}^{T} Rnt(ρ)′Rnt(ρ)/σ²]^{−1},
TCn = ΣCn {X̄n ψ/σc² + Σ_{t=1}^{T} Rnt(ρ)′[Ynt(ρ, λ) − Rnt(ρ)(Ant(γ, µ)Yn,t−1 + Xnt β + ln αt)]/σ²}.
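The two sub-steps above reduce to standard multivariate-normal updates. The Python fragment below is a minimal sketch of the one-block draw of B and Cn per (3.14)-(3.15), not the authors' code: the inputs `R`, `Xr` and `target` are hypothetical stand-ins for Rnt(ρ), 𝐗nt(ρ) and Ynt(ρ, λ) − Rnt(ρ)Ant(γ, µ)Yn,t−1 − Rnt(ρ)ln αt, assumed to be precomputed elsewhere in the sampler.

```python
import numpy as np

def draw_B_Cn(R, Xr, target, Xbar, sigma2, sigma2_c, B0, Sigma_B0, rng):
    """One-block draw of B = (beta', psi')' and Cn, following (3.14)-(3.15).

    R      : list of T (n, n) matrices R_nt(rho)
    Xr     : list of T (n, 2k) transformed regressors [R_nt X_nt, R_nt Xbar]
    target : list of T (n,) vectors Y_nt(rho, lam) - R_nt A_nt Y_{n,t-1} - R_nt l_n alpha_t
    Xbar   : (n, k) matrix of over-time regressor means
    """
    n = R[0].shape[0]
    k2 = Xr[0].shape[1]
    # Sub-step 1: B | rest ~ N(T_B, Sigma_B), Eq. (3.14)
    prec = np.linalg.inv(Sigma_B0)
    mean_part = prec @ B0
    for t in range(len(R)):
        Sv = sigma2 * np.eye(n) + sigma2_c * R[t] @ R[t].T   # Sigma_{V_nt}
        Sv_inv = np.linalg.inv(Sv)
        prec += Xr[t].T @ Sv_inv @ Xr[t]
        mean_part += Xr[t].T @ Sv_inv @ target[t]
    Sigma_B = np.linalg.inv(prec)
    B = rng.multivariate_normal(Sigma_B @ mean_part, Sigma_B)
    beta, psi = B[:k2 // 2], B[k2 // 2:]
    # Sub-step 2: Cn | B, rest ~ N(T_Cn, Sigma_Cn), Eq. (3.15)
    prec_C = np.eye(n) / sigma2_c
    mean_C = (Xbar @ psi) / sigma2_c
    for t in range(len(R)):
        prec_C += R[t].T @ R[t] / sigma2
        # residual uses X_nt beta only: the first k columns of Xr[t] equal R_nt X_nt
        resid = target[t] - Xr[t][:, :k2 // 2] @ beta
        mean_C += R[t].T @ resid / sigma2
    Sigma_C = np.linalg.inv(prec_C)
    Cn = rng.multivariate_normal(Sigma_C @ mean_C, Sigma_C)
    return B, Cn
```

Sampling B before Cn in this order is exactly the blocking of Chib and Carlin (1999) described above; only the conditioning set changes between the two sub-steps.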
Finally, the sampling step for σc² is

p(σc²|Cn, ψ) ∝ π(σc²) × π(Cn|ψ, σc²)
∝ (σc²)^{−(ac/2+1)} × exp(−bc/(2σc²)) × (σc²)^{−n/2} × exp(−(Cn − X̄n ψ)′(Cn − X̄n ψ)/(2σc²))
∼ IG(apc/2, bpc/2),   (3.16)

with apc = ac + n and bpc = bc + (Cn − X̄n ψ)′(Cn − X̄n ψ). Simulation results regarding the estimation of the SDPD model with hierarchical priors on Cn are summarized in Table 5 of the Appendix.

3.3. The efficient algorithm to sample Ξ

The MCMC algorithms outlined in Subsections 3.1 and 3.2 are computationally tractable when n, the number of cross-sectional units, is relatively small. However, if n is large, the computational burden might increase dramatically because, when sampling Ξ, the Jacobian determinants |Rnt(ρ)| and |Snt(λ)| need to be evaluated in each iteration. The innovation of this paper is to suggest the exchange algorithm in Murray et al. (2006) to "avoid" the calculation of |Rnt(ρ)| and |Snt(λ)| in the M-H step. Specifically, based upon features of the SAR or SDPD model, we can easily simulate auxiliary samples of the dependent variables Ynt's and the spatial error terms Unt's. With the help of those auxiliary samples, |Rnt(ρ)| and |Snt(λ)| cancel out in the expression of the acceptance ratio, so the corresponding computation is simplified.

For the dynamic model, recall that Ξ = (λ′, ρ′, γ, µ′)′ with prior π(Ξ). The likelihood function can be decomposed into17

f({Ynt}|Cn, {αt}, Ξ, β, σ²) = q({Ynt}; Cn, {αt}, Ξ, β, σ²) × D(λ, ρ),   (3.17)

with

D(λ, ρ) = Π_{t=1}^{T} |Rnt(ρ)||Snt(λ)|,
q({Ynt}; Cn, {αt}, Ξ, β, σ²) ∝ (σ²)^{−nT/2} Π_{t=1}^{T} exp(−Hnt(Cn, αt, θ)′Hnt(Cn, αt, θ)/(2σ²)).
If an M-H step is applied to sample Ξ, a new candidate value Ξ̃ = (λ̃′, ρ̃′, γ̃, µ̃′)′ needs to be generated from the proposal g(Ξ̃|Ξ). To simplify notation, we use Ξ, Cn, αt's, β and σ² (parameter notation without the ˜ symbol) to represent the current MCMC draws of those parameters. The acceptance probability of moving from Ξ to Ξ̃ is

Pr(Ξ, Ξ̃) = min{1, [π(Ξ̃)/π(Ξ)] × [f({Ynt}|Cn, {αt}, Ξ̃, β, σ²)/f({Ynt}|Cn, {αt}, Ξ, β, σ²)]}
= min{1, [π(Ξ̃)/π(Ξ)] × [q({Ynt}; Cn, {αt}, Ξ̃, β, σ²)/q({Ynt}; Cn, {αt}, Ξ, β, σ²)] × [D(λ̃, ρ̃)/D(λ, ρ)]}.   (3.18)

With a large n, evaluating the ratio D(λ̃, ρ̃)/D(λ, ρ) is computationally demanding, so the acceptance probability cannot be easily calculated. To tackle this issue, we adopt the exchange algorithm in Murray et al. (2006). In addition to the new candidate value Ξ̃, we introduce auxiliary samples Ỹnt from f(Ỹnt|Cn, {αt}, Ξ̃, β, σ²) for t = 1, ..., T. Hence, one can regard the proposal of the exchange algorithm as g(Ξ̃|Ξ)f(Ỹnt|Cn, {αt}, Ξ̃, β, σ²), which not only generates Ξ̃ from g(Ξ̃|Ξ), but also contains a randomization component f(Ỹnt|Cn, {αt}, Ξ̃, β, σ²) that generates the Ỹnt's (Liang et al., 2016).18 The Ỹnt's can be viewed as "replacement" data sets for the real data Ynt's. Different from the conventional M-H step, which only chooses between Ξ and Ξ̃, the exchange algorithm proposes to swap (Ξ, {Ynt}) with (Ξ̃, {Ỹnt}). The acceptance probability in the exchange algorithm is

Pr(Ξ, Ξ̃, {Ỹnt}) = min{1, [π(Ξ̃)/π(Ξ)] × [f({Ynt}|Cn, {αt}, Ξ̃, β, σ²)/f({Ynt}|Cn, {αt}, Ξ, β, σ²)] × [f({Ỹnt}|Cn, {αt}, Ξ, β, σ²)/f({Ỹnt}|Cn, {αt}, Ξ̃, β, σ²)]}
= min{1, [π(Ξ̃)/π(Ξ)] × [f({Ynt}|Cn, {αt}, Ξ̃, β, σ²)/f({Ỹnt}|Cn, {αt}, Ξ̃, β, σ²)] × [f({Ỹnt}|Cn, {αt}, Ξ, β, σ²)/f({Ynt}|Cn, {αt}, Ξ, β, σ²)]}
= min{1, [π(Ξ̃)/π(Ξ)] × [q({Ynt}; Cn, {αt}, Ξ̃, β, σ²)/q({Ynt}; Cn, {αt}, Ξ, β, σ²)] × [q({Ỹnt}; Cn, {αt}, Ξ, β, σ²)/q({Ỹnt}; Cn, {αt}, Ξ̃, β, σ²)]}.   (3.19)

17 In this section we focus on using the exchange algorithm to sample Ξ. The remaining MCMC samplers for Cn, {αt}, β and σ² can be derived identically as in Subsection 3.1.
If the proposed Ξ̃ is more compatible with the real data {Ynt} and the original value Ξ prefers the replacement data {Ỹnt}, both f({Ynt}|Cn, {αt}, Ξ̃, β, σ²)/f({Ỹnt}|Cn, {αt}, Ξ̃, β, σ²) and f({Ỹnt}|Cn, {αt}, Ξ, β, σ²)/f({Ynt}|Cn, {αt}, Ξ, β, σ²) should be large. Then a swap between Ξ and Ξ̃ should be made. Note that the Jacobian determinant terms D(λ, ρ) and D(λ̃, ρ̃) cancel out in Eq. (3.19), so the acceptance probability can be evaluated easily.

Simulating Ỹnt from f(Ỹnt|Cn, {αt}, Ξ̃, β, σ²), conditional on the new candidate Ξ̃ and the current MCMC draws of Cn, the αt's, β and σ², for t = 1, 2, ..., T, might be time-consuming for some f(·). But for the higher-order SAR model, the likelihood function is a multivariate normal density function. Thus, given the restrictions on the parameter space of λ and ρ, we can adopt the contraction-mapping algorithm in Lee (2003) to simulate the Ỹnt's. Below are the simulation steps:

Step 1.1: Simulating the spatial error terms Ũnt.
Notice that Unt = Σ_{r2=1}^{p2} ρr2 Mr2,nt Unt + Vnt and Unt = Rnt^{−1}(ρ)Vnt, with Vnt ∼ N(0, σ²In). We aim to simulate Ũnt ∼ N(0, σ² Rnt^{−1}(ρ̃) Rnt^{−1}(ρ̃)′). Consider the mapping
Map(Ũnt) = Σ_{r2=1}^{p2} ρ̃r2 Mr2,nt Ũnt + Vnt.   (3.20)

Ũnt is the fixed-point solution of Ũnt = Map(Ũnt). Provided that the sufficient stability condition ||Σ_{r2=1}^{p2} ρ̃r2 Mr2,nt||∞ < 1 is satisfied, (3.20) turns out to be a contraction mapping such that

||Map(Unt|1) − Map(Unt|2)||∞ = ||Σ_{r2=1}^{p2} ρ̃r2 Mr2,nt (Unt|1 − Unt|2)||∞ ≤ ||Σ_{r2=1}^{p2} ρ̃r2 Mr2,nt||∞ ||Unt|1 − Unt|2||∞ = ϕ ||Unt|1 − Unt|2||∞,   (3.21)

with ϕ = ||Σ_{r2=1}^{p2} ρ̃r2 Mr2,nt||∞ < 1. Hence, with any initial value Unt|1,19 define the iteration as Unt|i+1 = Map(Unt|i) = Σ_{r2=1}^{p2} ρ̃r2 Mr2,nt Unt|i + Vnt. Then one has

||Map(Unt|i) − Map(Unt|i+1)||∞ ≤ ϕ ||Unt|i − Unt|i+1||∞ = ϕ ||Map(Unt|i−1) − Map(Unt|i)||∞ ≤ ··· ≤ ϕ^i ||Unt|1 − Unt|2||∞ = ϕ^i ||Rnt(ρ̃)Unt|1 − Vnt||∞.   (3.22)

The iterations in (3.21) and (3.22) stop with a given tolerance ε (say, 10^{−4} or 10^{−6}). Any value Unt|i+1 that satisfies ||Map(Unt|i) − Map(Unt|i+1)||∞ < ε can be used as the desired Ũnt.

18 Liang et al. (2016) show that the exchange algorithm defines a valid Markov chain for simulating from the joint posterior distribution.

Step 1.2: Simulating the dependent variables Ỹnt.
Given the Ũnt from Step 1.1, we need to simulate Ỹnt for the SDPD model in (2.3). Define the mappings at the first period and at the t-th (t = 2, ..., T) period as Map1(Ỹn,1) and Mapt(Ỹnt), where
Map1(Ỹn,1) = Σ_{r1=1}^{p1} λ̃r1 Wr1,n,1 Ỹn,1 + γ̃Yn,0 + Σ_{r1=1}^{p1} µ̃r1 Wr1,n,0 Yn,0 + Xn,1 β + Cn + Ũn,1,
Mapt(Ỹnt) = Σ_{r1=1}^{p1} λ̃r1 Wr1,nt Ỹnt + γ̃Ỹn,t−1 + Σ_{r1=1}^{p1} µ̃r1 Wr1,n,t−1 Ỹn,t−1 + Xnt β + Cn + ln αt + Ũnt, t = 2, ..., T.   (3.23)

Ỹnt (Ỹn,1) is the fixed point of Ỹnt = Mapt(Ỹnt) (Ỹn,1 = Map1(Ỹn,1)). Provided that the stability conditions ||Σ_{r1=1}^{p1} λ̃r1 Wr1,n,1||∞ < 1 and ||Σ_{r1=1}^{p1} λ̃r1 Wr1,nt||∞ < 1 are satisfied, (3.23) is also a contraction mapping. Let ω1 = ||Σ_{r1=1}^{p1} λ̃r1 Wr1,n,1||∞ and ωt = ||Σ_{r1=1}^{p1} λ̃r1 Wr1,nt||∞. With any initial value Yn1|1 or Ynt|1, define the iterations as Yn1|i+1 = Map1(Yn1|i) and Ynt|i+1 = Mapt(Ynt|i) for t = 2, ..., T. One has

||Map1(Yn1|i) − Map1(Yn1|i+1)||∞ ≤ ω1^i ||Sn,1(λ̃)Yn1|1 − γ̃Yn,0 − Σ_{r1=1}^{p1} µ̃r1 Wr1,n,0 Yn,0 − Xn,1 β − Cn − Ũn,1||∞,   (3.24)

and

||Mapt(Ynt|i) − Mapt(Ynt|i+1)||∞ ≤ ωt^i ||Snt(λ̃)Ynt|1 − γ̃Ỹn,t−1 − Σ_{r1=1}^{p1} µ̃r1 Wr1,n,t−1 Ỹn,t−1 − Xnt β − Cn − ln αt − Ũnt||∞.   (3.25)

19 To boost the speed of the contraction mapping, one may set initial values as follows: in the first iteration, set Unt|1 to some arbitrary value and obtain the fixed point Ũnt. Then use Ũnt as the initial value of the mapping in the second iteration. After that, always use the fixed point from the previous iteration as the initial value for the mapping in the current iteration. As long as those fixed points do not change much, the computation time of the contraction mapping is saved.
One can simulate Ỹnt from t = 1 to t = T sequentially. The efficient algorithm proposed here can also be applied to estimate the model in a cross-sectional or static panel setting. For the SDPD model with W1nt(φ1), one may also adopt the above algorithm to sample φ1. Let φ̃1 be the new candidate value of φ1. The contraction-mapping fixed points can be constructed similarly as

Map1(Ỹn,1) = λ1 W1n,1(φ̃1)Ỹn,1 + Σ_{r1=2}^{p1} λr1 Wr1,n,1 Ỹn,1 + γYn,0 + µ1 W1n,0(φ̃1)Yn,0 + Σ_{r1=2}^{p1} µr1 Wr1,n,0 Yn,0 + Xn,1 β + Cn + Ũn,1,
Mapt(Ỹnt) = λ1 W1n,t(φ̃1)Ỹn,t + Σ_{r1=2}^{p1} λr1 Wr1,nt Ỹnt + γỸn,t−1 + µ1 W1n,t−1(φ̃1)Ỹn,t−1 + Σ_{r1=2}^{p1} µr1 Wr1,n,t−1 Ỹn,t−1 + Xnt β + Cn + ln αt + Ũnt, t = 2, ..., T.   (3.26)

We conclude this subsection by mentioning the related computational cost of the exchange algorithm. Notice that both the Ũnt's and the Ỹnt's need to be simulated at each pass of the MCMC sampler (in the M-H step for Ξ). Furthermore, they need to be simulated for each time period in the sample.
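As a concrete illustration of Step 1.1, the fixed-point iteration (3.20) can be coded in a few lines. The sketch below (ours, not the authors' code; the two weights matrices and parameter values are made up for the demonstration) iterates U ← Σ_{r2} ρ̃r2 Mr2,nt U + Vnt until the sup-norm change falls below a tolerance, and checks the result against the direct Jacobian-based solve Ũnt = Rnt^{−1}(ρ̃)Vnt.

```python
import numpy as np

def simulate_U(M_list, rho, V, tol=1e-10, max_iter=1000):
    """Contraction-mapping draw of U = sum_r rho_r M_r U + V (Eq. 3.20)."""
    U = V.copy()  # arbitrary starting value
    for _ in range(max_iter):
        U_new = sum(r * (M @ U) for r, M in zip(rho, M_list)) + V
        if np.max(np.abs(U_new - U)) < tol:   # sup-norm stopping rule
            return U_new
        U = U_new
    raise RuntimeError("stability condition ||sum_r rho_r M_r||_inf < 1 may fail")

rng = np.random.default_rng(0)
n = 20

def rownorm(A):
    # build a row-normalized weights matrix with a zero diagonal
    np.fill_diagonal(A, 0.0)
    return A / A.sum(axis=1, keepdims=True)

M1, M2 = rownorm(rng.random((n, n))), rownorm(rng.random((n, n)))
rho = [0.3, 0.2]                      # ||0.3 M1 + 0.2 M2||_inf = 0.5 < 1
V = rng.normal(0.0, 1.0, n)
U = simulate_U([M1, M2], rho, V)
# sanity check against the direct solution that would require R_nt(rho)^{-1}
U_direct = np.linalg.solve(np.eye(n) - rho[0] * M1 - rho[1] * M2, V)
```

With row-normalized weights, the sup-norm of Σ_{r2} ρ̃r2 Mr2,nt is just Σ_{r2}|ρ̃r2|, so here the iteration contracts at rate 0.5 and converges in a few dozen passes; Step 1.2 for the Ỹnt's follows the same pattern with the λ̃ weights.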
4. Model selection

For higher-order SAR models, there are two nested model selection issues that empirical researchers are interested in. The first is whether the spatial coefficients of some spatial lags are equal to zero or not. For instance, the higher-order SAR model in Yu et al. (2016) consists of two spatial lags for each city, in which one refers to neighbors within the same province while the other corresponds to neighbors outside the province. One might want to test the existence of the "border effect": whether the coefficient of the spatial lag for neighbors outside the province is equal to zero, while the spatial lag for neighbors within the province is significantly different from zero. Another interesting issue is whether the coefficients of some spatial lags are significantly different from those of others. For example, Lin and Weinberg (2014) adopt a higher-order SAR model to study possible heterogeneous influences that different types of friends exert on teenagers' behavior. They test whether peer effect coefficients from different spatial weights (networks), i.e., from weights for reciprocated friends versus unreciprocated friends, are significantly different.

In the Bayesian paradigm, following the long tradition initiated by Zellner (1971), one way to conduct model comparison is to compute the corresponding Bayes factor. The Bayes factor is widely applicable and can be used whether the competing models are nested or non-nested. However, in the panel data setting, with many individual and time effects in the model, it may be tedious to directly evaluate
the Bayes factor as in Chib (1995) or Chib and Jeliazkov (2001). However, given that we are investigating model selection issues for nested spatial models, the Bayes factor can be calculated using the Savage-Dickey density ratio (SDDR) proposed by Verdinelli and Wasserman (1995). For our model, the computation of the SDDR only requires the MCMC draws from the unrestricted model and one additional run of reduced MCMC draws from the restricted model. In the following subsections, we address the implementation of the SDDR, the corresponding computational issues when n is large, and how the exchange algorithm can be used to reduce the computational burden.

4.1. Evaluating the Bayes factor by SDDR

Consider the SDPD model in (2.3). Suppose we want to test whether λ1 is equal to zero or not; the restricted competing model is

Ynt = Σ_{r1=2}^{p1} λr1 Wr1,nt Ynt + γYn,t−1 + Σ_{r1=1}^{p1} µr1 Wr1,n,t−1 Yn,t−1 + Xnt β + Cn + ln αt + Unt,
Unt = Σ_{r2=1}^{p2} ρr2 Mr2,nt Unt + Vnt, t = 1, 2, ..., T.   (4.1)

Denote the unrestricted SDPD model as M1 and the restricted competing model in (4.1) as M2. Denote λ−1 = {λr1}r1=2,...,p1. Let Θ2 = (λ−1, ρ, γ, µ, β, σ², Cn, {αt}) and Θ1 = (λ1, Θ2) be, respectively, the parameters of M2 and M1. Let P(M1) and P(M2) be the prior probabilities of the two models and π(Θ1) and π(Θ2) be the prior densities of the parameters. The posterior odds is the product of the prior odds and the ratio of the two models' marginal likelihoods:

P(M2|{Ynt})/P(M1|{Ynt}) [posterior odds] = P(M2)/P(M1) [prior odds] × f2({Ynt}|M2)/f1({Ynt}|M1) [Bayes factor],

where

fi({Ynt}|Mi) = ∫ fi({Ynt}|Θi, Mi) π(Θi) dΘi, i = 1, 2.
Usually the same prior probabilities are assumed for the competing models. Thus we only need to pay attention to the Bayes factor, which is just the ratio of the two models' marginal likelihoods. The model with the larger marginal likelihood is more likely to be the model that generated the data. However, as mentioned earlier, with many individual and time effects, it may be difficult to directly evaluate the marginal likelihoods of competing models by the method in Chib (1995) or Chib and Jeliazkov (2001). Therefore, we rely on the SDDR in Verdinelli and Wasserman (1995), which does not require explicit computation of the marginal likelihoods of competing models, to evaluate the Bayes factor.20

20 The SDDR has been employed to compute Bayes factors in many empirical studies using Bayesian methods. See, among others, Li (1998), Koop and Potter (1999), Koop et al. (2010) and Chan (2016).
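To see the SDDR identity at work before turning to our spatial setting, consider a toy model where everything is available in closed form: yi ∼ N(θ, 1) with prior θ ∼ N(0, τ²), testing H0: θ = 0 against H1: θ ≠ 0. The sketch below (our illustration, not the paper's model) checks numerically that the ratio of posterior to prior ordinates at θ = 0 equals the Bayes factor computed directly from the two marginal likelihoods.

```python
import numpy as np

rng = np.random.default_rng(42)
n, tau2 = 25, 4.0
y = rng.normal(0.3, 1.0, n)

# conjugate update: theta | y ~ N(m, v) with known unit error variance
v = 1.0 / (n + 1.0 / tau2)
m = v * y.sum()

def norm_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

# Savage-Dickey density ratio: posterior over prior ordinate at theta = 0
sddr = norm_pdf(0.0, m, v) / norm_pdf(0.0, 0.0, tau2)

# direct Bayes factor BF01 = m0(y) / m1(y) from the marginal likelihoods
log_m0 = np.sum(np.log(norm_pdf(y, 0.0, 1.0)))       # theta fixed at 0
Sigma1 = np.eye(n) + tau2 * np.ones((n, n))          # y ~ N(0, I + tau^2 11') under H1
sign, logdet = np.linalg.slogdet(Sigma1)
log_m1 = -0.5 * (n * np.log(2.0 * np.pi) + logdet + y @ np.linalg.solve(Sigma1, y))
bf01 = np.exp(log_m0 - log_m1)
```

The two quantities agree to machine precision here, because p(θ = 0|y) = f(y|θ = 0)π(0)/m1(y), so the ordinate ratio is exactly m0(y)/m1(y); the spatial case below follows the same logic with λ1 in place of θ.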
The nested model selection issue is equivalent to the hypothesis testing problem H0: λ1 = 0 against H1: λ1 ≠ 0 for M1. Let π(·) be the prior under H1 and π0(·) be the prior under H0. The Bayes factor of M2 over M1 is

BF21 = f2({Ynt}|M2)/f1({Ynt}|M1) = [∫ f({Ynt}|λ1 = 0, Θ2) π0(Θ2) dΘ2] / [∫ f({Ynt}|λ1, Θ2) π(λ1, Θ2) dλ1 dΘ2].   (4.2)

Let F1 = ∫ f({Ynt}|λ1, Θ2) π(λ1, Θ2) dλ1 dΘ2. Verdinelli and Wasserman (1995) show that, under some assumptions,21

BF21 = p(λ1 = 0|{Ynt}) × [∫ f({Ynt}|λ1 = 0, Θ2) π0(Θ2) dΘ2] / [p(λ1 = 0|{Ynt}) × F1]
= p(λ1 = 0|{Ynt}) × ∫ [f({Ynt}|λ1 = 0, Θ2) π0(Θ2) p(Θ2|{Ynt}, λ1 = 0)] / [p(λ1 = 0, Θ2|{Ynt}) × F1] dΘ2
= [p(λ1 = 0|{Ynt}) / π(λ1 = 0)] × ∫ [π0(Θ2) / π(Θ2|λ1 = 0)] p(Θ2|{Ynt}, λ1 = 0) dΘ2,   (4.3)

where the last line follows because p(λ1 = 0, Θ2|{Ynt}) × F1 = f({Ynt}|λ1 = 0, Θ2) × π(λ1 = 0, Θ2). As π0(Θ2) = π(Θ2|λ1 = 0) is satisfied for our model, because all priors are independent, (4.3) reduces to

BF21 = p(λ1 = 0|{Ynt}) / π(λ1 = 0).   (4.4)
(4.4) is labeled the Savage-Dickey density ratio (SDDR) in Verdinelli and Wasserman (1995). If π0(Θ2) ≠ π(Θ2|λ1 = 0), the term ∫ [π0(Θ2)/π(Θ2|λ1 = 0)] p(Θ2|{Ynt}, λ1 = 0) dΘ2 in (4.3) would serve as the "correction factor" for the SDDR. For our model, the denominator in (4.4) is just a constant since we assume a uniform prior for λ1. Hence, it remains to calculate the marginal posterior ordinate p(λ1 = 0|{Ynt}). Denote Ξ−λ1 = (λ−1, ρ, γ, µ). We first modify the baseline algorithm in Subsection 3.1 by using two M-H steps to sample λ1 from p(λ1|{Ynt}, Cn, {αt}, Ξ−λ1, β, σ²) and Ξ−λ1 from p(Ξ−λ1|{Ynt}, Cn, {αt}, λ1, β, σ²). Then, following Chib and Jeliazkov (2001), the marginal posterior ordinate p(λ1 = 0|{Ynt}) can be derived as

p(λ1 = 0|{Ynt}) = Eu(Pr(λ1, 0|{Ynt}, Θ2) g(0|λ1)) / Er(Pr(0, λ1|{Ynt}, Θ2)) ≈ [L^{−1} Σ_{l=1}^{L} Pr(λ1^{(l)}, 0|{Ynt}, Θ2^{(l)}) g(0|λ1^{(l)})] / [J^{−1} Σ_{j=1}^{J} Pr(0, λ1^{(j)}|{Ynt}, Θ2^{(j)})],   (4.5)

where Pr(λ1, 0|{Ynt}, Θ2) is the acceptance probability of moving from λ1 to 0 in the M-H step, and g(0|λ1) is the corresponding proposal density. Note that the expectation Eu is taken with respect to the posterior distribution of the unrestricted model, p(Θ1|{Ynt}), from which the λ1^{(l)}'s and Θ2^{(l)}'s are MCMC draws, while the expectation Er is taken with respect to the density g(λ1|0) × p(Θ2|{Ynt}, λ1 = 0), from which the λ1^{(j)}'s and Θ2^{(j)}'s are MCMC draws. Hence, to compute p(λ1 = 0|{Ynt}), in addition to the MCMC draws from the unrestricted model, one just needs to obtain the reduced MCMC draws from g(λ1|0) × p(Θ2|{Ynt}, λ1 = 0). The detailed derivation of Eq. (4.5) is provided in Appendix A.

21 The assumptions imposed are 0 < p(λ1 = 0|{Ynt}) < ∞ and 0 < π(λ1 = 0, Θ2) < ∞, which are satisfied under the setup of our model.
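The structure of (4.5) can be illustrated in one dimension. In the sketch below (a stand-in example of ours, not the paper's sampler) the "posterior" of λ1 is taken to be a known N(1, 0.5²) density, a symmetric random-walk proposal plays the role of g, and the ratio of the two Monte Carlo averages recovers the posterior ordinate at λ1 = 0 in the manner of Chib and Jeliazkov (2001).

```python
import numpy as np

rng = np.random.default_rng(7)
m, s = 1.0, 0.5      # stand-in posterior p(lambda1 | data) = N(m, s^2)
tau = 1.0            # scale of the symmetric proposal g(a | b) = N(a; b, tau^2)

def norm_pdf(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def post(x):
    return norm_pdf(x, m, s)

L = 200_000
# numerator of (4.5): average over draws from the unrestricted "posterior"
lam = rng.normal(m, s, L)
acc_to_zero = np.minimum(1.0, post(0.0) / post(lam))    # Pr(lam -> 0), symmetric g
numer = np.mean(acc_to_zero * norm_pdf(0.0, lam, tau))  # times g(0 | lam)
# denominator: average over draws from the proposal g(. | 0)
lam_r = rng.normal(0.0, tau, L)
acc_from_zero = np.minimum(1.0, post(lam_r) / post(0.0))
denom = np.mean(acc_from_zero)
ordinate = numer / denom    # estimate of p(lambda1 = 0 | data)
```

With a few hundred thousand draws the estimate sits within a percent or two of the exact ordinate N(0; 1, 0.5²) ≈ 0.108; in the spatial model the same two averages are taken over the full and reduced MCMC runs described above.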
For (4.3), there might be some concerns about the influence of the U(−1, 1) prior on model selection.22 For instance, in a special case of (4.1) with only one spatial lag in the main equation and the error term, namely,

Ynt = λ2 W2nt Ynt + Xnt β + Cn + ln αt + Unt,
Unt = ρ1 M1nt Unt + Vnt, t = 1, 2, ..., T,

the U(−1, 1) prior might be too restrictive for λ2 and ρ1 when W2nt and M1nt are row-normalized. For λ2, one can adopt a uniform prior with a wider parameter set, based upon the (1/eigs, 1) interval suggested by LeSage and Pace (2009), where eigs is the most negative real eigenvalue of W2nt. Or one may consider the interval suggested by Kelejian and Prucha (2010). Thus, if π0(λ2) is specified as U(1/eigs, 1) while π(λ2) is maintained as U(−1, 1), the ratio π0(Θ2)/π(Θ2|λ1 = 0) might be less than 1,23 and the Bayes factor would be deflated. We argue that, if the stability conditions are more restrictive, with a parameter range smaller than (−1, 1), this concern would be rather minor. However, in the opposite scenario, one needs to assume that the true spatial parameter is in (−1, 1) under a large sample to justify that the U(−1, 1) prior does not affect model selection. In our MCMC sampler, the U(−1, 1) prior does not play much of a specific role. Firstly, it is not a conjugate prior. More importantly, we follow LeSage and Pace (2009, pg. 221) to impose the stability conditions in Subsection 2.3 through an accept-reject M-H sampling step, not through the priors.24 Hence, in cases where the stability conditions imply a more restrictive parameter set than (−1, 1), such as the restrictive stability condition in Subsection 2.3, (−1, 1) forms a wider parameter range and plays no role in the M-H sampling step. There might be some special cases, like the aforementioned one, where (−1, 1) is more restrictive than the (wider) stability conditions in Subsection 2.3. In those cases, as long as the true spatial parameter is in (−1, 1), with a large sample there would not be a concern, as the sample information would dominate the prior restriction. But when the true parameter is outside (−1, 1), one needs to eliminate the effect of the prior by assuming more non-informative priors for the spatial parameters under the null and the alternative. For example, one may let π0(λ2) = π(λ2|λ1 = 0) = U(10/eigs, 10). In this way, compared with (1/eigs, 1) and the stability conditions, π0(λ2) and π(λ2|λ1 = 0) imply wider parameter sets and have no impact on the M-H sampling step. As a result, they would not affect the estimation and model selection results.

The SDDR can also be adopted to tackle the model selection issue of whether the spatial effects from different types of neighbors are indeed heterogeneous. Specifically, suppose we want to test whether λ1 = λ2. Denote λ2 = λ1 + ϖ. The unrestricted M1 in (2.3) can be rewritten as

Ynt = λ1(W1nt + W2nt)Ynt + ϖW2nt Ynt + Σ_{r1=3}^{p1} λr1 Wr1,nt Ynt + γYn,t−1 + Σ_{r1=1}^{p1} µr1 Wr1,n,t−1 Yn,t−1 + Xnt β + Cn + ln αt + Unt,
Unt = Σ_{r2=1}^{p2} ρr2 Mr2,nt Unt + Vnt, t = 1, 2, ..., T.   (4.6)

22 We thank a referee for raising this issue.
23 Here we assume that, other than for λ2, π0(·) and π(·|λ1 = 0) are the same for the other parameters.
24 This is because it is generally difficult to directly specify priors based upon the stability conditions in Subsection 2.3.
When ϖ = 0, it reduces to a restricted model M3, namely,

Ynt = λ1(W1nt + W2nt)Ynt + Σ_{r1=3}^{p1} λr1 Wr1,nt Ynt + γYn,t−1 + Σ_{r1=1}^{p1} µr1 Wr1,n,t−1 Yn,t−1 + Xnt β + Cn + ln αt + Unt,
Unt = Σ_{r2=1}^{p2} ρr2 Mr2,nt Unt + Vnt, t = 1, 2, ..., T.   (4.7)
Let λ−2 = {λr1}r1=1,3,...,p1, Θ3 = (λ−2, ρ, γ, µ, β, σ², Cn, {αt}) and Θ1 = (ϖ, Θ3). To test whether ϖ = 0 or not, one needs to compute the Bayes factor BF31 = p(ϖ = 0|{Ynt})/π(ϖ = 0). Assuming a uniform density for ϖ, the computation of BF31 only requires the computation of p(ϖ = 0|{Ynt}). Similar to (4.5), to evaluate p(ϖ = 0|{Ynt}), one merely needs to obtain MCMC draws from p(Θ1|{Ynt}) and from g(ϖ|0) × p(Θ3|{Ynt}, ϖ = 0).

4.2. Model selection for the higher-order SDPD model with unknown parameters in spatial weights

Denote the model in (2.5) with W1nt(φ1) as M4. One interesting model selection issue for M4 is to test whether λ1 = µ1 = 0 or not, i.e., whether links of W1nt(φ1) and W1n,t−1(φ1) have no effects on Ynt. Under H0, the competing restricted model M5 is

Ynt = Σ_{r1=2}^{p1} λr1 Wr1,nt Ynt + γYn,t−1 + Σ_{r1=2}^{p1} µr1 Wr1,n,t−1 Yn,t−1 + Xnt β + Cn + ln αt + Unt,
Unt = Σ_{r2=1}^{p2} ρr2 Mr2,nt Unt + Vnt, t = 2, 3, ..., T.   (4.8)
Note that φ1 disappears from M5 and is not identified under H0: λ1 = µ1 = 0. This is called Davies' problem in the statistical literature (Davies, 1977). As mentioned by Koop and Potter (1999), to conduct hypothesis testing in this setting, the classical test statistics would generally not be nuisance-parameter free. For example, Andrews and Ploberger (1994) suggest "average" exponential test statistics, where the average is over different possible values of the nuisance parameter. However, in the context of nonlinear time series models, Koop and Potter (1999) show that Davies' problem actually simplifies the computation of the Bayes factor by the SDDR, because integration over the nuisance parameter is not needed for the "correction factor" of the SDDR. Here we provide a further instance of this simplification in the context of spatial models.

Let θ4 = (λ1, µ1), λ−1 = {λr1}r1=2,...,p1, µ−1 = {µr1}r1=2,...,p1, Θ5 = (λ−1, µ−1, γ, β, σ², Cn, {αt}) and Θ4 = (θ4, φ1, Θ5). The Bayes factor of M5 over M4 is

BF54 = f5({Ynt}|M5)/f4({Ynt}|M4) = [∫ f({Ynt}|θ4 = 0, Θ5) π0(Θ5) dΘ5] / [∫ f({Ynt}|θ4, φ1, Θ5) π(θ4, φ1, Θ5) dθ4 dφ1 dΘ5].   (4.9)

Let φ1* be a given constant and F4 = ∫ f({Ynt}|θ4, φ1, Θ5) π(θ4, φ1, Θ5) dθ4 dφ1 dΘ5. (4.9) can be rewritten as

BF54 = p(θ4 = 0, φ1 = φ1*|{Ynt}) × [∫ f({Ynt}|θ4 = 0, Θ5) π0(Θ5) dΘ5] / [p(θ4 = 0, φ1 = φ1*|{Ynt}) × F4]
= p(θ4 = 0, φ1 = φ1*|{Ynt}) × ∫ [f({Ynt}|θ4 = 0, Θ5) π0(Θ5) p(Θ5|{Ynt}, θ4 = 0, φ1 = φ1*)] / [p(θ4 = 0, φ1 = φ1*, Θ5|{Ynt}) × F4] dΘ5
= p(θ4 = 0, φ1 = φ1*|{Ynt}) × ∫ [f({Ynt}|θ4 = 0, Θ5) π0(Θ5) p(Θ5|{Ynt}, θ4 = 0, φ1 = φ1*)] / [f({Ynt}|θ4 = 0, φ1 = φ1*, Θ5) π(θ4 = 0, φ1 = φ1*, Θ5)] dΘ5
= p(θ4 = 0, φ1 = φ1*|{Ynt}) × ∫ [π0(Θ5) p(Θ5|{Ynt}, θ4 = 0)] / π(θ4 = 0, φ1 = φ1*, Θ5) dΘ5,   (4.10)

where the last two lines follow because p(θ4 = 0, φ1 = φ1*, Θ5|{Ynt}) × F4 = f({Ynt}|θ4 = 0, φ1 = φ1*, Θ5) × π(θ4 = 0, φ1 = φ1*, Θ5), f({Ynt}|θ4 = 0, Θ5) = f({Ynt}|θ4 = 0, φ1 = φ1*, Θ5) and p(Θ5|{Ynt}, θ4 = 0, φ1 = φ1*) = p(Θ5|{Ynt}, θ4 = 0). Noticing that p(θ4 = 0, φ1 = φ1*|{Ynt}) = p(θ4 = 0|{Ynt}) × p(φ1 = φ1*|{Ynt}, θ4 = 0) and p(φ1 = φ1*|{Ynt}, θ4 = 0) = π(φ1 = φ1*|θ4 = 0), (4.10) can be further reduced to

BF54 = [p(θ4 = 0|{Ynt}) / π(θ4 = 0)] × ∫ [π0(Θ5) / π(Θ5|θ4 = 0, φ1 = φ1*)] p(Θ5|{Ynt}, θ4 = 0) dΘ5,   (4.11)

where ∫ [π0(Θ5)/π(Θ5|θ4 = 0, φ1 = φ1*)] p(Θ5|{Ynt}, θ4 = 0) dΘ5 is the "correction factor", and it only requires integration with respect to Θ5. Hence, Davies' problem here helps to avoid the integration with respect to φ1 in the correction factor. In our setting, we assume independent priors for θ4, φ1 and Θ5, so π(Θ5|θ4 = 0, φ1 = φ1*) = π0(Θ5) and the Bayes factor reduces to BF54 = p(θ4 = 0|{Ynt})/π(θ4 = 0). Moreover, Davies' problem does not add extra computational burden. This is so because

p(θ4 = 0|{Ynt}) = Eu(Pr(θ4, 0|{Ynt}, Θ5, φ1) g(0|θ4)) / Er(Pr(0, θ4|{Ynt}, Θ5, φ1)) ≈ [L^{−1} Σ_{l=1}^{L} Pr(θ4^{(l)}, 0|{Ynt}, Θ5^{(l)}, φ1^{(l)}) g(0|θ4^{(l)})] / [J^{−1} Σ_{j=1}^{J} Pr(0, θ4^{(j)}|{Ynt}, Θ5^{(j)}, φ1^{(j)})].   (4.12)

With θ4 = 0, the likelihood function does NOT offer any additional information (beyond its prior) to update φ1, so p(φ1|{Ynt}, Θ5, θ4 = 0) = π(φ1), which implies the reduced MCMC draws of φ1 come only from its prior π(φ1). Then one can simply draw the φ1^{(j)}'s from π(φ1) and does not need to apply any M-H step for φ1.

4.3. Evaluating the SDDR using the exchange algorithm

The evaluation of the SDDR outlined in Subsection 4.1 is computationally tractable when n is relatively small. However, when n is large, it might be computationally demanding to use (4.5) to evaluate the SDDR. This is because the acceptance probabilities Pr(λ1, 0|{Ynt}, Θ2) and Pr(0, λ1|{Ynt}, Θ2) involve the Jacobian determinants |Rnt(ρ)|'s and |Snt(λ)|'s. For instance, in Eq. (4.5), the |Rnt(ρ)|'s and |Snt(λ)|'s need to be evaluated at each iteration l for draws from p(Θ1|{Ynt}), and at each iteration j for draws from g(λ1|0) × p(Θ2|{Ynt}, λ1 = 0). To simplify computation, we can apply the exchange algorithm to calculate the acceptance probability.
Recall that the likelihood function of M1 is f({Ynt}|Θ1) = q({Ynt}; Θ1) × D(λ, ρ) with D(λ, ρ) = Π_{t=1}^{T} |Rnt(ρ)||Snt(λ)|. Also denote Snt(λ−1) = In − Σ_{r1=2}^{p1} λr1 Wr1,nt for t = 1, 2, ..., T. The likelihood function of M2 is f({Ynt}|Θ2, λ1 = 0) = q({Ynt}; Θ2, λ1 = 0) × D(λ−1, ρ), where D(λ−1, ρ) = Π_{t=1}^{T} |Rnt(ρ)||Snt(λ−1)|. Following (3.19), the acceptance probabilities in (4.5) can be evaluated by

Pr(λ1^{(l)}, 0|{Ynt}, Θ2^{(l)}) = min{1, [π(λ1 = 0)/π(λ1^{(l)})] × [q({Ynt}; Θ2^{(l)}, λ1 = 0)/q({Ynt}; Θ2^{(l)}, λ1^{(l)})] × [q({Ỹnt|1}; Θ2^{(l)}, λ1^{(l)})/q({Ỹnt|1}; Θ2^{(l)}, λ1 = 0)]},
Pr(0, λ1^{(j)}|{Ynt}, Θ2^{(j)}) = min{1, [π(λ1^{(j)})/π(λ1 = 0)] × [q({Ynt}; Θ2^{(j)}, λ1^{(j)})/q({Ynt}; Θ2^{(j)}, λ1 = 0)] × [q({Ỹnt|2}; Θ2^{(j)}, λ1 = 0)/q({Ỹnt|2}; Θ2^{(j)}, λ1^{(j)})]},   (4.13)

where the Ỹnt|1's are auxiliary samples simulated based upon the MCMC draws Θ2^{(l)} from the unrestricted posterior p(Θ1|{Ynt}) and λ1 = 0, whereas the Ỹnt|2's are auxiliary samples simulated based on the reduced MCMC draws Θ2^{(j)} and λ1^{(j)} from g(λ1|0) × p(Θ2|{Ynt}, λ1 = 0). Notice that all Jacobian determinants have been cancelled out, not only in the MCMC sampling step, but also in the expression of the acceptance probabilities in (4.13). Therefore, the computational burden can be reduced without the evaluation of any Jacobian determinant.
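To make the determinant cancellation concrete, the following sketch (our illustration with a single made-up weights matrix in a cross-sectional SAR Ynt = λW Ynt + Vnt, not the authors' code) computes an exchange-algorithm acceptance probability using only the determinant-free kernel q, with the auxiliary data drawn under the candidate value, in the spirit of (3.19) and (4.13).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
# hypothetical row-normalized weights matrix
W = rng.random((n, n))
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)
sigma2 = 1.0

def S(lam):
    return np.eye(n) - lam * W

def log_q(Y, lam):
    # log of the likelihood kernel WITHOUT the Jacobian term |S(lam)|
    return -0.5 * np.sum((S(lam) @ Y) ** 2) / sigma2

lam_true = 0.4
Y = np.linalg.solve(S(lam_true), rng.normal(0.0, 1.0, n))   # "observed" data

def exchange_accept(lam_old, lam_new, Y, rng):
    # auxiliary data simulated under the candidate: Y_aux = S(lam_new)^{-1} V
    Y_aux = np.linalg.solve(S(lam_new), rng.normal(0.0, np.sqrt(sigma2), n))
    # flat prior on lambda, so the prior ratio is 1; |S| terms cancel as in (3.19)
    log_ratio = (log_q(Y, lam_new) - log_q(Y, lam_old)
                 + log_q(Y_aux, lam_old) - log_q(Y_aux, lam_new))
    return min(1.0, np.exp(log_ratio))
```

No determinant is ever formed: the only linear algebra is the solve used to generate the auxiliary sample, which in the panel case is replaced by the contraction mapping of Subsection 3.3.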
5. Simulation Study 5.1. Monte Carlo simulation design In this section, we apply the Bayesian estimation algorithms and the model selection procedures previously outlined to simulated data sets. The study consists of two parts. In the first part, we evaluate the performance of the baseline and exchange algorithms for the higher-order SAR model in different settings. In the second part, the SDDR is utilized to deal with various nested model selection issues. The most general model considered in this simulation study is the following higher-order SDPD model: Ynt = λ1 W1nt (φ1 )Ynt + λ2 W2nt Ynt + γYn,t−1 + µ1 W1n,t−1 (φ1 )Yn,t−1 + µ2 W2n,t−1 Yn,t−1 + X1nt β1 + X2nt β2 + Cn + ln αt + Unt , Unt = ρ1 M1nt Unt + Vnt , t = 1, 2, · · · , T.
(5.1)
We also study restricted models of (5.1), including the higher-order SDPD model with φ1 fixed at 1, the higher-order static panel SAR model with φ1 fixed at 1 and γ = µ1 = µ2 = 0, and a cross-sectional higher-order SAR model with the same restrictions as the static panel model. Different values of the cross-sectional dimension n and time dimension T are explored. For the panel data models, T is set to 10 for the static model and 18 for the dynamic model. For the cross-sectional model, T = 1. We investigate settings with one relatively small n case (n = 600) and two large n cases (n = 3000 and n = 5000). To generate the time-varying spatial weights matrices W1nt(φ1)'s and W2nt's (M1nt's) in (5.1), we first simulate time-varying variables Zit|W (Zit|M) for individual i from N(0, 3) (N(0, 5)) for t = 1, 2, · · · , T independently. Then we generate coordinates xci and yci for each i from χ²(3). Based upon those coordinates, we further utilize the function "makeneighborsw", taken from LeSage's matlab codes for spatial econometrics, to construct n × n un-row-normalized 0-1 indicator matrices W1u and W2u (M1u). Specifically, the function can generate a time-invariant and row-normalized spatial weights matrix W1r based on "3-nearest neighbors." The un-row-normalized matrix W1u is defined as the indicator matrix for "3-nearest neighbors," where w1iju = 1 if w1ijr > 0, and 0 otherwise. Similarly, W2u (M1u) is the indicator matrix for "4- to 8-nearest neighbors" ("0.1 × n nearest neighbors"). Denote Eijt|W = |Zit|W − Zjt|W|. W1nt(φ1) is constructed as

w1ij,t = w̃1ij,t / Σ_{j=1}^{n} w̃1ij,t, i ≠ j, w1ii,t = 0, where w̃1ij,t = w1iju × Eijt|W^(−φ1), t = 1, 2, · · · , T.    (5.2)
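The construction in (5.2) can be sketched as follows, assuming the 0-1 indicator matrix has already been built. The function name `make_weights` and the zero-row guard are our additions for illustration, not part of LeSage's toolbox:

```python
import numpy as np

def make_weights(W_u, Z_t, phi1):
    """Row-normalized spatial weights per Eq. (5.2).

    W_u : (n, n) 0-1 indicator matrix of neighbors (zero diagonal).
    Z_t : (n,) time-varying characteristic Z_it at period t.
    phi1: distance-decay parameter (phi1 = 1 recovers W1nt without phi1).
    """
    E = np.abs(Z_t[:, None] - Z_t[None, :])    # E_ijt = |Z_it - Z_jt|
    np.fill_diagonal(E, 1.0)                   # avoid 0**(-phi1) on the diagonal
    W_tilde = W_u * E ** (-phi1)               # w~_1ij,t = w^u_1ij * E_ijt^{-phi1}
    np.fill_diagonal(W_tilde, 0.0)             # w_1ii,t = 0
    row_sums = W_tilde.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0.0] = 1.0            # guard rows with no neighbors
    return W_tilde / row_sums                  # row-normalize
```

With φ1 > 0, pairs that are closer in the characteristic Z receive larger weights before row-normalization; fixing φ1 = 1 gives the W1nt's used in the restricted models.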
To generate the W1nt's (without φ1), we simply fix φ1 = 1; similarly for the W2nt's and M1nt's. Different values for the spatial parameters in the models are investigated, namely:
DGP1: λ1 = 0.3, λ2 = 0.1, ρ1 = 0.2;
DGP2: λ1 = 0.3, λ2 = 0.1;
DGP3: λ1 = 0.2, λ2 = 0.1, ρ1 = 0.2, γ1 = 0.1, µ1 = −0.1, µ2 = −0.1;
DGP4: λ1 = 0.2, λ2 = 0.1, γ1 = 0.1, µ1 = −0.1, µ2 = −0.1;
DGP5: λ1 = 0.4, λ2 = −0.2, ρ1 = 0.2, γ1 = 0.2, µ1 = −0.1, µ2 = −0.1;
DGP6: λ1 = 0.4, λ2 = −0.2, γ1 = 0.2, µ1 = −0.1, µ2 = −0.1.
Specifically, DGP1 and DGP2 correspond, respectively, to the cross-sectional or static panel models with and without spatial errors. DGP3 and DGP4 are, respectively, dynamic panels with and without spatial errors. DGP5 and DGP6 also refer, respectively, to the dynamic panel model with and without spatial errors, but a key feature of them is |λ1| + |λ2| + |γ1| + |µ1| + |µ2| = 1, which violates the restrictive stability condition in Lee and Yu (2012) for SDPD models. Motivated by the simulation study in Elhorst et al. (2012), we modify W1u (W2u) to be the indicator matrix for "2 (5) nearest neighbors." In this way, W1nt and W2nt have some "overlaps," with one of the spatial parameters, λ2, being negative. With those modified spatial weights, DGP5 and DGP6 satisfy a less restrictive stability condition, such as the one in Eq. (2.7). The exogenous regressors X1nt's and X2nt's are generated independently from a normal distribution with mean 0 and variance 2. Their coefficients are β1 = 1 and β2 = 1. The disturbances Vnt's are generated from i.i.d. N(0, σ²In) with σ² = 1. For panels, the time effects αt's are generated from a standard normal distribution for t = 2, 3, · · · , T. The individual effects Cn are generated according to Mundlak (1978), ci = X̄i1·ψ1 + X̄i2·ψ2 + εi, where ψ1 = ψ2 = 2, X̄i1· and X̄i2· represent, respectively, the empirical means of the exogenous variables Xi1t's and Xi2t's over time, and the εi's are generated independently from N(0, 2). Moreover, in the dynamic setting, we treat the first period as the initial period. Yn1 is generated from a standard normal distribution. For the model with W1nt(φ1), the true value of φ1 is equal to 1.
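As a concrete illustration, the cross-sectional case (DGP1) can be drawn by solving the two linear systems implied by the model. The sketch below (function name, defaults, and the returned tuple are ours; it assumes precomputed row-normalized weights) computes Un = (In − ρ1M1n)⁻¹Vn and then Yn = (In − λ1W1n − λ2W2n)⁻¹(X1nβ1 + X2nβ2 + Un):

```python
import numpy as np

def simulate_sar(W1, W2, M1, lam1=0.3, lam2=0.1, rho1=0.2,
                 beta1=1.0, beta2=1.0, sigma2=1.0, seed=None):
    """Simulate one cross-sectional draw of the higher-order SAR model (DGP1):
    Yn = lam1*W1*Yn + lam2*W2*Yn + X1*beta1 + X2*beta2 + Un,
    Un = rho1*M1*Un + Vn, with Vn ~ N(0, sigma2*I)."""
    rng = np.random.default_rng(seed)
    n = W1.shape[0]
    X1 = rng.normal(0.0, np.sqrt(2.0), n)   # regressors ~ N(0, 2), as in the DGP
    X2 = rng.normal(0.0, np.sqrt(2.0), n)
    V = rng.normal(0.0, np.sqrt(sigma2), n)
    U = np.linalg.solve(np.eye(n) - rho1 * M1, V)          # SAR disturbances
    Y = np.linalg.solve(np.eye(n) - lam1 * W1 - lam2 * W2,
                        X1 * beta1 + X2 * beta2 + U)       # reduced form for Yn
    return Y, X1, X2, U
```

The dynamic panel DGPs add the lagged terms γYn,t−1 + µ1W1n,t−1Yn,t−1 + µ2W2n,t−1Yn,t−1 and the fixed effects to the right-hand side period by period.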
The Bayesian MCMC algorithms developed in Section 3 are implemented on (5.1) with both restrictive and wide parameter spaces as discussed in Subsection 2.3. The priors are:

λr1 ∼ U(−1, 1), µr1 ∼ U(−1, 1), r1 = 1, 2; ρ1 ∼ U(−1, 1), γ ∼ U(−1, 1), φ1 ∼ N(0, 1);
β ∼ N2(0, 10 × I2), σ² ∼ IG(6/2, 4/2), Cn ∼ Nn(0, 10 × In), αt ∼ N(0, 1), t = 2, · · · , T.    (5.3)

We choose the hyper-parameters of β, σ², Cn and the αt's to make their corresponding priors relatively non-informative. For instance, the prior means of β, Cn and the αt's are zero since we want to take a neutral stance regarding their signs. Their prior variances are set larger than or at least equal to their actual variances in the DGPs. One might argue that the hyper-parameters chosen are still quite informative, so we also investigate more non-informative priors: we enlarge the prior variances of β, Cn and the αt's tenfold, and set the shape and scale parameters of the inverse gamma prior for σ² close to zero (Spiegelhalter et al., 2003). Thus, we have

β ∼ N2(0, 100 × I2), σ² ∼ IG(0.001/2, 0.001/2), Cn ∼ Nn(0, 100 × In), αt ∼ N(0, 10), t = 2, · · · , T.
We run both the baseline and the exchange algorithms with these non-informative priors for the static and dynamic panel models with ρ ≠ 0. The results are provided in Table 6. For the SDPD model with hierarchical priors on Cn, in addition to the priors of Ξ, σ² and the αt's in (5.3), we assume B ∼ N2k(0, I2k) and σc² ∼ IG(6/2, 4/2). The length of the MCMC chain is 10000. The first 20% of draws are discarded as burn-in samples. The mean of the posterior draws is used as our point estimate. Some trace plots of λ1 and λ2 are depicted in Figures 1(a) and (b), which demonstrate the convergence of the MCMC sampler. For estimation, we focus on cases where n = 600 (n = 300) in the cross-section (panel data) setting, and the number of repetitions is 50 for all experiments.25 The mean and standard deviation of parameter estimates are reported. In addition, we compare the computation time of one M-H step in the exchange algorithm with those in the baseline and the Monte Carlo approximation algorithms,26 in a single Markov chain for different values of n.27 All algorithms are run on a server with a 3.30GHz Intel Xeon processor and 66 GB memory.

25 We are not aiming at deriving the sampling distribution of parameters. For estimation, we just want to see whether our MCMC sampler can recover the true values of the parameters in experiments with finite samples generated from the same data generating process. By checking the mean and the standard deviation of the approximated posterior means across 50 repetitions, we can learn about the overall performance of the MCMC sampler with 50 different finite samples. In addition, for model selection, we can learn whether the SDDR can consistently select the true model using 50 different finite samples.
26 See, among others, Barry and Pace (1999) and Chapter 4.4 of LeSage and Pace (2009) for a more detailed description of the Monte Carlo approximation algorithm. Here we follow LeSage and Pace (2009) to implement the algorithm. For instance, in the panel data setting, we utilize a finite-order series of traces of λ1W1nt(φ1) + λ2W2nt to approximate log|In − λ1W1nt(φ1) − λ2W2nt| in our MCMC sampler. We set the order of the series equal to 30, calculate 50 log Jacobian determinants, and use the average of these 50 values as the estimated value.
27 We explore cases of n = 600, n = 3000 and n = 5000 for both cross-section and panel data models. We keep T = 10 for panel data models. In addition, we set the M1nt's (M1n) equal to the W1nt's (W1n).
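The trace-series approximation in footnote 26 can be sketched as follows. This is a minimal Monte Carlo log-determinant estimator in the spirit of Barry and Pace (1999), not their exact implementation: it truncates log|In − A| = −Σ_{k≥1} tr(Aᵏ)/k at a finite order, with A = λ1W1nt(φ1) + λ2W2nt, and estimates each trace via E[u′Aᵏu] = tr(Aᵏ) for u ∼ N(0, In):

```python
import numpy as np

def mc_logdet(A, order=30, n_draws=50, rng=None):
    """Monte Carlo approximation of log|I - A| via the truncated series
    log|I - A| = -sum_{k=1}^order tr(A^k)/k, each trace estimated by a
    random quadratic form u'A^k u with u ~ N(0, I)."""
    rng = np.random.default_rng(rng)
    n = A.shape[0]
    estimates = np.empty(n_draws)
    for d in range(n_draws):
        u = rng.standard_normal(n)
        v, total = u.copy(), 0.0
        for k in range(1, order + 1):
            v = A @ v                  # v = A^k u, built iteratively
            total -= (u @ v) / k       # contribution -tr(A^k)/k
        estimates[d] = total
    return estimates.mean()            # average the draws, as in footnote 26
```

Each draw only needs matrix-vector products, which is why this scales well for sparse weights matrices; the exchange algorithm avoids even this approximation.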
For model selection, we focus on two issues. The first is to test whether λ2 is equal to zero or not,28 while the second is to examine whether ϖ = λ1 − λ2 is equal to zero or not.29 In addition, for the most general model in (5.1), instead of checking λ2 = 0, we choose to test whether λ1 = µ1 = 0, since this would give rise to the Davies' problem for φ1. Both the typical SDDR and the SDDR with the exchange algorithm are utilized to compute the Bayes factor. The number of repetitions is 50 for all experiments, with n = 600 (n = 300) in the cross-section (panel data) setting. We compare the performance of the typical SDDR and the SDDR with the exchange algorithm. The frequencies with which the true models are selected are reported.30

5.2. Simulation results

Tables 1-4 summarize simulation results of the models in both the cross-sectional and panel data settings. The means of estimates across repetitions are close to their true values for most parameters, and the standard deviations across repetitions are small. In particular, in Table 4, for the SDPD model, both the baseline and the exchange algorithms with the less restrictive stability condition in (2.7) still produce good estimates. Tables 5-6 provide estimation results of the models with hierarchical priors on Cn and with non-informative priors on β, Cn, the αt's and σ². Most parameter estimates are close to their true values with relatively small standard deviations. Table 7 compares the CPU time of one M-H step in the baseline, exchange, and Monte Carlo approximation algorithms, within a single Markov chain. When n is relatively small (600), the CPU times of the three algorithms are very similar. However, as n grows to 3000 and 5000, the time cost by the exchange algorithm is less than that of the other two algorithms in most cases, especially the baseline algorithm. For instance, in the general model in (5.1) with n = 5000, compared with the baseline algorithm, the exchange algorithm can save more than 20 seconds of CPU time per M-H step. The only exception is the case of the general model in (5.1) with n = 5000, where the time cost by the Monte Carlo approximation algorithm is even less than that of the exchange algorithm. Table 8 provides the model selection results by the SDDR for both the baseline and the exchange algorithms. The SDDR calculated based upon the two algorithms can both select the true model most of the time.

6. Empirical Application

This section consists of two empirical applications. The first concerns the spatial competition on total investments across 239 Chinese prefectural-level cities in the panel data setting, while the second concerns the spatial dependence of county-level voter participation rates in the 1980 U.S. presidential election for 3107 counties in the cross-section setting.

28 Here we just test whether λ2 = 0, whereas µ2 can be non-zero.
29 Here we only test whether λ1 − λ2 = 0. We do not test whether µ1 = µ2. We assume a uniform prior U(−2, 2) for ϖ.
30 Given the Bayes factor, we rely on guidelines in Jeffreys (1961) to select the true model. Specifically, if the log10 Bayes factor of M2 over M1 is larger than 0, M2 is preferred to M1.
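For a scalar restriction such as λ2 = 0, the SDDR reduces to the ratio of the marginal posterior density to the prior density at zero. The following is a minimal sketch only: the Gaussian kernel density estimate and Silverman bandwidth are our simplifications, whereas the paper's typical SDDR and exchange-based SDDR evaluate this density differently:

```python
import numpy as np

def sddr_log_bayes_factor(draws, prior_pdf_at_0=0.5, bandwidth=None):
    """Savage-Dickey density ratio for H0: lambda2 = 0.

    Approximates the marginal posterior density of lambda2 at 0 with a
    Gaussian KDE over the MCMC draws, then returns the log10 Bayes factor
    of the restricted over the unrestricted model,
        log10[ p(lambda2 = 0 | Y) / p(lambda2 = 0) ],
    where the U(-1, 1) prior gives p(lambda2 = 0) = 0.5.
    """
    draws = np.asarray(draws, dtype=float)
    n = draws.size
    if bandwidth is None:                          # Silverman's rule of thumb
        bandwidth = 1.06 * draws.std() * n ** (-0.2)
    kernel = np.exp(-0.5 * (draws / bandwidth) ** 2) / np.sqrt(2.0 * np.pi)
    post_at_0 = kernel.sum() / (n * bandwidth)     # KDE evaluated at 0
    post_at_0 = max(post_at_0, 1e-300)             # guard against log10(0)
    return np.log10(post_at_0 / prior_pdf_at_0)
```

Under the Jeffreys (1961) guideline of footnote 30, a positive log10 Bayes factor favors the restricted model.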
6.1. Empirical application 1: On the spatial competition of Chinese cities

We apply the higher-order SDPD model to examine the spatial competition of Chinese cities on total investments. Yu et al. (2016) label this spatial competition a "tournament competition," in which cities compete with their neighbors on stimulating economic growth in order to maximize the chance of promotion for their local officials. Similar to Yu et al. (2016), we want to test whether a "border effect" exists for this competition, meaning that for a city, only neighbors within the same province are its rivals, whereas its neighbors outside the province are not in its "tournament." This corresponds to a nested model comparison issue: whether the spatial parameter for the spatial weights characterizing neighbors outside the province is zero, and whether the spatial parameter for the spatial weights characterizing neighbors within the province is significantly different from zero. We use data from 239 Chinese prefectural-level cities from 2007 to 2012.31 The dependent variable Ynt is the total investment spending on fixed assets for each city. The control variables in Xnt include the characteristics of cities and corresponding provinces, such as a city's account revenue, GDP level, population, population density, the ratio of secondary industry over GDP, and the corresponding province's fiscal revenue and expenditure. The data on city- and provincial-level variables all come from the China City Statistical Yearbook for the corresponding periods. Table 9 presents the summary statistics of the data. We consider two time-varying spatial weights matrices W1nt and W2nt, where W1nt is for neighbors within the same province, while W2nt is for neighbors outside the province. Let Zit be city i's GDP per capita in year t and Eijt = |Zit − Zjt| be the "economic distance" between cities i and j.
The ijth element of W1nt is specified as

w1ij,t = w̃1ij,t / Σ_{j=1}^{n} w̃1ij,t, i ≠ j, w1ii,t = 0, where w̃1ij,t = w1iju × Eijt^(−φ1), t = 1, 2, · · · , T,    (6.1)
where w1iju is a 0-1 indicator. Two ways of defining w1iju are explored. The first corresponds to geographical contiguity within the same province: if j is i's first contiguous neighbor in its province, w1iju = 1, otherwise 0. The other way simply sets w1iju = 1 as long as i and j are in the same province. Denote by W1nt|1 and W1nt|2, respectively, the spatial weights associated with the first and the second way. φ1 is fixed at 1 for models without parameters in W1nt. Moreover, the ijth element of W2nt can be specified similarly to (6.1), with w2iju being the indicator for first contiguous neighbors outside the province and φ1 fixed at 1.
Various empirical specifications are estimated. The most general one is the higher-order SDPD model with W1nt(φ1)'s and without spatial errors,

Ynt = λ1 W1nt|i(φ1)Ynt + λ2 W2nt Ynt + γ Yn,t−1 + µ1 W1n,t−1|i(φ1)Yn,t−1 + µ2 W2n,t−1 Yn,t−1 + X1nt β1 + X2nt β2 + Cn + ln αt + Unt,  i = 1, 2; t = 1, 2, · · · , T.    (6.2)

31 Our main data source, the China City Statistical Yearbook, contains data on 290 prefectural-level cities from 2007 to 2012. After deleting the four municipalities (Beijing, Shanghai, Tianjin, Chongqing) and cities with missing variables, we are left with 239 cities.
It is followed by the higher-order SDPD model with φ1 fixed at 1 and the static SAR panel model with φ1 fixed at 1 and γ = µ1 = µ2 = 0. Both the baseline and the exchange algorithms are adopted to estimate these models. The priors of the parameters are:32

λr1 ∼ U(−1, 1), µr1 ∼ U(−1, 1), r1 = 1, 2; γ ∼ U(−1, 1), φ1 ∼ N(0, 1);
β ∼ N7(0, 10 × I7), σ² ∼ IG(6/2, 4/2), Cn ∼ Nn(0, 10 × In), αt ∼ N(0, 1), t = 2, · · · , T.    (6.3)
The estimation results and the CPU time of the two algorithms are compared. Some trace plots of λ1 and λ2 are depicted in Figure 2(a). Moreover, to examine whether cities only compete with their neighbors within the same province, we apply the SDDR to compute the Bayes factor of restricted models with λ1 = 0 over general models without this restriction, in both the static and dynamic settings. We also compute the Bayes factor of restricted models with λ2 = 0 over the corresponding unrestricted models. We further compare the results and the CPU time of the SDDR with the baseline and exchange algorithms. Other nested model selection issues, such as whether λ1 − λ2 = 0 and whether λ1 = µ1 = 0 for the model in (6.2), are also studied. Tables 11(a) and 12(a) summarize the estimation results for both the static and dynamic models with W1nt|1's and W2nt's, under restrictive and wider parameter ranges for Ξ = (λ1, λ2, µ1, µ2, γ). Here we rely on standard deviations (SD) of the MCMC draws to roughly decide whether the estimated coefficients are significantly different from zero; for a more formal test we refer to the Bayes factor computed by the SDDR. For the dynamic models, the estimate of λ1 is around 0.16 (0.17) for the baseline algorithm (the exchange algorithm), with relatively small SD. This implies that a city's total investment exhibits a positive and significant spatial correlation with that of its neighbors in the same province. On the other hand, the estimate of λ2 is close to zero with relatively large SD, which suggests that the investment spending of a city's neighbors outside the province might be irrelevant for its investment spending decisions. These two results are consistent with Yu et al. (2016), who find a "border effect": cities only compete with their neighbors within the province border contemporaneously. Furthermore, the estimate of γ is significant and about 0.67, implying that cities are quite persistent in deciding their total investment.
The estimate of µ1 is negative and significant, and it approximately satisfies the nonlinear restriction µ1 = −λ1γ. As suggested by Tao and Yu (2012), a negative µ1 with µ1 = −λ1γ could arise if a city is not confident in forecasting its neighbors' actions and follows a partial adjustment process to decide its own investment. Lastly, for the static model, the main findings are very similar to those of the dynamic model, except that the magnitudes of some estimated parameters differ.

32 According to (6.1), φ1 should be non-negative. But here we do not impose the non-negativity constraint on φ1 through its normal prior; we aim to let the data (likelihood) determine the sign of φ1. An alternative is to impose the non-negativity constraint through a truncated normal defined on [0, ∞). We also tried that, and the estimation results turn out to be very similar to those with the unconstrained normal prior.

Tables 13 and 14 provide estimation results of key parameters for all model specifications with different W1nt's. The estimates of λ1 are positive with small SDs in all specifications, whereas the estimates of λ2 are close to zero with relatively large SDs. This further confirms that the spatial competition among Chinese cities is restricted within province borders. Table 15(a) collects the CPU time of estimation for the baseline and the exchange algorithms. The exchange algorithm requires less time, though the difference in CPU time is not large with n = 239. In addition, for the same algorithm, estimation under a wider parameter range does take more time. Table 16(a) summarizes the Bayes factors computed by the SDDR for testing the following four hypotheses: Hypothesis 1: λ1 = 0; Hypothesis 2: λ2 = 0; Hypothesis 3: λ1 = λ2; Hypothesis 4: λ1 = µ1 = 0, where the last hypothesis is only for the model with W1nt(φ1)'s. The Bayes factors of restricted models with λ2 = 0 over unrestricted models are large, while the Bayes factors of restricted models with the remaining restrictions over the corresponding unrestricted models are close to zero. Together, these results indicate that the model with λ2 = 0 and λ1 ≠ 0 is more compatible with the data, which confirms the existence of the "border effect." Table 17(a) provides the CPU time of computing the Bayes factors for the baseline and the exchange algorithms in different settings. In most cases, the CPU time saved by the exchange algorithm is larger than the time it saved in estimation.

6.2. Empirical application 2: On the spatial dependence of county-level voter participation rates in the 1980 U.S. presidential election

In this subsection, we adopt the higher-order SAR model in a cross-sectional setting to study the spatial dependence of voter participation rates for 3107 counties in the 1980 U.S. presidential election. We examine whether the spatial dependence of participation rates extends beyond first-order contiguous counties. We also investigate whether the magnitude of spatial dependence with closer neighbors differs from that with neighbors farther away. Data for this study are available from James LeSage's matlab package for spatial econometrics.33 The dependent variable Yn is the voter participation rate of each county, i.e., the log of the proportion of the population aged 18 or older that voted in the 1980 presidential election. Xn contains characteristics of voters, such as the log of the proportion of the population over 18 with college degrees, the log of the proportion of the same population with home ownership, and the log of income per capita. Table 10 provides the summary
33 The data can be downloaded from http://www.spatial-econometrics.com/. Applications of the data can be found in Pace and Barry (1997) and LeSage (1999).

statistics of the data. The empirical specification is

Yn = λ1 W1n Yn + λ2 W2n Yn + Xn β + Vn,    (6.4)
where W1n and W2n are, respectively, row-normalized spatial weights matrices based upon "6-nearest neighbors" and "7-12 nearest neighbors." We choose 6 as the boundary for W1n because, according to LeSage and Pace (2007), the average number of first-order contiguous neighbors for these 3107 counties in the sample is about 6.34 Hence, by testing whether λ2 = 0 or not, we see whether the spatial dependence of the participation rate extends beyond first-order contiguous neighbors. The priors of the parameters are:

λr1 ∼ U(−1, 1), r1 = 1, 2; β ∼ N4(0, 10 × I4), σ² ∼ IG(6/2, 4/2).

Similar to Subsection 6.1, both the baseline algorithm and the exchange algorithm are utilized to estimate Eq. (6.4). Some trace plots of λ1 and λ2 are depicted in Figure 2(b). The estimation results and CPU time are compared. Furthermore, we adopt the SDDR with the two algorithms to compute the Bayes factors of restricted models with the following three restrictions over the corresponding unrestricted models:

Restriction 1: λ1 = 0; Restriction 2: λ2 = 0; Restriction 3: λ1 = λ2.    (6.5)
The values of the Bayes factors from the two algorithms and their CPU times are also compared. Tables 11(b) and 12(b) collect estimation results of the model under both restrictive and wider parameter ranges. The estimates of λ1 and λ2 are both positive with small SDs, suggesting that the spatial correlation of voter participation rates among counties does extend beyond first-order contiguous neighbors. In particular, the estimate of λ1 is larger than that of λ2 for both the baseline and exchange algorithms, implying that the spatial correlation with close neighbors is stronger than that with neighbors farther away. Table 15(b) provides the CPU time of the two algorithms for estimation. With n = 3107, it took about 60000 seconds for the baseline algorithm to complete the MCMC sampler with 40000 iterations. For the exchange algorithm, the CPU time reduces to around 9902 seconds, about 1/6 of the CPU time spent by the baseline algorithm.

Table 16(b) summarizes the Bayes factors calculated by the SDDR for the nested hypotheses (restrictions) outlined in (6.5). All Bayes factors of restricted models over general models are very close to or equal to zero. This confirms that the spatial correlation of voter participation rates among counties indeed extends beyond their first-order contiguous neighbors, and that the magnitude of the correlation differs significantly across different neighbors. Table 17(b) presents the CPU time of calculating the Bayes factors for both algorithms. The CPU time spent by the exchange algorithm is only about 1/10 of that spent by the baseline algorithm. This shows that the exchange algorithm greatly reduces the computational burden of calculating the Bayes factors.

34 LeSage and Pace (2007) consider using these 3107 counties to model voter participation in the 2000 U.S. presidential election.
7. Conclusion

This paper considers Bayesian estimation and model selection for higher-order SAR models in different settings. We develop a more efficient algorithm, based upon the exchange algorithm in Murray et al. (2006), to tackle the computational issue of the Jacobian determinant in the likelihood function for the posterior distribution of parameters when the number of cross-sectional spatial units is large. Nested model selection for the higher-order SAR model by Bayes factors is also investigated. In particular, we utilize the exchange algorithm to simplify the computation of Bayes factors through the SDDR. The efficient estimation algorithm and the model selection procedure are applied to study the spatial competition among 239 Chinese cities and the spatial correlation of voter participation rates for 3107 counties in the 1980 U.S. presidential election. We find that the baseline and the exchange algorithms produce similar results, but the exchange algorithm turns out to be more efficient than the baseline algorithm. For the first empirical application, the estimation and model selection results together confirm the "border effect": cities only compete with their neighbors within the province border. With 239 cross-sectional units, the CPU time spent by the exchange algorithm for estimation and model selection is less than that spent by the baseline algorithm. For the second application, the estimation and model selection results indicate that the spatial correlation of voter participation rates of U.S. counties does extend beyond first-order contiguous neighbors. Furthermore, with 3107 cross-sectional units, the CPU time cost by the exchange algorithm is only about 1/10 (1/6) of the CPU time cost by the baseline algorithm for model selection (estimation).
References

Andrews, D., Ploberger, W., 1994. Optimal tests when a nuisance parameter is present only under the alternative. Econometrica 62, 1383–1414.
Barry, R., Pace, R.K., 1999. Monte Carlo estimates of the log determinant of large sparse matrices. Linear Algebra and Its Applications 289, 41–54.
Chan, J., 2016. Specification tests for time-varying parameter models with stochastic volatility. Econometric Reviews, forthcoming.
Chib, S., 1995. Marginal likelihood from the Gibbs output. Journal of the American Statistical Association 90, 1313–1321.
Chib, S., 2008. Panel data modeling and inference: A Bayesian primer, in: Mátyás, L., Sevestre, P. (Eds.), The Econometrics of Panel Data: Fundamentals and Recent Developments in Theory and Practice. Springer.
Chib, S., Carlin, B., 1999. On MCMC sampling in hierarchical longitudinal models. Statistics and Computing 9, 17–26.
Chib, S., Jeliazkov, I., 2001. Marginal likelihood from the Metropolis-Hastings output. Journal of the American Statistical Association 96, 270–281.
Corrado, L., Fingleton, B., 2012. Where is the economics in spatial econometrics? Journal of Regional Science 52, 210–239.
Davies, R.B., 1977. Hypothesis testing when a nuisance parameter is present only under the alternative. Biometrika 64, 247–254.
Elhorst, J.P., Lacombe, D., Piras, G., 2012. On model specification and parameter space definitions in higher order spatial econometrics models. Regional Science and Urban Economics 42, 211–220.
Gelman, A., Roberts, G., Gilks, W., 1996. Efficient Metropolis jumping rules. Bayesian Statistics 5, 599–607.
Greenberg, E., 2007. Introduction to Bayesian Econometrics. Cambridge University Press.
Gupta, A., Robinson, P., 2015. Inference on higher-order spatial autoregressive models with increasingly many parameters. Journal of Econometrics 186, 19–31.
Haario, H., Saksman, E., Tamminen, J., 2001. An adaptive Metropolis algorithm. Bernoulli 7, 223–242.
Han, X., Lee, L.F., 2016. Bayesian analysis of spatial panel autoregressive models with time-varying endogenous spatial weight matrices, common factors and random coefficients. Journal of Business & Economic Statistics 34, 642–660.
Horn, R., Johnson, C., 1985. Matrix Analysis. New York: Cambridge University Press.
Jeffreys, H., 1961. Theory of Probability, 3rd edn. Clarendon Press, Oxford.
Kelejian, H.H., Prucha, I.R., 2010. Specification and estimation of spatial autoregressive models with autoregressive and heteroskedastic disturbances. Journal of Econometrics 157, 53–67.
Koop, G., Leon-Gonzalez, R., Strachan, R.W., 2010. Dynamic probabilities of restrictions in state space models: An application to the Phillips curve. Journal of Business & Economic Statistics 28, 370–379.
Koop, G., Poirier, D., Tobias, J., 2007. Bayesian Econometric Methods. Cambridge University Press.
Koop, G., Potter, S.M., 1999. Bayes factors and nonlinearity: Evidence from economic time series. Journal of Econometrics 88, 251–281.
Lee, L.F., 2003. Best spatial two-stage least squares estimators for a spatial autoregressive model with autoregressive disturbances. Econometric Reviews 22, 307–335.
Lee, L.F., 2007. GMM and 2SLS estimation of mixed regressive, spatial autoregressive models. Journal of Econometrics 137, 489–514.
Lee, L.F., Liu, X., 2010. Efficient GMM estimation of high order spatial autoregressive models with autoregressive disturbances. Econometric Theory 26, 187–230.
Lee, L.F., Yu, J., 2010a.
Estimation of spatial autoregressive panel data model with fixed effects. Journal of Econometrics 154, 165–185.
Lee, L.F., Yu, J., 2010b. Some recent developments in spatial panel data models. Regional Science and Urban Economics 40, 255–271.
Lee, L.F., Yu, J., 2010c. A spatial dynamic panel data model with both time and individual fixed effects. Econometric Theory 26, 564–597.
Lee, L.F., Yu, J., 2012. QML estimation of spatial dynamic panel data models with time varying spatial weights matrices. Spatial Economic Analysis 7, 31–74.
Lee, L.F., Yu, J., 2014. Efficient GMM estimation of spatial dynamic panel data models with fixed effects. Journal of Econometrics 180, 174–197.
LeSage, J., 1999. Applied econometrics using MATLAB. Department of Economics, University of Toledo.
LeSage, J., Pace, R.K., 2007. A matrix exponential spatial specification. Journal of Econometrics 140, 190–214.
LeSage, J., Pace, R.K., 2008. Spatial econometric modeling of origin-destination flows. Journal of Regional Science 48, 941–967.
LeSage, J., Pace, R.K., 2009. Introduction to Spatial Econometrics. CRC Press, Boca Raton, FL, USA.
Li, K., 1998. Bayesian inference in a simultaneous equation model with limited dependent variables. Journal of Econometrics 85, 387–400.
Liang, F., Jin, I.H., Song, Q., Liu, J.S., 2016. An adaptive exchange algorithm for sampling from distributions with intractable normalizing constants. Journal of the American Statistical Association 111, 377–393.
Lin, X., Weinberg, B., 2014. Unrequited friendship? How reciprocity mediates adolescent peer effects. Regional Science and Urban Economics 48, 144–153.
Lindley, D., Smith, A., 1972. Bayes estimates for the linear model. Journal of the Royal Statistical Society, Series B (Methodological) 34, 1–41.
Mundlak, Y., 1978. On the pooling of time series and cross section data. Econometrica 46, 69–85.
Murray, I., Ghahramani, Z., MacKay, D., 2006. MCMC for doubly-intractable distributions. In: Proceedings of the 22nd Annual Conference on Uncertainty in Artificial Intelligence (UAI).
Neyman, J., Scott, E., 1948. Consistent estimates based on partially consistent observations. Econometrica 16, 1–32.
Pace, R.K., Barry, R., 1997. Quick computation of spatial autoregressive estimators. Geographical Analysis 29, 232–246.
Pace, R.K., LeSage, J.P., 2009. A sampling approach to estimate the log determinant used in spatial likelihood problems. Journal of Geographical Systems 11, 209–225.
Rendon, S., 2013. Fixed and random effects in classical and Bayesian regression. Oxford Bulletin of Economics and Statistics 75, 460–476.
Roberts, G., Rosenthal, J.S., 2009. Examples of adaptive MCMC. Journal of Computational and Graphical Statistics 18, 349–367.
Smith, A., 1973. A general Bayesian linear model. Journal of the Royal Statistical Society, Series B (Methodological) 35, 67–75.
Spiegelhalter, D., Thomas, A., Best, N., Gilks, W., Lunn, D., 2003. BUGS: Bayesian inference using Gibbs sampling. Technical report, MRC Biostatistics Unit, Cambridge, UK. www.mrc-bsu.cam.ac.uk/bugs/.
Tao, J., 2005. Spatial econometrics: Models, methods and applications. Ph.D. thesis, Ohio State University.
Tao, J., Yu, J., 2012. The spatial time lag in panel data models. Economics Letters 117, 544–547.
Verdinelli, I., Wasserman, L., 1995. Computing Bayes factors using a generalization of the Savage-Dickey density ratio. Journal of the American Statistical Association 90, 614–618.
Xu, C., 2011. The fundamental institutions of China's reforms and development. Journal of Economic Literature 49, 1076–1151.
Yu, J., de Jong, R., Lee, L.F., 2008. Quasi-maximum likelihood estimators for spatial dynamic panel data with fixed effects when both n and T are large. Journal of Econometrics 146, 118–134.
Yu, J., Zhou, L.A., Zhu, G., 2016. Strategic interaction in political competition: Evidence from spatial effects across Chinese cities. Regional Science and Urban Economics 57, 23–37. Zellner, A., 1971. An Introduction to Bayesian Inference in Econometrics. New York: J. Wiley and Sons, Inc.
Appendices

A. Evaluation of p(λ1 = 0 | {Ynt}) using the method in Chib and Jeliazkov (2001)

In this section we show how the method in Chib and Jeliazkov (2001) can be adopted to evaluate the marginal likelihood ordinate p(λ1 = λ1* | {Ynt}), where λ1* is any given constant; in Section 4, λ1* = 0. Recall that Θ2 = (λ−1, ρ, γ, µ, β, σ, Cn, {αt}) is the parameter vector of the restricted model M2 and Θ1 = (λ1, Θ2) is the parameter vector of the unrestricted model M1. Let Pr(λ1, λ1* | {Ynt}, Θ2) and g(λ1 | λ1*) be, respectively, the acceptance probability of a move from λ1 to λ1* and the corresponding proposal density. Denote H(λ1, λ1* | {Ynt}, Θ2) ≡ Pr(λ1, λ1* | {Ynt}, Θ2) g(λ1* | λ1). By the reversibility condition of the Markov chain,

\[
H(\lambda_1, \lambda_1^* \mid \{Y_{nt}\}, \Theta_2)\, p(\lambda_1 \mid \{Y_{nt}\}, \Theta_2) = H(\lambda_1^*, \lambda_1 \mid \{Y_{nt}\}, \Theta_2)\, p(\lambda_1^* \mid \{Y_{nt}\}, \Theta_2). \tag{A.1}
\]

Multiplying both sides of Eq. (A.1) by p(Θ2 | {Ynt}) and integrating with respect to Θ1, we have

\[
\int H(\lambda_1, \lambda_1^* \mid \{Y_{nt}\}, \Theta_2)\, p(\Theta_1 \mid \{Y_{nt}\})\, d\Theta_1 = \int H(\lambda_1^*, \lambda_1 \mid \{Y_{nt}\}, \Theta_2)\, p(\lambda_1^* \mid \{Y_{nt}\}, \Theta_2)\, p(\Theta_2 \mid \{Y_{nt}\})\, d\Theta_1. \tag{A.2}
\]

Note that p(λ1* | {Ynt}, Θ2) p(Θ2 | {Ynt}) = p(λ1* | {Ynt}) p(Θ2 | {Ynt}, λ1*). Therefore, (A.2) can be expressed as

\[
\int H(\lambda_1, \lambda_1^* \mid \{Y_{nt}\}, \Theta_2)\, p(\Theta_1 \mid \{Y_{nt}\})\, d\Theta_1
= \int H(\lambda_1^*, \lambda_1 \mid \{Y_{nt}\}, \Theta_2)\, p(\Theta_2 \mid \{Y_{nt}\}, \lambda_1^*)\, d\Theta_1 \times p(\lambda_1^* \mid \{Y_{nt}\}). \tag{A.3}
\]

Then we have

\[
p(\lambda_1^* \mid \{Y_{nt}\})
= \frac{\int \Pr(\lambda_1, \lambda_1^* \mid \{Y_{nt}\}, \Theta_2)\, g(\lambda_1^* \mid \lambda_1)\, p(\Theta_1 \mid \{Y_{nt}\})\, d\Theta_1}
       {\int \Pr(\lambda_1^*, \lambda_1 \mid \{Y_{nt}\}, \Theta_2)\, g(\lambda_1 \mid \lambda_1^*)\, p(\Theta_2 \mid \{Y_{nt}\}, \lambda_1^*)\, d\Theta_1}
= \frac{E_u\!\left[\Pr(\lambda_1, \lambda_1^* \mid \{Y_{nt}\}, \Theta_2)\, g(\lambda_1^* \mid \lambda_1)\right]}
       {E_r\!\left[\Pr(\lambda_1^*, \lambda_1 \mid \{Y_{nt}\}, \Theta_2)\right]}. \tag{A.4}
\]

Here the expectation Eu(·) is with respect to the posterior density p(Θ1 | {Ynt}) and can be calculated using the original MCMC draws of the unrestricted model; the expectation Er(·) is with respect to the density g(λ1 | λ1*) p(Θ2 | {Ynt}, λ1*). Hence, to evaluate Er(·), one needs to rely on the following reduced MCMC run:

Step 1: Sample Θ2 from p(Θ2 | {Ynt}, λ1*).
Step 2: Sample λ1 from g(λ1 | λ1*).

Then p(λ1* | {Ynt}) can be evaluated by

\[
p(\lambda_1^* \mid \{Y_{nt}\}) \approx
\frac{L^{-1} \sum_{l=1}^{L} \Pr(\lambda_1^{(l)}, \lambda_1^* \mid \{Y_{nt}\}, \Theta_2^{(l)})\, g(\lambda_1^* \mid \lambda_1^{(l)})}
     {J^{-1} \sum_{j=1}^{J} \Pr(\lambda_1^*, \lambda_1^{(j)} \mid \{Y_{nt}\}, \Theta_2^{(j)})}. \tag{A.5}
\]
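The two-run estimator in (A.4)–(A.5) can be sketched on a toy problem. The sketch below is illustrative only: it replaces the higher-order SAR model and the Θ2 block with a single-parameter normal model whose posterior ordinate is available in closed form, so the Chib–Jeliazkov estimate can be checked against the exact value; the names `log_post`, `tau`, and `theta_star` are hypothetical and not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: y_i ~ N(theta, 1), prior theta ~ N(0, 1).
# The posterior is N(n*ybar/(n+1), 1/(n+1)), so the ordinate is known exactly.
n, theta_true = 50, 0.3
y = rng.normal(theta_true, 1.0, n)

def log_post(theta):
    # log posterior kernel (likelihood + prior, up to a constant)
    return -0.5 * np.sum((y - theta) ** 2) - 0.5 * theta ** 2

tau = 0.3  # random-walk proposal scale; g(.|theta) = N(theta, tau^2)

def accept_prob(cur, prop):
    # Metropolis-Hastings acceptance probability Pr(cur, prop); g is symmetric
    return min(1.0, float(np.exp(log_post(prop) - log_post(cur))))

def norm_pdf(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

# Run 1 (E_u): standard MH chain targeting the posterior.
L = 20000
draws = np.empty(L)
theta = 0.0
for l in range(L):
    prop = theta + tau * rng.normal()
    if rng.random() < accept_prob(theta, prop):
        theta = prop
    draws[l] = theta

theta_star = 0.25  # point at which to evaluate the posterior ordinate

# Numerator of (A.5): average of Pr(theta, theta*) * g(theta*|theta) over posterior draws.
num = np.mean([accept_prob(t, theta_star) * norm_pdf(theta_star, t, tau) for t in draws])

# Run 2 (E_r, the "reduced run"): draw theta ~ g(.|theta*) and average Pr(theta*, theta).
J = 20000
prop_draws = theta_star + tau * rng.normal(size=J)
den = np.mean([accept_prob(theta_star, t) for t in prop_draws])

est = num / den

# Exact posterior ordinate for comparison
post_mean, post_sd = n * y.mean() / (n + 1), np.sqrt(1.0 / (n + 1))
exact = norm_pdf(theta_star, post_mean, post_sd)
print(est, exact)  # the two values should be close
```

In the paper's setting, Run 2 corresponds to sampling Θ2 from p(Θ2 | {Ynt}, λ1*) before drawing λ1 from g(λ1 | λ1*); the toy problem has no Θ2 block, so only the proposal draws remain.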
Figure 1: Graphs of Simulation Results. (a) Trace plots of λ1 and λ2 in simulation: Static panel. (b) Trace plots of λ1 and λ2 in simulation: Dynamic panel. [Figure omitted.]
Figure 2: Graphs of Empirical Results. (a) Trace plots of λ1 and λ2 in spatial competition of Chinese cities: Dynamic panel with W1nt. (b) Trace plots of λ1 and λ2 in voter participation rate: Cross-section. [Figure omitted.]
Table 1: Model estimation: Baseline algorithm. Entries are Mean (S.D.).

         Cross-section                   Static Panel                    Dynamic Panel
         Restrictive     Wider           Restrictive     Wider           Restrictive     Wider
         Range           Range           Range           Range           Range           Range
ρ1 ≠ 0
λ1       0.30 (0.02)     0.30 (0.02)     0.30 (0.01)     0.30 (0.01)     0.20 (0.01)     0.20 (0.01)
λ2       0.10 (0.02)     0.10 (0.02)     0.10 (0.01)     0.10 (0.01)     0.10 (0.01)     0.10 (0.01)
ρ1       0.19 (0.08)     0.19 (0.07)     0.20 (0.03)     0.20 (0.03)     0.21 (0.02)     0.21 (0.02)
γ        N.A             N.A             N.A             N.A             0.12 (0.01)     0.12 (0.01)
µ1       N.A             N.A             N.A             N.A             -0.11 (0.01)    -0.11 (0.01)
µ2       N.A             N.A             N.A             N.A             -0.10 (0.01)    -0.10 (0.01)
β0       2.01 (0.08)     2.01 (0.08)     N.A             N.A             N.A             N.A
β1       1.00 (0.02)     1.00 (0.02)     1.00 (0.01)     1.00 (0.01)     1.01 (0.01)     1.01 (0.01)
β2       1.00 (0.02)     1.00 (0.02)     1.00 (0.02)     1.00 (0.01)     1.01 (0.01)     1.01 (0.01)
σ²       1.00 (0.05)     1.01 (0.05)     0.99 (0.02)     0.99 (0.02)     0.95 (0.02)     0.95 (0.02)

ρ1 = 0
λ1       0.30 (0.01)     0.30 (0.01)     0.30 (0.01)     0.30 (0.01)     0.20 (0.01)     0.20 (0.01)
λ2       0.10 (0.02)     0.10 (0.02)     0.10 (0.01)     0.10 (0.01)     0.10 (0.01)     0.10 (0.01)
γ        N.A             N.A             N.A             N.A             0.12 (0.004)    0.12 (0.005)
µ1       N.A             N.A             N.A             N.A             -0.11 (0.01)    -0.11 (0.01)
µ2       N.A             N.A             N.A             N.A             -0.10 (0.01)    -0.10 (0.01)
β0       2.00 (0.09)     2.00 (0.09)     N.A             N.A             N.A             N.A
β1       1.00 (0.02)     1.00 (0.02)     1.00 (0.01)     1.00 (0.01)     1.01 (0.01)     1.01 (0.01)
β2       1.00 (0.02)     1.00 (0.02)     1.00 (0.01)     1.00 (0.01)     1.01 (0.01)     1.01 (0.01)
σ²       0.99 (0.06)     0.99 (0.06)     1.00 (0.03)     1.00 (0.03)     0.96 (0.02)     0.96 (0.02)

The values reported in this table are calculated from 50 repetitions. Cross-section: n = 600; Static panel: n = 300 and T = 10; Dynamic panel: n = 300 and T = 18. Cross-section and Static panel: (λ1, λ2, ρ1) = (0.3, 0.1, 0.2) or (0.3, 0.1, 0). Dynamic panel: (λ1, λ2, ρ1, γ, µ1, µ2) = (0.2, 0.1, 0.2, 0.1, −0.1, −0.1) or (0.2, 0.1, 0, 0.1, −0.1, −0.1). (β1, β2, σ²) = (1, 1, 1). W1u: "3-nearest neighbors", W2u: "4-8 nearest neighbors", and M1u: "0.1 × n nearest neighbors."
Table 2: Model estimation: Exchange algorithm. Entries are Mean (S.D.).

         Cross-section                   Static Panel                    Dynamic Panel
         Restrictive     Wider           Restrictive     Wider           Restrictive     Wider
         Range           Range           Range           Range           Range           Range
ρ1 ≠ 0
λ1       0.29 (0.03)     0.29 (0.03)     0.30 (0.01)     0.30 (0.01)     0.20 (0.01)     0.20 (0.01)
λ2       0.10 (0.04)     0.10 (0.04)     0.10 (0.01)     0.10 (0.01)     0.10 (0.01)     0.10 (0.01)
ρ1       0.18 (0.10)     0.18 (0.10)     0.20 (0.06)     0.20 (0.06)     0.20 (0.03)     0.20 (0.03)
γ        N.A             N.A             N.A             N.A             0.12 (0.01)     0.12 (0.01)
µ1       N.A             N.A             N.A             N.A             -0.11 (0.01)    -0.11 (0.01)
µ2       N.A             N.A             N.A             N.A             -0.10 (0.01)    -0.10 (0.01)
β0       2.00 (0.15)     2.00 (0.15)     N.A             N.A             N.A             N.A
β1       1.00 (0.02)     1.00 (0.02)     1.00 (0.01)     1.00 (0.01)     1.01 (0.01)     1.01 (0.01)
β2       1.00 (0.02)     1.00 (0.02)     1.00 (0.01)     1.00 (0.01)     1.01 (0.01)     1.01 (0.01)
σ²       1.00 (0.06)     1.00 (0.06)     1.00 (0.02)     1.00 (0.02)     0.95 (0.02)     0.95 (0.02)

ρ1 = 0
λ1       0.31 (0.03)     0.30 (0.03)     0.30 (0.01)     0.30 (0.01)     0.20 (0.01)     0.20 (0.01)
λ2       0.10 (0.03)     0.10 (0.03)     0.10 (0.01)     0.10 (0.01)     0.10 (0.01)     0.10 (0.01)
γ        N.A             N.A             N.A             N.A             0.12 (0.01)     0.12 (0.01)
µ1       N.A             N.A             N.A             N.A             -0.11 (0.01)    -0.11 (0.01)
µ2       N.A             N.A             N.A             N.A             -0.10 (0.01)    -0.10 (0.01)
β0       1.99 (0.19)     1.99 (0.18)     N.A             N.A             N.A             N.A
β1       1.00 (0.02)     1.00 (0.02)     1.00 (0.01)     1.00 (0.01)     1.01 (0.01)     1.01 (0.01)
β2       1.00 (0.03)     1.00 (0.02)     1.00 (0.01)     1.00 (0.01)     1.01 (0.01)     1.01 (0.01)
σ²       1.00 (0.06)     1.00 (0.06)     1.00 (0.03)     1.00 (0.03)     0.96 (0.02)     0.96 (0.02)

The values reported in this table are calculated from 50 repetitions. Cross-section: n = 600; Static panel: n = 300 and T = 10; Dynamic panel: n = 300 and T = 18. Cross-section and Static panel: (λ1, λ2, ρ1) = (0.3, 0.1, 0.2) or (0.3, 0.1, 0). Dynamic panel: (λ1, λ2, ρ1, γ, µ1, µ2) = (0.2, 0.1, 0.2, 0.1, −0.1, −0.1) or (0.2, 0.1, 0, 0.1, −0.1, −0.1). (β1, β2, σ²) = (1, 1, 1). W1u: "3-nearest neighbors", W2u: "4-8 nearest neighbors", and M1u: "0.1 × n nearest neighbors."
Table 3: Estimation of the higher-order dynamic SAR panel model with parameters in spatial weights. Entries are Mean (S.D.).

Baseline algorithm
         ρ1 = 0.2                        ρ1 = 0
         Restrictive     Wider           Restrictive     Wider
         range           range           range           range
λ1       0.20 (0.01)     0.20 (0.01)     0.20 (0.01)     0.20 (0.01)
λ2       0.10 (0.01)     0.10 (0.01)     0.10 (0.01)     0.10 (0.01)
ρ1       0.20 (0.02)     0.20 (0.02)     N.A             N.A
γ        0.12 (0.01)     0.12 (0.01)     0.12 (0.01)     0.12 (0.01)
µ1       -0.11 (0.01)    -0.11 (0.01)    -0.11 (0.01)    -0.11 (0.01)
µ2       -0.10 (0.01)    -0.10 (0.01)    -0.10 (0.01)    -0.10 (0.01)
φ1       0.98 (0.08)     0.99 (0.08)     1.01 (0.08)     1.01 (0.08)
β1       1.01 (0.01)     1.01 (0.01)     1.01 (0.01)     1.01 (0.01)
β2       1.01 (0.01)     1.01 (0.01)     1.01 (0.01)     1.01 (0.01)
σ²       0.96 (0.02)     0.96 (0.02)     0.96 (0.02)     0.96 (0.02)

Exchange algorithm
         ρ1 = 0.2                        ρ1 = 0
         Restrictive     Wider           Restrictive     Wider
         range           range           range           range
λ1       0.20 (0.01)     0.20 (0.01)     0.20 (0.01)     0.20 (0.01)
λ2       0.10 (0.01)     0.10 (0.01)     0.10 (0.01)     0.10 (0.01)
ρ1       0.20 (0.03)     0.20 (0.03)     N.A             N.A
γ        0.12 (0.01)     0.12 (0.01)     0.12 (0.01)     0.12 (0.01)
µ1       -0.11 (0.01)    -0.11 (0.01)    -0.11 (0.01)    -0.11 (0.01)
µ2       -0.10 (0.01)    -0.10 (0.01)    -0.10 (0.01)    -0.10 (0.01)
φ1       0.99 (0.10)     0.99 (0.10)     0.99 (0.11)     0.99 (0.11)
β1       1.01 (0.01)     1.01 (0.01)     1.01 (0.01)     1.01 (0.01)
β2       1.01 (0.01)     1.01 (0.01)     1.01 (0.01)     1.01 (0.01)
σ²       0.95 (0.02)     0.95 (0.02)     0.96 (0.02)     0.96 (0.02)

The values reported in this table are calculated from 50 repetitions. Dynamic panel: n = 300 and T = 18; (φ1, β1, β2, σ²) = (1, 1, 1, 1). (λ1, λ2, ρ1, γ, µ1, µ2) = (0.2, 0.1, 0.2, 0.1, −0.1, −0.1) or (0.2, 0.1, 0, 0.1, −0.1, −0.1). W1u: "3-nearest neighbors", W2u: "4-8 nearest neighbors", and M1u: "0.1 × n nearest neighbors."
Table 4: Estimation of the higher-order dynamic SAR panel model with wider range. Entries are Mean (S.D.).

Dynamic model
         ρ1 = 0.2                        ρ1 = 0
         BA              EA              BA              EA
λ1       0.40 (0.01)     0.40 (0.01)     0.40 (0.01)     0.40 (0.01)
λ2       -0.20 (0.01)    -0.20 (0.02)    -0.20 (0.01)    -0.20 (0.02)
ρ1       0.20 (0.03)     0.20 (0.03)     N.A             N.A
γ        0.22 (0.01)     0.22 (0.01)     0.23 (0.01)     0.22 (0.01)
µ1       -0.11 (0.01)    -0.11 (0.01)    -0.11 (0.01)    -0.11 (0.01)
µ2       -0.10 (0.01)    -0.09 (0.02)    -0.10 (0.01)    -0.10 (0.02)
β1       1.01 (0.01)     1.01 (0.01)     1.01 (0.01)     1.01 (0.01)
β2       1.01 (0.01)     1.01 (0.01)     1.01 (0.01)     1.01 (0.01)
σ²       0.96 (0.02)     0.95 (0.02)     0.96 (0.02)     0.95 (0.02)

Dynamic model with parameter in W1nt
         ρ1 = 0.2                        ρ1 = 0
         BA              EA              BA              EA
λ1       0.40 (0.01)     0.39 (0.02)     0.40 (0.01)     0.40 (0.01)
λ2       -0.19 (0.01)    -0.19 (0.03)    -0.20 (0.01)    -0.20 (0.02)
ρ1       0.20 (0.03)     0.19 (0.07)     N.A             N.A
γ        0.23 (0.004)    0.22 (0.01)     0.22 (0.01)     0.23 (0.01)
µ1       -0.11 (0.01)    -0.11 (0.01)    -0.11 (0.01)    -0.11 (0.01)
µ2       -0.09 (0.01)    -0.10 (0.03)    -0.10 (0.01)    -0.10 (0.01)
φ1       1.00 (0.04)     1.02 (0.09)     1.00 (0.04)     1.01 (0.06)
β1       1.01 (0.01)     1.01 (0.01)     1.01 (0.01)     1.01 (0.01)
β2       1.01 (0.01)     1.01 (0.01)     1.01 (0.01)     1.01 (0.01)
σ²       0.96 (0.02)     0.96 (0.02)     0.96 (0.02)     0.96 (0.02)

The values reported in this table are calculated from 50 repetitions. BA: Baseline algorithm; EA: Exchange algorithm; n = 300, T = 18; (φ1, β1, β2, σ²) = (1, 1, 1, 1). (λ1, λ2, ρ1, γ, µ1, µ2) = (0.4, −0.2, 0.2, 0.2, −0.1, −0.1) or (0.4, −0.2, 0, 0.2, −0.1, −0.1). W1u: "2-nearest neighbors", W2u: "5 nearest neighbors", and M1u: "0.1 × n nearest neighbors."
Table 5: Model estimation: hierarchical prior on Cn. Entries are Mean (S.D.).

                     Static Panel                    Dynamic Panel
                     Restrictive     Wider           Restrictive     Wider
                     Range           Range           Range           Range
Baseline algorithm
λ1                   0.30 (0.01)     0.30 (0.01)     0.20 (0.01)     0.20 (0.01)
λ2                   0.10 (0.01)     0.10 (0.01)     0.10 (0.01)     0.10 (0.01)
ρ1                   0.20 (0.02)     0.20 (0.02)     0.21 (0.01)     0.20 (0.02)
γ                    N.A             N.A             0.10 (0.01)     0.10 (0.01)
µ1                   N.A             N.A             -0.10 (0.01)    -0.10 (0.01)
µ2                   N.A             N.A             -0.10 (0.01)    -0.10 (0.01)
β1                   1.00 (0.01)     1.00 (0.01)     1.00 (0.01)     1.00 (0.01)
β2                   1.00 (0.01)     1.00 (0.01)     1.00 (0.01)     1.00 (0.01)
σ²                   1.00 (0.03)     1.00 (0.03)     0.94 (0.02)     0.94 (0.04)
ψ1                   2.03 (0.13)     2.03 (0.13)     1.97 (0.15)     1.98 (0.15)
ψ2                   2.00 (0.13)     2.00 (0.13)     1.97 (0.14)     1.98 (0.14)
σc²                  1.97 (0.15)     1.97 (0.15)     2.00 (0.17)     2.00 (0.17)

Exchange algorithm
λ1                   0.30 (0.01)     0.30 (0.01)     0.20 (0.01)     0.20 (0.01)
λ2                   0.10 (0.01)     0.10 (0.01)     0.10 (0.01)     0.10 (0.01)
ρ1                   0.19 (0.04)     0.19 (0.04)     0.21 (0.01)     0.20 (0.02)
γ                    N.A             N.A             0.10 (0.01)     0.10 (0.01)
µ1                   N.A             N.A             -0.10 (0.01)    -0.10 (0.01)
µ2                   N.A             N.A             -0.10 (0.01)    -0.10 (0.01)
β1                   1.00 (0.01)     1.00 (0.01)     1.00 (0.01)     1.00 (0.01)
β2                   1.00 (0.01)     1.00 (0.01)     1.00 (0.01)     1.00 (0.01)
σ²                   1.01 (0.03)     1.00 (0.03)     0.94 (0.02)     0.94 (0.02)
ψ1                   2.03 (0.13)     2.03 (0.13)     1.97 (0.15)     1.97 (0.15)
ψ2                   2.00 (0.13)     2.00 (0.13)     1.97 (0.14)     1.98 (0.14)
σc²                  1.97 (0.15)     1.97 (0.15)     1.99 (0.17)     2.00 (0.17)

The values reported in this table are calculated from 50 repetitions; Static panel: n = 300 and T = 10, (λ1, λ2, ρ1) = (0.3, 0.1, 0.2); Dynamic panel: n = 300 and T = 18, (λ1, λ2, ρ1, γ, µ1, µ2) = (0.2, 0.1, 0.2, 0.1, −0.1, −0.1); (β1, β2, σ², ψ1, ψ2, σc²) = (1, 1, 1, 2, 2, 2). W1u: "3-nearest neighbors", W2u: "4-8 nearest neighbors", and M1u: "0.1 × n nearest neighbors."
Table 6: Model estimation: non-informative priors on β, σ², Cn and {αt}'s. Entries are Mean (S.D.).

                     Static Panel                    Dynamic Panel
                     Restrictive     Wider           Restrictive     Wider
                     Range           Range           Range           Range
Baseline algorithm
λ1                   0.30 (0.01)     0.30 (0.01)     0.20 (0.01)     0.20 (0.01)
λ2                   0.10 (0.01)     0.10 (0.01)     0.10 (0.01)     0.10 (0.01)
ρ1                   0.20 (0.02)     0.20 (0.02)     0.20 (0.02)     0.20 (0.02)
γ                    N.A             N.A             0.10 (0.01)     0.10 (0.01)
µ1                   N.A             N.A             -0.10 (0.01)    -0.10 (0.01)
µ2                   N.A             N.A             -0.10 (0.01)    -0.10 (0.01)
β1                   1.00 (0.01)     1.00 (0.01)     1.00 (0.01)     1.00 (0.01)
β2                   1.00 (0.01)     1.00 (0.01)     1.00 (0.01)     1.00 (0.01)
σ²                   1.00 (0.03)     1.00 (0.03)     0.94 (0.02)     0.94 (0.02)

Exchange algorithm
λ1                   0.30 (0.01)     0.30 (0.01)     0.20 (0.01)     0.20 (0.01)
λ2                   0.10 (0.01)     0.10 (0.01)     0.10 (0.01)     0.10 (0.01)
ρ1                   0.20 (0.03)     0.20 (0.04)     0.20 (0.02)     0.20 (0.02)
γ                    N.A             N.A             0.10 (0.01)     0.10 (0.01)
µ1                   N.A             N.A             -0.10 (0.01)    -0.10 (0.01)
µ2                   N.A             N.A             -0.10 (0.01)    -0.10 (0.01)
β1                   1.00 (0.01)     1.00 (0.01)     1.00 (0.01)     1.00 (0.01)
β2                   1.00 (0.01)     1.00 (0.01)     1.00 (0.01)     1.00 (0.01)
σ²                   1.00 (0.03)     1.00 (0.03)     0.94 (0.02)     0.94 (0.02)

The values reported in this table are calculated from 50 repetitions; Static panel: n = 300 and T = 10, (λ1, λ2, ρ1) = (0.3, 0.1, 0.2); Dynamic panel: n = 300 and T = 18, (λ1, λ2, ρ1, γ, µ1, µ2) = (0.2, 0.1, 0.2, 0.1, −0.1, −0.1); (β1, β2, σ²) = (1, 1, 1). W1u: "3-nearest neighbors", W2u: "4-8 nearest neighbors", and M1u: "0.1 × n nearest neighbors."
Table 7: CPU Time of Different Algorithms: estimation

Cross-section        n = 600               n = 3000              n = 5000
                     BA     EA     MCA     BA     EA     MCA     BA     EA     MCA
DGP1    RR           0.03   0.01   0.10    1.02   0.23   0.87    2.59   0.75   1.49
        WR           0.02   0.01   0.10    1.13   0.23   0.82    2.71   0.63   1.42
DGP2    RR           0.07   0.02   0.18    0.56   0.14   0.31    1.67   0.38   0.65
        WR           0.03   0.02   0.10    0.55   0.15   0.31    1.53   0.39   0.65

Static Panel         n = 600, T = 10       n = 3000, T = 10      n = 5000, T = 10
                     BA     EA     MCA     BA     EA     MCA     BA     EA     MCA
DGP1    RR           0.25   0.07   0.93    11.12  3.09   5.56    31.18  8.90   11.97
        WR           0.20   0.06   0.95    10.26  2.85   5.22    30.41  7.50   11.61
DGP2    RR           0.22   0.11   1.29    5.87   1.92   3.34    20.01  5.51   6.67
        WR           0.20   0.10   0.65    5.81   1.67   3.51    19.65  3.77   6.62

Dynamic Panel        n = 600, T = 10       n = 3000, T = 10      n = 5000, T = 10
                     BA     EA     MCA     BA     EA     MCA     BA     EA     MCA
DGP3    RR           0.23   0.09   0.84    9.20   2.23   5.05    27.17  7.55   11.98
        WR           0.19   0.06   0.82    10.29  2.19   4.90    28.51  6.20   10.02
DGP4    RR           0.10   0.04   0.48    4.57   1.30   2.91    17.29  4.91   5.92
        WR           0.10   0.04   0.48    4.75   1.34   2.86    18.00  4.87   6.01

Dynamic Panel:       n = 600, T = 10       n = 3000, T = 10      n = 5000, T = 10
W1nt(φ)              BA     EA     MCA     BA     EA     MCA     BA     EA     MCA
DGP3    RR           0.95   0.60   1.67    19.79  9.81   14.66   58.22  38.72  33.81
        WR           1.05   0.57   2.35    19.99  10.03  13.61   74.43  37.74  34.36
DGP4    RR           0.61   0.29   1.18    15.63  8.57   11.80   52.39  30.14  32.01
        WR           0.39   0.28   1.18    16.67  8.61   11.98   53.07  29.35  32.99

Cross-section and static panel: DGP1 with (λ1, λ2, ρ) = (0.3, 0.1, 0.2) and DGP2 with (λ1, λ2, ρ) = (0.3, 0.1, 0). Dynamic panel: DGP3 with (λ1, λ2, ρ1, γ, µ1, µ2) = (0.2, 0.1, 0.2, 0.1, −0.1, −0.1) and DGP4 with (λ1, λ2, ρ1, γ, µ1, µ2) = (0.2, 0.1, 0, 0.1, −0.1, −0.1); (φ1, β1, β2, σ²) = (1, 1, 1, 1). W1u = M1u: "3-nearest neighbors", W2u: "4-8 nearest neighbors." BA: Baseline algorithm; EA: Exchange algorithm; MCA: Monte Carlo approximation; RR: Restrictive range; WR: Wider range. All algorithms are run on a server with a 3.30GHz Intel Xeon processor and 66 GB memory. The CPU times are in seconds.
Table 8: Model selection by SDDR

Cross-section        Baseline algorithm              Exchange algorithm
True                 Restrictive     Wider           Restrictive     Wider
                     Range           Range           Range           Range
λ2 = 0               0.96            1               0.82            0.82
λ2 ≠ 0               1               1               0.98            0.98
λ1 − λ2 = 0          0.94            1               0.80            0.78
λ1 − λ2 ≠ 0          1               1               1               1

Static Panel         Baseline algorithm              Exchange algorithm
True                 Restrictive     Wider           Restrictive     Wider
                     Range           Range           Range           Range
λ2 = 0               0.86            0.82            0.95            0.96
λ2 ≠ 0               0.98            1               1               1
λ1 − λ2 = 0          0.86            0.90            0.86            0.86
λ1 − λ2 ≠ 0          0.96            1               1               1

Dynamic Panel        Baseline algorithm              Exchange algorithm
True                 Restrictive     Wider           Restrictive     Wider
                     Range           Range           Range           Range
λ2 = 0               0.82            0.9             0.86            0.82
λ2 ≠ 0               1               1               1               1
λ1 − λ2 = 0          0.94            0.94            0.82            0.8
λ1 − λ2 ≠ 0          1               1               1               1

Dynamic Panel with W1nt(φ1)
                     Baseline algorithm              Exchange algorithm
True                 Restrictive     Wider           Restrictive     Wider
                     Range           Range           Range           Range
λ1 = 0, µ1 = 0       0.98            0.94            0.84            0.9
λ1 ≠ 0, µ1 ≠ 0       1               1               1               1
λ1 − λ2 = 0          0.96            0.96            0.78            0.86
λ1 − λ2 ≠ 0          1               1               1               1

The model frequencies reported in this table are calculated from 50 repetitions. Cross-section and Static panel: (λ1, λ2) = (0.3, 0.1) or (0.3, 0) or (0.3, 0.3). Dynamic panel: (λ1, λ2, µ1) = (0.2, 0.1, −0.1) or (0.2, 0, −0.1) or (0.2, 0.2, −0.1). Dynamic panel with W1nt(φ1): (λ1, λ2, µ1) = (0.2, 0.1, −0.1) or (0, 0.1, 0) or (0.2, 0.2, −0.1). (ρ1, φ1, β1, β2, σ²) = (0.2, 1, 1, 1, 1); Dynamic panel: (γ1, µ2) = (0.1, −0.1). W1u: "3-nearest neighbors", W2u: "4-8 nearest neighbors", and M1u: "0.1 × n nearest neighbors."
Table 9: Summary statistics for spatial competition of Chinese cities

Variable                                          Mean      Maximum     Minimum    S.D
Total investment spending on fixed assets         1.65      14.11       0.12       1.43
Account revenue                                   0.20      5.00        0.01       0.32
City GDP                                          2.96      42.95       0.28       3.37
Population (millions)                             4.11      12.39       0.18       2.40
Population density (people/square kilometer)      412.01    2565.10     4.70       302.80
Secondary industry ratio (percent)                0.50      0.91        0.16       0.11
Provincial revenue                                0.17      0.60        0.02       0.12
Provincial expenditure                            0.31      0.85        0.02       0.16

Sample is 239 prefecture cities in China, from 2007 to 2012. RMB amounts are in 10000 yuan per capita.
Table 10: Summary statistics for voter participation rate

Variable                                          Mean      Maximum     Minimum    S.D
Voter participation rate                          1.65      14.11       0.12       1.43
Proportion of voters with college degrees         0.20      5.00        0.01       0.32
Proportion of voters with home ownership          2.96      42.95       0.28       3.37
Income per voter                                  4.11      12.39       0.18       2.40

Sample is 3107 counties (or their equivalents) in the continental United States from the 1990 Census which recorded votes in the 1980 presidential election.
Table 11: Bayesian estimation results in empirical applications: baseline algorithm. Entries are Mean (S.D.).

(a) Bayesian estimation of spatial competition of Chinese cities: with W1nt|1 and W2nt

                              Static                                          Dynamic
Independent variable          Restrictive Range      Wider Range              Restrictive Range        Wider Range
Account revenue               0.81 (0.20)            0.81 (0.20)              0.12 (0.15)              0.13 (0.16)
City GDP                      0.25 (0.02)            0.25 (0.02)              0.14 (0.02)              0.14 (0.02)
Population                    -0.03 (0.02)           -0.03 (0.02)             -0.07 (0.02)             -0.07 (0.02)
Population density            −0.1×10^−3 (0.1×10^−3) −0.1×10^−3 (0.1×10^−3)   0.1×10^−4 (0.8×10^−4)    0.1×10^−4 (0.8×10^−4)
Secondary industry ratio      0.55 (0.19)            0.54 (0.19)              0.53 (0.16)              0.53 (0.17)
Provincial revenue            -1.31 (0.32)           -1.32 (0.33)             -0.81 (0.24)             -0.83 (0.24)
Provincial expenditure        1.10 (0.27)            1.13 (0.27)              0.86 (0.19)              0.85 (0.20)

Key parameters
λ1                            0.21 (0.02)            0.20 (0.03)              0.16 (0.02)              0.16 (0.02)
λ2                            0.03 (0.02)            0.03 (0.02)              -0.003 (0.01)            -0.001 (0.01)
γ                             N.A                    N.A                      0.67 (0.02)              0.67 (0.02)
µ1                            N.A                    N.A                      -0.13 (0.02)             -0.13 (0.01)
µ2                            N.A                    N.A                      -0.008 (0.01)            -0.007 (0.01)
σ²                            0.09 (0.003)           0.09 (0.003)             0.05 (0.002)             0.05 (0.002)

(b) Bayesian estimation of voter participation rate

Independent variable                          Restrictive Range        Wider Range
Constant                                      0.59 (0.04)              0.59 (0.04)
Proportion of voters with college degrees     0.19 (0.02)              0.19 (0.02)
Proportion of voters with home ownership      0.50 (0.02)              0.48 (0.02)
Income per voter                              -0.08 (0.02)             -0.08 (0.02)

Key parameters
λ1                                            0.41 (0.02)              0.41 (0.02)
λ2                                            0.25 (0.02)              0.25 (0.02)
σ²                                            0.01 (0.3×10^−3)         0.01 (0.4×10^−3)

Number of MCMC iterations is 50000 for the static model, 60000 for the dynamic model and 40000 for the cross-sectional model, with the first 20% used for burn-in.
Table 12: Bayesian estimation results in empirical applications: exchange algorithm. Entries are Mean (S.D.).

(a) Bayesian estimation of spatial competition of Chinese cities: with W1nt|1 and W2nt

                              Static                                          Dynamic
Independent variable          Restrictive Range      Wider Range              Restrictive Range        Wider Range
Account revenue               0.82 (0.20)            0.81 (0.19)              0.13 (0.15)              0.13 (0.15)
City GDP                      0.25 (0.02)            0.25 (0.02)              0.13 (0.02)              0.13 (0.02)
Population                    -0.03 (0.02)           -0.03 (0.02)             -0.06 (0.02)             -0.06 (0.02)
Population density            −0.1×10^−3 (0.1×10^−3) −0.1×10^−3 (0.1×10^−3)   0.1×10^−4 (0.8×10^−3)    0.1×10^−4 (0.8×10^−4)
Secondary industry ratio      0.56 (0.19)            0.57 (0.19)              0.50 (0.16)              0.50 (0.16)
Provincial revenue            -1.39 (0.33)           -1.38 (0.33)             -0.91 (0.25)             -0.91 (0.25)
Provincial expenditure        1.27 (0.25)            1.28 (0.26)              0.88 (0.20)              0.88 (0.20)

Key parameters
λ1                            0.20 (0.02)            0.20 (0.01)              0.17 (0.02)              0.17 (0.01)
λ2                            -0.02 (0.02)           -0.02 (0.02)             -0.02 (0.02)             -0.02 (0.02)
γ                             N.A                    N.A                      0.67 (0.01)              0.67 (0.02)
µ1                            N.A                    N.A                      -0.09 (0.02)             -0.09 (0.02)
µ2                            N.A                    N.A                      -0.02 (0.02)             -0.02 (0.02)
σ²                            0.09 (0.003)           0.09 (0.003)             0.05 (0.001)             0.05 (0.001)

(b) Bayesian estimation of voter participation rate

Independent variable                          Restrictive Range        Wider Range
Constant                                      0.56 (0.04)              0.56 (0.04)
Proportion of voters with college degrees     0.16 (0.02)              0.16 (0.02)
Proportion of voters with home ownership      0.47 (0.02)              0.48 (0.02)
Income per voter                              -0.06 (0.02)             -0.06 (0.02)

Key parameters
λ1                                            0.40 (0.02)              0.40 (0.02)
λ2                                            0.30 (0.02)              0.29 (0.02)
σ²                                            0.01 (0.3×10^−3)         0.01 (0.4×10^−3)

Number of MCMC iterations is 50000 for the static model, 60000 for the dynamic model and 40000 for the cross-sectional model, with the first 20% used for burn-in.
Table 13: Estimates of key parameters with baseline algorithm: spatial competition of Chinese cities. Entries are Mean (S.D.).

                     W1nt|1 and W2nt                 W1nt|2 and W2nt
                     Restrictive     Wider           Restrictive     Wider
                     Range           Range           Range           Range
Static Panel
λ1                   0.20 (0.02)     0.20 (0.02)     0.21 (0.02)     0.20 (0.02)
λ2                   0.03 (0.02)     0.03 (0.02)     0.03 (0.02)     0.03 (0.02)
σ²                   0.09 (0.003)    0.09 (0.005)    0.09 (0.003)    0.09 (0.003)

Dynamic Panel
λ1                   0.16 (0.02)     0.16 (0.02)     0.17 (0.02)     0.16 (0.02)
λ2                   -0.003 (0.01)   -0.001 (0.01)   -0.004 (0.01)   -0.003 (0.01)
γ                    0.67 (0.01)     0.67 (0.02)     0.68 (0.02)     0.67 (0.02)
µ1                   -0.13 (0.02)    -0.13 (0.01)    -0.12 (0.02)    -0.13 (0.01)
µ2                   -0.008 (0.01)   -0.007 (0.01)   -0.01 (0.01)    -0.01 (0.01)
σ²                   0.05 (0.002)    0.05 (0.002)    0.05 (0.002)    0.05 (0.002)

Dynamic Panel with parameter in W1nt
λ1                   0.16 (0.02)     0.16 (0.02)     0.16 (0.02)     0.16 (0.02)
λ2                   -0.003 (0.01)   -0.001 (0.01)   -0.001 (0.01)   -0.002 (0.01)
γ                    0.68 (0.02)     0.68 (0.02)     0.68 (0.02)     0.67 (0.02)
µ1                   -0.12 (0.01)    -0.12 (0.02)    -0.13 (0.01)    -0.13 (0.02)
µ2                   -0.01 (0.01)    -0.01 (0.01)    -0.01 (0.01)    -0.01 (0.01)
φ1                   0.86 (0.63)     0.87 (0.65)     0.86 (0.67)     0.88 (0.65)
σ²                   0.05 (0.002)    0.05 (0.001)    0.05 (0.001)    0.05 (0.002)

Number of MCMC iterations is 50000 for the static model and 60000 for the dynamic model, with the first 20% used for burn-in.
Table 14: Estimates of key parameters with exchange algorithm: spatial competition of Chinese cities. Entries are Mean (S.D.).

                     W1nt|1 and W2nt                 W1nt|2 and W2nt
                     Restrictive     Wider           Restrictive     Wider
                     Range           Range           Range           Range
Static Panel
λ1                   0.20 (0.02)     0.20 (0.02)     0.19 (0.01)     0.19 (0.01)
λ2                   -0.02 (0.02)    -0.02 (0.02)    -0.02 (0.02)    -0.02 (0.02)
σ²                   0.09 (0.003)    0.09 (0.003)    0.09 (0.003)    0.09 (0.003)

Dynamic Panel
λ1                   0.17 (0.02)     0.17 (0.01)     0.17 (0.01)     0.16 (0.02)
λ2                   -0.02 (0.02)    -0.02 (0.02)    -0.03 (0.02)    -0.03 (0.02)
γ                    0.67 (0.01)     0.67 (0.02)     0.67 (0.02)     0.66 (0.01)
µ1                   -0.09 (0.01)    -0.09 (0.01)    -0.09 (0.01)    -0.09 (0.01)
µ2                   -0.02 (0.02)    -0.02 (0.02)    -0.02 (0.02)    -0.02 (0.02)
σ²                   0.05 (0.002)    0.05 (0.002)    0.05 (0.002)    0.05 (0.002)

Dynamic Panel with parameter in W1nt
λ1                   0.16 (0.02)     0.16 (0.02)     0.16 (0.01)     0.16 (0.01)
λ2                   -0.03 (0.02)    -0.02 (0.02)    -0.02 (0.02)    -0.02 (0.02)
γ                    0.66 (0.01)     0.67 (0.01)     0.68 (0.01)     0.66 (0.01)
µ1                   -0.10 (0.02)    -0.11 (0.02)    -0.10 (0.02)    -0.11 (0.02)
µ2                   -0.03 (0.02)    -0.02 (0.02)    -0.02 (0.02)    -0.02 (0.02)
φ1                   1.04 (0.85)     0.84 (0.80)     1.00 (0.79)     0.84 (0.74)
σ²                   0.05 (0.002)    0.05 (0.002)    0.05 (0.002)    0.05 (0.002)

Number of MCMC iterations is 50000 for the static model and 60000 for the dynamic model, with the first 20% used for burn-in.
Table 15: CPU time of different algorithms in empirical applications: estimation

(a) CPU time of different algorithms for spatial competition of Chinese cities: estimation

Static Panel                  W1nt|1 and W2nt                      W1nt|2 and W2nt
                              Restrictive range   Wider range      Restrictive range   Wider range
Baseline Algorithm            1423.3              1494.1           1386.2              1532.8
Exchange Algorithm            1168.2              1180.8           1114.9              1226.1

Dynamic Panel                 W1nt|1 and W2nt                      W1nt|2 and W2nt
                              Restrictive range   Wider range      Restrictive range   Wider range
Baseline Algorithm            2297.1              3181.5           2261.8              3283.6
Exchange Algorithm            2190.8              2931.8           2216.2              2803.7

Dynamic Panel with parameter in W1nt|i
                              W1nt|1 and W2nt                      W1nt|2 and W2nt
                              Restrictive range   Wider range      Restrictive range   Wider range
Baseline Algorithm            5551.4              7127.8           5172.6              7912.6
Exchange Algorithm            5250.4              6601.8           4792.7              6412.3

(b) CPU time of different algorithms for voter participation rate: estimation

Cross-section (W1n and W2n)   Restrictive Range   Wider Range
Baseline algorithm            6.00 × 10^4         6.07 × 10^4
Exchange algorithm            9902.4              9902.5

Number of MCMC iterations is 50000 for the static model, 60000 for the dynamic model and 40000 for the cross-sectional model, with the first 20% used for burn-in. The CPU times are in seconds. The empirical application on spatial competition of Chinese cities is run on a 2.39GHz laptop with 8GB memory; the empirical application on voter participation rate is run on a 3.6GHz desktop with 8GB memory.
Table 16: Bayesian Model selection in empirical applications

(a) Bayesian model selection of spatial competition of Chinese cities: Bayes factor by SDDR

Static           W1nt|1 and W2nt                                              W1nt|2 and W2nt
                 RR                          WR                               RR                          WR
                 BA            EA            BA            EA                 BA            EA            BA            EA
λ1 = 0           8.54×10^−111  0             1.08×10^−118  0                  1.91×10^−110  1.78×10^−174  3.27×10^−115  2.75×10^−96
λ2 = 0           12.13         4.98          12.41         12.40              11.11         16.76         11.49         8.07
λ1 = λ2          3.06×10^−9    6.02×10^−5    6.59×10^−104  7.23×10^−11        3.19×10^−120  4.74×10^−16   0             1.30×10^−70

Dynamic          W1nt|1 and W2nt                                              W1nt|2 and W2nt
                 RR                          WR                               RR                          WR
                 BA            EA            BA            EA                 BA            EA            BA            EA
λ1 = 0           0             0             0.00          0                  0.00          0             0.00          0
λ2 = 0           64.57         7.21          64.60         9.13               67.20         45.32         67.28         3.89
λ1 = λ2          0.00          0             0.00          0                  0.00          0.00          0.00          2.13×10^−50

Dynamic          W1nt|1 and W2nt                                              W1nt|2 and W2nt
W1nt(φ1)         RR                          WR                               RR                          WR
                 BA            EA            BA            EA                 BA            EA            BA            EA
λ1 = µ1 = 0      1.02×10^−4    1.51×10^−122  0             0                  0             8.70×10^−111  0             1.07×10^−29
λ2 = 0           68.85         10.14         60.66         87.83              16.74         65.89         24.04         64.91
λ1 = λ2          5.04×10^−44   0             0             4.20×10^−84        1.05×10^−76   5.87×10^−68   4.92×10^−112  4.64×10^−91

(b) Bayesian model selection of voter participation rate: Bayes factor by SDDR

Cross-section    W1n and W2n
                 RR                          WR
                 BA            EA            BA            EA
λ1 = 0           0             0             0             0
λ2 = 0           0             0             0             0
λ1 = λ2          0             0             9.76×10^−30   5.13×10^−300

Number of MCMC iterations is 50000 for the static model, 60000 for the dynamic model and 40000 for the cross-sectional model, with the first 20% used for burn-in; BA: Baseline algorithm; EA: Exchange algorithm; RR: Restrictive Range; WR: Wider Range.
Table 17: CPU time of different algorithms in empirical applications: model selection (a) CPU time of different algorithms for spatial competition of Chinese cities: model selection
W1nt|1 and W2nt
W1nt|2 and W2nt
RR
WR
RR
WR
Static BA
EA
BA
EA
BA
EA
BA
EA
λ1 = 0
5579.6
3717.4
5402.8
4135.5
5243.7
3797.0
6426.1
4364.4
λ2 = 0
5569.3
3698.3
6235.4
4496.2
5181.6
3837.0
5796.1
4135.6
λ1 = λ2
5644.9
4029.4
6194.4
5109.9
5020.8
4502.3
5508.3
4923.6
W1nt|1 and W2nt
W1nt|2 and W2nt
RR
WR
RR
WR
Dynamic BA
EA
BA
EA
BA
EA
BA
EA
λ1 = 0
7024.9
6233.3
9468.7
8487.4
7654.9
5939.9
9182.0
8830.0
λ2 = 0
7219.1
6456.8
9256.1
9262.6
6538.1
6558.8
9985.2
9198.9
λ1 = λ2
7618.7
6944.2
10715
9249.6
7056.9
6333.6
10195
9433.0
W1nt|1 and W2nt
Dynamic
W1nt|2 and W2nt
RR W1nt (φ1 )
WR
RR
WR
BA
EA
BA
EA
BA
EA
BA
EA
λ1 = µ1 = 0
1.10 × 104
1.00 × 104
1.44 × 104
1.33 × 104
1.21 × 104
1.10 × 104
1.48 × 104
1.32 × 104
λ2 = 0
1.47 × 104
1.40 × 104
2.20 × 104
2.09 × 104
1.39 × 104
1.18 × 104
1.84 × 104
1.59 × 104
λ1 = λ2
1.49 × 104
1.23 × 104
2.07 × 104
2.03 × 104
1.33 × 104
1.28 × 104
2.10 × 104
1.96 × 104
(b) CPU time of different algorithms for voter participation rate: model selection
W1n and W2n Cross-section
RR BA
WR EA
5
BA 4
2.40 × 10
EA 5
3.49 × 104
λ1 = 0
2.69 × 10
3.39 × 10
λ2 = 0
2.69 × 105
3.41 × 104
2.49 × 105
3.55 × 104
λ1 = λ2
2.47 × 105
1.19 × 104
2.50 × 105
3.46 × 104
Number of MCMC iterations is 50000 for static model, 60000 for dynamic model and 40000 for cross-sectional model, with the first 20% used for burn-in; The CPU times are in seconds; Empirical application on spatial competition of Chinese cities are run on a 2.39GHz laptop with 8GB memory; Empirical application on voter participation rate are run on a 3.6 GHz desktop with 8 GB memory; BA: Baseline algorithm; EA: Exchange algorithm; RR: Restrictive Range; WR: Wider Range. 54