Asymptotic F Test in the Presence of Nonparametric Spatial Dependence

Yixiao Sun and Min Seong Kim
Department of Economics, UC San Diego
Abstract: The paper considers M estimation and develops a new test that corrects for spatial autocorrelation. The test is based on a covariance matrix estimator that involves projecting the data onto a set of orthonormal bases and using the sample variance of the projection vectors as the covariance estimator. To obtain a more accurate asymptotic approximation, we treat the number of basis functions K as fixed. Under this specification, we show that the modified Wald statistic converges to an F-distribution. The new F test is based on this F-asymptotic theory. Simulations show that the F test is more accurate in size than the conventional $\chi^2$ test.
JEL Classification: C12; C14; C31. Keywords: F-distribution, Hotelling's T-squared distribution, robust standard error, series methods, spatial analysis, spatial autocorrelation.
1 Introduction
In this paper, we consider spatial data models with moment restrictions. A salient feature of spatial data is that the observations are statistically dependent. In searching for inference procedures that remain valid under general and unspecified dependence structures, many practical methods in econometrics now make use of heteroskedasticity and autocorrelation robust (HAR) variance estimates. The essence of the HAR procedure is to construct nonparametric variance estimators that take the dependence structure into account. See, for example, Kelejian and Prucha (2007) and Kim and Sun (2010). The most commonly used HAR variance estimator is formulated using conventional kernel smoothing techniques. Under some rate conditions, the HAR variance estimator is consistent and we obtain asymptotic normal and chi-square tests. While appealing in terms of their asymptotic properties, consistent HAR procedures do not capture the randomness of the HAR variance estimator, and the associated

Email: [email protected] and [email protected]. Correspondence to: Department of Economics, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0508. Sun gratefully acknowledges partial research support from NSF under Grant No. SES-0752443.
tests often have large size distortions, especially when the spatial dependence is high. To address the size distortion problem, Bester, Conley, Hansen and Vogelsang (2009, BCHV hereafter) consider the so-called fixed-b asymptotics, under which the truncation lag in the kernel HAR variance estimator is set equal to a fixed fraction b of the sample size. This is in contrast with the conventional asymptotics, where b goes to 0 as the sample size increases. While the resulting HAR variance estimator is inconsistent under the fixed-b asymptotics, the associated test statistic is asymptotically nuisance parameter free, and critical values can be simulated from its nonstandard asymptotic distribution. BCHV show by simulation that the nonstandard test has better size properties than the conventional normal or chi-square test. One drawback of the fixed-b asymptotics is that the asymptotic distributions of the HAR variance estimator and the associated test statistic are nonstandard, so critical values have to be obtained via simulation or bootstrap.

In this paper, we consider the class of series HAR variance estimators. The basic idea is to project the data series onto a set of basis functions designed to directly capture the sampling variation of interest. The outer product of the projection coefficients is a direct and unbiased estimator of the sampling variation. The series HAR variance estimator is simply an average of the direct estimators. By construction, the series HAR variance estimator is automatically positive semidefinite, an important property for practical use. The smoothing parameter underlying the series HAR variance estimator is the number of terms in the series expansion. As in the conventional asymptotics for series estimators, when the number of terms K goes to infinity at a slower rate than the sample size, the series HAR variance estimator is consistent and the associated Wald test statistic is asymptotically chi-square. We call this type of asymptotics the large-K asymptotics.

When K is fixed, the series HAR variance estimator is inconsistent but converges weakly to a random variable that is proportional to the true variance. This property enables us to scale the point estimators of the model parameters in such a way that the resulting Wald test statistic is asymptotically pivotal. The novelty here is that we can choose the basis functions such that the asymptotic pivotal distribution is a standard distribution. We require the basis functions to be orthonormal and to integrate to zero. The latter condition ensures that the estimation uncertainty in the model parameters does not matter asymptotically even when K is fixed. Under these two conditions, the direct estimators are asymptotically independent and identically distributed. In addition, they are asymptotically independent of the point estimators of the model parameters. As a result, the (modified) Wald statistic converges to an F-distribution. So regardless of whether K is fixed or grows with the sample size, the Wald statistic has a standard limiting distribution, being either a chi-square distribution or an F-distribution. This is very convenient in practice, as the critical values of these distributions can be obtained from statistical tables or software packages. There is no need to simulate the fixed-K asymptotic critical values. This is in contrast with the fixed-b asymptotics, where critical values have to be simulated or bootstrapped.
The fixed-K asymptotics and fixed-b asymptotics may be collectively referred to as the fixed-smoothing asymptotics, as they effectively involve smoothing over a fixed number of quantities of interest. On the other hand, the conventional large-K asymptotics where $K \to \infty$ and the small-b asymptotics where $b \to 0$ may be referred to as the increasing-smoothing asymptotics, as they involve smoothing over an increasing number of quantities. The two specifications can be viewed as different asymptotic devices for obtaining approximations to the finite sample distribution. The fixed-smoothing asymptotics does not necessarily require that we fix the smoothing parameter in finite samples. In fact, in empirical applications, the sample size is usually given beforehand and the smoothing parameter needs to be determined using a priori information and/or information obtained from the data. Very often, the selected amount of smoothing increases with the sample size, so empirical situations appear to be more compatible with the conventional increasing-smoothing asymptotics. Fortunately, we can show that these two types of asymptotics coincide when the amount of smoothing increases with the sample size. Since the F approximation captures the randomness of the HAR variance estimator, it may be more accurate than the chi-square approximation. In addition, critical values from the F distribution remain asymptotically valid under the conventional increasing-smoothing asymptotics. In view of the accuracy of the F critical values and their asymptotic validity under the increasing-smoothing asymptotics, we recommend using the F distribution as the reference distribution and call the resulting test the series F test.

The next step in using the series HAR variance estimator is to select the number of terms K in constructing the series F test. We follow standard practice and use the asymptotic mean squared error (AMSE) as the criterion to select the smoothing parameter K. The MSE-optimal K depends on unknown parameters, which can be estimated by a parametric plug-in procedure. We employ the Matérn model as the approximating parametric model. As a widely used model in spatial analysis, the Matérn model is very flexible in capturing various decay patterns of spatial dependence. Simulation studies show that the series F test is more accurate in size than the conventional chi-square test. It is also as accurate in size as the BCHV test under the fixed-b asymptotics.

The rest of the paper is organized as follows. Section 2 describes the problem at hand and introduces the series HAR variance estimator. Sections 3 and 4 establish the asymptotic properties of the HAR variance estimator and the associated F tests under the fixed-smoothing asymptotics. Section 5 presents the MSE-optimal smoothing parameter choice. The subsequent section reports simulation evidence on the performance of the new procedure. The last section provides some concluding discussion. Proofs are given in the Appendix.
2 Basic Setting and HAR Variance Estimation
We are interested in an M estimator $\hat\theta_N$ of a $d \times 1$ parameter vector $\theta_0$ that satisfies
$$\frac{1}{N}\sum_{i=1}^{N} s_i^o(\hat\theta_N) = 0,$$
where $N$ is the sample size and $s_i^o$ is a $d \times 1$ score vector such that $E(s_i^o(\theta)) = 0$ if and only if $\theta = \theta_0$. The score vector is a function of observable data indexed by $i$. We allow the score vector to exhibit general forms of spatial correlation where the strength of the correlation depends on some observable distance measure between any two observations. For simplicity, we follow Conley (1999) and assume that, given the distance measure, it is possible to map the data onto a finite-dimensional integer lattice so that the distance between pairs of observations can be expressed in terms of the lattice indices. To simplify the presentation further, we consider the two dimensional case; the extension to higher dimensions is straightforward. We suppose that the locations are indexed by $(\ell, m) \in \{1, 2, \ldots, L\} \times \{1, 2, \ldots, M\}$. We can then rewrite the sample moment condition as
$$\frac{1}{LM}\sum_{\ell=1}^{L}\sum_{m=1}^{M} 1_{\ell,m}\, s_{\ell,m}^o(\hat\theta_N) = 0, \quad (1)$$
where $s_{\ell,m}^o(\theta)$ is the score function associated with location $(\ell, m)$ and $1_{\ell,m}$ is a binary variable indicating whether an observation is available at location $(\ell, m)$.

To establish the asymptotic properties of $\hat\theta_N$, we often first show its consistency and then derive its asymptotic distribution. We assume that consistency has been proved. Define $s_{\ell,m}(\theta_0) = 1_{\ell,m}\, s_{\ell,m}^o(\theta_0)$. Under the two conventional assumptions that
$$\frac{1}{\sqrt{LM}}\sum_{\ell=1}^{L}\sum_{m=1}^{M} s_{\ell,m}(\theta_0) \to_d N(0, \Omega), \quad (2)$$
where
$$\Omega = \lim_{L,M \to \infty} \frac{1}{LM}\, E\Big(\sum_{\ell,m} s_{\ell,m}(\theta_0)\Big)\Big(\sum_{\ell,m} s_{\ell,m}(\theta_0)\Big)',$$
and
$$\sup_{\theta \in \Theta}\, \Big\| \frac{1}{LM}\sum_{\ell=1}^{L}\sum_{m=1}^{M} \frac{\partial s_{\ell,m}(\theta)}{\partial \theta'} - J(\theta) \Big\| \to_p 0, \quad (3)$$
where $\Theta$ is a small compact neighborhood around $\theta_0$, we have
$$\sqrt{LM}\,\big(\hat\theta_N - \theta_0\big) = -\Big[\frac{1}{LM}\sum_{\ell=1}^{L}\sum_{m=1}^{M} \frac{\partial s_{\ell,m}(\tilde\theta_N)}{\partial \theta'}\Big]^{-1} \frac{1}{\sqrt{LM}}\sum_{\ell=1}^{L}\sum_{m=1}^{M} s_{\ell,m}(\theta_0) \to_d N\big(0,\ J^{-1}\Omega J^{-1\prime}\big), \quad (4)$$
where $J = J(\theta_0) = E[\partial s_{\ell,m}(\theta)/\partial\theta' \,|\, \theta = \theta_0]$. In the time series literature, $\Omega$ is called the long run variance. We may refer to $\Omega$ as the global variance in the spatial setting, as it is not a variance associated with a single location but rather a variance contributed by all locations. Since $J$ can be estimated easily by its sample analog, it suffices to estimate $\Omega$ in order to conduct inference about $\theta_0$.

In this paper, we introduce a series type estimator of $\Omega$. For each $k = (k_1, k_2) \in \mathbb{Z}^+ \times \mathbb{Z}^+$, define
$$A_k = A_k(\hat\theta_N) = \frac{1}{\sqrt{LM}}\sum_{\ell=1}^{L}\sum_{m=1}^{M} \phi_{k_1,k_2}\Big(\frac{\ell}{L}, \frac{m}{M}\Big)\, s_{\ell,m}(\hat\theta_N)$$
for some basis function $\phi_{k_1,k_2}(\cdot, \cdot)$ that may be complex. Construct the direct estimator
$$\hat\Omega_k = \mathrm{Re}\,(A_k A_k^*),$$
where $A_k^*$ is the conjugate transpose of $A_k$. Taking a simple average of the direct estimators yields a new estimator:
$$\hat\Omega = \frac{1}{K}\sum_{k \in \mathcal{K}} \hat\Omega_k,$$
where
$$\mathcal{K} = \{0, 1, \ldots, K_1\} \times \{0, 1, \ldots, K_2\} \setminus \{(0,0)\},$$
$K_1$ and $K_2$ are smoothing parameters, and $K = K_1 K_2 + K_1 + K_2$ is the number of elements in $\mathcal{K}$. The larger $K$ is, the larger the amount of smoothing is. In the definition of $\hat\Omega$, we have explicitly excluded the case $k_1 = k_2 = 0$. We do so because we anticipate $\phi_{(0,0)} = 1$ for some choice of $\phi$. When $\phi_{(0,0)}(r, s) = 1$ for all $r$ and $s$, we have $A_{(0,0)} = 0$ by the definition of the estimator $\hat\theta_N$. If we do not exclude this case and define
$$\check\Omega = \frac{1}{(K_1+1)(K_2+1)}\sum_{k_1=0}^{K_1}\sum_{k_2=0}^{K_2} \hat\Omega_k = \frac{K_1 K_2 + K_1 + K_2}{(K_1+1)(K_2+1)}\,\hat\Omega,$$
then $\check\Omega$ may be asymptotically biased while $\hat\Omega$ is asymptotically unbiased. The above series estimator has been considered in the time series setting by Phillips (2005) and Sun (2010a,c), who provide more discussion on this class of estimators.
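To fix ideas, the estimator $\hat\Omega$ can be computed directly from an array of scores. The following Python sketch (the function name and array layout are ours, not from the paper) averages the direct estimators, here using the complex-exponential basis that the paper focuses on later:

```python
import numpy as np

def series_har_variance(scores, K1, K2):
    """Series HAR variance estimate from an L x M x d array of scores.

    Averages the direct estimators Re(A_k A_k*) over
    k in {0..K1} x {0..K2} \ {(0,0)}, with the basis
    phi_k(r, s) = exp[-i(2*pi*k1*r + 2*pi*k2*s)].
    """
    L, M, d = scores.shape
    l = np.arange(1, L + 1) / L          # l/L
    m = np.arange(1, M + 1) / M          # m/M
    omega = np.zeros((d, d))
    count = 0
    for k1 in range(K1 + 1):
        for k2 in range(K2 + 1):
            if k1 == 0 and k2 == 0:
                continue                  # phi_(0,0) = 1 would give A_k = 0
            phase = np.exp(-1j * 2 * np.pi * (k1 * l[:, None] + k2 * m[None, :]))
            A_k = (phase[:, :, None] * scores).sum(axis=(0, 1)) / np.sqrt(L * M)
            omega += np.real(np.outer(A_k, A_k.conj()))
            count += 1
    return omega / count                  # count = K1*K2 + K1 + K2
```

By construction the result is symmetric and positive semidefinite, since each direct estimator $\mathrm{Re}(A_k A_k^*)$ is.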
3 Asymptotic Properties of the HAR Variance Estimator
In this section, we investigate the asymptotic properties of the global variance estimator $\hat\Omega$ under the specification that $K_1$ and $K_2$ are fixed. In this case, with appropriately chosen basis functions, $\hat\Omega$ converges to a random variable with a Wishart distribution.
We maintain the following assumptions.

Assumption 1. $K_1$ and $K_2$ are fixed as $L \to \infty$ and $M \to \infty$ such that $L/N \to 0$ and $M/N \to 0$.

Assumption 2. $\hat\theta_N \to_p \theta_0$.

Assumption 3. The functional central limit theorem
$$\frac{1}{\sqrt{LM}}\sum_{\ell=1}^{[rL]}\sum_{m=1}^{[sM]} s_{\ell,m}(\theta_0) \to_d \Lambda\, W(r, s) \quad (5)$$
holds for all $(r, s) \in [0,1]^2$, where $\Lambda$ is the matrix square root of $\Omega$ and $W(r, s) = (W_1(r, s), \ldots, W_d(r, s))'$ is a $d$-dimensional independent Brownian sheet process with covariance given by
$$\mathrm{cov}\big(W_i(r_1, s_1),\ W_j(r_2, s_2)\big) = \delta_{ij}\, \min(r_1, r_2)\min(s_1, s_2).$$

Assumption 4. The uniform law of large numbers
$$\sup_{\theta \in \Theta}\, \Big\| \frac{1}{LM}\sum_{\ell=1}^{[rL]}\sum_{m=1}^{[sM]} \frac{\partial s_{\ell,m}(\theta)}{\partial\theta'} - rs\, J(\theta) \Big\| \to_p 0$$
holds and $J(\theta)$ is a nonsingular matrix for all $\theta \in \Theta$.
Assumption 1 imposes non-degeneracy in that neither dimension is dominant. Assumption 2 is made for convenience; it can be proved under more primitive assumptions using standard arguments. Assumptions 3 and 4 are the same as in BCHV (2009). Technical conditions for the FCLT can be found in Deo (1975) and Goldie and Greenwood (1986), among others. Under the above assumptions, it is easy to see that the asymptotic distribution in (4) can be written in terms of the Brownian sheet process:
$$\sqrt{LM}\,\big(\hat\theta_N - \theta_0\big) \to_d -J^{-1}\Lambda\, W(1, 1). \quad (6)$$
With this result, we have
$$\frac{1}{\sqrt{LM}}\sum_{\ell=1}^{[rL]}\sum_{m=1}^{[sM]} s_{\ell,m}(\hat\theta_N) = \frac{1}{\sqrt{LM}}\sum_{\ell=1}^{[rL]}\sum_{m=1}^{[sM]} s_{\ell,m}(\theta_0) + \Big[\frac{1}{LM}\sum_{\ell=1}^{[rL]}\sum_{m=1}^{[sM]} \frac{\partial s_{\ell,m}(\tilde\theta_N)}{\partial\theta'}\Big]\sqrt{LM}\,\big(\hat\theta_N - \theta_0\big)$$
$$\to_d \Lambda\, W(r, s) - rs\, J J^{-1}\Lambda\, W(1, 1) =_d \Lambda\,\big[W(r, s) - rs\, W(1, 1)\big] := \Lambda\, B(r, s),$$
where $\tilde\theta_N$ lies between $\hat\theta_N$ and $\theta_0$, and $B(r, s)$ is a $d$-dimensional tied-down Brownian sheet.
Theorem 1. Let Assumptions 1-4 hold. Suppose $\phi_k(\cdot, \cdot)$ is twice continuously differentiable. Then $A_k \to_d \Lambda \xi_k$ jointly for $k \in \mathcal{K}$, and
$$\hat\Omega_k \to_d \Lambda\, \mathrm{Re}\,(\xi_k \xi_k^*)\, \Lambda', \qquad \hat\Omega \to_d \Lambda \Big[\frac{1}{K}\sum_{k \in \mathcal{K}} \mathrm{Re}\,(\xi_k \xi_k^*)\Big] \Lambda',$$
where
$$\xi_k = \int_0^1\!\!\int_0^1 \tilde\phi_k(r, s)\, dW(r, s), \qquad \tilde\phi_k(r, s) = \phi_k(r, s) - \int_0^1\!\!\int_0^1 \phi_k(u, v)\, du\, dv.$$

Note that
$$E\, \mathrm{Re}\,(\xi_k \xi_k^*) = \Big[\int_0^1\!\!\int_0^1 \big|\tilde\phi_k(r, s)\big|^2\, dr\, ds\Big] I_d.$$
So if $\int_0^1\int_0^1 |\tilde\phi_k(r, s)|^2\, dr\, ds = 1$, then $\hat\Omega_k$ is asymptotically unbiased; as a result, $\hat\Omega$ is asymptotically unbiased. The asymptotic unbiasedness provides the basis for constructing test statistics that are pivotal under the fixed-K asymptotics.

In general, the fixed-K asymptotic distribution of $\hat\Omega$ is nonstandard, which is not convenient for practical use. To simplify the asymptotic distribution, we can choose the basis functions such that $\int_0^1\int_0^1 \phi_k(u, v)\, du\, dv = 0$. In this case,
$$\xi_k = \int_0^1\!\!\int_0^1 \phi_k(r, s)\, dW(r, s).$$
This 'zero mean' assumption ensures that the estimation uncertainty in $\hat\theta_N$ will not affect the asymptotic distributions of $\hat\Omega_k$ and $\hat\Omega$. This is an important point, as conventional kernel estimators often suffer from a bias due to the estimation error in $\hat\theta_N$.

To simplify the nonstandard distribution further, we first consider the case that $\phi_k(r, s)$ is a real function. We choose $\phi_k(r, s)$ such that the $\xi_k$'s become independent normal vectors and, as a result, $\sum_{k \in \mathcal{K}} \xi_k \xi_k'$ follows a Wishart distribution. To fulfill this requirement, we need the $\phi_k(r, s)$ to be orthonormal.

Theorem 2. Let Assumptions 1-4 hold. Suppose
(i) $\phi_k(\cdot, \cdot)$ is a twice continuously differentiable real function,
(ii) $\int_0^1\int_0^1 \phi_k(r, s)\, dr\, ds = 0$,
(iii) $\{\phi_k(r, s)\}$ are orthonormal in $L^2([0,1] \times [0,1])$.
Then $\hat\Omega_k \to_d \Lambda \xi_k \xi_k' \Lambda'$ and $\hat\Omega \to_d \Lambda \eta \Lambda'/K$, where $\xi_k \sim$ iid $N(0, I_d)$, $\eta = \sum_{k \in \mathcal{K}} \xi_k \xi_k' \sim \mathbb{W}(I_d, K)$, and $\mathbb{W}(\cdot, \cdot)$ is a Wishart distribution.

There are several possible choices for $\phi_k(\cdot, \cdot)$. First, we can start with zero mean polynomials such as $r - 1/2$, $r^2 - 1/3$, $r^3 - 1/4$, $r^4 - 1/5$ and use the Gram-Schmidt procedure to orthonormalize them.
Figure 1: Graphs of the orthonormal polynomials $\phi_0(r)$, $\phi_1(r)$, $\phi_2(r)$, $\phi_3(r)$ on $[0, 1]$
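The orthonormalized polynomials that result (listed next) have mean zero, unit norm, and are mutually orthogonal on $[0, 1]$. As a numerical sanity check (our own illustration, not part of the paper), these properties can be verified by quadrature on a fine midpoint grid:

```python
import numpy as np

# The first four Gram-Schmidt orthonormalized (shifted Legendre) polynomials.
phis = [
    lambda r: np.sqrt(12) * (r - 1/2),
    lambda r: np.sqrt(180) * (r**2 - r + 1/6),
    lambda r: np.sqrt(2800) * (r**3 - 1.5 * r**2 + 0.6 * r - 0.05),
    lambda r: 210 * (r**4 - 2 * r**3 + (9/7) * r**2 - (2/7) * r + 1/70),
]
r = (np.arange(200_000) + 0.5) / 200_000     # midpoint grid on [0, 1]
for i, f in enumerate(phis):
    assert abs(np.mean(f(r))) < 1e-6                 # zero mean
    assert abs(np.mean(f(r)**2) - 1.0) < 1e-6        # unit L2 norm
    for g in phis[:i]:
        assert abs(np.mean(f(r) * g(r))) < 1e-6      # orthogonality
```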
This gives, for example, the first few polynomials:
$$\phi_0(r) = \sqrt{12}\,\Big(r - \frac{1}{2}\Big), \qquad \phi_1(r) = \sqrt{180}\,\Big(r^2 - r + \frac{1}{6}\Big),$$
$$\phi_2(r) = \sqrt{2800}\,\Big(r^3 - \frac{3}{2}r^2 + \frac{3}{5}r - \frac{1}{20}\Big), \qquad \phi_3(r) = 210\,\Big(r^4 - 2r^3 + \frac{9}{7}r^2 - \frac{2}{7}r + \frac{1}{70}\Big).$$
These polynomials are graphed in Figure 1. Obviously, they look like wave functions, so it is not surprising that they can be used to capture the low frequency components of a stochastic process. Let $\phi_k(r, s) = \phi_{k_1}(r)\phi_{k_2}(s)$ be the tensor product of the above basis functions; then $\phi_k(r, s)$ satisfies conditions (i)-(iii) in Theorem 2.

For the second choice, we can let $\phi_k(r, s) = \sqrt{2}\cos(2\pi k_1 r + 2\pi k_2 s)$ or $\phi_k(r, s) = \sqrt{2}\sin(2\pi k_1 r + 2\pi k_2 s)$. It is easy to check that the sine and cosine functions satisfy conditions (i)-(iii) in Theorem 2. Following Phillips (2005), one may consider using $\phi_k(r, s) = \sqrt{2}\sin(\pi(k_1 - 0.5)r + \pi(k_2 - 0.5)s)$, $\sqrt{2}\cos(\pi k_1 r + \pi k_2 s)$, or $\sqrt{2}\sin(\pi k_1 r + \pi k_2 s)$. However, these functions do not satisfy the zero mean condition.

We proceed to consider the case that $\phi_k(r, s)$ is a complex function. We choose $\phi_k(r, s)$ such that $\xi_k$ becomes a circular complex normal vector and $\sum_{k \in \mathcal{K}} \xi_k \xi_k^*$ follows a complex Wishart distribution. The complex Wishart distribution has been considered by Goodman (1963); for moments of this distribution, see Graczyk, Letac, and Massam (2003). Since our interest lies in the real part of $\xi_k \xi_k^*$, we may write
$$\xi_k = G_k + i H_k$$
and examine
$$\sum_{k \in \mathcal{K}} \mathrm{Re}\,(\xi_k \xi_k^*) = \sum_{k \in \mathcal{K}} \big(G_k G_k' + H_k H_k'\big)$$
directly. To ensure that $\sum_{k \in \mathcal{K}} \mathrm{Re}\,(\xi_k \xi_k^*)$ follows a Wishart distribution, we need $\{G_k, H_k\}$ to be iid normal. That is, $\sqrt{2}\,\mathrm{Re}\,\phi_k(r, s)$ and $\sqrt{2}\,\mathrm{Im}\,\phi_k(r, s)$, $k \in \mathcal{K}$, are orthonormal in $L^2([0,1] \times [0,1])$. It follows from Theorem 2 that Corollary 3 below holds.

Corollary 3. Let Assumptions 1-4 hold. Suppose
(i) $\phi_k(\cdot, \cdot)$ is twice continuously differentiable,
(ii) $\int_0^1\int_0^1 \phi_k(r, s)\, dr\, ds = 0$,
(iii) $\sqrt{2}\,\mathrm{Re}\,\phi_k(r, s)$ and $\sqrt{2}\,\mathrm{Im}\,\phi_k(r, s)$, $k \in \mathcal{K}$, are orthonormal in $L^2([0,1] \times [0,1])$.
Then $\hat\Omega_k \to_d \Lambda \eta_k \Lambda'$ and $\hat\Omega \to_d \Lambda \eta \Lambda'/K$, where $\eta_k \sim$ iid $\mathbb{W}(I_d/2, 2)$ and $\eta = \sum_{k \in \mathcal{K}} \eta_k \sim \mathbb{W}(I_d/2, 2K)$.
Corollary 3 shows that the $\hat\Omega_k$ converge to independent Wishart distributions. Under the orthonormality assumption, we have $E\eta_k = I_d$, so $\hat\Omega_k$ is asymptotically unbiased. As a result, $\hat\Omega$ is also asymptotically unbiased.

A natural choice for a complex valued $\phi_k(\cdot, \cdot)$ is the complex exponential of the form
$$\phi_k(r, s) = \exp\left[-i\,(2\pi k_1 r + 2\pi k_2 s)\right].$$
It is easy to check that all conditions in Corollary 3 hold for this choice. In this case, $A_k$ becomes
$$A_k = \frac{1}{\sqrt{LM}}\sum_{\ell=1}^{L}\sum_{m=1}^{M} \exp\Big[-i\Big(\frac{2\pi k_1}{L}\ell + \frac{2\pi k_2}{M}m\Big)\Big]\, s_{\ell,m}(\hat\theta_N),$$
which is the finite Fourier transform (FFT) of the spatial process $s_{\ell,m}(\hat\theta_N)$. In general, we can define
$$A(\omega_1, \omega_2) = \frac{1}{\sqrt{LM}}\sum_{\ell=1}^{L}\sum_{m=1}^{M} \exp\left[-i\,(\omega_1 \ell + \omega_2 m)\right] s_{\ell,m}(\hat\theta_N)$$
for any $(\omega_1, \omega_2) \in [0, 2\pi] \times [0, 2\pi]$ and let
$$I(\omega_1, \omega_2) = A(\omega_1, \omega_2)\, A^*(\omega_1, \omega_2).$$
$I(\omega_1, \omega_2)$ is the periodogram matrix indexed by two dimensions. The direct estimator $\hat\Omega_k$ is then equal to the real part of the periodogram matrix evaluated at $(\omega_1, \omega_2) = (2\pi k_1/L,\ 2\pi k_2/M)$. The global variance estimator $\hat\Omega$ is just a simple average of the periodogram matrices:
$$\hat\Omega = \mathrm{Re}\,\frac{1}{K}\sum_{k \in \mathcal{K}} I\Big(\frac{2\pi k_1}{L}, \frac{2\pi k_2}{M}\Big). \quad (7)$$
In the time series setting, Hannan (1979, p. 275) called this type of estimator the FFT estimator. Among the four sets of choices, the sine, cosine and complex exponential bases are easier to use than the polynomial bases, as they provide a simple infinite sequence of basis functions. The sine and cosine bases are not complete, while the complex exponential bases are complete. In order to capture the spatial dependence with a finite number of basis functions, it is advantageous to use complete orthonormal bases. Hence we focus on the global variance estimator with $\phi_k(r, s) = \exp[-i(2\pi k_1 r + 2\pi k_2 s)]$ hereafter.
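In practice, the estimator in (7) can be computed with a two-dimensional FFT. A sketch (the function name and array layout are ours): numpy's `fft2` sums from index 0 while the paper sums over $\ell = 1, \ldots, L$, but the resulting unimodular phase factor cancels in $A_k A_k^*$, so the periodograms agree.

```python
import numpy as np

def fft_har_variance(scores, K1, K2):
    """Estimator (7): average of periodogram matrices at the first K1 x K2
    Fourier frequencies, computed via a 2-D FFT of the L x M x d score field."""
    L, M, d = scores.shape
    # fft2 uses exp(-2*pi*i*k*n/L) with n = 0..L-1; the paper's l = 1..L
    # indexing only multiplies A_k by a modulus-one constant, which cancels.
    A = np.fft.fft2(scores, axes=(0, 1)) / np.sqrt(L * M)
    omega = np.zeros((d, d))
    for k1 in range(K1 + 1):
        for k2 in range(K2 + 1):
            if (k1, k2) == (0, 0):
                continue
            a = A[k1, k2]
            omega += np.real(np.outer(a, a.conj()))
    return omega / (K1 * K2 + K1 + K2)
```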
4 Spatial Correlation Robust Hypothesis Testing
In this section, we use $\hat\Omega$ to perform standard tests and derive the asymptotic distributions of the test statistics for fixed smoothing parameters $K_1$ and $K_2$. The null hypothesis of interest is $H_0: R\theta_0 = r$ and the alternative is $H_1: R\theta_0 \neq r$, where $R$ is a $q \times d$ matrix. The usual Wald statistic for testing $H_0$ against $H_1$ is given by
$$W_N = \Big[\sqrt{LM}\,\big(R\hat\theta_N - r\big)\Big]'\, \Big[R\, \hat{J}^{-1} \hat\Omega\, \hat{J}^{-1\prime} R'\Big]^{-1}\, \Big[\sqrt{LM}\,\big(R\hat\theta_N - r\big)\Big],$$
where $\hat{J} = (LM)^{-1}\sum_{\ell=1}^{L}\sum_{m=1}^{M} \partial s_{\ell,m}(\hat\theta_N)/\partial\theta'$. When $q = 1$, we can construct the usual t-statistic
$$t_N = \frac{\sqrt{LM}\,\big(R\hat\theta_N - r\big)}{\big[R\, \hat{J}^{-1} \hat\Omega\, \hat{J}^{-1\prime} R'\big]^{1/2}} = \sqrt{W_N}.$$
Since the limiting result in (6) and the weak convergence results in Theorem 2 and Corollary 3 hold jointly, we have
$$W_N \to_d \big[R J^{-1}\Lambda\, W(1,1)\big]'\, \Big[R J^{-1}\Lambda \Big(\frac{1}{K}\sum_{k \in \mathcal{K}} \mathrm{Re}\,(\xi_k \xi_k^*)\Big) \Lambda' J^{-1\prime} R'\Big]^{-1}\, \big[R J^{-1}\Lambda\, W(1,1)\big].$$
Let
$$R J^{-1}\Lambda\, W(r, s) =_d \tilde{R}\, \tilde{W}_q(r, s)$$
for some $q \times q$ matrix $\tilde{R}$ and $q$-dimensional independent Wiener process $\tilde{W}_q(r, s)$. Then
$$W_N \to_d \zeta'\, \Big[\frac{1}{K}\sum_{k \in \mathcal{K}} \mathrm{Re}\,(\xi_k \xi_k^*)\Big]^{-1} \zeta,$$
where
$$\zeta = W_q(1, 1), \qquad \xi_k = \int_0^1\!\!\int_0^1 \phi_k(r, s)\, dW_q(r, s).$$
Here, for convenience, we have written $\tilde{W}_q(r, s)$ as $W_q(r, s)$, and we will do so in the rest of the paper. Since both $\zeta$ and $\xi_k$ are normal and
$$\mathrm{cov}(\zeta, \xi_k) = \Big[\int_0^1\!\!\int_0^1 \phi_k(r, s)\, dr\, ds\Big] I_q = 0,$$
we know that $\zeta$ is independent of $\xi_k$. To understand the limiting distribution of $W_N$, we note that
$$\sum_{k \in \mathcal{K}} \mathrm{Re}\,(\xi_k \xi_k^*) = \sum_{k \in \mathcal{K}} \big(G_k G_k' + H_k H_k'\big) \sim \mathbb{W}(I_q/2,\ 2K).$$
As a result,
$$\zeta'\, \Big[\frac{1}{K}\sum_{k \in \mathcal{K}} \mathrm{Re}\,(\xi_k \xi_k^*)\Big]^{-1} \zeta = \zeta'\, \Big[\frac{1}{2K}\sum_{k \in \mathcal{K}} \Big(\big(\sqrt{2}G_k\big)\big(\sqrt{2}G_k\big)' + \big(\sqrt{2}H_k\big)\big(\sqrt{2}H_k\big)'\Big)\Big]^{-1} \zeta \sim T^2(q, 2K),$$
where $T^2(q, 2K)$ is Hotelling's T-squared distribution. Using the well-known relationship between the T-squared distribution and the F-distribution, we have
$$\frac{2K - q + 1}{2qK}\, W_N \to_d F_{q,\ 2K - q + 1}.$$
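For concreteness, here is a sketch of how the resulting test could be carried out given point estimates, the estimated Jacobian, and the series variance estimate (the function name and all inputs are hypothetical placeholders, not the paper's notation):

```python
import numpy as np
from scipy import stats

def series_f_test(theta_hat, J_hat, omega_hat, R, r, N, K, alpha=0.05):
    """Modified Wald statistic with F_{q, 2K-q+1} critical value,
    where K = K1*K2 + K1 + K2 and N is the sample size."""
    q = R.shape[0]
    Jinv = np.linalg.inv(J_hat)
    V = R @ Jinv @ omega_hat @ Jinv.T @ R.T        # R J^-1 Omega_hat J^-1' R'
    diff = R @ theta_hat - r
    wald = N * diff @ np.linalg.solve(V, diff)     # usual Wald statistic
    f_stat = (2 * K - q + 1) / (2 * q * K) * wald  # scaled (modified) statistic
    crit = stats.f.ppf(1 - alpha, q, 2 * K - q + 1)
    return f_stat, crit, bool(f_stat > crit)
```

With $q = 1$, the same test can equivalently be based on the t-statistic and the $t_{2K}$ distribution.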
When $q = 1$, the above result reduces to $t_N \to_d t_{2K}$; that is, the t-statistic converges to the t-distribution with $2K$ degrees of freedom. We have therefore shown that under the fixed-K asymptotics, the scaled Wald statistic converges weakly to an F distribution and the t-statistic converges to a t-distribution. These results are very handy, as critical values from the F distribution or the t distribution can easily be obtained from statistical tables or standard software packages.

Next, we consider the power of the tests under the local alternative hypothesis
$$H_1(c): R\theta_0 = r + \frac{c}{\sqrt{LM}}, \qquad c = \big[R J^{-1}\Omega J^{-1\prime} R'\big]^{1/2}\, \tilde{c} \quad (8)$$
for some $q \times 1$ vector $\tilde{c}$. We have
$$\frac{2K - q + 1}{2qK}\, W_N \to_d \frac{2K - q + 1}{2qK}\,(\zeta + \tilde{c})'\, \Big[\frac{1}{K}\sum_{k \in \mathcal{K}} \mathrm{Re}\,(\xi_k \xi_k^*)\Big]^{-1} (\zeta + \tilde{c}) := F_{q,\ 2K - q + 1}\big(\lambda^2\big),$$
a noncentral F distribution with degrees of freedom $(q,\ 2K - q + 1)$ and noncentrality parameter
$$\lambda^2 = \tilde{c}'\tilde{c} = c'\big[R J^{-1}\Omega J^{-1\prime} R'\big]^{-1} c.$$
This result follows from Proposition 8.2 in Bilodeau and Brenner (1999), where the notation $F_c$ denotes the canonical F distribution (Bilodeau and Brenner (1999), page 42). Similarly, the t-statistic converges to the noncentral t distribution with degrees of freedom $2K$ and noncentrality parameter $\lambda = \big[R J^{-1}\Omega J^{-1\prime} R'\big]^{-1/2} c = \tilde{c}$. We collect the above results in the following theorem.

Theorem 4. Let the assumptions in Corollary 3 hold. Then
$$\frac{2K - q + 1}{2qK}\, W_N \to_d F_{q,\ 2K - q + 1} \quad \text{under the null } H_0,$$
$$\frac{2K - q + 1}{2qK}\, W_N \to_d F_{q,\ 2K - q + 1}\big(\lambda^2\big) \quad \text{under the local alternative } H_1(c).$$

Theorem 4 shows that the finite sample distribution of $W_N$ can be approximated by that of
$$\frac{2K}{2K - q + 1} \cdot \frac{\chi_q^2}{\chi_{2K - q + 1}^2 / (2K - q + 1)}.$$
As $K \to \infty$, both $\chi_{2K-q+1}^2/(2K - q + 1)$ and $2K/(2K - q + 1)$ converge to one. As a result, the above limiting distribution reduces to $\chi_q^2$, the conventional asymptotic approximation. A direct implication is that critical values obtained from the F-approximation are asymptotically valid under the conventional asymptotics where $K \to \infty$ with the sample size. However, when $K$ is not very large or the number of restrictions $q$ is large, the F approximation can be very different from the $\chi^2$ approximation. Since both the random denominator $\chi_{2K-q+1}^2/(2K - q + 1)$ and the proportionality factor $2K/(2K - q + 1)$ shift probability mass to the right, critical values based on the F approximation are larger than those based on the $\chi^2$ approximation. Let
$$F_N = \frac{2K - q + 1}{2qK}\, W_N$$
be the modified Wald statistic. We call the test based on the statistic $F_N$ and critical values from $F_{q,\ 2K - q + 1}$ the series F test. Correspondingly, we call the test based on the statistic $W_N$ and critical values from $\chi_q^2$ the series $\chi^2$ test.
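To illustrate the last point numerically (our own illustration): the critical value implied for $W_N$ by the F reference, $\frac{2qK}{2K-q+1}\,F_{q,2K-q+1}^{0.95}$, exceeds the $\chi_q^2$ critical value for every finite $K$ and shrinks toward it as $K$ grows:

```python
from scipy import stats

q = 2
chi_crit = stats.chi2.ppf(0.95, q)            # approximately 5.99
implied = [2 * q * K / (2 * K - q + 1) * stats.f.ppf(0.95, q, 2 * K - q + 1)
           for K in (3, 9, 24, 100)]
assert all(c > chi_crit for c in implied)     # F-based critical values are larger
assert implied[0] > implied[-1]               # and approach chi-square as K grows
```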
5 Smoothing Parameter Choice
In this section, we assume that $K_1/L = K_2/M = \tau$ and employ the conventional mean square error criterion to select the smoothing parameter $\tau$. We define $\tilde\Omega$ as the pseudo-estimator that is identical to $\hat\Omega$ but is based on the true parameter $\theta_0$ instead of $\hat\theta_N$. That is,
$$\tilde\Omega = \frac{1}{K}\sum_{k \in \mathcal{K}} \mathrm{Re}\,\big[A_k(\theta_0)\, A_k^*(\theta_0)\big],$$
where
$$A_k(\theta_0) = \frac{1}{\sqrt{LM}}\sum_{\ell=1}^{L}\sum_{m=1}^{M} \exp\Big[-i\Big(\frac{2\pi k_1}{L}\ell + \frac{2\pi k_2}{M}m\Big)\Big]\, s_{\ell,m}(\theta_0).$$
Under some technical assumptions, the effect of using $\hat\theta_N$ instead of $\theta_0$ is asymptotically negligible. These assumptions can be found in Kim and Sun (2010) and are omitted here for brevity. Let
$$f_{LM}(\omega_1, \omega_2) = \mathrm{Re}\,\frac{1}{LM}\sum_{\ell_1=1}^{L}\sum_{\ell_2=1}^{L}\sum_{m_1=1}^{M}\sum_{m_2=1}^{M} E\, s_{\ell_1,m_1}(\theta_0)\, s_{\ell_2,m_2}'(\theta_0)\, \exp\big(-i\left[\omega_1(\ell_1 - \ell_2) + \omega_2(m_1 - m_2)\right]\big),$$
and assume that $\lim_{L,M\to\infty} f_{LM}(\omega_1, \omega_2)$ exists. Denote
$$f(\omega_1, \omega_2) = \lim_{L,M\to\infty} f_{LM}(\omega_1, \omega_2).$$
Then
$$E\tilde\Omega = \frac{1}{K}\sum_{k \in \mathcal{K}} f_{LM}\Big(\frac{2\pi k_1}{L}, \frac{2\pi k_2}{M}\Big).$$
Given a $d^2 \times d^2$ weighting matrix $V$, we define the MSE criterion as
$$\mathrm{MSE}(\tilde\Omega; V) = E\Big\{\big[\mathrm{vec}(\tilde\Omega - \Omega)\big]'\, V\, \mathrm{vec}(\tilde\Omega - \Omega)\Big\},$$
where $\mathrm{vec}(\cdot)$ is the column-by-column vectorization function. The following theorem gives the MSE of $\tilde\Omega$.

Theorem 5. As $L \to \infty$ and $M \to \infty$ such that $\tau \to 0$, assume
(i) $f_{LM}(\omega_1, \omega_2) = f(\omega_1, \omega_2) + o(1/LM)$ uniformly over $\omega_1$ and $\omega_2$ in a neighborhood around the origin,
(ii) $f(\omega_1, \omega_2)$ is twice continuously differentiable and $f(\omega_1, \omega_2) = f(-\omega_1, -\omega_2)$,
(iii) $\mathrm{var}(\mathrm{vec}(\tilde\Omega)) = \mathrm{var}\big[\mathrm{vec}(\Lambda \eta \Lambda'/K)\big]\,(1 + o(1))$.
Then, as $L \to \infty$ and $M \to \infty$ such that $\tau \to 0$,
(a) $\mathrm{var}(\mathrm{vec}(\tilde\Omega)) = (\Omega \otimes \Omega)(I_{d^2} + K_{dd})/(2K)\,(1 + o(1))$;
(b) $E\tilde\Omega - \Omega = B\tau^2 + o(\tau^2)$, where
$$B = \pi^2 f_{12}''(0, 0) + \frac{2\pi^2}{3} f_{11}''(0, 0) + \frac{2\pi^2}{3} f_{22}''(0, 0);$$
(c) $\mathrm{MSE}(\tilde\Omega; V) = \Big\{\tau^4\,[\mathrm{vec}(B)]'\, V\, \mathrm{vec}(B) + \frac{1}{2\tau^2 LM}\,\mathrm{tr}\big[V (\Omega \otimes \Omega)(I_{d^2} + K_{dd})\big]\Big\}(1 + o(1))$,
where $K_{dd}$ is the $d^2 \times d^2$ commutation matrix.
We make high level assumptions in Theorem 5 in order to facilitate the proofs. The assumptions can be replaced by more primitive conditions, but at the cost of lengthy proofs. For example, Assumptions (i) and (ii) require that the spatial dependence between two units decay fast enough as their distance increases. When $K_1$ and $K_2$ are fixed, it is easy to show under Assumption 3 that $\tilde\Omega \to_d \Lambda\eta\Lambda'/K$, so Assumption (iii) strengthens the weak convergence to quadratic mean convergence.

Theorem 5 shows that, as $K$ increases, the variance of $\tilde\Omega$ decreases and the bias of $\tilde\Omega$ increases. According to the MSE criterion, there is an opportunity to select the smoothing parameter to balance the squared bias and the variance. It is easy to see that the MSE-optimal $\tau$ is
$$\tau_N^* = \Big\{\frac{\mathrm{tr}\big[V(\Omega \otimes \Omega)(I_{d^2} + K_{dd})\big]}{4\,[\mathrm{vec}(B)]'\, V\, \mathrm{vec}(B)}\Big\}^{1/6}\, N^{-1/6}.$$
The above formula is consistent with Kim and Sun (2010), who consider kernel estimators of $\Omega$. For second order kernels and a two-dimensional regular lattice structure, the optimal bandwidth is
$$d_N^\star = \Big\{\frac{4\mu_2^2\,[\mathrm{vec}(B)]'\, V\, \mathrm{vec}(B)}{\mathrm{tr}\big[V(\Omega \otimes \Omega)(I_{d^2} + K_{dd})\big]}\Big\}^{1/6}\, N^{1/6} \quad (9)$$
using the notation in this paper, where $\mu_2$ is related to the kernel function used. When $\mu_2 = 1$, $\tau_N^* = 1/d_N^\star$.

To implement the optimal $\tau$, we use an approximating parametric model to capture the spatial dependence. There are two classes of parametric models commonly used in the literature. The first is to model the process itself. This approach is based on the work of Cliff and Ord (1981) and requires the use of a weight matrix; Kim and Sun (2010) use this approach. The second approach is to model the covariance structure directly: rather than starting with the process and deriving the covariance matrix, a functional form for the covariance structure is assumed and its parameters are estimated. Here we use the second approach, as it does not require a weight matrix. We employ the flexible class of Matérn models as the approximating covariance model. The covariance function is
$$C_i(h) = E\, s_{\ell_1,m_1}^{(i)}(\theta_0)\, s_{\ell_2,m_2}^{(i)}(\theta_0) = \frac{\sigma_i^2}{2^{\nu_i - 1}\,\Gamma(\nu_i + 1)}\Big(\frac{\|h\|}{\rho_i}\Big)^{\nu_i} K_{\nu_i}\Big(\frac{\|h\|}{\rho_i}\Big),$$
where $\nu_i > 0$, $\rho_i > 0$, $h = (\ell_1 - \ell_2,\ m_1 - m_2)$, and $K_{\nu_i}$ is a modified Bessel function (Abramowitz and Stegun, 1965, pp. 374-379). When $\nu_i = m + 1/2$ for a nonnegative integer $m$, the autocovariance function is of the form $e^{-\|h\|/\rho_i}$ times a polynomial in $\|h\|$ of degree $m$. The corresponding variogram is given by
$$\gamma_i(h) = C_i(0) - C_i(h) = c_i\Big[1 - \frac{1}{2^{\nu_i - 1}\,\Gamma(\nu_i)}\Big(\frac{\|h\|}{\rho_i}\Big)^{\nu_i} K_{\nu_i}\Big(\frac{\|h\|}{\rho_i}\Big)\Big]$$
with $c_i = \sigma_i^2/\nu_i$. For further discussion about the Matérn class, see Stein (1999, pp. 48-51).
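For example (a numerical check of our own, using scipy's modified Bessel function), at $\nu = 1/2$ the Matérn correlation shape in the variogram above collapses to the exponential form used later in this section:

```python
import numpy as np
from scipy.special import gamma as Gamma, kv

def matern_corr(x, nu):
    # (1/(2^{nu-1} Gamma(nu))) * x^nu * K_nu(x): the bracketed Matern term
    return x**nu * kv(nu, x) / (2**(nu - 1) * Gamma(nu))

x = np.linspace(0.1, 5.0, 50)
# nu = 1/2 gives exactly exp(-x), i.e. the exponential variogram case
assert np.allclose(matern_corr(x, 0.5), np.exp(-x), atol=1e-10)
```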
For the Matérn class, the spectral density of $s_{\ell,m}^{(i)}(\theta_0)$ has the form
$$f_i(\omega_1, \omega_2) = \sigma_i^2 \nu_i\, \rho_i^{-2\nu_i}\, \big(\rho_i^{-2} + \omega_1^2 + \omega_2^2\big)^{-\nu_i - 1}.$$
So
$$f_{11,i}''(0, 0) = f_{22,i}''(0, 0) = -(2\nu_i + 2)\, \sigma_i^2 \nu_i\, \rho_i^4, \qquad f_{12,i}''(0, 0) = 0.$$
For simplicity and parsimony, we use $d$ univariate Matérn models to model the vector process $s_{\ell,m}(\theta_0)$. In addition, we require that $V$ give weight only to the diagonal elements of $\tilde\Omega$ and that the weights be equal. In this case,
$$B = -\frac{4\pi^2}{3}\,\mathrm{diag}\big(2\nu_i(\nu_i + 1)\,\sigma_i^2 \rho_i^4\big)_{i=1}^d, \qquad \Omega = \mathrm{diag}\big(\sigma_i^2 \nu_i \rho_i^2\big)_{i=1}^d,$$
and
$$\tau_N^* = 0.37735\,\Big(\frac{\sum_{i=1}^d c_i^2 \rho_i^4}{\sum_{i=1}^d (\nu_i + 1)^2 c_i^2 \rho_i^8}\Big)^{1/6}\, N^{-1/6}.$$
In the special case that $c_i = c$, $\rho_i = \rho$, and $\nu_i = \nu$ for all $i$, the above formula reduces to
$$\tau_N^* = 0.37735\,(\nu + 1)^{-1/3}\, \rho^{-2/3}\, N^{-1/6}.$$
In our simulation study below, we set $\nu_i = \nu$ for all $i$ and consider two different values of $\nu$: $\nu = 1/2, 1$, which correspond to the exponential model and the Whittle model. In the former case
$$\gamma(h) = c\Big[1 - \exp\Big(-\frac{\|h\|}{\rho}\Big)\Big],$$
while in the latter case
$$\gamma(h) = c\Big[1 - \frac{\|h\|}{\rho}\, K_1\Big(\frac{\|h\|}{\rho}\Big)\Big].$$
We estimate the remaining parameters $c_i$ and $\rho_i$ by fitting the theoretical variogram $\gamma_i(h)$ to the empirical estimates $\hat\gamma_i(h)$ by nonlinear least squares (NLS):
$$(\hat{c}_i, \hat\rho_i) = \arg\min_{c_i,\, \rho_i}\, \sum_h \Big\{c_i\Big[1 - \frac{1}{2^{\nu - 1}\,\Gamma(\nu)}\Big(\frac{\|h\|}{\rho_i}\Big)^{\nu} K_{\nu}\Big(\frac{\|h\|}{\rho_i}\Big)\Big] - \hat\gamma_i(h)\Big\}^2.$$
In principle, we could estimate $c_i$ and $\rho_i$ using more efficient estimators such as the MLE or a weighted NLS estimator. Here we are content with the ordinary NLS estimator above, as the model we specified is not necessarily correct. The data-driven optimal $\tau$ is then given by
$$\hat\tau_N = 0.37735\,\Big(\frac{\sum_{i=1}^d \hat{c}_i^2\, \hat\rho_i^4}{(\nu + 1)^2 \sum_{i=1}^d \hat{c}_i^2\, \hat\rho_i^8}\Big)^{1/6}\, N^{-1/6}. \quad (10)$$
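The two-step plug-in procedure above can be sketched as follows: fit the variogram by NLS for each score component, then apply rule (10). The sketch below uses the exponential case ($\nu = 1/2$) and noiseless toy values in place of the empirical variogram; all numerical inputs are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit

def exp_variogram(h, c, rho):
    # gamma(h) = c*(1 - exp(-h/rho)): the Matern variogram with nu = 1/2
    return c * (1.0 - np.exp(-h / rho))

def tau_hat(c_hat, rho_hat, nu, N):
    # plug-in rule (10), with a common smoothness parameter nu
    c2 = np.asarray(c_hat) ** 2
    r = np.asarray(rho_hat)
    ratio = (c2 @ r**4) / ((nu + 1) ** 2 * (c2 @ r**8))
    return 0.37735 * ratio ** (1 / 6) * N ** (-1 / 6)

# Step 1: NLS fit of the variogram at integer lags, one fit per component.
h = np.arange(1.0, 11.0)
fits = []
for c0, rho0 in [(1.0, 2.0), (1.5, 3.0)]:       # "true" toy parameters
    gamma_emp = exp_variogram(h, c0, rho0)       # stands in for gamma_hat_i(h)
    (c_f, rho_f), _ = curve_fit(exp_variogram, h, gamma_emp, p0=(1.0, 1.0))
    fits.append((c_f, rho_f))

# Step 2: plug the fitted (c_i, rho_i) into (10); here N = 625, M = 25.
tau = tau_hat([c for c, _ in fits], [r for _, r in fits], 0.5, 625)
K1 = max(1, int(tau * 25))                       # K1 = K2 = [tau_hat * M]
```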
6 Simulation Study
This section provides some simulation evidence on the finite sample performance of the series F test. Following BCHV (2009), we consider data generated on an $M \times M$ integer lattice with
$$y_s = \alpha + \beta x_s + \varepsilon_s,$$
where $x_s$ is a scalar, $s$ is a vector $(s_1, s_2)$ indicating which lattice point the observation corresponds to, $\alpha = 0$, and $\beta = 1$. We generate $x_s$ and $\varepsilon_s$ according to
$$x_s = \sum_{\|j\| \le 2} \gamma^{\|j\|}\, u_{s+j}^x, \qquad \varepsilon_s = \sum_{\|j\| \le 2} \gamma^{\|j\|}\, u_{s+j}^\varepsilon,$$
where $\|j\| = \max(|j_1|, |j_2|)$, $u_s^x \sim$ iid $N(0, 1)$, and $u_s^\varepsilon \sim$ iid $N(0, 1)$. We consider four different values of the dependence parameter: $\gamma = 0, 0.3, 0.6, 0.9$.

We consider two different designs: one uses a full lattice and the other uses a sparse lattice. For the full lattice case, we set $M = 25$ and use the data generated at each location, for a total sample size of $N = 625$. For the sparse lattice case, we set $M = 36$ and generate data on the full $36 \times 36$ lattice, but then randomly sample (without replacement) 625 of the potential 1296 locations. We condition on the same set of 625 locations in each of the simulation replications. These locations are presented in Figure 2.
Figure 2: Sparse Lattice
The null hypothesis of interest is H0 : = 1 and the alternative hypothesis is H1 : 6= 1. For this testing problem q = 1; R = (0; 1) and r = 1: We consider two 16
set of testing procedures. The …rst set consists of the series F test and the series 2 test. The series 2 test is based on the Wald statistic and employs 2 as the 1 reference distribution. The series F test is based on the modi…ed Wald statistic and employs an F distribution as the reference distribution. We consider both a priori …xed smoothing parameter choice and data-driven smoothing parameter choice. In the former case, we set K1 = K2 = 1 or 2: In the latter case, we let K1 = K2 = [^ N M ] where ^ N is given in (10). The second set consists of the Wald-type tests based on a kernel estimator of : We consider the Gaussian kernel with smoothing parameter equal to d : ! 2 ks1 s2 k2 G(s1 ; s2 ) = exp : d2 The Gaussian kernel is also considered by BCHV. For the kernel method, we consider two reference distributions: the nonstandard …xed-b asymptotic distribution given in BCHV and the standard 21 distribution. The …xed-b critical values are simulated using the same DGP as above but with = 0: For convenience, we refer the test based on the …xed-b asymptotics the BCHV test. For …xed smoothing parameter choice, BCHV set d = 2; 4; 8; 16: To save space, we consider only d = 8; 16 here. For the above kernel estimator, we can derive the MSE-optimal d: Following Kim and Sun (2010), we …nd that the MSE-optimal d is d?N =
\[
d_N^{\star} = \left(\frac{8\,[\mathrm{vec}(B)]' W \mathrm{vec}(B)}{\mathrm{tr}\left[W(\Omega \otimes \Omega)(I_{d^2} + K_{dd})\right]}\right)^{1/6} N^{1/6}. \tag{11}
\]
Using the same plug-in procedure as before, we can estimate $d_N^{\star}$ by
\[
\hat d_N^{\star} = 2^{1/6}\,(\hat\alpha_N)^{-1/6}\, N^{1/6}
= 2.3609 \left(\frac{(\hat\rho + 1)^2 \sum_{i=1}^{d} \hat c_i^2\, \hat\rho_i^8}{\sum_{i=1}^{d} \hat c_i^2\, \hat\rho_i^4}\right)^{1/6} N^{1/6}.
\]
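For concreteness, the kernel-based variance estimate can be sketched as follows. This is our own minimal illustration assuming the standard spatial HAC form $\hat\Omega = N^{-1}\sum_i\sum_j G(s_i, s_j)\, \hat v_i \hat v_j'$ (as in Kelejian and Prucha, 2007) with the Gaussian kernel above; the function name and inputs are hypothetical:

```python
import numpy as np

def gaussian_kernel_hac(scores, locations, d):
    """Spatial HAC sketch: Omega_hat = (1/N) sum_{i,j} G(s_i, s_j) v_i v_j'.

    scores: (N, p) array of estimated score vectors v_i.
    locations: (N, 2) array of spatial coordinates s_i.
    d: smoothing parameter of the Gaussian kernel.
    """
    diff = locations[:, None, :] - locations[None, :, :]
    dist2 = (diff ** 2).sum(axis=-1)          # squared distances ||s_i - s_j||^2
    G = np.exp(-dist2 / d ** 2)               # Gaussian kernel weights
    N = scores.shape[0]
    return scores.T @ G @ scores / N

rng = np.random.default_rng(0)
loc = rng.uniform(0, 25, size=(50, 2))
v = rng.standard_normal((50, 2))
omega = gaussian_kernel_hac(v, loc, d=8.0)
```

Because the Gaussian kernel is positive semidefinite, the resulting estimate is automatically positive semidefinite, which is one reason this kernel is attractive in the spatial setting.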
We are interested in evaluating the size accuracy of the four testing methods. We set the significance level, which is also the nominal size, to 5%. We compute the empirical size based on 5000 simulation replications. Table 1 gives the empirical size of the four testing methods when the smoothing parameters are fixed a priori. Both full lattice and sparse lattice results are reported. It is clear from the table that the conventional chi-square tests can have a large size distortion, especially when the amount of smoothing is small (small K or large d). The size distortion increases with the spatial dependence. The series F test and the BCHV test succeed in reducing the size distortion. In terms of size accuracy, the performance of the F test with $K_1 = K_2 = 1$ is about the same as that of the BCHV test with $d = 16$. Comparing the full lattice results with the sparse lattice results, we find that the size distortion is smaller for the sparse lattice. Having a sparse lattice is analogous to having weaker spatial dependence.
Table 2 presents the empirical size in the same way as Table 1, except that the smoothing parameters are data-driven. The qualitative observations from Table 1 remain valid. The series F test is more accurate in size than the series $\chi^2$ test. Similarly, the fixed-b approximation is more accurate than the standard $\chi^2$ approximation. The series F test and the BCHV test have more or less the same size distortion. Both tests have slightly more accurate size when $c = 1/2$ than when $c = 1$. Simulation results not reported here show that the smaller value of $c$ gives rise to a smaller amount of smoothing (smaller K and larger d). As shown in Table 1, the smaller the amount of smoothing is, the more accurate the sizes of the F test and the BCHV test are. Both Table 1 and Table 2 reveal that the series F test and the BCHV test are most accurate when the amount of smoothing is small. For the purpose of achieving size accuracy, the data-driven amount of smoothing based on the MSE criterion is too large. This result is consistent with the findings of Sun, Phillips and Jin (2008) and Sun (2010a,b), who rigorously show that the MSE-optimal amount of smoothing is too large for minimizing the size distortion of a test. Finally, as before, the size distortion for the sparse lattice case is smaller than for the full lattice case. The selected amount of smoothing is also larger (larger K and smaller d). The larger the amount of smoothing is, the smaller the difference between the fixed-smoothing asymptotics and the increasing-smoothing asymptotics. So it is not surprising that the F test and the BCHV test do not reduce the size distortion of their respective $\chi^2$ tests by a large margin.
7 Conclusion
The paper has proposed a new testing method that is robust to spatial autocorrelation. It is based on a series-type variance estimator that involves projecting the data or score processes onto a sequence of orthonormal basis functions. When the number of basis functions is fixed, the modified Wald statistic is asymptotically F-distributed. This is in contrast with the conventional $\chi^2$ approximation of the Wald statistic. On the basis of this F asymptotic theory, we propose an easy-to-implement F test. Simulation results show that the F test can ameliorate the size distortion of the conventional $\chi^2$ test caused by spatial autocorrelation. In this paper, we focus on the asymptotic MSE criterion, which may not be the most suitable for hypothesis testing or CI construction. It would be interesting to extend the methods of Sun, Phillips and Jin (2008), Sun and Phillips (2008), and Sun (2010a,b) on time series HAR estimation to the spatial setting.
Table 1: Empirical size of different tests with a priori fixed smoothing parameters

                 Series F test     Series chi^2 test  Gaussian (fixed-b)  Gaussian (chi^2)
                 Ki = 1   Ki = 2   Ki = 1   Ki = 2    d = 8    d = 16     d = 8    d = 16
Regular Lattice
rho = 0          0.050    0.051    0.097    0.068     0.050    0.044      0.086    0.163
rho = 0.3        0.065    0.079    0.119    0.101     0.077    0.057      0.120    0.192
rho = 0.6        0.075    0.097    0.137    0.123     0.091    0.070      0.138    0.208
rho = 0.9        0.076    0.106    0.142    0.132     0.094    0.072      0.144    0.216
Sparse Lattice
rho = 0          0.048    0.046    0.094    0.060     0.051    0.049      0.060    0.104
rho = 0.3        0.050    0.067    0.105    0.085     0.077    0.060      0.089    0.126
rho = 0.6        0.059    0.070    0.114    0.094     0.086    0.069      0.098    0.136
rho = 0.9        0.064    0.076    0.116    0.098     0.090    0.069      0.107    0.142

Note: 'Series F test' is the test developed in this paper. 'Gaussian (fixed-b)' is the test developed in BCHV. 'Series chi^2 test' and 'Gaussian (chi^2)' are conventional chi^2 tests.
Table 2: Empirical size of different tests with data-driven smoothing parameters

                 Series F test     Series chi^2 test  Gaussian (fixed-b)  Gaussian (chi^2)
                 c = 1/2  c = 1    c = 1/2  c = 1     c = 1/2  c = 1      c = 1/2  c = 1
Regular Lattice
rho = 0          0.048    0.047    0.050    0.050     0.048    0.048      0.049    0.050
rho = 0.3        0.079    0.079    0.110    0.102     0.075    0.075      0.136    0.122
rho = 0.6        0.082    0.094    0.131    0.123     0.084    0.092      0.170    0.148
rho = 0.9        0.090    0.106    0.132    0.134     0.090    0.094      0.169    0.149
Sparse Lattice
rho = 0          0.046    0.046    0.046    0.047     0.053    0.053      0.047    0.048
rho = 0.3        0.074    0.082    0.085    0.091     0.076    0.077      0.092    0.088
rho = 0.6        0.083    0.093    0.097    0.102     0.083    0.086      0.099    0.097
rho = 0.9        0.095    0.114    0.107    0.123     0.090    0.095      0.106    0.106
8 Appendix of Proofs
Proof of Theorem 1. For each location $(\ell_1, m_1)$, define the partial sum
\[
S^{\ell_1, m_1}(\theta) = \sum_{\ell=1}^{\ell_1} \sum_{m=1}^{m_1} s_{\ell,m}(\theta),
\]
with the convention $S^{0,0}(\theta) = S^{\ell_1,0}(\theta) = S^{0,m_1}(\theta) = 0$. Since $s_{\ell,m}(\hat\theta_N) = S^{\ell,m}(\hat\theta_N) - S^{\ell-1,m}(\hat\theta_N) - S^{\ell,m-1}(\hat\theta_N) + S^{\ell-1,m-1}(\hat\theta_N)$, we have
\[
A_k = \frac{1}{\sqrt{LM}} \sum_{\ell=1}^{L} \sum_{m=1}^{M} \phi_k\!\left(\frac{\ell}{L}, \frac{m}{M}\right) s_{\ell,m}(\hat\theta_N) := I_1 - I_2,
\]
where
\[
I_1 = \frac{1}{\sqrt{LM}} \sum_{\ell=1}^{L} \sum_{m=1}^{M} \phi_k\!\left(\frac{\ell}{L}, \frac{m}{M}\right) \left[S^{\ell,m}(\hat\theta_N) - S^{\ell,m-1}(\hat\theta_N)\right],
\]
\[
I_2 = \frac{1}{\sqrt{LM}} \sum_{\ell=1}^{L} \sum_{m=1}^{M} \phi_k\!\left(\frac{\ell}{L}, \frac{m}{M}\right) \left[S^{\ell-1,m}(\hat\theta_N) - S^{\ell-1,m-1}(\hat\theta_N)\right].
\]
Summation by parts in $m$ yields
\[
I_1 = \frac{1}{\sqrt{LM}} \sum_{\ell=1}^{L} \sum_{m=1}^{M} \left[\phi_k\!\left(\frac{\ell}{L}, \frac{m}{M}\right) - \phi_k\!\left(\frac{\ell}{L}, \frac{m+1}{M}\right)\right] S^{\ell,m}(\hat\theta_N)
+ \frac{1}{\sqrt{LM}} \sum_{\ell=1}^{L} \phi_k\!\left(\frac{\ell}{L}, \frac{M+1}{M}\right) S^{\ell,M}(\hat\theta_N).
\]
Applying the same argument to $I_2$ after shifting the index $\ell$, and noting that $S^{L,M}(\hat\theta_N) = 0$ by the first-order condition of the M estimator, we obtain
\[
I_2 = \frac{1}{\sqrt{LM}} \sum_{\ell=1}^{L} \sum_{m=1}^{M} \left[\phi_k\!\left(\frac{\ell+1}{L}, \frac{m}{M}\right) - \phi_k\!\left(\frac{\ell+1}{L}, \frac{m+1}{M}\right)\right] S^{\ell,m}(\hat\theta_N)
\]
\[
\quad - \frac{1}{\sqrt{LM}} \sum_{m=1}^{M} \left[\phi_k\!\left(\frac{L+1}{L}, \frac{m}{M}\right) - \phi_k\!\left(\frac{L+1}{L}, \frac{m+1}{M}\right)\right] S^{L,m}(\hat\theta_N)
+ \frac{1}{\sqrt{LM}} \sum_{\ell=1}^{L} \phi_k\!\left(\frac{\ell+1}{L}, \frac{M+1}{M}\right) S^{\ell,M}(\hat\theta_N).
\]
So
\[
A_k = \frac{1}{\sqrt{LM}} \sum_{\ell=1}^{L} \sum_{m=1}^{M} \left[\phi_k\!\left(\frac{\ell}{L}, \frac{m}{M}\right) - \phi_k\!\left(\frac{\ell}{L}, \frac{m+1}{M}\right) - \phi_k\!\left(\frac{\ell+1}{L}, \frac{m}{M}\right) + \phi_k\!\left(\frac{\ell+1}{L}, \frac{m+1}{M}\right)\right] S^{\ell,m}(\hat\theta_N)
\]
\[
\quad + \frac{1}{\sqrt{LM}} \sum_{m=1}^{M} \left[\phi_k\!\left(\frac{L+1}{L}, \frac{m}{M}\right) - \phi_k\!\left(\frac{L+1}{L}, \frac{m+1}{M}\right)\right] S^{L,m}(\hat\theta_N)
+ \frac{1}{\sqrt{LM}} \sum_{\ell=1}^{L} \left[\phi_k\!\left(\frac{\ell}{L}, \frac{M+1}{M}\right) - \phi_k\!\left(\frac{\ell+1}{L}, \frac{M+1}{M}\right)\right] S^{\ell,M}(\hat\theta_N)
\]
\[
\to_d \int_0^1\!\!\int_0^1 \frac{\partial^2 \phi_k(r,s)}{\partial r \partial s}\, B_d(r,s)\, dr\, ds
- \int_0^1 \frac{\partial \phi_k(1,s)}{\partial s}\, B_d(1,s)\, ds
- \int_0^1 \frac{\partial \phi_k(r,1)}{\partial r}\, B_d(r,1)\, dr
:= \xi_k.
\]
For the first term in $\xi_k$, integrating by parts twice gives
\[
\int_0^1\!\!\int_0^1 \frac{\partial^2 \phi_k(r,s)}{\partial r \partial s}\, B_d(r,s)\, dr\, ds
= \int_0^1 \frac{\partial \phi_k(1,s)}{\partial s}\, B_d(1,s)\, ds
+ \int_0^1 \frac{\partial \phi_k(r,1)}{\partial r}\, B_d(r,1)\, dr
+ \int_0^1\!\!\int_0^1 \phi_k(r,s)\, dB_d(r,s).
\]
Hence
\[
\xi_k = \int_0^1\!\!\int_0^1 \phi_k(r,s)\, dB_d(r,s)
= \int_0^1\!\!\int_0^1 \phi_k(r,s)\, dW(r,s) - \left[\int_0^1\!\!\int_0^1 \phi_k(u,v)\, du\, dv\right] W(1,1)
\]
\[
= \int_0^1\!\!\int_0^1 \left[\phi_k(r,s) - \int_0^1\!\!\int_0^1 \phi_k(u,v)\, du\, dv\right] dW(r,s).
\]
Given the weak convergence of $A_k$, we immediately have $\hat\Lambda_k \to_d \operatorname{Re}(\xi_k \bar\xi_k')$. Hence
\[
\hat\Omega \to_d \frac{1}{K} \sum_{k \in \mathcal{K}} \operatorname{Re}(\xi_k \bar\xi_k'),
\]
as desired.
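As a computational illustration of this estimator, the sketch below forms the projection coefficients $A_k$ and averages $\operatorname{Re}(A_k \bar A_k')$. The choice of complex exponential basis functions, the demeaning of the scores (which mimics the vanishing full-sample score), and all names are our own assumptions for illustration, not the paper's prescription:

```python
import numpy as np

def series_variance(scores, coords, K1=2, K2=2):
    """Series variance estimator sketch: project demeaned scores onto
    complex exponential bases and average Re(A_k A_k*') over the bases.

    scores: (N, p) score vectors; coords: (N, 2) locations scaled to [0, 1]^2.
    """
    N, p = scores.shape
    v = scores - scores.mean(axis=0)   # demean so projections are centered
    terms = []
    for k1 in range(1, K1 + 1):
        for k2 in range(1, K2 + 1):
            # assumed basis: phi_k(r, s) = exp(i 2 pi (k1 r + k2 s))
            phi = np.exp(2j * np.pi * (k1 * coords[:, 0] + k2 * coords[:, 1]))
            A = (phi[:, None] * v).sum(axis=0) / np.sqrt(N)  # projection A_k
            terms.append(np.real(np.outer(A, A.conj())))     # Re(A_k A_k*')
    return sum(terms) / len(terms)

rng = np.random.default_rng(1)
coords = rng.uniform(0, 1, size=(400, 2))
s = rng.standard_normal((400, 2))
omega_hat = series_variance(s, coords)
```

Each term $\operatorname{Re}(A_k \bar A_k')$ is symmetric positive semidefinite, so the average is as well, which is the key practical advantage of the series construction.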
Proof of Theorem 2. Under conditions (i) and (ii), we can write
\[
\xi_k = \int_0^1\!\!\int_0^1 \Phi_k(r,s)\, dW(r,s),
\]
where $\Phi_k(r,s) = \phi_k(r,s) - \int_0^1\!\int_0^1 \phi_k(u,v)\, du\, dv$. So, under condition (iii), we have
\[
E\, \xi_k \bar\xi_j' = I_d \int_0^1\!\!\int_0^1 \Phi_k(r,s)\, \bar\Phi_j(r,s)\, dr\, ds = I_d\, \delta_{kj}.
\]
Hence $\xi_k \bar\xi_k' \sim \text{iid } \mathbb{W}(I_d, 1)$ and $\frac{1}{K} \sum_{k \in \mathcal{K}} \xi_k \bar\xi_k' \sim K^{-1}\, \mathbb{W}(I_d, K)$.
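The scaled Wishart limit above is what delivers F-based inference. For the real case, the following standard Hotelling-type result (see Hotelling, 1931, and Bilodeau and Brenner, 1999; the complex case is analogous via Goodman's complex Wishart) links a Wald statistic built on a Wishart-distributed variance estimate to an exact F distribution, which is the origin of the modified Wald statistic's F calibration:

```latex
% If z ~ N(0, I_q) independently of S ~ K^{-1} W(I_q, K), then
% T^2 = z' S^{-1} z follows Hotelling's T-squared distribution, and
\frac{K - q + 1}{Kq}\, z' S^{-1} z \;\sim\; F_{q,\, K - q + 1}.
% Replacing z and S by the limits of the Wald numerator and of the series
% variance estimator motivates the F critical values used in the paper.
```

Note that as $K \to \infty$ the right-hand side collapses to $\chi^2_q / q$, recovering the conventional chi-square calibration as a limiting case.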
Proof of Theorem 5. (a) By definition, we can represent $\tilde\Omega$ as
\[
\tilde\Omega = \frac{1}{2K} \sum_{k=1}^{2K} \Lambda\, \xi_k \xi_k'\, \Lambda',
\]
where $\xi_k = \int_0^1\!\int_0^1 \Phi_k(r,s)\, dW(r,s) \sim \text{iid } N(0, I)$ and $\Omega = \Lambda \Lambda'$. Now
\[
E\, \mathrm{vec}(\tilde\Omega - E\tilde\Omega)\left[\mathrm{vec}(\tilde\Omega - E\tilde\Omega)\right]'
= \frac{1}{(2K)^2} \sum_{k=1}^{2K} (\Lambda \otimes \Lambda)\, E\!\left[\mathrm{vec}(\xi_k \xi_k' - I_d)\, \mathrm{vec}(\xi_k \xi_k' - I_d)'\right] (\Lambda \otimes \Lambda)',
\]
using the independence of the $\xi_k$ across $k$. For $\xi \sim N(0, I_d)$,
\[
E\!\left[\mathrm{vec}(\xi \xi')\, \mathrm{vec}(\xi \xi')'\right] = I_{d^2} + K_{dd} + \mathrm{vec}(I_d)\, \mathrm{vec}(I_d)',
\]
so that
\[
\mathrm{var}\left[\mathrm{vec}(\tilde\Omega)\right]
= \frac{1}{2K} (\Lambda \otimes \Lambda)(I_{d^2} + K_{dd})(\Lambda \otimes \Lambda)'
= \frac{1}{2K} (\Omega \otimes \Omega)(I_{d^2} + K_{dd}),
\]
where the last equality uses $K_{dd}(\Lambda \otimes \Lambda) = (\Lambda \otimes \Lambda) K_{dd}$.
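The last equality can be verified numerically. The sketch below (our own check, with an arbitrary seeded $\Lambda$ and $d = 3$) builds the commutation matrix $K_{dd}$ and confirms both the commutation identity and the resulting equality $(\Lambda \otimes \Lambda)(I_{d^2} + K_{dd})(\Lambda \otimes \Lambda)' = (\Omega \otimes \Omega)(I_{d^2} + K_{dd})$:

```python
import numpy as np

def commutation_matrix(d):
    """K_{dd} satisfying K_{dd} vec(A) = vec(A') for every d x d matrix A,
    with vec(.) stacking columns (Fortran order)."""
    K = np.zeros((d * d, d * d))
    for i in range(d):
        for j in range(d):
            K[i * d + j, j * d + i] = 1.0
    return K

d = 3
rng = np.random.default_rng(2)
Lam = rng.standard_normal((d, d))
Omega = Lam @ Lam.T
Kdd = commutation_matrix(d)
I2 = np.eye(d * d)
lhs = np.kron(Lam, Lam) @ (I2 + Kdd) @ np.kron(Lam, Lam).T
rhs = np.kron(Omega, Omega) @ (I2 + Kdd)
```

Both sides agree to machine precision, since $K_{dd}$ commutes with $\Lambda \otimes \Lambda$ and $(\Lambda \otimes \Lambda)(\Lambda \otimes \Lambda)' = \Omega \otimes \Omega$.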
Part (a) follows from the above result and condition (iv).
(b) Under condition (ii), we have $f_1'(0,0) = f_2'(0,0) = 0$. A Taylor expansion gives
\[
f(\omega_1, \omega_2) = f(0,0) + f_{12}''(0,0)\, \omega_1 \omega_2 + \frac{1}{2} f_{11}''(0,0)\, \omega_1^2 + \frac{1}{2} f_{22}''(0,0)\, \omega_2^2 + o(\omega_1^2 + \omega_2^2)
\]
as $\omega_1 \to 0$ and $\omega_2 \to 0$. Here the subscripts on $f$ indicate the partial derivatives. By definition,
\[
f_{11}''(0,0) = \lim_{L,M \to \infty} \frac{1}{LM} \sum_{\ell_1=1}^{L} \sum_{\ell_2=1}^{L} \sum_{m_1=1}^{M} \sum_{m_2=1}^{M} E\, s_{\ell_1,m_1}(\theta_0)\, s_{\ell_2,m_2}'(\theta_0)\, (\ell_1 - \ell_2)^2,
\]
\[
f_{22}''(0,0) = \lim_{L,M \to \infty} \frac{1}{LM} \sum_{\ell_1=1}^{L} \sum_{\ell_2=1}^{L} \sum_{m_1=1}^{M} \sum_{m_2=1}^{M} E\, s_{\ell_1,m_1}(\theta_0)\, s_{\ell_2,m_2}'(\theta_0)\, (m_1 - m_2)^2,
\]
\[
f_{12}''(0,0) = \lim_{L,M \to \infty} \frac{1}{LM} \sum_{\ell_1=1}^{L} \sum_{\ell_2=1}^{L} \sum_{m_1=1}^{M} \sum_{m_2=1}^{M} E\, s_{\ell_1,m_1}(\theta_0)\, s_{\ell_2,m_2}'(\theta_0)\, (\ell_1 - \ell_2)(m_1 - m_2).
\]
Consequently, under conditions (i) and (ii), we have
\[
E\tilde\Omega = \frac{1}{K} \sum_{k_1=0}^{K_1} \sum_{k_2=0}^{K_2} f_{LM}\!\left(\frac{2\pi k_1}{L}, \frac{2\pi k_2}{M}\right)
= \frac{1}{K} \sum_{k_1=0}^{K_1} \sum_{k_2=0}^{K_2} f\!\left(\frac{2\pi k_1}{L}, \frac{2\pi k_2}{M}\right) + o\!\left(\frac{1}{LM}\right)
\]
\[
= f(0,0) + \frac{1}{K} \sum_{k_1=0}^{K_1} \sum_{k_2=0}^{K_2} \left[\frac{4\pi^2 k_1 k_2}{LM} f_{12}''(0,0)(1 + o(1)) + \frac{2\pi^2 k_1^2}{L^2} f_{11}''(0,0) + \frac{2\pi^2 k_2^2}{M^2} f_{22}''(0,0)\right]
\]
\[
= \Omega + \pi^2 f_{12}''(0,0) \frac{K_1 K_2}{LM} + \frac{2\pi^2}{3} f_{11}''(0,0) \frac{K_1^2}{L^2} + \frac{2\pi^2}{3} f_{22}''(0,0) \frac{K_2^2}{M^2} + o\!\left(\frac{K_1^2}{L^2} + \frac{K_2^2}{M^2}\right).
\]
So under the specification $K_1/L = K_2/M = \tau$, we have
\[
E\tilde\Omega - \Omega = B\tau^2 + o(\tau^2), \quad \text{where} \quad
B = \pi^2 f_{12}''(0,0) + \frac{2\pi^2}{3} f_{11}''(0,0) + \frac{2\pi^2}{3} f_{22}''(0,0).
\]
(c) Part (c) follows from parts (a) and (b).
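The coefficients $\pi^2$ and $2\pi^2/3$ in $B$ arise from the Riemann-sum limits $\frac{1}{K}\sum\sum k_1 k_2 \to K_1 K_2/4$ and $\frac{1}{K_1+1}\sum k_1^2 \to K_1^2/3$. A small numerical check of these limits under $K_1/L = K_2/M = \tau$ (our own illustration):

```python
import numpy as np

# With K1/L = K2/M = tau, the average of 4*pi^2*k1*k2/(L*M) over the
# (K1+1)(K2+1) frequency pairs should equal pi^2 * tau^2 in the limit.
L = M = 2000
tau = 0.05
K1, K2 = int(tau * L), int(tau * M)
k1 = np.arange(K1 + 1)
k2 = np.arange(K2 + 1)
avg = (4 * np.pi**2 / (L * M)) * np.outer(k1, k2).mean()  # cross term average
target = np.pi**2 * tau**2                                 # claimed limit
```

The cross-term average matches $\pi^2 \tau^2$ essentially exactly, and the mean of $k_1^2$ is within one percent of $K_1^2/3$ already at $K_1 = 100$, consistent with the leading-order bias expression.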
References [1] Abramowitz, M. and I. Stegun (1965): Handbook of Mathematical Functions, Dover Pub., New York. [2] Andrews, D. W. K. (1991): "Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimation," Econometrica, 59(3), 817–858. [3] Bester, A., Conley, T., Hansen, C., and Vogelsang, T. (2009): "Fixed-b Asymptotics for Spatially Dependent Robust Nonparametric Covariance Matrix Estimators." Working paper, Michigan State University. [4] Bilodeau, M. and D. Brenner (1999): Theory of Multivariate Statistics. Springer, New York. [5] Cliff, A. D., and Ord, J. K. (1981): Spatial Processes: Models and Applications. London: Pion. [6] Conley, T. G. (1999): "GMM Estimation with Cross Sectional Dependence," Journal of Econometrics, 92, 1–45. [7] Deo, C. (1975): "A Functional Central Limit Theorem for Stationary Random Fields," The Annals of Probability, 3(4), 708–715.
[8] Goldie, C. M., and P. E. Greenwood (1986): "Variance of Set-Indexed Sums of Mixing Random Variables and Weak Convergence of Set-Indexed Processes," The Annals of Probability, 14(3), 817–839. [9] Goodman, N. R. (1963): "Statistical Analysis Based on a Certain Multivariate Complex Gaussian Distribution," Annals of Mathematical Statistics, 34, 152–177. [10] Graczyk, P., G. Letac, and H. Massam (2003): "The Complex Wishart Distribution and the Symmetric Group," The Annals of Statistics, 31(1), 287–309. [11] Hannan, E. (1970): Multiple Time Series. Wiley, New York. [12] Hotelling, H. (1931): "The Generalization of Student's Ratio," Annals of Mathematical Statistics, 2, 360–378. [13] Kelejian, H. and Prucha, I. (2007): "HAC Estimation in a Spatial Framework," Journal of Econometrics, 140(1), 131–154. [14] Kiefer, N. M., and T. J. Vogelsang (2002): "Heteroskedasticity-Autocorrelation Robust Testing Using Bandwidth Equal to Sample Size," Econometric Theory, 18, 1350–1366. [15] —— (2005): "A New Asymptotic Theory for Heteroskedasticity-Autocorrelation Robust Tests," Econometric Theory, 21, 1130–1164. [16] Kim, M. and Sun, Y. (2010): "Spatial Heteroskedasticity and Autocorrelation Consistent Estimation of Covariance Matrix." Forthcoming in Journal of Econometrics. [17] Phillips, P. C. B. (2005): "HAC Estimation by Automated Regression," Econometric Theory, 21, 116–142. [18] Stein, M. L. (1999): Statistical Interpolation of Spatial Data: Some Theory for Kriging. Springer, New York. [19] Sun, Y., P. C. B. Phillips, and S. Jin (2008): "Optimal Bandwidth Selection in Heteroskedasticity-Autocorrelation Robust Testing," Econometrica, 76, 175–194. [20] Sun, Y. (2010a): "Autocorrelation Robust Inference Using Nonparametric Series Methods," Working paper, Department of Economics, UC San Diego. [21] Sun, Y. (2010b): "Let's Fix It: Fixed-b Asymptotics versus Small-b Asymptotics in Heteroscedasticity and Autocorrelation Robust Inference," Working paper, Department of Economics, UC San Diego. [22] Sun, Y. (2010c): "Robust Trend Inference with Series Variance Estimator and Testing-optimal Smoothing Parameter," Working paper, Department of Economics, UC San Diego.