Estimators for Persistent and Possibly Non-Stationary Data with Classical Properties

Yuriy Gorodnichenko∗   Serena Ng†

November 7, 2007
Preliminary Draft. Comments Welcome.
Abstract

This paper proposes new estimators for single and multiple regressions when the data are persistent and possibly non-stationary. The estimators are √T rather than super-consistent, but the asymptotic distributions are normal even when there is an autoregressive unit root in the data. There is no longer a discontinuity in the limiting distribution as the persistence parameter changes. The same critical values from the normal distribution can be used for hypothesis testing whether the regression is cointegrated or spurious, and irrespective of the treatment of the deterministic terms. The point of departure is that the estimators are based on moments of the errors, and these are stationary processes even if the data are non-stationary. Simulations show that the estimates are approximately median unbiased and tests have good size even in three hard to solve problems: (i) testing the value of the autoregressive parameter when the series is highly persistent but not non-stationary, (ii) regressions with a highly persistent predictor, and (iii) regressions that are nearly spurious. The framework is extended to systems estimation such as DSGE models. The estimates are precise without assuming a priori whether the shocks are permanent or transitory.
JEL Classification: C1. Keywords: Persistence, unit root, cointegration, predictability tests, local-to-unity.
∗ Department of Economics, University of Michigan, Ann Arbor, MI 48109. Email: [email protected].
† Department of Economics, University of Michigan and Columbia University, 420 West 118 St., New York, NY 10027. Email: [email protected].
1 Introduction
Since the seminal work of Nelson and Plosser (1982), who reported that many macroeconomic time series are better characterized as difference stationary, many tests for non-stationarity have been developed. It is now more or less accepted that the least squares estimator α̂^OLS of the autoregressive parameter α is super-consistent under the null hypothesis of a unit root (i.e. α0 = 1), but the asymptotic distribution is non-standard. Similarly, in multiple regressions with non-stationary data, least squares estimation of the cointegrating vector is super-consistent but the estimator has a non-standard asymptotic distribution. Furthermore, classical (normal) inference tends to be inaccurate when the regressors are highly persistent but not necessarily non-stationary. Tests for predictability of asset returns are known to be vulnerable, as predictors such as the dividend-price and earnings-price ratios are borderline non-stationary in the data. The main problem for inference is that the asymptotic distribution of α̂^OLS is not continuous at α0 = 1. The so-called local-to-unity framework, which parametrizes α = 1 + c/T, is often used to yield an asymptotic distribution that is continuous in α. As the resulting distribution is not pivotal (because c is not consistently estimable), inference remains non-standard under the local-to-unity framework.

We consider estimating a K × 1 vector of parameters θ in the regression model y_t = z_t'θ + η_t, where z_t is a vector of regressors and η_t is an error process. We propose √T-consistent estimators of θ that permit classical inference whether η_t and z_t are mildly persistent, strongly persistent, or even non-stationary. In the case of the AR(1) model (where z_t is the trend function and η_t = αη_{t−1} + e_t), this means that the t test for α is asymptotically standard normal not just when |α0| < 1, but also when α0 = 1. For predictability tests, this means that inference can be made in the same way whether the regressor z_t is I(1) or I(0), and even in the spurious regression case when η_t is I(1). Because our estimators are not super-consistent when the regressors are truly non-stationary, this slower rate of convergence not surprisingly translates into power loss. However, testing if α = .95 when α0 is in fact .98 using α̂^OLS is highly imprecise. It is in such situations that even a less powerful test may have some appeal.

The primary reason for the classical properties of our estimators is that the moments evaluated at the true parameter vector are stationary, so a central limit theorem applies. When the regressors are stationary, the asymptotic distribution of the estimators has the usual Gaussian properties. When the regressors are non-stationary, the normalized estimator is mixed normal. That is, it converges in distribution to the product of a normal random variable and a random variance. However, the t statistic is asymptotically standard normal. Thus one can conduct inference without deciding a priori whether the regressors are non-stationary. The same set of critical values can also be used whether or not deterministic terms are in the regression. When the number of moments exceeds the number of parameters to be estimated, a test of overidentifying restrictions permits the lag length to be chosen simultaneously.

There are only a few estimators in the literature that are √T-consistent and asymptotically normal when the data are non-stationary. The estimators developed in So and Shin (1999)¹ and Phillips and Han (2006) are both confined to estimation of the linear autoregressive model, and the latter is further restricted to the case without deterministic terms. For cointegrating regressions, Laroque and Salanie (1997) used two OLS regressions in stationary variables to obtain a √T-consistent estimate of the cointegrating vector. The estimators we consider can be applied to both univariate and multivariate models.

Our estimators adopt a method of moments setup with an identity weighting matrix. Let w_t = (y_t, z_t')' be the data and let θ0 denote the true parameter vector. Consider an L × 1 vector of moments g(w_t; θ) = g_t(θ). Let ḡ(θ) = T^{−1} Σ_{t=1}^T g_t(θ) be the vector of sample moments evaluated at an arbitrary θ. The generalized method of moments objective function is Q_T(θ) = ḡ(θ)'ḡ(θ), and the estimator is
θ̂ = argmin_θ Q_T(θ).   (1)

Let Ḡ_T(θ) = T^{−1} Σ_{t=1}^T ∂g_t(θ)/∂θ' be the L × K matrix of derivatives.
Assumption A: (i) θ is in a compact parameter space Θ; (ii) g_t(θ) is continuously differentiable in θ for all w_t; (iii) E[g_t(θ0)] = 0 and E[g_t(θ)] ≠ 0 for all θ ≠ θ0; (iv) T^{−1/2} Σ_{t=1}^T g_t(θ0) →d N(0, S) with S > 0.

Assumption B: (i) E sup_{θ∈Θ} ||g_t(θ)|| < ∞ and G0 = lim_{T→∞} T^{−1} Σ_{t=1}^T E[G_t(θ)]|_{θ=θ0} is non-singular; (ii) θ0 is in the interior of Θ and minimizes Q(θ) = E[g(θ)]'E[g(θ)].
Under Assumptions A and B, Q_T(θ) converges to Q(θ) uniformly in θ. A mean-value expansion of ḡ(θ) around θ0 leads to the result that

√T(θ̂ − θ0) →d N(0, (G0'G0)^{−1} G0' S G0 (G0'G0)^{−1}).

Furthermore,

J = T ḡ(θ̂)' [Avar(ḡ(θ̂))]⁻ ḡ(θ̂) →d χ²_{L−K},

where [Avar(ḡ(θ̂))]⁻ is the generalized inverse of (I − G0(G0'G0)^{−1}G0') S (I − G0(G0'G0)^{−1}G0')'.
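To fix ideas, the objective Q_T(θ) = ḡ(θ)'ḡ(θ) can be minimized directly. The following is a minimal numerical sketch, not the paper's code: the particular moment function (products of AR(1) quasi-differences at lags 1 and 2), the grid search, and all tuning constants are illustrative choices.

```python
import numpy as np

# Sketch of the method-of-moments setup with identity weighting:
# Q_T(theta) = gbar(theta)' gbar(theta), minimized by grid search.
# The moments used here are products of AR(1) quasi-differences at
# lags 1 and 2 (an illustrative choice).

def gbar(alpha, y):
    """Sample moments: mean of g_t(alpha) over t."""
    e = y[1:] - alpha * y[:-1]        # e_t(alpha) = y_t - alpha*y_{t-1}
    g1 = e[1:] * e[:-1]               # e_t(alpha) * e_{t-1}(alpha)
    g2 = e[2:] * e[:-2]               # e_t(alpha) * e_{t-2}(alpha)
    return np.array([g1.mean(), g2.mean()])

def gmm_grid(y, grid):
    """argmin over a grid of Q_T(alpha) = gbar' gbar."""
    Q = [gbar(a, y) @ gbar(a, y) for a in grid]
    return grid[int(np.argmin(Q))]

rng = np.random.default_rng(0)
T, alpha0 = 2000, 0.7
e = rng.standard_normal(T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = alpha0 * y[t-1] + e[t]

ahat = gmm_grid(y, np.linspace(-0.99, 0.99, 397))
# L = 2 moments and K = 1 parameter leave one overidentifying
# restriction, which is what the J statistic above would test.
```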
¹ The So and Shin estimator is defined as α̂ = Σ_{t=1}^T x_t y_t / Σ_{t=1}^T x_t y_{t−1} with x_t = sign(y_{t−1}) as instrument. The Phillips and Han estimator is α̂ = Σ_{t=1}^T Δy_{t−1}(2Δy_t + Δy_{t−1}) / Σ_{t=1}^T (Δy_{t−1})².
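For comparison, the two footnoted estimators are simple to compute. A sketch on simulated AR(1) data (the sample size, seed, and α0 = 0.8 are arbitrary):

```python
import numpy as np

# Sketch of the two sqrt(T)-consistent estimators cited in the footnote.
rng = np.random.default_rng(9)
T, alpha0 = 20_000, 0.8
y = np.zeros(T)
for t in range(1, T):
    y[t] = alpha0 * y[t-1] + rng.standard_normal()

# So and Shin (1999): IV with sign(y_{t-1}) as instrument.
x = np.sign(y[:-1])
a_ss = (x @ y[1:]) / (x @ y[:-1])

# Phillips and Han (2006): built from first differences only.
dy = np.diff(y)
a_ph = (dy[:-1] @ (2 * dy[1:] + dy[:-1])) / (dy[:-1] @ dy[:-1])
```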
Whereas the standard theory assumes w_t is stationary, we allow w_t to be possibly non-stationary. We will show how to find moments such that estimators that are √T-consistent and asymptotically normal can be obtained. To be clear on the idea, we first focus on the univariate autoregression. We begin in Section 2 by abstracting from deterministic terms and serial correlation in the errors to permit a clear exposition of how the estimators work. The general AR(p) case is discussed in Section 3. Simulations are presented to illustrate the finite sample properties of the estimators. Extensions to multiple regressions and systems are then discussed in Section 4.

2 The AR(1) Model
We consider the data generating process

y_t = d_t + x_t,
x_t = α0 x_{t−1} + u_t,   (2)
β(L)u_t = e_t,
β(L) = 1 − β1 L − … − β_{p0} L^{p0},  Σ_{j=0}^∞ |β_j| < ∞,  e_t ∼ iid(0, σ²),

where e_0 = 0, E(x_0²) < ∞, and β(L)^{−1} = ψ(L) with Σ_{j=0}^∞ |ψ_j| < ∞. The non-normalized spectral density of u_t at frequency zero is given by ω² = σ²(1 − β(1))^{−2}. The deterministic terms are captured by d_t = Σ_{j=0}^r δ_j t^j, where r is the order of the deterministic trend function. We focus on the intercept only case with d_t = δ0 and the linear trend case with d_t = δ0 + δ1 t. Hereafter, we let θ = (α, σ², β1, …, β_p) be the K × 1 vector of parameters of the model. The true parameter vector is denoted θ0 and the correct lag length is denoted p0.

To motivate, consider the simple case without deterministic terms, so that y_t = x_t. Suppose further that β(L) = 1, so that u_t = e_t. The DGP is then y_t = α0 y_{t−1} + e_t. When α0 = 1, the functional central limit theorem holds, so that T^{−1/2} Σ_{t=1}^{[Tr]} u_t ⇒ ωW(r) and thus T^{−2} Σ_{t=1}^T y_{t−1}² ⇒ ω² ∫_0^1 W(r)² dr, where ⇒ denotes weak convergence in distribution and W(r) is a Wiener process defined on C[0,1], the space of continuous functions on [0,1]. In contrast, the process y_t is stationary when |α0| < 1. In that case, T^{−1} Σ_{t=1}^T y_{t−1}² →p E(y_t²) = σ²/(1 − α0²). That the sample moments have different properties when |α0| < 1 and when α0 = 1 has been the basis of many unit root tests, and most of these moments are linear in the parameter of interest, α. We now consider a different approach.
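The different behavior of these sample moments under α0 = 1 and |α0| < 1 is easy to verify numerically. A small sketch (sample size and seed arbitrary):

```python
import numpy as np

# Numerical illustration of the two normalizations quoted above.
rng = np.random.default_rng(1)
T = 100_000
e = rng.standard_normal(T)

y_rw = np.cumsum(e)                    # alpha0 = 1: a random walk
m_rw = (y_rw[:-1] ** 2).sum() / T**2   # T^{-2} sum y_{t-1}^2 = O_p(1), random limit

alpha0 = 0.5
y_st = np.zeros(T)
for t in range(1, T):
    y_st[t] = alpha0 * y_st[t-1] + e[t]
m_st = (y_st[:-1] ** 2).sum() / T      # -> sigma^2/(1-alpha0^2) = 4/3 here
```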
2.1 The QD Estimator
Define e_t(α) = y_t − αy_{t−1} and let γ_k(α) = E[e_t(α)e_{t−k}(α)] be the autocovariance of e_t(α) at lag k. For |α0| ≤ 1, the population moment condition E[g_t^QD(α0)] = γ_k(α0) = 0 holds for k ≥ 1. As e_t is the quasi-difference of y_t, we use 'QD' to distinguish this moment from the ones that will be discussed subsequently. The sample analog expressed in terms of observed variables is

ḡ^QD(α) = γ̂_k(α) = T^{−1} Σ_{t=1}^T (y_t − αy_{t−1})(y_{t−k} − αy_{t−k−1}).

With L such moments, ḡ^QD(α) = (γ̂_1(α), …, γ̂_L(α))'. Let α̂^QD be such that ḡ_k^QD(α̂^QD) = 0. When the parameter space is restricted to be far away from the boundary of one, Assumptions A and B hold and α̂^QD is consistent for α0. However, when α0 is at or close to one, Ḡ^QD(α0) converges to a random variable, and E[g_t(α)] may not be well behaved when α ≠ α0. Moon and Schorfheide (2002) encountered a similar problem when analyzing minimum distance estimation of restricted non-stationary time series models. However, consistency and the limit distribution can still be established from a quadratic approximation of the objective function.

Proposition 1 Suppose y_t = α0 y_{t−1} + e_t. Let α̂^QD = argmin_α ḡ^QD(α)'ḡ^QD(α) with ḡ^QD(α) = γ̂_k(α)/γ̂_0(α) for some k ≥ 1. Under Assumption A, (i) √T(α̂^QD − α0) →d N(0, 1) for all |α0| < 1 and (ii) √T(α̂^QD − 1) →d W(1)^{−2} N(0, 1).
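A brute-force implementation of the estimator in Proposition 1 can be sketched as follows; the grid search and the simulated DGP are illustrative, not the paper's implementation.

```python
import numpy as np

# Sketch of the QD estimator of Proposition 1: choose alpha to minimize
# the squared first-order sample autocorrelation of the quasi-differenced
# series.  Grid search keeps the example dependency-free.

def qd_moment(alpha, y, k=1):
    e = y[1:] - alpha * y[:-1]          # e_t(alpha) = y_t - alpha*y_{t-1}
    gamma_k = np.mean(e[k:] * e[:-k])   # sample autocovariance at lag k
    gamma_0 = np.mean(e * e)            # sample variance
    return gamma_k / gamma_0            # sample autocorrelation

def alpha_qd(y):
    grid = np.linspace(-1.1, 1.1, 2201)  # parameter space as in the paper
    Q = [qd_moment(a, y) ** 2 for a in grid]
    return grid[int(np.argmin(Q))]

rng = np.random.default_rng(2)
T, alpha0 = 5000, 0.6
e = rng.standard_normal(T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = alpha0 * y[t-1] + e[t]
a_hat = alpha_qd(y)
```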
Using the sample autocorrelation to define the moments ensures that the objective function is always bounded. The estimator has classical properties when |α0| is strictly bounded away from the unit circle, but is less efficient than the least squares estimator, whose asymptotic variance is 1 − α0². When α0 = 1, Ḡ^QD ⇒ −σ^{−2}W(1)^{−2}. The estimator has a Student t distribution with one degree of freedom (and is hence Cauchy). However, for all |α0| ≤ 1,

t^QD = √T(α̂^QD − α0)/√(Avar(α̂^QD)) ≈ N(0, 1).

Classical normal inference is therefore valid even when α0 is in the neighborhood of 1, a parameter region for which accurate hypothesis testing has proved difficult.

A disadvantage of the QD estimator in this univariate AR(1) setting is that the standard error involves a χ²_1 variable, which can take on values close to zero with non-zero probability, making the t statistic ill-behaved. Adding additional moments will alleviate the problem. We next consider an estimator that exploits the same moment condition but whose asymptotic properties are more stable and have an intuitive interpretation.

2.2 A Linear QD Estimator
The starting point is to view the QD estimator as an IV estimator that uses e_{t−k}(α) as an instrument. Instead of using y_{t−1} or y_{t−k} to instrument y_{t−1}, as in OLS and IV respectively, consider using e_{t−k}(α) as an instrument. Notably, (i) e_{t−k}(α) is not observed, but (ii) e_{t−k}(α0) is stationary even when y_{t−k} is not stationary, for any k ≥ 0. As we will see, stationarity of the instrument is crucial for asymptotic normality of the QD estimator. To resolve the problem that e_{t−k} is a latent instrument, we use the fact that the least squares estimator α̂^OLS is always consistent for α0. Thus, let ẽ_t = y_t − α̂^OLS y_{t−1}. Conveniently, the generated instruments thus do not require special treatment of the standard errors subsequently. Treating ẽ_{t−1} as known, let α̂^A be such that

ḡ^A(α̂^A) = T^{−1} Σ_{t=1}^T ẽ_{t−k}(y_t − α̂^A y_{t−1}) = 0.

If k = 1 and the model is exactly identified, α̂^A is nothing more than a simple IV estimator:

α̂^A = Σ_{t=1}^T y_t ẽ_{t−1} / Σ_{t=1}^T y_{t−1} ẽ_{t−1} = α0 + Σ_{t=1}^T e_t(α0)ẽ_{t−1} / Σ_{t=1}^T y_{t−1} ẽ_{t−1}.

Consistency of α̂^A follows from the fact that ẽ_{t−1} = e_{t−1} + O_p(T^{−δ}), where δ = 1 if α0 = 1 and δ = 1/2 when |α0| < 1, and E[e_t(α0)e_{t−1}(α0)] = 0 by assumption. The asymptotic distribution depends on α0. When α0 = 1, Ḡ^A_T(α0) ⇒ −(σ²/2)(W(1)² + 1). As S = σ⁴ in this case, we have, when α0 = 1,

√T(α̂^A − 1) ⇒ 2(1 + W(1)²)^{−1} N(0, 1).   (3)

When |α0| < 1, T^{−1} Σ_{t=1}^T y_{t−1}e_{t−1} →p σ² and T^{−1} Σ_{t=1}^T y_{t−2}e_{t−1} = o_p(1). It follows that

√T(α̂^A − α0) →d N(0, 1).

The distribution of α̂^A is symmetric, and the estimator is median unbiased. Although the distribution of α̂^A is only mixed normal when α0 = 1,

t = √T(α̂^A − α0)/√(Avar(α̂^A)) ∼ N(0, 1),

and classical normal inference is valid even at |α0| = 1. The usual critical values of −1.64 and −2.32 can be used when the significance level of the test is 5 and 1 percent, respectively.
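The two-step construction of Estimator A (first-step OLS residuals as instruments, k = 1) can be sketched as follows; the unit-root DGP and seed are arbitrary.

```python
import numpy as np

# Sketch of the linear QD estimator ("Estimator A"):
# alpha_A = sum y_t etilde_{t-1} / sum y_{t-1} etilde_{t-1},
# with etilde the first-step OLS residual.

def alpha_A(y):
    y1, y0 = y[1:], y[:-1]
    a_ols = (y1 @ y0) / (y0 @ y0)        # first-step OLS estimate
    etilde = y1 - a_ols * y0             # etilde_t = y_t - a_ols*y_{t-1}
    num = (y1[1:] * etilde[:-1]).sum()   # sum_t y_t * etilde_{t-1}
    den = (y0[1:] * etilde[:-1]).sum()   # sum_t y_{t-1} * etilde_{t-1}
    return num / den

rng = np.random.default_rng(3)
T = 5000
y = np.cumsum(rng.standard_normal(T))    # unit-root case, alpha0 = 1
a_hat = alpha_A(y)
```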
Additional moments can easily be used. Consider the feasible moment conditions

ḡ^A_t(α) = ( T^{−1} Σ_t (y_t − αy_{t−1})ẽ_{t−1},  T^{−1} Σ_t (y_t − αy_{t−1})ẽ_{t−2} )'  and  G^A_t = ( −y_{t−1}ẽ_{t−1},  −y_{t−1}ẽ_{t−2} )'.

Evaluated at α0 = 1,

Ḡ^A|_{α0=1} ⇒ −(σ²/2)(1 + W(1)²)(1, 1)' = G^A_0,

which is again random in view of W(1)². The standardized test is still asymptotically normal. As α̂^A is an instrumental variable estimator, bias increases with the number of instruments but the variance is lower.

2.3 A Modified Moment Estimator

Estimator α̂^A is a two-step IV estimator. Consider yet another estimator:

α̂^B = Σ_{t=2}^T y_t Δy_{t−k} / Σ_{t=1}^T y_{t−1} Δy_{t−k}.   (4)
The estimator cannot be motivated from a regression context, but it is clear that if y_t = α0 y_{t−1} + e_t, then for any |α0| ≤ 1 and k > 0,

α̂^B = α0 + T^{−1} Σ_{t=1}^T e_t Δy_{t−k} / T^{−1} Σ_{t=1}^T y_{t−1} Δy_{t−k}.

For the assumed DGP, E(e_t Δy_{t−k}) = 0 only when E(e_t e_{t−k}) = 0, which is the same condition that underlies α̂^A. The estimator is consistent and is again asymptotically mixed normal. Estimator α̂^B is of course nothing more than a GMM estimator using as moments g^B_t(α) = e_t(α)Δy_{t−k}. For any k > 0, Ḡ^B_T(α0) ⇒ −(σ²/2)(W(1)² + 1) = G^B_0. Then as T → ∞,

√T(α̂^B − 1) ⇒ 2(W(1)² + 1)^{−1} N(0, 1).

On the other hand, when evaluated at |α0| < 1, Ḡ^B(α0) →p −σ²/(1 + α0) and

√T(α̂^B − α0) →d N(0, 2(1 + α0))

for all 0 ≤ |α0| < 1. More importantly, for all |α0| ≤ 1,

t^B(α0) = √T(α̂^B − α0)/√(Âvar(α̂^B)) ≈ N(0, 1),

where

Âvar(α̂^B) = σ̂² ( T^{−1} Σ_{t=1}^T (Δy_{t−k})² ) ( T^{−1} Σ_{t=1}^T y_{t−1}Δy_{t−k} )^{−2}

and σ̂² is a consistent estimate of σ². It should be emphasized that α̂^B is consistent when the moment condition that underlies α̂^A is satisfied.

Estimators A and B replace e_{t−k}(α) by a stationary term that is no longer a function of α. The estimators are thus in fact linear. As well, Ḡ^{A,B}_T(α0) has a limit that is bounded away from zero. Estimators A and B can therefore be expected to be more stable than the QD estimator. All three estimators permit classical inference for all |α0| ≤ 1 at the cost of having a slower convergence rate, and are less efficient than OLS. However, testing if α = .95 when α0 = .98 using α̂^OLS is highly imprecise. It is in these situations that the proposed estimators will be of interest.
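Estimator B and its t statistic are straightforward to compute. In this sketch the variance estimate uses the sandwich form implied by the asymptotics above (an illustrative plug-in; the simulated unit-root DGP and seed are arbitrary):

```python
import numpy as np

# Sketch of Estimator B with k = 1:
# alpha_B = sum y_t dy_{t-1} / sum y_{t-1} dy_{t-1},
# with a plug-in sandwich variance S/G^2 for the t statistic.

def alpha_B_tstat(y, alpha_null):
    dy = np.diff(y)                            # dy_t = y_t - y_{t-1}
    yt, ylag, dylag = y[2:], y[1:-1], dy[:-1]  # y_t, y_{t-1}, dy_{t-1}
    T = len(yt)
    a_hat = (yt @ dylag) / (ylag @ dylag)
    e_hat = yt - a_hat * ylag
    s2 = e_hat @ e_hat / T                     # consistent estimate of sigma^2
    G = (ylag @ dylag) / T                     # sample analog of G_T
    S = s2 * (dylag @ dylag) / T               # Avar of sqrt(T)*gbar
    t = np.sqrt(T) * (a_hat - alpha_null) / np.sqrt(S / G**2)
    return a_hat, t

rng = np.random.default_rng(4)
T = 5000
y = np.cumsum(rng.standard_normal(T))          # alpha0 = 1
a_hat, t = alpha_B_tstat(y, 1.0)
```

The t statistic computed this way is compared with standard normal critical values, exactly as in the text.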
3 The General AR(p) Model

Both estimators above achieve asymptotic normality by using moments defined in terms of stationary variables. So far, there is no deterministic term and e_t is serially uncorrelated. These assumptions will now be relaxed.

3.1 Deterministic Terms
Dealing with deterministic terms is straightforward in our setup. Suppose d_t = δ0, with x_t = y_t − d_t. Let x̂_t = y_t − d̂_t, where d̂_t is a consistent estimate of d_t. For example, if r = 0, x̂_t = y_t − T^{−1} Σ_{t=1}^T y_t. GLS and recursive detrending can easily be accommodated. The population condition

E[g_t^QD(α)] = E[(x_t − αx_{t−1})(x_{t−k} − αx_{t−k−1})] = 0 for k ≥ 1

can be replaced by g_t^A(α) = (x̂_t − αx̂_{t−1})ẽ_{t−k}, where ẽ_t is the least squares residual from a regression of y_t on y_{t−1} plus the deterministic components. Now G^A_t = −x̂_{t−1}ẽ_{t−k} and Ḡ^A ⇒ G^A_0 = −(σ²/2)(W̄(1)² + 1), where W̄(r) = W(r) − ∫_0^1 W(s) ds. The statistic

t = √T(α̂^A − α0)/√(v̂ar(α̂^A)) ≈ N(0, 1)

remains normally distributed as T → ∞. Estimator α̂^B can be obtained using the moment g_t^B(α) = (x̂_t − αx̂_{t−1})Δx̂_{t−k}. The extension to the linear trend case is similar, with the obvious replacement of W̄(r) by W̃(r), where W̃(r) = W(r) − ∫_0^1 (4 − 6s)W(s) ds − r ∫_0^1 (12s − 6)W(s) ds is a detrended standard Brownian motion.
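In the intercept-only case the only change is that y_t is demeaned first. A minimal sketch with Estimator B (sample size, seed, and δ0 = 3 are arbitrary):

```python
import numpy as np

# Sketch of the intercept case (r = 0): demean y_t to get xhat_t, then
# apply Estimator B unchanged (k = 1).
rng = np.random.default_rng(5)
T = 5000
x = np.cumsum(rng.standard_normal(T))    # alpha0 = 1
y = 3.0 + x                              # d_t = delta_0 = 3
xhat = y - y.mean()                      # xhat_t = y_t - dhat_t
dx = np.diff(xhat)                       # delta xhat
a_hat = (xhat[2:] @ dx[:-1]) / (xhat[1:-1] @ dx[:-1])
```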
3.2 Higher Order Autoregressive Processes

Recall that the model is (1 − αL)x_t = u_t, β(L)u_t = e_t, and we have so far assumed β(L) = 1. When β(L) is a finite p-th order polynomial in L, we can rewrite the model as in Said and Dickey (1984):

x_t = ρ0 x_{t−1} + Σ_{j=1}^{p0−1} b_{0,j} Δx_{t−j} + e_t,   (5)

where e_t is serially uncorrelated by assumption. Let b0 = (b_{0,1}, …, b_{0,p0−1})' and β0 = (β_{0,1}, …, β_{0,p})'. Notably, the parameters ρ0, b0 are functions of α0, β0. Furthermore, ρ0 < 1 when α0 < 1. Testing if α0 = 1 is then the same as testing ρ0 − 1 = 0. For arbitrary ρ, b, and p,

e_t(ρ, b, p) = x_t − ρx_{t−1} − Σ_{j=1}^{p−1} b_j Δx_{t−j}.
Notice that e_t depends on p. If p < p0, e_{tp} will be serially correlated. Let θ = (ρ, b_1, …, b_p). Let ẽ_t be the least squares residual from a regression of y_t on the deterministic terms and p lags of y_t. Define the sample moments used in estimation as

ḡ^QD(θ, p) = ( T^{−1} Σ_{t=1}^T e_t(θ, p)e_{t−1}, …, T^{−1} Σ_{t=1}^T e_t(θ, p)e_{t−L} )   (6)
ḡ^A(θ, p) = ( T^{−1} Σ_{t=1}^T e_t(θ, p)ẽ_{t−1}, …, T^{−1} Σ_{t=1}^T e_t(θ, p)ẽ_{t−L} )   (7)
ḡ^B(θ, p) = ( T^{−1} Σ_{t=1}^T e_t(θ, p)Δx̂_{t−1}, …, T^{−1} Σ_{t=1}^T e_t(θ, p)Δx̂_{t−L} ).   (8)
The interpretation of Estimators QD and A is straightforward. To better understand ρ̂^B, define

z_t = x_t − Σ_{j=1}^{p−1} b_j Δx_{t−j}

to rewrite x_t = ρx_{t−1} + Σ_{j=1}^{p−1} b_j Δx_{t−j} + e_t as z_t = ρx_{t−1} + e_{tp}. Notice the dependence of e_{tp} on p. If we knew the b_j, we could define the estimator

ρ̂^B = Σ_{t=2}^T z_t Δx_{t−k} / Σ_{t=1}^T x_{t−1} Δx_{t−k}.   (9)

Clearly,

ρ̂^B = ρ0 + Σ_{t=1}^T e_{tp} Δx_{t−k} / Σ_{t=1}^T x_{t−1} Δx_{t−k}.

The estimator is consistent if it can be shown that T^{−1} Σ_{t=1}^T e_{tp} Δx_{t−k} →p 0. But e_{tp} is serially uncorrelated when p ≥ p0, where p0 is the true lag length. Furthermore, Δx_{t−k} for any k > 0 is driven by innovations prior to t and is thus uncorrelated with e_{tp}. The estimator exploits the same property that e_t(ρ, b, p0) is uncorrelated with e_{t−k} for k ≥ p0. In practice, the b_j have to be estimated. The GMM estimator essentially performs joint estimation of ρ and the b_j. The result is θ̂^B.

Proposition 2 Let y_t be generated as in (2). Let θ = (ρ, b_1, …, b_p). Let x̂_t = y_t − d̂_t, where d̂_t is a consistent estimate of d_t. Suppose Assumption A holds. Define

(θ̂^{A,B}, p*) = argmin_{p=1,…,pmax} min_{θ∈Θ} ḡ^{A,B}(θ, p)'ḡ^{A,B}(θ, p)

for some pre-specified pmax. Then as T → ∞, with √T ḡ^{A,B}(θ0) →d N(0, S^{A,B}) for all |ρ0| ≤ 1,

√T(θ̂^{A,B} − θ0) →d (G_0^{A,B}'G_0^{A,B})^{−1} G_0^{A,B}' N(0, S^{A,B}).

Whereas √T ḡ^OLS(θ0) is non-Gaussian when ρ0 = 1, a central limit theorem still holds when ḡ^{A,B}(θ) is evaluated at θ = θ0. In consequence, the distribution of ρ̂ is mixed normal and classical inference is valid for all |ρ0| ≤ 1. The practical appeal of the three estimators is that the standardized asymptotic distributions are continuous in ρ0. This provides a convenient way to test H0 at various values of ρ0.

To illustrate the properties of the estimators, we simulate y_t as a Gaussian random walk. The data are demeaned to yield x̂_t. We let L = 2. Notably, α̂^B is more spread out than α̂^A. Figure 1 presents the distribution of ρ̂ − 1 scaled by √T, along with OLS (scaled by T). The QD estimator has Cauchy-type features, but those for A and B are not far from the normal distribution. The normal approximation is quite precise when T = 1000. Figure 2 shows the t statistics corresponding to Figure 1, with the normal distribution superimposed. The problem that t^OLS is left skewed is evident. The t^QD statistic has more mass at zero and is more dispersed than the normal distribution. On the other hand, t^A and t^B are much closer to the normal distribution.
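The estimator in equation (9) is easy to illustrate when b_1 is treated as known. A sketch with p0 = 2 and k = 2, so that Δx_{t−k} predates e_t (the DGP, ρ0 = 1, b_{0,1} = 0.4, and sample size are illustrative):

```python
import numpy as np

# Sketch of equation (9): with b1 known, form z_t = x_t - b1*dx_{t-1}
# and estimate rho by rho_B = sum z_t dx_{t-k} / sum x_{t-1} dx_{t-k}.

rng = np.random.default_rng(6)
T, rho0, b1 = 10_000, 1.0, 0.4
x = np.zeros(T)
for t in range(2, T):
    x[t] = rho0 * x[t-1] + b1 * (x[t-1] - x[t-2]) + rng.standard_normal()

dx = np.diff(x)                          # dx[i] = x_{i+1} - x_i
zt = x[3:] - b1 * dx[1:-1]               # z_t = x_t - b1*dx_{t-1}, t = 3,...
xlag = x[2:-1]                           # x_{t-1}
dxk = dx[:-2]                            # dx_{t-2}  (k = 2)
rho_hat = (zt @ dxk) / (xlag @ dxk)
```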
Lag Length Selection  As with any estimator of the autoregressive model, our proposed estimators depend on the choice of the lag length, p. To see how p affects inference, suppose p0 = 2 and x_t = y_t, so that the DGP is Δx_t = ρ0 x_{t−1} − b_{0,1} Δx_{t−1} + e_t. Suppose that the researcher (wrongly) assumes p = 1 and uses the condition

g_t^QD(ρ, 1) = (Δx̂_t − ρx̂_{t−1})(Δx̂_{t−1} − ρx̂_{t−2})
to estimate ρ. But E[g_t^QD(ρ0, 1)] ≠ 0, since the error term Δx_t − ρ0 x_{t−1} = e_t(ρ0) − b_1 Δx_{t−1} is serially correlated even though e_t(ρ0, p0) is an innovation. Accordingly, the parameter estimate associated with these sample moments cannot be the minimum over all possible values of p. In this sense, p is no longer a nuisance parameter but is chosen to satisfy the moment conditions. The J test of overidentifying restrictions provides a natural guide to the selection of p. Estimates corresponding to a J test that rejects the moment conditions should be disregarded.

MA(q) Errors  So far, we have assumed that e_t is serially uncorrelated. Suppose e_t = θ(L)ε_t = (1 + θ_1 L + … + θ_q L^q)ε_t. In such a case, x_t does not have a finite order autoregressive representation. One approach is to take an AR(p) approximation to the AR(∞) model and then define the moments from the approximate model. But one can also use the fact that if e_t is an MA(q0), it should be uncorrelated with e_{t−q0−k} for k > 0. This suggests a more general set of moment conditions. For k = 1, …, L, with e_t(θ, p) = x̂_t − ρx̂_{t−1} − Σ_{j=1}^{p−1} b_j Δx̂_{t−j},

ḡ_k^QD(ρ, b, p, q) = T^{−1} Σ_{t=1}^T e_t(θ, p) e_{t−q−k}(θ, p)
ḡ_k^A(ρ, b, p, q) = T^{−1} Σ_{t=1}^T e_t(θ, p) ẽ_{t−q−k}
ḡ_k^B(ρ, b, p, q) = T^{−1} Σ_{t=1}^T e_t(θ, p) Δx̂_{t−q−k}.
One can therefore choose between using a large p to approximate the autoregressive model, or a more parsimonious ARMA process. For processes with roots close to a moving-average unit root, allowing q > 0 may yield more precise estimates.

3.3 Adding Covariates
One way to improve the power of a unit root test is to exploit the correlation between the error term u_t and some stationary process, say Δz_t. The CADF regression, due to Hansen (1995), is

Δx_t = ρx_{t−1} + Σ_{j=1}^p b_j Δx_{t−j} + Σ_{j=1}^s c_j Δz_{t−j} + ε_t.   (10)

Hansen showed that if the parameters in (10) are estimated by OLS, the t statistic for testing ρ = 0 is a mixture of a standard normal distribution (due to the regressor Δz_t) and a Dickey-Fuller distribution (due to the regressor x_{t−1}), with weights measured by the relative contribution of Δz_t to u_t. Elliott and Jansson (2003) showed that tests that exploit information in the covariates are much closer to the power envelope than tests that do not, but that tests in this class tend to have a non-Gaussian component. The Dickey-Fuller piece is a consequence of least squares estimation.

Covariates are straightforward to implement in our framework. Let ẽ_t be the least squares residual from a regression of y_t on p lags of y_t, s lags of Δz_t, and the deterministic terms. Let ḡ^A(θ, p, s) = (ḡ_1, ḡ_2, …, ḡ_{L−1})', L ≥ p + s, where

ḡ_j^A(θ) = T^{−1} Σ_{t=1}^T ( Δx̂_t − ρx̂_{t−1} − Σ_{i=1}^{p−1} b_i Δx̂_{t−i} − Σ_{i=1}^s c_i Δẑ_{t−i} ) ẽ_{t−j}.

Define the estimator as

(θ̂^A, p*, s*) = argmin_{p=1,…,pmax; s=1,…,smax} min_θ ḡ^A(θ)'ḡ^A(θ).

Let Avar(θ̂^A) be defined as in Proposition 2, and let Avar(ρ̂^A) = [Avar(θ̂^A)]_{11}. Then

√T(ρ̂^A − ρ0)/√(Avar(ρ̂^A)) ≈ N(0, 1).

Estimator B can analogously be defined from E(e_t Δy_{t−k}) = 0 for k = 1, …, L. But again, we need L ≥ p + s for identification.
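The construction of the covariate moments can be sketched as follows, under illustrative simplifications not in the text: p = 1, s = 1, no deterministic terms, and a single lagged covariate Δz_{t−1}.

```python
import numpy as np

# Sketch of the covariate moments: etilde_t is the OLS residual from a
# CADF-type regression of dx_t on (x_{t-1}, dz_{t-1}), and gbar_j is the
# sample moment pairing the model error with lagged residuals.

rng = np.random.default_rng(10)
T = 5000
ez, eta = rng.standard_normal(T), rng.standard_normal(T)
e = 0.6 * np.concatenate(([0.0], ez[:-1])) + 0.8 * eta  # error loads on dz_{t-1}
x = np.cumsum(e)                     # rho0 = 0 in the CADF form
dx = np.diff(x)                      # dx[i] is dx at time i+1

# First-step OLS of dx_t on (x_{t-1}, dz_{t-1}) gives residuals etilde.
X = np.column_stack([x[:-1], ez[:-1]])
coef, *_ = np.linalg.lstsq(X, dx, rcond=None)
etil = dx - X @ coef

def gbar_j(rho, c, j):
    """Sample moment: mean of (dx_t - rho*x_{t-1} - c*dz_{t-1}) * etilde_{t-j}."""
    u = dx - rho * x[:-1] - c * ez[:-1]
    return np.mean(u[j:] * etil[:-j])
```

Stacking gbar_j for j = 1, …, L−1 and minimizing the sum of squares would give the estimator defined above; only the moment construction is sketched here.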
4 Simulations

To illustrate the finite sample properties of the estimators, we simulate data as follows:

y_t = d_t + x_t,
x_t = (1 − λ1 L)^{−1}(1 − λ2 L)^{−1} e_t,
e_t = ε_t + ψε_{t−1},  ε_t ∼ N(0, 1),

or equivalently, Δx_t = ρ0 x_{t−1} + b_{0,1} Δx_{t−1} + e_t, where ρ0 = λ1 + λ2 − λ1λ2 − 1 and b_{0,1} = λ1λ2. Three specifications of e_t are considered:

DGP 1 (p0 = 1, q0 = 0): ψ = λ2 = 0, λ1 = 1, .98, .95, .92, .85, .5, −.5, with (i) p = 1, L = 2 and (ii) p = 1, L = 3;

DGP 2 (p0 = 2, q0 = 0): ψ = 0, (λ1, λ2) = (1, .5), (.95, .5), (.9, .5), (.85, .5), with (i) p = 2, L = 2 and (ii) p = 1, L = 3;

DGP 3 (p0 = q0 = 1): λ2 = 0, (λ1, ψ) = (1, .5), (.8, .5), (1, −.5), (.8, −.5), with (i) p = 2, L = 3 and (ii) p = 1, L = 2 and q = 1.

The assumed regression model is

y_t = δ0 + δ1 t + ρy_{t−1} + Σ_{j=1}^{p−1} b_j Δy_{t−j} + error.

Thus the regression model is correct for DGP 1 if p = 1, and is correct for DGP 2 if p = 2. There does not exist a finite p that will correctly specify DGP 3, though the sum of the autoregressive coefficients in the AR(∞) representation is (α + ψ)/(1 + ψ). Many autoregression based unit root tests suffer from size distortions when ψ < 0. We bound the parameter space for ρ to [−1.1, 1.1]. If the converged estimates are outside of this range, we change the starting value up to 3 times. Very rarely does an estimate fall outside of this range. Reported are results for 1000 replications.

Table 1a reports the mean estimates along with the J test for overidentifying restrictions for the intercept model. The negative bias in OLS when the data are highly persistent is well known, but our proposed estimators are not immune to this problem. The bias in α̂^B at α0 = 1 is in fact larger than for OLS. The QD estimator is upward biased when p = p0 = 1 but is actually more accurate when p is assumed to be two. The last panel of Table 1 reports the J statistic. The test has the correct size when the correct lag length is assumed, and remains so if p > p0 = 1. The true lag length under DGP 2 is two. Interestingly, the J tests reject the model and favor an AR(3) model. This suggests that in finite samples, the correct lag length does not necessarily imply more precise estimates. We use four moments to estimate the parameters of an AR(3) model. The model is correctly specified, the J tests do not reject the model, and the estimated sum of the autoregressive parameters is very precise. When the model is instead wrongly assumed to be an AR(1), the estimates are upward biased and the J tests reject the moments, showing that the J test can serve as a lag length selection device. For DGP 3, we first use an AR(2) model to approximate the AR(∞) model, which will be inadequate when ψ is large. The J tests strongly reject the model when ψ = .5, and less strongly when ψ = −.5. We then set p = 1, q = 2, L = 1 to allow for moving average errors. The J test now (correctly) fails to reject the moment conditions defined by ḡ^QD and ḡ^A. These moments provide quite precise point estimates of λ1 (being 1 or .8).
However, Estimator B, which is not specifically designed for moving-average errors, gives inaccurate estimates.

We consider three hypotheses, with t statistics denoted t1, t2, and t3 respectively. The asymptotic 5% critical value is used in all cases except when the least squares estimator is used in t3. In that case the critical value is −2.86 in the intercept only case and −3.41 when a trend is also allowed. We denote this case as the ADF. The results for the intercept model are reported in Table 2. We only report results when the model is not rejected by the J tests.

The rejection rates for t1 correspond to the finite sample size for testing ρ = ρ0. The well documented size distortion of OLS when ρ0 is close to 1 is evident. All three estimators lead to much improved size. When ρ0 is between .9 and 1, estimators QD and A still lead to tests with distorted sizes, but Estimator B yields accurate inference. The rejection rates for t2 indicate the power of a one-sided test of ρ = .95 against the alternative that ρ > .95. Because α̂^QD is upward biased, t^QD is distorted even in large samples. The rejection rates for t3 correspond to the size and power of a unit root test. Estimator B again yields the most accurate inference. Not surprisingly, its power is lower than that of the Dickey-Fuller test: the ADF/OLS and B have rejection rates of .35 and .13 respectively at ρ0 = .95. The apparent cost of being able to perform inference robust to α0 being on the unit circle is a loss of power, and here the power loss of the proposed estimators compared to OLS is evident. Autoregression based unit root tests have distorted size in this parameter range; see Schwert (1989) and Ng and Perron (2001). Yet t^A and especially t^B have good size and power even when ψ = −.5. Explicitly modeling the moving-average error structure appears to be a promising approach.

Table 3 reports results for the time trend model. In this case, estimators QD and A have size distortions at T = 200, while B remains quite accurate. We have not experimented with GLS detrending, which might make the tests more accurate. All things considered, we find Estimator B to yield the most precise inference, while A is more powerful. Both yield tests with smaller size distortions than the QD estimator, and certainly than OLS, for hypotheses about ρ near unity.

Ultimately, the proposed estimators are useful only if the potential for more accurate size when the data are highly persistent does not come at the cost of low power outside the persistent range. Table 4 reports the finite sample power of one-sided tests of α = α0 − .05 and α = α0 − .10, for values of α0 ranging from −.5 to 1. Notice that when ρ0 is far from 1, such as .5, the power of OLS, QD, and A are quite similar, though the power of B is somewhat lower. As there does not exist a test in the literature that can yield correct inference both when ρ0 is close to 1 and when ρ0 = 1, the proposed estimators can be useful in testing if α is .95 or .98, supplementing information from unit root tests.
5 Multiple Equations
Consider the regression model with K regressors

y_t = x'_{t−1}β + u_{yt},   (11)
x_t = α_x x_{t−1} + e_{xt},
u_{yt} = α_y u_{y,t−1} + e_{yt},

where e_{yt} ∼ (0, σ_y²), e_{xt} ∼ (0, σ_x²), cov(e_{xt}, e_{yt}) = σ_{xy}, and cov(e_{x,t−j}, e_{y,t−k}) = 0 for all j, k ≠ 0. If |α_y| < 1 and |α_x| < 1, then (y_t, x_t) are both stationary, and √T-consistent and asymptotically normal estimates can be obtained if u_{yt} is uncorrelated with x_{t−1}. If α_x = 1 and u_{yt} is stationary, (y_t, x_t) are both I(1). Then (1, −β) is a cointegrating vector, and least squares provides super-consistent estimates, but inference is non-standard. If x_t is I(1) and u_{yt} is also I(1), the regression is spurious. The challenge facing practitioners is that it is not easy to establish whether any of these variables are strictly I(1) or I(0). The question is how to do inference that is robust to the dynamic properties of the data.

5.1 Single Equation Approach
Suppose uyt is known to be serially uncorrelated so that uyt = eyt , but uxt is serially uncorrelated. OLS uses the moments EgtOLS (β) = E(xt−1 eyt (β)) = 0. As is well known, the asymptotic distribution of βbOLS is non-standard if xt is highly persistent and possibly non-stationary. Consider instead QD (β) = E[et−k (β)eyt (β)]. Egkt
Stationarity of et enables an application of central limit theorem so that some hope remains for achieving asymptotic normality. As in the univariate AR(p) model, we also consider A gkt (β) = eext−k eyt
and
B gkt = ∆xt−k eyt
where eext is the least squares residual from a regression of yt on xt−1 , and eext = ext + Op (T −1/2 ). Using arguments analogous to the AR(1) model, one can show that d βbA −→ K A N (0, sA ) d βbB −→ K B N (0, sB )
where s^A and s^B are the asymptotic variances of \(\bar g^A(\beta_0)\) and \(\bar g^B(\beta_0)\) respectively, and K^A and K^B are random if α_x = 1. However, the t statistic
\[
\frac{\sqrt T(\hat\beta^{A,B} - \beta_0)}{\sqrt{\widehat{\mathrm{Avar}}(\hat\beta^{A,B})}} \approx N(0,1),
\]
where \(\widehat{\mathrm{Avar}}(\hat\beta^{A,B})\) are the estimated asymptotic variances. Estimation of nuisance parameters such as the long-run variance is not necessary; these are all implicit in \(\bar G^{A,B}(\beta)\).

The QD estimator is also valid when u_y is possibly non-stationary, in which case the regression is spurious. Define \(\Delta^{\alpha_y} = (1-\alpha_y L)\). We have \(\Delta^{\alpha_y}y_t = \Delta^{\alpha_y}x_{t-1}\beta + e_{yt}\). Let θ = (α_y, β) and \(g_t(\theta) = (g_1(\theta),\ldots,g_L(\theta))'\). For k ≥ 1, define
\[
\bar g_k^{QD}(\theta) = \hat\gamma(k) = T^{-1}\sum_{t=1}^T e_{yt}(\theta)\,e_{y,t-k}(\theta),
\qquad
\bar g_k^{A}(\theta) = T^{-1}\sum_{t=1}^T e_{yt}\,\tilde e_{y,t-k},
\qquad
\bar g_k^{B}(\theta) = T^{-1}\sum_{t=1}^T e_{yt}\,\Delta^{\tilde\alpha_y}x_{t-k-1},
\]
where \(\tilde e_{yt} = \Delta^{\tilde\alpha_y}\tilde u_t\), \(\tilde u_t = y_t - \tilde\beta x_{t-1}\), \(\tilde\beta\) is obtained from a least squares regression of ∆y_t on ∆x_{t−1}, and \(\tilde\alpha_y\) is obtained by least squares from a regression of \(\tilde u_t\) on \(\tilde u_{t-1}\). When L = 1, it is easy to see that
\[
\hat\beta^B = \frac{\sum_{t=1}^T \Delta^{\tilde\alpha_y} y_t\, \Delta^{\tilde\alpha_y} x_{t-2}}{\sum_{t=1}^T \Delta^{\tilde\alpha_y} x_{t-1}\, \Delta^{\tilde\alpha_y} x_{t-2}}
\]
is consistent for β whenever \(\Delta^{\tilde\alpha_y}\) is evaluated at a consistent estimate of α_{y0}. The moment conditions \(\bar g^B\) enable joint estimation of α_y and β. So far, the estimators of β have not exploited information in the marginal distribution of x, because the dynamic relation between e_{yt} and e_{xt} is often unknown a priori. We now consider a more versatile approach that exploits more information.
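A minimal sketch of the L = 1 estimator \(\hat\beta^B\) (Python, our variable names; the preliminary steps follow the text: \(\tilde\beta\) from differenced data, \(\tilde\alpha_y\) from the AR(1) regression of the residual). This is an illustration, not the paper's implementation:

```python
import numpy as np

def beta_hat_B(y, x):
    """Sketch of the L = 1 estimator beta_hat^B from quasi-differenced data.

    Steps (variable names are ours):
      1. beta_tilde: OLS of dy_t on dx_{t-1} (first differences).
      2. u_tilde_t = y_t - beta_tilde * x_{t-1}.
      3. alpha_tilde: OLS of u_tilde_t on u_tilde_{t-1}.
      4. Quasi-difference at alpha_tilde and form the ratio of the text.
    """
    dy, dx = np.diff(y), np.diff(x)
    # 1. preliminary slope from differenced data
    beta_tilde = (dx[:-1] @ dy[1:]) / (dx[:-1] @ dx[:-1])
    # 2. residual in levels
    u = y[1:] - beta_tilde * x[:-1]
    # 3. AR(1) coefficient of the residual
    alpha_tilde = (u[:-1] @ u[1:]) / (u[:-1] @ u[:-1])
    # 4. quasi-difference: (1 - alpha_tilde * L) z_t
    qd = lambda z: z[1:] - alpha_tilde * z[:-1]
    qy, qx = qd(y), qd(x)
    # sum qy_t * qx_{t-2} over sum qx_{t-1} * qx_{t-2}
    return (qy[2:] @ qx[:-2]) / (qx[1:-1] @ qx[:-2])
```

In a simulated cointegrated design with β = 1 and αx = 1, the estimate settles near the true value even though x_t is I(1).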
5.2 Systems Estimation
Consider again the model defined by (11), where α_x is possibly on the unit circle but α_y is known to be zero a priori. Then θ = (β, α_x, σ_x², σ_y², σ_{xy})'. Let \(w_t = (\Delta^{\alpha_x}y_t, \Delta^{\alpha_y}x_t)'\) and let \(\Omega(j) = E(w_t w_{t-j}')\) be the autocovariance of the quasi-differenced variables at lag j as implied by the model. In the present example,
\[
\Omega(0) = \begin{pmatrix} \beta^2\sigma_x^2 + \sigma_y^2(1+\alpha_x^2) & \beta\sigma_x^2 \\ \beta\sigma_x^2 & \sigma_x^2 \end{pmatrix}
\qquad\text{and}\qquad
\Omega(1) = \begin{pmatrix} \beta^2\sigma_{xy} & 0 \\ 0 & 0 \end{pmatrix}.
\]
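Quasi-differencing at the true root turns x_t into its iid innovation, so the sample autocovariances that enter these moments are well behaved even when α_x = 1. A minimal check (Python, our notation):

```python
import numpy as np

rng = np.random.default_rng(2)
T, alpha_x, sigma_x = 10_000, 1.0, 1.0
ex = sigma_x * rng.standard_normal(T)
x = np.zeros(T)
for t in range(1, T):
    x[t] = alpha_x * x[t-1] + ex[t]     # a random walk: x_t is I(1)

qd_x = x[1:] - alpha_x * x[:-1]         # quasi-difference at the true root
omega0 = qd_x @ qd_x / len(qd_x)        # sample lag-0 autocovariance
omega1 = qd_x[1:] @ qd_x[:-1] / len(qd_x)
print(omega0, omega1)                    # near sigma_x^2 = 1 and near 0
```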
Let \(\hat\Omega^d(j)\) be the j-th sample autocovariance of the data quasi-differenced at α_x. We need at least five moments for identification. Let \(\omega(\theta) \subset (\operatorname{vech}(\Omega(0)(\theta))', \operatorname{vech}(\Omega(1)(\theta))')'\) and let \(\hat\omega^d(\alpha_x)\) be the corresponding sample autocovariances. Let \(\bar g^{QD}(\theta) = \operatorname{vech}(\hat\omega^d(\alpha_x) - \omega(\theta))\). Then
\[
\hat\theta^{QD} = \operatorname{argmin}_\theta\ \bar g^{QD}(\theta)'\,W_T\,\bar g^{QD}(\theta).
\]
This is nothing more than a GMM estimator, but it has the important feature that even if the observed data are non-stationary, \(g_t(w_t;\theta_0)\) is stationary and \(\sqrt T\,\bar g(\theta_0)\) obeys a central limit theorem. If the data are stationary, \(\bar G_T^{QD}(\theta_0)\) converges to a non-random matrix \(G_0^{QD}\), and \(\hat\theta^{QD}\) has the usual large sample properties of GMM as derived in Hansen (1982). When one or more components of (y_t, x_t) are non-stationary, \(\bar G_T^{QD}(\theta_0)\) will have a random limit, and hence \(\hat\theta^{QD}\) is mixed normal. But so long as \(\sqrt T\,\bar g^{QD}(\theta_0)\xrightarrow{d}N(0,S)\), the asymptotic distribution is given by
\[
\sqrt T(\hat\theta^{QD}-\theta_0)\xrightarrow{d}(G_0'G_0)^{-1}G_0'\,N(0,S).
\]
The t statistic is asymptotically normal. An important feature of the moments is that they are defined in terms of w_t, the quasi-differenced data. The AR(1) model considered in Section 2 is just a special case. Quasi-differencing has a long tradition in econometrics and underlies GLS estimation. Canjels and Watson (1997) found that quasi-differencing gives more precise estimates of the trend parameters when the errors are highly persistent. Pesavento and Rossi (2005) suggest that for such data, quasi-differencing can improve the coverage of impulse response functions. In both studies, the data are quasi-differenced at α = ᾱ, which is fixed at a value suggested by the local-to-unity framework. In our analysis, this parameter is itself estimated.

The QD estimator can be made robust to possible non-stationarity in both the regressors and the errors of the model. In the above example, we would quasi-difference the data by \(\Delta^\alpha = (1-\alpha_y L)(1-\alpha_x L)\). More generally, let Θ be a K × 1 vector of parameters of interest. If the model has the state space representation
\[
Y_t = \Pi(\Theta)Y_{t-1} + V_t, \qquad V_t\sim(0,\Omega), \tag{12}
\]
the autocovariances are defined by
\[
\Omega(0) = \operatorname{var}(Y_t(\Theta)), \quad \operatorname{vec}(\Omega(0)) = \big(I - \Pi(\Theta)\otimes\Pi(\Theta)\big)^{-1}\operatorname{vec}(\Omega),
\qquad
\Omega(k) = \operatorname{cov}(Y_t(\Theta), Y_{t-k}(\Theta)) = \Pi(\Theta)^k\,\Omega(0).
\]
These autocovariances are functions of Θ to the extent that Π depends on Θ. In DSGE models, (12) would emerge after solving the model, such as by the QZ decomposition. We want to estimate some or all of the parameters Θ in (12) while being agnostic as to whether the shocks are permanent or transitory. A slight variation of the quasi-differencing framework makes this possible. Let α_1, . . . , α_n be the largest autoregressive roots of the n shocks in the structural model. Define \(\Delta^\alpha = \prod_{j=1}^n(1-\alpha_j L)\), and partition Θ into (Θ^−, α).

The QD Estimator:
0: Initialize α_j, j = 1, . . . , n, to yield \(\Delta^\alpha\).
1: Compute \(\hat\omega^d(\alpha)\), the unique sample covariances of the data quasi-differenced by \(\Delta^\alpha\).
2: For a given Θ, compute ω(Θ), the analytical covariances and autocovariances implied by the quasi-differenced representation of the model.
3: Let \(\bar g^{QD}(\Theta) = \hat\omega^d(\alpha) - \omega(\Theta)\). Then \(\hat\Theta = \arg\min_\Theta\ \bar g^{QD}(\Theta)'W_T\,\bar g^{QD}(\Theta)\).

Again, \(\bar G_T^{QD}(\Theta_0)\) will converge to a random matrix when α_j = 1 for some j. But so long as \(\Delta^{\alpha_0}Y_t\) is stationary whether or not Y_t is stationary, \(\bar g(\Theta_0)\) will obey a central limit theorem, and \(\sqrt T\) consistent estimates of Θ whose t statistics are normally distributed can be obtained.
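The recipe can be illustrated on its simplest special case, the univariate AR(1) of Section 2, where the model implies that the quasi-differenced data have autocovariance σ² at lag 0 and zero at higher lags. The sketch below (Python, our names; identity weighting and a grid search in place of a numerical optimizer) is an illustration, not the paper's implementation:

```python
import numpy as np

def qd_ar1_root(y, grid=None):
    """Steps 0-3 of the QD recipe for an AR(1), y_t = a * y_{t-1} + e_t.

    At the true root, the quasi-differenced data are the iid innovations,
    so their lag-1 and lag-2 autocorrelations should be near zero.
    """
    if grid is None:
        grid = np.linspace(-0.99, 1.0, 2000)
    best_a, best_obj = None, np.inf
    for a in grid:
        e = y[1:] - a * y[:-1]                  # quasi-difference at candidate root
        g0 = e @ e / len(e)                     # lag-0 sample autocovariance
        g1 = e[1:] @ e[:-1] / len(e)            # lag-1
        g2 = e[2:] @ e[:-2] / len(e)            # lag-2
        obj = (g1 / g0) ** 2 + (g2 / g0) ** 2   # identity-weighted GMM objective
        if obj < best_obj:
            best_a, best_obj = a, obj
    return best_a
```

The same estimate is obtained whether the data are stationary or have a unit root, which is the point of the construction.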
5.3 Simulations
We consider two examples. The first is a regression with one possibly non-stationary regressor (as in predictive regressions). The second is a system of three equations driven by a possibly non-stationary common shock (as in DSGE models).

Example 1. The data are generated as in Jansson and Moreira (2006):
\[
y_t = \beta x_{t-1} + u_{yt}, \qquad
x_t = \alpha_x x_{t-1} + e_{xt}, \qquad
u_{yt} = \alpha_y u_{y,t-1} + e_{yt}, \qquad
\sigma_{xy} = \rho_{xy}\sigma_x\sigma_y,
\]
with \(e_{yt}\sim N(0,1)\), \(e_{xt}\sim N(0,1)\). The objective is to test H0 : β = 0. For estimator A, \(\tilde e_{xt}\) are the least squares residuals from a regression of x_t on x_{t−1}. We consider two sets of instruments: estimator A1 uses \(\tilde e_{t-1}\) and \(\tilde e_{t-2}\) as instruments, while A2 also uses \(\tilde e_{t-3}\). For estimator B, we use \(\Delta y_{t-j}\) and \(\Delta x_{t-j}\) as instruments, with j = 1, 2 in B1 and j = 1, 2, 3 in B2.

Figure 5 shows the distribution of the t statistic for T = 200 at α_x = 1. Evidently, the normal distribution is a reasonably good approximation to the finite sample distribution of all tests except OLS.² Table 5 shows that OLS exhibits substantial bias when α is close to but not equal to one, while the QD estimator as well as estimators A and B are quite well behaved. Power is evidently increasing in the number of moments, and tests based on A are significantly more powerful than those based on B. Estimator A with a suitable choice of the number of moments can thus achieve good size with limited loss of power.
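A minimal version of estimator A for this design (Python, our variable names; a single instrument, whereas A1 and A2 in the text use two or three lagged residuals):

```python
import numpy as np

def beta_hat_A(y, x):
    """Sketch of estimator A for y_t = beta * x_{t-1} + u_yt.

    The instrument is the lagged residual from regressing x_t on x_{t-1}
    (one instrument only; the paper's A1/A2 use two or three).
    """
    a_hat = (x[:-1] @ x[1:]) / (x[:-1] @ x[:-1])
    ex_res = x[1:] - a_hat * x[:-1]      # \tilde e_{x,t}, t = 1..T-1
    z = ex_res[:-1]                      # instrument \tilde e_{x,t-1}
    # solve the moment condition  sum z_t * (y_t - beta * x_{t-1}) = 0
    return (z @ y[2:]) / (z @ x[1:-1])
```

Because \(\tilde e_{x,t-1}\) is asymptotically the lagged innovation of x, it is uncorrelated with e_{yt} even when cov(e_{xt}, e_{yt}) ≠ 0, which is what removes the usual predictive-regression bias.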
Example 2. To assess the properties of the QD estimator, we consider a one-sector stochastic
growth model with inelastic labor supply. The problem facing the central planner is to maximize \(E_t\sum_{t=0}^\infty \beta^t \ln C_t\) subject to
\[
Y_t = C_t + I_t = K_{t-1}^{\psi}(Z_t L_t)^{1-\psi}
\]
\[
K_t = (1-\delta)K_{t-1} + I_t
\]
\[
Z_t = \exp(\bar g t)\exp(u_{zt}), \qquad u_{zt} = \rho_z u_{z,t-1} + e_{zt}, \qquad |\rho_z| \le 1,
\]
where Y_t is output, C_t is consumption, K_t is capital, L_t is labor input, and Z_t is the level of technology. We allow ρ_z to be on the unit circle. Additional results can be found in Gorodnichenko and Ng (2007), where we use the covariance structure of quasi-differenced data to estimate the parameters of various DSGE models when it is not known a priori whether shocks are permanent or transitory. The results suggest that the parameters are precisely estimated and that the normal distribution is a reasonable approximation to the finite sample distribution of the estimator even when the shocks are strongly persistent.

Summarizing, the classical properties of our estimators come from the use of quasi-differenced data: \(\Delta^\alpha Y_t\) is stationary for all |α_j| ≤ 1, j = 1, . . . , n. This avoids, or at least substantially alleviates, the problems that usually complicate inference when the data are persistent. Expressing the moments in terms of e_{yt} enables conditionally normal inference even if x_t is non-stationary. In effect, we can obtain \(\sqrt T\) consistent and asymptotically normal estimates of the cointegrating vector when x_t is I(1), and we can conduct classical inference on β when x_t is nearly I(1).
² When T = 100, estimator A also has some size distortion, though not as large as OLS; when T = 1000, the normal approximation is very accurate.
6 Conclusion
In this paper, we suggest that moments based on quasi-differenced data can be used to construct estimators with classical properties. Quasi-differencing renders possibly non-stationary processes stationary, so that classical limit theorems can be applied. But it is also for this reason that the estimates are \(\sqrt T\) consistent rather than super-consistent, as they would be if the levels of the data were used. The payoff from this slower convergence rate is asymptotic normality: the inconvenience of non-standard inference that is characteristic of analysis of integrated data can be avoided.
Appendix

We begin with the following lemma, taken from Wu (1981):

Lemma 1. Let θ be the parameter of interest and θ_0 denote the true value of θ. Suppose that for any δ > 0,
\[
\liminf_{T\to\infty}\ \inf_{\|\theta-\theta_0\|\ge\delta}\ \big(Q_T(\theta) - Q_T(\theta_0)\big) > 0 \quad a.s.\ (\text{or in probability}).
\]
Then \(\hat\theta_T\xrightarrow{a.s.}\theta_0\) (or in probability) as T → ∞.

Let \(Q = g(\alpha)'g(\alpha)\) where \(g_j(\alpha) = \gamma_j(\alpha)/\gamma_0(\alpha)\), \(\gamma_j(\alpha) = E\,e_t(\alpha)e_{t-j}(\alpha)\), and \(Q_T = \bar g(\alpha)'\bar g(\alpha)\), where \(\bar g_j(\alpha) = \hat\gamma_j(\alpha)/\hat\gamma_0(\alpha)\) and \(\hat\gamma_j(\alpha) = T^{-1}\sum_{t=1}^T (y_t-\alpha y_{t-1})(y_{t-j}-\alpha y_{t-j-1})\). Let \(G_t(\alpha) = \partial g_t(\alpha)/\partial\alpha\).
For the model y_t = αy_{t−1} + e_t,
\[
e_t(\alpha)e_{t-j}(\alpha) = (y_t-\alpha y_{t-1})(y_{t-j}-\alpha y_{t-j-1}) = \big(e_t(\alpha_0)+(\alpha_0-\alpha)y_{t-1}\big)\big(e_{t-j}(\alpha_0)+(\alpha_0-\alpha)y_{t-j-1}\big)
\]
\[
R_{tj}(\alpha) = \frac{\partial\, e_t(\alpha)e_{t-j}(\alpha)}{\partial\alpha} = -\big(y_{t-1}e_{t-j}(\alpha_0) + y_{t-j-1}e_t(\alpha_0)\big) - 2(\alpha_0-\alpha)y_{t-1}y_{t-j-1}.
\]
Without loss of generality, let L = 1 with j = 1. Let \(\bar R_{T1}(\alpha) = T^{-1}\sum_{t=1}^T R_{t1}(\alpha)\). Then
\[
\bar G_T(\alpha) = \frac{\bar R_{T1}(\alpha)}{\hat\gamma_0(\alpha)} - \frac{\hat\gamma_1(\alpha)\bar R_{T0}(\alpha)}{\hat\gamma_0(\alpha)^2}.
\]
Note that \(\bar G_T(\alpha) = O_p(1)\) for all |α_0| ≤ 1 even though \(\bar R_T(\alpha)\) and \(\hat\gamma(j)\) are both O_p(T) when α_0 = 1. Furthermore, \(\bar R_{T1}(\alpha_0)\xrightarrow{p}\gamma(0)\) when |α_0| < 1, and when α_0 = 1,
\[
\bar R_{T1}(\alpha_0) = -2T^{-1}\sum_{t=1}^T y_{t-2}e_{t-1} - T^{-1}\sum_{t=1}^T e_{t-1}^2 + O_p(T^{-1/2}) \Rightarrow -\sigma^2 W(1)^2.
\]
Let \(G_0 = \operatorname{plim}\bar G(\alpha)\big|_{\alpha=\alpha_0}\). Thus G_0 = 1 when |α_0| < 1, and G_0 = W(1)² when α_0 = 1.
Proof of Proposition 1

When α_0 is strictly bounded away from 1, Assumptions A and B hold and consistency follows by standard arguments. For the limit distribution, \(\sqrt T\,\bar g_1(\alpha_0)\sim N(0,1)\) and G_0 = 1. We have \(\sqrt T(\hat\alpha-\alpha_0)\xrightarrow{d}N(0,1)\).

When α_0 = 1, we consider the quadratic approximation of Q_T(α):
\[
Q_T(\alpha) = Q_T(\alpha_0) + \bar g(\alpha_0)\bar G_T(\alpha_0)(\alpha-\alpha_0) + \tfrac12(\alpha-\alpha_0)^2\,\bar G_T(\bar\alpha)'\bar G_T(\bar\alpha)
= Q_T(\alpha_0) + \bar g(\alpha_0)\bar G_T(\alpha_0)(\alpha-\alpha_0) + \tfrac12(\alpha-\alpha_0)'\bar J_T(\bar\alpha)(\alpha-\alpha_0),
\]
where \(\bar\alpha\in[\alpha,\alpha_0]\) and \(\bar J(\alpha) = \bar G_T(\alpha)'\bar G_T(\alpha)\). For any |α − α_0| = δ > 0, \(Q_T(\alpha)-Q_T(\alpha_0) = \tfrac12(\alpha-\alpha_0)'J_T(\bar\alpha)(\alpha-\alpha_0) > 0\). The condition for Lemma 1 is satisfied and \(\hat\alpha\xrightarrow{p}\alpha_0\).

To obtain the limit distribution, consider a sequence of local neighborhoods such that α = α_0 + T^{−1/2}s. Then, given consistency,
\[
Q_T(\alpha) = Q_T(\alpha_0) + g(\alpha_0)G_T(\alpha_0)\frac{s}{\sqrt T} + \frac{1}{2T}s'J_T(\alpha_0)s + o_p(s^3),
\]
since \(J(\bar\alpha) = J(\alpha_0) + (\alpha-\alpha_0)\,\partial J(\alpha)/\partial\alpha\big|_{\alpha=\breve\alpha}\) for some \(\breve\alpha\in[\bar\alpha,\alpha_0]\). Define
\[
Z_T = J_T(\alpha_0)^{-1}G_0 T^{1/2}g(\alpha_0) = (G_0'G_0)^{-1}G_0'T^{1/2}g(\alpha_0),
\]
\[
q_T(s) = (s-Z_T)'J_0(s-Z_T) = s'J_0 s - 2s'J_0 Z_T + Z_T'J_0 Z_T.
\]
The limit objective function is
\[
Q(\alpha) = Q(\alpha_0) + g(\alpha_0)G_0\frac{s}{\sqrt T} + \frac{1}{2T}s'J_0 s = Q_T(\alpha_0) - \frac{1}{2T}Z_T'J_0 Z_T + \frac{1}{2T}q_T(s) \tag{13}
\]
and depends on α via s only through the last term. The distribution of \(\hat s = T^{1/2}(\hat\alpha-\alpha_0)\) is thus determined by Z_T. But \(Z_T = (G_0'G_0)^{-1}G_0'T^{1/2}\bar g(\alpha_0)\). The stated result follows by noting that \(G_0 = W(1)^2\) and \(\sqrt T\,\bar g(\alpha_0)\xrightarrow{d}N(0,1)\).

Proof of Proposition 2

Consider estimator B:
\[
\hat\alpha^B - \alpha = \frac{\sum_{t=1}^T e_t\,\Delta y_{t-1}}{\sum_{t=1}^T y_{t-1}\,\Delta y_{t-1}}.
\]
Since \(\Delta y_{t-1} = e_{t-1}\) when α_0 = 1, the numerator satisfies \(\frac{1}{\sqrt T}\sum_{t=1}^T e_t e_{t-1}\xrightarrow{d}N(0,\sigma^4)\). The denominator is \(\sum y_{t-1}e_{t-1} = \sum(y_{t-2}+e_{t-1})e_{t-1}\). Now \(T^{-1}\sum_{t=1}^T y_{t-2}e_{t-1}\Rightarrow\frac{\sigma^2}{2}(W(1)^2-1)\) and \(T^{-1}\sum_{t=1}^T e_{t-1}^2\xrightarrow{p}\sigma^2\). Thus the denominator, scaled by T^{−1}, converges to \(\frac{\sigma^2}{2}(W(1)^2+1)\).
To show that the t statistic is asymptotically normal even though \(\hat\alpha^B\) is only conditionally normal, we need to show that the numerator and the denominator of the t statistic are asymptotically independent. For this, consider the joint distribution of
\[
\Big[\,T^{-1/2}\sum_{t=1}^T e_{t-1}e_t,\;\; T^{-1}\sum_{t=1}^T y_{t-1}e_{t-1}\,\Big].
\]
Their covariance is
\[
T^{-3/2}\,E\Big[\sum_{s=1}^T e_{s-1}e_s \sum_{t=1}^T y_{t-1}e_{t-1}\Big].
\]
Since \(y_{t-1} = \sum_{j=1}^{t} e_{j-1}\), this covariance is non-zero only if s = t. In this case,
\[
T^{-3/2}\sum_{s=1}^T E(e_s^2 e_{s-1}^2) = T^{-3/2}\sum_{s=1}^T \sigma^4 = \sigma^4 T^{-1/2} \longrightarrow 0.
\]
The two terms are asymptotically uncorrelated. By the Brownian motion property of the denominator and the asymptotic normality of the numerator, the two terms are also asymptotically independent. By continuous mapping, the ratio has the limit \(2(1+W(1)^2)^{-1}N(0,1)\).

When |α_0| < 1, the denominator is
\[
T^{-1}\sum_{t=1}^T y_{t-1}\Delta y_{t-1} = T^{-1}\sum_{t=1}^T\big(y_{t-1}^2 - y_{t-1}y_{t-2}\big) = \hat\gamma_y(0)-\hat\gamma_y(1) \xrightarrow{p} \gamma_y(0)-\gamma_y(1) = \frac{\sigma^2(1-\alpha_0)}{1-\alpha_0^2} = \frac{\sigma^2}{1+\alpha_0}.
\]
Consider now the numerator. Now \(T^{-1}\sum_{t=1}^T e_t\Delta y_{t-1}\xrightarrow{p}E(e_t\Delta y_{t-1}) = 0\) by the law of iterated projection, and \(\operatorname{var}(\Delta y_t) = 2\gamma_y(0)-2\gamma_y(1) = \frac{2\sigma^2}{1+\alpha_0}\). It follows that \(\operatorname{var}(e_t\Delta y_{t-1}) = 2\sigma^4/(1+\alpha_0)\). Combining the results,
\[
\sqrt T(\hat\alpha^B-\alpha_0)\xrightarrow{d}N\Big(0,\ \frac{2\sigma^4/(1+\alpha_0)}{\big(\sigma^2/(1+\alpha_0)\big)^2}\Big) = N\big(0,\ 2(1+\alpha_0)\big).
\]
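The limiting variance 2(1 + α_0) lends itself to a numerical check. The sketch below (Python, our notation) uses the IV form implied by the display above, \(\hat\alpha^B = \sum\Delta y_{t-1}y_t / \sum\Delta y_{t-1}y_{t-1}\):

```python
import numpy as np

def alpha_hat_B(y):
    """IV estimator with instrument Delta y_{t-1}:
    alpha_hat^B = sum(Dy_{t-1} * y_t) / sum(Dy_{t-1} * y_{t-1})."""
    dy = np.diff(y)
    z = dy[:-1]                    # Delta y_{t-1}
    return (z @ y[2:]) / (z @ y[1:-1])

# Monte Carlo check of sqrt(T)(alpha_hat - alpha0) ~ N(0, 2(1 + alpha0))
rng = np.random.default_rng(1)
T, a0, reps = 1000, 0.5, 1000
stats = []
for _ in range(reps):
    e = rng.standard_normal(T)
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = a0 * y[t-1] + e[t]
    stats.append(np.sqrt(T) * (alpha_hat_B(y) - a0))
print(np.var(stats))               # should be near 2(1 + a0) = 3
```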
References

Canjels, E. and Watson, M. W. 1997, Estimating Deterministic Trends in the Presence of Serially Correlated Errors, Review of Economics and Statistics May, 184–200.
Elliott, G. and Jansson, M. 2003, Testing for Unit Roots with Stationary Covariates, Journal of Econometrics 115:1, 75–91.
Gorodnichenko, Y. and Ng, S. 2007, Estimation of DSGE Models when the Data are Persistent, mimeo, Columbia University.
Hansen, B. 1995, Rethinking the Univariate Approach to Unit Root Tests: How to Use Covariates to Increase Power, Econometric Theory 11, 1148–1171.
Hansen, L. P. 1982, Large Sample Properties of Generalized Method of Moments Estimators, Econometrica 50, 1029–1054.
Jansson, M. and Moreira, M. 2006, Optimal Inference in Regression with Nearly Integrated Regressors, Econometrica 74, 681–714.
Kleibergen, F. 1999, Reduced Rank Regression Using GMM, in L. Mátyás (ed.), Generalized Method of Moments Estimation, Cambridge University Press.
Laroque, G. and Salanie, B. 1997, Normal Estimators for Cointegrating Relationships, Economics Letters 55, 185–189.
Moon, R. and Schorfheide, F. 2002, Minimum Distance Estimation of Non-stationary Time Series Models, 18, 1385–1407.
Nelson, C. and Plosser, C. 1982, Trends and Random Walks in Macroeconomic Time Series, Journal of Monetary Economics 10, 139–162.
Ng, S. and Perron, P. 2001, Lag Length Selection and the Construction of Unit Root Tests with Good Size and Power, Econometrica 69:6, 1519–1554.
Pesavento, E. and Rossi, B. 2005, Small Sample Confidence Intervals for Multivariate Impulse Response Functions at Long Lags, mimeo, Duke University.
Phillips, P. C. B. and Han, C. 2006, Gaussian Inference in AR(1) Time Series With or Without a Unit Root, Cowles Foundation Discussion Paper 1546.
Said, S. E. and Dickey, D. A. 1984, Testing for Unit Roots in Autoregressive-Moving Average Models of Unknown Order, Biometrika 71, 599–607.
Schwert, G. W. 1989, Tests for Unit Roots: A Monte Carlo Investigation, Journal of Business and Economic Statistics 7, 147–160.
So, B. and Shin, D. 1999, Cauchy Estimators for Autoregressive Processes with Applications to Unit Root Tests and Confidence Intervals, Econometric Theory 15, 166–176.
Wu, C. F. 1981, Asymptotic Theory of Nonlinear Least Squares Estimation, Annals of Statistics 9:3, 501–513.
Table 1a: Mean Estimates of ρ and Rejection Rates of J Test, T=200 (Intercept). p
p 2 2 2 2
L DGP 2 2 2 2 2 3 3 3 3 3 DGP 4 4 4 4 3 3 3 3 DGP θ 0.5 0.5 -0.5 -0.5
ρ0 1 1.000 0.980 0.950 0.920 0.850 1.000 0.980 0.950 0.920 0.850 2 1.000 0.975 0.950 0.925 1.000 0.975 0.950 0.925 3 ρ0 1.000 0.867 1.000 0.800
1 1 1 1
0.5 0.5 -0.5 -0.5
1.000 0.800 1.000 0.800
1 1 1 1 1 2 2 2 2 2 3 3 3 3 1 1 1 1
OLS 0.974 0.957 0.930 0.901 0.832 0.973 0.957 0.929 0.900 0.831 0.986 0.964 0.938 0.913 0.988 0.972 0.955 0.938
QD A Mean estimates 1.005 0.981 1.006 0.965 0.989 0.937 0.950 0.907 0.854 0.838 0.996 0.984 0.976 0.968 0.948 0.939 0.925 0.909 0.880 0.839 0.984 0.969 0.947 0.921 0.998 1.000 1.000 1.000
1.024 0.991 0.949 0.906 1.093 1.086 1.068 1.045
q = 0, L = 3 0.979 1.000 1.001 0.831 0.807 0.804 0.926 0.886 0.885 0.486 0.468 0.717 q = 1, L = 3 0.985 1.004 0.980 0.877 0.785 0.783 0.887 0.991 0.963 0.378 0.678 0.685
B
QD 0.044 0.035 0.024 0.024 0.029 0.041 0.041 0.035 0.029 0.025
A J test 0.047 0.051 0.054 0.056 0.054 0.051 0.053 0.056 0.057 0.059
0.955 0.963 0.938 0.910 0.841 0.960 0.961 0.938 0.910 0.842
B 0.039 0.045 0.051 0.054 0.055 0.036 0.045 0.048 0.050 0.052
0.987 0.969 0.944 0.919 1.013 1.094 1.092 1.083
0.035 0.030 0.026 0.023 0.996 0.991 0.971 0.938
0.061 0.073 0.079 0.078 0.966 0.991 0.994 0.993
0.030 0.036 0.039 0.035 0.854 0.989 0.996 0.994
0.890 0.766 0.485 0.296
0.311 0.460 0.120 0.072
0.219 0.190 0.110 0.116
0.579 0.697 0.018 0.044
0.977 0.787 0.678 0.197
0.009 0.002 0.018 0.033
0.032 0.045 0.096 0.096
0.026 0.042 0.040 0.047
Table 2 Rejection Rates Using 5% Asymptotic Critical Values (Intercept Only). T = 200 t1 : H0 : ρ = ρ0 H1 : ρ < ρ 0 OLS QD A
p
L
ρ0
B
1 1 1 1 1 2 2 2 2 2
2 2 2 2 2 3 3 3 3 3
1.000 0.980 0.950 0.920 0.850 1.000 0.980 0.950 0.920 0.850
0.441 0.218 0.148 0.127 0.101 0.464 0.207 0.142 0.118 0.098
0.084 0.080 0.140 0.128 0.110 0.109 0.164 0.161 0.145 0.116
0.100 0.098 0.097 0.093 0.088 0.087 0.092 0.084 0.079 0.077
0.032 0.038 0.049 0.054 0.054 0.030 0.040 0.053 0.056 0.064
3 3 3 3 p 1 1 1 1
4 4 4 4 θ 0.5 0.5 -0.5 -0.5
1.000 0.975 0.950 0.925 ρ0 1.000 0.800 1.000 0.800
0.437 0.161 0.116 0.099
0.129 0.158 0.156 0.151
0.038 0.055 0.110 0.189
0.036 0.048 0.052 0.056
0.274 0.000 0.934 1.000
0.116 0.249 0.178 0.519
0.178 0.121 0.114 0.093
0.081 0.076 0.057 0.214
p
L
ρ0
OLS
QD
A
B
1 1 1 1 1 2 2 2 2 2
2 2 2 2 2 3 3 3 3 3
1.000 0.980 0.950 0.920 0.850 1.000 0.980 0.950 0.920 0.850
0.445 0.147 0.101 0.089 0.083 0.448 0.146 0.103 0.092 0.085
0.111 0.013 0.083 0.095 0.079 0.037 0.107 0.125 0.107 0.077
0.075 0.070 0.069 0.069 0.066 0.077 0.070 0.074 0.072 0.069
0.027 0.041 0.045 0.047 0.051 0.034 0.049 0.057 0.064 0.062
3 3 3 3 p 1 1 1 1
4 4 4 4 θ 0.5 0.5 -0.5 -0.5
1.000 0.975 0.950 0.925 ρ0 1.000 0.800 1.000 0.800
0.442 0.097 0.084 0.084
0.101 0.132 0.128 0.108
0.001 0.018 0.063 0.179
0.024 0.059 0.059 0.063
0.289 0.000 0.927 1.000
0.164 0.194 0.087 0.402
0.137 0.110 0.067 0.042
0.054 0.093 0.031 0.201
t2 : H0 : ρ = .95 H1 : ρ > .95 OLS QD A B DGP 1 0.585 0.280 0.139 0.038 0.249 0.288 0.098 0.016 0.009 0.235 0.042 0.008 0.000 0.123 0.009 0.002 0.000 0.006 0.000 0.000 0.570 0.220 0.165 0.045 0.235 0.108 0.101 0.017 0.008 0.010 0.040 0.004 0.000 0.015 0.010 0.001 0.000 0.003 0.000 0.000 DGP 2 0.917 0.079 0.679 0.207 0.379 0.023 0.385 0.092 0.014 0.008 0.105 0.032 0.000 0.004 0.017 0.011 DGP 3, q = 2, L = 3 0.832 0.369 0.284 0.230 0.000 0.022 0.000 0.000 0.044 0.165 0.057 0.001 0.000 0.000 0.000 0.000 T = 500 OLS
QD A B DGP 1: 0.964 0.430 0.367 0.144 0.673 0.321 0.193 0.087 0.017 0.491 0.027 0.034 0.000 0.367 0.002 0.009 0.000 0.023 0.000 0.000 0.961 0.429 0.385 0.156 0.674 0.277 0.203 0.092 0.017 0.010 0.047 0.046 0.000 0.028 0.003 0.015 0.000 0.019 0.000 0.000 DGP 2 1.000 0.297 0.978 0.436 0.790 0.079 0.761 0.186 0.029 0.015 0.161 0.042 0.000 0.010 0.006 0.008 DGP 3, q = 2, L = 3 0.999 0.474 0.559 0.372 0.000 0.001 0.000 0.000 0.384 0.268 0.299 0.000 0.000 0.000 0.000 0.000
ADF
t3 : H 0 : ρ = 1 H1 : ρ < 1 QD A
0.053 0.093 0.350 0.693 0.996 0.046 0.098 0.334 0.652 0.987
0.084 0.102 0.255 0.458 0.785 0.109 0.230 0.385 0.486 0.649
0.100 0.174 0.336 0.530 0.909 0.087 0.163 0.290 0.461 0.802
0.032 0.059 0.133 0.214 0.477 0.030 0.061 0.127 0.217 0.438
0.048 0.288 0.710 0.930
0.129 0.295 0.423 0.511
0.038 0.144 0.428 0.718
0.036 0.123 0.246 0.355
0.030 0.946 0.587 1.000
0.116 0.943 0.178 0.752
0.178 0.961 0.114 0.352
0.081 0.736 0.057 0.392
ADF
QD
A
B
0.063 0.304 0.972 1.000 1.000 0.062 0.303 0.965 1.000 1.000
0.111 0.115 0.114 0.437 0.947 0.037 0.183 0.508 0.668 0.883
0.075 0.196 0.538 0.825 0.999 0.077 0.196 0.483 0.747 0.982
0.027 0.083 0.213 0.388 0.805 0.034 0.086 0.214 0.363 0.745
0.051 0.926 1.000 1.000
0.101 0.333 0.568 0.681
0.001 0.102 0.586 0.937
0.024 0.178 0.396 0.594
0.023 1.000 0.607 1.000
0.164 0.998 0.087 0.756
0.137 1.000 0.067 0.534
0.054 0.973 0.031 0.419
B
Table 3: Rejection Rates Using 5% Asymptotic Critical Values, (Linear Trend Model) T = 200 t1 : H0 : ρ = ρ0 H1 : ρ < ρ 0 OLS QD A
p
L
ρ0
B
1 1 1 1 1 2 2 2 2 2
2 2 2 2 2 3 3 3 3 3
1.000 0.980 0.950 0.920 0.850 1.000 0.980 0.950 0.920 0.850
0.768 0.431 0.249 0.192 0.151 0.785 0.430 0.260 0.202 0.154
0.116 0.139 0.169 0.149 0.132 0.231 0.205 0.181 0.166 0.139
0.172 0.129 0.119 0.117 0.107 0.151 0.125 0.110 0.107 0.099
0.064 0.049 0.056 0.056 0.052 0.060 0.058 0.057 0.061 0.068
3 3 3 3 p 1 1 1 1
4 4 4 4 θ 0.5 0.5 -0.5 -0.5
1.000 0.975 0.950 0.925 ρ0 1.000 0.800 1.000 0.800
0.778 0.284 0.202 0.159
0.193 0.186 0.175 0.169
0.071 0.080 0.137 0.229
0.063 0.055 0.054 0.061
0.506 0.000 0.998 1.000
0.121 0.296 0.318 0.557
0.239 0.133 0.145 0.104
0.114 0.079 0.124 0.221
p
L
ρ0
OLS
QD
A
B
1 1 1 1 1 2 2 2 2 2
2 2 2 2 2 3 3 3 3 3
1.000 0.980 0.950 0.920 0.850 1.000 0.980 0.950 0.920 0.850
0.765 0.265 0.153 0.125 0.108 0.763 0.266 0.162 0.134 0.113
0.102 0.022 0.102 0.108 0.090 0.091 0.148 0.140 0.128 0.095
0.116 0.085 0.077 0.077 0.073 0.108 0.088 0.089 0.087 0.081
0.046 0.045 0.044 0.047 0.051 0.053 0.052 0.058 0.066 0.064
3 3 3 3 p 1 1 1 1
4 4 4 4 θ 0.5 0.5 -0.5 -0.5
1.000 0.975 0.950 0.925 ρ0 1.000 0.800 1.000 0.800
0.766 0.160 0.125 0.113
0.153 0.149 0.136 0.116
0.004 0.024 0.083 0.195
0.048 0.061 0.059 0.064
0.504 0.000 1.000 1.000
0.112 0.218 0.068 0.419
0.182 0.120 0.101 0.050
0.095 0.092 0.072 0.203
t2 : H0 : ρ = .95 H1 : ρ > .95 OLS QD A B DGP 1 0.173 0.232 0.114 0.035 0.089 0.227 0.084 0.023 0.004 0.153 0.034 0.007 0.000 0.090 0.008 0.002 0.000 0.004 0.000 0.000 0.163 0.063 0.120 0.026 0.081 0.036 0.089 0.017 0.003 0.011 0.034 0.004 0.000 0.011 0.009 0.001 0.000 0.003 0.000 0.000 DGP 2 0.656 0.025 0.600 0.182 0.218 0.011 0.331 0.089 0.005 0.005 0.090 0.035 0.000 0.004 0.013 0.012 DGP 3: q = 2, L = 3 0.483 0.306 0.224 0.205 0.000 0.014 0.000 0.000 0.001 0.045 0.016 0.000 0.000 0.000 0.000 0.000 T = 500 OLS
QD A B DGP 1 0.841 0.301 0.314 0.135 0.491 0.334 0.172 0.095 0.011 0.485 0.024 0.034 0.000 0.338 0.002 0.009 0.000 0.022 0.000 0.000 0.836 0.327 0.316 0.149 0.491 0.179 0.188 0.095 0.011 0.010 0.043 0.047 0.000 0.026 0.003 0.014 0.000 0.022 0.000 0.000 DGP 2 0.998 0.242 0.974 0.431 0.715 0.063 0.730 0.187 0.016 0.015 0.141 0.043 0.000 0.008 0.006 0.008 DGP 3: q = 2, L = 3 0.988 0.355 0.505 0.388 0.000 0.001 0.000 0.000 0.040 0.224 0.183 0.000 0.000 0.000 0.000 0.000
ADF
t3 : H 0 : ρ = 1 H1 : ρ < 1 QD A
0.050 0.078 0.188 0.464 0.957 0.051 0.072 0.192 0.428 0.920
0.116 0.159 0.321 0.478 0.789 0.231 0.271 0.379 0.483 0.624
0.172 0.223 0.387 0.569 0.922 0.151 0.200 0.337 0.489 0.814
0.064 0.079 0.144 0.216 0.480 0.060 0.085 0.136 0.217 0.440
0.044 0.163 0.485 0.768
0.193 0.300 0.424 0.520
0.071 0.193 0.473 0.754
0.063 0.133 0.250 0.360
0.013 0.712 0.830 1.000
0.121 0.947 0.318 0.760
0.239 0.970 0.145 0.372
0.114 0.737 0.124 0.392
ADF
QD
A
B
0.059 0.187 0.840 0.998 1.000 0.061 0.179 0.815 0.995 1.000
0.102 0.103 0.133 0.463 0.947 0.091 0.247 0.509 0.667 0.868
0.116 0.230 0.566 0.836 0.999 0.108 0.226 0.510 0.760 0.984
0.046 0.088 0.215 0.388 0.806 0.053 0.096 0.216 0.365 0.744
0.059 0.746 1.000 1.000
0.153 0.353 0.569 0.675
0.004 0.115 0.615 0.946
0.048 0.182 0.396 0.593
0.008 1.000 0.854 1.000
0.112 0.998 0.068 0.758
0.182 1.000 0.101 0.548
0.095 0.972 0.072 0.420
B
Table 4: Finite Sample Power for DGP 1, L = 2
K
L
ρ0
t 1 : H 0 : ρ = ρ0 H1 : ρ < ρ0 OLS QD A
1 1 1 1 1 1 1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2 2 2 2 2 2 2
1.000 0.980 0.950 0.920 0.850 0.800 0.700 0.600 0.500 0.300 0.150 0.000 -0.300 -0.500
0.443 0.230 0.155 0.131 0.102 0.092 0.081 0.075 0.066 0.063 0.057 0.056 0.057 0.057
0.097 0.090 0.148 0.132 0.111 0.103 0.092 0.088 0.081 0.069 0.069 0.062 0.059 0.052
0.117 0.109 0.103 0.096 0.088 0.082 0.076 0.073 0.070 0.073 0.062 0.057 0.054 0.055
1 1 1 1 1 1 1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2 2 2 2 2 2 2
1.000 0.980 0.950 0.920 0.850 0.800 0.700 0.600 0.500 0.300 0.150 0.000 -0.300 -0.500
0.437 0.148 0.102 0.089 0.083 0.080 0.073 0.066 0.064 0.055 0.051 0.049 0.047 0.045
0.117 0.013 0.083 0.095 0.079 0.074 0.072 0.067 0.061 0.060 0.055 0.050 0.044 0.044
0.081 0.072 0.069 0.069 0.066 0.066 0.063 0.064 0.063 0.059 0.058 0.058 0.046 0.050
B T 0.038 0.043 0.052 0.056 0.054 0.054 0.058 0.056 0.056 0.059 0.061 0.065 0.057 0.054 T 0.029 0.042 0.045 0.047 0.051 0.050 0.049 0.051 0.046 0.049 0.048 0.039 0.043 0.040
t2 : H0 : ρ0 − .05 H1 : ρ > ρ0 − .05 OLS QD A = 200 0.570 0.321 0.045 0.500 0.365 0.096 0.413 0.327 0.143 0.354 0.220 0.160 0.261 0.089 0.186 0.234 0.101 0.188 0.208 0.155 0.181 0.182 0.171 0.171 0.174 0.168 0.166 0.162 0.161 0.159 0.162 0.162 0.162 0.158 0.171 0.166 0.171 0.189 0.174 0.197 0.220 0.202 = 500 0.962 0.457 0.325 0.931 0.439 0.381 0.837 0.583 0.396 0.743 0.432 0.393 0.592 0.344 0.373 0.524 0.349 0.363 0.427 0.337 0.351 0.353 0.327 0.336 0.332 0.315 0.323 0.303 0.300 0.308 0.300 0.293 0.312 0.292 0.300 0.300 0.319 0.329 0.330 0.368 0.379 0.361
B
t3 : H0 : ρ = ρ0 − .10 H1 : ρ > ρ0 − .1 ADF QD A B
0.027 0.028 0.061 0.081 0.113 0.126 0.127 0.125 0.118 0.124 0.134 0.128 0.141 0.150
0.923 0.900 0.841 0.788 0.664 0.610 0.527 0.486 0.444 0.405 0.379 0.374 0.426 0.486
0.543 0.481 0.418 0.375 0.417 0.457 0.466 0.450 0.434 0.391 0.387 0.387 0.438 0.491
0.409 0.461 0.503 0.510 0.511 0.500 0.470 0.438 0.420 0.393 0.380 0.380 0.437 0.489
0.162 0.162 0.196 0.211 0.236 0.232 0.240 0.248 0.253 0.266 0.259 0.278 0.292 0.361
0.139 0.146 0.172 0.174 0.178 0.176 0.186 0.187 0.193 0.203 0.204 0.209 0.231 0.276
1.000 1.000 0.997 0.986 0.952 0.929 0.875 0.822 0.788 0.724 0.697 0.704 0.758 0.833
0.602 0.630 0.742 0.836 0.894 0.872 0.840 0.797 0.770 0.710 0.693 0.692 0.760 0.810
0.865 0.895 0.889 0.883 0.865 0.850 0.821 0.795 0.765 0.719 0.690 0.693 0.761 0.820
0.433 0.445 0.450 0.458 0.452 0.457 0.459 0.458 0.469 0.487 0.487 0.508 0.562 0.668
Table 5: Predictive regressions: Rejection Rates for tβb
\[
y_t = \beta x_{t-1} + e_{yt}, \quad e_{yt}\sim N(0,1); \qquad
x_t = \alpha x_{t-1} + e_{xt}, \quad e_{xt}\sim N(0,1); \qquad
\alpha = 1 + c/T, \quad \mathrm{cov}(e_{yt}, e_{xt}) = \rho\,\sigma_x\sigma_y.
\]
T
ρ
α
OLS
200 200 200 200 200 200 200 200 200 200 200 200 200 200 200
0.500 0.500 0.500 0.500 0.500 -0.500 -0.500 -0.500 -0.500 -0.500 0.000 0.000 0.000 0.000 0.000
1.000 0.988 0.975 0.963 0.950 1.000 0.988 0.975 0.963 0.950 1.000 0.988 0.975 0.963 0.950
0.194 0.116 0.085 0.076 0.071 0.008 0.016 0.018 0.021 0.021 0.060 0.052 0.050 0.047 0.045
200 200 200 200 200 200 200 200 200 200 200 200 200 200
0.500 0.500 0.500 0.500 0.500 -0.500 -0.500 -0.500 -0.500 -0.500 0.000 0.000 0.000 0.000
1.000 0.988 0.975 0.963 0.950 1.000 0.988 0.975 0.963 0.950 1.000 0.988 0.975 0.963
1.000 1.000 1.000 1.000 1.000 0.972 0.966 0.947 0.925 0.904 0.996 0.998 0.993 0.982
QD A1 A2 B1 H0 : β = 0, H1 : β < 0 0.039 0.056 0.061 0.014 0.038 0.059 0.072 0.018 0.037 0.064 0.075 0.024 0.040 0.062 0.077 0.034 0.044 0.063 0.077 0.039 0.014 0.011 0.016 0.001 0.010 0.026 0.029 0.005 0.012 0.035 0.035 0.016 0.011 0.040 0.039 0.024 0.011 0.041 0.044 0.029 0.039 0.025 0.038 0.008 0.039 0.047 0.050 0.014 0.039 0.048 0.055 0.018 0.040 0.052 0.057 0.029 0.040 0.051 0.057 0.041 H0 : β = .1, H1 : β < .1 0.245 0.701 0.828 0.283 0.237 0.673 0.818 0.265 0.238 0.664 0.807 0.281 0.242 0.660 0.799 0.291 0.248 0.655 0.792 0.292 0.117 0.539 0.664 0.190 0.104 0.558 0.690 0.200 0.112 0.568 0.688 0.210 0.111 0.568 0.687 0.222 0.109 0.568 0.678 0.222 0.204 0.609 0.761 0.219 0.213 0.615 0.754 0.221 0.219 0.613 0.748 0.244 0.224 0.615 0.745 0.251
Estimator A uses \(\tilde e_{x,t-j}\) as instruments, where j = 1, 2 in A1 and j = 1, 2, 3 in A2. Estimator B uses \(\Delta y_{t-j}\), \(\Delta x_{t-j}\) as instruments; B1 uses j = 1, 2 and B2 uses j = 1, 2, 3.
B2 0.023 0.032 0.035 0.042 0.049 0.005 0.014 0.018 0.028 0.033 0.010 0.015 0.023 0.034 0.044 0.400 0.386 0.397 0.404 0.406 0.257 0.277 0.287 0.281 0.276 0.317 0.318 0.325 0.327
Table 6: DSGE Model

T    ρz     ψ̂      ρ̂z     σ̂²     tψ̂     tρ̂z    tσ̂²
200  1.000  0.402  1.008  0.987  0.017  0.003  0.067
200  1.000  0.375  1.007  0.988  0.021  0.003  0.066
200  1.000  0.370  1.001  0.996  0.037  0.019  0.058
200  0.975  0.366  0.973  0.990  0.125  0.097  0.109
200  0.975  0.347  0.971  0.989  0.159  0.117  0.105
200  0.975  0.384  0.978  0.995  0.084  0.072  0.085
200  0.950  0.350  0.939  0.993  0.097  0.165  0.090
200  0.950  0.339  0.936  0.993  0.124  0.199  0.085
200  0.950  0.391  0.955  0.989  0.060  0.062  0.094
200  0.900  0.347  0.883  0.994  0.033  0.186  0.085
200  0.900  0.341  0.880  0.995  0.026  0.218  0.075
200  0.900  0.392  0.901  0.986  0.035  0.103  0.102
Figure 1: t-statistic, T = 500, K = 1, L = 2, α0 = 1.0. [Four panels of finite sample densities over −4 to 4: t (OLS), tQD, tA, tB; plot data not reproduced.]
Figure 2: t-statistic, T = 500, K = 1, L = 2, α0 = .95. [Four panels of finite sample densities over −4 to 4: t (OLS), tQD, tA, tB; plot data not reproduced.]