Apr 28, 2014 - The appropriate estimator in the current case is the same but ... )(ri,t â rt). Therefore, since âAt
S UPPLEMENT TO “A S IMPLE T EST FOR N ONSTATIONARITY IN M IXED PANELS : A F URTHER I NVESTIGATION ”: E XTENSIONS Joakim Westerlund∗ Deakin University Australia
April 28, 2014
Abstract This supplement extends the results reported in Westerlund (2014a) in several directions, including (i) nonzero initial values, (ii) cross-section dependence, (iii) heteroskedasticity, (iv) serial correlation, and (v) incidental trends.
1 Introduction Many of the assumptions used in Westerlund (2014a) are stronger than the ones often encountered in the literature based on letting both N and T to infinity, but are standard when testing for a unit root in a fixed-T setting (see Harris and Tzavalis, 1999; Kruiniger, 2009). In particular, the assumption that ϵi,t is independent across both i and t is standard, and is difficult to dispense with unless one is willing to make very specific assumptions regarding the structure for the dependence (when T is fixed). For example, if the correlation structure can be assumed to be homogenous, it is also possible to allow for serial correlation in ϵi,t (see De Blander and Dhaene, 2013, for such an assumption). Of course, the situation is quite different when N, T → ∞, which enables more general types of dependencies. In this supplement we therefore discuss some of the allowances that can be made. In particular, in what remains we discuss in turn nonzero initial values (Section 2), cross-section dependence ∗ Deakin
University, Faculty of Business and Law, School of Accounting, Economics and Finance, Melbourne Burwood Campus, 221 Burwood Highway, VIC 3125, Australia. Telephone: +61 3 924 46973. Fax: +61 3 924 46283. E-mail address:
[email protected].
1
(Section 3), heteroskedasticity (Section 4), serial correlation (Section 5), and incidental trends (Section 6).
2 Nonzero initial value Ng (2008) assumes that u1,0 , ..., u N,0 are O p (1), which is obviously less restrictive than the u1,0 = ... = u N,0 = 0 assumption in Westerlund (2014a). However, as with many of the other assumptions, zero initialization is not really necessary. Indeed, while restrictive in a model without deterministic terms, the assumption of zero initialization is irrelevant as long as the √ model includes a constant. In fact, under N, T → ∞, we can even allow ui,0 = o p ( T ) without affecting the results. The reason is the following: The proofs of Theorems 1 and √ 2 in Westerlund (2014a, Appendix) start with the limiting behavior of ri,t = ui,t /σϵ T. If u1,0 , ..., u N,0 are unrestricted, then this variable can be written as 1 1 1 √ ri,t = √ αit ui,0 + √ σϵ T σϵ T σϵ T
t
∑ αit−s ϵi,s ,
s =1
√ where the first term on the right-hand side is negligible as long as ui,0 = o p ( T ) (see Westerlund, 2014b, for a detailed discussion of the effect of initialization when testing for a unit root in panels). Unfortunately, if T is fixed, then this is no longer the case. A sufficient condition for the finite-T results to go through also in the case of nonzero but O p (1) initialization √ is that ∑iN=1 E[(ui,0 − u0 )(ϵi,t − ϵt )]/N = o (1/ N ) (with obvious definitions of u0 and ϵt ).
3 Cross-section dependence It is easy to verify that θˆ is invariant with respect to transformations of the type {yi,t }tT=1 7→
{ at + yi,t }tT=1 , suggesting that cross-section dependence in the form of a common time effect can be allowed without any correction needed. To appreciate this, suppose that instead of (2) in Westerlund (2014a) we have yi,t = λi + γt + ui,t ,
(1)
where γt is a common time effect. Under this DGP, yi,t − yt = αi − α + ϵi,t − ϵt , showing that Vt = ∑iN=1 (yi,t − yt )2 /N is indeed invariant with respect to γt . But while comˆ the test statistics need to be modified slightly. Specifically, mon time effects do not affect θ, 2
the estimators of σϵ2 and κϵ should be based on yi,t − yt rather than on yi,t . The appropriate versions of σˆ ϵ2 and κˆ ϵ to consider in this case are therefore given by σˆ ϵ2 = ∑iN=1 ∑tT=2 [∆(yi,t − yt )]2 /NT and κˆ ϵ = ∑iN=1 ∑tT=2 [∆(yi,t − yt )]4 /(σˆ ϵ4 NT ), respectively. The asymptotic results reported in Westerlund (2014a) are unaffected by this. Common time effects represent a limiting form of cross-section dependence. For a detailed treatment of the case with a general common factor structure we make reference to Ng (2008, Section 3.2).
4 Heteroskedasticity Consider first the case when the heteroskedasticity is across time. Specifically, suppose that 2 ) = σ2 and E ( ϵ4 ) /σ4 = κ E(ϵi,t ϵ,t < ∞. Heteroskedasticity of this type can be permitted ϵ,t ϵ,t i,t 2 = 4 N ), by simply replacing σˆ ϵ2 and κˆ ϵ by σˆ ϵ,t ∑iN=1 (∆yi,t )2 /N and κˆ ϵ,t = ∑iN=1 (∆yi,t )4 /(σˆ ϵ,t
respectively. Since these estimates are consistent for N → ∞, heteroskedasticity across time ˆ σˆ ϵ2 , can be permitted even when T is fixed. In Westerlund (2014a) θˆ∗ is defined as θˆ∗ = θ/ ∗ = y /σ which is equivalent to using θˆ while replacing yi,t by yi,t i,t ˆ ϵ . The appropriate estimator ∗ = y /σ to consider under heteroskedasticity is the same but with yi,t replaced by yi,t i,t ˆ ϵ,t . The ∗ statistic is the same as under homoskedasticity, but with σ 2 replaced by ˆ θ,T τ1,T 2 σˆ θ,T =
1 2( T 2 − 1) + 2 T2 T
T
∑ (κˆ ϵ,t − 3).
t =2
If the heteroskedasticity is across the cross-section, then T cannot be finite anymore. As2 ) = σ2 and E ( ϵ4 ) /σ4 = κ sume that E(ϵi,t ϵ,i < ∞. Valid inference in this case requires ϵ,i i,t ϵ,i 2 = replacing σˆ ϵ2 in τθ∗0 (and τ1∗ ) by σˆ ϵ,i ∑tT=2 (∆yi,t )2 /T. The obvious rationale for this practice 2 = σ2 + o (1) as T → ∞ (see also Ng, 2008, Section 3.1). The appropriate estimator is that σˆ ϵ,i p ϵ,i
∗ = y /σ of θ to use in this case is θˆ∗ with σˆ ϵ2 = 1 and yi,t in θˆ replaced by yi,t i,t ˆ ϵ,i . Given this
change, the τ1∗ statistic is the same as under homoskedasticity. In Section 5 we report some Monte Carlo evidence for the case when the errors are both cross-section heteroskedastic and serially correlated.
3
5 Error serial correlation In Westerlund (2014a) it is assumed that the error driving ui,t is serially uncorrelated. This is not necessary. Suppose therefore that instead of (3) in Westerlund (2014a), we have ui,t = αi ui,t−1 + ei,t ,
(2)
ei,t = ϕi ( L)ϵi,t
(3)
n where ϕi ( L) = ∑∞ n=0 ϕi,n L has all its roots outside the unit circle, and ϵi,t is iid with E ( ϵi,t ) = 2 ) = σ2 and E ( ϵ4 ) /σ4 = κ 0, E(ϵi,t ϵ,i < ∞. In this DGP the “long-run variance” of ei,t is ϵ,i i,t ϵ,i 2 = ϕ (1)2 σ 2 . given by ωe,i i ϵ,i
In order to illustrate the effect of the above DGP, we revisit Proof of Theorem 1. As pointed out in Section 4, with cross-section heteroskedasticity the appropriate version of θˆ∗ ∗ = y /σ to use is θˆ with yi,t replaced by yi,t i,t ˆ ϵ,i . The appropriate estimator in the current case 2 replaced by ω 2 , which can be any consistent estimator of ω 2 . Later ˆ e,i is the same but with σˆ ϵ,i e,i 2 may be constructed. However, for now we in this section we provide an example of how ωˆ e,i 2 is known, such that y∗ = y /ω . Let us therefore define a∗ = a /ω for assume that ωe,i i,t e,i i,t e,i i,t i,t √ N ∗ 2 ∗ any variable ai,t . Redefine Vt = ∑i=1 (yi,t − yt ) /N, ri,t = ui,t /ωe,i T and Ut = Vt /T. From
∗
∗
N ∗ − y∗ = ( λ∗ − λ ) + ( u∗ − u∗ ), with σ2 ∗ 2 yi,t t t i i,t λ∗ ,N = ∑i =1 ( λi − λ ) /N,
Ut = At + Bt + Ct ,
(4)
where 1 2 σ ∗ , T λ ,N 1 N (ri,t − r t )2 , N i∑ =1
At = Bt =
√
Ct =
2
N
∑ (λi∗ − λ TN
∗
)(ri,t − r t ).
i =1
Therefore, since ∆At = 0, we obtain ∆Ut = ∆Bt + ∆Ct .
(5)
suggesting that 1 θˆ = T
T
T
T
t =2
t =2
t =2
∑ ∆Vt = ∑ ∆Ut = ∑ (∆Bt + ∆Ct ). 4
(6)
Consider Bt . Clearly, ∆Bt =
1 N
N
∑ ∆ri,t2 − ∆r2t ,
i =1
According to the Beveridge–Nelson decomposition, ϕi ( L) = ϕi (1) + ϕi∗ ( L)(1 − L), where ∞ ∗ n ∗ ϕi∗ ( L) = ∑∞ n=0 ϕi,n L with ϕi,n = ∑ k=n+1 ϕi,k has all roots outside the unit circle. Hence,
∗ = ϕ∗ ( L ) ϵ , e = ϕ ( L ) ϵ = ϕ (1) ϵ + ∆ϵ∗ . Let ϵ˜ ∗ = α−t ϵ∗ , such that letting ϵi,t i,t i,t i i,t i i,t i,t i,t i,t i i t
1 1 √ ui,t = √ ωe,i T ωe,i T
ri,t =
1 √
=
σϵ,i
1 √
=
σϵ,i
∑ αit−s ei,s
s =1
t
t 1 1 ∗ √ αit ∑ ∆ϵ˜ i,s √ = σϵ,i T ϵ,i T s =1
∑ αit−s ϵi,s − σ T s =1 t
t
1 ∗ √ αit ϵ˜ i,t T ϵ,i
∑ αit−s ϵi,s − σ
s =1
1 ∗ √ ϵi,t . ϵ,i T
αit−s ϵi,s − ∑ T σ s =1
Hence, for t ≥ s, E(ri,t ri,s |ci ) =
= −
1 2 T σϵ,i
1 2 T σϵ,i
1 2 T σϵ,i
[(
)(
t
∑
E
n =1 t
s
∑ ∑
αit−n ϵi,n
∗ − ϵi,t
s
∑
m =1
αit+s−n−m E(ϵi,n ϵi,m ) −
n =1 m =1 t ∗ αit−n E(ϵi,n ϵi,s )+ n =1
∑
1 2 T σϵ,i
) αis−m ϵi,m 1 2 T σϵ,i
∗ − ϵi,s
s
∑
m =1
ci
]
∗ αis−m E(ϵi,m ϵi,t )
∗ ∗ E(ϵi,t ϵi,s ),
where 1 2 T σϵ,i
s
∑
m =1
∗ αis−m E(ϵi,m ϵi,t ) =
=
1 2 T σϵ,i
1 2 T σϵ,i
s
∞
∗ s−m αi E(ϵi,m ϵi,t−n ) ∑ ∑ ϕi,n
m =1 n =0 s ∗ s−n 2 ϕi,n αi E(ϵi,n ) n =1
∑
1 = T
s
∑
n =1
∗ s−n ϕi,n αi
( ) 1 , =O T
∗ is summable. We can similarly show that E ( ϵ∗ ϵ∗ ) = O (1), giving which holds, because ϕi,n i,t i,s ) ( t s 1 1 . (7) E(ri,t ri,s |ci ) = 2 ∑ ∑ αit+s−n−m E(ϵi,n ϵi,m ) + O T σϵ,i T n=1 m=1
All results used for evaluating ∑tT=2 ut , which are based on E(ri,t ri,s |ci ), are therefore the same as in Proof of Theorem 1, up to a O p (1/T ) remainder term. Hence, (√ ) T √ N E(ST ) = ∑ NE(ut ) = O . T t =2 It follows that ST is no longer exactly mean zero, but only asymptotically so, provided that √ 2 N/T → 0. Hence, by a central limit theorem, letting σθ,NT = E(S2T ), √ ST ∼ Nσθ,NT N (0, 1) (8) 5
as N, T → ∞ with
√
N/T → 0.
2 is unknown and we therefore require a consistent estimator to Of course, in practice ωe,i
be used in its place. Such an estimator can be constructed by using known results from the literature on heteroskedasticity and autocorrelation consistent (HAC) variance estimators (see, for example, Andrews, 1991). If η > 0, then we may use 2 ωˆ e,i = γˆ (0) + 2
M −1
∑
K (n/M )γˆ (n),
n =1
where γˆ (n) = ∑tT=n+1 ∆yi,t ∆yi,t−n /T, K (n/M) is a kernel function and M is a bandwidth truncation parameter. As when allowing for heteroskedasticity across the cross-section, the presence of error serial correlation requires letting both N and T to infinity. As already ∗ = y /ω mentioned, the appropriate version of θˆ∗ to use is θˆ with yi,t replaced by yi,t i,t ˆ e,i .
The appropriate variance correction factor to put in the denominator of τ1∗ depends on the 2 formula for σθ,NT . In the simple DGP considered in Westerlund (2014a), the appropriate 2 formula under T → ∞ simplifies to σθ,NT = 2 + o p (1). Unfortunately, evaluating this term
in the presence of serial correlation is extremely tedious. Preliminary calculations suggests 2 that the effect of the serial correlation is negligible and therefore that σθ,NT has the same
simple form as before, which is supported by unreported simulation results. However, in 2 this supplement we follow Ng (2008, Section 3.2) and replace σθ,NT by a direct estimator, √ which is obtained as the HAC estimator based on N/T (∆Vt − 1). Hence, in terms of the √ 2 , we simply replace ∆y by N/T (∆Vt − 1). above formula for ωˆ e,i i,t
As an illustration we revisit the DGP used for generating Table 4 in Westerlund (2014a). The only difference is that now ui,t = ui,t−1 + ei,t , where ei,t = ϕi ( L)ϵi,t , ϵi,t ∼ N (0, 1), ϕi ( L) = ϕi,0 + ϕi,1 L, ϕi,0 = 1 and ϕi,1 = 0.5. Two different versions of τ1∗ are considered. The ∗ , is the one described in the above with K ( x ) = (1 − | x |)1(| x | ≤ 1) (the first, denoted τ1,ac
Bartlett kernel) and M = ⌊ T 1/3 ⌋, where 1( A) is the indicator function for the event A and
⌊ x ⌋ is the integer part of x. The second version, τ1∗ , is the same as in Westerlund (2014a), which is constructed under the assumption of no serial correlation. The results based on ∗ is correctly 3,000 replications are reported in Table 1. As expected, we see that while τ1,ac
sized, τ1∗ is severely undersized in all experiments considered. The reason for this difference can be seen by looking at the mean and variance results, which in case of τ1∗ are way off the theoretical predictions under no serial correlation. ∗ is not identical to the test statistic considered by Ng (2008, Section 3), they While τ1,ac
6
should be asymptotically equivalent. In spite of this, unreported simulation evidence suggests that the test of Ng (2008) generally suffers from severe size distortions. Hence, just as in the case without serial correlation, the test statistic considered here leads to improved small-sample performance. ∗ . Table 1: 5% size, mean and variance of τ1∗ and τ1,ac
N
T
10 10 10 10 20 20 20 20 40 40 40 40
25 50 100 200 25 50 100 200 25 50 100 200
τ1∗
Size ∗ τ1,ac
0.30 0.20 0.20 0.40 0.10 0.00 0.00 0.00 0.00 0.00 0.00 0.00
4.60 5.90 7.70 8.50 2.50 3.90 5.60 6.30 1.10 2.80 3.30 3.30
τ1∗
Mean ∗ τ1,ac
1.65 1.70 1.75 1.73 2.68 2.74 2.68 2.70 3.93 3.96 4.04 4.04
−0.10 −0.17 −0.20 −0.25 0.11 0.03 −0.07 −0.10 0.28 0.14 0.10 0.07
Variance ∗ τ1∗ τ1,ac 3.15 3.38 3.69 3.66 3.58 3.59 3.54 3.70 3.57 3.70 3.56 3.56
0.85 0.92 1.01 1.08 0.83 0.89 0.93 0.99 0.79 0.87 0.89 0.93
∗ refers to the serial correlation robust version of τ ∗ . Notes: τ1,ac 1 See Westerlund (2014a) for a description of the DGP.
6 Incidental trends The DGP in Westerlund (2014a) excludes the possibility that yi,t is trending deterministically, thereby excluding the so-called “incidental trends” case (see, for example, Moon et al., 2007). While in principle there is nothing that prevents a generalization of the test approach considered here to accommodate such trends, preliminary calculations suggest that the proofs in the finite-T case would be extremely tedious, if not impossible. In Westerlund (2014a) we therefore only consider the intercept-only case. Fortunately, the required calculations simplify under N, T → ∞. Suppose therefore that instead of (2) in Westerlund (2014a) we have yi,t = λi + β i t + ui,t .
(9)
7
In what follows we provide the asymptotic distribution (as N, T → ∞) of
√
N/T θˆ under
this specification. In so doing we will assume that η > 0, which means that we will basically derive a constant and trend version of Theorem 1. Since many of the results can be taken directly from Proof of Theorem 1 in Westerlund (2014a, Appendix), only essential details are given.
√ 2 2 We begin by defining ri,t = ui,t /σϵ T, σλ,N = ∑iN=1 (λi − λ)2 /N, σβ,N = ∑iN=1 ( β i − β)2 /N
and Ut = Vt /σϵ2 T. Substitution of yi,t − yt = (λi − λ) + ( β i − β)t + (ui,t − ut ) into Ut yields Ut = At + Bt + Ct + Dt + Et + Ft ,
(10)
where At =
1 2 σ , σϵ2 T λ,N
Bt =
1 N
N
∑ (ri,t − rt )2 ,
i =1
N 2 √ (λi − λ)(ri,t − r t ), ∑ σϵ TN i=1 1 2 2 σ t , σϵ2 T β,N
Ct = Dt =
N
2 √ σϵ TN
Et =
2 σϵ NT
Ft =
∑ ( βi − β)(ri,t − rt )t,
i =1
N
∑ (λi − λ)( βi − β)t,
i =1
suggesting ∆Ut = ∆Bt + ∆Ct + ∆Dt + ∆Et + ∆Ft .
(11)
Here ∆Bt and ∆Ct are as in Proof of Theorem 1 and ∆Dt =
1 2 σ ∆t2 , σϵ2 T β,N
∆Et =
2 √ σϵ TN
∆Ft =
2
N
∑ ( βi − β)∆[(ri,t − rt )t],
i =1 N
∑ (λi − λ)( βi − β).
σϵ2 NT i=1
Consider ∆Dt . Since ∆t2 = t2 − (t − 1)2 = 2t − 1, we may write ∆Dt =
1 2 σ (2t − 1) = µ∆D,NT (t) + o (1) σϵ2 T β,N 8
where µ∆D,NT (t) =
2 2 σ t. σϵ2 T β,N
Making use of the results provided for ∆Ct in Proof of Theorem 1, it is clear that ∆Et has zero mean. For the variance, we use ∆[(ri,t − r t )t] = (ri,t − r t )t − (ri,t−1 − r t−1 )(t − 1) = [∆(ri,t − r t )]t + (ri,t−1 − r t−1 ), giving (∆[(ri,t − r t )t])2 = [∆(ri,t − r t )]2 t2 + 2(ri,t−1 − r t−1 )[∆(ri,t − r t )]t + (ri,t−1 − r t−1 )2 , and so NE[(∆Et )2 ] =
4 σϵ2 NT 4
N
∑ E[( βi − β)2 (∆[(ri,t − rt )t])2 ]
i =1 N
∑ E[( βi − β)2 ]E[(∆ri,t − ∆rt )2 ]t2
=
σϵ2 NT i=1 N
+
σϵ2 NT i=1 N
+
σϵ2 NT i=1
8
∑ E[( βi − β)2 ]E[(ri,t−1 − rt−1 )∆(ri,t − rt )]t
4
∑ E[( βi − β)2 ]E[(ri,t−1 − rt−1 )2 ].
2 2 . As for the The first term on the right is (t/T )2 times NE[(∆Ct )2 ] with σλ,N replaced by σβ,N
third term, by cross-section independence, 2 2 2 E[(ri,t−1 − r t−1 )2 ] = E(ri,t −1 − 2ri,t−1 r t−1 + r t−1 ) = E (ri,t−1 ) + o (1)
= ρr,NT (t − 1, t − 1) + o (1). where ρr,NT (s, t) ∼ 1/T, as in Proof of Theorem 1. Hence, E[(ri,t−1 − r t−1 )2 ] = O(1), suggesting that the third term the expression for NE[(∆Et )2 ] is O(1/T ). The same is true for the second term. Hence, NE[(∆Et )2 ] =
4 2 σϵ NT
N
4
2 t2 , ∑ E[( βi − β)2 ]E[(∆ri,t − ∆rt )2 ]t2 + o(1) ∼ σ2 T2 σβ,N ϵ
i =1
which we can use to show that √ √ N∆Et ∼ NE[(∆Et )2 ] N (0, 1).
(12)
For ∆Ft , ∆Ft =
2 2 σϵ TN
where σλβ,N =
N
2
i =1
ϵ
∑ (λi − λ)( βi − β) = σ2 T σλβ,N = o(1),
∑iN=1 (λi
− λ)( β i − β)/N. Let ut = ∆Ut − µ∆B,NT (t) − µ∆D,NT (t). The above
results, together with the known orders of ∆Bt and ∆Ct (see Proof of Theorem 1), imply √ √ Nut = N [(∆Bt − µ∆B,NT (t)) + ∆Ct + (∆Dt − µ∆D,NT (t)) + ∆Et + ∆Ft ] √ ∼ N [(∆Bt − µ∆B,NT (t)) + ∆Ct + ∆Et ]. (13) 9
Consider ST = ∑tT=2
√
Nut , which in view of the above result for
√
Nut may be written
as follows: ST ∼
T
∑
√
N [(∆Bt − µ∆B,NT (t)) + ∆Ct + ∆Et ]
t =2
From Proof of Theorem 1 we know that ∑tT=2
√
N (∆Bt − µ∆B,NT (t)) and ∑tT=2
√
N∆Ct are
both O p (1). However, by using the above result for NE[(∆Et )2 ] it is not difficult to show that
(
)2 4 2 1 NE ∑ ∆Et ∼ T 2 σβ,N σϵ T3 t =2 T
T
4
2 , ∑ t2 ∼ T 3σ2 σβ,N ϵ
t =2
√ ∫1 where the last result holds because ∑tT=2 t2 /T 3 → 0 r2 dr = 1/3. It follows that ∑tT=2 N∆Et = √ O p ( T ), and therefore √ √ T T 1 N N √ ST ∼ ∑ √ [(∆Bt − µ∆B,NT (t)) + ∆Ct + ∆Et ] ∼ ∑ √ ∆Et . T T T t =2 t =2 We can therefore show that 2 1 √ ST ∼ √ σβ,N N (0, 1). 3σϵ T ∫1 Since ∑tT=2 t/T 2 → 0 rdr = 1/2, we have √ T √ √ N 2 2 1 T 1 2 t ∼ NT 2 σβ,N , ∑ √T µ∆D,NT (t) = NT σ2 σβ,N ∑ 2 T σ ϵ ϵ t =2 t =2 making use of this result and the fact that µ∆B,NT (t) ∼ ∆ρr,NT (t) (see Proof of Theorem, 1), √ the above result for ST / T can be rearranged as follows: √ T 1 T N 2 √ ∑ ∆Ut ∼ ∑ √ [µ∆B,NT (t) + µ∆D,NT (t)] + √ σβ,N N (0, 1) 3σϵ T t =2 T t =2 √ T √ N 1 2 2 ∼ ∑ √ ∆ρr,NT (t) + NT 2 σβ,N + √ σβ,N N (0, 1), σϵ 3σϵ T t =2 or
√ √ T √ 2σϵ σβ,N Nˆ N 2 2 √ θ ∼ σϵ ∑ √ ∆ρr,NT (t) + NTσβ,N + √ N (0, 1). 3 T T t =2
(14)
2 . Towards this end, note that Inference based on this result requires an estimator of σβ,N √ √ the above result for Nut can be solved for N [∆Vt /T − σϵ2 µ∆B,NT (t)], giving √ ) √ (1 √ 2 N 2 2 N ∆Vt − σϵ µ∆B,NT (t) ∼ σβ,N t + σϵ2 N [(∆Bt − µ∆B,NT (t)) + ∆Ct + ∆Et ] T T t = b2 + et , (15) T
10
√ where et = σϵ2 N [(∆Bt − µ∆B,NT (t)) + ∆Ct + ∆Et ] is normal with mean zero and variance 2 t2 / ( σ2 T 2 ) (this can be shown using the results provided in Proof of NE[(∆Et )2 ] ∼ 4σβ,N ϵ √ 2 2 Theorem 1), µ∆B,NT (t) ∼ ∆ρr,NT (t) ∼ 1/T and b = Nσβ,N . This suggests that σβ,N can be √ √ 2 ˆ N, where bˆ is the least squares slope in a regression of N [∆Vt − estimated using σˆ β,N = b/
σϵ2 ]/T onto 2t/T. Clearly, )1/2 ( 1 T 4 σ2 √ ∑tT=2 (t/T )et ( t/T ) ∑ 3σβ,N t = 2 β,N T T (bˆ − b) = T N (0, 1) ∼ √ N (0, 1). (16) ∼ T T 1 2 ∑t=2 (t/T )2 σϵ2 [ T ∑t=2 (t/T )2 ]2 5σϵ √ √ 2 − σ2 ), this implies Since T (bˆ − b) = NT (σˆ β,N β,N
√
√
3σβ,N 2 2 NT (σˆ β,N − σβ,N )∼ √ N (0, 1). 5σϵ
(17)
√ ˆ This distribution is independent of that of the normal variate in θ/ T. Hence, √ √ T √ 2σϵ σβ,N Nˆ √ N 2 2 2 2 √ θ − NT σˆ β,N ∼ σϵ ∑ √ ∆ρr,NT (t) + NT (σβ,N − σˆ β,N N (0, 1) )+ √ 3 T T t =2 √ T N 2 (18) ∼ σϵ ∑ √ ∆ρr,NT (t) + σϵ2 σθ N (0, 1), T t =2 where σθ2 =
2 (27 + 20σϵ4 )σβ,N
15σϵ4
.
The appropriate test statistic to consider is therefore given by √ √ 2 /σ ∗ ˆ ϵ2 ) N/T (θˆ∗ − ( T − 1)/T − T σˆ β,N N/T (θˆBA,1,t − 1) ∗ τ1,t = = , σˆ θ σˆ θ ∗ 2 /σ ∗ 2 /σ 2 ˆ ϵ2 = θˆBA,1 ˆ ϵ2 , and σˆ θ2 is σθ2 with σϵ2 and σβ,N where θˆBA,1,t = θˆ∗ + 1/T − T σˆ β,N − T σˆ β,N 2 , respectively. The asymptotic distribution of this test statistic is replaced by σˆ ϵ2 and σˆ β,N
easily seen to be ∗ τ1,t
1 ∼ σθ
√ ( ) N 1 ∑ √T ∆ρr,NT (t) − T + N (0, 1), t =2 T
(19)
which reduces to N (0, 1) under H0 : θ = 1. ∗ , which is driven by the first (drift) term It is interesting to consider the local power of τ1,t
in the asymptotic distribution. By using the same arguments as in Westerlund (2014a), with T = N γ and αi = exp(ci /N η ), √ ( ) T N 1 √ ∆ρ ( t ) − = O((1 − θ ) N 1/2+γ/2−η µ0,1 ), r,NT ∑ T T t =2 11
(20)
which is less than O((1 − θ ) N 1/2+γ−η µ0,1 ), the corresponding expression in the constantonly case. This means that while the appropriate condition for non-negligible and nonincreasing power is different than in the constant-only case. Specifically, while in the constantonly case the condition is γ − η + 1/2 = 0, in the current trend case the relevant condition is γ/2 − η + 1/2 = 0. Thus, if γ = 1/4, such that T = N 1/4 , while τ1∗ has non-negligible ∗ , which has negligible power for all η > 5/8 power for η = 3/4, this is not the case for τ1,t
(including η = 3/4). Hence, in agreement with the results of Moon et al. (2007), we find that the test statistic considered here has reduced power in the presence of incidental trends. ∗ for some constellations of This is illustrated in Table 2, which report the local power of τ1,t
(γ, η ) when ci ∼ U ( a, b). The DGP is the same as in Section 5, except that the equation for yi,t now includes a linear trend with coefficient β i ∼ N (1, 0.5). Preliminary results suggest that the performance of the test is improved by the inclusion of a constant in (15). The results reported in Table 2 are therefore based on this regression. As expected, we see that power is very low and that it only rarely raises above the nominal 5% level. In order to get a feeling for the loss of power when compared to the constant-only case we look at Table 6 in Westerlund (2014a), which contain some power results for τ1∗ . Clearly, for the combinations of
( a, b) considered here the loss of power is quite dramatic, which is in agreement with our theoretical prediction. As mentioned in Westerlund (2014a), Ng (2008) only examines the asymptotic behavior ∗ is subject to the so-called of her test statistic under the null hypothesis. The finding that τ1,t
“incidental trends problem” is therefore a contribution relative to Ng (2008).1 Another Monte Carlo simulation exercise was undertaken to investigate the small-sample accuracy of our asymptotic results under the null hypothesis. The DGP used for this purpose is the same as the one used in generating the local power results reported in Table 2. The ∗ the test only difference is that in this case a = b = 0 (α1 = ... = α N = 1). In addition to τ1,t
statistic considered by Ng (2008, Section 5), henceforth denoted τ1,t , which can be seen as a conventional t-test of the constant term in a regression of ∆Vt onto a constant and ∆t2 , is also simulated. According to Table 3 τ1,t is severely size distorted. In fact, unreported simulation results suggest that N has to be as large as 1,000 for this test to be reasonably sized, which 1 Moon
and Phillips (1999) show that the maximum likelihood estimator of the local-to-unity parameter in near unit root panels with unit-specific trends is inconsistent. They call this phenomenon, which arises because of the presence of an infinite number of nuisance parameters, an “incidental trend problem”, because it is analogous to the well-known incidental parameter problem in dynamic panels where T is fixed.
12
∗ when T = N γ . Table 2: 5% power of τ1,t
N
a = b = −1
40 80 160 320 640
11.3 7.3 1.5 0.1 0.0
40 80 160 320 640
5.2 3.2 1.8 0.8 0.4
γ = η = 1/2 5.1 3.3 1.8 0.9 0.4
9.4 6.0 4.0 2.7 1.5
15.5 13.3 10.0 8.9 7.3
20.2 14.4 5.3 1.6 0.3
γ = η = 1/4 19.6 14.3 5.4 1.9 0.3
30.1 24.4 15.6 9.4 6.4
38.4 35.6 33.7 35.0 41.6
40 80 160 320 640
a = −2, b = 0
a = b = −2
γ = 1/4, η = 3/4 11.3 12.4 7.4 7.8 1.5 1.7 0.1 0.1 0.0 0.0
a = b = −4 15.5 9.5 2.2 0.3 0.0
∗ refers to the test with incidental trends. The autoregressive parameter Notes: τ1,t is generated as αi = exp(ci /N η ), where ci ∼ U ( a, b). See Westerlund (2014a) for a description of the rest of the DGP.
is largely in agreement with the results reported by Ng (2008, Table 4). By contrast, the test statistic proposed here seems to perform well even in relatively small samples. We also see ∗ are close to the theoretically that the mean and variance of the empirical distribution of τ1,t
predicted values (zero and one, respectively), and that the accuracy increases as N and T increase. The same cannot be said about τ1,t . ∗ is only suitable for testing H : θ = 1. The appropriate As in the constant-only case, τ1,t 0
test statistic to use when testing H0 : θ = θ0 ∈ (0, 1] under η = 0 is given by √ ∗ − 1) N/T (θˆBA,θ ∗ 0 ,t √ τθ0 ,t = . θˆ∗ σˆ θ By using the same arguments as in Westerlund (2014a) we can show that if the null hypothesis is true, then τθ∗0 ,t →d N (0, 1) 13
∗ and τ . Table 3: 5% size, mean and variance of τ1,t 1,t
Size τ1,t
N
T
∗ τ1,t
10 10 10 10 20 20 20 20 40 40 40 40
25 50 100 200 25 50 100 200 25 50 100 200
3.5 3.5 3.9 4.1 3.4 3 3.9 4 3 2.4 2.8 3.5
36.9 38.6 39.5 43.8 34 34.9 37.8 41.9 27.5 30.8 33.9 38.1
∗ τ1,t
Mean τ1,t
0.54 0.22 0.18 0.09 0.4 0.32 0.18 0.12 0.58 0.4 0.28 0.17
−20.94 −17.06 −12.9 −10.3 −24.74 −19.98 −16.15 −12.98 −24.15 −22.11 −19.24 −16.05
Variance τ1,t
∗ τ1,t
21.97 1.37 1.2 1.12 1.66 1.35 1.15 1.07 1.56 1.26 1.13 1.06
954.9 555.08 302.22 164.12 1552.93 877.28 481.45 261.95 2291.44 1450.48 837.81 463.11
∗ refers to the test with incidental trends, with τ being the corresponding Notes: τ1,t 1,t test in Ng (2008). See Westerlund (2014a) for a description of the DGP.
as N, T → ∞ with 1/2 < γ < 1. If, on the other hand, the null hypothesis is false, then √ τθ∗0 ,t = O p ( N ), which is the same result as in the constant-only case. The incidental trends problem is therefore only an issue when considering a local alternative hypothesis (η > 0).
14
References Andrews, D. (1991). Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica 59, 817–858. De Blander, R., and G. Dhaene (2013). Unit root tests for panel data with AR(1) errors and small T. Econometrics Journal 15, 101–124. Harris, R. D. F., and E. Tzavalis (1999). Inference for unit roots in dynamic panels where the time dimension is fixed. Journal of Econometrics 91, 201–226. Kruiniger, H. (2009). GMM estimation of dynamic panel data models with persistent data. Econometric Theory 25, 1348–1391. Moon, H. R., and P. C. B. Phillips (1999). Maximum likelihood estimation in panels with incidental trends. Oxford Bulletin of Economics and Statistics 61, 771–748. Moon, H. R., B. Perron and P. C. B. Phillips (2007). Incidental trends and the power of panel unit root tests. Journal of Econometrics 141, 416–459. Ng, S. (2008). A simple test for nonstationarity in mixed panels. Journal of Business & Economic Statistics 26, 113–126. Westerlund, J. (2014a). A simple test for nonstationarity in mixed panels: A further investigation. Unpublished manuscript. Westerlund, J. (2014b). Pooled panel unit root tests and the effect of past initialization. Forthcoming in Econometric Reviews.
15