ISSN 1440-771X
Australia
Department of Econometrics and Business Statistics http://www.buseco.monash.edu.au/depts/ebs/pubs/wpapers/
Inference on Nonstationary Time Series with Moving Mean Jiti Gao and Peter M. Robinson
July 2013
Working Paper 15/13
Inference on Nonstationary Time Series with Moving Mean

Jiti Gao and Peter M. Robinson*
Monash University and London School of Economics

July 23, 2013
Abstract: A semiparametric model is proposed in which a parametric filtering of a nonstationary time series, incorporating fractional differencing with short memory correction, removes correlation but leaves a nonparametric deterministic trend. Estimates of the memory parameter and other dependence parameters are proposed, and shown to be consistent and asymptotically normally distributed with parametric rate. Unit root tests with standard asymptotics are thereby justified. Estimation of the trend function is also considered. We include a Monte Carlo study of finite-sample performance.

Keywords: fractional time series; fixed design nonparametric regression; nonstationary time series; unit root tests.

JEL Classifications: C14, C22.

Proposed running head: Nonstationary Time Series

*Corresponding author. E-mail: [email protected]
1. INTRODUCTION

A long-established vehicle for smoothing a deterministically-trending time series $y_t$, $t = 1, \ldots, T$, is the fixed-design nonparametric regression model

    $y_t = g(t/T) + u_t$,  $t = 1, \ldots, T$,    (1)

where $g(x)$, $x \in [0,1]$, is an unknown, smooth, nonparametric function, and $u_t$ is an unobservable sequence of random variables with zero mean. The dependence on sample size $T$ of $g(t/T)$ in (1) is to ensure sufficient accumulation of information to enable consistent estimation of $g(x)$ at any $x \in (0,1)$.
A more basic trend function is a polynomial in $t$ of given degree, as still frequently employed in various econometric models. A more general class of models than polynomials (and having analogy with the fractional stochastic trends we will employ in the current paper) involves fractional powers, i.e.

    $y_t = \beta_0 + \beta_1 t^{\gamma_1} + \cdots + \beta_p t^{\gamma_p} + u_t$,  $t = 1, \ldots, T$,    (2)

where all the $\beta_i$ and $\gamma_i$ are unknown and real-valued. Subject to identifiability and other restrictions, these parameters can be estimated consistently and asymptotically normally, e.g. by nonlinear least squares (Robinson (2012a)). Models such as (2) can be especially useful in modest sample sizes. However, as with any parametric function of $t$, misspecification leads to inconsistent estimation, and a nonparametric treatment affords greater flexibility when $T$ is large (recognizing that nonparametric estimates converge more slowly than parametric ones). With independent and identically distributed (iid) $u_t$ with finite variance, various kernel-type estimates of $g$ in (1) were developed by Gasser and Mueller (1979) and Priestley and Chao (1972), with statistical properties established; in particular, under regularity conditions kernel estimates of $g(x)$ are consistent and asymptotically normally distributed as $T \to \infty$ (see e.g. Benedetti (1977)).
A suitable choice of kernel (and bandwidth) is an important ingredient in this theory, although kernel estimates are essentially an elaboration on simple moving window averages, which have a much longer history in empirical work. More recent empirical uses of (1) include Starica and Granger (2005) in modelling stock price series. The iid assumption on $u_t$ is very restrictive, but similar asymptotic properties result when $u_t$ has weak dependence, for example is a covariance stationary process, generated by a linear process or satisfying suitable mixing conditions, and having finite and positive spectral density at frequency zero (see e.g. Roussas, Tran and Ioannides (1992), Tran, Roussas, Yakowitz and Truong Van (1996)). The rate of convergence of kernel estimates is unaffected by this level of serial correlation, though the asymptotic variance differs from that in the iid case (unlike in the stochastic-design model, in which the argument of $g$ in (1) is instead a weakly dependent stationary stochastic sequence). Long-range dependence in $u_t$ has a greater impact on large-sample inference. If $u_t$ is a stationary and invertible fractional process, for example

    $(1 - L)^{\delta_0} u_t = \varepsilon_t$,  $|\delta_0| < 1/2$,    (3)

$L$ being the lag operator and the $\varepsilon_t$ forming an iid sequence, or if $u_t$ has a "semiparametric" specification with spectral density $f(\lambda)$ having rate $\lambda^{-2\delta_0}$ as frequency $\lambda$ approaches zero from above, then the convergence rate of kernel estimates of $g(x)$ is slower when $\delta_0 > 0$ and faster when $\delta_0 < 0$. References dealing with (1) for such $u_t$ include Beran and Feng (2002), Csorgo and Mielniczuk (1995), Deo (1997), Guo and Koul (2007), Robinson (1997), Zhao, Zhang and Li (2013). The asymptotic variance of the kernel estimates depends on $\delta_0$ and any other time series parameters; for the "semiparametric" specification Robinson (1997) justified studentization using local Whittle estimates of $\delta_0$. The restriction $\delta_0 < 1/2$ implies stationarity of $u_t$, so that $y_t$ given by (1) is nonstationary only in the mean.
Stochastic trends are also an important source of nonstationarity in many empirical time series. However, a nonstationary stochastic trend in $y_t$ generated by a nonstationary $u_t$, for example one having a unit root, would render $g(t/T)$ undetectable. An alternative, semiparametric, model which both incorporates a possibly nonstationary stochastic trend and enables estimation of a nonparametric deterministic trend is

    $\Delta_t^{\delta_0} y_t = g(t/T) + u_t$,  $t = 1, \ldots, T$,    (4)

where $u_t$ is a sequence of uncorrelated, homoscedastic random variables and, for any real $\delta$, $\Delta_t^{\delta}$ is the truncated fractional differencing operator

    $\Delta_t^{\delta} = \sum_{j=0}^{t-1} \pi_j(\delta) L^j$,  $t \geq 1$,    (5)

the $\pi_j(\delta)$ being coefficients in the (possibly formal) expansion

    $(1 - z)^{\delta} = \sum_{j=0}^{\infty} \pi_j(\delta) z^j$,

namely

    $\pi_j(\delta) = \frac{\Gamma(j - \delta)}{\Gamma(-\delta)\,\Gamma(j + 1)}$.    (6)

The truncation in (5) reflects non-observability of $y_t$ when $t \leq 0$, and avoids explosion of the moving average representation of (4) when $\delta_0 \geq 1/2$, the nonstationary region for $\delta_0$; it is this region with which we will be concerned.
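The coefficients (6) and the truncated filtering in (5) are straightforward to compute. The following sketch is our own illustration (function and variable names are ours, not the paper's); it uses the recursion $\pi_j(\delta) = \pi_{j-1}(\delta)(j - 1 - \delta)/j$, $\pi_0(\delta) = 1$, which is equivalent to the Gamma-function ratio in (6):

```python
import numpy as np

def frac_diff_coeffs(delta, n):
    """Coefficients pi_j(delta) of (1 - z)**delta, j = 0, ..., n-1,
    via the recursion pi_j = pi_{j-1} * (j - 1 - delta) / j."""
    pi = np.empty(n)
    pi[0] = 1.0
    for j in range(1, n):
        pi[j] = pi[j - 1] * (j - 1 - delta) / j
    return pi

def truncated_frac_diff(y, delta):
    """Apply the truncated operator in (5): (Delta_t^delta y)_t
    = sum_{j=0}^{t-1} pi_j(delta) y_{t-j}, using only y_1, ..., y_t."""
    T = len(y)
    pi = frac_diff_coeffs(delta, T)
    return np.array([pi[:t].dot(y[t - 1::-1]) for t in range(1, T + 1)])

# delta = 1 reproduces truncated first differencing (with y_0 = 0):
y = np.array([1.0, 3.0, 6.0, 10.0])
print(truncated_frac_diff(y, 1.0))   # [1. 2. 3. 4.]
```

For $\delta = 1$ the operator reduces to truncated first differencing, as in the unit root case discussed next.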
One such $\delta_0$ has assumed wide empirical importance in connection with a variety of econometric models, the unit root case $\delta_0 = 1$, when (4) becomes

    $(1 - L) y_t = g(t/T) + u_t$,  $t = 1, \ldots, T$.    (7)

The bulk of the econometric literature nests the unit root in autoregressive structures, which suggests treating (7) as a special case of

    $(1 - \rho L) y_t = g(t/T) + u_t$,  $t = 1, \ldots, T$,    (8)

rather than (4). The autoregressive unit root literature suggests that estimates of $\rho$ in (8) will have a nonstandard limit distribution under (7), but a normal one in the "stationary" region $|\rho| < 1$. By contrast we can anticipate, for example from literature concerning (4) with $g(x)$ a priori constant, that estimates of $\delta_0$, such as ones optimizing an approximate pseudo-Gaussian likelihood, and Wald and other test statistics, will enjoy standard asymptotics, with the usual parametric convergence rate $\sqrt{T}$, whatever the value of $\delta_0$, due essentially to smoothness properties of the fractional operator; tests are also expected to have the classical local efficiency properties. While (4) cannot, unlike (8), describe "explosive" behaviour (occurring when $|\rho| > 1$), it describes a continuum of stochastic trends indexed by $\delta_0$. A consequence of the $T$-dependence in $g(t/T)$ is that the left side of (4) is also $T$-dependent, so the $y_t = y_{tT}$ in fact form a triangular array, but in common with the bulk of literature concerning versions of (1) we suppress reference to $T$. The model (4) (which nests (1) with iid $u_t$ on taking $\delta_0 = 0$) supposes that the fractional filtering of $y_t$ successfully eliminates correlation, but possibly leaves a trend which we are not prepared to parameterize. To provide greater generality than (4), the paper in fact considers the extended model

    $\Delta_t(L; \delta_0, \theta_0)\, y_t = g(t/T) + u_t$,  $t = 1, \ldots, T$,    (9)

where $\theta_0$ is an unknown $p$-dimensional column vector and

    $\Delta_t(z; \delta, \theta) = \sum_{j=0}^{t-1} \alpha_j(\delta, \theta) z^j$,  $t \geq 1$,

where the $\alpha_j(\delta, \theta)$ are coefficients in the possibly formal expansion

    $\Delta(z; \delta, \theta) = \sum_{j=0}^{\infty} \alpha_j(\delta, \theta) z^j$,

such that

    $\Delta(z; \delta, \theta) = (1 - z)^{\delta}\, \phi(z; \theta)$,
where

    $\phi(z; \theta) = \sum_{j=0}^{\infty} \tau_j(\theta) z^j$

is a known function of $z$ and $\theta$ that is at least continuous, and nonzero for $z$ on or inside the unit circle in the complex plane. When $\phi(z; \theta) \equiv 1$, we have $\Delta(z; \delta, \theta) = (1 - z)^{\delta}$. Leading examples of $\phi(z; \theta)$ are stationary and invertible autoregressive moving average operators of known degree, for example the first order autoregressive operator $\phi(z; \theta) = 1 - \theta z$, with $\theta$ here a scalar such that $|\theta| < 1$. In general $\theta$ leaves the essential memory or degree of nonstationarity $\delta_0$ unchanged but allows otherwise richer dependence structure.

It would be possible to consider in effect a nonparametric $\phi$, $\phi(z)$, satisfying smoothness assumptions only near $z = 1$, and hence a "semiparametric" operator on $y_t$. This would lead to an estimate of $\delta_0$ with only a nonparametric convergence rate. However, establishing the parametric, $\sqrt{T}$, rate for estimating $\delta_0$ and $\theta_0$ seems actually more challenging and delicate, because of the presence of the nonparametric $g(t/T)$ in (9), estimates of which converge more slowly than $\sqrt{T}$. In particular, proving consistency requires establishing that certain (stochastic and deterministic) contributions to residuals, whose squares make up the objective function minimized by the parameter estimates, are negligible uniformly over the parameter space; these contributions are of larger order than would be the case with a parametric trend (and this fact also explains why we find ourselves unable to choose the parameter space for $\delta_0$ as large as is possible with a parametric trend). Then, corresponding contributions to scores evaluated at $\delta_0$, $\theta_0$ are also of larger order than in the parametric trend case, and have to be shown to be negligible after being normalized by $\sqrt{T}$, rather than by a slower, nonparametric, rate, in order to prove asymptotic normality of the parameter estimates with $\sqrt{T}$ rate. Of course, the strong dependence in $y_t$ also impacts on the conditions, due to non-summability of certain weight sequences.
The following section proposes estimates of $\delta_0$ and $\theta_0$, and establishes their consistency and asymptotic normality, the proofs appearing in Appendices A and B. Section 3 develops unit root tests based on Wald, pseudo log likelihood ratio and Lagrange multiplier principles. Section 4 proposes estimates of $g(x)$ and establishes their asymptotic properties. A small Monte Carlo study of finite-sample performance is contained in Section 5. Section 6 concludes by describing further issues that might be considered.

2. ESTIMATION OF DEPENDENCE PARAMETERS

Were $g(x) \equiv 0$ in (9) a priori, a natural method of estimating $\delta_0$ and $\theta_0$ would be conditional-sum-of-squares, which approximates Gaussian pseudo-maximum-likelihood estimation. We modify this method by employing residuals, which requires preliminary estimation of $g(x)$. Note that under the conditions imposed below, $g(t/T) - g((t-1)/T) = O(T^{-1})$, $2 \leq t \leq T$, so we could instead consider proceeding by first differencing in (9), as this would effectively eliminate the deterministic trend; however this also induces a moving average unit root on the right hand side.

Let $k(x)$, $x \in \mathbb{R}$, be a user-chosen kernel function and $h$ a user-chosen positive bandwidth number. For any $\delta$, $\theta$, write $\Delta_t(z) = \Delta_t(z; \delta, \theta)$ and introduce

    $\hat g_{\delta\theta}(x) = \sum_{s=1}^{T} \Delta_s(L) y_s\, k\!\left(\frac{x - s/T}{h}\right) \Big/ \sum_{s=1}^{T} k\!\left(\frac{x - s/T}{h}\right)$,    (10)

for any $x \in [0,1]$. The corresponding estimate of Priestley and Chao (1972) type replaces the denominator by $Th$, but we prefer to use weights (of the $\Delta_s(L) y_s$) that exactly sum to 1 for all $x$. Define residuals

    $\hat u_t(\delta, \theta) = \Delta_t(L) y_t - \hat g_{\delta\theta}(t/T) = \sum_{s=1}^{T} \left(\Delta_t(L) y_t - \Delta_s(L) y_s\right) k_{ts} \Big/ \sum_{s=1}^{T} k_{ts}$,    (11)

where $k_{ts} = k((t - s)/(Th))$. Denote by $r_1$, $r_2$ chosen real numbers such that $r_1 < r_2$, write $r = [r_1, r_2]$, and let $\Theta$ be a suitably chosen compact subset of $\mathbb{R}^p$. We estimate $\delta_0$ and $\theta_0$ by

    $(\hat\delta, \hat\theta) = \arg\min_{\delta \in r,\ \theta \in \Theta} Q(\delta, \theta)$,    (12)

where

    $Q(\delta, \theta) = \frac{1}{T}\sum_{t=1}^{T} \hat u_t^2(\delta, \theta)$.    (13)
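To fix ideas, here is a minimal, self-contained sketch of how (10)-(13) combine. It is entirely our own: the names, the Gaussian kernel, the AR(1) choice $\phi(z;\theta) = 1 - \theta z$, and the grid search standing in for the numerical minimization in (12) are all illustrative assumptions, not the paper's prescriptions.

```python
import numpy as np

def frac_coeffs(delta, n):
    # pi_j(delta) of (1 - z)**delta via pi_j = pi_{j-1} * (j - 1 - delta) / j
    pi = np.empty(n)
    pi[0] = 1.0
    for j in range(1, n):
        pi[j] = pi[j - 1] * (j - 1 - delta) / j
    return pi

def css_objective(y, delta, theta, h):
    """Q(delta, theta) of (13) for phi(z; theta) = 1 - theta z: filter y by
    Delta_t(L; delta, theta), kernel-detrend as in (10)-(11), and return
    the mean squared residual."""
    T = len(y)
    pi = frac_coeffs(delta, T)
    alpha = pi.copy()
    alpha[1:] -= theta * pi[:-1]             # alpha_j = pi_j - theta * pi_{j-1}
    w = np.array([alpha[:t].dot(y[t - 1::-1]) for t in range(1, T + 1)])
    t_idx = np.arange(1, T + 1, dtype=float)
    K = np.exp(-((t_idx[:, None] - t_idx[None, :]) / (T * h)) ** 2 / 2)  # k_{ts}
    u = w - (K @ w) / K.sum(axis=1)          # w_t minus ghat_{delta,theta}(t/T)
    return np.mean(u ** 2)

def css_estimate(y, deltas, thetas, h):
    # grid-search stand-in for the minimization in (12)
    Q = [[css_objective(y, d, th, h) for th in thetas] for d in deltas]
    i, j = np.unravel_index(np.argmin(Q), (len(deltas), len(thetas)))
    return deltas[i], thetas[j]
```

With data generated from (9), `css_estimate` recovers $(\delta_0, \theta_0)$ up to the grid resolution; in practice the grid would be replaced by a numerical optimizer over $r \times \Theta$.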
We first establish consistency of $(\hat\delta, \hat\theta)$ under the following regularity conditions.

Assumption 1. The $u_t$ are stationary and ergodic with finite fourth moment, $E(u_t \mid F_{t-1}) = 0$, $E(u_t^2 \mid F_{t-1}) = \sigma^2$ almost surely, where $F_t$ is the $\sigma$-field of events generated by $u_s$, $s \leq t$, and conditional (on $F_{t-1}$) 3rd and 4th moments of $u_t$ equal corresponding unconditional moments.
Assumption 2. (i) $\theta_0 \in \Theta$;
(ii) $|\phi(z; \theta)| \neq |\phi(z; \theta_0)|$, for all $\theta \in \Theta$, $\theta \neq \theta_0$, on a $z$-set of positive Lebesgue measure;
(iii) for all $\theta \in \Theta$, $\phi(e^{i\lambda}; \theta)$ is differentiable in $\lambda$ with derivative in $\mathrm{Lip}(\varsigma)$, $\varsigma > 1/2$;
(iv) for all real $\lambda$, $\phi(e^{i\lambda}; \theta)$ is continuous in $\theta$;
(v) for all $\theta \in \Theta$, $\phi(z; \theta) \neq 0$, $|z| \leq 1$.
Assumption 1 is weaker than imposing independence and identity of distribution of $u_t$, and Assumption 2 is standard from the literature on parametric short memory models since Hannan (1973), ensuring identifiability of $\theta_0$ and easily covering stationary and invertible moving averages. These assumptions correspond to ones of Hualde and Robinson (2011), who established consistency of the same kind of estimates when $g(x) \equiv 0$ in (9) a priori. In that setting they were able to choose the set of admissible memory parameters (our $[r_1, r_2]$) arbitrarily large, to simultaneously cover stationary, nonstationary, invertible and non-invertible values. This is more difficult, and perhaps impossible, to achieve in the presence of the unknown, nonparametric $g$ in (9), which can only be estimated with a slow rate of convergence, and we impose:

Assumption 3.

    $r_1 > 3/4$,  $r_2 < 5/4$,    (14)

    $\delta_0 \in r$.    (15)

As can be inferred from the proof of Theorem 1, strictly what is required instead of (14) is the weaker condition $r_2 - \delta_0 < 1/2$; but since $\delta_0$ is known from (15) only to be no less than $r_1$, the restriction $r_2 - r_1 < 1/2$ implied by (14) is appropriate. Inspection of our proofs indicates that they go through with $r$ in Assumption 3 replaced by $[\kappa, \kappa + \omega]$, for any real $\kappa$ and for $\omega \in (0, 1/2)$ (for example a subset of the stationary and invertible region $(-1/2, 1/2)$); but for the sake of clarity we fix on (14), which seems among the more empirically realistic possibilities, and covers the unit root case $\delta_0 = 1$.
We also need conditions on $g$, $k$ and $h$.

Assumption 4. The function $g(x)$ is twice boundedly differentiable on $[0,1]$ and $g(0) = 0$.

Assumption 5. The function $k(x)$ is even, differentiable at all but possibly finitely many $x$, with derivative $k'(x)$, and

    $\int_{\mathbb{R}} k(x)\,dx = 1$;  $k(x) = O((1 + x^{2+\eta})^{-1})$,  $k'(x) = O((1 + |x|^{1+\eta})^{-1})$,  some $\eta > 0$.

Assumption 6. As $T \to \infty$, the positive-valued sequence $h = h_T$ satisfies

    $(Th)^{-1} + T^{2(r_2 - r_1)} h^3 \to 0$.    (16)

Assumption 5 is virtually costless, covering many of the usual kernel choices. Assumption 6, however, represents a trade-off with Assumption 3: in the latter, $r_2 - r_1$ is desirably as close to $1/2$ as possible, but as it approaches $1/2$ from below the range of $h$ satisfying Assumption 6 reduces to $(Th)^{-1} + Th^3 \to 0$.
Theorem 1. Let (9) and Assumptions 1-6 hold. Then as $T \to \infty$, $\hat\delta \to_p \delta_0$ and $\hat\theta \to_p \theta_0$.
The proof is in Appendix A. Asymptotic normality requires two further assumptions.

Assumption 7. (i) $\delta_0 \in (r_1, r_2)$; $\theta_0$ is an interior point of $\Theta$;
(ii) for all real $\lambda$, $\phi(e^{i\lambda}; \theta)$ is twice continuously differentiable in $\theta$ on a closed neighbourhood of radius less than $1/2$ about $\theta_0$;
(iii) the matrix

    $\Omega = \begin{pmatrix} \pi^2/6 & \sum_{j=1}^{\infty} \chi_j(\theta_0)'/j \\ \sum_{j=1}^{\infty} \chi_j(\theta_0)/j & \sum_{j=1}^{\infty} \chi_j(\theta_0)\chi_j(\theta_0)' \end{pmatrix}$

is non-singular, where

    $\chi_j(\theta) = \sum_{k=0}^{j-1} b_k(\theta)\, \partial\tau_{j-k}(\theta)/\partial\theta$,

the $b_j(\theta)$ being coefficients in the expansion

    $\phi(z; \theta)^{-1} = \sum_{j=0}^{\infty} b_j(\theta) z^j$.

This condition again is based on one of Hualde and Robinson (2011), but is similar to others in the literature, and practically unrestrictive. However, we have to strengthen the first component of Assumption 6 on $h$.

Assumption 8. As $T \to \infty$, $Th^2/(\log T)^2 \to \infty$.

Theorem 2. Let (9) and Assumptions 1-8 hold. Then as $T \to \infty$,

    $T^{1/2}\begin{pmatrix} \hat\delta - \delta_0 \\ \hat\theta - \theta_0 \end{pmatrix} \to_d N(0, \Omega^{-1})$.

The proof is in Appendix B. Note that the same limit distribution results when $g$ is known or replaced by a parametric function. In the special case (4) of (9), we deduce that as $T \to \infty$,

    $T^{1/2}(\hat\delta - \delta_0) \to_d N(0, 6/\pi^2)$.
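The limit $N(0, 6/\pi^2)$ implies an approximate standard error $(6/\pi^2)^{1/2} T^{-1/2} \approx 0.78\, T^{-1/2}$ for $\hat\delta$ in model (4). A quick numerical illustration (our own, not from the paper):

```python
import math

# asymptotic variance of T**0.5 * (delta_hat - delta_0) in model (4)
avar = 6.0 / math.pi ** 2                    # ~ 0.6079

def half_width(T, z=1.96):
    """Approximate 95% confidence half-width for delta_hat at sample size T."""
    return z * math.sqrt(avar / T)

print(round(math.sqrt(avar), 4))             # 0.7797
for T in (250, 600, 1000):                   # the Monte Carlo sample sizes below
    print(T, round(half_width(T), 4))
```

These half-widths give a benchmark against which the Monte Carlo standard deviations of Section 5 can be compared.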
3. UNIT ROOT TESTING

We first establish Wald tests for $\delta_0 = 1$ in (9), based on Theorem 2. Define

    $\Omega(\theta) = \begin{pmatrix} \pi^2/6 & \sum_{j=1}^{\infty} \chi_j(\theta)'/j \\ \sum_{j=1}^{\infty} \chi_j(\theta)/j & \sum_{j=1}^{\infty} \chi_j(\theta)\chi_j(\theta)' \end{pmatrix}$

and denote by $\hat\Omega^{(1,1)}$ the element in the top left hand corner of $\Omega(\hat\theta)^{-1}$. Put

    $W = T^{1/2}(\hat\delta - 1)\big/\hat\Omega^{(1,1)1/2}$.

Theorem 3. Let $\delta_0 = 1$ in (9) and let Assumptions 1-8 hold. Then as $T \to \infty$, $W \to_d N(0,1)$.

The theorem follows from Theorem 2 and

    $\Omega(\hat\theta) \to_p \Omega(\theta_0)$,    (17)

where the latter is implied by the proof of Theorem 2. We can reject the unit root null against more nonstationary alternatives when $W$ falls in the appropriate upper tail of the standard normal density, and reject against less nonstationary alternatives when it falls in the appropriate lower tail.

Pseudo-log likelihood ratio tests can also be constructed. Define

    $\tilde\theta = \arg\min_{\theta \in \Theta} Q(1, \theta)$,    (18)

and

    $LR = T \log\left\{\frac{Q(1, \tilde\theta)}{Q(\hat\delta, \hat\theta)}\right\}$.

Theorem 4. Let $\delta_0 = 1$ in (9) and let Assumptions 1-8 hold. Then as $T \to \infty$, $LR \to_d \chi_1^2$.

The proof is standard, given Theorem 2 and a central limit theorem for $\tilde\theta$ (see e.g. Hannan (1973), or implied by Hualde and Robinson (2011)).
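Given $\hat\delta$, $\hat\Omega^{(1,1)}$ and the two minimized objective values, the statistics of Theorems 3 and 4 are immediate to compute. A sketch (our own code; the inputs are assumed to come from the estimation step of Section 2, and the critical values are the standard $N(0,1)$ and $\chi_1^2$ ones):

```python
import math

def wald_stat(delta_hat, omega11, T):
    """W of Theorem 3: T**0.5 * (delta_hat - 1) / omega11**0.5, where
    omega11 is the top-left element of Omega(theta_hat)**-1."""
    return math.sqrt(T) * (delta_hat - 1.0) / math.sqrt(omega11)

def lr_stat(Q_null, Q_unrestricted, T):
    """LR of Theorem 4: T * log(Q(1, theta_tilde) / Q(delta_hat, theta_hat))."""
    return T * math.log(Q_null / Q_unrestricted)

# one-sided 5% rejection against less nonstationary alternatives: W < -1.645;
# LR rejects when it exceeds the chi-squared(1) 5% critical value 3.841
W = wald_stat(0.93, 6.0 / math.pi ** 2, 1000)
LR = lr_stat(1.05, 1.0, 1000)
print(W < -1.645, LR > 3.841)   # True True
```

Both statistics reuse quantities already produced by the estimation, so the tests add essentially no computational cost.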
Though it of course does not use $(\hat\delta, \hat\theta)$, for completeness we also present a Lagrange multiplier-type test, as it and the Wald and pseudo-log likelihood ratio tests are expected to have equal local power. Robinson (1994) developed Lagrange multiplier tests for unit root and other hypotheses against fractional alternatives for the disturbances in multiple linear regression models. The stress there was on frequency-domain tests, but starting from an initial time-domain statistic, and to avoid introducing considerable additional notation we stay in the time domain here. Writing $\partial = \partial/\partial(\delta, \theta')'$, from (13)

    $\partial Q(\delta, \theta) = \frac{2}{T}\sum_{t=1}^{T} \hat u_t(\delta, \theta)\, \partial\hat u_t(\delta, \theta)$,    (19)

where

    $\partial\hat u_t(\delta, \theta) = \partial\Delta_t(L) y_t - \partial\hat g_{\delta\theta}(t/T) = \sum_{s=1}^{T}\left(\partial\Delta_t(L) y_t - \partial\Delta_s(L) y_s\right) k_{ts} \Big/ \sum_{s=1}^{T} k_{ts}$,    (20)

in which

    $\partial\Delta_t(z) = \sum_{j=0}^{t-1} \partial\alpha_j(\delta, \theta)\, z^j$.    (21)

In fact

    $\partial\Delta_t(L; 1, \theta)\, y_t = \begin{pmatrix} -\sum_{j=0}^{t-1}\sum_{l=1}^{j} l^{-1}\tau_{j-l}(\theta)\,(y_{t-j} - y_{t-j-1}) \\ \sum_{j=0}^{t-1} \left(\partial\tau_j(\theta)/\partial\theta\right)(y_{t-j} - y_{t-j-1}) \end{pmatrix}$.

Define

    $LM = \frac{T}{4\tilde\sigma^4}\, \partial Q(1, \tilde\theta)'\, \Omega(\tilde\theta)^{-1}\, \partial Q(1, \tilde\theta)$,

with $\tilde\theta$ given by (18) and $\tilde\sigma^2 = Q(1, \tilde\theta)$. The proof of the following theorem is straightforward given the sentence after Theorem 3.

Theorem 5. Let $\delta_0 = 1$ in (9) and let Assumptions 1-8 hold. Then as $T \to \infty$, $LM \to_d \chi_1^2$.
4. NONPARAMETRIC REGRESSION ESTIMATION

We can base estimation of $g(x)$ on our estimates $(\hat\delta, \hat\theta)$ and (10), but in view of the stringent conditions on the bandwidth $h$ in Theorems 1 and 2 we allow use of a possibly different bandwidth, $b$, in

    $\tilde g_{\delta\theta}(x) = \sum_{s=1}^{T} \Delta_s(L) y_s\, k\!\left(\frac{x - s/T}{b}\right) \Big/ \sum_{s=1}^{T} k\!\left(\frac{x - s/T}{b}\right)$.    (22)

We provide a multivariate central limit theorem for $\tilde g_{\hat\delta\hat\theta}(x_i)$, $i = 1, 2, \ldots, q$, where the $x_i$, $i = 1, 2, \ldots, q$, are distinct fixed points, imposing also:

Assumption 9. As $T \to \infty$, $(bT)^{-1} + b^5 T \to 0$.

The proof of the following theorem is omitted, as univariate and multivariate central limit theorems for the $\tilde g_{\delta_0\theta_0}(x_i)$ are already in the literature (see e.g. Benedetti (1977), Robinson (1997)), and from Theorem 2 it is readily shown that $\tilde g_{\hat\delta\hat\theta}(x) - \tilde g_{\delta_0\theta_0}(x) = O_p(T^{-1/2})$ for all $x$.

Theorem 6. Let (9) and Assumptions 1-9 hold. Then as $T \to \infty$, the $(bT)^{1/2}\left(\tilde g_{\hat\delta\hat\theta}(x_i) - g(x_i)\right)$, $i = 1, 2, \ldots, q$, converge in distribution to independent $N\!\left(0, \sigma^2 \int_{\mathbb{R}} k(x)^2 dx\right)$ random variables, where $\sigma^2$ is consistently estimated by $\hat\sigma^2 = Q(\hat\delta, \hat\theta)$.

This is the same limit distribution as results if $\delta_0$ and $\theta_0$ are known, i.e. the same as in the model (1) with iid $u_t$.

5. FINITE-SAMPLE PERFORMANCE

A small Monte Carlo study was carried out to investigate the finite-sample behaviour of our parameter estimates, and of one of our unit root tests. To generate data, in (9) we took $g(x) = \sin(2\pi x)$, $p = 1$, $\phi(z; \theta) = 1 - \theta z$ (so $y_t$ was a FARIMA$(1, \delta_0, 0)$), for various values of $\delta_0$ and $\theta_0$, and $u_t$ standard normally distributed. Throughout, parameter estimates were obtained taking $k$ to be the standard normal kernel. All results are based on 1000 replications. Tables 1-3 contain Monte Carlo biases $b_\delta$ and $b_\theta$ of $\hat\delta$ and $\hat\theta$ and corresponding
Monte Carlo standard deviations $s_\delta$ and $s_\theta$, for two $(\delta_0, \theta_0)$ choices each, the sets $r$ and $\Theta$ varying with these. In Table 1, $(\delta_0, \theta_0) = (5/8, 1/2)$ ($r = [.4, 1.1]$, $\Theta = [.1, .9]$) and $(\delta_0, \theta_0) = (7/8, 0)$ ($r = [.4, 1.2]$, $\Theta = [-.5, .5]$); in Table 2, $(\delta_0, \theta_0) = (1, 1/2)$ ($r = [.5, 1.5]$, $\Theta = [.1, .9]$) and $(\delta_0, \theta_0) = (1, 0)$ ($r = [.5, 1.5]$, $\Theta = [-.5, .5]$); in Table 3, $(\delta_0, \theta_0) = (13/8, 1/2)$ ($r = [.9, 2.1]$, $\Theta = [.1, .9]$) and $(\delta_0, \theta_0) = (9/8, 0)$ ($r = [.9, 1.5]$, $\Theta = [-.5, .5]$). All but one of these $\delta_0$, and none of these $r$, satisfy Assumption 3.
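The data generation behind Tables 1-3 can be reproduced in outline as follows. This is our own sketch: it builds $y_t$ by applying the inverse of the truncated filter to $g(t/T) + u_t$, with $g(x) = \sin(2\pi x)$ and $\phi(z;\theta) = 1 - \theta z$ as in the text; the function names and the series expansion of the inverse operator are ours.

```python
import numpy as np

def invert_filter_coeffs(delta0, theta0, n):
    """beta_j of Delta(z; delta0, theta0)**-1 = (1-z)**(-delta0) / (1 - theta0 z):
    convolve the fractional coefficients pi_j(-delta0) with theta0**j."""
    pi = np.empty(n)
    pi[0] = 1.0
    for j in range(1, n):
        pi[j] = pi[j - 1] * (j - 1 + delta0) / j      # pi_j(-delta0)
    ar = theta0 ** np.arange(n)                       # requires |theta0| < 1
    return np.array([pi[:j + 1].dot(ar[j::-1]) for j in range(n)])

def simulate(T, delta0, theta0, rng):
    """y from (9) with g(x) = sin(2 pi x) and phi(z; theta) = 1 - theta z:
    y_t = sum_{j < t} beta_j (g((t-j)/T) + u_{t-j}), u_t iid N(0, 1)."""
    u = rng.standard_normal(T)
    g = np.sin(2 * np.pi * np.arange(1, T + 1) / T)
    e = g + u                                         # right-hand side of (9)
    beta = invert_filter_coeffs(delta0, theta0, T)
    return np.array([beta[:t].dot(e[t - 1::-1]) for t in range(1, T + 1)])

y = simulate(250, 1.0, 0.5, np.random.default_rng(0))
```

Because both filters are truncated, applying $\Delta_t(L;\delta_0,\theta_0)$ to the simulated $y_t$ recovers $g(t/T) + u_t$ exactly, so the design is internally consistent.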
Note that in the cases $\theta_0 = 0$, one of which is included in each table, $y_t$ reduces to a FARIMA$(0, \delta_0, 0)$, but we suppose that the practitioner does not know this. Three different $(T, Th)$ combinations were employed: $(250, 20)$, $(600, 50)$ and $(1000, 80)$; these choices only partly represent an $h$ sequence obeying our assumptions, because though $Th$ increases with $T$, $1/h$ takes successive values 12.5, 12, 12.5. In the tables, the excessive biases for the smaller sample sizes whenever $\theta_0 \neq 0$ may partly be due to the overly large $r$, as well as the deleterious effect of the nonparametric estimation, and the estimation procedure having some difficulty in distinguishing the long and short memory effects. The biases when $\theta_0 = 0$ are much smaller, and generally both biases and standard deviations diminish with increasing $T$, while there is a high stability across corresponding elements of the tables, especially Tables 1 and 2.

Table 1: Bias and standard deviation of $(\hat\delta, \hat\theta)$, $(\delta_0, \theta_0) = (5/8, 1/2), (7/8, 0)$.

            (5/8, 1/2)                              (7/8, 0)
  T       b_delta  s_delta  b_theta  s_theta    b_delta  s_delta  b_theta  s_theta
  250     0.3770   0.0537   -0.3081  0.0828     0.0422   0.0655   -0.0264  0.0952
  600     0.1905   0.0436   -0.1719  0.0672     0.0272   0.0396   -0.0184  0.0579
  1000    0.0669   0.0424   -0.0624  0.0581     0.0175   0.0327   -0.0131  0.0466
Table 2: Bias and standard deviation of $(\hat\delta, \hat\theta)$, $(\delta_0, \theta_0) = (1, 1/2), (1, 0)$.

            (1, 1/2)                                (1, 0)
  T       b_delta  s_delta  b_theta  s_theta    b_delta  s_delta  b_theta  s_theta
  250     0.3713   0.0555   -0.3000  0.0841     0.0380   0.0609   -0.0226  0.0911
  600     0.1925   0.0437   -0.1733  0.0668     0.0252   0.0396   -0.0187  0.0603
  1000    0.0674   0.0443   -0.0632  0.0594     0.0159   0.0317   -0.0136  0.0460
Table 3: Bias and standard deviation of $(\hat\delta, \hat\theta)$, $(\delta_0, \theta_0) = (13/8, 1/2), (9/8, 0)$.

            (13/8, 1/2)                             (9/8, 0)
  T       b_delta  s_delta  b_theta  s_theta    b_delta  s_delta  b_theta  s_theta
  250     0.2666   0.0488   -0.1790  0.0801     0.0376   0.0636   -0.0178  0.0946
  600     0.1853   0.0485   -0.1667  0.0735     0.0251   0.0393   -0.0171  0.0597
  1000    0.0612   0.0457   -0.0568  0.0592     0.0170   0.0315   -0.0126  0.0461
Table 4 contains Monte Carlo sizes and powers for the LR unit root test described in Section 3, based on nominal 1% and 5% levels. Sizes were obtained using $(\delta_0, \theta_0) = (1, 1/2)$ and $(T, Th) = (250, 12)$, $(600, 25)$, $(1000, 52)$; powers using $(\delta_0, \theta_0) = (15/16, 1/2)$ and $(17/16, 1/2)$, with $(T, Th) = (250, 20)$, $(600, 50)$, $(1000, 80)$. Considering the serious biases found in Tables 1-3 when $\theta_0 = 1/2$, the sizes in Table 4 do not seem bad, and they do improve with increasing $T$. Only one alternative in either direction from the unit root null is considered, but given that these are both close to 1, the differences between powers and corresponding sizes seem quite satisfactory, with slightly the greater sensitivity when $\delta_0 = 15/16$, and again there is improvement as $T$ increases.
Table 4: Sizes and powers at nominal 1% and 5% levels.

  H0: Size, $(\delta_0, \theta_0) = (1, 1/2)$
  T       1%       5%
  250     0.019    0.068
  600     0.014    0.061
  1000    0.012    0.045

  H1: Power, $(\delta_0, \theta_0) = (15/16, 1/2)$ and $(17/16, 1/2)$
            (15/16, 1/2)        (17/16, 1/2)
  T       1%       5%         1%       5%
  250     0.133    0.284      0.102    0.215
  600     0.303    0.562      0.211    0.439
  1000    0.504    0.725      0.479    0.708
6. FINAL REMARKS

The paper has justified large sample inference on the fractional and short memory parameters and nonparametric regression function in a semiparametric model incorporating nonstationary stochastic and deterministic trends. For parametric inference, the restrictions on the admissible memory parameter interval and the range of bandwidths are relatively strong, due to the presence of the nonparametric function and the extent of the time series dependence; possibly one or both of these restrictions could be relaxed by means of higher-order kernels, as have been used elsewhere in the semiparametric econometric literature.

A variety of further avenues might be explored. As always when nonparametric estimation is involved, bandwidth choice is a practical issue, though possibly a less acute one than in the stochastic design setting, in which the density of explanatory variables varies over their support. In our Monte Carlo study only one value of $h$ was used for each $T$, but sensitivity of estimates and tests to $h$, and $b$, can be gauged by carrying out the computations over a range of choices. With respect to automatic rules, in the model (1) a cross-validation choice of $b$ is known to minimize mean integrated squared error (MISE), and we can extend this property to our setting, using $(\hat\delta, \hat\theta)$, though as usual the minimum-MISE rate does not quite satisfy the conditions (our Assumption 9) for asymptotic normality about $g$. For $h$, as is familiar in the semiparametric literature, the minimum-MISE rate is clearly excluded by the conditions (our Assumption 8) for asymptotic normality of parameter estimates, and a more appropriate goal may be to make a selection that matches the orders of the next two terms after the normal distribution function in an Edgeworth expansion for the distribution function of $(\hat\delta, \hat\theta)$, thereby minimizing the departure from the normal limit and leading to better-sized tests and more accurate interval estimates; in some settings this problem has a neat solution, but we do not know whether this is the case in ours. Bootstrapping is also likely to improve finite-sample properties.
Inference issues that might be investigated include testing constancy or other parametric restrictions on $g(x)$. Possible model extensions that require non-trivial further work include adding a nonparametric function of explanatory variables to $g(t/T)$ in (9), and allowing for unconditional or conditional heteroscedasticity in $u_t$. Our work might also be extended to a panel setting, including individual effects and possible cross-sectional dependence.

ACKNOWLEDGMENTS

The first author acknowledges the Australian Research Council Discovery Grants Program for its support under Grant number DP1096374. The second author's research was supported by ESRC Grant ES/J007242/1, and he thanks the Department of Econometrics and Business Statistics at Monash University for hosting and supporting his visits in March 2011 and September 2012. Thanks also go to Dr Jiying Yin for computing assistance.

APPENDIX A
Proof of Theorem 1. The model (9) refers to $y_t$, $u_t$ and $g(t/T)$ only for $t \geq 1$, so we can set $y_t = u_t = 0$, $t \leq 0$, and $g(x) = 0$, $x < 0$. Then for $t = 1, \ldots, T$,

    $\Delta_t(L; \delta, \theta)\, y_t = \Delta(L; \delta, \theta)\, y_t$

and $y_t = \Delta(L; \delta_0, \theta_0)^{-1}\left(g(t/T) + u_t\right)$, so

    $\Delta_t(L; \delta, \theta)\, y_t = \Delta(L; \delta, \theta)\,\Delta(L; \delta_0, \theta_0)^{-1}\left(g(t/T) + u_t\right) = (1 - L)^{\delta - \delta_0}\,\psi(L; \theta)\left(g(t/T) + u_t\right)$,

where

    $\psi(z; \theta) = \phi(z; \theta)\,\phi(z; \theta_0)^{-1} = \sum_{j=0}^{\infty} \psi_j(\theta) z^j$.

From Zygmund (1977, p. 46), Assumption 2 implies that the Fourier coefficients $\psi_j(\theta)$ satisfy

    $\sup_\theta |\psi_j(\theta)| = O(j^{-1-\varsigma})$.    (A.1)

The Fourier coefficients $\gamma_j(\delta, \theta)$ of

    $(1 - z)^{\delta - \delta_0}\,\psi(z; \theta) = \sum_{j=0}^{\infty} \gamma_j(\delta, \theta) z^j$

are given by

    $\gamma_j(\delta, \theta) = \sum_{l=0}^{j} \psi_l(\theta)\,\pi_{j-l}(\delta - \delta_0)$.

(Note that $\pi_j(0) = \psi_j(\theta_0) = 0$, $j \geq 1$, so that $\gamma_j(\delta_0, \theta_0) = 0$, $j \geq 1$, while $\gamma_0(\delta, \theta) \equiv 1$.) From (6), uniformly in $\delta \in [r_1, r_2] \setminus \{\delta_0\}$,

    $\pi_j(\delta - \delta_0) = O(j^{\delta - \delta_0 - 1})$, as $j \to \infty$,    (A.2)

and so, using also (A.1),

    $|\gamma_j(\delta, \theta)| \leq \sum_{l=0}^{[j/2]} |\psi_l(\theta)||\pi_{j-l}(\delta - \delta_0)| + \sum_{l=[j/2]+1}^{j} |\psi_l(\theta)||\pi_{j-l}(\delta - \delta_0)|$
    $\leq K j^{\delta - \delta_0 - 1}\sum_{l=0}^{\infty}|\psi_l(\theta)| + K j^{-1-\varsigma}\sum_{l=0}^{j}|\pi_l(\delta - \delta_0)| \leq K j^{\delta - \delta_0 - 1}$    (A.3)

uniformly in $\delta$, $\theta$, where $K$ throughout denotes a generic finite, positive constant. Also for future use note that from (6), uniformly in $\delta \in [r_1, r_2] \setminus \{\delta_0\}$, $\theta \in \Theta$,

    $|\pi_j(\delta - \delta_0) - \pi_{j+1}(\delta - \delta_0)| = O(j^{\delta - \delta_0 - 2})$, as $j \to \infty$,    (A.4)

and

    $|\gamma_j(\delta, \theta) - \gamma_{j+1}(\delta, \theta)| \leq \sum_{l=0}^{j} |\psi_l(\theta)||\pi_{j-l}(\delta - \delta_0) - \pi_{j+1-l}(\delta - \delta_0)| + |\psi_{j+1}(\theta)|$
    $\leq K j^{\delta - \delta_0 - 2}\sum_{l=0}^{\infty}|\psi_l(\theta)| + K j^{-1-\varsigma} \leq K j^{\max(\delta - \delta_0,\, 1 - \varsigma) - 2}$.    (A.5)

With the abbreviations

    $\Lambda_t = \sum_{j=0}^{t-1} \gamma_j(\delta, \theta) L^j$,  $g_t = g(t/T)$,  $\bar k_t = \frac{1}{Th}\sum_{s=1}^{T} k_{ts}$,

we have from (11)

    $\hat u_t(\delta, \theta) = \frac{1}{Th}\sum_{s=1}^{T}\left(\Lambda_t g_t - \Lambda_s g_s\right) k_{ts}/\bar k_t + \frac{1}{Th}\sum_{s=1}^{T}\left(\Lambda_t u_t - \Lambda_s u_s\right) k_{ts}/\bar k_t = \Lambda_t u_t + D_t - S_t$,

where

    $D_t = \frac{1}{Th}\sum_{s=1}^{T}\left(\Lambda_t g_t - \Lambda_s g_s\right) k_{ts}/\bar k_t$,  $S_t = \frac{1}{Th}\sum_{s=1}^{T} \Lambda_s u_s\, k_{ts}/\bar k_t$

are respectively the deterministic and stochastic errors contributing to the residual, that are absent when $g(t/T) \equiv 0$ in (9). Thus

    $Q(\delta, \theta) = \frac{1}{T}\sum_{t=1}^{T}\left(\Lambda_t u_t + D_t - S_t\right)^2$
    $= \frac{1}{T}\sum_{t=1}^{T}(\Lambda_t u_t)^2 + \frac{1}{T}\sum_{t=1}^{T} D_t^2 + \frac{1}{T}\sum_{t=1}^{T} S_t^2 + \frac{2}{T}\sum_{t=1}^{T}(\Lambda_t u_t) D_t - \frac{2}{T}\sum_{t=1}^{T}(\Lambda_t u_t) S_t - \frac{2}{T}\sum_{t=1}^{T} D_t S_t$.    (A.6)

Hualde and Robinson (2011) show that the estimates minimizing

    $\frac{1}{T}\sum_{t=1}^{T}(\Lambda_t u_t)^2$    (A.7)

are consistent for $\delta_0$, $\theta_0$. From their proof it suffices to show that as $T \to \infty$,

    $\sup \frac{1}{T}\sum_{t=1}^{T} D_t^2 \to 0$,    (A.8)
    $\sup \frac{1}{T}\sum_{t=1}^{T} S_t^2 \to_p 0$,    (A.9)
    $\sup \left|\frac{1}{T}\sum_{t=1}^{T}(\Lambda_t u_t) D_t\right| \to_p 0$,    (A.10)
    $\sup \left|\frac{1}{T}\sum_{t=1}^{T}(\Lambda_t u_t) S_t\right| \to_p 0$,    (A.11)
    $\sup \left|\frac{1}{T}\sum_{t=1}^{T} D_t S_t\right| \to_p 0$,    (A.12)

where the suprema here and subsequently are over $\delta \in [r_1, r_2]$, $\theta \in \Theta$. Given (A.8) and (A.9), and using the Cauchy inequality, (A.10)-(A.12) follow from the fact, implied by the proof of Hualde and Robinson (2011), that (A.7) is uniformly $O_p(1)$.

To prove (A.8), note first that Lemma 3 of Robinson (2012) gives, for all sufficiently large $T$,

    $\inf_t |\bar k_t| \geq \frac{1}{8}$.    (A.13)

Suppressing reference to $\delta$, $\theta$ in $\gamma_j = \gamma_j(\delta, \theta)$,

    $\sum_{s=1}^{T}\left(\Lambda_t g_t - \Lambda_s g_s\right) k_{ts} = \sum_{s=1}^{T}\left(\sum_{j=0}^{t-1}\gamma_j g_{t-j} - \sum_{j=0}^{s-1}\gamma_j g_{s-j}\right) k_{ts} = \sum_{j=0}^{T-1}\gamma_j\sum_{s=1}^{T}\left(g_{t-j} - g_{s-j}\right) k_{ts}$,    (A.14)

with the convention already adopted that $g(x) = 0$, $x < 0$, and $g(0) = 0$ from Assumption 4. Then

    $\sup\left|\sum_{s=1}^{T}\left(\Lambda_t g_t - \Lambda_s g_s\right) k_{ts}\right| \leq \sum_{j=0}^{T-1}\sup|\gamma_j|\left|\sum_{s=1}^{T}\left(g_{t-j} - g_{s-j}\right) k_{ts}\right|$.    (A.15)

From (A.3) and Assumption 3,

    $\sup|\gamma_j| \leq K j^{r_2 - r_1 - 1}$.    (A.16)

Applying Assumption 4, and with $\dot g_t$ denoting the derivative of $g(x)$ at $x = t/T$,

    $\sum_{s=1}^{T}\left(g_{t-j} - g_{s-j}\right) k_{ts} = \dot g_{t-j}\sum_{s=1}^{T}\left(\frac{t - s}{T}\right) k_{ts} + O\!\left(\sum_{s=1}^{T}\left(\frac{t - s}{T}\right)^2 |k_{ts}|\right)$,    (A.17)

where $\dot g_t = 0$, $t \leq 0$. By Lemma 2 of Robinson (2012),

    $\sum_{s=1}^{T}\left(\frac{t - s}{T}\right) k_{ts} = Th^2\left(\frac{1}{Th}\sum_{s=1}^{T}\left(\frac{t - s}{Th}\right) k_{ts}\right) = Th^2\left(\int_{\mathbb{R}} u\, k(u)\, du + O\!\left(\frac{1}{Th}\right)\right) = O(h)$    (A.18)

uniformly in $t \in (Th, T - Th)$. Uniformly in $t \leq Th$, $t \geq T - Th$,

    $\sum_{s=1}^{T}\left(\frac{t - s}{T}\right) k_{ts} = O(Th^2)$    (A.19)

from Lemma 1 of Robinson (2012). By the same lemma,

    $\sum_{s=1}^{T}\left(\frac{t - s}{T}\right)^2 |k_{ts}| = O(Th^3)$    (A.20)

uniformly in $t$. Thus from (A.17)-(A.20),

    $\max_j\left|\sum_{s=1}^{T}\left(g_{t-j} - g_{s-j}\right) k_{ts}\right| = O(h + Th^3)$, $t \in (Th, T - Th)$;  $= O(Th^2)$, $t \leq Th$, $t \geq T - Th$,

uniformly. Using also (A.13), (A.15) and (A.16),

    $\sup|D_t| \leq K(Th)^{-1}(h + Th^3)\sum_{j=0}^{T-1}(1 + j)^{r_2 - r_1 - 1} \leq K(T^{r_2 - r_1 - 1} + T^{r_2 - r_1} h^2)$, $t \in (Th, T - Th)$,    (A.21)

and

    $\sup|D_t| \leq K(Th)^{-1}\, Th^2\sum_{j=0}^{T-1}(1 + j)^{r_2 - r_1 - 1} \leq K T^{r_2 - r_1} h$, $t \leq Th$, $t \geq T - Th$,

uniformly over the stated ranges of $t$. Thus

    $\sup \frac{1}{T}\sum_{t=1}^{T} D_t^2 \leq K\left(T^{2(r_2 - r_1 - 1)} + T^{2(r_2 - r_1)} h^4 + T^{2(r_2 - r_1)} h^3\right) \to 0$

by Assumption 6, verifying (A.8).

To prove (A.9), we have

    $\sum_{s=1}^{T}\Lambda_s u_s k_{ts} = \sum_{j=0}^{T-1}\gamma_j c_{tj} = \sum_{j=0}^{[Th]}\gamma_j c_{tj} + \sum_{j=[Th]+1}^{T-1}\gamma_j c_{tj}$,

where

    $c_{tj} = \sum_{r=1}^{T-j} u_r k_{t, r+j}$,

so, using (A.3),

    $\sup\left|\sum_{j=0}^{[Th]}\gamma_j c_{tj}\right| \leq K\sum_{j=0}^{[Th]}(1 + j)^{r_2 - r_1 - 1}|c_{tj}|$,

and thus

    $E\left(\sup\left|\sum_{j=0}^{[Th]}\gamma_j c_{tj}\right|\right)^2 \leq K\sum_{j=1}^{[Th]}\sum_{l=1}^{[Th]} j^{r_2 - r_1 - 1} l^{r_2 - r_1 - 1}\left(E c_{tj}^2\, E c_{tl}^2\right)^{1/2}$.    (A.22)

Now

    $E c_{tj}^2 = \sigma^2\sum_{r=1}^{T-j} k_{t, r+j}^2 = O(Th)$

by Assumption 5, so (A.22) is $O\!\left((Th)^{2(r_2 - r_1) + 1}\right) = o\!\left((Th)^2\right)$ uniformly in $t$, by Assumption 3.

By summation by parts,

    $\sum_{j=[Th]+1}^{T-1}\gamma_j c_{tj} = \sum_{j=[Th]+1}^{T-2}\left(\gamma_j - \gamma_{j+1}\right) d_{tj} + \gamma_{T-1} d_{t, T-1}$,    (A.23)

where

    $d_{tj} = \sum_{l=0}^{j} c_{tl}$.

Now (A.23) is bounded uniformly by

    $\sup\sum_{j=[Th]+1}^{T-2}|\gamma_j - \gamma_{j+1}||d_{tj}| + \sup|\gamma_{T-1}||d_{t, T-1}| \leq K\sum_{j=[Th]+1}^{T-2} j^{\varrho - 2}|d_{tj}| + K T^{\varrho - 1}|d_{t, T-1}|$,    (A.24)

using (A.3) and (A.5) and writing $\varrho = \max(r_2 - r_1,\, 1 - \varsigma)$. Rearranging,

    $d_{tj} = \sum_{r=1}^{T} u_r\left(\sum_{s=r}^{\min(r+j, T)} k_{ts}\right)$,

so

    $E d_{tj}^2 = \sigma^2\sum_{r=1}^{T}\left(\sum_{s=r}^{\min(r+j, T)} k_{ts}\right)^2 \leq K j (Th)^2$,

and (A.24) has second moment bounded by

    $K(Th)^2\sum_{j=[Th]+1}^{T-2}\sum_{l=[Th]+1}^{T-2} j^{\varrho - 3/2}\, l^{\varrho - 3/2} + K(Th)^2 T^{2\varrho - 1} \leq K\left((Th)^{2\varrho + 1} + (Th)^2 T^{2\varrho - 1}\right)$

uniformly in $t$, since