Inference on Nonstationary Time Series with Moving Mean

ISSN 1440-771X

Department of Econometrics and Business Statistics, Monash University, Australia
http://www.buseco.monash.edu.au/depts/ebs/pubs/wpapers/

Working Paper 15/13

Jiti Gao and Peter M. Robinson*
Monash University and London School of Economics

July 23, 2013

Abstract

A semiparametric model is proposed in which a parametric filtering of a nonstationary time series, incorporating fractional differencing with short memory correction, removes correlation but leaves a nonparametric deterministic trend. Estimates of the memory parameter and other dependence parameters are proposed, and shown to be consistent and asymptotically normally distributed with parametric rate. Unit root tests with standard asymptotics are thereby justified. Estimation of the trend function is also considered. We include a Monte Carlo study of finite-sample performance.

Keywords: fractional time series; fixed design nonparametric regression; nonstationary time series; unit root tests.

JEL Classifications: C14, C22.

Proposed running head: Nonstationary Time Series

*Corresponding author. E-mail: [email protected]


1 INTRODUCTION

A long-established vehicle for smoothing a deterministically-trending time series $y_t$, $t = 1, \ldots, T$, is the fixed-design nonparametric regression model given by

$$y_t = g\left(\tfrac{t}{T}\right) + u_t, \quad t = 1, \ldots, T, \qquad (1)$$

where $g(x)$, $x \in [0, 1]$, is an unknown, smooth, nonparametric function, and $u_t$ is an unobservable sequence of random variables with zero mean. The dependence on the sample size $T$ of $g(t/T)$ in (1) is to ensure sufficient accumulation of information to enable consistent estimation of $g(x)$ at any $x \in (0, 1)$. A more basic trend function is a polynomial in $t$ of given degree, as still frequently employed in various econometric models. A more general class of models than polynomials (and having analogy with the fractional stochastic trends we will employ in the current paper) involves fractional powers, i.e.

$$y_t = \beta_0 + \beta_1 t^{\alpha_1} + \cdots + \beta_p t^{\alpha_p} + u_t, \quad t = 1, \ldots, T, \qquad (2)$$

where all the $\beta_i$ and $\alpha_i$ are unknown and real-valued. Subject to identifiability and other restrictions, these parameters can be estimated consistently and asymptotically normally, e.g. by nonlinear least squares (Robinson (2012a)). Models such as (2) can be especially useful in modest sample sizes. However, as with any parametric function of $t$, misspecification leads to inconsistent estimation, and a nonparametric treatment affords greater flexibility when $T$ is large (recognizing that nonparametric estimates converge more slowly than parametric ones). With independent and identically distributed (iid) $u_t$ with finite variance, various kernel-type estimates of $g$ in (1) were developed by Gasser and Mueller (1979) and Priestley and Chao (1972), with statistical properties established; in particular, under regularity conditions kernel estimates of $g(x)$ are consistent and asymptotically normally distributed as $T \to \infty$ (see e.g. Benedetti (1977)).

A suitable choice of kernel (and bandwidth) is an important ingredient in this theory, although kernel estimates are essentially an elaboration on simple moving window averages, which have a much longer history in empirical work. More recent empirical uses of (1) include Starica and Granger (2005) in modelling stock price series. The iid assumption on $u_t$ is very restrictive, but similar asymptotic properties result when $u_t$ has weak dependence, for example is a covariance stationary process, generated by a linear process or satisfying suitable mixing conditions, and having finite and positive spectral density at frequency zero (see e.g. Roussas, Tran and Ioannides (1992), Tran, Roussas, Yakowitz and Truong Van (1996)). The rate of convergence of kernel estimates is unaffected by this level of serial correlation, though the asymptotic variance differs from that in the iid case (unlike in the stochastic-design model in which the argument of $g$ in (1) is instead a weakly dependent stationary stochastic sequence). Long-range dependence in $u_t$ has a greater impact on large-sample inference. If $u_t$ is a stationary and invertible fractional process, for example

$$(1 - L)^{\gamma_0} u_t = \varepsilon_t, \quad |\gamma_0| < 1/2, \qquad (3)$$

$L$ being the lag operator and the $\varepsilon_t$ forming an iid sequence, or if $u_t$ has a "semiparametric" specification with spectral density $f(\lambda)$ having rate $\lambda^{-2\gamma_0}$ as frequency $\lambda$ approaches zero from above, then the convergence rate of kernel estimates of $g(x)$ is slower when $\gamma_0 > 0$ and faster when $\gamma_0 < 0$. References dealing with (1) for such $u_t$ include Beran and Feng (2002), Csorgo and Mielniczuk (1995), Deo (1997), Guo and Koul (2007), Robinson (1997), Zhao, Zhang and Li (2013). The asymptotic variance of the kernel estimates depends on $\gamma_0$ and any other time series parameters; for the "semiparametric" specification Robinson (1997) justified studentization using local Whittle estimates of $\gamma_0$.

The restriction $|\gamma_0| < 1/2$ implies stationarity of $u_t$, so that $y_t$ given by (1) is nonstationary only in the mean. Stochastic trends are also an important source of nonstationarity in many empirical time series. However, a nonstationary stochastic trend in $y_t$ generated by a nonstationary $u_t$, for example one having a unit root, would render $g(t/T)$ undetectable.

An alternative, semiparametric, model which both incorporates a possibly nonstationary stochastic trend and enables estimation of a nonparametric deterministic trend is

$$\Delta_t^{\delta_0} y_t = g\left(\tfrac{t}{T}\right) + u_t, \quad t = 1, \ldots, T, \qquad (4)$$

where $u_t$ is a sequence of uncorrelated, homoscedastic random variables and, for any real $\delta$, $\Delta_t^{\delta}$ is the truncated fractional differencing operator

$$\Delta_t^{\delta} = \sum_{j=0}^{t-1} \pi_j(\delta) L^j, \quad t \ge 1, \qquad (5)$$

the $\pi_j(\delta)$ being coefficients in the (possibly formal) expansion

$$(1 - z)^{\delta} = \sum_{j=0}^{\infty} \pi_j(\delta) z^j,$$

namely

$$\pi_j(\delta) = \frac{\Gamma(j - \delta)}{\Gamma(-\delta)\,\Gamma(j + 1)}. \qquad (6)$$

The truncation in (5) reflects non-observability of $y_t$ when $t \le 0$, and avoids explosion of the moving average representation of (4) when $\delta_0 \ge 1/2$, the nonstationary region for $\delta_0$; it is this region with which we will be concerned.
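As a concrete illustration of (5)-(6) (ours, not part of the original text), the coefficients $\pi_j(\delta)$ satisfy the ratio recursion $\pi_0(\delta) = 1$, $\pi_j(\delta) = \pi_{j-1}(\delta)(j - 1 - \delta)/j$, which follows from (6) and avoids gamma-function evaluation. A minimal sketch of the truncated operator; the function names are our own:

```python
def frac_coeffs(delta, n):
    """Coefficients pi_j(delta) of (1 - z)^delta, j = 0, ..., n-1,
    via the recursion pi_j = pi_{j-1} * (j - 1 - delta) / j."""
    pi = [1.0]
    for j in range(1, n):
        pi.append(pi[-1] * (j - 1 - delta) / j)
    return pi

def truncated_frac_diff(y, delta):
    """Apply the truncated operator of (5):
    (Delta_t^delta y)_t = sum_{j=0}^{t-1} pi_j(delta) y_{t-j},
    where position 0 of the list holds y_1."""
    pi = frac_coeffs(delta, len(y))
    return [sum(pi[j] * y[t - j] for j in range(t + 1))
            for t in range(len(y))]

# delta = 1 reproduces first differencing with the pre-sample value set to 0
print(truncated_frac_diff([1.0, 3.0, 6.0], 1.0))  # -> [1.0, 2.0, 3.0]
```

With $\delta = 1$ the operator reduces to first differencing under the truncation convention above, consistent with the unit root case discussed next.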

One such $\delta_0$ has assumed wide empirical importance in connection with a variety of econometric models, the unit root case $\delta_0 = 1$, when (4) becomes

$$(1 - L) y_t = g\left(\tfrac{t}{T}\right) + u_t, \quad t = 1, \ldots, T. \qquad (7)$$

The bulk of the econometric literature nests the unit root in autoregressive structures, which suggests treating (7) as a special case of

$$(1 - \rho L) y_t = g\left(\tfrac{t}{T}\right) + u_t, \quad t = 1, \ldots, T, \qquad (8)$$

rather than (4). The autoregressive unit root literature suggests that estimates of $\rho$ in (8) will have a nonstandard limit distribution under (7), but a normal one in the "stationary" region $|\rho| < 1$. By contrast we can anticipate, for example from literature concerning (4) with $g(x)$ a priori constant, that estimates of $\delta_0$, such as ones optimizing an approximate pseudo-Gaussian likelihood, and Wald and other test statistics, will enjoy standard asymptotics, with the usual parametric convergence rate $\sqrt{T}$, whatever the value of $\delta_0$, due essentially to smoothness properties of the fractional operator; tests are also expected to have the classical local efficiency properties. While (4) cannot, unlike (8), describe "explosive" behaviour (occurring when $|\rho| > 1$), it describes a continuum of stochastic trends indexed by $\delta_0$. A consequence of the $T$-dependence in $g(t/T)$ is that the left side of (4) is also $T$-dependent, so the $y_t = y_{tT}$ in fact form a triangular array, but in common with the bulk of literature concerning versions of (1) we suppress reference to $T$. The model (4) (which nests (1) with iid $u_t$ on taking $\delta_0 = 0$) supposes that the fractional filtering of $y_t$ successfully eliminates correlation, but possibly leaves a trend which we are not prepared to parameterize. To provide greater generality than (4), the paper in fact considers the extended model

$$\Delta_t(L; \delta_0, \theta_0)\, y_t = g\left(\tfrac{t}{T}\right) + u_t, \quad t = 1, \ldots, T, \qquad (9)$$

where $\theta_0$ is an unknown $p$-dimensional column vector and

$$\Delta_t(z; \delta, \theta) = \sum_{j=0}^{t-1} \phi_j(\delta, \theta)\, z^j, \quad t \ge 1,$$

where the $\phi_j(\delta, \theta)$ are coefficients in the possibly formal expansion

$$\Phi(z; \delta, \theta) = \sum_{j=0}^{\infty} \phi_j(\delta, \theta)\, z^j,$$

such that

$$\Phi(z; \delta, \theta) = (1 - z)^{\delta}\, \varphi(z; \theta),$$

where

$$\varphi(z; \theta) = \sum_{j=0}^{\infty} \varphi_j(\theta)\, z^j$$

is a known function of $z$ and $\theta$ that is at least continuous, and nonzero for $z$ on or inside the unit circle in the complex plane. When $\varphi(z; \theta) \equiv 1$, we have $\Phi(z; \delta, \theta) = (1 - z)^{\delta}$. Leading examples of $\varphi(z; \theta)$ are stationary and invertible autoregressive moving average operators of known degree, for example the first order autoregressive operator $\varphi(z; \theta) = 1 - \theta z$, with $\theta$ here a scalar such that $|\theta| < 1$. In general $\varphi$ leaves the essential memory or degree of nonstationarity $\delta_0$ unchanged but allows otherwise richer dependence structure.

It would be possible to consider in effect a nonparametric $\varphi(z)$, satisfying smoothness assumptions only near $z = 1$, and hence a "semiparametric" operator on $y_t$. This would lead to an estimate of $\delta_0$ with only a nonparametric convergence rate. However, establishing the parametric, $\sqrt{T}$, rate for estimating $\delta_0$ and $\theta_0$ seems actually more challenging and delicate, because of the presence of the nonparametric $g(t/T)$ in (9), estimates of which converge more slowly than $\sqrt{T}$. In particular, proving consistency requires establishing that certain (stochastic and deterministic) contributions to residuals, whose squares make up the objective function minimized by the parameter estimates, are negligible uniformly over the parameter space; these contributions are of larger order than would be the case with a parametric trend (and this fact also explains why we find ourselves unable to choose the parameter space for $\delta_0$ as large as is possible with a parametric trend). Then, corresponding contributions to scores evaluated at $\delta_0$, $\theta_0$ are also of larger order than in the parametric trend case and have to be shown to be negligible after being normalized by $\sqrt{T}$, rather than by a slower, nonparametric, rate, in order to prove asymptotic normality of the parameter estimates with $\sqrt{T}$ rate. Of course, the strong dependence in $y_t$ also impacts on the conditions, due to non-summability of certain weight sequences.
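To fix ideas, for the leading first-order autoregressive example $\varphi(z; \theta) = 1 - \theta z$ the coefficients $\phi_j(\delta, \theta)$ of $\Phi(z; \delta, \theta) = (1 - z)^{\delta} \varphi(z; \theta)$ are the convolution of the fractional coefficients with $(1, -\theta)$. The following sketch is our own illustration (function names are not from the paper):

```python
def frac_coeffs(delta, n):
    # pi_j(delta) of (1 - z)^delta via the ratio recursion
    pi = [1.0]
    for j in range(1, n):
        pi.append(pi[-1] * (j - 1 - delta) / j)
    return pi

def composite_coeffs(delta, theta, n):
    """phi_j(delta, theta) of (1 - z)^delta (1 - theta z):
    convolution of pi(delta) with the AR(1) operator (1, -theta)."""
    pi = frac_coeffs(delta, n)
    return [pi[j] - (theta * pi[j - 1] if j >= 1 else 0.0)
            for j in range(n)]

# delta = 1, theta = 0.5: (1 - z)(1 - 0.5 z) = 1 - 1.5 z + 0.5 z^2
print(composite_coeffs(1.0, 0.5, 3))  # -> [1.0, -1.5, 0.5]
```

The same convolution logic extends to any ARMA $\varphi(z; \theta)$ of known degree, which is all the generality (9) requires.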


The following section proposes estimates of $\delta_0$ and $\theta_0$, and establishes their consistency and asymptotic normality, the proofs appearing in Appendices A and B. Section 3 develops unit root tests based on Wald, pseudo log likelihood ratio and Lagrange multiplier principles. Section 4 proposes estimates of $g(x)$ and establishes their asymptotic properties. A small Monte Carlo study of finite-sample performance is contained in Section 5. Section 6 concludes by describing further issues that might be considered.

2. ESTIMATION OF DEPENDENCE PARAMETERS

Were $g(x) \equiv 0$ in (9) a priori, a natural method of estimating $\delta_0$ and $\theta_0$ would be conditional-sum-of-squares, which approximates Gaussian pseudo-maximum-likelihood estimation. We modify this method by employing residuals, which requires preliminary estimation of $g(x)$. Note that under the conditions imposed below, $g(t/T) - g((t-1)/T) = O(T^{-1})$, $2 \le t \le T$, so we could instead consider proceeding by first differencing in (9), as this would effectively eliminate the deterministic trend; however this also induces a moving average unit root on the right hand side.

Let $k(x)$, $x \in \mathbb{R}$, be a user-chosen kernel function and $h$ a user-chosen positive bandwidth number. For any $\delta$, $\theta$, write $\Delta_t(L) = \Delta_t(L; \delta, \theta)$ and introduce

$$\hat g_{\delta\theta}(x) = \sum_{s=1}^{T} \Delta_s(L) y_s\, k\!\left(\frac{x - s/T}{h}\right) \Bigg/ \sum_{s=1}^{T} k\!\left(\frac{x - s/T}{h}\right), \qquad (10)$$

for any $x \in [0, 1]$. The corresponding estimate of Priestley and Chao (1972) type replaces the denominator by $Th$, but we prefer to use weights (of the $\Delta_s(L) y_s$) that exactly sum to 1 for all $x$. Define residuals

$$\hat u_t(\delta, \theta) = \Delta_t(L) y_t - \hat g_{\delta\theta}(t/T) = \sum_{s=1}^{T} \left( \Delta_t(L) y_t - \Delta_s(L) y_s \right) k_{ts} \Bigg/ \sum_{s=1}^{T} k_{ts}, \qquad (11)$$

where $k_{ts} = k((t - s)/(Th))$. Denote by $r_1$, $r_2$ chosen real numbers such that $r_1 < r_2$, write $r = [r_1, r_2]$, and let $\Theta$ be a suitably chosen compact subset of $\mathbb{R}^p$. We estimate $\delta_0$ and $\theta_0$ by

$$(\hat\delta, \hat\theta) = \arg\min_{\delta \in r,\ \theta \in \Theta} Q(\delta, \theta), \qquad (12)$$

where

$$Q(\delta, \theta) = \frac{1}{T} \sum_{t=1}^{T} \hat u_t^2(\delta, \theta). \qquad (13)$$
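A schematic implementation of (10)-(13) for the pure fractional case $\varphi(z; \theta) \equiv 1$, with a Gaussian kernel and a crude grid search in place of continuous optimization; all names and tuning choices here are illustrative assumptions of ours, not the authors' procedure:

```python
import math

def frac_coeffs(delta, n):
    # pi_j(delta) of (1 - z)^delta via the ratio recursion
    pi = [1.0]
    for j in range(1, n):
        pi.append(pi[-1] * (j - 1 - delta) / j)
    return pi

def filtered(y, delta):
    # truncated filter of (5) applied to y_1, ..., y_T
    pi = frac_coeffs(delta, len(y))
    return [sum(pi[j] * y[t - j] for j in range(t + 1))
            for t in range(len(y))]

def residuals(y, delta, h):
    """u_hat_t(delta) of (11): the filtered series minus its
    kernel-weighted mean, weights k_ts = k((t - s)/(T h)) summing to 1."""
    T = len(y)
    z = filtered(y, delta)
    k = lambda x: math.exp(-0.5 * x * x)  # unnormalized Gaussian kernel
    out = []
    for t in range(T):
        w = [k((t - s) / (T * h)) for s in range(T)]
        sw = sum(w)
        out.append(z[t] - sum(wi * zi for wi, zi in zip(w, z)) / sw)
    return out

def css_estimate(y, h, grid):
    # (12)-(13): minimize Q(delta) = (1/T) sum u_hat_t(delta)^2 over a grid
    def Q(d):
        r = residuals(y, d, h)
        return sum(v * v for v in r) / len(y)
    return min(grid, key=Q)
```

For a random walk plus smooth trend, `css_estimate` should select a grid point near $\delta = 1$ once $T$ is moderately large, since under- or over-differencing inflates the residual sum of squares.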

We first establish consistency of $(\hat\delta, \hat\theta)$ under the following regularity conditions.

Assumption 1
The $u_t$ are stationary and ergodic with finite fourth moment, and

$$E(u_t \mid \mathcal{F}_{t-1}) = 0, \quad E(u_t^2 \mid \mathcal{F}_{t-1}) = \sigma^2$$

almost surely, where $\mathcal{F}_t$ is the $\sigma$-field of events generated by $u_s$, $s \le t$, and conditional (on $\mathcal{F}_{t-1}$) 3rd and 4th moments of $u_t$ equal the corresponding unconditional moments.

Assumption 2
(i) $\theta_0 \in \Theta$;
(ii) $|\varphi(e^{i\lambda}; \theta)| \ne |\varphi(e^{i\lambda}; \theta_0)|$, for all $\theta \in \Theta$, $\theta \ne \theta_0$, on a $\lambda$-set of positive Lebesgue measure;
(iii) for all $\theta \in \Theta$, $\varphi(e^{i\lambda}; \theta)$ is differentiable in $\lambda$ with derivative in $\mathrm{Lip}(\varsigma)$, $\varsigma > 1/2$;
(iv) for all $\lambda$, $\varphi(e^{i\lambda}; \theta)$ is continuous in $\theta$;
(v) for all $\theta \in \Theta$, $|\varphi(z; \theta)| \ne 0$, $|z| \le 1$.

Assumption 1 is weaker than imposing independence and identity of distribution of $u_t$, and Assumption 2 is standard from the literature on parametric short memory models since Hannan (1973), ensuring identifiability of $\theta_0$ and easily covering stationary and invertible moving averages. These assumptions correspond to ones of Hualde and Robinson (2011), who established consistency of the same kind of estimates when $g(x) \equiv 0$ in (9) a priori. In that setting they were able to choose the set of admissible memory parameters (our $[r_1, r_2]$) arbitrarily large, to simultaneously cover stationary, nonstationary, invertible and non-invertible values. This is more difficult, and perhaps impossible, to achieve in the presence of the unknown, nonparametric $g$ in (9), which can only be estimated with a slow rate of convergence, and we impose:

Assumption 3

$$r_1 > 3/4, \quad r_2 < 5/4, \qquad (14)$$

$$\delta_0 \in r. \qquad (15)$$

As can be inferred from the proof of Theorem 1, strictly what is required instead of (14) is the weaker condition $r_2 - \delta_0 < 1/2$, but since $\delta_0$ is known from (15) only to be no less than $r_1$, the restriction $r_2 - r_1 < 1/2$ implied by (14) is appropriate. Inspection of our proofs indicates that they go through with $r$ in Assumption 3 replaced by $[\varkappa, \varkappa + \omega]$, for any real $\varkappa$ and for $\omega \in (0, 1/2)$ (for example a subset of the stationary and invertible region $(-1/2, 1/2)$); but for the sake of clarity we fix on (14), which seems among the more empirically realistic possibilities, and covers the unit root case $\delta_0 = 1$.

We also need conditions on $g$, $k$ and $h$.

Assumption 4
The function $g(x)$ is twice boundedly differentiable on $[0, 1]$ and $g(0) = 0$.

Assumption 5
The function $k(x)$ is even, differentiable at all but possibly finitely many $x$, with derivative $k'(x)$, and

$$\int_{\mathbb{R}} k(x)\,dx = 1, \quad k(x) = O\!\left((1 + |x|^{2+\eta})^{-1}\right), \quad k'(x) = O\!\left((1 + |x|^{1+\eta})^{-1}\right), \text{ some } \eta > 0.$$

Assumption 6
As $T \to \infty$, the positive-valued sequence $h = h_T$ satisfies

$$(Th)^{-1} + T^{2(r_2 - r_1)} h^3 \to 0. \qquad (16)$$

Assumption 5 is virtually costless, covering many of the usual kernel choices. Assumption 6, however, represents a trade-off with Assumption 3: in the latter, $r_2 - r_1$ is desirably as close to $1/2$ as possible, but as it approaches $1/2$ from below the range of $h$ satisfying Assumption 6 reduces to $(Th)^{-1} + Th^3 \to 0$.

Theorem 1
Let (9) and Assumptions 1-6 hold. Then as $T \to \infty$,

$$\hat\delta \to_p \delta_0, \quad \hat\theta \to_p \theta_0.$$

The proof is in Appendix A. Asymptotic normality requires two further assumptions.

Assumption 7
(i) $\delta_0 \in (r_1, r_2)$; $\theta_0$ is an interior point of $\Theta$;
(ii) for all real $\lambda$, $\varphi(e^{i\lambda}; \theta)$ is twice continuously differentiable in $\theta$ on a closed neighbourhood of radius less than $1/2$ about $\theta_0$;
(iii) the matrix

$$\Omega = \begin{pmatrix} \pi^2/6 & \sum_{j=1}^{\infty} \chi_j(\theta_0)'/j \\ \sum_{j=1}^{\infty} \chi_j(\theta_0)/j & \sum_{j=1}^{\infty} \chi_j(\theta_0)\chi_j(\theta_0)' \end{pmatrix}$$

is non-singular, where

$$\chi_j(\theta) = \sum_{k=0}^{j-1} \psi_k(\theta)\, \partial \varphi_{j-k}(\theta)/\partial\theta,$$

the $\psi_j(\theta)$ being coefficients in the expansion

$$\varphi(z; \theta)^{-1} = \sum_{j=0}^{\infty} \psi_j(\theta)\, z^j.$$

This condition again is based on one of Hualde and Robinson (2011), but is similar to others in the literature, and practically unrestrictive. However we have to strengthen the first component of Assumption 6 on $h$:

Assumption 8
As $T \to \infty$, $Th^2/(\log T)^2 \to \infty$.

Theorem 2
Let (9) and Assumptions 1-8 hold. Then as $T \to \infty$,

$$T^{1/2} \begin{pmatrix} \hat\delta - \delta_0 \\ \hat\theta - \theta_0 \end{pmatrix} \to_d N\!\left(0, \Omega^{-1}\right).$$

The proof is in Appendix B. Note that the same limit distribution results when $g$ is known or replaced by a parametric function. In the special case (4) of (9), we deduce that as $T \to \infty$,

$$T^{1/2}\left(\hat\delta - \delta_0\right) \to_d N\!\left(0, 6/\pi^2\right).$$

3. UNIT ROOT TESTING

We first establish Wald tests for $\delta_0 = 1$ in (9), based on Theorem 2. Define

$$\Omega(\theta) = \begin{pmatrix} \pi^2/6 & \sum_{j=1}^{\infty} \chi_j(\theta)'/j \\ \sum_{j=1}^{\infty} \chi_j(\theta)/j & \sum_{j=1}^{\infty} \chi_j(\theta)\chi_j(\theta)' \end{pmatrix}$$

and denote by $\hat\Omega^{(1,1)}$ the element in the top left hand corner of $\Omega(\hat\theta)^{-1}$. Put

$$W = T^{1/2}\left(\hat\delta - 1\right) \Big/ \left(\hat\Omega^{(1,1)}\right)^{1/2}.$$

Theorem 3
Let $\delta_0 = 1$ in (9) and let Assumptions 1-8 hold. Then as $T \to \infty$,

$$W \to_d N(0, 1).$$

The theorem follows from Theorem 2 and

$$\Omega(\hat\theta) \to_p \Omega, \qquad (17)$$

where the latter is implied by the proof of Theorem 2. We can reject the unit root null against more nonstationary alternatives when $W$ falls in the appropriate upper tail of the standard normal density, and reject against less nonstationary alternatives when it falls in the appropriate lower tail.

Pseudo-log likelihood ratio tests can also be constructed. Define

$$\tilde\theta = \arg\min_{\theta \in \Theta} Q(1, \theta), \qquad (18)$$

and

$$LR = T \log \frac{Q(1, \tilde\theta)}{Q(\hat\delta, \hat\theta)}.$$

Theorem 4
Let $\delta_0 = 1$ in (9) and let Assumptions 1-8 hold. Then as $T \to \infty$,

$$LR \to_d \chi_1^2.$$

The proof is standard, given Theorem 2 and a central limit theorem for $\tilde\theta$ (see e.g. Hannan (1973), or implied by Hualde and Robinson (2011)).

Though it of course does not use $(\hat\delta, \hat\theta)$, for completeness we also present a Lagrange multiplier-type test, as it and the Wald and pseudo-log likelihood tests are expected to have equal local power. Robinson (1994) developed Lagrange multiplier tests for unit root and other hypotheses against fractional alternatives for the disturbances in multiple linear regression models. The stress there was on frequency-domain tests, but starting from an initial time-domain statistic, and to avoid introducing considerable additional notation we stay in the time domain here. Writing $\partial = \partial/\partial(\delta, \theta')'$, from (13)

$$\partial Q(\delta, \theta) = \frac{2}{T} \sum_{t=1}^{T} \hat u_t(\delta, \theta)\, \partial \hat u_t(\delta, \theta), \qquad (19)$$

where

$$\partial \hat u_t(\delta, \theta) = \partial \Delta_t(L)\, y_t - \partial \hat g_{\delta\theta}(t/T) = \sum_{s=1}^{T} \left( \partial \Delta_t(L)\, y_t - \partial \Delta_s(L)\, y_s \right) k_{ts} \Bigg/ \sum_{s=1}^{T} k_{ts}, \qquad (20)$$

in which

$$\partial \Delta_t(z) = \sum_{j=0}^{t-1} \partial \phi_j(\delta, \theta)\, z^j, \quad \partial \phi_j(\delta, \theta) = \sum_{l=0}^{j} \begin{pmatrix} \varphi_l(\theta)\, \partial \pi_{j-l}(\delta)/\partial\delta \\ \pi_{j-l}(\delta)\, \partial \varphi_l(\theta)/\partial\theta \end{pmatrix}. \qquad (21)$$

In fact

$$\partial \Delta_t(1, \theta)\, y_t = \begin{pmatrix} -\sum_{j=1}^{t-1} \Big( \sum_{l=1}^{j} l^{-1} \varphi_{j-l}(\theta) \Big) (y_{t-j} - y_{t-j-1}) \\ \sum_{j=0}^{t-1} \big( \partial \varphi_j(\theta)/\partial\theta \big) (y_{t-j} - y_{t-j-1}) \end{pmatrix}.$$

Define

$$LM = \frac{T}{4\, Q(1, \tilde\theta)^2}\, \partial Q(1, \tilde\theta)'\, \Omega(\tilde\theta)^{-1}\, \partial Q(1, \tilde\theta),$$

with $\tilde\theta$ given by (18). The proof of the following theorem is straightforward given the sentence after Theorem 3.

Theorem 5
Let $\delta_0 = 1$ in (9) and let Assumptions 1-8 hold. Then as $T \to \infty$,

$$LM \to_d \chi_1^2.$$
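In the pure fractional case the matrix $\Omega(\theta)$ reduces to the scalar $\pi^2/6$, so $\hat\Omega^{(1,1)} = 6/\pi^2$ and the Wald and pseudo-LR statistics are one-liners once $\hat\delta$, $Q(1, \tilde\theta)$ and $Q(\hat\delta, \hat\theta)$ are available. A sketch under that simplification (variable names are ours; the $T$-scaling of the LR statistic follows the usual conditional-sum-of-squares convention and is our reading of the garbled source):

```python
import math

def wald_stat(delta_hat, T):
    """Wald statistic for the unit root null with Omega = pi^2/6,
    so that Omega^{(1,1)} = 6/pi^2; compare with N(0, 1) tails."""
    return math.sqrt(T) * (delta_hat - 1.0) / math.sqrt(6.0 / math.pi**2)

def lr_stat(Q_null, Q_unrestricted, T):
    """Pseudo-LR statistic T log(Q(1, theta~)/Q(delta^, theta^));
    compare with the chi-squared(1) critical value (3.84 at 5%)."""
    return T * math.log(Q_null / Q_unrestricted)
```

Rejection against more nonstationary alternatives uses the upper normal tail of the Wald statistic, matching the direction discussed after Theorem 3.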

4. NONPARAMETRIC REGRESSION ESTIMATION

We can base estimation of $g(x)$ on our estimates $(\hat\delta, \hat\theta)$ and (10), but in view of the stringent conditions on the bandwidth $h$ in Theorems 1 and 2 we allow use of a possibly different bandwidth, $b$, in

$$\tilde g_{\delta\theta}(x) = \sum_{s=1}^{T} \Delta_s(L) y_s\, k\!\left(\frac{x - s/T}{b}\right) \Bigg/ \sum_{s=1}^{T} k\!\left(\frac{x - s/T}{b}\right). \qquad (22)$$

We provide a multivariate central limit theorem for $\tilde g_{\hat\delta\hat\theta}(x_i)$, $i = 1, 2, \ldots, q$, where the $x_i$, $i = 1, 2, \ldots, q$, are distinct fixed points, imposing also:

Assumption 9
As $T \to \infty$, $(bT)^{-1} + b^5 T \to 0$.

The proof of the following theorem is omitted as univariate and multivariate central limit theorems for the $\tilde g_{\delta_0\theta_0}(x_i)$ are already in the literature (see e.g. Benedetti (1977), Robinson (1997)) and from Theorem 2 it is readily shown that $\tilde g_{\hat\delta\hat\theta}(x) - \tilde g_{\delta_0\theta_0}(x) = O_p(T^{-1/2})$ for all $x$.

Theorem 6
Let (9) and Assumptions 1-9 hold. Then as $T \to \infty$, the

$$(bT)^{1/2} \left( \tilde g_{\hat\delta\hat\theta}(x_i) - g(x_i) \right), \quad i = 1, 2, \ldots, q,$$

converge in distribution to independent $N\!\left(0, \sigma^2 \int_{\mathbb{R}} k(x)^2\, dx\right)$ random variables, where $\sigma^2$ is consistently estimated by $\hat\sigma^2 = Q(\hat\delta, \hat\theta)$.
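Theorem 6 yields pointwise confidence intervals $\tilde g(x) \pm z_{\alpha/2}\,(\hat\sigma^2 \int k(x)^2 dx / (bT))^{1/2}$; for the standard normal kernel $\int k(x)^2 dx = 1/(2\sqrt{\pi})$. A small sketch of ours, under that kernel choice:

```python
import math

def g_ci(g_tilde_x, sigma2_hat, b, T, z=1.96):
    """Pointwise 95% interval from Theorem 6 with a standard normal
    kernel, for which the integral of k^2 equals 1/(2 sqrt(pi))."""
    k2 = 1.0 / (2.0 * math.sqrt(math.pi))
    half = z * math.sqrt(sigma2_hat * k2 / (b * T))
    return (g_tilde_x - half, g_tilde_x + half)
```

Since the limits at distinct points are independent, simultaneous coverage over $q$ points can be approximated by a Bonferroni adjustment of $z$.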

This is the same limit distribution as results if $\delta_0$ and $\theta_0$ are known, i.e. the same as in the model (1) with iid $u_t$.

5. FINITE-SAMPLE PERFORMANCE

A small Monte Carlo study was carried out to investigate the finite-sample behaviour of our parameter estimates, and of one of our unit root tests. To generate data, in (9) we took $g(x) = \sin(2\pi x)$, $p = 1$, $\varphi(z; \theta) = 1 - \theta z$, for various values of

$\delta_0$ and $\theta_0$ (so $y_t$ was a FARIMA$(1, \delta_0, 0)$), and $u_t$ standard normally distributed. Throughout, parameter estimates were obtained taking $k$ to be the standard normal kernel. All results are based on 1000 replications.

Tables 1-3 contain Monte Carlo biases $b_\delta$ and $b_\theta$ of $\hat\delta$ and $\hat\theta$ and corresponding Monte Carlo standard deviations $s_\delta$ and $s_\theta$, for two $(\delta_0, \theta_0)$ choices each, the sets $r$ and $\Theta$ varying with these. In Table 1, $(\delta_0, \theta_0) = (\tfrac58, \tfrac12)$ ($r = [.4, 1.1]$, $\Theta = [.1, .9]$) and $(\delta_0, \theta_0) = (\tfrac78, 0)$ ($r = [.4, 1.2]$, $\Theta = [-.5, .5]$); in Table 2, $(\delta_0, \theta_0) = (1, \tfrac12)$ ($r = [.5, 1.5]$, $\Theta = [.1, .9]$) and $(\delta_0, \theta_0) = (1, 0)$ ($r = [.5, 1.5]$, $\Theta = [-.5, .5]$); in Table 3, $(\delta_0, \theta_0) = (\tfrac{13}{8}, \tfrac12)$ ($r = [.9, 2.1]$, $\Theta = [.1, .9]$) and $(\delta_0, \theta_0) = (\tfrac98, 0)$ ($r = [.9, 1.5]$, $\Theta = [-.5, .5]$). All but one of these $\delta_0$, and none of these $r$, satisfy Assumption 3. Note that in the cases $\theta_0 = 0$, one of which is included in each table, $y_t$ reduces to a FARIMA$(0, \delta_0, 0)$,

but we suppose that the practitioner does not know this. Three different $(T, Th)$ combinations were employed: $(250, 20)$, $(600, 50)$ and $(1000, 80)$; these choices only partly represent an $h$ sequence obeying our assumptions, because though $Th$ increases with $T$, $1/h$ takes successive values 12.5, 12, 12.5. In the tables, the excessive biases for the smaller sample sizes whenever $\theta_0 \ne 0$ may partly be due to the overly large $r$, as well as the deleterious effect of the nonparametric estimation, and the estimation procedure having some difficulty in distinguishing the long and short memory effects. The biases when $\theta_0 = 0$ are much smaller, and generally both biases and standard deviations diminish with increasing $T$, while there is a high stability across corresponding elements of the tables, especially Tables 1 and 2.

Table 1: Bias and standard deviation of $(\hat\delta, \hat\theta)$, $(\delta_0, \theta_0) = (\tfrac58, \tfrac12)$ and $(\tfrac78, 0)$.

              (5/8, 1/2)                                  (7/8, 0)
T       b_delta   s_delta   b_theta   s_theta     b_delta   s_delta   b_theta   s_theta
250     0.3770    0.0537    -0.3081   0.0828      0.0422    0.0655    -0.0264   0.0952
600     0.1905    0.0436    -0.1719   0.0672      0.0272    0.0396    -0.0184   0.0579
1000    0.0669    0.0424    -0.0624   0.0581      0.0175    0.0327    -0.0131   0.0466

Table 2: Bias and standard deviation of $(\hat\delta, \hat\theta)$, $(\delta_0, \theta_0) = (1, \tfrac12)$ and $(1, 0)$.

              (1, 1/2)                                    (1, 0)
T       b_delta   s_delta   b_theta   s_theta     b_delta   s_delta   b_theta   s_theta
250     0.3713    0.0555    -0.3000   0.0841      0.0380    0.0609    -0.0226   0.0911
600     0.1925    0.0437    -0.1733   0.0668      0.0252    0.0396    -0.0187   0.0603
1000    0.0674    0.0443    -0.0632   0.0594      0.0159    0.0317    -0.0136   0.0460


b; b , ( 0 ;

Table 3: Bias and standard deviation of ( 0;

13 1 ; 8 2

0)

T

b

s

b

s

b

0 )=

13 1 ; 8 2

;

9 ;0 8

:

9 ;0 8

s

b

s

250

0.2666 0.0488

-0.1790 0.0801

0.0376 0.0636

-0.0178 0.0946

600

0.1853 0.0485

-0.1667 0.0735

0.0251 0.0393

-0.0171 0.0597

1000

0.0612 0.0457

-0.0568 0.0592

0.0170 0.0315

-0.0126 0.0461

Table 4 contains Monte Carlo sizes and powers for the LR unit root test described in Section 3, based on nominal 1% and 5% levels. Sizes were obtained using $(\delta_0, \theta_0) = (1, \tfrac12)$ and $(T, Th) = (250, 12)$, $(600, 25)$, $(1000, 52)$; powers using $(\delta_0, \theta_0) = (\tfrac{15}{16}, \tfrac12)$ and $(\tfrac{17}{16}, \tfrac12)$, with $(T, Th) = (250, 20)$, $(600, 50)$, $(1000, 80)$. Considering the serious biases found in Tables 1-3 when $\theta_0 = \tfrac12$, the sizes in Table 4 do not seem bad, and they do improve with increasing $T$. Only one alternative in either direction from the unit root null is considered, but given that these are both close to 1 the differences between powers and corresponding sizes seem quite satisfactory, with slightly the greater sensitivity when $\delta_0 = \tfrac{15}{16}$, and again there is improvement as $T$ increases.

Table 4: Sizes and powers of the LR test at nominal 1% and 5% levels.

H0: Size, $(\delta_0, \theta_0) = (1, \tfrac12)$
T       1%       5%
250     0.019    0.068
600     0.014    0.061
1000    0.012    0.045

H1: Power
              (15/16, 1/2)           (17/16, 1/2)
T       1%       5%             1%       5%
250     0.133    0.284          0.102    0.215
600     0.303    0.562          0.211    0.439
1000    0.504    0.725          0.479    0.708
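The Monte Carlo data can be generated recursively: since $\phi_0(\delta_0, \theta_0) = 1$, (9) gives $y_t = g(t/T) + u_t - \sum_{j=1}^{t-1} \phi_j(\delta_0, \theta_0)\, y_{t-j}$. A sketch of such a generator (ours, not the authors' code; the seed argument is an assumption for reproducibility):

```python
import math, random

def composite_coeffs(delta, theta, n):
    # phi_j(delta, theta) of (1 - z)^delta (1 - theta z)
    pi = [1.0]
    for j in range(1, n):
        pi.append(pi[-1] * (j - 1 - delta) / j)
    return [pi[j] - (theta * pi[j - 1] if j else 0.0) for j in range(n)]

def simulate(T, delta0, theta0, seed=0):
    """Generate y_1, ..., y_T from the model
    Delta_t(L; delta0, theta0) y_t = g(t/T) + u_t,
    with g(x) = sin(2 pi x) and iid N(0, 1) u_t."""
    rng = random.Random(seed)
    phi = composite_coeffs(delta0, theta0, T)
    y = []
    for t in range(1, T + 1):
        rhs = math.sin(2.0 * math.pi * t / T) + rng.gauss(0.0, 1.0)
        y.append(rhs - sum(phi[j] * y[t - 1 - j] for j in range(1, t)))
    return y
```

Setting $\delta_0 = 1$, $\theta_0 = 0$ reproduces the unit root design: each $y_t$ is then the cumulative sum of trend-plus-noise increments.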


6. FINAL REMARKS

The paper has justified large sample inference on the fractional and short memory parameters and nonparametric regression function in a semiparametric model incorporating nonstationary stochastic and deterministic trends. For parametric inference, the restrictions on the admissible memory parameter interval and the range of bandwidths are relatively strong, due to the presence of the nonparametric function and the extent of the time series dependence; possibly one or both of these restrictions could be relaxed by means of higher-order kernels, as have been used elsewhere in the semiparametric econometric literature.

A variety of further avenues might be explored. As always when nonparametric estimation is involved, bandwidth choice is a practical issue, though possibly a less acute one than in the stochastic design setting in which the density of explanatory variables varies over their support. In our Monte Carlo study only one value of $h$ was used for each $T$, but sensitivity of estimates and tests to $h$, and $b$, can be gauged by carrying out the computations over a range of choices. With respect to automatic rules, in the model (1) a cross-validation choice of $b$ is known to minimize mean integrated squared error (MISE), and we can extend this property to our setting, using $(\hat\delta, \hat\theta)$, though as usual the minimum-MSE rate does not quite satisfy conditions (our Assumption 9) for asymptotic normality about $g$. For $h$, as is familiar in the semiparametric literature, the minimum-MISE rate is clearly excluded by the conditions (our Assumption 8) for asymptotic normality of parameter estimates, and a more appropriate goal may be to make a selection that matches the orders of the next two terms after the normal distribution function in an Edgeworth expansion for the distribution function of $(\hat\delta, \hat\theta)$, and thereby minimizes the departure from the normal limit and leads to better-sized tests and more accurate interval estimates; in some settings this problem has a neat solution, but we do not know whether this is the case in ours. Bootstrapping is also likely to improve finite-sample properties.

Inference issues that might be investigated include testing constancy or other parametric restrictions on $g(x)$. Possible model extensions that require non-trivial further work include adding a nonparametric function of explanatory variables to $g(t/T)$ in (9), and allowing for unconditional or conditional heteroscedasticity in $u_t$. Our work might also be extended to a panel setting, including individual effects and possible cross-sectional dependence.

ACKNOWLEDGMENTS

The first author acknowledges the Australian Research Council Discovery Grants Program for its support under Grant number DP1096374. The second author's research was supported by ESRC Grant ES/J007242/1, and he thanks the Department of Econometrics and Business Statistics at Monash University for hosting and supporting his visits in March 2011 and September 2012. Thanks also go to Dr Jiying Yin for computing assistance.

APPENDIX A

Proof of Theorem 1

The model (9) refers to $y_t$, $u_t$ and $g(t/T)$ only for $t \ge 1$, so we can set $y_t = u_t = 0$, $t \le 0$, and $g(x) = 0$, $x < 0$. Then for $t = 1, \ldots, T$,

$$\Delta_t(L; \delta, \theta)\, y_t = \Phi(L; \delta, \theta)\, y_t,$$

and

$$y_t = \Phi(L; \delta_0, \theta_0)^{-1} \left( g\left(\tfrac{t}{T}\right) + u_t \right),$$

so

$$\Delta_t(L; \delta, \theta)\, y_t = \Phi(L; \delta, \theta)\, \Phi(L; \delta_0, \theta_0)^{-1} \left( g\left(\tfrac{t}{T}\right) + u_t \right) = (1 - L)^{\delta - \delta_0}\, \zeta(L; \theta) \left( g\left(\tfrac{t}{T}\right) + u_t \right),$$

where

$$\zeta(z; \theta) = \varphi(z; \theta)\, \varphi(z; \theta_0)^{-1} = \sum_{j=0}^{\infty} \zeta_j(\theta)\, z^j.$$

From Zygmund (1977, p. 46), Assumption 2 implies that the Fourier coefficients $\zeta_j(\theta)$ satisfy

$$\sup_{\theta} |\zeta_j(\theta)| = O(j^{-1-\varsigma}). \qquad (A.1)$$

The Fourier coefficients $a_j(\delta, \theta)$ of

$$(1 - z)^{\delta - \delta_0}\, \zeta(z; \theta) = \sum_{j=0}^{\infty} a_j(\delta, \theta)\, z^j$$

are given by

$$a_j(\delta, \theta) = \sum_{l=0}^{j} \zeta_l(\theta)\, \pi_{j-l}(\delta - \delta_0).$$

(Note that $a_0(\delta, \theta) = \pi_0(\delta - \delta_0)\, \zeta_0(\theta) = 1$.) From (6), uniformly in $\delta \in [r_1, r_2] \setminus \{\delta_0\}$,

$$\pi_j(\delta - \delta_0) = O(j^{\delta_0 - \delta - 1}), \text{ as } j \to \infty, \qquad (A.2)$$

and so, using also (A.1),

$$|a_j(\delta, \theta)| \le \sum_{l=0}^{[j/2]} |\zeta_l(\theta)|\, |\pi_{j-l}(\delta - \delta_0)| + \sum_{l=[j/2]+1}^{j} |\zeta_l(\theta)|\, |\pi_{j-l}(\delta - \delta_0)|$$
$$\le K j^{\delta_0 - \delta - 1} \sum_{l=0}^{\infty} |\zeta_l(\theta)| + K j^{-1-\varsigma} \sum_{l=0}^{j} |\pi_l(\delta - \delta_0)| \le K j^{\delta_0 - \delta - 1} \qquad (A.3)$$

uniformly in $\delta$, $\theta$, where $K$ throughout denotes a generic finite, positive constant. Also for future use note that from (6), uniformly in $\delta \in [r_1, r_2] \setminus \{\delta_0\}$,

$$|\pi_j(\delta - \delta_0) - \pi_{j+1}(\delta - \delta_0)| = O(j^{\delta_0 - \delta - 2}), \text{ as } j \to \infty, \qquad (A.4)$$

so that

$$|a_j(\delta, \theta) - a_{j+1}(\delta, \theta)| \le \sum_{l=0}^{j} |\zeta_l(\theta)|\, |\pi_{j-l}(\delta - \delta_0) - \pi_{j+1-l}(\delta - \delta_0)| + |\zeta_{j+1}(\theta)|$$
$$\le K j^{\delta_0 - \delta - 2} \sum_{l=0}^{\infty} |\zeta_l(\theta)| + K j^{-1-\varsigma} \sum_{l=1}^{\infty} l^{\delta_0 - \delta - 2} + K j^{-1-\varsigma} \le K j^{\max(\delta_0 - \delta,\, 1 - \varsigma) - 2}. \qquad (A.5)$$

With the abbreviations

$$\Delta_t = \sum_{j=0}^{t-1} a_j(\delta, \theta)\, L^j, \quad g_t = g\left(\tfrac{t}{T}\right), \quad \bar k_t = \frac{1}{Th} \sum_{s=1}^{T} k_{ts},$$

we have from (11)

$$\hat u_t(\delta, \theta) = \frac{1}{Th} \sum_{s=1}^{T} \left( \Delta_t g_t - \Delta_s g_s \right) k_{ts} / \bar k_t + \frac{1}{Th} \sum_{s=1}^{T} \left( \Delta_t u_t - \Delta_s u_s \right) k_{ts} / \bar k_t = \Delta_t u_t + D_t - S_t,$$

where

$$D_t = \frac{1}{Th} \sum_{s=1}^{T} \left( \Delta_t g_t - \Delta_s g_s \right) k_{ts} / \bar k_t, \quad S_t = \frac{1}{Th} \sum_{s=1}^{T} \Delta_s u_s\, k_{ts} / \bar k_t$$

are respectively the deterministic and stochastic errors contributing to the residual, that are absent when $g(t/T) \equiv 0$ in (9). Thus

$$Q(\delta, \theta) = \frac{1}{T} \sum_{t=1}^{T} \left( \Delta_t u_t + D_t - S_t \right)^2$$
$$= \frac{1}{T} \sum_{t=1}^{T} \left( \Delta_t u_t \right)^2 + \frac{1}{T} \sum_{t=1}^{T} D_t^2 + \frac{1}{T} \sum_{t=1}^{T} S_t^2 + \frac{2}{T} \sum_{t=1}^{T} \left( \Delta_t u_t \right) D_t - \frac{2}{T} \sum_{t=1}^{T} \left( \Delta_t u_t \right) S_t - \frac{2}{T} \sum_{t=1}^{T} D_t S_t. \qquad (A.6)$$

Hualde and Robinson (2011) show that the estimates minimizing

$$\frac{1}{T} \sum_{t=1}^{T} \left( \Delta_t u_t \right)^2 \qquad (A.7)$$

are consistent for $\delta_0$, $\theta_0$. From their proof it suffices to show that as $T \to \infty$,

$$\sup \frac{1}{T} \sum_{t=1}^{T} D_t^2 \to 0, \qquad (A.8)$$

$$\sup \frac{1}{T} \sum_{t=1}^{T} S_t^2 \to_p 0, \qquad (A.9)$$

$$\sup \left| \frac{1}{T} \sum_{t=1}^{T} \left( \Delta_t u_t \right) D_t \right| \to_p 0, \qquad (A.10)$$

$$\sup \left| \frac{1}{T} \sum_{t=1}^{T} \left( \Delta_t u_t \right) S_t \right| \to_p 0, \qquad (A.11)$$

$$\sup \left| \frac{1}{T} \sum_{t=1}^{T} D_t S_t \right| \to_p 0, \qquad (A.12)$$

where the suprema here and subsequently are over $\delta \in [r_1, r_2]$, $\theta \in \Theta$. Given (A.8) and (A.9), and using the Cauchy inequality, (A.10)-(A.12) follow from the fact, implied by the proof of Hualde and Robinson (2011), that (A.7) is uniformly $O_p(1)$.

inf jkt j t

Suppressing reference to ; in T X

(

t

gt

s

j

=

gs ) kts =

s=1

j(

; );

T t 1 X X s=1

=

(A.13)

T 1 X j=0

j gt j

j=0

j

T X

s 1 X

j gs j

j=0

(gt

j

gs j ) kts ;

!

kts

(A.14)

s=1

with the convention already adopted that g(x) = 0; x < 0; and g(0) = 0 from 21

Assumption 4. Then

$$\sup \left| \sum_{s=1}^{T} \left( \Delta_t g_t - \Delta_s g_s \right) k_{ts} \right| \le \sup \sum_{j=0}^{T-1} |a_j| \left| \sum_{s=1}^{T} \left( g_{t-j} - g_{s-j} \right) k_{ts} \right|. \qquad (A.15)$$

From (A.3) and Assumption 3,

$$\sup |a_j| \le K j^{r_2 - r_1 - 1}. \qquad (A.16)$$

Applying Assumption 4 and with $\dot g_t$ denoting the derivative of $g(x)$ at $x = t/T$,

$$\sum_{s=1}^{T} \left( g_{t-j} - g_{s-j} \right) k_{ts} = \dot g_{t-j} \sum_{s=1}^{T} \left( \frac{t-s}{T} \right) k_{ts} + O\!\left( \sum_{s=1}^{T} \left( \frac{t-s}{T} \right)^{2} |k_{ts}| \right), \qquad (A.17)$$

where $\dot g_t = 0$, $t \le 0$. By Lemma 2 of Robinson (2012),

$$\sum_{s=1}^{T} \left( \frac{t-s}{T} \right) k_{ts} = T h^2 \left( \frac{1}{Th} \sum_{s=1}^{T} \left( \frac{t-s}{Th} \right) k_{ts} \right) = T h^2 \left( \int_{\mathbb{R}} u\, k(u)\, du + O\!\left((Th)^{-1}\right) \right) = O(h) \qquad (A.18)$$

uniformly in $t \in (Th, T - Th)$. Uniformly in $t \le Th$, $t \ge T - Th$,

$$\sum_{s=1}^{T} \left( \frac{t-s}{T} \right) k_{ts} = O(T h^2) \qquad (A.19)$$

from Lemma 1 of Robinson (2012). By the same lemma,

$$\sum_{s=1}^{T} \left( \frac{t-s}{T} \right)^{2} |k_{ts}| = O(T h^3) \qquad (A.20)$$

uniformly in $t$. Thus from (A.17)-(A.20),

$$\max_{j} \left| \sum_{s=1}^{T} \left( g_{t-j} - g_{s-j} \right) k_{ts} \right| = O\!\left( h + T h^3 \right), \ t \in (Th, T - Th); \quad = O\!\left( T h^2 \right), \ t \le Th \text{ or } t \ge T - Th,$$

uniformly. Using also (A.13), (A.15) and (A.16),

$$\sup |D_t| \le K (Th)^{-1} \left( h + T h^3 \right) \sum_{j=0}^{T-1} (1+j)^{r_2 - r_1 - 1} \le K \left( T^{r_2 - r_1 - 1} + T^{r_2 - r_1} h^2 \right), \quad t \in (Th, T - Th), \qquad (A.21)$$

and

$$\sup |D_t| \le K (Th)^{-1}\, T h^2 \sum_{j=0}^{T-1} (1+j)^{r_2 - r_1 - 1} \le K T^{r_2 - r_1} h, \quad t \le Th, \ T - Th \le t \le T,$$

uniformly over the stated ranges of $t$. Thus

$$\sup \frac{1}{T} \sum_{t=1}^{T} D_t^2 \le K \left( T^{2(r_2 - r_1 - 1)} + T^{2(r_2 - r_1)} h^4 + T^{2(r_2 - r_1)} h^3 \right) \to 0$$

by Assumption 6, verifying (A.8).

To prove (A.9), we have

$$\sum_{s=1}^{T} \Delta_s u_s\, k_{ts} = \sum_{j=0}^{T-1} a_j c_{tj} = \sum_{j=0}^{[Th]} a_j c_{tj} + \sum_{j=[Th]+1}^{T-1} a_j c_{tj},$$

where

$$c_{tj} = \sum_{r=1}^{T-j} u_r\, k_{t, r+j},$$

so, using (A.3),

$$\sup \left| \sum_{j=0}^{[Th]} a_j c_{tj} \right| \le K \sum_{j=0}^{[Th]} (1+j)^{r_2 - r_1 - 1} |c_{tj}|,$$

and thus

$$E\!\left( \sup \sum_{j=0}^{[Th]} |a_j c_{tj}| \right)^{2} \le K \sum_{j=0}^{[Th]} \sum_{l=0}^{[Th]} (1+j)^{r_2 - r_1 - 1} (1+l)^{r_2 - r_1 - 1} \left( E c_{tj}^2\, E c_{tl}^2 \right)^{1/2}. \qquad (A.22)$$

Now

$$E c_{tj}^2 = \sigma^2 \sum_{r=1}^{T-j} k_{t, r+j}^2 = O(Th)$$

by Assumption 6, so (A.22) is $O\!\left((Th)^{2(r_2 - r_1) + 1}\right) = o\!\left((Th)^2\right)$ uniformly in $t$, by Assumption 3.

By summation by parts,

$$\sum_{j=[Th]+1}^{T-1} a_j c_{tj} = \sum_{j=[Th]+1}^{T-2} \left( a_j - a_{j+1} \right) d_{tj} + a_{T-1} d_{t, T-1}, \qquad (A.23)$$

where

$$d_{tj} = \sum_{l=0}^{j} c_{tl}.$$

Now (A.23) is bounded uniformly by

$$\sup \sum_{j=[Th]+1}^{T-2} \left| a_j - a_{j+1} \right| |d_{tj}| + \sup \left| a_{T-1} \right| |d_{t, T-1}| \le K \sum_{j=[Th]+1}^{T-2} j^{\xi - 2} |d_{tj}| + K T^{\xi - 1} |d_{t, T-1}|, \qquad (A.24)$$

using (A.3) and (A.5) and writing $\xi = \max(r_2 - r_1,\, 1 - \varsigma)$. Rearranging,

$$d_{tj} = \sum_{r=1}^{T} u_r \left( \sum_{s=r}^{\min(r+j, T)} k_{ts} \right),$$

so

$$E d_{tj}^2 = \sigma^2 \sum_{r=1}^{T} \left( \sum_{s=r}^{\min(r+j, T)} k_{ts} \right)^{2} \le K j \left( \sum_{s=1}^{T} |k_{ts}| \right)^{2} \le K j (Th)^2,$$

and (A.24) has second moment bounded by

$$K (Th)^2 \sum_{j=[Th]+1}^{T-2} \sum_{l=[Th]+1}^{T-2} j^{\xi - 3/2}\, l^{\xi - 3/2} + K (Th)^2 T^{2\xi - 1} \le K (Th)^{2\xi + 1} + K (Th)^2 T^{2\xi - 1}$$

uniformly in $t$, since