The Annals of Statistics 1999, Vol. 27, No. 6, 2054–2080

LOCAL ASYMPTOTIC NORMALITY FOR REGRESSION MODELS WITH LONG-MEMORY DISTURBANCE

By Marc Hallin,1 Masanobu Taniguchi,2 Abdeslam Serroukh3 and Kokyo Choy

Université Libre de Bruxelles, Osaka University, Université Libre de Bruxelles and Komazawa University, Tokyo

The local asymptotic normality property is established for a regression model with fractional ARIMA(p, d, q) errors. This result allows for solving, in an asymptotically optimal way, a variety of inference problems in the long-memory context: hypothesis testing, discriminant analysis, rank-based testing, locally asymptotically minimax and adaptive estimation, etc. The problem of testing linear constraints on the parameters, the discriminant analysis problem, and the construction of locally asymptotically minimax adaptive estimators are treated in some detail.

1. Introduction. Local asymptotic normality [LAN; see, e.g., Le Cam (1960, 1986), Strasser (1985), Le Cam and Yang (1990)] constitutes a key result in asymptotic inference and allows for the solution of virtually all asymptotic inference problems. The LAN approach recently has been adopted in the study of a variety of time series models. Swensen (1985) established LAN for autoregressive models of finite order with a regression trend and applied it in the derivation of the local power of the Durbin–Watson test. For a stationary ARMA(p, q) process (without trend), Kreiss (1987, 1990) also proved the LAN property, and constructed locally asymptotically minimax (LAM) adaptive estimators, as well as locally asymptotically maximin tests for AR models. Based on the LAN property, Hallin and Puri (1988, 1994) discussed the problem of testing arbitrary linear constraints on the parameters of an ARMA model with trend. Kreiss (1990) showed the LAN property for autoregressive processes of infinite order and discussed adaptive estimation. For multivariate ARMA models with a linear trend, Garel and Hallin (1995) established the LAN property and gave various expressions for the central sequence. Unit root and cointegrated models have been studied by Jeganathan (1995, 1997). Benghabrit

Received March 1997; revised October 1999.
1 Research supported by the Human Capital contract ERB CT CHRX 940 963 and the Fonds d'Encouragement à la Recherche de l'Université Libre de Bruxelles.
2 Work completed while visiting the Institut de Statistique et de Recherche Opérationnelle at the Université Libre de Bruxelles under a grant of the Belgian Fonds National de la Recherche Scientifique.
3 Research supported by the Fonds d'Encouragement à la Recherche de l'Université Libre de Bruxelles.
AMS 1991 subject classifications. Primary 60G10, 62E20; secondary 62A10, 62F05.
Key words and phrases. Long-memory process, FARIMA model, local asymptotic normality, locally asymptotically optimal test, adaptive estimation, discriminant analysis.


and Hallin (1998) investigated the bilinear case in the vicinity of AR dependence. Other nonlinear time series models have been considered, for example, by Hwang and Basawa (1993), Drost, Klaassen and Werker (1997) and Koul and Schick (1997).

All the above results are for short-memory time series models. The long-memory case, from this point of view, remains largely, if not totally, unexplored. Asymptotic results, though, exist, but they mainly deal with the asymptotic behavior of traditional MLEs or LSEs. For a Gaussian long-memory process, Dahlhaus (1989) proved the asymptotic normality of the maximum likelihood estimator of spectral parameters and discussed its asymptotic efficiency. Yajima (1985, 1991) elucidated problems related to the asymptotics of the BLUE and the LSE in a regression model with long-memory stationary errors. Beran (1995) considers the case of the differencing parameter. Sethuraman and Basawa (1997) consider the MLE for a multivariate fractionally differenced AR model and obtain its asymptotic distribution. The monograph by Beran (1994) gives an extensive review of various problems in long-memory time series. However, the systematic approach based on the LAN property apparently has not been considered so far.

This paper develops the asymptotic theory based on the LAN approach for a regression model with stationary non-Gaussian FARIMA(p, d, q) long-memory errors. In Section 2, we show that the log-likelihood ratios exhibit the typical LAN behavior, so that the local experiments converge weakly to a Gaussian shift experiment. This result is used, in Section 3, in the solution of a variety of inference problems. Examples are given, and a general formula for the distribution, under contiguous sequences of alternatives, of a wide class of statistics is established. These results are quite general, and remain valid under a very large family of innovation densities.
Throughout the paper we denote by ℕ = {1, 2, …} and ℤ = {0, ±1, ±2, …} the sets of natural integers and integers, respectively.

2. Local asymptotic normality (LAN). Suppose that we observe Y^{(n)} = (Y_1, …, Y_n)′ generated by

(2.1)  Y_t = X_t′β + e_t,  t ∈ ℤ,

where X_t = (X_{t1}, …, X_{tb})′, t ∈ ℤ, is a sequence of b-dimensional real-valued nonstochastic regressors, β = (β_1, …, β_b)′ is a vector of nonserial parameters, and {e_t, t ∈ ℤ} is a stationary long-memory process generated by the fractional ARIMA [in short, FARIMA(p, d, q)] model (L, as usual, stands for the lag operator)

(2.2)  Σ_{k=0}^{p} φ_k L^k (1 − L)^d e_t = Σ_{k=0}^{q} η_k L^k ε_t,  t ∈ ℤ,  φ_0 = η_0 = 1,

with a vector of serial parameters θ = (θ_1, …, θ_{p+q+1})′ = (d, φ_1, …, φ_p, η_1, …, η_q)′ = (d, φ′, η′)′,
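As a quick illustration of model (2.2), the expansion coefficients of (1 − L)^{−d} and a simulated FARIMA path can be generated in a few lines. This is an informal sketch only; the function names, the truncation scheme and the Gaussian innovations are our own choices, not part of the paper:

```python
import numpy as np

def frac_diff_coeffs(d, n):
    """Coefficients rho_k of (1 - L)^{-d} = sum_{k>=0} rho_k L^k,
    via rho_0 = 1, rho_k = rho_{k-1} (k - 1 + d) / k; rho_k = O(k^{d-1})."""
    rho = np.ones(n)
    for k in range(1, n):
        rho[k] = rho[k - 1] * (k - 1 + d) / k
    return rho

def simulate_farima(n, d, phi=(), eta=(), burn=500, seed=0):
    """Draw a FARIMA(p, d, q) path solving phi(L)(1-L)^d e_t = eta(L) eps_t:
    MA-filter i.i.d. innovations by eta(L), apply the truncated expansion
    of (1-L)^{-d}, then solve the AR part phi(L) e_t = w_t recursively."""
    m = n + burn
    eps = np.random.default_rng(seed).standard_normal(m)
    v = np.convolve(eps, np.r_[1.0, list(eta)])[:m]     # eta(L) eps_t
    w = np.convolve(v, frac_diff_coeffs(d, m))[:m]      # (1-L)^{-d} v_t
    e = np.zeros(m)
    for t in range(m):                                  # phi(L) e_t = w_t
        e[t] = w[t] - sum(p * e[t - k - 1] for k, p in enumerate(phi) if t > k)
    return e[burn:]
```

The truncation at n + burn lags is the usual practical shortcut; the square-summable MA(∞) tail that is dropped is negligible after a moderate burn-in.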


and an i.i.d. innovation process {ε_t, t ∈ ℤ}, with nonvanishing innovation density g. In matrix notation, (2.1) also writes

(2.3)  Y^{(n)} = X^{(n)} β + e^{(n)},

where X^{(n)} = (X_1, …, X_n)′ and e^{(n)} = (e_1, …, e_n)′. The following assumptions are made on θ and g [(S1)–(S3)] and on X^{(n)} [(G1)–(G6)], respectively.

(S1) The characteristic polynomials φ(z) = Σ_{k=0}^{p} φ_k z^k and η(z) = Σ_{k=0}^{q} η_k z^k have no roots within the unit disc 𝒟 = {z ∈ ℂ : |z| ≤ 1}.
(S2) 0 < d < 1/2; denote by Θ the set of all θ's satisfying (S1)–(S2).
(S3) The innovation density g is such that ∫ z g(z) dz = 0 and ∫ z² g(z) dz = σ² < ∞; moreover, g is absolutely continuous, with a.e. derivative g′ satisfying

(2.4)  0 < I(g) = ∫ (g′(z)/g(z))² g(z) dz < ∞  and  ∫ (g′(z)/g(z))⁴ g(z) dz < ∞.

These assumptions imply the quadratic mean differentiability of √g. Also, note that (S3) entails E[g′(ε_t)/g(ε_t)] = 0 and that the second part of condition (2.4) is not required when inference is restricted to the serial part of the model [i.e., when β is specified; see Swensen (1985)]. Under these assumptions, ψ(z) = φ(z) η(z)^{−1} (1 − z)^d admits the absolutely convergent development

(2.5)  ψ(z) = Σ_{k=0}^{∞} ψ_k z^k,  z ∈ 𝒟,

which implies that {e_t} has the AR(∞) representation

(2.6)  Σ_{k=0}^{∞} ψ_k e_{t−k} = ε_t,  t ∈ ℤ.

Similarly, letting ξ(z) = Σ_{k=0}^{∞} ξ_k z^k = ψ(z)^{−1}, z ∈ 𝒟, {e_t} has the MA(∞) representation

(2.7)  e_t = Σ_{k=0}^{∞} ξ_k ε_{t−k},  t ∈ ℤ,

where {ξ_k} is a square-summable sequence. Of course, the coefficients ψ_k and ξ_k in (2.5)–(2.7) depend on θ, with ψ_0 = ξ_0 = 1. The coefficients ψ_ν are
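The coefficient sequences {ψ_k} and {ξ_k} of (2.5)–(2.7) can be computed by truncated power-series arithmetic. The following sketch (names and truncation are ours, not the paper's) also lets one verify the inversion identity Σ_j ψ_j ξ_{k−j} = 1{k = 0}:

```python
import numpy as np

def ar_inf_coeffs(d, phi=(), eta=(), n=100):
    """Coefficients psi_k of psi(z) = phi(z) eta(z)^{-1} (1 - z)^d in (2.5),
    by series multiplication and long division; psi_0 = 1."""
    pi = np.ones(n)                      # (1 - z)^d: pi_k = pi_{k-1}(k-1-d)/k
    for k in range(1, n):
        pi[k] = pi[k - 1] * (k - 1 - d) / k
    num = np.convolve(np.r_[1.0, list(phi)], pi)[:n]   # phi(z)(1-z)^d
    den = np.r_[1.0, list(eta)]
    psi = np.zeros(n)
    for k in range(n):                   # long division by eta(z)
        psi[k] = num[k] - sum(den[j] * psi[k - j]
                              for j in range(1, min(k, len(den) - 1) + 1))
    return psi

def ma_inf_coeffs(psi):
    """Coefficients xi_k of xi(z) = psi(z)^{-1} in (2.7), so that
    sum_j psi_j xi_{k-j} equals 1 for k = 0 and 0 for k >= 1."""
    n = len(psi)
    xi = np.zeros(n)
    xi[0] = 1.0
    for k in range(1, n):
        xi[k] = -sum(psi[j] * xi[k - j] for j in range(1, k + 1))
    return xi
```

For the pure fractional case (p = q = 0), ψ_k reduces to the binomial coefficient of (1 − z)^d, e.g. ψ_1 = −d.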


differentiable, with gradient grad ψ_ν(θ) = (g_{ν,1}, …, g_{ν,p+q+1})′, where

(2.8)  g_{ν,1} = (2π)^{−1} ∫_{−π}^{π} e^{−iνλ} φ(e^{iλ}) η(e^{iλ})^{−1} (1 − e^{iλ})^d log(1 − e^{iλ}) dλ,
       g_{ν,j} = (2π)^{−1} ∫_{−π}^{π} e^{−i(ν−j)λ} η(e^{iλ})^{−1} (1 − e^{iλ})^d dλ,  j = 2, …, p + 1,
       g_{ν,j} = −(2π)^{−1} ∫_{−π}^{π} e^{−i(ν−j)λ} φ(e^{iλ}) η(e^{iλ})^{−2} (1 − e^{iλ})^d dλ,  j = p + 2, …, p + q + 1,

and {e_t} has spectral density

(2.9)  f_θ(λ) = (σ²/2π) |Σ_{k=0}^{q} η_k e^{ikλ}|² / [ |1 − e^{iλ}|^{2d} |Σ_{k=0}^{p} φ_k e^{ikλ}|² ].
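For the pure fractional-noise case FARIMA(0, d, 0), grad log f_θ reduces to the scalar ∂ log f_θ/∂d = −2 log|1 − e^{iλ}|, and the information matrix Q(θ) of (2.16) below can be checked numerically; it equals π²/6, independently of d, the classical Fisher information for the memory parameter of fractional noise. A sketch (grid size and function names are our own):

```python
import numpy as np

def d_log_f_dd(lam):
    """For FARIMA(0, d, 0), f_theta(lam) is proportional to
    |1 - e^{i lam}|^{-2d}, so d(log f)/dd = -2 log(2 |sin(lam/2)|),
    which is free of d."""
    return -2.0 * np.log(2.0 * np.abs(np.sin(lam / 2.0)))

def Q_farima00(m=200_000):
    """Midpoint-rule evaluation of Q = (4 pi)^{-1} * integral over
    (-pi, pi) of (grad log f)^2, using symmetry about 0."""
    lam = (np.arange(m) + 0.5) * (np.pi / m)   # grid on (0, pi)
    g2 = d_log_f_dd(lam) ** 2
    return 2.0 * g2.sum() * (np.pi / m) / (4.0 * np.pi)
```

The integrand has a (log λ)² singularity at the origin, which is integrable, so the midpoint rule converges, if slowly, near λ = 0.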

On the regression constants X^{(n)}, we impose a version of Grenander's conditions [see, e.g., Hannan (1970)]. Letting

(2.10)  a^{(n)}_{kj}(h) = Σ_{t=1}^{n−h} X_{t+h,k} X_{tj},  h = 0, 1, …,  and  a^{(n)}_{kj}(h) = Σ_{t=1−h}^{n} X_{t+h,k} X_{tj},  h = −1, −2, …,

the following conditions are assumed to hold:

(G1) a^{(n)}_{kk}(0) → ∞ as n → ∞, k = 1, …, b.
(G2) lim_{n→∞} X²_{n+1,k}/a^{(n)}_{kk}(0) = 0, k = 1, …, b.
(G3) lim_{n→∞} a^{(n)}_{kj}(h)/[a^{(n)}_{kk}(0) a^{(n)}_{jj}(0)]^{1/2} = r_{kj}(h) exists for every k, j = 1, …, b and h ∈ ℤ; denote by R̃(h) the b × b matrix (r_{kj}(h)).
(G4) R̃(0) is nonsingular; then there exists a Hermitian matrix function M(λ) = (M_{kj}(λ)) with positive semidefinite increments such that

(2.11)  R̃(h) = ∫_{−π}^{π} e^{ihλ} dM(λ).

Still on X^{(n)}, we also make the following two assumptions:

(G5) a^{(n)}_{ll}(0) = O(n^{1+α}) for some α ≥ 0, and max_{1≤t≤n} X²_{tl}/a^{(n)}_{ll}(0) = O(n^{−δ}), l = 1, …, b, for some δ > 1 − 2d.
(G6) There exist b_1, b_2 ∈ ℕ, 0 ≤ b_1 ≤ b_2 ≤ b, such that X_{tk} is of the order of t^{j_k − 1} (notation: X_{tk} ∼ t^{j_k − 1}), k = 1, …, b_1, for some sequence of integers 1 ≤ j_1 ≤ ⋯ ≤ j_{k−1} ≤ j_k ≤ ⋯ ≤ j_{b_1}, that is, 0 < lim inf_{t→∞} X_{tk} t^{−j_k + 1} ≤ lim sup_{t→∞} X_{tk} t^{−j_k + 1} < ∞; for the sake of simplicity, we assume in the sequel that j_k = k, k = 1, …, b_1, so that this assumption takes the form

(i) X_{tk} ∼ t^{k−1}, k = 1, …, b_1, which implies M_{kk}(0⁺) − M_{kk}(0) = 1, k = 1, …, b_1; moreover,
(ii) 0 < M_{kk}(0⁺) − M_{kk}(0) < 1, k = b_1 + 1, …, b_2, and
(iii) M_{kk}(0⁺) − M_{kk}(0) = 0, k = b_2 + 1, …, b.


Assumption (G6) distinguishes three types of regressors, according to their asymptotic behavior. Letting X_{·k} = (X_{1k}, …, X_{nk})′, k = 1, …, b, and

D̃_n = diag( n^{−d} ‖X_{·1}‖, …, n^{−d} ‖X_{·b_1}‖, ‖X_{·b_1+1}‖, …, ‖X_{·b}‖ )

(here ‖·‖ denotes the Euclidean norm), define the local sequences

θ^{(n)} = θ + n^{−1/2} h,  β^{(n)} = β + D̃_n^{−1} k,

where h is a (p + q + 1)-dimensional real vector, k a b-dimensional real vector, and u = (h′, k′)′ belongs to some open subset of ℝ^{b+p+q+1}. Also, denote by {ψ^{(n)}_k} and {ξ^{(n)}_k} the sequences resulting from substituting θ^{(n)} for θ in the definitions of ψ and ξ [(2.6) and (2.7)], respectively. The sequence of statistical experiments under study is

ℰ^{(n)} = ( ℝ^n, ℬ^n, { P^{(n)}_{θ,β;g} : (θ′, β′)′ ∈ Θ × ℝ^b } ),  n ∈ ℕ,

where ℬ^n denotes the Borel σ-field on ℝ^n and P^{(n)}_{θ,β;g} the joint distribution of ({ε_s, s ≤ 0}, Y_1, …, Y_n) characterized by the parameter value (θ′, β′)′ and the innovation density g. Denote by H^{(n)}_g(θ, β) the sequence of simple hypotheses {P^{(n)}_{θ,β;g}, n ∈ ℕ}. The log-likelihood ratio for H^{(n)}_g(θ^{(n)}, β^{(n)}) with respect to H^{(n)}_g(θ, β) takes the form [under H^{(n)}_g(θ, β)]

(2.12)  Λ^{(n)}_g(θ, β) = log( dP^{(n)}_{θ^{(n)},β^{(n)};g} / dP^{(n)}_{θ,β;g} )
        = Σ_{t=1}^{n} [ log g( ε_t + Σ_{ν=0}^{t−1} ψ^{(n)}_ν (Y_{t−ν} − X_{t−ν}′β^{(n)}) − Σ_{ν=0}^{t−1} ψ_ν (Y_{t−ν} − X_{t−ν}′β) + Σ_{r=0}^{∞} Σ_{μ=0}^{r} ( ψ^{(n)}_{μ+t} ξ^{(n)}_{r−μ} − ψ_{μ+t} ξ_{r−μ} ) ε_{−r} ) − log g(ε_t) ].

The same log-likelihood ratio would have been obtained from conditional likelihoods [associated with the distribution of (Y_1, …, Y_n) conditional upon the starting values ε_t, t ≤ 0], since the innovations ε_t have the same distribution under H^{(n)}_g(θ, β) as under H^{(n)}_g(θ^{(n)}, β^{(n)}). As we shall see, these starting values have no influence on the form (2.14) of the subsequent LAN result, hence on the asymptotic form of local experiments. Another way of handling the starting value problem is developed in Koul and Schick (1997), but it is hardly applicable here, due to the infinite-dimensional nature of the initial values.


Define the b × b matrix

(2.13)  W(θ) = [ W_1(θ)  O ; O  W_2(θ) ],

where W_1(θ) is the b_1 × b_1 matrix with (k, j)th entry

Γ(k − d) Γ(j − d) [(2k − 1)(2j − 1)]^{1/2} / [ (σ²/2π) |η(1)/φ(1)|² Γ(k − 2d) Γ(j − 2d) (k + j − 1 − 2d) ]

[Γ(·) stands for the classical gamma function], and W_2(θ) is the (b − b_1) × (b − b_1) matrix with (k, j)th entry

∫_{−π}^{π} f_θ(λ)^{−1} dM_{k+b_1, j+b_1}(λ).

We then have the following LAN result.

Proposition 1 (LAN). Suppose that (S1)–(S3) and (G1)–(G6) hold. Then the sequence of experiments ℰ^{(n)}, n ∈ ℕ, is locally asymptotically normal (LAN) and equicontinuous on compact subsets of Θ × ℝ^b. That is:

(i) For all (θ′, β′)′, the log-likelihood ratio (2.12) admits, under H^{(n)}_g(θ, β), as n → ∞, the asymptotic representation

(2.14)  Λ^{(n)}_g(θ, β) = (h′, k′) Δ^{(n)}_g(θ, β) − (1/2) [ σ² I(g) h′ Q(θ) h + I(g) k′ W(θ) k ] + o_P(1),

with the (b + p + q + 1)-dimensional random vector (the central sequence)

(2.15)  Δ^{(n)}_g(θ, β) = n^{−1/2} Σ_{t=1}^{n} ( (g′(Z_t)/g(Z_t)) Σ_{ν=1}^{t−1} grad ψ_ν(θ) e_{t−ν} ;  −D̃_n^{−1} (g′(Z_t)/g(Z_t)) Σ_{ν=0}^{t−1} ψ_ν X_{t−ν} )

(the first block is (p + q + 1)-dimensional, the second b-dimensional), the (p + q + 1) × (p + q + 1) matrix

(2.16)  Q(θ) = (4π)^{−1} ∫_{−π}^{π} grad log f_θ(λ) [grad log f_θ(λ)]′ dλ,

and the b × b matrix W(θ) defined in (2.13); Z_t = Z_t(θ, β), t = 1, …, n, stands for the approximate residual Z_t(θ, β) = Σ_{k=0}^{t−1} ψ_k (Y_{t−k} − X_{t−k}′β), and grad ψ_ν(θ) is given in (2.8).

(ii) Still under H^{(n)}_g(θ, β), as n → ∞, Δ^{(n)}_g(θ, β) is asymptotically normal, with mean 0 and covariance matrix Γ_g(θ), where

(2.17)  Γ_g(θ) = [ σ² I(g) Q(θ)  0 ; 0  I(g) W(θ) ].


(iii) For all n ∈ ℕ and all (θ′, β′)′ ∈ Θ × ℝ^b, the mapping u = (h′, k′)′ ↦ P^{(n)}_{θ^{(n)},β^{(n)};g} is continuous with respect to the variational distance.

See Section 4 for the proof.

3. Applications.

3.1. Hypothesis testing. As mentioned in the introduction, Proposition 1 is the key result for virtually all problems in asymptotic inference connected with the FARIMA model under study. We illustrate this fact with some examples. For a general theory of locally asymptotically optimal testing in LAN families, the reader is referred to Strasser (1985) or Le Cam (1986). The following is borrowed from Strasser (1985).

Let K_0 denote the intersection of Θ × ℝ^b with a linear subspace of ℝ^{b+p+q+1}, equipped with the norm ‖·‖_g associated with the covariance matrix (2.17), namely ‖u‖²_g = u′ Γ_g(θ) u. We consider the problem of testing the null hypothesis under which (θ′, β′)′ ∈ K_0 against the alternative (θ′, β′)′ ∉ K_0. Denote by Π_0(θ; g) the projection matrix (projection here means the orthogonal projection associated with the norm ‖·‖_g) mapping ℝ^{b+p+q+1} onto K_0, and by I the identity map (in ℝ^{b+p+q+1}). Put K_c = {u ∈ ℝ^{b+p+q+1} : ‖(I − Π_0(θ; g)) u‖_g = c}, c > 0.

Also, let E^{(n)}_u stand for the expectation under H^{(n)}_g((θ′, β′)′ + n^{−1/2} u). A sequence of tests {ϕ_n, n ∈ ℕ} is said to be asymptotically unbiased at level α ∈ (0, 1) for the testing problem just described if

lim sup_{n→∞} E^{(n)}_u(ϕ_n) ≤ α,  u ∈ K_0,  and  lim inf_{n→∞} E^{(n)}_u(ϕ_n) ≥ α,  u ∉ K_0.

Proposition 2. Suppose that (S1)–(S3) and (G1)–(G6) hold. Let α ∈ (0, 1) and choose k_α ∈ (0, ∞) such that

lim_{n→∞} P^{(n)}_{θ,β;g}( ‖(I − Π_0(θ; g)) Γ_g^{−1} Δ^{(n)}_g(θ, β)‖_g > k_α ) = α.

Then:

(i) the sequence of tests

(3.1)  ϕ*_n = 1 if ‖(I − Π_0(θ̂^{(n)}; g)) Γ_g^{−1} Δ^{(n)}_g(θ̂^{(n)}, β̂^{(n)})‖_g > k_α,  and  ϕ*_n = 0 if ‖(I − Π_0(θ̂^{(n)}; g)) Γ_g^{−1} Δ^{(n)}_g(θ̂^{(n)}, β̂^{(n)})‖_g ≤ k_α,

where θ̂^{(n)} and β̂^{(n)} are locally discrete [see, for instance, Le Cam and Yang (1990), page 60] estimators such that n^{1/2}(θ̂^{(n)} − θ) and D̃_n(β̂^{(n)} − β) are O_P(1) as n → ∞ under the null hypothesis [note that they also play a role in the definition of the matrix Γ_g = Γ_g(θ̂^{(n)})], is asymptotically unbiased at level α for the problem under study;

(ii) {ϕ*_n, n ∈ ℕ} is locally asymptotically maximin, in the sense that, denoting by {ϕ_n, n ∈ ℕ} any other sequence of asymptotically unbiased tests at level α for the same problem,

(3.2)  lim sup_{n→∞} inf_{u ∈ K_c} E^{(n)}_u(ϕ_n) ≤ lim inf_{n→∞} inf_{u ∈ K_c} E^{(n)}_u(ϕ*_n).

The result readily follows from Theorem 82.21 in Strasser (1985). Note that, when K_0 is the linear space spanned by the columns of a full-rank matrix B, the test statistic in (3.1) takes the form

‖(I − Π_0(θ̂^{(n)}; g)) Γ_g^{−1} Δ^{(n)}_g(θ̂^{(n)}, β̂^{(n)})‖²_g = Δ^{(n)}_g(θ̂^{(n)}, β̂^{(n)})′ [ Γ_g(θ̂^{(n)})^{−1} − B ( B′ Γ_g(θ̂^{(n)}) B )^{−1} B′ ] Δ^{(n)}_g(θ̂^{(n)}, β̂^{(n)}).

The local asymptotic optimality property (ii) of ϕ*_n also could be described in terms of local asymptotic stringency; see Le Cam [(1986), Chapter 11.9, Corollaries 2 and 3].

Let us give an explicit example. Let the innovation density g be Gaussian; the process {e_t} then is completely characterized by its spectral density (2.9). For simplicity, set σ² = 1 (in case σ² is unknown, as it usually is in practice, it always can be replaced by its empirical residual version, based on the estimates provided below). The assumptions of Proposition 2 are assumed to hold, with b_2 = 0 in (G6). From Theorem 2.4 of Yajima (1991), the BLUE β̂^{(n)}_{BL} = (X^{(n)′} Σ_n(θ)^{−1} X^{(n)})^{−1} X^{(n)′} Σ_n(θ)^{−1} Y^{(n)}, where Σ_n(θ) is the covariance matrix of e^{(n)}, and the LSE β̂^{(n)}_{LS} = (X^{(n)′} X^{(n)})^{−1} X^{(n)′} Y^{(n)} are asymptotically equivalent (namely, D̃_n(β̂^{(n)}_{BL} − β) = D̃_n(β̂^{(n)}_{LS} − β) + o_P(1)) as n → ∞. The MLE θ̂^{(n)}_{ML} of θ is obtained by maximizing

ℓ_n(θ) = −(1/2) log |Σ_n(θ)| − (1/2) (Y^{(n)} − X^{(n)} β̂^{(n)}_{LS})′ Σ_n(θ)^{−1} (Y^{(n)} − X^{(n)} β̂^{(n)}_{LS})

with respect to θ. As in Dahlhaus (1989), we can show that θ̂^{(n)}_{ML} →_P θ and that the mth element of n^{1/2} Q(θ)(θ̂^{(n)}_{ML} − θ) satisfies

[ n^{1/2} Q(θ)(θ̂^{(n)}_{ML} − θ) ]_m = n^{−1/2} [ e^{(n)′} A^{(m)} e^{(n)} − tr( Σ_n(θ) A^{(m)} ) ] + o_P(1),

where A^{(m)} = A^{(m)}(θ) is the n × n Toeplitz matrix with entries

A^{(m)}_{kj} = (8π²)^{−1} ∫_{−π}^{π} e^{i(k−j)λ} [ ∂f_θ(λ)/∂θ_m ] f_θ(λ)^{−2} dλ.
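The projected quadratic forms appearing in this section, of the generic type Δ′[Γ^{−1} − B(B′ΓB)^{−1}B′]Δ, can be sketched as a small numerical helper. This is an illustration under our own naming, not code from the paper:

```python
import numpy as np

def lan_test_statistic(delta, gamma, B):
    """Quadratic form delta' [Gamma^{-1} - B (B' Gamma B)^{-1} B'] delta,
    where delta plays the role of the central sequence, Gamma of its
    covariance matrix, and the columns of B span the null constraint
    space. Under the null it is asymptotically chi-square with
    dim(delta) - rank(B) degrees of freedom."""
    GB = gamma @ B
    M = np.linalg.inv(gamma) - B @ np.linalg.solve(B.T @ GB, B.T)
    return float(delta @ M @ delta)
```

By construction, the statistic vanishes when delta lies in the image of Gamma B (a shift inside the null space) and is nonnegative otherwise, since the middle matrix equals Gamma^{-1/2}(I − P)Gamma^{-1/2} for an orthogonal projection P.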


Denote by M(B) the linear space spanned by the columns of a matrix B. The problem consists in testing the null hypothesis H^{(n)}_0 under which

(θ − θ_0) ∈ M(B_1)  and  (β − β_0) ∈ M(B_2)

for some given (p + q + 1) × (p + q + 1 − ℓ_1) and b × (b − ℓ_2) matrices B_1 and B_2 of maximal ranks, and given vectors θ_0 ∈ ℝ^{p+q+1} and β_0 ∈ ℝ^b, respectively. Then Proposition 2 implies that the test rejecting H^{(n)}_0 whenever the test statistic

(3.3)  T_n = n (θ̂^{(n)}_{ML} − θ_0)′ [ Q(θ̂^{(n)}_{ML}) − Q(θ̂^{(n)}_{ML}) B_1 ( B_1′ Q(θ̂^{(n)}_{ML}) B_1 )^{−1} B_1′ Q(θ̂^{(n)}_{ML}) ] (θ̂^{(n)}_{ML} − θ_0)
        + (β̂^{(n)}_{LS} − β_0)′ D̃_n [ W̄ − W̄ D̃_n B_2 ( B_2′ D̃_n W̄ D̃_n B_2 )^{−1} B_2′ D̃_n W̄ ] D̃_n (β̂^{(n)}_{LS} − β_0),  with W̄ = W(θ̂^{(n)}_{ML})/2π,

exceeds the α-quantile χ²_{ℓ;α} of a chi-square distribution with ℓ = ℓ_1 + ℓ_2 degrees of freedom, has asymptotic level α and is locally asymptotically optimal in the sense of (3.2).

3.2. Discriminant analysis. Before turning to the discriminant analysis problem, we establish a general result on the asymptotic distribution of a class of statistics which plays a central role in this specific context, but also in

a variety of testing and estimation problems.

Let B = (B_{k,ℓ}) denote the n × n matrix with elements

B_{k,ℓ} = ∫_{−π}^{π} e^{i(k−ℓ)λ} f_{bb}(λ) dλ,

where f_{bb} is a real, even, integrable function defined on [−π, π] such that

(3.4)  n^{−1} tr[ (Σ_n(θ) B)² ] → (2π)³ ∫_{−π}^{π} f_θ(λ)² f_{bb}(λ)² dλ < ∞

as n → ∞. Let α_n = (α_1, …, α_n)′ be a nonrandom vector satisfying:

(T1) n^{−1/2} Σ_{r=1}^{n} α_r → 0;
(T2) there exists a spectral measure M_α(λ) such that

lim_{n→∞} Σ_{j=1}^{n} α_j α_{j+s} = ∫_{−π}^{π} e^{isλ} dM_α(λ)

and

(3.5)  α_n′ Σ_n(θ) α_n → 2π ∫_{−π}^{π} f_θ(λ) dM_α(λ) = c.

The class of statistics we are considering here (actually, a class of sequences of statistics) is

(3.6)  T_n = n^{−1/2} Σ_{k=1}^{n} Σ_{ℓ=1}^{n} [ e_k e_ℓ − R(k − ℓ) ] B_{k,ℓ} + α_n′ e^{(n)} + o_P(1),

where the o_P(1) term is under some sequence H^{(n)}_g(θ, β), as n → ∞, and R(k − ℓ) = E[e_k e_ℓ] [expectations, here and below, are taken under H^{(n)}_g(θ, β)]. The motivation for considering this class is as follows. For simplicity, assume that β = 0 and that the parameter θ is a scalar. Consider the discriminant analysis problem under which the observation Y^{(n)} is to be assigned either to population


as n → ∞, where ℱ^{(n)}_i denotes the σ-algebra generated by {ε_t, t ≤ 0}, Z_1, …, Z_{i−1}, Z_{i+1}, …, Z_n. See Section 4 for the proof.

Let a_n → 0, b_n → 0, c_n → ∞ and d_n = o(n) denote three sequences of positive real numbers and one sequence of positive integers (all convergences are as n → ∞). Denote by θ̄^{(n)}_0 a preliminary, root-n consistent estimate of θ, by ψ̂^{(n)}_ν and Ẑ^{(n)}_t = Z_t(θ̄^{(n)}_0) the corresponding estimated coefficients and residuals; let Ẑ^{(n)} = (Ẑ^{(n)}_{d_n}, Ẑ^{(n)}_{d_n+1}, …, Ẑ^{(n)}_n)′.

First, consider the case under which the unknown g can be assumed to be symmetric. Choosing the sequences a_n, b_n and c_n such that

(3.12)  n^{−1} a_n^{−3} b_n^{−1} → 0  as n → ∞,

define the estimator of the score function ϕ = −g′/g by

(3.13)  ϕ̂^{(n)}(z) = −[ f′_{N_n}(z; Ẑ^{(n)}) − f′_{N_n}(−z; Ẑ^{(n)}) ] / [ b_n + f_{N_n}(z; Ẑ^{(n)}) + f_{N_n}(−z; Ẑ^{(n)}) ],

with N_n = n − d_n + 1 and

(3.14)  f_n(z; y_1, …, y_n) = (n a_n)^{−1} Σ_{t=1}^{n} k( (z − y_t)/a_n ),  f′_n(z; y_1, …, y_n) = (n a_n²)^{−1} Σ_{t=1}^{n} k′( (z − y_t)/a_n ),

where the kernel k is assumed to satisfy [see Schick (1993)]:

(K) The kernel k is three times continuously differentiable, with derivatives satisfying |k^{(i)}(z)| ≤ c k(z), i = 1, 2, 3, for some positive c, and ∫_{−∞}^{∞} z² k(z) dz < ∞.

The technical advantages of using this type of kernel were noted first in Bickel and Klaassen (1986). We then have the following result.

Proposition 4. Assume that (S1)–(S3) hold, that g is symmetric, that θ̄^{(n)}_0 is root-n consistent, that the kernel k is symmetric and satisfies condition (K), and that the constants involved in (3.15) satisfy (3.12). Then the sequence

(3.15)  θ̂^{(n)} = θ̄^{(n)}_0 + ( Ĵ_n Q̂_n )^{−1} N_n^{−1} Σ_{t=d_n}^{n} Ḣ^{(n)}_t(θ̄^{(n)}_0) ϕ̂^{(n)}(Ẑ^{(n)}_t),

where

(3.16)  Ĵ_n = N_n^{−1} Σ_{t=d_n}^{n} [ ϕ̂^{(n)}(Ẑ^{(n)}_t) ]²  and  Q̂_n = N_n^{−1} Σ_{t=d_n}^{n} Ḣ^{(n)}_t(θ̄^{(n)}_0) Ḣ^{(n)}_t(θ̄^{(n)}_0)′,


is locally asymptotically minimax (LAM).

Proof. The proposition results from Koul and Schick's (1997) Theorem 5.2, which shows that, under conditions (i)–(vi) of Lemma 1,

n^{1/2} ( θ̂^{(n)} − θ ) = [ σ² I(g) Q(θ) ]^{−1} Δ^{(n)}_g(θ) + o_P(1)

under H^{(n)}_g(θ), as n → ∞. ✷



When the assumption of symmetry cannot be made, the construction of the adaptive estimator is slightly different. Using the same notation as above, we now assume that the constants a_n, b_n, c_n and the sequence m_n of Lemma 1 are such that

(3.17)  n^{−1} a_n^{−4} b_n^{−2} c_n² → 0  and  n^{−1} a_n^{−3} b_n^{−1} c_n² m_n → 0

as n → ∞. The estimator of the score function is

(3.18)  ϕ̂^{(n)}(z) = − f′_{N_n}(z; Ẑ^{(n)}) / [ b_n + f_{N_n}(z; Ẑ^{(n)}) ],

still with N_n = n − d_n + 1 and f_n, f′_n given in (3.14).

Proposition 5. Assume that (S1)–(S3) hold, that θ̄^{(n)}_0 is root-n consistent, that the kernel k satisfies condition (K), and that the constants involved in (3.15) satisfy (3.17). Then the sequence

(3.19)  θ̂^{(n)} = θ̄^{(n)}_0 + ( Ĵ_n Q̂_n )^{−1} N_n^{−1} Σ_{t=d_n}^{n} [ Ḣ^{(n)}_t(θ̄^{(n)}_0) − N_n^{−1} Σ_{s=d_n}^{n} Ḣ^{(n)}_s(θ̄^{(n)}_0) ] ϕ̂^{(n)}(Ẑ^{(n)}_t),

with Ĵ_n given in (3.16) and

(3.20)  Q̂_n = N_n^{−1} Σ_{t=d_n}^{n} [ Ḣ^{(n)}_t(θ̄^{(n)}_0) − N_n^{−1} Σ_{s=d_n}^{n} Ḣ^{(n)}_s(θ̄^{(n)}_0) ] [ Ḣ^{(n)}_t(θ̄^{(n)}_0) − N_n^{−1} Σ_{s=d_n}^{n} Ḣ^{(n)}_s(θ̄^{(n)}_0) ]′,

is locally asymptotically minimax (LAM).

The result again follows from Koul and Schick [(1997), Theorem 6.2]; here, conditions (i)–(vii) of Lemma 1 are required. The adaptive estimation of the lag parameter d in the FARIMA(0, d, 0) model (1 − L)^d X_t = ε_t has been treated in Hallin and Serroukh (1999).
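The kernel estimates (3.14) and a truncated score of the type (3.18) can be sketched as follows. A Gaussian kernel is used purely for illustration (it does not satisfy the global bound in condition (K)), and all names are our own, so this is an informal approximation of the construction, not the paper's exact estimator:

```python
import numpy as np

def kernel_score(z, resid, a_n, b_n):
    """Sketch of phi_hat(z) = - f'_N(z) / (b_n + f_N(z)) as in (3.18),
    with f_N and f'_N the Gaussian-kernel density estimate and its
    derivative built from the estimated residuals `resid`."""
    u = (z[:, None] - resid[None, :]) / a_n
    k = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)   # Gaussian kernel
    f = k.mean(axis=1) / a_n                           # f_N(z)
    fp = (-u * k).mean(axis=1) / a_n ** 2              # f'_N(z)
    return -fp / (b_n + f)
```

For standard normal residuals the true score −g′/g is the identity z, and the estimate recovers it approximately, shrunk slightly by the bandwidth smoothing and the truncation constant b_n.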

Table 1
Square roots of the mean square errors of the Gaussian maximum likelihood estimator φ̄^{(n)}_0 and the adaptive estimator φ̂^{(n)} of φ, in model (3.8), with d = 0.2 and centered exponential errors ε_t, for φ = ±0.5 and ±0.8, respectively. Series length: n = 40. Number of replications: 40.

φ         φ̄^{(n)}_0    φ̂^{(n)}
−0.8      0.10575      0.10549
−0.5      0.11061      0.10938
0.8       0.09663      0.09646
0.5       0.10419      0.10414

While the theoretical interest of adaptive estimation for long and very long time series is obvious from the LAM property of the resulting estimates, the practical virtues of adaptivity for short series lengths should be investigated by means of an extensive Monte Carlo study. Such an investigation is beyond the scope of this paper. A modest simulation nevertheless has been conducted and yields encouraging results, despite the very short length (n = 40) of the (non-Gaussian) series considered. Forty replications of FARIMA(1, d, 0) processes of length n = 40, characterized by (3.8), with centered exponential errors ε_t, have been generated for the parameter values d = 0.2 and φ = ±0.5 and ±0.8, respectively. For each replication, a LAM estimate φ̂^{(n)} of φ has been computed along the lines of Proposition 5. Dahlhaus' (1989) Gaussian MLE was used as the initial estimate θ̄^{(n)}_0 = (φ̄^{(n)}_0, d̄^{(n)}_0)′. The sequences a_n = b_n = n^{−1/16}, c_n = m_n = n^{1/8}, d_n = √n/2 and N_n = n − d_n + 1 were adopted, with the kernel k(u) = (3/(4√5))(1 − u²/5) I(|u| ≤ √5). Table 1 presents, for each value of φ, the square root of the mean square error computed over the 40 replications, for the Gaussian MLE φ̄^{(n)}_0 and the adaptive estimate φ̂^{(n)}, respectively.

3.4. Other inference problems. As already mentioned, the LAN result in Proposition 1 is the key to most inference problems for the long-memory FARIMA model (2.2). Further examples are:

1. Locally asymptotically minimax estimation; the construction of locally asymptotically minimax estimates in parametric models with specified innovation density g readily follows from Proposition 1 by applying the usual methods described, for example, in Section 5.3 of Le Cam and Yang (1990).
2. Maximum likelihood estimation; the asymptotic behavior of maximum likelihood estimators similarly follows from Proposition 1.
3. Rank-based testing; rank-based tests can be derived, along the same general lines as in Hallin and Puri (1994), from rank-based versions of the central sequences Δ^{(n)}_f(θ, β). This approach is adopted in Serroukh (1996).
4. Adaptive rank-based testing methods; adaptive rank-based tests (actually, permutation tests) enjoying the same (conditional) distribution-freeness


properties as rank tests but also yielding the uniformly optimal performance of adaptive tests, can be constructed in the same spirit as in the simpler case of AR models; see Hallin and Werker (1998) for a brief outline in the traditional short-memory AR(p) context.

4. Proofs.

4.1. Proof of Proposition 1. It is well known that

(1 − e^{iλ})^d = Σ_{k=0}^{∞} π_k e^{iλk}  and  (1 − e^{iλ})^{−d} = Σ_{k=0}^{∞} ρ_k e^{iλk},

where

π_k = Γ(k − d) / [ Γ(k + 1) Γ(−d) ]  and  ρ_k = Γ(k + d) / [ Γ(k + 1) Γ(d) ].

It follows that

(4.1)  π_k = O(k^{−1−d}),  ∂π_k/∂d = O(k^{−1−d} log k),  ∂²π_k/∂d² = O(k^{−1−d} (log k)²),
       ρ_k = O(k^{−1+d}),  ∂ρ_k/∂d = O(k^{−1+d} log k),  ∂²ρ_k/∂d² = O(k^{−1+d} (log k)²).
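The gamma-ratio expression for π_k can be cross-checked against the one-step recursion π_k = π_{k−1}(k − 1 − d)/k implied by the binomial expansion, and against the decay rate in (4.1). A small sketch using log-gamma (the helper and its reformulation via Γ(−d) = −Γ(1 − d)/d are our own):

```python
import math

def pi_k(d, k):
    """pi_k = Gamma(k - d) / (Gamma(k + 1) Gamma(-d)) for k >= 1 and
    0 < d < 1/2, rewritten with Gamma(-d) = -Gamma(1 - d)/d so that all
    log-gamma arguments are positive."""
    return -d * math.exp(math.lgamma(k - d) - math.lgamma(k + 1)
                         - math.lgamma(1.0 - d))

# pi_0 = 1, pi_k = pi_{k-1}(k - 1 - d)/k, and pi_k decays like
# k^{-1-d}: k^{1+d} pi_k -> -d / Gamma(1 - d) as k -> infinity.
```

The same device, with d replaced by −d, handles ρ_k = Γ(k + d)/[Γ(k + 1)Γ(d)].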

Considering the characteristic polynomials φ(z) and η(z) in (S1), φ(z)/η(z) is expanded as the absolutely convergent (for |z| < 1) power series

φ(z)/η(z) = Σ_{k=0}^{∞} γ_k(θ) z^k,

where the γ_k's are such that

(4.2)  γ_k(θ) = O( γ(θ)^k )

for some bounded, C² function γ(·) of θ satisfying |γ(θ)| < 1; compare Fuller (1996). Similarly, defining the sequence {α_k(θ)} by

η(z)/φ(z) = Σ_{k=0}^{∞} α_k(θ) z^k,  |z| < 1,

there exists a C² function α(θ) such that |α(θ)| < 1 and

(4.3)  α_k(θ) = O( α(θ)^k ).

It follows from (4.1) and (4.2) that the sequence {ψ_k} defined in (2.5) satisfies

(4.4)  ∂^j ψ_k / ( ∂θ_{i_1} ⋯ ∂θ_{i_j} ) = O( k^{−1−d} (log k)^j ),  j = 0, 1, 2,


as k → ∞. Similarly, for the sequence {ξ_k} defined in (2.7), it is easily shown that

(4.5)  ∂^j ξ_k / ( ∂θ_{i_1} ⋯ ∂θ_{i_j} ) = O( k^{−1+d} (log k)^j ),  j = 0, 1, 2,

as k → ∞.

Our proof of the LAN result in Proposition 1 is based on a slight modification [see, e.g., Garel and Hallin (1995)] of Swensen's (1985) Lemma 1 [itself based on McLeish (1974)], applicable to a wide class of models with dependent observations. Note that, from (2.12),

Λ^{(n)}_g(θ, β) = Σ_{t=1}^{n} log [ A^{(n)}_t(θ^{(n)}, β^{(n)}; θ, β) ]²,

with

[ A^{(n)}_t(θ^{(n)}, β^{(n)}; θ, β) ]² = g( ε_t + Σ_{ν=0}^{t−1} [ ψ^{(n)}_ν (Y_{t−ν} − X_{t−ν}′β^{(n)}) − ψ_ν (Y_{t−ν} − X_{t−ν}′β) ] + Σ_{r=0}^{∞} Σ_{μ=0}^{r} [ ψ^{(n)}_{t+μ} ξ^{(n)}_{r−μ} − ψ_{t+μ} ξ_{r−μ} ] ε_{−r} ) g(ε_t)^{−1}.

Define

U^{(n)}_t = A^{(n)}_t(θ^{(n)}, β^{(n)}; θ, β) − 1

and

W^{(n)}_t = (1/2) (g′(ε_t)/g(ε_t)) [ n^{−1/2} h′ Σ_{ν=1}^{t−1} grad ψ_ν(θ) e_{t−ν} − Σ_{ν=0}^{t−1} ψ_ν X_{t−ν}′ D̃_n^{−1} k ],

and denote by ℱ_t the σ-field generated by ε_k, k ≤ 0, Y_1, …, Y_t. In order to prove parts (i) and (ii) of Proposition 1, it is sufficient to check the following six conditions; see Swensen's Lemma 1 [note that (L1)–(L5) are Swensen's conditions (1.2)–(1.6), whereas (L6) is assumption (iii) in his Theorem 1]. All expectations and convergences are under H^{(n)}_g(θ, β), as n → ∞.

(L1) E[ W^{(n)}_t | ℱ_{t−1} ] = 0 a.s.;
(L2) lim_{n→∞} E[ Σ_{t=1}^{n} ( U^{(n)}_t − W^{(n)}_t )² ] = 0;
(L3) sup_n E[ Σ_{t=1}^{n} ( W^{(n)}_t )² ] < ∞;
(L4) max_{1≤t≤n} |W^{(n)}_t| = o_P(1);
(L5) Σ_{t=1}^{n} ( W^{(n)}_t )² − τ²/4 = o_P(1) for some positive constant τ;


(L6) Σ_{t=1}^{n} E[ ( W^{(n)}_t )² I( |W^{(n)}_t| > δ ) | ℱ_{t−1} ] = o_P(1) for some δ > 0 (I(·) stands for the indicator function).

We successively prove that (L1)–(L6) hold here. Condition (L1) immediately follows from the definition of W^{(n)}_t. In order to prove (L2), let

Q^{(n)}_t = Σ_{ν=0}^{t−1} ψ^{(n)}_ν (Y_{t−ν} − X_{t−ν}′β^{(n)}) − Σ_{ν=0}^{t−1} ψ_ν (Y_{t−ν} − X_{t−ν}′β) + Σ_{r=0}^{∞} Σ_{μ=0}^{r} [ ψ^{(n)}_{μ+t} ξ^{(n)}_{r−μ} − ψ_{μ+t} ξ_{r−μ} ] ε_{−r}
        = Q^{(n)}_{t,1} + Q^{(n)}_{t,2} + Q^{(n)}_{t,3},  say.

Expanding ψ^{(n)}_{μ+t} ξ^{(n)}_{r−μ} around θ yields

Σ_{μ=0}^{r} [ ψ^{(n)}_{μ+t} ξ^{(n)}_{r−μ} − ψ_{μ+t} ξ_{r−μ} ] = O(n^{−1/2}) [ Σ_{μ=0}^{r} grad ψ_{μ+t}′ ξ_{r−μ} ]_{θ=θ*} + O(n^{−1/2}) [ Σ_{μ=0}^{r} ψ_{μ+t} grad ξ_{r−μ}′ ]_{θ=θ*}
 = O(n^{−1/2}) A_1 + O(n^{−1/2}) A_2,  say,

where θ* is some intermediate point between θ and θ^{(n)}. In view of (4.4) and (4.5),

A_1 = Σ_{μ=0}^{⌊r/2⌋} O( (μ + t)^{−1−d} log(μ + t) ) O( (r − μ)^{−1+d} log(r − μ) ) + Σ_{μ=⌊r/2⌋+1}^{r} O( (μ + t)^{−1−d} log(μ + t) ) O( (r − μ)^{−1+d} log(r − μ) )
 = O(r^{−1+d}) O(t^{−d/2}) O(log r) Σ_{μ=0}^{⌊r/2⌋} O( (μ + t)^{−1−d/2} ) O( log μ + log t ) + O(r^{−1−d/2}) O(t^{−d/2}) O(log t) O(log r) Σ_{μ=⌊r/2⌋+1}^{r} O( (r − μ)^{−1+d} ).

Noting that Σ_{μ=0}^{r} μ^α = O(r^{α+1}) for α > −1, we have that

A_1 = O(t^{−d/2} log t) O(r^{−1+d} log r).

Similarly, it can be shown that A_2 is O(t^{−d/2} log t) O(r^{−1+d} log r), which implies

(4.6)  Σ_{μ=0}^{r} [ ψ^{(n)}_{μ+t} ξ^{(n)}_{r−μ} − ψ_{μ+t} ξ_{r−μ} ] = O(n^{−1/2}) O(t^{−d/2} log t) O(r^{−1+d} log r).


Hence, Q^{(n)}_{t,1} + Q^{(n)}_{t,2} can be written as

(4.7)  Σ_{ν=0}^{t−1} ( ψ^{(n)}_ν − ψ_ν ) e_{t−ν} − Σ_{ν=0}^{t−1} ψ_ν X_{t−ν}′ D̃_n^{−1} k − Σ_{ν=0}^{t−1} ( ψ^{(n)}_ν − ψ_ν ) X_{t−ν}′ D̃_n^{−1} k
     = n^{−1/2} Σ_{ν=1}^{t−1} h′ grad ψ_ν(θ) e_{t−ν} + (2n)^{−1} Σ_{ν=1}^{t−1} h′ [ ∂²ψ_ν / ∂θ_i ∂θ_j ]_{θ*} h e_{t−ν} − Σ_{ν=0}^{t−1} ψ_ν X_{t−ν}′ D̃_n^{−1} k − Σ_{ν=0}^{t−1} ( ψ^{(n)}_ν − ψ_ν ) X_{t−ν}′ D̃_n^{−1} k,

where θ* is some intermediate point between θ and θ^{(n)}. On account of (4.4), (4.6) and (4.7), and by Theorem 2.3 in Yajima (1991), we obtain that

(4.8)  lim sup_{n→∞} Σ_{t=1}^{n} E[ ( Q^{(n)}_t )² ] < ∞.

Next, let us show that

(4.9)  E^{(n)}_1 = Σ_{t=1}^{n} E[ ( U^{(n)}_t − (1/2) (g′(ε_t)/g(ε_t)) Q^{(n)}_t )² ] = o(1).

For every c_1 > 0, we have

E^{(n)}_1 = Σ_{t=1}^{n} E[ ( U^{(n)}_t − (1/2) (g′(ε_t)/g(ε_t)) Q^{(n)}_t )² I( n^{1/2} |Q^{(n)}_t| ≤ c_1 ) ] + Σ_{t=1}^{n} E[ ( U^{(n)}_t − (1/2) (g′(ε_t)/g(ε_t)) Q^{(n)}_t )² I( n^{1/2} |Q^{(n)}_t| > c_1 ) ]
 = E^{(n)}_{1,1} + E^{(n)}_{1,2},  say.

It follows from Lemma 2.2(ii) in Garel and Hallin (1995) that

E^{(n)}_{1,1} ≤ o_{c_1}(1) Σ_{t=1}^{n} E[ ( Q^{(n)}_t )² ],

where lim_{n→∞} o_{c_1}(1) = 0 for any given c_1 > 0. Hence, E^{(n)}_{1,1} also is o(1). Part (i) of the same lemma yields

(4.10)  E^{(n)}_{1,2} ≤ I(g) n^{−1} Σ_{t=1}^{n} E[ n ( Q^{(n)}_t )² I( n^{1/2} |Q^{(n)}_t| > c_1 ) ].

From Ash [(1972), page 297], { n ( Q^{(n)}_t )² } is uniformly integrable, which implies that the right-hand side in (4.10) converges to zero as c_1 → ∞; (4.9) follows.


Finally, one easily checks that

Σ_{t=1}^{n} E[ ( (1/2) (g′(ε_t)/g(ε_t)) Q^{(n)}_t − W^{(n)}_t )² ]
 = Σ_{t=1}^{n} E[ ( (1/2) (g′(ε_t)/g(ε_t)) { (2n)^{−1} Σ_{ν=1}^{t−1} h′ [ ∂²ψ_ν / ∂θ_i ∂θ_j ]_{θ*} h e_{t−ν} − Σ_{ν=0}^{t−1} ( ψ^{(n)}_ν − ψ_ν ) X_{t−ν}′ D̃_n^{−1} k } )² ]

also converges to zero as n → ∞. This completes the proof of (L2).

(L3). Note that

Σ_{t=1}^{n} E[ ( W^{(n)}_t )² ] = ( I(g)/4n ) Σ_{t=1}^{n} Σ_{ν_1=1}^{t−1} Σ_{ν_2=1}^{t−1} h′ grad ψ_{ν_1}(θ) R_e(ν_1 − ν_2) grad ψ_{ν_2}(θ)′ h + ( I(g)/4 ) Σ_{t=1}^{n} Σ_{ν_1=0}^{t−1} Σ_{ν_2=0}^{t−1} ψ_{ν_1} ψ_{ν_2} k′ D̃_n^{−1} X_{t−ν_1} X_{t−ν_2}′ D̃_n^{−1} k,

where R_e(ν_1 − ν_2) = E[ e_{t−ν_1} e_{t−ν_2} ]. Lemma 4.4 of Garel and Hallin (1995) and Theorem 2.3 of Yajima (1991), along with (4.4), imply that Σ_{t=1}^{n} E[ ( W^{(n)}_t )² ] → τ²/4 as n → ∞, where

τ² = σ² I(g) h′ Q(θ) h + I(g) k′ W(θ) k;

hence, (L3) is satisfied.

(L4). Turning to (L4), decompose W^{(n)}_t into W^{(n)}_{t,1} + W^{(n)}_{t,2}, with

(4.11)  W^{(n)}_{t,1} = ( 2√n )^{−1} (g′(ε_t)/g(ε_t)) Σ_{ν=1}^{t−1} h′ grad ψ_ν(θ) e_{t−ν}

and

(4.12)  W^{(n)}_{t,2} = −(1/2) (g′(ε_t)/g(ε_t)) Σ_{ν=0}^{t−1} ψ_ν X_{t−ν}′ D̃_n^{−1} k.

For every ε > 0,

P( max_{1≤t≤n} |W^{(n)}_t| > 2ε ) ≤ P( max_{1≤t≤n} |W^{(n)}_{t,1}| > ε ) + P( max_{1≤t≤n} |W^{(n)}_{t,2}| > ε ).

The Markov inequality implies that

(4.13)  P( max_{1≤t≤n} |W^{(n)}_{t,1}| > ε ) ≤ P( Σ_{t=1}^{n} ( W^{(n)}_{t,1} )² I( |W^{(n)}_{t,1}| > ε ) > ε² ) ≤ ε^{−2} n^{−1} Σ_{t=1}^{n} E[ n ( W^{(n)}_{t,1} )² I( n^{1/2} |W^{(n)}_{t,1}| > ε n^{1/2} ) ].

Just as { n ( Q^{(n)}_t )² }, { n ( W^{(n)}_{t,1} )² } is uniformly integrable, so that (4.13) is o(1) as n → ∞. In view of (G5), the case of W^{(n)}_{t,2} is entirely similar; (L4) follows.

(L5). For the sake of simplicity, write ψ̇_ν = ψ̇_ν(h) instead of h′ grad ψ_ν(θ). Then W^{(n)}_{t,1} = ( 2√n )^{−1} (g′(ε_t)/g(ε_t)) Σ_{ν=1}^{t−1} ψ̇_ν e_{t−ν}, with an ℓ¹-summable sequence {ψ̇_ν}. For any fixed ε > 0, we can choose M > 0 such that Σ_{ν=M}^{∞} |ψ̇_ν| E|e_{t−ν}| < ε. Note that the process (g′(ε_t)/g(ε_t)) Σ_{ν=1}^{∞} ψ̇_ν e_{t−ν} is ergodic. Theorem 2 of Hannan [(1970), page 203] thus entails

Σ_{t=1}^{n} ( W^{(n)}_{t,1} )² → (σ²/4) I(g) h′ Q(θ) h  a.s.

Similarly, showing that

E[ Σ_{t=1}^{n} ( W^{(n)}_{t,2} )² ] → (1/4) I(g) k′ W(θ) k  and  Var[ Σ_{t=1}^{n} ( W^{(n)}_{t,2} )² ] → 0,

we obtain

Σ_{t=1}^{n} ( W^{(n)}_{t,2} )² →_P (1/4) I(g) k′ W(θ) k  and  Σ_{t=1}^{n} W^{(n)}_{t,1} W^{(n)}_{t,2} →_P 0.

(L6). Since Σ_{t=1}^{n} E[ ( W^{(n)}_t )² I( |W^{(n)}_t| > δ ) | ℱ_{t−1} ] is a nonnegative variable, it is sufficient, in order to prove (L6), to show that

(4.14)  Σ_{t=1}^{n} E[ ( W^{(n)}_t )² I( |W^{(n)}_t| > δ ) ] = o(1)

as n → ∞. This, however, already has been shown in the proof of (L4). Finally, part (iii) of Proposition 1 readily follows from Scheffé's lemma and the continuity of g. ✷

4.2. Proof of Proposition 3. The proof relies on Proposition 1 and Le Cam's so-called third lemma. We just briefly outline it here. Giraitis and Surgailis (1990) established the asymptotic normality, under (3.4), as n → ∞, of

T^{(n)}_1 = n^{−1/2} Σ_{t=1}^{n} Σ_{s=1}^{n} [ e_t e_s − R(t − s) ] B(t − s).

In their proof, they approximate the sums T^{(n)}_{1,1} = n^{−1/2} Σ_{t=1}^{n} Σ_{s=1}^{n} e_t e_s B(t − s) with sequences of the form U^{(n)}_1 = n^{−1/2} Σ_{t=1}^{n} e_1(t) e_2(t), where e_i(t), i = 1, 2, are finite linear combinations of the ε_t's, involving 2M terms. For M sufficiently large, the approximation of T^{(n)}_{1,1} by U^{(n)}_1 can be made arbitrarily good. On the

2076

HALLIN, TANIGUCHI, SERROUKH AND CHOY

  n other hand, we can approximate n en by V1 = nt=1 αt tj=0 βj εt−j , where βj  is 52 -summable. The joint asymptotic normality of -n  Tn  then foln

n

lows from that of -n  U1 + V1 , which in turn follows from checking for McLeish’s classical conditions. Next, we need an evaluation of the variance of Tn , which is equal to   n n lim VarT11  + 2 CovT11  n en  + c n→∞

Giraitis and Surgailis (1990) give   π n 2 3 f λfbb λ dλ + µ4 2π lim VarT11  = 16π n→∞

−π

π

−π

2

f λfbb λ dλ



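For completeness, the "classical conditions" of McLeish invoked in the joint-normality step above can be summarized as follows; this is the martingale-difference-array form of his central limit theorem, restated here informally (the restatement is ours, not part of the original argument):

```latex
% McLeish (1974): CLT for a martingale difference array (X_{n,t}, \mathcal{F}_{n,t}).
% Negligibility of the largest term, boundedness of the maximal square,
% and convergence of the sum of squares together yield asymptotic normality
% of the row sums:
\[
  \max_{1\le t\le n}\lvert X_{n,t}\rvert \xrightarrow{P} 0, \qquad
  \sup_{n} E\Bigl[\max_{1\le t\le n} X_{n,t}^{2}\Bigr] < \infty, \qquad
  \sum_{t=1}^{n} X_{n,t}^{2} \xrightarrow{P} \eta^{2}
  \quad\Longrightarrow\quad
  \sum_{t=1}^{n} X_{n,t} \xrightarrow{d} N(0,\eta^{2}).
\]
```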
On the other hand, we have
\[
\operatorname{Cov}\bigl(T^{(n)}_{1,1}, (\boldsymbol\alpha^{(n)})'\mathbf e^{(n)}\bigr)
= n^{-1/2} \sum_{r=1}^n \sum_{k=1}^n \sum_{\ell=1}^n \alpha_r\, B(k-\ell)\, \operatorname{cum}(e_r, e_k, e_\ell)
\]
\[
= n^{-1/2} \sum_{r=1}^n \alpha_r \int_{-\pi}^{\pi}\int_{-\pi}^{\pi}\int_{-\pi}^{\pi}
D_n(\eta+\lambda)\, D_n(-\eta+\mu)\, e^{-ir\lambda - ir\mu}\,
f_{bb}(\lambda)\, f_{eee}(\lambda, \mu)\, d\lambda\, d\mu\, d\eta,
\tag{4.15}
\]
where $D_n(\lambda) = n^{-1/2} \sum_{k=1}^n e^{ik\lambda}$. Now, (4.15) reduces to
\[
n^{-1/2} \sum_{r=1}^n \alpha_r \Bigl(2\pi \int_{-\pi}^{\pi} f_{bb}(\eta)\, f_{eee}(-\eta, \eta)\, d\eta + o(1)\Bigr),
\]
a quantity which, by assumption (T1), tends to zero as $n \to \infty$. Finally, we evaluate the asymptotic covariance
\[
\operatorname{Cov}\bigl(\Delta^{(n)}, T^{(n)}\bigr)
= \operatorname{Cov}\Bigl(
n^{-1/2} \sum_{j=1}^n S_j \sum_{\nu=1}^{j-1} \bigl(\mathbf h' \operatorname{grad}\psi_\nu\bigr) e_{j-\nu}
- \mathbf k' \tilde D^{-1} \sum_{j=1}^n S_j \sum_{\nu=0}^{j-1} \psi_\nu\, Z^{(n)}_{j-\nu},\;
n^{-1/2} \sum_{k=1}^n \sum_{\ell=1}^n e_k e_\ell\, B(k-\ell) + (\boldsymbol\alpha^{(n)})'\mathbf e^{(n)}
\Bigr)
= \operatorname{Cov}\bigl((1)+(2),\, (3)+(4)\bigr),
\]
say.

The resulting four terms yield $m_1, \ldots, m_4$ in (3.7). We provide some details on the derivation of $m_1$; the three other terms are obtained along the same lines. Putting $\dot\psi_\nu(\mathbf h) = \mathbf h' \operatorname{grad}\psi_\nu$, we obtain
\[
\operatorname{Cov}\bigl((1), (3)\bigr)
= \operatorname{Cov}\Bigl(n^{-1/2} \sum_{j=1}^n S_j \sum_{\nu=1}^{j-1} \dot\psi_\nu(\mathbf h)\, e_{j-\nu},\;
n^{-1/2} \sum_{k=1}^n \sum_{\ell=1}^n e_k e_\ell\, B(k-\ell)\Bigr)
\]
\[
= n^{-1} \sum_{j=1}^n \sum_{\nu=1}^{j-1} \sum_{k=1}^n \sum_{\ell=1}^n
\dot\psi_\nu(\mathbf h)\, B(k-\ell)
\bigl[\operatorname{Cov}(S_j, e_k)\operatorname{Cov}(e_{j-\nu}, e_\ell)
+ \operatorname{Cov}(S_j, e_\ell)\operatorname{Cov}(e_{j-\nu}, e_k)\bigr]
= C^{(n)}_1 + C^{(n)}_2,
\]
say. The first term yields
\[
C^{(n)}_1 = \frac{2\pi}{n} \sum_{j=1}^n \sum_{k=1}^n
\int_{-\pi}^{\pi}\int_{-\pi}^{\pi}
\Bigl[\sum_{\nu=1}^{j-1} \dot\psi_\nu(\mathbf h)\, e^{i\lambda\nu}\Bigr]
e^{i\lambda(k-j)}\, f_{bb}(\lambda)\, f(\lambda)\, d\lambda\;
e^{i(k-j)\mu}\, f_{Se}(\mu)\, d\mu
\]
\[
= (2\pi)^2 \int_{-\pi}^{\pi}
\Bigl[\sum_{\nu=1}^{\infty} \dot\psi_\nu(\mathbf h)\, e^{i\lambda\nu}\Bigr]
f_{bb}(\lambda)\, f(\lambda)\, f_{Se}(-\lambda)\, d\lambda + o(1).
\]
The case of $C^{(n)}_2$ is quite similar, and it is easily shown that $C^{(n)}_2$ converges to the same limit as $C^{(n)}_1$. The proposition follows. ✷
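As a side illustration (ours, not part of the proof), the asymptotic normality of quadratic forms of the type $T^{(n)}_1$ can be checked numerically in the simplest degenerate case: i.i.d. standard normal $e_t$ with $R(k) = B(k) = \mathbf 1\{k=0\}$, for which $T^{(n)}_1$ collapses to $n^{-1/2}\sum_t (e_t^2 - 1)$ and is asymptotically $N(0,2)$. The sketch below simulates this toy case only; it does not reproduce the long-memory setting covered by Giraitis and Surgailis (1990).

```python
import math
import random
import statistics

def quadratic_form(e):
    """T1 = n^{-1/2} sum_{t,s} [e_t e_s - R(t-s)] B(t-s) with the trivial
    kernel B(k) = R(k) = 1{k=0}: reduces to n^{-1/2} * sum_t (e_t^2 - 1)."""
    n = len(e)
    return sum(x * x - 1.0 for x in e) / math.sqrt(n)

def simulate(n_obs=400, n_rep=300, seed=12345):
    """Draw n_rep replications of T1 for i.i.d. N(0,1) errors and return
    the empirical mean and variance of the replications."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_rep):
        e = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        draws.append(quadratic_form(e))
    return statistics.mean(draws), statistics.variance(draws)

if __name__ == "__main__":
    m, v = simulate()
    # For this toy case, E(T1) = 0 and Var(T1) = Var(e^2) = 2 exactly.
    print(m, v)
```

In the genuinely long-memory case the kernel $B$ and the covariance $R$ decay hyperbolically, and the double sum no longer collapses; the point of the simulation is only the centering-and-scaling mechanism.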

4.3. Proof of Lemma 1. We successively establish the seven statements of the lemma.

(i) Letting $\boldsymbol\theta_n - \boldsymbol\theta = n^{-1/2}(h_1, h_2)'$, we have
\[
\sum_{t=1}^n \bigl[H_t(\boldsymbol\theta_n) - H_t(\boldsymbol\theta) - (\boldsymbol\theta_n - \boldsymbol\theta)' \dot H_t(\boldsymbol\theta)\bigr]^2
= \frac{1}{4n^2} \sum_{t=1}^n \Bigl[\sum_{\nu=1}^{t-1} \sum_{i,j=1}^{2} h_i h_j\, \partial^2_{ij}\psi_\nu\bigl(\boldsymbol\theta + c(\boldsymbol\theta_n - \boldsymbol\theta)\bigr) X_{t-\nu}\Bigr]^2,
\]
where $c = c(t, X_1, \ldots, X_n) \in [0,1]$. Taking expectations and bounding $E(X_{t-\nu_1} X_{t-\nu_2})$ with $E(X_t^2)$ yields
\[
E\Bigl[\sum_{t=1}^n \bigl(H_t(\boldsymbol\theta_n) - H_t(\boldsymbol\theta) - (\boldsymbol\theta_n - \boldsymbol\theta)'\dot H_t(\boldsymbol\theta)\bigr)^2\Bigr]
\le \frac{1}{4n^2} \sum_{t=1}^n \Bigl[\sum_{\nu=1}^{t-1} \sum_{i,j=1}^2 |h_i h_j| \sup_{c\in[0,1]} \bigl|\partial^2_{ij}\psi_\nu\bigl(\boldsymbol\theta + c(\boldsymbol\theta_n - \boldsymbol\theta)\bigr)\bigr|\Bigr]^2 E\bigl(X_t^2\bigr),
\]
a quantity which, for $n$ sufficiently large, is less than
\[
\frac{K}{4n} \Bigl[\sum_{\nu=1}^\infty \sum_{i,j=1}^2 \bigl|\partial^2_{ij}\psi_\nu\bigr|\Bigr]^2,
\]
where $K$ is an appropriate constant. The result then follows from the fact that, in view of (4.4), $\sum_{\nu=1}^\infty \sum_{i,j=1}^2 |\partial^2_{ij}\psi_\nu| < \infty$.

(ii) This is an immediate consequence of (L4); see the proof of Proposition 1.

(iii) and (iv) For any $\mathbf h \in \mathbb R^2$, $\mathbf h'\dot H_t$ has (up to the choice of the score function, which here is $-g'(x)/g(x) = x$) the same form as $2n^{1/2} W^{(n)}_{t,1}$ appearing in (4.11) [proof of conditions (L4) and (L5) of Proposition 1]. The same ergodicity argument used there yields
\[
\frac1n \sum_{t=1}^n \mathbf h' \dot H_t \xrightarrow{P} E\bigl[\mathbf h'\dot H_t\bigr] = 0
\quad\text{and}\quad
\frac{1}{4n} \sum_{t=1}^n \mathbf h' \dot H_t \dot H_t' \mathbf h \longrightarrow \frac{\sigma^2}{4}\, \mathbf h' Q \mathbf h \quad \text{a.s.}
\]
The statement follows.

(v) A Taylor expansion of $\psi^{(n)}_\nu = \psi_\nu(\boldsymbol\theta_n)$ yields, for some $c \in [0,1]$,
\[
\bigl\|\dot H_t(\boldsymbol\theta_n) - \dot H_t(\boldsymbol\theta)\bigr\|^2
= \sum_{i=1}^2 \Bigl[\sum_{\nu=1}^{t-1} \Bigl(\frac{\partial}{\partial\theta_i}\psi_\nu(\boldsymbol\theta_n) - \frac{\partial}{\partial\theta_i}\psi_\nu(\boldsymbol\theta)\Bigr) X_{t-\nu}\Bigr]^2
= \sum_{i=1}^2 \Bigl[\sum_{\nu=1}^{t-1} n^{-1/2} \sum_{j=1}^2 h_j\, \frac{\partial^2}{\partial\theta_i \partial\theta_j}\psi_\nu\bigl(\boldsymbol\theta + c n^{-1/2}\mathbf h\bigr) X_{t-\nu}\Bigr]^2
\]
\[
\le \frac4n \|\mathbf h\|^2 \sum_{i,j=1}^2 \Bigl[\sum_{\nu=1}^{t-1} \Bigl|\frac{\partial^2}{\partial\theta_i \partial\theta_j}\psi_\nu\bigl(\boldsymbol\theta + c n^{-1/2}\mathbf h\bigr)\Bigr|\, |X_{t-\nu}|\Bigr]^2.
\]
Summing over $t$ and taking expectation yields, as in the proof of (i), an $O(1)$ quantity. The result follows.

(vi) This statement is an immediate consequence of (4.13).

(vii) From (3.10), we have
\[
\dot H_t(\boldsymbol\theta_n) = \sum_{k=0}^\infty \sum_{j=1}^{t-1} \operatorname{grad}\psi^{(n)}_j\, \xi^{(n)}_k\, Z^{(n)}_{t-j-k}.
\]
Under $H^{(n)}(\boldsymbol\theta_n)$, $Z^{(n)}_t$ coincides with $\varepsilon_t + \sum_{j=t}^\infty \psi^{(n)}_j X_{t-j}$. Since $\sum_{j=t}^\infty \psi^{(n)}_j X_{t-j}$ is measurable with respect to $\{\varepsilon_0, \varepsilon_{-1}, \ldots\}$,
\[
E_n\bigl[\dot H_t(\boldsymbol\theta_n) \bigm| \mathcal F^{(n)}_i\bigr]
- E_n\bigl[\dot H_t(\boldsymbol\theta_n) \bigm| \mathcal F^{(n)}_{i-1}\bigr]
= \begin{cases}
\displaystyle\sum_{k=0}^{t-i-1} \operatorname{grad}\psi^{(n)}_{t-i-k}\, \xi^{(n)}_k\, \varepsilon_i, & 1 \le i \le t,\\[4pt]
0, & t < i.
\end{cases}
\]
Since moreover $\sum_{k=0}^\infty \bigl(\xi^{(n)}_k\bigr)^2 < \infty$, and since, in view of (4.4), $\|\operatorname{grad}\psi_j\|^2 = O\bigl(j^{-1-d}(\log j)^2\bigr)$, it follows that
\[
n^{-1} \sum_{t=m_n}^{n} \sum_{i=1}^{t-m_n} E_n \Bigl\| \sum_{k=0}^{t-i-1} \operatorname{grad}\psi^{(n)}_{t-i-k}\, \xi^{(n)}_k\, \varepsilon_i \Bigr\|^2
= K n^{-1} \sum_{t=m_n}^{n} \sum_{i=1}^{t-m_n} \Bigl\| \sum_{k=0}^{t-i-1} \operatorname{grad}\psi^{(n)}_{t-i-k}\, \xi^{(n)}_k \Bigr\|^2
< K' n^{-1} \sum_{t=m_n}^{n} (t-m_n)\, O\bigl((t-m_n)^{-1-d} (\log(t-m_n))^2\bigr)
\le K'' \frac{n-m_n}{n},
\]

where $K$, $K'$ and $K''$ are adequate constants. This latter quantity is $o(a_n^2)$ provided that $m_n$ is such that the ratio $(n - m_n)/n$ goes to zero sufficiently fast, letting, for example, $m_n = n - o(n a_n^2)$. ✷

Acknowledgments. The authors gratefully acknowledge the careful reading and very pertinent remarks by an anonymous referee, which led to a substantial improvement of the initial version of this paper.

REFERENCES

Ash, R. B. (1972). Real Analysis and Probability. Academic Press, New York.
Benghabrit, Y. and Hallin, M. (1998). Locally asymptotically optimal tests for AR(p) against diagonal bilinear dependence. J. Statist. Plann. Inference 68 47–63.
Beran, J. (1994). Statistics for Long-Memory Processes. Chapman and Hall, New York.
Beran, J. (1995). Maximum likelihood estimation of the differencing parameter for invertible short and long memory autoregressive integrated moving average models. J. Roy. Statist. Soc. Ser. B 57 659–672.
Bickel, P. J. and Klaassen, C. A. J. (1986). Empirical Bayes estimation in functional and structural models, and uniformly adaptive estimation of location. Adv. in Appl. Math. 7 55–69.
Brillinger, D. R. (1981). Time Series: Data Analysis and Theory, 2nd ed. Holden-Day, San Francisco.
Dahlhaus, R. (1989). Efficient parameter estimation for self-similar processes. Ann. Statist. 17 1749–1766.
Drost, F. C., Klaassen, C. A. J. and Werker, B. J. M. (1997). Adaptive estimation in time series models. Ann. Statist. 25 786–818.
Fuller, W. A. (1996). Introduction to Statistical Time Series, 2nd ed. Wiley, New York.
Garel, B. and Hallin, M. (1995). Local asymptotic normality of multivariate ARMA processes with linear trend. Ann. Inst. Statist. Math. 47 551–579.
Giraitis, L. and Surgailis, D. (1990). A central limit theorem for quadratic forms in strongly dependent linear variables and its application to asymptotical normality of Whittle's estimate. Probab. Theory Related Fields 86 87–104.
Hallin, M. and Puri, M. L. (1988). Optimal rank-based procedures for time series analysis: testing an ARMA model against other ARMA models. Ann. Statist. 16 402–432.
Hallin, M. and Puri, M. L. (1994). Aligned rank tests for linear models with autocorrelated error terms. J. Multivariate Anal. 50 175–237.
Hallin, M. and Serroukh, A. (1999). Adaptive estimation of the lag of a long-memory process. Statist. Inference Stochastic Process. 2 1–19.
Hallin, M. and Werker, B. (1998). Optimal testing for semi-parametric AR models: from Lagrange multipliers to autoregression rank scores and adaptive tests. In Asymptotics, Nonparametrics and Time Series (S. Ghosh, ed.) 295–358. M. Dekker, New York.
Hannan, E. J. (1970). Multiple Time Series. Wiley, New York.
Hwang, S. Y. and Basawa, I. V. (1993). Asymptotic optimal inference for a class of nonlinear time series models. Stochastic Process. Appl. 46 91–113.
Jeganathan, P. (1995). Some aspects of asymptotic theory with applications to time series models. Econometric Theory 11 818–887.


Jeganathan, P. (1997). On asymptotic inference in linear cointegrated time series systems. Econometric Theory 13 692–745.
Koul, H. L. and Schick, A. (1997). Efficient estimation in nonlinear autoregressive time-series models. Bernoulli 3 247–277.
Kreiss, J. P. (1987). On adaptive estimation in stationary ARMA processes. Ann. Statist. 15 112–133.
Kreiss, J. P. (1990a). Testing linear hypotheses in autoregression. Ann. Statist. 18 1470–1482.
Kreiss, J. P. (1990b). Local asymptotic normality for autoregression with infinite order. J. Statist. Plann. Inference 26 185–219.
Le Cam, L. (1960). Locally asymptotically normal families of distributions. Univ. California Publ. Statist. 3 37–98.
Le Cam, L. (1986). Asymptotic Methods in Statistical Decision Theory. Springer, New York.
Le Cam, L. and Yang, G. L. (1990). Asymptotics in Statistics. Springer, New York.
McLeish, D. L. (1974). Dependent central limit theorems and invariance principles. Ann. Probab. 4 620–633.
Schick, A. (1993). On efficient estimation in regression models. Ann. Statist. 21 1486–1521.
Serroukh, A. (1996). Inférence asymptotique paramétrique et non paramétrique pour les modèles ARMA fractionnaires. Ph.D. thesis, Institut de Statistique, Univ. Libre de Bruxelles.
Sethuraman, S. and Basawa, I. V. (1997). The asymptotic distribution of the maximum likelihood estimator for a vector time series model with long memory dependence. Statist. Probab. Lett. 31 285–293.
Shumway, R. H. and Unger, A. N. (1974). Linear discriminant functions for stationary time series. J. Amer. Statist. Assoc. 69 948–956.
Strasser, H. (1985). Mathematical Theory of Statistics. W. de Gruyter, Berlin.
Swensen, A. R. (1985). The asymptotic distribution of the likelihood ratio for autoregressive time series with a regression trend. J. Multivariate Anal. 16 54–70.
Taniguchi, M. (1998). Statistical analysis based on functionals of nonparametric spectral density estimators. In Asymptotics, Nonparametrics and Time Series (S. Ghosh, ed.) 351–394. M. Dekker, New York.
Yajima, Y. (1985). On estimation of long memory time series models. Austral. J. Statist. 27 303–320.
Yajima, Y. (1991). Asymptotic properties of the LSE in a regression model with long memory stationary errors. Ann. Statist. 19 158–177.
Zhang, G. and Taniguchi, M. (1994). Discriminant analysis for stationary vector time series. J. Time Ser. Anal. 15 117–126.

M. Hallin
I.S.R.O., E.C.A.R.E.S.
and Département de Mathématique
Université Libre de Bruxelles
Brussels, B 1050
Belgium
E-mail: [email protected]

M. Taniguchi
Department of Mathematical Science
Osaka University
Toyonaka, Osaka 500
Japan
E-mail: [email protected]

A. Serroukh
Department of Mathematics
Imperial College of Science, Technology and Medicine
London
United Kingdom
E-mail: [email protected]

K. Choy
Komazawa University
Tokyo
Japan