ASYMPTOTICALLY OPTIMAL ESTIMATION IN MISSPECIFIED TIME SERIES MODELS

By R. Dahlhaus and W. Wefelmeyer

Universität Heidelberg and Universität Siegen
Abstract. A concept of asymptotically efficient estimation is presented when a misspecified parametric time series model is fitted to a stationary process. Efficiency of several minimum distance estimates is proved and the behavior of the Gaussian maximum likelihood estimate is studied. Furthermore, the behavior of estimates that minimize the h-step prediction error is discussed briefly. The paper answers, to some extent, the question of what happens when a misspecified model is fitted to time series data and one acts as if the model were true.

AMS 1991 subject classifications. Primary 62M10; secondary 62G20.
Key words and phrases. Time series, misspecified models, efficiency, minimum distance estimation, maximum likelihood, prediction.
1 Introduction

Let $X_1,\dots,X_n$ be a sample from a real-valued stationary process $\{X_t,\ t\in\mathbb{Z}\}$ with mean 0, spectral density $f(\lambda)$, $\lambda\in[-\pi,\pi]$, and covariance function $c(u)$, $u\in\mathbb{Z}$. Suppose for example that we want to make a one-step ahead prediction, and that for convenience we want to use an AR(p)-model (an autoregressive model of order $p$; for this model the predictor has a simple form), i.e. we use the model
\[ X_t + a_1 X_{t-1} + \dots + a_p X_{t-p} = \varepsilon_t, \]
where the $\varepsilon_t$ are iid with mean 0 and variance $\sigma^2$. The best linear predictor of $X_{n+1}$ in an AR(p)-model is
\[ \hat X_{n+1} = -\sum_{j=1}^{p} a_j X_{n+1-j} \tag{1.1} \]
(cf. Brockwell and Davis, 1987, p. 170, Example 5.3.1). The mean square prediction error under the true distribution of the process ($a_0 = 1$, $\theta = (a_1,\dots,a_p)$) is
\[ PE(\theta,f) = E\Big(\sum_{j=0}^{p} a_j X_{t-j}\Big)^2 = \sum_{j,k=0}^{p} a_j a_k\, c(k-j) = \int_{-\pi}^{\pi} f(\lambda)\,|A_\theta(\lambda)|^2\, d\lambda, \]
where $A_\theta(\lambda) = \sum_{j=0}^{p} a_j \exp(-i\lambda j)$. Here $PE(\theta,f)$ is a kind of distance between the true process and an AR(p)-process. Suppose now that we wish to estimate the parameter $\theta_0$ which leads to the best mean square prediction error (note that the model is misspecified and hence there exists no `true' value $\theta_0$), i.e. we want to estimate
\[ \theta_0 = \operatorname{argmin}_\theta PE(\theta, f). \]
A natural way to estimate $\theta_0$ is to replace the covariance function $c(u)$ or the spectral density $f(\lambda)$ by a nonparametric estimate and to minimize the resulting empirical function. For example, we may take as an estimate of $f(\lambda)$ the periodogram
\[ I_n(\lambda) = \frac{1}{2\pi n}\Big|\sum_{t=1}^{n} X_t \exp(-i\lambda t)\Big|^2 \]
and minimize
\[ PE(\theta, I_n) = \int_{-\pi}^{\pi} I_n(\lambda)\,|A_\theta(\lambda)|^2\, d\lambda. \]
This leads to the Yule-Walker estimate $\hat\theta_n$ of $\theta_0$. (Estimates with better small sample properties, such as maximum likelihood estimates and tapered Yule-Walker estimates, will be discussed in Section 4.) The innovation variance may be `estimated' by $\hat\sigma_n^2 = PE(\hat\theta_n, I_n)$; the corresponding theoretical value is $\sigma_0^2 = PE(\theta_0, f)$. We can proceed in a similar way if we wish to estimate those parameters of an AR(p)-model that lead to the best $h$-step ahead prediction error (cp. Section 4).

An equivalent approach to the minimization of the one-step ahead prediction error is to look for the model which is closest in the sense of the (asymptotic) Kullback-Leibler information divergence. For a Gaussian process and a Gaussian model this divergence has the form
\[ \frac{1}{4\pi}\int_{-\pi}^{\pi}\Big\{\log\frac{f_\theta(\lambda)}{f(\lambda)} + \frac{f(\lambda)}{f_\theta(\lambda)} - 1\Big\}\, d\lambda, \tag{1.2} \]
where $f_\theta(\lambda)$ is the spectral density of the model (cf. Pinsker, 1963; Parzen, 1982, 1992). Thus, the best fit is achieved by
\[ \theta_0 = \operatorname{argmin}_\theta D(\theta, f), \tag{1.3} \]
where now
\[ D(\theta,f) = \frac{1}{4\pi}\int_{-\pi}^{\pi}\Big\{\log f_\theta(\lambda) + \frac{f(\lambda)}{f_\theta(\lambda)}\Big\}\, d\lambda. \tag{1.4} \]
It may be estimated by
\[ \hat\theta_n = \operatorname{argmin}_\theta D(\theta, I_n). \]
$\hat\theta_n$ is called the Whittle estimate (Whittle, 1952). For AR(p)-models we have $f_\theta(\lambda) = \frac{\sigma^2}{2\pi}\big|\sum_{j=0}^{p} a_j \exp(-i\lambda j)\big|^{-2}$ (we now set $\theta = (a_1,\dots,a_p,\sigma^2)$). Since Kolmogorov's formula gives
\[ \frac{1}{4\pi}\int_{-\pi}^{\pi}\log f_\theta(\lambda)\, d\lambda = \frac{1}{2}\log\frac{\sigma^2}{2\pi} \]
(cf. Brockwell and Davis, 1987, p. 184, Theorem 5.8.1), the values $\theta_0$ and $\hat\theta_n$ are exactly the same as the values obtained by minimization of the one-step ahead prediction error. In addition, $\sigma_0^2$ and $\hat\sigma_n^2$ are now also obtained as solutions of this minimization problem. Since $D(\theta, I_n)$ converges uniformly to $D(\theta, f)$ (see (3.1)), we immediately get that $\hat\theta_n$ is a consistent estimate of $\theta_0$.

In this paper we prove that $\hat\theta_n$ is also efficient if the true underlying process is Gaussian (Theorem 3.2). The same holds for the Gaussian maximum likelihood estimate (Theorem 3.3). This is rather surprising since the MLE is in general not an efficient estimate for the `best' approximating parameter if the model is misspecified (where `best' is meant in the sense of the Kullback-Leibler distance). As an example consider the situation where the true distribution of the process is AR(p) with $\varepsilon_t$ following an unknown distribution. Our efficiency result is proved by considering the best fit $\theta_0$ as a functional $\theta_0 = T(f)$ of the unknown spectral density $f$ and then applying a nonparametric version of the convolution theorem of Hájek (1970). Other functionals and estimates are treated by Hasminskii and Ibragimov (1986) and Ginovyan (1988). Their efficiency concept is based on a local asymptotic minimax theorem rather than a convolution theorem. Note that in our setting the true spectral density lies outside the parametric model. Furthermore, it does not approach the parametric model asymptotically. Hence our setting is not covered by the general results of Millar (1984) on the optimality of minimum distance estimates. Efficiency in our sense is considered by Beran (1977) for i.i.d. observations and the Hellinger distance, and by Greenwood and Wefelmeyer (1993) for Markov chains and the Kullback-Leibler distance.
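To make these estimates concrete, the following minimal numerical sketch (an illustration only; the function names, the placeholder data and the Riemann-sum discretizations are assumptions of this sketch, using NumPy and SciPy) computes the periodogram and then minimizes $PE(\theta, I_n)$ and $D(\theta, I_n)$ for an AR($p$) working model.

```python
import numpy as np
from scipy.optimize import minimize

def periodogram(x):
    """Periodogram I_n(lambda_j) = |sum_t x_t e^{-i lambda_j t}|^2 / (2 pi n) at the Fourier frequencies."""
    n = len(x)
    lam = 2.0 * np.pi * np.arange(n) / n               # frequencies in [0, 2 pi); integrals below use periodicity
    I = np.abs(np.fft.fft(x)) ** 2 / (2.0 * np.pi * n)
    return lam, I

def transfer(a, lam):
    """A_theta(lambda) = sum_{j=0}^p a_j e^{-i j lambda} with a_0 = 1 and theta = (a_1, ..., a_p)."""
    coeffs = np.concatenate(([1.0], np.asarray(a, dtype=float)))
    return np.exp(-1j * np.outer(lam, np.arange(len(coeffs)))) @ coeffs

def pe_empirical(a, lam, I):
    """PE(theta, I_n) = int I_n(lambda) |A_theta(lambda)|^2 d lambda, approximated by a Riemann sum."""
    return 2.0 * np.pi * np.mean(I * np.abs(transfer(a, lam)) ** 2)

def whittle_objective(theta, lam, I):
    """D(theta, I_n) = (1/4 pi) int { log f_theta + I_n / f_theta } d lambda for the AR(p) model,
    with f_theta(lambda) = sigma^2 / (2 pi |A_theta(lambda)|^2) and theta = (a_1, ..., a_p, sigma^2)."""
    a, sigma2 = theta[:-1], theta[-1]
    if sigma2 <= 0:
        return np.inf                                  # keep the innovation variance positive
    f_theta = sigma2 / (2.0 * np.pi * np.abs(transfer(a, lam)) ** 2)
    return 0.5 * np.mean(np.log(f_theta) + I / f_theta)   # (1/4 pi) * (2 pi / n) * sum

# Fit a (possibly misspecified) AR(2) working model to a placeholder series x.
rng = np.random.default_rng(0)
x = rng.standard_normal(512)
lam, I = periodogram(x)

res_yw = minimize(pe_empirical, x0=np.zeros(2), args=(lam, I))
theta_hat = res_yw.x                                   # minimizes the empirical prediction error
sigma2_hat = pe_empirical(theta_hat, lam, I)           # 'estimated' innovation variance

res_wh = minimize(whittle_objective, x0=np.array([0.0, 0.0, 1.0]),
                  args=(lam, I), method="Nelder-Mead")
theta_whittle = res_wh.x                               # (a_1, a_2, sigma^2) minimizing D(theta, I_n)
```

As noted above via Kolmogorov's formula, the two criteria lead to the same autoregressive coefficients for this working model, up to discretization and optimization error.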
More generally, we consider in this paper distance functions of the form
\[ D(\theta,f) = \int_{-\pi}^{\pi} K\big(\theta, f(\lambda), \lambda\big)\, d\lambda. \]
We set $T(f) := \operatorname{argmin}_\theta D(\theta,f)$ and make the following assumption.

(1.1) Assumption $\Theta \subseteq \mathbb{R}^k$ is compact and $K : \Theta \times (0,\infty) \times [-\pi,\pi] \to \mathbb{R}$ is three times differentiable in $(\theta,x)$ with continuous derivatives in $(\theta,x,\lambda)$. For the true spectral density $f$ we assume that $T(f)$ exists, is unique and lies in the interior of $\Theta$.

One would usually consider distance functions with $T(f_\theta) = \theta$. However, we do not need this assumption. Taniguchi (1987) considers distance functions of the special type $K(\theta, f(\lambda), \lambda) = K\big(f(\lambda)/f_\theta(\lambda)\big)$, where $f_\theta(\lambda)$ is the spectral density of the model. An important distance function which is not of this form is the $h$-step prediction error (cf. Section 4). If we take $K\big(f(\lambda)/f_\theta(\lambda)\big)$ as the distance function, then Assumption 1.1 is fulfilled if $K(x)$ is three times continuously differentiable with a unique minimum at $x = 1$ and the model spectral densities $f_\theta$ fulfill

(1.2) Assumption $\Theta \subseteq \mathbb{R}^k$ is compact and the model spectral density $f_\theta(\lambda)$ is three times continuously differentiable with respect to $\theta$, with continuous (in $\theta$ and $\lambda$) derivatives.
Note that we do not assume $f_{\theta_1} \neq f_{\theta_2}$ for $\theta_1 \neq \theta_2$. Instead we assume that $T(f)$ exists uniquely. This is of importance when only part of the parameters are estimated (as is the case for the prediction error, where $\sigma^2$ is not estimated by minimizing a distance function). The assumptions on the observed process are

(1.3) Assumption $\{X_t,\ t\in\mathbb{Z}\}$ is a Gaussian stationary process with $EX_t = 0$ and spectral density $f \in \operatorname{Lip}_\alpha$ ($\alpha$ is specified below).

As estimates of $T(f)$ we consider in this paper $T(I_n)$ and $T(\hat f_n)$, where
\[ \hat f_n(\lambda) := \int_{-\pi}^{\pi} I_n(\lambda + \mu)\, W_n(\mu)\, d\mu \tag{1.5} \]
is a kernel estimate of $f$. For the kernel we need the following.
(1.4) Assumption $W_n(\mu) = m\,W(m\mu)$, where $n^{1/4} \ll m \ll n^{1/2}$, and $W$ is bounded and non-negative with $W(x) = 0$ for $|x| > c$ and $\int_{-c}^{c} W(x)\, dx = 1$. The Fourier transform $\hat W(x)$ is assumed to be continuous with $\int_{-\infty}^{\infty} |\hat W(x)|\, dx < \infty$.
The above assumptions are discussed after Lemma A.7. The use of $\hat f_n$ instead of $I_n$ is important for distance functions that are not linear in $f$, since in this case $D(\theta, I_n)$ will usually not converge to $D(\theta, f)$, with the consequence that $T(I_n)$ is not a consistent estimate of $T(f)$. Taniguchi (1987) has proved asymptotic normality of $T(\hat f_n)$ for general stationary time series, and efficiency when the model is correctly specified. Asymptotic normality of $T(I_n)$ for the Whittle distance has been proved by several authors. We mention Whittle (1952), Walker (1964), Dzhaparidze (1971), Hannan (1973), Hosoya and Taniguchi (1982), Hosoya (1989). We also study the efficiency of the exact Gaussian maximum likelihood estimate under model misspecification. The asymptotic distribution of this estimate was derived under model misspecification by Ogata (1980).

In particular, we prove that $T(\hat f_n)$ is an efficient estimate of $T(f)$ even if the model is misspecified. $T(I_n)$ turns out to be efficient if the distance function is linear in $f$, while the MLE is only efficient if the Kullback-Leibler divergence is used as the distance function. The asymptotic variance bound is derived in Section 2. Efficiency of several estimates is proved in Section 3. The results are discussed together with some examples in Section 4. In particular, we consider the $h$-step prediction error in some detail. Some technical lemmas are put into an appendix.
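A corresponding sketch of the kernel estimate (1.5), with a box kernel and a bandwidth rule that are merely one illustrative choice compatible with Assumption 1.4 ($m$ between $n^{1/4}$ and $n^{1/2}$), not a prescription of the paper:

```python
import numpy as np

def kernel_spectral_estimate(x, m=None):
    """f_hat_n(lambda) = int I_n(lambda + mu) W_n(mu) d mu with W_n(mu) = m W(m mu);
    here W is the box kernel W(u) = 1/2 on [-1, 1], i.e. a local average of the periodogram."""
    n = len(x)
    if m is None:
        m = int(round(n ** 0.4))                     # some rate with n^{1/4} << m << n^{1/2}
    I = np.abs(np.fft.fft(x)) ** 2 / (2.0 * np.pi * n)
    # W_n is supported on [-1/m, 1/m]; on the Fourier grid (spacing 2 pi / n) that is about k points each side.
    k = max(1, int(round(n / (2.0 * np.pi * m))))
    window = np.ones(2 * k + 1) / (2 * k + 1)        # discrete box kernel, sums to 1
    f_hat = np.convolve(np.concatenate((I[-k:], I, I[:k])), window, mode="valid")
    return f_hat                                     # f_hat[j] estimates f at lambda_j = 2 pi j / n
```

Plugging $\hat f_n$ rather than $I_n$ into $T(\cdot)$ matters precisely for the nonlinear distance functions mentioned above.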
2 The asymptotic variance bound

In this section we derive a lower bound for the asymptotic variance of `regular' estimates of the functional $T(f)$ introduced in Section 1. Let $X_1,\dots,X_n$ be a sample from a real-valued stationary Gaussian process with mean 0 and unknown spectral density $f$. The distribution of the process is determined by the spectral density. Hence we may consider $f$ as an infinite-dimensional `parameter' of the distribution. We want to prove that certain estimates of $T(f)$ are efficient. In a first step, we prove that the true model is locally asymptotically normal,
LAN. If the spectral density were to depend on a finite-dimensional parameter $\theta$, we would consider the likelihood ratio corresponding to $\theta$ and a nearby parameter $\theta + \frac{1}{\sqrt n}h$, with $h$ an arbitrary vector, the so-called local parameter. Local asymptotic normality in this case was first proved by Davies (1973). In our case, however, the spectral density is completely unknown. Hence we fix a spectral density $f$ and consider a nearby spectral density of the form $f(\lambda)\big(1 + \frac{1}{\sqrt n}h(\lambda)\big)$, with $h$ an arbitrary (bounded) function. The function $h$ now plays the role of the local parameter.

Let $L_2$ denote the space of functions on $(-\pi,\pi)$ which are square-integrable with respect to Lebesgue measure. Introduce the inner product
\[ \langle h, k\rangle = \frac{1}{4\pi}\int_{-\pi}^{\pi} h(\lambda)\,k(\lambda)\, d\lambda \]
and the norm $\|h\|^2 = \langle h, h\rangle$ on $L_2$. For a bounded function $h(\lambda)$ set
\[ f_{nh}(\lambda) = f(\lambda)\Big(1 + \frac{1}{\sqrt n}\, h(\lambda)\Big). \]
Furthermore, let
\[ \Sigma_n(g) = \Big(\int_{-\pi}^{\pi} g(\lambda)\,\exp\big(i\lambda(r-s)\big)\, d\lambda\Big)_{r,s=1,\dots,n} \]
be the Toeplitz matrix of $g$ and $X_n = (X_1,\dots,X_n)'$. If $g$ is a vector function, $\Sigma_n(g)$ is the corresponding vector of matrices. For example, by $\|\Sigma_n(\nabla f_\theta)\|$ we mean
\[ \Big(\sum_{i=1}^{k}\Big\|\Sigma_n\Big(\frac{\partial}{\partial\theta_i}f_\theta\Big)\Big\|^2\Big)^{1/2}, \]
where $\|A\|$ is the spectral norm of $A$ (cp. the appendix). The following result is due to Dzhaparidze (1986).

(2.1) Theorem Let $f(\lambda)$ be positive, bounded and bounded away from 0.
Let $P_{nh}$ be the distribution of the observations $X_1,\dots,X_n$ of a Gaussian stationary process with mean 0 and spectral density $f_{nh} = f\big(1 + \frac{1}{\sqrt n}h\big)$, where $h(\lambda)$ is bounded. Then we have under the law $P_{n0}$
\[ \log\frac{dP_{nh}}{dP_{n0}} - \Big(Z_n(h) - \frac{1}{2}\langle h, h\rangle\Big) \xrightarrow{P} 0, \]
where
\[ Z_n(h) = \frac{1}{2\sqrt n}\Big\{X_n'\,\Sigma_n(f)^{-1}\,\Sigma_n(hf)\,\Sigma_n(f)^{-1}X_n - \frac{n}{2\pi}\int_{-\pi}^{\pi} h(\lambda)\, d\lambda\Big\}. \]
(Note that $Z_n(h)$ can be written in the form $\langle h, Z_n^0\rangle$.) Furthermore,
\[ Z_n(h) \xrightarrow{\mathcal{D}} N\big(0, \langle h, h\rangle\big), \]
i.e. the sequence $\{P_{nh}\}$ is LAN.

Proof. See Dzhaparidze (1986), p. 64, Section I.3, Theorem 4(3) and p. 155, Section II, Theorem A1.2.
The following result is also due to Dzhaparidze (1986), p. 64, Section I.3, Theorem 4(4). We prove it under slightly different conditions.

(2.2) Theorem Suppose $h$ is bounded and $f \in \operatorname{Lip}_\alpha$ with $\alpha > 1/2$. Then
\[ \frac{\sqrt n}{4\pi}\int_{-\pi}^{\pi}\frac{I_n(\lambda) - f(\lambda)}{f(\lambda)}\, h(\lambda)\, d\lambda - Z_n(h) \xrightarrow{P} 0. \]

Proof. Lemma A.7(iv) implies
\[ \frac{\sqrt n}{4\pi}\int_{-\pi}^{\pi}\frac{EI_n(\lambda) - f(\lambda)}{f(\lambda)}\, h(\lambda)\, d\lambda = o(1). \]
Since $\int_{-\pi}^{\pi} I_n(\lambda)g(\lambda)\, d\lambda = \frac{1}{2\pi n}X_n'\,\Sigma_n(g)\,X_n$, the variance of the above expression is equal to
\[ \frac{1}{4n}\,\mathrm{var}\Big[X_n'\Big\{\Sigma_n\Big(\frac{h}{4\pi^2 f}\Big) - \Sigma_n(f)^{-1}\Sigma_n(hf)\,\Sigma_n(f)^{-1}\Big\}X_n\Big]. \]
If $X$ is Gaussian with covariance matrix $\Sigma$, then $\mathrm{var}(X'AX) = \mathrm{tr}(\Sigma A\Sigma A) + \mathrm{tr}(\Sigma A\Sigma A')$. Therefore, with $\Sigma_f = \Sigma_n(f)$, this variance is equal to
\[ \frac{1}{2n}\Big[\mathrm{tr}\Big\{\Sigma_n\Big(\frac{h}{4\pi^2 f}\Big)\Sigma_f\,\Sigma_n\Big(\frac{h}{4\pi^2 f}\Big)\Sigma_f\Big\} - 2\,\mathrm{tr}\Big\{\Sigma_n\Big(\frac{h}{4\pi^2 f}\Big)\Sigma_n(hf)\Big\} + \mathrm{tr}\Big\{\Sigma_n(hf)\,\Sigma_f^{-1}\,\Sigma_n(hf)\,\Sigma_f^{-1}\Big\}\Big], \]
which tends to zero with Lemma A.3.
We now derive a lower bound for the asymptotic variance of `regular' estimates of $T(f)$. Local asymptotic normality, see Theorem 2.1, induces the norm $\langle h, h\rangle$ on the local parameter space. The norm determines how difficult it is, asymptotically, to distinguish between $f$ and $f\big(1 + \frac{1}{\sqrt n}h\big)$ on the basis of a sample $X_1,\dots,X_n$.

Consider now the problem of estimating the functional $T(f)$. The convolution theorem says that a variance bound for `regular' estimates of $T(f)$ is given by the squared length of the gradient of the functional in terms of the inner product $\langle h, k\rangle$. The gradient is given in Corollary 2.4 below.

We need the following Taylor expansion, which will also be used in Section 3. Let $f_n$ be a spectral density which converges to $f$. Let us assume for the moment that $T(f_n)$ is also in the interior of $\Theta$. Let $\nabla K$ denote the derivative of $K(\theta,x,\lambda)$ with respect to $\theta$, and $K'$ the derivative with respect to $x$. We obtain, with $K_\lambda(\theta,x) = K(\theta,x,\lambda)$,
\[ 0 = \nabla D(T(f_n), f_n) = \nabla D(T(f), f_n) + \Big\{\int_{-\pi}^{\pi}\nabla^2 K_\lambda\big(T(f), f(\lambda)\big)\, d\lambda\Big\}\big(T(f_n) - T(f)\big) + \Big\{\int_{-\pi}^{\pi}\nabla^2 K'_\lambda\big(T(f), \tilde f(\lambda)\big)\big(f_n(\lambda) - f(\lambda)\big)\, d\lambda\Big\}\big(T(f_n) - T(f)\big) + \frac{1}{2}\big(T(f_n) - T(f)\big)'\Big\{\int_{-\pi}^{\pi}\nabla^3 K_\lambda\big(\tilde t, f_n(\lambda)\big)\, d\lambda\Big\}\big(T(f_n) - T(f)\big), \tag{2.1} \]
where $|\tilde f(\lambda) - f(\lambda)| \le |f_n(\lambda) - f(\lambda)|$ and $|\tilde t - T(f)| \le |T(f_n) - T(f)|$. Furthermore,
\[ \nabla D(T(f), f_n) = \nabla D(T(f), f) + \int_{-\pi}^{\pi}\nabla K'_\lambda\big(T(f), f(\lambda)\big)\big(f_n(\lambda) - f(\lambda)\big)\, d\lambda + \frac{1}{2}\int_{-\pi}^{\pi}\nabla K''_\lambda\big(T(f), \tilde f(\lambda)\big)\big(f_n(\lambda) - f(\lambda)\big)^2\, d\lambda. \tag{2.2} \]
Note that $\nabla D(T(f), f) = 0$. As a first consequence we obtain the following result. (A similar result was proved by Taniguchi, 1987, Theorem 1(b) and Theorem 2, in the special case $K(\theta, f(\lambda), \lambda) = K\big(f(\lambda)/f_\theta(\lambda)\big)$.)

(2.3) Theorem Suppose that $f_n$ is a sequence with $\|f_n - f\| \to 0$ and $0 < C_1 \le f_n, f \le C_2$. Then we have

(i) $T(f_n) \to T(f)$.
(ii) If
\[ H_f := \int_{-\pi}^{\pi}\nabla^2 K\big(T(f), f(\lambda), \lambda\big)\, d\lambda \]
is a nonsingular matrix, then we have, with
\[ g_f(\lambda) = -4\pi\,H_f^{-1}\,\nabla K'\big(T(f), f(\lambda), \lambda\big)\, f(\lambda), \]
\[ T(f_n) - T(f) = \frac{1}{4\pi}\int_{-\pi}^{\pi} g_f(\lambda)\,\frac{f_n(\lambda) - f(\lambda)}{f(\lambda)}\, d\lambda + O\big(\|f_n - f\|^2\big). \]

Proof. (i) is exactly analogous to Theorem 1(b) in Taniguchi (1987).

(ii) Since $T(f_n) \to T(f)$, the value $T(f_n)$ lies in the interior of $\Theta$ for $n$ large enough. Furthermore, $f_n$ and $f$ are bounded from above and below. Therefore, all derivatives of $K$ in the above Taylor expansion are bounded and we obtain
\[ T(f_n) - T(f) + O\big(|T(f_n) - T(f)|\,\|f_n - f\|\big) + O\big(|T(f_n) - T(f)|^2\big) = \frac{1}{4\pi}\int_{-\pi}^{\pi} g_f(\lambda)\,\frac{f_n(\lambda) - f(\lambda)}{f(\lambda)}\, d\lambda + O\big(\|f_n - f\|^2\big), \]
which together with part (i) implies the result. As a consequence we obtain
(2.4) Corollary Let $h$ be bounded and $f_{nh} = f\big(1 + \frac{1}{\sqrt n}h\big)$. Then
\[ \sqrt n\,\big(T(f_{nh}) - T(f)\big) \to \frac{1}{4\pi}\int_{-\pi}^{\pi} g_f(\lambda)\, h(\lambda)\, d\lambda. \]

The function $g_f$ is called the gradient of $T$ at $f$. We are now in a position to formulate our efficiency concept for estimates of $T(f)$. An estimate $T_n$ is regular for $T$ at $f$ with limit $L$ if its distribution converges continuously to $L$ in the following sense:
\[ \sqrt n\,\big(T_n - T(f_{nh})\big) \xrightarrow{\mathcal{D}} L \quad\text{under } P_{nh},\ \text{for } h \text{ bounded}. \]
The convolution theorem says that the limit $L$ is the convolution of some distribution $M$ with a normal distribution whose variance equals the squared length of the gradient:
\[ L = M * N\big(0, \langle g_f, g_f\rangle\big). \tag{2.3} \]
By a well-known result of Anderson (1955), $L$ is less concentrated in symmetric intervals than $N(0, \langle g_f, g_f\rangle)$. This justifies calling $T_n$ efficient for $T$ at $f$ if its limit distribution is $N(0, \langle g_f, g_f\rangle)$. We say briefly that $\langle g_f, g_f\rangle$ is a variance bound for regular estimates. Note, however, that the optimality result is much stronger: it holds for all (bounded) symmetric bowl-shaped loss functions, not just for the (truncated) quadratic loss function.
We also have the following useful characterization: An estimate $T_n$ is regular and efficient for $T$ at $f$ if and only if it admits the following stochastic approximation:
\[ \sqrt n\,\big(T_n - T(f)\big) - Z_n(g_f) \xrightarrow{P} 0. \tag{2.4} \]
A convenient reference for the above version of the convolution theorem, and the characterization, is Greenwood and Wefelmeyer (1990). There it is also pointed out that the convolution theorem implies its own multivariate version. Specifically, let $T = (T_1,\dots,T_k)'$ be a finite-dimensional functional of the spectral density. If $g_j$ is the gradient of $T_j$, then $g_f = (g_1,\dots,g_k)'$ is called the gradient of $T$. The convolution theorem (2.3) is true with a $k$-dimensional normal distribution with covariance matrix $\langle g_f, g_f'\rangle$. The characterization (2.4) is true with vectors $T$ and $g_f$.
3 Efficient estimates

In this section we study several estimates of $T(f)$. We start with $T(\hat f_n)$, where $\hat f_n$ is the kernel estimate (1.5). As seen in Section 2, proving efficiency means proving the stochastic approximation (2.4).

(3.1) Theorem Suppose Assumptions 1.1, 1.3 (with $\alpha = 1$) and 1.4 hold and $H_f$ is nonsingular. Then we have
\[ \sqrt n\,\big(T(\hat f_n) - T(f)\big) - Z_n(g_f) \xrightarrow{P} 0, \]
i.e. $T(\hat f_n)$ is an efficient estimate of $T(f)$.

Proof. We start by proving consistency. We cannot apply Theorem 2.3 directly, since $\hat f_n(\lambda)$ is not necessarily bounded. However, if $\sup_\lambda|\hat f_n(\lambda) - f(\lambda)| < m/2$, where $m$ denotes $\min_\lambda f(\lambda)$, we know that $\hat f_n$ is bounded from above and below. We therefore obtain from Theorem 2.3(i) that for all $\varepsilon > 0$ there exists a $\delta > 0$ with
\[ P\big(|T(\hat f_n) - T(f)| \ge \varepsilon\big) \le P\big(\|\hat f_n - f\| \ge \delta\big) + P\big(\sup_\lambda|\hat f_n(\lambda) - f(\lambda)| \ge m/2\big), \]
which implies consistency with Lemma A.7. Since $P\big(\sup_\lambda|\hat f_n(\lambda) - f(\lambda)| \ge m/2\big) \to 0$, we obtain as in the proof of Theorem 2.3
\[ \sqrt n\,\big(T(\hat f_n) - T(f)\big) = \frac{\sqrt n}{4\pi}\int_{-\pi}^{\pi} g_f(\lambda)\,\frac{\hat f_n(\lambda) - f(\lambda)}{f(\lambda)}\, d\lambda + O_p\big(n^{1/2}\|\hat f_n - f\|^2\big), \]
which by Lemma A.7(i) and (iii) is equal to
\[ \frac{\sqrt n}{4\pi}\int_{-\pi}^{\pi} g_f(\lambda)\,\frac{I_n(\lambda) - f(\lambda)}{f(\lambda)}\, d\lambda + o_p(1) \]
(cp. also Taniguchi, 1987, proof of Theorem 2). Theorem 2.2 implies that this is $Z_n(g_f) + o_p(1)$.
We now study the estimate $T(I_n)$ with distance functions that are linear in $f(\lambda)$. An example is the Kullback-Leibler distance as in (1.4).

(3.2) Theorem Suppose Assumptions 1.1 and 1.3 (with $\alpha > 1/2$) hold, where $K(\theta,x,\lambda) = a_\theta(\lambda) + b_\theta(\lambda)\,x$. If $H_f$ is nonsingular, then
\[ \sqrt n\,\big(T(I_n) - T(f)\big) - Z_n(g_f) \xrightarrow{P} 0, \]
i.e. $T(I_n)$ is an efficient estimate of $T(f)$. Here
\[ H_f = \int_{-\pi}^{\pi}\big\{\nabla^2 a_{T(f)}(\lambda) + \nabla^2 b_{T(f)}(\lambda)\, f(\lambda)\big\}\, d\lambda \]
and
\[ g_f(\lambda) = -4\pi\,H_f^{-1}\,\nabla b_{T(f)}(\lambda)\, f(\lambda). \]

Proof. Lemma A.7(v) implies that
\[ \sup_\theta\big|D(\theta, I_n) - D(\theta, f)\big| \xrightarrow{P} 0. \tag{3.1} \]
Since
\[ D\big(T(I_n), I_n\big) \le D\big(T(f), I_n\big) \quad\text{and}\quad D\big(T(f), f\big) \le D\big(T(I_n), f\big), \]
it follows that $D(T(I_n), f) \to D(T(f), f)$ in probability and therefore also $T(I_n) \to T(f)$ in probability. A modification of the Taylor expansion (2.1) yields, with $|\widetilde T(f) - T(f)| \le |T(I_n) - T(f)|$,
\[ \frac{\sqrt n}{4\pi}\int_{-\pi}^{\pi} g_f(\lambda)\,\frac{I_n(\lambda) - f(\lambda)}{f(\lambda)}\, d\lambda = H_f^{-1}\Big\{\int_{-\pi}^{\pi}\nabla^2 K\big(\widetilde T(f), f(\lambda), \lambda\big)\, d\lambda + \int_{-\pi}^{\pi}\nabla^2 b_{\widetilde T(f)}(\lambda)\,\big(I_n(\lambda) - f(\lambda)\big)\, d\lambda\Big\}\,\sqrt n\,\big(T(I_n) - T(f)\big). \]
Since
\[ \frac{\sqrt n}{4\pi}\int_{-\pi}^{\pi} g_f(\lambda)\,\frac{I_n(\lambda) - f(\lambda)}{f(\lambda)}\, d\lambda = Z_n(g_f) + o_p(1) \quad\text{(Theorem 2.2)}, \]
\[ \int_{-\pi}^{\pi}\nabla^2 K\big(\widetilde T(f), f(\lambda), \lambda\big)\, d\lambda \xrightarrow{P} H_f \quad\text{(smoothness of } K\text{)}, \]
and
\[ \int_{-\pi}^{\pi}\nabla^2 b_{\widetilde T(f)}(\lambda)\,\big(I_n(\lambda) - f(\lambda)\big)\, d\lambda \xrightarrow{P} 0 \quad\text{(Lemma A.7(v))}, \]
the result is proved.
In the next theorem we consider the Kullback-Leibler distance $D(\theta, f)$ as in (1.4), i.e. $a_\theta(\lambda) = \frac{1}{4\pi}\log f_\theta(\lambda)$ and $b_\theta(\lambda) = \frac{1}{4\pi}f_\theta(\lambda)^{-1}$, with $a_\theta(\lambda)$ and $b_\theta(\lambda)$ as in Theorem 3.2. We then have
\[ H_f = \frac{1}{4\pi}\int_{-\pi}^{\pi}\Big\{\big(f(\lambda) - f_{\theta_0}(\lambda)\big)\,\nabla^2 f_{\theta_0}(\lambda)^{-1} + \big(\nabla\log f_{\theta_0}(\lambda)\big)\big(\nabla\log f_{\theta_0}(\lambda)\big)'\Big\}\, d\lambda. \]
As a consequence of Theorem 2.1 and Theorem 3.2 we now obtain for the Whittle estimate $\hat\theta_n = T(I_n)$
\[ \sqrt n\,(\hat\theta_n - \theta_0) \xrightarrow{\mathcal{D}} N\Big(0,\ \frac{1}{4\pi}\,H_f^{-1}\Big\{\int_{-\pi}^{\pi} f(\lambda)^2\,\big(\nabla f_{\theta_0}(\lambda)^{-1}\big)\big(\nabla f_{\theta_0}(\lambda)^{-1}\big)'\, d\lambda\Big\}\,H_f^{-1}\Big), \]
a result already proved by Taniguchi (1979). However, we now know that this limit variance is the smallest which can be achieved under model misspecification.

We now study the behavior of the Gaussian maximum likelihood estimate $\tilde\theta_n$ of the fitted model, i.e. $\tilde\theta_n = \operatorname{argmin}_\theta L_n(\theta)$, where
\[ L_n(\theta) = -\frac{1}{n}\,\log\text{likelihood} = \frac{1}{2}\log(2\pi) + \frac{1}{2n}\log\big|\Sigma_n(f_\theta)\big| + \frac{1}{2n}\,X_n'\,\Sigma_n(f_\theta)^{-1}X_n. \]
Below we prove that $\tilde\theta_n$ is also an asymptotically efficient estimate of $\theta_0$. It is obvious that $\tilde\theta_n$ cannot be efficient for any point different from $\theta_0$. Thus, if we choose a distance function different from the Kullback-Leibler divergence, the maximum likelihood estimate will usually not be consistent for the corresponding best approximating parameter. In particular, this holds if $\theta_0$ is the parameter that gives the best $h$-step prediction error for $h \ge 2$ (cp. Section 4).
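As an illustration of the Gaussian objective $L_n(\theta)$ (again a sketch with assumed helper names; the Toeplitz matrices are built by numerical integration here, although for an AR($p$) working model the covariances of $f_\theta$ could of course be written in closed form):

```python
import numpy as np
from scipy.linalg import toeplitz

def toeplitz_of(g, n, grid=4096):
    """Sigma_n(g): Toeplitz matrix with entries int_{-pi}^{pi} g(lambda) e^{i lambda (r - s)} d lambda,
    computed by a Riemann sum (g real and even, so the entries are real)."""
    lam = np.linspace(-np.pi, np.pi, grid, endpoint=False)
    c = (np.cos(np.outer(np.arange(n), lam)) * g(lam)).sum(axis=1) * (2.0 * np.pi / grid)
    return toeplitz(c)

def gaussian_objective(x, f_theta):
    """L_n(theta) = (1/2) log(2 pi) + (1/2n) log|Sigma_n(f_theta)| + (1/2n) x' Sigma_n(f_theta)^{-1} x."""
    n = len(x)
    Sigma = toeplitz_of(f_theta, n)
    _, logdet = np.linalg.slogdet(Sigma)
    quad = x @ np.linalg.solve(Sigma, x)
    return 0.5 * np.log(2.0 * np.pi) + logdet / (2.0 * n) + quad / (2.0 * n)

# The Gaussian maximum likelihood estimate theta_tilde minimizes gaussian_objective(x, f_theta)
# over theta, with f_theta the model spectral density (e.g. the AR(p) density of the Whittle sketch).
```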
(3.3) Theorem Suppose $\theta_0$ is the unique solution of (1.3) and lies in the interior of $\Theta$. Suppose further that Assumptions 1.2 and 1.3 (with $\alpha > 1/2$) hold and in addition $f_\theta \in \operatorname{Lip}_\beta$, $\beta > 1/2$. If $H_f$ is nonsingular, then we have, with $g_f = -H_f^{-1}\, f\,\nabla f_{\theta_0}^{-1}$,
\[ \sqrt n\,(\tilde\theta_n - \theta_0) - Z_n(g_f) \xrightarrow{P} 0, \]
i.e. the Gaussian maximum likelihood estimate is efficient for the point $\theta_0$ which minimizes the asymptotic Kullback-Leibler information divergence.

Proof. Unfortunately, it is much more difficult to prove the analogue of (3.1) with $L_n(\theta)$ instead of $D(\theta, I_n)$. Therefore, we follow the method of proof of Walker (1964), Section 2, to prove consistency of $\tilde\theta_n$. We start by proving that for all $\theta_1 \in \Theta$, $\theta_1 \neq \theta_0$, there exists a constant $c(\theta_1) > 0$ with
\[ \lim_{n\to\infty} E\big\{L_n(\theta_1) - L_n(\theta_0)\big\} \ge c(\theta_1). \]
Let $\Sigma_\theta = \Sigma_n(f_\theta)$, $\Sigma_f = \Sigma_n(f)$ and $\Sigma_\nabla = \Sigma_n(\nabla f_\theta)$. We obtain
\[ E\big\{L_n(\theta_1) - L_n(\theta_0)\big\} = \frac{1}{2n}\log\big|\Sigma_{\theta_1}\Sigma_{\theta_0}^{-1}\big| + \frac{1}{2n}\,\mathrm{tr}\big\{\Sigma_f\,\big(\Sigma_{\theta_1}^{-1} - \Sigma_{\theta_0}^{-1}\big)\big\}, \]
which tends, with Szegő's identity (cf. Grenander and Szegő, 1958, p. 64, Section 5.2) and Lemma A.5, to
\[ D(\theta_1, f) - D(\theta_0, f) =: c(\theta_1) > 0 \]
due to the uniqueness of $\theta_0$. Furthermore,
\[ \mathrm{var}\big(L_n(\theta_1) - L_n(\theta_0)\big) = \frac{1}{2n^2}\,\mathrm{tr}\Big[\big\{\Sigma_f\,\big(\Sigma_{\theta_1}^{-1} - \Sigma_{\theta_0}^{-1}\big)\big\}^2\Big] \]
tends to zero with Lemma A.5, which implies
\[ \lim_{n\to\infty} P\big(L_n(\theta_1) - L_n(\theta_0) < c(\theta_1)/2\big) = 0. \]
Using Lemma A.1 we obtain, with $|\bar\theta - \theta_1| \le |\theta_2 - \theta_1|$,
\[ L_n(\theta_2) - L_n(\theta_1) = \frac{1}{2n}\sum_{i=1}^{k}(\theta_2 - \theta_1)_i\Big[\mathrm{tr}\Big\{\Sigma_{\bar\theta}^{-1}\Sigma_n\Big(\frac{\partial}{\partial\theta_i}f_{\bar\theta}\Big)\Big\} - X_n'\,\Sigma_{\bar\theta}^{-1}\Sigma_n\Big(\frac{\partial}{\partial\theta_i}f_{\bar\theta}\Big)\Sigma_{\bar\theta}^{-1}X_n\Big]. \]
Lemma A.1 and Lemma A.2 now imply, with $U_\delta(\theta_1) := \{\theta_2 \in \Theta : |\theta_2 - \theta_1| < \delta\}$,
\[ \sup_{\theta_2 \in U_\delta(\theta_1)}\big|L_n(\theta_2) - L_n(\theta_1)\big| \le K\,\delta\,\Big(1 + \frac{1}{n}X_n'X_n\Big). \]
R
2
1
n
o
0
n
o
n
o
0
n
o
n
0
o
0
Z
0
0
0
14
0
Furthermore, var pnrLn (0) 0 Zn (f rf01) = 41n 2tr 6f 60 1 6r 60 1 2 0 2tr 6f 60 1 6r 60 16n 2f rf01 0 2tr 60 16r 60 16f 6n 2f rf01 + tr 6n 2f rf01
"
0
(
0
!)
0
(
0
0
0
0
!)
8
(A.6) Lemma Suppose … $> 1/2$. Then
\[ \frac{1}{n}\,\mathrm{tr}\big\{\Sigma_n(f_o)^{-1}\Sigma_n(g)\big\} = \frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{g(\lambda)}{f_o(\lambda)}\, d\lambda + o(n^{-1/2}) \]
and
\[ \frac{1}{n}\,\mathrm{tr}\big\{\Sigma_n(f)\,\Sigma_n(f_o)^{-1}\Sigma_n(g)\,\Sigma_n(f_o)^{-1}\big\} = \frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{f(\lambda)\,g(\lambda)}{f_o(\lambda)^2}\, d\lambda + O(n^{-1/2}). \]
The proof is omitted. It is quite technical and uses calculations in the frequency domain similar to the proof of Lemma A.4. Under stronger conditions the result would, e.g., follow from Theorem 2.1.1 of Taniguchi (1991) or from Lemma 4.5 in Azencott and Dacunha-Castelle (1986, Chapter XIII).
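As a quick numerical illustration of the first statement of Lemma A.6 (toy spectral densities chosen only for this example; toeplitz_of is the helper defined in the sketch of $L_n(\theta)$ in Section 3):

```python
import numpy as np

# Check (1/n) tr{ Sigma_n(f_o)^{-1} Sigma_n(g) }  ~  (1/2 pi) int g / f_o d lambda.
f_o = lambda lam: (1.0 + 0.5 * np.cos(lam)) / (2.0 * np.pi)      # a simple positive spectral density
g = lambda lam: (1.0 + 0.3 * np.cos(2.0 * lam)) / (2.0 * np.pi)

n = 200
lhs = np.trace(np.linalg.solve(toeplitz_of(f_o, n), toeplitz_of(g, n))) / n
lam = np.linspace(-np.pi, np.pi, 4096, endpoint=False)
rhs = np.mean(g(lam) / f_o(lam))                                 # (1/2 pi) int g/f_o d lambda as a Riemann mean
print(lhs, rhs)                                                  # the two values should nearly agree
```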
(A.7) Lemma Suppose $\{X_t,\ t\in\mathbb{Z}\}$ is a fourth order stationary process with $EX_t = 0$, Lipschitz-continuous spectral density $f(\lambda)$ and bounded fourth order spectrum. Under Assumption 1.4 we have

(i) $E\int_{-\pi}^{\pi}\big(\hat f_n(\lambda) - f(\lambda)\big)^2\, d\lambda = O(m^{-2}) + O\big(\frac{m}{n}\big) = o(n^{-1/2})$,

(ii) $P\big(\sup_\lambda|\hat f_n(\lambda) - f(\lambda)| \ge \varepsilon\big) = o(1)$,

(iii) $\sqrt n\int_{-\pi}^{\pi}\phi(\lambda)\,\big(\hat f_n(\lambda) - I_n(\lambda)\big)\, d\lambda \xrightarrow{P} 0$ for $\phi(\lambda)$ continuous,

(iv) $\sqrt n\int_{-\pi}^{\pi}\phi(\lambda)\,\big(EI_n(\lambda) - f(\lambda)\big)\, d\lambda = o(1)$ for $\phi(\lambda)$ bounded,

(v) $\sup_{\theta\in\Theta}\big|\int_{-\pi}^{\pi} h_\theta(\lambda)\,\big(I_n(\lambda) - f(\lambda)\big)\, d\lambda\big| \xrightarrow{P} 0$ for $\Theta$ compact and $h_\theta(\lambda)$ continuous on $\Theta\times[-\pi,\pi]$.
Proof. (i) and (iv) are standard. (iii) is contained in the proof of Theorem 3 in Taniguchi (1987). (v) is proved by approximating $h_\theta(\lambda)$ by the Cesàro sum of its Fourier series (cp. Hannan, 1973, Lemma 1). (ii) follows since
\[ \sup_\lambda\big|\hat f_n(\lambda) - E\hat f_n(\lambda)\big| \le \frac{1}{2\pi}\sum_{|u|\le n-1}\Big|\hat W\Big(\frac{u}{m}\Big)\Big|\,\big|c_n(u) - Ec_n(u)\big| \]
and $\mathrm{var}\, c_n(u) = O(n^{-1})$ uniformly in $u$.
If we make the stronger assumption that $f$ is differentiable with Lipschitz continuous derivative, then we get in (i) the stronger result $O(m^{-4}) + O\big(\frac{m}{n}\big)$. We can then relax the conditions in Assumption 1.4 to $n^{1/8} \ll m \ll n^{1/2}$.
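A small simulation sketch of statement (i) for Gaussian white noise, where $f(\lambda) = 1/(2\pi)$, reusing kernel_spectral_estimate from the sketch in Section 1 (sample size, replication count and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 1024, 50
imse = 0.0
for _ in range(reps):
    f_hat = kernel_spectral_estimate(rng.standard_normal(n))
    # 2 pi * mean over the Fourier grid approximates int (f_hat - f)^2 d lambda
    imse += 2.0 * np.pi * np.mean((f_hat - 1.0 / (2.0 * np.pi)) ** 2) / reps
print(imse, n ** -0.5)          # the integrated MSE should be well below n^{-1/2}
```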
References

Anderson, T.W. (1955). The integral of a symmetric unimodal function over a symmetric convex set and some probability inequalities. Proc. Amer. Math. Soc. 6, 170-176.
Avram, F. (1988). On bilinear forms in Gaussian random variables and Toeplitz matrices. Probab. Theory Related Fields 79, 37-45.
Azencott, R. and Dacunha-Castelle, D. (1986). Series of Irregular Observations. Forecasting and Model Building. Springer-Verlag, New York.
Beran, R. (1977). Minimum Hellinger distance estimates for parametric models. Ann. Statist. 5, 445-463.
Brockwell, P.J. and Davis, R.A. (1987). Time Series: Theory and Methods. Springer-Verlag, New York.
Dahlhaus, R. (1983). Spectral analysis with tapered data. J. Time Ser. Anal. 4, 163-175.
Dahlhaus, R. (1988). Small sample effects in time series analysis: a new asymptotic theory and a new estimate. Ann. Statist. 16, 808-841.
Davies, R.B. (1973). Asymptotic inference in stationary Gaussian time series. Adv. Appl. Probab. 5, 469-497.
Dzhaparidze, K. (1971). On methods for obtaining asymptotically efficient spectral parameter estimates for a stationary Gaussian process with rational spectral density. Theory Probab. Appl. 16, 550-554.
Dzhaparidze, K. (1986). Parameter Estimation and Hypothesis Testing in Spectral Analysis of Stationary Time Series. Springer-Verlag, New York.
Ginovyan, M.S. (1988). Asymptotically efficient nonparametric estimation of functionals of a spectral density having zeros. Theory Probab. Appl. 33, 296-303.
Graybill, F.A. (1983). Matrices with Applications in Statistics, 2nd ed. Wadsworth, Belmont, Calif.
Greenwood, P.E. and Wefelmeyer, W. (1990). Efficiency of estimators for partially specified filtered models. Stochastic Process. Appl. 36, 353-370.
Greenwood, P.E. and Wefelmeyer, W. (1993). Maximum likelihood estimator and Kullback-Leibler information in misspecified Markov chain models. Preprint.
Grenander, U. and Szegő, G. (1958). Toeplitz Forms and their Applications. University of California Press, Berkeley.
Hájek, J. (1970). A characterization of limiting distributions of regular estimates. Z. Wahrsch. verw. Gebiete 14, 323-330.
Hannan, E.J. (1973). The asymptotic theory of linear time series models. J. Appl. Probab. 10, 130-145.
Hasminskii, R.Z. and Ibragimov, I.A. (1986). Asymptotically efficient nonparametric estimation of functionals of a spectral density function. Probab. Theory Related Fields 73, 447-461.
Haywood, J. and Tunnicliffe Wilson, G. (1993). Fitting time series models by minimising multi-step ahead errors: a frequency domain approach. Preprint, Lancaster University.
Hosoya, Y. and Taniguchi, M. (1982). A central limit theorem for stationary processes and the parameter estimation of linear processes. Ann. Statist. 10, 132-153.
Hosoya, Y. (1989). The bracketing condition for limit theorems on stationary linear processes. Ann. Statist. 17, 401-418.
Millar, P.W. (1984). A general approach to the optimality of minimum distance estimators. Trans. Amer. Math. Soc. 286, 377-418.
Ogata, Y. (1980). Maximum likelihood estimates for incorrect Markov models for time series and the derivation of AIC. J. Appl. Probab. 17, 59-72.
Parzen, E. (1982). Maximum entropy interpretation of autoregressive spectral densities. Statist. Probab. Lett. 1, 7-11.
Parzen, E. (1992). Time series, statistics, and information. In: New Directions in Time Series Analysis, Part I (D. Brillinger et al., eds.), 265-286, Springer-Verlag, New York.
Pinsker, M. (1963). Information and Information Stability of Random Variables. Holden-Day, San Francisco.
Taniguchi, M. (1979). On estimation of parameters of Gaussian stationary processes. J. Appl. Probab. 16, 575-591.
Taniguchi, M. (1987). Minimum contrast estimation for spectral densities of stationary processes. J. Roy. Statist. Soc. Ser. B 49, 315-325.
Taniguchi, M. (1991). Higher Order Asymptotic Theory for Time Series Analysis. Lecture Notes in Statistics, Springer-Verlag, Berlin.
Walker, A.M. (1964). Asymptotic properties of least squares estimates of the parameters of the spectrum of a stationary non-deterministic time series. J. Austral. Math. Soc. 4, 363-384.
Whittle, P. (1952). Some results in time series analysis. Skand. Aktuarietidskr. 35, 48-60.