LOG-PERIODOGRAM REGRESSION OF TIME SERIES WITH LONG RANGE DEPENDENCE

ERIC MOULINES AND PHILIPPE SOULIER

Abstract. This paper discusses the use of fractional exponential models (Robinson (1990), Beran (1994)) to model the spectral density $f(x)$ of a covariance stationary process when $f(x)$ may be decomposed as $f(x) = |x|^{-2d} f^*(x)$, where $f^*(x)$ is bounded and bounded away from zero. A form of log-periodogram regression technique is presented, both in the parametric context (i.e. $f^*(x)$ is a finite order exponential model in the sense of Bloomfield (1973)) and in the semi-parametric context ($f^*(x)$ is regarded as a nuisance parameter). Assuming Gaussianity and additional conditions on the regularity of $f^*(x)$ which seem mild, asymptotic normality of the parameter estimates is established in both the parametric and the semi-parametric contexts. As a by-product, some improvements over the results presented by Robinson (1995) are obtained for the large-sample distribution of log-periodogram ordinates of Gaussian processes.

Keywords. Long range dependence. Log-periodogram regression. Fractional exponential models. Central-Limit Theorems for dependent variables.


Corresponding author: P. Soulier, Université d'Évry Val-d'Essonne, Laboratoire de Mathématiques et d'Informatique, 91025 Évry Cedex, France. E-mail: [email protected]

E. Moulines, École Nationale Supérieure des Télécommunications, CNRS-URA 820, 46 rue Barrault, 75634 Paris Cedex 13, France. E-mail: [email protected]

Submitted to the Annals of Statistics


1. Introduction

Let $\{X_t\}_{t\in\mathbb{Z}}$ be a covariance stationary process with mean $\mu = E(X_0)$ and autocovariance $\gamma(t) = E[(X_t - \mu)(X_0 - \mu)]$, $t \in \mathbb{Z}$. We assume that the spectral distribution function is absolutely continuous w.r.t. the Lebesgue measure on $[-\pi, \pi]$ and denote by $f(x)$ the spectral density function:

$$\gamma(t) = \int_{-\pi}^{\pi} \cos(tx) f(x)\,dx.$$
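As a sanity check on this relation, the following sketch (our own illustration, not part of the paper) recovers the autocovariances of an AR(1) process, for which both the spectral density and $\gamma(t)$ are known in closed form. The values of `phi` and `sigma2` are illustrative.

```python
import numpy as np

# Numerical check of gamma(t) = \int_{-pi}^{pi} cos(t*x) f(x) dx for an AR(1)
# process: f(x) = sigma^2 / (2*pi*|1 - phi*exp(ix)|^2) and
# gamma(t) = sigma^2 * phi^t / (1 - phi^2).
phi, sigma2 = 0.6, 1.0

def f(x):
    # AR(1) spectral density
    return sigma2 / (2 * np.pi * np.abs(1 - phi * np.exp(1j * x)) ** 2)

def gamma(t, m=20000):
    # midpoint rule on [-pi, pi]; the integrand is smooth and periodic,
    # so this converges very fast
    x = -np.pi + (np.arange(m) + 0.5) * (2 * np.pi / m)
    return np.sum(np.cos(t * x) * f(x)) * (2 * np.pi / m)

for t in range(4):
    print(t, gamma(t), sigma2 * phi ** t / (1 - phi ** 2))
```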

It is assumed in this contribution that

$$f(x) = |1 - e^{ix}|^{-2d} f^*(x),$$

where $-1/2 < d < 1/2$ and $f^*(x)$ is continuous and bounded away from zero. The parameter $d$ controls the behavior (and, possibly, the singularity) of the spectral density in the neighborhood of the zero frequency, whereas $f^*(x)$ controls the short-memory behavior. When $0 < d < 1/2$, the process $\{X_t\}$ is said to be 'long-range dependent'. When $-1/2 < d < 0$, the spectral density at zero frequency is zero, but the process is still invertible; such a situation occurs, for example, when modeling the first differences of a process which is non-stationary, but less so than a unit-root process. The importance of such models in virtually all fields of statistical applications has been demonstrated by numerous examples (see Robinson (1990), Beran (1994) and the references therein). The best-known parametric models which allow modeling long memory and short memory simultaneously are fractional ARIMA models (Granger and Joyeux 1980). Fractional ARIMA models are a natural extension of standard ARIMA models: the short-memory component is approximated by an ARMA model. An interesting alternative is to use an exponential model for $f^*(x)$, as suggested by Bloomfield (1973).


Definition. Let $h_0 \stackrel{\rm def}{=} 1$ and let $h_1, h_2, \ldots, h_p$ be even functions. We call $\{X_t\}_{t\in\mathbb{Z}}$ a fractional EXP-process of order $p$ (or FEXP-process) if its spectral density function may be expressed as:

$$\log f(x) = d\, g(x) + \sum_{j=0}^{p} \theta_j h_j(x), \quad x \ne 0, \qquad \text{where } g(x) = -2 \log|1 - e^{ix}|.$$
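To make the definition concrete, here is a small sketch (our own, with illustrative parameter values) evaluating a FEXP spectral density; the cosine basis used below is the particular choice adopted later in section 3.

```python
import numpy as np

# FEXP spectral density: log f(x) = d*g(x) + sum_j theta_j*h_j(x), with
# g(x) = -2*log|1 - exp(ix)| and the cosine basis
# h_0(x) = 1/sqrt(2*pi), h_j(x) = cos(j*x)/sqrt(pi) for j >= 1.
def fexp_spectral_density(x, d, theta):
    x = np.asarray(x, dtype=float)
    g = -2.0 * np.log(np.abs(1.0 - np.exp(1j * x)))
    log_f = d * g + theta[0] / np.sqrt(2 * np.pi)
    for j in range(1, len(theta)):
        log_f += theta[j] * np.cos(j * x) / np.sqrt(np.pi)
    return np.exp(log_f)

# for d > 0 the density blows up like x^{-2d} near the zero frequency
x = np.array([0.01, 0.1, 1.0, np.pi])
print(fexp_spectral_density(x, d=0.3, theta=[0.0, 0.5]))
```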

In the terminology introduced by Beran (1993), the functions $h_j(x)$ ($1 \le j \le p$) are referred to as the short memory components. Like fractional ARIMA models, the class of FEXP models is of course very flexible. Taking $h_k(x) = \cos(kx)$ ($k = 0, 1, \ldots, p$) and $d = 0$, we precisely obtain the class of models proposed by Bloomfield (1973). As stressed by Robinson (1990), the FEXP model can be regarded as approximating the spectral density of a process with long-range dependence, and also with a short-range factor of quite general nature, provided the order $p$ of the short-range components can be chosen large. FEXP processes were introduced in the long-range dependent context by Janacek (1982), who derived a semi-parametric estimate of the long-memory coefficient $d$. Estimation of finite order FEXP models is discussed in Robinson (1990) and Beran (1993). Two methods can be used: (i) Whittle approximate maximum likelihood and (ii) log-periodogram regression. Whittle approximate maximum likelihood requires the use of optimization techniques, which is numerically involved, especially when the order $p$ is large (implementation details can be adapted from Bloomfield (1973)). Central limit theorems for approximate maximum likelihood estimators of the parameters are given in Fox and Taqqu (1986) and Dahlhaus (1989) for Gaussian processes (and Giraitis and Surgailis (1990) for linear processes). These results can be adapted directly to FEXP models (Robinson (1990)). In this contribution, we study the log-periodogram regression technique. This method was suggested by Bloomfield (1973) for parameter estimation in exponential models. Its application to semi-parametric estimation of the long-range dependence parameter was proposed by Geweke and Porter-Hudak (1983). For FEXP parameter estimation, log-periodogram regression boils down to the following simple procedure. Define the discrete Fourier transform and the

periodogram of $\{X_t\}$ as:

$$\omega_n(x) = (2\pi n)^{-1/2} \sum_{t=1}^{n} X_t e^{itx}, \qquad I_n(x) = |\omega_n(x)|^2.$$

Evaluate the discrete Fourier transform and the periodogram at the Fourier frequencies $x_k = 2\pi k/n$, $1 \le k \le n-1$. The zero frequency is omitted because of the singularity of $g$ at zero. Denote $C = \exp(\gamma)$, where $\gamma = 0.5772\ldots$ is Euler's constant. Writing $\log(C I_n(x_k)) = \log(f(x_k)) + \varepsilon_{n,k}$, with $\varepsilon_{n,k} = \log(C I_n(x_k)/f(x_k))$, we get:

$$\log(C I_n(x_k)) = d\, g(x_k) + \sum_{j=0}^{p} \theta_j h_j(x_k) + \varepsilon_{n,k} + l_p^*(x_k), \qquad 1 \le k \le n-1,$$

where $l_p^*(x) \stackrel{\rm def}{=} \log(f^*(x)) - \sum_{j=0}^{p} \theta_j h_j(x)$. Denote by $\theta_p = [d, \theta_0, \ldots, \theta_p]^T$ the true value of the parameters (up to order $p$). In vector form, the log-periodogram regression estimates $\hat\theta_p = [\hat d, \hat\theta_0, \ldots, \hat\theta_p]^T$ are obtained as:

$$\hat\theta_p = (\Phi_{p,n}^T \Phi_{p,n})^{-1} \Phi_{p,n}^T [\log(C I_n(x_1)), \ldots, \log(C I_n(x_{n-1}))]^T,$$

where

$$\Phi_{p,n} = \begin{pmatrix} g(x_1) & h_0(x_1) & \cdots & h_p(x_1) \\ \vdots & \vdots & & \vdots \\ g(x_{n-1}) & h_0(x_{n-1}) & \cdots & h_p(x_{n-1}) \end{pmatrix}.$$
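The procedure just described can be sketched in a few lines. The following is our own minimal implementation (cosine basis as in section 3, no correction of the log-periodogram ordinates), not the authors' code.

```python
import numpy as np

# Log-periodogram regression for a FEXP model: regress log I_n(x_k) on
# [g(x_k), h_0(x_k), ..., h_p(x_k)] at the Fourier frequencies x_k = 2*pi*k/n.
def logperiodogram_fexp(X, p):
    n = len(X)
    # periodogram I_n(x_k) = |omega_n(x_k)|^2 with omega_n = (2*pi*n)^{-1/2}*DFT
    dft = np.fft.fft(X)
    k = np.arange(1, n)                 # the zero frequency is omitted
    I = np.abs(dft[k]) ** 2 / (2 * np.pi * n)
    x = 2 * np.pi * k / n
    g = -2.0 * np.log(np.abs(1.0 - np.exp(1j * x)))
    cols = [g, np.full_like(x, 1.0 / np.sqrt(2 * np.pi))]
    cols += [np.cos(j * x) / np.sqrt(np.pi) for j in range(1, p + 1)]
    Phi = np.column_stack(cols)
    # least squares: first coefficient estimates d, the rest estimate theta_j
    coef, *_ = np.linalg.lstsq(Phi, np.log(I), rcond=None)
    return coef

rng = np.random.default_rng(0)
est = logperiodogram_fexp(rng.standard_normal(4096), p=1)
print(est)  # for white noise the first coefficient (d) should be near 0
```

For Gaussian white noise the true $d$ is 0, so the first coefficient should be small; note that, without a correction, the intercept-type coefficient absorbs the negative mean (of order Euler's constant) of the log-periodogram errors.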

Obtaining the large sample distribution of the log-periodogram regression estimate $\hat\theta_p$ has proven to be a very difficult task. This is because, as hinted by Kunsch (1986) and later exhaustively proved by Hurvich and Beltrao (1993), the low-frequency ordinates $\varepsilon_{n,k}$ are asymptotically neither equidistributed nor independent, so standard results on linear regression do not apply. Robinson (1995), in a slightly different context (local regression for semi-parametric estimation of the long-memory parameter $d$), overcame this difficulty by trimming low-frequency ordinates. This introduces a trimming number, which happens to be rather difficult to tune in practice. In this paper, as aimed at by Ludena (1995), we develop a large-sample theory for log-periodogram ordinates over the whole frequency range. Based on this result, we prove the asymptotic normality of the log-periodogram regression estimates for FEXP models, in both the parametric and the semi-parametric contexts.


This paper is organized as follows. In section 2, we state our main results: Theorem 1 gives a decomposition of $\varepsilon_{n,k}$, and Theorem 2 gives conditions under which a central limit theorem holds for weighted sums of the $\varepsilon_{n,k}$. We apply these results in section 3 to parameter estimation for FEXP models, and in section 4 to semi-parametric estimation of the long-memory parameter $d$. Proofs of technical results are given in sections 5 and 6.

2. Asymptotics of log-periodogram ordinates

Asymptotics of the error terms $\varepsilon_{n,k}$ have been studied for fixed $k$ by Hurvich and Beltrao (1993). Here we give a decomposition of $\varepsilon_{n,k}$ valid for all $k$, that is, over the whole frequency range. The following assumptions are introduced:

(A1) The spectral density of $\{X_t\}$ may be written $f(x) = |1 - e^{ix}|^{-2d} f^*(x)$, where $f^*(x)$ is differentiable and bounded away from zero. Moreover, the derivative of $f^*(x)$ is bounded.

(A2) The process $\{X_t\}_{t\in\mathbb{Z}}$ is Gaussian.

Assumption (A2) seems unfortunately unavoidable, because of the highly complicated and nonlinear way in which the log-periodogram depends on $\{X_t\}_{t\in\mathbb{Z}}$. It is likely that the results obtained in this paper can be extended to a more general class of processes (e.g. linear processes); however, the regularity conditions needed for such extensions are liable to be relatively involved.


Theorem 1. Assume (A1)-(A2). Then for each $n \ge 1$ there exist random variables $\eta_{n,1}, \ldots, \eta_{n,n-1}$ and $r_{n,1}, \ldots, r_{n,n-1}$ with $\eta_{n,k} = \eta_{n,n-k}$ and $r_{n,k} = r_{n,n-k}$, such that

(1) $E(\eta_{n,k}) = 0$, $\mathrm{var}(\eta_{n,k}) = \pi^2/6$,

(2) $\varepsilon_{n,k} = \eta_{n,k} + r_{n,k}$,

(3) $|r_{n,k}| \le c_d \log(1+k)/k$, $1 \le k \le n/2$, w.p. 1,

(4) $|\mathrm{cov}(\eta_{n,k}, \eta_{n,j})| \le C_d \log^2(j)\, k^{-2|d|} j^{-2(1-|d|)}$, $0 < k < j \le n/2$,

for some finite constants $c_d$ and $C_d$. Let $u$ be a positive integer and let $(k_1, \ldots, k_u)$ be a $u$-tuple of pairwise distinct integers greater than or equal to $v_n$, where $v_n$ is a sequence of integers such that $v_n > 4 C_d \log(n)$. Let $(r_1, \ldots, r_u)$ be a $u$-tuple of positive integers among which exactly $s$ are equal to 1. Then there exists a constant $c_r < \infty$, depending only on $u$ and $(r_1, \ldots, r_u)$ and not on $n$, such that

(5) $\Big| E\Big( \prod_{i=1}^{u} \eta_{n,k_i}^{r_i} \Big) \Big| \le c_r (\log(n)/v_n)^s.$

Remarks.
- Theorem 1 means that $\varepsilon_{n,k}$ can be approximated by $\eta_{n,k}$, which is the logarithm of a central chi-square with 2 degrees of freedom; $\eta_{n,k}$ is a centered variable with exponential moments and variance $\pi^2/6$. We will show that $r_{n,k}$ plays the role of a (negligible) bias term.
- Note that this theorem does not contradict the result of Hurvich and Beltrao (1993), who proved that the log-periodogram ordinates for fixed $k$ are asymptotically neither independent nor equidistributed. For any fixed $k$, the remainder term $r_{n,k}$ in Theorem 1 cannot be neglected. But when summing over the whole frequency range, $\varepsilon_{n,k}$ can be replaced by $\eta_{n,k}$, under reasonable assumptions on the summand.
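A quick Monte Carlo illustration of the first remark (our own check, not from the paper): since $\chi_2^2/2$ is a standard exponential, $\log(\chi_2^2/2) + \gamma$ is centered with variance $\pi^2/6 \approx 1.645$.

```python
import numpy as np

# log of a standard exponential has mean -gamma (Euler's constant) and
# variance pi^2/6; adding gamma centers it, matching eta_{n,k} in Theorem 1
rng = np.random.default_rng(1)
gamma = 0.5772156649015329
eta = np.log(rng.exponential(size=2_000_000)) + gamma
print(eta.mean(), eta.var(), np.pi ** 2 / 6)
```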

Proof of Theorem 1. Our choice of frequencies $x_k$ makes the correction for the unknown mean $\mu$ unnecessary, because $\sum_{t=1}^{n} \exp(itx_k) = 0$ for $0 < k < n$. It is thus assumed in the sequel that


$E(X_t) = 0$. Define the normalized cosine and sine sums

$$P_{n,k} = \Big[ E\Big( \sum_{t=0}^{n-1} X_t \cos(t x_k) \Big)^2 \Big]^{-1/2} \sum_{t=0}^{n-1} X_t \cos(t x_k), \qquad Q_{n,k} = \Big[ E\Big( \sum_{t=0}^{n-1} X_t \sin(t x_k) \Big)^2 \Big]^{-1/2} \sum_{t=0}^{n-1} X_t \sin(t x_k),$$

$$a_{n,k} = \tfrac{1}{2} E(I_n(x_k)), \qquad b_{n,k} + i\, c_{n,k} = \tfrac{1}{2} E(\omega_n(x_k)^2).$$

Define also

$$\rho_{n,k} = \mathrm{cov}(P_{n,k}, Q_{n,k}) = \frac{c_{n,k}}{a_{n,k} \sqrt{1 - \tau_{n,k}^2}}, \qquad \text{where } \tau_{n,k} = b_{n,k}/a_{n,k}.$$

Finally, define

$$U_{n,k} = \frac{P_{n,k} + Q_{n,k}}{\sqrt{2 + 2\rho_{n,k}}}, \qquad V_{n,k} = \frac{P_{n,k} - Q_{n,k}}{\sqrt{2 - 2\rho_{n,k}}}, \qquad \eta_{n,k} = \log(U_{n,k}^2 + V_{n,k}^2) - \log 2 + \gamma.$$

It is easily seen that $[U_{n,k}, V_{n,k}]$ is a two-dimensional standard Gaussian vector and that $\eta_{n,k}$ is a centered random variable with variance $\pi^2/6$. Straightforward algebraic manipulations yield $\varepsilon_{n,k} = \eta_{n,k} + r_{n,k}$, where

$$r_{n,k} = \log\Big( 1 + \rho_{n,k} \frac{U_{n,k}^2 - V_{n,k}^2}{U_{n,k}^2 + V_{n,k}^2} \Big) + \log\Big( 1 + \frac{2 \tau_{n,k} \sqrt{1 - \rho_{n,k}^2}\, U_{n,k} V_{n,k}}{(1 + \rho_{n,k}) U_{n,k}^2 + (1 - \rho_{n,k}) V_{n,k}^2} \Big) + \log\Big( \frac{2 a_{n,k}}{f(x_k)} \Big).$$

Based on this decomposition, the proof of Theorem 1 relies on the following lemmas. With no risk of confusion, we denote by $c$ a generic constant which may take different values at each appearance. For short, we adopt the following convention: $\log(1) = 1$.

Lemma 1. There exist finite constants $c_1$ and $c_2$ such that, for all even $n$ and all $1 \le k \le n/2$,

$$|2 a_{n,k} - f(x_k)| \le c_1 \log(k)\, k^{-1} x_k^{-2d}, \qquad a_{n,k} \ge c_2\, x_k^{-2d}.$$


Lemma 2. There exists a constant $c < \infty$ such that, for all $n \ge 1$ and all $1 \le k \le n/2$,

$$|b_{n,k}| \le c \log(k)\, k^{-1} x_k^{-2d}, \qquad |c_{n,k}| \le c \log(k)\, k^{-1} x_k^{-2d}.$$

These two lemmas are refinements of Theorem 2 of Robinson (1995), in the sense that the bounds are uniform w.r.t. $n$ for each $k$, $1 \le k \le n/2$. Lemmas 1 and 2 imply that there exists a constant $c < \infty$ such that, for all $n \ge 1$,

$$|\rho_{n,k}| \le c \log(k)/k, \qquad |\tau_{n,k}| \le c \log(k)/k, \qquad |2 a_{n,k}/f(x_k) - 1| \le c \log(k)/k.$$

All we now need to conclude the proof of (3) is that $|\rho_{n,k}|$ and $|\tau_{n,k}|$ are bounded away from 1, and that $2 a_{n,k}/f(x_k)$ is bounded away from 0.

Lemma 3. There exists some $0 < \delta < 1$ such that, for all $1 \le k \le n/2$, $\rho_{n,k}$ and $\tau_{n,k}$ are in $[-1+\delta, 1-\delta]$ and $2 a_{n,k}/f(x_k) > \delta$.

It remains to evaluate the moments of products of $\eta_{n,k}$. Note that for any $u$-tuple of positive integers $(k_1, \ldots, k_u)$, $[U_{n,k_1}, V_{n,k_1}, \ldots, U_{n,k_u}, V_{n,k_u}]$ is a Gaussian vector. Let $\Gamma_n(k_1, \ldots, k_u)$ denote its covariance matrix. This matrix is a $2u \times 2u$ block matrix with $2 \times 2$ blocks. The diagonal blocks are all equal to the $2 \times 2$ identity matrix $I_2$, and the off-diagonal blocks are $A_n(k_i, k_j) = E([U_{n,k_i}, V_{n,k_i}]^T [U_{n,k_j}, V_{n,k_j}])$, $i \ne j$.

To compute $E(\eta_{n,k_1} \cdots \eta_{n,k_u})$, we use an Edgeworth-like expansion of the joint distribution of $(U_{n,k_1}, V_{n,k_1}, \ldots, U_{n,k_u}, V_{n,k_u})$ about the standard Gaussian distribution. The following result is doubtless known, but we have failed to locate a reference and thus include the proof.

Proposition 1. Let $\Delta$ be a symmetric $a \times a$ matrix with spectral radius $\rho(\Delta) \le \delta < 1/4$. Let $E^\Delta$ denote expectation with respect to the $a$-dimensional Gaussian distribution with covariance matrix $\Gamma = I_a + \Delta$, and $E^0$ denote expectation with respect to the $a$-dimensional standard Gaussian distribution. Let $X = (X_1, \ldots, X_a)$ and let $\psi(X)$ be a random variable such that $E^0(\psi^2(X)) < \infty$. The following expansion holds:

$$(\det \Gamma)^{1/2} E^\Delta(\psi(X)) = E^0(\psi(X)) + \sum_{q=1}^{\infty} \frac{(-1/2)^q}{q!} \sum_{t=q}^{\infty} (-1)^t \sum_{\substack{k_1 + \cdots + k_q = t \\ k_1 > 0, \ldots, k_q > 0}} E^0\Big( \psi(X) \prod_{i=1}^{q} X^T \Delta^{k_i} X \Big),$$

and

(6) $E^\Delta(\psi(X)) = E^0(\psi(X)) + O(\delta) \big( E^0(\psi^2(X)) \big)^{1/2}$.

If $E^0(X^T \Delta X\, \psi(X)) = 0$, then

(7) $E^\Delta(\psi(X)) = E^0(\psi(X)) + O(\delta^2) \big( E^0(\psi^2(X)) \big)^{1/2}$.

If $E^0(\psi(X)) = 0$ and there exists some $q_0 \ge 2$ such that $E^0\big( \prod_{i=1}^{q} X^T \Delta^{k_i} X\, \psi(X) \big) = 0$ for all $q$-tuples $(k_1, k_2, \ldots, k_q)$ verifying $1 \le q \le k_1 + \cdots + k_q < q_0$, then

(8) $E^\Delta(\psi(X)) = O(\delta^{q_0}) \big( E^0(\psi^2(X)) \big)^{1/2}$.

To use this result, we need to prove that the spectral radius of $\Gamma_n(k_1, \ldots, k_u) - I_{2u}$ is small enough. We therefore need uniform bounds (in $n$) for the off-diagonal entries of $\Gamma_n(k_1, \ldots, k_u)$.

Lemma 4. Assume (A1). Then there exists a constant $c < \infty$ such that, for all $1 \le k < j < n/2$,

$$|E(U_{n,k} V_{n,j})| + |E(V_{n,k} V_{n,j})| + |E(V_{n,k} U_{n,j})| + |E(U_{n,k} U_{n,j})| \le c \log(j)\, k^{-|d|} j^{|d|-1}.$$

We can now compute $\mathrm{cov}(\eta_{n,k}, \eta_{n,j})$, $0 < k < j$. Let

$$\Delta_n(k,j) = \begin{pmatrix} 0 & A_n(k,j) \\ A_n(k,j)^T & 0 \end{pmatrix}.$$

The spectral radius of $\Delta_n(k,j)$ is of order $\log(j)\, k^{-|d|} j^{|d|-1}$, so it is smaller than some $\delta < 1/4$ for $j$ greater than some $J_0$ (independent of $n$) and all $1 \le k < j$. Let $\varphi(x_1, x_2) = \log(x_1^2 + x_2^2) - \log 2 + \gamma$.


If $X_1$ and $X_2$ are i.i.d. standard Gaussian, $\varphi(X_1, X_2)$ is a centered and symmetric random variable, so we have

(9) $E^0(\varphi(X_1,X_2)) = E^0(X_1 \varphi(X_1,X_2)) = E^0(X_2 \varphi(X_1,X_2)) = E^0(X_1 X_2 \varphi(X_1,X_2)) = 0$.

Note that (9) holds for any even function $\psi$ such that $E^0(\psi(X_1,X_2)) = 0$. Let $\psi_2(x_1,x_2,x_3,x_4) = \varphi(x_1,x_2)\varphi(x_3,x_4)$ and let $X = (X_1,X_2,X_3,X_4)$. Since the diagonal terms of $\Delta_n(k,j)$ are all 0, (9) implies $E^0(X^T \Delta_n(k,j) X\, \psi_2(X)) = 0$. Note that $\mathrm{cov}(\eta_{n,k}, \eta_{n,j}) = E^{\Delta_n(k,j)}(\psi_2(X))$, so for $j > J_0$ and $0 < k < j$ we can apply Proposition 1:

$$\mathrm{cov}(\eta_{n,k}, \eta_{n,j}) = O(\log^2(j)\, k^{-2|d|} j^{2(|d|-1)}).$$

For $1 \le k < j \le J_0$, the Hölder inequality implies that $|\mathrm{cov}(\eta_{n,k}, \eta_{n,j})| \le \pi^2/6$, so (4) holds. We now prove (5). We will show that we can apply Proposition 1 with $q_0 = s$, where $s$ is defined as the number of indices $1 \le i \le u$ such that $r_i = 1$. Write $\Gamma_n(k_1, \ldots, k_u) = I_{2u} + \Delta_{k_1,\ldots,k_u}$. $\Delta_{k_1,\ldots,k_u}$ has zero diagonal entries and, by Lemma 4, its spectral radius is of order $C_d \log(n)/v_n < 1/4$ under the assumptions of Theorem 1.

For $s = 0$, (5) is implied by the Hölder inequality. Denote $\psi_u(x_1, \ldots, x_{2u}) = \prod_{i=1}^{u} \varphi^{r_i}(x_{2i-1}, x_{2i})$. Since $s \ge 1$, $E^0(\psi_u(X)) = 0$, where $X = (X_1, \ldots, X_{2u})$. So, for $s = 1$, (5) is a consequence of (6). For $s \ge 2$, we show that we may apply (8) with $q_0 = s$. Let $\Delta$ be an arbitrary $2u \times 2u$ matrix. Let $q$ and $t$ be positive integers such that $1 \le q \le t < s$. We must prove that for any $q$-tuple of strictly positive integers $(n_1, \ldots, n_q)$ such that $n_1 + \cdots + n_q = t$, we have:

(10) $E^0\Big( \psi_u(X) \prod_{i=1}^{q} X^T \Delta^{n_i} X \Big) = 0$.
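Identity (9) can also be checked by simulation (our own verification; the identity itself follows from the symmetry of $\varphi$ in each argument):

```python
import numpy as np

# Monte Carlo check of identity (9): for i.i.d. standard Gaussian X1, X2 and
# phi(x1, x2) = log(x1^2 + x2^2) - log(2) + gamma, the expectations of
# phi, X1*phi, X2*phi and X1*X2*phi all vanish.
rng = np.random.default_rng(2)
gamma = 0.5772156649015329
x1, x2 = rng.standard_normal((2, 2_000_000))
phi = np.log(x1 ** 2 + x2 ** 2) - np.log(2.0) + gamma
for m in (phi, x1 * phi, x2 * phi, x1 * x2 * phi):
    print(m.mean())  # all close to 0
```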

Since under $E^0$ the variables $X_i$ are i.i.d., we can expand $E^0\big( \psi_u(X) \prod_{i=1}^{q} X^T \Delta^{n_i} X \big)$ as a weighted sum of terms

(11) $\prod_{i=1}^{u} E^0\big( X_{2i-1}^{\alpha_{2i-1}} X_{2i}^{\alpha_{2i}} \varphi^{r_i}(X_{2i-1}, X_{2i}) \big)$

where $\alpha_1, \ldots, \alpha_{2u}$ are non-negative integers such that $\sum_{i=1}^{2u} \alpha_i = 2q$. Because of (9), this product vanishes if the number of exponents $\alpha_i$ greater than or equal to 2 in (11) is strictly less than $s$, since in that case one of the factors of the product will necessarily be either $E^0(\varphi(X_1,X_2))$, $E^0(X_1\varphi(X_1,X_2))$, $E^0(X_2\varphi(X_1,X_2))$ or $E^0(X_1 X_2 \varphi(X_1,X_2))$, and thus vanish. Since $\sum_{i=1}^{2u} \alpha_i = 2q$, the number of exponents $\alpha_i \ge 2$ is at most $q$. So (10) holds for all $q < s$ and all $q$-tuples $(n_1, \ldots, n_q)$ such that $n_1 + \cdots + n_q < s$. This concludes the proof of Theorem 1.

We now state a central limit theorem for weighted sums involving the variables $\{\eta_{n,k}\}$. This theorem extends standard central limit theorems for triangular arrays of martingale differences or weakly dependent sequences; we will use it in the next sections to prove central limit theorems for log-periodogram estimates.

Theorem 2. Let $\{v_n\}$ and $\{w_n\}$ be two nondecreasing sequences of integers such that $0 < v_n \le w_n < n/2$. Let $\{\beta_{n,k}\}_{v_n \le k \le w_n}$ be a triangular array of not identically vanishing real numbers. Define

$$S_n = \sum_{k=v_n}^{w_n} \beta_{n,k} \eta_{n,k}, \qquad s_n^2 = \sum_{k=v_n}^{w_n} \beta_{n,k}^2, \qquad a_n = \sum_{k=v_n}^{w_n} |\beta_{n,k}|, \qquad b_n = \max_{v_n \le k \le w_n} |\beta_{n,k}|.$$

Assume that (i) $\lim_{n\to\infty} s_n/b_n = \infty$ and (ii) $\lim_{n\to\infty} a_n \log(n)/(s_n v_n) = 0$. Then

$$s_n^{-1} S_n \to_d N(0, \pi^2/6).$$

The proof is given in section 5. It is based on the so-called method of moments, which consists in showing that the moments of $s_n^{-1} S_n$ converge to the moments of a Gaussian random variable.

3. Asymptotic normality of log-periodogram estimates: parametric case

In the sequel, the short memory components $h_j$, $j \ge 0$, are chosen to be

$$h_0(x) = \frac{1}{\sqrt{2\pi}}, \qquad h_j(x) = \frac{1}{\sqrt{\pi}} \cos(jx).$$
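As a numerical sanity check (our own, not from the paper), the Gram matrix of the first few basis functions under the $L^2[-\pi,\pi]$ inner product is indeed the identity:

```python
import numpy as np

# h_0(x) = 1/sqrt(2*pi), h_j(x) = cos(j*x)/sqrt(pi) are orthonormal in
# L^2[-pi, pi]; midpoint-rule check of the 5x5 Gram matrix
m = 100000
x = -np.pi + (np.arange(m) + 0.5) * (2 * np.pi / m)
H = np.vstack([np.full(m, 1 / np.sqrt(2 * np.pi))]
              + [np.cos(j * x) / np.sqrt(np.pi) for j in range(1, 5)])
gram = H @ H.T * (2 * np.pi / m)
print(np.round(gram, 6))  # close to the 5x5 identity
```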


Any complete orthonormal basis of the Hilbert space of square-integrable even functions can in principle be used. The trigonometric basis is convenient because the Fourier coefficients of $g$ are known (see below) and, though $g$ is not continuous at 0, its truncated Fourier series converges uniformly on every compact subset of $[-\pi, \pi] \setminus \{0\}$. Denote:

$$l^*(x) = \log(f^*(x)), \qquad \theta_j = \int_{-\pi}^{\pi} l^*(x) h_j(x)\,dx, \qquad l_p^*(x) = \sum_{j=p+1}^{\infty} \theta_j h_j(x).$$

In the parametric context, $l_p^*(x)$ vanishes identically for $p$ greater than the order of the model. In the semi-parametric context, since the $\theta_j$'s are the Fourier coefficients of $l^*$, the rate of convergence of the related series can be easily evaluated in terms of the Hölder regularity of $l^*$ or $f^*$. The error $\hat\theta_p - \theta_p$ is classically split into two terms: (i) a regression noise and (ii) a deterministic bias accounting for the under-modelization, which of course vanishes in the parametric context:

$$\hat\theta_p = \theta_p + (\Phi_{p,n}^T \Phi_{p,n})^{-1} \Phi_{p,n}^T \varepsilon_n + (\Phi_{p,n}^T \Phi_{p,n})^{-1} \Phi_{p,n}^T l_{p,n}^*,$$

where $\varepsilon_n = [\varepsilon_{n,1}, \ldots, \varepsilon_{n,n-1}]^T$ and $l_{p,n}^* = [l_p^*(x_1), \ldots, l_p^*(x_{n-1})]^T$.

In this section, we state a central limit theorem for the estimate $\hat\theta_p$ in the parametric context. The exact order $p$ of the FEXP model (or an upper bound for it) is assumed to be known. This implies that the truncation bias $l_{p,n}^*$ vanishes identically. Let $\zeta_0 = 0$ and let $\zeta_j = \int_{-\pi}^{\pi} h_j(x) g(x)\,dx = 2\sqrt{\pi}/j$ ($j > 0$) be the $j$-th Fourier coefficient of $g$; let $\zeta_p = [\zeta_0, \ldots, \zeta_p]^T$ and $\lambda_p^2 = \sum_{j=p+1}^{\infty} \zeta_j^2$.
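The closed form $\zeta_j = 2\sqrt{\pi}/j$ follows from the expansion $g(x) = 2\sum_{k\ge1}\cos(kx)/k$; a quick numerical quadrature (our own check) confirms it:

```python
import numpy as np

# Fourier coefficients of g(x) = -2*log|1 - exp(ix)| in the cosine basis:
# zeta_j = integral_{-pi}^{pi} h_j(x)*g(x) dx should equal 2*sqrt(pi)/j
m = 1_000_000
x = -np.pi + (np.arange(m) + 0.5) * (2 * np.pi / m)   # midpoints avoid x = 0
g = -2.0 * np.log(np.abs(1.0 - np.exp(1j * x)))
w = 2 * np.pi / m
for j in (1, 2, 5):
    zeta = np.sum(np.cos(j * x) / np.sqrt(np.pi) * g) * w
    print(j, zeta, 2 * np.sqrt(np.pi) / j)
```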

Theorem 3. Assume (A1)-(A2). Then $\sqrt{n}(\hat\theta_p - \theta_p)$ is asymptotically normal with zero mean and covariance matrix

$$G_p = \frac{\pi^3}{3 \lambda_p^2} \begin{pmatrix} 1 & -\zeta_p^T \\ -\zeta_p & \lambda_p^2 I_{p+1} + \zeta_p \zeta_p^T \end{pmatrix}.$$


Proof of Theorem 3. We first introduce some notation. Let

$$H_{p,n} = \begin{pmatrix} h_0(x_1) & \cdots & h_p(x_1) \\ \vdots & & \vdots \\ h_0(x_{n-1}) & \cdots & h_p(x_{n-1}) \end{pmatrix}.$$

Denote $W_{p,n} = (2\pi/n) H_{p,n}^T H_{p,n}$ and $\mathbb{1}_{p+1}^T \stackrel{\rm def}{=} (1/\sqrt{2}, 1, \ldots, 1)$. Note that

(12) $W_{p,n} = I_{p+1} - (2/n)\, \mathbb{1}_{p+1} \mathbb{1}_{p+1}^T$,

(13) $W_{p,n}^{-1} = I_{p+1} + (n/2 - p - 1/2)^{-1}\, \mathbb{1}_{p+1} \mathbb{1}_{p+1}^T$.

Denote $\tilde\zeta_p = [\tilde\zeta_0, \ldots, \tilde\zeta_p]^T$, $g_n = [g(x_1), \ldots, g(x_{n-1})]^T$ and $\tilde g_{p,n} = [\tilde g_{p,n}(x_1), \ldots, \tilde g_{p,n}(x_{n-1})]^T$, where

$$\tilde\zeta_p = \frac{2\pi}{n} H_{p,n}^T g_n \qquad \text{and} \qquad \tilde g_{p,n} = g_n - H_{p,n} W_{p,n}^{-1} \tilde\zeta_p.$$

Finally, define

$$\tilde\lambda_{p,n}^2 = \frac{2\pi}{n} \sum_{k=1}^{n-1} \tilde g_{p,n}^2(x_k).$$

By construction, $\tilde g_{p,n}$ is orthogonal to the range of the matrix $H_{p,n}$. An elementary workout of the regression equation yields, in vector form,

$$(\Phi_{p,n}^T \Phi_{p,n})^{-1} \Phi_{p,n}^T = \frac{2\pi}{n \tilde\lambda_{p,n}^2} \begin{pmatrix} 1 \\ -W_{p,n}^{-1} \tilde\zeta_p \end{pmatrix} \tilde g_{p,n}^T + \frac{2\pi}{n} \begin{pmatrix} 0 \\ W_{p,n}^{-1} H_{p,n}^T \end{pmatrix}.$$

According to Theorem 1, we decompose the regression noise $\varepsilon_{n,k}$ as $\varepsilon_{n,k} = \eta_{n,k} + r_{n,k}$. Since $\eta_{n,k}$ is zero-mean, the bias is entirely captured by $r_{n,k}$. To prove that the bias is asymptotically negligible and to apply Theorem 2, we need to derive bounds on $\tilde g_{p,n}(x_k)$ and to evaluate how good the approximations of $\zeta_j$ and $\lambda_p$ by $\tilde\zeta_j$ and $\tilde\lambda_{p,n}$ are.

Lemma 5. Let $p$ and $n$ be positive integers such that $p + 1 < n/4$. Then

(14) $\max_{0 \le j \le p} |\zeta_j - \tilde\zeta_j| = O(\log(n)/n)$

(15)

(16)
