Lecture Notes

Time series analysis I
Siegfried Hörmann¹

¹ The raw version of the first five chapters of these notes was delivered by Andrea Colombo, based on my course TSI given in the spring semester 2011. These chapters are mainly based on Chapters 1, 2, 3, 5 and 8 of Brockwell and Davis (1991), Time Series: Theory and Methods, Springer.


Contents

1 Stationary time series
  1.1 Data and model
  1.2 Stationarity
  1.3 Properties of the ACF
  1.4 Sample mean and sample ACF
  1.5 Estimation and elimination of trend and seasonal components
  1.6 Exercises

2 Hilbert spaces
  2.1 Inner product spaces
  2.2 A first application
  2.3 The projection theorem
  2.4 Exercises

3 ARMA models
  3.1 Introduction and definitions
  3.2 Stationary solutions for ARMA processes
  3.3 Computing the ACF of an ARMA process
  3.4 The partial autocorrelation function
  3.5 Identification of ARMA models
  3.6 Invertibility
  3.7 Exercises

4 Forecasting
  4.1 Prediction equations
  4.2 Innovations algorithm
  4.3 Prediction of ARMA processes
  4.4 An alternative method to compute the PACF
  4.5 Exercises

5 Parameter estimation for ARMA processes
  5.1 Preliminary estimation method for AR processes
  5.2 Recursive calculation of Gaussian likelihood
  5.3 Order selection
  5.4 Asymptotics for AR(1) estimators
  5.5 Exercises

Index

Chapter 1
Stationary time series

1.1 Data and model

The term time series is used to denote two objects:

1. A collection of data (observations), each collected at a particular time point t: {xt | t ∈ T}, where T is an index set. Typically T = [a, b] for continuous-time observations and T = {t1, t2, . . .} for discrete-time observations. In this lecture we will consider T ⊆ Z. Some examples are shown in Figure 1.1.

2. The model describing the data. For this purpose we will use a stochastic process (see the definition below) {Xt | t ∈ T}. The general idea: each xt is the realization of a random variable Xt.

Note that for the data we usually have {x1, . . . , xn} (we start observing at time 1 and stop at time n), while as a model we use {Xt, t ∈ Z}, i.e. the process runs from the infinite past to the infinite future. The interpretation is that the process is usually already running before we start observing its output (before time 1), and it will continue to run after we stop observing at time n. The technical reason is mathematical convenience.

What makes time series analysis different from classical statistics? Data are dependent and correlated, whereas in statistics one usually assumes independence between the observations. Roughly speaking, in statistics the experimenter can repeat the experiment independently and under the same conditions. In time series we cannot redo the experiment; we only observe from a running process. On the one hand this requires more complex models, but on the other hand dependence is the crucial ingredient for forecasting, one of the central topics in time series analysis.

Definition 1.1.1 (Stochastic process). A stochastic process is a collection of random variables {Xt | t ∈ T} defined on a common probability space (Ω, A, P). The functions {Xt(ω) | t ∈ T}, with ω ∈ Ω, are called realizations, trajectories or sample paths. We use {xt | t ∈ T} to denote the observed values.

Recall: A random variable X is a measurable mapping defined on a probability space (Ω, A, P): X : (Ω, A) → (R, B(R)).



Figure 1.1: Examples of time series data: Johnson & Johnson quarterly earnings per share (upper left), yearly average global temperature deviations (upper right), NYSE stock return data from 1984–1991 (middle left), speech recording of the syllable aaa...hhh (middle right) and sunspot numbers measured biannually from 1749–1978 (lower left). [R-code]


Here B(R) is the Borel σ-algebra. In other words, ω ↦ X(ω) ∈ R and X⁻¹(B) ∈ A for all B ∈ B(R). The distribution function of X is given by FX(x) = P(X ≤ x) = P({ω | X(ω) ≤ x}) = P(X⁻¹((−∞, x])).

Simple examples:

1. IID sequences. {Xt, t = 1, 2, . . .} is a sequence of independent and identically distributed (iid) random variables. We write {Xt} ∼ IID or {Xt} ∼ IID(µ, σ²) to indicate EXt = µ and Var(Xt) = σ².

2. Random walk. {Zt, t = 1, 2, . . .} is a sequence of iid random variables and {Xt, t = 0, 1, 2, . . .} is defined by X0 = 0 and Xt = Z1 + · · · + Zt for t ≥ 1.

3. Branching process. This stochastic model is used to describe the evolution of a population size. Let Xt denote the size of the population at time t, described by X0 = x and Xt+1 = Z1,t + · · · + ZXt,t for t = 0, 1, . . ., where Zj,t is a random variable indicating the number of offspring generated by individual j at time t (j = 1, 2, . . ., t = 0, 1, . . .).

4. Moving average. Assume {Zt} ∼ IID(µ, σ²). A process of the form Xt = a0 Zt + a1 Zt−1 + · · · + aq Zt−q is called a moving average.

5. AR(1). Assume {Zt} ∼ IID(0, σ²). Then we call a process of the form Xt = ρXt−1 + Zt an autoregressive process of order 1 (AR(1)).

6. Signal plus noise. Assume that {Zt} ∼ IID(0, σ²) and Xt = µ(t) + Zt, where µ(t) is a deterministic function. (See Figure 1.2.)

More general than iid variables is white noise.

Definition 1.1.2 (White noise). A process {Zt, t ∈ Z} is called white noise if the following holds:
• Var(Zt) = σZ² < ∞ for all t,
• EZt = µ for all t,
• Cov(Zt, Zt+h) = 0 for all t and all |h| ≥ 1.
Short: Zt ∼ WN(µ, σ²).
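The following short R sketch (added for illustration; it is not the [R-code] linked in the figures, and the sample size, signal and noise level are arbitrary choices) simulates two of these examples.

    # Simulate a random walk and a signal-plus-noise process from iid N(0,1) noise.
    set.seed(1)
    n <- 500
    Z <- rnorm(n)                         # iid innovations Z_1, ..., Z_n
    X_rw <- cumsum(Z)                     # random walk: X_t = Z_1 + ... + Z_t
    mu   <- 2 * cos(2 * pi * (1:n) / 50)  # deterministic signal with period 50
    X_sn <- mu + Z                        # signal plus noise: X_t = mu(t) + Z_t
    par(mfrow = c(2, 1))
    plot.ts(X_rw, main = "Random walk")
    plot.ts(X_sn, main = "Signal plus noise")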


Figure 1.2: Cosine wave with period 50 (top) compared with cosine wave contaminated with additive Gaussian noise (σ = 1, middle) and (σ = 5, bottom). [R-code]



Figure 1.3: A white noise sequence. Data are independent and N (0, 1) for t ∈ {1, . . . , 20} and Rademacher (±1 with probability 1/2) for t ≥ 21.
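A sequence like the one in Figure 1.3 can be generated in a few lines of R (a sketch added here; it is not the original code behind the figure):

    # White noise whose marginal distribution changes over time:
    # N(0,1) for t = 1, ..., 20 and Rademacher (+1/-1 with prob. 1/2) for t = 21, ..., 40.
    set.seed(2)
    wn <- c(rnorm(20), sample(c(-1, 1), size = 20, replace = TRUE))
    plot(wn, type = "o")   # every Z_t has mean 0 and variance 1, and the Z_t are uncorrelated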

Note that white noise can be (i) dependent (e.g. the famous GARCH model used in econometric applications is only white noise) and (ii) the marginal distributions can be entirely different at different time points (see Figure 1.3).

It is also important to note that for a general stochastic process the marginal distributions FXt(·) = P(Xt ≤ ·) say nothing about the dynamics of the process. For example, consider the processes
Xt = X for all t ≥ 1,   and   Xt ∼iid X for all t ≥ 1.
They have the same marginal distributions but behave completely differently as processes: for the first, knowledge of X1 provides perfect knowledge of Xt for every t ≥ 1; for the second, knowledge of X1 provides no information about Xt for t > 1.
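This contrast is easy to visualize; the sketch below (added here, with standard normal marginals chosen purely for convenience) draws one path of each process.

    # Two processes with identical N(0,1) marginal distributions but different dynamics.
    set.seed(3)
    n  <- 200
    x1 <- rep(rnorm(1), n)   # X_t = X for all t: one draw, then a constant path
    x2 <- rnorm(n)           # X_t iid: a fresh draw at every time point
    par(mfrow = c(2, 1))
    plot.ts(x1, main = "X_t = X")
    plot.ts(x2, main = "X_t iid")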

Another example is shown in Figure 1.4: the four processes shown there all come from models with the same marginal distributions. Hence, we need more information, namely on the joint distributions.

Definition 1.1.3 (Finite dimensional distributions, f.d.d.'s). Let {Xt, t ∈ T} be a stochastic process and let T = {t = (t1, . . . , tn), t1 < t2 < . . . < tn, n = 1, 2, . . .}. Then the finite-dimensional distribution functions of the stochastic process are the functions {Ft(·), t = (t1, . . . , tn) ∈ T} with
Ft(x) = Ft(x1, . . . , xn) = P(Xt1 ≤ x1, . . . , Xtn ≤ xn).


Figure 1.4: Four different time series with i.i.d. Gaussian innovations: AR(1) with ρ = 0.5 and σ² = 1 (upper left), ρ = −0.9 and σ² = 0.253 (upper right), ρ = 0.9 and σ² = 0.253 (lower left) and an MA(1) process of the form Xt = Zt + bZt−1 where b = 0.58 and σ² = 1 (lower right). [R-code]
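Series like those in Figure 1.4 can be generated with arima.sim; the sketch below (added here; it is not the original [R-code], and the seed and sample size are arbitrary) uses the parameters from the caption.

    # Simulate the four processes described in Figure 1.4.
    set.seed(4)
    n <- 200
    ar0.5  <- arima.sim(n = n, model = list(ar =  0.5),  sd = 1)
    ar0.9n <- arima.sim(n = n, model = list(ar = -0.9),  sd = sqrt(0.253))
    ar0.9  <- arima.sim(n = n, model = list(ar =  0.9),  sd = sqrt(0.253))
    ma     <- arima.sim(n = n, model = list(ma =  0.58), sd = 1)
    par(mfrow = c(2, 2))
    plot.ts(ar0.5); plot.ts(ar0.9n); plot.ts(ar0.9); plot.ts(ma)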


Figure 1.5: Histograms from 10000 observations of the respective processes shown in Figure 1.4. [R-code]

Example 1.1.1. If the random variables Xt are independent, then the finite-dimensional distribution functions are
Ft(x) = Ft(x1, . . . , xn) = ∏_{i=1}^n FXti(xi) = P(Xt1 ≤ x1) · . . . · P(Xtn ≤ xn).

It seems that one possible way to define a stochastic process (i.e. to define a time series model) is to make assumptions on the finite-dimensional distributions. There is (at least) one problem: how do we know that a stochastic process with a given family of f.d.d.'s exists? Note that it is not even trivial to claim the existence of an iid sequence (see probability theory)! The following theorem helps.

Theorem 1.1.1 (Existence theorem of Kolmogorov). Denote x = (x1, . . . , xn) and t = (t1, . . . , tn), and let x(i) and t(i) be the same vectors leaving out the i-th coordinate. A family of finite-dimensional distribution functions {Ft(·), t ∈ T} defines a stochastic process if and only if
lim_{xi→∞} Ft(x) = Ft(i)(x(i)),   (1.1)
for all n ≥ 1, all t ∈ T and all i ∈ {1, . . . , n}.

Fourier version: Let ϕt(u) = ∫_{R^n} exp(iuᵀx) dFt(x) be the characteristic function of Ft(x). Then the statement of the theorem remains true if (1.1) is replaced by lim_{ui→0} ϕt(u1, . . . , un) = ϕt(i)(u(i)).

Proof. Not here.

What does the theorem say? If we let xi in (1.1) tend to ∞, we ultimately allow arbitrary values for Xti. The resulting limit should be the same as if we drop the i-th variable a priori. So this is a very plausible consistency requirement.

Exercise: Use Theorem 1.1.1 to prove the existence of an iid sequence. (Exercise 1.1.)

We will later use Kolmogorov's extension theorem to prove the existence of processes belonging to a very important family (see the proof of Theorem 1.3.1):


Example 1.1.2 (Gaussian processes). A process {Xt} is called Gaussian if all its finite-dimensional distributions are multivariate normal.

Recall: A vector X = (X1, . . . , Xn)ᵀ is multivariate normally distributed if we can write X as X = µ + BN, where µ ∈ R^n, B ∈ R^{n×m}, N = (N1, . . . , Nm)ᵀ and Ni ∼ iid N(0, 1). Then:

• E[X] = µ and Cov(X) = BBᵀ ∈ R^{n×n}.

• If X = µ + BN and Y = µ + AÑ with BBᵀ = AAᵀ (and thus the same mean and the same covariance), then X =d Y (i.e. X has the same distribution as Y).

Proof. Look at the characteristic functions and assume without loss of generality that µ = 0. Then
ϕX(u) = E[exp(iuᵀX)] = E[exp(i(Bᵀu)ᵀN)] = ϕN(Bᵀu) = exp(−(1/2) uᵀBBᵀu) = exp(−(1/2) uᵀAAᵀu) = ϕY(u).

• Σ := BBᵀ is non-negative definite: aᵀBBᵀa ≥ 0 for all a ∈ R^n, since aᵀBBᵀa = (Bᵀa)ᵀ(Bᵀa) = ‖Bᵀa‖².

• BBᵀ is symmetric: (BBᵀ)ᵀ = BBᵀ.

• Every symmetric non-negative definite matrix Σ can be written in the form Σ = BBᵀ (by the spectral theorem).

Consequences:

• The multivariate normal distribution depends only on µ and Σ = BBᵀ.

• For every symmetric and non-negative definite Σ there exists a multivariate normal vector whose covariance matrix is Σ.

• Notation: X ∼ N(µ, Σ).
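As a small illustration (a sketch added here, not from the notes; the particular µ and Σ below are arbitrary), one can build such a vector in R by factorizing a positive definite Σ and checking the sample covariance.

    # Construct X = mu + B N with B B^T = Sigma, using the Cholesky factor as B.
    set.seed(5)
    mu    <- c(1, 2, 3)
    Sigma <- matrix(c(2, 1, 0,
                      1, 2, 1,
                      0, 1, 2), nrow = 3)   # symmetric, positive definite
    B <- t(chol(Sigma))                     # lower triangular, B %*% t(B) = Sigma
    N <- matrix(rnorm(3 * 10000), nrow = 3) # 10000 iid standard normal vectors
    X <- mu + B %*% N                       # 10000 draws of a N(mu, Sigma) vector
    rowMeans(X)                             # close to mu
    cov(t(X))                               # close to Sigma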

Theorem 1.1.1 is very important, but it still has one drawback: how would we, in general, come up with a reasonable family of finite-dimensional distributions? Another very important method is to construct a stochastic process from independent variables or from some other existing process:

Example 1.1.3 (Sinusoid wave with random phase and amplitude). Xt = A · cos(ρt + B), where A is a non-negative random variable independent of B ∼ U([0, 2π]) (uniform distribution on [0, 2π]).

Example 1.1.4 (Moving average process of order q, MA(q)). Xt = Zt + θ1 Zt−1 + · · · + θq Zt−q, with {Zt} ∼ WN(0, σ²).

Example 1.1.5 (Linear process). Xt = Σ_{k≥0} ak εt−k, with {εt} ∼ WN(0, σ²) and a0, a1, . . . ∈ R.

For these examples it might be difficult to calculate the f.d.d.'s, but existence is no problem!
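A minimal R sketch of Examples 1.1.3 and 1.1.4 (added here; the distribution of A, the frequency ρ and the MA coefficients are arbitrary choices):

    # Example 1.1.3: X_t = A * cos(rho * t + B), A >= 0 random, B ~ U(0, 2*pi)
    set.seed(6)
    t   <- 1:200
    rho <- 2 * pi / 25          # fixed frequency
    A   <- abs(rnorm(1))        # non-negative random amplitude
    B   <- runif(1, 0, 2 * pi)  # random phase
    x_sin <- A * cos(rho * t + B)

    # Example 1.1.4: an MA(2) built from white noise Z_t
    theta <- c(0.6, -0.3)
    Z     <- rnorm(length(t) + 2)
    x_ma  <- Z[3:202] + theta[1] * Z[2:201] + theta[2] * Z[1:200]

    par(mfrow = c(2, 1))
    plot.ts(x_sin); plot.ts(x_ma)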

1.2 Stationarity

Within the context of a finite number of random variables we often use the covariance matrix in order to understand the dependence between the variables. We now need a device that extends this concept to an infinite number of random variables. This is the autocovariance function.

Definition 1.2.1 (Autocovariance function, ACF). For a process {Xt, t ∈ T} with Var(Xt) < ∞ for each t ∈ T, the autocovariance function γX(·, ·) of {Xt} is given by
γX(r, s) = Cov(Xr, Xs) = E[(Xr − E[Xr])(Xs − E[Xs])],   r, s ∈ T.

Remark: Note that Var(X) < ∞ ⇔ EX² < ∞. The latter implies (by the Cauchy-Schwarz inequality, Proposition 2.1.1) that E|X| < ∞.

An appealing aspect of iid sequences is that their stochastic behavior remains stable over time. We now introduce a framework which is very often used to describe a certain structural stability in terms of the first two moments of the process and of its dependence.

Definition 1.2.2 (Weak stationarity). The time series {Xt, t ∈ Z} is said to be stationary if:
(i) EXt² < ∞ for all t ∈ Z,
(ii) EXt = m for all t ∈ Z (i.e. the expected value is constant),
(iii) γX(r, s) = Cov(Xr, Xs) = γX(r + t, s + t) for all t, r, s ∈ Z.

Example 1.2.1. An iid sequence is weakly stationary (provided it has second moments), whereas the random walk model is not weakly stationary.

Remark. Apparently γX(r, s) = γX(s, r). Hence by (iii)
γX(0, s − r) = γX(r, s) = γX(r − s, 0) = γX(0, r − s).
This means that the ACF of a stationary time series can be expressed as a one-parameter function γX(h), with h = |r − s|. In this case one often uses the autocorrelation function (acf) to assess dependence:
ρX(h) = γX(h)/γX(0) = Cov(Xt+h, Xt)/Var(Xt) = Corr(Xt+h, Xt).

Remark. ρX(h) is a good and simple measure of dependence. But ρX(h) = 0 does in general not imply that Xt is independent of Xt+h. For Gaussian processes, however, uncorrelated implies independent.
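To see that zero correlation does not imply independence, consider the following added illustration: Xt = Zt Zt−1 with {Zt} iid N(0, 1). This is white noise (mean zero, variance one, uncorrelated at all lags), yet consecutive values share a Zt and are therefore dependent.

    # White noise that is not independent: X_t = Z_t * Z_{t-1}, Z_t iid N(0,1).
    set.seed(7)
    n <- 10000
    Z <- rnorm(n + 1)                 # Z[t+1] plays the role of Z_t
    X <- Z[2:(n + 1)] * Z[1:n]        # X_t = Z_t * Z_{t-1}
    cor(X[-n], X[-1])                 # lag-1 correlation: close to 0
    cor(X[-n]^2, X[-1]^2)             # correlation of the squares: clearly positive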

Here is another, more stringent, notion of stationarity.

Definition 1.2.3 (Strict stationarity). A process {Xt, t ∈ Z} is strictly stationary if Ft(·) = Ft+h(·) for all h ∈ Z and t ∈ T, or equivalently
P(Xt1 ≤ x1, . . . , Xtn ≤ xn) = P(Xt1+h ≤ x1, . . . , Xtn+h ≤ xn).

Example 1.2.2. An iid sequence is strictly stationary.

Example 1.2.3. Xt = X for all t is strictly stationary.

The following proposition is quite useful:

Proposition 1.2.1. Assume {Xt, t ∈ Z} is strictly stationary and f : R^∞ → R is B(R^∞)-B(R)-measurable. Then Yt = f(Xt, Xt−1, . . .) is again strictly stationary.

Proof. Not here.

Example 1.2.4. Assume {εt} ∼ IID. Then it is a strictly stationary process. Define the linear process
Xt = Σ_{k≥0} ak εt−k = a0 εt + a1 εt−1 + a2 εt−2 + . . . = f(εt, εt−1, . . .).
If the series converges (in some sense to be specified), {Xt, t ∈ Z} is strictly stationary.

Relation between weak and strict stationarity. If {Xt} is strictly stationary, the distribution Ft of Xt is the same for each t ∈ Z. Moreover, any pair (Xt, Xt+h) has a joint distribution Ft,t+h that doesn't depend on t. Hence
EXt = ∫_R x dFt(x) = m
is independent of t, and
Cov(Xt, Xt+h) = ∫_R ∫_R (x − m)(y − m) dFt,t+h(x, y)
is independent of t. Hence strict stationarity implies weak stationarity, provided the sequence has finite second moments. The reverse is in general not true. (Find a simple counterexample.)

There is an important case where weak stationarity implies strict stationarity: the Gaussian process. In this case weak stationarity of {Xt} implies strict stationarity, since (Xt1, . . . , Xtn)′ and (Xt1+h, . . . , Xtn+h)′ have the same mean and covariance matrix, and hence the same distribution, for all n ∈ {1, 2, . . .} and all h, t1, . . . , tn ∈ Z.

NOTE: We will henceforth call a time series stationary if it is weakly stationary.

1.3 Properties of the ACF

Proposition 1.3.1 (Elementary properties of the ACF). If γX(·) is the ACF of a stationary process {Xt, t ∈ Z}, then
(i) γX(0) ≥ 0,
(ii) |γX(h)| ≤ γX(0) for all h ∈ Z,
(iii) γX(h) = γX(−h) for all h ∈ Z.

Proof. (i) follows from γX(0) = Var(Xt) ≥ 0, (ii) follows from
|γX(h)| = |Cov(Xt+h, Xt)| ≤ (Var(Xt+h))^{1/2} (Var(Xt))^{1/2} = γX(0)
(Cauchy-Schwarz inequality, Proposition 2.1.1), and (iii) is trivial.

Definition 1.3.1 (Non-negative definiteness, n.n.d.). A real-valued function on the integers κ : Z → R is called non-negative definite (n.n.d.) if
Σ_{i=1}^n Σ_{j=1}^n ai κ(ti − tj) aj ≥ 0   for all ai ∈ R, ti ∈ Z, n ≥ 1.
In other words: the matrices K = (κ(ti − tj))_{i,j=1}^n are n.n.d.

Theorem 1.3.1 (Characterization of ACFs). A function γ : Z → R is the autocovariance function of a stationary process {Xt, t ∈ Z} if and only if γ is even and n.n.d.

Proof. Assume γ is the ACF of a stationary process. Then γ is even (Proposition 1.3.1, (iii)). Further,
0 ≤ Var(Σ_{i=1}^n ai Xti) = Σ_{i=1}^n Σ_{j=1}^n ai aj Cov(Xti, Xtj) = Σ_{i=1}^n Σ_{j=1}^n ai aj γ(ti − tj).

Conversely, assume γ is even and n.n.d. We will show that a Gaussian process with ACF γ exists. Define Γt = (γ(ti − tj))_{i,j=1}^n. Since Γt is n.n.d., we can find a multivariate normal vector X with Γt = Cov(X). We now use the "Fourier version" of Kolmogorov's extension theorem: let ϕt(u) be the characteristic function of a N(0, Γt) random vector. Then
lim_{uk→0} ϕt(u) = lim_{uk→0} ∫_{R^n} exp(iuᵀx) dFt(x)
 = lim_{uk→0} exp(−(1/2) uᵀ Γt u)
 = lim_{uk→0} exp(−(1/2) Σ_{i,j=1}^n ui γ(ti − tj) uj)
 = exp(−(1/2) Σ_{i,j≠k} ui γ(ti − tj) uj)
 = ϕt(k)(u(k)).
Hence Kolmogorov's extension theorem applies.

Example 1.3.1. Define Xt = Zt + θZt−1, where Zt ∼ IID(0, σ²) and θ ∈ R (an MA(1) process). Then
Xt−1 = Zt−1 + θZt−2,
Xt−2 = Zt−2 + θZt−3 (which shares no Zt with Xt and is therefore independent of it).
Hence:
γX(0) = σ²(1 + θ²),
γX(1) = σ²θ = γX(−1),
γX(k) = 0 for all |k| > 1.
Assume now that
κ(0) = a,  κ(1) = b = κ(−1),  κ(k) = 0 for all |k| > 1.
What requirements on a and b do we need to guarantee the existence of a stationary process with ACF κ? See Exercise 1.8.
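A quick Monte Carlo check of these MA(1) autocovariances (a sketch added here; θ = 0.6 and σ = 2 are arbitrary choices):

    # Estimate Cov(X_t, X_{t+h}) for X_t = Z_t + theta * Z_{t-1} over many replications.
    set.seed(8)
    theta <- 0.6; sigma <- 2; R <- 100000
    Z  <- matrix(rnorm(4 * R, sd = sigma), ncol = 4)   # columns: Z_{t-1}, Z_t, Z_{t+1}, Z_{t+2}
    X0 <- Z[, 2] + theta * Z[, 1]                      # X_t
    X1 <- Z[, 3] + theta * Z[, 2]                      # X_{t+1}
    X2 <- Z[, 4] + theta * Z[, 3]                      # X_{t+2}
    c(var(X0),     sigma^2 * (1 + theta^2))            # gamma_X(0)
    c(cov(X0, X1), sigma^2 * theta)                    # gamma_X(1)
    c(cov(X0, X2), 0)                                  # gamma_X(2)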

1.4 Sample mean and sample ACF

Assume we have a stationary process {Xt} with E[Xt] = m and ACF γX(h). In practice we don't know m and γX, hence they must be estimated from the data. Natural estimators are the sample mean
m̂ = X̄n = (1/n)(X1 + . . . + Xn)
and the sample ACF
γ̂X(h) = (1/n) Σ_{k=1}^{n−|h|} (Xk+|h| − X̄n)(Xk − X̄n).
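In R the sample mean and the sample ACF are available directly; the sketch below (added here, simulating one of the series from Figure 1.6) uses acf(), which demeans the series and uses the same divisor n as the formula above.

    # Sample mean and sample autocovariance / autocorrelation of a simulated AR(1).
    set.seed(9)
    x <- arima.sim(n = 2000, model = list(ar = 0.9), sd = sqrt(0.253))
    mean(x)                                    # estimate of m
    acf(x, lag.max = 30, type = "covariance")  # gamma_hat(h)
    acf(x, lag.max = 30)                       # rho_hat(h) = gamma_hat(h) / gamma_hat(0)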


Figure 1.6: The sample auto-correlation function of 2000 observations from the processes in Figure 1.4. [R-code]


Figure 1.6 shows the sample autocorrelation function of 2000 observations from the processes in Figure 1.4. One can show under quite general conditions that these estimators are consistent (and asymptotically normal, but that is not treated now):

(a) P(|X̄n − m| > ε) → 0 as n → ∞, for all ε > 0 (weak consistency, i.e. convergence in probability);

(b) P(max_{0≤h≤k} |γ̂X(h) − γX(h)| > ε) → 0 as n → ∞, for all ε > 0 and every fixed k ≥ 0 (weak consistency).

For the moment we treat only (a).

Proposition 1.4.1. Assume {Xt, t ∈ Z} is stationary and let γX(·) be its ACF. Then
(i) Var(X̄n) = E[(X̄n − m)²] → 0 as n → ∞, if γX(h) → 0 as h → ∞;
(ii) nVar(X̄n) = nE[(X̄n − m)²] → Σ_{h∈Z} γX(h) as n → ∞, if Σ_{h∈Z} |γX(h)| < ∞.

Remark. (i) implies consistency: by the Markov inequality,
P(|X̄n − m| > ε) ≤ Var(X̄n)/ε² → 0 as n → ∞, for all ε > 0.

Remark. When {Xt, t ∈ Z} are iid with E[Xt] = m and Var(Xt) = σ² < ∞, the central limit theorem tells us that
Zn := √n (X̄n − m)/σ →d N(0, 1).
Here E[Zn] = 0 and Var(Zn) = E[Zn²] = 1, while in the general stationary case we have E[Zn] = 0 and E[Zn²] = (n/σ²) E[(X̄n − m)²] → (1/σ²) Σ_{h∈Z} γX(h), which follows from (ii). Hence, if a central limit theorem applies, we should scale with τ instead of σ, where
τ² = Σ_{h∈Z} γX(h) = Σ_{h∈Z} Cov(X0, Xh).

!

n

1 XX = Cov(Xk , Xl ) n k=1 l=1 1 nγX (0) + [(n − 1)γX (1) + (n − 1)γX (−1)] n  + [(n − 2)γX (2) + (n − 2)γX (−2)] + . . . + [γX (n − 1) + γX (−n + 1)]  X |h| = 1− γX (h). n =

|h|