Time Series Analysis

Henrik Madsen
[email protected]
Informatics and Mathematical Modelling, Technical University of Denmark, DK-2800 Kgs. Lyngby

Based on: H. Madsen, Time Series Analysis, Chapman & Hall
Outline of the lecture

Identification of univariate time series models:
- Introduction, Sec. 6.1
- Estimation of auto-covariance and -correlation, Sec. 6.2.1 (and the intro to 6.2)
- Using the SACF, SPACF, and SIACF for suggesting model structure, Sec. 6.3
- Estimation of model parameters, Sec. 6.4
- Examples

Cursory material: the extended linear model class in Sec. 6.4.2 (we'll come back to the extended model class later)
Model building in general

1. Identification (specifying the model order), based on data, theory, and physical insight
2. Estimation (of the model parameters)
3. Model checking: is the model OK?
   - No: go back to step 1
   - Yes: apply the model (prediction, simulation, etc.)
Identification of univariate time series models

[Figure: plot of the time series at hand]

What ARIMA structure would be appropriate for the data at hand? (If any)

Given the structure we will then consider how to estimate the parameters (next lecture). What do we know about ARIMA models which could help us?
Estimation of the autocovariance function

Estimate of $\gamma(k)$:

$$C_{YY}(k) = C(k) = \hat{\gamma}(k) = \frac{1}{N}\sum_{t=1}^{N-|k|} (Y_t - \bar{Y})(Y_{t+|k|} - \bar{Y})$$

By symmetry it is enough to consider $k \ge 0$.

S-PLUS: acf(x, type = "covariance")
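The same estimate can be computed by hand and checked against the built-in function. A minimal sketch in R (the open-source successor to the S-PLUS dialect used on these slides):

# Compute gamma-hat(k) with the 1/N normalisation used above and
# compare with acf(..., type = "covariance"), which uses the same.
set.seed(1)
y <- arima.sim(model = list(ar = 0.8), n = 200)
gamma.hat <- function(y, k) {
  N    <- length(y)
  ybar <- mean(y)
  sum((y[1:(N - k)] - ybar) * (y[(1 + k):N] - ybar)) / N
}
sapply(0:5, gamma.hat, y = y)                               # by hand
acf(y, lag.max = 5, type = "covariance", plot = FALSE)$acf  # built-in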
Some properties of C(k)

The estimate is a non-negative definite function (as $\gamma(k)$).

The estimator is non-central (biased):

$$E[C(k)] = \frac{1}{N}\sum_{t=1}^{N-|k|} \gamma(k) = \left(1 - \frac{|k|}{N}\right)\gamma(k)$$

Asymptotically central (consistent) for fixed $k$: $E[C(k)] \to \gamma(k)$ for $N \to \infty$.

The estimates are themselves autocorrelated (don't trust apparent correlation at high lags too much).
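The bias factor $(1 - |k|/N)$ can be illustrated by simulation. A small sketch in R, using an AR(1) where $\gamma(k) = \phi^k/(1-\phi^2)$ for unit innovation variance (estimating the mean adds a further $O(1/N)$ bias, so the agreement is only approximate):

# Monte-Carlo check of E[C(k)] ~ (1 - |k|/N) * gamma(k) for an AR(1).
set.seed(2)
N <- 50; phi <- 0.7; k <- 5
gamma.k <- phi^k / (1 - phi^2)   # theoretical gamma(k)
C.k <- replicate(5000, {
  y <- arima.sim(model = list(ar = phi), n = N)
  ybar <- mean(y)
  sum((y[1:(N - k)] - ybar) * (y[(1 + k):N] - ybar)) / N
})
mean(C.k)               # simulated E[C(k)]
(1 - k / N) * gamma.k   # the slide's approximation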
How does C(k) behave for non-stationary series?

$$C(k) = \frac{1}{N}\sum_{t=1}^{N-|k|} (Y_t - \bar{Y})(Y_{t+|k|} - \bar{Y})$$
[Figure: a realisation of arima.sim(model = list(ar = 0.9, ndiff = 1), n = 500) together with its SACF; for this non-stationary series the SACF decays only very slowly with the lag]
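The figure can be reproduced in R, where the S-PLUS argument ndiff = 1 is expressed through the order component. A sketch:

# Simulate an ARIMA(1,1,0) series (R equivalent of the S-PLUS call
# in the figure title) and inspect its slowly decaying SACF.
set.seed(3)
y <- arima.sim(model = list(order = c(1, 1, 0), ar = 0.9), n = 500)
plot(y)
acf(y)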
Autocorrelation and Partial Autocorrelation

Sample autocorrelation function (SACF): $\hat{\rho}(k) = r_k = C(k)/C(0)$

For white noise and $k \neq 0$ it holds that $E[\hat{\rho}(k)] \simeq 0$ and $V[\hat{\rho}(k)] \simeq 1/N$; this gives the bounds $\pm 2/\sqrt{N}$ for deciding when it is not possible to distinguish a value from zero.

S-PLUS: acf(x)

Sample partial autocorrelation function (SPACF): use the Yule-Walker equations on $\hat{\rho}(k)$ (exactly as for the theoretical relations).

It turns out that $\pm 2/\sqrt{N}$ is also appropriate for deciding when the SPACF is zero (more in the next lecture).

S-PLUS: acf(x, type="partial")
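In R the same plots, with the $\pm 2/\sqrt{N}$ guide lines drawn automatically, are obtained as follows (a sketch; strictly, R draws $\pm 1.96/\sqrt{N}$):

# SACF and SPACF of a simulated AR(2); the dashed lines are the
# approximate 95% bounds discussed above.
set.seed(4)
y <- arima.sim(model = list(ar = c(1.5, -0.7)), n = 200)
acf(y)    # SACF  (S-PLUS: acf(x))
pacf(y)   # SPACF (S-PLUS: acf(x, type = "partial"))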
What would be an appropriate structure?

[Figure: example series 1 with its SACF and SPACF]
What would be an appropriate structure?

[Figure: example series 2 with its SACF and SPACF]
What would be an appropriate structure?

[Figure: example series 3 with its SACF and SPACF]
What would be an appropriate structure?

[Figure: example series 4 with its SACF and SPACF]
What would be an appropriate structure?

[Figure: example series 5 with its SACF and SPACF]
What would be an appropriate structure?

[Figure: example series 6 with its SACF and SPACF]
What would be an appropriate structure?

[Figure: example series 7 with its SACF and SPACF]
Example of data from an MA(2)-process

[Figure: the MA(2) series with its SACF and SPACF]
Example of data from a non-stationary process

[Figure: the series with its SACF and SPACF; the SACF dies out only slowly]
Same series; analysing $\nabla Y_t = (1 - B)Y_t = Y_t - Y_{t-1}$

[Figure: the differenced series with its SACF and SPACF]
Identification of the order of differencing

- Select the order of differencing $d$ as the first order for which the autocorrelation decreases sufficiently fast towards 0
- In practice $d$ is 0, 1, or maybe 2
- Sometimes a periodic difference is required, e.g. $Y_t - Y_{t-12}$
- Remember to consider the practical application... it may be that the system is stationary, but you measured over too short a period
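In R this procedure is a matter of differencing and re-inspecting the SACF; a minimal sketch:

# Difference until the SACF dies out quickly: diff(y) is (1 - B)y_t,
# diff(y, lag = 12) is the periodic difference y_t - y_{t-12}.
set.seed(5)
y <- arima.sim(model = list(order = c(0, 1, 1), ma = 0.6), n = 300)
acf(y)        # slow, almost linear decay suggests differencing
acf(diff(y))  # SACF of (1 - B)y_t dies out quickly, so d = 1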
Stationarity vs. length of measuring period

[Figure: US/CA 30 day interest rate differential, 1976-1996]

[Figure: US/CA 30 day interest rate differential, 1990-1996 (the last part of the same series, by quarter)]
Identification of the ARMA-part

Characteristics of the autocorrelation functions:

Process      | ACF $\rho(k)$                                              | PACF $\phi_{kk}$
AR(p)        | Damped exponential and/or sine functions                   | $\phi_{kk} = 0$ for $k > p$
MA(q)        | $\rho(k) = 0$ for $k > q$                                  | Dominated by damped exponential and/or sine functions
ARMA(p,q)    | Damped exponential and/or sine functions after lag $q - p$ | Dominated by damped exponential and/or sine functions after lag $p - q$

The IACF is similar to the PACF; see the book page 133.
Behaviour of the SACF $\hat{\rho}(k)$ (based on N obs.)

If the process is white noise then

$$\pm 2\sqrt{\frac{1}{N}}$$

is an approximate 95% confidence interval for the SACF at lags different from 0.

If the process is an MA(q)-process then

$$\pm 2\sqrt{\frac{1 + 2(\hat{\rho}^2(1) + \cdots + \hat{\rho}^2(q))}{N}}$$

is an approximate 95% confidence interval for the SACF at lags larger than q.
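The MA(q) bound is easily computed from the SACF itself. A sketch in R, under the assumption that the data really are MA(2):

# Approximate 95% interval for the SACF at lags > q for an MA(q).
set.seed(6)
y <- arima.sim(model = list(ma = c(0.8, 0.4)), n = 200)   # an MA(2)
N <- length(y); q <- 2
r <- acf(y, lag.max = 20, plot = FALSE)$acf[-1]   # rho-hat(1), rho-hat(2), ...
bound <- 2 * sqrt((1 + 2 * sum(r[1:q]^2)) / N)    # the formula above
r[(q + 1):20]   # SACF at lags > q should mostly fall inside ...
bound           # ... +/- this (wider than 2/sqrt(N))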
Behaviour of the SPACF $\hat{\phi}_{kk}$ (based on N obs.)

If the process is an AR(p)-process then

$$\pm 2\sqrt{\frac{1}{N}}$$

is an approximate 95% confidence interval for the SPACF at lags larger than p.
Model building in general (recap)

1. Identification (specifying the model order), based on data, theory, and physical insight
2. Estimation (of the model parameters)
3. Model checking: is the model OK?
   - No: go back to step 1
   - Yes: apply the model (prediction, simulation, etc.)
Estimation

We have an appropriate model structure AR(p), MA(q), ARMA(p,q), or ARIMA(p,d,q) with p, d, and q known.

Task: based on the observations, find appropriate values of the parameters.

The book describes many methods:
- Moment estimates
- LS estimates
- Prediction error estimates (conditioned or unconditioned)
- ML estimates (conditioned or unconditioned (exact))
Example

Using the autocorrelation functions we agreed that

$$\hat{y}_{t+1|t} = a_1 y_t + a_2 y_{t-1}$$

and we would select $a_1$ and $a_2$ so that the sum of the squared prediction errors becomes as small as possible when using the model on the data at hand.
To comply with the notation of the book we will write the 1-step forecasts as

$$\hat{y}_{t+1|t} = -\phi_1 y_t - \phi_2 y_{t-1}$$
The errors given the parameters ($\phi_1$ and $\phi_2$)

Observations: $y_1, y_2, \ldots, y_N$

Errors: $e_{t+1|t} = y_{t+1} - \hat{y}_{t+1|t} = y_{t+1} - (-\phi_1 y_t - \phi_2 y_{t-1})$

$$e_{3|2} = y_3 + \phi_1 y_2 + \phi_2 y_1$$
$$e_{4|3} = y_4 + \phi_1 y_3 + \phi_2 y_2$$
$$e_{5|4} = y_5 + \phi_1 y_4 + \phi_2 y_3$$
$$\vdots$$
$$e_{N|N-1} = y_N + \phi_1 y_{N-1} + \phi_2 y_{N-2}$$

In matrix form:

$$\begin{bmatrix} y_3 \\ \vdots \\ y_N \end{bmatrix} = \begin{bmatrix} -y_2 & -y_1 \\ \vdots & \vdots \\ -y_{N-1} & -y_{N-2} \end{bmatrix} \begin{bmatrix} \phi_1 \\ \phi_2 \end{bmatrix} + \begin{bmatrix} e_{3|2} \\ \vdots \\ e_{N|N-1} \end{bmatrix}$$

Or just: $Y = X\theta + \varepsilon$
Solution

To minimize the sum of the squared 1-step prediction errors $\varepsilon^T \varepsilon$ we use the result for the General Linear Model from Chapter 3:

$$\hat{\theta} = (X^T X)^{-1} X^T Y$$

with

$$X = \begin{bmatrix} -y_2 & -y_1 \\ \vdots & \vdots \\ -y_{N-1} & -y_{N-2} \end{bmatrix} \quad \text{and} \quad Y = \begin{bmatrix} y_3 \\ \vdots \\ y_N \end{bmatrix}$$

The method is called the LS estimator for dynamical systems. It is also in the class of prediction error methods since it minimizes the sum of the squared 1-step prediction errors.

How does it generalize to AR(p)-models?
Small illustrative example using S-PLUS

> obs
[1] -3.51 -3.81 -1.85 -2.02 -1.91 -0.88
> N <- length(obs)                          # reconstructed; lost in extraction
> X <- cbind(-obs[2:(N-1)], -obs[1:(N-2)])  # reconstructed; consistent with X below
> X
     [,1] [,2]
[1,] 3.81 3.51
[2,] 1.85 3.81
[3,] 2.02 1.85
[4,] 1.91 2.02
> Y <- obs[3:N]                             # reconstructed
> solve(t(X) %*% X, t(X) %*% Y)  # Estimates
           [,1]
[1,] -0.1474288
[2,] -0.4476040
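To answer the generalization question: for an AR(p) the X matrix simply gets p columns of lagged values. A sketch in R (the helper name lsAR is illustrative only):

# LS estimation of an AR(p) in the book's sign convention
# (y_t + phi_1 y_{t-1} + ... + phi_p y_{t-p} = e_t).
lsAR <- function(y, p) {
  N <- length(y)
  Y <- y[(p + 1):N]
  X <- sapply(1:p, function(j) -y[(p + 1 - j):(N - j)])
  solve(t(X) %*% X, t(X) %*% Y)   # theta-hat = (X'X)^{-1} X'Y
}
obs <- c(-3.51, -3.81, -1.85, -2.02, -1.91, -0.88)
lsAR(obs, 2)   # reproduces the estimates above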
Maximum likelihood estimates

ARMA(p,q)-process:

$$Y_t + \phi_1 Y_{t-1} + \cdots + \phi_p Y_{t-p} = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q}$$

Notation:

$$\theta^T = (\phi_1, \ldots, \phi_p, \theta_1, \ldots, \theta_q)$$
$$Y_t^T = (Y_t, Y_{t-1}, \ldots, Y_1)$$

The likelihood function is the joint probability distribution function of all observations for given values of $\theta$ and $\sigma_\varepsilon^2$:

$$L(Y_N; \theta, \sigma_\varepsilon^2) = f(Y_N | \theta, \sigma_\varepsilon^2)$$

Given the observations $Y_N$ we estimate $\theta$ and $\sigma_\varepsilon^2$ as the values for which the likelihood is maximized.
The likelihood function for ARMA(p,q)-models

The random variable $Y_N | Y_{N-1}$ only contains $\varepsilon_N$ as a random component. $\varepsilon_N$ is white noise at time $N$ and therefore does not depend on anything in the past.

We therefore know that the random variables $Y_N | Y_{N-1}$ and $Y_{N-1}$ are independent, hence:

$$f(Y_N | \theta, \sigma_\varepsilon^2) = f(Y_N | Y_{N-1}, \theta, \sigma_\varepsilon^2) f(Y_{N-1} | \theta, \sigma_\varepsilon^2)$$

Repeating these arguments:

$$L(Y_N; \theta, \sigma_\varepsilon^2) = \left( \prod_{t=p+1}^{N} f(Y_t | Y_{t-1}, \theta, \sigma_\varepsilon^2) \right) f(Y_p | \theta, \sigma_\varepsilon^2)$$
The conditional likelihood function

Evaluation of $f(Y_p | \theta, \sigma_\varepsilon^2)$ requires special attention.

It turns out that the conditional likelihood function

$$L(Y_N; \theta, \sigma_\varepsilon^2) = \prod_{t=p+1}^{N} f(Y_t | Y_{t-1}, \theta, \sigma_\varepsilon^2)$$

results in the same estimates as the exact likelihood function when many observations are available. For small samples there can be some difference.

Software: the S-PLUS function arima.mle calculates conditional estimates; the R function arima calculates exact estimates.
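For routine work one therefore just calls the built-in fitter. A sketch in R (note that R's arima.sim and arima use AR signs opposite to the book's $\phi$'s):

# Fit an ARMA(1,1): exact ML (default) and conditional ("CSS").
set.seed(8)
y <- arima.sim(model = list(ar = 0.7, ma = 0.4), n = 500)
arima(y, order = c(1, 0, 1), include.mean = FALSE)                  # exact ML
arima(y, order = c(1, 0, 1), include.mean = FALSE, method = "CSS")  # conditional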
Evaluating the conditional likelihood function

Task: find the conditional densities given specified values of the parameters $\theta$ and $\sigma_\varepsilon^2$.

The mean of the random variable $Y_t | Y_{t-1}$ is the 1-step forecast $\hat{Y}_{t|t-1}$. The prediction error $\varepsilon_t = Y_t - \hat{Y}_{t|t-1}$ has variance $\sigma_\varepsilon^2$.

We assume that the process is Gaussian:

$$f(Y_t | Y_{t-1}, \theta, \sigma_\varepsilon^2) = \frac{1}{\sigma_\varepsilon \sqrt{2\pi}} e^{-(Y_t - \hat{Y}_{t|t-1}(\theta))^2 / 2\sigma_\varepsilon^2}$$

And therefore:

$$L(Y_N; \theta, \sigma_\varepsilon^2) = (2\pi\sigma_\varepsilon^2)^{-\frac{N-p}{2}} \exp\left( -\frac{1}{2\sigma_\varepsilon^2} \sum_{t=p+1}^{N} \varepsilon_t^2(\theta) \right)$$
ML estimates

The (conditional) ML estimate $\hat{\theta}$ is a prediction error estimate since it is obtained by minimizing

$$S(\theta) = \sum_{t=p+1}^{N} \varepsilon_t^2(\theta)$$

By differentiating w.r.t. $\sigma_\varepsilon^2$ it can be shown that the ML estimate of $\sigma_\varepsilon^2$ is

$$\hat{\sigma}_\varepsilon^2 = S(\hat{\theta})/(N - p)$$

The estimate $\hat{\theta}$ is asymptotically "good" and its variance-covariance matrix is approximately $2\sigma_\varepsilon^2 H^{-1}$, where $H$ contains the 2nd order partial derivatives of $S(\theta)$ at the minimum.
Finding the ML estimates using the PE method

1-step predictions:

$$\hat{Y}_{t|t-1} = -\phi_1 Y_{t-1} - \cdots - \phi_p Y_{t-p} + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q}$$

If we use $\varepsilon_p = \varepsilon_{p-1} = \cdots = \varepsilon_{p+1-q} = 0$ we can find

$$\hat{Y}_{p+1|p} = -\phi_1 Y_p - \cdots - \phi_p Y_1 + \theta_1 \varepsilon_p + \cdots + \theta_q \varepsilon_{p+1-q}$$

which gives us $\varepsilon_{p+1} = Y_{p+1} - \hat{Y}_{p+1|p}$; we can then calculate $\hat{Y}_{p+2|p+1}$ and $\varepsilon_{p+2}$, and so on until we have all the 1-step prediction errors we need.

We use numerical optimization to find the parameters which minimize the sum of squared prediction errors.
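A minimal sketch of this recursion for an ARMA(1,1) in R, with the initial prediction error set to zero as above (the names condSS and opt are illustrative):

# Conditional sum of squares S(theta) for an ARMA(1,1) in the book's
# convention Y_t + phi Y_{t-1} = e_t + theta e_{t-1}, minimised
# numerically; the simulated series has phi = 0.7, theta = 0.4 in
# this convention (R's AR sign differs).
condSS <- function(par, y) {
  phi <- par[1]; theta <- par[2]
  N <- length(y)
  e <- numeric(N)                                # e[1] = 0: the conditioning
  for (t in 2:N) {
    yhat <- -phi * y[t - 1] + theta * e[t - 1]   # 1-step forecast
    e[t] <- y[t] - yhat
  }
  sum(e[2:N]^2)
}
set.seed(9)
y <- arima.sim(model = list(ar = -0.7, ma = 0.4), n = 500, sd = 0.25)
opt <- optim(c(0, 0), condSS, y = y, hessian = TRUE)
opt$par                            # estimates of (phi, theta)
s2 <- opt$value / (length(y) - 1)  # sigma-hat^2 = S(theta-hat)/(N - p)
2 * s2 * solve(opt$hessian)        # approximate variance-covariance matrix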
$S(\theta)$ for $(1 + 0.7B)Y_t = (1 - 0.4B)\varepsilon_t$ with $\sigma_\varepsilon^2 = 0.25^2$

Data: arima.sim(model=list(ar=-0.7, ma=0.4), n=500, sd=0.25)

[Figure: contour plot of $S(\theta)$ over the AR parameter (vertical axis) and the MA parameter (horizontal axis), with contour levels from about 30 to 45 and the minimum near the true parameters]
Moment estimates

Given the model structure:
- Find formulas for the theoretical autocorrelation or autocovariance as functions of the parameters in the model
- Estimate them, e.g. calculate the SACF
- Solve the equations, using the lowest lags necessary

Complicated! General properties of the estimator unknown!
Moment estimates for AR(p)-processes

In this case moment estimates are simple to find due to the Yule-Walker equations. We simply plug the estimated autocorrelation function at lags 1 to p into

$$\begin{bmatrix} \hat{\rho}(1) \\ \hat{\rho}(2) \\ \vdots \\ \hat{\rho}(p) \end{bmatrix} = \begin{bmatrix} 1 & \hat{\rho}(1) & \cdots & \hat{\rho}(p-1) \\ \hat{\rho}(1) & 1 & \cdots & \hat{\rho}(p-2) \\ \vdots & & & \vdots \\ \hat{\rho}(p-1) & \hat{\rho}(p-2) & \cdots & 1 \end{bmatrix} \begin{bmatrix} -\phi_1 \\ -\phi_2 \\ \vdots \\ -\phi_p \end{bmatrix}$$

and solve w.r.t. the $\phi$'s.

The function ar in S-PLUS does this.
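A sketch in R, solving the Yule-Walker system directly and comparing with the built-in ar (whose coefficients equal the book's $-\phi$'s):

# Yule-Walker (moment) estimates for an AR(2).
set.seed(10)
y <- arima.sim(model = list(ar = c(1.5, -0.7)), n = 500)
p <- 2
r <- acf(y, lag.max = p, plot = FALSE)$acf[-1]   # rho-hat(1), ..., rho-hat(p)
R <- toeplitz(c(1, r[1:(p - 1)]))                # correlation matrix above
solve(R, r)                                      # = -phi-hat (book's signs)
ar(y, order.max = p, method = "yule-walker", aic = FALSE)$ar   # built-in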