TIME SERIES ECONOMETRICS: SOME BASIC CONCEPTS
Reference: Gujarati, Chapters 21, 22

1. Regression analysis with time series data implicitly assumes that the underlying series is stationary.
2. Autocorrelation sometimes arises because the underlying time series data is non-stationary.
3. Sometimes one obtains a very high R2 and significant regression coefficients even though there is no meaningful relationship between the two variables: the problem of spurious, or nonsense, regression.
7-1
Stochastic Processes

Let Z_t be the observation made at time t. The units of time vary with the application; they could be years, quarters, months, days, ... We assume that the observations are equally spaced in time. The sequence of random variables {Z_1, Z_2, · · · , Z_T} is called a stochastic process.

Its mean function is
    µ_t = E(Z_t),    t = 0, ±1, ±2, · · ·
µ_t is the expected value of the process at time t.

The autocovariance function is
    γ_{t,s} = Cov(Z_t, Z_s) = E[(Z_t − µ_t)(Z_s − µ_s)],    t, s = 0, ±1, ±2, · · ·

The variance function is
    Var(Z_t) = γ_{t,t} = Cov(Z_t, Z_t) = E[(Z_t − µ_t)^2]

The autocorrelation function is
    ρ_{t,s} = Corr(Z_t, Z_s) = Cov(Z_t, Z_s) / [Var(Z_t) Var(Z_s)]^{1/2},    t, s = 0, ±1, ±2, · · ·

7-2
STATIONARITY

The time series Z_t is weakly stationary if
    µ_t = E(Z_t) = µ
and
    γ_{t,s} = Cov(Z_t, Z_s) = Cov(Z_{t−l}, Z_{s−l})        (1)
for any integer l. Equation (1) implies γ_{t,s} = γ_{0,k}, where k = |t − s|. Thus, for a stationary process we can simply write
    γ_k = Cov(Z_t, Z_{t−k})
and
    ρ_k = Corr(Z_t, Z_{t−k})
Note that ρ_k = γ_k / γ_0.

7-3
WHITE NOISE

Let {ε_t} be a sequence of independent random variables with mean 0 and variance σ^2, and let
    Y_t = µ + ε_t
Then
    E(Y_t) = µ
    γ_k = Cov(Y_t, Y_{t−k}) = Cov(ε_t, ε_{t−k}) = σ^2 if k = 0, and 0 if k ≠ 0
and
    ρ_k = 1 if k = 0, and 0 if k ≠ 0
Such a sequence is called a purely random sequence or a white noise sequence.
7-4
Example of Stationary Series

Let {ε_t} be a white noise sequence distributed as N(0, σ^2). Define a new process {Y_t} by
    Y_t = µ + ε_t + ε_{t−1}
Then
    E(Y_t) = µ
    γ_0 = Var(Y_t) = Var(ε_t + ε_{t−1}) = Var(ε_t) + Var(ε_{t−1}) = 2σ^2
    γ_1 = Cov(Y_t, Y_{t−1}) = Cov(ε_t + ε_{t−1}, ε_{t−1} + ε_{t−2}) = σ^2
    γ_k = Cov(Y_t, Y_{t−k}) = Cov(ε_t + ε_{t−1}, ε_{t−k} + ε_{t−k−1}) = 0    for |k| > 1
Hence
    γ_k = 2σ^2 if k = 0,  σ^2 if |k| = 1,  0 if |k| > 1
and
    ρ_k = 1 if k = 0,  1/2 if |k| = 1,  0 if |k| > 1

7-5
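A minimal numpy sketch (not from the notes; the values of µ and σ are illustrative) that checks these autocovariances by simulation:

import numpy as np

rng = np.random.default_rng(0)
T, mu, sigma = 100_000, 5.0, 2.0

eps = rng.normal(0.0, sigma, size=T + 1)          # white noise ε_t ~ N(0, σ^2)
y = mu + eps[1:] + eps[:-1]                       # Y_t = µ + ε_t + ε_{t-1}

ybar = y.mean()
g0 = np.mean((y - ybar) ** 2)                     # sample γ_0, should be close to 2σ^2 = 8
g1 = np.mean((y[1:] - ybar) * (y[:-1] - ybar))    # sample γ_1, should be close to σ^2 = 4
print(round(ybar, 3), round(g0, 3), round(g1, 3), round(g1 / g0, 3))   # ρ_1 ≈ 0.5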
Example of Nonstationary Series

In practice, we usually find series that are not stationary. For example, economic or business series often show a trend or a change in mean level over time, reflecting growth, or are nonstationary because of seasonal features. An important practical matter in time series analysis is how to transform a nonstationary series into a stationary one, or how to model the nonstationarity directly. Two fundamental approaches for dealing with nonstationarity are:
1. Work with the changes, or differences, of the series, since these may be stationary.
2. Remove the nonstationary components, e.g. a nonconstant mean, by linear regression techniques.
7-6
RANDOM WALK

Let a_t be iid N(0, σ^2) and let
    Z_t = Z_{t−1} + a_t,    t = 1, 2, · · ·
with Z_0 = 0. Then
    Z_t = a_1 + a_2 + · · · + a_t
Z_t is called a random walk, with mean µ_t = 0, variance Var(Z_t) = tσ^2, and autocovariance γ_{t,s} = tσ^2 for 1 ≤ t ≤ s. Since Var(Z_t) and γ_{t,s} depend on t, Z_t is not stationary.
7-7
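A small simulation sketch (illustrative, not from the notes) that checks Var(Z_t) = tσ^2 numerically by generating many independent random-walk paths:

import numpy as np

rng = np.random.default_rng(1)
sigma, T, n_paths = 1.0, 500, 2000

a = rng.normal(0.0, sigma, size=(n_paths, T))     # iid N(0, σ^2) increments
z = np.cumsum(a, axis=1)                          # Z_t = a_1 + ... + a_t along each path

# the cross-sectional variance at time t should be close to t * σ^2
for t in (10, 100, 500):
    print(t, round(z[:, t - 1].var(), 1))         # roughly 10, 100, 500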
Example: Random Walk with Drift

Let ε_t be iid N(0, σ^2) and let
    Y_t = Y_{t−1} + δ + ε_t,    t = 1, 2, · · ·
with Y_0 = 0, where δ is a constant. Such a series is called a random walk with drift δ. We have
    Y_t = Y_0 + tδ + Σ_{j=1}^{t} ε_j
Its mean is µ_t = E(Y_t) = tδ and its variance is Var(Y_t) = tσ^2. Thus {Y_t} is not stationary; both its mean and variance depend on t. Note that the series of changes, or first differences, of {Y_t}, defined by
    Z_t = Y_t − Y_{t−1} = δ + ε_t
is a white noise series.
7-8
ESTIMATION OF MEAN, AUTOCOVARIANCES, AND AUTOCORRELATIONS FOR STATIONARY SERIES

Suppose Y_1, Y_2, · · · , Y_T is a sample realization of a stationary time series {Y_t} with mean
    µ = E(Y_t)
autocovariance function
    γ_k = Cov(Y_t, Y_{t+k}) = Cov(Y_t, Y_{t−k})
and autocorrelation function
    ρ_k = Corr(Y_t, Y_{t+k}) = γ_k / γ_0

7-9

The estimator for µ is the sample mean
    Ȳ = (1/T) Σ_{t=1}^{T} Y_t
The estimator for γ_k is
    c_k = (1/T) Σ_{t=1}^{T−k} (Y_t − Ȳ)(Y_{t+k} − Ȳ),    k = 0, 1, 2, · · ·
where k is small relative to T. Note that
    c_0 = (1/T) Σ_{t=1}^{T} (Y_t − Ȳ)^2
is the sample variance. The estimator for ρ_k is the sample ACF
    r_k = c_k / c_0 = Σ_{t=1}^{T−k} (Y_t − Ȳ)(Y_{t+k} − Ȳ) / Σ_{t=1}^{T} (Y_t − Ȳ)^2,    k = 0, 1, 2, · · ·
A plot of r_k versus k is called a correlogram.
7-10
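A minimal numpy sketch of the estimators Ȳ, c_k and r_k defined above (the function name and the data are illustrative):

import numpy as np

def sample_acf(y, nlags):
    """Sample autocovariances c_k and autocorrelations r_k as defined above."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    d = y - y.mean()                                                  # deviations from Ȳ
    c = np.array([np.sum(d[: T - k] * d[k:]) / T for k in range(nlags + 1)])   # c_0..c_k
    return c, c / c[0]                                                # (c_k, r_k)

rng = np.random.default_rng(1)
y = 10 + rng.normal(size=200)          # an illustrative stationary series
c, r = sample_acf(y, 10)
print(np.round(r, 3))                  # correlogram values r_0 = 1, r_1..r_10 near 0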
Sampling Properties of Estimators

1. Ȳ is an unbiased estimator of µ, that is, E(Ȳ) = µ.
2. Var(Ȳ) = Var((1/T) Σ_{t=1}^{T} Y_t) = (γ_0/T) [1 + 2 Σ_{k=1}^{T−1} ((T − k)/T) ρ_k]
   If the Y_t are independent, then ρ_k = 0 for all k ≠ 0 and so Var(Ȳ) = γ_0/T. When T is large,
       Var(Ȳ) ≈ (γ_0/T) Σ_{k=−∞}^{∞} ρ_k
3. r_k is approximately normally distributed.
4. E(r_k) ≈ ρ_k.
5. Var(r_k) ≈ (1/T) Σ_{s=−∞}^{∞} [ ρ_s^2 + ρ_{s+k} ρ_{s−k} − 4 ρ_k ρ_s ρ_{s−k} + 2 ρ_s^2 ρ_k^2 ]

7-11
Special case

When the series is white noise, ρ_s = 0 for s ≠ 0, and then
    Var(r_k) ≈ 1/T    for k ≠ 0
In fact, r_k is approximately distributed as N(0, 1/T) for k = 1, 2, · · ·. This property will be applied to check whether a model is appropriate or not: if the model fits the data, the residuals will behave as a white noise series and hence about 95% of their sample ACF values will lie between −2/√T and 2/√T.
7-12
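A short sketch of the ±2/√T check (assuming the statsmodels acf helper is available; the manual r_k computation above works equally well, and the simulated series is illustrative):

import numpy as np
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(2)
T = 400
x = rng.normal(size=T)                       # white noise

r = acf(x, nlags=20)[1:]                     # sample ACF at lags 1..20
band = 2 / np.sqrt(T)
print(np.mean(np.abs(r) < band))             # roughly 0.95 of the lags fall inside ±2/√T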
General Characteristics of the Sample ACF

1. Stationary series
   (a) The sample ACF tends to damp out to 0 fairly rapidly as the lag k increases; it may
   (b) cut off, or
   (c) damp out exponentially or sinusoidally.
2. Non-stationary series
   (a) The sample ACF tends to damp out very slowly, roughly linearly.
   (b) A sample ACF that damps out very slowly and sinusoidally indicates a strong seasonal component.
7-13
PARTIAL AUTOCORRELATION FUNCTION

For a stationary, normally distributed time series {Z_t}, the partial autocorrelation function (PACF) at lag k is defined as
    φ_{kk} = Corr(Z_t, Z_{t−k} | Z_{t−1}, Z_{t−2}, · · · , Z_{t−k+1})
which is the correlation between Z_t and Z_{t−k} after removing the effect of the intervening variables Z_{t−1}, Z_{t−2}, · · · , Z_{t−k+1}. Its estimator is the sample partial autocorrelation, r_{kk}.

Property: If {Z_t}, t = 1, 2, · · · , T, is white noise, then its sample partial autocorrelation r_{kk} is approximately distributed as N(0, 1/T) for k = 1, 2, · · ·. This property will be applied to check whether a model is appropriate or not: if the model fits the data, the residuals will behave as a white noise series and hence about 95% of their sample PACF values will lie between −2/√T and 2/√T.
7-14
Tests of Stationarity

For a stationary series:
1. The sample ACF tends to damp out to 0 fairly rapidly as the lag k increases; it may
   (a) cut off, or
   (b) damp out exponentially or sinusoidally.
2. The sample PACF tends to damp out to 0 fairly rapidly as the lag k increases; it may
   (a) cut off, or
   (b) damp out exponentially or sinusoidally.
7-15
Tests of White Noise

If the time series is white noise, then
1. its sample ACF r_k is approximately distributed as N(0, 1/T) for k = 1, 2, · · ·, and
2. its sample PACF r_{kk} is approximately distributed as N(0, 1/T) for k = 1, 2, · · ·.
Hence, if the time series is white noise,
1. about 95% of its sample ACF values r_k lie between −2/√T and 2/√T;
2. about 95% of its sample PACF values r_{kk} lie between −2/√T and 2/√T.
3. In addition, we can apply the Box-Pierce Q statistic
       Q = n Σ_{k=1}^{m} r_k^2
   or the Ljung-Box Q statistic
       LB = n(n + 2) Σ_{k=1}^{m} r_k^2 / (n − k)
   where n is the sample size and m is the lag length, to test for white noise (see the sketch below).
If the time series is white noise, Q ∼ χ²_m and LB ∼ χ²_m.

7-16
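A sketch of the Box-Pierce and Ljung-Box checks using statsmodels' acorr_ljungbox (assumed available; the simulated series and lag choices are illustrative):

import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(3)
x = rng.normal(size=200)                               # a white noise series

# Q and LB statistics with m = 6, 12, 18, 24 lags, plus their p-values
out = acorr_ljungbox(x, lags=[6, 12, 18, 24], boxpierce=True)
print(out)                                             # large p-values => no evidence against white noise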
Notation: The backward shift operator B is defined by
    B Y_t = Y_{t−1}    and hence    B^i Y_t = Y_{t−i}
The forward shift operator F = B^{−1} is defined by
    F Y_t = Y_{t+1}    and hence    F^i Y_t = Y_{t+i}

Example 1: Y_t = ε_t − θ ε_{t−1} = (1 − θB) ε_t
Example 2: Y_t = φ Y_{t−1} + ε_t implies Y_t − φ Y_{t−1} = ε_t, or (1 − φB) Y_t = ε_t

7-17

If |φ| < 1, then Y_t = (1 − φB)^{−1} ε_t. We have
    Y_t = (1 + φB + φ^2 B^2 + φ^3 B^3 + · · ·) ε_t
and hence
    Y_t = ε_t + φ ε_{t−1} + φ^2 ε_{t−2} + φ^3 ε_{t−3} + · · ·
Similarly, in Example 1 (provided |θ| < 1)
    ε_t = (1 − θB)^{−1} Y_t
and hence
    ε_t = (1 + θB + θ^2 B^2 + θ^3 B^3 + · · ·) Y_t
        = Y_t + θ Y_{t−1} + θ^2 Y_{t−2} + θ^3 Y_{t−3} + · · ·
Remark: In Example 2, when φ = 1, we have Y_t = Y_{t−1} + ε_t, or Y_t − Y_{t−1} = ε_t, which is a random walk series.

7-18
LINEAR MODELS FOR STATIONARY SERIES

The properties of a series are exhibited by its ACF. Hence, we build models which reflect the ACF structure.

Linear Filters

Often we deal with the formation of a new series {Y_t} by a linear operation applied to a given series {X_t}: X_t is the input of the system and Y_t is the output that results from the linear operation on X_t. A linear, time-invariant filter applied to the series {X_t} produces a new series {Y_t} such that
    Y_t = Σ_{j=−∞}^{∞} ψ_j X_{t−j}
If the ψ_j satisfy ψ_j = 0 for j < 0, then
    Y_t = Σ_{j=0}^{∞} ψ_j X_{t−j}
and the filter is one-sided. It is time-invariant because the coefficients ψ_j do not depend on t.
7-19
Note

1. X_t may be controllable; e.g. in a production process, {X_t} is the input of raw material and Y_t is the output of product or by-product.
2. Differencing operators are linear filters:
       Y_t = ∇X_t = X_t − X_{t−1}
   and
       Y_t = ∇^2 X_t = ∇X_t − ∇X_{t−1} = X_t − 2X_{t−1} + X_{t−2}
3. Moving averages are linear filters, e.g.
       Y_t = (1/(2m + 1)) Σ_{j=−m}^{m} X_{t−j}

If {X_t} is stationary with mean µ_x and autocovariance γ_k, then
    Y_t = Σ_{j=−∞}^{∞} ψ_j X_{t−j}
has mean
    µ_Y = Σ_{j=−∞}^{∞} ψ_j E(X_{t−j}) = µ_x Σ_{j=−∞}^{∞} ψ_j

7-20

and autocovariance
    γ_Y(s) = Cov(Y_t, Y_{t+s})
           = Cov( Σ_{j=−∞}^{∞} ψ_j X_{t−j}, Σ_{k=−∞}^{∞} ψ_k X_{t+s−k} )
           = Σ_{j=−∞}^{∞} Σ_{k=−∞}^{∞} ψ_j ψ_k Cov(X_{t−j}, X_{t+s−k})
           = Σ_{j=−∞}^{∞} Σ_{k=−∞}^{∞} ψ_j ψ_k γ_{s+j−k}
7-21
Linear Process

{Y_t} is a linear process if it can be represented as the output of a one-sided linear filter applied to white noise {ε_t}. That is,
    Y_t = µ + Σ_{j=0}^{∞} ψ_j ε_{t−j}
where the ε_t are independent random variables with mean 0 and variance σ^2. In this situation, µ_Y = µ and the autocovariance is
    γ_Y(s) = Cov(Y_t, Y_{t+s})
           = Cov( Σ_{j=0}^{∞} ψ_j ε_{t−j}, Σ_{k=0}^{∞} ψ_k ε_{t+s−k} )
           = Σ_{j=0}^{∞} Σ_{k=0}^{∞} ψ_j ψ_k Cov(ε_{t−j}, ε_{t+s−k})
           = σ^2 Σ_{j=0}^{∞} ψ_j ψ_{j+s}
because Cov(ε_{t−j}, ε_{t+s−k}) = σ^2 when k = j + s and equals 0 when k ≠ j + s.
7-22
Wold's Representation Theorem

If {Y_t} is a weakly stationary nondeterministic series with mean µ, then Y_t can always be expressed as
    Y_t = µ + Σ_{j=0}^{∞} ψ_j ε_{t−j}
with ψ_0 = 1 and Σ_{j=0}^{∞} ψ_j^2 < ∞, where the ε_t are uncorrelated random variables with mean 0 and variance σ^2.

This result supports the use of model representations of the form
    Y_t = µ + Σ_{j=0}^{∞} ψ_j ε_{t−j},    ψ_0 = 1
as a class of models for stationary series.
7-23
FINITE MOVING AVERAGE MODEL

A simple class of models is obtained by setting ψ_j = 0 for j > q. {Y_t} is said to be a moving average process of order q, MA(q), if it satisfies
    Y_t = µ + ε_t − Σ_{j=1}^{q} θ_j ε_{t−j}
where the ε_t are independent white noise with mean 0 and variance σ^2. We write
    Y_t = µ + ε_t − Σ_{j=1}^{q} θ_j ε_{t−j} = µ + Θ(B) ε_t
where Θ(B) = 1 − Σ_{j=1}^{q} θ_j B^j is the MA operator (a polynomial in B).
7-24
MA(1)

When q = 1, Y_t = µ + ε_t − θ ε_{t−1}, and we have
    E(Y_t) = µ
    Var(Y_t) = γ_0 = σ^2 (1 + θ^2)
    γ_1 = −θ σ^2
    γ_k = 0    for |k| > 1
Hence
    ρ_1 = −θ / (1 + θ^2)
7-25
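A quick numerical check of ρ_1 = −θ/(1 + θ^2) for an MA(1) (the values of θ and σ are illustrative, not from the notes):

import numpy as np

rng = np.random.default_rng(4)
theta, sigma, T = 0.6, 1.5, 200_000

eps = rng.normal(0.0, sigma, size=T + 1)
y = eps[1:] - theta * eps[:-1]               # MA(1): Y_t = ε_t − θ ε_{t−1} (µ = 0)

x = y - y.mean()
r1 = np.sum(x[1:] * x[:-1]) / np.sum(x * x)  # sample lag-1 autocorrelation
print(round(r1, 3), round(-theta / (1 + theta**2), 3))   # both close to −0.441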
MA(2)

When q = 2, Y_t = µ + ε_t − θ_1 ε_{t−1} − θ_2 ε_{t−2}, and
    E(Y_t) = µ
    Var(Y_t) = γ_0 = σ^2 (1 + θ_1^2 + θ_2^2)
    γ_1 = σ^2 (−θ_1 + θ_1 θ_2)
    γ_2 = σ^2 (−θ_2)
    γ_k = 0    for |k| > 2
Hence
    ρ_1 = (−θ_1 + θ_1 θ_2) / (1 + θ_1^2 + θ_2^2)
and
    ρ_2 = −θ_2 / (1 + θ_1^2 + θ_2^2)
7-26
MA(q)

The model is
    Y_t = µ + ε_t − Σ_{j=1}^{q} θ_j ε_{t−j} = µ + Θ(B) ε_t
where Θ(B) = 1 − Σ_{j=1}^{q} θ_j B^j.
    E(Y_t) = µ
    Var(Y_t) = γ_0 = σ^2 (1 + θ_1^2 + θ_2^2 + · · · + θ_q^2)
    γ_k = σ^2 (−θ_k + θ_1 θ_{k+1} + · · · + θ_{q−k} θ_q)    for k = 1, 2, · · · , q
and γ_k = 0 for |k| > q. Hence, the ACF is
    ρ_k = (−θ_k + θ_1 θ_{k+1} + · · · + θ_{q−k} θ_q) / (1 + θ_1^2 + θ_2^2 + · · · + θ_q^2)    for k = 1, 2, · · · , q
and ρ_k = 0 for |k| > q.
7-27
AUTOREGRESSIVE MODELS

The autoregressive model of order p, AR(p), is
    Y_t = φ_1 Y_{t−1} + φ_2 Y_{t−2} + · · · + φ_p Y_{t−p} + δ + ε_t
where the ε_t are independent white noise with mean 0 and variance σ^2. We can rewrite it as
    Y_t − φ_1 Y_{t−1} − φ_2 Y_{t−2} − · · · − φ_p Y_{t−p} = δ + ε_t
or
    Φ(B) Y_t = δ + ε_t
where Φ(B) = 1 − Σ_{j=1}^{p} φ_j B^j is the AR operator (a polynomial in B). The AR(p) model resembles a multiple linear regression in which Y_{t−1}, ..., Y_{t−p} are the "independent" variables; it is called an autoregression because Y_t is regressed on its own past values.
7-28
AR(1)

When p = 1, Y_t = φ Y_{t−1} + δ + ε_t. Is it stationary? By successive substitution,
    Y_t = φ(φ Y_{t−2} + δ + ε_{t−1}) + δ + ε_t
        = φ^n Y_{t−n} + δ Σ_{j=0}^{n−1} φ^j + Σ_{j=0}^{n−1} φ^j ε_{t−j}
Under the assumption that |φ| < 1, as n → ∞ we get
    Y_t = δ Σ_{j=0}^{∞} φ^j + Σ_{j=0}^{∞} φ^j ε_{t−j}
        = δ/(1 − φ) + Σ_{j=0}^{∞} φ^j ε_{t−j}
which is stationary. Note that Cov(ε_t, Y_{t−k}) = 0 for k > 0.
AR(2)

For the AR(2) model Y_t = φ_1 Y_{t−1} + φ_2 Y_{t−2} + ε_t,
    γ_0 = Var(Y_t) = Var(φ_1 Y_{t−1} + φ_2 Y_{t−2} + ε_t) = φ_1^2 γ_0 + φ_2^2 γ_0 + 2 φ_1 φ_2 γ_1 + σ^2
    γ_1 = Cov(Y_t, Y_{t−1}) = Cov(φ_1 Y_{t−1} + φ_2 Y_{t−2} + ε_t, Y_{t−1}) = φ_1 γ_0 + φ_2 γ_1
This implies
    γ_1 = φ_1 γ_0 / (1 − φ_2)
For k > 0,
    γ_k = Cov(Y_t, Y_{t−k}) = Cov(φ_1 Y_{t−1} + φ_2 Y_{t−2} + ε_t, Y_{t−k}) = φ_1 γ_{k−1} + φ_2 γ_{k−2}
This implies
    ρ_k = φ_1 ρ_{k−1} + φ_2 ρ_{k−2}    for k > 0
This is called the Yule-Walker equation.

7-39
In the Y-W equation, the cases k = 1 and k = 2 are very important for AR(2). They are
    ρ_1 = φ_1 ρ_0 + φ_2 ρ_1 = φ_1 + φ_2 ρ_1        (4)
    ρ_2 = φ_1 ρ_1 + φ_2 ρ_0 = φ_1 ρ_1 + φ_2        (5)
Solving these two equations, we have
    ρ_1 = φ_1 / (1 − φ_2)
and
    ρ_2 = φ_2 + φ_1^2 / (1 − φ_2)
Higher-lag values of ρ_k can then be computed recursively from the difference equation; for example, ρ_3 = φ_1 ρ_2 + φ_2 ρ_1. Equations (4) and (5) can also be used to solve for φ_1 and φ_2:
    φ_1 = ρ_1 (1 − ρ_2) / (1 − ρ_1^2)
    φ_2 = (ρ_2 − ρ_1^2) / (1 − ρ_1^2)
7-40
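A small sketch that applies equations (4)-(5) and the ρ_k recursion above (the coefficients are illustrative):

import numpy as np

phi1, phi2 = 0.5, 0.3                        # an illustrative AR(2)

# theoretical ACF at lags 1 and 2 from the Yule-Walker relations
rho1 = phi1 / (1 - phi2)
rho2 = phi2 + phi1**2 / (1 - phi2)

# invert (4)-(5) to recover the AR coefficients from (ρ_1, ρ_2)
phi1_hat = rho1 * (1 - rho2) / (1 - rho1**2)
phi2_hat = (rho2 - rho1**2) / (1 - rho1**2)
print(phi1_hat, phi2_hat)                    # 0.5, 0.3

# higher lags follow the recursion ρ_k = φ_1 ρ_{k-1} + φ_2 ρ_{k-2}
rho = [1.0, rho1]
for _ in range(8):
    rho.append(phi1 * rho[-1] + phi2 * rho[-2])
print(np.round(rho, 3))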
ACF

ρ_k satisfies the second-order difference equation (the Y-W equation)
    ρ_k = φ_1 ρ_{k−1} + φ_2 ρ_{k−2}
From difference equation theory, the solution ρ_k has the form
    ρ_k = c_1 m_1^k + c_2 m_2^k    for any k ≥ 0
(if m_1 and m_2 are distinct and real), where m_1, m_2 are the roots of
    m^2 − φ_1 m − φ_2 = 0
and c_1, c_2 are determined by the initial conditions ρ_0 = 1 and ρ_1 = φ_1/(1 − φ_2). In this situation, ρ_k declines exponentially as k increases.

When m_1 and m_2 are complex, say m_1, m_2 = R(cos λ ± i sin λ), then c_1 and c_2 will also be complex, say c_1, c_2 = a ± bi, so that
    ρ_k = c_1 m_1^k + c_2 m_2^k
        = (a + bi) R^k (cos λ + i sin λ)^k + (a − bi) R^k (cos λ − i sin λ)^k
        = R^k (a_1 cos(kλ) + a_2 sin(kλ))

7-41

where
    R = |m_1| = |m_2| = (−φ_2)^{1/2} < 1
and λ satisfies
    cos λ = φ_1 / (2(−φ_2)^{1/2})
In this situation, ρ_k is a damped sinusoid with "damping factor" R, period 2π/λ, and frequency λ.
7-42
PACF of AR(2)

The PACF of AR(2) is
    φ_{11} = ρ_1
    φ_{22} = (ρ_2 − ρ_1^2) / (1 − ρ_1^2)    ( = φ_2 )
and
    φ_{kk} = 0    for k > 2
Hence, the ACF of an AR(2) damps off exponentially or sinusoidally while the PACF cuts off after lag 2.
7-43
/*-------------------------------------------------------*/
/*----                     EXAMPLE                   ----*/
/*-------------------------------------------------------*/

AR(1) fit:

                         Approx.
Parameter    Estimate    Std Error    T Ratio    Lag
MU            6.96407      0.20628      33.76      0
AR1,1         0.51108      0.08640       5.92      1

Constant Estimate    = 3.40488718
Variance Estimate    = 1.04185881
Std Error Estimate   = 1.02071485
AIC                  = 290.170809
SBC                  = 295.38115
Number of Residuals  = 100

Autocorrelation Check of Residuals

To Lag   Chi Square   DF    Prob    Autocorrelations
    6         6.91     5   0.228    -0.005  0.030  0.106 -0.221 -0.014  0.063
   12        13.53    11   0.260    -0.191  0.024  0.022  0.060 -0.127 -0.046
   18        17.83    17   0.399    -0.162 -0.058  0.004  0.036  0.010  0.072
   24        20.85    23   0.590    -0.086 -0.031 -0.032 -0.055 -0.048  0.092

Autoregressive Factors
Factor 1:  1 - 0.51108 B**(1)

7-44
MA(2) fit:

                         Approx.
Parameter    Estimate    Std Error    T Ratio    Lag
MU            6.96184      0.16749      41.57      0
MA1,1        -0.45979      0.10020      -4.59      1
MA1,2        -0.15518      0.10022      -1.55      2

Constant Estimate    = 6.96184387
Variance Estimate    = 1.08819896
Std Error Estimate   = 1.04316775
AIC                  = 295.415416
SBC                  = 303.230926
Number of Residuals  = 100
Correlations of the Estimates

Parameter        MU    MA1,1    MA1,2
MU            1.000    0.001   -0.001
MA1,1         0.001    1.000    0.394
MA1,2        -0.001    0.394    1.000
Autocorrelation Check of Residuals

To Lag   Chi Square   DF    Prob    Autocorrelations
    6         9.45     4   0.051     0.049  0.100  0.182 -0.203 -0.012  0.051
   12        16.43    10   0.088    -0.195  0.018  0.008 -0.004 -0.142 -0.063
   18        21.02    16   0.178    -0.170 -0.078 -0.013  0.012 -0.005  0.059
   24        24.00    22   0.347    -0.087 -0.037 -0.032 -0.052 -0.038  0.094
Moving Average Factors Factor 1: 1 + 0.45979 B**(1) + 0.15518 B**(2)
7-45
MA(3) fit:

                         Approx.
Parameter    Estimate    Std Error    T Ratio    Lag
MU            6.97335      0.21712      32.12      0
MA1,1        -0.54556      0.09870      -5.53      1
MA1,2        -0.36785      0.10771      -3.42      2
MA1,3        -0.27311      0.09959      -2.74      3

Constant Estimate    = 6.97334862
Variance Estimate    = 1.00988046
Std Error Estimate   = 1.00492809
AIC                  = 289.200417
SBC                  = 299.621098
Number of Residuals  = 100
Correlations of the Estimates

Parameter        MU    MA1,1    MA1,2    MA1,3
MU            1.000   -0.006   -0.012   -0.015
MA1,1        -0.006    1.000    0.442    0.239
MA1,2        -0.012    0.442    1.000    0.448
MA1,3        -0.015    0.239    0.448    1.000
Autocorrelation Check of Residuals

To Lag   Chi Square   DF    Prob    Autocorrelations
    6         1.60     3   0.661    -0.023 -0.031 -0.004 -0.078  0.086  0.003
   12        11.02     9   0.275    -0.222  0.009  0.024  0.110 -0.147 -0.031
Moving Average Factors Factor 1: 1 + 0.54556 B**(1) + 0.36785 B**(2) + 0.27311 B**(3)
7-46
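The estimation output above is SAS-style (PROC ARIMA). As a rough, non-authoritative analogue, the sketch below fits the same three specifications with statsmodels on a hypothetical AR(1) series; the data, seed and mean are illustrative, not the series used in the notes. Note that statsmodels writes the MA polynomial with plus signs, opposite to the 1 − θ_1B − · · · convention used here.

import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(5)
T, mu, phi = 100, 7.0, 0.5
y = np.empty(T)
y[0] = mu
for t in range(1, T):
    y[t] = mu + phi * (y[t - 1] - mu) + rng.normal()   # hypothetical AR(1) data

for order in [(1, 0, 0), (0, 0, 2), (0, 0, 3)]:        # AR(1), MA(2), MA(3)
    res = ARIMA(y, order=order, trend="c").fit()
    print(order, np.round(res.params, 3), "AIC:", round(res.aic, 1))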
GENERAL ORDER AUTOREGRESSIVE MODELS

The autoregressive model of order p, AR(p), is
    Y_t = φ_1 Y_{t−1} + φ_2 Y_{t−2} + · · · + φ_p Y_{t−p} + δ + ε_t
where the ε_t are independent white noise with mean 0 and variance σ^2, or
    Φ(B) Y_t = δ + ε_t
where Φ(B) = 1 − Σ_{j=1}^{p} φ_j B^j is the AR operator.

If all roots of
    Φ(z) = 1 − φ_1 z − φ_2 z^2 − · · · − φ_p z^p = 0
are larger than one in absolute value, or equivalently all roots of
    m^p − φ_1 m^{p−1} − φ_2 m^{p−2} − · · · − φ_p = 0
are smaller than one in absolute value, then the process is stationary and has a convergent infinite MA representation.

7-47

That is,
    Y_t = Φ(B)^{−1} δ + Φ(B)^{−1} ε_t = µ + Ψ(B) ε_t
where
    µ = E(Y_t) = Φ(B)^{−1} δ = δ / (1 − φ_1 − φ_2 − · · · − φ_p)
    Ψ(B) = Σ_{j=0}^{∞} ψ_j B^j = Φ(B)^{−1}
with Σ_{j=0}^{∞} |ψ_j| < ∞. Note that ψ_0 = 1 and ψ_j = 0 for j < 0. The theory of difference equations implies that ψ_j satisfies
    ψ_j = Σ_{i=1}^{p} c_i m_i^j
where the m_i are the roots of m^p − φ_1 m^{p−1} − φ_2 m^{p−2} − · · · − φ_p = 0.

7-48
Autocovariance and Autocorrelation

The autocovariances γ_s of an AR(p) satisfy the Yule-Walker equations:
    γ_s = φ_1 γ_{s−1} + φ_2 γ_{s−2} + · · · + φ_p γ_{s−p}        (6)
Dividing (6) by γ_0, we get the Yule-Walker equations for the ACF ρ_s:
    ρ_s = φ_1 ρ_{s−1} + φ_2 ρ_{s−2} + · · · + φ_p ρ_{s−p}
The ACF satisfies the same difference equation as the ψ_j and the γ_s, but with different initial conditions. The general solution of this difference equation is
    ρ_s = c_1 m_1^s + c_2 m_2^s + · · · + c_p m_p^s
where the m_i are the roots of m^p − φ_1 m^{p−1} − φ_2 m^{p−2} − · · · − φ_p = 0.

The Yule-Walker equations are useful for determining the AR parameters φ_1, · · · , φ_p. The equations can be expressed in matrix form as
    P Φ = ρ
where

    P = [ 1        ρ_1      ρ_2      · · ·  ρ_{p−1} ]
        [ ρ_1      1        ρ_1      · · ·  ρ_{p−2} ]
        [ ...      ...      ...      ...    ...     ]
        [ ρ_{p−1}  ρ_{p−2}  ρ_{p−3}  · · ·  1       ]

7-49

    Φ = [ φ_1 ]        ρ = [ ρ_1 ]
        [ φ_2 ]            [ ρ_2 ]
        [ ... ]            [ ... ]
        [ φ_p ]            [ ρ_p ]

The equations are used to solve for Φ in terms of the ACF; the solution is
    Φ = P^{−1} ρ
The sample version of this solution replaces the ρ_s by the sample ACF r_s, and the resulting estimate of Φ (called the Yule-Walker estimate of the AR parameters) is
    Φ̂ = R^{−1} r
where

    R = [ 1        r_1      r_2      · · ·  r_{p−1} ]
        [ r_1      1        r_1      · · ·  r_{p−2} ]
        [ ...      ...      ...      ...    ...     ]
        [ r_{p−1}  r_{p−2}  r_{p−3}  · · ·  1       ]

    r = [ r_1 ]
        [ r_2 ]
        [ ... ]
        [ r_p ]

Variance: γ_0 = Var(Y_t) can be expressed as
    γ_0 = φ_1 γ_1 + φ_2 γ_2 + · · · + φ_p γ_p + σ^2
Hence
    σ^2 = γ_0 − φ_1 γ_1 − φ_2 γ_2 − · · · − φ_p γ_p = γ_0 (1 − φ_1 ρ_1 − φ_2 ρ_2 − · · · − φ_p ρ_p)

7-50
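A minimal numpy sketch of the Yule-Walker estimate Φ̂ = R^{−1} r and the innovation-variance formula above; the function name and the simulated AR(2) are illustrative:

import numpy as np

def yule_walker_ar(y, p):
    """Yule-Walker estimates of an AR(p): solve R φ = r using sample autocorrelations."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    x = y - y.mean()
    c = np.array([np.sum(x[k:] * x[:T - k]) / T for k in range(p + 1)])     # c_0..c_p
    r = c[1:] / c[0]                                                        # r_1..r_p
    R = np.array([[r[abs(i - j) - 1] if i != j else 1.0 for j in range(p)] for i in range(p)])
    phi = np.linalg.solve(R, r)
    sigma2 = c[0] * (1.0 - phi @ r)                                         # σ² = γ_0(1 − Σ φ_k ρ_k)
    return phi, sigma2

# quick check on a simulated AR(2)
rng = np.random.default_rng(6)
T, phi1, phi2 = 5000, 0.5, 0.3
y = np.zeros(T)
for t in range(2, T):
    y[t] = phi1 * y[t - 1] + phi2 * y[t - 2] + rng.normal()
phi_hat, s2 = yule_walker_ar(y, 2)
print(np.round(phi_hat, 2), round(s2, 2))        # roughly [0.5 0.3] and 1.0

statsmodels also exposes a yule_walker helper (in statsmodels.regression.linear_model) that performs essentially this computation.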
Partial Autocorrelation Function

When fitting AR models to data, we need to choose an appropriate order p for the model. The PACF is useful here. Suppose Y_t is stationary with ACF ρ_s. For k ≥ 1, consider the first k Yule-Walker equations corresponding to an AR(k) model:
    ρ_s = φ_1 ρ_{s−1} + φ_2 ρ_{s−2} + · · · + φ_k ρ_{s−k},    s = 1, · · · , k        (7)
and let φ_{1k}, φ_{2k}, · · · , φ_{kk} denote the solution to these Y-W equations for φ_1, φ_2, · · · , φ_k. The equations can be solved for each order k = 1, 2, · · ·, and the quantity φ_{kk} is the PACF at lag k.

k = 1:
    ρ_1 = φ_{11} ρ_0    =⇒    φ_{11} = ρ_1

k = 2:
    [ 1    ρ_1 ] [ φ_{12} ]   [ ρ_1 ]
    [ ρ_1  1   ] [ φ_{22} ] = [ ρ_2 ]
so that
    [ φ_{12} ]   [ 1    ρ_1 ]^{−1} [ ρ_1 ]
    [ φ_{22} ] = [ ρ_1  1   ]      [ ρ_2 ]
This implies
    φ_{12} = ρ_1 (1 − ρ_2) / (1 − ρ_1^2)
and
    φ_{22} = (ρ_2 − ρ_1^2) / (1 − ρ_1^2)

7-51
ρ1 2 ρ2 2
When we actually have an AR(p) process and we set k = p in equation (7), we have

    [ φ_{p1} ]   [ φ_1 ]
    [ φ_{p2} ]   [ φ_2 ]
    [  ...   ] = [ ... ]
    [ φ_{pp} ]   [ φ_p ]

and hence φ_{pp} = φ_p. When k > p, we get φ_{kk} = 0. The PACF φ_{kk} at lag k is actually equal to the partial correlation between Y_t and Y_{t−k}, adjusting for the intermediate values Y_{t−1}, Y_{t−2}, · · · , Y_{t−k+1}.
7-52
INVERTIBILITY OF MA MODELS
Consider Y_t = µ + Θ(B) ε_t. If all roots of
    Θ(z) = 1 − θ_1 z − θ_2 z^2 − · · · − θ_q z^q = 0
are larger than one in absolute value, or equivalently all roots of
    m^q − θ_1 m^{q−1} − θ_2 m^{q−2} − · · · − θ_q = 0
are smaller than one in absolute value, then the MA process can be expressed in the form of an infinite AR model. That is,
    Θ(B)^{−1} Y_t = Θ(B)^{−1} µ + ε_t
or
    Π(B) Y_t = δ + ε_t
where Π(B) = 1 − π_1 B − π_2 B^2 − · · · = Θ(B)^{−1} with Σ_{j=1}^{∞} |π_j| < ∞. That is,
    Y_t = Σ_{j=1}^{∞} π_j Y_{t−j} + δ + ε_t
The MA process is then said to be invertible.

7-53
MIXED AUTOREGRESSIVE MOVING AVERAGE (ARMA) MODEL

Y_t follows an ARMA(p, q) model if it satisfies
    Y_t = φ_1 Y_{t−1} + φ_2 Y_{t−2} + · · · + φ_p Y_{t−p} + δ + ε_t − θ_1 ε_{t−1} − · · · − θ_q ε_{t−q}
where the ε_t are independent white noise with mean 0 and variance σ^2, or
    Φ(B) Y_t = δ + Θ(B) ε_t
where Φ(B) = 1 − Σ_{j=1}^{p} φ_j B^j is the AR operator and Θ(B) = 1 − Σ_{j=1}^{q} θ_j B^j is the MA operator.

If all roots of Φ(z) = 1 − φ_1 z − φ_2 z^2 − · · · − φ_p z^p = 0 are larger than one in absolute value, then the process is stationary and has a convergent infinite MA representation:
    Y_t = Φ(B)^{−1} δ + Φ(B)^{−1} Θ(B) ε_t = µ + Ψ(B) ε_t
where
    µ = E(Y_t) = Φ(B)^{−1} δ = δ / (1 − φ_1 − φ_2 − · · · − φ_p)

7-54

    Ψ(B) = Σ_{j=0}^{∞} ψ_j B^j = Φ(B)^{−1} Θ(B)
with Σ_{j=0}^{∞} |ψ_j| < ∞.
ARIMA: In practice, the order of differencing d is often 1 (occasionally d = 2), i.e. W_t = (1 − B) Y_t is ARMA(p, q), or equivalently Y_t is ARIMA(p, 1, q). To get Y_t back from W_t, we must sum, or "integrate", W_t:
    Y_t = (1 − B)^{−1} W_t = (1 + B + B^2 + · · ·) W_t = W_t + W_{t−1} + W_{t−2} + · · ·
7-63
ARIMA(p, d, q) MODEL

Y_t is non-stationary, but W_t = (1 − B)^d Y_t is a stationary ARMA(p, q), i.e.
    Φ(B) W_t = δ + Θ(B) ε_t
    Φ(B) (1 − B)^d Y_t = δ + Θ(B) ε_t
Writing ϕ(B) = Φ(B)(1 − B)^d = 1 − ϕ_1 B − · · · − ϕ_{p+d} B^{p+d}, Y_t has the form of an ARMA(p + d, q) model, but it is non-stationary, with d roots of ϕ(B) = 0 equal to 1.
7-64
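A tiny numpy sketch of the integration/differencing relation W_t = (1 − B)Y_t (values are illustrative): W_t is a stationary AR(1), Y_t is its partial sum, i.e. an ARIMA(1, 1, 0) series, and first differencing recovers W_t.

import numpy as np

rng = np.random.default_rng(7)
T, phi = 300, 0.5

w = np.zeros(T)
for t in range(1, T):
    w[t] = phi * w[t - 1] + rng.normal()   # W_t: stationary AR(1)
y = np.cumsum(w)                           # Y_t: "integrated" W_t, an ARIMA(1,1,0) series

w_back = np.diff(y, prepend=0.0)           # first difference (1 − B)Y_t recovers W_t
print(np.allclose(w_back, w))              # True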
Unit Root Stochastic Process

For the AR(1) model
    Y_t = ρ Y_{t−1} + δ + ε_t,    −1 ≤ ρ ≤ 1
we have (for |ρ| < 1)
    Y_t = δ Σ_{j=0}^{∞} ρ^j + Σ_{j=0}^{∞} ρ^j ε_{t−j}
This representation breaks down and the process is non-stationary when ρ = 1: the unit root problem.
7-65
Trend Stationary (TS) and Difference Stationary (DS)

Trend stationary: the trend is completely predictable.
Difference stationary: the trend is stochastic but becomes stationary after differencing.

Consider
    Y_t = β_1 + β_2 t + β_3 Y_{t−1} + u_t

1. Random Walk Model (RWM) without drift (β_1 = β_2 = 0, β_3 = 1):
       Y_t = Y_{t−1} + u_t
   is non-stationary, but ∆Y_t = Y_t − Y_{t−1} = u_t is stationary.
2. Random Walk Model with drift (β_1 ≠ 0, β_2 = 0, β_3 = 1):
       Y_t = β_1 + Y_{t−1} + u_t
   is non-stationary, but ∆Y_t = Y_t − Y_{t−1} = β_1 + u_t is stationary, and Y_t exhibits a positive (β_1 > 0) or negative (β_1 < 0) trend.

7-66

3. Deterministic Trend Model (β_1 ≠ 0, β_2 ≠ 0, β_3 = 0):
       Y_t = β_1 + β_2 t + u_t
   is non-stationary but becomes stationary after detrending.
4. Random Walk with Drift and Deterministic Trend (β_1 ≠ 0, β_2 ≠ 0, β_3 = 1):
       Y_t = β_1 + β_2 t + Y_{t−1} + u_t
   is non-stationary, and ∆Y_t = β_1 + β_2 t + u_t is still non-stationary.
5. Deterministic Trend with Stationary AR(1) Component (β_1 ≠ 0, β_2 ≠ 0, β_3 < 1):
       Y_t = β_1 + β_2 t + β_3 Y_{t−1} + u_t
   is stationary around the deterministic trend.
7-67
Integrated Stochastic Process

A time series Y_t is integrated of order d, denoted Y_t ∼ I(d), if it becomes stationary after being differenced d times. Hence, the random walk model without drift, the random walk model with drift, the deterministic trend model and the random walk with a stationary AR(1) component are I(1), while the random walk with drift and deterministic trend is I(2).

Properties of Integrated Series
1. If X_t ∼ I(0) and Y_t ∼ I(1), then Z_t = X_t + Y_t ∼ I(1).
2. If X_t ∼ I(d), then Z_t = a + b X_t ∼ I(d).
3. If X_t ∼ I(d_1) and Y_t ∼ I(d_2) with d_1 ≤ d_2, then Z_t = a X_t + b Y_t ∼ I(d_2).
4. If X_t ∼ I(d) and Y_t ∼ I(d), then Z_t = a X_t + b Y_t ∼ I(d*), where d* is generally equal to d but sometimes d* < d (when the series are cointegrated).
7-68
Problems

1. Consider Y_t = β_1 + β_2 X_t + u_t. The OLS estimate (in deviation form) is
       β̂_2 = Σ x_t y_t / Σ x_t^2
   If X_t ∼ I(1) and Y_t ∼ I(0), then X_t is non-stationary and its variance increases indefinitely, dominating the denominator, with the result that β̂_2 converges to zero asymptotically and does not even have an asymptotic distribution.

2. Spurious Regression. Consider
       Y_t = Y_{t−1} + u_t,    Y_0 = 0
       X_t = X_{t−1} + v_t,    X_0 = 0
   where u_t and v_t are independent. When we simulate independent u_t and v_t from N(0, 1) and fit Y_t = β_1 + β_2 X_t + e_t, we find that the estimate of β_2 is significantly different from zero and R^2 is also significantly different from zero (see the sketch below).

7-69
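A minimal simulation of the spurious regression phenomenon described above (the sample size, seed and the nominal 5% cut-off 1.96 are illustrative):

import numpy as np

rng = np.random.default_rng(8)
T, n_sims = 200, 1000
t_stats = []

for _ in range(n_sims):
    y = np.cumsum(rng.normal(size=T))            # independent random walks
    x = np.cumsum(rng.normal(size=T))
    X = np.column_stack([np.ones(T), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # OLS of Y on X
    resid = y - X @ beta
    s2 = resid @ resid / (T - 2)
    se_b2 = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    t_stats.append(beta[1] / se_b2)

# fraction of "significant" slopes at the nominal 5% level; far above 0.05
print(np.mean(np.abs(np.array(t_stats)) > 1.96))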
The Unit Root Test

Consider
    Y_t = ρ Y_{t−1} + u_t,    −1 ≤ ρ ≤ 1
where u_t is an error term. Then
    ∆Y_t = Y_t − Y_{t−1} = ρ Y_{t−1} − Y_{t−1} + u_t = (ρ − 1) Y_{t−1} + u_t = δ Y_{t−1} + u_t        (8)
where δ = ρ − 1. The hypothesis
    H_0: ρ = 1    vs    H_1: ρ < 1
is equivalent to
    H_0: δ = 0    vs    H_1: δ < 0        (9)
If H_0 is true, then ∆Y_t = u_t is white noise. To test (9), we simply regress ∆Y_t on Y_{t−1} and obtain the estimated slope coefficient δ̂. Unfortunately, the usual t statistic of δ̂ does not follow the t distribution even in large samples; Dickey and Fuller showed that it follows the τ (tau) statistic, and the test is known as the Dickey-Fuller (DF) test. If the hypothesis H_0: δ = 0 is rejected (i.e. the series is stationary), the usual (Student's) t test can be used.
7-70
The DF test is estimated in three different forms:

1. Y_t is a random walk:
       ∆Y_t = δ Y_{t−1} + u_t        (10)
2. Y_t is a random walk with drift:
       ∆Y_t = β_1 + δ Y_{t−1} + u_t        (11)
3. Y_t is a random walk with drift around a deterministic trend:
       ∆Y_t = β_1 + β_2 t + δ Y_{t−1} + u_t        (12)

If the hypothesis H_0: δ = 0 is rejected, then Y_t is a stationary time series with zero mean in case (10), stationary with a nonzero mean in case (11), and stationary around a deterministic trend in case (12).
7-71
The Augmented Dickey-Fuller (ADF) Test

In the DF test for (10), (11) and (12), it is assumed that u_t is uncorrelated. If the u_t are correlated, we use the ADF test based on
    ∆Y_t = β_1 + β_2 t + δ Y_{t−1} + Σ_{i=1}^{m} α_i ∆Y_{t−i} + ε_t        (13)
where ε_t is white noise and ∆Y_{t−i} = Y_{t−i} − Y_{t−i−1}. The number m of lagged difference terms is chosen so that ε_t is (approximately) white noise. The ADF test statistic follows the same asymptotic distribution as the DF statistic, so the same critical values can be used.
7-72
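A sketch of the ADF test using the adfuller function from statsmodels (assumed available; the simulated series are illustrative):

import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(9)
T = 300
rw = np.cumsum(rng.normal(size=T))                   # random walk: has a unit root
ar1 = np.zeros(T)
for t in range(1, T):
    ar1[t] = 0.5 * ar1[t - 1] + rng.normal()         # stationary AR(1)

for name, series in [("random walk", rw), ("AR(1)", ar1)]:
    stat, pvalue, usedlag, nobs, crit, icbest = adfuller(series, regression="c", autolag="AIC")
    print(name, round(stat, 2), round(pvalue, 3), crit)   # small p-value => reject the unit root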
Cointegration

Suppose Y_t ∼ I(1) and X_t ∼ I(1), but u_t ∼ I(0), where
    Y_t = β_1 + β_2 X_t + u_t        (14)
Then Y_t and X_t are said to be cointegrated. Although both Y_t and X_t are I(1) and thus have stochastic trends, their linear combination u_t ∼ I(0) cancels out the stochastic trends. As a result, the cointegrating regression (14) is meaningful, and β_2 is called the cointegrating parameter. Economically speaking, two variables will be cointegrated if they have a long-term, or equilibrium, relationship between them.
7-73
Testing for Cointegration

A number of tests have been proposed; we consider two simple methods:

1. Engle-Granger (EG) or Augmented Engle-Granger (AEG) test
   Apply the DF or ADF unit root test to the residuals û_t estimated from the cointegrating regression. Since the estimated û_t are based on the estimated cointegrating parameter β̂_2, the DF and ADF critical values are not appropriate; Engle and Granger have calculated the appropriate critical values, and the resulting tests are known as the Engle-Granger (EG) and Augmented Engle-Granger (AEG) tests. A sketch of this procedure is given below.

2. Cointegrating regression Durbin-Watson (CRDW) test
   Use the Durbin-Watson d statistic obtained from the cointegrating regression, but now test
       H_0: d = 0    against    H_1: d > 0
   since d ≈ 2(1 − ρ̂).

Examples: Refer to Gujarati, pp. 825-829.

7-74
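A sketch of the Engle-Granger procedure using the coint helper from statsmodels (assumed available; the simulated data and parameter values are illustrative):

import numpy as np
from statsmodels.tsa.stattools import coint

rng = np.random.default_rng(10)
T = 500
x = np.cumsum(rng.normal(size=T))            # X_t ~ I(1)
y = 1.0 + 2.0 * x + rng.normal(size=T)       # Y_t = β_1 + β_2 X_t + u_t with u_t ~ I(0)

# Engle-Granger test: regress y on x, then unit-root test the residuals,
# using Engle-Granger critical values
stat, pvalue, crit = coint(y, x, trend="c")
print(round(stat, 2), round(pvalue, 4), crit)   # small p-value => cointegrated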