Technical Appendix for: Non-linear DSGE Models and The Central Difference Kalman Filter

Martin M. Andreasen
Bank of England

July, 2010
Contents

1 Multivariate Stirling interpolation
  1.1 Univariate case
  1.2 "The remainder"
  1.3 Multivariate case
2 Results from Filtering Theory
  2.1 A linear updating rule
  2.2 A property for the multivariate normal distribution
3 The Central Difference Kalman Filter (CDKF)
  3.1 The CDKF
  3.2 The CDKF with additive noise terms
  3.3 Square root implementation of the Kalman Filter
  3.4 A Smoother for the CDKF
4 The QML estimator based on the CDKF
5 Expressing the stochastic volatility model
6 Accuracy of first and second moments in approximated DSGE models
1 Multivariate Stirling interpolation

1.1 Univariate case
Stirling's interpolation formula approximates $f(x) \in \mathbb{R}$, where $x \in \mathbb{R}$. Let the central difference operator $\delta$ and the average operator $\mu$ be defined by

\[ \delta f(x) \equiv f\!\left(x+\tfrac{h}{2}\right) - f\!\left(x-\tfrac{h}{2}\right), \qquad \mu f(x) \equiv \tfrac{1}{2}\left( f\!\left(x+\tfrac{h}{2}\right) + f\!\left(x-\tfrac{h}{2}\right) \right). \]

The approximation about $x = \bar{x}$ is given by

\[ f(x) = f(\bar{x}+ph) = f(\bar{x}) + p\,\mu\delta f(\bar{x}) + \frac{p^2}{2!}\delta^2 f(\bar{x}) + \binom{p+1}{3}\mu\delta^3 f(\bar{x}) + \frac{p^2(p^2-1)}{4!}\delta^4 f(\bar{x}) + \binom{p+2}{5}\mu\delta^5 f(\bar{x}) + \cdots \]

where $p \equiv (x-\bar{x})/h$. Normally $-1 < p < 1$.

For a first-order approximation: $f(x) = f(\bar{x}+ph) \approx f(\bar{x}) + p\,\mu\delta f(\bar{x})$. Now,

\[ \mu\delta f(\bar{x}) = \mu\!\left[ f\!\left(\bar{x}+\tfrac{h}{2}\right) - f\!\left(\bar{x}-\tfrac{h}{2}\right) \right] = \tfrac{1}{2}\left( f(\bar{x}+h) + f(\bar{x}) \right) - \tfrac{1}{2}\left( f(\bar{x}) + f(\bar{x}-h) \right) = \tfrac{1}{2}\left( f(\bar{x}+h) - f(\bar{x}-h) \right), \]

so with $p = (x-\bar{x})/h$,

\[ f(x) \approx f(\bar{x}) + \frac{f(\bar{x}+h) - f(\bar{x}-h)}{2h}(x-\bar{x}) = f(\bar{x}) + f_{DD}'(\bar{x})(x-\bar{x}), \qquad f_{DD}'(\bar{x}) \equiv \frac{f(\bar{x}+h) - f(\bar{x}-h)}{2h}. \]

For a second-order approximation: $f(x) = f(\bar{x}+ph) \approx f(\bar{x}) + p\,\mu\delta f(\bar{x}) + \frac{p^2}{2!}\delta^2 f(\bar{x})$. Hence, we only need to find

\[ \delta^2 f(\bar{x}) = \left[ f(\bar{x}+h) - f(\bar{x}) \right] - \left[ f(\bar{x}) - f(\bar{x}-h) \right] = f(\bar{x}+h) + f(\bar{x}-h) - 2f(\bar{x}). \]

Thus

\[ f(x) \approx f(\bar{x}) + f_{DD}'(\bar{x})(x-\bar{x}) + f_{DD}''(\bar{x})(x-\bar{x})^2, \qquad f_{DD}''(\bar{x}) \equiv \frac{f(\bar{x}+h) + f(\bar{x}-h) - 2f(\bar{x})}{2h^2}. \]

For a third-order approximation we add $\binom{p+1}{3}\mu\delta^3 f(\bar{x})$. Note that, extending $\binom{m}{k} = \frac{m!}{k!(m-k)!}$ to $m \in \mathbb{R}$ and $k \in \mathbb{N}$,

\[ \binom{p+1}{3} = \frac{(p+1)!}{3!\,(p-2)!} = \frac{(p+1)p(p-1)}{3!} = \frac{(p^2+p)(p-1)}{3!} = \frac{p^3+p^2-p^2-p}{3!} = \frac{p^3-p}{3!} = \frac{(p^2-1)p}{3!}. \]

And, using the results from the second- and first-order approximations,

\[ \mu\delta^3 f(\bar{x}) = \tfrac{1}{2}\left[ \delta^2 f(\bar{x}+h) - \delta^2 f(\bar{x}-h) \right] = \frac{f(\bar{x}+2h) - f(\bar{x}-2h) - 2f(\bar{x}+h) + 2f(\bar{x}-h)}{2}. \]

Since $\frac{(p^2-1)p}{3!} = \frac{(x-\bar{x})^3 - h^2(x-\bar{x})}{3!\,h^3}$, the third-order approximation is

\[ f(x) \approx f(\bar{x}) + f_{DD}'(\bar{x})(x-\bar{x}) + f_{DD}''(\bar{x})(x-\bar{x})^2 + f_{DD}^{(3)}(\bar{x})\left[ (x-\bar{x})^3 - h^2(x-\bar{x}) \right], \]

where

\[ f_{DD}^{(3)}(\bar{x}) \equiv \frac{f(\bar{x}+2h) - f(\bar{x}-2h) - 2f(\bar{x}+h) + 2f(\bar{x}-h)}{12h^3}. \]

For a fourth-order approximation we only need to find $\frac{p^2(p^2-1)}{4!}\delta^4 f(\bar{x})$. Using the result from the second-order approximation twice,

\begin{align*}
\delta^4 f(\bar{x}) = \delta^2\!\left[ \delta^2 f(\bar{x}) \right]
&= \left[ f(\bar{x}+2h) + f(\bar{x}) - 2f(\bar{x}+h) \right] + \left[ f(\bar{x}) + f(\bar{x}-2h) - 2f(\bar{x}-h) \right] - 2\left[ f(\bar{x}+h) + f(\bar{x}-h) - 2f(\bar{x}) \right] \\
&= f(\bar{x}+2h) + f(\bar{x}-2h) + 6f(\bar{x}) - 4f(\bar{x}+h) - 4f(\bar{x}-h).
\end{align*}

Since $\frac{p^2(p^2-1)}{4!} = \frac{(x-\bar{x})^4 - h^2(x-\bar{x})^2}{4!\,h^4}$, we obtain

\[ f(x) \approx f(\bar{x}) + f_{DD}'(\bar{x})(x-\bar{x}) + f_{DD}''(\bar{x})(x-\bar{x})^2 + f_{DD}^{(3)}(\bar{x})\left[ (x-\bar{x})^3 - h^2(x-\bar{x}) \right] + f_{DD}^{(4)}(\bar{x})\left[ (x-\bar{x})^4 - h^2(x-\bar{x})^2 \right], \]

where

\[ f_{DD}^{(4)}(\bar{x}) \equiv \frac{f(\bar{x}+2h) + f(\bar{x}-2h) - 4f(\bar{x}+h) - 4f(\bar{x}-h) + 6f(\bar{x})}{24h^4}. \]
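The divided-difference coefficients $f_{DD}'$, $f_{DD}''$, $f_{DD}^{(3)}$, and $f_{DD}^{(4)}$ translate directly into code. The following sketch (Python/NumPy; the function name and test values are mine, not from the paper) evaluates the fourth-order Stirling approximation and compares it with the true function:

```python
import numpy as np

def stirling4(f, xbar, h, x):
    """Fourth-order univariate Stirling interpolation of f about xbar."""
    fp, fm = f(xbar + h), f(xbar - h)
    fpp, fmm, f0 = f(xbar + 2 * h), f(xbar - 2 * h), f(xbar)
    d1 = (fp - fm) / (2 * h)                                   # f'_DD
    d2 = (fp + fm - 2 * f0) / (2 * h**2)                       # f''_DD
    d3 = (fpp - fmm - 2 * fp + 2 * fm) / (12 * h**3)           # f^(3)_DD
    d4 = (fpp + fmm - 4 * fp - 4 * fm + 6 * f0) / (24 * h**4)  # f^(4)_DD
    dx = x - xbar
    return (f0 + d1 * dx + d2 * dx**2
            + d3 * (dx**3 - h**2 * dx) + d4 * (dx**4 - h**2 * dx**2))

x = np.linspace(-0.5, 0.5, 5)
# h = sqrt(3) anticipates the optimal step size used in the CDKF below
print(np.max(np.abs(stirling4(np.exp, 0.0, np.sqrt(3.0), x) - np.exp(x))))
```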
1.2 "The remainder"
To examine the error term in the approximation (i.e. the remainder), we insert the full Taylor series expansion of $f(x)$ into the Stirling interpolation formula. Recall that the full Taylor series expansion is given by

\[ f(x) = f(\bar{x}) + \sum_{i=1}^{\infty}\frac{1}{i!}f^{(i)}(\bar{x})(x-\bar{x})^i. \]

For the second-order approximation,

\[ f(x) = f(\bar{x}) + \frac{f(\bar{x}+h) - f(\bar{x}-h)}{2h}(x-\bar{x}) + \frac{f(\bar{x}+h) + f(\bar{x}-h) - 2f(\bar{x})}{2h^2}(x-\bar{x})^2, \]

and substituting $f(\bar{x} \pm h) = f(\bar{x}) + \sum_{i=1}^{\infty}\frac{1}{i!}f^{(i)}(\bar{x})(\pm h)^i$ gives

\[ f(x) = f(\bar{x}) + \frac{\sum_{i=1}^{\infty}\frac{1}{i!}f^{(i)}(\bar{x})\left( h^i - (-h)^i \right)}{2h}(x-\bar{x}) + \frac{\sum_{i=1}^{\infty}\frac{1}{i!}f^{(i)}(\bar{x})\left( h^i + (-h)^i \right)}{2h^2}(x-\bar{x})^2. \]

The $i = 1, 2$ terms reproduce $f^{(1)}(\bar{x})(x-\bar{x}) + \tfrac{1}{2}f^{(2)}(\bar{x})(x-\bar{x})^2$, so

\[ f(x) = f(\bar{x}) + f^{(1)}(\bar{x})(x-\bar{x}) + \tfrac{1}{2}f^{(2)}(\bar{x})(x-\bar{x})^2 + \frac{\sum_{i=3}^{\infty}\frac{1}{i!}f^{(i)}(\bar{x})\left( h^i - (-h)^i \right)}{2h}(x-\bar{x}) + \frac{\sum_{i=3}^{\infty}\frac{1}{i!}f^{(i)}(\bar{x})\left( h^i + (-h)^i \right)}{2h^2}(x-\bar{x})^2. \]

Since $h^i - (-h)^i$ vanishes for even $i$ and $h^i + (-h)^i$ vanishes for odd $i$, the remainder loads only on odd derivatives of order three and above in the $(x-\bar{x})$ term, and only on even derivatives of order four and above in the $(x-\bar{x})^2$ term.
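As a numerical check of this decomposition (a sketch under my own choice of $f = \exp$, for which $f^{(i)}(\bar{x}) = e^{\bar{x}}$), the gap between the second-order Stirling approximation and the second-order Taylor polynomial should equal the two remainder sums, truncated at a large order:

```python
import numpy as np
from math import factorial

f, xbar, h, x = np.exp, 0.0, 0.5, 0.3
stir2 = (f(xbar) + (f(xbar + h) - f(xbar - h)) / (2 * h) * (x - xbar)
         + (f(xbar + h) + f(xbar - h) - 2 * f(xbar)) / (2 * h**2) * (x - xbar)**2)
taylor2 = f(xbar) + np.exp(xbar) * (x - xbar) + 0.5 * np.exp(xbar) * (x - xbar)**2
# Remainder implied by the derivation, truncating both sums at i = 30
rem = (sum(np.exp(xbar) / factorial(i) * (h**i - (-h)**i) for i in range(3, 31))
       / (2 * h) * (x - xbar)
       + sum(np.exp(xbar) / factorial(i) * (h**i + (-h)**i) for i in range(3, 31))
       / (2 * h**2) * (x - xbar)**2)
print(stir2 - (taylor2 + rem))   # ~0: the remainder sums account for the gap
```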
1.3 Multivariate case
Let $x \in \mathbb{R}^n$ and let $y = f(x) \in \mathbb{R}^m$. The Taylor series expansion of $f(x)$ about $x = \bar{x}$ is

\[ y = f(\bar{x} + \Delta x) = \sum_{i=0}^{\infty}\frac{1}{i!}D^i_{\Delta x}f(\bar{x}) = f(\bar{x}) + D_{\Delta x}f(\bar{x}) + \frac{1}{2!}D^2_{\Delta x}f(\bar{x}) + \frac{1}{3!}D^3_{\Delta x}f(\bar{x}) + \frac{1}{4!}D^4_{\Delta x}f(\bar{x}) + \cdots \]

where

\[ D^i_{\Delta x}f(\bar{x}) = \left. \left( \sum_{j=1}^{n}\Delta x_j\frac{\partial}{\partial x_j} \right)^{\!i} f(x) \right|_{x=\bar{x}} \quad \text{for } i = 1, 2, 3, \ldots \]

So for the first four terms we have

\[ D_{\Delta x}f(\bar{x}) = \left. \sum_{j=1}^{n}\Delta x_j\frac{\partial}{\partial x_j}f(x) \right|_{x=\bar{x}} \]

\[ D^2_{\Delta x}f(\bar{x}) = \left. \sum_{a_1=1}^{n}\sum_{a_2=1}^{n}\Delta x_{a_1}\Delta x_{a_2}\frac{\partial^2}{\partial x_{a_1}\partial x_{a_2}}f(x) \right|_{x=\bar{x}} \]

\[ D^3_{\Delta x}f(\bar{x}) = \left. \sum_{a_1=1}^{n}\sum_{a_2=1}^{n}\sum_{a_3=1}^{n}\Delta x_{a_1}\Delta x_{a_2}\Delta x_{a_3}\frac{\partial^3}{\partial x_{a_1}\partial x_{a_2}\partial x_{a_3}}f(x) \right|_{x=\bar{x}} \]

\[ D^4_{\Delta x}f(\bar{x}) = \left. \sum_{a_1=1}^{n}\sum_{a_2=1}^{n}\sum_{a_3=1}^{n}\sum_{a_4=1}^{n}\Delta x_{a_1}\Delta x_{a_2}\Delta x_{a_3}\Delta x_{a_4}\frac{\partial^4}{\partial x_{a_1}\partial x_{a_2}\partial x_{a_3}\partial x_{a_4}}f(x) \right|_{x=\bar{x}} \]
The multivariate Stirling interpolation is then given by (up to fourth order)

\[ y = f(\bar{x}) + \tilde{D}_{\Delta x}f(\bar{x}) + \frac{1}{2!}\tilde{D}^2_{\Delta x}f(\bar{x}) + \frac{1}{3!}\tilde{D}^3_{\Delta x}f(\bar{x}) + \frac{1}{4!}\tilde{D}^4_{\Delta x}f(\bar{x}), \]

where we define the divided difference operators $\tilde{D}_{\Delta x}$, $\tilde{D}^2_{\Delta x}$, $\tilde{D}^3_{\Delta x}$, $\tilde{D}^4_{\Delta x}$ in terms of the partial central difference and average operators $\delta_p f(\bar{x}) \equiv f(\bar{x} + \tfrac{h}{2}e_p) - f(\bar{x} - \tfrac{h}{2}e_p)$ and $\mu_p f(\bar{x}) \equiv \tfrac{1}{2}\left( f(\bar{x} + \tfrac{h}{2}e_p) + f(\bar{x} - \tfrac{h}{2}e_p) \right)$, with $e_p$ the $p$-th unit vector. The first two are

\[ \tilde{D}_{\Delta x}f(\bar{x}) = \frac{1}{h}\sum_{a_1=1}^{n}\Delta x_{a_1}\,\mu_{a_1}\delta_{a_1}f(\bar{x}), \]

\[ \tilde{D}^2_{\Delta x}f(\bar{x}) = \frac{1}{h^2}\left( \sum_{a_1=1}^{n}(\Delta x_{a_1})^2\delta^2_{a_1} + 2\sum_{a_1=1}^{n}\sum_{a_2=a_1+1}^{n}\Delta x_{a_1}\Delta x_{a_2}\left( \mu_{a_1}\delta_{a_1} \right)\left( \mu_{a_2}\delta_{a_2} \right) \right)f(\bar{x}), \]

and $\tilde{D}^3_{\Delta x}$ and $\tilde{D}^4_{\Delta x}$ are built from the same operators in direct analogy with the univariate third- and fourth-order terms above. In the univariate case ($n = 1$) the interpolation reduces to the fourth-order formula derived in Section 1.1:

\[ y = f(\bar{x}) + f_{DD}'(\bar{x})(x-\bar{x}) + f_{DD}''(\bar{x})(x-\bar{x})^2 + f_{DD}^{(3)}(\bar{x})\left[ (x-\bar{x})^3 - h^2(x-\bar{x}) \right] + f_{DD}^{(4)}(\bar{x})\left[ (x-\bar{x})^4 - h^2(x-\bar{x})^2 \right]. \]
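A minimal sketch of the second-order multivariate interpolation (Python; helper names are mine). The cross terms $(\mu_{a_1}\delta_{a_1})(\mu_{a_2}\delta_{a_2})f(\bar{x})$ require evaluations at the diagonal points $\bar{x} \pm he_p \pm he_q$:

```python
import numpy as np

def stirling2_multi(f, xbar, dx, h):
    """Second-order multivariate Stirling interpolation of f at xbar + dx."""
    n = len(xbar)
    e = np.eye(n)
    f0 = f(xbar)
    out = f0.astype(float).copy()
    for p in range(n):
        fp, fm = f(xbar + h * e[p]), f(xbar - h * e[p])
        out += dx[p] * (fp - fm) / (2 * h)                  # (1/h) dx_p mu_p delta_p
        out += dx[p]**2 * (fp + fm - 2 * f0) / (2 * h**2)   # (1/2! h^2) dx_p^2 delta_p^2
        for q in range(p + 1, n):                           # cross terms
            cross = (f(xbar + h*e[p] + h*e[q]) - f(xbar + h*e[p] - h*e[q])
                     - f(xbar - h*e[p] + h*e[q]) + f(xbar - h*e[p] - h*e[q])) / 4
            out += dx[p] * dx[q] * cross / h**2
    return out

f = lambda x: np.array([np.sin(x[0]) * x[1], x[0] * np.exp(x[1])])
xbar, dx = np.array([0.2, -0.1]), np.array([0.05, 0.03])
print(stirling2_multi(f, xbar, dx, np.sqrt(3.0)), f(xbar + dx))
```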
2 Results from Filtering Theory

2.1 A linear updating rule

Consider the non-linear state space system

\[ y_t = g(x_t, v_t; \theta) \tag{1} \]
\[ x_t = h(x_{t-1}, w_t; \theta) \tag{2} \]

Our derivation follows Lewis (1986). Hence, we restrict the focus to a linear updating rule for the posterior state estimate, i.e.

\[ \hat{x}_t = b_t + K_t y_t \tag{3} \]

where $b_t$ and $K_t$ are to be determined. The prior and the posterior estimation errors, respectively, are defined by

\[ \bar{e}_t = x_t - \bar{x}_t \tag{4} \]
\[ \hat{e}_t = x_t - \hat{x}_t \tag{5} \]

where a bar denotes a prior estimate and a hat denotes a posterior estimate. Hence,

\begin{align*}
\hat{e}_t &= x_t - \hat{x}_t \\
&= x_t - b_t - K_t y_t && \text{by (3)} \\
&= \bar{x}_t + \bar{e}_t - b_t - K_t y_t && \text{since (4) implies } x_t = \bar{x}_t + \bar{e}_t \\
&= \bar{x}_t + \bar{e}_t - b_t - K_t g(x_t, v_t; \theta) && \text{by (1)}
\end{align*}

Thus

\[ \hat{e}_t = \bar{x}_t + \bar{e}_t - b_t - K_t g(x_t, v_t; \theta) \tag{6} \]

Note then that $E_{t-1}[\hat{e}_t] = E_{t-1}[\bar{e}_t] + \bar{x}_t - b_t - K_t E_{t-1}[g(x_t, v_t; \theta)]$. We then require that $\hat{x}_t$ and $\bar{x}_t$ be unbiased given information at time $t-1$. Hence

\[ 0 = 0 + \bar{x}_t - b_t - K_t E_{t-1}[g(x_t, v_t; \theta)] \iff b_t = \bar{x}_t - K_t E_{t-1}[g(x_t, v_t; \theta)] \tag{7} \]

Inserting (7) into (3) we get

\[ \hat{x}_t = \bar{x}_t + K_t\left( y_t - E_{t-1}[g(x_t, v_t; \theta)] \right) \tag{8} \]

To get an expression for the posterior covariance matrix in terms of the prior covariance matrix, we substitute (7) into (6):

\[ \hat{e}_t = \bar{e}_t + \bar{x}_t - \left( \bar{x}_t - K_t E_{t-1}[g(x_t, v_t; \theta)] \right) - K_t g(x_t, v_t; \theta) = \bar{e}_t - K_t\left( g(x_t, v_t; \theta) - E_{t-1}[g(x_t, v_t; \theta)] \right) \]

Note that $E_{t-1}[\hat{e}_t] = E_{t-1}[\bar{e}_t] - K_t\left( E_{t-1}[g(x_t, v_t; \theta)] - E_{t-1}[g(x_t, v_t; \theta)] \right) = 0$.

We let $Q_t \equiv g(x_t, v_t; \theta) - E_{t-1}[g(x_t, v_t; \theta)]$. Hence, the posterior covariance matrix is given by

\begin{align*}
\hat{P}_{xx}(t) &= E_{t-1}\left[ (\bar{e}_t - K_tQ_t)(\bar{e}_t - K_tQ_t)' \right] \\
&= E_{t-1}\left[ \bar{e}_t\bar{e}_t' - \bar{e}_tQ_t'K_t' - K_tQ_t\bar{e}_t' + K_tQ_tQ_t'K_t' \right] \\
&= \bar{P}_{xx}(t) - E_{t-1}[\bar{e}_tQ_t']K_t' - K_tE_{t-1}[Q_t\bar{e}_t'] + K_tE_{t-1}[Q_tQ_t']K_t'
\end{align*}

We now determine $K_t$ such that we minimize $\hat{P}_{xx}(t)$. Therefore

\[ \frac{\partial \hat{P}_{xx}(t)}{\partial K_t} = -2E_{t-1}[\bar{e}_tQ_t'] + 2K_tE_{t-1}[Q_tQ_t'] = 0 \]
\[ \iff K_t = E_{t-1}[\bar{e}_tQ_t']\left( E_{t-1}[Q_tQ_t'] \right)^{-1} \]
\[ \iff K_t = E_{t-1}\left[ \bar{e}_t\left( g(x_t,v_t;\theta) - E_{t-1}[g(x_t,v_t;\theta)] \right)' \right]\left( E_{t-1}\left[ \left( g(x_t,v_t;\theta) - E_{t-1}[g(x_t,v_t;\theta)] \right)\left( g(x_t,v_t;\theta) - E_{t-1}[g(x_t,v_t;\theta)] \right)' \right] \right)^{-1} \]

Based on this expression, the covariance matrix for the posterior is given by

\begin{align*}
\hat{P}_{xx}(t) &= \bar{P}_{xx}(t) - E_{t-1}[\bar{e}_tQ_t']K_t' - K_tE_{t-1}[Q_t\bar{e}_t'] + K_tE_{t-1}[Q_tQ_t']K_t' \\
&= \bar{P}_{xx}(t) - E_{t-1}[\bar{e}_tQ_t']\left( E_{t-1}[Q_tQ_t'] \right)^{-1}E_{t-1}[Q_t\bar{e}_t'] - E_{t-1}[\bar{e}_tQ_t']\left( E_{t-1}[Q_tQ_t'] \right)^{-1}E_{t-1}[Q_t\bar{e}_t'] \\
&\quad + E_{t-1}[\bar{e}_tQ_t']\left( E_{t-1}[Q_tQ_t'] \right)^{-1}E_{t-1}[Q_tQ_t']\left( E_{t-1}[Q_tQ_t'] \right)^{-1}E_{t-1}[Q_t\bar{e}_t'] \\
&= \bar{P}_{xx}(t) - E_{t-1}[\bar{e}_tQ_t']\left( E_{t-1}[Q_tQ_t'] \right)^{-1}E_{t-1}[Q_t\bar{e}_t'] \\
&= \bar{P}_{xx}(t) - K_tE_{t-1}[Q_tQ_t']K_t'
\end{align*}
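The optimality of this gain is easy to verify by Monte Carlo (a sketch under my own toy joint distribution, not from the paper): for jointly simulated $(\bar{e}, Q)$, the gain $K = E[\bar{e}Q'](E[QQ'])^{-1}$ should give a smaller posterior error trace than any perturbed gain.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, N = 3, 2, 200_000
# Toy correlated draws of the prior error e and the innovation Q
A = rng.standard_normal((n + m, n + m))
draws = rng.standard_normal((N, n + m)) @ A.T
e, Q = draws[:, :n], draws[:, n:]

K = (e.T @ Q / N) @ np.linalg.inv(Q.T @ Q / N)   # K = E[eQ'] E[QQ']^{-1}

def post_trace(K):
    r = e - Q @ K.T                              # posterior error e - K Q
    return np.trace(r.T @ r / N)

print(post_trace(K))                                        # minimal
print(post_trace(K + 0.1 * rng.standard_normal(K.shape)))   # larger
```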
2.2 A property for the multivariate normal distribution

Recall the following result (taken from Anderson & Moore (1979), page 25).

Proposition 1. Let $Z = \begin{bmatrix} X \\ Y \end{bmatrix}$, where $X$ and $Y$ are vectors with dimension $n_x$ and $n_y$. Let furthermore $Z$ be Gaussian with mean and covariance matrix given by $m = \begin{bmatrix} \mu_x \\ \mu_y \end{bmatrix}$ and $\Sigma = \begin{bmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{bmatrix}$. Then it holds that $X \mid Y = y$ is Gaussian and

\[ E[X \mid Y = y] = \mu_x + \Sigma_{xy}\Sigma_{yy}^{-1}(y - \mu_y) \quad \text{and} \quad \mathrm{Var}(X \mid Y = y) = \Sigma_{xx} - \Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx}. \]

Proof. Assume that $\Sigma$ and $\Sigma_{yy}$ are nonsingular. Hence

\begin{align*}
p_{X|Y}(x \mid y) = \frac{p_{XY}(x, y)}{p_Y(y)} &= \frac{(2\pi)^{-(n_x+n_y)/2}\,|\Sigma|^{-1/2}\exp\left\{ -\tfrac{1}{2}(z-m)'\Sigma^{-1}(z-m) \right\}}{(2\pi)^{-n_y/2}\,|\Sigma_{yy}|^{-1/2}\exp\left\{ -\tfrac{1}{2}(y-\mu_y)'\Sigma_{yy}^{-1}(y-\mu_y) \right\}} \\
&= (2\pi)^{-n_x/2}\,\frac{|\Sigma_{yy}|^{1/2}}{|\Sigma|^{1/2}}\exp\left\{ -\tfrac{1}{2}(z-m)'\Sigma^{-1}(z-m) + \tfrac{1}{2}(y-\mu_y)'\Sigma_{yy}^{-1}(y-\mu_y) \right\}
\end{align*}

Then we notice that

\[ \begin{bmatrix} I & -\Sigma_{xy}\Sigma_{yy}^{-1} \\ 0 & I \end{bmatrix}\begin{bmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{bmatrix}\begin{bmatrix} I & 0 \\ -\Sigma_{yy}^{-1}\Sigma_{yx} & I \end{bmatrix} = \begin{bmatrix} \Sigma_{xx} - \Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx} & 0 \\ 0 & \Sigma_{yy} \end{bmatrix} \]

\[ \iff \Sigma^{-1} = \begin{bmatrix} I & 0 \\ -\Sigma_{yy}^{-1}\Sigma_{yx} & I \end{bmatrix}\begin{bmatrix} \left( \Sigma_{xx} - \Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx} \right)^{-1} & 0 \\ 0 & \Sigma_{yy}^{-1} \end{bmatrix}\begin{bmatrix} I & -\Sigma_{xy}\Sigma_{yy}^{-1} \\ 0 & I \end{bmatrix} \]

Note also that the outer triangular matrices in this factorization have unit determinant, so

\[ |\Sigma| = \left| \Sigma_{xx} - \Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx} \right|\,|\Sigma_{yy}| \]

due to the properties of determinants. So we finally have, letting $\hat{x} \equiv \mu_x + \Sigma_{xy}\Sigma_{yy}^{-1}(y - \mu_y)$,

\begin{align*}
(z-m)'\Sigma^{-1}(z-m) &= \begin{bmatrix} x - \mu_x \\ y - \mu_y \end{bmatrix}'\begin{bmatrix} I & 0 \\ -\Sigma_{yy}^{-1}\Sigma_{yx} & I \end{bmatrix}\begin{bmatrix} \left( \Sigma_{xx} - \Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx} \right)^{-1} & 0 \\ 0 & \Sigma_{yy}^{-1} \end{bmatrix}\begin{bmatrix} I & -\Sigma_{xy}\Sigma_{yy}^{-1} \\ 0 & I \end{bmatrix}\begin{bmatrix} x - \mu_x \\ y - \mu_y \end{bmatrix} \\
&= (x - \hat{x})'\left( \Sigma_{xx} - \Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx} \right)^{-1}(x - \hat{x}) + (y - \mu_y)'\Sigma_{yy}^{-1}(y - \mu_y)
\end{align*}

Hence we have

\[ p_{X|Y}(x \mid y) = (2\pi)^{-n_x/2}\left| \Sigma_{xx} - \Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx} \right|^{-1/2}\exp\left\{ -\tfrac{1}{2}(x - \hat{x})'\left( \Sigma_{xx} - \Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx} \right)^{-1}(x - \hat{x}) \right\}, \]

which is a multivariate normal distribution with mean $\hat{x}$ and covariance matrix $\Sigma_{xx} - \Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx}$. Q.E.D.
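The two identities used in the proof, the determinant factorization and the decomposition of the quadratic form, are easy to check numerically (a sketch with a randomly generated covariance matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
nx, ny = 2, 3
A = rng.standard_normal((nx + ny, nx + ny))
S = A @ A.T                                   # a random covariance matrix
Sxx, Sxy = S[:nx, :nx], S[:nx, nx:]
Syx, Syy = S[nx:, :nx], S[nx:, nx:]

cond_var = Sxx - Sxy @ np.linalg.inv(Syy) @ Syx
# Determinant identity |S| = |Sxx - Sxy Syy^{-1} Syx| |Syy|
print(np.linalg.det(S), np.linalg.det(cond_var) * np.linalg.det(Syy))

# Quadratic-form identity at a random point
m, z = rng.standard_normal(nx + ny), rng.standard_normal(nx + ny)
x, y, mx, my = z[:nx], z[nx:], m[:nx], m[nx:]
xhat = mx + Sxy @ np.linalg.inv(Syy) @ (y - my)
lhs = (z - m) @ np.linalg.inv(S) @ (z - m)
rhs = ((x - xhat) @ np.linalg.inv(cond_var) @ (x - xhat)
       + (y - my) @ np.linalg.inv(Syy) @ (y - my))
print(lhs, rhs)
```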
3 The Central Difference Kalman Filter (CDKF)

3.1 The CDKF
We consider the general nonlinear state space system of the form

\[ x_{t+1} = h(x_t, w_{t+1}) \tag{9} \]
\[ y_t = g(x_t, v_t) \tag{10} \]

where $x_t$ is the unobserved state vector and $y_t$ is the observable measurement vector. We let $x_t$ have dimension $n_x \times 1$ and $y_t$ have dimension $n_y \times 1$. The vector $w_t$ is the process noise and has dimension $n_w \times 1$. The vector $v_t$ is the measurement noise and has dimension $n_v \times 1$. We assume that $w_t$ and $v_t$ are mutually independent and that $w_t \sim iid(0, R_w(t))$ and $v_t \sim iid(0, R_v(t))$.

The general principle for estimating the unobserved state vector based on the observable measurements is as follows. The a priori state and covariance estimators are

\[ \bar{x}_{t+1} = E_t[x_{t+1}] \tag{11} \]
\[ \bar{P}_{xx}(t+1) = E_t\left[ (x_{t+1} - \bar{x}_{t+1})(x_{t+1} - \bar{x}_{t+1})' \right] \tag{12} \]

where $E_t[\cdot]$ is the conditional expectation given the observations $y_{1:t} = \{y_1, y_2, \ldots, y_t\}$. The posterior state estimator is updated according to

\[ \hat{x}_{t+1} = \bar{x}_{t+1} + K_{t+1}(y_{t+1} - \bar{y}_{t+1}) \tag{13} \]

where

\[ K_{t+1} = \bar{P}_{xy}(t+1)\bar{P}_{yy}(t+1)^{-1} \tag{14} \]
\[ \bar{y}_{t+1} = E_t[g(x_{t+1}, v_{t+1})] \tag{15} \]
\[ \bar{P}_{xy}(t+1) = E_t\left[ (x_{t+1} - \bar{x}_{t+1})(y_{t+1} - \bar{y}_{t+1})' \right] \tag{16} \]
\[ \bar{P}_{yy}(t+1) = E_t\left[ (y_{t+1} - \bar{y}_{t+1})(y_{t+1} - \bar{y}_{t+1})' \right] \tag{17} \]

The corresponding update of the covariance matrix is

\[ \hat{P}_{xx}(t+1) = E_{t+1}\left[ (x_{t+1} - \hat{x}_{t+1})(x_{t+1} - \hat{x}_{t+1})' \right] = \bar{P}_{xx}(t+1) - K_{t+1}\bar{P}_{yy}(t+1)K_{t+1}' \tag{18} \]

For the following, we define four square and upper triangular Cholesky factorizations $S_w(t)$, $S_v(t)$, $\bar{S}_x(t)$, and $\hat{S}_x(t)$ such that

\[ R_w(t) = S_w(t)S_w(t)' \tag{19} \]
\[ R_v(t) = S_v(t)S_v(t)' \tag{20} \]
\[ \bar{P}_{xx}(t) = \bar{S}_x(t)\bar{S}_x(t)' \tag{21} \]
\[ \hat{P}_{xx}(t) = \hat{S}_x(t)\hat{S}_x(t)' \tag{22} \]

For these matrices we use the column notation

\[ S_w(t+1) = \begin{bmatrix} s_{w,1} & s_{w,2} & \ldots & s_{w,n_w} \end{bmatrix} \tag{23} \]
\[ S_v(t) = \begin{bmatrix} s_{v,1} & s_{v,2} & \ldots & s_{v,n_v} \end{bmatrix} \tag{24} \]
\[ \bar{S}_x(t) = \begin{bmatrix} \bar{s}_{x,1} & \bar{s}_{x,2} & \ldots & \bar{s}_{x,n_x} \end{bmatrix} \tag{25} \]
\[ \hat{S}_x(t) = \begin{bmatrix} \hat{s}_{x,1} & \hat{s}_{x,2} & \ldots & \hat{s}_{x,n_x} \end{bmatrix} \tag{26} \]

Here, $s_{w,j}$ has dimension $n_w \times 1$ for $j = 1, \ldots, n_w$, $s_{v,j}$ has dimension $n_v \times 1$ for $j = 1, \ldots, n_v$, and $\bar{s}_{x,j}$ and $\hat{s}_{x,j}$ have dimension $n_x \times 1$ for $j = 1, \ldots, n_x$. Next, we define eight matrices. The first-order effects are

\[ S^{(1)}_{xx}(t) = \left\{ \left( h_i(\hat{x}_t + h\hat{s}_{x,j}, \bar{w}_{t+1}) - h_i(\hat{x}_t - h\hat{s}_{x,j}, \bar{w}_{t+1}) \right)/2h \right\} \quad \text{for } i, j = 1, \ldots, n_x \tag{27} \]
\[ S^{(1)}_{xw}(t) = \left\{ \left( h_i(\hat{x}_t, \bar{w}_{t+1} + hs_{w,j}) - h_i(\hat{x}_t, \bar{w}_{t+1} - hs_{w,j}) \right)/2h \right\} \quad \text{for } i = 1, \ldots, n_x \text{ and } j = 1, \ldots, n_w \tag{28} \]
\[ S^{(1)}_{yx}(t) = \left\{ \left( g_i(\bar{x}_t + h\bar{s}_{x,j}, \bar{v}_t) - g_i(\bar{x}_t - h\bar{s}_{x,j}, \bar{v}_t) \right)/2h \right\} \quad \text{for } i = 1, \ldots, n_y \text{ and } j = 1, \ldots, n_x \tag{29} \]
\[ S^{(1)}_{yv}(t) = \left\{ \left( g_i(\bar{x}_t, \bar{v}_t + hs_{v,j}) - g_i(\bar{x}_t, \bar{v}_t - hs_{v,j}) \right)/2h \right\} \quad \text{for } i = 1, \ldots, n_y \text{ and } j = 1, \ldots, n_v \tag{30} \]

and the second-order effects are

\[ S^{(2)}_{xx}(t) = \left\{ \frac{\sqrt{h^2-1}}{2h^2}\left( h_i(\hat{x}_t + h\hat{s}_{x,j}, \bar{w}_{t+1}) + h_i(\hat{x}_t - h\hat{s}_{x,j}, \bar{w}_{t+1}) - 2h_i(\hat{x}_t, \bar{w}_{t+1}) \right) \right\} \quad \text{for } i, j = 1, \ldots, n_x \tag{31} \]
\[ S^{(2)}_{xw}(t) = \left\{ \frac{\sqrt{h^2-1}}{2h^2}\left( h_i(\hat{x}_t, \bar{w}_{t+1} + hs_{w,j}) + h_i(\hat{x}_t, \bar{w}_{t+1} - hs_{w,j}) - 2h_i(\hat{x}_t, \bar{w}_{t+1}) \right) \right\} \quad \text{for } i = 1, \ldots, n_x \text{ and } j = 1, \ldots, n_w \]
\[ S^{(2)}_{yx}(t) = \left\{ \frac{\sqrt{h^2-1}}{2h^2}\left( g_i(\bar{x}_t + h\bar{s}_{x,j}, \bar{v}_t) + g_i(\bar{x}_t - h\bar{s}_{x,j}, \bar{v}_t) - 2g_i(\bar{x}_t, \bar{v}_t) \right) \right\} \quad \text{for } i = 1, \ldots, n_y \text{ and } j = 1, \ldots, n_x \tag{32} \]
\[ S^{(2)}_{yv}(t) = \left\{ \frac{\sqrt{h^2-1}}{2h^2}\left( g_i(\bar{x}_t, \bar{v}_t + hs_{v,j}) + g_i(\bar{x}_t, \bar{v}_t - hs_{v,j}) - 2g_i(\bar{x}_t, \bar{v}_t) \right) \right\} \quad \text{for } i = 1, \ldots, n_y \text{ and } j = 1, \ldots, n_v \tag{33} \]

Here, the superscripts $(1)$ and $(2)$ denote the first-order and the second-order effects of the general nonlinear functions. For the normal distribution the optimal value of $h$ is $h^2 = 3$, since the normal distribution has the property that $\mu_4 = 3\sigma^4$.

The idea in the CDKF is to approximate the nonlinear expectations in (11), (12), and (15) to (18) by multivariate Stirling interpolations of second order. Hence, the a priori state estimator is

\[ \bar{x}_{t+1} = \frac{h^2 - n_x - n_w}{h^2}h(\hat{x}_t, \bar{w}_{t+1}) + \frac{1}{2h^2}\sum_{p=1}^{n_x}\left( h(\hat{x}_t + h\hat{s}_{x,p}, \bar{w}_{t+1}) + h(\hat{x}_t - h\hat{s}_{x,p}, \bar{w}_{t+1}) \right) + \frac{1}{2h^2}\sum_{p=1}^{n_w}\left( h(\hat{x}_t, \bar{w}_{t+1} + hs_{w,p}) + h(\hat{x}_t, \bar{w}_{t+1} - hs_{w,p}) \right) \]

A Cholesky factor of the a priori covariance matrix is obtained by a Householder transformation of the matrix $\left[ S^{(1)}_{xx}(t) \;\; S^{(1)}_{xw}(t) \;\; S^{(2)}_{xx}(t) \;\; S^{(2)}_{xw}(t) \right]$. Denoting the Householder transformation by $\Phi(\cdot)$, we thus have

\[ \bar{S}_x(t+1) = \Phi\left( \left[ S^{(1)}_{xx}(t) \;\; S^{(1)}_{xw}(t) \;\; S^{(2)}_{xx}(t) \;\; S^{(2)}_{xw}(t) \right] \right) \tag{34} \]

That is, the expression for the a priori covariance matrix is of the form

\begin{align*}
\bar{P}_{xx}(t+1) &= \bar{S}_x(t+1)\bar{S}_x(t+1)' \\
&= \left[ S^{(1)}_{xx}(t) \;\; S^{(1)}_{xw}(t) \;\; S^{(2)}_{xx}(t) \;\; S^{(2)}_{xw}(t) \right]\left[ S^{(1)}_{xx}(t) \;\; S^{(1)}_{xw}(t) \;\; S^{(2)}_{xx}(t) \;\; S^{(2)}_{xw}(t) \right]' \\
&= S^{(1)}_{xx}(t)S^{(1)}_{xx}(t)' + S^{(2)}_{xx}(t)S^{(2)}_{xx}(t)' + S^{(1)}_{xw}(t)S^{(1)}_{xw}(t)' + S^{(2)}_{xw}(t)S^{(2)}_{xw}(t)' \tag{35}
\end{align*}

The first equality follows from the fact that the Householder transformation of a rectangular matrix $A$ produces a square and upper triangular matrix $S = \Phi(A)$ such that $AA' = SS'$, i.e. $SS' = \Phi(A)\Phi(A)' = AA'$. For instance, $S^{(1)}_{xx}(t)S^{(1)}_{xx}(t)'$ corresponds to $H_{x,t}\hat{P}_{xx}(t)H_{x,t}'$, where $H_{x,t} = \left. \frac{\partial h(x, \bar{w}_{t+1})}{\partial x'} \right|_{x=\hat{x}_t}$, in the Extended Kalman Filter. However, the former is a better approximation than the latter.

The a priori estimator of the measurement vector is given by

\[ \bar{y}_{t+1} = \frac{h^2 - n_x - n_v}{h^2}g(\bar{x}_{t+1}, \bar{v}_{t+1}) + \frac{1}{2h^2}\sum_{p=1}^{n_x}\left( g(\bar{x}_{t+1} + h\bar{s}_{x,p}, \bar{v}_{t+1}) + g(\bar{x}_{t+1} - h\bar{s}_{x,p}, \bar{v}_{t+1}) \right) + \frac{1}{2h^2}\sum_{p=1}^{n_v}\left( g(\bar{x}_{t+1}, \bar{v}_{t+1} + hs_{v,p}) + g(\bar{x}_{t+1}, \bar{v}_{t+1} - hs_{v,p}) \right) \tag{36} \]

The covariance matrix for this estimator is calculated based on a Householder transformation of $\left[ S^{(1)}_{yx}(t+1) \;\; S^{(1)}_{yv}(t+1) \;\; S^{(2)}_{yx}(t+1) \;\; S^{(2)}_{yv}(t+1) \right]$. That is,

\[ S_y(t+1) = \Phi\left( \left[ S^{(1)}_{yx}(t+1) \;\; S^{(1)}_{yv}(t+1) \;\; S^{(2)}_{yx}(t+1) \;\; S^{(2)}_{yv}(t+1) \right] \right) \tag{37} \]

Notice that

\[ \bar{P}_{yy}(t+1) = S_y(t+1)S_y(t+1)' = S^{(1)}_{yx}(t+1)S^{(1)}_{yx}(t+1)' + S^{(1)}_{yv}(t+1)S^{(1)}_{yv}(t+1)' + S^{(2)}_{yx}(t+1)S^{(2)}_{yx}(t+1)' + S^{(2)}_{yv}(t+1)S^{(2)}_{yv}(t+1)' \tag{38} \]

The a priori cross covariance matrix is given by

\[ \bar{P}_{xy}(t+1) = \bar{S}_x(t+1)S^{(1)}_{yx}(t+1)' \tag{39} \]

and the a priori error covariance matrix for the observables is

\[ \bar{P}_{yy}(t+1) = S_y(t+1)S_y(t+1)' \tag{40} \]

As noted above, the posterior estimator of the covariance matrix for the unobserved state variables is

\[ \hat{P}_{xx}(t+1) = \bar{P}_{xx}(t+1) - K_{t+1}\bar{P}_{yy}(t+1)K_{t+1}' \tag{41} \]

Notice, however, that

\[ K_{t+1}\bar{P}_{yy}(t+1)K_{t+1}' = \bar{P}_{xy}(t+1)\bar{P}_{yy}(t+1)^{-1}\bar{P}_{yy}(t+1)K_{t+1}' = \bar{P}_{xy}(t+1)K_{t+1}' = \bar{S}_x(t+1)S^{(1)}_{yx}(t+1)'K_{t+1}' \tag{42} \]

and due to symmetry

\[ K_{t+1}\bar{P}_{yy}(t+1)K_{t+1}' = K_{t+1}S^{(1)}_{yx}(t+1)\bar{S}_x(t+1)' \tag{43} \]

By using (38) we also have

\[ K_{t+1}\bar{P}_{yy}(t+1)K_{t+1}' = K_{t+1}\left( S^{(1)}_{yx}S^{(1)\prime}_{yx} + S^{(1)}_{yv}S^{(1)\prime}_{yv} + S^{(2)}_{yx}S^{(2)\prime}_{yx} + S^{(2)}_{yv}S^{(2)\prime}_{yv} \right)K_{t+1}' \]

with all blocks evaluated at $t+1$. Hence, we may rewrite (41) in the following way:

\begin{align*}
\hat{P}_{xx}(t+1) &= \bar{P}_{xx}(t+1) - K_{t+1}\bar{P}_{yy}(t+1)K_{t+1}' - K_{t+1}\bar{P}_{yy}(t+1)K_{t+1}' + K_{t+1}\bar{P}_{yy}(t+1)K_{t+1}' \\
&= \bar{S}_x\bar{S}_x' - \bar{S}_xS^{(1)\prime}_{yx}K_{t+1}' - K_{t+1}S^{(1)}_{yx}\bar{S}_x' + K_{t+1}\left( S^{(1)}_{yx}S^{(1)\prime}_{yx} + S^{(1)}_{yv}S^{(1)\prime}_{yv} + S^{(2)}_{yx}S^{(2)\prime}_{yx} + S^{(2)}_{yv}S^{(2)\prime}_{yv} \right)K_{t+1}' \\
&= \left[ \bar{S}_x - K_{t+1}S^{(1)}_{yx} \right]\left[ \bar{S}_x - K_{t+1}S^{(1)}_{yx} \right]' + \left[ K_{t+1}S^{(1)}_{yv} \right]\left[ K_{t+1}S^{(1)}_{yv} \right]' + \left[ K_{t+1}S^{(2)}_{yx} \right]\left[ K_{t+1}S^{(2)}_{yx} \right]' + \left[ K_{t+1}S^{(2)}_{yv} \right]\left[ K_{t+1}S^{(2)}_{yv} \right]'
\end{align*}

So the Cholesky factor for $\hat{P}_{xx}(t+1)$ is

\[ \hat{S}_x(t+1) = \Phi\left( \left[ \bar{S}_x(t+1) - K_{t+1}S^{(1)}_{yx}(t+1) \;\; K_{t+1}S^{(1)}_{yv}(t+1) \;\; K_{t+1}S^{(2)}_{yx}(t+1) \;\; K_{t+1}S^{(2)}_{yv}(t+1) \right] \right) \tag{44} \]

since $\hat{P}_{xx}(t+1) = \hat{S}_x(t+1)\hat{S}_x(t+1)'$ by definition of the Householder transformation. Notice that $\hat{P}_{xx}(t+1)$ and $\bar{P}_{yy}(t+1)$ by construction always will be symmetric and positive semidefinite, as desired.
The algorithm for the Central Difference Kalman Filter (CDKF):

Initialization: set $t = 0$ and the initial values of $\hat{x}_t$ and $\hat{S}_x(t)$.

For $t = 0, 1, 2, \ldots$:

Prediction step:
- $\bar{x}_{t+1} = \frac{h^2 - n_x - n_w}{h^2}h(\hat{x}_t, \bar{w}_{t+1}; \theta) + \frac{1}{2h^2}\sum_{p=1}^{n_x}\left( h(\hat{x}_t + h\hat{s}_{x,p}, \bar{w}_{t+1}; \theta) + h(\hat{x}_t - h\hat{s}_{x,p}, \bar{w}_{t+1}; \theta) \right) + \frac{1}{2h^2}\sum_{p=1}^{n_w}\left( h(\hat{x}_t, \bar{w}_{t+1} + hs_{w,p}; \theta) + h(\hat{x}_t, \bar{w}_{t+1} - hs_{w,p}; \theta) \right)$
- $\bar{S}_x(t+1) = \Phi\left( \left[ S^{(1)}_{xx}(t) \;\; S^{(1)}_{xw}(t) \;\; S^{(2)}_{xx}(t) \;\; S^{(2)}_{xw}(t) \right] \right)$

Updating step:
- $\bar{y}_{t+1} = \frac{h^2 - n_x - n_v}{h^2}g(\bar{x}_{t+1}, \bar{v}_{t+1}; \theta) + \frac{1}{2h^2}\sum_{p=1}^{n_x}\left( g(\bar{x}_{t+1} + h\bar{s}_{x,p}, \bar{v}_{t+1}; \theta) + g(\bar{x}_{t+1} - h\bar{s}_{x,p}, \bar{v}_{t+1}; \theta) \right) + \frac{1}{2h^2}\sum_{p=1}^{n_v}\left( g(\bar{x}_{t+1}, \bar{v}_{t+1} + hs_{v,p}; \theta) + g(\bar{x}_{t+1}, \bar{v}_{t+1} - hs_{v,p}; \theta) \right)$
- $S_y(t+1) = \Phi\left( \left[ S^{(1)}_{yx}(t+1) \;\; S^{(1)}_{yv}(t+1) \;\; S^{(2)}_{yx}(t+1) \;\; S^{(2)}_{yv}(t+1) \right] \right)$
- $K_{t+1} = \bar{S}_x(t+1)S^{(1)}_{yx}(t+1)'\left( S_y(t+1)S_y(t+1)' \right)^{-1}$
- $\hat{x}_{t+1} = \bar{x}_{t+1} + K_{t+1}(y_{t+1} - \bar{y}_{t+1})$
- $\hat{S}_x(t+1) = \Phi\left( \left[ \bar{S}_x(t+1) - K_{t+1}S^{(1)}_{yx}(t+1) \;\; K_{t+1}S^{(1)}_{yv}(t+1) \;\; K_{t+1}S^{(2)}_{yx}(t+1) \;\; K_{t+1}S^{(2)}_{yv}(t+1) \right] \right)$

Quasi log-likelihood function:
- $\mathcal{L}_{t+1} = \mathcal{L}_t - \frac{n_y}{2}\log(2\pi) - \frac{1}{2}\log\left| \bar{P}_{yy}(t+1) \right| - \frac{1}{2}(y_{t+1} - \bar{y}_{t+1})'\bar{P}_{yy}(t+1)^{-1}(y_{t+1} - \bar{y}_{t+1})$
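A sketch of the prediction step in this algorithm (Python/NumPy; function and variable names are mine, not the author's code). NumPy's QR factorization plays the role of the Householder transformation $\Phi$: if $A' = QR$, then $S = R'$ satisfies $SS' = AA'$.

```python
import numpy as np

def householder(A):
    """Return a square S with S S' = A A' (QR of A', per the text)."""
    R = np.linalg.qr(A.T, mode="r")
    return R.T

def cdkf_predict(h_fun, xhat, Sx_hat, wbar, Sw, h=np.sqrt(3.0)):
    """CDKF prediction step, general (non-additive) process noise.

    h_fun(x, w) -> next state; Sx_hat and Sw are Cholesky factors of the
    posterior state covariance and the process-noise covariance."""
    nx, nw = len(xhat), len(wbar)
    f0 = h_fun(xhat, wbar)
    xbar = (h**2 - nx - nw) / h**2 * f0
    S1, S2 = [], []
    for j in range(nx):                     # perturb along state columns
        fp = h_fun(xhat + h * Sx_hat[:, j], wbar)
        fm = h_fun(xhat - h * Sx_hat[:, j], wbar)
        xbar += (fp + fm) / (2 * h**2)
        S1.append((fp - fm) / (2 * h))
        S2.append(np.sqrt(h**2 - 1) / (2 * h**2) * (fp + fm - 2 * f0))
    for j in range(nw):                     # perturb along noise columns
        fp = h_fun(xhat, wbar + h * Sw[:, j])
        fm = h_fun(xhat, wbar - h * Sw[:, j])
        xbar += (fp + fm) / (2 * h**2)
        S1.append((fp - fm) / (2 * h))
        S2.append(np.sqrt(h**2 - 1) / (2 * h**2) * (fp + fm - 2 * f0))
    Sx_bar = householder(np.column_stack(S1 + S2))
    return xbar, Sx_bar
```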
3.2 The CDKF with additive noise terms
We consider the nonlinear state space system of the form

\[ x_{t+1} = h(x_t) + w_{t+1} \tag{45} \]
\[ y_t = g(x_t) + v_t \tag{46} \]

where $x_t$ is the unobserved state vector and $y_t$ is the observable measurement vector. We let $x_t$ have dimension $n_x \times 1$ and $y_t$ have dimension $n_y \times 1$. The vector $w_t$ is the process noise and has dimension $n_w \times 1$. The vector $v_t$ is the measurement noise and has dimension $n_y \times 1$. We assume that $w_t$ and $v_t$ are mutually uncorrelated and that $w_t \sim iid(0, R_w(t))$ and $v_t \sim iid(0, R_v(t))$.

Again, the a priori state and covariance estimators are

\[ \bar{x}_{t+1} = E_t[x_{t+1}] \tag{47} \]
\[ \bar{P}_{xx}(t+1) = E_t\left[ (x_{t+1} - \bar{x}_{t+1})(x_{t+1} - \bar{x}_{t+1})' \right] \tag{48} \]

where $E_t[\cdot]$ is the conditional expectation given the observations $y_{1:t} = \{y_1, \ldots, y_t\}$. The posterior state estimator is updated according to

\[ \hat{x}_{t+1} = \bar{x}_{t+1} + K_{t+1}(y_{t+1} - \bar{y}_{t+1}) \tag{49} \]

where

\[ K_{t+1} = \bar{P}_{xy}(t+1)\bar{P}_{yy}(t+1)^{-1} \tag{50} \]
\[ \bar{y}_{t+1} = E_t[g(x_{t+1})] \tag{51} \]
\[ \bar{P}_{xy}(t+1) = E_t\left[ (x_{t+1} - \bar{x}_{t+1})(y_{t+1} - \bar{y}_{t+1})' \right] \tag{52} \]
\[ \bar{P}_{yy}(t+1) = E_t\left[ (y_{t+1} - \bar{y}_{t+1})(y_{t+1} - \bar{y}_{t+1})' \right] \tag{53} \]

The corresponding update of the covariance matrix is

\[ \hat{P}_{xx}(t+1) = E_{t+1}\left[ (x_{t+1} - \hat{x}_{t+1})(x_{t+1} - \hat{x}_{t+1})' \right] = \bar{P}_{xx}(t+1) - K_{t+1}\bar{P}_{yy}(t+1)K_{t+1}' \tag{54} \]

For the following, we define four square and upper triangular Cholesky factorizations $S_w(t)$, $S_v(t)$, $\bar{S}_x(t)$, and $\hat{S}_x(t)$ such that

\[ R_w(t) = S_w(t)S_w(t)' \tag{55} \]
\[ R_v(t) = S_v(t)S_v(t)' \tag{56} \]
\[ \bar{P}_{xx}(t) = \bar{S}_x(t)\bar{S}_x(t)' \tag{57} \]
\[ \hat{P}_{xx}(t) = \hat{S}_x(t)\hat{S}_x(t)' \tag{58} \]

For the following we use the notation

\[ \bar{S}_x(t) = \begin{bmatrix} \bar{s}_{x,1} & \bar{s}_{x,2} & \ldots & \bar{s}_{x,n_x} \end{bmatrix} \tag{59} \]
\[ \hat{S}_x(t) = \begin{bmatrix} \hat{s}_{x,1} & \hat{s}_{x,2} & \ldots & \hat{s}_{x,n_x} \end{bmatrix} \tag{60} \]

where $\bar{s}_{x,j}$ and $\hat{s}_{x,j}$ have dimension $n_x \times 1$ for $j = 1, \ldots, n_x$. Next, we define four matrices of the form

\[ S^{(1)}_{xx}(t) = \left\{ \left( h_i(\hat{x}_t + h\hat{s}_{x,j}) - h_i(\hat{x}_t - h\hat{s}_{x,j}) \right)/2h \right\} \quad \text{for } i, j = 1, \ldots, n_x \tag{61} \]
\[ S^{(1)}_{yx}(t) = \left\{ \left( g_i(\bar{x}_t + h\bar{s}_{x,j}) - g_i(\bar{x}_t - h\bar{s}_{x,j}) \right)/2h \right\} \quad \text{for } i = 1, \ldots, n_y \text{ and } j = 1, \ldots, n_x \tag{62} \]
\[ S^{(2)}_{xx}(t) = \left\{ \frac{\sqrt{h^2-1}}{2h^2}\left( h_i(\hat{x}_t + h\hat{s}_{x,j}) + h_i(\hat{x}_t - h\hat{s}_{x,j}) - 2h_i(\hat{x}_t) \right) \right\} \quad \text{for } i, j = 1, \ldots, n_x \tag{63} \]
\[ S^{(2)}_{yx}(t) = \left\{ \frac{\sqrt{h^2-1}}{2h^2}\left( g_i(\bar{x}_t + h\bar{s}_{x,j}) + g_i(\bar{x}_t - h\bar{s}_{x,j}) - 2g_i(\bar{x}_t) \right) \right\} \quad \text{for } i = 1, \ldots, n_y \text{ and } j = 1, \ldots, n_x \tag{64} \]

The a priori state estimator is

\[ \bar{x}_{t+1} = \frac{h^2 - n_x}{h^2}h(\hat{x}_t) + \frac{1}{2h^2}\sum_{p=1}^{n_x}\left( h(\hat{x}_t + h\hat{s}_{x,p}) + h(\hat{x}_t - h\hat{s}_{x,p}) \right) \tag{65} \]

A Cholesky factor of the a priori covariance matrix is obtained by a Householder transformation of the matrix $\left[ S^{(1)}_{xx}(t) \;\; S_w(t) \;\; S^{(2)}_{xx}(t) \right]$. Denoting the Householder transformation by $\Phi(\cdot)$, we thus have

\[ \bar{S}_x(t+1) = \Phi\left( \left[ S^{(1)}_{xx}(t) \;\; S_w(t) \;\; S^{(2)}_{xx}(t) \right] \right) \tag{66} \]

That is, the expression for the a priori covariance matrix is of the form

\[ \bar{P}_{xx}(t+1) = \bar{S}_x(t+1)\bar{S}_x(t+1)' = S^{(1)}_{xx}(t)S^{(1)}_{xx}(t)' + S^{(2)}_{xx}(t)S^{(2)}_{xx}(t)' + S_w(t)S_w(t)' \tag{67} \]

The a priori estimator of the measurement vector is given by

\[ \bar{y}_{t+1} = \frac{h^2 - n_x}{h^2}g(\bar{x}_{t+1}) + \frac{1}{2h^2}\sum_{p=1}^{n_x}\left( g(\bar{x}_{t+1} + h\bar{s}_{x,p}) + g(\bar{x}_{t+1} - h\bar{s}_{x,p}) \right) \tag{68} \]

The covariance matrix for this estimator is calculated based on a Householder transformation of $\left[ S^{(1)}_{yx}(t+1) \;\; S_v(t+1) \;\; S^{(2)}_{yx}(t+1) \right]$. That is,

\[ S_y(t+1) = \Phi\left( \left[ S^{(1)}_{yx}(t+1) \;\; S_v(t+1) \;\; S^{(2)}_{yx}(t+1) \right] \right) \tag{69} \]

Notice that

\[ \bar{P}_{yy}(t+1) = S_y(t+1)S_y(t+1)' = S^{(1)}_{yx}(t+1)S^{(1)}_{yx}(t+1)' + S^{(2)}_{yx}(t+1)S^{(2)}_{yx}(t+1)' + S_v(t+1)S_v(t+1)' \tag{70} \]

The a priori cross covariance matrix is given by

\[ \bar{P}_{xy}(t+1) = \bar{S}_x(t+1)S^{(1)}_{yx}(t+1)' \tag{71} \]

and the Kalman gain is given by

\[ K_{t+1} = \bar{P}_{xy}(t+1)\bar{P}_{yy}(t+1)^{-1} \tag{72} \]

Hence the a posteriori update of the state estimator is

\[ \hat{x}_{t+1} = \bar{x}_{t+1} + K_{t+1}(y_{t+1} - \bar{y}_{t+1}) \tag{73} \]

The posterior estimate of the covariance matrix for the unobserved state variables is

\[ \hat{P}_{xx}(t+1) = \bar{P}_{xx}(t+1) - K_{t+1}\bar{P}_{yy}(t+1)K_{t+1}' \tag{74} \]

Notice, however, that

\[ K_{t+1}\bar{P}_{yy}(t+1)K_{t+1}' = \bar{P}_{xy}(t+1)\bar{P}_{yy}(t+1)^{-1}\bar{P}_{yy}(t+1)K_{t+1}' = \bar{P}_{xy}(t+1)K_{t+1}' = \bar{S}_x(t+1)S^{(1)}_{yx}(t+1)'K_{t+1}' \tag{75} \]

and due to symmetry

\[ K_{t+1}\bar{P}_{yy}(t+1)K_{t+1}' = K_{t+1}S^{(1)}_{yx}(t+1)\bar{S}_x(t+1)' \tag{76} \]

We also have

\[ K_{t+1}\bar{P}_{yy}(t+1)K_{t+1}' = K_{t+1}\left( S^{(1)}_{yx}(t+1)S^{(1)}_{yx}(t+1)' + S^{(2)}_{yx}(t+1)S^{(2)}_{yx}(t+1)' + S_v(t+1)S_v(t+1)' \right)K_{t+1}' \tag{77} \]

Hence, we may rewrite (74) in the following way (suppressing the $t+1$ arguments):

\begin{align*}
\hat{P}_{xx}(t+1) &= \bar{P}_{xx} - K_{t+1}\bar{P}_{yy}K_{t+1}' - K_{t+1}\bar{P}_{yy}K_{t+1}' + K_{t+1}\bar{P}_{yy}K_{t+1}' \\
&= \bar{S}_x\bar{S}_x' - \bar{S}_xS^{(1)\prime}_{yx}K_{t+1}' - K_{t+1}S^{(1)}_{yx}\bar{S}_x' + K_{t+1}\left( S^{(1)}_{yx}S^{(1)\prime}_{yx} + S^{(2)}_{yx}S^{(2)\prime}_{yx} + S_vS_v' \right)K_{t+1}' \\
&= \left[ \bar{S}_x - K_{t+1}S^{(1)}_{yx} \right]\left[ \bar{S}_x - K_{t+1}S^{(1)}_{yx} \right]' + \left[ K_{t+1}S_v \right]\left[ K_{t+1}S_v \right]' + \left[ K_{t+1}S^{(2)}_{yx} \right]\left[ K_{t+1}S^{(2)}_{yx} \right]'
\end{align*}

So the Cholesky factor for $\hat{P}_{xx}(t+1)$ is

\[ \hat{S}_x(t+1) = \Phi\left( \left[ \bar{S}_x(t+1) - K_{t+1}S^{(1)}_{yx}(t+1) \;\; K_{t+1}S_v(t+1) \;\; K_{t+1}S^{(2)}_{yx}(t+1) \right] \right) \tag{78} \]

since $\hat{P}_{xx}(t+1) = \hat{S}_x(t+1)\hat{S}_x(t+1)'$ by definition of the Householder transformation. Notice that $\hat{P}_{xx}(t+1)$ and $\bar{P}_{yy}(t+1)$ by construction always will be symmetric and positive semidefinite, as desired.
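For the additive-noise case the whole filter iteration is compact enough to sketch in one function. The following is my own illustration of eqs. (65)-(73) and (78), not the author's code; as before, QR stands in for the Householder transformation.

```python
import numpy as np

def cdkf_step_additive(h_fun, g_fun, xhat, Sx_hat, Sw, Sv, y, h=np.sqrt(3.0)):
    """One CDKF iteration for x' = h(x) + w, y = g(x) + v (a sketch).

    Sx_hat, Sw, Sv: Cholesky factors of P̂xx(t), Rw(t+1), Rv(t+1)."""
    def divided_diffs(f, c, S):
        n = S.shape[1]
        f0 = f(c)
        mean = (h**2 - n) / h**2 * f0
        S1, S2 = [], []
        for j in range(n):
            fp, fm = f(c + h * S[:, j]), f(c - h * S[:, j])
            mean += (fp + fm) / (2 * h**2)
            S1.append((fp - fm) / (2 * h))
            S2.append(np.sqrt(h**2 - 1) / (2 * h**2) * (fp + fm - 2 * f0))
        return mean, np.column_stack(S1), np.column_stack(S2)

    tri = lambda A: np.linalg.qr(A.T, mode="r").T   # S with S S' = A A'

    xbar, S1xx, S2xx = divided_diffs(h_fun, xhat, Sx_hat)   # eq. (65)
    Sx = tri(np.hstack([S1xx, Sw, S2xx]))                   # eq. (66)
    ybar, S1yx, S2yx = divided_diffs(g_fun, xbar, Sx)       # eq. (68)
    Sy = tri(np.hstack([S1yx, Sv, S2yx]))                   # eq. (69)
    Pxy = Sx @ S1yx.T                                       # eq. (71)
    K = Pxy @ np.linalg.inv(Sy @ Sy.T)                      # eq. (72)
    xhat_new = xbar + K @ (y - ybar)                        # eq. (73)
    Sx_hat_new = tri(np.hstack([Sx - K @ S1yx, K @ Sv, K @ S2yx]))  # eq. (78)
    return xhat_new, Sx_hat_new, xbar, ybar
```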
3.3 Square root implementation of the Kalman Filter
Consider the following state space system

\[ x_{t+1} = h_xx_t + h_ww_{t+1} \tag{79} \]
\[ y_t = g_xx_t + g_vv_t \tag{80} \]

where $x_t$ is the unobserved state vector and $y_t$ is the observable measurement vector. We let $x_t$ have dimension $n_x \times 1$ and $y_t$ have dimension $n_y \times 1$. The vector $w_t$ is the process noise and has dimension $n_w \times 1$. The vector $v_t$ is the measurement noise and has dimension $n_v \times 1$. We assume that $w_t$ and $v_t$ are mutually independent and that $w_t \sim iid(0, R_w(t))$ and $v_t \sim iid(0, R_v(t))$. Note that the calculations below also hold for a linearized state space system, i.e. for the Extended Kalman Filter.

The prior state estimate is

\[ \bar{x}_{t+1} = h_x\hat{x}_t \tag{81} \]

and the covariance matrix for this estimate is

\[ \bar{P}_{xx}(t+1) = h_x\hat{P}_{xx}(t)h_x' + h_wR_w(t+1)h_w' = h_x\hat{S}_x(t)\left( h_x\hat{S}_x(t) \right)' + h_wS_w(t+1)\left( h_wS_w(t+1) \right)' \]

A square root implementation is therefore to let

\[ \bar{S}_x(t+1) = \Phi\left( \left[ h_x\hat{S}_x(t) \;\; h_wS_w(t+1) \right] \right) \tag{82} \]

meaning that

\[ \bar{P}_{xx}(t+1) = \bar{S}_x(t+1)\bar{S}_x(t+1)' = h_x\hat{S}_x(t)\left( h_x\hat{S}_x(t) \right)' + h_wS_w(t+1)\left( h_wS_w(t+1) \right)' \]

The prior estimate of the observables is

\[ \bar{y}_{t+1} = g_x\bar{x}_{t+1} \tag{83} \]

and its covariance matrix is

\[ \bar{P}_{yy}(t+1) = g_x\bar{P}_{xx}(t+1)g_x' + g_vR_v(t+1)g_v' = g_x\bar{S}_x(t+1)\left( g_x\bar{S}_x(t+1) \right)' + g_vS_v(t+1)\left( g_vS_v(t+1) \right)' \]

A square root implementation is therefore to let

\[ S_y(t+1) = \Phi\left( \left[ g_x\bar{S}_x(t+1) \;\; g_vS_v(t+1) \right] \right) \tag{84} \]

The Kalman gain is given by

\[ K_{t+1} = \bar{P}_{xy}(t+1)\bar{P}_{yy}(t+1)^{-1} = \bar{P}_{xx}(t+1)g_x'\bar{P}_{yy}(t+1)^{-1} \]

The posterior state estimator is

\[ \hat{x}_{t+1} = \bar{x}_{t+1} + K_{t+1}(y_{t+1} - \bar{y}_{t+1}) \tag{85} \]

and the covariance matrix of this estimator is

\[ \hat{P}_{xx}(t+1) = \bar{P}_{xx}(t+1) - K_{t+1}\bar{P}_{yy}(t+1)K_{t+1}' \]

A square root implementation of $\hat{P}_{xx}(t+1)$ is now derived. Note first that, for the latter term in the update of $\hat{P}_{xx}(t+1)$, we have

\[ K_{t+1}\bar{P}_{yy}(t+1)K_{t+1}' = \bar{P}_{xy}(t+1)K_{t+1}' = \bar{P}_{xx}(t+1)g_x'K_{t+1}' = \bar{S}_x(t+1)\bar{S}_x(t+1)'g_x'K_{t+1}' \]

But we also have

\[ K_{t+1}\bar{P}_{yy}(t+1)K_{t+1}' = K_{t+1}\left( g_x\bar{S}_x(t+1)\bar{S}_x(t+1)'g_x' + g_vS_v(t+1)S_v(t+1)'g_v' \right)K_{t+1}' \]

Hence, we may rewrite the update of $\hat{P}_{xx}(t+1)$ in the following way (suppressing the $t+1$ arguments):

\begin{align*}
\hat{P}_{xx}(t+1) &= \bar{P}_{xx} - K_{t+1}\bar{P}_{yy}K_{t+1}' - K_{t+1}\bar{P}_{yy}K_{t+1}' + K_{t+1}\bar{P}_{yy}K_{t+1}' \\
&= \bar{S}_x\bar{S}_x' - \bar{S}_x\bar{S}_x'g_x'K_{t+1}' - K_{t+1}g_x\bar{S}_x\bar{S}_x' + K_{t+1}\left( g_x\bar{S}_x\bar{S}_x'g_x' + g_vS_vS_v'g_v' \right)K_{t+1}' \\
&= \left[ \bar{S}_x - K_{t+1}g_x\bar{S}_x \right]\left[ \bar{S}_x - K_{t+1}g_x\bar{S}_x \right]' + \left[ K_{t+1}g_vS_v \right]\left[ K_{t+1}g_vS_v \right]'
\end{align*}

So the Cholesky factor for $\hat{P}_{xx}(t+1)$ is

\[ \hat{S}_x(t+1) = \Phi\left( \left[ \bar{S}_x(t+1) - K_{t+1}g_x\bar{S}_x(t+1) \;\; K_{t+1}g_vS_v(t+1) \right] \right) \tag{86} \]
3.4 A Smoother for the CDKF
The objective in this section is to make inference about the unknown states $x_t$ given all observations, i.e. given $y_{1:T} = \{y_1, y_2, \ldots, y_T\}$. The estimate of $x_t$ in the CDKF is given the information at time $t$, i.e. given $y_{1:t} = \{y_1, y_2, \ldots, y_t\}$. The estimate of $x_t$ given $y_{1:T}$ is called the smoothed estimate of $x_t$ and is denoted $x^s_t$. The covariance matrix for this estimate is denoted $P^s_{xx}(t) = E_T\left[ (x_t - x^s_t)(x_t - x^s_t)' \right]$. Our derivation follows Särkkä (2008), who shows how to derive a Forward-Backward smoother of the same form as the Rauch-Tung-Striebel smoother for the Unscented Kalman Filter.¹ Dunik & Simandl (2006) derive a square root version of this smoother. Given the results in Norgaard et al. (2000), it is straightforward to set up the Forward-Backward smoother for the CDKF.

Särkkä (2008) shows that an approximated smoothed estimate of $x_t$ and the covariance matrix for this estimate are given by

\[ x^s_t = \hat{x}_t + K_{t+1}\left( x^s_{t+1} - \bar{x}_{t+1} \right) \]
\[ P^s_{xx}(t) = \hat{P}_{xx}(t) + K_{t+1}\left( P^s_{xx}(t+1) - \bar{P}_{xx}(t+1) \right)K_{t+1}' \]

where

\[ \bar{x}_{t+1} = E_t[h(x_t, w_{t+1})] \]
\[ \bar{P}_{xx}(t+1) = E_t\left[ (x_{t+1} - \bar{x}_{t+1})(x_{t+1} - \bar{x}_{t+1})' \right] \]
\[ C_{t+1} = E_t\left[ (x_t - \hat{x}_t)(x_{t+1} - \bar{x}_{t+1})' \right] \]
\[ K_{t+1} = C_{t+1}\bar{P}_{xx}(t+1)^{-1} \]

The mean and the covariance matrices in this recursion are calculated as follows:

\[ \bar{x}_{t+1} = \frac{h^2 - n_x - n_w}{h^2}h(\hat{x}_t, \bar{w}_{t+1}) + \frac{1}{2h^2}\sum_{p=1}^{n_x}\left( h(\hat{x}_t + h\hat{s}_{x,p}, \bar{w}_{t+1}) + h(\hat{x}_t - h\hat{s}_{x,p}, \bar{w}_{t+1}) \right) + \frac{1}{2h^2}\sum_{p=1}^{n_w}\left( h(\hat{x}_t, \bar{w}_{t+1} + hs_{w,p}) + h(\hat{x}_t, \bar{w}_{t+1} - hs_{w,p}) \right) \]
\[ \bar{S}_x(t+1) = \Phi\left( \left[ S^{(1)}_{xx}(t) \;\; S^{(1)}_{xw}(t) \;\; S^{(2)}_{xx}(t) \;\; S^{(2)}_{xw}(t) \right] \right) \]
\[ C_{t+1} = \hat{S}_x(t)S^{(1)}_{xx}(t)' \]
\[ \bar{P}_{xx}(t+1) = \bar{S}_x(t+1)\bar{S}_x(t+1)' \]
\[ K_{t+1} = \hat{S}_x(t)S^{(1)}_{xx}(t)'\bar{P}_{xx}(t+1)^{-1} \]

Following Dunik & Simandl (2006), the square root implementation of this smoother is as follows:

\[ P^s_{xx}(t) = \hat{P}_{xx}(t) + K_{t+1}P^s_{xx}(t+1)K_{t+1}' - K_{t+1}\bar{P}_{xx}(t+1)K_{t+1}' \]

But notice that

\[ K_{t+1}\bar{P}_{xx}(t+1)K_{t+1}' = \hat{S}_x(t)S^{(1)}_{xx}(t)'\bar{P}_{xx}(t+1)^{-1}\bar{P}_{xx}(t+1)K_{t+1}' = \hat{S}_x(t)S^{(1)}_{xx}(t)'K_{t+1}' \]

and due to symmetry $K_{t+1}\bar{P}_{xx}(t+1)K_{t+1}' = K_{t+1}S^{(1)}_{xx}(t)\hat{S}_x(t)'$. Furthermore,

\[ K_{t+1}\bar{P}_{xx}(t+1)K_{t+1}' = K_{t+1}\left( S^{(1)}_{xx}(t)S^{(1)}_{xx}(t)' + S^{(1)}_{xw}(t)S^{(1)}_{xw}(t)' + S^{(2)}_{xx}(t)S^{(2)}_{xx}(t)' + S^{(2)}_{xw}(t)S^{(2)}_{xw}(t)' \right)K_{t+1}' \]

Hence, substituting these expressions into the expression for $P^s_{xx}(t)$ implies that

\begin{align*}
P^s_{xx}(t) &= \hat{S}_x(t)\hat{S}_x(t)' - \hat{S}_x(t)S^{(1)}_{xx}(t)'K_{t+1}' - K_{t+1}S^{(1)}_{xx}(t)\hat{S}_x(t)' + K_{t+1}S^{(1)}_{xx}(t)S^{(1)}_{xx}(t)'K_{t+1}' \\
&\quad + K_{t+1}S^s_x(t+1)S^s_x(t+1)'K_{t+1}' + K_{t+1}S^{(1)}_{xw}(t)S^{(1)}_{xw}(t)'K_{t+1}' + K_{t+1}S^{(2)}_{xx}(t)S^{(2)}_{xx}(t)'K_{t+1}' + K_{t+1}S^{(2)}_{xw}(t)S^{(2)}_{xw}(t)'K_{t+1}' \\
&= \left[ \hat{S}_x(t) - K_{t+1}S^{(1)}_{xx}(t) \right]\left[ \hat{S}_x(t) - K_{t+1}S^{(1)}_{xx}(t) \right]' + K_{t+1}S^s_x(t+1)S^s_x(t+1)'K_{t+1}' \\
&\quad + K_{t+1}S^{(1)}_{xw}(t)S^{(1)}_{xw}(t)'K_{t+1}' + K_{t+1}S^{(2)}_{xx}(t)S^{(2)}_{xx}(t)'K_{t+1}' + K_{t+1}S^{(2)}_{xw}(t)S^{(2)}_{xw}(t)'K_{t+1}'
\end{align*}

Hence, the square root of the covariance matrix for the smoothed state estimate is

\[ S^s_x(t) = \Phi\left( \left[ \hat{S}_x(t) - K_{t+1}S^{(1)}_{xx}(t) \;\; K_{t+1}S^s_x(t+1) \;\; K_{t+1}S^{(1)}_{xw}(t) \;\; K_{t+1}S^{(2)}_{xx}(t) \;\; K_{t+1}S^{(2)}_{xw}(t) \right] \right) \]

This smoothing recursion is started at the last time step because it holds that $x^s_T = \hat{x}_T$ and $P^s_{xx}(T) = \hat{P}_{xx}(T)$. Thus, the procedure is first to calculate the posterior estimates $\hat{x}_t$ and $\hat{S}_x(t)$ for $t = 1, 2, \ldots, T$ by running the CDKF. Then the smoothing recursion is started in time period $T$ and iterated back in time.

¹The Unscented Kalman Filter (UKF) developed by Julier, Uhlmann & Durrant-Whyte (1995) is a derivative-free implementation of the filtering equations based on the unscented transformation. However, Norgaard, Poulsen & Ravn (2000) show that the CDKF has marginally higher theoretical accuracy than the UKF.
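A sketch of the backward pass (my own container conventions: the lists hold the forward-pass outputs for $t = 0, \ldots, T$ and the prediction quantities for each $t \to t+1$ step):

```python
import numpy as np

def cdkf_smoother(xhats, Sx_hats, xbars, Sxs, S1xxs, S1xws, S2xxs, S2xws):
    """Backward CDKF smoother pass (a sketch of the recursion above)."""
    tri = lambda A: np.linalg.qr(A.T, mode="r").T
    T = len(xhats) - 1
    xs, Ss = xhats[T], Sx_hats[T]       # x^s_T = xhat_T, S^s_x(T) = Shat_x(T)
    out = [(xs, Ss)]
    for t in range(T - 1, -1, -1):
        Pbar = Sxs[t] @ Sxs[t].T
        K = Sx_hats[t] @ S1xxs[t].T @ np.linalg.inv(Pbar)   # smoother gain
        xs = xhats[t] + K @ (xs - xbars[t])
        Ss = tri(np.hstack([Sx_hats[t] - K @ S1xxs[t], K @ Ss,
                            K @ S1xws[t], K @ S2xxs[t], K @ S2xws[t]]))
        out.append((xs, Ss))
    return out[::-1]                    # smoothed (mean, factor) for t = 0..T
```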
4 The QML estimator based on the CDKF
This section states the conditions for consistency and asymptotic normality of the Quasi-Maximum Likelihood (QML) estimator based on the CDKF. Here we apply the results in Bollerslev & Wooldridge (1992). They show consistency and asymptotic normality for the QML estimator when: i) only the first and second moments are correctly specified in the quasi log-likelihood function, ii) the quasi log-likelihood function is derived based on the normal distribution, and iii) standard regularity conditions hold (see Bollerslev & Wooldridge (1992)). That is, for

\[ \hat{\theta}_{QML} = \arg\max_{\theta \in \Theta}\mathcal{L}(\theta; y_{1:T}) \]

it holds that

\[ \sqrt{T}\left( \hat{\theta}_{QML} - \theta_0 \right) \xrightarrow{d} N\left( 0, A_0^{-1}B_0A_0^{-1} \right), \]

where the zero subscript denotes the true value of theta ($\theta_0$) and

\[ A_0 \equiv -E\left[ \left. \frac{\partial^2\mathcal{L}(\theta; y_{1:T})}{\partial\theta\partial\theta'} \right|_{\theta=\theta_0} \right], \qquad B_0 \equiv E\left[ s(\theta_0; y_{1:T})s(\theta_0; y_{1:T})' \right], \qquad s(\theta_0; y_{1:T}) \equiv \left. \frac{\partial\mathcal{L}(\theta; y_{1:T})}{\partial\theta} \right|_{\theta=\theta_0}. \]

If the DSGE model is linearized, then we can exactly evaluate the first and second moments, and the results in Bollerslev & Wooldridge (1992) apply directly. Recall that the CDKF in the linear case corresponds to the standard Kalman Filter. However, when the DSGE model is approximated with non-linear terms, then we face the problem that the CDKF only evaluates the first and second moments in the quasi log-likelihood function up to second order accuracy. Hence, we cannot directly apply the results in Bollerslev & Wooldridge (1992). But the approximated solution to DSGE models imposes an upper bound on the feasible degree of precision which can be achieved for these moments. For instance, if a DSGE model is approximated up to second order, then it suffices to evaluate the first and second moments up to second and third order accuracy, respectively. Let

\[ y_t \equiv y^{exact}(\theta_0)_t + \epsilon^{exact}(\theta_0)_t, \]

where $\theta_0$ is the true value of the structural parameters and $y^{exact}_t$ is the exact estimator to the degree allowed by the approximation of the DSGE model. That is,

\[ y^{exact}(\theta)_t \equiv E_{t-1}[g(x_t; \theta)] \text{ is the conditional mean,} \]
\[ E_{t-1}\left[ \epsilon^{exact}(\theta_0)_t \right] = 0, \]
\[ P^{exact}_{yy}(t, \theta) \equiv E_{t-1}\left[ \left( y_t - y^{exact}(\theta)_t \right)\left( y_t - y^{exact}(\theta)_t \right)' \right] \text{ is the conditional covariance matrix.} \]

Next, let us decompose the conditional mean in the following way:

\[ y^{exact}(\theta)_t \equiv y^{CDKF}(\theta)_t + r(\theta)_t, \]

where $y^{CDKF}(\theta)_t$ is the approximation of the conditional mean found in the CDKF and $r(\theta)_t$ is the remainder, which is also stochastic. Hence

\[ \epsilon^{exact}(\theta_0)_t = y_t - y^{exact}(\theta_0)_t \iff \epsilon^{exact}(\theta_0)_t = y_t - y^{CDKF}(\theta)_t - r(\theta)_t \iff \epsilon^{exact}(\theta_0)_t = \epsilon^{CDKF}(\theta_0)_t - r(\theta)_t, \]

where $\epsilon^{CDKF}(\theta_0)_t \equiv y_t - y^{CDKF}(\theta)_t$ is the error in the conditional mean based on the approximated expression in the CDKF. Thus

\[ E_{t-1}\left[ \epsilon^{exact}(\theta_0)_t \right] = E_{t-1}\left[ \epsilon^{CDKF}(\theta_0)_t \right] - E_{t-1}[r(\theta)_t] \iff E_{t-1}\left[ \epsilon^{CDKF}(\theta_0)_t \right] = E_{t-1}[r(\theta)_t], \]

as $E_{t-1}\left[ \epsilon^{exact}(\theta_0)_t \right] = 0$. For the conditional covariance matrix,

\begin{align*}
P^{exact}_{yy}(t, \theta_0) &\equiv E_{t-1}\left[ \epsilon^{exact}(\theta_0)_t\,\epsilon^{exact}(\theta_0)_t' \right] \\
&= E_{t-1}\left[ \left( \epsilon^{CDKF}(\theta_0)_t - r(\theta)_t \right)\left( \epsilon^{CDKF}(\theta_0)_t - r(\theta)_t \right)' \right] \\
&= E_{t-1}\left[ \epsilon^{CDKF}(\theta_0)_t\,\epsilon^{CDKF}(\theta_0)_t' \right] - 2E_{t-1}\left[ \epsilon^{CDKF}(\theta_0)_t\,r(\theta)_t' \right] + E_{t-1}\left[ r(\theta)_tr(\theta)_t' \right] \\
&= P^{CDKF}_{yy}(t, \theta_0) - 2P_{yr}(t, \theta_0) + P_{rr}(t, \theta_0),
\end{align*}

where $P^{CDKF}_{yy}(t, \theta_0) \equiv E_{t-1}\left[ \epsilon^{CDKF}(\theta_0)_t\,\epsilon^{CDKF}(\theta_0)_t' \right]$, $P_{yr}(t, \theta_0) \equiv E_{t-1}\left[ \epsilon^{CDKF}(\theta_0)_t\,r(\theta)_t' \right]$, and $P_{rr}(t, \theta_0) \equiv E_{t-1}\left[ r(\theta)_tr(\theta)_t' \right]$. Equivalently,

\[ P^{CDKF}_{yy}(t, \theta_0) = P^{exact}_{yy}(t, \theta_0) + 2P_{yr}(t, \theta_0) - P_{rr}(t, \theta_0). \]

The quasi log-likelihood function evaluated to any degree of precision is

\[ \mathcal{L}(\theta; y_{1:T}) = -\frac{n_yT}{2}\log(2\pi) - \frac{1}{2}\sum_{t=1}^{T}\left[ \log\left| P^{exact}_{yy}(t, \theta) \right| + \left( y_t - y^{exact}(\theta)_t \right)'P^{exact}_{yy}(t, \theta)^{-1}\left( y_t - y^{exact}(\theta)_t \right) \right]. \]

We let $\theta$ have dimension $P \times 1$. The $P \times 1$ score function is denoted by $s^{exact}(\theta)_t$ and is given by (see Bollerslev & Wooldridge (1992))

\[ s^{exact}(\theta)_t = \nabla_\theta y^{exact}(\theta)_t'\,P^{exact}_{yy}(t, \theta)^{-1}\epsilon^{exact}(\theta)_t + \frac{1}{2}\nabla_\theta\mathrm{vec}\!\left( P^{exact}_{yy}(t, \theta) \right)'\left[ P^{exact}_{yy}(t, \theta)^{-1} \otimes P^{exact}_{yy}(t, \theta)^{-1} \right]\mathrm{vec}\!\left( \epsilon^{exact}(\theta)_t\,\epsilon^{exact}(\theta)_t' - P^{exact}_{yy}(t, \theta) \right). \]

As shown by Bollerslev & Wooldridge (1992), $E_{t-1}\left[ s^{exact}(\theta_0)_t \right] = 0$, and $\sum_{t=1}^{T}s^{exact}(\hat{\theta})_t = 0$ gives $\hat{\theta}$, the QML estimator.

The quasi log-likelihood function returned by the CDKF is only accurate up to second order precision, and is therefore given by

\[ \mathcal{L}(\theta; y_{1:T}) = -\frac{n_yT}{2}\log(2\pi) - \frac{1}{2}\sum_{t=1}^{T}\left[ \log\left| P^{CDKF}_{yy}(t, \theta) \right| + \left( y_t - y^{CDKF}(\theta)_t \right)'P^{CDKF}_{yy}(t, \theta)^{-1}\left( y_t - y^{CDKF}(\theta)_t \right) \right]. \]

This gives the following score function:

\[ s(\theta)_t = \nabla_\theta y^{CDKF}(\theta)_t'\,P^{CDKF}_{yy}(t, \theta)^{-1}\epsilon^{CDKF}(\theta)_t + \frac{1}{2}\nabla_\theta\mathrm{vec}\!\left( P^{CDKF}_{yy}(t, \theta) \right)'\left[ P^{CDKF}_{yy}(t, \theta)^{-1} \otimes P^{CDKF}_{yy}(t, \theta)^{-1} \right]\mathrm{vec}\!\left( \epsilon^{CDKF}(\theta)_t\,\epsilon^{CDKF}(\theta)_t' - P^{CDKF}_{yy}(t, \theta) \right), \]

and therefore

\begin{align*}
E_{t-1}\left[ s(\theta_0)_t \right] &= \nabla_\theta y^{CDKF}(\theta)_t'\,P^{CDKF}_{yy}(t, \theta)^{-1}E_{t-1}\left[ \epsilon^{CDKF}(\theta)_t \right] \\
&\quad + \frac{1}{2}\nabla_\theta\mathrm{vec}\!\left( P^{CDKF}_{yy}(t, \theta) \right)'\left[ P^{CDKF}_{yy}(t, \theta)^{-1} \otimes P^{CDKF}_{yy}(t, \theta)^{-1} \right]\mathrm{vec}\!\left( E_{t-1}\left[ \epsilon^{CDKF}(\theta)_t\,\epsilon^{CDKF}(\theta)_t' \right] - P^{CDKF}_{yy}(t, \theta) \right) \\
&= \nabla_\theta y^{CDKF}(\theta)_t'\,P^{CDKF}_{yy}(t, \theta)^{-1}E_{t-1}[r(\theta)_t] \\
&\quad + \frac{1}{2}\nabla_\theta\mathrm{vec}\!\left( P^{CDKF}_{yy}(t, \theta) \right)'\left[ P^{CDKF}_{yy}(t, \theta)^{-1} \otimes P^{CDKF}_{yy}(t, \theta)^{-1} \right]\mathrm{vec}\!\left( P^{exact}_{yy}(t, \theta) + 2P_{yr}(t, \theta_0) - P_{rr}(t, \theta_0) - P^{CDKF}_{yy}(t, \theta) \right),
\end{align*}

which we cannot show is equal to 0, because we do not know if $E_{t-1}[r(\theta)_t] = 0$ and $\mathrm{vec}\!\left( P^{exact}_{yy}(t, \theta) + 2P_{yr}(t, \theta_0) - P_{rr}(t, \theta_0) - P^{CDKF}_{yy}(t, \theta) \right) = 0$. Finally, note that

\begin{align*}
\mathcal{L}(\theta; y_{1:T}) &= -\frac{n_yT}{2}\log(2\pi) - \frac{1}{2}\sum_{t=1}^{T}\log\left| P^{CDKF}_{yy}(t, \theta) \right| - \frac{1}{2}\sum_{t=1}^{T}\left( y_t - y^{CDKF}(\theta)_t \right)'P^{CDKF}_{yy}(t, \theta)^{-1}\left( y_t - y^{CDKF}(\theta)_t \right) \\
&= -\frac{n_yT}{2}\log(2\pi) - \frac{1}{2}\sum_{t=1}^{T}\log\left| P^{exact}_{yy}(t, \theta) + 2P_{yr}(t, \theta_0) - P_{rr}(t, \theta_0) \right| \\
&\quad - \frac{1}{2}\sum_{t=1}^{T}\left( \epsilon^{exact}(\theta_0)_t + r(\theta)_t \right)'\left[ P^{exact}_{yy}(t, \theta) + 2P_{yr}(t, \theta_0) - P_{rr}(t, \theta_0) \right]^{-1}\left( \epsilon^{exact}(\theta_0)_t + r(\theta)_t \right),
\end{align*}

because $\epsilon^{exact}(\theta_0)_t + r(\theta)_t = \epsilon^{CDKF}(\theta_0)_t$.
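In practice the sandwich matrix $A_0^{-1}B_0A_0^{-1}$ is typically estimated with numerical derivatives of the per-period quasi log-likelihood contributions. A sketch (the interface `loglike_t` is hypothetical; it would wrap one CDKF pass over the data and return the $T$ per-period contributions):

```python
import numpy as np

def qml_sandwich(loglike_t, theta_hat, T, eps=1e-5):
    """Estimate Avar of sqrt(T)(theta_hat - theta0) as A^{-1} B A^{-1}.

    loglike_t(theta): hypothetical wrapper returning the T-vector of
    per-period quasi log-likelihood contributions from a CDKF pass."""
    P = len(theta_hat)
    def scores(theta):
        s = np.zeros((T, P))
        for i in range(P):
            dp = np.zeros(P); dp[i] = eps
            s[:, i] = (loglike_t(theta + dp) - loglike_t(theta - dp)) / (2 * eps)
        return s
    s = scores(theta_hat)
    B = s.T @ s / T                           # outer-product estimate of B0
    A = np.zeros((P, P))                      # A0 via differences of scores
    for i in range(P):
        dp = np.zeros(P); dp[i] = eps
        A[:, i] = -(scores(theta_hat + dp).mean(0)
                    - scores(theta_hat - dp).mean(0)) / (2 * eps)
    Ainv = np.linalg.inv((A + A.T) / 2)       # symmetrize before inverting
    return Ainv @ B @ Ainv
```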
5 Expressing the stochastic volatility model

Consider the model

\[ \ln\!\left(\frac{a_{t+1}}{a_{ss}}\right) = \rho_a\ln\!\left(\frac{a_t}{a_{ss}}\right) + \sigma_{a,t+1}\,\epsilon_{a,t+1} \tag{87} \]
\[ \ln\!\left(\frac{\sigma_{a,t+1}}{\sigma_{a,ss}}\right) = \rho_{\sigma_a}\ln\!\left(\frac{\sigma_{a,t}}{\sigma_{a,ss}}\right) + \sigma_{\epsilon_a}\,\epsilon_{\sigma_a,t+1} \tag{88} \]

Another representation of this model is as follows:

\[ \ln\!\left(\frac{a_t}{a_{ss}}\right) = \rho_a\ln\!\left(\frac{v_t}{v_{ss}}\right) + \sigma_{a,t}\,\epsilon_{a,t} \tag{89} \]
\[ \ln\!\left(\frac{v_{t+1}}{v_{ss}}\right) = \rho_a\ln\!\left(\frac{v_t}{v_{ss}}\right) + \sigma_{a,t}\,\epsilon_{a,t} \tag{90} \]
\[ \ln\!\left(\frac{\sigma_{a,t+1}}{\sigma_{a,ss}}\right) = \rho_{\sigma_a}\ln\!\left(\frac{\sigma_{a,t}}{\sigma_{a,ss}}\right) + \sigma_{\epsilon_a}\,\epsilon_{\sigma_a,t+1} \tag{91} \]

Hence, comparing (89) and (90) gives $\ln(v_{t+1}/v_{ss}) = \ln(a_t/a_{ss})$, because $\ln(a_t/a_{ss}) - \sigma_{a,t}\epsilon_{a,t} = \rho_a\ln(v_t/v_{ss})$ by (89). Evaluating (89) at time $t+1$ then implies

\[ \ln\!\left(\frac{a_{t+1}}{a_{ss}}\right) = \rho_a\ln\!\left(\frac{v_{t+1}}{v_{ss}}\right) + \sigma_{a,t+1}\,\epsilon_{a,t+1} = \rho_a\ln\!\left(\frac{a_t}{a_{ss}}\right) + \sigma_{a,t+1}\,\epsilon_{a,t+1}, \]

which reproduces (87).

Conclusion: the system in (87)-(88) can be re-expressed as in (89)-(91). This is important because (89)-(91) can be solved by only perturbing the variables $(a_t, v_t, \sigma_{a,t})$, whereas (87)-(88) requires perturbing the variables $(a_t, \sigma_{a,t}, \epsilon_{a,t+1}, \epsilon_{\sigma,t+1})$. Hence, we can use the standard codes by Schmitt-Grohe and Uribe (2004). Note also that in (89)-(91), $(v_t, \sigma_{a,t})$ are sufficient state variables.
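The equivalence of the two representations can be verified by simulating both with the same shock sequences (a sketch; $\sigma_{a,ss}$ is normalized to one, and the time-0 shock is set to zero so that both systems start at the steady state):

```python
import numpy as np

rng = np.random.default_rng(2)
T, rho_a, rho_s, sig_eps = 200, 0.9, 0.95, 0.1
eps_a, eps_s = rng.standard_normal(T + 1), rng.standard_normal(T + 1)
eps_a[0] = 0.0          # normalize so both systems start at a_0 = a_ss

# lns[t] = ln(sigma_{a,t}/sigma_{a,ss}) follows (88) = (91) in both systems
lns = np.zeros(T + 1)
for t in range(T):
    lns[t + 1] = rho_s * lns[t] + sig_eps * eps_s[t + 1]

# Original system (87): lna[t] = ln(a_t/a_ss)
lna = np.zeros(T + 1)
for t in range(T):
    lna[t + 1] = rho_a * lna[t] + np.exp(lns[t + 1]) * eps_a[t + 1]

# Reformulated system (89)-(90): lnv[t] = ln(v_t/v_ss)
lnv, lna2 = np.zeros(T + 2), np.zeros(T + 1)
for t in range(T + 1):
    lna2[t] = rho_a * lnv[t] + np.exp(lns[t]) * eps_a[t]   # (89)
    lnv[t + 1] = lna2[t]             # (90): ln(v_{t+1}/v_ss) = ln(a_t/a_ss)

print(np.max(np.abs(lna - lna2)))    # ~0: identical paths
```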
6 Accuracy of first and second moments in approximated DSGE models

For first moments it follows trivially that the accuracy of the approximation is given by the approximation order. The situation is different for the second moments. We illustrate this by considering the scalar case, i.e. $x_t$ and $y_t$ are scalars. Both variables are expressed in deviation from the deterministic steady state. A sixth order approximation of $y_t = g(x_t)$ is given by

\[ y_t = g_xx_t + \frac{1}{2!}g_{2x}x_t^2 + \frac{1}{3!}g_{3x}x_t^3 + \frac{1}{4!}g_{4x}x_t^4 + \frac{1}{5!}g_{5x}x_t^5 + \frac{1}{6!}g_{6x}x_t^6 \]

We denote moments of $x_t$ by $m_i = E[x^i]$ for $i = 1, 2, \ldots$. This implies

\begin{align*}
\mathrm{Var}(y_t) &= g_x^2m_2 + \left(\tfrac{1}{2!}\right)^2g_{2x}^2m_4 + \left(\tfrac{1}{3!}\right)^2g_{3x}^2m_6 + \left(\tfrac{1}{4!}\right)^2g_{4x}^2m_8 + \left(\tfrac{1}{5!}\right)^2g_{5x}^2m_{10} + \left(\tfrac{1}{6!}\right)^2g_{6x}^2m_{12} \\
&\quad + 2\left( \tfrac{1}{2!}g_xg_{2x}m_3 + \tfrac{1}{3!}g_xg_{3x}m_4 + \tfrac{1}{4!}g_xg_{4x}m_5 + \tfrac{1}{5!}g_xg_{5x}m_6 + \tfrac{1}{6!}g_xg_{6x}m_7 \right) \\
&\quad + 2\left( \tfrac{1}{2!\,3!}g_{2x}g_{3x}m_5 + \tfrac{1}{2!\,4!}g_{2x}g_{4x}m_6 + \tfrac{1}{2!\,5!}g_{2x}g_{5x}m_7 + \tfrac{1}{2!\,6!}g_{2x}g_{6x}m_8 \right) \\
&\quad + 2\left( \tfrac{1}{3!\,4!}g_{3x}g_{4x}m_7 + \tfrac{1}{3!\,5!}g_{3x}g_{5x}m_8 + \tfrac{1}{3!\,6!}g_{3x}g_{6x}m_9 \right) \\
&\quad + 2\left( \tfrac{1}{4!\,5!}g_{4x}g_{5x}m_9 + \tfrac{1}{4!\,6!}g_{4x}g_{6x}m_{10} \right) \\
&\quad + 2\,\tfrac{1}{5!\,6!}g_{5x}g_{6x}m_{11}
\end{align*}

A second order approximation of $\mathrm{Var}(y_t)$ is given by

\[ \left[ \mathrm{Var}(y_t) \right]^{2nd} = g_x^2m_2 \]

A third order approximation of $\mathrm{Var}(y_t)$ is given by

\[ \left[ \mathrm{Var}(y_t) \right]^{3rd} = g_x^2m_2 + g_xg_{2x}m_3 \]

A fourth order approximation of $\mathrm{Var}(y_t)$ is given by

\[ \left[ \mathrm{Var}(y_t) \right]^{4th} = g_x^2m_2 + g_xg_{2x}m_3 + \left(\tfrac{1}{2!}\right)^2g_{2x}^2m_4 + \tfrac{2}{3!}g_xg_{3x}m_4 \]

A fifth order approximation of $\mathrm{Var}(y_t)$ is given by

\[ \left[ \mathrm{Var}(y_t) \right]^{5th} = g_x^2m_2 + g_xg_{2x}m_3 + \left(\tfrac{1}{2!}\right)^2g_{2x}^2m_4 + \tfrac{2}{3!}g_xg_{3x}m_4 + \tfrac{2}{4!}g_xg_{4x}m_5 + \tfrac{2}{2!\,3!}g_{2x}g_{3x}m_5 \]

Computing the variance in a DSGE model approximated up to first order, $y_t = g_xx_t$:

\[ \mathrm{Var}^{app}(y_t) = g_x^2m_2, \]

which is accurate up to second order.

Computing the variance in a DSGE model approximated up to second order, $y_t = g_xx_t + \frac{1}{2!}g_{2x}x_t^2$:

\[ \mathrm{Var}^{app}(y_t) = g_x^2m_2 + \left(\tfrac{1}{2!}\right)^2g_{2x}^2m_4 + g_xg_{2x}m_3, \]

which is accurate up to third order and not fourth order, because we are missing the term $\tfrac{2}{3!}g_xg_{3x}m_4$.

Computing the variance in a DSGE model approximated up to third order, $y_t = g_xx_t + \frac{1}{2!}g_{2x}x_t^2 + \frac{1}{3!}g_{3x}x_t^3$:

\[ \mathrm{Var}^{app}(y_t) = g_x^2m_2 + \left(\tfrac{1}{2!}\right)^2g_{2x}^2m_4 + \left(\tfrac{1}{3!}\right)^2g_{3x}^2m_6 + g_xg_{2x}m_3 + \tfrac{2}{3!}g_xg_{3x}m_4 + \tfrac{2}{2!\,3!}g_{2x}g_{3x}m_5, \]

which is accurate up to fourth order. Note that we are missing the term $\tfrac{2}{4!}g_xg_{4x}m_5$ to have precision of fifth order.
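A numerical illustration (a sketch: the derivative values $g_{ix}$ are arbitrary, and $x_t$ is taken to be Gaussian, so $m_i = 0$ for odd $i$ and $m_{2k} = \sigma^{2k}(2k-1)!!$). Truncating moments at order four, a second-order approximation of the model misses exactly the term $\tfrac{2}{3!}g_xg_{3x}m_4$ relative to a third-order approximation:

```python
import numpy as np
from math import factorial

sigma = 0.1
m = {i: (0.0 if i % 2 else sigma**i * np.prod(np.arange(i - 1, 0, -2)))
     for i in range(1, 13)}           # Gaussian moments m_i = E[x^i]
g = {i: 0.8**i for i in range(1, 7)}  # illustrative derivatives g_{ix}

def var_trunc(order_g, order_m):
    """Var(y) using derivatives up to order_g, keeping m_i with i <= order_m."""
    total = 0.0
    for i in range(1, order_g + 1):
        for j in range(1, order_g + 1):
            if i + j <= order_m:
                total += g[i] * g[j] * m[i + j] / (factorial(i) * factorial(j))
    return total

print(var_trunc(2, 4), var_trunc(3, 4))   # 2nd- vs 3rd-order model, m up to 4
print(var_trunc(2, 4) - var_trunc(3, 4),  # equals the missing term
      -2 / factorial(3) * g[1] * g[3] * m[4])
```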
References

Anderson, B. D. O. & Moore, J. B. (1979), Optimal Filtering, Prentice-Hall, Inc.

Bollerslev, T. & Wooldridge, J. M. (1992), 'Quasi-maximum likelihood estimation and inference in dynamic models with time-varying covariances', Econometric Reviews 11(2), 143-172.

Dunik, J. & Simandl, M. (2006), 'Design of square-root derivative-free smoothers', Working Paper.

Julier, S. J., Uhlmann, J. K. & Durrant-Whyte, H. F. (1995), 'A new approach for filtering nonlinear systems', in The Proceedings of the American Control Conference, pp. 1628-1632.

Lewis, F. L. (1986), Optimal Estimation - with an Introduction to Stochastic Control Theory, John Wiley and Sons.

Norgaard, M., Poulsen, N. K. & Ravn, O. (2000), 'Advances in derivative-free state estimation for nonlinear systems', Automatica 36(11), 1627-1638.

Särkkä, S. (2008), 'Unscented Rauch-Tung-Striebel smoother', IEEE Transactions on Automatic Control.