Advanced Computational Methods in Statistics: Lecture 5
Sequential Monte Carlo / Particle Filtering
Axel Gandy
Department of Mathematics, Imperial College London
http://www2.imperial.ac.uk/~agandy
London Taught Course Centre for PhD Students in the Mathematical Sciences, Autumn 2015
Outline
- Introduction
  - Setup
  - Examples
  - Finitely Many States
  - Kalman Filter
- Particle Filtering
- Improving the Algorithm
- Further Topics
- Summary
Sequential Monte Carlo / Particle Filtering
- Particle filtering was introduced in Gordon et al. (1993).
- Most of the material on particle filtering is based on Doucet et al. (2001) and on the tutorial Doucet & Johansen (2008).
- Examples: tracking of objects, robot localisation, financial applications.
Setup - Hidden Markov Model
- x_0, x_1, x_2, ...: unobserved Markov chain (hidden states)
- y_1, y_2, ...: observations; y_1, y_2, ... are conditionally independent given x_0, x_1, ...

[Graphical model: x_0 -> x_1 -> x_2 -> x_3 -> x_4 -> ..., with each y_t emitted from x_t]

- Model given by
  - π(x_0): the initial distribution
  - f(x_t | x_{t-1}) for t ≥ 1: the transition kernel of the Markov chain
  - g(y_t | x_t) for t ≥ 1: the distribution of the observations
- Notation:
  - x_{0:t} = (x_0, ..., x_t): hidden states up to time t
  - y_{1:t} = (y_1, ..., y_t): observations up to time t
- Interested in the posterior distribution p(x_{0:t} | y_{1:t}) or in p(x_t | y_{1:t})
Some Remarks
- No explicit solution in the general case; explicit solutions exist only in special cases.
- We focus on the time-homogeneous case, i.e. the transition density f and the observation density g are the same for each step. Extensions to the inhomogeneous case are straightforward.
- It is important to be able to update quickly as new data become available: if y_{t+1} is observed, we want to compute p(x_{t+1} | y_{1:t+1}) quickly from p(x_t | y_{1:t}) and y_{t+1}.
Example: Bearings-Only Tracking
- Gordon et al. (1993); a ship moves in the two-dimensional plane, and a stationary observer sees only the angle (bearing) to the ship.
- Hidden states: position x_{t,1}, x_{t,2} and speed x_{t,3}, x_{t,4}.
- Speed changes randomly: x_{t,3} ~ N(x_{t-1,3}, σ²), x_{t,4} ~ N(x_{t-1,4}, σ²)
- Position changes accordingly: x_{t,1} = x_{t-1,1} + x_{t-1,3}, x_{t,2} = x_{t-1,2} + x_{t-1,4}
- Observations: y_t ~ N(tan⁻¹(x_{t,1}/x_{t,2}), η²)

[Figure: a simulated ship trajectory in the plane]
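A minimal simulation sketch of this model; the parameter values and the initial state are illustrative assumptions, not taken from the slides:

```python
import numpy as np

def simulate_bearings_only(T=100, sigma=0.01, eta=0.005, seed=0):
    """Simulate the bearings-only tracking model (illustrative parameters)."""
    rng = np.random.default_rng(seed)
    x = np.zeros((T + 1, 4))            # columns: position 1, position 2, speed 1, speed 2
    x[0] = [-0.05, 0.2, 0.002, -0.06]   # assumed initial state
    y = np.zeros(T)
    for t in range(1, T + 1):
        x[t, 2] = x[t - 1, 2] + sigma * rng.standard_normal()   # speed random walk
        x[t, 3] = x[t - 1, 3] + sigma * rng.standard_normal()
        x[t, 0] = x[t - 1, 0] + x[t - 1, 2]                     # position update
        x[t, 1] = x[t - 1, 1] + x[t - 1, 3]
        # noisy bearing observation
        y[t - 1] = np.arctan(x[t, 0] / x[t, 1]) + eta * rng.standard_normal()
    return x, y
```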
Example: Stochastic Volatility
- Returns: y_t ~ N(0, β² exp(x_t)) (observable from price data)
- Volatility: x_t ~ N(α x_{t-1}, σ²/(1-α)²), x_1 ~ N(0, σ²/(1-α)²)
- σ = 1, β = 0.5, α = 0.95

[Figure: simulated volatility x_t and returns y_t for t = 0, ..., 350]
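A minimal simulation sketch; the per-step noise scale s is taken directly from the slide's parametrisation and is an assumption to adjust if your parametrisation differs:

```python
import numpy as np

def simulate_sv(T=350, alpha=0.95, sigma=1.0, beta=0.5, seed=1):
    """Simulate the stochastic volatility model (state noise scale as on the slide)."""
    rng = np.random.default_rng(seed)
    s = sigma / (1.0 - alpha)          # per-step std dev as written on the slide (assumption)
    x = np.zeros(T + 1)
    x[0] = s * rng.standard_normal()   # x_1 ~ N(0, s^2)
    y = np.zeros(T)
    for t in range(1, T + 1):
        x[t] = alpha * x[t - 1] + s * rng.standard_normal()
        y[t - 1] = beta * np.exp(x[t] / 2.0) * rng.standard_normal()  # y_t ~ N(0, beta^2 exp(x_t))
    return x, y
```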
Hidden Markov Chain with Finitely Many States
- If the hidden process x_t takes only finitely many values, then p(x_{t+1} | y_{1:t+1}) can be computed recursively via

    p(x_{t+1} | y_{1:t+1}) ∝ g(y_{t+1} | x_{t+1}) Σ_{x_t} f(x_{t+1} | x_t) p(x_t | y_{1:t})

- May not be practicable if there are too many states!
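A sketch of one step of this recursion for a chain with K states (array names are ours):

```python
import numpy as np

def discrete_filter_step(p_t, F, g_col):
    """One forward-filtering step for a finite-state hidden Markov chain.

    p_t   : current filtering probabilities p(x_t | y_{1:t}), shape (K,)
    F     : transition matrix, F[i, j] = f(x_{t+1}=j | x_t=i), shape (K, K)
    g_col : observation likelihoods g(y_{t+1} | x_{t+1}=j), shape (K,)
    """
    pred = F.T @ p_t              # sum over x_t of f(x_{t+1} | x_t) p(x_t | y_{1:t})
    post = g_col * pred           # multiply by the observation likelihood
    return post / post.sum()      # normalise
```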
Kalman Filter
- Kalman (1960)
- Linear Gaussian state space model:
  - x_0 ~ N(x̂_0, P_0)
  - x_t = A x_{t-1} + w_t
  - y_t = H x_t + v_t
  - w_t ~ N(0, Q) iid, v_t ~ N(0, R) iid
  - A, H deterministic matrices; x̂_0, P_0, A, H, Q, R known
- Explicit computation and updating of the posterior is possible.
  Posterior: x_t | y_1, ..., y_t ~ N(x̂_t, P_t), starting from x̂_0, P_0.

  Time update:
    x̂_t^- = A x̂_{t-1}
    P_t^- = A P_{t-1} A' + Q

  Measurement update:
    K_t = P_t^- H' (H P_t^- H' + R)^{-1}
    x̂_t = x̂_t^- + K_t (y_t - H x̂_t^-)
    P_t = (I - K_t H) P_t^-
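A direct transcription of the two updates above into a small function (the function name and shapes are ours):

```python
import numpy as np

def kalman_step(x_hat, P, y, A, H, Q, R):
    """One Kalman filter recursion (time update + measurement update)."""
    # time update
    x_pred = A @ x_hat
    P_pred = A @ P @ A.T + Q
    # measurement update
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (y - H @ x_pred)
    P_new = (np.eye(len(x_hat)) - K @ H) @ P_pred
    return x_new, P_new
```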
Kalman Filter - Simple Example
- x_0 ~ N(0, 1)
- x_t = 0.9 x_{t-1} + w_t
- y_t = 2 x_t + v_t
- w_t ~ N(0, 1) iid, v_t ~ N(0, 1) iid

[Figure: observations y_t, the true x_t, the posterior mean of x_t | y_1, ..., y_t and a 95% credible interval, for t = 0, ..., 100]
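Using the kalman_step sketch above on this example (scalar model written with 1×1 matrices; the data simulation is for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
A, H, Q, R = (np.array([[v]]) for v in (0.9, 2.0, 1.0, 1.0))
x_true, y_obs = [rng.standard_normal(1)], []
for t in range(100):
    x_true.append(A @ x_true[-1] + rng.standard_normal(1))
    y_obs.append(H @ x_true[-1] + rng.standard_normal(1))

x_hat, P = np.zeros(1), np.eye(1)           # prior x_0 ~ N(0, 1)
for y in y_obs:                             # filter forward through the data
    x_hat, P = kalman_step(x_hat, P, y, A, H, Q, R)
print(x_hat, P)                             # posterior mean and variance at t = 100
```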
Kalman Filter - Remarks I
- Updating is very easy; it only involves linear algebra.
- Very widely used.
- A, H, Q and R can change with time.
- A linear control input can be incorporated, i.e. the hidden state can evolve according to x_t = A x_{t-1} + B u_{t-1} + w_{t-1}, where u_t can be controlled.
- The normal prior/updating/observations can be replaced by appropriate conjugate distributions.
- Continuous-time version: the Kalman-Bucy filter; see Øksendal (2003) for a nice introduction.
Kalman Filter - Remarks II
- Extended Kalman Filter: extension to nonlinear dynamics

    x_t = f(x_{t-1}, u_{t-1}, w_{t-1})
    y_t = g(x_t, v_t),

  where f and g are nonlinear functions. The extended Kalman filter linearises the nonlinear dynamics around the current mean and covariance. To do so it uses the Jacobian matrices, i.e. the matrices of partial derivatives of f and g with respect to their components. The extended Kalman filter no longer computes exact posterior distributions.
Outline
- Introduction
- Particle Filtering
  - Introduction
  - Bootstrap Filter
  - Example: Tracking
  - Example: Stochastic Volatility
  - Example: Football
  - Theoretical Results
- Improving the Algorithm
- Further Topics
- Summary
What to do if there were no observations...

[Graphical model: x_0 -> x_1 -> x_2 -> x_3 -> x_4 -> ..., with the observations y_1, y_2, y_3, y_4 omitted]

Without observations y_i the following simple approach would work:
- Sample N particles x_0^(1), ..., x_0^(N) ~ π(x_0) following the initial distribution.
- At every step propagate each particle according to the transition kernel of the Markov chain:

    x_{i+1}^(j) ~ f(· | x_i^(j)),  j = 1, ..., N

- After each step there are N particles that approximate the distribution of x_i.
- Note: very easy to update to the next step.
Importance Sampling
- Cannot sample from x_{0:t} | y_{1:t} directly.
- Main idea: change the density we are sampling from.
- Interested in E(φ(x_{0:t}) | y_{1:t}) = ∫ φ(x_{0:t}) p(x_{0:t} | y_{1:t}) dx_{0:t}.
- For any density h,

    E(φ(x_{0:t}) | y_{1:t}) = ∫ φ(x_{0:t}) [p(x_{0:t} | y_{1:t}) / h(x_{0:t})] h(x_{0:t}) dx_{0:t}.

- Thus an unbiased estimator of E(φ(x_{0:t}) | y_{1:t}) is

    Î = (1/N) Σ_{i=1}^N φ(x_{0:t}^i) w_t^i,   where w_t^i = p(x_{0:t}^i | y_{1:t}) / h(x_{0:t}^i)

  and x_{0:t}^1, ..., x_{0:t}^N ~ h iid.
- How to evaluate p(x_{0:t}^i | y_{1:t})? How to choose h? Can importance sampling be done recursively?
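A generic sketch of this estimator for a static target, with placeholder densities p and h (not part of the lecture's examples):

```python
import numpy as np
from scipy import stats

def importance_estimate(phi, p_pdf, h_dist, N=10_000, seed=0):
    """Estimate E_p[phi(X)] by sampling from h and weighting by p/h."""
    x = h_dist.rvs(size=N, random_state=seed)
    w = p_pdf(x) / h_dist.pdf(x)          # importance weights p(x)/h(x)
    return np.mean(phi(x) * w)

# Example: mean of N(1, 1) estimated with a wider N(0, 2^2) proposal
est = importance_estimate(phi=lambda x: x,
                          p_pdf=stats.norm(loc=1.0).pdf,
                          h_dist=stats.norm(loc=0.0, scale=2.0))
```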
Sequential Importance Sampling I
- Recursive definition and sampling of the importance sampling distribution:

    h(x_{0:t}) = h(x_t | x_{0:t-1}) h(x_{0:t-1})

- Can the weights be computed recursively? By Bayes' theorem:

    w_t = p(x_{0:t} | y_{1:t}) / h(x_{0:t}) = p(y_{1:t} | x_{0:t}) p(x_{0:t}) / [h(x_{0:t}) p(y_{1:t})]

  Hence,

    w_t = g(y_t | x_t) p(y_{1:t-1} | x_{0:t-1}) f(x_t | x_{t-1}) p(x_{0:t-1}) / [h(x_t | x_{0:t-1}) h(x_{0:t-1}) p(y_{1:t})]

  Thus,

    w_t = w_{t-1} · [g(y_t | x_t) f(x_t | x_{t-1}) / h(x_t | x_{0:t-1})] · [p(y_{1:t-1}) / p(y_{1:t})]
Sequential Importance Sampling II
- Can work with normalised weights w̃_t^i = w_t^i / Σ_j w_t^j; then one gets the recursion

    w̃_t^i ∝ w̃_{t-1}^i · g(y_t | x_t^i) f(x_t^i | x_{t-1}^i) / h(x_t^i | x_{0:t-1}^i)

- If one uses the prior distribution h(x_0) = π(x_0) and h(x_t | x_{0:t-1}) = f(x_t | x_{t-1}) as the importance sampling distribution, then the recursion is simply

    w̃_t^i ∝ w̃_{t-1}^i g(y_t | x_t^i)
Failure of Sequential Importance Sampling
- Weights degenerate as t increases.
- Example: x_0 ~ N(0, 1), x_{t+1} ~ N(x_t, 1), y_t ~ N(x_t, 1). N = 100 particles.
- Plot of the empirical cdfs of the normalised weights w̃_t^1, ..., w̃_t^N:

[Figure: empirical cdfs of the normalised weights at t = 1, 10, 20, 40]

Note the log scale on the x-axis. Most weights get very small.
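A short sketch reproducing this degeneracy for the stated model: sequential importance sampling with the prior as proposal and no resampling, printing the effective sample size (our chosen summary) at t = 1, 10, 20, 40:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 100, 40
x_true = np.cumsum(rng.standard_normal(T + 1))           # x_0 ~ N(0,1), Gaussian random walk
y = x_true[1:] + rng.standard_normal(T)                   # y_t ~ N(x_t, 1)

particles = rng.standard_normal(N)                        # x_0^i ~ N(0, 1)
logw = np.zeros(N)                                        # log-weights, never resampled
for t in range(T):
    particles = particles + rng.standard_normal(N)        # propagate with the prior f
    logw += -0.5 * (y[t] - particles) ** 2                # times g(y_t | x_t), up to a constant
    w = np.exp(logw - logw.max())
    w /= w.sum()
    if (t + 1) in (1, 10, 20, 40):
        print(f"t={t+1:2d}  ESS={1.0 / np.sum(w**2):6.1f} out of {N}")
```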
Resampling
- Goal: eliminate particles with very low weights.
- Suppose

    Q = Σ_{i=1}^N w̃_t^i δ_{x_t^i}

  is the current approximation to the distribution of x_t. Then one can obtain a new approximation as follows:
  - Sample N iid particles x̃_t^i from Q.
  - The new approximation is

      Q̃ = Σ_{i=1}^N (1/N) δ_{x̃_t^i}
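A sketch of this (multinomial) resampling step; numpy's weighted sampling with replacement performs exactly the draw from Q:

```python
import numpy as np

def multinomial_resample(particles, weights, rng):
    """Draw N iid indices from Q = sum_i w_i * delta_{x_i} and return the
    resampled particles together with the new, equal weights 1/N."""
    N = len(particles)
    idx = rng.choice(N, size=N, replace=True, p=weights)
    return particles[idx], np.full(N, 1.0 / N)
```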
The Bootstrap Filter
1. Sample x_0^(i) ~ π(x_0) for i = 1, ..., N and set t = 1.
2. Importance Sampling Step. For i = 1, ..., N:
   - Sample x̃_t^(i) ~ f(x_t | x_{t-1}^(i)) and set x̃_{0:t}^(i) = (x_{0:t-1}^(i), x̃_t^(i)).
   - Evaluate the importance weights w̃_t^(i) = g(y_t | x̃_t^(i)).
3. Selection Step:
   - Resample with replacement N particles (x_{0:t}^(i); i = 1, ..., N) from the set {x̃_{0:t}^(1), ..., x̃_{0:t}^(N)} according to the normalised importance weights w̃_t^(i) / Σ_{j=1}^N w̃_t^(j).
   - Set t := t + 1 and go to step 2.
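A compact sketch of the bootstrap filter for a one-dimensional state, assuming user-supplied samplers for π(x_0) and f and a log-likelihood for g (the function names are ours):

```python
import numpy as np

def bootstrap_filter(y, sample_prior, sample_transition, loglik, N=1000, seed=0):
    """Bootstrap particle filter; returns the filtering means E(x_t | y_{1:t})."""
    rng = np.random.default_rng(seed)
    particles = sample_prior(N, rng)                     # x_0^(i) ~ pi(x_0)
    means = np.empty(len(y))
    for t, yt in enumerate(y):
        particles = sample_transition(particles, rng)    # propagate with f
        logw = loglik(yt, particles)                     # log g(y_t | x_t)
        w = np.exp(logw - logw.max())
        w /= w.sum()
        means[t] = np.sum(w * particles)
        idx = rng.choice(N, size=N, replace=True, p=w)   # selection step
        particles = particles[idx]
    return means

# Possible usage with the stochastic volatility example (alpha, beta and the
# state noise scale s as in the simulation sketch earlier):
# means = bootstrap_filter(
#     y, sample_prior=lambda N, rng: s * rng.standard_normal(N),
#     sample_transition=lambda x, rng: alpha * x + s * rng.standard_normal(len(x)),
#     loglik=lambda yt, x: -0.5 * (yt / (beta * np.exp(x / 2))) ** 2 - x / 2)
```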
Illustration of the Bootstrap Filter (N = 10 particles)

[Figure: particles being propagated, weighted by g(y_1|x), g(y_2|x), g(y_3|x) and resampled at each step]
Example: Bearings-Only Tracking (N = 10000 particles)

[Figure: true position, estimated position and observer in the plane]
Example: Stochastic Volatility (Bootstrap Particle Filter with N = 1000)

[Figure: true volatility, posterior mean of the volatility and 95% credible interval for t = 0, ..., 350]
Example: Football
- Data: Premier League 2007/08
- x_{t,j}: "strength" of the jth team at time t, j = 1, ..., 20
- y_t: result of the games on date t
- Note: not time-homogeneous (different teams play one another; different time intervals between games).
- Model:
  - Initial distribution of the strength: x_{t,j} ~ N(0, 1)
  - Evolution of strength: x_{t,j} ~ N((1 - Δ/β)^{1/2} x_{t-Δ,j}, Δ/β); will use β = 50
  - Result of games conditional on strength: in a match between team H of strength x_H (at home) and team A of strength x_A,
      goals scored by the home team ~ Poisson(λ_H exp(x_H - x_A))
      goals scored by the away team ~ Poisson(λ_A exp(x_A - x_H))
    λ_H and λ_A are constants chosen based on the average number of goals scored at home/away. (A sketch of this observation density is given below.)
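A sketch of the resulting observation density for a single match, usable as g(y_t | x_t) in a bootstrap filter; the numeric values of λ_H and λ_A are placeholders, not the ones used in the lecture:

```python
import numpy as np
from scipy import stats

def match_loglik(goals_home, goals_away, x_home, x_away, lam_h=1.5, lam_a=1.1):
    """log g(y_t | x_t) for one match: Poisson goal counts whose rates depend on
    the difference in team strengths (lambda values are placeholders)."""
    rate_h = lam_h * np.exp(x_home - x_away)
    rate_a = lam_a * np.exp(x_away - x_home)
    return (stats.poisson.logpmf(goals_home, rate_h)
            + stats.poisson.logpmf(goals_away, rate_a))
```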
[Figure: Mean Team Strength over time, two panels covering all 20 Premier League teams, based on N = 100000 particles]
League Table at the end of 2007/08

 1  Man Utd            87
 2  Chelsea            85
 3  Arsenal            83
 4  Liverpool          76
 5  Everton            65
 6  Aston Villa        60
 7  Blackburn          58
 8  Portsmouth         57
 9  Manchester City    55
10  West Ham Utd       49
11  Tottenham          46
12  Newcastle          43
13  Middlesbrough      42
14  Wigan Athletic     40
15  Sunderland         39
16  Bolton             37
17  Fulham             36
18  Reading            36
19  Birmingham         35
20  Derby County       11
Influence of the Number N of Particles

[Figure: four panels of estimated mean team strengths over time for N = 100, 1000, 10000 and 100000 particles]
Theoretical Results
- Convergence results are as N → ∞.
- Laws of large numbers.
- Central limit theorems; see e.g. Chopin (2004).
- The central limit theorems yield an asymptotic variance, which can be used for theoretical comparisons of algorithms.
Outline
- Introduction
- Particle Filtering
- Improving the Algorithm
  - General Proposal Distribution
  - Improving Resampling
- Further Topics
- Summary
General Proposal Distribution
Algorithm with a general proposal distribution h:
1. Sample x_0^(i) ~ π(x_0) for i = 1, ..., N and set t = 1.
2. Importance Sampling Step. For i = 1, ..., N:
   - Sample x̃_t^(i) ~ h_t(x_t | x_{t-1}^(i)) and set x̃_{0:t}^(i) = (x_{0:t-1}^(i), x̃_t^(i)).
   - Evaluate the importance weights

       w̃_t^(i) = g(y_t | x̃_t^(i)) f(x̃_t^(i) | x_{t-1}^(i)) / h_t(x̃_t^(i) | x_{t-1}^(i)).

3. Selection Step:
   - Resample with replacement N particles (x_{0:t}^(i); i = 1, ..., N) from the set {x̃_{0:t}^(1), ..., x̃_{0:t}^(N)} according to the normalised importance weights w̃_t^(i) / Σ_{j=1}^N w̃_t^(j).
   - Set t := t + 1 and go to step 2.

The optimal proposal distribution depends on the quantity to be estimated. A generic choice: choose the proposal to minimise the variance of the normalised weights.
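Relative to the bootstrap filter sketch above, only the propagation and weighting lines change; a hedged sketch with user-supplied proposal sampler and densities (the function names, and letting the proposal see y_t, are our additions):

```python
import numpy as np

def proposal_weight_step(particles, yt, sample_h, logpdf_h, logpdf_f, loglik, rng):
    """One importance-sampling step with a general proposal h_t.
    Returns the new particles and their (unnormalised) log-weights."""
    new = sample_h(particles, yt, rng)                       # x~_t ~ h_t(. | x_{t-1})
    logw = (loglik(yt, new)                                  # log g(y_t | x~_t)
            + logpdf_f(new, particles)                       # + log f(x~_t | x_{t-1})
            - logpdf_h(new, particles, yt))                  # - log h_t(x~_t | x_{t-1})
    return new, logw
```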
Improving Resampling
- Resampling was introduced to remove particles with low weights.
- Downside: it adds variance.
Other Types of Resampling
- Goal: reduce the additional variance in the resampling step.
- Standardised weights W^1, ..., W^N; N^i = number of 'offspring' of the ith element.
- Need E(N^i) = W^i N for all i.
- Want to minimise the resulting variance of the weights.

Multinomial Resampling: resampling with replacement.

Systematic Resampling:
- Sample U_1 ~ U(0, 1/N). Let U_i = U_1 + (i-1)/N, i = 2, ..., N.
- N^i = |{j : Σ_{k=1}^{i-1} W^k ≤ U_j ≤ Σ_{k=1}^{i} W^k}|

Residual Resampling:
- Let Ñ^i = ⌊W^i N⌋. Idea: guarantee at least Ñ^i offspring of the ith element.
- N̄^1, ..., N̄^N: multinomial sample of N − Σ_i Ñ^i items with weights W̄^i ∝ W^i − Ñ^i/N.
- Set N^i = Ñ^i + N̄^i.

(A sketch of systematic resampling follows below.)
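A sketch of systematic resampling as described above: a single uniform draw U_1 is shared across N evenly spaced points:

```python
import numpy as np

def systematic_resample(weights, rng):
    """Systematic resampling: returns an array of resampled indices whose
    counts N^i follow the description above."""
    N = len(weights)
    u = rng.uniform(0.0, 1.0 / N) + np.arange(N) / N      # U_i = U_1 + (i-1)/N
    cum = np.cumsum(weights)
    return np.searchsorted(cum, u)                        # interval each U_i falls into
```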
Adaptive Resampling
- Resampling was introduced to remove particles with low weights. However, resampling introduces additional randomness to the algorithm.
- Idea: only resample when the weights are "too uneven".
- This can be assessed by computing the variance of the weights and comparing it to a threshold.
- Equivalently, one can compute the "effective sample size" (ESS):

    ESS = ( Σ_{i=1}^n (w_t^i)² )^{-1}

  (w_t^1, ..., w_t^n are the normalised weights)
- Intuitively, the effective sample size describes how many samples from the target distribution would be roughly equivalent to importance sampling with the weights w_t^i.
- Thus one could decide to resample only if ESS < k, where k can be chosen e.g. as k = N/2.
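A sketch of the adaptive rule: compute the ESS from the normalised weights and resample only when it falls below the threshold k = N/2 suggested above:

```python
import numpy as np

def maybe_resample(particles, weights, rng, threshold=None):
    """Resample only when the effective sample size drops below a threshold."""
    N = len(weights)
    if threshold is None:
        threshold = N / 2.0
    ess = 1.0 / np.sum(weights ** 2)          # effective sample size
    if ess < threshold:
        idx = rng.choice(N, size=N, replace=True, p=weights)
        return particles[idx], np.full(N, 1.0 / N)
    return particles, weights
```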
Outline
- Introduction
- Particle Filtering
- Improving the Algorithm
- Further Topics
  - Path Degeneracy
  - Smoothing
  - Parameter Estimates
- Summary
Path Degeneracy
- Let s ∈ N. Then #{x_{0:s}^i : i = 1, ..., N} → 1 as the number of steps t → ∞.
- Example: x_0 ~ N(0, 1), x_{t+1} ~ N(x_t, 1), y_t ~ N(x_t, 1). N = 100 particles.

[Figure: the surviving particle paths x_{0:t}^i at t = 20, 40 and 80; only few distinct values remain in the early part of the paths]
Particle Smoothing
- Estimate the distribution of the state x_t given all the observations y_1, ..., y_τ up to some later point τ > t.
- Intuitively, a better estimate should be possible than with filtering (where only information up to τ = t is available).
- Trajectory estimates tend to be smoother than those obtained by filtering.
- More sophisticated algorithms are needed.
Filtering and Parameter Estimates
- Recall that the model is given by
  - π(x_0): the initial distribution
  - f(x_t | x_{t-1}) for t ≥ 1: the transition kernel of the Markov chain
  - g(y_t | x_t) for t ≥ 1: the distribution of the observations
- In practical applications these distributions will not be known explicitly; they will depend on unknown parameters themselves.
- Two different starting points:
  - Bayesian point of view: parameters have some prior distribution.
  - Frequentist point of view: parameters are unknown constants.
- Examples:
  - Stoch. Volatility: x_t ~ N(α x_{t-1}, σ²/(1-α)²), x_1 ~ N(0, σ²/(1-α)²), y_t ~ N(0, β² exp(x_t)). Unknown parameters: σ, β, α.
  - Football: unknown parameters β, λ_H, λ_A.
- How to estimate these unknown parameters?
Maximum Likelihood Approach
- Let θ ∈ Θ contain all unknown parameters in the model.
- Would need the marginal density p_θ(y_{1:t}).
- It can be estimated by running the particle filter for each θ of interest and multiplying the unnormalised weights; see the slides Sequential Importance Sampling I/II earlier in the lecture for some intuition.
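For the bootstrap filter, one standard way to do this (stated here as an assumption, since the slide only gives the idea) is to multiply the per-step averages of the unnormalised weights, each of which estimates p_θ(y_t | y_{1:t-1}):

```python
import numpy as np

def log_marginal_likelihood(y, sample_prior, sample_transition, loglik, N=1000, seed=0):
    """Estimate log p_theta(y_{1:T}) with a bootstrap particle filter."""
    rng = np.random.default_rng(seed)
    particles = sample_prior(N, rng)
    log_ml = 0.0
    for yt in y:
        particles = sample_transition(particles, rng)
        logw = loglik(yt, particles)                       # unnormalised log-weights
        m = logw.max()
        log_ml += m + np.log(np.mean(np.exp(logw - m)))    # log of the average weight
        w = np.exp(logw - m)
        w /= w.sum()
        idx = rng.choice(N, size=N, replace=True, p=w)     # resample before the next step
        particles = particles[idx]
    return log_ml   # evaluate on a grid of theta values and maximise
```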
Artificial(?) Random Walk Dynamics
- Allow the parameters to change with time, i.e. give them some dynamics.
- More precisely:
  - Suppose we have a parameter vector θ.
  - Allow it to depend on time (θ_t); assign dynamics to it, i.e. a prior distribution and some transition probability from θ_{t-1} to θ_t.
  - Incorporate θ_t into the state vector x_t.
- May be reasonable in the stochastic volatility and football examples.
Bayesian Parameters
- Prior on θ.
- Want: the posterior p(θ | y_1, ..., y_t).
- Naive approach:
  - Incorporate θ into x_t; the transition for these components is just the identity.
  - Resampling will lead to θ degenerating: after a moderate number of steps only few (or even one) values of θ will be left.
- New approaches:
  - Particle filter within an MCMC algorithm, Andrieu et al. (2010); computationally very expensive.
  - SMC² by Chopin, Jacob, Papaspiliopoulos, arXiv:1101.1528.
Outline
- Introduction
- Particle Filtering
- Improving the Algorithm
- Further Topics
- Summary
  - Concluding Remarks
Concluding Remarks
- Active research area.
- A collection of references/resources regarding SMC: http://www.stats.ox.ac.uk/~doucet/smc_resources.html
- Collection of application-oriented articles: Doucet et al. (2001).
- Brief introduction to SMC: Robert & Casella (2004, Chapter 14).
- R package implementing several methods: pomp, http://pomp.r-forge.r-project.org/
Part I: Appendix

References
Andrieu, C., Doucet, A. & Holenstein, R. (2010). Particle Markov chain Monte Carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 72, 269–342.

Chopin, N. (2004). Central limit theorem for sequential Monte Carlo methods and its application to Bayesian inference. Ann. Statist. 32, 2385–2411.

Doucet, A., Freitas, N. D. & Gordon, N. (eds.) (2001). Sequential Monte Carlo Methods in Practice. Springer.

Doucet, A. & Johansen, A. M. (2008). A tutorial on particle filtering and smoothing: Fifteen years later. Available at http://www.cs.ubc.ca/~arnaud/doucet_johansen_tutorialPF.pdf.

Gordon, N., Salmond, D. & Smith, A. (1993). Novel approach to nonlinear/non-Gaussian Bayesian state estimation. Radar and Signal Processing, IEE Proceedings F 140, 107–113.

Kalman, R. (1960). A new approach to linear filtering and prediction problems. Journal of Basic Engineering 82, 35–45.
Øksendal, B. (2003). Stochastic Differential Equations: An Introduction With Applications. Springer.

Robert, C. & Casella, G. (2004). Monte Carlo Statistical Methods. Second ed., Springer.