Using Random Quasi-Monte-Carlo within Particle Filters, with Application to Financial Time Series

Paul Fearnhead
Department of Mathematics and Statistics, Lancaster University

Summary

We present a new particle filter algorithm which uses random Quasi-Monte-Carlo to propagate particles. The filter can be used generally, but we show that for 1-dimensional state-space models, if the number of particles is $N$, then the rate of convergence of this algorithm is $N^{-1}$. This compares favourably with the $N^{-1/2}$ convergence rate of standard particle filters. The computational complexity of the new filter is quadratic in the number of particles, as opposed to the linear computational complexity of standard methods. We demonstrate our new filter on two important financial time series models, an ARCH model and a stochastic volatility model. Simulation studies show that for fixed CPU time, the new filter can be orders of magnitude more accurate than existing particle filters. The new filter is particularly efficient at estimating smooth functions of the states, where empirical rates of convergence are $N^{-3/2}$; and for performing smoothing, where both the new and existing filters have the same computational complexity.

Keywords: ARCH, Filtering, Rate of Convergence, Sequential Monte-Carlo, Smoothing, Stochastic Volatility

1 Introduction

Dynamic state space models are commonly used in econometrics, engineering, geophysics and other scientific disciplines. They model processes where there is an underlying state of interest which evolves over time. At successive time points, partial observations of the state are made. A simple but important example, which will be used throughout this paper, is the ARCH model (Bollerslev et al., 1994), which has received much attention in the econometric literature (see Harvey et al., 1992).


Example 1 (ARCH model)
$$x_t \mid x_{t-1} \sim N(0, \beta_0 + \beta_1 x_{t-1}^2), \qquad y_t \mid x_t \sim N(x_t, \sigma^2),$$
where $x_t$ and $y_t$ are respectively the state and observation at time $t$, and $\beta_0$, $\beta_1$ and $\sigma$ are the parameters of the model.

The aim of filtering is to estimate the current value of the state given the observations to date. The Bayesian solution is to calculate the posterior distribution $p(x_t \mid y_{1:t})$, where here, and throughout, $y_{1:t} = \{y_1, \ldots, y_t\}$. A related problem, that of smoothing, is to estimate a past value of the state given the observations to date; that is, to calculate $p(x_t \mid y_{1:T})$ for $T > t$.

The primary focus of this paper is on the filtering problem, and we consider approaches based on particle filters. Particle filters approximate the posterior distribution at a given time point by a set of weighted particles. The particles are possible realisations of the state, and the posterior is approximated by a discrete distribution whose support is the set of particles, with the probability mass assigned to each particle being proportional to that particle's weight. For the simplest filter, these particles are propagated (stochastically) through time according to the dynamics of the state, and reweighted at each time-point to take account of the information in the observation at that time. See Doucet et al. (2001) for a book-length review, and numerous examples of particle filters being applied to important scientific problems; or the article of Liu and Chen (1998) for a more concise review. The latest developments in particle filters can be found on the sequential Monte Carlo methods website, http://www-sigproc.eng.cam.ac.uk/smc/index.html.

Particle filters have good theoretical properties (see Crisan and Doucet, 2002, and references therein), and under mild regularity conditions, the rate of convergence of the particle approximation to the true posterior is $O(N^{-1/2})$, where $N$ is the number of particles. This rate is independent of time; the constant of proportionality generally increases with time, but in many applications it is bounded. Furthermore, the computational cost of the filter is linear in the number of particles. As a result there has been some recent interest in applying particle filter methods to batch problems (Chopin, 2002; Fearnhead, 2004), particularly for problems where the data sets can be very large (Ridgeway and Madigan, 2003). Particle filters have been used to analyse financial time series; see for example Kitagawa (1996), Pitt and Shephard (1999) and Smith and Santos (2003) amongst others.

The $O(N^{-1/2})$ convergence rate of the particle filter stems from its basis on Monte Carlo integration ideas. The aim of this paper is to apply ideas from quasi-Monte-Carlo (QMC) integration (Niederreiter, 1978) to particle filters. In QMC integration, the error of an approximation, based on $N$ particles, to a $d$-dimensional integral can decay as quickly as $O(N^{-1}(\log N)^{d-1})$. This faster rate of convergence is due to dependence between the particles, which means that they are more regularly spaced out than independent particles. Standard QMC methods use deterministic rules for producing the particles, and this violates the unbiased property of properly weighted particles (Liu et al., 2001). Instead we use random QMC methods (see Owen, 1998, for a review), which produce random particle values, with each particle having the correct marginal distribution, but with the dependence between the particles that produces the quicker convergence rates being retained. Such methods can satisfy the properly weighted particle property. A further advantage of random QMC is that for sufficiently smooth integrands, the rate of convergence of the approximation to a $d$-dimensional integral is $O(N^{-3/2}(\log N)^{(d-1)/2})$ (Owen, 1997).

We propose a new particle filter algorithm for analysing these models, which uses random QMC to propagate the particles. Our algorithm is motivated by a result (see Theorem 1) which shows, for models with a 1-dimensional state such as the ARCH model, that if errors of $O_p(r(N))$, for a suitable rate $r(N)$, are introduced at each iteration of the particle filter, then, under mild regularity conditions, the error in the particle filter's approximation to the posterior at any time will be $O_p(r(N))$. The use of random QMC has the promise of producing a particle filter whose rate of convergence is $O(N^{-1})$. We call our new particle filter a regularised particle filter as it produces particles which are more regularly spaced than standard particle filters. We prove in Theorem 2 that the regularised particle filter introduces errors which are $O_p(N^{-1})$ at each step of the algorithm, and thus has a rate of convergence which is $O(N^{-1})$.

The computational complexity of our regularised particle filter is quadratic in the number of particles. As such, no asymptotic gain in efficiency is necessarily achieved. However, in the examples we consider, significant gains in efficiency are observed for practical, finite numbers of particles. The new filter is particularly efficient at estimating smooth functions of the state, for example the log-likelihood and posterior means, and in these cases we notice errors decaying at rates close to $O(N^{-3/2})$.


In our simulation study we also consider estimating the smoothed posterior means, $E(X_t \mid y_{1:T})$. Both the regularised particle filter and the standard particle filter can be adapted to approximate the marginal smoothed posteriors, $p(x_t \mid y_{1:T})$, using ideas from Kitagawa (1996). For the smoothing problem the computational complexity of both filters is the same, quadratic in $N$. However, our empirical results suggest that the new filter maintains its quicker rate of convergence for the smoothing problem, and thus large gains in efficiency are possible, particularly for large values of $N$.

The outline of the paper is as follows. In Sections 2 and 3 we briefly introduce QMC integration theory and particle filters respectively. In Section 4 we consider the rate of convergence of the particle filter, and state Theorem 1. In Section 5 we introduce our new regularised particle filter algorithm. Section 6 consists of results from simulation studies on the ARCH and stochastic volatility models, and Section 7 gives the results for a model with bi-modal filtering densities. The paper ends with a discussion, including the possible extension of the new filter to models in higher dimensions.

2 Quasi-Monte-Carlo Integration

We now give a brief overview of QMC integration theory. As we focus later on 1-dimensional models, here we concentrate on presenting QMC theory for estimating 1-dimensional integrals, with the extensions to higher dimensions being briefly outlined. For further details see Niederreiter (1978), Niederreiter (1992) and Owen (1998).

Consider estimating the value of an integral over the interval $[0,1]$,
$$I = \int_0^1 f(x)\,dx.$$
A simple Monte-Carlo estimator is based on simulating $x_i \sim U(0,1)$, for $i = 1, \ldots, N$, and estimating $I$ by
$$\hat{I} = \frac{1}{N} \sum_{i=1}^N f(x_i). \tag{1}$$

This is an unbiased estimator of $I$, with standard deviation proportional to $N^{-1/2}$. However, if we define the discrepancy of a set of $N$ points, $x_1, \ldots, x_N$, as
$$\Delta_N = \sup_{0 \le x \le 1} \left| x - \frac{1}{N}\sum_{i=1}^N \delta(x_i < x) \right|,$$
where $\delta(x_i < x)$ is an indicator function that takes the value 1 if $x_i < x$ and 0 otherwise, then
$$|I - \hat{I}| \le \Delta_N \int_0^1 |f'(x)|\,dx.$$

See Ripley (1987) for a proof. While the expected discrepancy of $N$ independent realisations of a $U(0,1)$ random variable is $O(N^{-1/2})$, it is possible to construct sequences of points whose discrepancy is $O(N^{-1})$, for example the sequence $\{1/(N+1), 2/(N+1), \ldots, N/(N+1)\}$. The idea of QMC is to use a deterministic sequence of points, chosen to have a low discrepancy, in the estimator (1). For estimating integrals of functions, $f(x)$, which have bounded variation, $\int_0^1 |f'(x)|\,dx < \infty$, the use of such a sequence of points gives errors of the estimate that decay more quickly than for standard Monte-Carlo estimates. Such a low-discrepancy sequence of points is called a QMC sequence. The condition of bounded variation is not necessary for the errors of the estimator to decay at the rate of $\Delta_N$.

In practice integrals of interest are often of the form $J = \int f(y)p(y)\,dy$, for some probability density function $p(y)$. Such an integral can be transformed to an integral over $[0,1]$ by the probability integral transform. If we define $P(y) = \int_{-\infty}^y p(s)\,ds$, then $J = \int_0^1 f(P^{-1}(x))\,dx$. Estimating this integral using a low-discrepancy sequence of $x$ values is equivalent to evaluating an estimator of the form (1) based on a sequence of $y_i$ values, where $y_i = P^{-1}(x_i)$. Such a sequence of $y_i$ values can be thought of as a low-discrepancy sequence from $p(y)$.

We will use a further generalisation of such QMC methods, based on using weighted particles. Consider $\hat{p}(y)$, an approximation to $p(y)$, which is based on $N$ particles $\{y_i\}_{i=1}^N$ and associated weights $\{w_i\}_{i=1}^N$. Our estimate of $J$ is
$$\hat{J} = \sum_{i=1}^N w_i f(y_i). \tag{2}$$

We can define the discrepancy of $\hat{p}(y)$ with $p(y)$ as
$$\Delta(p)_N = \sup_y \left| P(y) - \sum_{i=1}^N w_i \delta(y_i < y) \right|,$$
and we have (by a similar argument to Ripley 1987, p.190)
$$|J - \hat{J}| \le \Delta(p)_N \int |f'(y)|\,dy. \tag{3}$$

These results extend to higher dimensions. For formal definitions of discrepancy and bounded variation in higher dimensions see Niederreiter (1992). It is possible to generate a sequence of $N$ points in the $d$-dimensional hypercube, $[0,1]^d$, whose discrepancy is $O(N^{-1}(\log N)^{d-1})$. Integrals over $[0,1]^d$, where the integrand has bounded variation, can be estimated using these low-discrepancy sequences, and the estimators have errors which are $O(N^{-1}(\log N)^{d-1})$. As in the 1-dimensional case, expectations with respect to densities over a general $d$-dimensional space can be related to integrals over $[0,1]^d$ via a suitable transformation, and QMC methods can still be used. However, for $d$-dimensional integrals there is no single standard transformation. It is also harder to relate the properties of the integrands over a general space to the transformed integrands over $[0,1]^d$, and extensions of (3) do not necessarily follow.
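To make the probability integral transform concrete, here is a small sketch of estimator (1) evaluated on the low-discrepancy sequence $\{i/(N+1)\}$. It is written in Python with scipy assumed for the inverse CDF; the $N(0,1)$ target and the integrand are our own illustrative choices, not from the paper.

```python
import numpy as np
from scipy import stats

def qmc_expectation(f, N):
    # Estimator (1) with the low-discrepancy sequence {i/(N+1)}, mapped to a
    # N(0,1) target via the probability integral transform y_i = P^{-1}(x_i).
    x = np.arange(1, N + 1) / (N + 1)   # discrepancy O(1/N)
    y = stats.norm.ppf(x)               # inverse CDF of the target density
    return np.mean(f(y))                # equal weights w_i = 1/N

# E[Y^2] = 1 for Y ~ N(0,1); errors shrink faster than for plain Monte Carlo.
print(qmc_expectation(lambda y: y ** 2, 1000))
```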

2.1 Random Quasi-Monte-Carlo

The low-discrepancy sequences used in QMC are deterministic, which has the practical disadvantage that there is no way of estimating the accuracy of an estimator. This has led to the development of random QMC methods (see Owen, 1998, for a review). A random QMC sequence, $X_1, \ldots, X_N$, on $[0,1]^d$ satisfies

(i) marginally, $X_i \sim U([0,1]^d)$ for all $i = 1, \ldots, N$;

(ii) jointly, $X_1, \ldots, X_N$ is a QMC sequence with probability 1.

Random QMC sequences produce unbiased estimators of integrals, and the variance of the estimators can be estimated from independent replicates of the estimator. In 1 dimension a simple random QMC sequence is obtained by letting the ordered sequence of points, $X_{(1)}, X_{(2)}, \ldots, X_{(N)}$, satisfy $X_{(i)} \sim U((i-1)/N, i/N)$. Random QMC sequences from a pdf $p(y)$ can then be obtained by transformation as before. Methods for producing random QMC sequences in higher dimensions are given in Owen (1995). A further advantage of random QMC over QMC is that for estimates of integrals of sufficiently smooth functions, the errors can be $O(N^{-3/2}(\log N)^{(d-1)/2})$ (Owen, 1997).

Example 2 (Random QMC for uniform mean)
Consider estimating
$$J = \int_0^{10} f(y)/10\,dy,$$
the mean of $f(Y)$ where $Y$ has a uniform distribution on $[0,10]$. A random QMC estimator is obtained as follows:

(i) generate, for $i = 1, \ldots, N$, $x_i$, a realisation of $X_i \sim U((i-1)/N, i/N)$;

(ii) set $y_i = 10x_i$ and $w_i = 1/N$;

(iii) estimate $J$ by $\hat{J}$ using (2).

For $f(x) = x$, the variance of the estimator is $50/(6N^3)$; equivalently, the errors of the estimator decay as $N^{-3/2}$. A sketch of this construction in code is given below.
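The following sketch implements steps (i)-(iii) and compares the empirical variance of the estimator with the stated value $50/(6N^3)$. The Python implementation is our own illustration; the paper's own code was written in C and R.

```python
import numpy as np

# Random QMC (stratified) estimate of the mean of f(Y), Y ~ U(0,10), f(x) = x.
rng = np.random.default_rng(1)
N, reps = 100, 2000
estimates = np.empty(reps)
for r in range(reps):
    u = (np.arange(N) + rng.uniform(size=N)) / N   # (i)  X_(i) ~ U((i-1)/N, i/N)
    y = 10 * u                                     # (ii) transform; w_i = 1/N
    estimates[r] = np.mean(y)                      # (iii) estimate J using (2)
print(estimates.var(), 50 / (6 * N**3))            # both approximately 8.3e-6
```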

3 Particle Filters

We return to the problem of estimating a time-varying state, $x_t$, based on partial observations $y_t$. We will use $p(\cdot)$ and $p(\cdot \mid \cdot)$ to denote respectively a generic density and conditional density associated with the model, the arguments making it clear which density we are referring to. Our aim is to calculate the posterior densities $p(x_t \mid y_{1:t})$ for subsequent values of $t$.

Filtering algorithms use the following relationship between $p(x_{t+1} \mid y_{1:t+1})$ and $p(x_t \mid y_{1:t})$:
$$p(x_{t+1} \mid y_{1:t+1}) \propto p(y_{t+1} \mid x_{t+1}) \int p(x_{t+1} \mid x_t)\, p(x_t \mid y_{1:t})\,dx_t. \tag{4}$$
In general it is not possible to solve (4) analytically, but it still forms the basis of approximate filtering algorithms (e.g. Kitagawa, 1987; West, 1992).

Particle filters use simulation to perform filtering. The posterior at time $t$ is approximated using a set of particles $\{x_t^{(i)}\}_{i=1}^N$ and associated weights $\{w_t^{(i)}\}_{i=1}^N$ which sum to unity. The posterior distribution is approximated by a discrete probability mass function whose support is the set of particles, and which has probability $w_t^{(i)}$ assigned to the $i$th particle value. We denote by $\hat{\pi}_t(x_t)$ the particle filter approximation to the posterior at time $t$:
$$\hat{\pi}_t(x_t) = \sum_{i=1}^N w_t^{(i)} \delta(x_t = x_t^{(i)}).$$
Substituting into (4) gives an approximation to the posterior at time $t+1$, which we denote as $\bar{\pi}_{t+1}(x_{t+1})$, with
$$\bar{\pi}_{t+1}(x_{t+1}) \propto \sum_{i=1}^N w_t^{(i)}\, p(x_{t+1} \mid x_t^{(i)})\, p(y_{t+1} \mid x_{t+1}).$$
At its simplest, the particle filter can be viewed as the following:

1 Initiation. Produce a particle approximation, $\hat{\pi}_0(x_0)$, to the prior $p(x_0)$.

2 Iteration. (Time $t+1$.) Given a particle approximation $\hat{\pi}_t(x_t)$, calculate a new particle approximation $\hat{\pi}_{t+1}(x_{t+1})$ to $\bar{\pi}_{t+1}(x_{t+1})$.

The key to an efficient particle filtering algorithm is the method for generating $\hat{\pi}_{t+1}(x_{t+1})$, the approximation to $\bar{\pi}_{t+1}(x_{t+1})$. Various approaches to this iteration have been proposed in the literature. In general the iteration is split into propagation, reweighting and resampling steps. The propagation step generates the particles at time $t+1$, and the reweighting step calculates the particles' weights. The resampling step is optional, and produces a set of equally weighted particles, some of which will be duplicated, which can be viewed as an approximate sample from the posterior.

Primarily the propagation and reweighting steps are based on importance sampling. A general framework for implementing these steps is given by the ASIR filter of Pitt and Shephard (1999). This filter allows for flexibility in the choice of proposal density, and the proposal density can be chosen to take account of the model and the information in the observation at the next time step. In the ASIR filter the proposal is of the form
$$q(x_{t+1}) = \sum_{i=1}^N \beta_i\, q(x_{t+1} \mid x_t^{(i)}). \tag{5}$$
To simulate from this proposal we first simulate the component, $i$, from the discrete distribution which assigns probability $\beta_j$ to value $j$, and then simulate $x_{t+1}$ from $q(x_{t+1} \mid x_t^{(i)})$. For a simulated pair $(i, x_{t+1})$, the new particle is $x_{t+1}$, and its weight is proportional to
$$\frac{w_t^{(i)}\, p(x_{t+1} \mid x_t^{(i)})\, p(y_{t+1} \mid x_{t+1})}{\beta_i\, q(x_{t+1} \mid x_t^{(i)})}.$$
In practice $q(x_{t+1})$ is chosen to be as close as possible to $\bar{\pi}_{t+1}(x_{t+1})$. It is possible to choose $q(x_{t+1}) = \bar{\pi}_{t+1}(x_{t+1})$ for some problems, whereas for others, approximations of $\bar{\pi}_{t+1}(x_{t+1})$, often based on a Taylor expansion, can be used. See Pitt and Shephard (1999) for more details. A code sketch of one such iteration, for the ARCH model, is given at the end of this section.

Any resampling step increases the Monte Carlo variation of the filter. The reason for using resampling steps is linked to early particle filter algorithms (e.g. Gordon et al., 1993; Kong et al., 1994; Liu and Chen, 1995) which only proposed one future particle for each existing particle (that is, they sampled a single value from each $p(x_{t+1} \mid x_t^{(i)})$ at the propagation step). For such algorithms, resampling allows multiple particles to be generated in areas of high posterior probability, which can then independently explore the future of the state. There is still a trade-off between this advantage and the disadvantage of extra Monte Carlo variation, and rules-of-thumb have been introduced to guide whether and when to resample (Liu and Chen, 1995). Furthermore there are numerous algorithms for performing resampling while introducing as little Monte Carlo variation as possible (Kitagawa, 1996; Liu and Chen, 1998; Carpenter et al., 1999; Fearnhead and Clifford, 2003). It should be noted that for the ASIR framework, resampling naturally occurs within the propagation step.
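To make the ASIR framework concrete, the sketch below gives one fully adapted ASIR iteration for the ARCH model of Example 1, where, as noted in Section 6.1, the proposal can be taken to be $\bar{\pi}_{t+1}(x_{t+1})$ itself. It is a minimal Python sketch, an assumption on our part since the paper's implementations were in C and R.

```python
import numpy as np
from scipy import stats

def asir_arch_iteration(x, w, y_next, beta0, beta1, sigma, rng):
    # One fully adapted ASIR iteration for the ARCH model of Example 1.
    s2 = beta0 + beta1 * x**2                       # transition variance given x_t
    # Exact predictive likelihood: y_{t+1} | x_t ~ N(0, s2 + sigma^2).
    beta = w * stats.norm.pdf(y_next, loc=0.0, scale=np.sqrt(s2 + sigma**2))
    beta /= beta.sum()
    idx = rng.choice(len(x), size=len(x), p=beta)   # simulate mixture components
    # p(x_{t+1} | x_t, y_{t+1}) is Gaussian (conjugate normal update):
    post_var = s2[idx] * sigma**2 / (s2[idx] + sigma**2)
    post_mean = y_next * s2[idx] / (s2[idx] + sigma**2)
    x_new = rng.normal(post_mean, np.sqrt(post_var))
    w_new = np.full(len(x), 1.0 / len(x))           # fully adapted: equal weights
    return x_new, w_new
```

Because the first-stage probabilities $\beta_i$ here use the exact predictive likelihood, the importance weights cancel and the propagated particles carry equal weights.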

4 Rate of Convergence of the Particle Filter

We now consider the rate of convergence of the particle filter. By this we mean the rate at which the errors in the particle filter approximation of the posterior at a given time decay with the number of particles. The following simple result shows that, for 1-dimensional states, this rate depends on the rate at which the errors introduced in the Initiation and Iteration steps of the filter decay with the number of particles.

Theorem 1 Consider a fixed set of observations $y_{1:T}$. Define
$$\|f(x)\| = \max_y \left| \int_{-\infty}^y f(x)\,dx \right|,$$
so that, for two densities, $\|\hat{\pi}(x) - \pi(x)\|$ is the discrepancy between $\pi(x)$ and $\hat{\pi}(x)$. Let $S$ be the set of functions with finite variation,
$$S = \left\{ f(x) : \int |f'(x)|\,dx < \infty \right\}.$$
Consider a particle filter with $N$ particles. If, for some rate $r(N)$, and for all $t = 0, \ldots, T-1$,

(i) $\|\hat{\pi}_0(x_0) - p(x_0)\| = O_p(r(N))$;

(ii) $\|\hat{\pi}_{t+1}(x_{t+1}) - \bar{\pi}_{t+1}(x_{t+1})\| = O_p(r(N))$;

(iii) viewed as functions of $x_t$, $p(y_{t+1} \mid x_t) \in S$, and $\Pr(X_{t+1} < y \mid y_{t+1}, x_t) \in S$ for all $y$; and

(iv) $p(y_{t+1} \mid y_{1:t}) > 0$ for all $t$;

then, for $t = 1, \ldots, T$,
$$\|\hat{\pi}_t(x_t) - p(x_t \mid y_{1:t})\| = O_p(r(N)). \tag{6}$$

Proof: See Appendix A. □

Conditions (i) and (ii) quantify the rate of error that is introduced at the Initiation and Iteration stages of the particle filter respectively. Conditions (iii) and (iv) are mild regularity conditions. Condition (iii) controls how smoothly the distribution of the state and observation at time $t+1$ varies as the state at time $t$ is varied. The difficulty with extending this result to higher dimensions lies purely in specifying a suitable generalisation of this regularity condition (if such a condition exists). Condition (iv) will trivially hold for any sensible model.

The metric used to measure the difference between two densities is just the Kolmogorov-Smirnov statistic. For standard Monte Carlo approaches to particle filters, the rate will be $r(N) = N^{-1/2}$. The idea behind this theorem is that if QMC methods are used, a rate of $r(N) = N^{-1}$ can be achieved. We call filters which use QMC, and attain these rates of convergence, regularised particle filters.

5 A Regularised Particle Filter

To implement a regularised particle filter requires an algorithm for generating a set of dependent weighted particles that give a low-discrepancy approximation to $p(x_0)$, and then at each time point a set of weighted particles that give a low-discrepancy approximation to $\bar{\pi}_t(x_t)$. Of these, it is generating the approximation to $\bar{\pi}_t(x_t)$ that is particularly difficult, because $\bar{\pi}_t(x_t)$ is a mixture distribution. By comparison, the prior, $p(x_0)$, will generally be a standard single density, and a low-discrepancy approximation to it can be obtained using the probability integral transform applied to a low-discrepancy sequence from $[0,1]$. We thus concentrate on the iteration step of the filter.

Our approach to generating an approximation to $\bar{\pi}_t(x_t)$ involves using importance sampling with a simple proposal density, from which we can generate particles using standard random QMC methods. Calculating the weight for a single particle is an $O(N)$ calculation, so the resulting filter algorithm has $O(N^2)$ complexity. For a given model, the algorithm we propose can use existing ASIR theory to guide the choice of proposal distributions.

Regularised Particle Filter: Iteration at time $t+1$.

(1) Calculate proposal density. Given the set of weighted particles at time $t$, and the observation at time $t+1$, construct a proposal density $q^*(x_{t+1})$.

(2) Generate particles. Generate regularised particles from this proposal using standard random QMC methods.

(3) Weight particles. Assign particle $x_{t+1}^{(i)}$ a weight proportional to
$$\frac{\bar{\pi}_{t+1}(x_{t+1}^{(i)})}{q^*(x_{t+1}^{(i)})}.$$

The key to implementing this regularised particle filter is the choice of proposal density in (1). For models where the filtering densities are uni-modal, a suitable proposal density can be designed by adapting the ASIR filter as follows. Given an ASIR-type proposal density, $q(x_{t+1}) = \sum_{i=1}^N \beta_i\, q(x_{t+1} \mid x_t^{(i)})$, let $\mu$ and $\sigma^2$ be the mean and variance of $q(x_{t+1})$. If $T_d$ is a $t$ random variable with $d$ degrees of freedom, then the proposal density is that of the random variable $\mu + \sigma T_d$. For models with multi-modal filtering densities a uni-modal proposal density will be inefficient. For many such models the multi-modality arises through the form of the likelihood function, and a proposal density chosen to approximate this likelihood function should work well. See Section 7 for an example.

The regularised particle filter algorithm produces a set of weighted particles that is properly weighted (Liu et al., 2001) because, as random QMC is used, each particle is marginally a draw from the proposal. The following theorem gives a condition on the weights which ensures that $\|\hat{\pi}_{t+1}(x_{t+1}) - \bar{\pi}_{t+1}(x_{t+1})\| = O_p(1/N)$ for models with a 1-dimensional state.

Theorem 2 Consider a model with a 1-dimensional state. Given a target distribution $\bar{\pi}_{t+1}(x_{t+1})$ and a proposal distribution $q^*(x_{t+1})$, define the weight function $w(x_{t+1}) = \bar{\pi}_{t+1}(x_{t+1})/q^*(x_{t+1})$. Given a random QMC sequence $\{x_{t+1}^{(i)}\}_{i=1}^N$ from $q^*(x_{t+1})$ with discrepancy which is $O(1/N)$, we produce a set of weighted particles $\{x_{t+1}^{(i)}\}_{i=1}^N$, with weights proportional to $\{w(x_{t+1}^{(i)})\}_{i=1}^N$, which approximates $\bar{\pi}_{t+1}(x_{t+1})$. Call this approximation $\hat{\pi}_{t+1}(x_{t+1})$ as before. If $\int |w'(x)|\,dx < \infty$, then
$$\|\hat{\pi}_{t+1}(x_{t+1}) - \bar{\pi}_{t+1}(x_{t+1})\| = O_p(1/N).$$

Proof: See Appendix B. □

The condition on the weights will hold in almost all situations where the weights are bounded. This will be achieved provided the proposal has heavier tails than the target distribution. A sketch of one iteration of the resulting algorithm, in code, is given below.
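The following is a minimal Python sketch of one RPF iteration for a generic 1-dimensional model. Here `trans_pdf`, `trans_moments` and `lik_pdf` are hypothetical user-supplied functions, and the choice $\beta_i = w_i$ with $q(\cdot \mid x_i) = p(\cdot \mid x_i)$ is one simple option rather than the paper's prescription.

```python
import numpy as np
from scipy import stats

def rpf_iteration(x, w, y_next, trans_pdf, trans_moments, lik_pdf, rng, d=4):
    # One RPF iteration (steps (1)-(3) above) for a 1-dimensional state.
    N = len(x)
    # Step (1): mean and variance of the ASIR-type mixture proposal.
    m, v = trans_moments(x)                       # per-particle transition mean/var
    mu = np.sum(w * m)
    sigma = np.sqrt(np.sum(w * (v + m**2)) - mu**2)   # law of total variance
    # Step (2): stratified random QMC draws from mu + sigma * T_d.
    u = (np.arange(N) + rng.uniform(size=N)) / N      # u_i ~ U((i-1)/N, i/N)
    x_new = mu + sigma * stats.t.ppf(u, df=d)
    # Step (3): weights w(x) = pi-bar(x) / q*(x); the sum over the old
    # particles inside the loop is the O(N^2) step noted above.
    q_star = stats.t.pdf((x_new - mu) / sigma, df=d) / sigma
    pi_bar = lik_pdf(y_next, x_new) * np.array(
        [np.sum(w * trans_pdf(xn, x)) for xn in x_new])
    w_new = pi_bar / q_star
    return x_new, w_new / w_new.sum()
```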

6 Numerical Examples

We now look at the empirical performance of the regularised particle filter at analysing data from the ARCH and stochastic volatility models. In each case our comparisons are made with the ASIR filter on which the regularised particle filter is based (that is, the ASIR filter with the proposal distribution $q(x_{t+1})$ of stage 1 of the regularised particle filter algorithm). To make the comparisons fair, we fix the CPU time of the two filters we compare, rather than the number of particles, throughout. All programs implementing the particle filters were in a combination of C and R, and computing times are for a 900 MHz Pentium PC.

The implementation of the ASIR filter for the two models is discussed below. If $\mu$ and $\sigma^2$ are the mean and variance of the proposal density of the ASIR filter, then the proposal density of step (1) of the RPF is
$$q^*(x; \mu, \sigma^2) \propto \sigma^{-1}\left(1 + (x - \mu)^2/\sigma^2\right)^{-(d+1)/2}.$$
This is the density of a linear transformation of a $t_d$ random variable. We simulate the particles in step (2) of the RPF by

(i) simulating $u_i$ for $i = 1, \ldots, N$, where $u_i$ is a realisation of a $U((i-1)/N, i/N)$ random variable; and

(ii) setting $x_{t+1}^{(i)} = t_d(u_i)\sigma + \mu$, where $t_d(u)$ is the $u$ quantile of a $t_d$ random variable.

For the results we present below we set $d = 4$, but similar results were obtained for a range of $d$ values. We compare the RPF to the ASIR filter at estimating

(a) the log-likelihood, $l$, evaluated at the true parameter values;

(b) the posterior means, $E(X_t \mid y_{1:t})$, for $t = 1, \ldots, T$; and

(c) the posterior probabilities of the state being less than its true value, $\Pr(X_t < x_t \mid y_{1:t})$, for $t = 1, \ldots, T$.

These represent the type of inference most commonly performed by the particle filter. In the cases of (a) and (c) we are using these as defaults for estimating the log-likelihood at an arbitrary parameter value, and estimating an arbitrary posterior probability, respectively. To measure a given filter's performance at estimating any of (a)-(c) we calculate the variance of the filter's estimates across 100 independent runs. Basing the accuracy of the filter on this variance is sensible as in our studies all the filters had negligible bias. For (b) and (c) we present results of the average of these variances over $t = 1, \ldots, T$.

As a final comparison, we implemented versions of the RPF and ASIR filter which approximate the marginal smoothing distribution $p(x_t \mid y_{1:T})$. Both filters are easily adapted to the smoothing problem using ideas from Kitagawa (1996). In both cases the resulting filter has computational complexity which is quadratic in the number of particles. For simplicity we present results for the filters' accuracy at estimating $E(X_t \mid y_{1:T})$, and we measure the accuracy of the filters in an identical way to estimating the accuracy of the posterior means $E(X_t \mid y_{1:t})$.

6.1 ARCH model

We return to the ARCH model of Example 1. We simulated 100 data points using the parameter values $\beta_0 = 0.8$, $\beta_1 = 0.4$ and $\sigma = 0.9$. For this model it is possible to set the proposal distribution in the ASIR filter to $\bar{\pi}_t(x_t)$ (see Pitt and Shephard, 1999, for further details).

The results are given in Table 1. In all cases the variances from the RPF were smaller than the corresponding variances from the ASIR filter. The ratio of these variances gives a measure of the gain in efficiency of the RPF over the ASIR filter, and is an estimate of proportionately how much longer the ASIR filter would need to be run to get as accurate an estimate as the RPF. In our study the gain in efficiency was, depending on CPU time, between 400 and 25000 for estimating the log-likelihood; between 20 and 200 for estimating the posterior means; and between 2 and 4 for estimating the posterior probabilities. The largest gains in efficiency are for larger CPU times.

            ASIR                                               RPF
Time (s)    l             E(x_t|y_1:t)  Pr(X_t<x_t|y_1:t)      l             E(x_t|y_1:t)  Pr(X_t<x_t|y_1:t)
0.08        4.5 × 10^-2   6.4 × 10^-4   2.1 × 10^-4            1.1 × 10^-4   3.5 × 10^-5   9.4 × 10^-5
0.25        2.4 × 10^-2   2.0 × 10^-4   6.6 × 10^-5            2.0 × 10^-5   4.5 × 10^-6   2.1 × 10^-5
0.89        1.6 × 10^-2   5.9 × 10^-5   1.9 × 10^-5            2.7 × 10^-6   5.7 × 10^-7   5.3 × 10^-6
3.45        7.0 × 10^-3   1.5 × 10^-5   4.8 × 10^-6            2.8 × 10^-7   7.5 × 10^-8   1.4 × 10^-6

Table 1: Variances of the RPF's and ASIR's estimates of (1) the log-likelihood, l, evaluated at the true parameter values; (2) the posterior mean of the state; and (3) the posterior probability that the state is less than its true value, for the ARCH model. For the comparison we fixed the CPU time for analysing the data and present the variance of the estimates across 100 independent analyses. The number of particles was 800, 2500, 8900 and 34500 for the ASIR filter and 50, 100, 200 and 400 for the RPF.

The variances of the estimates of the log-likelihood and the posterior means decay at an empirical rate which is close to $N^{-3}$, which corresponds to errors decaying at a rate $N^{-3/2}$. This is the rate of convergence of random QMC at approximating integrals of smooth functions. By comparison, the empirical rate of convergence of the posterior probabilities is approximately $N^{-2}$. This may explain the substantial difference in the efficiency gains for estimating the log-likelihood and the posterior means rather than the posterior probabilities.

Results of the filters' accuracy at estimating the smoothed posterior means are shown in Table 2. The efficiency gain of the RPF over the ASIR filter is by factors of between 2 and 4 orders of magnitude. This gain in efficiency is greater than for the filtering problem, particularly for larger CPU times, as the ASIR filter now has the same computational complexity as the RPF.

6.2 Stochastic Volatility Model

We next consider the stochastic volatility model
$$x_t \mid x_{t-1} \sim N(\phi x_{t-1}, \sigma^2), \qquad y_t \mid x_t \sim N(0, \beta^2 \exp(x_t)),$$
with parameters $\beta$, $\phi$ and $\sigma$. This model provides a way of generalising Black-Scholes option pricing to allow for volatility clustering. See Hull and White (1987) for more details, and Shephard and Pitt (1997) for an MCMC approach to analysing this model.

            ARCH                              SV
Time (s)    ASIR          RPF                 Time (s)    ASIR          RPF
0.22        6.7 × 10^-3   3.9 × 10^-5         0.18        6.1 × 10^-3   8.8 × 10^-4
0.60        4.0 × 10^-3   5.1 × 10^-6         0.60        3.8 × 10^-3   1.5 × 10^-4
2.11        2.1 × 10^-3   6.7 × 10^-7         2.26        2.0 × 10^-3   8.9 × 10^-6
8.15        1.0 × 10^-3   9.5 × 10^-8         8.93        9.8 × 10^-4   1.5 × 10^-6

Table 2: Variances of the RPF's and ASIR's estimates of the smoothed posterior means, E(x_t|y_1:100), for both the ARCH and stochastic volatility (SV) models. The four lines correspond to the RPF with 50, 100, 200 and 400 particles respectively. The number of particles in the ASIR filter was set so that the CPU time to analyse a single data set was the same as for the corresponding RPF. For the ARCH model, 80, 140, 260 and 520 particles were used in the ASIR filter; while for the SV model, 90, 170, 320 and 630 particles were used. The entries in the table are the average (over time) of the variance of the estimates of E(x_t|y_1:100) from 100 independent runs of the corresponding filter.

To implement the ASIR filter for this model, we use a second-order Taylor expansion of the log-likelihood, $\log p(y_t \mid x_t)$, about $x_t = \phi x_{t-1}$, as suggested in Smith and Santos (2003); a sketch of this expansion is given below. This approximation produces a Gaussian mixture approximation to $\bar{\pi}_t(x_t)$, which we use as our proposal density in the ASIR filter.

We compare the RPF and ASIR filter at analysing 100 simulated data points from the stochastic volatility model. We set the parameters to $\beta = 0.6$, $\phi = 0.97$ and $\sigma = 0.18$, taken from Pitt and Shephard (1999). The results are shown in Table 3. The empirical rate of decay of the variance of the RPF's estimates of the posterior probability is roughly $N^{-2}$, while the variance of the estimates of the log-likelihood and posterior means appears to be $N^{-3}$. The RPF shows a gain in efficiency over the ASIR filter for all three features of the model. Depending on the CPU time for the filters, this gain was by a factor of between 1 and 10 for estimating the log-likelihood, between 1.5 and 20 for estimating the posterior means, and between 2 and 5 for estimating the posterior probabilities. Again the largest gains in efficiency are for the largest CPU times.
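The following Python sketch of the second-order Taylor approximation is our own reconstruction from the description above, not the exact implementation of Smith and Santos (2003); it returns the Gaussian mean and variance for one mixture component of the proposal.

```python
import numpy as np

def sv_proposal_params(x_prev, y, phi, sigma, beta):
    # Gaussian approximation to p(x_t | x_{t-1}, y_t) for the SV model, from a
    # second-order Taylor expansion of log p(y_t|x_t) about x0 = phi * x_{t-1}.
    x0 = phi * x_prev
    # log p(y|x) = const - x/2 - y^2 exp(-x) / (2 beta^2)
    grad = -0.5 + y**2 * np.exp(-x0) / (2 * beta**2)   # first derivative at x0
    neg_hess = y**2 * np.exp(-x0) / (2 * beta**2)      # minus the second derivative
    # Combine the quadratic likelihood approximation with the N(x0, sigma^2) prior.
    prec = neg_hess + 1.0 / sigma**2
    mean = (x0 / sigma**2 + neg_hess * x0 + grad) / prec
    return mean, 1.0 / prec
```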


            ASIR                                               RPF
Time (s)    l             E(x_t|y_1:t)  Pr(X_t<x_t|y_1:t)      l             E(x_t|y_1:t)  Pr(X_t<x_t|y_1:t)
0.12        2.8 × 10^-2   5.3 × 10^-4   3.4 × 10^-4            2.8 × 10^-2   3.6 × 10^-4   1.8 × 10^-4
0.39        8.9 × 10^-3   1.6 × 10^-4   1.0 × 10^-4            2.6 × 10^-3   4.2 × 10^-5   3.3 × 10^-5
1.46        1.9 × 10^-3   4.7 × 10^-5   3.0 × 10^-5            3.7 × 10^-4   4.4 × 10^-6   6.4 × 10^-6
5.67        4.8 × 10^-4   1.1 × 10^-5   6.7 × 10^-6            4.2 × 10^-5   5.4 × 10^-7   1.4 × 10^-6

Table 3: Variances of the RPF's and ASIR's estimates of (1) the log-likelihood, l, evaluated at the true parameter values; (2) the posterior mean of the state; and (3) the posterior probability that the state is less than its true value, for the stochastic volatility model. For the comparison we fixed the CPU time for analysing the data and present the variance of the estimates across 100 independent analyses. The number of particles was 750, 2500, 9100 and 36000 for the ASIR filter and 50, 100, 200 and 400 for the RPF.

As for the ARCH model, the gain in efficiency of the RPF over the ASIR filter is more marked for the smoothed estimates of the states. Table 2 shows the gain in efficiency of the RPF over the ASIR filter to be by factors of between 1 and 3 orders of magnitude.

7 Bi-modal Example

The previous examples have uni-modal filtering densities at each time-step and, as such, they appear particularly well-suited to our RPF algorithm. We now present results of a comparison of the RPF and ASIR filter on the non-linear model of Kitagawa (1987):
$$x_t \mid x_{t-1} \sim N\left(0.5x_{t-1} + 25x_{t-1}/(1 + x_{t-1}^2) + 8\cos(1.2(t-1)),\ \sigma_s^2\right), \qquad y_t \mid x_t \sim N(x_t^2/20,\ \sigma_o^2),$$
with a prior $X_1 \sim N(0, \sigma_p^2)$. This model has bi-modal filtering densities, arising from the likelihood function, which, if $y_t > 0$, has modes at $\pm(20y_t)^{1/2}$. The results we present are for $\sigma_p = 1$, $\sigma_s = 1$ and $\sigma_o = 10$, though the effect of varying these is discussed below.

The ASIR filter was implemented using a proposal of the form (5), with $q(x_{t+1} \mid x_t^{(i)}) = p(x_{t+1} \mid x_t^{(i)})$, and $\beta_i \propto w_t^{(i)} p(y_{t+1} \mid \hat{x}_{t+1}^{(i)})$, where $\hat{x}_{t+1}^{(i)}$ is the mean of $x_{t+1}$ given $x_t^{(i)}$. The results of the ASIR filter at analysing 100 simulated data points from this non-linear model are shown in Table 4.

Initially we implemented the RPF as in Section 6, but the filter performed poorly. While the filter estimated the state at certain times well, there were some time-steps at which the state was estimated poorly. At these time-steps the observations were large, and as a result the filtering density had two well-separated modes, which is poorly approximated by the proposal density of the RPF.

However it is possible to implement a more efficient RPF for this model by using a bi-modal proposal density. There are various ways of constructing such a proposal density, and we considered a simple approach of basing the proposal density on the likelihood function. Our proposal density at time $t$ was of the form
$$q^*(x_t; \mu, \sigma^2) \propto \sigma^{-1}\left(1 + (|x_t| - \mu)^2/\sigma^2\right)^{-(d+1)/2},$$
where
$$\mu = \begin{cases} (20y_t)^{1/2} & \text{if } y_t > 0, \\ 0 & \text{otherwise,} \end{cases} \qquad \sigma^2 = \begin{cases} 20\sigma_o^2/(y_t + 1) & \text{if } y_t > 0, \\ 20\sigma_o^2 & \text{otherwise.} \end{cases}$$
The $|x_t|$ term in this proposal makes the density symmetric about 0; for $\mu > 0$ the proposal is bi-modal, with modes at $\pm\mu$, otherwise the proposal has a mode at 0. The value of $\sigma^2$ is based on the variance, $20\sigma_o^2/y_t$, suggested by a Normal approximation derived from a second-order Taylor expansion of the likelihood function, but changed to avoid the variance tending to infinity as $y_t$ tends to 0. We simulate the particles in step (2) of the RPF by

(i) calculating $a = \Pr(\sigma T_d + \mu > 0)$, where $T_d$ is a $t_d$ random variable;

(ii) simulating, for $i = 1, \ldots, N$, $u_i$, a realisation of a $U((i-1)/N, i/N)$ random variable;

(iii) if $u_i < 0.5$, setting $x_{t+1}^{(i)} = \sigma t_d(2au_i) - \mu$; otherwise setting $x_{t+1}^{(i)} = \sigma t_d(2a(u_i - 1) + 1) + \mu$.

As before, the results we present are for $d = 4$ and are given in Table 4; a significant improvement over the ASIR filter can be seen. We have presented results for just one set of values of the prior, system and observation variances. Increasing the prior variance has little effect on the performance of either the RPF or the ASIR filter. Increasing the system variance or lowering the observation variance makes the RPF more accurate and the ASIR filter less accurate (because the filtering density becomes more similar to the likelihood rather than the predictive density). However, over a wide range of values for these variances we found that the RPF consistently outperformed the ASIR filter. A code sketch of steps (i)-(iii) is given below.
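This Python sketch of steps (i)-(iii) assumes scipy for the $t_d$ distribution; it is a minimal illustration of the construction above.

```python
import numpy as np
from scipy import stats

def bimodal_proposal_draws(mu, sigma, N, rng, d=4):
    # Stratified random QMC draws from the symmetric bi-modal proposal
    # q*(x) proportional to sigma^{-1} (1 + (|x| - mu)^2 / sigma^2)^{-(d+1)/2}.
    a = 1.0 - stats.t.cdf(-mu / sigma, df=d)      # (i)  a = Pr(sigma*T_d + mu > 0)
    u = (np.arange(N) + rng.uniform(size=N)) / N  # (ii) u_i ~ U((i-1)/N, i/N)
    x = np.empty(N)                               # (iii) mirrored t_d quantiles
    neg = u < 0.5
    x[neg] = sigma * stats.t.ppf(2 * a * u[neg], df=d) - mu
    x[~neg] = sigma * stats.t.ppf(2 * a * (u[~neg] - 1) + 1, df=d) + mu
    return x
```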

            ASIR                                       RPF
Time (s)    l     E(x_t|y_1:t)  Pr(X_t<x_t|y_1:t)      l             E(x_t|y_1:t)  Pr(X_t<x_t|y_1:t)
0.10        7.9   5.2           8.8 × 10^-3            3.5           3.8           2.0 × 10^-2
0.26        7.7   5.8           8.7 × 10^-3            9.7 × 10^-1   8.9 × 10^-1   6.8 × 10^-3
0.95        3.9   4.0           6.4 × 10^-3            1.7 × 10^-1   2.2 × 10^-1   2.2 × 10^-3
3.7         3.1   2.4           4.7 × 10^-3            3.2 × 10^-2   6.0 × 10^-3   7.0 × 10^-4

Table 4: Variances of the RPF's and ASIR's estimates of (1) the log-likelihood, l, evaluated at the true parameter values; (2) the posterior mean of the state; and (3) the posterior probability that the state is less than its true value, for the non-linear model. For the comparison we fixed the CPU time for analysing the data and present the variance of the estimates across 100 independent analyses. The number of particles was 900, 2600, 9500 and 37000 for the ASIR filter and 50, 100, 200 and 400 for the RPF.


8 Discussion

We have presented a new particle filter which uses random QMC to propagate particles. Theoretical results show that, at least for models with 1-dimensional states, the resulting particle filters can have a rate of convergence which is $O(N^{-1})$. Our work has been motivated by two models from econometrics, the ARCH model and the stochastic volatility model, and our simulation studies show that large gains in efficiency can be obtained for these models. The new filter seems particularly suitable for estimating smooth functions of the states, for example the log-likelihood and posterior means, and for performing smoothing. We demonstrated in Section 7 the ability of the RPF to work well for models with bi-modal filtering densities, provided a suitable proposal density is chosen. For this model, our approach to choosing the proposal density was based on approximating the likelihood function, and we believe this approach will be suitable for many problems with multi-modal posteriors, as it is often the form of the likelihood which induces the multi-modality.

While we have concentrated on models in 1 dimension, the RPF algorithm we presented can be used for any model. In higher dimensions the advantage of using random QMC will be less, particularly if few particles are used. While it appears difficult to generalise Theorems 1 and 2 to higher dimensions, we conjecture that the RPF algorithm we present here will have a rate of convergence that is $O(N^{-1}(\log N)^{d-1})$ for a large class of $d$-dimensional problems. When the computational cost of the RPF algorithm is taken into account, the filter will be asymptotically inefficient for $d > 1$. However it may still provide practical gains in efficiency for finite CPU time, especially when estimating smooth functions of the states. Furthermore, if smoothing is required, such a filter will be asymptotically efficient.

The idea of using random quasi-Monte-Carlo within a particle filter is related to that of Bolviken and Storvik (2001). They use ideas from Gaussian quadrature to deterministically propagate and weight particles. Gaussian quadrature is more accurate than random quasi-Monte-Carlo at approximating integrals of smooth functions over finite 1-dimensional intervals, and the filter of Bolviken and Storvik (2001) is shown to be very accurate at approximating the log-likelihood curve. The potential disadvantages of their approach include the need to truncate the state-space, and the deterministic nature of their filter (so it does not satisfy the properly weighted particle property, which is related to the unbiasedness of the filter). Also, it is unclear how accurate estimates of posterior expectations of non-smooth functions of the states, such as estimates of posterior probabilities, will be. Random quasi-Monte-Carlo is more easily extended to higher dimensions than Gaussian quadrature (Owen, 1998). QMC ideas have also been used within a particle filter by Maskell et al. (2003), but in this case QMC is used only to approximate the weights of the particles, rather than as a basis for generating particles as in the RPF.

While we have presented a specific RPF algorithm, other more efficient algorithms may be possible, and we feel that this is an important area for future research. Of particular interest will be whether an RPF can be designed whose computational cost is less than quadratic in the number of particles, and in particular whether an algorithm with linear computational complexity is possible. Such a filter would be asymptotically efficient compared to a standard particle filter, regardless of the state dimension.

There may be other ways of improving the RPF we described here. For example, an alternative way of producing the particles in step (2) of the algorithm is to use a sequence of random uniform variates where $u_1$ is uniformly distributed on $[0, 1/N]$, and, for $i = 2, \ldots, N$, $u_i = u_{i-1} + 1/N$ (see the sketch below). This produces increased dependence between the particles, the discrepancy of the sequence $u_1, \ldots, u_N$ being $1/N$ rather than approximately $2/N$ for the previous approach. However it is not clear whether the deterministic relationship between the $N$ particles is desirable. If we use these steps within the RPF, then we obtain a small improvement in the accuracy of the filter for the ARCH model. Results for the stochastic volatility model remarkably show a much faster rate of convergence for the estimates of the log-likelihood and the posterior means: doubling the number of particles reduces the variances by a factor of 50 (results not shown). No such improved rate of convergence is seen in the estimates of the posterior probabilities. What features of this model produce such fast rates of convergence, and for the posterior expectations of what class of functions of the state we should expect such convergence rates, is unclear. These faster rates of convergence are reminiscent of the rates of convergence that are possible using numerical integration techniques in 1 dimension.
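For reference, a minimal Python sketch of this alternative construction (an illustrative assumption on our part, as the paper gives no code):

```python
import numpy as np

def shifted_lattice_uniforms(N, rng):
    # u_1 ~ U(0, 1/N) and u_i = u_{i-1} + 1/N for i = 2, ..., N;
    # the discrepancy of u_1, ..., u_N is exactly 1/N.
    return rng.uniform(0.0, 1.0 / N) + np.arange(N) / N
```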

Appendix A: Proof of Theorem 1

Our proof is based loosely on that of Crisan (2001). The proof is by induction, with the inductive hypothesis being (6). This hypothesis is true for $t = 0$ by condition (i). We assume it is true at time $t$ and prove that it is thus true at time $t+1$. For notational simplicity we write $p(x_t)$ for $p(x_t \mid y_{1:t})$ and $\hat{\pi}(x_t)$ for $\hat{\pi}_t(x_t)$, and similarly when $t$ is replaced by $t+1$, throughout this proof.

The triangle inequality gives
$$\|\hat{\pi}(x_{t+1}) - p(x_{t+1})\| \le \|\hat{\pi}(x_{t+1}) - \bar{\pi}(x_{t+1})\| + \|\bar{\pi}(x_{t+1}) - p(x_{t+1})\|. \tag{7}$$
The first term on the right-hand side is $O_p(r(N))$ by condition (ii). Thus we need only show that the second term is also $O_p(r(N))$. Now the second term is
$$\max_y \left| \frac{\int \hat{\pi}(x_t) \Pr(X_{t+1} < y \mid y_{t+1}, x_t)\, p(y_{t+1} \mid x_t)\,dx_t}{\int \hat{\pi}(x_t)\, p(y_{t+1} \mid x_t)\,dx_t} - \frac{\int p(x_t) \Pr(X_{t+1} < y \mid y_{t+1}, x_t)\, p(y_{t+1} \mid x_t)\,dx_t}{\int p(x_t)\, p(y_{t+1} \mid x_t)\,dx_t} \right|. \tag{8}$$
Using (3), as $p(y_{t+1} \mid x_t) \in S$ by condition (iii), and as, by the inductive hypothesis, the discrepancy between $\hat{\pi}(x_t)$ and $p(x_t)$ is $O_p(r(N))$, then
$$\int (\hat{\pi}(x_t) - p(x_t))\, p(y_{t+1} \mid x_t)\,dx_t = O_p(r(N)).$$

By condition (iv), $p(y_{t+1} \mid y_{1:t}) = \int p(x_t)\, p(y_{t+1} \mid x_t)\,dx_t$ is strictly positive, so we can rewrite (8) as
$$\frac{1}{p(y_{t+1} \mid y_{1:t})} \max_y \left| \int (\hat{\pi}(x_t) - p(x_t)) \Pr(X_{t+1} < y \mid y_{t+1}, x_t)\, p(y_{t+1} \mid x_t)\,dx_t \right| + O_p(r(N)). \tag{9}$$

Finally, if $f(x) \in S$ and $g(x) \in S$, then $f(x)g(x) \in S$. This follows because
$$\int |(f(x)g(x))'|\,dx \le \int |f(x)g'(x)|\,dx + \int |f'(x)g(x)|\,dx, \tag{10}$$
and
$$\int |f(x)g'(x)|\,dx \le \max\{|f(x)|\} \int |g'(x)|\,dx.$$
Now because $f(x) \in S$, $f(x)$ must be finite for all $x$, and, as $g(x) \in S$, the integral on the right-hand side is finite by definition. By a similar argument for $\int |f'(x)g(x)|\,dx$, we have that (10) is finite as required.

Thus, as $\Pr(X_{t+1} < y \mid y_{t+1}, x_t) \in S$ and $p(y_{t+1} \mid x_t) \in S$, then $\Pr(X_{t+1} < y \mid y_{t+1}, x_t)\, p(y_{t+1} \mid x_t) \in S$, and thus by QMC integration theory and the inductive hypothesis
$$\max_y \left| \int (\hat{\pi}(x_t) - p(x_t)) \Pr(X_{t+1} < y \mid y_{t+1}, x_t)\, p(y_{t+1} \mid x_t)\,dx_t \right| = O_p(r(N)).$$
This, together with (9), proves that the second term of (7) is $O_p(r(N))$ as required. □

Appendix B: Proof of Theorem 2

For notational simplicity we write $q(x)$ for $q^*(x_{t+1})$, $\bar{\pi}(x)$ for $\bar{\pi}_{t+1}(x_{t+1})$ and $\hat{\pi}(x)$ for $\hat{\pi}_{t+1}(x_{t+1})$.

The particles $\{x_{t+1}^{(i)}\}_{i=1}^N$, with weights $1/N$, produce an approximation to $q(x)$, which we call $\hat{q}(x)$. As the particles were generated using random QMC, we have $\|q(x) - \hat{q}(x)\| = O_p(1/N)$. Now
$$\|\hat{\pi}(x) - \bar{\pi}(x)\| \le \|\hat{\pi}(x) - \hat{q}(x)w(x)\| + \|\hat{q}(x)w(x) - \bar{\pi}(x)\|.$$
The first term on the right-hand side is
$$\left\| \frac{\hat{q}(x)w(x)}{\int \hat{q}(x)w(x)\,dx} - \hat{q}(x)w(x) \right\|,$$
and this is $O_p(1/N)$ because
$$\int \hat{q}(x)w(x)\,dx = \int q(x)w(x)\,dx + O_p(1/N),$$
and $\int q(x)w(x)\,dx = 1$. While
$$\|\hat{q}(x)w(x) - \bar{\pi}(x)\| = \max_y \left| \int_{-\infty}^y (\hat{q}(x) - q(x))w(x)\,dx \right| \le \left( \max_y \int_{-\infty}^y |w'(x)|\,dx \right) \|q(x) - \hat{q}(x)\|,$$
and this is $O_p(1/N)$ because of the assumption about $w(x)$. □

References

Bollerslev, T., Engle, R. F. and Nelson, D. B. (1994). ARCH models. In: The Handbook of Econometrics, Volume 4 (eds. R. F. Engle and D. B. Nelson), Amsterdam: North-Holland, 2959-3038.

Bolviken, E. and Storvik, G. (2001). Deterministic and stochastic particle filters in state-space models. In: Sequential Monte Carlo Methods in Practice (eds. A. Doucet, N. de Freitas and N. Gordon), Springer-Verlag: New York, 97-116.

Carpenter, J., Clifford, P. and Fearnhead, P. (1999). An improved particle filter for non-linear problems. IEE Proceedings - Radar, Sonar and Navigation 146, 2-7.

Chopin, N. (2002). A sequential particle filter method for static models. Biometrika 89, 539-551.

Crisan, D. (2001). Particle filters - a theoretical perspective. In: Sequential Monte Carlo Methods in Practice (eds. A. Doucet, N. de Freitas and N. Gordon), Springer-Verlag: New York, 17-41.

Crisan, D. and Doucet, A. (2002). A survey of convergence results on particle filtering methods for practitioners. IEEE Transactions on Signal Processing 50, 736-746.

Doucet, A., de Freitas, J. F. G. and Gordon, N. J., eds. (2001). Sequential Monte Carlo Methods in Practice. Springer-Verlag, New York.

Fearnhead, P. (2004). Particle filters for mixture models with an unknown number of components. Statistics and Computing 14, 11-21.

Fearnhead, P. and Clifford, P. (2003). Online inference for hidden Markov models. Journal of the Royal Statistical Society, Series B 65, 887-899.

Gordon, N., Salmond, D. and Smith, A. F. M. (1993). Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings-F 140, 107-113.

Harvey, A. C., Ruiz, E. and Sentana, E. (1992). Unobserved component time series models with ARCH disturbances. Journal of Econometrics 52, 129-158.

Hull, J. and White, A. (1987). The pricing of options on assets with stochastic volatilities. Journal of Finance 42, 281-300.

Kitagawa, G. (1987). Non-Gaussian state-space modelling of non-stationary time series (with discussion). Journal of the American Statistical Association 82, 1032-1063.

Kitagawa, G. (1996). Monte Carlo filter and smoother for non-Gaussian nonlinear state space models. Journal of Computational and Graphical Statistics 5, 1-25.

Kong, A., Liu, J. S. and Wong, W. H. (1994). Sequential imputations and Bayesian missing data problems. Journal of the American Statistical Association 89, 278-288.

Liu, J. S. and Chen, R. (1995). Blind deconvolution via sequential imputations. Journal of the American Statistical Association 90, 567-576.

Liu, J. S. and Chen, R. (1998). Sequential Monte Carlo methods for dynamic systems. Journal of the American Statistical Association 93, 1032-1044.

Liu, J. S., Chen, R. and Logvinenko, T. (2001). A theoretical framework for sequential importance sampling with resampling. In: Sequential Monte Carlo Methods in Practice (eds. A. Doucet, N. de Freitas and N. Gordon), Springer-Verlag: New York, 225-246.

Maskell, S., Rollason, M., Gordon, N. and Salmond, D. (2003). Efficient particle filtering for multiple target tracking with application to tracking in structured images. Image and Vision Computing 21, 931-939.

Niederreiter, H. (1978). Quasi-Monte Carlo methods and pseudo-random numbers. Bulletin of the American Mathematical Society 84, 957-1041.

Niederreiter, H. (1992). Random Number Generation and Quasi-Monte Carlo Methods. SIAM, Philadelphia.

Owen, A. B. (1995). Randomly permuted (t,m,s)-nets and (t,s)-sequences. In: Monte Carlo and Quasi-Monte Carlo Methods in Scientific Computing (eds. H. Niederreiter and P. J. S. Shiue), Springer-Verlag, New York.

Owen, A. B. (1997). Scrambled net variance for integrals of smooth functions. Annals of Statistics 25, 1541-1562.

Owen, A. B. (1998). Monte Carlo extensions of Quasi-Monte Carlo. In: Winter Simulation Conference Proceedings, 571-577. Available from http://www-stat.stanford.edu/~owen/reports/.

Pitt, M. K. and Shephard, N. (1999). Filtering via simulation: auxiliary particle filters. Journal of the American Statistical Association 94, 590-599.

Ridgeway, G. and Madigan, D. (2003). A sequential Monte Carlo method for Bayesian analysis of massive datasets. Data Mining and Knowledge Discovery 7, 301-319.

Ripley, B. D. (1987). Stochastic Simulation. New York: Wiley and Sons.

Shephard, N. and Pitt, M. K. (1997). Likelihood analysis of non-Gaussian measurement time series. Biometrika 84, 653-667.

Smith, J. Q. and Santos, A. F. (2003). Second order filter distribution approximations for financial time series with extreme outlier. Submitted; available from http://www4.fe.uc.pt/gemf/estudos/resumos/2003/resumo2003 03.htm.

West, M. (1992). Modelling with mixtures. In: Bayesian Statistics 4 (eds. J. M. Bernardo, J. O. Berger, A. P. Dawid and A. F. M. Smith), Clarendon Press, London.

