A Comparison of Particle Filters for Personal Positioning

VI Hotine-Marussi Symposium of Theoretical and Computational Geodesy, May 29-June 2, 2006.

D. Petrovich and R. Piché
Institute of Mathematics, Tampere University of Technology, P.O. Box 553, 33101 Tampere, Finland

Abstract. Particle filters, also known as sequential Monte Carlo methods, are a convenient and popular way to numerically approximate optimal Bayesian filters for nonlinear non-Gaussian problems. In the literature, the performance of different filters is often determined empirically by comparing the filter's conditional mean with the true track in a set of simulations. This is not ideal. Because these filters produce approximations of the optimal Bayesian posterior distribution, the comparison should be based on the quality of this approximation rather than on an estimate formed from the distribution. In this work, we apply a multivariate binning technique to compare the performance of different particle filters. In our simulation, we find that the conclusions of the distribution comparison are similar to the conclusions of a root mean square error analysis of the conditional mean estimate.

Keywords. Sequential Monte Carlo, Particle Filter, Bayesian estimation

1 Introduction

Particle filters (PFs) implement the recursive Bayesian filter with Monte Carlo (MC) simulation and approximate the posterior distribution by a set of samples with appropriate weights. This is most attractive in nonlinear and non-Gaussian situations where the integrals of Bayes' recursions are not tractable. In the literature, many PF simulations focus on the MC variation of the mean estimates, i.e. the randomness introduced by the MC algorithm that can be observed from the empirical mean estimates. Better PFs vary less. In this work, we develop a method to compare PF performance that uses the distribution approximation itself rather than a single estimate formed from it.

A distribution analysis can be more informative than an estimate analysis. For example, two filters could give similar mean estimates although one of them has a distribution "closer" to the true distribution. Also, in a bimodal case, the mean could lie between the two modes, in a region of the state-space where there is little probability of the target being located; in such a case, the mean is less interesting to analyze and we are more concerned with whether our filter appropriately characterizes this bimodality. We will discuss one proposed method for comparing distributions and then apply it to the comparison of four PFs (SIR1, SIR2, SIR3, SIR4) described in the Appendix.

2 Comparing Distributions with χ²-tests

An interesting application of distribution comparisons was given by Roederer et al. (2001) in the field of cytometry. Test samples, i.e. sets of multidimensional data, are to be ranked according to their similarity to a control sample, which is a sample of data chosen to represent some known behavior. A multivariate data-dependent binning technique was proposed that adaptively constructed bins according to the control sample, followed by the use of a test statistic to quantify the difference between the test and control sample. Baggerly (2001) provides a more theoretical discussion of this approach with the recommendation to use the standard two-sample χ² test statistic. The form of the computed test statistic is (Baggerly, 2001)

\psi = \sum_{i=1}^{2} \sum_{j=1}^{B} \frac{(o_{ij} - e_{ij})^2}{e_{ij}},

where B is the number of bins, o_{ij} is the observed count of the jth bin of the ith sample set, and e_{ij} is the expected count of the jth bin of the ith sample set given by

e_{ij} = n_i \, \frac{o_{1j} + o_{2j}}{n_1 + n_2},

where n_i is the number of samples in set i.

The algorithm of Roederer et al. (2001) that is used for constructing the bins is called probability binning and is as follows. The variance of the control sample along each of the d dimensions is computed and the dimension with the largest variance is chosen to be divided. The sample median value of the chosen dimension is then chosen as the point at which to partition the state-space in two. This is then repeated for each partitioned subspace, continuing until the desired number of bins has been reached. The result is a set of d-dimensional hyper-rectangular bins with sides parallel to the coordinate axes, each bin containing roughly the same number of control samples; see Figure 1. Assuming then that a test sample is from the same distribution as the control sample, each bin will have roughly the same expected frequency. The two-sample test statistic is then approximately χ²_{B−1} distributed, i.e. approximately distributed according to a χ² distribution with B−1 degrees of freedom.
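To make the construction concrete, the following sketch implements probability binning and the two-sample statistic in NumPy. It is a minimal illustration written for this text, not the authors' code; the function names and the stopping rule for very small bins are our own choices.

```python
import numpy as np

def probability_binning(control, depth):
    """Recursive median split of the control sample.

    At each level the dimension with the largest variance is split at its
    sample median; recursing to `depth` levels gives B = 2**depth bins,
    each holding roughly the same number of control points.
    Returns a list of (lower, upper) bound arrays defining hyper-rectangles.
    """
    d = control.shape[1]

    def split(points, lo, hi, level):
        if level == 0 or len(points) < 2:
            return [(lo, hi)]
        dim = int(np.argmax(np.var(points, axis=0)))   # largest-variance dimension
        cut = float(np.median(points[:, dim]))         # split at the sample median
        hi_left, lo_right = hi.copy(), lo.copy()
        hi_left[dim], lo_right[dim] = cut, cut
        left = points[points[:, dim] <= cut]
        right = points[points[:, dim] > cut]
        return (split(left, lo, hi_left, level - 1)
                + split(right, lo_right, hi, level - 1))

    return split(control, np.full(d, -np.inf), np.full(d, np.inf), depth)

def bin_counts(sample, bins):
    """Count how many points of `sample` fall into each hyper-rectangle."""
    return np.array([np.sum(np.all((sample > lo) & (sample <= hi), axis=1))
                     for lo, hi in bins])

def two_sample_chi2(o1, o2):
    """Two-sample chi-square statistic psi computed from binned counts."""
    n1, n2 = o1.sum(), o2.sum()
    e1 = n1 * (o1 + o2) / (n1 + n2)   # expected counts, sample 1
    e2 = n2 * (o1 + o2) / (n1 + n2)   # expected counts, sample 2
    return np.sum((o1 - e1) ** 2 / e1) + np.sum((o2 - e2) ** 2 / e2)
```

With bins built from the control sample (depth 6 gives B = 64, as used later in the paper), the statistic of a test sample drawn from the same distribution should be approximately χ²_{B−1} distributed.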

The binning of the PF samples is done as follows. At each time step, we simulate M = N = 10⁴ samples using multinomial sampling from the importance distribution, evaluate weights, and skip the final resampling step. The binning at each time step uses the weighted samples. We draw 10⁴ samples using systematic resampling from this weighted discrete distribution, bin the equally-weighted samples, compute the test score, and discard these resampled points. If the weights are equal, then the binning can proceed without resampling.

To verify that the test score analysis is meaningful, we can compare our conclusions to those made from a root mean square error (RMSE) analysis of the mean estimates. The RMSE is given as

\mathrm{RMSE}_k = \sqrt{\mathrm{E}\,\| \hat{\mu}_k - \mu_k \|^2},    (1)

where \hat{\mu}_k is the random posterior mean estimate and μ_k is the true posterior mean. The expectation in equation (1) is approximated by repeating the simulation 500 times and using the sample mean. Due to lack of space, we have omitted our linear-Gaussian simulations and proceed directly to the general filtering scenario. The interested reader is referred to Petrovich (2006).
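The systematic resampling step mentioned above can be sketched as follows; this is the standard textbook algorithm with our own naming, shown only to fix ideas.

```python
import numpy as np

def systematic_resample(weights, n, rng=None):
    """Draw n indices from the discrete distribution given by `weights`,
    using one uniform offset and n evenly spaced points in [0, 1)."""
    rng = rng or np.random.default_rng()
    positions = (rng.uniform() + np.arange(n)) / n      # stratified positions
    cumulative = np.cumsum(weights / np.sum(weights))   # normalized CDF
    idx = np.searchsorted(cumulative, positions)        # first CDF value >= position
    return np.minimum(idx, len(weights) - 1)            # guard against round-off at 1.0
```

The equally weighted sample used for binning is then simply `particles[systematic_resample(w, 10**4)]`.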

Fig. 1 Probability Binning in ℜ² with B = 32

3 Application to Linear and Gaussian Filtering Scenarios

Assume for a moment that we are able to generate IID samples from the true marginal posterior. We can then apply the probability binning procedure at each time step so that the state-space ℜ^{n_x} is partitioned into bins using a sample from the true marginal posterior. It might be reasonable then to assume that the quality of a PF approximation of the marginal posterior can be assessed using the two-sample χ²-test, where the hypothesis is that the samples from the true marginal posterior and the samples from the PF approximation are from the same distribution. Roughly speaking, we might expect better PFs to give better test scores, where "better test scores" refers to realizations of a random variable with distribution closer to the assumed χ² distribution. We check this empirically by repeating the simulation 1000 times and comparing the mean of the realized test scores to that from the assumed χ² distribution. For a χ²_{B−1} distribution, the mean is B−1. All of our simulations use B = 64.
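A compact sketch of this empirical check, reusing the binning helpers sketched in Section 2; `draw_true_posterior` and `run_particle_filter` are hypothetical placeholders for the simulation machinery and are not functions from the paper.

```python
import numpy as np

B, reps = 64, 1000
scores = []
for _ in range(reps):
    control = draw_true_posterior(10_000)          # placeholder: IID samples from the true marginal posterior
    test = run_particle_filter(10_000)             # placeholder: equally weighted PF samples
    bins = probability_binning(control, depth=6)   # 2**6 = 64 bins built from the control sample
    scores.append(two_sample_chi2(bin_counts(control, bins),
                                  bin_counts(test, bins)))
print(np.mean(scores), "vs. chi-square mean", B - 1)
```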

4 Application to General Filtering Scenarios

We would like to apply this distribution analysis to PFs in the general filtering scenario where we do not have an analytic form for the posterior. The difficulty is then to determine a control sample that can be used to partition the state-space. Ideally, we would have some algorithm that was known to produce IID samples from the marginal posterior that we could use to produce the control sample. In the absence of such an ideal algorithm, we have chosen one PF algorithm as a reference.

The partitioning of the state-space using a reference PF is done as follows. We have used the SIR3 importance distribution and, at each time step, draw M = 10⁷ samples from the importance distribution, evaluate weights, and then resample N = 10⁴ samples from this weighted distribution. The importance sampling uses deterministic sampling (10³ samples from each mixture component) and the resampling uses systematic sampling. The binning at each time step uses the M weighted samples. We draw 10⁶ samples from this weighted discrete distribution, perform the probability binning on these equally-weighted samples, and discard these resampled points.

It should be mentioned that the simulation scenario of this section is the same as that in Heine (2005), i.e. the same signal and measurement models and the same realized measurement and signal process. We have reproduced the results (i.e. the relative RMSE plots) and review the conclusions of that publication here for convenience. The contribution of this work is the application of the probability binning method to the comparison of PFs.

4.1 Simulation Description

For this simulation, the state is in ℜ⁴ with two position coordinates r_k and two velocity coordinates u_k, i.e.

x_k = [\, r_k^e \; r_k^n \; u_k^e \; u_k^n \,]^T.

The evolution of the state in continuous time can be described by a white noise acceleration model, see e.g. Bar-Shalom and Li (1998). The discretized state equation for each [r_k u_k]^T coordinate is written as

x_k = F_{k-1} x_{k-1} + v_{k-1}, \quad F_{k-1} = \begin{bmatrix} 1 & \Delta t_k \\ 0 & 1 \end{bmatrix},

where Δt_k = t_k − t_{k−1} and v_{k−1} ∼ N(0, Σ_{v_{k−1}}) with

\Sigma_{v_{k-1}} = \begin{bmatrix} \tfrac{1}{3}\Delta t_k^3 & \tfrac{1}{2}\Delta t_k^2 \\ \tfrac{1}{2}\Delta t_k^2 & \Delta t_k \end{bmatrix} \gamma.
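For concreteness, a minimal sketch of this discretized model (our own illustration, not code from the paper). The full 4×4 model is block diagonal, with the 2×2 block applied independently to the east and north (position, velocity) pairs.

```python
import numpy as np

def transition_model(dt, gamma):
    """2x2 transition matrix F and process-noise covariance Q for one
    [position, velocity] pair of the white noise acceleration model."""
    F = np.array([[1.0, dt],
                  [0.0, 1.0]])
    Q = gamma * np.array([[dt**3 / 3.0, dt**2 / 2.0],
                          [dt**2 / 2.0, dt]])
    return F, Q

def propagate(x, dt, gamma, rng=None):
    """Propagate one 4-D state [r_e, r_n, u_e, u_n] a single time step."""
    rng = rng or np.random.default_rng()
    F, Q = transition_model(dt, gamma)
    x_new = np.empty(4)
    for pos, vel in [(0, 2), (1, 3)]:                  # (r_e, u_e) and (r_n, u_n) pairs
        noise = rng.multivariate_normal(np.zeros(2), Q)
        x_new[[pos, vel]] = F @ x[[pos, vel]] + noise
    return x_new
```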

We simulate a 200-second trajectory with a constant time step of Δt_k = 1 s and γ = 3 m²/s³. The state's initial distribution P_{x_0} is Gaussian and known. Three base stations are used and, at each time step, one base station produces a range measurement of the form

y_k = \| r^b - r_k \| + z_k, \quad z_k \sim \mathrm{N}(0, \sigma^2),

where r^b is the known position of a base station. In the simulation of the observation process, the probability of a base station producing a measurement is inversely proportional to the squared distance to the target. For each filter, four separate simulations were run, each using a different variance σ² for the noise in the measurement model: 10⁴, 25, 1, and 0.1 m². The true posterior mean μ_k in the RMSE analysis, see equation (1), is approximated using the same reference PF that was used for creating the bins.

4.2 Results

We first consider an analysis of the RMSE of the mean estimates. The RMSE values are divided by the SIR1 RMSE values and plotted in Figure 2. These relative RMSE values show how the MC variation of the mean estimates for the different filters compares to SIR1: values lower than one indicate improved performance compared to SIR1 (less MC variation), while values above one indicate worse performance (more MC variation). With the largest noise variance (σ² = 10⁴ m²), all the filters seem to have similar MC variance. With σ² = 25 m², the SIR2 and SIR4 filters clearly have less MC variation, while for the cases with smaller measurement noise variance (σ² = 1 and 0.1 m²), the SIR3 and SIR4 filters have less MC variance. Also, SIR2 behaves erratically at the smaller measurement noise variances. Due to weights summing to zero in the PF algorithm in the σ² = 0.1 m² case, SIR1 is averaged over only 998 realizations and SIR2 over only 365 realizations for that case.

Fig. 2 Relative RMSE w.r.t. SIR1 for the nonlinear-Gaussian scenario. From top to bottom: σ² = 10⁴, 25, 1, 0.1 m².
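As an illustration of how the range model of Section 4.1 enters the particle weighting, a minimal likelihood evaluation might look as follows; this is our own sketch and the function name is hypothetical.

```python
import numpy as np

def range_likelihood(y, particles, r_base, sigma):
    """Likelihood p(y_k | x_k) of one range measurement for each particle.

    particles: (N, 4) array of states [r_e, r_n, u_e, u_n]
    r_base:    (2,) known base-station position
    sigma:     measurement noise standard deviation
    """
    predicted = np.linalg.norm(particles[:, :2] - r_base, axis=1)   # predicted ranges
    resid = y - predicted
    return np.exp(-0.5 * (resid / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
```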

In Figure 3, the test score mean is plotted over the whole simulation. It is interesting to note that the test scores rarely resemble the theoretical χ² distribution. However, there appears to be different behavior for the different filters. Note that all the results are rather similar with large σ², and the differences become more apparent as σ² decreases. Also note that for σ² = 10⁴ and 25 m², the results are nearly identical for the SIR1 and SIR3 filters and for the SIR2 and SIR4 filters. For σ² = 0.1 m², the SIR2 score means are quite large and are outside of the plotted region. The intention of these plots was not to display the actual score means, but instead to show the relative performance of the different filters. For the cases with larger measurement noise variances (σ² = 10⁴ and 25 m²), the two filters using alternative weights in the importance distribution (SIR2 and SIR4) have smaller means. For the cases with smaller measurement noise variances (σ² = 1 and 0.1 m²), the two filters using alternative components in the importance distribution (SIR3 and SIR4) show smaller means, while the SIR2 results behave erratically. It might be reasonable then to conclude that the filters having test scores closer to the theoretical distribution χ²_{B−1}, i.e. having smaller test score means, are working better.


Fig. 3 Mean of the two-sample χ² test statistic for the nonlinear-Gaussian scenario. From top to bottom: σ² = 10⁴, 25, 1, 0.1 m².

In summary, similar conclusions about the relative performance of the different PFs can be found using the alternative criteria, i.e. the distribution analysis and the RMSE analysis of the mean estimates. The empirical comparison of different PFs using χ² techniques seems to be feasible, even in scenarios where the state-space partitioning relies on a PF.

5 Conclusions

In this work, we applied a multivariate binning technique from Roederer et al. (2001) to the comparison of PFs. This was described for a linear and Gaussian filtering scenario, where we have an analytical form of the marginal posterior, and also for a nonlinear and Gaussian filtering scenario, where we estimated the optimal solution with a PF. The conclusions resulting from the proposed test were similar to the conclusions from an RMSE analysis of the mean estimates.

We have not offered a detailed discussion of the practical implementation aspects of such a test. It should be mentioned that our implementation of the test, i.e. the construction of the bins and the actual binning of the samples, used data structures similar to kd-trees, see e.g. de Berg et al. (2000), and was computationally feasible for the cases that we considered.

The literature on the χ²-test is vast and, admittedly, our treatment of the test has been brief. Here we point out some questionable aspects of our use of this test for comparing PFs. First, we should question the use of the χ²-test itself. We are testing whether the two samples are from the same distribution, although it was already noted in Pitt and Shephard (1999) that the methods will not produce IID samples from the true posterior due to the finite mixture approximation. In spite of this, we have still considered the test scores as a way to empirically quantify the difference between distributions. Second, we should question how the χ²-test was actually performed. The number of bins and the construction of the bins in χ²-tests are not always straightforward and are therefore debatable. The resampling that is carried out before binning the PF samples is also questionable; due to the "extra" resampling, our test is then comparing an approximation of the PF approximation, which complicates the analysis. Finally, our use of a large-sample PF to approximate the optimal posterior, as well as our choice of importance distribution for this PF, is not properly justified; other methods such as rejection sampling or MCMC might result in a better approximation of the posterior.

The intention of the distribution comparison was to devise better ways of comparing PFs and possibly other Bayesian filters. The binning procedure that was described is, of course, limited to sample-based methods. Other possible methods could include integrating the posterior distribution over a finite number of regions and then using some distance, e.g. Kullback-Leibler divergence or total variation distance, on the resulting finite state-space to quantify the distance between the distributions. However, the choice of distance is then quite arbitrary and there is little reason to prefer one distance over another. This seems to be an interesting area for future work.
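As a small illustration of that idea (our own sketch, not part of the paper), the two distances on a common finite partition could be computed as follows, given the posterior probabilities integrated over each region.

```python
import numpy as np

def total_variation(p, q):
    """Total variation distance between two distributions on a finite partition."""
    return 0.5 * np.sum(np.abs(p - q))

def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence KL(p || q); eps guards against empty cells."""
    p, q = np.asarray(p) + eps, np.asarray(q) + eps
    p, q = p / p.sum(), q / q.sum()
    return np.sum(p * np.log(p / q))
```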

References

Baggerly, K.A. (2001). Probability binning and testing agreement between multivariate immunofluorescence histograms: Extending the chi-square test. Cytometry, Vol. 45, pp. 141-150.

de Berg, M., M. van Kreveld, M. Overmars, and O. Schwarzkopf (2000). Computational Geometry: Algorithms and Applications. Springer, 2nd Ed.

Heine, K. (2005). Unified framework for sampling/importance resampling algorithms. In Proceedings of Fusion 2005.

Petrovich, D. (2006). Sequential Monte Carlo for Personal Positioning. M.Sc. thesis, Tampere University of Technology.

Pitt, M.K. and N. Shephard (1999). Filtering via simulation: Auxiliary particle filters. Journal of the American Statistical Association, Vol. 94, No. 446, pp. 590-599.

Roederer, M., W. Moore, A. Treister, R. Hardy, and L.A. Herzenberg (2001). Probability binning comparison: A metric for quantitating multivariate differences. Cytometry, Vol. 45, pp. 47-55.

Appendix

In this section, we briefly describe the four different importance distributions used for the PFs tested in this work. The system equations are typically given as

x_k = f_{k-1}(x_{k-1}) + v_{k-1},
y_k = h_k(x_k) + z_k,

where x_k and y_k are the values of the signal and observation stochastic process, respectively, at time k. Furthermore, v_{k-1} denotes the state of the signal noise process at time k-1 and z_k denotes the state of the observation noise process at time k. We follow the "auxiliary" formulation of Pitt and Shephard (1999). Assuming that we have N particles approximately distributed according to P_{x_0}, the algorithm is as follows.

For k = 1, 2, …
• For i = 1:N, assign 1st-stage weights β_k^i and normalize
• For i = 1:M, sample index j_i from the discrete distribution of 1st-stage weights
• For i = 1:M, sample state x_k^i conditioned on index j_i and all received measurements y_{1:k}
• For i = 1:M, evaluate 2nd-stage weights and normalize
• Resample N particles

The general form of the unnormalized second-stage importance weight can be written as

w(x_k, j) \propto \frac{p(x_k, j \mid y_{1:k})}{q(x_k, j \mid y_{1:k})} \propto \frac{w_{k-1}^{j}\, p(y_k \mid x_k)\, p(x_k \mid x_{k-1}^{j})}{\beta_k^{j}\, q_j(x_k)},

where j is the auxiliary variable, β_k^j is the jth first-stage weight, and q_j is the jth importance density. Note that for notational convenience, we have dropped the conditioning on the measurements for the first-stage weight and importance density. The four different importance distributions result from different choices of β_k^j and q_j.

SIR1 results from using β_k^j = w_{k-1}^j and q_j = p(x_k | x_{k-1}^j).

SIR2 is an example from Pitt and Shephard (1999) that uses β_k^j ∝ w_{k-1}^j p(y_k | ξ_k^j) and q_j = p(x_k | x_{k-1}^j), where ξ_k^j is the mean of the distribution of x_k | x_{k-1}^j. Note that in the literature, this example is often referred to as the auxiliary particle filter.

SIR3 again uses β_k^j = w_{k-1}^j, but now uses an EKF for each importance distribution, i.e. q_j = ν(x_k; μ_k^j, C_k^j), where ν is a density of a Gaussian distribution and the mean μ_k^j and covariance C_k^j are given by the posterior of the jth EKF.

SIR4 uses the same importance distribution as SIR3, but has different first-stage weights given by β_k^j = w_{k-1}^j c_k^j, where

c_k^j = ν(y_k; m, Ξ),
m = h_k(f_{k-1}(x_{k-1}^j)),
Ξ = Ĥ_k^j Σ_{v_{k-1}} (Ĥ_k^j)^T + Σ_{z_k},

where Ĥ_k^j is the Jacobian matrix of h_k evaluated at f_{k-1}(x_{k-1}^j), Σ_{v_{k-1}} is the covariance matrix of the signal noise v_{k-1}, and Σ_{z_k} is the covariance of the observation noise z_k.
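To make the auxiliary formulation concrete, the following is a minimal single-step sketch of the SIR1 choice (prior first-stage weights, prior importance density). It is our own illustration and assumes the `systematic_resample`, `propagate`, and `range_likelihood` helpers sketched earlier in this text; with these choices the second-stage weight reduces to the measurement likelihood p(y_k | x_k).

```python
import numpy as np

def sir1_step(particles, weights, y, r_base, sigma, dt, gamma, rng=None):
    """One auxiliary-SIR step with SIR1 choices:
    beta_k^j = w_{k-1}^j and q_j = p(x_k | x_{k-1}^j)."""
    rng = rng or np.random.default_rng()
    M = len(particles)

    beta = weights / weights.sum()                        # 1st-stage weights
    idx = systematic_resample(beta, M, rng)               # sample auxiliary indices j_i
    proposed = np.array([propagate(particles[j], dt, gamma, rng) for j in idx])

    # 2nd-stage weights: the ratio reduces to p(y_k | x_k) for SIR1
    w = range_likelihood(y, proposed, r_base, sigma)
    w = w / w.sum()

    resampled = proposed[systematic_resample(w, M, rng)]  # final resampling of N particles
    return resampled, np.full(M, 1.0 / M)
```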