Optimizing the Likelihood with Sequential Monte–Carlo Methods

P. Closas(1), C. Fernández–Prades(1), J.A. Fernández–Rubio(1)
(1) Dept. of Signal Theory and Communications, Universitat Politècnica de Catalunya (UPC)
c/ Jordi Girona 1-3, Campus Nord, 08034 Barcelona
{closas, carlos, juan}@gps.tsc.upc.edu
Abstract– In this paper, a Sequential Monte–Carlo (SMC) method is studied to deal with nonlinear multivariate optimization problems arising from Maximum Likelihood (ML) estimation approaches. In this context, gradient–like methods are not efficient, since their computational cost becomes burdensome. Moreover, SMC methods provide an appealing way of introducing prior information into the estimation of parameters in general state–space models. Relying on SMC methods, the optimization algorithm is fully described and its pseudocode delivered for general ML problems. For the sake of completeness, the algorithm has been adapted and simulated in a GNSS application, where the use of prior information is of great interest.
I. INTRODUCTION

In a large variety of signal processing applications, Maximum Likelihood Estimation (MLE) is the approach taken for parameter estimation purposes. Unfortunately, in many cases there is no closed-form solution for this estimator, but rather a cost function to be optimized. The cost function can be multi-dimensional, which makes the use of gradient–based methods, such as the Steepest Descent or the Newton–Raphson algorithms, infeasible. These algorithms need a proper initialization point to converge to the optimal value in finite time, i.e., they need to be initialized in the neighborhood of the global optimum. Thus, local optima can cause gradient–based algorithms to fail. Alternative methods must be studied to deal with the optimization in a more suitable and implementable way. To this aim, Sequential Monte–Carlo (SMC) methods, a set of statistical simulation-based methods [1], have been investigated and adapted to the multivariate optimization problem that arises from the computation of the MLE of position in a Global Navigation Satellite Systems (GNSS) framework.

GNSS is the general concept used to identify those systems that allow user position computation based on a constellation of satellites. Specific GNSS systems are the well-known American GPS and the forthcoming European Galileo. Both systems rely on the same principle: the user computes its position from measured distances between the receiver and the set of in-view satellites. These distances are calculated by estimating the propagation time that transmitted signals take to travel from each satellite to the receiver [2], [3]. Each satellite is uniquely identified by its own direct–sequence spread–spectrum (DS–SS) signal, which is transmitted synchronously by all satellites. In this work, we use a novel approach to the positioning problem which relies on the MLE of position to obtain motion parameters directly from recorded data, without the intermediate synchronization–parameter estimation step [4].

II. MAXIMUM LIKELIHOOD ESTIMATION OF POSITION IN GNSS

In a GNSS receiver, measurements are considered to be a superposition of plane waves corrupted by thermal noise and, possibly, interferences and multipath. The antenna receives M scaled, time–delayed and Doppler–shifted signals corresponding to each in-view satellite. The received complex baseband signal is modeled as

x(t) = \sum_{i=1}^{M} a_i s_i(t - \tau_i(\Gamma)) \, e^{j 2\pi f_{d_i}(\Gamma) t} + n(t)    (1)
where s_i(t) is the transmitted complex baseband low-rate BPSK signal, spread by the pseudorandom code of the i–th satellite and considered known, a_i is its complex amplitude, \tau_i(\Gamma) is the time–delay, f_{d_i}(\Gamma) the Doppler deviation, and n(t) is zero-mean additive white Gaussian noise (AWGN) of variance \sigma_n^2. We have gathered all user motion parameters in a real vector \Gamma, which contains, for instance, position and velocity, \Gamma = [\mathbf{p}^T, \mathbf{v}^T]^T. Notice that time–delays and Doppler shifts can be expressed as functions of \Gamma through the pseudorange and pseudorange-rate expressions [2]. If a receiver captures K snapshots, the model in equation (1) can be expressed as

x = a D(\Gamma) + n    (2)
where

• x ∈ C^{1×K} is the observed signal vector,
• a ∈ C^{1×M} is a vector whose elements are the amplitudes of the M received signals, a = [a_1 ... a_M],
• D(\Gamma) = [d(t_0) ... d(t_{K-1})] ∈ C^{M×K} is known as the basis–function matrix, with d(t) = [d_1 ... d_M]^T ∈ C^{M×1}, where each component d_i = s_i(t - \tau_i(\Gamma)) e^{j 2\pi f_{d_i}(\Gamma) t} is the delayed and Doppler-shifted narrowband signal envelope, and
• n ∈ C^{1×K} represents K snapshots of zero-mean AWGN with piecewise constant variance \sigma_n^2 during the observation interval.

We now consider the Maximum Likelihood Estimation (MLE) of signal parameters, taking into account the measurement model presented in equation (2), parametrized by the motion parameter vector \Gamma. We first note that the MLE is equivalent to the solution obtained by a Least Squares (LS) criterion under the assumption of zero-mean AWGN. Neglecting additive and multiplicative constants, maximizing the likelihood function of measurement equation (2) is equivalent to
minimizing the following cost function

\Lambda(a, \Gamma) = \frac{1}{K} \| x - a D(\Gamma) \|^2    (3)

With the following cross-correlation estimate definitions,

\hat{r}_{xx} = \frac{1}{K} x x^H, \quad \hat{R}_{xd}(\Gamma) = \frac{1}{K} x D^H(\Gamma), \quad \hat{R}_{dx}(\Gamma) = \hat{R}_{xd}^H(\Gamma), \quad \hat{R}_{dd}(\Gamma) = \frac{1}{K} D(\Gamma) D^H(\Gamma)    (4)

it is straightforward to obtain the MLE of the complex amplitudes as

\hat{a}_{ML} = \hat{R}_{xd}(\Gamma) \hat{R}_{dd}^{-1}(\Gamma) \big|_{\Gamma = \hat{\Gamma}_{ML}}    (5)
The ML estimate of the motion parameters is then obtained by minimizing the nonlinear cost function resulting from the substitution of (5) into (3),

\hat{\Gamma}_{ML} = \arg\min_{\Gamma} \Lambda(\Gamma) = \arg\min_{\Gamma} \left\{ \hat{r}_{xx} - \hat{R}_{xd}(\Gamma) \hat{R}_{dd}^{-1}(\Gamma) \hat{R}_{xd}^H(\Gamma) \right\}    (6)

Whereas in synchronization–parameter based positioning a two-dimensional optimization has to be performed for each tracked satellite [2], the position-dependent cost function takes into account signals coming from all satellites to obtain a position estimate, dealing with a single multivariate optimization problem for all the received satellites. For the sake of clarity and without loss of generality, we now consider that one of the coordinates (say z) and the velocity vector are known (or vary slowly with time and can be tracked by other methods), so that we can plot the three-dimensional likelihood function. Figure 1 shows the cost function in equation (6) in a realistic GNSS scenario composed of 6 satellites and \Gamma = [x, y]^T.
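As an illustration, the concentrated cost of equation (6) maps directly onto a few matrix products. The following Python sketch (NumPy) assumes the shapes of equation (2); building D(\Gamma) from the satellite signals, delays and Dopplers is application dependent and is taken as given here. The function names ml_cost and amplitude_mle are ours, introduced only for illustration.

import numpy as np

def ml_cost(x, D):
    """Concentrated ML cost of eq. (6) for one candidate Gamma.
    x: (K,) complex snapshot vector, eq. (2); D: (M, K) basis-function matrix D(Gamma)."""
    K = x.size
    r_xx = (x @ x.conj()) / K                    # \hat{r}_xx, eq. (4)
    R_xd = (x @ D.conj().T) / K                  # \hat{R}_xd(Gamma), shape (M,)
    R_dd = (D @ D.conj().T) / K                  # \hat{R}_dd(Gamma), shape (M, M)
    # r_xx - R_xd R_dd^{-1} R_xd^H, using a linear solve instead of an explicit inverse
    return float(np.real(r_xx - R_xd @ np.linalg.solve(R_dd, R_xd.conj())))

def amplitude_mle(x, D):
    """Closed-form amplitude MLE of eq. (5), evaluated at a given Gamma."""
    K = x.size
    R_xd = (x @ D.conj().T) / K
    R_dd = (D @ D.conj().T) / K
    return R_xd @ np.linalg.inv(R_dd)            # row vector \hat{a}_ML

Evaluating ml_cost over a grid of candidate positions reproduces surfaces such as the one shown in Figure 1.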
Fig. 1. The ML cost function in equation (6) for a realistic scenario (M = 6 satellites) as a function of different x and y coordinate errors, denoted as \epsilon_x and \epsilon_y respectively. In general, a single multi-dimensional optimization problem has to be solved.

A gradient–like method, such as the Steepest Descent or the Newton–Raphson algorithm, can be used to iteratively minimize the cost function. However, these methods need a proper initialization to converge to the optimal value, due to the high dimensionality of the function in the general multidimensional case, where \Gamma can consist of the 3–D position, velocity and acceleration, among other possible motion parameters. Alternative methods must be studied to deal with the optimization in a more suitable and implementable way. To this aim, Sequential Monte–Carlo (SMC) methods have been investigated and adapted to the multivariate optimization problem at hand. Moreover, it might be advisable to provide external information to the system regarding receiver motion (or other application–dependent information), aiming at improving performance. This can be accomplished by means of SMC based methods, which provide an appealing way to introduce prior information into the estimation algorithm. Hence, the possibility of using aprioristic information can easily be taken into account when optimizing likelihood functions with the SMC method used herein.

III. SEQUENTIAL MONTE–CARLO BASED OPTIMIZATION ALGORITHM

Sequential Monte–Carlo (SMC) methods are a set of simulation-based methods which adopt the discrete state-space approach to deal with the non-linear filtering problem, that is, to recursively compute estimates of states z_k ∈ R^{n_z} given measurements x_k ∈ C^{n_x} at time index k, where n_z and n_x are the dimensions of the state and measurement vectors respectively. The state equation models the evolution of the target states (on which the measurements depend) as a discrete–time stochastic model, in general

z_k = f_{k-1}(z_{k-1}, w_{k-1})    (7)

where f_{k-1} is a known, possibly non-linear, function of the state z_{k-1}, and w_{k-1} is referred to as process noise, which gathers any unmodeled effects or disturbances in the state characterization. The relation between the measurements and the states is modeled by

x_k = h_k(z_k, n_k)    (8)
where h_k is a known, possibly non-linear, function and n_k is referred to as measurement noise. Both process and measurement noise are assumed white, with known statistics and mutually independent. The initial a priori probability density function of the state vector, p(z_0), is assumed to be known. Once the model is exposed, the problem is to compute filtered estimates of z_k taking into account all available measurements up to time k, X_k = {x_i, i = 1, ..., k}. From a Bayesian point of view, the solution is to recursively obtain the a posteriori probability density function of the states at time k given all available measurements, p(z_k|X_k). It will be shown that the desired density can be computed recursively in two stages: prediction and update. Given that p(z_0) ≜ p(z_0|x_0), where x_0 is the set of no measurements, we can assume that the required density at time k-1 is available, p(z_{k-1}|X_{k-1}). In the prediction stage, the prediction pdf is obtained by considering that p(z_k|z_{k-1}) = p(z_k|z_{k-1}, X_{k-1}), since equation (7) describes a Markov model of order one. Using the Chapman–Kolmogorov equation to marginalize out z_{k-1}, we obtain

p(z_k|X_{k-1}) = \int p(z_k|z_{k-1}) \, p(z_{k-1}|X_{k-1}) \, dz_{k-1}    (9)
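In sampled form, the prediction integral admits a simple Monte Carlo reading: pushing samples of p(z_{k-1}|X_{k-1}) through the state model (7) yields samples of p(z_k|X_{k-1}). A minimal Python sketch, where f and the process-noise sampler are placeholders for the application's model (names ours):

import numpy as np

def predict(particles, f, process_noise_sampler, rng):
    """Monte Carlo version of the prediction integral, eq. (9).
    particles: (Ns, nz) samples from p(z_{k-1}|X_{k-1});
    f: state transition f(z, w) of eq. (7);
    process_noise_sampler: draws Ns process-noise samples w_{k-1}."""
    w = process_noise_sampler(particles.shape[0], rng)
    # Each propagated particle is a sample from p(z_k|X_{k-1})
    return np.array([f(z, wi) for z, wi in zip(particles, w)])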
The update stage is performed whenever a new measurement x_k becomes available at time step k. This stage updates the prediction pdf in equation (9) via Bayes' rule,

p(z_k|X_k) = p(z_k|x_k, X_{k-1}) = \frac{p(x_k|z_k, X_{k-1}) \, p(z_k|X_{k-1})}{p(x_k|X_{k-1})} = \frac{p(x_k|z_k) \, p(z_k|X_{k-1})}{p(x_k|X_{k-1})}    (10)

where the normalizing factor is

p(x_k|X_{k-1}) = \int p(x_k|z_k) \, p(z_k|X_{k-1}) \, dz_k    (11)
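In the same sampled setting, the Bayes update of equation (10) reduces to reweighting: each predicted sample is multiplied by its likelihood and the weights are renormalized, the normalizer being a Monte Carlo estimate of equation (11). A sketch, with the likelihood function p(x_k|z_k) assumed given:

import numpy as np

def update(weights, particles, x_k, likelihood):
    """Bayes update of eq. (10) on a weighted particle set.
    likelihood: callable evaluating p(x_k | z_k) for one particle."""
    w = weights * np.array([likelihood(x_k, z) for z in particles])
    evidence = w.sum()          # Monte Carlo estimate of p(x_k|X_{k-1}), eq. (11)
    return w / evidence, evidence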
Now the recursion is closed by equations (9) and (10), assuming some knowledge about the state evolution, p(z_k|z_{k-1}), described by state equation (7), and the likelihood function, p(x_k|z_k), modeled by measurement equation (8). This recurrence forms the basis of the Optimal Bayesian solution, which tries to characterize the exact a posteriori pdf. In general, this recursion cannot be solved analytically. There are only a few cases where the posterior density can be characterized by a sufficient statistic, so sub–optimal algorithms must be explored. Particle Filter methods use the Sequential Importance Sampling (SIS) concept to characterize the posterior density, dealing with the non-linearities/non-gaussianities of the model. Basically, this involves approximating the posterior by a set of N_s random samples taken from an importance density function, z^i ∼ π(z|X), with associated importance weights w^i. The choice of π(·) is a critical issue in any particle filter design; the importance function should have the same support as the true posterior, meaning that the set closures of the arguments for which these functions are non-zero coincide. The posterior approximation is then

\hat{p}(z_k|X_k) = \sum_{i=1}^{N_s} w_k^i \, \delta(z_k - z_k^i)    (12)

where δ(·) is the Dirac delta function. This approximation converges almost surely to the true posterior as N_s → ∞ under weak assumptions, according to the Strong Law of Large Numbers [1]. Notice that almost sure convergence is equivalent to convergence with probability one. These assumptions hold if the support of the chosen importance density (π̄) includes the support of the posterior probability density function (p̄),

π̄ = \{ z_k ∈ R^{n_z} \,|\, π(z_k|X_k) > 0 \}, \quad p̄ = \{ z_k ∈ R^{n_z} \,|\, p(z_k|X_k) > 0 \}, \quad \text{and} \quad p̄ ⊆ π̄    (13)
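Expectations under the discrete approximation (12) become weighted sums over the particle set; for instance, a posterior-mean estimate can be sketched as:

import numpy as np

def posterior_mean(weights, particles):
    """E[z_k|X_k] under eq. (12): sum_i w_k^i z_k^i."""
    return weights @ particles   # (Ns,) @ (Ns, nz) -> (nz,)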
The resampling step consists in discarding particles with low importance weights and replicating those with high importance weights. This is the so-called Sampling Importance Resampling (SIR) step which, although it constitutes the bottleneck in any parallel implementation of Particle Filters, prevents the degeneracy phenomenon: the variance of the importance weights can only increase over time, so that after a certain number of recursive steps all but one particle have negligible normalized weights (a sketch of one common resampling scheme follows equation (14)). The general recursive solution obtained is

z_k^i \sim π(z_k|z_{k-1}^i, x_k), \qquad \tilde{w}_k^i \propto w_{k-1}^i \, \frac{p(x_k|z_k^i) \, p(z_k^i|z_{k-1}^i)}{π(z_k^i|z_{k-1}^i, x_k)}    (14)
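A sketch of one common SIR scheme, systematic resampling, follows; multinomial or residual resampling would serve the same purpose. After resampling, the weights are reset to uniform:

import numpy as np

def systematic_resample(weights, particles, rng):
    """SIR step: replicate high-weight particles, drop low-weight ones."""
    Ns = len(weights)
    c = np.cumsum(weights)
    c[-1] = 1.0                                    # guard against round-off
    positions = (rng.random() + np.arange(Ns)) / Ns
    idx = np.searchsorted(c, positions)            # one draw per stratum
    return particles[idx], np.full(Ns, 1.0 / Ns)   # uniform weights after resampling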
One of the key points is the choice of a good importance density function π(·), that is, one close to the optimal importance density, which is the posterior density function itself. In our case, by setting it to the transitional prior defined by state equation (7), we obtain

z_k^i \sim π(z_k|z_{k-1}^i, x_k) = p(z_k|z_{k-1}^i), \qquad \tilde{w}_k^i \propto p(x_k|z_k^i)    (15)

which states that the computed weights will be proportional to the likelihood function, which is precisely the target function to optimize in equation (6).
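Equations (14)–(15) then collapse into a particularly simple recursion, often called the bootstrap step: propagate each particle through the transitional prior and weight it by its likelihood. A sketch, with the transition sampler and likelihood as placeholders (names ours):

import numpy as np

def bootstrap_step(particles, x_k, transition_sampler, likelihood, rng):
    """One SIS step with the transitional prior of eq. (15) as importance density."""
    proposed = transition_sampler(particles, rng)            # z_k^i ~ p(z_k | z_{k-1}^i)
    w = np.array([likelihood(x_k, z) for z in proposed])     # \tilde{w}_k^i ∝ p(x_k | z_k^i)
    return proposed, w / w.sum()                             # normalized weights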
For comprehensive overviews of SMC and Particle Filtering methods, see [1], [5], [6] and [7].

In the problem at hand, equation (2) models the measurements and the motion parameters are the states of the model (z_k ≡ Γ_k). As we deal with a sequential approximation to the ML solution, it is assumed that no prior information is available regarding the statistical evolution of the states, although such information could intuitively be added through the Importance Sampling step; hence, a non-informative prior is used to characterize the evolution of the user coordinates, that is, as the importance density function. A joint normal probability density function is assumed for the user coordinates, whose covariance parameter is used to tune the statistical distribution as the number of iterations (N_t) increases. Then, for a given initial guess of the user coordinates, N_s trial coordinates are generated from a normal probability density function with mean the current estimate and covariance a function of the initial ambiguity. The state estimate is given by the particle that optimizes the ML cost function within the generated set, and the covariance matrix of the states is given by particle filtering theory: if \hat{z}_k denotes the estimate of the states at time index k, then the covariance matrix estimate is

C_{z_k} \approx \sum_{i=1}^{N_s} w_k^i \left( z_k^i - \hat{z}_k \right) \left( z_k^i - \hat{z}_k \right)^H    (16)

which will be propagated to the next algorithm iteration jointly with the particle estimate.
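Equation (16) transcribes directly into code; a sketch (the conjugation only matters for complex-valued states):

import numpy as np

def particle_covariance(weights, particles, z_hat):
    """Weighted sample covariance of eq. (16) around the current estimate."""
    d = particles - z_hat                        # rows are z_k^i - \hat{z}_k
    return (weights[:, None] * d).T @ d.conj()   # sum_i w_i (z_i - z)(z_i - z)^H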
This estimation procedure converges to the minimum of the ML cost function in (6), adapting the covariance matrix at each iteration. A pseudocode description of the proposed optimization method is sketched in Table I, where N(µ, σ²) denotes the normal probability density function of mean µ and variance σ² and, for a given input vector w_k, the function [V, I] = min{w_k} returns the minimum value (V) and its index (I). Notice that the algorithm only updates the particles for the next iteration if the lowest weight from the set of generated points is lower than the value obtained in the previous iteration.
TABLE I
General SMC based optimization algorithm

• Initial State Estimate: \hat{z}_0
• Initial State Uncertainty: C_{z_0}
• Calculate V_past = p(x_k|\hat{z}_0)
• FOR k = 1:N_t
   - FOR i = 1:N_s
        Generate z_k^i ∼ N(\hat{z}_{k-1}, C_{z_{k-1}})
        Calculate \tilde{w}_k^i = p(x_k|z_k^i)
   - END FOR
   - Normalize w_k^i = \tilde{w}_k^i / \sum_{j=1}^{N_s} \tilde{w}_k^j
   - [V, I] = min{w_k}
   - IF V_past ≤ V
        \hat{z}_k = \hat{z}_{k-1}
        C_{z_k} = C_{z_{k-1}}
   - ELSE
        \hat{z}_k = z_k^I
        Update V_past = V
        Calculate C_{z_k} using eq. (16)
   - END IF
• END FOR
• Final estimate: \hat{z} = \hat{z}_{N_t}
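The pseudocode of Table I translates almost line by line into the following Python sketch (NumPy; the names smc_optimize, cost, Ns and Nt are ours). One minor deviation from the table: the acceptance test compares raw cost values rather than normalized weights, which is better behaved numerically but otherwise follows the same keep-only-improvements logic:

import numpy as np

def smc_optimize(cost, z0, C0, Ns=200, Nt=50, seed=0):
    """Sketch of the general SMC optimization algorithm of Table I.
    cost: likelihood-based cost to minimize, e.g. Lambda(.) of eq. (6);
    z0: initial state estimate; C0: initial uncertainty (covariance)."""
    rng = np.random.default_rng(seed)
    z_hat = np.asarray(z0, dtype=float)
    C = np.asarray(C0, dtype=float)
    V_past = cost(z_hat)
    for _ in range(Nt):
        z = rng.multivariate_normal(z_hat, C, size=Ns)   # generate Ns candidates
        w_tilde = np.array([cost(zi) for zi in z])       # \tilde{w}_k^i
        I = int(np.argmin(w_tilde))                      # [V, I] = min{w}
        if w_tilde[I] < V_past:                          # update only on improvement
            z_hat, V_past = z[I], w_tilde[I]
            w = w_tilde / w_tilde.sum()                  # normalized weights
            d = z - z_hat
            C = (w[:, None] * d).T @ d                   # covariance update, eq. (16)
    return z_hat, V_past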
TABLE II
SMC based optimization algorithm: an application to GNSS

• Initial Position Estimate: \hat{\Gamma}_0 = [\hat{\varphi}_0, \hat{\lambda}_0, \hat{h}_0]^T
• Initial Position Uncertainty: C_{\Gamma_0}
• Calculate V_past = Λ(\hat{\Gamma}_0)
• FOR k = 1:N_t
   - FOR i = 1:N_s
        Generate \varphi_k^i ∼ N(\hat{\varphi}_{k-1}, C_{\Gamma_{k-1}})
                 \lambda_k^i ∼ N(\hat{\lambda}_{k-1}, C_{\Gamma_{k-1}})
                 h_k^i ∼ N(\hat{h}_{k-1}, C_{\Gamma_{k-1}})
        Calculate \tilde{w}_k^i = Λ(\Gamma_k^i)
   - END FOR
   - Normalize w_k^i = \tilde{w}_k^i / \sum_{j=1}^{N_s} \tilde{w}_k^j
   - [V, I] = min{w_k}
   - IF V_past ≤ V
        \hat{\varphi}_k = \hat{\varphi}_{k-1}, \hat{\lambda}_k = \hat{\lambda}_{k-1}, \hat{h}_k = \hat{h}_{k-1}
        C_{\Gamma_k} = C_{\Gamma_{k-1}}
   - ELSE
        \hat{\varphi}_k = \varphi_k^I, \hat{\lambda}_k = \lambda_k^I, \hat{h}_k = h_k^I
        Update V_past = V
        Calculate C_{\Gamma_k} using eq. (16)
   - END IF
   - \hat{\Gamma}_k = [\hat{\varphi}_k, \hat{\lambda}_k, \hat{h}_k]^T
• END FOR
• Final estimate: \hat{\Gamma} = \hat{\Gamma}_{N_t}
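As a usage illustration only, the general routine sketched above (smc_optimize) can be exercised on a toy surrogate of the GNSS cost; a real run would plug in Λ(Γ) of equation (6), evaluated from the received snapshots and the satellite geometry (e.g., via ml_cost). The target position and tuning values below are hypothetical:

import numpy as np

# Toy stand-in for Lambda(Gamma): a quadratic bowl around a hypothetical position
true_pos = np.array([10.0, -20.0, 5.0])
toy_cost = lambda g: float(np.sum((g - true_pos) ** 2))

gamma0 = np.zeros(3)                  # initial position guess
C0 = (500.0 ** 2) * np.eye(3)         # initial ambiguity, as in Table II
gamma_hat, V = smc_optimize(toy_cost, gamma0, C0, Ns=500, Nt=30)
print(gamma_hat, V)                   # estimate approaches true_pos as Nt grows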
In GNSS, the ML cost function to optimize is described by equation (6), which relates the likelihood to the user position coordinates. Coordinates can be expressed in a number of reference frames: mainly, in Cartesian coordinates, where the coordinate vector is expressed as Γ = [x, y, z]^T, and in the World Geodetic System (WGS84), where the coordinates are expressed in terms of longitude (ϕ), latitude (λ) and height (h) with respect to an ellipsoidal earth model standard, Γ = [ϕ, λ, h]^T [8]. Whereas the Cartesian coordinate system is convenient when motion models or dead reckoning techniques are used, the WGS84 coordinate system is useful when information regarding angles or height of the receiver is available, for instance via an Inertial Measurement Unit or an altimeter. The two reference systems can be used interchangeably, the transformation from one coordinate system to the other being trivial [8]. Table II shows the pseudocode particularization of the SMC method for the GNSS application discussed in this work.

Figure 2 shows the evolution of Λ(\hat{\Gamma}_k) for the Steepest Descent algorithm and the proposed SMC method, the latter clearly outperforming the former after a number of averaged simulations. The value of the cost function for the true target vector, Λ(\hat{\Gamma}_{opt}), has also been plotted. While the proposed SMC method approximates the likelihood function almost surely, gradient-like methods do not ensure convergence to the optimum in finite time.

Fig. 2. The evolution of Λ(\hat{\Gamma}_k) for the proposed optimization algorithm, the Steepest Descent algorithm and the optimum value of the cost function, as a function of k.
IV. CONCLUSIONS

This paper addresses the problem of multivariate optimization of likelihood functions using Sequential Monte–Carlo (SMC) methods. After an overview of SMC methods, a general algorithm is presented and particularized for the problem of computing the MLE of position in a GNSS framework. Results are provided comparing the behavior of the proposed SMC method with that of gradient–like methods. Whereas the latter are seen to fail in multi–dimensional problems, the proposed algorithm converges almost surely to the optimum value of the likelihood function.

ACKNOWLEDGMENT

This work was financed by the Spanish/Catalan Science and Technology Commissions and FEDER funds from the European Commission: TIC2003-05482, TEC2004-04526 and 2001SGR-00268.
REFERENCES
[1] A. Doucet, N. de Freitas, and N. Gordon, Eds., Sequential Monte Carlo Methods in Practice. Springer, 2001.
[2] B. Parkinson and J. Spilker, Eds., Global Positioning System: Theory and Applications, vols. I–II, ser. Progress in Astronautics and Aeronautics, vols. 163–164. Washington, DC: American Institute of Aeronautics and Astronautics, 1996.
[3] E. Kaplan, Understanding GPS: Principles and Applications. Artech House, 1996.
[4] P. Closas, C. Fernández-Prades, J. A. Fernández-Rubio, and A. Ramírez-González, "On the Maximum Likelihood Estimation of Position," in Proceedings of the ION GPS/GNSS 2006, Fort Worth, TX, 2006, submitted.
[5] P. M. Djuric, J. H. Kotecha, J. Zhang, Y. Huang, T. Ghirmai, M. F. Bugallo, and J. Miguez, "Particle Filtering," IEEE Signal Processing Mag., vol. 20, no. 5, pp. 19–38, September 2003.
[6] B. Ristic, S. Arulampalam, and N. Gordon, Beyond the Kalman Filter: Particle Filters for Tracking Applications. Boston: Artech House, 2004.
[7] "Special Issue on Monte Carlo Methods for Statistical Signal Processing," IEEE Trans. Signal Processing, vol. 50, no. 2, February 2002.
[8] M. S. Grewal, L. R. Weill, and A. P. Andrews, Global Positioning Systems, Inertial Navigation, and Integration. John Wiley & Sons, 2001.