Fast Nonlinear Filter for Continuous-Discrete Time Multiple Models

Sergey V. Lototsky
Center for Applied Mathematical Sciences, University of Southern California, Los Angeles, CA 90089-1113, USA
[email protected]

Chuanxia Rao
Center for Applied Mathematical Sciences, University of Southern California, Los Angeles, CA 90089-1113, USA
[email protected]

Boris L. Rozovskii
Center for Applied Mathematical Sciences, University of Southern California, Los Angeles, CA 90089-1113, USA
[email protected]

Abstract

A fast algorithm is proposed for computing on line the optimal nonlinear filter in the continuous-discrete time, multiple model setting. Using the finite element approximation on a spatial grid with N points and performing part of the computations off line, the on-line complexity of the algorithm is shown to be O(N) for all dimensions of the state process. The error of the approximation is also studied.

Key words: finite element approximation, multiple model filtering, optimal filter.

(This work was partially supported by ONR Grant #N00014-95-1-0229 and ARO Grant DAAH 04-95-1-0164.)

1 Introduction

In the continuous-discrete time filtering model, an unobservable continuous time state process is estimated from noisy measurements made at discrete time moments. This model is of special interest from the point of view of applications, because many real life processes evolve in continuous time while the digital devices used to process the measurements require discrete time data. The case of continuous time observations was studied in [9, ?], see also [10].

The desired solution of the filtering problem is an algorithm that provides the best mean square estimate of a given functional of the state process in a form suitable for on-line implementation. In the linear case, such a solution is given by the Kalman filter [6, ?]. It is worth mentioning that the exact solution of the continuous-discrete time filtering problem is known for a wide class of nonlinear models [4, ?] and consists in determining a special random field called the unnormalized filtering density (UFD); however, the resulting algorithm requires on-line solution of a partial differential equation (the Fokker-Planck equation) and computation of integrals. If the dimension of the state process is greater than three, real time implementation of this algorithm is practically impossible. On-line solution of the Fokker-Planck equation can be avoided if the UFD admits a finite dimensional sufficient statistic; in particular, this is the case when the model is linear Gaussian. The first nonlinear example of this sort was discovered by Beneš [2], and a more general class of UFDs admitting finite dimensional sufficient statistics was studied by Daum [3]. Unfortunately, for a given nonlinear model, it is usually not clear whether such a sufficient statistic exists. Practical algorithms based on this approach use approximations similar to the extended Kalman filter [11], so as a rule the error of such approximations is unknown. Moreover, these algorithms still require on-line solution of an ordinary differential equation and evaluation of integrals.

The objective of the current work is to develop a recursive numerical algorithm for computing the optimal filter in which the on-line part is as simple as possible; in particular, no differential equations are to be solved on line. The starting point of the derivation is the equation satisfied by the unnormalized filtering density in the general nonlinear model, and the approach is based on the technique known as parameterization of the UFD [6] using the finite element approximation [5]. In the proposed algorithm (Section 3), both time consuming operations, solving the Fokker-Planck equation and computing the integrals, are performed off line, which makes the algorithm suitable for on-line implementation. The on-line complexity of the algorithm is O(N), where N is the number of points in the spatial domain (or the number of functions used in the finite element approximation). Since the result is not the exact optimal filter, the error of the approximation is estimated.

2 The Filtering Problem

Consider the problem of estimating an R^d-valued state process X = X(t) that evolves in continuous time according to the diffusion equation

    dX(t) = b(X(t)) dt + σ(X(t)) dW(t),   X(0) = x_0,                    (2.1)

given R^r-valued measurements z = z(k) made at discrete time moments t_k = kΔ, k ≥ 1:

    z(k) = h(X(t_k)) + v(k).                                             (2.2)

In the above, W = (W(t)), t ≥ 0, is an R^{d_1}-valued standard Brownian motion independent of the initial condition x_0; the functions b = b(x), σ = σ(x), and h = h(x), x ∈ R^d, take values in R^d, R^{d×d_1}, and R^r respectively; the sequence {v(k)}, k ≥ 1, is independent of the state process and consists of i.i.d. Gaussian random vectors with zero mean and covariance E v(k) v(k)^T = ε²I, where I ∈ R^{r×r} is the identity matrix and ε > 0 is a scalar parameter. The underlying probability space (Ω, F, P) is assumed to be fixed.

The following regularity assumptions are made about the model (2.1), (2.2):

1. the functions b, σ, and h are infinitely differentiable and bounded with all the derivatives;
2. the random vector x_0 has a density p_0 = p_0(x), x ∈ R^d, such that the function p_0 is infinitely differentiable and decays at infinity with all the derivatives faster than any power of |x|.

Let f = f(x), x ∈ R^d, be a measurable scalar function such that E|f(X(t))|² < ∞ for all t ≥ 0. Then the filtering problem for (2.1), (2.2) can be stated as follows: find the best mean square estimate of f(X(t_k)) given the measurements z(m), m = 1, ..., k. This estimate is called the optimal filter and will be denoted by f̂(k). It is a standard fact that

    f̂(k) = E( f(X(t_k)) | z(1), ..., z(k) ).

For computational purposes, the optimal filter can be characterized as follows. Denote by T_t the solution operator for the Fokker-Planck equation corresponding to the state process; in other words, u(t,x) = T_t φ(x) is the solution of the equation

    ∂u(t,x)/∂t = (1/2) Σ_{i,j=1}^d ∂²/∂x_i ∂x_j [ Σ_{l=1}^{d_1} σ_{il}(x) σ_{jl}(x) u(t,x) ] − Σ_{i=1}^d ∂/∂x_i [ b_i(x) u(t,x) ],  t > 0,     (2.3)
    u(0,x) = φ(x).

Next, define the sequence p_k(x), x ∈ R^d, k ≥ 0, starting from the initial density p_0 and setting

    p_k(x) = H_k(x) (T_Δ p_{k−1})(x),  k ≥ 1,

where

    H_k(x) = exp{ (1/ε²) Σ_{l=1}^r [ h_l(x) z_l(k) − (1/2) h_l²(x) ] }.

The random field p_k = p_k(x) is called the unnormalized filtering density. Then the optimal filter f̂(k) can be written as follows [6]:

    f̂(k) = ∫_{R^d} p_k(x) f(x) dx / ∫_{R^d} p_k(x) dx.                  (2.4)

The numerator in (2.4) will be denoted by ψ_k[f]. With this notation, (2.4) becomes

    f̂(k) = ψ_k[f] / ψ_k[1].
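For concreteness, the model (2.1)-(2.2) can be simulated as follows. This is a one dimensional sketch with illustrative choices of my own (Ornstein-Uhlenbeck drift b(x) = −x, σ = 1, h(x) = x, and the numerical values); the observation noise v(k) is drawn with standard deviation ε.

```python
import numpy as np

def simulate(b, sigma, h, x0, delta, n_steps, eps, n_sub=20, rng=None):
    """Simulate the continuous-discrete model (2.1)-(2.2):
    dX = b(X) dt + sigma(X) dW  (Euler-Maruyama with n_sub substeps
    per observation interval delta) and z(k) = h(X(t_k)) + v(k),
    where v(k) ~ N(0, eps^2).  One dimensional sketch."""
    rng = np.random.default_rng(0) if rng is None else rng
    dt = delta / n_sub
    x = float(x0)
    states, obs = [], []
    for _ in range(n_steps):
        for _ in range(n_sub):
            x += b(x) * dt + sigma(x) * np.sqrt(dt) * rng.normal()
        states.append(x)
        obs.append(h(x) + eps * rng.normal())
    return np.array(states), np.array(obs)

# Hypothetical Ornstein-Uhlenbeck example: b(x) = -x, sigma = 1, h(x) = x.
X, Z = simulate(lambda x: -x, lambda x: 1.0, lambda x: x,
                x0=0.0, delta=0.1, n_steps=50, eps=0.2)
```

The trajectory X plays the role of the unobserved state; only Z is available to the filter.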

3 The Algorithm

Assume that the function f satisfies the growth condition

    |f(x)| ≤ C (1 + |x|^γ),  x ∈ R^d,                                    (3.1)

for some γ, C > 0. Then E|f(X(t))|² < ∞ for all t ≥ 0 [7].

To parameterize the function p_k, a finite element approximation is used. Let {e_1, ..., e_N} be a basis in the given finite dimensional set of interpolating functions and {x_1, ..., x_N} a collection of points in R^d such that e_i(x_j) = δ_ij. Assume that the support of each function e_i is a compact set concentrated around the point x_i. For a continuous function v = v(x), its interpolation v̄ = v̄(x) is defined by

    v̄(x) = Σ_{i=1}^N v(x_i) e_i(x).

Consider the following approximation of the unnormalized filtering density: p̃_0 = p_0 and, for k ≥ 1,

    p̃_k(x) = H_k(x) (T_Δ w_{k−1})(x),

where w_{k−1} = Σ_l p̃_{k−1}(x_l) e_l is the interpolation of p̃_{k−1}. For l, m = 1, ..., N define the numbers

    Q_lm = (T_Δ e_m)(x_l),   f_l = ∫_{R^d} f(x) e_l(x) dx,

and the coefficients

    α_l(k) = p̃_k(x_l).

The coefficients α_l(k) can be used to compute the approximations p̃_k(x) and ψ̃_k[f] of the UFD and the unnormalized optimal filter according to the following recursive algorithm.

1. Off line (before the measurements are available), compute Q_lm, f_l, and α_l(0) = p_0(x_l), l, m = 1, ..., N.
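The off-line computation of the kernel Q can be sketched as follows for a one dimensional state: each hat basis function e_m is pushed through the Fokker-Planck equation (2.3). The drift b(x) = −x, the constant diffusion, the grid, and the explicit finite-difference time stepping (with dt chosen below the stability limit of the scheme) are all illustrative choices of this sketch; the paper leaves the off-line PDE solver unspecified.

```python
import numpy as np

def offline_kernel(xs, b, sigma2, delta, n_sub=200):
    """Q[l, m] = (T_delta e_m)(x_l): each hat basis function e_m is
    propagated through the 1-D Fokker-Planck equation
        u_t = 0.5 * sigma2 * u_xx - (b(x) u)_x
    by an explicit finite-difference scheme (toy solver; any accurate
    off-line solver could be substituted)."""
    N, dx, dt = len(xs), xs[1] - xs[0], delta / n_sub
    bx = b(xs)
    U = np.eye(N)                      # column m = e_m sampled on the grid
    for _ in range(n_sub):
        # diffusion term (boundary values frozen; the density decays there)
        uxx = (U[2:] - 2.0 * U[1:-1] + U[:-2]) / dx ** 2
        # drift term: central difference of b(x) * u
        flux = bx[:, None] * U
        dflux = (flux[2:] - flux[:-2]) / (2.0 * dx)
        U[1:-1] += dt * (0.5 * sigma2 * uxx - dflux)
    return U

xs = np.linspace(-4.0, 4.0, 41)
Q = offline_kernel(xs, lambda x: -x, sigma2=1.0, delta=0.1)
```

Since each e_m carries unit grid mass times dx and the Fokker-Planck flow conserves mass, each column of Q should still sum (times dx) to roughly dx, and entries far from the diagonal should be negligible, which is exactly the numerical bandedness exploited below.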


2. On line, at step k (as the measurements become available), compute

    α_m(k) = H_k(x_m) Σ_{l=1}^N Q_ml α_l(k−1),   m = 1, ..., N;          (3.2)

then compute

    p̃_k(x) = Σ_{l=1}^N α_l(k) e_l(x),                                   (3.3)

    ψ̃_k[f] = Σ_{l=1}^N α_l(k) f_l,                                      (3.4)

and

    f̃_k = ψ̃_k[f] / ψ̃_k[1].

According to formula (3.2), the matrix Q can be viewed as the filter kernel.

We would like to point out the following features of the algorithm: (1) the time consuming operations of solving the partial differential equation (2.3) and computing the integrals are performed off line; (2) the overall amount of the off-line computations does not depend on the number of the on-line time steps; (3) formula (3.4) can be used to compute an approximation to f̂(k) (e.g. conditional moments) without computing p̃_k(x) as an intermediate step; (4) only the coefficients α_l must be computed at every time step, while the approximate filter f̃_k and UFD p̃_k(x) can be computed as needed, e.g. at the final time moment; (5) the on-line part of the algorithm can be easily parallelized.

The number of on-line operations at each step of the algorithm is O(N), where N is the number of spatial points. The reason is that the number of operations needed to update each of the coefficients α_m according to (3.2) is effectively independent of N. Indeed, since T_Δ e_l is a solution of a parabolic equation with the initial condition e_l, and each function e_l is compactly supported around the point x_l, the values of (T_Δ e_l)(x_m) will be close to zero when the distance |x_l − x_m| is sufficiently large. In other words, the matrix Q = (Q_ml) is numerically banded, meaning that in actual computations Q is replaced by a banded matrix with the number of nonzero entries in each column not exceeding some fixed number L; the number L is independent of N. The specific value of L depends on the coefficients of the state equation and the basis functions, and can be controlled by a suitable choice of Δ. The number of on-line operations at each step of the algorithm, measured in flops, is then equal to

    K_op = (3r + 2L + 3) N.

The approximation error can be controlled by increasing the dimension of the approximation. The specific error bounds depend on the basis functions e_l. For simplicity, consider the one dimensional model (d = r = 1) and linear interpolation on a uniform grid, so that the functions e_l(x) are given by

    e_l(x) = ( 1 − |x − x_l| / (x_{l+1} − x_l) ) I(x_{l−1} ≤ x ≤ x_{l+1}).

Denote by C_b² the space of twice continuously differentiable functions that are bounded together with both derivatives; ||f||_{C_b²} := sup_x |f(x)| + sup_x |f′(x)| + sup_x |f″(x)|. Also denote by L₂(w), w ∈ R, the space of measurable functions f = f(x) for which ||f||²_{L₂(w)} := ∫_R |f(x)|² (1 + x²)^w dx < ∞; similarly, H¹(w) = { f : ||f||²_{H¹(w)} := ||f||²_{L₂(w)} + ||f′||²_{L₂(w)} < ∞ }.

Theorem. Assume that the spatial grid x_1 < ... < x_N is uniform (x_{i+1} − x_i = Δx), that [−R, R] ⊂ [x_1, x_N] for some R > 0, and that linear interpolation is used. Then for every w > 0 there exist a constant C(w), depending only on w and possibly the coefficients of the state equation, and a constant C(h), depending only on the ratio h(x)/ε, such that

    sup_{x∈R} E|p_k(x) − p̃_k(x)| ≤ exp( k (C(w) + C(h)) ) [ (Δx)²/8 ||p_0||_{C_b²} + C(w) (1 + R²)^{−w/2} ||p_0||_{H¹(w)} ].

If in addition the function f satisfies the growth condition (3.1), then

    E|ψ_k[f] − ψ̃_k[f]| ≤ 2 exp( k (C(w) + C(h)) ) [ C R^{γ+3/2} (Δx)²/8 ||p_0||_{C_b²} + C(w) (1 + R²)^{−w/2} ||f||_{L₂(−γ−2)} ||p_0||_{L₂(w+γ+2)} ].

Proof. It is well known that in the case of linear interpolation

    sup_{x∈[−R,R]} |f(x) − f̄(x)| ≤ (Δx)²/8 max_{x∈[−R,R]} |f″(x)| ≤ (Δx)²/8 ||f||_{C_b²}.

Another well known inequality is

    sup_{x∉[−R,R]} |f(x)| ≤ C(w) (1 + R²)^{−w/2} ||f||_{H¹(w)}.

It can be shown, using the definition of p_k and the regularity assumptions, that

    E ||p_k||_X ≤ e^{k(C(w)+C(h))} ||p_0||_X,

where X is either C_b² or H¹(w). After that, the first statement of the theorem follows from the discrete time version of the Gronwall lemma. The second statement of the theorem follows from the first and the growth condition (3.1). □

The theorem implies that for every k the approximation error converges to zero in the limit lim_{R→∞} lim_{Δx→0}.

Remark. If ε^{−1} ||h||_{C_b²} ≤ 1, then C(h) = 8 ε^{−2} ||h||²_{C_b²}; if ε^{−1} ||h||_{C_b⁰} ≥ 1, then C(h) = 5 ε^{−2} ||h||²_{C_b²} + 7 ε^{−4} ||h||⁴_{C_b¹}.
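The on-line step (3.2) with a numerically banded kernel can be sketched as follows; the diagonal storage layout, the scalar observation, and all names here are choices of this sketch, not prescriptions of the paper.

```python
import numpy as np

def bayes_weight(h_vals, z_k, eps):
    """H_k on the grid for a scalar observation:
    exp{(1/eps^2) [h(x_m) z(k) - h(x_m)^2 / 2]}."""
    return np.exp((h_vals * z_k - 0.5 * h_vals ** 2) / eps ** 2)

def online_step(alpha_prev, diags, offsets, Hk_vals):
    """Update (3.2): alpha_m(k) = H_k(x_m) * sum_l Q[m, l] alpha_l(k-1),
    with the numerically banded Q stored by diagonals,
    diags[j][m] = Q[m, m + offsets[j]]; cost O(L*N) instead of O(N^2)."""
    N = len(alpha_prev)
    out = np.zeros(N)
    for d, off in zip(diags, offsets):
        lo, hi = max(0, -off), min(N, N - off)
        out[lo:hi] += d[lo:hi] * alpha_prev[lo + off:hi + off]
    return Hk_vals * out

# Cross-check the banded product against a dense tridiagonal Q.
rng = np.random.default_rng(1)
N = 6
Q = (np.diag(rng.random(N)) + np.diag(rng.random(N - 1), 1)
     + np.diag(rng.random(N - 1), -1))
offsets = (-1, 0, 1)
diags = []
for off in offsets:
    d = np.zeros(N)
    lo, hi = max(0, -off), min(N, N - off)
    d[lo:hi] = np.diagonal(Q, off)
    diags.append(d)
alpha = rng.random(N)
H = bayes_weight(np.linspace(-1, 1, N), z_k=0.3, eps=0.5)
```

For a genuinely banded kernel with L stored diagonals, the loop above performs L vectorized multiply-adds of length at most N, matching the O(N) per-step cost claimed in the text.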

4 Multiple Model Filtering

The optimal nonlinear filter can be used even if the parameters of the state equation are not known to the observer. To construct the filter in this situation, the multiple model approach can be used. According to [1], the basic assumption of this approach is that at each moment the state process can be described by one of a given number of models, with switching between the models governed by a homogeneous Markov process. For example, an aircraft can be in one of a few possible flight modes (constant velocity, constant acceleration, coordinated turn, etc.), and while the current mode may not be known to the observer, the dynamical equations corresponding to each mode are available.

Assume that each of the possible models of the state process is of the form (2.1), i.e. a nonlinear diffusion with drift b(x, i) and diffusion coefficient σ(x, i), with i ∈ {1, ..., M} corresponding to the particular model. If θ = θ_t is a homogeneous Markov process with values in the set {1, ..., M}, initial distribution π_0 ∈ R^M, and jump intensity matrix Λ ∈ R^{M×M}, then the state process X = X(t) can be modeled by

    dX(t) = b(X(t), θ_t) dt + σ(X(t), θ_t) dW(t).                        (4.1)

It is also assumed that the measurements are

    z(k) = h(X(t_k), θ_{t_k}) + v(k),                                    (4.2)

with the sequence v = v(k) as in (2.2), and that the process θ is independent of v, W, and X(0). As before, the functions b(·, i), σ(·, i), and h(·, i) are infinitely differentiable and bounded with all the derivatives for every i = 1, ..., M, and the initial density p_0 of X(0) is infinitely differentiable and decays at infinity faster than any power of |x|. The choice of the initial distribution π_0 and of the jump intensity matrix Λ is made during the overall system design. For example, in the case of aircraft tracking, both π_0 and Λ can be estimated by analyzing real flight data. A general discussion of this question is given in [1].

From the point of view of the general theory, the model (4.1), (4.2) is equivalent to the original model (2.1), (2.2). Indeed, the unobservable component is the pair of processes (X, θ) with the generating operator

    L = diag(L_1, ..., L_M) + Λ,

where

    L_k g(x) = (1/2) Σ_{i,j=1}^d ( σ(x,k) σ^T(x,k) )_{ij} ∂²g(x)/∂x_i ∂x_j + Σ_{i=1}^d b_i(x,k) ∂g(x)/∂x_i

and σ^T is the transpose of the matrix σ. Denote by S_t F(x) the solution of the equation

    ∂v(t,x)/∂t = L* v(t,x),  t > 0;   v(0,x) = F(x),

in which v(t,x) is a vector in R^M (all vectors are column vectors; the components of a vector v are denoted by either v_i or [v]_i) and L* is the formal adjoint of L. Define the vectors p(k,x), k ≥ 0, x ∈ R^d, by

    p(k,x) = H(k,x) S_Δ( p(k−1, ·) )(x),  k ≥ 1;                         (4.3)
    p(0,x) = p_0(x) π_0,

where H(k,x) is a diagonal matrix with entries

    exp{ (1/ε²) Σ_{l=1}^r [ h_l(x,i) z_l(k) − (1/2) h_l²(x,i) ] },  i = 1, ..., M.

For a function f satisfying the growth condition (3.1), define

    ψ_k[f, i] = ∫_{R^d} f(x) [p(k,x)]_i dx.

Then

    E( f(X(t_k)) | z(1), ..., z(k) ) = Σ_{i=1}^M ψ_k[f, i] / Σ_{i=1}^M ψ_k[1, i]

and

    P( θ_{t_k} = i | z(1), ..., z(k) ) = ψ_k[1, i] / Σ_{i=1}^M ψ_k[1, i].

Recursive approximations of p(k,x) and ψ_k[f, i] can be computed using the same approach as in the previous section. The vector p(k,x) is approximated by

    p̃(k,x) = Σ_{l=1}^N Σ_{i=1}^M α_l^i(k) e_l(x) u_i,

where e_1, ..., e_N are the interpolating functions and {u_1, ..., u_M} is the standard unit basis in R^M (i.e. [u_i]_j = δ_ij). The coefficients α_l^i are updated using the pre-computed filter kernel Q = (Q^{ij}_{lm}), l, m = 1, ..., N, i, j = 1, ..., M, and the measurements z(k). The following is a description of the corresponding algorithm.

1. Off line (before the observations are available): compute the components of the kernel

    Q^{ij}_{lm} = [ S_Δ( e_m(·) u_j ) ]_i (x_l)

and the coefficients

    f_l = ∫_{R^d} f(x) e_l(x) dx,

and set

    α_l^i(0) = p_0(x_l) [π_0]_i,

    p̃(0,x) = Σ_{l=1}^N Σ_{i=1}^M α_l^i(0) e_l(x) u_i.

2. On line, at step k: compute

    α_l^i(k) = [H(k, x_l)]_{ii} Σ_{m=1}^N Σ_{j=1}^M Q^{ij}_{lm} α_m^j(k−1),

then, if necessary, compute

    p̃(k,x) = Σ_{l=1}^N Σ_{i=1}^M α_l^i(k) e_l(x) u_i,

    ψ̃_k[f, i] = Σ_{l=1}^N α_l^i(k) f_l,

and

    f̃_k = Σ_{i=1}^M ψ̃_k[f, i] / Σ_{i=1}^M ψ̃_k[1, i],                   (4.4)

    π̃_k(i) = ψ̃_k[1, i] / Σ_{i=1}^M ψ̃_k[1, i]

(the probability that the current model is i).

Formula (4.4) gives only one possible way of estimating the state. Once the coefficients α(k) are known, other estimates can be constructed, for example,

    X̂_k = argmax_x [ p̃(k,x) ]_{î_k},

where

    î_k = argmax_i ψ̃_k[1, i].
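The multiple-model on-line step can be sketched with a four-index kernel array; the array layout and all names are choices of this sketch.

```python
import numpy as np

def multi_model_step(alpha, Q, H):
    """One on-line step of the multiple-model filter:
    alpha[i, l] -> H[i, l] * sum_{j, m} Q[i, j, l, m] * alpha[j, m],
    where alpha[i, l] are the coefficients alpha_l^i, Q[i, j, l, m] the
    kernel components Q^{ij}_{lm}, and H[i, l] the i-th diagonal entry
    of H(k, x_l)."""
    return H * np.einsum('ijlm,jm->il', Q, alpha)

def model_probs(alpha, mass):
    """Approximate model probabilities: with mass[l] = integral of e_l,
    psi_k[1, i] = sum_l alpha[i, l] * mass[l], normalized over i."""
    phi1 = alpha @ mass
    return phi1 / phi1.sum()

rng = np.random.default_rng(2)
M, N = 2, 4
Q = rng.random((M, M, N, N))
alpha = rng.random((M, N))
H = rng.random((M, N))
new = multi_model_step(alpha, Q, H)
```

Each entry of the updated array touches all M*N previous coefficients, which is the source of the (3r + 2L + 3)M²N flop count below once Q is stored in banded form over the spatial indices.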

Using the same arguments as in the previous section, it can be shown that the number of on-line operations (in flops) needed to update the coefficients is

    K_op = (3r + 2L + 3) M² N.

The operator S_Δ can be approximated by the Trotter formula as follows. For t > 0, denote by T_t^{(k)} g(x), k = 1, ..., M, the solution of the equation

    ∂v(t,x)/∂t = L_k* v(t,x),   v(0,x) = g(x),

and let

    T_t = diag( T_t^{(1)}, ..., T_t^{(M)} ).

If 0 = t_0 < t_1 < ... < t_ν = Δ is a partition of the interval [0, Δ], then S_Δ is approximated by

    exp( (t_ν − t_{ν−1}) Λ^T ) T_{t_ν − t_{ν−1}} ··· exp( (t_1 − t_0) Λ^T ) T_{t_1 − t_0}.
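The Trotter product above can be illustrated with plain matrices: replacing the two semigroups by matrix exponentials, the factorized product converges to exp(Δ(A + B)) as the partition is refined. The small Taylor-series expm and the 2×2 matrices are illustrative only.

```python
import numpy as np

def expm(A, terms=40):
    """Matrix exponential by Taylor series (adequate for small,
    well-scaled matrices such as those in this sketch)."""
    out, term = np.eye(A.shape[0]), np.eye(A.shape[0])
    for n in range(1, terms):
        term = term @ A / n
        out = out + term
    return out

def trotter(A, B, t, nu):
    """Approximate exp(t (A + B)) by the Trotter product
    (exp((t/nu) A) exp((t/nu) B))^nu, mirroring the alternation of
    exp(. * Lambda^T) factors and the block-diagonal semigroup T_t."""
    step = expm((t / nu) * A) @ expm((t / nu) * B)
    out = np.eye(A.shape[0])
    for _ in range(nu):
        out = step @ out
    return out

A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0, 0.0], [1.0, 0.0]])
exact = expm(A + B)   # A + B = [[0,1],[1,0]], so exp is cosh/sinh valued
```

Refining the partition (larger nu) reduces the splitting error, consistent with using a finer partition of [0, Δ] in the formula above.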

References

[1] Y. Bar-Shalom and X.-R. Li. Estimation and Tracking: Principles, Techniques, and Software. Artech House, Boston, 1993.
[2] V. E. Beneš. Exact Finite-dimensional Filters for Certain Diffusions with Nonlinear Drift. Stochastics, 5:65-92, 1981.
[3] F. E. Daum. New Exact Nonlinear Filters. In J. C. Spall, editor, Bayesian Analysis of Time Series and Dynamic Models, pages 199-226. Marcel Dekker, Inc., New York, 1988.
[4] R. J. Elliott, L. Aggoun, and J. B. Moore. Hidden Markov Models. Springer, New York, 1995.
[5] A. Germani and M. Piccioni. Semi-discretization of Stochastic Partial Differential Equations on R^d by a Finite Element Technique. Stochastics, 23(2):131-148, 1988.
[6] A. H. Jazwinski. Stochastic Processes and Filtering Theory. Academic Press, New York, 1970.
[7] R. Sh. Liptser and A. N. Shiryayev. Statistics of Random Processes. Springer, New York, 1977.
[8] S. Lototsky, R. Mikulevicius, and B. L. Rozovskii. Nonlinear Filtering Revisited: A Spectral Approach. SIAM Journal on Control and Optimization, 1997. (To appear.)
[9] S. Lototsky and B. L. Rozovskii. Recursive Multiple Wiener Integral Expansion for Nonlinear Filtering of Diffusion Processes. In N. Gretsky, editor, Festschrift in Honor of M. M. Rao. Marcel Dekker, Inc., New York, 1996. (To appear.)
[10] D. Ocone. Multiple Integral Expansions for Nonlinear Filtering. Stochastics, 10:1-30, 1983.
[11] G. C. Schmidt. Designing Nonlinear Filters Based on Daum's Theory. Journal of Guidance, Control, and Dynamics, 16(2):371-376, 1993.

