Appl Math Optim 48:93–128 (2003) DOI: 10.1007/s00245-003-0772-8

© 2003 Springer-Verlag New York Inc.

Convergence of Nonlinear Filters for Randomly Perturbed Dynamical Systems∗

Vladimir M. Lucic1 and Andrew J. Heunis2

1 TD Securities, Toronto Dominion Bank, 14–18 Finsbury Square, London, EC2A 1DB, England. [email protected]

2 Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1. [email protected]

Abstract. We establish convergence of the nonlinear filter of the state of a randomly perturbed dynamical system in which the perturbation is a rapidly fluctuating ergodic Markov process and the observation process conditions the state of the system. The limiting nonlinear filter is completely characterized.

Key Words. Nonlinear filter equations, Martingale problems, Measure-valued processes, Weak convergence.

AMS Classification. Primary 60G35, Secondary 60G44, 60G57.

1. Introduction

Suppose that $Z := \{Z_t\}$ is a Markov process on a probability space $(\Omega, \mathcal F, P)$, taking values in a metric space $S$, with a unique invariant probability measure $\bar m$. Let $Z^\varepsilon$ be a "rescaled process" defined by

$$Z^\varepsilon_t := Z_{t/\varepsilon^2}, \qquad \forall t \in [0, \infty), \tag{1.1}$$

for some small parameter $\varepsilon \in (0, 1]$, and consider the random ordinary differential equation

$$\frac{dX^\varepsilon_t}{dt} = \frac{1}{\varepsilon} F(X^\varepsilon_t, Z^\varepsilon_t) + H(X^\varepsilon_t, Z^\varepsilon_t), \qquad X^\varepsilon_0 := \text{nonrandom } x_0 \in \mathbb{R}^d. \tag{1.2}$$

∗ This research was supported by MITACS-NSERC of Canada.


where the mappings $F, H: \mathbb{R}^d \times S \to \mathbb{R}^d$ are sufficiently well behaved to ensure that (1.2) has a unique solution $X^\varepsilon$ for each sample path of the process $Z^\varepsilon$. The ODE (1.2) with small $\varepsilon$ is frequently used as a model for randomly perturbed dynamical systems, the interpretation of the terms in (1.2) being, briefly, as follows: the process $Z^\varepsilon$ models "internal randomness" in the system, the vector field $H$ models the aggregate drift of the dynamical system, and the vector field $F$ models rapid local fluctuations around the paths determined by this drift. If the Markov process $Z$ in (1.1) is "weak mixing," then $F(x, Z^\varepsilon_s)$ and $F(x, Z^\varepsilon_t)$ are "almost independent" for fixed $x \in \mathbb{R}^d$ and $s \neq t$ (since $\varepsilon > 0$ is small), so that the local fluctuations are indeed fast. We typically want these local fluctuations to be essentially "aimless," without any overall drift, at least asymptotically for small $\varepsilon$ (when the distribution of $Z^\varepsilon_t$ should be given approximately by the invariant probability measure $\bar m$), and hence it is also assumed that $F$ satisfies the centering condition

$$\int_S F(x, z)\, \bar m(dz) = 0, \qquad \forall x \in \mathbb{R}^d. \tag{1.3}$$
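The separation of time scales in (1.1) and (1.2) can be illustrated numerically. The sketch below is not from the paper: the two-state perturbation $Z \in \{-1, +1\}$, the choice $F(x, z) = z$ (which is centered under the symmetric invariant measure, so (1.3) holds), and $H(x, z) = -x$ are all made-up assumptions. Rescaling time by $1/\varepsilon^2$ simply means the chain flips at rate $1/\varepsilon^2$:

```python
import numpy as np

def simulate_fast_ode(eps, T=1.0, dt=1e-4, seed=0):
    """Euler scheme for dX/dt = (1/eps) F(X, Z_eps) + H(X, Z_eps), where
    Z_eps is a two-state Markov chain on {-1, +1} sped up by 1/eps**2,
    F(x, z) = z (mean zero under the invariant measure) and H(x, z) = -x."""
    rng = np.random.default_rng(seed)
    n = int(T / dt)
    x, z = 0.0, 1.0
    flip_prob = min(1.0, dt / eps**2)   # chance of a state flip per Euler step
    xs = np.empty(n)
    for k in range(n):
        x += dt * (z / eps - x)         # (1/eps) F + H with the assumed F, H
        if rng.random() < flip_prob:
            z = -z
        xs[k] = x
    return xs

path = simulate_fast_ode(eps=0.1)
```

For small $\varepsilon$ the trajectory is visually indistinguishable from an Ornstein–Uhlenbeck diffusion, which is the kind of white-noise limit described next.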

When $\varepsilon > 0$ is small enough, the process $\{(1/\varepsilon) F(x, Z^\varepsilon_t)\}$ should be "almost a white noise" for each $x \in \mathbb{R}^d$, and hence one would expect that, as $\varepsilon \to 0$, the solution process $X^\varepsilon$ of (1.2) converges weakly to the solution of a random differential equation driven by "perfect" white noise, or, more precisely, to the solution $\bar X$ of an Itô stochastic differential equation (SDE) of the form

$$d\bar X_t = b(\bar X_t)\, dt + c(\bar X_t)\, d\bar V_t, \qquad \bar X_0 := \text{nonrandom } x_0 \in \mathbb{R}^d, \tag{1.4}$$

for some standard $\mathbb{R}^d$-valued Wiener process $\bar V$ on a probability space $(\bar\Omega, \bar{\mathcal F}, \bar P)$. Indeed, this convergence was established by Blankenship and Papanicolaou [6, Section 4] when the Markov process $Z$ takes values in a compact metric space $S$, and the coefficients $b(\cdot)$ and $c(\cdot)$ in (1.4) are calculated in terms of the vector fields $F$ and $H$ in (1.2) (this result also appears as Theorem 12.2.4 on p. 475 of [8]). Now suppose that, related to the process $X^\varepsilon$, we have an $\mathbb{R}^r$-valued observation process $Y^\varepsilon$ defined by

$$Y^\varepsilon_t := \int_0^t h(X^\varepsilon_s)\, ds + W_t, \tag{1.5}$$

where $W$ is an $\mathbb{R}^r$-valued Wiener process on $(\Omega, \mathcal F, P)$ independent of the process $Z$ in (1.1), and $h: \mathbb{R}^d \to \mathbb{R}^r$ is a sensor function which is characteristic of the technical apparatus used to "measure" the signal $X^\varepsilon$. Without being too exact at this point about existence and technical measurability issues, we shall regard the nonlinear filter of the signal $X^\varepsilon$ corresponding to the observation process $Y^\varepsilon$ as a process $\pi^\varepsilon$ taking values in $\mathcal P(\mathbb{R}^d)$ (this denotes the set of all probability measures on $\mathbb{R}^d$), which is adapted to the self-filtration of the observation process $Y^\varepsilon$, and such that

$$\pi^\varepsilon_t f = E[f(X^\varepsilon_t) \mid Y^\varepsilon_s,\ 0 \le s \le t] \quad \text{a.s.} \tag{1.6}$$

for each bounded Borel-measurable $f: \mathbb{R}^d \to \mathbb{R}$ and $t \ge 0$. If we regard $\bar X$ given by (1.4) as a "limiting signal," then we can introduce a corresponding observation process $\bar Y$ defined by

$$\bar Y_t := \int_0^t h(\bar X_s)\, ds + \bar W_t, \tag{1.7}$$


where $\bar W$ is an $\mathbb{R}^r$-valued Wiener process on $(\bar\Omega, \bar{\mathcal F}, \bar P)$ which is independent of the Wiener process $\bar V$ in (1.4). Again, we shall regard the nonlinear filter of the signal $\bar X$ corresponding to the observation process $\bar Y$ as a $\mathcal P(\mathbb{R}^d)$-valued process $\bar\pi$ which is adapted to the self-filtration of $\bar Y$, and such that

$$\bar\pi_t f = E[f(\bar X_t) \mid \bar Y_s,\ 0 \le s \le t] \quad \text{a.s.} \tag{1.8}$$

for each bounded Borel-measurable $f: \mathbb{R}^d \to \mathbb{R}$ and $t \ge 0$. The weak convergence of $X^\varepsilon$ to $\bar X$ as $\varepsilon \to 0$ suggests that the nonlinear filter $\pi^\varepsilon$ will likewise converge weakly to the nonlinear filter $\bar\pi$, and our goal is to establish this result. In fact, we look at a problem which is somewhat more general than the one just outlined and which is motivated as follows: In many applications (especially in aerospace problems) it is usual for the observation $Y^\varepsilon$ to be fed back to the dynamics of the signal $X^\varepsilon$, so that the process $X^\varepsilon$ is given, not by (1.2), but rather by

$$dX^\varepsilon_t = \frac{1}{\varepsilon} F(X^\varepsilon_t, Z^\varepsilon_t)\, dt + H(X^\varepsilon_t, Z^\varepsilon_t)\, dt + B(X^\varepsilon_t)\, dY^\varepsilon_t, \qquad X^\varepsilon_0 = \text{nonrandom } x_0 \in \mathbb{R}^d, \tag{1.9}$$

or, equivalently, by

$$dX^\varepsilon_t = \frac{1}{\varepsilon} F(X^\varepsilon_t, Z^\varepsilon_t)\, dt + G(X^\varepsilon_t, Z^\varepsilon_t)\, dt + B(X^\varepsilon_t)\, dW_t, \qquad X^\varepsilon_0 = \text{nonrandom } x_0 \in \mathbb{R}^d, \tag{1.10}$$

for $G(x, z) := H(x, z) + B(x) h(x)$, $(x, z) \in \mathbb{R}^d \times S$. Here $B: \mathbb{R}^d \to \mathbb{R}^{d \times r}$ is a mapping which defines the feedback of the observation $Y^\varepsilon$ to the dynamics of $X^\varepsilon$. It seems plausible that the weak convergence of $X^\varepsilon$ given by (1.10) to a limit $\bar X$ continues to hold provided that we modify the model (1.4) of the "limiting" signal $\bar X$ by adding an extra term to account for feedback of the observation process $\bar Y$ given by (1.7), namely

$$d\bar X_t = b(\bar X_t)\, dt + c(\bar X_t)\, d\bar V_t + B(\bar X_t)\, d\bar W_t, \qquad \bar X_0 = \text{nonrandom } x_0 \in \mathbb{R}^d. \tag{1.11}$$

The general question of convergence (as $\varepsilon \to 0$) of the nonlinear filter $\pi^\varepsilon$ given by (1.5), (1.6) to a limiting nonlinear filter $\bar\pi$ given by (1.7), (1.8) continues to make sense when $X^\varepsilon$ and $\bar X$ are given by (1.10) and (1.11), respectively, and our goal is to show this convergence. In Section 2 we formulate the basic conditions which will always be postulated, and state the main results of this work (see Theorems 2.12 and 2.18). In Section 3 we present a convergence theorem of Bhatt and Karandikar [3] which is the main technical result that we use for establishing weak convergence. Sections 4 and 5 give proofs of the main results set forth in Section 2, and in Section 6 we compare our result on convergence of nonlinear filters with other works in the established literature which have also addressed this question. Finally, the remaining Sections 7–9 are appendices for the proofs of Theorems 2.12 and 2.18.
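Although the paper treats the filters $\pi^\varepsilon$ and $\bar\pi$ analytically, a conditional expectation of the form (1.6) can be approximated by sequential Monte Carlo. The sketch below is a generic bootstrap particle filter, not a construction from the paper; the scalar signal model $dX = -X\,dt + dB$ and the sensor function $h(x) = x$ are hypothetical stand-ins:

```python
import numpy as np

def bootstrap_filter(dy_obs, dt, n_particles=500, seed=1):
    """Approximate pi_t(f) = E[f(X_t) | Y_s, s <= t] for f(x) = x, given
    increments dy of an observation Y_t = int h(X_s) ds + W_t.
    Assumed (illustrative) model: dX = -X dt + dB, sensor h(x) = x."""
    rng = np.random.default_rng(seed)
    particles = np.zeros(n_particles)
    means = []
    for dy in dy_obs:
        # propagate each particle through the assumed signal dynamics
        particles += -particles * dt + np.sqrt(dt) * rng.standard_normal(n_particles)
        # weight by the likelihood of dy, which is N(h(x) dt, dt) given x
        logw = -((dy - particles * dt) ** 2) / (2.0 * dt)
        w = np.exp(logw - logw.max())
        w /= w.sum()
        # multinomial resampling, then record the filter mean
        particles = rng.choice(particles, size=n_particles, p=w)
        means.append(particles.mean())
    return np.array(means)

dt = 0.01
dy_obs = np.sqrt(dt) * np.random.default_rng(2).standard_normal(200)  # noise-only observations
filter_mean = bootstrap_filter(dy_obs, dt)
```

The empirical distribution of the particles plays the role of the measure-valued process $\pi^\varepsilon_t$.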

2. Conditions and Main Results

We always use the following notation and terminology:

(i) For a separable metric space $E$, let $\mathcal B(E)$ denote the Borel σ-algebra on $E$, let $B(E)$ denote the set of all real-valued uniformly bounded Borel-measurable mappings on $E$, and define the supremum norm on $B(E)$ by $\|\varphi\| := \sup\{|\varphi(x)| : x \in E\}$, $\forall \varphi \in B(E)$. Also write $C(E)$ for the set of all real-valued continuous functions on $E$, put $\bar C(E) := B(E) \cap C(E)$, and write $C_c(E)$ for the set of all members of $\bar C(E)$ which have compact support. When $E$ is locally compact, let $\hat C(E)$ be the collection of all members of $\bar C(E)$ which vanish at infinity.

(ii) For a vector $x$ in a finite-dimensional Euclidean space $\mathbb{R}^q$, write $x^k$ for the $k$th scalar entry of $x$, and write $|x|$ for the Euclidean norm on $\mathbb{R}^q$. For a positive integer $r$, let $C^r(\mathbb{R}^q)$ denote the collection of all members of $C(\mathbb{R}^q)$ with continuous derivatives of each order, up to and including $r$. Let $C^\infty(\mathbb{R}^q)$ denote the collection of all members of $C(\mathbb{R}^q)$ with continuous derivatives of all orders. Put $C_c^\infty(\mathbb{R}^q) := C_c(\mathbb{R}^q) \cap C^\infty(\mathbb{R}^q)$ and $C_c^r(\mathbb{R}^q) := C_c(\mathbb{R}^q) \cap C^r(\mathbb{R}^q)$ for positive integers $r$. For $E$ a metric space and $r$ some positive integer, write $C^{r,0}(\mathbb{R}^q \times E)$ for the collection of all mappings $f \in C(\mathbb{R}^q \times E)$ whose partial derivatives of every order up to and including $r$, with respect to the first $q$ real-valued arguments, exist and are members of $C(\mathbb{R}^q \times E)$, and put $C_c^{r,0}(\mathbb{R}^q \times E) := C^{r,0}(\mathbb{R}^q \times E) \cap C_c(\mathbb{R}^q \times E)$.

(iii) When $E$ is a complete separable metric space, let $\mathcal P(E)$ denote the collection of all probability measures on the measurable space $(E, \mathcal B(E))$ with the usual topology of weak (or narrow) convergence; and if $X: (\Omega, \mathcal F, P) \to E$ is $\mathcal F/\mathcal B(E)$-measurable, then let $\mathcal L(X)$ or, more precisely, $\mathcal L_P(X)$ denote the member of $\mathcal P(E)$ which is the distribution of $X$ on $(E, \mathcal B(E))$. Also, for a $\mathcal B(E)$-measurable mapping $f: E \to \mathbb{R}$ which is integrable with respect to $\mu \in \mathcal P(E)$, we put $\mu f := \int_E f\, d\mu$.
We now formulate conditions on the SDEs (1.10), (1.11) and the observations (1.5), (1.7):

Condition 2.1. For each $\varepsilon \in (0, 1]$ the process $\{Z^\varepsilon_t, t \in [0, \infty)\}$ is defined by (1.1), where $\{Z_t, t \in [0, \infty)\}$ is a Markov process on a complete probability space $(\Omega, \mathcal F, P)$, taking values in a compact metric space $S$, with transition probability function $P_t(z, \Gamma)$ and initial distribution $\mu_0 \in \mathcal P(S)$. The transition probability function $P_t(z, \Gamma)$ is conservative, i.e. $P_t(z, S) = 1$, $\forall (t, z) \in [0, \infty) \times S$, and the semigroup $\{T_t\}$ defined by $T_t \ell(z) := \int_S \ell(z')\, P_t(z, dz')$, $z \in S$, maps $C(S)$ into $C(S)$ for each $t \in [0, \infty)$, and is a strongly continuous semigroup on $C(S)$ with infinitesimal generator $Q$ (that is, $\{T_t\}$ is a Feller semigroup in the terminology of Ethier and Kurtz [8, p. 166]).

Remark 2.2. Write $D(Q)$ and $R(Q)$ for the domain and range of $Q$. Without loss of generality we suppose that the sample paths of $Z$ are corlol (cadlag) (see Theorem 4.2.7 of [8]).

Condition 2.3 (Ergodicity of $\{Z_t\}$). The transition probability $P_t(z, \Gamma)$ has a unique invariant probability $\bar m \in \mathcal P(S)$, and $\int_0^\infty \|T_t \ell - \bar m \ell\|\, dt < \infty$, $\forall \ell \in C(S)$.

Condition 2.4. The mappings $F: \mathbb{R}^d \times S \to \mathbb{R}^d$, $G: \mathbb{R}^d \times S \to \mathbb{R}^d$, and $B: \mathbb{R}^d \to \mathbb{R}^{d \times r}$ in (1.10) satisfy $F^i \in C^{3,0}(\mathbb{R}^d \times S)$, $G^i \in C^{2,0}(\mathbb{R}^d \times S)$, $B^{ij} \in C^2(\mathbb{R}^d)$,


$i = 1, 2, \ldots, d$, $j = 1, 2, \ldots, r$, and have uniformly bounded first $x$-derivatives, namely

$$\sup_{\mathbb{R}^d \times S} \big( |\partial_k F^i(x, z)| + |\partial_k G^i(x, z)| + |\partial_k B^{ij}(x)| \big) < \infty, \qquad \forall i, k = 1, 2, \ldots, d, \quad j = 1, 2, \ldots, r. \tag{2.12}$$

Moreover, $F$ is such that (1.3) holds.

Condition 2.5. The process $\{W_t, t \in [0, \infty)\}$ in (1.5) and (1.10) is a standard $\mathbb{R}^r$-valued Wiener process on $(\Omega, \mathcal F, P)$, and is independent of the Markov process $Z$ in Condition 2.1.

The following result, which is established in Section 9, will be needed to formulate the remaining conditions:

Lemma 2.6. Suppose that Conditions 2.1 and 2.3 hold. Then, for each $z \in S$, there is a unique finite signed regular Borel measure $\chi(z, \cdot)$ on $S$ such that $\sup_z \|\chi(z, \cdot)\|_{TV} < \infty$ (where $\|\cdot\|_{TV}$ is the total variation norm) and

$$\int_S \ell(z')\, \chi(\cdot, dz') = \int_0^\infty [T_t \ell(\cdot) - \bar m \ell]\, dt \in C(S), \qquad \forall \ell \in C(S). \tag{2.13}$$

Moreover, the following hold:

(i) If $\ell \in C(S)$ is such that $\bar m \ell = 0$, then $\ell \in R(Q)$, and the function $\Phi(z) := \int_S \ell(z')\, \chi(z, dz')$, $z \in S$, is a member of $D(Q)$ and solves the "Poisson equation" $Q\Phi = -\ell$.

(ii) For every $g \in C^{1,0}(\mathbb{R}^d \times S)$, the function $f(x, z) := \int_S g(x, z')\, \chi(z, dz')$, $\forall (x, z) \in \mathbb{R}^d \times S$, is a member of $C^{1,0}(\mathbb{R}^d \times S)$, and $\partial_j f(x, z) = \int_S \partial_j g(x, z')\, \chi(z, dz')$, $\forall (x, z) \in \mathbb{R}^d \times S$, $j = 1, 2, \ldots, d$.

Remark 2.7. The mapping $t \in [0, \infty) \mapsto [T_t \ell - \bar m \ell] \in C(S)$ is continuous (since $\{T_t\}$ is strongly continuous). Thus Condition 2.3 and Lemma 1.1.4 of [8] ensure that the Riemann integral on the right-hand side of (2.13) exists in $C(S)$.

Remark 2.8. Condition 2.4, together with standard existence and uniqueness results for SDEs (see Theorem 5.1.1 of [13]), ensures that (1.10) has an $\mathbb{R}^d$-valued pathwise unique strong solution $\{X^\varepsilon_t\}$ adapted to the filtration $\{\mathcal F^{W,Z^\varepsilon}_t\}$ defined by

$$\mathcal F^{W,Z^\varepsilon}_t := \sigma\{W_s, Z^\varepsilon_s,\ s \in [0, t]\} \vee \mathcal N(P), \qquad t \in [0, \infty),$$

where $\mathcal N(P)$ denotes the set of $P$-null events in $(\Omega, \mathcal F, P)$.

We next formulate the coefficients $b(x)$ and $c(x)$ of (1.11) so that $\bar X$ is the weak limit of $X^\varepsilon$ given by (1.10). To this end, use the signed Borel measures $\chi(z, \cdot)$ from Lemma 2.6 to define

$$b^i(x) := \int_S \left( G^i(x, z) + \sum_{j=1}^d \int_S \partial_j F^i(x, z')\, \chi(z, dz')\, F^j(x, z) \right) \bar m(dz), \tag{2.14}$$

$$a^{ij}(x) := \int_S \left( \int_S F^i(x, z')\, \chi(z, dz')\, F^j(x, z) + \int_S F^j(x, z')\, \chi(z, dz')\, F^i(x, z) \right) \bar m(dz) + [B B^T(x)]^{ij}, \tag{2.15}$$

for each $i, j = 1, \ldots, d$ and $x \in \mathbb{R}^d$. Also, define the diffusion operator $L$ with domain $D(L)$ by

$$D(L) := C_c^\infty(\mathbb{R}^d), \tag{2.16a}$$

$$L\varphi(x) := \sum_{i=1}^d b^i(x)\, \partial_i \varphi(x) + \frac{1}{2} \sum_{i,j=1}^d a^{ij}(x)\, \partial_i \partial_j \varphi(x), \qquad x \in \mathbb{R}^d, \quad \varphi \in D(L). \tag{2.16b}$$
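The objects entering (2.14) and (2.15) become concrete when $S$ is a finite set. The measures $\chi(z, \cdot)$ of Lemma 2.6 are then the rows of the deviation matrix $D = \int_0^\infty (e^{Qt} - \Pi)\, dt = (\Pi - Q)^{-1} - \Pi$, where $\Pi$ is the matrix whose rows all equal $\bar m$, and Lemma 2.6(i) is the matrix identity $Q(D\ell) = -\ell$ for centered $\ell$. A minimal numerical check (the 3-state generator is made up for illustration):

```python
import numpy as np

# hypothetical generator matrix Q of an irreducible 3-state Markov process
Q = np.array([[-2.0, 1.0, 1.0],
              [1.0, -3.0, 2.0],
              [0.5, 0.5, -1.0]])

# invariant probability m_bar: solve m_bar Q = 0 together with sum(m_bar) = 1
A = np.vstack([Q.T, np.ones(3)])
b = np.array([0.0, 0.0, 0.0, 1.0])
m_bar, *_ = np.linalg.lstsq(A, b, rcond=None)

Pi = np.outer(np.ones(3), m_bar)       # every row equals m_bar
D = np.linalg.inv(Pi - Q) - Pi         # deviation matrix; row z plays the role of chi(z, .)

# Poisson equation of Lemma 2.6(i): for ell with m_bar . ell = 0,
# the function Phi = D ell satisfies Q Phi = -ell
ell = np.array([1.0, -1.0, 0.0])
ell -= m_bar @ ell                      # center ell under m_bar
Phi = D @ ell
```

The identities $QD = \Pi - I$ and $\bar m D = 0$ pin $D$ down uniquely, mirroring the uniqueness claim of Lemma 2.6.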

For each $x \in \mathbb{R}^d$ we have $\bar m F^i(x, \cdot) = 0$ (see Condition 2.4). Thus, upon taking $\ell(\cdot) := F^i(x, \cdot)$ in (2.13), we get $\int_0^\infty (T_t F^i(x, \cdot))(z)\, dt = \int_S F^i(x, z')\, \chi(z, dz')$, $\forall z \in S$; now multiply each side by $F^j(x, z)$, integrate with respect to $\bar m(dz)$, and use Fubini's theorem to get

$$\int_S \int_S F^i(x, z')\, \chi(z, dz')\, F^j(x, z)\, \bar m(dz) = \int_0^\infty \int_S (T_t F^i(x, \cdot))(z)\, F^j(x, z)\, \bar m(dz)\, dt = \int_0^\infty E[F^i(x, \hat Z_t)\, F^j(x, \hat Z_0)]\, dt, \tag{2.17}$$

where $\hat Z$ is a stationary $S$-valued process corresponding to the transition probability $P_t(z, \Gamma)$ and initial measure $\bar m$. It follows from (2.17) that the first term on the right-hand side of (2.15) defines a $d \times d$ nonnegative definite symmetric matrix which we denote by $\hat a(x)$, so that $a(x)$ is nonnegative definite and symmetric for each $x \in \mathbb{R}^d$. Let $c: \mathbb{R}^d \to \mathbb{R}^{d \times d}$ in (1.11) be a fixed Borel-measurable square root of $\hat a(\cdot)$; then from (2.15) we have

$$a(x) = [c(x)\ \ B(x)] \begin{bmatrix} c^T(x) \\ B^T(x) \end{bmatrix}, \qquad x \in \mathbb{R}^d. \tag{2.18}$$

Remark 2.9. From (2.12), (2.14), (2.15), and Lemma 2.6 it follows that $b^i(\cdot)$ and $a^{ij}(\cdot)$ are continuous functions on $\mathbb{R}^d$, and there exists a constant $C \in [0, \infty)$ such that $|b^i(x)| \le C[1 + |x|]$, $|a^{ij}(x)| \le C[1 + |x|^2]$, $\forall x \in \mathbb{R}^d$, $\forall i, j = 1, 2, \ldots, d$.

Condition 2.10. For each $\mu \in \mathcal P(\mathbb{R}^d)$ there exists a unique $P_\mu \in \mathcal P(C_{\mathbb{R}^d}[0, \infty))$ which solves the Stroock–Varadhan martingale problem associated with $L$ (see Definition 5.4.10 of [15]) and which has initial distribution $\mu$, namely $P_\mu[\omega \in C_{\mathbb{R}^d}[0, \infty) : \omega_0 \in \Gamma] = \mu(\Gamma)$, $\forall \Gamma \in \mathcal B(\mathbb{R}^d)$.

It follows from Condition 2.10 and Corollary 5.3.4 of [8] that, for each $\mu \in \mathcal P(\mathbb{R}^d)$, there is a complete filtered probability space $(\bar\Omega, \bar{\mathcal F}, \{\bar{\mathcal F}_t\}, \bar P)$, on which is defined an

$\mathbb{R}^{d+r}$-valued $\{\bar{\mathcal F}_t\}$-standard Wiener process $\{(\bar V_t, \bar W_t), t \in [0, \infty)\}$ and an $\mathbb{R}^d$-valued $\{\bar{\mathcal F}_t\}$-adapted continuous process $\{\bar X_t, t \in [0, \infty)\}$, such that (i) $\{(\bar\Omega, \bar{\mathcal F}, \{\bar{\mathcal F}_t\}, \bar P), (\bar X_t, (\bar V_t, \bar W_t))\}$ is a (weak) solution of the stochastic integral equation corresponding to $((c\ B), b, \mu)$ (i.e. solves the SDE in (1.11) with $\mathcal L(\bar X_0) = \mu$; see the terminology on p. 291 of [8]) and (ii) $\mathcal L(\bar X) = P_\mu$. Moreover, for every solution $\{(\tilde\Omega, \tilde{\mathcal F}, \{\tilde{\mathcal F}_t\}, \tilde P), (\tilde X_t, (\tilde V_t, \tilde W_t))\}$ of the stochastic integral equation corresponding to $((c\ B), b, \mu)$ we have $\mathcal L(\tilde X) = P_\mu$. (By a "complete filtered probability space" $(\bar\Omega, \bar{\mathcal F}, \{\bar{\mathcal F}_t\}, \bar P)$ we always mean that $(\bar\Omega, \bar{\mathcal F}, \bar P)$ is a complete probability space carrying a filtration $\{\bar{\mathcal F}_t\}$ such that each $\bar{\mathcal F}_t$ includes all $\bar P$-null events in $\bar{\mathcal F}$.)

Remark 2.11. Conditions 2.1, 2.3–2.5, and 2.10 will typically be invoked together, and we therefore refer to these conditions collectively as Condition AI.

The next result will be needed to formulate the main result on convergence of nonlinear filters.

Theorem 2.12. Suppose Condition AI and fix an arbitrary Borel-measurable locally bounded mapping $c: \mathbb{R}^d \to \mathbb{R}^{d \times d}$ such that (2.18) holds. If $\{(\bar\Omega, \bar{\mathcal F}, \{\bar{\mathcal F}_t\}, \bar P), (\bar X_t, (\bar V_t, \bar W_t))\}$ is a solution of (1.11), then $\lim_{\varepsilon \to 0} \mathcal L(X^\varepsilon) = \mathcal L(\bar X)$ in $\mathcal P(C_{\mathbb{R}^d}[0, \infty))$.

Remark 2.13. The conditions postulated for Theorem 2.12 are similar to those postulated by Blankenship and Papanicolaou [6], who establish the preceding result in the case where $B \equiv 0$ in (1.10) and (1.11): see the (unnumbered) theorem on p. 449 of [6].

To study the convergence of nonlinear filters we postulate two additional conditions:

Condition 2.14. The mapping $c: \mathbb{R}^d \to \mathbb{R}^{d \times d}$ in the factorization (2.18) (which is the coefficient $c(\cdot)$ in (1.11)) is continuous, $c(x) c^T(x)$ is strictly positive definite for each $x \in \mathbb{R}^d$, and $|c^{ik}(x)| \le C[1 + |x|]$, $\forall x \in \mathbb{R}^d$, for a constant $C \in [0, \infty)$.

Condition 2.15. The sensor function $h: \mathbb{R}^d \to \mathbb{R}^r$ in (1.5) is continuous and uniformly bounded.

Remark 2.16. Fix some $T \in (0, \infty)$, and define the filtration of the observation process $\{Y^\varepsilon_t\}$ in (1.5) by

$$\mathcal F^{Y^\varepsilon}_t := \sigma\{Y^\varepsilon_s,\ 0 \le s \le t\} \vee \mathcal N(P), \qquad \forall t \in [0, T]. \tag{2.19}$$

By a standard Girsanov measure-change one finds a probability measure on $(\Omega, \mathcal F)$, having the same null events as $P$, with respect to which $\{Y^\varepsilon_t\}$ is a Brownian motion; it now follows (see, e.g. no. 4, pp. 21–22 of [13]) that the filtration $\{\mathcal F^{Y^\varepsilon}_t\}$ is right-continuous. Then Lemma 1.1 of [17] yields a $\mathcal P(\mathbb{R}^d)$-valued corlol process $\{\pi^\varepsilon_t, t \in [0, T]\}$ which is $\{\mathcal F^{Y^\varepsilon}_t\}$-adapted and satisfies

$$\pi^\varepsilon_t f = E[f(X^\varepsilon_t) \mid \mathcal F^{Y^\varepsilon}_t] \quad \text{a.s.} \tag{2.20}$$

for each $t \in [0, T]$ and $f \in B(\mathbb{R}^d)$. This process is called the nonlinear filter for the signal process $\{X^\varepsilon_t, t \in [0, T]\}$ given the observation $\{Y^\varepsilon_t, t \in [0, T]\}$. Our goal is

to study and characterize the asymptotic limit, in the sense of weak convergence, of the nonlinear filter $\{\pi^\varepsilon_t, t \in [0, T]\}$ as $\varepsilon \to 0$. To this end, fix some solution $\{(\bar\Omega, \bar{\mathcal F}, \{\bar{\mathcal F}_t\}, \bar P), (\bar X_t, (\bar W_t, \bar V_t))\}$ of (1.11) (as in Theorem 2.12), and define the $\mathbb{R}^r$-valued observation process $\{\bar Y_t, t \in [0, T]\}$ as in (1.7), together with its filtration

$$\mathcal F^{\bar Y}_t := \sigma\{\bar Y_s,\ 0 \le s \le t\} \vee \mathcal N(\bar P), \qquad \forall t \in [0, T]. \tag{2.21}$$

Then the filtration $\{\mathcal F^{\bar Y}_t\}$ is right-continuous (in the same way that $\{\mathcal F^{Y^\varepsilon}_t\}$ is seen to be right-continuous) and, again by Lemma 1.1 of [17], there is a $\mathcal P(\mathbb{R}^d)$-valued, corlol, and $\{\mathcal F^{\bar Y}_t\}$-adapted process $\{\bar\pi_t, t \in [0, T]\}$ such that

$$\bar\pi_t f = E[f(\bar X_t) \mid \mathcal F^{\bar Y}_t] \quad \text{a.s.} \tag{2.22}$$

for each $t \in [0, T]$ and $f \in B(\mathbb{R}^d)$. The process $\{\bar\pi_t, t \in [0, T]\}$ is the nonlinear filter for the signal $\{\bar X_t, t \in [0, T]\}$ corresponding to the observation $\{\bar Y_t, t \in [0, T]\}$.

Remark 2.17. We refer to the set of conditions given by Condition AI (see Remark 2.11) together with Conditions 2.14 and 2.15 as Condition AII.

With the preceding established, we can state the main result as follows:

Theorem 2.18. Suppose that Condition AII holds, and fix $T \in (0, \infty)$. Then $\{(\pi^\varepsilon_t, Y^\varepsilon_t), t \in [0, T]\}$ and $\{(\bar\pi_t, \bar Y_t), t \in [0, T]\}$, defined by (2.20) and (2.22), respectively, are $(\mathcal P(\mathbb{R}^d) \times \mathbb{R}^r)$-valued continuous processes, and $\lim_{\varepsilon \to 0^+} \mathcal L((\pi^\varepsilon, Y^\varepsilon)) = \mathcal L((\bar\pi, \bar Y))$ in $\mathcal P(C_{\mathcal P(\mathbb{R}^d) \times \mathbb{R}^r}[0, T])$.

3. A Convergence Theorem of Bhatt and Karandikar

In this section we state a result of Bhatt and Karandikar [3] which is a very effective tool for establishing weak convergence of stochastic processes, and which will be used to prove Theorems 2.12 and 2.18.

Remark 3.19. The following notation and terminology will be needed for this and later sections. Suppose that $E$ is a separable metric space:

(i) A set $M \subset B(E)$ separates points in $E$ when the equality $f(x) = f(y)$, $\forall f \in M$, for some $x, y \in E$, implies $x = y$; and a set $M \subset B(E)$ strongly separates points in $E$ when the convergence $\lim_n f(x_n) = f(x)$, $\forall f \in M$, for some $x, x_n \in E$, implies $\lim_n x_n = x$.

(ii) For $\varphi, \varphi_n \in B(E)$ write b.p.-$\lim_n \varphi_n = \varphi$ to indicate $\sup_n \|\varphi_n\| < \infty$ and $\lim_n \varphi_n(x) = \varphi(x)$, $\forall x \in E$. A set $M \subset B(E)$ is called b.p.-closed when b.p.-$\lim_n \varphi_n = \varphi$ for some sequence $\{\varphi_n\} \subset M$ implies $\varphi \in M$. If $M_\lambda \subset B(E)$ is a given collection of b.p.-closed sets, then $\cap_\lambda M_\lambda$ is of course b.p.-closed; the b.p.-closure of $M \subset B(E)$ is defined to be the intersection of all b.p.-closed $M_\lambda \subset B(E)$ such that $M \subset M_\lambda$. Likewise, a set $M \subset B(E) \times B(E)$ is called b.p.-closed when b.p.-$\lim_n \varphi_n = \varphi$ and b.p.-$\lim_n \psi_n = \psi$ for a sequence $\{(\varphi_n, \psi_n)\} \subset M$ implies $(\varphi, \psi) \in M$; and the b.p.-closure of a set $M \subset B(E) \times B(E)$ is defined to be the intersection of all b.p.-closed sets $M_\lambda \subset B(E) \times B(E)$ such that $M \subset M_\lambda$.

(iii) A set $M \subset \bar C(E)$ is called separating when the equality $\nu_1 \varphi = \nu_2 \varphi$, $\forall \varphi \in M$, for some $\nu_1, \nu_2 \in \mathcal P(E)$, implies that $\nu_1 = \nu_2$; and is called convergence determining when the convergence $\lim_{n \to \infty} \nu_n \varphi = \nu_0 \varphi$, $\forall \varphi \in M$, for some sequence $\{\nu_n, n = 0, 1, 2, \ldots\} \subset \mathcal P(E)$, implies weak convergence of $\nu_n$ to $\nu_0$ as $n \to \infty$.

(iv) Let $A \subset B(E) \times B(E)$ be a relation with domain $D(A)$, and let $\mu \in \mathcal P(E)$. Then a progressively measurable solution of the martingale problem for $A$ (for $(A, \mu)$) is some pair $\{(\tilde\Omega, \tilde{\mathcal F}, \{\tilde{\mathcal F}_t\}, \tilde P), (\tilde X_t)\}$, in which $(\tilde\Omega, \tilde{\mathcal F}, \{\tilde{\mathcal F}_t\}, \tilde P)$ is a complete filtered probability space and $\{\tilde X_t\}$ is an $E$-valued $\{\tilde{\mathcal F}_t\}$-progressively measurable process such that $f(\tilde X_t) - \int_0^t A f(\tilde X_s)\, ds$ is an $\{\tilde{\mathcal F}_t\}$-martingale for each $f \in D(A)$ (and $\mathcal L(\tilde X_0) = \mu$). If $\{(\tilde\Omega, \tilde{\mathcal F}, \{\tilde{\mathcal F}_t\}, \tilde P), (\tilde X_t)\}$ is a progressively measurable solution of the martingale problem for $A$ (for $(A, \mu)$) and the $E$-valued process $\{\tilde X_t\}$ has corlol paths, then $\{(\tilde\Omega, \tilde{\mathcal F}, \{\tilde{\mathcal F}_t\}, \tilde P), (\tilde X_t)\}$ is a solution of the corlol martingale problem for $A$ (for $(A, \mu)$). The martingale problem for $(A, \mu)$ has the property of existence when there exists some progressively measurable solution of the martingale problem for $(A, \mu)$, and has the property of uniqueness when, given any two progressively measurable solutions $\{(\tilde\Omega, \tilde{\mathcal F}, \{\tilde{\mathcal F}_t\}, \tilde P), (\tilde X_t)\}$ and $\{(\bar\Omega, \bar{\mathcal F}, \{\bar{\mathcal F}_t\}, \bar P), (\bar X_t)\}$ of the martingale problem for $(A, \mu)$, the $E$-valued processes $\tilde X$ and $\bar X$ necessarily have identical finite-dimensional distributions. The martingale problem for $(A, \mu)$ is called well-posed when it has the properties of both existence and uniqueness. Finally, the martingale problem for $A$ is well-posed when the martingale problem for $(A, \mu)$ is well-posed for each $\mu \in \mathcal P(E)$. The notions of existence, uniqueness, and well-posedness of the corlol martingale problem for $(A, \mu)$ and for $A$ are similarly formulated.

Theorem 3.20 (Theorem 2.1 and Remark 2.2 of [3]). Suppose that $E$ is a complete separable metric space, $\mu \in \mathcal P(E)$, and $A: D(A) \subset \bar C(E) \to \bar C(E)$ is a linear operator such that the following conditions hold:

(I) There is a countable set $\{g_k\} \subset D(A)$ such that (i) $\{(f, A f) : f \in D(A)\}$ is a subset of the b.p.-closure of $\{(g_k, A g_k)\}$, and (ii) the set $\{g_k\}$ strongly separates points in $E$.

(II) $D(A)$ is an algebra that vanishes nowhere.

(III) The martingale problem for $A$ is well-posed (see Remark 3.19(iv)).

(IV) The martingale problem for $(A, \mu)$ has a solution $\{X_t, t \in [0, \infty)\}$ with corlol paths.

(V) There is a sequence $\{X_n(t), t \in [0, \infty)\}$, $n = 1, 2, \ldots$, of $E$-valued processes with corlol paths such that $\{\mathcal L(X_n(t)), n = 1, 2, \ldots\}$ is a tight sequence in $\mathcal P(E)$ for each $t \in [0, \infty)$, and $\lim_n \mathcal L(X_n(0)) = \mu$ in $\mathcal P(E)$.

(VI) For each $f \in D(A)$, there exist $\mathbb{R}$-valued progressively measurable processes $\{(U_n(t), \mathcal F^n_t), t \in [0, \infty)\}$ and $\{(V_n(t), \mathcal F^n_t), t \in [0, \infty)\}$ on $(\Omega^n, \mathcal F^n, P^n)$, $n = 1, 2, \ldots$, such that

$$U_n(t) - \int_0^t V_n(s)\, ds, \quad t \in [0, \infty), \quad \text{is an } \{\mathcal F^n_t\}\text{-martingale}, \tag{3.23}$$

$$\lim_n E^n \Big[ \sup_{t \in [0, t_1]} |U_n(t) - f(X_n(t))| \Big] = 0 \quad \text{for each } t_1 \in [0, \infty), \tag{3.24}$$

$$\sup_n E^n \Big[ \int_0^{t_1} |V_n(s)|^p\, ds \Big]^{1/p} < \infty \quad \text{for some } p \in (1, \infty) \text{ and each } t_1 \in [0, \infty), \tag{3.25}$$

$$\lim_n E^n[|A f(X_n(t)) - V_n(t)|] = 0 \quad \text{for each } t \in [0, \infty). \tag{3.26}$$

Then $\lim_{n \to \infty} \mathcal L(X_n) = \mathcal L(X)$ in $\mathcal P(D_E[0, \infty))$.
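Theorem 3.20 is the tool used in the next section to prove the convergence of $X^\varepsilon$ to $\bar X$ asserted in Theorem 2.12. As a numerical aside, the limit SDE (1.11) itself is easy to simulate with an Euler-Maruyama scheme; the sketch below is generic, and the toy coefficients at the end are illustrative placeholders, not the coefficients (2.14) and (2.15):

```python
import numpy as np

def euler_limit_sde(b, c, B, x0, T=1.0, dt=1e-3, seed=0):
    """Euler-Maruyama scheme for dX = b(X) dt + c(X) dV + B(X) dW, as in
    (1.11), driven by independent Wiener processes V (R^d) and W (R^r).
    b, c, B are caller-supplied; nothing here is specific to the paper."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    d = len(x)
    r = B(x).shape[1]
    n = int(T / dt)
    path = np.empty((n + 1, d))
    path[0] = x
    for k in range(n):
        dV = np.sqrt(dt) * rng.standard_normal(d)
        dW = np.sqrt(dt) * rng.standard_normal(r)
        x = x + b(x) * dt + c(x) @ dV + B(x) @ dW
        path[k + 1] = x
    return path

# toy scalar coefficients (d = r = 1), purely for illustration
path = euler_limit_sde(b=lambda x: -x,
                       c=lambda x: np.eye(1),
                       B=lambda x: 0.5 * np.eye(1),
                       x0=[0.0])
```

Note that the two noise terms $c\, d\bar V$ and $B\, d\bar W$ must use independent Gaussian increments, reflecting the independence of $\bar V$ and $\bar W$ in (1.11).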

4. Proof of Theorem 2.12

In this section we establish Theorem 2.12, which extends the theorem of Blankenship and Papanicolaou [6, Section 4, p. 449] to include the third term on the right of (1.10) (recall Remark 2.13). Our tool of proof is Theorem 3.20 (which will also be used for the main result on convergence of nonlinear filters). In fact, the full strength of Theorem 3.20 is not actually needed in this section, since it is really meant for proving weak convergence of processes taking values in "large" (i.e. not locally compact) spaces, and in Theorem 2.12 we are only looking at convergence of processes taking values in $\mathbb{R}^d$. Nevertheless, even in this case, the use of Theorem 3.20 results in a considerable gain in simplicity and transparency with respect to the arguments in [6], which rely on the more classical methods of weak convergence found in [5]. In the course of this section we make explicit the construction of certain functions which will be essential not only for proving Theorem 2.12, but also for establishing the main result on convergence of nonlinear filters (see the functions $f^{\varepsilon,\varphi}$ in Proposition 4.22).

Recalling the infinitesimal generator $Q$ of the Markov process $\{Z_t\}$ in Condition 2.1, put

$$\tilde D := \{ f \in C_c^{2,0}(\mathbb{R}^d \times S) : f(x, \cdot) \in D(Q),\ \forall x \in \mathbb{R}^d, \text{ and the mapping } (x, z) \in \mathbb{R}^d \times S \mapsto Q[f(x, \cdot)](z) \in \mathbb{R} \text{ is a member of } C_c^{2,0}(\mathbb{R}^d \times S) \}. \tag{4.27}$$

For each $\varepsilon \in (0, 1]$ define the operator $C^\varepsilon$ with domain $\tilde D$ as follows:

$$C^\varepsilon f(x, z) := \frac{1}{\varepsilon^2} Q[f(x, \cdot)](z) + \sum_{i=1}^d \left( \frac{1}{\varepsilon} F^i(x, z) + G^i(x, z) \right) \partial_i f(x, z) + \frac{1}{2} \sum_{i,j=1}^d [B B^T(x)]^{ij}\, \partial_i \partial_j f(x, z), \tag{4.28}$$

for all $(x, z) \in \mathbb{R}^d \times S$ and $f \in \tilde D$. Note that $\tilde D$ is a linear subspace of $C_c^{2,0}(\mathbb{R}^d \times S)$, and that $C^\varepsilon f$ is an element of $C_c(\mathbb{R}^d \times S)$ for each $f \in \tilde D$, so that $(C^\varepsilon, \tilde D)$ is a linear operator on $C_c(\mathbb{R}^d \times S)$ for every $\varepsilon \in (0, 1]$. The next proposition, whose proof is given in Section 7, establishes that $(X^\varepsilon, Z^\varepsilon)$ solves the martingale problem for the operator $(C^\varepsilon, \tilde D)$:

Proposition 4.21. Suppose Condition AI (see Remark 2.11). For every $\varepsilon \in (0, 1]$ and $f \in \tilde D$ the process $M^{\varepsilon,f}_t := f(X^\varepsilon_t, Z^\varepsilon_t) - \int_0^t C^\varepsilon f(X^\varepsilon_s, Z^\varepsilon_s)\, ds$, $\forall t \in [0, \infty)$, is an $\{\mathcal F^{W,Z^\varepsilon}_t\}$-martingale, where $\mathcal F^{W,Z^\varepsilon}_t$ is defined in Remark 2.8.

The next result, which is also proved in Section 7, gives functions which will be needed for checking (VI) of Theorem 3.20 (and which will also be needed for establishing Theorem 2.18):

Proposition 4.22. Suppose Condition AI (see Remark 2.11). Then, for each $\varphi \in C_c^\infty(\mathbb{R}^d)$, there exist functions $f_1^\varphi \in C_c^{3,0}(\mathbb{R}^d \times S)$ and $f_2^\varphi \in C_c^{2,0}(\mathbb{R}^d \times S)$ with the following property: if $f^{\varepsilon,\varphi}(\cdot)$ is defined for each $\varepsilon \in (0, 1]$ by

$$f^{\varepsilon,\varphi}(x, z) := \varphi(x) + \varepsilon f_1^\varphi(x, z) + \varepsilon^2 f_2^\varphi(x, z), \qquad \forall (x, z) \in \mathbb{R}^d \times S, \tag{4.29}$$

then $f^{\varepsilon,\varphi} \in \tilde D$, and $C^\varepsilon f^{\varepsilon,\varphi}$ has the form $C^\varepsilon f^{\varepsilon,\varphi}(x, z) = L\varphi(x) + \varepsilon \gamma_1^\varphi(x, z) + \varepsilon^2 \gamma_2^\varphi(x, z)$, $\forall (x, z, \varepsilon) \in \mathbb{R}^d \times S \times (0, 1]$, for some $\gamma_1^\varphi, \gamma_2^\varphi \in \bar C(\mathbb{R}^d \times S)$.

Now we are ready to establish Theorem 2.12. We do so using Theorem 3.20, and identify $E$, $\mu$, $A$, and $D(A)$ in Theorem 3.20 with $\mathbb{R}^d$, $\delta_{x_0}$ (for $x_0$ in (1.10) and (1.11)), $L$, and $D(L) := C_c^\infty(\mathbb{R}^d)$, respectively (see (2.16)).

Verification of (I) and (II) in Theorem 3.20. (I) is a consequence of the fact that $\{(\varphi, L\varphi), \varphi \in C_c^\infty(\mathbb{R}^d)\}$ is separable in the supremum norm of $\bar C(\mathbb{R}^d) \times \bar C(\mathbb{R}^d)$ (which follows from Lemma 9.48) and the fact that $C_c^\infty(\mathbb{R}^d)$ strongly separates points in $\mathbb{R}^d$. (II) is clear.

Verification of (III) and (IV) in Theorem 3.20. From Condition 2.10 and Proposition 5.3.5 of [8] (together with the bounds on $b^i$ and $a^{ij}$ in Remark 2.9) it follows that the martingale problem for $L$ is well-posed (in the sense of Remark 3.19(iv)), as required to check (III). As for (IV), we identify $X$ in Theorem 3.20(IV) with the continuous (hence corlol) process $\bar X$ in the postulated solution $\{(\bar\Omega, \bar{\mathcal F}, \{\bar{\mathcal F}_t\}, \bar P), (\bar X_t, (\bar V_t, \bar W_t))\}$ of (1.11). Now fix a sequence $\{\varepsilon_n\} \subset (0, 1]$ with $\lim_n \varepsilon_n = 0$, and identify the continuous (hence corlol) process $X^{\varepsilon_n}$ given by (1.10) with $X_n$ in Theorem 3.20(V) and (VI).

Verification of (V) in Theorem 3.20. From (1.10) we have $\mathcal L(X^{\varepsilon_n}_0) = \delta_{x_0}$. The remainder of (V) is verified by the next result, which is established in Section 7:

Proposition 4.23. Suppose that Condition AI holds (see Remark 2.11). Then the sequence $\{X^{\varepsilon_n}_t, n = 1, 2, \ldots\}$ is tight in $\mathcal P(\mathbb{R}^d)$ for each $t \in [0, \infty)$.

Verification of (VI) in Theorem 3.20. Fix some $\varphi \in C_c^\infty(\mathbb{R}^d)$. For each $n = 1, 2, \ldots$, put

$$U_n(t) := f^{\varepsilon_n,\varphi}(X^{\varepsilon_n}_t, Z^{\varepsilon_n}_t), \qquad V_n(t) := C^{\varepsilon_n} f^{\varepsilon_n,\varphi}(X^{\varepsilon_n}_t, Z^{\varepsilon_n}_t),$$

where $f^{\varepsilon_n,\varphi}$ is given by Proposition 4.22 and $C^{\varepsilon_n}$ is the operator defined by (4.28). Then $U_n$ and $V_n$ are $\{\mathcal F^{W,Z^{\varepsilon_n}}_t\}$-progressively measurable (see Remark 2.8), and (3.23) follows from Propositions 4.21 and 4.22. By Proposition 4.22 we have $U_n(t) - \varphi(X^{\varepsilon_n}_t) = \varepsilon_n f_1^\varphi(X^{\varepsilon_n}_t, Z^{\varepsilon_n}_t) + \varepsilon_n^2 f_2^\varphi(X^{\varepsilon_n}_t, Z^{\varepsilon_n}_t)$, and (3.24) follows from the uniform boundedness of $f_1^\varphi$ and $f_2^\varphi$ over $\mathbb{R}^d \times S$ and $\varepsilon_n \to 0$. Again by Proposition 4.22, we have $V_n(t) = L\varphi(X^{\varepsilon_n}_t) + \varepsilon_n \gamma_1^\varphi(X^{\varepsilon_n}_t, Z^{\varepsilon_n}_t) + \varepsilon_n^2 \gamma_2^\varphi(X^{\varepsilon_n}_t, Z^{\varepsilon_n}_t)$. Now (3.25) and (3.26) follow from the uniform boundedness of $\gamma_1^\varphi$ and $\gamma_2^\varphi$ over $\mathbb{R}^d \times S$.

All conditions of Theorem 3.20 have now been checked, and thus $\lim_n \mathcal L(X^{\varepsilon_n}) = \mathcal L(\bar X)$ in $\mathcal P(D_{\mathbb{R}^d}[0, \infty))$. Since $X^{\varepsilon_n}$ and $\bar X$ are continuous processes, we then get $\lim_n \mathcal L(X^{\varepsilon_n}) = \mathcal L(\bar X)$ in $\mathcal P(C_{\mathbb{R}^d}[0, \infty))$ (see Corollary 3.3.2 of [8]), as required.

5. Proof of Theorem 2.18

We begin by formulating the general notion of a (weak) solution of the normalized (or Fujisaki–Kallianpur–Kunita–Kushner–Stratonovich) equation of nonlinear filtering:

Condition 5.24. Let $E$ be a metric space, and let $c_k \in B(E)$, $k = 1, 2, \ldots, r$. Also let $G, H_k \subset \bar C(E) \times \bar C(E)$, $k = 1, 2, \ldots, r$, be linear operators with a common domain $D(G, H) \subset \bar C(E)$.

Definition 5.25. Suppose Condition 5.24. The pair $\{(\tilde\Omega, \tilde{\mathcal F}, \{\tilde{\mathcal F}_t\}, \tilde P), (\tilde\nu_t, \tilde W_t)\}$ is a solution of the normalized filter equation corresponding to $(E; G, H, c)$ when the following hold:

1. $(\tilde\Omega, \tilde{\mathcal F}, \{\tilde{\mathcal F}_t\}, \tilde P)$ is a complete filtered probability space;
2. $\{\tilde W_t, t \in [0, T]\}$ is a standard $\mathbb{R}^r$-valued $\{\tilde{\mathcal F}_t\}$-Wiener process on $(\tilde\Omega, \tilde{\mathcal F}, \tilde P)$;
3. $\{\tilde\nu_t, t \in [0, T]\}$ is a $\mathcal P(E)$-valued, corlol, $\{\tilde{\mathcal F}_t\}$-adapted process such that, for every $\varphi \in D(G, H)$, we have

$$\tilde\nu_t \varphi = \tilde\nu_0 \varphi + \int_0^t \tilde\nu_s(G\varphi)\, ds + \sum_{k=1}^r \int_0^t R_{H_k}(\varphi, c_k, \tilde\nu_s)\, d\tilde W^k_s, \qquad t \in [0, T], \tag{5.30}$$

where

$$R_{H_k}(\varphi, c_k, \nu) := \nu(c_k \varphi + H_k \varphi) - (\nu c_k)(\nu \varphi), \qquad \forall \varphi \in D(G, H), \quad \forall \nu \in \mathcal P(E). \tag{5.31}$$
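For a finite state space $E$ and a scalar observation ($r = 1$), equation (5.30) reduces to the classical Wonham filter. The sketch below makes Definition 5.25 concrete under simplifying assumptions that are not from the paper: $G$ is a made-up two-state generator matrix, $c = h$ holds the sensor values, and $H \equiv 0$ (no observation feedback):

```python
import numpy as np

def wonham_step(nu, Q, h, dy, dt):
    """One Euler step of the normalized filter equation (5.30) on a finite
    state space, with G = Q (a generator matrix), c = h, and H = 0, so that
    R(phi, c, nu) = nu(h phi) - nu(h) nu(phi).  nu is a probability vector."""
    dI = dy - (nu @ h) * dt                      # innovations increment
    nu = nu + (nu @ Q) * dt + nu * (h - nu @ h) * dI
    nu = np.clip(nu, 0.0, None)                  # guard against Euler overshoot
    return nu / nu.sum()

Q = np.array([[-1.0, 1.0], [2.0, -2.0]])         # hypothetical 2-state generator
h = np.array([0.0, 1.0])                         # hypothetical sensor values
nu = np.array([0.5, 0.5])
rng = np.random.default_rng(3)
dt = 0.01
for _ in range(100):
    dy = h[1] * dt + np.sqrt(dt) * rng.standard_normal()  # signal held in state 1
    nu = wonham_step(nu, Q, h, dy, dt)
```

Here $\nu_t(\varphi)$ for the indicator $\varphi = \mathbf 1_{\{x\}}$ is the posterior probability of state $x$, so the vector recursion above is exactly (5.30) applied to indicator functions.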

Remark 5.26. We have given a fairly general formulation of the normalized filter equation because later we will need to interpret $E$ as both $\mathbb{R}^d \times S$ and $\mathbb{R}^d$, and will have to make suitable choices for the operators $G$ and $H$ in Condition 5.24. When $E := \mathbb{R}^d$ and $D(G, H) := C_c^\infty(\mathbb{R}^d)$ in Condition 5.24, the $\mathcal P(\mathbb{R}^d)$-valued process $\{\tilde\nu_t\}$ in Definition 5.25 is continuous. This follows from (5.30), since $\{\tilde\nu_t \varphi\}$ is continuous for each $\varphi$ in the set $C_c^\infty(\mathbb{R}^d)$, which is convergence determining (by Problem 3.11.11 of [8] and the fact that $C_c^\infty(\mathbb{R}^d)$ is dense in $\hat C(\mathbb{R}^d)$ with respect to the supremum norm).

Define $\mathbb{R}^r$-valued innovations processes $I^\varepsilon$, $\varepsilon \in (0, 1]$, and $\bar I$ by

$$I^\varepsilon_t := Y^\varepsilon_t - \int_0^t \pi^\varepsilon_u h\, du, \qquad \bar I_t := \bar Y_t - \int_0^t \bar\pi_u h\, du, \qquad t \in [0, T]. \tag{5.32}$$

Remark 5.27. Theorem VI.8.4(i) of [19] shows that $\{(I^\varepsilon_t, \mathcal F^{Y^\varepsilon}_t)\}$ and $\{(\bar I_t, \mathcal F^{\bar Y}_t)\}$ are $\mathbb{R}^r$-valued Wiener processes on $(\Omega, \mathcal F, P)$ and $(\bar\Omega, \bar{\mathcal F}, \bar P)$, respectively.

We next introduce a $\mathcal P(\mathbb{R}^d \times S)$-valued process $\mu^\varepsilon_t$, which is auxiliary to the nonlinear filter $\pi^\varepsilon$ defined by (2.20). The $(\mathbb{R}^d \times S)$-valued process $(X^\varepsilon, Z^\varepsilon)$ is corlol (recall Remark 2.2), hence Lemma 1.1 of [17] gives a $\mathcal P(\mathbb{R}^d \times S)$-valued, corlol, and $\{\mathcal F^{Y^\varepsilon}_t\}$-adapted process $\{\mu^\varepsilon_t, t \in [0, T]\}$ on $(\Omega, \mathcal F, P)$ such that

$$\mu^\varepsilon_t g = E[g(X^\varepsilon_t, Z^\varepsilon_t) \mid \mathcal F^{Y^\varepsilon}_t] \quad \text{a.s.}, \qquad \forall t \in [0, T], \quad g \in B(\mathbb{R}^d \times S). \tag{5.33}$$

Thus, the nonlinear filter $\{\pi^\varepsilon_t\}$ and the process $\{\mu^\varepsilon_t\}$ are related by

$$\pi^\varepsilon_t f = \mu^\varepsilon_t (f \otimes 1) \quad \text{a.s.}, \qquad \forall t \in [0, T], \quad f \in B(\mathbb{R}^d). \tag{5.34}$$

Define the operators $E_k$, $k = 1, 2, \ldots, r$, on $\tilde D$ (recall (4.27)) by

$$E_k f(x, z) := \sum_{j=1}^d B^{jk}(x)\, \partial_j f(x, z), \qquad (x, z) \in \mathbb{R}^d \times S, \quad f \in \tilde D, \quad k = 1, \ldots, r. \tag{5.35}$$

We are going to show that $\{(\mu_t^\varepsilon, I_t^\varepsilon)\}$ is a solution of a normalized filter equation of the form in Definition 5.25. The key is the following lemma, whose proof is given in Section 8:

Lemma 5.28. Suppose Condition AII (Remark 2.17). Then, for each $\varepsilon \in (0, 1]$ and $f \in \tilde{D}$, we have $\langle M^{\varepsilon, f}, W^k \rangle_t = \int_0^t E_k f(X_s^\varepsilon, Z_s^\varepsilon)\,ds$, $0 \le t \le T$, $k = 1, \ldots, r$ ($M_t^{\varepsilon, f}$ is defined in Proposition 4.21).

Remark 5.29. (i) From (5.32) and (5.34) we have
$$I_t^\varepsilon = Y_t^\varepsilon - \int_0^t \mu_s^\varepsilon(h \otimes 1)\,ds, \qquad t \in [0, T]. \tag{5.36}$$
Put
$$E := \mathbb{R}^d \times S, \quad c^k := h^k \otimes 1, \quad H_k := E_k, \quad k = 1, 2, \ldots, r, \quad G := C^\varepsilon, \quad D(G, H) \equiv D(C^\varepsilon, E) := \tilde{D} \tag{5.37}$$
in Condition 5.24. Then, from (5.36), Lemma 5.28, Proposition 4.21, Remark 5.27, and Theorem 4.1 of [10], we see that $\mu^\varepsilon$ satisfies the following relation: for each $f \in \tilde{D}$


we have
$$\mu_t^\varepsilon f = \mu_0^\varepsilon f + \int_0^t \mu_s^\varepsilon(C^\varepsilon f)\,ds + \sum_{k=1}^r \int_0^t \big[\mu_s^\varepsilon((h^k \otimes 1) f + E_k f) - (\mu_s^\varepsilon(h^k \otimes 1))(\mu_s^\varepsilon f)\big]\,d(I^\varepsilon)_s^k \tag{5.38}$$
for every $t \in [0, T]$. That is, $\{(\Omega, \mathcal{F}, \{\mathcal{F}_t^{Y^\varepsilon}\}, P), (\mu_t^\varepsilon, I_t^\varepsilon)\}$ is a solution of the normalized filter equation corresponding to $(\mathbb{R}^d \times S; C^\varepsilon, E, h \otimes 1)$ (in the sense of Definition 5.25).

(ii) It follows from (i) that the process $\{\mu_t^\varepsilon f\}$ is $\mathbb{R}$-valued and continuous for each $f \in \tilde{D}$. Now fix some $\varphi \in C_c^\infty(\mathbb{R}^d)$. Since $1 \in D(Q)$ with $Q1 \equiv 0$ (see Condition 2.1), by (4.27) we have $f := \varphi \otimes 1 \in \tilde{D}$; thus, from (5.34), we find $\pi_t^\varepsilon \varphi = \mu_t^\varepsilon f$, $\forall t \in [0, T]$, and so the $\mathbb{R}$-valued process $\{\pi_t^\varepsilon \varphi\}$ is continuous for each $\varphi \in C_c^\infty(\mathbb{R}^d)$. Then, as in Remark 5.26, we see that the $\mathcal{P}(\mathbb{R}^d)$-valued process $\{\pi_t^\varepsilon\}$ is continuous.

Remark 5.30. Since $\{(\bar{\Omega}, \bar{\mathcal{F}}, \{\bar{\mathcal{F}}_t\}, \bar{P}), (\bar{X}_t, (\bar{W}_t, \bar{V}_t))\}$ is a solution of (1.11) (recall Remark 2.16), it follows from (2.18) and Itô's formula that the process $\bar{M}_t^\varphi := \varphi(\bar{X}_t) - \int_0^t L\varphi(\bar{X}_s)\,ds$, $t \in [0, T]$, is an $\{\bar{\mathcal{F}}_t\}$-martingale for each $\varphi \in C_c^\infty(\mathbb{R}^d)$, where $L$ is defined by (2.16), (2.15), and (2.14). Moreover, for
$$B_k \varphi(x) := \sum_{j=1}^d B_{jk}(x)\,\partial_j \varphi(x), \qquad \forall x \in \mathbb{R}^d,\ \forall \varphi \in D(B) := C_c^\infty(\mathbb{R}^d),\ \forall k = 1, 2, \ldots, r, \tag{5.39}$$
one sees, again from Itô's formula, that $\langle \bar{M}^\varphi, \bar{W}^k \rangle_t = \int_0^t B_k \varphi(\bar{X}_s)\,ds$ for each $\varphi \in C_c^\infty(\mathbb{R}^d)$, $k = 1, 2, \ldots, r$, $t \in [0, T]$. Now identify
$$E := \mathbb{R}^d, \quad c^k := h^k, \quad H_k := B_k, \quad k = 1, 2, \ldots, r, \quad G := L, \quad D(G, H) \equiv D(L, B) := C_c^\infty(\mathbb{R}^d) \tag{5.40}$$
in Condition 5.24. It follows from Remark 5.27 and Theorem 4.1 of [10] that $\{(\bar{\Omega}, \bar{\mathcal{F}}, \{\mathcal{F}_t^{\bar{Y}}\}, \bar{P}), (\bar{\pi}_t, \bar{I}_t)\}$ is a solution of the normalized filter equation corresponding to $(\mathbb{R}^d; L, B, h)$ (in the sense of Definition 5.25), and (see Remark 5.26) the $\mathcal{P}(\mathbb{R}^d)$-valued process $\{\bar{\pi}_t\}$ is continuous.

Remark 5.31. From Remarks 5.29(ii) and 5.30, we know that $\{(\pi_t^\varepsilon, Y_t^\varepsilon), t \in [0, T]\}$ and $\{(\bar{\pi}_t, \bar{Y}_t), t \in [0, T]\}$ are $(\mathcal{P}(\mathbb{R}^d) \times \mathbb{R}^r)$-valued continuous processes. Since the mapping
$$(\nu_\cdot, V_\cdot) \in C_{\mathcal{P}(\mathbb{R}^d) \times \mathbb{R}^r}[0, T] \longmapsto \Big(\nu_\cdot,\ V_\cdot + \int_0^\cdot \nu_s h\,ds\Big) \in C_{\mathcal{P}(\mathbb{R}^d) \times \mathbb{R}^r}[0, T]$$
is easily seen to be continuous, we see from (5.32) and the Continuous Mapping Theorem (e.g. Corollary 3.1.9 of [8]) that Theorem 2.18 follows once we have shown:

Theorem 5.32. Suppose Condition AII holds (see Remark 2.17). Then $\lim_{\varepsilon \to 0^+} \mathcal{L}((\pi^\varepsilon, I^\varepsilon)) = \mathcal{L}((\bar{\pi}, \bar{I}))$ in $\mathcal{P}(C_{\mathcal{P}(\mathbb{R}^d) \times \mathbb{R}^r}[0, T])$.


We are going to use Theorem 3.20 to establish Theorem 5.32, in a way that is very similar to the earlier application of Theorem 3.20 in the proof of Theorem 2.12. To this end, we use a martingale problem introduced by Hijab [12, page 132] which has a nice relationship with solutions of the normalized filter equation (this martingale problem is also useful for studying the Fleming–Viot process of population genetics; see [7] and the references therein). Suppose Condition 5.24, and define a linear operator $\mathcal{H}(G, H, c)$ on $B(\mathcal{P}(E))$ as follows. Put
$$D(\mathcal{H}(G, H, c)) := \{\Phi \in \bar{C}(\mathcal{P}(E)) : \Phi(\nu) = H(\nu\varphi_1, \nu\varphi_2, \ldots, \nu\varphi_n),\ \forall \nu \in \mathcal{P}(E),\ \text{for some positive integer } n,\ (\varphi_i)_{i=1}^n \subset D(G, H),\ H \in C_c^\infty(\mathbb{R}^n)\}, \tag{5.41}$$
and for each $\Phi \in D(\mathcal{H}(G, H, c))$ define (recall (5.31))
$$\mathcal{H}(G, H, c)(\Phi)(\nu) := \sum_{i=1}^n \partial_i H(\nu\varphi_1, \ldots, \nu\varphi_n)\,\nu(G\varphi_i) + \frac{1}{2} \sum_{k=1}^r \sum_{i,j=1}^n \partial_i \partial_j H(\nu\varphi_1, \ldots, \nu\varphi_n)\, R_{H_k}(\varphi_i, c^k, \nu)\, R_{H_k}(\varphi_j, c^k, \nu), \qquad \forall \nu \in \mathcal{P}(E). \tag{5.42}$$
Properties of $D(\mathcal{H}(G, H, c))$ are summarized in the following lemma, which is proved in Section 8.

Lemma 5.33. Suppose that Condition 5.24 holds. Then: (i) $D(\mathcal{H}(G, H, c))$ is a subalgebra of $\bar{C}(\mathcal{P}(E))$ that includes constant functions; (ii) if $D(G, H)$ is separating, then $D(\mathcal{H}(G, H, c))$ separates points in $\mathcal{P}(E)$; (iii) if $D(G, H)$ is convergence determining, then $D(\mathcal{H}(G, H, c))$ strongly separates points in $\mathcal{P}(E)$ (recall Remark 3.19(i), (iii)).

Again following Hijab [12], put
$$D(\hat{\mathcal{H}}(G, H, c)) := \operatorname{span}\{\Phi \otimes g : \Phi \in D(\mathcal{H}(G, H, c)),\ g \in \operatorname{span}\{1, C_c^\infty(\mathbb{R}^r)\}\}, \tag{5.43a}$$
$$\hat{\mathcal{H}}(G, H, c)(\Psi)(\nu, y) := g(y)\,\mathcal{H}(G, H, c)(\Phi)(\nu) + \sum_{i=1}^n \sum_{k=1}^r \partial_i H(\nu\varphi_1, \ldots, \nu\varphi_n)\, R_{H_k}(\varphi_i, c^k, \nu)\,\partial_k g(y) + \frac{1}{2}\,\Phi(\nu)\,\Delta g(y), \qquad \forall \Psi := \Phi \otimes g \in D(\hat{\mathcal{H}}(G, H, c)),\ (\nu, y) \in \mathcal{P}(E) \times \mathbb{R}^r. \tag{5.43b}$$

Remark 5.34. Observe that $\hat{\mathcal{H}}(G, H, c) \subset \bar{C}(\mathcal{P}(E) \times \mathbb{R}^r) \times \bar{C}(\mathcal{P}(E) \times \mathbb{R}^r)$ in the case where $c^k \in \bar{C}(E)$, $k = 1, 2, \ldots, r$, in Condition 5.24.

The following result, which is a direct consequence of Itô's formula, shows that the $(\mathcal{P}(E) \times \mathbb{R}^r)$-valued process $(\nu, V)$ in Definition 5.25 solves the martingale problem for the operator $\hat{\mathcal{H}}(G, H, c)$:


Lemma 5.35. Suppose that Condition 5.24 holds, and let $\{(\Omega, \mathcal{F}, \{\mathcal{F}_t\}, P), (\nu_t, V_t)\}$ be a solution of the normalized filter equation corresponding to $(E; G, H, c)$ (see Definition 5.25). Then the process $\hat{M}_t^\Psi := \Psi(\nu_t, V_t) - \int_0^t \hat{\mathcal{H}}(G, H, c)(\Psi)(\nu_u, V_u)\,du$, $t \in [0, T]$, is an $\{\mathcal{F}_t\}$-martingale for each $\Psi \in D(\hat{\mathcal{H}}(G, H, c))$.

Remark 5.36. With these preliminaries in place, we next establish Theorem 5.32 by verifying the conditions of Theorem 3.20. Put $E := \mathcal{P}(\mathbb{R}^d) \times \mathbb{R}^r$ (which is a complete separable metric space with the Prohorov metric for $\mathcal{P}(\mathbb{R}^d)$), let $\mu \in \mathcal{P}(E)$ be given by the Dirac measure in $\mathcal{P}(\mathcal{P}(\mathbb{R}^d) \times \mathbb{R}^r)$ at the point $(\delta_{x_0}, 0)$, and let $(A, D(A)) := (\hat{\mathcal{H}}(L, B, h), D(\hat{\mathcal{H}}(L, B, h)))$ (see (5.43)) in Theorem 3.20. Since $h^k \in \bar{C}(\mathbb{R}^d)$, $k = 1, \ldots, r$ (Condition 2.15), it follows from Remark 5.34 that $\hat{\mathcal{H}}(L, B, h)$ is a linear operator on $\bar{C}(\mathcal{P}(\mathbb{R}^d) \times \mathbb{R}^r)$.

Verification of (I) in Theorem 3.20. We need the next result, which follows from Lemma 9.48 (the elementary proof is omitted):

Lemma 5.37. Suppose Condition AII. Then $\hat{\mathcal{H}}(L, B, h) \subset \bar{C}(\mathcal{P}(\mathbb{R}^d) \times \mathbb{R}^r) \times \bar{C}(\mathcal{P}(\mathbb{R}^d) \times \mathbb{R}^r)$, and $\hat{\mathcal{H}}(L, B, h)$ is separable (in the supremum norm of $\bar{C}(\mathcal{P}(\mathbb{R}^d) \times \mathbb{R}^r) \times \bar{C}(\mathcal{P}(\mathbb{R}^d) \times \mathbb{R}^r)$).

Thus there is a sequence $\{\Psi_k\} \subset D(\hat{\mathcal{H}}(L, B, h))$ such that the sequence $\{(\Psi_k, \hat{\mathcal{H}}(L, B, h)(\Psi_k))\}$ is a dense subset of $\hat{\mathcal{H}}(L, B, h)$, and part (i) of (I) is verified when we identify $\{g_k\}$ with $\{\Psi_k\}$. As for part (ii) of (I), since $D(L, B) = C_c^\infty(\mathbb{R}^d)$ (see (5.40)) is convergence determining (see Problem 3.11.11 of [8]), we see from Lemma 5.33(iii) that $D(\mathcal{H}(L, B, h))$ strongly separates points in $\mathcal{P}(\mathbb{R}^d)$, and it is clear that $C_c^\infty(\mathbb{R}^r)$ strongly separates points in $\mathbb{R}^r$. It then follows from (5.43a) that $D(\hat{\mathcal{H}}(L, B, h))$ strongly separates points in $\mathcal{P}(\mathbb{R}^d) \times \mathbb{R}^r$. However, $\{\Psi_k\}$ has been seen to be a dense subset of $D(\hat{\mathcal{H}}(L, B, h))$, so that $\{\Psi_k\}$ strongly separates points in $\mathcal{P}(\mathbb{R}^d) \times \mathbb{R}^r$.

Verification of (II) in Theorem 3.20. By Lemma 5.33 we know that $D(\mathcal{H}(L, B, h))$ is an algebra that includes constant functions, hence vanishes nowhere.

Verification of (III) in Theorem 3.20. This is the most important of the conditions associated with Theorem 3.20, and is verified by the next result, established in Section 8:

Theorem 5.38. Suppose that Condition AII holds. Then the martingale problem for $\hat{\mathcal{H}}(L, B, h)$ defined by (5.43) is well posed.

Items (IV)–(VI) of Theorem 3.20 are stated for processes defined over the semi-infinite interval $[0, \infty)$, and the theorem delivers weak convergence of stochastic processes defined over $[0, \infty)$. It is evident that if (IV)–(VI) are verified for processes over the finite interval $[0, T]$, rather than over $[0, \infty)$, then Theorem 3.20 gives weak convergence of stochastic processes also defined over $[0, T]$. From now on we use this restriction of Theorem 3.20 to the interval $[0, T]$ without further mention.


Verification of (IV) in Theorem 3.20. As noted in Remark 5.36, we take $\mu \in \mathcal{P}(\mathcal{P}(\mathbb{R}^d) \times \mathbb{R}^r)$ to be the Dirac measure of the point $(\delta_{x_0}, 0)$. Then $\mathcal{L}(\bar{\pi}_0, \bar{I}_0) = \mu$. In Remark 5.30 we saw that $\{(\bar{\Omega}, \bar{\mathcal{F}}, \{\mathcal{F}_t^{\bar{Y}}\}, \bar{P}), (\bar{\pi}_t, \bar{I}_t)\}$ is a solution of the normalized filter equation corresponding to $(\mathbb{R}^d; L, B, h)$; thus, from Lemma 5.35, it follows that the martingale problem for $(\hat{\mathcal{H}}(L, B, h), \mu)$ has a solution $\{(\bar{\pi}_t, \bar{I}_t), t \in [0, T]\}$, whose paths are continuous, hence corlol. This verifies (IV) of Theorem 3.20 when we identify $\{X_t\}$ with $\{(\bar{\pi}_t, \bar{I}_t)\}$.

Remark 5.39. Fix an arbitrary sequence $\{\varepsilon_n\} \subset (0, 1]$ such that $\lim_n \varepsilon_n = 0$, and, to simplify the notation, let $\pi_t^n$ and $I_t^n$ stand for $\pi_t^{\varepsilon_n}$ and $I_t^{\varepsilon_n}$, respectively. Likewise, put $C^n$ for $C^{\varepsilon_n}$ (see (4.28)), $\mu_t^n$ for $\mu_t^{\varepsilon_n}$ (see (5.33)), and $\mathcal{F}_t^n$ for $\mathcal{F}_t^{Y^{\varepsilon_n}}$ (see (2.19)).

Verification of (V) in Theorem 3.20. Identify $X^n$ in Theorem 3.20(V) with $(\pi^n, I^n)$. We see from (1.10) and (5.32) that $\mathcal{L}(\pi_0^n, I_0^n) = \mu$, $\forall n = 1, 2, \ldots$. Since the $I_t^n$, $n = 1, 2, \ldots$, have a common Gaussian law for each $t$, the verification of (V) follows from the next result, which is proved in Section 8:

Lemma 5.40. Suppose that Condition AII holds. Then the sequence $\{\mathcal{L}(\pi_t^n), n = 1, 2, \ldots\}$ is tight in $\mathcal{P}(\mathcal{P}(\mathbb{R}^d))$ for each $t \in [0, T]$.

Verification of (VI) in Theorem 3.20. Here we use the functions $f^{\varepsilon, \varphi}$ furnished by Proposition 4.22, and the $\mathcal{P}(\mathbb{R}^d \times S)$-valued process $\{\mu_t^\varepsilon, t \in [0, T]\}$ given by (5.33). In keeping with Remark 5.39, we simplify the notation and put $f^{n, \varphi}$ for $f^{\varepsilon_n, \varphi}$ (see (4.29)). Fix an arbitrary $\Psi \in D(\hat{\mathcal{H}}(L, B, h))$ (see (5.43a)) of the form $\Psi(\nu, y) = (\Phi \otimes g)(\nu, y)$, $\forall (\nu, y) \in \mathcal{P}(\mathbb{R}^d) \times \mathbb{R}^r$, for some $\Phi \in D(\mathcal{H}(L, B, h))$ and $g \in \operatorname{span}\{1, C_c^\infty(\mathbb{R}^r)\}$, with $\Phi$ given by
$$\Phi(\nu) := H(\nu\varphi_1, \ldots, \nu\varphi_m), \qquad \nu \in \mathcal{P}(\mathbb{R}^d), \tag{5.44}$$
for some positive integer $m$, some $H \in C_c^\infty(\mathbb{R}^m)$, and $\varphi_i \in C_c^\infty(\mathbb{R}^d)$, $i = 1, 2, \ldots, m$ (see (5.41)). For each positive integer $n$ define
$$\Phi_n(\mu) := H(\mu f^{n, \varphi_1}, \ldots, \mu f^{n, \varphi_m}), \qquad \mu \in \mathcal{P}(\mathbb{R}^d \times S), \tag{5.45}$$
where the $f^{n, \varphi_i} \in \tilde{D}$ are given by Proposition 4.22 with $\varepsilon := \varepsilon_n$ and $\varphi := \varphi_i$. Then $f^{n, \varphi_i} \in D(C^n, E)$ (by (5.37)), from which we find $\Phi_n \in D(\mathcal{H}(C^n, E, h \otimes 1))$, $\forall n = 1, 2, \ldots$ (recall (5.41)). In view of (5.43a), we then have
$$\Phi_n \otimes g \in D(\hat{\mathcal{H}}(C^n, E, h \otimes 1)), \qquad \forall n = 1, 2, \ldots. \tag{5.46}$$
For each $t \in [0, T]$, $n = 1, 2, \ldots$, put
$$U_n(t) := (\Phi_n \otimes g)(\mu_t^n, I_t^n), \qquad V_n(t) := \hat{\mathcal{H}}(C^n, E, h \otimes 1)(\Phi_n \otimes g)(\mu_t^n, I_t^n). \tag{5.47}$$
Now Condition 5.24 holds for $E := \mathbb{R}^d \times S$, $G := C^n$, $H_k := E_k$, and $c := h \otimes 1$, and, from Remark 5.29(i), we know that $\{(\Omega, \mathcal{F}, \{\mathcal{F}_t^n\}, P), (\mu_t^n, I_t^n)\}$ is a solution of the


normalized filter equation corresponding to $(\mathbb{R}^d \times S; C^n, E, h \otimes 1)$. Thus Lemma 5.35 shows that the $(\mathcal{P}(\mathbb{R}^d \times S) \times \mathbb{R}^r)$-valued process $\{(\mu_t^n, I_t^n), t \in [0, T]\}$ is a solution of the martingale problem for $\hat{\mathcal{H}}(C^n, E, h \otimes 1)$, and hence (5.46) and (5.47) show that $U_n(t) - \int_0^t V_n(s)\,ds$, $t \in [0, T]$, is an $\{\mathcal{F}_t^n\}$-martingale, as required for (3.23). To verify the remaining conditions (3.24)–(3.26) we need the following fact, the elementary proof of which is omitted:

Fact 5.41. With reference to (5.44), (5.45), and (5.47), we have (a) $\sup_{t \in [0, T], \omega \in \Omega} |\Phi_n(\mu_t^n) - \Phi(\pi_t^n)| \le C \varepsilon_n$, $\forall n = 1, 2, \ldots$, for a constant $C \in [0, \infty)$; (b) $\sup_{n, \omega, t} |V_n(t)| < \infty$; (c) $\sup_{\omega \in \Omega} |V_n(t) - \hat{\mathcal{H}}(L, B, h)(\Psi)(\pi_t^n, I_t^n)| < C \varepsilon_n$, $\forall t \in [0, T]$, $\forall n = 1, 2, \ldots$, for a constant $C \in [0, \infty)$.

From (5.47) we find $U_n(t) - \Psi(\pi_t^n, I_t^n) = [\Phi_n(\mu_t^n) - \Phi(\pi_t^n)]\,g(I_t^n)$, $\forall t \in [0, T]$, $n = 1, 2, \ldots$, and (3.24) follows from this, together with Fact 5.41(a) and the uniform boundedness of $g(\cdot)$. Now (3.25) is an immediate consequence of Fact 5.41(b), and (3.26) follows from Fact 5.41(c) and the Dominated Convergence Theorem. This verifies (VI) of Theorem 3.20, from which it follows that $\lim_{n \to \infty} \mathcal{L}((\pi^n, I^n)) = \mathcal{L}((\bar{\pi}, \bar{I}))$ in $\mathcal{P}(D_{\mathcal{P}(\mathbb{R}^d) \times \mathbb{R}^r}[0, T])$. However, $\{(\pi_t^n, I_t^n), t \in [0, T]\}$ and $\{(\bar{\pi}_t, \bar{I}_t), t \in [0, T]\}$ are continuous $(\mathcal{P}(\mathbb{R}^d) \times \mathbb{R}^r)$-valued processes (see Remarks 5.29(ii) and 5.30), so this convergence takes place in $\mathcal{P}(C_{\mathcal{P}(\mathbb{R}^d) \times \mathbb{R}^r}[0, T])$ (see Corollary 3.3.2 of [8]). Now Theorem 5.32 follows from the arbitrary choice of $\{\varepsilon_n\}$ in Remark 5.39, and, as noted in Remark 5.31, this gives Theorem 2.18.

6. Discussion

In this section we compare Theorem 2.18 with some results on convergence of nonlinear filters that have recently been established. Bhatt et al. [1] study the nonlinear filter for the observation equation
$$Y_t = W_t + \int_0^t h(X_s)\,ds,$$

in the case where the Rr -valued Wiener process W is independent of the signal X , which takes values in a complete separable metric space E, and the sensor mapping h: E → Rr is subject to only weak integrability conditions. The main result of [1] establishes that the nonlinear filter of the signal X based on the observation Y depends continuously on the law of X , and is achieved by an elegant argument that involves applying only elementary ideas from integration theory and stochastic analysis to the Kallianpur–Striebel formula for the nonlinear filter; in particular, there is no need to postulate a model for X (e.g. that X be a diffusion or a Markov process), and no use is made of the normalized or unnormalized filter equations. In a subsequent work, Bhatt and Karandikar [4] extend the general approach of [1] to the case in which the signal X and observation process Y are correlated. In contrast


to the independence case of [1], here it is necessary to postulate a model for the signal $X$ and its relation to the observation process $Y$. We suppose that these are given by
$$dX_t = a(X_t)\,dW_t^1 + b(X_t)\,dY_t + g(X_t)\,dt, \qquad X_0 = \text{nonrandom } x_0 \in \mathbb{R}^d, \tag{6.48}$$
$$dY_t = h(X_t)\,dt + dW_t^2, \qquad Y_0 = 0 \in \mathbb{R}^r, \tag{6.49}$$
where $(W_t^1, W_t^2)$ is an $\mathbb{R}^{d+r}$-valued standard Wiener process on $(\Omega, \mathcal{F}, P)$, and the coefficients $a(\cdot)$, $b(\cdot)$, $g(\cdot)$, and $h(\cdot)$ are globally Lipschitz continuous. Suppose, also, that there is a signal/observation pair $(X_t^n, Y_t^n)$ given by
$$dX_t^n = a^n(X_t^n)\,dW_t^{n,1} + b^n(X_t^n)\,dY_t^n + g^n(X_t^n)\,dt, \qquad X_0^n = \text{nonrandom } x_0 \in \mathbb{R}^d, \tag{6.50}$$
$$dY_t^n = h(X_t^n)\,dt + dW_t^{n,2}, \qquad Y_0^n = 0 \in \mathbb{R}^r, \tag{6.51}$$
where $(W_t^{n,1}, W_t^{n,2})$ is an $\mathbb{R}^{d+r}$-valued standard Wiener process on $(\Omega^n, \mathcal{F}^n, P^n)$, and the coefficients $a^n(\cdot)$, $b^n(\cdot)$, and $g^n(\cdot)$ are again globally Lipschitz continuous. Let $\{\pi_t\}$ and $\{\pi_t^n\}$ be the $\mathcal{P}(\mathbb{R}^d)$-valued processes which are, respectively, the nonlinear filter of the signal $X$ given the observation process $Y$, and the nonlinear filter of the signal $X^n$ given the observation process $Y^n$. If the coefficients $a^n(\cdot)$ converge to $a(\cdot)$ uniformly on compacta, and likewise for the remaining coefficients $b^n(\cdot)$, $g^n(\cdot)$, and the Lipschitz constants associated with $a^n(\cdot)$, $b^n(\cdot)$, $g^n(\cdot)$ are uniform with respect to $n$, then one sees that the signal/observation pair $(X^n, Y^n)$ converges weakly to the signal/observation pair $(X, Y)$ (see Theorem 5.1 of [4]), and it then becomes natural to try to establish weak convergence of $\pi^n$ to $\pi$. The key technical results used in [4] to show this convergence are Kallianpur–Striebel representations for $\pi^n$ and $\pi$, which have the following form: in view of the Lipschitz continuity of the coefficients in (6.48) and (6.50), there are nonanticipative mappings $e, e^n: C_{\mathbb{R}^d}[0, T] \times C_{\mathbb{R}^r}[0, T] \to C_{\mathbb{R}^d}[0, T]$ which give pathwise representations for $X$ and $X^n$ in terms of the "driving" processes $(W^1, Y)$ and $(W^{n,1}, Y^n)$, namely
$$X_t = e(W^1, Y)(t), \quad P\text{-a.e.} \qquad \text{and} \qquad X_t^n = e^n(W^{n,1}, Y^n)(t), \quad P^n\text{-a.e.} \tag{6.52}$$
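The prelimit pair (6.50), (6.51) is straightforward to simulate, which is a useful way to build intuition for the weak convergence claim. Below is a hedged Euler–Maruyama sketch for scalar $X$ and $Y$; the particular coefficients are stand-ins chosen for the example, not the paper's.

```python
import numpy as np

def euler_pair(a, b, g, h, x0, T, N, rng):
    """Euler-Maruyama discretization of the correlated pair
         dX = a(X) dW1 + b(X) dY + g(X) dt,   dY = h(X) dt + dW2,
       for scalar X and Y (illustrative only)."""
    dt = T / N
    X = np.empty(N + 1)
    Y = np.empty(N + 1)
    X[0], Y[0] = x0, 0.0
    for n in range(N):
        dW1 = rng.normal(scale=np.sqrt(dt))
        dW2 = rng.normal(scale=np.sqrt(dt))
        dY = h(X[n]) * dt + dW2          # observation increment
        Y[n + 1] = Y[n] + dY
        X[n + 1] = X[n] + a(X[n]) * dW1 + b(X[n]) * dY + g(X[n]) * dt
    return X, Y

rng = np.random.default_rng(1)
X, Y = euler_pair(a=lambda x: 0.3, b=lambda x: 0.1,
                  g=lambda x: -x, h=lambda x: x,
                  x0=1.0, T=1.0, N=1000, rng=rng)
```

Note that the observation increment $dY$ feeds back into the signal update, which is exactly the correlated structure that distinguishes [4] from the independent case of [1].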

(see Theorem 4.3 of [14]). Then, subject to only mild boundedness restrictions on $h(\cdot)$, it follows from the Girsanov theorem (see pp. 44 and 45 of [4]) that, for each $f \in \bar{C}(\mathbb{R}^d)$, one has the Kallianpur–Striebel representations
$$\pi_t^n f = \frac{\tilde{\sigma}_t^n(f, Y^n)}{\tilde{\sigma}_t^n(1, Y^n)} \qquad \text{and} \qquad \pi_t f = \frac{\tilde{\sigma}_t(f, Y)}{\tilde{\sigma}_t(1, Y)}, \tag{6.53}$$
where, for arbitrary $w \in C_{\mathbb{R}^d}[0, T]$, $y \in C_{\mathbb{R}^r}[0, T]$, we have defined
$$\tilde{\sigma}_t^n(f, y) := \int_{C_{\mathbb{R}^d}[0, T]} \chi_t^n(f, w, y)\,dQ_1(w), \tag{6.54}$$
$$\chi_t^n(f, w, y) := f(e^n(w, y)(t))\,\tilde{q}_t^n(w, y), \tag{6.55}$$
$$\tilde{q}_t^n(w, y) := \exp\left(\int_0^t h'(e^n(w, y)(s))\,dy(s) - \frac{1}{2} \int_0^t |h(e^n(w, y)(s))|^2\,ds\right), \tag{6.56}$$


with $Q_1$ indicating Wiener measure on $C_{\mathbb{R}^d}[0, T]$, and $\tilde{\sigma}_t(f, y)$ being defined in the same way as $\tilde{\sigma}_t^n(f, y)$, but with the superscript "$n$" suppressed everywhere in (6.54)–(6.56). From these representations one sees that if convergence of the integrand $\chi^n(f, \cdot, \cdot)$ in (6.54) to the limiting integrand $\chi(f, \cdot, \cdot)$ can be established in a sufficiently strong sense, then we should have convergence of the functional $\tilde{\sigma}^n(f, \cdot)$ to the functional $\tilde{\sigma}(f, \cdot)$, which, in view of the ratios in (6.53), should lead to weak convergence of $\pi^n$ to $\pi$. Indeed, for an arbitrary sequence $t_n \to t \in [0, T]$, it follows easily from Scheffé's lemma that
$$\lim_{n \to \infty} \int_{C_{\mathbb{R}^r}[0, T]} \int_{C_{\mathbb{R}^d}[0, T]} |\chi_{t_n}^n(f, w, y) - \chi_t(f, w, y)|\,dQ_1(w)\,dQ_2(y) = 0, \tag{6.57}$$
where $Q_2$ denotes the Wiener measure on $C_{\mathbb{R}^r}[0, T]$. This is the essential step in establishing weak convergence of $\pi^n$, since, from (6.57), (6.54), and Fubini's theorem, we see that the functional $\tilde{\sigma}_{t_n}^n(f, \cdot)$ converges to the functional $\tilde{\sigma}_t(f, \cdot)$ in $Q_2$-probability, and weak convergence of $\pi^n$ to $\pi$ follows from this latter convergence and easy integration theory (see pp. 50 and 51 in [4]).

Remark 6.42. Note that Bhatt and Karandikar [4] use models for $(X, Y)$ and $(X^n, Y^n)$ which are more general than the pairs (6.48), (6.49) and (6.50), (6.51) that we postulate here (see the pairs (2.1a), (2.1b) and (5.1a), (5.1b) in [4]), and also allow random initial conditions $X_0$ and $X_0^n$, with $X_0^n$ converging weakly to $X_0$. However, to focus on the essential ideas and facilitate comparison with Theorem 2.18, we limit attention to the model (6.48)–(6.51).

The method just summarized completely eschews the nonlinear filter equations and martingale problems, together with their rather complicated technical machinery, and raises the question of whether one can establish Theorem 2.18 by the same simple and elegant approach. To explore this possibility we next establish a Kallianpur–Striebel formula for the nonlinear filter $\pi_t^\varepsilon$ (recall (2.20)) in much the same spirit as the representation given by (6.53)–(6.56).

Remark 6.43.

For present and later use, put
$$\mathcal{M}_t := \sigma\{Z_u, W_s,\ u \in [0, \infty),\ s \in [0, t]\} \vee \mathcal{N}(P), \qquad \forall t \in [0, \infty),$$
and note from the independence of $Z$ and $W$ (Condition 2.5) that $\{W_t\}$ is an $\{\mathcal{M}_t\}$-Wiener process on $(\Omega, \mathcal{F}, P)$. For each $\varepsilon \in (0, 1]$ and $t \in [0, \infty)$ put
$$q_t^\varepsilon := \exp\left(\int_0^t h'(X_s^\varepsilon)\,dY_s^\varepsilon - \frac{1}{2} \int_0^t |h(X_s^\varepsilon)|^2\,ds\right), \tag{6.58}$$
where $X^\varepsilon$ and $Y^\varepsilon$ are given by (1.9) and (1.5), respectively. Then $E[(q_T^\varepsilon)^{-1}] = 1$ (by Condition 2.15), so that
$$dP^\varepsilon := (q_T^\varepsilon)^{-1}\,dP \tag{6.59}$$

defines a probability measure $P^\varepsilon$ on $(\Omega, \mathcal{F})$, with $P$ and $P^\varepsilon$ having identical null events. Then, from the Girsanov theorem, we see that $\{(Y_t^\varepsilon, \mathcal{M}_t), t \in [0, T]\}$ is an $\mathbb{R}^r$-valued standard Wiener process on $(\Omega, \mathcal{F}, P^\varepsilon)$, and therefore (i) $\{(q_t^\varepsilon, \mathcal{M}_t), t \in [0, T]\}$ is a martingale on $(\Omega, \mathcal{F}, P^\varepsilon)$, (ii) $Z$ and $Y^\varepsilon$ are $P^\varepsilon$-independent, and (iii) $P^\varepsilon \circ Z^{-1} = P \circ Z^{-1}$, for each $\varepsilon \in (0, 1]$ ((ii) and (iii) follow since $\sigma\{Z_t, t \in [0, \infty)\} = \mathcal{M}_0$). Moreover, from (i), (6.59), (2.20), and the Bayes formula (see, e.g., Lemma 7.1, p. 243 of [21]), for each $f \in \bar{C}(\mathbb{R}^d)$ we have
$$\pi_t^\varepsilon f = \frac{\sigma_t^\varepsilon f}{\sigma_t^\varepsilon 1}, \qquad P^\varepsilon\text{-a.e.}, \tag{6.60}$$
where
$$\sigma_t^\varepsilon f := E^{P^\varepsilon}[f(X_t^\varepsilon)\,q_t^\varepsilon \mid \mathcal{F}_t^{Y^\varepsilon}]. \tag{6.61}$$
In view of (1.9), (1.1), and a straightforward extension of Theorem 4.3 of [14], for each $\varepsilon \in (0, 1]$ we get a mapping $e^\varepsilon: D_S[0, \infty) \times C_{\mathbb{R}^r}[0, T] \to C_{\mathbb{R}^d}[0, T]$ such that $(z, y) \mapsto e^\varepsilon(z, y)(t)$ is $\mathcal{B}_{t/\varepsilon^2}\{D_S[0, \infty)\} \otimes \mathcal{B}_t\{C_{\mathbb{R}^r}[0, T]\}$-measurable, and we have the "pathwise" representation
$$X_t^\varepsilon = e^\varepsilon(Z, Y^\varepsilon)(t), \qquad P^\varepsilon\text{-a.e.} \tag{6.62}$$
for each $t \in [0, T]$. Let $Q_Z$ denote the distribution in $D_S[0, \infty)$ of the Markov process $Z$ with initial distribution $\mu_0$ specified in Condition 2.1, and, for each $\varepsilon \in (0, 1]$, $t \in [0, T]$, $f \in \bar{C}(\mathbb{R}^d)$, $y \in C_{\mathbb{R}^r}[0, T]$, and $z \in D_S[0, \infty)$, define
$$\tilde{\sigma}_t^\varepsilon(f, y) := \int_{D_S[0, \infty)} \xi_t^\varepsilon(f, z, y)\,dQ_Z(z), \tag{6.63}$$
$$\xi_t^\varepsilon(f, z, y) := f(e^\varepsilon(z, y)(t))\,\tilde{q}_t^\varepsilon(z, y), \tag{6.64}$$
$$\tilde{q}_t^\varepsilon(z, y) := \exp\left(\int_0^t h'(e^\varepsilon(z, y)(s))\,dy(s) - \frac{1}{2} \int_0^t |h(e^\varepsilon(z, y)(s))|^2\,ds\right). \tag{6.65}$$
In view of (6.58), (6.62), and (6.65), we find that $q_t^\varepsilon = \tilde{q}_t^\varepsilon(Z, Y^\varepsilon)$, $P^\varepsilon$-a.e.; hence, from (ii), (iii), and (6.61), we have $\sigma_t^\varepsilon f = \tilde{\sigma}_t^\varepsilon(f, Y^\varepsilon)$, $P^\varepsilon$-a.e., and thus, from (6.60), we have the Kallianpur–Striebel representation
$$\pi_t^\varepsilon f = \frac{\tilde{\sigma}_t^\varepsilon(f, Y^\varepsilon)}{\tilde{\sigma}_t^\varepsilon(1, Y^\varepsilon)}, \qquad P^\varepsilon\text{-a.e.}, \tag{6.66}$$
where the functional $\tilde{\sigma}_t^\varepsilon(f, \cdot)$ is defined on $C_{\mathbb{R}^r}[0, T]$ by (6.63)–(6.65). We next write a Kallianpur–Striebel representation for the nonlinear filter $\bar{\pi}_t$ (see (2.22)). From (1.11)

and (1.7) we have
$$d\bar{X}_t = [b(\bar{X}_t) - B(\bar{X}_t)h(\bar{X}_t)]\,dt + c(\bar{X}_t)\,d\bar{V}_t + B(\bar{X}_t)\,d\bar{Y}_t, \qquad \bar{X}_0 = \text{nonrandom } x_0 \in \mathbb{R}^d. \tag{6.67}$$
We need a "pathwise" representation for $\bar{X}$ in terms of the pair $(\bar{V}, \bar{Y})$, analogous to the representations in (6.52). However, the conditions postulated for Theorem 2.18 are not quite enough for this, since they entail only uniqueness-in-distribution for the pair $(\bar{X}, \bar{Y})$ given by (1.11), (1.7), rather than the pathwise uniqueness required for such a representation to hold. We therefore add the further condition that the mappings $b(\cdot)$, $c(\cdot)$, $B(\cdot)$, and $h(\cdot)$ be globally Lipschitz continuous, and then, from (6.67) and Theorem 4.3 of [14], we have a nonanticipative mapping $\bar{e}: C_{\mathbb{R}^d}[0, T] \times C_{\mathbb{R}^r}[0, T] \to C_{\mathbb{R}^d}[0, T]$ such that
$$\bar{X}_t = \bar{e}(\bar{V}, \bar{Y})(t), \qquad \bar{P}\text{-a.e.} \tag{6.68}$$
Then, similarly to (6.53)–(6.56), we have the following Kallianpur–Striebel representation for $\bar{\pi}_t$: for each $t \in [0, T]$ and $f \in \bar{C}(\mathbb{R}^d)$,
$$\bar{\pi}_t f = \frac{\tilde{\sigma}_t(f, \bar{Y})}{\tilde{\sigma}_t(1, \bar{Y})}, \tag{6.69}$$
where, for arbitrary $v \in C_{\mathbb{R}^d}[0, T]$, $y \in C_{\mathbb{R}^r}[0, T]$, we have defined
$$\tilde{\sigma}_t(f, y) := \int_{C_{\mathbb{R}^d}[0, T]} \xi_t(f, v, y)\,dQ_1(v), \tag{6.70}$$
$$\xi_t(f, v, y) := f(\bar{e}(v, y)(t))\,\tilde{q}_t(v, y), \tag{6.71}$$
$$\tilde{q}_t(v, y) := \exp\left(\int_0^t h'(\bar{e}(v, y)(s))\,dy(s) - \frac{1}{2} \int_0^t |h(\bar{e}(v, y)(s))|^2\,ds\right), \tag{6.72}$$

with $Q_1$ indicating Wiener measure on $C_{\mathbb{R}^d}[0, T]$. The goal now is to try to use the Kallianpur–Striebel representations (6.63)–(6.66) and (6.69)–(6.72) to show weak convergence of $\pi^\varepsilon$ to $\bar{\pi}$ as $\varepsilon \to 0$. To this end, fix sequences $\{\varepsilon_n\} \subset (0, 1]$ and $\{t_n\} \subset [0, T]$ such that $\varepsilon_n \to 0$ and $t_n \to t$. Then, as in [4], it is enough to show that the functional $\tilde{\sigma}_{t_n}^{\varepsilon_n}(f, \cdot)$ (see (6.63)) converges to the functional $\tilde{\sigma}_t(f, \cdot)$ (see (6.70)) in $Q_2$-probability (where $Q_2$ denotes the Wiener measure in $C_{\mathbb{R}^r}[0, T]$) in order to conclude weak convergence of $\pi^{\varepsilon_n}$ to $\bar{\pi}$. However, we now encounter a fundamental difficulty in following the method of [4], which relied on (6.57) to secure convergence in $Q_2$-measure of the functional $\tilde{\sigma}_{t_n}^n(f, \cdot)$ (see (6.54)) to $\tilde{\sigma}_t(f, \cdot)$: in the present case, the analogue of (6.57) does not even make sense. This is because the integrands $\xi_{t_n}^{\varepsilon_n}(f, z, y)$ and $\xi_t(f, v, y)$ in (6.63) and (6.70) are defined on different domains, namely $(z, y) \in D_S[0, \infty) \times C_{\mathbb{R}^r}[0, T]$ and $(v, y) \in C_{\mathbb{R}^d}[0, T] \times C_{\mathbb{R}^r}[0, T]$, respectively, so that one does not have available a double integral of the kind appearing in (6.57), where the integrands $\chi_{t_n}^n(f, w, y)$ and $\chi_t(f, w, y)$ are defined on, and can be integrated over, the common domain $(w, y) \in C_{\mathbb{R}^d}[0, T] \times C_{\mathbb{R}^r}[0, T]$. The reason for this is to be found in the dynamics of (6.48) (for $X$) and (6.50) (for $X^n$), in which the "driving" random pairs $(W_t^1, Y_t)$ and $(W_t^{n,1}, Y_t^n)$ have sample paths in the same space $C_{\mathbb{R}^d}[0, T] \times C_{\mathbb{R}^r}[0, T]$, while, on the other hand, in (6.67) (for $\bar{X}$) and (1.9) (for $X^\varepsilon$) the driving random pairs $(\bar{V}_t, \bar{Y}_t)$ and $(Z_t, Y_t^\varepsilon)$ have sample paths in the different spaces $C_{\mathbb{R}^d}[0, T] \times C_{\mathbb{R}^r}[0, T]$ and $D_S[0, \infty) \times C_{\mathbb{R}^r}[0, T]$, respectively. This seems to make it rather difficult to establish Theorem 2.18 by the elegant method of Bhatt and Karandikar [4] and, at least at our current level of understanding, appears to necessitate the martingale-problem approach taken here.
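Although the change-of-domain obstruction blocks the proof strategy of [4], the Kallianpur–Striebel ratio itself is easy to realize by Monte Carlo in any concrete instance: sample driving paths under Wiener measure, form the Girsanov weights, and normalize. The sketch below does this for a toy scalar model in which the signal equals its driving Brownian path; the solution map `e` and sensor `h` here are stand-ins for the abstract objects of (6.53)–(6.56) and (6.63)–(6.65), not the paper's constructions.

```python
import numpy as np

def ks_filter(f, h, e, y, dt, n_paths, rng):
    """Monte Carlo Kallianpur-Striebel estimate of pi_T f = sigma_T(f,y)/sigma_T(1,y).

    e(w, y) maps a driving path w (sampled under Wiener measure) and the
    fixed observation path y to a signal path, mimicking the pathwise
    solution maps of (6.52) and (6.62); all paths live on a uniform grid.
    """
    N = len(y) - 1
    num = den = 0.0
    for _ in range(n_paths):
        w = np.concatenate(([0.0], np.cumsum(rng.normal(scale=np.sqrt(dt), size=N))))
        x = e(w, y)                       # signal path for this driving noise
        hx = h(x[:-1])
        # Girsanov weight: exp( int h dy - (1/2) int h^2 ds ), left-endpoint sums
        q = np.exp(np.sum(hx * np.diff(y)) - 0.5 * np.sum(hx ** 2) * dt)
        num += f(x[-1]) * q
        den += q
    return num / den

rng = np.random.default_rng(2)
y = np.zeros(101)                         # a degenerate observation path
est = ks_filter(f=lambda u: u, h=lambda u: u, e=lambda w, y: w,
                y=y, dt=0.01, n_paths=2000, rng=rng)
```

With the flat observation path, the weights are symmetric under $w \mapsto -w$, so the estimate of the conditional mean should be near zero; this is only a sanity check, not a substitute for the convergence analysis above.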

7. Appendix for Section 4

For each $f \in \tilde{D}$, put
$$(Vf)(x, z_1, z_2) := \sum_{i=1}^d \left[\frac{1}{\varepsilon} F^i(x, z_1) + G^i(x, z_1)\right] \partial_i f(x, z_2) + \frac{1}{2} \sum_{i,j=1}^d [BB^T(x)]_{ij}\,\partial_i \partial_j f(x, z_2) \tag{7.73}$$
for all $(x, z_1, z_2) \in \mathbb{R}^d \times S \times S$, and observe from (4.28) and (7.73) that
$$(C^\varepsilon f)(x, z) = (Vf)(x, z, z) + \varepsilon^{-2} Q[f(x, \cdot)](z), \qquad \forall (x, z) \in \mathbb{R}^d \times S. \tag{7.74}$$

Proof of Proposition 4.21. Fix some $\varepsilon \in (0, 1]$ and $f \in \tilde{D}$. Since $\{(W_t, \mathcal{M}_t)\}$ is a Wiener process on $(\Omega, \mathcal{F}, P)$ (Remark 6.43), it easily follows from (1.10), (7.73), and Itô's formula that, for each $z \in S$,
$$f(X_t^\varepsilon, z) - \int_0^t (Vf)(X_s^\varepsilon, Z_s^\varepsilon, z)\,ds \quad \text{is an } \{\mathcal{M}_t\}\text{-martingale};$$
thus, for $0 \le t_1 < t_2 < \infty$ and $z \in S$, we have
$$E\left[f(X_{t_2}^\varepsilon, z) - f(X_{t_1}^\varepsilon, z) - \int_{t_1}^{t_2} (Vf)(X_s^\varepsilon, Z_s^\varepsilon, z)\,ds\ \Big|\ \mathcal{M}_{t_1}\right] = 0 \quad \text{a.s.} \tag{7.75}$$
By Problem 1.5.7 of [20] and the $\mathcal{M}_{t_1}$-measurability of $Z_{t_2}^\varepsilon$, we see that (7.75) still holds when $z$ is replaced with $Z_{t_2}^\varepsilon$, and then, since $\mathcal{F}_{t_1}^{W, Z^\varepsilon} \subset \mathcal{M}_{t_1}$ (for $\{\mathcal{F}_t^{W, Z^\varepsilon}\}$ defined in Remark 2.8), we get
$$E\left[f(X_{t_2}^\varepsilon, Z_{t_2}^\varepsilon) - f(X_{t_1}^\varepsilon, Z_{t_2}^\varepsilon) - \int_{t_1}^{t_2} (Vf)(X_s^\varepsilon, Z_s^\varepsilon, Z_{t_2}^\varepsilon)\,ds\ \Big|\ \mathcal{F}_{t_1}^{W, Z^\varepsilon}\right] = 0 \quad \text{a.s.} \tag{7.76}$$
If $\{T_t^\varepsilon\}$ is the Feller semigroup on $C(S)$ corresponding to the Markov process $\{Z_t^\varepsilon\}$ defined by (1.1), then $\varepsilon^{-2} Q$ is the generator of $\{T_t^\varepsilon\}$ with domain $D(Q)$ (see Condition 2.1). Since $f \in \tilde{D}$, one has $f(x, \cdot) \in D(Q)$, $\forall x \in \mathbb{R}^d$ (see (4.27)), and thus Proposition 4.1.7 of [8] shows that, for each $x \in \mathbb{R}^d$,
$$U_t^\varepsilon(x) := f(x, Z_t^\varepsilon) - \varepsilon^{-2} \int_0^t Q[f(x, \cdot)](Z_u^\varepsilon)\,du, \qquad t \in [0, \infty), \quad \text{is an } \{\mathcal{F}_t^{Z^\varepsilon}\}\text{-martingale}. \tag{7.77}$$


However, $Z$ and $W$ are independent, so that for $0 \le t_1 < t_2 < \infty$ we have $E[U_{t_2}^\varepsilon(x) - U_{t_1}^\varepsilon(x) \mid \mathcal{F}_{t_1}^{W, Z^\varepsilon}] = 0$ a.s. for each $x \in \mathbb{R}^d$. Since $X_{t_1}^\varepsilon$ is $\mathcal{F}_{t_1}^{W, Z^\varepsilon}$-measurable (Remark 2.8), it follows from Problem 1.5.7 of [20] that $E[U_{t_2}^\varepsilon(X_{t_1}^\varepsilon) - U_{t_1}^\varepsilon(X_{t_1}^\varepsilon) \mid \mathcal{F}_{t_1}^{W, Z^\varepsilon}] = 0$ a.s., or (see (7.77))
$$E\left[f(X_{t_1}^\varepsilon, Z_{t_2}^\varepsilon) - f(X_{t_1}^\varepsilon, Z_{t_1}^\varepsilon) - \varepsilon^{-2} \int_{t_1}^{t_2} Q[f(X_{t_1}^\varepsilon, \cdot)](Z_s^\varepsilon)\,ds\ \Big|\ \mathcal{F}_{t_1}^{W, Z^\varepsilon}\right] = 0 \quad \text{a.s.} \tag{7.78}$$
Since (7.78) and (7.76) hold for all $0 \le t_1 < t_2 < \infty$, by Lemma 4.3.4(a) of [8] (with $E := S$, $X := Z^\varepsilon$, $\mathcal{G}_t := \mathcal{F}_t^{W, Z^\varepsilon}$, and mappings $u, v: [0, \infty) \times S \times \Omega \to \mathbb{R}$, $w: [0, \infty) \times [0, \infty) \times S \times \Omega \to \mathbb{R}$ defined by $u(t, z, \omega) := f(X_t^\varepsilon(\omega), z)$, $v(t, z, \omega) := (Vf)(X_t^\varepsilon(\omega), Z_t^\varepsilon(\omega), z)$, $w(t, s, z, \omega) := \varepsilon^{-2} Q[f(X_t^\varepsilon(\omega), \cdot)](z)$), it follows that
$$u(t, Z_t^\varepsilon) - \int_0^t [v(s, Z_s^\varepsilon) + w(s, s, Z_s^\varepsilon)]\,ds \tag{7.79}$$
is an $\{\mathcal{F}_t^{W, Z^\varepsilon}\}$-martingale. However, from (7.74), the quantity in (7.79) is $M_t^{\varepsilon, f}$ in the statement of Proposition 4.21, as required.

Proof of Proposition 4.22. Fix $\varepsilon \in (0, 1]$ and $\varphi \in D(L) := C_c^\infty(\mathbb{R}^d)$ (recall (2.16)), and put
$$g_1^\varphi(x, z) := \sum_{i=1}^d F^i(x, z)\,\partial_i \varphi(x), \qquad f_1^\varphi(x, z) := \int_S g_1^\varphi(x, z')\,\chi(z, dz'), \qquad \forall (x, z) \in \mathbb{R}^d \times S. \tag{7.80}$$

Now Condition 2.4 (see (1.3)) shows that $\bar{m} g_1^\varphi(x) = 0$, $\forall x \in \mathbb{R}^d$, so that Lemma 2.6(i) gives $f_1^\varphi(x, \cdot) \in D(Q)$, $\forall x \in \mathbb{R}^d$, with
$$Q[f_1^\varphi(x, \cdot)](z) = -g_1^\varphi(x, z), \qquad \forall (x, z) \in \mathbb{R}^d \times S. \tag{7.81}$$
From (7.80) with Condition 2.4 and Lemma 2.6(ii), we see that $g_1^\varphi, f_1^\varphi \in C_c^{3,0}(\mathbb{R}^d \times S)$. Put
$$g_2^\varphi(x, z) := \sum_{i=1}^d F^i(x, z)\,\partial_i f_1^\varphi(x, z) + \sum_{i=1}^d G^i(x, z)\,\partial_i \varphi(x) + \frac{1}{2} \sum_{i,j=1}^d [BB^T(x)]_{ij}\,\partial_i \partial_j \varphi(x), \tag{7.82}$$
$$f_2^\varphi(x, z) := \int_S [g_2^\varphi(x, z') - \bar{m} g_2^\varphi(x)]\,\chi(z, dz'), \qquad \forall (x, z) \in \mathbb{R}^d \times S. \tag{7.83}$$
Then Lemma 2.6(i) gives $f_2^\varphi(x, \cdot) \in D(Q)$, $\forall x \in \mathbb{R}^d$, with
$$Q[f_2^\varphi(x, \cdot)](z) = \bar{m} g_2^\varphi(x) - g_2^\varphi(x, z), \qquad \forall (x, z) \in \mathbb{R}^d \times S. \tag{7.84}$$


Now define $f^{\varepsilon, \varphi}$ as in (4.29). The semigroup $\{T_t\}$ is conservative, so that $(1, 0) \in Q$ (see Condition 2.1), and we have seen that $f_1^\varphi(x, \cdot), f_2^\varphi(x, \cdot) \in D(Q)$, $\forall x \in \mathbb{R}^d$. Thus $f^{\varepsilon, \varphi}(x, \cdot) \in D(Q)$, $\forall x \in \mathbb{R}^d$, and it follows from (7.81) and (7.84) that
$$Q[f^{\varepsilon, \varphi}(x, \cdot)](z) = -\varepsilon g_1^\varphi(x, z) + \varepsilon^2 [\bar{m} g_2^\varphi(x) - g_2^\varphi(x, z)]. \tag{7.85}$$
We have already seen that $g_1^\varphi, f_1^\varphi \in C_c^{3,0}(\mathbb{R}^d \times S)$. Then $\partial_i f_1^\varphi \in C_c^{2,0}(\mathbb{R}^d \times S)$, hence Condition 2.4 and (7.82) show that $g_2^\varphi \in C_c^{2,0}(\mathbb{R}^d \times S)$. Thus, by standard results on exchanging the order of integrals and derivatives (see, e.g., Theorem 2.27 in [9]), we get $\bar{m} g_2^\varphi \in C_c^2(\mathbb{R}^d)$. It therefore follows that $g_2^\varphi - \bar{m} g_2^\varphi \in C_c^{2,0}(\mathbb{R}^d \times S)$, and so, by (7.83) and Lemma 2.6(ii), we have $f_2^\varphi \in C_c^{2,0}(\mathbb{R}^d \times S)$. To summarize, we have seen that
$$g_1^\varphi,\ f_1^\varphi,\ g_2^\varphi - \bar{m} g_2^\varphi,\ f_2^\varphi \in C_c^{2,0}(\mathbb{R}^d \times S). \tag{7.86}$$
Thus (see (4.29)) we have $f^{\varepsilon, \varphi} \in C_c^{2,0}(\mathbb{R}^d \times S)$, and (see (7.85)) the mapping $(x, z) \mapsto Q[f^{\varepsilon, \varphi}(x, \cdot)](z)$ defines a member of $C_c^{2,0}(\mathbb{R}^d \times S)$. This shows that $f^{\varepsilon, \varphi} \in \tilde{D}$ (see (4.27)). We next evaluate $C^\varepsilon f^{\varepsilon, \varphi}$. From $C^\varepsilon$ in (4.28), (7.85), and some simplification, we find
$$C^\varepsilon f^{\varepsilon, \varphi}(x, z) = \frac{1}{\varepsilon}\left[\sum_i F^i(x, z)\,\partial_i \varphi(x) - g_1^\varphi(x, z)\right] + \left[\sum_i F^i(x, z)\,\partial_i f_1^\varphi(x, z) + \sum_i G^i(x, z)\,\partial_i \varphi(x) + \frac{1}{2} \sum_{i,j} [BB^T(x)]_{ij}\,\partial_i \partial_j \varphi(x) - g_2^\varphi(x, z)\right] + \bar{m} g_2^\varphi(x) + \varepsilon \gamma_1^\varphi(x, z) + \varepsilon^2 \gamma_2^\varphi(x, z) \tag{7.87}$$
for some $\gamma_1^\varphi, \gamma_2^\varphi \in C_c(\mathbb{R}^d \times S)$. Now the expressions in square brackets on the right-hand side cancel to zero (by (7.80) and (7.82)). An easy calculation involving Lemma 2.6(ii), (7.80), and (7.82) shows that
$$\bar{m} g_2^\varphi(x) = L\varphi(x), \tag{7.88}$$
as required.

as required. Proof of Proposition 4.23. Fix a mapping g: [0, ∞) → [0, 1] such that g(r ) := 1, ∀r ∈ [0, 1]; g(r ) := 0, ∀r ∈ [2, ∞); and g(·) has continuous derivatives of all orders. For each k = 1, 2, . . . define gk : [0, ∞) → [0, 1] by

1 gk (r ) := g log[1 + r ] , ∀r ∈ [0, ∞), (7.89) k and put ϕk (x) := gk (|x|), ∀x ∈ Rd . Then ϕk ∈ Cc∞ (Rd ). For notational convenience, ϕ ϕ ϕ ϕ let f 1k , f 2k , γ1k , and γ2k , denote respectively the functions f 1 k , f 2 k , γ1 k , and γ2 k in Proposition 4.22; let f k,n denote the member of D˜ given by (4.29) with ε := εn , ϕ = ϕk ;

118

V. M. Lucic and A. J. Heunis

and let Mtk,n denote the martingale in Proposition 4.21 when ε := εn and f := f k,n , so that Mtk,n := ϕk (X tεn ) + εn f 1k (X tεn , Z tεn ) + εn2 f 2k (X tεn , Z tεn )  t − [Lϕk (X sεn ) + εn γ1k (X sεn , Z sεn ) + εn2 γ2k (X sεn , Z sεn )] ds.

(7.90)

0

Now fix some t ∈ [0, ∞), some (small) η ∈ (0, 1), and some positive integer k0 such that ϕk (x0 ) = 1, ∀k ≥ k0 (for x0 in (1.10)). In view of the bounds on bi (·) and a i j (·) in Remark 2.9, an easy calculation shows that limk→∞ Lϕk  = 0, thus fix integer k1 := k1 (η) ≥ k0 such that  t     Lϕk (X εn ) ds  < η, 1 s  

∀n = 1, 2, . . . .

(7.91)

0

¯ d × S) (see Proposition 4.22), it follows from (7.90) that Since f 1k1 , f 2k1 , γ1k1 , γ2k1 ∈ C(R Mtk1 ,n = ϕk1 (X tεn ) + O(εn ) −

 0

t

Lϕk1 (X sεn ) ds,

n = 1, 2, . . . .

(7.92)

Then, from (7.92), (7.91), and the facts that ϕk1 (x0 ) = 1 and E[Mtk1 ,n ] = E[M0k1 ,n ] (by Proposition 4.21), we find that E[ϕk1 (X tεn )] ≥ 1 + O(εn ) − η. Since the support of ϕk1 is the ball of radius Rη := e2k1 − 1, we have P[|X tεn | ≤ Rη ] ≥ 1 + O(εn ) − η, ∀n = 1, 2, . . . , and tightness of the sequence {X tεn , n = 1, 2, . . .} follows.
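The cutoff functions of (7.89) can be realized concretely. Start from a smooth transition $g$ equal to $1$ on $[0, 1]$ and $0$ on $[2, \infty)$ (the standard bump-function construction below is one common choice, not dictated by the paper), then slow it down logarithmically as in (7.89); the support radius $e^{2k} - 1$ used at the end of the proof drops out immediately.

```python
import numpy as np

def _s(x):
    """Smooth transition: 0 for x <= 0, 1 for x >= 1, C-infinity throughout."""
    def bump(t):
        t = np.asarray(t, dtype=float)
        out = np.zeros_like(t)
        pos = t > 0
        out[pos] = np.exp(-1.0 / t[pos])
        return out
    return bump(x) / (bump(x) + bump(1.0 - x))

def g(r):
    """g = 1 on [0, 1], g = 0 on [2, oo), smooth in between."""
    return 1.0 - _s(np.asarray(r, dtype=float) - 1.0)

def g_k(r, k):
    """The rescaled cutoff of (7.89): g_k(r) = g(log(1 + r) / k)."""
    return g(np.log1p(r) / k)

# g_k equals 1 for r <= e^k - 1 and vanishes for r >= e^{2k} - 1, so
# phi_k(x) = g_k(|x|) is supported in the ball of radius e^{2k} - 1.
```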

8. Appendix for Section 5

Proof of Lemma 5.28. Fix some $\varepsilon \in (0, 1]$, $f \in \tilde{D}$, and $k = 1, 2, \ldots, r$. Since $\{W_t\}$ is continuous, the result follows once we establish that
$$\left[f(X_t^\varepsilon, Z_t^\varepsilon) - \int_0^t (C^\varepsilon f)(X_s^\varepsilon, Z_s^\varepsilon)\,ds\right] W_t^k - \int_0^t (E_k f)(X_s^\varepsilon, Z_s^\varepsilon)\,ds \quad \text{is an } \{\mathcal{F}_t^{W, Z^\varepsilon}\}\text{-martingale} \tag{8.93}$$
(for $\{\mathcal{F}_t^{W, Z^\varepsilon}\}$ defined in Remark 2.8), since the quantity in square brackets is just $M_t^{\varepsilon, f}$. We show this by an argument similar to that used in the proof of Proposition 4.21. Since $\{(W_t, \mathcal{M}_t)\}$ is a Wiener process on $(\Omega, \mathcal{F}, P)$ (Remark 6.43), it easily follows from (1.10), (5.35), (7.73), and Itô's formula that
$$f(X_t^\varepsilon, z)\,W_t^k - \int_0^t [W_s^k (Vf)(X_s^\varepsilon, Z_s^\varepsilon, z) + (E_k f)(X_s^\varepsilon, z)]\,ds \quad \text{is an } \{\mathcal{M}_t\}\text{-martingale}$$


for each z ∈ S, thus, for 0 ≤ t_1 < t_2 < ∞,

E[ f(X_{t_2}^ε, z) W_{t_2}^k − f(X_{t_1}^ε, z) W_{t_1}^k − ∫_{t_1}^{t_2} [W_s^k (V f)(X_s^ε, Z_s^ε, z) + (E_k f)(X_s^ε, z)] ds | M_{t_1} ] = 0  a.s.   (8.94)

By Problem 1.5.7 of [20] and the M_{t_1}-measurability of Z_{t_2}^ε (see Remark 6.43), it is easily seen that (8.94) still holds when z is replaced with Z_{t_2}^ε, and then, since F_{t_1}^{W,Z^ε} ⊂ M_{t_1}, we get

E[ f(X_{t_2}^ε, Z_{t_2}^ε) W_{t_2}^k − f(X_{t_1}^ε, Z_{t_2}^ε) W_{t_1}^k − ∫_{t_1}^{t_2} [W_s^k (V f)(X_s^ε, Z_s^ε, Z_{t_2}^ε) + (E_k f)(X_s^ε, Z_{t_2}^ε)] ds | F_{t_1}^{W,Z^ε} ] = 0  a.s.   (8.95)

Now, exactly as in the proof of Proposition 4.21, for 0 ≤ t_1 < t_2 < ∞ we have (see (7.78))

E[ f(X_{t_1}^ε, Z_{t_2}^ε) − f(X_{t_1}^ε, Z_{t_1}^ε) − ε^{−2} ∫_{t_1}^{t_2} Q[f(X_{t_1}^ε, ·)](Z_s^ε) ds | F_{t_1}^{W,Z^ε} ] = 0  a.s.

and therefore

E[ f(X_{t_1}^ε, Z_{t_2}^ε) W_{t_1}^k − f(X_{t_1}^ε, Z_{t_1}^ε) W_{t_1}^k − ε^{−2} W_{t_1}^k ∫_{t_1}^{t_2} Q[f(X_{t_1}^ε, ·)](Z_s^ε) ds | F_{t_1}^{W,Z^ε} ] = 0  a.s.   (8.96)

Since (8.95) and (8.96) hold for all 0 ≤ t_1 < t_2 < ∞, by Lemma 4.3.4(a) of [8] (with E := S, X := Z^ε, G_t := F_t^{W,Z^ε}, and mappings u, v : [0, ∞) × E × Ω → R, w : [0, ∞) × [0, ∞) × E × Ω → R defined by u(t, z, ω) := f(X_t^ε(ω), z) W_t^k(ω), v(t, z, ω) := W_t^k(ω)(V f)(X_t^ε(ω), Z_t^ε(ω), z) + (E_k f)(X_t^ε(ω), z), and w(t, s, z, ω) := ε^{−2} W_t^k(ω) Q[f(X_t^ε(ω), ·)](z)) one sees that

f(X_t^ε, Z_t^ε) W_t^k − ∫_0^t W_s^k (C_ε f)(X_s^ε, Z_s^ε) ds − ∫_0^t (E_k f)(X_s^ε, Z_s^ε) ds  is an {F_t^{W,Z^ε}}-martingale,   (8.97)

where we have used (7.74). Now {W_t^k} is an {F_t^{W,Z^ε}}-Wiener process (by Condition 2.5), thus

∫_0^t W_s^k (C_ε f)(X_s^ε, Z_s^ε) ds − W_t^k ∫_0^t (C_ε f)(X_s^ε, Z_s^ε) ds  is an {F_t^{W,Z^ε}}-martingale

(see Problem 2.9.22 of [8]), so that (8.93) follows from (8.97).
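For clarity, the last deduction is just the observation that the process in (8.93) is the sum of the two martingales displayed above (a sketch, in the notation of this proof):

```latex
\begin{align*}
&\Bigl[f(X^{\varepsilon}_t,Z^{\varepsilon}_t)
   -\int_0^t (C_\varepsilon f)(X^{\varepsilon}_s,Z^{\varepsilon}_s)\,ds\Bigr]W^k_t
   -\int_0^t (E_k f)(X^{\varepsilon}_s,Z^{\varepsilon}_s)\,ds\\
&\quad=\underbrace{\Bigl[f(X^{\varepsilon}_t,Z^{\varepsilon}_t)W^k_t
   -\int_0^t W^k_s\,(C_\varepsilon f)\,ds
   -\int_0^t (E_k f)\,ds\Bigr]}_{\text{the martingale in (8.97)}}
 +\underbrace{\Bigl[\int_0^t W^k_s\,(C_\varepsilon f)\,ds
   -W^k_t\int_0^t (C_\varepsilon f)\,ds\Bigr]}_{\text{the martingale from Problem 2.9.22 of [8]}},
\end{align*}
```

and a sum of {F_t^{W,Z^ε}}-martingales is again an {F_t^{W,Z^ε}}-martingale.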


Proof of Lemma 5.33. (i) Fix Φ_1, Φ_2 ∈ D(H(G, H, c)). Then it is clear that Φ_1Φ_2 ∈ D(H(G, H, c)), and it is easily checked that α_1Φ_1 + α_2Φ_2 has the form of a member of D(H(G, H, c)), ∀α_1, α_2 ∈ R. Thus D(H(G, H, c)) is a subalgebra of C̄(P(E)). To see that D(H(G, H, c)) includes constant functions, fix ϕ ∈ C̄(E) and H ∈ C_c^∞(R) such that H(x) = 1, ∀|x| ≤ ‖ϕ‖. Then Φ(ν) := H(νϕ), ∀ν ∈ P(E), defines a member of D(H(G, H, c)) which has the constant value of unity.

(ii) Suppose that Φ(ν_1) = Φ(ν_2), ∀Φ ∈ D(H(G, H, c)), for some ν_1, ν_2 ∈ P(E). Then H(ν_1ϕ) = H(ν_2ϕ), ∀H ∈ C_c^∞(R), ∀ϕ ∈ D(G, H). Since C_c^∞(R) separates points in R, we get ν_1ϕ = ν_2ϕ, ∀ϕ ∈ D(G, H), so that ν_1 = ν_2 (since D(G, H) ⊂ C̄(E) is separating).

(iii) The argument is similar to that for (ii) but just uses the elementary fact that C_c^∞(R) strongly separates points in R.

Proof of Theorem 5.38. Fix some ξ ∈ P(R^d) × R^r, and observe from Lemma 5.35 that there exists a corlol (indeed continuous) solution of the martingale problem for (Ĥ(L, B, h), δ_ξ). To see uniqueness of the solution we need the next two results, the first of which is established later in the present section:

Theorem 8.44. Suppose that {(Ω̃, F̃, {F̃_t}, P̃), (ν̃_t, Ṽ_t)} is a progressively measurable solution of the martingale problem for Ĥ(L, B, h). Then there exists some (P(R^d) × R^r)-valued {F̃_t}-adapted process {(ν̃_t′, Ṽ_t′)} on the filtered probability space (Ω̃, F̃, {F̃_t}, P̃) such that {(ν̃_t′, Ṽ_t′)} is a modification of {(ν̃_t, Ṽ_t)} and {(Ω̃, F̃, {F̃_t}, P̃), (ν̃_t′, Ṽ_t′ − Ṽ_0′)} is a solution of the normalized filter equation corresponding to (R^d; L, B, h) (in the sense of Definition 5.25); namely, for each ϕ ∈ C_c^∞(R^d) we have a.s.

ν̃_t′ϕ = ν̃_0′ϕ + ∫_0^t ν̃_s′(Lϕ) ds + Σ_{k=1}^r ∫_0^t RB_k(ϕ, h^k, ν̃_s′) d(Ṽ_s′ − Ṽ_0′)^k,  ∀t ∈ [0, T].   (8.98)

Remark 8.45. Theorem 8.44 shows that any progressively measurable solution of the martingale problem for Ĥ(L, B, h) has a corlol modification (in fact, by Remark 5.26, the modification is actually continuous, and not just corlol).

The next result is an immediate consequence of Theorem 2.21 of [18], which establishes the property of uniqueness in law for the normalized filter equation when the signal and observation are given by the pair (1.11) and (1.7), with b(·), c(·), and B(·) being continuous and linearly bounded, and h(·) being Borel-measurable and uniformly bounded. In the problem under consideration here these hypotheses are implied by Conditions AII.

Theorem 8.46. Suppose Conditions AII (see Remark 2.17). If {(Ω̃, F̃, {F̃_t}, P̃), (η̃_t, W̃_t)} and {(Ω̂, F̂, {F̂_t}, P̂), (η̂_t, Ŵ_t)} are solutions of the normalized filter equation corresponding to (R^d; L, B, h) (in the sense of Definition 5.25), with L_P̃(η̃_0) = L_P̂(η̂_0), then the (P(R^d) × R^r)-valued processes {(η̃_t, W̃_t)} and {(η̂_t, Ŵ_t)} are identically distributed.


From Theorems 8.44 and 8.46 we see that any two progressively measurable solutions of the martingale problem for (Ĥ(L, B, h), δ_ξ) have identical finite-dimensional distributions. However, we have already seen that there exists a corlol solution of the martingale problem for (Ĥ(L, B, h), δ_ξ), so it follows that the corlol martingale problem for (Ĥ(L, B, h), δ_ξ) is well-posed, ∀ξ ∈ P(R^d) × R^r. Since Ĥ(L, B, h) has been shown to satisfy the separability condition (I) of Theorem 3.20, we then see from Theorem 2.1 of [2] that the corlol martingale problem for Ĥ(L, B, h) is well-posed. Consequently, one sees from Remark 8.45 that, for each µ ∈ P(P(R^d) × R^r), the finite-dimensional distributions of a progressively measurable solution of the martingale problem for (Ĥ(L, B, h), µ) are uniquely determined, as required to establish Theorem 5.38.

Proof of Theorem 8.44. The proof is given in a series of steps:

Step 1. Since {(Ω̃, F̃, {F̃_t}, P̃), (ν̃_t, Ṽ_t)} is a progressively measurable solution of the martingale problem for Ĥ(L, B, h), and since H(L, B, h) includes the constant functions (by Lemma 5.33(i)), it follows that g(Ṽ_t) − (1/2)∫_0^t Δg(Ṽ_s) ds is an {F̃_t}-martingale for each g ∈ C_c^∞(R^r). Thus, Proposition 5.3.5 of [8] ensures that {Ṽ_t} has a continuous modification {Ṽ_t′}, which is {F̃_t}-adapted, thus {(Ṽ_t′ − Ṽ_0′, F̃_t)} is a standard R^r-valued Wiener process.

Step 2. We next show that the P(R^d)-valued process {ν̃_t} has a P(R^d)-valued corlol modification. Let R^{d∗} be the one-point compactification of R^d, with the point at infinity denoted by ∆, and let L′ and B′ be linear operators on C̄(R^{d∗}), with common domain D(L′, B′), defined by

D(L′, B′) := {ϕ ∈ C̄(R^{d∗}) : ϕ|_{R^d} ∈ C_c^∞(R^d), ϕ(∆) = 0},   (8.99a)
L′ := {(ϕ, ψ) ∈ D(L′, B′) × C̄(R^{d∗}) : (ϕ|_{R^d}, ψ|_{R^d}) ∈ L, ϕ(∆) = ψ(∆) = 0},   (8.99b)
B′ := {(ϕ, ψ) ∈ D(L′, B′) × C̄(R^{d∗}) : (ϕ|_{R^d}, ψ|_{R^d}) ∈ B, ϕ(∆) = ψ(∆) = 0}   (8.99c)

(recall (2.16) and (5.39)). Also, define the (Borel-measurable) h′ : R^{d∗} → R^r by

h′(x) := h(x), ∀x ∈ R^d,  h′(∆) := 0.   (8.100)
Then we have (A) that H(L′, B′, h′) ⊂ C̄(P(R^{d∗})) × B(P(R^{d∗})). From Problem 5.4.25 of [15] it easily follows that D(L′, B′) ⊂ C̄(R^{d∗}) is separating, and therefore Lemma 5.33 shows that D(H(L′, B′, h′)) is an algebra in C̄(P(R^{d∗})) which includes constant functions and separates points in P(R^{d∗}). Since P(R^{d∗}) is compact, the Stone–Weierstrass theorem establishes that D(H(L′, B′, h′)) is a dense subset of C̄(P(R^{d∗})), and therefore we have (B) the set D(H(L′, B′, h′)) is separating. Moreover, since P(R^{d∗}) is a compact metric space, we know that C̄(P(R^{d∗})) is separable, and thus, since D(H(L′, B′, h′)) has been seen to be a dense subset of C̄(P(R^{d∗})), it follows (C) that D(H(L′, B′, h′)) contains a countable subset which separates points in P(R^{d∗}). Now {(Ω̃, F̃, {F̃_t}, P̃), (ν̃_t, Ṽ_t)} is a progressively measurable solution of the


martingale problem for Ĥ(L, B, h), from which we see that {(Ω̃, F̃, {F̃_t}, P̃), (ν̃_t)} is a progressively measurable solution of the martingale problem for H(L, B, h) (take g ≡ 1 in (5.43)). We regard {ν̃_t} as a P(R^{d∗})-valued process with ν̃_t({∆}) = 0, ∀t, and then it follows from (8.99) that (D) {(Ω̃, F̃, {F̃_t}, P̃), (ν̃_t)} solves the martingale problem for H(L′, B′, h′). In view of (A)–(D), compactness of P(R^{d∗}), and Theorem 4.3.6 of [8] (with E := P(R^{d∗}), A := H(L′, B′, h′), and X := ν̃), there exists a P(R^{d∗})-valued corlol modification of {ν̃_t}, which we denote by {ν̃_t′}. In particular, since (Ω̃, F̃, {F̃_t}, P̃) is a complete filtered probability space, the corlol process {ν̃_t′} is {F̃_t}-progressively measurable, thus {(Ω̃, F̃, {F̃_t}, P̃), (ν̃_t′)} is a solution of the corlol martingale problem for H(L′, B′, h′), and ν̃_0′(R^d) = 1 a.s. Next we need the following result, which is a variant of Theorem 4.3.8 of [8], and which is established later in this section:

Lemma 8.47. Suppose that Condition AII holds, and let {(Ω̂, F̂, {F̂_t}, P̂), (ν̂_t)} be a solution of the corlol martingale problem for H(L′, B′, h′). If P̂(ν̂_0(R^d) = 1) = 1, then P̂(ν̂_t(R^d) = 1, ∀t ≥ 0) = 1.

Lemma 8.47 shows that P̃(ν̃_t′(R^d) = 1, ∀t ∈ [0, T]) = 1, thus {ν̃_t′} is a P(R^d)-valued, {F̃_t}-adapted, corlol modification of {ν̃_t}. We conclude that {(ν̃_t′, Ṽ_t′)} is a (P(R^d) × R^r)-valued, {F̃_t}-adapted, corlol modification of {(ν̃_t, Ṽ_t)}.

Step 3. Fix ϕ ∈ C_c^∞(R^d) and put

η̃_t := ν̃_t′ϕ − ∫_0^t ν̃_s′(Lϕ) ds.   (8.101)

Since {ν̃_t′} is P(R^d)-valued, corlol, and {F̃_t}-adapted, it follows that {η̃_t} is R-valued, corlol, and {F̃_t}-adapted. Fix H_2 ∈ C_c^∞(R) such that H_2(x) = x, ∀|x| ≤ ‖ϕ‖, and define Φ_2 ∈ D(H(L, B, h)) by

Φ_2(ν) := H_2(νϕ),  ∀ν ∈ P(R^d).   (8.102)

Then (see (5.42))

Φ_2(ν) = νϕ  and  H(L, B, h)(Φ_2)(ν) = ν(Lϕ),  ∀ν ∈ P(R^d).   (8.103)

Now {(Ω̃, F̃, {F̃_t}, P̃), (ν̃_t′, Ṽ_t′)} solves the corlol martingale problem for Ĥ(L, B, h), and therefore {(Ω̃, F̃, {F̃_t}, P̃), (ν̃_t′)} solves the corlol martingale problem for H(L, B, h), and hence from (8.101) and (8.103) it follows that {η̃_t} is a corlol {F̃_t}-martingale. However, {(Ṽ_t′)^k} is a continuous {F̃_t}-martingale (in fact, a Wiener process) for each k = 1, 2, . . . , r, so that Theorem VI.37.8 of [19] gives a continuous covariation process {[η̃, (Ṽ′)^k]_t} which is unique to within indistinguishability. We will show that

[η̃, (Ṽ′)^k]_t = ∫_0^t R_k(s) ds,  where R_k(s) := RB_k(ϕ, h^k, ν̃_s′)   (8.104)

(recall (5.31), (5.39), and (5.40)). For each n = 1, 2, . . . fix some g_n ∈ C_c^∞(R^r) with g_n(y) = y^k, ∀y ∈ R^r with |y| ≤ n, and put Φ_n := Φ_2 ⊗ g_n, n = 3, 4, . . . , where Φ_2 is


given by (8.102). Then Φ_n ∈ D(Ĥ(L, B, h)), and it is easily seen that, ∀|y| ≤ n,

Φ_n(ν, y) = (y^k)(νϕ)  and  Ĥ(L, B, h)(Φ_n)(ν, y) = (y^k)ν(Lϕ) + RB_k(ϕ, h^k, ν),  ν ∈ P(R^d), y ∈ R^r.   (8.105)

Put T_n := inf{t : |(Ṽ_t′)^k| ≥ n}, n = 1, 2, . . . . Then T_n is an {F̃_t}-stopping time, and since {(Ω̃, F̃, {F̃_t}, P̃), (ν̃_t′, Ṽ_t′)} solves the corlol martingale problem for Ĥ(L, B, h), we see from the optional sampling theorem and (8.105) that

M_n(t) := (Ṽ_{t∧T_n}′)^k (ν̃_{t∧T_n}′ϕ) − ∫_0^{t∧T_n} [(Ṽ_s′)^k ν̃_s′(Lϕ) + R_k(s)] ds   (8.106)

is a corlol {F̃_t}-martingale. From (8.106) and (8.101) we have

η̃_{t∧T_n}(Ṽ_{t∧T_n}′)^k − ∫_0^{t∧T_n} R_k(s) ds = M_n(t) + ∫_0^{t∧T_n} (Ṽ_s′)^k ν̃_s′(Lϕ) ds − (∫_0^{t∧T_n} ν̃_s′(Lϕ) ds)(Ṽ_{t∧T_n}′)^k,   (8.107)

and since {(Ṽ_t′)^k} is a continuous {F̃_t}-martingale it follows from Problem 2.9.22 in [8] that

∫_0^t (Ṽ_s′)^k ν̃_s′(Lϕ) ds − (∫_0^t ν̃_s′(Lϕ) ds)(Ṽ_t′)^k

is a continuous {F̃_t}-martingale. Thus, the process on the left-hand side of (8.107) is a corlol {F̃_t}-martingale for all positive integers n, whence η̃_t(Ṽ_t′)^k − ∫_0^t R_k(s) ds is a corlol {F̃_t}-local martingale. Now (8.104) follows by the uniqueness of [η̃, (Ṽ′)^k]_t.

Step 4.

Put

ρ̃_t := η̃_t − Σ_{k=1}^r ∫_0^t RB_k(ϕ, h^k, ν̃_s′) d(Ṽ_s′ − Ṽ_0′)^k.   (8.108)

˜ {F˜ t }, P), ˜ ρ˜t = ρ˜0 , ∀t ≥ 0) = 1, which shows that {(, ˜ ˜ F, We shall establish that P(   ˜ d ˜ (˜νt , Vt − V0 )} is a solution of the normalized filter equation corresponding to (R ; L, B, d ¯ )) (by Lemma 5.33) we have 22 ∈ h). Since D(H(L, B, h)) is a subalgebra of C(P(R D(H(L, B, h)) (for 2 given by (8.102)), and it is easily seen that 22 (ν) = (νϕ)2

and H(L, B, h)(22 )(ν) = 2(νϕ)ν(Lϕ) +

r 

RB2 k (ϕ, h k , ν),

k=1

∀ν ∈ P(Rd ).

(8.109)

Now in Step 3 we have seen that {(Ω̃, F̃, {F̃_t}, P̃), (ν̃_t′)} solves the corlol martingale problem for H(L, B, h), and hence from (8.109) and (8.104) we know that

(ν̃_t′ϕ)^2 − ∫_0^t [2(ν̃_s′ϕ)ν̃_s′(Lϕ) + Σ_{k=1}^r R_k^2(s)] ds   (8.110)


is an {F̃_t}-martingale. From (8.101), (8.110), and Problem 2.9.29 of [8], we then get

η̃_t^2 − ∫_0^t [2(ν̃_s′ϕ)ν̃_s′(Lϕ) + Σ_{k=1}^r R_k^2(s) − 2(ν̃_s′ϕ)ν̃_s′(Lϕ)] ds ≡ η̃_t^2 − ∫_0^t Σ_{k=1}^r R_k^2(s) ds   (8.111)

is an {F̃_t}-martingale. Squaring (8.108) and taking expectations yields

E[ρ̃_t^2] = E[η̃_t^2] − 2 Σ_{k=1}^r E[η̃_t ∫_0^t R_k(u) d(Ṽ_u′ − Ṽ_0′)^k] + Σ_{k=1}^r E[∫_0^t R_k^2(u) du].   (8.112)

Since the quantity in (8.111) is an {F̃_t}-martingale, we have E[η̃_t^2 − ∫_0^t Σ_{k=1}^r R_k^2(u) du] = E[η̃_0^2] ≡ E[ρ̃_0^2], and combining this with (8.112) gives

E[ρ̃_t^2] = E[ρ̃_0^2] − 2 Σ_{k=1}^r E[η̃_t ∫_0^t R_k(u) d(Ṽ_u′ − Ṽ_0′)^k] + 2 Σ_{k=1}^r E[∫_0^t R_k^2(u) du].   (8.113)

Since {ν̃_t′} is corlol and h^k ∈ C̄(R^d), it follows that {R_k(t)} defined in (8.104) is corlol, so that {R_k(t−)} is left-continuous and {F̃_t}-adapted, thus it is {F̃_t}-predictable. Then, since {η̃_t} has been seen to be a corlol {F̃_t}-martingale, it follows from (8.104) and Theorem VI.37.9(vi) of [19] that

E[η̃_t ∫_0^t R_k(s−) d(Ṽ_s′ − Ṽ_0′)^k] = E[∫_0^t R_k(s−) R_k(s) ds].   (8.114)

Again since {R_k(t)} is corlol, the set {t ∈ [0, T] : R_k(t) ≠ R_k(t−)} is countable, hence has Lebesgue measure zero, for each ω̃, so that (8.114) gives

E[η̃_t ∫_0^t R_k(s) d(Ṽ_s′ − Ṽ_0′)^k] = E[∫_0^t R_k^2(s) ds].   (8.115)

Combining (8.115) with (8.113) gives E[ρ̃_t^2 − ρ̃_0^2] = 0, and, since {ρ̃_t} is an {F̃_t}-martingale, this gives E[(ρ̃_t − ρ̃_0)^2] = 0, and thus ρ̃_t = ρ̃_0 a.s. for each t ≥ 0. However, the R-valued process {ρ̃_t} is corlol, hence we find P̃(ρ̃_t = ρ̃_0, ∀t ≥ 0) = 1, as required to show that (8.98) holds.

Proof of Lemma 8.47. Fix some ϕ ∈ C_c^∞(R^d) such that I[0,1](|x|) ≤ ϕ(x) ≤ I[0,2](|x|), x ∈ R^d, and define ϕ_n ∈ C̄(R^{d∗}) by ϕ_n(x) := ϕ(x/n), ∀x ∈ R^d, ϕ_n(∆) := 0, ∀n = 1, 2, . . . . Fix H ∈ C_c^∞(R) such that H(x) = x, for all |x| ≤ 1, and put
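Spelled out, the bookkeeping behind E[ρ̃_t^2 − ρ̃_0^2] = 0 is (a sketch in the notation above):

```latex
\begin{align*}
E[\tilde\rho_t^{\,2}]
 &= E[\tilde\rho_0^{\,2}]
   -2\sum_{k=1}^r E\Bigl[\tilde\eta_t\!\int_0^t\! R_k(u)\,d(\tilde V'_u-\tilde V'_0)^k\Bigr]
   +2\sum_{k=1}^r E\Bigl[\int_0^t R_k^2(u)\,du\Bigr]
 &&\text{(by (8.113))}\\
 &= E[\tilde\rho_0^{\,2}]
   -2\sum_{k=1}^r E\Bigl[\int_0^t R_k^2(u)\,du\Bigr]
   +2\sum_{k=1}^r E\Bigl[\int_0^t R_k^2(u)\,du\Bigr]
  = E[\tilde\rho_0^{\,2}]
 &&\text{(by (8.115)),}
\end{align*}
```

and then E[(ρ̃_t − ρ̃_0)^2] = E[ρ̃_t^2] − 2E[ρ̃_tρ̃_0] + E[ρ̃_0^2] = E[ρ̃_t^2] − E[ρ̃_0^2] = 0, since the martingale property gives E[ρ̃_tρ̃_0] = E[E[ρ̃_t | F̃_0]ρ̃_0] = E[ρ̃_0^2].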


Φ_n(ν) := H(νϕ_n), ∀ν ∈ P(R^{d∗}). Since ϕ_n ∈ D(L′, B′) (recall (8.99a)), we see that Φ_n ∈ D(H(L′, B′, h′)), with Φ_n(ν) = νϕ_n and H(L′, B′, h′)(Φ_n)(ν) = ν(L′ϕ_n), ∀ν ∈ P(R^{d∗}). However, {(Ω̂, F̂, {F̂_t}, P̂), (ν̂_t)} solves the corlol martingale problem for H(L′, B′, h′), thus

N_t^n := ν̂_tϕ_n − ∫_0^t ν̂_s(L′ϕ_n) ds,  t ≥ 0,

is a corlol {F̂_t}-martingale, and therefore {N_t^n} is an {F̂_{t+}}-martingale. Now put

M_k := {µ ∈ P(R^{d∗}) : µ({∆}) < 1/k},  k = 1, 2, . . . ,  and  M := ∩_{k=1}^∞ M_k,   (8.116)

T_m^k := inf{t ≥ 0 : inf_{y ∈ P(R^{d∗})\M_k} d(y, ν̂_t) < 1/m},  ∀k, m > 0,   (8.117)

put T^k := lim_{m→∞} T_m^k, and observe that the limit

µ̂_t^k ≡ lim_{m→∞} ν̂_{t∧T_m^k}   (8.118)

exists in P(R^{d∗}) for each k = 1, 2, . . . , since {ν̂_t} has left limits. Now M_k ⊂ P(R^{d∗}) is open, from which it easily follows that

{µ̂_t^k(ω) ∈ M_k} ⊂ {T^k(ω) > t},   (8.119)

for all k = 1, 2, . . . , ω ∈ Ω̂. We have seen that {N_t^n} is a corlol {F̂_{t+}}-martingale, thus the Optional Sampling Theorem gives

E^P̂[ν̂_{t∧T_m^k}ϕ_n] = E^P̂[ν̂_0ϕ_n] + E^P̂[∫_0^{t∧T_m^k} ν̂_s(L′ϕ_n) ds],  ∀k, m, n = 1, 2, . . . ,   (8.120)

and, from dominated convergence (with m → ∞, n fixed) and (8.118), we then get

E^P̂[µ̂_t^kϕ_n] = E^P̂[ν̂_0ϕ_n] + E^P̂[∫_0^{t∧T^k} ν̂_s(L′ϕ_n) ds],  ∀k, n = 1, 2, . . . .   (8.121)

Now b.p.-lim_{n→∞} (ϕ_n, L′ϕ_n) = (I_{R^d}, 0) (from the upper bounds in Remark 2.9 and Problem 4.11.12 of [8]), so that letting n → ∞ in (8.121) gives E^P̂[µ̂_t^k(R^d)] = E^P̂[ν̂_0(R^d)] = 1, that is P̂[µ̂_t^k ∈ M_k] = 1, and therefore (see (8.119)) we have P̂[T^k > t] = 1, which,


ˆ νs ∈ Mk , ∀s ∈ [0, t]] = 1, ∀k = 1, 2, . . . . Thus by the definition of T k , implies P[ˆ ˆ P[ˆνs ∈ M, ∀s ∈ [0, t]] = 1 (see (8.116)), and since M = {µ ∈ P(Rd∗ ) : µ({}) = 0} the result follows from the arbitrary choice of t. Proof of Lemma 5.40 (motivated by Goggin [11, p. 1100]). Fix t ≥ 0. It is enough to η show that, for each η ∈ (0, 1), there is a sequence of compacta K k ⊂ Rd , k = 1, 2, . . . , such that  1 η η n inf P πt (K k ) > 1 − ∀k = 1, 2, . . . , (8.122) ≥ 1− k, n k 2 for then it follows that    ∞  1 η n inf P πt (K k ) > 1 − ≥ 1 − η. (8.123) n k k=1  η d Now put Kη := ∞ Kη is a tight k=1 {π ∈ P(R ) : π(K k ) > 1−(1/k)}. Then it is clear that η ∞ d n η (hence relatively compact) subset of P(R ), and P[πt ∈ K ] = P[ k=1 {πtn (K k ) > 1 − (1/k)}], so that (8.123) gives   inf P πtn ∈ K¯η ≥ 1 − η. (8.124) n

Since K¯η ⊂ P(Rd ) is compact for each η ∈ (0, 1), it follows from (8.124) and the Prohorov theorem (Theorem 3.3.2 of [8]) that the sequence {L(πtn ), n = 1, 2, . . .} is relatively compact in P(P(Rd )), as required. Thus, fix some η ∈ (0, 1). It remains to η construct a sequence of compacta K k ⊂ Rd , k = 1, 2, . . . , such that (8.122) holds. By Proposition 4.23 and the Prohorov theorem, for each k = 1, 2, . . . there is a compact η K k ⊂ Rd such that η η η 1 − k < inf P[X tn ∈ K k ] = inf E[πtn (K k )], k = 1, 2, . . . , (8.125) n n k2 η

η

where the last equality follows since πtn (K k ) = P{X tn ∈ K k |Ftn }. Thus, for all n, k = 1, 2, . . . , we get   η 1 1 η η η η n n n n 1 − k < E πt (K k ) : πt (K k ) > 1 − + E πt (K k ) : πt (K k ) ≤ 1 − k2 k k 

 1 1 1 η η ≤ P πtn (K k ) > 1 − + 1− P πtn (K k ) ≤ 1 − k k k  1 1 1 η ≤ 1 − + P πtn (K k ) > 1 − , k k k from which (8.122) follows.

9. Appendix of Miscellaneous Proofs and Results

Proof of Lemma 2.6. The existence of the Borel measures χ (z, ·) such that (2.13) and (i) hold is suggested by Remark 12.2.3 of [8]. For completeness we briefly summarize the proof:


Put U_nΦ := ∫_0^n [T_tΦ − m̄Φ] dt and UΦ := ∫_0^∞ [T_tΦ − m̄Φ] dt, for all Φ ∈ C̄(S), n = 1, 2, . . . . Then U_n is a bounded linear operator on C̄(S) and Remark 2.7 shows that U is a linear operator on C̄(S) with lim_n ‖U_nΦ − UΦ‖ = 0 for each Φ ∈ C̄(S). Then U is a bounded operator (by the Uniform Boundedness Principle), thus Φ → (UΦ)(z) is a bounded linear functional on C̄(S) for each z ∈ S. Now the Riesz Representation Theorem gives a unique signed regular Borel measure χ(z, ·) on B(S) such that (2.13) holds and sup_z ‖χ(z, ·)‖_TV ≤ |||U||| (the operator norm of U).

(i) Fix Φ ∈ C̄(S) with m̄Φ = 0. Then, from (2.13), we have UΦ = ∫_0^∞ T_tΦ dt. If {R_λ, λ ∈ (0, ∞)} is the resolvent of the semigroup {T_t}, then, from the Dominated Convergence Theorem and ∫_0^∞ ‖T_tΦ‖ dt < ∞ (recall Condition 2.3), we have lim_{λ→0+} ‖R_λΦ − UΦ‖ = 0, so that we get lim_{λ→0+} (R_λΦ, λR_λΦ − Φ) = (UΦ, −Φ) in C̄(S) × C̄(S). However, R_λΦ ∈ D(Q) with Q(R_λΦ) = λR_λΦ − Φ (see p. 11 of [8]) and Q is a closed operator (Corollary 1.1.6 of [8]), thus (UΦ, −Φ) ∈ Q, as required.

(ii) By making a Hahn decomposition of the signed measure χ(z, ·) and applying standard results on interchanging derivatives and integrals (see, e.g., Theorem 2.27 of [9]) to the integrals with respect to the positive and negative parts of χ(z, ·), we find that f(·, z) ∈ C^1(R^d), ∀z ∈ S, with (∂_j f)(x, z) = ∫_S (∂_j g)(x, z′)χ(z, dz′), ∀(x, z) ∈ R^d × S. Thus, to establish the result, it must be shown that

v(x, z) := ∫_S h(x, z′)χ(z, dz′),  ∀(x, z) ∈ R^d × S,   (9.126)

is a member of C̄(R^d × S) when h ∈ C̄(R^d × S). Fix such an h, put C := sup_z ‖χ(z, ·)‖_TV < ∞, and fix some (x_0, z_0) ∈ R^d × S and (small) η > 0. Let K denote the closed ball in R^d of unit radius centered at x_0. Since K × S is compact, the Stone–Weierstrass theorem gives some positive integer n and ϕ_i ∈ C(K) and Φ_i ∈ C̄(S), i = 1, 2, . . . , n, such that

|h(x, z′) − Σ_{i=1}^n ϕ_i(x)Φ_i(z′)| < η/3C,  ∀(x, z′) ∈ K × S   (9.127)

(e.g. see Exercise 4.68 of [9]). Now put Ψ_i(z) := ∫_S Φ_i(z′)χ(z, dz′), ∀z ∈ S, and observe, from (2.13), that Ψ_i ∈ C̄(S). Thus there is some δ > 0 such that

|Σ_{i=1}^n [ϕ_i(x)Ψ_i(z) − ϕ_i(x_0)Ψ_i(z_0)]| < η/3,   (9.128)

for all (x, z) ∈ R^d × S with |x_0 − x| + r(z, z_0) < δ (where r denotes the metric on S). However, from (9.126),

v(x, z) − v(x_0, z_0) = ∫_S [h(x, z′) − Σ_i ϕ_i(x)Φ_i(z′)] χ(z, dz′)
  + Σ_i [ϕ_i(x)Ψ_i(z) − ϕ_i(x_0)Ψ_i(z_0)]
  + ∫_S [Σ_i ϕ_i(x_0)Φ_i(z′) − h(x_0, z′)] χ(z_0, dz′),   (9.129)
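Bounding the three terms of (9.129) then gives the standard η/3-estimate (a sketch; Ψ_i(z) := ∫_S Φ_i(z′)χ(z, dz′) as above and C := sup_z ‖χ(z, ·)‖_TV):

```latex
\begin{align*}
|v(x,z)-v(x_0,z_0)|
 &\le \|\chi(z,\cdot)\|_{TV}\cdot\frac{\eta}{3C}
    +\Bigl|\sum_{i=1}^n\bigl[\varphi_i(x)\Psi_i(z)-\varphi_i(x_0)\Psi_i(z_0)\bigr]\Bigr|
    +\|\chi(z_0,\cdot)\|_{TV}\cdot\frac{\eta}{3C}\\
 &< \frac{\eta}{3}+\frac{\eta}{3}+\frac{\eta}{3}=\eta,
\end{align*}
```

where (9.127) controls the first and third terms and (9.128) the middle term, for all (x, z) ∈ K × S with |x − x_0| + r(z, z_0) < δ.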


for all (x, z) ∈ K × S. From (9.127), (9.128), and (9.129) we see that if |x − x_0| + r(z, z_0) < δ, then (x, z) ∈ K × S and |v(x, z) − v(x_0, z_0)| < η.

The following result is an immediate consequence of the separability of Ĉ(R^D) in the supremum norm (see also Remark 2.5 of [16]):

Lemma 9.48. For every n = 1, 2, . . . , there exists a countable set H_n ⊂ C_c^∞(R^n) with the following property: for each ϕ ∈ C_c^∞(R^n) there is some sequence {ϕ_k} ⊂ H_n such that, when g : R^n → R is bounded on bounded sets, we have lim_{k→∞} ‖gϕ_k − gϕ‖ = 0, lim_{k→∞} ‖g ∂_iϕ_k − g ∂_iϕ‖ = 0, and lim_{k→∞} ‖g ∂_i∂_jϕ_k − g ∂_i∂_jϕ‖ = 0, ∀i, j = 1, 2, . . . , n.

References

1. Bhatt A.G., Kallianpur G., Karandikar R.L. (1999) Robustness of the nonlinear filter. Stochastic Process. Appl., 81: 247–254.
2. Bhatt A.G., Karandikar R.L. (1993) Invariant measures and evolution equations for Markov processes characterized via martingale problems. Ann. Probab., 21: 2246–2268.
3. Bhatt A.G., Karandikar R.L. (1993) Weak convergence to a Markov process: the martingale approach. Probab. Theory Rel. Fields, 96: 335–351.
4. Bhatt A.G., Karandikar R.L. (2002) Robustness of the nonlinear filter: the correlated case. Stochastic Process. Appl., 97: 41–58.
5. Billingsley P. (1968) Convergence of Probability Measures. Wiley, New York.
6. Blankenship G., Papanicolaou G.C. (1978) Stability and control of stochastic systems with wide-band noise disturbances I. SIAM J. Appl. Math., 34: 437–476.
7. Dawson D.A. (1993) Measure-Valued Markov Processes. Lecture Notes in Mathematics, no. 1541, pages 1–260. Springer-Verlag, Berlin.
8. Ethier S.N., Kurtz T.G. (1986) Markov Processes: Characterization and Convergence. Wiley, New York.
9. Folland G. (1984) Real Analysis: Modern Techniques and Their Applications. Wiley, New York.
10. Fujisaki M., Kallianpur G., Kunita H. (1972) Stochastic differential equations for the non-linear filtering problem. Osaka J. Math., 9: 19–40.
11. Goggin E.M. (1994) Convergence in distribution of conditional expectations. Ann. Probab., 22: 1097–1114.
12. Hijab O. (1989) Partially observed control of Markov processes I. Stochastics, 28: 123–144.
13. Kallianpur G. (1980) Stochastic Filtering Theory. Springer-Verlag, Berlin.
14. Karandikar R.L. (1989) On the Metivier–Pellaumail inequality, Emery topology and pathwise formulae in stochastic calculus. Sankhyā Ser. A, 51: 121–143.
15. Karatzas I., Shreve S.E. (1991) Brownian Motion and Stochastic Calculus, 2nd edn. Springer-Verlag, Berlin.
16. Kurtz T.G. (1998) Martingale problems for conditional distributions of Markov processes. Electron. J. Probab., 3: 1–29.
17. Kurtz T.G., Ocone D.L. (1988) Unique characterization of conditional distributions in nonlinear filtering. Ann. Probab., 16: 80–107.
18. Lucic V.M., Heunis A.J. (2001) On uniqueness of solutions for the stochastic differential equations of nonlinear filtering. Ann. Appl. Probab., 11: 182–209.
19. Rogers L.C.G., Williams D. (1987) Diffusions, Markov Processes and Martingales, Vol. 2: Itô Calculus. Wiley, New York.
20. Stroock D.W., Varadhan S.R.S. (1979) Multidimensional Diffusion Processes. Springer-Verlag, Berlin.
21. Wong E., Hajek B.E. (1985) Stochastic Processes in Engineering Systems. Springer-Verlag, Berlin.

Accepted 27 March 2003. Online publication 7 August 2003.