Measure Valued Processes and Interacting Particle Systems. Application to Non Linear Filtering Problems

P. DEL MORAL

Abstract

In this paper we study interacting particle approximations of discrete time and measure valued dynamical systems. Such systems have arisen in such diverse scientific disciplines as physics and signal processing. We give conditions for the so-called particle density profiles to converge to the desired distribution as the number of particles grows. The strength of our approach is that it is applicable to a large class of measure valued dynamical systems arising in engineering and particularly in nonlinear filtering problems. Our second objective is to use these results to solve numerically the nonlinear filtering equation. Examples arising in fluid mechanics are also given.

1 Introduction

1.1 Measure valued processes

Let E be a locally compact and separable metric space, endowed with its Borel σ-field ℬ(E), which will serve as state space. Denote by P(E) the space of all probability measures on E, furnished with the weak topology. The aim of this work is the design of a stochastic particle system approach for the computation of a general discrete time and measure valued dynamical system (η_n)_{n≥0} given by

η_n = φ(n, η_{n−1})   ∀n ≥ 1,   η_0 = η   (1)

where η ∈ P(E) and φ(n, ·) : P(E) → P(E) is a continuous function. Such systems have arisen in such diverse scientific disciplines as physics (see Sznitman [44] and the references given there), nonlinear economic modelling and signal processing (see Kunita [29] and Stettner [40]). Solving (1) is in general an enormous task as it is nonlinear and usually involves integrations over the whole space E. To obtain a computationally feasible solution some kind of approximation is needed. On the other hand, particle methods have been developed in physics since the second world war, mainly for the needs of fluid mechanics (Meleard [32], McKean [33], Sznitman [43]) and statistical mechanics (Dobrushin [22], Liggett [31], Spitzer [42]). During the last decade their application area has grown, establishing unexpected connections with a number of other fields.
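To fix ideas, the recursion (1) can be iterated exactly when E is finite, since a probability measure is then just a vector of weights. The particular map φ below (a Bayes-type reweighting by a potential g followed by a Markov kernel K) is a hypothetical example chosen for illustration, not the paper's general setting:

```python
import numpy as np

# Minimal sketch of the measure-valued recursion (1) on a finite state
# space E = {0, ..., d-1}.  The map phi below (reweighting by a potential
# g, then transport by a Markov kernel K) is an illustrative assumption.
d = 3
K = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.8]])   # Markov kernel: each row sums to 1
g = np.array([1.0, 2.0, 0.5])     # bounded positive potential

def phi(n, eta):
    """One step of a nonlinear measure-valued map."""
    weighted = eta * g
    weighted = weighted / weighted.sum()   # nonlinear normalisation
    return weighted @ K                    # push forward by the kernel

eta = np.full(d, 1.0 / d)   # eta_0 = uniform initial law
for n in range(1, 11):
    eta = phi(n, eta)       # eta_n = phi(n, eta_{n-1})

# eta remains a probability measure at every step
```

The nonlinearity enters only through the normalisation step; a linear map (a plain Markov kernel) would need no particle interaction at all.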

Short title: Non Linear Filtering Using Interacting Particles.
Keywords: Non linear filtering, measure valued processes, interacting and branching particle systems, genetic algorithms.
AMS subject classification: 60G57, 60K35, 60G35, 93E11.

Our major motivation is from advanced signal processing applications in engineering and particularly in optimal non linear filtering problems. Recall that this consists in computing the conditional distribution of internal states in dynamical systems, when partial observations are made and random perturbations are present in the dynamics as well as in the sensors. With the notable exception of the linear-Gaussian situation or wider classes of models (Benes filters [1]), optimal filters have no finitely recursive solution (Chaleyat-Maurel/Michel [7]). Nevertheless, guided by Bayes' rule we will see that the dynamical structure of such conditional distributions can be viewed, under mild assumptions, as a special case of (1). In our formulation the most important measure of complexity of the problem is now reduced to the infinite dimensionality of the state space P(E). The main advantage of dealing with equation (1) rather than the conditional expectation is that the solution of (1) is Markovian, whereas the solution of the conditional expectation plant equation is not. Our claim that this formulation is the natural framework for formulating and solving problems in nonlinear filtering will be amply justified by the results which follow. The paper has the following structure. After fixing the context within which we work, we introduce in section 3 a stochastic particle system approach to solve (1). The particle systems described in this section will consist of finitely many particles and the systems as a whole will be Markov processes with product state space. In this framework the transition for individual particles will depend on the entire configuration, so that the particles are not independent and the evolution of an individual particle is no longer Markovian. The models to be treated in section 3.1 are obtained by imposing various types of interactions on the motions of the particles.
We begin in this section with the description of a simple particle system in which the interaction function only depends on the empirical measure of the particle system. Furthermore, it is shown that the so-called particle density profiles, i.e. the random empirical measures of the particle system, converge to the solution of (1) as the number of particles grows. The proof of convergence is based on the use of semigroup techniques in the spirit of Kunita [29] and Stettner [40]. Our next objective is to extend the above construction. In section 3.2 we introduce general particle systems which include branchings and nonlinear interactions, and we also prove that the random empirical measures of the particle system weakly converge to the solution of (1) as the number of particles grows. The proof of convergence involves essentially the same analytic techniques as those used in section 3.1. We will examine as much of the theory as possible in a form applicable to general non linear filtering problems and to the master equations arising in fluid mechanics. The strength of our approach is that it is applicable to a large family of measure-valued dynamical systems. Several examples are worked out in section 3.3. One way in which our approach may be applied to fluid mechanics problems is discussed. We will only examine the convergence of the particle density profiles. In real problems the crucial question is the convergence of the empirical measures of the particle system on the path space. These problems are quite deep and we shall only scratch the surface here. For a detailed discussion and a full treatment of the above questions the reader is referred to Sznitman [43], [44] and the references given there. However we emphasize that our proof only uses semigroup techniques and turns out to be more transparent. In section 4 we will use the results of section 3 to describe several interacting particle approximations of the non linear filtering plant equation.
This section is divided into two subsections. In section 4.1 we present a general discussion of nonlinear filtering theory and we formulate the filtering problem in such a way that the techniques of section 3 can be applied. The design of our particle system approach is described in section 4.2. We finish the paper with the description of general interacting particle resolutions.

1.2 Non linear filtering

In our development, special knowledge of non linear filtering theory is not assumed. For a detailed discussion of the filtering problem the reader is referred to the pioneering paper of Stratonovich [41] and to the more rigorous studies of Shiryaev [39] and Kallianpur-Striebel [28]. More recent developments can be found in Ocone [34] and Pardoux [35]. Some collateral readings such as Kunita [29], Stettner [40] and Michel [7] will be helpful in appreciating the relevance of our approximations. As far as the author knows, the various numerical methods based on fixed grid approximations, conventional linearization or determining the best linear filter (in expected cost error sense) have never really coped with large scale systems or unstable processes. They are usually far from optimal, particularly in high noise environments, when there is significant uncertainty in the observations or when the nonlinearities are not suitably smooth. More precisely, it is in general impossible to identify an event space region R such that the trajectories of the state lie entirely in R, so it is difficult to use fixed grid methods. Moreover, it is well known that large state dependent noise has destabilizing effects on the dynamics of the best linear filter and tends to increase the magnitude of its gain. It has recently been emphasized that a more efficient way is to use random particle systems to solve the filtering problem numerically. That particle algorithms are gaining popularity is attested by the recent papers of Crisan/Lyons [9] [10] [11], Gordon/Salmon/Smith [24], Van Dootingh/Viel/Rakotopara/Gauthier [45], Carvalho [5], Carvalho/Del Moral/Monin/Salut [4], Del Moral/Rigal/Noyer/Salut [18], Del Moral/Noyer/Salut [20] and Del Moral [15] [16] [17]. Instead of hand-crafting algorithms, often on the basis of ad hoc criteria, particle system approaches provide powerful tools for solving a large class of non linear filtering problems.
Let us now briefly survey some different approaches and motivate our work. In [14] and [15] the author introduced particle methods in which the particles are independent of each other and weighted by regularized exponentials, and he proposes minimal conditions which ensure the convergence to the optimal filter uniformly with respect to time. Crisan and Lyons develop in [9] a sequence of branching processes, in the spirit of the Dawson and Watanabe constructions [12], [46], whose expectation converges to the solution of the Zakai equation. In [10] they construct a different sequence of branching particle systems converging in distribution to the solution of the Zakai equation. In their latest work [11] they describe another sequence of branching particle systems converging in measure to the solution of the Kushner-Stratonovitch equation. In the present paper we describe different interacting and branching particle systems and we prove that the empirical measure converges to the desired conditional distribution. The connection between such particle systems and genetic algorithms is given in [17]. These algorithms are an extension of the well known sampling-resampling principles introduced by Gordon, Salmon and Smith in [24] and independently by Del Moral, Rigal, Noyer and Salut in [18], [19] and [16]. Roughly speaking, they consist in periodically redistributing the particle positions in accordance with a discrete representation of the conditional distribution. This procedure allows heavy particles to give birth to more particles at the expense of light particles which die. This guarantees an occupation of the state space regions proportional to their probability mass, thus providing an adaptive and stochastic grid. We will prove the convergence of such approximations to the optimal filter, yielding what seem to be the first convergence results for such approximations of the nonlinear filtering equations. This new treatment was influenced primarily by the development of genetic algorithms (J.H. Holland [26], R. Cerf [6]) and secondarily by the papers of H. Kunita and L. Stettner ([29], [40]).

Several practical problems which have been solved using these methods are given in [4], [20], [24], [18], [5], including problems in Radar/Sonar signal processing and GPS/INS integration. There is an essential difference between our particle systems and the branching particle systems described in [9], [10] and [11]. What makes our results interesting and new is that the number of particles is fixed and the system of interacting particles is as a whole a Markov process which feels its environment through the observations. Moreover, the transition for an individual particle depends on the entire configuration of the system and not only on its current position. On the other hand, our constructions are explicit, the recursions have a simple form and they can easily be generated on a computer. Thus, armed with these algorithms and the worked out examples of section 3.3, the user should be able to handle a large class of nonlinear filtering problems. However, the difficulties here are well known: these stochastic algorithms use Monte Carlo simulations and they are usually slow when the state space is too large. We will not describe the so-called complexity of the interacting particle algorithms. The reader who wishes to know details about such questions is recommended to consult [5], [4], [24], [18]. There is no universal agreement at present on the choice of the most efficient particle algorithm. Obviously this is a topic in which no one can claim to have the final answer. Perhaps such problems will become relatively transparent only after we have developed a theory powerful enough to describe all of these particle algorithms. Such algorithms have been made valuable in practice by advances in computer technology. They fall within the scope of new architecture computers such as vector processors, parallel computers and connection machines. Moreover, they are ready to cope with real-time constraints imposed by practical problems.

2 Preliminaries

In this paper we will consider stochastic processes with values in the set of all probability measures on E. Such processes appear naturally in the study of nonlinear filtering problems (see for instance [9], [29] and [40]) and in propagation of chaos theory (see for instance [38], [43] and [44]). In this short section we summarize the key concepts and the various forms of convergence of probability measures which are used throughout the paper. For further information the reader is referred to Parthasarathy [36] and Dobrushin [23].

2.1 E-valued random variables

Assume E is a locally compact and separable metric space. By ℬ(E) we denote the σ-algebra of Borel subsets of E and by P(E) the family of all probability measures on (E, ℬ(E)). As usual, by B(E) we denote the space of all bounded Borel measurable functions f : E → R, by Cb(E) the subspace of all bounded continuous functions and by U(E) the subspace of all bounded and uniformly continuous functions. In these spaces the norm is

‖f‖ = sup_{x∈E} |f(x)|

For an f ∈ B(E) and μ ∈ P(E) we write

μf = ∫ f(x) μ(dx)

We say that a sequence (μ^N)_{N≥1}, μ^N ∈ P(E), converges weakly to a measure μ ∈ P(E) if

∀f ∈ Cb(E)   lim_{N→+∞} μ^N f = μf

Let μ ∈ P(E), f ∈ Cb(E), and let K1 and K2 be two Markov kernels. We will use the standard notations

μK1(dy) = ∫ μ(dx) K1(x, dy)   K1K2(x, dz) = ∫ K1(x, dy) K2(y, dz)   (2)

K1 f(x) = ∫ K1(x, dy) f(y)   (3)

A transition probability kernel K on E is said to be Feller if

f ∈ Cb(E) ⟹ Kf ∈ Cb(E)   (4)

With a Markov kernel K and a measure μ ∈ P(E) we associate a measure μ × K ∈ P(E × E) by setting

∀f ∈ Cb(E × E)   (μ × K)f = ∫ μ(dx1) K(x1, dx2) f(x1, x2)

It is well known that P(E) with the topology of weak convergence can be considered as a metric space with metric ρ defined for μ, ν ∈ P(E) by

ρ(μ, ν) = Σ_{m≥1} 2^{−(m+1)} |μf_m − νf_m|   (5)

where (f_m)_{m≥1} is a suitable sequence of uniformly continuous functions such that ‖f_m‖ ≤ 1 for all m ≥ 1 (Theorem 6.6, p. 47, Parthasarathy [36]). Moreover, one can show that P(E) is a separable metric space with the metric ρ (Theorem 6.2, Parthasarathy [36]).
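Since every term of the series (5) is bounded by 2^{−(m+1)}, the metric can be approximated by a truncated sum with an explicit tail bound. The sketch below does this for empirical measures on E = R; the particular test family f_m(x) = sin(mx) (bounded by 1 and uniformly continuous) is an assumption made for the illustration, the construction only requiring some suitable separating sequence (f_m):

```python
import math

# Truncated computation of the metric (5) between two empirical measures
# on E = R.  The family f_m(x) = sin(m x) is an illustrative assumption.
def f(m, x):
    return math.sin(m * x)          # ||f_m|| <= 1, uniformly continuous

def mu_f(sample, m):
    # integral of f_m against the empirical measure of `sample`
    return sum(f(m, x) for x in sample) / len(sample)

def rho(sample_mu, sample_nu, M=25):
    # partial sum of (5); the neglected tail is bounded by 2**-(M+1)
    return sum(2 ** -(m + 1) * abs(mu_f(sample_mu, m) - mu_f(sample_nu, m))
               for m in range(1, M + 1))

assert rho([0.0, 1.0], [0.0, 1.0]) == 0.0   # identical measures
assert rho([0.0, 1.0], [0.5, 1.5]) > 0.0    # distinct measures
```

The geometric weights make the truncation error explicit, which is convenient when ρ is used numerically to monitor convergence of empirical measures.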

2.2 Measure valued random variables

Recall P(E) is a separable metric space with metric ρ. By ℬ(P(E)) we denote the σ-algebra of Borel subsets of P(E) and by P(P(E)) the collection of all probability measures on (P(E), ℬ(P(E))). By B(P(E)) we denote the space of all bounded Borel measurable functions F : P(E) → R, by Cb(P(E)) the subspace of all bounded continuous functions and by U(P(E)) the subspace of all bounded and uniformly continuous functions. As usual, in these spaces the norm is

‖F‖ = sup_{μ∈P(E)} |F(μ)|

For an F ∈ B(P(E)) and Φ ∈ P(P(E)) we write

ΦF = ∫ F(μ) Φ(dμ)

We say that a sequence (Φ^N)_{N≥0}, Φ^N ∈ P(P(E)), converges weakly to a measure Φ ∈ P(P(E)) if

∀F ∈ Cb(P(E))   lim_{N→+∞} Φ^N F = ΦF   (6)

and this will be denoted Φ^N ⟶_w Φ, N → +∞.

In this paper we will consider measure valued stochastic processes; it is therefore convenient to recall the definition of the conditional expectation of a P(E)-valued random measure relative to a σ-field (cf. H. Kunita [29]). Let η(ω) be a P(E)-valued random variable defined on a probability space (Ω, F, P). The conditional expectation of η relative to a sub-σ-field G ⊂ F is defined as a P(E)-valued random variable E(η/G) such that

F(E(η/G)) = E(F(η)/G)

holds for all continuous affine functions F : P(E) → R (F ∈ Cb(P(E)) is affine if there exist a real constant c and a function f ∈ Cb(E) such that F(μ) = c + μ(f) for every μ ∈ P(E)). A linear mapping M : B(P(E)) → B(P(E)) such that

M(1_{P(E)}) = 1_{P(E)}   and   M(F) ≥ 0   ∀F ∈ B(P(E)), F ≥ 0

is called a Markov operator or Markov transition on P(E). Then we may define a linear mapping, still denoted by M, by setting

M : P(P(E)) → P(P(E)),   Φ ↦ ΦM   with   ΦM(A) := Φ(M(1_A))   ∀A ∈ ℬ(P(E))   (7)

A Markov transition M is said to be Feller if

MF ∈ Cb(P(E))   ∀F ∈ Cb(P(E))

For an F ∈ Cb(P(E)) and M1, M2 two Feller transitions on P(E) we write

M1 F(μ) = ∫ M1(μ, dν) F(ν)   M1 M2 F(μ) = ∫ M1(μ, dν1) M2(ν1, dν2) F(ν2)

Now we introduce the Kantorovitch-Rubinstein or Vaserstein metric on the set P(P(E)), defined by

D(Φ, Ψ) = inf { ∫ ρ(μ, ν) Θ(d(μ, ν)) : Θ ∈ P(P(E) × P(E)), p1∘Θ = Φ and p2∘Θ = Ψ }   (8)

In other words, D(Φ, Ψ) = inf E(ρ(η, ζ)), where the lower bound is taken over all pairs of random variables (η, ζ) with values in (P(E), ℬ(P(E))) such that η has the distribution Φ and ζ the distribution Ψ. The metric ρ being a bounded function, formula (8) defines a complete metric on P(P(E)) which gives to P(P(E)) the topology of weak convergence (see Theorem 2 in Dobrushin [23]). The proof of this last statement is very simple; we quote its outline here for the convenience of the reader. Let (Φ^N)_{N≥0}, Φ^N ∈ P(P(E)), be a sequence of probability measures such that

lim_{N→+∞} D(Φ^N, Φ) = 0,   Φ ∈ P(P(E))

For every F ∈ U(P(E)) and ε > 0 there exists δ(ε) > 0 such that

∀(μ, ν) ∈ P(E) × P(E)   ρ(μ, ν) ≤ δ(ε) ⟹ |F(μ) − F(ν)| ≤ ε

Let (η, η^N)_{N≥1} be a sequence of measure valued random variables on some probability space such that each η^N, N ≥ 1, has distribution Φ^N ∈ P(P(E)) and η is a measure valued random variable with distribution Φ ∈ P(P(E)). For every F ∈ U(P(E)) and ε > 0 one gets

|Φ^N F − ΦF| ≤ E(|F(η^N) − F(η)|) ≤ ε + 2‖F‖ P(ρ(η^N, η) > δ(ε)) ≤ ε + 2‖F‖ δ(ε)^{−1} E(ρ(η^N, η))

So that

|Φ^N F − ΦF| ≤ ε + (2‖F‖/δ(ε)) D(Φ^N, Φ)

Letting N go to infinity and recalling that, in the weak convergence (6), only bounded and uniformly continuous functions need be considered (Theorem 6.1, p. 40, Parthasarathy [36]), we obtain our claim. On the other hand, we can apply the monotone convergence theorem to prove that

D(Φ^N, Φ) ≤ Σ_{m≥1} 2^{−(m+1)} E(|η^N f_m − η f_m|)   (9)

where (f_m)_{m≥1} is the sequence of bounded and uniformly continuous functions introduced in (5). Hence, by the dominated convergence theorem, it follows that

∀f ∈ U(E)   lim_{N→+∞} E(|η^N f − ηf|) = 0 ⟹ lim_{N→+∞} D(Φ^N, Φ) = 0   (10)

Remark 1

1. It is interesting to note that if, in addition, η = μ is a constant probability distribution, the functions

F(ν) = |νf − μf|,   f ∈ Cb(E)

are continuous for the weak convergence topology on P(E). So that

∀f ∈ U(E)   lim_{N→+∞} E(|η^N f − μf|) = 0 ⟺ lim_{N→+∞} D(Φ^N, Φ) = 0   (11)

⟺ Φ^N ⟶_w Φ, N → +∞   (12)

2. It is also easily seen that

∀f ∈ U(E)   lim_{N→+∞} E(|η^N f − μf|) = 0 ⟺ lim_{N→+∞} E(|η^N f − μf|²) = 0   (13)

2.3 Convergence of empirical measures

In the study of Markov models of interacting particles one looks at an N-particle system (ξ^{1,N}, …, ξ^{N,N}) satisfying a stochastic dynamical equation or, more generally, evolving according to a Markov transition probability kernel on a product space E^N, N ≥ 1. Such models are used in fluid mechanics to study the many-particle nature of real systems with internal fluctuations (see Sznitman [22], [38], [42] and [44]) and in [16] the author proposes one way to use such models to solve numerically the so-called nonlinear filtering equation. The problem of weak convergence in both situations consists in studying the limiting behavior of the empirical measures

μ^N = (1/N) Σ_{i=1}^N δ_{ξ^{i,N}}   (14)

as the number of particles N grows. In the first situation the measures μ^N are shown to converge in law, as N → +∞, to a constant probability measure μ which is called the McKean measure (see for instance Tanaka [38]). Therefore, to prove convergence it is enough to verify (11). In nonlinear filtering problems we will prove that the measures μ^N converge in law, as N → +∞, to the desired conditional distribution. In this situation it is convenient to work in a first stage with a given sequence of observations, and we will formulate the conditional distributions as a non random

probability measure parameterized by the given sequence of observation parameters and solution of a measure valued dynamical system which is usually called the nonlinear filtering equation. To prove convergence it is enough to verify (11) P-a.s. for every sequence of observations and then apply the dominated convergence theorem. In statistical physics and fluid mechanics the dynamical system (1) usually describes the time evolution of the density profiles of McKean-Vlasov stochastic processes with mean field drift functions. It was proposed by McKean and Vlasov to approximate the corresponding equations by mean field interacting particle systems. A crucial practical advantage of this situation is that the dynamical structure of the nonlinear stochastic process can be used in the design of an interacting particle system in which the mean field drift is replaced by a natural interaction function. Such models are called in physics master equations and/or weakly interacting particle systems. In this situation it is convenient to use the following

Lemma 1 Let (Ω, F, P) be a probability space on which is defined a pair of N-particle systems (ξ^{i,N})_{1≤i≤N}, (ζ^{i,N})_{1≤i≤N}, such that the distribution u^N of (ξ^{i,N})_{1≤i≤N} is a symmetric probability measure on E^N and (ζ^{i,N})_{1≤i≤N} are N i.i.d. random variables with common law μ. For every f ∈ U(E) we have

lim_{N→+∞} E(d(ξ^{1,N}, ζ^{1,N})) = 0 ⟹ lim_{N→+∞} E(|(1/N) Σ_{i=1}^N f(ξ^{i,N}) − μf|) = 0   (15)

Proof: Let δ(ε) be the modulus of continuity of the function f ∈ U(E). Then for every ε > 0 we have

E(|(1/N) Σ_{i=1}^N f(ξ^{i,N}) − μf|) ≤ E((1/N) Σ_{i=1}^N |f(ξ^{i,N}) − f(ζ^{i,N})|) + E(|(1/N) Σ_{i=1}^N f(ζ^{i,N}) − μf|)
 ≤ ε + (2‖f‖/δ(ε)) E(d(ξ^{1,N}, ζ^{1,N})) + 2‖f‖/√N

Letting N → +∞, the lemma is proved. This lemma gives a simple condition for the convergence in law of the empirical measures when the interacting particle systems are described by a stochastic dynamical equation. More precisely, this powerful result can be used when the desired distribution μ is the distribution of a finite dimensional stochastic process. It will be applied in section 3.3.2 to study the convergence in law of the empirical measures of weakly interacting particle systems. In nonlinear filtering problems the dynamical system (1) describes the time evolution of the conditional distribution of the internal states of a dynamical system when partial observations are made. In contrast to the situation described above, the conditional distributions cannot be viewed as the law of a finite dimensional stochastic process which incorporates a mean field drift [7]. We therefore have to find a new strategy to define an interacting particle system which will approximate the desired distributions. We propose hereafter a new interacting particle system approach and another method to prove convergence.
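The i.i.d. term 2‖f‖/√N appearing at the end of the proof of Lemma 1 is easy to check by simulation. The sketch below is a Monte Carlo illustration under assumed choices (μ uniform on [0,1], f(x) = cos(2πx), so ‖f‖ = 1 and μf = 0); it is not part of the proof, only a numerical sanity check of the bound:

```python
import math
import random

# Monte Carlo check of the i.i.d. bound E|1/N sum f(zeta_i) - mu f| <=
# 2 ||f|| / sqrt(N).  The law mu (uniform on [0,1]) and the test function
# f are illustrative assumptions.
random.seed(0)

def f(x):
    return math.cos(2 * math.pi * x)   # ||f|| = 1 and mu f = 0 here

def mean_abs_error(N, trials=200):
    """Estimate E|1/N sum_{i<=N} f(zeta_i) - mu f| over many trials."""
    total = 0.0
    for _ in range(trials):
        total += abs(sum(f(random.random()) for _ in range(N)) / N)
    return total / trials

for N in (10, 100, 1000):
    assert mean_abs_error(N) <= 2.0 / math.sqrt(N)   # 2||f||/sqrt(N)
```

The observed errors in fact decay like √(Var(f)/N), so the 2‖f‖/√N bound in the proof is conservative but of the correct order.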


3 Measure Valued Processes

The chief purpose of this section is the design of two special models of stochastic particle systems for the numerical solving of a discrete time and measure valued dynamical system (η_n)_{n≥0} given by

η_n = φ(n, η_{n−1})   ∀n ≥ 1,   η_0 = η   (16)

where η ∈ P(E) and the φ(n, ·) : P(E) → P(E) are continuous functions. This section has the following structure: in section 3.1 we describe a natural particle approximation with a simple interaction function. In this situation the interaction depends on the current positions of the particles but not on their paths. In section 3.2 we introduce a general particle system which includes branching and interaction mechanisms. We emphasize that in both situations the nature of the interaction function is dictated by the plant equation (16). To illustrate our approach we finish this section with practical examples for which all assumptions are satisfied.

3.1 Interacting Particle Systems

3.1.1 The particle system state space

The particle system under study will be a Markov chain with state space E^N, where N ≥ 1 is the size of the system. The N-tuples of elements of E, i.e. the points of the set E^N, are called particle systems and will be mostly denoted by the letters x, y, z. The local dynamics of the system will be described by a product transition probability measure. Thus, to clarify the notations, with μ ∈ P(E) we associate a measure μ^{⊗N} ∈ P(E^N) by setting

μ^{⊗N} = μ ⊗ … ⊗ μ   (N times)

and we denote by m^N(x) the empirical measure associated to the point x = (x^1, …, x^N):

m^N(x) = (1/N) Σ_{j=1}^N δ_{x^j}

3.1.2 The Associated Markov Process

Let (Ω, (F_n)_{n≥0}, (ξ_n)_{n≥0}, P) be the E^N-valued Markov process defined by

P(ξ_0 ∈ dx) = η^{⊗N}(dx)   P(ξ_n ∈ dx / ξ_{n−1} = z) = φ(n, m^N(z))^{⊗N}(dx)   (17)

where dx := dx^1 × … × dx^N is an infinitesimal neighborhood of x = (x^1, …, x^N) ∈ E^N. It is clear from the construction above that ξ_n = (ξ_n^1, …, ξ_n^N) can be viewed as a system of N particles with nonlinear interaction function φ(n, m^N(ξ_{n−1})). The algorithm constructed in this way will be called an interacting particle approximation of (1). The terminology "interacting" is intended to emphasize that the particles are not independent and that the evolution of an individual particle is no longer Markovian. Nevertheless, the system as a whole is Markovian. Much more is true: the above description enables us to consider the particle density profiles

η_n^N := m^N(ξ_n)

as a measure-valued Markov process defined by

P(η_0^N ∈ dμ) = M_0 C_N(dμ)   P(η_n^N ∈ dμ / η_{n−1}^N = ν) = M_n C_N(ν, dμ)   (18)

where dμ is an infinitesimal neighborhood of μ and C_N and M_n, n ≥ 0, are the Markov transitions on P(E) given by

M_n F(ν) = F(φ(n, ν))   C_N F(ν) = ∫_{E^N} F(m^N(x)) ν^{⊗N}(dx)   (19)

for every F ∈ Cb(P(E)), n ≥ 0 and ν ∈ P(E), with the convention φ(0, ν) = ν for all ν ∈ P(E). To see this claim it suffices to note that

E(F(η_n^N) / η_{n−1}^N = ν) = ∫ F(m^N(x)) φ(n, ν)^{⊗N}(dx) = C_N F(φ(n, ν)) = M_n(C_N F)(ν)

for all F ∈ Cb(P(E)) and ν ∈ P(E).

3.1.3 Description of the Algorithm

At the time n = 0 the particle system consists of N independent random particles ξ_0^1, …, ξ_0^N with common law η. At the time n ≥ 1 the empirical measure m^N(ξ_{n−1}) associated to the particle system ξ_{n−1} enters in the plant equation (1), so that the resulting measure φ(n, m^N(ξ_{n−1})) depends on the configuration of the system at the previous time n − 1. Finally, the particle system at the time n consists of N independent (conditionally on ξ_{n−1}) particles ξ_n^1, …, ξ_n^N with common law φ(n, m^N(ξ_{n−1})). We refer to section 3.3 for further discussions and detailed examples.
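The three steps above (initialize i.i.d. from η, evaluate φ at the current empirical measure, resample conditionally i.i.d.) can be sketched directly on a finite state space. The particular map φ used below (reweighting by a potential g, then a Markov kernel K) is a hypothetical example, not the paper's general setting:

```python
import numpy as np

# Sketch of the interacting particle approximation of section 3.1.3 on a
# finite state space E = {0, ..., d-1}.  The map phi is an illustrative
# assumption (Bayes-type reweighting by g, then a Markov kernel K).
rng = np.random.default_rng(0)
d, N = 3, 2000
K = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.8]])
g = np.array([1.0, 2.0, 0.5])

def phi(n, eta):
    eta_hat = eta * g / np.dot(eta, g)   # nonlinear reweighting
    return eta_hat @ K                   # transport by the kernel

def empirical(xs):
    return np.bincount(xs, minlength=d) / len(xs)   # m^N(x)

# n = 0: N independent particles with common law eta_0
eta0 = np.full(d, 1.0 / d)
xs = rng.choice(d, size=N, p=eta0)
# n >= 1: conditionally i.i.d. sampling from phi(n, m^N(xi_{n-1}))
for n in range(1, 11):
    xs = rng.choice(d, size=N, p=phi(n, empirical(xs)))

# the particle density profile eta_n^N tracks the exact recursion eta_n
eta = eta0
for n in range(1, 11):
    eta = phi(n, eta)
error = np.abs(empirical(xs) - eta).max()
```

Note that the interaction is entirely mediated by the empirical measure: each particle samples from the same law φ(n, m^N(ξ_{n−1})), which depends on the whole configuration.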

3.1.4 Convergence of the Algorithm

The crucial question is of course whether η_n^N converges to η_n as N grows. When the state space E is compact we show hereafter a slightly more general result.

Theorem 1 Let us suppose that E is compact. Let M = (M_n)_{n≥1} denote a series of time-inhomogeneous and Feller Markov transitions on P(E), let M_0 ∈ P(P(E)) and let M_n^{(N)} = M_n C_N, n ≥ 0. Using these notations we have

M_0^{(N)} M_1^{(N)} … M_n^{(N)} ⟶_w M_0 M_1 … M_n, N → +∞,   ∀M_0 ∈ P(P(E))   (20)

More generally, (20) holds when E is locally compact and M = (M_n)_{n≥1} is such that

M_n F ∈ U(P(E))   ∀F ∈ U(P(E))   (21)

In order to prepare for its proof we begin with

Lemma 2 If E is compact, then for every N ≥ 1, C_N is a Feller transition kernel and we have, for every F ∈ Cb(P(E)),

lim_{N→+∞} ‖C_N F − F‖ = 0   (22)

When E is locally compact, (22) holds for every F ∈ U(P(E)).

To throw some light on the convergence (22) and its connection with the law of large numbers, assume that the function F is given by

F(μ) = |μf − νf|   ∀μ ∈ P(E)

where f ∈ Cb(E) and ν ∈ P(E). By a direct calculation it follows from (22) that

sup_{ν∈P(E)} E_ν(|(1/N) Σ_{i=1}^N f(ξ^i) − νf|) ⟶ 0,   N → +∞

where (E^N, ℬ(E^N), (ξ^i)_{i≥1}, P_ν) is a sequence of E-valued independent random variables with common law ν and E_ν(·) denotes the expectation with respect to P_ν. So lemma 2 can be regarded as a uniform weak law of large numbers.


The proof of Theorem 1 is comparatively short, therefore we give it first.

Proof of Theorem 1: For every F ∈ Cb(P(E)) we observe that

‖C_N M_1^{(N)} … M_n^{(N)} F − M_1 … M_n F‖
 ≤ ‖C_N M_1 (C_N M_2^{(N)} … M_n^{(N)} − M_2 … M_n) F‖ + ‖C_N M_1 … M_n F − M_1 … M_n F‖
 ≤ ‖C_N M_2^{(N)} … M_n^{(N)} F − M_2 … M_n F‖ + ‖C_N M_1 … M_n F − M_1 … M_n F‖
 ≤ Σ_{p=0}^{n−1} ‖C_N M_{p+1} … M_n F − M_{p+1} … M_n F‖ + ‖C_N F − F‖   (23)

Thus, using lemma 2 and recalling that, in the weak convergence, only bounded uniformly continuous functions need be considered, the proof of Theorem 1 is straightforward. We come to the proof of Lemma 2.

Proof of Lemma 2: Let us set

f(x^1, …, x^N) = Π_{i=1}^N f_i(x^i),   f_1, …, f_N ∈ Cb(E)

and assume (ν_n)_{n≥1} ⊂ P(E) is a sequence which converges weakly to ν ∈ P(E) as n tends to infinity. Then we obtain

lim_{n→+∞} Π_{i=1}^N ν_n(f_i) = Π_{i=1}^N ν(f_i)

Since linear combinations of such functions are dense in Cb(E^N), one can check easily that C_N is Feller. Let F̃(ν) := F(ν(f_1), …, ν(f_p)) with f_i ∈ Cb(E) and F globally Lipschitz, that is:

|F(x_1, …, x_p) − F(x'_1, …, x'_p)| ≤ A Σ_{i=1}^p |x_i − x'_i|,   A < +∞

Then we obtain:

|C_N F̃(ν) − F̃(ν)| ≤ A Σ_{i=1}^p ∫ |(1/N) Σ_{j=1}^N f_i(x^j) − ν(f_i)| ν(dx^1) … ν(dx^N)
 ≤ (A/√N) Σ_{i=1}^p (ν(f_i²) − ν(f_i)²)^{1/2} ⟶ 0,   N → +∞

Therefore, for some constant B > 0,

|C_N F̃(ν) − F̃(ν)| ≤ pB/√N   (24)

Now, if E is compact, such F̃ are dense in Cb(P(E)) and C_N is Feller, so C_N F converges to F for every F ∈ Cb(P(E)). Finally, let us assume that E is locally compact. Let δ(ε) be the modulus of continuity of the function F ∈ U(P(E)). Then for every ε > 0 we have

|C_N F(ν) − F(ν)| ≤ ε + (2‖F‖/δ(ε)) ∫ ρ(μ, ν) C_N(ν, dμ)

In view of (5) it follows that

|C_N F(ν) − F(ν)| ≤ ε + (2‖F‖/δ(ε)) Σ_{m≥1} 2^{−(m+1)} ∫_{E^N} |(1/N) Σ_{i=1}^N f_m(x^i) − νf_m| ν(dx^1) … ν(dx^N)

where (f_m)_{m≥1} is a suitable sequence of uniformly continuous functions such that ‖f_m‖ ≤ 1 for all m ≥ 1. In exactly the same way as in (24) we can prove that

|C_N F(ν) − F(ν)| ≤ ε + 2‖F‖/(δ(ε)√N)

The above inequality immediately implies the last assertion. This ends the proof of the lemma.

Recall that the functions φ(n, ·) are continuous, so that the transitions defined by

M_n F(ν) = F(φ(n, ν))   ∀ν ∈ P(E), ∀F ∈ Cb(P(E))

are Feller transition probability kernels. The interpretation of Theorem 1 is clear. The theorem states that under rather wide conditions

∀n ≥ 0, ∀F ∈ Cb(P(E))   lim_{N→+∞} E(F(η_n^N)) = F(η_n)

Applying this, one can obtain the limit of the moments of the particle density profile error:

∀n ≥ 0, ∀f ∈ Cb(E), ∀p ≥ 1   lim_{N→+∞} E(|η_n^N f − η_n f|^p) = 0

This result can be regarded as a convergence theorem which vindicates the approach by semigroup techniques to a fairly general class of measure valued dynamical systems. It will be applied in section 3.3 to the so-called master equations of fluid mechanics. Unfortunately, when the state space E is not compact or when the Feller transitions M_n, n ≥ 1, do not satisfy the condition (21), the question of convergence is quite difficult. In this situation we must check as usual the tightness of the laws of the random measures m^N(ξ_n), N ≥ 1, and identify all limit points as being concentrated on the desired measure η_n (Billingsley [2]). Thus very few substantive statements can be made about the convergence in view of the generality of our dynamical system (1). Our next objective is to study an intermediate situation. More precisely, we introduce an additional assumption on the functions φ(n, ·) which enables us to develop a useful theorem. In a little while we will see one way in which this result may be applied in nonlinear filtering problems.


Theorem 2 Suppose that for every f ∈ Cb(E), μ ∈ P(E) and n ≥ 1 there exist a constant C_n(μ, f) and a finite set of bounded functions H_n(μ, f) such that

    |φ(n, μ)f − φ(n, ν)f| ≤ C_n(μ, f) Σ_{h ∈ H_n(μ,f)} |μh − νh|    ∀ν ∈ P(E)    (25)

Then, for every f ∈ Cb(E) and n ≥ 1 there exists A_n(f) < +∞ such that

    E(|η_n^N f − η_n f|²) ≤ A_n(f)/N    (26)

Therefore, if Φ_n^N is the distribution of η_n^N and Φ_n F(μ) = F(η_n) for all μ ∈ P(E), we have

    lim_{N→+∞} D(Φ_n^N, Φ_n) = 0    ∀n ≥ 1    (27)

where D is the Vaserstein metric introduced in (8).

The condition (25) strongly depends on the nature of the function φ(n, ·) which governs the dynamics of the distributions (η_n)_{n≥0}. Although this seems to be a very general condition, it may rule out certain systems of practical interest. For instance, it does not hold for the so-called master equations arising in fluid mechanics; nevertheless, Theorem 1 may be applied to study these equations. Before proceeding to the proof of the theorem, let us examine some typical situations in which condition (25) is met.

1. Suppose that our functions φ(n, ·), n ≥ 1, have the form φ(n, μ) = μK_n, where (K_n)_{n≥0} is a family of Feller transitions on E. Then we have, for every f ∈ Cb(E) and μ, ν ∈ P(E),

    |φ(n, μ)f − φ(n, ν)f| = |μ(K_n f) − ν(K_n f)|    (28)

So that condition (25) is satisfied with C_n(μ, f) = 1 and H_n(μ, f) = {K_n f}. Note that the set H_n(μ, f) = {K_n f} does not depend on the measure μ.

2. At this point it is useful to give an example of measure valued dynamical system which will appear in nonlinear filtering problems. Suppose that the measure valued dynamical system of interest is described by the equations

    η_n = φ_n(η_{n−1})    ∀n ≥ 1,    η_0 = η    (29)

where η ∈ P(E) and the continuous functions φ_n : P(E) → P(E) are given by

    φ_n(μ)f = μ(g_n T_n f)/μ(g_n)    ∀n ≥ 1, ∀f ∈ Cb(E), ∀μ ∈ P(E)

where
- (T_n)_{n≥1} is a family of Feller transitions on E,
- (g_n)_{n≥1} is a family of continuous and bounded functions such that

    0 < a_n ≤ g_n(x) ≤ A_n    ∀x ∈ E, ∀n ≥ 1    (30)

for some constants a_n and A_n, n ≥ 1. We immediately notice that this example is a generalization of example 1 (if g_n(x) = 1 for all x ∈ E then φ_n(μ) = μT_n). Moreover, we will see in the last part of the paper that the functions (29) prescribe the dynamical structure of the optimal filter in nonlinear filtering problems. Now, we observe that

    φ_n(μ)f − φ_n(ν)f = (φ_n(μ)f − φ_n(ν)f) ( μ(g_n)/ν(g_n) + (1 − μ(g_n)/ν(g_n)) )
                      = μ(h_n^{(1)}) + (φ_n(μ)f − φ_n(ν)f) μ(h_n^{(2)})

with

    h_n^{(1)} = (g_n/ν(g_n)) ( T_n f − ν(g_n T_n f)/ν(g_n) )    and    h_n^{(2)} = 1 − g_n/ν(g_n)

It follows that

    |φ_n(μ)f − φ_n(ν)f| ≤ (1 + 2‖f‖) ( |μ(h_n^{(1)}) − ν(h_n^{(1)})| + |μ(h_n^{(2)}) − ν(h_n^{(2)})| )    (31)

and condition (25) is satisfied with C_n(ν, f) = 1 + 2‖f‖ and H_n(ν, f) = {h_n^{(1)}, h_n^{(2)}}. To see (31) it suffices to note that

    ν(h_n^{(1)}) = ν(h_n^{(2)}) = 0    and    |μ(g_n T_n f)/μ(g_n) − ν(g_n T_n f)/ν(g_n)| ≤ 2‖f‖
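On a finite state space the inequality (31) is easy to check numerically. The sketch below is illustrative only: the row-stochastic matrix T, the fitness vector g, the test function f and the probability vectors mu, nu are randomly generated stand-ins for T_n, g_n, f and the measures of the text.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5  # size of a hypothetical finite state space standing in for E

# Random stand-ins: T row-stochastic (a kernel on a finite space),
# g bounded away from 0 as in (30), f a bounded test function.
T = rng.random((d, d)); T /= T.sum(axis=1, keepdims=True)
g = 0.5 + rng.random(d)
f = rng.standard_normal(d)
mu = rng.random(d); mu /= mu.sum()
nu = rng.random(d); nu /= nu.sum()

def phi_f(m):
    """phi_n(m)f = m(g T f) / m(g) for a probability vector m."""
    return (m @ (g * (T @ f))) / (m @ g)

# The two nu-centred functions h1, h2 of the decomposition.
Tf = T @ f
h1 = g / (nu @ g) * (Tf - (nu @ (g * Tf)) / (nu @ g))
h2 = 1.0 - g / (nu @ g)
assert abs(nu @ h1) < 1e-12 and abs(nu @ h2) < 1e-12  # nu(h1) = nu(h2) = 0

lhs = abs(phi_f(mu) - phi_f(nu))
rhs = (1 + 2 * np.abs(f).max()) * (abs(mu @ h1 - nu @ h1) + abs(mu @ h2 - nu @ h2))
assert lhs <= rhs + 1e-12  # inequality (31)
```

The identity Δ = μ(h^{(1)}) + Δ μ(h^{(2)}) makes the check deterministic: it holds for any pair of probability vectors, not only the seeded ones.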

Proof of Theorem 2:

Let us show (26) by induction on n ≥ 0. Consider first the case n = 0. By the very definition of η_0^N one gets easily

    E(|η_0^N f − η_0 f|²) = E(|η_0^N (f − η_0 f)|²)
                          = ∫ ( (1/N) Σ_{i=1}^N (f(x^i) − η_0 f) )² η_0(dx^1) … η_0(dx^N) ≤ (2‖f‖)²/N =: A_0(f)/N

Suppose the result is true at rank (n − 1). The assumption (25) implies the existence of a constant C_n(η_{n−1}, f) > 0 and a finite set of bounded functions H_n(η_{n−1}, f) such that

    E(|η_n^N f − η_n f|²) = E(|φ(n, η_{n−1}^N)f − φ(n, η_{n−1})f|²) + E(|f(ξ_n^1) − φ(n, η_{n−1}^N)f|²)/N
        ≤ E(|φ(n, η_{n−1}^N)f − φ(n, η_{n−1})f|²) + (2‖f‖)²/N
        ≤ C_n(η_{n−1}, f)² |H_n(η_{n−1}, f)| Σ_{h ∈ H_n(η_{n−1},f)} E(|η_{n−1}^N h − η_{n−1} h|²) + (2‖f‖)²/N

where |H_n(η_{n−1}, f)| is the cardinality of the set H_n(η_{n−1}, f). Now, the induction hypothesis at rank (n − 1) implies E(|η_n^N f − η_n f|²) ≤ A_n(f)/N with

    A_n(f) = C_n(η_{n−1}, f)² |H_n(η_{n−1}, f)| Σ_{h ∈ H_n(η_{n−1},f)} A_{n−1}(h) + (2‖f‖)²

So that the desired inequality (26) is true at rank n and the induction is completed.

Our aim is now to get some information about the constant A_n(f) in a special case arising in nonlinear filtering problems. Consider the dynamical system described by (29). The discussion below closely follows [16]. Define the continuous functions

    φ_{n/p} := φ_n ∘ φ_{n−1} ∘ … ∘ φ_{p+1}    ∀ 0 ≤ p ≤ n − 1

with the convention φ_{n/n}(μ) = μ for all μ ∈ P(E). Observe that

    φ_{n/p}(μ)f = μ(g_{n/p} T_{n/p} f)/μ(g_{n/p})    ∀ 0 ≤ p ≤ n − 1, ∀f ∈ B(E), ∀μ ∈ P(E)    (32)

where

    g_{n/p−1} = g_p T_p(g_{n/p})    and    T_{n/p−1} f = T_p(g_{n/p} T_{n/p} f)/T_p(g_{n/p})    ∀ 1 ≤ p ≤ n − 1, ∀f ∈ B(E)    (33)

with the conventions g_{n/n−1} = g_n and T_{n/n−1} = T_n. To prove this claim we first note that (32) is obvious for p = n − 1 because φ_{n/n−1} = φ_n. Now, using backward induction on the parameter p, if (32) is satisfied for a given value of p ≥ 1 then we have

    φ_{n/p−1}(μ)f = φ_{n/p}(φ_p(μ))f = φ_p(μ)(g_{n/p} T_{n/p} f)/φ_p(μ)(g_{n/p})
                  = μ(g_p T_p(g_{n/p} T_{n/p} f))/μ(g_p T_p(g_{n/p}))
                  = μ(g_{n/p−1} T_{n/p−1} f)/μ(g_{n/p−1})

where g_{n/p−1} and T_{n/p−1} are given by (33). In the following we retain the notations of Theorem 2.
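The composition formulas (32)-(33) can be verified numerically on a finite state space. Everything below (the kernels T1, T2, the weights g1, g2, the measure mu) is a randomly generated stand-in, not data from the paper; the check is that iterating φ_2 ∘ φ_1 agrees exactly with the one-shot formula built from g_{2/0} and T_{2/0}.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4  # hypothetical finite state space

def stochastic(m):
    """Normalize rows so m is a transition matrix."""
    return m / m.sum(axis=1, keepdims=True)

T1, T2 = stochastic(rng.random((d, d))), stochastic(rng.random((d, d)))
g1, g2 = 0.5 + rng.random(d), 0.5 + rng.random(d)
f = rng.standard_normal(d)
mu = rng.random(d); mu /= mu.sum()

# Iterated dynamics: nu1 = phi_1(mu) as a measure, then phi_2(nu1)f.
nu1 = (mu * g1) @ T1 / (mu @ g1)
iterated = (nu1 @ (g2 * (T2 @ f))) / (nu1 @ g2)

# One-shot formula (32) with (33): g_{2/0} = g1 * T1(g2),
# T_{2/0} f = T1(g2 * T2 f) / T1(g2).
g20 = g1 * (T1 @ g2)
T20f = (T1 @ (g2 * (T2 @ f))) / (T1 @ g2)
oneshot = (mu @ (g20 * T20f)) / (mu @ g20)

assert abs(iterated - oneshot) < 1e-10
```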

Returning to the inequality (23), we have, with M_p^N := M_p C_N,

    |M_0^N M_1^N … M_n^N F(η_0) − M_0 M_1 … M_n F(η_0)| ≤ Σ_{p=0}^n sup_{ν ∈ P(E)} ∫ |F(φ_{n/p}(μ)) − F(φ_{n/p}(ν))| C_N(ν, dμ)

for all F ∈ B(P(E)). Suppose now that F(μ) = |μf − η_n f| with f ∈ B(E). Then one gets easily

    E(|η_n^N f − η_n f|) ≤ Σ_{p=0}^n sup_{ν ∈ P(E)} ∫ |φ_{n/p}(μ)f − φ_{n/p}(ν)f| C_N(ν, dμ)

Arguing exactly as before we have

    φ_{n/p}(μ)f − φ_{n/p}(ν)f = μ(h_{n/p}^{(1)}) + ( μ(g_{n/p} T_{n/p} f)/μ(g_{n/p}) − ν(g_{n/p} T_{n/p} f)/ν(g_{n/p}) ) μ(h_{n/p}^{(2)})

with

    h_{n/p}^{(1)} = (g_{n/p}/ν(g_{n/p})) ( T_{n/p} f − ν(g_{n/p} T_{n/p} f)/ν(g_{n/p}) )    and    h_{n/p}^{(2)} = 1 − g_{n/p}/ν(g_{n/p})

On the other hand

    |φ_{n/p}(μ)f − φ_{n/p}(ν)f| ≤ |μ(h_{n/p}^{(1)})| + 2‖f‖ |μ(h_{n/p}^{(2)})|    (34)

and a short calculation shows that

    ∫ (μ h_{n/p}^{(1)})² C_N(ν, dμ) ≤ ((2‖f‖)²/N) ‖ g_{n/p}/ν(g_{n/p}) ‖²    and    ∫ (μ h_{n/p}^{(2)})² C_N(ν, dμ) ≤ (1/N) ‖ g_{n/p}/ν(g_{n/p}) ‖²

From this and (34) it follows that

    E(|η_n^N f − η_n f|) ≤ (4‖f‖/√N) Σ_{p=0}^n sup_{ν ∈ P(E)} ‖ g_{n/p}/ν(g_{n/p}) ‖ ≤ (4‖f‖/√N) Σ_{p=0}^n Π_{k=p+1}^n (A_k/a_k)

with the convention Π_∅ = 1. Finally, we have proved that

    E(|η_n^N f − η_n f|) ≤ B_n(f)/√N    ∀f ∈ B(E),    with    B_n(f) := 4(n+1)‖f‖ Π_{k=1}^n (A_k/a_k)

In this situation we have obtained a uniform upper bound over the class of all measurable functions f with norm ‖f‖ ≤ 1. This uniformity allows us to prove the following extension of the classical Glivenko–Cantelli theorem to interacting particle systems.

Corollary 1 When the measure valued dynamical system is given by (29) and E = R we have

    lim_{N→+∞} sup_{t ∈ R} |η_n^N((−∞, t]) − η_n((−∞, t])| = 0    P-a.s.    ∀n ≥ 0    (35)

Proof:

To prove (35) we first use the upper bound

    P( |η_n^N((−∞, t]) − η_n((−∞, t])| ≤ ε/2 ) ≥ 1 − 8(n+1)B_n/(ε√N)    with    B_n = Π_{k=1}^n (A_k/a_k)

for every ε > 0, n ≥ 0 and t ∈ R. Consequently, for N ≥ (16(n+1)B_n/ε)² we have

    P( |η_n^N((−∞, t]) − η_n((−∞, t])| ≤ ε/2 ) ≥ 1/2    ∀ε > 0, ∀n ≥ 0, ∀t ∈ R

Applying the symmetrization lemma (Pollard [37], p. 14) it follows that

    P( sup_t |η_n^N((−∞, t]) − η_n((−∞, t])| > ε ) ≤ 4 E( P( sup_t |η_n^{o,N}((−∞, t])| > ε/4 | ξ_n^1, …, ξ_n^N ) )    (36)

where η_n^{o,N} denotes the signed measure

    η_n^{o,N} = (1/N) Σ_{i=1}^N σ_i δ_{ξ_n^i}

with (σ_i)_{i≥1} an i.i.d. sequence such that P(σ_i = 1) = P(σ_i = −1) = 1/2. Now, using Pollard's maximal inequality ([37]) and Hoeffding's lemma ([25]) we obtain the exponential bounds

    P( sup_{t ∈ R} |η_n^{o,N}((−∞, t])| > ε/4 | ξ_n^1, …, ξ_n^N ) ≤ 2(N + 1) e^{−Nε²/32}

and the Borel–Cantelli lemma together with (36) turns this into (35).
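For the i.i.d. case n = 0 the statement of Corollary 1 reduces to the classical Glivenko–Cantelli theorem, which is easy to observe numerically. Below is a small illustration with a Uniform(0,1) target; the sample sizes and the tolerance are arbitrary choices, not quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

def sup_cdf_gap(sample):
    """sup_t |F_N(t) - F(t)| for the Uniform(0,1) target, evaluated at the
    jump points of the empirical CDF, where the supremum is attained."""
    x = np.sort(sample)
    n = len(x)
    upper = np.arange(1, n + 1) / n - x   # F_N(x_i) - F(x_i)
    lower = x - np.arange(0, n) / n       # F(x_i) - F_N(x_i^-)
    return max(upper.max(), lower.max())

gaps = [sup_cdf_gap(rng.random(n)) for n in (100, 10000)]
assert gaps[1] < gaps[0]   # the sup distance shrinks with the sample size
assert gaps[1] < 0.05      # consistent with the Dvoretzky-Kiefer-Wolfowitz rate
```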

We now turn to the asymptotic normality of the particle approximation errors.

Proposition 1 Suppose the assumptions of Theorem 2 are satisfied. Then for every n ≥ 0, f ∈ Cb(E) and x ∈ R we have

    lim_{N→+∞} P( (1/√N) Σ_{i=1}^N (f(ξ_n^i) − φ(n, η_{n−1}^N)f) ≤ x ) = ∫_{−∞}^x (1/(σ_n(f)√(2π))) exp( −z²/(2σ_n²(f)) ) dz    (37)

where

    σ_n(f) = ( η_n((f − η_n f)²) )^{1/2}    ∀f ∈ Cb(E), ∀n ≥ 0    (38)

Before proceeding we should be more precise about the difficulty which arises here: this result gives some indications on the asymptotic behavior of the particle estimators (η_n^N)_{n≥0}, but the essential and unsolved problem is to characterize the asymptotic nature of the random errors

    (1/√N) Σ_{i=1}^N (f(ξ_n^i) − η_n f)    ∀f ∈ Cb(E), ∀n ≥ 0

Other results relating to the asymptotic normality of particle approximation errors for specific models in continuous time can be found in Shiga/Tanaka [38].

Proof:

Using the above notations, we first observe that, for every u ∈ R,

    E( exp( (iu/√N) Σ_{k=1}^N (f(ξ_n^k) − φ(n, η_{n−1}^N)f) ) )
        = E( E( exp( (iu/√N)(f(ξ_n^1) − φ(n, η_{n−1}^N)f) ) | ξ_{n−1}^N )^N )
        =: E( B(n, N, f)^N )

where i² = −1. Using the classical inequality

    |e^{it} − (1 + it − t²/2)| ≤ |t|³/6    ∀t ∈ R

and the fact that

    E( f(ξ_n^1) − φ(n, η_{n−1}^N)f | ξ_{n−1}^N ) = 0

we get

    | B(n, N, f) − ( 1 − (u²/2N) φ(n, η_{n−1}^N)((f − φ(n, η_{n−1}^N)f)²) ) | ≤ (4|u|³‖f‖³)/(3N√N)

This allows us to obtain the inequality

    |B(n, N, f) − a(n, N, f)| ≤ b(n, N, f)/N

with

    a(n, N, f) := 1 − (u²/2N) η_n((f − η_n f)²)
    b(n, N, f) := (u²/2) | η_n((f − η_n f)²) − φ(n, η_{n−1}^N)((f − φ(n, η_{n−1}^N)f)²) | + (4|u|³‖f‖³)/(3√N)

Thus, since |B(n, N, f)| ≤ 1 and |a(n, N, f)| ≤ 1 for N large enough, we have

    |B(n, N, f)^N − a(n, N, f)^N| ≤ b(n, N, f)    and    |E(B(n, N, f)^N) − a(n, N, f)^N| ≤ E(b(n, N, f))

On the other hand let us remark that

    | η_n((f − η_n f)²) − φ(n, η_{n−1}^N)((f − φ(n, η_{n−1}^N)f)²) |
        ≤ | η_n(f²) − φ(n, η_{n−1}^N)(f²) | + | (η_n f)² − (φ(n, η_{n−1}^N)f)² |
        ≤ | η_n(f²) − φ(n, η_{n−1}^N)(f²) | + 2‖f‖ | η_n f − φ(n, η_{n−1}^N)f |
        ≤ ( | η_n(f²) − φ(n, η_{n−1}^N)(f²) | + | η_n f − φ(n, η_{n−1}^N)f | ) (1 + 2‖f‖)

Then we can write

    b(n, N, f) ≤ (u²/2)(1 + 2‖f‖) ( | η_n(f²) − φ(n, η_{n−1}^N)(f²) | + | η_n f − φ(n, η_{n−1}^N)f | ) + (4|u|³‖f‖³)/(3√N)    (39)

We point out that the expectation of the absolute differences appearing in (39) goes to zero by (25) and (26). It follows that

    lim_{N→+∞} E(B(n, N, f)^N) = lim_{N→+∞} a(n, N, f)^N = exp( −(u²/2) η_n((f − η_n f)²) )

and (37) follows from Lévy's continuity theorem. This ends the proof of the proposition.

Our final step is to provide some exponential bounds and to prove that η_n^N f converges P-a.s. to η_n f as the size N of the system grows, for every n ≥ 0 and f ∈ Cb(E).

Proposition 2 Under the same conditions as in Theorem 2 we have

    P( | (1/N) Σ_{i=1}^N f(ξ_n^i) − η_n f | > ε ) ≤ A_1(n, f) e^{−Nε² A_2(n,f)}    ∀ε > 0, ∀n ≥ 0, ∀f ∈ Cb(E)    (40)

with A_1(n, f) and A_2(n, f) positive and finite. Consequently we have

    lim_{N→+∞} η_n^N f = η_n f    P-a.s.    ∀n ≥ 0, ∀f ∈ Cb(E)    (41)

Proof:

We first use assumption (25) to prove by induction that for every f ∈ Cb(E) and n ≥ 0 there exist a constant A(n, f) and a finite subset L(n, f) ⊂ N × Cb(E) such that

    |η_n^N f − η_n f| ≤ A(n, f) sup_{(k,h) ∈ L(n,f)} |η_k^N h − φ(k, η_{k−1}^N)h|    P-a.s.    (42)

with the convention φ(0, η_{−1}^N) = η_0. Consider the case n = 0. For every f ∈ Cb(E) we have

    |η_0^N f − η_0 f| = |η_0^N f − φ(0, η_{−1}^N)f|

and (42) is satisfied at rank n = 0. Suppose the result is true at rank n − 1 ≥ 0. Observe that

    |η_n^N f − η_n f| ≤ |η_n^N f − φ(n, η_{n−1}^N)f| + |φ(n, η_{n−1}^N)f − φ(n, η_{n−1})f|    ∀n ≥ 1

Using (25) we get

    |η_n^N f − η_n f| ≤ |η_n^N f − φ(n, η_{n−1}^N)f| + C_n(η_{n−1}, f) |H_n(η_{n−1}, f)| sup_{h ∈ H_n(η_{n−1},f)} |η_{n−1}^N h − η_{n−1} h|

To clarify the presentation we will write C_n(f) and H_n(f) instead of C_n(η_{n−1}, f) and H_n(η_{n−1}, f). The induction hypothesis at rank n − 1 implies

    |η_n^N f − η_n f| ≤ A(n, f) sup_{(k,h) ∈ L(n,f)} |η_k^N h − φ(k, η_{k−1}^N)h|    P-a.s.

with

    A(n, f) = 2 ( 1 ∨ C_n(f) |H_n(f)| sup_{h ∈ H_n(f)} A(n − 1, h) )
    L(n, f) = {(n, f)} ∪ ∪_{h ∈ H_n(f)} L(n − 1, h)

The induction is thus completed. Apply Hoeffding's inequality ([25]) to get the upper bound

    P( |η_k^N h − φ(k, η_{k−1}^N)h| > ε | η_{k−1}^N ) ≤ 2 exp( −(N/8)(ε/‖h‖)² )    (43)

Therefore we have

    P( |η_k^N h − φ(k, η_{k−1}^N)h| > ε ) ≤ 2 exp( −(N/8)(ε/‖h‖)² )    (44)

Combining (42) and (44) we obtain

    P( | (1/N) Σ_{i=1}^N f(ξ_n^i) − η_n f | > ε ) ≤ A_1(n, f) e^{−Nε² A_2(n,f)}

with

    A_1(n, f) = 2 |L(n, f)|    and    A_2(n, f) = 1/( 8 A(n, f)² sup_{(k,h) ∈ L(n,f)} ‖h‖² )

Finally, (41) is a clear consequence of the Borel–Cantelli lemma, and the proof is completed.

3.2 Interacting Particle Systems with Branchings

The previous section was only concerned with simple interaction mechanisms, thereby avoiding situations in which the interaction depends on parts of the trajectories of the particles. In this section we introduce a general approximation of (1) which includes branching and interaction mechanisms. Such constructions will solve (1) numerically when the state space E is given by

    E = E_1 × … × E_r,    r ≥ 1

where (E_p)_{1≤p≤r} is a finite sequence of locally compact separable spaces. The basic idea is to split a probability measure on E so as to deal with random trees. Nevertheless, it should be kept in mind that the content of this section is nothing more than an extension of the results of section 3.1.

3.2.1 The particle system state space

Let us introduce some new notations. For every N_1, …, N_r ≥ 1 we write

    (N)_q = {1, …, N_1} × … × {1, …, N_q}    for all 1 ≤ q ≤ r    (45)

The points of the sets (N)_q will be denoted by the letters (i) or (j). For (i) ∈ (N)_q and 1 ≤ p ≤ q, i_p denotes the p-th component of (i), so that (i) = (i_1, …, i_q). The particle system under study will be a Markov chain with state space

    E^{(N)} := E_1^{(N)_1} × … × E_r^{(N)_r},    (N) := (N_1, …, N_r)    (46)

For every 1 ≤ p ≤ r, each point x_p ∈ E_p^{(N)_p} consists of |(N)_p| particles x_p^{(i)}:

    x_p = (x_p^{(i)})_{(i) ∈ (N)_p} ∈ E_p^{(N)_p}    (47)

Each point x ∈ E^{(N)} consists of r particle systems

    x = (x_1, …, x_r) ∈ E_1^{(N)_1} × … × E_r^{(N)_r}    (48)

The points x ∈ E^{(N)} are called random trees. It is important to remark that the size of the particle systems increases at each step of the algorithm:

    |(N)_1| = N_1 ≤ |(N)_2| = N_1 N_2 ≤ … ≤ |(N)_r| = N_1 … N_r

At each step 1 < p ≤ r the transitions under study will be branching mechanisms. More precisely, during a transition each particle ξ_{p−1}^{(i)}, living in the system ξ_{p−1}, will branch into N_p auxiliary particles:

    ξ_{p−1}^{(i)} ⟶ (ξ_p^{(i),1}, …, ξ_p^{(i),N_p})    ∀(i) ∈ (N)_{p−1}    (49)

So that, at the end of this step, the resulting particle system ξ_p consists of |(N)_p| random particles.

Note that, for every 1 ≤ q ≤ r, a point (x_1, …, x_q) of the product space E_1^{(N)_1} × … × E_q^{(N)_q} may be viewed as a finite sequence

    (x_1, …, x_q) = ( x_1^{i_1}, x_2^{i_1,i_2}, …, x_q^{i_1,…,i_q} )_{1 ≤ i_1 ≤ N_1, …, 1 ≤ i_q ≤ N_q}

Thus, from the above description (49), given a random tree x ∈ E^{(N)}, each q-tuple (i_1, …, i_q) describes the history of the particle x_q^{(i)}. In order to describe this history it is convenient to introduce the following notations: for every (i) = (i_1, …, i_q) ∈ (N)_q and 1 ≤ p ≤ q ≤ r we denote

    (i)_p = (i_1, …, i_p)

For instance

    (i) = (i_1, …, i_r) ∈ (N)_r  ⟹  (i)_1 = (i_1),  (i)_2 = (i_1, i_2),  (i)_3 = (i_1, i_2, i_3)

Using these notations and the description (49), the branching dynamics which describe the evolution of the particle x_r^{(i)} = x_r^{i_1,…,i_r} are given by

    x_1^{(i)_1} = x_1^{i_1} ⟶ x_2^{(i)_2} = x_2^{i_1,i_2} ⟶ … ⟶ x_{r−1}^{(i)_{r−1}} = x_{r−1}^{i_1,…,i_{r−1}} ⟶ x_r^{(i)} = x_r^{i_1,…,i_r}

Transition probability kernels

Let μ be a probability measure on E_1 × … × E_r. For every 1 ≤ p ≤ r we define, whenever they exist, a probability measure μ_p ∈ P(E_1 × … × E_p) and a transition probability kernel μ_{p/p−1} by setting, with some obvious abuse of notation,

    μ(dx_1, …, dx_r) = μ_1(dx_1) μ_{2/1}(x_1, dx_2) … μ_{r/r−1}(x_1, …, x_{r−1}, dx_r)    (50)
    μ_p(dx_1, …, dx_p) = μ_{p−1}(dx_1, …, dx_{p−1}) μ_{p/p−1}(x_1, …, x_{p−1}, dx_p)

The existence of such splittings is discussed in Dellacherie/Meyer, Chap. III [13]. With μ ∈ P(E) and (50) we associate a measure μ^{(N)} ∈ P(E^{(N)}) by

    μ^{(N)}(dx) = Π_{i_1=1}^{N_1} μ_1(dx_1^{i_1}) Π_{i_2=1}^{N_2} μ_{2/1}(x_1^{i_1}, dx_2^{i_1,i_2}) … Π_{i_r=1}^{N_r} μ_{r/r−1}(x_1^{i_1}, …, x_{r−1}^{i_1,…,i_{r−1}}, dx_r^{i_1,…,i_r})

Let us see an example of the use of these formulas.

Examples 1 Let r = 2, E_1 = E_2 and μ = ν × K, where ν ∈ P(E_1) and K is a transition probability kernel on E_1. Since in this case μ_1(dx_1) = ν(dx_1) and μ_{2/1}(x_1, dx_2) = K(x_1, dx_2), one obtains

    μ^{(N)}(dx) = Π_{i_1=1}^{N_1} ν(dx_1^{i_1}) Π_{i_2=1}^{N_2} K(x_1^{i_1}, dx_2^{i_1,i_2})    (51)

It follows from (51) that μ^{(N)} is the probability distribution of a random tree

    ξ = (ξ_1^{i_1}, ξ_2^{i_1,i_2})_{1 ≤ i_1 ≤ N_1, 1 ≤ i_2 ≤ N_2}

where

1. (ξ_1^{i_1})_{1 ≤ i_1 ≤ N_1} are N_1 independent and identically distributed random variables with common law ν.
2. For each 1 ≤ i_1 ≤ N_1, (ξ_2^{i_1,i_2})_{1 ≤ i_2 ≤ N_2} are N_2 independent and identically distributed random variables with common law K(ξ_1^{i_1}, ·).

We observe that the probability measure μ^{(N)} may be split as before and written in a more compact and simple form. By the very definition of the sets (N)_1, …, (N)_r we have

    μ^{(N)}(dx) = μ_1^{(N)}(dx_1) μ_{2/1}^{(N)}(x_1, dx_2) … μ_{r/r−1}^{(N)}(x_1, …, x_{r−1}, dx_r)    (52)

with

    μ_{p/p−1}^{(N)}(x_1, …, x_{p−1}, dx_p) = Π_{(i) ∈ (N)_p} μ_{p/p−1}(x_1^{(i)_1}, …, x_{p−1}^{(i)_{p−1}}, dx_p^{(i)})
                                          = Π_{(j) ∈ (N)_{p−1}} Π_{i_p=1}^{N_p} μ_{p/p−1}(x_1^{(j)_1}, …, x_{p−1}^{(j)}, dx_p^{(j),i_p})

Using these notations we introduce the transition probability kernels C^{(N)} as follows:

    C^{(N)} F(ν) = ∫ F(m^{(N)}(x)) ν^{(N)}(dx)    with    m^{(N)}(x) := (1/|(N)_r|) Σ_{(i) ∈ (N)_r} δ_{(x_1^{(i)_1}, …, x_r^{(i)_r})}    (53)

for all ν ∈ P(E) and F ∈ Cb(P(E)). Roughly speaking, starting with a measure ν ∈ P(E) the transition probability C^{(N)} chooses randomly a measure m^{(N)}(ξ), where ξ is an E^{(N)}-valued random variable with law ν^{(N)}. From our notations m^{(N)}(ξ) ∈ P(E_1 × … × E_r) and it has the form

    m^{(N)}(ξ) = (1/|(N)_r|) Σ_{i_1,…,i_r=1}^{N_1,…,N_r} δ_{(ξ_1^{i_1}, …, ξ_r^{i_1,…,i_r})}

Further manipulations yield the decomposition

    m^{(N)}(ξ)(dz) = m_1^{(N)}(ξ)(dz_1) m_{2/1}^{(N)}(ξ)(z_1, dz_2) … m_{r/r−1}^{(N)}(ξ)(z_{r−1}, dz_r)    (54)

with

    m_1^{(N)}(ξ)(dz_1) = (1/N_1) Σ_{i_1=1}^{N_1} δ_{ξ_1^{i_1}}(dz_1)    (55)

and

    m_{p/p−1}^{(N)}(ξ)(z_{p−1}, dz_p) = Σ_{(i) ∈ (N)_{p−1}} 1_{ξ_{p−1}^{(i)}}(z_{p−1}) (1/N_p) Σ_{i_p=1}^{N_p} δ_{ξ_p^{(i),i_p}}(dz_p)    ∀ 1 < p ≤ r    (56)

The decompositions (52) and (56) make clearer the nature of the transition probability kernel C^{(N)}. The recursive description of the random tree ξ = (ξ_1, …, ξ_r) with law ν^{(N)} is straightforward:

1. Step p = 1: the particle system ξ_1 consists of N_1 i.i.d. random variables with common law ν_1.
2. Step 1 < p ≤ r: at the end of step (p − 1) the random tree consists of (p − 1) particle systems ξ_1, …, ξ_{p−1}. In particular the system ξ_{p−1} contains |(N)_{p−1}| particles. During this transition each particle ξ_{p−1}^{(i)} branches into a fixed number N_p of independent particles with common law ν_{p/p−1}(ξ_1^{(i)_1}, …, ξ_{p−1}^{(i)_{p−1}}, du):

    ξ_{p−1}^{(i)} ∈ E_{p−1} ⟶ (ξ_p^{(i),1}, …, ξ_p^{(i),N_p}) ∈ E_p^{N_p}    i.i.d. ∼ ν_{p/p−1}(ξ_1^{(i)_1}, …, ξ_{p−1}^{(i)_{p−1}}, du)

Remark 2 If (N) = (N_1, 1, …, 1) then it is clear from the above that, for every μ ∈ P(E),

    E^{(N)} = E_1^{N_1} × … × E_r^{N_1} = E^{N_1},    m^{(N)}(x) = m^{N_1}(x) = (1/N_1) Σ_{i_1=1}^{N_1} δ_{(x_1^{i_1}, …, x_r^{i_1})},    C^{(N)} = C_{N_1}

and

    μ^{(N)}(dx) = Π_{i_1=1}^{N_1} μ_1(dx_1^{i_1}) μ_{2/1}(x_1^{i_1}, dx_2^{i_1}) … μ_{r/r−1}(x_1^{i_1}, …, x_{r−1}^{i_1}, dx_r^{i_1}) = Π_{i_1=1}^{N_1} μ(dx^{i_1}) = μ^{⊗N_1}(dx)
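The tree sampling of Examples 1 (formula (51)) is straightforward to implement on a finite space. In the sketch below ν, K and f are random stand-ins, and we simply check that m^{(N)}(ξ)f is close to μf for μ = ν × K:

```python
import numpy as np

rng = np.random.default_rng(3)
N1, N2 = 400, 5
d = 3  # hypothetical finite state space E1 = E2 = {0, ..., d-1}

# Hypothetical ingredients: nu a distribution, K a stochastic matrix.
nu = rng.random(d); nu /= nu.sum()
K = rng.random((d, d)); K /= K.sum(axis=1, keepdims=True)
f = rng.standard_normal((d, d))   # bounded test function f(x1, x2)

# Sample the random tree (xi_1^{i1}, xi_2^{i1,i2}) with law mu^{(N)} as in (51):
# N1 i.i.d. roots from nu, each branching into N2 offspring from K(root, .).
xi1 = rng.choice(d, size=N1, p=nu)
xi2 = np.array([rng.choice(d, size=N2, p=K[x]) for x in xi1])

# m^{(N)}(xi)f = average of f over the N1*N2 leaves of the tree.
m_f = f[np.repeat(xi1, N2), xi2.ravel()].mean()

# Exact value mu(f) for mu = nu x K.
mu_f = sum(nu[a] * K[a, b] * f[a, b] for a in range(d) for b in range(d))
err = abs(m_f - mu_f)
```

With the seeded generator the error is of the order N1^{-1/2}, in line with Proposition 4 below.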

3.2.2 The Associate Markov Process

We further require that for every z ∈ E^{(N)} and n ≥ 1 the measure φ(n, m^{(N)}(z)) can be split as in (50). At this point it is appropriate to give a special case, arising in nonlinear filtering problems, in which the splitting of the measure φ(n, m^{(N)}(z)) does not present much difficulty.

Examples 2 Let us suppose that r = 2, E_1 = E_2 and that φ(n, ·) is given by

    φ(n, η)f = ∫ f(x_1, x_2) g_n(x_1) η(dx_0, dx_1) K(x_1, dx_2) / ∫ g_n(z_1) η(dz_0, dz_1)    ∀f ∈ Cb(E), ∀η ∈ P(E)

Since in this case

    m^{(N)}(z) = (1/(N_1 N_2)) Σ_{i_1=1}^{N_1} Σ_{i_2=1}^{N_2} δ_{(z_1^{i_1}, z_2^{i_1,i_2})}    ∀z ∈ E_1^{(N)_1} × E_2^{(N)_2}

we obtain

    φ(n, m^{(N)}(z))(dx_1, dx_2) = Σ_{i_1,i_2=1}^{N_1,N_2} ( g_n(z_2^{i_1,i_2}) / Σ_{j_1,j_2=1}^{N_1,N_2} g_n(z_2^{j_1,j_2}) ) δ_{z_2^{i_1,i_2}}(dx_1) K(x_1, dx_2)

and therefore

    φ(n, m^{(N)}(z))(dx_1, dx_2) = φ_1(n, m^{(N)}(z))(dx_1) φ_{2/1}(n, m^{(N)}(z))(x_1, dx_2)

with

    φ_1(n, m^{(N)}(z))(dx_1) = Σ_{i_1,i_2=1}^{N_1,N_2} ( g_n(z_2^{i_1,i_2}) / Σ_{j_1,j_2=1}^{N_1,N_2} g_n(z_2^{j_1,j_2}) ) δ_{z_2^{i_1,i_2}}(dx_1)
    φ_{2/1}(n, m^{(N)}(z))(x_1, dx_2) = K(x_1, dx_2)

We refer again to section 3.3, and to the last part of the paper, for more detailed examples explaining the splitting assumption imposed in the construction of the branching transitions.

We are now ready to introduce a particle approximation of (1) which includes branching and interaction mechanisms. Let N_1 ≥ 1, …, N_r ≥ 1 and let (Ω, (F_n)_n, (ξ_n)_n, P) be the E^{(N)}-valued Markov process defined by

    P(ξ_0 ∈ dx) = η^{(N)}(dx),    P(ξ_n ∈ dx | ξ_{n−1} = z) = φ^{(N)}(n, m^{(N)}(z))(dx)    (57)

where φ^{(N)}(n, m^{(N)}(z)) denotes the measure on E^{(N)} built from the splitting (50) of φ(n, m^{(N)}(z)). In the last part of the paper we will apply the above constructions to solve nonlinear filtering problems; in that framework the above transitions have an explicit and simple form. The algorithm constructed in this way will be called an interacting particle system with branchings. The terminology "branching" is intended to emphasize that the points of the state space of the Markov chain are random trees. It is useful at this point already to stress the Markov description of the empirical measures

    η_n^{(N)} := m^{(N)}(ξ_n)

Arguing as above, the description (57) enables us to consider the random measures η_n^{(N)} as a measure valued random process (Ω, (F_n)_n, (η_n^{(N)})_n, P) defined by

    P(η_0^{(N)} ∈ dμ) = M_0 C^{(N)}(η, dμ),    P(η_n^{(N)} ∈ dμ | η_{n−1}^{(N)} = ν) = M_n C^{(N)}(ν, dμ)    (58)

with M_0 F(ν) = F(ν) for all ν.

3.2.3 Description of the algorithm

If we want to think in terms of branching and interaction mechanisms it is essential to recall that at each time n and for every (i) = (i_1, …, i_r) ∈ (N)_r we have

    ξ_n^{(i)} = ( ξ_{n,1}^{i_1}, ξ_{n,2}^{i_1,i_2}, …, ξ_{n,r}^{i_1,…,i_r} )

Therefore ξ_n^{(i)} can be viewed as the n-th part of the trajectory of an individual particle

    ξ_{n,1}^{(i)_1} ⟶ ξ_{n,2}^{(i)_2} ⟶ … ⟶ ξ_{n,r−1}^{(i)_{r−1}} ⟶ ξ_{n,r}^{(i)}

In addition, in view of (52), at each step 1 < p ≤ r each particle ξ_{n,p−1}^{(i)} will branch into N_p auxiliary particles:

    ξ_{n,p−1}^{(i)} ⟶ ( ξ_{n,p}^{(i),1}, …, ξ_{n,p}^{(i),N_p} )    ∀(i) ∈ (N)_{p−1}

In the same spirit, the E^{(N)}-valued Markov process as a whole can be viewed as a branching process. Notice that each particle system ξ_{n,p}, 1 ≤ p ≤ r, contains |(N)_p| = N_1 … N_p particles, so that the sizes of the particle systems ξ_{n,1}, …, ξ_{n,r} increase at each step p = 1, …, r; but at the end of the interval the next particle system ξ_{n+1,1} again contains only |(N)_1| = N_1 particles. Probabilistically, and in a more precise language, we may describe its evolution in time as follows:

1. At time n = 0:
   - Step p = 1: the particle system ξ_{0,1} consists of N_1 = |(N)_1| random particles ξ_{0,1} = (ξ_{0,1}^{(i)})_{(i) ∈ (N)_1} with common distribution η_1.
   - Step 1 < p ≤ r: at the end of step p − 1 the random tree consists of p − 1 particle systems ξ_{0,1}, …, ξ_{0,p−1}; in particular the system ξ_{0,p−1} contains |(N)_{p−1}| particles ξ_{0,p−1}^{(i)}. Each particle ξ_{0,p−1}^{(i)} branches into a fixed number N_p of particles:

         ξ_{0,p−1}^{(i)} ∈ E_{p−1} ⟶ (ξ_{0,p}^{(i),1}, …, ξ_{0,p}^{(i),N_p}) ∈ E_p^{N_p}    i.i.d. ∼ η_{p/p−1}(ξ_{0,1}^{(i)_1}, …, ξ_{0,p−1}^{(i)_{p−1}}, du)

     Therefore at the end of these mechanisms the particle system ξ_{0,p} contains |(N)_p| particles.

2. At time n ≥ 1: at time n − 1 the random tree ξ_{n−1} consists of r particle systems ξ_{n−1} = (ξ_{n−1,1}, …, ξ_{n−1,r}); for every p = 1, …, r the system ξ_{n−1,p} contains |(N)_p| particles ξ_{n−1,p} = (ξ_{n−1,p}^{(i)})_{(i) ∈ (N)_p}.
   - Step p = 1: the particle system ξ_{n,1} consists of |(N)_1| random particles ξ_{n,1}^{(i)} with common distribution φ_1(n, m^{(N)}(ξ_{n−1})).
   - Step 1 < p ≤ r: at the end of step p − 1 the random tree consists of p − 1 particle systems ξ_{n,1}, …, ξ_{n,p−1}; in particular ξ_{n,p−1} consists of |(N)_{p−1}| particles ξ_{n,p−1}^{(i)}. Each particle ξ_{n,p−1}^{(i)} branches into a fixed number N_p of random particles:

         ξ_{n,p−1}^{(i)} ⟶ (ξ_{n,p}^{(i),1}, …, ξ_{n,p}^{(i),N_p})    i.i.d. ∼ φ_{p/p−1}(n, m^{(N)}(ξ_{n−1}))(ξ_{n,1}^{(i)_1}, …, ξ_{n,p−1}^{(i)_{p−1}}, du)

     So that, at the end of these mechanisms, the particle system ξ_{n,p} contains |(N)_p| particles.
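As an illustration, here is a minimal sketch of one transition of the tree-valued chain in the setting of Examples 2 (r = 2) on a finite state space: selection of the first level from the current leaves with weights proportional to g_n, then branching of each selected particle into N_2 offspring drawn from K. All ingredients (d, K, g, the initial law) are hypothetical stand-ins, not objects from the paper.

```python
import numpy as np

rng = np.random.default_rng(4)
d, N1, N2 = 3, 300, 4  # hypothetical state-space size and branching numbers

K = rng.random((d, d)); K /= K.sum(axis=1, keepdims=True)  # mutation kernel
g = 0.5 + rng.random(d)                                    # fitness function g_n

def branching_step(z2):
    """One transition per (57): select N1 states from the N1*N2 leaves
    z2^{i1,i2} with probabilities proportional to g (this is phi_1), then
    branch each selected particle into N2 offspring from K (this is phi_{2/1})."""
    leaves = z2.ravel()
    w = g[leaves]; w /= w.sum()
    new1 = rng.choice(leaves, size=N1, p=w)                    # selection level
    new2 = np.array([rng.choice(d, size=N2, p=K[x]) for x in new1])  # branching
    return new1, new2

# Initialise the tree from the uniform law and run one step.
z1 = rng.choice(d, size=N1)
z2 = np.array([rng.choice(d, size=N2, p=K[x]) for x in z1])
z1, z2 = branching_step(z2)
```

The tree sizes follow the text: the first level always has N1 particles and the second |(N)_2| = N1*N2.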

3.2.4 Convergence of the Algorithm

The following discussion is an easy generalization of the one given in section 3.1, and for this reason proofs will only be sketched.

Proposition 3 Let us suppose that E is compact. Let M = (M_n)_{n≥1} denote a sequence of time-inhomogeneous Feller Markov transitions on P(E), let M_0 ∈ P(P(E)) and let M_n^{(N)} = M_n C^{(N)}, n ≥ 0. Using these notations we have

    M_0^{(N)} M_1^{(N)} … M_n^{(N)}  ⟶_w  M_0 M_1 … M_n    as N_1 → +∞    (59)

More generally, (59) holds when E is locally compact and M = (M_n)_{n≥1} is such that

    M_n F ∈ U(P(E))    ∀F ∈ U(P(E))    (60)

Hint of proof:

The proof, which we only sketch here, is based on the same kind of arguments as the proof of Theorem 1. Arguing as before, we can show that the transition probability kernel C^{(N)} is a Feller transition. Now, let F̃(ν) := F(ν(f_1), …, ν(f_q)) with f_k ∈ Cb(E) and F globally Lipschitz, that is,

    |F(x_1, …, x_q) − F(x'_1, …, x'_q)| ≤ A Σ_{k=1}^q |x_k − x'_k|,    A < +∞

By the very definition of C^{(N)} one gets easily the system of inequalities

    |C^{(N)} F̃(ν) − F̃(ν)| ≤ A Σ_{k=1}^q ∫ |m^{(N)}(x) f_k − ν(f_k)| ν^{(N)}(dx)
        = A Σ_{k=1}^q ∫ | (1/|(N)_r|) Σ_{(i) ∈ (N)_r} f_k(x_1^{(i)_1}, …, x_r^{(i)_r}) − ν(f_k) | ν^{(N)}(dx_1, …, dx_r)

Therefore, using the form of ν^{(N)}, one gets after some standard computations

    |C^{(N)} F̃(ν) − F̃(ν)| ≤ A Σ_{k=1}^q ( Σ_{p=1}^r (1/|(N)_p|) ν_p((ν_{r/p} f_k − ν_{r/p−1} f_k)²) )^{1/2}    (61)
        = A Σ_{k=1}^q ( Σ_{p=1}^r (1/|(N)_p|) ( ν_p((ν_{r/p} f_k)²) − ν_{p−1}((ν_{r/p−1} f_k)²) ) )^{1/2}
        ≤ (A/√N_1) Σ_{k=1}^q ( ν((f_k − ν f_k)²) )^{1/2}

where, for 0 ≤ p ≤ r, ν_{r/p} f denotes the conditional expectation of f(x_1, …, x_r) under ν given the first p coordinates, with the conventions ν_{r/r} f = f and ν_{r/0} f = ν f. The proof of (61), while straightforward, is somewhat lengthy, so it is omitted. Then there exists some constant B such that

    |C^{(N)} F̃(ν) − F̃(ν)| ≤ B/√N_1

The statement follows using (23) and recalling that such functions F̃ are dense in Cb(P(E)). In particular, when M_n F(ν) = F(φ(n, ν)) the above result implies that

    lim_{N_1→+∞} E(|η_n^{(N)} f − η_n f|) = 0    ∀n ≥ 1, ∀f ∈ Cb(E),    where (N) = (N_1, …, N_r)

The problem is now to find an explicit upper bound for the rate of convergence. Similarly as in Theorem 2, a crude upper bound may be derived from the inequality (61) when condition (25) is satisfied.

Proposition 4 Suppose that for every f ∈ Cb(E), μ ∈ P(E) and n ≥ 1 there exist a constant C_n(μ, f) and a finite set of bounded functions H_n(μ, f) such that

    |φ(n, μ)f − φ(n, ν)f| ≤ C_n(μ, f) Σ_{h ∈ H_n(μ,f)} |μh − νh|    ∀ν ∈ P(E)    (62)

Then, for every f ∈ Cb(E) and n ≥ 1 there exists A_n(f) < +∞ such that

    E(|η_n^{(N)} f − η_n f|²) ≤ A_n(f)/N_1    (63)

Therefore, if Φ_n^{(N)} is the distribution of η_n^{(N)} and Φ_n F(μ) = F(η_n) for all μ ∈ P(E), we have

    lim_{N_1→+∞} D(Φ_n^{(N)}, Φ_n) = 0    ∀n ≥ 1    (64)

To get exponential bounds in this situation we prove an extension of Hoeffding's inequality ([25]).

Lemma 3 Let (ξ^{(i)})_{(i) ∈ (N)_r} be an E^{(N)}-valued random tree with law μ^{(N)}, where (N) = (N_1, …, N_r), N_1 ≥ 1, …, N_r ≥ 1, r ≥ 1 and μ ∈ P(E_1 × … × E_r). For each f ∈ Cb(E_1 × … × E_r) and ε > 0,

    P( | (1/|(N)_r|) Σ_{(i) ∈ (N)_r} (f(ξ^{(i)}) − μf) | > ε ) ≤ 2 exp( −(1/8) (ε/‖f‖)² ( Σ_{k=1}^r 1/|(N)_k| )^{−1} )    (65)

Proof:

Our method of proof follows that of Hoeffding. For each f ∈ Cb(E_1 × … × E_r), (i) ∈ (N)_r and 1 ≤ k ≤ r set

    f_k^{(i)} = E( f(ξ^{(i)}) | ξ^{(i)_k} ) − E( f(ξ^{(i)}) | ξ^{(i)_{k−1}} )    where    ξ^{(i)_k} := (ξ_1^{i_1}, ξ_2^{i_1,i_2}, …, ξ_k^{i_1,…,i_k})

Using these notations we have

    f(ξ^{(i)}) − μf = Σ_{k=1}^r f_k^{(i)}    ∀(i) ∈ (N)_r

Write L^{(N)}(t, f)_r for the moment generating functions

    L^{(N)}(t, f)_r := E( exp( (t/|(N)_r|) Σ_{(i) ∈ (N)_r} (f(ξ^{(i)}) − μf) ) ) = E( Π_{(i) ∈ (N)_r} exp( (t/|(N)_r|) Σ_{k=1}^r f_k^{(i)} ) )

It is well known that E(e^{tX}) ≤ exp((tb)²/2) for every t ∈ R and every real-valued random variable X with zero mean and bounded range |X| ≤ b. Applying this inequality to each f_r^{(i)}, (i) ∈ (N)_r, conditionally on ξ^{(i)_{r−1}}, we obtain

    L^{(N)}(t, f)_r ≤ L^{(N)}(t, f)_{r−1} exp( 2t²‖f‖²/|(N)_r| )

Using repeatedly the same technique we obtain the upper bound

    L^{(N)}(t, f)_r ≤ exp( 2t²‖f‖² Σ_{k=1}^r 1/|(N)_k| )

Thus, for each ε > 0 and t > 0,

    P( (1/|(N)_r|) Σ_{(i) ∈ (N)_r} (f(ξ^{(i)}) − μf) > ε ) ≤ exp( −tε + 2t²‖f‖² Σ_{k=1}^r 1/|(N)_k| )

To minimize the quadratic, let t = ε/( 4‖f‖² Σ_{k=1}^r 1/|(N)_k| ). This yields

    P( (1/|(N)_r|) Σ_{(i) ∈ (N)_r} (f(ξ^{(i)}) − μf) > ε ) ≤ exp( −(1/8) (ε/‖f‖)² ( Σ_{k=1}^r 1/|(N)_k| )^{−1} )

The end of the proof (the symmetric bound for the lower tail, whence the factor 2) is now straightforward. Using the same lines of argument as in the proof of Proposition 2, the above lemma leads immediately to the following result.

Proposition 5 Under the same conditions as in Theorem 2, for every ε > 0, n ≥ 0 and f ∈ Cb(E) we have

    P( |η_n^{(N)} f − η_n f| > ε ) ≤ A_1(n, f) exp( −ε² A_2(n, f) ( Σ_{k=1}^r 1/|(N)_k| )^{−1} )    (66)

with A_1(n, f) and A_2(n, f) positive and finite. Consequently we have

    lim_{N_1→+∞} η_n^{(N)} f = η_n f    P-a.s.    ∀n ≥ 0, ∀f ∈ Cb(E)    (67)

Our goal is now to discuss the connections between Proposition 4 and Theorem 2. Let (η_n^{N_1})_{n≥0} be the density profiles associated with the particle approximation with simple interactions and N_1 particles, and let (η_n^{(N)})_{n≥0} be the density profiles associated with the interacting particle resolution with branchings, where (N) = (N_1, …, N_r) and N_1, …, N_r ≥ 1. We have already remarked that the particle systems with simple interactions and the interacting particle systems with branchings are exactly the same when the number of auxiliary branching particles at each step equals one. More precisely,

    (N) = (N_1, 1, …, 1)  ⟹  C^{(N)} = C_{N_1}  and  η_n^{(N)} = η_n^{N_1}  ∀n ≥ 0

Let us discuss the relationship between the above approximations in the situation where the functions φ(n, ·), n ≥ 1, are defined by (29). The discussion of example 2, page 13, can be extended in an obvious way to the interacting particle approximation with branchings, and it can easily be seen that the moments of the corresponding particle density profile errors satisfy

    E(|η_n^{N_1} f − η_n f|) ≤ Σ_{p=0}^n sup_{ν ∈ P(E)} I_{n/p}^{N_1}(ν, f)    and    E(|η_n^{(N)} f − η_n f|) ≤ Σ_{p=0}^n sup_{ν ∈ P(E)} I_{n/p}^{(N)}(ν, f)

with

    I_{n/p}^{N_1}(ν, f) = ∫ |φ_{n/p}(μ)f − φ_{n/p}(ν)f| C_{N_1}(ν, dμ),    I_{n/p}^{(N)}(ν, f) = ∫ |φ_{n/p}(μ)f − φ_{n/p}(ν)f| C^{(N)}(ν, dμ)

In order to estimate the terms I_{n/p}^{N_1}(ν, f) we introduced previously two functions h_{n/p}^{(1)} and h_{n/p}^{(2)} such that ν h_{n/p}^{(1)} = 0 = ν h_{n/p}^{(2)}, and we proved that

    I_{n/p}^{N_1}(ν, f) ≤ ∫ |μ h_{n/p}^{(1)}| C_{N_1}(ν, dμ) + 2‖f‖ ∫ |μ h_{n/p}^{(2)}| C_{N_1}(ν, dμ)
                       ≤ (1/√N_1) ( ν((h_{n/p}^{(1)})²)^{1/2} + 2‖f‖ ν((h_{n/p}^{(2)})²)^{1/2} )    (68)

Similarly, one gets

    I_{n/p}^{(N)}(ν, f) ≤ ∫ |μ h_{n/p}^{(1)}| C^{(N)}(ν, dμ) + 2‖f‖ ∫ |μ h_{n/p}^{(2)}| C^{(N)}(ν, dμ)

and the same computations as in (61) lead to

    I_{n/p}^{(N)}(ν, f) ≤ (1 + 2‖f‖) Σ_{i=1}^2 ( Σ_{q=1}^r (1/(N_1 … N_q)) ν( (ν_{r/q} h_{n/p}^{(i)} − ν_{r/q−1} h_{n/p}^{(i)})² ) )^{1/2}    (69)

One way to see that the right-hand side of (69) is smaller than that of (68) is to remark that, for i = 1, 2,

    Σ_{q=1}^r (1/(N_1 … N_q)) ν( (ν_{r/q} h_{n/p}^{(i)} − ν_{r/q−1} h_{n/p}^{(i)})² )
        ≤ (1/N_1) Σ_{q=1}^r ( ν((ν_{r/q} h_{n/p}^{(i)})²) − ν((ν_{r/q−1} h_{n/p}^{(i)})²) )
        = (1/N_1) ν( (h_{n/p}^{(i)} − ν h_{n/p}^{(i)})² ) = (1/N_1) ν( (h_{n/p}^{(i)})² )

We conclude that, using the branching mechanisms, it is possible to get a smaller upper bound for the particle density moment errors; but whether or not much loss of performance is incurred by one of the above algorithms is an interesting but unsolved theoretical question. Really effective methods for attacking such a problem are apparently not known.

3.3 Examples

One important and useful application of our techniques is the situation in which the dynamical system (1) has the form

    φ(n, η)f = γ(n, η)(f) / γ(n, η)(1)    ∀f ∈ Cb(E), ∀η ∈ P(E)    (70)

where the γ(n, ·) are continuous and measure valued linear functions, that is,

    γ(n, α_1 η_1 + α_2 η_2) = α_1 γ(n, η_1) + α_2 γ(n, η_2)    ∀η_1, η_2 ∈ P(E), ∀α_1, α_2 ∈ R, ∀n ≥ 1

This property is of particular interest because of its relation to the nonlinear filtering problem. In addition to the role of interacting particle systems in nonlinear filtering theory, there are several important points of contact between our approach and nonlinear systems arising in fluid mechanics. The second part of this section is devoted to the study of such systems. These simple Markovian models of particles are called master equations in physics. Incidentally, our approach provides convergence results for the empirical measures of such interacting particle systems. For a more thorough treatment of these equations see Sznitman [44] and the references given there. The setting is the same as in the previous sections. The particle systems will be Markov chains with state space E^N, where N ≥ 1 indicates the size of the system. The N-tuples of elements of E, i.e. the points of the set E^N, are called systems of particles and will mostly be denoted by the letters x, y, z. As usual m^N(x) denotes the empirical measure associated with the point x = (x^1, …, x^N) ∈ E^N:

    m^N(x) = (1/N) Σ_{i=1}^N δ_{x^i}

3.3.1 Linear systems

Compositions

The following examples illustrate our interacting particle system approach and highlight issues specific to nonlinear filtering problems. Let $(K_n)_{n \ge 0}$ be a family of Feller Markov kernels and let $(g_n)_{n \ge 1}$ be a sequence of continuous functions $g_n : E \to \mathbb{R}$.

1) If $\Phi(n,\eta) = \eta K_n$ for all $n \ge 1$ then for every $x \in E^N$

$$\Phi(n, m^N(x)) \;=\; m^N(x)\, K_n \;=\; \frac{1}{N} \sum_{i=1}^{N} K_n(x^i, \cdot)$$

and the transition probability kernels of the corresponding particle system are given by

$$P(\xi_n \in dz \mid \xi_{n-1} = x) \;=\; \prod_{p=1}^{N} \frac{1}{N} \sum_{i=1}^{N} K_n(x^i, dz^p)$$

In other words the particles are chosen randomly and independently in the previous system and, in a second step, they move independently of each other according to the transitions $(K_n)_n$.

2) Our second example concerns the dynamical plant equation (1) when the functions $\Phi(n,\cdot)$ are given by

$$\Phi(n,\eta)f \;=\; \eta(f\,g_n)/\eta(g_n) \qquad \forall f \in C_b(E)\ \ \forall \eta \in P(E)$$

In this situation, for every $x \in E^N$

$$\Phi(n, m^N(x)) \;=\; \sum_{i=1}^{N} \frac{g_n(x^i)}{\sum_{j=1}^{N} g_n(x^j)}\, \delta_{x^i}$$

Thus, the transition probability kernels of the corresponding particle system are given by

$$P(\xi_n \in dz \mid \xi_{n-1} = x) \;=\; \prod_{p=1}^{N} \sum_{i=1}^{N} \frac{g_n(x^i)}{\sum_{j=1}^{N} g_n(x^j)}\, \delta_{x^i}(dz^p)$$

In other words, at the time $n$ the particles are chosen randomly and independently in the previous system according to the fitness functions $g_n$.
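For readers who wish to experiment, the selection transition of example 2 is easy to simulate: each of the $N$ new sites is drawn independently from the current system with probability proportional to its fitness. The following minimal sketch is not from the paper; the fitness function and the values are illustrative assumptions.

```python
import random

def select(particles, g, rng):
    # Example 2 transition: draw N sites independently from the current
    # system, site x^i being chosen with probability g(x^i) / sum_j g(x^j).
    weights = [g(x) for x in particles]
    return rng.choices(particles, weights=weights, k=len(particles))

rng = random.Random(0)
g = lambda x: x * x + 0.01      # hypothetical fitness function g_n
particles = [0.1, 0.5, 0.9]
new_system = select(particles, g, rng)
```

Note that `random.choices` normalizes the weights internally, which matches the normalization by $\sum_j g_n(x^j)$ in the transition above.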

3) Let us study a way of combining the situations 1) and 2). Let us set $E = E_1 \times E_2$, where $E_1$ and $E_2$ are two locally compact separable spaces. Suppose the dynamical plant equation (1) is given with the functions $\Phi(n,\cdot)$ defined by

$$\Phi(n,\eta)f \;=\; \frac{\displaystyle\int f(x_1, x_2)\, g_n(x_1)\, \eta(dx_0, dx_1)\, K(x_1, dx_2)}{\displaystyle\int g_n(z_1)\, \eta(dz_0, dz_1)} \qquad (71)$$

for every $f \in C_b(E_1 \times E_2)$ and $\eta \in P(E_1 \times E_2)$. In this situation, for every $x = (x_1, x_2) \in E^N = E_1^N \times E_2^N$

$$\Phi(n, m^N(x))(du, dv) \;=\; \sum_{i=1}^{N} \frac{g_n(x_2^i)}{\sum_{j=1}^{N} g_n(x_2^j)}\, \delta_{x_2^i}(du)\, K(u, dv)$$

and the particles will move according to the transitions

$$P(\xi_n \in d(x_1, x_2) \mid \xi_{n-1} = (z_1, z_2)) \;=\; \prod_{p=1}^{N} \sum_{i=1}^{N} \frac{g_n(z_2^i)}{\sum_{j=1}^{N} g_n(z_2^j)}\, \delta_{z_2^i}(dx_1^p)\, K(x_1^p, dx_2^p)$$

To be more precise, let us set $\xi_n = (\widehat{\xi}_n, \xi_{n+1})$. Using this notation and the above description the motion of the particles is decomposed into two separate mechanisms

$$P(\xi_n \in dx_2 \mid \widehat{\xi}_{n-1} = x_1) \;=\; \prod_{p=1}^{N} K(x_1^p, dx_2^p) \qquad (72)$$

$$P(\widehat{\xi}_n \in dx_1 \mid \xi_n = x_2) \;=\; \prod_{p=1}^{N} \sum_{i=1}^{N} \frac{g_n(x_2^i)}{\sum_{j=1}^{N} g_n(x_2^j)}\, \delta_{x_2^i}(dx_1^p) \qquad (73)$$
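In simulation the two mechanisms (72)-(73) simply alternate: a mutation step in which every particle moves independently through the kernel $K$, followed by a selection step driven by the fitness $g_n$. A minimal sketch follows; the kernel and the fitness below are illustrative assumptions, not taken from the paper.

```python
import random

def mutate(x1, K_sample, rng):
    # Prediction (72): each particle moves independently according to K.
    return [K_sample(x, rng) for x in x1]

def select(x2, g, rng):
    # Updating (73): sites are chosen with probability proportional to g.
    weights = [g(x) for x in x2]
    return rng.choices(x2, weights=weights, k=len(x2))

rng = random.Random(1)
K_sample = lambda x, r: x + r.gauss(0.0, 1.0)  # hypothetical Markov kernel K
g = lambda x: 1.0 / (1.0 + x * x)              # hypothetical fitness g_n
x1 = [0.0] * 5                                 # E_1-valued system
x2 = mutate(x1, K_sample, rng)                 # E_2-valued system
x1_next = select(x2, g, rng)                   # new E_1-valued system
```

Iterating `mutate` then `select` reproduces the composed transition displayed just before (72).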

4) Finally we examine the above situation (71) when we use a particle system which includes branching and interaction mechanisms. When considering the dynamical system (71) with state space $E = E_1 \times E_2$, the particle system is modeled by a Markov chain with state space $E^{(N)} = E_1^{N_1} \times E_2^{N_1 \times N_2}$ where $(N) = (N_1, N_2)$ with $N_1, N_2 \ge 1$.

Recall that $m^{(N)}(x)$ is the empirical measure associated to the point $x = (x^{(i)})_{(i) \in (N)} \in E^{(N)}$:

$$m^{(N)}(x) \;=\; \frac{1}{|(N)|} \sum_{(i) \in (N)} \delta_{x^{(i)}} \;=\; \frac{1}{N_1 N_2} \sum_{i_1=1}^{N_1} \sum_{i_2=1}^{N_2} \delta_{(x_1^{i_1},\, x_2^{i_1,i_2})}$$

Observe that, for every $x \in E^{(N)}$

$$\Phi(n, m^{(N)}(x))(du, dv) \;=\; \sum_{i_1=1}^{N_1} \sum_{i_2=1}^{N_2} \frac{g_n(x_2^{i_1,i_2})}{\sum_{j_1=1}^{N_1} \sum_{j_2=1}^{N_2} g_n(x_2^{j_1,j_2})}\, \delta_{x_2^{i_1,i_2}}(du)\, K(u, dv)$$

Thus, the transition probability kernels of the corresponding $E_1 \times E_2$-valued particles are given by

$$P(\xi_n \in dz \mid \xi_{n-1} = x) \;=\; \prod_{p_1=1}^{N_1} \sum_{i_1,i_2=1}^{N_1,N_2} \frac{g_n(x_2^{i_1,i_2})}{\sum_{j_1,j_2=1}^{N_1,N_2} g_n(x_2^{j_1,j_2})}\, \delta_{x_2^{i_1,i_2}}(dz_1^{p_1}) \;\prod_{p_2=1}^{N_2} K(z_1^{p_1}, dz_2^{p_1,p_2})$$

Let us set $\xi_n = (\widehat{\xi}_n, \xi_{n+1}) \in E_1^{N_1} \times E_2^{N_1 \times N_2}$. Using this notation and the above description the motion of the particles is decomposed into two separate mechanisms

$$P(\xi_n \in dx_2 \mid \widehat{\xi}_{n-1} = x_1) \;=\; \prod_{p_1=1}^{N_1} \prod_{p_2=1}^{N_2} K(x_1^{p_1}, dx_2^{p_1,p_2}) \qquad (74)$$

$$P(\widehat{\xi}_n \in dx_1 \mid \xi_n = x_2) \;=\; \prod_{p_1=1}^{N_1} \sum_{i_1,i_2=1}^{N_1,N_2} \frac{g_n(x_2^{i_1,i_2})}{\sum_{j_1,j_2=1}^{N_1,N_2} g_n(x_2^{j_1,j_2})}\, \delta_{x_2^{i_1,i_2}}(dx_1^{p_1}) \qquad (75)$$

To be more precise, let us remark that

$$P(\widehat{\xi}_n \in dx_1 \mid \xi_n = x_2) \;=\; \prod_{p_1=1}^{N_1} \sum_{i_1=1}^{N_1} \frac{\sum_{k_2=1}^{N_2} g_n(x_2^{i_1,k_2})}{\sum_{j_1,j_2=1}^{N_1,N_2} g_n(x_2^{j_1,j_2})}\; \sum_{i_2=1}^{N_2} \frac{g_n(x_2^{i_1,i_2})}{\sum_{k_2=1}^{N_2} g_n(x_2^{i_1,k_2})}\, \delta_{x_2^{i_1,i_2}}(dx_1^{p_1})$$

Roughly speaking, each particle $\widehat{\xi}_n^{p_1}$, with $1 \le p_1 \le N_1$, chooses a sub-system $(x_2^{i_1,k_2})_{1 \le k_2 \le N_2}$, $1 \le i_1 \le N_1$, at random with probability

$$\sum_{k_2=1}^{N_2} g_n(x_2^{i_1,k_2}) \Big/ \sum_{j_1=1}^{N_1} \sum_{j_2=1}^{N_2} g_n(x_2^{j_1,j_2})$$

and then moves to the site $x_2^{i_1,i_2}$ with probability $g_n(x_2^{i_1,i_2}) \big/ \sum_{k_2=1}^{N_2} g_n(x_2^{i_1,k_2})$.
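The two-level selection just described translates directly into code: a sub-system is first drawn with probability proportional to its total fitness, then a site inside it. The sketch below is illustrative only (the fitness and the sites are assumptions, not taken from the paper).

```python
import random

def branching_select(x2, g, rng):
    # x2[i1][i2] holds the N2 offspring of particle i1; each of the N1 new
    # particles picks a sub-system by total fitness, then a site inside it.
    subtotals = [sum(g(x) for x in row) for row in x2]
    out = []
    for _ in range(len(x2)):
        i1 = rng.choices(range(len(x2)), weights=subtotals, k=1)[0]
        row = x2[i1]
        out.append(rng.choices(row, weights=[g(x) for x in row], k=1)[0])
    return out

rng = random.Random(2)
g = lambda x: abs(x) + 0.1                      # hypothetical fitness g_n
x2 = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]       # N1 = 3 sub-systems, N2 = 2
x1_next = branching_select(x2, g, rng)
```

The two stages compose to the overall selection probability $g_n(x_2^{i_1,i_2})/\sum_{j_1,j_2} g_n(x_2^{j_1,j_2})$, exactly as in the factorized kernel above.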

3.3.2 Master equations

The following systems are called in physics weakly interacting systems, because the interaction depends only on a fixed function of the empirical measures $m^N$. We have only considered here very elementary equations which can be easily generalized. The continuous versions, with not necessarily compact state space, are studied in Braun/Hepp [3], McKean [33], Meleard/Roelly-Coppoletta [32], Sznitman [43], [44] and the references given there.

Compact state space

In order to use theorem 1, we first make the sanguine assumption that the state space is compact. Although such is generally not the case, these artificial examples will serve their purpose in illuminating the effect of interaction in real systems.

1) Our first example concerns the dynamical system (1) when $E = [0,1]$ and the functions $\Phi(n,\cdot)$ are given by

$$\Phi(n,\eta)f \;=\; \int f\big(x + F(x, V \star \eta\,(x)) + w\big)\, d\eta(x)\, d\Gamma(w) \qquad \forall f \in C_b([0,1])$$

where the sign $+$ means summation modulo 1, $\Gamma \in P([0,1])$ and $V$ and $F$ are continuous functions $V : [0,1] \to \mathbb{R}$, $F : [0,1] \times \mathbb{R} \to [0,1]$.

Roughly speaking the solution of this dynamical system

$$\eta_n = \Phi(n, \eta_{n-1}) \quad \forall n \ge 1 \qquad \eta_0 \in P(E) \qquad (76)$$

describes the time marginal distributions of the time-inhomogeneous, $[0,1]$-valued and nonlinear process defined by the recursive equation

$$X_n - X_{n-1} = F\big(X_{n-1},\, V \star \eta_{n-1}(X_{n-1})\big) + W_n \quad n \ge 1, \qquad X_0 \sim \eta_0 \qquad (77)$$

where $\eta_{n-1}$ is the distribution of $X_{n-1}$ and $W_n$ is a $[0,1]$-valued random variable with law $\Gamma$. As we shall see this process describes the limiting behavior of the trajectory of an individual particle in an interacting particle system, as the number of particles grows. It is usually called in propagation of chaos theory the tagged particle process. To be more precise, it is well known that there exists a measure $\nu$ in the path space, $\nu \in P(E^{n+1})$, called a McKean measure (or McKean process), corresponding to the set of transitions $\{K_u\,;\, u \in P(E)\}$ defined by

$$K_u f(x) \;=\; \int f\big(x + F(x, V \star u\,(x)) + w\big)\, d\Gamma(w) \qquad \forall f \in C_b(E)$$

such that

$\bullet$ $\big(E^{n+1}, \mathcal{E}^{n+1}, X = (X_k)_{0 \le k \le n}, \nu\big)$ is a time-inhomogeneous Markov process with transitions $(K_{u_k})_{0 \le k \le n}$.

$\bullet$ Under $\nu$, the probability distribution of $X_k$ is $u_k$ for all $0 \le k \le n$.

In our settings this measure is clearly given by

$$\nu(dx_0, \ldots, dx_{n+1}) \;=\; u_0(dx_0)\, K_{u_0}(x_0, dx_1) \cdots K_{u_n}(x_n, dx_{n+1})$$

with $u_0 = \eta_0$ and $u_k = u_{k-1} K_{u_{k-1}}$ for all $1 \le k \le n$. The description of such models in continuous time may be found in Meleard/Roelly-Coppoletta [32]. In this situation the McKean probability measure on the path space is defined as the solution of a classical martingale problem. Therefore, the time marginal distributions $\eta_n$, $n \ge 0$, of the McKean measure $\nu$ satisfy (76). The existence of a unique McKean measure is discussed in Sznitman [43] and Shiga-Tanaka [38]. One classical problem is to estimate the time marginal distributions $\eta_n$. Using (17) section 3.1.1 and theorem 1 section 3.1.4 we are able to construct an interacting particle approximation. First, we observe that for every $x = (x^1, \ldots, x^N) \in E^N$

$$\Phi(k, m^N(x))f \;=\; \frac{1}{N} \sum_{i=1}^{N} \int f\big(x^i + F(x^i, (V \star m^N(x))(x^i)) + w\big)\, d\Gamma(w)$$

With regard to (17), the system of particles is driven by the mechanisms

$$P(\widehat{\xi}_k \in dx \mid \xi_k = z) \;=\; \prod_{p=1}^{N} \frac{1}{N} \sum_{i=1}^{N} \delta_{z^i}(dx^p), \qquad \xi_k^i = \widehat{\xi}_{k-1}^i + F\big(\widehat{\xi}_{k-1}^i,\, (V \star m^N(\widehat{\xi}_{k-1}))(\widehat{\xi}_{k-1}^i)\big) + W_k^i \quad 1 \le i \le N \qquad (78)$$

where the $(W_k^i)_{1 \le i \le N}$ are i.i.d. with common law $\Gamma$, $\widehat{\xi}_0 = (\widehat{\xi}_0^1, \ldots, \widehat{\xi}_0^N)$ are i.i.d. with common law $\eta_0$ and $\xi_k = (\xi_k^1, \ldots, \xi_k^N)$, $\widehat{\xi}_k = (\widehat{\xi}_k^1, \ldots, \widehat{\xi}_k^N) \in [0,1]^N$.

There remains the question of convergence of the particle density profiles

$$\eta_n^N \;\stackrel{\mathrm{def}}{=}\; \frac{1}{N} \sum_{i=1}^{N} \delta_{\xi_n^i}$$

Using theorem 1, we conclude easily that for every $n \ge 1$ the random measures $\eta_n^N$ converge in law to $\eta_n$ when the size of the system is growing.
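For concreteness, one transition of the mechanism (78) can be sketched as follows; the drift $F$, the interaction $V$ and the noise law are illustrative assumptions. Since the selection weights in (78) are uniform, the selection step reduces to a plain uniform resampling of the current system.

```python
import random

def step(xs, F, V, sample_noise, rng):
    """One transition of (78) on [0,1): uniform resampling, then a move
    driven by the empirical mean-field term (V * m^N)(x), modulo 1."""
    N = len(xs)
    hat = [rng.choice(xs) for _ in range(N)]    # uniform selection step
    out = []
    for x in hat:
        field = sum(V(x - z) for z in hat) / N  # (V * m^N)(x)
        out.append((x + F(x, field) + sample_noise(rng)) % 1.0)
    return out

rng = random.Random(3)
V = lambda u: 0.5 * u                           # hypothetical interaction V
F = lambda x, v: 0.1 * v                        # hypothetical drift F
sample_noise = lambda r: 0.01 * r.random()      # hypothetical law Gamma
xs = [i / 10 for i in range(10)]
xs_next = step(xs, F, V, sample_noise, rng)
```

The empirical measure of `xs_next` plays the role of the particle density profile $\eta_n^N$ above.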

2) Our second example concerns the dynamical system (1) when the functions $\Phi(n,\cdot)$ are given by

$$\Phi(n,\eta)f \;=\; \int f\Big(x + \int a(x,z)\, \eta(dz) + \Big(\int b(x,z)\, \eta(dz)\Big)\, w\Big)\, d\eta(x)\, d\Gamma(w) \qquad \forall f \in C_b([0,1])$$

where the sign $+$ means summation modulo 1, $\Gamma \in P([0,1])$ and $a$ and $b$ are continuous functions $a : [0,1]^2 \to \mathbb{R}$, $b : [0,1]^2 \to \mathbb{R}$.

In this situation, the solution of the dynamical system $\eta_n = \Phi(n, \eta_{n-1})$, $n \ge 1$, $\eta_0 \in P(E)$, is the density profile of the McKean measure associated to the time-inhomogeneous and $[0,1]$-valued nonlinear process

$$X_n - X_{n-1} \;=\; a_n(X_{n-1}) + b_n(X_{n-1})\, W_n, \qquad X_0 \sim \eta_0 \qquad (79)$$

where

$\bullet$ $a_n(z) = E(a(z, X_{n-1})) = \int a(z,x)\, \eta_{n-1}(dx)$ and $b_n(z) = E(b(z, X_{n-1})) = \int b(z,x)\, \eta_{n-1}(dx)$ for all $n \ge 1$ and $z \in [0,1]$.

$\bullet$ $\eta_{n-1}$ is the distribution of $X_{n-1}$.

$\bullet$ $W_n$ is a $[0,1]$-valued random variable with law $\Gamma$.

The description of such models in continuous time may be found in Sznitman [44]. Arguing as before, the corresponding interacting particle system is given by

$$P(\widehat{\xi}_k \in dx \mid \xi_k = z) \;=\; \prod_{p=1}^{N} \frac{1}{N} \sum_{i=1}^{N} \delta_{z^i}(dx^p), \qquad \xi_n^i = \widehat{\xi}_{n-1}^i + \frac{1}{N} \sum_{j=1}^{N} a(\widehat{\xi}_{n-1}^i, \widehat{\xi}_{n-1}^j) + \frac{1}{N} \sum_{j=1}^{N} b(\widehat{\xi}_{n-1}^i, \widehat{\xi}_{n-1}^j)\, W_n^i \quad 1 \le i \le N \qquad (80)$$

with $(W_n^i)_{1 \le i \le N}$ i.i.d. variables with common law $\Gamma$. Using theorem 1, the random measures $\eta_n^N \stackrel{\mathrm{def}}{=} \frac{1}{N} \sum_{i=1}^{N} \delta_{\xi_n^i}$ converge in law to $\eta_n$ when the size of the system is growing.
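A sketch of one transition of (80), with illustrative choices for $a$, $b$ and the noise law (again, the uniform selection step is folded into a plain resampling):

```python
import random

def step(xs, a, b, sample_noise, rng):
    """One transition of (80): drift and diffusion coefficients are the
    empirical averages of a and b over the current system, modulo 1."""
    N = len(xs)
    hat = [rng.choice(xs) for _ in range(N)]    # uniform selection step
    out = []
    for x in hat:
        drift = sum(a(x, z) for z in hat) / N   # empirical mean of a(x, .)
        diff = sum(b(x, z) for z in hat) / N    # empirical mean of b(x, .)
        out.append((x + drift + diff * sample_noise(rng)) % 1.0)
    return out

rng = random.Random(4)
a = lambda x, z: 0.05 * (z - x)                 # hypothetical coefficient a
b = lambda x, z: 0.1                            # hypothetical coefficient b
sample_noise = lambda r: r.random()             # hypothetical law Gamma
xs = [0.2, 0.4, 0.6, 0.8]
xs_next = step(xs, a, b, sample_noise, rng)
```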

3) Let $E$ be a compact separable state space and let $\nu$ be a McKean process corresponding to a given set of Feller transitions $\{K_u\,;\, u \in P(E)\}$ and to a given distribution $\eta \in P(E)$. In addition, we assume that the maps

$$u \in P(E) \;\longmapsto\; \delta_x K_u \in P(E)$$

are continuous, for all $x \in E$. Recalling the above observations, we see that the density profile $\eta_n$ is solution of the dynamical system $\eta_n = \Phi(n, \eta_{n-1})$, $n \ge 1$, $\eta_0 = \eta \in P(E)$, where

$$\Phi(n,u)f \;\stackrel{\mathrm{def}}{=}\; \int u(dx)\, K_u(x,dy)\, f(y) \qquad \forall n \ge 1\ \ \forall u \in P(E)\ \ \forall f \in C_b(E)$$

Now, in view of (17) section 3.1.1, the transition probability kernels of the interacting particle system are defined by

$$P(\xi_n \in dx \mid \xi_{n-1} = z) \;=\; \prod_{p=1}^{N} \frac{1}{N} \sum_{i=1}^{N} K_{m^N(z)}(z^i, dx^p) \qquad \text{with } m^N(z) = \frac{1}{N} \sum_{j=1}^{N} \delta_{z^j} \in P(E)$$

where $x = (x^1, \ldots, x^N),\ z = (z^1, \ldots, z^N) \in E^N$ and $\xi_n = (\xi_n^1, \ldots, \xi_n^N)$.

Finally, using theorem 1 section 3.1.4 and the above conditions we can show easily that the random measures

$$\eta_n^N \;\stackrel{\mathrm{def}}{=}\; \frac{1}{N} \sum_{i=1}^{N} \delta_{\xi_n^i}$$

converge in law to $\eta_n$ when the size of the system is growing.

Non compact state space

In this last example we study in a different, but more classical way, the asymptotic behavior of a system of interacting particles when the state space is not necessarily compact. The continuous version without interaction through the perturbation can be found in [44]. Let $E = \mathbb{R}^d$, $d \ge 1$, and let $\nu$ be the McKean measure on the path space $E^{T+1}$, $T \in \mathbb{N}^*$, corresponding to the time-inhomogeneous and $E$-valued nonlinear process

$$X_n = \overline{F}_n(X_{n-1}, W_n) \quad 1 \le n \le T, \qquad X_0 \text{ with law } \eta_0 \qquad (81)$$

where

$\bullet$ For every $z, w \in E$, $\overline{F}_n(z,w) = \int F_n(z, x, w)\, \eta_{n-1}(dx)$, where the $F_n : E^3 \to E$ are bounded Lipschitz functions for all $1 \le n \le T$.

$\bullet$ $\eta_n$ is the distribution of $X_n$, for all $0 \le n \le T$.

$\bullet$ $W = (W_n)_n$ is a sequence of $E$-valued and independent random variables.

$\bullet$ $X_0$ is a random variable, independent of $W$, with distribution $\eta_0 \in P(E)$.

In this situation one looks at a pair system of $N$ particles

$$\xi_n^{i,N} = \frac{1}{N} \sum_{j=1}^{N} F_n\big(\xi_{n-1}^{i,N}, \xi_{n-1}^{j,N}, W_n^i\big), \qquad \overline{\xi}_n^{i,N} = \overline{F}_n\big(\overline{\xi}_{n-1}^{i,N}, W_n^i\big), \qquad 1 \le i \le N,\ 1 \le n \le T$$

where $W^i$, $1 \le i \le N$, are independent copies of the perturbation $W$ and $\overline{\xi}_0^{i,N} \stackrel{\mathrm{def}}{=} \xi_0^{i,N}$, $1 \le i \le N$, are i.i.d. random variables, independent of $W$, with common law $\eta_0$. Let us prove that

$$E\Big( \sup_{0 \le n \le T} \big|\xi_n^{1,N} - \overline{\xi}_n^{1,N}\big| \Big) \;\le\; \frac{A(T)}{\sqrt{N}} \qquad (82)$$

for some finite constant $A(T) > 0$. Therefore, using lemma 1 we conclude that the empirical measures

$$\nu^N \;\stackrel{\mathrm{def}}{=}\; \frac{1}{N} \sum_{i=1}^{N} \delta_{(\xi_0^{i,N}, \ldots, \xi_T^{i,N})}$$

converge in law, as $N$ grows, to the McKean measure $\nu \in P(E^{T+1})$. To prove (82) we use the following decomposition

$$\big|\xi_n^{1,N} - \overline{\xi}_n^{1,N}\big| \;\le\; \frac{1}{N} \sum_{j=1}^{N} \Big| F_n\big(\xi_{n-1}^{1,N}, \xi_{n-1}^{j,N}, W_n^1\big) - F_n\big(\overline{\xi}_{n-1}^{1,N}, \xi_{n-1}^{j,N}, W_n^1\big) \Big|$$
$$\qquad +\; \frac{1}{N} \sum_{j=1}^{N} \Big| F_n\big(\overline{\xi}_{n-1}^{1,N}, \xi_{n-1}^{j,N}, W_n^1\big) - F_n\big(\overline{\xi}_{n-1}^{1,N}, \overline{\xi}_{n-1}^{j,N}, W_n^1\big) \Big|$$
$$\qquad +\; \Big| \frac{1}{N} \sum_{j=1}^{N} F_n\big(\overline{\xi}_{n-1}^{1,N}, \overline{\xi}_{n-1}^{j,N}, W_n^1\big) - \overline{F}_n\big(\overline{\xi}_{n-1}^{1,N}, W_n^1\big) \Big|$$

Since the functions $F_n$ are bounded and Lipschitz there exists a finite constant $C < +\infty$ such that

$$E\big(|\xi_n^{1,N} - \overline{\xi}_n^{1,N}|\big) \;\le\; C\, \Big( E\big(|\xi_{n-1}^{1,N} - \overline{\xi}_{n-1}^{1,N}|\big) + E\Big| \frac{1}{N} \sum_{j=1}^{N} F_n\big(\overline{\xi}_{n-1}^{1,N}, \overline{\xi}_{n-1}^{j,N}, W_n^1\big) - \overline{F}_n\big(\overline{\xi}_{n-1}^{1,N}, W_n^1\big) \Big| \Big)$$

Using a discrete time version of Gronwall's lemma one gets

$$E\big(|\xi_T^{1,N} - \overline{\xi}_T^{1,N}|^\star\big) \;\le\; C(T)\, \sum_{n=1}^{T} E\Big| \frac{1}{N} \sum_{j=1}^{N} F_n\big(\overline{\xi}_{n-1}^{1,N}, \overline{\xi}_{n-1}^{j,N}, W_n^1\big) - \overline{F}_n\big(\overline{\xi}_{n-1}^{1,N}, W_n^1\big) \Big|$$

where $|\xi_n^{1,N} - \overline{\xi}_n^{1,N}|^\star \stackrel{\mathrm{def}}{=} \sup_{0 \le k \le n} |\xi_k^{1,N} - \overline{\xi}_k^{1,N}|$ and $C(T) < +\infty$. To bound the terms of the right hand side we observe that, by the law of large numbers,

$$E\Big( \Big| \frac{1}{N} \sum_{j=1}^{N} F_n\big(\overline{\xi}_{n-1}^{1,N}, \overline{\xi}_{n-1}^{j,N}, W_n^1\big) - \overline{F}_n\big(\overline{\xi}_{n-1}^{1,N}, W_n^1\big) \Big|^2 \Big) \;\le\; \frac{(2\|F_n\|)^2}{N} \qquad 1 \le n \le T$$

4 Application to the Non Linear Filtering Problem

The basic model for the general nonlinear filtering problem consists of a time-inhomogeneous Markov process $X$ and a nonlinear observation $Y$ with observation noise $V$. Namely, let $(X, Y)$ be the Markov process taking values in $S \times \mathbb{R}^d$, $d \ge 1$, and defined by the system

$$X = (X_n)_{n \ge 0}, \qquad Y_n = h_n(X_n) + V_n \quad n \ge 1 \qquad (83)$$

where $S$ is a locally compact and separable metric space, $h_n : S \to \mathbb{R}^d$, $d \ge 1$, are continuous functions and the $V_n$ are independent random variables with continuous and positive density $g_n$ with respect to Lebesgue measure. The signal process $X$ that we consider is assumed to be a time-inhomogeneous and $S$-valued Markov process with Feller transition probability kernels $K_n$, $n \ge 1$, and initial probability measure $\nu$ on $S$. We will assume the observation noise $V$ and $X$ are independent. The classical filtering problem is to estimate the distribution of $X_n$ conditionally to the observations up to time $n$. Namely,

$$\eta_n(f) \;\stackrel{\mathrm{def}}{=}\; E\big(f(X_n) \mid Y_1, \ldots, Y_n\big) \qquad (84)$$

for all $f \in C_b(S)$. The nonlinear filtering problem has been extensively studied in the literature. With the notable exception of the linear-Gaussian situation or wider classes of models (Benes filters [1]), optimal filters have no finitely recursive solution (Chaleyat-Maurel/Michel [7]). Although Kalman filtering ([27],[30]) is a popular tool in handling estimation problems, its optimality heavily depends on linearity. When used for nonlinear filtering (Extended Kalman Filter) its performance relies on and is limited by the linearizations performed on the concerned model. The interacting particle systems approach developed hereafter can be seen as a nonlinear filtering method which discards linearizations. More precisely, these techniques use the nonlinear system model itself in order to solve the filtering problem. The evolution of this material may be seen quite directly through the following chain of papers [4], [5], [16], [17], [18], [19], [20], [24]. Nevertheless, in most of these papers this method is applied as a heuristic approximation to specific models, its general nature is not emphasized and experimental simulations are the only guides for handling concrete problems. The remainder of this paper is divided into three sections. In section 4.1 we formulate the filtering problem in such a way that the techniques of section 3 can be applied. The problem of assessing the distributions (84) is of course related to that of recursively computing the conditional distributions $\eta_n$, $n \ge 0$, which provide all statistical information about the state variables $X_n$ obtainable from the observations $(Y_1, \ldots, Y_n)$, $n \ge 0$. The key idea is to study the filtering problem along the lines proposed by Kunita [29] and Stettner [40].
Briefly stated, the essence of the present formulation is that, given the observations $Y = y$, the conditional distributions $\eta_n$, $n \ge 0$, are solution of an explicit dynamical model with infinite dimensional state space of the form studied in the first part of our development. Namely,

$$\eta_n = \phi_n(y_n, \eta_{n-1}) \quad n \ge 1, \qquad \eta_0 = \nu \qquad (85)$$

where $y_n \in \mathbb{R}^d$ is the current observation at the time $n \ge 1$ and the $\phi_n(y_n, \cdot)$ are continuous functions $\phi_n(y_n, \cdot) : P(S) \to P(S)$. It should be noted that in this form the optimal filter has a recursive but infinite dimensional solution (except for the linear-Gaussian case, where the Kalman filter reduces to mean and variance parameters). For illustration, recalling the constructions of the particle

approximation of a measure valued dynamical system of the form (85) described in section 3, we will see that the local dynamics at the time $n \ge 1$ of the corresponding particle system are given by the distributions

$$\phi_n\Big(y_n, \frac{1}{N} \sum_{i=1}^{N} \delta_{x^i}\Big)(dz) \;=\; \sum_{i=1}^{N} \frac{g_{y,n}(x^i)}{\sum_{j=1}^{N} g_{y,n}(x^j)}\, K_{y,n}(x^i, dz) \qquad x^1, \ldots, x^N \in S,\ n \ge 1 \qquad (86)$$

where

$$g_{y,n}(x^i) \;=\; \int g_n\big(y_n - h_n(z)\big)\, K_n(x^i, dz) \qquad (87)$$

and

$$K_{y,n}(x^i, dz) \;=\; \frac{g_n\big(y_n - h_n(z)\big)}{g_{y,n}(x^i)}\, K_n(x^i, dz) \qquad (88)$$

for all $x^i \in S$, $1 \le i \le N$. What is remarkable is that the particle system motion is strongly influenced by the observations. More precisely, $K_{y,n}$ is exactly the conditional distribution of $X_n$ given $X_{n-1}$ and the observation $Y_n$. Intuitively speaking, when the observation of $X_n$ becomes available the particles are sampled around the real state $X_n$, and this guarantees an occupation of the state space regions in accordance with the observations, thus providing a well-behaved adaptive grid. Unfortunately, the main difficulty in directly applying the random particle methods of section 3 to the equation (85) stems from the fact that this local dynamic still has the disadvantage of incorporating integrations over the space $S$. Thus, another kind of approximation is needed to simulate the motion of the particles. Nevertheless we will work out an example in which the integrals (87) and (88) have an explicit and simple form. This special case apart, such a computational difficulty will be solved by studying the conditional distributions of the pair process $(X_n, X_{n+1})$ with respect to the observations up to time $n$. Namely

$$\widetilde{\eta}_n(f) \;\stackrel{\mathrm{def}}{=}\; E\big(f(X_n, X_{n+1}) \mid Y_1, \ldots, Y_n\big) \qquad \forall f \in C_b(S^2)\ \ \forall n \ge 0 \qquad (89)$$

The advantage of this alternative formulation of the optimal filter is that it incorporates separately the so-called prediction and updating mechanisms. To be entirely precise, we will see that, given the observations $Y = y$, the conditional distributions $\widetilde{\eta}_n$, $n \ge 0$, are solution of a new measure valued dynamical system

$$\widetilde{\eta}_n = \varphi_n(y_n, \widetilde{\eta}_{n-1}) \quad n \ge 1, \qquad \widetilde{\eta}_0 = \nu \times K_1 \qquad (90)$$

where $y_n \in \mathbb{R}^d$ is the current observation at the time $n \ge 1$ and the $\varphi_n(y_n, \cdot)$ are continuous functions $\varphi_n(y_n, \cdot) : P(S^2) \to P(S^2)$. Then, it will follow easily that the local dynamics of the corresponding particle system with simple interactions are given by the distributions

$$\varphi_n\Big(y_n, \frac{1}{N} \sum_{i=1}^{N} \delta_{(x_0^i, x_1^i)}\Big)(du, dv) \;=\; \sum_{i=1}^{N} \frac{g_n\big(y_n - h_n(x_1^i)\big)}{\sum_{j=1}^{N} g_n\big(y_n - h_n(x_1^j)\big)}\, \delta_{x_1^i}(du)\, K_{n+1}(u, dv) \qquad (91)$$

with $n \ge 1$, $x_0^1, \ldots, x_0^N \in S$ and $x_1^1, \ldots, x_1^N \in S$. Roughly speaking, the prediction mechanism is introduced in the filtering model (89) in order to express in an explicit form the local dynamics of the associated random particle approximation. The aim of section 4.2 is to use the results of section 3 to describe several interacting particle approximations of the nonlinear and measure valued dynamical system (90). These approximations belong to the class of algorithms called genetic algorithms. These algorithms are based on the genetic mechanisms which guide natural evolution: exploration/mutation and updating/selection. They were introduced by J.H. Holland [26] to handle global optimization problems on a finite set, and the first well-founded convergence theorem ensuring the convergence of the algorithm toward

the desired set of the global minima of a given fitness function was obtained by Cerf in 1994 in his PhD dissertation [6]. A simpler proof based on the use of log-Sobolev inequalities and semigroup techniques can be found in [21]. In the beginning of this section we first describe a basic particle approximation with simple interaction. In this situation, and in view of the distributions (91), the local dynamics of the corresponding particle systems are decomposed into two mechanisms. In the first one, each particle explores the state space $S$, independently of the others, according to the transition probability kernel of the signal process $X$. Then, when the observation is received, each particle examines the previous system and chooses randomly a site in accordance with the observation data. In a second stage we describe a more general evolutionary scheme which includes branching mechanisms. Upon carefully examining the local dynamics of the particles, it will be shown that the corresponding transitions are themselves natural approximations of the distributions (86). Intuitively speaking, the integral form of the conditional distribution $K_{y,n}$ and the weights $g_{y,n}$ given by (87) are estimated at each step of the algorithm by auxiliary branching particles moving independently of each other according to the transition probability kernel of the signal process. Computationally, the particle approximation with simple interactions is of course more time saving because it does not use branching mechanisms, but several numerical studies have revealed that the use of branching is a more efficient way to solve the filtering problem. In fact the choice of the number of auxiliary branching particles has considerable effects on the dynamics of the particle systems. The interested reader is referred to [4], [5] and [20].
There seems to be numerical evidence of this superiority, and from an intuitive point of view the physical reason seems to be that the particle systems are more likely to track the signal process by conditioning the exploration in accordance with the observations, thus avoiding divergence from the real process $X$. This observation leads us to investigate more closely the relationships between these interacting particle resolutions. Although it is intuitively clear that a benefit can be obtained from the use of branchings, it is still an object of investigation to prove the superiority of such approximations. We remark that the estimates provided by theorem 2 and proposition 4 are in some ways rather crude, and without some precise bound on the speed of convergence of such algorithms it is difficult to get a comparison argument between them. On the other hand, we have already noted in remark 2, section 3, that the particle approximation with simple interactions can be viewed as a special case of the interacting particle approximation with branching mechanisms. To be entirely precise, when the number of auxiliary branching particles is equal to 1 these two algorithms are exactly the same. Thus, the interacting particle approximation with branchings generalizes the particle approximation with simple interactions. We finish the paper with some natural generalizations of the elementary stochastic algorithms described in section 4.2. Briefly stated, the particle system described in section 4.3 will track the signal process by considering exploration paths of a given length $r \ge 1$ and limited sections of the observation path. In the case $r = 1$ these constructions will reduce to those described above. This enables a unified description of our particle approximations in terms of three parameters: the population size, the number of auxiliary branching particles and the length of the exploration paths.
Several numerical investigations [4], [5] and [20] have also revealed that the introduction of exploration paths tends to re-center the particles around the signal trajectory. Of course we have touched in this paper only a limited number of questions. For instance, we leave open the practical question of the best choice of the population size, the number of auxiliary branching particles and the length of the exploration paths in accordance with the nonlinear filtering problem at hand. These simple questions turn out to be surprisingly hard to answer satisfactorily

and no firm results concerning the choice of the parameters are available. Another question we leave in the dark is the study of the asymptotic behavior of these algorithms in terms of the ergodic properties of the signal semigroup. In this direction something was done in [17] when the state space is compact, but many questions have as yet no answers.

4.1 Formulation of the Non Linear Filtering Problem

The object of this section is to introduce the filtering model in such a way that the techniques of section 3 can be applied. We emphasize that several presentations are available, and here we follow rather closely the paper of Stettner [40]. For simplicity, nonlinear filtering problems in discrete time are treated throughout. The main virtue of these problems is that the theory is very simple and the ideas transparent. It is thus a good starting point.

Let $X = (\Omega_1 = S^{\mathbb{N}}, (\mathcal{F}_n^1)_{n \ge 0}, (X_n)_{n \ge 0}, P_X^0)$ be a time-inhomogeneous discrete time Markov process with transition operators $K_n$, $n \ge 1$, and initial distribution $\nu$, and let $Y = (\Omega_2 = (\mathbb{R}^d)^{\mathbb{N}}, (\mathcal{F}_n^2)_{n \ge 0}, (Y_n)_{n \ge 0}, P_Y^0)$ be a sequence of independent random variables, independent of $X$, with continuous and positive densities $g_n$ with respect to Lebesgue measure. On the canonical space $(\Omega = \Omega_1 \times \Omega_2,\ \mathcal{F}_n = \mathcal{F}_n^1 \times \mathcal{F}_n^2,\ P^0 = P_X^0 \otimes P_Y^0)$ the signal process $X$ and the observation process $Y$ are $P^0$-independent. Let $(h_n)_{n \ge 1}$ be a family of continuous functions $h_n : S \to \mathbb{R}^d$, $n \ge 1$. Let us set

$$L_n \;=\; \prod_{k=1}^{n} g_k\big(Y_k - h_k(X_k)\big) \big/ g_k(Y_k) \qquad n \ge 0 \qquad (92)$$

with the convention $\prod_\emptyset = 1$. Note that $L$ is a $(P^0, (\mathcal{F}_n)_{n \ge 0})$-martingale. Then we can define a new probability measure $P$ on $(\Omega, (\mathcal{F}_n)_{n \ge 0})$ such that the restrictions $P_n^0$ and $P_n$ to $\mathcal{F}_n$ satisfy

$$P_n = L_n\, P_n^0 \qquad n \ge 0 \qquad (93)$$

One can check easily that

Lemma 4 Under $P$, $X$ is a time-inhomogeneous Markov process with transition operators $K_n$, $n \ge 1$, and initial distribution $\nu$. The variables $V_n = Y_n - h_n(X_n)$, $n \ge 1$, are independent of $X$ and are independent random variables with continuous and positive densities $g_n$ with respect to Lebesgue measure.

We will use $E(\cdot)$ to denote the expectations with respect to $P$ on $\Omega$. The following well known result gives a functional integral representation for the conditional expectation, which is known as the Kallianpur-Striebel formula [28] (see also Stettner [40] and Kunita [29]).

Lemma 5 (Stettner [40]) The conditional distributions $(\eta_n)_{n \ge 0}$ form a time-inhomogeneous and $(\sigma(Y^n), P)$-Markov process on $P(S)$ with transition operators

$$\Phi_n F(\eta) \;=\; \int F\big(\phi_n(y, \eta)\big)\, g_n\big(y - h_n(x)\big)\, dy\, (\eta K_n)(dx) \qquad \forall F \in C_b(P(S))\ \ \forall \eta \in P(S) \qquad (94)$$

where $\phi_n(y, \cdot) : P(S) \to P(S)$, $n \ge 1$, is the continuous function given by

$$\phi_n(y, \eta)f \;=\; \frac{\displaystyle\int f(x)\, g_n\big(y - h_n(x)\big)\, (\eta K_n)(dx)}{\displaystyle\int g_n\big(y - h_n(z)\big)\, (\eta K_n)(dz)} \qquad \forall f \in C_b(S) \qquad (95)$$

for all $\eta \in P(S)$, $y \in \mathbb{R}^d$ and $n \ge 1$. Therefore, given the observations $Y = y$, the distributions $\eta_n$, $n \ge 0$, are solution of the $P(S)$-valued dynamical system

$$\eta_n = \phi_n(y_n, \eta_{n-1}) \quad n \ge 1, \qquad \eta_0 = \nu \qquad (96)$$

The equation (96) is called the nonlinear filtering equation. Even if it looks innocent, it requires extensive computations and can rarely be solved analytically. It is thus necessary to resort to numerical solutions. This lemma is proved in Stettner [40]. We quote its outline for the convenience of the reader.

Proof:

In view of (93) we have

$$\eta_n f \;=\; \frac{E_0\big(f(X_n)\, L_n \mid Y^n\big)}{E_0\big(L_n \mid Y^n\big)} \;=\; \frac{E_0^X\big(f(X_n)\, L_n\big)}{E_0^X\big(L_n\big)}$$

where $E_0(\cdot)$ denotes the expectation with respect to $P^0$ and $E_0^X(\cdot)$ denotes the integration of the paths of the Markov process $X$ and the variable $X_0$. Then we obtain

$$\eta_n f \;=\; \frac{E_0^X\Big( E_0^X\big(f(X_n)\, g_n(Y_n - h_n(X_n)) \mid X_{n-1}\big)\, L_{n-1} \Big)}{E_0^X\Big( E_0^X\big(g_n(Y_n - h_n(X_n)) \mid X_{n-1}\big)\, L_{n-1} \Big)}$$

and, finally,

$$\eta_n f \;=\; \frac{E_0^X\Big( E_0^X\big(f(X_n)\, g_n(Y_n - h_n(X_n)) \mid X_{n-1}\big)\, L_{n-1} \Big) \big/ E_0^X(L_{n-1})}{E_0^X\Big( E_0^X\big(g_n(Y_n - h_n(X_n)) \mid X_{n-1}\big)\, L_{n-1} \Big) \big/ E_0^X(L_{n-1})} \;=\; \frac{\displaystyle\int f(x)\, g_n\big(Y_n - h_n(x)\big)\, (\eta_{n-1} K_n)(dx)}{\displaystyle\int g_n\big(Y_n - h_n(z)\big)\, (\eta_{n-1} K_n)(dz)}$$

The temptation is to apply immediately the random particle approximations described in section 3.1. Recalling the construction of the interacting particle system (17), we see that the transition of an individual particle at the time $n \ge 1$ will be specified, in this situation, by the transition probability kernels

$$\phi_n\Big(y, \frac{1}{N} \sum_{i=1}^{N} \delta_{x^i}\Big)(dx_1) \;=\; \sum_{i=1}^{N} \frac{g_n\big(y_n - h_n(x_1)\big)\, K_n(x^i, dx_1)}{\sum_{j=1}^{N} \displaystyle\int g_n\big(y_n - h_n(z_1)\big)\, K_n(x^j, dz_1)} \qquad (97)$$

where $N$ is the size of the particle system, $Y_n = y_n$ is the current observation data, $x_1 \in S$ and $(x^1, \ldots, x^N) \in S^N$. To be more precise, let us put for all $x_0, x_1 \in S$

$$K_{y,n}(x_0, dx_1) \;=\; \frac{g_n\big(y_n - h_n(x_1)\big)\, K_n(x_0, dx_1)}{\displaystyle\int g_n\big(y_n - h_n(z_1)\big)\, K_n(x_0, dz_1)} \qquad (98)$$

$$g_{y,n}(x_0) \;=\; \int K_n(x_0, dz_1)\, g_n\big(y_n - h_n(z_1)\big) \qquad (99)$$

Using Bayes' rule we note that $K_{y,n}(x_0, dx_1)$ is the distribution under $P$ of $X_n$ conditionally to $X_{n-1} = x_0$ and $Y_n = y_n$, and $g_{y,n}(x_0)$ is the density under $P$ of the distribution of $Y_n$ conditionally to $X_{n-1} = x_0$. With these notations, the transition probability kernel (97) becomes

$$\phi_n\Big(y, \frac{1}{N} \sum_{i=1}^{N} \delta_{x^i}\Big)(dx_1) \;=\; \sum_{i=1}^{N} \frac{g_{y,n}(x^i)}{\sum_{j=1}^{N} g_{y,n}(x^j)}\, K_{y,n}(x^i, dx_1) \qquad (100)$$

Let us work out an example in which the desired transition (97) has a simple form.

Example 3 Let $(X, Y)$ be the Markov process taking values in $\mathbb{R} \times \mathbb{R}$ and defined by the system

$$X_n = f_n(X_{n-1}) + W_n, \qquad Y_n = C_n X_n + V_n \quad n \ge 1 \qquad (101)$$

where $f_n : \mathbb{R} \to \mathbb{R}$, $C_n \in \mathbb{R}$ and $W$, resp. $V$, is a discrete time Gaussian process with zero mean and variance function $q$, resp. $r$. In this specific situation one gets easily

$$K_{y,n}(x_0, dx_1) \;=\; \frac{1}{\sqrt{2\pi\, |s_n|}}\, \exp\Big( -\frac{1}{2\,|s_n|} \Big[ x_1 - \big( f_n(x_0) + s_n C_n r_n^{-1}\, (y_n - C_n f_n(x_0)) \big) \Big]^2 \Big)\, dx_1 \qquad (102)$$

$$\text{and} \qquad g_{y,n}(x_0) \;=\; \frac{1}{\sqrt{2\pi\, |q_n|\,|r_n|/|s_n|}}\, \exp\Big( -\frac{1}{2\, |q_n|\,|r_n|/|s_n|}\, \big( y_n - C_n f_n(x_0) \big)^2 \Big)$$

with $s_n = \big( q_n^{-1} + C_n r_n^{-1} C_n \big)^{-1}$ and $x_0, x_1 \in \mathbb{R}$. Once more the verification of (102), while straightforward, is somewhat lengthy and is omitted.
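As a quick sanity check of (102), note that $|q_n||r_n|/|s_n| = r_n + C_n^2 q_n$, the variance of $Y_n$ given $X_{n-1}$. A small sketch computing the parameters of $K_{y,n}(x_0, \cdot)$ and the weight $g_{y,n}(x_0)$ follows; the drift $f_n$ and the numerical values are illustrative assumptions.

```python
import math

def kyn_gyn(x0, y, f, C, q, r):
    # s_n = (q^-1 + C r^-1 C)^-1; K_{y,n}(x0,.) is Gaussian with variance s_n
    s = 1.0 / (1.0 / q + C * C / r)
    mean = f(x0) + s * C * (y - C * f(x0)) / r   # mean of K_{y,n}(x0, .)
    lik_var = q * r / s                          # equals r + C*C*q
    g = math.exp(-0.5 * (y - C * f(x0)) ** 2 / lik_var) / math.sqrt(2 * math.pi * lik_var)
    return mean, s, g

f = lambda x: 0.9 * x                            # hypothetical drift f_n
mean, var, g = kyn_gyn(1.0, 1.2, f, C=1.0, q=0.5, r=0.25)
```

With these closed forms the transition (100) can be sampled exactly, which is why this linear-Gaussian case needs no auxiliary approximation.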

Unfortunately, in most cases it is not possible to exhibit explicitly the form of the transition (100). In a little while we shall see one way to approximate the transition probability kernel (100). Roughly speaking, the idea is to replace, in the definition of $K_{y,n}$, each transition $K_n(x^i, dx_1)$, $1 \le i \le N$, by the empirical measure

$$\frac{1}{N'} \sum_{j=1}^{N'} \delta_{x^{i,j}} \qquad (103)$$

where $(x^{i,1}, x^{i,2}, \ldots, x^{i,N'})$ are i.i.d. random variables with common law $K_n(x^i, dx_1)$, $1 \le i \le N$. With these notations we have formally

$$\phi_n\Big(y, \frac{1}{N} \sum_{i=1}^{N} \delta_{x^i}\Big) \;\approx\; \sum_{i,j=1}^{N,N'} \frac{g_n\big(y_n - h_n(x^{i,j})\big)}{\sum_{k,l=1}^{N,N'} g_n\big(y_n - h_n(x^{k,l})\big)}\, \delta_{x^{i,j}}$$

Using such a local approximation it is not obvious that the empirical measure of the particle system will converge to the conditional distribution, since it is not clear what condition on the system size $N'$ will guarantee the convergence of the algorithm. Fortunately, this vexing technical difficulty will disappear when we model this local approximation by a branching mechanism. The difficulty with the recursion (96) is that it involves two separate mechanisms. Namely, the first one

$$\eta \;\longmapsto\; \eta K_n$$

does not depend on the current observation and is usually called the prediction, and the second one

$$\eta \;\longmapsto\; g_n\big(Y_n - h_n(\cdot)\big)\, \eta \Big/ \int g_n\big(Y_n - h_n(z)\big)\, \eta(dz)$$

updates the distribution given the current observation. It is therefore essential to find a dynamical system formulation which incorporates separately the prediction and the updating mechanisms. A natural idea is to study the distribution of the pair process $(X_n, X_{n+1})$ conditionally to the observations up to time $n$. Namely,

$$\widetilde{\eta}_n(f) \;\stackrel{\mathrm{def}}{=}\; E\big(f(X_n, X_{n+1}) \mid Y_1, \ldots, Y_n\big) \qquad \forall f \in C_b(S^2)\ \ \forall n \ge 0 \qquad (104)$$

The next lemma is a modification of lemma 5.

Lemma 6 Given the observations $Y = y$, $\widetilde{\eta}_n$ is solution of the $P(S^2)$-valued dynamical system

$$\widetilde{\eta}_n = \varphi_n(y_n, \widetilde{\eta}_{n-1}) \quad n \ge 1, \qquad \widetilde{\eta}_0 = \nu \times K_1 \qquad (105)$$

where $\varphi_n(y, \cdot) : P(S^2) \to P(S^2)$ is the continuous function given by

$$\varphi_n(y, \widetilde{\eta})f \;=\; \frac{\displaystyle\int f(x_1, x_2)\, g_n\big(y - h_n(x_1)\big)\, d\widetilde{\eta}(x_0, x_1)\, K_{n+1}(x_1, dx_2)}{\displaystyle\int g_n\big(y - h_n(z_1)\big)\, d\widetilde{\eta}(z_0, z_1)} \qquad \forall f \in C_b(S^2) \qquad (106)$$

for all $\widetilde{\eta} \in P(S^2)$, $y \in \mathbb{R}^d$ and $n \ge 1$. Moreover $(\widetilde{\eta}_n)_n$ is a time-inhomogeneous and $(\sigma(Y^n), P)$-Markov process on $P(S^2)$ with transition operators

$$\widetilde{\Phi}_n F(\widetilde{\eta}) \;=\; \int F\big(\varphi_n(y, \widetilde{\eta})\big)\, g_n\big(y - h_n(x)\big)\, dy\, d\widetilde{\eta}(z, x) \qquad \forall F \in C_b(P(S^2))\ \ \forall \widetilde{\eta} \in P(S^2) \qquad (107)$$

Proof:
It is easily checked from (96) that
$$\eta_n = \pi_n \times K_{n+1} = \phi_n(y_n, \pi_{n-1}) \times K_{n+1} = \varphi_n(y_n, \eta_{n-1})$$
So that (105) and Lemma 5 end the proof of the lemma.

Returning once more to the description of the random particle system described in section 3.1, we see that the local dynamics of an individual particle will now be given by the transition probability kernels
$$\varphi_n\Big(y_n,\ \frac{1}{N}\sum_{i=1}^N \delta_{(z_1^i, z_2^i)}\Big)(d(x_1, x_2)) = \sum_{i=1}^N \frac{g_n(y_n - h_n(z_2^i))}{\sum_{j=1}^N g_n(y_n - h_n(z_2^j))}\ \delta_{z_2^i}(dx_1)\ K_{n+1}(x_1, dx_2) \tag{108}$$
where $N$ is the size of the particle system and $Y_n = y_n$ is the current observation data.

4.2 Interacting particle resolutions

This section covers stochastic particle methods for the numerical solution of the nonlinear filtering equation (105), based upon the simulation of interacting and branching particle systems. The technical approach presented here is to work with a given sequence of observations $Y = y$. With regard to (105) and (106), the only remaining difficulty in the nonlinear filtering problem is the infinite dimensionality of the state space $P(S^2)$. This formulation enables us to regard the conditional distributions as probabilities, parameterized by the observation data, that solve a measure-valued dynamical system. The design of our particle system approach for the numerical solution of general measure-valued dynamical systems is described in section 3. We shall use the notions and notations introduced in sections 3.1 and 3.2.

4.2.1 Particle systems with simple interactions

The algorithm presented in (17), section 3.1, was referred to as a particle system approximation with simple interaction function. When considering the nonlinear filtering equation (105), the interacting particle system is modeled by a Markov chain $(\Omega^0, (F_n^0)_{n\ge0}, (\zeta_n)_n, P_{[y]})$ with state space $S^{2N}$, where $N \ge 1$ is the size of the system. The $N$-tuples of elements of $S^2$, i.e. the points of the set $S^{2N}$, are called particle systems and will be denoted by the letters $z, x$. Recalling the description (17), this chain is defined by
$$P_{[y]}(\zeta_0 \in dx) = \prod_{p=1}^N \eta_0(dx^p) \qquad P_{[y]}(\zeta_n \in dx / \zeta_{n-1} = z) = \prod_{p=1}^N \varphi_n\Big(y_n,\ \frac{1}{N}\sum_{i=1}^N \delta_{z^i}\Big)(dx^p) \tag{109}$$

In view of (105) and (108), we have that
$$P_{[y]}(\zeta_0 \in dx) = \prod_{i=1}^N \nu(dx_1^i)\ K_1(x_1^i, dx_2^i) \tag{110}$$
$$P_{[y]}(\zeta_n \in dx / \zeta_{n-1} = z) = \prod_{p=1}^N \sum_{i=1}^N \frac{g_n(y_n - h_n(z_2^i))}{\sum_{j=1}^N g_n(y_n - h_n(z_2^j))}\ \delta_{z_2^i}(dx_1^p)\ K_{n+1}(x_1^p, dx_2^p) \tag{111}$$
where
- $y_n$ is the observation data at the time $n$,
- $x = (x_1, x_2),\ z = (z_1, z_2) \in S^N \times S^N$ and $x^i = (x_1^i, x_2^i),\ z^i = (z_1^i, z_2^i) \in S^2$ for all $1 \le i \le N$,
- $\zeta_n \in S^N \times S^N$ is the system of $S^2$-valued particles at the time $n$.

To be more precise, it is convenient to introduce additional notations. Let us set
$$\zeta_n \stackrel{\rm def}{=} (\hat\zeta_n, \zeta_{n+1}) \in S^N \times S^N \qquad \forall n \ge 0$$
Now the points of the set $S^N$ will be denoted by the letters $x$ and $z$. Using these notations, (109) together with (108) lead to the following Markov model
$$P_{[y]}(\zeta_n \in dx / \hat\zeta_{n-1} = z) = \prod_{p=1}^N K_n(z^p, dx^p) \tag{112}$$
$$P_{[y]}(\hat\zeta_n \in dx / \zeta_n = z) = \prod_{p=1}^N \sum_{i=1}^N \frac{g_n(y_n - h_n(z^i))}{\sum_{j=1}^N g_n(y_n - h_n(z^j))}\ \delta_{z^i}(dx^p) \tag{113}$$

where $y_n$ is the observation data at the time $n$ and $x, z \in S^N$. Equations (112) and (113) resemble a genetic algorithm [6], [21], [26]. The advantage of this formulation is that it incorporates separately the prediction $\hat\zeta_{n-1} \leadsto \zeta_n$ and the updating $\zeta_n \leadsto \hat\zeta_n$ mechanisms. Thus, we see that the particles move according to the following rules:
1. Prediction: before the updating mechanism, each particle evolves according to the transition probability kernel of the signal process.
2. Updating: when the observation $Y_n = y_n$ is received, each particle examines the system of particles $\zeta_n = (\zeta_n^1, \ldots, \zeta_n^N)$ and chooses randomly a site $\zeta_n^i$ with probability
$$\frac{g_n(y_n - h_n(\zeta_n^i))}{\sum_{j=1}^N g_n(y_n - h_n(\zeta_n^j))}$$
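The two rules above can be sketched directly. The scalar AR(1) signal, identity observation function and Gaussian noise density below are hypothetical choices made only to obtain a runnable example, not the paper's model; the structure (mutation through the signal kernel, then selection with likelihood weights) is the one just described.

```python
import math
import random

# A sketch of the simple-interaction particle filter step: prediction through
# the signal kernel K_n, then selection with probabilities proportional to
# g_n(y_n - h_n(.)). The AR(1) dynamics and Gaussian g_n are hypothetical.

def filter_step(particles, y, rng, a=0.9, sigma=0.5, tau=0.3):
    # 1. Prediction: each particle evolves with the signal kernel.
    predicted = [a * x + rng.gauss(0.0, sigma) for x in particles]
    # 2. Updating: weights g_n(y_n - h_n(x)); here h_n is the identity and
    #    g_n a Gaussian density (the normalizing constant cancels).
    weights = [math.exp(-0.5 * ((y - x) / tau) ** 2) for x in predicted]
    # Each particle selects a site i with probability w_i / sum_j w_j.
    return rng.choices(predicted, weights=weights, k=len(particles))

rng = random.Random(1)
particles = [rng.gauss(0.0, 1.0) for _ in range(1000)]
for y in [0.5, 0.7, 0.6]:          # a fixed observation record y_1, y_2, y_3
    particles = filter_step(particles, y, rng)

estimate = sum(particles) / len(particles)
print(round(estimate, 2))
```

Note how the selection step concentrates the cloud around states that agree with the observations, which is the stabilizing effect discussed next.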

Our purpose is now to understand why the second mechanism (113) plays a very special role in the behavior of the particle filter. What is important is that each particle interacts selectively with the system in accordance with the observation data. Roughly speaking, a given particle is more likely to choose another site if its position generally disagrees with the current observation than if it agrees with it. These observations point to the very interesting dynamical role played by the updating mechanism: it stabilizes the particles' motion around certain values of the real signal which are determined by the observations, thus providing a well-behaved adaptive grid.

Now we design a stochastic basis for the convergence of our particle approximations. To capture all randomness we list all outcomes into the canonical space $(\tilde\Omega, (\tilde F_n)_{n\ge0}, \tilde P)$ defined as follows:
1. Recall that $(\Omega, F_n, P)$ is the canonical space for the signal/observation pair $(X, Y)$.
2. We define $\tilde\Omega = \Omega^0 \times \Omega$ and $\tilde F_n = F_n^0 \times F_n$ and, for every $\omega \stackrel{\rm def}{=} (\omega^1, \omega^2, \omega^3) \in \tilde\Omega$, we define
$$\zeta_n(\omega) = \omega_n^1 \qquad X_n(\omega) = \omega_n^2 \qquad Y_n(\omega) = \omega_n^3$$
3. For every $A \in F_n^0$ and $B \in F_n$ we define $\tilde P$ as follows:
$$\tilde P(A \times B) \stackrel{\rm def}{=} \int_B P_{[Y]}(A)\ dP \tag{114}$$

As usual we use $\tilde E(\cdot)$ to denote expectations with respect to $\tilde P$. The approximation of the conditional distribution $\eta_n$ by the empirical measure
$$\eta_n^N \stackrel{\rm def}{=} \frac{1}{N}\sum_{i=1}^N \delta_{(\hat\zeta_n^i,\ \zeta_{n+1}^i)}$$
is guaranteed by theorem 2. Moreover, applying the dominated convergence theorem, we find
$$\forall f \in C_b(S^2)\quad \forall n \ge 1 \qquad \lim_{N\to+\infty} \tilde E\big(|\eta_n^N f - \eta_n f|^2\big) = 0$$
To see this claim it clearly suffices to prove that condition (25) is satisfied for every sequence of observations $Y = y$. By the very definition of the functions $\varphi_n(y_n, \cdot)$ one easily gets
$$\forall n \ge 1\quad \forall \eta \in P(S^2)\quad \forall f \in C_b(S^2) \qquad \varphi_n(y_n, \eta)f = \frac{\eta(g_n\, T_n f)}{\eta(g_n)}$$
with
$$g_n(x_0, x_1) = g_n(y_n - h_n(x_1)) \qquad \text{and} \qquad T_n f(x_0, x_1) = \int f(x_1, x_2)\ K_{n+1}(x_1, dx_2)$$
Arguing as in 2), page 13, with $E = S^2$, it is straightforward to see that condition (25) is satisfied.
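The identity $\varphi_n(y_n, \eta)f = \eta(g_n T_n f)/\eta(g_n)$ is just a rearrangement of (106), and it can be checked exactly on a finite state space. The measure $\eta$, the kernel $K_{n+1}$, the likelihood $g_n$ and the test function $f$ below are small hypothetical examples.

```python
# Checking phi_n(y_n, eta) f = eta(g_n T_n f) / eta(g_n) on a finite space.
# eta is a measure on S^2; K plays K_{n+1}; g plays g_n(y_n - h_n(.)).

S = [0, 1]
eta = {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.3, (1, 1): 0.2}   # measure on S^2
K = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.4, 1: 0.6}}               # K_{n+1}(x1, x2)
g = {0: 0.8, 1: 0.2}                                          # likelihood of x1
f = lambda x1, x2: float(x1 + 2 * x2)                         # test function

# Left-hand side, computed as in (106): weight eta by g(x1), normalize, then
# propagate the second coordinate with K_{n+1}.
z = sum(eta[(x0, x1)] * g[x1] for (x0, x1) in eta)
lhs = sum(eta[(x0, x1)] * g[x1] / z * K[x1][x2] * f(x1, x2)
          for (x0, x1) in eta for x2 in S)

# Right-hand side: eta(g T f) / eta(g), with T f(x0,x1) = sum_x2 f(x1,x2) K(x1,x2).
Tf = {(x0, x1): sum(f(x1, x2) * K[x1][x2] for x2 in S) for (x0, x1) in eta}
rhs = sum(eta[p] * g[p[1]] * Tf[p] for p in eta) / sum(eta[p] * g[p[1]] for p in eta)

print(abs(lhs - rhs) < 1e-12)  # prints True
```

Both sides are the same sum grouped differently, which is exactly why condition (25) applies with this choice of $g_n$ and $T_n$.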

4.2.2 Interacting particle systems with branchings

The method described above is the crudest of the random particle methods. When applied to the nonlinear filtering equation (105), the density profiles might lack some of the statistical details of the conditional distributions which one would get when using the branching refinement method introduced in section 3.2. We shall use the notions and notations introduced in section 3.2.1. This refinement is also relatively easy to program and it has been used with success in many practical situations [4], [5], [18], but its use still leaves open the optimal choice of the number of auxiliary particles. Let us start with a few remarks. When each particle branches into exactly one particle, these algorithms are exactly the same. On the other hand, when the number of auxiliary branching particles grows, the transition probability density of an individual particle tends to the transition (100). Even if the first algorithm is more time saving, because it does not use branching simulations, several numerical simulations have revealed that a clear benefit can be obtained by using branching particles. Unfortunately one cannot quantify this superiority; some attempts in this direction have been made on page 28. When considering the nonlinear filtering equation (105) with a given sequence of observations $Y = y$, the corresponding interacting particle system with branchings is modeled by a Markov chain $(\Omega^0, (F_n^0)_{n\ge0}, (\zeta_n)_n, P_{[y]})$ with state space
$$E(N) = S^{(N)_1} \times S^{(N)_2} \qquad (N) \stackrel{\rm def}{=} (N_1, N_2)$$
where $(N)_1 = \{1, \ldots, N_1\}$ and $(N)_2 = \{1, \ldots, N_1\} \times \{1, \ldots, N_2\}$, $N_1, N_2 \ge 1$. Each point
$$x = (x_1, x_2) = \big(x_1^{p_1},\ x_2^{p_1,p_2}\big)_{1\le p_1\le N_1,\ 1\le p_2\le N_2} \in S^{(N)_1} \times S^{(N)_2}$$
consists of two particle systems; such points are called random particle trees and will be denoted by the letters $z, x$. Now, recalling the description (57), the transition probability densities of this Markov chain are given by

$$P_{[y]}(\zeta_0 \in dx) = \prod_{p_1=1}^{N_1} \nu(dx_1^{p_1}) \prod_{p_2=1}^{N_2} K_1(x_1^{p_1}, dx_2^{p_1,p_2}) \tag{115}$$
$$P_{[y]}(\zeta_n \in dx / \zeta_{n-1} = z) = \prod_{p_1=1}^{N_1} \sum_{i_1,i_2=1}^{N_1,N_2} \frac{g_n(y_n - h_n(z_2^{i_1,i_2}))}{\sum_{j_1,j_2=1}^{N_1,N_2} g_n(y_n - h_n(z_2^{j_1,j_2}))}\ \delta_{z_2^{i_1,i_2}}(dx_1^{p_1}) \prod_{p_2=1}^{N_2} K_{n+1}(x_1^{p_1}, dx_2^{p_1,p_2}) \tag{116}$$
where $y_n$ is the observation data at the time $n$. This algorithm generalizes the one given above by allowing $N_2$ auxiliary particles at each step. For the moment, let us merely note that if $N_2 = 1$ then the transitions above are exactly the same as those given before. To be more precise, it is now convenient to introduce additional notations. Let us set
$$\zeta_n \stackrel{\rm def}{=} (\hat\zeta_n, \zeta_{n+1}) \in S^{(N)_1} \times S^{(N)_2} \qquad \forall n \ge 0$$
To clarify the notations, the $N_1$-tuples of elements of $S$, i.e. the points of the set $S^{(N)_1}$, will be denoted by $z = (z^{p_1})_{p_1}$ and the points of the set $S^{(N)_2}$ will be denoted by $x = (x^{p_1,p_2})_{p_1,p_2}$. For brevity, we will also write $g_n(x)$ instead of $g_n(y_n - h_n(x))$. Using these notations, we obtain the following Markov model
$$P_{[y]}(\hat\zeta_0 \in dz) = \prod_{p_1=1}^{N_1} \nu(dz^{p_1})$$
$$P_{[y]}(\zeta_n \in dx / \hat\zeta_{n-1} = z) = \prod_{p_1=1}^{N_1} \prod_{p_2=1}^{N_2} K_n(z^{p_1}, dx^{p_1,p_2})$$
$$P_{[y]}(\hat\zeta_n \in dz / \zeta_n = x) = \prod_{p_1=1}^{N_1} \sum_{i_1,i_2=1}^{N_1,N_2} \frac{g_n(x^{i_1,i_2})}{\sum_{j_1,j_2=1}^{N_1,N_2} g_n(x^{j_1,j_2})}\ \delta_{x^{i_1,i_2}}(dz^{p_1})$$
Continuing in the same vein, it is important to note that
$$P_{[y]}(\hat\zeta_n \in dz / \zeta_n = x) = \prod_{p_1=1}^{N_1} \sum_{i_1=1}^{N_1} \frac{\sum_{k_2=1}^{N_2} g_n(x^{i_1,k_2})}{\sum_{j_1,j_2=1}^{N_1,N_2} g_n(x^{j_1,j_2})} \sum_{i_2=1}^{N_2} \frac{g_n(x^{i_1,i_2})}{\sum_{k_2=1}^{N_2} g_n(x^{i_1,k_2})}\ \delta_{x^{i_1,i_2}}(dz^{p_1}) \tag{117}$$
This formulation incorporates separately the prediction $\hat\zeta_{n-1} \leadsto \zeta_n$ and the updating $\zeta_n \leadsto \hat\zeta_n$ mechanisms. More precisely, at each moment of time $n$, we see that the particles move according to the following rules:

1

1. Prediction: Before the updating mechanism each particle bni ?1 , 1  i1  N1, branches into a xed number N2 of i.i.d. random particles n = (ni ;i )1i N with law Kn . 2. Updating: When the observation Yn = yn becomes available, each particle bni ?1 , 1  i1  N1, chooses a sub-system of auxiliary particles (ni ;i )1i N at random with probability 1

1

2

2

2

1

1 2

N X 2

gn (ni ;k )= 1

k2 =1

2

NX 1 ;N2

2

gn(nj ;j ) 1

j1 ;j2 =1

2

2

and moves to the site ni ;i in the chosen sub-system with probability 1

2

gn(ni ;i )= 1 2

N X 2

k2 =1

gn (ni ;k ) 1

2
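The two-stage selection just described induces exactly the joint selection probabilities of (117), since the two factors telescope. The weight table below is a hypothetical example with $N_1 = 2$ sub-systems of $N_2 = 3$ auxiliary particles each.

```python
# The two-stage branching selection (choose a sub-system, then a site inside
# it) induces exactly the joint selection weights of (117). The weight table
# gn is a hypothetical example with N1 = 2 and N2 = 3.

gn = [[0.5, 1.0, 0.25],   # g_n(x^{1,1}), g_n(x^{1,2}), g_n(x^{1,3})
      [2.0, 0.5, 0.75]]   # g_n(x^{2,1}), g_n(x^{2,2}), g_n(x^{2,3})

total = sum(sum(row) for row in gn)

for i1, row in enumerate(gn):
    p_subsystem = sum(row) / total            # first stage: pick sub-system i1
    for i2, w in enumerate(row):
        p_site = w / sum(row)                 # second stage: pick site i2 in it
        joint = w / total                     # direct joint weight in (117)
        assert abs(p_subsystem * p_site - joint) < 1e-12

print("two-stage selection matches the joint weights of (117)")
```

This telescoping is the whole content of the rewriting (117): the normalizations of the two stages cancel, leaving the single normalized weight $g_n(x^{i_1,i_2}) / \sum_{j_1,j_2} g_n(x^{j_1,j_2})$.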

The fundamental difference between the so-called simple interaction and branching approaches lies in the fact that in the latter each particle branches into a fixed number of auxiliary particles, and the corresponding updating/selection procedure is itself decomposed into two different mechanisms. Intuitively speaking, the branchings are meant to discourage the particles from visiting bad state regions.

Remark 3 The choice of the systems' sizes $N_1$ and $N_2$ is frequently a matter of judgment and intuition. We observe that each sub-system consists of $N_2$ i.i.d. random variables $(\zeta_n^{i_1,i_2})_{1\le i_2\le N_2}$ with law $K_n(\hat\zeta_{n-1}^{i_1}, dz_1)$. So that, formally,
$$\frac{1}{N_2}\sum_{i_2=1}^{N_2} \delta_{\zeta_n^{i_1,i_2}}(dz_1) \xrightarrow[N_2\to+\infty]{} K_n(\hat\zeta_{n-1}^{i_1}, dz_1)$$
leads to
$$\sum_{i_2=1}^{N_2} \frac{g_n(\zeta_n^{i_1,i_2})}{\sum_{k_2=1}^{N_2} g_n(\zeta_n^{i_1,k_2})}\ \delta_{\zeta_n^{i_1,i_2}}(dz_1) \xrightarrow[N_2\to+\infty]{} K_{y,n}(\hat\zeta_{n-1}^{i_1}, dz_1)$$
and
$$\frac{1}{N_2}\sum_{i_2=1}^{N_2} g_n(\zeta_n^{i_1,i_2}) \xrightarrow[N_2\to+\infty]{} g_{y,n}(\hat\zeta_{n-1}^{i_1}) = \int g_n(z_1)\ K_n(\hat\zeta_{n-1}^{i_1}, dz_1)$$
This observation leads us to the following equivalence
$$\tilde P_{[y]}(\hat\zeta_n \in dz / \hat\zeta_{n-1} = x) \xrightarrow[N_2\to+\infty]{} \prod_{p_1=1}^{N_1} \phi_n\Big(y_n,\ \frac{1}{N_1}\sum_{i_1=1}^{N_1} \delta_{x^{i_1}}\Big)(dz^{p_1})$$

It follows that (117) is the right way to model the local approximation discussed in (103). As in section 4.2.1, to capture all randomness, we list all outcomes into a canonical space $(\tilde\Omega, (\tilde F_n)_{n\ge0}, \tilde P)$ and we use $\tilde E(\cdot)$ to denote expectations with respect to $\tilde P$. Using the same line of arguments as at the end of section 4.2.1, the approximation of the desired conditional distribution $\eta_n$ by the empirical measure
$$\eta_n^{(N)} \stackrel{\rm def}{=} m^{(N)}\big((\hat\zeta_n, \zeta_{n+1})\big) = \frac{1}{N_1 N_2}\sum_{i_1=1}^{N_1}\sum_{i_2=1}^{N_2} \delta_{(\hat\zeta_n^{i_1},\ \zeta_{n+1}^{i_1,i_2})}$$
is guaranteed by proposition 4. Moreover, arguing as before, we have
$$\lim_{N_1\to+\infty} \tilde E\big(|\eta_n^{(N)} f - \eta_n f|^2\big) = 0$$

4.2.3 General interacting particle resolutions

In the last part of this paper the above approximations are generalized. The prediction mechanism of the former particle filter will include exploration paths of a given length $r \ge 1$, and the corresponding updating procedure will then be used every $r$ steps and will take into account $r$ observations. The idea is to study the pair process $(X_n, X_{n+1})$ given by
$$X_n \stackrel{\rm def}{=} (X_{nr}, \ldots, X_{nr+(r-1)}) \qquad \forall n \ge 0 \ \text{and}\ r > 0 \tag{118}$$
For later convenience, let us set $Y_n \stackrel{\rm def}{=} (Y_{nr}, \ldots, Y_{nr+r-1})$ for all $n \ge 0$. Let $\eta_n$ be the distribution of the pair process $(X_n, X_{n+1})$ conditionally on the observations up to time $nr$. Namely,
$$\eta_n(f) \stackrel{\rm def}{=} E\big(f(X_n, X_{n+1}) \,/\, Y_1, \ldots, Y_n\big) \qquad \forall f \in C_b(S^{2r}) \tag{119}$$

It is straightforward to see that this case may be reduced to the latter (104) through a suitable state space basis. More precisely, $(X_n)_{n\ge0}$ is an $S^r$-valued and time-inhomogeneous Markov process with transition operator
$$K_n\big((x_0, \ldots, x_{r-1}),\ d(x_r, \ldots, x_{2r-1})\big) \stackrel{\rm def}{=} \prod_{p=0}^{r-1} K_{nr+p}(x_{r+p-1}, dx_{r+p})$$
and initial distribution
$$\overline\eta_0 = \nu \times K_1 \times \cdots \times K_{r-1}$$
In this situation the corresponding observation process $(Y_n)_{n\ge0}$ may be written as
$$Y_n = H_n(X_n) + V_n$$
with
$$H_n(x_0, \ldots, x_{r-1}) = \big(h_{nr}(x_0), \ldots, h_{nr+(r-1)}(x_{r-1})\big) \qquad \forall n \ge 0$$
and $V_n = (V_{nr}, \ldots, V_{nr+(r-1)})$, $n \ge 0$, is a sequence of independent random variables, independent of $X$, with continuous and positive density
$$G_n(v_0, \ldots, v_{r-1}) = \prod_{p=0}^{r-1} g_{nr+p}(v_p)$$
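Sampling the block chain is just a matter of chaining $r$ elementary transitions, starting from the last coordinate of the previous block. The AR(1) elementary kernels below are a hypothetical choice standing in for the $K_{nr+p}$.

```python
import random

# Sampling the r-block chain X_n = (X_{nr}, ..., X_{nr+(r-1)}): one block
# transition is r elementary transitions chained together, started from the
# last coordinate of the previous block. The AR(1) kernels are hypothetical.

def K(k, x, rng):
    """One elementary transition X_{k-1} -> X_k (hypothetical AR(1) kernel)."""
    return 0.9 * x + rng.gauss(0.0, 0.5)

def block_transition(n, block, r, rng):
    """Sample the block kernel: (x_0, ..., x_{r-1}) -> (x_r, ..., x_{2r-1})."""
    x = block[-1]
    out = []
    for p in range(r):
        x = K(n * r + p, x, rng)
        out.append(x)
    return tuple(out)

rng = random.Random(3)
r = 3
x0 = (0.0, 0.1, 0.2)                   # initial block (X_0, X_1, X_2)
x1 = block_transition(1, x0, r, rng)   # next block (X_3, X_4, X_5)
print(len(x1))
```

The same grouping applies to the observations: the block likelihood $G_n$ is simply the product of the $r$ elementary densities $g_{nr+p}$, since the $V_k$ are independent.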

The same kind of arguments as those given above show that $\eta_n$ is solution of the $P(S^{2r})$-valued recursive equation
$$\eta_n = \Phi_n(Y_n, \eta_{n-1}) \quad n \ge 1, \qquad \eta_0 = \overline\eta_0 \times K_1 \tag{120}$$
where $\overline\eta_0$ denotes the initial distribution of $(X_n)_{n\ge0}$ introduced above, $K_n$ its transition operator, and $\Phi_n(y, \cdot) : P(S^{2r}) \longrightarrow P(S^{2r})$ is the continuous function given by
$$\Phi_n(Y_n, \eta)f = \frac{\displaystyle\int f(X_1, X_2)\ G_n(Y_n - H_n(X_1))\ \eta(d(X_0, X_1))\ K_{n+1}(X_1, dX_2)}{\displaystyle\int G_n(Y_n - H_n(Z_1))\ \eta(d(Z_0, Z_1))} \tag{121}$$

for all $f \in C_b(S^{2r})$ and $\eta \in P(S^{2r})$. From the equations (120) we can quickly deduce the design of general interacting particle resolutions. We examine such algorithms below in enough detail for the reader to be able to implement these stochastic algorithms on a computer. For brevity we will write $G_n(X)$ instead of $G_n(Y_n - H_n(X))$. Now, the corresponding particle system is modeled by a Markov chain $(\Omega^0, (F_n^0)_{n\ge0}, \zeta = (\zeta_n)_{n\ge0}, P_{[y]})$ with state space
$$E(N) \stackrel{\rm def}{=} (S^r)^{(N)_1} \times (S^r)^{(N)_2} \qquad (N) \stackrel{\rm def}{=} (N_1, N_2),\ N_1, N_2 \ge 1 \tag{122}$$
where
$$(N)_1 = \{1, \ldots, N_1\} \qquad (N)_2 = \{1, \ldots, N_1\} \times \{1, \ldots, N_2\}$$
We will use the notations
$$\zeta_n \stackrel{\rm def}{=} (\hat\zeta_n, \zeta_{n+1}) \in (S^r)^{(N)_1} \times (S^r)^{(N)_2} \qquad \forall n \ge 0$$
and, to clarify the presentation, the points of the set $(S^r)^{(N)_1}$ will be denoted by $z = (z^{p_1})_{p_1}$ and the points of the set $(S^r)^{(N)_2}$ by $x = (x^{p_1,p_2})_{p_1,p_2}$. Recalling the descriptions (115) and (117), this Markov chain has the following transition probability densities:

$$P_{[y]}(\hat\zeta_0 \in dz) = \prod_{p_1=1}^{N_1} \nu(dz_0^{p_1})\ K_1(z_0^{p_1}, dz_1^{p_1}) \cdots K_{r-1}(z_{r-2}^{p_1}, dz_{r-1}^{p_1})$$
$$P_{[y]}(\zeta_n \in dx / \hat\zeta_{n-1} = z) = \prod_{p_1=1}^{N_1} \prod_{p_2=1}^{N_2} K_{nr}(z_{r-1}^{p_1}, dx_0^{p_1,p_2}) \cdots K_{nr+(r-1)}(x_{r-2}^{p_1,p_2}, dx_{r-1}^{p_1,p_2})$$
$$P_{[y]}(\hat\zeta_n \in dz / \zeta_n = x) = \prod_{p_1=1}^{N_1} \sum_{i_1,i_2=1}^{N_1,N_2} \frac{G_n(x_0^{i_1,i_2}, \ldots, x_{r-1}^{i_1,i_2})}{\sum_{j_1,j_2=1}^{N_1,N_2} G_n(x_0^{j_1,j_2}, \ldots, x_{r-1}^{j_1,j_2})}\ \delta_{(x_0^{i_1,i_2}, \ldots, x_{r-1}^{i_1,i_2})}\big(d(z_0^{p_1}, \ldots, z_{r-1}^{p_1})\big)$$

Finally, the approximation of the desired conditional distribution $\eta_n$ by the empirical measure
$$\eta_n^{(N)} \stackrel{\rm def}{=} \frac{1}{N_1 N_2}\sum_{i_1=1}^{N_1}\sum_{i_2=1}^{N_2} \delta_{(\hat\zeta_n^{i_1},\ \zeta_{n+1}^{i_1,i_2})}$$
is guaranteed by proposition 4 and, for every $f \in C_b(S^{2r})$ and $n \ge 1$,
$$\lim_{N_1\to+\infty} \tilde E\big(|\eta_n^{(N)} f - \eta_n f|^2\big) = 0$$
To check that this algorithm generalizes the ones given above, observe that its transition probability densities coincide with the transitions of the particle systems with simple interaction described in section 4.2.1 when $r = 1$ and $N_2 = 1$, and they coincide with those of the interacting particle systems with branchings described in section 4.2.2 when $r = 1$. The choice of the parameters $N_1$, $N_2$, $r$ requires a criterion for optimality. The result will probably depend on the dynamics of the state process and also on the dynamical structure of the observations. Unfortunately no one has yet been able to give an effective method of solution. Another idea is to study the performance of a modified version of the above algorithms which includes an adaptive updating/selection schedule $r = r(n)$. For instance, we may choose to resample the particles when fifty percent of the weights are lower than $A/N^p$, with a convenient choice of the parameters $A > 0$ and $p \ge 2$.
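The adaptive schedule just mentioned is easy to sketch as a test on the normalized weights. The constants $A$ and $p$ and the example weight vectors below are hypothetical choices used only for illustration.

```python
# A sketch of the adaptive updating/selection criterion mentioned above:
# resample only when fifty percent of the normalized weights fall below
# A / N**p. The constants A, p and the example weights are hypothetical.

def needs_resampling(weights, A=1.0, p=2):
    """True when at least half of the normalized weights are below A / N**p."""
    N = len(weights)
    total = sum(weights)
    threshold = A / N ** p
    small = sum(1 for w in weights if w / total < threshold)
    return 2 * small >= N

balanced = [1.0] * 100                 # evenly weighted system: no resampling
degenerate = [1.0] + [1e-8] * 99       # weight collapse: triggers resampling
print(needs_resampling(balanced), needs_resampling(degenerate))
```

Such a criterion triggers the selection step only when the weights have degenerated, instead of resampling at every fixed period $r$.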

Acknowledgements

I am deeply grateful to Prof. Dominique Michel for useful discussions and her careful reading of the original manuscript. I also thank the anonymous referee for his comments and suggestions, which greatly improved the presentation of the paper. This work was partially supported by CEC Contract No. ERB-FMRX-CT96-0075 and INTAS-RFBR project No. 95-0091.

References
[1] V.E. Benes, Exact Finite-Dimensional Filters for Certain Diffusions with Nonlinear Drift, Stochastics, Vol. 5, pp. 65-92, 1981.
[2] P. Billingsley, Convergence of Probability Measures, Wiley, New York, 1968.
[3] W. Braun, K. Hepp, The Vlasov dynamics and its fluctuation in the 1/n limit of interacting particles, Comm. Math. Phys. (56), pp. 101-113, 1977.
[4] H. Carvalho, P. Del Moral, A. Monin, G. Salut, Optimal non-linear filtering in GPS/INS integration, IEEE Trans. on Aerospace and Electronic Systems, Vol. 33 (3), pp. 835-850, 1997.
[5] H. Carvalho, Filtrage optimal non lineaire du signal GPS NAVSTAR en recalage de centrales de navigation, These de l'Ecole Nationale Superieure de l'Aeronautique et de l'Espace, Sept. 1995.
[6] R. Cerf, Une theorie asymptotique des algorithmes genetiques, Universite Montpellier II, Sciences et Techniques du Languedoc, 1994.
[7] M. Chaleyat-Maurel, D. Michel, Des resultats de non existence de filtres de dimension finie, C. R. Acad. Sci. Paris, Serie I, t. 296, 1983.
[8] D. Crisan, The Nonlinear Filtering Problem, Ph.D. Dissertation, University of Edinburgh, 1996.
[9] D. Crisan, T.J. Lyons, Nonlinear Filtering and Measure Valued Processes, Imperial College London, preprint, 1996.
[10] D. Crisan, T.J. Lyons, Convergence of a Branching Particle Method to the Solution of the Zakai Equation, Imperial College London, preprint, 1996.
[11] D. Crisan, J. Gaines, T.J. Lyons, A Particle Approximation of the Solution of the Kushner-Stratonovitch Equation, Imperial College London, preprint, 1996.
[12] D.A. Dawson, Stochastic Evolution Equations and Related Measure Processes, Journal of Multivariate Analysis, 5, pp. 1-52.
[13] C. Dellacherie, P.A. Meyer, Probabilites et Potentiel, Hermann, Chap. III, paragraphe 70.
[14] P. Del Moral, Non linear filtering using random particles, Theor. Prob. Appl., Vol. 40 (4), 1995.
[15] P. Del Moral, Non-linear Filtering: Monte Carlo Particle Resolution, Publications du Laboratoire de Statistiques et Probabilites, Universite Paul Sabatier, No. 02-96.
[16] P. Del Moral, Non-linear Filtering: Interacting Particle Resolution, Markov Processes and Related Fields, Vol. 2, 1996.
[17] P. Del Moral, Asymptotic Properties of Non-Linear Particle Filters, Publications du Laboratoire de Statistiques et Probabilites, Universite Paul Sabatier, No. 11-96.
[18] P. Del Moral, G. Rigal, J.C. Noyer, G. Salut, Traitement non-lineaire du signal par reseau particulaire : application radar, Quatorzieme colloque GRETSI, Juan-les-Pins, 13-16 Septembre 1993.
[19] P. Del Moral, Resolution particulaire des problemes d'estimation et d'optimisation non-lineaires, These de l'Universite Paul Sabatier, Rapport LAAS No. 94269, Juin 1994.
[20] P. Del Moral, J.C. Noyer, G. Salut, Resolution particulaire et traitement non-lineaire du signal : application radar/sonar, Traitement du Signal, Septembre 1995.
[21] P. Del Moral, L. Miclo, On the Convergence and the Applications of the Generalized Simulated Annealing, Publications du Laboratoire de Statistiques et Probabilites, Universite Paul Sabatier, No. 16-96.
[22] R.L. Dobrushin, Markov processes with a large number of locally interacting components: existence of a limit process and its ergodicity, Problems Inform. Transmission (7), pp. 149-164.
[23] R.L. Dobrushin, Prescribing a System of Random Variables by Conditional Distributions, Theor. Prob. Appl., Vol. 15 (3), 1970.
[24] N.J. Gordon, D.J. Salmond, A.F.M. Smith, Novel approach to nonlinear/non-Gaussian Bayesian state estimation, IEE Proceedings-F, 1993.
[25] W. Hoeffding, Probability inequalities for sums of bounded random variables, Journal of the American Statistical Association, 58, pp. 13-30, 1963.
[26] J.H. Holland, Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor, 1975.
[27] A.H. Jazwinski, Stochastic Processes and Filtering Theory, Academic Press, 1970.
[28] G. Kallianpur, C. Striebel, Stochastic differential equations occurring in the estimation of continuous parameter stochastic processes, Tech. Rep. No. 103, Department of Statistics, Univ. of Minnesota, September 1967.
[29] H. Kunita, Asymptotic Behavior of Nonlinear Filtering Errors of Markov Processes, Journal of Multivariate Analysis, Vol. 1, No. 4, pp. 365-393, Dec. 1971.
[30] D. Liang, J. McMillan, Development of a marine integrated navigation system, Kalman Filter Integration of Modern Guidance and Navigation Systems, AGARD-LS-166, OTAN.
[31] T.M. Liggett, Interacting Particle Systems, A Series of Comprehensive Studies in Mathematics (276), Springer Verlag, 1985.
[32] S. Meleard, S. Roelly-Coppoletta, A propagation of chaos result for a system of particles with moderate interaction, Stochastic Processes and their Applications (26), pp. 317-332, 1987.
[33] H.P. McKean, Propagation of chaos for a class of non linear parabolic equations, Lecture Series in Differential Equations (7), Catholic Univ., pp. 41-57, 1967.
[34] D.L. Ocone, Topics in Nonlinear Filtering Theory, Ph.D. Thesis, MIT, Cambridge, 1980.
[35] E. Pardoux, Filtrage non lineaire et equations aux derivees partielles stochastiques associees, Ecole d'ete de Probabilites de Saint-Flour XIX-1989, Lecture Notes in Mathematics, 1464, Springer-Verlag, 1991.
[36] K.R. Parthasarathy, Probability Measures on Metric Spaces, Academic Press Inc., New York, 1968.
[37] D. Pollard, Convergence of Stochastic Processes, Springer Series in Statistics, Springer Verlag, 1984.
[38] T. Shiga, H. Tanaka, Central Limit Theorem for a System of Markovian Particles with Mean Field Interactions, Z. Wahrscheinlichkeitstheorie verw. Gebiete (69), pp. 439-459, 1985.
[39] A.N. Shiryaev, On stochastic equations in the theory of conditional Markov processes, Theor. Prob. Appl. (11), pp. 179-184, 1966.
[40] L. Stettner, On Invariant Measures of Filtering Processes, Stochastic Differential Systems, Lecture Notes in Control and Information Sciences, 126, Springer Verlag, 1989.
[41] R.L. Stratonovich, Conditional Markov processes, Theor. Prob. Appl. (5), pp. 156-178, 1960.
[42] F. Spitzer, Random processes defined through the interaction of an infinite particle system, Springer Lecture Notes in Mathematics (89), pp. 201-223.
[43] A.S. Sznitman, Non linear reflecting diffusion process and the propagation of chaos and fluctuations associated, J. Functional Anal., 56 (3), pp. 311-336, 1984.
[44] A.S. Sznitman, Topics in propagation of chaos, Ecole d'ete de Probabilites de Saint-Flour XIX-1989, Lecture Notes in Mathematics, No. 1464.
[45] M. Van Dootingh, F. Viel, D. Rakotopara, J.P. Gauthier, Coupling of non-linear control with a stochastic filter for state estimation: application on a free radical polymerization reactor, I.F.A.C. International Symposium ADCHEM'91, Toulouse, France, October 14-15, 1991.
[46] S. Watanabe, A limit theorem of branching processes and continuous state branching processes, J. Math. Kyoto Univ., 1968.

Del Moral Pierre
Laboratoire de Statistiques et Probabilites, CNRS UMR C55830, Bat. 1R1, Universite Paul Sabatier, 118 Route de Narbonne, 31062 Toulouse Cedex.
Electronic address: [email protected]
