INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET EN AUTOMATIQUE

Approximation of the value function for a class of differential games with target

Odile POURTALLIER and Mabel TIDBALL

N° 2942, July 1996, THÈME 4

ISSN 0249-6399

Rapport de recherche

Approximation of the value function for a class of differential games with target

Odile POURTALLIER * and Mabel TIDBALL **

Thème 4 – Simulation et optimisation de systèmes complexes
Projet MIAOU
Rapport de recherche n° 2942 – July 1996 – 26 pages

Abstract: We consider the approximation of a class of differential games with target by stochastic games. We use the Kruzkov transformation to obtain discounted costs. The approximation is based on a discretization of the state space and leads to considering the value function of the differential game as the limit of the value functions of a sequence of stochastic games. To prove the convergence, we use the notion of viscosity solution for partial differential equations. This allows us to make assumptions only on the continuity of the value function and not on its differentiability. This technique of proof has been used before by M. Bardi, M. Falcone and P. Soravia for another kind of discretization. Under the additional hypothesis that the value function is Lipschitz continuous, we prove that the rate of convergence of this scheme is of order $\sqrt{h}$, where $h$ is the space discretization parameter. Some numerical experiments are presented in order to test the algorithm on a problem with discontinuous solution.

Key-words: Hamilton-Jacobi-Bellman-Isaacs equation, dynamic programming, approximation schemes, viscosity solutions.

* INRIA, Centre Sophia Antipolis, 2004 route des Lucioles BP 93, 06902 Sophia-Antipolis, France ** Department of Mathematics, University of Rosario, Pellegrini 250, 2000 Rosario, Argentina

Unité de recherche INRIA Sophia-Antipolis, 2004 route des Lucioles, BP 93, 06902 SOPHIA-ANTIPOLIS Cedex (France). Téléphone : (33) 93 65 77 77 – Télécopie : (33) 93 65 77 65

Approximation de la fonction valeur pour une classe de jeux différentiels avec cible

Résumé : We consider the approximation of a class of differential games with target by stochastic games. The Kruzkov transformation is used to obtain costs with a discount rate. The approximation scheme is based on a discretization of the state space and amounts to considering the value function of the differential game as the limit of the value functions of a sequence of stochastic games. To prove the convergence of the scheme, we use the notion of viscosity solution for first-order partial differential equations. This allows us to restrict the assumptions on the value function to continuity alone, without any concern for its differentiability. The proof techniques were already used by M. Bardi, M. Falcone and P. Soravia for another discretization scheme. Under the additional hypothesis that the value function is Lipschitz continuous, we prove that the rate of convergence of the scheme under study is of order $\sqrt{h}$, where $h$ is the space discretization parameter. Some numerical experiments are presented to test the algorithm in a case where the value function is not continuous.

Mots-clé : Hamilton-Jacobi-Bellman-Isaacs equation, dynamic programming, approximation scheme, viscosity solution

Approximation of the value function of differential games

3

1 Introduction

We study an approximation scheme for the value function of a differential game with target. The value function is the min-max time for the state to reach the target, and it is defined in the sense of Varaiya, Roxin, Elliott and Kalton (see [26], [23], [14] and [15]). Under a controllability assumption (the usable part (BUP) of the target is the whole boundary of the target), Bardi and Soravia proved [5] that the value function is the unique viscosity solution of the Isaacs equation associated to the game.

The approximation scheme presented in this paper is an adaptation to games of methods developed by H. Kushner for stochastic control, see [19]. The main point is that the scheme uses a space discretization and an approximation of the partial derivatives in such a way that transition probabilities appear naturally. The continuous game is then approximated by stochastic games with discrete state space. A large literature exists on numerical methods (see [22] for a survey). This scheme was used previously for discounted games in an open state space without boundary conditions, see [21]. It is basically the same scheme used previously for control problems, see [16], and for stopping-time games, see [25]; in these papers, however, the interpretation in terms of controlled Markov chains or stochastic games is not made explicit.

Another discretization scheme has been deeply studied. It uses first a discretization of time, and then a discretization of the state space. This scheme was used first for control problems, see [9], [10], [2], [3], [6] for the time discretization and [17] for fully discrete problems, and [6], [4], [1] for games with target.

In this paper we prove the convergence of the value function of the discrete game to the viscosity solution of the Isaacs equation of the continuous game. The proof of convergence uses arguments similar to the ones in [4].
Under the assumption that the value function is Lipschitz continuous, we prove that the rate of convergence of this scheme is of order $\sqrt{h}$ ($h$ being the space discretization parameter). The technique is similar to the one in [25]. We also make a few remarks comparing the two schemes, and present numerical results.

2 The continuous problem

We consider the class of differential games defined by:

- the dynamics
$$\dot y(t) = f(y(t), u(t), v(t)), \quad t > 0, \qquad y(0) = x,$$
where

- $y(t) \in \mathbb{R}^M$ is the state of the game,

- $u(t) \in U$, $U$ compact, is the control of the minimizer at time $t$,

RR n° 2942

4

O. Pourtallier and M. Tidball

- $v(t) \in V$, $V$ compact, is the control of the maximizer at time $t$.

We assume that the dynamics $f$ is a continuous function from $\mathbb{R}^M \times U \times V$ to $\mathbb{R}^M$ that satisfies:
$$\| f(x,u,v) - f(y,u,v) \| \le L\, \| x - y \|, \qquad \forall u, v, x, y, \tag{1}$$
$$\| f(y,u,v) \| \le F, \qquad \forall u, v, y \tag{2}$$
($\|\cdot\|$ denotes the $l^1(\mathbb{R}^M)$ norm and $L$, $F$ are constants). These assumptions ensure that the differential equation has a unique solution.

- a closed target $\mathcal{T} \subset \mathbb{R}^M$. The game stops whenever the state reaches the target.

- a cost function associated to each pair of controls $(u(\cdot), v(\cdot))$:

$$J(x, u(\cdot), v(\cdot)) = \inf \{\, t \mid y(t) \in \mathcal{T} \,\}.$$
In other words, $J(x, u(\cdot), v(\cdot))$ is the smallest time at which the state reaches the target when the initial state is $x$. We set $J(x, u(\cdot), v(\cdot)) = +\infty$ if the state never reaches the target.

- We define the function $T(x)$ as the lower value function of the game in the sense of Varaiya, Roxin, Elliott and Kalton [26], [23], [14], namely
$$T(x) = \inf_{\alpha \in \mathcal{A}} \ \sup_{v(\cdot) \in \mathcal{V}} J\big(x, \alpha(v(\cdot)), v(\cdot)\big),$$

where $\mathcal{V}$ is the set of measurable functions from $\mathbb{R}_+$ to the control set $V$, and $\mathcal{A}$ is the set of non-anticipative strategies of the minimizer. We could define, in the same way, the upper value function of the game, and all the results of this paper would remain valid. In order to obtain a discounted differential game (which will be important to prove the existence of the approximated solution), we make use of the following Kruzkov transformation:
$$V(x) = \begin{cases} 1 - e^{-T(x)} & \text{if } T(x) < +\infty, \\ 1 & \text{if } T(x) = +\infty, \end{cases}$$

and the following proposition can be proved (see [18]):

Proposition 2.1 Assume that the whole boundary of the target is usable (or, equivalently, that $T(x)$ is continuous on $\partial\mathcal{T}$). Then the function $V$ is the unique viscosity solution of the Hamilton-Jacobi-Isaacs equation
$$\begin{cases} V(x) + \min_{v \in V} \max_{u \in U} \big\{ -\langle \nabla V(x), f(x,u,v) \rangle - 1 \big\} = 0 & \text{if } x \in \mathbb{R}^M \setminus \mathcal{T} = \Omega, \\ V(x) = 0 & \text{if } x \in \mathcal{T}. \end{cases} \tag{3}$$

INRIA


We denote by
$$H(x, \nabla V(x)) = \min_v \max_u \big\{ -\langle \nabla V(x), f(x,u,v) \rangle - 1 \big\}$$
the Hamiltonian of the boundary value problem, and recall the definition of a viscosity solution of (3); see [11], [12], [20], [2] and [6].

Definition 2.1 $w$ is said to be a viscosity subsolution (respectively supersolution) of the boundary value problem (3) if it is an upper (respectively lower) semicontinuous function and if, for all $\varphi \in C^1(\bar\Omega)$ such that $w - \varphi$ attains a local maximum (respectively minimum) at a point $x$, we have
$$\begin{cases} w(x) + H(x, \nabla\varphi(x)) \le 0 & \text{if } x \in \Omega, \\ w(x) + H(x, \nabla\varphi(x)) \le 0 \ \text{ or } \ w(x) \le 0 & \text{if } x \in \partial\Omega, \end{cases} \tag{4}$$
respectively
$$\begin{cases} w(x) + H(x, \nabla\varphi(x)) \ge 0 & \text{if } x \in \Omega, \\ w(x) + H(x, \nabla\varphi(x)) \ge 0 \ \text{ or } \ w(x) \ge 0 & \text{if } x \in \partial\Omega. \end{cases} \tag{5}$$

We say that $w$ is a viscosity solution of (3) if it is both a viscosity subsolution and a viscosity supersolution.

3 Approximation of the differential game by a stochastic game

In this section we describe the stochastic game that approximates the differential game. The reasoning that leads to this approximation is described in detail in [21]. It is an adaptation to the case of deterministic games of a scheme introduced by H. Kushner for stochastic control. The basic idea is to discretize the space and approximate the partial derivatives of the dynamics in such a way that transition probabilities appear. Let us consider the stochastic game composed of the following elements.

- A state space $\Omega^h$. Let $\mathbb{R}^M_h$ be the discrete grid defined by
$$\mathbb{R}^M_h = \Big\{\, x \ \Big|\ x = \sum_{i=1}^M \lambda_i h\, e_i,\ \lambda_i \in \mathbb{Z} \,\Big\},$$
where $(e_1, \dots, e_M)$ is the canonical basis of $\mathbb{R}^M$. Define the sets $\partial\Omega^h$ and $\mathring\Omega^h$ in the following way:
$$\partial\Omega^h = \partial\mathcal{T}^h = \{\, x \in \mathbb{R}^M_h \mid x \in \partial\mathcal{T}, \ \text{or} \ \exists i \ \text{such that} \ x + e_i h \in \mathcal{T} \ \text{or} \ x - e_i h \in \mathcal{T} \,\},$$
$$\mathring\Omega^h = \mathcal{T}^c \cap \mathbb{R}^M_h \setminus \partial\Omega^h.$$

The state space of the stochastic game will be
$$\Omega^h = \mathring\Omega^h \cup \partial\Omega^h,$$
and the discrete target is defined by
$$\mathcal{T}^h = \mathbb{R}^M_h \setminus \Omega^h.$$

- A transition probability $p(x, y \mid u, v)$ defined by:

If $x \in \mathring\Omega^h$ and $\sum_i |f_i(x,u,v)| \ne 0$:
$$p(x, x + e_i h \mid u, v) = \frac{f_i^+(x,u,v)}{\sum_j |f_j(x,u,v)|}, \qquad p(x, x - e_i h \mid u, v) = \frac{f_i^-(x,u,v)}{\sum_j |f_j(x,u,v)|}, \tag{6}$$
$$p(x, y \mid u, v) = 0 \ \text{ for every other } y,$$
where $f_i^+ = \sup(0, f_i)$ and $f_i^- = \sup(0, -f_i)$.

If $x \in \partial\mathcal{T}^h$ or $\sum_i |f_i(x,u,v)| = 0$:
$$p(x, x \mid u, v) = 1, \qquad p(x, y \mid u, v) = 0 \quad \forall y \ne x. \tag{7}$$

$p(x, y \mid u, v)$ is the probability that the next state of the game will be $y$ if the present state is $x$ and the players use the controls $u$ and $v$ at the present time. Note that each $x \in \partial\mathcal{T}^h$ is an absorbing state: if the system happens to be in a state $x \in \partial\mathcal{T}^h$, it will stay in this state indefinitely, since the transition probability to any other state is null.

- An instantaneous reward given by
$$k^h(x,u,v) = \frac{h}{\sum_i |f_i(x,u,v)| + h} \quad \text{for } x \in \mathring\Omega^h, \qquad k^h(x,u,v) = 0 \quad \text{for } x \in \partial\Omega^h. \tag{8}$$

Note that once the system stops on the boundary, the total reward no longer increases, since the instantaneous reward is null.
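As a quick illustration, the transition probabilities (6)-(7) can be computed directly from the components of the dynamics. The sketch below is a minimal Python rendering; the particular dynamics, state and grid step in the example are placeholder choices, not taken from the paper:

```python
# Transition probabilities of the approximating stochastic game, eq. (6)-(7).
# For an interior grid point x, mass moves only to the 2M direct neighbours
# x +/- h*e_i, proportionally to the positive/negative parts of f_i(x,u,v).

def transition_probabilities(x, u, v, f, h):
    """Return {y: p(x,y|u,v)}, grid points represented as tuples."""
    fx = f(x, u, v)                      # dynamics vector at (x, u, v)
    denom = sum(abs(c) for c in fx)      # l1 norm, the normalising factor
    if denom == 0:                       # degenerate dynamics: absorbing state
        return {x: 1.0}
    p = {}
    for i, fi in enumerate(fx):
        up   = tuple(c + (h if j == i else 0.0) for j, c in enumerate(x))
        down = tuple(c - (h if j == i else 0.0) for j, c in enumerate(x))
        if fi > 0:
            p[up] = p.get(up, 0.0) + fi / denom        # f_i^+ / sum_j |f_j|
        elif fi < 0:
            p[down] = p.get(down, 0.0) + (-fi) / denom  # f_i^- / sum_j |f_j|
    return p

# Example with a placeholder 2-D dynamics f(x,u,v) = (u - v, -x_2):
probs = transition_probabilities((0.5, 0.5), 1.0, 0.25,
                                 lambda x, u, v: (u - v, -x[1]), h=0.1)
assert abs(sum(probs.values()) - 1.0) < 1e-12
```

The probabilities always sum to one, and only the two direct neighbours along each coordinate can receive mass, which is the defining feature of this scheme.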

- A discount factor $\beta^h(x,u,v)$ given for each triple $(x,u,v)$ by
$$\beta^h(x,u,v) = \frac{\sum_i |f_i(x,u,v)|}{\sum_i |f_i(x,u,v)| + h}. \tag{9}$$

This discount factor is not classical in the stochastic game or controlled Markov chain literature. Nevertheless, the main point is that it does not affect the convergence of the classical algorithms such as Shapley's algorithm: $\|f\|$ being bounded, $\beta^h(x,u,v)$ is bounded by a scalar strictly smaller than one.

We will denote by $V^h(x)$ the value function of this stochastic game; it is a classical result that it satisfies the following equation:

$$V^h(x) = \max_v \min_u \Big\{ k^h(x,u,v) + \beta^h(x,u,v) \sum_y p(x,y \mid u,v)\, V^h(y) \Big\}. \tag{10}$$

Note that if $x$ belongs to $\partial\Omega^h$, (10) gives $V^h(x) = 0$, which is compatible with the continuous equation (3). Since the state never reaches the interior of the target (because of the definition of the probabilities), we can set $V^h(x) = 0$ there as well, to be compatible with the continuous game.

Proposition 3.1 The solution $V^h$ of equation (10) exists and is unique.

Proof. $V^h$ is shown to be the unique fixed point of a contractive operator $T^h$ defined by
$$T^h : \mathcal{B}(\mathbb{R}^M_h) \longrightarrow \mathcal{B}(\mathbb{R}^M_h), \qquad U \longmapsto T^h U,$$
with
$$T^h U(x) = \max_v \min_u \Big\{ k^h(x,u,v) + \beta^h(x,u,v) \sum_y p(x,y \mid u,v)\, U(y) \Big\},$$
$\mathcal{B}(\mathbb{R}^M_h)$ denoting the set of all bounded real-valued functions defined on $\mathbb{R}^M_h$.

Clearly $T^h$ is well defined, since $\beta^h(\cdot,\cdot,\cdot)$ and $k^h(\cdot,\cdot,\cdot)$ are bounded:
$$\beta^h(x,u,v) = \frac{\sum_i |f_i(x,u,v)|}{\sum_i |f_i(x,u,v)| + h} \le \frac{1}{1 + h/F} < 1 \tag{11}$$
and
$$k^h(x,u,v) = \frac{h}{\sum_i |f_i(x,u,v)| + h} \le 1. \tag{12}$$


Also, $T^h$ is a contractive mapping. For all $U$ and $W$ in $\mathcal{B}(\mathbb{R}^M_h)$,
$$(T^h U - T^h W)(x) = \max_v \min_u \Big\{ k^h(x,u,v) + \beta^h(x,u,v) \sum_y p(x,y \mid u,v)\, U(y) \Big\} - \max_v \min_u \Big\{ k^h(x,u,v) + \beta^h(x,u,v) \sum_y p(x,y \mid u,v)\, W(y) \Big\}$$
$$\le \beta^h(x,\bar u,\bar v) \sum_y p(x,y \mid \bar u,\bar v)\, \big( U(y) - W(y) \big),$$
where $\bar v$ satisfies
$$\max_v \min_u \Big\{ k^h(x,u,v) + \beta^h(x,u,v) \sum_y p(x,y \mid u,v)\, W(y) \Big\} = \min_u \Big\{ k^h(x,u,\bar v) + \beta^h(x,u,\bar v) \sum_y p(x,y \mid u,\bar v)\, W(y) \Big\}$$
and $\bar u$ is the argument of the minimization
$$\min_u \Big\{ k^h(x,u,\bar v) + \beta^h(x,u,\bar v) \sum_y p(x,y \mid u,\bar v)\, U(y) \Big\}.$$
This leads to
$$(T^h U - T^h W)(x) \le \beta^h\, \| U - W \|,$$
where $\beta^h$, bounded as in (11), is strictly smaller than one, and the norm used is the sup norm $\| U \| = \sup_{x \in \mathbb{R}^M_h} |U(x)|$. Similarly, it is easy to show that $(T^h W - T^h U)(x) \le \beta^h \| U - W \|$. $T^h$ is then a contracting mapping and therefore has a unique fixed point $V^h$ in $\mathcal{B}(\mathbb{R}^M_h)$ that satisfies equation (10), which completes the proof. □
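To make the fixed-point construction concrete, here is a small value-iteration sketch that iterates the contraction $T^h$ on a one-dimensional toy game. The grid, dynamics, control sets and the cut-off at the artificial far end of the grid are illustrative assumptions, not taken from the paper:

```python
import math

# Toy 1-D instance (illustrative choices): state in [0, 1], target
# T = (-inf, 0], with an absorbing target node at x = 0.
h = 0.05
N = 21                                     # grid nodes x_i = i*h
U = [-1.0, 1.0]                            # minimiser's control set
V = [-0.5, 0.5]                            # maximiser's control set
f = lambda x, u, v: u + 0.5 * v            # placeholder dynamics

def T_op(Vh):
    """One application of the contraction T^h of Proposition 3.1."""
    out = [0.0] * N                        # node 0 is the absorbing target: V^h = 0
    for i in range(1, N - 1):
        x = i * h
        best_v = -math.inf
        for v in V:
            best_u = math.inf
            for u in U:
                a = abs(f(x, u, v))
                k = h / (a + h)            # instantaneous reward (8)
                beta = a / (a + h)         # discount factor (9)
                if a == 0:                 # dynamics null: target never reached
                    val = k                # here k = 1 and beta = 0
                else:                      # all mass goes to the upwind neighbour
                    nxt = Vh[i + 1] if f(x, u, v) > 0 else Vh[i - 1]
                    val = k + beta * nxt
                best_u = min(best_u, val)
            best_v = max(best_v, best_u)
        out[i] = best_v
    out[N - 1] = out[N - 2]                # crude cut-off at the artificial far end
    return out

Vh = [0.0] * N
for _ in range(2000):                      # iterate to the fixed point V^h
    Vh = T_op(Vh)

# V^h approximates 1 - exp(-T(x)): it should be increasing in x and lie in [0, 1).
assert Vh[0] == 0.0 and all(0.0 <= val < 1.0 for val in Vh)
assert Vh[5] < Vh[10] < Vh[15]
```

Since each iteration contracts with a ratio strictly below one, as in (11), the iterates converge geometrically; in practice one would stop on a tolerance, or use Shapley's algorithm as mentioned above.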

4 Main theorem

We prove in this part the convergence of the value function of the stochastic game to the value function of the differential game. For that, we first introduce $\tilde V^h$, an interpolation of $V^h$ on the closure of the set $\Omega$, in the following way. $\tilde V^h$ is the restriction to $\bar\Omega$ of the affine interpolation of $V^h$, that is:

- $\tilde V^h(x) = V^h(x)$ for $x \in \Omega^h$;

- $\tilde V^h$ is affine on each simplex defined by its vertices $\{x, x + e_i h,\ x \in \mathring\Omega^h,\ i = 1, \dots, M\}$ or $\{x, x - e_i h,\ x \in \mathring\Omega^h,\ i = 1, \dots, M\}$.

For the sake of completeness, we first state a comparison theorem that we will use later in the proof of the main result of this paper. The proof of this theorem can be found in [6].

Theorem 4.1 Assume that $\mathcal{T}$ is the closure of an open set and that $\partial\mathcal{T}$ is a Lipschitz surface. Let $u_1$ and $u_2$ be bounded functions from $\bar\Omega$ to $\mathbb{R}$ such that:

- i) $u_1$ is a viscosity subsolution of $u(x) + H(x, \nabla u(x)) = 0$ in the open set $\Omega$, and is continuous and non-positive at each point of $\partial\Omega$;

- ii) $u_2$ is a viscosity supersolution of the equation with boundary condition (3).

Then $u_1(x) \le u_2(x)$ for $x \in \bar\Omega$. The same conclusion holds if $u_1$ is a viscosity subsolution of (3), and $u_2$ is a viscosity supersolution of $u(x) + H(x, \nabla u(x)) = 0$ in $\Omega$ and is continuous and non-negative at each point of $\partial\Omega$.

Let us now state and prove the main theorem of this paper

Theorem 4.2 Assume that the regularity assumptions (1) and (2) hold, that $\mathcal{T}$ is the closure of an open set, and that $\partial\mathcal{T}$ is a Lipschitz surface. Then $\tilde V^h$ converges, uniformly on each compact subset of $\bar\Omega$, to the value function of the continuous game.

Proof of Theorem 4.2. As in [4], let us introduce the functions $\bar V$ and $\underline V$ defined by
$$\bar V(x) = \limsup_{\substack{y \to x \\ h \to 0}} \tilde V^h(y) \qquad \text{and} \qquad \underline V(x) = \liminf_{\substack{y \to x \\ h \to 0}} \tilde V^h(y),$$
and let us state, and admit for a while, a lemma. The proof of this lemma is the main part of the proof of the theorem.

Lemma 4.1 $\bar V$ and $\underline V$ are respectively sub- and supersolutions, in the viscosity sense, of equation (3).

On one hand, by definition of $\bar V$ and $\underline V$, we know that
$$\underline V \le \bar V.$$
On the other hand, to prove the reverse inequality, we use the same argument as in [4]. We recall it here to make the proof self-contained.


Since $V$, the solution of the continuous problem, is continuous and null on the boundary, it satisfies condition i) of Theorem 4.1. Applying this theorem, we obtain that $\bar V \le V$. With the same argument and the second part of the theorem, we prove that $V \le \underline V$, and then
$$\underline V = \bar V = V,$$
which proves that $V = \lim_{h \to 0} \tilde V^h$ is a viscosity solution of (3) and is the solution of the continuous problem. □

Let us state and prove a modified version of Lemma 6.1 of [13], which we will use in the proof of Lemma 4.1.

Lemma 4.2 Let $\varphi$ be a $C^1(\mathbb{R}^M)$ function, $u^h$ a sequence of functions defined on $\mathbb{R}^M_h$, and $\bar u$ the function defined by
$$\bar u(x) = \limsup_{\substack{y \to x \\ h \to 0}} u^h(y).$$
Let $x^+$ be a strict local maximum of $\bar u - \varphi$ (assume it is the unique maximum in $B(x^+, r)$ for a fixed $r$). Then there exist sequences $h_j$, $x_j$ such that:

- $x_j$ is a maximum of $u^{h_j} - \varphi$ in $B^{h_j} = B(x^+, r) \cap \mathbb{R}^M_{h_j}$,

- $\lim_{j \to +\infty} x_j = x^+$,

- $\lim_{j \to +\infty} u^{h_j}(x_j) = \bar u(x^+)$.

Proof of Lemma 4.2. By definition of $\bar u(x^+)$, there exist sequences $h_n$ ($h_n \to 0$ when $n \to \infty$) and $x_n$ such that
$$\lim_{n \to +\infty} x_n = x^+ \qquad \text{and} \qquad \lim_{n \to +\infty} u^{h_n}(x_n) = \bar u(x^+). \tag{13}$$
Notice that for any sequences $h_k$, $x_k$ such that $\lim_{k \to +\infty} x_k = x^+$, we have
$$\limsup_{k \to +\infty} u^{h_k}(x_k) \le \bar u(x^+). \tag{14}$$
Now, for any $h$, let us define $x^{+,h}$ as the maximum of $u^h - \varphi$ in $B^h$; in other words, we have
$$u^h(x) - \varphi(x) \le u^h(x^{+,h}) - \varphi(x^{+,h}), \qquad \forall x \in B^h. \tag{15}$$


In particular, this inequality is true for $h = h_n$ and $x = x_n$, that is:
$$u^{h_n}(x_n) - \varphi(x_n) \le u^{h_n}(x^{+,h_n}) - \varphi(x^{+,h_n}). \tag{16}$$
Now, the sequence $x^{+,h_n}$ lies in a compact set, so we can extract a convergent subsequence, which we again denote $x^{+,h_n}$; let $y$ be its limit. Taking the inferior limit in inequality (16), we obtain
$$\bar u(x^+) - \varphi(x^+) \le \liminf_{n \to +\infty} u^{h_n}(x^{+,h_n}) - \varphi(y),$$
and now, using inequality (14) for the sequence $x^{+,h_n}$, we obtain
$$\bar u(x^+) - \varphi(x^+) \le \liminf_{n \to +\infty} u^{h_n}(x^{+,h_n}) - \varphi(y) \le \limsup_{n \to +\infty} u^{h_n}(x^{+,h_n}) - \varphi(y) \le \bar u(y) - \varphi(y).$$
Finally, since $x^+$ is the maximum of $\bar u - \varphi$, we can deduce that $y = x^+$, and then
$$\bar u(x^+) = \lim_{n \to +\infty} u^{h_n}(x^{+,h_n}),$$
which ends the proof of the lemma. □

Proof of Lemma 4.1. We will only prove that $\bar V$ is a viscosity subsolution, since the proof that $\underline V$ is a viscosity supersolution is basically the same. From its definition, it is easy to see that $\bar V$ is upper semicontinuous. Now, let $\varphi$ be a $C^1(\bar\Omega)$ function, and $x^+$ a local maximum of $\bar V - \varphi$. Let us assume that $x^+$ is a strict local maximum, so we can find $r > 0$ such that $x^+$ is the unique maximum of $\bar V - \varphi$ in the open ball $B = B(x^+, r)$. (If $x^+$ is not strict, we can modify the test function $\varphi$ slightly.) Two possibilities occur: either $x^+$ belongs to $\Omega$, or it belongs to the boundary $\partial\Omega$.

- Assume first that $x^+$ belongs to $\Omega$. We want to prove that
$$\bar V(x^+) + \min_v \max_u \big\{ -\langle \nabla\varphi(x^+), f(x^+,u,v) \rangle - 1 \big\} \le 0.$$
Let $h_n$ be a sequence such that the conclusion of Lemma 4.2 holds, and let $x_n$ be a sequence of $B$ such that $x_n$ is a maximum of $V^{h_n} - \varphi$ in $B^{h_n} = B \cap \mathbb{R}^M_{h_n}$, where $h_n$ tends to zero as $n$ tends to infinity. In order to simplify the notation, we will write $V^n$ instead of $V^{h_n}$, and $B^n$ instead of $B^{h_n}$. We have
$$(V^n - \varphi)(x_n) \ge (V^n - \varphi)(x),$$


for all $x \in B^n$; in particular, for all $x_n + e_i h_n$ and $x_n - e_i h_n$ (which are elements of $B^n$ for $n$ large enough, because of Lemma 4.2). We can write
$$\sum_y p(x_n, y \mid u,v)\, (V^n - \varphi)(x_n) \ge \sum_y p(x_n, y \mid u,v)\, (V^n - \varphi)(y),$$
or equivalently
$$\sum_y p(x_n, y \mid u,v)\, \big( \varphi(y) - \varphi(x_n) \big) \ge \sum_y p(x_n, y \mid u,v)\, \big( V^n(y) - V^n(x_n) \big). \tag{17}$$

On the other hand, developing equation (10) slightly, we obtain that for all $x_n \in \mathring\Omega^{h_n}$ we have
$$\max_v \min_u \Big\{ \frac{h_n}{h_n + \sum_i |f_i(x_n,u,v)|} + \frac{\sum_i |f_i(x_n,u,v)|}{h_n + \sum_i |f_i(x_n,u,v)|} \sum_y p(x_n, y \mid u,v)\, V^n(y) - V^n(x_n) \Big\} = 0,$$
that is,
$$\max_v \min_u \Big\{ \frac{\sum_i |f_i(x_n,u,v)|}{h_n} \sum_y p(x_n, y \mid u,v)\, V^n(y) - \frac{h_n + \sum_i |f_i(x_n,u,v)|}{h_n}\, V^n(x_n) + 1 \Big\} = 0,$$
and finally
$$\max_v \min_u \Big\{ \frac{\sum_i |f_i(x_n,u,v)|}{h_n} \sum_y p(x_n, y \mid u,v)\, \big( V^n(y) - V^n(x_n) \big) - V^n(x_n) + 1 \Big\} = 0.$$
Using the monotonicity of the minmax function and inequality (17), we obtain, for all $n$,
$$\max_v \min_u \Big\{ \frac{\sum_i |f_i(x_n,u,v)|}{h_n} \sum_y p(x_n, y \mid u,v)\, \big( \varphi(y) - \varphi(x_n) \big) - V^n(x_n) + 1 \Big\} \ge 0,$$
and, again developing the transition probabilities,
$$\max_v \min_u \Big\{ \sum_i f_i^+(x_n,u,v)\, \frac{\varphi(x_n + e_i h_n) - \varphi(x_n)}{h_n} - \sum_i f_i^-(x_n,u,v)\, \frac{\varphi(x_n) - \varphi(x_n - e_i h_n)}{h_n} - V^n(x_n) + 1 \Big\} \ge 0.$$


Taking the limsup as $n$ tends to infinity (that is, $h_n$ tends to zero), and using Lemma 4.2, we obtain
$$\max_v \min_u \Big\{ \sum_i f_i^+(x^+,u,v)\, \frac{\partial\varphi}{\partial x_i}(x^+) - \sum_i f_i^-(x^+,u,v)\, \frac{\partial\varphi}{\partial x_i}(x^+) - \bar V(x^+) + 1 \Big\} \ge 0,$$
that is,
$$\max_v \min_u \langle \nabla\varphi(x^+), f(x^+,u,v) \rangle - \bar V(x^+) + 1 \ge 0,$$
and finally, multiplying by $-1$, we obtain the required inequality
$$\bar V(x^+) + \min_v \max_u \big\{ -\langle \nabla\varphi(x^+), f(x^+,u,v) \rangle - 1 \big\} \le 0.$$

- If $x^+$ belongs to $\partial\mathcal{T} = \partial\Omega$, we want to prove that one of the following inequalities holds:

(a) $\bar V(x^+) \le 0$, or

(b) $\bar V(x^+) + \min_v \max_u \big\{ -\langle \nabla\varphi(x^+), f(x^+,u,v) \rangle - 1 \big\} \le 0$.

Again, we construct the sequence $x_n$ such that $x_n$ is a maximum of $V^{h_n} - \varphi$ in $B^{h_n}$. Two different cases occur. Either there exists $n_0$ such that for all $n > n_0$, $x_{h_n}$ belongs to $\mathring\Omega^{h_n}$, in which case the previous reasoning holds and we obtain (b); or there exists a subsequence $(h_m)_m$ of the sequence $(h_n)_n$, with $h_m \to 0$ as $m \to \infty$, such that $x_{h_m}$ does not belong to $\mathring\Omega^{h_m}$. In this case $V^{h_m}(x_{h_m}) = 0$, by definition of $V^{h_m}$ on the boundary. Then, since according to the second conclusion of Lemma 4.2 the sequence $V^{h_m}(x_{h_m})$ converges to $\bar V(x^+)$, we obtain $\bar V(x^+) = 0$, which satisfies (a). □

5 Convergence rate of the discrete problem

In this section we want to obtain an estimate of the rate of convergence of the approximation scheme. This rate of convergence is computed under the additional hypothesis that $V$ is Lipschitz continuous (we denote by $L_V$ its Lipschitz constant). A set of hypotheses that ensures that $V$ is Lipschitz continuous can be found in [7]; these hypotheses concern the regularity of $\partial\mathcal{T}$ and of the dynamics.


In order to simplify the notation, let us introduce the operators $W$ and $W^h$ defined by
$$(WF)(x) = F(x) + \min_v \max_u \big( -\langle \nabla F(x), f(x,u,v) \rangle - 1 \big)$$
and
$$(W^h F)(x) = F(x) + \min_v \max_u \Big( -\frac{\partial F}{\partial x_f}(x,u,v) - 1 \Big).$$
Here $\frac{\partial F}{\partial x_f}(x,u,v)$ stands for the approximation of $\langle \nabla F(x), f(x,u,v) \rangle$, that is,
$$\frac{\partial F}{\partial x_f}(x,u,v) \stackrel{\text{def}}{=} \sum_i \Big[ \frac{F(x + e_i h) - F(x)}{h}\, f_i^+(x,u,v) - \frac{F(x) - F(x - e_i h)}{h}\, f_i^-(x,u,v) \Big].$$
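The upwind difference $\partial F / \partial x_f$ can be written out directly; the sketch below is a hypothetical Python rendering for a function $F$ stored on grid points represented as tuples (the helper name and data layout are illustrative assumptions, not from the paper):

```python
def upwind_directional_derivative(F, x, u, v, f, h):
    """Upwind approximation of <grad F(x), f(x,u,v)>, as in the definition
    of W^h.  F maps grid points (tuples) to floats; f returns the dynamics
    vector at (x, u, v)."""
    fx = f(x, u, v)
    total = 0.0
    for i, fi in enumerate(fx):
        up   = tuple(c + (h if j == i else 0.0) for j, c in enumerate(x))
        down = tuple(c - (h if j == i else 0.0) for j, c in enumerate(x))
        fwd = (F[up] - F[x]) / h       # forward difference, weighted by f_i^+
        bwd = (F[x] - F[down]) / h     # backward difference, weighted by f_i^-
        total += max(fi, 0.0) * fwd - max(-fi, 0.0) * bwd
    return total

# For an affine F the approximation is exact: F(x) = 2x gives derivative 2*f.
F = {(0.0,): 0.0, (0.1,): 0.2, (0.2,): 0.4}
d = upwind_directional_derivative(F, (0.1,), None, None,
                                  lambda x, u, v: (3.0,), 0.1)
assert abs(d - 6.0) < 1e-9
```

The one-sided differences are chosen according to the sign of each $f_i$, which is exactly what makes the coefficients in (6) non-negative and interpretable as probabilities.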

With the notation introduced, $V(\cdot)$ is the solution of the boundary value problem
$$\begin{cases} WV(x) = 0 & \text{if } x \in \mathbb{R}^M \setminus \mathcal{T} = \Omega, \\ V(x) = 0 & \text{if } x \in \mathcal{T}, \end{cases} \tag{18}$$
and $V^h$ is the solution of the discrete-space boundary value problem
$$\begin{cases} W^h V^h(x) = 0 & \text{if } x \in \mathring\Omega^h, \\ V^h(x) = 0 & \text{if } x \in \mathcal{T}^h \cup \partial\mathcal{T}^h. \end{cases} \tag{19}$$

Note that with this notation $V^h$ satisfies (10). To obtain the rate of convergence of the scheme, we want an upper bound on $\sup_{x \in \mathbb{R}^M_h} | V(x) - V^h(x) |$. To this aim, we use the following decomposition:
$$| V - V^h | \le | V - V^h_\varepsilon | + | V^h_\varepsilon - V^h |, \tag{20}$$
where $V_\varepsilon$ is a regularization of the function $V$, and $V^h_\varepsilon$ is the affine interpolation of the restriction of $V_\varepsilon$ to the discrete state space. Classical results give an upper bound on the first term of the decomposition. A result from [16], together with the interpretation of the function $V^h_\varepsilon$ as the solution of an auxiliary boundary value problem which is almost problem (19), gives an upper bound on the second term of the right-hand side of (20). Combining these results, we obtain the rate of convergence.

Let $\mu_1(\cdot)$ be a function such that:
$$\mu_1(\cdot) \in C^\infty(\mathbb{R}^M), \qquad \mu_1(x) \ge 0, \qquad \mu_1(x) = 0 \ \text{ if } \| x \| > 1, \qquad \int_{\mathbb{R}^M} \mu_1(s)\, ds = 1.$$


For a strictly positive scalar $\varepsilon$, we define
$$\mu_\varepsilon(x) = \frac{1}{\varepsilon^M}\, \mu_1(x/\varepsilon)$$
and the regularization $V_\varepsilon(\cdot)$ of $V(\cdot)$,
$$V_\varepsilon(x) = (V * \mu_\varepsilon)(x), \qquad \forall x \in \bar\Omega, \tag{21}$$
where $*$ denotes the convolution product, that is, $(f * g)(x) = \int_{\mathbb{R}^M} f(x - y)\, g(y)\, dy$.

We also define the function $V^h_\varepsilon$ as the continuous piecewise affine function on $\bar\Omega$ such that
$$V^h_\varepsilon(x) = V_\varepsilon(x), \qquad \forall x \in \mathbb{R}^M_h.$$
It is well known that $V_\varepsilon(\cdot) \in C^\infty(\mathbb{R}^M)$, and furthermore, if we assume that $V(\cdot)$ is Lipschitz continuous, we have the classical properties (see [8] for example):

- i) $\dfrac{\partial V_\varepsilon}{\partial x_i}(x) = \Big( \dfrac{\partial V}{\partial x_i} * \mu_\varepsilon \Big)(x)$;

- ii) $\Big\| \dfrac{\partial V_\varepsilon}{\partial x_i} \Big\| \le \Big\| \dfrac{\partial V}{\partial x_i} \Big\| \le L_V$;

- iii) $\Big\| \dfrac{\partial^2 V_\varepsilon}{\partial x_i \partial x_j} \Big\| \le C\, \dfrac{L_V}{\varepsilon}$;

- iv) $| V_\varepsilon(x) - V(x) | \le L_V\, \varepsilon$;

- v) $| W V_\varepsilon(x) - (WV * \mu_\varepsilon)(x) | \le C \varepsilon$ for each $x$ in $\mathbb{R}^M$.

In (iii) and (v), $C$ is a constant that we do not need to make precise. From the definition of $V^h_\varepsilon$ and property (iv), it follows that for $x \in \mathbb{R}^M_h$,
$$| V^h_\varepsilon(x) - V(x) | \le L_V\, \varepsilon, \tag{22}$$
which gives an upper bound for the first term of our decomposition.

The next two theorems, Theorems 5.1 and 5.2, give an upper bound for the second term of the decomposition (20). Theorem 5.1 applies for $x$ in $\mathring\Omega^h_\varepsilon$, and Theorem 5.2 applies for $x$ in $\mathcal{T}^h_\varepsilon$, where $\mathring\Omega^h_\varepsilon$ is the complement in the discrete space of the enlarged discrete target $\mathcal{T}^h_\varepsilon$, that is,
$$\mathcal{T}^h_\varepsilon = \{\, x \mid d(x, \mathcal{T}^h) \le \varepsilon \,\} \qquad \text{and} \qquad \mathring\Omega^h_\varepsilon = \mathbb{R}^M_h \setminus \mathcal{T}^h_\varepsilon.$$


Theorem 5.1 For $V^h(\cdot)$ and $V^h_\varepsilon(\cdot)$ as defined previously, and $x \in \mathring\Omega^h_\varepsilon$, we have
$$| V^h(x) - V^h_\varepsilon(x) | \le C\, \Big( \varepsilon + \frac{h}{\varepsilon} \Big), \tag{23}$$
where $C$ is a constant, independent of $x$, that we will not make precise.

Before starting the proof of the theorem, we state a technical lemma.

Lemma 5.1 For all $x \in \mathbb{R}^M_h$ we have the following upper bounds:
$$| W^h V^h_\varepsilon(x) - W V_\varepsilon(x) | \le C\, \frac{h}{\varepsilon}, \tag{24}$$
$$| W^h V^h_\varepsilon(x) - (WV * \mu_\varepsilon)(x) | \le C\, \Big( \varepsilon + \frac{h}{\varepsilon} \Big). \tag{25}$$

Proof of the lemma. For (24), by iii) we have
$$\Big| \frac{\partial V_\varepsilon}{\partial x_f}(x,u,v) - \langle \nabla V_\varepsilon(x), f(x,u,v) \rangle \Big| \le C\, \Big\| \frac{\partial^2 V_\varepsilon}{\partial x_i \partial x_j} \Big\|\, h \le C\, \frac{h}{\varepsilon}.$$
This ends the proof of (24); notice that it uses the fact that $V$ is Lipschitz continuous. Inequality (25) is a direct consequence of (24) and of property (v) of $V_\varepsilon$. □

Proof of the theorem. To prove this theorem, we first interpret $V^h_\varepsilon(\cdot)$ as the solution of the following auxiliary boundary value problem:
$$\begin{cases} W^h V^h_\varepsilon(x) - \rho(x,h,\varepsilon) = 0 & \text{if } x \in \mathring\Omega^h_\varepsilon, \\ V^h_\varepsilon(x) = \rho(x,h,\varepsilon) & \text{if } x \in \mathcal{T}^h_\varepsilon. \end{cases} \tag{26}$$
Here, for $x \in \mathring\Omega^h_\varepsilon$,
$$\rho(x,h,\varepsilon) = (W^h V^h_\varepsilon)(x) = \big[ (W^h V^h_\varepsilon)(x) - (WV * \mu_\varepsilon)(x) \big] + (WV * \mu_\varepsilon)(x).$$
Using inequality (25) of Lemma 5.1 and the fact that $(WV * \mu_\varepsilon)(x)$ is null on $\mathring\Omega^h_\varepsilon$ because of (18), it follows that
$$| \rho(x,h,\varepsilon) | \le C\, \Big( \varepsilon + \frac{h}{\varepsilon} \Big), \qquad \forall x \in \mathring\Omega^h_\varepsilon.$$
In the same vein, $\rho$ can be written as
$$\rho(x,h,\varepsilon) = V^h_\varepsilon(x) - V(x) + V(x) - V(\xi) + V(\xi),$$


for $x \in \mathcal{T}^h_\varepsilon$, where $\xi \in \mathcal{T}^h$ is such that $\| x - \xi \| \le \varepsilon$. We have $V(\xi) = 0$ from equation (18), and using the fact that $| V(x) - V(\xi) | \le L_V \varepsilon$, we obtain
$$| \rho(x,h,\varepsilon) | \le C \varepsilon,$$
where again $C$ is a constant that we do not make precise. Developing the operator $W^h$ and using the definitions (8) of $k^h$ and (9) of $\beta^h$, $V^h_\varepsilon(x)$ can be written, for all $x$ in $\Omega^h$,
$$V^h_\varepsilon(x) = \max_v \min_u \Big\{ \tilde k^h(x,u,v) + \beta^h(x,u,v) \sum_y p(x,y \mid u,v)\, V^h_\varepsilon(y) \Big\},$$
where
$$\tilde k^h(x,u,v) = k^h(x,u,v)\, \big( 1 - \rho(x,h,\varepsilon) \big).$$

Combining this last equation with equations (19) and (10), and recalling that for functions $g_1(\cdot,\cdot)$ and $g_2(\cdot,\cdot)$ there exist $\bar u$ and $\bar v$ such that
$$\max_v \min_u g_1(u,v) - \max_v \min_u g_2(u,v) \le g_1(\bar u, \bar v) - g_2(\bar u, \bar v),$$
we can write, for $x$ in $\Omega^h$:
$$V^h(x) - V^h_\varepsilon(x) \le k^h(x,\bar u,\bar v) + \beta^h(x,\bar u,\bar v) \sum_y p(x,y \mid \bar u,\bar v)\, V^h(y) - \tilde k^h(x,\bar u,\bar v) - \beta^h(x,\bar u,\bar v) \sum_y p(x,y \mid \bar u,\bar v)\, V^h_\varepsilon(y).$$
It follows that
$$V^h(x) - V^h_\varepsilon(x) \le \| k^h(\cdot,\bar u,\bar v) - \tilde k^h(\cdot,\bar u,\bar v) \| + \beta^h(x,\bar u,\bar v) \sum_y p(x,y \mid \bar u,\bar v)\, \| V^h(\cdot) - V^h_\varepsilon(\cdot) \|. \tag{27}$$

In the same way, we obtain the reverse inequality
$$V^h_\varepsilon(x) - V^h(x) \le \| k^h(\cdot,\tilde u,\tilde v) - \tilde k^h(\cdot,\tilde u,\tilde v) \| + \beta^h(x,\tilde u,\tilde v) \sum_y p(x,y \mid \tilde u,\tilde v)\, \| V^h(\cdot) - V^h_\varepsilon(\cdot) \|, \tag{28}$$
and these two last inequalities together lead to
$$\| V^h(\cdot) - V^h_\varepsilon(\cdot) \| \le \frac{1}{1 - \| \beta^h \|}\, \| k^h - \tilde k^h \| \le \frac{F + h}{h}\, \| k^h - \tilde k^h \|. \tag{29}$$


We finally use the fact that
$$\| k^h - \tilde k^h \| \le \| k^h \|\, \| \rho \| \le \frac{h}{F + h}\, C\, \Big( \varepsilon + \frac{h}{\varepsilon} \Big)$$
to obtain, for $x$ in $\mathring\Omega^h_\varepsilon$,
$$\| V^h(\cdot) - V^h_\varepsilon(\cdot) \| \le C\, \Big( \varepsilon + \frac{h}{\varepsilon} \Big).$$
This ends the proof of Theorem 5.1. □

Remark 5.1 For a given space discretization $h$, the optimal value of $\varepsilon$ in (23) is $\varepsilon = \sqrt{h}$: minimizing $\varepsilon + h/\varepsilon$ over $\varepsilon > 0$ gives $\varepsilon = \sqrt{h}$, hence a bound of order $\sqrt{h}$.

Theorem 5.2 Let $V^h(\cdot)$ and $V^h_\varepsilon(\cdot)$ be as defined previously. If $x \in \mathcal{T}^h_\varepsilon$, we have
$$| V^h(x) - V^h_\varepsilon(x) | \le C \varepsilon,$$
where $C$ is a constant independent of $x$.

Proof of the theorem. The proof of this theorem differs from the proof of the previous theorem only in the definition of the auxiliary boundary value problem. We want to estimate the expression
$$\max_v \min_u \Big\{ k^h(x,u,v) + \beta^h(x,u,v) \sum_y p(x,y \mid u,v)\, V^h_\varepsilon(y) - V^h_\varepsilon(x) \Big\}, \tag{30}$$
which can be rewritten, adding and subtracting $\beta^h(x,u,v)\, V^h_\varepsilon(x)$,
$$\max_v \min_u \Big\{ k^h(x,u,v) + \beta^h(x,u,v) \sum_y p(x,y \mid u,v)\, \big( V^h_\varepsilon(y) - V^h_\varepsilon(x) \big) + \big( \beta^h(x,u,v) - 1 \big)\, V^h_\varepsilon(x) \Big\}. \tag{31}$$
As $V^h_\varepsilon$ is a Lipschitz function and $\| x - y \| \le h$ on the support of $p$ (for $\| x - y \| > h$, $p(x, y \mid u, v) = 0$), we have $| V^h_\varepsilon(y) - V^h_\varepsilon(x) | \le L_V\, h$. On the other hand, from (26), $| V^h_\varepsilon(x) | \le C \varepsilon$ for $x$ in $\mathcal{T}^h_\varepsilon$. It follows that for all $x \in \mathcal{T}^h_\varepsilon$ and all $u$ and $v$:
$$\Big| k^h(x,u,v) + \beta^h(x,u,v) \sum_y p(x,y \mid u,v)\, V^h_\varepsilon(y) - V^h_\varepsilon(x) \Big| \le | k^h(x,u,v) | + | \beta^h(x,u,v) |\, L_V\, h + C \varepsilon.$$


So, for $x$ in $\mathcal{T}^h_\varepsilon$, we have
$$W^h V^h_\varepsilon(x) - \tilde\rho(x,h,\varepsilon) = 0, \qquad | V^h_\varepsilon(x) | \le C \varepsilon, \tag{32}$$
with $| \tilde\rho(x,h,\varepsilon) | \le C \varepsilon$, or again
$$V^h_\varepsilon(x) = \max_v \min_u \Big\{ \tilde k^h(x,u,v) + \beta^h(x,u,v) \sum_y p(x,y \mid u,v)\, V^h_\varepsilon(y) \Big\},$$
where
$$\tilde k^h(x,u,v) = k^h(x,u,v)\, \big( 1 - \tilde\rho(x,h,\varepsilon) \big).$$
The proof now ends with exactly the same computation as previously, using the fact that $V^h(x)$ satisfies the boundary value problem (19). □

The next theorem, which gives the convergence rate of the scheme, is now a direct consequence of Theorems 5.1 and 5.2 (taking $\varepsilon = \sqrt{h}$, as in Remark 5.1).

Theorem 5.3 If $V$ is a Lipschitz continuous function, then
$$\| V^h - V \| \le C\, \sqrt{h}.$$

6 Comparison with another discrete scheme

In [4] the numerical approximation of the same problem is studied. The discretization is done in two steps: first the continuous equation is approximated by a discrete-time equation (see [6]), using an Euler approximation of the dynamics, and then this equation is approximated by a discrete-space equation to obtain the fully discrete problem. Two parameters are thus associated to this method: a time parameter $k$ and a space parameter $h$. The important feature is that the time parameter is fixed. We show in this section that the approximation scheme presented in this paper can be interpreted as an extension of this scheme, in which the time parameter becomes a function of the state: when using an Euler approximation, it is indeed possible to consider the time step as a function of the state point and of the controls.

Let us first describe briefly the fully discrete dynamic programming equation obtained in [4]. We consider only the equation in the interior of the state space and neglect the boundary problems for this comparison. For $x$ a node of the discretized space, we have
$$V(x) = \max_v \min_u \Big\{ e^{-k} \sum_i \lambda_i\, V(x_i) + 1 - e^{-k} \Big\}, \tag{33}$$


where the \lambda_i's are such that \sum_{i \in S} \lambda_i = 1 and \lambda_i \ge 0, and S is the set of indices of the vertices x_i of the simplex that contains the point x' = x + k f(x,u,v). As for the method presented in this paper, this time and space discretization scheme can also be interpreted in terms of a stochastic game. Indeed, the process that leads to equation (33) is the following (see Figure 1): start at node x and let the dynamics evolve for a duration k. The new state, x' = x + k f(x,u,v), is not necessarily a node of the discrete space. Let x_i be the vertices of the simplex that contains the point x', and write x' as a convex combination of the points x_i. The coefficients \lambda_i defined previously can then be interpreted as transition probabilities to go from the point x to the points x_i of the discrete space.

Nevertheless this stochastic game is somewhat different from the one obtained with the time discretization scheme of this paper. Indeed, in the stochastic game obtained in this paper, the transition probability to go from a point to a non direct neighbor is null. Even if we consider the time parameter k small enough so that x and x' always belong to the same simplex, the two schemes are different. Indeed, in the first scheme (the one studied in this paper), the probability to go from x to x itself is null (or equal to one if the dynamics is null) (see Figure 2), which is in general not the case in the second scheme. Note furthermore that equation (10) can be rewritten as:

V(x) = \max_v \min_u \left\{ \Big( 1 + \frac{h}{\sum_i |f_i|} \Big)^{-1} \sum_y p(x,y|u,v) V(y) + \frac{h / \sum_i |f_i|}{1 + h / \sum_i |f_i|} \right\}.
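The two-step process described above (an Euler step followed by a projection of the new state onto the vertices of the enclosing simplex) can be sketched as follows. This is a minimal illustration, not the authors' implementation; in particular the enclosing triangle is supplied explicitly here, a simplifying assumption, since locating the simplex that contains x' is a separate grid-search step.

```python
import numpy as np

def barycentric_weights(triangle, p):
    """Barycentric coordinates of point p in a 2-D triangle (3x2 array).

    For p inside the triangle the weights are nonnegative and sum to 1,
    so they can be read as transition probabilities to the three vertices.
    """
    p0, p1, p2 = triangle
    T = np.column_stack((p1 - p0, p2 - p0))   # 2x2 matrix of edge vectors
    l1, l2 = np.linalg.solve(T, p - p0)       # coordinates along the two edges
    return np.array([1.0 - l1 - l2, l1, l2])

def transition_probabilities(x, k, f, triangle):
    """One Euler step x' = x + k f, then the convex-combination weights of x'
    in the given simplex, interpreted as transition probabilities."""
    x_prime = x + k * f
    return barycentric_weights(triangle, x_prime)

# hypothetical data: a node x, a dynamics value f(x,u,v), a time step k
tri = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
lam = transition_probabilities(np.array([0.1, 0.1]), 0.2, np.array([1.0, 0.5]), tri)
# the weights form a probability vector over the vertices
assert abs(lam.sum() - 1.0) < 1e-12 and (lam >= 0).all()
```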

We obtain exactly the same form of equation as equation (33), if we keep in mind that 1/(1+k) is an approximation of e^{-k} and k/(1+k) an approximation of 1 - e^{-k}, and if we replace the time parameter k by the state- and control-dependent expression h / \sum_i |f_i(x,u,v)|. Note that h / \sum_i |f_i(x,u,v)| can be interpreted as the time necessary for the system to go from the point x to a point at distance h, according to the controls used by the players.
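A quick numerical check of this correspondence; the helper `local_time_step` is a hypothetical name for the paper's h / \sum_i |f_i| expression, under the assumption that the sum is the l1 norm of the dynamics:

```python
import math

# For small k, the implicit discount 1/(1+k) agrees with e^{-k} up to O(k^2),
# and k/(1+k) agrees with 1 - e^{-k} to the same order.
for k in (0.1, 0.01, 0.001):
    assert abs(1.0 / (1.0 + k) - math.exp(-k)) < k * k
    assert abs(k / (1.0 + k) - (1.0 - math.exp(-k))) < k * k

def local_time_step(h, f):
    """Time for the dynamics f to cover the distance h, using the l1 norm
    (an assumption matching the h / sum_i |f_i| expression of the text)."""
    return h / sum(abs(fi) for fi in f)

# example: h = 0.1, f = (1, -0.5) gives the step 0.1 / 1.5
assert abs(local_time_step(0.1, (1.0, -0.5)) - 0.1 / 1.5) < 1e-15
```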

Another remark concerns the convergence condition. In [4] the condition

\lim_{n \to \infty} h_n / k_n = 0   (34)

is necessary to prove the convergence of the scheme. In the method presented here this condition is not satisfied since, with k_h = h / \sum_i |f_i|, the ratio h / k_h = \sum_i |f_i| does not tend to zero.

A question arises: is condition (34) a necessary condition for the scheme to converge, or is it only a technical condition that makes the proof of convergence simpler? Some other points of comparison, such as the convergence rate, together with comparisons with other methods, will be presented in a forthcoming paper.

Figure 1: Space and time discretization scheme

7 Numerical examples

An implementation of our method has been tested on examples. The algorithm uses the policy iteration method to solve the stochastic game. It is known that for game problems the method of policy iteration does not always converge; when it converges, it converges to the solution of the game. On the different examples tested, convergence has always been obtained rather quickly. We present the results obtained on the simple example already presented in [4]. This game has the particularity that the analytical solution is known, and furthermore the value function is not continuous. The algorithm converges quickly, and we obtain the same solution as in [4]. From numerical observation it seems that the function we find is not exactly the value function of the game. This gives strong motivation to study the problem without continuity assumptions on the value function, and first of all to understand what a discontinuous viscosity solution is. For that we refer for instance to [18] and [24].
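As an illustration of the fixed-point structure of the discrete game, here is a simple value iteration (a plain fixed-point iteration, not the policy iteration actually used in the paper) for a discounted equation of the same max-min form; the transition tensor P, the target mask, and the function name are hypothetical inputs of this sketch:

```python
import numpy as np

def value_iteration(P, beta, target, tol=1e-10, max_iter=10_000):
    """Fixed-point iteration for a discounted stochastic game of the form

        V(x) = max_v min_u [ beta * sum_y P[x,u,v,y] V(y) + (1 - beta) ],

    with V = 0 imposed on target states. P has shape
    (n_states, n_u, n_v, n_states) and its last axis holds probability vectors.
    """
    V = np.zeros(P.shape[0])
    for _ in range(max_iter):
        Q = beta * (P @ V) + (1.0 - beta)   # shape (n_states, n_u, n_v)
        Vn = np.where(target, 0.0, Q.min(axis=1).max(axis=1))  # min_u then max_v
        if np.max(np.abs(Vn - V)) < tol:
            return Vn
        V = Vn
    return V

# tiny hypothetical example: state 0 is the target, state 1 jumps to it surely
P = np.zeros((2, 1, 1, 2))
P[0, 0, 0, 0] = 1.0
P[1, 0, 0, 0] = 1.0
V = value_iteration(P, beta=0.9, target=np.array([True, False]))
assert abs(V[0]) < 1e-9 and abs(V[1] - 0.1) < 1e-9
```

Value iteration always converges for a discount beta < 1 (the operator is a contraction), which is why it is a convenient fallback when policy iteration fails to converge on a game.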

The game is

\dot{x} = v (x-1)(x+1) v_1,   x(0) = x_0,
\dot{y} = u (y-1)(y+1) v_2,   y(0) = y_0,

Figure 2: Time discretization scheme

where x, y \in Q = [0,1] \times [0,1] are the state variables of the pursuer and the evader respectively, and v_1 and v_2, with v_1 \ge v_2, represent their relative velocities; U = V = [-1,1] (note that Q in this case is invariant with respect to the trajectories) and T = \{(x,y) : x = y\}. One can verify that the solution of this game is:

V(x_0, y_0) = 1 - \left( \frac{l_1}{l_2} \right)^{\frac{1}{v_2 - v_1}}   if x_0 \le y_0,

V(x_0, y_0) = 1 - \left( \frac{l_2}{l_1} \right)^{\frac{1}{v_2 - v_1}}   if x_0 > y_0,

where l_1 = (|x_0 - 1| / |x_0 + 1|)^{1/2} and l_2 = (|y_0 - 1| / |y_0 + 1|)^{1/2}.

The algorithm has been tested on a grid with 1601 nodes. We consider v_1 = v_2; in this case V(x,y) = 0 on T and V(x,y) = 1 elsewhere. Figure 3 represents the approximate value function and Figure 4 its level curves.
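Assuming the reconstruction of the analytic formulas above (in particular the exponent 1/(v_2 - v_1), which is an inference from the damaged text), they can be evaluated numerically for v_1 > v_2 as follows:

```python
import math

def l(t):
    """The quantity (|t - 1| / |t + 1|)^(1/2) appearing in the formulas."""
    return math.sqrt(abs(t - 1.0) / abs(t + 1.0))

def value(x0, y0, v1, v2):
    """Analytic value for v1 > v2 (reconstructed formulas; the exponent
    1/(v2 - v1) is an assumption of this sketch)."""
    l1, l2 = l(x0), l(y0)
    e = 1.0 / (v2 - v1)
    if x0 <= y0:
        return 1.0 - (l1 / l2) ** e
    return 1.0 - (l2 / l1) ** e

# the value vanishes on the target x = y and lies in [0, 1) off it
assert value(0.3, 0.3, 2.0, 1.0) == 0.0
v = value(0.5, 0.8, 2.0, 1.0)
assert 0.0 < v < 1.0
```

As v_1 approaches v_2 the exponent blows up and the value tends to 1 off the target, which is consistent with the v_1 = v_2 case reported above.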


Figure 3: The approximate value function


Figure 4: The level curves


8 Perspectives

Further investigations should address the following issues:
- Precise comparison between the two methods.
- Extension of the method to more general problems, e.g. costs with non positive instantaneous reward (the Kruzkov transform does not apply), or a non null condition on the boundary of the target.
- The case where the value function is not continuous.
- Study of an approximation of the controls: can the controls obtained when solving the discrete game be considered as approximations of the optimal controls?

References

[1] Alziary de Roquefort B., Jeux différentiels et approximation numérique de fonctions valeur, 2e partie : étude numérique, RAIRO Math. Model. Numer. Anal. 25, pp 535-560, 1991.
[2] Bardi M. and Falcone M., Discrete approximation of the minimal time function for systems with regular optimal trajectories, Lecture Notes in Control and Information Sciences, Analysis and Optimization of Systems, no 114, Springer-Verlag, 1990.
[3] Bardi M. and Falcone M., An approximation scheme for the minimum time function, SIAM Journal on Control and Optimization, Vol 28, no 4, pp 950-965, July 1990.
[4] Bardi M., Falcone M. and Soravia P., Fully discrete schemes for the value function of pursuit-evasion games, Advances in Dynamic Games and Applications, T. Başar, A. Haurie eds., Birkhäuser, pp 89-105, 1994.
[5] Bardi M. and Soravia P., A PDE framework for games of pursuit-evasion type, Differential Games and Applications, T. Başar, P. Bernhard eds., pp 62-71, Lecture Notes in Control and Information Sciences 144, Springer-Verlag, 1989.
[6] Bardi M. and Soravia P., Approximation of differential games of pursuit-evasion by discrete time games, Differential Games - Developments in Modelling and Computation, R.P. Hämäläinen, H.K. Ehtamo eds., Lecture Notes in Control and Information Sciences, Vol 156, Springer-Verlag, 1991.
[7] Berkovitz L., Differential games of generalized pursuit and evasion, SIAM J. Control and Optimization, Vol 24, no 3, 1986.
[8] Brezis H., Analyse fonctionnelle. Théorie et applications, Mathématiques appliquées pour la maîtrise, 1992.

INRIA

Approximation of the value function of dierential games

25

[9] Capuzzo Dolcetta I., On a discrete approximation of the Hamilton-Jacobi equation of dynamic programming, Appl. Math. Optim. 10, pp 367-377, 1983.
[10] Capuzzo Dolcetta I. and Ishii H., Approximate solutions of the Bellman equation of deterministic control theory, Appl. Math. Optim. 11, pp 161-181, 1984.
[11] Crandall M.G. and Lions P.L., Viscosity solutions of Hamilton-Jacobi equations, Trans. AMS 277, pp 1-42, 1983.
[12] Crandall M.G., Evans L.C. and Lions P.L., Some properties of viscosity solutions of Hamilton-Jacobi equations, Trans. AMS 282, pp 487-502, 1984.
[13] Crandall M.G., Ishii H. and Lions P.L., User's guide to viscosity solutions of second order partial differential equations, Bulletin (New Series) of the American Mathematical Society 27, no 1, 1992.
[14] Elliott R.J. and Kalton N.J., The existence of value in differential games, Mem. Amer. Math. Soc. 126, 1972.
[15] Elliott R.J. and Kalton N.J., The existence of value in differential games, Mem. Amer. Math. Soc. 126, 1972.
[16] Gonzalez R. and Rofman E., On deterministic control problems - An approximation procedure for the optimal cost, SIAM Journal on Control and Optimization 23, 1985.
[17] Gonzalez R. and Tidball M., Sur l'ordre de convergence des solutions discrétisées en temps et en espace de l'équation de Hamilton-Jacobi, Comptes Rendus Acad. Sci. Paris, Tome 314, Série I, pp 479-482, 1992.
[18] Ishii H., Perron's method for Hamilton-Jacobi equations, Duke Math. J. 55, pp 369-384.
[19] Kushner H., Probability methods for approximations in stochastic control and for elliptic equations, Academic Press, Vol 129.
[20] Lions P.L., Generalized solutions of Hamilton-Jacobi equations, Pitman, London, 1982.
[21] Pourtallier O. and Tolwinski B., Discretization of Isaacs' equation: a convergence result, Communication at the Conference on Applied Probability in Engineering, Computer and Communication Sciences, Paris, June 1993.
[22] Raghavan T.E.S. and Filar J., Algorithms for stochastic games - A survey, Methods and Models of Operations Research 35, pp 437-472, 1991.
[23] Roxin E., Axiomatic approach in differential games, J. Optim. Th. Appl. 3, pp 153-163.
[24] Soravia P., The concept of value in differential games of survival and viscosity solutions of Hamilton-Jacobi equations, Differential Integral Equations, to appear.


[25] Tidball M. and González R.L.V., Zero sum differential games with stopping times. Some results about its numerical resolution, Proceedings of the Fifth International Symposium on Dynamic Games and Applications, Grimentz, Switzerland, 15-18 July 1992. Annals of Dynamic Games, Vol 1, 1993.
[26] Varaiya P.P., On the existence of solutions to a differential game, SIAM J. Control 5, pp 153-162, 1967.


Unité de recherche INRIA Lorraine, Technopôle de Nancy-Brabois, Campus scientifique, 615 rue du Jardin Botanique, BP 101, 54600 VILLERS LÈS NANCY
Unité de recherche INRIA Rennes, Irisa, Campus universitaire de Beaulieu, 35042 RENNES Cedex
Unité de recherche INRIA Rhône-Alpes, 655, avenue de l'Europe, 38330 MONTBONNOT ST MARTIN
Unité de recherche INRIA Rocquencourt, Domaine de Voluceau, Rocquencourt, BP 105, 78153 LE CHESNAY Cedex
Unité de recherche INRIA Sophia-Antipolis, 2004 route des Lucioles, BP 93, 06902 SOPHIA-ANTIPOLIS Cedex

Éditeur INRIA, Domaine de Voluceau, Rocquencourt, BP 105, 78153 LE CHESNAY Cedex (France) ISSN 0249-6399
