Copyright © IFAC Control Science and Technology (8th Triennial World Congress) Kyoto, Japan, 1981

TEAM AND GAME THEORY I

INCOMPLETE INFORMATION IN DIFFERENTIAL GAMES AND TEAM PROBLEMS: NECESSARY AND SUFFICIENT OPTIMALITY CONDITIONS

J. Levine

Centre d'Automatique et Informatique, Ecole Nationale Superieure des Mines de Paris, 35, rue Saint Honore, 77305 Fontainebleau, France

Abstract. We study a non-classical structure of incomplete information in N-person noncooperative differential games, team problems and stochastic control, where the noisy observations available to the decision makers are instantaneous and memoryless. Necessary and sufficient optimality conditions are obtained by a dynamic programming method, giving a characterization of the "signaling effect" and a separation principle.

Keywords. Incomplete and local information - Differential Games - Team Problems - Stochastic Control - Nash equilibrium - Non-classical information structure - Optimality conditions - Dynamic Programming - Signaling Effect - Separation Principle.

INTRODUCTION

The purpose of this paper is to characterize the solution of non-cooperative N-person differential games, or of team optimization problems (including the stochastic control problem), in which the crucial point is the lack of information about the state. More precisely, we suppose, for technical reasons, that the players observe the state instantaneously at prescribed instants and that the observation vector belongs to a given discrete space. The information structure is thus non-classical (see [9], [10]). The players then have to choose a decision in view of the observations so as to optimize, in some sense, the expectation of a given criterion. The problems at this stage are:

1°) Do optimal strategies exist?
2°) How can they be characterized?

The answer to the first question is yes, under very general assumptions. For the second point, if we try to write down a Dynamic Programming equation, we rapidly notice that the central role is played by the probability measure for the state to be at some place x at time t, given the past decisions, and that the "geometric" study of its evolution (what we call a measure-trajectory) gives the key to the derivation of optimality conditions. These conditions, in terms of Dynamic Programming, provide a characterization of the so-called "signaling effect" (see [3]). In § I and § II we state the problems of Nash equilibrium and team optimization, both deterministic and stochastic. In § III we give existence theorems, and § IV is devoted to the study of necessary and sufficient optimality conditions. Finally, in § V we give two important corollaries: a characterization of the signaling effect and a separation principle.

I - STATEMENT OF THE PROBLEM

1.0. Before giving a formal presentation of the model, let us introduce the type of problems we shall deal with. The system is given by

$$\dot x(t) = f\big(t, x(t), u(t_j, y(t_j)), v_1(t_j)\big), \qquad y(t_j) = h\big(t_j, x(t_j), v_2(t_j)\big), \qquad \forall t \in [t_j, t_{j+1}[,\ \forall j,$$

where $\{t_j\}$ are the epochs both of the random perturbation shocks and of the observations, $v_1$ and $v_2$ are perturbations on the dynamics and on the observations respectively, $y$ is the observation vector, and $u$ is a piecewise constant control that only takes into account the last observation. We write $u(t_j, y) = (u_1(t_j, y_1), \dots, u_N(t_j, y_N))'$ (the prime denotes transposition) with $y = (y_1, \dots, y_N)'$, where $u_i$ is the control of player $i$ and $y_i$ is player $i$'s observation vector (local information structure). Note that the instantaneousness of the observations, and thus the lack of memory of the decision makers, is what makes the information structure non-classical (see [9]). In the team problem, the cost function to minimize is the expectation of an integral cost plus a final cost, with a target. In the non-cooperative differential game, each player has


a cost function of the preceding nature. Let us now give a precise statement of these problems.

1.1. Let us be given an open set $E \subset \mathbb{R}^n$ as the state space, an open bounded set $X_0 \subset E$ as the set of initial states, a probability measure $P_0$ on $X_0$, and an interval $[t_0, T] \subset \mathbb{R}$ as the maximal duration of the play.

1.2. The random perturbations. Let us be given a sequence of real numbers $t_0 < t_1 < \dots < t_M$ and a family of Borel subsets $\{V_j\}_{j=0,\dots,M}$ of $\mathbb{R}^{m_1+m_2}$. Let $\mathbb{P}(V_j)$ be the set of probability measures on $V_j$, and let

$$P = \bigotimes_{j=0}^{M} P_j, \qquad P_j \in \mathbb{P}(V_j), \quad \forall j = 0, \dots, M.$$

We shall denote by $v = (v_1, v_2) \in \prod_{j=0}^{M} V_j = V$ a random sequence of perturbations obtained according to the probability measure $P$, $V_j$ being the set of admissible perturbations at time $t_j$; $v_1$ (resp. $v_2$) plays the role of a noise on the dynamics (resp. on the observations). The deterministic case is of course included in our formulation, since $P_j$ can be a Dirac measure on a given $v_j$.

1.3. The observations. The information structure. Let $Y = \bigcup_{k\in K}\{Y_k\} \subset \mathbb{R}^q$ be a discrete set, $K$ being a denumerable set of indices. $Y$ is supposed to be the observation space. The observation rule is given by the function

$$h : \bigcup_{j=0}^{M} \{t_j\} \times E \times \mathrm{pr}_2 V_j \to Y,$$

where $\mathrm{pr}_2 V_j$ is the projection of $V_j$ on the $v_2$-component. We make the assumption:

(H2) $\forall k \in K$, $\forall j = 0, \dots, M$:
(i) $h(t_j, \cdot, \cdot)$ is a Borel function on $E \times \mathrm{pr}_2 V_j$;
(ii) $h_j^{-1}(Y_k; v_2)$, the set of $x \in E$ such that $h(t_j, x, v_2) = Y_k$, is an oriented $C^2$ $n$-dimensional manifold with regular pseudo-boundary (see [7]), $\forall v_2 \in \mathrm{pr}_2 V_j$.

In the team and differential game problems, the N players (or decision makers, in the terminology of [10], [3]) observe a given component of the whole observation vector, namely

$$y_i(t_j) = h_i\big(t_j, x(t_j), v_2(t_j)\big) \in Y_i = \{Y_{k,i}\}_{k\in K}, \qquad \forall j = 0, \dots, M,\ \forall i = 1, \dots, N, \qquad (3)$$

is the observation available to player $i$, the whole observation $y$ being defined by $(y_1, \dots, y_N)'$. The information structure is supposed to be instantaneous and memoryless: player $i$ knows at time $t_j$ the value $y_i(t_j)$, and only $y_i(t_j)$; equivalently, when he observes $y_i(t_j)$, he has forgotten $y_i(t_{j-1}), \dots, y_i(t_0)$. Thus the players cannot build a filter of the observations, as in the classical information case in stochastic control (see for example [1], [2]).

1.4. Output feedbacks - Pure and mixed strategies. Let $U_{ijk}$ be a compact subset of $\mathbb{R}^{p_i}$, $\forall i = 1, \dots, N$, $\forall j = 0, \dots, M-1$, $\forall k \in K$. $U_{ijk}$ represents the set of admissible controls for player $i$ at time $t_j$ and for the observation $Y_k$. An output feedback for player $i$ is thus a sequence $\{u_i(t_j, Y_{k,i})\}_{j=0,\dots,M-1,\ k\in K}$ such that $u_i(t_j, Y_{k,i}) \in U_{ijk}$. The set of output feedbacks for player $i$ is defined by

$$U_i = \prod_{j=0}^{M-1} \prod_{k\in K} U_{ijk}, \qquad i = 1, \dots, N. \qquad (5)$$

An output feedback $u_i$ may be considered as a piecewise constant function on each $[t_j, t_{j+1}[$.

For reasons that will appear in the Dynamic Programming analysis, we shall also consider a more general type of control functions that we call pure strategies. Let $\mathbb{P}(E)$ be the set of probability measures on $E$. The set $\mathcal{U}_i$ of player $i$'s pure strategies is the set of every application from $(\mathbb{P}(E))^M$ to $U_i$ having the instantaneousness property: if $p = (p_0, \dots, p_{M-1}) \in (\mathbb{P}(E))^M$,

$$U_i(t_j, y_i, p) = u_i(t_j, y_i, p_j), \qquad \forall j = 0, \dots, M-1,\ \forall y_i \in Y_i,\ \forall i = 1, \dots, N. \qquad (6)$$

In other words, the control selected at $t_j$ depends on the sequence of state distributions only through $p_j$.
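The following Python sketch makes the information structure of (9) concrete for a scalar state and two players. It rests on hypothetical choices that are not in the paper: dynamics $f(t,x,u,v_1) = -x + u_1 + u_2 + v_1$, an observation rule `h` that gives player 1 the sign cell of $x + v_2$ and player 2 the cell $|x + v_2| \ge 1$, and forward Euler integration on each $[t_j, t_{j+1}[$. Each output feedback is a lookup table `feedbacks[i][j][k]` standing for $u_i(t_j, Y_{k,i})$. The control is read once at $t_j$ from the current observation only and then held fixed, which is exactly the memoryless, piecewise constant structure described above.

```python
from typing import Sequence, Tuple

# Hypothetical toy model (not from the paper): scalar state, N = 2 players.
def f(t: float, x: float, u: Sequence[float], v1: float) -> float:
    """Right-hand side of (9): dx/dt = -x + u_1 + u_2 + v_1."""
    return -x + u[0] + u[1] + v1

def h(t: float, x: float, v2: float) -> Tuple[int, int]:
    """Quantized, local observation: player 1 gets the sign cell of x + v_2,
    player 2 gets the magnitude cell |x + v_2| >= 1."""
    z = x + v2
    return (1 if z >= 0 else 0, 1 if abs(z) >= 1 else 0)

def simulate(x0: float, epochs: Sequence[float],
             feedbacks: Sequence[Sequence[Sequence[float]]],
             v1: Sequence[float], v2: Sequence[float],
             steps_per_interval: int = 100) -> float:
    """Integrate (9) with forward Euler and return x at the last epoch.
    feedbacks[i][j][k] plays the role of u_i(t_j, Y_{k,i})."""
    x = x0
    for j in range(len(epochs) - 1):
        t_j, t_next = epochs[j], epochs[j + 1]
        y = h(t_j, x, v2[j])  # only the current observation is used: no memory
        u = [feedbacks[i][j][y[i]] for i in range(len(feedbacks))]  # frozen on [t_j, t_{j+1}[
        dt = (t_next - t_j) / steps_per_interval
        for s in range(steps_per_interval):
            x += dt * f(t_j + s * dt, x, u, v1[j])
    return x

if __name__ == "__main__":
    # Both players output 0.5 in cell 1, so from x0 = 1 the drift is -1 + 0.5 + 0.5 = 0.
    tables = [[[0.0, 0.5]], [[0.0, 0.5]]]
    print(simulate(1.0, [0.0, 1.0], tables, v1=[0.0], v2=[0.0]))  # prints 1.0
```

Euler is used only to keep the sketch short; any ODE integrator can replace the inner loop, since the control is constant on each interval.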

We shall now define the mixtures associated with the output feedbacks and the pure strategies, which we call relaxed output feedbacks and mixed strategies respectively. The set $R_i$ of relaxed output feedbacks for player $i$ is

$$R_i = \prod_{j=0}^{M-1} \mathbb{P}(U_{ij}), \qquad \text{with } U_{ij} = \prod_{k\in K} U_{ijk}. \qquad (7)$$

Precisely, a relaxed output feedback for player $i$ is a probability measure $r_i = \bigotimes_{j=0}^{M-1} r_{ij}$ with $r_{ij} \in \mathbb{P}(U_{ij})$. This means that player $i$ chooses at random a control function on each interval $[t_j, t_{j+1}[$, independently, according to the probability distribution $r_{ij}$.

The set $\mathcal{R}_i$ of mixed strategies for player $i$ is the set of every application from $(\mathbb{P}(E))^M$ to $R_i$ with the instantaneousness property: $\rho_i \in \mathcal{R}_i$ if and only if

$$\rho_i(p) = \bigotimes_{j=0}^{M-1} \rho_{ij}(p_j), \quad \rho_{ij}(p_j) \in \mathbb{P}(U_{ij}), \qquad \forall p = (p_0, \dots, p_{M-1}) \in (\mathbb{P}(E))^M. \qquad (8)$$

Remark that $U_i \subset R_i$ and $\mathcal{U}_i \subset \mathcal{R}_i$, a control function being identified with the Dirac measure concentrated on it.

1.5. The trajectories. Let us be given a continuous function $f : [t_0, T] \times E \times \mathbb{R}^p \times \mathbb{R}^{m_1} \to \mathbb{R}^n$, where $p = p_1 + \dots + p_N$, satisfying the following assumption:

(H1) (i) there exists $c \in L^1(t_0, T)$ such that
$$|\langle x, f(t, x, u, v_1)\rangle| \le c(t)\,(1 + \|x\|^2), \qquad \forall (t, x, u, v_1) \in [t_0, T] \times E \times \mathbb{R}^p \times \mathbb{R}^{m_1};$$
(ii) $f(t, \cdot, u, v_1) \in C^2(E; \mathbb{R}^n)$ and $\partial f / \partial x$ is uniformly bounded in $(t, x, u, v_1)$.

We consider the differential system

$$\dot x(t) = f\big(t, x(t), u(t_j, y(t_j)), v_1(t_j)\big), \qquad y(t_j) = h\big(t_j, x(t_j), v_2(t_j)\big), \qquad \forall t \in [t_j, t_{j+1}[,\ \forall j, \qquad (9)$$

with $u(t_j, y(t_j)) = (u_1(t_j, y_1(t_j)), \dots, u_N(t_j, y_N(t_j)))'$. The trajectories of (9) starting from $x_0 \in X_0$ are well defined, although the right-hand side is not generally continuous with respect to $x$ (see [7]).

1.6. The cost functions. We shall define the cost functions in the context of N-person games; in the cases of teams and control the modifications are obvious. The target $C$ is a $C^2$ $(n+1)$-dimensional manifold with regular pseudo-boundary, satisfying

$$[T, +\infty[ \times \mathbb{R}^n \subset C. \qquad (10)$$

The final time $\theta(u, v, x_0)$ for a trajectory $x(\cdot)$ solution of (9) with initial condition $(t_0, x_0)$ is

$$\theta(u, v, x_0) = \inf\{t \in [t_0, T] \mid (t, x(t)) \in C\}.$$

The integral cost $g_i$ for player $i$ is a continuous function from $[t_0, T] \times E \times \mathbb{R}^p$ to $\mathbb{R}$, and the terminal cost $G_i$ for player $i$ is a continuous function from $[t_0, T] \times E$ to $\mathbb{R}$. Player $i$'s cost function for output-feedback N-tuples is defined by

$$J_i(u, t_0, P_0) = \int_{X_0 \times V} J_i(u, v, t_0, x_0)\, dP_0(x_0)\, dP(v), \qquad (11)$$

where

$$J_i(u, v, t_0, x_0) = G_i\big(\theta(u, v, x_0), x(\theta(u, v, x_0))\big) + \int_{t_0}^{\theta(u, v, x_0)} g_i\big(t, x(t), u(t, y(t))\big)\, dt. \qquad (12)$$

For relaxed output-feedback N-tuples, (11) is easily extended:

$$J_i(r, t_0, P_0) = \int_U J_i(u, t_0, P_0)\, dr(u), \qquad \text{with } r = \bigotimes_{i=1}^N r_i,\ r_i \in R_i, \text{ and } U = \prod_{i=1}^N U_i.$$
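The costs (11) and (12), and their relaxed extension, can be evaluated on a toy problem to see how the pieces fit together. The sketch below is an illustration under stated assumptions, not the paper's method: a scalar state with one decision maker (the control case), hypothetical epochs `EPOCHS = [0.0, 0.5, 1.0]`, dynamics $\dot x = u + v_1$, observation cell $k = 1$ if $x > 1$ and $0$ otherwise, running cost $g = 1$, terminal cost $G(t,x) = x^2$, and target $C = \{x \le 0\} \cup [T, +\infty[ \times \mathbb{R}$ so that (10) holds. The product measure $P_0 \otimes P$ is replaced by an empirical measure over sample pairs $(x_0, v_1)$, and a relaxed output feedback is a list of weighted control rows per interval, drawn independently as in (7).

```python
import itertools
from typing import Sequence, Tuple

EPOCHS = [0.0, 0.5, 1.0]   # hypothetical t_0 < t_1 < t_2 = T
Row = Sequence[float]      # a control function on [t_j, t_{j+1}[: cell k -> u

def path_cost(x0: float, table: Sequence[Row], v1: Sequence[float], steps: int = 50) -> float:
    """J(u, v, t_0, x_0) of (12) for a toy model (assumptions, not the paper's):
    dx/dt = u + v_1, observation cell k = 1 if x > 1 else 0 (noise-free),
    running cost g = 1, terminal cost G(t, x) = x**2, and target
    C = {x <= 0} union [T, +inf[ x R, so that theta <= T as in (10)."""
    x, cost = x0, 0.0
    for j in range(len(EPOCHS) - 1):
        k = 1 if x > 1.0 else 0            # instantaneous, memoryless observation
        u = table[j][k]
        dt = (EPOCHS[j + 1] - EPOCHS[j]) / steps
        for _ in range(steps):
            if x <= 0.0:                   # (t, x(t)) has reached C: stop at theta
                return cost + x * x
            x += dt * (u + v1[j])
            cost += dt                     # running integral of g = 1
    return cost + x * x                    # theta = T

def expected_cost(table: Sequence[Row], samples: Sequence[Tuple[float, Sequence[float]]]) -> float:
    """(11) with P_0 x P replaced by the empirical measure of (x_0, v_1) samples."""
    return sum(path_cost(x0, table, v1) for x0, v1 in samples) / len(samples)

def relaxed_cost(choices: Sequence[Sequence[Tuple[Row, float]]],
                 samples: Sequence[Tuple[float, Sequence[float]]]) -> float:
    """J(r) = integral of J(u) dr(u) for a product measure r = r_0 (x) r_1:
    on each interval j an independent row is drawn from choices[j]."""
    total = 0.0
    for combo in itertools.product(*choices):
        weight = 1.0
        for _, prob in combo:
            weight *= prob
        total += weight * expected_cost([row for row, _ in combo], samples)
    return total

if __name__ == "__main__":
    samples = [(2.0, [0.0, 0.0])]                          # Dirac P_0 and P: deterministic case
    slow, idle = [-1.0, -1.0], [0.0, 0.0]
    choices = [[(slow, 0.5), (idle, 0.5)], [(slow, 1.0)]]  # coin flip on [t_0, t_1[ only
    print(relaxed_cost(choices, samples))                  # 0.5 * 2.0 + 0.5 * 3.25
```

Because the relaxed cost is an average of output-feedback costs, it is linear in each $r_{ij}$; this linearity is what the existence proof of Theorem 1 later exploits.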


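To define player $i$'s cost function for pure and mixed strategies, let us introduce the concept of measure-trajectory. Let

$$p_t^{u,v}(B) = P_0\{x_0 \in X_0 \mid [x^{u,v}(t)](x_0) \in B\}, \qquad \forall B \text{ Borel subset of } \mathbb{R}^n,$$

where $[x^{u,v}(\cdot)](x_0)$ is the trajectory solution of (9) with initial condition $(t_0, x_0)$, generated by $u \in U$ and $v \in V$. The application $t \mapsto p_t^{u,v}$ is called a measure-trajectory generated by $(u, v)$. It is also possible to define

$$p_t^{r}(B) = \int_{U \times V} p_t^{u,v}(B)\, dr\, dP$$

for every relaxed output-feedback N-tuple $r = \bigotimes_{i=1}^N r_i \in \bigotimes_{i=1}^N R_i$.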
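Numerically, a measure-trajectory can be approximated by pushing a cloud of samples drawn from $P_0$ through (9): each sample is one initial state $x_0$, all samples follow the same output feedback $u$, and the fraction of samples lying in a set $B$ at time $t$ estimates $p_t^{u,v}(B)$. The sketch assumes a hypothetical toy model (scalar state, one decision maker, $\dot x = -x + u$, sign-cell observation, $v$ fixed at zero); the function names are illustrative only, and $B$ is restricted to intervals.

```python
import random
from typing import List, Sequence

def measure_trajectory(particles: Sequence[float], epochs: Sequence[float],
                       table: Sequence[Sequence[float]], steps: int = 100) -> List[List[float]]:
    """Particle approximation of t -> p_t^{u,v} at the epochs t_0, ..., t_M.
    Toy model (assumption): dx/dt = -x + u, one decision maker, observation
    cell k = 1 if x >= 0 else 0, v fixed at zero. Each particle is one x_0
    drawn from P_0 and follows its own trajectory under the same feedback u."""
    xs = [float(x) for x in particles]
    clouds = [list(xs)]
    for j in range(len(epochs) - 1):
        dt = (epochs[j + 1] - epochs[j]) / steps
        moved = []
        for x in xs:
            u = table[j][1 if x >= 0 else 0]   # the particle's own observation at t_j
            for _ in range(steps):
                x += dt * (-x + u)
            moved.append(x)
        xs = moved
        clouds.append(list(xs))
    return clouds

def prob_of(cloud: Sequence[float], lo: float, hi: float) -> float:
    """Empirical estimate of p_t(B) for the Borel set B = [lo, hi)."""
    return sum(lo <= x < hi for x in cloud) / len(cloud)

if __name__ == "__main__":
    rng = random.Random(0)
    p0 = [rng.uniform(-1.0, 1.0) for _ in range(10_000)]   # samples from a uniform P_0
    clouds = measure_trajectory(p0, [0.0, 1.0, 2.0], [[1.0, -1.0], [0.0, 0.0]])
    for t, cloud in zip([0.0, 1.0, 2.0], clouds):
        print(t, prob_of(cloud, 0.0, float("inf")))
```

With the table `[1.0, -1.0]` on the first interval, negative states are pushed up and positive states pushed down, so the cloud changes shape even though no single player remembers anything: the distribution itself carries the information.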

Finally, to define the cost functions for pure and mixed strategies, we introduce the following induction. Let $\rho = \bigotimes_{i=1}^N \rho_i$ with $\rho_i \in \mathcal{R}_i$. Set $p_0 = P_0$ and, for $j = 0, \dots, M-1$,

$$r_{i,j} = \rho_{ij}(p_j), \quad i = 1, \dots, N, \qquad p_{j+1} = \lim_{t \to t_{j+1}^-} p_t^{\,r^{(j)}}, \qquad (13)$$

where $p_t^{\,r^{(j)}}$, $t_0 \le t < t_{j+1}$, is the measure-trajectory generated by the controls $r_{i,l}$, $i = 1, \dots, N$, $l = 0, \dots, j$, already selected. Let $r_i = r_{i,0} \otimes \dots \otimes r_{i,M-1} \in R_i$ and $r = \bigotimes_{i=1}^N r_i$. Since the cost function has already been defined for such N-tuples, we set

$$J_i(\rho, t_0, P_0) = J_i(r, t_0, P_0), \qquad \forall i = 1, \dots, N,$$

and this definition includes the one for pure strategies.
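The induction (13) is what makes a strategy that depends on the state distribution well defined: at each epoch the current distribution $p_j$ is computed from the controls already chosen, and only then is the next control function selected. The sketch below covers the deterministic special case of a pure strategy (each $\rho_{ij}(p_j)$ a Dirac measure). It assumes hypothetical toy dynamics $\dot x = u$, integrated exactly because $u$ is constant on each interval, a sign-cell observation `cell`, a particle cloud standing in for $p_j$, and an illustrative strategy `recenter` that reads only the mean of $p_j$.

```python
from statistics import fmean
from typing import Callable, List, Sequence, Tuple

Row = List[float]                                  # control on one interval: cell k -> u
Strategy = Callable[[int, Sequence[float]], Row]   # (j, particle cloud for p_j) -> row

def cell(x: float) -> int:
    """Hypothetical observation rule: sign cell of the state."""
    return 1 if x >= 0 else 0

def unroll(strategy: Strategy, particles: Sequence[float],
           epochs: Sequence[float]) -> Tuple[List[Row], List[float]]:
    """Output feedback generated by a pure strategy through the induction (13):
    p_0 = P_0, row_j = strategy(j, p_j), p_{j+1} = image of p_j under the flow.
    Toy dynamics (assumption): dx/dt = u, integrated exactly since u is constant."""
    xs = [float(x) for x in particles]   # empirical stand-in for p_0 = P_0
    rows: List[Row] = []
    for j in range(len(epochs) - 1):
        row = strategy(j, xs)            # depends on p_j only: instantaneousness (6)
        rows.append(row)
        dt = epochs[j + 1] - epochs[j]
        xs = [x + dt * row[cell(x)] for x in xs]
    return rows, xs

def recenter(j: int, cloud: Sequence[float]) -> Row:
    """Hypothetical pure strategy: shift every cell by minus the current mean."""
    m = fmean(cloud)
    return [-m, -m]

if __name__ == "__main__":
    rows, final = unroll(recenter, [1.0, 3.0], [0.0, 1.0, 2.0])
    print(rows, final)   # the rows form an ordinary output feedback u, and J(rho) := J(u)
```

The output of `unroll` is an ordinary output feedback, which is why the cost of a pure or mixed strategy can be defined through the cost already given for output feedbacks.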

II - The problems

II.1. Nash equilibrium in stochastic N-person differential games with incomplete information. The problem is to find an N-tuple $(r_1^*, \dots, r_N^*) \in \prod_{i=1}^N R_i$ (resp. $\prod_{i=1}^N \mathcal{R}_i$) satisfying the Nash equilibrium property in relaxed output feedbacks (resp. in mixed strategies):

$$(\mathrm{N.E.}) \qquad J_i(r_1^*, \dots, r_{i-1}^*, r_i^*, r_{i+1}^*, \dots, r_N^*) \le J_i(r_1^*, \dots, r_{i-1}^*, r_i, r_{i+1}^*, \dots, r_N^*), \qquad \forall r_i \in R_i \ (\text{resp. } \mathcal{R}_i),\ \forall i = 1, \dots, N.$$

It is clear that we include the deterministic case, since we can choose $P$ as the Dirac measure on a single $v$.

II.2. Team optimization stochastic problem with local information structure. In this case we suppose that $J_i = J$, $\forall i = 1, \dots, N$. We want to find $(u_1^*, \dots, u_N^*) \in U$ (resp. $\mathcal{U}$) that minimizes $J$, that is:

$$(\mathrm{T.O.}) \qquad J(u_1^*, \dots, u_N^*) \le J(u_1, \dots, u_N), \qquad \forall (u_1, \dots, u_N) \in U \ (\text{resp. } \mathcal{U}).$$
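When each $U_i$ is finite, $J_i$ is linear in each $r_i$, so a candidate $(r_1^*, r_2^*)$ satisfies (N.E.) exactly when no pure deviation $u_i \in U_i$ lowers $J_i$. The two-player sketch below checks this condition; the cost matrices are hypothetical stand-ins for the expectations (11), for instance as estimated by simulation, and the "matching pennies" numbers are illustrative only.

```python
from itertools import product
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]

def expected(J: Matrix, r1: Sequence[float], r2: Sequence[float]) -> float:
    """J(r_1, r_2) = sum over (a, b) of r_1[a] r_2[b] J[a][b]: the relaxed cost."""
    return sum(r1[a] * r2[b] * J[a][b] for a in range(len(r1)) for b in range(len(r2)))

def dirac(n: int, a: int) -> List[float]:
    """Dirac measure on the a-th element of a finite U_i."""
    return [1.0 if b == a else 0.0 for b in range(n)]

def is_nash(J1: Matrix, J2: Matrix, r1: Sequence[float], r2: Sequence[float],
            tol: float = 1e-12) -> bool:
    """(N.E.) for two cost-minimizing players with finite U_1, U_2.
    J_i is linear in r_i, so testing Dirac (pure) deviations is enough."""
    v1, v2 = expected(J1, r1, r2), expected(J2, r1, r2)
    if any(expected(J1, dirac(len(r1), a), r2) < v1 - tol for a in range(len(r1))):
        return False
    return not any(expected(J2, r1, dirac(len(r2), b)) < v2 - tol for b in range(len(r2)))

if __name__ == "__main__":
    J1 = [[1.0, -1.0], [-1.0, 1.0]]      # hypothetical "matching pennies" costs
    J2 = [[-1.0, 1.0], [1.0, -1.0]]
    pure = [is_nash(J1, J2, dirac(2, a), dirac(2, b)) for a, b in product(range(2), repeat=2)]
    print(pure, is_nash(J1, J2, [0.5, 0.5], [0.5, 0.5]))   # no pure Nash point; uniform is one
```

This toy game has no Nash point among output feedbacks but has one among relaxed output feedbacks, which is why the game problem is posed over $R_i$ rather than $U_i$.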

Remark: The (N.E.) problem obviously includes zero-sum differential games; similarly, the (T.O.) problem includes the control problem with incomplete (and non-classical) information. In (N.E.), we cannot generally hope to find a Nash point in pure strategies as in (T.O.), and this is why we use mixed strategies in this context. We have the following comparison result.

Proposition: If $(r_1^*, \dots, r_N^*) \in \prod_{i=1}^N R_i$ is a Nash point in relaxed output feedbacks for (N.E.), it is also a Nash point in mixed strategies. Similarly, if $(u_1^*, \dots, u_N^*) \in U$ is an output-feedback solution of (T.O.), it is also a solution in pure strategies, in relaxed output feedbacks, and in mixed strategies.

Proof: If $\rho_i \in \mathcal{R}_i$, it is clear that one can build, by (13), a measure-trajectory generated by $(r_1^*, \dots, r_{i-1}^*, \rho_i, r_{i+1}^*, \dots, r_N^*)$. Noting $r_i \in R_i$ the relaxed output feedback thus obtained, we have

$$J_i(r_1^*, \dots, r_{i-1}^*, \rho_i, r_{i+1}^*, \dots, r_N^*) = J_i(r_1^*, \dots, r_{i-1}^*, r_i, r_{i+1}^*, \dots, r_N^*) \ge J_i(r_1^*, \dots, r_N^*), \qquad \forall i = 1, \dots, N.$$

For (T.O.), the same construction reduces pure and mixed strategies to relaxed output feedbacks, and $J(r) = \int_U J(u)\,dr(u) \ge J(u^*)$ for every relaxed $r$, since $J(u) \ge J(u^*)$ for all $u \in U$.

III - Existence results

We shall introduce the assumption

(H3): either $P_0$ is absolutely continuous with respect to $\mu$, the Lebesgue measure of $\mathbb{R}^n$, or $m_1 = n$, $\frac{\partial f}{\partial v_1}(t_0, x, u, v_1)$ is invertible $\forall x, u, v_1$, and $\mathrm{pr}_1 P_0$ (projection of $P_0$ defined by (1)) is absolutely continuous with respect to $\mu$.

Theorem 1: Suppose that (H1) to (H3) hold. Then:
(i) (N.E.) has a Nash point in relaxed output feedbacks (a fortiori in mixed strategies);
(ii) (T.O.) has a solution in output feedbacks (a fortiori in pure strategies).

Proof: (i) The proof is based on the following (non-trivial) property (see [7]): $\forall i = 1, \dots, N$, $u \mapsto J_i(u)$ is continuous on $U$ endowed with the product topology, and with this topology $U$ is compact. It follows that the application $r \mapsto J_i(r)$ is linear and vaguely continuous on the vaguely compact and convex set $R = \prod_{i=1}^N R_i$, and a fortiori that $r_i \mapsto J_i(r_1, \dots, r_N)$, $\forall (r_1, \dots, r_{i-1}, r_{i+1}, \dots, r_N)$, and $(r_1, \dots, r_{i-1}, r_{i+1}, \dots, r_N) \mapsto J_i(r_1, \dots, r_N)$, $\forall r_i$, are vaguely continuous. Thus the usual assumptions ensuring the existence of a Nash point are fulfilled, and (i) is proved. (ii) $u \mapsto J(u)$ is continuous (see also [7]) on the compact set $U$, and this suffices to ensure the existence of a minimum of $J$ on $U$. ∎
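Part (ii) rests only on continuity and compactness; when observation cells and control sets are finite, $U$ is finite and a minimizer can be found by enumeration. The sketch below is a hypothetical one-stage team problem chosen so that the answer is known: $x_0$ uniform on $\{0,1,2,3\}$, player 1 observes $y_1 = \lfloor x_0/2 \rfloor$, player 2 observes $y_2 = x_0 \bmod 2$, each picks $u_i \in \{0,1\}$ from its own observation only, and the common cost is $J = E[(x_0 - 2u_1 - u_2)^2]$. The names `team_cost` and `solve_team` are illustrative.

```python
from itertools import product
from typing import Tuple

STATES = [0, 1, 2, 3]     # hypothetical P_0: x_0 uniform on {0, 1, 2, 3}
CELLS = [0, 1]            # each player's observation cells
CONTROLS = [0, 1]         # U_{i0k} = {0, 1} for every cell k

Table = Tuple[int, ...]   # an output feedback u_i: cell -> control

def team_cost(u1: Table, u2: Table) -> float:
    """J(u_1, u_2) = E[(x_0 - 2 u_1(y_1) - u_2(y_2))^2] with local observations."""
    total = 0.0
    for x in STATES:
        y1, y2 = x // 2, x % 2           # each player sees only its own component
        total += (x - 2 * u1[y1] - u2[y2]) ** 2
    return total / len(STATES)

def solve_team() -> Tuple[Tuple[Table, Table], float]:
    """Exhaustive search over U = U_1 x U_2 (finite, hence compact)."""
    tables = list(product(CONTROLS, repeat=len(CELLS)))
    best = min(product(tables, tables), key=lambda pair: team_cost(*pair))
    return best, team_cost(*best)

if __name__ == "__main__":
    # ((0, 1), (0, 1)) with cost 0.0: player 1 reports the high bit, player 2 the low bit
    print(solve_team())
```

Enumeration grows exponentially with the number of cells and epochs, so it only serves to illustrate existence; the optimality conditions of § IV are what characterize solutions in general.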

IV - Necessary and sufficient optimality conditions

Let us denote by

$$V_i(t_0, P_0) = J_i(r_1^*, \dots, r_N^*; t_0, P_0), \qquad \forall i = 1, \dots, N,$$

the optimal values of (N.E.), and by

$$V(t_0, P_0) = \min_{U \in \mathcal{U}} J(U; t_0, P_0)$$

the value function of (T.O.). It is clear, in both cases, that the optimal N-tuple must depend on $(t_0, P_0)$. In fact, one can see that it depends on the whole measure-trajectory $t \mapsto p_t$, and this justifies the introduction of pure and mixed strategies.
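A minimal example shows why the optimal N-tuple must depend on $(t_0, P_0)$. Take a single decision maker with no informative observation who minimizes $E_{P_0}[(x_0 - u)^2]$ over a finite control set: the minimizer is the control nearest the mean of $P_0$, so changing $P_0$ changes the optimal control. The model, the function name `value_and_argmin`, and the numbers are hypothetical.

```python
from typing import Dict, Sequence, Tuple

def value_and_argmin(p0: Dict[float, float], controls: Sequence[float]) -> Tuple[float, float]:
    """V(P_0) = min over u of E_{P_0}[(x_0 - u)^2] for a discrete P_0, with a minimizer.
    Hypothetical one-stage control problem without informative observation."""
    def cost(u: float) -> float:
        return sum(w * (x - u) ** 2 for x, w in p0.items())
    u_star = min(controls, key=cost)
    return cost(u_star), u_star

if __name__ == "__main__":
    controls = [0.0, 1.0, 2.0]
    print(value_and_argmin({0.0: 0.8, 2.0: 0.2}, controls))   # minimizer 0.0
    print(value_and_argmin({0.0: 0.2, 2.0: 0.8}, controls))   # minimizer 2.0
```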

More precisely, let

$$V_i(t, p) = J_i(\rho_{1,t,p}^*, \dots, \rho_{N,t,p}^*; t, p), \qquad \forall i = 1, \dots, N, \qquad (15)$$

where $(\rho_{1,t,p}^*, \dots, \rho_{N,t,p}^*)$ is the Nash solution from $(t, p)$, and


… is the … measure on …; … has a density … (18)

$G_{ij}(u, v, x) = \dots \int^{\theta(u,v,x)} g_i\big(t, [x^{u,v}(t)](x), u\big)\, dt$

… obtained in the sense of distributions by:
