Copyright © IFAC Control Science and Technology (8th Triennial World Congress) Kyoto, Japan, 1981
TEAM AND GAME THEORY I
INCOMPLETE INFORMATION IN DIFFERENTIAL GAMES AND TEAM PROBLEMS: NECESSARY AND SUFFICIENT OPTIMALITY CONDITIONS
J. Levine
Centre d'Automatique et Informatique, Ecole Nationale Supérieure des Mines de Paris, 35, rue Saint Honoré, 77305 Fontainebleau, France
Abstract. We study a non-classical structure of incomplete information in N-person noncooperative differential games, team problems and stochastic control, where the noisy observations available to the decision makers are instantaneous and memoryless. Necessary and sufficient optimality conditions are obtained by a dynamic programming method, giving a characterization of the "signaling effect" and a separation principle.

Keywords. Incomplete and local information - Differential Games - Team Problems - Stochastic Control - Nash equilibrium - Non-classical information structure - Optimality conditions - Dynamic Programming - Signaling Effect - Separation Principle.

INTRODUCTION

The purpose of this paper is to characterize the solution of Non-Cooperative N-person Differential Games or of Team Optimization Problems (including the stochastic control problem) in which the crucial point is the lack of information about the state. More precisely, we suppose, for technical reasons, that the players observe the state instantaneously at prescribed instants and that the observation vector belongs to a given discrete space. The structure of information is thus non-classical (see [9], [10]). The players then have to choose a decision in view of the observations, so as to optimize in some sense the expectation of a given criterion. The problems at this stage are:

1°) Do optimal strategies exist?
2°) How to characterize them?

The answer to the first question is yes, under very general assumptions. For the second point, if we try to write down a Dynamic Programming equation, we rapidly notice that the central role is played by the probability measure for the state to be at some place $x$ at time $t$, given the past decisions, and that the "geometric" study of its evolution (what we call measure-trajectory) gives the key to the derivation of optimality conditions. These conditions, in terms of Dynamic Programming, provide a characterization of the so-called "Signaling Effect" (see [3]).

In §I, II we state the problems of Nash Equilibrium and Team Optimization, both deterministic and stochastic. In §III, we give existence theorems, and §IV is devoted to the study of necessary and sufficient optimality conditions. Finally, in §V we give two important corollaries: the signaling characterization and a separation principle.

STATEMENT OF THE PROBLEM

1.0. Before giving a formal presentation of the model, let us introduce the type of problems we shall deal with. The system is given by

$$\dot{x}(t) = f(t, x(t), u(t_j, y(t_j)), v_1(t_j)), \qquad y(t_j) = h(t_j, x(t_j), v_2(t_j)), \qquad \forall t \in [t_j, t_{j+1}[,\ \forall j,$$

where $\{t_j\}$ are the epochs both of random perturbation shocks and of observations, where $v_1$, $v_2$ are perturbations on the dynamics and on the observations respectively, where $y$ is the observation vector, and where $u$ is a piecewise constant control taking into account only the last observation. If $u(t_j, y) = (u_1(t_j, y_1), \dots, u_N(t_j, y_N))'$ (prime denotes transposition) with $y = (y_1, \dots, y_N)'$, then $u_i$ is the control of player $i$ and $y_i$ is player $i$'s observation vector (local information structure). Note that the instantaneousness of the observations, and thus the lack of memory for the decision maker, refers to the Non-Classical Information Structure (see [9]). In the team problem, the cost function to minimize is the expectation of an integral + final cost with a target. In the Non-cooperative Differential Game, each player has
a cost function of the preceding nature. Let us now give a precise statement of these problems.

1.1. Let us be given an open set $E \subset \mathbb{R}^n$ as the State Space, an open bounded set $X_0 \subset E$ as the set of initial states, and $P_0$ a probability measure on $X_0$. Let us be given a sequence of real numbers $t_0 < t_1 < \dots < t_M$ and an interval $[t_0, T] \subset \mathbb{R}$ as the maximal duration of the play.

1.2. The random perturbations.
Let us be given a family of Borel subsets $\{V_j\}_{j=0,\dots,M}$ of $\mathbb{R}^{m_1+m_2}$. Let $\mathbb{P}(V_j)$ be the set of probability measures on $V_j$, and let us denote:

$$P = \bigotimes_{j=0}^{M} p_j, \qquad p_j \in \mathbb{P}(V_j).$$

We shall denote by $v = (v_1, v_2) \in \prod_{j=0}^{M} V_j = V$ a random sequence of perturbations obtained according to the probability measure $P$, $V_j$ being the set of admissible perturbations at time $t_j$; $v_1$ (resp. $v_2$) plays the role of a noise on the dynamics (resp. observations). The deterministic case is of course included in our formulation since $p_j$ can be a Dirac measure on a given $v_j$, $\forall j = 0, \dots, M$.

1.3. The observations. The Information Structure.
Let $Y = \bigcup_{k \in K} \{y_k\} \subset \mathbb{R}^q$ be a discrete set, $K$ being a denumerable set of indices. $Y$ is supposed to be the observations space. The observation rule is given by the function

$$h : \bigcup_{j=0}^{M} \big(\{t_j\} \times E \times \mathrm{pr}_2 V_j\big) \to Y, \qquad y_k = h(t_j, x, v_2), \quad \forall x \in E,\ \forall j = 0, \dots, M,$$

where $\mathrm{pr}_2 V_j$ denotes the projection of $V_j$ on its second component $v_2$.

In the team and differential game problems, the $N$ players (or decision makers in the terminology of [10], [3]) observe a given component of the whole observation vector, namely:

$$y_{k,i} = h_i(t_j, x, v_2), \qquad \forall i = 1, \dots, N,\ \forall k \in K,\ \forall j = 0, \dots, M. \tag{3}$$

$y_i$ is the observation available for player $i$, the whole observation $y$ being defined by $(y_1, \dots, y_N)'$. The information structure is supposed to be instantaneous and memoryless, that is, player $i$ knows at time $t_j$ only $y_i(t_j)$, or equivalently, when he observes $y_i(t_j)$, he has forgotten $y_i(t_{j-1}), \dots, y_i(t_0)$. Thus it is clear that the players cannot build a filter of the observations as in the classical information case in stochastic control (see for example [1], [2]).

We make the assumption (H1): $\forall k \in K$,
(i) $h(t_j, \cdot, \cdot)$ is a Borel function on $E \times \mathrm{pr}_2 V_j$, $\forall j = 0, \dots, M$;
(ii) $h_j^{-1}(y_k; v_2)$ is an oriented $C^2$ $n$-dimensional manifold with regular pseudo-boundary (see [7]), $\forall j = 0, \dots, M-1$.

1.4. Output feedbacks - Pure and Mixed Strategies.
Let $U_{ijk}$ be a compact subset of $\mathbb{R}^{p_i}$, $\forall i = 1, \dots, N$, $\forall j = 0, \dots, M-1$, $\forall k \in K$. $U_{ijk}$ represents the set of admissible controls for player $i$ at time $t_j$ and for the observation $y_k$. The set of output feedbacks for player $i$ is defined by:

$$U_i = \prod_{j=0}^{M-1} \prod_{k \in K} U_{ijk}, \qquad i = 1, \dots, N. \tag{5}$$

An output feedback for player $i$ is thus a sequence $\{u_i(t_j, y_{k,i})\}_{j=0,\dots,M-1,\ k \in K}$ such that $u_i(t_j, y_{k,i}) \in U_{ijk}$; $u_i$ may be considered as a piecewise constant function on each $[t_j, t_{j+1}[$.

For reasons that will appear in the Dynamic Programming analysis, we shall also consider a more general type of control functions that we call pure strategies. Let $\mathbb{P}(E)$ be the set of probability measures on $E$. The set $\mathcal{U}_i$ of player $i$'s pure strategies is the set of all mappings from $(\mathbb{P}(E))^M$ to $U_i$ having the instantaneousness property: if $p = (p_0, \dots, p_{M-1}) \in (\mathbb{P}(E))^M$,

$$u_i(t_j, y_i, p) = u_i(t_j, y_i, p_j), \qquad \forall j = 0, \dots, M-1,\ \forall y_i \in Y_i,\ \forall i = 1, \dots, N. \tag{6}$$
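The memoryless output-feedback structure of 1.0 and 1.4 can be illustrated by a minimal simulation sketch. Everything specific in it is an assumption made for illustration only and does not come from the paper: the scalar dynamics $f = -x + u + v_1$, the three-point observation space $Y = \{-1, 0, 1\}$, the quantizing rule `observe`, the feedback table and the Gaussian noise. The point it shows is only that the control on $[t_j, t_{j+1}[$ is read from a table indexed by the epoch and the current observation, with no memory of past observations.

```python
import random

def observe(x, v2):
    """Hypothetical observation rule h: quantize the noisy state to Y = {-1, 0, 1}."""
    z = x + v2
    if z > 0.5:
        return 1
    if z < -0.5:
        return -1
    return 0

def simulate(x0, epochs, feedback, noise, steps_per_interval=100):
    """Euler integration of x' = -x + u + v1 with u frozen on each [t_j, t_{j+1}[.

    feedback[j][y] is the control applied on [t_j, t_{j+1}[ when y is observed
    at t_j: it depends on the current observation only (memoryless).
    noise[j] = (v1, v2) is the perturbation acting at epoch t_j.
    """
    x = x0
    observations = []
    for j in range(len(epochs) - 1):
        v1, v2 = noise[j]
        y = observe(x, v2)           # instantaneous observation at t_j
        u = feedback[j][y]           # control uses y(t_j) only
        observations.append(y)
        dt = (epochs[j + 1] - epochs[j]) / steps_per_interval
        for _ in range(steps_per_interval):
            x += dt * (-x + u + v1)  # illustrative dynamics, not the paper's f
    return x, observations

epochs = [0.0, 1.0, 2.0, 3.0]
# Same table at every epoch: push the state back toward 0.
feedback = [{-1: 1.0, 0: 0.0, 1: -1.0} for _ in range(len(epochs) - 1)]
rng = random.Random(0)
noise = [(rng.gauss(0.0, 0.1), rng.gauss(0.0, 0.1)) for _ in range(len(epochs) - 1)]
x_final, ys = simulate(2.0, epochs, feedback, noise)
```

Because the table is indexed by `(j, y)` alone, two different state histories that produce the same observation at $t_j$ receive the same control, which is exactly what prevents the player from filtering.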
We shall now define the mixtures associated to the output feedbacks and pure strategies respectively, that we call relaxed output feedbacks and mixed strategies. The set $R_i$ of relaxed output feedbacks for player $i$ is:

$$R_i = \bigotimes_{j=0}^{M-1} \mathbb{P}(U_{ij}) \quad \text{with} \quad U_{ij} = \prod_{k \in K} U_{ijk}. \tag{7}$$

Precisely, a relaxed output feedback for player $i$ is a probability measure $r_i = \bigotimes_{j=0}^{M-1} r_{ij}$ with $r_{ij} \in \mathbb{P}(U_{ij})$. That means that player $i$ will choose at random a control function on each interval $[t_j, t_{j+1}[$ independently, according to the probability distribution $r_{ij}$.

The set $\mathcal{R}_i$ of mixed strategies for player $i$ is the set of all mappings from $(\mathbb{P}(E))^M$ to $R_i$ with the instantaneousness property: $\rho_i \in \mathcal{R}_i$ if and only if, $\forall p = (p_0, \dots, p_{M-1}) \in (\mathbb{P}(E))^M$, we have:

$$\rho_i(p) = \bigotimes_{j=0}^{M-1} \rho_{ij}(p_j) \quad \text{with} \quad \rho_{ij}(p_j) \in \mathbb{P}(U_{ij}). \tag{8}$$

Remark that $U_i \subset R_i$ and $R_i \subset \mathcal{R}_i$.

1.5. The trajectories.
Let us be given a continuous function $f : [t_0, T] \times E \times \mathbb{R}^p \times \mathbb{R}^{m_1} \to \mathbb{R}^n$, where $p = p_1 + \dots + p_N$, satisfying the following assumption (H2):
(i) $\exists c \in L^1(t_0, T)$ such that, $\forall (t, x, u, v_1) \in [t_0, T] \times E \times \mathbb{R}^p \times \mathbb{R}^{m_1}$, $|(x, f(t, x, u, v_1))| \le c(t)(1 + \|x\|^2)$;
(ii) $f(t, \cdot, u, v_1) \in C^2(E; \mathbb{R}^n)$, $\forall (t, u, v_1) \in [t_0, T] \times \mathbb{R}^p \times \mathbb{R}^{m_1}$, and $\partial f / \partial x$ is uniformly bounded in $(t, x, u, v_1)$.

We consider the differential system

$$\dot{x}(t) = f(t, x(t), u(t_j, y(t_j)), v_1(t_j)), \qquad y(t_j) = h(t_j, x(t_j), v_2(t_j)), \qquad \forall t \in [t_j, t_{j+1}[,\ \forall j, \tag{9}$$

with $u(t_j, y(t_j)) = (u_1(t_j, y_1(t_j)), \dots, u_N(t_j, y_N(t_j)))'$. The trajectories of (9) starting from $x_0 \in X_0$ are well defined though the right-hand side is not generally continuous with respect to $x$ (see [7]).

1.6. The cost functions.
We shall define the cost functions in the context of N-person games; in the cases of teams and control the modifications are obvious. The target $\mathcal{C}$ is a $C^2$ $(n+1)$-dimensional manifold with regular pseudo-boundary, satisfying:

$$[T, +\infty[ \times \mathbb{R}^n \subset \mathcal{C}. \tag{10}$$

The final time $t_c(u, v, x_0)$ for a trajectory of (9) with initial condition $(t_0, x_0)$ and generated by $(u, v)$ is

$$t_c(u, v, x_0) = \inf \{ t \in [t_0, T] \mid (t, x(t)) \in \mathcal{C} \}, \qquad x(\cdot) \text{ solution of (9)}.$$

The integral cost $g_i$ for player $i$ is a continuous function from $[t_0, T] \times E \times \mathbb{R}^p$ to $\mathbb{R}$, and the terminal cost $G_i$ for player $i$ is a continuous function from $[t_0, T] \times E$ to $\mathbb{R}$. Player $i$'s cost function for output-feedback N-tuples is defined by

$$J_i(u, t_0, P_0) = \int_{X_0 \times V} J_i(u, v, t_0, x_0)\, dP_0(x_0)\, dP(v), \tag{11}$$

where

$$J_i(u, v, t_0, x_0) = G_i\big(t_c(u, v, x_0), x(t_c)\big) + \int_{t_0}^{t_c} g_i\big(t, x(t), u(t, y(t))\big)\, dt. \tag{12}$$

For relaxed output-feedback N-tuples, (11) is easily extended:

$$J_i(r, t_0, P_0) = \int_{U} J_i(u, t_0, P_0)\, dr(u),$$

with $r = \bigotimes_{i=1}^{N} r_i$, $r_i \in R_i$, and $U = \prod_{i=1}^{N} U_i$.

To define player $i$'s cost function for pure and mixed strategies, let us introduce the concept of measure-trajectory. Let

$$P_t^{u,v}(B) = P_0\big(\{ x_0 \in X_0 \mid [x^{u,v}(t)](x_0) \in B \}\big), \qquad \forall B \text{ Borel subset of } \mathbb{R}^n,$$

where $[x^{u,v}(\cdot)](x_0)$ is the trajectory solution of (9) with initial condition $(t_0, x_0)$, generated by $u \in U$ and $v \in V$. The mapping $t \to P_t^{u,v}$ is called a measure-trajectory generated by $(u, v)$. It is also possible to define

$$P_t^{r}(B) = \int_{U \times V} P_t^{u,v}(B)\, dr\, dP$$

for every relaxed output-feedback N-tuple $r = \bigotimes_{i=1}^{N} r_i \in \bigotimes_{i=1}^{N} R_i$.
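The measure-trajectory $t \to P_t$ can be approximated numerically by pushing samples of the initial measure through the dynamics. The following is a minimal Monte Carlo sketch under assumptions chosen purely for illustration (none of them is from the paper): scalar dynamics $\dot{x} = u + v_1$, which integrate exactly on each interval; a two-point observation $y = \pm 1$ given by the sign of the state, with no observation noise; $P_0$ uniform on $X_0 = (-1, 1)$; and independent Gaussian shocks $v_1$. The function names `state_at` and `measure_trajectory` are hypothetical.

```python
import random

def state_at(t, x0, epochs, feedback, v1):
    """State at time t for x' = u + v1[j] on [t_j, t_{j+1}[ (illustrative f),
    with u = feedback[j][y] and y = sign of the state observed at t_j."""
    x = x0
    for j in range(len(epochs) - 1):
        if t <= epochs[j]:
            break
        y = 1 if x >= 0 else -1      # memoryless observation at t_j
        u = feedback[j][y]
        end = min(t, epochs[j + 1])
        x += (u + v1[j]) * (end - epochs[j])
    return x

def measure_trajectory(t, B, epochs, feedback, n_samples=20000, seed=0):
    """Monte Carlo estimate of P_t(B) for an open interval B = (a, b)."""
    rng = random.Random(seed)
    a, b = B
    hits = 0
    for _ in range(n_samples):
        x0 = rng.uniform(-1.0, 1.0)                        # sample of P_0
        v1 = [rng.gauss(0.0, 0.05) for _ in epochs[:-1]]   # sample of P
        if a < state_at(t, x0, epochs, feedback, v1) < b:
            hits += 1
    return hits / n_samples

epochs = [0.0, 0.5, 1.0]
feedback = [{-1: 1.0, 1: -1.0}, {-1: 1.0, 1: -1.0}]
p_start = measure_trajectory(0.0, (-0.5, 0.5), epochs, feedback)
p_later = measure_trajectory(1.0, (-0.5, 0.5), epochs, feedback)
```

With this feedback the mass of $P_t$ concentrates around the origin as $t$ grows, so the estimate for $B = (-0.5, 0.5)$ increases between $t = 0$ and $t = 1$. A randomized (relaxed) feedback would be handled the same way by also sampling the control table in each run.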
Finally, to define the cost functions for pure and mixed strategies, we introduce the following induction. Let $\rho = \bigotimes_{i=1}^{N} \rho_i \in \bigotimes_{i=1}^{N} \mathcal{R}_i$, and let $r_i \in R_i$ be obtained by the following induction:

$$r_{i,j} = \rho_{i,j}(P^-_{t_j}), \quad \text{with } P^-_{t_0} = P_0 \text{ and } P^-_{t_j} = \lim_{t \to t_j,\ t < t_j} P_t^{r}, \qquad \forall j = 0, \dots, M-1,\ \forall i = 1, \dots, N. \tag{13}$$

Then, since the cost function has already been defined for $r = \bigotimes_{i=1}^{N} r_i$, we set

$$J_i(\rho, t_0, P_0) = J_i(r, t_0, P_0), \qquad \forall i = 1, \dots, N,$$

and this definition includes the one for pure strategies.

II - The problems

II.1. Nash equilibrium in stochastic N-person differential games with incomplete information.
The problem is to find an N-tuple $(r_1^*, \dots, r_N^*) \in \bigotimes_{i=1}^{N} R_i$ (resp. $\bigotimes_{i=1}^{N} \mathcal{R}_i$) satisfying the Nash equilibrium property in relaxed output feedbacks (resp. in mixed strategies):

$$\text{(N.E.)} \qquad J_i(r_1^*, \dots, r_{i-1}^*, r_i^*, r_{i+1}^*, \dots, r_N^*) \le J_i(r_1^*, \dots, r_{i-1}^*, r_i, r_{i+1}^*, \dots, r_N^*), \qquad \forall r_i \in R_i \ (\text{resp. } \mathcal{R}_i),\ \forall i = 1, \dots, N.$$

It is clear that we include the deterministic case since we can choose $P$ as the Dirac measure on a single $v$.

II.2. Team optimization stochastic problem with local information structure.
In this case we suppose that $J_i = J$, $\forall i = 1, \dots, N$. We want to find $(u_1^*, \dots, u_N^*) \in U$ (resp. $\mathcal{U}$) that minimizes $J$, that is:

$$\text{(T.O.)} \qquad J(u_1^*, \dots, u_N^*) \le J(u_1, \dots, u_N), \qquad \forall (u_1, \dots, u_N) \in U \ (\text{resp. } \mathcal{U}).$$

Remark: The (N.E.) problem obviously includes zero-sum differential games; similarly, the (T.O.) problem includes the control problem with incomplete (and non-classical) information. In (N.E.), we cannot generally hope to find a Nash point in pure strategies as in (T.O.), and this is why we use mixed strategies in this context. We have the comparison result:

Proposition: If $(r_1^*, \dots, r_N^*) \in \bigotimes_{i=1}^{N} R_i$ is a Nash point in relaxed output feedbacks for (N.E.), it is also a Nash point in mixed strategies. Similarly, if $(u_1^*, \dots, u_N^*) \in U$ is an output-feedback solution of (T.O.), it is also a solution in pure strategies, in relaxed output feedbacks, and in mixed strategies.

Proof: If $\rho_i \in \mathcal{R}_i$, it is clear that one can build a measure-trajectory generated by $(r_1^*, \dots, r_{i-1}^*, \rho_i, r_{i+1}^*, \dots, r_N^*)$ and, noting $r_i = \rho_{i,0}(P_0) \otimes \dots \otimes \rho_{i,M-1}(P^-_{t_{M-1}})$, $r_i \in R_i$, we have by (13):

$$J_i(r_1^*, \dots, r_{i-1}^*, \rho_i, r_{i+1}^*, \dots, r_N^*) = J_i(r_1^*, \dots, r_{i-1}^*, r_i, r_{i+1}^*, \dots, r_N^*) \ge J_i(r_1^*, \dots, r_N^*), \qquad \forall \rho_i \in \mathcal{R}_i,\ \forall i = 1, \dots, N.$$

III - Existence results

We shall introduce the assumption (H3): either $P_0$ is absolutely continuous with respect to $\mu$, the Lebesgue measure of $\mathbb{R}^n$, or $m_1 = n$, $\frac{\partial f}{\partial v_1}(t_0, x, u, v_1)$ is invertible $\forall x, u, v_1$, and $\mathrm{pr}_1 p_0$ (projection of $p_0$ defined by (1)) is absolutely continuous with respect to $\mu$.

Theorem 1: Suppose that (H1) to (H3) hold.
(i) (N.E.) has a Nash point in relaxed output feedbacks (a fortiori in mixed strategies).
(ii) (T.O.) has a solution in output feedbacks (a fortiori in pure strategies).

Proof: (i) The proof is based on the following (non trivial) property (see [7]): $\forall i = 1, \dots, N$, $u \to J_i(u)$ is continuous on $U$ endowed with the product topology, and with this topology $U$ is compact. It follows that the mapping $r \to J_i(r)$ is linear and vaguely continuous on the vaguely compact and convex $R = \bigotimes_{i=1}^{N} R_i$, and, a fortiori, $r_i \to J_i(r_1, \dots, r_N)$, $\forall (r_1, \dots, r_{i-1}, r_{i+1}, \dots, r_N)$, and $(r_1, \dots, r_{i-1}, r_{i+1}, \dots, r_N) \to J_i(r_1, \dots, r_N)$,
$\forall r_i$, are vaguely continuous. Thus, the usual assumptions ensuring the existence of a Nash point are fulfilled and (i) is proved.
(ii) $u \to J(u)$ is continuous (see also [7]) on the compact $U$, and this suffices to ensure the existence of a minimum of $J$ on $U$. ∎

IV - Necessary and sufficient optimality conditions.

Let us denote

$$V_i(t_0, P_0) = J_i(r_1^*, \dots, r_N^*; t_0, P_0), \qquad \forall i = 1, \dots, N,$$

the optimal values of (N.E.), and

$$V(t_0, P_0) = \min_{u \in U} J(u; t_0, P_0)$$

the value function of (T.O.). It is clear, in both cases, that the optimal N-tuple must depend on $(t_0, P_0)$. In fact, one can see that it depends on the whole trajectory $t \to P_t$, and this justifies the introduction of pure and mixed strategies. More precisely, let

$$V_i(t, p) = J_i(r^*_{1,t,p}, \dots, r^*_{N,t,p}; t, p), \qquad \forall i = 1, \dots, N, \tag{15}$$

where $(r^*_{1,t,p}, \dots, r^*_{N,t,p})$ is the Nash solution from $(t, p)$, and
is the 60RU ,v K , ... ,k .,\:: o J measure on O ~': has a density
%
(18)
G.. (u,v,x) ~J
V t (u , v , x) c g.(t ,[ xU,v(t) ] (x) , u)dt
+£
~
obtained in the sense of di stributions by :