Stackelberg Solution for Two-Person Games with Biased Information Patterns
C. I. Chen and J. B. Cruz, Jr.

IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. AC-17, NO. 6, DECEMBER 1972

Manuscript received January 14, 1972; revised July 6, 1972. Paper recommended by I. B. Rhodes, Chairman of the IEEE S-CS Large Systems, Differential Games Committee. This work was supported in part by the U.S. Air Force under Grant AFOSR-68-1579D, in part by the Joint Services Electronics Program under Contract DAAB-07-67-C-0199 with the University of Illinois, and in part by the National Research Council of Canada under Grant NRC-A-4160 with Laval University, Quebec, P.Q., Canada. C. I. Chen was with the Coordinated Science Laboratory, University of Illinois, Urbana, Ill. He is now with the Department of Electrical Engineering, Laval University, Quebec, P.Q., Canada. J. B. Cruz, Jr., is with the Coordinated Science Laboratory and the Department of Electrical Engineering, University of Illinois, Urbana, Ill. 61801.

I. INTRODUCTION

In a two-person zero-sum game, the goal of one player is exactly opposite to that of the other, so there can be neither cooperation nor compromise. On the other hand, in a two-person game with identical goals, because the cost functions for the players are identical, both players tend to cooperate with each other. The problem can then be solved as an optimization problem. In two-person nonzero-sum games [3], the objectives of the players are neither exactly opposite nor do they coincide with each other. There are several ways of defining a "solution" under these conditions. The "optimal" strategy depends on the rationality assumed by each player. Some of the strategies that have been investigated are minmax [3], [4], Nash [5], and noninferior strategies [3], [6], each of which has desirable characteristics. In this paper, a strategy suggested by Stackelberg (discussed in [7] and [8]) for static economic competition will be considered and extended to the case of dynamic competition with biased information patterns.

Definition: Given a two-person game where Player 1 wants to minimize a cost function J1(u1, u2) and Player 2 wants to minimize a cost function J2(u1, u2) by choosing u1 and u2 from admissible strategy sets U1 and U2, respectively, the strategy pair (u1*, u2*) is called a Stackelberg strategy with Player 2 as leader and Player 1 as follower if, for any u2 belonging to U2 and any u1 belonging to U1,

J2(u1°(u2*), u2*) ≤ J2(u1°(u2), u2)

where

J1(u1°(u2), u2) ≤ J1(u1, u2)

and

u1* = u1°(u2*).


Thus, a Stackelberg strategy with Player 2 as leader is the optimal strategy for Player 2 if Player 2 announces his move first and if the goal of Player 1 is to minimize J1, while that of Player 2 is to minimize J2. If Player 2 chooses any other strategy u2, then Player 1 will choose a strategy u1' that minimizes J1, but the resulting cost for Player 2 will be greater than or equal to that obtained when the Stackelberg strategy with Player 2 as the leader is used. The Stackelberg strategy with Player 2 as leader is an attractive strategy when the information pattern is biased in the sense that Player 1 does not know the cost function of Player 2, but Player 2 knows both of the cost functions. By announcing his Stackelberg strategy u2* first, Player 2 forces Player 1 to follow and use the Stackelberg strategy u1*.

In this paper, Stackelberg solutions for two-person nonzero-sum dynamic games are investigated. It is assumed that the dynamic model, i.e., the state equations, and the state are known to both players, but only the leader knows both cost functions. The follower knows his own cost function. As with Nash solutions for nonzero-sum games, feedback and open-loop Stackelberg strategies could yield different solutions. Necessary conditions for open-loop Stackelberg solutions are derived using variational methods. For discrete-time games, feedback Stackelberg strategies are defined using dynamic programming. A simplified resource allocation example is presented to illustrate the solution concept.
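To make the definition concrete, the following sketch (in Python) enumerates a Stackelberg strategy with Player 2 as leader for a small finite game. The cost matrices below are hypothetical placeholders, not the entries of Fig. 1; the mechanics, however, are those of the definition: the follower's rational reaction u1°(u2) is computed for each announced leader strategy, and the leader then minimizes his own cost along that reaction.

```python
import numpy as np

# Hypothetical cost matrices (rows: Player 1's choices, columns: Player 2's choices).
# These values are illustrative only; they are not taken from the paper's figures.
J1 = np.array([[2.0, 4.0, 1.0],
               [0.0, 3.0, 5.0],
               [1.0, 2.0, 4.0]])
J2 = np.array([[3.0, 1.0, 2.0],
               [2.0, 0.0, 4.0],
               [5.0, 2.0, 1.0]])

def follower_reaction(u2):
    """Player 1's rational reaction u1°(u2): the row minimizing J1 in column u2."""
    return int(np.argmin(J1[:, u2]))

# Player 2 (the leader) announces u2 first, anticipating Player 1's reaction,
# and picks the column that minimizes his own cost along the reaction set.
leader_costs = [J2[follower_reaction(u2), u2] for u2 in range(J2.shape[1])]
u2_star = int(np.argmin(leader_costs))
u1_star = follower_reaction(u2_star)

print("Stackelberg strategy with Player 2 as leader:", (u1_star, u2_star))
print("Costs (J1, J2):", J1[u1_star, u2_star], J2[u1_star, u2_star])
```

If Player 2 announced any other column, Player 1's best reply would leave Player 2 with a cost at least as large, which is exactly the inequality in the definition.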


Fig. 1. A simple bimatrix game.

II. STACKELBERG SOLUTION FOR TWO-PERSON STATIC GAMES

Before considering nonzero-sum dynamic games, a few simple bimatrix games will be considered to illustrate some features of Stackelberg solutions. In all games considered in this section, each player wishes to minimize his own cost and is indifferent to the cost borne by the other player. In the matrix game of Fig. 1, Player 1 chooses his strategy from the set (x1, x2, x3), while Player 2 chooses his strategy from the set (y1, y2, y3). The corresponding entries give the costs J1 and J2 for the two players, respectively. [...] noninferior strategy set. It is also clear that y3 is the minmax strategy for Player 2.

As another example, suppose u1 and u2 are scalars, U1 and U2 are R1, and the cost functions J1(u1, u2) and J2(u1, u2) are convex and twice differentiable with respect to both arguments. Player 1 wants to minimize J1, while Player 2 wants to minimize J2. Equicost contours in the space U1 × U2 are plotted for J1 and J2 in Fig. 2. Suppose Player 2 announces that he will choose u2 = u2i. Player 1 will then choose u1i such that J1(u1i, u2i) ≤ J1(u1, u2i) for all u1 ∈ U1. This is achieved by choosing u1 = u1i such that, at u1 = u1i, the line u2 = u2i is tangent to an equicost contour of Player 1. The locus of such points for all u2 ∈ U2 is plotted as P1Q1. P1Q1 is called the rational reaction curve, or reaction curve, for Player 1. Similarly, the reaction curve for Player 2 is obtained as P2Q2.

Now consider a Stackelberg strategy with Player 2 as leader. For any choice of u2, Player 1 will elect to choose u1° such that (u1°, u2) lies on his reaction curve P1Q1. Thus, in order to minimize his own payoff, Player 2, while playing as leader, will choose u2* so that for any (u1°, u2) ∈ P1Q1, J2(u1*, u2*) ≤ J2(u1°, u2). In other words, Player 2 will choose u2 such that J2 is minimized with respect to u2 ∈ U2 while (u1, u2) is constrained to lie on the reaction curve P1Q1. This is achieved at the point R, where an equicost contour of Player 2 is tangent to the reaction curve P1Q1. A Stackelberg strategy with Player 1 as leader can similarly be found to be the point T, where an equicost contour of Player 1 is tangent to the reaction curve P2Q2 of Player 2. The point N, which is the intersection of the two reaction curves, is the Nash strategy pair. Since both R and N lie on P1Q1 and R minimizes J2 along that curve, the leader is no worse off at R than at the Nash solution, while the follower may be better off or worse off than when the Nash strategy pair is played. However, if the leader actually chooses a strategy corresponding to a Stackelberg strategy, the follower will do worse by not following a Stackelberg strategy himself.

Fig. 2. Equicost contours for a two-player nonzero-sum game. Player 1 chooses u1 and Player 2 chooses u2. N is the Nash solution, R is the Stackelberg solution with Player 2 as leader, and T is the Stackelberg solution with Player 1 as leader.
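The construction of the points R, T, and N can be reproduced numerically. The quadratic costs below are assumptions made only for this sketch (they are not the contours of Fig. 2): each reaction curve is obtained by minimizing a player's cost in his own variable, R minimizes J2 along Player 1's reaction curve, T minimizes J1 along Player 2's reaction curve, and N is the fixed point of the two reaction maps.

```python
from scipy.optimize import minimize_scalar

# Illustrative convex quadratic costs (assumed for this sketch only).
def J1(u1, u2):
    return (u1 - u2) ** 2 + u1 ** 2

def J2(u1, u2):
    return (u2 - 1.0) ** 2 + (u1 + u2) ** 2

# Rational reaction curves: each player's own first-order condition.
def u1_reaction(u2):          # argmin over u1 of J1(u1, u2)  ->  u1 = u2 / 2
    return minimize_scalar(lambda u1: J1(u1, u2)).x

def u2_reaction(u1):          # argmin over u2 of J2(u1, u2)  ->  u2 = (1 - u1) / 2
    return minimize_scalar(lambda u2: J2(u1, u2)).x

# Point R: Stackelberg with Player 2 as leader; minimize J2 along Player 1's reaction curve.
u2_R = minimize_scalar(lambda u2: J2(u1_reaction(u2), u2)).x
u1_R = u1_reaction(u2_R)

# Point T: Stackelberg with Player 1 as leader; minimize J1 along Player 2's reaction curve.
u1_T = minimize_scalar(lambda u1: J1(u1, u2_reaction(u1))).x
u2_T = u2_reaction(u1_T)

# Point N: Nash, the fixed point of the two reaction curves (simple iteration).
u1_N, u2_N = 0.0, 0.0
for _ in range(100):
    u1_N, u2_N = u1_reaction(u2_N), u2_reaction(u1_N)

print("R (Player 2 leads):", (u1_R, u2_R))   # ~ (0.154, 0.308)
print("T (Player 1 leads):", (u1_T, u2_T))   # ~ (0.231, 0.385)
print("N (Nash):          ", (u1_N, u2_N))   # ~ (0.200, 0.400)
```

For these particular costs the leader's cost at R (about 0.69) is no greater than at N (0.72), in line with the remark above.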

Necessary conditions for a strategy to be a Stackelberg strategy for a static two-person nonzero-sum game, where U1 = R^r1 and U2 = R^r2, are given by the following proposition.

Proposition: For static two-person nonzero-sum games where the admissible strategies for Player 1 are in R^r1, the admissible strategies for Player 2 are in R^r2, and the cost functions for Player 1 and Player 2, J1(u1, u2) and J2(u1, u2), are twice differentiable with respect to both arguments u1 and u2, a Stackelberg strategy with Player 2 as leader, (u1*, u2*), if it exists, must satisfy the following conditions:

G(u1, u2) = ∂J1(u1, u2)/∂u1 = 0    (4)

(∂/∂u_i)[J2(u1, u2) + λ G(u1, u2)] = 0    (5)

Fig. 3. A zero-sum game without saddle point.

Fig. 4. A zero-sum game with saddle point.

for i = 1, 2, where λ is an r1-dimensional row vector multiplier. Similar conditions for a Stackelberg strategy with Player 1 as leader are obtained by interchanging subscripts 1 and 2 in (4) and (5), with λ then r2-dimensional.

For zero-sum games where J2 = -J1, the Stackelberg strategy with Player 2 as leader is the minmax strategy for Player 2. In other words, if both players play the Stackelberg strategy with Player 2 as leader, Player 2 will get his minmax payoff (security payoff). In Fig. 3, (u1*, u2*) is a Stackelberg strategy with Player 2 as leader, and the security payoff for Player 2 is k0. Note that a saddle-point strategy for this example does not exist, since min_{u2} max_{u1} J2 = k0 while max_{u1} min_{u2} J2 does not exist at all, and hence the Stackelberg strategy with Player 1 as leader does not exist. For a static zero-sum game where a saddle-point strategy exists, minmax strategies, maxmin strategies, and Stackelberg strategies with either player as leader are all the same as the saddle-point strategy by definition. An example of such a case is shown in Fig. 4, where (u1*, u2*) is the saddle-point strategy.
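As a check on conditions (4) and (5), they can be solved symbolically for a simple scalar example (r1 = r2 = 1, so λ is a scalar). The quadratic costs below are assumptions chosen for illustration; they are the same pair used in the earlier reaction-curve sketch, so the solution should reproduce the point R found there.

```python
import sympy as sp

u1, u2, lam = sp.symbols('u1 u2 lam', real=True)

# Illustrative twice-differentiable costs (assumed; same pair as the earlier sketch).
J1 = (u1 - u2) ** 2 + u1 ** 2
J2 = (u2 - 1) ** 2 + (u1 + u2) ** 2

# Condition (4): the follower's stationarity G(u1, u2) = dJ1/du1 = 0.
G = sp.diff(J1, u1)

# Condition (5): d/du_i [ J2 + lam * G ] = 0 for i = 1, 2.
L = J2 + lam * G
eqs = [sp.Eq(G, 0), sp.Eq(sp.diff(L, u1), 0), sp.Eq(sp.diff(L, u2), 0)]

sol = sp.solve(eqs, [u1, u2, lam], dict=True)
print(sol)   # [{u1: 2/13, u2: 4/13, lam: -3/13}]
```

Solving the three equations gives u1 = 2/13, u2 = 4/13, and λ = -3/13, and the pair (u1, u2) agrees with the point R computed geometrically above.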

III. OPEN-LOOP AND FEEDBACK STACKELBERG SOLUTIONS FOR TWO-PERSON DYNAMIC GAMES

In a two-person dynamic game, Player 1 wishes to choose, prior to the start of the game, his control u1(t) for all t in the interval [t0, tf] to minimize his cost functional J1(u1, u2), while Player 2 wishes to choose his control u2(t) over the same interval to minimize J2(u1, u2),

both subject to the constraint

ẋ = f(x, t, u1, u2),    x(t0) = x0    (8)

where x is the state vector. In a nonzero-sum differential game, it has been found that open-loop Nash solutions and feedback Nash solutions are different in general [9]. Thus, one cannot obtain the open-loop solutions from the feedback solutions, or vice versa, as in optimal control. It will be shown by example that the feedback Stackelberg strategy and the open-loop Stackelberg strategy for dynamic games can be different. The example is a simple discrete finite-state multistage nonzero-sum two-person game similar to the one considered by Starr and Ho [9]. It is shown in Fig. 5 using the notation in [9].

Fig. 5. A discrete finite-state multistage nonzero-sum game.

The feedback Stackelberg solution is defined by dynamic programming. Figs. 6(a)-(d) show the corresponding bimatrix games at stage t = 1 for the four states. The Stackelberg solution with Player 2 as leader at t = 1 is (0,0) if x = x2, (1,0) if x = x3, (0,1) if x = x4, and (1,1) if x = x5. Assuming that the players play their feedback Stackelberg strategies at t = 1, Fig. 6(e) shows the bimatrix game at stage t = 0. The feedback Stackelberg strategy at this stage is (1,1). Thus, the costs for Players 1 and 2 for the entire game are 0 and -3, with an associated trajectory passing through x(1) = x5. Now consider the open-loop strategies, as shown in the bimatrix game of Fig. 7. The open-loop Stackelberg strategy is the sequence pair (01, 10). The costs corresponding to these control sequences are 1 and -2, and the corresponding trajectory passes through x(1) = x3. Note that the feedback Stackelberg solution and the open-loop Stackelberg solution differ, both in the control sequences and in the resulting costs.
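The backward-induction construction just described can be sketched for a generic two-stage, two-action game. The cost tables below are randomly generated placeholders, not the entries of Figs. 5-7; the point is only the mechanics: solve a bimatrix Stackelberg problem (Player 2 as leader) at each t = 1 state, fold the resulting values into the t = 0 bimatrix game, and compare with the open-loop solution obtained by treating entire control sequences as one bimatrix game.

```python
import numpy as np

def stackelberg_p2_leader(J1, J2):
    """Stackelberg pair with Player 2 as leader for one bimatrix game.
    J1[a, b], J2[a, b] are the costs of Players 1 and 2."""
    reaction = J1.argmin(axis=0)                    # follower's best reply to each column b
    b_star = int(np.argmin([J2[reaction[b], b] for b in range(J2.shape[1])]))
    a_star = int(reaction[b_star])
    return a_star, b_star

# ---- Hypothetical two-stage game (entries are NOT those of Figs. 5-7) ----
rng = np.random.default_rng(0)
# Stage t = 0: costs of the move pair (a0, b0); the pair also selects the t = 1 state.
C1_0 = rng.integers(-2, 3, size=(2, 2)).astype(float)
C2_0 = rng.integers(-2, 3, size=(2, 2)).astype(float)
# Stage t = 1: one 2x2 bimatrix per state s = (a0, b0), indexed [a0, b0, a1, b1].
C1_1 = rng.integers(-2, 3, size=(2, 2, 2, 2)).astype(float)
C2_1 = rng.integers(-2, 3, size=(2, 2, 2, 2)).astype(float)

# Feedback Stackelberg: solve each t = 1 game, then fold its value back to t = 0.
V1 = np.zeros((2, 2)); V2 = np.zeros((2, 2))
for a0 in range(2):
    for b0 in range(2):
        a1, b1 = stackelberg_p2_leader(C1_1[a0, b0], C2_1[a0, b0])
        V1[a0, b0] = C1_1[a0, b0, a1, b1]
        V2[a0, b0] = C2_1[a0, b0, a1, b1]
a0_fb, b0_fb = stackelberg_p2_leader(C1_0 + V1, C2_0 + V2)
fb_costs = (C1_0[a0_fb, b0_fb] + V1[a0_fb, b0_fb], C2_0[a0_fb, b0_fb] + V2[a0_fb, b0_fb])

# Open-loop Stackelberg: each player commits to a whole sequence; one 4x4 bimatrix.
seqs = [(i, j) for i in range(2) for j in range(2)]
G1 = np.array([[C1_0[a0, b0] + C1_1[a0, b0, a1, b1] for (b0, b1) in seqs] for (a0, a1) in seqs])
G2 = np.array([[C2_0[a0, b0] + C2_1[a0, b0, a1, b1] for (b0, b1) in seqs] for (a0, a1) in seqs])
i_ol, j_ol = stackelberg_p2_leader(G1, G2)
ol_costs = (G1[i_ol, j_ol], G2[i_ol, j_ol])

print("feedback  Stackelberg costs (J1, J2):", fb_costs)
print("open-loop Stackelberg costs (J1, J2):", ol_costs)
```

Because the feedback construction lets the t = 1 choices depend on the state actually reached, while the open-loop construction commits to a fixed sequence, the two printed cost pairs need not coincide, which is the behavior exhibited by the example above.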

Fig. 6. Sequence of bimatrix games for determining the feedback Stackelberg strategy for the multistage game of Fig. 5. (a)-(d) are the associated games at t = 1. (e) is the bimatrix game at t = 0, assuming that the players use feedback Stackelberg strategies at t = 1.

Fig. 7. Bimatrix game for determining the open-loop Stackelberg strategy for the multistage game of Fig. 5.