Computational Economics 13: 201–209, 1999. © 1999 Kluwer Academic Publishers. Printed in the Netherlands.


Off-Line Computation of Stackelberg Solutions with the Genetic Algorithm

THOMAS VALLÉE¹ and TAMER BAŞAR²
¹ LEN-C3E, Economics Department, University of Nantes, France; e-mail: [email protected]
² Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, USA; e-mail: [email protected]

(Accepted 9 January 1998)

Abstract. The paper studies off-line computation of the Stackelberg solution in a repeated game framework, utilizing the genetic algorithm. Simulations are conducted with a numerical linear quadratic example and a Fish War game example. Furthermore, it is shown that an evolutionary mutation probability is preferable to a fixed one, as usually assumed.

Key words: off-line computation, Stackelberg game, genetic algorithm

1. Introduction

When a game is sequential, the first-moving player is the natural leader of the game. But to play the Stackelberg solution he has to know, at least, the reaction function of the follower. When the players do not know each other's costs, the repeated game is generally solved by using on-line asynchronous algorithms, which force the solution to converge to the Nash solution (cf. Li and Başar, 1987). Hence the Stackelberg solution cannot be obtained by an on-line iteration, and one has to use an off-line algorithm, such as a heuristic algorithm.

Over the last 10 years, starting with the seminal work of Holland (1975; 1995) (see also Goldberg, 1989), the genetic algorithm has been widely applied to the computation of equilibria in games, mainly in the context of the repeated prisoners' dilemma game, following the work of Axelrod (1987). Our purpose in this paper is to show that the genetic algorithm can also be used for the computation of Stackelberg equilibria. In particular, we show, using a simple genetic algorithm, that the first player may arrive very quickly at an almost exact Stackelberg solution.

The plan of the paper is as follows. In the next section we present a general game and discuss its Stackelberg solution. In Section 3, we give a brief outline of what a genetic algorithm is and show how it can be related to the Stackelberg solution. In Section 4, we first use a numerical example to compare the evolution of the game when the genetic algorithm uses an evolutionary mutation probability rather than a fixed one; then an application to a Fish War game, like the one in Li and Başar (1987), is carried out. Finally, we draw some conclusions.


2. The Game

We consider a two-person repeated sequential game with L the 'natural' leader (the first player to move at each period) and F the 'natural' follower. Let U and V be the action spaces of the leader and the follower, respectively, with generic elements denoted by u and v. Let J^i : U × V → ℝ be the cost function of player i, where i = L, F.

ASSUMPTION 1. The players do not know each other's cost function.

Assume that U and V are compact, nonempty and convex. If we also assume that J^L(u, v) and J^F(u, v) are C² functions, strictly convex and differentiable in u and v, respectively, then we know from Başar and Olsder (1995) that there exist reaction functions T^L and T^F that are continuously differentiable mappings. The best reaction function of the leader, T^L : V → U, is defined by

    T^L(v) ≡ arg min_{u ∈ U} J^L(u, v),   ∀v ∈ V,                    (1)

and likewise the best reaction function of the follower, T^F : U → V, is given by

    T^F(u) ≡ arg min_{v ∈ V} J^F(u, v),   ∀u ∈ U.                    (2)
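As a concrete illustration of (2), the follower's best reaction can be approximated numerically by minimizing his cost over a discretized action space. The quadratic cost below is a hypothetical example chosen for illustration; it is not one of the paper's cost functions.

```python
def J_F(u, v):
    # Hypothetical follower cost (illustrative only); analytic minimizer is v = u/4.
    return (v - u / 2.0) ** 2 + v ** 2

# Discretize the compact action space V = [-2, 2] with step 0.001.
V = [i / 1000.0 for i in range(-2000, 2001)]

def T_F(u):
    """Best reaction of the follower, as in (2): the grid point minimizing J_F(u, .)."""
    return min(V, key=lambda v: J_F(u, v))
```

For instance, `T_F(1.0)` returns 0.25, matching the analytic minimizer u/4.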

THE STACKELBERG EQUILIBRIUM SOLUTION

The Stackelberg solution is generated by the following well-known procedure (cf. Simaan and Cruz, 1973; Başar and Olsder, 1995). Knowing the reaction function T^F of the follower, the leader minimizes his own cost function. This involves the optimization problem

    arg min_{u ∈ U} J^L(u, T^F(u)),                                  (3)

which in turn yields a unique optimal action u*. Given this action, the follower reacts with the optimal action v* = T^F(u*).

DEFINITION 1. A pair of actions (u^S ∈ U, v^S ∈ V), with u^S given by (3) and v^S = T^F(u^S), is a Stackelberg equilibrium solution.

The problem, as indicated before, is that the leader does not know J^F and thus the reaction function T^F. Furthermore, no on-line algorithm, such as a sequential update scheme, can lead to the Stackelberg solution. The main reason is that the Stackelberg solution does not generally have the equilibrium property T^L(T^F(u*)) = u*; this equality holds only when the Stackelberg solution coincides with the Nash solution. Consequently, as indicated earlier, one must use an off-line learning algorithm to find the Stackelberg solution.
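The failure of the fixed-point property can be made concrete with a small numerical sketch. The quadratic costs below are hypothetical (chosen for illustration, not the paper's example): the reactions follow from the first-order conditions, and the leader's problem (3) is solved by grid search.

```python
def J_L(u, v):
    # Hypothetical leader cost (illustrative only).
    return (u - 1.0) ** 2 + u * v

def T_F(u):
    # Follower's reaction for J_F(u, v) = (v - u/2)^2 + v^2; FOC: 2(v - u/2) + 2v = 0.
    return u / 4.0

def T_L(v):
    # Leader's reaction; FOC: 2(u - 1) + v = 0.
    return 1.0 - v / 2.0

# Stackelberg: the leader minimizes the composed cost J_L(u, T_F(u)), as in (3),
# over a grid on U = [-2, 2].
U = [i / 1000.0 for i in range(-2000, 2001)]
u_star = min(U, key=lambda u: J_L(u, T_F(u)))
v_star = T_F(u_star)

# The Stackelberg pair is NOT a fixed point of the composed reaction maps:
# here u_star = 0.8 but T_L(T_F(u_star)) = 0.9, so on-line best-response
# iterations drift away from the Stackelberg point toward the Nash solution.
```

In this example the Nash solution (the joint fixed point of T^L and T^F) is u = 8/9, v = 2/9, distinct from the Stackelberg pair (0.8, 0.2), which is exactly why an on-line iteration cannot recover the latter.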


3. Off-Line Computation of the Stackelberg Equilibrium Solution

3.1. INTRODUCTION TO THE GENETIC ALGORITHM

Genetic algorithms (GAs) were introduced by Holland (1975). They are powerful heuristic search and optimization techniques inspired by biological evolution: they are modeled on the principle of evolution under natural selection (reproduction of the best elements, with possible crossover and mutation during this reproduction). A very attractive characteristic of GAs is that they need very little knowledge of the problem data. We use a simple, general version of the GA, as described in Goldberg (1989). The fitness function is a function f : ℝ → […]

[…] d(c̃) = 1.2115, we have switched it to c̃. We ran the algorithm 50 times. The average results obtained are shown in Table II. As expected, the average cost, 0.497212, is very close to the Stackelberg cost, 0.4971. The variations are also very small: the best run yields J^L = 0.497147. The worst value is J^L = 0.50102, which corresponds to the minimum leader's action u_Min = 1.1733, while u_Max = 1.2041 yields J^L = 0.498584.
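The selection–crossover–mutation loop described above, applied to the leader's problem, can be sketched as follows. This is a minimal illustrative GA, not the authors' implementation: the chromosome encodes the leader's action u as a bit string, the fitness is the cost the leader observes after the follower reacts (a stand-in quadratic here, since the leader need not know J^F), and the mutation probability decays over the generations, in the spirit of the evolutionary mutation probability the paper advocates. All parameter values are assumptions for the sketch.

```python
import random

BITS = 16                     # chromosome length
U_MIN, U_MAX = 0.0, 2.0       # leader's action space (illustrative)

def decode(chrom):
    """Map a bit list to an action u in [U_MIN, U_MAX]."""
    x = int("".join(map(str, chrom)), 2)
    return U_MIN + (U_MAX - U_MIN) * x / (2 ** BITS - 1)

def realized_cost(u):
    # Stand-in for the leader's observed cost J_L(u, T_F(u)); minimum at u = 0.8.
    return 1.25 * u * u - 2.0 * u + 1.0

def ga(pop_size=30, generations=80, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(BITS)] for _ in range(pop_size)]
    for g in range(generations):
        # Evolutionary mutation probability: high early (exploration),
        # decaying toward a small floor as the population converges.
        p_mut = max(0.02, 0.2 / (1 + g))
        pop.sort(key=lambda c: realized_cost(decode(c)))
        parents = pop[: pop_size // 2]          # truncation selection (elitist)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, BITS)        # one-point crossover
            child = [bit ^ (rng.random() < p_mut) for bit in a[:cut] + b[cut:]]
            children.append(child)
        pop = parents + children
    return decode(min(pop, key=lambda c: realized_cost(decode(c))))

u_hat = ga()   # converges near the minimizer u = 0.8 of the stand-in cost
```

With this decaying schedule the early generations explore widely while later ones only fine-tune, mirroring the paper's finding that an evolutionary mutation probability is preferable to a fixed one.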


Table II. Results of the GA over 50 attempts.

          Average      J^L_Min      J^L_Max      u_Max
J^L       0.497212     0.497147     0.50102      0.498584
J^F       4.94189      4.76253      3.88007      5.73240
u         1.19657      1.19400      1.1733       1.2041
v         0.0164508    0.019262     0.0418603    0.0082

Figure 4. Average leader’s action.

Graphically, one can easily check the impact of the crossover and mutation steps. Before period 20, large oscillations were possible; later, since mutation was possible only on the last 7 digits, only small oscillations remain. The convergence to the Stackelberg solution is very fast, as depicted in Figure 4: on average, after only 10 periods, the leader's action was almost at its Stackelberg value.

Figure 5. Evolution of the average follower's action and cost.

5. Conclusion

In this paper, our starting hypothesis has been that if a player is in a natural leadership position, then the Stackelberg solution with him as leader is the most natural one. However, since on-line construction of such a solution is not possible, this leaves off-line heuristic search methods as the only alternative. Several heuristic search methods are available, the genetic algorithm, simulated annealing, and tabu search being some of them. We chose the genetic algorithm because it does not require the first player (the leader) to have any information on the other player's cost function. In such an incomplete-information framework, the genetic algorithm does well, and our simulations underscore its fast convergence to the Stackelberg equilibrium solution. As is already known, or to be suspected, obtaining the exact value of the solution may take a long time; still, the overall gain is significant.

Acknowledgements

The paper was written while the first author was visiting the Coordinated Science Laboratory. The kind hospitality of the Laboratory is most gratefully acknowledged. Research of the second author was supported in part by Grant ECS 93-12807 from the National Science Foundation. Finally, the authors would like to thank Christophe Deissenberg for his useful comments.

References

Axelrod, R. (1987). The evolution of strategies in the iterated prisoner's dilemma. In L.D. Davis (ed.), Genetic Algorithms and Simulated Annealing. Morgan Kaufmann.
Başar, T. and Olsder, G. (1995). Dynamic Noncooperative Game Theory. 2nd edition, Academic Press, New York.
Goldberg, D. (1989). Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley.
Holland, J.H. (1975). Adaptation in Natural and Artificial Systems. University of Michigan Press.
Holland, J.H. (1995). Hidden Order. Addison-Wesley.
Levhari, D. and Mirman, L.J. (1980). The great fish war: An example using a dynamic Cournot-Nash solution. Bell Journal of Economics, 11, 322–334.
Li, S. and Başar, T. (1987). Distributed algorithms for the computation of noncooperative equilibria. Automatica, 23(4), 523–533.
Simaan, M. and Cruz, J. (1973). On the Stackelberg strategy in nonzero-sum games. Journal of Optimization Theory and Applications, 11, 533–555.
