LPOD Answer Sets and Nash Equilibria

Norman Foo¹, Thomas Meyer¹ and Gerhard Brewka²

¹ National ICT Australia; and The School of Computer Science and Engineering, University of New South Wales, Sydney NSW 2052, Australia

² Intelligent Systems Department, Computer Science Institute, University of Leipzig, Augustusplatz 10-11, 04109 Leipzig, Germany

Abstract. Logic programs with ordered disjunctions (LPODs) are natural vehicles for expressing choices that have a preference ordering. They are extensions of the familiar extended logic programs that have answer sets as semantics. In game theory, players usually prefer strategies that yield higher payoffs. Since strategies are choices, LPODs would seem to be a suitable logical formalism for expressing some game-theoretic properties. This paper shows how pure strategy normal form games can be encoded as LPODs in such a way that the answer sets that are mutually most preferred by all players are exactly the Nash equilibria. A similar result has been obtained by researchers using a different, but related, logical formalism, viz., ordered choice logic programs that were used to encode extensive games.

1 Introduction

A variety of computer science areas, particularly artificial intelligence, are now importing concepts and techniques from game theory. In multiagent systems where interactions involve bargaining, negotiation, collaboration, competition, etc., game theory can provide underpinnings for rational choices. On the other hand, agent intentions and preferences can be succinctly and precisely represented as logic programs of one kind or another. The central result of this paper concerns a natural dovetailing of game theory and logic programming, so that optimal agent preferences captured as most preferred answer sets of logic programs with ordered disjunctions (LPODs) coincide with the well-known Nash equilibria for pure strategy games.

De Vos and Vermeir [De Vos and Vermeir 99] have obtained a similar result using their ordered-choice logic programs (OCLPs) with a new semantics that is suitable for encoding the extensive form of games. On the other hand, LPOD semantics is an extension of the familiar answer sets and may afford a more congenial formalism for generalizing to the mixed strategies that we do not address here. LPODs also appear to be more suitable for encoding the normal form of games. However, the similarity of the results and the inter-translatability of extensive and normal form games suggest an equivalence between LPODs and OCLPs that should be investigated.

The structure of the paper is as follows. Section 2 is a brief review of the concepts in game theory needed for our exposition. Of necessity this is only a skeletal treatment, and standard works like Luce and Raiffa [Luce and Raiffa 57] or Watson [Watson 02] (from which our notation and examples are adapted) should be consulted for details.

M.J. Maher (Ed.): ASIAN 2004, LNCS 3321, pp. 352–361, 2004. © Springer-Verlag Berlin Heidelberg 2004


Section 3 is a recapitulation of extended logic programs (ELPs) with answer set semantics. This is largely a condensation of the original paper by Gelfond and Lifschitz [Gelfond and Lifschitz 91]. (In fact, for our paper we only need the weaker form of ELPs that do not have negative literals.) Then in section 4 LPODs are described, by summarizing the content of the paper by Brewka [Brewka 04]. The main result follows in section 5. On-going and future work is outlined in the concluding section.

2 Game Theory

Game theory has an extensive literature and many profound results. However, for the purpose of this paper we need only a small part of it. We consider finite games of players $1, \ldots, n$, each of whom has a finite set $S_i$ of moves or strategies. A profile $s$ is a tuple of strategies, one from each player; formally $s \in S_1 \times \cdots \times S_n$. For brevity we denote $S_1 \times \cdots \times S_n$ simply by $S$. A profile can be informally regarded as a “play” or a “round” of the game, with each player choosing a move independently of the choice of the others. Given a profile $s$, by $s_{-i}$ is meant the tuple $(s_1, \ldots, s_{i-1}, s_{i+1}, \ldots, s_n)$, i.e. the strategies or moves of the players except $i$. Thus $s_{-i}$ is an element of $S_1 \times \cdots \times S_{i-1} \times S_{i+1} \times \cdots \times S_n$. Again for brevity we write $S_{-i}$ for $S_1 \times \cdots \times S_{i-1} \times S_{i+1} \times \cdots \times S_n$.

Each player $i$ has an associated payoff function $u_i : S \to \mathbb{R}$, where $\mathbb{R}$ is the set of real numbers. $u_i(s)$ is the payoff to player $i$ as a result of $i$’s choice of strategy $s_i$ while the others have chosen $s_{-i}$. In the special case when $n$ is 2, these payoff functions are denoted by $u_1$ and $u_2$, and their respective strategies are denoted by $s_1$ and $s_2$ for a profile $s$.

A player $i$ may have some belief or guess about the probabilities over the strategic choices of other players. A particular belief is represented by a probability distribution $\mu_{-i}$ over $S_{-i}$, where (by our abbreviation) the latter is the set of all combinations of strategies by the players other than $i$. We may then write $\mu_{-i} \in \Delta S_{-i}$ to indicate that $\mu_{-i}$ is in the collection $\Delta S_{-i}$ of probability distributions over $S_{-i}$. With this belief $\mu_{-i}$, if player $i$ chooses the strategy $s_i$ the expected payoff $u_i(s_i, \mu_{-i})$ for $i$ is therefore $\sum_{s_{-i} \in S_{-i}} \mu_{-i}(s_{-i})\, u_i(s_i, s_{-i})$. Player $i$’s strategy $s_i$ is a best response to its belief $\mu_{-i}$ (about the others’ strategies) if $u_i(s_i, \mu_{-i}) \geq u_i(s'_i, \mu_{-i})$ for every $s'_i \in S_i$, i.e., $s_i$ has the best expected payoff relative to its other strategies.
There can be more than one best response to $\mu_{-i}$. In the two-player examples below the probability distributions are all very simple: they place probability 1 on each of the opponent’s possible strategy choices in turn, so that it suffices to reason about the best response to opponent strategies one at a time. A profile $s$ is a pure strategy³ Nash equilibrium if for each player $i$, $s_i$ is a best response to $s_{-i}$. What this entails is that the choice of $s_i$ for each player is rationalizable with respect to the hypothesis that every player knows that each player can reason completely about the best strategic choices other players will make in any circumstance.

³ A mixed strategy is one in which player $i$ “randomizes” its own strategies according to some probability distribution. We consider such strategies in a later paper.
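The expected payoff and best response definitions can be illustrated with a minimal Python sketch for the two-player case, where an opponent profile is a single opponent move. The function names and the uniform belief are illustrative assumptions, not part of the paper; the payoffs are player 1's entries from Fig. 1.

```python
def expected_payoff(u_i, s_i, belief):
    """Sum of mu(s_other) * u_i(s_i, s_other) over the opponent moves in the belief."""
    return sum(prob * u_i[(s_i, s_other)] for s_other, prob in belief.items())


def best_responses(u_i, own_strategies, belief):
    """All strategies of player i whose expected payoff is maximal under the belief."""
    payoffs = {s: expected_payoff(u_i, s, belief) for s in own_strategies}
    best = max(payoffs.values())
    return [s for s in own_strategies if payoffs[s] == best]


# Player 1's payoffs from Fig. 1, keyed by (own move, opponent move).
u1 = {("C1", "C2"): 2, ("C1", "F2"): 0, ("F1", "C2"): 3, ("F1", "F2"): 1}

point_belief = {"C2": 1.0}               # probability 1 on a single opponent move
uniform_belief = {"C2": 0.5, "F2": 0.5}  # hypothetical belief, not taken from the text

print(best_responses(u1, ["C1", "F1"], point_belief))
print(best_responses(u1, ["C1", "F1"], uniform_belief))
```

Under both beliefs the only best response is F1, which matches the observation that finking is player 1's best response whatever player 2 does.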


We illustrate this using four standard two-player games, shown in figures 1, 2, 3 and 4, chosen as representatives of the features to be addressed in later sections.

Perhaps the most familiar of these four is the Prisoners’ Dilemma. Its scenario is as follows. Two persons (players 1 and 2 in this abstract setting) are arrested for joint commission of a crime. They are interrogated separately and cannot collaborate. The circumstances are such that if they both cooperate by keeping silent, both will get off. However, if player 1 keeps silent but player 2 confesses (“finks”), then player 1 will be severely punished while player 2 gets only a short sentence; and vice-versa. In the remaining case, they both fink, and here they are both jailed for a long time. The payoffs corresponding to these possibilities are shown in the matrix. The rows are indexed by player 1’s strategies, and the columns by those of player 2. The matrix entries are pairs in which the first component is the payoff to player 1 and the second is the payoff to player 2 for the corresponding strategy pairs. For instance, the entry for cell (F1, C2) is (3, 0), meaning that $u_1(F1, C2) = 3$ and $u_2(F1, C2) = 0$. The striking feature of this game is that the best response of each player is to fink, resulting in a Nash equilibrium (F1, F2) which is a bad outcome compared with the (C1, C2) possibility.

The next game is the Battle of the Sexes, where the scenario is that a couple has to decide whether to go to the opera or the movies. One prefers the opera and the other prefers the movies, but they would rather forgo their preference than go alone. Here there are two Nash equilibria, (OP1, OP2) and (MV1, MV2), as can be verified. In the Pareto Coordination game there are also two Nash equilibria, (A1, A2) and (B1, B2), as again can be verified.

The last example is the Matching Pennies game. It does not have a pure strategy Nash equilibrium⁴. To recapitulate the well-known argument for this, the best response of player 1 to the strategy H2 of player 2 is H1, and to T2 of player 2 is T1. But the best response of player 2 to player 1’s strategy H1 is T2, and to T1 of player 1 is H2. Thus none of (H1, H2), (H1, T2), (T1, H2) or (T1, T2) consists of mutual best responses of players 1 and 2. For example, in (H1, H2), H2 is not player 2’s best response to player 1’s H1.
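The equilibria claimed for the four games can be checked by brute force. The following Python sketch is only an illustration: the function name and the dictionary layout are assumptions, while the payoffs are copied from figures 1 to 4. It tests every cell of each matrix for mutual best responses.

```python
from itertools import product


def pure_nash_equilibria(rows, cols, payoff):
    """Return every (r, c) such that r is a best response to c and c is a best response to r."""
    equilibria = []
    for r, c in product(rows, cols):
        u1, u2 = payoff[(r, c)]
        row_is_best = all(u1 >= payoff[(r2, c)][0] for r2 in rows)
        col_is_best = all(u2 >= payoff[(r, c2)][1] for c2 in cols)
        if row_is_best and col_is_best:
            equilibria.append((r, c))
    return equilibria


# Payoff matrices of Figs. 1-4: (player 1 move, player 2 move) -> (u1, u2).
GAMES = {
    "Prisoners' Dilemma": (["C1", "F1"], ["C2", "F2"], {
        ("C1", "C2"): (2, 2), ("C1", "F2"): (0, 3),
        ("F1", "C2"): (3, 0), ("F1", "F2"): (1, 1)}),
    "Battle of the Sexes": (["OP1", "MV1"], ["OP2", "MV2"], {
        ("OP1", "OP2"): (2, 1), ("OP1", "MV2"): (0, 0),
        ("MV1", "OP2"): (0, 0), ("MV1", "MV2"): (1, 2)}),
    "Pareto Coordination": (["A1", "B1"], ["A2", "B2"], {
        ("A1", "A2"): (2, 2), ("A1", "B2"): (0, 0),
        ("B1", "A2"): (0, 0), ("B1", "B2"): (1, 1)}),
    "Matching Pennies": (["H1", "T1"], ["H2", "T2"], {
        ("H1", "H2"): (1, -1), ("H1", "T2"): (-1, 1),
        ("T1", "H2"): (-1, 1), ("T1", "T2"): (1, -1)}),
}

for name, (rows, cols, payoff) in GAMES.items():
    print(name, pure_nash_equilibria(rows, cols, payoff))
```

The check only covers pure strategies, so it returns an empty list for Matching Pennies.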

⁴ However, it does have a mixed strategy Nash equilibrium.

| Player 1 \ Player 2 | C2 | F2 |
|---|---|---|
| C1 | (2,2) | (0,3) |
| F1 | (3,0) | (1,1) |

Fig. 1. Prisoners’ Dilemma

| Player 1 \ Player 2 | OP2 | MV2 |
|---|---|---|
| OP1 | (2,1) | (0,0) |
| MV1 | (0,0) | (1,2) |

Fig. 2. Battle of the Sexes

| Player 1 \ Player 2 | A2 | B2 |
|---|---|---|
| A1 | (2,2) | (0,0) |
| B1 | (0,0) | (1,1) |

Fig. 3. Pareto Coordination

| Player 1 \ Player 2 | H2 | T2 |
|---|---|---|
| H1 | (1,−1) | (−1,1) |
| T1 | (−1,1) | (1,−1) |

Fig. 4. Matching Pennies

3 ELPs: Extended Logic Programs

Extended logic programs (ELPs) were introduced by Gelfond and Lifschitz (op.cit.) to increase the expressive power of logic programs by permitting classical negation to co-exist with negation as failure. They achieved this by a slight tweak of their notion of stable models of logic programs that only have negation as failure (but not classical negation). Our main result actually depends on stable models, but in anticipation of future generalization to situations in which agents can reason about other agents, and can promise or commit to do or not to do certain actions, we might as well use the vocabulary of the analogs of stable models in the ELP setting, namely answer sets.

Given an ELP $\Pi$ and a set $S$ of literals, the Gelfond-Lifschitz reduct of $\Pi$ with respect to $S$, denoted by $\Pi^S$, is a definite logic program obtained from $\Pi$ by (i) deleting every clause which has not $L$ in its body for which $L \in S$, and (ii) dropping all not $L$ in the surviving clauses. Intuitively this can be justified by regarding $S$ as a “guess” at a solution (successful queries) of the ELP $\Pi$. If the guess $S$ is correct and $L$ is in it then any clause with not $L$ in its body cannot be used. In all other clauses, any not $L$ is guaranteed to be such that $L \notin S$, so not $L$ might as well be dropped since $L$ is bound to finitely fail. To close the circle of this intuition, we call a guess $S$ an answer set if $\mathrm{lfp}(\Pi^S) = S$, where $\mathrm{lfp}(\Pi^S)$ is the least fixed point⁵ of the definite program $\Pi^S$.

This simple example of an ELP is taken from Gelfond and Lifschitz (op.cit.).

eligible ← highGPA
eligible ← minority, fairGPA
¬eligible ← ¬fairGPA, ¬highGPA
interview ← not eligible, not ¬eligible

Assume also that the facts about a certain candidate Anne are given: fairGPA, ¬highGPA. It can be verified that the only answer set is {fairGPA, ¬highGPA, interview}. If instead the facts had been minority, fairGPA, then the only answer set will contain eligible.
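The reduct construction and the answer set test can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation, and it makes two assumptions: a classically negated literal ¬L is written as the string "-L" and treated as an ordinary atom, and guesses containing both L and -L receive no special handling (they do not arise in this example). Answer sets are found by trying every subset of the literals, which is only feasible for tiny programs.

```python
from itertools import chain, combinations

# A rule is (head, positive_body, naf_body); "-x" stands for the classical negation of x.


def reduct(program, guess):
    """Gelfond-Lifschitz reduct: (i) delete rules containing 'not L' with L in the guess,
    (ii) drop the remaining 'not L' literals."""
    return [(head, pos) for head, pos, naf in program if not (set(naf) & guess)]


def least_fixed_point(definite_program):
    """Iterate the rules of a definite program until no new literal can be derived."""
    derived, changed = set(), True
    while changed:
        changed = False
        for head, pos in definite_program:
            if head not in derived and set(pos) <= derived:
                derived.add(head)
                changed = True
    return derived


def answer_sets(program):
    """Brute force: a guess S is an answer set when lfp(reduct(program, S)) == S."""
    literals = sorted({lit for head, pos, naf in program for lit in (head, *pos, *naf)})
    guesses = chain.from_iterable(combinations(literals, k) for k in range(len(literals) + 1))
    return [set(g) for g in guesses if least_fixed_point(reduct(program, set(g))) == set(g)]


RULES = [
    ("eligible", ["highGPA"], []),
    ("eligible", ["minority", "fairGPA"], []),
    ("-eligible", ["-fairGPA", "-highGPA"], []),
    ("interview", [], ["eligible", "-eligible"]),
]
anne = RULES + [("fairGPA", [], []), ("-highGPA", [], [])]
other = RULES + [("minority", [], []), ("fairGPA", [], [])]

print(answer_sets(anne))
print(answer_sets(other))
```

For Anne's facts the search finds a single answer set containing fairGPA, -highGPA and interview; for the second set of facts the single answer set contains eligible and no longer contains interview.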

⁵ This can also be defined as the smallest Herbrand model of $\Pi^S$.

4 LPODs: Logic Programs with Ordered Disjunctions

Logic programs with ordered disjunctions were introduced by Brewka and his colleagues (op.cit.) to rank multiple answer sets of ELPs further extended with disjunctive literals in the heads of clauses. A clause of this kind may look like

A ∨ B ∨ C ← D ∧ E ∧ F ∧ not G ∧ not H

where A, B, ..., H are literals. The disjunction A ∨ B ∨ C in the head essentially says that whenever the body is true, at least one of A or B or C is true, and in fact it is the minimal models that suffice. However, even with this extended syntax we do not have a way to express a preference for, say, the answer sets that contain A over those that contain B, and those with B over those with C. The desire to express such preferences led Brewka et al. to introduce a new class of disjunctive ELPs. The syntactic modification to the above clause to express preference of A to B, and of B to C, is this:

A × B × C ← D ∧ E ∧ F ∧ not G ∧ not H

Informally, this means that (when the body is true), if we can have A we are done, but if not then we will settle for B, and if even B cannot be had, we will be satisfied with C. For details of how this can be formally achieved with a variety of preferences on answer sets of programs that contain such ordered disjunctions we refer the reader to Brewka’s exposition (op.cit.). In our present context this will be achieved in a manner that in fact reflects one simple preference.
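The informal reading of an ordered disjunction can be made concrete by assigning each clause a degree of satisfaction relative to an answer set. The Python sketch below follows the usual convention in the LPOD literature (degree 1 when the body does not hold, otherwise the position of the first head literal present in the answer set); this convention is an assumption brought in from outside the present text, and the function name is hypothetical.

```python
def satisfaction_degree(rule, answer_set):
    """Degree to which answer_set satisfies the clause  h1 x h2 x ... x hk <- body.

    rule is (heads, positive_body, naf_body). Smaller degrees are better.
    """
    heads, pos_body, naf_body = rule
    body_holds = set(pos_body) <= answer_set and not (set(naf_body) & answer_set)
    if not body_holds:
        return 1  # a clause whose body is false is trivially satisfied as well as possible
    for position, literal in enumerate(heads, start=1):
        if literal in answer_set:
            return position  # position of the best available option
    return None  # body holds but no option is present: the clause is not satisfied


# The clause  A x B x C <- D, E, F, not G, not H  from the text.
clause = (["A", "B", "C"], ["D", "E", "F"], ["G", "H"])

print(satisfaction_degree(clause, {"D", "E", "F", "A"}))
print(satisfaction_degree(clause, {"D", "E", "F", "B"}))
print(satisfaction_degree(clause, {"D", "E", "F", "C"}))
```

The three calls return 1, 2 and 3 respectively, mirroring "have A if possible, otherwise settle for B, otherwise C".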

5 Answer Sets and Nash Equilibria

We will encode the normal form specification of games by LPODs. Player $i$ will “own” a set of clauses, and each clause encodes the player’s preferred responses to a hypothetical strategic choice profile of other players ($s_{-i}$ in the notation above). The ordering of the disjunction is exactly this preference, with the best response strategy $s_i$ being the first disjunct, followed by the other strategies of $i$ in order of decreasing payoff. There is also a classical disjunction that encodes the deterministic strategic choices of player $i$, but that is explained in the simpler 2-person context in the next paragraph.

In the specific case of two-person pure strategy games, a typical clause will therefore be of the form $p_1 \times p_2 \times \cdots \times p_k \leftarrow q$, where the $p_i$ are the strategies of player 1 in response to the strategy $q$ of player 2. In the disjunction, $p_1$ will be 1’s best response to $q$, $p_2$ will be 1’s next best response to $q$, and so forth. If player 2 has $n$ strategies, then there will be $n$ such clauses for player 1. Dually, player 2 will have a set of $k$ clauses encoding 2’s responses to each of 1’s $k$ strategies. To say that player 1 has to make a move, or equivalently choose a strategy, we write a clause of the form $p_1 \vee p_2 \vee \cdots \vee p_k \leftarrow$, i.e. a classical disjunctive fact. We call such facts “move clauses” below. Under the minimal model semantics [Brewka 04] that is conventional for classical disjunctions this forces models to be those with exactly one of the disjuncts. There is a dual such fact for player 2.

We now exhibit the clauses that encode the four examples above. Although we separate the clauses of the two players, this is only for convenience: there is only one program for each normal form. The answer sets are ranked according to the ranks of the preferred strategies for each player, so that the individual ranks of each pair of strategies $(s_1, s_2)$, with $s_1$ being 1’s move and $s_2$ being 2’s move, induce a corresponding rank in the answer sets. The minimal model semantics for the “move clauses” ensures that each $(s_1, s_2)$ pair occurs in one and only one answer set, so this rank is well-defined.

Prisoners’ Dilemma

Player 1 clauses:
F1 × C1 ← C2 (1)
F1 × C1 ← F2 (2)

Player 2 clauses:
F2 × C2 ← C1 (3)
F2 × C2 ← F1 (4)

Move clauses:
F1 ∨ C1. (5)
F2 ∨ C2. (6)

Answer set rankings (player 1, player 2): (C1, C2) is (2, 2); (C1, F2) is (2, 1); (F1, C2) is (1, 2); and (F1, F2) is (1, 1). The only Nash equilibrium is (F1, F2), which has rank (1, 1).

Battle of the Sexes

Player 1 clauses:
OP1 × MV1 ← OP2 (7)
MV1 × OP1 ← MV2 (8)

Player 2 clauses:
OP2 × MV2 ← OP1 (9)
MV2 × OP2 ← MV1 (10)

Move clauses:
OP1 ∨ MV1. (11)
OP2 ∨ MV2. (12)

Answer set rankings (player 1, player 2): (MV1, MV2) is (1, 1); (MV1, OP2) is (2, 2); (OP1, MV2) is (2, 2); and (OP1, OP2) is (1, 1). The two Nash equilibria are (OP1, OP2) and (MV1, MV2), both of which have rank (1, 1).

Pareto Coordination

Player 1 clauses:
A1 × B1 ← A2 (13)
B1 × A1 ← B2 (14)

Player 2 clauses:
A2 × B2 ← A1 (15)
B2 × A2 ← B1 (16)

Move clauses:
A1 ∨ B1. (17)
A2 ∨ B2. (18)

Answer set rankings (player 1, player 2): (A1, A2) is (1, 1); (A1, B2) is (2, 2); (B1, A2) is (2, 2); and (B1, B2) is (1, 1). The two Nash equilibria are (A1, A2) and (B1, B2), both of which have rank (1, 1).

Matching Pennies

Player 1 clauses:
H1 × T1 ← H2 (19)
T1 × H1 ← T2 (20)

Player 2 clauses:
T2 × H2 ← H1 (21)
H2 × T2 ← T1 (22)

Move clauses:
H1 ∨ T1. (23)
H2 ∨ T2. (24)

Answer set rankings (player 1, player 2): (H1, H2) is (1, 2); (H1, T2) is (2, 1); (T1, H2) is (2, 1); and (T1, T2) is (1, 2). There is no (pure strategy) Nash equilibrium⁶ and there is no answer set with rank (1, 1).

⁶ However, this game has a mixed strategy Nash equilibrium.

In these examples the known Nash equilibria coincide with answer sets of rank (1, 1), and when there is no Nash equilibrium there are also no answer sets of rank (1, 1). They are instances of the main result of this paper. We say that an answer set is most preferred if all of the individual strategies in it are ranked 1, i.e., the $n$-tuple of player preferences for the answer set consists entirely of 1’s.

Proposition 1 The answer sets which are most preferred are exactly the pure strategy Nash equilibria.

Proof Outline: Each possible strategy profile $(s_1, s_2, \ldots, s_n)$ of the $n$ players is ranked by a tuple $(p_1, p_2, \ldots, p_n)$ where $p_i$ is the position of $s_i$ in the ordered disjunction of player $i$’s clause that has in its body the strategies $s_{-i}$ (i.e., the strategies of the other players). The profile is a Nash equilibrium if and only if each such disjunct is the most preferred, i.e., is the left-most disjunct, and therefore $p_i = 1$ for each $i$.
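The encoding and the ranking used in the proof outline can be reproduced mechanically for two-player games. The Python sketch below is a hedged illustration: the function names are hypothetical, payoff ties are assumed not to occur (as in the four examples), and no answer set solver is involved. It derives each player's ordered response lists from a payoff matrix, reads off the rank of every profile, and keeps the profiles of rank (1, 1).

```python
from itertools import product


def response_clauses(own, other, payoff_of):
    """Map each opponent move q to the player's moves ordered best-first,
    i.e. the head p1 x p2 x ... x pk of the clause whose body is q."""
    return {q: sorted(own, key=lambda p: payoff_of(p, q), reverse=True) for q in other}


def ranks(rows, cols, payoff):
    """Rank (position for player 1, position for player 2) of every profile."""
    c1 = response_clauses(rows, cols, lambda p, q: payoff[(p, q)][0])
    c2 = response_clauses(cols, rows, lambda p, q: payoff[(q, p)][1])
    return {(r, c): (c1[c].index(r) + 1, c2[r].index(c) + 1)
            for r, c in product(rows, cols)}


def most_preferred(rows, cols, payoff):
    """Profiles in which every player's move is the left-most disjunct."""
    return [profile for profile, rank in ranks(rows, cols, payoff).items() if rank == (1, 1)]


# Payoffs from Fig. 1 and Fig. 4.
prisoners = {("C1", "C2"): (2, 2), ("C1", "F2"): (0, 3),
             ("F1", "C2"): (3, 0), ("F1", "F2"): (1, 1)}
pennies = {("H1", "H2"): (1, -1), ("H1", "T2"): (-1, 1),
           ("T1", "H2"): (-1, 1), ("T1", "T2"): (1, -1)}

print(ranks(["C1", "F1"], ["C2", "F2"], prisoners))
print(most_preferred(["C1", "F1"], ["C2", "F2"], prisoners))
print(most_preferred(["H1", "T1"], ["H2", "T2"], pennies))
```

The ranks computed for the Prisoners’ Dilemma agree with the rankings listed above, and the rank (1, 1) profiles are (F1, F2) for the Prisoners’ Dilemma and none for Matching Pennies, as Proposition 1 predicts.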


However, as the Battle of the Sexes and the Pareto Coordination examples show, the LPOD characterization of Nash equilibria does not discriminate between the asymmetry of the equilibria in the Battle of the Sexes game and the symmetry of the equilibria in the Pareto Coordination game. In the former, one of the equilibria benefits player 1 more while the other benefits player 2 more. In the latter, no player has an advantage in either equilibrium. There are ways in LPODs to make these distinctions, and they lead to the encoding of the game-theoretic concept of Pareto efficiency [Watson 02] that we have not fully investigated. On the other hand, the rankings provided by LPODs have information about what the players might next prefer if for some reason they cannot all have their most preferred choices. Further, in the Prisoners’ Dilemma example the most Pareto-efficient strategy profile (C1, C2) can be achieved if the players are allowed to promise or commit to cooperate. There are also ways in LPODs to encode this.

The alternative to normal forms for specifying games is the extensive form, which is a tree-like representation showing the time-sequential choice of strategies (moves) in a game. LPODs can also be used to encode this by appealing to experience in using them to discover plans.

6 Conclusion

We have shown that LPODs are a suitable formalism to express finite pure strategy games. In particular Nash equilibria correspond to most preferred answer sets. There are a number of loose ends that need to be tied up. Among them are the following. Not all Nash equilibria are equal, and an extension to LPODs can encode distinctions among them. Many games only have mixed strategy Nash equilibria. LPODs do not have the probabilistic structure to encode them, so they will have to be extended with (perhaps) existing notions of probabilistic disjunctive logic programs. Finally, it would be a service to the logic programming as well as the game theory community to investigate what we suspect is an equivalence between LPODs and OCLPs (ordered choice logic programs).

Acknowledgement

This research is supported by the National ICT Australia, which is funded through the Australian Government’s Backing Australia’s Ability initiative, in part through the Australian Research Council.

References

[Brewka 04] G. Brewka, “Answer Sets and Qualitative Decision Making”, Synthese, 2004.

[Gelfond and Lifschitz 91] M. Gelfond and V. Lifschitz, “Classical negation in logic programs and disjunctive databases”, New Generation Computing, pp. 365–385, 1991.

[Luce and Raiffa 57] R.D. Luce and H. Raiffa, Games and Decisions: Introduction and Critical Survey, 1957, reprinted by Dover Publications.

[De Vos and Vermeir 99] M. De Vos and D. Vermeir, “Choice Logic Programs and Nash Equilibria in Strategic Games”, Proceedings of the 13th CSL’99 conference, pp. 266–276, Springer LNCS 1683, 1999.

[Watson 02] J. Watson, Strategy: An Introduction to Game Theory, 2002, W.W. Norton.