Sequential Equilibrium and Perfect Equilibrium in Games of Imperfect Recall
Joseph Y. Halpern, Cornell University
[email protected]
Rafael Pass, Cornell University
[email protected]
October, 2008
Abstract
Definitions of sequential equilibrium and perfect equilibrium are given in games of imperfect recall.
1 Introduction
Sequential equilibrium [Kreps and Wilson 1982] and perfect equilibrium [Selten 1975] are perhaps the most common solution concepts used in extensive-form games. Both try to capture the intuition that agents play optimally, not just on the equilibrium path, but also off the equilibrium path; that is, even if an information set off the equilibrium path is reached (which a priori is an event of probability 0), a player makes an optimal move conditional on having reached that information set. Unfortunately, both solution concepts have been defined only in games of perfect recall, where players remember all the moves that they have made and the information sets that they have observed. Perfect recall seems like an unreasonable assumption in practice. To take just one example, consider even a relatively short card game such as bridge. In practice, in the middle of a game, most people do not remember the complete bidding sequence and the complete play of the cards (although this can be highly relevant information!). Indeed, more generally, we would not expect most people to exhibit perfect recall in games that are even modestly longer than the standard two- or three-move games considered in most game theory papers. The intuition that underlies sequential and perfect equilibrium, namely, that players should play optimally even off the equilibrium path, seems to make sense even in games of imperfect recall. However, the work of Piccione and Rubinstein [1997b] (PR from now
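on) suggests some subtleties. The following two examples, both due to PR, illustrate the problems. Example 1.1: Consider the game described in Figure 1.
[Game tree: nature moves at x0, choosing x1 or x2 with probability .5 each. At x1 and at x2 the agent chooses S or B; S ends the game with payoff 2, while B leads from x1 to x3 and from x2 to x4. At the information set X = {x3, x4} the agent chooses L or R, leading to the terminal nodes z2 through z5.]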
Figure 1: Subtleties with imperfect recall.
It is not hard to show that the strategy that maximizes expected utility chooses action S at node x1, action B at node x2, and action R at the information set X consisting of x3 and x4. Call this strategy b. Let b′ be the strategy of choosing action B at x1, action S at x2, and L at X. As PR point out, if node x1 is reached and the agent is using b, then he will not feel that b is optimal, conditional on being at x1; he will want to use b′. Indeed, there is no single strategy that the agent can use that he will feel is optimal both at x1 and x2. The problem here is that starting out with strategy b and then switching to b′ on reaching x1 (while continuing to use b on reaching x2) results in a "strategy" that does not respect the information structure of the game, since the agent makes different moves at the two nodes in the information set {x3, x4}. As pointed out by Halpern [1997], if the agent can know what strategy he is supposed to be using at all times, then the information sets are not doing a good job here of describing what the agent knows. The agent will know different things at x3 and x4. For the purposes of this paper, we assume that agents are restricted to using strategies that respect the information sets. As we shall see, to capture sequential equilibrium appropriately, when we consider deviations from a strategy b at an information set X, we must consider strategies b′ that, while respecting the information structure, can differ from b only at X or at information sets that come strictly after X in a precise sense. Example 1.2: The following game, commonly called the absent-minded driver paradox, illustrates a different problem. It is described by PR as follows:
An individual is sitting late at night in a bar planning his midnight trip home. In order to get home he has to take the highway and get off at the second exit. Turning at the first exit leads into a disastrous area (payoff 0). Turning at the second exit yields the highest reward (payoff 4). If he continues beyond the second exit he will reach the end of the highway and find a hotel where he can spend the night (payoff 1). The driver is absentminded and is aware of this fact. When reaching an intersection he cannot tell whether it is the first or the second intersection and he cannot remember how many he has passed. The situation is described by the game tree in Figure 2.
[Game tree: the nodes e1 and e2 form the information set Xe. At each of them the driver chooses E (exit) or B (continue). Exiting at e1 leads to z1 (payoff 0), exiting at e2 leads to z2 (payoff 4), and continuing past e2 leads to z3 (payoff 1).]
Figure 2: The absentminded driver game.
Clearly the only decision the driver has to make is whether to get off when he reaches an exit. A straightforward computation shows that the driver's optimal strategy ex ante is to exit with probability 1/3; this gives him a payoff of 4/3. On the other hand, suppose that the driver starts out using the optimal strategy, and when he reaches the information set, he ascribes probability α to being at e1. He then considers whether he should switch to a new strategy, where he exits with probability p. Another straightforward calculation shows that his expected payoff is then
\[ \alpha\bigl((1-p)^2 + 4p(1-p)\bigr) + (1-\alpha)\bigl((1-p) + 4p\bigr) = 1 + (3-\alpha)p - 3\alpha p^2. \tag{1} \]
Equation 1 is maximized when p = min(1, (3 − α)/(6α)), which equals the ex ante optimal exit probability 1/3 only if α = 1. Thus, unless the driver ascribes probability 1 to being at e1, he should want to change strategies when he reaches the information set. This means that as long as α < 1, we cannot hope to find a sequential equilibrium in this game. The driver will want to change strategies as soon as he reaches the information set. According to the technique used by Selten [1975] to ascribe beliefs, also adopted by PR, if the driver is using the optimal strategy, e1 should have probability 3/5 and e2 should have probability 2/5. The argument is that, according to the optimal strategy, e1 is reached with probability 1 and e2 is reached with probability 2/3. Thus, 1 and 2/3
should give the relative probability of being at e1 and e2 . Normalizing these numbers gives us 3/5 and 2/5, and leads to non-existence of sequential equilibrium. (This point is also made by Kline [2005].) As shown by PR and Aumann, Hart, and Perry (AHP) [1997], this way of ascribing beliefs guarantees that the driver will not want to use any single-action deviation from the optimal strategy. That is, if the driver considers changing his strategy only at a single node while keeping the rest of his strategy fixed, then there will be no deviations from the optimal strategy. PR call this the modified multi-self approach, whereas AHP call it action-optimality. AHP suggest that this approach solves the paradox. On the other hand, Piccione and Rubinstein [Piccione and Rubinstein 1997a] argue that it is hard to justify the assumption that an agent cannot change her future actions. (See also [Gilboa 1997; Lipman 1997] for further discussion of this issue.) As we said earlier, in our definition of sequential equilibrium, we do allow an agent that is considering deviating at an information set X to make changes after X, but only at X or at information sets that occur strictly after X. Grove and Halpern [1997] (GH from now on) argue that while using beliefs ascribed in this way leads to a perfectly reasonable computation of the expected utility in games with perfect recall (which is all that Selten considered), we cannot use these probabilities in games of imperfect recall. Specifically, if we identify the event of reaching an information set X with the set of all complete histories that pass through X, and then try to compute the expected utility of a strategy conditional on X by summing the expected utility of the strategy conditional on each node x ∈ X multiplied by the probability of being at x, we must be careful about how we ascribe beliefs about being at a node x ∈ X. The obvious way of doing so does not give the correct expected utility calculation! 
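The arithmetic in Example 1.2 can be checked with a short script. This is only an illustrative sketch: the function names `deviation_payoff`, `best_exit_probability`, and `selten_beliefs` are hypothetical and not from the paper, and exact rational arithmetic is used so that the values 3/5, 2/5, and 1/3 come out exactly.

```python
from fractions import Fraction

def deviation_payoff(alpha, p):
    """Left-hand side of Equation (1): belief alpha of being at e1,
    exit probability p used from the information set onward."""
    at_e1 = (1 - p) ** 2 + 4 * p * (1 - p)  # continue twice (1) or continue, then exit (4)
    at_e2 = (1 - p) + 4 * p                 # continue (1) or exit (4)
    return alpha * at_e1 + (1 - alpha) * at_e2

def best_exit_probability(alpha):
    """Maximizer of Equation (1) over p in [0, 1]; assumes alpha > 0."""
    alpha = Fraction(alpha)
    return min(Fraction(1), (3 - alpha) / (6 * alpha))

def selten_beliefs(exit_prob):
    """Normalize the reach probabilities of e1 and e2 (Selten's technique)."""
    reach_e1, reach_e2 = Fraction(1), 1 - Fraction(exit_prob)
    total = reach_e1 + reach_e2
    return reach_e1 / total, reach_e2 / total

alpha, _ = selten_beliefs(Fraction(1, 3))   # beliefs under the ex ante optimal strategy
preferred = best_exit_probability(alpha)    # what the driver would then like to switch to
```

With the normalized beliefs, `preferred` differs from 1/3, which is the sense in which the driver wants to change strategies; with `alpha = 1` the maximizer is 1/3 again.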
In this paper, we show how sequential equilibrium and perfect equilibrium can be defined in games of imperfect recall, taking into account the lessons learned from these two examples, namely, that we need to be careful about what strategies are considered and how beliefs are ascribed. We show that, when this is done in what we argue is an appropriate way, sequential equilibrium and perfect equilibrium exist in games of imperfect recall, and have the properties that we would expect. By way of contrast, the work of PR and more recent work of Kline [2002, 2005] show that what we would argue is an inappropriate way to ascribe beliefs leads to definitions of sequential equilibrium and perfect equilibrium that do not have the properties we expect. The rest of this paper is organized as follows. In Section 2, we review the GH arguments and their technique for ascribing beliefs. In Section 3, we describe our approach to ascribing beliefs. In Section 4, we show that sequential equilibrium and perfect equilibrium exist in games of imperfect recall. We conclude with some discussion in Section 5.
2 The GH approach
Here, in a nutshell, is the GH argument, which is quite simple. In an extensive-form game, utility is defined on the outcomes, which are leaves of the game tree. A leaf of the tree can be identified with a complete history of the tree, so we can view utility as a function on complete histories. To compute the expected utility associated with a strategy profile, we need a probability distribution on complete histories. Fortunately, a strategy profile places an obvious probability on histories, so computing the expected utility is straightforward. A node x in a game tree can be identified with a set of complete histories (the ones that pass through x). We can take the probability of reaching x (given strategy b) to be the probability of this set of complete histories. Computing the expected utility conditional on reaching x is also straightforward. What about computing the expected utility conditional on reaching an information set X? Consider two approaches:
1. We can compute the expected utility conditional on reaching X, where we identify the event of reaching X with the set of histories that pass through some node in X.
2. We can compute the sum over x ∈ X of the expected utility conditional on reaching x, weighted by the probability of x.
If we use Selten's approach for ascribing beliefs, then, as shown by GH, these two approaches coincide in games of perfect recall. More generally, they coincide in games where no information set contains two nodes in the same history. However, they may not coincide in games where an information set can contain two nodes in the same history. The absentminded driver game provides an example of this phenomenon. The ex ante optimal strategy puts probability 1/3 on (the history ending with) z1, probability 2/9 on z2, and probability 4/9 on z3.
All three histories go through the information set, so, according to the first approach, the expected utility conditional on reaching the information set is identical to the unconditional expected utility. On the other hand, the second approach gives a quite different answer. The expected utility conditional on reaching e1 is the same as the unconditional expected utility, namely 4/3, while the expected utility conditional on reaching e2 is 2. Thus, the second approach would give an expected utility of (3/5)(4/3) + (2/5)(2) = 8/5. The reason for the two different answers should be clear: the sets of histories associated with the two nodes are not disjoint. In games with perfect recall (and in games where there is at most one node per history in an information set) the sets of histories associated with nodes in an information set are disjoint, and the two approaches agree. We would argue that the first approach is the correct approach to computing expected utility in games of imperfect recall. (We remark that Kreps and Wilson's [1982] definition of sequential equilibrium explicitly considers the utility conditional on reaching X; PR do not commit to a particular approach.)
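The two approaches can be compared numerically on the absentminded driver game. The sketch below uses only the probabilities and payoffs stated in the text; the variable names (`history_prob`, `through`, `approach_1`, `approach_2`) are illustrative choices, not notation from the paper.

```python
from fractions import Fraction

EXIT = Fraction(1, 3)                      # ex ante optimal exit probability
PAYOFF = {"z1": 0, "z2": 4, "z3": 1}

# Probability of each complete history under the ex ante optimal strategy.
history_prob = {
    "z1": EXIT,
    "z2": (1 - EXIT) * EXIT,
    "z3": (1 - EXIT) ** 2,
}
# Complete histories that pass through each node of the information set.
through = {"e1": ["z1", "z2", "z3"], "e2": ["z2", "z3"]}

def expected_utility_given(histories):
    """Expected utility conditional on the given set of complete histories."""
    mass = sum(history_prob[z] for z in histories)
    return sum(history_prob[z] * PAYOFF[z] for z in histories) / mass

# Approach 1: condition on the set of histories through some node of X.
through_X = sorted(set(through["e1"]) | set(through["e2"]))
approach_1 = expected_utility_given(through_X)

# Approach 2: weight per-node conditional utilities by normalized reach probabilities.
reach = {x: sum(history_prob[z] for z in zs) for x, zs in through.items()}
total = sum(reach.values())
approach_2 = sum(reach[x] / total * expected_utility_given(zs)
                 for x, zs in through.items())
```

The histories through e2 are counted twice in `total`, once for each node they pass through, which is why the second value differs from the first.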
We can say even more. Given an information set $X$, define the upper frontier of $X$, denoted $\hat{X}$, to consist of all those nodes $x \in X$ such that there is no node $x' \in X$ that strictly precedes $x$ on some path from the root. The notion of upper frontier was introduced by Halpern [1997], who argued that if a player switched strategies at all, the switch had to occur at the upper frontier. Note that for games where there is at most one node per history in an information set, we have $\hat{X} = X$.
Given a behavioral strategy profile $b$, let $\pi_b(x)$ be the probability of reaching a node $x$ when $b$ is used. For an information set $X$, set $\pi_b(X) = \sum_{x \in X} \pi_b(x)$. Selten suggested ascribing probability $\pi_b(x)/\pi_b(X)$ to being at a node $x$ in information set $X$. This was PR's preferred approach as well. But, as we have seen, this approach leads to problems in games such as the absentminded driver game. A different approach was suggested by Halpern [1997]. For a node $x$ in an information set $X$, define $\hat{\pi}_b(x) = \pi_b(x)$ if $x \in \hat{X}$ and $\hat{\pi}_b(x) = 0$ if $x \notin \hat{X}$. Again, we set $\hat{\pi}_b(X) = \sum_{x \in X} \hat{\pi}_b(x)$. Suppose that we take the probability of being at $x$ conditional on reaching $X$ to be $\hat{\pi}_b(x)/\hat{\pi}_b(X)$. This agrees with Selten's approach in games of perfect recall. Moreover, as shown by GH, this gives the same answer as the first (arguably correct) approach to computing expected utility conditional on reaching $X$. We now make this precise. Let $EU_i(b \mid x)$ denote the expected utility of strategy profile $b$ for player $i$ conditional on reaching node $x$, and let $EU_i(b \mid X)$ denote the expected utility for player $i$ of strategy profile $b$, conditional on reaching information set $X$, calculated using the first approach above.
Proposition 2.1: [Grove and Halpern 1997] If information set $X$ is reached by strategy profile $b$ with positive probability, then
\[ EU_i(b \mid X) = \sum_{x \in X} \frac{\hat{\pi}_b(x)}{\hat{\pi}_b(X)} \, EU_i(b \mid x). \]
Since $\hat{\pi} = \pi$ in games of perfect recall, this justifies Selten's calculation. Moreover, it justifies assigning probability 1 to being at $e_1$ in the absentminded driver game.
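A minimal sketch of the upper frontier and the frontier-restricted beliefs follows. It assumes the game tree is given as a parent map (a representation chosen here for illustration), and the helper names `upper_frontier` and `frontier_beliefs` are hypothetical. The conditional utilities 4/3 and 2 are the values computed in the text for the absentminded driver.

```python
from fractions import Fraction

def upper_frontier(info_set, parent):
    """Nodes of info_set that have no strict ancestor in info_set.
    `parent` maps each node to its parent; the root maps to None."""
    members = set(info_set)
    frontier = []
    for x in info_set:
        ancestor = parent[x]
        while ancestor is not None and ancestor not in members:
            ancestor = parent[ancestor]
        if ancestor is None:            # walked to the root without meeting info_set
            frontier.append(x)
    return frontier

def frontier_beliefs(info_set, parent, reach):
    """Reach probabilities zeroed off the frontier, then normalized.
    Assumes the information set is reached with positive probability."""
    frontier = set(upper_frontier(info_set, parent))
    hat = {x: (reach[x] if x in frontier else Fraction(0)) for x in info_set}
    total = sum(hat.values())
    return {x: hat[x] / total for x in info_set}

# Absentminded driver: e1 is the root and e2 is reached from e1.
parent = {"e1": None, "e2": "e1"}
reach = {"e1": Fraction(1), "e2": Fraction(2, 3)}
beliefs = frontier_beliefs(["e1", "e2"], parent, reach)

# Right-hand side of Proposition 2.1 with the conditional utilities from the text.
eu_at = {"e1": Fraction(4, 3), "e2": Fraction(2)}
eu_given_X = sum(beliefs[x] * eu_at[x] for x in beliefs)
```

Here all of the weight lands on e1, so the weighted sum reduces to the utility conditional on e1.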
3 Beliefs in Games of Imperfect Recall
Although using $\hat{\pi}$ instead of $\pi$ leads to the right calculation of expected utility, we find it hard to justify beliefs that always ascribe probability 0 to being at nodes that are not on the frontier. After all, if $b$ reaches a node $x$ with positive probability, then any "reasonable" belief should ascribe positive probability to being at $x$. (This point is also made by Piccione and Rubinstein [1997a].) We now provide a simple approach for ascribing beliefs that gives a correct expected utility calculation, while still allowing us to ascribe positive probability to nodes that are reached with positive probability. Fix a game $\Gamma$. Following Kreps and Wilson [1982],
a belief system $\mu$ for $\Gamma$ is a function that associates with each information set $X$ in $\Gamma$ a probability $\mu_X$ on the histories in $X$. PR quite explicitly interpret $\mu_X(x)$ as the probability of being at the node $x$, conditioned on reaching $X$. Just like Kreps and Wilson, they thus require that $\sum_{x \in X} \mu_X(x) = 1$. We instead interpret $\mu_X(x)$ as the probability of reaching $x$, conditioned on reaching $X$. We no longer require that $\sum_{x \in X} \mu_X(x) = 1$. While this property holds in games of perfect recall, in games of imperfect recall, if $X$ contains two nodes that are both on a history that is played with positive probability, the sum of the probabilities will be greater than 1. For instance, in the absentminded driver game, the ex ante optimal strategy reaches $e_1$ with probability 1 and $e_2$ with probability 2/3. Rather than requiring that $\sum_{x \in X} \mu_X(x) = 1$, we require that $\sum_{x \in \hat{X}} \mu_X(x) = 1$, that is, that the probability of reaching the upper frontier of $X$, conditional on reaching $X$, be 1. Since $\hat{X} = X$ in games of perfect recall, this requirement generalizes that of Kreps and Wilson. Moreover, it holds if we define $\mu_X$ in the obvious way.
Lemma 3.1: If $X$ is an information set that is reached by strategy profile $b$ with positive probability and $\mu_X(x) = \pi_b(x \mid X)$, then $\sum_{x \in \hat{X}} \mu_X(x) = 1$.
Proof: By definition,
\[ \sum_{x \in \hat{X}} \mu_X(x) = \sum_{x \in \hat{X}} \pi_b(x \mid X) = \pi_b(\hat{X} \mid X) = 1. \]
Given a belief system $\mu$ and a strategy profile $b$, define a probability distribution $\mu_X^b$ over terminal histories in the obvious way: for each terminal history $z$, let $x_z$ be the shortest history in $X$ that is a prefix of $z$, and define $\mu_X^b(z)$ to be the product of $\mu_X(x_z)$ and the probability that $b$ leads to the terminal history $z$ when started in $x_z$; if there is no prefix of $z$ in $X$, then $\mu_X^b(z) = 0$. Our definition of the probability distribution $\mu_X^b$ induced over terminal histories is essentially equivalent to that used by Kreps and Wilson [1982] for games of perfect recall. The only difference is that in games of perfect recall, a terminal history has at most one prefix in $X$. This is no longer the case in games of imperfect recall, so we must specify which prefix to select. For definiteness, we take the shortest prefix; however, it is easy to see that if $\mu_X(x)$ is defined as the probability of $b$ reaching $x$ conditioned on reaching $X$, then any choice of $x_z$ leads to the same distribution over terminal histories. We formalize this in the following lemma, whose straightforward proof is left to the reader.
Lemma 3.2: If $x_z$ and $x'_z$ are two prefixes of $z$ in some information set $X$, then $\pi_b(x_z \mid X)\,\pi_b(z \mid x_z) = \pi_b(x'_z \mid X)\,\pi_b(z \mid x'_z)$.
Note that if a terminal history $z$ has a prefix in $X$, then the shortest prefix of $z$ in $X$ is in $\hat{X}$. Moreover, defining $\mu_X^b$ in terms of the shortest history guarantees that it is a well-defined probability distribution, as long as $\sum_{x \in \hat{X}} \mu_X(x) = 1$, even if $\mu_X$ is not defined by some strategy profile $b$.
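Lemma 3.3: If $Z$ is the set of terminal histories and $\sum_{x \in \hat{X}} \mu_X(x) = 1$, then for any strategy profile $b$, we have $\sum_{z \in Z} \mu_X^b(z) = 1$.
Proof: By definition,
\[
\begin{aligned}
\sum_{z \in Z} \mu_X^b(z) &= \sum_{z \in Z} \mu_X(x_z)\,\pi_b(z \mid x_z) \\
&= \sum_{z \in Z} \sum_{x \in \hat{X}} \mu_X(x)\,\pi_b(z \mid x) \\
&= \sum_{x \in \hat{X}} \mu_X(x) \sum_{z \in Z} \pi_b(z \mid x) \\
&= \sum_{x \in \hat{X}} \mu_X(x) \\
&= 1.
\end{aligned}
\]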
Following Kreps and Wilson [1982], let $EU_i(b \mid X, \mu)$ denote the expected utility for player $i$, where the expectation is taken with respect to $\mu_X^b$. The following proposition, which can be viewed as a variant of Proposition 2.1, says that we can compute the expected utility of $b$ given a belief system $\mu$ using only nodes on the frontier (even though $\mu_X(x)$ may not be 0 for nodes $x$ not on the frontier).
Proposition 3.4: If $\mu$ is a belief system and $b$ is a strategy profile, then
\[ EU_i(b \mid X, \mu) = \sum_{x \in \hat{X}} \mu_X(x)\, EU_i(b \mid x). \]
Proof: Let $Z_X$ consist of those nodes in $Z$ that have a prefix in $X$; similarly, let $Z_x$ consist of those nodes in $Z$ whose prefix is $x$. Using the fact that $\hat{X} = \{x_z : z \in Z_X\}$, we get that
\[
\begin{aligned}
EU_i(b \mid X, \mu) &= \sum_{z \in Z} \mu_X^b(z)\,u_i(z) \\
&= \sum_{z \in Z_X} \mu_X(x_z)\,\pi_b(z \mid x_z)\,u_i(z) \\
&= \sum_{z \in Z_X} \sum_{x \in \hat{X}} \mu_X(x)\,\pi_b(z \mid x)\,u_i(z) \\
&= \sum_{x \in \hat{X}} \mu_X(x) \sum_{z \in Z_x} \pi_b(z \mid x)\,u_i(z) \\
&= \sum_{x \in \hat{X}} \mu_X(x)\, EU_i(b \mid x).
\end{aligned}
\]
Proposition 3.4 justifies calculating the expected utility using the upper frontier, without requiring that beliefs off the upper frontier be 0. However, note that $\mu_X$ is no longer a probability distribution on $X$.
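The induced distribution over terminal histories and the frontier formula can be illustrated on the absentminded driver game under the ex ante optimal strategy. This is a sketch under the text's numbers only; the dictionaries `continuation` and `prefixes_in_X` and the function `induced_distribution` are hypothetical names introduced for the example.

```python
from fractions import Fraction

EXIT = Fraction(1, 3)
PAYOFF = {"z1": 0, "z2": 4, "z3": 1}

# Beliefs read as reach probabilities conditional on reaching X = {e1, e2}.
mu = {"e1": Fraction(1), "e2": 1 - EXIT}
frontier = ["e1"]

# Probability that the strategy leads to terminal history z when started at x.
continuation = {
    "e1": {"z1": EXIT, "z2": (1 - EXIT) * EXIT, "z3": (1 - EXIT) ** 2},
    "e2": {"z2": EXIT, "z3": 1 - EXIT},
}
# Prefixes of each terminal history that lie in X, shortest first.
prefixes_in_X = {"z1": ["e1"], "z2": ["e1", "e2"], "z3": ["e1", "e2"]}

def induced_distribution(pick):
    """Distribution over terminal histories, using pick(prefixes) as x_z."""
    dist = {}
    for z, prefixes in prefixes_in_X.items():
        x = pick(prefixes)
        dist[z] = mu[x] * continuation[x][z]
    return dist

shortest = induced_distribution(lambda ps: ps[0])    # the choice made in the text
longest = induced_distribution(lambda ps: ps[-1])    # alternative choice of prefix

expected_utility = sum(shortest[z] * PAYOFF[z] for z in PAYOFF)
frontier_formula = sum(
    mu[x] * sum(q * PAYOFF[z] for z, q in continuation[x].items())
    for x in frontier
)
```

Because `mu` here comes from an actual strategy, both prefix choices give the same distribution, and the beliefs sum to 5/3 over X but to 1 over the frontier.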
4 Perfect and Sequential Equilibrium
4.1 Perturbed Games
Proposition 2.1 tells us how to compute the expected utility of a strategy conditional on reaching an information set that is reached with positive probability. However, it does not tell us how to define the expected utility conditional on reaching an information set that is off the equilibrium path. We need to know how to do this in order to define both sequential and perfect equilibrium. To deal with this, we follow Selten's approach of considering perturbed games. Given an extensive-form game $\Gamma$, let $\eta$ be a function associating with every information set $X$ and action $c$ that can be performed at $X$ a probability $\eta_c > 0$ such that, for each information set $X$ for player $i$, if $A(X)$ is the set of actions that $i$ can perform at $X$, then $\sum_{c \in A(X)} \eta_c < 1$. We call $\eta$ a perturbation of $\Gamma$. We think of $\eta_c$ as the probability of a "tremble"; since we view trembles as unlikely, we are most interested in the case that $\eta_c$ is small. A perturbed game is a pair $(\Gamma, \eta)$ consisting of a game $\Gamma$ and a perturbation $\eta$. A behavioral strategy for player $i$ in $(\Gamma, \eta)$ is a strategy $b$ such that, for each information set $X$ for player $i$ and each action $c \in A(X)$, $b(X)$ assigns probability at least $\eta_c$ to $c$. We can define best responses and Nash equilibrium in the usual way in perturbed games $(\Gamma, \eta)$; we simply restrict the definitions to the legal strategies for $(\Gamma, \eta)$. Note that with any strategy in a perturbed game, all information sets are reached with positive probability.
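The two conditions in this subsection (what makes η a perturbation, and what makes a strategy legal in the perturbed game) can be written as simple checks. The data layout below, with trembles keyed by (information set, action) pairs, and the function names are assumptions made for the sketch.

```python
from fractions import Fraction

def is_perturbation(eta, actions_at):
    """Every tremble probability is positive and, at each information set,
    the tremble probabilities sum to strictly less than 1."""
    for info_set, actions in actions_at.items():
        trembles = [eta[(info_set, c)] for c in actions]
        if any(t <= 0 for t in trembles) or sum(trembles) >= 1:
            return False
    return True

def is_legal_strategy(strategy, eta, actions_at):
    """A behavioral strategy is legal in the perturbed game if each action
    gets at least its tremble probability at every information set."""
    for info_set, actions in actions_at.items():
        dist = strategy[info_set]
        if sum(dist[c] for c in actions) != 1:
            return False
        if any(dist[c] < eta[(info_set, c)] for c in actions):
            return False
    return True

# Absentminded driver: one information set Xe with actions E (exit) and B (continue).
actions_at = {"Xe": ["E", "B"]}
eta = {("Xe", "E"): Fraction(1, 100), ("Xe", "B"): Fraction(1, 100)}
optimal = {"Xe": {"E": Fraction(1, 3), "B": Fraction(2, 3)}}
never_exit = {"Xe": {"E": Fraction(0), "B": Fraction(1)}}
```

In this example `optimal` is legal while the pure strategy `never_exit` is not, since it gives the action E less than its tremble probability.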
4.2 Best Responses at Information Sets
There are a number of ways to capture the intuition that a strategy $b_i$ for player $i$ is a best response to a strategy profile $b_{-i}$ for the remaining players at an information set. To make these precise, we need some notation. Let $b_i[X/c]$ denote the strategy that is identical to $b_i$ except that, at information set $X$, action $c$ is played. Switching to another action at an information set is, of course, not the same as switching to a different strategy at an information set. If $b'$ is a strategy for player $i$, we would like to define the strategy $[b_i, X, b']$ to be the strategy where $i$ plays $b_i$ up to $X$, and then switches to $b'$ at $X$. Intuitively, this means that $i$ plays $b'$ at all information sets that come after $X$. The problem is that the meaning of "after $X$" is not so clear in games with imperfect recall. For example, in the game in Figure 1, is the information set $X$ after the information set $\{x_1\}$? While $x_3$ comes after $x_1$, $x_4$ does not. The obvious interpretation of switching from $b$ to $b'$ at $x_1$ would have the agent playing $b'$ at $x_3$ but still using $b$ at $x_4$. As we have observed, the resulting "strategy" is not a strategy in the game, since the agent does not play the same way at $x_3$ and $x_4$. This problem does not arise in games of perfect recall. Define a strict partial order $\prec$ on information sets with the property that $X \prec X'$ iff, for all $x' \in X'$, there exists some $x \in X$ such that $x$ (strictly) precedes $x'$ on the game tree. This partial order makes sense whether or not we assume perfect recall. However, with perfect recall, it is not hard to show that $\prec$ is equivalent to $\prec'$, where $X \prec' X'$ iff, for some $x' \in X'$, there exists some $x \in X$ such that $x$ (strictly) precedes $x'$ on the game tree. Writing $X' \succ X$ for $X \prec X'$ (and similarly for $\succ'$), note that, for example, in the game in Figure 1, $X \succ' \{x_1\}$, but it is not the case that $X \succ \{x_1\}$. Although $\prec'$ is a partial order in the game in Figure 1, in general, in games of imperfect recall, $\prec'$ is not a partial order. In particular, in the game in Figure 3, we have $X_1 \prec' X_2$ and $X_2 \prec' X_1$.
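The difference between the "for all" order and the "for some" order on information sets can be checked directly on the decision nodes of Figure 1. The sketch below assumes the tree is given as a parent map, with x3 below x1 and x4 below x2 as described in the text; the function names are illustrative.

```python
def strictly_precedes(x, y, parent):
    """True if node x is a strict ancestor of node y in the game tree."""
    node = parent[y]
    while node is not None:
        if node == x:
            return True
        node = parent[node]
    return False

def precedes_all(X, Y, parent):
    """The 'for all' order: every node of Y has a strict ancestor in X."""
    return all(any(strictly_precedes(x, y, parent) for x in X) for y in Y)

def precedes_some(X, Y, parent):
    """The 'for some' order: at least one node of Y has a strict ancestor in X."""
    return any(any(strictly_precedes(x, y, parent) for x in X) for y in Y)

# Decision nodes of the game in Figure 1 (terminal nodes omitted).
parent = {"x0": None, "x1": "x0", "x2": "x0", "x3": "x1", "x4": "x2"}
X = ["x3", "x4"]

some_after_x1 = precedes_some(["x1"], X, parent)   # x3 comes after x1
all_after_x1 = precedes_all(["x1"], X, parent)     # but x4 does not
```

Only the "for some" relation holds between {x1} and X, matching the observation about Figure 1 above.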
[Game tree: nature moves at $x_0$, choosing $x_1$ or $x_2$ with probability .5 each; the player then moves at the information sets $X_1$ (actions $L_1$, $R_1$) and $X_2$ (actions $L_2$, $R_2$).]
Figure 3: A game where $\prec'$ is not a partial order.
Define $X \preceq X'$ iff $X = X'$ or $X \prec X'$. Now we can define $[b, X, b']$ to be the strategy according to which $i$ plays $b'$ at every information set $X'$ such that $X \preceq X'$, and otherwise plays $b$. The strategy $[b, X, b']$ is well defined even in games of imperfect recall, but it is perhaps worth noting that the strategy $[b, \{x_1\}, b']$ in the game of Figure 1 is the strategy where the player plays B at $x_1$, but still plays R at information set $X$, since we do not have $X \succ \{x_1\}$. Thus, $[b, \{x_1\}, b']$ as we have defined it is not better for the player than $b$. Similarly, in the game in Figure 3, if $b$ is the strategy of playing $R_1$ at $X_1$ and $R_2$ at $X_2$, while $b'$ is the strategy of playing $L_1$ at $X_1$ and $L_2$ at $X_2$, then $[b, \{x_2\}, b']$ is $b$. As we now show, the notation $[\cdot, \cdot, \cdot]$ considers all switches that respect the information structure of the game. Let $(b, X, b')$ denote the "strategy" of using $b$ until $X$ is reached and then switching to $b'$. As observed above, $(b, X, b')$ is not always a strategy. But whenever it is, $(b, X, b') = [b, X, b']$.
Lemma 4.1: If $(b, X, b')$ is a strategy in game $\Gamma$, then $(b, X, b') = [b, X, b']$.
Proof: Suppose that $(b, X, b') \ne [b, X, b']$. Then there must exist some information set $X'$ such that $(b, X, b')$ and $[b, X, b']$ differ at $X'$. If $X' \succeq X$, then at every node $x \in X'$, the player plays $b'(X')$ at $x$ according to both $(b, X, b')$ and $[b, X, b']$. Thus, $X' \not\succeq X$. This means the player plays $b(X')$ at every node in $X'$ according to $[b, X, b']$. Since $(b, X, b')$ and $[b, X, b']$ disagree at $X'$, it must be the case that the player plays $b'(X')$ at some node $x \in X'$ according to $(b, X, b')$. But since $X' \not\succeq X$, there exists some node $x' \in X'$ that does not have a prefix in $X$. This means that $(b, X, b')$ must play $b(X')$ at $x'$. Thus, $(b, X, b')$ is not a strategy.
The following result is essentially proved by Selten [1975].
Proposition 4.2: If $\Gamma$ is a game of perfect recall and $(\Gamma, \eta)$ is a perturbation of $\Gamma$, then the following are equivalent: (a) strategy $b_i$ is a best response to $b_{-i}$ in $(\Gamma, \eta)$ (i.e., $EU_i(b) \ge EU_i(b', b_{-i})$ for all strategies $b'$ for player $i$ in $(\Gamma, \eta)$); (b) for each information set $X$ for player $i$ and all actions $c$ at $X$, $b_i$ is at least as good a response to $b_{-i}$ as $b_i[X/c]$ at $X$ in $(\Gamma, \eta)$ (i.e., $EU_i(b \mid X) \ge EU_i((b_i[X/c], b_{-i}) \mid X)$) (see footnote 1); (c) for each information set $X$ for player $i$ and all strategies $b'$ for player $i$ in $(\Gamma, \eta)$, $b_i$ is at least as good a response to $b_{-i}$ as $[b_i, X, b']$ at $X$ in $(\Gamma, \eta)$ (i.e., $EU_i(b \mid X) \ge EU_i(([b_i, X, b'], b_{-i}) \mid X)$).
Proof: The equivalence of (a) and (b) is Lemma 7 in [Selten 1975]. Clearly (a) implies (c) and (c) implies (b). The result follows.
The analogue of Proposition 4.2 does not hold in games of imperfect recall.
Proposition 4.3: If $\Gamma$ is an arbitrary game and $(\Gamma, \eta)$ is a perturbation of $\Gamma$, then, taking (a), (b), and (c) as in Proposition 4.2, (a) implies (c) and (c) implies (b), but neither converse holds.
Proof: It is immediate from the definitions that (a) implies (c) and that (c) implies (b). To see that (c) does not imply (a), consider the game in Figure 3. Clearly the optimal strategy for the player is $b = (R_1, R_2)$. Let $b' = (L_1, L_2)$. Although $b$ gives a higher utility than $b'$, it is easy to see that $b'$ gives a higher expected utility than both $b'[X_1/R_1]$ and $b'[X_2/R_2]$. That is, the player is worse off switching actions at each information set (although he would be better off if he could simultaneously switch at both). But the only strategies of the form $[b', X_1, b'']$ are $b'$ and $b'[X_1/R_1]$; similarly, the only strategies of the form $[b', X_2, b'']$ are $b'$ and $b'[X_2/R_2]$. Thus, $b'$ satisfies (c) but not (a). To see that (b) does not imply (c), consider a variant of the game in Figure 3 where there is an initial node $x_{-1}$ that precedes nature's node $x_0$. At $x_{-1}$, the agent can only play down, leading to $x_0$. Nevertheless, since both information sets follow $x_{-1}$, we have that $[b', \{x_{-1}\}, b] = b$. Thus, by changing his strategy at $x_{-1}$, the agent can switch from $b'$ to $b$. But he cannot do better than $b'$ by changing his move at any information set.
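The switching operator defined earlier in this subsection (play the new strategy at X and at every information set all of whose nodes come strictly after some node of X, and the old strategy elsewhere) can be sketched for pure strategies as follows. The representation of strategies as dictionaries from information-set names to actions, and the names `switch` and `at_or_after`, are assumptions made for this illustration; the tree is the decision-node skeleton of Figure 1.

```python
def strictly_precedes(x, y, parent):
    """True if node x is a strict ancestor of node y in the game tree."""
    node = parent[y]
    while node is not None:
        if node == x:
            return True
        node = parent[node]
    return False

def at_or_after(X, Y, info_sets, parent):
    """True if Y is X itself, or every node of Y has a strict ancestor in X."""
    if X == Y:
        return True
    return all(any(strictly_precedes(x, y, parent) for x in info_sets[X])
               for y in info_sets[Y])

def switch(b, X, b_new, info_sets, parent):
    """Pure-strategy version of the switching operator: use b_new at X and at
    information sets at or after X in the 'for all' order, and b elsewhere."""
    return {Y: (b_new[Y] if at_or_after(X, Y, info_sets, parent) else b[Y])
            for Y in info_sets}

# Decision nodes and information sets of the game in Figure 1.
parent = {"x0": None, "x1": "x0", "x2": "x0", "x3": "x1", "x4": "x2"}
info_sets = {"{x1}": ["x1"], "{x2}": ["x2"], "X": ["x3", "x4"]}
b = {"{x1}": "S", "{x2}": "B", "X": "R"}        # ex ante optimal strategy
b_prime = {"{x1}": "B", "{x2}": "S", "X": "L"}

switched = switch(b, "{x1}", b_prime, info_sets, parent)
```

The result changes the action at x1 to B but keeps R at X, because x4 does not come after x1, which is the behavior described for the Figure 1 game in this subsection.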
4.3 Defining Perfect and Sequential Equilibrium
We now define perfect equilibrium just as Selten [1975] did. The strategy profile $b^*$ is a perfect equilibrium in $\Gamma$ if there exist a sequence $(\Gamma, \eta_1), (\Gamma, \eta_2), \ldots$ of perturbed games and a sequence of strategy profiles $b^1, b^2, \ldots$ such that (1) $\eta_k \to \vec{0}$; (2) $b^k$ is a Nash equilibrium of $(\Gamma, \eta_k)$; and (3) $b^k \to b^*$.
Footnote 1: Note that $b_i[X/c]$ is actually not a strategy in the perturbed game, since it does not assign positive probability to all actions at $X$.
Selten [1975] shows that a perfect equilibrium always exists in games with perfect recall. Essentially the same proof shows that it exists even in games with imperfect recall.
Theorem 4.4: A perfect equilibrium exists in all finite games.
Proof: Consider any sequence $(\Gamma, \eta_1), (\Gamma, \eta_2), \ldots$ of perturbed games such that $\eta_n \to \vec{0}$. By standard fixed-point arguments, each perturbed game $(\Gamma, \eta_k)$ has a Nash equilibrium $b^k$. By a standard compactness argument, the sequence $b^1, b^2, \ldots$ has a convergent subsequence. Suppose that this subsequence converges to $b^*$. Clearly $b^*$ is a perfect equilibrium.
It is easy to see that a perfect equilibrium $b^*$ of $\Gamma$ is also a Nash equilibrium of $\Gamma$. Thus, each strategy $b^*_i$ is a best response to $b^*_{-i}$ ex ante. However, we also want $b^*_i$ to be a best response to $b^*_{-i}$ at each information set. This is made precise using ideas from the definition of sequential equilibrium, which we now review. Recall that a behavioral strategy $b_i$ for player $i$ is completely mixed if it assigns positive probability to all actions at every information set for player $i$. A belief system $\mu$ is compatible with a strategy $b$ if there exists a sequence of completely mixed strategy profiles $b^1, b^2, \ldots$ converging to $b$ such that if $X$ is an information set that is reached with positive probability by $b$, then $\mu_X(x)$ is just $\pi_b(x \mid X)$, the probability of reaching $x$ conditional on reaching $X$; and if $X$ is an information set that is reached with probability 0 by $b$, then $\mu_X(x)$ is $\lim_{n \to \infty} \pi_{b^n}(x \mid X)$. (Thus, the sequence $b^1, b^2, \ldots$ must be such that all the relevant limits exist.) The following result, which follows easily from the definitions and earlier results, makes precise the sense in which a perfect equilibrium is a best response at each information set.
Proposition 4.5: If $b^*$ is a perfect equilibrium in game $\Gamma$, then there exists a belief system $\mu$ compatible with $b^*$ such that, for all players $i$, all information sets $X$ for player $i$, and all strategies $b'$ for player $i$ at $X$, we have $EU_i(b^* \mid X, \mu) \ge EU_i(([b^*_i, X, b'], b^*_{-i}) \mid X, \mu)$.
Proof: Since $b^*$ is a perfect equilibrium, there exists a sequence of strategies $b^1, b^2, \ldots$ converging to $b^*$ and a sequence of perturbed games $(\Gamma, \eta_1), (\Gamma, \eta_2), \ldots$ such that $\eta_k \to \vec{0}$ and $b^k$ is a Nash equilibrium of $(\Gamma, \eta_k)$. All the strategies $b^1, b^2, \ldots$ are completely mixed (since they are strategies in perturbed games). We can assume without loss of generality that, for each information set $X$ and $x \in X$, the limit $\lim_{n \to \infty} \pi_{b^n}(x \mid X)$ exists. (By standard arguments, since $\Gamma$ is a finite game, we can find a subsequence of $b^1, b^2, \ldots$ for which the limits all exist, and we can replace the original sequence by the subsequence.) Let $\mu$ be the belief system determined by this sequence of strategies.
We claim that the result holds with respect to $\mu$. For suppose not. Then there exists a player $i$, information set $X$, strategy $b'$, and $\epsilon > 0$ such that $EU_i(b^* \mid X, \mu) + \epsilon < EU_i(([b^*_i, X, b'], b^*_{-i}) \mid X, \mu)$. It follows from Proposition 2.1, Proposition 3.4, and from the fact that $\pi_b(x \mid X) = \hat{\pi}_b(x)/\hat{\pi}_b(X)$ that $EU_i(b^k \mid X) \to EU_i(b^* \mid X, \mu)$ and $EU_i(([b^k_i, X, b'], b^k_{-i}) \mid X) \to EU_i(([b^*_i, X, b'], b^*_{-i}) \mid X, \mu)$. Since $b^k \to b^*$, we have that for some $k > 0$ and all $k' > k$, $EU_i(b^{k'} \mid X) + \epsilon/2 < EU_i(([b^{k'}_i, X, b'], b^{k'}_{-i}) \mid X)$. But this contradicts Proposition 4.2.
We can now define sequential equilibrium just as Kreps and Wilson [1982] did.
Definition 4.6: A pair $(b, \mu)$ consisting of a strategy $b$ and a belief system $\mu$ is called a belief assessment. A belief assessment $(b, \mu)$ is a sequential equilibrium in a game $\Gamma$ if $\mu$ is compatible with $b$ and for all players $i$, all information sets $X$ for player $i$, and all strategies $b'$ for player $i$ at $X$, we have $EU_i(b \mid X, \mu) \ge EU_i(([b_i, X, b'], b_{-i}) \mid X, \mu)$.
As previously observed, our modifications of the traditional definition of sequential equilibrium syntactically collapse to the traditional definition of sequential equilibrium [Kreps and Wilson 1982] in games of perfect recall. It is well known that in games of perfect recall we could replace the standard definition of sequential equilibrium by one where we only consider single-action deviations [Hendon, Jacobsen, and Sloth 1996]; this is known as the one-step deviation principle. The example used in the proof of Proposition 4.3 shows that this no longer holds in games of imperfect recall. It is immediate from Proposition 4.5 that every perfect equilibrium is a sequential equilibrium. Thus, a sequential equilibrium exists for every game. Theorem 4.7: A sequential equilibrium exists in all finite games. Note that in both Examples 1.1 and 1.2, the ex ante optimal strategy is a sequential equilibrium according to our definition. In Example 1.1, it is because the switch to what appears to be a better strategy at x1 is disallowed. In Example 1.2, the unique belief µ compatible with the ex ante optimal strategy assigns probability 1 to reaching e1 and probability 2/3 to reaching e2. However, since e2 is not on the upper frontier of Xe, for all strategies b, EU(b | Xe) = EU(b | e1) = EU(b), and thus the ex ante optimal strategy is still optimal at Xe. It is well known that in games of perfect recall, every sequential equilibrium is also a Nash equilibrium (but the converse is not true). This no longer holds in games of imperfect recall. Consider the game in Figure 3. Using the same argument as in the proof of Proposition 4.3, the strategy profile (L1, L2) (and the beliefs it induces) is a
sequential equilibrium, but the only ex ante optimal strategy (and thus only Nash equilibrium) is $(R_1, R_2)$. Thus, in games of imperfect recall, Nash equilibrium and sequential equilibrium are incomparable. It is easy to show that every sequential equilibrium is a Nash equilibrium in games where each agent has an initial information set that precedes all other information sets (in the order defined above). At such an information set, the agent can essentially do ex ante planning. There is no such initial information set in the game in Figure 3, precluding such planning. If we want to allow such planning in a game of imperfect recall, we must model it with an initial information set for each agent. Summarizing this discussion, we have the following result.
Theorem 4.8: In all finite games, (a) every perfect equilibrium is a Nash equilibrium; (b) in games where all players have an initial information set, every sequential equilibrium is a Nash equilibrium. However, there exist games where a sequential equilibrium is not a Nash equilibrium.
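The reach probabilities quoted above for Example 1.2 can be checked numerically. The following is a minimal sketch, not part of the original development: it assumes the standard absent-minded driver payoffs of Piccione and Rubinstein (0 for exiting at e1, 4 for exiting at e2, 1 for continuing at both nodes), which are not restated in this section, and the function names are hypothetical.

```python
from fractions import Fraction

# ASSUMPTION: standard absent-minded driver payoffs; not stated in this section.
EXIT_AT_E1 = Fraction(0)
EXIT_AT_E2 = Fraction(4)
CONTINUE_AT_BOTH = Fraction(1)


def ex_ante_eu(p):
    """Ex ante expected utility when the driver continues with probability p
    at every node of the information set Xe = {e1, e2}."""
    return ((1 - p) * EXIT_AT_E1
            + p * (1 - p) * EXIT_AT_E2
            + p * p * CONTINUE_AT_BOTH)


def reach_probabilities(p):
    """Probability of reaching each node of Xe under continuation probability p."""
    return {"e1": Fraction(1), "e2": p}


# Exact search over a grid of rationals that contains 2/3.
grid = [Fraction(k, 300) for k in range(301)]
best_p = max(grid, key=ex_ante_eu)
best_value = ex_ante_eu(best_p)
reach = reach_probabilities(best_p)

print(best_p, best_value, reach)
```

Under these assumed payoffs the objective is 4p - 3p^2, which is maximized at p = 2/3, so e1 is reached with probability 1 and e2 with probability 2/3, matching the belief described in the text. Because e2 is not on the upper frontier of Xe, only the weight on e1 matters for EU(b | Xe).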
5
Discussion
Selten [1975] says that “game theory is concerned with the behavior of absolutely rational decision makers whose capabilities of reasoning and memorizing are unlimited, a game . . . must have perfect recall.” We disagree. We believe that game theory ought to be concerned with decision makers that may not be absolutely rational and, more importantly for the present paper, players that do not have unlimited capabilities of reasoning and memorizing. This is certainly not a new sentiment; work on finite automata playing games, for example, goes back to Neyman [1985] and Rubinstein [1986]. Nevertheless, we believe that there is good reason for describing games by game trees that have perfect recall (but then adding the possibility of imperfect recall later), using an approach suggested by Halpern [1997]. To understand this point, consider a game like bridge. Certainly we may have players in bridge who forget what cards they were dealt, some bidding they have heard, or what cards were played earlier. But we believe that an extensive form description of bridge should describe just the “intrinsic” uncertainty in the game, not the uncertainty due to imperfect recall, where the intrinsic uncertainty is the uncertainty that the player would have even if he had perfect recall. For example, after the cards are dealt, a player has intrinsic uncertainty regarding what cards the other players have. Given the description of the game in terms of intrinsic uncertainty (which will be a game with perfect recall), we can then consider what algorithm the agents use. (In some cases, we may want to consider the algorithm part of the strategic choice of the agents, as Rubinstein [1986] does.) If we think of the algorithm as a Turing machine, the Turing machine determines
a local state for the agent. Intuitively, the local state describes what the agent is keeping track of. If the agent remembers his strategy, then the strategy must be encoded in the local state. If he has switched strategies and wants to remember that fact, then this too would have to be encoded in the local state. If we charge the agent for the complexity of the algorithm he uses (as we do in a related paper [Halpern and Pass 2008]), then an agent may deliberately choose not to have perfect recall, since it is too expensive. The key point here is that, in this framework, an agent can choose to switch strategies, despite not having perfect recall. The strategy (i.e., algorithm) used by the agent determines his information set, and the switch may result in a different information structure. As we show in [Halpern and Pass 2008], the notions defined in this paper are instrumental in defining sequential equilibrium also in such a setting.
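To make the link between an algorithm's local state and the induced information structure concrete, here is a small illustrative sketch. It is only a toy model under stated assumptions: the agent is assumed to remember just the last `memory` observations, histories are sequences over a two-symbol alphabet, and the names `local_state` and `information_sets` are hypothetical rather than taken from the paper or from [Halpern and Pass 2008].

```python
from collections import defaultdict
from itertools import product


def local_state(history, memory):
    """ASSUMPTION: the agent's algorithm keeps only the last `memory` observations."""
    if memory <= 0:
        return ()
    return tuple(history[-memory:])


def information_sets(histories, memory):
    """Group histories that the agent cannot tell apart, i.e. histories
    that lead to the same local state."""
    groups = defaultdict(list)
    for h in histories:
        groups[local_state(h, memory)].append(h)
    return dict(groups)


# All length-3 histories over the observations "L" and "R".
histories = [tuple(h) for h in product("LR", repeat=3)]

# Remembering everything gives singleton information sets (perfect recall).
full = information_sets(histories, memory=3)

# Remembering only the last observation merges histories (imperfect recall).
bounded = information_sets(histories, memory=1)

print(len(full), len(bounded))
```

With full memory every history is its own information set, while the one-observation agent has only two information sets of four histories each. Switching to a different algorithm (a different `memory` here) changes the partition, which is the sense in which the choice of algorithm determines the information structure.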
References
Aumann, R. J., S. Hart, and M. Perry (1997). The absent-minded driver. Games and Economic Behavior 20.
Gilboa, I. (1997). A comment on the absent-minded driver paradox. Games and Economic Behavior 20 (1), 25–30.
Grove, A. J. and J. Y. Halpern (1997). On the expected value of games with absentmindedness. Games and Economic Behavior 20, 51–65.
Halpern, J. Y. (1997). On ambiguities in the interpretation of game trees. Games and Economic Behavior 20, 66–96.
Halpern, J. Y. and R. Pass (2008). Algorithmic rationality: Game theory with costly computation. Unpublished manuscript.
Hendon, E., H. J. Jacobsen, and B. Sloth (1996). The one-shot-deviation principle for sequential rationality. Games and Economic Behavior 12 (2), 274–282.
Kline, J. J. (2002). Minimum memory for equivalence between ex ante optimality and time-consistency. Games and Economic Behavior 38, 278–305.
Kline, J. J. (2005). Imperfect recall and the relationships between solution concepts in extensive games. Economic Theory 25, 703–710.
Kreps, D. M. and R. B. Wilson (1982). Sequential equilibria. Econometrica 50, 863–894.
Lipman, B. L. (1997). More absentmindedness. Games and Economic Behavior 20 (1), 97–101.
Neyman, A. (1985). Bounded complexity justifies cooperation in finitely repeated prisoner’s dilemma. Economics Letters 19, 227–229.
Piccione, M. and A. Rubinstein (1997a). The absent-minded driver’s paradox: synthesis and responses. Games and Economic Behavior 20 (1), 121–130.
Piccione, M. and A. Rubinstein (1997b). On the interpretation of decision problems with imperfect recall. Games and Economic Behavior 20 (1), 3–24.
Rubinstein, A. (1986). Finite automata play the repeated prisoner’s dilemma. Journal of Economic Theory 39, 83–96.
Selten, R. (1975). Reexamination of the perfectness concept for equilibrium points in extensive games. International Journal of Game Theory 4, 25–55.