Computing Approximate Equilibria in Graphical Games on Arbitrary Graphs

Vishal Soni, Satinder Singh, Michael P. Wellman
Computer Science and Engineering Division, University of Michigan, Ann Arbor

Abstract

We present PureProp: a new constraint satisfaction algorithm for computing pure-strategy approximate Nash equilibria in complete information games. While this seems quite limited in applicability, we show how PureProp unifies existing algorithms for 1) solving a class of complete information graphical games with arbitrary graph structure for approximate Nash equilibria (Kearns et al., 2001; Ortiz & Kearns, 2003), and 2) solving some classes of incomplete information games with tree-graph structure for approximate Bayes-Nash equilibria (Singh et al., 2004). We also show how PureProp provides a hitherto unavailable algorithm for solving classes of incomplete information graphical games with arbitrary graph structure for approximate Bayes-Nash equilibria, and conclude with some empirical analysis of the new algorithm.

1. Introduction

Graphical models for representing games were introduced by Kearns et al. (2001) to capture locality of interactions among agents in a static game. Under this model, nodes represent agents and (undirected) edges represent interactions between agents. A missing edge between two nodes implies that the payoff of each of the two agents does not directly depend on the other's action. The central idea is to view the game as being composed of several interacting local games and to exploit this locality, especially in sparsely connected games, by iteratively computing local equilibria and patching them together to obtain global equilibria efficiently (e.g., Littman et al., 2001). Games with sparsely connected graphs arise quite naturally in settings such as computer network topologies, social networks of a population of animals, and ad hoc networks of mobile devices.

The original work by Kearns, Littman and Singh (2001) considered acyclic graphical games of complete information and presented a message-passing algorithm (hereafter, the KLS algorithm) for computing approximate Nash equilibria (NE) efficiently. Ortiz and Kearns (2003) later presented the NashProp algorithm that extended the KLS algorithm to arbitrary graph structures in complete information games. In other recent work, Vickrey and Koller (2002) reformulated the problem of finding approximate equilibria in games of complete information as a constraint satisfaction problem and derived a generalization of the KLS algorithm as a constraint satisfaction algorithm.¹

In our own recent work (Singh et al., 2004), we considered acyclic games of incomplete information and presented a message-passing algorithm to compute approximate Bayes-Nash equilibria (BNE) efficiently. In games of incomplete information, the payoff to an agent depends not only on the actions of the other agents but also on its own private type. Agents have to choose actions without explicit knowledge of the private types of other agents. Thus, a BNE strategy has to be an equilibrium with respect to the known distribution over the types of the other agents.

As our main contribution in this paper, we present PureProp(ε), a very simple constraint propagation algorithm that unifies all of the results mentioned above, and provides a hitherto unavailable algorithm for computing ε-BNE in two classes of incomplete information games with arbitrary graph structures. Hereafter we drop the argument ε from the name of the algorithm for ease of exposition. Specifically, PureProp is an algorithm for finding a pure-strategy ε-NE in arbitrary complete information games. We show how to induce a restricted game solvable by PureProp from the class of complete information games studied by Kearns et al. (2001) and Ortiz and Kearns (2003) as well as from the classes of incomplete information games studied by Singh et al. (2004). In all these cases, the construction of the induced game will be such that a pure-strategy ε-NE will always exist in the induced game and furthermore the equilibria found by PureProp will be ε-equilibria in the original game.

¹ Ortiz and Kearns (2003) also note but do not establish a connection between their iterative message-passing algorithm and constraint satisfaction algorithms (they thank Michael Littman in their note).

2. Preliminaries

In our formulation, we treat complete information games as a special case of incomplete information games, in which all the agents have exactly one type. We introduce notation for types, strategies, and payoffs, and definitions of basic equilibrium concepts.

2.1. Incomplete Information Games

Let n denote the number of agents in the game. For games with discrete types, the realized type of each agent is chosen independently from some discrete set $T = \{T_1, T_2, \ldots, T_m\}$ according to some commonly known distribution $P_T$. Complete information games are a special case in which there is always exactly one type and so there is no uncertainty as to the type of each agent. For games with continuous types, the realized type of each agent is chosen independently from some interval $T = [t_l, t_u]$ according to some (again commonly known) probability density $p_T$. For ease of exposition, we assume that all agents have the same action set $A = \{a_1, a_2, \ldots, a_d\}$ available to them.

We use the vector notation $\vec{a}$, $\vec{t}$, and $\vec{\mu}$ to denote the joint actions, types, and strategies for all n agents. An agent V's strategy $\mu_V$ denotes its type-conditional mixed strategy, a mapping from types to probability distributions over actions. The symbol σ refers to an unconditional (type-independent) strategy, a probability distribution over actions. The symbol $\mu_V(\cdot \mid t_V)$ refers to the mixed strategy of V when its type is $t_V$, and $\mu_V(a \mid t_V)$ refers to the probability of taking action a when V's type is $t_V$. We use $\vec{a}_{-V}$, $\vec{t}_{-V}$, and $\vec{\mu}_{-V}$ to refer to the actions, types, and strategies of all agents excluding V. Finally, $R_V(\vec{a} \mid t)$ refers to the payoff to agent V when the joint actions of the n agents are $\vec{a}$ and the type of V is t. Defining the game requires specifying $R_V(\vec{a} \mid t)$ for all V, all $\vec{a} \in A^n$, and all types t.

Next we define some quantities of interest for games with discrete types. These can be extended in a straightforward way to continuous types by replacing the summations over discrete types and the probability distribution $P_T$ with integrals over continuous types and the probability density $p_T$.

The expected payoff to agent V from playing a strategy σ when its type is $t_V$ and the other agents play according to the joint strategy $\vec{\mu}_{-V}$ is

\[ R_V(\sigma, \vec{\mu}_{-V} \mid t_V) = \sum_{\vec{t}_{-V}} P_T(\vec{t}_{-V}) \sum_{\vec{a}} \sigma(a_V) \times \vec{\mu}_{-V}(\vec{a}_{-V} \mid \vec{t}_{-V}) \times R_V(\vec{a} \mid t_V). \tag{1} \]

The expected payoff to V when agents play according to $\vec{\mu}$ is

\[ R_V(\vec{\mu}) = \sum_{t_V} P_T(t_V) \times R_V(\mu_V(\cdot \mid t_V), \vec{\mu}_{-V} \mid t_V). \tag{2} \]

A strategy σ is said to be a best response strategy for agent V with type $t_V$ and other agents' strategies $\vec{\mu}_{-V}$ if, for all unconditional strategies σ′,

\[ R_V(\sigma, \vec{\mu}_{-V} \mid t_V) \ge R_V(\sigma', \vec{\mu}_{-V} \mid t_V). \tag{3} \]

Similarly, a strategy σ is said to be an ε-best response strategy for agent V with type $t_V$ and other agents' strategies $\vec{\mu}_{-V}$ if, for all unconditional strategies σ′,

\[ R_V(\sigma, \vec{\mu}_{-V} \mid t_V) \ge R_V(\sigma', \vec{\mu}_{-V} \mid t_V) - \epsilon. \tag{4} \]

A BNE is a joint strategy profile $\vec{\mu}$ such that every agent plays a best response to the other agents' strategies, that is, for all agents V and strategies $\mu'_V$,

\[ R_V(\vec{\mu}) \ge R_V(\mu'_V, \vec{\mu}_{-V}). \tag{5} \]

In words, a BNE is a strategy profile in which no agent can benefit, in expectation over the types of the other agents, from a unilateral change in strategy. Finally, an ε-BNE is a strategy profile $\vec{\mu}$ such that for all agents V and strategies $\mu'_V$,

\[ R_V(\vec{\mu}) \ge R_V(\mu'_V, \vec{\mu}_{-V}) - \epsilon. \tag{6} \]

Note that in complete information games many of the above equations simplify because each agent has only one type. Also, in such games the (ε-)BNE is actually an (ε-)NE.
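As an illustration of Equations (1) and (4), the following minimal sketch evaluates the expected payoff and the ε-best response test for an agent V facing a single opponent U with discrete types. The two-agent restriction, the dictionary-based data layout, and the toy payoff numbers are assumptions made for this example only; they are not taken from the paper.

```python
def expected_payoff(sigma, mu_other, type_dist, payoff, t_v):
    """Eq. (1) for a single opponent U: sum over U's types and both agents' actions."""
    total = 0.0
    for t_u, p_t in type_dist.items():               # P_T(t_{-V})
        for a_v, p_v in sigma.items():               # sigma(a_V)
            for a_u, p_u in mu_other[t_u].items():   # mu_{-V}(a_{-V} | t_{-V})
                total += p_t * p_v * p_u * payoff[(a_v, a_u, t_v)]
    return total


def is_eps_best_response(sigma, mu_other, type_dist, payoff, t_v, actions, eps):
    """Eq. (4). The payoff is linear in sigma, so pure deviations suffice."""
    value = expected_payoff(sigma, mu_other, type_dist, payoff, t_v)
    best = max(
        expected_payoff({a: 1.0}, mu_other, type_dist, payoff, t_v)
        for a in actions
    )
    return value >= best - eps


# Hypothetical toy game: V has the single type "x"; U has types "lo" and "hi".
ACTIONS = ["go", "stay"]
TYPE_DIST = {"lo": 0.5, "hi": 0.5}
MU_OTHER = {"lo": {"go": 1.0}, "hi": {"stay": 1.0}}
PAYOFF = {
    ("go", "go", "x"): 1.0,
    ("go", "stay", "x"): 0.0,
    ("stay", "go", "x"): 0.4,
    ("stay", "stay", "x"): 0.4,
}
```

In this toy game "go" earns 0.5 in expectation and "stay" earns 0.4, so "stay" passes the test for ε = 0.2 but not for ε = 0.05.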

2.2. Graphical Games

Graphical games (Kearns et al., 2001) have nodes representing agents and edges representing direct interactions between agents. Each agent (node) has a payoff matrix that defines her reward as a function of her type and the actions of her neighbors. Let k be the largest number of neighbors of any node in the graph. If $n \gg k$, then the graphical representation is substantially more compact than the usual normal-form (matrix) representation. In the worst case of a complete graph, the size of the graphical game representation equals that of the normal-form representation.

In Figure 1, we illustrate four graph topologies (borrowed from Ortiz & Kearns, 2003), capturing different kinds of locality of interaction among the agents. These are also the graph topologies on which we evaluate the performance of our algorithm. Figure 1a is a ring topology in which each agent has exactly two neighbors, Figure 1b is a grid topology, Figure 1c is an acyclic graph topology, and Figure 1d is a chordal graph in which a fraction of the edges are chords.

Figure 1. Example graph topologies: (a) ring, (b) grid, (c) tree, (d) chordal.

Before presenting our PureProp algorithm we show how the locality of interactions in a graphical game reduces the computation required to compute the expected rewards of Equations (1) to (6). Let $N^V$ denote all the neighbors of node V (including V) and $N^V_{-V}$ denote all the neighbors of V excluding V. Equations (1) to (6) are then easily modified to incorporate this locality by making the payoff calculations of V contingent on just $N^V$. Therefore, the expected payoff to agent V from playing a strategy σ when its type is $t_V$ and its neighbors play according to the joint strategy $\vec{\mu}_{N^V_{-V}}$ is

\[ R_V(\sigma, \vec{\mu}_{N^V_{-V}} \mid t_V) = \sum_{\vec{t}_{N^V_{-V}}} P_T(\vec{t}_{N^V_{-V}}) \sum_{\vec{a}} \sigma(a_V) \times \vec{\mu}_{N^V_{-V}}(\vec{a}_{N^V_{-V}} \mid \vec{t}_{N^V_{-V}}) \times R_V(\vec{a} \mid t_V). \tag{7} \]

The expected payoff to V when its neighbors play according to $\vec{\mu}_{N^V_{-V}}$ is

\[ R_V(\vec{\mu}) = \sum_{t_V} P_T(t_V) \times R_V(\mu_V(\cdot \mid t_V), \vec{\mu}_{N^V_{-V}} \mid t_V). \tag{8} \]

Agent V's strategy σ is said to be a best response strategy for type $t_V$ and neighbors' strategies $\vec{\mu}_{N^V_{-V}}$ if, for all unconditional strategies σ′,

\[ R_V(\sigma, \vec{\mu}_{N^V_{-V}} \mid t_V) \ge R_V(\sigma', \vec{\mu}_{N^V_{-V}} \mid t_V). \tag{9} \]

Equations (4) to (6) can be similarly modified.

3. PureProp(ε) Algorithm

We first describe how to convert the problem of finding pure-strategy ε-NE in games of complete information with a finite number of actions into a constraint satisfaction problem (CSP). We call the resulting CSP the game-CSP. Then we present our algorithm for solving game-CSPs.

3.1. Game-CSP

A CSP is defined by a set of variables, the domain of each variable (i.e., the values each variable can take), and a set of constraints. Each constraint involves some subset of the variables and specifies the allowable combinations of values for that subset. In the game-CSP there is a variable for each agent (node in the graphical game). The domain of a variable V, denoted D(V), is the set of all possible profiles of V. A profile for V is an assignment of (pure) actions to all agents in the neighborhood of V.

We briefly note here that our CSP formulation of games differs from the CSP formulation of games in Vickrey and Koller (2002) in that we use neighborhood profiles as values in the domains of the variables while they use single-agent actions as values in the domains of the variables. Thus our CSP is a dual transformation (Dechter & Pearl, 1988) of the Vickrey and Koller (2002) CSP. We will investigate the computational impact, if any, of this difference as future work.

There are two sets of constraints imposed by the CSP. The first set consists of unary constraints that require that if we assign profile ρ to V, the action assigned to agent V in ρ must be an ε-best response to the actions assigned by ρ to the neighbors of V (where ε is an approximation parameter that is an input argument to PureProp). More formally, if we let $\rho_{\bar{C}}$ denote the actions assigned by ρ to the set of agents $\bar{C} \subseteq N^V$, then $\rho_V$ must be an ε-best response to $\rho_{N^V_{-V}}$. The second set consists of binary constraints on every pair of neighbors U and V in the graphical game. This constraint requires that if we assign profile ρ to V and profile γ to U, then ρ and γ must assign the same actions to the neighbors that U and V have in common. More formally, if we let $\vec{Y}$ denote the common neighbors of U and V (recall that by definition of neighborhood, $\vec{Y}$ contains both U and V), then $\rho_{Y_i} = \gamma_{Y_i}$ for all i.

The constraint graph of a CSP places an undirected edge between nodes that have a binary constraint between them. Thus, the constraint graph of the game-CSP formulated as above for a graphical game will have exactly the same edges as the graphical game itself. A solution to the game-CSP is an assignment of profiles to each variable so that none of the constraints are violated. It is possible that there is no assignment of profiles to variables that satisfies all constraints; in such a case the game-CSP has no solution. The following result is directly obtained from the construction of the game-CSP above.

Theorem 1 Any solution to the game-CSP for a graphical game of complete information is an ε-Nash equilibrium of the underlying game. Also, all pure-strategy ε-Nash equilibria of a complete information graphical game with a finite number of actions are solutions to the game-CSP generated by the game.
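To make the game-CSP construction concrete, the sketch below builds the domain of neighborhood profiles for each variable and implements the binary constraint as an agreement check on shared agents. Representing a profile as a Python dict from agent name to action, and the three-agent path graph used as data, are assumptions chosen for illustration.

```python
from itertools import product


def neighborhood_profiles(neighborhood, actions):
    """All assignments of pure actions to the agents in N^V (V included)."""
    agents = sorted(neighborhood)
    return [dict(zip(agents, combo)) for combo in product(actions, repeat=len(agents))]


def compatible(rho, gamma):
    """Binary constraint: rho and gamma agree on every agent both of them mention."""
    return all(rho[x] == gamma[x] for x in rho.keys() & gamma.keys())


# Hypothetical path graph A - B - C with two actions per agent.
NEIGHBORHOODS = {"A": {"A", "B"}, "B": {"A", "B", "C"}, "C": {"B", "C"}}
DOMAINS = {
    v: neighborhood_profiles(nb, ["x", "y"]) for v, nb in NEIGHBORHOODS.items()
}
```

Because a profile only mentions agents in its own neighborhood, the agents shared by two profiles are exactly the common neighbors of the two variables, which is what the binary constraint compares.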

3.2. Solving the game-CSP

The PureProp algorithm proceeds in three phases in sequence: the node consistency phase, the arc consistency phase, and the value-propagation phase.

3.2.1. Node Consistency

This phase considers each variable in turn and removes the profiles from its domain that violate the unary constraint for that variable. For an agent with k neighbors and d actions for each agent, there are $d^k$ profiles in the domain and each of them has to be checked for a violation. The (hopefully significantly) reduced domains at the end of this phase are denoted $D^0$ and are guaranteed not to have any unary constraint violations. If $D^0$ is empty for any variable then the game-CSP has no solution.
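The node consistency phase can be sketched as a filter over one variable's domain. This is a minimal illustration for the complete information case. The `payoff(agent, rho)` callback and the two-agent coordination game are hypothetical stand-ins for the game's local payoff matrices.

```python
def node_consistent_domain(agent, domain, actions, payoff, eps):
    """Keep the profiles in which the agent's own action is an eps-best response."""
    kept = []
    for rho in domain:
        current = payoff(agent, rho)
        # Best payoff the agent could get by deviating while its neighbours stay fixed.
        best = max(payoff(agent, {**rho, agent: a}) for a in actions)
        if current >= best - eps:
            kept.append(rho)
    return kept


def coordination_payoff(agent, rho):
    """Hypothetical payoff: 1 if the agent matches the other agent, else 0."""
    other = "B" if agent == "A" else "A"
    return 1.0 if rho[agent] == rho[other] else 0.0


DOMAIN_A = [{"A": a, "B": b} for a in "xy" for b in "xy"]
PRUNED_A = node_consistent_domain("A", DOMAIN_A, ["x", "y"], coordination_payoff, eps=0.5)
```

With ε = 0.5 only the two matching profiles survive. With ε = 1.0 every profile survives, since no deviation can gain more than 1.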

3.2.2. Arc Consistency

Each undirected edge in the constraint graph of the CSP is treated as two directed arcs in this phase. So for an edge between neighbors V and U, there is an arc [U, V] as well as an arc [V, U]. Given current domains D(V) and D(U), the arc [V, U] is said to be arc-consistent if the following condition holds:

• For every profile ρ in D(V), there is some profile γ ∈ D(U) such that the assignment of ρ to V and γ to U satisfies the binary constraint of the CSP, i.e., ρ and γ assign the same actions to the common neighbors of V and U.

We also define a finer notion of consistency in which we say a profile ρ for variable V is consistent with neighboring variable U if and only if there is a profile γ in the domain of U that satisfies the binary constraint between V and U.

The goal of the arc consistency algorithm is to return domains for the variables such that all arcs are arc-consistent with respect to those domains. While there are a number of algorithms for arc consistency in the CSP literature, we implemented the simple AC-3 (Mackworth, 1977) algorithm as presented by Russell and Norvig (2003) and shown in Figure 2. It starts with the domains $D^0$ and maintains a queue of arcs that need to be checked for consistency. The queue is initialized with all the arcs. At each step of the algorithm, the arc at the head of the queue, say [V, U], is removed from the queue and checked for consistency with respect to the current domains. If the arc [V, U] is not consistent then two things are done: 1) the domain of V is pruned by removing all profiles ρ from it that are not consistent with the variable U, and 2) for all nodes $W \in N^V_{-V}$ the arcs [W, V] are added to the queue maintained by AC-3 for consistency checking (unless the arc is already in the queue). The AC-3 algorithm stops when the queue becomes empty and is guaranteed to achieve arc consistency in a finite number of steps (the basic intuition is that the algorithm only ever removes profiles from the domains and there are only a finite number of profiles in the first place). We denote the domains returned by AC-3 as $D^*$. As with node consistency, if any of the domains returned by AC-3 are empty the game-CSP has no solution.

AC-BNP(graph) returns the graph game with possibly reduced domains
    inputs: graph - the graph game
    local variables: edges - a queue of edges. Initially all the edges in the graph game.
    while edges is not empty
        [V, W] ← POP(edges)
        domain-change ← PRUNE(V, W)
        if domain-change = true
            foreach U_i ∈ NEIGHBORS[V]
                push [U_i, V] into edges

PRUNE(V, W) returns true if domain of V changes. Removes profiles from Domain[V] that are inconsistent with Domain[W]
    domain-change ← false
    foreach ρ_v ∈ Domain[V]
        if ¬∃ ρ_w ∈ Domain[W] such that ρ_w and ρ_v satisfy the binary constraint between W and V
            remove ρ_v from Domain[V]
            domain-change ← true
    return domain-change

Figure 2. Arc Consistency in PureProp.

Next we show that all pure-strategy ε-NE in the underlying game are present in $D^*$. For any global profile (an assignment to every variable) P, let $P[N^V]$ be the projection of P onto the variables in the neighborhood of V.

Lemma 2 Let P be a global profile. Then, $P[N^V] \in D^*(V)$ ∀V, if and only if P is an ε-NE in the underlying game.

Proof If $P[N^V] \in D^*(V)$ ∀V, then P[V] must be an ε-best response to $P[N^V]$ and therefore P must be an ε-NE. Now consider the other direction. If P is an ε-NE, then for all V, P[V] is an ε-best response to $P[N^V]$ and hence $P[N^V] \in D^0(V)$. Also, for two neighbors V and U, $P[N^V]$ and $P[N^U]$ satisfy the binary constraint for the variables U and V in the CSP. Hence $P[N^V] \in D^*(V)$. □

Lemma 2 establishes that the arc consistency phase can potentially (and hopefully significantly) reduce the size of the domains of the variables without eliminating any ε-NE in the game.

3.2.3. Value Propagation

This phase executes a standard backtracking search over the domains (remaining after the arc consistency phase) of the nodes in the constraint graph (see Russell & Norvig, 2003, for a description of the backtracking search algorithm for CSPs). Nodes are first ordered in ascending order of their domain size. At any stage of the search, a node V picks from its domain a profile ρ that is consistent with the current action assignments of V and its neighbors. If any of V's neighbors have not been assigned an action, V instructs them to play an action according to ρ. If V is not able to find such a profile, the algorithm backtracks. This algorithm is presented in Figure 3. It is known (and clear) that backtracking search is a complete algorithm that in the worst case amounts to an exhaustive search over the domains. We present value propagation in Figure 3 for the goal of finding a single solution to the game-CSP. It is easily modified to find all solutions to the game-CSP. Treating the conversion to the game-CSP as part of PureProp, the following result holds:

Theorem 3 PureProp(ε) computes (all) pure-strategy ε-NE in complete information games with a finite number of actions.

Proof The result is directly implied by Theorem 1, Lemma 2, and the completeness of the value propagation phase. □
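The arc consistency and value propagation phases described above can be sketched together as follows. This is an illustrative sketch rather than the authors' implementation: profiles are assumed to be dicts from agent to action, `neighbors[v]` is assumed to exclude `v` itself, and the small path-graph domains are hypothetical data standing in for the output of node consistency.

```python
from collections import deque


def compatible(rho, gamma):
    """Two partial assignments agree on every agent they both mention."""
    return all(rho[x] == gamma[x] for x in rho.keys() & gamma.keys())


def prune(domains, v, w):
    """Drop profiles of v that no profile of w supports; report whether v changed."""
    kept = [rho for rho in domains[v] if any(compatible(rho, g) for g in domains[w])]
    changed = len(kept) < len(domains[v])
    domains[v] = kept
    return changed


def arc_consistency(domains, neighbors):
    """AC-3 over directed arcs (v, w); mutates and returns domains."""
    queue = deque((v, w) for v in neighbors for w in neighbors[v])
    while queue:
        v, w = queue.popleft()
        if prune(domains, v, w):
            for u in neighbors[v]:
                if (u, v) not in queue:
                    queue.append((u, v))
    return domains


def value_propagation(domains, order=None, assignment=None):
    """Backtracking search; returns one joint action (agent -> action) or None."""
    if order is None:
        order = sorted(domains, key=lambda v: len(domains[v]))
    if assignment is None:
        assignment = {}
    if not order:
        return assignment
    node, rest = order[0], order[1:]
    for rho in domains[node]:
        if compatible(rho, assignment):
            result = value_propagation(domains, rest, {**assignment, **rho})
            if result is not None:
                return result
    return None  # no profile of this node fits: backtrack


# Hypothetical node-consistent domains on the path graph A - B - C.
NEIGHBORS = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
DOMAINS = {
    "A": [{"A": "x", "B": "x"}, {"A": "y", "B": "y"}],
    "B": [{"A": "x", "B": "x", "C": "x"}],
    "C": [{"B": "x", "C": "x"}, {"B": "y", "C": "y"}],
}
arc_consistency(DOMAINS, NEIGHBORS)
SOLUTION = value_propagation(DOMAINS)
```

In this example arc consistency removes the all-"y" profiles of A and C because B's only surviving profile does not support them, and the search then assigns "x" to every agent without backtracking.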
So far we have shown that PureProp is able to find pure-strategy approximate NE in complete information games. While this by itself seems quite limited in applicability, we show next the far more interesting and applicable result that PureProp can be used to solve classes of complete and incomplete information games for approximate mixed-strategy equilibria.

4. Algorithms for Finding Approximate Mixed-Strategy Equilibria

In this section we consider the classes of complete and incomplete information games identified in the introduction.

VALUE-ASSIGNMENT(nodes) returns false if no consistent assignment of strategies is found
    if all nodes are assigned a strategy
        return current-assignment
    node ← POP(nodes)
    foreach profile ρ ∈ Domain[node]
        assigned-N ← Neighbors of node that have a strategy assigned to them
        unassigned-N ← Neighbors[node] - assigned-N
        if ρ is consistent with assigned strategies of node and assigned-N
            assign unassigned-N strategies according to ρ
            success ← VALUE-ASSIGNMENT(nodes)
            if success = true
                return true
            else
                retract the assignments to unassigned-N
                continue
    return false

Figure 3. Value Propagation in PureProp.

4.1. Discrete types

Here we consider both incomplete information games with discrete types and their special case of complete information games (with exactly one discrete type for each agent). The main difference between the two cases is that with more than one type the equilibria of interest are BNE and with exactly one type the equilibria of interest are NE. We will use the term equilibrium to refer to BNE or NE as appropriate whenever we make a statement that applies to both cases.

In such games a strategy is defined as a mapping from the discrete type space to probability distributions over actions. Thus, the number of strategies is infinite and hence PureProp cannot be applied with the mixed strategies as the pure actions. We have to construct a new reduced game that is restricted to a finite and hopefully small number of strategies with the property that all pure-strategy ε-NE in the reduced game are ε-equilibria in the original game, and furthermore that at least one ε-equilibrium of the original game is also a pure-strategy ε-NE in the reduced game. A reduced game with such properties is called an ε-induced game.

Complete Information Games: For complete information games, Ortiz and Kearns (2003) prove a result that provides a constructive proof of the existence of an ε-induced game. They show that for any ε, there is a

\[ \tau = \frac{\epsilon}{d^k \, k \log k} \]

such that a τ-gridding of the infinite strategy space contains an ε-NE (recall that d is the number of actions and k is the maximum neighborhood size). Thus, an ε-induced game can be constructed for any complete information game by discretizing the mixed strategy space in intervals of τ. These τ-discretized strategies are treated as pure actions in the ε-induced game. Thus the number of pure actions in the ε-induced game will be $O\left((1/\tau)^d\right)$, where d is the number of pure actions in the original game. PureProp can be applied to the ε-induced game and is then guaranteed to converge to an ε-NE for the original game (intuitively, if there is no way to improve by more than ε by arbitrary unilateral change then there is also no way to improve by more than ε by unilateral change restricted to the gridpoint strategies). We can also show that with an appropriate ordering over variables for arc consistency and value propagation PureProp implements the NashProp algorithm of Ortiz and Kearns (2003) when applied to complete information games with arbitrary graph structure, and it implements the algorithm of Kearns et al. (2001) when applied to complete information games with tree-graph structure.

Incomplete Information Games: For incomplete information games with discrete types, Singh et al. (2004) provide a result that gives a constructive proof of the existence of an ε-induced game. They show that for any ε, there is a

\[ \tau \le \min\left( \frac{\epsilon}{d^k (4k \log k)}, \; \frac{2}{k \log^2(k/2)} \right) \]

such that if the mixed strategy space for every type is gridded up at τ intervals, then there exists an ε-BNE in the τ-discretized strategy space. These τ-discretized type-conditional strategies, which clearly have a lot of internal structure, are treated as unstructured pure actions in the ε-induced game. Thus the number of pure actions in the ε-induced game is $O\left((1/\tau)^{md}\right)$, where m is the number of discrete types. The payoff for a joint action in the ε-induced game is defined as in Equation 9. PureProp can be applied to the ε-induced game and is then guaranteed to converge to an ε-BNE of the original game. As for complete information games, the resulting algorithm implements the algorithm of Singh et al. (2004).
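The τ-gridding used to build an ε-induced game can be illustrated by enumerating the grid strategies directly. The sketch below assumes τ = 1/steps for an integer `steps`, so that the grid includes the pure strategies, and uses exact fractions to avoid rounding. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def grid_strategies(num_actions, steps):
    """All probability vectors whose entries are multiples of tau = 1/steps."""
    grid = []
    for counts in product(range(steps + 1), repeat=num_actions - 1):
        if sum(counts) <= steps:
            # The last action receives whatever probability mass is left over.
            full = counts + (steps - sum(counts),)
            grid.append(tuple(Fraction(c, steps) for c in full))
    return grid


def type_conditional_strategies(num_actions, steps, num_types):
    """One grid strategy per discrete type: the pure actions of the induced game."""
    return list(product(grid_strategies(num_actions, steps), repeat=num_types))
```

For two actions and τ = 1/4 this yields five grid strategies. With m types the induced action set is the m-fold product of the grid, which is why the exponent in the bound grows with m.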

4.2. Continuous types

In games with continuous types, it is reasonable to require that types have some structured effects on agents' payoffs. Otherwise just representing the effect of types exactly can lead to unbounded game descriptions. Quite naturally, the desire is to identify common effects that can be exploited for computational efficiency. In Singh et al. (2004), we considered a class of games with continuous types that have what is called the single crossing point property (examples of such games include certain kinds of first price auctions, all-pay auctions, noisy signaling games, oligopoly games, etc. (Athey, 2001; Milgrom & Weber, 1985; Topkis, 2002)). Such games have the interesting property that there exist BNE that are monotone threshold pure strategies. In a threshold pure strategy each type is mapped to a pure action and furthermore each action is played in at most one contiguous region of the continuous type interval. We use the term threshold to refer to these strategies because the points in the type interval at which the strategies switch to new actions constitute thresholds. In monotone threshold pure strategies there is an ordering over actions such that the actions appear in that order in the strategy of all agents.

As in Singh et al. (2004), we will only consider games with the single crossing point property, i.e., games with monotone threshold pure strategies. Even though distributions over the pure actions do not need to be considered in such games, the number of strategies in such games is infinite because of the continuous types. Singh et al. (2004) showed that for any ε, there exists a

\[ \tau = \frac{\epsilon}{4 d k \rho_t}, \]

where $\rho_t$ is the maximum probability density over type space, such that there exists a τ-discretized strategy that is an ε-BNE. A τ-discretized strategy only allows thresholds at the τ intervals in type space. There will be $O\left((1/\tau)^d\right)$ many τ-discretized strategies and these will be the pure actions of the ε-induced game. As before, the existence of an ε-BNE in the τ-discretized strategies of the original game will guarantee the existence of an ε-NE in the ε-induced game. PureProp applied to the ε-induced game with the τ-discretized strategies as actions will find ε-BNE.
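A τ-discretized monotone threshold strategy can be represented by a non-decreasing tuple of d − 1 thresholds on the τ-grid of the type interval. The sketch below enumerates these tuples. The convention that a type exactly at a threshold plays the later action, and the choice τ = (t_high − t_low) / steps, are assumptions made for illustration.

```python
from itertools import combinations_with_replacement


def threshold_strategies(t_low, t_high, steps, num_actions):
    """All non-decreasing tuples of num_actions - 1 thresholds on the tau-grid."""
    tau = (t_high - t_low) / steps
    grid = [t_low + i * tau for i in range(steps + 1)]
    return list(combinations_with_replacement(grid, num_actions - 1))


def action_for_type(thresholds, t):
    """Index of the action played at type t: the number of thresholds at or below t."""
    return sum(1 for theta in thresholds if theta <= t)
```

Since the thresholds are sorted, the action index never decreases as the type increases, so every enumerated strategy is monotone under the fixed action ordering. For example, `threshold_strategies(0.0, 1.0, 4, 2)` gives the five single-threshold strategies at 0, 0.25, 0.5, 0.75 and 1.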

5. Experimental Results

In this paper we only experimented with incomplete information games with continuous types (with the single crossing point property), because empirical results are already available on the other types of games in earlier papers on graphical games. We explored the effects of graph size and topology on the various phases of PureProp, i.e., on node consistency, arc consistency, and value propagation. Our implementation of PureProp uses the version of value propagation that finds all ε-NE in the ε-induced game.

We ran PureProp on the party game with threshold pure strategies described in Singh et al. (2004). The agents have been invited to a party and must decide whether to attend or not (and so there are two actions for each agent). Furthermore, each agent has a social network of agents she is acquainted with and these constitute her neighbors in the graphical game. She has a liking or a dislike for her acquaintances and her reward for attending depends on which of her neighbors also attend the party. Every neighbor she likes adds to her reward if they attend and every neighbor she dislikes subtracts from her reward if they attend. An agent's reward is thus a function of the actions of the agents in her social network or neighborhood. Each agent also incurs a private cost associated with attending the party which has an additive effect on her reward. In the graphical representation of this game, nodes represent agents while edges connect an agent to her social network.

We used the τ-discretization idea described above to construct an ε-induced game and then used PureProp to compute approximate equilibria in party games for the following neighborhood topologies (shown in Figure 1): (a) ring, (b) grid, (c) acyclic with a branching factor of 2 (i.e., each node has 3 neighbors), (d) chordal in which a tenth of the edges are chords, and chordal in which a fifth of the edges are chords. For each topology, we varied the number of players and measured the effect on the runtime of various phases of our algorithm.
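The party game reward described above can be sketched as a small function. The exact payoff values used in the experiments are not given here, so the ±1 liking weights, the zero reward for staying home, and the agent names below are assumptions for illustration only.

```python
def party_reward(agent, attends, likes, cost):
    """Reward of `agent` given who attends; staying home yields 0 (an assumption)."""
    if not attends[agent]:
        return 0.0
    # Liked neighbours who attend add to the reward, disliked ones subtract.
    social = sum(weight for other, weight in likes[agent].items() if attends[other])
    # The private cost of attending enters additively.
    return social - cost


# Hypothetical social network: ann likes bob and dislikes cy.
LIKES = {"ann": {"bob": 1.0, "cy": -1.0}}
```

With a private cost of 0.3, ann's reward for attending is 0.7 when only bob attends and -0.3 when both bob and cy attend. In a threshold strategy she would attend exactly when her private cost falls below some cutoff.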

Figure 4. Comparison of runtimes to find one ε-BNE. (Plot of runtime in ms against the number of nodes for the ring, tree, chordal-10%, and chordal-20% topologies.)

Figure 4 compares the runtime2 of the algorithm on various topologies as a function of the number of agents. We observe that regardless of topology, the runtime to compute a single -BNE scales linearly with the number of nodes. This is in accordance with our objective of exploiting the local2

We measured runtime for all our experiments on a machine with a 2.80GHz Intel(R) Pentium(R) CPU and approximately 1GB of RAM.

0 0

20

40

60

80

100

120

140

Number of nodes

Figure 5. Runtime analysis of the PureProp algorithm on three topologies - (a) acyclic (tree), (b) ring, (c) grid. In each plot, the solid line indicates the time to compute a single -BNE, the dashed-dot line indicates the time to compute all -BNE, and the dashed line indicates the time to compute all -BNE when the arc consistency step is skipped, i.e. node consistency is directly followed by value propagation.

(a)

Chordal Topology; 10% chords

5

4.5

x 10

Single equilirium All equilibria, node + arc consistency All equilibria, node consistency

4 3.5

Runtime (ms)

3 2.5 2 1.5 1 0.5 0 0

20

40

60

80

100

120

Number of nodes

(b)

x 10

5

Runtime (ms)

Chordal Topology; 20% chords

5

6

Single equilirium All equilibria, node + arc consistency All equilibria, node consistency

4

3

2

1

0 10

20

30

40

50

60

70

80

90

100

Number of nodes

Figure 6. Runtime analysis of the PureProp algorithm on two topologies - (a) chordal in which 10% of edges are chords, and (b) chordal in which 20 % of edges are chords. The solid line indicates the time to compute a single -BNE, the dashed-dot line indicates the time to compute all -BNE, and the dashed line indicates the time to compute all -BNE when the arc consistency step is skipped, i.e. node consistency is directly followed by value propagation.

ity of interactions to compute global equilibria in a manner that scales with the number of agents. Additionally, we see that runtime is strongly affected by the sizes of local neighborhoods. For instance, computation is quickest for ring topologies, which have the smallest local neighborhoods. Comparing the results for the two chordal graphs, we see that the graph with fewer chords, and hence fewer cycles, yields faster runtimes.

Each of the plots in Figures 5 and 6 shows three curves, which indicate (a) the time to compute a single ε-BNE, (b) the time to compute all ε-BNE, and (c) the time to compute all ε-BNE when value propagation is run immediately after node consistency is achieved, skipping the arc consistency phase entirely. In Figure 5(a), we see that for acyclic graphs the time required for all three computations is nearly the same, suggesting that most of the work is done by the node consistency step and very little by arc consistency and value propagation. For acyclic graphs, this is exactly the behavior we would expect, since the number of backtracking steps during value propagation will be minimal.³ In Figure 5(b), we notice that for ring topologies, skipping arc consistency incurs a substantial runtime penalty. This demonstrates the effectiveness of the arc consistency step in reducing domain sizes.

In Figure 5(c), however, it is interesting to note that for the grid topology, arc consistency does not prove very beneficial for graphs with fewer than 100 nodes. We speculate that the reason is the high degree of connectivity in this graph. For smaller graphs, this would cause the node consistency phase to eliminate a considerable number of profiles. However, as the ratio of node connectivity to graph size decreases, so does the effectiveness of node consistency. At the same time, the benefits of arc consistency start to become apparent. Finally, in Figures 6(a) and 6(b) we observe similar effects, and note again that the algorithm generally ran faster on graphs with fewer chords, owing to the presence of fewer cycles.
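The interplay between arc consistency and backtracking discussed above can be illustrated with a generic constraint satisfaction sketch. This is not the PureProp implementation: the function names (`revise`, `arc_consistency`, `solve`), the toy equality constraint, and the four-node ring are all assumptions chosen for illustration. It uses a textbook AC-3 style pruning pass, followed by a backtracking search that picks the unassigned node with the smallest domain first, and it counts dead ends so that the two settings can be compared.

```python
from collections import deque

def revise(domains, allowed, x, y):
    """Drop values of x that have no supporting value in y's domain."""
    supported = {a for a in domains[x]
                 if any(allowed(x, a, y, b) for b in domains[y])}
    changed = supported != domains[x]
    domains[x] = supported
    return changed

def arc_consistency(domains, neighbors, allowed):
    """AC-3 style pruning; returns False if some domain becomes empty."""
    queue = deque((x, y) for x in neighbors for y in neighbors[x])
    while queue:
        x, y = queue.popleft()
        if revise(domains, allowed, x, y):
            if not domains[x]:
                return False
            # x's domain shrank, so arcs pointing at x must be rechecked
            queue.extend((z, x) for z in neighbors[x] if z != y)
    return True

def solve(domains, neighbors, allowed, stats):
    """Backtracking search, smallest domain first; stats counts dead ends."""
    assignment = {}

    def consistent(x, a):
        return all(allowed(x, a, y, assignment[y])
                   for y in neighbors[x] if y in assignment)

    def extend():
        if len(assignment) == len(domains):
            return True
        x = min((n for n in domains if n not in assignment),
                key=lambda n: len(domains[n]))
        for a in sorted(domains[x]):
            if consistent(x, a):
                assignment[x] = a
                if extend():
                    return True
                del assignment[x]
        stats["backtracks"] += 1  # no value of x worked: dead end
        return False

    return dict(assignment) if extend() else None

# Toy ring 0-1-2-3-0 in which neighbors must agree (hypothetical constraint).
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
domains = {0: {0, 1, 2}, 1: {0, 1, 2}, 2: {2}, 3: {0, 1, 2}}
same = lambda x, a, y, b: a == b

without_ac = {"backtracks": 0}
solution_plain = solve(dict(domains), neighbors, same, without_ac)

pruned = dict(domains)
arc_consistency(pruned, neighbors, same)
with_ac = {"backtracks": 0}
solution_ac = solve(pruned, neighbors, same, with_ac)
print(without_ac["backtracks"], with_ac["backtracks"])
```

In this toy instance the search reaches two dead ends when pruning is skipped and none after pruning, which mirrors, on a very small scale, the qualitative gap reported for ring topologies. The magnitude of the effect in real games depends on the payoff structure and is not something this sketch predicts.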

6. Conclusion

We presented PureProp(ε), a constraint satisfaction algorithm for computing pure-strategy ε-NE in complete information graphical games with arbitrary graph structure. We showed how PureProp unifies the previous algorithms available for finding approximate equilibria in several classes of games: tree-structured graphical games of complete information (Kearns et al., 2001), arbitrary-structured graphical games of complete information (Ortiz & Kearns, 2003), and tree-structured graphical games of incomplete information (Singh et al., 2004). In addition, we showed how to apply PureProp to solve two classes of arbitrary-structured graphical games of incomplete information.
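To make the target concept concrete, the following minimal sketch checks whether a pure-strategy profile is an ε-NE of a graphical game, using the standard definition that no player can gain more than ε by deviating unilaterally. It enumerates all profiles by brute force and is therefore not PureProp; the function names, the payoff function `coordination_payoff`, and the four-player ring are hypothetical choices made only for illustration. The point is that each player's condition depends only on its own action and those of its neighbors, which is what makes a constraint satisfaction formulation natural.

```python
from itertools import product

def is_pure_epsilon_ne(profile, neighbors, payoff, actions, eps):
    """True if no player gains more than eps by a unilateral deviation."""
    for i in neighbors:
        # A player's payoff depends only on its neighbors' actions.
        local = {j: profile[j] for j in neighbors[i]}
        current = payoff(i, profile[i], local)
        best = max(payoff(i, a, local) for a in actions[i])
        if best - current > eps:
            return False
    return True

def all_pure_epsilon_ne(neighbors, payoff, actions, eps):
    """Brute-force enumeration; exponential in the number of players."""
    players = sorted(neighbors)
    found = []
    for combo in product(*(actions[i] for i in players)):
        profile = dict(zip(players, combo))
        if is_pure_epsilon_ne(profile, neighbors, payoff, actions, eps):
            found.append(profile)
    return found

def coordination_payoff(i, action, local):
    """Hypothetical payoff: 1 per matching neighbor, plus 0.25 for action 1."""
    matches = sum(1 for b in local.values() if b == action)
    return matches + (0.25 if action == 1 else 0.0)

# Four players on a ring 0-1-2-3-0, two actions each (assumed example).
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
acts = {i: [0, 1] for i in ring}

exact = all_pure_epsilon_ne(ring, coordination_payoff, acts, eps=0.0)
approx = all_pure_epsilon_ne(ring, coordination_payoff, acts, eps=0.5)
print(len(exact), len(approx))
```

With ε = 0 only the two fully coordinated profiles survive in this example, while ε = 0.5 also admits the four profiles in which two adjacent players choose each action, since the best deviation there gains only 0.25.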

³ In acyclic graphs, an ordering over nodes from the root to the leaves would lead to no backtracking in the value propagation phase. We did not specialize the value propagation phase for acyclic graphs and chose the node ordering based on domain size, as for the cyclic graphs.

References

Athey, S. (2001). Single crossing properties and the existence of pure strategy equilibria in games of incomplete information. Econometrica, 69, 861–869.

Dechter, R., & Pearl, J. (1988). Tree clustering schemes for constraint processing. Seventh National Conference on Artificial Intelligence (AAAI-88) (pp. 37–42). St. Paul, MN.

Kearns, M., Littman, M. L., & Singh, S. (2001). Graphical models for game theory. Seventeenth Conference on Uncertainty in Artificial Intelligence (pp. 253–260). Seattle.

Littman, M. L., Kearns, M., & Singh, S. (2001). An efficient, exact algorithm for solving tree-structured graphical games. Advances in Neural Information Processing Systems 14 (pp. 817–823).

Mackworth, A. (1977). Consistency in networks of relations. Artificial Intelligence, 8, 99–118.

Milgrom, P., & Weber, R. (1985). Distributional strategies for games with incomplete information. Mathematics of Operations Research, 10, 619–632.

Ortiz, L. E., & Kearns, M. (2003). Nash propagation for loopy graphical games. In S. Becker, S. Thrun, & K. Obermayer (Eds.), Advances in Neural Information Processing Systems 15 (pp. 793–800). Cambridge, MA: MIT Press.

Russell, S., & Norvig, P. (2003). Artificial intelligence: A modern approach. Prentice Hall.

Singh, S., Soni, V., & Wellman, M. P. (2004). Computing approximate Bayes-Nash equilibria in tree-games of incomplete information. ACM Conference on Electronic Commerce.

Topkis, D. (2002). Supermodularity and complementarity. Princeton University Press.

Vickrey, D., & Koller, D. (2002). Multi-agent algorithms for solving graphical games. Eighteenth National Conference on Artificial Intelligence (pp. 345–351). Edmonton, Alberta, Canada: American Association for Artificial Intelligence.