Sparse Binary Zero-Sum Games David Auger, Jialin Liu, Sylvie Ruette, David L. Saint-Pierre, Olivier Teytaud

To cite this version: David Auger, Jialin Liu, Sylvie Ruette, David L. Saint-Pierre, Olivier Teytaud. Sparse Binary Zero-Sum Games. Asian Conference on Machine Learning, 2014, Ho Chi Minh City, Vietnam. Vol. 29, pp. 1–16, 2014.

HAL Id: hal-01077627 https://hal.inria.fr/hal-01077627 Submitted on 3 Nov 2014

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.


JMLR: Workshop and Conference Proceedings 29:1–16, 2014

ACML 2014

Sparse Binary Zero-Sum Games David Auger

[email protected] AlCAAP, Laboratoire PRiSM, Bât. Descartes, Université de Versailles Saint-Quentin-en-Yvelines, 45 avenue des États-Unis, F-78035 Versailles Cedex, France

Jialin Liu

[email protected]

TAO, Lri, UMR CNRS 8623, Univ. Paris-Sud, F-91405 Orsay, France

Sylvie Ruette

[email protected] Laboratoire de Mathématiques, CNRS UMR 8628, Bât. 425, Univ. Paris-Sud, F-91405 Orsay, France

David L. Saint-Pierre

[email protected]

Montefiore Institute, Université de Liège, Belgium

Olivier Teytaud

[email protected] TAO, Lri, UMR CNRS 8623, Univ. Paris-Sud, F-91405 Orsay, France and OASE Lab, National Univ. of Tainan and AILab, National Dong Hwa Univ., Hualien, Taiwan

Editors: Cheng Soon Ong and Tu Bao Ho

Abstract
Solving zero-sum matrix games is polynomial, since it boils down to linear programming. Approximate solving can be done in sublinear time by randomized algorithms on machines with random-access memory. Algorithms working separately and independently on columns and rows have been proposed, with the same performance; these versions can handle matrix games with stochastic rewards. (Flory and Teytaud, 2011) has proposed a new version, empirically performing better on sparse problems, i.e. cases in which the Nash equilibrium has small support. In this paper, we propose a variant, similar to their work and also dedicated to sparse problems, with provably better bounds than existing methods. We then test the method experimentally on a card game.
Keywords: Sparsity, bandit algorithms, zero-sum matrix games.

1. Introduction
Solving Nash equilibria (NE) of matrix games is important in itself, e.g. for some economic models; as a building block for Monte-Carlo Tree Search algorithms (Flory and Teytaud, 2011) when simultaneous actions are involved; and for robust stochastic optimization (Lambert III et al., 2005). Bandit algorithms (Lai and Robbins, 1985; Grigoriadis and Khachiyan, 1995; Auer et al., 1995; Audibert and Bubeck, 2009) are tools for handling the exploration vs. exploitation dilemma that are in particular useful for finding NE (Grigoriadis and Khachiyan, 1995; Audibert and Bubeck, 2009), with applications to two-player games (Kocsis and Szepesvari, 2006; Flory and Teytaud, 2011). To the best of our knowledge, the only paper in which sparsity in NE is used for accelerating bandits is (Flory and Teytaud, 2011).

© 2014 D. Auger, J. Liu, S. Ruette, D.L. Saint-Pierre & O. Teytaud.


Section 2 presents the framework of sparse bandits and related algorithms. Section 3 mathematically analyzes sparse bandits. Section 4 presents experiments on a card game and Section 5 concludes.

2. Algorithms for sparse bandits
In this section we define the problem and our proposed approach. Section 2.1 presents the notion of NE in matrix games. Section 2.2 defines a classical bandit algorithm framework that aims at solving such a problem, and Section 2.3 introduces two proposed algorithms adapted to sparse problems. Throughout the paper, [[a, b]] denotes {a, a+1, a+2, …, b−1, b} and lcm(x1, …, xn) denotes the least common multiple of x1, …, xn, where a, b, x1, …, xn are integers.

2.1. Matrix games and Nash equilibria
Consider a matrix M of size K × K with values in {0, 1} (we choose a square matrix for short notations, but the extension is straightforward). Player 1, the row player, chooses an action i ∈ [[1, K]] and player 2, the column player, chooses an action j ∈ [[1, K]]; both actions are chosen simultaneously. Then player 1 gets reward M_{i,j} and player 2 gets reward 1 − M_{i,j}. The game therefore sums to 1 (we consider games summing to 1 for convenience, but 0-sum games are equivalent). A NE is a pair (x*, y*) (both in [0, 1]^K and summing to 1) such that if i is distributed according to the distribution x* (i.e. i = k with probability x*_k) and if j is distributed according to the distribution y* (i.e. j = k with probability y*_k), then neither player can expect a better average reward by unilaterally changing its strategy, i.e.

∀x, y, (x*)^T M y ≥ (x*)^T M y* ≥ x^T M y*.

There might be several NE, but the value of the game M, given by v = (x*)^T M y*, remains the same. Moreover, we define an ε-NE as a pair (x, y) such that

inf_{y′} x^T M y′ > (x*)^T M y* − ε  and  sup_{x′} (x′)^T M y < (x*)^T M y* + ε.
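Since x ↦ x^T M y is linear, the sup/inf over the simplex is attained at a pure strategy, so the value and the ε-NE condition can be checked with simple row/column maxima. A minimal sketch of ours (the "matching pennies" matrix below is a toy example, not from the paper):

```python
def row_value(M, y):
    """V(y) = sup_x x^T M y, attained at a pure row strategy: max_i (M y)_i."""
    return max(sum(M[i][j] * y[j] for j in range(len(y))) for i in range(len(M)))

def col_value(M, x):
    """inf_y x^T M y, attained at a pure column strategy: min_j (x^T M)_j."""
    return min(sum(x[i] * M[i][j] for i in range(len(x))) for j in range(len(M)))

def is_eps_ne(M, x, y, v, eps):
    """The eps-NE definition: x guarantees more than v - eps, y concedes less than v + eps."""
    return col_value(M, x) > v - eps and row_value(M, y) < v + eps

M = [[1, 0], [0, 1]]          # matching pennies, values in {0, 1}
x_star = y_star = [0.5, 0.5]  # unique NE, value v = 1/2
```

For this game both players' uniform strategies guarantee exactly 1/2, and slightly perturbed strategies still form an ε-NE for moderate ε.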

2.2. Bandit algorithms
A state-of-the-art bandit algorithm to approximate a NE (which includes matrix games) is EXP3 (Auer et al., 1995). Here we present the version used in (Audibert and Bubeck, 2009). At iteration t ∈ [[1, T]], our version of EXP3 proceeds as follows (this is for one of the players; the same is done, independently, for the other player):
• At iteration 1, S is initialized as a null vector.
• Action i is chosen with probability p(i) = α_t/K + (1 − α_t) × exp(η_t S_i) / Σ_j exp(η_t S_j) for some sequences (α_t)_{t≥1} and (η_t)_{t≥1}, e.g. in (Bubeck and Cesa-Bianchi, 2012) α_t = 0 and η_t = √(log(K)/(tK)), or in (Audibert and Bubeck, 2009) η_t = min(√((4/5) log(K)/(TK)), 1/K) and α_t = K η_t.


• Let r be the received reward.
• Update S_i: S_i ← S_i + r/p(i) (S_j for j ≠ i is not modified).
This algorithm, as well as its variants, converges to the NE as explained in (Audibert and Bubeck, 2009; Bubeck and Cesa-Bianchi, 2012) (see also (Grigoriadis and Khachiyan, 1995; Auer et al., 1995)).

2.3. Rounded bandits
(Flory and Teytaud, 2011) proposes a generic way of adapting bandits to the sparse case. Basically, the bandit runs as usual, and then the solution is pruned. We here propose a variant of their algorithm, as explained in Alg. 1. Our definition of sparsity does not assume that the matrix M is sparse; rather, we assume that the two vectors x* and y* of the NE have support of moderate size k, i.e. k ≪ K. The truncation step restricts the game to rows i such that x_i > 1/(2c) and to columns j such that y_j > 1/(2c). Let (x′, y′) be the exact Nash equilibrium of this restricted game (computed in polynomial time by linear programming). Output (x′, y′) as approximate Nash equilibrium of the complete game (completed with 0's for missing coordinates).
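The EXP3 loop of Section 2.2 can be sketched as follows. This is an illustrative implementation of ours under the α_t = 0, η_t = √(log(K)/(tK)) schedule, with two independent players as in the paper; it is not the authors' released code:

```python
import math
import random

def exp3_probs(S, eta, alpha):
    """p(i) = alpha/K + (1 - alpha) * exp(eta*S_i) / sum_j exp(eta*S_j)."""
    K = len(S)
    m = max(S)
    w = [math.exp(eta * (s - m)) for s in S]  # shift by max for numerical stability
    tot = sum(w)
    return [alpha / K + (1 - alpha) * wi / tot for wi in w]

def exp3_matrix_game(M, T, seed=0):
    """Run two independent EXP3 players on the 0/1 matrix game M for T rounds."""
    rng = random.Random(seed)
    K = len(M)
    S_row, S_col = [0.0] * K, [0.0] * K
    counts_row = [0] * K
    for t in range(1, T + 1):
        eta = math.sqrt(math.log(K) / (t * K))  # alpha_t = 0 schedule
        p = exp3_probs(S_row, eta, 0.0)
        q = exp3_probs(S_col, eta, 0.0)
        i = rng.choices(range(K), weights=p)[0]
        j = rng.choices(range(K), weights=q)[0]
        r = M[i][j]                  # row player's reward; column player gets 1 - r
        S_row[i] += r / p[i]         # importance-weighted update of the played arm only
        S_col[j] += (1 - r) / q[j]
        counts_row[i] += 1
    return [c / T for c in counts_row]  # empirical mixed strategy of the row player

M = [[1, 0], [0, 1]]  # matching pennies; the NE plays each arm with probability 1/2
```

On matching pennies, the empirical row frequencies returned by `exp3_matrix_game(M, T)` concentrate around the uniform NE as T grows.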

3. Analysis
This section is devoted to the mathematical analysis of sparse bandit algorithms. Section 3.1 introduces the necessary notations. Section 3.2 shows some properties of supports of Nash equilibria. Section 3.3 gives some useful results on denominators of the rational probabilities involved in Nash equilibria. Section 3.4 presents stability results (showing that in sparse bandits, good strategies are also close to the Nash equilibrium). Section 3.5 concludes with properties of sparse bandit algorithms.

3.1. Terminology
We use the classical terminology of game theory. We consider a matrix game M of size K × K as above. A pure strategy is an index i ∈ [[1, K]]. A mixed strategy is a vector of size K with non-negative coefficients summing to 1. Let e_i be the vector (0, 0, …, 0, 1, 0, …, 0)^T with a one only at the ith position; by a slight abuse of notation we will use this notation independently of the dimension of the vector (i.e. e_i can be used both in R^10 and R^50). Let ∆ denote the set of probability vectors, that is, ∆ = {y : Σ_j y_j = 1 and ∀j, y_j ≥ 0}; this implicitly depends on the dimension of the vectors, which we do not make explicit since there will be no ambiguity. The support of a vector y ∈ ∆ is the set of indices j such that y_j > 0. For short, sup_x (or sup_y, inf_x, inf_y equivalently) means the supremum over x ∈ ∆. The value of y ∈ ∆ for M is V(y) = sup_x x^T M y. Recall that v denotes the value of the game M, that is, it satisfies

∀x, y ∈ ∆,  x^T M y* ≤ v ≤ (x*)^T M y,  (1)

and v = (x*)^T M y* = V(y*) if (x*, y*) is a Nash equilibrium.

3.2. Supports of Nash equilibria
Here we consider general matrices A, M with real coefficients. The following lemma is well known (see (Gale and Tucker, 1951, Lemma 1, page 318)).


Lemma 1 (Farkas' Lemma) There exists y ≥ 0 satisfying Ay = b if and only if there is no x such that A^T x ≥ 0 and x^T b < 0.

The following lemma is adapted from (Dantzig and Thapa, 2003). We give the proof for the sake of completeness.

Lemma 2 Let I be the set of column indices i such that for all optimal solutions x* of the row player we have (x*)^T M e_i = v. Then there exists an optimal solution y* for the column player whose support is exactly I.

Proof First, we show that it is sufficient to prove that for any i ∈ I there is an optimal solution y^i ∈ ∆ such that y^i_i > 0. Indeed, one can then consider any strictly convex combination of the y^i, for instance y* = (1/|I|) Σ_{i∈I} y^i. The vector y* ∈ ∆ has support including I (because y^i_i > 0). On the other hand, y* has support included in I (because by construction any optimal solution has support included in I).

So, it is sufficient to show that for any i ∈ I there is an optimal y^i such that y^i_i > 0. We now prove this. Without loss of generality fix i = 1 ∈ I. Let us suppose that no optimal solution y of the column player has a positive coordinate y_1 > 0. In other words, the system

Σ_i y_i = 1,  M y ≤ v 1,  y_1 > 0,  y ≥ 0

has no solution, where 1 is the vector with all coefficients equal to 1. Equivalently, this means that the following system has no solution:

Σ_i y_i = 1,  (M − v 1_{K×K}) y ≤ 0,  y_1 > 0,  y ≥ 0,

where 1_{K×K} is the K × K matrix with all coefficients equal to 1. Introducing the variable z = y/y_1 and a slack variable w of size K × 1, this is equivalent to saying that the system

(M − v 1_{K×K}) z + w = 0,  z_1 = 1,  z, w ≥ 0

has no solution. By Lemma 1, applied with the concatenation (z, w) as y, b = (0, 0, …, 0, 1) and

A = ( M − v 1_{K×K}   Id_K
      1  0  ⋯  0       0  ),

we deduce the existence of a vector x and a real number ε such that

x^T M_{·1} − v Σ_i x_i + ε ≥ 0,  x^T (M − v 1_{K×K}) ≥ 0,  x ≥ 0,  ε < 0,

where M_{·1} denotes the first column of M. By the first inequality above, x is not zero. Thus we can normalize x to get a vector in ∆, and we infer the existence of an optimal strategy x* ∈ ∆ for the row player such that (x*)^T M_{·1} > v; this implies that we cannot have 1 ∈ I, a contradiction.

Corollary 3 If M admits a unique Nash equilibrium (x*, y*) with support J × I then:
• for all i ∉ I and j ∉ J we have

(x*)^T M e_i > v and e_j^T M y* < v;  (2)

• the submatrix M′ of M with rows and columns respectively in J and I has a unique Nash equilibrium, which is the projection of (x*, y*) on J × I.

Proof The first part is a consequence of Lemma 2. Indeed, if (x*)^T M e_i = v with i ∉ I, then there exists an optimal solution y′ whose support contains i, which contradicts the uniqueness of y*. Thus (x*)^T M e_i > v by Eq. (1). The statement for j ∉ J is symmetric.
For the second part, the projection is clearly a Nash equilibrium. Suppose that there is another Nash equilibrium for M′, and let (x′, y′) be the only 2K-vector whose projection on J × I is equal to this other equilibrium (in other words, add zero coordinates for i ∉ I and j ∉ J). Consider now (1 − t) y* + t y′ with t > 0 and a row index j; if j ∈ J, then

e_j^T M ((1 − t) y* + t y′) = (1 − t) v + t v = v,

and if j ∉ J, then by the first part of this corollary the left-hand side is at most v for t small enough. Since we have a finite number of rows, this implies that by choosing t small enough we obtain a vector (1 − t) y* + t y′ which is another optimal solution for the column player in M, which contradicts the uniqueness.

3.3. Denominators of Nash equilibria
Consider a matrix M, with coefficients in {0, 1}, of size k1 × k2 with k1 ≥ 2 and k2 ≥ 2.

Lemma 4 Assume that the Nash equilibrium (x*, y*) is unique and that ∀i, j, x*_i > 0 and y*_j > 0.
a) Then k1 = k2, and x* and y* are rational vectors which can be written with a common denominator at most k^{k/2}, with k = k1 = k2.
b) Moreover, for all x, y ∈ ∆, x^T M y* = (x*)^T M y = (x*)^T M y* = v.

Proof The Nash equilibrium y* satisfies the following properties:


• the sum of the probabilities of all strategies for the "column" player is 1, i.e.

Σ_i y*_i = 1;  (3)

• the expected reward for the "row" player playing strategy i against y* is independent of i, i.e.

∀i,  Σ_j M_{i,j} y*_j = Σ_j M_{1,j} y*_j.  (4)

If there is another solution y ≠ y* to Eqs. (3), (4), then y′ = y* − α(y − y*) is another strategy for the column player. If α is small enough, then y′ ≥ 0 and Σ_i y′_i = 1; it is a correct mixed strategy, and its value is (x*)^T M y* − α (x*)^T M (y − y*), which is less than or equal to (x*)^T M y* if α has the same sign as (x*)^T M (y − y*). This contradicts the uniqueness of y* as a Nash strategy for the column player. As a consequence, Eqs. (3) and (4) are a characterization of the unique Nash equilibrium.

Thus y* can be computed by solving Eqs. (3) and (4); this is a linear system Z y* = (1, 0, 0, …, 0) with one single solution, where Z is a matrix with k1 rows and k2 columns and values in {−1, 0, 1}. The solution is unique, therefore k1 = k2 and Z is invertible. Z^{−1} can be computed as

Z^{−1} = (1/det(Z)) (cofactor(Z))^T,  where  (cofactor(Z))_{ij} = (−1)^{i+j} det((Z_{i′j′})_{i′≠i, j′≠j}).

The matrix Z has coefficients in {−1, 0, 1}; therefore, by Hadamard's maximum determinant problem, |det(Z)| ≤ k^{k/2} ((Hadamard, 1893); see e.g. (Brenner and Cummings, 1972, p. 626)). Moreover, the matrix cofactor(Z) has integer coefficients. This concludes the proof of the fact that y* is rational with denominator D = |det(Z)| at most k^{k/2}, where k = k1 = k2. The same arguments using x* instead of y* show that x* can also be written as a rational with the same denominator D.

To show b), notice that Eq. (4) can be rewritten as follows: ∀i, (M y*)_i = (M y*)_1. If x ∈ ∆, then

x^T M y* = Σ_i x_i (M y*)_i = Σ_i x_i (M y*)_1 = (M y*)_1  because Σ_i x_i = 1.
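The characterization of Lemma 4 (the linear system Z y* = e_1 built from Eqs. (3) and (4)) can be checked with exact arithmetic on a toy 2 × 2 game, confirming that the NE is rational and that |det(Z)| respects the Hadamard bound k^{k/2}. A sketch of ours, not from the paper:

```python
from fractions import Fraction

def solve_exact(Z, b):
    """Gauss-Jordan elimination over the rationals for a square system Z u = b."""
    n = len(Z)
    A = [[Fraction(v) for v in row] + [Fraction(bv)] for row, bv in zip(Z, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [row[-1] for row in A]

M = [[1, 0], [0, 1]]  # matching pennies: unique full-support NE
# Row 1 of Z encodes Eq. (3) (probabilities sum to 1);
# row 2 encodes Eq. (4) for i = 2 (row rewards equal): M_{2,.} - M_{1,.}
Z = [[1, 1], [M[1][0] - M[0][0], M[1][1] - M[0][1]]]
y_star = solve_exact(Z, [1, 0])  # exact rational NE of the column player
```

Here det(Z) = 2 = k^{k/2} for k = 2, and y* = (1/2, 1/2) indeed has denominator |det(Z)|.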

Thus x^T M y* is independent of x ∈ ∆, which implies that x^T M y* = (x*)^T M y* = v. By symmetry, one also has: ∀y ∈ ∆, (x*)^T M y = (x*)^T M y* = v.

Please note that k^{k/2} is known to be nearly optimal for matrices with coefficients in {0, 1} (by Hadamard's work) for any k of the form 2^m. Also, there are examples of matrices for which V(y) = V(y*) + (1/|det(Z)|) ‖y − y*‖ for y arbitrarily close to y*.

3.4. Stability of Nash equilibria
In the general case of a zero-sum game, two mixed strategies can be far from each other, even when both of them are very close, in terms of performance, to the performance of the


(assumed unique) Nash equilibrium. However, with a matrix M with values in {0, 1}, this is not true anymore, as shown by the two lemmas below.

Lemma 5 Let k ≥ 2. Consider a k × k matrix M with elements in {0, 1} such that the Nash equilibrium (x*, y*) is unique and no pure strategy has a null weight. Then for all y ∈ ∆ we have

V(y) ≥ V(y*) + (1/k^{k/2}) ‖y − y*‖_∞.  (5)

Proof By convexity of V, it is sufficient to prove that

min_{u : Σ_i u_i = 0, ‖u‖_∞ = 1}  lim_{t→0, t>0} (V(y* + t u) − V(y*)) / t  ≥  1/k^{k/2}.

This is equivalent to

min_{u : Σ_i u_i = 0, ‖u‖_∞ = 1}  max_i M_i·u  ≥  1/k^{k/2},  (6)

with M_i the ith row of M as previously. Let ũ be a vector in which the minimum is reached (it exists by compactness). Since ‖ũ‖_∞ = 1, there exists i_0 such that |ũ_{i_0}| = 1. Let us assume, without loss of generality, that ũ_{i_0} = 1; the proof is the same if ũ_{i_0} = −1. Thus, in Eq. (6), we can restrict to the vectors u such that u_{i_0} = 1, Σ_i u_i = 0 and ∀i ∈ {1, …, k}, −1 ≤ u_i ≤ 1. This is indeed a linear programming problem, as follows:

min_{u ∈ R^k, w ∈ R}  w

under constraints

∀i ∈ {1, …, k},  −1 ≤ u_i ≤ 1,
∀i ∈ {1, …, k},  M_i·u ≤ w,
u_{i_0} = 1,
Σ_{1≤i≤k} u_i = 0.

It is known that when a linear problem in dimension k + 1 has a finite optimum, there is a solution (u, w) with k + 1 linearly independent active constraints. Let us pick such a solution u. It is a solution of k + 1 linearly independent equations, each of the form u_i = 1, u_i = −1, M_i·u = w, or Σ_i u_i = 0. Let us write this system as

∀i ∈ P, u_i = 1,  ∀i ∈ N, u_i = −1,  ∀i ∈ H, M_i·u = w,  Σ_{1≤i≤k} u_i = 0,

where P, N, H are the subsets of {1, …, k} where the corresponding constraints are active.

We can remove w by setting M_j·u = M_i·u for some fixed i ∈ H and all j ∈ H \ {i}. Then u is the solution of a system of k equations in dimension k, with coefficients in {−1, 0, 1}. We use the same trick as in the proof of Lemma 4: since u is the solution of a system of k linear equations with all coefficients in {−1, 0, 1}, all coordinates of u are rational numbers with a common denominator D ≤ k^{k/2}. Then M_i·ũ = M_i·u has denominator D ≤ k^{k/2} and is positive; therefore M_i·ũ ≥ 1/k^{k/2}. This proves the expected result.
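The bound of Lemma 5 can be checked numerically on a small instance. For matching pennies (k = 2, unique full-support NE), the bound is 1/k^{k/2} = 1/2; a toy verification of ours, not the authors' code:

```python
# Check V(y) >= V(y*) + (1/k^(k/2)) * ||y - y*||_inf on matching pennies.

def V(M, y):
    """V(y) = sup_x x^T M y, attained at a pure row strategy."""
    return max(sum(M[i][j] * y[j] for j in range(len(y))) for i in range(len(M)))

M = [[1, 0], [0, 1]]       # matching pennies; unique NE y* = (1/2, 1/2), v = 1/2
y_star = [0.5, 0.5]
k = 2
bound = 1 / k ** (k / 2)   # = 1/2

for p in [0.0, 0.1, 0.25, 0.4, 0.5, 0.7, 1.0]:
    y = [p, 1 - p]
    dist = max(abs(y[0] - y_star[0]), abs(y[1] - y_star[1]))
    # Here V(y) = 1/2 + |p - 1/2| while the right-hand side is 1/2 + |p - 1/2|/2.
    assert V(M, y) >= V(M, y_star) + bound * dist - 1e-12
```

In this instance the value gap V(y) − V(y*) equals ‖y − y*‖_∞ exactly, twice the guaranteed slope.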

Lemma 6 Consider a K × K matrix M with coefficients in {0, 1} and assume that the Nash equilibrium (x*, y*) is unique. Let J be the support of y* and k = #J. Then

∀j ∉ J,  (x*)^T M e_j ≥ v + 1/k^{k/2}.

Proof According to Lemma 4, v and the coefficients of x*, y* are multiples of some constant c ≥ 1/k^{k/2}. Thus, for every j, (x*)^T M e_j is also a multiple of c. Fix j ∉ J. By Corollary 3, (x*)^T M e_j > v, which implies that (x*)^T M e_j ≥ v + c.

Combining Lemmas 5 and 6 yields the following.

Theorem 7 Consider a matrix M of size K × K with coefficients in {0, 1}. Assume that there is a unique Nash equilibrium (x*, y*). Let k be the size of the supports of x*, y*. Then

∀y ∈ ∆,  V(y) − V(y*) ≥ (1/(2 k^k)) ‖y − y*‖_∞.  (7)

Proof Define c = 1/k^{k/2}. Let J be the support of y*. For every y ∈ ∆, one can write y = a y′ + b y″, with a = Σ_{j∈J} y_j ∈ [0, 1], a + b = 1 and y′, y″ ∈ ∆ satisfying: ∀j ∉ J, y′_j = 0 and ∀j ∈ J, y″_j = 0. For every index i, one has

y_i − y*_i = a y′_i − y*_i = a (y′_i − y*_i) − b y*_i  if i ∈ J,
y_i − y*_i = b y″_i  if i ∉ J.

Thus

‖y − y*‖_∞ = max{ ‖a (y′ − y*) − b y*‖_∞ , b ‖y″‖_∞ }.  (8)

Then, define δ = ‖y − y*‖_∞, δ_1 = ‖y′ − y*‖_∞ and δ_2 = ‖y″‖_∞. One has

V(y) ≥ (x*)^T M y = a (x*)^T M y′ + b Σ_{j∉J} (x*)^T M (y″_j e_j).

By Lemma 4(b), (x*)^T M y′ = v; and by Lemma 6, (x*)^T M e_j ≥ v + c for all j ∉ J. Thus

V(y) ≥ a v + b (v + c) Σ_{j∉J} y″_j,

that is,

V(y) − v ≥ c b.  (9)

By Eq. (8), either δ = b δ_2 ≤ b, or δ = ‖a (y′ − y*) − b y*‖_∞. If δ ≤ b, the result is given by Eq. (9) because c < 1 and hence c² ≤ c. From now on, assume that δ = ‖a (y′ − y*) − b y*‖_∞. Then δ ≤ a δ_1 + b ‖y*‖_∞ ≤ a δ_1 + b. Equivalently,

a δ_1 ≥ δ − b.  (10)

We split the end of the proof into two cases.
Case 1: b ≥ c δ / 2. Then

V(y) − v ≥ c b  (by Eq. (9))  ≥ c² δ / 2  (by assumption on b),

which gives the expected result in case 1.
Case 2: b < c δ / 2. Since y′ has the same support as y*, there exists x′ ∈ ∆ with the same support as x* such that x′ M y′ − v ≥ c δ_1, by Lemma 5. Moreover, V(y) − v ≥ x′ M y − v = a (x′ M y′ − v) + b (x′ M y″ − v). Hence,

V(y) − v ≥ a c δ_1 − v b  (because x′ M y″ ≥ 0)
        ≥ c δ (a δ_1 / δ) − v b
        ≥ c δ (1 − b/δ) − v b  (using Eq. (10))
        ≥ c δ (1 − c/2 − v/2)  (using b < c δ / 2)
        ≥ c δ (1 − c) / 2  (using v ≤ 1).

Since 1 − c ≥ c, we get the expected result in case 2.

3.5. Application to sparse bandit algorithms
Consider a matrix M as in Theorem 7. By Lemma 4, (x*, y*) can be written with a common denominator at most k^{k/2}. Define C = lcm(1, 2, 3, …, ⌊k^{k/2}⌋). By the prime number theorem, it is known that C = O(exp(k^{k/2} (1 + o(1)))) (see details in http://mathworld.wolfram.com/LeastCommonMultiple.html). We discuss in parallel the truncated bandit algorithm (Alg. 2) and the rounded bandit algorithm (Alg. 1), as follows. By construction of the algorithms, with probability 1 − δ, the bandit algorithm finds a u-Nash equilibrium (x, y), for

• u < 1/(4 k^{k/2} k^k) for the TBANDIT algorithm;
• u < 1/(4 C k^k) for the RBANDIT algorithm.
By Theorem 7, this implies that
• ‖x − x*‖_∞ ≤ 2 u k^k < 1/(2 k^{k/2}) (idem for ‖y − y*‖_∞) for TBANDIT;
• ‖x − x*‖_∞ ≤ 1/(2C) (idem for ‖y − y*‖_∞) for RBANDIT.
Then:
• Truncated algorithm: all non-zero coordinates of x* are at least 1/k^{k/2} and |x*_i − x_i| < 1/(2 k^{k/2}) with probability ≥ 1 − δ (and the same for y*, y); so with probability 1 − δ the Nash equilibrium (x′, y′) of the reduced game is the solution (x*, y*) (after filling missing coordinates with 0).
• Rounded algorithm: the denominator of the coordinates of x*, y* is a divisor of C, so with probability 1 − δ, ‖Cx − Cx*‖_∞ < 1/2, ‖Cy − Cy*‖_∞ < 1/2, and Cx* and Cy* are integers. So x* = ⌊Cx + 1/2⌋/C; RBANDIT finds the exact solution with probability ≥ 1 − δ.
For example, if using the Grigoriadis & Khachiyan algorithm (Grigoriadis and Khachiyan, 1995), or variants of EXP3 (Audibert and Bubeck, 2009), one can ensure precision u with fixed probability in time
• O(K log(K) (1/u)²) = O(K (log K) k^{3k}) for the truncated version;
• O(K log(K) (1/u)²) = O(K (log K) k^{2k} exp(2 k^{k/2} (1 + o(1)))) for the rounded version.
Then, we get, after rounding (rounded version) or after truncating and polynomial-time solving (truncated version), the exact solution y* with fixed probability and time
• O(K log(K) k^{3k}) + poly(k) for the truncated algorithm (Alg. 2);
• O(K log(K) k^{2k} exp(2 k^{k/2} (1 + o(1)))) for the rounded algorithm (Alg. 1).
The truncated algorithm (Alg. 2) is therefore better.
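The gap between the two bounds can be made concrete for small support sizes k by computing C exactly instead of using its asymptotic. Illustrative arithmetic of ours: the factor k^{3k} of the truncated algorithm versus the factor k^{2k} C² of the rounded one, with C = lcm(1, …, ⌊k^{k/2}⌋):

```python
import math
from functools import reduce

def C_of(k):
    """C = lcm(1, 2, ..., floor(k^(k/2))), the common-denominator multiple of Section 3.5."""
    n = math.floor(k ** (k / 2))
    return reduce(math.lcm, range(1, n + 1), 1)

def truncated_factor(k):
    """k-dependent factor of the truncated bound: k^(3k)."""
    return k ** (3 * k)

def rounded_factor(k):
    """k-dependent factor of the rounded bound: k^(2k) * C^2 (exact C, not the exp asymptotic)."""
    return k ** (2 * k) * C_of(k) ** 2
```

Already for k = 4 one gets C = lcm(1, …, 16) = 720720, and the rounded factor exceeds the truncated one by many orders of magnitude, in line with the conclusion that the truncated algorithm is better.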

4. Experiments
We work on the Pokemon card game. More precisely, we work on the metagaming part, i.e. the choice of the deck; the ingaming is then handled by a simulator with exact solving. The source code is freely available at http://www.lri.fr/~teytaud/games.html. At first a normal EXP3 is executed, using our empirically tuned formula

p(i) = (1 + (c − 1)/√t)^{−1} × ( 1/(c √t) + (1 − 1/√t) × exp(S_i/√t) / Σ_j exp(S_j/√t) )  (11)


[Figure 1 appears here: six panels of performance curves (win rate between 0.45 and 0.95, budget T from 10^0 to 10^4): (a) TEXP3 vs EXP3, c = 0.65; (b) TEXP3 and EXP3 vs Uniform, c = 0.65; (c) TEXP3 vs EXP3, c = K; (d) TEXP3 and EXP3 vs Uniform, c = K; (e) TEXP3 vs EXP3, c = 2K; (f) TEXP3 and EXP3 vs Uniform, c = 2K.]

Figure 1: Performance (%) in terms of budget T for the game of Pokemon using 2 cards. The left column shows TEXP3 playing against EXP3 for different values of c. The right column shows EXP3 and TEXP3 playing against the random uniform baseline. We tested a wide range of values for c and TEXP3 performs better than EXP3 regardless of c.

to compute the probability of arm i, and we normalize the probabilities if need be. After T iterations, TEXP3 decides whether an arm is part of the NE based upon a threshold ζ, as explained in Alg. 3. Alg. 3 is based on Alg. 2, with constants adapted empirically and without the exact solving at the end.

Algorithm 3 TEXP3, an algorithm proposed in (Flory and Teytaud, 2011), used in these experiments.
Input: A K × K matrix M, defined by a mapping (i, j) ↦ M_{i,j}. A number T of rounds.
Run EXP3 with Eq. (11), which provides an approximation (x, y) of the Nash equilibrium.
Define:
  ζ = max_{a∈{1,…,K}} (T x_a)^{0.7} / T;  x′_i = x_i if x_i ≥ ζ and x′_i = 0 otherwise;  x″_i = x′_i / Σ_{j∈{1,…,K}} x′_j.
Define:
  ζ′ = max_{a∈{1,…,K}} (T y_a)^{0.7} / T;  y′_i = y_i if y_i ≥ ζ′ and y′_i = 0 otherwise;  y″_i = y′_i / Σ_{j∈{1,…,K}} y′_j.
Output x″ and y″ as approximate Nash equilibrium of the complete game.

Figures 1(a), 1(c) and 1(e) show the performance of TEXP3 playing against EXP3 for different values of c, and Figures 1(b), 1(d) and 1(f) present the performance of TEXP3 and EXP3 when playing against the random uniform baseline; the probability distributions obtained by EXP3 and TEXP3 after T iterations of EXP3 and (for TEXP3) after truncation and renormalization are used against random. To ensure that no player gains from being the first player, we make them play as both the row player and the column player and we display the result. Each point in the figure is the mean of 100 independent runs. In all figures, TEXP3 provides a consistent improvement over EXP3. Even in Figure 1(b), where EXP3 seems relatively weak against the random baseline, TEXP3 manages to maintain a performance similar to the ones in Figures 1(d) and 1(f).
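The truncation step of Alg. 3 can be sketched in a few lines: threshold the empirical frequencies at ζ = max_a (T x_a)^{0.7} / T and renormalize. This is an illustrative reconstruction of ours, not the authors' released code:

```python
def texp3_truncate(x, T):
    """TEXP3 pruning: zero out arms below zeta = max_a (T*x_a)^0.7 / T, then renormalize."""
    zeta = max((T * xa) ** 0.7 for xa in x) / T
    x_prime = [xa if xa >= zeta else 0.0 for xa in x]
    s = sum(x_prime)
    return [xa / s for xa in x_prime]

# Arms with small empirical frequency are pruned, sharpening a sparse NE:
x = [0.5, 0.3, 0.15, 0.05]       # hypothetical empirical frequencies after T rounds
x_trunc = texp3_truncate(x, T=1000)
```

With T = 1000 the threshold is ζ = 500^{0.7}/1000 ≈ 0.077, so the last arm is pruned and the remaining mass is redistributed over the three surviving arms.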

5. Conclusion
(Grigoriadis and Khachiyan, 1995; Auer et al., 1995; Audibert and Bubeck, 2009) are great steps forward in zero-sum matrix games, and beyond. They provide algorithms solving K × K matrix games, with precision ε and for a fixed confidence, in time O(K log(K)/ε²).


As noticed in (Grigoriadis and Khachiyan, 1995), this has the surprising property that the complexity is sublinear in the size of the matrix, for a fixed risk. We show here that, with coefficients in {0, 1}, if there is a unique sparse Nash equilibrium with support of size k for each player, then this bound can be reduced to K log(K) · k^{3k}, with no precision parameter (we provide an exact solution), with a fixed confidence 1 − δ:
• the dependency in K is the same as in (Grigoriadis and Khachiyan, 1995);
• there is no dependency in ε.
Practical relevance of this work. We discuss here the practical relevance of our results; two aspects are (i) the existence of very sparse problems, and (ii) the possible implementation of real-world algorithms inspired by our results. The first point is the existence of very sparse problems. We have seen that the sparsity level that we need, for our algorithm to outperform the state of the art for exact solutions, is k