POLITECNICO DI MILANO
Corso di Laurea in Ingegneria Informatica
Dipartimento di Elettronica e Informazione
Local Search Techniques for Nash Equilibrium Computation with Bimatrix Games
Relatore: Ing. Nicola Gatti
Tesi di Laurea di:
Giorgio Patrini, matricola 751017
Marco Rocco, matricola 749965
Anno Accademico 2010-2011
Acknowledgments

Here comes the moment for acknowledgments. I shed the Engineer's jargon to devote myself to this page as it deserves. A passing thanks goes to all those I crossed paths with among the desks in these years of study, with whom one exam after another was passed, for as long as it lasted, with whom the few free moments were spent, or that handful of study hours (among them the many hours in the small room of the Nave, first floor); to those who fled and were seen again only in passing, maybe with the new guide on a leash, or met in Madrid over 2 weekends, or discovered to be from Crema and never seen before. Because knowing you are not the only one climbing the peak gives you a little extra strength. It will be a pleasure to meet again one day, in other lives. I count on it. I thank Riccardo, who enjoyed a draft reading of this thesis and with whom, in the last months, we spurred each other on to give our best. But above all, a thank you to Marco, without whose passion, critical eye and intense friendship I would have reached this goal only with a far greater effort. I absolutely want the other friends to be here too, the true ones, who have little to do with the Poli; so that when I pick this page up again, I will remember as if it were today. They are the ones with whom, should we happen to lose sight of each other, on the day we meet again nothing will have changed in our eyes. Maybe some of them do not know how fundamental they have been for me. But even if they knew, I am happy to embrace them once more with this dedication. They know; no names are needed. An enormous Thank You I owe, and want to address, to Nicola: always ready for office invasions for the doubts of the day, for Skype calls from the time zones of Taiwan or San Francisco, and to give trusted advice whenever one wonders about the future. I thank you, and I am sorry to end our work together here, certain that I would have enjoyed another 3 years of free research with you. But that is another story. Thanks also to Sofia, Fabio and Gianluca, for the work in common, the exchange of ideas and of coffee, and the all-round support. I keep the last lines ("last but not least" comes out badly in Italian) for my family.
A hug to my grandmother: over the years, few have followed my progress and my moments of uncertainty, my future plans and my hopes so closely. Thank you. My parents. Today would not be October 4th, 2011 without their constant presence and their unconditional support, even when I was not living with them. There would be much to write, but I know a way to sum it up best: I love you! And I dedicate this thesis to you.

Giorgio
Contents

Acknowledgments 3

1 Introduction 16

2 Game Theory Groundings 19
  2.1 Normal-Form Games 19
  2.2 Nash Equilibrium and Domination 22
  2.3 Degenerate Games 25
  2.4 The Complexity of Computing Nash Equilibrium 25
  2.5 Smoothed Complexity 28

3 State of the Art: LH, PNS and MIP Algorithms 31
  3.1 Linear Complementarity Problem 31
  3.2 Geometrical View: Polyhedra and Labels 32
  3.3 Complementarity Pivoting and the LH Algorithm 36
  3.4 Degenerate Games and Lexicographical Order 39
  3.5 Complexity and Exponentially-Hard Games 40
  3.6 The PNS Algorithm 41
  3.7 The MIP Algorithm 43
  3.8 The Performance 45

4 Local Search and LS-PNS 47
  4.1 Local Search 47
  4.2 LS-PNS 49
  4.3 Experimental Analysis 56

5 Enhanced LH 63
  5.1 Implementation Issues 63
  5.2 Randomized Search and Heavy Tails 66
  5.3 Experimental Analysis and rr-LH 68

6 The Lemke's Algorithm 77
  6.1 The Algorithm 77
  6.2 The Complexity 80
  6.3 Implementation, Tuning and Experimental Analysis 83

7 Local Search on Vertices 98
  7.1 LS-vertices 98
  7.2 Experimental Tuning 100

8 Anytime Algorithms for Approximated Equilibria 102
  8.1 Incremental Perturbation LH 102
  8.2 Experimental Analysis 103

9 Summary of Results 105

10 Conclusion and Open Questions 108

Bibliography 109

A Other Plots 113
List of Figures

2.1 Example of normal-form game (Rock, Paper, Scissors game). 20

3.1 The division of X = ∆_n and Y = ∆_m in best-response regions for the game in the last example. The numbers represent the labels. 35
3.2 7×7 SGC's game [28]. 46

5.1 Simulated heavy and nonheavy-tailed behaviors. 67
5.2 Average number of pivoting steps performed by LH as a function of the game size for DispersionGame, UniformLEG-RG, BidirectionalLEG-RG, SGC's games, PolymatrixGame-RG, and GraphicalGame-RG. 69
5.3 Average number of pivoting steps performed by LH as a function of the game size for CovariantGame-Rand and hard-to-solve games. 70
5.4 Distribution of the number of steps performed by LH. Figures on the right are in logarithmic scale on both axes. 72
5.5 Histograms of path lengths on PolymatrixGame-RG, GraphicalGame-RG and CovariantGame-Rand. 73
5.6 Relation between the average number of steps and the cutoff. 74
5.7 Average number of pivoting steps performed by rr-LH as a function of the game size. 75
5.8 Average number of pivoting steps as a function of the game size performed by LH when applied to σ-perturbed hard-to-solve games. 76

6.1 Average number of pivoting steps performed by Lemke's as a function of the game size. Note that SGC's games are tested up to dimension 99 × 99. 85
6.2 Average number of pivoting steps performed as a function of the game size by Lemke's applied to hard-to-solve games. (b) shows both the average and the shortest paths found, in logarithmic scale. 86
6.3 Comparison of the average number of pivoting steps performed by Lemke's and LH as a function of the game size. 87
6.4 Values of ε-NE (a) and z0 (b) of the starting points and related average number of steps with PolymatrixGame-RG. Distribution of z0 values (c): x-axis points are clustered in order to compute the average on the y-axis. Values of z0 with CovariantGame-Rand (d) and hard-to-solve games (e). 88
6.5 Distances between starting points and the Nash equilibria to which they lead. x-axis points are clustered in order to compute the average on the y-axis. 89
6.6 Probability distribution of mean d2 with random starts (a). Average number of steps and mean d2 between starting points and the Nash equilibria to which they lead (b). Probability distribution (c), starting on vertices. x-axis points are clustered in order to compute the average on the y-axis. 90
6.7 PolymatrixGame-RG. Number of steps as a function of decr(NE, n) (a) and of decr(z0, n) (b). The plots are zoomed into the more informative region, cutting away outliers. Figure (c) shows the average number of steps as a function of decr(z0, n); x-axis points are clustered in order to compute the average on the y-axis. Figure (d) depicts the probability distribution of decr(z0, n) values. 92
6.8 CovariantGame-Rand. Number of steps as a function of decr(NE, n) (a) and of decr(z0, n) (b). The plots are zoomed into the more informative region, cutting away outliers. Figure (c) shows the average number of steps as a function of decr(z0, n); x-axis points are clustered in order to compute the average on the y-axis. Figure (d) depicts the probability distribution of decr(z0, n) values. 93
6.9 Hard-to-solve games. Number of steps as a function of decr(NE, n) (a) and of decr(z0, n) (b). The plots are zoomed into the more informative region, cutting away outliers. 94
6.10 Average number of pivoting steps performed as a function of the game size by Lemke's with random restart policy, with cutoff equal to the mean number of steps of Figure 6.1 (d). 95
6.11 Relationship between the average number of steps and the cutoff performed by Lemke's algorithm. 96

A.1 Relation between the average number of steps and the cutoff. PolymatrixGame-RG of size 5-20. 113
A.2 Relation between the average number of steps and the cutoff. PolymatrixGame-RG of size 25-50. 114
A.3 Relation between the average number of steps and the cutoff. PolymatrixGame-RG of size 55-65. 115
A.4 Relation between the average number of steps and the cutoff. GraphicalGame-RG of size 5-10. 115
A.5 Relation between the average number of steps and the cutoff. GraphicalGame-RG of size 15-40. 116
A.6 Relation between the average number of steps and the cutoff. GraphicalGame-RG of size 45-65. 117
A.7 Relation between the average number of steps and the cutoff. CovariantGame-Rand of size 5-30. 118
A.8 Relation between the average number of steps and the cutoff. CovariantGame-Rand of size 35-50. 119
A.9 Values of ε-NE (a), ε-supp-NE (b), regret (c) and z0 (d) of the starting points and related average number of steps with PolymatrixGame-RG. x-axis points are clustered in order to compute the average on the y-axis. 120
A.10 Values of ε-NE (a), ε-supp-NE (b), regret (c) and z0 (d) of the starting points and related average number of steps with CovariantGame-Rand. x-axis points are clustered in order to compute the average on the y-axis. 121
A.11 Values of ε-NE (a), ε-supp-NE (b), regret (c) and z0 (d) of the starting points and related average number of steps with hard-to-solve games. x-axis points are clustered in order to compute the average on the y-axis. 122
A.12 d1, d2 and d∞ distances between starting points and the Nash equilibria to which they lead. PolymatrixGame-RG. x-axis points are clustered in order to compute the average on the y-axis. 123
A.13 Cosine and correlation distances between starting points and the Nash equilibria to which they lead. PolymatrixGame-RG. x-axis points are clustered in order to compute the average on the y-axis. 124
A.14 d1 distances between starting points and the Nash equilibria to which they lead. CovariantGame-Rand. x-axis points are clustered in order to compute the average on the y-axis. 124
A.15 d2, d∞ and correlation distances between starting points and the Nash equilibria to which they lead. CovariantGame-Rand. x-axis points are clustered in order to compute the average on the y-axis. 125
A.16 Mean of d1, d2, d∞ and correlation distances between starting points and the Nash equilibria to which they lead. Hard-to-solve games. x-axis points are clustered in order to compute the average on the y-axis. 126
List of Tables

3.1 Average time to find an equilibrium in 150 × 150 games (10 instances) with LH, PNS and MIP. The percentages of time-outs for LH, PNS and MIP are 8.3%, 2.0% and 7.5%, respectively [28]. 45

4.1 Percentage with which an equilibrium is found (within 10 minutes) and time in seconds needed to find it with II-FIR and RR. 57
4.2 Percentage with which an equilibrium is found and quality (i.e., f(S)) of the best found solution when no equilibrium is found with RR. 59
4.3 Success percentages, computational times in seconds, conditional dominance (CD) (whether or not it is used), and the value of the best upper bound (best U-B). 61
4.4 Average ε of the ε-Nash equilibria returned by the anytime algorithms. 62

6.1 Success percentages, computational times in seconds and the value of the best ε-NE found by LH and Lemke's algorithm. 97

7.1 Average value of the best ε-NE found within ten minutes. 101
7.2 Comparison between the three heuristics (II-BI, II-FI and II-FIR). Average value of the best ε-NE found by each heuristic within 10 minutes. 5 executions for 5 instances of CovariantGame-Rand. 101

8.1 Average ε of the ε-Nash equilibria returned by the anytime algorithms executed 5 times per each hard CovariantGame-Rand instance; LS means Local Search. 104

9.1 Feature comparison of our algorithms. We assume square games are used. 107
Abstract

This thesis concerns the problem of finding one Nash equilibrium of a bimatrix game, a two-player game in normal form. Bimatrix games are among the most basic models in noncooperative Game Theory, and finding a Nash equilibrium is important for their analysis. Several algorithms exist to find a Nash equilibrium, e.g., LH and PNS, but some games remain computationally intractable. We focus on the hardest classes of games known from previous experimental studies. Our purpose is to design Local Search algorithms to compute Nash equilibria or their ε-approximations. We designed and implemented a Local Search version of the PNS algorithm, which moves on the space of players' supports. It outperforms the other known algorithms in the computation of ε-Nash equilibria. The LH algorithm is extended with a random restart policy and optimized with techniques from Operations Research: this makes tractable some game classes previously considered hard. We then studied the related Lemke's algorithm. Our experimental campaign does not show a clear way to extend it to gain better performance, even though it allows more degrees of freedom than LH. We developed another Local Search algorithm, derived from the pivoting method of LH, in order to explore the vertices of the best-response polyhedra. It establishes the state of the art in ε-Nash equilibria computation. One more algorithm, designed precisely to approximate equilibria, calls LH to solve perturbed games. We summarize our results by comparing our algorithms on the computation of approximated equilibria. Furthermore, we include in our analysis the "hard-to-solve" games introduced in [29], [30], which exhibit the exponential worst case of LH.
Summary

Game Theory is a relatively recent field of applied mathematics that studies situations of interaction among agents, or players, whether of conflict or of cooperation. It was born in the early 1940s with the works of John Von Neumann and Oskar Morgenstern. Game Theory has attracted interest in several areas, such as economics, psychology, the social sciences, engineering and artificial intelligence, where it is employed for artificial multiagent systems. The influence between computer science and Game Theory goes both ways. Game Theory provides models and tools to study, for example, some aspects of the Internet, such as electronic commerce, routing and wireless networks. On the other side, computer science complements Game Theory with the study of algorithms for computing solutions of game models and with analyses from computational complexity theory.

Let us look at the contents of the thesis in detail. Chapter 2, of an introductory nature, presents the theoretical foundations of noncooperative two-player games, on which the thesis focuses. In a noncooperative game, each player chooses which actions to take in order to maximize his own payoff, without being able to cooperate with the others. In this context it is important to identify the players' optimal strategies, finding what is called the solution of the game. The main solution concept for noncooperative games is the Nash equilibrium. It expresses a situation of stability in which no player has an incentive to unilaterally change his behavior. In 1951, John Forbes Nash proved that every game admits at least one Nash equilibrium. Chapter 2 closes with a review of the state of the art on the computational complexity of computing a Nash equilibrium. The time complexity of finding a Nash equilibrium belongs to the class PPAD, generally believed to be different from P. Hence, finding an equilibrium can take, in the worst case, time that is not polynomial in the size of the game, making the problem intractable in practice.

Chapter 3 presents the main known algorithms for computing Nash equilibria in two-player games. The most used is LH, based on linear complementarity mathematical programming. Then comes PNS, whose core is the enumeration of the players' supports; informally, the support is the set of actions played by a player. Finally, MIP is presented, which finds Nash equilibria through a mixed-integer linear programming problem. Among them, each performs better than the others on particular kinds of games. At the same time, there exist game classes for which the computation time of each of the algorithms turns out to be exponential in the number of the players' actions, on experimental data in the average case.

In the course of our work, Local Search algorithms were developed to tackle the intractable game instances. Local Search techniques are commonly adopted in Operations Research whenever reaching the exact solution of the problem turns out to be very hard. The idea is to exploit ad hoc designed heuristics to guide the search for the optimum or for a suboptimal solution. Indeed, Local Search algorithms are often implemented so that their execution can be stopped at any moment, anytime. They return the best approximation found, given a deadline on the available computation time.

Several approaches were followed. The first is presented in Chapter 4. The basis is the support-enumeration mechanism of PNS. The algorithm is designed to move within the space of supports so as to improve the objective function at each step, that is, getting closer and closer to the Nash equilibrium. To reach the best parameterization, several combinations of objective functions (ε-Nash equilibria and other variants), heuristics (Best Improvement, First Improvement, First Improvement with Random generation and Metropolis) and metaheuristics (Iterated Improvement, Simulated Annealing, Tabu Search and VNS) were evaluated. The resulting algorithm is able to solve some hard games of small and medium size and is better than the anytime versions (implemented ad hoc) of LH, PNS and MIP.

In Chapter 5, LH is extended and its implementation optimized. This algorithm starts its execution from an artificial solution. From there it is possible to take a finite number of deterministic paths, computed in a way similar to the Simplex algorithm, through complementary pivoting on two tableaux. Geometrically, this corresponds to moving among the vertices of two polyhedra, one per player. LH randomly chooses one of these paths and follows it to the end, thus reaching a Nash equilibrium. The length of the paths is not constant, and hard games have many very long paths. It is experimentally observed that, for the hard game classes, the average number of LH steps needed to reach the Nash equilibrium grows exponentially as the size of the game increases. LH was extended with a Random Restart technique on the path: rr-LH follows a path until it becomes too long, whereupon it starts another path from scratch. It was necessary to use training data to estimate the optimal cutoff (the number of steps after which to restart) for each game class considered hard. rr-LH leads to a remarkable improvement; indeed, for some hard game classes the average number of rr-LH steps shows a linear relationship with the size of the game. These results are supported by theoretical considerations on the probability distribution of the length of LH paths. Fat-tailed and heavy-tailed distributions are treated.

During the tests conducted on LH, problems of numerical approximation of the algorithm emerged. Without a guarantee on the computation precision, LH can end its execution at a point that is not an equilibrium, or enter an infinite loop. An alternative implementation was therefore produced with the GMP (GNU Multiple Precision) libraries, which allow rational numbers to be represented through numerator and denominator. Numerical precision is guaranteed at an arbitrary level, hence the correctness of the equilibrium points, but performance decays because of the need to handle this nonstandard numerical representation with two integers of unbounded size.

In Chapter 6 we introduce Lemke's algorithm, which solves generic linear complementarity problems, not only those expressing the conditions of a Nash equilibrium. Its solution method is very similar to the complementary pivoting of LH. Lemke's algorithm has one more degree of freedom than LH: it can be started from a continuum of initial points. How far this freedom can be exploited to optimize the algorithm is an open question that we try to answer in the chapter, also by means of some theoretical considerations. Lemke's algorithm could be the basis of a Local Search algorithm whose search space is that of the initial points. Unfortunately, the experimental results obtained show the great difficulty of finding good indices to guide the heuristics. Nevertheless, our experimental analyses show that Lemke's algorithm and LH are not comparable: neither is always faster than the other.

A further Local Search algorithm, LS-vertices, is presented in Chapter 7. Instead of moving in the space of supports, LS-vertices visits the vertices of the polyhedron of one of the players, similarly to LH. Instead of respecting the complementarity conditions, the pivoting is guided by the objective function of the Local Search. While we move on one player's polyhedron, we compute the best ε-Nash equilibrium obtainable over the other player's strategies. This algorithm is, at the state of the art, the best one for computing ε-Nash equilibria.

Finally, Chapter 8 compares the performance of most of the algorithms presented in the thesis, in their anytime versions. Moreover, a new algorithm, ip-LH, is described, with the precise goal of computing approximated equilibria. It calls LH to solve games perturbed in an increasingly fine way. ip-LH finds equilibria that are better and better approximated, since the theory guarantees that the Nash equilibrium of a perturbed game is an ε-Nash equilibrium of the original game, the tighter the perturbation the better. The figures of merit are the quality of the ε-Nash equilibria found, the percentage of games solved to optimality and the corresponding execution times.

Chapters 9 and 10 close the thesis, summarizing the results obtained and the still open problems, and envisaging possible future studies.
Chapter 1
Introduction

Game Theory is a relatively recent branch of applied mathematics that studies situations of interaction among agents, also called players. Game Theory was born in the early 1940s with the works of John Von Neumann and Oskar Morgenstern. It has gained interest in several different fields, e.g., Economics, Psychology, Social Sciences, Engineering and Artificial Intelligence, where it is applied to model multiagent systems. The influence between Computer Science and Game Theory has two faces. The former completes Game Theory with the study of algorithms to compute game solutions and with analyses from the theory of computational complexity. The latter provides models and tools to study, for example, issues of the Internet, such as electronic commerce, routing and wireless networks.

Let us take a glance at the whole content of the thesis. In Chapter 2 we expose the theoretical basis of noncooperative bimatrix games, on which the thesis is focused. In a noncooperative game, each player chooses, independently from the others, which actions to play in order to maximize his outcome. In this scenario it is important to identify the players' optimal strategies, reaching the so-called solution of the game. The central solution concept in noncooperative games is the Nash equilibrium. It expresses a condition of stability in which no player gains more by unilaterally deviating from his play. In 1951, John Forbes Nash proved his famous theorem: every game has at least one Nash equilibrium. Chapter 2 ends with the state of the art on the complexity of computing Nash equilibria. This problem belongs to the complexity class PPAD, generally believed to be different from P. Hence, in the worst case, finding an equilibrium can take nonpolynomial time in the dimension of the game, making the problem intractable in practice.

Chapter 3 presents the main algorithms known to compute Nash equilibria in bimatrix games. The most commonly used is LH, based on linear complementarity programming. Then follows PNS, whose kernel is a support-enumeration method; informally, the support is the set of actions played by a player. Finally, we present MIP, which finds Nash equilibria through a mixed-integer
linear problem. Among them, each one outperforms the others on particular classes of games. At the same time, there exist classes of games for which the running time of each algorithm grows exponentially with the game dimension, referring to experimental averages.

In the course of our work, we developed Local Search algorithms, with the purpose of facing the intractable game instances. Local Search techniques are commonly adopted in Operations Research when exact solutions are very hard to find. The insight is to exploit ad hoc designed heuristics to guide the search for either optimal or suboptimal solutions. Indeed, Local Search algorithms are usually implemented in an anytime version: once stopped at a temporal deadline, they return the best approximation found so far.

We followed several different approaches. The first one is explained in Chapter 4. The basis comes from the support enumeration of PNS. The algorithm moves in the support space in order to minimize a certain objective function at each step, that is, getting closer and closer to an equilibrium. To tune the algorithm we tested combinations of objective functions (ε-Nash equilibrium, well-supported ε-Nash equilibrium and regret), heuristics (Best Improvement, First Improvement, First Improvement with Random generation and Metropolis), and metaheuristics (Random Restart, Simulated Annealing, Tabu Search and Variable Neighborhood Search). The algorithm so obtained is able to solve small and medium hard games, and it proves better than the anytime versions (implemented ad hoc) of LH, PNS and MIP.

In Chapter 5, LH is optimized and extended. This algorithm starts from an artificial solution. From there, a finite number of deterministic paths can be chosen. They are computed similarly as in the Simplex algorithm, by complementary pivoting on two tableaux. Geometrically, LH moves among the vertices of two polyhedra, one for each player. It randomly chooses a path and follows it until an equilibrium is found. Path lengths are not all equal, and hard games have a lot of long paths. For the hard game classes, we experimentally observed an exponential relationship between the average number of LH steps and the game size. LH is extended to rr-LH by a Random Restart method: once the path seems too long, the algorithm restarts on a new randomly chosen path. We estimated the best cutoff on training data for the different game classes. (The cutoff is the number of steps beyond which to perform a restart.) rr-LH improves the performance of LH. Indeed, it shows a linear relationship between the average number of steps and the size of some hard games. These results are supported by theoretical observations on the probability distributions of LH paths. We deal with fat-tailed and heavy-tailed distributions.

During the LH tests, we noticed problems due to numerical approximation. Without guarantees on arithmetic accuracy, LH can either terminate in a point that is not an actual equilibrium or enter an infinite loop. Thus, the algorithm needs an alternative implementation with the GMP (GNU
Multiple Precision) library, which allows us to represent rational numbers in exact arithmetic through numerator and denominator. We gained the correctness of LH, but we lost in performance because of the need to handle this nonstandard arithmetic with two unbounded integers.

In Chapter 6 we introduce Lemke's algorithm, which solves general linear complementarity problems, not only those related to Nash equilibria. Its pivoting solving method is similar to LH's, but Lemke's algorithm has one more degree of freedom: it can start from a continuum of initial points. It is still an open question how this extra flexibility can be exploited to optimize the algorithm. We try to give an answer in this chapter, also through theoretical observations. Lemke's algorithm could be the basis for a Local Search method in which we search within the space of its initial points. Unfortunately, experimental results show how hard it is to find good indices to guide the heuristics. Our analysis also highlights that neither of LH and Lemke's algorithm is always superior to the other.

A further Local Search algorithm, LS-vertices, is presented in Chapter 7. LS-vertices visits the vertices of one player's polyhedron, similarly to LH; but instead of respecting complementarity conditions, we guide the pivoting through the objective function of the Local Search. While it is moving on the first player's polyhedron, it computes the best ε-Nash equilibrium adjusting the other player's strategy. It establishes the state of the art in ε-Nash equilibria computation.

Eventually, in Chapter 8 we compare the performance of most of our algorithms, in their anytime versions. Moreover, we describe another algorithm, ip-LH, with the precise purpose of computing approximated equilibria. It calls LH to solve perturbed games, with incrementally smaller perturbations. ip-LH computes increasingly better approximations, because the theory assures us that the equilibrium of a perturbed game is an ε-Nash equilibrium of the original game: the tighter the perturbation, the better the approximation. We compared the algorithms on the quality of the ε-Nash equilibria found. Chapters 9 and 10 sum up the obtained results, highlight the open questions, and suggest future work.
Chapter 2
Game Theory Groundings

The purpose of this chapter is to recall the basic notions of noncooperative Game Theory. The normal form and the concept of strategy are defined in Section 2.1, the Nash equilibrium and strategy domination in Section 2.2. In Section 2.3 the case of degenerate games is presented. Section 2.4 deals with how complex it is to compute a Nash equilibrium. Its smoothed complexity is the subject of Section 2.5.
2.1 Normal-Form Games
The dominant approach to modeling players' interests is utility theory. It allows one to quantify players' degrees of preference across a set of available alternatives. A utility function is a mapping from states of the world to real numbers. These numbers represent the level of happiness of the players in the given states. (For a formal definition of preferences, utility functions and their properties see [31].) The normal form, also known as strategic form, is a way to represent players' utilities for every state of the world, in the special case where the outcomes depend only on the players' combined actions, which are played simultaneously. There are richer representations of games, which consider either probabilistic models of the environment and of the players' utilities (Bayesian games) or an element of time (extensive-form games). Although it would seem to be a very particular situation, the normal form is a canonical representation of a game, because most other representations of interest can be converted into it.

Definition 1 (Normal-form Game) A normal-form game is a tuple (N, A, u), where:
• N is a set of n players;
• A = A1 × · · · × An, where Ai is a finite set of actions, or pure strategies, available to player i;
• u = (u1, ..., un), where ui : A → R is a real-valued utility (or payoff) function for player i.

In this thesis we only focus on two-player games. An efficient algorithm for computing Nash equilibria is still an open problem, even in the case of two players. Moreover, as we will see below (Section 2.4), the problem of finding a Nash equilibrium belongs to the same complexity class independently of the number of players. Therefore, we present the theory groundings in the situation of only two self-interested players, i.e., n = 2. A two-player normal-form game is also called a bimatrix game, because utilities are usually given by two matrices, (A, B) ∈ (R, R)^{n×m}. In the following, let [k] be the set {1, . . . , k}. The first player (also called the row player) has an n-element action set [n] and the second player (column player) has an m-element action set [m]. The row and the column player's payoffs are determined by the n × m real-valued matrices A and B, respectively.

Definition 2 (Bimatrix Normal-form Game) A bimatrix normal-form game is a tuple (A, B) ∈ (R, R)^{n×m}, where the n rows are the actions of the row player, and the m columns are the actions of the column player.

The following example (Figure 2.1) represents the Rock, Paper, Scissors game. Each cell of the matrix corresponds to a possible outcome; the left number of each cell is the row player's utility, while the right number is the column player's utility. If the row player plays Rock and the column player plays Scissors, they gain 1 and -1, respectively.
              Rock     Paper    Scissors
  Rock        0, 0     −1, 1    1, −1
  Paper       1, −1    0, 0     −1, 1
  Scissors    −1, 1    1, −1    0, 0

Figure 2.1: Example of normal-form game (Rock, Paper, Scissors game).

The game in Figure 2.1 is a zero-sum game, a special case of normal-form game. It models a situation of pure competition: one player wins if and only if the other one loses.

Definition 3 (Zero-sum Game) A two-player normal-form game is zero-sum if A + B = 0 holds.

In a normal-form game each player has a set of strategies among which he can choose. If a player selects a single action and plays it, the strategy is called a pure strategy (e.g., if he plays Rock in the example given above); otherwise, if the player randomizes over the set of available actions according
to some probability distribution, this kind of strategy is called a mixed strategy (e.g., if he plays both Rock and Paper with probability 0.5 each). A pure strategy is a degenerate case of a mixed one.

Definition 4 (Mixed Strategy) A mixed strategy for a player is a probability distribution over the set of his pure strategies: x = (x1, x2, . . . , xn) such that xi ≥ 0 for all i and ∑_i xi = 1. Here xi is the probability of playing the pure strategy i.

A pure strategy will be represented by the unit vector e_i, which is 1 in the i-th element and 0 elsewhere. Another notation we use is 1 (0), the vector having 1s (0s) in all its coordinates. For simplicity of notation, by x ≥ 0 we mean a vector of nonnegative components. More generally, an inequality between vectors such as x ≥ b means inequalities between all their components. We also recall the definition of simplex.

Definition 5 (Simplex) Let r ∈ N. We denote the (r−1)-simplex by ∆_r = {z ∈ R^r : ∀i ∈ [r], z_i ≥ 0, 1^T z = 1}.

For example, a 2-simplex is a triangle whose vertices are the vectors (1, 0, 0), (0, 1, 0) and (0, 0, 1). A mixed strategy is naturally defined over a simplex.

Definition 6 ((Mixed) Strategy Profile) A (mixed) strategy profile for the two players is the profile (x, y) ∈ ∆_n × ∆_m of their (potentially mixed) strategies.

Definition 7 (Support) For any strategy x, the support S_x is the set of all coordinates with positive value: S_x = {i : x_i > 0}. If all the available actions of a player are in his support, the strategy is called fully mixed.

A player gains utility as a result of the strategies played by him and his opponent. Let A_ij be the row player's payoff if he plays the pure strategy i and the column player plays the pure strategy j. His gain is A_ij, which can also be written as e_i^T A e_j. If the players play mixed strategies, we introduce the concept of expected utility.

Definition 8 (Expected Utility) Let (x, y) be the strategy profile of the two players. Then x^T A y and x^T B y are respectively the expected utilities of the row and the column player.
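To make these notions concrete, the following minimal sketch (ours, not part of the thesis; numpy is assumed available, and all names are illustrative) encodes the Rock, Paper, Scissors game of Figure 2.1 as a pair of matrices and evaluates the expected utilities of Definition 8.

```python
import numpy as np

# Payoff matrices of Rock, Paper, Scissors (Figure 2.1);
# rows and columns are ordered as Rock, Paper, Scissors.
A = np.array([[0, -1, 1],
              [1, 0, -1],
              [-1, 1, 0]], dtype=float)  # row player's payoffs
B = -A  # zero-sum game: A + B = 0 (Definition 3)

def expected_utilities(x, y, A, B):
    """Expected utilities x^T A y and x^T B y (Definition 8)."""
    return x @ A @ y, x @ B @ y

x = np.array([1.0, 0.0, 0.0])   # pure strategy: Rock
y = np.ones(3) / 3              # fully mixed strategy
print(expected_utilities(x, y, A, B))  # (0.0, 0.0)
```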
2.2 Nash Equilibrium and Domination
In games only certain strategies are interesting with regard to the outcomes they produce, and these possible subsets of strategies are called solution concepts. Intuitively, a solution concept should say how the players have to play in order to obtain a good result from the game. The Nash equilibrium is the most important solution concept in normal-form games. It is based on the definition of best response.

Definition 9 (Best Response) The row player's best response to the column player's strategy y is the set of all mixed strategies x* such that (x*)^T A y ≥ (e_i)^T A y for all i ∈ [n], that is, x* = arg max_x {x^T A y}. Analogously, the column player's best response is y* = arg max_y {x^T B y}.

When the support of a best response contains two or more pure strategies, the player is indifferent among them (i.e., they give the same expected utility). In the best response, player i knows the other player's strategy and maximizes his utility. Instead, in strategic-form games, players do not know the other player's strategy. So the concept of best response is not enough, and here the Nash equilibrium has its role.

Definition 10 (Nash Equilibrium (NE)) A strategy profile (x*, y*) ∈ ∆_n × ∆_m is a Nash equilibrium (NE) if x* is a best response to y*, and vice versa.

An important property of a NE is that it is a stable strategy profile, in the sense that no player would deviate unilaterally from his strategy, because he would receive a lower utility. The most relevant characterization of NE is shown in the following theorem.

Theorem 1 (Nash, 1951) Every game with a finite number of players and action sets has at least one NE.

A proof of this theorem can be found in [31]. In the example (Figure 2.1) there is only one NE, which is not a pure strategy profile but a fully mixed NE. Nash equilibria are invariant under linear transformations of the game matrices. Theorem 2 shows that scaling and shifting the game matrices does not change the NE.

Theorem 2 (Linear Invariance of Nash Equilibria) Let (A, B) be a bimatrix game. Let a, b ∈ R, a, b > 0, c, d ∈ R, and let E = 1 1^T, i.e., the matrix with all entries equal to 1. Then (x*, y*) is a NE for (A, B) if and only if it is a NE for the game (aA + cE, bB + dE).
Proof. As mentioned before, all pure strategies played as best responses give the same utility. We first apply that property and reformulate the definition of NE. The profile (x*, y*) is a NE if and only if:

(x*)^T A y* ≥ (e_i)^T A y*  ∀i,      (x*)^T B y* ≥ (x*)^T B e_i  ∀i.

Equivalently, restricting to the supports:

∀j ∈ S_x*: (e_j)^T A y* ≥ (e_i)^T A y*  ∀i,      ∀j ∈ S_y*: (x*)^T B e_j ≥ (x*)^T B e_i  ∀i.

Multiplying by a > 0 and b > 0:

∀j ∈ S_x*: a(e_j)^T A y* ≥ a(e_i)^T A y*  ∀i,      ∀j ∈ S_y*: b(x*)^T B e_j ≥ b(x*)^T B e_i  ∀i.

Since (e_j)^T E y* = 1 for every j and (x*)^T E e_j = 1 for every j, adding the constant terms c(e_j)^T E y* and d(x*)^T E e_j to both sides preserves the inequalities:

∀j ∈ S_x*: (e_j)^T (aA + cE) y* ≥ (e_i)^T (aA + cE) y*  ∀i,
∀j ∈ S_y*: (x*)^T (bB + dE) e_j ≥ (x*)^T (bB + dE) e_i  ∀i.

Going back from the support formulation:

(x*)^T (aA + cE) y* ≥ (e_i)^T (aA + cE) y*  ∀i,      (x*)^T (bB + dE) y* ≥ (x*)^T (bB + dE) e_i  ∀i.      (2.1)

Therefore, (x*, y*) is a NE for the game (aA + cE, bB + dE).
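As an illustration of Definitions 9 and 10 (a sketch of ours, under the assumption that the game is given as numpy matrices like the RPS example above): the pure best responses to y are the maximizers of Ay, and a profile is a NE exactly when each support is contained in the opponent's pure-best-response set. The check below confirms that the uniform profile is the fully mixed NE of Rock, Paper, Scissors.

```python
import numpy as np

def pure_best_responses(M, y, tol=1e-9):
    """Indices i maximizing (e_i)^T M y, i.e., the pure best responses."""
    v = M @ y
    return np.flatnonzero(v >= v.max() - tol)

def is_nash(A, B, x, y, tol=1e-9):
    """(x, y) is a NE iff each support lies inside the set of pure best
    responses to the opponent's strategy (Definition 10)."""
    br_row = pure_best_responses(A, y, tol)     # row player's best responses to y
    br_col = pure_best_responses(B.T, x, tol)   # column player's: (B^T x)_j = x^T B e_j
    return (set(np.flatnonzero(x > tol)) <= set(br_row)
            and set(np.flatnonzero(y > tol)) <= set(br_col))

A = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]], dtype=float)
B = -A
u = np.ones(3) / 3
print(is_nash(A, B, u, u))   # True: the fully mixed NE of RPS
```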
While the NE is the most important solution concept, there is also a large number of others. The concepts of dominant and dominated strategies are discussed here, because they will be important in our work. One strategy dominates another for a player if the first strategy yields a payoff greater than the second, for any strategy of the other player. In order to formalize this idea, three definitions follow.

Definition 11 (Domination) Let x and x' be two strategies of the row player. Then
• x strictly dominates x' if for all y ∈ ∆_m it holds that x^T A y > (x')^T A y;
• x weakly dominates x' if for all y ∈ ∆_m it holds that x^T A y ≥ (x')^T A y, and for at least one y it holds that x^T A y > (x')^T A y.

Definition 12 (Dominant Strategy) A strategy is strictly (weakly) dominant for a player if it strictly (weakly) dominates any other strategy of that player.
It is straightforward to see that a strategy profile (x, y) in which every strategy is dominant (strictly or weakly) is a NE. Such a strategy profile is called an equilibrium in dominant strategies and, if it is strictly dominant, it is necessarily the unique NE of the game. However, dominant strategies are rare in naturally occurring games. The case of dominated strategies is more common.

Definition 13 (Dominated Strategy) A strategy x is strictly (weakly) dominated for the row player if some other strategy x' strictly (weakly) dominates x.

All strictly dominated pure strategies can be ignored, since they can never be best responses. It is important to note that once a pure strategy is eliminated, another strategy which was not dominated may become dominated; moreover, a pure strategy can be dominated by a mixed strategy although no pure strategy dominates it. Because of the hardness of computing a NE, the study of approximated equilibria has gained a lot of interest in the last years. We present two definitions of approximation. We will make intense use of them in the design and analysis of our algorithms.

Definition 14 (ε-Nash Equilibrium (ε-NE)) For any ε > 0, a strategy profile (x*, y*) ∈ ∆_n × ∆_m is an ε-Nash equilibrium (ε-NE) if it holds that (x*)^T A y* ≥ x^T A y* − ε for all x, and (x*)^T B y* ≥ (x*)^T B y − ε for all y. No player gains more than ε by unilaterally deviating to another strategy.

Definition 15 (ε-Well-Supported Nash Equilibrium (ε-supp-NE)) For any ε > 0, a strategy profile (x*, y*) ∈ ∆_n × ∆_m is an ε-well-supported Nash equilibrium (ε-supp-NE) if it holds that
∀i ∈ S_x*, ∀j ∈ [n]: (e_i)^T A y* ≥ (e_j)^T A y* − ε,
∀i ∈ S_y*, ∀j ∈ [m]: (x*)^T B e_i ≥ (x*)^T B e_j − ε.
It is an equilibrium where both players assign positive probability only to pure strategies that yield at most ε less than the best-response utility.

Any NE is both a 0-NE and a 0-supp-NE. Further, every ε-supp-NE is also an ε-NE, but the converse is not always true. The previous definitions rely on normalized game matrices, that is, games with each entry in [0, 1]. That assumption assures that ε is bounded within the same range. Indeed, an ε-NE of a game (A, B) becomes a cε-NE for the game (cA, cB) with c > 0: ε-equilibria are not invariant under positive scaling, while they are maintained under shifting. In the rest of the present work, whenever we refer to approximated equilibria, we use normalized games or compute a normalization parameter to scale the ε value once found.
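Definitions 14 and 15 translate directly into code. The sketch below (ours; the function names are illustrative) computes the smallest ε for which a given profile of a normalized game is an ε-NE and an ε-supp-NE; checking pure deviations suffices, since any mixed deviation is a convex combination of pure ones.

```python
import numpy as np

def eps_ne(A, B, x, y):
    """Smallest eps such that (x, y) is an eps-NE (Definition 14):
    the largest gain from a unilateral pure deviation."""
    return max((A @ y).max() - x @ A @ y,
               (B.T @ x).max() - x @ B @ y)

def eps_supp_ne(A, B, x, y, tol=1e-9):
    """Smallest eps such that (x, y) is an eps-supp-NE (Definition 15):
    the largest regret of a pure strategy played with positive probability."""
    Sx, Sy = x > tol, y > tol            # supports (Definition 7)
    row_regret = (A @ y).max() - (A @ y)
    col_regret = (B.T @ x).max() - (B.T @ x)
    return max(row_regret[Sx].max(), col_regret[Sy].max())
```

For any profile, eps_supp_ne is at least eps_ne, mirroring the fact that every ε-supp-NE is also an ε-NE.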
2.3 Degenerate Games
Games can be degenerate, and this has some effects on the computation of Nash equilibria.

Definition 16 (Nondegenerate Game) A bimatrix game is called nondegenerate if the number of pure best responses to a mixed strategy never exceeds the size of its support.

We report another, equivalent definition that gives more insight into degeneracy. For the proof of their equivalence, see [35].

Definition 17 (Nondegenerate Game, 2) A bimatrix game (A, B) is called nondegenerate if the columns of (I B^T) are linearly independent, and the rows of (A I) are linearly independent, where I is the identity matrix of suitable dimension.

We call a game generic if its payoffs are drawn randomly and independently from a continuous distribution. A generic game is nondegenerate with probability one, because the condition of Definition 17 is satisfied almost surely by randomly sampled bimatrices. Thus, the importance of nondegenerate games comes from their probability in randomly generated games. To state a useful property of nondegenerate games, we need the notion of balance of supports.

Definition 18 (Balance of Supports) Let (x, y) be a strategy profile. We define the balance of their supports as ||S_x| − |S_y||. If it is 0, the supports are called balanced.

Solving a nondegenerate game always leads to a NE with balanced supports. Indeed, let (x*, y*) be a NE. Recall that x* is a best response to y* and vice versa. Suppose for example that |S_y*| = k. Hence, the best response to y* has at most k pure best responses. This means the support of x* has at most k pure strategies, i.e., |S_x*| = h ≤ k. Now, if h is exactly k, the property holds. Otherwise it would be h < k, from which it follows that |S_y*| ≤ h < k. But |S_y*| = k, therefore it must be h = k and the supports are balanced.
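Definition 16 can be spot-checked numerically. The following randomized test (a sketch of ours, not a certificate of nondegeneracy: it can only ever witness degeneracy) samples mixed strategies y for the column player and verifies that the number of the row player's pure best responses does not exceed |S_y|.

```python
import numpy as np

def looks_nondegenerate(A, trials=10_000, tol=1e-9, rng=None):
    """Randomized spot check of Definition 16, row player's side:
    for each sampled y, #(pure best responses to y) must be <= |S_y|."""
    rng = rng or np.random.default_rng(0)
    n, m = A.shape
    for _ in range(trials):
        k = int(rng.integers(1, m + 1))            # support size
        support = rng.choice(m, size=k, replace=False)
        y = np.zeros(m)
        y[support] = rng.dirichlet(np.ones(k))     # random strategy on the support
        v = A @ y
        if np.count_nonzero(v >= v.max() - tol) > k:
            return False                           # degeneracy witnessed
    return True
```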
2.4 The Complexity of Computing Nash Equilibrium
The computational complexity of finding a NE has attracted a lot of attention in recent times. Although the Nash Theorem assures the existence of at least one mixed equilibrium, its original proof is not constructive and does not give information about a solving algorithm. In the last decade, scientists have made great progress in the comprehension of computing NE. In the rest of the present work, we use the following problem definition.
Definition 19 (r-Nash) r-Nash is the problem of finding a NE in an r-player game. It is simply called Nash if the number of players is irrelevant.

Zero-sum games are a very special class of games, due to the well-known correspondence with Linear Programming (LP) established by Von Neumann and Dantzig. In fact, they are in P. More generally, we do not know whether there exist polynomial algorithms to compute a NE in bimatrix games. Clearly, Nash belongs to the complexity class NP. With the bimatrix game as input, the required output is a mixed strategy profile. The recognition version of the problem must decide whether or not a strategy profile is a NE, and this can be done in polynomial time (admitting an accuracy specification as input). But intuitively, NP is not the tightest class that contains Nash. The reason is the Nash Theorem: every game has at least one NE, so Nash always has a solution. This condition is not true for a generic problem in NP, which is expressible in terms of a recognition problem that does not always have solutions. The first step is to define the class TFNP (it stands for "NP Total Functions"), consisting exactly of all the search problems with a guarantee of solution. As explained in [10], TFNP has no complete problem and it must be studied through its subclasses. More specifically, Nash belongs to PPAD ⊆ TFNP, the class of Problems with Parity Argument for Directed graphs. PPAD contains problems on directed graphs with indegree and outdegree at most one which have a source (a node with indegree zero) and hence must have a sink (a node with outdegree zero). Formally, we define the following abstract PPAD problem.

Definition 20 (End of the line) Given two functions S and P, each with input and output of n bits, such that P(0^n) = 0^n ≠ S(0^n), find an input x ∈ {0, 1}^n such that P(S(x)) ≠ x or S(P(x)) ≠ x ≠ 0^n.

End of the line creates a directed graph with vertex set {0, 1}^n and an edge from x to y if and only if both y = S(x) and x = P(y). S and P stand for "successor" and "predecessor". All vertices have indegree and outdegree at most one, and 0^n is a source without predecessor. Thus, there must be a sink. We look for a sink, or for a source different from 0^n.

Definition 21 (PPAD class) PPAD is the class of all total search problems polynomial-time reducible to End of the line.

As usual, PPAD-complete is the class of the most difficult PPAD problems, or equivalently the class of problems such that any other PPAD problem can be reduced to them in polynomial time. Recently, it has been shown that r-Nash is PPAD-complete even for two players [10]. So we can state that PPAD captures the whole complexity of Nash.
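To make Definition 20 concrete, here is a toy sketch (ours) of the generic procedure that witnesses the totality of End of the line: starting from the known source 0^n, repeatedly follow the successor function until a node with no valid outgoing edge is reached. In an actual PPAD instance, S and P are given as circuits, and this walk may take exponentially many steps.

```python
def end_of_the_line(S, P, n):
    """Walk the implicit graph from the source 0^n following S.
    A node x has an outgoing edge x -> S(x) only if P(S(x)) == x;
    when that fails, x is a sink and thus a valid output.
    Worst case: the path may visit up to 2^n nodes."""
    x = '0' * n
    while True:
        nxt = S(x)
        if nxt == x or P(nxt) != x:   # no valid edge leaves x: sink found
            return x
        x = nxt
```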
Theorem 3 Nash is PPAD-complete.

Of course, Nash is not the unique problem classified in this class; some fixed-point problems, with the guarantee that the point exists, are PPAD-complete as well [7], [10]. Similarly to NP-complete, at this time it is unknown whether or not PPAD = P. Nash has been defined as "the most outstanding problem at the boundary of P" [25]. Anyway, it is generally believed that the equality is very unlikely. In fact, if it held, all PPAD-complete problems would be in P, but up to now no polynomial-time algorithm to solve them has been found. It is worth noting that some other problems related to Nash are instead NP-complete. This happens when we search for a NE with specific properties, losing the guarantee of solution. Examples are:
• (Uniqueness [15]) Does a unique NE exist in the game?
• (k Equilibria) Do at least k NE exist in the game?
• (Social Welfare [15]) Does a NE exist in which the sum of the expected utilities of the players is at least k?
Such kinds of problems will not be treated further in this work. We now turn our focus to 2-Nash. When a problem belongs to a class generally thought strictly greater than P, in practice we look for approximated solutions. First, we discuss the problem of finding an ε-NE in an n × n bimatrix game. For any ε > 0 such that ε is O(1/poly(n)), this problem remains PPAD-complete. In other words, it is unlikely to have a Fully Polynomial Approximation Scheme (FPAS) for this problem [7].

Theorem 4 (Unlikely FPAS for 2-Nash) No algorithm with polynomial time complexity in n and 1/ε can compute an ε-NE of an n × n bimatrix game, unless PPAD ⊆ RP.

Whether PPAD ⊆ RP (RP being the class of problems solvable in polynomial time by a randomized algorithm) is still an open question, analogously to NP ⊆ P. Hence, we guess it is unlikely to have a FPAS for 2-Nash. At this time, there are no results about PAS (Polynomial Approximation Schemes): the question is whether there is an algorithm that runs in polynomial time with 1/ε in the exponent. Up to now, the best algorithm runs in subexponential time [20]. In particular, it has been shown that, for every ε > 0, an ε-NE can be found in time O(n^{log n / ε²}) by examining all supports of size log n / ε².

In the last years, various results have been proposed about algorithms that find an ε-NE with constant ε. To our knowledge, the best one is based
on a descent-procedure optimization in the strategy space and obtains ε = 0.3393 [33]. The second computes the equilibrium by solving a particular zero-sum game and reaches ε = 0.3639 [4]. It is worth noting that the latter has to solve just one linear program, while the former solves a polynomial number of them. As expected, because an ε-supp-NE represents a stronger condition, a good lower bound for it is more difficult to compute. The best result about it is still obtained by solving a certain zero-sum game and provides ε_supp = 0.658 [19]. However, these two perspectives on approximate equilibria are strictly tied. In fact, it has been proved that they are polynomially related [7].

Theorem 5 (Polynomial Equivalence of ε-supp-NE and ε-NE) For any bimatrix game it holds that:
1. From any (ε²/(8n))-NE, a polynomial-time algorithm can compute an ε-supp-NE.
2. An ε-supp-NE is also an ε-NE.

The second statement comes from the stronger definition of well-supported equilibria. Hence, this problem is in PPAD too, and does not admit a FPAS.
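The subexponential algorithm of [20] rests on the fact that some ε-NE is always realized by k-uniform strategies, i.e., uniform distributions over multisets of k = O(log n / ε²) pure strategies. The following brute-force sketch (ours; a simplified, unoptimized rendering of that idea, not the exact procedure of [20]) enumerates such pairs.

```python
import numpy as np
from itertools import combinations_with_replacement

def k_uniform_search(A, B, eps):
    """Search for an eps-NE among k-uniform strategy pairs,
    k = ceil(log n / eps^2), in the spirit of [20]."""
    n, m = A.shape
    k = int(np.ceil(np.log(max(n, m)) / eps**2))

    def uniform(multiset, size):
        z = np.zeros(size)
        for i in multiset:          # each occurrence adds weight 1/k
            z[i] += 1.0 / k
        return z

    best, best_eps = None, np.inf
    for sx in combinations_with_replacement(range(n), k):
        x = uniform(sx, n)
        for sy in combinations_with_replacement(range(m), k):
            y = uniform(sy, m)
            e = max((A @ y).max() - x @ A @ y,
                    (B.T @ x).max() - x @ B @ y)
            if e < best_eps:
                best, best_eps = (x, y), e
            if best_eps <= eps:
                return best, best_eps   # early exit: an eps-NE was found
    return best, best_eps
```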
2.5 Smoothed Complexity
Smoothed complexity is a recent and quite different approach to evaluating the hardness of problems and the quality of algorithms. There are many metrics through which to measure the complexity of an algorithm, but here we are interested in time, and we consider smoothed complexity in contrast to the most commonly used worst-case analysis. (The latter is exactly the one used above to define complexity classes.) Following [7], we evaluate the performance of an algorithm A applied to an input x which is an instance of the problem P. The time complexity is a function of the input dimension n.

Definition 22 (Polynomial (Worst-Case) Complexity) Let a problem P be given with input domain D = ∪_n D_n, where D_n represents all instances whose input size is n. Let A be an algorithm for solving P and T_A(x) be the time complexity for solving an instance x ∈ D. Then algorithm A has polynomial (worst-case) complexity if it holds that

max_{x ∈ D_n} T_A(x) ≤ poly(n).
Worst-case analysis is important because it gives us an absolute guarantee against bad performance of the algorithm. But most algorithms do not behave so badly in practice, and thus that kind of complexity is not informative enough about how hard it is to solve real-world instances of the problem. Worst-case complexity is usually too pessimistic. Smoothed complexity analysis has been developed to overcome such issues. It is partially motivated by noting that input data in applications are biased by several sources of random error, such as physical measurement or numerical approximation and rounding. In smoothed analysis we take those deviations into account. Indeed, they are the foundations of smoothed complexity evaluation, because we measure the expected time complexity under those perturbations. Of course, each problem has its suitable kind of perturbation to apply in order to model errors. But the theory of smoothed analysis is general enough to work with whatever random model of bias. For instance, we could use Gaussian or Uniform perturbations for real-valued inputs, and Boolean perturbations or partial bit permutations when working with boolean strings. We introduce the σ-uniform-cube perturbation, useful below.

Definition 23 (σ-Uniform-Cube Perturbation) Let x ∈ R^n. A σ-uniform-cube perturbation of x is a random vector chosen uniformly from the interval [x − σ, x + σ].

Anyway, we define smoothed complexity on a generic family of random sources.

Definition 24 (Polynomial Smoothed Complexity) Let a problem P be given with input domain D = ∪_n D_n, where D_n represents all instances whose input size is n. Let R = ∪_{n,σ} R_{n,σ} be a family of perturbations, where R_{n,σ} defines for each x ∈ D_n a perturbation distribution of x with magnitude σ. Let A be an algorithm for solving P and T_A(x) be the time complexity for solving an instance x ∈ D. Then algorithm A has polynomial smoothed complexity if it holds that

max_{x ∈ D_n} E_{x̃ ∈ R_{n,σ}(x)} [T_A(x̃)] ≤ poly(n, 1/σ).
The problem P is in smoothed polynomial time with perturbation model R if it has an algorithm with polynomial smoothed complexity.
Smoothed complexity is different from average-case complexity both in its assumptions and in its practical relevance. In average-case evaluation, one must first determine the input distribution and then compute the expected behavior under the hypothesis that inputs are generated by that distribution. But in real-world cases we do not know which random sources generate the data. We should use a mathematically tractable distribution as an approximation of the real probabilistic behavior of the input. But such a distribution is hard to find, and hence the average-case analysis may be completely useless.
Are there relations between smoothed complexity and the hardness of approximation? We can guess that if an algorithm A for the problem P has a low smoothed complexity, we could first perturb the instance, solve it with A, and then obtain an approximate solution. We would therefore obtain a randomized version of A that has low complexity. The quality of such an approximation may depend on the perturbation model and on the properties of the objective function. For 2-Nash we can show the following connection between smoothed complexity and approximation.

Theorem 6 (Smoothed 2-Nash and Approximated 2-Nash) If 2-Nash can be solved in smoothed time polynomial in n, m and 1/σ under σ-uniform-cube perturbations, then a 4σ-NE of a bimatrix game can be found by a randomized algorithm in time polynomial in n, m and 1/σ.

Proof [32]. Let (A, B) be an n × m bimatrix game. Take (Ã, B̃) as the game perturbed by a σ-uniform-cube perturbation, so each payoff of the perturbed game is a random variable chosen in the interval of radius σ centered around the payoff value. Then, for each strategy pair x and y, it holds that |x^T Ay − x^T Ãy| ≤ 2σ and |x^T By − x^T B̃y| ≤ 2σ. Now, let (x*, y*) be a NE for (Ã, B̃). Then, for all (x, y), we have

x^T Ay* − x^T Ãy* + (x*)^T Ãy* − (x*)^T Ay* ≤ |x^T Ay* − x^T Ãy*| + |(x*)^T Ay* − (x*)^T Ãy*| ≤ 4σ

from which, using x^T Ãy* ≤ (x*)^T Ãy* (since (x*, y*) is a NE of the perturbed game),

x^T Ay* − (x*)^T Ay* ≤ x^T Ãy* − (x*)^T Ãy* + 4σ ≤ 4σ

and the same holds for the column player. Thus (x*, y*) is a (4σ)-NE for (A, B).
We can exploit the same proof schema to show that if (x*, y*) is an ε-NE of (Ã, B̃), then (x*, y*) is also a (4σ + ε)-NE for (A, B). Theorem 4 together with the previous one implies that 2-Nash is unlikely to have polynomial smoothed complexity.

Theorem 7 (Smoothed Complexity of 2-Nash) It is unlikely that 2-Nash is in smoothed polynomial time, unless PPAD ⊆ RP.

Such a hardness result about 2-Nash pushes us to investigate approaches other than polynomial approximation schemes, either deterministic or randomized. Our purpose is to face the hardest-to-solve games, building ad hoc Local Search methods.
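The following small numpy sketch (ours, with arbitrary game size) applies the σ-uniform-cube perturbation of Definition 23 and checks the elementary fact about bilinear payoff forms that drives the bound in the proof of Theorem 6.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, sigma = 8, 8, 0.05
A = rng.random((n, m))
# sigma-uniform-cube perturbation (Definition 23): every payoff moves
# independently and uniformly within [A_ij - sigma, A_ij + sigma]
A_tilde = A + rng.uniform(-sigma, sigma, A.shape)
x, y = rng.dirichlet(np.ones(n)), rng.dirichlet(np.ones(m))
# the expected payoff of any profile is a convex combination of entries,
# so it changes by at most sigma under the perturbation
print(abs(x @ A @ y - x @ A_tilde @ y) <= sigma)  # True
```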
Chapter 3

State of the Art: LH, PNS and MIP Algorithms

This chapter focuses on the state of the art of exact Nash equilibrium computation for bimatrix games. We present in depth two known algorithms; they are the center around which we developed our work on Local Search. It is beyond the scope of this work to present all algorithms designed for this purpose. In Sections 3.1-3.5 we begin with the most cited one, the Lemke-Howson algorithm, based on a linear complementarity problem. Then in Section 3.6 we present the more recent Porter-Nudelman-Shoham algorithm, which performs very well on randomly generated games; it implements a support enumeration, ordered by some heuristics and filtered by domination. Section 3.7 gives a brief description of the MIP algorithm, a mixed-integer linear formulation of Nash equilibrium. Section 3.8 concludes the chapter with some experimental comparisons.
3.1 Linear Complementarity Problem
The best known algorithm for two-player games is the Lemke-Howson algorithm (LH). In this exposition we follow very closely the approach of [35]. The problem of finding a NE can be expressed as a mixed Linear Complementarity Problem (LCP); "mixed" refers to the fact that some variables are not constrained to be nonnegative. LH is used to solve this formulation. We refer only to nondegenerate games; the extension to degenerate ones is addressed in Section 3.4. The LCP has no objective function, and thus it is a feasibility problem.
x^T (1u − Ay) = 0      (3.1)
y^T (1v − B^T x) = 0   (3.2)
Ay ≤ 1u                (3.3)
x^T B ≤ 1v             (3.4)
x ≥ 0                  (3.5)
y ≥ 0                  (3.6)
x^T 1 = 1              (3.7)
y^T 1 = 1              (3.8)
Scalars u and v are variables which take the values of the expected utilities at a solution. Constraint (3.3) states that for every pure strategy of the row player, his expected utility is at most u, given the mixed strategy of the column player; Constraint (3.4) is analogous. However, these alone are not enough to describe a NE. Constraints (3.1) and (3.2) are the only nonlinear ones and are called complementarity conditions. These require that whenever an action is played with positive probability (i.e., whenever an action is in the support of a given player's mixed strategy), then the corresponding expected utility is u for the row player and v for the column player. Thus, the complementarity conditions capture the fact that, at equilibrium, all pure strategies played with positive probability must yield the same utility, while all pure strategies leading to lower expected utilities are not played. Finally, Constraints (3.5)-(3.8) ensure that the probabilities are nonnegative and that they sum to one. There are several algorithms designed to solve generic LCPs, but whenever the LCP represents the conditions of a NE, LH is the most commonly used method.
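As a sanity check of the formulation, the snippet below (a minimal sketch assuming numpy; names are ours) tests Constraints (3.1)-(3.8) for a candidate profile, with the variables u and v set to the best-response utilities so that (3.3)-(3.4) are tight.

```python
import numpy as np

def is_nash_lcp(A, B, x, y, tol=1e-9):
    u = (A @ y).max()   # values of the variables u, v at a solution:
    v = (x @ B).max()   # the best-response utilities make (3.3)-(3.4) tight
    return (abs(x @ (u - A @ y)) < tol          # (3.1): support holds best responses only
            and abs((v - x @ B) @ y) < tol      # (3.2): same for the column player
            and np.all(x >= -tol) and np.all(y >= -tol)             # (3.5)-(3.6)
            and abs(x.sum() - 1) < tol and abs(y.sum() - 1) < tol)  # (3.7)-(3.8)

A = np.array([[1.0, 0.0], [0.0, 1.0]]); B = 1.0 - A   # matching pennies
print(is_nash_lcp(A, B, np.array([0.5, 0.5]), np.array([0.5, 0.5])))  # True
print(is_nash_lcp(A, B, np.array([1.0, 0.0]), np.array([0.5, 0.5])))  # False
```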
3.2 Geometrical View: Polyhedra and Labels
We briefly recall some notions of convex geometry. An affine combination of points z_1, . . . , z_k in a Euclidean space is of the form z^T λ, where z = (z_1, . . . , z_k) and λ = (λ_1, . . . , λ_k) are reals with λ^T 1 = 1. It is called a convex combination if λ ≥ 0. A set is called convex if it contains every convex combination of its points. A polyhedron P ⊆ R^d is a set {z ∈ R^d | Mz ≤ q}, for some matrix M and vector q. P is always convex. It is called full-dimensional if it has dimension d, and it is called a polytope if it is bounded. A face of P is a set {z ∈ P | c^T z = q_0} for some c ∈ R^d, q_0 ∈ R such that the inequality c^T z ≤ q_0 holds for all z ∈ P. A vertex of P is the unique element of a 0-dimensional face of P. An edge of P is a one-dimensional face of P. A facet of a d-dimensional polyhedron P is a face of dimension d − 1. It can be shown that any nonempty face F of P can be obtained by turning some of the inequalities defining P into equalities; these are called binding inequalities. That is, F = {z ∈ P | c_i z = q_i, i ∈ I}, where c_i z ≤ q_i, i ∈ I, are some of the rows of Mz ≤ q.
The set of points satisfying Constraints (3.3)-(3.6) of the LCP, that is, without the complementarity conditions, are those belonging to the cartesian product H1 × H2 of the best response polyhedra

H1 = {(x, v) ∈ ∆_n × R | x^T B ≤ 1v},   H2 = {(y, u) ∈ ∆_m × R | Ay ≤ 1u}

Thus, we should look for a point in H1 × H2 for which the complementarity conditions hold. Actually, we do not have to explore continuous spaces: we can perform this search in a combinatorial space, as shown below. Without loss of generality, take [n] ∩ [m] = ∅, i.e., disjoint action sets. For any bimatrix game (A, B), the strategies x and y are labeled through a function L defined as

L(x) = {i ∈ [n] | x_i = 0} ∪ {j ∈ [m] | x^T B e_j ≥ x^T B e_k, ∀k ∈ [m]}
L(y) = {j ∈ [m] | y_j = 0} ∪ {i ∈ [n] | e_i^T A y ≥ e_k^T A y, ∀k ∈ [n]}

A strategy profile (x, y) is called completely labeled when L(x) ∪ L(y) = [n] ∪ [m].

Theorem 8 A mixed strategy profile (x, y) is a NE of a game (A, B) if and only if it is completely labeled.

Proof. On the one hand, Theorem 8 simply follows from the fact that at equilibrium, any pure strategy of each player is either a best response or played with probability zero. On the other hand, the condition L(x) ∪ L(y) = [n] ∪ [m] ensures that only best response strategies are played with nonzero probability.
Referring to Definition 16, for a nondegenerate game any point x has at most n labels, and any point y has at most m labels.

Theorem 9 In an n × m nondegenerate game (A, B), only finitely many points have n labels and only finitely many points have m labels.

See [35] for the proof. So we have obtained a finite set of discrete points on which we must verify the complementarity conditions. We are going to define a graph over this discrete structure. Let G1 be the graph whose vertices are those points x
that have n labels, with an additional vertex 0 ∈ R^n that has all labels i ∈ [n]. Any pair of vertices x and x′ is joined by an edge if they differ in one label, that is, if they have n − 1 labels in common. Similarly, G2 is built for the column player. The product graph G1 × G2 has vertices (x, y) where x is a vertex of G1 and y is a vertex of G2. Its edges are given by {x} × {y, y′} for vertices x of G1 and edges {y, y′} of G2, or by {x, x′} × {y} for edges {x, x′} of G1 and vertices y of G2. LH will be defined over these graphs.
Let k ∈ [n] ∪ [m], and call a vertex pair (x, y) ∈ G1 × G2 k-almost completely labeled if k is neither a label of x nor of y, and any l ∈ [n] ∪ [m] − {k} is either a label of x or of y. Since two adjacent vertices x, x′ ∈ G1 have n − 1 labels in common, the edge {x, x′} × {y} ∈ G1 × G2 is also called k-almost completely labeled if y has the remaining m labels except k. The same applies to edges {x} × {y, y′} ∈ G1 × G2. The point (0, 0) is completely labeled and is called the artificial equilibrium.
The above definitions can be illustrated using the following example, taken from [34]. Let (A, B) be a 3 × 3 bimatrix game with

A = [ 0  3  0 ;  1  0  1 ;  −3  4  5 ],   B = [ 0  1  2 ;  2  0  3 ;  2  1  0 ]

The mixed strategy space ∆_n of the row player is a 2-simplex, and so is the mixed strategy space ∆_m of the column player. Figure 3.1 shows the division of the simplices ∆_n and ∆_m into best response regions by hyperplanes; this division is obtained by projecting H1 and H2 onto ∆_n and ∆_m, respectively. Each hyperplane is generated by the equality of two best response conditions:

3y_2 = y_1 + y_3     (3.9)
This is the hyperplane where x_1 and x_2 have the same expected utility. The intersections between a simplex and the hyperplanes define the boundaries of the best response regions. For example, the following constraints delimit the best response region labeled with 1 in Figure 3.1:

3y_2 ≥ y_1 + y_3                (3.10)
3y_2 ≥ −3y_1 + 4y_2 + 5y_3      (3.11)
1^T y = 1                       (3.12)

The hyperplanes are the sets of points which have at least 3 − 1 = 2 labels; those points do not have a unique best response. The vertices are emphasized by dots and are exactly those points that have 3 labels. A boundary 1-face carries the label of the pure strategy that is played with zero probability. The vertices of the graphs G1 and G2 are exactly the vertices just defined on the simplices, plus the ones corresponding to the origins.
Figure 3.1: The division of X = ∆_n and Y = ∆_m into best response regions for the game in the last example. The numbers represent the labels.

Theorem 10 (Lemke and Howson, 1964) Let (A, B) be a nondegenerate bimatrix game and k be a label in [n] ∪ [m]. Then the set of all k-almost completely labeled vertices consists of disjoint paths and cycles in G1 × G2. The endpoints of the paths are the equilibria of the game and the artificial equilibrium.

Proof [34]. Let M(k) = {(x, y) ∈ G1 × G2 | L(x) ∪ L(y) ⊇ [n] ∪ [m] − k}. So M(k) defines a subgraph of G1 × G2. Let (x, y) ∈ M(k): x and y have together either m + n or m + n − 1 labels. In the former case, (x, y) is either an equilibrium or the artificial equilibrium. If (x, y) is completely labeled, then the vertex (x, y) is incident to a unique edge in the subgraph M(k), namely {x} × {y, y′} if k ∈ L(y) or {x, x′} × {y} if k ∈ L(x). In the latter case, one has L(x) ∪ L(y) = [n] ∪ [m] − {k}, so there must be a duplicate label in L(x) ∩ L(y), because of the hypothesis of nondegeneracy. But this means that (x, y) is incident to both edges {x} × {y, y′} and {x, x′} × {y}. Thus, M(k) is a subgraph where all vertices are incident to one or two edges. Hence, the subgraph consists of paths and cycles, and the endpoints of the paths are the equilibria and the artificial equilibrium.
This theorem provides another proof of the existence of a NE in nondegenerate bimatrix games, alternative to the original one by Nash. And, more importantly, it is constructive: we just have to design an algorithm that starts from the artificial equilibrium in G1 × G2, chooses a label to drop initially, and obtains a path to an equilibrium.
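A small sketch of the labeling machinery (ours; a 2 × 2 coordination game is used instead of the 3 × 3 example above so that the equilibrium is evident) computes L(x) and L(y) and tests the completely-labeled condition of Theorem 8.

```python
import numpy as np

def labels_x(B, x, n, tol=1e-9):
    # L(x): row actions played with probability zero, plus the column
    # actions (labels n..n+m-1) that are best responses against x
    xb = x @ B
    return ({i for i in range(n) if x[i] < tol}
            | {n + j for j in range(len(xb)) if xb[j] >= xb.max() - tol})

def labels_y(A, y, n, tol=1e-9):
    ay = A @ y
    return ({n + j for j in range(len(y)) if y[j] < tol}
            | {i for i in range(n) if ay[i] >= ay.max() - tol})

A = np.array([[2.0, 0.0], [0.0, 1.0]])   # coordination game, not the text's example
B = A.copy()
n, m = A.shape
x = np.array([1/3, 2/3]); y = np.array([1/3, 2/3])   # its fully mixed NE
print(labels_x(B, x, n) | labels_y(A, y, n) == set(range(n + m)))  # True
```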
3.3 Complementarity Pivoting and the LH Algorithm
The graph structure of the best response polyhedra H1 and H2 with their vertices is identical to that of G1 and G2, except for the m unbounded edges of H1 and the n unbounded edges of H2 that connect to "infinity" rather than to the additional vertex (0, 0) ∈ G1 × G2.
Constraints (3.3)-(3.6) of the LCP can be simplified by eliminating u and v from the problem. This works only if the payoff matrices are positive, so that u and v are positive as well; but this holds without loss of generality, because of Theorem 2. We now define the one-to-one correspondences and their inverses

x′ = x/v,   y′ = y/u,   x = x′v,   y = y′u     (3.13)
and two new polyhedra:

P1 = {x′ ∈ R^n | x′ ≥ 0, (x′)^T B ≤ 1},   P2 = {y′ ∈ R^m | y′ ≥ 0, Ay′ ≤ 1}

So the correspondences (3.13) define two bijections H1 → P1 − {0} and H2 → P2 − {0}. They are not linear, but they preserve the face incidences, since a binding inequality in H1 corresponds to a binding inequality in P1 and vice versa. In particular, vertices have the same labels. See [35] for details.
Moving along polyhedron vertices has an algebraic interpretation that leads to a simple implementation called pivoting. From the polyhedra P1 and P2 we build a system of equalities with nonnegative variables:

Ay′ + r = 1,   (x′)^T B + s = 1     (3.14)

with x′, y′, r, s ≥ 0, where r and s are vectors of slack variables. Both systems of (3.14) are of the form

Cz = q     (3.15)

Matrix C has full rank (due to nondegeneracy), so q belongs to the space spanned by the columns C_i of C. A basis β is given by a basis of the column space C_β = {C_i | i ∈ β}, such that the square matrix C_β is invertible. Let C_N = {C_i | i ∉ β}. The corresponding basic solution is the unique vector z_β with C_β z_β = q, where the variables z_i, i ∈ β, are called basic variables, and z_i = 0, i ∉ β, are called nonbasic variables, indicated with z_N.
If this solution also fulfills z ≥ 0, then the basis β is called feasible. If β is a basis for (3.15), then the corresponding basic solution can be read directly from C_β^{-1} Cz = I z_β + C_β^{-1} C_N z_N = C_β^{-1} q, called the tableau, since the columns of C_β^{-1} C for the basic variables form the identity matrix. The tableau is equivalent to the system

z_β = C_β^{-1} q − C_β^{-1} C_N z_N     (3.16)
Pivoting is a change of the basis where a nonbasic variable z_j, j ∉ β, enters and a basic variable z_i, i ∈ β, leaves the set of basic variables. Pivoting is possible if and only if the coefficient of z_j in the i-th row of the current tableau is nonzero, and is performed by solving the i-th equation for z_j and then replacing z_j by the resulting expression in each of the remaining equations. Mechanically, a pivoting operation is accomplished by the simple matrix computations listed in Algorithm 1.

Algorithm 1: Pivoting
1 Let a tableau be given in the form T = (I N b), where I corresponds to the columns of the basic variables, N to the nonbasic ones, and b is the vector of constant terms. Let l and e be the row and the column corresponding to the leaving and the entering variable, respectively. Let p = T_le be the pivot value.
2 T_lh = T_lh / p, for every column h.
3 T_kh = T_kh − T_lh T_ke, for every row k ≠ l and every column h.
4 In this way T_ke = 0 for every row k ≠ l, and T_le = 1. The T_kh are the new elements of the tableau.
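A direct transcription of Algorithm 1 in Python/numpy (our sketch; the example tableau is arbitrary):

```python
import numpy as np

def pivot(T, l, e):
    """Algorithm 1 on a tableau whose last column is b: the variable of
    column e enters the basis, the basic variable of row l leaves."""
    T = T.astype(float).copy()
    T[l] /= T[l, e]                      # step 2: scale the pivot row
    for k in range(T.shape[0]):          # step 3: clear column e in the other rows
        if k != l:
            T[k] -= T[k, e] * T[l]
    return T                             # column e is now a unit vector

# the system x1 + 2*x2 = 3, 2*x1 + x2 = 3 written as (I N b), slacks basic
T = np.array([[1.0, 0.0, 1.0, 2.0, 3.0],
              [0.0, 1.0, 2.0, 1.0, 3.0]])
print(pivot(T, 0, 2))   # x1 (column 2) enters, the first slack leaves
```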
For a given entering variable z_j, the leaving variable is chosen to preserve feasibility of the basis. Let the components of C_β^{-1} q be b_i and those of C_β^{-1} C_j be c_ij, for i ∈ β. Then the largest value of z_j such that z_β stays nonnegative is given by

min{ b_i / c_ij | i ∈ β, c_ij > 0 }     (3.17)

because we want b_i − c_ij z_j ≥ 0 for all i, thus z_j ≤ b_i / c_ij for all i with c_ij > 0, and we take the minimum. This is called the minimum ratio test. Except in degenerate cases, the minimum is unique and determines the leaving variable. After pivoting, the new basis is β ∪ {j} − {i}.
Up to now, the method is similar to the Simplex algorithm. The difference comes from the choice of the entering variable. We do not have to improve an objective function; instead, we need to maintain the complementarity conditions between a pivot step and the next one. In the system (3.14), one looks for a complementary solution, expressed in terms of the slack variables by
(x′)^T r = 0,   (y′)^T s = 0     (3.18)

because, together with (3.13), this implies the complementarity conditions (3.1) and (3.2), so that (x, y) is a NE. In a basic solution to (3.14), every nonbasic variable has value zero and represents a binding inequality, that is, a facet of the polytope. Hence, each basis defines a vertex, which is labeled with the indices of the nonbasic variables. The variables of the system come in complementary pairs (x_i, r_i) for the indices i ∈ [n] and (y_j, s_j) for j ∈ [m].
Recall that LH follows a path of vertices that have all labels in [n] ∪ [m] − {k}. Thus a k-almost completely labeled vertex is a basis that has exactly one basic variable from each complementary pair, except for one pair whose variables are both basic. Correspondingly, there is another pair of complementary variables that are both nonbasic, representing the duplicate label. One of them is chosen as the entering variable, depending on the direction of the computed path; the two possibilities represent the two k-almost completely labeled edges incident to that vertex. The algorithm is started with all components of r and s as basic variables and nonbasic variables (x′, y′) = (0, 0). LH is shown as pseudocode in Algorithm 2.

Algorithm 2: LH
1 Let (A, B) be a bimatrix game such that A, B > 0. Construct the mixed LCP with Constraint (3.14). The initial solution is (x′, y′) = (0, 0) and the initial basic variables are {s_j, ∀j ∈ [m]} ∪ {r_i, ∀i ∈ [n]}.
2 Choose randomly the first entering variable z_j = k from the set {x′_i | i ∈ [n]} ∪ {y′_j | j ∈ [m]}.
3 Choose the leaving variable z_i through the minimum ratio test.
4 Pivot with z_j as entering variable and z_i as leaving variable. Update the basis.
5 If the leaving variable z_i is not equal to the first entering variable k or to its complementary variable, then the new entering variable z_j is the complement of the leaving z_i; go to 3.
6 Normalize (x′, y′) by (3.13) and return the equilibrium (x, y).
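Below is a compact transcription of Algorithm 2 for nondegenerate games with positive payoffs (our sketch, assuming numpy; it has no lexicographic tie-breaking, so degenerate inputs may cycle). The id scheme is ours: pair (x′_i, r_i) shares id i, pair (s_j, y′_j) shares id n+j; ids double as column indices in both tableaus, and the complement of a variable is the same id in the other tableau.

```python
import numpy as np

def lemke_howson(A, B, k=0):
    n, m = A.shape
    T1 = np.hstack([B.T, np.eye(m), np.ones((m, 1))])  # x'^T B + s = 1
    basis1 = list(range(n, n + m))                     # slacks s_j basic
    T2 = np.hstack([np.eye(n), A, np.ones((n, 1))])    # A y' + r = 1
    basis2 = list(range(n))                            # slacks r_i basic
    tabs = [(T1, basis1), (T2, basis2)]
    side, entering = (0 if k < n else 1), k            # drop label k first
    while True:
        T, basis = tabs[side]
        col = T[:, entering]
        ratios = [T[i, -1] / col[i] if col[i] > 1e-12 else np.inf
                  for i in range(T.shape[0])]          # minimum ratio test (3.17)
        row = int(np.argmin(ratios))
        T[row] /= T[row, entering]                     # pivot step (Algorithm 1)
        for i in range(T.shape[0]):
            if i != row:
                T[i] -= T[i, entering] * T[row]
        leaving, basis[row] = basis[row], entering
        if leaving == k:                               # label k regained: NE found
            break
        entering, side = leaving, 1 - side             # complement enters other tableau
    x, y = np.zeros(n), np.zeros(m)
    for r, v in enumerate(basis1):
        if v < n: x[v] = T1[r, -1]
    for r, v in enumerate(basis2):
        if v >= n: y[v - n] = T2[r, -1]
    return x / x.sum(), y / y.sum()                    # normalization (3.13)

A = np.array([[3.0, 1.0], [1.0, 2.0]])
print(lemke_howson(A, A.copy()))   # ((1, 0), (1, 0)) for this coordination game
```

Choosing a different initial label k may lead to a different equilibrium, which is exactly the nondeterminism of the first move discussed next.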
Let us consider some properties of LH. First, it is complete: it is guaranteed to find a NE. Second, once the label to be dropped is chosen initially, the path through almost completely labeled pairs to an equilibrium is unique. So the algorithm is nondeterministic, but all the nondeterminism is in its first move. Finally, LH also provides an elementary way to compute more than one NE: we can restart the algorithm choosing different first variables to enter the basis.
However, the algorithm has some limitations. In general, we are not able to find all the Nash equilibria of a game. As we have seen from Theorem 10, LH can be thought of as exploring a graph of all completely and almost completely labeled pairs. But this graph can have disconnected components, and the algorithm is only able to find the equilibria in the connected subgraph that contains the artificial equilibrium. With respect to the search for a single equilibrium, our actual aim, the indeterminacy of the first move leaves some open questions. The algorithm provides no guidance on how to make a good first choice: are there choices that lead to a relatively short path to the equilibrium? We return to this question in Section 3.5 and in Chapter 5.
3.4 Degenerate Games and Lexicographical Order
The path computed by LH is unique only for nondegenerate games. But like other pivoting methods, LH can be extended by lexicographic perturbation. It is worth noting that this kind of perturbation has no relationship with smoothed complexity, just as the perturbation used below has nothing to do with equilibria. The lexicographic method extends the minimum ratio test in such a way that the leaving variable is always unique; if we do not apply such a rule, we are not able to avoid cycles in LH paths. The method simulates an infinitesimal perturbation of the right-hand side of the system (3.15). For any ε > 0 consider the k-dimensional system

Cz = q + (ε¹, . . . , ε^k)^T     (3.19)

Let β be a basis for this system with basic solution

z_β = C_β^{-1} q + C_β^{-1} (ε¹, . . . , ε^k)^T = b + b(ε)

and z_j = 0, ∀j ∉ β. Now, z_β is positive for all sufficiently small ε if and only if all rows of the matrix (b  C_β^{-1}) are lexico-positive, that is, the first nonzero component of each row is positive. This holds in particular for b > 0, when β is a nondegenerate basis for the unperturbed system. This yields an ordering on the basic variables under which the choice of the leaving variable is always unique. The invariant that all computed bases are lexico-positive is preserved by pivoting with the lexico-minimum ratio test

min{ b_i/c_ij + b_i(ε)/c_ij | i ∈ β, c_ij > 0 }     (3.20)
39
We have a very simple way to implement the lexico-minimum ratio test, without actually perturbing the system. Indeed, we have just to check the rows of Cβ−1 , and it is exactly what we have in our tableau. Thus we need to perform lexicographical comparisons between two rows, resulted in parity from minimum ratio test. It can be efficiently done comparing element by element of two rows but stopping as just two elements are different; the reason still comes from infinitesimal value of .
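A sketch of that comparison rule (ours, assuming the tableau stores C_β^{-1} among its columns and the right-hand side b in the last column):

```python
import numpy as np

def lexico_min_ratio_row(T, e, tol=1e-12):
    """Leaving-row choice with the lexicographic tie-breaking of (3.20)
    for entering column e."""
    rows = [i for i in range(T.shape[0]) if T[i, e] > tol]
    def key(i):
        # (b_i, i-th tableau row) / c_ie; tuple comparison is lexicographic,
        # the first differing component decides, as with an infinitesimal eps
        return tuple(np.concatenate(([T[i, -1]], T[i, :-1])) / T[i, e])
    return min(rows, key=key)

T = np.array([[1.0, 0.0, 1.0, 2.0],
              [0.0, 1.0, 1.0, 2.0]])
print(lexico_min_ratio_row(T, 2))  # ratios tie at 2; row 1 is lexico-smaller
```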
3.5 Complexity and Exponentially-Hard Games
One can ask whether there is a LH path which is better than all the others with regard to length; we would like to find a polynomial-sized one. If we knew which path that is, we could always choose it, and LH would be a polynomial algorithm for Nash. But this hypothesis does not hold. Indeed, the authors of [29], [30] built games, called hard-to-solve, for which every LH path is exponentially long. It follows that LH has exponential worst-case complexity, because there exist games that LH always solves in exponential time.
Here we do not give a detailed exposition of how to build hard-to-solve games (see Appendix A of [29]); we just point out a few issues about the polytope construction. The n × n games are derived from dual cyclic polytopes. This is commonly done by taking the convex hull of 2n points µ(t_i) on the moment curve µ : t → (t, t², . . . , t^n)^T for 1 ≤ i ≤ 2n. There is another way to obtain such a polytope, namely through the trigonometric moment curve τ : t → (cos t, sin t, cos 2t, sin 2t, . . . , cos nt, sin nt). We will discuss this when we consider the numerical stability of hard-to-solve game generation in Section 5.1.
Hard-to-solve games have just one NE, which is fully mixed. This means that they can be simply solved through LP by putting all actions in the supports; a suitable support enumeration method, like PNS (Section 3.6) with reversed order of enumeration, leads quickly to a NE. Nevertheless, the equilibrium supports can be hidden in the more sophisticated hard-to-solve games developed in [30]. Anyway, for our purposes, the version of those games with a unique and fully mixed equilibrium is enough.
And what about smoothed complexity? A relevant progress on smoothed complexity has recently shown that at least one version of the simplex algorithm for LP is polynomial [32]. The subsequent question is about LH, which is clearly related to the pivoting method used by the simplex algorithm. But as we have already observed with Theorem 7, the following result holds [7].

Theorem 11 (Smoothed Complexity of Lemke-Howson) It is unlikely that the Lemke-Howson algorithm has polynomial smoothed complexity in n and 1/σ, unless PPAD ⊆ RP.
In our experimental tests (Section 5.3) we will ask whether hard-to-solve games are stable under perturbation, i.e., whether the perturbed games still show exponential behavior. If all the known worst-case instances of a generic problem are not stable, then one could ask whether its smoothed complexity under these perturbations is low, or whether there are other bad instances that are stable. Because of Theorem 11, if hard-to-solve games are not stable, there must exist other, stable worst-case games (unless PPAD ⊆ RP).
3.6 The PNS Algorithm
LH is not the only algorithm available to compute a NE. A more recent approach is the Porter-Nudelman-Shoham (PNS) algorithm, which is based on support enumeration. A deep analysis of this algorithm is necessary, because it is the basis of our Local Search extension (Chapter 4).
The basic idea behind PNS [26] is that while the general problem of computing a NE is an LCP, deciding whether there exists a NE with a given support for each player is a relatively simple LP without objective function, i.e., a feasibility problem. Given as input the supports (Sx, Sy), if the problem is feasible, the solution is a strategy profile (x, y) that is a NE.

Ay = u                         (3.21)
x^T B = v                      (3.22)
u_i = u*       ∀i ∈ Sx         (3.23)
v_i = v*       ∀i ∈ Sy         (3.24)
u_i ≤ u*       ∀i ∈ [n] − Sx   (3.25)
v_i ≤ v*       ∀i ∈ [m] − Sy   (3.26)
x_i ≥ 0        ∀i ∈ Sx         (3.27)
y_i ≥ 0        ∀i ∈ Sy         (3.28)
x_i = 0        ∀i ∈ [n] − Sx   (3.29)
y_i = 0        ∀i ∈ [m] − Sy   (3.30)
x^T 1 = 1                      (3.31)
y^T 1 = 1                      (3.32)
X − Y denotes the difference between set X and set Y. Constraints (3.21)-(3.26) require that each player be indifferent among all actions within his support and not strictly prefer an action outside of it. Note that u* and v* are the scalar values of the expected utilities. These constraints imply that no player can deviate to a pure strategy that improves his expected utility, which is exactly the condition for the strategy profile to be a NE. Constraints (3.27)-(3.30) ensure that Sx and Sy can be interpreted as the players' supports: the pure strategies in Sx and Sy must be played with nonnegative probabilities, and the pure strategies outside the supports must be played with zero probability. Finally, Constraints (3.31)-(3.32) ensure the probabilities sum to one.
Thanks to this LP, the problem of computing a NE can be reduced to searching with a heuristic in the space of supports. Note that this space grows as 4^n [23], which means PNS has an exponential worst-case complexity. The way PNS searches this space is a support enumeration method. To order the search space, PNS privileges supports that are small and balanced. For efficiency, PNS separately instantiates each player's support, pruning the search space whenever an action is conditionally strictly dominated.

Definition 25 (Conditionally Strictly Dominated Action (1)) For the row (column) player, an action i ∈ [n] ([m]) is conditionally strictly dominated (c.s.d.), given a set of available actions Ry ⊆ [m] (Rx ⊆ [n]) for the column (row) player, if ∃i′ ∈ [n] ([m]), ∀j ∈ Ry (Rx): A_ij < A_i′j (B_ji < B_ji′).

If the inequality holds for all actions in Ry (Rx) (i.e., for all pure strategies over the actions in Ry (Rx)), then it still holds for all mixed strategies over Ry (Rx). The converse is also true, because a pure strategy is a trivial case of a mixed strategy. Therefore, the following definition is equivalent to the previous one.

Definition 26 (Conditionally Strictly Dominated Action (2)) For the row (column) player, an action i ∈ [n] ([m]) is conditionally strictly dominated (c.s.d.), given a set of available actions Ry ⊆ [m] (Rx ⊆ [n]) for the column (row) player, if ∃i′ ∈ [n] ([m]) such that, for every mixed strategy y (x) over Ry (Rx): e_i Ay < e_i′ Ay (xBe_i < xBe_i′).

In a NE, no action played with positive probability can be conditionally strictly dominated given the actions in the support of the opponent's strategy. Checking whether an action is conditionally strictly dominated is equivalent to checking whether the action is strictly dominated by a pure strategy in a reduced version of the original game (Definition 13 in Section 2.1).
Algorithm 3 enumerates all the possible support sizes at Step 1, selects a support for the row player at Step 2 and for the column player at Step 5, prunes the space of actions of the column player by conditional strict dominance at Step 3, checks conditional strict dominance on the row player's support at Steps 4 and 6, and, finally, checks the feasibility problem at Step 7. If the problem is feasible, PNS returns (x, y), i.e., a mixed strategy profile that is a NE.
The preference for small supports increases the advantage of checking for conditional dominance.
Algorithm 3: PNS
1 for all support sizes z = (z_x, z_y), sorted in increasing order of, first, |z_x − z_y| and, second, (z_x + z_y) do
2   for all Sx ⊆ [n] such that |Sx| = z_x do
3     Ry ← {a ∈ [m] not c.s.d., given Sx}
4     if ∄a ∈ Sx c.s.d., given Ry then
5       for all Sy ⊆ Ry such that |Sy| = z_y do
6         if ∄a ∈ Sx c.s.d., given Sy then
7           if (3.21)-(3.32) are satisfiable for (Sx, Sy) then
8             return (x, y)
For instance, if the row player takes only two actions in his support, often many of the column player's actions are pruned, because only two inequalities must hold for one action to conditionally dominate another. PNS is complete, i.e., it always finds a solution, because it considers all support sizes and because it prunes only conditionally strictly dominated actions, which cannot be part of any Nash equilibrium.
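The feasibility test of Step 7, i.e., program (3.21)-(3.32), can be prototyped with an off-the-shelf LP solver. The sketch below uses scipy.optimize.linprog (our choice, not the thesis's CPLEX/AMPL setup), with variable vector z = (x, y, u*, v*).

```python
import numpy as np
from scipy.optimize import linprog

def supports_feasible(A, B, Sx, Sy):
    """Return a NE (x, y) with supports (Sx, Sy) if one exists, else None."""
    n, m = A.shape
    N = n + m + 2                             # z = [x, y, u*, v*]
    A_eq, b_eq, A_ub, b_ub = [], [], [], []
    for i in range(n):                        # (Ay)_i = u* or <= u*: (3.23)/(3.25)
        row = np.zeros(N); row[n:n + m] = A[i]; row[n + m] = -1.0
        (A_eq if i in Sx else A_ub).append(row)
        (b_eq if i in Sx else b_ub).append(0.0)
    for j in range(m):                        # (x^T B)_j = v* or <= v*: (3.24)/(3.26)
        row = np.zeros(N); row[:n] = B[:, j]; row[n + m + 1] = -1.0
        (A_eq if j in Sy else A_ub).append(row)
        (b_eq if j in Sy else b_ub).append(0.0)
    row = np.zeros(N); row[:n] = 1.0; A_eq.append(row); b_eq.append(1.0)        # (3.31)
    row = np.zeros(N); row[n:n + m] = 1.0; A_eq.append(row); b_eq.append(1.0)   # (3.32)
    bounds = ([(0, None) if i in Sx else (0, 0) for i in range(n)]    # (3.27)/(3.29)
              + [(0, None) if j in Sy else (0, 0) for j in range(m)]  # (3.28)/(3.30)
              + [(None, None), (None, None)])                         # u*, v* free
    res = linprog(np.zeros(N),                # no objective: feasibility only
                  A_ub=np.array(A_ub) if A_ub else None, b_ub=b_ub if b_ub else None,
                  A_eq=np.array(A_eq), b_eq=b_eq, bounds=bounds)
    return (res.x[:n], res.x[n:n + m]) if res.status == 0 else None

A = np.array([[1.0, 0.0], [0.0, 1.0]]); B = 1.0 - A   # matching pennies
print(supports_feasible(A, B, {0, 1}, {0, 1}))        # the fully mixed NE
print(supports_feasible(A, B, {0}, {0}))              # None: no pure NE
```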
3.7 The MIP Algorithm
Finally, we briefly introduce another recent algorithm by Sandholm, Gilpin and Conitzer, called MIP [28]. We use this algorithm as a term of comparison in our experimental analysis. It is based on a Mixed-Integer linear Programming (MIP) formulation of the NE finding problem, where a mixed-integer linear program is a linear program with some integer variables. Substantially, it encodes through binary variables whether or not an action is in the support. The studies in [28] also provide several methods to speed up the resolution in CPLEX [9]; here we just report the best formulation found there.
A new concept used in the MIP formulation is that of regret: the regret of a pure strategy is the difference in utility for a player between playing that pure strategy and playing a best response. In any equilibrium, every pure strategy is either played with probability 0 or has 0 regret. Moreover, any profile of mixed strategies in which every pure strategy is either played with probability 0 or has 0 regret is an equilibrium. This is another way of saying that each player has to be indifferent among the actions in his support and must prefer the actions in the support to the other available actions.
Furthermore, we define the binary variables s_i, t_j ∈ {0, 1}, which represent the supports: they are set to 1 if the corresponding action is in the support (and so its regret is 0), and to 0 otherwise. Finally, the constants U*_x and U*_y are the maximum differences between two utilities in the game for the row and the column player. Formally,

U*_x = max_{j ∈ [n], k ∈ [m]} A_jk − min_{j ∈ [n], k ∈ [m]} A_jk
U*_y = max_{j ∈ [n], k ∈ [m]} B_jk − min_{j ∈ [n], k ∈ [m]} B_jk

The formulation of the MIP follows.
Ay = u                       (3.33)
x^T B = v                    (3.34)
u ≤ 1u*                      (3.35)
v ≤ 1v*                      (3.36)
r_x = 1u* − u                (3.37)
r_y = 1v* − v                (3.38)
r_x ≤ U*_x (1 − s)           (3.39)
r_y ≤ U*_y (1 − t)           (3.40)
x ≤ s                        (3.41)
y ≤ t                        (3.42)
x, y, u, v, r_x, r_y ≥ 0     (3.43)
x^T 1 = 1                    (3.44)
y^T 1 = 1                    (3.45)
Constraints (3.33)-(3.38) define the expected utilities and the regrets of the strategies; u* and v* are scalar variables. Constraints (3.39)-(3.40) ensure that the regret of a strategy is 0 when s or t is 1, and otherwise express that the regret cannot be more than U*_x or U*_y. Constraints (3.41)-(3.42) ensure that s and t can be 0 only when x and y are, respectively, and that if x and y are positive, s and t cannot be 0. These assumptions come without loss of generality, because every game has an equivalent formulation in which utilities are nonnegative (Theorem 2).
MIP can gain an evident speed improvement when the problem also includes an objective function. The best one is the maximization of the social welfare, that is, the sum of the expected utilities:

max u* + v*
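A hedged sketch of this formulation with the social-welfare objective, written with the PuLP modeling library (our substitution for the CPLEX setup of [28]; variable names are ours):

```python
import numpy as np
import pulp

def mip_nash(A, B):
    n, m = A.shape
    Ux, Uy = A.max() - A.min(), B.max() - B.min()    # the constants U*_x, U*_y
    prob = pulp.LpProblem("nash_mip", pulp.LpMaximize)
    x = [pulp.LpVariable(f"x{i}", 0, 1) for i in range(n)]
    y = [pulp.LpVariable(f"y{j}", 0, 1) for j in range(m)]
    s = [pulp.LpVariable(f"s{i}", cat="Binary") for i in range(n)]
    t = [pulp.LpVariable(f"t{j}", cat="Binary") for j in range(m)]
    us, vs = pulp.LpVariable("ustar"), pulp.LpVariable("vstar")
    prob += us + vs                                  # objective: max social welfare
    prob += pulp.lpSum(x) == 1                       # (3.44)
    prob += pulp.lpSum(y) == 1                       # (3.45)
    for i in range(n):
        ui = pulp.lpSum(A[i][j] * y[j] for j in range(m))   # u_i = (Ay)_i: (3.33)
        prob += ui <= us                                    # (3.35)
        prob += us - ui <= Ux * (1 - s[i])                  # regret bound: (3.37)+(3.39)
        prob += x[i] <= s[i]                                # (3.41)
    for j in range(m):
        vj = pulp.lpSum(B[i][j] * x[i] for i in range(n))   # (3.34)
        prob += vj <= vs                                    # (3.36)
        prob += vs - vj <= Uy * (1 - t[j])                  # (3.38)+(3.40)
        prob += y[j] <= t[j]                                # (3.42)
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return np.array([v.value() for v in x]), np.array([v.value() for v in y])

A = np.array([[2.0, 0.0], [0.0, 1.0]])
print(mip_nash(A, A.copy()))   # the welfare-maximizing NE of this game: (e_1, e_1)
```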
3.8 The Performance
LH, PNS and MIP are all exact algorithms for finding a NE, but of course they have different performance, which has been analyzed in [28]; we cite the results from there. The experiment is based on GAMUT [24], a test suite of game generators designed for testing game-theoretic algorithms. GAMUT randomly generates games from different classes, already studied in the literature, through some parametrization options. The execution of the algorithms stops when the first NE is found or if no equilibrium is found within 10 minutes.

Class                   LH        PNS      MIP
BertrandOligopoly       0.04      0.01     286.54
BidirectionalLEG_CG     0.06      0.01     22.52
BidirectionalLEG_RG     0.05      0.01     2.35
BidirectionalLEG_SG     0.06      0.01     0.13
CovariantGame_Pos       0.06      0.01     0.47
CovariantGame_Rand      376.92    267.81   203.87
CovariantGame_Zero      263.48    0.13     99.91
DispersionGame          0.05      0.01     0.01
GraphicalGame_RG        96.02     0.05     127.96
GraphicalGame_Road      277.80    0.13     151.18
GraphicalGame_SG        133.07    0.10     181.98
GraphicalGame_SW        168.49    0.09     234.56
LocationGame            0.05      0.01     0.45
MinimunEffortGame       0.05      0.01     0.04
PolymatrixGame_CG       72.82     65.13    76.80
PolymatrixGame_RG       76.26     0.01     42.70
PolymatrixGame_Road     1.26      0.05     7.03
PolymatrixGame_SW       145.38    0.13     85.83
RandomGame              172.08    0.16     168.32
TravelersDilemma        0.02      0.01     0.05
UniformLEG_CG           0.05      0.01     0.81
UniformLEG_RG           0.05      0.01     2.60
UniformLEG_SG           0.05      0.01     0.16
WarOfAttrition          4.29      0.01     0.03
OVERALL                 1778.50   333.94   1696.29

Table 3.1: Average time (seconds) to find an equilibrium in 150 × 150 games (10 instances per class) with LH, PNS and MIP. The percentage of time-outs for LH, PNS and MIP is 8.3%, 2.0% and 7.5%, respectively [28].

As Table 3.1 shows, although LH is the best-known algorithm, on this kind of games PNS outperforms it, and MIP as well. The main reason is that, for each class, each game has at least a 50% chance of having a pure NE [26], which PNS can find very quickly thanks to its search
strategy that privileges small supports. The preference for balanced supports also improves performance, because nondegenerate games have only balanced equilibria, and degenerate games are unlikely to be generated by GAMUT.
But PNS is not always better than the others. We cite the games described in [28], which have a unique equilibrium with medium-sized supports. We call these games SGC, from the initials of the authors. SGC's games follow these rules:
• for any positive integer k, the game G_k has actions a_1, . . . , a_{2k−1}, b_1, . . . , b_{2k} for the row player and actions c_1, . . . , c_{2k−1}, d_1, . . . , d_{2k} for the column player
• u(a_i, c_{i+1 (mod 2k−1)}) = (2, 4), u(a_i, c_{i−1 (mod 2k−1)}) = (4, 2)
• u(a_i, c_j) = (3, 3) for j ∉ {i ± 1 (mod 2k−1)}
• u(a_i, d_j) = (2, 0), u(b_i, c_j) = (0, 2), u(b_i, d_i) = (3, 0)
• u(b_i, d_{i+1}) = (0, 3) for odd i, u(b_i, d_{i−1}) = (0, 3) for even i
• u(b_i, d_j) = (0, 0) otherwise.
An example of this kind of game is shown in Figure 3.2.
       c1     c2     c3     d1     d2     d3     d4
a1    3, 3   2, 4   4, 2   2, 0   2, 0   2, 0   2, 0
a2    4, 2   3, 3   2, 4   2, 0   2, 0   2, 0   2, 0
a3    2, 4   4, 2   3, 3   2, 0   2, 0   2, 0   2, 0
b1    0, 2   0, 2   0, 2   3, 0   0, 3   0, 0   0, 0
b2    0, 2   0, 2   0, 2   0, 3   3, 0   0, 0   0, 0
b3    0, 2   0, 2   0, 2   0, 0   0, 0   3, 0   0, 3
b4    0, 2   0, 2   0, 2   0, 0   0, 0   0, 3   3, 0

Figure 3.2: The 7 × 7 SGC game [28].

PNS performs particularly badly on SGC's games: it is slower than both MIP and LH, the latter being the best.
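The rules above are mechanical enough to generate G_k programmatically; the following generator (ours, 0-indexed) reproduces Figure 3.2 for k = 2.

```python
import numpy as np

def sgc_game(k):
    """SGC game G_k of [28]: rows a_0..a_{2k-2}, b_0..b_{2k-1};
    columns c and d laid out the same way."""
    na, nb = 2 * k - 1, 2 * k
    n = na + nb
    A, B = np.zeros((n, n)), np.zeros((n, n))
    for i in range(na):
        for j in range(na):                       # a_i vs c_j
            if j == (i + 1) % na:   A[i, j], B[i, j] = 2, 4
            elif j == (i - 1) % na: A[i, j], B[i, j] = 4, 2
            else:                   A[i, j], B[i, j] = 3, 3
        A[i, na:], B[i, na:] = 2, 0               # a_i vs d_j
    A[na:, :na], B[na:, :na] = 0, 2               # b_i vs c_j
    for i in range(nb):                           # b_i vs d_j
        A[na + i, na + i], B[na + i, na + i] = 3, 0
        partner = i + 1 if i % 2 == 0 else i - 1  # 0-indexed even = odd in the text
        A[na + i, na + partner], B[na + i, na + partner] = 0, 3
    return A, B

A, B = sgc_game(2)   # the 7 x 7 game of Figure 3.2
```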
Chapter 4

Local Search and LS-PNS

Table 3.1 shows that there are some classes of games whose instances are hard to solve with all the previous algorithms. The reason is that those algorithms search for an equilibrium by enumerating all the possible solutions in a static way; the number of solutions rises exponentially in the number of players' actions, and, in the worst case, the algorithms must explore the whole solution space. In PNS the number of supports rises as 4^n, and in LH the number of vertices of a best response polyhedron rises as 2.6^n [23], [35]. Thus, we isolated the most difficult games of the hard classes and designed an algorithm to solve them. The algorithm is based on the combination of a support enumeration method and Local Search. In Section 4.1 we introduce the groundings of Local Search techniques, which are the basis for understanding the algorithm described in Section 4.2. At the end of the chapter, in Section 4.3, experimental results for the algorithm are given.
4.1 Local Search
Local Search techniques are tools commonly employed to address combinatorial optimization problems [1]. Essentially, they are heuristic algorithms that produce solutions which are not necessarily optimal, but which can be found within an acceptable amount of time. In this context a solution is an element of the search space. It is worth noting, however, that for several problems, e.g., SAT, Local Search techniques allow one to handle instances, finding exact solutions, that the fastest non-heuristic search algorithms are not able to manage [18], [27]. For all those practical problems which are difficult to solve, Local Search is an interesting way to find a good solution quickly. The problem of computing a NE is difficult, because it is generally believed that no polynomial algorithm for it exists (see Section 2.4).
Local search needs a topology over the space of the solutions, given in terms of a neighborhood, and exploits this topology to search iteratively for better solutions. Formally, an instance of a combinatorial optimization problem is a pair (Θ, f), where the solution space Θ is a finite or countably infinite set of solutions and f : Θ → R is a mapping that assigns a real value to each solution in Θ; f is called the goal function. A combinatorial optimization problem asks for a solution s* ∈ Θ such that f(s*) is globally optimal. From here on we consider the problem of minimizing f.
The key feature of Local Search algorithms is the neighborhood function. For an instance (Θ, f) of a combinatorial optimization problem, a neighborhood function is a mapping N : Θ → P(Θ), where P is the power set. The neighborhood function specifies for each solution s ∈ Θ a set N(s) ⊆ Θ, called the neighborhood of s. A solution s′ is a neighbor of s if s′ ∈ N(s). Usually, a function d : Θ × Θ → N that returns the distance between two solutions is introduced, and the neighborhood function N is derived from d: s′ ∈ N(s) if d(s, s′) ≤ δ, where δ is a parameter. Usually, solutions are represented by discrete structures (e.g., sequences and permutations) and neighborhood functions are defined in terms of local rearrangements, such as swapping, moving, and replacing items.
Local Search algorithms start with an initial solution s and then iteratively generate a new solution near the current one, according to the neighborhood function. The choice of the next solution is driven by heuristics. We review the most common heuristics [1], which we will use in our algorithms. The first one is iterative improvement: given a solution s, the algorithm explores the neighborhood N(s) of s and accepts a solution according to a given rule. In particular, with the best improvement rule, it accepts the best solution s′ ∈ N(s) with f(s′) < f(s). The inconvenience of this policy is the necessity of scanning the whole neighborhood to find the best solution. Thus there is also a different rule, called first improvement, which accepts the first generated solution s′ ∈ N(s) with f(s′) < f(s). With the previous rules, solutions can be generated according to a given order or randomly. The second heuristic is Metropolis: given a solution s, its neighbors are explored randomly; a solution s′ ∈ N(s) is always accepted if f(s′) < f(s), and is accepted with probability exp((f(s) − f(s′))/t), where t is a parameter called temperature, if f(s) ≤ f(s′).
A Local Search algorithm that uses only heuristics most often stops at a locally optimal solution which is much worse than the optimal one. A solution s is a local optimum if f(s) ≤ f(s′) for all neighbors s′ ∈ N(s). In order to escape from local optima which are not global optima, metaheuristics are commonly employed. Some examples of metaheuristics follow (a generic sketch of the Metropolis heuristic is given after this paragraph).
Random restart generates a new starting solution when either a local optimum is reached or the heuristic does not reach a global optimum within a given number of visited solutions. The heuristic is then applied to the new starting solution in order to find a global optimum. Simulated annealing can be applied in combination with the Metropolis heuristic; its basic idea is to progressively decrease the value of the temperature t while executing Metropolis. The aim is to explore the search space widely at the beginning and then converge to the optimal solution. Tabu search is a strategy that tries to take advantage of history. This metaheuristic stores the sequence of the last visited solutions in a finite-sized list. When the algorithm has to choose the next solution, it compares the new solution with the ones in the tabu list and avoids selecting an already visited solution. Variable neighborhood is based on a collection of neighborhoods N_1, ..., N_i of increasing size. At the beginning, the heuristic is applied with the smallest neighborhood (N_1). If a local optimum is reached, we try to escape from it by applying the heuristic with a bigger neighborhood (N_2). In general, if we are in a local optimum in the neighborhood N_j, then N_{j+1} is chosen to perform a larger search. When we find a better solution, the heuristic is applied to the new solution with neighborhood N_1 again.
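The generic scheme can be made concrete in a few lines; here is our minimal Metropolis sketch on a toy bit-vector problem (all names are ours), using the acceptance rule stated above.

```python
import math, random

def metropolis(f, s0, neighbors, temp, steps, rng=random.Random(0)):
    """Always accept an improving neighbor; accept a worsening one
    with probability exp((f(s) - f(s'))/temp)."""
    s, fs = s0, f(s0)
    best, fbest = s, fs
    for _ in range(steps):
        s2 = rng.choice(neighbors(s))
        fs2 = f(s2)
        if fs2 < fs or rng.random() < math.exp((fs - fs2) / temp):
            s, fs = s2, fs2
            if fs < fbest:
                best, fbest = s, fs
    return best

# toy instance: minimize the number of 1-bits over binary vectors,
# with neighbors at Hamming distance one
f = lambda s: sum(s)
nbrs = lambda s: [s[:i] + (1 - s[i],) + s[i + 1:] for i in range(len(s))]
print(metropolis(f, (1,) * 8, nbrs, temp=0.5, steps=200))  # typically all zeros
```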
4.2 LS-PNS
We designed an algorithm based on Local Search with the aim of solving difficult games that the other algorithms cannot solve in reasonable time. We call it Local Search-PNS (LS-PNS) [6], [5]. To the best of our knowledge, this is the first Local Search algorithm adopted to compute Nash equilibria in mixed strategies. In this section we propose different choices for the dimensions along which the algorithm can be parametrized, e.g., goal function and heuristic; subsequently, in Section 4.3, we evaluate their impact on the performance of the Local Search algorithm.
First of all, the problem of searching for a NE is written as the minimization of a combinatorial optimization problem instance (Θ, f), where Θ is the space of the support profiles S = (Sx, Sy) and f is a function that assigns each support profile S a value in [0, 1]. This optimization problem is combinatorial, the size of the domain of f being combinatorial in the number of the players' actions. For simplicity, the support Sx of the row player is represented through a binary vector whose i-th position is equal to 1 if action i ∈ [n] is in the support, and 0 otherwise; Sy is represented in the same way. We define f such that its global optima correspond to Nash equilibria; formally, f(S) = 0 when there is a NE with supports S. When S does not lead to any NE, f(S) can be defined arbitrarily, as long as f(S) > 0. A good f should be easily computable, should make the search efficient, and should represent the distance, in the neighborhood space, between the current solution and the nearest global optimum. We provide four different definitions of f, each one based on a different criterion. In all the definitions, f is not convex and presents local optima. The idea behind the first two goal functions is inspired by Operations Research: each S is assigned a measure of the infeasibility of the associated mathematical programming problem composed of Constraints (4.1)-(4.12). The third and fourth goal functions resort to Game Theory concepts: approximate equilibrium and player's regret, respectively.
Irreducible Infeasible Subset. Consider the LP problem, given in the PNS exposition (Section 3.6):
Ay = u                         (4.1)
x^T B = v                      (4.2)
u_i = u*       ∀i ∈ Sx         (4.3)
v_i = v*       ∀i ∈ Sy         (4.4)
u_i ≤ u*       ∀i ∈ [n] − Sx   (4.5)
v_i ≤ v*       ∀i ∈ [m] − Sy   (4.6)
x_i ≥ 0        ∀i ∈ Sx         (4.7)
y_i ≥ 0        ∀i ∈ Sy         (4.8)
x_i = 0        ∀i ∈ [n] − Sx   (4.9)
y_i = 0        ∀i ∈ [m] − Sy   (4.10)
x^T 1 = 1                      (4.11)
y^T 1 = 1                      (4.12)
Fixed a support profile S, f(S) = 0 if the above linear programming problem is feasible. When it is not feasible, f(S) assumes a value that provides a sort of infeasibility degree of the problem. In the Operations Research literature, there are two main approaches to analyzing infeasible linear programming problems [8]: finding an Irreducible Infeasible Subset (IIS) of constraints and finding a Maximum Feasible Subset (MFS).
For a linear programming problem, an irreducible infeasible subset is an infeasible subset of constraints and variable bounds that becomes feasible if any single constraint or variable bound is removed. Identifying an IIS can help to isolate the structural infeasibility of a linear programming problem. The computation of an IIS is usually fast; furthermore, finding IISs is well-understood and is incorporated in numerous commercial solvers. Finding the maximum feasible subset amounts to finding the smallest number of constraints to remove such that the remaining ones constitute a feasible set. Differently from the case of IIS, finding an MFS is an NP-complete problem and is usually solved by using heuristics [8]. This pushes us to focus on the IIS. In particular, f is defined as a function of the size of the IIS, namely

f(S) = (#con + #var − size(IIS(S))) / (#con + #var)

where #con is the number of constraints in the problem and #var is the number of variables. The idea is simple: the larger the IIS, the lower the degree of infeasibility of the problem. Given S, we compute the IIS size by solving the corresponding feasibility problem with CPLEX [9]. Specifically, we call CPLEX from AMPL [12] setting the options presolve 0 and iisfind 1.
Inequality Constraint Violations. In the infeasibility analysis discussed above, equality and inequality constraints have the same importance. Here, instead, the equality constraints are forced to be satisfied and we measure the violations of the inequality constraints only. The basic idea is that with nondegenerate games the equality constraints constitute a non-singular system of linear equations: the number of variables, i.e., the strategies x and y, and the number of linearly independent constraints are the same. Therefore, the values of the probabilities x and y can be easily computed by solving a linear system, without resorting to mathematical programming, and their values, by definition of a system of linear equations, are unique. Once the values of the probabilities have been computed, we check for each inequality constraint whether or not it is violated. We need to introduce a new constraint over the probabilities, since solving the equality constraints can produce probabilities larger than one:
x_i ≤ 1   ∀i ∈ Sx     (4.13)
y_i ≤ 1   ∀i ∈ Sy     (4.14)
In counting the number of violated constraints, for each variable of x and y we consider Constraints (4.9), (4.10), (4.13) and (4.14) as a unique bound constraint. In this way, exactly one constraint is assigned to each variable x_i (y_i): if i ∈ Sx (i ∈ Sy), a bound constraint is assigned, while, if i ∉ Sx (i ∉ Sy), a Constraint (4.5) ((4.6)) is assigned. We define

f(S) = #vio_icon / #icon

where #vio_icon is the number of violated inequality constraints and #icon is the overall number of inequality constraints. We implemented a C function calling the GSL (GNU Scientific Library) [13] to solve the linear system and compute #vio_icon. This makes the computation very fast.
Best ε-supp-NE. Here the idea is to compute the best approximate equilibrium given a support profile S, using the solution concept of ε-supp-NE (Section 2.2). Given a support profile S, the problem of computing the ε-supp-NE that minimizes ε can be formulated as a linear optimization problem as follows:
min ε                               (4.15)
Ay = u                              (4.16)
x^T B = v                           (4.17)
u_i + ε ≥ u_k   ∀i ∈ Sx, k ∈ [n]    (4.18)
v_i + ε ≥ v_k   ∀i ∈ Sy, k ∈ [m]    (4.19)
x_i ≥ 0         ∀i ∈ Sx             (4.20)
y_i ≥ 0         ∀i ∈ Sy             (4.21)
x_i = 0         ∀i ∈ [n] − Sx       (4.22)
y_i = 0         ∀i ∈ [m] − Sy       (4.23)
x^T 1 = 1                           (4.24)
y^T 1 = 1                           (4.25)
ε ≥ 0                               (4.26)
Constraints (4.18), (4.19) and (4.26) code the definition of ε-supp-NE. Call ε* the result of the above minimization. It can be easily observed that the problem is always feasible, and therefore an ε* can be assigned to each support profile S. In the case the aim is to find an ε-NE (not well-supported), the search can be stopped when a support profile with ε* ≤ ε is found (Definition 15 and its consequences). Notice instead that no LP formulation can be provided to find the best ε-NE given the support profile S: that problem is intrinsically quadratic. We call

U* = max{ max_{j ∈ [n], k ∈ [m]} A_jk − min_{j ∈ [n], k ∈ [m]} A_jk,  max_{j ∈ [n], k ∈ [m]} B_jk − min_{j ∈ [n], k ∈ [m]} B_jk }

the largest difference between the maximum and the minimum payoff that a player can receive. We define f(S) = ε*/U*; we use U* to normalize f in [0, 1]. Notice that searching for an S such that ε* = 0 is equivalent to searching for a NE. We compute ε* by calling CPLEX from AMPL. The computation of f in this case requires more computational effort than with the previous two definitions of f, being an optimization problem; on the other hand, differently from the two previous cases, it gives a quality measure of the equilibrium.
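For prototyping outside CPLEX/AMPL, the LP (4.15)-(4.26) can be written for scipy.optimize.linprog as follows (our sketch; variable vector z = (x, y, ε), with u = Ay and v = x^T B substituted into (4.18)-(4.19)).

```python
import numpy as np
from scipy.optimize import linprog

def best_eps_supp(A, B, Sx, Sy):
    """Smallest eps such that some profile with supports (Sx, Sy)
    is an eps-supp-NE."""
    n, m = A.shape
    N = n + m + 1                          # z = [x, y, eps]
    A_ub, b_ub = [], []
    for i in Sx:                           # (4.18): (Ay)_k - (Ay)_i <= eps
        for k in range(n):
            row = np.zeros(N); row[n:n + m] = A[k] - A[i]; row[-1] = -1.0
            A_ub.append(row); b_ub.append(0.0)
    for j in Sy:                           # (4.19): (x^T B)_k - (x^T B)_j <= eps
        for k in range(m):
            row = np.zeros(N); row[:n] = B[:, k] - B[:, j]; row[-1] = -1.0
            A_ub.append(row); b_ub.append(0.0)
    A_eq = [np.r_[np.ones(n), np.zeros(m), 0.0],       # (4.24)
            np.r_[np.zeros(n), np.ones(m), 0.0]]       # (4.25)
    bounds = ([(0, None) if i in Sx else (0, 0) for i in range(n)]    # (4.20)/(4.22)
              + [(0, None) if j in Sy else (0, 0) for j in range(m)]  # (4.21)/(4.23)
              + [(0, None)])                                          # (4.26)
    c = np.zeros(N); c[-1] = 1.0                                      # (4.15)
    res = linprog(c, A_ub=np.array(A_ub), b_ub=b_ub,
                  A_eq=np.array(A_eq), b_eq=[1.0, 1.0], bounds=bounds)
    return res.fun                         # eps*, to be normalized by U*

A = np.array([[1.0, 0.0], [0.0, 1.0]]); B = 1.0 - A   # matching pennies
print(best_eps_supp(A, B, {0, 1}, {0, 1}))  # 0.0: full supports give a NE
print(best_eps_supp(A, B, {0}, {1}))        # 1.0: a pure profile is far off
```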
Minimum Regret. In this section, we define f(S) as the players' minimum regret [28] (see also Section 3.7). Given S, the computation of the players' minimum regret can be formulated as an LP very similar to the one for the computation of the best ε-supp-NE. Call r_xi the regret of the row player for action i, and symmetrically r_yi for the column player.
min Σ_{i ∈ Sx} r_xi + Σ_{i ∈ Sy} r_yi      (4.27)
Ay = u                                     (4.28)
x^T B = v                                  (4.29)
u_i + r_xi ≥ u_k   ∀i ∈ Sx, k ∈ [n]        (4.30)
v_i + r_yi ≥ v_k   ∀i ∈ Sy, k ∈ [m]        (4.31)
x_i ≥ 0            ∀i ∈ Sx                 (4.32)
y_i ≥ 0            ∀i ∈ Sy                 (4.33)
x_i = 0            ∀i ∈ [n] − Sx           (4.34)
y_i = 0            ∀i ∈ [m] − Sy           (4.35)
x^T 1 = 1                                  (4.36)
y^T 1 = 1                                  (4.37)
r_xi ≥ 0           ∀i ∈ Sx                 (4.38)
r_yi ≥ 0           ∀i ∈ Sy                 (4.39)
Constraints (4.30) and (4.31) code the definition of regret. Notice that the main difference between this formulation and the previous one is that here the cumulative regret is minimized, whereas in the previous section the maximum regret (ε) was. Call r* the result of the above minimization. Notice that a strategy profile with cumulative regret r* is also an ε-supp-NE with ε ≤ r*. In the case the aim is to find an ε-NE, the search can be stopped when a support profile with r* ≤ ε is found. As before, it can be easily observed that the above optimization problem is always feasible, and therefore a cumulative regret r* can be assigned to each support profile S. Notice that searching for an S such that r* = 0 is equivalent to searching for a NE. We define

f(S) = r* / (U* (|A_1| + |A_2|))

where U* is defined in the previous section. We use U* to normalize the regret of each action in [0, 1] and we divide by |A_1| + |A_2| to normalize r* in [0, 1]. We compute r* by calling CPLEX from AMPL.
The second dimension is the neighborhood function, which is based on the chosen topology. The topology is the same as in PNS: the cartesian product of the support spaces. The neighborhood function N(S) of LS-PNS is based on the distance between two solutions. In particular, d(S, S′) is the Hamming distance between S and S′:

d(S, S′) = Σ_{i=1}^{n} |S_xi − S′_xi| + Σ_{i=1}^{m} |S_yi − S′_yi|
where S_xi ∈ {0, 1} is the i-th element of Sx and S_yi ∈ {0, 1} is the i-th element of Sy. For instance, given Sx = (1, 1, 1, 0, 0), Sy = (1, 1, 1, 0, 0) and S′x = (1, 1, 0, 0, 0), S′y = (0, 1, 1, 0, 0), the distance is d(S, S′) = 2. The neighborhood function N_k(S) is defined on the basis of the parameter k as follows.

Definition 27 (Neighborhood Function) Given a support profile S, its k-neighborhood N_k(S) is composed of all the support profiles S′ such that d(S, S′) ≤ 2k.

The use of d(S, S′) ≤ 2k instead of d(S, S′) ≤ k is due to our focus on nondegenerate games and to the discussion on balanced supports given in Section 2.3. Note that the shortest distance between two balanced support profiles S and S′ is d(S, S′) = 2; indeed, d(S, S′) = 1 means that only one action is added to or removed from the support of exclusively one player. Using d(S, S′) ≤ k with k = 1 would force one to generate only non-balanced supports as neighbors of a balanced support; this would lead the search algorithm to spend much time exploring many non-balanced supports.
A brief comparison follows between the size of the search space explored by PNS and that explored by Local Search techniques. Exclusively balanced supports are considered in the comparison, since PNS scans those first, before the non-balanced ones, and the equilibrium usually has balanced supports. PNS exhaustively explores a space of C(2n, n) balanced support profiles, which grows asymptotically as 4^n [35]. Local search with arbitrary k explores iteratively neighborhoods N_k(S) of size Σ_{i=1}^{k} C(n, i)². Evaluating a neighborhood with small values of k (e.g., k = 1) requires checking a negligible number of supports with respect to the whole support space; for instance, |N_1(S)| = O(n²) is negligible with respect to 4^n. The development of good goal functions f and good heuristics, which guide the search to the portion of the support space where equilibria are, can allow the algorithm to explore a small number of support profiles with respect to the whole space, and thus to save computational time with respect to PNS.
When n is large, the neighborhood N_k(S) could be excessively large and the search could turn out to be inefficient. In order to reduce the size of the neighborhood, we resort to the concept of conditional strict dominance (Definition 25): if at least one conditionally strictly dominated action is in S, then S cannot lead to any equilibrium. We use this concept to discard solutions in N_k(S). In practice, we first check whether or not a support profile S has conditionally dominated actions; this task requires a computational time that is negligible with respect to the evaluation of f(S). Then, if a support profile S does not present any conditionally dominated action, we evaluate f(S). (A sketch of the neighborhood generation with this filter is given below.) We experimentally evaluated the impact of discarding conditionally dominated solutions on some instances (5 per class) of PolymatrixGame-RG, GraphicalGame-RG and CovariantGame-Rand generated by GAMUT: the percentage of discarded solutions is at least 50% in all the cases, and thus the size of N_k(S) halves.
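The neighborhood generation with the conditional-dominance filter can be sketched as follows (ours, assuming numpy; supports are 0/1 tuples as in the text).

```python
import itertools
import numpy as np

def neighborhood(Sx, Sy, k=1):
    """N_k(S) of Definition 27: support profiles within Hamming distance 2k."""
    bits = list(Sx) + list(Sy)
    out = []
    for d in range(1, 2 * k + 1):
        for flips in itertools.combinations(range(len(bits)), d):
            nb = bits[:]
            for i in flips:
                nb[i] = 1 - nb[i]
            nx, ny = tuple(nb[:len(Sx)]), tuple(nb[len(Sx):])
            if any(nx) and any(ny):          # both supports must stay nonempty
                out.append((nx, ny))
    return out

def has_cond_dominated_action(A, Sx, Ry):
    """Definition 25 for the row player: is some action of Sx strictly
    dominated by another row on every column of Ry?"""
    cols = [j for j, b in enumerate(Ry) if b]
    for i, bi in enumerate(Sx):
        if bi and any(all(A[i2, j] > A[i, j] for j in cols)
                      for i2 in range(A.shape[0]) if i2 != i):
            return True
    return False

A = np.array([[3.0, 1.0], [1.0, 2.0]])
nbrs = neighborhood((1, 0), (1, 0), k=1)
# keep only neighbors whose row support has no conditionally dominated action
print([S for S in nbrs if not has_cond_dominated_action(A, S[0], S[1])])
```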
The remaining dimensions are the choice of the initial solution and the heuristic and metaheuristic applied.
Initial solution. The algorithm uses Random Generation (RG) with a threshold upper-bound: all randomly generated solutions S with f(S) > upper-bound are discarded. When upper-bound = 1, the threshold is disabled.
Heuristics. We implemented Iterative Improvement (II) and Metropolis (MET). II can use the classical Best improvement rule (II-BI), where all the neighbors are generated according to the order of actions, or two different versions of the First improvement rule. Elementary First Improvement (II-FI): the neighbors are generated according to the order of actions, and the first generated one better than the current is chosen as the next solution. First improvement with Random generation (II-FIR): the neighbors are generated randomly, and the first solution better than the current one is chosen as the next solution. We define a parameter, named max-trials, that sets the maximum number of solutions to be generated in a given neighborhood. When no solution among the max-trials generated ones is better than the current one, the latter is considered to be a local optimum and the metaheuristics are activated.
We designed an ad hoc Variation of II-FIR, named II-FIRV. With this rule, at first a small solution space that contains, with high probability, high-quality solutions is explored, and then the exploration continues randomly as in II-FIR. We implemented II-FIRV only for f defined as the minimization of ε or as the minimization of the players' regret (we shall show below that these provide the best performance). By computing f(S), we also obtain the values u_i and v_i for each action. The basic idea is to exploit this information in the generation of the neighbors: II-FIRV generates solutions where the worst actions in the support (those that minimize u_i or v_i) are removed and the best actions outside the support (those that maximize u_i or v_i) are introduced. At first the algorithm visits the neighborhood generating solutions where
• the worst action in the support of each player is removed, without adding new actions;
• the best action outside the support of each player is added, without removing any action;
• the worst action in the support of a player is removed and the best one outside the support is added.
The same can be accomplished removing and adding multiple actions, but we do not consider this generalization because preliminary tests showed it worsens performance. Finally, we implemented Metropolis (MET) with parameter temp. In MET, the solution generation is random; the parameter max-trials works as for II-FIR.
Metaheuristics. We implemented Random Restart (RR) by setting a parameter max-iterations as the longest sequence of solutions to be considered: after max-iterations solutions have been visited, a new randomly generated solution is produced. We implemented Simulated Annealing (SA) by setting temp as a function of the iteration. We implemented Tabu Search (TS) by introducing a circular list containing the last memory visited solutions, where memory is a parameter: whenever a solution is generated, we check whether or not it is in the list; if it is, we discard it, otherwise we evaluate f. Finally, we implemented Variable Neighborhood Search (VNS). Metaheuristics are repeated until either an equilibrium has been found or a given temporal deadline has expired.
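A minimal sketch of the circular tabu list just described (our illustration, with hypothetical names: the uint64_t values stand for some fingerprint of a support profile):

#include <stdbool.h>
#include <stdint.h>

#define MEMORY 64                 /* the `memory` parameter */

static uint64_t tabu[MEMORY];     /* fingerprints of the last MEMORY solutions */
static int tabu_next = 0;         /* next slot to overwrite */

static bool is_tabu(uint64_t h)
{
    for (int i = 0; i < MEMORY; i++)
        if (tabu[i] == h) return true;
    return false;
}

static void tabu_push(uint64_t h)
{
    tabu[tabu_next] = h;                   /* overwrite the oldest entry */
    tabu_next = (tabu_next + 1) % MEMORY;
}

A generated solution whose fingerprint passes is_tabu is discarded; otherwise f is evaluated and tabu_push records the solution.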
4.3 Experimental Analysis
Simple games can be efficiently solved by using PNS; thus, for our experimental analysis of LS-PNS we isolated hard game instances. We focus on PolymatrixGame-RG, GraphicalGame-RG and CovariantGame-Rand. Only some instances of these classes are hard: in this experimental analysis, a game is considered hard if it cannot be solved by LH, PNS and MIP within two hours. We implemented LH in C, MIP with AMPL [12], and PNS and LS-PNS in C calling CPLEX [9] via AMPL. The games used for the analysis had the same number of actions for each player (they are square games) and the payoffs were normalized. We evaluated performance on a UNIX computer with dual quad-core 2.33 GHz CPU and 8 GB RAM.

The first step was to optimize the performance of LS-PNS by tuning the parameter values. We evaluated how max-trials and max-iterations affect the effectiveness of the algorithm. Each value is expressed as a function of the number of players' actions; the tested values are n, 2n, n^2/2, n^2, and 2n^2. We evaluated II-FIR with RR for all the possible combinations of max-trials and max-iterations when f is defined as ε∗ and r∗, with a deadline of 10 minutes. In this first phase of the analysis, for each specific configuration, five hard instances of the CovariantGame-Rand class with 50 actions have been executed 5 times. Table 4.1 reports the percentage with which a NE has been found and the value of ε∗ and r∗, respectively, of the best found solution when no equilibrium is reached (we report only data related to ε∗ and r∗, because these two goal functions are the most significant). The best configuration, both for ε∗ and r∗, is max-trials = n^2 and max-iterations = n^2. The reason is that a small value of max-trials does not allow the algorithm to explore a sufficient number of solutions near the current one, and thus to find a solution better than the current one if one exists; this happens when the current solution has high quality and only a small number of neighbors is better than it. On the other hand,
when max-trials = 2n^2, the algorithm spends too much time generating solutions. A small value of max-iterations does not allow the algorithm to follow sufficiently long paths of solutions, leading it to perform a random restart before reaching a local optimum. This configuration performed well also with bigger games, so we used it in the following experiments. In these preliminary experiments, k had value 1 in the definition of the neighborhood function N_k(S) (except for VNS, where the value of k changes during the execution of the algorithm), conditional dominance was disabled, and we set upper-bound = 1.
[Table 4.1 spans this page. Rows: max-iterations ∈ {n, 2n, n^2/2, n^2, 2n^2}, each with the two goal functions ε∗ and r∗; columns: max-trials ∈ {n, 2n, n^2/2, n^2, 2n^2}. Each cell reports the success percentage and the time in seconds.]
Table 4.1: Percentage with which an equilibrium is found (within 10 minutes) and time in seconds needed to find it with II-FIR and RR.

Then we executed LS-PNS to determine the best goal function, heuristic and metaheuristic. First, we focused on the three classical iterative improvement strategies, i.e., II-BI, II-FI, II-FIR. We compared their effectiveness for all four objective functions f with different temporal deadlines (i.e., 1, 2, 3, 4, 5, 10 minutes). For each specific configuration and game instance, we executed our algorithm 10 times on ten 50 × 50 CovariantGame-Rand instances. In Table 4.2 we report the percentage with which a Nash equilibrium has been found and the value of f of the best found solution when no equilibrium is reached; the values are averaged over all the executions. It can be observed that the objective function defined as the minimization of r∗ leads to the best results, and that II-FIR outperforms the other heuristics for all the configurations. Then we compared our ad hoc heuristic with the best of the previous ones. II-FIRV's best configuration is different from that of II-FIR: max-trials = n^2 and max-iterations = n^2/2. With this configuration II-FIRV outperforms II-FIR: the success probability is 80% within 5 minutes and 95% within 10 minutes. The use of MET with temp ∈ {0.5, 1.0, 1.5, 2.0} does not outperform II-FIR: the probability of finding a NE within a given deadline is, on average, halved.
The best metaheuristic is RR. VNS provided results similar to those of RR, while the use of TS had no effect. We evaluated SA with MET: we set the initial temperature equal to the largest tested value of temp and we used an exponentially decreasing cooling schedule; RR outperformed SA. We also evaluated whether some random steps would improve the efficiency of the algorithm, but they had no effect on the performance.

The last step of the tuning phase is the study of the impact of conditional strict dominance (CD) and of the best upper-bound (U-B) value for the initialization of the algorithm (Table 4.3). The tests have been done on five 50 × 50 instances of PolymatrixGame-RG, GraphicalGame-RG and CovariantGame-Rand, 5 times each. CD does not provide any advantage on games with fewer than 70 actions; with larger games, instead, CD improves the performance, halving the computational time. The best U-B decreases as the size of the games increases: the reason is that in games with a high number of actions the initial solution can be very far from the equilibrium. In summary, we can say that LS-PNS solves small and medium games with high probability within a short time, and large games with small probability within a reasonable time.

The basic idea of Local Search is that it can find non-optimal solutions in a short time. Thus we implemented an anytime version of LS-PNS and we studied how it performs in comparison with the anytime versions of PNS, LH and MIP, which return the best ε-NE found. With LS-PNS we store the best (i.e., smallest ε) ε-NE found during the execution and we return it at the deadline. ε is evaluated for each strategy profile visited by the algorithm, where ε = max{max_{k∈[n]}{u_k − x^T u}, max_{k∈[m]}{v_k − y^T v}}. Anytime LH computes the ε value of each visited vertex and, when the deadline is reached, returns the best one. We implemented two versions of the anytime PNS algorithm, with ε∗ and r∗, called PNSε∗ and PNSr∗, respectively; anytime PNS works like LS-PNS, but explores the space following the support enumeration criterion. We implemented two anytime SGC versions. The first one, called MIPε∗, minimizes ε∗ and at the deadline returns the best strategy in terms of ε-supp-NE; then, given such a strategy, the value of the ε-NE is determined. The second version, named MIPr∗, is equal to the previous one except that at the deadline it returns the strategy that, so far, minimizes the regret. In Table 4.4 we report the average ε value of the strategies returned by the algorithms. Tests were run on 5 hard instances of CovariantGame-Rand, executed 5 times each. Anytime LS-PNS (in both versions, with objective functions ε∗ and r∗) outperforms the other algorithms. It is followed by MIPε∗, while anytime PNS had poor performance (we report only PNSε∗ because PNSr∗ had similarly poor performance).
[Table 4.2 spans this page. Rows: the four goal functions, each combined with the heuristics II-BI, II-FI and II-FIR; columns: deadlines of 1, 2, 3, 4, 5 and 10 minutes. Each cell reports the success percentage and the value of f(S) of the best found solution.]
Table 4.2: Percentage with which an equilibrium is found and quality (i.e., f(S)) of the best found solution when no equilibrium is found with RR.
Due to the good results obtained by Local Search techniques, we could design a better algorithm to face general games, not only hard instances. In fact, we suppose LS-PNS performs badly on simple games. We could combine classic PNS and our LS-PNS: the former runs from the beginning up to a temporal deadline within which all simple games should be solved; if the game is still unsolved by then, we guess it is too hard for PNS, so PNS is stopped and LS-PNS starts, looking for an exact or approximate equilibrium.
[Table 4.3 spans this page. For each game class (CovariantGame-Rand, GraphicalGame-RG, PolymatrixGame-RG) and each goal function (ε∗, r∗), it reports, for increasing game sizes, the success percentage, the computational time in seconds, whether CD is used, and the best U-B value.]
Table 4.3: Success percentages, computational times in seconds, conditional dominance (CD) (whether or not it is used), and the value of the best upper-bound (best U-B).
[Table 4.4 spans this page. For each game size (50 to 100 actions) and each deadline (1, 3, 5 and 10 minutes), it reports the average ε of the solutions returned by LS-PNSε∗, LS-PNSr∗, MIPε∗, MIPr∗, LH and PNSε∗.]
Table 4.4: Average ε of the ε-Nash equilibria returned by the anytime algorithms.
Chapter 5

Enhanced LH

In Chapter 3 we described the basic LH algorithm. In this chapter we present some techniques that we introduced in order to optimize the performance of our LH implementation. In Section 5.1 we deal with the Revised Method of pivoting and with Integer Pivoting; we also give the reason that pushed us to implement LH in exact arithmetic with integers of arbitrary precision, through the GMP library. Section 5.2 briefly presents the theoretical background of heavy tails, which motivates the adoption of random restart techniques on LH. The chapter ends with Section 5.3, in which we describe the rr-LH algorithm and conduct an experimental analysis: through this algorithm we linearized the behavior of some hard game classes. We also study the stability under perturbations of the hard-to-solve games introduced in Section 3.5.
5.1 Implementation Issues
The experiments in Section 4.3 have been conducted with a basic version of LH. Here we apply some known optimization techniques to LH. It is worth noting that the number of steps performed by LH does not change, so the algorithm is unchanged in its properties: the methods explained in the present section only affect computational times.

The first one is the Revised Method of pivoting, borrowed from Operations Research. If a tableau has more columns than rows, pivots may occur only on a small fraction of the columns; during pivoting operations, computing the elements of the unused columns is a waste of time. The Revised Method solves this problem by avoiding unnecessary computations. In practice, the Revised Method saves computational time even if all the columns of the tableau are used during the pivoting operations. In a general situation, the initial tableau is in the form T = (B N b), where B contains the columns of the basic variables, N the remaining ones, and b is the vector of constant terms. To exploit the Revised Method, the tableau
must be put in the canonical form T = B^(-1)(B N b) = (I B^(-1)N B^(-1)b). After each pivoting operation, knowing B^(-1) for the current basis, it is possible to reconstruct the whole tableau: each column of the new tableau T' can be computed as T'_j = B^(-1) T_j, where T_j denotes the j-th column of T. Thus, it is not necessary to perform pivoting operations on the whole tableau: we only need to keep B^(-1) updated after each basis change. This is obtained by applying pivot operations to the smaller tableau (B^(-1) B^(-1)T_e B^(-1)b), where T_e is the column of the original tableau T corresponding to the entering variable. The pivot is an element of the vector B^(-1)T_e and the leaving variable, also with the Revised Method, is chosen through the minimum ratio test. In sum, the Revised Method works like the classical pivoting of Algorithm 1, but acts only on the restricted tableau (B^(-1) B^(-1)T_e B^(-1)b). In the case of LH, we have to store and update two different B^(-1) matrices, one for each tableau. A simplification is that, at the beginning, the tableaux are already in canonical form: the initial basic variables are the slack variables, thus it is not necessary to compute B^(-1) because B = I = B^(-1).

During our experimental analysis we verified that LH suffers from numerical stability problems, as is known in the literature. C long-double precision is not always enough to obtain the correct execution of the algorithm. Indeed, we observed that the minimum ratio test sometimes chooses the wrong leaving variable due to problems of numerical accuracy (e.g., a coefficient that should be T_ij = 0 takes a value slightly greater than zero, and a wrong variable is chosen as the leaving one). The algorithm then follows a wrong path that leads to an incorrect outcome, i.e., it terminates its execution at a solution that is not a NE, or it never terminates. We first observed this behavior solving an instance of a 50 × 50 TravelersDilemma: depending on the random initial choice, some paths deviated from the right ones computed with exact arithmetic; other executions even entered infinite loops and never ended. This pushed us to increase the numerical precision of the implementation in order to always obtain the correct execution of LH. This is possible only with arbitrary numerical precision, which we obtained through the GNU Multiple-Precision library (GMP) [11], a free library written in C that provides arbitrary-precision arithmetic on integers, rational numbers, and floating-point numbers; the only limit to the precision is the memory of the machine. We adopted computation with rational numbers: each number is stored in memory with two integers, numerator and denominator.

The use of GMP has a considerable negative impact on the execution time of the algorithm. We compared the execution time of LH with long-double precision and with arbitrary precision in cases where both versions reach the correct solution: the long-double version dramatically outperforms the other. We compared the execution times on 50 instances of PolymatrixGame-RG games for each dimension n from 50 to 100 with an interval of 5.
Each instance has been run 2n times. For each dimension, we computed the average running time. The long-double version is on average about 56 times faster than the arbitrary-precision one, with a standard deviation of 10.7. We argue that this ratio becomes greater with greater game sizes. Problems due to numerical precision also arise in the generation of the hard-to-solve games described in Section 3.5; for further information on the performance of the generators with different numerical precisions see [14]. It is interesting to note that the largest representable hard-to-solve game generated through floating-point arithmetic with the non-trigonometric moment curve has only 14 actions per player, and with the trigonometric moment curve only 6 actions per player.

In order to speed up the algorithm with arbitrary precision we resort to Integer Pivoting [23]. With a slight modification of the classical pivoting operations, if one starts pivoting an integer matrix, the integrality of its elements is preserved for the whole execution. The method requires as input the pivot value of the previous pivoting operation, that is, the tableau element at the column of the entering variable and the row of the leaving variable; at the first pivot step, the previous pivot value has no effect and is set to 1. This value is used to divide the rows previously multiplied by it, in order to limit the growth of the integers. The pseudocode of Integer Pivoting is described in Algorithm 4.

Algorithm 4: Integer Pivoting
1 Let a tableau be given in the form T = (I N b), where I corresponds to the columns of the basic variables, N to the nonbasic ones and b is the vector of constant terms. Let l be the row of the leaving variable and e the column of the entering variable, let p = T_le be the pivot value, and let p_0 be the previous pivot value given as input.
2 Multiply all the non-pivot rows by the pivot value: T_hk = T_hk · p, ∀h ≠ l, ∀k.
3 T_hk = T_hk − T_lk · T_he, ∀h ≠ l, ∀k, where T_he denotes the value before Step 2. In this way T_he = 0 for all h ≠ l, while T_le = p.
4 T_hk = T_hk / p_0, ∀h ≠ l, ∀k.
5 The rows so obtained, together with the unchanged pivot row T_lk, ∀k, are the new elements of the tableau.
6 p_0 = p
Integer Pivoting avoids computations on denominators in our two-integer representation of numbers, because denominators always remain 1. It is superior to working with fractions of integers, whose reduction requires greatest-common-divisor computations that tend to take the bulk of the computation time. In this way, the overhead due to the GMP implementation becomes relatively less relevant.
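The following C sketch shows one step of Algorithm 4 on a GMP integer tableau (our illustration, not the thesis code; the tableau layout and names are assumptions). The division is exact by construction, so GMP's mpz_divexact applies.

#include <gmp.h>

/* One Integer Pivoting step. T is a rows x cols tableau of GMP integers,
 * l the row of the leaving variable, e the column of the entering one,
 * p_prev the previous pivot value (1 at the first step). */
void integer_pivot(mpz_t **T, int rows, int cols, int l, int e, mpz_t p_prev)
{
    mpz_t p, the, tmp;
    mpz_init_set(p, T[l][e]);       /* current pivot value */
    mpz_init(the);
    mpz_init(tmp);
    for (int h = 0; h < rows; h++) {
        if (h == l) continue;       /* the pivot row is left unchanged */
        mpz_set(the, T[h][e]);      /* save T_he before updating row h */
        for (int k = 0; k < cols; k++) {
            mpz_mul(T[h][k], T[h][k], p);            /* T_hk := T_hk * p    */
            mpz_mul(tmp, T[l][k], the);              /* tmp  := T_lk * T_he */
            mpz_sub(T[h][k], T[h][k], tmp);          /* T_hk := T_hk - tmp  */
            mpz_divexact(T[h][k], T[h][k], p_prev);  /* exact by theory     */
        }
    }
    mpz_set(p_prev, p);             /* remember the pivot for the next step */
    mpz_clears(p, the, tmp, NULL);
}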
We estimated how much Integer Pivoting can increase the performance of our arbitrary-precision LH on a few example instances. We noted that the contribution of Integer Pivoting is not evident for small games and for games solved quickly by LH, because the number of digits of the numerators does not grow much; Integer Pivoting gains its importance in executions that take a long time to end. For example, we executed 10 instances of 100 × 100 CovariantGame-Rand, once per game, and collected this result: LH with Integer Pivoting solved them in about 900 seconds on average, while LH without Integer Pivoting ran for about 7000 seconds on average; their ratio is about 7.6. A similar result holds for the sum of those computational times. We described Integer Pivoting applied to the whole tableau, but it can easily be applied in combination with the Revised Method.

As a last optimization, during LH executions on degenerate games it can happen that the lexicographic ordering brings out a variable even though the first entering variable, or its complement, attains the same value in the minimum ratio test. If we immediately bring out the first entering variable or its complement, we find a NE and the algorithm ends, which means that the algorithm could end at this step. A very simple way to avoid these useless LH steps is to detect such ties and choose as the leaving variable the one forcing the end of LH, thus terminating the execution earlier.
5.2 Randomized Search and Heavy Tails
In this section we introduce some theoretical concepts derived from Probability and Statistics that will be useful for improving LH performance in a random fashion. The performance of randomized algorithms can vary dramatically from run to run, even on the same instance. From the study of randomized backtrack search, we resort to some known distributions useful to describe this behavior: heavy tails and fat tails [17], [16]. Researchers have found heavy-tailed distributions related to phenomena in different areas, such as economics, geophysics, weather forecasting, earthquake prediction, statistical physics, etc. Distributions of randomized backtrack search often show heavy-tailed behavior, which means a large probability mass in the tail of the cumulative distribution of the run times: the run time can vary a lot, and some executions last much longer than the average run time. Distributions of this kind are nonstandard in that they have infinite moments from some order on (e.g., mean, variance). Formally, a heavy-tailed distribution is characterized by having a tail that is asymptotically of the Pareto-Lévy form:

P{X > x} ∼ C x^(−α),   x > 0

where α is a positive constant called the index of stability, whose value is such that α = inf{r > 0 : E[X^r] = ∞}. Thus, by definition, moments of order greater than or equal to α are infinite, while moments of order less than α are finite: the lower α, the heavier the tail. The tail of this kind of distribution has a hyperbolic behavior.
[Figure 5.1 here: plots of 1 − F(x) versus x. Panels: (a) heavy-tailed behavior; (b) nonheavy-tailed behavior.]
Figure 5.1: Simulated heavy and nonheavy-tailed behaviors.

There is a simple way to identify a heavy tail experimentally, based on plotting on logarithmic scales. First, we compute the complement of the cumulative distribution F(x): if the distribution is heavy-tailed, the decay of 1 − F(x) exhibits a hyperbolic behavior, because 1 − F(x) = P{X > x}. In the case of a heavy tail, the log-log plot of 1 − F(x) shows an approximately linear decrease (Figure 5.1 (a)), and its slope provides an estimate of the index of stability; a distribution with an exponential decay instead shows a faster-than-linear decrease (Figure 5.1 (b)).

A fat-tailed distribution is similar to a heavy-tailed one, but its tail decays at a faster rate: fat-tailed distributions have a lot of probability concentrated in the tail, but all their moments are finite. A fat tail can be identified by computing the kurtosis index. The kurtosis of a distribution is the quantity µ4/µ2^2, where µ4 is the fourth central moment and µ2 is the second one, the variance. If a distribution has a kurtosis greater than 3, it has a fat tail; 3 is the kurtosis of a standard normal distribution.

Fat and heavy tails are bad behaviors for an algorithm: they mean that some executions are far worse than the average run time. Of course, the heavier the tail, the higher the frequency of very long runs. In order to eliminate fat-tailed or heavy-tailed behaviors, a randomized restart strategy has been proposed [17]. The idea is that multiple short runs are better than a single long run: if the current execution is not promising, the algorithm stops it and begins another one. This technique is very effective with these two kinds of distributions, allowing one to obtain better performance on average.
If the right cutoff is chosen, the heavy-tailed behavior disappears and the log-log plot exhibits a faster-than-linear decrease. The cutoff is the maximum number of steps that the algorithm can perform in a single run: if the algorithm does not reach the solution within the cutoff, it applies the restart policy, choosing a new run to execute. The cutoff can be chosen in different ways. If the runtime distribution is known, the optimal strategy is a sequence of fixed-length runs, thus a fixed cutoff is chosen; other possible strategies increase the cutoff value at each restart, or after a specified number of restarts, with a geometric or linear growth. We note that random restart was already used as a metaheuristic in Chapter 4. The difference here is that we apply it directly to a well-known algorithm, LH, which has a random element at the beginning; we do not modify the algorithm itself. Furthermore, a theoretical study based on the estimation of a probability distribution is provided in order to understand a priori whether this method will be effective.
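The fat-tail test used in the next section relies on the kurtosis index defined above; a minimal C sketch (ours, purely illustrative) of its computation on a sample of run lengths follows.

#include <stddef.h>

/* Sample kurtosis mu4 / mu2^2: values greater than 3 suggest a fat tail
 * (3 is the kurtosis of a standard normal distribution). */
double kurtosis(const double *x, size_t n)
{
    double mean = 0.0, m2 = 0.0, m4 = 0.0;
    for (size_t i = 0; i < n; i++) mean += x[i];
    mean /= (double)n;
    for (size_t i = 0; i < n; i++) {
        double d = x[i] - mean;
        m2 += d * d;              /* accumulate (x - mean)^2 */
        m4 += d * d * d * d;      /* accumulate (x - mean)^4 */
    }
    m2 /= (double)n;              /* second central moment (variance) */
    m4 /= (double)n;              /* fourth central moment */
    return m4 / (m2 * m2);
}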
5.3 Experimental Analysis and rr-LH
In this section we present some studies about the hardness of some game classes when they are solved with the LH algorithm; then, using some concepts of the previous section, we try to solve the hardest classes efficiently. First, we evaluated the average number of steps, i.e., pivoting operations, that the algorithm performs in order to solve some classes of games generated through GAMUT, as well as the SGC's games (Section 3.7). Hard-to-solve games (Section 3.5) are theoretically proven to have exponential behavior and we did not run them here. We studied how this value changes with respect to the size of the game. For each class we generated 500 instances for each dimension from 5 to 100 with an interval of 5; in the case of SGC's games we had only one game for each size, from 3 to 99 with an interval of 4. We ran LH with every action as the initial entering variable and we saved the number of steps LH needed to reach the equilibrium. At the end, we plotted, for each class, how the average number of steps that LH takes to solve the game varies with its size (Figures 5.2 and 5.3). On the basis of the hardness of their solution with LH, we can classify the games in five different clusters, in increasing order of difficulty:

• cluster A: these are the easiest games, because the average number of steps needed to reach the equilibrium does not vary with their size. DispersionGame belongs to cluster A.

• cluster B: for increasing size, the average number of steps tends to a constant value. BidirectionalLEG-RG and UniformLEG-RG belong to this cluster.

• cluster C: the average number of steps increases linearly in the size of the games. SGC's games belong to this cluster.

• cluster D: these games show an exponential growth of the average number of steps with respect to the size of the games, but some polynomial-length paths exist. PolymatrixGame-RG, GraphicalGame-RG and CovariantGame-Rand belong to this cluster.

• cluster E: all the paths of these games are exponentially long. Hard-to-solve games belong to this cluster.
[Figure 5.2 here: average number of steps versus game size. Panels: (a) DispersionGame; (b) UniformLEG-RG; (c) BidirectionalLEG-RG; (d) SGC's game; (e) PolymatrixGame-RG; (f) GraphicalGame-RG.]
Figure 5.2: Average number of pivoting steps performed by LH as a function of the game size for DispersionGame, UniformLEG-RG, BidirectionalLEG-RG, SGC's games, PolymatrixGame-RG, and GraphicalGame-RG.
[Figure 5.3 here: average number of steps versus game size. Panels: (a) CovariantGame-Rand; (b) hard-to-solve games.]
Figure 5.3: Average number of pivoting steps performed by LH as a function of the game size for CovariantGame-Rand and hard-to-solve games. constant value. BidirectionalLEG-RG and UniformLEG-REG belong to this cluster. • cluster C: the average number of steps increases linearly in the dimension of the games. SGC’s games belong to this cluster. • cluster D: these games show an exponential growth of the average number of steps respect to the dimension of the games, but there exist some polynomial length paths. PolymatrixGame-RG, GraphicalGame-RG and CovariantGame-Rand belong to this cluster. • cluster E: all the paths of these games are exponentially long. Hardto-solve games belong to this cluster. Clusters A, B and C can be efficiently solved with LH, thus we focus only on the hardest clusters: D and E. In order to solve efficiently games of cluster D, we decided to verify if their distributions have heavy-tail or fattail behavior. First, we computed the cumulative distribution of number of steps for a given dimension of the games. We run 200 times 1000 instanced of 100×100 games, for each class. The plots of 1−F (x) are in Figure 5.4 (a)-(c)(e). Note the very high values on the x-axis, with low probability. Second, we plot log-log plot of 1 − F (x) in Figure 5.4 (b)-(d)-(f). It is worth to note these plots do not show a linear decrease behavior, thus games of cluster D are not characterized by a heavy-tailed distribution. Instead, they have a fat-tailed behavior, indeed the kurtosis of games belonging to cluster D is greater than 3. In particular, PolymatrixGame-RG, GraphicalGame-RG and CovariantGame-Rand classes have a kurtosis of about 237, 629 and 222, respectively. Observe the distribution of path length for those game classes
In order to eliminate the fat-tailed behavior, as explained in the previous section, we extended the LH algorithm with a randomized restart strategy. We call this new algorithm random restart LH (rr-LH); for the pseudocode see Algorithm 5. We avoid following long paths by imposing a fixed cutoff: if the number of steps of a path reaches the cutoff value, this is not a promising path, and we have to search for a shorter one. To do so, we stop LH on this path and restart the algorithm with a new first entering variable. To avoid multiple repetitions of the same path, we store a tabu list of the already chosen first entering variables and we randomly choose a new one that does not belong to that set. To preserve the completeness of rr-LH, when there is only one entering variable available we set the cutoff value to ∞, so that LH follows this last path to the equilibrium.

Algorithm 5: rr-LH
1 Let (A, B) be a bimatrix game such that A, B > 0. The initial solution is (x, y) = (0, 0) and the initial basic variables are {s_i, ∀i ∈ [m]} ∪ {r_i, ∀i ∈ [n]}. At the beginning S = {x_i | i ∈ [n]} ∪ {y_i | i ∈ [m]} and cutoff = h.
2 Choose randomly the first entering variable z_j from the set S.
3 Remove z_j from the set S.
4 Call LH with entering variable z_j and stop its execution when the number of steps exceeds cutoff.
5 if LH has reached the equilibrium then go to 8
6 if |S| = 1 then cutoff = ∞
7 go to 2
8 Return the equilibrium (x, y)
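An illustrative C sketch of the rr-LH driver follows (ours, under the assumption of a hypothetical lh_run wrapper around the LH pivoting loop that returns nonzero when an equilibrium is reached within the given step budget).

#include <stdlib.h>

extern int lh_run(int entering, long cutoff);  /* hypothetical LH call;
                                                  cutoff < 0 means no limit */

/* Returns the first entering variable that led to an equilibrium. */
int rr_lh(int num_labels, long cutoff)
{
    int *todo = malloc(num_labels * sizeof *todo);
    for (int i = 0; i < num_labels; i++) todo[i] = i;
    int left = num_labels;
    while (left > 0) {
        int pick = rand() % left;        /* random label not tried yet */
        int entering = todo[pick];
        todo[pick] = todo[--left];       /* remove it from the set */
        /* last available label: disable the cutoff to keep completeness */
        long budget = (left == 0) ? -1 : cutoff;
        if (lh_run(entering, budget)) {
            free(todo);
            return entering;             /* equilibrium found */
        }
    }
    free(todo);
    return -1;                           /* not reached: LH is complete */
}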
We performed an experimental analysis to find the best cutoff value for each class of games belonging to cluster D. We executed rr-LH for each game and for each cutoff from 1 to 40, a number of times equal to the size of the game, and then we averaged the results. We executed 500 instances of PolymatrixGame-RG, GraphicalGame-RG and CovariantGame-Rand 2n times each, for sizes n = 5 to 65 with a step of 5. Figure 5.6 shows two examples per class of these studies; for the whole data see Appendix A. It is worth noting that for PolymatrixGame-RG and GraphicalGame-RG the cutoff that guarantees the minimum average number of steps is similar for all game sizes, and we take 20 for both, on average. In the case of CovariantGame-Rand we also executed the tests with greater cutoff values, because 40 is not enough to identify the optimum: its value is not a constant, but grows with the game size. Moreover, the average number of steps with the optimal cutoff still grows exponentially; thus, in this way, we cannot eliminate the fat tail and the exponential behavior, and a further increase of the cutoff is useless because no restart would be performed.
[Figure 5.4 here: plots of 1 − F(x) against the number of steps. Panels: (a) PolymatrixGame-RG; (b) PolymatrixGame-RG, log-log plot; (c) GraphicalGame-RG; (d) GraphicalGame-RG, log-log plot; (e) CovariantGame-Rand; (f) CovariantGame-Rand, log-log plot.]
Figure 5.4: Distribution of the number of steps performed by LH. The figures on the right are in logarithmic scale on both axes.
[Figure 5.5 here: histograms of occurrences versus number of steps. Panels: (a) PolymatrixGame-RG; (b) GraphicalGame-RG; (c) CovariantGame-Rand.]
Figure 5.5: Histograms of path lengths on PolymatrixGame-RG, GraphicalGame-RG and CovariantGame-Rand.
[Figure 5.6 here: average number of steps versus cutoff. Panels: (a) PolymatrixGame-RG, size 25; (b) PolymatrixGame-RG, size 50; (c) GraphicalGame-RG, size 25; (d) GraphicalGame-RG, size 50; (e) CovariantGame-Rand, size 25; (f) CovariantGame-Rand, size 50.]
Figure 5.6: Relation between the average number of steps and the cutoff.
[Figure 5.7 here: average number of steps versus game size. Panels: (a) PolymatrixGame-RG; (b) GraphicalGame-RG.]
Figure 5.7: Average number of pivoting steps performed by rr-LH as a function of the game size.

To verify whether or not the random restart strategy removed the fat-tailed behavior, we called rr-LH 5 times on 500 games for each size of the classes PolymatrixGame-RG and GraphicalGame-RG; then we averaged the results and plotted them (Figure 5.7). It is easy to observe that the exponential behavior disappears and now, on average, the relationship between game size and average number of steps seems linear. Thus we have found a way to make these two classes of games tractable. We did not include games of cluster E in this study because a restart strategy would be useless: all their paths are exponential, thus there is no way to randomly choose a polynomial-length path.

Instead, in Section 3.5 we left an open question: are the hard-to-solve games stable under perturbation? In order to answer, we first perturbed the game with a σ-uniform-cube perturbation and then solved it with LH. We tried different values of σ = 1/2^i, i ∈ {3, . . . , 30}. The perturbed instances turned out to be very simple for all values of σ, because they can be solved by LH in a few steps. We ran LH 5000 times for each even size from 2 to 18 of hard-to-solve σ-perturbed games. In Figure 5.8 (a)-(b) we show the relationship between the average number of steps and the size; Figure 5.8 (c) depicts the same relation on the nonperturbed game. Note that in the latter case LH needs between 7 · 10^4 and 8 · 10^4 steps to reach the equilibrium. This confirms that the perturbed hard-to-solve games are easy. From our experiments, hard-to-solve games appear not to be stable under uniform perturbations; thus, given Theorem 11, there must exist another stable worst-case game, unless PPAD ⊆ RP.
[Figure 5.8 here: average number of steps versus game size. Panels: (a) σ = 1/2^20; (b) σ = 1/2^30; (c) nonperturbed game.]
Figure 5.8: Average number of pivoting steps as a function of the game size performed by LH when applied to σ-perturbed hard-to-solve games.
Chapter 6

The Lemke's Algorithm

In this chapter we introduce one more algorithm for computing equilibria of games (Section 6.1). Lemke's algorithm is closely related to LH, but it solves general LCPs, not just bimatrix games. We adopt a version of Lemke's algorithm due to von Stengel, van den Elzen and Talman [36]. While LH is initialized by choosing the first leaving label, Lemke's algorithm starts from an arbitrary point in ∆n × ∆m. This method still needs answers to some open questions about its worst-case complexity (Section 6.2), and it is not known how to exploit its extra flexibility to improve performance; our hope is that with this further degree of freedom we can overcome the performance of LH, maybe by adopting Local Search. Section 6.3 studies this version of Lemke's algorithm experimentally. We also found some theoretical properties of the algorithm, presented in Section 6.2.
6.1 The Algorithm
Since the algorithm is closely related to LH, we will not describe it in depth; we just highlight how the two differ. For a detailed explanation see the source [36]. We remark that here we use the algorithm to solve games in normal form, though it can be applied to more general extensive-form games. Given a vector q and a matrix M, to solve an LCP we look for two vectors z, w such that

z^T w = 0          (6.1)
w = q + M z        (6.2)
w, z ≥ 0           (6.3)

or to determine that no such vectors exist. Constraints (6.2) are equivalent to (3.14), with the slack vector w. Equation (6.1) expresses the usual complementarity conditions.
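For concreteness, a minimal C sketch (ours, with dense double arithmetic and a tolerance; purely illustrative) of the check that a pair (z, w) solves the LCP (6.1)-(6.3):

#include <math.h>
#include <stdbool.h>

/* M is n x n, row-major. Checks w = q + Mz, w, z >= 0 and z^T w = 0. */
bool lcp_solved(const double *q, const double *M,
                const double *z, const double *w, int n, double eps)
{
    double dot = 0.0;
    for (int i = 0; i < n; i++) {
        double wi = q[i];
        for (int j = 0; j < n; j++) wi += M[i * n + j] * z[j];
        if (fabs(wi - w[i]) > eps) return false;       /* (6.2) violated */
        if (z[i] < -eps || w[i] < -eps) return false;  /* (6.3) violated */
        dot += z[i] * w[i];
    }
    return fabs(dot) <= eps;                           /* (6.1): complementarity */
}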
Actually, we have a mixed LCP. Let z = (u, v, x, y) and w = (wu, wv, wx, wy). There are no sign restrictions on u and v, whereas x, y, wx, wy ≥ 0 and wu = 0, wv = 0. So Equation (6.1) is equivalent to

x^T wx = 0,    y^T wy = 0        (6.4)

For computing a NE, we take M and q such that

0 = 1^T x − 1
0 = 1^T y − 1
wx = 1u − Ay
wy = 1v − x^T B

The algorithm uses an additional vector d ∈ R^n, named the covering vector, as the coefficient of a corresponding scalar variable z0, and computes a solution of the augmented system

z^T w = 0              (6.5)
w = q + M z + d z0     (6.6)
w, z ≥ 0               (6.7)
z0 ≥ 0                 (6.8)

An almost complementary basis is a set of n basic variables that contains at most one variable of each complementary pair z_i, w_i, and possibly z0, such that these variables define a unique solution to w = q + M z + d z0 when all other variables are zero. Suppose this solution also fulfills (6.7): if z0 is nonbasic, it solves (6.2) as well; otherwise, there is a pair z_i, w_i of nonbasic variables. The classic Lemke's algorithm either computes a solution of this LCP, or fails and terminates in the so-called ray termination. It could seem that Lemke's algorithm has a serious limitation with respect to LH, because it can fail; but in our application this cannot happen. We do not give a proof of this result: there are many sufficient conditions on the matrix M and the covering vector d which exclude ray termination, and our M and d satisfy such conditions. So how do we choose d? Remember that this algorithm is designed to be initialized at an arbitrary starting point (s, t) ∈ ∆n × ∆m. We use that point to build the covering vector as

d = (1, 1, −At, −s^T B)^T

Thus, the sign constraints of our LCP and the equation w = q + M z + d z0 take the form
0 = 1^T x + z0 − 1
0 = 1^T y + z0 − 1
wx = 1u − Ay − (At) z0            (6.9)
wy = 1v − x^T B − (s^T B) z0
x, y, wx, wy ≥ 0
z0 ≥ 0

The algorithm has an initialization step based on LP: it computes the best responses to (s, t). The initial almost complementary basis then contains z0, u and v, all but one of those best responses, and the slack variables wx and wy for the remaining variables; the missing one is the first entering variable. Note, however, that in the case of normal-form games we do not have to resort to LP. In fact, we can simply compute which are the pure best responses to a strategy by taking the maximum components of Ay and x^T B, and this can be done in linear time. Talking about pure best responses is equivalent to referring to the support of the best responses.

Algorithm 6: Lemke (1)
1 Let (A, B) be a bimatrix game. Construct the mixed LCP with Constraints (6.9). Choose a starting vector (s, t) ∈ ∆n × ∆m. The initial basic variables are w = (wu, wv, wx, wy) and the nonbasic variables are z = (u, v, x, y) and z0.
2 Compute the best responses against s and t. By pivoting, find an initial almost complementary basic solution with z0 = 1 where the basic variables are z0, all components of u and v, and all but one of the components of x and y representing best responses against t and s, respectively. Let j be the component not put into the basis.
3 Choose j as the new entering variable.
4 Choose the leaving variable i through the minimum ratio test.
5 Pivot with j as entering variable and i as leaving variable. Update the basis.
6 if the leaving variable i is not z0 then let the new entering variable j be the complement of the leaving variable i, following the complementary pairs (z, w), and go to 3
7 Return the equilibrium (x, y)
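A minimal C sketch (ours, not the thesis code) of the initialization shortcut mentioned above: the pure best responses to (s, t) are the maximizing components of the expected-payoff vectors At and s^T B, computed with a single pass over the payoff matrices.

#include <stdlib.h>

/* A and B are n x m, row-major. best_x[i] = 1 iff row-player action i is a
 * pure best response to t; best_y[j] = 1 iff column-player action j is a
 * pure best response to s. */
void pure_best_responses(const double *A, const double *B,
                         const double *s, const double *t,
                         int n, int m, int *best_x, int *best_y)
{
    double *u = malloc(n * sizeof *u);   /* u = A t   */
    double *v = malloc(m * sizeof *v);   /* v = s^T B */
    double umax = -1e300, vmax = -1e300;
    for (int i = 0; i < n; i++) {
        u[i] = 0.0;
        for (int j = 0; j < m; j++) u[i] += A[i * m + j] * t[j];
        if (u[i] > umax) umax = u[i];
    }
    for (int j = 0; j < m; j++) {
        v[j] = 0.0;
        for (int i = 0; i < n; i++) v[j] += s[i] * B[i * m + j];
        if (v[j] > vmax) vmax = v[j];
    }
    for (int i = 0; i < n; i++) best_x[i] = (u[i] == umax);  /* argmax set */
    for (int j = 0; j < m; j++) best_y[j] = (v[j] == vmax);
    free(u);
    free(v);
}

In exact (GMP rational) arithmetic the equality test is safe; with floating point a tolerance would be needed.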
For degenerate cases, we extend the complementary pivoting with the lexicographic method, as described for LH (Section 3.4). The version of the algorithm presented in [36] has a useful game-theoretic interpretation: it can be shown that the algorithm generates a piecewise linear path in the strategy space. In any solution to (6.9), (x + s z0, y + t z0) is a strategy profile. Furthermore, when z0 < 1, let x̄ = x/(1 − z0) and ȳ = y/(1 − z0): then (x̄, ȳ) is a strategy profile, x̄ is a best response to y + t z0, and ȳ is a best response to x + s z0. Let G be the bimatrix game. The algorithm generates a path of strategy profiles (x̄, ȳ), and each such profile is an equilibrium of a parametrized game G(z0). The starting point represents a prior against which the players react initially; then they gradually change their behavior by using information about the strategies actually played. The payoffs in G(z0) are as if each player played with probability z0 against the prior and with probability 1 − z0 against the actual strategy in (x̄, ȳ) of the other player. Different starting points can produce different equilibria.

The authors of [36] describe an alternative Step 2 of Algorithm 6; this observation is relevant for some considerations about complexity given below. They note that we can use the complementary pivoting from the very beginning, instead of computing best responses. To do that, we need A, B < 0, which is without loss of generality because of Theorem 2. This yields a new LCP whose variables z = (u, v, x, y)^T, w = (wu, wv, wx, wy)^T and z0 are all nonnegative:

0 = 1^T x + z0 − 1
0 = 1^T y + z0 − 1
wx = 1u − Ay − (At) z0            (6.10)
wy = 1v − x^T B − (s^T B) z0
w, z ≥ 0
z0 ≥ 0

Since A, B < 0, the covering vector d is nonnegative and has positive components wherever q_i < 0, where q = (−1, −1, 0, 0)^T. Let w be the first vector of basic variables: z0 enters and wv leaves the basis. There will be some initial pivots, all degenerate, in which the basis changes but not the values of the basic variables, until z0 < 1; then the computation proceeds as before in Algorithm 6. The game-theoretic interpretation given above also holds here, but only from the moment when z0 < 1. This version is reported in Algorithm 7. It is important to remark that, despite its similarities with LH, Lemke's algorithm follows different paths and, hence, their executions differ substantially: Lemke's algorithm from [35] does not generalize LH [3], [30].
6.2 The Complexity
Here are some results about the complexity of the classical Lemke's algorithm when initialized with d ∈ R^n. It has been proven in [22] that solving an LCP may require an exponential number of steps in the worst case, even when Lemke's algorithm does not fail.
Algorithm 7: Lemke (2)
1 Let (A, B) be a bimatrix game such that A, B < 0. Build the LCP with Constraints (6.10). Choose a starting vector (s, t) ∈ ∆n × ∆m. The initial basic variables are w = (wu, wv, wx, wy) and the nonbasic variables are z = (u, v, x, y) and z0.
2 Pivot with z0 as the first entering variable and wv as the first leaving one. Update the basis. Let j = v, that is, the complement of wv.
3 Choose j as the new entering variable.
4 Choose the leaving variable i through the minimum ratio test.
5 Pivot with j as entering variable and i as leaving variable. Update the basis.
6 if the leaving variable i is not z0 then let the new entering variable j be the complement of the leaving variable i, following the complementary pairs (z, w), and go to 3
7 Return the equilibrium (x, y)

An average-case analysis is also available, from [21]: if we assume that the matrix M and the vector q are sampled independently from spherically symmetric distributions, the average-case complexity turns out to be exponential if d = (1, . . . , 1)^T, while it becomes polynomial (quadratic in n, the number of rows of M) if d = (ε, ε^2, . . . , ε^n)^T, for ε sufficiently small. In our particular application, when we set up an LCP expressing the conditions for a NE, worst-case complexity results are not available. It is worth noting that the hard-to-solve games cited in Section 3.5 are built to be exponentially hard for LH, not for Lemke's algorithm; to the best of our knowledge, Lemke's algorithm has never been studied on these games.

Now, focus on Algorithm 6. Remember that, at each step, the algorithm computes the best responses to the current strategy profile; thus, at the first step it finds the best responses to the starting vector (s, t). One can start the algorithm at a point that has exactly the same pure best responses as some NE of the game: if so, the algorithm takes just one step after the initialization to reach the equilibrium. A point with this property is the equilibrium itself. Therefore, there always exists at least one starting point that makes Lemke's a polynomial-time algorithm to compute a NE. But we have to investigate how complex it is to find those points; and of course the equilibrium itself is not useful to this analysis. Recall the geometric interpretation of LH for nondegenerate games: an n-simplex is divided into best-response regions by hyperplanes. Points on those k-dimensional hyperplanes, k < n, are labeled with the labels of the adjacent regions; points outside those hyperplanes have at most one pure best response.
Lemke’s algorithm does not fail. It is also available an average-case analysis from [21]. If we assume that the matrix M and the vector q are sampled independently from spherically distributions, the average-case complexity turns out to be exponential if d = (1, . . . , 1)T , while it becomes polynomial (quadratic in n, the row number of M ) if d = (, 2 , . . . , n )T , for sufficiently small. In our particular application, when we set up a LCP expressing the condition for NE, worst-case complexity results are not available. It is worth noting that the hard-to-solve games cited in Section 3.5 are built to be exponentially-hard for LH, not for the Lemke’s algorithm. To the best of our knowledge, the Lemke’s algorithm has never been studied on these games. Now, focus on Algorithm 6. Remember that at each step, the algorithm computes the best responses to the current strategy profile. Thus, at the first step it finds the best responses to the starting vector (s, t). One can start the algorithm in a point that have exactly the same pure best responses of one NE of the game. If that, the algorithm takes just one step to get the equilibrium, after the initialization. A point with this property is the equilibrium itself. Therefore, there always exists at least one starting point that makes the Lemke’s a polynomial-time algorithm to compute NE. But we have to investigate how complex is to find those points; and of course the equilibrium is not useful to this analysis. Recall the geometrical interpretation of LH for nondegenerate games. A n-simplex is divided into best response regions by hyperplanes. Points on those k-dimensional hyperplanes, k < n, are labeled with the label of adjacent regions. Points out from those hyperplanes has at most one pure best 81
response. A fully mixed NE lies strictly into the simplex, on the intersection of those hyperplanes. A pure NE lies on a vertex of the simplex. Other Nash equilibria stay on some intersection of the simplex frontier and hyperplanes. We are looking for sets of points with the same pure best responses of an equilibrium of the game. Let (A, B) be a bimatrix game with N Nash equilibria. For each NE, we give a measure to the set of points with the same pure best responses. If the NE is a pure strategy profile, the set is an n-dimensional subset of the n-simplex. Otherwise, it lies on a mdimensional subset, m < n, of the n-simplex, thus it is a null set (not a void set). Summing up the measure of all N equilibria, we obtain the following property. Theorem 12 Let (A, B) be a bimatrix game. Let Q ⊆ ∆n × ∆m be the set of points (x, y) ∈ ∆n × ∆m with the same pure best responses of an equilibrium of (A, B). If (A, B) has not pure Nash equilibria, then Q is a null set. Otherwise, Q has a measure different from 0. If we knew the pure best responses of a NE, we could quickly find the NE through Lemke’s algorithm. But it is not. And since we do not know a priori if a game has pure Nash equilibria, this result does not help us. In fact, we should enumerate all sets of pure best responses and then check -in polynomial time- if there exists a NE with those best responses. The problem is that this enumeration takes exponential time, because it is substantially equivalent to a support enumeration. The algorithmic interest comes from our attention on set measures. We gain a negative result. We could randomly start Lemke’s algorithm on the simplex. But random sampling will never start from a point in a null set, with probability one. Therefore, a point with the same pure best responses of a NE is never randomly chosen as the starting vector of Lemke’s algorithm, unless it is a pure NE. At least, some considerations about the case of hard-to-solve games. Recall they have only one NE, that is fully mixed. A fully mixed equilibrium is reachable in polynomial time by Lemke’s algorithm from: • the equilibrium itself. Indeed, of course, the best responses to the equilibrium are exactly the strategies constituting the equilibrium. So Algorithm 6 solves the game in just one step after the initialization. • the origin (0, 0). In fact, for any (A, B) the expected utility to play 0 is always 0. Therefore, every pure strategy is best response. Note that the origin (0, 0) does not belong to ∆n × ∆m , so it is not a proper strategy. The general Lemke’s algorithm could be started out from ∆n × ∆m , but it does not guarantee to avoid ray termination. Anyway in this case the algorithm find a solution. Lemke’s algorithm started at (0, 0) solves in polynomial time the hard-to-solve games. Unfortunately, the 82
One could wonder whether there exists such a game in which all Lemke paths are exponentially long. The answer is of course negative: because the algorithm can start at an arbitrary point of ∆n × ∆m, there must be some points (x, y) which have the same best responses as the equilibrium; at worst, there is the equilibrium itself. However, we have no way of knowing efficiently what those points are, and it is in practice impossible to guess them with random sampling. It is relevant to notice that those results are not valid for Algorithm 7. Algorithms 6 and 7 always lead to the same equilibria, but the initialization steps can have different lengths: Algorithm 7 takes exponential time even when initialized at an equilibrium of hard-to-solve games. The two versions of Lemke's algorithm are thus not equivalent.
6.3 Implementation, Tuning and Experimental Analysis

Lemke's algorithm is implemented similarly to LH: it is entirely written in C, resorting to the GMP library [11], and we used the Revised Method and Integer Pivoting to enhance the classical complementary pivoting. The first issue to face is how to randomly start the algorithm. While LH has a finite set of points to choose from, here we have a (theoretical) continuum of points on the simplices. A known method for this purpose comes from the study of the Dirichlet distribution and its relationship with the exponential distribution. A naive method that samples the variables uniformly and independently one by one, keeping them in the simplex, is wrong because it does not lead to a uniform distribution. Another way could be the so-called rejection sampling method: first sample in the unit hypercube and then reject the points that do not lie in the simplex; but this gets worse and worse as the dimension increases, because the rejections become more and more frequent. We present the adopted sampling method and refer to handbooks on random sampling for the explanation; just note that an exponentially distributed value can be obtained as −log(X), where X is a uniform random variable sampled from [0, 1].

Algorithm 8: Random Sampling on an n-simplex
1 for i from 1 to n do
2     sample X uniformly from [0, 1] and take x_i = −log(X)
3 Normalize and return the vector x
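A minimal C sketch of Algorithm 8 (ours, purely illustrative; the C library rand() stands in for whatever generator the implementation actually uses):

#include <math.h>
#include <stdlib.h>

/* Uniform sampling on the n-simplex: exponentially distributed
 * coordinates, then normalization (Dirichlet(1, ..., 1)). */
void sample_simplex(double *x, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        /* draw U from (0, 1] to avoid log(0) */
        double u = (rand() + 1.0) / ((double)RAND_MAX + 1.0);
        x[i] = -log(u);
        sum += x[i];
    }
    for (int i = 0; i < n; i++)
        x[i] /= sum;   /* now x lies on the simplex, uniformly distributed */
}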
The first analysis we conducted concerns the behavior, under Lemke's algorithm, of the hardness clusters identified with LH (Section 5.3): it is interesting to verify whether the complexity of solving a game changes substantially with Lemke's algorithm. In Figure 6.1 we plot one representative game class for each cluster. The relationships between the average number of pivoting steps and the game size do not change from LH to Lemke's algorithm: the curves maintain similar shapes, and so we identified the same clusters. DispersionGames are the exception: they become more difficult to solve, though the number of steps seems to grow sublinearly, and they move to cluster B. We tested the algorithm n times on 500 games for each dimension n from 5 to 50, with an interval of 5.

The plots on hard-to-solve games (Figure 6.2) deserve a separate note. Two curves are plotted: one for the average number of steps, and another for the minimum number of steps. An open question on Lemke's algorithm concerns its behavior when applied to find equilibria of hard-to-solve games: we wondered whether starting in the continuum of the simplices can lead to polynomially sized paths. The minimum curve (see the y-log plot in Figure 6.2 (b)) suggests that random starts are ineffective in finding paths of polynomially bounded length. Tests have been performed 5000 times for each even game size from 2 to 18 (there was only one hard-to-solve game instance per size). Even if we were always able to choose the shortest path, it would still be exponentially long: future studies about the best random choice of starting vectors cannot do better than lowering the slope of an exponential growth.

An interesting comparison is that between Lemke's algorithm and LH: we want to understand whether or not one of them is always superior. Figure 6.3 shows that Lemke's algorithm takes a lower number of steps to solve PolymatrixGame-RG and CovariantGame-RG, while LH follows shorter paths than Lemke's algorithm to find equilibria of SGC's games. But those observations do not imply that one algorithm is better than the other: since they follow different paths, we must evaluate their running times. Consider Table 6.1 at the end of this chapter. We ran the algorithms on 10 instances of 150 × 150 CovariantGame-Rand, GraphicalGame-RG and PolymatrixGame-RG, with a deadline of 10 minutes. The percentages seem to demonstrate that an instance is hard for Lemke's algorithm if and only if it is hard for LH; further, they confirm CovariantGame-Rand as the hardest game class for both algorithms. Table 6.1 also reports the running time when an equilibrium is found and the best computed ε-NE. The running times show that neither algorithm always outperforms the other; LH, though, found the best ε-NE.

From now on we focus on PolymatrixGame-RG, CovariantGame-Rand and hard-to-solve games. Our purpose is to find a tuning that increases the performance of a randomized version of Lemke's algorithm. The dimensions to be tuned are now two: the starting point and the cutoff value.
[Figure 6.1 appears here: five panels, (a) Cluster B: DispersionGame, (b) Cluster B: UniformLEG-RG, (c) Cluster C: SGC's game, (d) Cluster D: PolymatrixGame-RG, (e) Cluster D: CovariantGame-RG, each plotting the average number of steps against the game size.]

Figure 6.1: Average number of pivoting steps performed by Lemke's as a function of the game size. Note that SGC's games are tested up to dimension 99 × 99.
[Figure 6.2 appears here: two panels, (a) hard-to-solve games and (b) hard-to-solve games in y-log scale, plotting the average number of steps against the game size.]
Figure 6.2: Average number of pivoting steps performed as a function of the game size by Lemke's applied to hard-to-solve games. (b) shows both the average and the shortest paths found, in logarithmic scale.

The starting point (s, t). We looked for relationships between the path length and the values of ε-NE, ε-supp-NE, regret (Definition 4.2) and the variable z0. Note that they all lie in [0, 1] and, if one of them is exactly 0, the point is a NE. The first three metrics are computed on the strategy profile (s, t); the last one is the first value of z0 different from 1 during the execution of Lemke's algorithm. See Figure 6.4. The dataset was 500 instances of 50 × 50 PolymatrixGame-RG and CovariantGame-Rand, each executed 10 times, and a 16 × 16 hard-to-solve game, run 500 times. We only report the plots about ε-NE and z0; the others have shapes very similar to the ε-NE one (see them in Appendix A). z0 turns out to have a certain correlation with the path length for PolymatrixGame-RG and CovariantGame-Rand, though low values of z0 are quite unlikely (Figure 6.4 (c)). Hard-to-solve games do not show useful relationships. This analysis has not suggested a clear way to choose a good starting point.

We then tried other classical metrics. The distance is computed between (s, t) and the NE (x∗, y∗) reached from (s, t). We report just those showing the most interesting properties (see the others in Appendix A); a small sketch of how they can be computed follows below.

• Mean of Euclidean distances: [d2(s, x∗) + d2(t, y∗)]/2
• Mean of Chebyshev distances: [d∞(s, x∗) + d∞(t, y∗)]/2

See Figure 6.5. The dataset was 500 instances of 50 × 50 PolymatrixGame-RG and CovariantGame-Rand, each executed 10 times, and a 16 × 16 hard-to-solve game, run 500 times. Surprisingly, it turns out that the further (s, t) is from (x∗, y∗), the shorter the path. As before, this does not hold for hard-to-solve games. Moreover, the probability of finding points far from their equilibrium is quite high.
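The sketch below, in C, computes the two metrics just listed; it assumes the strategies are stored as plain double arrays, which is our illustrative choice, independent of the GMP-based implementation.

#include <math.h>

/* Euclidean distance between two vectors of length n */
static double d2(const double *a, const double *b, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
    return sqrt(s);
}

/* Chebyshev (max) distance between two vectors of length n */
static double dinf(const double *a, const double *b, int n)
{
    double m = 0.0;
    for (int i = 0; i < n; i++) {
        double d = fabs(a[i] - b[i]);
        if (d > m) m = d;
    }
    return m;
}

/* [d2(s, x*) + d2(t, y*)] / 2 */
double mean_d2(const double *s, const double *xs, int n,
               const double *t, const double *ys, int m)
{
    return 0.5 * (d2(s, xs, n) + d2(t, ys, m));
}

/* [dinf(s, x*) + dinf(t, y*)] / 2 */
double mean_dinf(const double *s, const double *xs, int n,
                 const double *t, const double *ys, int m)
{
    return 0.5 * (dinf(s, xs, n) + dinf(t, ys, m));
}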
[Figure 6.3 appears here: six panels, (a) Lemke's algorithm on SGC's games, (b) LH on SGC's games, (c) Lemke's algorithm on PolymatrixGame-RG, (d) LH on PolymatrixGame-RG, (e) Lemke's algorithm on CovariantGame-Rand, (f) LH on CovariantGame-Rand, each plotting the average number of steps against the game size.]
Figure 6.3: Comparison of average number of pivoting steps performed by Lemke’s and LH as a function of the game size.
[Figure 6.4 appears here: five panels, (a) PolymatrixGame-RG: ε-NE, (b) PolymatrixGame-RG: z0, (c) PolymatrixGame-RG: distribution of z0, (d) CovariantGame-Rand: z0, (e) hard-to-solve games: z0.]
Figure 6.4: Values of ε-NE (a) and z0 (b) of the starting points and related average number of steps with PolymatrixGame-RG. Distribution of z0 values (c): x-axis points are clustered in order to compute the average on the y-axis. Values of z0 with CovariantGame-Rand (d) and hard-to-solve games (e).
[Figure 6.5 appears here: six panels, (a) PolymatrixGame-RG: mean d2, (b) PolymatrixGame-RG: mean d∞, (c) CovariantGame-Rand: mean d2, (d) CovariantGame-Rand: mean d∞, (e) hard-to-solve games: mean d2, (f) hard-to-solve games: mean d∞.]
Figure 6.5: Distances between starting points and the Nash equilibria to which they lead. x-axis points are clustered in order to compute the average on the y-axis.
[Figure 6.6 appears here: three panels, (a) PolymatrixGame-RG: distribution of mean d2, (b) PolymatrixGame-RG: mean d2 on vertices, (c) PolymatrixGame-RG: distribution of mean d2 on vertices.]
Figure 6.6: Probability distribution of mean d2 with random starts (a). Average number of steps and mean d2 between starting points and the Nash equilibria to which they lead (b). Probability distribution (c), starting on vertices. x-axis points are clustered in order to compute the average on the y-axis.

We could experimentally observe that about 40% of the points are far enough from their equilibrium to start short paths. Consider Figure 6.5 (a): short paths start from points with mean d2 ≥ 0.6. Now, see Figure 6.6 (a): points with mean d2 ≥ 0.6 are not unlikely. But this property is hard to exploit: obviously, we cannot know in advance the actual distance between (s, t) and its equilibrium. The idea is then to start only on simplex vertices. Indeed, those points are "the farthest from everyone" and hence, likely, the farthest from the equilibria. However, the experiments in Figure 6.6 (b)-(c) do not show such an evident correlation: vertices are far from equilibria, but with high probability they start long paths, so this approach does not isolate the shortest paths. We conclude that there is no evident method to find a starting vector from which a short path begins. Random sampling is still the best way to start.
The cutoff value. As in Section 5.3, the question is when to cut a path because it is too long, and restart. Here we tried a different approach: we look for indices that estimate, step by step, the closeness to an equilibrium. We still investigate the values of ε-NE, ε-supp-NE, regret and z0. Instrumental to our analysis is the definition of a certain function on a generic index, named decr, that measures how much a function is decreasing.

Definition 28 (decr) Let n be the number of steps reached by the algorithm. For all i ∈ [n], let zi ∈ ∆n × ∆m. Let f : ∆n × ∆m → [0, 1] be a function such that zi is a NE if and only if f(zi) = 0. The function decr of f at step i is defined as

    decr(f, i) = (1 / f(z1)) · Σ_{k=1}^{i} |f(z_k) − f(z_{k+1})|
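A direct C transcription of Definition 28 might look as follows, assuming the values f(z1), ..., f(z_{i+1}) have already been collected in an array fv[0..i]; the array layout and the function name are our own choices.

#include <math.h>

/* decr(f, i) given fv[k] = f(z_{k+1}) for k = 0..i */
double decr(const double *fv, int i)
{
    double sum = 0.0;
    for (int k = 0; k < i; k++)          /* k covers steps 1..i */
        sum += fabs(fv[k] - fv[k + 1]);  /* |f(z_k) - f(z_{k+1})| */
    return sum / fv[0];                  /* normalize by f(z_1) */
}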
From the definition it simply follows that

Theorem 13 If f is a decreasing function, then decr(f, i) ≤ 1 for all i: the sum telescopes to f(z1) − f(z_{i+1}) ≤ f(z1).

Paths whose decr value exceeds upper-bound are discarded. When upper-bound = 1, the threshold is disabled.

Heuristics. We resort to Iterative Improvement (II), in the forms of Best Improvement (BI), First Improvement (FI) and First Improvement with Random generation (FIR). The random generation of the last one is limited by the usual max-trials parameter, which tells when to call the metaheuristics after too many failed improvements in the neighborhood.

Metaheuristics. We implemented Random Restart (RR) by setting the parameter max-iterations. The space of the starting vertices for LS-vertices is relatively small; in particular, its dimension is exactly n. Thanks to that, we implemented a very simple Tabu Search (TS), as done for rr-LH in Section 5.3: it memorizes the starting points already tried and avoids them in future restarts. Finally, once only one point is available to perform the restart, it is chosen with max-iterations set to infinity. Such a TS makes sense for the BI and FI heuristics, because it would be completely useless to deterministically follow an already explored path again; instead, we left the FIR heuristic free to begin at the same points. Metaheuristics are repeated until either an equilibrium has been found or a given temporal deadline expires.
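A minimal C sketch of the restart scheme just described follows; run_heuristic is a hypothetical hook standing in for one II run (BI or FI), and INT_MAX stands in for the "infinite" max-iterations of the last restart.

#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

/* one II run from vertex v; returns true when an exact NE is found */
extern bool run_heuristic(int v, int max_iterations, int max_trials);

bool restart_with_tabu(int n, time_t deadline,
                       int max_iterations, int max_trials)
{
    bool *tabu = calloc(n, sizeof(bool));  /* starting vertices already tried */
    int untried = n;
    bool found = false;
    while (!found && time(NULL) < deadline) {
        int v = rand() % n;
        if (untried > 0 && tabu[v])
            continue;                      /* avoid repeated starts */
        if (!tabu[v]) { tabu[v] = true; untried--; }
        /* when the last fresh vertex has been taken, run it with
         * max-iterations set to "infinity" (INT_MAX in this sketch) */
        found = run_heuristic(v, untried == 0 ? INT_MAX : max_iterations,
                              max_trials);
    }
    free(tabu);
    return found;
}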
7.2 Experimental Tuning
To close the chapter, we deal with the tuning of the algorithm. We evaluated the performance on a UNIX machine with a dual quad-core 2.33GHz CPU and 16GB of RAM. First, we evaluated how the parameters max-iterations and max-trials affect the performance of FIR. The dataset consists of hard instances of 100 × 100 CovariantGame-Rand that cannot be solved in 10 minutes by PNS, MIP or GMP-implemented LH. The various configurations of parameters are compared on the basis of the best equilibrium found; upper-bound is set to 1. It also turned out that LS-vertices was not able to solve these instances exactly within the deadline. See the results in Table 7.1.

                                      max-trials
max-it      n/2           n             2n            4n            n²/2
n           4.01 · 10⁻³   2.95 · 10⁻³   2.65 · 10⁻³   2.51 · 10⁻³   2.38 · 10⁻³
2n          4.41 · 10⁻³   2.97 · 10⁻³   1.41 · 10⁻³   1.11 · 10⁻³   1.14 · 10⁻³
n²          4.60 · 10⁻³   2.93 · 10⁻³   1.38 · 10⁻³   7.42 · 10⁻⁴   3.40 · 10⁻⁴
2n²         4.80 · 10⁻³   2.68 · 10⁻³   1.35 · 10⁻³   7.72 · 10⁻⁴   4.32 · 10⁻⁴

Table 7.1: Average value of the best ε-NE found within ten minutes.

max-iterations = n² and max-trials = n²/2 is the best tuning for FIR. Similar tests on BI and FI resulted in max-iterations = n² and max-iterations = 2n², respectively. Then we compared the ε-NE computed by the three heuristics designed (Table 7.2).
            ε-NE
II-BI       8.62 · 10⁻³
II-FI       6.62 · 10⁻³
II-FIR      3.40 · 10⁻⁴

Table 7.2: Comparison between the three heuristics (II-BI, II-FI and II-FIR): average value of the best ε-NE found by each heuristic within 10 minutes, over 5 executions on 5 instances of CovariantGame-Rand.

FIR outperforms both the BI and FI heuristics, as in the LS-PNS tuning (Section 4.3). At last, we estimated the upper-bound parameter for the best heuristic found, FIR. The dataset is the same as in the previous tests. We found that upper-bound ∈ {0.2, 0.3, 0.4, 0.5} worsens the performance of FIR, so we take upper-bound = 1.
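To make the role of the tuned parameters concrete, here is a minimal C sketch of one FIR improvement step; random_neighbor and quality are hypothetical helpers, not part of the thesis code. After max-trials failed random draws, control returns to the metaheuristic, which performs up to max-iterations restarts.

#include <stdbool.h>

extern int    random_neighbor(int v);   /* a random vertex adjacent to v */
extern double quality(int v);           /* e.g. the eps-NE value at v; 0 = NE */

/* returns the improved vertex, or -1 to signal that the metaheuristic
 * should take over after max-trials failures */
int fir_step(int v, int max_trials)
{
    for (int t = 0; t < max_trials; t++) {
        int w = random_neighbor(v);
        if (quality(w) < quality(v))    /* first improving neighbor wins */
            return w;
    }
    return -1;
}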
Chapter 8
Anytime Algorithms for Approximated Equilibria

In this last experimental chapter we compare the performance of most of the algorithms presented in the thesis. First of all, we implemented anytime versions of our algorithms. An anytime algorithm returns an approximated solution, an ε-NE, when it is interrupted at any time before it ends. One more algorithm, ip-LH, based on incremental game perturbations, is designed in Section 8.1 with the precise purpose of finding approximated equilibria. In Section 8.2 all of them are compared on hard game instances in terms of ε-NE values. LS-vertices turns out to be the state of the art for the computation of ε-NE.
8.1 Incremental Perturbation LH
From Section 2.5 we know that an ε-NE can be found by first perturbing a game and then computing an exact NE of the perturbed game. Precisely, a NE of a game perturbed by a σ-uniform perturbation is at least a 4σ-NE of the original game. Hence, it is simple to write an anytime randomized algorithm that solves incrementally perturbed games, with a decreasing perturbation σ, until a deadline is reached. We take σ as fractional powers of 2, from 1/8 down to 1/2³⁰. σ = 1/8 assures at least a 1/2-NE, but in practice the resulting ε is always better. If the algorithm also completes the last step, it stops having found at least an ε-NE with ε = 4/2³⁰ = 1/2²⁸ ≈ 3.7 · 10⁻⁹. At each step, LH is used to solve the perturbed game at equilibrium; hence the name Incremental Perturbation LH (ip-LH): see Algorithm 9. Note that the algorithm has no guarantee of finding an exact NE, even if it finishes the loop up to i = 30: Algorithm 9 is designed with the exclusive purpose of finding approximated equilibria. If the deadline is reached, it means that the last call to LH ran out of time, so at this step we can take the anytime value computed by LH.
Algorithm 9: ip-LH
1  Let (A, B) be a bimatrix game. Let i = 3. Let a deadline be given as input.
2  while i ≤ 30 and the deadline is not reached do
3      Choose σ = 1/2^i. Perturb the game (A, B) with a σ-uniform perturbation on all payoffs. (A′, B′) is the resulting game.
4      Make positive all the payoffs of A′ and B′.
5      Call anytime-LH on that game, with a deadline of deadline − elapsed time. Let (x, y) be the result.
6      i = i + 1.
7  Return the minimum ε-NE (x, y) found.
At last, ip-LH performs a comparison among all the values found and returns the smallest one.
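A compact C sketch of this loop is given below; game_t, profile_t and the four extern helpers are hypothetical stand-ins for the thesis routines, not their actual API.

#include <time.h>

typedef struct { double *A, *B; int n, m; } game_t;  /* bimatrix game */
typedef struct { double *x, *y; } profile_t;         /* strategy profile */

extern void      perturb_uniform(const game_t *g, double sigma, game_t *out);
extern void      make_positive(game_t *g);
extern profile_t anytime_lh(const game_t *g, time_t deadline);
extern double    epsilon_ne(const game_t *g, const profile_t *p);

profile_t ip_lh(const game_t *g, time_t deadline)
{
    profile_t best = { 0 };
    double best_eps = 1.0;                     /* eps values live in [0, 1] */
    for (int i = 3; i <= 30 && time(NULL) < deadline; i++) {
        double sigma = 1.0 / (double)(1L << i);  /* sigma = 1/2^i */
        game_t pert;
        perturb_uniform(g, sigma, &pert);      /* sigma-uniform perturbation */
        make_positive(&pert);                  /* shift all payoffs above zero */
        profile_t p = anytime_lh(&pert, deadline);
        double eps = epsilon_ne(g, &p);        /* a NE of pert is a 4*sigma-NE of g */
        if (eps < best_eps) { best_eps = eps; best = p; }
    }
    return best;                               /* minimum eps-NE found */
}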
8.2 Experimental Analysis
As already explained in Section 4.3 about LS-PNS, the anytime version of an algorithm is obtained from the original one in a very simple way: at each step, i.e., whenever an approximated solution can be computed, we keep in memory the strategy with the smallest ε and we return it at the deadline. This is done for LH and for Lemke's algorithm, in both double and arbitrary precision. Remember that the double-precision versions of the algorithms have no guarantee of reaching the actual equilibrium, due to numerical approximation. We evaluated the performance on a UNIX machine with a dual quad-core 2.33GHz CPU and 16GB of RAM. We must also remark that, in Section 4.3, LH is a previous, non-optimized version of the algorithm running in double-precision arithmetic (it comes from our previous works [6], [5]); the experiments reported here, instead, make use of our version of LH exposed in Chapter 5.

The first analysis is on critical instances of 100 × 100 CovariantGame-Rand. These games are such that neither PNS, MIP nor LH in GMP precision solves them within a deadline of 10 minutes. We wanted to compare the best epsilon equilibria found (Table 8.1). Note that ip-LH performs badly: in fact, it does not even get past the first loop iteration, with i = 3 (Algorithm 9). CovariantGame-Rand remains hard even when perturbed.

Then we focus on hard-to-solve games, looking for the smallest ε-NE. Computing good approximated equilibria for the hard-to-solve games turned out to be a very easy problem. We ran the anytime versions of LH and Lemke's algorithm, besides ip-LH, 100 times on a 16 × 16 hard-to-solve game, with a deadline of 10 minutes. The anytime LH obtained on average the smallest ε, in the magnitude of 10⁻³⁴.
algorithm      deadline 10 m
LH∗            6.21 · 10⁻³
PNS∗           8.37 · 10⁻²
MIP∗           3.11 · 10⁻³
Lemke∗         2.08 · 10⁻²
LS-PNS∗        2.00 · 10⁻³
LS-PNSr∗       4.89 · 10⁻³
LS-vertices    3.40 · 10⁻⁴
ip-LH          1.06 · 10⁻²
Table 8.1: Average ε of the ε-Nash equilibria returned by the anytime algorithms, executed 5 times per each hard CovariantGame-Rand instance; LS means local search.

Observe that we tested just the cited algorithms because the exact arithmetic of the GMP library is fundamental to solve the hard-to-solve games. ip-LH found a good approximation for hard-to-solve games, in the magnitude of 10⁻¹². It would do even better if the loop of Algorithm 9 continued beyond i = 30. But the algorithm is very bad on CovariantGame-Rand. Probably this is because games generated by random models, as CovariantGame-Rand is, are not much influenced by small random perturbations, so they may remain hard.
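The anytime bookkeeping described at the beginning of this section is trivial; a C sketch might look as follows, where step, current_profile and current_eps are hypothetical hooks into a pivoting algorithm rather than the thesis API.

#include <stdbool.h>
#include <time.h>

typedef struct { double *x, *y; } profile_t;   /* strategy profile */

extern bool      step(void);                   /* one pivoting step; true = exact NE reached */
extern profile_t current_profile(void);
extern double    current_eps(const profile_t *p);

profile_t anytime_run(time_t deadline)
{
    profile_t best = current_profile();
    double best_eps = current_eps(&best);
    while (time(NULL) < deadline) {
        if (step())                            /* exact equilibrium found */
            return current_profile();
        profile_t p = current_profile();       /* approximated solution at this step */
        double eps = current_eps(&p);
        if (eps < best_eps) { best_eps = eps; best = p; }
    }
    return best;                               /* best eps-NE seen before the deadline */
}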
Chapter 9
Summary of Results

Let us review together the Local Search algorithms developed in this thesis on the basis of their features. See Table 9.1, which compares the algorithms in terms of search spaces, starting points, heuristics and metaheuristics. Note that we did not study rr-Lemke deeply, just because we observed that it cannot do better when rr-LH fails, i.e., on CovariantGame-Rand, and we guessed it can be well tuned to easily solve PolymatrixGame-RG and GraphicalGame-RG.

LS-PNS is the unique algorithm that searches in the cartesian product of the spaces of the players' supports; the others visit the vertices of some polyhedra. Hence the differences in the heuristics, which are based on pivoting for all but LS-PNS. See [35] for formal proofs of the asymptotic sizes. rr-Lemke also has the peculiarity of being the unique algorithm that works in a continuous space. Actually, moving by pivoting, it always visits a discrete structure; but this graph is different for each start, because the starting point is chosen in the continuum of ∆n × ∆m. So, at least theoretically, rr-Lemke will never visit the same path twice.

Above, we used the word heuristic in its proper sense for LS-PNS and LS-vertices. Here we want to interpret the word in a relaxed fashion, so as to classify rr-LH and rr-Lemke as well. Indeed, complementarity pivoting could be considered a heuristic itself. Respecting the complementarity condition step by step is necessary for correctness, but it is still a choice, because we could select other entering variables without losing the feasibility of the points. In this sense, complementarity pivoting is a heuristic as much as FIR and FIRV.

rr-LH and rr-Lemke focus their attention on finding exact equilibria. With rr-LH we made tractable two classes of games (PolymatrixGame-RG and GraphicalGame-RG) that had an exponential behavior with LH. With both rr-LH and rr-Lemke we cannot face the CovariantGame-Rand class. Through rr-Lemke we were also able to solve PolymatrixGame-RG efficiently. LS-PNS and LS-vertices are designed to find good ε-Nash equilibria
in a small amount of time. While LS-PNS is also able to exactly solve small-to-medium games, LS-vertices is the fastest algorithm at computing approximated equilibria. With these two algorithms we can face the hard class of CovariantGame-Rand: indeed, we found approximated solutions faster than the algorithms proposed in the literature.
Table 9.1: Features comparison of our algorithms. We suppose square games.

rr-LH
  search space and size: vertices of the two best response polyhedra; O(2.6^n), combinatorial
  starting point:        ∈ [n] ∪ [m], random with tabu list
  heuristic:             complementarity pivoting
  metaheuristic:         random restart
  where it works well:   PolymatrixGame-RG and GraphicalGame-RG

rr-Lemke
  search space and size: vertices; O(2.6^n), continuous
  starting point:        ∈ ∆n × ∆m, random
  heuristic:             complementarity pivoting
  metaheuristic:         random restart
  where it works well:   PolymatrixGame-RG and GraphicalGame-RG

LS-vertices
  search space and size: vertices of one best response polyhedron; O(2.6^n), combinatorial
  starting point:        ∈ [n], random
  heuristic:             pivoting guided by II-FIR
  metaheuristic:         random restart
  where it works well:   ε-NE

LS-PNS
  search space and size: supports; O(4^n), combinatorial
  starting point:        ∈ {0, 1}^(2n), random with threshold
  heuristic:             II-FIRV with conditional dominance
  metaheuristic:         random restart
  where it works well:   small-medium size games and ε-NE
Chapter 10
Conclusion and Open Questions

We focused on the problem of computing a Nash equilibrium in bimatrix games. The algorithms provided by the literature allow one to solve within a short time a large number of game instances generated by GAMUT. However, there are several game classes whose instances cannot be solved by such algorithms within a reasonable time. The challenging open problem is the design of effective algorithms to tackle these games. We proposed several different anytime algorithms, resorting to Local Search techniques.

We designed and implemented LS-PNS, a Local Search version of the PNS algorithm that moves on the space of players' supports, in order to face the hardest games for PNS, MIP and LH. On these games, LS-PNS with FIRV (our ad hoc heuristic) outperforms these algorithms in finding both exact and ε-Nash equilibria.

We implemented LH with the Revised Method and Integer Pivoting. Exact arithmetic was necessary to assure the correctness of LH; we obtained it resorting to the GMP library. Then we extended LH with a random restart policy. rr-LH makes tractable the PolymatrixGame-RG and GraphicalGame-RG classes, previously considered hard.

We focused our attention on Lemke's algorithm. This algorithm can start with an arbitrary strategy profile. To the best of our knowledge, experimental analyses of it are not available in the literature. We studied its behavior on some game classes and tried to exploit some relations between the length of the paths and several indices, but our experimental campaign did not show a clear way to gain better performance. Anyway, we showed that neither LH nor Lemke's algorithm is always superior to the other.

LS-vertices derives from the pivoting method of LH in order to explore the vertices of a single polyhedron. We compared the performance of three heuristics (II-BI, II-FI and II-FIR): II-FIR showed the best results. LS-vertices is the best algorithm for finding ε-Nash equilibria.
ip-LH calls LH to iteratively solve games with smaller and smaller perturbations. It finds very small ε-Nash equilibria for hard-to-solve games, but is anyway dramatically outperformed by a simple anytime version of LH. Moreover, it works badly on CovariantGame-Rand, and probably on other randomly generated games: games generated by random models are not much influenced by small random perturbations.

CovariantGame-Rand is the hardest class of GAMUT. It shows an exponential behavior, and none of our algorithms is able to make this class tractable.

A formal proof of the worst-case complexity of Lemke's algorithm, in the version of [35], is still lacking. It is very unlikely that it turns out to be polynomial, because that would imply P = PPAD. The question is whether the hard-to-solve games also constitute worst-case instances for Lemke's algorithm.

We have given one contribution to the theoretical study of the computational complexity of LH. It is known from Theorem 7 that LH is not in smoothed polynomial time, unless PPAD ⊆ RP. But we experimentally observed that hard-to-solve games, when perturbed, become very simple to solve; hence we argue that hard-to-solve games are unstable under perturbations. This implies that hard-to-solve games do not seem to be the worst case establishing the nonpolynomial smoothed complexity. The open question is: what is the worst-case instance establishing the nonpolynomiality of the smoothed complexity of LH? We have not investigated the nonsquare hard-to-solve games built in [30], and they may answer the open question.

In future work, our intention is to construct a general-games solver. The idea is to integrate the best classical algorithms and our Local Search techniques, with a more generalized tuning of the heuristics' parameters. A further goal is to extend our studies to all the classes of GAMUT and to nonsquare hard-to-solve games.
Bibliography

[1] E.H.L. Aarts and J.K. Lenstra. Local search in combinatorial optimization. Princeton Univ Pr, 2003.
[2] D. Avis, G.D. Rosenberg, R. Savani, and B. Von Stengel. Enumeration of nash equilibria for two-player games. Economic theory, 42(1):9-37, 2010.
[3] A. Balthasar, P.J.J. Herings, M. Jurdzinski, P.B. Miltersen, E. Tardos, and B. von Stengel. Equilibrium tracing in bimatrix games.
[4] H. Bosse, J. Byrka, and E. Markakis. New algorithms for approximate nash equilibria in bimatrix games. In Proceedings of the 3rd international conference on Internet and network economics, pages 17-29. Springer-Verlag, 2007.
[5] S. Ceppi, N. Gatti, G. Patrini, and M. Rocco. Local search methods for finding a nash equilibrium in two-player games. In 2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, pages 335-342. IEEE, 2010.
[6] S. Ceppi, N. Gatti, G. Patrini, and M. Rocco. Local search techniques for computing equilibria in two-player general-sum strategic-form games. In Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems: Volume 1, pages 1469-1470. International Foundation for Autonomous Agents and Multiagent Systems, 2010.
[7] X. Chen, X. Deng, and S.H. Teng. Computing nash equilibria: Approximation and smoothed complexity. 2006.
[8] J.W. Chinneck. Feasibility and infeasibility in optimization: algorithms and computational methods, volume 118. Springer Verlag, 2008.
[9] I. CPLEX. 11.0 user's manual. ILOG SA, Gentilly, France, 2008.
[10] C. Daskalakis, P.W. Goldberg, and C.H. Papadimitriou. The complexity of computing a nash equilibrium. Communications of the ACM, 52(2):89-97, 2009.
[11] T. Granlund et al. GNU Multiple Precision Arithmetic Library 5.0.2 user's manual. http://gmplib.org/gmp-man-5.0.2.pdf.
[12] R. Fourer, D.M. Gay, and B.W. Kernighan. A modeling language for mathematical programming, 1990.
[13] M. Galassi, B. Gough, G. Jungman, J. Theiler, J. Davies, M. Booth, and F. Rossi. The GNU scientific library reference manual, 2006. URL http://www.gnu.org/software/gsl.
[14] N. Gatti and G. Staffiero. Exponentially long lh paths games and numerical precision. http://home.dei.polimi.it/ngatti/Nicola_Gatti__Software_Tools_files/report.pdf.
[15] I. Gilboa and E. Zemel. Nash and correlated equilibria: Some complexity considerations. Games and Economic Behavior, 1(1):80-93, 1989.
[16] C. Gomes, B. Selman, and N. Crato. Heavy-tailed distributions in combinatorial search. Principles and Practice of Constraint Programming-CP97, pages 121-135, 1997.
[17] C.P. Gomes. Randomized backtrack search. Constraint and integer programming: toward a unified methodology, pages 233-283, 2003.
[18] H. Hoos and T. Stutzle. Systematic vs. local search for sat. KI-99: Advances in Artificial Intelligence, pages 698-698, 1999.
[19] S. Kontogiannis and P. Spirakis. Efficient algorithms for constant well supported approximate equilibria in bimatrix games. Automata, Languages and Programming, pages 595-606, 2007.
[20] R.J. Lipton, E. Markakis, and A. Mehta. Playing large games using simple strategies. In Proceedings of the 4th ACM conference on Electronic commerce, pages 36-41. ACM, 2003.
[21] N. Megiddo. On the expected number of linear complementarity cones intersected by random and semi-random rays. Mathematical programming, 35(2):225-235, 1986.
[22] K.G. Murty. Computational complexity of complementary pivot methods. Complementarity and fixed point problems, pages 61-73, 1978.
[23] N. Nisan, T. Roughgarden, E. Tardos, and V.V. Vazirani. Algorithmic game theory. Cambridge Univ Pr, 2007.
[24] E. Nudelman, J. Wortman, Y. Shoham, and K. Leyton-Brown. Run the gamut: A comprehensive approach to evaluating game-theoretic algorithms. In Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems: Volume 2, pages 880-887. IEEE Computer Society, 2004.
[25] C. Papadimitriou. Algorithms, games, and the internet. In Proceedings of the thirty-third annual ACM symposium on Theory of computing, pages 749-753. ACM, 2001.
[26] R. Porter, E. Nudelman, and Y. Shoham. Simple search methods for finding a nash equilibrium. Games and Economic Behavior, 63(2):642-662, 2008.
[27] S. Prestwich and C. Quirke. Local search for very large sat problems. SAT.
[28] T. Sandholm, A. Gilpin, and V. Conitzer. Mixed-integer programming methods for finding nash equilibria. 20(2):495, 2005.
[29] R. Savani and B. Von Stengel. Exponentially many steps for finding a nash equilibrium in a bimatrix game, 2004.
[30] R.S.J. Savani. Finding Nash Equilibria of Bimatrix Games. PhD thesis, London School of Economics and Political Science, Dept. of Mathematics, 2010.
[31] Y. Shoham and K. Leyton-Brown. Multiagent systems: Algorithmic, game-theoretic, and logical foundations. Cambridge Univ Pr, 2009.
[32] D. A. Spielman and S. H. Teng. Smoothed analysis of algorithms and heuristics: progress and open questions, 2005.
[33] H. Tsaknakis and P.G. Spirakis. An optimization approach for approximate nash equilibria. In Proceedings of the 3rd international conference on Internet and network economics, pages 42-56. Springer-Verlag, 2007.
[34] A. von Schemde. Index and stability in bimatrix games: a geometric-combinatorial approach. Number 560. Springer Verlag, 2005.
[35] B. von Stengel. Computing equilibria for two-person games. Handbook of Game Theory with Economic Applications, 3:1723-1759, 2002.
[36] B. von Stengel, A. van den Elzen, and D. Talman. Tracing equilibria in extensive games by complementary pivoting. Discussion paper No. 9686, CentER for Economic Research, Tilburg University, 1996.
Appendix A

Other Plots

[Figures A.1-A.16 appear here; only their captions are reproduced below. In every plot, x-axis points are clustered in order to compute the average on the y-axis.]

Figure A.1: Relation between the average number of steps and the cutoff. PolymatrixGame-RG of size 5-20.
Figure A.2: Relation between the average number of steps and the cutoff. PolymatrixGame-RG of size 25-50.
Figure A.3: Relation between the average number of steps and the cutoff. PolymatrixGame-RG of size 55-65.
Figure A.4: Relation between the average number of steps and the cutoff. GraphicalGame-RG of size 5-10.
Figure A.5: Relation between the average number of steps and the cutoff. GraphicalGame-RG of size 15-40.
Figure A.6: Relation between the average number of steps and the cutoff. GraphicalGame-RG of size 45-65.
Figure A.7: Relation between the average number of steps and the cutoff. CovariantGame-Rand of size 5-30.
Figure A.8: Relation between the average number of steps and the cutoff. CovariantGame-Rand of size 35-50.
Figure A.9: Values of ε-NE (a), ε-supp-NE (b), regret (c) and z0 (d) of the starting points and related average number of steps with PolymatrixGame-RG.
Figure A.10: Values of ε-NE (a), ε-supp-NE (b), regret (c) and z0 (d) of the starting points and related average number of steps with CovariantGame-Rand.
Figure A.11: Values of ε-NE (a), ε-supp-NE (b), regret (c) and z0 (d) of the starting points and related average number of steps with hard-to-solve games.
Figure A.12: d1, d2 and d∞ distances between starting points and the Nash equilibria to which they lead. PolymatrixGame-RG.
Figure A.13: Cosine and correlation distances between starting points and the Nash equilibria to which they lead. PolymatrixGame-RG.
Figure A.14: d1 distances between starting points and the Nash equilibria to which they lead. CovariantGame-Rand.
Figure A.15: d2, d∞ and correlation distances between starting points and the Nash equilibria to which they lead. CovariantGame-Rand.
Figure A.16: Mean of d1, d2, d∞ and correlation distances between starting points and the Nash equilibria to which they lead. Hard-to-solve games.