Proceedings of the 2001 IEEE Congress on Evolutionary Computation Seoul, Korea • May 27-30, 2001
Why More Choices Cause Less Cooperation in Iterated Prisoner's Dilemma

Paul J. Darwen
Department of Computer Science and Electrical Engineering
The University of Queensland, Brisbane, Queensland 4072, Australia
Email: [email protected]

Xin Yao
Department of Computer Science
The University of Birmingham, Edgbaston, Birmingham B15 2TT, United Kingdom
Email: [email protected]
Abstract: Classic Iterated Prisoner's Dilemma (IPD) offers only 2 choices, cooperate or defect. However, most real-world situations offer intermediate responses between full cooperation and full defection. Previous studies observed that with intermediate levels, mutual cooperation is less likely to emerge, and even when it does emerge it is less stable. Exactly why has been a mystery. This paper demonstrates two mechanisms that sabotage the emergence of full mutual cooperation. First, increasing cooperation requires behavioral (phenotypic) diversity to explore different possible outcomes, and once evolution has converged somewhat on a particular degree of cooperation, it is unlikely to shift. Secondly, more choices allow a richer set of stable strategies that exclude an invader not by simply cooperating with each other, but by being symbiotic. Such non-symmetric, symbiotic players in the space of strategies act as roadblocks on the path to full cooperation.
1 Introduction

In a world torn by conflict, there is keen interest in how mutually beneficial cooperation can emerge spontaneously, with no higher authority than the rule of the jungle. History, economics, and especially biology [1, chapter 4] provide many real-world examples of mutual cooperation without any enforcement. This cooperation occurs despite a bloodthirsty option offering more payoff to those who might exploit their more cooperative peers. Some situations offer only two choices, either full conflict or full cooperation, with nothing in between. This simple two-choice case is the most studied [1] [2] [8] [18]. The real world is often more complicated, with intermediate degrees of cooperation rather than all-or-nothing [11] [22, page 175]. This complicates the emergence of cooperation. Why this is so is the topic of this paper.

1.1 Iterated Prisoner's Dilemma

To better understand how spontaneous mutual cooperation occurs, the abstract mathematical game of Iterated Prisoner's Dilemma (IPD) is widely studied in biology, economics, political science, and artificial intelligence [1]. In its basic form, the Prisoner's Dilemma is a two-player game where each player has two choices, cooperate or defect. The payoff matrix, shown in Figure 1, satisfies the following conditions [1]:
0-7803-6657-3/01/$10.00 © 2001 IEEE
• T > R and P > S (defection always pays more)
• R > P (mutual cooperation beats mutual defection)
• 2R > T + S (alternating doesn't pay)

              Cooperate   Defect
  Cooperate      R           S
  Defect         T           P

Figure 1: The payoff matrix for the 2-choice 2-player prisoner's dilemma game, showing the payoff to the row player: R for mutual cooperation, P for mutual defection, T for the temptation to defect, and S for the sucker's payoff. The values must satisfy T > R > P > S and 2R > T + S. Many parameter settings satisfy these conditions.

In the iterated prisoner's dilemma, the game in Figure 1 is played not just once, but many times, with a memory of previous iterations. This allows the possibility of retaliation and mutual cooperation.
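As a concrete check, the three conditions can be verified mechanically. The sketch below uses Axelrod's classic values T = 5, R = 3, P = 1, S = 0 purely as an assumed illustration; the specific values used in this paper are not stated here.

```python
def is_prisoners_dilemma(T, R, P, S):
    """True if a 2-choice payoff matrix is a Prisoner's Dilemma."""
    defection_pays = T > R and P > S           # defection always pays more
    cooperation_beats_defection = R > P        # mutual cooperation beats mutual defection
    alternating_doesnt_pay = 2 * R > T + S     # taking turns exploiting is worse than cooperating
    return defection_pays and cooperation_beats_defection and alternating_doesnt_pay

# Axelrod's classic values, assumed here for illustration only.
print(is_prisoners_dilemma(5, 3, 1, 0))  # True
```

Any corner values passing this test define a valid 2-choice game; the interpolation scheme of Section 1.2 then extends them to intermediate levels.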
1.2 IPD with More Choices

Classic IPD has only two choices, full cooperation or full defection. Some recent papers add intermediate levels of cooperation to the IPD game [11] [12] [22] [23]. The general conclusion from those studies is that mutual cooperation still happens with intermediate choices, but neither as often nor as stably as in the 2-choice case. Frean notes that "... the average level of cooperation remains high throughout, so although not stable, as a group these strategists are cooperative ... there is a restless interplay of invasions and re-invasions between strategists that are broadly cooperative" [11, page 558]. Similarly, "... any cooperative behavior that did arise did not tend toward complete cooperation" [12, pages 140-142]. Previous work lacks an explanation of why these intermediate choices should so complicate the emergence of full mutual cooperation. This paper verifies these earlier observations, and demonstrates two mechanisms for why. Following previous work [11] [12], intermediate payoffs are a linear interpolation of the 2-choice game. For example, if each player has four choices (instead of the usual two), the payoff matrix is in Figure 2, which shows the payoff to the player on the left, player A.
Figure 2: Four-choice IPD payoff matrix, showing the payoff to player A, interpolated from the 2-choice game. Each player chooses one of four evenly spaced cooperation levels, -1, -1/3, +1/3, +1, where -1 is full defection and +1 is full cooperation. Note that any 2 x 2 sub-matrix of the 4 x 4 game is itself a 2-choice IPD.

Two points to note: the four corners of an n-choice n x n payoff matrix are the same payoffs as for the 2-choice game, and any 2 x 2 sub-matrix of an n x n payoff matrix is itself a 2-choice IPD. This paper will mostly consider the case where each player has eight evenly discretized choices of cooperative level, giving an 8 x 8 payoff matrix. In the n-choice case, payoffs must satisfy Equations 1 through 3 to be a Prisoner's Dilemma game. These conditions generalize from the 2-choice conditions in Section 1.1. Here, c_A and c_B are the cooperative levels of two players A and B, and p_A(c_A, c_B) is the payoff to player A.
• Defection pays more. For δ > 0 and constant c_B:

    p_A(c_A - δ, c_B) > p_A(c_A, c_B)    (1)

• Both cooperating pays more than both defecting. For δ > 0 and constant c:

    p_A(c + δ, c + δ) > p_A(c, c)    (2)

• Alternating doesn't pay. For c_A > c_B:

    p_A(c_A, c_B) + p_A(c_B, c_A) < 2 p_A(c_A, c_A)    (3)

1.3 Co-Evolutionary Learning

To gain insight into why mutual cooperation does or does not emerge in IPD, current research follows political scientist Robert Axelrod, who imposed a co-evolutionary dynamic on a population of trial strategies playing against each other [2]. In this approach, a computer maintains a population of trial strategies. An evaluation function judges the quality of each trial strategy in the population. Instead of a human programmer writing that evaluation function, in co-evolution a strategy's fitness is evaluated by its peers in the same population. To learn a game like IPD, a strategy's fitness is its average payoff from playing all the other strategies in the same evolving population. As the population improves, its members become more challenging opponents, and the evaluation function becomes more discerning. The aim is to set up an escalating arms race of innovation. Co-evolution has also worked on Checkers [6], Backgammon [7] [20], and on non-game tasks [17] [19] [21], such as scheduling [13] [15] and creating a sorting algorithm [14].
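Returning to the payoff structure of Section 1.2, the interpolation and its three Prisoner's Dilemma conditions (defection pays more; mutual cooperation beats mutual defection; alternating doesn't pay) can be sketched numerically. Bilinear interpolation is one natural reading of "linear interpolation of the 2-choice game", and the corner values T = 5, R = 3, P = 1, S = 0 are again an assumed example, not necessarily the paper's own.

```python
import itertools

# Assumed corner payoffs (Axelrod's classic values, for illustration only).
T, R, P, S = 5.0, 3.0, 1.0, 0.0

def payoff_A(ca, cb):
    """Bilinearly interpolated payoff to player A, for cooperation levels
    ca, cb in [-1, +1] (-1 = full defection, +1 = full cooperation)."""
    u, v = (ca + 1) / 2, (cb + 1) / 2  # rescale to [0, 1]
    return u * v * R + u * (1 - v) * S + (1 - u) * v * T + (1 - u) * (1 - v) * P

def levels(n):
    """n evenly spaced cooperation levels from -1 to +1."""
    return [-1 + 2 * i / (n - 1) for i in range(n)]

def satisfies_conditions(n):
    """Check the three PD conditions on every pair of levels in the grid."""
    ls = levels(n)
    for lo, hi in itertools.combinations(ls, 2):  # lo < hi
        # Defection pays more, for any fixed opponent level.
        if not all(payoff_A(lo, b) > payoff_A(hi, b) for b in ls):
            return False
        # Mutual cooperation at a higher level beats a lower one.
        if not payoff_A(hi, hi) > payoff_A(lo, lo):
            return False
        # Alternating exploitation pays less than settling at the higher level.
        if not 2 * payoff_A(hi, hi) > payoff_A(hi, lo) + payoff_A(lo, hi):
            return False
    return True
```

With these corner values the four corners of the interpolated matrix reproduce the 2-choice game exactly, and the conditions hold on the 2-, 4-, and 8-choice grids.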
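The co-evolutionary evaluation described in Section 1.3, where each strategy's fitness is its average payoff against every other member of the same population, can be sketched as follows. Here `play_game` is a hypothetical stand-in for a full IPD match between two strategies.

```python
def coevolutionary_fitness(population, play_game):
    """Fitness of each strategy is its average payoff against all *other*
    strategies in the same population.  `play_game(a, b)` is assumed to
    return the average per-round payoff to `a` when playing against `b`."""
    fitness = []
    for me in population:
        payoffs = [play_game(me, other) for other in population if other is not me]
        fitness.append(sum(payoffs) / len(payoffs))
    return fitness
```

As the population improves, the same round-robin loop automatically produces a harder evaluation function, which is the arms-race effect the text describes.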
2 Experimental Setup

2.1 Feed-Forward Neural Network Representation

Intermediate choices would make for a cumbersome look-up table. A more convenient way to represent an IPD strategy is as a feed-forward neural network. Each member of the population is a fixed-length array of floating-point numbers. These represent the weights and biases of a feed-forward neural network with a fixed architecture, with a single layer of hidden nodes and one output node. This paper uses 10 hidden nodes; more than that made no noticeable difference. Previous work found different behavior for 2 and 20 hidden nodes [12]. In each iteration of IPD, players move simultaneously. They remember a finite number of such moves into the past. Most previous work has used a memory of three previous rounds. This paper mostly uses a memory of just one round. This is partly because many strategies of interest (such as "Tit for Tat") only use a memory of one previous round. Also, such a short memory makes it easy to visualize a strategy, as it is a function of only two variables: one's own and the opponent's previous cooperation level. This makes it easy to display strategies later in this paper. For each past round that is remembered, the neural network has four input nodes:

• One's own previous level of cooperation, in [-1, 1].
• The opponent's earlier level of cooperation.

• An input which is 1 if the opponent exploited the player, and zero otherwise.

• An input which is 1 if the opponent was exploited by the player, and zero otherwise.

For the last two inputs, "exploited" is merely the sign of the difference in the previous cooperation levels. Even if a player is exploited by only a single fine-grained difference in cooperation, the corresponding input will still be 1. Thus, players know immediately if they are being exploited, even by a small amount. The last two inputs are just functions of the first two. So although each neural network has four input nodes, it is a function of two variables: one's own previous level of cooperation and the opponent's. This makes it easy to visualize a strategy. Each individual is a neural network with one output node. The hidden nodes and the output node are all sigmoided, to stay in the range [-1, 1], using the hyperbolic tangent tanh as the sigmoid function. After sigmoiding, the output is discretized so that all the values are equidistant from each other. For example, for 4 choices of cooperation, the output node's value in the range [-1, 1] would be adjusted to the closest of the values -1, -1/3, +1/3, +1. This discretized output is the player's choice of cooperation for the current move. Thus, inputs for the next round are discretized. On the very first round of a game of IPD, there are no previous rounds on which to base a decision about the next round. To fill this gap, each strategy's genotype contains two extra values that act as the imaginary pre-game "previous" levels of cooperation. Each of the ten hidden nodes has six weights: four input weights, an output weight, and a bias. An extra bias term for the final output, plus these two "pre-game" inputs, gives a genotype of 63 floating-point numbers.

2.2 Other Parameters

The settings used here are quite conventional, and include the following. Mutation is Gaussian, with a standard deviation of 0.5. The probability of mutating a particular allele is 0.03, averaging around 2 mutations in a 63-allele genotype. Crossover is uniform, with probability 0.5. An offspring suffers either mutation or crossover, but not both, so there is a 0.5 chance of being mutated. Population size is 100. Replacement is 50%, so the worse half of the population is deleted and replaced by the offspring of the better half. The unchanged strategies in the best half are re-evaluated; the old fitness values do not get copied. Selection was stochastic universal selection with linear ranking [3]. The best player in the population expects 1.2 offspring, and the worst expects 0.8 offspring. This weak selection pressure aims to prevent premature convergence. A modest number of runs at different parameters did not reveal any great sensitivity. An exception is that (as one would expect) high mutation combined with high selection pressure tended to cause premature convergence. To make this less likely, the replacement rate is only 50% of the population, instead of more typical values in the high nineties.

2.3 The Shadow of the Future

In IPD, the "shadow of the future" [1, page 13] is an issue. If the number of iterations in a game of IPD is known in advance, then players have no incentive to cooperate on the very last move, because there is no risk of retaliation. But if every player thus plans to defect on the very last move, then there is no incentive to cooperate on the second-last move (seeing as nobody will cooperate on the last move anyway). And so on back to the start of the game. Thus, cooperation only emerges if the game length is somehow kept uncertain. A popular way to keep game length uncertain is to have a fixed probability of ending the game on each successive move [2]. However, in this paper, individual strategies are represented in such a simple way (feed-forward neural networks) that they cannot count. This lets us use a fixed-length game, with no chance of the players figuring out when the last (or second-last) move will be. Here, game length is 150 iterations, comparable to the average length used by Axelrod [2].

3 Results

3.1 Classic Two-Choice Evolution of Cooperation

For the two-choice game, as Axelrod [2] observed, starting out with a population of random strategies gives the behavior shown in Figure 3. Average payoffs initially plummet as naïve cooperators are exploited. Exploiters soon run out of victims, after which strategies that mutually cooperate but retaliate against defectors come to dominate. Mutual cooperation remains dominant for long periods of time, punctuated by rare mass extinctions [18] [8, page 287] [12, page 141].
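Stepping back to the representation of Section 2.1, one move of the memory-one network strategy can be sketched as follows. The genotype layout used here (6 weights per hidden node, then the output bias, then the two pre-game levels) is an assumption about ordering; the paper specifies only the counts, which total 63.

```python
import math

N_HIDDEN = 10
N_CHOICES = 8
LEVELS = [-1 + 2 * i / (N_CHOICES - 1) for i in range(N_CHOICES)]

def move(genotype, my_last, their_last):
    """Next cooperation level, given both players' previous levels."""
    assert len(genotype) == 6 * N_HIDDEN + 3
    # The four inputs: both previous levels, plus two "exploited" flags,
    # which are just the sign of the difference of the first two inputs.
    exploited = 1.0 if their_last < my_last else 0.0  # opponent exploited me
    exploiter = 1.0 if my_last < their_last else 0.0  # I exploited the opponent
    inputs = (my_last, their_last, exploited, exploiter)
    out = genotype[6 * N_HIDDEN]  # output-node bias
    for h in range(N_HIDDEN):
        w = genotype[6 * h : 6 * h + 6]  # 4 input weights, output weight, bias
        act = math.tanh(sum(x * wi for x, wi in zip(inputs, w[:4])) + w[5])
        out += act * w[4]
    out = math.tanh(out)  # squash into [-1, 1]
    # Discretize to the nearest of the n evenly spaced cooperation levels.
    return min(LEVELS, key=lambda lv: abs(lv - out))
```

On the first round, the two pre-game genotype values (the last two alleles) would be supplied as `my_last` and `their_last`.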
Figure 3: A typical run with 2 choices of full cooperation or full defection, showing the "evolution of cooperation" [2]. Early in the run, all four outcomes in the 2 x 2 payoff matrix occur. Later in the run, when mutual cooperation dominates, the other three outcomes in the payoff matrix rarely occur.

Mutual cooperation is not inevitable, even in the 2-choice game. A population may get stuck on mutual defection for a long period of time. When this happens, only one of the four outcomes in the 2 x 2 payoff matrix is being sampled. This is not necessarily a permanent state of affairs, because the noise-free 2-choice IPD game does not have an uninvadable strategy [4] [10]. Mutation may eventually produce an invader strategy that is more cooperative. However, this is an improbable and therefore rare event. Once a population has settled into all-defection, mutation may take a long time (if ever) to produce a small number of suitable invader cooperators to take over. To demonstrate, Figure 4 shows a run where the initial population consists of strategies that always defect, but are only a few mutations away from the cooperative Tit-for-Tat strategy.

Figure 4: This run's initial population consists of strategies that always defect, but which are only a few mutations away from the cooperative Tit-for-Tat strategy. Mutation eventually hits on the right changes to create a few successful invaders who take over the population, but it is a rather long wait.

Even in this case, where the all-defect strategy is specifically mutation-friendly, it takes some time for mutation to produce some suitably cooperative invaders. The point is, once behavioral diversity disappears, there is little variation for natural selection to act on to increase cooperation. Increasing cooperation, when the whole population is already cooperating at a lower level, is rare. Mutation is a slow substitute for behavioral diversity.

3.2 Intermediate Choice Causes Less Cooperation

Earlier studies have found that allowing degrees of cooperation (rather than only full cooperation or full defection) still achieves some degree of mutual cooperation, but with a "restless interplay of invasions and re-invasions" [11, page 558], where "... any cooperative behavior that did arise did not tend toward complete cooperation" [12, pages 140-142]. With a choice of 8 levels of cooperation (instead of just 2), Figure 5 shows the score of the best individual out of each generation for 10 different runs. This seems to agree with those earlier studies. Unlike the 2-choice case in Figure 3, mutual cooperation is less certain in Figure 5. Note that in Figure 5, payoffs stick at a particular level for a while, before changing erratically and perhaps settling on another level. Only one of the ten runs is stuck at the lowest payoff, from full defection. But most runs stay stuck at an intermediate level for long periods. The question is, if a population can usually increase its degree of cooperation from the bottom level, why doesn't it keep increasing? Why not ratchet all the way up early in the run, and stay up?
Figure 5: Ten runs with 8 choices of cooperation. As previous studies have shown, there is usually some degree of mutual cooperation, but it is not particularly stable.

3.3 One Cause: Lack of Behavioral Diversity

As mentioned in Figure 2, any 2 x 2 sub-matrix of the 8 x 8 payoff matrix of the 8-choice IPD is itself a 2-choice IPD. One can verify (not shown here) that using a 2-choice payoff matrix whose values are a sub-matrix of the 8 x 8 game will still result in mutual cooperation, just as in Figure 3. This suggests that the population should work its way up one level at a time, since the top corner of one 2 x 2 sub-matrix is also the bottom corner of another 2 x 2 sub-matrix. But this doesn't consistently happen with the 8-choice game. Consider a run from Figure 5, for which Figure 6 shows the best and average payoff for each generation. Payoff remains at a constant level, often for long periods. Brief and rapid changes punctuate these stable periods.
Figure 6: The population's best and average scores in a run with 8 choices of cooperation, one of those in Figure 5. The degree of cooperation is usually stuck on some intermediate level, with occasional rapid changes, up or down.
What outcomes are happening? From the 8 x 8 payoff matrix, one can plot a histogram of which outcomes occur in all the games in a generation. Consider the period from generations 340 to 350 in Figure 6, where average payoff goes up a notch. What choices are being made then? Figure 7 shows histograms of which choices (from the 8 x 8 payoff matrix) are being made during the course of generations 340, 344, 346, and 350. It is during this brief period that the payoff level in Figure 6 goes up a notch.
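Such a histogram is simply a tally of every (level of player 1, level of player 2) pair played anywhere in a generation; a minimal sketch:

```python
from collections import Counter

def outcome_histogram(rounds, n_choices=8):
    """Tally how often each cell of the n x n payoff matrix occurred.
    `rounds` is an iterable of (level_a, level_b) index pairs, one per
    round played anywhere in the generation."""
    counts = Counter(rounds)
    return [[counts[(a, b)] for b in range(n_choices)]
            for a in range(n_choices)]
```

A generation with almost all mass in a single cell, like generation 340 in Figure 7, is one with almost no behavioral diversity.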
Figure 7: Histograms of which choices (out of the 8 x 8 payoff matrix) are being made during generations 340, 344, 346, and 350 in the run shown in Figure 6, during one of the brief changes in payoff level.

What's happening in Figure 7 is that there is little behavioral (i.e., phenotypic) diversity in generation 340: almost everyone is doing the same thing, choosing the second-lowest level of cooperation. With no variation in behavior, evolution stagnates. Eventually, mutation does produce an invader strategy that can cooperate at the next-highest level. This is analogous to the 2-choice game when mutual defection dominates, and a mutation eventually produces a more cooperative strategy which can invade, as was shown in Figure 4. The reason the 8-choice game takes so long to move up to the next level is that behavioral diversity disappears so quickly. In the 2-choice game (Figure 3), there is a great deal of diversity early on due to the random initial population, and it explores all 4 possible outcomes. But in the 8-choice game, the population doesn't explore the 64 possible outcomes for long. Payoff rises to a particular level by making use of that initial diversity, but once that diversity dies out it remains stuck at the same level for a long period, until mutation eventually produces a suitable invader. Similarly, if the 2-choice game happens to settle on all-defection, it takes an equally long period of time for mutation to create a player at the next higher level of cooperation (the highest, there being only 2 such levels). This was shown in Figure 4.

3.3.1 Verification: More Diversity Promotes Cooperation

As a test, more mutations (to create more behavioral diversity) should make mutual cooperation more likely in the 8-choice game. Instead of a per-allele mutation rate of 0.03 with Gaussian mutation step size of 0.5 (i.e., a few large changes), this section uses a per-allele mutation rate of 0.2 with step size of 0.15 (i.e., many more, smaller changes).
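The two mutation regimes just compared can be expressed as one parameterized operator, sketched below:

```python
import random

def mutate(genotype, rate, sigma, rng=random):
    """Per-allele Gaussian mutation: each allele is perturbed with
    probability `rate` by noise drawn from N(0, sigma^2)."""
    return [g + rng.gauss(0.0, sigma) if rng.random() < rate else g
            for g in genotype]

# Low-diversity regime: on average ~2 of 63 alleles change, by large steps.
# child = mutate(parent, rate=0.03, sigma=0.5)
# High-diversity regime: on average ~13 of 63 alleles change, by small steps.
# child = mutate(parent, rate=0.2, sigma=0.15)
```

The total expected perturbation is similar in both regimes; what changes is how many alleles, and hence how many behaviors, get explored per generation.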
Figure 8: Ten runs with 8 choices of cooperation, with more mutations to cause extra behavioral diversity. All runs achieve the highest or second-highest level of cooperation.

Figure 8 shows 10 runs with this higher level of mutation. In contrast to the previous low-mutation runs, here all ten runs achieve either the highest or second-highest level of mutual cooperation. This shows how more phenotypic diversity (created simply by raising the mutation rate) makes mutual cooperation more likely in the 8-choice IPD. There are only 64 possible outcomes in the 8-choice IPD. A population in the hundreds, with a rather high mutation rate, samples enough of those 64 outcomes to discover mutual cooperation. However, more mutation is not a panacea. With more possible choices, this simple expedient cannot hope to provide sufficient phenotypic diversity. Consider the case of an almost continuous IPD, where each player has 1,024 choices. That makes more than a million possible outcomes in each round. Even with the higher mutation rate, Figure 9 shows that mutual defection is the most likely outcome.

3.4 Another Cause: Stable Non-Symmetric Strategies

Even with the higher mutation rate, one of the runs in Figure 8 does not appear to settle on any particular level, but wanders around, as shown in Figure 10. Why is that? Closer inspection reveals that the population has formed two distinct species. Representatives from generation 520 of Figure 10 are shown in Figures 11 and 12. The entire population consists of a minority like that in Figure 11, and a majority that is the other way around, with a larger valley floor, as in Figure 12.
Figure 9: Ten runs with 1,024 choices of cooperation, with the same higher mutation rate as in Figure 8. There are simply too many possibilities to sample very many of them with a population of only 100. As a result of insufficient phenotypic (behavioral) diversity, mutual defection is once again the most common outcome, in 8 runs out of 10.

Figure 10: This run, one of those in Figure 8, does not appear to settle down to a constant level after generation 400.

Figure 11: Generation 520 of Figure 10 consists of 29% strategies like this one (individual 0).

Figure 12: Generation 520 of Figure 10 consists of 71% strategies like this one (individual 16).

Following each player's moves round-by-round, due to the way that a strategy's first move interacts with subsequent moves, each of the two types does not achieve full cooperation with its own type, but it does with the other type. To be precise, those 29% of individuals in generation 520 which resemble the individual in Figure 11 (plateau on the right) make their first move at the second-lowest level of cooperation. When both inputs are at that level, the output is also at the same level, so two such individuals achieve a low payoff against each other. Similarly, those 71% of individuals in generation 520 which resemble the individual in Figure 12 (plateau on the left) make their first move at the second-highest level of cooperation. When both inputs are at that level, the output is also at the same level, so two such individuals achieve a rather high payoff against each other, but still not the highest and most profitable level of cooperation. But when one type plays against the other, their first few rounds lead to full defection, followed by a gradual increase in the level of cooperation, until finally both play at full cooperation for the rest of the game. In that sense, the population's members bear a close resemblance to the hand-written strategy called "raise the stakes" that was created for an earlier study [22]. That strategy cooperated with itself. Here, in contrast, two symbiotic strategies have evolved that also raise their level of cooperation, but they rely on each other to maintain their dominance. The reader might ask: why can't the higher mutation rate produce some suitable invader, one that cooperates with itself and with these two, to take over? It turns out that the two strategies in Figures 11 and 12 are surprisingly robust to mutant invaders. They are even robust against seeding the population with various hand-written strategies, such as Tit-for-Tat. These would-be invaders are ruthlessly wiped out by the two symbiotic strategies already present.
Just like the 2-choice IPD [4] [10], IPD with more choices does not allow a strategy (or a finite group of strategies) that is perfectly invasion-proof, unless there is random noise. Even so, there are still strategies which can survive for surprisingly long periods, such as "Tit for Tat". The two strategies in Figures 11 and 12 are similarly stable. The novelty is that they are asymmetric and symbiotic.
4 Discussion

4.1 Behavioral Diversity Helps

In IPD with only 2 choices, the 2 x 2 payoff matrix gives only 4 possible outcomes. Each possible outcome offers only 2 choices, so with a memory of only one previous round, there are only 2^4 = 16 possible deterministic strategies. This small space of behaviors is easy to search. But for n choices, there are n^2 outcomes, each of which offers n choices. That gives n^(n^2) possible behaviors. As soon as evolution weeds out the poor players and goes some way toward converging, behavioral (phenotypic) diversity quickly becomes too low to sample any more than a tiny fraction of those n^(n^2) possibilities. When the evolving population converges on a particular level of cooperation, the population is as stable there as a 2-choice population would be if it were stuck on full defection. This is because any 2 x 2 sub-matrix of the n x n payoff matrix is itself an IPD. It takes a long time for mutation to create an invader who can push cooperation up a notch, as shown in Figure 4. Causing more behavioral diversity (implemented here by a higher mutation rate) can push a population away from such stagnation, as shown in Figure 8. But this simple fix cannot help when there are simply too many choices, as shown in Figure 9.

4.2 Behavioral Diversity is not Genetic Diversity

It is important to distinguish between behavioral and genetic diversity. High genetic diversity, i.e., a high diversity of genotypes, does not necessarily result in high behavioral diversity, especially when there is no one-to-one mapping between genotypes and phenotypes. Because we can only evaluate a phenotype's fitness, and use that as a proxy for the corresponding genotype's fitness, it is the phenotypic diversity that matters most here. Our earlier work [9] showed how, on this same learning problem, a high genotypic (genetic) diversity can perversely correspond to a low phenotypic (behavioral) diversity.
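The counting argument above can be made concrete. A deterministic memory-one strategy must pick one of n responses for each of the n^2 possible previous-round outcomes, giving n^(n^2) behaviors; this simple count ignores the first-move rule.

```python
def memory_one_strategies(n_choices):
    """Deterministic memory-one strategies for an n-choice IPD: one of n
    responses for each of the n*n possible previous-round outcomes.
    (The first-move rule is ignored in this count.)"""
    return n_choices ** (n_choices ** 2)

print(memory_one_strategies(2))            # 16: trivially searchable
print(len(str(memory_one_strategies(8))))  # 58: 8^64 has 58 digits
```

A population of 100 can sample a meaningful fraction of 16 behaviors, but not of a 58-digit number of them, which is why converged populations stagnate.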
4.3 Non-Symmetric Stable Strategies

Previous work on stability in IPD focused on a population of identical strategies, all doing the same thing with each other, and how well such a symmetric strategy can maintain itself in the face of invaders. No finite group of strategies in IPD is invasion-proof, unless there is random noise [4] [10], but strategies can maintain themselves for long periods nonetheless. The results here show that, with the extra richness of the n-choice IPD, there are other, non-symmetric strategies that can stay dominant only through symbiosis. If a population were 100% of the strategy in Figure 11, or 100% of that in Figure 12, an invader would find it easy to take over. But with both together in the same population, the asymmetry of their symbiosis makes it very hard for an invader. Such a situation is unforeseen by previous work. Even though those strategies resemble the hand-written "raise the stakes" strategy described in previous work [22], that work did not consider asymmetric and symbiotic stable strategies. The conclusion is that the n-choice game offers many more possible ensembles of quite stable strategies that an evolving population may find itself attracted to. Such islands of stability act as stop signs on the road to full cooperation. That these extra islands of stability exist only in the richer n-choice game is partly why it is less likely to reach full cooperation than the 2-choice game. Incidentally, this is also interesting because it is an example of speciation in the absence of geographic barriers to prevent genetic mixing. In biology, speciation often occurs as a result of barriers, such as with Darwin's finches on the Galapagos archipelago. Sometimes, however, species emerge within overlapping regions, in the absence of any geographic barriers. A vivid example is the four species of tern (a kind of sea bird) on the Black Sea island of Jorilgatch [5, page 451].
5 Conclusion

Previous papers have found that for IPD with intermediate choices, mutual cooperation is less likely and appears to be less stable. Exactly why has been a mystery. This paper demonstrates two causes. First, when evolution has to some extent converged, the behavioral diversity of the population plummets, so that all strategies are choosing the same outcome from the n x n payoff matrix. Because any 2 x 2 sub-matrix of the n x n payoff matrix is itself an IPD game, this situation is analogous to a population in which all players defect all the time; it takes a long while for a mutation to produce a more cooperative invader strategy that moves the general level of cooperation. This is why payoffs tend to stick at a constant level for long periods, as shown in Figure 5. Causing more behavioral diversity, such as by increasing the mutation rate, should therefore make mutual cooperation more likely. This prediction is verified in Figure 8, where all runs achieve either the highest or second-highest level of cooperation. Secondly, although no single strategy is invasion-proof in IPD [4] [10], there exist strategies that can dominate an evolving population for long periods of time. However, previous work has only considered a population of identical strategies and how well they resist invasion. This paper has demonstrated that intermediate choices in IPD allow the emergence of asymmetric strategies surviving in a stable mutualism, and resisting invasion together. The two evolved strategies shown as an example bear a strong resemblance to the hand-written "raise the stakes" strategy [22].
Acknowledgments

Chris Pascoe, David Reeves, and Adrian Lee gave invaluable technical support. Discussions with the University of Birmingham's Evolutionary and Emergent Behaviour Intelligence and Computation (EEBIC) group were most helpful. This paper used a modified version of the PGAPack software package, courtesy of David Levine of the Argonne National Laboratory. This work is partly supported by the Australian Research Council.

Bibliography

[1] Robert M. Axelrod. The Evolution of Cooperation. Basic Books, New York, 1984.

[2] Robert M. Axelrod. The evolution of strategies in the iterated prisoner's dilemma. In Genetic Algorithms and Simulated Annealing, chapter 3, pages 32–41. Morgan Kaufmann, 1987.

[3] James E. Baker. Reducing bias and inefficiency in the selection algorithm. In Proceedings of the Second International Conference on Genetic Algorithms, pages 14–21. Lawrence Erlbaum Associates, 1987.

[4] Robert Boyd and Jeffrey P. Lorberbaum. No pure strategy is evolutionarily stable in the repeated prisoner's dilemma game. Nature, 327:58–59, 7 May 1987.

[10] Joseph Farrell and Roger Ware. Evolutionary stability in the repeated prisoner's dilemma. Theoretical Population Biology, 36:161–166, 1989.

[11] Marcus Frean. The evolution of degrees of cooperation. Journal of Theoretical Biology, 182(4):549–559, 1996.

[12] Paul G. Harrald and David B. Fogel. Evolving continuous behaviors in the Iterated Prisoner's Dilemma. Biosystems, 37:135–145, 1996.

[13] Jeffrey W. Herrmann. A genetic algorithm for minimax optimization problems. In Congress on Evolutionary Computation, pages 1099–1103. IEEE Press, 1999.

[14] W. Daniel Hillis. Co-evolving parasites improve simulated evolution as an optimization procedure. In Artificial Life II, pages 313–323. Addison-Wesley, 1991.

[15] Philip Husbands and Frank Mill. Simulated co-evolution as the mechanism for emergent planning and scheduling. In Proceedings of the Fourth International Conference on Genetic Algorithms, pages 264–270. Morgan Kaufmann, 1991.

[16] Daniel H. Janzen. How to be a fig. Annual Review of Ecology and Systematics, 10:13–52, 1979.

[17] Hugues Juillé and Jordan B. Pollack. Co-evolving intertwined spirals. In Proceedings of the Fifth Annual Conference on Evolutionary Programming, pages 461–468. MIT Press, 1996.

[18] Kristian Lindgren. Evolutionary phenomena in simple dynamics. In Artificial Life II, pages 295–312. Addison-Wesley, 1991.
[5] Martin Braun. Differential Equations and Their Applications. Springer-Verlag, fourth edition, 1993.
[19] Helmut A. Mayer and Roland Schwaiger. Evolutionary and coevolutionary approaches to time series prediction using generalized multi-layer perceptrons. In Congress on Evolutionary Computation, pages 275–280, 1999.
[6] Kumar Chellapilla and David B. Fogel. Evolution, neural networks, games, and intelligence. Proceedings of the IEEE, 87(9):1471–1496, September 1999.
[20] Jordan B. Pollack and Alan D. Blair. Co-evolution in the successful learning of Backgammon strategy. Machine Learning, 32(3):225–240, 1998.
[7] Paul J. Darwen and Jordan B. Pollack. Co-evolutionary learning on noisy tasks. In Congress on Evolutionary Computation, pages 1724–1731. IEEE Press, 1999.
[21] Mitchell A. Potter, Kenneth A. De Jong, and John J. Grefenstette. A coevolutionary approach to learning sequential decision rules. In Proceedings of the Sixth International Conference on Genetic Algorithms, pages 366–372. Morgan Kaufmann, 1995.
[8] Paul J. Darwen and Xin Yao. On evolving robust strategies for iterated prisoner’s dilemma. In Progress in Evolutionary Computation, volume 956 of Lecture Notes in Artificial Intelligence, pages 276–292. Springer, 1995. [9] Paul J. Darwen and Xin Yao. Does extra genetic diversity maintain escalation in a co-evolutionary arms race. International Journal of Knowledge-Based Intelligent Engineering Systems, 4(3):191–200, 2000.
994
[22] Gilbert Roberts and Thomas N. Sherratt. Development of cooperative relationships through increasing investment. Nature, 394:175–179, 9 July 1998. [23] Xin Yao and Paul J. Darwen. How important is your reputation in a multi-agent environment. In Proceedings of the 1999 IEEE International Conference on Systems, Man, and Cybernetics, pages 575–580, October 1999.