How to Design a Strategy to Win an IPD Tournament

July 28, 2006

4:38

WSPC/Book Trim Size for 9.75in x 6.5in

ws-book975x65

Chapter 4

How to Design a Strategy to Win an IPD Tournament

Jiawei Li, Harbin Institute of Technology, China

Imagine that a player in an IPD tournament knows the strategy of each of his opponents beforehand; he will defect against opponents like ALLC or ALLD and cooperate with opponents such as GRIM or TFT in order to maximize his payoff. In other words, he can interact with each opponent optimally and receive higher payoffs. Although such a priori information is not available, a strategy can be identified during the game. For example, if a strategy cooperated with its opponent in the previous 10 rounds while its opponent defected, it seems sensible to deduce that it will always cooperate. In fact, each strategy gradually reveals itself during an IPD game; moreover, it can be identified not merely after the game but possibly after only a few rounds. With an efficient identification mechanism, it is possible for a strategy to interact with most of its opponents optimally. However, two main problems must be solved in designing such a mechanism. First, it is impossible, in theory, for a strategy to identify an arbitrary opponent within a finite number of rounds, because the number of possible strategies is huge. Only strategies belonging to a predetermined finite set can be identified, and this set may be just a small proportion of all those possible, because identification is of no use if it takes too long. Second, exploring an opponent carries the risk of putting the player in a much worse position; such an action may have a negative effect on future rewards. For example, in order to distinguish ALLC from GRIM, a strategy has to defect at least once, and it thereby loses the chance to cooperate with GRIM in the future. In this chapter we discuss how to resolve these problems, how to design an identification mechanism for IPD games, and how Adaptive Pavlov, the strategy that ranked first in Competition 4 of the 2005 IPD tournament, was designed.
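The cost of that probing defection can be made concrete with a small back-of-envelope calculation. This sketch is not from the chapter; it assumes the standard payoffs (temptation 5, reward 3, punishment 1, sucker 0) and a 200-round game:

```python
# Probing trade-off: defecting once distinguishes ALLC from GRIM,
# but forfeits all future cooperation with GRIM.
ROUNDS = 200
T, R, P = 5, 3, 1  # temptation, reward, punishment payoffs

# Never probe: mutual cooperation with both ALLC and GRIM.
no_probe_vs_grim = R * ROUNDS            # 600
no_probe_vs_allc = R * ROUNDS            # 600

# Probe (defect once in round 1, then play the best continuation):
# GRIM retaliates forever, so mutual defection is the best continuation.
probe_vs_grim = T + P * (ROUNDS - 1)     # 5 + 199 = 204
# ALLC never retaliates, so exploiting it forever is the best continuation.
probe_vs_allc = T * ROUNDS               # 1000

print(no_probe_vs_grim, probe_vs_grim, probe_vs_allc)  # 600 204 1000
```

Probing pays off only when enough opponents are ALLC-like: the 400-point gain against ALLC must outweigh the 396-point loss against GRIM.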

4.1 Analysis of strategies involved in IPD games

Each strategy has its disadvantages as well as its advantages: a strategy may receive high payoffs when its opponent belongs to one set of strategies, and lower payoffs when its opponent belongs to another. Nevertheless, some strategies consistently do better than others in IPD tournaments. The strategies involved in IPDs can be classified according to whether or not they respond to their opponents. One set of strategies is fixed and plays a predetermined action no matter what the opponent has done; ALLD, ALLC and RAND are typical. Other strategies are more complicated, and their actions depend on the opponent's behavior; TFT, for example, starts with COOPERATE and then repeats its opponent's last move. The second set is clearly superior to the first, since strategies like TFT, TFTT and GRIM have always performed better than 'fixed' strategies in past IPD tournaments. The question, then, is what the optimal response to every opponent is. Is TFT's imitation of the opponent's last move the best response? Although TFT has been shown to be superior to many other strategies, it is not good enough to win every IPD tournament. Let us consider a simulated IPD tournament with 9 players: ALLC, ALLD, RAND, GRIM, TFT, STFT, TFTT, TTFT, and Pavlov. The strategies of these players are described in Table 4.1. They are simple and representative, and each has appeared in past IPD tournaments.

Table 4.1 Description of the players of the IPD simulation.

ALLC    Always plays COOPERATE.
ALLD    Always plays DEFECT.
RAND    Plays DEFECT or COOPERATE with probability 1/2 each.
GRIM    Starts with COOPERATE, but after one defection always plays DEFECT.
TFT     Starts with COOPERATE, and then repeats the opponent's moves.
TFTT    Like TFT, but it plays DEFECT only after two consecutive defections.
STFT    Like TFT, but its first move is DEFECT.
TTFT    Like TFT, but it plays DEFECT twice after an opponent's defection.
Pavlov  The result of each move is divided into two groups: SUCCESS (payoff 5 or 3) and DEFEAT (payoff 1 or 0). If the last result belongs to the SUCCESS group it repeats its move; otherwise it plays the other move.
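The players in Table 4.1 can be written as small functions from the two move histories to a next move, together with a game loop for pairing them. This is an illustrative Python sketch, not the chapter's code; in particular, Pavlov's opening move is assumed here to be COOPERATE, which the table leaves unspecified:

```python
import random

C, D = "C", "D"
# (my payoff, opponent's payoff) indexed by (my move, opponent's move)
PAYOFF = {(C, C): (3, 3), (C, D): (0, 5), (D, C): (5, 0), (D, D): (1, 1)}

def allc(me, opp): return C                      # always cooperate
def alld(me, opp): return D                      # always defect
def rand(me, opp): return random.choice([C, D])  # 1/2 probability each

def grim(me, opp):                # defects forever after one defection
    return D if D in opp else C

def tft(me, opp):                 # start C, then copy opponent's last move
    return opp[-1] if opp else C

def tftt(me, opp):                # retaliate after two consecutive defections
    return D if opp[-2:] == [D, D] else C

def stft(me, opp):                # TFT, but the first move is D
    return opp[-1] if opp else D

def ttft(me, opp):                # answer a defection with two defections
    return D if D in opp[-2:] else C

def pavlov(me, opp):              # win-stay, lose-shift; opening C assumed
    if not me:
        return C
    won = PAYOFF[(me[-1], opp[-1])][0] in (5, 3)   # SUCCESS group
    return me[-1] if won else (C if me[-1] == D else D)

def play_game(s1, s2, rounds=200):
    """Play a fixed-length game; return both players' total payoffs."""
    h1, h2, p1, p2 = [], [], 0, 0
    for _ in range(rounds):
        m1, m2 = s1(h1, h2), s2(h2, h1)
        a, b = PAYOFF[(m1, m2)]
        p1 += a; p2 += b
        h1.append(m1); h2.append(m2)
    return p1, p2
```

A round-robin tournament then just sums `play_game` scores over all ordered pairs, with each strategy also meeting a copy of itself, as the simulation rules require.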

The rule of the simulation is that each strategy plays a 200-round IPD game with every strategy (including itself). The payoffs in a round are as shown in Fig. 4.1. The total payoff received by a strategy is the sum of its payoffs throughout the tournament. The results vary between tournaments because the strategies of Pavlov and RAND make random choices; to decrease this variability, the tournament is repeated several times and the average score of each strategy is calculated.

                               Player 2's choice
                               COOPERATE   DEFECT
Player 1's choice  COOPERATE   (3, 3)      (0, 5)
                   DEFECT      (5, 0)      (1, 1)

Fig. 4.1 Payoff table of the IPD tournament. The numbers in brackets denote the payoffs the two players receive in a round of a game.

Simulation results show that TFT, TFTT and GRIM achieve clearly higher scores than the others, and their average scores across several tournaments are quite close. TFTT, however, wins a single tournament more often than the others: for example, TFTT wins 11 tournaments out of a total of 20, while TFT wins 4 and GRIM wins 5. In addition, if Pavlov and RAND are removed, TFTT will always win. One limitation of TFT is that it inevitably runs into a defecting-defected cycle (meaning that TFT plays COOPERATE while its opponent defects, and then TFT plays DEFECT while its opponent cooperates) when its opponent happens to be STFT. Cooperation would be achieved, with higher payoffs, if TFT cooperated once more after its opponent defects. TFTT is superior to TFT in this respect, and this is the reason TFTT wins more tournaments than TFT in the above IPD simulation. It is easy to verify that TFT does not score lower than TFTT if STFT is removed from the simulation. Thus we can improve TFT in the following way: when TFT enters a defecting-defected cycle (say, a sequence of three defecting-defected pairs), it chooses COOPERATE in two consecutive rounds. This modified TFT (MTFT) achieves higher payoffs than TFT when the opponent is STFT. Substituting MTFT for TFT, IPD experiments show that MTFT gets the highest average score and wins more single tournaments than the others. MTFT uses an identification technique: it identifies STFT by detecting defecting-defected cycles during an IPD game, and once the opponent is considered to be STFT, the optimal action (cooperating in two sequential rounds) is carried out in order to maximize future payoffs. It is natural to deduce that MTFT can be further improved so that it identifies more strategies and then interacts with them optimally. In the following sections, an approach to identifying each strategy in a finite set will be introduced.
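The MTFT modification can be sketched as follows. This is my reading of the rule, not published code: play TFT, but cooperate while the last three rounds form a defecting-defected echo, and once more in the round just after it ends (the names `mtft` and `mismatched` are mine):

```python
def mismatched(me, opp, i, j):
    """True if rounds i..j-1 are exactly three 'one cooperates, one defects' pairs."""
    if i < 0:
        return False
    pairs = list(zip(me, opp))[i:j]
    return len(pairs) == 3 and all(a != b for a, b in pairs)

def mtft(me, opp):
    if not opp:
        return "C"
    n = len(me)
    # first forced cooperation: the last three rounds were an echo
    if mismatched(me, opp, n - 3, n):
        return "C"
    # second forced cooperation: the echo ended one round ago
    if mismatched(me, opp, n - 4, n - 1):
        return "C"
    return opp[-1]  # otherwise plain Tit-for-Tat
```

Against STFT this breaks the echo within a couple of rounds and locks in mutual cooperation.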
A strategy can interact with its opponents almost optimally by using this identification mechanism.

4.2 Estimation of possible strategies in an IPD tournament

In this section, we seek to define a finite set of types of strategies to be identified. Since the number of possible strategies for the IPD is infinite, it is impossible to identify each of them in a finite number of rounds. For example, suppose that a strategy cooperated with its opponent in 10 sequential rounds while its opponent defected continuously. Although it is very likely to be ALLC, there are always other possibilities. It may be GRIM with a trigger of 11 defections; it may be


RAND that has just happened to play COOPERATE 10 times in a row; or it may be a combination of ALLC and TFT that will behave as TFT in the following rounds. However, since only ALLC belongs to the identification set, those other possibilities are eliminated. The choice of identification set depends on prior knowledge and subjective estimation. Some strategies, like TFT, are likely to appear; others are designated as default strategies. There are numerous strategies one can design for an IPD tournament, but most of them seldom appear because their chances of winning are very small. For example, there may be a strategy that cooperates in the first two rounds, defects in the following two rounds, and then continues to cooperate and defect alternately. Few players will apply such a strategy because it is unlikely to win any IPD tournament. Evidently, the strategies that usually win appear frequently and the others appear infrequently. We define two classes of IPD strategies: cooperating and defecting. Cooperating strategies, for example TFT and TFTT, wish to cooperate with their opponents and are never the first to defect. Defecting strategies, for example ALLD and Pavlov beginning with DEFECT (PavlovD), wish to defect in order to maximize their payoffs, and they always defect first. The cooperating strategies differ in how they respond to the opponent's defections. For example, TFTT is more forgiving than TFT, as it retaliates only if its opponent has defected twice; GRIM is sterner than TFT, as it never forgives a defection. These strategies can be classified according to their responses to the opponent's defections, as shown in Fig. 4.2.

Fig. 4.2 The cooperating strategies, ordered from forgiving to stern: ALLC, TFTT, TFT, TTFT, GRIM.

The defecting strategies differ in how insistently they defect. PavlovD is a representative strategy in this set: it starts with DEFECT, and if the opponent is too forgiving to retaliate, it defects forever; otherwise, it tries to cooperate with the opponent.1 The defecting strategies can be classified as shown in Fig. 4.3. Other simple strategies, which lack a clear objective, differ from the cooperating and defecting strategies and hardly ever get high scores in IPD tournaments.

1 Although PavlovD tries to cooperate with an opponent when the opponent retaliates upon its defection, it seldom succeeds. For example, even if PavlovD meets a forgiving strategy like TFTT, the two cannot keep cooperating in the game. In fact, if PavlovD cooperated just one more time, cooperation could be achieved. We have examined a modified PavlovD (MPavlovD) that starts with DEFECT and cooperates twice when the opponent retaliates; simulation results show that MPavlovD always scores higher than PavlovD.

At the present time, most of the players of an IPD tournament will be cooperating strategies, since cooperating strategies have been dominant in most of the

Fig. 4.3 The defecting strategies, ordered from defecting less to defecting more: PavlovD, STFT, ALLD.

tournaments. There will also be a small number of defecting strategies. Based on this idea, we designed the Adaptive Pavlov strategy, which applies a simple mechanism to distinguish cooperating strategies and several representative defecting strategies.

4.3 Interacting with a strategy optimally

For any strategy there must be another strategy that deals with it optimally. Because ALLC, ALLD and RAND are independent of the opponent's behavior, the optimal response to each of them is ALLD. Because GRIM, TFT, STFT and TTFT retaliate as soon as their opponent defects, the optimal strategy against them is to always cooperate but defect in the last round. TFTT is more charitable and forgives a single defection; its opponent can therefore maximize its payoff by alternately choosing DEFECT and COOPERATE. If Pavlov starts with COOPERATE, its opponent should always cooperate except in the last round; otherwise, its opponent should start with DEFECT and then always cooperate except in the last round. Table 4.2 shows the optimal strategies for dealing with each strategy of Table 4.1.

Table 4.2 Optimal strategies to interact with a known strategy.

ALLC    Always play DEFECT.
ALLD    Always play DEFECT.
RAND    Always play DEFECT.
GRIM    Always play COOPERATE, except DEFECT in the last move.
TFT     Always play COOPERATE, except DEFECT in the last move.
TFTT    Start with DEFECT, and then play COOPERATE and DEFECT in turn.
STFT    Always play COOPERATE, except DEFECT in the last move.
TTFT    Always play COOPERATE, except DEFECT in the last move.
Pavlov  If Pavlov starts with DEFECT, start with DEFECT and then always play COOPERATE except DEFECT in the last round; if Pavlov starts with COOPERATE, always play COOPERATE except DEFECT in the last round.
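The entry for TFTT can be verified directly: because TFTT retaliates only after two consecutive defections, strict alternation never triggers it. A small check (the helper names are mine, and the last-round defection is ignored for simplicity):

```python
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}  # my payoff

def tftt(me, opp):
    # TFTT defects only after two consecutive opponent defections
    return "D" if opp[-2:] == ["D", "D"] else "C"

def score_vs_tftt(my_moves):
    """Total payoff of the given move sequence when played against TFTT."""
    my_hist, tftt_hist, total = [], [], 0
    for m in my_moves:
        o = tftt(tftt_hist, my_hist)
        total += PAYOFF[(m, o)]
        my_hist.append(m)
        tftt_hist.append(o)
    return total

print(score_vs_tftt(["D", "C"] * 100))  # 800: TFTT never retaliates
print(score_vs_tftt(["C"] * 200))       # 600: mutual cooperation
```

Alternating D, C thus averages 4 points per round against TFTT, versus 3 for unconditional cooperation.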

Given an IPD tournament with n players, a player will win the tournament if it interacts with each of its opponents optimally. For example, a unique ALLD will win when the other n - 1 players in an IPD tournament are all ALLC. Hence, the winning strategy of an IPD tournament must be optimal in interacting with most


of the others. Although the strategy of a player is unknown to his opponent before a game, the strategy gradually emerges as the game progresses. It is not difficult for a human player to identify the strategy of his opponent, but it is harder for a computer program to gain this ability. To make it feasible, a method is needed to distinguish each type of strategy from the others, so that a computer program can respond to each type with the relevant response. Under the assumption that every player belongs to a pre-defined finite set of strategies, an example is given to show how the identification method is realized and how the winning strategy is designed. Consider an IPD tournament with 10 players: besides the players shown in Table 4.1, let us add a new player, MyStrategy (MS), which applies an identification mechanism to identify its opponent. The rules are the same as those of the previous simulation. MS starts with DEFECT. If its opponent chooses DEFECT in the first round, MS chooses COOPERATE in round two; otherwise MS chooses DEFECT. MS always chooses COOPERATE in the third round. In this way, most of the strategies can be identified after just three rounds. For example, suppose that the choices of MS and its opponent in the first 3 rounds are as shown in Fig. 4.4; then the strategy of the opponent can be confirmed to be RAND. Because the opponent starts with DEFECT, it must be one of ALLD, STFT, RAND and Pavlov. Since MS defects in the first round and the opponent cooperates in round two, it cannot be ALLD or STFT. Since both MS and the opponent cooperate in the second round, the opponent would not defect in the third round if it were Pavlov. Therefore, the opponent must be RAND. The optimal strategy against RAND is ALLD, so MS behaves as ALLD in the remaining rounds of the game.

                    Round 1     Round 2     Round 3
MS's moves          Defect      Cooperate   Cooperate
Opponent's moves    Defect      Cooperate   Defect

Fig. 4.4 A possible process of a game (showing that the opponent is RAND).

Some possible identification results for the 9 strategies are listed in Table 4.3, where 'C' denotes COOPERATE and 'D' denotes DEFECT. Because RAND chooses its moves randomly, it may behave like any other strategy during a short period; therefore, more rounds are needed to distinguish RAND from the other strategies. If a process occurs that differs from all of those shown in Table 4.3, the strategy of the opponent must be RAND. In this way, a strategy can be identified after several rounds, and the optimal response can then be applied.


Table 4.3 Identification of the 9 strategies.

Moves (MyStrategy / the opponent)    Identification result
D C C / D C C                        Pavlov (RAND)
D C C / D D D                        ALLD (RAND)
D C C D / D D C                      STFT (RAND)
D D C C / C C C                      ALLC (RAND)
D D C C / C C D                      TFTT (RAND)
D D C C / C D C                      Pavlov (RAND)
D D C C C / C D D C                  TFT (RAND)
D D C C C / C D D D C                TTFT (RAND)
D D C C C / C D D D D                GRIM (RAND)
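Table 4.3 amounts to matching the opponent's observed prefix against each candidate's predicted moves. A compact way to express this (my own framing, not the chapter's program) is to keep every deterministic candidate consistent with the observations and fall back to RAND when none remains:

```python
def identify(candidates, my_moves, opp_moves):
    """Filter candidates against the opponent's observed moves."""
    alive = dict(candidates)
    for t in range(len(opp_moves)):
        # each candidate predicts the opponent's move at round t from
        # its own history (opp_moves[:t]) and ours (my_moves[:t])
        alive = {name: s for name, s in alive.items()
                 if s(opp_moves[:t], my_moves[:t]) == opp_moves[t]}
    if len(alive) == 1:
        return next(iter(alive))
    return "RAND" if not alive else "undecided"

alld = lambda me, opp: "D"
stft = lambda me, opp: (opp[-1] if opp else "D")
allc = lambda me, opp: "C"
cands = {"ALLD": alld, "STFT": stft, "ALLC": allc}

# MS opened with D, C, C; the opponent answered D, D, C.
print(identify(cands, ["D", "C", "C"], ["D", "D", "C"]))  # STFT
```

RAND never enters the candidate set, since its moves cannot be predicted; it is the residual category, exactly as in Table 4.3.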

Ten IPD tournaments with the above 10 players were carried out.2 The simulation results are shown in Fig. 4.5: MS gains the highest average payoff and achieves the highest score in each tournament. The reason for MS's success is that it interacted almost optimally with most of the strategies in this IPD tournament. Most IPD strategies, such as TFT or Pavlov, are memory-one strategies, which can respond only to the opponent's last move; the past process of the game, however, contains more information. The identification mechanism of MS uses information about the opponent's strategy, so MS responds not just to the opponent's past moves but to the opponent's strategy itself. By identifying different opponents, MS makes use of more information than the simple strategies do. This is the reason MS is able to win IPD tournaments. Different identification approaches may lead to different results for MS. For example, the strategies GRIM, TFT and ALLC all start with COOPERATE, and they will not defect if their opponents do not. To tell these strategies apart, MS starts with DEFECT and so loses the chance to cooperate with GRIM. On the

2 The number of rounds in an IPD game is usually not fixed, in order to keep players from knowing when the game will end. The simulation applies a fixed number of rounds to reduce the complexity of computation. However, MS does not exploit this to get extra payoff; that is to say, MS does not purposely choose DEFECT in the last round of a game.


Players  Points in 10 tournaments                                     Average   Rank
MS       6134 6213 6179 6127 6202 6175 6152 6172 6212 6187            6175.3    1
TFTT     5957 5996 5970 6003 5994 5959 5965 5969 5966 5976            5975.5    2
TFT      5961 5936 5919 5946 5959 5938 5940 5929 5954 5978            5946.0    3
Pavlov   5718 5691 5725 5775 5816 5763 5748 5763 5733 5745            5747.7    4
TTFT     5725 5723 5725 5717 5719 5725 5746 5732 5722 5716            5725.0    5
GRIM     5404 5394 5416 5410 5440 5468 5322 5400 5390 5384            5402.8    6
ALLC     5115 5091 5103 5127 5103 5103 5103 5082 5109 5091            5102.7    7
RAND     4339 4349 4254 4340 4216 4219 4258 4241 4228 4274            4271.8    8
STFT     4165 4187 4160 4169 4179 4144 4173 4158 4142 4158            4163.5    9
ALLD     3800 3792 3852 3792 3848 3856 3832 3864 3832 3832            3830.0    10

Fig. 4.5 Simulation results of 10 IPD tournaments.

other hand, if MS does not defect first, it cannot distinguish these three strategies and cannot interact with ALLC optimally. The risk involved in exploring the opponent must be weighed in choosing an efficient, payoff-maximizing identification approach.

4.4 Escape from the trap of defection

When a player begins to explore its opponent, there is a risk that the identification process puts the player in a much worse position. Some strategies, especially those with a trigger mechanism such as GRIM, change their behavior at the trigger point. For example, the strategy MS described in the previous section defects at the beginning of IPD games in order to distinguish the cooperating strategies ALLC, TFT and GRIM; as a result, the chance to cooperate with GRIM is lost. In IPD games, the risk of identification is mainly the trap of defection: an identification process that leads the opponent to keep defecting, with nothing that can be done to rescue the situation. It might appear that a strategy will never run into the trap of defection if it never defects first, but this is not the case. Suppose a strategy keeps playing COOPERATE while its opponent defects, and defects forever once its opponent cooperates; then any cooperating strategy will be defected against when interacting with it, while most defecting strategies will keep it cooperating. If this reverse-GRIM strategy is as likely to appear in a game as GRIM, then cooperating and defecting carry equal risk of invoking future defection. This means that the risk of the defection trap always exists, whether or not an identification mechanism is applied. One may argue that reverse-GRIM types of strategies will not appear as


frequently as GRIMs in IPDs, so that cooperating is safer than defecting and the MS strategy is more likely to run into the defection trap than TFT. That is right, but it does not show that the defection trap is unavoidable for a strategy with an identification mechanism, because many identification approaches can be applied. For example, a simple way to avoid retaliation from GRIM is not to defect first: the identification mechanism that Adaptive Pavlov used in the 2005 IPD tournament only probed defecting strategies, in order to keep cooperating with every cooperating strategy. Again, which identification mechanism should be applied depends on prior knowledge and subjective estimation. If there are enough ALLC strategies in an IPD game, it is worth distinguishing them from the other cooperating strategies; but if GRIMs prevail, it is better not to defect first. Generally speaking, we can compare different identification approaches and choose the most efficient one, although uncertainty still exists.

4.5 Adaptive Pavlov and Competition 4 of the 2005 IPD tournament

The 2005 IPD tournament comprised 4 competitions; Competition 4 mirrors the original competition of Axelrod. There were 50 players in total, including 8 default strategies. The strategy of Adaptive Pavlov (AP), which ranked first in Competition 4, is analyzed in this section. AP groups every 6 consecutive rounds into a period and applies different tactics in different periods. AP behaves as a TFT strategy in the first period, and then changes its strategy according to the identification of its opponent. AP classifies the possible opponents into 5 categories: cooperating strategies, STFT, PavlovD, ALLD and RAND.3 By identifying the opponent's strategy at the end of a period, AP shifts its strategy in the new period so as to deal with each opponent optimally. AP is never the first to defect, and thus it will cooperate with every cooperating strategy. AP tries to cooperate with STFT and PavlovD, and defects against strategies such as ALLD or RAND. The processes of AP's interaction with cooperating strategies, ALLD, STFT, and PavlovD in the first 6 rounds are shown in Fig. 4.6 (AP behaves as TFT). For example, when a process of interaction like that of Fig. 4.6(c) occurs, the opponent is identified as STFT, and AP will cooperate twice in the next period in order to achieve cooperation. If the opponent is determined to be PavlovD, AP will defect once and then always cooperate in the next period. If a process of interaction occurs that differs from those shown in Fig. 4.6, the opponent is identified as RAND; in this way, any strategy that is not defined in the identification set is likely to be identified as RAND. Once cooperation has been established, AP will always cooperate unless a defection occurs. Identification of the opponent is performed in every period throughout the IPD tournament, in order to correct misidentification and to deal with players who change their strategies during a game.
As we have mentioned, most of the players will be cooperating strategies.

3 RAND is claimed to be a default strategy.


Round      1  2  3  4  5  6        Round      1  2  3  4  5  6
AP         C  C  C  C  C  C        AP         C  D  D  D  D  D
Co-op      C  C  C  C  C  C        ALLD       D  D  D  D  D  D
           (a)                                (b)

Round      1  2  3  4  5  6        Round      1  2  3  4  5  6
AP         C  D  C  D  C  D        AP         C  D  D  C  D  D
STFT       D  C  D  C  D  C        PavlovD    D  D  C  D  D  C
           (c)                                (d)

Fig. 4.6 Identifying the opponent according to the process of interaction in six rounds. (a) AP cooperates with any cooperating strategy. (b) ALLD always defects. (c) If a strategy alternately plays D and C when interacting with TFT, it is identified as STFT. (d) If a strategy periodically plays D-D-C when interacting with TFT, it is identified as PavlovD.
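AP's end-of-period decision in Fig. 4.6 reduces to a lookup on the opponent's six observed moves, given that AP itself played TFT. A minimal sketch (the function and category names are mine; anything outside the four patterns falls through to RAND, as the text describes):

```python
def classify_period(opp_moves):
    """Classify a 6-round opponent record, assuming AP played TFT."""
    patterns = {
        ("C",) * 6:                     "cooperating",   # Fig. 4.6(a)
        ("D",) * 6:                     "ALLD",          # Fig. 4.6(b)
        ("D", "C", "D", "C", "D", "C"): "STFT",          # Fig. 4.6(c)
        ("D", "D", "C", "D", "D", "C"): "PavlovD",       # Fig. 4.6(d)
    }
    return patterns.get(tuple(opp_moves), "RAND")

print(classify_period(["D", "C", "D", "C", "D", "C"]))  # STFT
```

The tactic for the next period then follows the section text: cooperate twice for STFT, defect once and then cooperate for PavlovD, keep cooperating with cooperating strategies, and defect against ALLD or RAND.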

The results show that there are 34 cooperating strategies in Competition 4 (including the 4 default strategies TFT, TFTT, GRIM and ALLC). Apart from the default strategies, there are also 3 strategies that behave like ALLD, 5 that behave like STFT, and 2 that behave like NEG. As shown in Table 4.4, AP can identify most of the strategies involved in Competition 4 (see footnote 4).

Table 4.4 Categories of the strategies in Competition 4.

Categories                 Number of strategies
Cooperating strategies     34
Strategies like STFT       6
Strategies like ALLD       4
Strategies like NEG        3
Strategies like RAND       1
Others                     2

4.6 Discussion and conclusion

AP belongs to the class of adaptive automata for the IPD. However, it differs from other adaptive strategies in how adaptation is achieved; the approach of AP belongs squarely to the family of artificial-intelligence approaches. Rather than adjusting some parameters in computing responses, as most adaptive strategies do, AP uses an identification mechanism that acts as an expert system.

4 AP regards NEG as RAND. It nevertheless maximizes its score when interacting with strategies like NEG, because the optimal response to both NEG and RAND is ALLD.


Knowledge about different opponents is expressed in the form of 'if-then' rules; for example, if the opponent cooperates in 6 rounds, then it is determined to be ALLC. In this way, the information acquired and used is transparently expressed, and AP can tell what strategy its opponent is. Recent years have seen many AI approaches applied to evolutionary game theory and the IPD, for example reinforcement learning, artificial neural networks, and fuzzy logic (Sandholm and Crites (1996); Macy and Carley (1996); Fort and Pérez (2005)). Computing a best response to an unknown strategy has been one of the objectives of these AI approaches. The problem is in general intractable because of its computational complexity, and finding the best response to an arbitrary strategy can even be non-computable (Papadimitriou (1992); Nachbar and Zame (1996)). Reinforcement learning, which is based on the idea that the tendency to produce an action should be reinforced if it produces favourable results and weakened if it produces unfavourable results (Gilboa (1988); Gilboa and Zemel (1989)), is widely used to let automata learn from their interaction with others. With respect to the IPD, several approaches have been developed to learn an optimal response to a deterministic or mixed strategy (Carmel and Markovitch (1998); Darwen and Yao (2002)). However, computational complexity remains the main difficulty in applying these approaches in real IPD tournaments. AP's identification mechanism is implemented in a simple way by making use of a priori knowledge, which greatly reduces the computational complexity and makes it practical for AP to respond to the opponent almost optimally. First, a priori knowledge about which strategies are more likely to appear in the IPD tournament is used in determining the identification set; the size of the identification set is restricted in order to reduce computational complexity.
Second, a priori knowledge about how well different identification approaches work in a given environment is used in selecting an efficient identification approach, with which AP can avoid the risks of identification and maximize its payoffs. Third, a priori knowledge about how to identify the opponent from the process of interaction is used in constructing the identification rules. With these simple rules, the AP strategy is easy to understand and to improve. The identification set can obviously be extended to include more identifiable strategies; however, more computation is involved as the size of the identification set increases, so we have to trade off the wish to identify any strategy against the wish to keep the strategy simple. Compared with the NP-completeness of those reinforcement-learning approaches (Papadimitriou (1992)), AP's computational complexity is between O(√n) and O(n), depending on the similarity of the strategies to be identified. The algorithm of AP is therefore well suited to real IPD tournaments. The identification mechanism can also work in a noisy environment, where each strategy might, with some probability, misperceive the outcome of the game. Noise blurs the boundaries between different strategies, but identification is still applicable if a small identification error is admitted. In this circumstance, we can set a threshold value such that the opponent is considered identified once the probability of misidentification is smaller than this value. Just as in the case of identifying RAND, the probability of misidentifying a strategy will decrease
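The RAND case gives a feel for how quickly the misidentification probability shrinks: RAND reproduces any particular n-move sequence with probability (1/2)^n, so the number of rounds needed to push the error below a chosen threshold grows only logarithmically. A small sketch (the threshold value is illustrative):

```python
def rounds_needed(threshold):
    """Smallest n such that (1/2)**n falls below the misidentification threshold."""
    n, p = 0, 1.0
    while p >= threshold:
        n, p = n + 1, p / 2
    return n

print(rounds_needed(0.01))  # 7 rounds: 2**-7 is about 0.0078 < 0.01
```

Seven matching rounds already push the chance that RAND is masquerading as a deterministic strategy below one percent.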


to zero as the process of observation and identification repeats. Information plays a key role in intelligent activities; individuals with more information consequently gain an advantage over others in most circumstances. With an identification mechanism, strategies like AP acquire information about their opponents, and they are more intelligent than simple strategies such as TFT or Pavlov. This type of strategy is suitable for modeling human decision-making, where learning and improvement happen frequently.

References

[1] Ben-Porath, E. (1990). The complexity of computing a best response automaton in repeated games with mixed strategies. Games and Economic Behavior, 2:1-12.
[2] Carmel, D. and Markovitch, S. (1998). How to explore your opponent's strategy (almost) optimally. Proceedings of the International Conference on Multi Agent Systems, 64-71.
[3] Darwen, P. and Yao, X. (2002). Co-evolution in iterated prisoner's dilemma with intermediate levels of cooperation: Application to missile defense. International Journal of Computational Intelligence and Applications, 2(1):83-107.
[4] Fort, H. and Pérez, N. (2005). The fate of spatial dilemmas with different fuzzy measures of success. Journal of Artificial Societies and Social Simulation, 8(3).
[5] Gilboa, I. (1988). The complexity of computing best response automata in repeated games. Journal of Economic Theory, 45:342-352.
[6] Gilboa, I. and Zemel, E. (1989). Nash and correlated equilibria: some complexity considerations. Games and Economic Behavior, 1:80-93.
[7] Macy, M. and Carley, K. (1996). Natural selection and social learning in prisoner's dilemma: co-adaptation with genetic algorithms and artificial neural networks. Sociological Methods and Research, 25(1):103-137.
[8] Nachbar, J. and Zame, W. (1996). Non-computable strategies and discounted repeated games. Economic Theory, 8:103-122.
[9] Papadimitriou, C. (1992). On players with a bounded number of states. Games and Economic Behavior, 4:122-131.
[10] Sandholm, T. and Crites, R. (1996). Multiagent reinforcement learning in the Iterated Prisoner's Dilemma. Biosystems, 37(1-2):147-166.