
Strategy extraction when playing games

Marko Grobelnik

AI Lab., J. Stefan Institute, Jamova 39, Ljubljana, Slovenia
Phone: +386 61 1773745, E-mail: [email protected]

Vesna Prasnikar

Faculty for Economics, University of Ljubljana, Slovenia & Northwestern University, Kellogg School of Management
E-mail: [email protected]

Abstract

Modelling a subject's strategy is an important task in many situations, e.g. in economics. However, such modelling is mostly done by hand, i.e. by repeating the cycle of proposing a hypothesis model and verifying it with various statistical approaches. In our work we use machine learning tools for the automatic modelling of strategies in a game that is particularly interesting from the game-theoretic point of view. More precisely, we modelled the decision process in the ultimatum bargaining game, which was performed experimentally in 4 countries. The results are decision trees corresponding to qualitative models of the player types, confirming intuition about how the game is played and the rational-behaviour hypotheses about the players.

Keywords:

machine learning, game theory, knowledge acquisition, qualitative modelling

1 Introduction

Games can in general represent any real-world situation, e.g. how to manage a business, what career to follow, whether to run for president, etc. The essential element of games is that decision makers do not act in a vacuum, but are instead surrounded by other active decision makers. Such interactive decisions are called strategic, and the plan of action appropriate to them is called a strategy. Game theory can therefore be defined as the study of mathematical models of conflict and cooperation between intelligent, rational decision makers. A typical paradigm when analysing a particular phenomenon from the game-theoretic point of view is the following: the underlying theory about the problem is examined and a game is defined which models the phenomenon; next, the game is either (a) studied theoretically, (b) simulated on the computer, or (c) given to humans (students) to play. In all three cases we can draw certain conclusions about the strategy for playing the game.

(a) In the first case, some properties of the game can be proven, and sometimes an optimal strategy can be proposed. Unfortunately, the latter is (mostly) possible only for rather simple games which do not correspond to realistic situations. (b) In the second case, when we simulate the game on the computer, we gain a much broader insight into the game, but typically at the cost of theoretical exactness. It is possible to create efficient strategies which are, however, still more or less biased because of the assumptions made when preparing the simulation. (c) The third case, where the game is played by humans, gives us the most realistic picture of how the game is played in a real-world environment. Here we meet limitations of a more practical kind: limited time, a limited number of people, motivation, etc. However, even in such circumstances many effects can be observed which are not evident from theoretical analysis or simulation.

The data measured from human-played games are further processed to determine the strategies used by the players. Typically, a model of a player is proposed according to the theory and verified against the measured data. In cases where no effective underlying theory is available, such modelling can become very questionable, sometimes leading to ad-hoc solutions. Such situations are particularly critical when the games are more complex, with several unpredictable nonlinear properties. In such situations an approach for automatically building a player model from the measured data would be very helpful. We therefore decided to use a machine learning approach for the automatic generation of a qualitative model of the player from the measured data.

In this presentation we show the analysis of a simple bargaining game for which an experiment was performed in four countries on a significantly large sample of players. The game is interesting because the theoretical results differ from the measured data. For the purpose of automatic qualitative modelling of the players' strategies we used the machine learning tool Magnus-Assistant [2], which is based on the Assistant algorithm [1] (a rather sophisticated descendant of the ID3 algorithm). The results are very usable models of players that confirm the intuition about the game strategy. Furthermore, different substrategies appear for different countries (nations), and within a particular country several types of population are identified.

In the following sections the game and the experiment performed with it are presented. Next, the machine learning algorithm Assistant is briefly reviewed. The presentation continues with the problem formulation for the machine learning system, and concludes with the results of the analysis.

2 Game definition

The game we used for modelling is simple, perhaps simpler than one would expect. In game-theoretic terms it is categorised among the ultimatum games [6, 5]. The ultimatum bargaining game is a two-person game played as follows. There is some quantity Q of money to be divided, and player 1 makes an offer of the form (x1, x2), where x2 = Q - x1. Player 2 then has the opportunity either to accept or to reject this proposal: if player 2 accepts, then player 1 receives x1 and player 2 receives x2 = Q - x1; if player 2 rejects, then each player receives zero. The game is one-shot (non-repeated), which means that there is no accumulation of experience about the opposing player. Two examples, for quantity Q = $10:

(1) Player 1 proposes $6 for himself and $4 for player 2. Player 2 accepts the offer and earns $4, while player 1 earns $6. (2) Player 1 proposes $7 for himself and $3 for player 2. Player 2 rejects the offer; in this case nobody gets anything. In this bargaining environment, the game-theoretic optimal solution is for player 1 to demand all (or almost all) of the profit, and for player 2 to accept this. That is, the division of Q that results from this solution gives player 1 a payoff of Q - ε and player 2 a payoff of ε [3]. Observed experimental results have been quite different, with player 1s predominantly offering player 2s much larger shares (typically in the neighbourhood of 40% of Q).
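To make the payoff rule concrete, here is a minimal Python sketch; the function name and signature are ours, not from the paper.

def ultimatum_payoffs(Q, x1, accept):
    """Payoffs (player 1, player 2) in the one-shot ultimatum game.

    Player 1 keeps x1 and offers x2 = Q - x1; if player 2 rejects,
    both players earn zero.
    """
    x2 = Q - x1
    if accept:
        return x1, x2
    return 0.0, 0.0

# The two examples from the text, with Q = $10:
print(ultimatum_payoffs(10, 6, accept=True))   # (6, 4)
print(ultimatum_payoffs(10, 7, accept=False))  # (0.0, 0.0)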

3 Experiment description

The experiment with the game described in the previous section was performed in 4 countries [4]: the USA (Pittsburgh), Slovenia (Ljubljana), Israel (Jerusalem) and Japan (Tokyo). In each country 2n students were involved: n in the role of player 1 and n in the role of player 2. More precisely: 74 in the USA, 60 each in Slovenia and Israel, and 58 in Japan, 252 altogether. Students did not change their roles during the whole experiment. Each player played the game 11 times, each time with a different anonymous person from his or her country. Players were informed about the number of rounds in the game, and each player knew only the data about the games he or she was involved in. The quantity Q of money to be divided was $10; this amount was converted into the local currency at the corresponding purchasing power. The players were additionally motivated by actually being paid the money earned in one randomly chosen round.

4 Assistant algorithm

In the analysis, the system Magnus-Assistant [2] was used. It is based on the Assistant algorithm [1] from the family of "top-down induction of decision trees" algorithms. Assistant's basic paradigm is the top-down construction of binary decision trees. It can handle noisy and incomplete input data, and it is able to preprune and postprune decision trees. It could be said that Assistant is a rather sophisticated descendant of the well-known ID3 [7] algorithm. The following piece of pseudo-code briefly describes the algorithm:

Given: a set of training examples, where each example is a state-space vector (C, A1, A2, ..., An); C is one of the discrete classes to which the example belongs, and the Ai are attributes with discrete or continuous values. Attribute values may also be unknown or unimportant.

function Build-Tree(ExampleSet): DecisionTreeType;
  if the example set is empty then
    return an empty leaf
  else if all examples are of the same class then
    return a leaf with that class
  else if at least one of the pruning criteria is satisfied then
    return a multi-class leaf
  else
    choose the most informative attribute A_best according to the best binary
      split of its values (selection is based upon maximisation of the
      informativeness, calculated as the difference between the entropy
      before and after splitting the example set);
    split the example set into two subsets according to the binarised
      values of A_best;
    recursively build subtrees T1 and T2 from the two example subsets;
    return (A_best, T1, T2), a binary tree with A_best in the root and
      T1, T2 as subtrees;
end
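For readers who prefer running code, the following is a minimal, self-contained sketch of the same top-down, binary-split induction idea in Python. It is our own illustration, not the Magnus-Assistant implementation: it handles only discrete attributes, uses a simple depth limit as its only pruning criterion, and omits Assistant's treatment of unknown/unimportant values and postpruning.

from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_binary_split(examples, attr):
    """Best split of a discrete attribute into {value} vs. the rest,
    scored by information gain (entropy before minus weighted entropy after)."""
    labels = [cls for cls, _ in examples]
    base = entropy(labels)
    best = None
    for value in {feats[attr] for _, feats in examples}:
        left = [cls for cls, f in examples if f[attr] == value]
        right = [cls for cls, f in examples if f[attr] != value]
        if not left or not right:
            continue
        gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / len(examples)
        if best is None or gain > best[0]:
            best = (gain, value)
    return best  # (gain, value) or None

def build_tree(examples, attrs, depth=0, max_depth=3):
    """examples: list of (class, {attribute: value}) pairs.
    Returns either a class label (leaf) or (attr, value, left_subtree, right_subtree)."""
    labels = [cls for cls, _ in examples]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or depth >= max_depth:   # pure leaf or pruning criterion
        return majority
    candidates = [(attr, best_binary_split(examples, attr)) for attr in attrs]
    candidates = [(a, s) for a, s in candidates if s is not None]
    if not candidates:
        return majority
    attr, (gain, value) = max(candidates, key=lambda c: c[1][0])
    left = [e for e in examples if e[1][attr] == value]
    right = [e for e in examples if e[1][attr] != value]
    return (attr, value,
            build_tree(left, attrs, depth + 1, max_depth),
            build_tree(right, attrs, depth + 1, max_depth))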

5 Problem formulation

The experiment described in Section 3 provided the data we wanted to use for modelling the strategies of both player types involved in the game. Two important issues at this point were: (1) what type of answers we want to get from the system, and (2) how to formulate the questions so as to get the expected answers. Furthermore, our goal was also to define the learning problems in such a way that the process could later be automated for games with different rules.

There are two player types. For each we would like to know how to act in the next round given the past experience. For player 1, who makes offers, the amount of money to be offered in the next round is important; we reformulated this into the question of how large the offer should be compared to the offer the player made in the previous round. For player 2, who decides about an offer, the decision about the next offer is important.

The learning problem for modelling player 1's strategy is thus deciding whether to offer a greater, smaller or equal amount of money compared to the previous round, so the possible values of the example class are Up, Down or Equal. The attributes of an example describe the experience the player gained from the previous rounds. We divided the attributes into two main groups: (1) the first group models the player's "long-term memory" and consists of summary statistics over all previous rounds; (2) the second group models the player's "short-term memory" and consists of the raw data about the previous round. The complete set of attributes is: country of the player, current round number, offer in the previous round, decision of player 2 in the previous round, existence of accepted and rejected offers so far (Booleans), mean and standard deviation of the previous offers, min. and max. offer so far, min. and max. accepted offer so far, min. and max. rejected offer so far, extent of the offers so far, number of different offers so far, number of accepted offers so far, percentage of accepted offers so far, and mean earning per round so far.

The learning problem for player 2's strategy is deciding whether to accept or reject the current offer, giving the example class values Accept or Reject. All attributes calculated for player 1 are also calculated for player 2. In addition there is a group of three attributes containing information on the current offer about which player 2 has to decide: the current offer itself, and two tests checking whether the current offer is greater than or equal to the min. and the max. accepted offer so far (Booleans).

Table 1 shows the examples for a randomly chosen player 1, with only a subset of the attributes presented. Note that for each player only 10 examples were generated from the 11 rounds he or she played in the experiment: round 0 was used to calculate the past-data attributes for round 1. As noted in Section 3, 252 players were involved in the game, half (126) as player 1 and the other half as player 2; we therefore obtained two domains with 1260 examples each (10 examples per player x 126 players).

Round   Previous response   Previous offer   Mean earning   Class (decision)
  1     accepted            $4.0             $4.00          Down
  2     rejected            $0.5             $2.00          Up
  3     rejected            $1.0             $1.33          Up
  4     accepted            $2.5             $1.63          Up
  5     accepted            $3.0             $1.90          Down
  6     accepted            $2.5             $3.00          Equal
  7     rejected            $2.5             $2.40          Equal
  8     accepted            $2.5             $1.80          Equal
  9     rejected            $2.5             $1.60          Up
 10     accepted            $3.0             $1.75          Equal

Table 1: Trace of decisions for a randomly chosen player 1.
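As an illustration of how such examples might be constructed, the Python sketch below derives a few of the "short-term" and "long-term memory" attributes and the Up/Down/Equal class from the round history of one player 1. The field names and the reduced attribute set are our own simplification of the full list above, not the exact representation used with Magnus-Assistant.

from statistics import mean, pstdev

def player1_examples(offers, responses, n_rounds=11):
    """Build (class, attributes) pairs for one player 1.

    offers[i]    -- player 1's offer to player 2 in round i (i = 0..n_rounds-1)
    responses[i] -- player 2's decision in round i: 'accepted' or 'rejected'
    Round 0 is used only to seed the history attributes for round 1.
    """
    examples = []
    for r in range(1, n_rounds):
        past = offers[:r]
        accepted = [o for o, d in zip(past, responses[:r]) if d == "accepted"]
        attrs = {
            "round": r,
            # short-term memory: raw data about the previous round
            "prev_offer": offers[r - 1],
            "prev_response": responses[r - 1],
            # long-term memory: summary statistics over all previous rounds
            "mean_offer": mean(past),
            "std_offer": pstdev(past),
            "min_offer": min(past), "max_offer": max(past),
            "offer_extent": max(past) - min(past),
            "n_distinct_offers": len(set(past)),
            "accept_rate": len(accepted) / len(past),
            "min_accepted": min(accepted) if accepted else None,
            "max_accepted": max(accepted) if accepted else None,
        }
        # class: this round's offer compared with the previous round's offer
        if offers[r] > offers[r - 1]:
            cls = "Up"
        elif offers[r] < offers[r - 1]:
            cls = "Down"
        else:
            cls = "Equal"
        examples.append((cls, attrs))
    return examples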

6 Basic statistics about the game

Before proceeding with the presentation of the machine learning results, we briefly review some basic statistical characteristics of the experiment described in Section 3. Because of the lack of space, only summary statistics over all 4 countries are presented. The data for particular countries include some specifics, but for the purpose of this presentation they are not significant.

Table 2 contains data about various aspects of the game per round. The tendencies in the data are approximately shown by the linear regression coefficient for the preceding time series (last column in Table 2). Next, we briefly comment on the table together with additional summary statistics over all rounds. The mean offer over all rounds is $4.07 (1.23)¹. The mean offer per round (PropMean in Table 2) does not change much, while its standard deviation (PropDisp) drops significantly from the first to the last round, which shows a stabilisation of the game over time. The mean accepted offer (APropMean) declines, while the mean rejected offer (RPropMean) increases, which shows that both parties in the game are converging towards an equilibrium area (confirming the game-theoretic results). Both standard deviations (APropDisp, RPropDisp) decline, confirming the stabilisation of the game. 73% of all offers were accepted. The acceptance rate (RespMean) did not change much over time; it becomes slightly higher only at the end, when players 2 tried to get as much as possible. The average earning of player 2 per offer (accepted or rejected) was $3.20 (2.15). Earnings (GainMean) did not change much over time, while their standard deviation (GainDisp) declines, particularly at the end.

¹ Numbers in parentheses are standard deviations.


            Round:   0     1     2     3     4     5     6     7     8     9    10    LR Cf.
PropMean           4.09  4.30  4.15  4.07  3.93  4.01  4.04  4.02  4.05  4.04  4.08  (-0.01)
PropDisp           1.71  1.45  1.38  1.30  1.24  1.20  1.13  1.18  0.91  0.88  0.87  (-0.08)
APropMean          4.73  4.58  4.43  4.50  4.33  4.26  4.37  4.33  4.25  4.21  4.17  (-0.05)
APropDisp          1.34  1.24  1.30  1.15  1.06  1.14  1.03  1.03  0.81  0.78  0.85  (-0.05)
RPropMean          2.89  3.41  3.36  3.04  3.07  3.47  3.23  3.06  3.37  3.44  3.60  (+0.04)
RPropDisp          1.71  1.67  1.34  1.06  1.19  1.14  0.94  1.09  0.92  0.95  0.81  (-0.08)
RespMean            65%   75%   74%   71%   68%   67%   71%   75%   77%   78%   83%  (+0.01)
GainMean           3.08  3.46  3.27  3.18  2.96  2.88  3.09  3.26  3.27  3.28  3.48  (+0.01)
GainDisp           2.51  2.26  2.25  2.27  2.21  2.21  2.18  2.07  1.93  1.89  1.74  (-0.06)
EPropMean          0.00  0.92  1.40  1.67  1.92  2.10  2.19  2.33  2.39  2.44  2.48  (+0.21)
EPropDisp          0.00  1.21  1.59  1.67  1.76  1.78  1.77  1.88  1.86  1.86  1.87  (+0.12)
DPropMean          1.00  1.69  2.34  2.95  3.55  4.07  4.55  5.02  5.42  5.81  6.10  (+0.51)
DPropDisp          0.00  0.46  0.74  1.05  1.31  1.56  1.82  2.04  2.31  2.56  2.78  (+0.27)

Table 2: Offer and response statistics. All quantities except RespMean are in dollars. Round: round number; LR Cf.: linear regression coefficient; PropMean, PropDisp: mean and std. dev. of all offers; APropMean, APropDisp: mean and std. dev. of accepted offers; RPropMean, RPropDisp: mean and std. dev. of rejected offers; RespMean: mean percentage of accepted offers; GainMean, GainDisp: mean and std. dev. of earnings; EPropMean, EPropDisp: mean and std. dev. of the offer extent per player; DPropMean, DPropDisp: mean and std. dev. of the number of different offers per player.

The average extent of the offers (max. offer - min. offer) per player was $2.48 (1.87), which indicates a rather small testing range of the opponent's demands. The average min. and max. offers per player are $2.73 (1.44) and $5.21 (1.08), respectively. EPropMean and EPropDisp show growth of the offer extent and of its standard deviation. The average number of different offers per player was 6.10 (2.78), showing considerable repetition of offers from previous rounds. DPropMean and DPropDisp show growth of the number of different offers per player and of its standard deviation.
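As an indication of how such a table can be produced, the sketch below computes a few of these per-round statistics and a least-squares slope over rounds in plain Python. The column names and the record format are our own; the original analysis may well have used different tooling.

from statistics import mean, pstdev

def round_stats(records):
    """records: list of dicts with keys 'round', 'offer', 'accepted' (bool).
    Returns {round: {...}} with a few of the Table 2 statistics."""
    stats = {}
    for r in sorted({rec["round"] for rec in records}):
        offers = [rec["offer"] for rec in records if rec["round"] == r]
        accepted = [rec["offer"] for rec in records if rec["round"] == r and rec["accepted"]]
        rejected = [rec["offer"] for rec in records if rec["round"] == r and not rec["accepted"]]
        stats[r] = {
            "PropMean": mean(offers), "PropDisp": pstdev(offers),
            "APropMean": mean(accepted) if accepted else None,
            "RPropMean": mean(rejected) if rejected else None,
            "RespMean": len(accepted) / len(offers),
        }
    return stats

def lr_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs (the 'LR Cf.' column)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)

# Example: trend of the mean offer across rounds 0..10
# rounds = sorted(stats)
# slope = lr_slope(rounds, [stats[r]["PropMean"] for r in rounds])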

7 Machine learning results

The data prepared as described in Section 5 were fed into the Magnus-Assistant system. As stated before, the primary goal was to automatically build qualitative models of the players which would reconstruct the actual strategies used in the game. The results we obtained from the system nicely reconstruct the dynamic behaviour of the players. Here we present the models of both player types, built from the summary data for all 4 countries. The trees were pruned to allow an interpretation of the strategy. Figures 1 and 2 contain the trees for player 1 and player 2, respectively. Note that the numbers n in the trees correspond to the number of cases covered by the particular part of the tree; the proportions of these numbers do not correspond to the class distributions in the leaves. This is caused by the virtual expansion of examples that include "unimportant" values for some attributes. Next, we analyse the structure of both trees.

The root attribute in the tree for player 1 (Figure 1) is the previous response of player 2 to the previous offer of player 1. If the previous offer was rejected, then in most cases (70.2%) player 1 offered more to player 2, which is of course a rather natural reaction to such a situation. Conversely, if the previous offer was accepted, then its size becomes important.

previous response (n=1260)
  reject:  up 70.2% (n=258), equal 22.8% (n=64), down 7.0% (n=29)
  accept:  previous offer (n=909)
    <= 4.95:  up 14.2% (n=76), equal 20.3% (n=189), down 65.5% (n=306)
    >  4.95:  previous offer (n=338)
      <= 5.35:  up 5.4% (n=11), equal 61.5% (n=166), down 33.1% (n=111)
      >  5.35:  up 20.0% (n=6), equal 3.3% (n=2), down 76.7% (n=42)

Figure 1: The tree for player 1.

The most common situation (the one covering the most cases) is when the previous offer was at most $4.95. In this situation player 1 offered less to player 2 in the next round (65.5%). This corresponds to the strategy of a dynamic player who is trying to get as much as possible for himself. In the other situation, player 1 offered more than $5.35, noticed that the offer was too high, and immediately proposed a smaller offer (76.7%). Interesting is the situation where the previous offer is greater than $4.95 and at most $5.35: here, in most cases the next offer remained equal (61.5%). The reason is that, in contrast to the dynamic players, there also exists a population of conservative players who are satisfied with less money but want to be sure their offer will be accepted.

The root attribute in the tree for player 2 (Figure 2) is the condition of whether the current offer from player 1 is greater than or equal to the minimal offer accepted by player 2 so far. This condition actually tests the hypothesis that there exists a sort of reservation price for player 2, below which he or she does not want to accept anything. Because the machine learning system chose this test as the main attribute, the hypothesis is to a great extent confirmed. When the current offer is greater than or equal to the reservation price (the min. accepted offer so far), the offer is very likely to be accepted (91%). Otherwise, the proportion between accept and reject depends on the size of the current offer.

current offer >= min. accepted offer so far  (n=1260)
  yes:  accept 91.3% (n=635), reject 8.7% (n=67)
  no:   current offer (n=620)
    >  4.75:  accept 92.2% (n=93), reject 7.8% (n=10)
    <= 4.75:  current offer (n=517)
      <= 3.70:  accept 50.6% (n=84), reject 49.4% (n=178)
      >  3.70:  accept 72.8% (n=164), reject 27.2% (n=91)

Figure 2: The tree for player 2.
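Read operationally, each pruned tree is just a nested conditional. The sketch below is our own transcription of the two trees in Figures 1 and 2 into Python, returning the majority class (or the empirical accept rate) of the reached leaf; it ignores the minority classes in each leaf and is not part of the original system.

def player1_next_move(prev_response, prev_offer):
    """Majority-class reading of the Figure 1 tree for player 1."""
    if prev_response == "rejected":
        return "Up"       # 70.2%: raise the offer after a rejection
    if prev_offer <= 4.95:
        return "Down"     # 65.5%: 'dynamic' players try to keep more
    if prev_offer <= 5.35:
        return "Equal"    # 61.5%: 'conservative' players keep a safe offer
    return "Down"         # 76.7%: the previous offer was too generous

def player2_accept_probability(current_offer, min_accepted_so_far):
    """Empirical accept rates in the leaves of the Figure 2 tree."""
    if min_accepted_so_far is not None and current_offer >= min_accepted_so_far:
        return 0.913      # at or above the 'reservation price'
    if current_offer > 4.75:
        return 0.922
    if current_offer > 3.70:
        return 0.728
    return 0.506          # essentially a coin flip for low offers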

8 Conclusions

We have presented an analysis of the ultimatum bargaining game, which is particularly interesting from the game-theoretic point of view. For the analysis, we used data from an experiment performed in 4 different countries. The primary goal of the analysis was to automatically build models of the different player types involved in the game. As the main tool for the analysis, we used the machine learning system Magnus-Assistant, based on the Assistant algorithm (a descendant of ID3). The results were decision trees that correspond to qualitative models of the player types. The strategies constructed from the data correspond nicely to the hierarchical decision process performed by the players. The results also confirm the intuition about the game, and the interpretation of the trees confirms the rational-behaviour hypothesis. This presentation includes only the analysis of the summary data over all 4 countries; the results for particular countries showed similar decision models, with differences mainly in the thresholds for the various attributes.

In future work, we will analyse additional games in a similar manner. We want to use the results (automatically built models) as autonomous agents for proving convergence properties of various games based on the experimental data. Finally, we want to build a mixed (qualitative and quantitative) methodology for analysing decision processes in more complex situations as well.

References

[1] Cestnik, B., Kononenko, I. and Bratko, I. (1987) ASSISTANT 86: A knowledge elicitation tool for sophisticated users. In Bratko, I. and Lavrac, N. (eds.) Progress in Machine Learning, pp. 31-45. Wilmslow: Sigma Press.

[2] Mladenic, D. (1990) Machine Learning System Magnus Assistant (in Slovene). BSc Thesis, University of Ljubljana, Faculty for Electrical Engineering and Computer Science.

[3] Prasnikar, V. and Roth, A. E. (1992) Considerations of Fairness and Strategy: Experimental Data From Sequential Games. The Quarterly Journal of Economics, August, 865-888.

[4] Roth, A. E., Prasnikar, V., Okuno-Fujiwara, M. and Zamir, S. (1991) Bargaining and Market Behavior in Jerusalem, Ljubljana, Pittsburgh, and Tokyo: An Experimental Study. American Economic Review, 81, December, 1068-1095.

[5] Osborne, M. J. and Rubinstein, A. (1990) Bargaining and Markets. Academic Press, San Diego.

[6] Roth, A. E. (1979) Axiomatic Models of Bargaining. Springer Verlag, Berlin.

[7] Quinlan, J. R. (1986) Induction of decision trees. Machine Learning, Vol. 1, pp. 81-106.

