Tutorial 2 (Game Theory)

July, 2013

Tutorial: Introduction to Game Theory
Jesus Rios, IBM T.J. Watson Research Center, USA

[email protected]

© 2013 IBM Corporation

Approaches to decision analysis
- Descriptive: understanding how decisions are actually made
- Normative: models of how decisions should be made
- Prescriptive: helping the DM make smart decisions
  - Use of normative theory to support the DM
  - Elicit the inputs of normative models: the DM's preferences and beliefs (psycho-analysis); use of experts
  - Role of descriptive theories of DM behavior

Game theory arena
- Non-cooperative games
  - More than one intelligent player
  - Individual action spaces
  - Interdependent consequences: each player's consequences depend on his own and the other players' actions
- Cooperative game theory
  - Normative bargaining models: joint decision making with binding agreements on what to play
    - Given the players' preferences and the solution space, find a fair, jointly satisfying and Pareto-optimal agreement/solution
  - Group decision making on a common action space (social choice)
    - Preference aggregation
    - Voting rules: Arrow's theorem
  - Coalition games

Cooperative game theory: bargaining solution concepts
- Working alone, Juan makes $10 and Maria makes $20; working together they make $100
- How to distribute the profits of the cooperation? Find x for Juan and y for Maria with x + y = 100
[Figure: feasible set x + y = 100 in the (Juan, Maria) plane, with disagreement point (10, 20), bliss point, and the fair split x = 45, y = 55]
- Disagreement point: BATNA, status quo
- Feasible solutions: ZOPA
- Pareto-efficiency
- Aspiration levels
- Fairness: K-S, Nash, maxmin solutions; here the fair split is x = 45, y = 55 (a small sketch follows)
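
As a check on the fair split above, here is a minimal Python sketch (mine, not from the slides) of the Nash bargaining solution for this example, assuming utilities linear in money:

    # Nash bargaining for the Juan/Maria example: maximize the Nash
    # product (x - 10) * (y - 20) subject to x + y = 100.
    d_juan, d_maria = 10.0, 20.0      # disagreement point (BATNA)
    total = 100.0                     # joint profit to split

    # With transferable utility, the Nash product is maximized by
    # splitting the surplus over the disagreement point equally.
    surplus = total - (d_juan + d_maria)
    x = d_juan + surplus / 2          # Juan's share
    y = d_maria + surplus / 2         # Maria's share
    print(x, y)                       # 45.0 55.0 -- the slide's fair split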

Normative models of decision making under uncertainty
- Models for a unitary DM
  - vN-M expected utility: objective probability distributions
  - Subjective expected utility (SEU): subjective probability distributions
- Example: an investment decision problem
  - One decision variable with two alternatives: in what to invest? Treasury bonds or IBM shares
  - One uncertainty with two possible states: IBM share price at the end of the year, High or Low
  - One evaluation criterion for consequences: profit from the investment
- The simplest decision problem under uncertainty

Decision table

                High       Low
    IBM Shares  $2,000    -$1,000
    Bonds         $500       $500

- The DM chooses a row without knowing which column will occur
- The choice depends on the relative likelihood of High and Low
  - If the DM is sure that the IBM share price will be High, the best choice is to buy Shares
  - If the DM is sure that the IBM share price will be Low, the best choice is to buy Bonds
  - So: elicit the DM's beliefs about which column will occur
- The choice also depends on the value of money
  - Expected return is not a good measure of decision preferences: the two alternatives can give the same expected return, yet most DMs would not feel indifferent between them (see the worked check below)
  - So: elicit the risk attitude of the DM
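
A quick worked check (the slides do not state the probability of High; p = 1/2 is an assumption that makes the claim exact):

    E[Shares] = p(2,000) + (1 - p)(-1,000) = 3,000p - 1,000
    E[Bonds]  = 500

With p = 1/2 both expected returns equal $500, yet Shares can lose $1,000 while Bonds cannot, so a risk-averse DM is not indifferent.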

Decision tree representation

    What to buy?
    - IBM Shares -> price (uncertainty): High -> $2,000; Low -> -$1,000
    - Bonds -> $500 (certainty)

- What does the choice depend upon?
  - the relative likelihood of High vs Low
  - the strength of preferences for money

Subjective expected utility solution
- If the DM's decision behavior is consistent with a set of "rational" desiderata (axioms), the DM decides as if he had
  - probabilities representing his beliefs about the future price of the IBM share
  - "utilities" representing his preferences and risk attitude towards money
  and chooses the alternative of maximum expected utility
- The subjective expected utility model balances, in a "rational" manner, the DM's beliefs and risk attitudes
- Application requires
  - knowing the DM's beliefs and "utilities": different elicitation methods
  - computing the expected utility of each decision strategy: this may require approximation in complex problems

A constructive definition of "utility"
- The Basic Canonical Reference Lottery ticket (p-BCRL):

    p-BCRL: with canonical probability p -> $2,000; with probability 1 - p -> -$1,000

- Preferences over BCRLs: p-BCRL > q-BCRL iff p > q, where p and q are canonical probabilities

Eliciting the probability of the price of IBM shares
- Event H: IBM price High; event L: IBM price Low; Pr(H) + Pr(L) = 1
- Compare "IBM shares" (H -> $2,000; L -> -$1,000) against the p-BCRL
- Move p from 1 to 0 and ask which alternative the DM prefers: IBM shares or the p-BCRL
- There exists a breakeven canonical probability pH at which the DM is indifferent: pH-BCRL ~ IBM shares
- The judgmental probability of H is pH

Eliciting the utility of $500
- What is U($500)?
- Compare the p-BCRL ($2,000 with probability p, -$1,000 with probability 1 - p) against Bonds ($500 for sure)
- Move p from 1 to 0 and ask which alternative the DM prefers
- There exists a breakeven canonical probability u at which the DM is indifferent: u-BCRL ~ Bonds
  - This scales the value of $500 between the values of $2,000 and -$1,000: U($500) = u
- So U($500) is the probability of a BCRL between $2,000 and -$1,000 that is indifferent (for the DM) to getting $500 with certainty

Comparison of alternatives
- IBM shares (H -> $2,000; L -> -$1,000) ~ pH-BCRL
- Bonds ($500 for sure) ~ U($500)-BCRL
- The DM prefers to invest in "IBM Shares" iff pH > U($500)

Solving the tree: backward induction
- Utility scaling: 0 = U(-$1,000) < U($500) = u < U($2,000) = 1
- Utilities at the end of the tree:
  - IBM Shares -> price High (prob pH): utility 1; price Low (prob 1 - pH): utility 0
  - Bonds: utility u
- Expected utilities: EU(Shares) = pH and EU(Bonds) = u, so buy Shares iff pH > u (a small sketch follows)
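
A minimal sketch of this backward-induction step in Python; the numeric inputs in the example are illustrative assumptions, not values from the slides:

    # Backward induction for the investment tree, using the slide's
    # utility scaling U(-$1,000) = 0, U($2,000) = 1, U($500) = u.
    def best_choice(p_high, u_500):
        eu_shares = p_high * 1.0 + (1.0 - p_high) * 0.0   # = p_high
        eu_bonds = u_500
        return "IBM Shares" if eu_shares > eu_bonds else "Bonds"

    # Example: a DM with p_high = 0.6 and u = 0.7 buys Bonds (0.6 < 0.7).
    print(best_choice(0.6, 0.7))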

Preferences: value vs. utility
- A value function
  - measures the desirability (intensity of preferences) of money gained,
  - but does not measure risk attitude
- A utility function
  - measures risk attitude,
  - but not the intensity of preferences over sure consequences
- Many methods exist to elicit a utility function
  - Qualitative analysis of risk attitude leads to parametric utility functions
  - Quantitative indifference questions between deals (one of which must be an uncertain lottery) assess the parameters of the utility function
  - Consistency checks and sensitivity analysis

The Bayesian process of inference and evaluation with several stakeholders and decision makers (Group decision making)


Disagreements in group decision making
- Group decision making assumes
  - a group value/utility function
  - group probabilities on the uncertainties
- If our experts disagree on the science (the expert problem): how to draw together and learn from conflicting probabilistic judgements?
  - Mathematical aggregation
    - Bayesian approach
    - Opinion pools: there is no opinion pool satisfying a minimal consensus set of "good" probabilistic properties (see the sketch after this list)
    - Issues: how to model knowledge overlap/correlation; expertise evaluation
  - Behavioural aggregation
  - The textbook problem: if we do not have access to the experts, we need meta-analytical methodologies for drawing together expert-judgment studies
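
A minimal sketch of two standard opinion pools for a single binary event; the expert probabilities and weights below are illustrative assumptions:

    # Linear and logarithmic opinion pools.
    import numpy as np

    p = np.array([0.2, 0.5, 0.4])    # each expert's Pr(event)
    w = np.array([0.5, 0.3, 0.2])    # expertise weights, summing to 1

    linear_pool = w @ p              # weighted arithmetic average
    log_num = np.prod(p ** w)        # weighted geometric average...
    log_pool = log_num / (log_num + np.prod((1 - p) ** w))  # ...renormalized
    print(linear_pool, log_pool)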

Disagreements in group decision making
- If group members disagree on the values: how to combine different individuals' rankings of options into a group ranking?
  - Arbitration/voting
    - Ordinal rankings: Arrow impossibility results
    - Cardinal rankings (values, not utilities -- decisions without uncertainty): interpersonal comparison of preference strengths; supra-decision-maker approach (MAUT)
    - Issues: manipulation and truthful reporting of rankings
- If they disagree on both the values and the science
  - Combine individual probabilities and utilities into group probabilities and utilities, form the corresponding group expected utilities, and choose accordingly
  - Impossibility of being Bayesian and Paretian at the same time: no aggregation method (of probabilities and utilities) exists that is compatible with the Pareto order
  - Behavioral approaches: consensus on group probabilities and utilities via sensitivity analysis; agreement on what to do via negotiation

Decision analysis in the presence of intelligent others
- Matrix games against Nature
  - One player: R (Row), with two choices, U (Up) and D (Down)
  - Payoff matrix:

              Nature
              L    R
        U     0    5
        D    10    3

- If you were R, what would you do?
  - D > U against L, but U > D against R

Games against Nature
- Do we know which column Nature will choose?
  - We know our best responses to Nature's moves, but not which move Nature will choose
- Do we know the (objective) probabilities of Nature's possible moves? If YES:

              L (p)   R (1-p)   Expected payoff
        U       0        5      0p + 5(1-p) = 5 - 5p
        D      10        3      10p + 3(1-p) = 3 + 7p

- U > D iff 5 - 5p > 3 + 7p, i.e. iff p < 1/6 (payoffs = vNM utils)

Games against Nature and the SEU criterion
- Do we know the (objective) probabilities of Nature's possible moves? If NO:
  - A variety of decision criteria: maximin (pessimistic), maximax (optimistic), Hurwicz, minimax regret, ...

              L    R   Min  Max  Max regret
        U     0    5    0    5      10
        D    10    3    3   10       2

  - Maximin: D; Maximax: D; Minimax regret: D (a sketch computing these follows)
- SEU criterion
  - Elicit the DM's subjective probabilistic beliefs about Nature's move (p)
  - Compute the SEU of each alternative: D > U iff p > 1/6
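
A minimal sketch computing the three criteria for the slide's 2x2 payoff matrix:

    # Maximin, maximax, and minimax regret for the game against Nature.
    import numpy as np

    payoff = np.array([[0, 5],     # row U
                       [10, 3]])   # row D
    rows = ["U", "D"]

    maximin = rows[int(np.argmax(payoff.min(axis=1)))]          # D (3 > 0)
    maximax = rows[int(np.argmax(payoff.max(axis=1)))]          # D (10 > 5)
    regret = payoff.max(axis=0) - payoff                        # regret table
    minimax_regret = rows[int(np.argmin(regret.max(axis=1)))]   # D (2 < 10)
    print(maximin, maximax, minimax_regret)                     # D D D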

Games against other intelligent players
- Bimatrix (simultaneous) games
  - A second intelligent player: C (Column), with two choices, L (Left) and R (Right)
  - Payoff bimatrix (R's payoff, C's payoff); we know C's payoffs and that he will try to maximize them:

                    C
               L          R
        U    0, 2       5, 4 *
        D   10, 3       3, 8

  - As R, what would you do?
  - Knowledge of C's payoffs and rationality allows us to predict with certitude C's move (R), so R should play U: the solution is (U,R), marked *

One-shot simultaneous bimatrix games
- Two players, each trying to maximize his own payoff
- Each player must choose one out of two fixed alternatives
  - The row player chooses a row; the column player chooses a column
- Payoffs depend on both players' moves
- Simultaneous-move game
  - Players act without knowing what the other player does
  - Played once
- No other uncertainties involved
- Players have full and common knowledge of the choice spaces and the bimatrix payoffs
- No cooperation allowed

                         C
                L                     R
        U   uR(U,L), uC(U,L)    uR(U,R), uC(U,R)
        D   uR(D,L), uC(D,L)    uR(D,R), uC(D,R)

Dominant alternatives and social dilemmas
- Prisoner's dilemma (C = cooperate, NC = not cooperate; payoffs are Row's, Column's):

                C          NC
        C     5, 5      -5, 10
        NC   10, -5     -2, -2 *

  - (NC,NC) is mutually dominant: each player's choice is independent of any information about the other player's move
  - (NC,NC) is socially dominated by (C,C)
- Airport network security

Iterative dominance
- No dominant strategy for either player; however, there are iteratively dominated strategies (the 3x3 payoff matrix on the slide is not reproduced here)
  - L > R, so R can be eliminated
  - Now M is dominant in the restricted game: M > U and M > D
  - Now L > C in the restricted game: 20 > -10
- (M,L) is the solution by iterative elimination of (strictly) dominated strategies
  - This relies on common knowledge and rationality assumptions
- Exercise: find whether there is a solution by iteratively eliminating dominated strategies (a generic sketch of the procedure follows)
  - Solution: (D,C)
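
A minimal generic sketch of iterated elimination of strictly dominated pure strategies; since the slide's 3x3 matrices are not reproduced, the usage example reuses the 2x2 bimatrix from the earlier slide:

    # Iterated elimination of strictly dominated pure strategies.
    import numpy as np

    def iterated_dominance(u_row, u_col):
        rows = list(range(u_row.shape[0]))
        cols = list(range(u_row.shape[1]))
        changed = True
        while changed:
            changed = False
            for i in list(rows):   # drop rows strictly dominated for Row
                if any(all(u_row[k, j] > u_row[i, j] for j in cols)
                       for k in rows if k != i):
                    rows.remove(i); changed = True
            for j in list(cols):   # drop cols strictly dominated for Column
                if any(all(u_col[i, k] > u_col[i, j] for i in rows)
                       for k in cols if k != j):
                    cols.remove(j); changed = True
        return rows, cols

    u_row = np.array([[0, 5], [10, 3]])   # Row's payoffs (earlier slide)
    u_col = np.array([[2, 4], [3, 8]])    # Column's payoffs
    print(iterated_dominance(u_row, u_col))  # ([0], [1]) -> (U, R)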

Nash equilibrium
- For games without a dominant solution or a solution by iterative elimination of dominated alternatives:

    Battle of the sexes                  Matching pennies
              Ballet    Concert                    Heads     Tails
    Ballet    2, 1 *     0, 0             Heads    1, -1    -1, 1
    Concert   0, 0       1, 2 *           Tails   -1, 1      1, -1

- Battle of the sexes has two pure Nash equilibria (marked *); matching pennies has none

Existence of Nash equilibrium (Nash)
- Every finite game has a NE in mixed strategies
  - Requires extending the original set of alternatives of each player
- Consider the matching pennies game
  - Mixed strategies: each player chooses a lottery over Heads and Tails
  - Players' choice sets are defined by the lottery's probability: Row picks p in [0,1], Column picks q in [0,1]
  - The payoff associated with a pair of strategies (p,q) is (p, 1-p) P (q, 1-q)^T, where P is the payoff matrix of the original game in pure strategies; payoffs need to be vNM utilities
  - Nash equilibrium: an intersection (p*, q*) of the players' best-response correspondences, i.e.
    uR(p*, q*) >= uR(p, q*) for all p, and uC(p*, q*) >= uC(p*, q) for all q
    (a numerical check follows)
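
A minimal sketch checking that p* = q* = 1/2 is the mixed equilibrium of matching pennies:

    # Mixed-strategy expected payoff for the row player in matching pennies.
    import numpy as np

    P = np.array([[1, -1], [-1, 1]])   # Row's payoffs; Column's are -P

    def u_row(p, q):
        return np.array([p, 1 - p]) @ P @ np.array([q, 1 - q])

    # Against q = 1/2 the row player is indifferent over all p, so p* = 1/2
    # is a best response; by symmetry the same holds for the column player.
    print(u_row(0.5, 0.5), u_row(0.0, 0.5), u_row(1.0, 0.5))   # all 0.0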

The Nash equilibrium concept as a predictive tool
- Supporting the row player against the column player
- Games with multiple NEs:

               L            R
        U    4, -100     10, 6 *
        D   12, 8 *       5, 4

- Two NEs: (D,L) and (U,R)
- (D,L) > (U,R) for both players, since 12 > 10 and 8 > 6
- Yet C may prefer to play R, to protect himself against -100
- Knowing this, R would prefer to play U, ending up at the inferior NE (U,R)
- How can we model C's behavior? Bayesian K-level thinking

K-level thinking
- Row is not sure about Column's move
  - p: Row's belief that C moves L
  - Row's SEU: U: 4p + 10(1-p); D: 12p + 5(1-p)
  - U > D iff p < 5/13 ~ 0.38
- How to elicit p? Through Row's analysis of Column's decision
  - Assume C behaves as an SEU maximizer
  - q: C's belief that Row is smart enough to choose D (the better NE)
  - C's SEU: L: -100(1-q) + 8q; R: 6(1-q) + 4q
  - L > R iff q > 53/55 ~ 0.96
  - Since Row does not know q, his beliefs about q are represented by a distribution with CDF F, so p = Pr(q > 53/55) = 1 - F(53/55)
  (a small sketch follows)
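
A minimal sketch of this computation; the Beta(2,2) prior for Row's beliefs about q is an illustrative assumption, since the slides leave F unspecified:

    # One step of K-level reasoning for the game on the previous slide.
    from scipy.stats import beta

    F = beta(2, 2).cdf                 # assumed CDF of Row's beliefs about q
    p = 1 - F(53 / 55)                 # Pr(Column plays L) = Pr(q > 53/55)

    eu_U = 4 * p + 10 * (1 - p)
    eu_D = 12 * p + 5 * (1 - p)
    print(p, "U" if eu_U > eu_D else "D")   # p ~ 0.004 -> play U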

Simultaneous vs. sequential games
- First-mover advantage (e.g., the game of Chicken)
  - Both players want to move first
  - Credible commitment/threat
- Second-mover advantage (e.g., the matching pennies game)
  - Players want to observe their opponent's move before acting
  - Both players try not to disclose their moves

Dynamic games: backward induction
- Sequential Defend-Attack games
  - Two intelligent players: Defender and Attacker
  - Sequential moves: first the Defender, afterwards the Attacker, knowing the Defender's decision

Standard game-theoretic analysis
- Expected utilities at node S
- Best Attacker's decision at node A
- Assuming the Defender knows the Attacker's analysis: Defender's best decision at node D
- Solution:
(the formulas on this slide are not reproduced here; a generic sketch of this backward induction follows)
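
A minimal sketch (mine) of this backward-induction logic for a discrete Defend-Attack game; the action sets and success probabilities are illustrative assumptions:

    # Backward induction: the Attacker best-responds at node A, and the
    # Defender anticipates that best response at node D.
    defenses, attacks = ["d1", "d2"], ["a1", "a2"]
    p_success = {("d1", "a1"): 0.6, ("d1", "a2"): 0.3,
                 ("d2", "a1"): 0.2, ("d2", "a2"): 0.5}   # P(success | d, a)

    uA = lambda d, a: p_success[d, a]          # Attacker's expected utility
    uD = lambda d, a: 1 - p_success[d, a]      # Defender's expected utility

    best_attack = {d: max(attacks, key=lambda a: uA(d, a)) for d in defenses}
    d_star = max(defenses, key=lambda d: uD(d, best_attack[d]))
    print(d_star, best_attack[d_star])         # d2 a2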

Supporting an SEU-maximizing Defender
- The Defender's problem
- The Defender's solution of maximum SEU
- Modeling input: ??
(the formulas on this slide are not reproduced here)

Example: Banks-Anderson (2006)
- Explores how to defend the US against a possible smallpox attack
  - Random costs (payoffs)
  - Conditional probabilities of each kind of smallpox attack, given that the terrorists know what defence has been adopted: this is the problematic step of the analysis
  - Compute the expected cost of each defence strategy
- Solution: the defence of minimum expected cost

Predicting the Attacker's decision
- The Defender's problem
- The Defender's view of the Attacker's problem
(the diagrams on this slide are not reproduced here)

Solving the assessment problem
- The Defender's view of the Attacker's problem
- Elicitation, assuming A is an EU maximizer: D's beliefs about A's probabilities and utilities
- MC simulation

Bayesian decision solution for the sequential Defend-Attack model

Standard game theory vs. Bayesian decision analysis
- Decision analysis (unitary DM)
  - Uses decision trees
  - Opponents' actions treated as random variables
    - How to elicit probabilities on opponents' decisions??
    - Sensitivity analysis on the (problematic) probabilities
- Game theory (multiple DMs)
  - Uses game trees
  - Opponents' actions treated as decision variables
  - All players are EU maximizers: do we really know the utilities our opponents try to maximize?

Bayesian decision analysis approach to games
- One-sided prescriptive support
  - Use a prescriptive model (SEU) to support one of the DMs
  - Treat the opponent's decisions as uncertainties
  - Assess probabilities over the opponent's possible actions
  - Compute the action of maximum expected utility
- The 'real' Bayesian approach to games (Kadane & Larkey 1982)
  - Weakens the common (prior) knowledge assumption
- How to assess a probability distribution over the actions of intelligent others??
  - "Adversarial Risk Analysis" (DRI, DB and JR)
  - Development of new methods for eliciting probabilities on the adversary's actions, by modeling the adversary's decision reasoning (descriptive decision models)

Relevance to counter-bioterrorism
- Biological Threat Risk Assessment for DHS (Battelle, 2006)
  - Based on Probability Event Trees (PET): Government and Terrorists' decisions treated as random events
- Methodological improvements study (NRC committee)
  - PET is appropriate for risk assessment of random failures in engineering systems, but not for adversarial risk assessment
    - Terrorists are intelligent adversaries trying to achieve their own objectives
    - Their decisions (if rational) can to some extent be anticipated
  - PET cannot be used for a full risk-management analysis: the Government is a decision maker, not a random variable

Methodological improvement recommendations
- Distinguish between risks from Nature/accidents and risks from the actions of intelligent adversaries
- Need for models to predict the Terrorists' behavior
  - Red-team role playing (simulations of the adversaries' thinking)
  - Attack-preference models: examine the decision from the Attacker's viewpoint (T as DM)
  - Decision-analytic approaches: transform the PET into a decision tree (G as DM)
    - How to elicit probabilities on the terrorists' decisions??
    - Sensitivity analysis on the (problematic) probabilities
    - Von Winterfeldt and O'Sullivan (2006)
  - Game-theoretic approaches: transform the PET into a game tree (G and T as DMs)

Models to predict opponents' behavior
- Role playing (simulations of the adversaries' thinking)
- Opponent-preference models
  - Examine the decision from the opponent's viewpoint: elicit the opponent's probabilities and utilities from our viewpoint (point estimates)
  - Treat the opponent as an EU maximizer (= rationality?): solve the opponent's decision problem by finding his action of maximum EU
  - Assuming we know the opponent's true probabilities and utilities, we can anticipate with certitude what the opponent will do
- Probabilistic prediction models
  - Acknowledge our uncertainty about the opponent's thinking

Opponent-preference models
- Von Winterfeldt and O'Sullivan (2006): Should We Protect Commercial Airplanes Against Surface-to-Air Missile Attacks by Terrorists?
- Decision tree + sensitivity analysis on the probabilities

Parnell (2007)
- Elicit the Terrorist's probabilities and utilities from our viewpoint (point estimates)
- Solve the Terrorist's decision problem, finding the Terrorist's action of maximum expected utility
- Assuming we know the Terrorist's true probabilities and utilities, we can anticipate with certitude what the terrorist will do

Parnell (2007)
- Terrorist decision tree (not reproduced here)

Paté-Cornell & Guikema (2002)
[Influence diagram linking the Attacker's and the Defender's decision problems, not reproduced here]

Paté-Cornell & Guikema (2002)
- Assessing probabilities of the terrorist's actions, from the Defender's viewpoint
  - Model the Attacker's decision problem
  - Estimate the Attacker's probabilities and utilities (point estimates)
  - Calculate the expected utilities of the Attacker's actions
  - Take the probability of each of the Attacker's actions proportional to its perceived EU (see the sketch below)
- Feed these probabilities into the Defender's decision problem
  - The uncertainty about the Attacker's decisions has been quantified
  - Choose the defense of maximum expected utility
- Shortcoming: if the (idealized) adversary is an EU maximizer, he would certainly choose the attack of maximum expected utility
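
A minimal sketch of the proportional-to-EU rule; the EU point estimates are illustrative assumptions:

    # Probability of each attack proportional to its perceived EU.
    import numpy as np

    eu = np.array([0.9, 0.3, 0.6])   # Attacker's EU for attacks a1, a2, a3
    p_attack = eu / eu.sum()         # approx [0.5, 0.167, 0.333]
    print(p_attack)
    # Note the slide's shortcoming: an EU-maximizing attacker would instead
    # play argmax(eu) with certainty.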

How to assess probabilities over the actions of an intelligent adversary??
- Raiffa (2002): the asymmetric prescriptive/descriptive approach
  - Prescriptive advice to one party, conditional on a (probabilistic) description of how the others will behave
  - Assess the probability distribution from experimental data (lab role-simulation experiments)
- Rios Insua, Rios & Banks (2009)
  - Assessment based on an analysis of the adversary's rational behavior, assuming the opponent is an SEU maximizer: model his decision problem, assess his probabilities and utilities, and find his action of maximum expected utility
  - The uncertainty about the Attacker's decision stems from our uncertainty about his probabilities and utilities
  - Sources of information: available past statistical data on the Attacker's decision behavior; expert knowledge / intelligence

The Defend-Attack-Defend model
- Two intelligent players: Defender and Attacker
- Sequential moves
  - First, the Defender moves
  - Afterwards, the Attacker, knowing the Defender's move
  - Afterwards, the Defender again, responding to the attack
- Infinite regress

Standard game theory analysis
- Under common knowledge of utilities and probabilities
- Expected utilities at node S
- Best Attacker's decision at node A
- Best Defender's decision at node D
- Nash solution:
(the formulas on this slide are not reproduced here)

Supporting the Defender against the Attacker
- Expected utilities at node S
- At node A: ??
- Best Defender's decision at node D
(the formulas on this slide are not reproduced here)

Predicting the Attacker's decision
- The Attacker's problem as seen by the Defender

Assessing ... given ...
(the formulas on this slide are not reproduced here)

Monte Carlo approximation
- Draw samples of the (unknown) Attacker inputs
- Generate the Attacker's optimal action for each draw by solving his problem
- Approximate the predictive distribution of the Attacker's action by the empirical frequencies of those optimal actions
(the slide's formulas are not reproduced here; a small sketch follows)
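
A minimal sketch of this Monte Carlo scheme as I read it (the slide's formulas were lost); the sampling distributions below are illustrative assumptions:

    # Sample the Attacker's inputs from the Defender's beliefs, solve the
    # Attacker's problem per draw, and use the empirical frequencies as the
    # predictive distribution over attacks.
    import numpy as np
    rng = np.random.default_rng(0)

    attacks = ["a1", "a2"]
    N, counts = 10_000, {a: 0 for a in attacks}
    for _ in range(N):
        # Draw the Attacker's success probabilities and cost terms
        # (Beta/uniform choices here are assumptions).
        p_succ = rng.beta(2, 2, size=len(attacks))
        cost = rng.uniform(0, 0.5, size=len(attacks))
        eu = p_succ - cost                        # Attacker's EU per attack
        counts[attacks[int(np.argmax(eu))]] += 1  # his optimal action

    p_attack = {a: c / N for a, c in counts.items()}
    print(p_attack)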

The assessment of ...
- The Defender may want to exploit information about how the Attacker analyzes her problem
- Hierarchy of recursive analyses
  - Infinite regress
  - Stop when there is no more information to elicit

Games with private information
- Example: a two-person simultaneous game with asymmetric information
  - Player 1 (Row) knows whether he is stronger than Player 2 (Column), but Player 2 does not know this
  - A player's type is used to represent information privately known by that player

Bayes Nash equilibrium
- Assumption: a common prior over the row player's type
  - Column's beliefs about the row player's type are common knowledge
  - Why would Column disclose this information?
  - Why would Row believe that Column is disclosing her true beliefs about his type?
- Row's strategy is a function of his type

Bayes Nash Equilibrium


Is the common knowledge assumption realistic?
- Column is better off reporting that ...
(the remainder of this slide's content is not reproduced here)

Modeling opponents' learning of private information
- Simultaneous decisions
  - Bayes Nash equilibrium
  - No opportunity to learn about this information
- Sequential decisions
  - Perfect Bayesian equilibrium / sequential rationality
  - Opportunity to learn from observed decision behavior: signaling games
- Models of adversaries' thinking to anticipate their decision behavior
  - We need to model the opponents' learning of private information we want to keep secret
  - How would this lead to a predictive probability distribution?

Sequential Defend-Attack model with the Defender's private information
- Two intelligent players: Defender and Attacker
- Sequential moves: first the Defender, afterwards the Attacker, knowing the Defender's decision
- The Defender's decision takes into account her private information
  - The vulnerabilities and importance of the sites she wants to protect
  - The position of ground soldiers in the data-ferry control problem (ITA)
- The Attacker observes the Defender's decision
  - The Attacker can infer/learn about information she wants to keep secret
- How to model the Attacker's learning?

Influence diagram vs. game tree representation


A game theoretic analysis
A game theoretic solution
(the analysis formulas on these slides are not reproduced here)

Supporting the Defender
- We weaken the common knowledge assumption
- The Defender's decision problem:
  [Influence diagram with nodes D, A, S and V; the Attacker's node is marked "??"]

Defender’s solution


Predicting the Attacker’s move:


The Attacker's action of maximum EU

Assessing


How to stop this hierarchy of recursive analysis?
- The analysis of nested decision models is potentially infinite; where to stop?
  - Accommodate as much information as we can
  - Stop when the Defender has no more information to give
  - Use a non-informative or reference model at the deepest level
  - Test with sensitivity analysis