Game AI

Game AI Prof. Christian Bauckhage

outline lecture 06

recap
optimal strategies and tree search algorithms
α-β pruning
depth restricted searches
searching under uncertainty / chance
history and state of the art
summary

recap

zero-sum game ⇔ a game in which a player's gain of utility is exactly balanced by the loss(es) of the other player(s)

utility / payoff ⇔ a numerical value assigned to terminal game states

[figure: game tree for tic tac toe; terminal utilities −1/+1, +1/−1, and 0 are assigned to the leaf nodes]

recap

goal state ⇔ a terminal state (usually a winning state) of high utility, i.e. typically a leaf node of high utility in the game tree

optimal strategy ⇔ a strategy that is at least as good as any other when playing against an infallible opponent


recap

minmax algorithm
⇔ decision rule for playing infallible opponents
⇔ minimize the maximum loss ⇔ maximize the minimum gain
⇔ recursively assign minmax values mmv to all game states that can result from n and move to the state with the largest mmv

mmv(n) =
  u(n)                       if n is a terminal node
  max_{s ∈ Succ(n)} mmv(s)   if n is a MAX node
  min_{s ∈ Succ(n)} mmv(s)   if n is a MIN node
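The recursive definition of mmv translates almost line by line into code. A minimal sketch in Python on a small hand-built game tree; the tree, node names, and leaf utilities are illustrative, not from the lecture:

```python
# minimal minmax sketch; MAX moves at the root, MIN at its successors
TREE = {
    "n0": ["n1", "n2"],
    "n1": ["n3", "n4"],
    "n2": ["n5", "n6"],
}
UTILITY = {"n3": 3, "n4": 12, "n5": 2, "n6": 8}  # terminal utilities u(n)

def mmv(n, is_max=True):
    if n in UTILITY:                      # terminal node: return u(n)
        return UTILITY[n]
    values = [mmv(s, not is_max) for s in TREE[n]]
    return max(values) if is_max else min(values)

# MIN picks min(3, 12) = 3 below n1 and min(2, 8) = 2 below n2;
# MAX picks max(3, 2) = 3 at the root
print(mmv("n0"))  # → 3
```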

observe

[figure: example game tree with nodes n0 . . . n30; minmax values are propagated from the leaf utilities up to the root]

minmax is a depth-first search algorithm; the number of states to be explored is exponential in the number of moves, i.e. the worst case effort is O(b^m)

⇒ it is usually infeasible to compute mmv(n) in practice

optimal strategies and tree search algorithms

obvious ideas

recall that search tree nodes often occur multiple times

recall that we can compute hash values for game states

⇒ create a transposition table, i.e. a hash or lookup table of previously seen states

⇒ if minmax encounters a node that has already been expanded in a different branch of the game tree, do not expand it again but recycle minmax values from the transposition table
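A minimal sketch of this idea in Python; states are identified by name here and the small tree is made up for illustration, whereas real programs would hash board positions (e.g. with Zobrist hashing):

```python
# minmax with a transposition table; node n4 is reachable via two branches
TREE = {"n0": ["n1", "n2"], "n1": ["n3", "n4"], "n2": ["n4", "n5"]}
UTILITY = {"n3": 1, "n4": 5, "n5": -1}

transposition = {}  # previously seen state -> its minmax value

def mmv(n, is_max=True):
    if n in transposition:        # already expanded in another branch: recycle
        return transposition[n]
    if n in UTILITY:
        value = UTILITY[n]
    else:
        values = [mmv(s, not is_max) for s in TREE[n]]
        value = max(values) if is_max else min(values)
    transposition[n] = value      # remember for later branches
    return value

print(mmv("n0"))  # → 1; the second visit to n4 is answered from the table
```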

question are there less obvious ideas / approaches ?

answer yes, there are

observe

often, it is possible to compute the correct minmax decision without exploring every game state (node in the game tree)

often, the worst case effort exponent can be halved: O(b^(m/2))

using pruning techniques, large parts of the search tree do not have to be considered

pruning (in the context of AI) ⇔ eliminating possibilities from consideration without having to examine them

example

[figure: α-β pruning on an example game tree; MAX moves at the root n0, MIN at its successors n1, n2, n3; expanded step by step, the leaves below n1 yield 12, 3, 8, the first leaf below n2 yields 2, and the first two leaves below n3 yield 14 and 2 — the remaining leaves are pruned]

for each node, register (α, β) where
α = highest mmv found so far for MAX
β = lowest mmv found so far for MIN

writing x, y, v for the values of the unexplored leaves, we obtain

mmv(n0) = max( min(12, 3, 8), min(2, x, y), min(14, 2, v) )
        = max( 3, min(2, x, y), min(2, v) )
        = max( 3, z, w )   where z, w ≤ 2
        = 3

α-β pruning

MaxVal(n0, −∞, +∞)

function MaxVal(n, α, β)
    if isTerminalState(n)
        return u(n)
    for s in Succ(n)
        α ← max(α, MinVal(s, α, β))
        if α ≥ β
            return α
    return α

function MinVal(n, α, β)
    if isTerminalState(n)
        return u(n)
    for s in Succ(n)
        β ← min(β, MaxVal(s, α, β))
        if β ≤ α
            return β
    return β
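The two functions translate directly into Python. The tree below mirrors the worked example (leaf values 12, 3, 8 below the first MIN node, 2 below the second, 14 and 2 below the third); the unexplored leaves x, y, v receive arbitrary values to show that they do not affect the result:

```python
import math

# game tree of the α-β example; x, y, v are set to arbitrary values (4, 6, 5)
TREE = {"n0": ["n1", "n2", "n3"],
        "n1": ["n4", "n5", "n6"],
        "n2": ["n7", "n8", "n9"],
        "n3": ["n10", "n11", "n12"]}
UTILITY = {"n4": 12, "n5": 3, "n6": 8,
           "n7": 2, "n8": 4, "n9": 6,
           "n10": 14, "n11": 2, "n12": 5}

def max_val(n, alpha, beta):
    if n in UTILITY:
        return UTILITY[n]
    for s in TREE[n]:
        alpha = max(alpha, min_val(s, alpha, beta))
        if alpha >= beta:     # β cutoff: MIN will never allow this branch
            return alpha
    return alpha

def min_val(n, alpha, beta):
    if n in UTILITY:
        return UTILITY[n]
    for s in TREE[n]:
        beta = min(beta, max_val(s, alpha, beta))
        if beta <= alpha:     # α cutoff: MAX already has a better option
            return beta
    return beta

print(max_val("n0", -math.inf, math.inf))  # → 3
```

Below n2, the first leaf (2) already drives β below α = 3, so n8 and n9 (the leaves x and y) are never visited; likewise n12 (the leaf v) below n3.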

note

effectiveness of α-β pruning obviously depends on the order in which successors are explored

[figure: the example game tree from above, annotated with the (α, β) registers that result from this particular expansion order]

observe

"one can show" that, if successors are expanded in a random order, the number of nodes to be examined is O(b^(3m/4))

question

O(b^(3m/4)) is better than O(b^m), but can we do better still? what if the best alternative was expanded first?

answer

strictly speaking this is impossible, but assume it was . . .

observe

"one can show" that, if successors are ranked according to utility, the number of nodes to be examined is O(b^(m/2))

observe that b^(m/2) = (√b)^m
⇒ the effective branching factor reduces from b to √b

observe that m = 2 · (m/2)
⇒ in the same amount of time, minmax with α-β pruning can look twice as far ahead as ordinary minmax

note

even with pruning techniques such as α-β pruning, minmax has to search all the way down to the leaves of the search tree; while this is feasible for games such as tic tac toe or connect four, it is usually infeasible for games with larger branching factors

observe

in his 1950 paper, Shannon proposed to cut off searches after a few levels, i.e. to perform depth restricted searches

to estimate utilities of non-terminal nodes, he proposed to consider heuristic evaluation functions

heuristic / evaluation function ⇔ provides an estimate of the utility of a game state that is not a terminal state

consequence

change minmax from

mmv(n) =
  u(n)                       if n is a terminal node
  max_{s ∈ Succ(n)} mmv(s)   if n is a MAX node
  min_{s ∈ Succ(n)} mmv(s)   if n is a MIN node

to

mmv(n) =
  Eval(n)                    if CutOff(n) yields TRUE
  max_{s ∈ Succ(n)} mmv(s)   if n is a MAX node
  min_{s ∈ Succ(n)} mmv(s)   if n is a MIN node
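A minimal sketch of the depth restricted variant; a simple depth test stands in for CutOff and a lookup table stands in for Eval, and the tree, depth limit, and heuristic values are all illustrative:

```python
# depth restricted minmax: search stops once CutOff holds, Eval estimates
# the utility of the frontier node (here a hand-made lookup table)
TREE = {"a": ["b", "c"], "b": ["d", "e"], "c": ["f", "g"]}
HEURISTIC = {"b": 5, "c": 1, "d": 7, "e": 2, "f": 9, "g": 0}

def cut_off(n, depth, max_depth=1):
    # stop at the depth limit or when n has no successors
    return depth >= max_depth or n not in TREE

def eval_fn(n):
    return HEURISTIC[n]

def mmv(n, depth=0, is_max=True):
    if cut_off(n, depth):
        return eval_fn(n)
    values = [mmv(s, depth + 1, not is_max) for s in TREE[n]]
    return max(values) if is_max else min(values)

print(mmv("a"))  # depth limit 1: MAX sees Eval(b) = 5 and Eval(c) = 1 → 5
```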

example

simple evaluation function for tic tac toe

Eval(n, p) = (number of lines where p can win) − (number of lines where −p can win)

[figure: four example boards n with
Eval(n, X): 8 − 8 = 0, 8 − 4 = 4, 5 − 4 = 1, 5 − 3 = 2
Eval(n, O): 8 − 8 = 0, 4 − 8 = −4, 4 − 5 = −1, 3 − 5 = −2]

example

import numpy as np

def evaluate_game_state(S, p):
    # lines p could still complete if all empty cells held p's marks ...
    T1 = np.copy(S)
    T1[T1 == 0] = p
    n1 = num_winning_lines(T1, p)
    # ... minus the lines the opponent -p could still complete
    T2 = np.copy(S)
    T2[T2 == 0] = -p
    n2 = num_winning_lines(T2, -p)
    return n1 - n2

example

def num_winning_lines(T, p):
    cs = np.sum(T, axis=0) * p  # column sums
    rs = np.sum(T, axis=1) * p  # row sums
    s1 = cs[cs == 3].size       # columns completely owned by p
    s2 = rs[rs == 3].size       # rows completely owned by p
    s3 = 1 if np.sum(np.diag(T)) * p == 3 else 0            # main diagonal
    s4 = 1 if np.sum(np.diag(np.rot90(T))) * p == 3 else 0  # anti-diagonal
    return s1 + s2 + s3 + s4
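A quick sanity check of the two functions on boards from the example above, restated here so the snippet runs standalone: the empty board scores 8 − 8 = 0 for X, and the board with X in the centre scores 8 − 4 = 4:

```python
import numpy as np

def num_winning_lines(T, p):
    cs = np.sum(T, axis=0) * p
    rs = np.sum(T, axis=1) * p
    s1 = cs[cs == 3].size
    s2 = rs[rs == 3].size
    s3 = 1 if np.sum(np.diag(T)) * p == 3 else 0
    s4 = 1 if np.sum(np.diag(np.rot90(T))) * p == 3 else 0
    return s1 + s2 + s3 + s4

def evaluate_game_state(S, p):
    T1 = np.copy(S); T1[T1 == 0] = p
    T2 = np.copy(S); T2[T2 == 0] = -p
    return num_winning_lines(T1, p) - num_winning_lines(T2, -p)

empty = np.zeros((3, 3), dtype=int)
print(evaluate_game_state(empty, 1))   # 8 - 8 = 0

centre = np.zeros((3, 3), dtype=int)
centre[1, 1] = 1                        # X (= +1) in the centre
print(evaluate_game_state(centre, 1))  # 8 - 4 = 4
```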

question

is this evaluation function any good?

answer

let's see . . .

example

for player X, evaluate all successors of state n

[figure: a tic tac toe board n and its successor boards, with Eval(n', X) values 2, 2, 2, 2, 3 for the successor states n']

general criteria for an evaluation function

computing Eval(n) should not take too long

Eval(n) should order terminal states just as u(n) would do

Eval(n) should be correlated with the chance of winning if n is non-terminal

note

if a search is cut off at non-terminal state n, minmax will necessarily be uncertain about the true mmv(n)

evaluation functions may be overly optimistic (see the example above)

observe

most evaluation functions compute features f_i^n of states n; examples include
  material values of pieces on a chess board
  number of winning lines in tic tac toe
  . . .

features form feature vectors f^n = (f_1^n, f_2^n, . . . , f_k^n)

feature vectors may be categorized or clustered

feature vector clusters ⇔ game state categories c(n)

expected values

experience and/or statistics can be used to estimate how often a category leads to a win (+1), to a loss (−1), or to a draw (0) later on

this allows us to define

Eval(n) = p_w(c(n)) · (+1) + p_l(c(n)) · (−1) + p_d(c(n)) · 0

where p_w + p_l + p_d = 1

linear combinations

features f_i can also be used directly for evaluation

for instance, in chess, one assigns material values to pieces (pawn = 1, knight = 3, bishop = 3, rook = 5, queen = 9); there are also "abstract features" such as king safety which are measured in pawns

features like these can be linearly combined

Eval(n) = Σ_{i=1}^{k} w_i f_i^n

where the w_i are weights set according to experience
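As a sketch of such a linear combination, a pure material-count evaluation using the piece values stated above; the board representation as count dictionaries and the sample position are illustrative:

```python
# Eval(n) = sum_i w_i * f_i^n with material-difference features
WEIGHTS = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}

def material_eval(own_counts, opp_counts):
    """own_counts / opp_counts map piece type -> number of pieces on the board."""
    score = 0
    for piece, w in WEIGHTS.items():
        f = own_counts.get(piece, 0) - opp_counts.get(piece, 0)  # feature f_i
        score += w * f
    return score

# up a rook, down a pawn: 5 * (+1) + 1 * (-1) = 4
print(material_eval({"rook": 2, "pawn": 7}, {"rook": 1, "pawn": 8}))  # → 4
```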

note

evaluation functions can also be used to rank game states

recall that ranking allows for more efficient α-β pruning

it also allows for realizing best-first search algorithms

question

what about games such as backgammon which involve randomness?

answer

introduce a chance node after every MAX and MIN node

chance nodes ⇔ indicate outcome x and probability p(x) of a random event X

for instance, consider X = sum of two dice

x = 12:  p(x) = p(6 + 6) = 1/6 · 1/6 = 1/36 ≈ 3%
x = 4:   p(x) = p(1 + 3) + p(2 + 2) + p(3 + 1) = 3/36 ≈ 8%
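The two probabilities can be checked by brute-force enumeration of all 36 equally likely outcomes of the two dice; a small sketch:

```python
from fractions import Fraction
from itertools import product

def p_sum(x):
    """Probability that the sum of two fair dice equals x."""
    hits = sum(1 for a, b in product(range(1, 7), repeat=2) if a + b == x)
    return Fraction(hits, 36)

print(p_sum(12))  # 1/36 ≈ 3%
print(p_sum(4))   # 3/36 = 1/12 ≈ 8%
```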

consequence

change minmax from

mmv(n) =
  u(n)                       if n is a terminal node
  max_{s ∈ Succ(n)} mmv(s)   if n is a MAX node
  min_{s ∈ Succ(n)} mmv(s)   if n is a MIN node

to expected minmax

emmv(n) =
  u(n)                             if n is a terminal node
  max_{s ∈ Succ(n)} emmv(s)        if n is a MAX node
  min_{s ∈ Succ(n)} emmv(s)        if n is a MIN node
  Σ_{s ∈ Succ(n)} p(s) · emmv(s)   if n is a chance node
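The extra chance-node case can be sketched as follows; the tiny tree, its probabilities, and its utilities are made up for illustration:

```python
# expected minmax (expectiminimax) on a hand-built tree: the root is a MAX
# node whose successors c1, c2 are chance nodes over terminal outcomes
NODES = {
    "n0": ("max", ["c1", "c2"]),
    "c1": ("chance", [("n1", 0.5), ("n2", 0.5)]),
    "c2": ("chance", [("n3", 0.75), ("n4", 0.25)]),
}
UTILITY = {"n1": 4, "n2": 0, "n3": 0, "n4": 12}

def emmv(n):
    if n in UTILITY:
        return UTILITY[n]
    kind, succ = NODES[n]
    if kind == "max":
        return max(emmv(s) for s in succ)
    if kind == "min":
        return min(emmv(s) for s in succ)
    # chance node: probability weighted average of successor values
    return sum(p * emmv(s) for s, p in succ)

# c1: 0.5*4 + 0.5*0 = 2.0;  c2: 0.75*0 + 0.25*12 = 3.0;  MAX picks 3.0
print(emmv("n0"))  # → 3.0
```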

history and state of the art

the history of minmax

J. von Neumann, Zur Theorie der Gesellschaftsspiele, Math. Annalen 100(1), 1928

N. Wiener, Cybernetics or Control and Communication in the Animal and the Machine, MIT Press, 1948

C. Shannon, Programming a Computer for Playing Chess, Philosophical Magazine, Ser. 7, 41(314), 1950

J. von Neumann (∗1903, †1957)

N. Wiener (∗1894, †1964)

C. Shannon (∗1916, †2001)

the history of α-β pruning

J. McCarthy, 1955

A. Newell and H. Simon, 1958

. . .

D. Knuth and R.W. Moore, 1975

. . .

J. Pearl, The Solution for the Branching Factor of the Alpha-beta Pruning Algorithm and its Optimality, Comm. of the ACM 25(8), 1982

state of the art (pre 2016)

modern programs that play chess, checkers, . . . rely on
  minmax search with α-β pruning and other cutoff techniques (e.g. quiescence search)
  sophisticated and highly tuned evaluation functions
  large transposition tables and databases of opening and endgame moves
  (extreme) computing power

Deep Blue stats
  evaluated 126,000,000 nodes/s on average
  generated 30,000,000 positions per move, thus typically computed a look-ahead of depth 14
  considered 8,000 features for node evaluation
  used a database of 700,000 grandmaster games

state of the art (post 2016)

AlphaGO (2016) characteristics
  combines tree search and machine learning
  in particular, uses Monte Carlo tree search guided by a value- and a policy-network
  the latter are deep neural networks originally trained using a vast number of examples of human gameplay (for bootstrapping) and self-play (for refinement)

AlphaGO Zero (2017) characteristics
  works just like AlphaGO but creates training data from scratch (by means of far more extensive self-play)

AlphaZero (2017) characteristics
  can, in principle, play any two-player turn-based game
  clearly outperforms previous best chess programs

note

AlphaGO, AlphaGO Zero, and AlphaZero all reach superhuman performance

they all require insane amounts of computing power (at least during training)

an average human brain consumes 480 calories per day ⇔ an average human brain runs on about 20 watts; a high performance GPU cluster runs on 5,000,000 watts

recall from lecture 02: categories of intelligence

thinking humanly: cognitive science, neuroscience
thinking rationally: mathematical logic, inference
acting humanly: knowledge representation, reasoning, learning ⇒ Turing test
acting rationally: agents, game theory, operations research


summary

we now know about

transposition tables

pruning, in particular α-β pruning

depth restricted searches and evaluation functions

extensions towards search under uncertainty / randomness