Game AI Prof. Christian Bauckhage
outline lecture 06
recap
optimal strategies and tree search algorithms
α-β pruning
depth restricted searches
searching under uncertainty / chance
history and state of the art
summary
recap
zero-sum game ⇔ a game in which a player's gain of utility is exactly balanced by the loss(es) of the other player(s)
utility / payoff ⇔ a numerical value assigned to terminal game states
[figure: tic tac toe game tree; terminal states are labeled with the players' utilities −1, 0, or +1]
recap
goal state ⇔ a terminal state of high utility (usually a winning state), i.e. typically a high-utility leaf node in the game tree
optimal strategy ⇔ a strategy that is at least as good as any other when playing against an infallible opponent
recap
minmax algorithm ⇔ decision rule for playing infallible opponents ⇔ minimize maximum loss ⇔ maximize minimum gain ⇔ recursively assign minmax values mmv to all game states that can result from n and move to the state with largest mmv

$$mmv(n) = \begin{cases} u(n) & \text{if } n \text{ is a terminal node} \\ \max_{s \in Succ(n)} mmv(s) & \text{if } n \text{ is a MAX node} \\ \min_{s \in Succ(n)} mmv(s) & \text{if } n \text{ is a MIN node} \end{cases}$$
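a minimal Python sketch of this recursion (not the lecture's own code; for illustration, game states are nested lists whose leaves are utility values):

def mmv(n, is_max):
    # terminal node: return its utility u(n)
    if isinstance(n, (int, float)):
        return n
    # otherwise recurse into the successors Succ(n), alternating MAX and MIN
    vals = [mmv(s, not is_max) for s in n]
    return max(vals) if is_max else min(vals)

# toy tree: MAX to move at the root, three MIN nodes below
tree = [[12, 3, 8], [2, 4, 6], [14, 5, 2]]
print(mmv(tree, True))   # -> 3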
observe
[figure: example game tree with MAX root n0, intermediate nodes n1 ... n14, and leaf nodes n15 ... n30 with their utilities; minmax values are propagated from the leaves up to the root]
minmax is a depth-first search algorithm; the number of states to be explored is exponential in the number of moves, i.e. the worst case effort is $O(b^m)$ for branching factor $b$ and game length $m$ ⇒ it is usually infeasible to compute mmv(n) in practice
optimal strategies and tree search algorithms
obvious ideas
recall that search tree nodes often occur multiple times
recall that we can compute hash values for game states
⇒ create a transposition table, i.e. a hash or lookup table of previously seen states
⇒ if minmax encounters a node that has already been expanded in a different branch of the game tree, do not expand it again but recycle minmax values from the transposition table
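a minimal sketch of this idea, continuing the toy representation from above but with tuples so that states are hashable; real programs use e.g. Zobrist hashes of board positions rather than Python's built-in hash:

def mmv_tt(n, is_max, table):
    # the hash of the state (plus side to move) serves as lookup key
    key = (hash(n), is_max)
    if key in table:
        return table[key]            # recycle a previously computed value
    if isinstance(n, (int, float)):
        val = n                      # terminal node: utility
    else:
        vals = [mmv_tt(s, not is_max, table) for s in n]
        val = max(vals) if is_max else min(vals)
    table[key] = val
    return val

tree = ((12, 3, 8), (2, 4, 6), (14, 5, 2))
print(mmv_tt(tree, True, {}))        # -> 3, repeated states evaluated once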
question are there less obvious ideas / approaches ?
answer yes, there are
observe
often, it is possible to compute the correct minmax decision without exploring every game state (node in the game tree)
often, the worst case effort exponent can be halved: $O(b^{m/2})$
using pruning techniques, large parts of the search tree do not have to be considered
pruning (in the context of AI) ⇔ eliminating possibilities from consideration without having to examine them
example

[figure sequence: α-β pruning traced step by step on a game tree with MAX root n0, MIN nodes n1, n2, n3, and leaves with utilities 12, 3, 8 under n1, 2, x, y under n2, and 14, 2, v under n3; for each node, register $[\alpha, \beta]$ where α = highest mmv found so far for MAX and β = lowest mmv found so far for MIN; as soon as a MIN node's interval collapses (e.g. $[3, 2]$), its remaining successors are pruned]

$$\begin{aligned} mmv(n_0) &= \max\bigl( \min(12, 3, 8),\; \min(2, x, y),\; \min(14, 2, v) \bigr) \\ &= \max\bigl( 3,\; \min(2, x, y),\; \min(2, v) \bigr) \\ &= \max(3, z, w) \quad \text{where } z, w \le 2 \\ &= 3 \end{aligned}$$
α-β pruning
MaxVal(n0, −∞, +∞)

function MaxVal(n, α, β)
    if isTerminalState(n) return u(n)
    for s in Succ(n)
        α ← max(α, MinVal(s, α, β))
        if α ≥ β return α
    return α

function MinVal(n, α, β)
    if isTerminalState(n) return u(n)
    for s in Succ(n)
        β ← min(β, MaxVal(s, α, β))
        if β ≤ α return β
    return β
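a runnable Python version of this pseudocode, again using nested tuples as a stand-in game tree with numbers as terminal utilities:

import math

def max_val(n, alpha, beta):
    if isinstance(n, (int, float)):      # terminal state: utility u(n)
        return n
    for s in n:
        alpha = max(alpha, min_val(s, alpha, beta))
        if alpha >= beta:                # cutoff: MIN will never allow this
            return alpha
    return alpha

def min_val(n, alpha, beta):
    if isinstance(n, (int, float)):
        return n
    for s in n:
        beta = min(beta, max_val(s, alpha, beta))
        if beta <= alpha:                # cutoff: MAX will never allow this
            return beta
    return beta

tree = ((12, 3, 8), (2, 4, 6), (14, 5, 2))
print(max_val(tree, -math.inf, math.inf))   # -> 3, with pruned subtrees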
note
effectiveness of α-β pruning obviously depends on the order in which successors are explored
[figure: the pruned example tree from above; which subtrees get cut off depends on the order in which successors are explored]
observe
“one can show” that, if successors are expanded in a random order, the number of nodes to be examined is $O(b^{3m/4})$
question $O(b^{3m/4})$ is better than $O(b^m)$, but can we do better still ?
what if the best alternative was expanded first ?
answer strictly speaking this is impossible, but assume it was . . .
observe
“one can show” that, if successors are ranked according to utility, the number of nodes to be examined is $O(b^{m/2})$
observe that $b^{m/2} = \bigl(\sqrt{b}\bigr)^m$ ⇒ the effective branching factor reduces from $b$ to $\sqrt{b}$
observe that $m = 2 \cdot \tfrac{m}{2}$ ⇒ in the same amount of time, minmax with α-β pruning can look twice as far ahead as ordinary minmax
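to get a feeling for these numbers, take illustrative values (not from the lecture) $b = 35$ and $m = 8$: plain minmax examines on the order of $b^m = 35^8 \approx 2.3 \cdot 10^{12}$ nodes, while perfectly ordered α-β examines on the order of $b^{m/2} = 35^4 \approx 1.5 \cdot 10^6$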
note
even with pruning techniques such as α-β pruning, minmax has to search all the way down to the leaves of the search tree; while this is feasible for games such as tic tac toe or connect four, it is usually infeasible for games with larger branching factors
observe
in his 1950 paper, Shannon proposed to cut off searches after a few levels, i.e. to perform depth restricted searches; to estimate utilities of non-terminal nodes, he proposed to consider heuristic evaluation functions
heuristic / evaluation function ⇔ provides an estimate of the utility of a game state that is not a terminal state
consequence

change minmax from

$$mmv(n) = \begin{cases} u(n) & \text{if } n \text{ is a terminal node} \\ \max_{s \in Succ(n)} mmv(s) & \text{if } n \text{ is a MAX node} \\ \min_{s \in Succ(n)} mmv(s) & \text{if } n \text{ is a MIN node} \end{cases}$$

to

$$mmv(n) = \begin{cases} Eval(n) & \text{if } CutOff(n) \text{ yields TRUE} \\ \max_{s \in Succ(n)} mmv(s) & \text{if } n \text{ is a MAX node} \\ \min_{s \in Succ(n)} mmv(s) & \text{if } n \text{ is a MIN node} \end{cases}$$
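a sketch of this modified recursion on the toy tuple trees from above; the cutoff test is here simply a depth bound, and the heuristic (mean over the numeric successors) is a made-up stand-in for a real Eval:

def mmv_cutoff(n, is_max, depth, max_depth, eval_fn):
    if isinstance(n, (int, float)):
        return n                         # terminal node: exact utility
    if depth >= max_depth:               # CutOff(n) yields TRUE
        return eval_fn(n)                # heuristic estimate, not true mmv
    vals = [mmv_cutoff(s, not is_max, depth + 1, max_depth, eval_fn)
            for s in n]
    return max(vals) if is_max else min(vals)

# crude stand-in heuristic: average over the directly visible leaf values
def crude_eval(n):
    leaves = [s for s in n if isinstance(s, (int, float))]
    return sum(leaves) / len(leaves) if leaves else 0

tree = ((12, 3, 8), (2, 4, 6), (14, 5, 2))
print(mmv_cutoff(tree, True, 0, 1, crude_eval))   # ~7.67, overestimates the true mmv of 3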
example
simple evaluation function for tic tac toe Eval(n, p) = number of lines where p can win − number of lines where −p can win
[figure: four example boards n, omitted]

n          Eval(n, X)    Eval(n, O)
board 1    8 − 8 = 0     8 − 8 = 0
board 2    8 − 4 = 4     4 − 8 = −4
board 3    5 − 4 = 1     4 − 5 = −1
board 4    5 − 3 = 2     3 − 5 = −2
example
import numpy as np

def evaluate_game_state(S, p):
    # count winning lines p could still complete: fill all empty cells with p
    T1 = np.copy(S)
    T1[T1 == 0] = p
    n1 = num_winning_lines(T1, p)
    # same for the opponent -p
    T2 = np.copy(S)
    T2[T2 == 0] = -p
    n2 = num_winning_lines(T2, -p)
    return n1 - n2
example
def num_winning_lines(T, p):
    cs = np.sum(T, axis=0) * p                   # column sums
    rs = np.sum(T, axis=1) * p                   # row sums
    s1 = cs[cs == 3].size                        # columns completed by p
    s2 = rs[rs == 3].size                        # rows completed by p
    s3 = 0
    if np.sum(np.diag(T)) * p == 3:              # main diagonal
        s3 = 1
    s4 = 0
    if np.sum(np.diag(np.rot90(T))) * p == 3:    # anti-diagonal
        s4 = 1
    return s1 + s2 + s3 + s4
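a quick sanity check (assuming, as the code above implies, boards are 3×3 numpy arrays with +1 for X, −1 for O, and 0 for empty cells):

S = np.zeros((3, 3), dtype=int)     # empty board
print(evaluate_game_state(S, 1))    # -> 0: both players can complete all 8 lines

S[1, 1] = 1                         # X takes the center
print(evaluate_game_state(S, 1))    # -> 4: X keeps 8 lines, O only 4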
question is this evaluation function any good ?
answer let’s see . . .
example
for player X, evaluate all successors of state n
[figure: the state n and its five successor states, with Eval(n, X) values 2, 2, 2, 2, and 3]
general criteria for an evaluation function
computing Eval(n) should not take too long
Eval(n) should order terminal states just as u(n) would
Eval(n) should be correlated with the chance of winning if n is non-terminal
note
if a search is cut off at non-terminal state n, minmax will necessarily be uncertain about the true mmv(n)
evaluation functions may be overly optimistic (see the example above)
observe
most evaluation functions compute features $f_i^n$ of states n; examples include
material values of pieces on a chess board
number of winning lines in tic tac toe
...
features form feature vectors $f^n = \bigl(f_1^n, f_2^n, \ldots, f_k^n\bigr)$
feature vectors may be categorized or clustered
feature vector clusters ⇔ game state categories c(n)
expected values
experience and/or statistics can be used to estimate how often a category leads to a win (+1), to a loss (−1), or to a draw (0) later on
this allows us to define
$$Eval(n) = p_w\bigl(c(n)\bigr) \cdot (+1) + p_l\bigl(c(n)\bigr) \cdot (-1) + p_d\bigl(c(n)\bigr) \cdot 0$$
where $p_w + p_l + p_d = 1$; since the draw term vanishes, this simplifies to $Eval(n) = p_w\bigl(c(n)\bigr) - p_l\bigl(c(n)\bigr)$
linear combinations
features $f_i$ can also be used directly for evaluation
for instance, in chess, one assigns material values to pieces (pawn = 1, knight = 3, bishop = 3, rook = 5, queen = 9); there also are “abstract features” such as king safety which are measured in pawns
features like these can be linearly combined (see the sketch below)
$$Eval(n) = \sum_{i=1}^{k} w_i \, f_i^n$$
where the $w_i$ are weights set according to experience
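a minimal sketch of such a linear evaluation, restricted to the material features named above; the dict-of-piece-counts encoding is an illustration, not a real chess engine interface:

MATERIAL = {'pawn': 1, 'knight': 3, 'bishop': 3, 'rook': 5, 'queen': 9}

def material_eval(own, opp):
    # features f_i: material difference per piece type
    # weights w_i: the material values quoted above
    return sum(w * (own.get(p, 0) - opp.get(p, 0))
               for p, w in MATERIAL.items())

# e.g. up a rook, down a knight: 5 - 3 = +2 "pawns"
print(material_eval({'rook': 2, 'knight': 1}, {'rook': 1, 'knight': 2}))  # -> 2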
note
evaluation functions can also be used to rank game states
recall that ranking allows for more efficient α-β pruning
it also allows us to realize best-first search algorithms
question what about games such as backgammon which involve randomness ?
answer introduce a chance node after every MAX and MIN node
chance nodes ⇔ indicate outcome x and probability p(x) of a random event X

for instance, consider X = sum of two dice

$$x = 12, \quad p(x) = p(6+6) = \tfrac{1}{6} \cdot \tfrac{1}{6} = \tfrac{1}{36} \approx 3\%$$

$$x = 4, \quad p(x) = p(1+3) + p(2+2) + p(3+1) = \tfrac{3}{36} \approx 8\%$$
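these probabilities are easy to verify by brute-force enumeration of all 36 equally likely outcomes:

from collections import Counter
from fractions import Fraction

counts = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))
print(Fraction(counts[12], 36))   # -> 1/36 (~3%)
print(Fraction(counts[4], 36))    # -> 1/12, i.e. 3/36 (~8%)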
consequence

change minmax from

$$mmv(n) = \begin{cases} u(n) & \text{if } n \text{ is a terminal node} \\ \max_{s \in Succ(n)} mmv(s) & \text{if } n \text{ is a MAX node} \\ \min_{s \in Succ(n)} mmv(s) & \text{if } n \text{ is a MIN node} \end{cases}$$

to expected minmax

$$emmv(n) = \begin{cases} u(n) & \text{if } n \text{ is a terminal node} \\ \max_{s \in Succ(n)} emmv(s) & \text{if } n \text{ is a MAX node} \\ \min_{s \in Succ(n)} emmv(s) & \text{if } n \text{ is a MIN node} \\ \sum_{s \in Succ(n)} p(s) \cdot emmv(s) & \text{if } n \text{ is a chance node} \end{cases}$$
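a minimal sketch of expected minmax on the toy representation used earlier: MAX/MIN nodes are tuples, terminals are numbers, and chance nodes are written as lists of (probability, successor) pairs:

def emmv(n, is_max):
    if isinstance(n, (int, float)):                  # terminal node
        return n
    if isinstance(n, list):                          # chance node
        return sum(p * emmv(s, is_max) for p, s in n)
    vals = [emmv(s, not is_max) for s in n]          # MAX or MIN node
    return max(vals) if is_max else min(vals)

# MAX chooses between a sure utility of 3 and a gamble
gamble = [(1/36, 10.0), (35/36, 2.0)]
print(emmv((3, gamble), True))   # -> 3 (the gamble is only worth ~2.22)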
history and state of the art
the history of minmax
J. von Neumann, Zur Theorie der Gesellschaftsspiele, Math. Annalen 100(1), 1928
N. Wiener, Cybernetics or Control and Communication in the Animal and the Machine, MIT Press, 1948
C. Shannon, Programming a Computer for Playing Chess, Philosophical Magazine, Ser. 7, 41(314), 1950
J. von Neumann (∗1903, †1957)
N. Wiener (∗1894, †1964)
C. Shannon (∗1916, †2001)
the history of α-β pruning
J. McCarthy, 1955
A. Newell and H. Simon, 1958
...
D. Knuth and R.W. Moore, 1975
...
J. Pearl, The Solution for the Branching Factor of the Alpha-Beta Pruning Algorithm and its Optimality, Comm. of the ACM 25(8), 1982
state of the art (pre 2016)
modern programs that play chess, checkers, . . . rely on
minmax search with α-β pruning and other cutoff techniques (e.g. quiescence search)
sophisticated and highly tuned evaluation functions
large transposition tables and databases of opening and endgame moves
(extreme) computing power

Deep Blue stats
evaluated 126,000,000 nodes/s on average
generated 30,000,000 positions per move
thus typically computed a look-ahead of depth 14
considered 8,000 features for node evaluation
used a database of 700,000 grandmaster games
state of the art (post 2016)

AlphaGo (2016) characteristics
combines tree search and machine learning
in particular, uses Monte Carlo tree search guided by a value network and a policy network
the latter are deep neural networks originally trained using a vast number of examples of human gameplay (for bootstrapping) and self-play (for refinement)

AlphaGo Zero (2017) characteristics
works just like AlphaGo but creates training data from scratch (by means of far more extensive self-play)

AlphaZero (2017) characteristics
can, in principle, play any two-player turn-based game
clearly outperforms previous best chess programs
note
AlphaGo, AlphaGo Zero, and AlphaZero all reach superhuman performance
they all require insane amounts of computing power (at least during training)
an average human brain consumes about 480 kilocalories per day ⇔ an average human brain runs on about 20 Watts
a high performance GPU cluster runs on 5,000,000 Watts
recall from lecture 02: categories of intelligence

thinking humanly ⇔ cognitive science, neuroscience
thinking rationally ⇔ mathematical logic, inference
acting humanly ⇔ knowledge representation, reasoning, learning ⇒ Turing test
acting rationally ⇔ agents, game theory, operations research
summary
we now know about
transposition tables
pruning, in particular α-β pruning
depth restricted searches and evaluation functions
extensions towards search under uncertainty / randomness