Adversarial Search
Chapter 5 of AIMA
by Stuart Russell
modified by Jacek Malec for LTH lectures
January 24th, 2018
1. Outline

♦ Games
♦ Perfect play – minimax decisions
♦ α–β pruning
♦ Resource limits and approximate evaluation
♦ Games of chance
♦ Games of imperfect information

Games vs. search problems

“Unpredictable” opponent ⇒ solution is a strategy specifying a move for every possible opponent reply
Time limits ⇒ unlikely to find goal, must approximate

Plan of attack:
• Computer considers possible lines of play (Babbage, 1846)
• Algorithm for perfect play (Zermelo, 1912; Von Neumann, 1944)
• Finite horizon, approximate evaluation (Zuse, 1945; Wiener, 1948; Shannon, 1950)
• First chess program (Turing, 1951)
• Machine learning to improve evaluation accuracy (Samuel, 1952–57)
• Pruning to allow deeper search (McCarthy, 1956)

Types of games:

                          deterministic                  chance
  perfect information     chess, checkers, go, othello   backgammon, monopoly
  imperfect information   battleships, blind tictactoe   bridge, poker, scrabble, nuclear war

2. Game tree (2-player, deterministic, turns)

[Figure: partial game tree for tic-tac-toe. MAX plays X and MIN plays O on alternating plies, down to terminal states with utility −1, 0, or +1.]

Minimax

Perfect play for deterministic, perfect-information games
Idea: choose move to position with highest minimax value
      = best achievable payoff against best play

E.g., 2-ply game:

[Figure: MAX root with moves A1, A2, A3 leading to MIN nodes valued 3, 2, 2
 over leaves 3 12 8 | 2 4 6 | 14 5 2; the minimax value of the root is 3.]

Minimax algorithm:

function Minimax-Decision(state) returns an action
   inputs: state, current state in game
   return the a in Actions(state) maximizing Min-Value(Result(a, state))

function Max-Value(state) returns a utility value
   if Terminal-Test(state) then return Utility(state)
   v ← −∞
   for a, s in Successors(state) do v ← Max(v, Min-Value(s))
   return v

function Min-Value(state) returns a utility value
   if Terminal-Test(state) then return Utility(state)
   v ← ∞
   for a, s in Successors(state) do v ← Min(v, Max-Value(s))
   return v
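The pseudocode above can be sketched in Python. The game interface (`actions`, `result`, `terminal_test`, `utility`) and the `TwoPlyGame` class encoding the slide's 2-ply example are illustrative assumptions, not part of the slides.

```python
import math

def minimax_decision(state, game):
    """Choose the action leading to the state with the highest minimax value."""
    return max(game.actions(state),
               key=lambda a: min_value(game.result(state, a), game))

def max_value(state, game):
    if game.terminal_test(state):
        return game.utility(state)
    v = -math.inf
    for a in game.actions(state):
        v = max(v, min_value(game.result(state, a), game))
    return v

def min_value(state, game):
    if game.terminal_test(state):
        return game.utility(state)
    v = math.inf
    for a in game.actions(state):
        v = min(v, max_value(game.result(state, a), game))
    return v

class TwoPlyGame:
    """The 2-ply example tree: MIN node values 3, 2, 2, so MAX picks a1."""
    TREE = {'root': {'a1': 'b1', 'a2': 'b2', 'a3': 'b3'},
            'b1': {'a11': 3, 'a12': 12, 'a13': 8},
            'b2': {'a21': 2, 'a22': 4, 'a23': 6},
            'b3': {'a31': 14, 'a32': 5, 'a33': 2}}
    def actions(self, state):       return list(self.TREE[state])
    def result(self, state, a):     return self.TREE[state][a]
    def terminal_test(self, state): return state not in self.TREE  # leaves are numbers
    def utility(self, state):       return state

print(minimax_decision('root', TwoPlyGame()))   # -> a1 (minimax value 3)
```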

3. Properties of minimax

Complete?? Yes, if tree is finite (chess has specific rules for this).
   NB a finite strategy can exist even in an infinite tree!
Optimal?? Yes, against an optimal opponent. Otherwise??
Time complexity?? O(b^m)
Space complexity?? O(bm) (depth-first exploration)

For chess, b ≈ 35, m ≈ 100 for “reasonable” games
⇒ exact solution completely infeasible

But do we need to explore every path?
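To make the infeasibility concrete, a quick back-of-the-envelope check of the slide's chess numbers (b = 35, m = 100):

```python
import math

b, m = 35, 100
# Number of decimal digits in b**m, the size of the full game tree:
digits = int(m * math.log10(b)) + 1   # 35**100 has 155 digits
# By contrast, depth-first minimax stores only O(b*m) nodes at a time:
frontier = b * m                      # 3500
print(digits, frontier)
```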

4. α–β pruning example

[Figure: the 2-ply example traced step by step. The first MIN node evaluates its leaves 3, 12, 8 and returns 3. At the second MIN node the first leaf seen is 2; since that node's value can only be ≤ 2 < 3, its remaining leaves are pruned (marked X). The third MIN node sees 14, then 5, then 2, settling at 2. The root's minimax value 3 is found without examining every leaf.]

5. Why is it called α–β?

[Figure: a MAX–MIN–…–MAX–MIN path, with V a value found below the current MIN node.]

α is the best value (to MAX) found so far off the current path
If V is worse than α, MAX will avoid it ⇒ prune that branch
Define β similarly for MIN

The α–β algorithm:

function Alpha-Beta-Decision(state) returns an action
   return the a in Actions(state) maximizing Min-Value(Result(a, state))

function Max-Value(state, α, β) returns a utility value
   inputs: state, current state in game
      α, the value of the best alternative for MAX along the path to state
      β, the value of the best alternative for MIN along the path to state
   if Terminal-Test(state) then return Utility(state)
   v ← −∞
   for a, s in Successors(state) do
      v ← Max(v, Min-Value(s, α, β))
      if v ≥ β then return v
      α ← Max(α, v)
   return v

function Min-Value(state, α, β) returns a utility value
   same as Max-Value but with roles of α, β reversed

Properties of α–β

Pruning does not affect the final result
Good move ordering improves effectiveness of pruning
With “perfect ordering,” time complexity = O(b^(m/2))
⇒ doubles solvable depth
Use additional heuristics (e.g. killer moves)
A simple example of the value of reasoning about which computations are relevant (a form of metareasoning)
Unfortunately, 35^50 is still impossible!
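A Python sketch of the α–β algorithm above, under the same illustrative game interface as before; the `CountingGame` class and its leaf counter are assumptions added here to make the pruning visible.

```python
import math

def alphabeta_decision(state, game):
    """Pick the action with the highest backed-up value, pruning as we go."""
    best_a, best_v = None, -math.inf
    for a in game.actions(state):
        v = min_value(game.result(state, a), game, best_v, math.inf)
        if v > best_v:
            best_a, best_v = a, v
    return best_a

def max_value(state, game, alpha, beta):
    if game.terminal_test(state):
        return game.utility(state)
    v = -math.inf
    for a in game.actions(state):
        v = max(v, min_value(game.result(state, a), game, alpha, beta))
        if v >= beta:            # MIN above will never let play reach here
            return v
        alpha = max(alpha, v)
    return v

def min_value(state, game, alpha, beta):
    if game.terminal_test(state):
        return game.utility(state)
    v = math.inf
    for a in game.actions(state):
        v = min(v, max_value(game.result(state, a), game, alpha, beta))
        if v <= alpha:           # MAX above already has something better
            return v
        beta = min(beta, v)
    return v

class CountingGame:
    """The 2-ply example tree, counting how many leaves get evaluated."""
    TREE = {'root': {'a1': 'b1', 'a2': 'b2', 'a3': 'b3'},
            'b1': {'a11': 3, 'a12': 12, 'a13': 8},
            'b2': {'a21': 2, 'a22': 4, 'a23': 6},
            'b3': {'a31': 14, 'a32': 5, 'a33': 2}}
    def __init__(self):
        self.leaf_evals = 0
    def actions(self, state):       return list(self.TREE[state])
    def result(self, state, a):     return self.TREE[state][a]
    def terminal_test(self, state): return state not in self.TREE
    def utility(self, state):
        self.leaf_evals += 1
        return state

g = CountingGame()
best = alphabeta_decision('root', g)
print(best, g.leaf_evals)   # -> a1 7  (only 7 of the 9 leaves are evaluated)
```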

6. Resource limits

Standard approach:
• Use Cutoff-Test instead of Terminal-Test
  e.g., depth limit (perhaps add quiescence search)
• Use Eval instead of Utility
  i.e., evaluation function that estimates desirability of position

Suppose we have 100 seconds and explore 10^4 nodes/second
⇒ 10^6 nodes per move ≈ 35^(8/2)
⇒ α–β reaches depth 8 ⇒ pretty good chess program

Evaluation functions

[Figure: two chess positions — Black to move, White slightly better; White to move, Black winning.]

For chess, typically a linear weighted sum of features:
Eval(s) = w1 f1(s) + w2 f2(s) + … + wn fn(s)
e.g., w1 = 9 with f1(s) = (number of white queens) − (number of black queens), etc.

Digression: Exact values don't matter

[Figure: two MAX–MIN trees, one with leaves 1 2 | 2 4 and one with leaves 1 20 | 20 400; both back up to the same move choice.]

Behaviour is preserved under any monotonic transformation of Eval
Only the order matters: payoff in deterministic games acts as an ordinal utility function

Deterministic games in practice

Checkers: Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994. It used an endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 443,748,401,247 positions.

Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997. Deep Blue searched 200 million positions per second, used a very sophisticated evaluation function, and undisclosed methods for extending some lines of search up to 40 ply.

Othello: human champions refuse to compete against computers, which are too good.

Go: In go, b > 300, so most programs use pattern knowledge bases to suggest plausible moves. AlphaGo defeated Lee Sedol, then among the best human go players, in March 2016. AlphaGo uses Monte Carlo tree search, guided by evaluation functions learnt by deep NNs.
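The weighted-sum form above can be sketched as follows; only the queen weight w1 = 9 comes from the slide, while the other material weights are conventional piece values used here as assumptions.

```python
# Each feature f_i(s) = (white count of a piece kind) - (black count).
WEIGHTS = {'Q': 9, 'R': 5, 'B': 3, 'N': 3, 'P': 1}   # only Q=9 is from the slide

def evaluate(counts):
    """counts: piece letter -> (white count, black count).
    Returns Eval(s) = sum_i w_i * f_i(s)."""
    return sum(w * (counts.get(p, (0, 0))[0] - counts.get(p, (0, 0))[1])
               for p, w in WEIGHTS.items())

print(evaluate({'Q': (1, 0), 'P': (5, 6)}))   # 9*1 + 1*(5-6) = 8
```

Any monotonic rescaling of these weights would, as the digression notes, leave minimax's move choice unchanged.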

7. Monte Carlo Tree Search

[Figure: one MCTS iteration shown in four phases on a tree annotated with wins/visits at each node (root 11/21).
 Selection: descend from the root, at each level picking the statistically most promising child, until a leaf is reached.
 Expansion: add a new child node (0/0) to the selected leaf.
 Simulation: play a random game from the new node (here a loss, 0/1).
 Backpropagation: update the wins/visits counts along the path back to the root (root becomes 11/22).]
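A minimal sketch of the bookkeeping behind the selection and backpropagation phases above. The UCB1 selection rule and exploration constant c are standard conventions, not given on the slides, and the game-specific expansion and simulation steps are omitted.

```python
import math

class Node:
    def __init__(self, parent=None):
        self.parent, self.children = parent, {}   # move -> Node
        self.wins, self.visits = 0, 0             # the "w/v" labels in the figure

def ucb1(parent, child, c=1.4):
    """Standard UCB1 score: exploit (win rate) plus explore (visit bonus)."""
    if child.visits == 0:
        return math.inf                           # unvisited children go first
    return (child.wins / child.visits
            + c * math.sqrt(math.log(parent.visits) / child.visits))

def select(node):
    """Selection phase: descend to a leaf, following the best UCB1 child."""
    while node.children:
        node = max(node.children.values(), key=lambda ch: ucb1(node, ch))
    return node

def backpropagate(node, won):
    """Backpropagation phase: update counts up to the root,
    flipping the result at each level since the players alternate."""
    while node is not None:
        node.visits += 1
        node.wins += won
        won = 1 - won
        node = node.parent

# Tiny usage example with made-up counts:
root = Node(); root.visits = 10
a = Node(root); a.wins, a.visits = 5, 6
b = Node(root); b.wins, b.visits = 1, 4
root.children = {'a': a, 'b': b}
leaf = select(root)           # picks a: higher win rate, enough visits
backpropagate(leaf, 1)        # a simulated win from a's subtree
```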

8. Nondeterministic games: backgammon

[Figure: backgammon board with points numbered 0–25.]

In nondeterministic games, chance is introduced by dice or card-shuffling.

Simplified example with coin-flipping:

[Figure: MAX root over two CHANCE nodes valued 3 and −1; each coin branch has probability 0.5 and leads to a MIN node. The MIN nodes evaluate to 2, 4 (left) and 0, −2 (right), over leaves 2 4 7 4 | 6 0 5 −2.]

Algorithm for nondeterministic games

Expectiminimax gives perfect play.
Just like Minimax, except we must also handle chance nodes:

...
if state is a Max node then
   return the highest ExpectiMinimax-Value of Successors(state)
if state is a Min node then
   return the lowest ExpectiMinimax-Value of Successors(state)
if state is a chance node then
   return average of ExpectiMinimax-Value of Successors(state)
...

Nondeterministic games in practice

Dice rolls increase b: 21 possible rolls with 2 dice
Backgammon ≈ 20 legal moves (can be 6,000 with a 1-1 roll)
depth 4 = 20 × (21 × 20)^3 ≈ 1.2 × 10^9
As depth increases, probability of reaching a given node shrinks
⇒ value of lookahead is diminished
α–β pruning is much less effective
TD-Gammon uses depth-2 search + a very good Eval
≈ world-champion level
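The case analysis above can be sketched recursively; the tuple encoding of the tree is an assumption for illustration, the example is the coin-flip tree from the slide, and the "average" at chance nodes is the probability-weighted one.

```python
# Tree encoding (assumed): ('max', children), ('min', children),
# ('chance', [(prob, child), ...]), or a bare number for a leaf utility.
def expectiminimax(node):
    if isinstance(node, (int, float)):
        return node                                   # leaf utility
    kind, children = node
    if kind == 'max':
        return max(expectiminimax(c) for c in children)
    if kind == 'min':
        return min(expectiminimax(c) for c in children)
    # chance node: probability-weighted average of successor values
    return sum(p * expectiminimax(c) for p, c in children)

coin_flip_tree = ('max', [
    ('chance', [(0.5, ('min', [2, 4])), (0.5, ('min', [7, 4]))]),
    ('chance', [(0.5, ('min', [6, 0])), (0.5, ('min', [5, -2]))]),
])
# Left chance node: 0.5*2 + 0.5*4 = 3; right: 0.5*0 + 0.5*(-2) = -1; MAX picks 3.
print(expectiminimax(coin_flip_tree))
```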
