
Chapter 6: Adversarial Search (Chap6 slides, 20070419)



  1. Chapter 6: Adversarial Search

     Game Theory
     • Studied by mathematicians, economists, and people in finance.
     • In AI we limit games to those that are:
       - deterministic
       - turn-taking
       - two-player
       - zero-sum (a win-lose game: one player's gain is the other's loss)
       - perfect information
     This means deterministic, fully observable environments in which there are two agents whose actions must alternate and in which the utility values at the end of the game are always equal and opposite.

  2. Types of Games

                              Deterministic                  Chance
     Perfect information      Chess, Go, Othello, Checkers   Backgammon, Monopoly
     Imperfect information    Blind Tic-Tac-Toe              Bridge, Poker

     • Game playing was one of the first tasks undertaken in AI.
     • Machines have surpassed humans at checkers and Othello, and have defeated human champions in chess and backgammon.
     • In Go, computers perform at the amateur level (as of these slides, dated 20070419).

     Games as Search Problems
     • Games offer pure, abstract competition.
     • A chess-playing computer would be an existence proof of a machine doing something generally thought to require intelligence.
     • Games are idealizations of worlds in which:
       - the world state is fully accessible;
       - the (small number of) actions are well-defined;
       - uncertainty arises from the opponent's moves and from the complexity of the game.

  3. Games as Search Problems (cont.-1)
     • Games are usually much too hard to solve. For example, in a typical chess game:
       - Average branching factor: 35
       - Average moves by each player: 50 (about 100 plies in total)
       - Total number of nodes in the search tree: $35^{100}$, or about $10^{154}$ (although the total number of different legal positions is about $10^{40}$)
     • There are time limits for making good decisions.

     Games as Search Problems (cont.-2)
     • Initial State: how does the game start?
     • Successor Function: a list of legal (move, state) pairs for each state
     • Terminal Test: determines when the game is over
     • Utility Function: provides a numeric value for all terminal states
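
The four components listed above can be sketched for a toy game. The game itself is an assumption made for illustration and does not appear in the slides: two players alternately take 1 or 2 sticks from a pile, and whoever takes the last stick wins. All function names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical toy game (not from the slides): take 1 or 2 sticks per turn;
# the player who takes the last stick wins.
State = Tuple[int, str]  # (sticks remaining, player to move: "MAX" or "MIN")


def initial_state(sticks: int = 4) -> State:
    """Initial state: how the game starts (MAX moves first)."""
    return (sticks, "MAX")


def successors(state: State) -> List[Tuple[int, State]]:
    """Successor function: the legal (move, state) pairs for a state."""
    sticks, player = state
    other = "MIN" if player == "MAX" else "MAX"
    return [(take, (sticks - take, other)) for take in (1, 2) if take <= sticks]


def terminal_test(state: State) -> bool:
    """Terminal test: the game is over when no sticks remain."""
    return state[0] == 0


def utility(state: State) -> int:
    """Utility from MAX's point of view, defined only for terminal states.

    The player to move at a terminal state did not take the last stick,
    so that player has lost.
    """
    assert terminal_test(state)
    return -1 if state[1] == "MAX" else +1


print(successors(initial_state()))  # [(1, (3, 'MIN')), (2, (2, 'MIN'))]
```

Because the game is zero-sum, a single number from MAX's point of view is enough: MIN's utility is its negation.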

  4. Partial Game Tree
     [Figure: partial game tree]

     Optimal strategies
     • Find the contingent strategy for MAX assuming an infallible MIN opponent.
     • Assumption: both players play optimally!
     • Given a game tree, the optimal strategy can be determined by using the minimax value of each node:

     \[
     \mathrm{MinimaxValue}(n) =
     \begin{cases}
     \mathrm{Utility}(n) & \text{if } n \text{ is a terminal state} \\
     \max_{s \in \mathrm{Successors}(n)} \mathrm{MinimaxValue}(s) & \text{if } n \text{ is a MAX node} \\
     \min_{s \in \mathrm{Successors}(n)} \mathrm{MinimaxValue}(s) & \text{if } n \text{ is a MIN node}
     \end{cases}
     \]
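
The three cases of MinimaxValue translate almost directly into a recursive function. The sketch below assumes a game tree written as nested Python lists, where a number is a terminal utility and a list holds a node's successors. The example tree is an assumption: only the leaf values 14, 5, 2 of the third MIN node are mentioned later in these slides; the other leaves follow the standard textbook two-ply example.

```python
def minimax_value(node, is_max):
    """Minimax value of a node in a nested-list game tree.

    A number is a terminal state (its utility); a list is an internal
    node whose elements are its successors.
    """
    if not isinstance(node, list):          # terminal state: Utility(n)
        return node
    values = [minimax_value(child, not is_max) for child in node]
    return max(values) if is_max else min(values)  # MAX node / MIN node


# Assumed two-ply tree: MAX at the root, three MIN nodes below it.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax_value(tree, is_max=True))  # 3
```

The MIN nodes back up 3, 2, and 2, so the root's minimax value is 3.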

  5. Minimax
     • Perfect play for deterministic, perfect-information games
     • Idea: choose the move to the position with the highest minimax value, i.e. the best achievable payoff against best play

     Two-Ply Game Tree
     [Figure: two-ply game tree]

  6. Two-Ply Game Tree (cont.-1)
     [Figure: two-ply game tree, continued]

     Two-Ply Game Tree (cont.-2)
     [Figure: two-ply game tree, continued]

  7. Two-Ply Game Tree (cont.-3)
     [Figure: two-ply game tree showing the minimax decision]
     Minimax maximizes the worst-case outcome for MAX.

     Minimax Algorithm
     [Figure: the minimax algorithm]

  8. The Minimax Algorithm (cont.)
     • Generate the whole game tree.
     • Apply the utility function to each terminal state.
     • Determine the utility of the nodes one level higher up from the terminal nodes:
       $\mathrm{Utility}(n) = \max / \min\,(n.1,\; n.2,\; \ldots,\; n.b)$, where $n.1, \ldots, n.b$ are the values of the $b$ successors of $n$.
     • Continue backing up the values.
     • At the root, MAX chooses the move leading to the highest utility value.

     Analysis of Minimax
     • Complete? Yes, if the tree is finite.
     • Optimal? Yes, against an optimal opponent. (What if the opponent is not optimal?)
     • Time? $O(b^m)$, since it is a complete depth-first search ($m$: maximum depth, $b$: number of legal moves).
     • Space? $O(bm)$ if all successors are generated at once, or $O(m)$ if successors are generated one at a time.
     For chess, $b \approx 35$ and $m \approx 100$ for "reasonable" games, so an exact solution is completely infeasible.
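
The backing-up steps and the $O(b^m)$ cost can be sketched together. This is a minimal illustration, assuming a nested-list tree (numbers are terminal utilities, lists are internal nodes) and hypothetical names; the leaf values other than 14, 5, 2 are assumed from the standard textbook example.

```python
import math


def minimax_decision(tree):
    """Return (index of MAX's best root move, its value, leaves evaluated)."""
    leaves = 0

    def value(node, is_max):
        nonlocal leaves
        if not isinstance(node, list):      # terminal: apply the utility function
            leaves += 1
            return node
        backed_up = [value(child, not is_max) for child in node]
        return max(backed_up) if is_max else min(backed_up)

    # MIN replies to each of MAX's moves at the root.
    move_values = [value(child, False) for child in tree]
    best = max(range(len(tree)), key=lambda i: move_values[i])
    return best, move_values[best], leaves


tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax_decision(tree))  # (0, 3, 9): all b**m = 3**2 leaves are visited

# Chess-sized tree: 35**100 is about 10**154.4.
print(round(100 * math.log10(35), 1))  # 154.4
```

Every leaf is evaluated, which is why the time cost grows as $b^m$ and why the chess figure quoted above rules out exhaustive search.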

  9. Optimal Decisions in Multiplayer Games
     • Extend the minimax idea to multiplayer games.
     • Replace the single value for each node with a vector of values (one utility per player).

     α-β Pruning
     • The problem with minimax search: the number of states to examine is exponential in the number of moves.
     • α-β pruning returns the same moves as minimax would, but prunes away branches that cannot possibly influence the final decision.
     • α: the value of the best (highest) choice found so far in the search for MAX
     • β: the value of the best (lowest) choice found so far in the search for MIN
     • The order in which successors are considered matters (see step (f) of Fig. 6.5, p. 168).
       - If possible, consider the best successors first.
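
Returning to the multiplayer extension at the top of this item: with a vector of values per node, each player simply picks the successor whose vector is best in that player's own component. The sketch below is an illustration under assumptions, not an algorithm given in the slides: leaves are tuples with one utility per player, internal nodes are lists, players move in a fixed rotation, and the tree values are made up.

```python
def multiplayer_value(node, player, num_players=3):
    """Back up utility vectors in a multiplayer game tree.

    A tuple is a terminal state (one utility per player); a list is an
    internal node where `player` (0-based index) is to move.
    """
    if isinstance(node, tuple):
        return node
    next_player = (player + 1) % num_players
    child_vectors = [multiplayer_value(c, next_player, num_players) for c in node]
    # The player to move maximizes their own component of the vector.
    return max(child_vectors, key=lambda vec: vec[player])


# Made-up tree: player 0 moves at the root, player 1 at the next level.
tree = [[(1, 2, 6), (4, 3, 3)],
        [(6, 1, 2), (7, 4, 1)]]
print(multiplayer_value(tree, player=0))  # (7, 4, 1)
```

Player 1 backs up (4, 3, 3) and (7, 4, 1) by comparing second components; player 0 then prefers the vector with the larger first component.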

  10. α-β Pruning (cont.)
      If m is better than n for Player, we will never get to n in play, so n can be pruned.

      α-β Pruning Example
      Do a depth-first search until the first leaf.
      [Figure: search tree; ranges of possible values shown at the nodes: [-∞, +∞], [-∞, +∞]]

  11. α-β Pruning Example (cont.-1)
      [Figure: ranges shown at the nodes: [-∞, +∞], [-∞, 3]]

      α-β Pruning Example (cont.-2)
      [Figure: ranges shown at the nodes: [-∞, +∞], [-∞, 3]]

  12. α-β Pruning Example (cont.-3)
      [Figure: ranges shown at the nodes: [3, +∞], [3, 3]]

      α-β Pruning Example (cont.-4)
      [Figure: ranges shown at the nodes: [3, +∞], [-∞, 2], [3, 3]; annotation: "This node is worse for MAX"]

  13. α-β Pruning Example (cont.-5)
      [Figure: ranges shown at the nodes: [3, 14], [-∞, 2], [-∞, 14], [3, 3]]

      α-β Pruning Example (cont.-6)
      [Figure: ranges shown at the nodes: [3, 5], [-∞, 2], [-∞, 5]]

  14. α-β Pruning Example (cont.-7)
      [Figure: ranges shown at the nodes: [3, 3], [-∞, 2], [3, 3], [2, 2]]

      α-β Pruning Example (cont.-8)
      [Figure: ranges shown at the nodes: [3, 3], [-∞, 2], [3, 3], [2, 2]]

  15. The α-β Algorithm
      [Figure: the α-β algorithm]

      The α-β Algorithm (cont.)
      [Figure: the α-β algorithm, continued]
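
The algorithm figures did not survive extraction, so here is a minimal sketch of α-β search in its usual textbook form, not a transcription of the slides' pseudocode. It assumes a nested-list tree (numbers are terminal utilities, lists are internal nodes); the `visited` list and the leaf values other than 14, 5, 2 are assumptions for illustration.

```python
import math


def alphabeta(node, is_max, alpha=-math.inf, beta=math.inf, visited=None):
    """Alpha-beta value of a node; appends each evaluated leaf to `visited`."""
    if visited is None:
        visited = []
    if not isinstance(node, list):          # terminal state
        visited.append(node)
        return node
    if is_max:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            if value >= beta:               # MIN above will never allow this
                return value
            alpha = max(alpha, value)
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, visited))
        if value <= alpha:                  # MAX above already has better
            return value
        beta = min(beta, value)
    return value


tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
seen = []
print(alphabeta(tree, True, visited=seen))  # 3
print(seen)                                 # [3, 12, 8, 2, 14, 5, 2]
```

The leaves 4 and 6 are never evaluated: once the second MIN node is known to be worth at most 2, it cannot beat the 3 that MAX already has, which matches the [-∞, 2] range in the example slides.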

  16. Analysis of α-β Algorithm
      • Pruning does not affect the final result.
      • Entire subtrees can be pruned.
      • Effectiveness is highly dependent on the order in which the successors are examined, so good move ordering improves it.
        ⇒ It is worthwhile to try to examine first the successors that are likely to be best.
        e.g. Figure 6.5 (e, f): if the successors of D were ordered 2, 5, 14 (instead of 14, 5, 2), then 5 and 14 could be pruned.

      Analysis of α-β Algorithm (cont.)
      • With best-move-first ordering:
        - the total number of nodes examined is $O(b^{d/2})$;
        - the effective branching factor becomes $b^{1/2}$ (for chess, about 6 instead of 35),
          i.e. α-β can look ahead roughly twice as far as minimax in the same amount of time.
      • With random ordering:
        - the total number of nodes examined is $O(b^{3d/4})$ for moderate $b$.
      • Repeated states are again possible.
        - Store them in memory: this is a transposition table.
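
The effect of move ordering can be checked on the example above. The sketch assumes a nested-list tree and a hypothetical helper `count_leaves`; apart from D's leaves 14, 5, 2, the leaf values are assumed from the standard textbook example.

```python
import math


def count_leaves(tree):
    """Run alpha-beta (MAX at the root); return (value, leaves evaluated)."""
    evaluated = 0

    def search(node, is_max, alpha, beta):
        nonlocal evaluated
        if not isinstance(node, list):
            evaluated += 1
            return node
        best = -math.inf if is_max else math.inf
        for child in node:
            v = search(child, not is_max, alpha, beta)
            if is_max:
                best = max(best, v)
                if best >= beta:
                    break                    # prune remaining successors
                alpha = max(alpha, best)
            else:
                best = min(best, v)
                if best <= alpha:
                    break                    # prune remaining successors
                beta = min(beta, best)
        return best

    return search(tree, True, -math.inf, math.inf), evaluated


print(count_leaves([[3, 12, 8], [2, 4, 6], [14, 5, 2]]))  # (3, 7)
print(count_leaves([[3, 12, 8], [2, 4, 6], [2, 5, 14]]))  # (3, 5)
print(round(math.sqrt(35), 2))  # 5.92, the "about 6" effective branching factor
```

With D's successors ordered 2, 5, 14, the first leaf already shows D is worth at most 2, so 5 and 14 are skipped and the result is unchanged.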

  17. Imperfect, Real-Time Decisions
      • Minimax and alpha-beta pruning require too many leaf-node evaluations.
      • This may be impractical within a reasonable amount of time.
      • Shannon (1950):
        - Apply a heuristic evaluation function EVAL (replacing the utility function of alpha-beta).
        - Cut off the search earlier (replacing the terminal test with a cutoff test).

      Heuristic Evaluation Functions
      • Produce an estimate of the expected utility of the game from a given position.
      • Performance depends on the quality of EVAL.
      • Requirements:
        - EVAL should order terminal nodes in the same way as UTILITY.
        - The computation cannot take too long.
        - For non-terminal states, EVAL should be strongly correlated with the actual chance of winning.
      • Most evaluation functions work by calculating various features of the state.
        What are features of chess? e.g. the number of pawns possessed, etc.
      • Weighted linear function (the addition assumes that the features are independent of each other):
        $\mathrm{Eval}(s) = w_1 f_1(s) + w_2 f_2(s) + \cdots + w_n f_n(s)$
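
A weighted linear evaluation function is a one-line sum once the features are available. In the sketch below the state representation (a dict of counts), the two features, and the weights are all hypothetical choices made for illustration.

```python
from typing import Callable, Dict, Sequence

State = Dict[str, int]               # hypothetical: named counts for a position
Feature = Callable[[State], float]


def linear_eval(state: State, weights: Sequence[float],
                features: Sequence[Feature]) -> float:
    """Eval(s) = w_1 f_1(s) + ... + w_n f_n(s)."""
    if len(weights) != len(features):
        raise ValueError("need exactly one weight per feature")
    return sum(w * f(state) for w, f in zip(weights, features))


def pawn_difference(s: State) -> float:
    return s["my_pawns"] - s["their_pawns"]


def mobility_difference(s: State) -> float:
    return s["my_moves"] - s["their_moves"]


position = {"my_pawns": 8, "their_pawns": 6, "my_moves": 20, "their_moves": 25}
score = linear_eval(position, [1.0, 0.1], [pawn_difference, mobility_difference])
print(score)  # 1.0 * 2 + 0.1 * (-5) = 1.5
```

Summing the terms treats each feature's contribution as independent of the others, which is the assumption noted on the slide.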

  18. Heuristic Evaluation Functions (cont.-1)
      • Give a material value to each piece (values from chess books):

        pawn              1
        knight or bishop  3
        rook              5
        queen             9

      Heuristic Evaluation Functions (cont.-2)
      Heuristic difficulties (the heuristic only counts pieces won), e.g. two slightly different chess positions:
      (a) Black has an advantage of a knight and two pawns and will win the game.
      (b) Black will lose after White captures the queen.
      [Figure: the two chess positions]
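
The material values above give a simple evaluation function. This sketch assumes the position is supplied as the piece-placement field of a FEN string (uppercase letters for White, lowercase for Black); that input format and the function name are choices made here, not something the slides specify.

```python
# Material values from the slide; kings are not scored.
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9}


def material_balance(placement: str) -> int:
    """Material score from White's point of view.

    `placement` is assumed to be the piece-placement field of a FEN
    string, e.g. "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR".
    """
    balance = 0
    for ch in placement:
        value = PIECE_VALUES.get(ch.lower(), 0)   # kings, digits, "/" score 0
        balance += value if ch.isupper() else -value
    return balance


start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR"
print(material_balance(start))                    # 0
no_white_queen = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNB1KBNR"
print(material_balance(no_white_queen))           # -9
```

A count like this is exactly what the two positions on the slide defeat: it scores the pieces on the board now and cannot see that a queen is about to be captured.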

  19. Cutting Off Search
      • When do you recurse, and when do you use the evaluation function?
          if Cutoff-Test(state, depth) then return Eval(state)
      • The simplest way to control the amount of search is to set a fixed depth limit d.
        - Cutoff-Test(state, depth) returns 1 or 0.
        - When 1 is returned (for all depths greater than the fixed depth d), use the evaluation function.
      • Options for the cutoff:
        - Cut off beyond a certain depth.
        - Cut off if the state is stable (more predictable).
        - Cut off moves you know are bad (forward pruning).
      • A cutoff can have a disastrous effect if the evaluation function is not sophisticated enough.
      • The search should continue until a quiescent position is found (no wild swings in value in the near future).

      Cutting Off Search (cont.)
      • Does it work in practice?
        With $b^m = 10^6$ and $b = 35$, $m \approx 4$.
        A 4-ply lookahead is a hopeless chess player:
        - 4-ply ≈ human novice
        - 8-ply ≈ typical PC, human master
        - 12-ply ≈ Deep Blue, Kasparov
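
The cutoff rule can be sketched as a depth-limited minimax. The tree format is an assumption made for illustration: each node is a pair (static estimate, list of children), where the estimate plays the role of Eval and, at a leaf, of the true utility. All values are made up.

```python
import math


def cutoff_test(node, depth, limit):
    """True at terminal nodes and once the fixed depth limit is reached."""
    _estimate, children = node
    return not children or depth >= limit


def h_minimax(node, depth, limit, is_max):
    """Minimax that returns Eval(state) wherever Cutoff-Test succeeds."""
    if cutoff_test(node, depth, limit):
        return node[0]
    values = [h_minimax(c, depth + 1, limit, not is_max) for c in node[1]]
    return max(values) if is_max else min(values)


# Made-up tree: the static estimates (1 and 4) rank the two moves wrongly.
tree = (0, [(1, [(3, []), (12, [])]),
            (4, [(2, []), (6, [])])])
print(h_minimax(tree, 0, limit=1, is_max=True))  # 4, trusts the estimates
print(h_minimax(tree, 0, limit=2, is_max=True))  # 3, the true minimax value

# Depth reachable with a budget of 10**6 nodes at branching factor 35.
print(round(math.log(10**6, 35), 1))  # 3.9, i.e. about 4 plies
```

At limit 1 the search prefers the second move because of its optimistic estimate; one ply deeper it finds that the first move is actually better, a small instance of the damage an unsophisticated evaluation function can do.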
