Minimax strategies, alpha-beta pruning Lirong Xia
Reminder Ø Project 1 due tonight § Make sure you DO NOT SEE “ERROR: Summation of parsed points does not match” Ø Project 2 due in two weeks 2
How to find good heuristics? Ø No truly mechanical way § more art than science Ø General guideline: relax constraints § e.g. Pacman can pass through the walls Ø Mimic what you would do 3
Arc Consistency of a CSP Ø A simple form of propagation makes sure all arcs are consistent (delete unsupported values from the tail!) Ø If V loses a value, the neighbors of V need to be rechecked Ø Arc consistency detects failure earlier than forward checking Ø Can be run as a preprocessor or after each assignment Ø Might be time-consuming (see the sketch below) 4
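A minimal AC-3 sketch of the propagation described above, not taken from the slides: it assumes the CSP is given as dictionaries `domains`, `constraints` (a predicate for each ordered pair of constrained variables), and `neighbors`, all hypothetical names chosen for illustration.

```python
from collections import deque

def ac3(domains, constraints, neighbors):
    """Enforce arc consistency on a binary CSP.

    domains:     dict mapping each variable to a set of remaining values
    constraints: dict mapping an ordered pair (X, Y) to a predicate c(x, y) -> bool
    neighbors:   dict mapping each variable to the variables it shares a constraint with
    Returns False if some domain becomes empty (failure detected early).
    """
    queue = deque((x, y) for x in domains for y in neighbors[x])
    while queue:
        x, y = queue.popleft()
        # Delete from the tail: remove values of x that have no support in y.
        revised = False
        for vx in set(domains[x]):
            if not any(constraints[(x, y)](vx, vy) for vy in domains[y]):
                domains[x].remove(vx)
                revised = True
        if revised:
            if not domains[x]:
                return False  # failure detected earlier than forward checking
            # x lost a value, so arcs pointing into x must be rechecked.
            for z in neighbors[x]:
                if z != y:
                    queue.append((z, x))
    return True
```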
Limitations of Arc Consistency Ø After running arc consistency: § Can have one solution left § Can have multiple solutions left § Can have no solutions left (and not know it) 5
“Sum to 2” game Ø Player 1 moves, then player 2, finally player 1 again Ø Move = 0 or 1 Ø Player 1 wins if and only if all moves together sum to 2 [Figure: the full game tree; Player 1 branches on 0/1 at the root, Player 2 at the second level, Player 1 at the third; the eight leaves have utilities -1, -1, -1, 1, -1, 1, 1, -1] Player 1’s utility is in the leaves; player 2’s utility is the negative of this
Today’s schedule Ø Adversarial game Ø Minimax search Ø Alpha-beta pruning algorithm 7
Adversarial Games Ø Deterministic, zero-sum games: § Tic-tac-toe, chess, checkers § The MAX player maximizes result § The MIN player minimizes result Ø Minimax search: § A search tree § Players alternate turns § Each node has a minimax value: best achievable utility against a rational adversary 8
Computing Minimax Values Ø This is DFS Ø Two recursive functions: § max-value takes the max of the successors’ values § min-value takes the min of the successors’ values Ø Def value(state): if the state is a terminal state, return the state’s utility; if the agent at the state is MAX, return max-value(state); if the agent at the state is MIN, return min-value(state) Ø Def max-value(state): initialize max = -∞; for each successor of state, compute value(successor) and update max accordingly; return max Ø Def min-value(state): similar to max-value (see the sketch below) 9
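A runnable Python sketch of the recursion above, assuming a hypothetical `game` object exposing `is_terminal`, `utility` (from MAX’s point of view), `to_move`, and `successors`; the “Sum to 2” game from the earlier slide is used as a usage example.

```python
import math

def value(state, game):
    """Minimax value of a state (plain DFS over the game tree)."""
    if game.is_terminal(state):
        return game.utility(state)       # utility from MAX's point of view
    if game.to_move(state) == "MAX":
        return max_value(state, game)
    return min_value(state, game)

def max_value(state, game):
    v = -math.inf
    for successor in game.successors(state):
        v = max(v, value(successor, game))
    return v

def min_value(state, game):
    v = math.inf
    for successor in game.successors(state):
        v = min(v, value(successor, game))
    return v

# Usage example: the "Sum to 2" game. States are tuples of the moves made so far.
class SumToTwo:
    def is_terminal(self, state): return len(state) == 3
    def utility(self, state):     return 1 if sum(state) == 2 else -1
    def to_move(self, state):     return "MAX" if len(state) % 2 == 0 else "MIN"
    def successors(self, state):  return [state + (m,) for m in (0, 1)]

print(value((), SumToTwo()))  # 1: Player 1 can force the sum to be 2 (e.g., by playing 1 first)
```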
Minimax Example [Figure: example game tree with minimax values 3, 3, 2, 2 at the internal nodes] 10
Tic-tac-toe Game Tree 11
Renju • 15×15 board • 5 in a row (horizontal, vertical, or diagonal) wins • no double-3 or double-4 moves for black • without these restrictions, a winning strategy for black was computed – L. Victor Allis, 1994 (PhD thesis) 12
Minimax Properties Ø Time complexity? § O(b^m) Ø Space complexity? § O(bm) Ø For chess, b ≈ 35, m ≈ 100 § Exact solution is completely infeasible § But, do we need to explore the whole tree? 13
Resource Limits Ø Cannot search to leaves Ø Depth-limited search § Instead, search a limited depth of tree § Replace terminal utilities with an evaluation function for non-terminal positions Ø Guarantee of optimal play is gone 14
Evaluation Functions Ø A function that scores non-terminal states Ø Ideal function: returns the minimax utility of the position Ø In practice: typically a weighted linear sum of features: Eval(s) = w1 f1(s) + w2 f2(s) + … + wn fn(s) Ø e.g. f1(s) = (# white queens) - (# black queens), etc. (see the sketch below) 15
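A minimal sketch of such a weighted linear evaluation; the feature functions and the piece-counting methods they call are hypothetical names chosen for illustration, not part of any project code.

```python
def linear_eval(state, weights, features):
    """Weighted linear evaluation: Eval(s) = w1*f1(s) + ... + wn*fn(s)."""
    return sum(w * f(state) for w, f in zip(weights, features))

# Illustrative chess-like features (hypothetical state methods).
features = [
    lambda s: s.count_white_queens() - s.count_black_queens(),
    lambda s: s.count_white_pawns() - s.count_black_pawns(),
]
weights = [9.0, 1.0]  # e.g., a queen worth roughly nine pawns
# score = linear_eval(some_state, weights, features)
```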
Minimax with limited depth Ø Suppose you are the MAX player Ø Given a depth d and the current state Ø Compute value(state, d), which searches down to depth d § at depth d, use an evaluation function to estimate the value if the state is non-terminal (see the sketch below) 16
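A depth-limited variant of the earlier minimax sketch, assuming the same hypothetical `game` interface plus an `evaluate` function like the one above.

```python
import math

def value(state, depth, game, evaluate):
    """Depth-limited minimax value for the MAX player.

    depth is the number of plies still allowed below this state; when it
    reaches 0 at a non-terminal state, the evaluation function is used as
    an estimate instead of searching further.
    """
    if game.is_terminal(state):
        return game.utility(state)
    if depth == 0:
        return evaluate(state)
    if game.to_move(state) == "MAX":
        return max(value(s, depth - 1, game, evaluate)
                   for s in game.successors(state))
    return min(value(s, depth - 1, game, evaluate)
               for s in game.successors(state))
```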
Improving minimax: pruning 17
Pruning in Minimax Search Ø Suppose an ancestor is a MAX node that § already has an option at least as good as my current value, and § my value can only get smaller as more of my children are examined Ø Then that ancestor will never choose this node, so we can stop searching below it 18
Alpha-beta pruning Ø Pruning = cutting off parts of the search tree (because you realize you don’t need to look at them) § When we considered A* we also pruned large parts of the search tree Ø Maintain § α = value of the best option for the MAX player along the path so far § β = value of the best option for the MIN player along the path so far § Initialized to be α = -∞ and β = +∞ Ø Maintain and update α and β for each node § α is updated at MAX player’s nodes § β is updated at MIN player’s nodes
Alpha-Beta Pruning Ø General configuration § We’re computing the MIN-VALUE at n § We’re looping over n’s children § n’s value estimate is dropping § α is the best value that MAX can get at any choice point along the current path § If n becomes worse than α, MAX will avoid it, so we can stop considering n’s other children § Define β similarly for MIN § α is usually smaller than β; once α ≥ β, return to the upper layer 20
Alpha-Beta Pruning Example α is MAX’s best alternative here or above β is MIN’s best alternative here or above 21
Alpha-Beta Pruning Example [Figure: worked example on a game tree; α and β start at -∞ and +∞ at the root, α is raised at MAX nodes, β is lowered at MIN nodes, and subtrees are pruned once α ≥ β] α is MAX’s best alternative here or above; β is MIN’s best alternative here or above 22
Alpha-Beta Pseudocode 23
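The pseudocode slide itself is an image that did not survive extraction; the following is a minimal Python sketch of the standard alpha-beta algorithm, assuming the same hypothetical `game` interface as the minimax sketch earlier. Values returned for pruned intermediate nodes may only be bounds, but the value at the root is the true minimax value.

```python
import math

def alpha_beta_value(state, game, alpha=-math.inf, beta=math.inf):
    """Minimax value of state with alpha-beta pruning.

    alpha: best value MAX can guarantee along the path so far
    beta:  best value MIN can guarantee along the path so far
    """
    if game.is_terminal(state):
        return game.utility(state)
    if game.to_move(state) == "MAX":
        v = -math.inf
        for s in game.successors(state):
            v = max(v, alpha_beta_value(s, game, alpha, beta))
            if v >= beta:              # the MIN ancestor will never allow this node
                return v               # prune the remaining children
            alpha = max(alpha, v)      # alpha is updated at MAX nodes
        return v
    else:
        v = math.inf
        for s in game.successors(state):
            v = min(v, alpha_beta_value(s, game, alpha, beta))
            if v <= alpha:             # the MAX ancestor already has a better option
                return v               # prune the remaining children
            beta = min(beta, v)        # beta is updated at MIN nodes
        return v
```

Running `alpha_beta_value((), SumToTwo())` on the “Sum to 2” game from the minimax sketch returns the same root value, 1, while skipping parts of the tree.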
Alpha-Beta Pruning Properties Ø This pruning has no effect on the final result at the root Ø Values of intermediate nodes might be wrong! § Important: children of the root may have the wrong value Ø Good child ordering improves the effectiveness of pruning Ø With “perfect ordering”: § Time complexity drops to O(b^(m/2)) § Doubles the solvable depth! § Your play looks smarter: more forward-looking with a good evaluation function § Full search of, e.g., chess is still hopeless… 24
Project 2 Ø Q1: write an evaluation function for (state,action) pairs § the evaluation function is for this question only Ø Q2: minimax search with arbitrary depth and multiple MIN players (ghosts) § evaluation function on states has been implemented for you Ø Q3: alpha-beta pruning with arbitrary depth and multiple MIN players (ghosts) 25
Recap Ø Minimax search § with limited depth § evaluation function Ø Alpha-beta pruning Ø Project 1 due midnight today Ø Project 2 due in two weeks 26