Minimax strategies, alpha-beta pruning Lirong Xia
Reminder Ø Project 1 due tonight § Make sure you DO NOT SEE “ERROR: Summation of parsed points does not match” Ø Project 2 due in two weeks 2
How to find good heuristics? Ø No truly mechanical way § more art than science Ø General guideline: relax constraints § e.g. Pacman can pass through the walls Ø Mimic what you would do 3
Arc Consistency of a CSP Ø A simple form of propagation makes sure all arcs are consistent (delete unsupported values from the tail!) Ø If V loses a value, the neighbors of V need to be rechecked Ø Arc consistency detects failure earlier than forward checking Ø Can be run as a preprocessor or after each assignment Ø Might be time-consuming (see the sketch below) 4
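A minimal AC-3 sketch of the propagation described above, not taken from the slides: it assumes the CSP is given as dictionaries `domains`, `constraints` (a predicate for each ordered pair of constrained variables), and `neighbors`, all hypothetical names chosen for illustration.

```python
from collections import deque

def ac3(domains, constraints, neighbors):
    """Enforce arc consistency on a binary CSP.

    domains:     dict mapping each variable to a set of remaining values
    constraints: dict mapping an ordered pair (X, Y) to a predicate c(x, y) -> bool
    neighbors:   dict mapping each variable to the variables it shares a constraint with
    Returns False if some domain becomes empty (failure detected early).
    """
    queue = deque((x, y) for x in domains for y in neighbors[x])
    while queue:
        x, y = queue.popleft()
        # Delete from the tail: remove values of x that have no support in y.
        revised = False
        for vx in set(domains[x]):
            if not any(constraints[(x, y)](vx, vy) for vy in domains[y]):
                domains[x].remove(vx)
                revised = True
        if revised:
            if not domains[x]:
                return False  # failure detected earlier than forward checking
            # x lost a value, so arcs pointing into x must be rechecked.
            for z in neighbors[x]:
                if z != y:
                    queue.append((z, x))
    return True
```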
Limitations of Arc Consistency Ø After running arc consistency: § Can have one solution left § Can have multiple solutions left § Can have no solutions left (and not know it) 5
“Sum to 2” game Ø Player 1 moves, then player 2, finally player 1 again Ø Move = 0 or 1 Ø Player 1 wins if and only if all moves together sum to 2 [Figure: the full game tree; Player 1 branches on 0/1 at the root, Player 2 at the second level, Player 1 at the third; the eight leaves have utilities -1, -1, -1, 1, -1, 1, 1, -1] Player 1’s utility is in the leaves; player 2’s utility is the negative of this
Today’s schedule Ø Adversarial game Ø Minimax search Ø Alpha-beta pruning algorithm 7
Adversarial Games Ø Deterministic, zero-sum games: § Tic-tac-toe, chess, checkers § The MAX player maximizes result § The MIN player minimizes result Ø Minimax search: § A search tree § Players alternate turns § Each node has a minimax value: best achievable utility against a rational adversary 8
Computing Minimax Values Ø This is DFS Ø Two recursive functions: § max-value takes the max of the successors’ values § min-value takes the min of the successors’ values Ø Def value(state): if the state is a terminal state, return the state’s utility; if the agent at the state is MAX, return max-value(state); if the agent at the state is MIN, return min-value(state) Ø Def max-value(state): initialize max = -∞; for each successor of state, compute value(successor) and update max accordingly; return max Ø Def min-value(state): similar to max-value (see the sketch below) 9
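A runnable Python sketch of the recursion above, assuming a hypothetical `game` object exposing `is_terminal`, `utility` (from MAX’s point of view), `to_move`, and `successors`; the “Sum to 2” game from the earlier slide is used as a usage example.

```python
import math

def value(state, game):
    """Minimax value of a state (plain DFS over the game tree)."""
    if game.is_terminal(state):
        return game.utility(state)       # utility from MAX's point of view
    if game.to_move(state) == "MAX":
        return max_value(state, game)
    return min_value(state, game)

def max_value(state, game):
    v = -math.inf
    for successor in game.successors(state):
        v = max(v, value(successor, game))
    return v

def min_value(state, game):
    v = math.inf
    for successor in game.successors(state):
        v = min(v, value(successor, game))
    return v

# Usage example: the "Sum to 2" game. States are tuples of the moves made so far.
class SumToTwo:
    def is_terminal(self, state): return len(state) == 3
    def utility(self, state):     return 1 if sum(state) == 2 else -1
    def to_move(self, state):     return "MAX" if len(state) % 2 == 0 else "MIN"
    def successors(self, state):  return [state + (m,) for m in (0, 1)]

print(value((), SumToTwo()))  # 1: Player 1 can force the sum to be 2 (e.g., by playing 1 first)
```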
Minimax Example [Figure: example game tree with minimax values 3, 3, 2, 2 at the internal nodes] 10
Tic-tac-toe Game Tree 11
Renju • 15×15 board • 5 in a row (horizontal, vertical, or diagonal) wins • no double-3 or double-4 moves for black • without these restrictions, a winning strategy for black was computed – L. Victor Allis, 1994 (PhD thesis) 12
Minimax Properties Ø Time complexity? § O(b^m) Ø Space complexity? § O(bm) Ø For chess, b ≈ 35, m ≈ 100 § Exact solution is completely infeasible § But, do we need to explore the whole tree? 13
Resource Limits Ø Cannot search to leaves Ø Depth-limited search § Instead, search a limited depth of tree § Replace terminal utilities with an evaluation function for non-terminal positions Ø Guarantee of optimal play is gone 14
Evaluation Functions Ø A function that scores non-terminal states Ø Ideal function: returns the minimax utility of the position Ø In practice: typically a weighted linear sum of features: Eval(s) = w1 f1(s) + w2 f2(s) + … + wn fn(s) Ø e.g. f1(s) = (# white queens) - (# black queens), etc. (see the sketch below) 15
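A minimal sketch of such a weighted linear evaluation; the feature functions and the piece-counting methods they call are hypothetical names chosen for illustration, not part of any project code.

```python
def linear_eval(state, weights, features):
    """Weighted linear evaluation: Eval(s) = w1*f1(s) + ... + wn*fn(s)."""
    return sum(w * f(state) for w, f in zip(weights, features))

# Illustrative chess-like features (hypothetical state methods).
features = [
    lambda s: s.count_white_queens() - s.count_black_queens(),
    lambda s: s.count_white_pawns() - s.count_black_pawns(),
]
weights = [9.0, 1.0]  # e.g., a queen worth roughly nine pawns
# score = linear_eval(some_state, weights, features)
```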
Minimax with limited depth Ø Suppose you are the MAX player Ø Given a depth d and the current state Ø Compute value(state, d), which searches down to depth d § at depth d, use an evaluation function to estimate the value if the state is non-terminal (see the sketch below) 16
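A depth-limited variant of the earlier minimax sketch, assuming the same hypothetical `game` interface plus an `evaluate` function like the one above.

```python
import math

def value(state, depth, game, evaluate):
    """Depth-limited minimax value for the MAX player.

    depth is the number of plies still allowed below this state; when it
    reaches 0 at a non-terminal state, the evaluation function is used as
    an estimate instead of searching further.
    """
    if game.is_terminal(state):
        return game.utility(state)
    if depth == 0:
        return evaluate(state)
    if game.to_move(state) == "MAX":
        return max(value(s, depth - 1, game, evaluate)
                   for s in game.successors(state))
    return min(value(s, depth - 1, game, evaluate)
               for s in game.successors(state))
```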
Improving minimax: pruning 17
Pruning in Minimax Search Ø Suppose an ancestor is a MAX node that § already has an option at least as good as my current value, and § my value can only get smaller as more of my children are examined Ø Then that ancestor will never choose this node, so we can stop searching below it 18
Alpha-beta pruning Ø Pruning = cutting off parts of the search tree (because you realize you don’t need to look at them) § When we considered A* we also pruned large parts of the search tree Ø Maintain § α = value of the best option for the MAX player along the path so far § β = value of the best option for the MIN player along the path so far § Initialized to be α = -∞ and β = +∞ Ø Maintain and update α and β for each node § α is updated at MAX player’s nodes § β is updated at MIN player’s nodes
Alpha-Beta Pruning Ø General configuration § We’re computing the MIN-VALUE at n § We’re looping over n’s children § n’s value estimate is dropping § α is the best value that MAX can get at any choice point along the current path § If n becomes worse than α, MAX will avoid it, so we can stop considering n’s other children § Define β similarly for MIN § α is usually smaller than β; once α ≥ β, return to the upper layer 20
Alpha-Beta Pruning Example α is MAX’s best alternative here or above β is MIN’s best alternative here or above 21
Alpha-Beta Pruning Example [Figure: worked example on a game tree; α and β start at -∞ and +∞ at the root, α is raised at MAX nodes, β is lowered at MIN nodes, and subtrees are pruned once α ≥ β] α is MAX’s best alternative here or above; β is MIN’s best alternative here or above 22
Alpha-Beta Pseudocode 23
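The pseudocode slide itself is an image that did not survive extraction; the following is a minimal Python sketch of the standard alpha-beta algorithm, assuming the same hypothetical `game` interface as the minimax sketch earlier. Values returned for pruned intermediate nodes may only be bounds, but the value at the root is the true minimax value.

```python
import math

def alpha_beta_value(state, game, alpha=-math.inf, beta=math.inf):
    """Minimax value of state with alpha-beta pruning.

    alpha: best value MAX can guarantee along the path so far
    beta:  best value MIN can guarantee along the path so far
    """
    if game.is_terminal(state):
        return game.utility(state)
    if game.to_move(state) == "MAX":
        v = -math.inf
        for s in game.successors(state):
            v = max(v, alpha_beta_value(s, game, alpha, beta))
            if v >= beta:              # the MIN ancestor will never allow this node
                return v               # prune the remaining children
            alpha = max(alpha, v)      # alpha is updated at MAX nodes
        return v
    else:
        v = math.inf
        for s in game.successors(state):
            v = min(v, alpha_beta_value(s, game, alpha, beta))
            if v <= alpha:             # the MAX ancestor already has a better option
                return v               # prune the remaining children
            beta = min(beta, v)        # beta is updated at MIN nodes
        return v
```

Running `alpha_beta_value((), SumToTwo())` on the “Sum to 2” game from the minimax sketch returns the same root value, 1, while skipping parts of the tree.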
Alpha-Beta Pruning Properties Ø This pruning has no effect on the final result at the root Ø Values of intermediate nodes might be wrong! § Important: children of the root may have the wrong value Ø Good child ordering improves the effectiveness of pruning Ø With “perfect ordering”: § Time complexity drops to O(b^(m/2)) § Doubles the solvable depth! § Your play looks smarter: more forward-looking with a good evaluation function § Full search of, e.g., chess is still hopeless… 24
Project 2 Ø Q1: write an evaluation function for (state,action) pairs § the evaluation function is for this question only Ø Q2: minimax search with arbitrary depth and multiple MIN players (ghosts) § evaluation function on states has been implemented for you Ø Q3: alpha-beta pruning with arbitrary depth and multiple MIN players (ghosts) 25
Recap Ø Minimax search § with limited depth § evaluation function Ø Alpha-beta pruning Ø Project 1 due midnight today Ø Project 2 due in two weeks 26