Adversarial Search
CE417: Introduction to Artificial Intelligence
Sharif University of Technology, Spring 2018
Soleymani
"Artificial Intelligence: A Modern Approach", 3rd Edition, Chapter 5
Most slides have been adapted from Klein and Abbeel, CS188, UC Berkeley.
Outline
- Game as a search problem
- Minimax algorithm
- α-β pruning: ignoring a portion of the search tree
- Time limit problem
- Cut-off and evaluation function
Games as search problems
- Games: adversarial search problems (goals are in conflict)
- Competitive multi-agent environments
- Games in AI are a specialized kind of the games studied in game theory
Adversarial Games
Types of Games
Many different kinds of games! Axes:
- Deterministic or stochastic?
- One, two, or more players?
- Zero sum?
- Perfect information (can you see the state)?
We want algorithms for calculating a strategy (policy) which recommends a move from each state.
Zero-Sum Games
Zero-sum games:
- Agents have opposite utilities (values on outcomes)
- Lets us think of a single value that one maximizes and the other minimizes
- Adversarial, pure competition
General games:
- Agents have independent utilities (values on outcomes)
- Cooperation, indifference, competition, and more are all possible
- More later on non-zero-sum games
Primary assumptions
We start with these games:
- Two-player
- Turn-taking: agents act alternately
- Zero-sum: agents' goals are in conflict; the sum of utility values at the end of the game is zero or constant
- Deterministic
- Perfect information: fully observable
Examples: tic-tac-toe, chess, checkers
Deterministic Games
Many possible formalizations, one is:
- States: $S$ (start at $s_0$)
- Players: $P = \{1, \dots, N\}$ (usually take turns)
- Actions: $A$ (may depend on player / state)
- Transition function: $S \times A \to S$
- Terminal test: $S \to \{t, f\}$
- Terminal utilities: $S \times P \to \mathbb{R}$
Solution for a player is a policy: $S \to A$
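The formalization above can be sketched for a concrete game. The game here is an assumption chosen for illustration (a subtraction game: a pile of stones, each player removes 1 or 2, whoever takes the last stone wins), and all names (`State`, `S0`, `actions`, `result`, `terminal_test`, `utility`) are hypothetical.

```python
from typing import NamedTuple

class State(NamedTuple):
    stones: int    # stones left in the pile
    to_move: int   # index of the player whose turn it is (0 or 1)

# Start state s_0 (assumed: 5 stones, player 0 moves first)
S0 = State(stones=5, to_move=0)

def actions(s: State) -> list[int]:
    """Legal actions: remove 1 or 2 stones, but no more than remain."""
    return [k for k in (1, 2) if k <= s.stones]

def result(s: State, a: int) -> State:
    """Transition function S x A -> S: remove a stones, pass the turn."""
    return State(s.stones - a, 1 - s.to_move)

def terminal_test(s: State) -> bool:
    """Terminal test S -> {t, f}: the game ends when the pile is empty."""
    return s.stones == 0

def utility(s: State, p: int) -> int:
    """Terminal utilities S x P -> R: +1 for the winner, -1 for the loser.
    The winner took the last stone, so it is the player NOT to move."""
    winner = 1 - s.to_move
    return 1 if p == winner else -1
```

The two utilities of any terminal state sum to zero, so this sketch is also an instance of a zero-sum game.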
Single-Agent Trees
[Figure: single-agent search tree with leaf utilities 8, 2, 0, …, 2, 6, …, 4, 6]
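Value of a State
- Value of a state: the best achievable outcome (utility) from that state
- Non-terminal states: $V(s) = \max_{s' \in \text{children}(s)} V(s')$
- Terminal states: $V(s)$ is known
[Figure: single-agent search tree with leaf utilities 8, 2, 0, …, 2, 6, …, 4, 6]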
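A minimal sketch of the single-agent value computation, assuming the tree is given as nested Python lists whose leaves are utilities. The tree shape below is hypothetical; only the idea (a state's value is the best value among its children) comes from the slide.

```python
def value(node):
    """Value of a state in a single-agent tree.
    A leaf (a number) is a terminal state whose utility is known;
    an inner node (a list) takes the best value among its children."""
    if not isinstance(node, list):
        return node
    return max(value(child) for child in node)

# Hypothetical tree: three choices at the root, one of them two levels deep
tree = [[8, 2], [0, [2, 6]], [4, 6]]
print(value(tree))  # 8
```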
Adversarial Search
Adversarial Game Trees
[Figure: adversarial game tree with values -20, -8, …, -18, -5, …, -10, +4, -20, +8]
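Minimax Values
- States under agent's control: $V(s) = \max_{s' \in \text{successors}(s)} V(s')$
- States under opponent's control: $V(s') = \min_{s \in \text{successors}(s')} V(s)$
- Terminal states: $V(s)$ is known
[Figure: game tree with values -8, -5, -10, +8]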
Tic-Tac-Toe Game Tree
Game tree (tic-tac-toe)
- Two players: $P_1$ and $P_2$ ($P_1$ is now searching to find a good move)
- Zero-sum games: $P_1$ gets $U(t)$, $P_2$ gets $C - U(t)$ for terminal node $t$
- $P_1$: X, $P_2$: O
- 1-ply = half move
- Utilities are from the point of view of $P_1$
[Figure: tic-tac-toe game tree whose levels alternate between $P_1$ and $P_2$]
Optimal play
- Opponent is assumed optimal
- The minimax function is used to find the utility of each state
- MAX/MIN wants to maximize/minimize the terminal payoff
- MAX gets $U(t)$ for terminal node $t$
Adversarial Search (Minimax)
Minimax search:
- A state-space search tree
- Players alternate turns
- Compute each node's minimax value: the best achievable utility against a rational (optimal) adversary
- Terminal values: part of the game
[Figure: MAX root with value 5; two MIN nodes with values 2 and 5; terminal values 8, 2, 5, 6]
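Minimax
Utility of being in state $s$:

$$
\mathrm{MINIMAX}(s) =
\begin{cases}
\mathrm{UTILITY}(s, \mathrm{MAX}) & \text{if } \mathrm{TERMINAL\_TEST}(s) \\
\max_{a \in \mathrm{ACTIONS}(s)} \mathrm{MINIMAX}(\mathrm{RESULT}(s, a)) & \text{if } \mathrm{PLAYER}(s) = \mathrm{MAX} \\
\min_{a \in \mathrm{ACTIONS}(s)} \mathrm{MINIMAX}(\mathrm{RESULT}(s, a)) & \text{if } \mathrm{PLAYER}(s) = \mathrm{MIN}
\end{cases}
$$

$\mathrm{MINIMAX}(s)$ shows the best achievable outcome of being in state $s$ (assumption: optimal opponent).
[Figure: MAX root with value 3 above MIN nodes with values 3, 2, 2]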
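The recursive definition can be sketched on an explicit tree. This is a minimal illustration, assuming the tree is given as nested lists with numeric leaves (utilities from MAX's point of view) and that MAX and MIN strictly alternate by level; the leaf values are chosen for illustration.

```python
def minimax(node, is_max=True):
    """Minimax value of a node.
    Leaves are terminal utilities; inner nodes alternate MAX and MIN."""
    if not isinstance(node, list):          # terminal test
        return node                         # utility for MAX
    values = [minimax(child, not is_max) for child in node]
    return max(values) if is_max else min(values)

# MAX root with three MIN children (illustrative leaf values)
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print([minimax(child, is_max=False) for child in tree])  # [3, 2, 2]
print(minimax(tree))                                     # 3
```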
Minimax (Cont.)
- Optimal strategy: move to the state with the highest minimax value
- Best achievable payoff against best play
- Maximizes the worst-case outcome for MAX
- It works for zero-sum games
Minimax Properties
[Figure: MAX root over MIN nodes, with terminal values 10, 10, 9, 100]
Optimal against a perfect player. Otherwise?
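A small worked example of the "Otherwise?" question. The tree shape is an assumption (a MAX root with two MIN children holding leaves 10, 10 and 9, 100), and the uniformly random opponent is a hypothetical stand-in for an imperfect player.

```python
# Assumed tree: MAX root, left MIN child with leaves [10, 10],
# right MIN child with leaves [9, 100]
tree = [[10, 10], [9, 100]]

# Against an optimal MIN player, each branch is worth its minimum
minimax_values = [min(leaves) for leaves in tree]

# Against a (hypothetical) uniformly random opponent, each branch is
# worth the average of its leaves
average_values = [sum(leaves) / len(leaves) for leaves in tree]

best_vs_optimal = max(range(len(tree)), key=lambda i: minimax_values[i])
best_vs_random = max(range(len(tree)), key=lambda i: average_values[i])

print(minimax_values, best_vs_optimal)   # [10, 9] 0
print(average_values, best_vs_random)    # [10.0, 54.5] 1
```

Under these assumptions minimax picks the left branch (guaranteed 10), while the right branch would do better on average against the random opponent.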
Minimax Implementation

def max-value(state):
    if the state is a terminal state: return the state's utility
    initialize v = -∞
    for each successor of state:
        v = max(v, min-value(successor))
    return v

def min-value(state):
    if the state is a terminal state: return the state's utility
    initialize v = +∞
    for each successor of state:
        v = min(v, max-value(successor))
    return v
Minimax Implementation (Dispatch)

def value(state):
    if the state is a terminal state: return the state's utility
    if the next agent is MAX: return max-value(state)
    if the next agent is MIN: return min-value(state)

def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor))
    return v

def min-value(state):
    initialize v = +∞
    for each successor of state:
        v = min(v, value(successor))
    return v
Minimax algorithm
Depth-first search

function MINIMAX_DECISION(state) returns an action
    return $\arg\max_{a \in \mathrm{ACTIONS}(state)}$ MIN_VALUE(RESULT(state, a))

function MAX_VALUE(state) returns a utility value
    if TERMINAL_TEST(state) then return UTILITY(state)
    v ← −∞
    for each a in ACTIONS(state) do
        v ← MAX(v, MIN_VALUE(RESULT(state, a)))
    return v

function MIN_VALUE(state) returns a utility value
    if TERMINAL_TEST(state) then return UTILITY(state)
    v ← ∞
    for each a in ACTIONS(state) do
        v ← MIN(v, MAX_VALUE(RESULT(state, a)))
    return v
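A runnable sketch of the decision procedure, under an assumed toy game: a pile of stones, each player removes 1 or 2, and whoever takes the last stone wins. The state is just the number of stones; the game and all function names are hypothetical and only mirror the pseudocode's structure.

```python
import math

def actions(stones):
    """Legal moves: take 1 or 2 stones, but no more than remain."""
    return [k for k in (1, 2) if k <= stones]

def terminal_test(stones):
    return stones == 0

def max_value(stones):
    # MAX to move. An empty pile means MIN took the last stone: MAX lost.
    if terminal_test(stones):
        return -1
    v = -math.inf
    for a in actions(stones):
        v = max(v, min_value(stones - a))
    return v

def min_value(stones):
    # MIN to move. An empty pile means MAX took the last stone: MAX won.
    if terminal_test(stones):
        return +1
    v = math.inf
    for a in actions(stones):
        v = min(v, max_value(stones - a))
    return v

def minimax_decision(stones):
    """Action for MAX whose resulting state has the highest minimax value."""
    return max(actions(stones), key=lambda a: min_value(stones - a))

print(minimax_decision(5))  # 2 (leaves 3 stones, a losing pile for MIN)
```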
Properties of minimax
($b$: branching factor, $m$: maximum depth of the tree)
- Complete? Yes (when the tree is finite)
- Optimal? Yes (against an optimal opponent)
- Time complexity: $O(b^m)$
- Space complexity: $O(bm)$ (depth-first exploration)
- For chess, $b \approx 35$, $m > 50$ for reasonable games
- Finding the exact solution is completely infeasible
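To make "completely infeasible" concrete, here is a rough calculation with the chess figures above. The search rate of 10^9 nodes per second is an assumption for illustration only.

```python
import math

b, m = 35, 50                      # branching factor and depth from the slide
nodes = b ** m                     # size of the full tree, order b^m

log_nodes = math.log10(nodes)      # about 77.2, i.e. roughly 10^77 nodes

rate = 1e9                         # ASSUMED: nodes examined per second
seconds_per_year = 365 * 24 * 3600
log_years = log_nodes - math.log10(rate) - math.log10(seconds_per_year)

print(f"nodes ≈ 10^{log_nodes:.1f}")
print(f"years at 1e9 nodes/s ≈ 10^{log_years:.1f}")
```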
Game Tree Pruning
Pruning
- Correct minimax decision without looking at every node in the game tree
- α-β pruning: a branch & bound algorithm
- Prunes away branches that cannot influence the final decision
α-β pruning example
[Figures: step-by-step run of α-β pruning on an example tree, shown over five slides]
α-β pruning
- Assuming depth-first generation of the tree
- We prune node $n$ when the player has a better choice $m$ at the parent or any ancestor of $n$
- Two types of pruning (cuts):
  - pruning of max nodes (α-cuts)
  - pruning of min nodes (β-cuts)
Alpha-Beta Pruning
General configuration (MIN version):
- We're computing the MIN-VALUE at some node n
- We're looping over n's children
- Who cares about n's value? MAX
- Let a (the α value) be the best value that MAX can get at any choice point along the current path from the root
- If n becomes worse than a, MAX will avoid it, so we can stop considering n's other children (it's already bad enough that it won't be played)
MAX version is symmetric.
[Figure: path from the root alternating MAX and MIN levels, with a at a MAX node above the MIN node n]
α-β pruning (another example)
[Figure: game tree annotated with the values 3, ≤ 2, 3, 2, ≥ 5, 5, 1]
Why is it called α-β?
- α: value of the best (highest) choice found so far at any choice point along the path for MAX
- β: value of the best (lowest) choice found so far at any choice point along the path for MIN
- α and β are updated during the search process
- For a MAX node, once the value of this node is known to be at least the current β (v ≥ β), its remaining branches are pruned.
- For a MIN node, once the value of this node is known to be at most the current α (v ≤ α), its remaining branches are pruned.
Alpha-Beta Implementation
α: MAX's best option on path to root
β: MIN's best option on path to root

def max-value(state, α, β):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor, α, β))
        if v ≥ β return v
        α = max(α, v)
    return v

def min-value(state, α, β):
    initialize v = +∞
    for each successor of state:
        v = min(v, value(successor, α, β))
        if v ≤ α return v
        β = min(β, v)
    return v
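A runnable sketch of the pseudocode above, assuming the tree is given as nested lists with numeric leaves and that MAX and MIN alternate by level. The `visited` list is a hypothetical addition used only to show which leaves are actually examined; the leaf values are illustrative.

```python
import math

def alphabeta(node, is_max=True, alpha=-math.inf, beta=math.inf, visited=None):
    """Minimax value of node, skipping branches that cannot change the result.
    alpha: MAX's best option on the path to the root
    beta:  MIN's best option on the path to the root"""
    if not isinstance(node, list):           # terminal state
        if visited is not None:
            visited.append(node)             # record examined leaves
        return node
    if is_max:
        v = -math.inf
        for child in node:
            v = max(v, alphabeta(child, False, alpha, beta, visited))
            if v >= beta:                    # MIN above will never allow this
                return v
            alpha = max(alpha, v)
        return v
    v = math.inf
    for child in node:
        v = min(v, alphabeta(child, True, alpha, beta, visited))
        if v <= alpha:                       # MAX above will never allow this
            return v
        beta = min(beta, v)
    return v

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
seen = []
print(alphabeta(tree, visited=seen))  # 3
print(seen)                           # [3, 12, 8, 2, 14, 5, 2]
```

On this tree the leaves 4 and 6 are never examined: once the second MIN node sees 2, its value is at most 2, which is already worse for MAX than the 3 available from the first branch.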