Adversarial Search
CE417: Introduction to Artificial Intelligence
Sharif University of Technology, Spring 2017
Soleymani
“Artificial Intelligence: A Modern Approach”, 3rd Edition, Chapter 5
Outline
- Game as a search problem
- Minimax algorithm
- α-β pruning: ignoring a portion of the search tree
- Time limit problem
- Cutoff and evaluation function
Games as search problems
- Games are adversarial search problems: the agents' goals are in conflict
- Competitive multi-agent environments
- Games in AI are a specialized kind of the games studied in game theory
Primary assumptions
Common games in AI are:
- Two-player
- Turn-taking: agents act alternately
- Zero-sum: agents' goals are in conflict; the sum of utility values at the end of the game is zero or constant
- Deterministic
- Perfect information: fully observable
Game as a kind of search problem
- Initial state $S_0$, set of states (each state also contains the turn), $\text{ACTIONS}(s)$, and $\text{RESULT}(s, a)$, as in standard search
- $\text{PLAYER}(s)$: defines which player moves in state $s$
- $\text{TERMINAL\_TEST}(s)$: true when the game has ended
- $\text{UTILITY}(s, p)$: utility or payoff function $U: S \times P \to \mathbb{R}$ (how good the terminal state $s$ is for player $p$)
- Zero-sum (constant-sum) game: the total payoff to all players is zero (or constant) for every terminal state
- We have utilities at the end of the game instead of a sum of action costs
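To make the formulation concrete, here is a minimal sketch of these components for a toy game. The game itself is an assumption chosen for illustration (it is not from the slides): players alternately remove 1 or 2 sticks from a pile, and whoever takes the last stick wins. All function names simply mirror the components listed above.

```python
from typing import List, Tuple

# A state holds the number of sticks left and the player to move,
# so the turn is part of the state.
State = Tuple[int, str]

S0: State = (4, "MAX")  # initial state of the hypothetical stick game


def player(s: State) -> str:
    """PLAYER(s): which player moves in state s."""
    return s[1]


def actions(s: State) -> List[int]:
    """ACTIONS(s): legal moves (take 1 or 2 sticks, if available)."""
    return [k for k in (1, 2) if k <= s[0]]


def result(s: State, a: int) -> State:
    """RESULT(s, a): remove a sticks and pass the turn."""
    sticks, p = s
    return (sticks - a, "MIN" if p == "MAX" else "MAX")


def terminal_test(s: State) -> bool:
    """TERMINAL_TEST(s): the game ends when no sticks remain."""
    return s[0] == 0


def utility(s: State, p: str) -> int:
    """UTILITY(s, p): +1 if p took the last stick, -1 otherwise.

    At a terminal state the player to move did NOT take the last
    stick, so that player is the loser.
    """
    return -1 if p == player(s) else +1


# Zero-sum check on one terminal state: payoffs add up to zero.
t = result(result(S0, 2), 2)
print(t, utility(t, "MAX") + utility(t, "MIN"))
```

The payoffs of the two players sum to zero at every terminal state, which is the zero-sum property stated above.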
Game tree (tic-tac-toe)
- Two players: $P_1$ and $P_2$ ($P_1$ is now searching to find a good move)
- $P_1$ plays X and $P_2$ plays O; the levels of the tree alternate between $P_1$ (MAX) and $P_2$
- Zero-sum games: $P_1$ gets $U(t)$ and $P_2$ gets $C - U(t)$ for terminal node $t$
- 1 ply = a half move
- Utilities are given from the point of view of $P_1$ (MAX)
Optimal play
- The opponent is assumed to play optimally
- The minimax function is used to find the utility of each state
- MAX wants to maximize the terminal payoff and MIN wants to minimize it
- MAX gets $U(t)$ for terminal node $t$
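Minimax
$$
\text{MINIMAX}(s) =
\begin{cases}
\text{UTILITY}(s, \text{MAX}) & \text{if } \text{TERMINAL\_TEST}(s) \\
\max_{a \in \text{ACTIONS}(s)} \text{MINIMAX}(\text{RESULT}(s, a)) & \text{if } \text{PLAYER}(s) = \text{MAX} \\
\min_{a \in \text{ACTIONS}(s)} \text{MINIMAX}(\text{RESULT}(s, a)) & \text{if } \text{PLAYER}(s) = \text{MIN}
\end{cases}
$$
- $\text{MINIMAX}(s)$ is the utility of being in state $s$: the best achievable outcome from $s$, assuming an optimal opponent
- Figure: example game tree with minimax values 3 at the root and 3, 2, 2 at the nodes below it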
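The recursive definition can be sketched directly on a small explicit tree. The tree encoding (nested lists, with numbers as terminal utilities for MAX) and the leaf values are assumptions made for this illustration.

```python
def minimax(node, is_max):
    """Minimax value of a node in a nested-list game tree.

    A number is a terminal node (its utility for MAX); a list is an
    interior node whose children are moved to by the current player.
    """
    if not isinstance(node, list):  # terminal test
        return node                 # utility for MAX
    values = [minimax(child, not is_max) for child in node]
    return max(values) if is_max else min(values)


# Hypothetical two-ply tree: MAX moves at the root, MIN one level down.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]

min_values = [minimax(child, False) for child in tree]
print(min_values)           # values of the three MIN nodes
print(minimax(tree, True))  # value of the root MAX node
```

Each MIN node takes the smallest of its leaves (3, 2, 2), and the root takes the largest of those values (3).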
Minimax (Cont.)
- Optimal strategy: move to the state with the highest minimax value
- Best achievable payoff against best play
- Maximizes the worst-case outcome for MAX
- It works for zero-sum games
Minimax algorithm
Depth-first search

function MINIMAX_DECISION(state) returns an action
    return argmax_{a ∈ ACTIONS(state)} MIN_VALUE(RESULT(state, a))

function MAX_VALUE(state) returns a utility value
    if TERMINAL_TEST(state) then return UTILITY(state)
    v ← −∞
    for each a in ACTIONS(state) do
        v ← MAX(v, MIN_VALUE(RESULT(state, a)))
    return v

function MIN_VALUE(state) returns a utility value
    if TERMINAL_TEST(state) then return UTILITY(state)
    v ← +∞
    for each a in ACTIONS(state) do
        v ← MIN(v, MAX_VALUE(RESULT(state, a)))
    return v
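A possible Python rendering of this pseudocode is sketched below. The `TreeGame` class, its dictionary encoding of the tree, and the action names are hypothetical; they only exist to give the three functions something to run on.

```python
import math


class TreeGame:
    """Hypothetical explicit game tree.

    Interior nodes are dicts mapping an action name to a child node;
    leaves are numbers giving the utility for MAX.
    """

    def actions(self, state):
        return list(state.keys())

    def result(self, state, action):
        return state[action]

    def terminal_test(self, state):
        return not isinstance(state, dict)

    def utility(self, state):
        return state


def minimax_decision(state, game):
    """Return the action whose resulting state has the highest MIN value."""
    return max(game.actions(state),
               key=lambda a: min_value(game.result(state, a), game))


def max_value(state, game):
    if game.terminal_test(state):
        return game.utility(state)
    v = -math.inf
    for a in game.actions(state):
        v = max(v, min_value(game.result(state, a), game))
    return v


def min_value(state, game):
    if game.terminal_test(state):
        return game.utility(state)
    v = math.inf
    for a in game.actions(state):
        v = min(v, max_value(game.result(state, a), game))
    return v


# Assumed example tree: MAX chooses a1..a3, then MIN replies.
tree = {
    "a1": {"b1": 3, "b2": 12, "b3": 8},
    "a2": {"c1": 2, "c2": 4, "c3": 6},
    "a3": {"d1": 14, "d2": 5, "d3": 2},
}
game = TreeGame()
best = minimax_decision(tree, game)
print(best, max_value(tree, game))
```

On this tree the decision is `a1`, because MIN can hold MAX to 3 there but to only 2 after `a2` or `a3`.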
Properties of minimax
- Complete? Yes (when the tree is finite)
- Optimal? Yes (against an optimal opponent)
- Time complexity: $O(b^m)$ ($b$: branching factor, $m$: maximum depth of the tree)
- Space complexity: $O(bm)$ (depth-first exploration)
- For chess, $b \approx 35$ and $m > 50$ for reasonable games
- Finding the exact solution is completely infeasible
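A quick back-of-the-envelope calculation shows why the exact solution is out of reach. The sketch assumes $m = 50$, which is only the lower bound quoted above, so the real tree is larger still.

```python
import math

b = 35  # approximate branching factor of chess (from the slide)
m = 50  # assumed depth: the slide only gives the lower bound m > 50

# log10(b^m) = m * log10(b): the order of magnitude of the tree size
exponent = m * math.log10(b)
print(f"b^m is roughly 10^{exponent:.0f} nodes")
```

Even at this lower bound the tree has on the order of $10^{77}$ nodes.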
Pruning
- Finds the correct minimax decision without looking at every node in the game tree
- α-β pruning: a branch-and-bound algorithm
- Prunes away branches that cannot influence the final decision
α-β pruning example (sequence of figures)
α-β progress (figure)
α-β pruning
- Assumes depth-first generation of the tree
- We prune node $n$ when the player has a better choice $m$ at the parent or any ancestor of $n$
- Two types of pruning (cuts):
  - pruning of max nodes (α-cuts)
  - pruning of min nodes (β-cuts)
Why is it called α-β?
- α: value of the best (highest) choice found so far at any choice point along the path for MAX
- β: value of the best (lowest) choice found so far at any choice point along the path for MIN
- α and β are updated during the search
- For a MAX node, once its value is known to be at least the current β ($v \ge \beta$), its remaining branches are pruned
- For a MIN node, once its value is known to be at most the current α ($v \le \alpha$), its remaining branches are pruned
α-β pruning (another example; figure)
function ALPHA_BETA_SEARCH(state) returns an action
    v ← MAX_VALUE(state, −∞, +∞)
    return the action in ACTIONS(state) with value v

function MAX_VALUE(state, α, β) returns a utility value
    if TERMINAL_TEST(state) then return UTILITY(state)
    v ← −∞
    for each a in ACTIONS(state) do
        v ← MAX(v, MIN_VALUE(RESULT(state, a), α, β))
        if v ≥ β then return v
        α ← MAX(α, v)
    return v

function MIN_VALUE(state, α, β) returns a utility value
    if TERMINAL_TEST(state) then return UTILITY(state)
    v ← +∞
    for each a in ACTIONS(state) do
        v ← MIN(v, MAX_VALUE(RESULT(state, a), α, β))
        if v ≤ α then return v
        β ← MIN(β, v)
    return v
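The pseudocode above can be sketched in Python as follows. For brevity this version returns the root value together with the list of leaves it actually evaluated, rather than an action; the nested-list tree encoding and the leaf values are assumptions made for the illustration.

```python
import math


def alpha_beta_search(tree):
    """Return (root value, leaves evaluated) for a nested-list game tree."""
    visited = []
    value = max_value(tree, -math.inf, math.inf, visited)
    return value, visited


def max_value(node, alpha, beta, visited):
    if not isinstance(node, list):  # terminal node: utility for MAX
        visited.append(node)
        return node
    v = -math.inf
    for child in node:
        v = max(v, min_value(child, alpha, beta, visited))
        if v >= beta:               # MIN above will never allow this
            return v
        alpha = max(alpha, v)
    return v


def min_value(node, alpha, beta, visited):
    if not isinstance(node, list):
        visited.append(node)
        return node
    v = math.inf
    for child in node:
        v = min(v, max_value(child, alpha, beta, visited))
        if v <= alpha:              # MAX above already has a better choice
            return v
        beta = min(beta, v)
    return v


# Hypothetical two-ply tree: MAX at the root, MIN one level down.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
value, visited = alpha_beta_search(tree)
print(value, visited)
```

On this tree the second MIN node is cut off after its first leaf (2 ≤ α = 3), so the leaves 4 and 6 are never evaluated, yet the root value is the same as plain minimax would return.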
Order of moves
- Good move ordering improves the effectiveness of pruning
- Best order: time complexity is $O(b^{m/2})$
- Random order: time complexity is about $O(b^{3m/4})$ for moderate $b$
- α-β pruning improves the search time only partly
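The effect of move ordering can be observed by counting evaluated leaves on the same tree under two orderings. This is a small sketch, not a benchmark: the tree is an assumed example, and the `reorder` helper cheats by sorting children with their true minimax values, which a real program would not know in advance.

```python
import math


def minimax(node, is_max):
    """Plain minimax value, used here only to build the orderings."""
    if not isinstance(node, list):
        return node
    values = [minimax(c, not is_max) for c in node]
    return max(values) if is_max else min(values)


def reorder(node, is_max, best_first):
    """Sort children at every level by true value, best or worst first."""
    if not isinstance(node, list):
        return node
    children = [reorder(c, not is_max, best_first) for c in node]
    # Best for MAX is the highest value, best for MIN is the lowest.
    descending = (is_max == best_first)
    return sorted(children, key=lambda c: minimax(c, not is_max),
                  reverse=descending)


def alphabeta(node, is_max, alpha, beta, counter):
    """Alpha-beta value; counter[0] counts the leaves evaluated."""
    if not isinstance(node, list):
        counter[0] += 1
        return node
    v = -math.inf if is_max else math.inf
    for child in node:
        child_value = alphabeta(child, not is_max, alpha, beta, counter)
        if is_max:
            v = max(v, child_value)
            if v >= beta:
                return v
            alpha = max(alpha, v)
        else:
            v = min(v, child_value)
            if v <= alpha:
                return v
            beta = min(beta, v)
    return v


def leaves_evaluated(tree):
    counter = [0]
    value = alphabeta(tree, True, -math.inf, math.inf, counter)
    return value, counter[0]


tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]  # assumed example tree
best = leaves_evaluated(reorder(tree, True, best_first=True))
worst = leaves_evaluated(reorder(tree, True, best_first=False))
print("best-first :", best)
print("worst-first:", worst)
```

Both orderings return the same root value, but best-first evaluates 5 of the 9 leaves while worst-first evaluates all 9.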
Computational time limit (example)
- 100 secs are allowed for each move (game rule)
- $10^4$ nodes/sec (processor speed)
- We can explore just $10^6$ nodes for each move
- $b^m = 10^6$, $b = 35$ ⟹ $m \approx 4$
- (4-ply look-ahead is a hopeless chess player!)
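The arithmetic behind this example can be checked in a few lines. The variable names are illustrative; the numbers are the ones given on the slide.

```python
import math

seconds_per_move = 100      # game rule
nodes_per_second = 10 ** 4  # processor speed
b = 35                      # branching factor of chess

budget = seconds_per_move * nodes_per_second  # nodes explorable per move
m = math.log(budget) / math.log(b)            # solve b^m = budget for m

print(budget)       # node budget per move
print(round(m, 2))  # reachable depth in plies
```

The depth comes out slightly below 4, since $35^4 \approx 1.5 \times 10^6$ already exceeds the budget, which is why the look-ahead is only about 4 plies.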
Computational time limit: Solution
- We must make a decision even when finding the optimal move is infeasible
- Cut off the search and apply a heuristic evaluation function
- Cutoff test: turns non-terminal nodes into terminal leaves
  - A cutoff test is used instead of the terminal test (e.g., a depth limit)
- Evaluation function: estimated desirability of a state
  - A heuristic evaluation function is used instead of the utility function
- This approach does not guarantee optimality
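Heuristic minimax
$$
\text{H\_MINIMAX}(s, d) =
\begin{cases}
\text{EVAL}(s, \text{MAX}) & \text{if } \text{CUTOFF\_TEST}(s, d) \\
\max_{a \in \text{ACTIONS}(s)} \text{H\_MINIMAX}(\text{RESULT}(s, a), d + 1) & \text{if } \text{PLAYER}(s) = \text{MAX} \\
\min_{a \in \text{ACTIONS}(s)} \text{H\_MINIMAX}(\text{RESULT}(s, a), d + 1) & \text{if } \text{PLAYER}(s) = \text{MIN}
\end{cases}
$$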
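A depth-limited version of minimax along these lines might look like the sketch below. The cutoff rule (a fixed depth limit) and the evaluation function (the average of the leaves under a node) are toy assumptions chosen only so that the example runs; a real evaluation function would not look at the leaves.

```python
def leaves(node):
    """All terminal utilities below a node of a nested-list tree."""
    if not isinstance(node, list):
        return [node]
    return [x for child in node for x in leaves(child)]


def evaluate(node):
    """Hypothetical EVAL: exact on terminals, leaf average otherwise."""
    values = leaves(node)
    return sum(values) / len(values)


def cutoff_test(node, depth, limit):
    """Stop at terminal nodes or when the depth limit is reached."""
    return not isinstance(node, list) or depth >= limit


def h_minimax(node, depth, is_max, limit):
    if cutoff_test(node, depth, limit):
        return evaluate(node)
    values = [h_minimax(c, depth + 1, not is_max, limit) for c in node]
    return max(values) if is_max else min(values)


tree = [[3, 12, 8], [2, 4, 6], [14, 15, 2]]  # assumed example tree

full = h_minimax(tree, 0, True, limit=2)     # reaches every terminal
shallow = h_minimax(tree, 0, True, limit=1)  # cut off at the MIN nodes
print(full, shallow)
```

With `limit=2` the search reaches the terminals and returns the true minimax value 3. With `limit=1` the third subtree looks best to the heuristic (average about 10.3) even though its true minimax value is only 2, which illustrates why a cutoff search gives no optimality guarantee.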
Evaluation functions
- For terminal states, it should order them in the same way as the true utility function
- For non-terminal states, it should be strongly correlated with the actual chances of winning
- It must be cheap to compute