Wentworth Institute of Technology COMP3770 – Artificial Intelligence | Summer 2017 | Derbinsky
Adversarial Search (Lecture 7), June 10, 2017

How can we use search to plan ahead when other agents are planning against us?

Agenda
• Games: context, history
• Searching via Minimax
• Scaling
  – α-β pruning
  – Depth-limiting
  – Evaluation functions
• Handling uncertainty with Expectiminimax

Characterizing Games
• There are many kinds of games, and several ways to classify them
  – Deterministic vs. stochastic
  – Perfect vs. imperfect information
  – One, two, or multiple players
  – Utility (how agents value outcomes)
    • Zero-sum
• Algorithmic goal: calculate a strategy (or policy) that decides a move in each state

Utility

Zero/Constant-Sum Games
• Opposite utilities
• Adversarial, pure competition

General Games
• Independent utilities
• Cooperation, indifference, competition, and more are all possible

Examples: Perception vs. Chance

                          Deterministic                   Stochastic
Perfect information       Chess, Checkers, Go, Othello    Backgammon, Monopoly
Imperfect information     Battleship                      Bridge, Poker, Scrabble

Checkers
• 1950: First computer player
• 1994: First computer champion (Chinook) ended the 40-year reign of human champion Marion Tinsley, using a complete 8-piece endgame database
• 1995: Chinook defended its title against Don Lafferty
• 2007: Solved!

Chess
• 1997: Deep Blue defeats human champion Garry Kasparov in a six-game match
• Deep Blue examined 200M positions per second, used a very sophisticated evaluation function, and used undisclosed methods for extending some lines of search up to 40 ply
• Current programs are even better, if less historic

[Figure: Deep Blue]

Go
• Until recently, AI was not competitive at champion level
  – 2015: AlphaGo beat Fan Hui, European champion (2-dan; 5-0)
  – 2016: beat Lee Sedol, one of the best players in the world (9-dan; 4-1)
  – 2017: beat Ke Jie, #1 in the world (9-dan; 3-0)
• Monte Carlo tree search (MCTS) + artificial neural networks (ANNs) for policy (what to do) and evaluation (how good a board state is)

[Figure: AlphaGo]

Poker
• Libratus beat four top-class human poker players in January 2017
  – 120,000 hands played
• Novel methods for endgame solving in imperfect-information games
• 15 million core hours of computation (+4 during competition)

[Figure: Libratus]

More Progress
• Othello: 1997, defeated world champion
• Bridge: 1998, competitive with human champions
• Scrabble: 2006, defeated world champion

Game Formalism
• States: $S$ (start at $S_0$)
• Players: $P = \{1, \dots, N\}$ (typically take turns)
• Actions: $\mathrm{Action}(s)$, returns legal options
• Transition function: $S \times A \to S$
• Terminal test: $\mathrm{Terminal}(s)$, returns T/F
• Utility: $S \times P \to \mathbb{R}$
• Solution for a player is a policy: $S \to A$
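
The formalism above can be sketched in code. The game below is a hypothetical illustration and not from the lecture: two players alternately remove 1 or 2 stones from a pile, and whoever takes the last stone wins. The names `State`, `TakeAwayGame`, `actions`, `result`, `terminal`, and `utility` are assumptions chosen to mirror the bullets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    stones: int    # stones left in the pile
    to_move: int   # player whose turn it is (1 or 2)


class TakeAwayGame:
    """Hypothetical game: remove 1 or 2 stones; taking the last stone wins."""

    players = (1, 2)                                   # P = {1, 2}

    def __init__(self, stones=4):
        self.start = State(stones, to_move=1)          # start state

    def actions(self, s):                              # Action(s): legal options
        return [n for n in (1, 2) if n <= s.stones]

    def result(self, s, a):                            # transition: S x A -> S
        return State(s.stones - a, to_move=3 - s.to_move)

    def terminal(self, s):                             # Terminal(s): True/False
        return s.stones == 0

    def utility(self, s, p):                           # utility: S x P -> R
        # In a terminal state, the player who just moved took the last stone.
        winner = 3 - s.to_move
        return 1.0 if p == winner else -1.0


game = TakeAwayGame(stones=4)
s1 = game.result(game.start, 2)   # player 1 takes 2 stones
s2 = game.result(s1, 2)           # player 2 takes the last 2 stones
```

A policy in this sketch would be any function that maps a `State` to one of `game.actions(state)`.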

Game Plan :)
• Start with deterministic, two-player adversarial games
• Issues to come
  – Multiple players
  – Resource limits
  – Stochasticity

Single-Agent Game Tree

[Figure: single-agent game tree with node values 8, 2, 0, …, 2, 6, …, 4, 6]

Value of a State
• Value of a state: the best achievable outcome (utility) from that state
• Non-terminal states: $V(s) = \max_{s' \in \mathrm{successors}(s)} V(s')$
• Terminal states: $V(s)$ is known

[Figure: single-agent game tree with node values 8, 2, 0, …, 2, 6, …, 4, 6]
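
A minimal sketch of this single-agent value computation, assuming a tree encoded as nested Python lists whose leaves are numeric utilities. The encoding and the example tree are hypothetical, not the tree from the figure.

```python
def value(node):
    """Best achievable utility from `node` when one agent makes every move."""
    if isinstance(node, (int, float)):   # terminal state: utility is known
        return node
    # Non-terminal state: the agent picks the best successor.
    return max(value(child) for child in node)


# Hypothetical tree: the root has two successors, each with further choices.
tree = [[2, 0], [[2, 6], [4, 6]]]
best = value(tree)
```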

Adversarial Game Trees

[Figure: adversarial game tree with node values -20, -8, …, -18, -5, …, -10, +4, -20, +8]

Minimax Values
• States under agent's control: $V(s) = \max_{s' \in \mathrm{successors}(s)} V(s')$
• States under opponent's control: $V(s') = \min_{s \in \mathrm{successors}(s')} V(s)$
• Terminal states: $V(s)$ is known

[Figure: adversarial game tree with node values -8, -5, -10, +8]

Tic-Tac-Toe Game Tree

[Figure: tic-tac-toe game tree]

Adversarial Search via Minimax
• Deterministic, zero-sum games
  – Tic-tac-toe, chess
  – One player maximizes
  – The other minimizes
• Minimax search
  – A search tree
  – Players alternate turns
  – Compute each node's minimax value: the best achievable utility against a rational (optimal) adversary

[Figure: two-ply minimax tree. Minimax values are computed recursively: the max node has value 5 and the min nodes have values 5 and 2. Terminal values are part of the game: 8, 2, 5, 6.]

Minimax Implementation

    def value(state):
        if the state is a terminal state: return the state's utility
        if the next agent is MAX: return max-value(state)
        if the next agent is MIN: return min-value(state)

    def max-value(state):
        initialize v = -∞
        for each successor of state:
            v = max(v, value(successor))
        return v

    def min-value(state):
        initialize v = +∞
        for each successor of state:
            v = min(v, value(successor))
        return v
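
The pseudocode above can be turned into a small runnable sketch. It assumes a tree encoded as nested Python lists with numeric leaves and strictly alternating MAX and MIN plies; the `is_max` flag is an added parameter standing in for "the next agent". The leaf values are the ones on the Minimax Example slide.

```python
import math


def value(node, is_max):
    """Minimax value of `node`; `is_max` says whose turn it is at this node."""
    if isinstance(node, (int, float)):    # terminal state: return its utility
        return node
    return max_value(node) if is_max else min_value(node)


def max_value(successors):
    v = -math.inf
    for successor in successors:
        v = max(v, value(successor, is_max=False))   # MIN moves next
    return v


def min_value(successors):
    v = math.inf
    for successor in successors:
        v = min(v, value(successor, is_max=True))    # MAX moves next
    return v


# MAX at the root, three MIN nodes below it, nine terminal utilities.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
root_value = value(tree, is_max=True)   # MIN nodes evaluate to 3, 2, 2
```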

Minimax Evaluation
• Time: $O(b^m)$, where $b$ is the branching factor and $m$ is the maximum depth
  – For chess: $b \approx 35$, $m \approx 100$
• Space: $O(bm)$
• Complete: only if the tree is finite
• Optimal: yes, against an optimal opponent

[Figures: Minimax-Min, Minimax-Avg]
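
To get a feel for these bounds, the chess figures can be plugged in directly. This is a rough order-of-magnitude calculation under the slide's approximate values, not a measured node count.

```latex
b^m \approx 35^{100} = 10^{100 \log_{10} 35} \approx 10^{100 \times 1.544} \approx 10^{154},
\qquad
b \cdot m \approx 35 \times 100 = 3500 .
```

Time is the binding constraint: the space needed by depth-first minimax stays small, while exhaustive search of the full tree is far out of reach.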

Multiple Players
Add a ply per player
• Independent utility: use a vector of values; each player maximizes its own utility
• Zero-sum: each team takes its MIN or MAX turn in sequence
• In Pacman: one MIN layer per ghost for each Pacman move
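
A minimal sketch of the independent-utility case, assuming three players who move in a fixed rotation, internal nodes encoded as lists, and terminal states encoded as tuples holding one utility per player. The function name `maxn_value` and the tree are hypothetical.

```python
def maxn_value(node, player, num_players):
    """Utility vector reached when every player maximizes its own entry."""
    if isinstance(node, tuple):          # terminal: one utility per player
        return node
    next_player = (player + 1) % num_players
    child_values = [maxn_value(child, next_player, num_players) for child in node]
    # The player to move keeps the vector that is best for itself.
    return max(child_values, key=lambda utilities: utilities[player])


# Hypothetical three-ply tree: player 0 moves, then player 1, then player 2.
tree = [
    [[(1, 2, 6), (4, 3, 2)], [(6, 1, 2), (7, 4, 1)]],
    [[(5, 1, 1), (3, 5, 2)], [(7, 7, 1), (5, 4, 5)]],
]
root_vector = maxn_value(tree, player=0, num_players=3)
```

Ties are broken here by taking the first best child, which is an arbitrary choice of this sketch.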

Scaling to Larger Games
• Tree pruning
• Depth-limiting + evaluation

Minimax Example

[Figure: two-ply minimax tree. MAX root value 3; MIN node values 3, 2, 2; leaves (3, 12, 8), (2, 4, 6), (14, 5, 2).]

Minimax Pruning

[Figure: the minimax example tree searched with pruning. Root bounds narrow from [−∞, ∞] to [3, ∞] to [3, 3]; MIN-node bounds shown include [−∞, ∞], [−∞, 3], [3, 3], [−∞, 2], [−∞, 14], [−∞, 5], and [2, 2]. Leaves shown: 3, 12, 8, 2, 14, 5, 2.]

General Case
• α is the best value (to MAX) found so far off the current path
• If V is worse than α, MAX will avoid it, so prune that branch
• Define β similarly for MIN

Alpha-Beta Pruning
α: MAX's best option on path
β: MIN's best option on path

    def max-value(state, α, β):
        initialize v = -∞
        for each successor of state:
            v = max(v, value(successor, α, β))
            if v ≥ β return v
            α = max(α, v)
        return v

    def min-value(state, α, β):
        initialize v = +∞
        for each successor of state:
            v = min(v, value(successor, α, β))
            if v ≤ α return v
            β = min(β, v)
        return v
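
A runnable sketch of the same procedure, assuming a nested-list tree with numeric leaves and alternating MAX and MIN plies. The single function `alphabeta` folds both pseudocode routines together, and the `visited` list is an addition for illustration that records which leaves are actually examined. The leaf values are the ones on the Minimax Example slide.

```python
import math


def alphabeta(node, is_max, alpha=-math.inf, beta=math.inf, visited=None):
    """Minimax value of `node`, skipping branches that cannot change the result."""
    if isinstance(node, (int, float)):       # terminal state
        if visited is not None:
            visited.append(node)             # record the leaf as examined
        return node
    if is_max:
        v = -math.inf
        for child in node:
            v = max(v, alphabeta(child, False, alpha, beta, visited))
            if v >= beta:                    # MIN above will never allow this
                return v
            alpha = max(alpha, v)
        return v
    v = math.inf
    for child in node:
        v = min(v, alphabeta(child, True, alpha, beta, visited))
        if v <= alpha:                       # MAX above already has better
            return v
        beta = min(beta, v)
    return v


tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
visited = []
root_value = alphabeta(tree, is_max=True, visited=visited)
```

With left-to-right ordering, the second MIN node is cut off after its first leaf, so leaves 4 and 6 are never examined, while the root value is the same as plain minimax.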