CS 188: Artificial Intelligence
Spring 2009
Lecture 6: Adversarial Search
2/5/2009
John DeNero – UC Berkeley
Slides adapted from Dan Klein, Stuart Russell or Andrew Moore

Announcements
• Written Assignment 1:
  • Due Tuesday in lecture!
  • No late days for written assignments
  • Printed copies will be here after class
• Countdown to math:
  • Markov decision processes are 3 lectures away
• Project 2:
  • Posted tonight; due Wednesday, 2/18
  • Material from today and next Tuesday
• Midterm on Thursday, 3/19, at 6pm in 10 Evans

Game Playing
• Many different kinds of games!
• Axes:
  • Deterministic or stochastic?
  • One, two or more players?
  • Perfect information (can you see the state)?
• Want algorithms for calculating a strategy (policy) which recommends a move in each state

Example: Peg Game
Jump each tee and remove it:
• Leave only one -- you're genius
• Leave two and you're purty smart
• Leave three and you're just plain dumb
• Leave four or mor'n you're an EG-NO-RA-MOOSE
(Instructions from Cracker Barrel Old Country Store)

Looks like a search problem:
• Has a start state, goal test, successor function
• But the goal cost is not the sum of step costs!
• Are all of our search algorithms useless here?

Deterministic Single-Player
• Deterministic, single player, perfect information games:
  • Start state, successor function, terminal test, utility of terminals
• Max search:
  • Each node stores a value: the best outcome it can reach
  • This is the maximal value of its children (recursive definition)
  • No path sums; utilities at end
• After search, can pick move that leads to the best outcome
[Figure: terminal utilities: Genius = 4, Purty smart = 3, Plain dumb = 2, Ignoramus = 1]

Properties of Max Search
• Terminology: terminal states, node values, policies
• Without bounds, need to search the entire tree to find the max
  • Computes successively tighter lower bounds on node values
• With a known upper bound on utility, can stop when the global max is attained
• Nodes are max nodes because one agent is making decisions
• Caching max values can speed up computation
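The recursive definition of a node's value can be sketched in a few lines of Python. This is a minimal illustration, not the course's code: the tree, the state names, and the functions `max_value` and `best_move` are all hypothetical, and the utilities reuse the 4-to-1 scale from the peg game.

```python
from functools import lru_cache

# Hypothetical toy game tree: non-terminal states map to their successors.
SUCCESSORS = {
    "start": ("a", "b"),
    "a": ("a1", "a2"),
    "b": ("b1", "b2"),
}
# Terminal states map to utilities (4 = genius ... 1 = ignoramus).
UTILITY = {"a1": 2, "a2": 3, "b1": 4, "b2": 1}


@lru_cache(maxsize=None)  # caching max values avoids recomputing repeated states
def max_value(state):
    """Best terminal utility reachable from `state`."""
    if state in UTILITY:  # terminal test
        return UTILITY[state]
    # Recursive definition: a node's value is the max over its children.
    return max(max_value(child) for child in SUCCESSORS[state])


def best_move(state):
    """Successor of `state` that leads to the best outcome."""
    return max(SUCCESSORS[state], key=max_value)


print(max_value("start"))  # 4
print(best_move("start"))  # b
```

Because every node stores its own value, the same table also gives the best continuation after a bad move: `best_move("a")` still picks the best outcome reachable from there.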

Uses of a Max Tree
• Can select a sequence of moves that maximizes utility
• Can recover optimally from bad moves
• Can compute values for certain scenarios easily

Adversarial Search
[DEMO: mystery pacman]

Deterministic Two-Player
• Deterministic, zero-sum games:
  • tic-tac-toe, chess, checkers
  • One player maximizes result
  • The other minimizes result
• Minimax search:
  • A state-space search tree
  • Players alternate
  • Each layer, or ply, consists of a round of moves
  • Choose move to position with highest minimax value: best achievable utility against a rational adversary
[Figure: game tree with a max level above a min level; leaf values 8, 2, 5, 6]

Tic-tac-toe Game Tree
[Figure]

Minimax Example
[Figure]

Minimax Search
[Figure]
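Since the figures for these two slides did not survive, here is a minimal Python sketch of minimax under stated assumptions. The nested-list tree encoding and the names `minimax` and `minimax_decision` are illustrative choices, and the tree shape (a max root over two min nodes) is an assumption; only the leaf values 8, 2, 5, 6 come from the earlier slide.

```python
def minimax(node, maximizing=True):
    """Minimax value of a tree given as nested lists.

    A number is a terminal utility; a list holds a node's children.
    Players alternate, so the flag flips at every ply.
    """
    if isinstance(node, (int, float)):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


def minimax_decision(root):
    """Index of the root move whose resulting position has the highest value."""
    values = [minimax(child, maximizing=False) for child in root]
    return max(range(len(values)), key=values.__getitem__)


tree = [[8, 2], [5, 6]]  # assumed shape: max root, two min nodes
print(minimax(tree))           # 5
print(minimax_decision(tree))  # 1
```

The left move looks attractive because of the 8, but a rational minimizer would answer with the 2, so the right move (worth at least 5) is chosen.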

Minimax Properties
• Optimal against a perfect player. Otherwise?
• Time complexity?
  • O(b^m)
• Space complexity?
  • O(bm)
• For chess, b ≈ 35, m ≈ 100
  • Exact solution is completely infeasible
  • Lots of approximations and pruning
[DEMO: minVsExp]
[Figure: game tree with a max level above a min level; leaf values 10, 10, 9, 100]

Resource Limits
• Cannot search to leaves
• Depth-limited search
  • Instead, search a limited depth of tree
  • Replace terminal utilities with an eval function for non-terminal positions
• Guarantee of optimal play is gone
• More plies makes a BIG difference
  • [DEMO: limitedDepth]
• Example:
  • Suppose we have 100 seconds, can explore 10K nodes / sec
  • So can check 1M nodes per move
  • α-β reaches about depth 8: decent chess program
  • Deep Blue sometimes reached depth 40+
[Figure: depth-limited tree; max root valued 4 over min nodes valued -2 and 4, with leaf values -1, -2, 4, 9 and unexplored subtrees marked "?"]
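Depth-limited search changes minimax in one place: when the depth budget runs out, the search returns an evaluation-function estimate instead of recursing. The sketch below assumes a hypothetical dictionary-based tree; the names `CHILDREN`, `ESTIMATE`, `evaluate`, and `depth_limited_value` are illustrative, and the estimates are made up. The leaf numbers reuse -1, -2, 4, 9 from the Resource Limits slide.

```python
# Hypothetical toy tree: non-terminal states map to their children.
CHILDREN = {"root": ["L", "R"], "L": ["L1", "L2"], "R": ["R1", "R2"]}
# Terminal utilities (numbers borrowed from the slide's figure).
UTILITY = {"L1": -1, "L2": -2, "R1": 4, "R2": 9}
# Made-up heuristic estimates for non-terminal positions.
ESTIMATE = {"root": 1, "L": 0, "R": 3}


def evaluate(state):
    """Stand-in evaluation function for non-terminal positions."""
    return ESTIMATE[state]


def depth_limited_value(state, depth, maximizing=True):
    """Minimax value of `state`, looking at most `depth` plies ahead."""
    if state in UTILITY:   # true terminal: use its utility
        return UTILITY[state]
    if depth == 0:         # budget exhausted: fall back on the estimate
        return evaluate(state)
    values = [depth_limited_value(child, depth - 1, not maximizing)
              for child in CHILDREN[state]]
    return max(values) if maximizing else min(values)


print(depth_limited_value("root", 2))  # 4 (full search)
print(depth_limited_value("root", 1))  # 3 (relies on the estimates)
```

With depth 1 the value depends entirely on the estimates, which is why the guarantee of optimal play is gone and why each extra ply matters.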

Evaluation Functions
• Function which scores non-terminals
• Ideal function: returns the utility of the position
• In practice: typically weighted linear sum of features:
    Eval(s) = w_1 f_1(s) + w_2 f_2(s) + ... + w_n f_n(s)
• e.g. f_1(s) = (num white queens – num black queens), etc.

Evaluation for Pacman
[DEMO: thrashing, smart ghosts]
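A weighted linear evaluation is short to write down. In this sketch the state format (a dictionary of piece counts), the second feature, and the weights 9 and 1 are all assumptions chosen for illustration; only the queen-difference feature comes from the slide.

```python
def queen_difference(state):
    """f_1(s): number of white queens minus number of black queens."""
    return state["white"]["queen"] - state["black"]["queen"]


def pawn_difference(state):
    """A second, hypothetical feature: white pawns minus black pawns."""
    return state["white"]["pawn"] - state["black"]["pawn"]


FEATURES = [queen_difference, pawn_difference]
WEIGHTS = [9.0, 1.0]  # illustrative weights, not tuned values


def linear_eval(state, features=FEATURES, weights=WEIGHTS):
    """Weighted linear sum of feature values for a non-terminal position."""
    return sum(w * f(state) for w, f in zip(weights, features))


position = {
    "white": {"queen": 1, "pawn": 5},
    "black": {"queen": 0, "pawn": 7},
}
print(linear_eval(position))  # 7.0
```

Keeping features as separate functions makes it easy to add or reweight terms without touching the search code that calls the evaluation.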

Why Pacman Starves
• He knows his score will go up by eating the dot now
• He knows his score will go up just as much by eating the dot later on
• There are no point-scoring opportunities after eating the dot
• Therefore, waiting seems just as good as eating

Iterative Deepening
Iterative deepening uses DFS as a subroutine:
1. Do a DFS which only searches for paths of length 1 or less. (DFS gives up on any path of length 2)
2. If “1” failed, do a DFS which only searches paths of length 2 or less.
3. If “2” failed, do a DFS which only searches paths of length 3 or less.
...and so on.
This works for single-agent search as well!
Why do we want to do this for multiplayer games?
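The numbered steps above can be sketched for the single-agent case as follows. The function names, the `max_limit` cap, and the toy successor function (state n has successors 2n and 2n + 1) are assumptions made for the example.

```python
def depth_limited_dfs(state, goal_test, successors, limit):
    """Return a path to a goal that uses at most `limit` steps, else None."""
    if goal_test(state):
        return [state]
    if limit == 0:  # DFS gives up on any longer path
        return None
    for nxt in successors(state):
        path = depth_limited_dfs(nxt, goal_test, successors, limit - 1)
        if path is not None:
            return [state] + path
    return None


def iterative_deepening(start, goal_test, successors, max_limit=50):
    """Rerun depth-limited DFS with limits 1, 2, 3, ... until one succeeds."""
    for limit in range(1, max_limit + 1):
        path = depth_limited_dfs(start, goal_test, successors, limit)
        if path is not None:
            return path
    return None


# Toy infinite binary tree: state n has successors 2n and 2n + 1.
path = iterative_deepening(1, lambda n: n == 6, lambda n: [2 * n, 2 * n + 1])
print(path)  # [1, 3, 6]
```

In game search the same loop is commonly wrapped around a depth-limited minimax and stopped when the move clock runs out, so that a move from the deepest completed iteration is always available; that is one frequently given answer to the slide's closing question.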

α-β Pruning Example
[Figure]

α-β Pruning
• General configuration
  • α is the best value that MAX can get at any choice point along the current path
  • If n becomes worse than α, MAX will avoid it, so can stop considering n's other children
  • Define β similarly for MIN
[Figure: path alternating Player and Opponent levels down to node n]

α-β Pruning Pseudocode
[Figure: pseudocode not recovered]

α-β Pruning Properties
• This pruning has no effect on final result at the root
• Values of intermediate nodes might be wrong
• Good move ordering improves effectiveness of pruning
• With “perfect ordering”:
  • Time complexity drops to O(b^(m/2))
  • Doubles solvable depth
  • Full search of, e.g. chess, is still hopeless!
• This is a simple example of metareasoning
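The slide's pseudocode was lost in extraction, so the following is a minimal Python sketch of α-β pruning rather than a reconstruction of it. The nested-list tree encoding, the function name `alphabeta`, and the optional `visited` list (used only to show which leaves get examined) are assumptions; the example tree is a common textbook one, not taken from these slides.

```python
import math


def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True,
              visited=None):
    """Minimax value of a nested-list tree, skipping provably irrelevant nodes.

    alpha: best value MAX can already guarantee on the current path.
    beta:  best value MIN can already guarantee on the current path.
    """
    if isinstance(node, (int, float)):  # terminal utility
        if visited is not None:
            visited.append(node)
        return node
    if maximizing:
        v = -math.inf
        for child in node:
            v = max(v, alphabeta(child, alpha, beta, False, visited))
            if v >= beta:   # MIN above would never let play reach here
                return v
            alpha = max(alpha, v)
        return v
    v = math.inf
    for child in node:
        v = min(v, alphabeta(child, alpha, beta, True, visited))
        if v <= alpha:      # MAX above already has something at least as good
            return v
        beta = min(beta, v)
    return v


tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
seen = []
print(alphabeta(tree, visited=seen))  # 3
print(seen)                           # [3, 12, 8, 2, 14, 5, 2]
```

The root value matches plain minimax, but the leaves 4 and 6 are never examined: once the middle min node is known to be worth at most 2, it cannot beat the 3 that MAX already has.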

More Metareasoning Ideas
• Forward pruning – prune a node immediately without recursive evaluation
• Singular extensions – explore only one action that is clearly better than others. Can alleviate horizon effects
• Cutoff test – a decision function about when to apply evaluation
• Quiescence search – expand the tree until positions are reached that are quiescent (i.e., not volatile)

Game Playing State-of-the-Art
• Checkers: Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994. Used an endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 443,748,401,247 positions. Checkers is now solved!
• Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997. Deep Blue examined 200 million positions per second, used very sophisticated evaluation and undisclosed methods for extending some lines of search up to 40 ply.
• Othello: human champions refuse to compete against computers, which are too good.
• Go: human champions refuse to compete against computers, which are too bad. In Go, b > 300, so most programs use pattern knowledge bases to suggest plausible moves.
• Pacman: unknown

GamesCrafters
http://gamescrafters.berkeley.edu/
