spring 2009
play

Spring 2009 Written Assignment 1: Due Tuesday in lecture! No late - PDF document

Announcements CS 188: Artificial Intelligence Spring 2009 Written Assignment 1: Due Tuesday in lecture! No late days for written assignments Printed copies will be here after class Lecture 6: Adversarial Search Countdown to


  1. Announcements CS 188: Artificial Intelligence Spring 2009  Written Assignment 1:  Due Tuesday in lecture!  No late days for written assignments  Printed copies will be here after class Lecture 6: Adversarial Search  Countdown to math:  Markov decision processes are 3 lectures away 2/5/2009  Project 2:  Posted tonight; due Wednesday, 2/18  Material from today and next Tuesday John DeNero – UC Berkeley  Midterm on Thursday, 3/19, at 6pm in 10 Evans Slides adapted from Dan Klein, Stuart Russell or Andrew Moore Game Playing Example: Peg Game  Many different kinds of games! Jump each tee and remove it:  Leave only one -- you're genius  Leave two and you're purty smart  Axes:  Leave three and you're just plain dumb  Deterministic or stochastic?  Leave four or mor'n you're an EG-NO-RA-MOOSE  One, two or more players?  Perfect information (can you see the state)? Looks like a search problem:  Has a start state, goal test, successor function  Want algorithms for calculating a strategy  But the goal cost is not the sum of step costs! (policy) which recommends a move in each state  Are all of our search algorithms useless here? 3 Instructions from Cracker Barrel Old Country Store Deterministic Single-Player Properties of Max Search  Deterministic, single player,  Terminology: terminal states, perfect information games: node values, policies  Start state, successor function,  Without bounds, need to search terminal test, utility of terminals the entire tree to find the max  Computes successively tighter  Max search: lower bounds on node values  Each node stores a value: the best outcome it can reach  With a known upper bound on  This is the maximal value of its utility, can stop when the global children (recursive definition) max is attained  No path sums; utilities at end  Nodes are max nodes because one agent is making decisions  After search, can pick  Caching max values can move that leads to the speed up computation best outcome Genius Purtysmart Plain dumb Ignoramus Genius Purtysmart Plain dumb Ignoramus 4 3 2 1 4 3 2 1 1

  2. Uses of a Max Tree Adversarial Search  Can select a sequence of moves that maximizes utility  Can recover optimally from bad moves  Can compute values for certain scenarios easily [DEMO: mystery pacman] Genius Purtysmart Plain dumb Ignoramus 8 4 3 2 1 Deterministic Two-Player Tic-tac-toe Game Tree  Deterministic, zero-sum games:  tic-tac-toe, chess, checkers max  One player maximizes result  The other minimizes result  Minimax search: min  A state-space search tree  Players alternate  Each layer, or ply, consists of a 8 2 5 6 round of moves  Choose move to position with highest minimax value: best achievable utility against a rational adversary 9 10 Minimax Example Minimax Search 11 12 2

  3. Minimax Properties Resource Limits  Cannot search to leaves  Optimal against a perfect player. Otherwise? max 4  Depth-limited search -2 4 min min max  Instead, search a limited depth of tree  Time complexity?  Replace terminal utilities with an eval  O(b m ) -1 -2 4 9 function for non-terminal positions min  Guarantee of optimal play is gone  Space complexity?  O(bm)  More plies makes a BIG difference  [DEMO: limitedDepth] 10 10 9 100  For chess, b 35, m 100  Example:  Exact solution is completely infeasible  Suppose we have 100 seconds, can [DEMO: explore 10K nodes / sec  Lots of approximations and pruning minVsExp]  So can check 1M nodes per move  - reaches about depth 8 – decent ? ? ? ? chess program 13 14  Deep Blue sometimes reached depth 40+ Evaluation Functions Evaluation for Pacman  Function which scores non-terminals [DEMO: thrashing,  Ideal function: returns the utility of the position smart ghosts]  In practice: typically weighted linear sum of features: e.g. f 1 ( s ) = (num white queens – num black queens), etc.  15 16 Iterative Deepening Why Pacman Starves Iterative deepening uses DFS as a subroutine: b  He knows his score will go … 1. Do a DFS which only searches for paths of up by eating the dot now length 1 or less. (DFS gives up on any path of  He knows his score will go length 2) 2. If “ 1 ” failed, do a DFS which only searches paths up just as much by eating of length 2 or less. the dot later on 3. If “ 2 ” failed, do a DFS which only searches paths  There are no point-scoring of length 3 or less. opportunities after eating ….and so on. the dot This works for single-agent search as well!  Therefore, waiting seems just as good as eating Why do we want to do this for multiplayer games? 19 3

  4. - Pruning Example - Pruning  General configuration  is the best value that Player MAX can get at any choice point along the Opponent current path  If n becomes worse than , MAX will avoid it, so Player can stop considering n ’s other children Opponent n  Define similarly for MIN 21 22 - Pruning Pseudocode - Pruning Properties  This pruning has no effect on final result at the root  Values of intermediate nodes might be wrong  Good move ordering improves effectiveness of pruning  With “perfect ordering”:  Time complexity drops to O(b m/2 )  Doubles solvable depth  Full search of, e.g. chess, is still hopeless!  This is a simple example of metareasoning v 23 24 More Metareasoning Ideas Game Playing State-of-the-Art  Checkers: Chinook ended 40-year-reign of human world champion  Forward pruning – prune a Marion Tinsley in 1994. Used an endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total node immediately without of 443,748,401,247 positions. Checkers is now solved! recursive evaluation   Singular extensions – explore Chess: Deep Blue defeated human world champion Gary Kasparov in a six-game match in 1997. Deep Blue examined 200 million only one action that is clearly positions per second, used very sophisticated evaluation and undisclosed methods for extending some lines of search up to 40 better than others. Can ply. alleviate horizon effects  Cutoff test – a decision function  Othello: human champions refuse to compete against computers, which are too good. about when to apply evaluation  Quiescence search – expand  Go: human champions refuse to compete against computers, which are too bad. In go, b > 300, so most programs use pattern the tree until positions are knowledge bases to suggest plausible moves. reached that are quiescent ? ? ? ? (i.e., not volatile)  25 Pacman: unknown 26 4

  5. GamesCrafters http://gamescrafters.berkeley.edu/ 27 5

Recommend


More recommend