Game Playing
Ch. 5.1–5.3, 5.4.1, 5.5
Cynthia Matuszek – CMSC 671
Based on slides by Marie desJardins and Francisco Iacobelli

• Tail end of Constraint Satisfaction
• Questions from the game-playing reading?
• Framework
• Game trees
• Minimax
• Alpha-beta pruning
• Adding randomness

We've seen search problems where other agents' moves need to be taken into account – but what if they are actively moving against us?

Why Games?
• Clear criteria for success
• Offer an opportunity to study problems involving {hostile / adversarial / competing} agents
• Interesting, hard problems which require minimal setup
• Often define very large search spaces
  • Chess: ~35^100 nodes in the search tree, ~10^40 legal states
• Many problems can be formalized as games

State-of-the-Art
"A computer can't be intelligent; one could never beat a human at ____"
• Chess:
  • Deep Blue beat Garry Kasparov in 1997
  • Garry Kasparov vs. Deep Junior (February 2003): tie!
  • Kasparov vs. X3D Fritz (November 2003): tie!
  • Deep Fritz beat world champion Vladimir Kramnik (2006)
  • Now computers play computers
• Checkers: "Chinook" (sigh), an AI program with a very large endgame database, is world champion and can provably never be beaten. Retired 1995.
• Bridge: "expert-level" AI, but no world champions
  • "Computer bridge world champion Jack played seven top Dutch pairs … and two reigning European champions. A total of 196 boards were played. Jack defeated three out of the seven pairs (including the Europeans). Overall, the program lost by a small margin (359 versus 385)." (2006; Wikipedia: Computer_bridge)
  • Bridge is stochastic: the computer has imperfect information
• Go: AlphaGo Master defeated Ke Jie by three to zero during its 60 straight wins in online games at the end of 2016 and the beginning of 2017 (www.wired.com/2017/05/googles-alphago-levels-board-games-power-grids)
State-of-the-Art: Go
• Computers finally got there: AlphaGo!
  • Made by Google DeepMind in London
  • 2015: beat a professional Go player without handicaps
  • 2016: beat a 9-dan professional without handicaps
  • 2017: beat Ke Jie, the #1 human player
• 2017: DeepMind published AlphaGo Zero
  • No human games data
  • Learns from playing itself
  • Better than AlphaGo after 3 days of playing

Typical Games
• 2-person game
• Players alternate moves
• Easiest games are:
  • Zero-sum: one player's loss is the other's gain
  • Fully observable: both players have access to complete information about the state of the game
  • Deterministic: no chance (e.g., dice) involved
• Examples: Tic-Tac-Toe, Checkers, Chess, Go, Nim, Othello
• Not: Bridge, Solitaire, Backgammon, ...

How to Play (How to Search)
• Obvious approach, from the current game state:
  1. Consider all the legal moves you can make
  2. Compute the new position resulting from each move
  3. Evaluate each resulting position
  4. Decide which is best
  5. Make that move
  6. Wait for your opponent to move
  7. Repeat
• Key problems:
  • Representing the "board" (game state) – we've seen that there are different ways to make these choices
  • Generating all legal next boards – that can get ugly
  • Evaluating a position

Evaluation Function
• An evaluation function or static evaluator is used to evaluate the "goodness" of a game position (state)
• The zero-sum assumption allows one evaluation function to describe the goodness of a board for both players
  • One player's gain of n means the other player loses n
• How?

Evaluation Function: The Idea
• I am always trying to reach the highest value
• You are always trying to reach the lowest value
• Captures everyone's goal in a single function
• f(n) >> 0: position n is good for me and bad for you
• f(n) << 0: position n is bad for me and good for you
• f(n) = 0 ± ε: position n is a neutral position
• f(n) = +∞: win for me
• f(n) = −∞: win for you
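These sign conventions can be made concrete with a small static evaluator. The sketch below (board encoding and function names are my own, not from the slides) implements the classic Tic-Tac-Toe heuristic discussed on the next slide: the number of rows, columns, and diagonals still open for X minus the number still open for O, with X as the maximizing player:

```python
# All eight "3-lengths" (rows, columns, diagonals) on a 9-cell board.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def evaluate(board):
    """Static evaluator f(n) = (#3-lengths open for X) - (#3-lengths open for O).
    `board` is a list of 9 cells, each 'X', 'O', or ' '; X is "me" (MAX)."""
    open_x = sum(1 for line in LINES
                 if all(board[i] != 'O' for i in line))  # no O blocks it
    open_o = sum(1 for line in LINES
                 if all(board[i] != 'X' for i in line))  # no X blocks it
    return open_x - open_o
```

On the empty board f = 8 − 8 = 0, a neutral position; with a lone X in the center, f = 8 − 4 = 4, reflecting how many lines the center touches.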
Evaluation Function Examples
• An example evaluation function for Tic-Tac-Toe:
  • f(n) = [# of 3-lengths open for X] − [# of 3-lengths open for O]
  • A 3-length is a complete row, column, or diagonal
• Most evaluation functions are specified as a weighted sum of position features:
  • f(n) = w1·feat1(n) + w2·feat2(n) + ... + wk·featk(n)
  • Example features for chess: piece count, piece placement, squares controlled, ...
• Alan Turing's function for chess:
  • f(n) = w(n)/b(n), where w(n) is the sum of the point values of White's pieces and b(n) is the sum of Black's
• Deep Blue had over 8,000 features in its nonlinear evaluation function: square control, rook-in-file, x-rays, king safety, pawn structure, passed pawns, ray control, outposts, pawn majority, rook on the 7th, blockade, restraint, trapped pieces, color complex, ...

Evaluation Function: The Idea (revisited)
• I am maximizing f(n) on my turn
• The opponent is minimizing f(n) on their turn

Game Trees
• Problem spaces for typical games are represented as trees
• The player must decide the best single move to make next
• Root node = current board configuration
• Arcs = possible legal moves for a player
• A static evaluator function rates a board position: f(board) is a real number, with f > 0 for me and f < 0 for you
• If it is my turn to move, the root is labeled a "MAX" node

Minimax Procedure
• Create the start node: a MAX node holding the current board state
• Expand nodes down to a depth of lookahead
• Apply the evaluation function at each leaf node
• "Back up" values for each non-leaf node, until a value is computed for the
root node:
  • If the opponent moves next, the root is instead a "MIN" node; each level's nodes are all MAX or all MIN, and nodes at level i are opposite those at level i+1
  • MIN node: the backed-up value is the lowest of its children's values (the opponent's turn)
  • MAX node: the backed-up value is the highest of its children's values
• Pick the operator (move) associated with the child node whose backed-up value set the value at the root
Minimax Algorithm (example, lookahead = 3)
• [Figure: a minimax tree with static-evaluator values 2, 7, 1, 8 at the leaves; the MIN nodes back up 2 and 1, and the MAX root backs up 2]
• MAX can only choose the "best" move up to the lookahead depth
• Walkthrough video: https://www.youtube.com/watch?v=6ELUvkSkCts

Example: Nim
• In Nim, there are a certain number of objects (coins, sticks, etc.) on the table – we'll play 7-coin Nim
• Each player in turn has to pick up either one or two objects
• Whoever picks up the last object loses

Partial Game Tree for Tic-Tac-Toe
• f(n) = +1 if position n is a win for X
• f(n) = −1 if position n is a win for O
• f(n) = 0 if position n is a draw

Minimax Tree
• [Figure: alternating levels of MAX and MIN nodes, with f values at the leaves and minimax-computed values backed up the tree]

Nim Game Tree
• In-class exercise: draw the minimax search tree for 4-coin Nim
• Things to consider:
  • What's your start state?
  • What's the maximum depth of the tree? The minimum?
  • Pick up either one or two objects; whoever picks up the last object loses
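The 4-coin exercise can be checked with a tiny minimax solver (a sketch; the function and variable names are mine). Player 1 is MAX (+1 means Player 1 wins), Player 2 is MIN:

```python
def nim_value(coins, player):
    """Minimax value of misere Nim: take 1 or 2 coins per turn;
    whoever takes the last coin loses. `player` is 1 (MAX) or 2 (MIN)."""
    if coins == 0:
        # The previous player took the last coin and lost, so the
        # player now to move has already won.
        return +1 if player == 1 else -1
    values = [nim_value(coins - take, 3 - player)   # 3 - player swaps 1 and 2
              for take in (1, 2) if take <= coins]
    return max(values) if player == 1 else min(values)
```

`nim_value(4, 1)` is −1: with optimal play, Player 2 wins 4-coin Nim. 7-coin Nim is also a first-player loss, while 5-coin Nim is a first-player win.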
Nim Game Tree
• [Figure: the full 4-coin Nim game tree – coin counts 4 → 3 or 2 → ... → 0 at the leaves, with payoffs Player 1 wins: +1 and Player 2 wins: −1. Backing up the leaf values by minimax gives −1 at both of the root's children, so the root's value is −1: Player 2 wins 4-coin Nim under optimal play]

Games 2
• Expectiminimax
• Alpha-beta pruning

Improving Minimax
• Basic problem: minimax must examine a number of states that is exponential in the lookahead depth d!
• Solution: judicious pruning of the search tree
  • "Cut off" whole sections that can't be part of the best solution
  • Or, sometimes, probably won't be
• Can be a completeness vs. efficiency tradeoff, especially in stochastic problem spaces

Alpha-Beta Pruning
• We can improve on the performance of the minimax algorithm through alpha-beta pruning
• Basic idea: "If you have an idea that is surely bad, don't take the time to see how truly awful it is." – Pat Winston
• [Figure: a MAX root over two MIN nodes. The first MIN node backs up 2 from leaves 2 and 7. At the second MIN node, after seeing the leaf 1 we know its value is ≤ 1; no matter what the remaining leaf ("?") is, it can't affect the value of the root, because the MAX player will choose the 2. So we never need to compute that leaf's value]
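The pruning idea can be sketched by threading α/β bounds through minimax. As before, `moves`, `apply_move`, and `evaluate` are assumed game-specific helpers (my names): α is the best value MAX can already guarantee on this path, β the best value MIN can already guarantee, and a subtree is abandoned as soon as the two bounds cross.

```python
def alphabeta(state, depth, alpha, beta, maximizing,
              moves, apply_move, evaluate):
    """Minimax value of `state` with alpha-beta pruning."""
    legal = moves(state)
    if depth == 0 or not legal:
        return evaluate(state)
    if maximizing:
        value = float('-inf')
        for m in legal:
            value = max(value, alphabeta(apply_move(state, m), depth - 1,
                                         alpha, beta, False,
                                         moves, apply_move, evaluate))
            alpha = max(alpha, value)
            if alpha >= beta:   # MIN above will never let play reach here
                break           # prune the remaining children
        return value
    else:
        value = float('inf')
        for m in legal:
            value = min(value, alphabeta(apply_move(state, m), depth - 1,
                                         alpha, beta, True,
                                         moves, apply_move, evaluate))
            beta = min(beta, value)
            if beta <= alpha:   # MAX above already has something better
                break
        return value
```

On the slide's example tree (leaves 2 and 7 under the first MIN node, 1 and "?" under the second), the first MIN node backs up 2, so α = 2 at the root; when the second MIN node sees the leaf 1, β drops to 1 ≤ α, and the "?" leaf is never evaluated.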