CS 730/830: Intro AI
Adversarial Search
1 handout: slides

You think you know when you can learn, are more sure when you can write, even more when you can teach, but certain when you can program.

Wheeler Ruml (UNH), Lecture 7, CS 730
EOLQs
Adversarial Search
■ Problems
■ Different!
■ Minimax
■ Tic-tac-toe
■ Improvements
■ Break
■ α-β Pruning
■ α-β Pseudo-code
■ Why α-β?
■ Progress
■ EOLQs
Planning Problems
Observability: complete, partial, hidden
State: discrete, continuous
Actions: deterministic, stochastic, discrete, continuous
Nature: static, deterministic, stochastic
Interaction: one decision, sequential
Time: static/off-line, on-line, discrete, continuous
Percepts: discrete, continuous, uncertain
Others: solo, cooperative, competitive
Multi-agent is Different
■ Shortest-path (M&C, vacuum, tile puzzle)
  ◆ want least-cost path to goal at unknown depth
■ Decisions with an adversary (chess, tic-tac-toe)
  ◆ adversary might prevent path to best goal
  ◆ want best assured outcome assuming rational opponent
  ◆ an irrational opponent can only do worse for itself, so the assured outcome still holds
Adversarial Search: Minimax
Each ply corresponds to half a move.
Terminal states are labeled with their value; Max nodes take the maximum of their children's values and Min nodes the minimum.
incorrect version by Zermelo (1912)
full treatment by von Neumann and Morgenstern (1944)
Can also bound depth and use a static evaluation function (SEF) on non-terminal states.
Evaluation for Tic-tac-toe
A 3-length is a complete row, column, or diagonal; it is open for a player if it contains none of the opponent's marks.
value of position = ∞ if a win for me,
                  = −∞ if a win for you,
                  otherwise = # 3-lengths open for me − # 3-lengths open for you
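The evaluation above can be sketched in Python, assuming a hypothetical board encoding as a 9-character string read row by row, with 'X', 'O', or '.' for an empty square, and taking X as "me".

```python
import math

# The eight 3-lengths as board indices (board read row by row, 0..8).
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals


def evaluate(board: str, me: str = "X", you: str = "O") -> float:
    """+inf if I have won, -inf if you have won, otherwise
    (# 3-lengths open for me) - (# 3-lengths open for you)."""
    for line in LINES:
        marks = {board[i] for i in line}
        if marks == {me}:
            return math.inf
        if marks == {you}:
            return -math.inf
    # A 3-length is open for a player if it holds none of the opponent's marks.
    open_me = sum(all(board[i] != you for i in line) for line in LINES)
    open_you = sum(all(board[i] != me for i in line) for line in LINES)
    return float(open_me - open_you)


if __name__ == "__main__":
    print(evaluate("....X...O"))  # X in the center, O in a corner
    print(evaluate("....X..O."))  # X in the center, O on an edge
```

With X in the center and O in a corner, 5 lines avoid O and 4 lines avoid X, giving 1; with O on an edge instead, it is 6 − 4 = 2.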
Tic-tac-toe: two-ply search

Tic-tac-toe: second move

Tic-tac-toe: third move
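A sketch of the two-ply search these slides walk through, assuming X is to move, O replies, and each resulting position is scored with the tic-tac-toe evaluation function from the earlier slide. The board encoding (a 9-character string, '.' for empty) and the helper names are hypothetical.

```python
import math

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def evaluate(board: str) -> float:
    """+inf / -inf for an X / O win, else open lines for X minus open lines for O."""
    for line in LINES:
        marks = {board[i] for i in line}
        if marks == {"X"}:
            return math.inf
        if marks == {"O"}:
            return -math.inf
    open_x = sum(all(board[i] != "O" for i in line) for line in LINES)
    open_o = sum(all(board[i] != "X" for i in line) for line in LINES)
    return float(open_x - open_o)


def place(board: str, i: int, mark: str) -> str:
    return board[:i] + mark + board[i + 1:]


def two_ply_move(board: str) -> tuple[int, float]:
    """Ply 1: try each X move. Ply 2: assume O picks the reply that minimizes
    the evaluation. Return the X move with the best worst case."""
    best_move, best_value = -1, -math.inf
    for i in (j for j in range(9) if board[j] == "."):
        after_x = place(board, i, "X")
        replies = [place(after_x, j, "O") for j in range(9) if after_x[j] == "."]
        value = min(map(evaluate, replies)) if replies else evaluate(after_x)
        if best_move == -1 or value > best_value:
            best_move, best_value = i, value
    return best_move, best_value


if __name__ == "__main__":
    print(two_ply_move("." * 9))  # X's opening move from the empty board
```

From the empty board this picks the center (index 4) with backed-up value 1: O's best reply is a corner, while a corner or edge opening for X lets O reply in the center for −1 or −2.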
Improving the Search
■ partial expansion, SEF
■ symmetry ('transposition tables')
■ search more ply as we have time (De Groot figure)
■ avoid unnecessary evaluations
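One way to exploit symmetry is a transposition table keyed on a canonical form of the position, so rotated or reflected copies of a board are searched only once. This sketch solves tic-tac-toe exactly, assuming hypothetical scores of +1 / 0 / −1 for an X win / draw / O win and taking the lexicographically smallest of the 8 symmetric variants as the canonical key; passing an identity key turns the symmetry trick off for comparison.

```python
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]
ROT = [6, 3, 0, 7, 4, 1, 8, 5, 2]     # index map: rotate 90 degrees
MIRROR = [2, 1, 0, 5, 4, 3, 8, 7, 6]  # index map: reflect left-right


def symmetries(board: str):
    """Yield the 8 boards equivalent to `board` under rotation and reflection."""
    for _ in range(4):
        yield board
        yield "".join(board[i] for i in MIRROR)
        board = "".join(board[i] for i in ROT)


def canonical(board: str) -> str:
    return min(symmetries(board))


def winner(board: str):
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None


def solve(board: str, table: dict, key=canonical) -> int:
    """Exact minimax value for X (+1 win, 0 draw, -1 loss), memoized in `table`."""
    k = key(board)
    if k in table:                # seen before, possibly as a symmetric copy
        return table[k]
    w = winner(board)
    if w is not None:
        result = 1 if w == "X" else -1
    elif "." not in board:
        result = 0
    else:
        mover = "X" if board.count("X") == board.count("O") else "O"
        values = [solve(board[:i] + mover + board[i + 1:], table, key)
                  for i in range(9) if board[i] == "."]
        result = max(values) if mover == "X" else min(values)
    table[k] = result
    return result


if __name__ == "__main__":
    sym, plain = {}, {}
    print(solve("." * 9, sym), len(sym))                     # with symmetry
    print(solve("." * 9, plain, key=lambda b: b), len(plain))  # without
```

Both runs find the game value 0 (a draw with perfect play); the symmetric table ends up holding far fewer entries than the plain one.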
Break
■ asst 3
■ asst 4
■ projects! talk with me well before break
Which Values are Necessary?
α-β Pruning
α: best outcome Max can force at a previous decision on this path (init to −∞)
β: best outcome Min can force at a previous decision on this path (init to ∞)
α and β values are copied down the tree (but not up).
Minimax values are passed up the tree, as usual.
Once α ≥ β at a node, the player choosing above it will never let play reach it, so its remaining children can be skipped.

John McCarthy (1956 but never published)
simple version used by Newell, Shaw, and Simon (1958)
published by Hart and Edwards (1961)
proved correct and analyzed by Knuth and Moore (1975)
proved optimal by Pearl (1982)
α-β Pseudo-code
Max-value(state, α, β):
    when depth-cutoff(state) or terminal(state), return SEF(state)
    for each child of state
        α ← max(α, Min-value(child, α, β))
        when α ≥ β, return α
    return α

Min-value(state, α, β):
    when depth-cutoff(state) or terminal(state), return SEF(state)
    for each child of state
        β ← min(β, Max-value(child, α, β))
        when β ≤ α, return β
    return β

Initial call: Max-value(root, −∞, ∞)
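A runnable Python sketch of the Max-value / Min-value pseudo-code, assuming the same hypothetical nested-list encoding of game trees (a number is a terminal state's value for Max, a list holds its children). The `sef` argument is a hypothetical estimate for non-terminal states at the depth cutoff, and `evals` counts how many states were scored so pruning can be observed.

```python
import math
from typing import Callable, Union

Tree = Union[float, list]  # number = terminal value for Max; list = children


class AlphaBeta:
    """Fail-hard alpha-beta following the Max-value / Min-value pseudo-code."""

    def __init__(self, depth_limit: float = math.inf,
                 sef: Callable[[list], float] = lambda state: 0.0):
        self.depth_limit = depth_limit
        self.sef = sef    # hypothetical estimate for non-terminal cutoff states
        self.evals = 0    # number of states scored (terminals + cutoffs)

    def search(self, root: Tree) -> float:
        return self.max_value(root, -math.inf, math.inf, self.depth_limit)

    def _cutoff_value(self, state: Tree, depth: float):
        """Score `state` if it is terminal or at the depth cutoff, else None."""
        if isinstance(state, list) and depth > 0:
            return None
        self.evals += 1
        return self.sef(state) if isinstance(state, list) else float(state)

    def max_value(self, state: Tree, alpha: float, beta: float, depth: float) -> float:
        v = self._cutoff_value(state, depth)
        if v is not None:
            return v
        for child in state:
            alpha = max(alpha, self.min_value(child, alpha, beta, depth - 1))
            if alpha >= beta:   # Min above can already hold Max to beta: prune
                return alpha
        return alpha

    def min_value(self, state: Tree, alpha: float, beta: float, depth: float) -> float:
        v = self._cutoff_value(state, depth)
        if v is not None:
            return v
        for child in state:
            beta = min(beta, self.max_value(child, alpha, beta, depth - 1))
            if beta <= alpha:   # Max above can already get alpha: prune
                return beta
        return beta


if __name__ == "__main__":
    ab = AlphaBeta()
    print(ab.search([[3, 12, 8], [2, 4, 6], [14, 5, 2]]), ab.evals)
```

On this tree the search returns 3 after scoring 7 of the 9 leaves: once the second Min node sees a 2, it cannot beat the 3 Max already has, so its other two leaves are skipped.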
α-β in action
Why α-β?
With perfect move ordering, the time complexity of α-β is about O(b^{d/2}), versus O(b^d) for minimax, so in the same time α-β can search roughly twice as deep. With the worst move ordering it prunes nothing and examines all O(b^d) nodes.
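The "twice as deep" claim can be checked with a little arithmetic. For a uniform tree with branching factor b and depth d, plain minimax scores all b^d leaves, while Knuth and Moore's analysis gives b^⌈d/2⌉ + b^⌊d/2⌋ − 1 leaves for α-β under perfect move ordering. The branching factor 10 below is an arbitrary illustration, not a figure for any particular game.

```python
def minimax_leaves(b: int, d: int) -> int:
    """Plain minimax scores every leaf of a uniform tree: b**d."""
    return b ** d


def alphabeta_best_leaves(b: int, d: int) -> int:
    """Leaves alpha-beta scores with perfect move ordering:
    b**ceil(d/2) + b**floor(d/2) - 1."""
    return b ** ((d + 1) // 2) + b ** (d // 2) - 1


if __name__ == "__main__":
    for d in (2, 4, 8):
        print(d, minimax_leaves(10, d), alphabeta_best_leaves(10, d))
```

With b = 10, depth 4 costs minimax 10,000 leaves but α-β only 199; α-β at depth 8 (19,999 leaves) costs about as much as minimax at depth 4, which is the doubling of search depth.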
Progress on Games
Computers best: chess, checkers, backgammon, Scrabble, Jeopardy, Go
Computers competitive: bridge, crosswords, poker
Computers amateur: soccer?
EOLQs
Please write down the most pressing question you have about the course material covered so far and put it in the box on your way out.
Thanks!