  1. Announcements
     • Homework 3: Game Trees (lead TA: Zhaoqing)
       • Due Mon 30 Sep at 11:59pm
     • Project 2: Multi-Agent Search (lead TA: Zhaoqing)
       • Due Thu 10 Oct at 11:59pm (and Thursdays thereafter)
     • Office Hours
       • Iris: Mon 10.00am-noon, RI 237
       • JW: Tue 1.40pm-2.40pm, DG 111
       • Eli: Fri 10.00am-noon, RY 207
       • Zhaoqing: Thu 9.00am-11.00am, HS 202

  CS 4100: Artificial Intelligence
  Uncertainty and Utilities
  Jan-Willem van de Meent, Northeastern University
  [These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley (ai.berkeley.edu).]

  2. Uncertain Outcomes
  Worst-Case vs. Average Case
     (Figure: a max node over min vs. chance subtrees with leaf values 10, 10, 9, 100.)
     • Idea: Uncertain outcomes are controlled by chance, not an adversary!

  3. Expectimax Search
     • Why wouldn't we know what the result of an action will be?
       • Explicit randomness: rolling dice
       • Unpredictable opponents: the ghosts respond randomly
       • Actions can fail: when moving a robot, wheels might slip
     • Idea: Values should reflect average-case (expectimax) outcomes, not worst-case (minimax) outcomes
     • Expectimax search: compute the average score under optimal play
       • Max nodes as in minimax search
       • Chance nodes are like min nodes, but the outcome is uncertain
       • Calculate their expected utilities, i.e. take the weighted average (expectation) of children
     (Figure: example tree with values 10, 10, 10, 4, 9, 5, 100, 7.)
     • Later, we'll learn how to formalize the underlying uncertain-result problems as Markov Decision Processes
     [Demo: min vs exp (L7D1,2)]
  Minimax vs Expectimax (Minimax)

  4. Minimax vs Expectimax (Expectimax)
  Expectimax Pseudocode

     def value(state):
         if the state is a terminal state: return the state's utility
         if the next agent is MAX: return max-value(state)
         if the next agent is EXP: return exp-value(state)

     def max-value(state):
         initialize v = -∞
         for each successor of state:
             v = max(v, value(successor))
         return v

     def exp-value(state):
         initialize v = 0
         for each successor of state:
             p = probability(successor)
             v += p * value(successor)
         return v
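The pseudocode above can be turned into a small runnable sketch. The nested-tuple game-tree encoding used here is an assumption for illustration, not something from the slides: a terminal is a number, a ('max', ...) node lists child states, and an ('exp', ...) node lists (probability, child) pairs.

```python
# A minimal runnable sketch of the expectimax pseudocode above.
# Tree encoding (an assumption for illustration): terminal = number,
# ('max', child, ...) = max node, ('exp', (p, child), ...) = chance node.

def value(state):
    if isinstance(state, (int, float)):    # terminal state: return its utility
        return state
    return max_value(state) if state[0] == 'max' else exp_value(state)

def max_value(state):
    v = float('-inf')
    for successor in state[1:]:
        v = max(v, value(successor))
    return v

def exp_value(state):
    v = 0.0
    for p, successor in state[1:]:         # weighted average (expectation)
        v += p * value(successor)
    return v

# The chance node from the next slide: outcomes 8, 24, -12 with
# probabilities 1/2, 1/3, 1/6.
tree = ('max', ('exp', (1/2, 8), (1/3, 24), (1/6, -12)))
print(round(value(tree), 6))  # 10.0
```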

  5. Expectimax Pseudocode

     def exp-value(state):
         initialize v = 0
         for each successor of state:
             p = probability(successor)
             v += p * value(successor)
         return v

     (Figure: a chance node with outcome probabilities 1/2, 1/3, 1/6; leaf values 5, 8, 24, 7, -12.)
     v = (1/2)(8) + (1/3)(24) + (1/6)(-12) = 10

  Expectimax Example
     (Figure: a max node over three chance nodes, each with three equally likely (probability 1/3) outcomes: 3, 12, 9; 2, 4, 6; 15, 6, 0.)
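The expectimax example in the figure (three chance nodes with uniform 1/3 outcome probabilities, then a max over their averages) can be checked directly:

```python
# The expectimax example above: each chance node's value is the average of
# its three equally likely outcomes; the root max picks the best average.
branches = [(3, 12, 9), (2, 4, 6), (15, 6, 0)]
averages = [sum(b) / len(b) for b in branches]
print(averages)        # [8.0, 4.0, 7.0]
print(max(averages))   # 8.0, the expectimax value at the root
```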

  6. Expectimax Pruning?
     (Figure: a chance-node tree with leaves 3, 12, 9, 2.)
     • Unlike minimax, expectimax generally cannot be pruned: without bounds on the leaf values, any unseen outcome could still change a chance node's weighted average.

  Depth-Limited Expectimax
     • Search to a limited depth and use an estimate of the true expectimax value (which would require a lot of work to compute).
     (Figure: depth-limited tree with values 400, 300 and 492, 362.)
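Depth-limited expectimax can be sketched on the same nested-tuple trees (an assumed encoding: numbers are terminals, 'max'/'exp' are internal nodes). When the depth budget runs out, a placeholder evaluation function estimates the node's value instead of searching further:

```python
# A sketch of depth-limited expectimax. `evaluate` is a stand-in for a real
# evaluation function; here we pass a trivial placeholder that returns 0.

def dl_value(state, depth, evaluate):
    if isinstance(state, (int, float)):
        return state                        # true terminal utility
    if depth == 0:
        return evaluate(state)              # cut off search, estimate instead
    if state[0] == 'max':
        return max(dl_value(c, depth - 1, evaluate) for c in state[1:])
    return sum(p * dl_value(c, depth - 1, evaluate) for p, c in state[1:])

tree = ('max',
        ('exp', (0.5, ('max', 400, 300)), (0.5, ('max', 492, 362))))
full = dl_value(tree, 10, lambda s: 0.0)    # deep enough: the true value
cut = dl_value(tree, 1, lambda s: 0.0)      # estimate kicks in below depth 1
print(full, cut)   # 446.0 0.0
```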

  7. Probabilities
  Reminder: Probabilities
     • A random variable represents an event whose outcome is unknown
     • A probability distribution assigns weights to outcomes
     • Example: Traffic on freeway
       • Random variable: T = amount of traffic
       • Outcomes: T ∈ {none, light, heavy}
       • Distribution: P(T=none) = 0.25, P(T=light) = 0.50, P(T=heavy) = 0.25
     • Some laws of probability (more later):
       • Probabilities are always non-negative
       • Probabilities over all possible outcomes sum to one
     • As we get more evidence, probabilities may change:
       • P(T=heavy) = 0.25, P(T=heavy | Hour=8am) = 0.60
       • We'll talk about methods for reasoning about and updating probabilities later
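The traffic distribution above can be written as a dictionary, with quick checks of the two laws stated on the slide. This is a sketch for illustration only:

```python
# The traffic example as a dictionary distribution over outcomes.
P_T = {'none': 0.25, 'light': 0.50, 'heavy': 0.25}

assert all(p >= 0 for p in P_T.values())        # probabilities are non-negative
assert abs(sum(P_T.values()) - 1.0) < 1e-12     # and sum to one

print(P_T['heavy'])   # 0.25, before seeing any evidence
# Evidence changes probabilities; the slide gives the conditional value
# P(T=heavy | Hour=8am) = 0.60 directly rather than deriving it.
```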

  8. Reminder: Expectations
     • The expected value of a function f(X) of a random variable X is a weighted average over outcomes
     • Example: How long to get to the airport?
       • Times: 20 min, 30 min, 60 min, with probabilities 0.25, 0.50, 0.25
       • Expected time: (0.25)(20) + (0.50)(30) + (0.25)(60) = 35 min

  What Probabilities to Use?
     • In expectimax search, we have a probabilistic model of the opponent (or environment)
       • The model could be a simple uniform distribution (roll a die)
       • The model could be sophisticated and require a great deal of computation
       • We have a chance node for any outcome out of our control: opponent or environment
       • The model might say that adversarial actions are likely!
     • For now, assume each chance node "magically" comes along with probabilities that specify the distribution over its outcomes
     • Having a probabilistic belief about another agent's action does not mean that the agent is flipping any coins!
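The airport example is just a weighted average, which a couple of lines make concrete:

```python
# Expected travel time: a weighted average of outcomes by their probabilities.
times = [20, 30, 60]           # minutes
probs = [0.25, 0.50, 0.25]
expected_time = sum(t * p for t, p in zip(times, probs))
print(expected_time)           # 35.0 minutes
```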

  9. Quiz: Informed Probabilities
     • Let's say you know that your opponent is actually running depth-2 minimax, using the result 80% of the time, and moving randomly otherwise
     • Question: What tree search should you use?
     • Answer: Expectimax!
       • To compute EACH chance node's probabilities, you have to run a simulation of your opponent
       • (Figure: branch probabilities 0.9 and 0.1; with two actions, the minimax move gets 0.8 + 0.2/2 = 0.9 and the other 0.1)
       • This kind of thing gets very slow very quickly
       • Even worse if you have to simulate your opponent simulating you...
       • ... except for minimax, which has the nice property that it all collapses into one game tree
  Modeling Assumptions
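One way to fill in a chance node's probabilities for this opponent model is sketched below. Here `minimax_move` is a hypothetical helper standing in for a real depth-2 minimax search, and "moving randomly" is read as uniform over all moves (including the minimax one), which matches the 0.9/0.1 split in the figure:

```python
# Sketch: derive a chance node's distribution by simulating the opponent.
# 80% of the time it plays its depth-2 minimax move; otherwise it picks
# uniformly at random among all of its moves.

def opponent_distribution(actions, minimax_move):
    """Return {action: probability} for the modeled opponent."""
    best = minimax_move(actions)                 # simulate the opponent's search
    dist = {a: 0.2 / len(actions) for a in actions}   # the random 20%
    dist[best] += 0.8                            # plus the minimax 80%
    return dist

# Toy usage: pretend the simulated opponent always picks the first action
# in sorted order (a placeholder for an actual depth-2 search).
dist = opponent_distribution(['left', 'right'], min)
print(dist)   # {'left': 0.9, 'right': 0.1}
```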

  10. The Dangers of Optimism and Pessimism
     • Dangerous Optimism: assuming chance when the world is adversarial
     • Dangerous Pessimism: assuming the worst case when it's not likely

  Assumptions vs. Reality

                             Adversarial Ghost     Random Ghost
     Minimax Pacman          Won 5/5               Won 5/5
                             Avg. Score: 483       Avg. Score: 493
     Expectimax Pacman       Won 1/5               Won 5/5
                             Avg. Score: -303      Avg. Score: 503

     Results from playing 5 games. Pacman used depth 4 search with an eval function that avoids trouble; the ghost used depth 2 search with an eval function that seeks Pacman.
     [Demos: world assumptions (L7D3,4,5,6)]

  11. Assumptions vs. Reality (same results table as the previous slide)
  Adversarial Ghost vs. Minimax Pacman

  12. Random Ghost vs. Expectimax Pacman
  Adversarial Ghost vs. Expectimax Pacman

  13. Random Ghost vs. Minimax Pacman
  Other Game Types

  14. Mixed Layer Types
     • E.g. Backgammon
     • Expectiminimax
       • Environment is an extra "random agent" player that moves after each min/max agent
       • Each node computes the appropriate combination of its children

  Example: Backgammon
     • Dice rolls increase breadth: 21 outcomes with 2 dice
     • Backgammon: ~20 legal moves
       • Depth 2: 20 x (21 x 20)^3 = 1.2 x 10^9
     • As depth increases, the probability of reaching a given search node shrinks
       • So the usefulness of search is diminished
       • So limiting depth is less damaging
       • But pruning is trickier...
     • Historic AI: TDGammon used depth-2 search + a very good evaluation function + reinforcement learning to reach world-champion level play
       • First AI world champion in any game!
     Image: Wikipedia
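Expectiminimax can be sketched on nested tuples in the same spirit as the earlier pseudocode: max and min layers as in minimax, with a chance layer for the environment (the dice) in between. The tree encoding and the toy example are assumptions for illustration:

```python
# A sketch of expectiminimax: 'max'/'min' nodes list child states, a chance
# node ('chance', ...) lists (probability, child) pairs, numbers are terminals.

def expectiminimax(state):
    if isinstance(state, (int, float)):
        return state
    kind, children = state[0], state[1:]
    if kind == 'max':
        return max(expectiminimax(c) for c in children)
    if kind == 'min':
        return min(expectiminimax(c) for c in children)
    return sum(p * expectiminimax(c) for p, c in children)   # chance layer

# Toy tree: max over two moves, each followed by a fair coin flip ("dice"),
# then the opponent's min.
tree = ('max',
        ('chance', (0.5, ('min', 3, 8)), (0.5, ('min', 5, 6))),
        ('chance', (0.5, ('min', 2, 9)), (0.5, ('min', 7, 7))))
print(expectiminimax(tree))   # max(0.5*3 + 0.5*5, 0.5*2 + 0.5*7) = 4.5
```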

  15. Multi-Agent Utilities
     • What if the game is not zero-sum, or has multiple players?
     • Generalization of minimax:
       • Terminals have utility tuples (one for each agent)
       • Node values are also utility tuples
       • Each player maximizes its own component
       • Can give rise to cooperation and competition dynamically...
     (Figure: three-player tree with terminal tuples 1,6,6  7,1,2  6,1,2  7,2,1  5,1,7  1,5,2  7,7,1  5,2,5.)
  Utilities
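The generalization can be sketched directly: each internal node records which player moves there, and that player picks the child whose utility tuple is best in its own component. The encoding and the toy tree below are illustrative assumptions, not the tree in the figure:

```python
# Multi-agent "minimax": terminals are utility tuples, internal nodes are
# (player_index, [children]); the moving player maximizes its own component.

def multi_value(state):
    if all(isinstance(x, (int, float)) for x in state):
        return state                        # terminal: the utility tuple itself
    player, children = state
    return max((multi_value(c) for c in children),
               key=lambda utilities: utilities[player])

# Toy 3-agent tree: player 0 moves at the root, player 1 at the next layer.
tree = (0, [(1, [(3, 5, 1), (1, 6, 2)]),
            (1, [(6, 1, 2), (2, 4, 4)])])
print(multi_value(tree))   # (2, 4, 4)
```

Player 1 picks (1, 6, 2) over (3, 5, 1) and (2, 4, 4) over (6, 1, 2) by its second component; player 0 then prefers (2, 4, 4) by its first.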

  16. Maximum Expected Utility
     • Why should we average utilities? Why not minimax?
       • Minimax will be overly risk-averse in most settings.
     • Principle of maximum expected utility:
       • A rational agent should choose the action that maximizes its expected utility, given its knowledge of the world.
     • Questions:
       • Where do utilities come from?
       • How do we know such utilities even exist?
       • How do we know that averaging even makes sense?
       • What if our behavior (preferences) can't be described by utilities?

  What Utilities to Use?
     (Figure: leaves 20, 30 vs. 0, 40; applying x^2 gives 400, 900 vs. 0, 1600.)
     • For worst-case minimax reasoning, terminal scaling doesn't matter
       • We just want better states to have higher evaluations (get the ordering right)
       • We call this insensitivity to monotonic transformations
     • For average-case expectimax reasoning, magnitudes matter
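The contrast can be made concrete with the slide's leaf values. Grouping them into two chance/min nodes (20, 30 vs. 0, 40) and assuming uniform outcome probabilities for the expectimax case are illustrative assumptions; squaring the payoffs (a monotonic transformation) preserves the minimax choice but flips the expectimax choice:

```python
# Minimax decisions survive monotonic transformations; expectimax decisions
# depend on magnitudes and may not.
left, right = [20, 30], [0, 40]

def minimax_choice(a, b):
    return 'left' if min(a) >= min(b) else 'right'

def expectimax_choice(a, b):
    avg = lambda xs: sum(xs) / len(xs)   # uniform outcome probabilities
    return 'left' if avg(a) >= avg(b) else 'right'

sq = lambda xs: [x * x for x in xs]      # the monotonic transformation x^2
print(minimax_choice(left, right), minimax_choice(sq(left), sq(right)))
# left left   (ordering preserved, same decision)
print(expectimax_choice(left, right), expectimax_choice(sq(left), sq(right)))
# left right  (averages 25 vs 20 become 650 vs 800: the decision flips)
```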

  17. Utilities
     • Utilities are functions from outcomes (states of the world) to real numbers that describe an agent's preferences
     • Where do utilities come from?
       • In a game, they may be simple (+1/-1)
       • Utilities summarize the agent's goals
       • Theorem: any "rational" preferences can be summarized as a utility function
     • We hard-wire utilities and let behaviors emerge
       • Why don't we let agents pick utilities?
       • Why don't we prescribe behaviors?
  Utilities: Uncertain Outcomes
     (Figure: getting ice cream; Get Single vs. Get Double, with outcomes Oops and Whew!)

  18. Preferences
     • An agent must have preferences among:
       • Prizes: A, B, etc.
       • Lotteries: situations with uncertain prizes, e.g. prize A with probability p, prize B with probability 1-p
     • Notation:
       • Preference: A ≻ B
       • Indifference: A ∼ B
  Rationality
