CS 188: Artificial Intelligence
Spring 2010
Lecture 8: MEU / Utilities
2/11/2010
Pieter Abbeel – UC Berkeley
Many slides over the course adapted from Dan Klein

Announcements
- W2 is due today (lecture or drop box)
- P2 is out and due on 2/18

Expectimax Search Trees
- What if we don't know what the result of an action will be? E.g.,
  - In solitaire, the next card is unknown
  - In minesweeper, the mine locations are unknown
  - In pacman, the ghosts act randomly
- Can do expectimax search
  - Chance nodes, like min nodes, except the outcome is uncertain
  - Calculate expected utilities
  - Max nodes as in minimax search
  - Chance nodes take the average (expectation) of the values of their children
- Later, we'll learn how to formalize the underlying problem as a Markov Decision Process
[Tree figure: a max node over chance nodes, with leaf values 10, 4, 5, 7]

Maximum Expected Utility
- Why should we average utilities? Why not minimax?
- Principle of maximum expected utility: an agent should choose the action which maximizes its expected utility, given its knowledge
  - General principle for decision making
  - Often taken as the definition of rationality
  - We'll see this idea over and over in this course!
- Let's decompress this definition…
  - Probability --- Expectation --- Utility

Reminder: Probabilities
- A random variable represents an event whose outcome is unknown
- A probability distribution is an assignment of weights to outcomes
- Example: traffic on the freeway?
  - Random variable: T = amount of traffic
  - Outcomes: T in {none, light, heavy}
  - Distribution: P(T=none) = 0.25, P(T=light) = 0.55, P(T=heavy) = 0.20
- Some laws of probability (more later):
  - Probabilities are always non-negative
  - Probabilities over all possible outcomes sum to one
- As we get more evidence, probabilities may change:
  - P(T=heavy) = 0.20, P(T=heavy | Hour=8am) = 0.60
  - We'll talk about methods for reasoning about and updating probabilities later

What are Probabilities?
- Objectivist / frequentist answer:
  - Averages over repeated experiments
  - E.g. empirically estimating P(rain) from historical observation
  - Assertion about how future experiments will go (in the limit)
  - New evidence changes the reference class
  - Makes one think of inherently random events, like rolling dice
- Subjectivist / Bayesian answer:
  - Degrees of belief about unobserved variables
  - E.g. an agent's belief that it's raining, given the temperature
  - E.g. pacman's belief that the ghost will turn left, given the state
  - Often learn probabilities from past experiences (more later)
  - New evidence updates beliefs (more later)
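The traffic distribution and its update under evidence can be checked mechanically. A minimal Python sketch: the prior numbers are from the slide, but the slide only gives P(T=heavy | Hour=8am) = 0.60, so the other two posterior entries here are assumed purely for illustration.

```python
# Prior distribution over traffic outcomes, from the slide.
prior = {"none": 0.25, "light": 0.55, "heavy": 0.20}

# Beliefs after conditioning on Hour = 8am. Only the 0.60 entry comes
# from the slide; "none" and "light" are assumed for illustration.
posterior = {"none": 0.10, "light": 0.30, "heavy": 0.60}

def is_distribution(p):
    """A valid distribution is non-negative and sums to one."""
    return all(w >= 0 for w in p.values()) and abs(sum(p.values()) - 1.0) < 1e-9

print(is_distribution(prior), is_distribution(posterior))  # True True
print(prior["heavy"], posterior["heavy"])                  # 0.2 0.6
```

The point of the check is the two "laws of probability" above: any table of beliefs, prior or posterior, must be non-negative and normalized.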
Uncertainty Everywhere
- Not just for games of chance!
  - I'm sick: will I sneeze this minute?
  - Email contains "FREE!": is it spam?
  - Tooth hurts: do I have a cavity?
  - Is 60 min enough to get to the airport?
  - Robot rotated its wheel three times: how far did it advance?
  - Safe to cross the street? (Look both ways!)
- Sources of uncertainty in random variables:
  - Inherently random process (dice, etc.)
  - Insufficient or weak evidence
  - Ignorance of underlying processes
  - Unmodeled variables
  - The world's just noisy – it doesn't behave according to plan!

Reminder: Expectations
- We can define a function f(X) of a random variable X
- The expected value of a function is its average value, weighted by the probability distribution over inputs
- Example: how long to get to the airport?
  - Length of driving time as a function of traffic: L(none) = 20, L(light) = 30, L(heavy) = 60
  - What is my expected driving time? Notation: E[ L(T) ]
  - Remember, P(T) = {none: 0.25, light: 0.5, heavy: 0.25}
  - E[ L(T) ] = L(none) * P(none) + L(light) * P(light) + L(heavy) * P(heavy)
  - E[ L(T) ] = (20 * 0.25) + (30 * 0.5) + (60 * 0.25) = 35

Utilities
- Utilities are functions from outcomes (states of the world) to real numbers that describe an agent's preferences
- Where do utilities come from?
  - In a game, may be simple (+1/-1)
  - Utilities summarize the agent's goals
  - Theorem: any set of preferences between outcomes can be summarized as a utility function (provided the preferences meet certain conditions)
- In general, we hard-wire utilities and let actions emerge (why don't we let agents decide their own utilities?)
- More on utilities soon…

Expectimax Search
- In expectimax search, we have a probabilistic model of how the opponent (or environment) will behave in any state
  - Model could be a simple uniform distribution (roll a die)
  - Model could be sophisticated and require a great deal of computation
  - We have a node for every outcome out of our control: opponent or environment
  - The model might say that adversarial actions are likely!
- For now, assume that for any state we magically have a distribution to assign probabilities to opponent actions / environment outcomes
- Having a probabilistic belief about an agent's action does not mean that agent is flipping any coins!

Expectimax Search
- Chance nodes
  - Chance nodes are like min nodes, except the outcome is uncertain
  - Calculate expected utilities
  - Chance nodes average successor values (weighted)
- Each chance node has a probability distribution over its outcomes (called a model)
  - For now, assume we're given the model (which would otherwise require a lot of work to compute)
- Utilities for terminal states
- Static evaluation functions give us limited-depth search
[Tree figure: one search ply; leaf values 8, 4, 5, 6; deeper subtrees summarized by estimates of true expectimax values 400, 300 and 492, 362]

Expectimax Pseudocode

    def value(s):
        if s is a max node:      return maxValue(s)
        if s is an exp node:     return expValue(s)
        if s is a terminal node: return evaluation(s)

    def maxValue(s):
        values = [value(s') for s' in successors(s)]
        return max(values)

    def expValue(s):
        values  = [value(s') for s' in successors(s)]
        weights = [probability(s, s') for s' in successors(s)]
        return expectation(values, weights)
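The pseudocode above can be turned into a small runnable sketch. The nested-tuple tree encoding below is an assumption made here for illustration; it is not part of the lecture.

```python
# Minimal expectimax over an explicit game tree.
# A tree is either a number (terminal utility),
#   ("max", [child, ...]), or
#   ("exp", [(prob, child), ...])   -- illustrative encoding, assumed here.

def value(s):
    if isinstance(s, (int, float)):          # terminal node: its utility
        return s
    kind, children = s
    if kind == "max":                        # max node: best child value
        return max(value(c) for c in children)
    if kind == "exp":                        # chance node: weighted average
        return sum(p * value(c) for p, c in children)
    raise ValueError(f"unknown node type: {kind}")

# A max node over two uniform chance nodes, using the 10/4/5/7 leaves
# from the expectimax tree earlier in the lecture:
tree = ("max", [
    ("exp", [(0.5, 10), (0.5, 4)]),          # expectation = 7
    ("exp", [(0.5, 5), (0.5, 7)]),           # expectation = 6
])
print(value(tree))  # 7.0
```

Note how `expValue` from the slide becomes a single weighted sum: the "model" is just the probabilities attached to each successor.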
Expectimax Evaluation
- Evaluation functions quickly return an estimate for a node's true value (which value, expectimax or minimax?)
- For minimax, the evaluation function's scale doesn't matter
  - We just want better states to have higher evaluations (get the ordering right)
  - We call this insensitivity to monotonic transformations
- For expectimax, we need the magnitudes to be meaningful
[Figure: leaves 20, 30 vs 0, 40; after the monotonic transform x² they become 400, 900 vs 0, 1600. The minimax choice is unchanged, but the expectimax choice flips.]

Mixed Layer Types
- E.g. backgammon
- Expectiminimax
  - Environment is an extra player that moves after each agent
  - Chance nodes take expectations, otherwise like minimax

    ExpectiMinimax-Value( state ):
        if state is a terminal state:  return its utility
        if state is a MAX node:        return the highest ExpectiMinimax-Value of Successors(state)
        if state is a MIN node:        return the lowest ExpectiMinimax-Value of Successors(state)
        if state is a chance node:     return the probability-weighted average ExpectiMinimax-Value of Successors(state)

Stochastic Two-Player
- Dice rolls increase b: 21 possible rolls with 2 dice
  - Backgammon has ≈ 20 legal moves per roll
  - Depth 4 = 20 × (21 × 20)³ ≈ 1.5 × 10⁹
- As depth increases, the probability of reaching a given node shrinks
  - So the value of lookahead is diminished
  - So limiting depth is less damaging
  - But pruning is less possible…
- TDGammon uses depth-2 search + a very good evaluation function + reinforcement learning: world-champion level play

Maximum Expected Utility
- Principle of maximum expected utility:
  - A rational agent should choose the action which maximizes its expected utility, given its knowledge
- Questions:
  - Where do utilities come from?
  - How do we know such utilities even exist?
  - Why are we taking expectations of utilities (not, e.g., minimax)?
  - What if our behavior can't be described by utilities?

Utilities: Unknown Outcomes
- Going to the airport from home:
  - Take surface streets → Clear, 10 min → Arrive early
  - Take surface streets → Traffic, 50 min → Arrive late
  - Take freeway → Clear, 20 min → Arrive on time
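The backgammon branching arithmetic above is easy to verify in Python. The 20-moves-per-roll figure is the slide's rough approximation:

```python
# Backgammon game-tree size at depth 4:
# the first ply is ~20 moves for an already-known roll; each of the
# next three plies multiplies in 21 dice outcomes x ~20 moves per roll.
moves_per_roll = 20
dice_rolls = 21          # distinct outcomes of two dice (order ignored)

nodes_depth_4 = moves_per_roll * (dice_rolls * moves_per_roll) ** 3
print(nodes_depth_4)     # 1481760000, i.e. about 1.5e9
```

Even at this modest depth the tree is billions of nodes, which is why TDGammon's shallow search plus a strong learned evaluation function was such an effective trade-off.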
Preferences
- An agent chooses among:
  - Prizes: A, B, etc.
  - Lotteries: situations with uncertain prizes
- Notation:
  - L = [p, A; (1−p), B]: a lottery yielding A with probability p and B with probability 1−p
  - A ≻ B: the agent prefers A to B; A ~ B: the agent is indifferent

Rational Preferences
- We want some constraints on preferences before we call them rational
- For example, transitivity: (A ≻ B) ∧ (B ≻ C) ⇒ (A ≻ C)
- An agent with intransitive preferences can be induced to give away all of its money:
  - If B ≻ C, then an agent with C would pay (say) 1 cent to get B
  - If A ≻ B, then an agent with B would pay (say) 1 cent to get A
  - If C ≻ A, then an agent with A would pay (say) 1 cent to get C

Rational Preferences
- Preferences of a rational agent must obey constraints
  - The axioms of rationality [shown as a figure]
- Theorem: rational preferences imply behavior describable as maximization of expected utility

MEU Principle
- Theorem [Ramsey, 1931; von Neumann & Morgenstern, 1944]:
  - Given any preferences satisfying these constraints, there exists a real-valued function U such that:
      U(A) ≥ U(B)  ⇔  A ⪰ B
      U([p₁, S₁; … ; pₙ, Sₙ]) = Σᵢ pᵢ U(Sᵢ)
- Maximum expected utility (MEU) principle:
  - Choose the action that maximizes expected utility
  - Note: an agent can be entirely rational (consistent with MEU) without ever representing or manipulating utilities and probabilities
  - E.g., a lookup table for perfect tic-tac-toe

Utility Scales
- Normalized utilities: u+ = 1.0, u− = 0.0
- Micromorts: one-millionth chance of death, useful for paying to reduce product risks, etc.
- QALYs: quality-adjusted life years, useful for medical decisions involving substantial risk
- Note: behavior is invariant under positive linear transformation
- With deterministic prizes only (no lottery choices), only an ordinal utility can be determined, i.e., a total order on prizes

Human Utilities
- Utilities map states to real numbers. Which numbers?
- Standard approach to assessment of human utilities:
  - Compare a state A to a standard lottery L_p between:
    - "best possible prize" u+ with probability p
    - "worst possible catastrophe" u− with probability 1−p
  - Adjust the lottery probability p until A ~ L_p
  - The resulting p is a utility in [0, 1]
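The standard-lottery assessment above can be sketched as a bisection on p. The preference oracle below is a hypothetical stand-in for a human subject, with a hidden utility of 0.7 for state A that the procedure must recover; both the oracle and that number are assumptions for illustration.

```python
# Elicit a utility for state A by adjusting the lottery probability p
# until the agent is indifferent between A and L_p = [p, u+; 1-p, u-].
# HIDDEN_UTILITY_A is an illustrative stand-in for a real subject.

HIDDEN_UTILITY_A = 0.7

def prefers_lottery(p):
    """Does the agent prefer L_p to state A? True iff p exceeds U(A)."""
    return p > HIDDEN_UTILITY_A

def assess_utility(tol=1e-6):
    lo, hi = 0.0, 1.0                 # normalized utilities live in [0, 1]
    while hi - lo > tol:
        p = (lo + hi) / 2
        if prefers_lottery(p):
            hi = p                    # lottery too attractive: lower p
        else:
            lo = p                    # state A still preferred: raise p
    return (lo + hi) / 2              # indifference point ~ U(A)

print(round(assess_utility(), 3))  # 0.7
```

The indifference probability p is itself the utility, which is why the method yields values on the normalized u− = 0, u+ = 1 scale described above.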