Lecture 11: Utility Theory
CS 486/686, October 14, 2008
Lecture slides (c) 2008 C. Boutilier, P. Poupart & K. Larson

Outline
• Decision making
  – Utility Theory
  – Decision Trees
• Chapter 16 in R&N
  – Note: some of the material we are covering today is not in the textbook

Decision Making under Uncertainty
• I give the robot a planning problem: I want coffee
  – but the coffee maker is broken: the robot reports "No plan!"
• If I want more robust behavior – if I want the robot to know what to do when my primary goal can't be satisfied – I should provide it with some indication of my preferences over alternatives
  – e.g., coffee better than tea, tea better than water, water better than nothing, etc.
• But it's more complex:
  – the robot could wait 45 minutes for the coffee maker to be fixed
  – what's better: tea now? coffee in 45 minutes?
  – we could express preferences over <beverage, time> pairs

Preferences
• A preference ordering ≽ is a ranking of all possible states of affairs (worlds) S
  – these could be outcomes of actions, truth assignments, states in a search problem, etc.
  – s ≽ t means that state s is at least as good as t
  – s ≻ t means that state s is strictly preferred to t
  – s ~ t means that the agent is indifferent between states s and t
• If an agent's actions are deterministic, then we know what states will occur
• If an agent's actions are not deterministic, then we represent this with lotteries
  – a probability distribution over outcomes
  – Lottery L = [p1, s1; p2, s2; …; pn, sn]
  – s1 occurs with probability p1, s2 occurs with probability p2, …
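To make the lottery notation concrete, here is a minimal Python sketch (not from the lecture) that represents a lottery as a list of (probability, outcome) pairs and checks that it is a valid distribution; the function name and the example outcomes are hypothetical.

```python
# A minimal sketch (not from the slides): a lottery
# L = [p1, s1; p2, s2; ...; pn, sn] as a list of
# (probability, outcome) pairs.

def make_lottery(entries):
    """entries: list of (probability, outcome) pairs."""
    probs = [p for p, _ in entries]
    # A lottery is a probability distribution over outcomes
    assert all(p >= 0 for p in probs), "probabilities must be non-negative"
    assert abs(sum(probs) - 1.0) < 1e-9, "probabilities must sum to 1"
    return entries

# e.g., tea now for sure vs. a gamble on coffee in 45 minutes
sure_tea = make_lottery([(1.0, ("tea", 0))])
gamble = make_lottery([(0.8, ("coffee", 45)), (0.2, ("nothing", 45))])
```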

Axioms
• Orderability: given two states A and B,
  – (A ≻ B) ∨ (B ≻ A) ∨ (A ~ B)
• Transitivity: given three states A, B, and C,
  – (A ≻ B) ∧ (B ≻ C) ⇒ (A ≻ C)
• Continuity:
  – A ≻ B ≻ C ⇒ ∃p [p, A; 1-p, C] ~ B
• Substitutability:
  – A ~ B ⇒ [p, A; 1-p, C] ~ [p, B; 1-p, C]
• Monotonicity:
  – A ≻ B ⇒ (p ≥ q ⇔ [p, A; 1-p, B] ≽ [q, A; 1-q, B])
• Decomposability:
  – [p, A; 1-p, [q, B; 1-q, C]] ~ [p, A; (1-p)q, B; (1-p)(1-q), C]

Why Impose These Conditions?
• The structure of the preference ordering imposes certain "rationality requirements" (it is a weak ordering: Best ≻ … ≻ Worst)
• E.g., why transitivity?
  – suppose you (strictly) prefer coffee to tea, tea to OJ, and OJ to coffee
  – if you prefer X to Y, you'll trade me Y plus $1 for X
  – I can then construct a "money pump" and extract arbitrary amounts of money from you

Decision Problems: Certainty
• A decision problem under certainty consists of:
  – a set of decisions D
    • e.g., paths in a search graph, plans, actions, etc.
  – a set of outcomes or states S
    • e.g., states you could reach by executing a plan
  – an outcome function f : D → S
    • the outcome of any decision
  – a preference ordering ≽ over S
• A solution to a decision problem is any decision d* ∈ D such that f(d*) ≽ f(d) for all d ∈ D

Decision Making under Uncertainty
• Suppose actions don't have deterministic outcomes
  – e.g., when the robot pours coffee, it spills 20% of the time, making a mess
  – preferences: (c, ~mess) ≻ (~c, ~mess) ≻ (~c, mess)
[Figure: decision tree – getcoffee leads to (c, ~mess) or (~c, mess); donothing leads to (~c, ~mess) for sure]
• What should the robot do?
  – decision getcoffee leads to a good outcome or a bad outcome, each with some probability
  – decision donothing leads to a medium outcome for sure
• Should the robot be optimistic? Pessimistic?
• Really, the odds of success should influence the decision
  – but how?

Utilities
• Rather than just ranking outcomes, we must quantify our degree of preference
  – e.g., how much more important is c than ~mess?
• A utility function U : S → ℝ associates a real-valued utility with each outcome
  – U(s) measures your degree of preference for s
• Note: U induces a preference ordering ≽_U over S defined by s ≽_U t iff U(s) ≥ U(t)
  – obviously ≽_U will be reflexive, transitive, and connected

Expected Utility
• Under conditions of uncertainty, each decision d induces a distribution Pr_d over possible outcomes
  – Pr_d(s) is the probability of outcome s under decision d
• The expected utility of decision d is defined as

  EU(d) = ∑_{s ∈ S} Pr_d(s) U(s)
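The definition above translates directly into code. Below is a minimal sketch (not from the lecture) of EU(d) as a sum over a dictionary mapping outcomes to probabilities; the names and the two-outcome example are hypothetical.

```python
# A minimal sketch (not from the slides) of the definition
# EU(d) = sum over s in S of Pr_d(s) * U(s).

def expected_utility(pr_d, U):
    """pr_d: dict mapping outcome s -> Pr_d(s);
    U: dict mapping outcome s -> real-valued utility U(s)."""
    return sum(p * U[s] for s, p in pr_d.items())

# Hypothetical illustration with two outcomes:
pr_d = {"good": 0.8, "bad": 0.2}
U = {"good": 10, "bad": 0}
print(expected_utility(pr_d, U))  # 8.0
```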

Expected Utility
• When the robot pours coffee, it spills 20% of the time, making a mess
• If U(c, ~mess) = 10, U(~c, ~mess) = 5, U(~c, mess) = 0, then
  EU(getcoffee) = (0.8)(10) + (0.2)(0) = 8 and EU(donothing) = 5
• If U(c, ~mess) = 10, U(~c, ~mess) = 9, U(~c, mess) = 0, then
  EU(getcoffee) = (0.8)(10) + (0.2)(0) = 8 and EU(donothing) = 9

The MEU Principle
• The principle of maximum expected utility (MEU) states that the optimal decision under conditions of uncertainty is the one with the greatest expected utility
• In our example
  – if my utility function is the first one, my robot should get coffee
  – if your utility function is the second one, your robot should do nothing

Expected Utility: Notes
• Note that this viewpoint accounts for both:
  – uncertainty in action outcomes
  – uncertainty in state of knowledge
  – any combination of the two
[Figure: two small outcome trees contrasting stochastic actions with uncertain knowledge]

Decision Problems: Uncertainty
• A decision problem under uncertainty consists of:
  – a set of decisions D
  – a set of outcomes or states S
  – an outcome function Pr : D → Δ(S)
    • Δ(S) is the set of distributions over S (e.g., Pr_d)
  – a utility function U over S
• A solution to a decision problem under uncertainty is any d* ∈ D such that EU(d*) ≥ EU(d) for all d ∈ D
• Again, for single-shot problems, this is trivial

Expected Utility: Notes
• Why MEU? Where do utilities come from?
  – the underlying foundations of utility theory tightly couple utility with action/choice
  – a utility function can be determined by asking someone about their preferences for actions in specific scenarios (or "lotteries" over outcomes)
• Utility functions needn't be unique
  – if I multiply U by a positive constant, all decisions have the same relative utility
  – if I add a constant to U, same thing
  – U is unique up to positive affine transformation

So What are the Complications?
• Outcome space is large
  – like all of our problems, state spaces can be huge
  – we don't want to spell out distributions like Pr_d explicitly
  – Soln: Bayes nets (or the related influence diagrams)
• Decision space is large
  – usually our decisions are not one-shot actions
  – rather, they involve sequential choices (like plans)
  – if we treat each plan as a distinct decision, the decision space is too large to handle directly
  – Soln: use dynamic programming methods to construct optimal plans (actually generalizations of plans, called policies… like in game trees)
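To verify the arithmetic in the coffee example and the MEU principle, here is a small sketch; the probabilities and utilities are taken from the slides, while the function and variable names are hypothetical.

```python
# Sketch of the MEU calculation for the coffee-robot example above.

def expected_utility(pr_d, U):
    return sum(p * U[s] for s, p in pr_d.items())

def meu_decision(pr, U):
    # MEU principle: choose the decision with the greatest expected utility
    return max(pr, key=lambda d: expected_utility(pr[d], U))

# Outcome distributions induced by each decision (from the slides)
pr = {
    "getcoffee": {("c", "~mess"): 0.8, ("~c", "mess"): 0.2},
    "donothing": {("~c", "~mess"): 1.0},
}

U1 = {("c", "~mess"): 10, ("~c", "~mess"): 5, ("~c", "mess"): 0}
U2 = {("c", "~mess"): 10, ("~c", "~mess"): 9, ("~c", "mess"): 0}

print(meu_decision(pr, U1))  # getcoffee  (EU: 8 vs. 5)
print(meu_decision(pr, U2))  # donothing  (EU: 8 vs. 9)
```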

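The claim that U is unique up to positive affine transformation can also be checked numerically: since EU is linear in U, EU'(d) = a EU(d) + b whenever U' = aU + b with a > 0, so the ranking of decisions is preserved. A small sketch, reusing the hypothetical coffee-example numbers:

```python
# A small sketch (not from the slides) checking that a positive affine
# transformation U' = a*U + b with a > 0 leaves the MEU decision unchanged.

def expected_utility(pr_d, U):
    return sum(p * U[s] for s, p in pr_d.items())

pr = {
    "getcoffee": {("c", "~mess"): 0.8, ("~c", "mess"): 0.2},
    "donothing": {("~c", "~mess"): 1.0},
}
U = {("c", "~mess"): 10, ("~c", "~mess"): 5, ("~c", "mess"): 0}

a, b = 3.0, 7.0  # arbitrary positive scaling and shift
U_affine = {s: a * u + b for s, u in U.items()}

best = max(pr, key=lambda d: expected_utility(pr[d], U))
best_affine = max(pr, key=lambda d: expected_utility(pr[d], U_affine))
print(best == best_affine)  # True: the optimal decision is unchanged
```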