Making Decisions
AI Slides (6e) © Lin Zuoquan@PKU 1998-2020
10 Making Decisions

10.1 Decision making agent
10.2 Preferences
10.3 Utilities
10.4 Decision networks
    • Decision networks
    • Value of information
    • Sequential decision problem ∗
10.5 Game theory ∗
Decision making agent

function Decision-Theoretic-Agent(percept) returns an action
    update beliefs about the current state based on available information,
        including the current percept and the previous action
    calculate outcome probabilities for actions,
        given action descriptions and beliefs about the current state
    select the action with highest expected utility,
        given outcome probabilities and utility information
    return action

Decision theories: an agent's choices
• Utility theory: worth or value
    utility function – preference ordering over a choice set
• Game theory: strategic interaction between rational decision-makers

Hint: AI → Economics → Computational economics
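A minimal Python sketch of this agent loop, under assumed interfaces (a belief state represented as a dict over states, a transition(state, action) model returning outcome probabilities, a utility(state) function, and a domain-specific update_belief filter); none of these names come from the slides.

# Sketch of a decision-theoretic agent loop (illustrative only).
# Assumptions: states and actions are hashable; transition(state, action)
# returns a dict {next_state: probability}; utility(state) returns a float;
# update_belief(belief, percept, prev_action) is a domain-specific filter.

def expected_utility(action, belief, transition, utility):
    """EU(a | e) = sum over outcomes s' of P(s' | a, e) * U(s')."""
    eu = 0.0
    for state, p_state in belief.items():          # uncertainty about the current state
        for nxt, p_next in transition(state, action).items():
            eu += p_state * p_next * utility(nxt)  # weight each outcome's utility
    return eu

def decision_theoretic_agent(percept, belief, prev_action,
                             update_belief, actions, transition, utility):
    belief = update_belief(belief, percept, prev_action)   # incorporate new evidence
    best = max(actions,
               key=lambda a: expected_utility(a, belief, transition, utility))
    return best, belief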
Making decisions under uncertainty

Suppose I believe the following:
    P(A_25   gets me there on time | ...) = 0.04
    P(A_90   gets me there on time | ...) = 0.70
    P(A_120  gets me there on time | ...) = 0.95
    P(A_1440 gets me there on time | ...) = 0.9999

Which action to choose?
Depends on my preferences for missing the flight vs. airport cuisine, etc.

Utility theory is used to represent and infer preferences
Decision theory = probability theory + utility theory
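A small worked example: the four probabilities are from the slide, but the utilities (value of catching the flight, penalty for missing it, cost per hour of waiting) and the waiting times are invented purely for illustration.

# Hypothetical worked example: choose how early to leave for the airport.
# P(on time) per action is from the slide; utilities and waiting times below
# are made up (on time = 100, missing the flight = -500, minus 1 per 10 min waited).

p_on_time = {"A_25": 0.04, "A_90": 0.70, "A_120": 0.95, "A_1440": 0.9999}
wait_minutes = {"A_25": 0, "A_90": 60, "A_120": 90, "A_1440": 1410}  # assumed slack

def eu(action):
    wait_cost = wait_minutes[action] / 10.0
    u_make = 100 - wait_cost      # catch the flight after waiting
    u_miss = -500 - wait_cost     # miss it anyway
    p = p_on_time[action]
    return p * u_make + (1 - p) * u_miss

for a in p_on_time:
    print(a, round(eu(a), 1))
# With these made-up utilities, A_120 maximizes expected utility: leaving
# 1440 minutes early is "safe", but the waiting cost dominates.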
Preferences

An agent chooses among prizes (A, B, etc.) and lotteries, i.e., situations
with uncertain prizes

[Figure: lottery L branching with probability p to prize A and 1−p to prize B]

    Lottery L = [p, A; (1 − p), B]

In general, a lottery (state) L with possible outcomes S_1, ..., S_n that
occur with probabilities p_1, ..., p_n:
    L = [p_1, S_1; ...; p_n, S_n]
each outcome S_i of a lottery can be either an atomic state or another lottery
Preferences

Notation
    A ≻ B      A preferred to B
    A ∼ B      indifference between A and B
    A ≻∼ B     B not preferred to A

Rational preferences
    preferences of a rational agent must obey constraints
    ⇒ behavior describable as maximization of expected utility
Axioms of preferences

Orderability       (A ≻ B) ∨ (B ≻ A) ∨ (A ∼ B)
Transitivity       (A ≻ B) ∧ (B ≻ C) ⇒ (A ≻ C)
Continuity         A ≻ B ≻ C ⇒ ∃p  [p, A; 1 − p, C] ∼ B
Substitutability   A ∼ B ⇒ [p, A; 1 − p, C] ∼ [p, B; 1 − p, C]
                   (also A ≻ B ⇒ [p, A; 1 − p, C] ≻ [p, B; 1 − p, C])
Monotonicity       A ≻ B ⇒ (p ≥ q ⇔ [p, A; 1 − p, B] ≻ [q, A; 1 − q, B])
Decomposability    [p, A; 1 − p, [q, B; 1 − q, C]] ∼ [p, A; (1 − p)q, B; (1 − p)(1 − q), C]
Rational preferences

Violating the constraints leads to self-evident irrationality
For example: an agent with intransitive preferences can be induced to give
away all its money

[Figure: cycle A → B → C → A, each arrow labelled with a 1-cent payment]

If B ≻ C, then an agent who has C would pay (say) 1 cent to get B
If A ≻ B, then an agent who has B would pay (say) 1 cent to get A
If C ≻ A, then an agent who has A would pay (say) 1 cent to get C
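A quick simulation of this "money pump", assuming the cyclic preferences above and a trader who repeatedly offers the agent its preferred swap for 1 cent; the names and amounts are illustrative.

# Money-pump sketch: an agent with cyclic (intransitive) preferences
# B > C, A > B, C > A keeps paying 1 cent to swap around the cycle.

prefers = {("B", "C"), ("A", "B"), ("C", "A")}   # (x, y) means x preferred to y

def offer(holding):
    """Return the item the agent would pay 1 cent to swap into, if any."""
    for x, y in prefers:
        if y == holding:
            return x
    return None

holding, cents = "A", 100
for _ in range(300):                      # keep offering trades
    better = offer(holding)
    if better is None or cents == 0:
        break
    holding, cents = better, cents - 1    # pay 1 cent for the "better" prize

print(holding, cents)   # the agent ends up broke, still holding some prize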
Utilities

Preferences are captured by a utility function, U(s)
    assigns a single number to express the desirability of a state

The expected utility of an action given the evidence, EU(a | e)
    the average utility value of the outcomes, weighted by the probability
    that each outcome occurs
        EU(a | e) = Σ_s′ P(Result(a) = s′ | a, e) U(s′)

Theorem (Ramsey, 1931; von Neumann and Morgenstern, 1944):
Given preferences satisfying the axioms, there exists a real-valued function
U s.t.
        U(A) ≥ U(B)  ⇔  A ≻∼ B
        U([p_1, S_1; ...; p_n, S_n]) = Σ_i p_i U(S_i)
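A short sketch of the expected-utility rule for lotteries, assuming a lottery is represented as a list of (probability, outcome) pairs where an outcome is either an atomic state or another lottery; the utilities in the example are made up. Evaluating a nested lottery and its flattened form also illustrates the decomposability axiom.

# Expected utility of a lottery [p1, S1; ...; pn, Sn] = sum_i pi * U(Si),
# where each outcome Si may itself be a (nested) lottery.
# Assumed representation: a lottery is a list of (probability, outcome) pairs.

def lottery_utility(outcome, U):
    if isinstance(outcome, list):                      # a nested lottery
        return sum(p * lottery_utility(s, U) for p, s in outcome)
    return U[outcome]                                  # an atomic state

U = {"A": 10.0, "B": 4.0, "C": 0.0}                    # illustrative utilities

nested = [(0.5, "A"), (0.5, [(0.3, "B"), (0.7, "C")])]   # [0.5,A; 0.5,[0.3,B; 0.7,C]]
flat   = [(0.5, "A"), (0.15, "B"), (0.35, "C")]          # its decomposed form

print(lottery_utility(nested, U), lottery_utility(flat, U))  # both 5.6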
Maximizing expected utility

MEU principle
    Choose the action that maximizes expected utility
        a∗ = argmax_a EU(a | e)

Note: an agent can be entirely rational (consistent with MEU) without ever
representing or manipulating utilities and probabilities
    e.g., a lookup table for perfect tic-tac-toe
Utility function

Utilities map states (lotteries) to real numbers. Which numbers?

Standard approach to assessment of human utilities
    compare a given state A to a standard lottery L_p that has
        "best possible prize" u⊤ with probability p
        "worst possible catastrophe" u⊥ with probability (1 − p)
    adjust lottery probability p until A ∼ L_p

    e.g.,  pay $30  ∼  L: [0.999999, continue as before; 0.000001, instant death]

i.e., implicitly placing a monetary value on life
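A sketch of this assessment by bisection, assuming we can repeatedly ask the agent "do you prefer the lottery L_p or the state A?"; here the respondent is simulated with a hidden utility just so the search has something to converge to.

# Assess U(A) on a normalized scale by adjusting p in the standard lottery
# [p, best; 1-p, worst] until the agent is indifferent between A and the lottery.
# The "agent" below is simulated with a hidden utility, for illustration only.

U_BEST, U_WORST = 1.0, 0.0          # normalized utility scale
hidden_u_of_A = 0.73                # unknown to the elicitor

def prefers_lottery(p):
    """Simulated answer: is the lottery [p, best; 1-p, worst] preferred to A?"""
    return p * U_BEST + (1 - p) * U_WORST > hidden_u_of_A

def assess_utility(queries=30):
    lo, hi = 0.0, 1.0
    for _ in range(queries):        # bisection on the indifference probability
        p = (lo + hi) / 2
        if prefers_lottery(p):
            hi = p                  # lottery too attractive: lower p
        else:
            lo = p                  # A still preferred: raise p
    return (lo + hi) / 2            # at indifference, U(A) = p

print(round(assess_utility(), 4))   # ≈ 0.73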
Utility scales

Normalized utilities: u⊤ = 1.0, u⊥ = 0.0

Micromorts (micro-mortality): one-millionth chance of death
    useful for Russian roulette, paying to reduce product risks, etc.

QALYs: quality-adjusted life years
    useful for medical decisions involving substantial risk

Note: behavior is invariant w.r.t. positive linear transformation
    U′(x) = k_1 U(x) + k_2    where k_1 > 0
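A tiny check of this invariance, with made-up outcome probabilities and utilities: rescaling U by any k_1 > 0 and shifting by k_2 never changes which action the MEU rule picks.

# Positive linear transformations U'(x) = k1*U(x) + k2 (k1 > 0) leave the
# MEU-optimal action unchanged. Probabilities and utilities are made up.

outcomes = {                      # P(outcome | action)
    "act1": {"good": 0.6, "bad": 0.4},
    "act2": {"good": 0.3, "bad": 0.7},
}
U = {"good": 8.0, "bad": -2.0}

def best_action(util):
    def eu(a):
        return sum(p * util[s] for s, p in outcomes[a].items())
    return max(outcomes, key=eu)

k1, k2 = 3.5, 100.0
U_scaled = {s: k1 * u + k2 for s, u in U.items()}

print(best_action(U), best_action(U_scaled))   # same action both times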
Money

Money does not behave as a utility function

Given a lottery L with expected monetary value EMV(L), usually
    U(L) < U(EMV(L)), i.e., people are risk-averse

Utility curve: for what probability p am I indifferent between a prize x
and a lottery [p, $M; (1 − p), $0] for large M?

[Figure: typical empirical utility-of-money curve over −$150,000 to +$800,000,
concave for gains and extrapolated with risk-prone behavior for losses]
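A small illustration of risk aversion with an assumed concave utility for money (square root, purely illustrative): the expected utility of the lottery falls below the utility of its expected monetary value, and the sure amount the agent would accept instead (the certainty equivalent) is well below the EMV.

# Risk aversion with a concave, illustrative utility for money: U($x) = sqrt(x).
# Lottery: 50% chance of $1,000,000, 50% chance of $0.
import math

def U(x):
    return math.sqrt(x)

lottery = [(0.5, 1_000_000), (0.5, 0)]

emv = sum(p * x for p, x in lottery)          # expected monetary value: $500,000
u_l = sum(p * U(x) for p, x in lottery)       # expected utility of the lottery: 500
ce  = u_l ** 2                                # certainty equivalent: U(ce) = u_l

print(emv, round(U(emv), 1), u_l, ce)
# U(L) = 500 < U(EMV(L)) ≈ 707.1, and the certainty equivalent ($250,000)
# is far below the EMV ($500,000): the agent would swap the lottery for a
# sure $250,000.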
Multiattribute utility

How can we handle utility functions of many variables X_1 ... X_n?
    e.g., what is U(Deaths, Noise, Cost)?
How can complex utility functions be assessed from preference behaviour?

Idea 1: identify conditions under which decisions can be made without
complete identification of U(x_1, ..., x_n)
Idea 2: identify various types of independence in preferences and derive
consequent canonical forms for U(x_1, ..., x_n)
Strict dominance

Typically define attributes such that U is monotonic in each

Strict dominance: choice B strictly dominates choice A iff
    ∀i  X_i(B) ≥ X_i(A)    (and hence U(B) ≥ U(A))

[Figure: in attribute space (X_1, X_2), the region above and to the right of
choice A dominates A; shown for deterministic and for uncertain attributes]

Strict dominance seldom holds in practice
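A direct check of this definition, assuming each choice is a vector of attribute values already oriented so that larger is better; the attribute numbers are made up.

# Dominance check per the definition above: B dominates A iff
# X_i(B) >= X_i(A) for every attribute i (larger = better). Values are made up.

def dominates(b, a):
    return all(xb >= xa for xb, xa in zip(b, a))

# attribute vectors (e.g., negative cost, negative noise, safety), made up:
choices = {
    "A": (-5.0, -3.0, 0.6),
    "B": (-4.0, -3.0, 0.7),   # at least as good as A on every attribute
    "C": (-6.0, -1.0, 0.8),   # better on some attributes, worse on cost
}

print(dominates(choices["B"], choices["A"]))  # True: B dominates A
print(dominates(choices["C"], choices["A"]))  # False: no dominance either way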
Stochastic dominance

[Figure: probability densities (left) and cumulative distributions (right) of
negative cost for sites S1 and S2]

Distribution p_1 stochastically dominates distribution p_2 iff
    ∀t  ∫_{−∞}^{t} p_1(x) dx ≤ ∫_{−∞}^{t} p_2(x) dx

If U is monotonic in x and A_1's outcome distribution p_1 stochastically
dominates A_2's outcome distribution p_2, then A_1 has at least the expected
utility of A_2:
    ∫_{−∞}^{∞} p_1(x) U(x) dx ≥ ∫_{−∞}^{∞} p_2(x) U(x) dx

Multiattribute: stochastic dominance on all attributes ⇒ optimal
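A discrete check of stochastic dominance on a common grid of outcome values, assuming the two distributions are given as probability vectors over that grid; the numbers are made up, and higher outcome values are better.

# First-order stochastic dominance on a discrete grid: p1 dominates p2 iff
# the CDF of p1 is everywhere <= the CDF of p2 (mass shifted toward better
# outcomes). Distributions below are made up.
from itertools import accumulate

values = [-6.0, -5.0, -4.0, -3.0, -2.0]          # e.g., negative cost (higher = better)
p1 = [0.05, 0.10, 0.25, 0.35, 0.25]              # site S1
p2 = [0.15, 0.25, 0.30, 0.20, 0.10]              # site S2

def stochastically_dominates(pa, pb):
    cdf_a, cdf_b = list(accumulate(pa)), list(accumulate(pb))
    return all(a <= b + 1e-12 for a, b in zip(cdf_a, cdf_b))

print(stochastically_dominates(p1, p2))   # True: S1 dominates S2

# Consistency with expected utility for a monotonic U, e.g. U(x) = x:
print(sum(p * v for p, v in zip(p1, values)) >= sum(p * v for p, v in zip(p2, values)))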
Stochastic dominance

Stochastic dominance can often be determined without exact distributions,
using qualitative reasoning
    e.g., construction cost increases with distance from the city:
        S_1 is closer to the city than S_2
        ⇒ S_1 stochastically dominates S_2 on cost
    e.g., injury increases with collision speed

Can annotate belief networks with stochastic dominance information
    X →+ Y (X positively influences Y) means that
    for every value z of Y's other parents Z,
        ∀x_1, x_2  x_1 ≥ x_2 ⇒ P(Y | x_1, z) stochastically dominates P(Y | x_2, z)
Label the arcs + or −

[Figure: car-insurance belief network with nodes SocioEcon, Age, GoodStudent,
ExtraCar, Mileage, RiskAversion, VehicleYear, SeniorTrain, MakeModel,
DrivingSkill, DrivingHist, Antilock, DrivQuality, HomeBase, AntiTheft,
CarValue, Airbag, Accident, Ruggedness, Theft, OwnDamage, Cushioning, OwnCost,
OtherCost, MedicalCost, LiabilityCost, PropertyCost; successive slides add
+ / − labels to individual arcs]