Decision Theory: Sequential Decisions
Computer Science CPSC 322, Lecture 34 (Textbook Chpt 9.3)
Nov 29, 2013
“Single” Action vs. Sequence of Actions
• One-off decisions: a set of primitive decisions that can be treated as a single macro decision to be made before acting
• Sequence of actions: the agent makes observations, decides on an action, and carries out the action
Lecture Overview
• Sequential Decisions
• Representation
• Policies
• Finding Optimal Policies
Sequential decision problems
• A sequential decision problem consists of a sequence of decision variables D1, …, Dn.
• Each Di has an information set of variables pDi, whose value will be known at the time decision Di is made.
Sequential decisions: simplest possible case
• Only one decision! (but different from one-off decisions)
• Early in the morning I listen to the weather forecast: shall I take my umbrella today? (I'll have to go for a long walk at noon)
• What is a reasonable decision network?
[Clicker question: four candidate networks A–D over the nodes Weather@12, Morning Forecast, Take Umbrella, and U; the last option is “None of these”]
Sequential decisions: simplest possible case
• Only one decision! (but different from one-off decisions)
• Early in the morning: shall I take my umbrella today? (I'll have to go for a long walk at noon)
• Relevant random variables?
Policies for Sequential Decision Problems: Intro
• A policy specifies what an agent should do under each circumstance (for each decision, consider the parents of the decision node)
In the umbrella “degenerate” case there is a single decision D1 with parents pD1, and one possible policy is a single decision function from dom(pD1) to dom(D1).
How many policies?
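To make the count concrete, here is a minimal sketch that enumerates every policy for the degenerate single-decision case. The domains are illustrative assumptions: the Forecast parent is given three values and the umbrella decision is binary, so there are 2^3 = 8 decision functions.

```python
from itertools import product

# Hypothetical domains for the umbrella example (not from the slide):
# the Forecast parent has three values, the decision TakeUmbrella is binary.
forecast_values = ["sunny", "cloudy", "rainy"]
umbrella_values = [True, False]

# A policy for a single decision is one decision function:
# an assignment of an action to every value of the decision's parents.
policies = [dict(zip(forecast_values, actions))
            for actions in product(umbrella_values, repeat=len(forecast_values))]

print(len(policies))  # |dom(D)| ** |dom(pD)| = 2 ** 3 = 8
```

In general a decision with parent-assignment count m and b possible actions has b^m decision functions, which is the pattern the later complexity slide builds on.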
Sequential decision problems: “complete” example
• A sequential decision problem consists of a sequence of decision variables D1, …, Dn.
• Each Di has an information set of variables pDi, whose value will be known at the time decision Di is made.
No-forgetting decision network:
• decisions are totally ordered
• if a decision Db comes before Da, then
  • Db is a parent of Da
  • any parent of Db is a parent of Da
Policies for Sequential Decision Problems
• A policy is a sequence δ1, …, δn of decision functions δi : dom(pDi) → dom(Di)
• This policy means that when the agent has observed O ∈ dom(pDi), it will do δi(O)
Example decision function for Call (parents: Report, CheckSmoke, SeeSmoke):

Report  CheckSmoke  SeeSmoke | Call
true    true        true     | true
true    true        false    | false
true    false       true     | true
true    false       false    | false
false   true        true     | true
false   true        false    | false
false   false       true     | false
false   false       false    | false

How many policies?
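A decision function is just a lookup table from parent values to an action, so a dict is a natural sketch. The table for Call reconstructed above happens to equal SeeSmoke ∧ (Report ∨ CheckSmoke), which the comprehension below uses as a compact way to fill in all eight rows; the policy count then follows the b^(2^k) pattern.

```python
from itertools import product

# Decision functions as lookup tables, matching the example's structure:
# CheckSmoke has parent Report; Call has parents Report, CheckSmoke, SeeSmoke.
check_smoke = {True: True, False: False}
call = {(r, c, s): s and (r or c)        # compact form of the 8-row table
        for r, c, s in product([True, False], repeat=3)}

print(call[(True, True, True)])    # true  (first table row)
print(call[(False, False, True)])  # false (seventh table row)

# How many policies?  Each decision contributes b ** (number of parent
# assignments) decision functions; a policy picks one function per decision.
n_check_smoke_functions = 2 ** 2   # 2 actions, 2 assignments to Report
n_call_functions = 2 ** 8          # 2 actions, 8 assignments to 3 parents
print(n_check_smoke_functions * n_call_functions)  # 1024
```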
Lecture Overview
• Recap
• Sequential Decisions
• Finding Optimal Policies
When does a possible world satisfy a policy?
• A possible world specifies a value for each random variable and each decision variable.
• Possible world w satisfies policy δ, written w ⊨ δ, if the value of each decision variable is the value selected by its decision function in the policy (when applied in w).
Decision function for CheckSmoke (parent: Report):
Report | CheckSmoke
true   | true
false  | false
Decision function for Call (parents: Report, CheckSmoke, SeeSmoke):
Report  CheckSmoke  SeeSmoke | Call
true    true        true     | true
true    true        false    | false
true    false       true     | true
true    false       false    | false
false   true        true     | true
false   true        false    | false
false   false       true     | false
false   false       false    | false
Example world w: Tampering=false, Fire=true, Alarm=true, Smoke=true, Leaving=true, Report=false, SeeSmoke=false, CheckSmoke=false, Call=false
When does a possible world satisfy a policy?
• Possible world w satisfies policy δ, written w ⊨ δ, if the value of each decision variable is the value selected by its decision function in the policy (when applied in w).
Decision function for CheckSmoke (parent: Report):
Report | CheckSmoke
true   | true
false  | false
Decision function for Call (parents: Report, CheckSmoke, SeeSmoke):
Report  CheckSmoke  SeeSmoke | Call
true    true        true     | true
true    true        false    | false
true    false       true     | true
true    false       false    | false
false   true        true     | true
false   true        false    | false
false   false       true     | false
false   false       false    | false
Example world w: Tampering=false, Fire=true, Alarm=true, Smoke=true, Leaving=true, Report=true, SeeSmoke=true, CheckSmoke=false, Call=false
Clicker question: does w satisfy δ?  A. Yes  B. No  C. Cannot tell
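The satisfaction check is mechanical: for every decision variable, read off the world's values for its parents, look up the decision function, and compare with the world's value for the decision itself. Here is a minimal sketch, using the two decision functions from the example and a world reconstructed from the slide (the exact world values are an assumption, so treat them as illustrative).

```python
from itertools import product

def satisfies(world, policy):
    """world: dict var -> value.  policy: list of
    (decision_name, parent_names, decision_function) triples."""
    for decision, parents, func in policy:
        observed = tuple(world[p] for p in parents)
        if world[decision] != func[observed]:
            return False            # a decision disagrees with its function
    return True

check_smoke = {(True,): True, (False,): False}
call = {(r, c, s): s and (r or c)
        for r, c, s in product([True, False], repeat=3)}
policy = [("CheckSmoke", ["Report"], check_smoke),
          ("Call", ["Report", "CheckSmoke", "SeeSmoke"], call)]

# Reconstructed world: Report=true but CheckSmoke=false, which violates
# the CheckSmoke decision function (true -> true), so w does not satisfy δ.
w = {"Tampering": False, "Fire": True, "Alarm": True, "Smoke": True,
     "Leaving": True, "Report": True, "SeeSmoke": True,
     "CheckSmoke": False, "Call": False}
print(satisfies(w, policy))  # False
```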
Expected Value of a Policy
• Each possible world w has a probability P(w) and a utility U(w)
• The expected utility of policy δ is
  E(U | δ) = Σ over worlds w with w ⊨ δ of P(w) · U(w)
• The optimal policy is one with the maximum expected utility.
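For a small network the definition can be applied directly: enumerate the worlds that satisfy a policy, weight each utility by the world's probability, and take the best policy over the whole policy space. The sketch below does this for an umbrella-style network; all probabilities and utilities are made-up illustrative numbers, not values from the slides.

```python
from itertools import product

# Illustrative model (assumed numbers): Rain is a root variable,
# Forecast depends on Rain, and utility depends on Rain and TakeUmbrella.
p_rain = 0.3
p_forecast_given = {True:  {"rainy": 0.7, "sunny": 0.3},   # P(Forecast | Rain)
                    False: {"rainy": 0.2, "sunny": 0.8}}
utility = {(True, True): 70, (True, False): 0,             # U(Rain, Take)
           (False, True): 20, (False, False): 100}

def expected_utility(policy):
    """E(U | delta) = sum over worlds satisfying delta of P(w) * U(w)."""
    eu = 0.0
    for rain, forecast in product([True, False], ["rainy", "sunny"]):
        take = policy[forecast]            # the world must obey the policy
        p = (p_rain if rain else 1 - p_rain) * p_forecast_given[rain][forecast]
        eu += p * utility[(rain, take)]
    return eu

# Optimal policy: maximize expected utility over all 2^2 = 4 policies.
best = max((dict(zip(["rainy", "sunny"], acts))
            for acts in product([True, False], repeat=2)),
           key=expected_utility)
print(best)                    # {'rainy': True, 'sunny': False}
print(expected_utility(best))  # 73.5
```

With these numbers, taking the umbrella exactly when the forecast is rainy beats the three other policies, which is the intuitive answer.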
Lecture Overview
• Recap
• Sequential Decisions
• Finding Optimal Policies (Efficiently)
Complexity of finding the optimal policy: how many policies?
• If a decision D has k binary parents, how many assignments of values to the parents are there?  2^k
• If there are b possible actions (possible values for D), how many different decision functions are there?  b^(2^k)
• If there are d decisions, each with k binary parents and b possible actions, how many policies are there?  (b^(2^k))^d
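The three counts above compose directly, which a few lines of arithmetic make explicit (the function name `n_policies` is mine, not from the slides):

```python
def n_policies(d, k, b):
    """Number of policies for d decisions, each with k binary parents
    and b possible actions: (b ** (2 ** k)) ** d."""
    n_assignments = 2 ** k                 # assignments to the parents
    n_decision_functions = b ** n_assignments
    return n_decision_functions ** d

print(n_policies(d=1, k=1, b=2))   # 4, e.g. CheckSmoke with parent Report
print(n_policies(d=2, k=3, b=2))   # 65536, already huge for a tiny network
```

Even d = 2, k = 3, b = 2 yields 65,536 policies, which motivates the variable-elimination approach on the next slide.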
Finding the optimal policy more efficiently: VE
1. Create a factor for each conditional probability table and a factor for the utility.
2. Sum out the random variables that are not parents of a decision node.
3. Eliminate (i.e., max out) the decision variables, starting from the last decision.
4. Sum out the remaining random variables.
5. Multiply the factors: this is the expected utility of the optimal policy.
Eliminate the decision variables: step 3 details
• Select the variable D that corresponds to the latest decision to be made
  • this variable will appear in only one factor, together with its parents
• Eliminate D by maximizing. This returns:
  • a new factor to use in VE, max_D f
  • the optimal decision function for D, arg max_D f
• Repeat until there are no more decision nodes.
Example: eliminate CheckSmoke.
Factor f(Report, CheckSmoke):
Report  CheckSmoke | Value
true    true       | -5.0
true    false      | -5.6
false   true       | -23.7
false   false      | -17.5
New factor max_CheckSmoke f:
Report | Value
true   | -5.0
false  | -17.5
Decision function arg max_CheckSmoke f:
Report | CheckSmoke
true   | true
false  | false
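The maximization step is a one-pass scan over the factor: for each assignment to the decision's parents, keep the largest value (that is max_D f) and remember which action achieved it (that is arg max_D f). A minimal sketch, using the factor values as reconstructed from the slide:

```python
# Factor f(Report, CheckSmoke), keyed by (report, check_smoke).
factor = {(True, True): -5.0, (True, False): -5.6,
          (False, True): -23.7, (False, False): -17.5}

new_factor = {}          # max_D f, indexed by the parent (Report)
decision_function = {}   # arg max_D f, the optimal decision for each Report

for (report, check_smoke), value in factor.items():
    if report not in new_factor or value > new_factor[report]:
        new_factor[report] = value               # keep the best value
        decision_function[report] = check_smoke  # and the action achieving it

print(new_factor)         # {True: -5.0, False: -17.5}
print(decision_function)  # {True: True, False: False}
```

The new factor replaces the old one in VE, and the recorded decision function becomes the CheckSmoke component of the optimal policy.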
VE reduces the complexity of finding the optimal policy
• We have seen that if there are d decisions, each with k binary parents and b possible actions, then there are (b^(2^k))^d policies.
• Variable elimination lets us find the optimal policy after considering only d · b^(2^k) policies (we eliminate one decision at a time).
• VE is much more efficient than searching through policy space.
• However, this complexity is still doubly exponential (in the number of parents k), so we'll only be able to handle relatively small problems.
Learning Goals for today's class
You can:
• Represent sequential decision problems as decision networks, and explain the no-forgetting property.
• Verify whether a possible world satisfies a policy, and define the expected value of a policy.
• Compute the number of policies for a decision problem.
• Compute the optimal policy by variable elimination.
Big Picture: Planning under Uncertainty
Probability Theory + Decision Theory
• One-Off Decisions / Sequential Decisions
• Markov Decision Processes (MDPs): Fully Observable MDPs and Partially Observable MDPs (POMDPs)
Applications: Decision Support Systems (medicine, business, …), Economics, Control Systems, Robotics
CPSC 322 Big Picture
Environment: Deterministic vs. Stochastic
• Static problems: Constraint Satisfaction (Vars + Constraints) via Arc Consistency, Search, SLS; Logics (Query) via Search; Belief Nets (Query) via Variable Elimination; Markov Chains
• Sequential problems: STRIPS Planning via Search; Decision Nets via Variable Elimination
(Representation on the left, Reasoning Technique on the right)