Generating Plans in Concurrent, Probabilistic, Oversubscribed - PowerPoint PPT Presentation

Generating Plans in Concurrent, Probabilistic, Oversubscribed Domains
Li Li and Nilufer Onder
Department of Computer Science, Michigan Technological University
(Presented by: Li Li)
AAAI 08, Chicago, July 16, 2008

Outline
- Example domain
- Two usages of concurrent actions
- AO* and CPOAO* algorithms
- Heuristics used in CPOAO*
- Experiment results
- Conclusion and future work

A simple Mars rover domain
Locations A, B, C and D on Mars
[Figure: map showing the locations A, B, C and D]

Main features
- Aspects of complex domains
  - Deadlines, limited resources
  - Failures
  - Oversubscription
  - Concurrency
- Two types of parallel actions
  - Different goals (“all finish”)
  - Redundant (“early finish”)
- Aborting actions
  - When they succeed
  - When they fail

The actions
Action          Success probability   Description
Move (L1,L2)    100%                  Move the rover from location L1 to location L2
Sample (L)      70%                   Collect a soil sample at location L
Camera (L)      60%                   Take a picture at location L

Problem 1
- Initial state:
  - The rover is at location A
  - No other rewards have been achieved
- Rewards:
  - r1 = 10: Get back to location A
  - r2 = 2: Take a picture at location B
  - r3 = 1: Collect a soil sample at location B
  - r4 = 3: Take a picture at location C

Problem 1
- Time limit:
  - The rover is only allowed to operate for 3 time units
- Actions:
  - Each action takes 1 time unit to finish
  - Actions can be executed in parallel if they are compatible

A solution to problem 1
(1) Move (A, B)
(2) Camera (B), Sample (B)
(3) Move (B, A)
[Figure: map of locations A, B, C and D annotated with the rewards R1=10, R2=2, R3=1 and R4=3]
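As a rough check on this plan, the sketch below computes its expected total reward from the success probabilities in the action table. It assumes (the slides do not say so explicitly) that rewards are additive and that a reward is earned exactly when the action achieving it succeeds. The function name is hypothetical.

```python
def expected_plan_reward(certain_rewards, probabilistic_rewards):
    """Expected total reward of a plan.

    certain_rewards: rewards earned with probability 1
    probabilistic_rewards: (success_probability, reward) pairs
    """
    return sum(certain_rewards) + sum(p * r for p, r in probabilistic_rewards)


# Move(A,B) and Move(B,A) always succeed, so r1 = 10 is certain.
# Camera(B) earns r2 = 2 with probability 0.6; Sample(B) earns r3 = 1 with probability 0.7.
value = expected_plan_reward(
    certain_rewards=[10],
    probabilistic_rewards=[(0.6, 2), (0.7, 1)],
)
print(round(value, 2))  # 10 + 1.2 + 0.7
```

Under these assumptions the plan is worth about 11.9. The reward r4 is not attempted, because the three time units are used up by the trip to B and back.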

Add redundant actions
- Actions Camera0 (60%) and Camera1 (50%) can be executed concurrently.
- There are two rewards:
  - R1: Take a picture P1 at location A
  - R2: Take a picture P2 at location A

Two ways of using concurrent actions
- All finish: Speed up the execution
  Use concurrent actions to achieve different goals.
- Early finish: Redundancy for critical tasks
  Use concurrent actions to achieve the same goal.

Example of all finish actions
- If R1=10 and R2=10, execute Camera0 to achieve one reward and Camera1 to achieve the other. (All finish)
  The expected total reward = 10*60% + 10*50% = 11

Example of early finish actions
- If R1=100 and R2=10, use both Camera0 and Camera1 to achieve R1. (Early finish)
  The expected total reward = 100*50% + (100-100*50%)*60% = 50 + 30 = 80
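The two calculations above can be reproduced with a short sketch. It assumes the two camera actions succeed or fail independently, which is what the arithmetic on the slides implies. The function names are hypothetical.

```python
def all_finish_value(rewards, probs):
    """Each action pursues a different reward."""
    return sum(r * p for r, p in zip(rewards, probs))


def early_finish_value(reward, probs):
    """All actions pursue the same reward; it is earned if at least one succeeds."""
    p_all_fail = 1.0
    for p in probs:
        p_all_fail *= 1.0 - p
    return reward * (1.0 - p_all_fail)


camera_probs = [0.6, 0.5]  # Camera0, Camera1

# R1 = R2 = 10: splitting the cameras across the two rewards
print(round(all_finish_value([10, 10], camera_probs), 2))
# R1 = 100: both cameras go after R1, R2 is not attempted
print(round(early_finish_value(100, camera_probs), 2))
```

The same functions show why the choice depends on the rewards. With R1=100 and R2=10, the all-finish assignment is worth 100*0.6 + 10*0.5 = 65, less than the 80 from redundancy. With R1=R2=10, redundancy on one reward is worth only 8, less than the 11 from all finish.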

The AO* algorithm
AO* searches in an and-or graph (hypergraph).
[Figure: an and-or graph; labels: Hyperarcs (compact way), AND, OR]

Concurrent Probabilistic Oversubscription AO* (CPOAO*)
- Concurrent action set
  - Represent parallel actions rather than individual actions
  - Use hyperarcs to represent them
- State space
  - Resource levels are part of a state
  - Unfinished actions are part of a state
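One way to picture such a state is sketched below. The class, its field names and the `advance` helper are hypothetical and are not taken from the authors' implementation; time is assumed to be the only resource. The point is that two situations with the same facts but different remaining time, or different actions still running, are different search states.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class State:
    propositions: FrozenSet[str]            # facts that currently hold
    time_left: int                          # resource level (time only)
    unfinished: FrozenSet[Tuple[str, int]]  # (action, remaining duration)


def advance(state: State, elapsed: int) -> State:
    """Move the clock forward; actions that complete are dropped here
    (applying their effects is left out of this sketch)."""
    still_running = frozenset(
        (name, left - elapsed)
        for name, left in state.unfinished
        if left > elapsed
    )
    return State(state.propositions, state.time_left - elapsed, still_running)


# Three actions with durations 6, 4 and 5 are started with 7 time units left.
start = State(
    propositions=frozenset({"At_Location(B)"}),
    time_left=7,
    unfinished=frozenset({("Sample", 6), ("Image_S", 4), ("Image_L", 5)}),
)
after = advance(start, 4)  # the shortest action has just finished
print(after.time_left, sorted(after.unfinished))
```

Because the dataclass is frozen and uses frozensets, states are hashable, so a search can detect that two paths reach the same state.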

CPOAO* search Example
A Mars Rover problem
Map: [Figure: locations A, B, C and D with travel times marked on the edges (edge labels 4, 3, 6, 4, 5)]
Targets:
- I1 – Image at location B
- I2 – Image at location C
- S1 – Sample at location B
- S2 – Sample at location D
Rewards:
- Have_Picture(I1) = 3
- Have_Picture(I2) = 4
- Have_Sample(S1) = 4
- Have_Sample(S2) = 5
- At_location(A) = 10
Actions:
- Move (Location, Location)
- Image_S (Target) 50%, T = 4
- Image_L (Target) 60%, T = 5
- Sample (Target) 70%, T = 6

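CPOAO* search Example
[Figure: the search tree contains only the root S0 (T=10) with value 18.4, shown next to the map of locations A, B, C and D]
Legend: expected reward calculated using the heuristics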

CPOAO* Search Example
[Figure: S0 (T=10) is expanded with the action sets {Move(A,B)}, {Move(A,C)}, {Move(A,D)} and Do-nothing. Children: S1 (T=7, value 15.2), S2 (T=6, value 13.2), S3 (T=4, value 0) and S4 (T=10, value 10). The value of S0 is updated to 15.2.]
Legend: best action; expected reward calculated using the heuristics; expected reward calculated from children; values of terminal nodes

CPOAO* search Example
[Figure: S1 (T=7) is expanded below S0 and {Move(A,B)}. Action sets at S1: {a1,a2,a3}, Do-nothing and {Move(B,D)}. {a1,a2,a3} leads with 50% probability each to S5 (T=3, value 15.8) and S6 (T=3, value 14.6), both annotated Sample(T2)=2 and Image_L(T1)=1. The other children are S7 (T=3, value 0) and S8 (T=7, value 0). S0 and S1 both show 15.2.]
Actions: a1: Sample(T2); a2: Image_S(T1); a3: Image_L(T1)
Legend: best action; expected reward calculated using the heuristics; expected reward calculated from children; values of terminal nodes

CPOAO* search Example
[Figure: S5 and S6 are expanded with the action sets {a1}, {a1,a2}, {a4} and {a5}, giving the leaves S9 to S16 with values 7, 3, 13, 2.8, 10, 0, 3 and 5.8. Values calculated from children: S5 = 13, S6 = 10, S1 = 11.5. S2 still shows 13.2, and S0 is updated to 13.2.]
Actions: a1: Sample(T2); a2: Image_S(T1); a3: Image_L(T1); a4: Move(B,A); a5: Do-nothing
Legend: best action; expected reward calculated using the heuristics; expected reward calculated from children; values of terminal nodes

CPOAO* search Example
[Figure: the top of the tree after the update. S0 (T=10) = 13.2, with children S1 (T=7) = 11.5, S2 (T=6) = 13.2, S3 (T=4) = 0 and S4 (T=10) = 10. The action set {a1,a2,a3} is shown below S1.]
Actions: a1: Sample(T2); a2: Image_S(T1); a3: Image_L(T1); a4: Move(B,A); a5: Do-nothing
Legend: best action; expected reward calculated using the heuristics; expected reward calculated from children; values of terminal nodes

CPOAO* search Example
[Figure: S2 (T=6) is expanded with the action sets {a6,a7}, Do-nothing and {a8}. {a6,a7} leads with 50% probability each to S17 (T=2, value 4) and S18 (T=2, value 2.4); S19 and S20 have value 0. S2 is updated to 3.2 and S0 to 11.5. S1 (T=7) = 11.5, S3 (T=4) = 0, S4 (T=10) = 10.]
Actions: a1: Sample(T2); a2: Image_S(T1); a3: Image_L(T1); a4: Move(B,A); a5: Do-nothing; a6: Image_S(T3); a7: Image_L(T3); a8: Move(C,D)
Legend: best action; expected reward calculated using the heuristics; expected reward calculated from children; values of terminal nodes

CPOAO* search Example
[Figure: the subtree under S0, {Move(A,B)} and S1 with its final values. S0 = 11.5, S1 = 11.5, S5 (T=3) = 13, S6 (T=3) = 10, S7 = 0, S8 (T=7) = 0. The leaves S9 to S16 have values 5, 3, 13, 1.4, 10, 0, 3 and 4.4.]
Actions: a1: Sample(T1); a2: Image_S(T2); a3: Image_L(T2); a4: Move(B,A); a5: Do-nothing
Legend: best action; expected reward calculated using the heuristics; expected reward calculated from children; values of terminal nodes
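The values in this example are consistent with the usual and-or backup: an action set is worth the probability-weighted average of its outcomes, and a state is worth its best action set. The sketch below is an illustration of that rule, not the authors' code. It assumes child values already include any reward collected on the way, and the assignment of values to states is read off the flattened figures.

```python
def backup(hyperarcs):
    """Return (value, best_action_set) for a state.

    hyperarcs maps an action-set label to its outcomes,
    a list of (probability, child_value) pairs.
    """
    def arc_value(label):
        return sum(p * v for p, v in hyperarcs[label])

    best = max(hyperarcs, key=arc_value)
    return arc_value(best), best


# S1: {a1,a2,a3} reaches S5 (13) or S6 (10) with 50% each.
s1_value, s1_best = backup({
    "{a1,a2,a3}": [(0.5, 13), (0.5, 10)],
    "Do-nothing": [(1.0, 0)],
    "{Move(B,D)}": [(1.0, 0)],
})

# S2: {a6,a7} reaches S17 (4) or S18 (2.4) with 50% each.
s2_value, _ = backup({
    "{a6,a7}": [(0.5, 4), (0.5, 2.4)],
    "Do-nothing": [(1.0, 0)],
    "{a8}": [(1.0, 0)],
})

# S0: one deterministic child per action set.
s0_value, s0_best = backup({
    "{Move(A,B)}": [(1.0, s1_value)],
    "{Move(A,C)}": [(1.0, s2_value)],
    "{Move(A,D)}": [(1.0, 0)],
    "Do-nothing": [(1.0, 10)],
})
print(round(s1_value, 2), round(s2_value, 2), round(s0_value, 2), s0_best)
```

This gives 11.5 for S1, 3.2 for S2 and 11.5 for S0 with {Move(A,B)} as the best action set, matching the numbers on the slides.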

CPOAO* search improvements
- Estimate total expected rewards
- Prune branches
Plan Found:
- Move(A,B)
- Image_S(T1)
- Move(B,A)
[Figure: the final search tree under S0 = 11.5 and S1 = 11.5, with S5 = 13, S6 = 10 and leaves S9 to S16 valued 5, 3, 13, 1.4, 10, 0, 3 and 4.4]

Heuristics used in CPOAO*
- A heuristic function to estimate the total expected reward for the newly generated states using a reverse plan graph.
- A group of rules to prune the branches of the concurrent action sets.

Estimating total rewards
A three-step process using an rpgraph
- Generate an rpgraph for each goal
  1. Identify the enabled propositions
  2. Compute the probability of achieving each goal
  3. Compute the expected rewards based on the probabilities
- Sum up the rewards to compute the value of this state
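The three steps can be sketched as follows. This is a reading of the slides under stated assumptions, not the authors' implementation: an achiever of a goal counts as enabled when its accumulated cost fits in the remaining time, the enabled achievers are treated as independent attempts, and moves always succeed. The accumulated costs for targets other than I1 are inferred from the travel times implied by the search example (A to B 3, A to C 4, A to D 6). All names are hypothetical.

```python
def estimate_state_value(time_left, goals):
    """goals: list of (reward, achievers), where achievers is a list of
    (success_probability, accumulated_cost) pairs for that goal."""
    total = 0.0
    for reward, achievers in goals:
        p_all_fail = 1.0
        for success_prob, cost in achievers:
            if cost <= time_left:                 # step 1: enabled?
                p_all_fail *= 1.0 - success_prob  # step 2: goal probability
        total += reward * (1.0 - p_all_fail)      # step 3: expected reward
    return total                                  # sum over all goals


# Initial state of the rover example: at A with T = 10.
initial_goals = [
    (10, [(1.0, 0)]),           # At_location(A): already true
    (3, [(0.5, 7), (0.6, 8)]),  # Have_Picture(I1): Image_S, Image_L at B
    (4, [(0.5, 8), (0.6, 9)]),  # Have_Picture(I2): Image_S, Image_L at C (costs assumed)
    (4, [(0.7, 9)]),            # Have_Sample(S1): Sample at B (cost assumed)
    (5, [(0.7, 12)]),           # Have_Sample(S2): Sample at D (cost assumed, too late)
]
value = estimate_state_value(10, initial_goals)
print(round(value, 2))
```

Under these assumptions the estimate for the initial state is about 18.4, the value shown at S0 before it is expanded in the search example. The estimate is optimistic: it ignores that the rover cannot pursue every enabled goal within the same time budget.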

Heuristics to estimate the total rewards
- Reverse plan graph
- Start from goals.
- Layers of actions and propositions
- Cost marked on the actions
- Accumulated cost marked on the propositions.
[Figure: reverse plan graph for the goal Have_Picture(I1). Branch through Image_S(I1) (cost 4): At_Location(B) has accumulated cost 4; Move(A,B) (cost 3) gives At_Location(A) 7; Move(D,B) (cost 4) gives At_Location(D) 8. Branch through Image_L(I1) (cost 5): At_Location(B) 5, At_Location(A) 8, At_Location(D) 9.]
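The accumulated costs in the graph can be reproduced with a small backward computation. The sketch below swaps in a backward shortest-path search (Dijkstra's algorithm) for the layered graph construction described on the slide. On this example it yields the same numbers, but it is not claimed to be the authors' procedure. The function name and argument layout are hypothetical.

```python
import heapq


def accumulated_costs(final_cost, target_location, moves):
    """Cheapest total time to achieve the goal from each location.

    final_cost: duration of the action that achieves the goal
    target_location: where that action has to be executed
    moves: dict mapping (source, destination) to the move duration
    """
    costs = {target_location: final_cost}
    frontier = [(final_cost, target_location)]
    while frontier:
        cost, location = heapq.heappop(frontier)
        if cost > costs.get(location, float("inf")):
            continue  # stale queue entry
        for (src, dst), duration in moves.items():
            new_cost = cost + duration
            if dst == location and new_cost < costs.get(src, float("inf")):
                costs[src] = new_cost
                heapq.heappush(frontier, (new_cost, src))
    return costs


moves = {("A", "B"): 3, ("D", "B"): 4}      # the two moves drawn in the graph
via_image_s = accumulated_costs(4, "B", moves)  # Image_S(I1) takes 4
via_image_l = accumulated_costs(5, "B", moves)  # Image_L(I1) takes 5
print(via_image_s, via_image_l)
```

For the Image_S branch this gives 4 at B, 7 at A and 8 at D; for the Image_L branch 5, 8 and 9. With the rover at A and 7 time units left, only the Image_S branch is reachable in time.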

Heuristics to estimate the total rewards
- Given a specific state …
  At_Location(A)=T
  Time= 7
- Enabled propositions are marked in blue.
[Figure: the reverse plan graph for Have_Picture(I1) from the previous slide, with the enabled propositions highlighted]

Heuristics to estimate the total rewards
- Enable more actions and propositions
- Actions are probabilistic
- Estimate the probabilities of the propositions and the goals being true
- Sum up rewards on all goals.
[Figure: the reverse plan graph for Have_Picture(I1) from the previous slides, with more actions and propositions enabled]

Rules to prune branches (when time is the only resource)
- Include the action if it does not delete anything
  Ex. {action-1, action-2, action-3} is better than {action-2, action-3} if action-1 does not delete anything.
- Include the action if it can be aborted later
  Ex. {action-1, action-2} is better than {action-1} if the duration of action-2 is longer than the duration of action-1.
- Don’t abort an action and restart it again immediately
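The first rule can be illustrated with a small filter over candidate action sets. This is a sketch under simplifying assumptions: all applicable actions are taken to be mutually compatible, each action's delete effects are given as a set, and only the first rule is covered. The function name and data layout are hypothetical.

```python
from itertools import combinations


def prune_by_first_rule(applicable, deletes):
    """Keep only the non-empty action sets that contain every applicable
    action that deletes nothing (leaving one out cannot help)."""
    harmless = frozenset(a for a in applicable if not deletes.get(a))
    kept = []
    for size in range(1, len(applicable) + 1):
        for combo in combinations(applicable, size):
            if harmless <= frozenset(combo):
                kept.append(frozenset(combo))
    return kept


applicable = ["action-1", "action-2", "action-3"]
deletes = {
    "action-1": set(),   # deletes nothing
    "action-2": {"p"},   # hypothetical delete effects
    "action-3": {"q"},
}
kept = prune_by_first_rule(applicable, deletes)
print(len(kept), frozenset({"action-2", "action-3"}) in kept)
```

Of the seven non-empty subsets, only the four that contain action-1 survive, so {action-2, action-3} is pruned in favour of {action-1, action-2, action-3}, as in the example on the slide.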
