Last update: May 1, 2020

Chapter 6: Deliberation with Probabilistic Domain Models

Lecture slides for Automated Planning and Acting
Malik Ghallab, Dana Nau and Paolo Traverso
http://www.laas.fr/planning

Dana S. Nau, University of Maryland

Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License
Motivation

● Situations where actions have multiple possible outcomes, and each outcome has a probability
● Several possible action representations
  ▸ Bayes nets, probabilistic actions, …
● Book doesn't commit to any representation
  ▸ Mainly concentrates on the underlying semantics

Example of a probabilistic action:
  roll-die(d)
    pre: holding(d) = true
    eff: 1/6: top(d) ← 1
         1/6: top(d) ← 2
         1/6: top(d) ← 3
         1/6: top(d) ← 4
         1/6: top(d) ← 5
         1/6: top(d) ← 6
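Below is a minimal sketch, not from the book, of how a probabilistic action like roll-die could be executed in Python. The state layout (`holding` and `top` dicts) and the function name are illustrative assumptions.

```python
import random

def roll_die(state, d):
    """Execute roll-die(d): precondition holding(d) = true,
    then each of the six faces comes up with probability 1/6."""
    assert state["holding"][d]                            # pre: holding(d) = true
    state["top"][d] = random.choice([1, 2, 3, 4, 5, 6])   # eff: uniform outcome

# Usage: state = {"holding": {"die1": True}, "top": {"die1": None}}
#        roll_die(state, "die1")
```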
Probabilistic planning domain

[Figure: road map with locations d1–d5; some roads one-way, some two-way]

Definitions:
Σ = (S, A, γ, Pr, cost)
● S = {states}
● A = {actions}
● γ : S × A → 2^S
● Pr(s′ | s, a) = probability of going to state s′ if we apply a in s
  ▸ Pr(s′ | s, a) ≠ 0 iff s′ ∈ γ(s, a)
● cost: S × A → ℝ≥0
  ▸ cost(s, a) = cost of action a in state s
  ▸ may omit; default is cost(s, a) = 1
● Applicable(s) = {a | γ(s, a) ≠ ∅}

Example:
● Start at d1, want to get to d4
  ▸ Start: s0 = d1; Goal: Sg = {d4}
● Some roads are one-way, some are two-way
● Unreliable steering, especially on hills
  ▸ may slip and go elsewhere
● Simplified state and action names:
  ▸ write {loc(r1) = d2} as d2
  ▸ write move(r1, d2, d3) as m23
● γ(d1, m12) = {d2}
  ▸ Pr(d2 | d1, m12) = 1
● γ(d1, m14) = {d1, d4}
  ▸ Pr(d4 | d1, m14) = 0.5
  ▸ Pr(d1 | d1, m14) = 0.5
● γ(d2, m23) = {d3, d5}
  ▸ Pr(d3 | d2, m23) = 0.8
  ▸ Pr(d5 | d2, m23) = 0.2
● m21, m34, m41, m43, m45, m52, m54: like m12
● there's no m25
Probabilistic planning domain (continued)

[Figure: the road-map domain. Deterministic moves m12, m21, m34, m41, m43, m45, m52, m54; m14 reaches d4 or stays at d1 with probability 0.5 each; m23 reaches d3 with probability 0.8 or slips to d5 with probability 0.2. Start: s0 = d1; Goal: Sg = {d4}]

Definitions and example as on the previous slide:
● γ(d1, m12) = {d2}; Pr(d2 | d1, m12) = 1
● m21, m34, m41, m43, m45, m52, m54: like m12
● γ(d1, m14) = {d1, d4}; Pr(d4 | d1, m14) = Pr(d1 | d1, m14) = 0.5
● γ(d2, m23) = {d3, d5}; Pr(d3 | d2, m23) = 0.8, Pr(d5 | d2, m23) = 0.2
● there's no m25

Poll: Can a plan (a sequence of actions) be a solution for this problem?
  1. yes
  2. no
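As a concrete reference for the following slides, here is a minimal sketch of the example domain as plain Python data. The variable names (`gamma`, `s0`, `Sg`) are my own, and the deterministic moves follow the "like m12" list above.

```python
# gamma[s][a] maps each possible successor s' to Pr(s' | s, a).
# Deterministic moves have a single successor with probability 1.
gamma = {
    "d1": {"m12": {"d2": 1.0}, "m14": {"d4": 0.5, "d1": 0.5}},
    "d2": {"m21": {"d1": 1.0}, "m23": {"d3": 0.8, "d5": 0.2}},
    "d3": {"m34": {"d4": 1.0}},
    "d4": {"m41": {"d1": 1.0}, "m43": {"d3": 1.0}, "m45": {"d5": 1.0}},
    "d5": {"m52": {"d2": 1.0}, "m54": {"d4": 1.0}},
}
s0, Sg = "d1", {"d4"}

def applicable(s):
    """Applicable(s) = {a | gamma(s, a) is nonempty}."""
    return set(gamma.get(s, {}))
```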
Policies, Problems, Solutions

[Figure: the road-map domain, as before]

● Stochastic shortest path (SSP) problem:
  ▸ a triple (Σ, s0, Sg)
● Policy: partial function π : S → A such that for every s ∈ Dom(π) ⊆ S, π(s) ∈ Applicable(s)
● Transitive closure:
  ▸ γ̂(s, π) = {s and all states reachable from s using π}
● Graph(s, π) = rooted graph induced by π at s
  ▸ nodes: γ̂(s, π); edges: state transitions
● leaves(s, π) = γ̂(s, π) ∖ Dom(π)
● Solution for (Σ, s0, Sg): a policy π such that s0 ∈ Dom(π) and leaves(s0, π) ∩ Sg ≠ ∅
  ▸ http://www.cs.umd.edu/users/nau/apa/slides/errata.pdf

Example (see the sketch below):
● π1 = {(d1, m12), (d2, m23), (d3, m34)}
  ▸ Dom(π1) = {d1, d2, d3}
  ▸ γ̂(d1, π1) = {d1, d2, d3, d4, d5}
  ▸ leaves(d1, π1) = γ̂(d1, π1) ∖ Dom(π1) = {d4, d5}
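A minimal sketch of γ̂ and leaves, reusing the `gamma` dict from the earlier sketch; here a policy is simply a dict from states to actions.

```python
def gamma_hat(s, pi):
    """hat-gamma(s, pi): s and all states reachable from s using pi."""
    reached, frontier = {s}, [s]
    while frontier:
        u = frontier.pop()
        if u in pi:                       # pi specifies an action at u
            for v in gamma[u][pi[u]]:     # every possible outcome of that action
                if v not in reached:
                    reached.add(v)
                    frontier.append(v)
    return reached

def leaves(s, pi):
    """leaves(s, pi) = hat-gamma(s, pi) minus Dom(pi)."""
    return gamma_hat(s, pi) - set(pi)

pi1 = {"d1": "m12", "d2": "m23", "d3": "m34"}
assert gamma_hat("d1", pi1) == {"d1", "d2", "d3", "d4", "d5"}
assert leaves("d1", pi1) == {"d4", "d5"}
```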
Notation and Terminology

[Figure: the road-map domain, as before]

● A solution policy π is closed if it doesn't stop at non-goal states unless there's no way to continue
● Formally, π is closed if for every state s ∈ γ̂(s0, π), either
  ▸ s ∈ Dom(π) (i.e., π specifies an action at s), or
  ▸ s ∈ Sg, or
  ▸ Applicable(s) = ∅
● For the rest of this chapter we require all solutions to be closed
● Example (see the sketch below):
  ▸ π1 = {(d1, m12), (d2, m23), (d3, m34)} is not closed: it stops at d5, a non-goal state with applicable actions
  ▸ π2 = {(d1, m12), (d2, m23), (d3, m34), (d5, m54)} is closed
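A minimal sketch of the closedness test, reusing `gamma_hat` and `applicable` from the earlier sketches.

```python
def is_closed(pi, s0, Sg):
    """pi is closed if every state reachable from s0 is in Dom(pi),
    is a goal state, or has no applicable actions."""
    return all(s in pi or s in Sg or not applicable(s)
               for s in gamma_hat(s0, pi))

pi1 = {"d1": "m12", "d2": "m23", "d3": "m34"}
pi2 = dict(pi1, d5="m54")
assert not is_closed(pi1, "d1", {"d4"})   # pi1 stops at d5, a non-goal state
assert is_closed(pi2, "d1", {"d4"})
```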
Dead Ends

[Figure: variants of the road-map domain with an extra location d6, illustrating explicit and implicit dead ends]

● Dead end: a state or set of states from which the goal is unreachable
● Explicit dead end: no applicable actions
● Implicit dead end: applicable actions, but no path to the goal
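One way to detect dead ends (a sketch of my own, not the book's method): grow, by a backward fixpoint, the set of states from which some action sequence can reach the goal; every other state is a dead end, explicit or implicit alike.

```python
def dead_ends(states, Sg):
    """Return all dead ends: states from which the goal is unreachable."""
    can_reach = set(Sg)                  # states that can reach the goal
    changed = True
    while changed:
        changed = False
        for s in states:
            if s not in can_reach and any(set(gamma[s][a]) & can_reach
                                          for a in applicable(s)):
                can_reach.add(s)
                changed = True
    return set(states) - can_reach

# The original road map has no dead ends:
assert dead_ends(list(gamma), {"d4"}) == set()
```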
Histories

[Figure: the road-map domain, as before]

● History: sequence of states σ = ⟨s0, s1, s2, …⟩
  ▸ may be finite or infinite, e.g.,
    σ = ⟨d1, d2, d3, d4⟩
    σ = ⟨d1, d2, d1, d2, …⟩
● Let H(s, π) = {all possible histories if we start at s and follow π, stopping if we reach a state s′ such that s′ ∉ Dom(π) or s′ ∈ Sg}
● If σ ∈ H(s, π), then Pr(σ | s, π) = ∏_{(s_i, s_{i+1}) ∈ σ} Pr(s_{i+1} | s_i, π(s_i))
  ▸ the product of the probabilities of the state transitions in σ
  ▸ Thus ∑_{σ ∈ H(s,π)} Pr(σ | s, π) = 1
● Probability of reaching a goal state:
  ▸ Pr(Sg | s, π) = ∑ {Pr(σ | s, π) | σ ∈ H(s, π) and σ ends at a state in Sg}
  ▸ Formula in book is equivalent but more complicated
● Example (see the sketch below): π1 = {(d1, m12), (d2, m23), (d3, m34)}
  ▸ H(s0, π1) = {⟨d1, d2, d3, d4⟩, ⟨d1, d2, d5⟩}
  ▸ Pr(⟨d1, d2, d3, d4⟩ | s0, π1) = 1 × 0.8 × 1 = 0.8
  ▸ Pr(⟨d1, d2, d5⟩ | s0, π1) = 1 × 0.2 = 0.2
  ▸ Pr(Sg | s0, π1) = Pr(⟨d1, d2, d3, d4⟩ | s0, π1) = 0.8
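A minimal sketch that enumerates H(s, π) for an acyclic policy and computes Pr(σ | s, π) and Pr(Sg | s, π) as defined above, reusing the `gamma` dict; the function names are my own.

```python
def histories(s, pi, Sg):
    """Yield (sigma, Pr(sigma)) pairs; assumes Graph(s, pi) is acyclic."""
    if s in Sg or s not in pi:                   # stop at goals and at leaves
        yield [s], 1.0
        return
    for s_next, p in gamma[s][pi[s]].items():    # branch on each outcome
        for sigma, q in histories(s_next, pi, Sg):
            yield [s] + sigma, p * q

def pr_goal(s, pi, Sg):
    """Pr(Sg | s, pi): total probability of histories ending at a goal."""
    return sum(p for sigma, p in histories(s, pi, Sg) if sigma[-1] in Sg)

pi1 = {"d1": "m12", "d2": "m23", "d3": "m34"}
for sigma, p in histories("d1", pi1, {"d4"}):
    print(sigma, p)    # ['d1','d2','d3','d4'] 0.8  and  ['d1','d2','d5'] 0.2
print(pr_goal("d1", pi1, {"d4"}))   # 0.8
```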
Unsafe Solutions

[Figure: the road-map domain, as before]

● Unsafe solution:
  ▸ 0 < Pr(Sg | s0, π) < 1
● Example: π1 = {(d1, m12), (d2, m23), (d3, m34)}
● H(s0, π1) contains two histories:
  ▸ σ1 = ⟨d1, d2, d3, d4⟩   Pr(σ1 | s0, π1) = 1 × .8 × 1 = .8
  ▸ σ2 = ⟨d1, d2, d5⟩      Pr(σ2 | s0, π1) = 1 × .2 = .2
● Pr(Sg | s0, π1) = .8
Unsafe Solutions (continued)

[Figure: the road-map domain extended with a location d6 adjacent to d5]

● Unsafe solution:
  ▸ 0 < Pr(Sg | s0, π) < 1
● Example: π2 = {(d1, m12), (d2, m23), (d3, m34), (d5, move(r1,d5,d6)), (d6, move(r1,d6,d5))}
● H(s0, π2) contains two histories:
  ▸ σ1 = ⟨d1, d2, d3, d4⟩          Pr(σ1 | s0, π2) = 1 × .8 × 1 = .8
  ▸ σ3 = ⟨d1, d2, d5, d6, d5, d6, …⟩  Pr(σ3 | s0, π2) = 1 × .2 × 1 × 1 × 1 × … = .2
● Pr(Sg | s0, π2) = .8
Safe Solutions

[Figure: the road-map domain, as before]

● Safe solution:
  ▸ Pr(Sg | s0, π) = 1
● An acyclic safe solution: π3 = {(d1, m12), (d2, m23), (d3, m34), (d5, m54)}
● H(s0, π3) contains two histories:
  ▸ σ1 = ⟨d1, d2, d3, d4⟩   Pr(σ1 | s0, π3) = 1 × .8 × 1 = .8
  ▸ σ4 = ⟨d1, d2, d5, d4⟩   Pr(σ4 | s0, π3) = 1 × .2 × 1 = .2
● Pr(Sg | s0, π3) = .8 + .2 = 1
Safe Solutions (continued)

[Figure: the road-map domain, as before]

● Safe solution:
  ▸ Pr(Sg | s0, π) = 1
● A cyclic safe solution: π4 = {(d1, m14)}
● H(s0, π4) contains infinitely many histories:
  ▸ σ5 = ⟨d1, d4⟩        Pr(σ5 | s0, π4) = ½
  ▸ σ6 = ⟨d1, d1, d4⟩     Pr(σ6 | s0, π4) = (½)² = ¼
  ▸ σ7 = ⟨d1, d1, d1, d4⟩  Pr(σ7 | s0, π4) = (½)³ = 1/8
    • • •
  ▸ σ∞ = ⟨d1, d1, d1, d1, d1, …⟩   Poll: what is Pr(σ∞ | s0, π4)?
● Pr(Sg | s0, π4) = ½ + ¼ + 1/8 + … = 1
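For a cyclic policy like π4, H(s0, π) is infinite, so the earlier enumeration sketch won't terminate. One alternative (a successive-approximation sketch of my own, not the book's algorithm) is to compute Pr(Sg | s, π) as the fixed point of p(s) = Σ_{s′} Pr(s′ | s, π(s)) · p(s′), with p(s) = 1 at goal states.

```python
def pr_goal_cyclic(pi, Sg, states, iters=1000):
    """Approximate Pr(Sg | s, pi) for every s, even for cyclic policies."""
    p = {s: (1.0 if s in Sg else 0.0) for s in states}
    for _ in range(iters):
        for s in pi:
            if s not in Sg:
                # Expected goal probability over the outcomes of pi(s)
                p[s] = sum(q * p[t] for t, q in gamma[s][pi[s]].items())
    return p

pi4 = {"d1": "m14"}
p = pr_goal_cyclic(pi4, {"d4"}, ["d1", "d2", "d3", "d4", "d5"])
print(p["d1"])   # converges to 1.0, matching 1/2 + 1/4 + 1/8 + ...
```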
Safe Solutions (continued)

[Figure: the road-map domain, as before]

● Safe solution:
  ▸ Pr(Sg | s0, π) = 1
● Another cyclic safe solution: π5 = {(d1, m14), (d4, m41)}
● Recall that we stop when we reach a goal, so the action at d4 is never used
● Hence H(s0, π5) = H(s0, π4):
  ▸ σ5 = ⟨d1, d4⟩        Pr(σ5 | s0, π5) = ½
  ▸ σ6 = ⟨d1, d1, d4⟩     Pr(σ6 | s0, π5) = (½)² = ¼
  ▸ σ7 = ⟨d1, d1, d1, d4⟩  Pr(σ7 | s0, π5) = (½)³ = 1/8
    • • •
● Pr(Sg | s0, π5) = ½ + ¼ + 1/8 + … = 1