The dynamic logic of policies and contingent planning

Andreas Herzig, CNRS, IRIT
joint work with Thomas Bolander, Thorsten Engesser, Robert Mattmüller, Bernhard Nebel
Journées MAFTEC, Rennes, Dec. 6, 2018
Motivation

understand policies
  - standard concept in the planning literature
  - in the game theory literature: strategies
  - central in contingent planning
  - only semantically defined; syntactic counterpart?
  - can we describe policies by PDL programs?

long-term aim: understand policies for epistemic planning
  ⇒ understand implicitly coordinated policies
Action models

set of action names Act

PDL model = triple M_Act = (W, {R_a}_{a ∈ Act}, V) where
  - W: non-empty set ('states', 'possible worlds')
  - R_a ⊆ W × W ('action a's transition relation')
  - V: W → 2^Prp ('valuation')
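Such a triple can be represented concretely; a minimal Python sketch (the class, field names, and the state names of the lottery example are my own, not from the slides):

```python
from dataclasses import dataclass

@dataclass
class PDLModel:
    """A PDL model M_Act = (W, {R_a}_{a in Act}, V)."""
    worlds: set   # W: non-empty set of states
    rel: dict     # maps each action name a to R_a, a set of (s, t) pairs
    val: dict     # V: maps each state to the set of atoms true there

    def successors(self, a, s):
        """R_a(s): the states reachable from s by one execution of a."""
        return {t for (u, t) in self.rel.get(a, set()) if u == s}

# The slides' running example: playing the lottery is nondeterministic,
# and buying a Ferrari is only executable once rich.
M = PDLModel(
    worlds={"s0", "rich", "poor", "done"},
    rel={"playLottery": {("s0", "rich"), ("s0", "poor")},
         "buyFerrari": {("rich", "done")}},
    val={"s0": set(), "rich": {"Rich"}, "poor": set(),
         "done": {"Rich", "HaveFerrari"}},
)
```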
Outline

1. Planning tasks and strong policies
2. PDL: language and semantics
3. Extending PDL to account for policies
4. Programs for policies
Planning tasks

planning task = (S_0, γ, M_Act) where
  - S_0 ⊆ W ('set of initial states')
  - γ = boolean formula ('goal')
  - M_Act = PDL model
Policies

policy = relation Λ ⊆ W × (Act ∪ {stop})
  - cf. 'state-action table' [Cimatti & Roveri 2003]

Λ is defined at a set of states S ⊆ W iff for every s ∈ S:
  there is an x ∈ Act ∪ {stop} such that (s, x) ∈ Λ

Λ is strongly executable iff for every (s, a) ∈ Λ with a ∈ Act:
  1. R_a(s) ≠ ∅
  2. Λ is defined at R_a(s)
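Both conditions are easy to check on an explicit state-action table; a sketch (the successor function and state names are illustrative, not from the slides):

```python
STOP = "stop"

def defined_at(policy, states):
    """Λ is defined at S iff every s in S is assigned some x in Act ∪ {stop}."""
    return all(any(s == u for (u, _) in policy) for s in states)

def strongly_executable(policy, succ):
    """For every (s, a) in Λ with a an action:
    (1) R_a(s) is non-empty, and (2) Λ is defined at R_a(s)."""
    for (s, a) in policy:
        if a == STOP:
            continue
        nxt = succ(a, s)
        if not nxt or not defined_at(policy, nxt):
            return False
    return True

# Transitions of the lottery example (state names made up here).
R = {("playLottery", "s0"): {"rich", "poor"},
     ("buyFerrari", "rich"): {"done"}}
succ = lambda a, s: R.get((a, s), set())

good = {("s0", "playLottery"), ("rich", "buyFerrari"),
        ("poor", STOP), ("done", STOP)}
bad = {("s0", "playLottery"), ("rich", "buyFerrari"), ("done", STOP)}
# `good` covers both lottery outcomes; `bad` is undefined at "poor".
```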
Strong solutions to planning tasks

Stop(Λ) = {t ∈ W : (t, stop) ∈ Λ}
  - 'license to stop'
  - all 'stop states' must satisfy the goal (see below)
  - if a policy contains both (s, a) and (s, stop), then at s one may both stop and perform a: both must lead to the goal
  - necessary if we want nondeterministic policies

Λ is a strong solution of the planning task (S_0, γ, M_Act) iff:
  1. Λ is acyclic, finite, strongly executable
  2. Λ is defined at S_0
  3. M_Act, Stop(Λ) ⊨ γ

example: task = (S_0, HaveFerrari, M_Act)
  - sequential plan = playLottery; buyFerrari
  - a weak solution, but not a strong solution
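Condition 3 can be checked directly on Stop(Λ); a sketch for an atomic goal (state names and valuation are illustrative):

```python
STOP = "stop"

def stop_states(policy):
    """Stop(Λ) = {t : (t, stop) in Λ}, the states licensed for stopping."""
    return {s for (s, x) in policy if x == STOP}

def goal_at_stops(policy, val, goal_atom):
    """Condition 3, M_Act, Stop(Λ) ⊨ γ, for an atomic goal γ."""
    return all(goal_atom in val.get(s, set()) for s in stop_states(policy))

# In the lottery example, a policy that stops in the losing state
# violates condition 3: HaveFerrari fails there (state names made up).
val = {"poor": set(), "done": {"Rich", "HaveFerrari"}}
weak = {("s0", "playLottery"), ("rich", "buyFerrari"),
        ("poor", STOP), ("done", STOP)}
```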
Properties

- If Λ is a strong solution of (S, γ, M_Act), then Λ is a strong solution of (R_a(S), γ, M_Act)
- If Λ_1 and Λ_2 are both strong solutions of (S_0, γ, M_Act) and Λ_1 ∪ Λ_2 is acyclic, then Λ_1 ∪ Λ_2 is a strong solution of (S_0, γ, M_Act)
PDL: language

grammar of formulas ϕ and programs π:
  ϕ ::= p | ¬ϕ | ϕ ∧ ϕ | ⟨π⟩ϕ | [π]ϕ
  π ::= a | π; π | π ∪ π | ϕ?
where p ∈ Prp and a ∈ Act

[π]ϕ = "ϕ true after every possible execution of π"
⟨π⟩ϕ = "ϕ true after some possible execution of π"
so: ⟨π⟩ϕ ↔ ¬[π]¬ϕ

N.B.: no Kleene star
PDL: semantics

interpretation of programs:
  R_{π1; π2} = R_{π1} ∘ R_{π2}
  R_{π1 ∪ π2} = R_{π1} ∪ R_{π2}
  R_{ψ?} = {(s, s) : M_Act, s ⊨ ψ}

interpretation of formulas:
  M_Act, s ⊨ p      iff p ∈ V(s)
  M_Act, s ⊨ ¬ϕ     iff ...
  M_Act, s ⊨ ϕ ∧ ψ  iff ...
  M_Act, s ⊨ ⟨π⟩ϕ   iff M_Act, t ⊨ ϕ for some t ∈ R_π(s)
  M_Act, s ⊨ [π]ϕ   iff M_Act, t ⊨ ϕ for every t ∈ R_π(s)
                    iff M_Act, R_π(s) ⊨ ϕ

where:
  R_π(S) = ⋃_{s ∈ S} R_π(s)
  M_Act, S ⊨ ϕ iff M_Act, s ⊨ ϕ for every s ∈ S
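These clauses can be implemented directly for star-free PDL over an explicit finite model; a sketch (the tuple encoding of programs and formulas, and the example's state names, are my own):

```python
# Programs: ("act", a) | ("seq", p, q) | ("choice", p, q) | ("test", phi)
# Formulas: ("atom", p) | ("not", f) | ("and", f, g) | ("dia", pi, f) | ("box", pi, f)

def R(model, prog, s):
    """R_pi(s): states reachable from s by one execution of prog."""
    k = prog[0]
    if k == "act":
        return {t for (u, t) in model["rel"].get(prog[1], set()) if u == s}
    if k == "seq":     # R_{p;q} = R_p ∘ R_q
        return {t for m in R(model, prog[1], s) for t in R(model, prog[2], m)}
    if k == "choice":  # R_{p∪q} = R_p ∪ R_q
        return R(model, prog[1], s) | R(model, prog[2], s)
    if k == "test":    # R_{ψ?} = identity restricted to ψ-states
        return {s} if holds(model, prog[1], s) else set()
    raise ValueError(k)

def holds(model, phi, s):
    """M_Act, s ⊨ phi."""
    k = phi[0]
    if k == "atom":
        return phi[1] in model["val"].get(s, set())
    if k == "not":
        return not holds(model, phi[1], s)
    if k == "and":
        return holds(model, phi[1], s) and holds(model, phi[2], s)
    if k == "dia":
        return any(holds(model, phi[2], t) for t in R(model, phi[1], s))
    if k == "box":
        return all(holds(model, phi[2], t) for t in R(model, phi[1], s))
    raise ValueError(k)

# Lottery model (state names made up).
M = {"rel": {"playLottery": {("s0", "rich"), ("s0", "poor")},
             "buyFerrari": {("rich", "done")}},
     "val": {"s0": set(), "rich": {"Rich"}, "poor": set(),
             "done": {"Rich", "HaveFerrari"}}}
plan = ("seq", ("act", "playLottery"), ("act", "buyFerrari"))
```

Note that ⟨plan⟩HaveFerrari holds at s0 (nature may cooperate), and [plan]HaveFerrari holds too, vacuously on the losing branch where the plan is inexecutable — which is why neither modality alone captures strong solutions.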
PDL: weak executability

inexecutability:
  [π]⊥ = "⊥ true after every possible execution of π" = "π is inexecutable"

executability:
  wx π  =def  ¬[π]⊥  =  ⟨π⟩⊤  = "π is weakly executable"

'weak': I can get by with a little help from nature...
  wx (playLottery; buyFerrari)

cannot account for strong solutions of planning tasks
PDL: a new modality

when a is deterministic:
  "a guarantees outcome ϕ" = ⟨a⟩ϕ

when actions can be nondeterministic:
  "a guarantees outcome ϕ" = ⟨a⟩⊤ ∧ [a]ϕ = wx a ∧ [a]ϕ  =def  ([a])ϕ
PDL: a new modality (ctd.)

straightforward extension to sequential plans/programs:
  "a1; ···; an guarantees ϕ"
    = "a1 guarantees (a2; ···; an guarantees ϕ)"
    = ([a1]) ··· ([an])ϕ
    =def ([a1; ···; an])ϕ

characterises strong sequential solutions of planning tasks:
  a1; ···; an is a strong solution of (S_0, γ, M_Act) iff M_Act, S_0 ⊨ ([a1; ···; an])γ

example:
  M_Act, S_0 ⊨ ⟨playLottery; buyFerrari⟩ HaveFerrari   (a weak solution)
  M_Act, S_0 ⊭ ([playLottery; buyFerrari]) HaveFerrari  (not a strong solution)
PDL: strong executability

strong executability:
  sx (a1; ···; an)  =def  ([a1; ···; an])⊤
    = ([a1]) ··· ([an])⊤
    = ⟨a1⟩⊤ ∧ [a1](([a2]) ··· ([an])⊤)
    = ...
    ↔ wx a1 ∧ [a1] wx a2 ∧ ··· ∧ [a1] ··· [a_{n-1}] wx an
  = "a1; ···; an is strongly executable"

examples:
  sx playLottery = wx playLottery
  sx (playLottery; buyFerrari) = wx playLottery ∧ [playLottery] sx buyFerrari
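The unfolded formula wx a1 ∧ [a1] wx a2 ∧ ··· can be evaluated recursively; a sketch where `succ` plays the role of R_a (state names are made up):

```python
def wx_atom(succ, a, s):
    """wx a at s: R_a(s) is non-empty."""
    return bool(succ(a, s))

def sx_seq(succ, actions, states):
    """sx (a1; ...; an) at every state in `states`:
    wx a1 ∧ [a1] wx a2 ∧ ... ∧ [a1]...[a_{n-1}] wx an."""
    if not actions:
        return True
    head, rest = actions[0], actions[1:]
    return all(wx_atom(succ, head, s) and sx_seq(succ, rest, succ(head, s))
               for s in states)

# Lottery example: the plan fails to be strongly executable because
# buyFerrari is inexecutable on the losing branch.
R = {("playLottery", "s0"): {"rich", "poor"},
     ("buyFerrari", "rich"): {"done"}}
succ = lambda a, s: R.get((a, s), set())
```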
PDL: strong executability (ctd.)

([π])ϕ can be reduced to sx π:
  ([a1; ···; an])ϕ = ([a1]) ··· ([an])ϕ
    = ⟨a1⟩⊤ ∧ [a1](([a2]) ··· ([an])ϕ)
    = ...
    ↔ wx a1 ∧ [a1] wx a2 ∧ ··· ∧ [a1] ··· [a_{n-1}] wx an ∧ [a1][a2] ··· [an]ϕ
    = sx (a1; ···; an) ∧ [a1; ···; an]ϕ

strong executability of a nondeterministic composition?
  sx (π1 ∪ π2) = ???
Extended PDL: language

formulas ϕ and programs π:
  ϕ ::= p | ¬ϕ | ϕ ∧ ϕ | ⟨π⟩ϕ | [π]ϕ | ([π])ϕ
  π ::= a | π; π | π ∪ π | ϕ?
where p ∈ Prp and a ∈ Act

([π])ϕ = "π is strongly executable and guarantees ϕ"

nondeterministic choice π1 ∪ π2:
  - nature chooses between (possible executions of) π1 and π2
  - can only choose a πi that is weakly executable
  - π1 ∪ π2 equivalent to (wx π1?; π1) ∪ (wx π2?; π2)

consequence:
  ([π1 ∪ π2])ϕ ↔ wx (π1 ∪ π2) ∧ (wx π1 → ([π1])ϕ) ∧ (wx π2 → ([π2])ϕ)
  cf. [π1 ∪ π2]ϕ ↔ (wx π1 ∨ wx π2) ∧ (wx π1 → [π1]ϕ) ∧ (wx π2 → [π2]ϕ)
Extended PDL: interpretation of formulas

M_Act, s ⊨ ([a])ϕ       iff R_a(s) ≠ ∅ and M_Act, R_a(s) ⊨ ϕ
M_Act, s ⊨ ([π1; π2])ϕ  iff M_Act, s ⊨ ([π1])([π2])ϕ
M_Act, s ⊨ ([π1 ∪ π2])ϕ iff
  - M_Act, s ⊨ ⟨π1⟩⊤ or M_Act, s ⊨ ⟨π2⟩⊤, and
  - if M_Act, s ⊨ ⟨π1⟩⊤ then M_Act, s ⊨ ([π1])ϕ, and
  - if M_Act, s ⊨ ⟨π2⟩⊤ then M_Act, s ⊨ ([π2])ϕ
M_Act, s ⊨ ([ψ?])ϕ      iff M_Act, s ⊨ ψ ∧ ϕ
Extended PDL: nondeterminism

two kinds of nondeterminism:
  1. atomic actions: nature chooses among possible executions
       M_Act, s ⊭ ([playLottery]) Rich
  2. π1 ∪ π2: nature chooses among (weakly) executable disjuncts
       M_Act, s ⊭ ([playLottery ∪ workHard]) Rich
       M_Act, s ⊨ ([(playLottery; ⊥?) ∪ workHard]) Rich
Axiomatics

axiomatics of dynamic logic

reduction axioms for program operators:
  ([ψ?])ϕ ↔ ψ ∧ ϕ
  ([π1; π2])ϕ ↔ ([π1])([π2])ϕ
  ([π1 ∪ π2])ϕ ↔ (⟨π1⟩⊤ ∨ ⟨π2⟩⊤) ∧ (⟨π1⟩⊤ → ([π1])ϕ) ∧ (⟨π2⟩⊤ → ([π2])ϕ)

reduction axiom for atomic programs:
  ([a])ϕ ↔ ⟨a⟩⊤ ∧ [a]ϕ
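The four reduction axioms give a direct recursive evaluation procedure for ([π])ϕ; a sketch where formulas are plain predicates on states (the encodings and the example's states are my own):

```python
# Programs: ("act", a) | ("seq", p, q) | ("choice", p, q) | ("test", pred)

def succs(rel, prog, s):
    """R_pi(s), needed for the weak-executability guards ⟨pi⟩⊤."""
    k = prog[0]
    if k == "act":
        return {t for (u, t) in rel.get(prog[1], set()) if u == s}
    if k == "seq":
        return {t for m in succs(rel, prog[1], s)
                  for t in succs(rel, prog[2], m)}
    if k == "choice":
        return succs(rel, prog[1], s) | succs(rel, prog[2], s)
    if k == "test":
        return {s} if prog[1](s) else set()
    raise ValueError(k)

def wx(rel, prog, s):
    return bool(succs(rel, prog, s))

def guarantees(rel, prog, phi, s):
    """([prog])phi at s, by the reduction axioms."""
    k = prog[0]
    if k == "act":      # ([a])φ ↔ ⟨a⟩⊤ ∧ [a]φ
        nxt = succs(rel, prog, s)
        return bool(nxt) and all(phi(t) for t in nxt)
    if k == "test":     # ([ψ?])φ ↔ ψ ∧ φ
        return prog[1](s) and phi(s)
    if k == "seq":      # ([p;q])φ ↔ ([p])([q])φ
        return guarantees(rel, prog[1],
                          lambda t: guarantees(rel, prog[2], phi, t), s)
    if k == "choice":   # ([p∪q])φ ↔ (wx p ∨ wx q) ∧ (wx p → ([p])φ) ∧ (wx q → ([q])φ)
        w1, w2 = wx(rel, prog[1], s), wx(rel, prog[2], s)
        return ((w1 or w2)
                and (not w1 or guarantees(rel, prog[1], phi, s))
                and (not w2 or guarantees(rel, prog[2], phi, s)))
    raise ValueError(k)

# Model for the slides' nondeterminism examples (state names made up):
# playLottery may fail; workHard reliably makes one rich.
rel = {"playLottery": {("s0", "rich"), ("s0", "poor")},
       "workHard": {("s0", "rich2")}}
is_rich = lambda t: t in {"rich", "rich2"}
```

On this model the procedure reproduces the nondeterminism examples: ([playLottery])Rich and ([playLottery ∪ workHard])Rich fail at s0, while ([(playLottery; ⊥?) ∪ workHard])Rich holds, because the guarded left disjunct is not weakly executable.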
Policies PDL Extending PDL Programs for policies Alternative axiomatisation ( [ π ] ) ϕ can be reduced to sx π : ( [ π ] ) ϕ ↔ sx π ∧ [ π ] ϕ axiomatics of strong executability: sx a ↔ wx a sx ψ ? ↔ ψ sx π 1 ; π 2 ↔ sx π 1 ∧ [ π 1 ] sx π 2 sx π 1 ∪ π 2 ↔ ( sx π 1 ∧ sx π 2 ) ∨ ( sx π 1 ∧ ¬ wx π 2 ) ∨ ( ¬ wx π 1 ∧ sx π 2 ) 22 / 27