 
Context of the work Theory Work Conclusion, perspectives
Towards an efficient representation for epistemic planning
Sebastien Gamblin
Supervised by Alexandre Niveau
University of Caen
December 6, 2018
Plan
1 Context of the work
2 Theory
    Epistemic logic
    Kripke structure
    Product update
    KBP
3 Work
    V-injectivity
    Propositional representation
    Simple querying of the structure
    Planning
    Results
4 Conclusion, perspectives
Context of the work
Theme: multi-agent planning, using epistemic logic and the event models of dynamic epistemic logic (DEL) to represent the problem.
Goal: find one policy per agent, in the form of a knowledge-based program (KBP) [LZ12].
We work on Hanabi, a collaborative card game in which reasoning about the knowledge of the other agents arises naturally.
Epistemic logic
Let $A$ be a set of agents and $X$ a set of propositional atoms.
The language $L_{EL}$ is defined by
$\Phi ::= \top \mid p \mid \neg\Phi \mid \Phi \lor \Phi \mid K_a \Phi$, with $p \in X$ and $a \in A$.
$K_i \varphi$ means that "agent $i$ knows that $\varphi$".
Kripke structure
Kripke structure: $M = (W, R_1, \dots, R_n, V)$ [FHMV95]
$W$: set of worlds
$R_1, \dots, R_n \subseteq W \times W$: indistinguishability relations
$V : W \to 2^X$: valuation function
Figure: Example of the knowledge of agents. Two worlds, $w_1$ (where $p$ and $q$ hold) and $w_2$ (where only $q$ holds). Each world has a reflexive edge labelled 1, 2; the edge between $w_1$ and $w_2$ is labelled 2.
A Kripke structure lets us interpret an epistemic formula in a given state of knowledge.
Example: in $w_1$, the formulas $K_1 p$, $\neg K_2 p$ and $K_1 \neg K_2 p$ are true.
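The truth conditions used in the example can be sketched as a small explicit model checker. This is only an illustration: the tuple encoding of formulas and the names `holds`, `val` and `rel` are assumptions made for this sketch, not part of the presented work. The valuation of $w_2$ ($q$ true, $p$ false) is the one that makes the three example formulas true in $w_1$.

```python
# Formulas are nested tuples (hypothetical encoding):
#   ("top",), ("atom", "p"), ("not", f), ("or", f, g), ("K", agent, f)

def holds(rel, val, w, f):
    """Return True if formula f is true at world w."""
    kind = f[0]
    if kind == "top":
        return True
    if kind == "atom":
        return f[1] in val[w]
    if kind == "not":
        return not holds(rel, val, w, f[1])
    if kind == "or":
        return holds(rel, val, w, f[1]) or holds(rel, val, w, f[2])
    if kind == "K":
        _, agent, sub = f
        # K_a f holds at w if f holds in every world a cannot tell apart from w.
        return all(holds(rel, val, v, sub) for (u, v) in rel[agent] if u == w)
    raise ValueError(f"unknown connective: {kind}")

# Atoms true in each world.
val = {"w1": {"p", "q"}, "w2": {"q"}}
# Agent 1 distinguishes the two worlds, agent 2 does not.
rel = {
    1: {("w1", "w1"), ("w2", "w2")},
    2: {("w1", "w1"), ("w2", "w2"), ("w1", "w2"), ("w2", "w1")},
}

p = ("atom", "p")
print(holds(rel, val, "w1", ("K", 1, p)))                       # K_1 p
print(holds(rel, val, "w1", ("not", ("K", 2, p))))              # not K_2 p
print(holds(rel, val, "w1", ("K", 1, ("not", ("K", 2, p)))))    # K_1 not K_2 p
```

All three calls print `True`, matching the example on the slide.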
Event model
$\varepsilon = (E, E_1, \dots, E_n, pre, post)$
$E$: set of events (actions)
$E_1, \dots, E_n \subseteq E \times E$: indistinguishability relations
$pre : E \to L_{EL}$: precondition function
$post : E \times X \to L_{PROP}$: postcondition function
Figure: Event model example. Two events, $e_1$ (pre: $p$, post: $p \leftarrow \bot$) and $e_2$ (pre: $q$, post: $q \leftarrow \bot$). The two reflexive edges and the edge between $e_1$ and $e_2$ are all labelled 1, 2.
Product update
Product update of $\varepsilon$ and $M$: $\varepsilon \otimes M = (W', R'_1, \dots, R'_n, V')$
$W' = \{ (w, e) \in W \times E \mid M, w \models pre(e) \}$
$(w, e) \, R'_i \, (w', e')$ iff $w \, R_i \, w'$ and $e \, E_i \, e'$
$V'((w, e)) = \{ p \in X \mid M, w \models post(e, p) \}$
It is the Cartesian product of the Kripke structure and the event model, restricted to the pairs whose precondition holds.
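The three defining clauses of the product update translate almost literally into code. The sketch below is a simplified illustration, not the presented implementation: preconditions and postconditions are restricted to Python predicates over the valuation of a world (so epistemic preconditions are not covered), and atoms without a postcondition are assumed to keep their value. All names are hypothetical.

```python
def product_update(worlds, rel, val, events, erel, pre, post, atoms):
    """Explicit product update of a Kripke structure with an event model."""
    # W': pairs (w, e) such that the precondition of e holds in w.
    new_worlds = [(w, e) for w in worlds for e in events if pre[e](val[w])]
    # R'_i: related iff both the worlds and the events are related for agent i.
    new_rel = {
        a: {(x, y) for x in new_worlds for y in new_worlds
            if (x[0], y[0]) in rel[a] and (x[1], y[1]) in erel[a]}
        for a in rel
    }
    # V': an atom is true in (w, e) iff its postcondition holds in w.
    new_val = {}
    for (w, e) in new_worlds:
        true_atoms = set()
        for p in atoms:
            # Assumed convention: no postcondition means the atom is unchanged.
            cond = post[e].get(p, lambda v, p=p: p in v)
            if cond(val[w]):
                true_atoms.add(p)
        new_val[(w, e)] = true_atoms
    return new_worlds, new_rel, new_val

# Kripke structure: w1 satisfies p and q, w2 satisfies only q.
worlds = ["w1", "w2"]
val = {"w1": {"p", "q"}, "w2": {"q"}}
all_w = {(u, v) for u in worlds for v in worlds}
rel = {1: {("w1", "w1"), ("w2", "w2")}, 2: all_w}

# Event model: e1 (pre p, p <- false), e2 (pre q, q <- false).
events = ["e1", "e2"]
all_e = {(e, f) for e in events for f in events}
erel = {1: all_e, 2: all_e}
pre = {"e1": lambda v: "p" in v, "e2": lambda v: "q" in v}
post = {"e1": {"p": lambda v: False}, "e2": {"q": lambda v: False}}

new_worlds, new_rel, new_val = product_update(
    worlds, rel, val, events, erel, pre, post, atoms=["p", "q"])
print(new_worlds)
```

On this input the pair `("w2", "e1")` is discarded because `p` is false in `w2`, leaving three worlds.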
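Product update: example
Figure: Updated structure $\varepsilon \otimes M$, built from the Kripke structure ($w_1$, $w_2$) and the event model ($e_1$, $e_2$) of the previous examples. The result has three worlds: $w_1 e_1$, $w_1 e_2$ and $w_2 e_2$. Every world has a reflexive edge labelled 1, 2; the edge between $w_1 e_1$ and $w_1 e_2$ is labelled 1, 2, and the two edges to $w_2 e_2$ are labelled 2.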
KBP: knowledge-based programs [LZ12]
Let $A$ be a set of primitive actions. KBPs are defined inductively as follows:
the empty plan is a KBP;
any action $\alpha \in A$ is a KBP;
if $\pi$ and $\pi'$ are KBPs, then $\pi ; \pi'$ is a KBP;
if $\pi$ and $\pi'$ are KBPs, then "if $\Phi$ then $\pi$ else $\pi'$" is a KBP;
if $\pi$ is a KBP, then "while $\Phi$ do $\pi$" is a KBP.
Every condition $\Phi$ must be subjective for the agent executing the program.
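The inductive definition maps directly onto a small syntax tree. The sketch below is one possible encoding, given only as an illustration: the class names, the use of plain strings for conditions, and the action names `give_hint`, `play` and `discard` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Empty:
    """The empty plan."""


@dataclass(frozen=True)
class Act:
    """A primitive action."""
    name: str


@dataclass(frozen=True)
class Seq:
    """Sequential composition: first ; second."""
    first: "KBP"
    second: "KBP"


@dataclass(frozen=True)
class If:
    """if cond then then_branch else else_branch (cond is a placeholder string)."""
    cond: str
    then_branch: "KBP"
    else_branch: "KBP"


@dataclass(frozen=True)
class While:
    """while cond do body."""
    cond: str
    body: "KBP"


KBP = Union[Empty, Act, Seq, If, While]


def primitive_actions(prog):
    """List the primitive actions that occur in a program, left to right."""
    if isinstance(prog, Empty):
        return []
    if isinstance(prog, Act):
        return [prog.name]
    if isinstance(prog, Seq):
        return primitive_actions(prog.first) + primitive_actions(prog.second)
    if isinstance(prog, If):
        return primitive_actions(prog.then_branch) + primitive_actions(prog.else_branch)
    if isinstance(prog, While):
        return primitive_actions(prog.body)
    raise TypeError(f"not a KBP: {prog!r}")


# Hypothetical program: give a hint, then play or discard depending on knowledge.
program = Seq(Act("give_hint"), If("K_1 x_21", Act("play"), Act("discard")))
print(primitive_actions(program))
```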
Domain: a set V and a set of actions.
Problem: given an initial state and a goal (an epistemic formula), find one KBP per agent such that, when the agents execute their KBPs synchronously, turn by turn, the goal is reached in a finite number of steps.
Initial Kripke structure for a Hanabi-like game
Figure: Example of initial situation with 3 cards and 2 agents. $x_{ic}$: agent $i$ has card $c$.
Worlds and the atoms true in each:
$m_1$: $x_{11}, x_{22}, x_{p3}$
$m_2$: $x_{11}, x_{23}, x_{p2}$
$m_3$: $x_{12}, x_{21}, x_{p3}$
$m_4$: $x_{12}, x_{23}, x_{p1}$
$m_5$: $x_{13}, x_{21}, x_{p2}$
$m_6$: $x_{13}, x_{22}, x_{p1}$
Every world has a reflexive edge labelled 1, 2; each of the remaining edges is labelled with a single agent, 1 or 2.
Event: agent $j$ draws a card
Event             pre         post
$A_j$ draws 1     $x_{p1}$    $x_{j1} \leftarrow \bot$, $x_{p1} \leftarrow \top$
$A_j$ draws 2     $x_{p2}$    $x_{j2} \leftarrow \bot$, $x_{p2} \leftarrow \top$
$A_j$ draws 3     $x_{p3}$    $x_{j3} \leftarrow \bot$, $x_{p3} \leftarrow \top$
The edges between these events are labelled $j$.
Toward an efficient representation
Combinatorial explosion
Naive implementation as a classical graph, with 2 players and 4 cards in hand:

Cards   Number of worlds   Number of relations
9       630                22694
10      3150               58926
11      11550              112266

Contribution
Hanabi has a particular property: two distinct worlds never have the same propositional valuation.
Definition (V-injective). A Kripke structure $M = \langle W, R_1, \dots, R_n, V \rangle$ is called V-injective if $V$ is injective, i.e., $\forall w, w' \in W : w \neq w' \implies V(w) \neq V(w')$.
This property was also identified by M. Gattinger [Gat18]. Variables can be added to split worlds [CS17].
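The world counts in the table can be reproduced with a short calculation. The sketch below assumes that all cards are distinct, that each player holds an unordered hand, and that the remaining cards form an unordered pile; these assumptions are not stated on the slide, but they give exactly the listed numbers of worlds. The function name `count_worlds` is hypothetical, and the number of relations is not recomputed here.

```python
from math import comb


def count_worlds(cards, players=2, hand=4):
    """Number of ways to deal distinct cards into unordered hands."""
    total = 1
    remaining = cards
    for _ in range(players):
        # Choose this player's hand among the cards not yet dealt.
        total *= comb(remaining, hand)
        remaining -= hand
    return total


for cards in (9, 10, 11):
    print(cards, count_worlds(cards))
```

This prints 630, 3150 and 11550 for 9, 10 and 11 cards, so the number of worlds grows roughly fivefold and then almost fourfold with each extra card.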
Definition (Boolean representation of a Kripke structure). Let $M$ be a V-injective Kripke structure for agents $A = \{a_1, \dots, a_m\}$ and propositional variables $X = \{x_1, \dots, x_n\}$. The propositional representation of $M$ is a tuple of Boolean functions $F = \langle f_1, \dots, f_m \rangle$ where every $f_i : \mathbb{B}^{2n} \to \mathbb{B}$ is defined as follows:
$f_i(v_1, \dots, v_n, v'_1, \dots, v'_n) = 1 \iff \exists w, w' \in W : (\forall j : V(w)(x_j) = v_j) \land (\forall j : V(w')(x_j) = v'_j) \land (w, w') \in R_i$

Relations for agent 1
p   q   p'  q'  Rel
1   1   1   1   w1 → w1
0   1   0   1   w2 → w2

Relations for agent 2
p   q   p'  q'  Rel
1   1   1   1   w1 → w1
0   1   0   1   w2 → w2
1   1   0   1   w1 → w2
0   1   1   1   w2 → w1

Figure: Representation by a Boolean function of the example
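To make the definition concrete, the sketch below builds each $f_i$ explicitly as the set of bit tuples on which it evaluates to 1. This is an illustration only: a real implementation would store $f_i$ as a BDD rather than a Python set, and the names `is_v_injective`, `bits` and `boolean_representation` are hypothetical.

```python
def is_v_injective(val):
    """True if no two worlds share the same valuation."""
    return len({frozenset(v) for v in val.values()}) == len(val)


def bits(valuation, atoms):
    """Encode a valuation as a tuple of 0/1 values, one per atom."""
    return tuple(int(x in valuation) for x in atoms)


def boolean_representation(rel, val, atoms):
    """Map each agent to the set of tuples (v, v') on which f_i is 1."""
    if not is_v_injective(val):
        raise ValueError("the structure is not V-injective")
    return {
        agent: {bits(val[w], atoms) + bits(val[w2], atoms) for (w, w2) in pairs}
        for agent, pairs in rel.items()
    }


# The two-world example: w1 satisfies p and q, w2 satisfies only q.
atoms = ("p", "q")
val = {"w1": {"p", "q"}, "w2": {"q"}}
rel = {
    1: {("w1", "w1"), ("w2", "w2")},
    2: {("w1", "w1"), ("w2", "w2"), ("w1", "w2"), ("w2", "w1")},
}

F = boolean_representation(rel, val, atoms)
for agent in sorted(F):
    print(agent, sorted(F[agent]))
```

Each printed tuple is a row `(p, q, p', q')` of the corresponding relation table.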
Model checking
There is a practical algorithm for checking whether an epistemic formula holds in a structure given by its Boolean representation.
Goal: find a propositional representation $\Theta(F, \Phi)$, over $X$, of the set of worlds $Q(M, \Phi) = \{ w \in W \mid M, w \models \Phi \}$, where $\Phi \in L_{EL}$; i.e., $Mod(\Theta(F, \Phi)) = Q(M, \Phi)$.
Proposition 1. Let $F$ be the propositional representation of $M$, and let $f_W$ be the formula whose models are exactly the valuations of the worlds in $W$. Then:
1. $\Theta(F, x) = f_W \land x$
2. $\Theta(F, \neg\Phi) = f_W \land \neg\Theta(F, \Phi)$
3. $\Theta(F, \Phi \land \Psi) = \Theta(F, \Phi) \land \Theta(F, \Psi)$
4. $\Theta(F, \hat{K}_i \Phi) = Forget(f_i \land F', X')$, where $F' = \Theta(F, \Phi)[X \to X']$
Several languages for representing propositional formulas support these operations efficiently (OBDDs, for example).
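The four clauses of Proposition 1 can be tried out on the two-world example by letting plain Python sets of models stand in for Boolean functions. In this sketch, which is an explicit stand-in and not the BDD-based implementation, conjunction is set intersection, negation relative to $f_W$ is set difference, and forgetting $X'$ is projection onto the unprimed bits. $K_i$ is obtained as the dual of $\hat{K}_i$. The tuple encoding of formulas and all names are assumptions.

```python
ATOMS = ("p", "q")
N = len(ATOMS)

# f_i as sets of tuples (p, q, p', q'), taken from the relation tables.
F = {
    1: {(1, 1, 1, 1), (0, 1, 0, 1)},
    2: {(1, 1, 1, 1), (0, 1, 0, 1), (1, 1, 0, 1), (0, 1, 1, 1)},
}
# f_W: the valuations (p, q) of the existing worlds w1 and w2.
F_W = {(1, 1), (0, 1)}


def theta(f):
    """Set of valuations (worlds) satisfying the epistemic formula f."""
    kind = f[0]
    if kind == "atom":
        i = ATOMS.index(f[1])
        return {v for v in F_W if v[i] == 1}          # f_W and x
    if kind == "not":
        return F_W - theta(f[1])                      # f_W and not Theta
    if kind == "and":
        return theta(f[1]) & theta(f[2])
    if kind == "poss":                                # K-hat_i
        _, agent, sub = f
        target = theta(sub)                           # Theta renamed to X'
        # f_i and F', then forget X': keep the unprimed half of each tuple.
        return {t[:N] for t in F[agent] if t[N:] in target}
    if kind == "K":                                   # K_i = not K-hat_i not
        _, agent, sub = f
        return theta(("not", ("poss", agent, ("not", sub))))
    raise ValueError(f"unknown connective: {kind}")


p = ("atom", "p")
print(theta(("K", 1, p)))
print(theta(("K", 2, p)))
print(theta(("K", 1, ("not", ("K", 2, p)))))
```

The first result contains only the valuation of $w_1$, the second is empty, and the third contains both worlds, which agrees with the formulas true in $w_1$ in the Kripke structure example.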
Boolean representation for the event model
Plain atoms describe the original world: $p$.
Plus atoms describe the valuation in the future state: $p^+$.
Primed atoms describe the propositions of the target event.
Here, $\varphi_{e_1} = p \land \neg p^+$ and $\varphi_{e_2} = q \land \neg q^+$.
The event formula is obtained as follows:
$\Phi_e = (\varphi_{e_1} \land \varphi_{e_1}[X' \leftarrow X, X^{+\prime} \leftarrow X^+])$
$\quad \lor (\varphi_{e_1} \land \varphi_{e_2}[X' \leftarrow X, X^{+\prime} \leftarrow X^+])$
$\quad \lor (\varphi_{e_2} \land \varphi_{e_1}[X' \leftarrow X, X^{+\prime} \leftarrow X^+])$
$\quad \lor (\varphi_{e_2} \land \varphi_{e_2}[X' \leftarrow X, X^{+\prime} \leftarrow X^+])$

Relations (a dash marks a variable left unconstrained by the row)
p   p+  q   q+  p'  p+' q'  q+' Rel
1   0   -   -   1   0   -   -   e1 → e1
-   -   1   0   -   -   1   0   e2 → e2
1   0   -   -   -   -   1   0   e1 → e2
-   -   1   0   1   0   -   -   e2 → e1
Symbolic product update for the propositional representation
Proposition (product update). With $f_i$ the knowledge structure of agent $i$ and $\Phi_e$ the event formula, the updated structure of agent $i$ is
$Forget(f_i \land \Phi_e, X \cup X')[X^+ \leftarrow X, X^{+\prime} \leftarrow X']$
At this point we can model the knowledge of the agents.
Next, we want to build KBPs.
We use regression, because the resulting formulas carry the policy...
Regression
Starting from an epistemic goal formula, we want to obtain all the epistemic formulas that can lead to this goal formula through the events of the game.
Definition (regression). The regression of $\Phi_G$ (goal formula) by $(M, w)$ (pointed event), written $Reg_w(\Phi_G)$, is the formula defined as follows (see [Auc12]):
$Reg_w(p) = Pre(w) \land Post(w)(p)$
$Reg_w(\Phi \lor \Psi) = Reg_w(\Phi) \lor Reg_w(\Psi)$
$Reg_w(\neg\Phi) = Pre(w) \land \neg Reg_w(\Phi)$
$Reg_w(K_j \Phi) = Pre(w) \land \bigwedge_{v' \in K_j(w)} K_j(Reg_{v'}(\Phi))$
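Regression is a purely syntactic rewriting, which the following sketch illustrates on formulas encoded as nested tuples. It is an illustration under stated assumptions, not the presented implementation: the knowledge clause regresses the subformula through every event the agent cannot distinguish from the current one, atoms without a postcondition are assumed to keep their value, and the encoding and the name `reg` are hypothetical.

```python
def reg(e, f, pre, post, erel):
    """Regress formula f through event e of an event model."""
    kind = f[0]
    if kind == "atom":
        # Assumed convention: no postcondition means the atom is unchanged.
        return ("and", pre[e], post[e].get(f[1], f))
    if kind == "or":
        return ("or", reg(e, f[1], pre, post, erel), reg(e, f[2], pre, post, erel))
    if kind == "not":
        return ("and", pre[e], ("not", reg(e, f[1], pre, post, erel)))
    if kind == "K":
        _, agent, sub = f
        result = pre[e]
        # One conjunct per event that the agent confuses with e.
        for e2 in sorted(y for (x, y) in erel[agent] if x == e):
            result = ("and", result, ("K", agent, reg(e2, sub, pre, post, erel)))
        return result
    raise ValueError(f"unknown connective: {kind}")


# Event model: e1 (pre p, p <- false), e2 (pre q, q <- false),
# indistinguishable from each other for both agents.
P, Q, BOT = ("atom", "p"), ("atom", "q"), ("bot",)
pre = {"e1": P, "e2": Q}
post = {"e1": {"p": BOT}, "e2": {"q": BOT}}
pairs = {(x, y) for x in ("e1", "e2") for y in ("e1", "e2")}
erel = {1: pairs, 2: pairs}

print(reg("e1", Q, pre, post, erel))
print(reg("e1", ("K", 1, Q), pre, post, erel))
```

The first call yields the conjunction of `p` (the precondition of `e1`) with `q` (unchanged by `e1`). The second adds one `K_1` conjunct per event that agent 1 cannot distinguish from `e1`.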
Algorithm 1: Create plan
Data: $n$, the regression depth
Data: final state
Result: Plan
    $\Phi_G$ ← final_state;
    Plan ← list(($\Phi_G$, 'stop'));
    forall $i \in \{0 \dots n\}$ do
        tmp ← $\bot$;
        forall $a \in$ Actions do
            $\Phi_P$ ← $Reg_a(\Phi_G)$;
            Plan.append(($\Phi_P$, $a$));
            tmp ← tmp $\lor$ $\Phi_P$;
        end
        $\Phi_G$ ← tmp;
    end

We obtain a plan of the form:
    if $\Phi_1$ then execute action 1
    else if $\Phi_2$ then execute action 2
    else if $\Phi_3$ then execute action 3
    ...
    else if $\Phi_G$ then 'STOP'
Algorithm 2: Follow plan
Data: Plan
    done ← $\bot$;
    while not done do
        forall ($\Phi$, action) $\in$ Plan do
            if evaluate($\Phi$, state) then
                if action == stop then
                    done ← $\top$;
                end
                Execute action;
            end
        end
    end

Open question: how should the program pointers of the other agents be handled in this KBP?
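Plan execution can be sketched as a loop over (condition, action) pairs. The version below is a simplified variant, not the algorithm as presented: it executes only the first rule whose condition holds and then re-evaluates the plan from the top, which is how an if / else-if chain behaves. The callbacks `evaluate` and `execute`, the `max_steps` guard and the toy counter domain are all hypothetical.

```python
def follow_plan(plan, state, evaluate, execute, max_steps=100):
    """Run a list of (condition, action) rules until 'stop' is selected."""
    trace = []
    for _ in range(max_steps):
        for condition, action in plan:
            if evaluate(condition, state):
                if action == "stop":
                    return state, trace
                state = execute(action, state)
                trace.append(action)
                break  # re-evaluate the plan from the first rule
        else:
            break  # no rule applies: give up
    return state, trace


# Toy domain: the state is a counter, conditions are Python predicates.
plan = [
    (lambda s: s >= 3, "stop"),
    (lambda s: s < 3, "inc"),
]


def evaluate(condition, state):
    return condition(state)


def execute(action, state):
    # Only one action exists in this toy domain.
    return state + 1 if action == "inc" else state


final_state, trace = follow_plan(plan, 0, evaluate, execute)
print(final_state, trace)
```

In the toy run the counter is incremented three times before the stop rule fires.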
Implementation in Python, using the CUDD library for BDDs.
Example