Knowledge-Based Programs as Explainable Policies for Contingent Planning
J. Lang, A. Saffidine, F. Schwarzentruber, B. Zanuttini
MAFTEC, April 1, 2019
Lang, Saffidine, Schwarzentruber, Zanuttini KBPs as Explainable Policies 1/40
Planning Problems
Let's design an agent for solving problems!
[Figure: Minesweeper board, mostly unrevealed, with two revealed cells showing 1 and 2]
Maybe even let the agent compute its policy by itself.
Standard Policies
Before we send the agent to the minefield...
Let's just check how it is planning to behave.
[Figure: policy tree for a small Minesweeper board; labels are cell positions (e.g. 4,1; 4,3) and observed numbers]
Wouldn't this lack a little readability, verifiability... explainability?
Knowledge-Based Programs
What about this behaviour?
  while not sure that all positions except mines have been cleared do
    if sure that there is no mine at ⟨1,1⟩ then click 1,1 fi
    if sure that there is no mine at ⟨1,2⟩ then click 1,2 fi
    ...
    if sure that there is no mine at ⟨H,W⟩ then click H,W fi
  od
Wouldn't this be perfectly readable, verifiable... explainable?
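The program above can be read as a loop over a belief state. Below is a minimal Python sketch under stated assumptions: the belief state is an explicit set of possible mine placements, "sure" means "true in every placement still considered possible", and clicking reveals one cell only. The board size, the function names, and the early exit when a full pass teaches nothing new are all choices of this sketch, not part of the slides.

```python
from itertools import combinations

H, W, MINES = 3, 3, 1  # hypothetical tiny board with a single mine
CELLS = [(i, j) for i in range(1, H + 1) for j in range(1, W + 1)]


def neighbours(cell):
    """Cells adjacent to `cell` (including diagonals)."""
    i, j = cell
    return [(a, b) for (a, b) in CELLS
            if (a, b) != cell and abs(a - i) <= 1 and abs(b - j) <= 1]


def observe(world, cell):
    """Observation for clicking `cell` when the mines are exactly `world`."""
    if cell in world:
        return "lost"
    return sum(n in world for n in neighbours(cell))


def sure_no_mine(belief, cell):
    """'Sure that there is no mine at cell': true in every possible world."""
    return all(cell not in world for world in belief)


def sure_all_cleared(belief, cleared):
    """'Sure that all positions except mines have been cleared'."""
    return all(set(CELLS) - world <= cleared for world in belief)


def run_kbp(true_world, belief, cleared):
    """Execute the program from a belief state (a set of mine placements)."""
    belief, cleared = set(belief), set(cleared)
    while not sure_all_cleared(belief, cleared):
        before = len(cleared)
        for cell in CELLS:
            if sure_no_mine(belief, cell):
                obs = observe(true_world, cell)  # the environment answers
                cleared.add(cell)
                # keep only the worlds that would have produced this observation
                belief = {w for w in belief if observe(w, cell) == obs}
        if len(cleared) == before:
            break  # assumption of this sketch: stop when nothing new is known
    return cleared, belief


# Hypothetical start: <1,1> was already clicked and showed 0.
all_worlds = {frozenset(c) for c in combinations(CELLS, MINES)}
start_belief = {w for w in all_worlds if observe(w, (1, 1)) == 0}
cleared, belief = run_kbp(frozenset({(3, 3)}), start_belief, {(1, 1)})
```

Note that the program never clicks a cell it is not sure about, so from a belief state with no safe cell it makes no progress; the sketch exits in that case rather than looping forever.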
Outline
Contingent Planning Problems
Standard Representations
Knowledge-Based Programs
The Bright Side of KBPs as Policies
The Dark Side of KBPs as Policies
Conclusion
Multi-Agent KBPs
Synthesis of KBPs
More Succinctness?
Partially Observable Domains
◮ $X = \{x_1, \dots, x_n\}$: propositional variables → $2^n$ states
◮ $A = \{a_1, \dots, a_k\}$: actions
◮ $O = \{o_1, \dots, o_p\}$: observations
◮ $\varphi_\delta$: transition function
States are not directly observable.
Minesweeper $H \times W$:
◮ variables $X$: $m_{i,j}$, $c_{i,j}$ ($\forall i, j$) → space of $2^{2HW}$ states
◮ actions $A$: $\mathrm{click}_{i,j}$ ($\forall i, j$)
◮ observations $O$: $o_0, \dots, o_8$ + $o_{\mathrm{lost}}$
Actions
Actions:
◮ ontic effects: change current state (nondeterministic)
◮ epistemic effects: yield observation (nondeterministic, ambiguous)
Description for Minesweeper:
$$\varphi_\delta = \bigwedge_{i,j} \left( \mathrm{click}_{i,j} \to \varphi_\delta^{i,j} \right)$$
with
$$\varphi_\delta^{i,j} = c'_{i,j} \land (m_{i,j} \to o_{\mathrm{lost}}) \land \Big( \neg m_{i,j} \to \bigwedge_{n=0,\dots,8} (\varphi_{n,i,j} \leftrightarrow o_n) \Big) \land \bigwedge_{x \neq c_{i,j}} (x' \leftrightarrow x)$$
Planning Problems
◮ Domain + initial belief state + goal states
◮ Same as POMDPs, except without probabilities
Minesweeper:
◮ initial belief state:
$$\bigwedge_{i,j} (\neg c_{i,j}) \land \bigvee_{\neq} (m_{i,j} \land m_{i',j'}) \land \neg \bigvee_{\neq} (m_{i,j} \land m_{i',j'} \land m_{i'',j''})$$
◮ goals: $\bigwedge_{i,j} (c_{i,j} \oplus m_{i,j})$
Policies
◮ Prescribe to the agent which action to take
◮ Cannot be a function of the current state
◮ Function from histories of actions/observations to actions
◮ Abstract notion
Examples for Minesweeper:
◮ let $(p_t)_t = \langle 1,1 \rangle, \langle 1,2 \rangle, \langle 1,3 \rangle, \dots$
◮ $\pi$ def. by $\pi(h) := \mathrm{click}\ p_{|h|}$
◮ $\pi'$ def. by
  $\pi'(\epsilon) = \mathrm{click}\ p_0$
  $\pi'(h) = \mathrm{click}\ p_{t(h)+1}$ if $o_{|h|-1}(h) = o_0$
  $\pi'(h) = \mathrm{click}\ p_{t(h)+2}$ otherwise
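As an illustration of "a function from histories to actions", here is a small Python sketch of the first example policy, which clicks position number |h| and ignores every observation. It assumes positions are enumerated row by row starting at index 0, represents a history as a list of (action, observation) pairs, and returns "stop" once all positions are used up; the board size and the stop convention are assumptions of the sketch. The second example policy is not sketched because t(h) is not defined on the slide.

```python
from itertools import product

H, W = 2, 3  # hypothetical board size

# (p_t)_t: positions enumerated row by row, so POSITIONS[0] is <1,1>
POSITIONS = list(product(range(1, H + 1), range(1, W + 1)))


def pi(history):
    """pi(h) = click p_{|h|}: the action depends only on the history length."""
    t = len(history)
    if t >= len(POSITIONS):
        return "stop"  # assumption: nothing left to click
    return ("click", POSITIONS[t])
```

For instance, `pi([])` returns the click on ⟨1,1⟩, and after one action/observation pair the policy clicks ⟨1,2⟩ whatever was observed.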
Valid policies
Execution model: at each $t = 0, 1, \dots$
◮ current state $s_t$ (nonobservable)
◮ current history $h_t = a_0 o_0 a_1 o_1 \dots a_{t-1} o_{t-1}$
◮ action $a_t = \pi(h_t)$ executed (or "stop")
◮ observation $o_t$ + new state $s_{t+1}$ chosen nondeterministically wrt $\varphi_\delta$
◮ $o_t$ given to agent
◮ $s_{t+1}$ = new current state
◮ new current history $h_{t+1} = h_t a_t o_t$
Valid policy: for all $s_0 \models \varphi_I$, execution terminates in finite time $t$ with $s_t \models \varphi_G$
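The execution model can be sketched as a short loop. The Python below is an illustration only: `execute`, `outcomes`, and the toggle domain are hypothetical names and a made-up toy domain, and a single call samples one nondeterministic run rather than checking all of them, so it does not by itself establish validity.

```python
import random


def execute(policy, outcomes, s0, max_steps=100, rng=random):
    """Run one execution and return (final_state, history).

    `outcomes(state, action)` lists the possible (observation, next_state)
    pairs; one is picked at random to model nondeterminism. The loop also
    ends after `max_steps` actions, a safeguard added by this sketch.
    """
    state, history = s0, []
    for _ in range(max_steps):
        action = policy(history)          # a_t = pi(h_t)
        if action == "stop":
            break
        obs, state = rng.choice(outcomes(state, action))
        history.append((action, obs))     # h_{t+1} = h_t a_t o_t
    return state, history


# Toy domain (hypothetical): a hidden light; "toggle" flips it and the
# observation reports the new value. Goal: the light is on.
def outcomes(state, action):
    new = "off" if state == "on" else "on"
    return [(new, new)]


def policy(history):
    """Toggle until the last observation says the light is on."""
    if history and history[-1][1] == "on":
        return "stop"
    return "toggle"
```

In this deterministic toy domain, running `execute(policy, outcomes, s0)` from both initial states ends with the light on, which is what validity asks for here.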
Example policy
[Figure: policy tree for the small Minesweeper board, with labels giving cell positions (e.g. 4,1; 4,3) and observed numbers. Successive builds of the slide add snapshots of the partially revealed board, starting from two revealed cells (1 and 2) and revealing more cells at each step.]
Policy Trees
The natural representation:
◮ Node = action, edge = observation
◮ One history = one branch
◮ No child: stop
Usage:
◮ typical output of planners
◮ policy typically found as a tree
◮ DAGs when equivalent situations detected
◮ very verbose
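A policy tree as described above might be encoded as follows. This is a sketch with hypothetical names (`Node`, `next_action`) and placeholder action and observation labels; it only shows how one history selects one branch and how a missing child means "stop".

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    action: str
    children: dict = field(default_factory=dict)  # observation -> Node


def next_action(root, observations):
    """Follow the branch given by the observations seen so far."""
    node = root
    for obs in observations:
        node = node.children.get(obs)
        if node is None:
            return "stop"  # no child for this observation
    return node.action


# Placeholder tree: a0 first, then a1 or a2 depending on the observation.
tree = Node("a0", {
    "o0": Node("a1"),
    "o1": Node("a2", {"o0": Node("a3")}),
})
```

Each distinct history needs its own branch, which is one way to see why the representation grows quickly.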
Finite-State Controllers
Natural compaction of trees:
[Figure: finite-state controller whose nodes are labeled with actions $a_0, \dots, a_4$ and whose edges are labeled with observations $o_0$, $o_1$]
Usage:
◮ direct search in policy space
◮ representation of infinite policies
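A finite-state controller can be sketched as a labeled graph that is walked along the observations. The class below is an assumption-laden illustration (the names `Controller`, `q0`, `q1` and the example transitions are hypothetical, not the controller drawn on the slide); the self-loop shows how a finite object can stand for an unbounded policy.

```python
from dataclasses import dataclass


@dataclass
class Controller:
    start: str
    action: dict      # node -> action label
    successor: dict   # (node, observation) -> next node

    def run(self, observations):
        """Return the actions taken along a sequence of observations."""
        node = self.start
        actions = [self.action[node]]
        for obs in observations:
            node = self.successor.get((node, obs))
            if node is None:
                break  # no outgoing edge: the controller stops
            actions.append(self.action[node])
        return actions


# Hypothetical controller: repeat a0 while o0 is observed, then do a1.
fsc = Controller(
    start="q0",
    action={"q0": "a0", "q1": "a1"},
    successor={("q0", "o0"): "q0", ("q0", "o1"): "q1"},
)
```

With this controller, `fsc.run(["o0", "o0", "o1"])` repeats `a0` three times before switching to `a1`.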
Common Properties
Note:
◮ there are other representations
◮ implicit representation as DAG (Brafman & Hoffmann 2005)
◮ with "pseudo-epistemic" literals (Albore, Geffner & Palacios 2009)
Common properties:
◮ branching on observations
◮ "reactive": execution is instantaneous at each timestep
◮ unreadable
Syntax
Essentially defined by Fagin et al., 1990s:
$$\kappa ::= \varepsilon \mid a \mid \kappa ; \kappa \mid \mathbf{if}\ \Theta\ \mathbf{then}\ \kappa\ \mathbf{else}\ \kappa\ \mathbf{fi} \mid \mathbf{while}\ \Theta\ \mathbf{do}\ \kappa\ \mathbf{od}$$
with $\Theta$ either
◮ subjective epistemic formula over variables $X$
◮ jo($o$) for some observation $o$
Note: no auxiliary variables
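The grammar maps directly onto a small abstract syntax tree. The Python sketch below uses hypothetical class names and keeps each condition Θ as an opaque string, so it covers only the program structure, not the epistemic formulas. It also assumes that a one-armed "if ... then ... fi" abbreviates an else branch containing the empty program ε.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Empty:          # the empty program
    pass


@dataclass(frozen=True)
class Act:            # a single action a
    name: str


@dataclass(frozen=True)
class Seq:            # kappa ; kappa
    first: object
    second: object


@dataclass(frozen=True)
class If:             # if Theta then kappa else kappa fi
    cond: str
    then: object
    orelse: object


@dataclass(frozen=True)
class While:          # while Theta do kappa od
    cond: str
    body: object


def show(k):
    """Print a program back in the concrete syntax of the grammar."""
    if isinstance(k, Empty):
        return "ε"
    if isinstance(k, Act):
        return k.name
    if isinstance(k, Seq):
        return f"{show(k.first)} ; {show(k.second)}"
    if isinstance(k, If):
        return f"if {k.cond} then {show(k.then)} else {show(k.orelse)} fi"
    if isinstance(k, While):
        return f"while {k.cond} do {show(k.body)} od"
    raise TypeError(f"not a program: {k!r}")


# One iteration of the Minesweeper program, with conditions as plain strings.
prog = While("not sure(all cleared)",
             If("sure(no mine at 1,1)", Act("click 1,1"), Empty()))
```

Here `show(prog)` gives back a one-line rendering of the loop in the concrete syntax.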