Generating Explanations for Temporal Logic Planner Decisions


  1. Generating Explanations for Temporal Logic Planner Decisions
  Daniel Kasenberg*, Ravenna Thielstrom, and Matthias Scheutz
  *Daniel Kasenberg, dmk@cs.tufts.edu, @dkasenberg, dkasenberg.github.io

  2. Our (long-term) goal
  ● Agents which can
    – Learn interpretable objectives (through language and behavior) [1]
    – Behave competently with respect to these objectives, even when they conflict [2]
    – Explain their behaviors to human teammates in terms of these objectives (and correct objectives or world models if needed)
  ● ... all while operating in the same environments (MDPs) in which reinforcement learning agents have been successfully deployed.
  [1] Kasenberg, D., & Scheutz, M. (2017, December). Interpretable apprenticeship learning with temporal logic specifications. In 2017 IEEE 56th Annual Conference on Decision and Control (CDC) (pp. 4914-4921). IEEE.
  [2] Kasenberg, D., & Scheutz, M. (2018, April). Norm conflict resolution in stochastic domains. In Thirty-Second AAAI Conference on Artificial Intelligence.

  3. Markov Decision Processes
  A tuple M = ⟨S, A, T, s₀, γ, L⟩, where
  ● S: a finite set of states
  ● A: a finite set of actions
  ● T : S × A × S → [0, 1]: a transition function
  ● s₀ ∈ S: an initial state
  ● γ ∈ [0, 1): a discount factor
  ● L : S → 2^P: a labeling function
    – P is a set of atomic propositions
    – L(s) is the set of propositions true at s
  ● Our explanation approach assumes T is deterministic
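The MDP tuple above can be sketched as a small Python structure. This is an illustrative sketch only (names like `MDP`, `step`, and `label` are mine, not the paper's), with a deterministic transition function as the explanation approach assumes.

```python
from dataclasses import dataclass
from typing import Callable, Hashable

@dataclass
class MDP:
    states: list
    actions: list
    step: Callable    # deterministic transition: (state, action) -> state
    s0: Hashable      # initial state
    gamma: float      # discount factor
    label: Callable   # labeling: state -> frozenset of atomic propositions

# Toy two-state example: the proposition "done" holds only in state 1.
toy = MDP(
    states=[0, 1],
    actions=["go", "stay"],
    step=lambda s, a: 1 if a == "go" else s,
    s0=0,
    gamma=0.9,
    label=lambda s: frozenset({"done"}) if s == 1 else frozenset(),
)
print(toy.label(toy.step(toy.s0, "go")))  # frozenset({'done'})
```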

  4. Example: ShopWorld
  ● Agent is a robot sent to go shopping for its user in a store selling a watch
  ● User wants the watch, but gives the robot insufficient money

  5. Linear temporal logic (LTL)
  A simple propositional logic encoding time, where φ, ψ are LTL statements and p is a proposition.
  ● X φ: "in the next time step, φ"
  ● G φ: "in all present and future time steps, φ"
  ● F φ: "in some present/future time step, φ"
  ● φ U ψ: "φ will be true until ψ becomes true"
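The operator meanings above can be sketched as an evaluator over a *finite* trace (a list of sets of propositions). Finite-trace semantics is an illustrative simplification on my part; the paper works with (co-)safe LTL over infinite words via good/bad prefixes.

```python
# Formulas are nested tuples, e.g. ("F", ("prop", "holding")).
def holds(formula, trace, t=0):
    op = formula[0]
    if op == "prop":
        return formula[1] in trace[t]
    if op == "not":
        return not holds(formula[1], trace, t)
    if op == "X":   # next (false at the last step of a finite trace)
        return t + 1 < len(trace) and holds(formula[1], trace, t + 1)
    if op == "G":   # all present and future steps
        return all(holds(formula[1], trace, i) for i in range(t, len(trace)))
    if op == "F":   # some present/future step
        return any(holds(formula[1], trace, i) for i in range(t, len(trace)))
    if op == "U":   # formula[1] holds until formula[2] becomes true
        return any(
            holds(formula[2], trace, j)
            and all(holds(formula[1], trace, i) for i in range(t, j))
            for j in range(t, len(trace))
        )
    raise ValueError(f"unknown operator {op!r}")

trace = [{"inStore"}, {"inStore", "holding"}, {"holding"}]
print(holds(("F", ("prop", "holding")), trace))  # True
print(holds(("G", ("prop", "inStore")), trace))  # False
```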

  6. LTL specifications in ShopWorld
  ● "never leave the store while holding an object that has not been bought" (no shoplifting)
  ● "leave the store while holding the watch"

  7. Preferences over LTL objectives
  ● We can give each objective a priority and a weight
  ● Violations of objectives with the same priority can be traded off (using their weights as an "exchange rate")
  ● Violations of objectives with different priorities can't be traded off: the agent prefers to satisfy the higher-priority objective and violate any number of lower-priority objectives
    – Lexicographic ordering
  ● The weight and priority vectors w, ρ induce a relation over vectors in ℝⁿ
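One way to realize the lexicographic ordering described above: combine the violation costs of objectives sharing a priority via their weights, then compare priority classes highest-first. This is a hedged sketch; the function name and data layout are mine, not the paper's.

```python
def preference_key(violations, weights, priorities):
    """Map a per-objective violation-cost vector to a tuple that sorts
    lexicographically: one weighted sum per priority level, highest first."""
    levels = sorted(set(priorities), reverse=True)
    return tuple(
        sum(w * v for v, w, p in zip(violations, weights, priorities) if p == lvl)
        for lvl in levels
    )

w = [1.0, 2.0, 1.0]
rho = [1, 0, 0]  # objective 0 has strictly higher priority than 1 and 2

a = preference_key([0.0, 5.0, 5.0], w, rho)  # satisfies the high-priority objective
b = preference_key([1.0, 0.0, 0.0], w, rho)  # violates it, satisfies the rest
print(a < b)  # True: a is preferred despite its large low-priority cost
```

Tuple comparison in Python is itself lexicographic, which is why sorting by this key yields the ordering the slide describes.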

  8. Multi-objective LTL planning problem
  A tuple ⟨M, Φ, w, ρ⟩, where
  ● M: a Markov Decision Process
  ● Φ: a set of (safe/co-safe) LTL objectives
  ● w, ρ: the weight and priority vectors respectively

  9. From LTL to finite state machines
  ● We use syntactically (co-)safe LTL objectives
  ● For each such objective φ, we can construct a finite state machine (FSM) which accepts a finite trajectory if it is a bad (good) prefix of φ
    – e.g. for F p, a good prefix is any finite trajectory where p holds at some step
  ● Use this to construct a product MDP whose state space is the product of S with the FSM state spaces
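The good-prefix example above can be sketched as a tiny monitor for the co-safe objective F g: it accepts any finite trajectory in which g holds at some step. This is an illustration only; real constructions handle arbitrary syntactically (co-)safe LTL, not just this one shape.

```python
def good_prefix_F(g, trace):
    """Monitor (two-state FSM) accepting good prefixes of 'F g'."""
    state = "waiting"
    for props in trace:
        if state == "waiting" and g in props:
            state = "accepting"  # absorbing: the prefix is already good
    return state == "accepting"

print(good_prefix_F("boughtWatch", [{"inStore"}, {"boughtWatch"}]))  # True
print(good_prefix_F("boughtWatch", [{"inStore"}, {"inStore"}]))      # False
```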

  10. Solving the LTL planning problem
  Using the product MDP, we can define a product-space reward function, and thus the planning problem can be framed as a reward maximization problem on the product MDP (solvable with value iteration).
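A compact value-iteration sketch for a deterministic MDP, of the kind the slide says solves the product-space reward maximization. The reward function here is a toy placeholder; in the paper the reward is derived from the objectives' FSM states in the product MDP.

```python
def value_iteration(states, actions, step, reward, gamma=0.9, iters=200):
    """Deterministic-transition value iteration: V(s) = max_a [r(s,a) + γ V(step(s,a))]."""
    V = {s: 0.0 for s in states}
    for _ in range(iters):
        V = {
            s: max(reward(s, a) + gamma * V[step(s, a)] for a in actions)
            for s in states
        }
    return V

# Toy chain 0 -> 1 -> 2: reward 1 for stepping from state 1 into state 2.
states = [0, 1, 2]
step = lambda s, a: min(s + 1, 2) if a == "fwd" else s
reward = lambda s, a: 1.0 if (a == "fwd" and s == 1) else 0.0
V = value_iteration(states, ["fwd", "stay"], step, reward)
print(V[1] > V[0] > V[2])  # True: states nearer the reward have higher value
```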

  11. LTL "why" queries
  ● We consider queries of the form "why φ?", where φ is an arbitrary (safe/co-safe) LTL statement
  ● Interpretation: "why did the agent act in such a way as to make φ hold?"
  ● Examples in ShopWorld:
    – "why didn't the agent leave the store?"
    – "why did the agent never buy the watch?"
    – "why didn't the agent leave the store while holding the watch?"

  12. Minimal evidence for an unsatisfactory trajectory
  ● We define the minimal evidence that a trajectory τ is unsatisfactory for an LTL statement φ in terms of
    – the positive and negative literals of φ
    – the good prefixes of φ if φ is co-safe; the non-bad prefixes of φ if φ is safe
  ● (worked ShopWorld example shown on slide)
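To illustrate the flavor of this definition for one simple case: for a safe objective G p, a finite trace is shown unsatisfactory by the first timestep at which p fails, reported as a (timestep, literal) pair. This special case is my own sketch; the general definition in the paper covers arbitrary (co-)safe formulas.

```python
def minimal_evidence_G(p, trace):
    """Minimal (timestep, literal) evidence that a finite trace violates 'G p'.
    Returns an empty set if no violation is witnessed in this prefix."""
    for t, props in enumerate(trace):
        if p not in props:
            return {(t, "not " + p)}
    return set()

print(minimal_evidence_G("inStore", [{"inStore"}, set()]))  # {(1, 'not inStore')}
```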

  13. Explanation structures
  ● The agent responds to a "why φ?" query with an explanation structure containing
    – the queried statement φ
    – a counterfactual trajectory τ′ (or none)
    – one or more pairs of an LTL statement and a set of (timestep, literal) pairs sufficient to show that the actual trajectory τ is unsatisfactory for that statement
    – the same, but for τ′

  14. Answering "why φ?"
  1. Does φ hold on the agent's actual trajectory? If not, return "φ is not, in fact, true"
  2. Is there some achievable trajectory on which ¬φ holds? If not, return "φ is true because it is impossible to make φ false"

  15. Answering "why φ?"
  3. Compute a trajectory τ′ that maximally satisfies the agent's objectives such that ¬φ holds on τ′
  ● τ′ is the solution to the new planning problem
  ● Return the explanation structure (comparing τ and τ′ in terms of their satisfaction of the objectives)
  ● Because τ′ maximally satisfies the objectives, this structure indicates how satisfying ¬φ would compromise the agent's ability to satisfy its objectives
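The three steps above can be sketched as a control-flow skeleton. All helper names here (`holds_on`, `achievable`, `plan_with_constraint`, `StubPlanner`) are hypothetical stand-ins for the paper's model-checking and planning machinery, and the returned dicts merely gesture at the explanation structures of slide 13.

```python
def answer_why(phi, tau, planner):
    # Step 1: does phi actually hold on the agent's trajectory tau?
    if not planner.holds_on(phi, tau):
        return {"case": "not-true", "phi": phi}
    # Step 2: is any trajectory satisfying "not phi" achievable at all?
    if not planner.achievable(("not", phi)):
        return {"case": "unavoidable", "phi": phi}
    # Step 3: plan a counterfactual trajectory that maximally satisfies
    # the objectives subject to "not phi", and compare it with tau.
    tau_alt = planner.plan_with_constraint(("not", phi))
    return {"case": "contrastive", "phi": phi,
            "actual": tau, "counterfactual": tau_alt}

class StubPlanner:
    """Trivial stub so the skeleton runs; a real planner would model-check
    and re-plan against the product MDP."""
    def holds_on(self, phi, tau): return True
    def achievable(self, phi): return True
    def plan_with_constraint(self, phi): return ["pickUp", "leaveStore"]

out = answer_why("didn't leave with watch", ["pickUp"], StubPlanner())
print(out["case"])  # contrastive
```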

  16. Answering "why φ?" in ShopWorld
  ● Query: "why didn't you leave the store while holding the watch?"
  1. Does φ hold on the actual trajectory? Yes
  2. Is ¬φ achievable? Yes
  3. τ′: pickUp, leaveStore
  ● Return: the explanation structure comparing τ and τ′
  ● Indicates that while the true trajectory fails to leave while holding the watch, the only way to satisfy ¬φ would have been to steal the watch, which would violate a higher-priority specification

  17. From explanation structures to natural language
  ● We integrated this functionality with the NL pipeline in DIARC, a robotic architecture [3, 4]
  ● Specifications and queries in an object-oriented extension to LTL (violation enumeration language; VEL) allowing quantification over objects
  ● Utterance → VEL query → explanation structure → natural language response
  [3] Kasenberg, D., Roque, A., Thielstrom, R., & Scheutz, M. (2019). Engaging in Dialogue about an Agent's Norms and Behaviors. In Proceedings of the 1st Workshop on Interactive Natural Language Technology for Explainable Artificial Intelligence (NL4XAI 2019) (pp. 26-28).
  [4] Kasenberg, D., Roque, A., Thielstrom, R., Chita-Tegmark, M., & Scheutz, M. (2019). Generating justifications for norm-related agent decisions. In Proceedings of the 12th International Conference on Natural Language Generation (pp. 484-493).

  18. Natural language explanations
  ● Example: ShopWorld with two objects (glasses and watch); agent can afford one
    – Buys the glasses, leaves the watch

  19. Future work
  ● Incorporating explicit causal models (esp. in the … case)
  ● Tailoring explanations to interactant knowledge
  ● Adapting to stochastic environments
    – Need to represent multiple trajectories or a probability distribution
  ● Improving efficiency of the planner
    – Impractical for nontrivial domains
  ● Dropping the assumption that the agent has perfect knowledge of transition dynamics

  20. Thanks
  ● Funding sources
    – NSF IIS grant 43520050
    – NASA grant C17-2D00-TU
  ● Collaborators
    – Matthias Scheutz (advisor; co-author)
    – Ravenna Thielstrom (co-author)
    – Antonio Roque
    – Meia Chita-Tegmark
    – others
