Learning to Plan with Logical Automata
Brandon Araki 1*, Kiran Vodrahalli 2*, Thomas Leech 1,3, Mark Donahue 3, Cristian-Ioan Vasile 1, Daniela Rus 1
1 Massachusetts Institute of Technology; 2 Columbia University; 3 MIT Lincoln Laboratory
*Equal contributors
Goals
Learn to plan in an environment with rules:
1. Learn the rules in a form that humans can easily interpret.
2. Incorporate the rules into planning so that modifying the rules results in predictable changes in behavior.
Packing a Lunchbox
Pack a burger or a sandwich; then pack a banana.
Goal 1 – Interpretability
Rules: Pack a burger or a sandwich; then pack a banana.
[Figure: the rule as a finite state automaton: Initial State → Picked up (burger or sandwich) → Packed → Picked up (banana) → Packed → GOAL!]
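The automaton on this slide can be written down as a small transition table, which is what makes it readable by humans. The following Python sketch is an illustration only: the state names S0 to S3 and G are borrowed from the later slides, while the proposition names (`burger`, `sandwich`, `banana`, `lunchbox`) and the convention that unlisted inputs leave the state unchanged are assumptions.

```python
# Hypothetical encoding of "pack a burger or a sandwich; then pack a banana".
# State names follow the slides; proposition names are assumptions:
# "burger"/"sandwich"/"banana" = the agent reaches (picks up) that item,
# "lunchbox" = the agent reaches the lunchbox (packs what it holds).

LUNCHBOX_FSA = {
    ("S0", "burger"): "S1",    # picked up the burger ...
    ("S0", "sandwich"): "S1",  # ... or the sandwich
    ("S1", "lunchbox"): "S2",  # packed it
    ("S2", "banana"): "S3",    # picked up the banana
    ("S3", "lunchbox"): "G",   # packed it: goal reached
}


def step(state, prop, fsa=LUNCHBOX_FSA):
    """Advance the FSA by one proposition; unlisted pairs are self-loops."""
    return fsa.get((state, prop), state)


def run(props, fsa=LUNCHBOX_FSA, start="S0"):
    """Feed a sequence of propositions through the FSA; return the final state."""
    state = start
    for prop in props:
        state = step(state, prop, fsa)
    return state
```

For example, `run(["banana", "burger", "lunchbox"])` ends in `"S2"`: the early banana is ignored because, under this encoding, the automaton only responds to the banana after the first item has been packed.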
Factoring the Environment
High-level Markov decision process (MDP): pack a sandwich or burger; then pack a banana.
Low-level MDP: avoid obstacles.
Representing the Environment
High level (pack a sandwich or burger; then pack a banana): a finite state automaton. [Figure: Initial State → Picked up (sandwich or burger) → Packed → Picked up → Packed]
Low level (avoid obstacles): a discrete 2D gridworld.
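Goal 2 – Manipulability
Incorporate the FSA into planning.
[Figure: the lunchbox FSA (Initial State → Picked up → Packed → Picked up → Packed), with states labeled S0, S1, S2, S3, and G, next to its transition matrix for state S0, indexed by proposition (including o and Ø) and FSA state (S0, S1, S2, S3, G, T).]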
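To incorporate the FSA into a planner, it helps to store its transitions as one 0/1 matrix per proposition; slicing every matrix at row S0 gives a proposition-by-next-state table like the one shown for S0 on this slide. This is a sketch under stated assumptions: the proposition names are hypothetical, and we assume `o` marks an obstacle that sends the automaton to a trap state `T`, `Ø` means no proposition holds, and `G` and `T` are absorbing.

```python
# Sketch: the lunchbox FSA as a one-hot transition tensor,
# TM[p][i][j] = 1 if proposition p moves the FSA from state i to state j.
# Assumptions (not confirmed by the slides): proposition names, "o" = obstacle
# leading to trap state "T", "Ø" = no proposition, and G/T are absorbing.

STATES = ["S0", "S1", "S2", "S3", "G", "T"]
PROPS = ["burger", "sandwich", "banana", "lunchbox", "o", "Ø"]

EDGES = {
    ("S0", "burger"): "S1",
    ("S0", "sandwich"): "S1",
    ("S1", "lunchbox"): "S2",
    ("S2", "banana"): "S3",
    ("S3", "lunchbox"): "G",
}


def next_state(state, prop):
    """Deterministic FSA transition under the assumptions above."""
    if state in ("G", "T"):      # absorbing goal and trap states
        return state
    if prop == "o":              # hitting an obstacle is assumed to be fatal
        return "T"
    return EDGES.get((state, prop), state)  # otherwise a self-loop


def transition_tensor():
    """Return TM as nested lists indexed [prop][from_state][to_state]."""
    n = len(STATES)
    tm = [[[0] * n for _ in range(n)] for _ in PROPS]
    for p, prop in enumerate(PROPS):
        for i, state in enumerate(STATES):
            j = STATES.index(next_state(state, prop))
            tm[p][i][j] = 1
    return tm


TM = transition_tensor()
```

Each row of each matrix contains exactly one 1. A learned planner would presumably need a relaxed version of this table, for example rows of probabilities instead of 0/1 entries, so that gradients can flow through it; that relaxation is not shown here.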
Differentiable Recursive Planning
Learn the transitions of the FSA, the reward, and the low-level transitions, using one value iteration network (VIN) for each FSA state.
Based on Tamar, Aviv, et al. "Value iteration networks." Advances in Neural Information Processing Systems. 2016.
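The idea of keeping one value map per FSA state can be illustrated without any learning. The sketch below runs ordinary value iteration on the product of a small gridworld and the lunchbox FSA. It is classical value iteration with a fully known model, not the differentiable VIN on this slide, and the grid layout, cell symbols, the obstacle-to-trap convention, the discount of 0.9, and the +1 reward for reaching G are all hypothetical choices.

```python
# Sketch: value iteration over (FSA state, grid cell), one value map per FSA state.
# NOT the learned VIN: the model is known. Grid, symbols, trap rule, discount,
# and reward (+1 on reaching G) are hypothetical.

GRID = [
    "B.#.L",   # B = burger, # = obstacle, L = lunchbox
    "..#..",
    "S...N",   # S = sandwich, N = banana
]
PROP_OF = {"B": "burger", "S": "sandwich", "N": "banana",
           "L": "lunchbox", "#": "o", ".": "Ø"}
EDGES = {("S0", "burger"): "S1", ("S0", "sandwich"): "S1",
         ("S1", "lunchbox"): "S2", ("S2", "banana"): "S3",
         ("S3", "lunchbox"): "G"}
LIVE = ["S0", "S1", "S2", "S3"]          # non-terminal FSA states
ROWS, COLS = len(GRID), len(GRID[0])
CELLS = [(r, c) for r in range(ROWS) for c in range(COLS)]
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def fsa_step(f, cell):
    """FSA transition triggered by the proposition of the cell just entered."""
    prop = PROP_OF[GRID[cell[0]][cell[1]]]
    if prop == "o":
        return "T"                        # assumed trap state on an obstacle
    return EDGES.get((f, prop), f)


def neighbors(cell):
    """In-bounds cells reachable with one move (no 'stay' action)."""
    r, c = cell
    return [(r + dr, c + dc) for dr, dc in MOVES
            if 0 <= r + dr < ROWS and 0 <= c + dc < COLS]


def q_value(V, f, nxt, gamma):
    """Value of moving to nxt while in FSA state f."""
    f2 = fsa_step(f, nxt)
    if f2 == "G":
        return 1.0                        # reward for satisfying the rule
    if f2 == "T":
        return 0.0                        # trap: no further reward
    return gamma * V[f2][nxt]             # read the value map of the next FSA state


def value_iteration(gamma=0.9, iters=100):
    V = {f: {cell: 0.0 for cell in CELLS} for f in LIVE}
    for _ in range(iters):                # synchronous Bellman backups
        V = {f: {cell: max(q_value(V, f, n, gamma) for n in neighbors(cell))
                 for cell in CELLS} for f in LIVE}
    return V


def rollout(V, start, f="S0", gamma=0.9, max_steps=50):
    """Follow the greedy policy; return the final FSA state and step count."""
    cell = start
    for step in range(1, max_steps + 1):
        cell = max(neighbors(cell), key=lambda n: q_value(V, f, n, gamma))
        f = fsa_step(f, cell)
        if f in ("G", "T"):
            return f, step
    return f, max_steps
```

With `V = value_iteration()`, calling `rollout(V, (1, 0))` from the empty cell between the burger and the sandwich returns `("G", 11)`: the greedy agent fetches the nearer sandwich rather than the burger, packs it, then fetches and packs the banana. The coupling between the per-FSA-state value maps happens in `q_value`, where the value of a move in FSA state `f` is read from the map of the FSA state that the move leads to.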
Experiments – Interpretability
[Figure: learned transition matrix for FSA state S0; axes: propositions (including o and Ø) and FSA states (S0, S1, S2, S3, G, T).]
In the learned transition matrix for S0, picking up the sandwich or the hamburger causes a transition to the next state.
Experiments – Manipulability
We can modify the FSA so that the agent picks up only the burger and not the sandwich.
[Figure: modified FSA: Initial State → Picked up → Packed → Picked up → Packed.]
[Figure: the transition matrix for FSA state S0 under this modification; axes: propositions (including o and Ø) and FSA states (S0, S1, S2, S3, G, T).]
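In a table-based encoding of the automaton, one simple way to express this edit is to delete the edge that lets the sandwich advance the automaton out of S0. The slide's modified FSA may encode the change differently; the proposition names and the self-loop default for unlisted inputs are assumptions carried over from a hypothetical encoding.

```python
# Sketch: editing a hypothetical lunchbox FSA so that only the burger counts.
# Proposition names are assumptions; unlisted (state, proposition) pairs self-loop.

ORIGINAL = {
    ("S0", "burger"): "S1",
    ("S0", "sandwich"): "S1",
    ("S1", "lunchbox"): "S2",
    ("S2", "banana"): "S3",
    ("S3", "lunchbox"): "G",
}


def burger_only(fsa):
    """Return a copy of the FSA with the S0 --sandwich--> S1 edge removed."""
    edited = dict(fsa)                 # leave the original automaton untouched
    del edited[("S0", "sandwich")]     # picking up the sandwich no longer advances
    return edited


def run(fsa, props, state="S0"):
    """Feed propositions through the FSA and return the final state."""
    for prop in props:
        state = fsa.get((state, prop), state)
    return state


MODIFIED = burger_only(ORIGINAL)
```

Under the modified automaton, the sandwich sequence `["sandwich", "lunchbox", "banana", "lunchbox"]` never leaves S0, while the same sequence with the burger still reaches G. A planner that derives its values from the automaton would therefore have no incentive to fetch the sandwich, which is the kind of predictable behavior change the Goals slide asks for.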