Slide 1
Learning to Plan with Logical Automata
Brandon Araki 1*, Kiran Vodrahalli 2*, Thomas Leech 1,3, Mark Donahue 3, Cristian-Ioan Vasile 1, Daniela Rus 1
1 Massachusetts Institute of Technology  2 Columbia University  3 MIT Lincoln Laboratory
*Equal contributors

Slide 2
Many environments have simple rules – for example cooking from a recipe, playing games, driving, and assembly. People are able to learn how to perform tasks like these by observing an expert. When observing an expert, people don’t learn to just mimic the expert. They learn the rules that the expert is following. This allows a person who has, for example, learned to cook a dish to modify the ingredients they put in the dish or the order in which they add them.

Slide 3 – Goals
Our goal is to replicate this ability algorithmically using model-based imitation learning: learn to plan in an environment with rules.
1. Learn the rules in a way that they can be easily interpreted by humans.
2. Incorporate the rules into planning so that modifying the rules results in predictable changes in behavior.
Slide 4 – Packing a Lunchbox
Rule: pack a burger or a sandwich; then pack a banana.
Let’s say you have a robot that has to pack a lunchbox. The rules are that it has to first pack a burger or a sandwich, and then pack the banana.

Slide 5 – Goal 1 – Interpretability
Rules: pack a burger or a sandwich; then pack a banana.
[Figure: finite state automaton. Initial State → Picked up (burger or sandwich) → Packed → Picked up (banana) → Packed → GOAL!]
We can make these rules both useful and interpretable by representing them as a finite state automaton (FSA). We assume that the environment can be factored into a high-level Markov Decision Process, which is equivalent to the FSA, and a low-level MDP of the sort usually used in reinforcement learning.

Slide 6 – Factoring the Environment
High-level MDP: pack sandwich or burger; then pack banana; avoid obstacles.
Low-level MDP: the underlying reinforcement learning environment.
So you can imagine that we have a reinforcement learning robot arm simulator with state x, y, theta, etc., and actions such as torques or commanded positions. We assume that there is also a high-level MDP, which embodies the rules that the robot must follow.
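To make the factoring concrete, here is a minimal sketch (not the authors’ code) of how the two levels combine: the agent’s full state pairs a low-level gridworld state with a high-level FSA state, and propositions observed in the low level drive the FSA. All state names, proposition names, and the transition table below are illustrative assumptions based on the lunchbox example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FactoredState:
    grid_pos: tuple   # low-level MDP state, e.g. an (x, y) cell in the gridworld
    fsa_state: str    # high-level MDP state, e.g. "S0", "S1", ...

def step_fsa(fsa_state, proposition, transitions):
    """Advance the FSA when a proposition (e.g. 'burger') becomes true.
    `transitions` maps (state, proposition) -> next state; missing pairs
    leave the FSA state unchanged (a self-loop)."""
    return transitions.get((fsa_state, proposition), fsa_state)

# Hypothetical encoding of the lunchbox rules:
# pack a burger or a sandwich, then pack a banana.
transitions = {
    ("S0", "burger"): "S1", ("S0", "sandwich"): "S1",
    ("S1", "packed"): "S2",
    ("S2", "banana"): "S3",
    ("S3", "packed"): "G",  # G: goal reached
}

s = FactoredState(grid_pos=(0, 0), fsa_state="S0")
s = FactoredState(s.grid_pos, step_fsa(s.fsa_state, "sandwich", transitions))
assert s.fsa_state == "S1"  # picking up the sandwich advances the FSA
```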
Slide 7 – Representing the Environment
Finite state automaton: pack sandwich or burger; then pack banana; avoid obstacles.
[Figure: the FSA (Initial State → Picked up (sandwich or burger) → Packed → Picked up (banana) → Packed) alongside a discrete 2D gridworld.]

Slide 8 – Goal 2 – Manipulability
Incorporate the FSA into planning.
[Figure: the FSA with its states labeled S0, S1, S2, S3, G, and T, next to the transition matrix of state S0. Columns are propositions, including the obstacle o and the empty proposition Ø; rows are the FSA states S0, S1, S2, S3, G, T.]
We want to be able to modify the behavior of the agent in order to make it perform tasks that are similar to, but different from, the one it has learned. We achieve this by incorporating the FSA into a recursive planning step. Since FSAs are graphs, they can be converted into a transition matrix. First it is useful to label each FSA state with a name. Here is the transition matrix of the first FSA state: the columns are associated with features of the environment, and the rows correspond to FSA states. You can see by looking at the graph that the sandwich and the burger cause a transition to state S1, whereas the other items do not cause a transition to a new state.
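As a hedged reconstruction of the slide’s figure, the matrix for state S0 might be written in numpy as follows. The proposition set and its ordering are assumptions (the slide only shows o and Ø explicitly); each column is the distribution over next FSA states when that proposition holds.

```python
import numpy as np

props = ["sandwich", "burger", "banana", "o", "Ø"]   # columns: propositions
states = ["S0", "S1", "S2", "S3", "G", "T"]          # rows: FSA states

# T_S0[i, j] = probability of moving to FSA state i from S0 when
# proposition j holds: sandwich/burger advance to S1, the obstacle o
# leads to the trap state T, and everything else self-loops in S0.
T_S0 = np.zeros((len(states), len(props)))
T_S0[states.index("S1"), props.index("sandwich")] = 1.0
T_S0[states.index("S1"), props.index("burger")] = 1.0
T_S0[states.index("T"), props.index("o")] = 1.0
T_S0[states.index("S0"), props.index("banana")] = 1.0
T_S0[states.index("S0"), props.index("Ø")] = 1.0
```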
Slide 9 – Differentiable Recursive Planning
Learn the transitions of the FSA; learn the reward; learn the low-level transitions. One VIN for each FSA state.
Based on Tamar, Aviv, et al. "Value Iteration Networks." Advances in Neural Information Processing Systems, 2016.
We use differentiable recursive planning to approximate value iteration and calculate a policy for the agent. The matrix form of the FSA allows us to embed the FSA as a convolution in the planning step – for more details, come to our poster session.

Slide 10 – Experiments – Interpretability
[Figure: the learned transition matrix of state S0, with propositions as columns and the FSA states S0, S1, S2, S3, G, T as rows.]
This is the learned transition matrix of the first state of the FSA. Columns correspond to propositions, or important features of the environment. Rows correspond to the other FSA states.

Slide 11 – Experiments – Interpretability
[Figure: the same matrix, highlighting that picking up the sandwich or the hamburger causes a transition to the next state.]
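The planning recursion can be sketched in plain numpy, under heavy assumptions: this is not the authors’ implementation (which uses one value iteration network per FSA state, with learned convolutional transition and reward models), and the deterministic grid motion with wrapping borders below is a simplification. It only illustrates how the FSA transition matrices couple one value map per FSA state during value iteration.

```python
import numpy as np

def plan(reward, prop_map, fsa_T, gamma=0.95, iters=50):
    """reward:   (n_fsa, H, W) learned reward per FSA state and grid cell
       prop_map: (n_props, H, W) one-hot map of the proposition true at
                 each cell (Ø where nothing holds)
       fsa_T:    (n_fsa, n_fsa, n_props) stack of per-state matrices like
                 T_S0 above; fsa_T[f, g, p] = P(next FSA state g | f, p)"""
    n_fsa, H, W = reward.shape
    V = np.zeros((n_fsa, H, W))
    for _ in range(iters):
        # FSA step: at every cell, the proposition true there selects a
        # column of the current state's matrix, giving the expected value
        # over next FSA states (sum over g and p).
        V_next = np.einsum("fgp,phw,ghw->fhw", fsa_T, prop_map, V)
        # Low-level step: one-step lookahead over grid moves
        # (stay/up/down/left/right), with deterministic, wrapping motion.
        Q = np.stack([np.roll(V_next, s, axis=ax) if s else V_next
                      for s, ax in [(0, 1), (1, 1), (-1, 1), (1, 2), (-1, 2)]])
        V = reward + gamma * Q.max(axis=0)
    return V
```

In the actual architecture the low-level lookahead is a learned convolution, which is why expressing the FSA in matrix form lets the whole recursion stay differentiable and be trained end-to-end from demonstrations.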
Slide 12 – Experiments – Manipulability
We can modify the FSA so that it will only pick up the burger and not the sandwich.
[Figure: the lunchbox FSA with the sandwich edge out of the initial state highlighted.]
Since we have learned an interpretable model of the rules, we can easily modify the rules to change the behavior of the agent. In terms of the FSA, this means just deleting the edge between the initial state and the next state.

Slide 13 – Experiments – Manipulability
We can modify the FSA so that it will only pick up the burger and not the sandwich.
[Figure: the same FSA with the sandwich edge deleted.]

Slide 14 – Experiments – Manipulability
We can modify the FSA so that it will only pick up the burger and not the sandwich.
[Figure: the transition matrix of state S0 with the corresponding entry changed.]
This is also easy to express using the transition matrix of the FSA; we can change the values in the matrix to change the form of the FSA.
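Continuing the earlier numpy sketch (again an illustrative assumption, not the authors’ code), deleting the sandwich edge is just an edit to two entries of T_S0: the sandwich column stops pointing to S1 and becomes a self-loop.

```python
# Remove the Initial State -> Picked up edge for the sandwich, so only
# the burger advances the FSA out of S0.
T_S0[states.index("S1"), props.index("sandwich")] = 0.0
T_S0[states.index("S0"), props.index("sandwich")] = 1.0
```

Replanning with the edited matrices should then produce the new burger-only behavior directly, with no retraining.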
Slide 15 – Experiments – Manipulability
We can modify the FSA so that it will only pick up the burger and not the sandwich.
[Figure: the transition matrix of state S0 as its values are edited.]

Slide 16 – Experiments – Manipulability
We can modify the FSA so that it will only pick up the burger and not the sandwich.
[Figure: the edited transition matrix of state S0.]

Slide 17
Learning to Plan with Logical Automata
Brandon Araki 1*, Kiran Vodrahalli 2*, Thomas Leech 1,3, Mark Donahue 3, Cristian-Ioan Vasile 1, Daniela Rus 1
1 Massachusetts Institute of Technology  2 Columbia University  3 MIT Lincoln Laboratory
*Equal contributors