Situated Mapping of Sequential Instructions to Actions with Single-step Reward Observation


  1. Situated Mapping of Sequential Instructions to Actions with Single-step Reward Observation Alane Suhr and Yoav Artzi

  2. Executing Context-Dependent Instructions • Task: map a sequence of instructions to actions • Existing work: symbolic representations • Today: system actions, modeling context, learning from exploration

  3. Executing a Sequence of Instructions (figure: seven beakers, numbered 1 to 7) • Empty out the leftmost beaker of purple chemical • Then, add the contents of the first beaker to the second • Mix it • Then, drain 1 unit from it • Same for 1 more unit

  11. Problem Setup • Task: follow a sequence of instructions • Learning from instructions and corresponding world states • Example instructions: "Empty out the leftmost beaker of purple chemical"; "Then, add the contents of the first beaker to the second"; "Mix it"; "Then, drain 1 unit from it"; "Same for 1 more unit"

  17. Related Work • Context-dependent language understanding • Static environments (e.g., large database): Miller et al. 1996, Zettlemoyer and Collins 2009, Suhr et al. 2018 • Environments that change over time while instructions are given: Long et al. 2016, Guu et al. 2017, Fried et al. 2018 • Following instructions in isolation; varying levels of supervision: Chen and Mooney 2011, Chen 2012, Artzi and Zettlemoyer 2013, Artzi et al. 2014, Andreas and Klein 2015, Bisk et al. 2016, Misra et al. 2017

  18. Today 1. Attention-based model for generating sequences of system actions that modify the environment 2. Exploration-based learning procedure that avoids biases learned early in training

  19. System Actions (figure: beakers numbered 1 to 7) • Each beaker is a stack • Actions are pop and push • Example: "Mix it" maps to pop 2; pop 2; pop 2; push 2 brown; push 2 brown; push 2 brown;
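The stack view of beakers on slide 19 can be sketched in a few lines of Python. This is only an illustration: the `State` type, the function names, and the starting contents of the beakers are assumptions made for the example, not details taken from the slides.

```python
from typing import List

# Hypothetical world state: one stack of colour names per beaker.
# Index 0 holds beaker 1; the last list element is the top of the stack.
State = List[List[str]]


def pop(state: State, beaker: int) -> State:
    """Return a copy of the state with the top unit removed from a 1-indexed beaker."""
    new_state = [list(b) for b in state]
    if new_state[beaker - 1]:
        new_state[beaker - 1].pop()
    return new_state


def push(state: State, beaker: int, color: str) -> State:
    """Return a copy of the state with one unit of `color` added to a 1-indexed beaker."""
    new_state = [list(b) for b in state]
    new_state[beaker - 1].append(color)
    return new_state


def execute(state: State, program: str) -> State:
    """Run a ';'-separated action string such as 'pop 2; push 2 brown;'."""
    for action in (a.strip() for a in program.split(";")):
        if not action:
            continue
        parts = action.split()
        if parts[0] == "pop":
            state = pop(state, int(parts[1]))
        elif parts[0] == "push":
            state = push(state, int(parts[1]), parts[2])
        else:
            raise ValueError(f"unknown action: {action}")
    return state


# Assumed starting contents; the slides only show them as a picture.
start: State = [["green"], ["red", "green", "red"], [], [], [], [], []]
mixed = execute(start, "pop 2; pop 2; pop 2; push 2 brown; push 2 brown; push 2 brown;")
print(mixed[1])
```

With these assumed contents, "Mix it" removes the three units of beaker 2 and pushes three brown units back, leaving the other beakers untouched.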

  20. Meaning Representation (figure: beakers numbered 1 to 7; instruction "Mix it") • High-level representation (program, engineering abstractions): mix(prevArg2(2)) • vs. • System actions (learning abstractions): pop 2; pop 2; pop 2; push 2 brown; push 2 brown; push 2 brown;

  22. Model • Four inputs: current instruction, previous instructions, initial state, current state • Output: a sequence of actions • Attend over each input when generating actions • Example: previous instructions "Throw out first beaker" and "Pour sixth beaker into last one"; current instruction "It turns brown"

  23. Model • Encode instructions

  24. Model • Encode states

  25. Model • Initialize decoder (decoder state)

  26. Model • Attend over current instruction

  27. Model • Attend over previous instructions

  28. Model • Attend over initial state

  29. Model • Attend over current state
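The four attention steps on slides 26 to 29 can be illustrated with a small sketch. The slides do not say how attention scores are computed or how the four results are combined, so the dot-product scoring, the concatenation, and all names below (`attend`, `decoder_context`, the toy vectors) are assumptions chosen to keep the example minimal.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: Sequence[Vector]) -> Vector:
    """Dot-product attention: average of `keys`, weighted by softmax(query . key)."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(keys[0])
    return [sum(w * key[i] for w, key in zip(weights, keys)) for i in range(dim)]


def decoder_context(
    decoder_state: Vector,
    current_instruction: Sequence[Vector],
    previous_instructions: Sequence[Vector],
    initial_state: Sequence[Vector],
    current_state: Sequence[Vector],
) -> Vector:
    """Attend over each of the four encoded inputs and concatenate the results."""
    context: Vector = []
    for encoded in (current_instruction, previous_instructions, initial_state, current_state):
        context.extend(attend(decoder_state, encoded))
    return context


# Toy 2-dimensional encodings standing in for real encoder outputs.
toy = [[1.0, 0.0], [0.0, 1.0]]
context = decoder_context([10.0, 0.0], toy, toy, toy, toy)
print(len(context))
```

Each input contributes one attended vector, so the context here has four times the encoding dimension.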

  30. Model • Predict action: an MLP over the decoder state and the attended current instruction, previous instructions, initial state, and current state predicts pop 7

  31. Model • Execute action (pop 7), update state

  32. Model • Attend over new state

  33. Model • Action decoder, actions so far: pop 7; pop 7

  34. Model • Action decoder, actions so far: pop 7; pop 7; pop 7

  35. Model • Action decoder, actions so far: pop 7; pop 7; pop 7; push 7 brown

  36. Model • Action decoder, actions so far: pop 7; pop 7; pop 7; push 7 brown; push 7 brown

  37. Model • Action decoder, final sequence: pop 7; pop 7; pop 7; push 7 brown; push 7 brown; push 7 brown
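Slides 30 to 37 describe a loop: predict an action, execute it, update the current state, and predict again from the new state. The sketch below shows only that control flow. The learned model is replaced by a hand-written `scripted_policy`, and the explicit `("stop",)` action, the type aliases, and the starting contents of beaker 7 are all assumptions made for the example.

```python
from typing import Callable, List, Tuple

State = Tuple[Tuple[str, ...], ...]  # one tuple of colours per beaker
Action = Tuple[str, ...]             # ("pop", "7"), ("push", "7", "brown") or ("stop",)


def apply_action(state: State, action: Action) -> State:
    """Execute one pop or push on a 1-indexed beaker and return the new state."""
    beakers = [list(b) for b in state]
    idx = int(action[1]) - 1
    if action[0] == "pop":
        beakers[idx] = beakers[idx][:-1]
    elif action[0] == "push":
        beakers[idx].append(action[2])
    return tuple(tuple(b) for b in beakers)


def decode(
    predict: Callable[[State, List[Action]], Action],
    state: State,
    max_steps: int = 10,
) -> Tuple[List[Action], State]:
    """Generate actions one at a time, executing each before predicting the next."""
    actions: List[Action] = []
    for _ in range(max_steps):
        action = predict(state, actions)
        if action == ("stop",):  # assumed termination signal
            break
        actions.append(action)
        state = apply_action(state, action)  # the next prediction sees the new state
    return actions, state


def scripted_policy(state: State, history: List[Action]) -> Action:
    """Stand-in for the learned model: empty beaker 7, then refill it with brown."""
    units = 3  # assumed number of units in beaker 7 at the start
    if len(history) < units:
        return ("pop", "7")
    if len(history) < 2 * units:
        return ("push", "7", "brown")
    return ("stop",)


start: State = ((), (), (), (), (), (), ("red", "green", "red"))
actions, final_state = decode(scripted_policy, start)
print(actions)
print(final_state[6])
```

Executing each action before predicting the next is what lets the model attend over the updated state, as on slide 32.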

  38. Learning from World State Annotation • Goal: learn a policy that maps from instructions and environment states to actions • Approach: learn through exploring the environment and observing rewards; policy gradient with contextual bandit • Challenge: overcome biases acquired early during learning • Example instructions: "Empty out the leftmost beaker of purple chemical"; "Then, add the contents of the first beaker to the second"; "Mix it"; "Then, drain 1 unit from it"; "Same for 1 more unit"
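To make "policy gradient with contextual bandit" concrete, here is a generic sketch of a softmax policy updated from immediate (single-step) rewards. It is not the authors' learning algorithm: the three-action set, the `reward` function, the learning rate, and the context-free policy are all hypothetical simplifications chosen so the update rule fits in a few lines.

```python
import math
import random
from typing import List

ACTIONS = ["pop 7", "push 7 brown", "stop"]  # hypothetical action set


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over action logits."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reward(action: str) -> float:
    """Hypothetical immediate reward: +1 if the action moves toward the goal, else -1."""
    return 1.0 if action == "pop 7" else -1.0


def train(steps: int = 500, lr: float = 0.5, seed: int = 0) -> List[float]:
    """Bandit-style policy gradient: sample one action, observe its reward, update."""
    rng = random.Random(seed)
    logits = [0.0] * len(ACTIONS)
    for _ in range(steps):
        probs = softmax(logits)
        i = rng.choices(range(len(ACTIONS)), weights=probs)[0]
        r = reward(ACTIONS[i])
        # The gradient of log pi(a_i) with respect to the logits is onehot(i) - probs.
        for j in range(len(ACTIONS)):
            grad = (1.0 if j == i else 0.0) - probs[j]
            logits[j] += lr * r * grad
    return softmax(logits)


final_probs = train()
print(dict(zip(ACTIONS, final_probs)))
```

Because each update uses only the reward of the single sampled action, there is no credit assignment across a multi-step trajectory, which is the contextual-bandit simplification the slide names. In a real model the logits would depend on the instruction and state rather than being free parameters.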
