
Program Guided Agent. ICLR 2020 (Spotlight). Shao-Hua Sun, Te-Lin Wu, Joseph J. Lim



  1. Program Guided Agent ICLR 2020 (Spotlight) Shao-Hua Sun Te-Lin Wu Joseph J. Lim

  2. Follow an Instruction to Solve a Complex Task. Recipe: cooking fried rice. "Stir-fry the onions until tender, and repeat this for garlic and carrots, if you have soy sauce, add some. Pour 2/3 cups the whisked eggs into the stir-fried and scramble."

  3. Natural Language Instruction. Recipe: cooking fried rice. "Stir-fry the onions until tender, and repeat this for garlic and carrots, if you have soy sauce, add some. Pour 2/3 cups the whisked eggs into the stir-fried and scramble."
     Ambiguities in Language:
     • Scoping
     • Coreferences
     • Entities
     Cited work:
     • Bahdanau et al. in ICLR 2019
     • Misra et al. "Mapping Instructions to Actions in 3D Environments with Visual Goal Prediction" in EMNLP 2018
     • Anderson et al. "Vision-and-language navigation: Interpreting visually-grounded navigation instructions in real environments" in CVPR 2018
     • Misra et al. "Mapping Instructions and Visual Observations to Actions with Reinforcement Learning" in EMNLP 2017
     • Hermann et al. "Grounded Language Learning in a Simulated 3D World" in arXiv 2017

  4. Program. Function: cooking fried rice.
     Advantages of Programs:
     • Explicit scoping
     • Resolved coreferences
     • Resolved entities
     Program: for item in [onions, garlic, carrots]: if is_there("soy sauce"): add("soy sauce", "pot") while not tender(item): stir_fry(item) pour(whisked("eggs"), "pot", 0.66) scramble("eggs")
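
One possible reading of the fried-rice program on this slide, written as runnable Python. This is only a sketch: the transcript does not preserve indentation, so the nesting below (a stir-fry loop per item, then a single soy sauce check) is an assumption taken from the recipe text. Every helper (`tender`, `stir_fry`, `is_there`, `add`, `whisked`, `pour`, `scramble`), the pantry contents, and the stir counts are hypothetical stubs that only record what was called.

```python
# Hypothetical kitchen stubs: they log calls instead of cooking anything.
log = []
pantry = {"soy sauce"}  # assumption: soy sauce is available
stirs_left = {"onions": 2, "garlic": 1, "carrots": 3}  # assumed stirs until tender


def tender(item):
    return stirs_left[item] == 0


def stir_fry(item):
    stirs_left[item] -= 1
    log.append(f"stir_fry({item})")


def is_there(thing):
    return thing in pantry


def add(thing, container):
    log.append(f"add({thing}, {container})")


def whisked(thing):
    return f"whisked {thing}"


def pour(thing, container, cups):
    log.append(f"pour({thing}, {container}, {cups})")


def scramble(thing):
    log.append(f"scramble({thing})")


def cook_fried_rice():
    # Explicit scoping: the loop body is exactly the stir-fry step.
    for item in ["onions", "garlic", "carrots"]:
        while not tender(item):
            stir_fry(item)
    # Resolved coreference: "add some" becomes add("soy sauce", "pot").
    if is_there("soy sauce"):
        add("soy sauce", "pot")
    # Resolved entities: the pot and the quantity are spelled out.
    pour(whisked("eggs"), "pot", 0.66)
    scramble("eggs")


cook_fried_rice()
print("\n".join(log))
```

Moving the `if is_there("soy sauce")` block inside the `for` loop would give a different, equally grammatical reading of the recipe sentence, which is the scoping ambiguity the slide points at.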

  5-18. Problem Formulation (animation frames). Panels: Program (slide 5), State (from slide 6), Execution (from slide 7). From slide 7 on, each frame shows two sets of item counters: the first stays at x3 x1 x0, and the second reads x3 x1 x0 except on slide 10 (x4 x1 x0), slide 14 (x3 x2 x0), and slide 18 (x3 x1 x1).

  19. Exemplar Instructions. Two example Programs, shown side by side, with Natural Language Instructions.
      Program 1: def Task(): if is_there[River]: mine(Wood) build_bridge() if agent[Iron] < 3: mine(Iron) place(Iron, 2, 3) else: goto(4, 2) while env[Gold] > 0: mine(Gold)
      Program 2: def Task(): if is_there[River]: build_bridge() place(Gold, 3, 4) if agent[Gold] == 13: while agent[Gold] <= 12: place(Gold, 8, 3) if agent[Iron] >= 8: place(Wood, 2, 4) elif env[Gold] <= 10: sell(Iron)

  20. End-to-end Learning Baseline. [Diagram labels: Goal (Program OR NL Instruction), Program Interpreter, Query, Response, Perception Module, State, Policy, Action, Environment, Module Output. Example program: def run(): while env[Gold] > 0: mine(Gold) if is_there[River]: build_bridge() place(Wood, 2, 3)]

  21. Program Guided Agent. [Diagram labels: Program, Program Interpreter, Query, Response, Perception Module, State, Goal, Policy, Action, Environment, Module Output. Example program: def run(): while env[Gold] > 0: mine(Gold) if is_there[River]: build_bridge() place(Wood, 2, 3)]

  22. Program Interpreter. Comprehend a given program by sorting its parts into three categories:
      • Subtasks (actions): what the agent should perform
      • Perception: information from the environment
      • Control flow: decides which subtasks to call according to the perceived information
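
The three categories on this slide can be illustrated with a small interpreter sketch. It relies on the observation that the example program happens to be syntactically valid Python, so the standard `ast` module can parse it; this is a convenience for illustration, not a description of the authors' interpreter. The nesting of the program text, the toy `world`, and the `perceive` and `act` functions are assumptions: hand-written stand-ins for the learned perception module and policy.

```python
import ast

# Example program from the slides; indentation is an assumed reading.
PROGRAM = """
def run():
    while env[Gold] > 0:
        mine(Gold)
    if is_there[River]:
        build_bridge()
    place(Wood, 2, 3)
"""


def interpret(body, perceive, act):
    """Walk statements: control flow asks `perceive`, subtasks go to `act`."""
    for node in body:
        if isinstance(node, ast.While):  # control flow
            while perceive(ast.unparse(node.test)):  # perception query
                interpret(node.body, perceive, act)
        elif isinstance(node, ast.If):  # control flow (elif nests in orelse)
            taken = node.body if perceive(ast.unparse(node.test)) else node.orelse
            interpret(taken, perceive, act)
        elif isinstance(node, ast.Expr) and isinstance(node.value, ast.Call):
            act(ast.unparse(node.value))  # subtask
        else:
            raise ValueError(f"unsupported statement: {ast.dump(node)}")


# Toy world and stand-ins (hypothetical, for illustration only).
world = {"gold": 2, "river": True}
trace = []


def perceive(query):
    trace.append(("query", query))
    if query == "env[Gold] > 0":
        return world["gold"] > 0
    if query == "is_there[River]":
        return world["river"]
    raise KeyError(query)


def act(subtask):
    trace.append(("subtask", subtask))
    if subtask == "mine(Gold)":
        world["gold"] -= 1


run_def = ast.parse(PROGRAM).body[0]
interpret(run_def.body, perceive, act)
subtasks = [text for kind, text in trace if kind == "subtask"]
queries = [text for kind, text in trace if kind == "query"]
print(subtasks)
```

With two units of gold in the toy world, the `while` query is asked three times and `mine(Gold)` is issued twice before the program moves on. `ast.unparse` requires Python 3.9 or later.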

  23. Perception Module. Extract environmental information for choosing a path in a program.
      • Input
        • Query: a symbolically represented query (e.g. is_there[River])
        • State s: environment map and agent inventory status
      • Output
        • Predicted answer to the query (e.g. True/False)
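
To make the input and output of this module concrete, here is a rule-based sketch that answers the query forms appearing in the slides (`is_there[X]`, `env[X] > n`, `agent[X] < n`). The module described on the slide predicts its answer; the function below instead computes the exact answer from a symbolic state, so it is only an oracle for illustration. The grid encoding, the `answer_query` name, and the reading of `is_there` and `env` as referring to the map and `agent` to the inventory are all assumptions.

```python
import operator
import re
from collections import Counter

OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "==": operator.eq}

# kind[item], optionally followed by a comparison against an integer.
QUERY = re.compile(
    r"^(is_there|env|agent)\[\s*(\w+)\s*\]\s*(?:(<=|>=|==|<|>)\s*(\d+))?$"
)


def answer_query(query, grid, inventory):
    """Return True/False for a symbolic query given the map and inventory."""
    match = QUERY.match(query.strip())
    if match is None:
        raise ValueError(f"unrecognized query: {query!r}")
    kind, item, op, number = match.groups()
    on_map = Counter(cell for row in grid for cell in row)
    if kind == "is_there":
        if op is not None:
            raise ValueError("is_there takes no comparison")
        return on_map[item] > 0
    if op is None:
        raise ValueError(f"{kind} queries need a comparison")
    count = on_map[item] if kind == "env" else inventory.get(item, 0)
    return OPS[op](count, int(number))


# Hypothetical state: a 2x3 map of cell labels and an agent inventory.
grid = [["Grass", "River", "Gold"],
        ["Wood", "Grass", "Gold"]]
inventory = {"Iron": 2}

for q in ["is_there[River]", "env[Gold] > 0", "agent[Iron] < 3", "agent[Gold] >= 1"]:
    print(q, answer_query(q, grid, inventory))
```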

  24. Policy. Take low-level actions in the environment to fulfill a subtask.
      • Input
        • Symbolically represented subtask (goal) g
        • State s
      • Output
        • Predicted action distribution

  25. Result

  26. Conclusion
      • Specify tasks using programs. Program: def Task(): if is_there[River]: mine(Wood) build_bridge() if agent[Iron] < 3: mine(Iron) place(Iron, 2, 3) else: goto(4, 2) while env[Gold] > 0: mine(Gold)
      • Leverage the structure of programs with a modular framework

  27. Program Guided Agent ICLR 2020 (Spotlight) Thank You for Your Attention Shao-Hua Sun Te-Lin Wu Joseph J. Lim
