

  1. A Survey of Reinforcement Learning Informed by Natural Language (Luketina et al., IJCAI 2019). Presenter: Maria Fabiano

  2. Outline 1. Motivation 2. Background 3. Current Use of Natural Language in RL 4. Trends for Natural Language in RL 5. Future Work 6. Critique

  3. Motivation
     Current Problems in RL
     ● Most real-world tasks require some kind of language processing.
     ● RL generalizes poorly, even to tasks very similar to those it trains on, which limits its real-world practicality.
     ● Previous research has been limited by small corpora or synthetic language.
     Solutions with Natural Language
     ● Advances in language representation learning allow models to integrate world knowledge from text corpora into decision-making problems.
     ● Potential to improve generalization, overcome issues related to data constraints, and take advantage of human priors.

  4. Background: RL
     Agents learn what actions to take in various states to maximize a cumulative reward.
     Goal: find a policy π(a|s) that maximizes the expected discounted cumulative return.
     Applications: continuous control, dialogue, board games, video games
     Limitations: real-world use is limited by data requirements and poor generalization
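
     To make the objective concrete, here is a minimal Python sketch of the discounted cumulative return that the policy π(a|s) is trained to maximize in expectation. The function name and toy reward sequence are illustrative only, not from the survey.

        def discounted_return(rewards, gamma=0.99):
            """G = r_0 + gamma*r_1 + gamma^2*r_2 + ... for one trajectory."""
            return sum(gamma ** t * r for t, r in enumerate(rewards))

        # An RL algorithm searches for a policy whose trajectories maximize
        # the expected value of this quantity.
        print(discounted_return([0.0, 0.0, 1.0]))  # = 0.99 ** 2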

  5. Background: Knowledge Transfer
     Recent NLP work has seen models transfer syntactic and semantic knowledge to downstream tasks.
     Transfer world and task-specific knowledge to sequential decision-making processes:
     ● Understanding explicit goals (“go to the door”)
     ● Policy constraints (“avoid the scorpion”)
     ● Generic information about the reward or policy (“scorpions are fast”)
     ● Object affordances (what can be done with an object)
     Agents could learn to use NLP and information retrieval to seek information in order to make progress on a task.

  6. Current Use of Natural Language in RL (taxonomy)
     Natural Language in RL
     ● Language-conditional
       ○ Instruction following
       ○ Rewards from instructions
       ○ Language in the action & observation space
     ● Language-assisted
       ○ Communicating domain knowledge
       ○ Structuring policies

  7. Current Use of Natural Language in RL
     ● Language-conditional: language is part of the task formulation
     ● Language-assisted: language facilitates learning
     These settings are not mutually exclusive. In both cases, language information can be task-independent (e.g., conveying general priors) or task-dependent (e.g., instructions).

  8. Current Use of Natural Language in RL

  9. Language-Conditional RL
     Language is part of the task
     ● Interpret and execute instructions given in language
     ● Language is part of the state and action space
     ● Often, the full language isn’t needed to solve the problem, but it assists by structuring the policy or providing auxiliary rewards
     Subcategories:
     ● Instruction Following: high-level instruction sequences (actions, goals, or policies)
     ● Rewards from Instructions: learn a reward function
     ● Observation & Action Space: environments use language for driving the interaction with the agent

  10. Language-Conditional: Instruction Following
     Instructions can be specific actions, goal states, or desired policies.
     Effective agents can:
     1. Execute the instruction
     2. Generalize to unseen instructions
     Ties to hierarchical RL (Oh et al., 2017); a toy control-flow sketch follows this slide:
     ○ Parameterized skill performs different subtasks
     ○ Objective function makes analogies between similar subtasks to try to learn the entire subtask space
     ○ Meta controller reads the instructions, decides which subtask to perform, and passes subtask parameters to the parameterized skill
     ○ Parameterized skill executes the given subtask
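
     The control flow of that hierarchical setup can be sketched as below. This is a hand-written toy, not the architecture of Oh et al. (2017): in the paper both the meta controller and the parameterized skill are learned neural modules, and the analogy-making objective is omitted here.

        def meta_controller(instructions, completed):
            """Read the instruction list and pick the next subtask's parameters."""
            for instr in instructions:
                if instr not in completed:
                    verb, obj = instr.split(" ", 1)   # e.g. "visit door" -> ("visit", "door")
                    return instr, (verb, obj)
            return None, None

        def parameterized_skill(subtask):
            """Execute one subtask; the toy version always succeeds immediately."""
            verb, obj = subtask
            return f"executed: {verb} {obj}"

        instructions = ["visit door", "pick key", "open chest"]
        completed = set()
        while True:
            instr, subtask = meta_controller(instructions, completed)
            if instr is None:
                break
            print(parameterized_skill(subtask))
            completed.add(instr)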

  11. Language-Conditional: Rewards from Instructions
     Use the instructions to induce a reward function.
     ● To apply instruction following in a broader context, we need a way to automatically evaluate whether an instruction was completed.
     ● Common architecture: a reward-learning module learns to ground an instruction to a goal, then generates a reward for a policy-learning module.
     ● Use standard IRL or an adversarial process (sketched after this slide).
       ○ The reward learner is the discriminator that discerns between goal states and visited states. The agent is rewarded for visiting states the discriminator cannot discern from goal states.
     ● When environment rewards are sparse, instructions can help generate auxiliary rewards that make learning more efficient.
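
     A minimal sketch of that adversarial idea, using a one-parameter logistic discriminator over scalar "states". The features, constants, and training loop are invented for illustration and do not reproduce any specific paper's reward-learning architecture.

        import numpy as np

        rng = np.random.default_rng(0)

        def discriminator(w, s):
            """Probability that state s matches the instruction's goal."""
            return 1.0 / (1.0 + np.exp(-w * s))

        w = 0.0                                         # discriminator parameter
        goal_states = rng.normal(2.0, 0.5, size=100)    # states grounded from the instruction
        for _ in range(200):
            visited = rng.normal(0.0, 0.5, size=100)    # states the current policy visits
            # Logistic-regression step: goal states labeled 1, visited states labeled 0.
            grad = np.mean((1.0 - discriminator(w, goal_states)) * goal_states) \
                 - np.mean(discriminator(w, visited) * visited)
            w += 0.1 * grad

        # The policy learner receives a higher auxiliary reward for states the
        # discriminator cannot tell apart from goal states.
        print(np.log(discriminator(w, 1.8) + 1e-8), np.log(discriminator(w, -0.5) + 1e-8))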

  12. Language-Conditional: Observation & Action Space Environments use language to drive interaction with the agent Much more challenging – observation and action spaces grow combinatorially ● with vocabulary size and grammar complexity Cardinal directions (“Go north”) vs. relative (“go to the blue ball southwest of the green box”) ○ Dialogue systems, QA, VQA, EQA ● Multiple-choice nature makes these problems similar to instruction following ○ To help create consistent benchmarks in this space, TextWorld generates text ● games that behave as RL environments

  13. TextWorld Example 1

  14. TextWorld Example 2
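
     Below is a toy sketch of the kind of interface such a generated text game exposes to an RL agent: observations are text, actions are free-form commands, and reward is a scalar. The class and command strings are invented for illustration and are not TextWorld's actual API.

        class ToyTextGame:
            """Hand-written stand-in for a generated text-game environment."""

            def reset(self):
                self.has_key = False
                return "You are in a kitchen. A key lies on the table. A door is to the north."

            def step(self, command):
                if command == "take key":
                    self.has_key = True
                    return "You pick up the key.", 0.0, False
                if command == "open door" and self.has_key:
                    return "The door swings open. You win!", 1.0, True
                return "Nothing happens.", 0.0, False

        env = ToyTextGame()
        obs = env.reset()
        for command in ["take key", "open door"]:   # an agent would choose these from the text
            obs, reward, done = env.step(command)
            print(command, "->", obs, reward)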

  15. Language-Assisted RL
     Language assists the task via transfer learning.
     Language is not essential to the task, but assists via transfer of knowledge.
     ● Specifies features, annotates states or entities, describes subtasks
     ● Most cases are task-specific
     ● Pre-trained embeddings and parsers provide task-independent information

  16. Language-Assisted: Communicating Domain Knowledge
     ● In more general settings outside instruction following, potentially task-relevant information may be available
       ○ Advice about the policy, information about the environment
     ● Unstructured, descriptive language is more widely available than instructive language (a toy retrieval sketch follows this slide)
       ○ Must retrieve useful information for a given context
       ○ Must ground that information with respect to observations
     ● Narasimhan et al., 2018
       ○ Grounds the meaning of text to the dynamics of the environment
       ○ Allows an agent to bootstrap policy learning in a new environment
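
     A toy sketch of the retrieval step mentioned above: given the current observation, pick the most relevant sentence of free-form domain text so its content can condition the policy. Real systems use learned representations; this uses plain word overlap, and the manual sentences are invented examples.

        manual = [
            "scorpions are fast and poisonous",
            "keys open locked doors",
            "mushrooms restore health",
        ]

        def retrieve(observation, sentences):
            """Return the sentence sharing the most words with the observation."""
            obs_words = set(observation.lower().split())
            return max(sentences, key=lambda s: len(obs_words & set(s.split())))

        print(retrieve("two scorpions are guarding the exit", manual))
        # -> "scorpions are fast and poisonous"; grounding that advice against the
        #    observation is the harder second step noted above.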

  17. Language-Assisted: Structuring Policies
     ● Construct priors on the model by communicating information about the state or dynamics of an environment
       ○ Shape representations to be more generalized abstractions
       ○ Make a representation space more interpretable to humans
       ○ Efficiently structure computations within a model
     ● Example: Learning to Compose Neural Networks for Question Answering (a toy composition sketch follows this slide)
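
     The cited example can be illustrated with a toy version of the compositional idea: the structure of the question decides which modules are chained together. In the cited work the modules are neural networks assembled from a parse of the question; here they are hand-written stubs over a two-object scene, and all names are invented.

        scene = {"ball": {"color": "red"}, "box": {"color": "blue"}}

        def find(obj):
            """Locate an object in the scene (a learned attention module in the paper)."""
            return scene.get(obj)

        def describe(attribute, entity):
            """Read an attribute off the located entity (another learned module)."""
            return entity[attribute] if entity else "unknown"

        def answer(question):
            # A parser would choose the module layout; this stub keys off simple patterns.
            words = question.lower().split()
            target = next((w for w in words if w in scene), None)
            return describe("color", find(target))

        print(answer("what color is the ball"))   # -> red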

  18. Trends for Natural Language in RL
     1. Language-conditional RL is more studied than language-assisted RL
     2. Learning from task-dependent text is more common than learning from task-independent text
     3. Little research has been done on how to use unstructured text for knowledge transfer from task-dependent text
     4. Little research on using language structure to build compositional representations and internal plans
     5. Synthetically generated languages (instead of natural language) are the standard for instruction following

  19. Learning from Text Corpora in the Wild
     Task-independent
     ● RL systems can’t generalize to language outside of the training distribution without transfer from a language model
       ○ “Fetch a stick” vs. “Return with a stick” vs. “Grab a stick and come back”
     ● Would enable agents to better utilize task-dependent corpora
     Task-dependent
     ● Transfer task-specific corpora and fine-tune a pre-trained information retrieval system. The RL agent queries the retrieval system and uses relevant information.
       ○ Example: game manuals

  20. Diverse Environments with Real Semantics
     A central promise of language in RL is helping agents adapt to new goals, reward functions, and environment dynamics. This is measured only by instruction-following benchmarks in closed-task domains (navigation, object manipulation, etc.) and closed worlds:
     ● Small vocabulary sizes
     ● Multiple pieces of evidence to ground each word
     To generalize, RL needs more diverse environments with complex composition.
     ● 3D house simulation
     ● Minecraft

  21. Future Work
     ● Use pre-trained language models to transfer world knowledge
     ● Learn from natural text rather than instructions or synthetic language
     ● Use more diverse environments with complex composition and real-world semantics
     ● Develop standardized environments and evaluations to properly measure progress on integrating natural language and RL
     ● Agents that can query knowledge more explicitly and reason with it
       ○ Pre-trained information retrieval systems

  22. Critique
     “Good”
     ● Provides compelling motivation for why RL + NLP is worth studying
     ● Challenges the field to take the next step to elevate RL
     ● Provides many positive examples that show the feasibility of this work
     “Not so Good”
     ● More background for RL
       ○ Q-learning, imitation learning
     ● Similarity of multiple-choice QA problems to instruction following
     ● Factors they missed that make it worthwhile to work in this space
       ○ Success in multimodal NLP work
       ○ Success in other modalities with RL
     ● Language can inform RL; is the converse true?
