Learning to Generalize from Sparse and Underspecified Rewards




  1. Learning to Generalize from Sparse and Underspecified Rewards
     Rishabh Agarwal, Chen Liang, Dale Schuurmans, Mohammad Norouzi

  2. Motivation
     Reinforcement learning has enabled remarkable advances:
     ➢ These advances hinge on the availability of high-quality and dense rewards.
     ➢ However, many real-world problems involve sparse and underspecified rewards.
     ➢ Language understanding tasks provide a natural way to investigate RL algorithms in such settings.

  3. Instruction Following
     Instruction: "Right Up Up Right"
     (Grid figure: a blindfolded agent, a goal cell, and death cells.)
     Possible actions: ←, ↑, →, ↓
     The reward is +1 if the goal is reached and 0 otherwise.
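To make that reward concrete, here is a minimal sketch of the episode-level binary reward, assuming a small illustrative grid; the layout, start, goal, and death cells below are invented for illustration and do not come from the slide.

```python
# Minimal sketch of the blindfolded-agent setup: the agent cannot observe the
# grid, it just emits a sequence of moves and receives one binary reward at the
# end. The start, goal, and death positions here are illustrative assumptions.

MOVES = {"←": (-1, 0), "→": (1, 0), "↑": (0, 1), "↓": (0, -1)}

def episode_reward(actions, start=(0, 0), goal=(2, 2), death={(2, 0)}):
    """Return +1 if the action sequence ends at the goal, 0 otherwise."""
    x, y = start
    for a in actions:
        dx, dy = MOVES[a]
        x, y = x + dx, y + dy
        if (x, y) in death:   # stepping on a death cell ends the episode
            return 0.0
    return 1.0 if (x, y) == goal else 0.0

print(episode_reward(["→", "↑", "↑", "→"]))   # 1.0: follows "Right Up Up Right"
```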

  4. Weakly-supervised Semantic Parsing
     Question: Which nation won the most number of Silver medals?
     Answer: Nigeria

     Rank  Nation      Gold  Silver  Bronze  Total
     1     Nigeria     13    16      9       38
     2     Kenya       12    10      7       29
     3     Ethiopia    4     3       4       11
     ...   ...         ...   ...     ...     ...
     15    Madagascar  0     0       2       2
           Tanzania    0     0       1       1
     16    Uganda      0     0       1       1
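In this weakly supervised setting the only signal is whether the executed program produces the labeled answer. A minimal sketch of that 0/1 reward follows; the `execute` argument is a hypothetical stand-in for whatever interpreter runs a candidate program against the table.

```python
# Sketch of the weakly-supervised reward: only the final answer is labeled,
# so any program whose execution result matches it receives reward 1.
def reward(program, table, gold_answer, execute):
    return 1.0 if execute(program, table) == gold_answer else 0.0
```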

  5. Challenges: (1) Exploration, (2) Generalization
     Question: Which nation won the most number of Silver medals?
     Answer: Nigeria
     (Same medal table as on slide 4.)

  6. Underspecified Rewards
     Instruction: "Right Up Up Right"
     Correct action sequence:   → ↑ ↑ →
     Spurious action sequences: ↑ → ↑ →   |   ↑ → → ↑   |   ↑ → ↑ ↓ → ↑
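A quick self-contained check of why these sequences are indistinguishable under the terminal reward: all of them end at the same cell relative to the start (using the same illustrative move encoding as above), so all collect the same +1.

```python
# All four sequences end at the same cell, so a terminal-only reward cannot
# tell the instruction-following sequence apart from the spurious ones.
MOVES = {"←": (-1, 0), "→": (1, 0), "↑": (0, 1), "↓": (0, -1)}

def endpoint(actions, start=(0, 0)):
    x, y = start
    for a in actions:
        dx, dy = MOVES[a]
        x, y = x + dx, y + dy
    return (x, y)

sequences = {
    "correct  → ↑ ↑ →":     ["→", "↑", "↑", "→"],
    "spurious ↑ → ↑ →":     ["↑", "→", "↑", "→"],
    "spurious ↑ → → ↑":     ["↑", "→", "→", "↑"],
    "spurious ↑ → ↑ ↓ → ↑": ["↑", "→", "↑", "↓", "→", "↑"],
}
for name, seq in sequences.items():
    print(name, "ends at", endpoint(seq))   # every sequence ends at (2, 2)
```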

  7. Underspecified Rewards
     Question: Which nation won the most number of Silver medals?
     Program:
       v0 = (argmax all_rows r.Silver)
       return (hop v0 r.Nation)
     (Same medal table as on slide 4.)

  8. Underspecified Rewards
     Question: Which nation won the most number of Silver medals?
     Program:
       v0 = (argmax all_rows r.Gold)
       return (hop v0 r.Nation)
     (Same medal table as on slide 4.)

  9. Underspecified Rewards
     Question: Which nation won the most number of Silver medals?
     Program:
       v0 = (argmin all_rows r.Rank)
       return (hop v0 r.Nation)
     (Same medal table as on slide 4.)
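The same failure shows up here: on this particular table, the correct program and both spurious ones evaluate to "Nigeria", so a reward that only checks the final answer gives all three a score of 1. The sketch below uses toy argmax/argmin/hop helpers that imitate the slide's program notation; they are illustrative, not the actual interpreter.

```python
# Illustrative execution of the three programs from slides 7-9 on the medal
# table. The helpers mimic the slide's notation; they are not the real DSL.
rows = [
    {"Rank": 1,  "Nation": "Nigeria",    "Gold": 13, "Silver": 16, "Bronze": 9, "Total": 38},
    {"Rank": 2,  "Nation": "Kenya",      "Gold": 12, "Silver": 10, "Bronze": 7, "Total": 29},
    {"Rank": 3,  "Nation": "Ethiopia",   "Gold": 4,  "Silver": 3,  "Bronze": 4, "Total": 11},
    {"Rank": 15, "Nation": "Madagascar", "Gold": 0,  "Silver": 0,  "Bronze": 2, "Total": 2},
]

def argmax(rows, col): return max(rows, key=lambda r: r[col])
def argmin(rows, col): return min(rows, key=lambda r: r[col])
def hop(row, col):     return row[col]

gold_answer = "Nigeria"
programs = {
    "correct:  (argmax all_rows r.Silver)": hop(argmax(rows, "Silver"), "Nation"),
    "spurious: (argmax all_rows r.Gold)":   hop(argmax(rows, "Gold"),   "Nation"),
    "spurious: (argmin all_rows r.Rank)":   hop(argmin(rows, "Rank"),   "Nation"),
}
for name, answer in programs.items():
    print(name, "->", answer, "| reward =", 1.0 if answer == gold_answer else 0.0)
# All three print "Nigeria" with reward 1.0, even though only the first program
# would generalize to tables where the Gold or Rank leader differs from the
# Silver leader.
```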

  10. Underspecified Rewards
      Recent interest in automated reward learning using expert demonstrations.
      (Diagram: "Awesome Reinforcement Learning Model".)

  11. Learning Rewards without Demonstration
      Recent interest in automated reward learning using expert demonstrations.
      What if we don't have demonstrations?

  12. Learning Rewards without Demonstration
      Key idea: Use generalization error as the supervisory signal for learning rewards.

  13. Meta Reward Learning (MeRL)
      The auxiliary rewards R_ϕ are optimized based on the generalization performance O_val of a policy π_θ trained using the auxiliary rewards:
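The equation itself did not survive the transcript; written out, the bi-level setup that this sentence describes would look roughly like the following (notation and step sizes are mine, a reconstruction rather than the slide's exact formula):

```latex
% Sketch of the bi-level objective: the policy parameters \theta are updated
% using the auxiliary rewards R_\phi, and \phi is then updated to improve the
% validation objective of the resulting policy.
\theta'(\phi) = \theta + \alpha \, \nabla_{\theta}\, O_{\mathrm{train}}\big(\theta, R_{\phi}\big)
\qquad
\phi \leftarrow \phi + \beta \, \nabla_{\phi}\, O_{\mathrm{val}}\big(\theta'(\phi)\big)
```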

  14. Tackling Sparse Rewards
      ➢ Disentangle exploration from exploitation.
      ➢ Mode covering direction of KL divergence to collect successful sequences.
      ➢ Mode seeking direction of KL divergence for robust optimization.
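For reference, the two directions of KL divergence mentioned here differ as follows (standard definitions; connecting them to the exploration and optimization objectives is how the slide describes their use):

```latex
% Mode-covering direction: \pi_\theta must place mass everywhere the target
% distribution p does, which encourages collecting diverse successful sequences.
\mathrm{KL}(p \,\|\, \pi_\theta) = \sum_{a} p(a) \log \frac{p(a)}{\pi_\theta(a)}
% Mode-seeking direction: \pi_\theta may concentrate on a few high-probability
% modes of p, which suits robust final policy optimization.
\mathrm{KL}(\pi_\theta \,\|\, p) = \sum_{a} \pi_\theta(a) \log \frac{\pi_\theta(a)}{p(a)}
```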

  15. Results
      ➢ MAPOX uses our mode covering exploration strategy on top of prior work (MAPO).

      Method   WikiSQL       WikiTable
      MAPO     72.4 (±0.3)   42.9 (±0.5)
      MAPOX    74.2 (±0.4)   43.3 (±0.4)

  16. Results
      ➢ MAPOX uses our mode covering exploration strategy on top of prior work (MAPO).
      ➢ BoRL is our Bayesian optimization approach for learning rewards.

      Method   WikiSQL       WikiTable
      MAPO     72.4 (±0.3)   42.9 (±0.5)
      MAPOX    74.2 (±0.4)   43.3 (±0.4)
      BoRL     74.2 (±0.2)   43.8 (±0.2)

  17. Results
      ➢ MAPOX uses our mode covering exploration strategy on top of prior work (MAPO).
      ➢ BoRL is our Bayesian optimization approach for learning rewards.
      ➢ MeRL achieves state-of-the-art results on WikiTableQuestions and WikiSQL, improving upon prior work by 1.2% and 2.4% respectively.

      Method   WikiSQL       WikiTable
      MAPO     72.4 (±0.3)   42.9 (±0.5)
      MAPOX    74.2 (±0.4)   43.3 (±0.4)
      BoRL     74.2 (±0.2)   43.8 (±0.2)
      MeRL     74.8 (±0.2)   44.1 (±0.2)

  18. Poster #49 tonight @ Pacific Ballroom
      bit.ly/merl2019
