Assessing Generalization in Deep Reinforcement Learning
Soo Jung Jang
Background
● Before (ex: factory robot): focus on one environment; generalization is not considered
● Now (ex: human-like intelligence): apply to multiple environments; generalization is important
● Paper’s Goal: Empirical study of generalization in deep RL across different (1) algorithms, (2) environments, and (3) metrics
Algorithms
● Vanilla (Baseline) Algorithms
  ○ A2C: Actor-Critic Family
  ○ PPO: Policy-Gradient Family
● Generalization-Tackling Algorithms
  ○ EPOpt: Robust Approach
  ○ RL2: Adapt Approach
● 6 Algorithms Total: A2C, PPO, EPOpt-A2C, EPOpt-PPO, RL2-A2C, RL2-PPO
Algorithms - Vanilla
● A2C / Actor-Critic Family
  ○ Critic: learns a value function
  ○ Actor: uses that value function to learn a policy that maximizes expected reward
● PPO / Policy-Gradient Family
  ○ Learns a sequence of improving policies
  ○ Maximizes a surrogate for the expected reward via gradient ascent (see the sketch below)
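As a concrete illustration (not taken from the paper or its code), here is a minimal sketch of PPO's clipped surrogate objective in PyTorch. The function name, the inputs (`log_probs_new`, `log_probs_old`, `advantages`), and the clip range of 0.2 are assumptions made for the sketch.

```python
import torch

def ppo_clipped_surrogate(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    """Clipped surrogate objective used by PPO.

    Hypothetical inputs: per-action log-probabilities under the new and old
    policies, plus advantage estimates; all 1-D tensors of equal length.
    """
    # Probability ratio r_t = pi_new(a|s) / pi_old(a|s)
    ratio = torch.exp(log_probs_new - log_probs_old)
    # Unclipped and clipped terms; PPO takes the elementwise minimum
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Negate so that minimizing this loss maximizes the surrogate reward
    return -torch.min(unclipped, clipped).mean()
```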
Algorithms - Generalization-Tackling
● EPOpt / Robust Approach
  ○ Maximize expected reward over the subset of environments with the lowest expected reward (i.e., maximize conditional value at risk; see the sketch below)
● RL2 / Adapt Approach
  ○ Learn an environment embedding at test time, “on the fly”
  ○ RNN takes the current trajectory as input; its hidden states serve as the embedding
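A minimal sketch (not the paper's implementation) of the EPOpt idea: collect rollouts across sampled environments and keep only the worst-performing fraction for the policy update. `sample_env_params` and `collect_episode` are hypothetical helpers supplied by the caller.

```python
def epopt_batch(sample_env_params, collect_episode, n_envs=100, epsilon=0.1):
    """Select the worst-performing rollouts for an EPOpt-style update.

    sample_env_params() draws one environment parameter vector;
    collect_episode(params) runs the current policy in that environment and
    returns (trajectory, episode_return). Both are hypothetical helpers.
    """
    rollouts = []
    for _ in range(n_envs):
        params = sample_env_params()
        trajectory, ep_return = collect_episode(params)
        rollouts.append((ep_return, trajectory))
    # Keep only the bottom-epsilon fraction of episodes by return (the CVaR idea);
    # the base policy-gradient update (A2C or PPO) is then applied to these alone.
    rollouts.sort(key=lambda r: r[0])
    k = max(1, int(epsilon * n_envs))
    return [traj for _, traj in rollouts[:k]]
```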
Algorithms - Network Architecture
● Feed Forward (FF)
  ○ Multi-layer perceptron (MLP)
● Recurrent (RC)
  ○ LSTM on top of an MLP
● The 4 non-RL2 algorithms are tested on both FF and RC
● The 2 RL2 algorithms are tested only on RC, since RL2 requires a recurrent policy (see the architecture sketch below)
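A sketch of the two architecture families in PyTorch; the hidden sizes and tanh activations are illustrative assumptions, not the paper's exact configuration.

```python
import torch.nn as nn

class FFPolicy(nn.Module):
    """Feed-forward (FF) policy: a plain MLP."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs):
        return self.net(obs)

class RCPolicy(nn.Module):
    """Recurrent (RC) policy: an LSTM stacked on top of an MLP encoder."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, act_dim)

    def forward(self, obs_seq, state=None):
        # obs_seq: (batch, time, obs_dim); the LSTM state carries history
        h = self.encoder(obs_seq)
        out, state = self.lstm(h, state)
        return self.head(out), state
```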
Environments
● 6 environments (OpenAI): CartPole, MountainCar, Acrobot, Pendulum, HalfCheetah, Hopper
Metrics - Environment Parameters
● Deterministic (D)
  ○ Parameters fixed at their default values (fixed environment)
● Random (R)
  ○ Parameters uniformly sampled from a d-dimensional box (feasible environments)
● Extreme (E)
  ○ Parameters uniformly sampled from the union of 2 intervals that straddle the corresponding interval in R (edge cases); see the sampling sketch below
[Schematic: parameter sampling for d = 2 with 4 samples per scheme]
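An illustrative sampling sketch for the three parameter schemes; the interval bounds below are made-up placeholders for one parameter dimension, not the paper's actual ranges.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative ranges for a single environment parameter (not the paper's values):
DEFAULT = 1.0                      # D: deterministic default
R_LO, R_HI = 0.5, 1.5              # R: "feasible" interval
E_OUTER_LO, E_OUTER_HI = 0.2, 2.0  # E lies between the outer bounds and the R interval

def sample_param(scheme):
    """Sample one environment parameter under scheme 'D', 'R', or 'E'."""
    if scheme == "D":
        return DEFAULT
    if scheme == "R":
        return rng.uniform(R_LO, R_HI)
    if scheme == "E":
        # Union of two intervals straddling [R_LO, R_HI]: pick a side, then sample
        if rng.random() < 0.5:
            return rng.uniform(E_OUTER_LO, R_LO)
        return rng.uniform(R_HI, E_OUTER_HI)
    raise ValueError(scheme)

# A d-dimensional parameter vector is sampled per dimension, e.g. d = 2:
params = np.array([sample_param("E") for _ in range(2)])
```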
Metrics - Evaluation
● 3 evaluation metrics from the 3x3 train-test pairs of (D/R/E)
  1. Default: DD
  2. Interpolation: RR
  3. Extrapolation: mean of DR, DE, and RE (aggregation sketched below)
● Metric value (performance)
  ○ Success rate (%): the percentage of test episodes in which a predefined goal is achieved
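A small sketch of how the three metrics aggregate per-pair success rates; the numbers are made-up placeholders, not results from the paper.

```python
# Hypothetical per-pair success rates (%), keyed by train->test scheme
# (placeholder values only, NOT results from the paper):
success = {
    "DD": 95.0, "DR": 70.0, "DE": 40.0,
    "RR": 85.0, "RE": 55.0,
    # ... remaining pairs omitted
}

metrics = {
    "default": success["DD"],
    "interpolation": success["RR"],
    "extrapolation": (success["DR"] + success["DE"] + success["RE"]) / 3.0,
}
print(metrics)
```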
Experiment
● Compare performance across:
  ○ 10 algorithm-architecture combinations (6 algorithms / 2 architectures)
  ○ 6 environments
  ○ 3 metrics (default, interpolation, extrapolation)
● Methodology: train for 15000 episodes / test on 1000 episodes (see the evaluation sketch below)
● Fairness
  ○ No memory of the previous episode
  ○ Several sweeps of hyperparameters
  ○ Success rate instead of the reward itself
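A minimal sketch of how a success rate could be estimated over the test episodes; `run_episode` is a hypothetical helper that plays one freshly reset test episode with the trained policy (no memory carried over) and reports whether the goal was achieved.

```python
def evaluate_success_rate(run_episode, n_episodes=1000):
    """Estimate the success rate (%) of a trained agent over test episodes."""
    successes = sum(1 for _ in range(n_episodes) if run_episode())
    return 100.0 * successes / n_episodes
```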
Results
● Default > Interpolation > Extrapolation
● FF architecture > RC architecture
● Vanilla > Generalization-Tackling
● RL2 variants do not work
● EPOpt-PPO works well in continuous action spaces (Pendulum, HalfCheetah, Hopper)
Discussion Questions
● The generalization-tackling algorithms tested in this paper failed. What would be a potential strategy that makes generalization work? How would you solve this RL generalization problem?
● Why do you think the generalization-tackling algorithms and recurrent (RC) architectures perform worse than the vanilla algorithms and feed-forward (FF) architectures? When would you expect them to work better?
● Do you think the paper’s experimental methodology is fair? Is there a better way to evaluate generalization across different algorithms and architectures?