Deep Reinforcement Learning and Complex Environments
Raia Hadsell
End-to-end Deep Learning for robots?
2010: Speech Recognition: Audio → Acoustic Model → Phonetic Model → Language Model → Text ⇒ Deep Net
2012: Computer Vision: Pixels → Key Points → SIFT features → Deformable Part Model → Labels ⇒ Deep Net
2014: Machine Translation: Text → Reordering → Phrase Table/Dictionary → Language Model → Text ⇒ Deep Net
2017: Robotics? Sensors → Perception → World Model → Planning → Control → Action
slide from V. Vanhoucke
Robotics is different
[Diagram: sensors and actions, rather than labels, on the path to General Artificial Intelligence]
Deep Reinforcement Learning
[Diagram: an agent (a neural network) receives observations, a goal, and a reward from the environment, and sends back actions]
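The agent-environment loop in the diagram can be sketched as follows; `Environment` and `Agent` here are toy stand-ins (a trivial state counter and a fixed policy), not any actual DeepMind code:

```python
# Minimal agent-environment interaction loop illustrating the deep RL setup:
# the agent observes, acts, and receives rewards from the environment.

class Environment:
    """Toy environment: reach state 3 to receive a reward."""
    def reset(self):
        self.state = 0
        return self.state                       # initial observation

    def step(self, action):
        self.state += action                    # the action changes the state
        reward = 1.0 if self.state >= 3 else 0.0
        done = self.state >= 3
        return self.state, reward, done

class Agent:
    """Trivial policy standing in for the neural network."""
    def act(self, observation):
        return 1                                # always step forward

def run_episode(env, agent, max_steps=100):
    obs = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.act(obs)                 # agent -> action
        obs, reward, done = env.step(action)    # environment -> observation, reward
        total_reward += reward
        if done:
            break
    return total_reward

total = run_episode(Environment(), Agent())
```

In deep RL, the `Agent.act` stub is replaced by a neural network trained to maximise the cumulative reward.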
General Atari Player [Mnih et al, Playing Atari with Deep Reinforcement Learning, 2013]
9DOF Random reacher
● Can deep RL agents learn multiple tasks?
● Can deep RL agents learn efficiently?
● Can deep RL agents learn from real data?
● Can deep RL agents learn continuous control?
Deep RL — Raia Hadsell
Multiple Tasks & Lifelong Learning (Lab Mazes · StreetLearn · Parkour)
Lifelong Learning: 3 challenges
1. Catastrophic forgetting
2. Positive transfer
3. Specialization and generalization
Raia Hadsell 2017
Catastrophic forgetting
● Well-known phenomenon
● Especially severe in Deep RL
Elastic Weight Consolidation (EWC)
[Diagram: parameter space with low-error regions for Task A and Task B; the EWC update stays in the overlap, unlike plain SGD or a uniform L2 penalty]
James Kirkpatrick et al (2017), “Overcoming Catastrophic Forgetting in NNs”
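The EWC idea from Kirkpatrick et al. can be sketched as a quadratic penalty that anchors parameters important to Task A (as measured by the diagonal Fisher information) while training on Task B. The numbers below are purely illustrative, not values from the paper:

```python
# EWC regularised loss:
#   L(theta) = L_B(theta) + (lambda/2) * sum_i F_i * (theta_i - theta_A_i)^2
# where F_i is the diagonal Fisher information estimated after training Task A.

def ewc_loss(task_b_loss, theta, theta_a, fisher, lam=1.0):
    penalty = sum(f * (t - ta) ** 2 for f, t, ta in zip(fisher, theta, theta_a))
    return task_b_loss + 0.5 * lam * penalty

# Illustrative values: parameter 0 was important for Task A (high Fisher),
# so moving it away from theta_A is penalised heavily; parameter 1 is free to move.
theta_a = [1.0, -2.0]       # parameters after training Task A
fisher  = [10.0, 0.01]      # importance of each parameter for Task A
theta   = [1.1, 0.5]        # current parameters while training Task B

loss = ewc_loss(task_b_loss=0.3, theta=theta, theta_a=theta_a, fisher=fisher, lam=1.0)
```

With plain SGD (equivalently, `fisher` all zeros), nothing stops Task B training from overwriting the parameters Task A depends on; an unweighted L2 penalty anchors all parameters equally, which over-constrains the ones Task B actually needs.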
What if my tasks really don’t get along?
What if my tasks really don’t get along?
Progressive Nets
● add columns for new tasks
● freeze params of learnt columns
● layer-wise lateral connections
→ capacity for task-specific features
→ enables deep compositionality
→ precludes forgetting
Andrei Rusu et al (2016), “Progressive Neural Networks”
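The progressive-net forward pass can be sketched as follows: column 1 is frozen after Task A, and a new column 2 for Task B receives lateral connections from column 1's activations at each layer. The scalar "layers" below are toy stand-ins for real network layers:

```python
# Progressive nets sketch: a frozen column feeds lateral connections into a
# new trainable column, so the new task can reuse old features without
# overwriting them. "Layers" here are toy scalar multiplications.

def forward_column1(x, w1):
    """Frozen column trained on task 1; returns its per-layer activations."""
    h1 = w1[0] * x
    h2 = w1[1] * h1
    return [h1, h2]

def forward_column2(x, w2, laterals, col1_acts):
    """New column for task 2: each layer also consumes the previous layer's
    activation from column 1 via a lateral connection."""
    h1 = w2[0] * x
    h2 = w2[1] * h1 + laterals[0] * col1_acts[0]   # lateral from column 1, layer 1
    out = w2[2] * h2 + laterals[1] * col1_acts[1]  # lateral from column 1, layer 2
    return out

w1 = [2.0, 3.0]            # frozen after task 1 (never updated again)
w2 = [1.0, 1.0, 1.0]       # trainable for task 2
laterals = [0.5, 0.5]      # trainable lateral weights

col1_acts = forward_column1(1.0, w1)
out = forward_column2(1.0, w2, laterals, col1_acts)
```

Because gradients for Task B only flow into `w2` and `laterals`, Task A's column is untouched: forgetting is precluded by construction, at the cost of growing capacity per task.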
Sim-to-Real
[Diagram: progressive-net columns spanning simulation and robot; a Task A column learned in simulation, then Task A and Task B columns added on the real robot]
Distral (Distill and Transfer Learning)
● Task-specific networks plus a shared distilled network
● KL-divergence constraint between each task policy and the shared policy
● Regularisation in policy space rather than parameter space
● Shared policy as a communication channel between tasks
→ Distillation of knowledge into the shared model enables transfer across tasks
→ Regularisation towards the shared model gives stability and robustness
Yee Whye Teh et al (2017), “Distral: Robust Multitask Reinforcement Learning”
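The Distral objective can be sketched as each task policy being regularised, via a KL-divergence term, towards the shared distilled policy. The toy action distributions and the coefficient below are illustrative, not the paper's values:

```python
import math

# Distral sketch: per-task objective = task loss + c_kl * KL(pi_i || pi_0),
# pulling each task policy pi_i towards the shared distilled policy pi_0
# (regularisation in policy space, not parameter space).

def kl(p, q):
    """KL divergence between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distral_objective(task_loss, pi_task, pi_shared, c_kl=0.5):
    return task_loss + c_kl * kl(pi_task, pi_shared)

pi_shared = [0.25, 0.25, 0.25, 0.25]   # shared/distilled policy over 4 actions
pi_near   = [0.3, 0.3, 0.2, 0.2]       # task policy close to the shared one
pi_far    = [0.9, 0.05, 0.03, 0.02]    # task policy far from the shared one

# Staying close to the shared policy incurs a smaller regularisation cost.
obj_near = distral_objective(1.0, pi_near, pi_shared)
obj_far  = distral_objective(1.0, pi_far, pi_shared)
```

Because the coupling happens through the shared policy rather than shared weights, tasks that "don't get along" can keep separate parameters while still exchanging behavioural knowledge.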
Lab Mazes & Auxiliary Learning (Multiple Tasks & Lifelong Learning · StreetLearn · Parkour)
Navigation mazes
Game episode: 3600 steps/episode (or 10800 steps/episode)
1. Random start
2. Find the goal (+10)
3. Teleport randomly
4. Re-find the goal (+10)
5. Repeat (limited time)
Variants: static maze & static goal; static maze & random goal; random maze
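The episode protocol above can be sketched as a loop; `find_goal_steps` is a hypothetical stand-in for how long the agent's navigation actually takes per attempt:

```python
# Sketch of the maze episode protocol: the agent spawns at a random location,
# earns +10 on reaching the goal, is teleported to a random location, and
# repeats until the step budget (3600 or 10800 steps) runs out.

GOAL_REWARD = 10

def run_maze_episode(find_goal_steps, episode_length=3600):
    """find_goal_steps: steps the agent needs per goal-finding attempt
    (a stand-in for actual navigation in the maze)."""
    steps, score = 0, 0
    while steps + find_goal_steps <= episode_length:
        steps += find_goal_steps   # navigate from the (random) start to the goal
        score += GOAL_REWARD       # goal found: +10
        # the agent is teleported to a random location; the loop re-finds the goal
    return score

# An agent that needs 400 steps per goal scores 9 goals within 3600 steps.
score = run_maze_episode(find_goal_steps=400)
```

The score thus measures repeated goal-finding within a fixed budget, which is why faster localisation and navigation directly translate into higher reward.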
Nav agent architecture
1. Convolutional encoder and RGB inputs x_t
2. Stacked LSTM with skip connection
3. Additional inputs: previous reward r_{t-1}, previous action a_{t-1}, and velocity v_t
4. RL: Asynchronous advantage actor-critic (A3C)
5. Aux task 1: depth predictors (D1, D2)
6. Aux task 2: loop-closure predictor (L)
Piotr Mirowski, Razvan Pascanu et al (2017), “Learning to navigate in complex environments”
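The auxiliary-task idea can be sketched as a weighted sum of the A3C loss and the auxiliary prediction losses, all computed from the same network trunk. The weights and loss values below are hypothetical, not the paper's settings:

```python
# Combined objective sketch for the Nav A3C agent: the RL (A3C) loss is
# augmented with auxiliary depth-prediction and loop-closure losses, giving
# the shared trunk denser training signal than the sparse maze reward alone.
# The beta weights below are illustrative.

def nav_a3c_loss(a3c_loss, depth_loss_1, depth_loss_2, loop_loss,
                 beta_d1=0.33, beta_d2=0.33, beta_l=1.0):
    return (a3c_loss
            + beta_d1 * depth_loss_1   # aux task: depth predictor D1
            + beta_d2 * depth_loss_2   # aux task: depth predictor D2
            + beta_l * loop_loss)      # aux task: loop-closure predictor L

total = nav_a3c_loss(a3c_loss=2.0, depth_loss_1=0.6, depth_loss_2=0.3, loop_loss=0.1)
```

The auxiliary heads are discarded at test time; their purpose is to shape the shared representation during training.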
Variations in architecture
a. FF A3C
b. LSTM A3C
c. Nav A3C
d. Nav A3C + D1 D2 L
[Diagram: the four architectures, from a feedforward A3C to the full Nav A3C with depth and loop-closure auxiliary outputs]
Results on large maze with static goal (rewards: goal +10, apples +1)
StreetLearn & Real World RL (Multiple Tasks & Lifelong Learning · Lab Mazes & Auxiliary Learning · Parkour)
Navigation mazes in the real world?
[Diagram: comparing the observations and structure of simulated mazes with real-world streets]
StreetView as an RL environment: StreetLearn
● Observation: RGB image cropped from the panorama (84×84)
● Goal location
● Actions: move to next node, rotate view 20° or 60°
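A StreetLearn-style interface can be sketched as a graph of panorama nodes with move and rotate actions. The class, graph, and method names below are hypothetical illustrations, not the released StreetLearn API:

```python
# Toy StreetLearn-style environment: the agent stands at a panorama node with
# a heading, and can rotate its view or move along an edge of the street graph.
# Node ids, the graph, and all names here are hypothetical.

class ToyStreetEnv:
    def __init__(self, graph, start_node, goal_node):
        self.graph = graph            # node -> {heading_degrees: neighbour}
        self.node = start_node
        self.goal = goal_node
        self.heading = 0              # degrees

    def rotate(self, degrees):
        """Rotate the view crop within the panorama (e.g. by 20 or 60 degrees)."""
        self.heading = (self.heading + degrees) % 360

    def move_forward(self):
        """Move to the neighbour the agent is facing, if such an edge exists."""
        neighbour = self.graph[self.node].get(self.heading)
        if neighbour is not None:
            self.node = neighbour
        reward = 10.0 if self.node == self.goal else 0.0
        return self.node, reward

# Tiny 3-node street graph: A --(east, 90 deg)--> B --(east)--> C
graph = {"A": {90: "B"}, "B": {90: "C", 270: "A"}, "C": {270: "B"}}
env = ToyStreetEnv(graph, start_node="A", goal_node="C")
env.rotate(90)                 # face east
node, r = env.move_forward()   # A -> B
node, r = env.move_forward()   # B -> C, goal reached
```

Unlike the grid-like simulated mazes, the underlying street graph has irregular connectivity (curved roads, tunnels, plazas), which is what makes the examples on the following slides hard.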
StreetView as an RL environment: StreetLearn
Challenging examples:
● left or right?
● looks like a road, but it’s a park entrance
● the West Side Highway
● curved roads and tunnels
● really, tunnels!