Deep Reinforcement Learning and Complex Environments
Raia Hadsell
End-to-end Deep Learning for robots?
2010: Speech Recognition: Audio → Acoustic Model → Phonetic Model → Language Model → Text ⇒ Deep Net
2012: Computer Vision: Pixels → Key Points → SIFT features → Deformable Part Model → Labels ⇒ Deep Net
2014: Machine Translation: Text → Reordering → Phrase Table/Dictionary → Language Model → Text ⇒ Deep Net
2017: Robotics? Sensors → Perception → World Model → Planning → Control → Action
slide from V. Vanhoucke
Robotics is different
[Diagram: sensors and actions, rather than labels, on the path to General Artificial Intelligence]
Deep Reinforcement Learning
[Diagram: an agent (a neural network) receives observations, a goal, and a reward from the environment, and sends back actions]
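The agent-environment loop in the diagram can be sketched as follows; `Environment` and `Agent` here are toy stand-ins (a trivial state counter and a fixed policy), not any actual DeepMind code:

```python
# Minimal agent-environment interaction loop illustrating the deep RL setup:
# the agent observes, acts, and receives rewards from the environment.

class Environment:
    """Toy environment: reach state 3 to receive a reward."""
    def reset(self):
        self.state = 0
        return self.state                       # initial observation

    def step(self, action):
        self.state += action                    # the action changes the state
        reward = 1.0 if self.state >= 3 else 0.0
        done = self.state >= 3
        return self.state, reward, done

class Agent:
    """Trivial policy standing in for the neural network."""
    def act(self, observation):
        return 1                                # always step forward

def run_episode(env, agent, max_steps=100):
    obs = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.act(obs)                 # agent -> action
        obs, reward, done = env.step(action)    # environment -> observation, reward
        total_reward += reward
        if done:
            break
    return total_reward

total = run_episode(Environment(), Agent())
```

In deep RL, the `Agent.act` stub is replaced by a neural network trained to maximise the cumulative reward.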
General Atari Player [Mnih et al, Playing Atari with Deep Reinforcement Learning, 2013]
9DOF Random reacher
● Can deep RL agents learn multiple tasks?
● Can deep RL agents learn efficiently?
● Can deep RL agents learn from real data?
● Can deep RL agents learn continuous control?
Deep RL — Raia Hadsell
Multiple Tasks & Lifelong Learning (Lab Mazes · StreetLearn · Parkour)
Lifelong Learning: 3 challenges
1. Catastrophic forgetting
2. Positive transfer
3. Specialization and generalization
Raia Hadsell 2017
Catastrophic forgetting
● Well-known phenomenon
● Especially severe in Deep RL
Elastic Weight Consolidation (EWC)
[Diagram: parameter space with low-error regions for Task A and Task B; the EWC update stays in the overlap, unlike plain SGD or a uniform L2 penalty]
James Kirkpatrick et al (2017), “Overcoming Catastrophic Forgetting in NNs”
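The EWC idea from Kirkpatrick et al. can be sketched as a quadratic penalty that anchors parameters important to Task A (as measured by the diagonal Fisher information) while training on Task B. The numbers below are purely illustrative, not values from the paper:

```python
# EWC regularised loss:
#   L(theta) = L_B(theta) + (lambda/2) * sum_i F_i * (theta_i - theta_A_i)^2
# where F_i is the diagonal Fisher information estimated after training Task A.

def ewc_loss(task_b_loss, theta, theta_a, fisher, lam=1.0):
    penalty = sum(f * (t - ta) ** 2 for f, t, ta in zip(fisher, theta, theta_a))
    return task_b_loss + 0.5 * lam * penalty

# Illustrative values: parameter 0 was important for Task A (high Fisher),
# so moving it away from theta_A is penalised heavily; parameter 1 is free to move.
theta_a = [1.0, -2.0]       # parameters after training Task A
fisher  = [10.0, 0.01]      # importance of each parameter for Task A
theta   = [1.1, 0.5]        # current parameters while training Task B

loss = ewc_loss(task_b_loss=0.3, theta=theta, theta_a=theta_a, fisher=fisher, lam=1.0)
```

With plain SGD (equivalently, `fisher` all zeros), nothing stops Task B training from overwriting the parameters Task A depends on; an unweighted L2 penalty anchors all parameters equally, which over-constrains the ones Task B actually needs.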
What if my tasks really don’t get along?
What if my tasks really don’t get along?
Progressive Nets
● add columns for new tasks
● freeze params of learnt columns
● layer-wise lateral connections
→ capacity for task-specific features
→ enables deep compositionality
→ precludes forgetting
Andrei Rusu et al (2016), “Progressive Neural Networks”
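The progressive-net forward pass can be sketched as follows: column 1 is frozen after Task A, and a new column 2 for Task B receives lateral connections from column 1's activations at each layer. The scalar "layers" below are toy stand-ins for real network layers:

```python
# Progressive nets sketch: a frozen column feeds lateral connections into a
# new trainable column, so the new task can reuse old features without
# overwriting them. "Layers" here are toy scalar multiplications.

def forward_column1(x, w1):
    """Frozen column trained on task 1; returns its per-layer activations."""
    h1 = w1[0] * x
    h2 = w1[1] * h1
    return [h1, h2]

def forward_column2(x, w2, laterals, col1_acts):
    """New column for task 2: each layer also consumes the previous layer's
    activation from column 1 via a lateral connection."""
    h1 = w2[0] * x
    h2 = w2[1] * h1 + laterals[0] * col1_acts[0]   # lateral from column 1, layer 1
    out = w2[2] * h2 + laterals[1] * col1_acts[1]  # lateral from column 1, layer 2
    return out

w1 = [2.0, 3.0]            # frozen after task 1 (never updated again)
w2 = [1.0, 1.0, 1.0]       # trainable for task 2
laterals = [0.5, 0.5]      # trainable lateral weights

col1_acts = forward_column1(1.0, w1)
out = forward_column2(1.0, w2, laterals, col1_acts)
```

Because gradients for Task B only flow into `w2` and `laterals`, Task A's column is untouched: forgetting is precluded by construction, at the cost of growing capacity per task.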
Sim-to-Real
[Diagram: progressive-net columns spanning simulation and robot; a Task A column learned in simulation, then Task A and Task B columns added on the real robot]
Distral (Distill and Transfer Learning)
● Task-specific networks plus a shared distilled network
● KL-divergence constraint between each task policy and the shared policy
● Regularisation in policy space rather than parameter space
● Shared policy as a communication channel between tasks
→ Distillation of knowledge into the shared model enables transfer across tasks
→ Regularisation towards the shared model gives stability and robustness
Yee Whye Teh et al (2017), “Distral: Robust Multitask Reinforcement Learning”
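The Distral objective can be sketched as each task policy being regularised, via a KL-divergence term, towards the shared distilled policy. The toy action distributions and the coefficient below are illustrative, not the paper's values:

```python
import math

# Distral sketch: per-task objective = task loss + c_kl * KL(pi_i || pi_0),
# pulling each task policy pi_i towards the shared distilled policy pi_0
# (regularisation in policy space, not parameter space).

def kl(p, q):
    """KL divergence between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distral_objective(task_loss, pi_task, pi_shared, c_kl=0.5):
    return task_loss + c_kl * kl(pi_task, pi_shared)

pi_shared = [0.25, 0.25, 0.25, 0.25]   # shared/distilled policy over 4 actions
pi_near   = [0.3, 0.3, 0.2, 0.2]       # task policy close to the shared one
pi_far    = [0.9, 0.05, 0.03, 0.02]    # task policy far from the shared one

# Staying close to the shared policy incurs a smaller regularisation cost.
obj_near = distral_objective(1.0, pi_near, pi_shared)
obj_far  = distral_objective(1.0, pi_far, pi_shared)
```

Because the coupling happens through the shared policy rather than shared weights, tasks that "don't get along" can keep separate parameters while still exchanging behavioural knowledge.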
Lab Mazes & Auxiliary Learning (Multiple Tasks & Lifelong Learning · StreetLearn · Parkour)
Navigation mazes
Game episode: 3600 steps/episode (or 10800 steps/episode)
1. Random start
2. Find the goal (+10)
3. Teleport randomly
4. Re-find the goal (+10)
5. Repeat (limited time)
Variants: static maze & static goal; static maze & random goal; random maze
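The episode protocol above can be sketched as a loop; `find_goal_steps` is a hypothetical stand-in for how long the agent's navigation actually takes per attempt:

```python
# Sketch of the maze episode protocol: the agent spawns at a random location,
# earns +10 on reaching the goal, is teleported to a random location, and
# repeats until the step budget (3600 or 10800 steps) runs out.

GOAL_REWARD = 10

def run_maze_episode(find_goal_steps, episode_length=3600):
    """find_goal_steps: steps the agent needs per goal-finding attempt
    (a stand-in for actual navigation in the maze)."""
    steps, score = 0, 0
    while steps + find_goal_steps <= episode_length:
        steps += find_goal_steps   # navigate from the (random) start to the goal
        score += GOAL_REWARD       # goal found: +10
        # the agent is teleported to a random location; the loop re-finds the goal
    return score

# An agent that needs 400 steps per goal scores 9 goals within 3600 steps.
score = run_maze_episode(find_goal_steps=400)
```

The score thus measures repeated goal-finding within a fixed budget, which is why faster localisation and navigation directly translate into higher reward.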
Nav agent architecture
1. Convolutional encoder and RGB inputs x_t
2. Stacked LSTM with skip connection
3. Additional inputs: previous reward r_{t-1}, previous action a_{t-1}, and velocity v_t
4. RL: Asynchronous advantage actor-critic (A3C)
5. Aux task 1: depth predictors (D1, D2)
6. Aux task 2: loop-closure predictor (L)
Piotr Mirowski, Razvan Pascanu et al (2017), “Learning to navigate in complex environments”
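The auxiliary-task idea can be sketched as a weighted sum of the A3C loss and the auxiliary prediction losses, all computed from the same network trunk. The weights and loss values below are hypothetical, not the paper's settings:

```python
# Combined objective sketch for the Nav A3C agent: the RL (A3C) loss is
# augmented with auxiliary depth-prediction and loop-closure losses, giving
# the shared trunk denser training signal than the sparse maze reward alone.
# The beta weights below are illustrative.

def nav_a3c_loss(a3c_loss, depth_loss_1, depth_loss_2, loop_loss,
                 beta_d1=0.33, beta_d2=0.33, beta_l=1.0):
    return (a3c_loss
            + beta_d1 * depth_loss_1   # aux task: depth predictor D1
            + beta_d2 * depth_loss_2   # aux task: depth predictor D2
            + beta_l * loop_loss)      # aux task: loop-closure predictor L

total = nav_a3c_loss(a3c_loss=2.0, depth_loss_1=0.6, depth_loss_2=0.3, loop_loss=0.1)
```

The auxiliary heads are discarded at test time; their purpose is to shape the shared representation during training.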
Variations in architecture
a. FF A3C
b. LSTM A3C
c. Nav A3C
d. Nav A3C + D1 D2 L
[Diagram: the four architectures, from a feedforward A3C to the full Nav A3C with depth and loop-closure auxiliary outputs]
Results on large maze with static goal (rewards: goal +10, apples +1)
StreetLearn & Real World RL (Multiple Tasks & Lifelong Learning · Lab Mazes & Auxiliary Learning · Parkour)
Navigation mazes in the real world?
[Diagram: comparing the observations and structure of simulated mazes with real-world streets]
StreetView as an RL environment: StreetLearn
● Observation: RGB image cropped from the panorama (84×84)
● Goal location
● Actions: move to next node, rotate view 20° or 60°
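A StreetLearn-style interface can be sketched as a graph of panorama nodes with move and rotate actions. The class, graph, and method names below are hypothetical illustrations, not the released StreetLearn API:

```python
# Toy StreetLearn-style environment: the agent stands at a panorama node with
# a heading, and can rotate its view or move along an edge of the street graph.
# Node ids, the graph, and all names here are hypothetical.

class ToyStreetEnv:
    def __init__(self, graph, start_node, goal_node):
        self.graph = graph            # node -> {heading_degrees: neighbour}
        self.node = start_node
        self.goal = goal_node
        self.heading = 0              # degrees

    def rotate(self, degrees):
        """Rotate the view crop within the panorama (e.g. by 20 or 60 degrees)."""
        self.heading = (self.heading + degrees) % 360

    def move_forward(self):
        """Move to the neighbour the agent is facing, if such an edge exists."""
        neighbour = self.graph[self.node].get(self.heading)
        if neighbour is not None:
            self.node = neighbour
        reward = 10.0 if self.node == self.goal else 0.0
        return self.node, reward

# Tiny 3-node street graph: A --(east, 90 deg)--> B --(east)--> C
graph = {"A": {90: "B"}, "B": {90: "C", 270: "A"}, "C": {270: "B"}}
env = ToyStreetEnv(graph, start_node="A", goal_node="C")
env.rotate(90)                 # face east
node, r = env.move_forward()   # A -> B
node, r = env.move_forward()   # B -> C, goal reached
```

Unlike the grid-like simulated mazes, the underlying street graph has irregular connectivity (curved roads, tunnels, plazas), which is what makes the examples on the following slides hard.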
StreetView as an RL environment: StreetLearn
Challenging examples:
● left or right?
● looks like a road, but it’s a park entrance
● the West Side Highway
● curved roads and tunnels
● really, tunnels!