
Deep Reinforcement Learning and Complex Environments Raia Hadsell - PowerPoint PPT Presentation



  1. Deep Reinforcement Learning and Complex Environments. Raia Hadsell

  2. End-to-end Deep Learning for robots? slide from V. Vanhoucke


  6. End-to-end Deep Learning for robots? slide from V. Vanhoucke
 2010: Speech Recognition. Audio → Acoustic Model → Phonetic Model → Language Model → Text, replaced end-to-end by a Deep Net.
 2012: Computer Vision. Pixels → Key Points → SIFT Features → Deformable Part Model → Labels, replaced end-to-end by a Deep Net.
 2014: Machine Translation. Text → Reordering → Phrase Table/Dictionary → Language Model → Text, replaced end-to-end by a Deep Net.
 2017: Robotics? Sensors → Perception → World Model → Planning → Control → Action

  7. Robotics is different (diagram: LABELS; General Artificial Intelligence)

  8. Robotics is different (diagram: SENSORS, ACTIONS; General Artificial Intelligence)

  9. Deep Reinforcement Learning (diagram: agent-environment loop; the agent, a neural network, receives OBSERVATIONS, a GOAL, and a REWARD, and emits ACTIONS; General Artificial Intelligence)
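The agent-environment loop on this slide can be written as a few lines of code. The sketch below is illustrative only: `env_step` and `policy` are hypothetical callables standing in for a real environment and a trained network, and the toy environment at the bottom is invented for the example.

```python
import random

def interact(env_step, policy, n_steps=100, seed=0):
    """Minimal agent-environment loop matching the slide's diagram:
    the agent maps each observation to an action, and the environment
    returns the next observation plus a reward signal."""
    rng = random.Random(seed)
    obs, total = 0, 0.0
    for _ in range(n_steps):
        action = policy(obs, rng)          # agent: observation -> action
        obs, reward = env_step(obs, action)  # environment: action -> obs, reward
        total += reward
    return total

# Toy environment: reward 1 when the action matches the observation's parity.
step = lambda obs, a: ((obs + 1) % 10, 1.0 if a == obs % 2 else 0.0)
policy = lambda obs, rng: obs % 2  # a hand-written "optimal" policy for the toy task
print(interact(step, policy))  # 100.0
```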

  10. General Atari Player [Mnih et al, Playing Atari with Deep Reinforcement Learning, 2014]

  11. 9DOF Random reacher

  12. ● Can deep RL agents learn multiple tasks?
 ● Can deep RL agents learn efficiently?
 ● Can deep RL agents learn from real data?
 ● Can deep RL agents learn continuous control?
 Deep RL — Raia Hadsell

  13. Lab Mazes / StreetLearn / Parkour: Multiple Tasks & Lifelong Learning

  14. Lifelong Learning: 3 challenges
 1. Catastrophic forgetting
 2. Positive transfer
 3. Specialization and generalization
 Raia Hadsell 2017

  15. Catastrophic forgetting ● Well-known phenomenon ● Especially severe in Deep RL Raia Hadsell 2017


  17. Catastrophic forgetting Raia Hadsell 2017


  19. Elastic Weight Consolidation (diagram: parameter space with low-error regions for Task A and Task B; plain SGD drifts away from Task A, an L2 penalty stays too close to it, EWC reaches a solution good for both)
 James Kirkpatrick et al (2017), "Overcoming Catastrophic Forgetting in Neural Networks"
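EWC's core idea is a quadratic penalty that anchors each parameter to its Task-A optimum, weighted by how important that parameter was for Task A (the diagonal of the Fisher information). A minimal numpy sketch, with illustrative function and variable names:

```python
import numpy as np

def ewc_penalty(theta, theta_star_a, fisher_diag, lam=1.0):
    """EWC-style penalty: (lam/2) * sum_i F_i * (theta_i - theta*_A,i)^2.
    `theta_star_a` is the parameter vector found for Task A and
    `fisher_diag` the per-parameter diagonal Fisher information."""
    return float(0.5 * lam * np.sum(fisher_diag * (theta - theta_star_a) ** 2))

# Toy example: a parameter that mattered a lot for Task A (high Fisher)
# is penalised far more for drifting than an unimportant one.
theta_star = np.array([1.0, -2.0])
fisher = np.array([10.0, 0.1])   # param 0 was important for Task A
drift = np.array([1.5, -1.5])    # both params moved by 0.5
print(ewc_penalty(drift, theta_star, fisher))  # 0.5*(10*0.25 + 0.1*0.25) = 1.2625
```

During Task-B training this penalty would simply be added to the Task-B loss, pulling important weights back toward the Task-A solution while leaving unimportant ones free to move.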

  20. What if my tasks really don’t get along? Raia Hadsell 2017

  21. What if my tasks really don't get along?
 Progressive Nets (diagram: columns 𝝃1/𝛒1, 𝝃2/𝛒2, 𝝃3/𝛒3, one added per task, with layer-wise connections from the earlier, frozen columns)
 ● add columns for new tasks
 ● freeze params of learnt columns
 ● layer-wise neural connections
 → capacity for task-specific features
 → enables deep compositionality
 → precludes forgetting
 Andrei Rusu et al (2016), "Progressive Neural Networks"
 Raia Hadsell 2017
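The column mechanism above can be sketched as a forward pass: frozen columns run unchanged, and the new column's layers receive extra inputs from the frozen columns' previous-layer activations. All names, shapes, and the plain-ReLU layers below are illustrative simplifications, not the paper's exact architecture:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def progressive_forward(x, frozen_cols, new_col, laterals):
    """Forward pass for the newest Progressive Net column.

    frozen_cols: list of per-layer weight lists for earlier (frozen) columns.
    new_col:     per-layer weights of the current column.
    laterals:    dict mapping layer index l to weights that project each
                 frozen column's layer-(l-1) activations into layer l.
    """
    # Run the frozen columns unchanged and cache their hidden activations.
    frozen_acts = []
    for col in frozen_cols:
        h, acts = x, []
        for W in col:
            h = relu(W @ h)
            acts.append(h)
        frozen_acts.append(acts)

    # New column: each layer sums its own input with lateral inputs
    # from the frozen columns, so old features stay usable but untouched.
    h = x
    for l, W in enumerate(new_col):
        pre = W @ h
        for c, U in enumerate(laterals.get(l, [])):
            pre = pre + U @ frozen_acts[c][l - 1]
        h = relu(pre)
    return h

rng = np.random.default_rng(0)
col1 = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]  # frozen Task-A column
col2 = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]  # new Task-B column
lats = {1: [rng.standard_normal((2, 4))]}  # one lateral into layer 1
out = progressive_forward(rng.standard_normal(3), [col1], col2, lats)
print(out.shape)  # (2,)
```

Because only `new_col` and `laterals` would receive gradients, the Task-A column cannot be degraded, which is how forgetting is precluded by construction.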

  24. Sim-to-Real (diagram: a progressive net whose first column 𝝃1/𝛒1 is trained on Task A in simulation, with columns 𝝃2/𝛒2 and 𝝃3/𝛒3 trained on the real robot for Task A and Task B)

  25. What if my tasks really don’t get along? Raia Hadsell 2017

  26. Distral (Distill and Transfer Learning)
 (diagram: task policies 𝛒1...𝛒4 arranged around a shared distilled policy 𝛒0, each linked to it by a KL term)
 ● Task-specific networks plus shared distillation & regularisation network
 ● KL divergence constraint
 ● Regularisation in policy space rather than parameter space
 ● Shared policy as a communication channel between tasks
 → Distillation of knowledge into shared model enables transfer to tasks
 → Regularisation of shared model gives stability and robustness
 Yee Whye Teh et al (2017), "Distral: Robust Multitask Reinforcement Learning"
 Raia Hadsell
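"Regularisation in policy space" means the pull towards the shared policy is a KL term on action distributions, not a distance on weights. A minimal numpy sketch of that regulariser, with illustrative names and a simplified weighting relative to the paper's full objective:

```python
import numpy as np

def kl(p, q):
    """KL divergence KL(p || q) between discrete action distributions."""
    return float(np.sum(p * np.log(p / q)))

def distral_reg(task_policies, shared_policy, alpha=0.5):
    """Distral-style regulariser: each task policy is pulled towards the
    shared (distilled) policy via a KL term in policy space."""
    return alpha * sum(kl(pi, shared_policy) for pi in task_policies)

shared = np.array([0.25, 0.25, 0.25, 0.25])        # shared policy over 4 actions
tasks = [np.array([0.7, 0.1, 0.1, 0.1]),           # diverges from shared
         np.array([0.25, 0.25, 0.25, 0.25])]       # already matches shared
print(distral_reg(tasks, shared))  # only the first, divergent policy contributes
```

Because the penalty lives in policy space, two tasks can use completely different parameters yet still share behavioural knowledge through 𝛒0, which is what lets it act as a communication channel between tasks.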

  29. Deep RL — Raia Hadsell


  31. Lab Mazes / StreetLearn / Parkour: Multiple Tasks & Lifelong Learning; Auxiliary Learning

  32. Navigation mazes
 Game episode (3600 or 10800 steps/episode):
 1. Random start
 2. Find the goal (+10)
 3. Teleport randomly
 4. Re-find the goal (+10)
 5. Repeat (limited time)
 Variants: static maze, static goal; static maze, random goal; random maze
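The episode protocol above (random start, +10 on reaching the goal, random teleport, repeat until the step budget runs out) is easy to state as code. The toy below replaces the 3-D maze with a 1-D corridor and the learned policy with a random walk; both are stand-ins invented for illustration:

```python
import random

def run_episode(n_cells=10, goal=7, max_steps=3600, seed=0):
    """Toy stand-in for the maze episode protocol: start in a random
    cell, wander until the goal cell is reached (+10), teleport to a
    random cell, and repeat until the step budget is spent."""
    rng = random.Random(seed)
    pos, total = rng.randrange(n_cells), 0
    for _ in range(max_steps):
        pos = max(0, min(n_cells - 1, pos + rng.choice((-1, 1))))
        if pos == goal:
            total += 10               # goal reward
            pos = rng.randrange(n_cells)  # teleport, then re-find the goal
    return total

print(run_episode())  # cumulative reward for one episode
```

The re-finding step is what makes memory useful: a second visit to the goal is much faster for an agent that remembers the maze layout, which is exactly what the navigation results later probe.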

  33. Nav agent architecture
 1. Convolutional encoder (enc) and RGB inputs x_t
 2. Stacked LSTM with skip connection
 3. Additional inputs: reward r_{t-1}, action a_{t-1}, and velocity v_t
 4. RL: Asynchronous advantage actor critic (A3C)
 5. Aux task 1: depth predictors (D1 from the encoder, D2 from the LSTM)
 6. Aux task 2: loop closure predictor (L)
 Piotr Mirowski, Razvan Pascanu et al (2017), "Learning to navigate in complex environments"

  39. Variations in architecture
 a. FF A3C (feedforward encoder only)
 b. LSTM A3C (encoder + LSTM)
 c. Nav A3C (stacked LSTM, extra inputs r_{t-1}, {v_t, a_{t-1}})
 d. Nav A3C + D1 D2 L (adds depth and loop-closure auxiliary outputs)
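The "+ D1 D2 L" variant trains the auxiliary heads jointly with RL by adding their supervised losses to the A3C objective. A numpy sketch of that combination, treating depth (as in the paper) as classification over discretised depth bins and loop closure as a binary label; the weight names `beta_d`/`beta_l` and the function signature are illustrative:

```python
import numpy as np

def nav_a3c_loss(a3c_loss, depth_logits, depth_labels,
                 loop_logit, loop_label, beta_d=1.0, beta_l=1.0):
    """Total loss = A3C loss + weighted auxiliary losses.
    depth_logits: (pixels, bins) logits over discretised depth bins.
    loop_logit:   scalar logit for "have I been here before?"."""
    # Softmax cross-entropy over depth bins, averaged over pixels.
    z = depth_logits - depth_logits.max(axis=-1, keepdims=True)
    log_p = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    depth_loss = -log_p[np.arange(len(depth_labels)), depth_labels].mean()
    # Binary cross-entropy for the loop-closure detector.
    p = 1.0 / (1.0 + np.exp(-loop_logit))
    loop_loss = -(loop_label * np.log(p) + (1 - loop_label) * np.log(1 - p))
    return float(a3c_loss + beta_d * depth_loss + beta_l * loop_loss)

# Uniform logits over 8 depth bins and an undecided loop detector give
# the maximum-entropy baseline losses log(8) and log(2).
total = nav_a3c_loss(0.0, np.zeros((4, 8)), np.zeros(4, dtype=int), 0.0, 1.0)
print(total)  # log(8) + log(2) = log(16) ≈ 2.7726
```

Only the extra gradient signal matters here: the auxiliary predictions are discarded at test time, but shaping the encoder and LSTM features this way is what speeds up navigation learning.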

  40. Results on large maze with static goal (rewards +10 and +1)

  41. Deep RL — Raia Hadsell

  42. Lab Mazes / StreetLearn / Parkour: Multiple Tasks & Lifelong Learning; Auxiliary Learning; Real World RL

  43. Navigation mazes in the real world? (diagram: observation and structure, compared across environments)

  44. StreetView as an RL environment: StreetLearn
 ● Observation: RGB image cropped from the panorama (84x84)
 ● Goal location given as input
 ● Actions: move to next node, rotate view 20° or 60°
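The action space above is small and discrete, which keeps the agent interface identical to the lab mazes. A sketch of that action set as an enum; the class and member names are invented for illustration and are not the actual StreetLearn API:

```python
from enum import Enum

class Action(Enum):
    """Illustrative StreetLearn-style discrete action set from the slide."""
    MOVE_FORWARD = 0     # move to the next connected panorama node
    ROTATE_LEFT_20 = 1   # fine rotation of the 84x84 view crop
    ROTATE_RIGHT_20 = 2
    ROTATE_LEFT_60 = 3   # coarse rotation
    ROTATE_RIGHT_60 = 4

print(len(Action))  # 5 discrete actions
```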

  45. StreetView as an RL environment: StreetLearn left or right? Deep RL — Raia Hadsell

  46. StreetView as an RL environment: StreetLearn Looks like a road, but it’s a park entrance Deep RL — Raia Hadsell

  47. StreetView as an RL environment: StreetLearn west side highway Deep RL — Raia Hadsell

  48. StreetView as an RL environment: StreetLearn curved roads and tunnels Deep RL — Raia Hadsell

  49. StreetView as an RL environment: StreetLearn really, tunnels! Deep RL — Raia Hadsell
