Improving Imitation Learning with Reinforcement Learning


  1. MIN Faculty Department of Informatics Improving Imitation Learning with Reinforcement Learning Niklas Fiedler University of Hamburg Faculty of Mathematics, Informatics and Natural Sciences Department of Informatics Technical Aspects of Multimodal Systems November 26, 2019 N. Fiedler – Improving Imitation Learning with Reinforcement Learning 1 / 23

  2. Outline: 1. Introduction (Motivation) 2. Imitation Learning (Demonstration Methods, Behavioral Cloning, Inverse Reinforcement Learning) 3. Combining Reinforcement Learning and Imitation Learning (BC Application, IRL Application) 4. Conclusion

  3. Goal ◮ Imitate expert behavior ◮ Improve learning by including knowledge given by demonstration ◮ Learn expert policies → Make use of expert demonstrations

  4. Motivation: Humans are Awesome. Image source: https://rejectedprincesses.tumblr.com/post/150495232038/chynara-madinkulova-long-hair-and-aida-akmatova

  5. Motivation: Learning from Demonstration. Learning from experts is natural behavior [Haw50]. Image source: https://www.wakecounseling.com/therapy-blog/play-therapy

  6. Imitation Learning: A method for learning a behavior from a demonstration. Demonstrations can take various forms. Two prominent methods of implementation: 1. Behavioral Cloning 2. Inverse Reinforcement Learning

  9. Demonstration Methods: Virtual/Augmented Reality, Tracking of Human Motions, Video Stream, Teleoperation. Image sources: s3.ap-south-1.amazonaws.com/kidobotikz.sprw/master/assets/images/blog/blog-2018110811630.jpg, siamagazin.com/bimanual-teleoperation-of-a-compliant-whole-body-controlled-humanoid-robot/, https://ar-tracking.com/applications/motion-capture/, https://www.youtube.com/watch?v=5BTIE_fhReo

  13. Behavioral Cloning ◮ Trains a direct mapping from demonstrated inputs to demonstrated outputs ◮ Large amounts of training data necessary ◮ Poor generalization
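
The "direct link between demonstrated input and output" can be read as ordinary supervised learning on (state, action) pairs. The following is a minimal sketch under that reading, not the method of any paper cited in these slides. The linear policy, the function names `fit_behavioral_cloning` and `act`, and the synthetic expert data are all assumptions made for illustration.

```python
import numpy as np


def fit_behavioral_cloning(states, actions):
    """Fit a linear policy by least squares: actions ~ [states, 1] @ weights."""
    # Append a constant column so the policy can learn a bias term.
    features = np.hstack([states, np.ones((states.shape[0], 1))])
    weights, *_ = np.linalg.lstsq(features, actions, rcond=None)
    return weights


def act(weights, state):
    """Predict the action for a single state with the cloned policy."""
    return np.append(state, 1.0) @ weights


# Hypothetical demonstrations: the "expert" always steps halfway to the origin.
rng = np.random.default_rng(0)
demo_states = rng.uniform(-1.0, 1.0, size=(200, 2))
demo_actions = -0.5 * demo_states

weights = fit_behavioral_cloning(demo_states, demo_actions)
predicted = act(weights, np.array([0.4, -0.2]))
```

The cloned policy only sees states that appear in the demonstrations, which is one way to understand the data demand and poor generalization listed on the slide.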

  14. Behavioral Cloning: Video https://www.youtube.com/watch?v=5BTIE_fhReo

  15. Inverse Reinforcement Learning: Reinforcement Learning

  16. Inverse Reinforcement Learning: Reinforcement Learning vs. Inverse Reinforcement Learning ◮ RL: given a reward function R, searching for the optimal policy π for the given reward ◮ IRL: given a (partially observed) policy π or a history sampled from that policy, searching for the reward function R for which the given behavior is optimal. Source: https://thinkingwires.com/posts/2018-02-13-irl-tutorial-1.html
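
The RL/IRL contrast can be written compactly. This is a hedged formalization in standard notation, not taken from the slides: the discount factor \gamma, the expert policy \pi_E, and the infinite-horizon discounted return are assumptions.

```latex
% RL: the reward function R is given, the optimal policy is sought.
\pi^{*} = \arg\max_{\pi} \; \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t) \right]

% IRL: demonstrations from an expert policy \pi_E are given,
% a reward function R is sought under which the expert is optimal.
\mathbb{E}_{\pi_E}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t) \right]
\;\ge\;
\mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t) \right]
\quad \text{for all policies } \pi
```

The IRL condition alone does not determine R uniquely (for example, R = 0 satisfies it for every expert), so practical IRL methods add further criteria to select a reward.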

  17. Inverse Reinforcement Learning. Image source: https://medium.com/@sanketgujar95/generative-adversarial-imitation-learning-266f45634e60

  18. Imitation Learning: Behavioral Cloning vs. Inverse Reinforcement Learning. Behavioral Cloning: ◮ Weak generalization ◮ Relatively low computational effort. Inverse Reinforcement Learning: ◮ Strong generalization ◮ Large computational effort ◮ Complex structure

  19. Combining Reinforcement Learning and Imitation Learning ◮ Reduces the impact of the shortcomings of both methods ◮ After RL is applied, the learned behavior should outperform the demonstrators ◮ Accelerated training process ◮ Extends the capabilities learned with imitation learning

  20. BC Application: Overcoming Exploration in Reinforcement Learning with Demonstrations. Ashvin Nair (1, 2), Bob McGrew (1), Marcin Andrychowicz (1), Wojciech Zaremba (1) and Pieter Abbeel (1, 2). 2018 IEEE International Conference on Robotics and Automation (ICRA). (1) OpenAI, (2) University of California, Berkeley
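
One way to combine a behavioral cloning term with an RL learner, in the spirit of the paper named above, is to penalize the policy for deviating from demonstrated actions only where the critic rates the demonstrated action higher than the policy's own action. The sketch below illustrates that filtering idea only. The function name `q_filtered_bc_loss`, the array layout, and the toy numbers are assumptions, and the critic values are passed in as plain arrays rather than computed by a trained network.

```python
import numpy as np


def q_filtered_bc_loss(policy_actions, demo_actions, q_demo, q_policy):
    """Sum of squared action errors over demonstration samples, counted only
    where the critic values the demonstrated action above the policy action."""
    keep = q_demo > q_policy  # boolean mask, one entry per sample
    squared_error = np.sum((policy_actions - demo_actions) ** 2, axis=1)
    return float(np.sum(squared_error * keep))


# Toy batch of two demonstration samples (hypothetical values).
policy_actions = np.array([[0.0, 0.0], [1.0, 1.0]])
demo_actions = np.array([[1.0, 0.0], [1.0, 3.0]])
q_demo = np.array([2.0, 0.0])    # critic value of the demonstrated actions
q_policy = np.array([1.0, 1.0])  # critic value of the policy's actions

loss = q_filtered_bc_loss(policy_actions, demo_actions, q_demo, q_policy)
# Only the first sample passes the filter, so loss == 1.0
```

In a full training loop this term would be added to the usual policy objective, which lets the policy follow the demonstrations early on and depart from them once its own actions are judged better.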

  21. BC Application: Goal. Pushing, Sliding, Pick and Place [NMA+18]

  22. BC Application: Results [NMA+18]

  23. IRL Application: Reinforcement and Imitation Learning for Diverse Visuomotor Skills. Yuke Zhu (1), Ziyu Wang (2), Josh Merel (2), Andrei Rusu (2), Tom Erez (2), Serkan Cabi (2), Saran Tunyasuvunakool (2), Janos Kramar (2), Raia Hadsell (2), Nando de Freitas (2) and Nicolas Heess (2). (1) Computer Science Department, Stanford University, (2) DeepMind

  24. IRL Application: Goal [ZWM+18]

  25. IRL Application: Method [ZWM+18]

  26. IRL Example: Results, block stacking [ZWM+18]

  27. Combining Reinforcement Learning and Imitation Learning: Comparison. IRL Approach: ◮ Inverse Reinforcement Learning ◮ Policies transferred to real robot ◮ Goal: improve result performance and task complexity. BC Approach: ◮ Behavioral Cloning ◮ Simulation only ◮ Goal: improve training performance and task complexity
