Self-Supervised Exploration via Disagreement

Deepak Pathak* (UC Berkeley), Dhiraj Gandhi* (CMU), Abhinav Gupta (CMU, FAIR). ICML 2019. (* equal contribution)


  1. Self-Supervised Exploration via Disagreement. Deepak Pathak* (UC Berkeley), Dhiraj Gandhi* (CMU), Abhinav Gupta (CMU, FAIR). ICML 2019. (* equal contribution)

  2. Exploration – a major challenge!

  3. Exploration – a major challenge!
     • Schmidhuber, Jurgen. “A possibility for implementing curiosity and boredom in model building neural controllers”, 1991.
     • Schmidhuber, Jurgen. “Formal theory of creativity, fun, and intrinsic motivation (1990–2010)”, 2010.
     • Oudeyer, P.-Y. and Kaplan, F. “What is intrinsic motivation? A typology of computational approaches”. Frontiers in Neurorobotics, 2009.
     • Poupart et al. “An analytic solution to discrete Bayesian reinforcement learning”. ICML, 2006.
     • Lopes et al. “Exploration in model-based reinforcement learning by empirically estimating learning progress”. NIPS, 2012.
     • Bellemare et al. “Unifying count-based exploration and intrinsic motivation”. NIPS, 2016.
     • Mohamed et al. “Variational information maximisation for intrinsically motivated reinforcement learning”. NIPS, 2015.
     • Houthooft et al. “VIME: Variational information maximizing exploration”. NIPS, 2016.
     • Gregor et al. “Variational intrinsic control”. ICLR Workshop, 2017.
     • Pathak et al. “Curiosity-driven Exploration by Self-supervised Prediction”. ICML, 2017.
     • Ostrovski et al. “Count-based exploration with neural density models”. ICML, 2017.
     • Burda*, Edwards*, Pathak* et al. “Large-Scale Study of Curiosity-driven Learning”. ICLR, 2019.
     • Eysenbach et al. “Diversity is all you need: Learning skills without a reward function”. ICLR, 2019.
     • Savinov et al. “Episodic curiosity through reachability”. ICLR, 2019.


  6. Exploration – a major challenge! Sample Inefficient [millions of samples]

  7. Sample Inefficient: Simulation

  8. Sample Inefficient: Simulation, Real Robots

  9. Sample Inefficient: Simulation, Real Robots. “Stuck” in Stochastic Envs

  10. Sample Inefficient: Simulation, Real Robots. “Stuck” in Stochastic Envs: Curiosity Exploration w/ Noisy TV & Remote [Burda*, Edwards*, Pathak* et al., ICLR’19] [Juliani et al., arXiv’19]

  11. Why inefficient?

  12. [Pathak et al. ICML, 2017]

  13. current image $x_t$ [Pathak et al. ICML, 2017]

  14. current image $x_t$ → policy network $\pi_\theta(x_t)$ [Pathak et al. ICML, 2017]

  15. current image $x_t$ → policy network $\pi_\theta(x_t)$ → action $a_t$ [Pathak et al. ICML, 2017]

  16. current image $x_t$ → policy network $\pi_\theta(x_t)$ → action $a_t$ → next image $x_{t+1}$ [Pathak et al. ICML, 2017]


  18. current image $x_t$ → policy network $\pi_\theta(x_t)$ → action $a_t$ → next image $x_{t+1}$; Prediction Model $f(x_t, a_t)$ [Pathak et al. ICML, 2017]

  19. current image $x_t$ → policy network $\pi_\theta(x_t)$ → action $a_t$ → next image $x_{t+1}$; (current image $x_t$, action $a_t$) → Prediction Model $f(x_t, a_t)$ [Pathak et al. ICML, 2017]

  20. current image $x_t$ → policy network $\pi_\theta(x_t)$ → action $a_t$ → next image $x_{t+1}$; (current image $x_t$, action $a_t$) → Prediction Model $f(x_t, a_t)$ → predicted next image $\hat{x}_{t+1}$ [Pathak et al. ICML, 2017]
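
The diagram built up over the last slides can be sketched as a small loop: a policy picks an action from the current image, the environment returns the next image, and a prediction model guesses that next image from the current image and the action. Comparing the guess with the real next image gives a prediction error, which the cited curiosity work [Pathak et al. ICML, 2017] uses as its exploration signal. Everything below is a toy assumption for illustration, not the authors' implementation: the "image" is a list of four numbers, the policy is random, the dynamics are deterministic, and the names `policy`, `environment_step`, `PredictionModel`, and `run` are hypothetical.

```python
import random


def policy(x_t, rng):
    """Hypothetical stand-in for the policy network: picks a random action."""
    return rng.choice([-1.0, 1.0])


def environment_step(x_t, a_t):
    """Toy deterministic dynamics (an assumption): every 'pixel' shifts by 0.5 * a_t."""
    return [v + 0.5 * a_t for v in x_t]


class PredictionModel:
    """Toy forward model f(x_t, a_t) = x_t + w * a_t with a single learnable weight w."""

    def __init__(self):
        self.w = 0.0

    def predict(self, x_t, a_t):
        return [v + self.w * a_t for v in x_t]

    def update(self, x_t, a_t, x_next, lr=0.1):
        # One gradient step on the mean squared prediction error with respect to w.
        pred = self.predict(x_t, a_t)
        grad = sum(2.0 * (p - y) * a_t for p, y in zip(pred, x_next)) / len(x_t)
        self.w -= lr * grad


def prediction_error(x_hat, x_next):
    """Mean squared error between predicted and actual next image."""
    return sum((p - y) ** 2 for p, y in zip(x_hat, x_next)) / len(x_next)


def run(steps=50, seed=0):
    rng = random.Random(seed)
    model = PredictionModel()
    x_t = [0.0, 0.0, 0.0, 0.0]  # current image (toy)
    errors = []
    for _ in range(steps):
        a_t = policy(x_t, rng)               # action from the policy
        x_next = environment_step(x_t, a_t)  # actual next image
        x_hat = model.predict(x_t, a_t)      # predicted next image
        errors.append(prediction_error(x_hat, x_next))
        model.update(x_t, a_t, x_next)       # train the prediction model
        x_t = x_next
    return errors, model.w


errors, w = run()
print(f"first error: {errors[0]:.4f}, last error: {errors[-1]:.2e}, learned w: {w:.4f}")
```

In this deterministic toy the prediction error shrinks as the model learns the dynamics, so a curiosity signal based on it fades once a transition is well understood.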
