Self-Supervised Exploration via Disagreement
Deepak Pathak* (UC Berkeley), Dhiraj Gandhi* (CMU), Abhinav Gupta (CMU, FAIR)
ICML 2019 (* equal contribution)
Exploration – a major challenge!
• Schmidhuber, J. “A possibility for implementing curiosity and boredom in model building neural controllers”. 1991.
• Schmidhuber, J. “Formal theory of creativity, fun, and intrinsic motivation (1990–2010)”. 2010.
• Oudeyer, P.-Y. and Kaplan, F. “What is intrinsic motivation? A typology of computational approaches”. Frontiers in Neurorobotics, 2009.
• Poupart et al. “An analytic solution to discrete Bayesian reinforcement learning”. ICML, 2006.
• Lopes et al. “Exploration in model-based reinforcement learning by empirically estimating learning progress”. NIPS, 2012.
• Bellemare et al. “Unifying count-based exploration and intrinsic motivation”. NIPS, 2016.
• Mohamed et al. “Variational information maximisation for intrinsically motivated reinforcement learning”. NIPS, 2015.
• Houthooft et al. “VIME: Variational information maximizing exploration”. NIPS, 2016.
• Gregor et al. “Variational intrinsic control”. ICLR Workshop, 2017.
• Pathak et al. “Curiosity-driven Exploration by Self-supervised Prediction”. ICML, 2017.
• Ostrovski et al. “Count-based exploration with neural density models”. ICML, 2017.
• Burda*, Edwards*, Pathak* et al. “Large-Scale Study of Curiosity-driven Learning”. ICLR, 2019.
• Eysenbach et al. “Diversity is All You Need: Learning Skills without a Reward Function”. ICLR, 2019.
• Savinov et al. “Episodic Curiosity through Reachability”. ICLR, 2019.
Inefficient: millions of samples
Sample inefficient: in simulation and on real robots.
“Stuck” in stochastic environments: curiosity exploration w/ noisy TV & remote [Burda*, Edwards*, Pathak* et al., ICLR 2019; Juliani et al., arXiv 2019].
Why inefficient?
Curiosity via prediction error [Pathak et al., ICML 2017]:
• A policy network π(x_t) takes the current image x_t and outputs an action a_t.
• A prediction model f(x_t, a_t) takes the current image and action and predicts the next image x̂_{t+1}.
• The error between the predicted next image x̂_{t+1} and the actual next image x_{t+1} serves as the agent's intrinsic reward.
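To make this concrete, below is a minimal sketch of the prediction-error curiosity reward in PyTorch. It is an illustrative reconstruction, not the authors' released code: the `ForwardModel` module, the `intrinsic_reward` helper, and all layer sizes are assumptions, and the forward model is written over feature embeddings φ(x) of the image (as in the ICM formulation) rather than raw pixels.

```python
# Hypothetical sketch of prediction-error curiosity (after Pathak et al., ICML 2017).
# Module names, layer sizes, and the embedding-space formulation are assumptions.
import torch
import torch.nn as nn


class ForwardModel(nn.Module):
    """Predicts the next-state embedding phi(x_{t+1}) from phi(x_t) and action a_t."""

    def __init__(self, embed_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim + action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, embed_dim),
        )

    def forward(self, phi_t: torch.Tensor, a_t: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([phi_t, a_t], dim=-1))


def intrinsic_reward(model: ForwardModel,
                     phi_t: torch.Tensor,
                     a_t: torch.Tensor,
                     phi_next: torch.Tensor) -> torch.Tensor:
    """Curiosity reward: squared error between predicted and actual next embedding.

    High error means a hard-to-predict transition, hence high reward. This is
    also the failure mode on the previous slide: for irreducibly stochastic
    inputs (the "noisy TV"), the error never shrinks, so the reward never fades.
    """
    with torch.no_grad():  # the reward signal should not backpropagate
        phi_pred = model(phi_t, a_t)
    return ((phi_pred - phi_next) ** 2).mean(dim=-1)
```

The forward model itself is trained on the agent's own transitions with the same squared error as its loss, so the reward naturally decays as parts of the environment become predictable: deterministic novelty is explored and then abandoned, while stochastic noise keeps paying out, which is exactly the inefficiency motivating the disagreement idea in this talk.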