Self-Supervised Exploration via Disagreement
Deepak Pathak* (UC Berkeley), Dhiraj Gandhi* (CMU), Abhinav Gupta (CMU, FAIR)
ICML 2019 (* equal contribution)
Exploration – a major challenge!
• Schmidhuber, J. “A possibility for implementing curiosity and boredom in model building neural controllers”. 1991.
• Schmidhuber, J. “Formal theory of creativity, fun, and intrinsic motivation (1990–2010)”. 2010.
• Oudeyer, P.-Y. and Kaplan, F. “What is intrinsic motivation? A typology of computational approaches”. Frontiers in Neurorobotics, 2009.
• Poupart et al. “An analytic solution to discrete Bayesian reinforcement learning”. ICML, 2006.
• Lopes et al. “Exploration in model-based reinforcement learning by empirically estimating learning progress”. NIPS, 2012.
• Bellemare et al. “Unifying count-based exploration and intrinsic motivation”. NIPS, 2016.
• Mohamed et al. “Variational information maximisation for intrinsically motivated reinforcement learning”. NIPS, 2015.
• Houthooft et al. “VIME: Variational information maximizing exploration”. NIPS, 2016.
• Gregor et al. “Variational intrinsic control”. ICLR Workshop, 2017.
• Pathak et al. “Curiosity-driven Exploration by Self-supervised Prediction”. ICML, 2017.
• Ostrovski et al. “Count-based exploration with neural density models”. ICML, 2017.
• Burda*, Edwards*, Pathak* et al. “Large-Scale Study of Curiosity-driven Learning”. ICLR, 2019.
• Eysenbach et al. “Diversity is All You Need: Learning Skills without a Reward Function”. ICLR, 2019.
• Savinov et al. “Episodic Curiosity through Reachability”. ICLR, 2019.
Inefficient: millions of samples
Sample inefficient: in simulation and on real robots.
“Stuck” in stochastic environments: curiosity exploration w/ noisy TV & remote [Burda*, Edwards*, Pathak* et al., ICLR 2019; Juliani et al., arXiv 2019].
Why inefficient?
Curiosity via prediction error [Pathak et al., ICML 2017]:
• A policy network π(x_t) takes the current image x_t and outputs an action a_t.
• A prediction model f(x_t, a_t) takes the current image and action and predicts the next image x̂_{t+1}.
• The error between the predicted next image x̂_{t+1} and the actual next image x_{t+1} serves as the agent's intrinsic reward.
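To make this concrete, below is a minimal sketch of the prediction-error curiosity reward in PyTorch. It is an illustrative reconstruction, not the authors' released code: the `ForwardModel` module, the `intrinsic_reward` helper, and all layer sizes are assumptions, and the forward model is written over feature embeddings φ(x) of the image (as in the ICM formulation) rather than raw pixels.

```python
# Hypothetical sketch of prediction-error curiosity (after Pathak et al., ICML 2017).
# Module names, layer sizes, and the embedding-space formulation are assumptions.
import torch
import torch.nn as nn


class ForwardModel(nn.Module):
    """Predicts the next-state embedding phi(x_{t+1}) from phi(x_t) and action a_t."""

    def __init__(self, embed_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim + action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, embed_dim),
        )

    def forward(self, phi_t: torch.Tensor, a_t: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([phi_t, a_t], dim=-1))


def intrinsic_reward(model: ForwardModel,
                     phi_t: torch.Tensor,
                     a_t: torch.Tensor,
                     phi_next: torch.Tensor) -> torch.Tensor:
    """Curiosity reward: squared error between predicted and actual next embedding.

    High error means a hard-to-predict transition, hence high reward. This is
    also the failure mode on the previous slide: for irreducibly stochastic
    inputs (the "noisy TV"), the error never shrinks, so the reward never fades.
    """
    with torch.no_grad():  # the reward signal should not backpropagate
        phi_pred = model(phi_t, a_t)
    return ((phi_pred - phi_next) ** 2).mean(dim=-1)
```

The forward model itself is trained on the agent's own transitions with the same squared error as its loss, so the reward naturally decays as parts of the environment become predictable: deterministic novelty is explored and then abandoned, while stochastic noise keeps paying out, which is exactly the inefficiency motivating the disagreement idea in this talk.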