Model-Based Active Exploration
Pranav Shyam, Wojciech Jaskowski, Faustino Gomez
arxiv.org/abs/1810.12162
Presentation by Danijar Hafner
Reinforcement Learning
[Diagram: a learning agent (algorithm + objective) receives sensor input from an unknown environment and sends motor output back to it.]
Reinforcement Learning vs. Intrinsic Motivation
[Two side-by-side diagrams of the same agent-environment loop (objective, algorithm, sensor input, motor output, unknown environment): one labeled Reinforcement Learning, one labeled Intrinsic Motivation.]
Many Intrinsic Objectives
- Information gain, e.g. Lindley 1956, Sun 2011, Houthooft 2017
- Prediction error, e.g. Schmidhuber 1991, Bellemare 2016, Pathak 2017
- Empowerment, e.g. Klyubin 2005, Tishby 2011, Gregor 2016
- Skill discovery, e.g. Eysenbach 2018, Sharma 2020, Co-Reyes 2018
- Surprise minimization, e.g. Schrödinger 1944, Friston 2013, Berseth 2020
- Bayes-adaptive RL, e.g. Gittins 1979, Duff 2002, Ross 2007
Information Gain
Without rewards, the agent can only learn about the environment.
A model W represents our knowledge, e.g. an input density or a forward prediction model.
We need to represent uncertainty about W to tell how much we have learned: data collection turns the prior p(W) into the posterior p(W | X).
To gain the most information, we aim to maximize the mutual information between future sensory inputs X and model parameters W (both W and X are random variables):
max_a I(X; W | A=a) = ?
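As a reminder, the prior-to-posterior update above is just Bayes' rule:

```latex
p(W \mid X) \;=\; \frac{p(X \mid W)\, p(W)}{p(X)} \;\propto\; p(X \mid W)\, p(W)
```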
Retrospective Infogain vs. Expected Infogain

Retrospective Infogain (e.g. VIME, ICM, RND): reward KL[ p(W | X, A=a) || p(W | A=a) ]
- Collect episodes, train the world model, record its improvement, and reward the controller by this improvement.
- The infogain depends on the agent's knowledge, which keeps changing, making it a non-stationary objective.
- The learned controller will lag behind and go to states that were previously novel but are not anymore.

Expected Infogain (e.g. MAX, PETS-ET, LD): objective I(X; W | A=a)
- Need to search for actions that will lead to high information gain without additional environment interaction.
- Learn a forward model of the environment to search for actions by planning or learning in imagination.
- Computing the expected information gain requires computing entropies of a model with uncertainty estimates.
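Note that the two columns target the same quantity: the expected infogain is the expectation of the retrospective KL reward over the not-yet-observed inputs (a standard identity for mutual information):

```latex
I(X; W \mid A=a)
  \;=\; \mathbb{E}_{x \sim p(X \mid A=a)}
  \Big[ \operatorname{KL}\big[\, p(W \mid X=x, A=a) \,\|\, p(W \mid A=a) \,\big] \Big]
```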
Retrospective Novelty
[Illustration, one panel per step:]
- Episode 1: everything unknown → random behavior → high novelty → reinforce behavior
- Episode 2: repeat behavior → reach similar states → not surprising anymore → unlearn behavior
- Episode 3: repeat behavior → still not novel → unlearn behavior
- Episode 4: back to random behavior
The agent builds a map of where it has already been and avoids those states.
Expected Novelty
[Illustration, one panel per step:]
- Episode 1: everything unknown → consider options → execute plan → observe new data
- Episode 2: consider options → execute plan → observe new data
(A sketch of this plan-then-act loop follows below.)
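A minimal sketch of this plan-then-act loop, not the paper's implementation: a random-shooting planner scores imagined action sequences under a toy ensemble of linear-Gaussian dynamics models and returns the first action of the best sequence. The ensemble, the disagreement-based utility (a crude stand-in for the infogain defined on the next slides), and all sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, ACTION_DIM, K = 3, 2, 5           # toy sizes; K ensemble members (assumed)
HORIZON, N_CANDIDATES = 10, 256              # planning settings (assumed)

# Toy ensemble: each member is a linear-Gaussian model s' ~ N(A_k s + B_k a, const. noise).
ensemble = [
    (rng.normal(scale=0.3, size=(STATE_DIM, STATE_DIM)),   # A_k
     rng.normal(scale=0.3, size=(STATE_DIM, ACTION_DIM)))  # B_k
    for _ in range(K)
]

def predict_means(state, action):
    """Next-state means predicted by every ensemble member."""
    return np.stack([A @ state + B @ action for A, B in ensemble])

def disagreement(state, action):
    """Variance of the member means, used here as a crude infogain proxy."""
    means = predict_means(state, action)
    return means.var(axis=0).sum()

def score_sequence(state, actions):
    """Imagined exploration utility of an action sequence (no environment steps)."""
    total = 0.0
    for action in actions:
        total += disagreement(state, action)
        state = predict_means(state, action).mean(axis=0)  # roll out the mean prediction
    return total

def plan(state):
    """Random shooting: sample candidate sequences, return the best first action."""
    candidates = rng.uniform(-1.0, 1.0, size=(N_CANDIDATES, HORIZON, ACTION_DIM))
    scores = [score_sequence(state, seq) for seq in candidates]
    return candidates[int(np.argmax(scores))][0]

print("first planned action:", plan(np.zeros(STATE_DIM)))
```

Replacing the disagreement proxy with the entropy-based infogain from the following slides, and re-planning after every environment step, yields the expected-novelty behavior sketched above.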
Ensemble of Dynamics Models
Learn dynamics both to represent knowledge and to plan for expected infogain.
Capture uncertainty as an ensemble of non-linear Gaussian predictors.
I(X; W | A=a) = H(X | A=a) − H(X | W, A=a)
(epistemic uncertainty = total uncertainty − aleatoric uncertainty)
Information gain targets uncertain trajectories with low expected noise:
- Wide predictions mean high expected noise; narrow predictions mean low expected noise.
- Overlapping modes mean less total uncertainty; distant modes mean large total uncertainty.
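A small numerical check of this picture (an illustrative sketch with 1-D Gaussian predictors, not from the paper): the aleatoric term uses the closed-form Gaussian entropy, the total term is estimated by sampling from the ensemble mixture, and their gap, the information gain, is near zero when the member modes overlap and large when they are far apart.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_entropy(std):
    """Closed-form entropy of a 1-D Gaussian with standard deviation std."""
    return 0.5 * np.log(2 * np.pi * np.e * std ** 2)

def mixture_entropy_mc(means, std, n_samples=200_000):
    """Monte Carlo estimate of the entropy of a uniform mixture of Gaussians."""
    k = rng.integers(len(means), size=n_samples)
    x = rng.normal(means[k], std)
    densities = np.mean(
        [np.exp(-0.5 * ((x - m) / std) ** 2) / (std * np.sqrt(2 * np.pi)) for m in means],
        axis=0,
    )
    return -np.mean(np.log(densities))

def infogain(means, std):
    """I(X; W) = H(X) - H(X | W) for an ensemble of equally weighted Gaussian predictors."""
    total = mixture_entropy_mc(np.asarray(means, dtype=float), std)
    aleatoric = gaussian_entropy(std)          # identical for every member in this toy case
    return total - aleatoric

print("overlapping modes:", infogain([0.0, 0.1, -0.1], std=1.0))   # near zero: members agree
print("distant modes:    ", infogain([-5.0, 0.0, 5.0], std=1.0))   # large: members disagree
```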
Expected Infogain Approximation
I(X; W | A=a) = H(X | A=a) − H(X | W, A=a)
(epistemic uncertainty = total uncertainty − aleatoric uncertainty)
Ensemble members: p(X | W=w_k, A=a)
Aggregate prediction: p(X | A=a) = 1/K Σ_k p(X | W=w_k, A=a)
Aleatoric uncertainty: H(X | W, A=a) ≈ 1/K Σ_k H( p(X | W=w_k, A=a) )
Total uncertainty: H(X | A=a) ≈ H( 1/K Σ_k p(X | W=w_k, A=a) )
The Gaussian entropy has a closed form, so we can compute the aleatoric uncertainty. The entropy of the Gaussian mixture does not, so either estimate it by sampling or switch to a Rényi entropy, which does have a closed form.
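One way to get the closed form mentioned here, sketched for 1-D predictions (an assumption; the exact estimator may differ from the paper's): measure both terms with the quadratic Rényi entropy H2(p) = −log ∫ p(x)² dx, which is available in closed form for a single Gaussian and, via the identity ∫ N(x; μ_i, σ_i²) N(x; μ_j, σ_j²) dx = N(μ_i; μ_j, σ_i² + σ_j²), also for the Gaussian-mixture aggregate.

```python
import numpy as np

def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

def renyi2_entropy_gaussian(var):
    """Quadratic Renyi entropy H2 = -log integral p^2 of a 1-D Gaussian."""
    return -np.log(normal_pdf(0.0, 0.0, 2 * var))

def renyi2_entropy_mixture(means, variances):
    """Closed-form H2 of a uniform mixture of 1-D Gaussians (the ensemble aggregate)."""
    means, variances = np.asarray(means), np.asarray(variances)
    k = len(means)
    # integral p^2 = (1/K^2) * sum_{i,j} N(mu_i; mu_j, var_i + var_j)
    pairwise = normal_pdf(means[:, None], means[None, :],
                          variances[:, None] + variances[None, :])
    return -np.log(pairwise.sum() / k ** 2)

def renyi2_infogain(means, variances):
    """Total minus aleatoric uncertainty, both measured with the Renyi-2 entropy."""
    total = renyi2_entropy_mixture(means, variances)
    aleatoric = np.mean([renyi2_entropy_gaussian(v) for v in variances])
    return total - aleatoric

# Same qualitative behaviour as the Shannon version: disagreement drives the utility.
print(renyi2_infogain([0.0, 0.1, -0.1], [1.0, 1.0, 1.0]))   # near zero
print(renyi2_infogain([-5.0, 0.0, 5.0], [1.0, 1.0, 1.0]))   # clearly positive
```

The resulting quantity, the mixture entropy minus the average member entropy, is the Jensen-Rényi divergence of the ensemble and behaves like the Shannon infogain: it is driven by disagreement between members rather than by their individual noise.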