Learning Latent Dynamics for Planning from Pixels
Danijar Hafner, Timothy Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, James Davidson
@danijarh danijar.com/planet
Planning with Learned Models
Watter et al., 2015; Banijamali et al., 2017; Zhang et al., 2017
Agrawal et al., 2016; Finn & Levine, 2016; Ebert et al., 2018
Visual Control Tasks
Task challenges: partially observable, many joints, sparse reward, contacts, balance
Some model-free methods can solve these tasks but need up to 100,000 episodes
We introduce PlaNet
Recipe for scalable model-based reinforcement learning:
1. Efficient planning in latent space with large batch size
2. Reaches top performance using 200X fewer episodes
Latent Dynamics Model
Encode images, predict states, decode images, decode rewards
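To make the four components concrete, here is a minimal sketch of one possible implementation; the PyTorch modules, layer sizes, and dimensions below are illustrative assumptions, not the paper's exact architecture.

```python
# A minimal sketch of the four model components, assuming PyTorch.
from torch import nn

class LatentDynamicsModel(nn.Module):
    def __init__(self, state_dim=30, action_dim=6, embed_dim=256):
        super().__init__()
        # Encode images: observation -> embedding used to infer the state.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Flatten(), nn.LazyLinear(embed_dim))
        # Predict states: previous state and action -> next state.
        self.transition = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, state_dim))
        # Decode images: state -> reconstructed 64x64 RGB observation.
        self.image_decoder = nn.Linear(state_dim, 64 * 64 * 3)
        # Decode rewards: state -> scalar reward prediction.
        self.reward_decoder = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, 1))
```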
Recurrent State Space Model
Combines a deterministic transition path with a stochastic one
(Diagram: Recurrent Neural Network with purely deterministic states h1..h3, State Space Model with purely stochastic states s1..s3, Recurrent State Space Model with both)
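A minimal sketch of one RSSM transition step, assuming a GRU for the deterministic path and a diagonal Gaussian for the stochastic state; the dimensions are common choices, not necessarily the exact paper values.

```python
# One RSSM step: deterministic memory plus a sampled stochastic state.
import torch
from torch import nn

class RSSMCell(nn.Module):
    def __init__(self, stoch_dim=30, deter_dim=200, action_dim=6):
        super().__init__()
        self.gru = nn.GRUCell(stoch_dim + action_dim, deter_dim)
        self.prior = nn.Linear(deter_dim, 2 * stoch_dim)

    def forward(self, prev_stoch, prev_deter, action):
        # Deterministic path: remembers information reliably over many steps.
        deter = self.gru(torch.cat([prev_stoch, action], -1), prev_deter)
        # Stochastic path: captures multiple possible futures.
        mean, raw_std = self.prior(deter).chunk(2, -1)
        stoch = mean + nn.functional.softplus(raw_std) * torch.randn_like(mean)
        return stoch, deter
```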
Unguided Video Predictions by a Single Agent
5 context frames and 45 predicted frames
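A hedged sketch of what "unguided" means here: infer the state from a few context frames, then roll the model forward open-loop without looking at further images. The `encode`, `transition`, and `decode` callables are stand-ins for the learned networks.

```python
# Open-loop video prediction from a short context.
def open_loop_video(frames, actions, encode, transition, decode, context=5):
    state = None
    for t in range(context):                # use the first frames as context
        state = encode(frames[t], state)
    predictions = []
    for t in range(context, len(actions)):  # predict the rest without images
        state = transition(state, actions[t])
        predictions.append(decode(state))   # e.g. 45 predicted frames
    return predictions
```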
Recovers the True Dynamics
The true simulator state can be predicted from a copy of the model state
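One way to read this claim is as a probing experiment: fit a simple readout from frozen model states to simulator states. The sketch below uses stand-in arrays and least squares, which is an assumption about the probing setup rather than the paper's exact procedure.

```python
# Linear probe from frozen model states to true simulator states.
import numpy as np

model_states = np.random.randn(1000, 230)  # stand-in for collected latent states
sim_states = np.random.randn(1000, 24)     # stand-in for true simulator states
readout, *_ = np.linalg.lstsq(model_states, sim_states, rcond=None)
error = np.mean((model_states @ readout - sim_states) ** 2)
print("mean squared probe error:", error)
```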
Planning in Latent Space
Cross Entropy Planner
1. Initialize factorized Gaussian population distribution over action sequences
2. Sample 1000 candidate action sequences
3. Evaluate candidates in parallel using the model
4. Re-fit the population to the top 100 candidates
5. Repeat for 10 steps
(Diagram: candidate action sequences plotted over the planning horizon)
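A minimal sketch of the planner above; `evaluate` stands in for rolling out the learned model and summing predicted rewards, and the horizon and action dimension are illustrative assumptions.

```python
# Cross entropy method over action sequences, as in the steps above.
import numpy as np

def cem_plan(evaluate, horizon=12, action_dim=6,
             candidates=1000, top_k=100, iterations=10):
    # 1. Factorized Gaussian over action sequences.
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    for _ in range(iterations):                       # 5. repeat for 10 steps
        # 2. Sample 1000 candidate action sequences.
        samples = mean + std * np.random.randn(candidates, horizon, action_dim)
        # 3. Evaluate all candidates in parallel under the model.
        returns = evaluate(samples)                   # shape: (candidates,)
        # 4. Re-fit the population to the top 100 candidates.
        elite = samples[np.argsort(returns)[-top_k:]]
        mean, std = elite.mean(axis=0), elite.std(axis=0)
    return mean[0]  # execute only the first action, then replan

# Toy usage with a stand-in objective that prefers small actions:
first_action = cem_plan(lambda seqs: -np.square(seqs).sum(axis=(1, 2)))
```

Returning only the first action and replanning at every step is the usual model predictive control pattern, which matches the receding-horizon use of the planner in the talk.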
Comparison to Model-Free Agents
Training time: 1 day on a single GPU
Comparison of Model Designs
Comparison of Iterative Planning
Some Additional Tasks in Three Dimensions
Minitaur: 400 episodes
Quadruped: 2000 episodes
Conclusions
1. PlaNet solves control tasks from images by efficient planning in the compact latent space of a learned model
2. Pure planning with learned dynamics is feasible for control tasks with image observations, contacts, and sparse rewards
3. Planning with learned models can reach the performance of top model-free algorithms in 200 times fewer episodes and the same training time
Enabling More Model-Based RL Research
With Jimmy Ba, Mohammad Norouzi, Timothy Lillicrap
Explore dynamics without supervision
Distill the planner to save computation
Value function to extend planning horizon
Learning Latent Dynamics for Planning from Pixels
Website with code, videos, blog post, animated paper: danijar.com/planet
Thank you
Multi-Step Consistency in Latent Space
1. A perfect one-step model would give perfect multi-step predictions
2. Under limited capacity, one-step and multi-step solutions may not coincide
3. Encourage consistency between one-step and multi-step predictions in latent space
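As a rough illustration, the sketch below penalizes disagreement between states reached by chaining one-step predictions and the states inferred from observations. The paper's latent overshooting uses KL divergences between state distributions, so the squared error here is a simplified stand-in.

```python
# Simplified multi-step consistency loss over a sequence of latent states.
import torch

def multistep_consistency(transition, posteriors, actions, depth=3):
    # posteriors: (T, B, D) states inferred from observations.
    # actions: (T - 1, B, A); actions[t] leads from step t to step t + 1.
    T = posteriors.shape[0]
    loss = torch.zeros(())
    for d in range(1, depth + 1):
        pred = posteriors[:T - d]            # start from inferred states
        for k in range(d):                   # chain d one-step predictions
            pred = transition(pred, actions[k:k + T - d])
        # Match the states inferred d steps later (targets held fixed).
        loss = loss + (pred - posteriors[d:].detach()).pow(2).mean()
    return loss / depth
```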