
Learning Latent Dynamics for Planning from Pixels: presentation slides by Danijar Hafner et al.



  1. Learning Latent Dynamics for Planning from Pixels. Danijar Hafner, Timothy Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, James Davidson. @danijarh, danijar.com/planet

  2. Planning with Learned Models. Watter et al., 2015; Banijamali et al., 2017; Zhang et al., 2017. Agrawal et al., 2016; Finn & Levine, 2016; Ebert et al., 2018.

  3. Visual Control Tasks. Task challenges shown: partially observable, many joints, sparse reward, contacts, balance. Some model-free methods can solve these tasks but need up to 100,000 episodes.

  4. We introduce PlaNet. (1) Recipe for scalable model-based reinforcement learning. (2) Efficient planning in latent space with large batch size. (3) Reaches top performance using 200X fewer episodes.

  5. Latent Dynamics Model: encode images.

  6. Latent Dynamics Model: encode images, predict states.

  7. Latent Dynamics Model: encode images, predict states, decode images.

  8. Latent Dynamics Model: encode images, predict states, decode images, decode rewards.
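
The four components named on slides 5 to 8 can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the function names, the sizes, and the random linear maps, which only stand in for whatever learned networks the talk uses. The point is the data flow, not the architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only to keep the example small.
IMAGE_SHAPE, STATE_DIM, ACTION_DIM = (8, 8, 3), 6, 2
PIXELS = int(np.prod(IMAGE_SHAPE))

# Random weights stand in for trained parameters (assumption).
W_enc = rng.normal(scale=0.05, size=(STATE_DIM, PIXELS))
W_dyn = rng.normal(scale=0.3, size=(STATE_DIM, STATE_DIM + ACTION_DIM))
W_img = rng.normal(scale=0.05, size=(PIXELS, STATE_DIM))
w_rew = rng.normal(scale=0.3, size=STATE_DIM)

def encode(image):
    """Encode images: map pixels to a compact latent state."""
    return W_enc @ image.reshape(-1)

def predict(state, action):
    """Predict states: advance the latent state given an action."""
    return np.tanh(W_dyn @ np.concatenate([state, action]))

def decode_image(state):
    """Decode images: map a latent state back to pixel space."""
    return (W_img @ state).reshape(IMAGE_SHAPE)

def decode_reward(state):
    """Decode rewards: read a scalar reward off the latent state."""
    return float(w_rew @ state)

# Encode one observation, then roll forward purely in latent space.
image = rng.random(IMAGE_SHAPE)
state = encode(image)
rewards = []
for _ in range(5):
    state = predict(state, np.zeros(ACTION_DIM))
    rewards.append(decode_reward(state))
reconstruction = decode_image(state)
```

Note that the rollout loop only calls `predict` and `decode_reward`. In a sketch like this, images are decoded only when someone wants to look at a prediction, which is what makes planning in the latent space cheap.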

  9. Recurrent State Space Model. [Figure: three model designs side by side: Recurrent Neural Network, State Space Model, and Recurrent State Space Model; legend: deterministic and stochastic states; node labels h_1 to h_3, z_1 to z_3, s_1 to s_3.]
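
One way to read the combination of deterministic and stochastic states on slide 9 is the following minimal sketch. It assumes that the deterministic state `h` is updated from the previous `h`, the previous stochastic state, and the action, and that the stochastic state is then sampled from a Gaussian conditioned on the new `h`. The slide does not spell this out, so the update rule, the plain tanh cell, the sizes, and the choice to call the stochastic state `s` are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for the deterministic state, stochastic state, and action.
H, S, A = 8, 4, 2

# Random weights stand in for trained parameters (assumption).
W_h = rng.normal(scale=0.1, size=(H, H + S + A))
W_mean = rng.normal(scale=0.1, size=(S, H))
W_std = rng.normal(scale=0.1, size=(S, H))

def rssm_step(h, s, a):
    """One transition with a deterministic path (h) and a stochastic path (s)."""
    # Deterministic part: a simple tanh recurrent cell (a simplification).
    h_next = np.tanh(W_h @ np.concatenate([h, s, a]))
    # Stochastic part: diagonal Gaussian whose parameters depend on h_next.
    mean = W_mean @ h_next
    std = np.log1p(np.exp(W_std @ h_next)) + 1e-3  # softplus keeps std positive
    s_next = mean + std * rng.standard_normal(S)
    return h_next, s_next

# Unroll three steps, matching the three time steps drawn on the slide.
h, s = np.zeros(H), np.zeros(S)
trajectory = []
for t in range(3):
    h, s = rssm_step(h, s, np.zeros(A))
    trajectory.append((h, s))
```

Dropping the `s` input and the sampling step leaves a purely deterministic recurrent network, and dropping `h` leaves a purely stochastic state space model, which are the two other designs named on the slide.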

  10. Unguided Video Predictions by Single Agent: 5 frames context and 45 frames predicted.

  11. Recovers the True Dynamics: can predict simulator state from copy of model state.

  12. Planning in Latent Space

  13. Planning in Latent Space

  14. Planning in Latent Space

  15. Planning in Latent Space

  16. Planning in Latent Space

  17. Planning in Latent Space

  18. Cross Entropy Planner. (1) Initialize factorized Gaussian population distribution over action sequences. [Figure axis: horizon.]

  19. Cross Entropy Planner. (1) Initialize factorized Gaussian population distribution over action sequences. (2) Sample 1000 candidate action sequences. [Figure axes: candidates, horizon.]

  20. Cross Entropy Planner. (1) Initialize factorized Gaussian population distribution over action sequences. (2) Sample 1000 candidate action sequences. (3) Evaluate candidates in parallel using the model. [Figure axes: candidates, horizon.]

  21. Cross Entropy Planner. (1) Initialize factorized Gaussian population distribution over action sequences. (2) Sample 1000 candidate action sequences. (3) Evaluate candidates in parallel using the model. (4) Re-fit the population to the top 100 candidates. [Figure axes: candidates, horizon.]

  22. Cross Entropy Planner. (1) Initialize factorized Gaussian population distribution over action sequences. (2) Sample 1000 candidate action sequences. (3) Evaluate candidates in parallel using the model. (4) Re-fit the population to the top 100 candidates. (5) Repeat for 10 steps. [Figure axes: candidates, horizon.]
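
The five planner steps on slides 18 to 22 can be sketched in NumPy as follows. The candidate count (1000), elite count (100), and iteration count (10) come from the slides. The function name, the zero-mean unit-variance initialization, the horizon, and the toy `toy_return` objective are assumptions; in the talk's setting, the evaluation step would instead sum rewards predicted by the learned model.

```python
import numpy as np

def cross_entropy_plan(evaluate, horizon, action_dim,
                       candidates=1000, top_k=100, iterations=10, seed=0):
    """Search for an action sequence that maximizes evaluate(actions)."""
    rng = np.random.default_rng(seed)
    # Step 1: factorized Gaussian over action sequences
    # (one mean and std per time step and action dimension).
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    for _ in range(iterations):  # Step 5: repeat
        # Step 2: sample candidate action sequences.
        noise = rng.standard_normal((candidates, horizon, action_dim))
        actions = mean + std * noise
        # Step 3: score all candidates in one batched call.
        returns = evaluate(actions)  # shape: (candidates,)
        # Step 4: re-fit the Gaussian to the best-scoring candidates.
        elite = actions[np.argsort(returns)[-top_k:]]
        mean = elite.mean(axis=0)
        std = elite.std(axis=0)
    return mean

# Toy stand-in for the model (assumption): the return is highest when
# every action equals 0.5.
HORIZON, ACTION_DIM = 5, 2
target = np.full((HORIZON, ACTION_DIM), 0.5)

def toy_return(actions):
    return -((actions - target) ** 2).sum(axis=(1, 2))

plan = cross_entropy_plan(toy_return, HORIZON, ACTION_DIM)
```

Because `evaluate` receives the whole batch of candidates at once, the scoring step maps naturally onto parallel hardware, which matches the "evaluate candidates in parallel" step on the slides.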

  23. Comparison to Model-Free Agents. Training time: 1 day on a single GPU.

  24. Comparison of Model Designs

  25. Comparison of Iterative Planning

  26. Some Additional Tasks (in three dimensions). Minitaur: 400 episodes. Quadruped: 2000 episodes.

  27. Conclusions. (1) PlaNet solves control tasks from images by efficient planning in the compact latent space of a learned model. (2) Pure planning with learned dynamics is feasible for control tasks with image observations, contacts, and sparse rewards. (3) Planning with learned models can reach the performance of top model-free algorithms in 200 times fewer episodes and the same training time.

  28. Enabling More Model-Based RL Research. With Jimmy Ba, Mohammad Norouzi, Timothy Lillicrap. Directions: explore dynamics without supervision; distill the planner to save computation; value function to extend planning horizon.

  29. Learning Latent Dynamics for Planning from Pixels. Website with code, videos, blog post, animated paper: danijar.com/planet. Thank you.

  30. Multi-Step Consistency in Latent Space. (1) Perfect one-step model would give perfect multi-step predictions. (2) Under limited capacity, one-step and multi-step solutions may not coincide. (3) Encourage consistency between one-step and multi-step in latent space.
