Dream to Control Learning Behaviors by Latent Imagination Danijar - PowerPoint PPT Presentation

Dream to Control: Learning Behaviors by Latent Imagination. Danijar Hafner, Timothy Lillicrap, Jimmy Ba, Mohammad Norouzi. Google Brain, DeepMind. @danijarh, danijar.com/dreamer. We introduce Dreamer: scalable reinforcement learning from pixels


  1. Dream to Control: Learning Behaviors by Latent Imagination. Danijar Hafner, Timothy Lillicrap, Jimmy Ba, Mohammad Norouzi. Google Brain, DeepMind. @danijarh, danijar.com/dreamer

  2. We introduce Dreamer: (1) scalable reinforcement learning from pixels using a world model; (2) learn actor and value in imagination for long-sighted behaviors; (3) efficiently update actor by backprop through imagined sequences.

  5. Dreamer Agent Overview

  8. World Model with Latent States. Inputs: observations o_1, o_2, o_3 and actions a_1, a_2.

  9. World Model with Latent States. Inputs: observations o_1, o_2, o_3 and actions a_1, a_2. Steps: encode images.

  10. World Model with Latent States. Inputs: observations o_1, o_2, o_3 and actions a_1, a_2. Steps: encode images, compute states.

  11. World Model with Latent States. Inputs: observations o_1, o_2, o_3 and actions a_1, a_2. Steps: encode images, compute states, predict rewards (r̂_1, r̂_2, r̂_3).

  12. World Model with Latent States. Inputs: observations o_1, o_2, o_3 and actions a_1, a_2. Steps: encode images, compute states, predict rewards (r̂_1, r̂_2, r̂_3), predict images (ô_1, ô_2, ô_3).
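
The four steps named on the world-model slides (encode images, compute states, predict rewards, predict images) can be sketched as a small loop. This is only an illustrative sketch: the slides do not describe the networks, so random linear maps with `tanh` stand in for learned components, and every name and size here (`encode`, `compute_state`, `OBS_DIM`, and so on) is hypothetical rather than taken from the Dreamer code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; the slides do not give any dimensions.
OBS_DIM, ACT_DIM, LATENT_DIM = 16, 2, 4

# Random linear maps stand in for learned networks (assumption).
W_enc = rng.normal(scale=0.1, size=(LATENT_DIM, OBS_DIM))
W_state = rng.normal(scale=0.1, size=(LATENT_DIM, 2 * LATENT_DIM + ACT_DIM))
w_reward = rng.normal(scale=0.1, size=LATENT_DIM)
W_dec = rng.normal(scale=0.1, size=(OBS_DIM, LATENT_DIM))


def encode(obs):
    """Encode images: map a flattened observation to an embedding."""
    return np.tanh(W_enc @ obs)


def compute_state(prev_state, prev_action, embedding):
    """Compute states: combine previous state, previous action, embedding."""
    inputs = np.concatenate([prev_state, prev_action, embedding])
    return np.tanh(W_state @ inputs)


def predict_reward(state):
    """Predict rewards: scalar reward estimate from the latent state."""
    return float(w_reward @ state)


def predict_image(state):
    """Predict images: reconstruct the observation from the latent state."""
    return W_dec @ state


def run_world_model(observations, actions):
    """Process o_1..o_T with actions a_1..a_{T-1}, as laid out on the slides."""
    state = np.zeros(LATENT_DIM)
    prev_action = np.zeros(ACT_DIM)  # no action precedes o_1
    states, rewards, images = [], [], []
    for t, obs in enumerate(observations):
        state = compute_state(state, prev_action, encode(obs))
        states.append(state)
        rewards.append(predict_reward(state))
        images.append(predict_image(state))
        if t < len(actions):
            prev_action = actions[t]
    return states, rewards, images


# Three observations and two actions, matching o_1..o_3 and a_1, a_2.
observations = [rng.normal(size=OBS_DIM) for _ in range(3)]
actions = [rng.normal(size=ACT_DIM) for _ in range(2)]
states, rewards, images = run_world_model(observations, actions)
```

Each latent state depends on the previous state, the previous action, and the current image embedding, which is why three observations need only two actions.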

  13. Long-Term Video Prediction

  15. Learning Behaviors by Latent Imagination

  18. Learning Behaviors by Latent Imagination. Input: observation o_1. Steps: encode images.

  19. Learning Behaviors by Latent Imagination. Input: observation o_1. Steps: encode images, imagine ahead (actions a_1, a_2).

  20. Learning Behaviors by Latent Imagination. Input: observation o_1. Steps: encode images, imagine ahead (actions a_1, a_2), predict rewards (r̂_2, r̂_3).

  21. Learning Behaviors by Latent Imagination. Input: observation o_1. Steps: encode images, imagine ahead (actions a_1, a_2), predict rewards (r̂_2, r̂_3), predict values (v̂_2, v̂_3).
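
To make the imagination steps concrete, the sketch below rolls a latent state forward from a single starting state, collects predicted rewards, and bootstraps with a predicted value at the end of the horizon. The slides only say that rewards and values are predicted, so the way they are combined here (a discounted reward sum plus the final value) and the `discount` parameter are assumptions, as are all function names and the toy scalar dynamics.

```python
def imagine(start_state, policy, transition, reward_fn, value_fn, horizon):
    """Imagine ahead in latent space without touching the real environment.

    All callables are hypothetical stand-ins for learned components.
    Returns the predicted rewards and the value of the last imagined state.
    """
    state = start_state
    rewards = []
    for _ in range(horizon):
        action = policy(state)              # a_1, a_2, ...
        state = transition(state, action)   # next latent state
        rewards.append(reward_fn(state))    # r̂_2, r̂_3, ...
    return rewards, value_fn(state)


def bootstrapped_return(rewards, final_value, discount=0.99):
    """Discounted sum of imagined rewards, bootstrapped with the final value.

    This is one simple way to combine rewards and values (an assumption);
    the value term accounts for rewards beyond the imagination horizon.
    """
    ret = final_value
    for reward in reversed(rewards):
        ret = reward + discount * ret
    return ret


# Toy scalar example: the policy moves the state halfway toward zero
# and the reward is the negative distance from zero.
rewards, final_value = imagine(
    start_state=4.0,
    policy=lambda s: -0.5 * s,
    transition=lambda s, a: s + a,
    reward_fn=lambda s: -abs(s),
    value_fn=lambda s: 0.0,
    horizon=2,
)
estimate = bootstrapped_return(rewards, final_value, discount=0.5)
```

With these toy functions the imagined rewards are `[-2.0, -1.0]` and the estimate is `-2.5`. Replacing `value_fn` with a learned value model is what lets a short imagined rollout reflect long-horizon outcomes.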

  23. Behaviors Learned by Dreamer

  24. Large-Scale Evaluation for Control from Pixels. Model-based: 28 hours of interaction. Model-free: 23 days of interaction.

  25. Large-Scale Evaluation for Control from Pixels. Model-based: 28 hours of interaction. Model-free: 23 days of interaction; A3C (243).

  26. Large-Scale Evaluation for Control from Pixels. Model-based: 28 hours of interaction; PlaNet (332). Model-free: 23 days of interaction; A3C (243).

  27. Large-Scale Evaluation for Control from Pixels. Model-based: 28 hours of interaction; Dreamer (823), PlaNet (332). Model-free: 23 days of interaction; A3C (243).

  28. Large-Scale Evaluation for Control from Pixels. Model-based: 28 hours of interaction; Dreamer (823), PlaNet (332). Model-free: 23 days of interaction; D4PG (786), A3C (243).
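
As a quick check on the scale of the comparison, the snippet below converts the two interaction budgets from the evaluation slides to a common unit and computes their ratio. The numbers are the ones shown on the slides; the variable names are illustrative only.

```python
# Interaction budgets as stated on the evaluation slides.
model_based_hours = 28        # Dreamer, PlaNet
model_free_hours = 23 * 24    # D4PG, A3C: 23 days expressed in hours

# How many times more interaction the model-free agents used.
interaction_ratio = model_free_hours / model_based_hours

# Scores shown in parentheses on the slides.
scores = {"Dreamer": 823, "D4PG": 786, "PlaNet": 332, "A3C": 243}
best_agent = max(scores, key=scores.get)

print(f"model-free budget: {model_free_hours} hours")
print(f"ratio: {interaction_ratio:.1f}x")
print(f"highest score: {best_agent} ({scores[best_agent]})")
```

The model-free budget works out to 552 hours, roughly 20 times the model-based budget, while Dreamer still has the highest score of the four agents listed.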

  29. Introducing Dreamer: Scalable Reinforcement Learning Using World Models

  30. Dream to Control: Learning Behaviors by Latent Imagination. Blog post, code, videos, paper: danijar.com/dreamer
