Dream to Control: Learning Behaviors by Latent Imagination
Danijar Hafner, Timothy Lillicrap, Jimmy Ba, Mohammad Norouzi
Google Brain, DeepMind
@danijarh
danijar.com/dreamer
We introduce Dreamer
1. Scalable reinforcement learning from pixels using a world model
2. Learn actor and value in imagination for long-sighted behaviors
3. Efficiently update actor by backprop through imagined sequences
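As a rough illustration of updating an actor by backpropagating through an imagined sequence, the sketch below uses a toy scalar system in place of a learned world model. Everything in it is an assumption made for illustration: the linear dynamics constants `A` and `B`, the one-parameter policy `gain`, the reward `-s^2`, and the function names. It is not Dreamer's implementation. It only shows how the gradient of an imagined return can be computed by walking the sequence backward.

```python
from typing import List

# Hypothetical toy dynamics standing in for a learned latent model:
# next_state = A * state + B * action
A, B = 0.9, 0.5


def imagine(gain: float, start: float, horizon: int) -> List[float]:
    """Roll the toy model forward under the policy action = gain * state."""
    states = [start]
    for _ in range(horizon):
        action = gain * states[-1]
        states.append(A * states[-1] + B * action)
    return states


def imagined_return(gain: float, start: float, horizon: int) -> float:
    """Sum of toy rewards -s^2 over the imagined states (start excluded)."""
    return -sum(s * s for s in imagine(gain, start, horizon)[1:])


def return_gradient(gain: float, start: float, horizon: int) -> float:
    """Gradient of the imagined return w.r.t. the policy gain,
    computed by a manual backward pass through the rollout."""
    states = imagine(gain, start, horizon)
    closed_loop = A + B * gain  # derivative of s_{t+1} w.r.t. s_t
    grad_gain = 0.0
    grad_state = 0.0  # derivative of the return w.r.t. s_{t+1}
    for t in reversed(range(horizon)):
        # Direct reward term plus the gradient flowing back from later steps.
        grad_state = -2.0 * states[t + 1] + closed_loop * grad_state
        # s_{t+1} depends on the gain through B * gain * s_t.
        grad_gain += grad_state * B * states[t]
    return grad_gain


# One gradient-ascent step on the policy parameter.
gain = 0.0
gain += 0.1 * return_gradient(gain, start=1.0, horizon=5)
```

In practice an automatic differentiation library would do the backward pass; it is written out by hand here so the flow of gradients through the imagined steps is visible.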
Dreamer Agent Overview
World Model with Latent States
Figure: from the observations o_1, o_2, o_3 and actions a_1, a_2, the world model encodes images, computes states, predicts rewards r̂_1, r̂_2, r̂_3, and predicts images ô_1, ô_2, ô_3.
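The data flow on this slide (encode images, compute states, predict rewards, predict images) can be sketched with toy stand-ins. All four component functions below are hypothetical placeholders with made-up arithmetic, not the learned networks. The point is only the indexing: three observations and two actions yield three states, three predicted rewards, and three predicted images.

```python
from typing import List, Sequence, Tuple


def encode(image: Sequence[float]) -> float:
    """Toy encoder (assumption): summarize an image by its mean pixel value."""
    return sum(image) / len(image)


def compute_state(prev_state: float, prev_action: float, embedding: float) -> float:
    """Toy recurrent update (assumption) mixing the previous state,
    the previous action, and the current image embedding."""
    return 0.5 * prev_state + 0.1 * prev_action + 0.4 * embedding


def predict_reward(state: float) -> float:
    """Toy reward head (assumption)."""
    return -abs(state - 1.0)


def predict_image(state: float, size: int) -> List[float]:
    """Toy image head (assumption): a constant image at the state's value."""
    return [state] * size


def observe(
    images: Sequence[Sequence[float]], actions: Sequence[float]
) -> Tuple[List[float], List[float], List[List[float]]]:
    """Run the toy model over a sequence with one fewer action than images."""
    assert len(actions) == len(images) - 1
    states, rewards, reconstructions = [], [], []
    state = 0.0  # initial state before any observation
    for t, image in enumerate(images):
        prev_action = actions[t - 1] if t > 0 else 0.0  # no action before o_1
        state = compute_state(state, prev_action, encode(image))
        states.append(state)
        rewards.append(predict_reward(state))
        reconstructions.append(predict_image(state, len(image)))
    return states, rewards, reconstructions


states, rewards, reconstructions = observe(
    images=[[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]], actions=[1.0, -1.0]
)
```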
Long-Term Video Prediction
Learning Behaviors by Latent Imagination
Figure: starting from the observation o_1, the agent encodes the image, imagines ahead in latent space under actions a_1, a_2, and predicts rewards r̂_2, r̂_3 and values v̂_2, v̂_3 for the imagined states.
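The slide shows predicted rewards and values along an imagined trajectory but does not say how they are combined into a learning target. One common choice is the λ-return, which blends one-step bootstrapped targets with longer multi-step returns. The sketch below assumes that choice. The function name, argument layout, and default `discount` and `lam` values are illustrative assumptions.

```python
from typing import List, Sequence


def lambda_returns(
    rewards: Sequence[float],
    next_values: Sequence[float],
    discount: float = 0.99,
    lam: float = 0.95,
) -> List[float]:
    """Compute λ-return targets for an imagined trajectory.

    rewards[t] is the predicted reward at imagined step t and
    next_values[t] is the predicted value of the state that follows it.
    lam=0 gives one-step targets; lam=1 gives the discounted reward sum
    bootstrapped with the final value.
    """
    assert len(rewards) == len(next_values)
    targets = [0.0] * len(rewards)
    next_target = next_values[-1]  # bootstrap beyond the imagination horizon
    for t in reversed(range(len(rewards))):
        blended = (1.0 - lam) * next_values[t] + lam * next_target
        targets[t] = rewards[t] + discount * blended
        next_target = targets[t]
    return targets


targets = lambda_returns(
    rewards=[1.0, 1.0, 1.0], next_values=[0.5, 0.5, 2.0], discount=0.5, lam=0.95
)
```

A value model would then be regressed toward these targets, and an actor would be trained to increase them.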
Behaviors Learned by Dreamer
Large-Scale Evaluation for Control from Pixels
Model-based (28 hours of interaction): Dreamer (823), PlaNet (332)
Model-free (23 days of interaction): D4PG (786), A3C (243)
Introducing Dreamer: Scalable Reinforcement Learning Using World Models
Dream to Control: Learning Behaviors by Latent Imagination
Blog post, code, videos, paper: danijar.com/dreamer