Large-Scale Self-supervised Robot Learning with GPU-enabled Video-Prediction Models
Frederik Ebert, Chelsea Finn, Alex Lee, Sergey Levine. NVIDIA GTC 2018.


  1. Large-Scale Self-supervised Robot Learning with GPU-enabled Video-Prediction Models. Frederik Ebert, Chelsea Finn, Alex Lee, Sergey Levine. NVIDIA GTC 2018.

  2. Typical Bar in 20?? / 1969: Stanford Arm / 2015: DARPA Robotics Challenge

  3. Humans have excellent mental models of physical objects 3

  4. How can robots acquire general models and skills using large amounts of autonomously collected data? 4

  5. Related work on self-supervised learning: Levine et al. 2016; Pinto & Gupta 2015; Gandhi et al. 2017. Predict raw sensory inputs instead of binary events.

  6. Related work on video prediction and visual model-predictive control: Oh et al. 2015; Mathieu et al. 2016; Finn & Levine 2017; Byravan et al. 2017.

  7. Visual Model-Predictive Control [Finn et al. 2017]: the user specifies a designated pixel and a goal point; a planning module uses the video-prediction model and a cost function to select actions, which are applied to the robot. [pipeline diagram]
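Below is a minimal closed-loop sketch of this visual-MPC pipeline, just to make the control flow concrete. All names (robot, planner, get_camera_image, apply_action) are illustrative placeholders rather than the authors' actual API; the planner is assumed to roll out the video-prediction model internally and score candidate action sequences with the pixel cost described later in the talk.

    # Hypothetical closed-loop visual-MPC sketch; all names are illustrative placeholders.
    def run_visual_mpc(robot, planner, designated_pixel, goal_pixel, n_steps=50):
        for _ in range(n_steps):
            image = robot.get_camera_image()
            # Plan a short action sequence with the learned video-prediction model,
            # execute only its first action, then replan from the new observation.
            action_sequence = planner.plan(image, designated_pixel, goal_pixel)
            robot.apply_action(action_sequence[0])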

  8. Random Data Collection: collected 45,000 trajectories, recording camera images and actions.

  9. Action-Conditioned Video Prediction: a recurrent NN is unrolled over time; each step takes the current state and an action (Action 0, Action 1, Action 2, ...) and generates the next frame, conditioning on real frames at the start and on its own generated frames afterwards. [architecture diagram]
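As a rough sketch of how such a model is unrolled, the loop below feeds real context frames for the first few steps and the model's own generated frames afterwards. The predictor object and its step/init_state interface are assumptions for illustration, not the actual implementation.

    # Sketch of an action-conditioned rollout; the predictor interface is hypothetical.
    def rollout(predictor, context_frames, actions):
        """context_frames: real observed frames; actions: one action per future step."""
        state = predictor.init_state()
        frame = None
        predictions = []
        for t, action in enumerate(actions):
            # Use real frames while available, then feed back the generated frames.
            inp = context_frames[t] if t < len(context_frames) else frame
            frame, state = predictor.step(inp, action, state)
            predictions.append(frame)
        return predictions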

  10. Skip Connection Neural Advection (SNA): predictions from the DNA model (Finn et al.). [prediction examples]

  11. Action-Conditioned Video Prediction with Temporal Skip Connections: the same recurrent architecture, with an additional skip connection from the first (real) frame into each prediction step. [architecture diagram]

  12. Skip Connection Neural Advection (SNA): DNA (Finn et al.) vs. SNA (Ours). [prediction comparison]

  13. Skip Connection Neural Advection (SNA) architecture: an encoder-decoder network built from strided 3x3/5x5 convolutions (64x64x16 down to 8x8x64), convolutional LSTM layers, and strided deconvolutions back to full resolution, with skip connections between encoder and decoder. The decoder outputs 10 CDNA kernels (9x9) that transform the image of the current time step, plus 11 compositing masks produced by a 1x1 convolution and a channel softmax; the masked, transformed images are composited into the generated image of the next time step. The image from the first time step enters through a temporal skip connection. [architecture diagram]
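To make the final prediction step concrete, here is a minimal numpy sketch of the CDNA transform-and-composite operation, assuming the network has already produced the per-image 9x9 kernels and the mask logits. Array names and the exact set of composited images (the sketch uses the 10 transformed images plus the first frame from the temporal skip connection) are simplifications of the full model.

    # Minimal sketch of SNA's transform-and-composite step (illustrative, not the
    # authors' code). Assumes a single-channel image for simplicity.
    import numpy as np
    from scipy.signal import convolve2d

    def sna_composite(curr_img, first_img, cdna_kernels, mask_logits):
        """curr_img, first_img: (H, W) frames; cdna_kernels: (10, 9, 9) normalized
        kernels; mask_logits: (11, H, W) logits for the compositing masks."""
        # Transform the current frame with each CDNA kernel.
        transformed = [convolve2d(curr_img, k, mode="same") for k in cdna_kernels]
        # SNA also composites in the first frame via the temporal skip connection.
        candidates = np.stack(transformed + [first_img])          # (11, H, W)
        # Compositing masks: softmax over the candidate-image dimension.
        m = np.exp(mask_logits - mask_logits.max(axis=0, keepdims=True))
        masks = m / m.sum(axis=0, keepdims=True)
        # Weighted sum of masked candidates gives the generated next frame.
        return (masks * candidates).sum(axis=0)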

  14. Prediction of Pixel Positions (Test Time): the same recurrent model is unrolled on a candidate action sequence (Action 0, Action 1, Action 2, ...), generating future frames and, with them, the predicted positions of the designated pixel. [architecture diagram]

  15. Effects of using temporal skip connections: SNA (Ours) vs. DNA (Finn et al.) on tracking the designated pixel. [prediction comparison]

  16. Planning with Visual-MPC: designated pixel and goal pixel. [task illustration]

  17. Planning with the expected distance-to-goal cost: the model predicts a probability distribution over the designated pixel's position, and the cost is the expected distance of that distribution to the goal point.
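A small sketch of this cost, under the assumption that the model outputs a normalized probability map over image locations for the designated pixel at each predicted time step (array and function names are illustrative):

    # Expected distance of the predicted designated-pixel distribution to the goal.
    import numpy as np

    def expected_distance_cost(pixel_distribution, goal_pixel):
        """pixel_distribution: (H, W) probability map of the designated pixel's
        predicted position; goal_pixel: (row, col) of the goal point."""
        h, w = pixel_distribution.shape
        rows, cols = np.mgrid[0:h, 0:w]
        dist = np.sqrt((rows - goal_pixel[0]) ** 2 + (cols - goal_pixel[1]) ** 2)
        return float((pixel_distribution * dist).sum())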

  18. Action Selection using the Cross-Entropy Method (CEM): candidate action sequences are sampled and iteratively refined (Iteration 1, 2, 3) toward bringing the designated pixel to the goal pixel.
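The sketch below shows a generic cross-entropy-method planner of this kind: sample action sequences from a Gaussian, score them (here with a hypothetical cost_of_sequence function that would run the video-prediction model and the expected-distance cost), refit the Gaussian to the best samples, and repeat for a few iterations. Sample counts and other details are illustrative, not the paper's exact settings.

    # Generic CEM planner sketch; cost_of_sequence and all hyperparameters are
    # illustrative assumptions.
    import numpy as np

    def cem_plan(cost_of_sequence, horizon, action_dim,
                 n_samples=200, n_elites=20, n_iters=3, seed=0):
        rng = np.random.default_rng(seed)
        mean = np.zeros((horizon, action_dim))
        std = np.ones((horizon, action_dim))
        for _ in range(n_iters):
            # Sample candidate action sequences from the current Gaussian.
            samples = rng.normal(mean, std, size=(n_samples, horizon, action_dim))
            costs = np.array([cost_of_sequence(s) for s in samples])
            # Refit the Gaussian to the lowest-cost (elite) sequences.
            elites = samples[np.argsort(costs)[:n_elites]]
            mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6
        return mean  # in MPC, only the first action is executed before replanning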

  19. Results 19

  20. Generalization to objects not seen during training 20

  21. Collision Avoidance Task involving Occlusion: designated pixel, goal pixel, and a static pixel (an object that should remain in place).

  22. Finn et al.

  23. Ours 23

  24. Multi-Goal Pushing Benchmark 24

  25.-30. [No text content on these slides]

  31. Takeaways • Temporal skip connections significantly improve the ability to deal with occlusions. • Video-prediction models can be reused across many tasks. • Self-supervised learning on large-scale data enables generalizable skills.

  32. Q&A: Chelsea Finn, Alex X. Lee, Sergey Levine. Code and Data: https://sites.google.com/view/sna-visual-mpc
