
Playing hard exploration games by watching YouTube



  1. Playing hard exploration games by watching YouTube. Yusuf Aytar, Tobias Pfaff, David Budden, Tom Le Paine, Ziyu Wang, Nando de Freitas

  2. Learning by watching YouTube: people learn many tasks by watching online videos (construction, knitting, playing games), despite huge gaps in visual appearance, sensing modalities, body differences, etc. [Slide footer: Playing hard exploration games by watching YouTube, Yusuf Aytar & Tobias Pfaff]

  4. Challenges: Domain Gap; No Actions; No Rewards. Approach: Self-Supervised Domain Alignment; Imitation (RL), Learn to Play with Rewards Learned from Expert Sequence

  6. Temporal distance classification (TDC) [Figure labels: demonstration sequence (video), visual embedding network, visual embedding, temporal classifier]

  11. Cross-modal distance classification (CMC)

  12. Model successfully aligns different videos

  13. What does the embedding focus on? [Figure panels: visual only; cross-modal]

  14. Imitation through RL [Figure labels: demonstration, embedding space]

  15. Imitation through RL [Figure labels: observation, embedding space]

  16. RL makes imitation more robust

  17. Results (averaged score of best policy)

                      Montezuma   Pitfall!   Private Eye
      Pure RL            ~2,500         ~0           ~50
      Avg. Human          4,743      6,464        69,571
      DQfD (2018)        29,384      3,997       100,747
      Ours               58,175     74,323        98,763

      Video panels: Montezuma (level 3), Pitfall (max score), Private Eye (max score)

  22. Visit our poster! Playing hard exploration games by watching YouTube, #142
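
The TDC slides pair a visual embedding network with a temporal classifier that predicts how far apart two frames of the same video are. A minimal sketch of how training pairs for such a classifier could be sampled follows. The bucket boundaries in `BUCKET_UPPER_BOUNDS` and the helper names `distance_to_class` and `sample_tdc_pair` are assumptions made for illustration, not details taken from the slides.

```python
import bisect
import random

# Hypothetical distance buckets, in frames. Each entry is the inclusive upper
# bound of one class: {0}, {1}, {2}, {3..4}, {5..20}, {21..200}.
BUCKET_UPPER_BOUNDS = [0, 1, 2, 4, 20, 200]


def distance_to_class(distance):
    """Map a frame distance to the index of the bucket that contains it."""
    if distance < 0 or distance > BUCKET_UPPER_BOUNDS[-1]:
        raise ValueError("distance outside the supported range")
    return bisect.bisect_left(BUCKET_UPPER_BOUNDS, distance)


def sample_tdc_pair(num_frames, rng):
    """Return (i, j, label): two frame indices and their distance class.

    The class is drawn first so that all buckets are equally likely,
    then a concrete distance is drawn uniformly from that bucket.
    """
    label = rng.randrange(len(BUCKET_UPPER_BOUNDS))
    low = 0 if label == 0 else BUCKET_UPPER_BOUNDS[label - 1] + 1
    high = min(BUCKET_UPPER_BOUNDS[label], num_frames - 1)
    if low > high:
        raise ValueError("video too short for the sampled bucket")
    distance = rng.randint(low, high)          # inclusive on both ends
    i = rng.randrange(num_frames - distance)   # keeps i + distance in range
    return i, i + distance, label
```

In a training loop, the two frames at `i` and `j` would be passed through the embedding network, and the classifier would be trained to predict `label` from the pair of embeddings.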
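
The "Imitation through RL" slides place the demonstration and the agent's observation in a shared embedding space. One way such a setup could yield a reward, sketched under assumptions: keep every `stride`-th demonstration embedding as a checkpoint and pay a fixed bonus when the observation embedding is similar enough to the next unvisited checkpoint. The class `CheckpointReward` and all parameter values (`stride`, `threshold`, `lookahead`, `bonus`) are hypothetical and chosen only to make the example concrete.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


class CheckpointReward:
    """Pays a bonus each time the agent reaches the next demonstration checkpoint."""

    def __init__(self, demo_embeddings, stride=16, threshold=0.5,
                 lookahead=1, bonus=0.5):
        # Keep every `stride`-th embedding of the demonstration as a checkpoint.
        self.checkpoints = demo_embeddings[::stride]
        self.threshold = threshold
        self.lookahead = lookahead
        self.bonus = bonus
        self.next_index = 0  # index of the first checkpoint not yet reached

    def __call__(self, obs_embedding):
        # Look at the next checkpoint and up to `lookahead` checkpoints after it.
        last = min(self.next_index + self.lookahead, len(self.checkpoints) - 1)
        for k in range(self.next_index, last + 1):
            if cosine_similarity(obs_embedding, self.checkpoints[k]) > self.threshold:
                self.next_index = k + 1  # checkpoints up to k are now consumed
                return self.bonus
        return 0.0
```

Because checkpoints are consumed in order, revisiting an earlier state earns nothing, which pushes the agent to follow the demonstration forward rather than loop in place.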
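
To put the Results slide in perspective, the reported scores can be expressed as multiples of the average human score. The snippet only re-uses numbers from the slide; the dictionary layout and the function name `ratio_to_human` are illustrative choices, and the approximate Pure RL entries are left out.

```python
# Scores copied from the Results slide (averaged score of best policy).
SCORES = {
    "Montezuma": {"Avg. Human": 4743, "DQfD": 29384, "Ours": 58175},
    "Pitfall!": {"Avg. Human": 6464, "DQfD": 3997, "Ours": 74323},
    "Private Eye": {"Avg. Human": 69571, "DQfD": 100747, "Ours": 98763},
}


def ratio_to_human(game, method="Ours"):
    """Score of `method` on `game`, divided by the average human score."""
    return SCORES[game][method] / SCORES[game]["Avg. Human"]


if __name__ == "__main__":
    for game in SCORES:
        print(f"{game}: Ours = {ratio_to_human(game):.2f}x human, "
              f"DQfD = {ratio_to_human(game, 'DQfD'):.2f}x human")
```

On these numbers the presented method scores roughly 12 times the average human on Montezuma and about 11.5 times on Pitfall!, while on Private Eye it is about 1.4 times the human score and slightly below DQfD.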
