Deep Learning for Robotics, Pieter Abbeel, UC Berkeley / OpenAI / Gradescope

  1. Deep Learning for Robotics. Pieter Abbeel, UC Berkeley / OpenAI / Gradescope

  2. Outline
     - Some deep learning successes
     - Deep reinforcement learning
     - Current directions

  3. Object Detection in Computer Vision
     - State-of-the-art object detection until 2012: Input Image → Hand-engineered features (SIFT, HOG, DAISY, …) → Support Vector Machine (SVM) → “cat”, “dog”, “car”, …
     - Deep Supervised Learning (Krizhevsky, Sutskever, Hinton 2012; also LeCun, Bengio, Ng, Darrell, …): Input Image → 8-layer neural network with 60 million parameters to learn → “cat”, “dog”, “car”, …
     - ~1.2 million training images from ImageNet [Deng, Dong, Socher, Li, Li, Fei-Fei, 2009]

  4. (Image-only slide)

  5. Performance (graph credit: Matt Zeiler, Clarifai)

  6. Performance (graph credit: Matt Zeiler, Clarifai)

  7. Performance, AlexNet (graph credit: Matt Zeiler, Clarifai)

  8. Performance, AlexNet (graph credit: Matt Zeiler, Clarifai)

  9. Performance, AlexNet (graph credit: Matt Zeiler, Clarifai)

  10. Speech Recognition (graph credit: Matt Zeiler, Clarifai)

  11. MS COCO Image Captioning Challenge [Karpathy & Fei-Fei, 2015; Donahue et al., 2015; Xu et al, 2015; many more]

  12. Visual QA Challenge [Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh]

  13. Unsupervised Learning
      - Variational Autoencoders [Kingma and Welling, 2014]
        - DRAW [Gregor et al, 2015]
        - …
      - Generative Adversarial Networks [Goodfellow et al, 2014]
        - DC-GAN [Radford, Metz, Chintala, 2016]
        - InfoGAN [Chen, Duan, Houthooft, Schulman, Sutskever, Abbeel, 2016]
        - …
      - Pixel RNN [van den Oord et al, 2016]
        - Pixel CNN [van den Oord et al, 2016]
        - …

  14. Image Generation – DC-GAN

  15. Training [Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, Xi Chen, 2016]

  16. Comparison with Real Images [Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, Xi Chen, 2016]

  17. InfoGAN [Chen, Duan, Houthooft, Schulman, Sutskever, Abbeel, 2016]

  18. Robotics
      - Current state-of-the-art robotics: Percepts → hand-engineered state estimation, hand-engineered control policy class, hand-tuned (or learned) 10'ish free parameters → Motor commands
      - Deep reinforcement learning: Percepts → many-layer neural network with many parameters to learn → Motor commands

  19. Deep Learning for Estimation
      - SE3 Nets [Byravan, Fox, 2016]
      - Deep Tracking [Ondruska, Posner, 2016]
      - Backprop KF [Haarnoja, Ajay, Levine, Abbeel, 2016]
      - Structured Variational Autoencoders [Johnson, Duvenaud, Wiltschko, Datta, Adams, 2016]

  20. Deep Estimation for Grasping/Control
      - DeepMPC [Lenz, Knepper, Saxena, RSS 2015]
      - Deep Learning for Detecting Robotic Grasps [Lenz, Lee, Saxena, RSS 2013]
      - Dexnet Grasp Transfer [Mahler, …, Goldberg, 2015]
      - Big Data for Grasp Planning [Kappler, Bohg, Schaal, 2015]

  21. Deep Reinforcement Learning (RL)
      - Policy $\pi_\theta(a \mid s)$: probability of taking action a in state s
      - Robot + Environment
      - Goal: $\max_\theta \; E\left[\sum_{t=0}^{H} R(s_t) \mid \pi_\theta\right]$
      - Additional challenges:
        - Stability
        - Credit assignment
        - Exploration
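
As a rough illustration of the goal on slide 21, the sketch below estimates $E[\sum_{t=0}^{H} R(s_t) \mid \pi_\theta]$ by averaging the summed reward over sampled rollouts. The two-state toy environment and the names `policy`, `step`, `reward`, and `theta` are hypothetical stand-ins chosen for this example; none of them come from the talk.

```python
import random

def policy(theta, state, rng):
    """Stochastic policy pi_theta(a | s): choose action 1 with probability theta[state]."""
    return 1 if rng.random() < theta[state] else 0

def step(state, action):
    """Hypothetical toy dynamics: the chosen action becomes the next state."""
    return action

def reward(state):
    """Hypothetical reward R(s): only state 1 is rewarding."""
    return 1.0 if state == 1 else 0.0

def rollout_return(theta, horizon, rng):
    """Sum of R(s_t) for t = 0..H along one sampled trajectory starting in state 0."""
    state, total = 0, 0.0
    for _ in range(horizon + 1):
        total += reward(state)
        state = step(state, policy(theta, state, rng))
    return total

def estimate_objective(theta, horizon, num_rollouts, seed=0):
    """Monte Carlo estimate of the expected summed reward under pi_theta."""
    rng = random.Random(seed)
    returns = [rollout_return(theta, horizon, rng) for _ in range(num_rollouts)]
    return sum(returns) / num_rollouts
```

Policy optimization then amounts to searching over `theta` for the value that makes this estimate as large as possible.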

  22. From Pixels to Actions? Pong, Enduro, Beamrider, Q*bert

  23. Deep Q-Network (DQN): From Pixels to Joystick Commands [Source: Mnih et al., Nature 2015 (DeepMind)]
      - 32 8x8 filters with stride 4 + ReLU
      - 64 4x4 filters with stride 2 + ReLU
      - 64 3x3 filters with stride 1 + ReLU
      - fully connected 512 units + ReLU
      - fully connected output units, one per action
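
To make the layer list on slide 23 concrete, the sketch below walks the three convolutions and two fully connected layers and tracks the feature-map shape and a parameter count. The slide does not state the input size; the 84x84 input with 4 stacked frames is an assumption taken from the cited Nature paper, the convolutions are assumed to use no padding, and `num_actions=6` is a hypothetical value.

```python
def conv_output_size(size, kernel, stride):
    """Spatial output size of a convolution with no padding."""
    return (size - kernel) // stride + 1

def dqn_layer_shapes(input_size=84, input_channels=4, num_actions=6):
    """Return (conv output shapes, flattened size, total parameter count).

    input_size, input_channels and num_actions are assumptions, not slide content.
    """
    # (number of filters, kernel size, stride) as listed on the slide
    conv_layers = [(32, 8, 4), (64, 4, 2), (64, 3, 1)]
    shapes = []
    params = 0
    size, channels = input_size, input_channels
    for filters, kernel, stride in conv_layers:
        params += filters * (kernel * kernel * channels + 1)  # weights plus biases
        size = conv_output_size(size, kernel, stride)
        channels = filters
        shapes.append((size, size, channels))
    flat = size * size * channels
    params += 512 * (flat + 1)           # fully connected layer with 512 units
    params += num_actions * (512 + 1)    # one output unit per action
    return shapes, flat, params

if __name__ == "__main__":
    shapes, flat, params = dqn_layer_shapes()
    print(shapes, flat, params)
```

Under these assumptions most of the parameters sit in the 512-unit fully connected layer rather than in the convolutions.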

  24. [Source: Mnih et al., Nature 2015 (DeepMind)]

  25. How About Continuous Control, e.g., Locomotion?
      - Robot models in physics simulator (MuJoCo, from Emo Todorov)
      - Input: joint angles and velocities
      - Output: joint torques
      - Neural network architecture: Joint angles and kinematics → Input layer → Fully connected layer (30 units) → Mean parameters and standard deviations → Sampling → Control
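
The architecture on slide 25 can be sketched as a small Gaussian policy: a fully connected layer with 30 units produces the mean torque, and the control is sampled around that mean. This is only a sketch under assumptions the slide does not state: the `tanh` activation, the weight initialization, and standard deviations stored as state-independent `log_std` parameters are all choices made here for illustration.

```python
import math
import random

def init_policy(obs_dim, act_dim, hidden=30, seed=0):
    """Randomly initialised parameters for a one-hidden-layer Gaussian policy."""
    rng = random.Random(seed)
    def matrix(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {
        "W1": matrix(hidden, obs_dim), "b1": [0.0] * hidden,
        "W2": matrix(act_dim, hidden), "b2": [0.0] * act_dim,
        "log_std": [0.0] * act_dim,  # assumed state-independent standard deviations
    }

def affine(W, b, x):
    """Compute W x + b for a list-of-rows matrix W."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def policy_mean(params, obs):
    """Map joint angles and velocities to the mean joint torques."""
    hidden = [math.tanh(h) for h in affine(params["W1"], params["b1"], obs)]
    return affine(params["W2"], params["b2"], hidden)

def sample_action(params, obs, rng):
    """Sample joint torques from a Gaussian centred on the network output."""
    mean = policy_mean(params, obs)
    return [rng.gauss(m, math.exp(s)) for m, s in zip(mean, params["log_std"])]
```

Sampling rather than outputting the mean directly is what makes the policy stochastic, which the policy-gradient methods later in the deck rely on.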

  26. Challenges with Q-Learning
      - How to score every possible action?
      - How to ensure monotonic progress?

  27. Policy Optimization
      - $\max_\theta \; E\left[\sum_{t=0}^{H} R(s_t) \mid \pi_\theta\right]$
      - Often simpler to represent good policies than good value functions
      - True objective of expected cost is optimized (vs. a surrogate like Bellman error)
      - Existing work: (natural) policy gradients
        - Challenges: good, large step directions

  28. Trust Region Policy Optimization [Schulman, Levine, Moritz, Jordan, Abbeel, 2015]
      - $\max_\theta \; E\left[\sum_{t=0}^{H} R(s_t) \mid \pi_\theta\right]$
      - $\max_{\delta\theta} \; \hat{g}^\top \delta\theta \quad \text{s.t.} \quad KL\left(P(\tau;\theta) \,\|\, P(\tau;\theta+\delta\theta)\right) \le \varepsilon$
      - Trust Region
      - Surrogate Loss
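
One common way to read the constrained problem on slide 28 is to approximate the KL term by a quadratic, $\tfrac{1}{2}\,\delta\theta^\top F\,\delta\theta$, which gives the closed-form step $\delta\theta = \sqrt{2\varepsilon / (\hat{g}^\top F^{-1}\hat{g})}\; F^{-1}\hat{g}$. The sketch below implements only that simplified step. The quadratic approximation and the diagonal matrix `fisher_diag` are assumptions made here for brevity; the published method uses further machinery (such as conjugate gradient and a line search) that is not shown.

```python
import math

def trust_region_step(grad, fisher_diag, epsilon):
    """Maximise grad . step subject to 0.5 * step^T F step <= epsilon.

    F is assumed diagonal (fisher_diag holds its positive diagonal entries).
    """
    natural_grad = [g / f for g, f in zip(grad, fisher_diag)]   # F^{-1} g
    g_finv_g = sum(g * n for g, n in zip(grad, natural_grad))   # g^T F^{-1} g
    if g_finv_g <= 0.0:
        return [0.0] * len(grad)  # zero gradient: no step to take
    scale = math.sqrt(2.0 * epsilon / g_finv_g)
    return [scale * n for n in natural_grad]

def quadratic_kl(step, fisher_diag):
    """Quadratic approximation of the KL divergence for a given step."""
    return 0.5 * sum(f * d * d for f, d in zip(fisher_diag, step))

if __name__ == "__main__":
    step = trust_region_step([1.0, 2.0], [1.0, 4.0], epsilon=0.01)
    print(step, quadratic_kl(step, [1.0, 4.0]))
```

The step always lands on the boundary of the trust region, so `epsilon` directly controls how far the policy distribution is allowed to move per update.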

  29. Generalized Advantage Estimation (GAE) [Schulman, Moritz, Levine, Jordan, Abbeel, ICLR 2016]
      - Objective: $\max_\theta \; E\left[\sum_{t=0}^{H} R(s_t) \mid \pi_\theta\right]$
      - Gradient: $E\left[\sum_{t=0}^{H} \nabla_\theta \log \pi_\theta(a_t \mid s_t) \left(\sum_{k=t}^{H} R(s_k) - V(s_t)\right)\right]$, where the term in parentheses is a single sample estimate of the advantage
      - Generalized Advantage Estimation
        - Exponential interpolation between actor-critic and Monte Carlo estimates
        - Trust region approach to (high-dimensional) value function estimation
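
The exponential interpolation named on slide 29 can be sketched as a backward recursion over one-step TD residuals. The discount `gamma` and the interpolation weight `lam` follow the cited GAE paper and do not appear on the slide, and the convention that `values` carries one extra bootstrap entry is an assumption of this sketch.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimates for one trajectory.

    rewards: r_0 .. r_{T-1}
    values:  V(s_0) .. V(s_T); the last entry is the bootstrap value
             (use 0.0 if the trajectory ended in a terminal state).
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD residual: the actor-critic style estimate
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of future residuals
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

With `gamma=1` and `lam=1` the recursion reduces to the single sample estimate on the slide, the summed future reward minus $V(s_t)$; with `lam=0` it reduces to the one-step TD residual.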

  30. Learning Locomotion [Schulman, Moritz, Levine, Jordan, Abbeel, ICLR 2016]

  31. Atari Games
      - Deep Q-Network (DQN) [Mnih et al, 2013/2015]
      - Dagger with Monte Carlo Tree Search [Xiao-Xiao et al, 2014]
      - Trust Region Policy Optimization (TRPO) [Schulman, Levine, Moritz, Jordan, Abbeel, 2015]
      - A3C [Mnih et al, 2016]
      - Pong, Enduro, Beamrider, Q*bert

  32. Deep RL Benchmarking
      - Tasks
      - Algorithms
      - Experimental setup

  33. Deep RL Benchmarking -- Tasks
      1. Basic tasks
      2. Locomotion
      3. Hierarchical
      4. Partially observable: sensing, delayed action, sysID
      5. Driving…

  34. Deep RL Benchmarking -- Algorithms
      - Reinforce
      - Truncated Natural Policy Gradient
      - Reward-Weighted Regression (RWR)
      - Relative Entropy Policy Search (REPS)
      - Trust-Region Policy Optimization (TRPO)
      - Cross-Entropy Method (CEM)
      - Covariance Matrix Adaptation Evolution Strategy (CMA-ES)
      - Deep Deterministic Policy Gradients (DDPG)
      - …

  35. Benchmarking [Duan et al, ICML 2016]

  36. rllab [Duan et al]

  37. OpenAI Gym

  38. How About Real Robotic Visuo-Motor Skills?

  39. Guided Policy Search
      - General-purpose neural network controller
      - Complex dynamics, complex policy: policy search (RL) is HARD
      - Complex dynamics, complex policy: supervised learning is EASY
      - Complex dynamics, complex policy: trajectory optimization is EASY
      - Trajectory optimization → Supervised learning

  40. Instrumented Training: training time, test time

  41. Deep Spatial Neural Net Architecture: $\pi_\theta$ (92,000 parameters) [Levine*, Finn*, Darrell, Abbeel, JMLR 2016]

  42. Experimental Tasks [Levine*, Finn*, Darrell, Abbeel, JMLR 2016]

  43. Learning [Levine*, Finn*, Darrell, Abbeel, JMLR 2016]

  44. Learned Skills [Levine*, Finn*, Darrell, Abbeel, JMLR 2016]

  45. Experiments: Learned Neural Network Policy [Kahn, Zhang, Levine, Abbeel 2016]

  46. Frontiers: DATA | TRANSFER | EXPLORATION | REWARD | HIERARCHY
      - Supersizing Self-Supervision: Learning to Grasp from 50K Tries and 700 Robot Hours [Pinto, Gupta, ICRA 2016]
      - Learning Hand-Eye Coordination with Deep Learning and Large Scale Data Collection [Pastor, Krizhevsky, Quillen, Levine, 2016]
      - Learning to Poke by Poking: Experiential Learning of Intuitive Physics [Agrawal, Nair, Abbeel, Malik, Levine, 2016]
