
Pieter Abbeel, Berkeley Artificial Intelligence Research laboratory - PowerPoint PPT Presentation



  1. Pieter Abbeel, Berkeley Artificial Intelligence Research laboratory (BAIR.berkeley.edu)

  2. PR1 [Wyrobek, Berger, Van der Loos, Salisbury, ICRA 2008]

  3. Personal Robotics Hardware? PR2 (Willow Garage, $400,000, 2009); Baxter (Rethink Robotics, $30,000, 2013); Fetch (Fetch Robotics, ~$80,000, 2015)

  4. More Generally: tele-operated robotic surgery; driving; flight

  5. Challenge Task: Robotic Laundry

  6. Challenges and Current Directions: Variability (apprenticeship learning, reinforcement learning); Uncertainty (belief space planning); Long-term reasoning (hierarchical planning)

  7. Challenges and Current Directions: Variability (apprenticeship learning [IJRR 2010, ICRA 2010 (2x), ICRA 2012, ISRR 2013, IROS 2014, ICRA 2015 (4x), IROS 2015 (2x)], reinforcement learning); Uncertainty (belief space planning [RSS 2010, WAFR 2010, IJRR 2011, ICRA 2014, WAFR 2014, ICRA 2015]); Long-term reasoning (hierarchical planning [PlanRob 2013, ICRA 2014, AAAI 2015, IROS 2015])

  8. Object Detection in Computer Vision. State-of-the-art object detection until 2012: input image -> hand-engineered features (SIFT, HOG, DAISY, ...) -> support vector machine (SVM) -> labels ("cat", "dog", "car", ...). Deep supervised learning (Krizhevsky, Sutskever, Hinton 2012; also LeCun, Bengio, Ng, Darrell, ...): input image -> 8-layer neural network with 60 million parameters to learn -> labels ("cat", "dog", "car", ...); ~1.2 million training images from ImageNet [Deng, Dong, Socher, Li, Li, Fei-Fei, 2009]
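The 8-layer network on this slide is AlexNet. As a rough, hypothetical PyTorch sketch of that layout (5 convolutional + 3 fully connected learned layers, which lands near the 60 million parameters quoted above; the layer sizes follow the published architecture, but this is an illustration, not the authors' code):

```python
import torch
import torch.nn as nn

class AlexNetSketch(nn.Module):
    """Illustrative 8-layer classifier: 5 conv layers + 3 fully connected."""
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(),
            nn.MaxPool2d(3, stride=2),
            nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(3, stride=2),
            nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
            nn.Linear(4096, 4096), nn.ReLU(),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):  # x: (N, 3, 227, 227)
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))

net = AlexNetSketch()
print(sum(p.numel() for p in net.parameters()))  # roughly 62 million
```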

  9.-13. Performance: ImageNet classification results, with AlexNet highlighted [graph credit: Matt Zeiler, Clarifai]

  14. Speech Recognition [graph credit: Matt Zeiler, Clarifai]

  15. Object Detection in Computer Vision (repeats slide 8)

  16. Robotics. Current state-of-the-art robotics: percepts -> hand-engineered state estimation -> hand-tuned (or learned) policy class with ~10 free parameters -> hand-engineered control -> motor commands. Deep reinforcement learning: percepts -> many-layer neural network with many parameters to learn -> motor commands

  17. Reinforcement Learning (RL). Policy $\pi_\theta(a \mid s)$: probability of taking action $a$ in state $s$, acting on the robot + environment. Objective: $\max_\theta \mathbb{E}\big[\sum_{t=0}^{H} R(s_t) \mid \pi_\theta\big]$. Application domains: robotics; marketing / advertising; dialogue; optimizing operations / logistics; queue management; ...

  18. Reinforcement Learning (RL). Policy $\pi_\theta(a \mid s)$: probability of taking action $a$ in state $s$, acting on the robot + environment. Goal: $\max_\theta \mathbb{E}\big[\sum_{t=0}^{H} R(s_t) \mid \pi_\theta\big]$. Additional challenges: stability; credit assignment; exploration
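A minimal sketch of how this objective is typically optimized from samples: the likelihood-ratio (REINFORCE) gradient estimator, shown here in PyTorch. This is the generic estimator underlying policy-gradient methods, not the specific algorithm on these slides; the function name, discounting default, and normalization choice are my own.

```python
import torch

def reinforce_loss(log_probs, rewards, gamma=0.99):
    """Surrogate loss whose gradient estimates
    grad_theta E[sum_t R(s_t) | pi_theta] for one sampled episode.
    log_probs: list of log pi_theta(a_t|s_t) tensors (gradient-tracked).
    rewards:   list of scalar rewards R(s_t)."""
    returns, g = [], 0.0
    for r in reversed(rewards):        # discounted return-to-go
        g = r + gamma * g
        returns.append(g)
    returns = torch.tensor(returns[::-1])
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)  # variance reduction
    return -(torch.stack(log_probs) * returns).sum()  # minimize negative objective
```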

  19. How About Continuous Control, e.g., Locomotion? Robot models in a physics simulator (MuJoCo, from Emo Todorov). Input: joint angles and velocities. Output: joint torques. Neural network architecture: input layer (joint angles and kinematics) -> fully connected layer (30 units) -> mean parameters and standard deviations -> sampling layer -> control
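A hypothetical PyTorch rendering of the slide's policy network: one 30-unit fully connected layer producing action means, plus learned standard deviations feeding a sampling layer. The observation/action dimensions are placeholders, not the paper's.

```python
import torch
import torch.nn as nn

class GaussianMLPPolicy(nn.Module):
    """Joint angles/velocities in; a distribution over joint torques out."""
    def __init__(self, obs_dim, act_dim, hidden=30):
        super().__init__()
        self.mean_net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),  # 30-unit fully connected layer
            nn.Linear(hidden, act_dim),             # mean parameters
        )
        self.log_std = nn.Parameter(torch.zeros(act_dim))  # learned standard deviations

    def forward(self, obs):
        return torch.distributions.Normal(self.mean_net(obs), self.log_std.exp())

policy = GaussianMLPPolicy(obs_dim=20, act_dim=7)  # placeholder dimensions
torques = policy(torch.randn(20)).sample()         # the "sampling layer"
```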

  20. Learning Locomotion [Schulman, Moritz, Levine, Jordan, Abbeel, 2015]

  21. Technical Ideas. $\max_\theta \mathbb{E}\big[\sum_{t=0}^{H} R(s_t) \mid \pi_\theta\big]$. Trust Region Policy Optimization [Schulman, Levine, Moritz, Jordan, Abbeel, 2015]. Policy optimization: often simpler to represent good policies than good value functions; the true objective of expected cost is optimized (vs. a surrogate like Bellman error). Trust region: sampled evaluation of objective and gradient; the gradient is only locally a good approximation; a change in policy changes state-action visitation frequencies
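For reference, the constrained subproblem that TRPO solves at each iteration, as written in the cited paper (with $A$ the advantage under the previous policy and $\delta$ a step-size bound):

```latex
\max_\theta \;\; \mathbb{E}_{s,a \sim \pi_{\theta_{\text{old}}}}
  \left[ \frac{\pi_\theta(a \mid s)}{\pi_{\theta_{\text{old}}}(a \mid s)}
         \, A^{\pi_{\theta_{\text{old}}}}(s,a) \right]
\quad \text{subject to} \quad
\mathbb{E}_{s}\left[ D_{\mathrm{KL}}\!\left( \pi_{\theta_{\text{old}}}(\cdot \mid s)
  \,\Vert\, \pi_\theta(\cdot \mid s) \right) \right] \le \delta
```

The trust region (the KL constraint) addresses the last two bullets: the sampled gradient is trusted only locally, and the constraint keeps the new policy's state-action visitation close to the data-collecting policy's.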

  22. Technical Ideas. $\max_\theta \mathbb{E}\big[\sum_{t=0}^{H} R(s_t) \mid \pi_\theta\big]$. Generalized Advantage Estimation [Schulman, Moritz, Levine, Jordan, Abbeel, 2015]: fuse value function estimates with policy evaluations from roll-outs; trust region approach to (high-dimensional) value function estimation
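The estimator from the GAE paper combines TD residuals $\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)$ as $\hat{A}_t = \sum_{l \ge 0} (\gamma\lambda)^l \delta_{t+l}$, trading bias against variance via $\lambda$. A minimal NumPy sketch (the function name and array layout are my own):

```python
import numpy as np

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation for one episode.
    rewards: shape (T,); values: shape (T+1,), including a bootstrap value."""
    deltas = rewards + gamma * values[1:] - values[:-1]  # TD residuals
    advantages = np.zeros_like(deltas)
    acc = 0.0
    for t in reversed(range(len(deltas))):               # backward recursion
        acc = deltas[t] + gamma * lam * acc
        advantages[t] = acc
    return advantages
```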

  23. In Contrast: DARPA Robotics Challenge

  24. Atari Games. Deep Q-Network (DQN) [Mnih et al., 2013/2015]; DAgger with Monte Carlo Tree Search [Xiaoxiao Guo et al., 2014]; Trust Region Policy Optimization [Schulman, Levine, Moritz, Jordan, Abbeel, 2015]. Games: Pong, Enduro, Beamrider, Q*bert
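For context, DQN regresses the Q-network toward a bootstrapped target computed with a periodically frozen copy $\theta^{-}$ of the weights; this is the standard loss from the Mnih et al. papers:

```latex
L(\theta) = \mathbb{E}_{(s,a,r,s')}
  \left[ \Big( r + \gamma \max_{a'} Q(s', a'; \theta^{-}) - Q(s,a;\theta) \Big)^{2} \right]
```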

  25. How About Real Robotic Visuo-Motor Skills?

  26. Guided Policy Search: a general-purpose neural network controller. Complex dynamics, complex policy, via policy search (RL): HARD. Complex dynamics, complex policy, via supervised learning: EASY. Complex dynamics, complex policy, via trajectory optimization: EASY. Combination: trajectory optimization + supervised learning (see the sketch below)
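Schematically (my notation, not the paper's exact formulation): trajectory optimization produces near-optimal state-action pairs $(x_t, u_t)$ along optimized trajectories $\tau^{*}$, and the neural network policy is then fit to them by plain regression, which is the "supervised learning: EASY" leg of the slide:

```latex
\min_\theta \; \sum_{(x_t, u_t) \in \tau^{*}} \big\| \pi_\theta(x_t) - u_t \big\|^{2}
```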

  27. Instrumented Training: training time vs. test time

  28. Deep Spatial Neural Net Architecture: $\pi_\theta$ (92,000 parameters) [Levine*, Finn*, Darrell, Abbeel, 2015; TR at rll.berkeley.edu/deeplearningrobotics]

  29. Experimental Tasks [Levine*, Finn*, Darrell, Abbeel, JMLR 2016]

  30. Learning [Levine*, Finn*, Darrell, Abbeel, JMLR 2016]

  31. Learned Skills [Levine*, Finn*, Darrell, Abbeel, JMLR 2016]

  32. Visuomotor Learning Directly in Visual Space. 1. Set target end-effector pose. 2. Train exploratory non-vision controller. 3. Learn visual features with collected images. 4. Provide image that defines goal features. 5. Train final controller in visual feature space. [Finn, Tan, Duan, Darrell, Levine, Abbeel, 2015]

  33. Visuomotor Learning Directly in Visual Space [Finn, Tan, Duan, Darrell, Levine, Abbeel, 2015]

  34. Autonomous Flight: urban delivery; law enforcement; agriculture. Key challenge: enable autonomous aerial vehicles (AAVs) to navigate complex, dynamic environments [Khan, Zhang, Levine, Abbeel, 2016]

  35. Hardware: 3DR Solo; NVIDIA Jetson TX1; ZED stereo depth camera

  36. Experiments: Learned Neural Network Policy [Khan, Zhang, Levine, Abbeel, 2016]

  37. Experiments: Learned Neural Network Policy [Khan, Zhang, Levine, Abbeel, 2016]

  38. Experiments: Comparisons (canyon, forest) [Khan, Zhang, Levine, Abbeel, 2016]

  39. Frontiers: shared and transfer learning; transfer simulation -> real world; memory; estimation; hierarchical reasoning; multi-time-scale learning [Mordatch, Mishra, Eppner, Abbeel, ICRA 2016]

  40. Acknowledgements. Colleagues: Trevor Darrell, Ken Goldberg, Michael Jordan, Stuart Russell. Post-docs: Sergey Levine, Igor Mordatch, Sachin Patil, Jia Pan, Aviv Tamar, Dave Held. Students: John Schulman, Chelsea Finn, Sandy Huang, Bradly Stadie, Alex Lee, Dylan Hadfield-Menell, Jonathan Ho, Ziang Xie, Rocky Duan, Justin Fu, Abhishek Gupta, Gregory Kahn, Nikita Kitaev, Henry Lu, George Mulcaire, Nolan Wagener, Ankush Gupta, Sibi Venkatesan, Cameron Lee

  41. Thank you
