Pieter Abbeel, Berkeley Artificial Intelligence Research Laboratory (bair.berkeley.edu)
PR1 [Wyrobek, Berger, Van der Loos, Salisbury, ICRA 2008]
Personal Robotics Hardware
- PR2 (Willow Garage, 2009): $400,000
- Baxter (Rethink Robotics, 2013): $30,000
- Fetch (Fetch Robotics, 2015): ~$80,000
More Generally
- Tele-op robotic surgery
- Driving
- Flight
Challenge Task: Robotic Laundry
Challenges and Current Directions
- Variability
  - Apprenticeship learning
  - Reinforcement learning
- Uncertainty
  - Belief space planning
- Long-term reasoning
  - Hierarchical planning
Challenges and Current Directions
- Variability
  - Apprenticeship learning [IJRR 2010, ICRA 2010 (2x), ICRA 2012, ISRR 2013, IROS 2014, ICRA 2015 (4x), IROS 2015 (2x)]
  - Reinforcement learning
- Uncertainty
  - Belief space planning [RSS 2010, WAFR 2010, IJRR 2011, ICRA 2014, WAFR 2014, ICRA 2015]
- Long-term reasoning
  - Hierarchical planning [PlanRob 2013, ICRA 2014, AAAI 2015, IROS 2015]
Object Detection in Computer Vision
- State-of-the-art object detection until 2012: hand-engineered features (SIFT, HOG, DAISY, …) extracted from the input image, fed into a Support Vector Machine (SVM) that outputs labels ("cat", "dog", "car", …).
- Deep supervised learning (Krizhevsky, Sutskever, Hinton 2012; also LeCun, Bengio, Ng, Darrell, …): an 8-layer neural network with 60 million parameters to learn, mapping the input image directly to labels ("cat", "dog", "car", …).
- ~1.2 million training images from ImageNet [Deng, Dong, Socher, Li, Li, Fei-Fei, 2009]
Performance (AlexNet). Graph credit: Matt Zeiler, Clarifai.
Speech Recognition. Graph credit: Matt Zeiler, Clarifai.
Robotics
- Current state-of-the-art robotics: percepts → hand-engineered state estimation → hand-engineered control policy class with ~10 hand-tuned (or learned) free parameters → motor commands.
- Deep reinforcement learning: percepts → many-layer neural network with many parameters to learn → motor commands.
Reinforcement Learning (RL)
π_θ(a | s): probability of taking action a in state s
Setting: robot + environment. Application domains: robotics, marketing/advertising, dialogue, operations/logistics, queue management, …
Goal: max_θ E[ Σ_{t=0}^{H} R(s_t) | π_θ ]
Reinforcement Learning (RL)
π_θ(a | s): probability of taking action a in state s
Setting: robot + environment.
Goal: max_θ E[ Σ_{t=0}^{H} R(s_t) | π_θ ]
Additional challenges: stability, credit assignment, exploration.
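The objective above is an expectation over trajectories generated by the policy, so in practice it is estimated from sampled roll-outs. A minimal Monte Carlo sketch (the toy episode and its parameterization are illustrative assumptions, not from the talk):

```python
import numpy as np

def estimate_return(sample_episode, policy_params, num_episodes=1000, seed=0):
    """Monte Carlo estimate of E[ sum_{t=0}^H R(s_t) | pi_theta ]:
    average the summed reward over many sampled episodes."""
    rng = np.random.default_rng(seed)
    returns = [sum(sample_episode(policy_params, rng)) for _ in range(num_episodes)]
    return float(np.mean(returns))

# Hypothetical toy episode: reward 1 with probability theta at each of H = 5 steps,
# so the true expected return is 5 * theta.
def toy_episode(theta, rng, H=5):
    return [1.0 if rng.random() < theta else 0.0 for _ in range(H)]
```

For example, `estimate_return(toy_episode, 0.5)` should be close to 2.5; policy optimization then searches over θ to maximize this estimate.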
How About Continuous Control, e.g., Locomotion?
Robot models in physics simulator (MuJoCo, from Emo Todorov)
Input: joint angles and velocities. Output: joint torques.
Neural network architecture: input layer (joint angles and kinematics) → fully connected layer (30 units) → output layer (mean parameters and standard deviations) → sampling layer (control)
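The architecture above, a small fully connected network outputting mean torques plus learned standard deviations from which controls are sampled, can be sketched roughly as follows (layer shapes, activation, and initialization are illustrative assumptions):

```python
import numpy as np

class GaussianMLPPolicy:
    """Small Gaussian policy: observation -> hidden layer (30 units, tanh)
    -> mean torques; a learned log-std gives per-dimension exploration noise."""
    def __init__(self, obs_dim, act_dim, hidden=30, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (obs_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.1, (hidden, act_dim))
        self.b2 = np.zeros(act_dim)
        self.log_std = np.zeros(act_dim)  # state-independent noise scale

    def act(self, obs, rng):
        h = np.tanh(obs @ self.W1 + self.b1)       # hidden layer
        mean = h @ self.W2 + self.b2               # mean torque output
        return mean + np.exp(self.log_std) * rng.normal(size=mean.shape)
```

The sampling layer is what makes the policy stochastic, which both drives exploration and makes the policy-gradient objective differentiable in expectation.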
Learning Locomotion [Schulman, Moritz, Levine, Jordan, Abbeel, 2015]
Technical Ideas
max_θ E[ Σ_{t=0}^{H} R(s_t) | π_θ ]
Trust Region Policy Optimization [Schulman, Levine, Moritz, Jordan, Abbeel, 2015]
- Policy optimization:
  - Often simpler to represent good policies than good value functions
  - True objective of expected cost is optimized (vs. a surrogate like Bellman error)
- Trust region:
  - Sampled evaluation of objective and gradient
  - Gradient only locally a good approximation
  - Change in policy changes state-action visitation frequencies
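The trust-region idea can be illustrated concretely: maximize an importance-sampled surrogate of the objective while keeping the new policy's KL divergence from the old one below a threshold. A simplified sketch for discrete action distributions (not the paper's conjugate-gradient implementation; function names are illustrative):

```python
import numpy as np

def surrogate_loss(new_probs, old_probs, actions, advantages):
    """Importance-sampled surrogate: E[ (pi_new(a|s) / pi_old(a|s)) * A ]."""
    idx = np.arange(len(actions))
    ratios = new_probs[idx, actions] / old_probs[idx, actions]
    return float(np.mean(ratios * advantages))

def mean_kl(old_probs, new_probs):
    """Average KL(pi_old || pi_new) over the sampled states."""
    return float(np.mean(np.sum(old_probs * np.log(old_probs / new_probs), axis=1)))

def trust_region_ok(old_probs, new_probs, max_kl=0.01):
    """Accept a candidate update only if it stays inside the trust region."""
    return mean_kl(old_probs, new_probs) <= max_kl
```

The KL constraint is what handles the last bullet above: because a policy change shifts state-action visitation frequencies, the sampled surrogate is only trustworthy near the old policy, so each update is kept small in distribution space.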
Technical Ideas
max_θ E[ Σ_{t=0}^{H} R(s_t) | π_θ ]
Generalized Advantage Estimation [Schulman, Moritz, Levine, Jordan, Abbeel, 2015]
- Fuse value function estimates with policy evaluations from roll-outs
- Trust region approach to (high-dimensional) value function estimation
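The fusion of value-function estimates with roll-out returns is an exponentially weighted sum of TD residuals; a minimal sketch of the advantage computation:

```python
import numpy as np

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation:
    A_t = sum_l (gamma * lam)^l * delta_{t+l},
    where delta_t = r_t + gamma * V(s_{t+1}) - V(s_t).
    `values` has one extra entry: the value estimate after the final step."""
    T = len(rewards)
    deltas = rewards + gamma * values[1:] - values[:-1]
    advantages = np.zeros(T)
    running = 0.0
    for t in reversed(range(T)):          # backward recursion over the roll-out
        running = deltas[t] + gamma * lam * running
        advantages[t] = running
    return advantages
```

The parameter λ interpolates between pure value-function bootstrapping (λ = 0, low variance, biased) and pure Monte Carlo roll-out returns (λ = 1, unbiased, high variance).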
In Contrast: DARPA Robotics Challenge
Atari Games
- Deep Q-Network (DQN) [Mnih et al, 2013/2015]
- DAgger with Monte Carlo Tree Search [Xiao-Xiao et al, 2014]
- Trust Region Policy Optimization [Schulman, Levine, Moritz, Jordan, Abbeel, 2015]
Games: Pong, Enduro, Beamrider, Q*bert
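DQN, the first method listed above, regresses Q(s, a) toward the bootstrapped Bellman target r + γ max_a' Q_target(s', a'). A minimal sketch of the target computation for a batch of transitions (the full agent also uses experience replay and a periodically frozen target network, omitted here):

```python
import numpy as np

def dqn_targets(rewards, next_q_values, dones, gamma=0.99):
    """Bellman targets for a batch of transitions:
    y = r + gamma * max_a' Q_target(s', a'),
    with the bootstrap term zeroed at terminal transitions."""
    bootstrap = gamma * next_q_values.max(axis=1) * (1.0 - dones)
    return rewards + bootstrap
```

The Q-network is then trained by supervised regression of its predictions Q(s, a) onto these targets y.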
How About Real Robotic Visuo-Motor Skills?
Guided Policy Search: general-purpose neural network controller
- Complex dynamics, complex policy, policy search (RL): HARD
- Complex dynamics, complex policy, supervised learning: EASY
- Complex dynamics, complex policy, trajectory optimization: EASY
Approach: trajectory optimization → supervised learning
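The decomposition above suggests the training loop: trajectory optimization produces good controls along sampled trajectories, and the neural network policy is then fit to those controls by supervised regression. A rough sketch, using a linear least-squares fit as a stand-in for the network training (names and the linear policy class are illustrative assumptions):

```python
import numpy as np

def distill_policy(states, expert_controls):
    """Fit a linear policy u = K^T s by least squares to the controls produced
    by trajectory optimization -- a stand-in for the supervised-learning step
    that trains the neural network controller in guided policy search."""
    K, *_ = np.linalg.lstsq(states, expert_controls, rcond=None)
    return K

def policy_action(K, state):
    """Evaluate the distilled policy on a new state."""
    return state @ K
```

In the actual method this supervision/optimization loop is iterated, so the trajectory optimizer and the policy are kept in agreement rather than trained once in sequence.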
Instrumented Training: training time vs. test time
Deep Spatial Neural Net Architecture π_θ (92,000 parameters) [Levine*, Finn*, Darrell, Abbeel, 2015; TR at rll.berkeley.edu/deeplearningrobotics]
Experimental Tasks [Levine*, Finn*, Darrell, Abbeel, JMLR 2016]
Learning [Levine*, Finn*, Darrell, Abbeel, JMLR 2016]
Learned Skills [Levine*, Finn*, Darrell, Abbeel, JMLR 2016]
Visuomotor Learning Directly in Visual Space
1. Set target end-effector pose
2. Train exploratory non-vision controller
3. Learn visual features with collected images
4. Provide image that defines goal features
5. Train final controller in visual feature space
[Finn, Tan, Duan, Darrell, Levine, Abbeel, 2015]
Visuomotor Learning Directly in Visual Space [Finn, Tan, Duan, Darrell, Levine, Abbeel, 2015]
Autonomous Flight
Applications: urban delivery, law enforcement, agriculture
Key challenge: enable autonomous aerial vehicles (AAVs) to navigate complex, dynamic environments
[Khan, Zhang, Levine, Abbeel 2016]
Hardware: 3DR Solo, NVIDIA Jetson TX1, ZED stereo depth camera
Experiments: Learned Neural Network Policy [Khan, Zhang, Levine, Abbeel 2016]
Experiments: Comparisons (Canyon and Forest environments) [Khan, Zhang, Levine, Abbeel, 2016]
Frontiers
- Shared and transfer learning
- Transfer: simulation → real world
- Memory
- Estimation
- Hierarchical reasoning
- Multi-time-scale learning
[Mordatch, Mishra, Eppner, Abbeel ICRA 2016]
Acknowledgements
- Colleagues: Trevor Darrell, Ken Goldberg, Michael Jordan, Stuart Russell
- Post-docs: Sergey Levine, Igor Mordatch, Sachin Patil, Jia Pan, Aviv Tamar, Dave Held
- Students: John Schulman, Chelsea Finn, Sandy Huang, Bradly Stadie, Alex Lee, Dylan Hadfield-Menell, Jonathan Ho, Ziang Xie, Rocky Duan, Justin Fu, Abhishek Gupta, Gregory Kahn, Nikita Kitaev, Henry Lu, George Mulcaire, Nolan Wagener, Ankush Gupta, Sibi Venkatesan, Cameron Lee
Thank you