  1. Learning Dexterity. Peter Welinder, September 9, 2018

  2. Learning

  3. Trends towards learning-based robotics

  4. Reinforcement Learning: Go (AlphaGo Zero), Dota 2 (OpenAI Five)

  5. What about robotics? RL doesn't work naively because it uses lots of experience: ~5 million games, or ~500 years of playing. Go: 200 years of experience per day. Dota: 200 years per day.

  6. Simulators

  7. Learning dexterity

  8. 24 joints: 20 actuated, 4 underactuated

  9. Rotating a block. Challenges: RL in the real world; high-dimensional control; noisy and partial observations; manipulating multiple objects.

  10. Approach

  11. Reinforcement Learning + Domain Randomization

  12. Reinforcement Learning: the agent (policy) receives states and rewards from the environment and emits actions. action_t = policy(state_t); score = Σ_t reward(state_t, action_t)
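
A minimal sketch of this loop in Python; the Env/Policy interfaces here are hypothetical stand-ins, not the project's actual code:

```python
# Minimal agent-environment loop sketch (hypothetical Env/Policy interfaces):
# the policy maps each state to an action, and the episode's score is the
# sum of per-step rewards.

def run_episode(env, policy, max_steps=1000):
    state = env.reset()
    score = 0.0
    for t in range(max_steps):
        action = policy(state)               # action_t = policy(state_t)
        state, reward, done = env.step(action)
        score += reward                      # score = sum_t reward(state_t, action_t)
        if done:
            break
    return score
```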

  13. Reinforcement Learning: θ* = argmax_θ Σ_{τ ∈ episodes} reward(policy_θ, τ). Optimized with Proximal Policy Optimization (PPO), Schulman et al. (2017).
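
For illustration, the clipped surrogate objective at the heart of PPO can be sketched in a few lines of NumPy (epsilon = 0.2 is the default clip range from the PPO paper; the ratio and advantage arrays are assumed precomputed):

```python
import numpy as np

# Sketch of PPO's clipped surrogate objective (Schulman et al., 2017).
# ratio = pi_theta(a|s) / pi_theta_old(a|s); advantages estimate how much
# better each action was than the old policy's average behavior.

def ppo_clip_objective(ratio, advantages, epsilon=0.2):
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantages
    # Taking the elementwise minimum makes the objective pessimistic, which
    # discourages policy updates that move the ratio far outside the clip
    # range just to exploit a noisy advantage estimate.
    return np.mean(np.minimum(unclipped, clipped))
```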

  14. Policy: noisy observations (finger joint positions, fingertip positions, object pose) and the goal pass through normalization, a fully-connected ReLU layer, and an LSTM, which outputs the action distribution.
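
A sketch of this architecture in PyTorch; the layer sizes and the choice of LayerNorm for the normalization step are illustrative guesses, not the configuration used in the project:

```python
import torch
import torch.nn as nn

# Policy sketch: normalized noisy observations + goal -> fully-connected
# ReLU -> LSTM -> parameters of the action distribution over the 20
# actuated joints. Sizes are illustrative.

class Policy(nn.Module):
    def __init__(self, obs_dim, goal_dim, hidden=256, n_actions=20):
        super().__init__()
        self.norm = nn.LayerNorm(obs_dim + goal_dim)
        self.fc = nn.Sequential(nn.Linear(obs_dim + goal_dim, hidden), nn.ReLU())
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.action_head = nn.Linear(hidden, n_actions)  # e.g. dist. means

    def forward(self, obs, goal, hidden_state=None):
        # obs, goal: (batch, time, features); the LSTM carries state across time.
        x = self.norm(torch.cat([obs, goal], dim=-1))
        x, hidden_state = self.lstm(self.fc(x), hidden_state)
        return self.action_head(x), hidden_state
```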

  15. Distributed training with Rapid: rollout workers (6,000 CPU cores) collect experience; optimizers (8 GPUs) update the policy parameters.
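
Schematically, the worker/optimizer split might look like the following; all interfaces here are hypothetical, and this is not OpenAI's internal Rapid API:

```python
import queue

# Rapid-style split (hypothetical interfaces): many rollout workers generate
# experience with the latest policy parameters; optimizers consume it in
# batches and publish updated parameters.

experience = queue.Queue(maxsize=1024)
params = {}  # latest policy parameters, shared with the workers

def rollout_worker(env, policy):
    while True:
        policy.load(params)                       # pull current parameters
        experience.put(run_episode(env, policy))  # loop from the slide 12 sketch

def optimizer_loop(learner):
    while True:
        batch = [experience.get() for _ in range(64)]
        params.update(learner.update(batch))      # push updated parameters
```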

  16. Domain Randomization: Sadeghi & Levine (2017); Tobin et al. (2017); Peng et al. (2018)

  17. Physics randomizations: object dimensions; object and robot link masses; surface friction coefficients; robot joint damping coefficients; actuator force gains; joint limits; gravity vector.
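
A sketch of how one such randomized environment might be sampled; the parameter names, distributions, and ranges below are illustrative, not the values used in the project:

```python
import numpy as np

# Sample one set of physics randomizations covering the list above.
# Each training episode would draw a fresh sample, so the policy never
# sees exactly the same physics twice. Ranges are illustrative.

def sample_physics_randomization(rng=np.random):
    return {
        "object_scale":    rng.uniform(0.95, 1.05),   # object dimensions
        "mass_scale":      rng.lognormal(0.0, 0.2),   # object / robot link masses
        "friction_scale":  rng.lognormal(0.0, 0.3),   # surface friction coefficients
        "damping_scale":   rng.lognormal(0.0, 0.3),   # joint damping coefficients
        "gain_scale":      rng.lognormal(0.0, 0.2),   # actuator force gains
        "joint_limit_offset_rad": rng.normal(0.0, 0.05),
        "gravity": np.array([0.0, 0.0, -9.81]) + rng.normal(0.0, 0.2, size=3),
    }
```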

  18. Pose estimation network: each of three cameras feeds Conv → Pool → ResNet → SSM; the three streams are concatenated and dense layers output the object position and object rotation.
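
A PyTorch sketch of this network, reading "SSM" as a spatial softmax layer; the channel counts and the simplified stand-in for the ResNet blocks are guesses:

```python
import torch
import torch.nn as nn

# Pose-estimation sketch: each camera image goes through conv -> pool ->
# (stand-in for ResNet blocks) -> spatial softmax; per-camera features are
# concatenated and dense layers predict object position and rotation.

def spatial_softmax(features):
    # Expected (x, y) image coordinates of each feature channel.
    b, c, h, w = features.shape
    probs = features.view(b, c, -1).softmax(dim=-1).view(b, c, h, w)
    ys = torch.linspace(-1, 1, h).view(1, 1, h, 1)
    xs = torch.linspace(-1, 1, w).view(1, 1, 1, w)
    return torch.cat([(probs * xs).sum((2, 3)), (probs * ys).sum((2, 3))], dim=1)

class PoseNet(nn.Module):
    def __init__(self, channels=32):
        super().__init__()
        self.stream = nn.Sequential(
            nn.Conv2d(3, channels, 5, stride=2), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),  # ResNet stand-in
        )
        self.head = nn.Sequential(
            nn.Linear(3 * 2 * channels, 128), nn.ReLU(),
            nn.Linear(128, 3 + 4),  # position (xyz) + rotation (quaternion)
        )

    def forward(self, cam1, cam2, cam3):
        feats = [spatial_softmax(self.stream(c)) for c in (cam1, cam2, cam3)]
        return self.head(torch.cat(feats, dim=1))
```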

  19. Train in Simulation. (A) Distributed workers collect experience on randomized environments at large scale. (B) We train a control policy using reinforcement learning; it chooses the next action based on fingertip positions and the object pose (observed robot states → LSTM → actions). (C) We train a convolutional neural network to predict the object pose given three simulated camera images (camera images → conv layers → object pose).

  20. Transfer to the Real World. (D) We combine the pose estimation network and the control policy to transfer to the real world: camera images → conv network → object pose; fingertip locations + object pose → LSTM → actions.
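
Schematically, one step of the combined real-world control loop might look like this, reusing the PoseNet and Policy sketches above (the camera and robot interfaces are hypothetical):

```python
import torch

# One control step (hypothetical camera/robot interfaces): the CNN estimates
# the object pose from three camera images, and the LSTM policy maps
# fingertip locations plus that estimate to the next action.

def control_step(cameras, robot, pose_net, policy, goal, hidden):
    object_pose = pose_net(*[cam.capture() for cam in cameras])
    obs = torch.cat([robot.fingertip_locations(), object_pose], dim=-1)
    action, hidden = policy(obs, goal, hidden)  # recurrent state carries over
    robot.apply(action)
    return hidden
```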

  21. Results

  23. Results:

      Randomizations | Object tracking | Max number of successes | Median number of successes
      All            | Vision          | 46                      | 11.5
      All            | Motion tracking | 50                      | 13
      None           | Motion tracking | 6                       | 0

  24. Training time. [Plot: consecutive goals achieved (0 to 50) vs. years of experience (1 to 100, log scale), comparing all randomizations with no randomizations.]

  25. Grasps: Tip Pinch, Palmar Pinch, Tripod, Quadpod, Power Grasp, 5-Finger Precision Grasp

  26. Thank You Visit openai.com for more information. FOLLOW @OPENAI ON TWITTER
