Learning Visual Servoing with Deep Features and Fitted Q-Iteration
Alex X. Lee¹, Sergey Levine¹, Pieter Abbeel¹,²,³
¹UC Berkeley, ²OpenAI, ³International Computer Science Institute
Motivation
Deep Neural Networks in Computer Vision
[Figure: example results for image classification (AlexNet), object detection, semantic segmentation, and object tracking]
Outline ■ Introduction ■ Reinforcement learning and deep reinforcement learning ■ Visual servoing ■ Learn visual servoing with reinforcement learning ■ Policy optimization ■ Combine value and model based RL ■ Learn visual feature dynamics ■ Learn servoing policy with fitted Q-iteration ■ Comparison to prior methods ■ Conclusion
What is Reinforcement Learning?
[Diagram: the agent sends an action u to the environment; the environment returns a state s and a reward r]
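A minimal sketch of the agent-environment loop on this slide, assuming a hypothetical environment object with reset() and step(u) returning (next state, reward, done); these names are illustrative and not from the talk.

```python
def run_episode(env, policy, horizon=100):
    """Roll out one episode: the agent repeatedly picks an action u and the
    environment returns the next state s and a reward r."""
    s = env.reset()                    # initial state / observation
    total_reward = 0.0
    for _ in range(horizon):
        u = policy(s)                  # agent: state -> action u
        s, r, done = env.step(u)       # environment: action -> (next state s, reward r, done)
        total_reward += r
        if done:
            break
    return total_reward
```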
Reinforcement Learning Approaches (model free vs. model based)
■ Policy optimization: learn the policy π(u_t | s_t) directly; high sample complexity; the policy might be simpler than a value function or model
■ Value based: learn the Q-value Q(s_t, u_t) and act greedily, π(s_t) = arg max_u Q(s_t, u); medium sample complexity; a challenge for continuous and high-dimensional action spaces
■ Model based: learn the environment model (s_t, u_t) → s_{t+1}, r_{t+1}; low sample complexity; relies on a good model
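A sketch of the value-based rule π(s_t) = arg max_u Q(s_t, u). It also illustrates the slide's point about continuous action spaces: here the argmax is taken over a hand-made discretization. The Q function and the candidate action grid are assumptions for illustration.

```python
import numpy as np

def greedy_policy(Q, s_t, candidate_actions):
    """Return the action with the highest Q-value among a finite candidate set."""
    q_values = np.array([Q(s_t, u) for u in candidate_actions])
    return candidate_actions[int(np.argmax(q_values))]

# Example: a coarse grid over 2-D actions (e.g., a linear and an angular velocity).
candidate_actions = [np.array([v, w]) for v in np.linspace(-1, 1, 5)
                                      for w in np.linspace(-1, 1, 5)]
```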
What is Deep Reinforcement Learning?
The same three approaches (policy optimization, value based, model based), with the policy π, the Q-value function Q, and the environment model represented by deep neural networks.
Examples of Deep Reinforcement Learning
■ Silver et al., 2014 (DPG)
■ Lillicrap et al., 2015 (DDPG)
■ Mnih et al., 2015 (DQN)
■ Mnih et al., 2016 (A3C)
■ Schulman et al., 2016 (TRPO + GAE)
■ Tamar et al., 2016 (VIN)
■ Gu*, Holly*, et al., 2016
■ Levine*, Finn*, et al., 2016 (GPS)
■ Sadeghi et al., 2017 (CAD²RL)
Deep Reinforcement Learning for Robotics
■ Gu*, Holly*, et al., 2016
■ Levine*, Finn*, et al., 2016 (GPS)
■ Sadeghi et al., 2017 (CAD²RL)
Outline ■ Introduction ■ Reinforcement learning and deep reinforcement learning ■ Visual servoing ■ Learn visual servoing with reinforcement learning ■ Policy optimization ■ Combine value and model based RL ■ Learn visual feature dynamics ■ Learn servoing policy with fitted Q-iteration ■ Comparison to prior methods ■ Conclusion
Visual Servoing
Use visual feedback to move the camera so that the current observation matches the goal observation.
Examples of Visual Servoing: Manipulation Source: SeRViCE Lab, UT Dallas
Examples of Visual Servoing: Surgical Tasks Source: Kehoe et al. 2016
Examples of Visual Servoing: Space Docking 4x Source: NASA
Outline ■ Introduction ■ Reinforcement learning and deep reinforcement learning ■ Visual servoing ■ Learn visual servoing with reinforcement learning ■ Policy optimization ■ Combine value and model based RL ■ Learn visual feature dynamics ■ Learn servoing policy with fitted Q-iteration ■ Comparison to prior methods ■ Conclusion
Learning Visual Servoing with Reinforcement Learning
■ Action u: linear and angular velocities
■ Observation: current and goal image
■ Reward r: based on the distance to the desired pose relative to the car
Outline ■ Introduction ■ Reinforcement learning and deep reinforcement learning ■ Visual servoing ■ Learn visual servoing with reinforcement learning ■ Policy optimization ■ Combine value and model based RL ■ Learn visual feature dynamics ■ Learn servoing policy with fitted Q-iteration ■ Comparison to prior methods ■ Conclusion
Learning Visual Servoing with Policy Optimization
■ The policy π maps the current and goal observations (state s_t) to an action u_t
■ Example executions of the trained policy
■ Trained with more than 20000 trajectories!
Outline ■ Introduction ■ Reinforcement learning and deep reinforcement learning ■ Visual servoing ■ Learn visual servoing with reinforcement learning ■ Policy optimization ■ Combine value and model based RL ■ Learn visual feature dynamics ■ Learn servoing policy with fitted Q-iteration ■ Comparison to prior methods ■ Conclusion
Combining Value and Model Based Reinforcement Learning
State-action value based RL: π(s_t) = arg max_u Q(s_t, u)
Combining Value and Model Based Reinforcement Learning
State-action value based RL: π(s_t) = arg min_u −Q(s_t, u)
Visual servoing with a learned dynamics function f: π(s_t) = arg min_u ||x* − f(x_t, u)||², where the predicted error to the goal plays the role of −Q(s_t, u)
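A minimal sketch of the model-based side of this analogy: with a learned dynamics function f that predicts the next observation, the action is chosen to minimize the squared error to the goal. The dynamics model f and the candidate action set are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def servoing_action(f, x_t, x_goal, candidate_actions):
    """Pick u minimizing the squared error between the goal and the predicted next observation."""
    costs = [np.sum((x_goal - f(x_t, u)) ** 2) for u in candidate_actions]
    return candidate_actions[int(np.argmin(costs))]   # arg min_u ||x* - f(x_t, u)||^2
```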
Servoing with Visual Dynamics Model
[Diagram: current observation → predicted observation, compared to the goal observation]
Features from Dilated VGG-16 Convolutional Neural Network
K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.
F. Yu and V. Koltun. Multi-scale context aggregation by dilated convolutions. In ICLR, 2016.
Servoing with Visual Dynamics Model
[Diagram: current observation → predicted observation, compared to the goal observation]
Servoing with Visual Dynamics Model
π(x_t, x*) = arg min_u ||y* − f(y_t, u)||²_w, where the weighted feature error plays the role of −Q_w(s_t, u)
[Diagram: current feature → predicted feature, compared to the goal feature]
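A sketch of the weighted feature-space objective ||y* − f(y_t, u)||²_w, assuming the features are given as a list of channel maps and w holds one non-negative weight per channel; the exact weighting and normalization in the paper may differ.

```python
import numpy as np

def weighted_feature_cost(f, y_t, y_goal, u, w):
    """-Q_w(s_t, u): weighted sum of per-channel squared feature errors."""
    y_pred = f(y_t, u)                              # predicted next features, one map per channel
    return sum(w_c * np.mean((yg - yp) ** 2)
               for w_c, yg, yp in zip(w, y_goal, y_pred))
```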
Outline ■ Introduction ■ Reinforcement learning and deep reinforcement learning ■ Visual servoing ■ Learn visual servoing with reinforcement learning ■ Policy optimization ■ Combine value and model based RL ■ Learn visual feature dynamics ■ Learn servoing policy with fitted Q-iteration ■ Comparison to prior methods ■ Conclusion
Feature Dynamics: Multiscale Bilinear Model
Feature Dynamics: Multiscale Bilinear Model
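A minimal sketch of a bilinear dynamics model: the predicted change in features is bilinear in the current feature y_t and the action u. The model in the paper is multiscale and convolutional/locally connected; this dense version, with assumed shapes and parameter names, only illustrates the bilinear structure.

```python
import numpy as np

class BilinearDynamics:
    def __init__(self, feature_dim, action_dim, rng=np.random):
        self.W = rng.randn(feature_dim, feature_dim, action_dim) * 0.01  # bilinear term
        self.B = rng.randn(feature_dim, action_dim) * 0.01               # linear-in-u term

    def predict(self, y_t, u):
        # y_{t+1} = y_t + (W y_t) u + B u : the feature change is bilinear in (y_t, u)
        delta = np.einsum('ijk,j,k->i', self.W, y_t, u) + self.B @ u
        return y_t + delta
```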
Outline ■ Introduction ■ Reinforcement learning and deep reinforcement learning ■ Visual servoing ■ Learn visual servoing with reinforcement learning ■ Policy optimization ■ Combine value and model based RL ■ Learn visual feature dynamics ■ Learn servoing policy with fitted Q-iteration ■ Comparison to prior methods ■ Conclusion
Learning Model Based Policy with Fitted Q-Iteration
π(s_t) = arg min_u −Q_w(s_t, u), with −Q_w(s_t, u) = ||y* − f(y_t, u)||²_w
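A sketch of fitted Q-iteration under this parameterization, written with costs (negative rewards): −Q_w(s, u) = w · phi(s, u), where phi(s, u) is the vector of per-channel squared feature errors of the predicted next features against the goal. Because Q is linear in w, each iteration reduces to a least-squares fit to Bellman targets. The candidate-action set, the data format, and the non-negativity projection are assumptions for illustration.

```python
import numpy as np

def fitted_q_iteration(phi, transitions, candidate_actions, gamma=0.9, n_iters=10):
    """transitions: list of (s, u, cost, s_next). phi(s, u) -> vector of feature errors."""
    dim = phi(*transitions[0][:2]).shape[0]
    w = np.ones(dim)
    for _ in range(n_iters):
        X, targets = [], []
        for s, u, cost, s_next in transitions:
            # Bellman target with costs: c_t + gamma * min_u' (w . phi(s', u'))
            next_cost = min(w @ phi(s_next, up) for up in candidate_actions)
            X.append(phi(s, u))
            targets.append(cost + gamma * next_cost)
        w, *_ = np.linalg.lstsq(np.array(X), np.array(targets), rcond=None)
        w = np.maximum(w, 0.0)           # keep the feature-error weights non-negative (assumed)
    return w
```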
Learning Visual Servoing with Deep Feature Dynamics and FQI
■ Value based + visual dynamics model: the Q-value Q(s_t, u_t) is computed from the current and goal observations
■ Example executions of the trained policy
■ Trained with only 20 trajectories!
Outline ■ Introduction ■ Reinforcement learning and deep reinforcement learning ■ Visual servoing ■ Learn visual servoing with reinforcement learning ■ Policy optimization ■ Combine value and model based RL ■ Learn visual feature dynamics ■ Learn servoing policy with fitted Q-iteration ■ Comparison to prior methods ■ Conclusion
Comparison to Prior Methods
[Bar chart: average cost (negative reward) for each feature representation and optimization method: ORB feature points + IBVS, C-COT tracker + IBVS, CNN features + TRPO (≥ 20000 trajectories), and ours: feature dynamics + FQI (20 trajectories)]
Conclusion
■ Deep reinforcement learning allows us to learn robot policies that process complex visual inputs
■ Combining value-based and model-based RL gives better sample complexity
■ Visual servoing: learn visual feature dynamics, then learn Q-values with fitted Q-iteration
Thank You Acknowledgements Resources Paper: arxiv.org/abs/1703.11000 Code: github.com/alexlee-gk/visual_dynamics Servoing benchmark code: github.com/alexlee-gk/citysim3d More videos: rll.berkeley.edu/visual_servoing