Learning Visual Servoing with Deep Features and Fitted Q-Iteration
Alex X. Lee¹, Sergey Levine¹, Pieter Abbeel¹,²,³
¹UC Berkeley, ²OpenAI, ³International Computer Science Institute
Motivation
Deep Neural Networks in Computer Vision
[Figure: example results for image classification (AlexNet), object detection, semantic segmentation, and object tracking]
Outline ■ Introduction ■ Reinforcement learning and deep reinforcement learning ■ Visual servoing ■ Learn visual servoing with reinforcement learning ■ Policy optimization ■ Combine value and model based RL ■ Learn visual feature dynamics ■ Learn servoing policy with fitted Q-iteration ■ Comparison to prior methods ■ Conclusion
What is Reinforcement Learning?
[Diagram: the agent sends an action u to the environment; the environment returns a state s and a reward r]
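A minimal sketch of the agent-environment loop on this slide, assuming a hypothetical environment object with reset() and step(u) returning (next state, reward, done); these names are illustrative and not from the talk.

```python
def run_episode(env, policy, horizon=100):
    """Roll out one episode: the agent repeatedly picks an action u and the
    environment returns the next state s and a reward r."""
    s = env.reset()                    # initial state / observation
    total_reward = 0.0
    for _ in range(horizon):
        u = policy(s)                  # agent: state -> action u
        s, r, done = env.step(u)       # environment: action -> (next state s, reward r, done)
        total_reward += r
        if done:
            break
    return total_reward
```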
Reinforcement Learning Approaches (model free vs. model based)
■ Policy optimization: learn the policy π(u_t | s_t) directly; high sample complexity; the policy might be simpler than a value function or model
■ Value based: learn the Q-value Q(s_t, u_t) and act greedily, π(s_t) = arg max_u Q(s_t, u); medium sample complexity; a challenge for continuous and high-dimensional action spaces
■ Model based: learn the environment model (s_t, u_t) → s_{t+1}, r_{t+1}; low sample complexity; relies on a good model
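A sketch of the value-based rule π(s_t) = arg max_u Q(s_t, u). It also illustrates the slide's point about continuous action spaces: here the argmax is taken over a hand-made discretization. The Q function and the candidate action grid are assumptions for illustration.

```python
import numpy as np

def greedy_policy(Q, s_t, candidate_actions):
    """Return the action with the highest Q-value among a finite candidate set."""
    q_values = np.array([Q(s_t, u) for u in candidate_actions])
    return candidate_actions[int(np.argmax(q_values))]

# Example: a coarse grid over 2-D actions (e.g., a linear and an angular velocity).
candidate_actions = [np.array([v, w]) for v in np.linspace(-1, 1, 5)
                                      for w in np.linspace(-1, 1, 5)]
```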
What is Deep Reinforcement Learning?
The same three approaches (policy optimization, value based, model based), with the policy π, the Q-value function Q, and the environment model represented by deep neural networks.
Examples of Deep Reinforcement Learning
■ Silver et al., 2014 (DPG)
■ Lillicrap et al., 2015 (DDPG)
■ Mnih et al., 2015 (DQN)
■ Mnih et al., 2016 (A3C)
■ Schulman et al., 2016 (TRPO + GAE)
■ Tamar et al., 2016 (VIN)
■ Gu*, Holly*, et al., 2016
■ Levine*, Finn*, et al., 2016 (GPS)
■ Sadeghi et al., 2017 (CAD²RL)
Deep Reinforcement Learning for Robotics
■ Gu*, Holly*, et al., 2016
■ Levine*, Finn*, et al., 2016 (GPS)
■ Sadeghi et al., 2017 (CAD²RL)
Outline ■ Introduction ■ Reinforcement learning and deep reinforcement learning ■ Visual servoing ■ Learn visual servoing with reinforcement learning ■ Policy optimization ■ Combine value and model based RL ■ Learn visual feature dynamics ■ Learn servoing policy with fitted Q-iteration ■ Comparison to prior methods ■ Conclusion
Visual Servoing
Use visual feedback to move the camera so that the current observation matches the goal observation.
Examples of Visual Servoing: Manipulation Source: SeRViCE Lab, UT Dallas
Examples of Visual Servoing: Surgical Tasks Source: Kehoe et al. 2016
Examples of Visual Servoing: Space Docking 4x Source: NASA
Outline ■ Introduction ■ Reinforcement learning and deep reinforcement learning ■ Visual servoing ■ Learn visual servoing with reinforcement learning ■ Policy optimization ■ Combine value and model based RL ■ Learn visual feature dynamics ■ Learn servoing policy with fitted Q-iteration ■ Comparison to prior methods ■ Conclusion
Learning Visual Servoing with Reinforcement Learning
■ Action u: linear and angular velocities
■ Observation: current and goal image
■ Reward r: based on the distance to the desired pose relative to the car
Outline ■ Introduction ■ Reinforcement learning and deep reinforcement learning ■ Visual servoing ■ Learn visual servoing with reinforcement learning ■ Policy optimization ■ Combine value and model based RL ■ Learn visual feature dynamics ■ Learn servoing policy with fitted Q-iteration ■ Comparison to prior methods ■ Conclusion
Learning Visual Servoing with Policy Optimization
■ The policy π maps the current and goal observations (state s_t) to an action u_t
■ Example executions of the trained policy
■ Trained with more than 20000 trajectories!
Outline ■ Introduction ■ Reinforcement learning and deep reinforcement learning ■ Visual servoing ■ Learn visual servoing with reinforcement learning ■ Policy optimization ■ Combine value and model based RL ■ Learn visual feature dynamics ■ Learn servoing policy with fitted Q-iteration ■ Comparison to prior methods ■ Conclusion
Combining Value and Model Based Reinforcement Learning
State-action value based RL: π(s_t) = arg max_u Q(s_t, u)
Combining Value and Model Based Reinforcement Learning
State-action value based RL: π(s_t) = arg min_u −Q(s_t, u)
Visual servoing with a learned dynamics function f: π(s_t) = arg min_u ||x* − f(x_t, u)||², where the predicted error to the goal plays the role of −Q(s_t, u)
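A minimal sketch of the model-based side of this analogy: with a learned dynamics function f that predicts the next observation, the action is chosen to minimize the squared error to the goal. The dynamics model f and the candidate action set are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def servoing_action(f, x_t, x_goal, candidate_actions):
    """Pick u minimizing the squared error between the goal and the predicted next observation."""
    costs = [np.sum((x_goal - f(x_t, u)) ** 2) for u in candidate_actions]
    return candidate_actions[int(np.argmin(costs))]   # arg min_u ||x* - f(x_t, u)||^2
```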
Servoing with Visual Dynamics Model
[Diagram: current observation → predicted observation, compared to the goal observation]
Features from Dilated VGG-16 Convolutional Neural Network
K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.
F. Yu and V. Koltun. Multi-scale context aggregation by dilated convolutions. In ICLR, 2016.
Servoing with Visual Dynamics Model
[Diagram: current observation → predicted observation, compared to the goal observation]
Servoing with Visual Dynamics Model
π(x_t, x*) = arg min_u ||y* − f(y_t, u)||²_w, where the weighted feature error plays the role of −Q_w(s_t, u)
[Diagram: current feature → predicted feature, compared to the goal feature]
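A sketch of the weighted feature-space objective ||y* − f(y_t, u)||²_w, assuming the features are given as a list of channel maps and w holds one non-negative weight per channel; the exact weighting and normalization in the paper may differ.

```python
import numpy as np

def weighted_feature_cost(f, y_t, y_goal, u, w):
    """-Q_w(s_t, u): weighted sum of per-channel squared feature errors."""
    y_pred = f(y_t, u)                              # predicted next features, one map per channel
    return sum(w_c * np.mean((yg - yp) ** 2)
               for w_c, yg, yp in zip(w, y_goal, y_pred))
```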
Outline ■ Introduction ■ Reinforcement learning and deep reinforcement learning ■ Visual servoing ■ Learn visual servoing with reinforcement learning ■ Policy optimization ■ Combine value and model based RL ■ Learn visual feature dynamics ■ Learn servoing policy with fitted Q-iteration ■ Comparison to prior methods ■ Conclusion
Feature Dynamics: Multiscale Bilinear Model
Feature Dynamics: Multiscale Bilinear Model
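A minimal sketch of a bilinear dynamics model: the predicted change in features is bilinear in the current feature y_t and the action u. The model in the paper is multiscale and convolutional/locally connected; this dense version, with assumed shapes and parameter names, only illustrates the bilinear structure.

```python
import numpy as np

class BilinearDynamics:
    def __init__(self, feature_dim, action_dim, rng=np.random):
        self.W = rng.randn(feature_dim, feature_dim, action_dim) * 0.01  # bilinear term
        self.B = rng.randn(feature_dim, action_dim) * 0.01               # linear-in-u term

    def predict(self, y_t, u):
        # y_{t+1} = y_t + (W y_t) u + B u : the feature change is bilinear in (y_t, u)
        delta = np.einsum('ijk,j,k->i', self.W, y_t, u) + self.B @ u
        return y_t + delta
```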
Outline ■ Introduction ■ Reinforcement learning and deep reinforcement learning ■ Visual servoing ■ Learn visual servoing with reinforcement learning ■ Policy optimization ■ Combine value and model based RL ■ Learn visual feature dynamics ■ Learn servoing policy with fitted Q-iteration ■ Comparison to prior methods ■ Conclusion
Learning Model Based Policy with Fitted Q-Iteration
π(s_t) = arg min_u −Q_w(s_t, u), with −Q_w(s_t, u) = ||y* − f(y_t, u)||²_w
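A sketch of fitted Q-iteration under this parameterization, written with costs (negative rewards): −Q_w(s, u) = w · phi(s, u), where phi(s, u) is the vector of per-channel squared feature errors of the predicted next features against the goal. Because Q is linear in w, each iteration reduces to a least-squares fit to Bellman targets. The candidate-action set, the data format, and the non-negativity projection are assumptions for illustration.

```python
import numpy as np

def fitted_q_iteration(phi, transitions, candidate_actions, gamma=0.9, n_iters=10):
    """transitions: list of (s, u, cost, s_next). phi(s, u) -> vector of feature errors."""
    dim = phi(*transitions[0][:2]).shape[0]
    w = np.ones(dim)
    for _ in range(n_iters):
        X, targets = [], []
        for s, u, cost, s_next in transitions:
            # Bellman target with costs: c_t + gamma * min_u' (w . phi(s', u'))
            next_cost = min(w @ phi(s_next, up) for up in candidate_actions)
            X.append(phi(s, u))
            targets.append(cost + gamma * next_cost)
        w, *_ = np.linalg.lstsq(np.array(X), np.array(targets), rcond=None)
        w = np.maximum(w, 0.0)           # keep the feature-error weights non-negative (assumed)
    return w
```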
Learning Visual Servoing with Deep Feature Dynamics and FQI
■ Value based + visual dynamics model: the Q-value Q(s_t, u_t) is computed from the current and goal observations
■ Example executions of the trained policy
■ Trained with only 20 trajectories!
Outline ■ Introduction ■ Reinforcement learning and deep reinforcement learning ■ Visual servoing ■ Learn visual servoing with reinforcement learning ■ Policy optimization ■ Combine value and model based RL ■ Learn visual feature dynamics ■ Learn servoing policy with fitted Q-iteration ■ Comparison to prior methods ■ Conclusion
Comparison to Prior Methods
[Bar chart: average cost (negative reward) for each feature representation and optimization method: ORB feature points + IBVS, C-COT tracker + IBVS, CNN features + TRPO (≥ 20000 trajectories), and ours: feature dynamics + FQI (20 trajectories)]
Conclusion
■ Deep reinforcement learning allows us to learn robot policies that process complex visual inputs
■ Combining value-based and model-based RL gives better sample complexity
■ Visual servoing: learn visual feature dynamics, then learn Q-values with fitted Q-iteration
Thank You Acknowledgements Resources Paper: arxiv.org/abs/1703.11000 Code: github.com/alexlee-gk/visual_dynamics Servoing benchmark code: github.com/alexlee-gk/citysim3d More videos: rll.berkeley.edu/visual_servoing