Learning to Manipulate from Demonstrations
CS287, November 17, 2015
Sandy Huang
Slides courtesy of Pieter Abbeel

Personal Robotics Hardware
Robot    Maker                Price      Year
PR2      Willow Garage        $400,000   2009
Baxter   Rethink Robotics     $30,000    2013
UBR-1    Unbounded Robotics   $35,000    2013
?        ?                    $2,000     2017?

Challenge Task: Robotic Laundry [Maitin-Shepard, Cusumano-Towner, Lei, Abbeel, ICRA 2010]

How About…

Surgical Knot Tie [van den Berg, Miller, Duckworth, Humphrey, Wan, Fu, Goldberg, Abbeel, Best Medical Robotics Paper, ICRA 2010]

Surgical Knot Tie
- Open loop
- If careful about initial conditions
- 50% success rate

Learning from Demonstrations
- The problem
  - Human demonstrated knot-tie in this rope
  - Robot has to tie a knot in this rope

Generalizing Trajectories
Prior work
- Billard, Calinon and collaborators
  - Gaussian Mixture Models (GMM) and Gaussian Mixture Regression (GMR)
- Schaal and collaborators
  - Dynamic motion primitives
- Cakmak, Thomaz and collaborators
  - Human-robot interaction for robot to learn faster
- Peters and collaborators
  - Stay close to the demonstrations' distribution while also optimizing reward
- BUT
  - All of these algorithms have underlying representations in terms of coordinates
  - Can we alleviate the need to specify coordinate frames / features and directly adapt to geometry?

Cartoon Problem Setting
- Training scene, with trajectory demonstrations
- Test scene: what trajectory here?
- The training and test scenes give samples of f : R^3 → R^3

Learning f : R^3 → R^3 from Samples
$$\min_{f : \mathbb{R}^3 \to \mathbb{R}^3} \int_{x \in \mathbb{R}^3} \lVert D^2 f(x) \rVert_{\text{Frob}}^2 \, dx \quad \text{s.t.} \quad f\big(x^{(i)}_{\text{train}}\big) = x^{(i)}_{\text{test}} \quad \forall i \in 1, \ldots, m$$
- Translations, rotations and scaling are FREE (they have zero second derivative, so they add nothing to the objective)
- The solution has a known closed form; see:
  - Wahba, Spline Models for Observational Data. Philadelphia: Society for Industrial and Applied Mathematics. 1990.
  - Evgeniou, Pontil, Poggio, Regularization Networks and Support Vector Machines. Advances in Computational Mathematics. 2000.
  - Hastie, Tibshirani, Friedman, Elements of Statistical Learning, Chapter 5. 2008.
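
For reference, the closed form discussed in the cited spline literature can be written as below. This is a sketch of the standard thin plate spline form, not a transcription of the slide: the symbols a_i, B, c and k are labels chosen here, and the kernel is the usual 3D choice up to a constant factor.

```latex
f(x) = \sum_{i=1}^{m} a_i \, k\big(x^{(i)}_{\text{train}}, x\big) + B x + c,
\qquad
k(x', x) = -\lVert x' - x \rVert_2,
\qquad
\sum_{i=1}^{m} a_i = 0,
\quad
\sum_{i=1}^{m} a_i \, \big(x^{(i)}_{\text{train}}\big)^{\top} = 0
```

Here a_i and c are vectors in R^3 and B is a 3x3 matrix. Together with the m interpolation constraints, the side conditions give a square linear system for the coefficients, so fitting f reduces to one linear solve. The affine part Bx + c is the part of f that costs nothing under the objective.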

Finding a Non-Rigid Registration
- Thin Plate Spline Robust Point Matching (TPS-RPM) [Chui et al. CVIU 2003]:
  - Initialize, then alternate:
  - Calculate soft point correspondence matrix
  - Optimize for warp function
- Variant of Expectation-Maximization (EM); finds locally optimal warp
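
The alternation can be illustrated with a deliberately simplified sketch. It is not TPS-RPM itself: the warp is restricted to a pure translation instead of a thin plate spline, the soft correspondences are only row-normalized, and there is no outlier handling. All function names and the annealing parameters are hypothetical.

```python
import math

def soft_correspondences(src, tgt, temperature):
    """Row-normalized soft assignment, m[i][j] proportional to exp(-d_ij^2 / (2 T))."""
    m = []
    for p in src:
        sq = [sum((a - b) ** 2 for a, b in zip(p, q)) for q in tgt]
        best = min(sq)  # subtract the minimum so the exponentials cannot all underflow
        row = [math.exp(-(d - best) / (2.0 * temperature)) for d in sq]
        total = sum(row)
        m.append([w / total for w in row])
    return m

def fit_translation(src, tgt, m):
    """Stand-in for the warp fit: least-squares translation given soft matches."""
    dim = len(src[0])
    shift = [0.0] * dim
    for p, row in zip(src, m):
        for k in range(dim):
            matched = sum(w * q[k] for w, q in zip(row, tgt))
            shift[k] += (matched - p[k]) / len(src)
    return shift

def register(src, tgt, t_init=1.0, t_final=0.01, anneal=0.7):
    """Alternate the two steps while lowering the temperature (deterministic annealing)."""
    shift = [0.0] * len(src[0])
    temperature = t_init
    while temperature > t_final:
        warped = [[c + s for c, s in zip(p, shift)] for p in src]
        m = soft_correspondences(warped, tgt, temperature)
        shift = fit_translation(src, tgt, m)
        temperature *= anneal
    return shift
```

At high temperature every source point is matched softly to many targets; as the temperature drops the matches sharpen. Like the EM variant on the slide, the result depends on where the alternation starts.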

Trajectory Transfer Procedure
- Using non-rigid registration, find a transformation f from training scene to test scene
- Apply f to the demonstrated end-effector trajectory
- Convert the end-effector trajectory to a joint trajectory
[J. Schulman, J. Ho, C. Lee, P. Abbeel, ISRR 2013]
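
The last two steps of the procedure can be sketched on a planar toy problem. This is an illustration under stated assumptions, not the authors' implementation: the warp f is taken as given, the robot is a hypothetical two-link planar arm with unit link lengths, and only end-effector positions (not orientations) are transferred.

```python
import math

L1, L2 = 1.0, 1.0  # link lengths of a hypothetical planar two-link arm

def inverse_kinematics(x, y):
    """Analytic IK for the two-link arm (branch with theta2 in [0, pi]); None if unreachable."""
    c2 = (x * x + y * y - L1 * L1 - L2 * L2) / (2.0 * L1 * L2)
    if abs(c2) > 1.0:
        return None
    theta2 = math.acos(c2)
    theta1 = math.atan2(y, x) - math.atan2(L2 * math.sin(theta2),
                                           L1 + L2 * math.cos(theta2))
    return theta1, theta2

def forward_kinematics(theta1, theta2):
    """End-effector position for given joint angles, used to check the IK."""
    x = L1 * math.cos(theta1) + L2 * math.cos(theta1 + theta2)
    y = L1 * math.sin(theta1) + L2 * math.sin(theta1 + theta2)
    return x, y

def transfer_trajectory(f, demo_trajectory):
    """Warp each demonstrated waypoint with f, then convert it to joint angles."""
    warped = [f(p) for p in demo_trajectory]
    joints = [inverse_kinematics(x, y) for x, y in warped]
    return warped, joints
```

A waypoint whose warped position is out of reach comes back as None, which is one concrete way the warped trajectory can turn out infeasible for the robot.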

Robot Experiments
- Knots tied
  - Overhand
  - Figure-eight
  - Double-overhand
  - Square
  - Clove-hitch

Experiment: Knot-Tie [J. Schulman, J. Ho, C. Lee, P. Abbeel, ISRR 2013]

Evaluation

Experiment: Suturing [J. Schulman, A. Gupta, S. Venkatesan, M. Tayson-Frederick, P. Abbeel, IROS 2013]

Limitations of Trajectory Transfer
- Does not consider joint limits and obstacles when finding the warp function
- Computationally expensive with >100 demonstrations
- Ignores surface normals when finding the warp function
- Only uses geometric information of the objects, not appearance information

Trajectory Transfer: First Step
[Figure: demonstration scene and test scene]
Step 1: $$\min_{f \in \text{registration functions}} \ \text{registration\_error}\big(f(S_{\text{demo}}), S_{\text{test}}\big) + \text{bending\_energy}(f)$$
$$\tau_f \leftarrow f(\tau_{\text{demo}})$$

Trajectory Transfer: Second Step
[Figure: transferred trajectory and feasible trajectory]
Step 2: $$\min_{\tau \in \text{trajectories}} \ \text{trajectory\_error}(\tau_f, \tau) \quad \text{s.t. } \tau \text{ is feasible and collision-free}$$

Unifying Trajectory Transfer
Two-step optimization
Step 1: $$\min_{f \in \text{registration functions}} \ \text{registration\_error}\big(f(S_{\text{demo}}), S_{\text{test}}\big) + \text{bending\_energy}(f)$$
Step 2: $$\min_{\tau \in \text{trajectories}} \ \text{trajectory\_error}\big(f(\tau_{\text{demo}}), \tau\big) \quad \text{s.t. } \tau \text{ is feasible and collision-free}$$
Unified optimization
$$\min_{\substack{f \in \text{registration functions} \\ \tau \in \text{trajectories}}} \ \text{registration\_error}\big(f(S_{\text{demo}}), S_{\text{test}}\big) + \text{bending\_energy}(f) + \text{trajectory\_error}\big(f(\tau_{\text{demo}}), \tau\big) \quad \text{s.t. } \tau \text{ is feasible and collision-free}$$

Application to Manipulation of Deformable Objects
[Figure: success rate (0 to 100) versus degree-of-freedom range reduction factor (1 down to 0.4), comparing two-step optimization and unified optimization]
[A. Lee, S. Huang, D. Hadfield-Menell, E. Tzeng, P. Abbeel, IROS 2014]

Theoretical Guarantees
- Can be expected to work if the dynamics of the system are approximately covariant under sufficiently smooth warpings.

Nearest-Neighbor Policy for Tasks
- Repeat
  - Acquire new point cloud X_test
  - Using non-rigid registration, compute the distance between X_test and each point cloud X_train,i from the demonstrations; let i* be the index of the nearest one
  - If i* is a "done" state, break
  - Apply trajectory transfer to generate new trajectory
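
The loop above can be sketched as follows. Everything passed in is a hypothetical callable standing in for a real component (sensing, registration cost, trajectory transfer, execution), and the demonstration records are assumed to be dictionaries with "cloud" and "done" entries; none of these names come from the slides.

```python
def nearest_neighbor_policy(observe, demos, registration_cost, transfer, execute,
                            max_steps=10):
    """Run the nearest-neighbor loop; return True if a "done" state was reached."""
    for _ in range(max_steps):
        x_test = observe()  # acquire new point cloud
        # Registration distance from the current scene to every demonstration scene.
        costs = [registration_cost(demo["cloud"], x_test) for demo in demos]
        i_star = min(range(len(demos)), key=costs.__getitem__)
        if demos[i_star]["done"]:
            return True
        # Warp the chosen demonstration's trajectory into the current scene and run it.
        execute(transfer(demos[i_star], x_test))
    return False
```

The max_steps cap is an addition for safety in this sketch: the loop as stated on the slide has no stopping rule other than reaching a "done" state.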

Limitations of the Nearest-Neighbor Policy
- Doesn't account for demonstration quality
- Doesn't prefer moves that make progress
- Doesn't account for reachability of trajectory

Learning to Choose Better Actions [D. Hadfield-Menell, A. Lee, C. Finn, E. Tzeng, S. Huang, P. Abbeel, ICRA 2015]

Max-Margin Policy Learning [D. Hadfield-Menell, A. Lee, C. Finn, E. Tzeng, S. Huang, P. Abbeel, ICRA 2015]

Max-Margin Q-Function Learning [D. Hadfield-Menell, A. Lee, C. Finn, E. Tzeng, S. Huang, P. Abbeel, ICRA 2015]

Experiments [D. Hadfield-Menell, A. Lee, C. Finn, E. Tzeng, S. Huang, P. Abbeel, ICRA 2015]

Results in Simulation [D. Hadfield-Menell, A. Lee, C. Finn, E. Tzeng, S. Huang, P. Abbeel, ICRA 2015]

Evaluation on Knot-Tying (success rate)
Method                             Overhand knots   Figure 8 knots
[Schulman et al. ISRR '13]         70%              54%
Max Margin Q-function Estimation   82%              63%
Beam Search (3-3)                  88%              76%

Motivation for Including Surface Normals

Standard TPS-RPM Registration
[Figure: demonstration scene and test scene]

TPS-RPM Registration with Normals
[Figure: demonstration scene and test scene]
[A. Lee, M. Goldstein, S. Barratt, P. Abbeel, ICRA 2015]

Problem Formulation

TPS-RPM: Sensitivity to Initialization
- Only uses geometric information to find non-rigid registration
[Figure: demo and test]

Geometric Similarity ≠ Semantic Similarity
- Demonstration selection also only uses geometric information
[Figure: test configuration and geometrically similar demonstration configurations]

Convolutional Neural Net Classification
- corners-against-background
- edges-against-background
- edges-against-interior
- folds-against-background
- flat interior
- wrinkled interior
[S. Huang, J. Pan, G. Mulcaire, P. Abbeel, IROS 2015]

Leveraging Appearance Information
Initialize, then alternate: calculate soft point correspondence matrix; optimize for warp function
- Each correspondence entry is the correspondence between a source point and a target point
- Each prior entry is the prior probability that that source point and target point should be matched
- Define the new point correspondence matrix from the correspondences and the priors
- Normalize so that the rows and columns sum to 1
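
A minimal sketch of the last two bullets follows. The transcript does not preserve the formula for the new correspondence matrix, so the elementwise product used here is an assumption, as are the function name and the iteration count. The normalization is plain alternating row and column scaling (Sinkhorn iteration) on a square matrix with strictly positive entries.

```python
def apply_prior_and_normalize(m, prior, iterations=200):
    """Combine correspondences with priors, then balance rows and columns.

    ASSUMPTION: the combination is an elementwise product m[i][j] * prior[i][j].
    """
    w = [[a * b for a, b in zip(m_row, p_row)] for m_row, p_row in zip(m, prior)]
    for _ in range(iterations):
        # Scale each row to sum to 1.
        w = [[v / sum(row) for v in row] for row in w]
        # Scale each column to sum to 1.
        col_sums = [sum(col) for col in zip(*w)]
        w = [[v / s for v, s in zip(row, col_sums)] for row in w]
    return w
```

Under this assumption, a pair that is geometrically plausible but has a low appearance prior receives a small weight before normalization, so the appearance information steers the correspondences without changing the rest of the registration loop.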

Trajectory Transfer + Appearance Priors
[Figure: demo and test, without appearance priors and with appearance priors]

TPS-RPM with CNN Classification of Pixels [S. Huang, J. Pan, G. Mulcaire, P. Abbeel, IROS 2015]

Current Directions
- Unsupervised features in registration
- Reinforcement learning to further improve performance
- Forces and torques (to extend to non-kinematic tasks)
- More data…

Thank you

Trajectory Transfer: Toy Example (Schulman et al. ISRR 2013)
[Figure: demonstration and test]
1. Calculate a non-rigid registration
2. Apply to the demonstrated trajectory
