Learning to Manipulate from Demonstrations

CS287, November 17, 2015



  1. Learning to Manipulate from Demonstrations. CS287, November 17, 2015. Sandy Huang; slides courtesy of Pieter Abbeel

  2. Personal Robotics Hardware
     - PR2 (Willow Garage, 2009): $400,000
     - Baxter (Rethink Robotics, 2013): $30,000
     - UBR-1 (Unbounded Robotics, 2013): $35,000
     - ? (2017?): $2,000?

  3. Challenge Task: Robotic Laundry [Maitin-Shepard, Cusumano-Towner, Lei, Abbeel, ICRA 2010]

  4. How About…

  5. Surgical Knot Tie [van den Berg, Miller, Duckworth, Humphrey, Wan, Fu, Goldberg, Abbeel; Best Medical Robotics Paper, ICRA 2010]

  6. Surgical Knot Tie
     - Open loop
     - If careful about initial conditions
     - 50% success rate

  7. Learning from Demonstrations
     - The problem:
       - Human demonstrated knot-tie in this rope
       - Robot has to tie a knot in this rope

  8. Generalizing Trajectories: Prior Work
     - Billard, Calinon and collaborators: Gaussian Mixture Models (GMM) and Gaussian Mixture Regression (GMR)
     - Schaal and collaborators: dynamic movement primitives
     - Cakmak, Thomaz and collaborators: human-robot interaction so the robot learns faster
     - Peters and collaborators: stay close to the demonstration distribution while also optimizing reward
     - BUT: all of these algorithms have underlying representations in terms of coordinates
     - Can we alleviate the need to specify coordinate frames / features and directly adapt to geometry?

  9. Cartoon Problem Setting: training scene with trajectory demonstrations; test scene: what trajectory here?

  10. Cartoon Problem Setting: training scene with trajectory demonstrations, which give samples of $f: \mathbb{R}^3 \to \mathbb{R}^3$; test scene: what trajectory here?

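  14. Learning $f: \mathbb{R}^3 \to \mathbb{R}^3$ from Samples
     - $\min_{f \in \{\mathbb{R}^3 \to \mathbb{R}^3\}} \int_{x \in \mathbb{R}^3} \lVert D^2 f(x) \rVert_{\mathrm{Frob}}^2 \, dx \quad \text{s.t.} \quad f\big(x^{(i)}_{\text{train}}\big) = x^{(i)}_{\text{test}} \;\; \forall i \in \{1, \dots, m\}$
     - Translations, rotations and scaling are FREE: they are affine, so $D^2 f = 0$ and they add no bending energy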

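  15. Learning $f: \mathbb{R}^3 \to \mathbb{R}^3$ from Samples (same problem as slide 14)
     - Solution has form $f(x) = \sum_{i=1}^{m} w_i \, \phi\big(\lVert x - x^{(i)}_{\text{train}} \rVert\big) + A x + b$, with $w_i, b \in \mathbb{R}^3$, $A \in \mathbb{R}^{3 \times 3}$, side conditions $\sum_i w_i = 0$ and $\sum_i w_i \big(x^{(i)}_{\text{train}}\big)^\top = 0$, and the TPS radial basis function $\phi(r) = -r$ in 3-D (up to a constant)
     - Wahba, Spline Models for Observational Data. Philadelphia: Society for Industrial and Applied Mathematics, 1990.
     - Evgeniou, Pontil, Poggio, Regularization Networks and Support Vector Machines. Advances in Computational Mathematics, 2000.
     - Hastie, Tibshirani, Friedman, The Elements of Statistical Learning, Chapter 5, 2008.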
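
With the exact constraints, the spline weights and affine part come from a single linear system. The sketch below assumes NumPy, the 3-D kernel φ(r) = −r, and hypothetical helper names `tps_kernel_3d` and `fit_tps_interpolant`; it illustrates the math and is not the implementation used in the cited work. The usage at the bottom maps the training points by a rotation, scaling, and translation, which is the case where the kernel weights `w` should vanish because affine maps cost no bending energy.

```python
import numpy as np


def tps_kernel_3d(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise TPS kernel matrix for 3-D points, assuming phi(r) = -r."""
    r = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return -r


def fit_tps_interpolant(x_train: np.ndarray, x_test: np.ndarray):
    """Fit f(x) = sum_i w_i phi(|x - x_train_i|) + b + A x with f(x_train_i) = x_test_i.

    Needs at least 4 non-coplanar training points so the system is nonsingular.
    """
    m, d = x_train.shape
    K = tps_kernel_3d(x_train, x_train)          # (m, m) radial part
    P = np.hstack([np.ones((m, 1)), x_train])    # (m, d + 1) affine part: constant + linear
    # Saddle-point system [[K, P], [P^T, 0]] [w; a] = [x_test; 0];
    # the P^T w = 0 rows are the side conditions on the weights.
    lhs = np.zeros((m + d + 1, m + d + 1))
    lhs[:m, :m] = K
    lhs[:m, m:] = P
    lhs[m:, :m] = P.T
    rhs = np.vstack([x_test, np.zeros((d + 1, d))])
    sol = np.linalg.solve(lhs, rhs)
    w, a = sol[:m], sol[m:]                      # w: (m, d) kernel weights, a: (d + 1, d)

    def f(x: np.ndarray) -> np.ndarray:
        x = np.atleast_2d(x)
        return tps_kernel_3d(x, x_train) @ w + np.hstack([np.ones((len(x), 1)), x]) @ a

    return f, w


# Usage: training points related to test points by a scaled rotation plus translation.
rng = np.random.default_rng(0)
x_train = rng.normal(size=(8, 3))
c, s = np.cos(0.3), np.sin(0.3)
R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
x_test = 1.5 * x_train @ R.T + np.array([1.0, -2.0, 0.5])
f, w = fit_tps_interpolant(x_train, x_test)
```

Fitting arbitrary targets instead still reproduces them exactly at the training points; the kernel weights then carry the non-affine bending.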

  16. Finding a Non-Rigid Registration
     - Thin Plate Spline Robust Point Matching (TPS-RPM) [Chui et al. CVIU 2003]: initialize, then alternate between calculating a soft point correspondence matrix and optimizing for the warp function
     - Variant of Expectation-Maximization (EM); finds a locally optimal warp
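
The soft-correspondence half of that loop can be sketched as follows, assuming NumPy and Gaussian affinities at a temperature T; `soft_correspondence` is a hypothetical name. The sketch omits outlier handling, and the other half of the loop (refitting a regularized TPS to the correspondence-weighted targets) and the gradual lowering of T are described here but not shown.

```python
import numpy as np


def soft_correspondence(warped_src: np.ndarray, target: np.ndarray,
                        temperature: float, n_iters: int = 50) -> np.ndarray:
    """E-step sketch: soft-assign matrix between warped source points and target points."""
    d2 = np.sum((warped_src[:, None, :] - target[None, :, :]) ** 2, axis=-1)
    # Subtracting each row's minimum only rescales rows, which the normalization
    # below undoes; it keeps exp() from underflowing to an all-zero row.
    m = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / temperature) + 1e-12
    # Sinkhorn-style alternation: make rows sum to 1, then columns sum to 1.
    for _ in range(n_iters):
        m /= m.sum(axis=1, keepdims=True)
        m /= m.sum(axis=0, keepdims=True)
    return m


# Usage: target is a shuffled, slightly perturbed copy of the source.
rng = np.random.default_rng(1)
src = rng.normal(size=(6, 3))
perm = np.array([2, 0, 5, 1, 4, 3])
tgt = src[perm] + 0.01 * rng.normal(size=(6, 3))
M = soft_correspondence(src, tgt, temperature=0.05)
```

At a low temperature the matrix approaches a permutation; at a high temperature it stays nearly uniform, which is why annealing from high to low helps avoid poor local optima early on.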

  17. Trajectory Transfer Procedure [J. Schulman, J. Ho, C. Lee, P. Abbeel, ISRR 2013]
     - Using non-rigid registration, find a transformation f from the training scene to the test scene
     - Apply f to the demonstrated end-effector trajectory
     - Convert the end-effector trajectory to a joint trajectory
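
Applying f to the end-effector trajectory has to handle gripper orientations as well as positions. A minimal sketch, assuming NumPy and one common choice: push each gripper frame through the Jacobian of f, then project back onto the nearest rotation. The function names are hypothetical, and the final conversion to a joint trajectory (inverse kinematics or trajectory optimization) is not shown.

```python
import numpy as np


def numerical_jacobian(f, x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Central-difference Jacobian of a map f: R^3 -> R^3 at the point x."""
    J = np.zeros((3, 3))
    for k in range(3):
        dx = np.zeros(3)
        dx[k] = eps
        J[:, k] = (f(x + dx) - f(x - dx)) / (2.0 * eps)
    return J


def nearest_rotation(M: np.ndarray) -> np.ndarray:
    """Closest rotation matrix to M in Frobenius norm (SVD projection onto SO(3))."""
    U, _, Vt = np.linalg.svd(M)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])  # rule out reflections
    return U @ D @ Vt


def transfer_poses(f, positions: np.ndarray, rotations: np.ndarray):
    """Warp demo end-effector poses (p_t, R_t) into the test scene with f."""
    new_positions = np.array([f(p) for p in positions])
    new_rotations = np.array([nearest_rotation(numerical_jacobian(f, p) @ R)
                              for p, R in zip(positions, rotations)])
    return new_positions, new_rotations


# Usage with a stand-in warp (a scaled rotation plus translation) instead of a fitted TPS.
c, s = np.cos(0.4), np.sin(0.4)
Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def warp(x: np.ndarray) -> np.ndarray:
    return 2.0 * Rz @ x + np.array([0.1, 0.0, -0.2])


demo_p = np.array([[0.0, 0.0, 0.0], [0.1, 0.2, 0.3]])
demo_R = np.stack([np.eye(3), Rz.T])
new_p, new_R = transfer_poses(warp, demo_p, demo_R)
```

The projection matters because a non-rigid f generally shears and scales space, so the raw Jacobian times a rotation is no longer a valid gripper orientation.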

  18. Robot Experiments: knots tied
     - Overhand
     - Figure-eight
     - Double-overhand
     - Square
     - Clove-hitch

  19. Experiment: Knot-Tie [J. Schulman, J. Ho, C. Lee, P. Abbeel, ISRR 2013]

  20. Evaluation

  21. Experiment: Suturing [J. Schulman, A. Gupta, S. Venkatesan, M. Tayson-Frederick, P. Abbeel, IROS 2013]

  22. Limitations of Trajectory Transfer
     - Does not consider joint limits and obstacles when finding the warp function
     - Computationally expensive with >100 demonstrations
     - Ignores surface normals when finding the warp function
     - Only uses geometric information of the objects, not appearance information

  23. Trajectory Transfer: First Step (demonstration scene → test scene)
     - Step 1: $f \leftarrow \arg\min_{f \in \text{registration functions}} \; \mathrm{registration\_error}(S_{\text{demo}}, S_{\text{test}}) + \mathrm{bending\_energy}(f)$
     - $\tau_f \leftarrow f(\tau_{\text{demo}})$
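
When the point correspondences are already known, Step 1 reduces to a regularized TPS fit that trades squared registration error against bending energy with a weight λ. The sketch assumes NumPy, squared distances as `registration_error`, the 3-D kernel φ(r) = −r (λ absorbs the kernel's constant factor), and hypothetical names; in TPS-RPM the correspondences would themselves come from the soft-assign step.

```python
import numpy as np


def tps_kernel_3d(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise 3-D TPS kernel, phi(r) = -r."""
    return -np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)


def fit_tps_smoothing(x_src: np.ndarray, y_tgt: np.ndarray, lam: float):
    """Minimize sum_i |f(x_i) - y_i|^2 + lam * bending_energy(f).

    Solves (K + lam I) w + P a = y together with P^T w = 0.
    """
    m, d = x_src.shape
    K = tps_kernel_3d(x_src, x_src)
    P = np.hstack([np.ones((m, 1)), x_src])
    lhs = np.zeros((m + d + 1, m + d + 1))
    lhs[:m, :m] = K + lam * np.eye(m)
    lhs[:m, m:] = P
    lhs[m:, :m] = P.T
    sol = np.linalg.solve(lhs, np.vstack([y_tgt, np.zeros((d + 1, d))]))
    w, a = sol[:m], sol[m:]

    def f(x: np.ndarray) -> np.ndarray:
        x = np.atleast_2d(x)
        return tps_kernel_3d(x, x_src) @ w + np.hstack([np.ones((len(x), 1)), x]) @ a

    bending = float(np.trace(w.T @ K @ w))  # proportional to the bending energy of f
    return f, bending


# Usage: hypothetical matched scene points and a hypothetical demo gripper path.
rng = np.random.default_rng(2)
s_demo = rng.normal(size=(10, 3))
s_test = s_demo + 0.3 * rng.normal(size=(10, 3))
f, bend = fit_tps_smoothing(s_demo, s_test, lam=0.1)
tau_demo = rng.normal(size=(25, 3))
tau_f = f(tau_demo)  # Step 1 output: the warped demonstration trajectory
```

Raising λ makes the warp smoother (lower bending energy) at the cost of matching the test points less closely; affine targets are matched exactly for any λ because they cost no bending.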

  24. Trajectory Transfer: Second Step (transferred trajectory → feasible trajectory)
     - Step 2: $\min_{\tau \in \text{trajectories}} \; \mathrm{trajectory\_error}(\tau_f, \tau)$ s.t. $\tau$ is feasible and collision-free
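
As a minimal sketch of Step 2, assume τ_f is already a joint-space trajectory, take `trajectory_error` to be squared distance plus a smoothness term, and treat only joint limits as the feasibility constraint (no collision checking). Projected gradient descent then solves the problem; the function name and parameters are hypothetical and NumPy is assumed.

```python
import numpy as np


def step2_projected_gradient(tau_f: np.ndarray, lower: float, upper: float,
                             mu: float = 1.0, lr: float = 0.05,
                             iters: int = 500) -> np.ndarray:
    """min_tau |tau - tau_f|^2 + mu * sum_t |tau_{t+1} - tau_t|^2  s.t.  lower <= tau <= upper.

    tau_f: (T, J) transferred trajectory, assumed to be in joint space already.
    Only joint limits are enforced; collision avoidance is omitted.
    """
    tau = np.clip(tau_f, lower, upper)
    for _ in range(iters):
        diff = np.diff(tau, axis=0)                   # (T - 1, J) successive differences
        grad = 2.0 * (tau - tau_f)                    # gradient of the tracking term
        grad[:-1] -= 2.0 * mu * diff                  # smoothness term, w.r.t. tau_t
        grad[1:] += 2.0 * mu * diff                   # smoothness term, w.r.t. tau_{t+1}
        tau = np.clip(tau - lr * grad, lower, upper)  # gradient step, then project onto the box
    return tau


# Usage: a hypothetical 2-joint trajectory that partly violates limits of [-1, 1].
rng = np.random.default_rng(3)
tau_f = rng.normal(scale=2.0, size=(20, 2))
tau = step2_projected_gradient(tau_f, -1.0, 1.0)
```

For this quadratic objective the step size should stay at or below 1/(2 + 8μ), which bounds the largest Hessian eigenvalue and keeps each step from increasing the cost. With μ = 0 the problem separates per entry and the answer is plain clipping.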

  25. Unifying Trajectory Transfer
     - Two-step optimization:
       - Step 1: $\min_{f \in \text{registration functions}} \; \mathrm{registration\_error}(S_{\text{demo}}, S_{\text{test}}) + \mathrm{bending\_energy}(f)$
       - Step 2: $\min_{\tau \in \text{trajectories}} \; \mathrm{trajectory\_error}(f(\tau_{\text{demo}}), \tau)$ s.t. $\tau$ is feasible and collision-free
     - Unified optimization:
       - $\min_{f \in \text{registration functions},\; \tau \in \text{trajectories}} \; \mathrm{registration\_error}(S_{\text{demo}}, S_{\text{test}}) + \mathrm{bending\_energy}(f) + \mathrm{trajectory\_error}(f(\tau_{\text{demo}}), \tau)$ s.t. $\tau$ is feasible and collision-free

  26. Application to Manipulation of Deformable Objects [A. Lee, S. Huang, D. Hadfield-Menell, E. Tzeng, P. Abbeel, IROS 2014]
     - [Plot: success rate (%) vs. degree-of-freedom range reduction factor (1 down to 0.4), comparing two-step optimization with unified optimization]

  27. Theoretical Guarantees
     - Can be expected to work if the dynamics of the system are approximately covariant under sufficiently smooth warpings

  28. Nearest-Neighbor Policy for Tasks
     - Repeat:
       - Acquire new point cloud $X_{\text{test}}$
       - Using non-rigid registration, compute the distance between $X_{\text{test}}$ and each point cloud $X_{\text{train},i}$ from the demonstrations
       - Let $i^*$ be the demonstration with the smallest distance
       - If $i^*$ is a "done" state, break
       - Apply trajectory transfer from demonstration $i^*$ to generate a new trajectory
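
The nearest-neighbor loop can be written with perception, registration, and execution passed in as callbacks. A sketch in Python; every callback name is hypothetical, and `max_steps` is an added assumption so that a policy which never reaches a "done" state still terminates.

```python
from typing import Callable, List, Sequence


def nearest_neighbor_policy(
    acquire_cloud: Callable[[], object],
    demo_clouds: Sequence[object],
    demo_is_done: Sequence[bool],
    registration_cost: Callable[[object, object], float],
    transfer_and_execute: Callable[[int, object], None],
    max_steps: int = 50,
) -> List[int]:
    """Run the nearest-neighbor loop; returns the demo index chosen at each step."""
    chosen: List[int] = []
    for _ in range(max_steps):                 # bounded stand-in for "repeat"
        x_test = acquire_cloud()
        costs = [registration_cost(x_test, x_train) for x_train in demo_clouds]
        i_star = min(range(len(costs)), key=costs.__getitem__)  # closest demo scene
        chosen.append(i_star)
        if demo_is_done[i_star]:
            break
        transfer_and_execute(i_star, x_test)   # warp demo i_star's trajectory and run it


    return chosen


# Usage on a toy 1-D "scene" that advances by 1 each time a step is executed.
state = [0.0]


def advance(i_star: int, x_test: object) -> None:
    state[0] += 1.0


steps = nearest_neighbor_policy(
    acquire_cloud=lambda: state[0],
    demo_clouds=[0.0, 1.0, 2.0],
    demo_is_done=[False, False, True],
    registration_cost=lambda a, b: abs(a - b),
    transfer_and_execute=advance,
)
```

The selection step only looks at registration cost, which is exactly why the limitations on the next slide (demonstration quality, progress, reachability) arise.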

  29. Limitations of the Nearest-Neighbor Policy
     - Doesn't account for demonstration quality
     - Doesn't prefer moves that make progress
     - Doesn't account for reachability of the trajectory

  30. Learning to Choose Better Actions [D. Hadfield-Menell, A. Lee, C. Finn, E. Tzeng, S. Huang, P. Abbeel, ICRA 2015]

  31. Max-Margin Policy Learning [D. Hadfield-Menell, A. Lee, C. Finn, E. Tzeng, S. Huang, P. Abbeel, ICRA 2015]

  32. Max-Margin Q-Function Learning [D. Hadfield-Menell, A. Lee, C. Finn, E. Tzeng, S. Huang, P. Abbeel, ICRA 2015]
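
This is a generic max-margin sketch, not the formulation of Hadfield-Menell et al.: it assumes a linear Q-function Q_w(s, a) = w·φ(s, a) over hypothetical action features and requires the demonstrated action to outscore every other action by a margin of 1, minimizing the regularized hinge loss by subgradient descent. NumPy is assumed; `train_max_margin_q` is a hypothetical name.

```python
import numpy as np


def train_max_margin_q(features, expert, lam=0.01, lr=0.1, epochs=200):
    """Subgradient descent on (lam/2)|w|^2 + mean structured hinge loss.

    features: list of (n_actions, d) arrays, one per demonstrated state.
    expert:   index of the demonstrated action in each state.
    """
    w = np.zeros(features[0].shape[1])
    for _ in range(epochs):
        grad = lam * w                                   # from (lam / 2) * |w|^2
        for phi, a_star in zip(features, expert):
            margin = np.ones(len(phi))
            margin[a_star] = 0.0                         # other actions must lose by 1
            scores = phi @ w + margin
            a_hat = int(np.argmax(scores))               # most violating action
            if scores[a_hat] > phi[a_star] @ w:          # hinge is active
                grad = grad + (phi[a_hat] - phi[a_star]) / len(features)
        w = w - lr * grad
    return w


# Usage: toy action features; the (1, 0) action is the demonstrated one in every state.
base = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5]])
features = [base, base[[1, 0, 2]], base[[2, 1, 0]]]
expert = [0, 1, 2]
w = train_max_margin_q(features, expert)
```

At test time the learned Q-function replaces the nearest-neighbor rule: pick the candidate action (demonstration to transfer) with the highest w·φ(s, a).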

  33. Experiments [D. Hadfield-Menell, A. Lee, C. Finn, E. Tzeng, S. Huang, P. Abbeel, ICRA 2015]

  34. Results in Simulation [D. Hadfield-Menell, A. Lee, C. Finn, E. Tzeng, S. Huang, P. Abbeel, ICRA 2015]

  35. Evaluation on Knot-Tying (success rate)
     - [Schulman et al. ISRR '13]: overhand knots 70%, figure-8 knots 54%
     - Max-margin Q-function estimation: overhand knots 82%, figure-8 knots 63%
     - Beam search (3-3): overhand knots 88%, figure-8 knots 76%

  36. Motivation for Including Surface Normals

  37. Standard TPS-RPM Registration: demonstration scene, test scene

  38. TPS-RPM Registration with Normals: demonstration scene, test scene [A. Lee, M. Goldstein, S. Barratt, P. Abbeel, ICRA 2015]

  39. Problem Formulation

  40. TPS-RPM: Sensitivity to Initialization
     - Only uses geometric information to find the non-rigid registration (demo vs. test)

  41. Geometric Similarity ≠ Semantic Similarity
     - Demonstration selection also only uses geometric information (test configuration vs. geometrically similar demonstration configurations)

  42. Convolutional Neural Net Classification [S. Huang, J. Pan, G. Mulcaire, P. Abbeel, IROS 2015]
     - Corners-against-background
     - Edges-against-background
     - Edges-against-interior
     - Folds-against-background
     - Flat interior
     - Wrinkled interior

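  43. Leveraging Appearance Information
     - TPS-RPM loop: initialize, then alternate between calculating the soft point correspondence matrix and optimizing for the warp function
     - $m_{ij}$ = correspondence between source point $i$ and target point $j$
     - $p_{ij}$ = prior probability that source point $i$ and target point $j$ should be matched
     - Define the new point correspondence matrix by combining $m_{ij}$ with the prior $p_{ij}$
     - Normalize so that the rows and columns sum to 1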
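
A sketch of the reweighting step, assuming NumPy and an elementwise product as the combination rule. The prior here is hypothetical: it is built from per-point class probabilities (such as the CNN labels above) as the chance that two points share a class. Both function names are illustrative.

```python
import numpy as np


def label_agreement_prior(probs_src: np.ndarray, probs_tgt: np.ndarray) -> np.ndarray:
    """Hypothetical prior: probability two points share a class, from per-point class probabilities."""
    return probs_src @ probs_tgt.T


def reweight_with_prior(m: np.ndarray, prior: np.ndarray,
                        n_iters: int = 100, eps: float = 1e-12) -> np.ndarray:
    """Combine geometric soft correspondences with an appearance prior, then renormalize."""
    m_new = m * prior + eps          # assumed combination rule: elementwise product
    for _ in range(n_iters):         # alternate until rows and columns both sum to ~1
        m_new /= m_new.sum(axis=1, keepdims=True)
        m_new /= m_new.sum(axis=0, keepdims=True)
    return m_new


# Usage: geometry alone cannot tell the three points apart; appearance can.
m_geom = np.full((3, 3), 1.0 / 3.0)
probs = np.eye(3)                    # one-hot class probabilities, for clarity
M = reweight_with_prior(m_geom, label_agreement_prior(probs, probs))
```

When the geometric term is ambiguous, as in the uniform matrix here, the prior decides the matching; when geometry is confident, a soft prior only nudges it.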

  44. Trajectory Transfer + Appearance Priors: demo and test scenes, without vs. with appearance priors

  45. TPS-RPM with CNN Classification of Pixels [S. Huang, J. Pan, G. Mulcaire, P. Abbeel, IROS 2015]

  46. Current Directions
     - Unsupervised features in registration
     - Reinforcement learning to further improve performance
     - Forces and torques (to extend to non-kinematic tasks)
     - More data…

  47. Thank you

  48. Trajectory Transfer: Toy Example [Schulman et al. ISRR 2013]: demonstration; test: ?

  49. Trajectory Transfer: Toy Example [Schulman et al. ISRR 2013]: 1. Calculate a non-rigid registration (demonstration → test)

  52. Trajectory Transfer: Toy Example [Schulman et al. ISRR 2013]: 2. Apply f to the demonstrated trajectory (demonstration → test)
