Grasping: Team MIT-Princeton @ the Amazon Robotics Challenge (1st place in the stowing task)


  1. Grasping

  2. Team MIT-Princeton @ the Amazon Robotics Challenge: 1st place in the stowing task
      Andy Zeng, Shuran Song, Kuan-Ting Yu, Elliott Donlon, Francois Hogan, Maria Bauza, Daolin Ma, Orion Taylor, Melody Liu, Eudald Romo, Nima Fazeli, Ferran Alet, Nikhil Dafle, Rachel Holladay, Druck Green, Isabella Morona, Prem Qu Nair, Ian Taylor, Weber Liu, Thomas Funkhouser, Alberto Rodriguez

  3. From model-based to model-free
      Model-based grasping: pose estimation → grasp planning
        ✔ Works well with known objects in structured environments
        ✘ Can't handle novel objects in unstructured environments (due to pose estimation)

  4. From model-based to model-free
      Model-based grasping: pose estimation → grasp planning
        ✔ Works well with known objects in structured environments
        ✘ Can't handle novel objects in unstructured environments (due to pose estimation)
      Model-free grasping: visual data → grasp planning
        ● Uses local geometric features
        ● Ignores object identity
        ● End-to-end
        ● Motivated by industry

  5. Recent work on model-free grasping
      ● Grasp Pose Detection (M. Gualtieri et al., '17)
      ● Supersizing Self-Supervision (L. Pinto and A. Gupta, '16)
      ● Dex-Net 1.0-3.0 (J. Mahler et al., '17)
      Handles clutter and novel objects

  6. Recent work on model-free grasping
      ● Grasp Pose Detection (M. Gualtieri et al., '17)
      ● Supersizing Self-Supervision (L. Pinto and A. Gupta, '16)
      ● Dex-Net 1.0-3.0 (J. Mahler et al., '17)
      Handles clutter in tabletop scenarios, and novel objects selected beforehand

  7. Recent work on model-free grasping
      ● Grasp Pose Detection (M. Gualtieri et al., '17)
      ● Supersizing Self-Supervision (L. Pinto and A. Gupta, '16)
      ● Dex-Net 1.0-3.0 (J. Mahler et al., '17)
      Common limitations: low grasp sample density, small neural network sizes

  8. In this talk
      ● A model-free grasping method

  9. In this talk
      ● A model-free grasping method
        ○ Handles dense clutter in tabletop bin/box scenarios
      Rethink dense clutter:
      ● Objects not only tightly packed, but also tossed and stacked on top of each other
      ● Objects in corners and on bin edges

  10. In this talk
      ● A model-free grasping method
        ○ Handles dense clutter in tabletop bin/box scenarios
        ○ Works for novel objects of all kinds (i.e. any household object should be fair game)
      90-95% grasping accuracy is not enough
      Objects without depth data...

  11. In this talk
      ● A model-free grasping method
        ○ Handles dense clutter in tabletop bin/box scenarios
        ○ Works for novel objects of all kinds (i.e. any household object should be fair game)
        ○ Fast and efficient
      Standard: grasp sampling → grasp ranking
      Ours: dense pixel-wise predictions

  12. In this talk
      ● A model-free grasping method
        ○ Handles dense clutter in tabletop bin/box scenarios
        ○ Works for novel objects of all kinds (i.e. any household object should be fair game)
        ○ Fast and efficient
        ○ 1st place in the stowing task at the Amazon Robotics Challenge '17 (i.e. it works)
      "The Beast from the East": setup, competition footage

  13. Overview: multi-affordance grasping
      Input: multi-view RGB-D images

  14. Overview: multi-affordance grasping
      Input: multi-view RGB-D images
      Output: dense grasp proposals and affordance scores for 4 primitive grasping behaviors: suction down, suction side, grasp down, flush grasp
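
The output spec above suggests a simple selection rule at run time. Below is a minimal illustrative sketch (not the authors' code; names such as select_grasp and PRIMITIVES are hypothetical) of how four dense affordance maps could be reduced to a single grasp choice by taking the highest-scoring pixel across all primitives:

```python
# Hypothetical sketch: reduce dense affordance maps to one grasp choice.
# Assumes each primitive yields an HxW score map with values in [0, 1].
import numpy as np

PRIMITIVES = ["suction-down", "suction-side", "grasp-down", "flush-grasp"]

def select_grasp(affordance_maps):
    """affordance_maps: dict of primitive name -> HxW array of scores."""
    best = None
    for name, scores in affordance_maps.items():
        y, x = np.unravel_index(np.argmax(scores), scores.shape)
        if best is None or scores[y, x] > best[3]:
            best = (name, y, x, float(scores[y, x]))
    return best  # (primitive, pixel row, pixel col, confidence)

# Example: pick the best primitive and pixel over random score maps.
maps = {p: np.random.rand(480, 640) for p in PRIMITIVES}
print(select_grasp(maps))
```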

  15. Dense pixel-wise affordances with FCNs
      Input RGB-D images → fully convolutional ResNet-50

  16. Dense pixel-wise affordances with FCNs
      Input RGB-D images → fully convolutional ResNet-50 (✓ suction down, ❌ suction side)

  18. Dense pixel-wise affordances with FCNs
      Input RGB-D images → fully convolutional ResNet-50 (✓ suction down, ❌ suction side)
      What about grasping?

  19. Dense pixel-wise affordances with FCNs
      Input RGB-D images → fully convolutional ResNet-50 (✓ suction down, ❌ suction side)
      RGB-D heightmaps

  20. Dense pixel-wise affordances with FCNs
      Input RGB-D images → fully convolutional ResNet-50 (✓ suction down, ❌ suction side)
      RGB-D heightmaps (✓ grasp down, ❌ flush grasp)

  21. Dense pixel-wise affordances with FCNs
      Input RGB-D images → fully convolutional ResNet-50 (✓ suction down, ❌ suction side)
      RGB-D heightmaps → fully convolutional ResNet-50 → predicts horizontal grasp affordances (✓ grasp down, ❌ flush grasp)
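
A minimal sketch of the architecture these slides name, assuming PyTorch/torchvision: a ResNet-50 with its pooling and classification head removed becomes fully convolutional, and a 1x1 convolution plus bilinear upsampling yields per-pixel affordance scores. The single color tower here is a simplification; the actual system also consumes depth, and uses rotated RGB-D heightmaps for horizontal grasps.

```python
# Illustrative sketch (not the authors' code) of a fully convolutional
# ResNet-50 mapping an image to a dense per-pixel affordance map.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50

class AffordanceFCN(nn.Module):
    def __init__(self, num_classes=2):  # e.g. affordable vs. not
        super().__init__()
        backbone = resnet50(weights=None)
        # Drop the average pool and fully connected head to stay convolutional.
        self.trunk = nn.Sequential(*list(backbone.children())[:-2])
        self.head = nn.Conv2d(2048, num_classes, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = self.trunk(x)          # stride-32 feature map
        logits = self.head(feats)      # per-location class scores
        # Upsample back to input resolution for pixel-wise affordances.
        return F.interpolate(logits, size=(h, w), mode="bilinear",
                             align_corners=False)

model = AffordanceFCN()
scores = model(torch.randn(1, 3, 480, 640)).softmax(dim=1)[:, 1]  # affordance map
```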

  22. Training data
      ● Manual labeling
      ● ~100 different household/office objects
      ● Labels cover suctionable areas and parallel-jaw grasps
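
Continuing the sketch above, the manually labeled data could supervise the FCN with a pixel-wise cross-entropy loss; the label format and optimizer settings below are assumptions, not the authors' recipe:

```python
# Hedged sketch of the training signal: per-pixel binary labels
# (e.g. suctionable = 1, background = 0) from manual annotation.
import torch
import torch.nn as nn

model = AffordanceFCN(num_classes=2)     # from the sketch above
criterion = nn.CrossEntropyLoss()        # applied per pixel
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

def train_step(rgb, labels):
    """rgb: Bx3xHxW float tensor; labels: BxHxW long tensor of {0, 1}."""
    optimizer.zero_grad()
    logits = model(rgb)                  # BxCxHxW pixel-wise class scores
    loss = criterion(logits, labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```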

  23. Generalization from hardware capabilities
      ● High-powered deployable suction
      ● Actuated spatula

  24. Pros and cons
      Advantages:
      ● Fast runtime speeds from efficient convolution

  25. Pros and cons
      Advantages:
      ● Fast runtime speeds from efficient convolution
      ● Uses both color and depth information

  26. Pros and cons
      Advantages:
      ● Fast runtime speeds from efficient convolution
      ● Uses both color and depth information
      ● Can leverage large pre-trained networks
      ● Higher recall of good grasps (standard: grasp sampling → grasp ranking; ours: dense pixel-wise predictions)

  27. Pros and cons
      Advantages:
      ● Fast runtime speeds from efficient convolution
      ● Uses both color and depth information
      ● Can leverage large pre-trained networks
      ● Higher recall of good grasps
      Limitations:
      ● Considers only top-down parallel-jaw grasps
        ○ Can trivially extend to more grasp angles (see the sketch below)
      ● Limited to grasping behaviors for which you can define affordances (no real planning)
      ● Open-loop
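
On the "more grasp angles" point: one way to realize it, sketched below under the assumption of a PyTorch FCN like the one above, is to evaluate the horizontal-grasp network on several rotated copies of the heightmap and keep the best response. The number of orientations is an illustrative choice:

```python
# Hypothetical multi-angle evaluation: rotate the heightmap, score each
# orientation with the FCN, and report the best-scoring rotation.
import torch
import torchvision.transforms.functional as TF

def best_grasp_over_angles(model, heightmap, num_angles=16):
    """heightmap: 1x3xHxW tensor; returns (angle in degrees, score map)."""
    best_angle, best_map, best_score = None, None, -float("inf")
    for i in range(num_angles):
        angle = i * 360.0 / num_angles
        rotated = TF.rotate(heightmap, angle)
        scores = model(rotated).softmax(dim=1)[:, 1]  # graspable class
        score = scores.max().item()
        if score > best_score:
            best_angle, best_map, best_score = angle, scores, score
    return best_angle, best_map
```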

  28. Future work
      Model-based grasping: pose estimation → grasp planning
      Model-free grasping: visual data → grasp planning

  29. Future work
      Model-based grasping: pose estimation → grasp planning
      Model-free grasping: visual data → grasp planning
      How can we improve model-free by making it more like model-based?

  30. Future work
      Model-based grasping / model-free grasping
      Semantic Scene Completion from a Single Depth Image [Song et al., CVPR '17]

  31. Takeaways
      ● A model-free grasping method
        ○ FCNs compute dense affordance predictions for multiple grasping behaviors (suction, parallel-jaw)
        ○ Multiple primitive grasping behaviors → handles dense clutter in bin/box scenarios
        ○ Multi-view color and depth + diverse training data + robust hardware → handles novel objects of all kinds
        ○ FCNs for grasp affordance prediction → efficiency and high grasp recall

  32. Takeaways
      ● A model-free grasping method
        ○ FCNs compute dense affordance predictions for multiple grasping behaviors (suction, parallel-jaw)
        ○ Multiple primitive grasping behaviors → handles dense clutter in bin/box scenarios
        ○ Multi-view color and depth + diverse training data + robust hardware → handles novel objects of all kinds
        ○ FCNs for grasp affordance prediction → efficiency and high grasp recall
      Paper and code are available: arc.cs.princeton.edu

  33. Recognition of novel objects without retraining
      ● Match real images of novel objects to their product images (available at test time)
      ● After isolating the object from clutter with model-free grasping, perform recognition

  34. Cross-domain image matching (training)
      Product images and observed images → feature embeddings → ℓ2 distance ratio loss (match?)

  35. Cross-domain image matching (training)
      Product images and observed images → feature embeddings → ℓ2 distance ratio loss (match?)
      Softmax loss for K-Net only
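
A hedged sketch of the two-stream training setup on these slides: one tower embeds product images, another embeds observed camera images, and a distance-based loss pushes matching pairs together and non-matching pairs apart. The distance_ratio_loss below is one reading of the "ℓ2 distance ratio loss" label, not the authors' exact formulation; the softmax loss for K-Net (known objects) would be added on top.

```python
# Illustrative two-stream matching network; names are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50

def make_embedder(dim=128):
    net = resnet50(weights=None)
    net.fc = nn.Linear(net.fc.in_features, dim)  # replace classifier with embedding head
    return net

product_net = make_embedder()    # embeds canonical product images
observed_net = make_embedder()   # embeds crops from the camera

def distance_ratio_loss(obs, pos, neg, eps=1e-8):
    """obs/pos/neg: BxD embeddings; drive d(obs, pos) << d(obs, neg)."""
    d_pos = F.pairwise_distance(obs, pos)   # l2 distance to the matching product image
    d_neg = F.pairwise_distance(obs, neg)   # l2 distance to a non-matching one
    return (d_pos / (d_pos + d_neg + eps)).mean()
```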

  36. Cross-domain image matching (testing)
      Feature embedding of known and novel objects

  37. Cross-domain image matching (testing)
      Input → feature embedding (known and novel objects)

  38. Cross-domain image matching (testing)
      Input → feature embedding (known and novel objects) → match!

  39. Cross-domain image matching (testing)
      Input → feature embedding (known and novel objects) → match!
      Pre-trained ImageNet features
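
At test time (continuing the sketch above), recognition could reduce to nearest-neighbor lookup in the shared embedding space; the slides note that pre-trained ImageNet features can also serve as the embedding. A minimal assumed version:

```python
# Hypothetical nearest-neighbor matching against product-image embeddings.
import torch
import torch.nn.functional as F

@torch.no_grad()
def match(observed_crop, product_embeddings, product_labels):
    """observed_crop: 3xHxW tensor; product_embeddings: NxD tensor."""
    q = F.normalize(observed_net(observed_crop.unsqueeze(0)), dim=1)
    db = F.normalize(product_embeddings, dim=1)
    sims = q @ db.t()                    # cosine similarity to every product image
    return product_labels[int(sims.argmax())]
```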
