Grasping
Team MIT-Princeton @ the Amazon Robotics Challenge: 1st place in the stowing task
Andy Zeng, Shuran Song, Kuan-Ting Yu, Elliott Donlon, Francois Hogan, Maria Bauza, Daolin Ma, Orion Taylor, Melody Liu, Eudald Romo, Nima Fazeli, Ferran Alet, Nikhil Dafle, Rachel Holladay, Druck Green, Isabella Morona, Prem Qu Nair, Ian Taylor, Weber Liu, Thomas Funkhouser, Alberto Rodriguez
From model-based to model-free
Model-based grasping: pose estimation → grasp planning
✔ Works well with known objects in structured environments
✘ Can't handle novel objects in unstructured environments (due to pose estimation)
Model-free grasping: visual data → grasp planning
● Uses local geometric features
● Ignores object identity
● End-to-end
● Motivated by industry
Recent work on model-free grasping
● Grasp Pose Detection (M. Gualtieri et al., '17)
● Supersizing Self-Supervision (L. Pinto and A. Gupta, '16)
● Dex-Net 1.0 - 3.0 (J. Mahler et al., '17)
These methods handle clutter and novel objects, but only in tabletop scenarios and with novel objects selected beforehand.
Common limitations: low grasp sample density, small neural network sizes
In this talk
● A model-free grasping method
○ Handles dense clutter in tabletop bin/box scenarios
Rethinking dense clutter: objects are not only tightly packed, but also tossed and stacked on top of each other, and sit in corners and on bin edges
○ Works for novel objects of all kinds (i.e. any household object should be fair game)
90-95% grasping accuracy is not enough; objects without depth data...
○ Fast and efficient
Standard pipeline: grasp sampling → grasp ranking. Ours: dense pixel-wise predictions
○ 1st place in the stowing task at the Amazon Robotics Challenge '17 (i.e. it works)
"The Beast from the East": setup, competition footage
Overview: multi-affordance grasping
Input: multi-view RGB-D images
Output: dense grasp proposals and affordance scores for 4 primitive grasping behaviors: suction down, suction side, grasp down, flush grasp
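To make the output format concrete, here is a minimal sketch (not the team's actual code; the function and primitive names are hypothetical) of how a picking policy could act on such dense outputs: choose the primitive and pixel with the highest predicted affordance score.

```python
# Illustrative sketch: given one dense HxW affordance map per primitive
# grasping behavior, pick the best (primitive, pixel) pair by max score.
import numpy as np

def select_action(affordance_maps):
    """affordance_maps: dict mapping primitive name -> (H, W) score array."""
    name, m = max(affordance_maps.items(), key=lambda kv: kv[1].max())
    row, col = np.unravel_index(np.argmax(m), m.shape)
    return name, (int(row), int(col)), float(m[row, col])

# Example with random scores for the four primitives from the slide:
maps = {k: np.random.rand(480, 640) for k in
        ["suction-down", "suction-side", "grasp-down", "flush-grasp"]}
print(select_action(maps))
```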
Dense pixel-wise affordances with FCNs
Input RGB-D images → fully convolutional ResNet-50 → dense suction affordances (✓ suction down, ❌ suction side)
What about grasping?
RGB-D heightmaps → fully convolutional ResNet-50 → dense affordances (✓ grasp down, ❌ flush grasp); the network predicts horizontal grasp affordances
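A minimal PyTorch sketch of the idea, assuming a ResNet-50 backbone truncated before pooling, a 1×1-convolution head, and bilinear upsampling; this is illustrative, not the exact architecture or training setup used by the team.

```python
# Minimal sketch: dense pixel-wise affordance prediction with a fully
# convolutional ResNet-50 backbone (illustrative, not the exact ARC model).
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models

class AffordanceFCN(nn.Module):
    def __init__(self, num_classes=2):  # e.g. {affordance, background}
        super().__init__()
        resnet = models.resnet50(weights=None)
        # Drop the average pooling and FC layers to keep a spatial feature map.
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])
        # 1x1 convolutions act as a per-pixel classifier over the features.
        self.head = nn.Sequential(
            nn.Conv2d(2048, 512, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, num_classes, kernel_size=1),
        )

    def forward(self, x):
        # Assumption: 3-channel input; for RGB-D one might widen the first
        # conv or run separate color/depth streams and fuse their features.
        h, w = x.shape[-2:]
        feats = self.backbone(x)      # (B, 2048, h/32, w/32)
        logits = self.head(feats)     # (B, num_classes, h/32, w/32)
        # Upsample back to input resolution for dense per-pixel scores.
        return F.interpolate(logits, size=(h, w),
                             mode="bilinear", align_corners=False)

# Per-pixel affordance score = softmax over classes at each output location.
```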
Training data
● Manual labeling
● ~100 different household/office objects
● Annotations: suctionable areas and parallel-jaw grasps
Generalization from hardware capabilities
● High-powered deployable suction
● Actuated spatula
Pros and cons
Advantages:
● Fast runtime speeds from efficient convolution
● Uses both color and depth information
● Can leverage large pre-trained networks
● Higher recall of good grasps (dense pixel-wise predictions vs. the standard grasp sampling → grasp ranking pipeline)
Limitations:
● Considers only top-down parallel-jaw grasps
○ Can trivially extend to more grasp angles (see the sketch after this list)
● Limited to grasping behaviors for which you can define affordances (no real planning)
● Open-loop
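On the "more grasp angles" point: one common way to extend a horizontal-grasp predictor is to rotate the input, predict, and rotate the scores back. Below is a sketch under the assumption of a predict_fn mapping a heightmap to a horizontal-grasp affordance map; names and the number of angles are illustrative.

```python
# Sketch of the multi-angle extension: rotate the heightmap, predict
# horizontal-grasp affordances, then rotate scores back into alignment.
import numpy as np
from scipy.ndimage import rotate

def multi_angle_affordances(heightmap, predict_fn, num_angles=16):
    """predict_fn: (H, W, C) heightmap -> (H, W) horizontal-grasp scores."""
    maps = []
    for k in range(num_angles):
        # Parallel-jaw grasps are symmetric, so 180 degrees covers all angles.
        angle = 180.0 * k / num_angles
        rotated = rotate(heightmap, angle, axes=(0, 1), reshape=False)
        scores = predict_fn(rotated)
        # Undo the rotation so scores align with the original heightmap.
        maps.append(rotate(scores, -angle, axes=(0, 1), reshape=False))
    return np.stack(maps)  # (num_angles, H, W): best angle = argmax over axis 0
```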
Future work
Model-based grasping: pose estimation → grasp planning
Model-free grasping: visual data → grasp planning
How can we improve model-free grasping by making it more like model-based?
One direction: Semantic Scene Completion from a Single Depth Image [Song et al., CVPR '17]
Takeaways
● A model-free grasping method
○ FCNs compute dense affordance predictions for multiple grasping behaviors (suction, parallel-jaw)
○ Multiple primitive grasping behaviors → handles dense clutter in bin/box scenarios
○ Multi-view color and depth + diverse training data + robust hardware → handles novel objects of all kinds
○ FCNs for grasp affordance prediction → efficiency and high grasp recall
Paper and code are available: arc.cs.princeton.edu
Recognition of novel objects without retraining
● Match real images of novel objects to their product images (available at test time)
● After isolating an object from the clutter with model-free grasping, perform recognition
Cross-domain image matching (training)
Product images and observed images are embedded into a shared feature space; an ℓ2 distance ratio loss answers "match?" by pulling matching pairs closer together than non-matching ones (a softmax classification loss is added for K-Net only)
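As a hedged illustration of what an ℓ2 distance-ratio loss can look like, here is a triplet-style formulation over (observed anchor, matching product image, non-matching product image) embeddings; the paper's exact form may differ.

```python
# Illustrative distance-ratio loss over (anchor, match, non-match) embedding
# triplets; a sketch, not necessarily the paper's exact formulation.
import torch
import torch.nn.functional as F

def distance_ratio_loss(anchor, positive, negative):
    """Encourage ||a - p|| << ||a - n|| via softmax over negated l2 distances."""
    d_pos = F.pairwise_distance(anchor, positive)  # (B,)
    d_neg = F.pairwise_distance(anchor, negative)  # (B,)
    # exp(-d_pos) / (exp(-d_pos) + exp(-d_neg)) should approach 1,
    # which is cross-entropy against class 0 over these two logits.
    logits = torch.stack([-d_pos, -d_neg], dim=1)  # (B, 2)
    target = torch.zeros(anchor.size(0), dtype=torch.long,
                         device=anchor.device)
    return F.cross_entropy(logits, target)
```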
Cross-domain image matching (testing)
Input observed image → feature embedding → nearest neighbor among product images (known and novel objects) → match!
Pre-trained ImageNet features
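At test time, matching can be as simple as a nearest-neighbor lookup in the embedding space. A minimal sketch with hypothetical names, assuming the product-image embeddings are precomputed:

```python
# Illustrative nearest-neighbor matching of an observed image embedding
# against precomputed product-image embeddings (names are hypothetical).
import numpy as np

def match_product(observed_feat, product_feats, product_ids):
    """Return the product id whose embedding is closest in l2 distance."""
    dists = np.linalg.norm(product_feats - observed_feat, axis=1)
    best = int(np.argmin(dists))
    return product_ids[best], float(dists[best])
```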