Best of both worlds: Human-machine collaboration for object annotation. Fei-Fei Li (Stanford U.), Olga Russakovsky (Stanford U.), Li-Jia Li (Snapchat). CVPR 2015
Strawberry, Flute, Traffic light, Backpack, Matchstick, Bathing cap, Sea lion, Racket
Large-scale recognition: need benchmark datasets
PASCAL VOC 2005-2012: 20 object classes, 22,591 images. Tasks: Classification (person, motorcycle); Detection; Segmentation; Action (riding bicycle). Image labels: Person, Motorcycle. Everingham, Van Gool, Williams, Winn and Zisserman. The PASCAL Visual Object Classes (VOC) Challenge. IJCV 2010.
Large Scale Visual Recognition Challenge (ILSVRC) 2010-2014. For comparison, PASCAL VOC: 20 object classes, 22,591 images. ILSVRC DET: 200 object classes, 517,840 images. ILSVRC CLS-LOC: 1000 object classes, 1,431,167 images. Image labels: Person, Person, Person, Person, Dog. http://image-net.org/challenges/LSVRC/
ILSVRC types of image annotations
• Image classification: one object class per image; no bounding boxes. Example: Steel drum. 1,000 object classes; 1,331,167 images. Cost: $
• Single-object localization: one object class per image; bounding boxes around all instances of this class. Example: Steel drum. 1,000 object classes; 573,966 images; 657,231 bounding boxes. Cost: $$
• Object detection: all target object classes; bounding boxes around all instances. Example: Person, Car, Motorcycle, Helmet. 200 object classes; 81,799 images; 228,981 bounding boxes. Cost: $$$
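The counts on the slide above also indicate how densely each annotation type labels an image. The short calculation below is only a sketch of that arithmetic; the dictionary layout and the `boxes_per_image` helper are illustrative names, not part of ILSVRC or of the talk.

```python
# Annotation counts exactly as stated on the slide above.
annotation_types = {
    "image classification": {"classes": 1000, "images": 1_331_167, "boxes": 0},
    "single-object localization": {"classes": 1000, "images": 573_966, "boxes": 657_231},
    "object detection": {"classes": 200, "images": 81_799, "boxes": 228_981},
}


def boxes_per_image(stats):
    """Average number of annotated bounding boxes per image."""
    return stats["boxes"] / stats["images"]


# Hypothetical summary: label density for each annotation type.
density = {name: boxes_per_image(stats) for name, stats in annotation_types.items()}

for name, value in density.items():
    print(f"{name}: {value:.2f} boxes per image")
```

On these numbers, object detection carries more than twice as many boxes per image as single-object localization, which is consistent with its higher cost marking on the slide.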
Q: How good is scene understanding with ILSVRC?
• An unknown image
• ILSVRC image classification: Table
• ILSVRC single-object localization: Table
• ILSVRC object detection, state-of-the-art output (removing wrong detections): Person, Table, Person, Backpack, Table, TV
• ILSVRC object detection, all instances of the 200 target objects: Cup, Lamp, Lamp, Cup, Potted Plant, Cup, Potted Plant, Person, Tape player, Table, Person, Potted Plant, Couch, Backpack, Table, TV, Couch, Table
One unsolved question: What would it take to recognize all the objects here?
The accuracy/cost tradeoff (plot axes: Cost; Label quantity and quality per image)
• Dense manual annotation: high accuracy, many objects, huge cost
• Fully automatic object detection: low cost, low accuracy, few objects
• Marked on the plot: ☺
• Crowd engineering is improving. Humans need short, focused tasks (Data annotation)
• Object detectors are improving. Object detectors are reasonably accurate on some classes (Algorithms)
O Russakovsky et al. Best of both worlds: human-machine collaboration for object annotation. CVPR 2015.
Human-machine collaboration for object annotation
• Input image and constraints
• Detections: for every box B, class C: P(det(B,C) | Image). Example: Pillow (0.8), Bed (0.5)
• Solicit feedback, with multiple types of human input: Is this a bed? Are there more pillows? Is there a fan? Outline another bed, if any. Name this object. Name another object: pillow, bed, what else? Is this an object?
• Update state: for every box B, class C: P(det(B,C) | Image, User input). Example: Pillow (0.9), Bed (0.6)
• Output detections
HCI in computer vision: Branson ECCV2010, Kovashka ICCV2011, Wah ICCV2011, Parkash ECCV2012, Biswas CVPR2013, Jain ICCV2013, Vondrick IJCV 2013, Wah CVPR2014, Vijayanarasimhan IJCV2014, Branson CVPR2014
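The "Update state" step moves from P(det(B,C) | Image) to P(det(B,C) | Image, User input). One simple way to picture such an update is Bayes' rule applied to a single yes/no answer from a human who is sometimes wrong. The sketch below assumes exactly that; it is not the model from the paper. The function name and the `worker_accuracy` value are hypothetical, and the numbers it produces are illustrative, not a reproduction of the 0.9 and 0.6 shown on the slide.

```python
def update_detection_probability(prior, answer_is_yes, worker_accuracy=0.9):
    """Posterior P(det(B, C) | Image, User input) after one yes/no answer.

    prior           -- P(det(B, C) | Image), as produced by the detector
    answer_is_yes   -- the human's answer to a question such as "Is this a bed?"
    worker_accuracy -- ASSUMED probability that the human answers correctly
    """
    if not 0.0 <= prior <= 1.0:
        raise ValueError("prior must be a probability")
    # Likelihood of the observed answer under each hypothesis.
    if answer_is_yes:
        p_answer_if_object = worker_accuracy
        p_answer_if_no_object = 1.0 - worker_accuracy
    else:
        p_answer_if_object = 1.0 - worker_accuracy
        p_answer_if_no_object = worker_accuracy
    evidence = prior * p_answer_if_object + (1.0 - prior) * p_answer_if_no_object
    if evidence == 0.0:
        return prior  # the answer is impossible under the assumptions; keep the prior
    return prior * p_answer_if_object / evidence


# Detector scores taken from the slide; the answers are hypothetical.
detections = {"Pillow": 0.8, "Bed": 0.5}
updated = {name: update_detection_probability(p, True) for name, p in detections.items()}
rejected_bed = update_detection_probability(detections["Bed"], False)
```

A "yes" raises the probability of the detection and a "no" lowers it, with the size of the change governed by how much the answer is trusted.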
Some qualitative results
• Computer: Object Detection
• Computer: Verify-box: Is the yellow box tight around a car? Human: Answer: No
• …
• Computer: Draw-box: Draw a box around a person. Human: Answer: Yellow box below
• Computer: Final Labeling: Car, Person
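The exchange above mixes quick questions (verify-box) with slower ones (draw-box), so the order in which questions are asked matters under a limited budget. The sketch below shows one possible greedy heuristic: rank candidate questions by expected gain per second of human time. It is not the selection procedure from the paper. The `Question` class, `plan_questions`, and every gain and cost value are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Question:
    text: str
    expected_gain: float  # ASSUMED expected improvement in the labeling
    cost_seconds: float   # ASSUMED human time needed to answer


def plan_questions(candidates, budget_seconds):
    """Greedily pick questions by expected gain per second until the budget runs out."""
    remaining = budget_seconds
    plan = []
    ranked = sorted(
        candidates,
        key=lambda q: q.expected_gain / q.cost_seconds,
        reverse=True,
    )
    for question in ranked:
        # Skip any question that no longer fits in the remaining budget.
        if question.cost_seconds <= remaining:
            plan.append(question)
            remaining -= question.cost_seconds
    return plan


# Question texts come from the slides; gains and costs are made up for illustration.
candidates = [
    Question("Verify-box: Is the yellow box tight around a car?", 0.30, 5.0),
    Question("Draw-box: Draw a box around a person", 0.60, 20.0),
    Question("Is there a fan?", 0.05, 4.0),
]
plan = plan_questions(candidates, budget_seconds=26.0)
small_plan = plan_questions(candidates, budget_seconds=4.0)
```

With these made-up values the cheap verification is asked first and the more expensive drawing task second, while a very small budget leaves room only for the quickest question.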