
Best of both worlds: Human-machine collaboration for object annotation



  1. Best of both worlds: Human-machine collaboration for object annotation. Fei-Fei Li (Stanford U.), Olga Russakovsky (Stanford U.), Li-Jia Li (Snapchat). CVPR 2015

  2. Backpack

  3. Strawberry, Flute, Traffic light, Backpack, Matchstick, Bathing cap, Sea lion, Racket

  4. Large-scale recognition

  5. Large-scale recognition: need benchmark datasets

  6. PASCAL VOC 2005-2012: 20 object classes, 22,591 images. Tasks: classification (e.g. person, motorcycle), detection and segmentation (example labels: Person, Motorcycle), and action classification (e.g. riding bicycle). Everingham, Van Gool, Williams, Winn and Zisserman. The PASCAL Visual Object Classes (VOC) Challenge. IJCV 2010.

  7. Large Scale Visual Recognition Challenge (ILSVRC) 2010-2014. Compared with PASCAL VOC (20 object classes, 22,591 images), ILSVRC DET has 200 object classes and 517,840 images, and ILSVRC CLS-LOC has 1,000 object classes and 1,431,167 images. Example labels: Person, Dog. http://image-net.org/challenges/LSVRC/

  8. ILSVRC types of image annotations. Image classification: one object class per image, no bounding boxes (example: steel drum); 1,000 object classes, 1,331,167 images; relative cost $.

  9. ILSVRC types of image annotations. Image classification: one object class per image, no bounding boxes; 1,000 object classes, 1,331,167 images; relative cost $. Single-object localization: one object class per image, bounding boxes around all instances of this class (example: steel drum); 1,000 object classes, 573,966 images, 657,231 bounding boxes; relative cost $$.

  10. ILSVRC types of image annotations. Image classification: one object class per image, no bounding boxes; 1,000 object classes, 1,331,167 images; relative cost $. Single-object localization: one object class per image, bounding boxes around all instances of this class; 1,000 object classes, 573,966 images, 657,231 bounding boxes; relative cost $$. Object detection: all target object classes, bounding boxes around all instances (example labels: Person, Car, Motorcycle, Helmet); 200 object classes, 81,799 images, 228,981 bounding boxes; relative cost $$$.
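The counts on slides 9 and 10 also show how much denser detection labels are than localization labels. The short Python check below simply divides boxes by images for each task; it uses only the numbers quoted on those slides and says nothing about the actual dollar cost behind the $ symbols. The dictionary and function names are illustrative.

```python
# Image and bounding-box counts as quoted on the ILSVRC slides (illustrative names).
ANNOTATION_STATS = {
    "single-object localization": {"images": 573_966, "boxes": 657_231},
    "object detection": {"images": 81_799, "boxes": 228_981},
}


def boxes_per_image(images: int, boxes: int) -> float:
    """Average number of annotated bounding boxes per image."""
    return boxes / images


if __name__ == "__main__":
    for task, counts in ANNOTATION_STATS.items():
        ratio = boxes_per_image(counts["images"], counts["boxes"])
        print(f"{task}: {ratio:.2f} boxes per image")
    # single-object localization: 1.15 boxes per image
    # object detection: 2.80 boxes per image
```

A detection image therefore carries about 2.8 boxes on average versus about 1.15 for localization, which is consistent with the slide ranking detection as the most expensive annotation type.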

  11. Q: How good is scene understanding with ILSVRC?

  12. Q: How good is scene understanding with ILSVRC? Input: an unknown image.

  13. Q: How good is scene understanding with ILSVRC? ILSVRC image classification output: Table

  14. Q: How good is scene understanding with ILSVRC? ILSVRC single-object localization output: Table

  15. Q: How good is scene understanding with ILSVRC? ILSVRC object detection, state-of-the-art output (wrong detections removed): Person, Table, Person, Backpack, Table, TV

  16. Q: How good is scene understanding with ILSVRC? ILSVRC object detection, all instances of the 200 target objects: Cup (×3), Lamp (×2), Potted plant (×3), Person (×2), Tape player, Table (×3), Couch (×2), Backpack, TV

  17. One unsolved question: what would it take to recognize all the objects here?

  18. The accuracy/cost tradeoff (plot of cost vs. label quantity and quality per image). Dense manual annotation: high accuracy, many objects, huge cost.

  19. The accuracy/cost tradeoff (same plot). Dense manual annotation: high accuracy, many objects, huge cost. Fully automatic object detection: low cost, low accuracy, few objects.

  20. The accuracy/cost tradeoff (same plot). Dense manual annotation: high accuracy, many objects, huge cost. Fully automatic object detection: low cost, low accuracy, few objects. The ☺ marks the goal.

  21. The accuracy/cost tradeoff (same plot). Dense manual annotation: high accuracy, many objects, huge cost; crowd engineering is improving. Fully automatic object detection: low cost, low accuracy, few objects. The ☺ marks the goal.

  22. The accuracy/cost tradeoff (same plot). Dense manual annotation: high accuracy, many objects, huge cost; crowd engineering is improving. Data annotation: humans need short, focused tasks. Fully automatic object detection: low cost, low accuracy, few objects. The ☺ marks the goal.

  23. The accuracy/cost tradeoff (same plot). Dense manual annotation: high accuracy, many objects, huge cost; crowd engineering is improving. Fully automatic object detection: low cost, low accuracy, few objects; object detectors are improving. The ☺ marks the goal.

  24. The accuracy/cost tradeoff (same plot). Dense manual annotation: high accuracy, many objects, huge cost; crowd engineering is improving. Fully automatic object detection: low cost, low accuracy, few objects; object detectors are improving. Algorithms: object detectors are reasonably accurate on some classes. The ☺ marks the goal.

  25. The accuracy/cost tradeoff (same plot). Dense manual annotation: high accuracy, many objects, huge cost; crowd engineering is improving. Fully automatic object detection: low cost, low accuracy, few objects; object detectors are improving. The ☺ marks the goal. O Russakovsky et al. Best of both worlds: human-machine collaboration for object annotation. CVPR 2015.

  26. Human-machine collaboration for object annotation

  27. Human-machine collaboration for object annotation. Input: image and constraints.

  28. Human-machine collaboration for object annotation. Input: image and constraints. Detections: for every box B and class C, the detector gives P(det(B,C) | Image), e.g. Pillow (0.8), Bed (0.5).

  29. Human-machine collaboration for object annotation. Input: image and constraints. Detections: for every box B and class C, P(det(B,C) | Image), e.g. Pillow (0.8), Bed (0.5). Solicit feedback through multiple types of human input: "Is this a bed?", "Are there more pillows?", "Is there a fan?", "Outline another bed, if any", "Name this object", "Name another object: pillow, bed, what else?", "Is this an object?"
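Slide 29 lists several kinds of questions but not how the system chooses among them. One common way to frame that choice, offered here only as an illustrative stand-in and not as the paper's criterion, is to ask the question whose answer is expected to remove the most uncertainty per second of human effort. The sketch assumes each question is a yes/no check on one fact with a prior probability, that a human answers correctly with a fixed probability `accuracy`, and that each question has a known time cost. `Question`, `info_gain`, `pick_question`, and all priors, accuracies, and costs are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Question:
    text: str        # e.g. a question type from the slide
    prior: float     # current probability that the true answer is "yes"
    accuracy: float  # assumed P(human answer is correct), 0.5 < accuracy <= 1
    cost_s: float    # assumed human time to answer, in seconds


def entropy(p: float) -> float:
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def info_gain(prior: float, accuracy: float) -> float:
    """Expected reduction in entropy of a yes/no fact after one noisy human answer."""
    p_yes = prior * accuracy + (1 - prior) * (1 - accuracy)  # P(human says yes)
    p_no = 1 - p_yes
    post_yes = prior * accuracy / p_yes if p_yes > 0 else prior
    post_no = prior * (1 - accuracy) / p_no if p_no > 0 else prior
    expected_after = p_yes * entropy(post_yes) + p_no * entropy(post_no)
    return entropy(prior) - expected_after


def pick_question(questions: List[Question]) -> Question:
    """Greedy choice: the question with the most expected bits per second of human time."""
    return max(questions, key=lambda q: info_gain(q.prior, q.accuracy) / q.cost_s)


candidates = [
    Question("Is this a bed?", prior=0.5, accuracy=0.9, cost_s=2.0),
    Question("Are there more pillows?", prior=0.3, accuracy=0.9, cost_s=3.0),
    Question("Is there a fan?", prior=0.1, accuracy=0.9, cost_s=3.0),
]
print(pick_question(candidates).text)  # Is this a bed?
```

The bed question wins here because a 50/50 prior is where a single answer is most informative, and it is also the cheapest; a real system would also weigh how much each answer matters for the final labeling, not just raw uncertainty.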

  30. Human-machine collaboration for object annotation. Input: image and constraints. Detections: for every box B and class C, P(det(B,C) | Image). Solicit feedback (the same question types as on slide 29). Update state: P(det(B,C) | Image, User input), e.g. Pillow (0.9), Bed (0.6).
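Slide 30 conditions the probabilities on user input as well, but does not say how that update is computed. One simple model, assumed here rather than taken from the paper, treats each answer to a verify question such as "Is this a bed?" as a noisy observation and applies Bayes' rule. The `accuracy` parameter (how often a human answers correctly) and the function names are hypothetical, and the numbers are not meant to reproduce the slide's Bed (0.5) to (0.6) change, which depends on whatever input the system actually received.

```python
from typing import List


def update(prior: float, said_yes: bool, accuracy: float) -> float:
    """Posterior P(det(B, C) | Image, answer) after one noisy yes/no answer, via Bayes' rule.
    Assumes 0 < prior < 1 and that the human is right with probability `accuracy`
    whether or not the object is really there (hypothetical noise model)."""
    p_ans_if_present = accuracy if said_yes else 1 - accuracy
    p_ans_if_absent = 1 - accuracy if said_yes else accuracy
    numerator = p_ans_if_present * prior
    return numerator / (numerator + p_ans_if_absent * (1 - prior))


def update_all(prior: float, answers: List[bool], accuracy: float) -> float:
    """Apply several answers in sequence, treating them as independent."""
    p = prior
    for said_yes in answers:
        p = update(p, said_yes, accuracy)
    return p


print(round(update(0.5, True, 0.9), 3))                 # 0.9
print(round(update(0.8, True, 0.9), 3))                 # 0.973
print(round(update_all(0.5, [True, False], 0.9), 3))    # 0.5
```

Because each answer multiplies the odds by accuracy / (1 - accuracy) or its inverse, a "yes" followed by a "no" from equally reliable annotators cancels out and returns the prior.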

  31. Human-machine collaboration for object annotation. Input: image and constraints. Detections: for every box B and class C, P(det(B,C) | Image). Solicit feedback (the same question types as on slide 29). Update state: P(det(B,C) | Image, User input), e.g. Pillow (0.9), Bed (0.6). Output detections.
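Slide 31 closes the pipeline by emitting output detections. The toy loop below strings the steps together under deliberately simple assumptions that are not taken from the paper: the only question type is "Is this a <class>?", the next detection to verify is the one whose probability is closest to 0.5 (plain uncertainty sampling, swapped in for whatever selection rule the system actually uses), the human is right with a fixed probability, and the session stops after a fixed number of questions. `Det`, `annotate`, the `ask` callback, the fan detection, and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Det:
    name: str    # hypothetical id for one (box B, class C) pair, e.g. "bed"
    prob: float  # current P(det(B, C) | Image, user input)


def annotate(dets: List[Det], ask: Callable[[str], bool], budget: int,
             accuracy: float = 0.9, threshold: float = 0.5) -> List[str]:
    """Toy human-in-the-loop labeling: verify the most uncertain detection until the
    question budget runs out, then output detections whose probability >= threshold.
    `ask(name)` stands in for a human answering "Is this a <class>?"."""
    asked = set()
    for _ in range(budget):
        pending = [d for d in dets if d.name not in asked]
        if not pending:
            break
        d = min(pending, key=lambda x: abs(x.prob - 0.5))  # closest to a coin flip
        said_yes = ask(d.name)
        asked.add(d.name)
        # Bayes' rule with a human who is right with probability `accuracy`
        p_ans_if_present = accuracy if said_yes else 1 - accuracy
        p_ans_if_absent = 1 - accuracy if said_yes else accuracy
        num = p_ans_if_present * d.prob
        d.prob = num / (num + p_ans_if_absent * (1 - d.prob))
    return [d.name for d in dets if d.prob >= threshold]


truth = {"pillow": True, "bed": True, "fan": False}  # hypothetical ground truth
print(annotate([Det("pillow", 0.8), Det("bed", 0.5), Det("fan", 0.55)],
               lambda n: truth[n], budget=0))  # ['pillow', 'bed', 'fan']
print(annotate([Det("pillow", 0.8), Det("bed", 0.5), Det("fan", 0.55)],
               lambda n: truth[n], budget=2))  # ['pillow', 'bed']
```

With no questions the hypothetical fan at 0.55 is output as a false positive; two answers are enough to confirm the bed and reject the fan.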

  32. Human-machine collaboration for object annotation. Input: image and constraints. Detections: for every box B and class C, P(det(B,C) | Image). Solicit feedback (the same question types as on slide 29). Update state: P(det(B,C) | Image, User input), e.g. Pillow (0.9), Bed (0.6). Output detections. Related work on HCI in computer vision: Branson ECCV 2010; Kovashka ICCV 2011; Wah ICCV 2011; Parkash ECCV 2012; Biswas CVPR 2013; Jain ICCV 2013; Vondrick IJCV 2013; Branson CVPR 2014; Vijayanarasimhan IJCV 2014; Wah CVPR 2014.

  33. Some qualitative results. Computer: object detection ...

  34. Some qualitative results. Computer: object detection. Computer: Verify-box, "Is the yellow box tight around a car?" Human answer: No. ...

  35. Some qualitative results. Computer: object detection. Computer: Verify-box, "Is the yellow box tight around a car?" Human answer: No. ... Computer: Draw-box, "Draw a box around a person." Human answer: the yellow box below. ...

  36. Some qualitative results. Computer: object detection. Computer: Verify-box, "Is the yellow box tight around a car?" Human answer: No. ... Computer: Draw-box, "Draw a box around a person." Human answer: the yellow box below. ... Computer: final labeling: Car, Person.
