The Multidimensional Wisdom of Crowds


  1. The Multidimensional Wisdom of Crowds Welinder P., Branson S., Belongie S., Perona, P Experiment Presentation [CS395T] Visual Recognition Fall 2012 Presented by: Niveda Krishnamoorthy

  2. Problem Overview

  3. Slides from http://videolectures.net/nips2010_welinder_mwc/

  4. Slides from http://videolectures.net/nips2010_welinder_mwc/

  5. Slides from http://videolectures.net/nips2010_welinder_mwc/

  6. Slides from http://videolectures.net/nips2010_welinder_mwc/

  7. Slides from http://videolectures.net/nips2010_welinder_mwc/

  8. Motivation

  9. Distribution of Human Expertise – Task: Finding bluebirds (Experiment #1). [Scatter plot of annotators: rate of correct detection vs. rate of correct rejection, with the 50% error line marked]

  10. Same plot, highlighting "Bots" (annotators near 50% error)

  11. Same plot, highlighting "Competent" annotators

  12. Same plot, highlighting "Pessimistic" annotators (axes labeled as fraction of true positives vs. fraction of true negatives)

  13. Same plot, highlighting "Optimistic" annotators

  14. Same plot, highlighting "Adversarial" annotators
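Slides 9 to 14 place each annotator on a plot of correct-detection rate against correct-rejection rate. Below is a minimal sketch of how such a plot can be produced from a matrix of binary votes and ground-truth labels; the variable names, the random toy data, and the use of numpy/matplotlib are assumptions, not from the slides.

```python
import numpy as np
import matplotlib.pyplot as plt

def annotator_rates(labels, truth):
    """Per-annotator rate of correct detection (TPR) and correct rejection (TNR).

    labels: (n_annotators, n_images) array of 0/1 votes, NaN where an annotator
            did not label the image.
    truth:  (n_images,) array of ground-truth 0/1 labels.
    """
    detection, rejection = [], []
    for votes in labels:
        seen = ~np.isnan(votes)
        pos, neg = seen & (truth == 1), seen & (truth == 0)
        detection.append(np.mean(votes[pos] == 1) if pos.any() else np.nan)
        rejection.append(np.mean(votes[neg] == 0) if neg.any() else np.nan)
    return np.array(detection), np.array(rejection)

# Toy data: 30 annotators guessing at random land near the 50% error line
# (the anti-diagonal, assuming balanced classes).
rng = np.random.default_rng(0)
truth = rng.integers(0, 2, size=200)
labels = rng.integers(0, 2, size=(30, 200)).astype(float)

tpr, tnr = annotator_rates(labels, truth)
plt.scatter(tpr, tnr)
plt.plot([0, 1], [1, 0], "k--", label="50% error")
plt.xlabel("Rate of correct detection")
plt.ylabel("Rate of correct rejection")
plt.legend()
plt.show()
```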

  15. The Idea

  16. Slides from http://videolectures.net/nips2010_welinder_mwc/

  17. Slides from http://videolectures.net/nips2010_welinder_mwc/

  18. Slides from http://videolectures.net/nips2010_welinder_mwc/

  19. Slides from http://videolectures.net/nips2010_welinder_mwc/
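The idea sketched in these slides, following the paper, is that each image i carries a latent signal x_i; annotator j perceives a noisy copy y_ij, projects it onto a personal direction w_j (their area of expertise), and reports a positive label when the projection exceeds a bias threshold tau_j. A minimal simulation of that signal-plus-threshold process is below; all parameter values, dimensions, and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_images, n_annotators, dim = 100, 20, 2

# Latent image signals x_i: two classes drawn from two Gaussians in a 2-D space.
z = rng.integers(0, 2, size=n_images)                      # true class per image
x = rng.normal(loc=2.0 * z[:, None] - 1.0, scale=0.6, size=(n_images, dim))

# Annotator parameters: viewing direction w_j (expertise), bias tau_j,
# and a shared noise level standing in for (lack of) competence.
w = rng.normal(size=(n_annotators, dim))
w /= np.linalg.norm(w, axis=1, keepdims=True)
tau = rng.normal(scale=0.3, size=n_annotators)
sigma = 0.5

# Each annotator perceives y_ij = x_i + noise and answers 1 if w_j . y_ij > tau_j.
y = x[None, :, :] + rng.normal(scale=sigma, size=(n_annotators, n_images, dim))
labels = (np.einsum("jd,jid->ji", w, y) > tau[:, None]).astype(int)

print("agreement of annotator 0 with the true class:", np.mean(labels[0] == z))
```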

  20. Error Rate for Bluebirds dataset

  21. Estimating Image Difficulty – Complex Images

  22. 1D clusters from learned x_i values. Dataset: Bluebirds

  23. 1D clusters from learned x_i values. Dataset: Bluebirds

  24. How do these learned image complexities compare with vision-based techniques? Vision-based measure: predicted time* to label an image as a measure of image complexity. Approach: extract 2804-d feature vectors for the MSRC dataset (pyramid of HOG, color histogram, grayscale histogram, spatial pyramid of edge density from Canny edges); train a regressor on the top 200 features selected using ReliefF; predict the time to label images in the bluebirds dataset. (*What's It Going to Cost You?: Predicting Effort vs. Informativeness for Multi-Label Image Annotations. S. Vijayanarasimhan and K. Grauman. CVPR 2009)
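The slide 24 pipeline can be sketched roughly as below, with scikit-image for the features and scikit-learn for the regressor. This is only an approximation of the described setup: ReliefF is replaced by a mutual-information ranking as a stand-in, the feature dimensions do not add up to 2804, and the images and labeling times are random placeholders for the real MSRC and bluebirds data.

```python
import numpy as np
from skimage import color, feature
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.feature_selection import SelectKBest, mutual_info_regression
from sklearn.pipeline import make_pipeline

def image_features(img):
    """HOG + color histogram + grayscale histogram + coarse Canny edge density.

    img: 128x128x3 float image in [0, 1] (real images would be loaded from disk
    and resized to this shape first).
    """
    gray = color.rgb2gray(img)
    hog = feature.hog(gray, pixels_per_cell=(16, 16), cells_per_block=(2, 2))
    color_hist = np.concatenate([np.histogram(img[..., c], bins=32, range=(0, 1))[0]
                                 for c in range(3)])
    gray_hist = np.histogram(gray, bins=32, range=(0, 1))[0]
    edge_density = feature.canny(gray).reshape(4, 32, 4, 32).mean(axis=(1, 3)).ravel()
    return np.concatenate([hog, color_hist, gray_hist, edge_density])

# Stand-in data: random arrays in place of the real MSRC images, their measured
# annotation times, and the bluebird images to score (all hypothetical).
rng = np.random.default_rng(0)
train_imgs, train_times = rng.random((40, 128, 128, 3)), 30 * rng.random(40)
test_imgs = rng.random((10, 128, 128, 3))

model = make_pipeline(SelectKBest(mutual_info_regression, k=200),
                      GradientBoostingRegressor())
model.fit(np.array([image_features(im) for im in train_imgs]), train_times)
predicted_time = model.predict(np.array([image_features(im) for im in test_imgs]))
print(predicted_time[:5])
```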

  25. Vision-based complexity vs. learned image complexity. [Scatter plot: predicted time to label an image vs. learned image complexity]

  26. Less time to label images at the ends of the learned-complexity range

  27. Longer time to label images towards the center

  28. Qualitative Comparison

  29. Complex Images – Examples: false negatives, learned x_i values -0.1729917846, -0.9812171195, -0.032638129, -0.4584787866, -0.2354540127, -0.3173699459

  30. Complex Images – Examples: false positives, learned x_i values 0.1455767129, 0.0405159085, 0.1051552874, 0.2087033725, 0.5611137687, 0.478586944, 0.02178887

  31. Easy Images – Examples: true negatives, learned x_i values -1.7174006439, -1.4096763371, -1.1632084806, -1.339038233, -1.7104861404, -1.4764330722, -1.214442414

  32. Easy Images – Examples: true positives, learned x_i values 1.0293133893, 1.1407150551, 1.0859884692, 1.1461191967, 1.077287623

  33. Task: Finding ducks. Slides from http://videolectures.net/nips2010_welinder_mwc/

  34. Slides from http://videolectures.net/nips2010_welinder_mwc/

  35. 2D clusters from learned x_i values. Dataset: Ducks. Recreated in MATLAB

  36. Vision-based complexity vs. learned image complexity. Recreated in MATLAB: point size is proportional to the predicted time needed to label the image

  37. Images along the outer edge of the cluster take longer to label

  38. Similarly

  39. Images in the wrong clusters take longer to label

  40. Images at the center can also take longer to label. Why?
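The MATLAB recreation on slides 36 to 40 overlays the vision-based measure on the learned 2-D x_i values by making point size proportional to the predicted labeling time. An equivalent plot takes a few lines in Python; the data below are synthetic stand-ins for the learned parameters and predicted times, not values from the experiments.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
# Stand-ins for the learned 2-D image parameters x_i (two clusters)
# and for the predicted time to label each image.
x = np.vstack([rng.normal(-1.0, 0.5, (60, 2)), rng.normal(1.0, 0.5, (60, 2))])
predicted_time = rng.uniform(5, 30, size=len(x))   # seconds, made up

plt.scatter(x[:, 0], x[:, 1], s=10 * predicted_time, alpha=0.5)
plt.xlabel("learned $x_{i,1}$")
plt.ylabel("learned $x_{i,2}$")
plt.title("Point size proportional to predicted time to label")
plt.show()
```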

  41. Discussion: Is vision-based image complexity a good indicator of difficulty in labeling an image? What are the other factors?

  42. Discussion: Is vision-based image complexity a good indicator of difficulty in labeling an image? What are the other factors? Bird pose, occlusions, lighting

  43. Discussion: 1. The authors experiment only with a 2-dimensional model of human expertise. How would this model perform as the number of intrinsic dimensions increases?

  44. Extending this approach to a video dataset: the YouTube corpus

  45. Example YouTube video with descriptions (http://youtu.be/FYyqIJ36dSU): "A french bulldog is playing with a big ball." "A small dog chases a big ball." "A French bulldog is running fast and playing with a blue yoga ball all by himself in a field." "The little dog pushed a big blue ball." "A dog is playing with a very large ball." "A dog chases a giant rubber ball around." "A dog is playing with ball."

  46. Approach: The YouTube corpus is not directly suited to this task. Consider predicting the presence of the activity "run". 1. Selected 50 videos where "run" was the predicted activity using majority voting. 2. Selected 30 videos where "play" was the predicted activity using majority voting. 3. Selected 20 videos where "walk" was the predicted activity using majority voting. Ground-truth labels were assigned accordingly. Each video had a variable number of annotators; the 20 most frequent annotators were kept.
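A sketch of this subsampling step, assuming the raw annotations are available as (video, annotator, activity) triples; the column names, the toy data, and the pandas-based implementation are assumptions, and the real selection used the 50/30/20 video counts and top 20 annotators described above.

```python
import pandas as pd

# Hypothetical raw annotations: one row per (video, annotator, predicted activity).
ann = pd.DataFrame({
    "video":     ["v1", "v1", "v1", "v2", "v2", "v3", "v3", "v3"],
    "annotator": ["a1", "a2", "a3", "a1", "a4", "a2", "a3", "a4"],
    "activity":  ["run", "run", "play", "play", "play", "walk", "walk", "run"],
})

# Majority-voted activity per video.
majority = (ann.groupby("video")["activity"]
              .agg(lambda s: s.value_counts().idxmax()))

# Keep videos whose majority activity is "run", "play", or "walk";
# ground truth for the binary "run" task is 1 for "run" videos, 0 otherwise.
selected = majority[majority.isin(["run", "play", "walk"])]
truth = (selected == "run").astype(int)

# Keep only the most frequent annotators (top 2 here; top 20 in the real setup).
top = ann["annotator"].value_counts().head(2).index
votes = (ann[ann["annotator"].isin(top) & ann["video"].isin(selected.index)]
         .assign(says_run=lambda d: (d["activity"] == "run").astype(int))
         .pivot_table(index="annotator", columns="video", values="says_run"))
print(truth, votes, sep="\n")
```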

  47. Results: subsampled "RUN" data

  48. 1D clusters from learned x_i values. Dataset: YouTube videos

  49. How do these learned video complexities compare with vision-based techniques? Vision-based measures: number of STIPs in the video, and STIP density. (*Learning Realistic Human Actions from Movies. I. Laptev, M. Marszałek, C. Schmid and B. Rozenfeld. CVPR 2008)
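Slide 49's measures come from Laptev-style space-time interest point (STIP) detections. A small sketch of turning a per-video detection file into those two numbers follows; the text format assumed here (one detection per line, '#' comment lines) and the definition of density as detections per frame are assumptions, not details from the slides.

```python
def stip_stats(stip_file, n_frames):
    """Return (number of STIPs, STIP density) for one video.

    Assumes a plain-text detection file with one detection per line and
    comment lines starting with '#'; density is taken as detections per frame.
    """
    with open(stip_file) as f:
        n_stips = sum(1 for line in f if line.strip() and not line.startswith("#"))
    return n_stips, n_stips / max(n_frames, 1)

# Toy example with a made-up detection file.
with open("example_stips.txt", "w") as f:
    f.write("# point-type y x t sigma2 tau2\n")
    f.write("1 60 80 12 4 2\n")
    f.write("1 30 40 55 4 2\n")
print(stip_stats("example_stips.txt", n_frames=240))
```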

  50. Vision-based complexity vs. learned image complexity. [Scatter plot: number of STIPs vs. learned image complexity]

  51. Vision-based complexity vs. learned image complexity. [Scatter plot: STIP density vs. learned image complexity]

  52. Not much correlation, but the false negatives seem to have a higher STIP density. [Scatter plot: STIP density vs. learned image complexity, with PLAY/WALK and RUN videos marked]

  53. Discussion: How can we quantify the complexity of a video? STIP density? Video length? Variety in STIPs? Confusion amongst multiple annotators? How can we quantify the effort involved in labeling a video? How do these relate to video ambiguity?

  54. Qualitative Comparison – True positive: http://youtu.be/NKm8c_7mgx4 [marked on the STIP density vs. learned image complexity plot]

  55. Qualitative Comparison – True negative: http://youtu.be/abiezv1p7SY [marked on the STIP density vs. learned image complexity plot]

  56. Qualitative Comparison – False positive: http://youtu.be/1l9Hx1kX_tQ [marked on the STIP density vs. learned image complexity plot]

  57. Qualitative Comparison – False negative: http://youtu.be/8miosT-Fs1k [marked on the STIP density vs. learned image complexity plot]

  58. Strengths: 1. Each annotator is modeled as a multi-dimensional entity – competence, expertise, bias. 2. Can be extended to any domain to estimate the ground truth with the least error. 3. Models image complexities without even seeing the image. 4. The model discovers groups of annotators with varying skill sets.

  59. Discussion 1. Image difficulties are learned from human annotations only, which is great! But would the model perform better if image difficulty was incorporated as a known parameter (using some vision-based technique) into the graphical model?

  60. ?
