The Promise and Perils of Big Data
Some slides from A. Efros and A. Torralba

Why do we need data? Most problems in vision are ambiguous and hard:
• 2D -> 3D
• Segmentation/edges

So, how do we solve these problems? The magic of data!


  1. Step 4: Score Word-String Candidates
     • Scoring of candidates based on:
       – Proximity (minimize extraneous words in target n-gram ≈ precision)
       – Number of word matches (maximize coverage ≈ recall)
       – Regular words given more weight than function words
       – Combine results (e.g., optimize F1 or p-norm or …)
     Example (slide headers: Target Word-String Candidates, “Regular” Words, Word Matches, Proximity, Total Scoring):
       T3-b T(x) T2-d T(x) T(x) T6-c    3rd  3rd  ---  3rd  3rd
       T4-a T6-b T(x) T2-c T3-a         1st  2nd  ---  2nd  1st
       T3-c T2-b T4-e T5-a T6-a         1st  1st  ---  1st  1st
     Slide by Jaime Carbonell
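The scoring criteria on this slide can be sketched in Python. This is a minimal illustration under stated assumptions, not Carbonell's implementation. The function-word list, the 0.25 weight for function words, and the `expected` set of dictionary translations are all hypothetical. F1 is picked from the combiners the slide suggests (F1 or a p-norm).

```python
from typing import List, Sequence, Set, Tuple

# Hypothetical function-word list and weight: the slide only says regular
# words count more than function words, not which words or by how much.
FUNCTION_WORDS: Set[str] = {"a", "an", "the", "of", "and", "to", "in", "on"}
FUNCTION_WEIGHT = 0.25


def weight(word: str) -> float:
    """Regular words weigh 1.0; function words weigh FUNCTION_WEIGHT."""
    return FUNCTION_WEIGHT if word.lower() in FUNCTION_WORDS else 1.0


def score(candidate: Sequence[str], expected: Set[str]) -> float:
    """F1 of a proximity (precision-like) and a word-match (recall-like) score.

    `expected` is the set of target-language words the source n-gram should
    translate to, e.g. from a bilingual dictionary (assumed given here).
    """
    cand_total = sum(weight(w) for w in candidate)
    exp_total = sum(weight(w) for w in expected)
    if cand_total == 0 or exp_total == 0:
        return 0.0
    # Proximity ~ precision: extraneous (unmatched) candidate words lower it.
    precision = sum(weight(w) for w in candidate if w in expected) / cand_total
    # Word matches ~ recall: weighted fraction of expected words the candidate covers.
    recall = sum(weight(w) for w in set(candidate) & expected) / exp_total
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def rank(candidates: List[List[str]], expected: Set[str]) -> List[Tuple[float, List[str]]]:
    """Return (score, candidate) pairs sorted from best to worst."""
    scored = [(score(c, expected), c) for c in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Take `expected = {"soldier", "died", "two", "others"}`. With that set, `soldier died and two others` ranks first. The shorter `two others` comes next: it has full precision but only half the coverage. A string padded with extraneous words, such as `the soldier was reported to have died`, ranks last.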

  2. Step 5: Select Candidates Using Overlap (propagate context over the entire sentence)
     [Figure: candidate translations for Word-String 1, Word-String 2, and Word-String 3]
     Slide by Jaime Carbonell

  3. Step 5: Select Candidates Using Overlap
     Best translations selected via maximal overlap:
     Alternative 1:
       T(x2) T4-a T6-b T(x3) T2-c
       T4-a T6-b T(x3) T2-c T3-a
       T6-b T(x3) T2-c T3-a T(x8)
       Result: T(x2) T4-a T6-b T(x3) T2-c T3-a T(x8)
     Alternative 2:
       T(x1) T3-c T2-b T4-e
       T3-c T2-b T4-e T5-a T6-a
       T2-b T4-e T5-a T6-a T(x8)
       Result: T(x1) T3-c T2-b T4-e T5-a T6-a T(x8)
     Slide by Jaime Carbonell

  4. A (Simple) Real Example of Overlap
     • Flooding → n-gram fidelity
     • Overlap → long-range fidelity
     N-grams generated from flooding:
       a United States soldier
       United States soldier died
       soldier died and two others
       died and two others were injured
       two others were injured Monday
     N-grams connected via overlap:
       a United States soldier died and two others were injured Monday
     Slide by Jaime Carbonell
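The overlap step on slides 3 and 4 amounts to stitching fragments wherever the end of one matches the start of the next. The sketch below assumes the fragments are already in left-to-right order and greedily joins them at the longest suffix/prefix overlap. It also returns a `total` overlap. That total is one hypothetical way to compare alternatives and is not necessarily the criterion used in the original system.

```python
from typing import List, Sequence, Tuple


def overlap_len(left: Sequence[str], right: Sequence[str]) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if list(left[-k:]) == list(right[:k]):
            return k
    return 0


def merge_by_overlap(fragments: List[Sequence[str]]) -> Tuple[List[str], int]:
    """Join fragments left to right at their maximal suffix/prefix overlap.

    Returns the merged token list and the total number of overlapping tokens,
    which can serve as a simple score for comparing alternative chains.
    Fragments that do not overlap are just concatenated.
    """
    if not fragments:
        return [], 0
    merged = list(fragments[0])
    total = 0
    for frag in fragments[1:]:
        k = overlap_len(merged, frag)
        merged.extend(frag[k:])  # append only the non-overlapping tail
        total += k
    return merged, total
```

As a check, split the five flooded n-grams from slide 4 on spaces and merge them. The result is `a United States soldier died and two others were injured Monday`. The Alternative 1 and Alternative 2 fragments from slide 3 merge into their full-length strings the same way.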

  5. Texture Synthesis

  6. So, how do we use big data?

  7. Two Ways to Use Lots of Data
     • Brute-force vision: find that needle in the haystack and disregard the rest (a.k.a. kNN)
     • See what different subsets of data think of you

  8. kNN matching is great… • because we live in a (mostly) boring world!
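Brute-force kNN, the "needle in the haystack" strategy on slide 7, is a linear scan over the whole database. In this minimal sketch the database holds (label, descriptor) pairs. The vectors are toy values and the labels are hypothetical; real systems would use image descriptors such as the gist.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def knn(query: Sequence[float],
        database: List[Tuple[str, Sequence[float]]],
        k: int) -> List[Tuple[float, str]]:
    """Brute-force k-nearest-neighbor search under Euclidean distance.

    Every (label, descriptor) pair in `database` is compared with the query;
    only the k closest are kept, and the rest of the haystack is discarded.
    """
    distances = ((math.dist(query, vec), label) for label, vec in database)
    return heapq.nsmallest(k, distances)
```

The scan costs one distance computation per database item. It needs no index or training, which is why it pairs naturally with the "boring world" observation: most queries have close neighbors somewhere in a large enough collection.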

  9. Lots Of Images A. Torralba, R. Fergus, W. T. Freeman. PAMI 2008

  10. Lots Of Images A. Torralba, R. Fergus, W. T. Freeman. PAMI 2008

  11. Lots Of Images

  12. Automatic Colorization Result [Panels: grayscale input, high resolution; colorization of input using average] A. Torralba, R. Fergus, W. T. Freeman. 2008
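Slide 12's colorization can be read as kNN plus averaging: retrieve database images whose grayscale versions resemble the input, then average their colors. The sketch below assumes tiny, equal-sized images stored as flat pixel lists and uses a plain per-pixel RGB average. It illustrates that idea and is not the authors' exact procedure.

```python
import heapq
import math
from typing import List, Tuple

RGB = Tuple[float, float, float]


def to_gray(img: List[RGB]) -> List[float]:
    """Per-pixel luma using ITU-R BT.601 weights."""
    return [0.299 * r + 0.587 * g + 0.114 * b for r, g, b in img]


def colorize(gray_query: List[float], database: List[List[RGB]], k: int) -> List[RGB]:
    """Average the colors of the k database images whose grayscale versions
    are closest (Euclidean distance) to the grayscale query.

    All images are flattened pixel lists of the same size.
    """
    dists = [(math.dist(gray_query, to_gray(img)), i) for i, img in enumerate(database)]
    nearest = [database[i] for _, i in heapq.nsmallest(k, dists)]
    if not nearest:
        raise ValueError("need k >= 1 and a non-empty database")
    n = len(nearest)
    # Per-pixel mean of each color channel over the retrieved neighbors.
    return [
        (sum(img[p][0] for img in nearest) / n,
         sum(img[p][1] for img in nearest) / n,
         sum(img[p][2] for img in nearest) / n)
        for p in range(len(gray_query))
    ]
```

A plain RGB average also overwrites the input's own brightness. A fuller variant might keep the input's luma and transfer only the neighbors' chrominance.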

  13. im2gps: Instead of using object labels, use the other kinds of metadata the web associates with large collections of images, e.g., 20 million geotagged and geographic text-labeled images. Hays & Efros. CVPR 2008
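For im2gps the metadata is a GPS tag, so a query photo's location can be guessed from where its nearest neighbors were taken. This sketch takes the neighbors' coordinates as given. It uses a simple grid vote as a stand-in for a proper mode-finding step: it picks the most populated `cell_deg`-degree cell and returns the mean position inside it. The cell size and the voting scheme are assumptions, not the method of Hays & Efros.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

LatLon = Tuple[float, float]


def estimate_location(neighbor_coords: List[LatLon], cell_deg: float = 1.0) -> LatLon:
    """Guess a photo's (lat, lon) from the GPS tags of its nearest neighbors.

    Neighbor coordinates are binned into cell_deg x cell_deg cells; the cell
    holding the most neighbors wins, and the mean of its coordinates is
    returned. Wrap-around at the antimeridian is ignored.
    """
    if not neighbor_coords:
        raise ValueError("need at least one neighbor")
    cells: Dict[Tuple[int, int], List[LatLon]] = defaultdict(list)
    for lat, lon in neighbor_coords:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        cells[cell].append((lat, lon))
    best = max(cells.values(), key=len)  # most populated cell (first one on ties)
    mean_lat = sum(lat for lat, _ in best) / len(best)
    mean_lon = sum(lon for _, lon in best) / len(best)
    return mean_lat, mean_lon
```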

  14. im2gps Hays & Efros. CVPR 2008

  15. Image Completion: Instead, generate proposals using millions of images. [Panels: input; 16 nearest neighbors (gist + color matching); output] Hays, Efros, 2007
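Slide 15 retrieves 16 nearest neighbors by "gist+color matching". In this minimal sketch, every scene is assumed to have a precomputed gist vector and a downsampled color thumbnail. Scenes are ranked by a weighted sum of squared differences. The `color_weight` of 0.5 and the exact form of the combination are hypothetical, not taken from Hays & Efros.

```python
import heapq
from typing import List, Sequence, Tuple


def ssd(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of squared differences between two equal-length descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_scene_matches(query_gist: Sequence[float],
                      query_color: Sequence[float],
                      database: List[Tuple[str, Sequence[float], Sequence[float]]],
                      k: int = 16,
                      color_weight: float = 0.5) -> List[str]:
    """Return names of the k scenes with the smallest combined distance
    ssd(gist) + color_weight * ssd(color thumbnail)."""
    scored = [
        (ssd(query_gist, gist) + color_weight * ssd(query_color, color), name)
        for name, gist, color in database
    ]
    return [name for _, name in heapq.nsmallest(k, scored)]
```

Setting `color_weight` to 0 reduces this to gist-only matching. Raising it favors scenes whose overall colors also agree with the input.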

  16. With a good image similarity and a lot of data… [Panels: input image; nearest neighbors; 22,000 LabelMe scenes] Hays, Efros, SIGGRAPH 2006; Russell, Liu, Torralba, Fergus, Freeman. NIPS 2007

  17. With a good image similarity and a lot of data… Russell, Liu, Torralba, Fergus, Freeman. NIPS 2007

  18. With a good image similarity and a lot of data… Russell, Liu, Torralba, Fergus, Freeman. NIPS 2007

  19. With a good image similarity and a lot of data… Russell, Liu, Torralba, Fergus, Freeman. NIPS 2007

  20. With a good image similarity and a lot of data… Russell, Liu, Torralba, Fergus, Freeman. NIPS 2007

  21. Outputs Russell, Liu, Torralba, Fergus, Freeman. NIPS 2007

  22. While many scenes are boring… Slide by Antonio Torralba

  23. Some scenes are unique Slide by Antonio Torralba

  24. Dealing with sparse data (rare scenes) • better similarity

  25. Medici Fountain, Paris

  28. Medici Fountain, Paris (winter)

  34. Our Goal

  36. Input Query, Top Matches

  37. Input Query, Top Matches

  38. Input Query, Top Matches

  39. Important Parts? Input Query, Important Parts

  40. Top Matches, Input Query

  41. “Data-driven Uniqueness”

  42. Search Using Images: Input Query, Top Matches
