Image Interpretation: Martial Hebert, Abhinav Gupta, David Fouhey


  1. Learning from 3D Data for Image Interpretation. Martial Hebert, Abhinav Gupta, David Fouhey, Adrien Matricon, Wajahat Hussain

  2. Slides adapted from David Fouhey

  3. • Can mid-level primitives learned from image+3D data be used to transfer geometric information? • Can geometric reasoning use this local evidence to produce a consistent geometric interpretation?

  4. Pattern Repetition: common patterns correspond to common geometric configurations

  5. Pattern Repetition

  6. Pattern Repetition ...

  7. Physical/Geometric Constraints

  8. Primitives: Visually Discriminative (Image); Geometrically Informative (Surface Normals). Saurabh Singh et al., Discriminative Mid-Level Patches

  9. Geometric configurations from large-scale RGBD data. NYU v2 Dataset (Silberman et al., 2012)

  10. Representation: Detector, Instances, Canonical Form

  11. Representation: Detector (w), Instances, Canonical Form; 8x8

  12. Representation: Detector, Instances, Canonical Form (N); 10x10

  13. Representation: Detector, Instances (y), Canonical Form

  14. Learning Primitives: $\min_{y,w,N} \; R(w) + \sum_i \left[ c_1 \, y_i \, \Delta(N, x_i^G) + c_2 \, L(w, N, x_i^A, y_i) \right]$. Labels: Primitive, Patch, 10x10

  15. Learning Primitives Approach: iterative procedure

  16. Learning Primitives ( ) = Avg

  17. Learning Primitives: Cluster Instances; Patches Geometrically Dissimilar to N

  18. Learning Primitives …

  19. Learning Primitives Initialize y by clustering sampled patches …
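The iterative procedure outlined on slides 15 to 19 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the detector is a ridge regression to ±1 labels (a stand-in for whatever classifier the slides intend), the canonical form N is taken as the plain average of the instances' geometry (reading slide 16's "Avg" that way), and `max_dist`, `ridge`, `fit_detector`, and `learn_primitive` are hypothetical names and parameters.

```python
import numpy as np

def fit_detector(x_app, labels, ridge=1e-3):
    """Ridge regression to +/-1 labels: a simple stand-in for detector training."""
    X = np.hstack([x_app, np.ones((len(x_app), 1))])  # append a bias column
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ labels)

def learn_primitive(x_app, x_geo, y_init, max_dist=0.5, n_iters=10):
    """Alternate between updating N, the detector w, and the instances y.

    x_app: (n, d) appearance features, one row per patch.
    x_geo: (n, g) flattened surface-normal patches.
    y_init: (n,) boolean membership, e.g. from clustering sampled patches.
    max_dist: hypothetical threshold on geometric distance to N.
    """
    y = y_init.copy()
    X_all = np.hstack([x_app, np.ones((len(x_app), 1))])
    for _ in range(n_iters):
        # Canonical form: average geometry of the current instances.
        N = x_geo[y].mean(axis=0)
        dist = np.linalg.norm(x_geo - N, axis=1)
        # Negatives: patches geometrically dissimilar to N.
        negatives = ~y & (dist > max_dist)
        train = y | negatives
        labels = np.where(y[train], 1.0, -1.0)
        w = fit_detector(x_app[train], labels)
        # New instances: detected patches that also agree with N.
        y_new = (X_all @ w > 0) & (dist <= max_dist)
        if not y_new.any() or np.array_equal(y_new, y):
            break
        y = y_new
    return w, N, y
```

On toy data with two well-separated groups of patches, a slightly contaminated initial cluster is cleaned up within a couple of iterations; real data would need a proper classifier and a patch-level distance suited to surface normals.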

  20. Inference Sparse Transfer … 19s

  21. Inference Sparse Transfer …

  22. Inference Sparse Transfer

  23. Inference Dense Transfer
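A rough sketch of how sparse and dense transfer might look: each detection pastes its primitive's canonical normals into the image, and overlapping patches are blended. The score-weighted averaging, the `min_score` threshold, and the function name `transfer_normals` are assumptions made for illustration; the slides do not spell out the blending rule.

```python
import numpy as np

def transfer_normals(shape, detections, min_score=0.0):
    """Blend canonical normal patches pasted at detection locations.

    shape: (H, W) of the image.
    detections: iterable of (row, col, score, patch), where (row, col) is the
        top-left corner and patch is a (p, p, 3) array of canonical normals.
        Patches are assumed to lie fully inside the image; scores are
        assumed positive.
    min_score: hypothetical threshold; raising it keeps only confident
        detections (sparser output), lowering it fills more pixels.
    Returns (normals, covered): unit normals of shape (H, W, 3) and a boolean
    mask of pixels that received at least one patch.
    """
    H, W = shape
    acc = np.zeros((H, W, 3))
    weight = np.zeros((H, W))
    for row, col, score, patch in detections:
        if score <= min_score:
            continue
        p = patch.shape[0]
        acc[row:row + p, col:col + p] += score * patch
        weight[row:row + p, col:col + p] += score
    covered = weight > 0
    # Renormalise the blended vectors; uncovered pixels stay at zero.
    norms = np.linalg.norm(acc, axis=2, keepdims=True)
    normals = np.divide(acc, norms, out=np.zeros_like(acc), where=norms > 0)
    return normals, covered
```

With a high `min_score` only a few confident patches are transferred, leaving most pixels uncovered; with a low one the patches overlap and the output becomes dense.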

  24. Sample Results – Qualitative 795/654

  25. Confidences: Most Confident Result, Least Confident Result; axis: rank

  26. Cross-dataset: PETS, B3DO

  27. Failures

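  28. Summary Stats (°, Lower Better): Mean, Median, RMSE. % Good Pixels (Higher Better): 11.25°, 22.5°, 30°.

      | Method          | Mean | Median | RMSE | 11.25° | 22.5° | 30°  |
      |-----------------|------|--------|------|--------|-------|------|
      | 3D Primitives   | 33.0 | 28.3   | 40.0 | 18.8   | 40.7  | 52.4 |
      | Singh et al.    | 35.0 | 32.4   | 40.6 | 11.2   | 32.1  | 45.8 |
      | Karsch et al.   | 40.8 | 37.8   | 46.9 | 7.9    | 25.8  | 38.2 |
      | Hoiem et al.    | 41.2 | 34.8   | 49.3 | 9.0    | 31.7  | 43.9 |
      | Saxena et al.   | 47.1 | 42.3   | 56.3 | 11.2   | 28.0  | 37.4 |
      | RF + Dense SIFT | 36.0 | 33.4   | 41.7 | 11.4   | 31.1  | 44.2 |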
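The columns reported on slide 28 (mean, median, and RMSE of the angular error, plus the percentage of pixels below 11.25°, 22.5°, and 30°) can be computed along these lines. This is a minimal sketch assuming normals are given as flat `(n, 3)` arrays of valid pixels; masking of invalid depth and the function name `normal_error_stats` are not taken from the slides.

```python
import numpy as np

def normal_error_stats(pred, gt, thresholds=(11.25, 22.5, 30.0)):
    """Angular-error statistics between predicted and ground-truth normals.

    pred, gt: arrays of shape (n, 3); rows need not be unit length.
    Returns a dict with mean, median, and RMSE in degrees (lower is better)
    and the percentage of pixels under each threshold (higher is better).
    """
    pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    gt = gt / np.linalg.norm(gt, axis=1, keepdims=True)
    # Clip guards against dot products marginally outside [-1, 1].
    cos = np.clip(np.sum(pred * gt, axis=1), -1.0, 1.0)
    err = np.degrees(np.arccos(cos))
    stats = {
        "mean": float(err.mean()),
        "median": float(np.median(err)),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
    }
    for t in thresholds:
        stats[f"good_{t}"] = float(100.0 * np.mean(err < t))
    return stats
```

Because RMSE penalises large errors more heavily than the mean or median, a method can rank differently across the three summary columns, which is why the table reports all of them.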

  29. Using geometric and physical constraints

  30. The Story So Far (Sparse)

  31. The Story So Far (Dense)

  32. The Story So Far

  33. Adding Physical/Geometric Constraints

  34. Adding Physical/Geometric Constraints

  35. Past Physical Constraints: Camera-in-a-box, Top-down, Cuboid. Hedau et al. 2009, Flint et al. 2011, Satkin et al. 2012, Schwing et al. 2012, etc.; Lee et al. 2010, Gupta et al. 2010, Xiao et al. 2012, etc.

  36. Digression: Inspiration from the past…. Kanade’s Origami World, 1978

  37. From the past…. • Kanade’s chair… (Artificial Intelligence, 1981)

  38. Edges between surfaces: Concave (-), Convex (+)

  39. Edges between surfaces: Concave (-), Convex (+)

  40. Parameterization: vp1, vp2, vp3

  41. Parameterization: vp1, vp2, vp3. Schwing 2013, Hedau 2010

  42. Parameterization: vp1, vp2, vp3

  43. Parameterization

  44. Parameterization 32/64

  45. Parameterization

  46. Parameterization

  47. Labeling: is cell i on?

  48. Formulation

  49. Variable: is cell i on?

  50. Unary Potentials: should cell i be on?

  51. Binary Potentials: should cells i and j both be on?

  52. Binary Potentials: Convex (+), Concave (-)

  53. … 8o7s+UCM

  54. Binary Potentials: Convex (+), Concave (-); 8o7s

  55. Constraints: What configurations are forbidden? Gurobi BB
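The formulation on slides 47 to 55 (one binary variable per cell, unary and binary potentials, and forbidden configurations) has the shape of a constrained binary quadratic program. The toy sketch below solves a tiny instance by exhaustive enumeration instead of the Gurobi solver named on the slide. The maximisation form, the pairwise "not both on" constraint, and all names and numbers are assumptions chosen for illustration.

```python
from itertools import product

def solve_cells(unary, binary, forbidden):
    """Exhaustively maximise sum_i u_i x_i + sum_(i,j) b_ij x_i x_j.

    unary: list of scores, one per cell (should cell i be on?).
    binary: dict {(i, j): score} for cells i and j both being on.
    forbidden: set of (i, j) pairs that may not both be on.
    Only practical for a handful of cells: the search is O(2^n).
    """
    best_x, best_score = None, float("-inf")
    for x in product((0, 1), repeat=len(unary)):
        # Skip labelings that switch on a forbidden pair.
        if any(x[i] and x[j] for i, j in forbidden):
            continue
        score = sum(u * xi for u, xi in zip(unary, x))
        score += sum(b * x[i] * x[j] for (i, j), b in binary.items())
        if score > best_score:
            best_x, best_score = x, score
    return best_x, best_score

# Hypothetical four-cell example: cells 0 and 1 support each other,
# cells 2 and 3 conflict softly, and cells 0 and 2 are mutually exclusive.
labels, score = solve_cells(
    unary=[1.0, -0.5, 0.8, 0.3],
    binary={(0, 1): 1.0, (2, 3): -2.0},
    forbidden={(0, 2)},
)
```

In this example cell 1 is switched on despite its negative unary score, because the binary potential with cell 0 outweighs it; that interaction is what a purely local labeling would miss.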

  56. Ground Truth, Input, Projected 3D Primitives, 3D Primitives, Proposed

  57. Qualitative Results: Ground Truth, Input, Projected 3D Primitives, 3D Primitives, Proposed

  58. Ground Truth, Input, Projected 3D Primitives, 3D Primitives, Proposed

  59. Random Qualitative Results: Proposed, 3D Primitives

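  60. Quantitative Results. Summary Stats (°, Lower Better): Mean, Median, RMSE. % Good Pixels (Higher Better): 11.25°, 22.5°, 30°.

      | Method        | Mean | Median | RMSE | 11.25° | 22.5° | 30°  |
      |---------------|------|--------|------|--------|-------|------|
      | Proposed      | 37.5 | 17.2   | 53.2 | 41.9   | 53.9  | 58.0 |
      | 3D Primitives | 38.5 | 19.0   | 54.2 | 41.7   | 52.4  | 56.3 |
      | Hedau et al.  | 43.2 | 24.8   | 59.4 | 39.1   | 48.8  | 52.3 |
      | Lee et al.    | 47.6 | 43.4   | 60.6 | 28.1   | 39.7  | 43.9 |
      | Karsch et al. | 46.6 | 43.0   | 53.6 | 5.4    | 19.9  | 31.5 |
      | Hoiem et al.  | 45.6 | 38.2   | 55.1 | 8.6    | 30.5  | 41.0 |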

  61. Style vs. structure? Tenenbaum & Freeman. Separating Style and Content with Bilinear Models. Neural Computation. 2000.

  62. Casablanca Hotel, New York

  63. More general environments?

  64. KITTI Dataset: Geiger, Lenz, Urtasun, '12

  65. • Large regions without surface interpretation • Fewer linear/planar structures to anchor • Irregular distribution of 3D training data

  66. Discovered Primitives (Examples) 747/203

  67. Contact points

  68. Object surfaces + Contact points

  69. Next: Better reasoning; Semantic information; Less structured environments; Evaluation; Applications. Data-Driven 3D Primitives for Single-Image Understanding, Fouhey, Gupta, Hebert, in ICCV 2013. Unfolding an Indoor Origami World, Fouhey, Gupta, Hebert, in ECCV 2014.

  70. • Harvested from tripadvisor.com

  71. Sheraton Los Angeles Meritan Apartments Sydney Le Champlain Quebec

  72. Project digression…..

  73. Next: Better reasoning; Semantic information; Less structured environments; Evaluation; Applications. Data-Driven 3D Primitives for Single-Image Understanding, Fouhey, Gupta, Hebert, in ICCV 2013. Unfolding an Indoor Origami World, Fouhey, Gupta, Hebert, in ECCV 2014.

  74. Results – Quantitative Recall
