instance level recognition

Instance-level Recognition Pingmei Xu Object Recognition Friends - PowerPoint PPT Presentation


  1. Instance-level Recognition Pingmei Xu

  2. Object Recognition Friends SE01EP02

  3. Recognition: Find the Ring! Friends SE01EP02

  4. Recognition: Find the Ring! Instance Recognition Friends SE01EP02

  5. Recognition: Find the Ring! Category Recognition Friends SE01EP02

  6. Recognition: Find the Ring! Recognition Algorithm Friends SE01EP02

  7. Recognition: Find the Ring! Scene Understanding Friends SE01EP02

  8. History of Ideas in Recognition Some Slides are borrowed from Svetlana Lazebnik

  9. The Religious Wars • Geometry vs. Appearance • Parts vs. The Whole • …and the standard answer: probably both or neither Alexei Efros

  10. 1960s ~ late 1990s the Geometric Era

  11. Blocks World (1960s) L.G. Roberts • Constrained 3D scene models to allow object recognition from very simple image features.

  12. Generalized Cylinders (1970s) T. Binford • Representing 3D shapes and parts in terms of “Generalized Cylinders”.

  13. Recognition by Parts Biederman (1987) Pentland (1986) • Geons: shape primitives + deformations, with predictable edge properties under perspective.

  14. Recognition by Parts Primitives (geons) Objects Biederman (1987) • Hypothesis: there is a small number of geometric components that constitute the primitive elements of the object recognition system.

  15. Parts + Spatial Configurations Fischler & Elschlager (1973) • There is more to shape than just the right part primitives, i.e., their spatial relationships.

  16. Alignment Huttenlocher & Ullman (1987)

  17. 1990s Appearance-Based

  18. Eigenfaces Turk & Pentland (1991)

  19. Color Histogram Swain & Ballard (1991)

  20. 1990s ~ present Sliding Window

  21. Sliding Window Approaches Viola & Jones (2000) HOG feature map Template Response map Dalal & Triggs (2005)

  22. late 1990s ~ present Local Features

  23. Local Features D. Lowe (1999, 2004)

  24. early 2000s ~ present Parts and Shape

  25. Constellation Models Weber, Welling & Perona (2000), Fergus, Perona & Zisserman (2003)

  26. Deformable Part Model Felzenszwalb, Girshick, McAllester & Ramanan (2009)

  27. mid 2000s ~ present Bag of Features

  28. Bag-of-features Models Objects Bag of “words”

  29. Present Trends

  30. Data-driven Method Malisiewicz et al. (2011)

  31. Recognition from RGBD Images Shotton et al. (2011)

  32. Deep Learning http://deeplearning.net/

  33. History of Ideas in Recognition • 1960s – late 1990s: the geometric era • 1990s: appearance-based models • 1990s – present: sliding window approaches • late 1990s – present: local features • early 2000s – present: parts-and-shape models • mid 2000s – present: bags of features • present trends: “big data”, context, attributes, combining geometry and recognition, advanced scene understanding tasks, deep learning

  34. History of Ideas in Recognition THE COMPUTER VISION EVOLUTION?

  35. 3D Object Modeling and Recognition A test image Instances of 5 models Rothganger, Lazebnik, Schmid, & Ponce (2006)

  36. 3D Modeling: Pairwise Matching

  37. Affine Patches STEP 1: Detection: detect salient image regions. STEP 2: Description: extract a descriptor. • Idea: although smooth surfaces are almost never planar in the large, they are always planar in the small.

  38. Affine Patches Harris-Laplacian DoG

  39. Affine Patches

  40. Affine Patches

  41. Affine Patches

  42. Affine Patches

  43. Affine Patches • Patch rectification

  44. Geometric Constraints

  45. 3D Object Modeling: Matching Procedure RANSAC: 1) sampling stage 2) consensus stage
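
The two RANSAC stages named on this slide can be illustrated with a minimal sketch. It fits a 2D line rather than the patch-matching geometry used in the talk, so the model, the tolerance `tol`, and the function names `fit_line` and `ransac_line` are all assumptions chosen for illustration.

```python
import random

def fit_line(p, q):
    """Line a*x + b*y + c = 0 through p and q, scaled so a^2 + b^2 = 1."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    norm = (a * a + b * b) ** 0.5
    if norm == 0:
        return None  # degenerate sample: both points coincide
    return a / norm, b / norm, -(a * x1 + b * y1) / norm

def ransac_line(points, n_iters=200, tol=0.1, seed=0):
    """Return (best_model, best_inliers) for a toy line-fitting problem."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(n_iters):
        # Sampling stage: draw a minimal seed (two points define a line).
        p, q = rng.sample(points, 2)
        model = fit_line(p, q)
        if model is None:
            continue
        a, b, c = model
        # Consensus stage: keep every point within tol of the hypothesis.
        inliers = [pt for pt in points
                   if abs(a * pt[0] + b * pt[1] + c) <= tol]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers
```

In a matching setting the seed would be a small set of patch matches and the consensus test a geometric consistency check, but the loop structure stays the same.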

  46. Application: 3D Object Modeling

  47. 3D Modeling: Input Images • The 20 images used to construct the teddy bear model.

  48. 3D Modeling: Partial Model from Image Pairs • Matches between two images.

  49. 3D Modeling: Partial Model → Composite Ones Convert a collection of matches to a 3D model: 1. Chaining 2. Stitching 3. Bundle adjustment 4. Euclidean upgrade

  50. 3D Modeling: Partial → Composite: Chaining Chaining: link matches across multiple images. • Construction of the patch-view matrix. A (subsampled) patch-view matrix for the teddy bear. Each black square indicates the presence of a given patch in a given image.

  51. 3D Modeling: Partial → Composite: Stitching Stitching: solve for the affine structure and motion while coping with missing data. Common patches of adjacent modeling views presented in a common coordinate frame.

  52. 3D Modeling: Partial → Composite: Bundle Adjustment Bundle adjustment: refine the model using non-linear least squares. The bear model along with the recovered affine camera configurations.

  53. 3D Modeling: Object Gallery

  54. Application: 3D Object Recognition

  55. 3D Object Recognition: Select Potential Matches Features: 1) a measure of the contrast (average squared gradient norm) in the patch 2) a 10 × 10 color histogram drawn from the UV portion of YUV space, and 3) SIFT
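
As a rough sketch of the first two features listed on this slide, the code below computes an average squared gradient norm and a normalized 10 × 10 UV histogram. The forward-difference gradient, the assumption that U and V are already scaled to [0, 1], and the function names are illustrative choices, not details taken from the paper.

```python
def contrast(gray):
    """Average squared gradient norm of a 2D list of gray values.

    Uses forward differences, which is an assumed discretization.
    """
    h, w = len(gray), len(gray[0])
    total, count = 0.0, 0
    for y in range(h - 1):
        for x in range(w - 1):
            gx = gray[y][x + 1] - gray[y][x]
            gy = gray[y + 1][x] - gray[y][x]
            total += gx * gx + gy * gy
            count += 1
    return total / count if count else 0.0

def uv_histogram(uv_pixels, bins=10):
    """Normalized bins x bins histogram of (u, v) pairs in [0, 1]."""
    hist = [[0.0] * bins for _ in range(bins)]
    for u, v in uv_pixels:
        i = min(int(u * bins), bins - 1)  # clamp u == 1.0 into the last bin
        j = min(int(v * bins), bins - 1)
        hist[i][j] += 1.0
    n = len(uv_pixels)
    return [[c / n for c in row] for row in hist] if n else hist
```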

  56. 3D Object Recognition: Robust Estimation • RANSAC • Greedy Here |P| denotes the size of the set P of match hypotheses, K is the number of best matches kept per model patch, M is the number of samples drawn, and N is the size of one seed.

  57. 3D Object Recognition: Object Detection • Criteria used to decide whether the object is present or not: • Measure of distortion: reflects how close to the top part of a scaled rotation this matrix is.

  58. 3D Object Recognition: Successful Examples

  59. 3D Object Recognition: Failed Examples

  60. 3D Object Modeling and Recognition • Paper: 3D Object Modeling and Recognition Using Local Affine-Invariant Image Descriptors and Multi-View Spatial Constraints. F. Rothganger, S. Lazebnik, C. Schmid, and J. Ponce, IJCV 2006 Input Pairwise Matching Test Image 3D Recognition 3D Modeling

  61. Large Scale Retrieval

  62. Large Scale Image Retrieval • Combining local features, indexing, and spatial constraints K. Grauman and B. Leibe

  63. Video Google (VG) Query Search Results J. Sivic et al. (2003), Philbin et al. (2007, 2008), Chum et al. (2007) • Is the text retrieval approach applicable to object recognition?

  64. Text Retrieval Word Stem Document Corpus

  65. The Visual Analogy ? ? Frame/Image Film/Image Set

  66. VG: Local Descriptor • Viewpoint covariant regions: 1) ‘Maximally Stable’ (yellow) 2) ‘Shape Adapted’ (cyan) • 128-dimensional SIFT

  67. The Visual Analogy Descriptor ? Frame/Image Film/Image Set

  68. VG: Visual Vocabulary Affine covariant regions Clusters • Vector quantize the descriptors into clusters by k-means.
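
Vector quantization by k-means can be sketched in a few lines of plain Python. This is a toy version under stated assumptions: descriptors are short tuples rather than 128-dimensional SIFT vectors, initialization is a random sample, and the names `kmeans` and `nearest` are hypothetical helpers.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(desc, centroids):
    """Index of the closest centroid, i.e. the visual word ID."""
    return min(range(len(centroids)),
               key=lambda k: sq_dist(desc, centroids[k]))

def kmeans(descriptors, k, n_iters=20, seed=0):
    """Plain Lloyd iterations; returns the list of k centroids."""
    rng = random.Random(seed)
    centroids = [list(d) for d in rng.sample(descriptors, k)]
    for _ in range(n_iters):
        # Assignment step: group descriptors by their nearest centroid.
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            clusters[nearest(d, centroids)].append(d)
        # Update step: move each centroid to the mean of its members.
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members)
                                for col in zip(*members)]
    return centroids
```

Once the centroids are fixed, `nearest(desc, centroids)` maps any new descriptor to a visual word ID.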

  69. VG: Visual Words • Each group of patches belongs to the same visual word.

  70. The Visual Analogy Descriptor Centroid Frame/Image Film/Image Set

  71. Image Retrieval Using Visual Words • Vocabulary construction (offline) • Database construction (offline) • Image retrieval (online)

  72. VG: Stop List Before stop list → After stop list → • The most frequent visual words that occur in almost all images are suppressed.
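
A stop list of this kind could be built as in the sketch below, assuming each image is represented as a list of visual word IDs. The cutoff `top_fraction` and both function names are assumptions; the slide does not state what threshold was used.

```python
from collections import Counter

def build_stop_list(docs, top_fraction=0.05):
    """Set of the most frequent visual word IDs across all documents.

    top_fraction is an assumed cutoff, expressed as a share of the
    vocabulary; at least one word is always stopped.
    """
    counts = Counter(w for doc in docs for w in doc)
    n_stop = max(1, int(len(counts) * top_fraction))
    return {w for w, _ in counts.most_common(n_stop)}

def apply_stop_list(doc, stop_words):
    """Drop stopped words from one document, preserving order."""
    return [w for w in doc if w not in stop_words]
```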

  73. VG: Soft Assignment • Count in one bin is spread to neighbouring bins.
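
One way to spread a count over neighbouring bins is to weight the `r` nearest visual words by a Gaussian of their distance and normalize the weights. The sketch below follows that idea; `r`, `sigma`, and the function name are assumed parameters, not values given on the slide.

```python
import math

def soft_assign(desc, centroids, r=3, sigma=1.0):
    """Return {word_id: weight} over the r nearest centroids.

    Weights are proportional to exp(-d^2 / (2 * sigma^2)) and sum to 1.
    """
    dists = sorted(
        (sum((x - y) ** 2 for x, y in zip(desc, c)), k)
        for k, c in enumerate(centroids)
    )[:r]
    d_min = dists[0][0]
    # Subtracting d_min leaves the normalized weights unchanged and
    # avoids every weight underflowing to zero for distant descriptors.
    weights = {k: math.exp(-(d - d_min) / (2.0 * sigma ** 2))
               for d, k in dists}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}
```

With `r=1` this reduces to the usual hard assignment, where the single nearest word receives the full count.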

  74. Vocabulary Construction Summary • Subset of 48 shots is selected: 10k frames = 10% of movie • Select regions (SA+MS): 10k frames * 1600 = 1,600,000 regions • Frame tracking • Reject unstable regions: ~200k regions • SIFT descriptors • Cluster descriptors using K-means • Parameters tuning

  75. Image Retrieval Using Visual Words • Vocabulary construction (offline) • Database construction (offline) • Image retrieval (online)

  76. tf-idf Vector • $t_i = \frac{n_{id}}{n_d} \log \frac{N}{n_i}$, where $n_{id}$ is the number of occurrences of word i in document d, $n_d$ is the number of words in document d, $N$ is the total number of documents in the database, and $n_i$ is the total number of word i in the database • Documents -> vectors of word frequencies • Term frequency – inverse document frequency • Downweight words that appear often in the database
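
The tf-idf weighting described on this slide can be sketched as follows, with each document given as a list of visual word IDs. The sketch counts the number of documents containing word i for the inverse document frequency, which is the common variant; reading the slide's "total number of word i in database" that way is an assumption, as is the function name.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {word: weight} dict per document.

    weight = (occurrences of word in doc / words in doc)
             * log(number of docs / number of docs containing word)
    """
    n_docs = len(docs)
    # Document frequency: each word counted once per document.
    doc_freq = Counter(w for doc in docs for w in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            w: (c / len(doc)) * math.log(n_docs / doc_freq[w])
            for w, c in counts.items()
        })
    return vectors
```

A word that appears in every document gets weight zero, which is the downweighting effect the slide refers to.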

  77. VG: Inverted File Index • Word -> a list of all documents (with frequencies) [Figure: table with columns Word ID (1 … N) and Document ID (1 … K)]
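
A minimal sketch of an inverted file index, assuming documents are lists of visual word IDs and document IDs are list positions. The scoring in `candidates` is a simple occurrence vote added for illustration, not the ranking used in the Video Google system.

```python
from collections import Counter, defaultdict

def build_inverted_index(docs):
    """Map each word ID to a list of (document ID, frequency) pairs."""
    index = defaultdict(list)
    for doc_id, doc in enumerate(docs):
        for word, freq in sorted(Counter(doc).items()):
            index[word].append((doc_id, freq))
    return dict(index)

def candidates(index, query_words):
    """Rank documents sharing words with the query by summed frequency."""
    votes = Counter()
    for word in set(query_words):
        # Only documents listed under a query word are ever touched.
        for doc_id, freq in index.get(word, []):
            votes[doc_id] += freq
    return votes.most_common()
```

The benefit is that a query only visits the posting lists of its own words instead of scanning every document in the database.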

  78. Crawling Movies Summary • Pipeline spanning Vocabulary Construction and Database Construction: Select key frames -> Select regions (SA+MS) -> Frame tracking -> Reject unstable regions -> SIFT descriptors -> Vector quantization -> Stop list -> Tf-idf weighting -> Inverted index
