Instance-level recognition III: Correspondence and efficient visual search


  1. Computer Vision and Object Recognition 2013 Instance level recognition III: Correspondence and efficient visual search Josef Sivic http://www.di.ens.fr/~josef INRIA, WILLOW, ENS/INRIA/CNRS UMR 8548 Laboratoire d’Informatique, Ecole Normale Supérieure, Paris With slides from: O. Chum, K. Grauman, S. Lazebnik, B. Leibe, D. Lowe, J. Philbin, J. Ponce, D. Nister, C. Schmid, N. Snavely, A. Zisserman

  2. Announcements Class web-page: http://www.di.ens.fr/willow/teaching/recvis13/ Assignment 1 is due next week on Oct 22 2013 http://www.di.ens.fr/willow/teaching/recvis13/assignment1/ Matlab tutorial online: http://www.di.ens.fr/willow/teaching/recvis12/matlab-tut.zip Final projects: project proposal (due on Nov 8). Start looking at the final project topics on the class webpage.

  3. Instance-level recognition Last lecture: • Basic camera geometry – (J. Ponce) • Local invariant features – (C. Schmid) Today: • Correspondence, matching and recognition with local invariant features, efficient visual search – (J. Sivic) Next week: • Very large scale visual indexing – (C. Schmid)

  4. Outline Part 1. Image matching and recognition with local features - Correspondence - Semi-local and global geometric relations - Robust estimation – RANSAC and Hough Transform Part 2. Going large-scale - Approximate nearest neighbour matching - Bag-of-visual-words representation - Efficient visual search and extensions - Beyond bag-of-visual-words representations - Applications

  5. Outline Part 1. Image matching and recognition with local features - Correspondence - Semi-local and global geometric relations - Robust estimation – RANSAC and Hough Transform

  6. Image matching and recognition with local features. The goal: establish correspondence between two or more images. (Figure: a scene point X projects via cameras P and P' to image points x and x'.) Image points x and x' are in correspondence if they are projections of the same 3D scene point X. Images courtesy A. Zisserman.

  7. Example I: Wide baseline matching and 3D reconstruction Establish correspondence between two (or more) images. [Schaffalitzky and Zisserman ECCV 2002]

  8. Example I: Wide baseline matching and 3D reconstruction. Establish correspondence between two (or more) images. (Figure: corresponding rays meet at the reconstructed scene point X.) [Schaffalitzky and Zisserman ECCV 2002]

  9. [Agarwal, Snavely, Simon, Seitz, Szeliski, ICCV’09] – Building Rome in a Day 57,845 downloaded images, 11,868 registered images. This video: 4,619 images.

  10. Example II: Object recognition Establish correspondence between the target image and (multiple) images in the model database. Model database Target image [D. Lowe, 1999]

  11. Example III: Visual search Given a query image, find images depicting the same place / object in a large unordered image collection. Find these landmarks ...in these images and 1M more

  12. Establish correspondence between the query image and all images from the database depicting the same object / scene. Query image Database image(s)

  13. Mobile visual search Bing visual scan and others … Snaptell.com, Millpix.com

  14. Example Slide credit: I. Laptev

  15. Why is it difficult? We want to establish correspondence despite possibly large changes in scale, viewpoint, lighting and partial occlusion … and the image collection can be very large (e.g. 1M images).

  16. Approach Pre-processing (last lecture): • Detect local features. • Extract descriptor for each feature. Matching: 1. Establish tentative (putative) correspondences based on local appearance of individual features (their descriptors). 2. Verify matches based on semi-local / global geometric relations.

  17. Example I: Two images – "Where is the graffiti?"

  18. Step 1: Establish tentative correspondences. Establish tentative correspondences between the object (model) image and the target image by nearest-neighbour matching on the 128-D SIFT descriptor vectors. This requires solving some variant of the "nearest neighbour problem" for every feature vector x_i in the query image: NN(x_i) = argmin_j || x_i − y_j ||, where the y_j are the feature vectors in the target image. This can take a long time if many target images are considered.
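For intuition, here is a minimal NumPy sketch (not from the lecture) of this exhaustive nearest-neighbour search; the array names query_desc and target_desc, each holding one 128-D SIFT descriptor per row, are assumed.

```python
import numpy as np

def nearest_neighbours(query_desc, target_desc):
    """For each query descriptor, return the index of and distance to its
    nearest target descriptor (exhaustive O(N*M) scan)."""
    diffs = query_desc[:, None, :] - target_desc[None, :, :]   # (N, M, 128)
    dists = np.linalg.norm(diffs, axis=2)                      # pairwise distances
    nn_idx = dists.argmin(axis=1)                              # best target per query
    nn_dist = dists[np.arange(len(query_desc)), nn_idx]
    return nn_idx, nn_dist
```

The quadratic cost of this scan is exactly why Part 2 turns to approximate nearest-neighbour search and bag-of-visual-words indexing.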

  19. Step 1 (continued): Examine the distance to the 2nd nearest neighbour [Lowe, IJCV 2004]. If the 2nd nearest neighbour is much further away than the 1st nearest neighbour, the match is more "unique" or discriminative. Measure this by the ratio r = d_1NN / d_2NN, which lies between 0 and 1; the smaller r is, the more unique the match. This works very well in practice. See Assignment 1 for an example.
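A small sketch of the ratio test on the same assumed descriptor arrays; the 0.8 threshold is a commonly used value, not one given on the slide.

```python
import numpy as np

def ratio_test_matches(query_desc, target_desc, ratio=0.8):
    """Keep only matches whose 1st-NN distance is clearly smaller than the
    2nd-NN distance, i.e. r = d_1NN / d_2NN < ratio."""
    dists = np.linalg.norm(query_desc[:, None, :] - target_desc[None, :, :], axis=2)
    order = np.argsort(dists, axis=1)                 # targets sorted per query
    d1 = dists[np.arange(len(dists)), order[:, 0]]    # distance to 1st NN
    d2 = dists[np.arange(len(dists)), order[:, 1]]    # distance to 2nd NN
    keep = d1 / d2 < ratio
    return np.flatnonzero(keep), order[keep, 0]       # query indices, matched target indices
```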

  20. Problem with matching on local descriptors alone: • too much individual invariance • each region can deform affinely independently (by different amounts) • local appearance can be ambiguous. Solution: use semi-local and global spatial relations to verify matches.

  21. Example I: Two images – "Where is the graffiti?" (Figure: initial matches from nearest-neighbour search on appearance descriptors alone, and the matches remaining after spatial verification.)

  22. Step 2: Spatial verification (now). a. Semi-local constraints: constraints on spatially close-by matches. b. Global geometric relations: require a consistent global relationship between all matches.

  23. Semi-local constraints: Example I. – neighbourhood consensus [Schmid&Mohr, PAMI 1997]

  24. Semi-local constraints: Example I – neighbourhood consensus [Schaffalitzky & Zisserman, CIVR 2004]. (Figure: original images, tentative matches, and the matches remaining after neighbourhood consensus.)
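For intuition, a rough sketch of the neighbourhood-consensus idea (a simplification, not the exact test of the cited papers): a tentative match is kept only if enough of its spatially nearby matches agree with it; k, radius and min_support are illustrative parameters.

```python
import numpy as np

def neighbourhood_consensus(pts1, pts2, k=5, radius=50.0, min_support=2):
    """pts1[i] <-> pts2[i] are tentative matches, given as (N, 2) pixel arrays.
    Keep match i if at least `min_support` of its k nearest matches in image 1
    also land within `radius` pixels of pts2[i] in image 2."""
    keep = np.zeros(len(pts1), dtype=bool)
    for i in range(len(pts1)):
        d1 = np.linalg.norm(pts1 - pts1[i], axis=1)
        neighbours = np.argsort(d1)[1:k + 1]           # k closest matches in image 1
        d2 = np.linalg.norm(pts2[neighbours] - pts2[i], axis=1)
        keep[i] = np.sum(d2 < radius) >= min_support   # do they agree in image 2?
    return keep
```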

  25. Semi-local constraints: Example II [Ferrari et al., IJCV 2005]. (Figure: model image and matched images.)

  26. Geometric verification with global constraints • All matches must be consistent with a global geometric relation / transformation. • Need to simultaneously (i) estimate the geometric relation / transformation and (ii) the set of consistent matches. (Figure: tentative matches vs. the matches consistent with an affine transformation.)
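One standard way to do this simultaneous estimation is RANSAC (listed in the outline of Part 1). Below is a minimal sketch for an affine model; the iteration count and inlier threshold are illustrative, and a practical version would add degeneracy checks and adaptive stopping.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2D affine transform mapping src -> dst (each (K, 2), K >= 3)."""
    A = np.hstack([src, np.ones((len(src), 1))])        # rows [x y 1]
    M, _, _, _ = np.linalg.lstsq(A, dst, rcond=None)    # (3, 2) affine parameters
    return M

def ransac_affine(src, dst, n_iter=1000, thresh=3.0, seed=0):
    """Estimate an affine transform and its set of consistent matches from
    tentative correspondences src[i] <-> dst[i]."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(n_iter):
        sample = rng.choice(len(src), size=3, replace=False)   # minimal sample
        M = fit_affine(src[sample], dst[sample])
        proj = np.hstack([src, np.ones((len(src), 1))]) @ M
        inliers = np.linalg.norm(proj - dst, axis=1) < thresh  # residual test
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return fit_affine(src[best_inliers], dst[best_inliers]), best_inliers  # refit on inliers
```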

  27. Examples of global constraints. One view and a known 3D model: • consistency with the (known) 3D model. Two views: • epipolar constraint • 2D transformations (similarity, affine, projective). N views: • are the images consistent with a 3D model?

  28. 3D constraint: example • Matches must be consistent with a 3D model. Offline: build a 3D model. (Figure: 3 of the 20 images used to build the 3D model, and the recovered 3D model.) [Lazebnik, Rothganger, Schmid, Ponce, CVPR’03]

  29. 3D constraint: example • Matches must be consistent with a 3D model. Offline: build a 3D model (from 3 of 20 images). At test time: the object is recognized in a previously unseen pose and its pose is recovered. [Lazebnik, Rothganger, Schmid, Ponce, CVPR’03]

  30. 3D constraint: example. Given a 3D model (a set of known 3D points X) and a set of measured 2D image points x, find the camera matrix P and a set of geometrically consistent correspondences x ↔ X.
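For intuition, a minimal sketch of the consistency check once a candidate camera matrix P is available (estimating P itself, e.g. by RANSAC over minimal samples, is not shown; the names and pixel threshold are illustrative).

```python
import numpy as np

def reprojection_inliers(P, X, x, thresh=2.0):
    """P: 3x4 camera matrix, X: (N, 3) model points, x: (N, 2) measured image
    points tentatively matched to them. Return which matches are consistent."""
    X_h = np.hstack([X, np.ones((len(X), 1))])   # homogeneous 3D points
    proj = (P @ X_h.T).T                         # (N, 3) homogeneous projections
    proj = proj[:, :2] / proj[:, 2:3]            # perspective divide
    return np.linalg.norm(proj - x, axis=1) < thresh
```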

  31. Epipolar geometry (not considered here) In general, two views of a 3D scene are related by the epipolar constraint. • A point in one view “generates” an epipolar line in the other view • The corresponding point lies on this line. Slide credit: A. Zisserman

  32. Epipolar geometry (not considered here). Epipolar geometry is a consequence of the coplanarity of the camera centres and the scene point X. The camera centres C and C', the corresponding image points x and x', and the scene point lie in a single plane, known as the epipolar plane. Slide credit: A. Zisserman

  33. Epipolar geometry (not considered here). Algebraically, the epipolar constraint can be expressed as x'ᵀ F x = 0, where • x, x' are homogeneous coordinates (3-vectors) of corresponding image points, and • F is a 3x3, rank-2 homogeneous matrix with 7 degrees of freedom, called the fundamental matrix. Slide credit: A. Zisserman
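As a small illustration (not from the slides), the algebraic residual |x'ᵀ F x| can be used to score how well tentative matches satisfy the epipolar constraint for a given F.

```python
import numpy as np

def epipolar_residuals(F, x1, x2):
    """Algebraic residuals |x2^T F x1| for tentative matches x1[i] <-> x2[i],
    with points given as (N, 2) pixel coordinates."""
    x1_h = np.hstack([x1, np.ones((len(x1), 1))])   # homogeneous coordinates
    x2_h = np.hstack([x2, np.ones((len(x2), 1))])
    return np.abs(np.sum(x2_h * (x1_h @ F.T), axis=1))
```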

  34. 2D transformation models: similarity (translation, scale, rotation), affine, projective (homography). Why are 2D planar transformations important?

  35. Recall perspective projection Slide credit: A. Zisserman

  36. Plane projective transformations Slide credit: A. Zisserman

  37. Projective transformations continued • This is the most general transformation between the world plane and the image plane under imaging by a perspective camera. • It is often only the 3 x 3 form of the matrix that is important in establishing properties of this transformation. • A projective transformation is also called a "homography" or a "collineation". • H has 8 degrees of freedom. How many points are needed to compute H? Slide credit: A. Zisserman
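To answer the question on the slide: each point correspondence gives two linear constraints, so 4 correspondences (no three collinear) determine the 8 degrees of freedom of H. A minimal direct-linear-transform (DLT) sketch in plain NumPy, without the coordinate normalisation one would use in practice; all names are illustrative.

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate the 3x3 homography H with dst ~ H * src from >= 4 point
    correspondences src[i] <-> dst[i] (pixel coordinates)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    H = Vt[-1].reshape(3, 3)        # null-space vector holds the entries of H
    return H / H[2, 2]              # fix the arbitrary overall scale

def apply_homography(H, pts):
    """Map (N, 2) points with H and divide by the third homogeneous coordinate."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    out = (H @ pts_h.T).T
    return out[:, :2] / out[:, 2:3]
```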

  38. Planes in the scene induce homographies. (Figure: a scene plane maps to the two images via homographies H1 and H2; a point x in one image maps to the point x' in the other through the composition H = H2 H1.)

  39. Planes in the scene induce homographies. Points on the plane transform as x' = H x, where x and x' are image points (in homogeneous coordinates) and H is a 3x3 matrix.

  40. Case II: Cameras rotating about their centre. (Figure: image plane 1 and image plane 2 of a camera rotating about its centre C.) • The two image planes are related by a homography H. • H depends only on the relation between the image planes and the camera centre C, not on the 3D structure.

  41. Case II: Example of a rotating camera Images courtesy of A. Zisserman.

  42. The homography is often well approximated by a 2D affine geometric transformation H_A mapping x to x'.
