Lecture 14: Indexing with local features
Thursday, Nov 1
Prof. Kristen Grauman

Outline
• Last time: local invariant features, scale invariant detection
• Applications, including stereo
• Indexing with invariant features
• Bag-of-words representation for images

Classes of transformations
• Euclidean/rigid: Translation + rotation
– Lengths and angles preserved
• Similarity: Translation + rotation + uniform scale
• Affine: Similarity + shear
– Valid for orthographic camera, locally planar object
– Lengths and angles not preserved
[Figure panels: Translation; Translation and Scaling; Similarity transformation; Affine transformation]

Invariant local features
Subset of local feature types designed to be invariant to
– Scale
– Translation
– Rotation
– Affine transformations
– Illumination
1) Detect distinctive interest points
2) Extract invariant descriptors
[Figure: two descriptor vectors, (x_1, x_2, …, x_d) and (y_1, y_2, …, y_d)]
[Mikolajczyk & Schmid, Matas et al., Tuytelaars & Van Gool, Lowe, Kadir et al.,… ]

Recall: segmentation as clustering
• Previously we represented pixels with features, mapping each one to a d-dimensional vector
[Figure: pixels plotted as points in R, G, B feature space, some with added X, Y position, e.g. a pixel with R=255, G=200, B=250]
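Returning to the classes of transformations listed above, here is a minimal sketch of what "lengths preserved" means for each class. The particular rotation angle, scale factor, shear matrix and translation are assumptions chosen only for illustration; they are not from the slides.

```python
import math

def apply(A, t, p):
    """Apply the map x -> A x + t to a 2D point p."""
    return (A[0][0] * p[0] + A[0][1] * p[1] + t[0],
            A[1][0] * p[0] + A[1][1] * p[1] + t[1])

def rotation(theta_deg):
    """2x2 rotation matrix for an angle given in degrees."""
    c = math.cos(math.radians(theta_deg))
    s = math.sin(math.radians(theta_deg))
    return [[c, -s], [s, c]]

def matmul(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def length_ratio(T, p, q):
    """Length of the transformed segment pq divided by its original length."""
    A, t = T
    p2, q2 = apply(A, t, p), apply(A, t, q)
    return math.hypot(p2[0] - q2[0], p2[1] - q2[1]) / math.hypot(p[0] - q[0], p[1] - q[1])

# Illustrative parameters (assumed values).
t = (5.0, -2.0)                    # translation
R = rotation(30)                   # rotation
S = [[2.0, 0.0], [0.0, 2.0]]       # uniform scale
H = [[1.0, 0.5], [0.0, 1.0]]       # shear

euclidean = (R, t)                          # translation + rotation
similarity = (matmul(S, R), t)              # + uniform scale
affine = (matmul(matmul(S, R), H), t)       # + shear

origin, along_x, along_y = (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)
for name, T in [("euclidean", euclidean), ("similarity", similarity), ("affine", affine)]:
    print(name,
          round(length_ratio(T, origin, along_x), 3),
          round(length_ratio(T, origin, along_y), 3))
```

Under these assumed parameters the Euclidean map leaves both segment lengths unchanged, the similarity scales both by the same factor, and the affine map scales the two directions by different factors, which is why fixed-shape regions stop being comparable under affine change.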
Image patches as vectors
[Figure: an image patch flattened into a vector of pixel intensities]

Image metrics
Can compare those vector descriptions
• SSD
• Dot product
• …
Slide by Trevor Darrell, MIT

Indexing with local features
• Now we have patches or regions, still mapping each one to a d-dimensional vector (e.g., d=128 for SIFT)

SIFT descriptors: vector formation
• Thresholded image gradients are sampled over 16x16 array of locations in scale space
• Create array of orientation histograms
• 8 orientations x 4x4 histogram array = 128 dimensions
David Lowe, UBC

Indexing with local features
• When we see close points in feature space, we have similar descriptors, which indicates similar local content.
Figure from Andrew Zisserman, University of Oxford

What are the limitations of describing image patches with a stack of pixel intensities?
Why should something like a SIFT descriptor be more robust?
What role does the interest point detection play?
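To make the two image metrics above concrete, and to hint at one limitation of raw pixel stacks, here is a small sketch. The toy patch values are invented for illustration, and normalizing the dot product by the vector lengths is an assumption added here; the slide only says "dot product".

```python
import math

def ssd(a, b):
    """Sum of squared differences between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def normalized_dot(a, b):
    """Dot product divided by both vector lengths (cosine of the angle)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 2x2 patches flattened to 4-vectors (assumed values).
patch = [10, 20, 30, 40]
brighter = [2 * v for v in patch]   # same content, doubled brightness
different = [40, 10, 30, 20]        # same pixels, different arrangement

print("SSD to brighter copy:  ", ssd(patch, brighter))
print("SSD to different patch:", ssd(patch, different))
print("normalized dot, brighter: ", normalized_dot(patch, brighter))
print("normalized dot, different:", normalized_dot(patch, different))
```

With these values, SSD on raw intensities rates the brighter copy of the same patch as farther away than a genuinely different patch, while the normalized dot product treats the brighter copy as identical. That is one way to read the question of why a descriptor built from normalized gradient histograms might be more robust than a stack of intensities.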
Recall: Triangulation
[Figure: scene point P in 3d projects to p in the left image and p’ in the right image; camera centers O and O’ are joined by the baseline]
Estimate scene point based on camera relationships and correspondence.

Many applications of local features
• Wide baseline stereo
• Motion tracking
• Panoramas
• Mobile robot navigation
• 3D reconstruction
• Recognition
– Specific objects
– Textures
– Categories
• …

Dense correspondence search
For each epipolar line
For each pixel / window in the left image
• compare with every pixel / window on same epipolar line in right image
• pick position with minimum match cost (e.g., SSD, correlation)
Adapted from Li Zhang

Sparse correspondence search
• Restrict search to sparse set of detected features
• Rather than pixel values (or lists of pixel values) use feature descriptor and an associated feature distance
• Still narrow search further by epipolar geometry

Wide baseline stereo
• 3d reconstruction depends on finding good correspondences
• Especially with wide-baseline views, local image deformations not well-approximated with rigid transformations
• Cannot simply compare regions of fixed shape (circles, rectangles) – shape is not preserved under affine transformations
J. Matas, O. Chum, M. Urban, T. Pajdla. Robust Wide Baseline Stereo From Maximally Stable Extremal Regions, BMVC 2002.
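The sparse correspondence search described above can be sketched as follows. This assumes a rectified stereo pair, so that the epipolar line of a left feature is simply the same image row in the right image; the feature tuples, the row tolerance `row_tol`, and the use of SSD as the feature distance are all illustrative assumptions.

```python
def ssd(a, b):
    """Sum of squared differences between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def match_sparse(left, right, row_tol=1.0):
    """For each left feature, pick the right feature with the lowest
    descriptor distance among those on (nearly) the same row.

    Features are (x, y, descriptor) tuples. Returns a list of
    (left_index, right_index, cost) triples.
    """
    matches = []
    for i, (xl, yl, dl) in enumerate(left):
        best_j, best_cost = None, float("inf")
        for j, (xr, yr, dr) in enumerate(right):
            if abs(yl - yr) > row_tol:
                continue  # off the epipolar line: not a candidate
            cost = ssd(dl, dr)
            if cost < best_cost:
                best_j, best_cost = j, cost
        if best_j is not None:
            matches.append((i, best_j, best_cost))
    return matches

# Toy features (assumed values): (x, y, 4-D descriptor).
left = [(50, 10, [1, 0, 0, 1]), (80, 40, [0, 1, 1, 0])]
right = [(42, 10, [1, 0, 0, 1]),
         (70, 40, [0, 1, 1, 0]),
         (45, 90, [1, 0, 0, 1])]   # same descriptor as left[0], wrong row

print(match_sparse(left, right))
```

The third right-image feature has a descriptor identical to the first left feature, but it is never considered because it lies on a different row. This is the sense in which epipolar geometry narrows the search even after switching from pixels to descriptors.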
Wide baseline stereo
[Two figure slides]
J. Matas, O. Chum, M. Urban, T. Pajdla. Robust Wide Baseline Stereo From Maximally Stable Extremal Regions, BMVC 2002.

SIFT matching and recognition
• Index descriptors
• Generalized Hough transform: vote for object poses
• Refine with geometric verification: affine fit, check for agreement between image features and model
SIFT Features [Lowe 1999]

Recognition of specific objects, scenes
Schmid and Mohr 1997
Sivic and Zisserman, 2003
Lowe 2002
Rothganger et al. 2003

Panorama stitching
Brown, Szeliski, and Winder, 2005

Value of local (invariant) features
• Complexity reduction via selection of distinctive points
• Describe images, objects, parts without requiring segmentation
• Local character means robustness to clutter, occlusion
• Robustness: similar descriptors in spite of noise, blur, etc.
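For the "vote for object poses" step on the SIFT matching and recognition slide above, here is a deliberately simplified sketch. It reduces the pose to a 2D translation only, which is an assumption made to keep the example short; the match coordinates and the bin size are invented for illustration.

```python
from collections import Counter

def vote_translation(matches, bin_size=10):
    """Hough-style voting: each (model_point, image_point) match votes for
    the quantized translation that would map the model point onto the
    image point. Returns the winning translation and its vote count."""
    votes = Counter()
    for (mx, my), (ix, iy) in matches:
        dx, dy = ix - mx, iy - my
        votes[(round(dx / bin_size), round(dy / bin_size))] += 1
    (bx, by), count = votes.most_common(1)[0]
    return (bx * bin_size, by * bin_size), count

# Toy matches (assumed values): three agree on a shift near (100, 50),
# two are outliers from bad descriptor matches.
matches = [
    ((0, 0), (100, 50)),
    ((10, 0), (111, 49)),
    ((0, 20), (99, 71)),
    ((5, 5), (302, 12)),
    ((8, 2), (20, 200)),
]

translation, support = vote_translation(matches)
print("winning translation:", translation, "votes:", support)
```

Matches that land in the winning bin are the ones that would then go on to the geometric verification step (for example an affine fit), while the outliers are discarded because no other match agrees with them.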
Comparative evaluations
Testing various detector and descriptor options for relative repeatability and distinctiveness
Planar objects / flat scenes: Mikolajczyk & Schmid (2004)
3D objects: Moreels & Perona (2005)
http://www.robots.ox.ac.uk/~vgg/research/affine/detectors.html#binaries
[Images from Lazebnik, Sicily 2006]

Outline
• Last time: local invariant features, scale invariant detection
• Applications, including stereo
• Indexing with invariant features
• Bag-of-words representation for images

[Three figure slides from Andrew Zisserman, University of Oxford]
Text retrieval vs. image search
• What makes the problems similar, different?
Slide from Andrew Zisserman

[Figure: Object, Bag of ‘words’]

Analogy to documents
Of all the sensory impressions proceeding to the brain, the visual experiences are the dominant ones. Our perception of the world around us is based essentially on the messages that reach the brain from our eyes. For a long time it was thought that the retinal image was transmitted point by point to visual centers in the brain; the cerebral cortex was a movie screen, so to speak, upon which the image in the eye was projected. Through the discoveries of Hubel and Wiesel we now know that behind the origin of the visual perception in the brain there is a considerably more complicated course of events. By following the visual impulses along their path to the various cell layers of the optical cortex, Hubel and Wiesel have been able to demonstrate that the message about the image falling on the retina undergoes a step-wise analysis in a system of nerve cells stored in columns. In this system each cell has its specific function and is responsible for a specific detail in the pattern of the retinal image.
Words: sensory, brain, visual, perception, retinal, cerebral cortex, eye, cell, optical nerve, image, Hubel, Wiesel
ICCV 2005 short course, L. Fei-Fei

Analogy to documents
China is forecasting a trade surplus of $90bn (£51bn) to $100bn this year, a threefold increase on 2004's $32bn. The Commerce Ministry said the surplus would be created by a predicted 30% jump in exports to $750bn, compared with an 18% rise in imports to $660bn. The figures are likely to further annoy the US, which has long argued that China's exports are unfairly helped by a deliberately undervalued yuan. Beijing agrees the surplus is too high, but says the yuan is only one factor. Bank of China governor Zhou Xiaochuan said the country also needed to do more to boost domestic demand so more goods stayed within the country. China increased the value of the yuan against the dollar by 2.1% in July and permitted it to trade within a narrow band, but the US wants the yuan to be allowed to trade freely. However, Beijing has made it clear that it will take its time and tread carefully before allowing the yuan to rise further in value.
Words: China, trade, surplus, commerce, exports, imports, US, yuan, bank, domestic, foreign, increase, trade, value
ICCV 2005 short course, L. Fei-Fei

[Figure: overview diagram labeled representation and recognition, with stages feature detection & representation, codewords dictionary, image representation, category models (and/or) classifiers, category decision]
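The document analogy above can be sketched in a few lines: a document is reduced to counts over a fixed vocabulary, and word order is thrown away. The two short texts are abbreviated paraphrases of the passages on the slides, and the six-word vocabulary is an assumption chosen for illustration.

```python
from collections import Counter
import re

def bag_of_words(text, vocabulary):
    """Return a count vector with one entry per vocabulary word, in
    vocabulary order. Word order in the text is ignored."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w in vocabulary)
    return [counts[w] for w in vocabulary]

# Assumed toy vocabulary mixing words from both passages.
vocabulary = ["brain", "visual", "retinal", "trade", "yuan", "exports"]

doc_vision = ("Our perception of the world is based on the messages that "
              "reach the brain from our eyes. The retinal image was "
              "transmitted to visual centers in the brain.")
doc_trade = ("China is forecasting a trade surplus. The US wants the yuan "
             "to be allowed to trade freely, and says exports are helped "
             "by an undervalued yuan.")

print(bag_of_words(doc_vision, vocabulary))
print(bag_of_words(doc_trade, vocabulary))
```

The two count vectors have no non-zero entries in common, so even this crude representation separates the topics. The image version replaces dictionary words with quantized local descriptors, which is what the following slides build toward.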
1. Feature detection and representation
• Regular grid
• Interest point detector
• Other methods
– Random sampling
– Segmentation based patches

1. Feature detection and representation
Detect patches [Mikolajczyk and Schmid ’02] [Matas et al. ’02] [Sivic et al. ’03]
Normalize patch
Compute SIFT descriptor [Lowe’99]
Slide credit: Josef Sivic
2. Codewords dictionary formation
Extract some local features from a number of images …
SIFT descriptor space: each point is 128-dimensional
Vector quantization
Slide credit: Josef Sivic
Slides from D. Nister
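A minimal sketch of codewords dictionary formation by vector quantization follows. The slides name vector quantization but not a specific algorithm; k-means is assumed here as one common choice. The descriptors are 2-D toy points rather than 128-dimensional SIFT vectors, and the deterministic initialization (evenly spaced picks from the training data) is a simplification for reproducibility.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centers):
    """Index of the center closest to the point."""
    return min(range(len(centers)), key=lambda k: sq_dist(point, centers[k]))

def kmeans(points, k, iters=20):
    """Plain k-means; returns k cluster centers (the codewords)."""
    step = len(points) // k
    centers = [list(points[i * step]) for i in range(k)]  # assumed simple init
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        for j, group in enumerate(groups):
            if group:  # keep the old center if a cluster goes empty
                centers[j] = [sum(c) / len(group) for c in zip(*group)]
    return centers

def bow_histogram(descriptors, centers):
    """Quantize each descriptor to its nearest codeword and count."""
    hist = [0] * len(centers)
    for d in descriptors:
        hist[nearest(d, centers)] += 1
    return hist

# Toy "descriptors" pooled from a number of training images (assumed values).
training = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
codebook = kmeans(training, k=2)

# Descriptors extracted from one new image (assumed values).
image_descriptors = [(0.5, 0.5), (9, 9), (10, 10), (12, 11)]
print("codebook:", codebook)
print("bag-of-words histogram:", bow_histogram(image_descriptors, codebook))
```

The resulting histogram plays the same role for an image that the word-count vector plays for a document: it records how often each codeword occurs and discards where in the image it occurred.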