Feature-based Place Recognition, Akihiko Torii (Tokyo Tech), CVPR 2017 (PowerPoint presentation)

  1. Feature-based Place Recognition. Akihiko Torii (Tokyo Tech). CVPR 2017 tutorial on Large-Scale Visual Place Recognition and Image-Based Localization, by Alex Kendall, Torsten Sattler, Giorgos Tolias, and Akihiko Torii.

  2. Outline • Introduction • Challenges in large-scale place recognition • Local feature-based image representation • Instance-level recognition to place recognition • Datasets and evaluation protocol

  3. Introduction

  4. Visual place recognition: Where? (illustrated with a map search box)

  5. The approach • Represent the world by a set of geotagged images. • Given a query image, find the best-matching image. • Transfer the geotag of the best-matching image. Example query: https://www.google.co.jp/maps/@35.6066354,139.6861582,3a,45.2y,256.68h,96.58t
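The retrieve-then-transfer approach can be sketched as a nearest-neighbor search over global image descriptors followed by copying the geotag. This is a minimal sketch, assuming every image has already been mapped to a fixed-length descriptor; the function name `transfer_geotag`, the `(descriptor, (lat, lon))` database layout and the toy values are hypothetical, and a large-scale system would replace the linear scan with an (approximate) nearest-neighbor index.

```python
import math

def transfer_geotag(query_desc, database):
    """Return the geotag of the database image whose descriptor is closest
    (Euclidean distance) to the query descriptor.

    database: list of (descriptor, (lat, lon)) pairs (hypothetical layout).
    """
    best_tag, best_dist = None, math.inf
    for desc, tag in database:
        dist = math.dist(query_desc, desc)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_dist, best_tag = dist, tag
    return best_tag

# Toy database of 2-D "image representations" with geotags.
db = [
    ([0.0, 1.0], (35.6066, 139.6862)),
    ([1.0, 0.0], (35.6812, 139.7671)),
]
print(transfer_geotag([0.9, 0.1], db))  # (35.6812, 139.7671)
```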

  6. Why is this interesting? • Mapping/organizing any photos on the globe • Recognition & geometry

  7. (Visual) place recognition [Knopp10, Torii13, Maddern14, Johns14, Sunderhauf15] • Location recognition [Cao13, Arandjelovic14, Sattler15] • Landmark identification [Chen11] • Geo-localization [Hays08, Cummins08, Zamir10, Zamir16, Kim17]

  8. Challenges in large-scale place recognition

  9. Sources of geotagged images • Photo community sites (Flickr, Instagram, …): + never stop growing; - noisy images and tags, concentrated on landmarks • Street-view images (Google Street View, Mapillary, …): + accurate, covering almost all streets; - cannot be updated frequently. Perspective cutouts are generated from them [Gronat11, Chen11, Torii13].

  10. Sources of geotagged images: the San Francisco Landmarks dataset [Chen11] contains 1.06M images covering 6 × 6 km². Spatial and temporal densities may increase by collecting all the data from autonomous driving! However, it is impossible to monitor all the streets all the time. Figures from [Chen-CVPR11].

  11. Temporal sparseness induces … changes in lighting (day vs. night) and structure (e.g., a poster). Figure: a database image (2014/07) and query images (2014/10/08) on a time axis.

  12. Spatial sparseness induces … changes in viewpoint and occlusions. Figure axes: space and time.

  13. In practice, it is a mixture of … lighting, structure, viewpoint, and occlusion changes. Figure axes: space and time.

  14. Why is this difficult? • Temporal gap: lighting, weather, season, structures, moving objects • Spatial gap: viewpoints, self-occlusions • Large scale: inter/intra repetitions, saturation of features

  15. Local feature-based image representation

  16. Visual instance recognition: design an “image representation” extractor f(I). Every image of the geotagged database, and the query, is mapped by f(·) into an image representation space; the GPS tag of the nearest database image is transferred to the query.

  17. Review: visual instance recognition. Image I → extract local features → aggregate → f(I). Goal: a compact yet discriminative image representation f(I), e.g., BoW, VLAD, FV.

  18. Review: Bag of Words (BoW) [Sivic03]. Local feature detection and description (DoG+SIFT), then a 0/1 assignment $a_k(x_i)$ of descriptor $x_i$ to cluster $k$ (e.g., [0, 1, 0, 0] for one descriptor).

  19. Review: Bag of Words (BoW) [Sivic03]. With DoG+SIFT descriptors and the 0/1 assignment $a_k(x_i)$ of descriptor $x_i$ to cluster $k$, sum over all $N$ descriptors in the image: $B(k) = \sum_{i=1}^{N} a_k(x_i)$, e.g., B = [1, 0, 2, 1, …].
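The BoW histogram $B(k) = \sum_i a_k(x_i)$ can be sketched in a few lines. This is a minimal sketch, assuming toy 2-D descriptors in place of 128-D SIFT and a small list of hypothetical cluster centers in place of a trained k-means vocabulary; the function names are illustrative.

```python
import math

def nearest_cluster(desc, centers):
    """Hard (0/1) assignment: index k of the closest cluster center."""
    return min(range(len(centers)), key=lambda k: math.dist(desc, centers[k]))

def bow_histogram(descriptors, centers):
    """B(k) = sum_i a_k(x_i): number of descriptors assigned to cluster k."""
    hist = [0] * len(centers)
    for x in descriptors:
        hist[nearest_cluster(x, centers)] += 1
    return hist

centers = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]]
descs = [[0.5, 0.2], [0.1, 9.5], [0.3, 10.4], [9.8, 9.9]]
print(bow_histogram(descs, centers))  # [1, 0, 2, 1]
```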

  20. Review: Vector of Locally Aggregated Descriptors (VLAD) [Jégou10b]. 0/1 assignment $a_k(x_i)$ of descriptor $x_i$ to cluster $k$, and residual vector $x_i - c_k$ from the cluster center $c_k$.

  21. Review: Vector of Locally Aggregated Descriptors (VLAD) [Jégou10b]. Sum the assigned residuals over all $N$ descriptors in the image: $V(j,k) = \sum_{i=1}^{N} a_k(x_i)\,(x_i(j) - c_k(j))$, where $j$ indexes the descriptor dimensions; V concatenates the residual sums of all clusters.
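The VLAD aggregation $V(j,k) = \sum_i a_k(x_i)(x_i(j) - c_k(j))$ can be sketched as follows. This is a minimal sketch with hard assignment and toy 2-D descriptors; the final L2 normalization is a common post-processing step included here as an assumption (switchable via the hypothetical `normalize` flag).

```python
import math

def vlad(descriptors, centers, normalize=True):
    """VLAD with hard 0/1 assignment, returned as a flat vector of length K * D
    (cluster-major order)."""
    K, D = len(centers), len(centers[0])
    V = [[0.0] * D for _ in range(K)]
    for x in descriptors:
        # a_k(x_i) = 1 only for the nearest center
        k = min(range(K), key=lambda c: math.dist(x, centers[c]))
        for j in range(D):
            V[k][j] += x[j] - centers[k][j]  # accumulate the residual
    flat = [v for row in V for v in row]
    if normalize:
        # Global L2 normalization (common post-processing, an assumption here)
        norm = math.sqrt(sum(v * v for v in flat))
        if norm > 0:
            flat = [v / norm for v in flat]
    return flat

centers = [[0.0, 0.0], [10.0, 0.0]]
descs = [[1.0, 0.0], [0.0, 1.0], [11.0, 0.0]]
print(vlad(descs, centers, normalize=False))  # [1.0, 1.0, 1.0, 0.0]
```

Unlike BoW, which only counts descriptors per cluster, each cluster here keeps the summed offset of its descriptors from the center, so the dimension is #clusters × dim. of feature.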

  22. BoW vs. VLAD (FV)
     • BoW: B = [1, 0, 2, 1, …]; dim. = #clusters (e.g., 1.6M); + can be a sparse histogram (using a large vocabulary); + can provide matches.
     • VLAD (FV): V = [residual sums, …]; dim. = #clusters × dim. of feature (e.g., 256 × 128); + performs well with a small vocabulary; + can be compressed by PCA with a small loss in performance; + no extra memory required to encode more features.

  23. Matching BoW histograms: Q = [1, 0, 2, 1, …] vs. B = [1, 0, 2, 1, …]
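Matching two BoW histograms reduces to a similarity between sparse vectors. The sketch below uses the cosine similarity of raw counts stored as `{word_id: count}` dictionaries; this choice is an assumption for illustration (retrieval systems commonly add tf-idf weighting, omitted here), and the function name is hypothetical.

```python
import math

def bow_cosine(q, b):
    """Cosine similarity of two sparse BoW histograms stored as {word_id: count}.

    Only visual words present in both histograms contribute to the dot product.
    """
    dot = sum(cnt * b.get(w, 0) for w, cnt in q.items())
    nq = math.sqrt(sum(c * c for c in q.values()))
    nb = math.sqrt(sum(c * c for c in b.values()))
    return dot / (nq * nb) if nq > 0 and nb > 0 else 0.0

Q = {0: 1, 2: 2, 3: 1}  # sparse form of Q = [1, 0, 2, 1]
B = {0: 1, 2: 1, 5: 3}  # a different (hypothetical) database histogram
print(round(bow_cosine(Q, B), 4))  # 0.3693
```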

  27. Sparse to dense features: image I → extract local features (DoG+SIFT) → aggregate → f(I)

  28. Sparse to dense features: image I → extract local features densely (DSIFT, PHOW) → aggregate → f(I). + No memory overhead (with VLAD) + No bursts +/- Less invariant to viewpoint changes. See [Lazebnik06, Bosch07, Iscen15, Torii15].

  29. Sparse to dense features to CNN: the local-feature extraction and aggregation steps become CNN layers and a pooling layer. Image → CNN layers → W×H×D map, interpreted as N×D local descriptors x → NetVLAD layer: conv (w, b) with 1×1×D×K filters → soft-max → soft-assignment s; VLAD core (c) → intra-normalization → L2 normalization → (K×D)×1 VLAD vector V. For details, please be patient and wait for the next session! See also an excellent survey paper [Zheng17]!
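The NetVLAD layer's data flow can be sketched as a plain-Python forward pass: a 1×1 convolution plus soft-max gives soft-assignments $a_k(x) = \mathrm{softmax}_k(w_k^\top x + b_k)$, which replace the 0/1 assignment in the VLAD sum, followed by intra-normalization and L2 normalization. This is a minimal, non-trainable sketch; in NetVLAD, w, b and c are learned end-to-end, whereas here they are fixed toy values and the function names are hypothetical.

```python
import math

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def l2_normalize(v):
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v] if n > 0 else list(v)

def netvlad_forward(descs, w, b, c):
    """NetVLAD-style pooling of N local descriptors of dimension D.

    w: K x D weights and b: K biases of the 1x1 conv; c: K x D VLAD centers.
    Returns a (K*D)-dimensional, L2-normalized vector.
    """
    K, D = len(c), len(c[0])
    V = [[0.0] * D for _ in range(K)]
    for x in descs:
        # Soft-assignment a_k(x) = softmax_k(w_k . x + b_k)
        a = softmax([sum(wk[j] * x[j] for j in range(D)) + bk for wk, bk in zip(w, b)])
        for k in range(K):
            for j in range(D):
                V[k][j] += a[k] * (x[j] - c[k][j])  # soft-weighted residual
    V = [l2_normalize(row) for row in V]  # intra-normalization per cluster
    return l2_normalize([v for row in V for v in row])  # final L2 normalization

descs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
w = [[5.0, 0.0], [0.0, 5.0]]
b = [0.0, 0.0]
c = [[1.0, 0.0], [0.0, 1.0]]
out = netvlad_forward(descs, w, b, c)
print(len(out))  # 4
```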

  30. Instance-level recognition to place recognition

  31. Designing (sparse) BoW tailored for place recognition tasks (please forget about VLAD for 5 min)

  32. Using advanced techniques • Burstiness weighting [Jegou09] • Soft/multiple assignment [Philbin08, Chen11] • Hamming embedding [Jegou10, Arandjelovic14, Sattler16] • Query expansion [Chum11, Arandjelovic12] • Spatial/Hough pyramid, WGC [Lazebnik06, Jegou10, Tolias14]. Figure from [Zheng16].

  33. Using advanced techniques (the same list as slide 32) against the challenges in large-scale place recognition: • Temporal: lighting, weather, season, structures, moving objects • Spatial: viewpoints, self-occlusions • Large scale: saturation (repetitions)

  35. Burstiness weighting [Jegou09]: suppress the saturation of BoW by repetitive patterns. For the query, the scores of the top 3 ranked images are dominated by visual words (VWs) on repeated structures. But removing those VWs loses too much information.
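One simple way to damp repeated visual words without discarding them is power-law (e.g., square-root) compression of the BoW counts followed by L2 normalization. This is an illustrative stand-in, not the exact weighting proposed in [Jegou09]; `power_law_bow`, `alpha` and the toy histogram are hypothetical.

```python
import math

def power_law_bow(hist, alpha=0.5):
    """Down-weight bursty visual words: replace each count n by n ** alpha
    (alpha = 0.5 is the square root), then L2-normalize the histogram."""
    w = [n ** alpha for n in hist]
    norm = math.sqrt(sum(v * v for v in w))
    return [v / norm for v in w] if norm > 0 else w

# A repeated facade element fires visual word 0 sixteen times.
hist = [16, 1, 1, 0]
print([round(v, 3) for v in power_law_bow(hist)])  # [0.943, 0.236, 0.236, 0.0]
```

With plain L2 normalization the bursty word would carry a weight of about 0.996 and each other word about 0.062; after compression the other words keep a meaningful share of the score while the repeated word is still counted.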

  36. Soft/multiple assignment [Philbin08, Chen11]: retrieve matches lost by quantization. Figure: query image and (correct) database image, with soft weights.
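Soft assignment spreads each descriptor over several nearby visual words instead of one, so a database descriptor quantized to a neighboring word can still match. A minimal sketch, assuming Gaussian weights $\exp(-d^2 / 2\sigma^2)$ over the r nearest words, normalized to sum to one (the normalization is an assumption); `r`, `sigma` and the toy centers are arbitrary, hypothetical values.

```python
import math

def soft_assign(desc, centers, r=3, sigma=1.0):
    """Assign a descriptor to its r nearest visual words with weights
    exp(-d^2 / (2 * sigma^2)), normalized here to sum to one."""
    nearest = sorted((math.dist(desc, c) ** 2, k) for k, c in enumerate(centers))[:r]
    weights = {k: math.exp(-d2 / (2 * sigma ** 2)) for d2, k in nearest}
    total = sum(weights.values())
    return {k: wgt / total for k, wgt in weights.items()}

centers = [[0.0], [1.0], [5.0]]
print(soft_assign([0.4], centers, r=2))
```

A hard assignment would put this descriptor only in word 0; with soft assignment it also contributes to word 1, with a smaller weight.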

  37. Adaptive assignment [Torii13]: (1) explicitly detect repeated structures; (2) design an adaptive soft-assignment procedure, since repetitions provide a natural soft-assignment; (3) truncate high weights to limit the influence of repeated VWs.

  38. Hamming embedding [Jegou10] • Subdivide each cell into finer blocks • Give each descriptor an additional binary signature. Figure: query and (correct) database descriptors in the same cell, with 2-bit signatures (00, 01, 10, 11).
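In Hamming embedding, two descriptors match only if they fall in the same visual word and their binary signatures are close in Hamming distance. A minimal sketch, assuming the projected descriptor components and the per-visual-word median thresholds are already available (they are learned offline); the function names and the `max_hamming` value are hypothetical.

```python
def he_signature(projected, medians):
    """Bit i is 1 if projected component i exceeds the median threshold learned
    for that component in the descriptor's visual word."""
    sig = 0
    for i, (v, m) in enumerate(zip(projected, medians)):
        if v > m:
            sig |= 1 << i
    return sig

def hamming(a, b):
    """Number of differing bits between two integer-encoded signatures."""
    return bin(a ^ b).count("1")

def he_match(word_q, sig_q, word_db, sig_db, max_hamming):
    """A match requires the same visual word AND a small Hamming distance."""
    return word_q == word_db and hamming(sig_q, sig_db) <= max_hamming

sig_q = he_signature([0.3, -0.1, 0.8, 0.0], [0.0, 0.0, 0.0, 0.0])   # 0b0101
sig_db = he_signature([0.2, 0.4, 0.9, -0.3], [0.0, 0.0, 0.0, 0.0])  # 0b0111
print(he_match(7, sig_q, 7, sig_db, max_hamming=1))  # True
```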

  39. DisLocation [Arandjelovic14]: distinctiveness is the inverse of local density, measured as σ, the distance to the k-th nearest descriptor (HE signature); the larger σ, the sparser the neighborhood. See also DisLoc + geometric burstiness [Sattler16] and the selective match kernel [Tolias16].
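The σ on this slide can be computed directly from Hamming-embedding signatures. A sketch under the assumption that neighbors are searched among other signatures assigned to the same visual word; how σ is then turned into a match weight in [Arandjelovic14] is not reproduced here, and the function name is hypothetical.

```python
def local_density_sigma(sig, neighbors, k):
    """sigma: Hamming distance from `sig` to its k-th nearest neighbor among
    other HE signatures of the same visual word. A large sigma means a sparse
    neighborhood, i.e. a more distinctive descriptor."""
    if not 1 <= k <= len(neighbors):
        raise ValueError("need at least k neighbors")
    dists = sorted(bin(sig ^ n).count("1") for n in neighbors)
    return dists[k - 1]

neighbors = [0b0001, 0b0011, 0b1111]
print(local_density_sigma(0b0000, neighbors, k=2))  # 2
```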

  40. Using GPS tags as weak priors

  41. Spatially far images should not match. Figure: a positive image within 200 m and many negative data points farther away. • What are informative features? [Schindler07] • Ratio test with a location constraint [Zamir10] • [Knopp10, Gronat13, Cao13, Sattler16, …]

  42. Detecting “confusing” image regions [Knopp10]. Key idea: spatially far images should not match. Find the most similar images that are spatially far.

  43. Detecting “confusing” image regions [Knopp10]: find image areas with a high density of local matches to the confusing images. Figure: matches with confusing images, confusion score, confusing regions.
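The first step, finding the most similar images that are spatially far, can be sketched as a filter on a ranked retrieval list for a given image. A minimal sketch, assuming geotags are (lat, lon) in degrees and using the haversine great-circle distance; the 200 m threshold is borrowed from the slides, while `top_n` and the function names are hypothetical. [Knopp10] then uses such images to locate confusing regions, which is not shown here.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2, radius=6_371_000.0):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))

def confusing_images(query_gps, ranked, min_dist_m=200.0, top_n=5):
    """From a ranked list of (image_id, (lat, lon)), keep the most similar
    images that are at least `min_dist_m` away from the query."""
    far = [img for img, (lat, lon) in ranked
           if haversine_m(*query_gps, lat, lon) >= min_dist_m]
    return far[:top_n]

query = (35.6066, 139.6862)
ranked = [("a", (35.6067, 139.6862)),  # about 11 m away
          ("b", (35.6200, 139.6862)),  # about 1.5 km away
          ("c", (35.6066, 139.6863)),  # about 9 m away
          ("d", (35.7000, 139.6862))]  # about 10 km away
print(confusing_images(query, ranked))  # ['b', 'd']
```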

  44. Learning a per-place linear SVM [Gronat-CVPR13]. Key idea: spatially far images (200 m in the figure) should not match. The objective function uses h, the squared hinge loss. See also [Cao13]. Similar to the Exemplar SVM of [Malisiewicz11].
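The objective's formula did not survive on this slide. Assuming it follows the exemplar-SVM form the slide points to, with separate weights $C_+$ and $C_-$ for the place's positive image(s) $P$ and its spatially far negatives $N$, it reads $\Omega(w, b) = \lVert w \rVert^2 + C_+ \sum_{x \in P} h(w^\top x + b) + C_- \sum_{x \in N} h(-w^\top x - b)$ with $h(z) = \max(0, 1 - z)^2$. The sketch below only evaluates this assumed objective for a given (w, b); training would minimize it with an SVM solver, and the function names are hypothetical.

```python
def squared_hinge(z):
    """h(z) = max(0, 1 - z) ** 2"""
    return max(0.0, 1.0 - z) ** 2

def per_place_objective(w, b, positives, negatives, c_pos=1.0, c_neg=1.0):
    """Assumed exemplar-SVM-style objective for one place (see lead-in)."""
    def score(x):
        return sum(wi * xi for wi, xi in zip(w, x)) + b
    regularizer = sum(wi * wi for wi in w)
    pos_loss = sum(squared_hinge(score(x)) for x in positives)   # want score >= 1
    neg_loss = sum(squared_hinge(-score(x)) for x in negatives)  # want score <= -1
    return regularizer + c_pos * pos_loss + c_neg * neg_loss

# One positive (the place's image) and one spatially far negative.
print(per_place_objective([1.0, 0.0], 0.0, [[2.0, 0.0]], [[0.5, 0.0]]))  # 3.25
```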

  45. NetVLAD [Arandjelovic16]: GPS provides only weak supervision. Given a query, GPS gives us: • Definite negatives: images geographically far from the query.
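GPS-based weak supervision can be sketched in two steps: split database images by distance to the query into potential positives (nearby, but not guaranteed to show the same scene) and definite negatives (far away), then apply a triplet ranking loss in which only the best-matching potential positive must be closer than every negative by a margin. This assumes the min-over-potential-positives loss of [Arandjelovic16]; the radii, the margin and the function names are hypothetical, and images between the two radii are simply ignored.

```python
def split_by_gps(dists_m, pos_radius_m, neg_radius_m):
    """Indices of potential positives (within pos_radius_m of the query) and
    definite negatives (farther than neg_radius_m); others are ignored."""
    pos = [i for i, d in enumerate(dists_m) if d <= pos_radius_m]
    neg = [i for i, d in enumerate(dists_m) if d > neg_radius_m]
    return pos, neg

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def weak_triplet_loss(q, potential_pos, negatives, margin=0.1):
    """sum_j max(0, min_i d^2(q, p_i) + margin - d^2(q, n_j))"""
    best = min(sq_dist(q, p) for p in potential_pos)  # best-matching potential positive
    return sum(max(0.0, best + margin - sq_dist(q, n)) for n in negatives)

pos_idx, neg_idx = split_by_gps([5.0, 15.0, 30.0, 500.0], 10.0, 25.0)
print(pos_idx, neg_idx)  # [0] [2, 3]
```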

  46. Major changes in appearance

  47. Changes across time, weather, and season. More studied in the robot vision community, e.g.: • Generating illumination-invariant images [Maddern14] • Learning from repeated recordings [Neubert15]. Figures from [Maddern14] and [Neubert15].
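[Maddern14] maps an RGB image to a single illumination-invariant channel. A sketch, assuming the formulation $I = 0.5 + \log G - \alpha \log B - (1 - \alpha) \log R$, where $\alpha$ is fixed by the camera's peak spectral sensitivities $\lambda_1 < \lambda_2 < \lambda_3$ (blue, green, red) through $1/\lambda_2 = \alpha/\lambda_1 + (1 - \alpha)/\lambda_3$. The wavelengths below are placeholders rather than any particular camera's values, and the function names are hypothetical. Because the coefficients on the three log channels sum to zero, a global scaling of the RGB values cancels out.

```python
import math

def alpha_from_peaks(l_blue, l_green, l_red):
    """Solve 1/l_green = alpha/l_blue + (1 - alpha)/l_red for alpha."""
    return (l_blue * (l_red - l_green)) / (l_green * (l_red - l_blue))

def invariant_pixel(r, g, b, alpha):
    """Illumination-invariant value for one pixel with strictly positive R, G, B."""
    return 0.5 + math.log(g) - alpha * math.log(b) - (1 - alpha) * math.log(r)

alpha = alpha_from_peaks(470.0, 540.0, 620.0)  # placeholder peak wavelengths in nm
print(round(alpha, 3))  # 0.464
# Scaling all channels by the same factor leaves the output unchanged:
print(math.isclose(invariant_pixel(0.2, 0.4, 0.3, alpha),
                   invariant_pixel(0.6, 1.2, 0.9, alpha)))  # True
```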

  48. Place recognition under large changes in appearance and illumination [Torii15]: a very challenging dataset that contains major changes in illumination as well as structural changes. Figure: a database image and query images taken during the day, at sunset, and at night.
