  1. Instance level recognition IV: Very large databases Cordelia Schmid LEAR – INRIA Grenoble

  2. Visual search … change in viewing angle

  3. Matches: 22 correct matches

  4. Image search system for large datasets
     • Setting: a large image dataset (one million images or more); given a query, the image search system returns a ranked image list
     • Issues for very large databases
       – reduce the query time
       – reduce the storage requirements
       – with minimal loss in retrieval accuracy

  5. Large scale object/scene recognition
     • Image dataset: > 1 million images; given a query, the system returns a ranked image list
     • Each image is described by approximately 2000 descriptors
       – 2 × 10^9 descriptors to index for one million images!
     • Database representation in RAM
       – size of the descriptors: 1 TB; search and memory are intractable
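A quick back-of-the-envelope check of the 1 TB figure; the 128 dimensions and 4 bytes per component are assumptions about how the SIFT descriptors are stored, not numbers given on the slide:

```python
# Back-of-the-envelope check of the 1 TB storage figure.
images = 1_000_000
descriptors_per_image = 2_000        # ~2 * 10**9 descriptors in total
dims, bytes_per_dim = 128, 4         # 128-D SIFT as 4-byte floats (assumed)

total_bytes = images * descriptors_per_image * dims * bytes_per_dim
print(total_bytes / 10**12, "TB")    # -> 1.024 TB, matching the slide
```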

  6. Bag-of-features [Sivic & Zisserman'03]
     • Pipeline: query image → Harris-Hessian-Laplace regions + SIFT descriptors → set of SIFT descriptors → assignment to centroids (visual words) → sparse frequency vector with tf-idf weighting → inverted file querying → ranked image list → geometric verification [Chum & al. 2007] → re-ranked short-list
     • Visual words
       – 1 word (index) per local descriptor
       – only image ids in the inverted file ⇒ 8 GB for a million images, fits in RAM
     • Problem: matching approximation

  7. Visual words – approximate NN search
     • Map descriptors to words by quantizing the feature space
       – quantize via k-means clustering to obtain visual words
       – assign each descriptor to the closest visual word
     • Bag-of-features as approximate nearest neighbor search
       – descriptor matching with k-nearest neighbors is replaced by the bag-of-features matching function f(x, y) = δ_{q(x), q(y)}, where q(x) is a quantizer, i.e. the assignment to a visual word, and δ_{a,b} is the Kronecker operator (δ_{a,b} = 1 iff a = b); see the sketch below
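A minimal sketch of the quantizer q(x) and the BoF matching function, assuming scikit-learn for k-means and random vectors standing in for real SIFT descriptors:

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy stand-ins for 128-D SIFT descriptors.
rng = np.random.default_rng(0)
train_desc = rng.random((5000, 128)).astype(np.float32)

# Off-line: learn the visual vocabulary (k centroids) with k-means.
k = 256
vocab = KMeans(n_clusters=k, n_init=1, random_state=0).fit(train_desc)

def quantize(descriptors):
    """q(x): assign each descriptor to its nearest centroid (visual word)."""
    return vocab.predict(descriptors)

def bof_match(words_x, words_y):
    """Sum of Kronecker deltas over all descriptor pairs of two images,
    i.e. the dot product of their visual-word count histograms."""
    hx = np.bincount(words_x, minlength=k)
    hy = np.bincount(words_y, minlength=k)
    return int(np.dot(hx, hy))

# Usage: score two images given their descriptor sets.
score = bof_match(quantize(rng.random((800, 128)).astype(np.float32)),
                  quantize(rng.random((900, 128)).astype(np.float32)))
```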

  8. Approximate nearest neighbor search evaluation
     • ANN algorithms usually return a short-list of nearest neighbors
       – this short-list is supposed to contain the NN with high probability
       – exact search may be performed to re-order this short-list
     • Proposed quality evaluation of ANN search: trade-off between
       – accuracy: NN recall = probability that the NN is in this list
       – ambiguity removal = proportion of vectors in the short-list; the lower this proportion, the more information we have about the vector, and the lower the complexity if we perform exact search on the short-list
     • ANN search algorithms usually have some parameters to handle this trade-off (see the sketch below)
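A small sketch of the two quantities in this trade-off; the interface (true_nn, shortlists) is hypothetical and stands in for whatever ANN scheme is being evaluated:

```python
def ann_operating_point(true_nn, shortlists, database_size):
    """Evaluate one parameter setting of an ANN scheme.
    true_nn[i]    -- index of the exact nearest neighbor of query i
    shortlists[i] -- set of candidate indices returned for query i
    Returns (NN recall, rate of points retrieved): the first should be
    high, the second low; the scheme's parameters trade one for the other."""
    n_queries = len(true_nn)
    recall = sum(true_nn[i] in shortlists[i] for i in range(n_queries)) / n_queries
    rate = sum(len(s) for s in shortlists) / (n_queries * database_size)
    return recall, rate
```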

  9. ANN evaluation of bag-of-features
     • In BOF, the accuracy vs. ambiguity-removal trade-off is managed by the number of clusters k
     [plot: NN recall vs. rate of points retrieved, BOW curves for k = 100 to 50000]

  10. Vocabulary size
     • The intrinsic matching scheme performed by BOF is weak
       – for a "small" visual dictionary: too many false matches
       – for a "large" visual dictionary: high complexity, true matches are missed
     • No good trade-off between "small" and "large"!
       – either the Voronoi cells are too big
       – or these cells can't absorb the descriptor noise
     → the intrinsic approximate nearest neighbor search of BOF is not sufficient

  11. 20K visual word vocabulary: false matches

  12. 200K visual word vocabulary: good matches missed

  13. Hamming Embedding [Jegou et al. ECCV'08]
     • Representation of a descriptor x
       – vector-quantized to q(x) as in standard BOF
       – plus a short binary vector b(x) for additional localization within the Voronoi cell
     • Two descriptors x and y match iff q(x) = q(y) and h(b(x), b(y)) ≤ h_t, where h(a, b) is the Hamming distance between the binary signatures
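A minimal sketch of this matching rule; the threshold value h_t = 16 is borrowed from the evaluation on slide 18, and packing signatures as integers is an implementation choice, not part of the slide:

```python
def hamming_distance(a, b):
    """h(a, b): number of differing bits between two signatures packed as ints."""
    return bin(a ^ b).count("1")

def he_match(qx, bx, qy, by, h_t=16):
    """x and y match iff they share a visual word (same Voronoi cell)
    and their binary signatures differ in at most h_t bits."""
    return qx == qy and hamming_distance(bx, by) <= h_t
```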

  14. Term frequency – inverse document frequency
     • Weighting with the tf-idf score: weight visual words based on their frequency
     • tf: normalized term (word) frequency of term t_i in document d_j
       tf_ij = n_ij / Σ_k n_kj
     • idf: inverse document frequency, the log of the total number of documents divided by the number of documents containing the term t_i
       idf_i = log( |D| / |{d : t_i ∈ d}| )
     • tf-idf: (tf-idf)_ij = tf_ij · idf_i
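A minimal sketch of these formulas applied to bag-of-visual-words image descriptions; the function and variable names are illustrative:

```python
import math
from collections import Counter

def tfidf_vectors(docs_words, vocab_size):
    """docs_words[j] lists the visual-word ids occurring in image j
    (with repetitions); returns one tf-idf weighted vector per image."""
    n_docs = len(docs_words)
    # idf_i = log(|D| / |{d : t_i in d}|), counting each word once per document
    doc_freq = Counter(w for words in docs_words for w in set(words))
    idf = [math.log(n_docs / doc_freq[i]) if doc_freq[i] else 0.0
           for i in range(vocab_size)]
    vectors = []
    for words in docs_words:
        counts, total = Counter(words), len(words)
        # tf_ij = n_ij / sum_k n_kj, then weight each term by idf_i
        vectors.append([counts[i] / total * idf[i] if total else 0.0
                        for i in range(vocab_size)])
    return vectors

# Usage: tfidf_vectors([[0, 0, 2], [1, 2], [0, 1, 1]], vocab_size=3)
```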

  15. Hamming Embedding [Jegou et al. ECCV'08]
     • Nearest neighbors for the Hamming distance ≈ those for the Euclidean distance
       → a metric in the embedded space reduces dimensionality-curse effects
     • Efficiency
       – Hamming distance = very few operations
       – fewer random memory accesses: 3× faster than BOF with the same dictionary size!

  16. Hamming Embedding
     • Off-line (given a quantizer)
       – draw an orthogonal projection matrix P of size d_b × d → this defines d_b random projection directions
       – for each Voronoi cell and projection direction, compute the median value over a learning set
     • On-line: compute the binary signature b(x) of a given descriptor
       – project x onto the projection directions as z(x) = (z_1(x), …, z_{d_b}(x))
       – b_i(x) = 1 if z_i(x) is above the learned median value, otherwise 0
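A minimal sketch of the off-line and on-line steps, assuming NumPy; for brevity it learns medians for a single Voronoi cell rather than one set of medians per cell:

```python
import numpy as np

rng = np.random.default_rng(0)
d, d_b = 128, 64  # descriptor dimension, signature length (64 bits in practice)

# Off-line: a random orthogonal projection P of size d_b x d, obtained
# from the QR decomposition of a random Gaussian matrix.
P = np.linalg.qr(rng.standard_normal((d, d)))[0][:d_b]

def learn_medians(learning_descriptors):
    """Per projection direction, the median of the projected values
    (the full method computes these separately for every Voronoi cell)."""
    z = learning_descriptors @ P.T        # shape (n, d_b)
    return np.median(z, axis=0)           # shape (d_b,)

def binary_signature(x, medians):
    """b(x): b_i = 1 if the i-th projection of x exceeds the learned median."""
    return (P @ x > medians).astype(np.uint8)

# Usage: signatures are then compared with the Hamming distance (slide 13).
cell_train = rng.standard_normal((1000, d)).astype(np.float32)
med = learn_medians(cell_train)
sig = binary_signature(rng.standard_normal(d), med)
```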

  17. Hamming neighborhood
     • Trade-off between memory usage and accuracy
       – more bits yield higher accuracy
       – in practice, 64 bits (8 bytes)
     [plot: rate of 5-NN retrieved (recall) vs. rate of cell points retrieved, for signatures of 8, 16, 32, 64 and 128 bits]

  18. ANN evaluation of Hamming Embedding
     • Compared to BOW: at least 10 times fewer points in the short-list for the same level of accuracy
     • Hamming Embedding provides a much better trade-off between recall and ambiguity removal
     [plot: NN recall vs. rate of points retrieved; HE+BOW curves (h_t = 16) dominate the BOW curves for k = 100 to 50000]

  19. Matching points – 20k word vocabulary: 240 matches / 201 matches. Many matches with the non-corresponding image!

  20. Matching points – 200k word vocabulary: 69 matches / 35 matches. Still many matches with the non-corresponding one!

  21. Matching points – 20k word vocabulary + HE: 83 matches / 8 matches. 10× more matches with the corresponding image!

  22. Bag-of-features [Sivic & Zisserman'03] (pipeline recap)
     • Query image → Harris-Hessian-Laplace regions + SIFT descriptors → set of SIFT descriptors → assignment to centroids (visual words) → sparse frequency vector with tf-idf weighting → inverted file querying → ranked image list → geometric verification [Chum & al. 2007] → re-ranked short-list

  23. Geometric verification
     • Use the position and shape of the underlying features to improve retrieval quality
     • Both images have many matches – which is correct?

  24. Geometric verification
     • We can measure the spatial consistency between the query and each result to improve retrieval quality
       – many spatially consistent matches → correct result
       – few spatially consistent matches → incorrect result

  25. Geometric verification
     • Gives the localization of the object
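Slides 23-25 describe spatial-consistency checking without fixing an algorithm; the following is one plausible sketch using a RANSAC-style 2-D similarity fit, not necessarily the verification of [Chum & al. 2007]:

```python
import numpy as np

def spatial_consistency(pts_q, pts_r, n_iter=500, tol=5.0, seed=0):
    """Count spatially consistent matches between matched keypoint
    locations pts_q and pts_r (two (n, 2) arrays, one row per tentative
    match) under a 2-D similarity transform, RANSAC-style. A high
    inlier count suggests a correct result, a low one an incorrect one."""
    rng = np.random.default_rng(seed)
    # Over complex points, a 2-D similarity transform is r = a*q + b,
    # with complex a encoding scale+rotation and b the translation.
    q = pts_q[:, 0] + 1j * pts_q[:, 1]
    r = pts_r[:, 0] + 1j * pts_r[:, 1]
    best = 0
    for _ in range(n_iter):
        i, j = rng.choice(len(q), size=2, replace=False)
        if q[i] == q[j]:
            continue  # degenerate sample, cannot determine the transform
        a = (r[i] - r[j]) / (q[i] - q[j])
        b = r[i] - a * q[i]
        inliers = int((np.abs(a * q + b - r) < tol).sum())
        best = max(best, inliers)
    return best
```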

  26. Weak geometry consistency
     • Re-ranking based on full geometric verification
       – works very well
       – but is performed on a short-list only (typically 100 images)
     → for very large datasets, the number of distracting images is so high that relevant images are not even short-listed!
     [plot: rate of relevant images short-listed vs. dataset size (1000 to 1000000), for short-list sizes of 20, 100 and 1000 images]

  27. Weak geometry consistency
     • Weak geometric information is used for all images (not only the short-list)
     • Each invariant interest region detection has an associated scale and rotation angle, here the characteristic scale and the dominant gradient orientation
       [example: scale change of 2, rotation angle of ca. 20 degrees]
     • Each matching pair results in a scale and angle difference
     • For the global image, the scale and rotation changes are roughly consistent

  28. WGC: orientation consistency – the maximum of the angle-difference histogram gives the rotation angle between the images

  29. WGC: scale consistency

  30. Weak geometry consistency
     • Integration of the geometric verification into the BOF framework
       – votes for an image in two quantized subspaces, i.e. for angle and scale
       – these subspaces are shown to be roughly independent
       – final score: filtering for each parameter (angle and scale)
     • Only matches that agree with the dominant difference in orientation and scale are taken into account in the final score (see the sketch below)
     • Re-ranking using the full geometric transformation still adds information in a final stage
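A sketch of this two-histogram voting scheme as read from the slide; the bin counts, the log-scale quantization, and the min-based filtering are assumptions, not details given here:

```python
import numpy as np

def wgc_score(votes, angle_diffs, log2_scale_ratios,
              n_angle_bins=8, n_scale_bins=8):
    """Weak geometric consistency scoring (one reading of the slide).
    Each tentative match i carries a voting weight votes[i], an angle
    difference (radians) and a log2 scale ratio. Votes accumulate in two
    quantized histograms; the final score keeps only the vote mass
    consistent with the dominant angle and the dominant scale, taking
    the min since the two subspaces are treated as roughly independent."""
    votes = np.asarray(votes, dtype=float)
    a = np.asarray(angle_diffs) % (2 * np.pi)
    a_bins = np.minimum((a / (2 * np.pi) * n_angle_bins).astype(int),
                        n_angle_bins - 1)
    s_bins = np.clip(np.floor(np.asarray(log2_scale_ratios)
                              + n_scale_bins / 2).astype(int),
                     0, n_scale_bins - 1)
    a_hist = np.bincount(a_bins, weights=votes, minlength=n_angle_bins)
    s_hist = np.bincount(s_bins, weights=votes, minlength=n_scale_bins)
    return float(min(a_hist.max(), s_hist.max()))
```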

  31. INRIA Holidays dataset
     • Evaluation on the INRIA Holidays dataset, 1491 images
       – 500 query images + 991 annotated true positives
       – most images are holiday photos of friends and family
     • 1 million and 10 million distractor images from Flickr
     • Vocabulary construction on a different Flickr set
     • Almost real-time search speed
     • Evaluation metric: mean average precision (in [0, 1], bigger = better)
       – average over the precision/recall curve (see the sketch below)
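A minimal sketch of mean average precision as described (area under the precision/recall curve, averaged over queries):

```python
def average_precision(ranked_ids, positives):
    """AP for one query: mean of the precision values at each rank where
    a true positive is retrieved (area under the precision/recall curve)."""
    hits, precisions = 0, []
    for rank, image_id in enumerate(ranked_ids, start=1):
        if image_id in positives:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(positives) if positives else 0.0

def mean_average_precision(all_rankings, all_positives):
    """mAP: average the per-query AP values; in [0, 1], bigger = better."""
    aps = [average_precision(r, p) for r, p in zip(all_rankings, all_positives)]
    return sum(aps) / len(aps)
```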

  32. Holidays dataset – example queries
