
Efficient visual search of local features


  1. Efficient visual search of local features
     Cordelia Schmid

  2. Matching of descriptors

  3. Matching of descriptors
     • Find the nearest neighbor in the second image
     • Pruning strategies
       – Ratio with respect to the second best match (d1/d2 << 1)

  4. Matching of descriptors
     • Find the nearest neighbor in the second image
     • Pruning strategies
       – Ratio with respect to the second best match (d1/d2 << 1)
       – Local neighborhood constraints (semi-local constraints)
     Neighbors of the point have to match and angles have to correspond. Note that in practice not all neighbors have to be matched correctly.

  5. Matching of descriptors
     • Find the nearest neighbor in the second image
     • Pruning strategies
       – Ratio with respect to the second best match (d1/d2 << 1)
       – Local neighborhood constraints (semi-local constraints)
       – Backwards matching (matches are NN in both directions)

  6. Matching of descriptors
     • Find the nearest neighbor in the second image
     • Pruning strategies
       – Ratio with respect to the second best match (d1/d2 << 1)
       – Local neighborhood constraints (semi-local constraints)
       – Backwards matching (matches are NN in both directions)
     • Geometric verification with global constraint
       – Hough transform [see for example Lowe’04, student presentation]
       – RANSAC (RANdom SAmple Consensus) [Fischler&Bolles’81]
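
The ratio test and backwards matching listed above can be sketched in a few lines. This is a minimal illustration and not the presenter's implementation: the function names and the `max_ratio=0.8` cut-off are assumptions (the slide only asks for d1/d2 << 1), and each descriptor set is assumed to hold at least two vectors.

```python
import math

def two_nearest(query, candidates):
    """Return (index of best match, best distance d1, second-best distance d2)."""
    dists = sorted((math.dist(query, c), i) for i, c in enumerate(candidates))
    (d1, best), (d2, _) = dists[0], dists[1]
    return best, d1, d2

def match(desc_a, desc_b, max_ratio=0.8):
    """Match descriptors of image A to image B with two pruning strategies."""
    matches = []
    for ia, qa in enumerate(desc_a):
        ib, d1, d2 = two_nearest(qa, desc_b)
        # Ratio test: reject when the best match is not clearly better
        # than the second best (max_ratio is an assumed threshold).
        if d1 >= max_ratio * d2:
            continue
        # Backwards matching: the match must also be the NN from B to A.
        back, _, _ = two_nearest(desc_b[ib], desc_a)
        if back == ia:
            matches.append((ia, ib))
    return matches
```

A descriptor with two equally close candidates in the second image fails the ratio test and is dropped, which is the intended behaviour for ambiguous matches.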

  7. Algorithm RANSAC
     • Robust estimation with RANSAC of a homography
       – Repeat
         • Select 4 point matches
         • Compute 3x3 homography
         • Measure support (number of inliers within threshold, i.e. …)
       – Choose (H with the largest number of inliers)
       – Re-estimate H with all inliers
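
The loop above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the presenter's code: H is parametrised with its last entry fixed to 1, the fit uses the normal equations solved by Gaussian elimination, the inlier test uses the one-way transfer distance, and all function names, the iteration count and the threshold are hypothetical.

```python
import random

def solve(A, b):
    """Solve the square system A x = b (Gauss-Jordan with partial pivoting)."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            return None  # degenerate sample, e.g. collinear points
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [v - f * w for v, w in zip(M[r], M[c])]
    return [M[i][n] for i in range(n)]

def fit_homography(matches):
    """Least-squares homography (row-major, last entry 1) from >= 4 matches."""
    rows, rhs = [], []
    for (x, y), (u, v) in matches:
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y]); rhs.append(v)
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(8)] for i in range(8)]
    Atb = [sum(r[i] * t for r, t in zip(rows, rhs)) for i in range(8)]
    h = solve(AtA, Atb)
    return None if h is None else h + [1.0]

def project(H, p):
    """Apply H to point p; returns None if p maps to infinity."""
    x, y = p
    w = H[6] * x + H[7] * y + H[8]
    if abs(w) < 1e-12:
        return None
    return ((H[0] * x + H[1] * y + H[2]) / w, (H[3] * x + H[4] * y + H[5]) / w)

def inliers(H, matches, thresh):
    """Matches whose transfer error under H is within thresh."""
    keep = []
    for p, q in matches:
        r = project(H, p)
        if r is None:
            continue
        dx, dy = r[0] - q[0], r[1] - q[1]
        if dx * dx + dy * dy <= thresh * thresh:
            keep.append((p, q))
    return keep

def ransac_homography(matches, iters=200, thresh=1.0, seed=0):
    rng = random.Random(seed)
    best = []
    for _ in range(iters):
        H = fit_homography(rng.sample(matches, 4))   # select 4 point matches
        if H is None:
            continue
        support = inliers(H, matches, thresh)        # measure support
        if len(support) > len(best):                 # keep H with most inliers
            best = support
    return fit_homography(best), best                # re-estimate with all inliers
```

`ransac_homography` expects a list of `((x, y), (u, v))` point pairs and returns the re-estimated H together with its inlier set.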

  8. Comparison
     Hough Transform
     • Advantages
       – Can handle high percentage of outliers (>95%)
       – Extracts groupings from clutter in linear time
     • Disadvantages
       – Quantization issues
       – Only practical for small number of dimensions (up to 4)
     • Improvements available
       – Probabilistic Extensions
       – Continuous Voting Space
       – Can be generalized to arbitrary shapes and objects
     RANSAC
     • Advantages
       – General method suited to large range of problems
       – Easy to implement
       – “Independent” of number of dimensions
     • Disadvantages
       – Basic version only handles moderate number of outliers (<50%)
     • Many variants available, e.g.
       – PROSAC: Progressive RANSAC [Chum05]
       – Preemptive RANSAC [Nister05]

  9. Visual search …

  10. Image search system for large datasets
      Large image dataset (one million images or more)
      query → Image search system → ranked image list
      • Issues for very large databases
        • to reduce the query time
        • to reduce the storage requirements
        • with minimal loss in retrieval accuracy

  11. Two strategies
      1. Efficient approximate nearest neighbor search on local feature descriptors.
      2. Quantize descriptors into a “visual vocabulary” and use efficient techniques from text retrieval (Bag-of-words representation).

  12. Strategy 1: Efficient approximate NN search
      Images → local features → invariant descriptor vectors
      1. Compute local features in each image independently
      2. Describe each feature by a descriptor vector
      3. Find nearest neighbour vectors between query and database
      4. Rank matched images by number of (tentatively) corresponding regions
      5. Verify top ranked images based on spatial consistency

  13. Finding nearest neighbour vectors
      Establish correspondences between query image and images in the database by nearest neighbour matching on SIFT vectors.
      [Figure: model image and image database mapped into the 128D descriptor space]
      Solve the following problem for all feature vectors, x_j, in the query image:
      $$\mathrm{NN}(j) = \arg\min_i \lVert x_i - x_j \rVert$$
      where x_i are features from all the database images.

  14. Quick look at the complexity of the NN-search
      N … images
      M … regions per image (~1000)
      D … dimension of the descriptor (~128)
      Exhaustive linear search: O(M · NM · D)
      Example:
      • Matching two images (N=1), each having 1000 SIFT descriptors
        Nearest neighbors search: 0.4 s (2 GHz CPU, implementation in C)
      • Memory footprint: 1000 * 128 = 128kB / image

      # of images                          CPU time       Memory req.
      N = 1,000                            ~7 min         ~100 MB
      N = 10,000                           ~1 h 7 min     ~1 GB
      N = 10^7                             ~115 days      ~1 TB
      All images on Facebook: N = 10^10    ~300 years     ~1 PB
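
The operation count and memory column can be reproduced with a small helper. This is only a back-of-the-envelope sketch: the function name is hypothetical, and one byte per descriptor component is an assumption inferred from the slide's "1000 * 128 = 128kB / image".

```python
def linear_search_cost(n_images, regions=1000, dim=128, bytes_per_value=1):
    """Cost of exhaustive search for one query image against n_images.

    Returns (distance-component operations, database size in bytes).
    """
    # Every one of the M query descriptors is compared with all N*M
    # database descriptors, and each comparison touches D components.
    ops = regions * (n_images * regions) * dim
    # The database stores N*M descriptors of D components each.
    memory = n_images * regions * dim * bytes_per_value
    return ops, memory
```

For N = 1 this gives 1.28 × 10^8 operations and 128 kB; for N = 10^7 the memory term is 1.28 × 10^12 bytes, which matches the table's ~1 TB row.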

  15. Nearest-neighbor matching
      Solve the following problem for all feature vectors, x_j, in the query image:
      $$\mathrm{NN}(j) = \arg\min_i \lVert x_i - x_j \rVert$$
      where x_i are features in database images.
      Nearest-neighbour matching is the major computational bottleneck
      • Linear search performs dn operations for n features in the database and d dimensions
      • No exact methods are faster than linear search for d>10
      • Approximate methods can be much faster, but at the cost of missing some correct matches.
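
For reference, the exhaustive linear search that serves as the baseline here might look like the sketch below. The function names are hypothetical; squared Euclidean distance is used because it yields the same nearest neighbour as the Euclidean distance.

```python
def nearest_neighbor(query, database):
    """Index of the database vector closest to query (linear scan)."""
    best_i, best_d2 = -1, float("inf")
    for i, x in enumerate(database):
        # d components are visited for each of the n database vectors,
        # hence the d*n operations per query feature.
        d2 = sum((a - b) ** 2 for a, b in zip(query, x))
        if d2 < best_d2:
            best_i, best_d2 = i, d2
    return best_i

def match_all(queries, database):
    """Nearest database index for every query feature vector."""
    return [nearest_neighbor(q, database) for q in queries]
```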

  16. K-d tree
      • K-d tree is a binary tree data structure for organizing a set of points
      • Each internal node is associated with an axis aligned hyper-plane splitting its associated points into two sub-trees.
      • Dimensions with high variance are chosen first.
      • Position of the splitting hyper-plane is chosen as the mean/median of the projected points – balanced tree.
      [Figure: example k-d tree with splitting planes l1–l10 over points 1–11]
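
A compact k-d tree following these rules (split on the highest-variance dimension, at the median) could be sketched as below. The dictionary-based node layout and the function names are assumptions made for illustration. The query shown is the exact backtracking search; approximate variants typically cap how many branches are revisited, which is not shown here.

```python
def build(points, leaf_size=1):
    """Build a k-d tree over a list of equal-length tuples."""
    if len(points) <= leaf_size:
        return {"leaf": points}

    def variance(d):
        vals = [p[d] for p in points]
        mean = sum(vals) / len(vals)
        return sum((v - mean) ** 2 for v in vals)

    # Split on the dimension with the highest variance, at the median.
    axis = max(range(len(points[0])), key=variance)
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {"axis": axis, "split": pts[mid][axis],
            "left": build(pts[:mid], leaf_size),
            "right": build(pts[mid:], leaf_size)}

def nearest(node, q, best=None):
    """Return (squared distance, point) of the exact nearest neighbour of q."""
    if "leaf" in node:
        for p in node["leaf"]:
            d2 = sum((a - b) ** 2 for a, b in zip(p, q))
            if best is None or d2 < best[0]:
                best = (d2, p)
        return best
    diff = q[node["axis"]] - node["split"]
    near, far = ((node["left"], node["right"]) if diff < 0
                 else (node["right"], node["left"]))
    best = nearest(near, q, best)
    # Visit the other side only if the splitting plane is closer
    # than the best match found so far.
    if best is None or diff * diff < best[0]:
        best = nearest(far, q, best)
    return best
```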

  17. Large scale object/scene recognition
      Image dataset: > 1 million images
      query → Image search system → ranked image list
      • Each image described by approximately 1000 descriptors
        – 10^9 descriptors to index for one million images!
      • Database representation in RAM:
        – Size of descriptors: 1 TB, search+memory intractable

  18. Bag-of-features [Sivic&Zisserman’03]
      [Figure: processing pipeline]
      Query image → Harris-Hessian-Laplace regions + SIFT descriptors → set of SIFT descriptors → bag-of-features processing + tf-idf weighting, using centroids (visual words) → sparse frequency vector → querying the inverted file → ranked image short-list → geometric verification [Chum & al. 2007] → re-ranked list
      • “visual words”:
        – 1 “word” (index) per local descriptor
        – only image ids in inverted file => 8 GB fits!

  19. Indexing text with inverted files
      Document collection → Inverted file:

      Term         List of hits (occurrences in documents)
      People       [d1:hit hit hit], [d4:hit hit] …
      Common       [d1:hit hit], [d3: hit], [d4: hit hit hit] …
      Sculpture    [d2:hit], [d3: hit hit hit] …

      Need to map feature descriptors to “visual words”

  20. Visual words
      • Example: each group of patches belongs to the same visual word
      Figure from Sivic & Zisserman, ICCV 2003
      K. Grauman, B. Leibe

  21. Vector quantize the descriptor space
      [Figure: descriptor space with visual word v1]

  22. Vector quantize the descriptor space
      [Figure: descriptor space partitioned into visual words v1, v2, v10, v30, v31]
      • Histogram of visual word occurrence represents the image
      • Sparse if large visual vocabulary
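
Quantizing descriptors into a visual-word histogram can be sketched as follows. The centroids are assumed to have been learned beforehand (for example by clustering training descriptors); the function names are hypothetical.

```python
from collections import Counter

def quantize(descriptor, centroids):
    """Index of the nearest centroid, i.e. the descriptor's visual word."""
    return min(range(len(centroids)),
               key=lambda k: sum((a - b) ** 2
                                 for a, b in zip(descriptor, centroids[k])))

def bag_of_features(descriptors, centroids):
    """Sparse histogram {visual word: count} describing one image."""
    return Counter(quantize(d, centroids) for d in descriptors)
```

Only words that actually occur are stored, so the histogram stays sparse when the vocabulary is much larger than the number of descriptors per image.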

  23. Inverted file index for visual words
      [Figure: table mapping each word number to a list of image numbers]
      • Score each image by the number of common visual words (tentative correspondences)
      • Dot product between bag-of-features
      • Fast for sparse vectors!
      Image credit: A. Zisserman
      K. Grauman, B. Leibe
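
A minimal sketch of such an inverted file, with scoring by the dot product of raw bag-of-features counts, is given below. The data layout and function names are assumptions for illustration; no tf-idf weighting is applied.

```python
from collections import defaultdict

def build_index(database):
    """database: {image id: {visual word: count}}.

    Returns the inverted file {visual word: [(image id, count), ...]}.
    """
    index = defaultdict(list)
    for image_id, bof in database.items():
        for word, count in bof.items():
            index[word].append((image_id, count))
    return index

def score(query_bof, index):
    """Rank images by the dot product between query and image histograms."""
    scores = defaultdict(int)
    # Only the lists of words present in the query are visited,
    # which is why sparse vectors make this fast.
    for word, q_count in query_bof.items():
        for image_id, count in index.get(word, []):
            scores[image_id] += q_count * count
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

Images that share no visual word with the query never appear in the ranked list, so they cost nothing at query time.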
