The Inverted Multi-Index Presented by: Denis Efremov Source: https://en.ppt-online.org/92412 1/26
Introduction • Main goal: perform nearest-neighbor (NN) search in a high-dimensional space • Exact NN search is expensive – curse of dimensionality • Can trade accuracy for search time and memory usage • Use indexing • Indexing – storing and organizing the content of an N-dimensional space in K clusters 2/26
Vector quantization • K-means clustering of the dataset gives the quantizer (a codebook of centroids) • Used in the inverted index for indexing • Example: K = 16 + Lengths of the cell lists are balanced - Coarse sampling density 3/26
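The vector quantization slide can be illustrated with a minimal sketch, assuming plain Lloyd's k-means on random 2-D toy data. The function names (`kmeans`, `build_inverted_index`), the data, and the iteration count are hypothetical and are not taken from the presentation.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(centroids, v):
    """Index of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda k: sq_dist(centroids[k], v))

def kmeans(data, K, iters=10, seed=0):
    """Plain Lloyd's k-means; returns a codebook of K centroids."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(data, K)]
    for _ in range(iters):
        cells = [[] for _ in range(K)]
        for v in data:
            cells[nearest(centroids, v)].append(v)
        for k, members in enumerate(cells):
            if members:  # keep the old centroid if a cell ends up empty
                centroids[k] = [sum(c) / len(members) for c in zip(*members)]
    return centroids

def build_inverted_index(data, centroids):
    """One list of vector ids per codeword (cell)."""
    lists = [[] for _ in centroids]
    for i, v in enumerate(data):
        lists[nearest(centroids, v)].append(i)
    return lists

# Toy data (assumption): 400 random points in the unit square, K = 16 as on the slide.
rng = random.Random(1)
data = [[rng.random(), rng.random()] for _ in range(400)]
codebook = kmeans(data, K=16)
index = build_inverted_index(data, codebook)
```

Each vector id lands in exactly one of the 16 lists, and a query would only scan the lists of its closest codewords.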
Querying the inverted index • Have to consider several words for best accuracy • Want to use as big a codebook as possible • Want to spend as little time as possible matching the query to the codebook • The last two goals conflict 4/26
Product quantization + For the same K, a much finer subdivision is achieved (K = 16 codewords per half give 16^2 = 256 cells) - Very non-uniform entry size distribution • Used in the inverted multi-index for indexing • Also used for reranking in both cases (index and multi-index) 5/26
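A minimal sketch of how product quantization assigns a vector to a multi-index cell, assuming vectors are split into two equal halves. The codebooks and data below are hypothetical toy values (K = 4 per half rather than 16), and `pq_cell` is an illustrative name.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(codebook, v):
    """Index of the codeword closest to v."""
    return min(range(len(codebook)), key=lambda k: sq_dist(codebook[k], v))

def pq_cell(v, codebook_u, codebook_v):
    """Split v into two halves and quantize each half independently."""
    half = len(v) // 2
    return nearest(codebook_u, v[:half]), nearest(codebook_v, v[half:])

# Hypothetical toy codebooks: K = 4 codewords for each half of a 4-D vector.
codebook_u = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
codebook_v = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
K = len(codebook_u)

data = [[0.1, 0.9, 0.8, 0.2], [0.9, 0.1, 0.1, 0.1], [0.2, 0.8, 0.9, 0.1]]
multi_index = {}  # (i, j) -> list of vector ids stored in that cell
for idx, v in enumerate(data):
    multi_index.setdefault(pq_cell(v, codebook_u, codebook_v), []).append(idx)

num_cells = K * K  # 16 addressable cells from only 2 * K = 8 stored codewords
```

Storing only the non-empty cells in a dictionary is a convenience of this sketch; with real data many of the K^2 cells are empty or very small, which is the non-uniformity the slide mentions.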
Querying the inverted multi-index – Step 1 • Number of entries: K (inverted index) vs. K^2 (inverted multi-index) • Operations to match to codebooks: 2K+O(1) (inverted index) vs. 2K+O(1) (inverted multi-index) 6/26
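Step 1 can be sketched as follows, assuming the query is split into two halves and each half is compared against its own codebook of K codewords (2K distance evaluations in total). The codebooks, the query, and the helper name `sorted_codeword_distances` are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sorted_codeword_distances(sub_query, codebook):
    """(distance, codeword id) pairs for one half of the query, closest first."""
    return sorted((sq_dist(cw, sub_query), k) for k, cw in enumerate(codebook))

# Hypothetical toy codebooks with K = 4 codewords per half.
codebook_u = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
codebook_v = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]

query = [0.2, 0.9, 0.7, 0.1]
half = len(query) // 2
r = sorted_codeword_distances(query[:half], codebook_u)  # K evaluations
s = sorted_codeword_distances(query[half:], codebook_v)  # K evaluations

r_ids = [k for _, k in r]
s_ids = [k for _, k in s]
```

The two sorted lists `r` and `s` are the input that Step 2 traverses.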
Querying the inverted multi-index – Step 2: the multi-sequence algorithm
Cell (i, j) holds the sum of the query's distances to the i-th closest codeword of the first half and the j-th closest codeword of the second half:
        1     2     3     4     5     6
1     0.6   0.8   4.1   6.1   8.1   9.1
2     2.5   2.7   6     8     10    11
3     3.5   3.7   7     9     11    12
4     6.5   6.7   10    12    14    15
5     7.5   7.7   11    13    15    16
6     11.5  11.7  15    17    19    20
7/26
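The traversal of the grid on this slide can be sketched with a priority queue. This is a minimal version under stated assumptions, not the presenters' code: `multi_sequence` is an illustrative name, and the lists `r` and `s` are values chosen so that `r[i] + s[j]` reproduces the 6x6 grid of the slide (the slide itself shows only the sums).

```python
import heapq
from itertools import islice

def multi_sequence(r, s):
    """Yield (i, j, r[i] + s[j]) in order of non-decreasing sum.

    r and s must each be sorted in ascending order.
    """
    heap = [(r[0] + s[0], 0, 0)]
    done = set()  # cells that have already been output
    while heap:
        d, i, j = heapq.heappop(heap)
        done.add((i, j))
        yield i, j, d
        # A neighbour is pushed only once both of its predecessors
        # (the cell above and the cell to the left) have been output,
        # so every cell enters the queue exactly once.
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni >= len(r) or nj >= len(s):
                continue
            above_done = ni == 0 or (ni - 1, nj) in done
            left_done = nj == 0 or (ni, nj - 1) in done
            if above_done and left_done:
                heapq.heappush(heap, (r[ni] + s[nj], ni, nj))

# Assumed inputs: sorted distances for the two halves of the query.
r = [0.1, 2, 3, 6, 7, 11]   # rows of the grid
s = [0.5, 0.7, 4, 6, 8, 9]  # columns of the grid

# 1-based (row, column) of the first four cells visited.
first_cells = [(i + 1, j + 1) for i, j, _ in islice(multi_sequence(r, s), 4)]
```

In a real query the generator would be consumed only until the visited cells contain enough candidate vectors, which is why it is written lazily.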
Index vs Multi-index 8/26
Performance comparison • Recall on the dataset of 1 billion visual descriptors: "How fast can we catch the nearest neighbor to the query?" • Plot annotations: 100x, K = 2^14 • Time increase: 1.4 msec -> 2.2 msec on a single core (with Basic Linear Algebra Subprograms (BLAS) routines) 9/26
Performance comparison Recall on the dataset of 1 billion 128D visual descriptors: 10/26
Time complexity • For the same K, the inverted index gets a slight advantage because of BLAS routines 11/26
Why 2 halves? • A fourth-order multi-index is faster, but not as accurate 12/26
Multi-Index + Reranking • After querying we have a list of candidate vectors without distances; to reorder the list we have to rerank it • Asymmetric Distance Computation (ADC), two options: • use m bytes to encode the original vector using product quantization: faster (efficient caching possible for distance computation) • use m bytes to encode the remainder between the original vector and the centroid: more accurate 13/26
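The first reranking option (encoding the original vector) can be sketched as follows, assuming m sub-quantizers and per-query look-up tables. All names (`pq_encode`, `adc_tables`, `adc_distance`), the codebooks, and the data are hypothetical toy values with m = 2.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def split(v, m):
    """Split v into m equal-length sub-vectors."""
    step = len(v) // m
    return [v[k * step:(k + 1) * step] for k in range(m)]

def pq_encode(v, codebooks):
    """One codeword id per sub-vector (one byte each with at most 256 codewords)."""
    return [min(range(len(cb)), key=lambda c: sq_dist(cb[c], sub))
            for cb, sub in zip(codebooks, split(v, len(codebooks)))]

def adc_tables(query, codebooks):
    """Distances from each query sub-vector to every codeword, computed once per query."""
    return [[sq_dist(cw, sub) for cw in cb]
            for cb, sub in zip(codebooks, split(query, len(codebooks)))]

def adc_distance(tables, code):
    """Approximate squared distance: m table look-ups, the database vector is never decoded."""
    return sum(table[c] for table, c in zip(tables, code))

# Hypothetical toy setup: m = 2 sub-quantizers with 2 codewords each.
codebooks = [[[0.0, 0.0], [1.0, 1.0]], [[0.0, 0.0], [1.0, 1.0]]]
database = [[0.1, 0.0, 0.9, 1.0], [1.0, 0.9, 0.1, 0.0], [0.0, 0.1, 0.1, 0.0]]
codes = [pq_encode(v, codebooks) for v in database]

query = [0.0, 0.0, 1.0, 1.0]
tables = adc_tables(query, codebooks)
candidates = [0, 1, 2]  # ids returned by the index, in arbitrary order
reranked = sorted(candidates, key=lambda i: adc_distance(tables, codes[i]))
```

The query stays uncompressed while the database vectors are compared through their codes, which is the asymmetry in the name. The second option on the slide would apply the same encoding to the difference between each vector and the centroid of its cell.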
Multi-D-ADC vs IVFADC State-of-the-art [Jegou et al.] 14/26
Retrieval examples • Four examples, each comparing exact NN on uncompressed GIST with Multi-D-ADC at 16 bytes 15/26
Multi-Index and PCA (128->32 dimensions) • Naïve – Principal Component Analysis (PCA) before PQ • Smart – PQ split first, then a separate PCA for each half 16/26
Conclusions • A new data structure for indexing visual descriptors • Significant accuracy boost over the inverted index at the cost of a small memory overhead • Code available at https://github.com/ethz-asl/maplab/tree/master/algorithms/loopclosure/inverted-multi-index 17/26
Improvement of Product Quantization • K-means: + Minimal distortion - Intractable look-up • Product Quantization: + Huge codebook, tractable - Sensitive to projection (possible correlations) [Kalantidis, Avrithis CVPR 2014] 18/26
Improvement of Product Quantization • Optimized Product Quantization: + Huge codebook, tractable, high-dim. subspace, optimized w.r.t. R - Unoptimized for local clusters (the same non-uniform distribution) [Kalantidis, Avrithis CVPR 2014] 19/26
Improvement of Product Quantization • Locally Optimized Product Quantization: + Huge codebook, tractable, high-dim. subspace, optimized w.r.t. R, locally optimized - Are they? [Kalantidis, Avrithis CVPR 2014] 20/26