
Composite Quantization for Approximate Nearest Neighbor Search - PowerPoint PPT Presentation



  1. Composite Quantization for Approximate Nearest Neighbor Search Jingdong Wang Lead Researcher Microsoft Research http://research.microsoft.com/~jingdw ICML 2014, joint work with my interns Ting Zhang from USTC and Chao Du from Tsinghua University

  2. Outline • Introduction • Problem • Product quantization • Cartesian k-means • Composite quantization • Experiments 5/6/2015

  3. Nearest neighbor search • Application to similar image search

  4. Nearest neighbor search • Application to particular object retrieval

  5. Nearest neighbor search • Application to duplicate image search

  6. Nearest neighbor search • Application to K-NN annotation: annotate the query image using its similar images

  7. Nearest neighbor search • Definition • Database: X = {x_1, x_2, …, x_n} ⊂ R^d • Query: q ∈ R^d • Nearest neighbor: NN(q) = argmin_{x ∈ X} ‖q − x‖_2

  8. Nearest neighbor search • Exact nearest neighbor search • Linear scan: compare the query against all n database vectors, O(nd) time
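The linear scan above can be sketched in a few lines; a minimal illustration (not from the slides), assuming the database is a numpy array with one vector per row:

```python
# Brute-force exact nearest neighbor by linear scan: O(n*d) per query.
import numpy as np

def linear_scan_nn(database: np.ndarray, query: np.ndarray) -> int:
    """Return the index of the database vector closest to `query` in L2 distance."""
    # Squared distances avoid the sqrt; the argmin is unchanged.
    d2 = np.sum((database - query) ** 2, axis=1)
    return int(np.argmin(d2))

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
q = np.array([0.9, 0.1])
best = linear_scan_nn(X, q)  # nearest is [1.0, 0.0]
```

Every query touches every vector, which is exactly the cost the rest of the talk tries to avoid.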

  9. Nearest neighbor search - speedup • K-dimensional tree (Kd tree) • Generalized binary search tree • Metric tree • Ball tree • VP tree • BD tree • Cover tree • …

  10. Nearest neighbor search • Exact nearest neighbor search • Linear scan: costly and impractical for large-scale, high-dimensional cases • Approximate nearest neighbor (ANN) search • Efficient • Acceptable accuracy • Practically used

  11. Two principles for ANN search • Recall the complexity of linear scan: O(nd) 1. Reduce the number of distance computations • Time complexity: proportional to the number of candidates actually visited • Tree structures, neighborhood graph search and inverted indexes

  12. Our work: TP Tree + NG Search • TP Tree • Jingdong Wang, Naiyan Wang, You Jia, Jian Li, Gang Zeng, Hongbin Zha, Xian-Sheng Hua: Trinary-Projection Trees for Approximate Nearest Neighbor Search. IEEE Trans. Pattern Anal. Mach. Intell. 36(2): 388-403 (2014) • You Jia, Jingdong Wang, Gang Zeng, Hongbin Zha, Xian-Sheng Hua: Optimizing kd-trees for scalable visual descriptor indexing. CVPR 2010: 3392-3399 • Neighborhood graph search • Jingdong Wang, Shipeng Li: Query-driven iterated neighborhood graph search for large scale indexing. ACM Multimedia 2012: 179-188 • Jing Wang, Jingdong Wang, Gang Zeng, Rui Gan, Shipeng Li, Baining Guo: Fast Neighborhood Graph Search Using Cartesian Concatenation. ICCV 2013: 2128-2135 • Neighborhood graph construction • Jing Wang, Jingdong Wang, Gang Zeng, Zhuowen Tu, Rui Gan, Shipeng Li: Scalable k-NN graph construction for visual descriptors. CVPR 2012: 1106-1113

  13. Comparison over SIFT 1M [Figure: 1-NN comparison of the ICCV13, ACMMM12 and CVPR10 methods]

  14. Comparison over GIST 1M [Figure: 1-NN comparison of the ICCV13, ACMMM12 and CVPR10 methods]

  15. Comparison over HOG 10M [Figure: 1-NN comparison of the ICCV13, ACMMM12 and CVPR10 methods]

  16. Neighborhood Graph Search • Shipped to the Bing ClusterBed • Index building time on 40M documents is only 2 hours and 10 minutes • Search DPS on each NNS machine is stable at 950, without retries or errors • Five times faster than improved FLANN

  17. Two principles for ANN search • Recall the complexity of linear scan: O(nd) 1. Reduce the number of distance computations • Tree structures, neighborhood graph search and inverted indexes • High efficiency (+), but large memory cost (−) 2. Reduce the cost of each distance computation • Hashing (compact codes) • Small memory cost (+), but low efficiency (−)

  18. Approximate nearest neighbor search • Binary embedding methods (hashing) • Produce only a few distinct distances • Limited ability and flexibility of distance approximation • Vector quantization (compact codes) • K-means: for medium and large code lengths, it is impossible to learn the codebook or to compute a code for a vector • Product quantization • Cartesian k-means

  19. Combined solution for very large scale search • Step 1: retrieve candidates with an index structure using compact codes (efficient, small memory consumption) • Step 2: load the raw features of the retrieved candidates from disk (IO cost is small) • Step 3: rerank the candidates using the true distances
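The three-stage pipeline above can be sketched end to end; a minimal illustration (all names here are hypothetical, not from the slides), with a dict standing in for the raw features that would live on disk:

```python
# Stage 1: shortlist candidates by a cheap approximate distance.
# Stage 2: load raw vectors only for the shortlist (a dict here, disk in practice).
# Stage 3: rerank the shortlist by true squared L2 distance.
import numpy as np

def search(query, approx_dist, raw_features, shortlist_size=10):
    n = len(raw_features)
    # Stage 1: cheap approximate distances over all n items.
    cand = sorted(range(n), key=lambda i: approx_dist(query, i))[:shortlist_size]
    # Stages 2-3: fetch raw vectors for the shortlist, rerank exactly.
    return min(cand, key=lambda i: np.sum((raw_features[i] - query) ** 2))

raw = {i: np.array([float(i), 0.0]) for i in range(100)}
# A deliberately coarse approximation: distance on the first coordinate only.
approx = lambda q, i: abs(raw[i][0] - q[0])
best = search(np.array([42.2, 0.0]), approx, raw, shortlist_size=5)
```

The point of the design is that only `shortlist_size` raw vectors are ever read, so the IO cost stays small.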

  20. Outline • Introduction • Problem • Product quantization • Cartesian k-means • Composite quantization • Experiments

  21. Product quantization • Approximate x by the concatenation of M subvectors

  22. Product quantization • Approximate x by the concatenation of M subvectors: x ≈ x̄ = [p_1j_1; p_2j_2; ⋯ ; p_Mj_M], where p_mj_m is chosen from the codebook of the m-th subspace {p_m1, p_m2, ⋯ , p_mL}


  26. Product quantization • Code representation: (j_1, j_2, …, j_M) • Distance computation: d(q, x̄)² = d(q_1, p_1j_1)² + d(q_2, p_2j_2)² + ⋯ + d(q_M, p_Mj_M)² • M additions using a pre-computed distance table: for each subspace m, precompute d({p_m1, p_m2, ⋯ , p_mL}, q_m) ≜ {d(q_m, p_m1), d(q_m, p_m2), ⋯ , d(q_m, p_mL)}
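The table-lookup distance above can be written directly; a minimal sketch (assumed shapes, not the authors' code): `codebooks[m]` holds the L centers of subspace m, and a database code is a tuple of M indices.

```python
# PQ asymmetric distance: precompute per-subspace tables once per query,
# then each database code costs only M table lookups and additions.
import numpy as np

def pq_distance_table(query, codebooks):
    M = len(codebooks)
    sub = np.split(query, M)  # q = [q_1; ...; q_M]
    # table[m][l] = ||q_m - p_{m,l}||^2
    return [np.sum((cb - qm) ** 2, axis=1) for cb, qm in zip(codebooks, sub)]

def pq_asymmetric_distance(code, table):
    # ||q - x_bar||^2 = sum_m ||q_m - p_{m, j_m}||^2  (M additions)
    return sum(table[m][j] for m, j in enumerate(code))

codebooks = [np.array([[0.0], [1.0]]), np.array([[0.0], [2.0]])]  # M=2, L=2, d=2
q = np.array([1.0, 2.0])
table = pq_distance_table(q, codebooks)
d = pq_asymmetric_distance((1, 1), table)  # code (1,1) reconstructs [1, 2] exactly
```

The tables cost O(Ld) per query, after which scanning n codes costs only O(nM) additions.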

  27. Product quantization (M = 2) • Approximate x by the concatenation of M subvectors • Codebook generation: do k-means for each subspace

  28. Product quantization (M = 2, L = 3) • Codebook generation: do k-means for each subspace [Figure: centers q_11, q_12, q_13 in subspace 1 and q_21, q_22, q_23 in subspace 2]

  29. Product quantization (M = 2, L = 3) • K-means in each subspace results in L^M groups • The center of each group is the concatenation of M subvectors, e.g., [q_11; q_21]


  32. Product quantization (M = 2, L = 3) • Quantization example: x ≈ x̄ = [q_13; q_22], the concatenation of the nearest subcodeword in each subspace
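The training procedure on these slides amounts to running k-means independently per subspace; a minimal Lloyd's-iteration sketch (illustrative only, with a fixed iteration count and no convergence test):

```python
# Train PQ codebooks: split each vector into M subvectors and run k-means
# independently in each subspace, giving L**M possible reconstructions.
import numpy as np

def kmeans(X, L, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), L, replace=False)]  # fancy indexing copies
    for _ in range(iters):
        # Assign each point to its nearest center, then recompute centers.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(1)
        for l in range(L):
            if np.any(labels == l):
                centers[l] = X[labels == l].mean(0)
    return centers

def train_pq(X, M, L):
    # One codebook of L centers per subspace.
    return [kmeans(sub, L) for sub in np.split(X, M, axis=1)]

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.05, (20, 4)) for c in (0.0, 1.0)])
codebooks = train_pq(X, M=2, L=2)  # two codebooks of two 2-d centers each
```

Because each subspace is clustered on its own, training touches only M·L centers while the implied full codebook has L^M entries.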

  33. Outline • Introduction • Problem • Product quantization • Cartesian k-means • Composite quantization • Experiments

  34. Cartesian K-means • Extended product quantization • Optimal space rotation R • Perform PQ over the rotated space: x ≈ x̄ = R [p_1j_1; p_2j_2; ⋯ ; p_Mj_M]

  35. Cartesian K-means [Figure: subspace centers q_11, q_12, q_13 and q_21, q_22, q_23 shown in the rotated space]

  36. Cartesian K-means • Quantization example in the rotated space: x ≈ x̄ = R [q_13; q_22]
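Encoding and decoding under a rotation can be sketched as below. This is only the application step, with R assumed given; the actual method alternates between updating the orthogonal R and the subcodebooks, which this sketch omits.

```python
# Cartesian k-means sketch: rotate by an orthogonal matrix R, then apply
# ordinary PQ encoding in the rotated space.
import numpy as np

def ckm_encode(x, R, codebooks):
    z = R @ x  # rotate first
    M = len(codebooks)
    subs = np.split(z, M)
    # Encode each rotated subvector by its nearest subcodeword.
    return tuple(int(((cb - s) ** 2).sum(1).argmin()) for cb, s in zip(codebooks, subs))

def ckm_decode(code, R, codebooks):
    z = np.concatenate([codebooks[m][j] for m, j in enumerate(code)])
    return R.T @ z  # R is orthogonal, so R^T inverts the rotation

theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
codebooks = [np.array([[0.0], [1.0]]), np.array([[0.0], [1.0]])]
x = R.T @ np.array([1.0, 0.0])  # a point that is axis-aligned after rotation
code = ckm_encode(x, R, codebooks)
x_hat = ckm_decode(code, R, codebooks)
```

The gain over plain PQ is that R can align the data with the subspace axes, reducing quantization error without changing the code length.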

  37. Outline • Introduction • Composite quantization • Experiments

  38. Composite quantization • Approximate x by the addition of M vectors: x ≈ x̄ = c_1j_1 + c_2j_2 + ⋯ + c_Mj_M • Source codebook m is {c_m1, c_m2, ⋯ , c_mL} • Each source codebook is composed of L d-dimensional vectors

  39. Composite quantization [Figure: example with 2 source codebooks {c_11, c_12, c_13} and {c_21, c_22, c_23}; each point is approximated by the sum of one codeword from each codebook]
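The additive approximation above can be illustrated with a greedy encoder. Note this is only a sketch of the sum-of-codewords idea: the paper jointly optimizes the codebooks and codes under a constant inter-dictionary-term constraint (so distance tables still work), which the greedy pass below does not do.

```python
# Composite quantization approximates x by a SUM of M full-dimensional
# codewords, one from each source codebook (no subspaces, unlike PQ).
import numpy as np

def cq_encode_greedy(x, codebooks):
    residual = x.astype(float).copy()
    code = []
    for cb in codebooks:
        # Pick the codeword that best explains the remaining residual.
        j = int(((cb - residual) ** 2).sum(1).argmin())
        code.append(j)
        residual -= cb[j]
    return tuple(code)

def cq_decode(code, codebooks):
    return sum(cb[j] for cb, j in zip(codebooks, code))

codebooks = [np.array([[0.0, 0.0], [1.0, 0.0]]),
             np.array([[0.0, 0.0], [0.0, 1.0]])]
x = np.array([1.0, 1.0])
code = cq_encode_greedy(x, codebooks)
x_hat = cq_decode(code, codebooks)
```

Because every codeword is d-dimensional, the representable set is richer than PQ's axis-aligned concatenations while the code is still just M small indices.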
