
Graph-based time–space trade-offs for approximate near neighbors

Thijs Laarhoven (mail@thijs.com, http://thijs.com/), SoCG 2018, Budapest, Hungary (June 13, 2018)


  1. Graph-based time–space trade-offs for approximate near neighbors

  2. Nearest neighbor searching

  3–5. Nearest neighbor problem – Problem description
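The transcript keeps only the slide titles here. As a reference point, the exact problem can always be solved by a linear scan over all n points, which is the naive baseline that sublinear query times improve on. A minimal sketch, assuming Euclidean distance and points stored as tuples of floats (the function name is illustrative, not from the talk):

```python
import math
from typing import Sequence

Point = Sequence[float]


def nearest_neighbor(data: Sequence[Point], q: Point) -> int:
    """Index of the point in `data` closest to `q` (Euclidean), by linear scan.

    Costs O(n * d) per query and needs no preprocessing beyond storing the points.
    """
    if not data:
        raise ValueError("empty data set")
    return min(range(len(data)), key=lambda i: math.dist(data[i], q))


print(nearest_neighbor([(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)], (0.9, 1.2)))  # → 2
```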

  6–7. Nearest neighbor problem – Approximate solutions (figure labels: r, c · r)
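The figure labels r and c · r point to the usual approximate version of the problem: if some data point lies within distance r of the query, it suffices to return any point within distance c · r (for an approximation factor c > 1). A small checker for that acceptance rule, assuming the standard (c, r) definition and Euclidean distance; the function name is hypothetical:

```python
import math


def is_valid_cr_answer(data, q, answer, r, c):
    """Check whether `answer` (an index into data, or None) is acceptable for (c, r)-ANN.

    If some point lies within distance r of q, the answer must be a point within
    distance c * r of q. If no point lies within r, any answer (even None) is fine.
    """
    has_r_near = any(math.dist(p, q) <= r for p in data)
    if not has_r_near:
        return True
    return answer is not None and math.dist(data[answer], q) <= c * r
```

The slack between r and c · r is what lets the partition-based and graph-based methods in this talk skip most of the data set.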

  8–10. Nearest neighbor problem – Example: Voronoi cells

  11. Partition-based methods

  12–13. Partition-based methods – Data structure

  14–17. Partition-based methods – Hash table lookups
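A partition-based data structure stores each point in the bucket of the cell it falls into; a query hashes to its own cell and scans only that bucket. The sketch below assumes the partition is a product of k random hyperplane bisections, one of the partition families listed under the challenges ([Cha03]); all helper names are hypothetical:

```python
import math
import random
from collections import defaultdict


def make_bisection_hash(dim, k, seed=0):
    """Hypothetical helper: h(x) = signs of <a_i, x> for k random Gaussian directions a_i.

    Each sign is one random bisection of space through the origin; the k-tuple of
    signs names the cell of their product partition.
    """
    rng = random.Random(seed)
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]

    def h(x):
        return tuple(sum(a * xi for a, xi in zip(plane, x)) >= 0.0 for plane in planes)

    return h


def build_table(data, h):
    """Preprocessing: put the index of every point in the bucket of its cell."""
    table = defaultdict(list)
    for i, p in enumerate(data):
        table[h(p)].append(i)
    return table


def bucket_query(data, table, h, q):
    """Query: hash q, then scan only its bucket; None if the bucket is empty."""
    candidates = table.get(h(q), [])
    return min(candidates, key=lambda i: math.dist(data[i], q), default=None)
```

Larger k gives smaller buckets (faster scans) but a higher chance that a near neighbor lands in a different cell than the query.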

  18–20. Partition-based methods – Near the boundaries

  21–23. Partition-based methods – Randomizations
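Only the titles of the randomization slides survive. Assuming they describe the standard remedy for near neighbors lying just across a cell boundary, namely building several independent random partitions and checking the query's cell in each, the number of tables follows from a short calculation: if one table puts the query and its near neighbor in the same cell with probability p, then t independent tables succeed with probability 1 − (1 − p)^t. A sketch with hypothetical function names:

```python
import math


def success_probability(p, t):
    """P[query and near neighbor share a cell in at least one of t independent tables]."""
    return 1.0 - (1.0 - p) ** t


def tables_needed(p, target):
    """Smallest t with success_probability(p, t) >= target, for 0 < p < 1 and 0 < target < 1."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))
```

For example, with p = 0.1 per table, reaching 90% success takes 22 tables.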

  24–27. Partition-based methods – Challenges
      Main problem: choosing the best types of space partitions.
      • Requires an efficient decoding algorithm;
      • Space partitions should have nice shapes.
      Utopia: disjoint spheres lying on an efficiently decodable code or lattice.
      Real world: approximate the ideal solution as well as we can.
      • Product of bisections; [Cha03]
      • Voronoi cells induced by hypercube; [TT07, Laa16]
      • Random (overlapping) spheres; [AI06, AINR14]
      • Voronoi cells induced by cross-polytopes; [TT07, AIL+15, KW17]
      • Voronoi cells induced by (pseudo)random points. [BDGL16, ALRW17, Chr17]
      The best techniques are both theoretically optimal and practical.
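One of the listed partitions, "Voronoi cells induced by cross-polytopes", has a cheap decoding step: after a random rotation R, a point x belongs to the cell of whichever vertex ±e_i of the cross-polytope is closest to Rx/|Rx|, which is the coordinate of Rx with the largest absolute value, together with its sign. A minimal sketch, assuming a random orthogonal matrix built by Gram-Schmidt on Gaussian vectors (helper names are hypothetical; practical implementations use faster pseudo-random rotations):

```python
import math
import random


def random_orthogonal(dim, rng):
    """Random orthogonal matrix (as a list of rows) via Gram-Schmidt on Gaussian vectors."""
    rows = []
    while len(rows) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for u in rows:
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-9:  # skip (numerically) dependent draws
            rows.append([a / norm for a in v])
    return rows


def cross_polytope_hash(R, x):
    """Cell of Rx in the cross-polytope Voronoi partition, as (index i, sign of vertex ±e_i)."""
    y = [sum(a * b for a, b in zip(row, x)) for row in R]
    i = max(range(len(y)), key=lambda j: abs(y[j]))
    return (i, 1 if y[i] >= 0 else -1)
```

The 2d cells are decoded in O(d^2) time here (one matrix-vector product), versus scanning a list of arbitrary cell centers.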

  28–29. Nearest neighbor methods – Practice (ANN Benchmarks [ABF17])
      [Plot: queries per second (log scale) vs. recall rate (0 to 1) for bruteforce-blas, bruteforce0(nmslib), BallTree(nmslib), SW-graph(nmslib), hnsw(nmslib), annoy, rpforest, nearpy, flann, falconn, faiss-lsh, faiss-ivf, faiss-gpu, dolphinn, DolphinnPy]
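The benchmark plot trades speed against recall rate. One common way to compute recall is the fraction of the true k nearest neighbors that appear among the returned ids, averaged over queries; this is an illustrative definition and not necessarily the exact one ANN Benchmarks uses. Function names are hypothetical:

```python
def recall(returned, true_neighbors):
    """Fraction of the true nearest-neighbor ids that appear among the returned ids."""
    truth = set(true_neighbors)
    return len(truth.intersection(returned)) / len(truth)


def mean_recall(all_returned, all_truth):
    """Average recall over a batch of queries (one list of ids per query)."""
    return sum(recall(r, t) for r, t in zip(all_returned, all_truth)) / len(all_truth)
```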

  30. Graph-based methods

  31–32. Graph-based methods – Data structure
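Graph-based methods store the data set together with a graph on the points, where edges "connect near neighbors". A k-nearest-neighbor graph is one simple instance; this choice is an illustrative assumption, and the near neighbor graph analyzed in the talk may be defined differently (for example by a distance threshold). A brute-force construction sketch with a hypothetical function name:

```python
import math


def knn_graph(data, k):
    """Directed k-nearest-neighbor graph: node i -> list of its k closest other points.

    Brute-force construction in O(n^2 d) time; fine for a sketch, not for large n.
    """
    graph = {}
    for i, p in enumerate(data):
        others = sorted((j for j in range(len(data)) if j != i),
                        key=lambda j: math.dist(p, data[j]))
        graph[i] = others[:k]
    return graph
```

The graph has n · k edges, so its degree k is what controls the space usage of this structure.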

  33–44. Graph-based methods – Greedy algorithm
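The greedy algorithm walks through the graph from a starting node, each time moving to the neighbor closest to the query, and stops when no neighbor is closer than the current node. A minimal sketch, assuming an adjacency dict mapping each node index to a list of neighbor indices (the function name is hypothetical):

```python
import math


def greedy_search(data, graph, q, start):
    """Greedy walk: move to the neighbor closest to q until no neighbor is strictly closer.

    Distances strictly decrease along the walk, so it always terminates; the endpoint
    may be a local rather than a global minimum.
    """
    current = start
    current_dist = math.dist(data[current], q)
    while True:
        best = min(graph[current], key=lambda j: math.dist(data[j], q), default=current)
        best_dist = math.dist(data[best], q)
        if best_dist >= current_dist:
            return current
        current, current_dist = best, best_dist
```

On the 3-point graph {0: [1], 1: [0], 2: [0]} with points 0, 1, 5 on a line and query 4.9, a walk from node 0 stops at node 1 even though node 2 is nearest, which is the kind of local solution discussed next.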

  45–51. Graph-based methods – Local solutions

  52–57. Graph-based methods – Randomizations
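The main result concerns randomized greedy walks. The slides' exact randomization is not preserved in this transcript; one simple variant, assumed here for illustration, is to restart the greedy walk from several uniformly random nodes and keep the best endpoint, so that a single bad local minimum no longer decides the answer. Function names are hypothetical:

```python
import math
import random


def greedy_walk(data, graph, q, start):
    """Greedy descent from `start`; returns the first node with no strictly closer neighbor."""
    cur = start
    while True:
        nxt = min(graph[cur], key=lambda j: math.dist(data[j], q), default=cur)
        if math.dist(data[nxt], q) >= math.dist(data[cur], q):
            return cur
        cur = nxt


def randomized_search(data, graph, q, restarts, seed=0):
    """Run greedy walks from `restarts` (>= 1) random start nodes and keep the best endpoint."""
    rng = random.Random(seed)
    ends = [greedy_walk(data, graph, q, rng.randrange(len(data))) for _ in range(restarts)]
    return min(ends, key=lambda i: math.dist(data[i], q))
```

Each restart costs another walk, so the number of restarts is one knob in the trade-off between query time and success probability.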

  58–61. Graph-based methods – Challenges
      Main problem: designing the graph.
      • Intuitively: connect near neighbors for gradual progress;
      • Avoiding local minima: add a few long edges;
      • Hierarchical graphs: long edges in upper layers, short edges in bottom layers.
      In practice, graph-based methods are very efficient as well.
      In theory, little is known about the performance of these methods.
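The second design rule, adding a few long edges, can be sketched on top of any adjacency dict: give each node a handful of extra links to random other nodes, which in a large data set will typically be far away and let the walk jump across regions. This is an illustrative construction (hypothetical name, not the talk's exact graph):

```python
import random


def add_long_edges(graph, extra_per_node, seed=0):
    """Return a copy of `graph` where each node also links to `extra_per_node` random other nodes.

    Random targets are usually far away in large data sets, so these serve as the
    "few long edges" that help a greedy walk escape local minima.
    """
    rng = random.Random(seed)
    nodes = list(graph)
    augmented = {}
    for u, nbrs in graph.items():
        candidates = [v for v in nodes if v != u and v not in nbrs]
        augmented[u] = list(nbrs) + rng.sample(candidates, min(extra_per_node, len(candidates)))
    return augmented
```

Each extra edge per node adds n edges in total, so the number of long edges also counts toward the space bound.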

  62. Graph-based methods – Contributions
      Theorem (Main result, informal). For randomized greedy walks on the near neighbor graph and for “random” data sets, we can solve the approximate nearest neighbor problem on n points with query time $O(n^{\rho_q})$ and space $O(n^{1+\rho_s})$, with $\rho_q, \rho_s \ge 0$ satisfying
      $$(2c^2 - 1)\,\rho_q + 2c^2(c^2 - 1)\sqrt{\rho_s(1 - \rho_s)} \ge c^4.$$

  63–64. Graph-based methods – Contributions
      In the most common regime of $c \approx 1$ (high recall rate) and $\rho_s \approx 0$ (near-linear space), this scales the same as the best partition-based trade-offs [ALRW17]:
      $$\rho_q = 1 - 4(c - 1)\sqrt{\rho_s}\,\bigl(1 + o(1)\bigr). \tag{1}$$
      Positive result: the greedy algorithm is already “optimal” for $c \approx 1$ and $\rho_s \approx 0$.
      Negative result: (the analysis of) this algorithm is not competitive for $c \gg 1$ or $\rho_s \gg 0$.
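Solving the trade-off of the main theorem for $\rho_q$ gives the smallest query exponent for a given space exponent, $\rho_q = \bigl(c^4 - 2c^2(c^2-1)\sqrt{\rho_s(1-\rho_s)}\bigr)/(2c^2-1)$, clipped at 0. The sketch below evaluates this and the leading-order formula (1) so the two can be compared numerically; function names are hypothetical. Note that at $\rho_s = 0$ the bound equals $c^4/(2c^2-1)$, which exceeds 1 for every $c > 1$ because $c^4 - (2c^2 - 1) = (c^2 - 1)^2$.

```python
import math


def rho_q_min(c, rho_s):
    """Smallest query exponent allowed by the main trade-off, for c > 1 and 0 <= rho_s <= 1."""
    bound = (c**4 - 2 * c**2 * (c**2 - 1) * math.sqrt(rho_s * (1 - rho_s))) / (2 * c**2 - 1)
    return max(0.0, bound)


def rho_q_asymptotic(c, rho_s):
    """Leading-order approximation (1), valid for c close to 1 and small rho_s."""
    return 1 - 4 * (c - 1) * math.sqrt(rho_s)


for c, rho_s in [(1.01, 0.01), (1.1, 0.05), (1.5, 0.2)]:
    print(c, rho_s, round(rho_q_min(c, rho_s), 4), round(rho_q_asymptotic(c, rho_s), 4))
```

For c = 1.01 and ρ_s = 0.01 the two values agree to within about 5 · 10⁻⁴; the gap widens as c and ρ_s grow, matching the regime in which the approximation is stated.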
