Data-dependent Hashing for Nearest Neighbor Search Alex Andoni - PowerPoint PPT Presentation

  1. Data-dependent Hashing for Nearest Neighbor Search Alex Andoni (Columbia University) Based on joint work with: Piotr Indyk, Huy Nguyen, Ilya Razenshteyn

  2. Nearest Neighbor Search (NNS)
  - Preprocess: a set P of points
  - Query: given a query point q, report a point p* ∈ P with the smallest distance to q
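  For concreteness, here is a minimal baseline in Python (not from the slides): the exact, no-indexing approach of slide 4, a linear scan costing O(n·d) per query. The array shapes and names are illustrative assumptions.

  ```python
  import numpy as np

  def nearest_neighbor(P, q):
      """Exact NNS by linear scan: O(n*d) work per query.

      P: (n, d) array of data points; q: length-d query vector.
      Returns the index of the closest point and its distance."""
      dists = np.linalg.norm(P - q, axis=1)   # Euclidean distance to every point
      i = int(np.argmin(dists))
      return i, float(dists[i])

  # toy usage with made-up data
  P = np.random.default_rng(0).standard_normal((1000, 32))
  q = np.random.default_rng(1).standard_normal(32)
  print(nearest_neighbor(P, q))
  ```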

  3. Motivation
  - Generic setup: points model objects (e.g. images); distance models a (dis)similarity measure
  - Application areas: machine learning (k-NN rule), speech/image/video/music recognition, vector quantization, bioinformatics, etc.
  - Distances: Hamming, Euclidean, edit distance, earthmover distance, etc.
  - Core primitive: closest pair, clustering, etc.

  4. Curse of Dimensionality
  - All exact algorithms degrade rapidly with the dimension d

      Algorithm                    Query time           Space
      Full indexing                (d · log n)^O(1)     n^O(d)  (Voronoi diagram size)
      No indexing (linear scan)    O(n · d)             O(n · d)

  5. Approximate NNS
  - c-approximate r-near neighbor: given a query point q, report a point p′ ∈ P s.t. ‖p′ − q‖ ≤ cr, as long as there is some point within distance r of q
  - Practice: use it for exact NNS
  - Filtering: gives a (hopefully small) set of candidates

  6. NNS algorithms
  - Exponential dependence on dimension: [Arya-Mount’93], [Clarkson’94], [Arya-Mount-Netanyahu-Silverman-Wu’98], [Kleinberg’97], [Har-Peled’02], [Arya-Fonseca-Mount’11], …
  - Linear/poly dependence on dimension: [Kushilevitz-Ostrovsky-Rabani’98], [Indyk-Motwani’98], [Indyk’98, ’01], [Gionis-Indyk-Motwani’99], [Charikar’02], [Datar-Immorlica-Indyk-Mirrokni’04], [Chakrabarti-Regev’04], [Panigrahy’06], [Ailon-Chazelle’06], [A.-Indyk’06], [A.-Indyk-Nguyen-Razenshteyn’14], [A.-Razenshteyn’15], [Pagh’16], [Laarhoven’16], …

  7. Locality-Sensitive Hashing [Indyk-Motwani’98]
  - Random hash function h on R^d satisfying:
    - for a close pair (when ‖q − p‖ ≤ r): P1 = Pr[h(q) = h(p)] is “not-so-small”
    - for a far pair (when ‖q − p′‖ > cr): P2 = Pr[h(q) = h(p′)] is “small”
  - Use several hash tables: query time ≈ n^ρ, where ρ = log(1/P1) / log(1/P2)
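  As a hedged illustration of these definitions, here is the classic bit-sampling family for Hamming space from [IM’98]: one hash function samples k random coordinates, a single coordinate collides for a pair at Hamming distance t with probability 1 − t/d, and plugging P1 = 1 − r/d and P2 = 1 − cr/d into ρ = log(1/P1)/log(1/P2) gives ρ ≈ 1/c. The specific numbers below are made up for the demo.

  ```python
  import math
  import random

  def bit_sampling_hash(d, k, seed=None):
      """One LSH function for {0,1}^d: the concatenation of k randomly sampled
      coordinates (the bit-sampling family of [IM'98])."""
      rng = random.Random(seed)
      coords = [rng.randrange(d) for _ in range(k)]
      return lambda x: tuple(x[i] for i in coords)

  # A single sampled bit collides for points at Hamming distance t w.p. 1 - t/d.
  d, r, c = 128, 8, 2
  P1 = 1 - r / d            # close pair: distance <= r
  P2 = 1 - c * r / d        # far pair:   distance  > cr
  rho = math.log(1 / P1) / math.log(1 / P2)
  print(rho)                # ~ 0.48, i.e. roughly 1/c = 1/2
  ```

  Several such functions are concatenated within a table and repeated across roughly n^ρ hash tables to boost the overall success probability.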

  8. LSH Algorithms (example column: c = 2)

      Space      Time    Exponent    c = 2      Reference
      Hamming    n^(1+ρ)  n^ρ    ρ = 1/c     ρ = 1/2    [IM’98]
      space                       ρ ≥ 1/c                [MNP’06, OWZ’11]
      Euclidean  n^(1+ρ)  n^ρ    ρ = 1/c     ρ = 1/2    [IM’98, DIIM’04]
      space                       ρ ≈ 1/c²    ρ = 1/4    [AI’06]
                                  ρ ≥ 1/c²               [MNP’06, OWZ’11]
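  For the Euclidean rows, a commonly used family is the one from [DIIM’04] cited in the table: project onto a random Gaussian direction, add a random offset, and quantize into buckets of width w. A minimal sketch (parameter values are illustrative):

  ```python
  import numpy as np

  def pstable_hash(d, w, seed=None):
      """One Euclidean LSH function in the style of [DIIM'04]:
      random Gaussian projection + random shift, quantized into width-w buckets."""
      rng = np.random.default_rng(seed)
      a = rng.standard_normal(d)      # random direction (2-stable distribution)
      b = rng.uniform(0.0, w)         # random offset in [0, w)
      return lambda x: int(np.floor((a @ x + b) / w))

  # nearby points tend to land in the same bucket more often than far ones
  h = pstable_hash(d=32, w=4.0, seed=0)
  x = np.zeros(32)
  print(h(x), h(x + 0.1), h(x + 10.0))
  ```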

  9. LSH is tight… what’s next?
  - Lower bounds (cell probe): [A.-Indyk-Patrascu’06, Panigrahy-Talwar-Wieder’08,’10, Kapralov-Panigrahy’12]
  - Datasets with additional structure: [Clarkson’99, Karger-Ruhl’02, Krauthgamer-Lee’04, Beygelzimer-Kakade-Langford’06, Indyk-Naor’07, Dasgupta-Sinha’13, Abdullah-A.-Krauthgamer-Kannan’14, …]
  - Space-time trade-offs: [Panigrahy’06, A.-Indyk’06]
  - But are we really done with basic NNS algorithms?

  10. Beyond Locality Sensitive Hashing (example column: c = 2)

      Space    Time    Exponent           c = 2          Reference
      Hamming space
        LSH             n^(1+ρ)  n^ρ   ρ = 1/c           ρ = 1/2        [IM’98]
                                        ρ ≥ 1/c                          [MNP’06, OWZ’11]
        data-dependent  n^(1+ρ)  n^ρ   complicated       ρ = 1/2 − ε    [AINR’14]
                                        ρ ≈ 1/(2c − 1)    ρ = 1/3        [AR’15]
      Euclidean space
        LSH             n^(1+ρ)  n^ρ   ρ ≈ 1/c²          ρ = 1/4        [AI’06]
                                        ρ ≥ 1/c²                         [MNP’06, OWZ’11]
        data-dependent  n^(1+ρ)  n^ρ   complicated       ρ = 1/4 − ε    [AINR’14]
                                        ρ ≈ 1/(2c² − 1)   ρ = 1/7        [AR’15]

  11. New approach? Data-dependent hashing
  - A random hash function, chosen after seeing the given dataset
  - Efficiently computable

  12. Construction of hash function
  - Two components:
    - a nice geometric structure (which has better LSH)
    - a reduction to such structure (the data-dependent part)
  - Like a (weak) “regularity lemma” for a set of points

  13. Nice geometric structure: average-case
  - Think: a random dataset on a sphere of radius cr/√2
    - vectors are nearly perpendicular to each other
    - so random points are at distance ≈ cr
  - Lemma: ρ = 1/(2c² − 1), via Cap Carving
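  A quick numerical check of this average-case picture (a Python sketch; the dimension and counts are arbitrary): random points on a sphere of radius cr/√2 are nearly orthogonal, so a typical pair sits at distance ≈ cr.

  ```python
  import numpy as np

  rng = np.random.default_rng(0)
  d, n, c, r = 200, 1000, 2.0, 1.0
  R = c * r / np.sqrt(2)                                  # sphere radius cr/sqrt(2)

  X = rng.standard_normal((n, d))
  X = R * X / np.linalg.norm(X, axis=1, keepdims=True)    # random points on the sphere

  # nearly-orthogonal vectors of norm R are at distance R*sqrt(2) = cr from each other
  i, j = rng.integers(0, n, 2000), rng.integers(0, n, 2000)
  pair = i != j
  print(np.mean(np.linalg.norm(X[i[pair]] - X[j[pair]], axis=1)))   # ~ c*r = 2.0
  ```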

  14. Reduction to nice structure
  - Idea: iteratively decrease the radius of the minimum enclosing ball
  - Algorithm:
    - find dense clusters (smaller radius, containing a large fraction of points) and recurse on each dense cluster
    - apply cap carving on the rest and recurse on each “cap” (e.g., dense clusters might reappear)
  - Why is this OK? Once the dense clusters are removed, the rest looks like a “random dataset” of radius ≈ 100cr; the dense clusters themselves have smaller radius (e.g. 99cr), which is even better.
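  Below is a toy, self-contained sketch of this recursion in Python. The thresholds, shrink factors, and stopping rules are invented for illustration and are not the actual [AR’15] parameters; in particular, the real dense-cluster ball has radius (√2 − ε)R and the density threshold is n^(1−δ).

  ```python
  import numpy as np

  def build_tree(X, idx, radius, n0, rng, depth=0, max_depth=12):
      """Toy version of the slide's recursion: peel off dense clusters and recurse
      on them with a smaller radius; cap-carve the remaining points and recurse on
      each cap.  X: (n, d) data, idx: indices handled by this node,
      n0: density threshold (think n**(1-delta)), rng: np.random.Generator."""
      idx = list(idx)
      if len(idx) <= n0 or depth >= max_depth:
          return {"leaf": idx}

      # 1. dense clusters: a ball of noticeably smaller radius around a data point
      #    that still captures at least n0 of the remaining points
      clusters, rest = [], idx
      for center in list(rest):
          if center not in rest:
              continue
          members = [i for i in rest
                     if np.linalg.norm(X[i] - X[center]) <= 0.8 * radius]
          if len(members) >= n0:
              clusters.append(members)
              rest = [i for i in rest if i not in members]

      node = {"clusters": [build_tree(X, m, 0.8 * radius, n0, rng, depth + 1, max_depth)
                           for m in clusters]}

      # 2. cap carving on the rest: each point goes to the first random direction
      #    whose inner product clears a (made-up) threshold
      caps, unassigned = [], set(rest)
      while unassigned:
          g = rng.standard_normal(X.shape[1])
          hit = {i for i in unassigned if X[i] @ g >= 1.0}
          if hit:
              caps.append(sorted(hit))
              unassigned -= hit
      node["caps"] = [build_tree(X, cap, radius, n0, rng, depth + 1, max_depth)
                      for cap in caps]
      return node

  # toy usage
  X = np.random.default_rng(0).standard_normal((500, 16))
  tree = build_tree(X, range(500), radius=10.0, n0=25, rng=np.random.default_rng(1))
  ```

  The real construction also controls how often clusters can re-form: the radius drops by Ω(ε²) each time, as slide 17 quantifies.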

  15. Hash function
  - Described by a tree (like a hash table)
  - (figure: the recursive decomposition drawn as a tree, starting radius = 100cr; picture not to scale or dimension)

  16. Dense clusters
  - Current dataset: radius R
  - A dense cluster:
    - contains n^(1−δ) points
    - has smaller radius: (1 − Ω(ε²)) R
  - After we remove all clusters: for any point on the surface, there are at most n^(1−δ) points within distance (√2 − ε) R
    - (ε and δ are trade-off parameters)
    - the other points are essentially orthogonal!
  - When applying Cap Carving with parameters (P1, P2):
    - the empirical number of far points colliding with the query is ≈ nP2 + n^(1−δ)
    - as long as nP2 ≫ n^(1−δ), the “impurity” doesn’t matter!
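  To make the last bullet concrete, here is the condition written out, with illustrative numbers of my own (not from the slides):

  ```latex
  % The far points colliding with the query number about n P_2 + n^{1-\delta},
  % so the residual clusters are negligible exactly when
  %     n P_2 \gg n^{1-\delta} \iff P_2 \gg n^{-\delta}.
  % E.g. with n = 10^6 and \delta = 0.4: n^{1-\delta} \approx 4000, so any cap-carving
  % setting with P_2 \gg 10^{-2.4} keeps the pseudo-random part dominant.
  \[
    n P_2 \;\gg\; n^{1-\delta} \quad\Longleftrightarrow\quad P_2 \;\gg\; n^{-\delta}.
  \]
  ```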

  17. Tree recap
  - During a query:
    - recurse in all clusters
    - descend into just one bucket in CapCarving nodes
    - so we will look in more than one leaf!
  - How much branching? Claim: at most n^δ + 1 per node
    - each time we branch: at most n^δ clusters (+1 for the CapCarving part)
    - a cluster reduces the radius by Ω(ε²), so cluster-depth is at most 100/Ω(ε²) = O(1/ε²)
    - (δ: trade-off parameter)
  - Progress in two ways:
    - clusters reduce the radius
    - CapCarving nodes reduce the number of far points (the empirical P2)
  - A tree succeeds with probability ≥ n^(−1/(2c²−1) − o(1))
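  A sketch of the query walk this slide describes, assuming a hypothetical node layout in which every cap child also stores the direction and threshold that carved it (the build sketch after slide 14 would need to record these to be queried this way):

  ```python
  def query(node, q):
      """Toy query walk over a decomposition tree.  Assumed node layout:
      {"leaf": [...]}  or  {"clusters": [child, ...], "caps": [(g, tau, child), ...]},
      where q and g are numpy vectors.  Follows *all* dense-cluster children, but
      only the single cap the query falls into, mirroring the slide: recurse in
      all clusters, one bucket per CapCarving node."""
      if "leaf" in node:
          return list(node["leaf"])
      candidates = []
      for child in node["clusters"]:            # dense clusters: follow every one
          candidates += query(child, q)
      for g, tau, child in node["caps"]:        # cap carving: first cap q clears
          if q @ g >= tau:
              candidates += query(child, q)
              break
      return candidates
  ```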

  18. Beyond “Beyond LSH”
  - Practice: often optimize the partition for your dataset
    - PCA-tree, spectral hashing, etc. [S91, McN01, VKD09, WTF08, …]
    - no guarantees (on performance or correctness)
  - Theory: assume special structure in the dataset
    - low intrinsic dimension [KR’02, KL’04, BKL’06, IN’07, DS’13, …]
    - structure + noise [Abdullah-A.-Krauthgamer-Kannan’14]
  - Data-dependent hashing helps even when there is no a priori structure!

  19. Data-dependent hashing wrap-up
  - Dynamicity?
    - dynamization techniques [Overmars-van Leeuwen’81]
  - Better bounds?
    - For dimension d = O(log n), can get a better ρ! [Laarhoven’16]
    - For d > log^(1+δ) n: our ρ is optimal even for data-dependent hashing! [A-Razenshteyn ’??]
      - in the right formalization (to rule out the Voronoi diagram): description complexity of the hash function is n^(1−Ω(1))
  - Practical variant [A-Indyk-Laarhoven-Razenshteyn-Schmidt’15]
  - NNS for ℓ∞
    - [Indyk’98] gets approximation O(log log d) (poly space, sublinear query time)
    - cf. ℓ∞ has no non-trivial sketch!
    - some matching lower bounds in the relevant model [ACP’08, KP’12]
    - can be thought of as data-dependent hashing
  - NNS for any norm (e.g., matrix norms, EMD)?
