Data-dependent Hashing for Nearest Neighbor Search
Alex Andoni (Columbia University)
Based on joint work with: Piotr Indyk, Huy Nguyen, Ilya Razenshteyn
Nearest Neighbor Search (NNS)
- Preprocess: a set P of points
- Query: given a query point q, report a point p* ∈ P with the smallest distance to q
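For concreteness, a minimal brute-force sketch of the problem (the array names and the choice of Euclidean distance are illustrative assumptions, not part of the talk):

```python
import numpy as np

def nearest_neighbor(P, q):
    """Linear scan: return the point p* in P closest to the query q."""
    dists = np.linalg.norm(P - q, axis=1)  # distance from q to every point in P
    return P[np.argmin(dists)]
```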
Motivation
- Generic setup:
  - Points model objects (e.g. images)
  - Distance models (dis)similarity measure
- Application areas:
  - machine learning: k-NN rule
  - speech/image/video/music recognition, vector quantization, bioinformatics, etc.
- Distances: Hamming, Euclidean, edit distance, earthmover distance, etc.
- Core primitive: closest pair, clustering, etc.
Curse of Dimensionality
- All exact algorithms degrade rapidly with the dimension d

  Algorithm                  Query time    Space
  Full indexing              O(d log n)    n^O(d)  (Voronoi diagram size)
  No indexing (linear scan)  O(n·d)        O(n·d)
Approximate NNS
- r-near neighbor: given a query point q, report a point p' ∈ P s.t. ‖p' − q‖ ≤ cr
  - as long as there is some point within distance r of q
- Practice: use for exact NNS
  - Filtering: gives a set of candidates (hopefully small)
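A hedged sketch of the filtering step: given a candidate set (e.g. produced by some hashing scheme), report any candidate within distance cr of the query. The function name and parameters are illustrative.

```python
import numpy as np

def report_near_neighbor(P, q, candidates, r, c):
    """Return some point p' with ||p' - q|| <= c*r among the candidates, or None.
    This suffices for r-near neighbor search as long as a point within
    distance r of q made it into the candidate set."""
    for i in candidates:
        if np.linalg.norm(P[i] - q) <= c * r:
            return P[i]
    return None
```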
NNS algorithms
- Exponential dependence on dimension:
  [Arya-Mount'93], [Clarkson'94], [Arya-Mount-Netanyahu-Silverman-Wu'98], [Kleinberg'97], [Har-Peled'02], [Arya-Fonseca-Mount'11], ...
- Linear/poly dependence on dimension:
  [Kushilevitz-Ostrovsky-Rabani'98], [Indyk-Motwani'98], [Indyk'98, '01], [Gionis-Indyk-Motwani'99], [Charikar'02], [Datar-Immorlica-Indyk-Mirrokni'04], [Chakrabarti-Regev'04], [Panigrahy'06], [Ailon-Chazelle'06], [A.-Indyk'06], [A.-Indyk-Nguyen-Razenshteyn'14], [A.-Razenshteyn'15], [Pagh'16], [Laarhoven'16], ...
Locality-Sensitive Hashing [Indyk-Motwani'98]
- Random hash function h on R^d satisfying:
  - for a close pair: when ‖q − p‖ ≤ r, P1 = Pr[h(q) = h(p)] is "high" ("not-so-small")
  - for a far pair: when ‖q − p'‖ > cr, P2 = Pr[h(q) = h(p')] is "small"
- Use several hash tables: n^ρ of them, where ρ = log(1/P1) / log(1/P2)
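As a concrete instance, the classical bit-sampling family for Hamming space ([IM'98]) hashes each point to a few random coordinates. The sketch below is illustrative: in the standard analysis k and L would be tuned so that L ≈ n^ρ with ρ = log(1/P1)/log(1/P2).

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)

def build_lsh_tables(P, k, L):
    """Bit-sampling LSH for Hamming space: each of the L tables hashes a point
    to k randomly chosen coordinates, so a close pair collides in at least
    one table with good probability."""
    tables = []
    for _ in range(L):
        coords = rng.choice(P.shape[1], size=k, replace=False)
        buckets = defaultdict(list)
        for idx, p in enumerate(P):
            buckets[tuple(p[coords])].append(idx)
        tables.append((coords, buckets))
    return tables

def lsh_candidates(tables, q):
    """Union of the buckets q falls into, across all tables."""
    cands = set()
    for coords, buckets in tables:
        cands.update(buckets.get(tuple(q[coords]), []))
    return cands
```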
LSH Algorithms
                   Space     Time   Exponent    c = 2     Reference
  Hamming space    n^{1+ρ}   n^ρ    ρ = 1/c     ρ = 1/2   [IM'98]
                                    ρ ≥ 1/c               [MNP'06, OWZ'11]
  Euclidean space  n^{1+ρ}   n^ρ    ρ = 1/c     ρ = 1/2   [IM'98, DIIM'04]
                                    ρ ≈ 1/c²    ρ = 1/4   [AI'06]
                                    ρ ≥ 1/c²              [MNP'06, OWZ'11]
LSH is tight... what's next?
- Lower bounds (cell probe): [A.-Indyk-Patrascu'06, Panigrahy-Talwar-Wieder'08,'10, Kapralov-Panigrahy'12]
- Datasets with additional structure: [Clarkson'99, Karger-Ruhl'02, Krauthgamer-Lee'04, Beygelzimer-Kakade-Langford'06, Indyk-Naor'07, Dasgupta-Sinha'13, Abdullah-A.-Krauthgamer-Kannan'14, ...]
- Space-time trade-offs: [Panigrahy'06, A.-Indyk'06]
- But are we really done with basic NNS algorithms?
Beyond Locality-Sensitive Hashing
                          Space     Time   Exponent         c = 2        Reference
  Hamming space    LSH    n^{1+ρ}   n^ρ    ρ = 1/c          ρ = 1/2      [IM'98]
                                           ρ ≥ 1/c                       [MNP'06, OWZ'11]
                          n^{1+ρ}   n^ρ    complicated      ρ = 1/2 - ε  [AINR'14]
                                           ρ = 1/(2c - 1)   ρ = 1/3      [AR'15]
  Euclidean space  LSH    n^{1+ρ}   n^ρ    ρ ≈ 1/c²         ρ = 1/4      [AI'06]
                                           ρ ≥ 1/c²                      [MNP'06, OWZ'11]
                          n^{1+ρ}   n^ρ    complicated      ρ = 1/4 - ε  [AINR'14]
                                           ρ = 1/(2c² - 1)  ρ = 1/7      [AR'15]
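A quick illustrative computation of the exponents in the table, evaluated for a few values of c (the table itself fixes c = 2):

```python
# LSH-optimal vs. data-dependent exponents, as functions of the approximation c
for c in (1.5, 2.0, 3.0):
    print(f"c={c}: Hamming 1/c={1/c:.3f} vs 1/(2c-1)={1/(2*c-1):.3f}; "
          f"Euclidean 1/c^2={1/c**2:.3f} vs 1/(2c^2-1)={1/(2*c**2-1):.3f}")
```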
New approach? Data-dependent hashing
- A random hash function, chosen after seeing the given dataset
- Efficiently computable
Construction of hash function
- Two components:
  - Nice geometric structure: has better LSH
  - Reduction to such structure: data-dependent
- Like a (weak) "regularity lemma" for a set of points
Nice geometric structure: average-case
- Think: a random dataset on a sphere
  - vectors perpendicular to each other
  - s.t. random points are at distance ≈ cr
- Lemma: ρ = 1/(2c² - 1)
  - via Cap Carving
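A quick empirical check of the "random dataset on a sphere" picture (the dimension and point count below are arbitrary): random unit vectors are nearly orthogonal, so pairwise distances concentrate around √2 times the radius.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 1000, 500
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # n random points on the unit sphere
dots = X @ X.T
i, j = np.triu_indices(n, k=1)
print(np.mean(np.abs(dots[i, j])))               # small: nearly orthogonal
print(np.mean(np.sqrt(2 - 2 * dots[i, j])))      # ~1.41: distances concentrate near sqrt(2)
```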
Reduction to nice structure
- Idea: iteratively decrease the radius of the minimum enclosing ball
- Algorithm (a toy sketch follows below):
  - find dense clusters (with smaller radius, containing a large fraction of points); recurse on the dense clusters
  - apply cap carving on the rest; recurse on each "cap" (e.g., dense clusters might reappear)
- Why ok?
  - outside the clusters: no dense clusters, so like a "random dataset" with radius = 100cr, even better!
  - inside a cluster: the radius is smaller (e.g., drops from 100cr to 99cr)
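A self-contained toy sketch of this recursion. All thresholds, parameter values, and helper names are illustrative assumptions for exposition, not the construction's actual settings.

```python
import numpy as np

rng = np.random.default_rng(2)
LEAF_SIZE = 32

def find_dense_clusters(P, R, delta=0.2, shrink=0.9):
    """Greedily peel off balls of radius shrink*R that contain at least
    len(P)**(1 - delta) points -- a toy stand-in for the dense-cluster step."""
    clusters, rest = [], P
    threshold = max(2, int(len(P) ** (1 - delta)))
    while len(rest) >= threshold:
        center = rest[rng.integers(len(rest))]           # try a random center
        near = np.linalg.norm(rest - center, axis=1) <= shrink * R
        if near.sum() < threshold:
            break
        clusters.append(rest[near])
        rest = rest[~near]
    return clusters, rest

def build_tree(P, R, num_caps=8):
    """Sketch of the recursion: dense clusters are recursed on with a smaller
    radius; the nearly-random remainder is split into random spherical caps."""
    if len(P) <= LEAF_SIZE or R < 1e-9:
        return {"leaf": P}
    clusters, rest = find_dense_clusters(P, R)
    G = rng.standard_normal((num_caps, P.shape[1]))      # random cap directions
    caps = {}
    if len(rest):
        which = np.argmax(rest @ G.T, axis=1)            # cap each point falls into
        caps = {i: rest[which == i] for i in range(num_caps) if (which == i).any()}
    if not clusters and len(caps) <= 1:
        return {"leaf": P}                               # guard for this toy version
    return {"clusters": [build_tree(C, 0.9 * R) for C in clusters],
            "G": G,
            "caps": {i: build_tree(C, R) for i, C in caps.items()}}
```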
Hash function
- Described by a tree (like a hash table)
Dense clusters
- Current dataset: radius R
- A dense cluster:
  - contains n^{1-δ} points
  - smaller radius: (1 - Ω(ε²)) R
- After we remove all clusters:
  - for any point on the surface, there are at most n^{1-δ} points within distance (√2 - ε) R
  - the other points are essentially orthogonal!
- When applying Cap Carving with parameters (P1, P2):
  - empirical number of far points colliding with the query: n·P2 + n^{1-δ}
  - as long as n·P2 ≫ n^{1-δ}, the "impurity" doesn't matter!
(ε and δ are trade-off parameters)
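The "impurity" condition can be checked numerically; the slack factor and the example numbers below are purely illustrative.

```python
def impurity_negligible(n, P2, delta, slack=10.0):
    """Check n * P2 >> n**(1 - delta): the ~n^(1-delta) leftover near points
    per node must be dominated by the ~n*P2 far points that collide anyway
    (slack is an illustrative stand-in for 'much greater')."""
    return n * P2 >= slack * n ** (1 - delta)

print(impurity_negligible(10**6, 0.7, 0.2))    # True:  7e5 dominates n^0.8 ~ 6.3e4
print(impurity_negligible(10**6, 0.01, 0.2))   # False: 1e4 is below n^0.8 ~ 6.3e4
```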
Tree recap
- During query:
  - recurse in all clusters
  - just in one bucket in CapCarving
  - will look in >1 leaf!
- How much branching?
  - Claim: at most n^δ + 1 each time we branch (at most n^δ clusters, +1 for the rest)
  - a cluster reduces the radius by Ω(ε²), so cluster-depth is at most 100/Ω(ε²) = O(1/ε²)
- Progress in 2 ways:
  - clusters reduce the radius
  - CapCarving nodes reduce the # of far points (empirical P2)
- A tree succeeds with probability ≥ n^{-1/(2c²-1) - o(1)}
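Continuing the toy sketch from the reduction slide, the query walk matches this recap: descend into every dense cluster, but into only the one cap the query falls into (names and structure follow the earlier illustrative build_tree, not the paper's pseudocode).

```python
import numpy as np

def query_tree(node, q):
    """Collect candidate points for q from the toy tree built above."""
    if "leaf" in node:
        return list(node["leaf"])
    candidates = []
    for child in node["clusters"]:
        candidates += query_tree(child, q)            # recurse in ALL clusters
    cap = int(np.argmax(node["G"] @ q))               # the single cap q falls into
    if cap in node["caps"]:
        candidates += query_tree(node["caps"][cap], q)
    return candidates
```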
Beyond "Beyond LSH"
- Practice: often optimize the partition to your dataset
  - PCA-tree, spectral hashing, etc. [S91, McN01, VKD09, WTF08, ...]
  - no guarantees (performance or correctness)
- Theory: assume special structure in the dataset
  - low intrinsic dimension [KR'02, KL'04, BKL'06, IN'07, DS'13, ...]
  - structure + noise [Abdullah-A.-Krauthgamer-Kannan'14]
- Data-dependent hashing helps even when there is no a priori structure!
Data-dependent hashing wrap-up
- Dynamicity?
  - Dynamization techniques [Overmars-van Leeuwen'81]
- Better bounds?
  - For dimension d = O(log n), can get better ρ! [Laarhoven'16]
  - For d > log^{1+δ} n: our ρ is optimal even for data-dependent hashing! [A.-Razenshteyn'??]
    - in the right formalization (to rule out the Voronoi diagram): description complexity of the hash function is n^{1-Ω(1)}
- Practical variant [A.-Indyk-Laarhoven-Razenshteyn-Schmidt'15]
- NNS for ℓ∞
  - [Indyk'98] gets approximation O(log log d) (poly space, sublinear query time)
  - cf. ℓ∞ has no non-trivial sketch!
  - some matching lower bounds in the relevant model [ACP'08, KP'12]
  - can be thought of as data-dependent hashing
- NNS for any norm (e.g., matrix norms, EMD)?