CS 498ABD: Algorithms for Big Data, Spring 2019
LSH for ℓ2 Distances
Lecture 15: March 12, 2019
Chandra (UIUC)
LSH Approach for Approximate NNS

Use locality-sensitive hashing to solve the simplified decision problem.

Definition. A family of hash functions is (r, cr, p1, p2)-LSH with p1 > p2 and c > 1 if h drawn randomly from the family satisfies the following:
- Pr[h(x) = h(y)] ≥ p1 when dist(x, y) ≤ r
- Pr[h(x) = h(y)] ≤ p2 when dist(x, y) ≥ cr

Key parameter: the gap between p1 and p2, measured as ρ = (log p1)/(log p2), which is usually small.

Two-level hashing scheme (sketched in code below):
- Amplify the basic locality-sensitive hash family by repetition to create a better family.
- Use several copies of the amplified hash functions.
- Layer a binary search over r on top of the above scheme.

With L ≃ n^ρ hash tables this gives:
- Storage: n^(1+ρ) (ignoring log factors)
- Query time: k·n^ρ (ignoring log factors), where k = log_{1/p2} n
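As a concrete illustration, here is a minimal Python sketch of the two-level scheme, assuming the basic LSH family is supplied as a factory of random hash functions. The class and parameter names (AmplifiedLSH, basic_family, k, L) are illustrative, not from the lecture.

```python
from collections import defaultdict

class AmplifiedLSH:
    """Two-level LSH sketch: AND-amplify a basic family k times per table,
    OR across L independent tables."""

    def __init__(self, basic_family, k, L):
        # basic_family() must return a fresh random hash function h(point) -> int
        self.hashes = [[basic_family() for _ in range(k)] for _ in range(L)]
        self.tables = [defaultdict(list) for _ in range(L)]

    def _key(self, i, x):
        # concatenating k basic hashes drives the collision probability
        # of far points down from p2 to p2^k
        return tuple(h(x) for h in self.hashes[i])

    def insert(self, label, x):
        for i, table in enumerate(self.tables):
            table[self._key(i, x)].append(label)

    def query(self, x):
        # union of x's buckets over all L tables; the caller then checks
        # the true distances of these candidate points
        out = set()
        for i, table in enumerate(self.tables):
            out.update(table[self._key(i, x)])
        return out
```

With k = log_{1/p2} n and L ≃ n^ρ this matches the storage and query bounds stated above, up to log factors.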
LSH for Euclidean Distances

Now x1, x2, . . . , xn ∈ R^d and dist(x, y) = ‖x − y‖₂.

First do dimensionality reduction (JL) to reduce d (if necessary) to O(log n) (since we are using a c-approximation anyway).

What is a good basic locality-sensitive hashing scheme? That is, we want a hashing approach that makes nearby points more likely to collide than farther-away points.

Answer: projections onto random lines plus bucketing.
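The JL step can itself be done with a random Gaussian projection. A minimal sketch, assuming the points are the rows of a matrix X and an illustrative target dimension k (in the application, k = O(log n)):

```python
import numpy as np

def jl_reduce(X, k, seed=0):
    """Johnson-Lindenstrauss style reduction: project the rows of X (n points
    in R^d) down to R^k with a scaled random Gaussian matrix. Pairwise ℓ2
    distances are preserved up to (1 ± ε) w.h.p. when k = O(ε^(-2) log n)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    S = rng.standard_normal((d, k)) / np.sqrt(k)  # scaled Gaussian map
    return X @ S
```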
Random Unit Vector

Question: How do we generate a random unit vector in R^d (same as a uniform point on the sphere S^(d−1))?

Pick d independent rvs Z1, Z2, . . . , Zd where each Zi ∼ N(0, 1) and let g = (Z1, Z2, . . . , Zd) (also called a random Gaussian vector).

The distribution of g is spherically symmetric, so g points in a random direction; to obtain a random unit vector, normalize: g′ = g/‖g‖₂.

When d is large, ‖g‖₂² = Σi Zi² is concentrated around d, and hence ‖g‖₂ = (1 ± ε)√d with high probability. Thus g/√d is a proxy for a random unit vector and is easier to work with in many cases.
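A short sketch of both facts (the dimension 10,000 is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_unit_vector(d):
    """Uniform point on S^(d-1): normalize a random Gaussian vector."""
    g = rng.standard_normal(d)
    return g / np.linalg.norm(g)

# For large d, g/√d is already nearly unit length:
d = 10_000
g = rng.standard_normal(d)
print(np.linalg.norm(g) / np.sqrt(d))   # ≈ 1.0 with high probability
```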
Projection onto a Random Gaussian Vector

Lemma. Suppose x ∈ R^d and g is a random Gaussian vector. Let Y = x · g. Then Y ∼ N(0, ‖x‖₂²), i.e., Y is distributed as ‖x‖₂ · N(0, 1), and hence E[Y²] = ‖x‖₂².
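A quick empirical check of the Lemma (a sketch; the vector x and the sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
d, trials = 50, 100_000
x = rng.standard_normal(d)             # an arbitrary fixed vector

G = rng.standard_normal((trials, d))   # many independent Gaussian vectors g
Y = G @ x                              # Y = x·g for each draw of g

print(Y.mean(), Y.var(), np.linalg.norm(x) ** 2)   # mean ≈ 0, Var[Y] ≈ ‖x‖₂²
```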
Hashing Scheme

- Pick a random Gaussian vector u (each entry N(0, 1)).
- Pick a random shift a ∈ (0, r].
- For vector x set h_{u,a}(x) = ⌊(x · u + a)/r⌋.
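A direct implementation of one draw from this family (a sketch; the factory form is what the amplification sketch earlier would take as its basic_family):

```python
import numpy as np

def make_hash(d, r, rng):
    """One random draw h_{u,a} from the family above."""
    u = rng.standard_normal(d)       # random Gaussian vector u
    a = rng.uniform(0.0, r)          # random shift a, uniform over an interval of length r
    return lambda x: int(np.floor((x @ u + a) / r))

# e.g. as the basic family for the amplification sketch earlier:
# lsh = AmplifiedLSH(lambda: make_hash(d, r, rng), k, L)
```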
Analysis

Suppose x, y are such that ‖x − y‖₂ ≤ r. What is p1 = Pr[h_{u,a}(x) = h_{u,a}(y)]?
Suppose x, y are such that ‖x − y‖₂ ≥ cr. What is p2 = Pr[h_{u,a}(x) = h_{u,a}(y)]?

Let q = x − y and let s = ‖q‖₂ be the length of q. From the Lemma, q · u is distributed as s · N(0, 1).

Observations:
- h(x) ≠ h(y) if |q · u| ≥ r.
- If |q · u| < r then h(x) = h(y) with probability 1 − |q · u|/r (over the random shift a).

Thus the collision probability depends only on s.
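The two observations give a way to estimate the collision probability by simulation, using only the one-dimensional distribution of q · u (a sketch; r = 1 and c = 2 are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)

def collision_prob(s, r, trials=400_000):
    """Monte Carlo estimate of Pr[h(x) = h(y)] for points at distance s.
    Uses the observations above: q·u ~ s·N(0,1), and given |q·u| = t < r
    the points collide with probability 1 - t/r over the random shift a."""
    t = np.abs(s * rng.standard_normal(trials))   # samples of |q·u|
    return float(np.mean(np.where(t < r, 1 - t / r, 0.0)))

r = 1.0
p1, p2 = collision_prob(r, r), collision_prob(2 * r, r)   # c = 2
print(p1, p2)   # p1 ≈ 0.37, p2 ≈ 0.20: nearby points collide more often
```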
Analysis (continued)

For a fixed s the collision probability is

    p(s) = ∫₀^r f_s(t) (1 − t/r) dt

where f_s is the density function of |s · N(0, 1)|. Rewriting,

    p(s) = ∫₀^r (1/s) f(t/s) (1 − t/r) dt

where f is the density function of |N(0, 1)|.
ρ Analysis

Recall p1 = p(r) and p2 = p(cr), where p(s) is the collision probability above, and we are interested in ρ = (log p1)/(log p2).

Show ρ < 1/c by plot:

[Plot of ρ and 1/c versus the approximation factor c (from 1 to 10): the ρ curve lies below the 1/c curve.]
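A numeric check of the plotted claim. The integral for p(s) has a closed form, obtained by integrating the expression above. This sketch treats the bucket width w as a tunable parameter and minimizes ρ over w for each c, an assumption about how the plotted curve was produced (with the width fixed at r the exponent is weaker):

```python
import numpy as np
from scipy import stats

def p_collide(s, w):
    """Closed form of the collision-probability integral with bucket width w:
    p(s) = 1 - 2Φ(-w/s) - (2s/(√(2π)·w))·(1 - exp(-w²/(2s²)))."""
    a = w / s
    return (1 - 2 * stats.norm.cdf(-a)
            - (2 / (np.sqrt(2 * np.pi) * a)) * (1 - np.exp(-a * a / 2)))

for c in [1.5, 2.0, 4.0, 8.0]:
    widths = np.linspace(0.5, 20, 400)
    rho = min(np.log(p_collide(1.0, w)) / np.log(p_collide(c, w)) for w in widths)
    print(f"c = {c}: rho = {rho:.3f} < 1/c = {1 / c:.3f}")
```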
NNS for Euclidean Distances

For any fixed c > 1, use the above scheme to obtain:
- Storage: O(n^(1+1/c) polylog(n))
- Query time: O(d · n^(1/c) polylog(n))

Can use JL to reduce d to O(log n).
Improved LSH Scheme [Andoni-Indyk'06]

Basic LSH scheme projects points onto lines.

Better scheme: pick some small constant t and project points into R^t. Use a lattice-based space partitioning scheme to "bucket" instead of intervals.

[Figures from Piotr Indyk's slides illustrating the lattice-based partition of R^t.]

Leads to ρ ≃ 1/c² + O(log t/√t) and hence tends to 1/c² for large c and fixed t.

Lower bound for LSH in ℓ2 says ρ ≥ 1/c².
Data-Dependent LSH Scheme

LSH is data oblivious. That is, the hash families are chosen before seeing the data. Can one do better by choosing hash functions based on the given set of points?

Yes. [Andoni-Indyk-Nguyen-Razenshteyn'14, Andoni-Razenshteyn'15]
- ρ = 1/(2c² − 1) for ℓ2, improving upon 1/c² for data-oblivious LSH (which is tight in the worst case).
- ρ = 1/(2c − 1) for ℓ1/Hamming cube, improving upon 1/c for data-oblivious LSH.
LSH Summary

- A modular hashing-based scheme for similarity estimation.
- Main competitors are space-partitioning data structures such as variants of k-d trees.
- Provides speedups but uses more memory.
- Does not appear to be a clear winner.
Digression: p-Stable Distributions

For F2 estimation, JL, and LSH we used an important "stability" property of the Normal distribution.

Lemma. Let Y1, Y2, . . . , Yd be independent random variables with distribution N(0, 1). Then for any x ∈ R^d, Z = Σi xi Yi has distribution ‖x‖₂ · N(0, 1).

The standard Gaussian is 2-stable.
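A small simulation showing the stability property at work in the F2-estimation setting mentioned above (a sketch; the frequency vector and the number of counters k are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
d, k = 1000, 500
x = rng.integers(0, 10, size=d).astype(float)   # a nonnegative frequency vector

# 2-stability in action, as in F2 estimation: each counter Z_j = Σ_i x_i Y_ij
# is distributed as ‖x‖₂·N(0,1), so the mean of Z_j² estimates F2 = ‖x‖₂².
Y = rng.standard_normal((d, k))
Z = x @ Y
print(np.mean(Z ** 2), np.sum(x ** 2))          # estimate ≈ true F2
```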