CS 498ABD: Algorithms for Big Data
Lecture 15: LSH for $\ell_2$ distances
October 15, 2020. Chandra (UIUC), Fall 2020.
LSH Approach for Approximate NNS

Use locality-sensitive hashing to solve a simplified decision problem.

Definition. A family of hash functions is $(r, cr, p_1, p_2)$-LSH with $p_1 > p_2$ and $c > 1$ if $h$ drawn randomly from the family satisfies the following:
- $\Pr[h(x) = h(y)] \ge p_1$ when $\mathrm{dist}(x, y) \le r$
- $\Pr[h(x) = h(y)] \le p_2$ when $\mathrm{dist}(x, y) \ge cr$

Key parameter: the gap between $p_1$ and $p_2$, measured as $\rho = \frac{\log p_1}{\log p_2}$, usually small.

Two-level hashing scheme:
- Amplify the basic locality-sensitive hash family by repetition to create a better family
- Use several copies of the amplified hash functions
- Layer a binary search over $r$ on top of the above scheme
LSH Approach for Approximate NNS

Key parameter: the gap between $p_1$ and $p_2$, measured as $\rho = \frac{\log p_1}{\log p_2}$, usually small.
- $L \simeq n^{\rho}$ hash tables
- Storage: $n^{1+\rho}$ (ignoring log factors)
- Query time: $k n^{\rho}$ (ignoring log factors), where $k = \log_{1/p_2} n$

A sketch of the two-level scheme follows.
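A minimal Python sketch of the two-level scheme, assuming a user-supplied `basic_hash_factory` (a hypothetical parameter, not from the lecture) that returns one basic LSH function; $k$ and $L$ are chosen as on this slide:

```python
import numpy as np
from collections import defaultdict

class TwoLevelLSH:
    """Sketch of the two-level scheme: L tables, each keyed by the
    concatenation of k basic LSH functions (amplification)."""

    def __init__(self, points, basic_hash_factory, k, L):
        self.points = points
        # Concatenating k basic hashes drives the far-point collision
        # probability down to p2^k; L independent tables boost the
        # probability that a near point collides in at least one table.
        self.hashes = [[basic_hash_factory() for _ in range(k)]
                       for _ in range(L)]
        self.tables = [defaultdict(list) for _ in range(L)]
        for idx, x in enumerate(points):
            for table, hs in zip(self.tables, self.hashes):
                table[tuple(h(x) for h in hs)].append(idx)

    def query(self, q, r):
        # Scan the L buckets that q lands in; report any point within r.
        for table, hs in zip(self.tables, self.hashes):
            for idx in table[tuple(h(q) for h in hs)]:
                if np.linalg.norm(self.points[idx] - q) <= r:
                    return idx
        return None
```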
LSH for Euclidean Distances

Now $x_1, x_2, \ldots, x_n \in \mathbb{R}^d$ and $\mathrm{dist}(x, y) = \|x - y\|_2$.

First do dimensionality reduction (JL) to reduce $d$ (if necessary) to $O(\log n)$ (since we are using a $c$-approximation anyway).

What is a good basic locality-sensitive hashing scheme? That is, we want a hashing approach that makes nearby points more likely to collide than points that are farther apart.

Answer: projections onto random lines plus bucketing.
Random unit vector

Question: How do we generate a random unit vector in $\mathbb{R}^d$ (equivalently, a uniform point on the sphere $S^{d-1}$)?

Pick $d$ independent random variables $Z_1, Z_2, \ldots, Z_d$ where each $Z_i \sim N(0, 1)$ and let $g = (Z_1, Z_2, \ldots, Z_d)$ (also called a random Gaussian vector).

$g$ is rotationally symmetric and hence points in a random direction; to obtain a random unit vector, normalize: $g' = g / \|g\|_2$.

When $d$ is large, $\|g\|_2^2 = \sum_i Z_i^2$ is concentrated around $d$, and hence $\|g\|_2 = (1 \pm \epsilon)\sqrt{d}$ with high probability. Thus $g / \sqrt{d}$ is a proxy for a random unit vector and is easier to work with in many cases.
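As a concrete illustration, a short numpy sketch of this recipe (the function name is our own, not from the lecture):

```python
import numpy as np

def random_unit_vector(d, rng=None):
    """Uniform point on the sphere S^{d-1} via a normalized Gaussian."""
    rng = rng or np.random.default_rng()
    g = rng.standard_normal(d)       # each coordinate ~ N(0, 1)
    return g / np.linalg.norm(g)     # ||g||_2 concentrates around sqrt(d)
```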
Projection onto a random Gaussian vector

Lemma. Suppose $x \in \mathbb{R}^d$ and $g$ is a random Gaussian vector. Let $Y = x \cdot g$. Then $Y \sim N(0, \|x\|_2^2)$ and hence $\mathbf{E}[Y^2] = \|x\|_2^2$.
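For completeness, the standard one-line justification (a sketch using only the definitions above):

```latex
% Y is a linear combination of independent standard Gaussians:
Y = x \cdot g = \sum_{i=1}^{d} x_i Z_i , \qquad Z_i \sim N(0,1) \ \text{independent}.
% Such a combination is itself Gaussian, with
\mathbf{E}[Y] = \sum_i x_i \, \mathbf{E}[Z_i] = 0 , \qquad
\operatorname{Var}(Y) = \sum_i x_i^2 \operatorname{Var}(Z_i) = \|x\|_2^2 ,
% hence Y ~ N(0, ||x||_2^2).
```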
Hashing scheme

- Pick a random Gaussian vector $u$
- Pick a random shift $a \in (0, r]$
- For vector $x$ set $h_{u,a}(x) = \left\lfloor \frac{x \cdot u + a}{r} \right\rfloor$
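A direct transcription of this hash function in numpy (a sketch; `make_hash` is our own name):

```python
import numpy as np

def make_hash(d, r, rng=None):
    """One basic LSH function for l2: project onto a random Gaussian
    direction, add a random shift, and bucket into width-r intervals."""
    rng = rng or np.random.default_rng()
    u = rng.standard_normal(d)    # random Gaussian vector
    a = rng.uniform(0, r)         # random shift over the bucket width
    return lambda x: int(np.floor((np.dot(x, u) + a) / r))
```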
Analysis

Suppose $x, y$ are such that $\|x - y\|_2 \le r$. What is $p_1 = \Pr[h_{u,a}(x) = h_{u,a}(y)]$?

Suppose $x, y$ are such that $\|x - y\|_2 \ge cr$. What is $p_2 = \Pr[h_{u,a}(x) = h_{u,a}(y)]$?

Let $q = x - y$ and let $s = \|q\|_2$ be the length of $q$. From the Lemma, $q \cdot u$ is distributed as $s \cdot N(0, 1)$.

Observations:
- $h(x) \ne h(y)$ if $|q \cdot u| \ge r$
- If $|q \cdot u| < r$ then $h(x) = h(y)$ with probability $1 - |q \cdot u| / r$ (over the random shift $a$)

Thus the collision probability depends only on $s$.
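A quick Monte Carlo check of this claim, reusing `make_hash` and `random_unit_vector` from the sketches above (assumed to be in scope); the estimate depends only on $s$, not on where $x$ sits:

```python
import numpy as np

def empirical_collision_prob(s, r, d=50, trials=20000, rng=None):
    """Monte Carlo estimate of Pr[h(x) = h(y)] when ||x - y||_2 = s."""
    rng = rng or np.random.default_rng()
    x = rng.standard_normal(d)
    y = x + s * random_unit_vector(d, rng)   # a pair at distance exactly s
    hits = 0
    for _ in range(trials):
        h = make_hash(d, r, rng)             # fresh random (u, a) each trial
        hits += h(x) == h(y)
    return hits / trials
```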
Analysis (continued)

For a fixed $s$, the collision probability is
$$p(s) = \int_0^r f(t)\,(1 - t/r)\, dt$$
where $f$ is the density function of $|s \cdot N(0,1)|$. Rewriting,
$$p(s) = \int_0^r \frac{1}{s} f(t/s)\,(1 - t/r)\, dt$$
where $f$ is now the density function of $|N(0,1)|$.
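A numerical evaluation of this integral using scipy (a sketch; `p_collision` and `rho` are our own names, and we use that the density of $|N(0,1)|$ is $2\phi(z)$ for $z \ge 0$, where $\phi$ is the standard normal density):

```python
import numpy as np
from scipy import integrate, stats

def p_collision(s, r):
    """p(s) = integral_0^r (1/s) f(t/s) (1 - t/r) dt, f = density of |N(0,1)|."""
    f = lambda z: 2 * stats.norm.pdf(z)                  # half-normal density
    integrand = lambda t: (1 / s) * f(t / s) * (1 - t / r)
    val, _ = integrate.quad(integrand, 0, r)
    return val

def rho(c, r=1.0):
    # rho = log p1 / log p2 with p1 = p(r), p2 = p(cr)
    return np.log(p_collision(r, r)) / np.log(p_collision(c * r, r))
```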
Analysis of $\rho$

Recall $p_1 = p(r)$ and $p_2 = p(cr)$, and we are interested in $\rho = \frac{\log p_1}{\log p_2}$. One can show $\rho < 1/c$ by plotting.

[Plot: $\rho$ and $1/c$ versus the approximation factor $c$ (from 1 to 10); the $\rho$ curve lies below $1/c$ throughout.]
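The comparison behind the plot can be reproduced with the `rho` sketch above (assumed to be in scope):

```python
# Tabulate rho(c) against 1/c, mirroring the plotted curves.
for c in range(2, 11):
    print(f"c = {c:2d}   rho = {rho(c):.3f}   1/c = {1 / c:.3f}")
```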
NNS for Euclidean distances

For any fixed $c > 1$, use the above scheme to obtain:
- Storage: $O(n^{1 + 1/c}\,\mathrm{polylog}(n))$
- Query time: $O(d\, n^{1/c}\,\mathrm{polylog}(n))$

Can use JL to reduce $d$ to $O(\log n)$.
Improved LSH Scheme [Andoni-Indyk'06]

Basic LSH scheme projects points onto lines.

Better scheme:
- Pick some small constant $t$ and project points into $\mathbb{R}^t$
- Use a lattice-based space partitioning scheme to "bucket", instead of intervals

[Figures from Piotr Indyk's slides illustrating the lattice-based partitioning.]

Leads to $\rho \simeq 1/c^2 + O(\log t / \sqrt{t})$, and hence $\rho$ tends to $1/c^2$ for large $t$ and fixed $c$.

A lower bound for LSH in $\ell_2$ says $\rho \ge 1/c^2$.
Data-dependent LSH Scheme

LSH is data oblivious: the hash families are chosen before seeing the data. Can one do better by choosing hash functions based on the given set of points?

Yes. [Andoni-Indyk-Nguyen-Razenshteyn'14, Andoni-Razenshteyn'15]
- $\rho = 1/(2c^2 - 1)$ for $\ell_2$, improving upon $1/c^2$ for data-oblivious LSH (which is tight in the worst case)
- $\rho = 1/(2c - 1)$ for $\ell_1$/Hamming cube, improving upon $1/c$ for data-oblivious LSH
LSH Summary

- A modular hashing-based scheme for similarity estimation
- Main competitors are space partitioning data structures such as variants of k-d trees
- Provides speedups but uses more memory
- Does not appear to be a clear winner
Digression: $p$-stable distributions

For $F_2$ estimation, JL, and LSH we used an important "stability" property of the normal distribution.

Lemma. Let $Y_1, Y_2, \ldots, Y_d$ be independent random variables with distribution $N(0, 1)$. Then $Z = \sum_i x_i Y_i$ has distribution $\|x\|_2\, N(0, 1)$.

The standard Gaussian is 2-stable.

Definition. A distribution $\mathcal{D}$ is $p$-stable if $Z = \sum_i x_i Y_i$ has distribution $\|x\|_p\, \mathcal{D}$ when the $Y_i$ are independent and each of them is distributed as $\mathcal{D}$.
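A quick empirical check of 2-stability with numpy (the vector $x$ here is an arbitrary example of our own choosing):

```python
import numpy as np

# sum_i x_i Y_i should be distributed as ||x||_2 * N(0,1).
rng = np.random.default_rng(0)
x = np.array([3.0, 4.0])                        # ||x||_2 = 5
Z = rng.standard_normal((100_000, 2)) @ x       # samples of sum_i x_i Y_i
print(np.std(Z))   # ~ 5 = ||x||_2, consistent with ||x||_2 N(0,1)
```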