Let $k \in O(\epsilon^{-2} \cdot \log n)$ and $A \in \mathbb{R}^{k \times d}$, where each entry of $A$ is chosen i.i.d. from $\mathcal{N}(0, 1/k)$. Then for any unit vector $u \in \mathbb{R}^d$:
$\Pr\left[\, \left| \|Au\|_2^2 - 1 \right| > \epsilon \,\right] < 1/n^3$
• Each $A_{ij}$ is chosen i.i.d. from $\mathcal{N}(0, 1/k)$.
• A linear combination of independent Gaussian r.v.s is also Gaussian: if $X \sim \mathcal{N}(\mu_X, \sigma_X^2)$ and $Y \sim \mathcal{N}(\mu_Y, \sigma_Y^2)$ are independent, then $aX + bY \sim \mathcal{N}(a\mu_X + b\mu_Y,\; a^2\sigma_X^2 + b^2\sigma_Y^2)$.
• $u$ is a unit vector, so each $(Au)_i = \sum_{j=1}^{d} A_{ij} u_j \sim \mathcal{N}(0, 1/k)$. Moreover, these $(Au)_i$ are mutually independent!
Hence $\mathbb{E}\left[\|Au\|_2^2\right] = \sum_{i=1}^{k} \mathbb{E}\left[(Au)_i^2\right] = k \cdot \frac{1}{k} = 1$. In terms of expectation we are fine, but how fast do we deviate from expectation?
Chernoff bound for the $\chi^2$-distribution: for i.i.d. $X_1, X_2, \dots, X_k \sim \mathcal{N}(0,1)$ and $0 < \epsilon < 1$,
$\Pr\left[\, \left| \frac{1}{k} \sum_{i=1}^{k} X_i^2 - 1 \right| > \epsilon \,\right] < 2e^{-k\epsilon^2/8}$
Notice that $X_i = \sqrt{k} \cdot (Au)_i \sim \mathcal{N}(0,1)$, so $\|Au\|_2^2 = \frac{1}{k} \sum_{i=1}^{k} X_i^2$, and the right-hand side is at most $1/n^3$ for suitable $k \in O(\epsilon^{-2} \cdot \log n)$.
Chernoff bound for the $\chi^2$-distribution: for i.i.d. $X_1, X_2, \dots, X_k \sim \mathcal{N}(0,1)$ and $0 < \epsilon < 1$,
$\Pr\left[\, \left| \frac{1}{k} \sum_{i=1}^{k} X_i^2 - 1 \right| > \epsilon \,\right] < 2e^{-k\epsilon^2/8}$
Proof ingredients:
• If $X \sim \mathcal{N}(0,1)$ and $\lambda < 1/2$, then $\mathbb{E}\left[e^{\lambda X^2}\right] = \frac{1}{\sqrt{1 - 2\lambda}}$.
• The moment generating function is bounded by a simple exponential when $\lambda \le 1/4$.
• Let $\lambda = \epsilon/4$, which satisfies $\lambda \le 1/4$ because $\epsilon < 1$.
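The slide lists only the moment generating function and the choice $\lambda = \epsilon/4$. One standard way to connect them is sketched below; the intermediate inequality $-\ln(1-x) \le x + x^2$ for $0 \le x \le 1/2$ is an assumption of this sketch, not something stated in the source. The lower tail is handled symmetrically with $\mathbb{E}[e^{-\lambda X^2}] = (1+2\lambda)^{-1/2}$, and the two tails together give the factor 2.

```latex
\begin{align*}
\Pr\Big[\sum_{i=1}^{k} X_i^2 \ge (1+\epsilon)k\Big]
  &\le e^{-\lambda(1+\epsilon)k} \prod_{i=1}^{k} \mathbb{E}\big[e^{\lambda X_i^2}\big]
     && \text{(Markov's inequality, independence)} \\
  &= e^{-\lambda(1+\epsilon)k}\,(1-2\lambda)^{-k/2}
     && \text{(MGF, } \lambda < 1/2\text{)} \\
  &\le e^{-\lambda(1+\epsilon)k}\, e^{k(\lambda + 2\lambda^2)}
     && \text{(}-\ln(1-2\lambda) \le 2\lambda + 4\lambda^2 \text{ for } \lambda \le 1/4\text{)} \\
  &= e^{-\lambda\epsilon k + 2\lambda^2 k} \\
  &= e^{-k\epsilon^2/8}
     && \text{(}\lambda = \epsilon/4\text{)}
\end{align*}
```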
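Theorem (Johnson-Lindenstrauss 1984): $\forall\, 0 < \epsilon < 1$, for any set $S$ of $n$ points from $\mathbb{R}^d$, there is a $\phi: \mathbb{R}^d \to \mathbb{R}^k$ with $k \in O(\epsilon^{-2} \cdot \log n)$, such that $\forall x_i, x_j \in S$:
$(1 - \epsilon)\, \|x_i - x_j\|_2^2 \;\le\; \|\phi(x_i) - \phi(x_j)\|_2^2 \;\le\; (1 + \epsilon)\, \|x_i - x_j\|_2^2$
"The JLT states that in Euclidean space, it is always possible to embed a set of $n$ points in arbitrary dimension into $O(\log n)$ dimensions with constant distortion."
"Even better, it is very easy to find such a $\phi$: just sample a random $k \times d$ matrix $A$."
Why sampling works: for a matrix $A$ with i.i.d. $\mathcal{N}(0, 1/k)$ entries, each unit vector $u = (x_i - x_j)/\|x_i - x_j\|_2$ satisfies $\Pr\left[\, \left| \|Au\|_2^2 - 1 \right| > \epsilon \,\right] < 1/n^3$, so a union bound over the fewer than $n^2$ pairs shows that $\phi(x) = Ax$ satisfies the theorem with probability greater than $1 - 1/n$.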
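A minimal sketch of "just sample a random $k \times d$ matrix $A$", using only the Python standard library. The function names (`sample_gaussian_matrix`, `project`, `max_distortion`) are hypothetical, and the sketch checks the distortion empirically instead of choosing $k$ from the $O(\epsilon^{-2} \log n)$ bound, whose constant the slides do not give.

```python
import math
import random

def sample_gaussian_matrix(k, d, rng):
    """Return a k x d matrix whose entries are i.i.d. N(0, 1/k)."""
    sigma = 1.0 / math.sqrt(k)  # standard deviation, so the variance is 1/k
    return [[rng.gauss(0.0, sigma) for _ in range(d)] for _ in range(k)]

def project(A, x):
    """Compute the matrix-vector product A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def sq_dist(x, y):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def max_distortion(points, A):
    """Largest | ||Ax - Ay||^2 / ||x - y||^2 - 1 | over all pairs of distinct points."""
    images = [project(A, x) for x in points]
    worst = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            ratio = sq_dist(images[i], images[j]) / sq_dist(points[i], points[j])
            worst = max(worst, abs(ratio - 1.0))
    return worst
```

Calling `max_distortion(points, sample_gaussian_matrix(k, d, rng))` for growing `k` gives a feel for how the worst-case distortion shrinks roughly like $1/\sqrt{k}$.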
Nearest Neighbor Search (NNS)
Setup: metric space $(X, \mathrm{dist})$: a set $X$ and a distance function $\mathrm{dist}$ satisfying the triangle inequality
Data: $n$ points $y_1, y_2, \dots, y_n \in X$
Query: given a point $\vec{x} \in X$, find the $y_i$ which is closest to $\vec{x}$
Applications can be found in:
• database systems
• pattern recognition
• machine learning
• bioinformatics
• …
Data: $n$ points $y_1, y_2, \dots, y_n \in X^d$ for some finite $X$
Query: given a point $\vec{x} \in X^d$, find the $y_i$ which is closest to $\vec{x}$
Goal: efficiently answer the query
Which efficiency measures do we care about?
• Usually space and time
Trivial solution:
• No preprocessing, just linear search
When dimension $d$ is small:
• Binary search when $d = 1$
• $k$-d tree
• Voronoi diagram
• …
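The two simplest options in the list above can be sketched as follows. The helper names `linear_search_nn` and `nearest_1d` are hypothetical, and the $d = 1$ case assumes the data has been sorted once during preprocessing.

```python
import bisect
import math

def linear_search_nn(data, x, dist):
    """Trivial solution: no preprocessing, scan all n points."""
    return min(data, key=lambda y: dist(x, y))

def nearest_1d(sorted_data, x):
    """d = 1: binary search in a sorted, non-empty list, O(log n) per query."""
    pos = bisect.bisect_left(sorted_data, x)          # first index with value >= x
    candidates = sorted_data[max(0, pos - 1): pos + 1]  # neighbours on both sides of x
    return min(candidates, key=lambda y: abs(y - x))
```

For example, `linear_search_nn(points, query, math.dist)` answers a Euclidean query in $O(nd)$ time with no extra space.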
What if dimension $d$ is large, say $d \gg \log n$?
Curse of dimensionality: it is conjectured that solving NNS in high dimension requires either super-polynomial($n$) space or super-polynomial($d$) time.
Blessing: Randomization + Approximation
Approximate Near(est) Neighbor (ANN)
Setup: metric space $(X, \mathrm{dist})$
Data: $n$ points $y_1, y_2, \dots, y_n \in X$
Query: given a point $\vec{x} \in X$,
$c$-ANN (Approximate Nearest Neighbor): return a $y_{i^*}$ s.t. $\mathrm{dist}(\vec{x}, y_{i^*}) \le c \cdot \min_{1 \le i \le n} \mathrm{dist}(\vec{x}, y_i)$
$(c, r)$-ANN (Approximate Near Neighbor):
• Return a $y_i$ s.t. $\mathrm{dist}(\vec{x}, y_i) \le cr$ if $\exists y_j$: $\mathrm{dist}(\vec{x}, y_j) \le r$
• Return "no" if $\forall y_i$: $\mathrm{dist}(\vec{x}, y_i) > cr$
• Arbitrary answer otherwise
If we can solve $(c, r)$-ANN, then we can solve $c$-ANN with little overhead.
Let $D_{\min} = \min_{1 \le i < j \le n} \mathrm{dist}(y_i, y_j)$ and $D_{\max} = \max_{1 \le i < j \le n} \mathrm{dist}(y_i, y_j)$, and consider the radii
$R = \left\{ \frac{D_{\min}}{2} \sqrt{c}^{\,-1},\; \frac{D_{\min}}{2} \sqrt{c}^{\,0},\; \frac{D_{\min}}{2} \sqrt{c}^{\,1},\; \dots,\; D_{\max} \right\}$
Let $r^*$ be the min in $R$ s.t. $(\sqrt{c}, r^*)$-ANN returns yes with $y^*$. Then
$\mathrm{dist}(\vec{x}, y^*) \le \sqrt{c}\, r^*$ and $\forall y_i$: $\mathrm{dist}(\vec{x}, y_i) > r^* / \sqrt{c}$,
where the second bound holds because $(\sqrt{c}, r^*/\sqrt{c})$-ANN answered "no". Hence $\mathrm{dist}(\vec{x}, y^*) < c \cdot \mathrm{dist}(\vec{x}, y_i)$ for every $y_i$.
If $\forall r$: $(\sqrt{c}, r)$-ANN can be solved with space $s$ and query time $t$, then $c$-ANN can be solved with space $O\!\left(s \cdot \log_c \frac{D_{\max}}{D_{\min}}\right)$ and query time $O\!\left(t \cdot \log_2 \log_c \frac{D_{\max}}{D_{\min}}\right)$.
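The reduction can be sketched as a binary search over a geometric ladder of radii. Everything named here is hypothetical: `brute_force_near_neighbor` is a stand-in oracle (a real $(\sqrt{c}, r)$-ANN structure would replace it), and `radii_ladder` assumes radii of the form $\frac{D_{\min}}{2}\sqrt{c}^{\,i}$. The search only needs an adjacent pair of radii answering ("no", "yes"), so it does not require the oracle's answers to be monotone.

```python
import math

def brute_force_near_neighbor(data, x, c, r, dist):
    """Stand-in (c, r)-ANN oracle: any point within c*r, else None for "no"."""
    for y in data:
        if dist(x, y) <= c * r:
            return y
    return None

def radii_ladder(d_min, d_max, c):
    """Radii (d_min / 2) * sqrt(c)^i for i = -1, 0, 1, ... until d_max is covered."""
    step = math.sqrt(c)
    radii = [d_min / (2 * step)]
    while radii[-1] < d_max:
        radii.append(radii[-1] * step)
    return radii

def c_ann(data, x, c, radii, dist, oracle=brute_force_near_neighbor):
    """Find adjacent radii answering ("no", "yes") and return the "yes" witness."""
    root_c = math.sqrt(c)
    answer = oracle(data, x, root_c, radii[0], dist)
    if answer is not None:
        return answer                     # "yes" already at the smallest radius
    hi = len(radii) - 1
    hi_answer = oracle(data, x, root_c, radii[hi], dist)
    if hi_answer is None:
        return None                       # query is farther than every radius tried
    lo = 0                                # invariant: radii[lo] -> "no", radii[hi] -> "yes"
    while hi - lo > 1:
        mid = (lo + hi) // 2
        answer = oracle(data, x, root_c, radii[mid], dist)
        if answer is None:
            lo = mid
        else:
            hi, hi_answer = mid, answer
    return hi_answer
```

The loop makes $O(\log |R|)$ oracle calls, which matches the $\log_2 \log_c$ factor in the query time; keeping one data structure per radius accounts for the $\log_c$ factor in space.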
Setup: consider Hamming space $\{0,1\}^d$
Data: $n$ points $y_1, y_2, \dots, y_n \in \{0,1\}^d$
Query: given a point $\vec{x} \in \{0,1\}^d$,
$(c, r)$-ANN (Approximate Near Neighbor):
• Return a $y_i$ s.t. $\mathrm{dist}(\vec{x}, y_i) \le cr$ if $\exists y_j$: $\mathrm{dist}(\vec{x}, y_j) \le r$
• Return "no" if $\forall y_i$: $\mathrm{dist}(\vec{x}, y_i) > cr$
• Arbitrary answer otherwise
Let $k$, $p$ and $s$ be fixed later.
Sample a $k \times d$ Boolean matrix $A$ with i.i.d. entries from Bernoulli($p$).
For $i = 1, 2, \dots, n$: let $z_i = A y_i \in \{0,1\}^k$ on the finite field GF(2).
GF(2): two elements $\{0,1\}$, XOR as sum, AND as multiplication. Therefore, $(z_i)_j = (A y_i)_j = \sum_{l=1}^{d} A_{jl} \cdot (y_i)_l \bmod 2$.
Store all $s$-balls $B_s(u) = \{\, y_i : \mathrm{dist}(u, z_i) \le s \,\}$ for all $u \in \{0,1\}^k$.
Now, upon a query $\vec{x} \in \{0,1\}^d$: retrieve $B_s(A\vec{x})$. If $B_s(A\vec{x}) = \emptyset$ return "no", else return any $y_i \in B_s(A\vec{x})$.
Space: $O(n \cdot 2^k)$
Query time: $O(kd)$ computation + $O(1)$ memory access
For suitable $k \in O(\log n)$, $p$ and $s$; $\forall \vec{x}, \vec{y} \in \{0,1\}^d$:
$\mathrm{dist}(\vec{x}, \vec{y}) \le r \;\Rightarrow\; \Pr\left[\mathrm{dist}(A\vec{x}, A\vec{y}) > s\right] \le 1/n^2$
$\mathrm{dist}(\vec{x}, \vec{y}) > cr \;\Rightarrow\; \Pr\left[\mathrm{dist}(A\vec{x}, A\vec{y}) \le s\right] \le 1/n^2$
By a union bound over the $n$ data points, $(c, r)$-ANN is solved w.h.p.
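A minimal sketch of the data structure described above, assuming points are tuples of 0/1 integers. The class name `HammingANN` and the helpers are hypothetical; the table enumerates all $2^k$ sketches explicitly, so it is only meant for small $k$.

```python
import itertools
import random

def hamming(u, v):
    """Hamming distance between two equal-length 0/1 tuples."""
    return sum(a != b for a, b in zip(u, v))

def gf2_matvec(A, x):
    """A x over GF(2): AND as multiplication, XOR (sum mod 2) as addition."""
    return tuple(sum(a & b for a, b in zip(row, x)) % 2 for row in A)

class HammingANN:
    def __init__(self, data, k, p, s, rng):
        d = len(data[0])
        # k x d Boolean matrix with i.i.d. Bernoulli(p) entries.
        self.A = [[1 if rng.random() < p else 0 for _ in range(d)] for _ in range(k)]
        sketches = [gf2_matvec(self.A, y) for y in data]
        # Precompute the s-ball around every u in {0,1}^k (2^k table entries).
        self.balls = {}
        for u in itertools.product((0, 1), repeat=k):
            self.balls[u] = [y for y, z in zip(data, sketches) if hamming(u, z) <= s]

    def query(self, x):
        """Return any point in B_s(Ax), or None to signal "no"."""
        ball = self.balls[gf2_matvec(self.A, x)]
        return ball[0] if ball else None
```

A query costs one $O(kd)$ matrix-vector product plus one table lookup, which is the split stated on the slide.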
Analysis. Random $k \times d$ Boolean matrix $A$ with i.i.d. entries $\in$ Bernoulli($p$); computation on GF(2): $(A\vec{x})_i = \sum_{j=1}^{d} A_{ij} \cdot \vec{x}_j \bmod 2$.
Goal: for suitable $k \in O(\log n)$, $p$ and $s$; $\forall \vec{x}, \vec{y} \in \{0,1\}^d$: $\mathrm{dist}(A\vec{x}, A\vec{y}) \le s$ w.h.p. when $\mathrm{dist}(\vec{x}, \vec{y}) \le r$, and $\mathrm{dist}(A\vec{x}, A\vec{y}) > s$ w.h.p. when $\mathrm{dist}(\vec{x}, \vec{y}) > cr$.
Each row vector $A_i$ of $A$ has i.i.d. entries $\in$ Bernoulli($p$). An alternative view regarding the generation of $A_i$:
• build $S \subseteq [d]$ s.t. each element in $[d]$ is chosen independently with pr. $2p$
• each coordinate in $S$ is independently set to 0 or 1, each with pr. 1/2 (coordinates outside $S$ are 0)
Observations:
• if $j \notin S$ for all coordinates $j$ where $\vec{x}_j \ne \vec{y}_j$, then $(A\vec{x})_i = (A\vec{y})_i$
• otherwise, if there exists such a $j \in S$, then once all other entries in $A_i$ are fixed, exactly one of the two choices for $A_{ij}$ will make $(A\vec{x})_i = (A\vec{y})_i$
Therefore $\Pr\left[(A\vec{x})_i \ne (A\vec{y})_i\right] = \frac{1}{2}\left(1 - (1 - 2p)^{\mathrm{dist}(\vec{x}, \vec{y})}\right)$.
Choose $p$ to satisfy $1 - 2p = 2^{-1/r}$. Then
$\mathrm{dist}(\vec{x}, \vec{y}) \le r \;\Rightarrow\; \Pr\left[(A\vec{x})_i \ne (A\vec{y})_i\right] \le \frac{1}{4}$
$\mathrm{dist}(\vec{x}, \vec{y}) > cr \;\Rightarrow\; \Pr\left[(A\vec{x})_i \ne (A\vec{y})_i\right] > \frac{1}{2} - 2^{-(c+1)}$
$\mathrm{dist}(A\vec{x}, A\vec{y}) = Z = \sum_{i=1}^{k} Z_i$ where $Z_i = 1$ if $(A\vec{x})_i \ne (A\vec{y})_i$ and $Z_i = 0$ otherwise; the $Z_i$ are independent.
Choose $s = \frac{\frac{1}{4} + \frac{1}{2} - 2^{-(c+1)}}{2}\, k = \left(\frac{3}{8} - 2^{-(c+2)}\right) k$, the midpoint between the two expectations.
Chernoff bound: let $X_1, X_2, \dots, X_n \in \{0,1\}$ be independent r.v.s and let $X = \sum_{i=1}^{n} X_i$; then for $t > 0$:
$\Pr\left[X \ge \mathbb{E}[X] + t\right] \le \exp\left(-\frac{2t^2}{n}\right)$
$\Pr\left[X \le \mathbb{E}[X] - t\right] \le \exp\left(-\frac{2t^2}{n}\right)$
Applying it to $Z$ with $t = \left(\frac{1}{8} - 2^{-(c+2)}\right) k$ bounds both bad events by $\exp\left(-2\left(\frac{1}{8} - 2^{-(c+2)}\right)^2 k\right)$.
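The parameter choices in the analysis can be checked numerically. This is a sketch under the assumption that $k$ is rounded up to an integer; `flip_probability` and `choose_parameters` are hypothetical names.

```python
import math

def flip_probability(p, distance):
    """Pr[(Ax)_i != (Ay)_i] for one Bernoulli(p) row when dist(x, y) = distance."""
    return 0.5 * (1.0 - (1.0 - 2.0 * p) ** distance)

def choose_parameters(n, c, r):
    """Return (k, p, s) following the choices made in the analysis."""
    gap = 1.0 / 8.0 - 2.0 ** (-(c + 2))      # positive whenever c > 1
    p = (1.0 - 2.0 ** (-1.0 / r)) / 2.0      # so that 1 - 2p = 2^(-1/r)
    k = math.ceil(math.log(n) / gap ** 2)    # makes exp(-2 * gap^2 * k) <= 1/n^2
    s = (3.0 / 8.0 - 2.0 ** (-(c + 2))) * k  # midpoint threshold
    return k, p, s
```

With these values, `flip_probability(p, r)` equals 1/4 and `flip_probability(p, c * r)` equals $\frac{1}{2} - 2^{-(c+1)}$, the two per-row probabilities that the threshold $s/k$ separates.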
Putting it together for $(c, r)$-ANN in Hamming space $\{0,1\}^d$:
Let $k = \frac{\ln n}{\left(\frac{1}{8} - 2^{-(c+2)}\right)^2}$, $p = \frac{1 - 2^{-1/r}}{2}$ and $s = \left(\frac{3}{8} - 2^{-(c+2)}\right) k$.
Sample a $k \times d$ Boolean matrix $A$ with i.i.d. entries from Bernoulli($p$).
For $i = 1, 2, \dots, n$: let $z_i = A y_i \in \{0,1\}^k$ on the finite field GF(2).
Store all $s$-balls $B_s(u) = \{\, y_i : \mathrm{dist}(u, z_i) \le s \,\}$ for all $u \in \{0,1\}^k$.
Now, upon a query $\vec{x} \in \{0,1\}^d$: retrieve $B_s(A\vec{x})$. If $B_s(A\vec{x}) = \emptyset$ return "no", else return any $y_i \in B_s(A\vec{x})$.
Space: $O(n \cdot 2^k) = n^{O(1)}$
Query time: $O(kd)$ computation + $O(1)$ memory access $= O(d \log n)$
Solves $(c, r)$-ANN w.h.p.
Locality-Sensitive Hashing (LSH)
Given a metric space $(X, \mathrm{dist})$, a random $h: X \to U$ drawn from $\mathcal{H}$ is an $(r, cr, p, q)$-LSH if, for all $\vec{x}, \vec{y} \in X$:
$\mathrm{dist}(\vec{x}, \vec{y}) \le r \;\Rightarrow\; \Pr\left[h(\vec{x}) = h(\vec{y})\right] \ge p$
$\mathrm{dist}(\vec{x}, \vec{y}) > cr \;\Rightarrow\; \Pr\left[h(\vec{x}) = h(\vec{y})\right] \le q$
where $p > q$.
If there exists an $(r, cr, p, q)$-LSH $h: X \to U$, then there exists an $(r, cr, p^k, q^k)$-LSH $g: X \to U^k$:
independently draw $h_1, h_2, \dots, h_k$ according to the distribution of $h$, and let $g(x) = \left(h_1(x), h_2(x), \dots, h_k(x)\right) \in U^k$.
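The amplification step can be sketched in a few lines. The base family used here, bit sampling on $\{0,1\}^d$ (with $h(x) = x_j$ for a uniformly random coordinate $j$), is an assumption for illustration and is not named in the slides; it collides with probability $1 - \mathrm{dist}(x, y)/d$. The function names are hypothetical.

```python
import random

def draw_bit_sampling_hash(d, rng):
    """Draw one h(x) = x[j] for a uniformly random coordinate j (assumed base family)."""
    j = rng.randrange(d)
    return lambda x: x[j]

def draw_amplified_hash(d, k, rng):
    """Draw g(x) = (h_1(x), ..., h_k(x)) with h_1, ..., h_k drawn independently."""
    hs = [draw_bit_sampling_hash(d, rng) for _ in range(k)]
    return lambda x: tuple(h(x) for h in hs)
```

Because the $k$ coordinates are drawn independently, two points collide under `g` only if they collide under every `h_i`, which is where the $p^k$ and $q^k$ come from.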
$(c, r)$-ANN
Setup: metric space $(X, \mathrm{dist})$
Data: $n$ points $y_1, y_2, \dots, y_n \in X$
Query: given a point $\vec{x} \in X$:
• Return a $y_i$ s.t. $\mathrm{dist}(\vec{x}, y_i) \le cr$ if $\exists y_j$: $\mathrm{dist}(\vec{x}, y_j) \le r$
• Return "no" if $\forall y_i$: $\mathrm{dist}(\vec{x}, y_i) > cr$
• Arbitrary answer otherwise
Suppose we have an $(r, cr, p^*, 1/n)$-LSH $g: X \to U$, i.e. $\forall \vec{x}, \vec{y} \in X$:
$\mathrm{dist}(\vec{x}, \vec{y}) \le r \;\Rightarrow\; \Pr\left[g(\vec{x}) = g(\vec{y})\right] \ge p^*$
$\mathrm{dist}(\vec{x}, \vec{y}) > cr \;\Rightarrow\; \Pr\left[g(\vec{x}) = g(\vec{y})\right] \le 1/n$
Store $y_1, y_2, \dots, y_n$ in nondecreasing order of $g(y_i)$.
Upon query $\vec{x} \in X$: find all $y_i$ such that $g(\vec{x}) = g(y_i)$ by binary search. If we encounter some $y_i$ such that $\mathrm{dist}(\vec{x}, y_i) \le cr$ then return this $y_i$; otherwise return "no".
If the real answer is "no": always correct.
If the real answer is "yes": correct with probability at least $p^*$.
Space: $O(n)$
Time: $O(\log n)$ + $O(1)$ in expectation, since the expected number of points farther than $cr$ that collide with $\vec{x}$ is at most $n \cdot \frac{1}{n} = 1$.
Boosting the success probability: $(c, r)$-ANN in metric space $(X, \mathrm{dist})$ with an $(r, cr, p^*, 1/n)$-LSH $g: X \to U$
Data: $n$ points $y_1, y_2, \dots, y_n \in X$
Query: some point $\vec{x} \in X$
Let $\ell = 1/p^*$, independently draw $g_1, g_2, \dots, g_\ell$.
Maintain $\ell$ sorted tables: for $j = 1, 2, \dots, \ell$: store $y_1, y_2, \dots, y_n$ in table-$j$ in nondecreasing order of $g_j(y_i)$.
Upon query $\vec{x} \in X$: find the first $10 \cdot \ell$ such $y_i$ that $\exists j$: $g_j(\vec{x}) = g_j(y_i)$ by binary search. If we encounter some $y_i$ such that $\mathrm{dist}(\vec{x}, y_i) \le cr$ then return this $y_i$; otherwise return "no".
If the real answer is "no": always correct.
If there exists $y_s$ such that $\mathrm{dist}(\vec{x}, y_s) \le r$, then $\Pr[\text{answer "no"}] \le (1 - p^*)^{\ell} + \frac{1}{10} \le \frac{1}{e} + \frac{1}{10} < \frac{1}{2}$: either $y_s$ collides with $\vec{x}$ in none of the $\ell$ tables, or at least $10 \cdot \ell$ points farther than $cr$ collide with $\vec{x}$, whose expected number is at most $\ell \cdot n \cdot \frac{1}{n} = \ell$ (Markov's inequality).
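A minimal sketch of the multi-table scheme, assuming the hash functions $g_1, \dots, g_\ell$ are supplied as Python callables whose values can be sorted. The class name `LSHTables` and the `hamming` helper are hypothetical; the budget of $10 \cdot \ell$ inspected candidates follows the description above.

```python
import bisect

def hamming(u, v):
    """Hamming distance, used here only as an example metric."""
    return sum(a != b for a, b in zip(u, v))

class LSHTables:
    def __init__(self, data, hashes):
        """hashes: independently drawn g_1, ..., g_l, one sorted table per function."""
        self.data = data
        self.hashes = hashes
        self.tables = []
        for g in hashes:
            # Sort point indices by hash value; indices break ties between equal keys.
            entries = sorted((g(y), i) for i, y in enumerate(data))
            keys = [key for key, _ in entries]
            self.tables.append((keys, entries))

    def query(self, x, c, r, dist):
        """Inspect at most 10*l colliding points; return one within c*r, else None."""
        budget = 10 * len(self.hashes)
        for g, (keys, entries) in zip(self.hashes, self.tables):
            key = g(x)
            lo = bisect.bisect_left(keys, key)    # binary search for the run of equal keys
            hi = bisect.bisect_right(keys, key)
            for _, i in entries[lo:hi]:
                if budget == 0:
                    return None
                budget -= 1
                if dist(x, self.data[i]) <= c * r:
                    return self.data[i]
        return None
```

Since a point is returned only after its distance has been checked against $cr$, a "no" instance can never produce a wrong positive, which mirrors the "always correct" case above.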