Dimension Reduction and Nearest Neighbor Search

Advanced Algorithms, Nanjing University, Fall 2018

Dimension reduction: why do we care? High-dimensional data are common, yet working on them directly is expensive.


  2. Lemma. Let $k \in O(\epsilon^{-2} \cdot \log n)$ and $A \in \mathbb{R}^{k \times d}$, with each entry of $A$ chosen i.i.d. from $\mathcal{N}(0, 1/k)$. Then for any unit vector $u \in \mathbb{R}^d$: $\Pr\left[\left|\|Au\|_2^2 - 1\right| > \epsilon\right] < 1/n^3$.
     • Each $A_{ij}$ is chosen i.i.d. from $\mathcal{N}(0, 1/k)$.
     • A linear combination of independent Gaussian r.v.s is also Gaussian: $X \sim \mathcal{N}(\mu_X, \sigma_X^2)$, $Y \sim \mathcal{N}(\mu_Y, \sigma_Y^2)$ $\Rightarrow$ $aX + bY \sim \mathcal{N}(a\mu_X + b\mu_Y,\ a^2\sigma_X^2 + b^2\sigma_Y^2)$.
     • $u$ is a unit vector, so each $(Au)_i \sim \mathcal{N}(0, 1/k)$.
     Moreover, these $(Au)_i$ are mutually independent!
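A minimal sketch of the random projection in this lemma, using only the Python standard library. The names `gaussian_matrix`, `apply_matrix`, and `squared_norm`, and the sizes `d = 200`, `k = 400`, are illustrative assumptions, not part of the slides.

```python
import math
import random

def gaussian_matrix(k, d, rng):
    """k x d matrix with i.i.d. N(0, 1/k) entries (standard deviation 1/sqrt(k))."""
    sigma = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, sigma) for _ in range(d)] for _ in range(k)]

def apply_matrix(A, u):
    """Plain matrix-vector product A u."""
    return [sum(a * x for a, x in zip(row, u)) for row in A]

def squared_norm(v):
    return sum(x * x for x in v)

rng = random.Random(0)          # fixed seed so the run is repeatable
d, k = 200, 400                 # illustrative sizes only
raw = [rng.gauss(0.0, 1.0) for _ in range(d)]
scale = math.sqrt(squared_norm(raw))
u = [x / scale for x in raw]    # a unit vector in R^d

A = gaussian_matrix(k, d, rng)
projected_sq_norm = squared_norm(apply_matrix(A, u))
print(projected_sq_norm)        # should be close to 1
```

Each coordinate of `apply_matrix(A, u)` is a sum of independent Gaussians weighted by the entries of `u`, which is why its variance is $\|u\|_2^2 / k = 1/k$.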

  6. For $A \in \mathbb{R}^{k \times d}$ with i.i.d. $\mathcal{N}(0, 1/k)$ entries and a unit vector $u \in \mathbb{R}^d$, each $(Au)_i \sim \mathcal{N}(0, 1/k)$, so $\mathbb{E}\left[\|Au\|_2^2\right] = \sum_{i=1}^{k} \mathbb{E}\left[(Au)_i^2\right] = k \cdot \frac{1}{k} = 1$. In terms of expectation we are fine, but how fast do we deviate from the expectation?

  11. Chernoff bound for the $\chi^2$-distribution: for i.i.d. $X_1, X_2, \dots, X_k \sim \mathcal{N}(0,1)$ and $0 < \epsilon < 1$, $\Pr\left[\left|\frac{1}{k}\sum_{i=1}^{k} X_i^2 - 1\right| > \epsilon\right] < 2e^{-k\epsilon^2/8}$.
     Notice that $X_i = \sqrt{k} \cdot Y_i$ with $Y_i = (Au)_i \sim \mathcal{N}(0, 1/k)$, so $\|Au\|_2^2 = \frac{1}{k}\sum_{i=1}^{k} X_i^2$.

  13. For $A \in \mathbb{R}^{k \times d}$ with i.i.d. $\mathcal{N}(0, 1/k)$ entries and any unit vector $u \in \mathbb{R}^d$, the Chernoff bound for the $\chi^2$-distribution gives $\Pr\left[\left|\|Au\|_2^2 - 1\right| > \epsilon\right] < 2e^{-k\epsilon^2/8} \le 1/n^3$ for a suitable $k \in O(\epsilon^{-2} \cdot \log n)$.
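To make "suitable $k$" concrete: solving $2e^{-k\epsilon^2/8} \le 1/n^3$ for $k$ gives $k \ge 8(\ln 2 + 3\ln n)/\epsilon^2$. The sketch below just evaluates that arithmetic; the function names `jl_dimension` and `failure_bound` are hypothetical helpers introduced here.

```python
import math

def failure_bound(k, eps):
    """Right-hand side of the chi-square Chernoff bound: 2 * exp(-k * eps^2 / 8)."""
    return 2.0 * math.exp(-k * eps ** 2 / 8.0)

def jl_dimension(n, eps):
    """Smallest integer k with 2 * exp(-k * eps^2 / 8) <= 1 / n^3."""
    return math.ceil(8.0 * (math.log(2.0) + 3.0 * math.log(n)) / eps ** 2)

k = jl_dimension(1000, 0.5)
print(k, failure_bound(k, 0.5))
```

The result grows like $\epsilon^{-2} \log n$ and does not depend on the original dimension $d$, which is the point of the lemma.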

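  24. Chernoff bound for the $\chi^2$-distribution: for i.i.d. $X_1, X_2, \dots, X_k \sim \mathcal{N}(0,1)$ and $0 < \epsilon < 1$, $\Pr\left[\left|\frac{1}{k}\sum_{i=1}^{k} X_i^2 - 1\right| > \epsilon\right] < 2e^{-k\epsilon^2/8}$.
     Proof ingredients: if $X \sim \mathcal{N}(0,1)$ and $\lambda < 1/2$, then $\mathbb{E}\left[e^{\lambda X^2}\right] = \frac{1}{\sqrt{1 - 2\lambda}}$. The resulting estimate is applied when $\lambda \le 1/4$; let $\lambda = \epsilon/4$.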

  25. Theorem (Johnson-Lindenstrauss 1984): $\forall\, 0 < \epsilon < 1$, for any set $S$ of $n$ points from $\mathbb{R}^d$, there is a $\phi: \mathbb{R}^d \to \mathbb{R}^k$ with $k \in O(\epsilon^{-2} \cdot \log n)$, such that $\forall x_i, x_j \in S$: $(1 - \epsilon)\|x_i - x_j\|_2^2 \le \|\phi(x_i) - \phi(x_j)\|_2^2 \le (1 + \epsilon)\|x_i - x_j\|_2^2$.
     "JLT states that in Euclidean space it is always possible to embed a set of $n$ points in arbitrary dimension into $O(\log n)$ dimensions with constant distortion."
     "Even better, it is very easy to find such a $\phi$: just sample a random $k \times d$ matrix $A$."
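The theorem can be tried out on a small point set. This is a sketch under stated assumptions: the point set is random, the sizes (15 points, $d = 100$, $k = 400$) are arbitrary, and `jl_embed`, `sq_dist`, and `max_distortion` are names made up for this illustration.

```python
import itertools
import math
import random

def jl_embed(points, k, rng):
    """Map each point x to A x, where A is k x d with i.i.d. N(0, 1/k) entries."""
    d = len(points[0])
    sigma = 1.0 / math.sqrt(k)
    A = [[rng.gauss(0.0, sigma) for _ in range(d)] for _ in range(k)]
    return [[sum(a * x for a, x in zip(row, p)) for row in A] for p in points]

def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def max_distortion(points, images):
    """Largest |ratio - 1| of embedded to original squared distance over all pairs."""
    return max(
        abs(sq_dist(images[i], images[j]) / sq_dist(points[i], points[j]) - 1.0)
        for i, j in itertools.combinations(range(len(points)), 2)
    )

rng = random.Random(1)
points = [[rng.uniform(-1.0, 1.0) for _ in range(100)] for _ in range(15)]
images = jl_embed(points, 400, rng)
worst = max_distortion(points, images)
print(worst)   # empirical epsilon achieved on this sample
```

Because the map is linear, $\phi(x_i) - \phi(x_j) = A(x_i - x_j)$, so the single-vector lemma applied to each normalized difference plus a union bound over all pairs yields the theorem.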

  27. Nearest Neighbor Search (NNS)
     Setup: metric space $(X, \mathrm{dist})$, where $X$ is a set and $\mathrm{dist}$ is a distance function satisfying the triangle inequality
     Data: $n$ points $y_1, y_2, \dots, y_n \in X$
     Query: given a point $x \in X$, find the $y_i$ which is closest to $x$

  29. Nearest Neighbor Search (NNS) finds many applications in:
     • database systems
     • pattern recognition
     • machine learning
     • bioinformatics
     • …

  31. Nearest Neighbor Search (NNS)
     Data: $n$ points $y_1, y_2, \dots, y_n \in U^d$ for some finite $U$
     Query: given a point $x \in U^d$, find the $y_i$ which is closest to $x$
     Goal: efficiently answer the query
     Which efficiency measures do we care about?
     • Usually space and time
     Trivial solution:
     • No preprocessing, just linear search
     When the dimension $d$ is small:
     • Binary search when $d = 1$
     • $k$-d tree
     • Voronoi diagram
     • …
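The two simplest options on this slide can be sketched directly: the linear scan with no preprocessing, and binary search over a sorted list when $d = 1$. The function names are illustrative, and Euclidean distance is an assumption made for the example.

```python
import bisect
import math

def linear_nns(points, query):
    """Trivial solution: no preprocessing, scan all n points (O(n * d) per query)."""
    return min(points, key=lambda y: math.dist(query, y))

def nns_1d(sorted_values, query):
    """d = 1: binary search in a sorted list, then compare the two neighbours of the gap."""
    pos = bisect.bisect_left(sorted_values, query)
    candidates = sorted_values[max(pos - 1, 0):pos + 1]
    return min(candidates, key=lambda v: abs(v - query))

print(linear_nns([(0, 0), (3, 4), (1, 1)], (0.9, 1.2)))
print(nns_1d([1, 4, 9, 16], 10))
```

The one-dimensional version needs $O(n \log n)$ preprocessing for the sort and $O(\log n)$ time per query.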

  34. Nearest Neighbor Search (NNS)
     What if the dimension $d$ is large, say $d \gg \log n$?
     Curse of dimensionality: it is conjectured that solving NNS in high dimension requires either super-polynomial($n$) space or super-polynomial($d$) time.
     Blessing: Randomization + Approximation

  35. Approximate Near(est) Neighbor (ANN)
     Setup: metric space $(X, \mathrm{dist})$
     Data: $n$ points $y_1, y_2, \dots, y_n \in X$
     Query: given a point $x \in X$,
     $c$-ANN (Approximate Nearest Neighbor): return a $y_i$ s.t. $\mathrm{dist}(x, y_i) \le c \cdot \min_{1 \le j \le n} \mathrm{dist}(x, y_j)$
     $(c, r)$-ANN (Approximate Near Neighbor):
     • Return a $y_i$ s.t. $\mathrm{dist}(x, y_i) \le cr$ if $\exists y_j: \mathrm{dist}(x, y_j) \le r$
     • Return "no" if $\forall y_j: \mathrm{dist}(x, y_j) > cr$
     • Arbitrary answer otherwise
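The three cases of $(c, r)$-ANN can be written as a small checker that decides whether a given answer is acceptable. This is only a sketch: representing "no" as `None`, and the names `is_valid_cr_ann_answer` and `hamming`, are assumptions made for the example.

```python
def hamming(a, b):
    """Hamming distance between two equal-length tuples."""
    return sum(x != y for x, y in zip(a, b))

def is_valid_cr_ann_answer(answer, points, query, c, r, dist):
    """Check an answer (a point, or None for "no") against the (c, r)-ANN definition."""
    dists = [dist(query, y) for y in points]
    if any(dv <= r for dv in dists):
        # A near point exists: the answer must be a point within distance c * r.
        return answer is not None and dist(query, answer) <= c * r
    if all(dv > c * r for dv in dists):
        # Every point is far: the only correct answer is "no".
        return answer is None
    # In-between case: any answer is allowed.
    return True

points = [(0, 0, 0, 0), (1, 1, 1, 1)]
print(is_valid_cr_ann_answer((0, 0, 0, 0), points, (0, 0, 0, 1), 2, 1, hamming))
```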

  40. If we can solve $(c, r)$-ANN, then we can solve $c$-ANN with little overhead.
     $D_{\min} = \min_{1 \le i < j \le n} \mathrm{dist}(y_i, y_j)$, $D_{\max} = \max_{1 \le i < j \le n} \mathrm{dist}(y_i, y_j)$
     $R = \left\{ \frac{D_{\min}}{2} \cdot (\sqrt{c})^{-1},\ \frac{D_{\min}}{2} \cdot (\sqrt{c})^{0},\ \frac{D_{\min}}{2} \cdot (\sqrt{c})^{1},\ \dots,\ D_{\max} \right\}$
     Let $r^*$ be the minimum $r \in R$ s.t. $(\sqrt{c}, r^*)$-ANN returns yes with $y^*$. Then
     $\mathrm{dist}(x, y^*) \le \sqrt{c} \cdot r^*$ and, for every data point $y_j$, $\mathrm{dist}(x, y_j) > r^* / \sqrt{c}$,
     so $\mathrm{dist}(x, y^*) \le c \cdot \min_j \mathrm{dist}(x, y_j)$, i.e. $y^*$ is a valid $c$-ANN answer.
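The reduction can be sketched as a scan over geometrically growing radii. Several things here are assumptions for illustration: the oracle `exact_near_neighbor` is an exact brute-force stand-in (any $(\sqrt{c}, r)$-ANN structure could take its place), the radii are scanned linearly rather than by binary search, and all function names are hypothetical.

```python
import math

def exact_near_neighbor(points, query, radius, dist):
    """Stand-in oracle: returns a point within `radius`, or None. Exact, hence valid for any c."""
    for y in points:
        if dist(query, y) <= radius:
            return y
    return None

def radii(points, c, dist):
    """The set R: (D_min / 2) * sqrt(c)^i for i = -1, 0, 1, ..., capped by D_max."""
    pair = [dist(a, b) for i, a in enumerate(points) for b in points[i + 1:]]
    d_min, d_max = min(pair), max(pair)   # assumes at least two distinct points
    step = math.sqrt(c)
    out, r = [], d_min / 2.0 / step
    while r < d_max:
        out.append(r)
        r *= step
    out.append(d_max)
    return out

def c_ann(points, query, c, dist, oracle=exact_near_neighbor):
    """Return the oracle's answer at the smallest radius in R that yields a point."""
    for r in radii(points, c, dist):
        answer = oracle(points, query, r, dist)
        if answer is not None:
            return answer
    return None   # query is farther than D_max from every point

print(c_ann([0.0, 4.0, 10.0], 3.0, 2.0, lambda a, b: abs(a - b)))
```

A real implementation would keep one data structure per radius in $R$, which is where the $\log_{\sqrt{c}}(D_{\max}/D_{\min})$ factor in the space bound comes from.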

  41. If $\forall r$: $(\sqrt{c}, r)$-ANN can be solved with space $s$ and query time $t$, then $c$-ANN can be solved with space $O\left(s \cdot \log_{\sqrt{c}} \frac{D_{\max}}{D_{\min}}\right)$ and query time $O\left(t \cdot \log_2 \log_{\sqrt{c}} \frac{D_{\max}}{D_{\min}}\right)$.

  43. Setup: consider the Hamming space $\{0,1\}^d$
     Data: $n$ points $y_1, y_2, \dots, y_n \in \{0,1\}^d$
     Query: given a point $x \in \{0,1\}^d$, solve $(c, r)$-ANN (Approximate Near Neighbor):
     • Return a $y_i$ s.t. $\mathrm{dist}(x, y_i) \le cr$ if $\exists y_j: \mathrm{dist}(x, y_j) \le r$
     • Return "no" if $\forall y_j: \mathrm{dist}(x, y_j) > cr$
     • Arbitrary answer otherwise
     Let $k$, $p$ and $s$ be fixed later.
     Sample a $k \times d$ Boolean matrix $A$ with i.i.d. entries from Bernoulli($p$).
     For $i = 1, 2, \dots, n$: let $z_i = A y_i \in \{0,1\}^k$ over the finite field GF(2).
     GF(2): two elements $\{0,1\}$, XOR as sum, AND as multiplication. Therefore $z_i(j) = (A y_i)(j) = \sum_{l=1}^{d} A_{jl} \cdot y_i(l) \bmod 2$.
     Store all $s$-balls $B_s(u) = \{ y_i : \mathrm{dist}(u, z_i) \le s \}$ for all $u \in \{0,1\}^k$.
     Now, upon a query $x \in \{0,1\}^d$: retrieve $B_s(Ax)$. If $B_s(Ax) = \emptyset$ return "no", else return any $y_i \in B_s(Ax)$.
     Space: $O(n \cdot 2^k)$
     Query time: $O(dk)$ computation + $O(1)$ memory access
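The construction above can be sketched with bit vectors stored as Python integers, so that a GF(2) inner product is the parity of a bitwise AND. The function names and the tiny sizes are assumptions for illustration; a table over all $2^k$ sketches is only feasible for small $k$.

```python
import random

def sample_matrix(k, d, p, rng):
    """k rows, each a d-bit mask whose bits are i.i.d. Bernoulli(p)."""
    return [sum((rng.random() < p) << j for j in range(d)) for _ in range(k)]

def sketch(A, x):
    """A x over GF(2): bit i of the result is the parity of (row_i AND x)."""
    return sum((bin(row & x).count("1") & 1) << i for i, row in enumerate(A))

def hamming(a, b):
    """Hamming distance between two bit masks."""
    return bin(a ^ b).count("1")

def build_balls(A, points, s):
    """Table u -> B_s(u) = [y_i : dist(u, A y_i) <= s] for every u in {0,1}^k."""
    sketches = [sketch(A, y) for y in points]
    return {
        u: [y for y, z in zip(points, sketches) if hamming(u, z) <= s]
        for u in range(1 << len(A))
    }

def query(A, balls, x):
    """Return any point of B_s(A x), or None for "no"."""
    bucket = balls[sketch(A, x)]
    return bucket[0] if bucket else None

rng = random.Random(3)
A = sample_matrix(4, 8, 0.3, rng)
data = [0b10110010, 0b00001111, 0b11110000]
balls = build_balls(A, data, 1)
print(query(A, balls, 0b10110011))
```

Since the map is linear over GF(2), `sketch(A, x ^ y) == sketch(A, x) ^ sketch(A, y)`, so the distance between two sketches depends only on the coordinates where `x` and `y` differ.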

  46. For suitable $k \in O(\log n)$, $p$ and $s$; $\forall x, y \in \{0,1\}^d$, with high probability:
     • $\mathrm{dist}(x, y) \le r \implies \mathrm{dist}(Ax, Ay) \le s$
     • $\mathrm{dist}(x, y) > cr \implies \mathrm{dist}(Ax, Ay) > s$
     Hence $(c, r)$-ANN is solved w.h.p. by the $s$-ball lookup: store $B_s(u) = \{ y_i : \mathrm{dist}(u, A y_i) \le s \}$ for all $u \in \{0,1\}^k$ and answer a query $x$ from $B_s(Ax)$.

  51. Random $k \times d$ Boolean matrix $A$ with i.i.d. entries $\in$ Bernoulli($p$); computation over GF(2): $(Ax)_i = \sum_{j=1}^{d} A_{ij} \cdot x_j \bmod 2$.
     Goal: for suitable $k \in O(\log n)$, $p$ and $s$, and $\forall x, y \in \{0,1\}^d$, relate $\mathrm{dist}(Ax, Ay)$ to $\mathrm{dist}(x, y)$.
     Each row vector $A_i$ of $A$ has i.i.d. entries $\in$ Bernoulli($p$).
     An alternative view of the generation of $A_i$:
     • build $C \subseteq [d]$ s.t. each element of $[d]$ is chosen independently with probability $2p$
     • each coordinate in $C$ is independently set to 0 or 1, each with probability 1/2 (coordinates outside $C$ are 0)
     Observations:
     • if $j \notin C$ for all coordinates $j$ where $x_j \ne y_j$, then $(Ax)_i = (Ay)_i$
     • otherwise, if there exists such a $j \in C$, then once all other entries in $A_i$ are fixed, exactly one of the two choices for $A_{ij}$ makes $(Ax)_i = (Ay)_i$
     Consequently, $\Pr[(Ax)_i \ne (Ay)_i] = \frac{1}{2}\left(1 - (1 - 2p)^{\mathrm{dist}(x, y)}\right)$.
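The two observations can be checked numerically. Row $i$ separates $x$ and $y$ exactly when an odd number of the entries $A_{ij}$ on the $h = \mathrm{dist}(x, y)$ differing coordinates equal 1, so the probability is that of an odd parity among $h$ independent Bernoulli($p$) bits. The sketch compares the closed form with a direct recurrence; both function names are hypothetical.

```python
def disagree_probability(p, h):
    """Closed form: (1 - (1 - 2p)^h) / 2 for vectors differing in h coordinates."""
    return 0.5 * (1.0 - (1.0 - 2.0 * p) ** h)

def disagree_probability_dp(p, h):
    """Same quantity from the definition: probability that h Bernoulli(p) bits have odd parity."""
    odd = 0.0
    for _ in range(h):
        # Parity is odd after one more bit if it was odd and the bit is 0,
        # or it was even and the bit is 1.
        odd = odd * (1.0 - p) + (1.0 - odd) * p
    return odd

for h in (1, 3, 10):
    print(h, disagree_probability(0.1, h), disagree_probability_dp(0.1, h))
```

The probability increases with $h$ and approaches $1/2$, which is what lets a threshold on $\mathrm{dist}(Ax, Ay)$ separate near pairs from far pairs.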

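  59. Random $k \times d$ Boolean matrix $A$ with i.i.d. entries $\in$ Bernoulli($p$); computation over GF(2): $(Ax)_i = \sum_{j=1}^{d} A_{ij} \cdot x_j \bmod 2$.
     Choose $p$ to satisfy $1 - 2p = 2^{-1/r}$. Since $\Pr[(Ax)_i \ne (Ay)_i] = \frac{1}{2}\left(1 - (1 - 2p)^{\mathrm{dist}(x, y)}\right)$, for each row $i$:
     • $\mathrm{dist}(x, y) \le r \implies \Pr[(Ax)_i \ne (Ay)_i] \le \frac{1}{2}\left(1 - 2^{-1}\right) = \frac{1}{4}$
     • $\mathrm{dist}(x, y) > cr \implies \Pr[(Ax)_i \ne (Ay)_i] > \frac{1}{2}\left(1 - 2^{-c}\right) = \frac{1}{2} - 2^{-(c+1)}$
     $\mathrm{dist}(Ax, Ay) = X = \sum_{i=1}^{k} X_i$, where $X_i = 1$ if $(Ax)_i \ne (Ay)_i$ and $X_i = 0$ otherwise; the $X_i$ are independent.
     Choose $s = \frac{\frac{1}{4} + \frac{1}{2} - 2^{-(c+1)}}{2} \cdot k = \left(\frac{3}{8} - 2^{-(c+2)}\right) k$.
     Chernoff bound: let $X_1, X_2, \dots, X_k \in \{0,1\}$ be independent r.v.s and $X = \sum_{i=1}^{k} X_i$; then for $t > 0$:
     $\Pr[X \ge \mathbb{E}[X] + t] \le \exp\left(-\frac{2t^2}{k}\right)$ and $\Pr[X \le \mathbb{E}[X] - t] \le \exp\left(-\frac{2t^2}{k}\right)$.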

  63. Setup: consider the Hamming space $\{0,1\}^d$
     Data: $n$ points $y_1, y_2, \dots, y_n \in \{0,1\}^d$
     Query: given a point $x \in \{0,1\}^d$, solve $(c, r)$-ANN (Approximate Near Neighbor):
     • Return a $y_i$ s.t. $\mathrm{dist}(x, y_i) \le cr$ if $\exists y_j: \mathrm{dist}(x, y_j) \le r$
     • Return "no" if $\forall y_j: \mathrm{dist}(x, y_j) > cr$
     • Arbitrary answer otherwise
     Let $k = \frac{\ln n}{\left(\frac{1}{8} - 2^{-(c+2)}\right)^2}$, $p = \frac{1 - 2^{-1/r}}{2}$ and $s = \left(\frac{3}{8} - 2^{-(c+2)}\right) k$.
     Sample a $k \times d$ Boolean matrix $A$ with i.i.d. entries from Bernoulli($p$).
     For $i = 1, 2, \dots, n$: let $z_i = A y_i \in \{0,1\}^k$ over the finite field GF(2).
     Store all $s$-balls $B_s(u) = \{ y_i : \mathrm{dist}(u, z_i) \le s \}$ for all $u \in \{0,1\}^k$.
     Now, upon a query $x \in \{0,1\}^d$: retrieve $B_s(Ax)$. If $B_s(Ax) = \emptyset$ return "no", else return any $y_i \in B_s(Ax)$.
     Space: $O(n \cdot 2^k) = n^{O(1)}$
     Query time: $O(dk) = O(d \log n)$ computation + $O(1)$ memory access
     This solves $(c, r)$-ANN w.h.p.
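The parameter choices can be evaluated for concrete inputs. This sketch assumes $c > 1$ (so that the gap $\frac{1}{8} - 2^{-(c+2)}$ is positive) and rounds $k$ up to an integer; the function name and the sample values are illustrative only.

```python
import math

def hamming_ann_parameters(n, c, r):
    """Return (k, p, s) as chosen on the slide, with k rounded up to an integer."""
    gap = 1.0 / 8.0 - 2.0 ** (-(c + 2))        # half the distance between the two means / k
    k = math.ceil(math.log(n) / gap ** 2)      # sketch length
    p = (1.0 - 2.0 ** (-1.0 / r)) / 2.0        # solves 1 - 2p = 2^(-1/r)
    s = (3.0 / 8.0 - 2.0 ** (-(c + 2))) * k    # ball radius in sketch space
    return k, p, s

k, p, s = hamming_ann_parameters(n=1000, c=2, r=5)
print(k, p, s)
# Per-pair failure probability from the Chernoff bound with t = gap * k:
print(math.exp(-2.0 * (1.0 / 16.0) ** 2 * k))
```

With $t = \left(\frac{1}{8} - 2^{-(c+2)}\right) k$ the Chernoff bound gives $\exp(-2t^2/k) \le n^{-2}$ per pair. The constants are large: for $c = 2$ the sketch length is about $256 \ln n$, so the $n \cdot 2^k$ table is polynomial in $n$ but far from practical.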

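  65. Locality-Sensitive Hashing (LSH)
     Given a metric space $(X, \mathrm{dist})$, a random $h: X \to U$ drawn from $\mathcal{H}$ is an $(r, cr, p, q)$-LSH, with $p > q$, if for all $x, y \in X$:
     • $\mathrm{dist}(x, y) \le r \implies \Pr[h(x) = h(y)] \ge p$
     • $\mathrm{dist}(x, y) > cr \implies \Pr[h(x) = h(y)] \le q$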

  67. Locality-Sensitive Hashing (LSH): amplification
     If there exists an $(r, cr, p, q)$-LSH $h: X \to U$, then there exists an $(r, cr, p^k, q^k)$-LSH $g: X \to U^k$.
     Independently draw $h_1, h_2, \dots, h_k$ according to the distribution of $h$, and let $g(x) = (h_1(x), h_2(x), \dots, h_k(x)) \in U^k$.
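The concatenation step can be sketched with a concrete base family. The slides do not name one here, so this example assumes bit sampling on $\{0,1\}^d$ (hash a vector to one uniformly random coordinate), for which $\Pr[h(x) = h(y)] = 1 - \mathrm{dist}(x, y)/d$. All function names are hypothetical.

```python
import random

def draw_bit_sampler(d, rng):
    """Base hash (assumed family): h(x) = x[i] for a uniformly random coordinate i."""
    i = rng.randrange(d)
    return lambda x: x[i]

def draw_concatenated(k, d, rng):
    """g(x) = (h_1(x), ..., h_k(x)) with h_1, ..., h_k drawn independently."""
    hs = [draw_bit_sampler(d, rng) for _ in range(k)]
    return lambda x: tuple(h(x) for h in hs)

def collision_rate(x, y, k, trials, rng):
    """Empirical Pr[g(x) = g(y)] over independent draws of g."""
    hits = 0
    for _ in range(trials):
        g = draw_concatenated(k, len(x), rng)
        hits += g(x) == g(y)
    return hits / trials

rng = random.Random(7)
x = (0, 1, 1, 0, 1, 0, 0, 1, 1, 0)
y = (1, 1, 1, 0, 1, 0, 0, 1, 1, 1)   # differs from x in 2 of 10 coordinates
print(collision_rate(x, y, 3, 20000, rng))   # close to (1 - 2/10)^3 = 0.512
```

Raising both probabilities to the $k$-th power widens the ratio between $p^k$ and $q^k$, which is how $q$ can be driven down to $1/n$ at the cost of a smaller $p^* = p^k$.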

  72. $(c, r)$-ANN
     Setup: metric space $(X, \mathrm{dist})$
     Data: $n$ points $y_1, y_2, \dots, y_n \in X$
     Query: given a point $x \in X$:
     • Return a $y_i$ s.t. $\mathrm{dist}(x, y_i) \le cr$ if $\exists y_j: \mathrm{dist}(x, y_j) \le r$
     • Return "no" if $\forall y_j: \mathrm{dist}(x, y_j) > cr$
     • Arbitrary answer otherwise
     Suppose we have an $(r, cr, p^*, 1/n)$-LSH $g: X \to U$, i.e. $\forall x, y \in X$: $\mathrm{dist}(x, y) \le r \implies \Pr[g(x) = g(y)] \ge p^*$ and $\mathrm{dist}(x, y) > cr \implies \Pr[g(x) = g(y)] \le 1/n$.
     Store $y_1, y_2, \dots, y_n$ in nondecreasing order of $g(y_i)$.
     Upon query $x \in X$: find all $y_i$ such that $g(x) = g(y_i)$ by binary search. If we encounter some $y_i$ such that $\mathrm{dist}(x, y_i) \le cr$, then return this $y_i$; otherwise return "no".
     If the real answer is "no": always correct.
     If the real answer is "yes": correct with probability at least $p^*$.
     Space: $O(n)$
     Time: $O(\log n)$ for the binary search + $O(1)$ in expectation for scanning collisions (each of the at most $n$ points farther than $cr$ collides with $x$ with probability at most $1/n$).
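The single sorted table can be sketched with the `bisect` module. The class name `SortedLshTable`, the use of `None` for "no", and the toy hash in the usage line (the first two coordinates of a tuple, chosen only to keep the example deterministic) are assumptions, not part of the slides.

```python
import bisect

class SortedLshTable:
    """Points stored in nondecreasing order of their hash value g(y)."""

    def __init__(self, points, g):
        self.g = g
        entries = sorted(((g(y), y) for y in points), key=lambda e: e[0])
        self.keys = [key for key, _ in entries]
        self.points = [y for _, y in entries]

    def colliding(self, x):
        """All stored y with g(y) == g(x), located by two binary searches."""
        key = self.g(x)
        lo = bisect.bisect_left(self.keys, key)
        hi = bisect.bisect_right(self.keys, key)
        return self.points[lo:hi]

def query_cr_ann(table, x, c, r, dist):
    """Return a colliding point within distance c * r, or None for "no"."""
    for y in table.colliding(x):
        if dist(x, y) <= c * r:
            return y
    return None

hamming = lambda a, b: sum(u != v for u, v in zip(a, b))
table = SortedLshTable([(0, 0, 1, 1), (0, 0, 0, 0), (1, 1, 1, 1)], lambda v: v[:2])
print(query_cr_ann(table, (0, 0, 0, 1), 2, 1, hamming))
```

Because every returned point is verified against the distance $cr$, the structure never returns a far point; its only failure mode is missing a near point whose hash differs from that of the query.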

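  76. $(c, r)$-ANN in metric space $(X, \mathrm{dist})$ with an $(r, cr, p^*, 1/n)$-LSH $g: X \to U$
     Data: $n$ points $y_1, y_2, \dots, y_n \in X$
     Query: some point $x \in X$
     Let $m = 1/p^*$; independently draw $g_1, g_2, \dots, g_m$.
     Maintain $m$ sorted tables: for $j = 1, 2, \dots, m$, store $y_1, y_2, \dots, y_n$ in table-$j$ in nondecreasing order of $g_j(y_i)$.
     Upon query $x \in X$: find the first $10 \cdot m$ such $y_i$ that $\exists j: g_j(x) = g_j(y_i)$ by binary search. If we encounter some $y_i$ such that $\mathrm{dist}(x, y_i) \le cr$, then return this $y_i$; otherwise return "no".
     If the real answer is "no": always correct.
     If there exists $y_s$ such that $\mathrm{dist}(x, y_s) \le r$, then $\Pr[\text{answer "no"}] \le (1 - p^*)^{1/p^*} + \frac{1}{10} \le \frac{1}{e} + \frac{1}{10} < \frac{1}{2}$. One way to see this: $y_s$ fails to collide with $x$ in all $m$ tables with probability at most $(1 - p^*)^m$, and the expected number of collisions with points farther than $cr$ over all tables is at most $m \cdot n \cdot \frac{1}{n} = m$, so by Markov's inequality there are $10m$ or more of them with probability at most $\frac{1}{10}$.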
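The multi-table query with its $10 \cdot m$ inspection budget can be sketched as follows. The function names, the tuple layout of each table, and the two deterministic toy hashes in the usage lines are assumptions for illustration; real hashes would be independent draws of an LSH $g$.

```python
import bisect

def build_tables(points, hashes):
    """One sorted table per hash function: (g, sorted keys, points in key order)."""
    tables = []
    for g in hashes:
        entries = sorted(((g(y), y) for y in points), key=lambda e: e[0])
        tables.append((g, [key for key, _ in entries], [y for _, y in entries]))
    return tables

def query_tables(tables, x, c, r, dist):
    """Inspect at most 10 * m colliding points; return one within c * r, else None."""
    budget = 10 * len(tables)
    for g, keys, ys in tables:
        key = g(x)
        lo = bisect.bisect_left(keys, key)
        hi = bisect.bisect_right(keys, key)
        for y in ys[lo:hi]:
            if budget == 0:
                return None          # too many collisions: give up with "no"
            budget -= 1
            if dist(x, y) <= c * r:
                return y
    return None

hamming = lambda a, b: sum(u != v for u, v in zip(a, b))
data = [(0, 0, 1, 1), (1, 1, 1, 1), (1, 0, 1, 0)]
tables = build_tables(data, [lambda v: v[:2], lambda v: v[2:]])
print(query_tables(tables, (0, 1, 1, 1), 2, 1, hamming))
```

The budget caps the work per query at $O(m)$ distance computations on top of $m$ binary searches, which is what the Markov argument on far collisions pays for.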
