

  1. Approximate Nearest Line Search in High Dimensions – Sepideh Mahabadi

  5. The NLS Problem
     • Given: a set L of N lines in ℝ^d
     • Goal: build a data structure s.t.
       – given a query point q, it finds the closest line ℓ* to q
       – polynomial space
       – sub-linear query time
     • Approximation: find an approximate closest line ℓ, i.e., dist(q, ℓ) ≤ dist(q, ℓ*)·(1 + ε)
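
As a point of reference for what the data structure has to beat, here is a minimal brute-force sketch of exact NLS (illustrative, not from the talk): each line is represented by an anchor point a and a direction v, and the query is answered by an O(N·d) linear scan.

```python
import numpy as np

def dist_point_to_line(q, a, v):
    """Euclidean distance from point q to the line {a + t*v : t in R}."""
    v = v / np.linalg.norm(v)           # unit direction
    w = q - a
    return float(np.linalg.norm(w - np.dot(w, v) * v))

def nearest_line_bruteforce(q, lines):
    """Exact NLS by linear scan; lines is a list of (anchor, direction) pairs."""
    dists = [dist_point_to_line(q, a, v) for a, v in lines]
    best = int(np.argmin(dists))
    return best, dists[best]

# Tiny usage example in R^3: the x-axis and a vertical line through (0, 2, 0).
lines = [(np.zeros(3), np.array([1.0, 0.0, 0.0])),
         (np.array([0.0, 2.0, 0.0]), np.array([0.0, 0.0, 1.0]))]
idx, d = nearest_line_bruteforce(np.array([5.0, 0.5, 0.0]), lines)  # idx == 0, d == 0.5
```

The goal of the talk is to replace this linear scan with sub-linear query time while keeping polynomial space.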

  6. BACKGROUND
     • Nearest Neighbor Problem
     • Motivation
     • Previous Work
     • Our Result
     • Notation

  9. Nearest Neighbor Problem
     • NN: Given a set P of N points, build a data structure s.t. given a query point q, it finds the closest point p* to q.
     • Applications: databases, information retrieval, pattern recognition, computer vision
       – Features: dimensions
       – Objects: points
       – Similarity: distance between points
     • Current solutions suffer from the "curse of dimensionality":
       – Either the space or the query time is exponential in d
       – Little improvement over linear search

  11. Approximate Nearest Neighbor (ANN)
     • ANN: Given a set P of N points, build a data structure s.t. given a query point q, it finds an approximate closest point p to q, i.e., dist(q, p) ≤ dist(q, p*)·(1 + ε)
     • There exist data structures with different tradeoffs. Example:
       – Space: d·N^O(1/ε²)
       – Query time: ((d·log N)/ε)^O(1)
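
Since the construction only uses reductions to ANN (slide 21), it is convenient to fix a black-box interface for ANN(P, ε). The sketch below is illustrative: a linear scan stands in for a real (1 + ε)-approximate structure (being exact, it is trivially a (1 + ε)-approximation), but the later modules rely only on this build/query interface.

```python
import numpy as np

class ANN:
    """Stand-in for a black-box (1+eps)-ANN data structure over a point set P."""

    def __init__(self, points, eps):
        self.points = np.asarray(points, dtype=float)  # the point set P, one point per row
        self.eps = eps                                  # approximation parameter (unused by the scan)

    def query(self, q):
        """Return the index of a point p with dist(q, p) <= (1 + eps) * dist(q, p*)."""
        d = np.linalg.norm(self.points - np.asarray(q, dtype=float), axis=1)
        return int(np.argmin(d))
```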

  14. Motivation for NLS
     • One of the simplest generalizations of ANN: data items are represented by k-flats (affine subspaces) instead of points
     • Models data under linear variations
     • Handles unknown or unimportant parameters in the database
     • Example:
       – Varying the light gain parameter of images
       – Each image/point becomes a line
       – Search for the closest line to the query image

  17. Previous and Related Work
     • Magen [02]: Nearest Subspace Search for constant k
       – Query time is fast: (d + log N + 1/ε)^O(1)
       – Space is super-polynomial: 2^((log N)^O(1))
     • Dual problem: the database is a set of N points and the query is a k-flat
       – [AIKN] for 1-flats (line queries): for any t > 0
         – Query time: O(d³·N^(0.5+t))
         – Space: d²·N^O(1/ε² + 1/t²)
       – Very recently, [MNSS] extended this to k-flat queries
         – Query time: roughly N^(k/(k+1-σ)+t)
         – Space: polynomial in N

  21. Our Result
     • We give a randomized algorithm that, for any sufficiently small ε, reports a (1 + ε)-approximate solution with high probability
       – Space: (N + d)^O(1/ε²)
       – Time: (d + log N + 1/ε)^O(1)
     • Matches, up to polynomial factors, the performance of the best algorithms for ANN; no exponential dependence on d
     • The first algorithm with poly-logarithmic query time and polynomial space for objects other than points
     • Only uses reductions to ANN

  26. Notation
     • L: the set of lines, of size N
     • q: the query point
     • B(c, r): the ball of radius r around c
     • dist: the Euclidean distance between objects
     • angle: defined between lines
     • α-close: two lines ℓ, ℓ′ are α-close if sin(angle(ℓ, ℓ′)) ≤ α
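
A small illustrative sketch of the α-closeness test for two lines given by direction vectors: since lines are undirected, the angle between them lies in [0, π/2], so its sine can be computed from the absolute cosine of the angle between the direction vectors.

```python
import numpy as np

def sin_line_angle(u, v):
    """sin of the angle between two lines with direction vectors u and v."""
    u = u / np.linalg.norm(u)
    v = v / np.linalg.norm(v)
    cos_a = min(abs(float(np.dot(u, v))), 1.0)   # |cos| folds the angle into [0, pi/2]
    return float(np.sqrt(1.0 - cos_a ** 2))

def alpha_close(u, v, alpha):
    """True if the two lines are alpha-close, i.e., sin(angle) <= alpha."""
    return sin_line_angle(u, v) <= alpha
```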

  27. MODULES
     • Net Module
     • Unbounded Module
     • Parallel Module

  30. Net Module
     • Intuition: sampling points from each line finely enough to get a point set P, and building ANN(P, ε), should suffice to find the approximate closest line.
     • Lemma: let x be the separation parameter, i.e., the distance between two adjacent samples on a line. Then
       – either the returned line ℓ_p is an approximate closest line,
       – or dist(q, ℓ_p) ≤ x/ε
     • Issue: it can only be used inside a bounded region
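
A minimal sketch of the net-module intuition (illustrative; it assumes each line is already restricted to a bounded segment, which is exactly the bounded-region caveat above): sample every segment at separation at most x, remember which line each sample came from, and answer a query with the line of the nearest sample. The exact nearest-sample scan below stands in for a query to ANN(P, ε).

```python
import numpy as np

def build_net(segments, x):
    """segments: list of (a, b) endpoint pairs; returns the sample set P and the line index of each sample."""
    samples, owner = [], []
    for i, (a, b) in enumerate(segments):
        n = max(int(np.ceil(np.linalg.norm(b - a) / x)), 1)
        for t in np.linspace(0.0, 1.0, n + 1):   # adjacent samples are at most x apart
            samples.append(a + t * (b - a))
            owner.append(i)
    return np.stack(samples), owner

def query_net(samples, owner, q):
    """Line of the nearest sample; a real implementation would call ANN(P, eps) here."""
    d = np.linalg.norm(samples - q, axis=1)
    return owner[int(np.argmin(d))]
```

Per the lemma, when this does not already return an approximate closest line, the query must be within distance x/ε of the returned line.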

  33. Unbounded Module - Intuition
     • All lines in L pass through the origin o
     • Data structure:
       – Project all lines onto any sphere S(o, r) to get a point set P
       – Build an ANN data structure ANN(P, ε)
     • Query algorithm:
       – Project the query onto S(o, r) to get q′
       – Find the approximate closest point to q′, i.e., p = ANN_P(q′)
       – Return the line corresponding to p
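
A sketch of the unbounded-module intuition for lines through the origin (illustrative): a line through o meets the sphere S(o, r) in two antipodal points, so both ±r·u are inserted for every unit direction u, the query is projected radially onto the same sphere, and the line of the nearest projected point is returned (again, an exact scan stands in for ANN(P, ε)).

```python
import numpy as np

def build_unbounded(directions, r=1.0):
    """directions: direction vectors of lines through the origin; returns projected points and line indices."""
    points, owner = [], []
    for i, u in enumerate(directions):
        u = u / np.linalg.norm(u)
        for s in (1.0, -1.0):            # each line meets S(o, r) in two antipodal points
            points.append(s * r * u)
            owner.append(i)
    return np.stack(points), owner

def query_unbounded(points, owner, q, r=1.0):
    q_proj = r * q / np.linalg.norm(q)   # project the query radially onto S(o, r)
    d = np.linalg.norm(points - q_proj, axis=1)
    return owner[int(np.argmin(d))]
```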

  34. Unbounded Module
     • All lines in L pass through a small ball B(o, r)
     • The query is far enough away, outside of B(o, R)
     • Use the same data structure and query algorithm
