Approximate Nearest Line Search in High Dimensions
Sepideh Mahabadi
The NLS Problem
• Given: a set of n lines L in ℝ^d
• Goal: build a data structure s.t. given a query point q, it finds the closest line ℓ* to q
  – polynomial space
  – sub-linear query time
Approximation
• Finds an approximate closest line ℓ: dist(q, ℓ) ≤ dist(q, ℓ*)(1 + ε)
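For reference, the trivial exact solution is a linear scan over all n lines, which is exactly the O(nd) cost the data structure is meant to beat. A minimal sketch, with each line represented as a hypothetical (point, direction) pair:

```python
import numpy as np

def point_line_dist(q, a, v):
    """Euclidean distance from point q to the line {a + t*v : t real}."""
    v = v / np.linalg.norm(v)                      # unit direction
    w = q - a
    return np.linalg.norm(w - np.dot(w, v) * v)    # drop the component along v

def nearest_line(q, lines):
    """Exact NLS by linear scan: returns the index of the closest line."""
    return min(range(len(lines)), key=lambda i: point_line_dist(q, *lines[i]))
```

For example, with one line along the x-axis through the origin and another through (0, 5), the query (3, 1) is at distance 1 from the first line and 4 from the second, so the scan returns index 0.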
BACKGROUND
• Nearest Neighbor Problems
• Motivation
• Previous Work
• Our Result
• Notation
Nearest Neighbor Problem
• NN: Given a set of n points P, build a data structure s.t. given a query point q, it finds the closest point p* to q.
• Applications: databases, information retrieval, pattern recognition, computer vision
  – Features: dimensions
  – Objects: points
  – Similarity: distance between points
• Current solutions suffer from the "curse of dimensionality":
  – Either the space or the query time is exponential in d
  – Little improvement over linear search
Approximate Nearest Neighbor (ANN)
• ANN: Given a set of n points P, build a data structure s.t. given a query point q, it finds an approximate closest point p to q, i.e., dist(q, p) ≤ dist(q, p*)(1 + ε)
• There exist data structures with different tradeoffs. Example:
  – Space: dn^{O(1/ε²)}
  – Query time: O((d/ε²) · log n)
Motivation for NLS
One of the simplest generalizations of ANN: data items are represented by k-flats (affine subspaces) instead of points
• Models data under linear variations
• Handles unknown or unimportant parameters in the database
• Example:
  – Varying the light gain parameter of images
  – Each image/point becomes a line
  – Search for the closest line to the query image
Previous and Related Work
• Magen [02]: Nearest Subspace Search for constant ε
  – Query time is fast: (d + log n + 1/ε)^{O(1)}
  – Space is super-polynomial: 2^{(log n)^{O(1)}}
• Dual problem: the database is a set of points and the query is a k-flat
  – [AIKN] for 1-flats: for any t > 0
    – Query time: O(d³ n^{0.5+t})
    – Space: d² n^{O(1/ε² + 1/t²)}
  – Very recently, [MNSS] extended it to k-flats
    – Query time: O(n^{k/(k+1)+t} (d log n)^{O(1)})
    – Space: n^{O(1 + 1/t)}
Our Result
We give a randomized algorithm that, for any sufficiently small ε, reports a (1 + ε)-approximate solution with high probability
• Space: (n + d)^{O(1/ε²)}
• Time: (d + log n + 1/ε)^{O(1)}
• Matches, up to polynomials, the performance of the best algorithms for ANN; no exponential dependence on d
• The first algorithm with polylog query time and polynomial space for objects other than points
• Only uses reductions to ANN
Notation
• L: the set of lines, of size n
• q: the query point
• B(p, r): the ball of radius r around p
• dist: the Euclidean distance between objects
• angle: defined between lines
• α-close: two lines ℓ, ℓ′ are α-close if sin(angle(ℓ, ℓ′)) ≤ α
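The α-closeness predicate follows directly from the definition. In this sketch, `alpha_close` and the direction-vector representation of a line are illustrative assumptions; since lines are undirected, the sign of the direction vector is ignored:

```python
import numpy as np

def alpha_close(v1, v2, alpha):
    """Two lines with directions v1, v2 are alpha-close if sin(angle) <= alpha."""
    u1 = np.asarray(v1, float); u1 = u1 / np.linalg.norm(u1)
    u2 = np.asarray(v2, float); u2 = u2 / np.linalg.norm(u2)
    cos = abs(np.dot(u1, u2))                      # |cos|: lines are undirected
    sin = np.sqrt(max(0.0, 1.0 - cos * cos))
    return sin <= alpha
```

Parallel lines have sin(angle) = 0, so they are α-close for every α ≥ 0, even when the direction vectors point in opposite ways; perpendicular lines have sin(angle) = 1.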
MODULES
• Net Module
• Unbounded Module
• Parallel Module
Net Module
• Intuition: sampling points from each line finely enough to get a point set N, and building an ANN(N, ε) data structure, should suffice to find the approximate closest line.
• Lemma: let y be the separation parameter, i.e., the distance between two adjacent samples on a line. Then
  – either the returned line ℓ_p is an approximate closest line,
  – or dist(q, ℓ_p) ≤ y/ε
• Issue: it must be used inside a bounded region (an infinite line cannot be covered by finitely many samples)
Unbounded Module - Intuition
• All lines in L pass through the origin o
• Data structure:
  – Project all lines onto an arbitrary sphere S(o, r) to get a point set P
  – Build an ANN data structure ANN(P, ε)
• Query algorithm:
  – Project the query onto S(o, r) to get q′
  – Find the approximate closest point to q′, i.e., p = ANN_P(q′)
  – Return the line corresponding to p
Unbounded Module
• All lines in L pass through a small ball B(o, r)
• The query is far enough away, outside of B(o, r)
• Use the same data structure and query algorithm
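For the origin-centered case the projection step is easy to make concrete: a line through o meets the sphere S(o, r) in two antipodal points, so each line contributes two points to P. In this sketch the function names are illustrative and brute-force nearest-point search again stands in for ANN(P, ε):

```python
import numpy as np

def sphere_points(dirs, r=1.0):
    """Project each line through the origin onto S(o, r): two antipodal points."""
    pts, owner = [], []
    for i, v in enumerate(dirs):
        u = np.asarray(v, float)
        u = u / np.linalg.norm(u)
        pts.extend([r * u, -r * u])    # a line crosses the sphere twice
        owner.extend([i, i])
    return np.array(pts), owner

def unbounded_query(q, dirs, r=1.0):
    """Project q onto S(o, r), return the line whose sphere point is nearest."""
    q = np.asarray(q, float)
    qp = r * q / np.linalg.norm(q)     # q' = radial projection of q
    pts, owner = sphere_points(dirs, r)
    return owner[int(np.argmin(np.linalg.norm(pts - qp, axis=1)))]
```

Storing both antipodal points makes the answer independent of the sign convention chosen for each direction vector.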