Proximity in the Age of Distraction: Robust Approximate Nearest Neighbor Search
Sariel Har-Peled (UIUC) and Sepideh Mahabadi (MIT)
Nearest Neighbor Problem

Nearest Neighbor
- Dataset of n points P in a metric space (M, dist), e.g. ℝ^d
- A query point q comes online
- Goal:
  - Find the nearest data point p*
  - Do it in sub-linear time and small space
Approximate Nearest Neighbor
- Dataset of n points P in a metric space (M, dist), e.g. ℝ^d
- A query point q comes online
- Goal:
  - Find the nearest data point p*
  - Do it in sub-linear time and small space
- Approximate Nearest Neighbor:
  - If the optimal distance is r, report a point within distance cr, for c = 1 + ε
  - For Hamming (and ℓ1) the query time is n^(1/c) [IM98], and for Euclidean (ℓ2) it is n^(1/c^2) [AI08]
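To make the guarantee concrete, here is a minimal sketch (the function name and data points are mine, not from the talk): a c-ANN query is allowed to return any point whose distance to the query is within a factor c of the optimal distance.

```python
import math

def ann_candidates(points, q, c):
    """All points that a c-approximate NN query may legally return:
    those within c times the optimal (smallest) distance to q."""
    dists = [math.dist(q, p) for p in points]
    r = min(dists)  # optimal distance
    return [p for p, dist in zip(points, dists) if dist <= c * r]

points = [(0.0, 0.0), (0.0, 1.4), (5.0, 5.0)]
query = (0.0, 0.5)
# With c = 2 both nearby points are valid answers; (5, 5) is not.
print(ann_candidates(points, query, c=2))
```

With c = 1 the only valid answer is the exact nearest neighbor; relaxing c is what buys the sub-linear query time quoted above.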
Applications of NN
Searching for the closest object
Robust NN Problem
Robustness
The data points may be:
- Corrupted, noisy
  - e.g. image denoising
- Incomplete
  - e.g. recommendation systems: a sparse users-by-movies rating matrix with many missing entries
- Partially irrelevant
  - e.g. an occluded image
The Robust NN problem
- Dataset of n points P in ℝ^d
- A parameter k
- A query point q comes online
- Goal: find the closest point after removing k coordinates
  - Example (n = 3, k = 2): q = (1,2,1,5)
    - p1 = (3,4,0,5): dist = 1
    - p2 = (3,2,1,2): dist = 0
    - p3 = (2,3,3,1): dist = 2
- A different set of coordinates may be removed for each point
- Applying ANN naively would require (d choose k) ≈ d^k data structures
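For a single pair of points the robust ℓ1 distance is easy to compute exactly: drop the k largest coordinate differences. A minimal sketch (function name is mine); the hard part of the problem is the sub-linear data structure, not this per-pair computation:

```python
def robust_dist(q, p, k):
    """l1 distance between q and p after dropping the k coordinates
    with the largest absolute difference (the best case for p)."""
    diffs = sorted((abs(qi - pi) for qi, pi in zip(q, p)), reverse=True)
    return sum(diffs[k:])

q = (1, 2, 1, 5)
print(robust_dist(q, (3, 4, 0, 5), k=2))  # 1
print(robust_dist(q, (3, 2, 1, 2), k=2))  # 0
print(robust_dist(q, (2, 3, 3, 1), k=2))  # 2
```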
Budgeted Version
- Dataset of n points P in ℝ^d
- d weights x = (x1, x2, …, xd) ∈ [0,1]^d
- A query point q comes online
- Goal: find the closest point after removing a set of coordinates C of weight at most 1
  - Example (n = 3): x = (0.5, 0.5, 0.8, 0.3), q = (1,2,5,5)
    - p1 = (1,4,0,3): dist = 4
    - p2 = (3,2,4,2): dist = 1
    - p3 = (4,6,3,4): dist = 3
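Per pair, the budgeted distance is a knapsack-like selection: choose a set of coordinates of total weight at most the budget so that the remaining ℓ1 distance is minimized. A brute-force sketch for tiny d (names are mine; this exhaustive search is exponential in d and is for illustration only, which is exactly why a cleverer data structure is needed):

```python
from itertools import combinations

def budgeted_dist(q, p, x, budget=1.0):
    """l1 distance after removing the best coordinate set C with
    total weight sum(x[i] for i in C) <= budget (brute force)."""
    d = len(q)
    best = float("inf")
    for size in range(d + 1):
        for C in combinations(range(d), size):
            if sum(x[i] for i in C) <= budget:
                dist = sum(abs(q[i] - p[i]) for i in range(d) if i not in C)
                best = min(best, dist)
    return best

x = (0.5, 0.5, 0.8, 0.3)
q = (1, 2, 5, 5)
print(budgeted_dist(q, (1, 4, 0, 3), x))  # 4
print(budgeted_dist(q, (3, 2, 4, 2), x))  # 1
print(budgeted_dist(q, (4, 6, 3, 4), x))  # 3
```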
Results
Bicriterion approximation, for the ℓ1 norm:
- Suppose that for p* ∈ P we have dist(q, p*) = r after ignoring k coordinates
- Then for any ε ∈ (0,1):
  - Report a point p s.t. dist(q, p) = O(r/ε) after ignoring O(k/ε) coordinates
  - Query time equals n^ε queries to a 2-ANN data structure
Why not a single criterion?
- It is equivalent to exact near neighbor in Hamming space: there is a point within distance k of the query iff there is a point within distance 0 after ignoring k coordinates
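The equivalence above can be checked directly: in Hamming space, ignoring a coordinate either removes one mismatch or is wasted, so the best achievable distance after ignoring k coordinates is max(0, hamming − k). A small sketch (function names are mine):

```python
def hamming(u, v):
    """Number of coordinates where u and v disagree."""
    return sum(a != b for a, b in zip(u, v))

def min_dist_ignoring_k(u, v, k):
    """Smallest Hamming distance achievable after ignoring k coordinates."""
    return max(0, hamming(u, v) - k)

u, v = (1, 0, 1, 1, 0), (1, 1, 1, 0, 0)
# u and v are within Hamming distance 2 iff the distance is 0
# after ignoring 2 coordinates:
assert (hamming(u, v) <= 2) == (min_dist_ignoring_k(u, v, 2) == 0)
```

So a single-criterion robust NN algorithm would solve exact near neighbor, which is why the results are bicriterion.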
Results

                      distance              #ignored coordinates   query time
Opt                   r                     k
ℓ1                    O(r/ε)                O(k/ε)                 n^ε queries of 2-ANN
ℓp                    O(r(k + 1/ε)^(1/p))   O(k + 1/ε)             n^ε queries of 2^(1/p)-ANN
(1+ε)-approximation   (1+ε)r                O(k/ε^2)               O(n^ε/ε) queries of (1+ε)-ANN
Budgeted              O(r)                  weight O(1)            n^ε queries of 2-ANN, plus a small additive term
Algorithm
High Level Algorithm
- Theorem. If for a point p* ∈ P the ℓ1 distance of q and p* is at most r after removing k coordinates, there exists an algorithm which reports a point p whose distance to q is O(r/ε) after removing O(k/ε) coordinates.
- Cannot apply randomized dimensionality reduction, e.g. Johnson-Lindenstrauss
- Instead, use a set of randomized maps f1, f2, …, fm : ℝ^d → ℝ^{d'} such that
  - all of them map points far from the query to far points, and
  - at least one of them maps a close point to a close point
- W.l.o.g. assume that the query is the origin
  - Then the goal is to find the data point with minimum norm
A Randomized Map
- Embed all the points using a random mapping f: ℝ^d → ℝ^{d'}:
  - Repeat t = O(ln n) times
  - Sample each coordinate in [d] with probability ε/k
- E.g. d = 5:
  - round 1: coordinates (1,3,4) sampled
  - round 2: coordinate (4) sampled
  - w = (3,6,1,2,4) maps to f(w) = (3,1,2,2)
- E[d'] = O(d ln n · ε/k)
- Simple setup: consider a vector w where each coordinate is either 0 or ∞
  - Close point: w has at most k large coordinates
    - Probability of avoiding the large coordinates in every round is at least (1 − ε/k)^(k ln n) ≈ n^(−ε)
  - Far point: w has at least k/ε large coordinates
    - Probability of missing the large coordinates in every round is at most (1 − ε/k)^((k/ε) ln n) ≈ 1/n
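The sampling map described above can be sketched in a few lines (function name and parameters are mine; one draw of f fixes the sampled coordinates of all t rounds, and the same fixed map is then applied to every point):

```python
import math
import random

def make_map(d, k, eps, n, rng):
    """Draw one random map f: a concatenation of t = O(ln n) rounds,
    each round keeping every coordinate of [d] independently w.p. eps/k."""
    t = max(1, math.ceil(math.log(n)))
    rounds = [[i for i in range(d) if rng.random() < eps / k]
              for _ in range(t)]

    def f(w):
        # Concatenate the sampled coordinates of every round.
        return [w[i] for kept in rounds for i in kept]

    return f

rng = random.Random(0)
f = make_map(d=5, k=2, eps=0.5, n=100, rng=rng)
print(f([3, 6, 1, 2, 4]))
```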
Outline
- Embed all the points using a random mapping f: ℝ^d → ℝ^{d'}
- With probability n^(−ε):
  - all far points will be mapped to far points under ℓ1 distance, and
  - a close-by point will be mapped to a close-by point under ℓ1 distance
- We can then use ANN as a black box to find it
- Repeat this embedding O(n^ε log n) times and report the best point found
Algorithm
[Diagram: repeated n^ε times in parallel — each repetition applies a fresh map f (t rounds of sampling every coordinate w.p. ε/k) to map pi ∈ ℝ^d to pi' ∈ ℝ^{d'}, then runs an ℓ1 2-ANN query in the embedded space, yielding one candidate.]
- Check the distance of all n^ε candidates and report the closest one after ignoring k coordinates
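An end-to-end sketch of the pipeline, under stated assumptions: names are mine, and a brute-force exact ℓ1 nearest neighbor stands in for the 2-ANN black box (so this shows the structure of the algorithm, not its sub-linear running time):

```python
import math
import random

def robust_dist(q, p, k):
    """True l1 distance after dropping the k largest differences."""
    diffs = sorted((abs(a - b) for a, b in zip(q, p)), reverse=True)
    return sum(diffs[k:])

def robust_ann(points, q, k, eps):
    """One candidate per repetition: embed with a fresh sampling map,
    take the l1-nearest embedded point (exact NN stands in for the
    2-ANN black box), then rank candidates by true robust distance."""
    n, d = len(points), len(q)
    reps = max(1, math.ceil(n ** eps * math.log(n)))
    t = max(1, math.ceil(math.log(n)))
    best = None
    for _ in range(reps):
        # One map: indices kept across t rounds of coordinate sampling.
        kept = [i for _ in range(t) for i in range(d)
                if random.random() < eps / k]
        cand = min(points,
                   key=lambda p: sum(abs(p[i] - q[i]) for i in kept))
        if best is None or robust_dist(q, cand, k) < robust_dist(q, best, k):
            best = cand
    return best

random.seed(0)
pts = [(3, 4, 0, 5), (3, 2, 1, 2), (2, 3, 3, 1)]
print(robust_ann(pts, (1, 2, 1, 5), k=2, eps=0.5))
```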
Analysis
Truncation
- Coordinates are not necessarily 0 and ∞
- Let f' be obtained by sampling every coordinate with probability s = ε/k
  - E[‖f'(w)‖_1] = s‖w‖_1
  - Var[‖f'(w)‖_1] = s(1 − s)‖w‖_2^2
- Need to bound the influence of every coordinate:
  - Truncate every coordinate at r/k, i.e., w_i = min{w_i, r/k}
- L-light point: a point with norm ≤ L after truncation
- R-heavy point: a point with norm ≥ R after truncation
- Close point: a point with norm ≤ r after ignoring k coordinates
- Far point: a point with norm ≥ r/ε after ignoring k/ε coordinates
- Claim:
  - A close point is 2r-light.
  - A far point is (r/ε)-heavy.
- Analyze the behavior of the maps over the truncated points instead.
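The first claim can be checked numerically: for a close point, the k ignored coordinates each contribute at most r/k after truncation, so the truncated norm is at most r + k·(r/k) = 2r. A small sketch (names and the example vector are mine):

```python
def truncated_norm(w, r, k):
    """l1 norm after truncating every coordinate at r/k."""
    return sum(min(abs(wi), r / k) for wi in w)

def robust_norm(w, k):
    """l1 norm after ignoring the k largest coordinates."""
    return sum(sorted((abs(wi) for wi in w), reverse=True)[k:])

# A close point (robust norm <= r) must be 2r-light: the k ignored
# coordinates contribute at most k * (r/k) = r after truncation.
w = [10.0, 9.0, 0.5, 0.3, 0.2]  # robust norm with k = 2 is 1.0
r, k = 1.0, 2
assert robust_norm(w, k) <= r
assert truncated_norm(w, r, k) <= 2 * r
```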
ℓ1 Norm
Using truncation:
- Bound the variance and prove concentration for f' by Chebyshev
- f is a concatenation of t = O(ln n) such f'
  - E[‖f(w)‖_1] = ts‖w‖_1
  - Prove concentration for f using Chernoff
- This gives the ℓ1 result: distance O(r/ε), O(k/ε) ignored coordinates, n^ε queries of 2-ANN
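The expectation and variance formulas above are easy to sanity-check by Monte Carlo simulation (this check is mine, not from the talk); for w = (3,6,1,2,4) and s = 0.4 they predict E = 0.4·16 = 6.4 and Var = 0.4·0.6·66 = 15.84:

```python
import random

def sampled_norm(w, s, rng):
    """One draw of ||f'(w)||_1: keep each coordinate independently w.p. s."""
    return sum(abs(wi) for wi in w if rng.random() < s)

rng = random.Random(1)
w, s, trials = [3.0, 6.0, 1.0, 2.0, 4.0], 0.4, 200_000
samples = [sampled_norm(w, s, rng) for _ in range(trials)]
mean = sum(samples) / trials
var = sum((x - mean) ** 2 for x in samples) / trials
# Predicted: E = s * ||w||_1 = 6.4, Var = s(1-s) * ||w||_2^2 = 15.84
print(mean, var)
```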
Generalizations
ℓp norm
- Minimize the ℓp^p norm, i.e., Σ_i |w_i|^p, similarly to the ℓ1 case
- Gives: distance O(r(k + 1/ε)^(1/p)), O(k + 1/ε) ignored coordinates, n^ε queries of 2^(1/p)-ANN
Budgeted
- Map:
  - sample coordinate i with probability proportional to ε/x_i
  - to maintain the expectation, multiply sampled coordinates by x_i
- Truncation:
  - truncate coordinate i at a value proportional to r·x_i
  - e.g. a coordinate of cost approaching 0 will be truncated to 0
- Gives: distance O(r), removed weight O(1), n^ε queries of 2-ANN, plus a small additive term
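The weighted sampling step above can be sketched as follows (names are mine; clamping the probability at 1 and the corresponding rescaling are my own handling, chosen so that each kept term has expectation ε·w_i, matching the "multiply by x_i" rule when ε/x_i ≤ 1):

```python
import random

def budgeted_map(w, x, eps, rng):
    """One sampling round for the budgeted version: keep coordinate i
    w.p. p_i = min(1, eps / x[i]) and rescale the kept value so its
    expected contribution is eps * w[i]."""
    out = []
    for wi, xi in zip(w, x):
        p = min(1.0, eps / xi)
        if rng.random() < p:
            out.append(wi * eps / p)  # equals wi * xi when p = eps/xi
    return out

rng = random.Random(2)
print(budgeted_map([2.0, 4.0], [0.5, 1.0], eps=0.25, rng=rng))
```

Cheap-to-remove coordinates (small x_i) are sampled rarely and down-weighted, so removing them costs little, mirroring the truncation rule above.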
Conclusion
Open Problems
- Improve the dependence on ε
- Prove lower bounds

Summary of results:

                      distance              #ignored coordinates   query time
Opt                   r                     k
ℓ1                    O(r/ε)                O(k/ε)                 n^ε queries of 2-ANN
ℓp                    O(r(k + 1/ε)^(1/p))   O(k + 1/ε)             n^ε queries of 2^(1/p)-ANN
(1+ε)-approximation   (1+ε)r                O(k/ε^2)               O(n^ε/ε) queries of (1+ε)-ANN
Budgeted              O(r)                  weight O(1)            n^ε queries of 2-ANN, plus a small additive term