

SLIDE 1

Proximity in the Age of Distraction: Robust Approximate Nearest Neighbor Search

Sariel Har-Peled (UIUC) and Sepideh Mahabadi (MIT)

SLIDE 2

Nearest Neighbor Problem

SLIDES 3-6

Nearest Neighbor

Dataset of n points P in a metric space (X, d_X), e.g. ℝ^d
A query point q comes online
Goal:

  • Find the nearest data point p*
  • Do it in sub-linear time and small space

SLIDES 7-8

Approximate Nearest Neighbor

Dataset of n points P in a metric space (X, d_X), e.g. ℝ^d
A query point q comes online
Goal:

  • Find the nearest data point p*
  • Do it in sub-linear time and small space
  • Approximate Nearest Neighbor

─ If the optimal distance is r, report a point within distance cr, for c = 1 + Ξ΄
─ For Hamming (and β„“1) the query time is n^(1/c) [IM98], and for Euclidean (β„“2) it is n^(1/cΒ²) [AI08]

SLIDE 9

Applications of NN

Searching for the closest object

SLIDE 10

Robust NN Problem

SLIDES 11-14

Robustness

The data points are:

  • Corrupted, noisy
  ─ e.g. image denoising
  • Incomplete
  ─ e.g. recommendation systems: a sparse users Γ— movies rating matrix, most entries missing
  • Irrelevant
  ─ e.g. an occluded image

SLIDES 15-20

The Robust NN problem

  • Dataset of n points P in ℝ^d
  • A parameter k
  • A query point q comes online
  • Find the closest point after removing k coordinates

Example (n = 3, k = 2), query q = (1, 2, 1, 5):

  p1 = (3, 4, 0, 5)  dist = 1
  p2 = (3, 2, 1, 2)  dist = 0
  p3 = (2, 3, 3, 1)  dist = 2

  • A different set of coordinates may be removed for each point
  • Applying ANN naively would require C(d, k) β‰ˆ d^k data structures
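The robust distance above (drop the k largest coordinate differences, then sum the rest in β„“1) is easy to state in code. The sketch below is my own illustration of the definition, not the paper's data structure; `robust_nn` is the trivial linear scan that the sub-linear algorithm is meant to beat.

```python
def robust_dist(q, p, k):
    """l1 distance after dropping the k largest coordinate differences."""
    diffs = sorted((abs(a - b) for a, b in zip(q, p)), reverse=True)
    return sum(diffs[k:])

def robust_nn(q, points, k):
    """Brute-force robust nearest neighbor: a linear scan over all points."""
    return min(points, key=lambda p: robust_dist(q, p, k))

# The slide's example: n = 3, k = 2
q = (1, 2, 1, 5)
P = [(3, 4, 0, 5), (3, 2, 1, 2), (2, 3, 3, 1)]
print([robust_dist(q, p, 2) for p in P])  # [1, 0, 2]
print(robust_nn(q, P, 2))                 # (3, 2, 1, 2)
```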

SLIDES 21-25

Budgeted Version

  • Dataset of n points P in ℝ^d
  • d weights w = (w1, w2, …, wd) ∈ [0, 1]^d
  • A query point q comes online
  • Find the closest point after removing a set of coordinates B of weight at most 1

Example (n = 3), weights w = (0.5, 0.5, 0.8, 0.3), query q = (1, 2, 5, 5):

  p1 = (1, 4, 0, 3)  dist = 4
  p2 = (3, 2, 4, 2)  dist = 1
  p3 = (4, 6, 3, 4)  dist = 3

SLIDES 26-28

Results

Bicriterion Approximation, for the β„“1 norm

  • Suppose that for some p* ∈ P we have dist(q, p*) = r after ignoring k coordinates
  • For Ξ΅ ∈ (0, 1):
  • Report a point p s.t. dist(q, p) = O(r/Ξ΅) after ignoring O(k/Ξ΅) coordinates
  • Query time equals n^Ξ΅ queries to a 2-ANN data structure

Why not a single criterion?

  • It is equivalent to exact near neighbor in Hamming: there is a point within distance r of the query iff there is a point within distance 0 after ignoring k = r coordinates

SLIDES 29-33

Results

  Variant               distance               #ignored coords   #queries                query type
  Opt                   r                      k                 -                       -
  β„“1                    O(r/Ξ΅)                 O(k/Ξ΅)            n^Ξ΅                     2-ANN
  β„“p                    O(r((c+1)/Ξ΅)^(1/p))    O(k(c+1)/Ξ΅)       n^Ξ΅                     c^(1/p)-ANN
  (1+Ξ΄)-approximation   r(1+Ξ΄)                 O(k/(δΡ))         O(n^Ξ΅/Ξ΄)                (1+Ξ΄)-ANN
  Budgeted              O(r)                   weight O(1)       n^Ξ΅ (+O(n^Ξ΅ d⁴) time)   2-ANN

SLIDE 34

Algorithm

SLIDES 35-38

High Level Algorithm

  • Theorem. If for a point p* ∈ P the β„“1 distance of q and p* is at most r after removing k coordinates, there exists an algorithm which reports a point p whose distance to q is O(r/Ξ΅) after removing O(k/Ξ΅) coordinates.
  • Cannot apply randomized dimensionality reduction, e.g. Johnson-Lindenstrauss
  • Instead, use a set of randomized maps g1, g2, …: ℝ^d → ℝ^d′:
  ─ All of them map points far from the query to far points
  ─ At least one of them maps a close point to a close point
  • W.l.o.g. assume that the query is the origin
  • Find the data point with minimum norm

SLIDES 39-45

A Randomized Map

  • Embed all the points using a random mapping g: ℝ^d → ℝ^d′:
  ─ Repeat t = O(ln n) times
  ─ Sample each coordinate in [d] with probability Ξ΅/k
  • E.g. d = 5:
  ─ round 1: coordinates (1, 3, 4) sampled
  ─ round 2: coordinate (4) sampled
  ─ v = (3, 6, 1, 2, 4) maps to g(v) = (3, 1, 2, 2)
  • E[d′] = O(d ln n Β· Ξ΅/k)
  • Simple setup: consider a vector v where each coordinate is either 0 or ∞
  • Close point:
  ─ v has at most k large coordinates
  ─ the probability of avoiding the large coordinates is at least (1 βˆ’ Ξ΅/k)^(kΒ·ln n) β‰ˆ n^(βˆ’Ξ΅)
  • Far point:
  ─ v has at least k/Ξ΅ large coordinates
  ─ the probability of missing the large coordinates is at most (1 βˆ’ Ξ΅/k)^((k/Ξ΅)Β·ln n) β‰ˆ 1/n

SLIDES 46-48

Outline

  • Embed all the points using a random mapping g: ℝ^d → ℝ^d′
  • With probability n^(βˆ’Ξ΅):
  ─ all far points are mapped to far points under the β„“1 distance
  ─ a close-by point is mapped to a close-by point under the β„“1 distance
  • We can then use ANN as a black box to find it
  • Repeat this embedding O(n^Ξ΅ log n) times and report the best
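Putting the outline together, here is a toy end-to-end sketch. It is my own construction: an exact scan in the embedded space stands in for the ANN black box, and a fixed small repetition count replaces O(n^Ξ΅ log n). The final re-ranking under the true robust distance is exactly the "report the best" step above.

```python
import random

def robust_dist(q, p, k):
    """l1 distance after dropping the k largest coordinate differences."""
    diffs = sorted((abs(a - b) for a, b in zip(q, p)), reverse=True)
    return sum(diffs[k:])

def robust_ann(q, points, k, eps=0.5, reps=30, t=5, rng=None):
    """One candidate per repetition via the embedding, then re-rank candidates."""
    rng = rng or random.Random(0)
    d = len(q)
    candidates = set()
    for _ in range(reps):
        # t rounds, each coordinate kept with probability eps/k (concatenated)
        kept = [j for _ in range(t) for j in range(d) if rng.random() < eps / k]
        emb = lambda v: sum(abs(v[j] - q[j]) for j in kept)
        candidates.add(min(points, key=emb))  # exact NN stands in for ANN
    # Check all candidates under the true robust distance; report the closest.
    return min(candidates, key=lambda p: robust_dist(q, p, k))

q = (5, 5, 5, 5, 5, 5)
close = (5, 5, 5, 5, 105, 105)  # matches q except on k = 2 coordinates
far1 = (105,) * 6               # off by 100 on every coordinate
far2 = (205,) * 6
best = robust_ann(q, [close, far1, far2], k=2)
print(best, robust_dist(q, best, 2))  # (5, 5, 5, 5, 105, 105) 0
```

Because the close point's coordinate differences are dominated by the far points', every repetition selects it here, whatever the random draws.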

SLIDES 49-53

Algorithm

[Pipeline figure: n^Ξ΅ independent repetitions. Each repetition maps (ℝ^d, "ignore k coordinates" distance) through f (sample every coordinate w.p. Ξ΅/k, repeated t times) into (ℝ^d′, β„“1), and issues one ANN query there.]

Check the distance of all n^Ξ΅ candidates and report the closest one after ignoring k coordinates.

SLIDE 54

Analysis

SLIDES 55-65

Truncation

Coordinates are not necessarily 0 and ∞.

Let g′ be obtained by sampling every coordinate with probability τ = Ρ/k:

  • E[β€–g′(v)β€–1] = Ο„β€–vβ€–1
  • Var[β€–g′(v)β€–1] = Ο„(1 βˆ’ Ο„)β€–vβ€–2Β²

We need to bound the influence of every coordinate:

  • Truncate every coordinate at r/k, i.e., v_j ← min{v_j, r/k}
  • r-light point: a point with norm ≀ r after truncation
  • R-heavy point: a point with norm β‰₯ R after truncation
  • Close point: a point with norm ≀ r after ignoring k coordinates
  • Far point: a point with norm β‰₯ r/Ξ΅ after ignoring k/Ξ΅ coordinates

Claim:

  • A close point is 2r-light.
  • A far point is (r/Ξ΅)-heavy.
  • So we analyze the behavior of the maps over the truncated points instead.
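The two moment identities for g′ can be checked by exact enumeration over all 2^d keep/drop patterns (no sampling, so the check is deterministic). This is a verification sketch of the slide's formulas, not part of the algorithm.

```python
from itertools import product
import math

def exact_moments(v, tau):
    """Mean and variance of ||g'(v)||_1 by summing over all keep/drop patterns."""
    d = len(v)
    e1 = e2 = 0.0
    for keep in product((0, 1), repeat=d):
        prob = math.prod(tau if b else 1 - tau for b in keep)
        s = sum(x for x, b in zip(v, keep) if b)
        e1 += prob * s
        e2 += prob * s * s
    return e1, e2 - e1 * e1

v, tau = (3.0, 1.0, 2.0), 0.25
mean, var = exact_moments(v, tau)
print(mean, var)  # 1.5 2.625
assert math.isclose(mean, tau * sum(v))                         # tau * ||v||_1
assert math.isclose(var, tau * (1 - tau) * sum(x * x for x in v))  # tau(1-tau) * ||v||_2^2
```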
SLIDES 66-69

β„“1 Norm

Using truncation:

  • Bound the variance and prove concentration for g′ by Chebyshev

g is a concatenation of t = O(ln n) such g′:

  • E[β€–g(v)β€–1] = tΟ„β€–vβ€–1
  • Prove concentration for g using Chernoff

This yields the β„“1 row of the results table: distance O(r/Ξ΅), O(k/Ξ΅) ignored coordinates, n^Ξ΅ queries to a 2-ANN structure.

SLIDES 70-78

Generalizations

β„“p norm

  • Minimize the β€–vβ€–p^p norm, i.e., Ξ£_j |v_j|^p, similarly to the β„“1 case
  • This gives distance O(r((c+1)/Ξ΅)^(1/p)), O(k(c+1)/Ξ΅) ignored coordinates, and n^Ξ΅ queries to a c^(1/p)-ANN structure

Budgeted

  • Map:
  ─ sample coordinate j with probability proportional to 1/w_j
  ─ to maintain the expectation, multiply every sampled coordinate by w_j
  • Truncation:
  ─ truncate coordinate j at the value r/(c/w_j βˆ’ 1)
  ─ e.g. a coordinate of cost approaching 0 will be truncated to 0
  • This gives distance O(r), an ignored set of weight O(1), and n^Ξ΅ queries to a 2-ANN structure, plus O(n^Ξ΅ d⁴) time

SLIDES 79-81

Conclusion

(Summary: the results table of slides 29-33: β„“1, β„“p, (1+Ξ΄)-approximation, and budgeted bounds.)

Open Problems

  • Improve the dependence on Ξ΅
  • Prove lower bounds

Thank You!