Pruning

• If $B_\varepsilon(\omega) \cap B = \emptyset$, the sub-tree descending from the node $B$ can be pruned, that is, if it can be certified that $\omega \notin B_\varepsilon = \{x \in \Omega : d(x, B) < \varepsilon\}$.
• Otherwise the search branches out.

[Figure: adjacent nodes $A$ and $B$, the $\varepsilon$-neighbourhood $B_\varepsilon$ of $B$, and a query point $\omega$ with its $\varepsilon$-ball.]

How to “certify” that $B_\varepsilon(\omega) \cap B = \emptyset$?

Metric tree indexing schemes for similarity search
Vladimir Pestov, University of Ottawa
AofA 2008, Maresias, SP, Brazil – p.7/25

Decision functions

Let $f : \Omega \to \mathbb{R}$ be a 1-Lipschitz function,
$|f(x) - f(y)| \le d(x, y)$ for all $x, y \in \Omega$,
such that $f \restriction B \le 0$. Then $f \restriction B_\varepsilon < \varepsilon$; that is, $f(\omega) \ge \varepsilon$ is a certificate that $B_\varepsilon(\omega) \cap B = \emptyset$.

[Figure: graph of $f$, with $f \le 0$ on $B$ and the level $\varepsilon$ marked; sample points $x$, $y$ and the value $f(x)$.]
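
The certificate can be illustrated with one concrete choice of decision function. The sketch below assumes the node $B$ sits inside a closed ball of known centre and radius and uses $f(x) = d(x, \text{centre}) - \text{radius}$, which is 1-Lipschitz by the triangle inequality; the helper names, the Euclidean distance and the sample points are hypothetical and not taken from the slides.

```python
import math

def make_ball_decision_function(center, radius):
    """Return f(x) = d(x, center) - radius (Euclidean d, an assumption).

    f is 1-Lipschitz, and f <= 0 on any set B contained in the
    closed ball of the given center and radius."""
    def f(x):
        return math.dist(x, center) - radius
    return f

def certifies_disjoint(f, omega, eps):
    """True when f(omega) >= eps, i.e. the open eps-ball at omega misses B."""
    return f(omega) >= eps

# Hypothetical node B: three points inside the closed unit disc.
B = [(0.0, 0.0), (0.5, 0.5), (-1.0, 0.0)]
f = make_ball_decision_function(center=(0.0, 0.0), radius=1.0)

far_query = (3.0, 0.0)    # f = 2.0, so a radius-1.5 query is certified disjoint
near_query = (1.5, 0.0)   # f = 0.5, so a radius-1.0 query is not certified
```

A failed certificate does not mean the ball meets $B$; it only means this particular $f$ cannot rule it out, so the search has to descend.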

Metric trees

A metric tree for a metric similarity workload $(\Omega, \rho, X)$ consists of:
• a binary rooted tree $T$,
• a collection of partially defined 1-Lipschitz functions $f_t : B_t \to \mathbb{R}$, one for every inner node $t$ (decision functions),
• a collection of bins $B_t \subseteq \Omega$, one for every leaf node $t$, containing pointers to the elements of $X \cap B_t$,
such that
• $B_{\mathrm{root}(T)} = \Omega$,
• for every inner node $t$ with child nodes $t_-$, $t_+$: $B_t \subseteq B_{t_-} \cup B_{t_+}$.

When processing a range query $B_\varepsilon(\omega)$, the child $t_-$ [resp. $t_+$] is accessed $\iff$ $f_t(\omega) < \varepsilon$ [resp. $f_t(\omega) > -\varepsilon$].
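
The access rule can be sketched in a few lines. This is a minimal illustration, not the construction from the talk: it assumes Euclidean points, a vantage-point style decision function $f_t(x) = d(x, p_t) - r_t$ with $r_t$ the median distance to a pivot $p_t$, and hypothetical names (`Leaf`, `Inner`, `build`, `range_search`, `bin_size`).

```python
import math
import random
from statistics import median

class Leaf:
    def __init__(self, points):
        self.points = points              # the bin: data points stored at this leaf

class Inner:
    def __init__(self, pivot, radius, minus, plus):
        self.pivot, self.radius = pivot, radius
        self.minus, self.plus = minus, plus

    def f(self, x):
        # 1-Lipschitz decision function: <= 0 on the minus child, > 0 on the plus child
        return math.dist(x, self.pivot) - self.radius

def build(points, bin_size=4):
    if len(points) <= bin_size:
        return Leaf(points)
    pivot = points[0]
    radius = median([math.dist(x, pivot) for x in points])
    minus = [x for x in points if math.dist(x, pivot) <= radius]
    plus = [x for x in points if math.dist(x, pivot) > radius]
    if not plus:                          # degenerate split: stop and store a bin
        return Leaf(points)
    return Inner(pivot, radius, build(minus, bin_size), build(plus, bin_size))

def range_search(node, omega, eps, hits=None):
    """Collect all stored points x with d(x, omega) < eps."""
    if hits is None:
        hits = []
    if isinstance(node, Leaf):
        hits.extend(x for x in node.points if math.dist(x, omega) < eps)
        return hits
    value = node.f(omega)
    if value < eps:                       # minus child cannot be pruned
        range_search(node.minus, omega, eps, hits)
    if value > -eps:                      # plus child cannot be pruned
        range_search(node.plus, omega, eps, hits)
    return hits

random.seed(0)
data = [tuple(random.random() for _ in range(3)) for _ in range(200)]
tree = build(data)
query = (0.5, 0.5, 0.5)
found = range_search(tree, query, 0.3)
```

When $-\varepsilon < f_t(\omega) < \varepsilon$ both children are visited; this is the branching whose frequency the later slides estimate.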

What happens in practice?

The best indexing schemes for exact similarity search in high-dimensional outer datasets are often (not always!) outperformed by linear scan.

∗ ∗ ∗

The emphasis has shifted towards approximate similarity search: given $\varepsilon > 0$ and $\omega \in \Omega$, return a point that is [with high probability] at a distance $< (1 + \varepsilon)\, d_{NN}(\omega)$ from $\omega$.

The curse of dimensionality conjecture

Conjecture. Let $X \subseteq \{0, 1\}^d$ be a dataset with $n$ points, where the Hamming cube is equipped with the Hamming ($\ell_1$) distance:
$d(x, y) = \sharp\{i : x_i \ne y_i\}$.
Suppose $d = n^{o(1)}$, but $d = \omega(\log n)$. Any data structure for exact nearest neighbour search in $X$, with $d^{O(1)}$ query time, must use $n^{\omega(1)}$ space.

∗ ∗ ∗

The cell probe model: $\Omega(d / \log n)$ lower bound (Barkol–Rabani, 2000).

Concentration of measure

The phenomenon of concentration of measure on high-dimensional structures (“Geometric LLN”): for a typical “high-dimensional” structure $\Omega$, if $A$ is a subset containing at least half of all points, then the measure of the $\varepsilon$-neighbourhood $A_\varepsilon$ of $A$ is overwhelmingly close to 1 already for small $\varepsilon > 0$.

[Figure: the space $\Omega$, a subset $A$ containing at least half of all points, its $\varepsilon$-neighbourhood $A_\varepsilon$, and the remainder $\Omega \setminus A_\varepsilon$; $\alpha(\Omega, \varepsilon)$ bounds $\mu(\Omega \setminus A_\varepsilon)$ from above.]

Concentration function

Let $\Omega = (\Omega, d, \mu)$ be a metric space with measure. The concentration function of $\Omega$:
$$\alpha(\varepsilon) = \begin{cases} \frac{1}{2}, & \text{if } \varepsilon = 0, \\ 1 - \min\left\{ \mu_\sharp(A_\varepsilon) : A \subseteq \Omega,\ \mu_\sharp(A) \ge \frac{1}{2} \right\}, & \text{if } \varepsilon > 0. \end{cases}$$

For $\Omega = \Sigma^n$, the Hamming cube (normalized distance + uniform measure):
$$\alpha_{\Sigma^n}(\varepsilon) \le e^{-2\varepsilon^2 n}.$$

Gaussian estimates are typical (Euclidean spheres $S^n$, cubes $I^n$, ...).

Example: the Hamming cube

[Figure: concentration function $\alpha(\Sigma^{101}, \varepsilon)$ versus the Chernoff bound, $n = 101$; horizontal axis $\varepsilon$ from 0 to 0.2, vertical axis from 0 to 1.]
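
The comparison in this plot can be reproduced numerically. The sketch below is not the author's code: it assumes odd $n$, the open neighbourhood $\{x : d(x, A) < \varepsilon\}$ with normalized Hamming distance, and that Hamming balls are the extremal sets (Harper's isoperimetric theorem), so the concentration function reduces to a binomial tail. All function names are hypothetical.

```python
import math

def hamming_ball_measure(n, radius):
    """Uniform measure of a Hamming ball of integer radius in {0,1}^n."""
    radius = min(radius, n)
    return sum(math.comb(n, k) for k in range(radius + 1)) / 2 ** n

def concentration_function(n, eps):
    """alpha(eps) for the Hamming cube with normalized distance, n odd.

    Assumes Hamming balls are extremal (Harper's theorem)."""
    if n % 2 == 0:
        raise ValueError("this sketch only handles odd n")
    if eps == 0:
        return 0.5
    half_radius = (n - 1) // 2          # this ball has measure exactly 1/2
    steps = math.ceil(eps * n) - 1      # largest Hamming distance j with j/n < eps
    return 1.0 - hamming_ball_measure(n, half_radius + steps)

def chernoff_bound(n, eps):
    return math.exp(-2.0 * eps * eps * n)

# Tabulate both curves on the range shown in the plot.
n = 101
grid = [k / 100 for k in range(1, 21)]
table = [(eps, concentration_function(n, eps), chernoff_bound(n, eps)) for eps in grid]
```

Each row of `table` holds $\varepsilon$, the computed concentration function and the Chernoff bound, which is enough to redraw the two curves.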

Effects of concentration on branching

[Figure: a node $C$ split into children $A$ and $B$, with their $\varepsilon$-neighbourhoods $A_\varepsilon$ and $B_\varepsilon$ and a query point $\omega$ with its $\varepsilon$-ball; the two regions outside the neighbourhoods each have measure $< \alpha(C, \varepsilon)$.]

For all query points $\omega \in C$ except a set of measure $\le 2\alpha(C, \varepsilon)$, the search algorithm branches out at the node $C$.

Search radius

$\varepsilon_{NN}(\omega)$ is a 1-Lipschitz function, so it concentrates near its median value, $\varepsilon_M$;
$\varepsilon_M \to \mathbb{E}_{\mu \otimes \mu}\, d(x, y) = O(1)$.

Example: 1000 points sampled from $[0, 1]^{10}$ with the $\ell_2$ distance: $\mathbb{E}\, d(x, y) = 1.2765$, $\varepsilon_M = 0.69419$.
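
A quick Monte Carlo experiment in the spirit of this example is sketched below. The assumptions are mine: points and queries are drawn uniformly from the cube, the median is taken over the nearest-neighbour distances of fresh query points, and the mean distance is estimated from independent random pairs; the slide does not say how its figures were obtained, and the function name and parameters are hypothetical.

```python
import math
import random
from statistics import mean, median

def simulate(n_points=1000, dim=10, n_queries=200, seed=0):
    """Estimate the median NN distance and the mean pairwise distance in [0,1]^dim."""
    rng = random.Random(seed)

    def draw():
        return tuple(rng.random() for _ in range(dim))

    data = [draw() for _ in range(n_points)]
    queries = [draw() for _ in range(n_queries)]
    # Distance from each query to its nearest data point (linear scan).
    nn_distances = [min(math.dist(q, x) for x in data) for q in queries]
    # Distances between independent random pairs estimate E d(x, y).
    pair_distances = [math.dist(draw(), draw()) for _ in range(n_queries)]
    return median(nn_distances), mean(pair_distances)

eps_median, mean_distance = simulate()
```

The estimates vary with the seed and the sample sizes, so they should be compared with the slide's figures only approximately.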

A naive average O(n) lower bound

Suppose datapoints are distributed according to $\mu \in P(\Omega)$, as well as query points.

Take a balanced metric tree of depth $O(\log n)$, with $O(n)$ bins of roughly equal size ($\mu$-measure).

In half of the cases, $\varepsilon_{NN} \ge \varepsilon_M = O(1)$, the median NN distance.

For every element $A$ of the level $t$ partition,
$$\alpha(A, \varepsilon_M) \le 2\mu(A)^{-1} \alpha(\Omega, \varepsilon_M / 2) = O(2^t)\, e^{-O(1)\, \varepsilon_M^2 d}.$$

$\Rightarrow$ branching at every node occurs for all $\omega$ except a set of measure at most
$$\sharp(\text{nodes}) \times 2 \sup_A \alpha(A, \varepsilon) = O(n^2)\, e^{-O(1) d} = o(1),$$
because $d = \omega(\log n)$ $\Rightarrow$ $e^{O(1) d}$ is superpolynomial in $n$.
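
To spell out the last step, write $c > 0$ for the unspecified $O(1)$ constant in the exponent (a notational assumption made here for illustration). Then

\[
n^2 e^{-c d} = \exp\bigl(2 \log n - c\, d\bigr),
\qquad
d = \omega(\log n) \;\Longrightarrow\; 2 \log n - c\, d \to -\infty ,
\]

so $n^2 e^{-cd} \to 0$. The same computation works with $n^k$ in place of $n^2$ for any fixed $k$, which is what "superpolynomial" is used for here.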

What’s wrong?

A dataset $X$ is modeled by a sequence of i.i.d. random variables $X_i \sim \mu$.

Implicit assumption: empirical measure $\mu_n(A) = \frac{|A|}{n} \approx \mu(A)$.

But the scheme is chosen after seeing an instance $X$!

[Figure: plot on the unit square, both axes from 0 to 1.]

How much can be said of concentration in $(\Omega, \mu_n)$?

VC dimension

Let $\mathcal{A}$ be a family of subsets of $\Omega$ (a concept class). A set $B \subseteq \Omega$ is shattered by $\mathcal{A}$ if for each $C \subseteq B$ there is $A \in \mathcal{A}$ such that
$A \cap B = C$.

[Figure: a set $B \subseteq \Omega$, a subset $C \subseteq B$, and a concept $A$ cutting out $C$ from $B$.]

The Vapnik–Chervonenkis dimension $\text{VC-dim}(\mathcal{A})$ of $\mathcal{A}$ is the largest cardinality of a set $B \subseteq \Omega$ shattered by $\mathcal{A}$.
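
For finite families the definition can be checked by brute force. The sketch below is illustrative only: the function names are hypothetical, and the example concept class (all integer intervals inside $\{0, \dots, 5\}$, including the empty one) is my own choice, not one from the slides.

```python
from itertools import combinations

def is_shattered(B, family):
    """True if every subset C of B arises as A ∩ B for some A in the family."""
    B = frozenset(B)
    traces = {frozenset(A) & B for A in family}
    return len(traces) == 2 ** len(B)

def vc_dimension(points, family):
    """Largest size of a subset of `points` shattered by `family` (brute force)."""
    best = 0
    for size in range(1, len(points) + 1):
        if any(is_shattered(B, family) for B in combinations(points, size)):
            best = size
        else:
            break  # subsets of shattered sets are shattered, so larger sizes fail too
    return best

# Hypothetical concept class: all intervals {i, ..., j-1} inside {0, ..., 5}.
points = list(range(6))
intervals = [set(range(i, j)) for i in range(7) for j in range(i, 7)]
dimension = vc_dimension(points, intervals)
```

Intervals can pick out any subset of two points but never the two outer points of a triple without the middle one, which is why no three-point set is shattered.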

Statistical learning bounds

Let $\mathcal{A} \subseteq 2^\Omega$ be a concept class of finite VC dimension, $d$. Then for all $\varepsilon, \delta > 0$ and every probability measure $\mu$ on $\Omega$, if $n$ datapoints in $X$ are drawn randomly and independently according to $\mu$, then with confidence $1 - \delta$
$$\forall A \in \mathcal{A}, \quad \left| \mu(A) - \frac{|X \cap A|}{n} \right| < \varepsilon,$$
provided $n$ is large enough:
$$n \ge \frac{128}{\varepsilon^2} \left( d \log\left( \frac{2e^2}{\varepsilon} \log \frac{2e}{\varepsilon} \right) + \log \frac{8}{\delta} \right).$$
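
To get a feel for the sizes involved, the sample-size condition can be evaluated directly. The sketch assumes the bound has the form $n \ge \frac{128}{\varepsilon^2}\bigl(d \log(\frac{2e^2}{\varepsilon}\log\frac{2e}{\varepsilon}) + \log\frac{8}{\delta}\bigr)$ with natural logarithms; the function name and the sample arguments are hypothetical.

```python
import math

def sample_size(vc_dim, eps, delta):
    """Smallest integer n meeting the assumed uniform-convergence bound."""
    if not (0 < eps < 1 and 0 < delta < 1):
        raise ValueError("eps and delta must lie strictly between 0 and 1")
    inner = (2 * math.e ** 2 / eps) * math.log(2 * math.e / eps)
    bound = (128 / eps ** 2) * (vc_dim * math.log(inner) + math.log(8 / delta))
    return math.ceil(bound)

# Hypothetical parameters: VC dimension 10, accuracy 0.1, confidence 0.95.
n_required = sample_size(vc_dim=10, eps=0.1, delta=0.05)
```

The dependence on $\varepsilon$ is roughly quadratic, so halving the accuracy more than quadruples the required number of datapoints.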

Bin access lemma

Let $\delta > 0$, and let $\gamma$ be a collection of subsets $A \subseteq \Omega$ of measure $\mu(A) \le \alpha(\delta) \le \frac{1}{4}$ each, satisfying $\mu(\cup \gamma) \ge 1/2$. Then the $2\delta$-neighbourhood of every point $\omega \in \Omega$, apart from a set of measure at most $\frac{1}{2} \alpha(\delta)^{1/2}$, meets at least $\left\lceil \frac{1}{2} \alpha(\delta)^{-1/2} \right\rceil$ elements of $\gamma$.

∗ ∗ ∗

If we can now guarantee that the bins are not too large, we get a lower bound on the number of bin accesses.
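
As a purely illustrative instance, reading the lemma with exceptional measure $\frac{1}{2}\alpha(\delta)^{1/2}$ and bin count $\lceil \frac{1}{2}\alpha(\delta)^{-1/2} \rceil$, and assuming the hypothetical value $\alpha(\delta) = 10^{-4}$:

\[
\tfrac{1}{2}\,\alpha(\delta)^{1/2} = \tfrac{1}{2}\cdot 10^{-2} = 0.005,
\qquad
\left\lceil \tfrac{1}{2}\,\alpha(\delta)^{-1/2} \right\rceil = \left\lceil \tfrac{1}{2}\cdot 10^{2} \right\rceil = 50 .
\]

So for all query points outside a set of measure at most $0.005$, the $2\delta$-neighbourhood meets at least 50 of the sets in $\gamma$.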

Bin complexity estimates

Let $\mathcal{F}$ be a class of 1-Lipschitz functions used for constructing a metric tree of a particular type.

Let $\mathcal{A}$ be the concept class of all solution sets to inequalities $f \gtrless a$, $f \in \mathcal{F}$, $a \in \mathbb{R}$.

Suppose $p = \text{VC-dim}(\mathcal{A}) < \infty$ (the pseudodimension of $\mathcal{F}$ in the sense of Vapnik).

Denote by $\mathcal{B}$ the class of all bins of all possible metric trees of depth $\le h$ built using $\mathcal{F}$. Then
$$\text{VC-dim}(\mathcal{B}) \le 2hp \log(hp) = O(hp \log(hp)).$$

Rigorous lower bounds