An introduction to chaining, and applications to sublinear algorithms Jelani Nelson Harvard August 28, 2015
What’s this talk about?
Given a collection of random variables $X_1, X_2, \ldots$, we would like to say that $\max_i X_i$ is small with high probability. (Happens all over computer science, e.g. the “Chernion” (Chernoff+Union) bound.)
Today’s topic: Beating the Union Bound
Disclaimer: This is an educational talk, about ideas which aren’t mine.
A first example
• $T \subset B_{\ell_2^n}$ (the unit ball of $\ell_2^n$)
• Random variables $(Z_x)_{x \in T}$, where $Z_x = \langle g, x \rangle$ for a vector $g$ with i.i.d. $N(0, 1)$ entries
• Define the gaussian mean width $g(T) = \mathbb{E}_g \sup_{x \in T} Z_x$
• How can we bound $g(T)$?
• This talk: four progressively tighter ways to bound $g(T)$, then applications of the techniques to some TCS problems
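Gaussian mean width bound 1: union bound
• $g(T) = \mathbb{E} \sup_{x \in T} Z_x = \mathbb{E} \sup_{x \in T} \langle g, x \rangle$
• $Z_x$ is a gaussian with variance $\|x\|_2^2 \le 1$ (since $T \subset B_{\ell_2^n}$)
$$
\begin{aligned}
\mathbb{E} \sup_{x \in T} Z_x
&\le \int_0^\infty P\Big(\sup_{x \in T} Z_x > u\Big)\, du \\
&= \int_0^{u_*} \underbrace{P\Big(\sup_{x \in T} Z_x > u\Big)}_{\le 1}\, du
 + \int_{u_*}^\infty \underbrace{P\Big(\sup_{x \in T} Z_x > u\Big)}_{\le |T| \cdot e^{-u^2/2} \text{ (union bound)}}\, du \\
&\le u_* + |T| \cdot e^{-u_*^2/2} \\
&\lesssim \sqrt{\log |T|} \qquad (\text{set } u_* = \sqrt{2 \log |T|})
\end{aligned}
$$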
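As a sanity check on the $\sqrt{\log |T|}$ scale, here is a minimal Monte Carlo sketch for one hypothetical choice of $T$ that is not from the talk: the standard basis vectors $T = \{e_1, \ldots, e_n\}$, for which $\sup_{x \in T} Z_x$ is just the maximum of $n$ independent $N(0,1)$ draws. The function names, trial count, and seed are illustrative assumptions.

```python
import math
import random


def estimate_mean_width_basis(n, trials=2000, seed=0):
    """Monte Carlo estimate of g(T) for the assumed set T = {e_1, ..., e_n}.

    For this T, Z_x = <g, e_i> = g_i, so sup_x Z_x is the maximum of
    n independent N(0, 1) draws.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(rng.gauss(0.0, 1.0) for _ in range(n))
    return total / trials


def union_bound_scale(n):
    """The threshold u_* = sqrt(2 log |T|) chosen on the slide."""
    return math.sqrt(2.0 * math.log(n))


if __name__ == "__main__":
    # Print |T|, the estimated mean width, and u_* side by side.
    for n in (10, 100, 1000):
        est = estimate_mean_width_basis(n)
        print(n, round(est, 3), round(union_bound_scale(n), 3))
```

For this particular $T$ the estimate should sit below $u_*$ and grow at the same $\sqrt{\log |T|}$ rate, which is the regime where the union bound is essentially tight.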
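Gaussian mean width bound 2: ε-net
• $g(T) = \mathbb{E} \sup_{x \in T} \langle g, x \rangle$
• Let $S_\varepsilon$ be an ε-net of $(T, \ell_2)$
• $\langle g, x \rangle = \langle g, x' \rangle + \langle g, x - x' \rangle$, where $x' = \operatorname{argmin}_{y \in S_\varepsilon} \|x - y\|_2$
$$
\begin{aligned}
g(T) &\le g(S_\varepsilon) + \mathbb{E}_g \sup_{x \in T} \underbrace{\langle g, x - x' \rangle}_{\le \varepsilon \cdot \|g\|_2} \\
&\lesssim \sqrt{\log |S_\varepsilon|} + \varepsilon \big(\mathbb{E}_g \|g\|_2^2\big)^{1/2} \\
&\lesssim \log^{1/2} \underbrace{N(T, \ell_2, \varepsilon)}_{\text{smallest } \varepsilon\text{-net size}} + \varepsilon \sqrt{n}
\end{aligned}
$$
• Choose ε to optimize the bound; it can never be worse than the last slide (which amounts to choosing ε = 0)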
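The decomposition above holds draw by draw, not only in expectation, so it can be checked numerically. The sketch below is one way to do that under stated assumptions: the net is built greedily from points of $T$ (a greedy net is a valid ε-net but generally not the smallest one), the point set is random test data, and all function names are hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def greedy_net(points, eps):
    """Greedy eps-net: keep a point only if it is farther than eps from
    every point kept so far. Every skipped point is within eps of the net."""
    net = []
    for p in points:
        if all(dist(p, s) > eps for s in net):
            net.append(p)
    return net


def random_unit_ball_points(m, n, rng):
    """m random points in the unit ball of l_2^n (illustrative test data)."""
    pts = []
    for _ in range(m):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(dot(v, v))
        r = rng.random()  # radius in [0, 1)
        pts.append([r * a / norm for a in v])
    return pts


def check_net_decomposition(points, eps, rng):
    """For one gaussian draw g, return
    (sup over T of <g, x>, sup over the net of <g, s>, eps * ||g||_2)."""
    n = len(points[0])
    net = greedy_net(points, eps)
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    sup_T = max(dot(g, x) for x in points)
    sup_net = max(dot(g, s) for s in net)
    slack = eps * math.sqrt(dot(g, g))
    return sup_T, sup_net, slack
```

For every draw, `sup_T <= sup_net + slack` should hold by Cauchy-Schwarz; averaging both sides over many draws gives the first line of the bound.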
Gaussian mean width bound 3: ε-net sequence
• $S_k$ is a $(1/2^k)$-net of $T$, $k \ge 0$; $\pi_k x$ is the closest point in $S_k$ to $x \in T$, and $\Delta_k x = \pi_k x - \pi_{k-1} x$
• wlog $|T| < \infty$ (else apply this slide to an ε-net of $T$ for ε small)
• $\langle g, x \rangle = \langle g, \pi_0 x \rangle + \sum_{k=1}^\infty \langle g, \Delta_k x \rangle$
• $g(T) \le \underbrace{\mathbb{E}_g \sup_{x \in T} \langle g, \pi_0 x \rangle}_{0} + \sum_{k=1}^\infty \mathbb{E}_g \sup_{x \in T} \langle g, \Delta_k x \rangle$
• $|\{\Delta_k x : x \in T\}| \le N(T, \ell_2, 1/2^k) \cdot N(T, \ell_2, 1/2^{k-1}) \le (N(T, \ell_2, 1/2^k))^2$
• $g(T) \lesssim \sum_{k=1}^\infty (1/2^k) \cdot \log^{1/2} N(T, \ell_2, 1/2^k) \lesssim \int_0^\infty \log^{1/2} N(T, \ell_2, u)\, du$ (Dudley’s theorem)
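The chaining identity can be made concrete for a finite point set. The following is a sketch under assumptions that go beyond the slide: $S_0 = \{0\}$ (a valid 1-net for any $T$ in the unit ball, which makes the $\pi_0$ term vanish), the nets $S_k$ for $k \ge 1$ are built greedily from points of $T$, and the points of $T$ are distinct so the chain terminates with $S_K = T$. Greedy net sizes stand in for the covering numbers $N(T, \ell_2, 1/2^k)$ and are not the smallest possible. All names are hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def greedy_net(points, eps):
    """Greedy eps-net made of points of T (valid, not necessarily smallest)."""
    net = []
    for p in points:
        if all(dist(p, s) > eps for s in net):
            net.append(p)
    return net


def random_unit_ball_points(m, n, rng):
    """m random points in the unit ball of l_2^n (illustrative test data)."""
    pts = []
    for _ in range(m):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(dot(v, v))
        r = rng.random()
        pts.append([r * a / norm for a in v])
    return pts


def build_chain(points, max_levels=60):
    """Nets S_0 = {0}, S_1, ..., S_K at scales 1, 1/2, ..., 1/2^K, stopping
    at the first K where S_K is all of T. Assumes the points are distinct."""
    n = len(points[0])
    nets = [[[0.0] * n]]
    for k in range(1, max_levels):
        net = greedy_net(points, 2.0 ** (-k))
        nets.append(net)
        if len(net) == len(points):
            return nets
    raise ValueError("chain did not reach T; are the points distinct?")


def chain_terms(x, nets):
    """Return [pi_0 x, Delta_1 x, ..., Delta_K x]; the terms sum to x."""
    proj = [min(net, key=lambda s: dist(x, s)) for net in nets]
    terms = [proj[0]]
    for prev, cur in zip(proj, proj[1:]):
        terms.append([c - p for c, p in zip(cur, prev)])
    return terms


def dudley_sum(nets):
    """sum_{k>=1} 2^{-k} * sqrt(log |S_k|), using greedy net sizes."""
    return sum(2.0 ** (-k) * math.sqrt(math.log(len(net)))
               for k, net in enumerate(nets) if k >= 1)
```

Each $\Delta_k x$ has norm at most $1/2^k + 1/2^{k-1} = 3/2^k$ by the triangle inequality, which is where the $1/2^k$ factor in the final sum comes from when the union bound is applied level by level.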
Gaussian mean width bound 4: generic chaining
• Again, wlog $|T| < \infty$. Define $T_0 \subseteq T_1 \subseteq \cdots \subseteq T_{k_*} = T$ with $|T_0| = 1$, $|T_k| \le 2^{2^k}$ (call such a sequence “admissible”)
• Exercise: show Dudley’s theorem is equivalent to
$$g(T) \lesssim \inf_{\{T_k\} \text{ admissible}} \sum_{k=1}^\infty 2^{k/2} \cdot \sup_{x \in T} d_{\ell_2}(x, T_k)$$
(should pick $T_k$ to be the best $\varepsilon = \varepsilon(k)$ net of size $2^{2^k}$)
• Fernique’76*: can pull the $\sup_x$ outside the sum
• $g(T) \lesssim \inf_{\{T_k\}} \sup_{x \in T} \sum_{k=1}^\infty 2^{k/2} \cdot d_{\ell_2}(x, T_k) \stackrel{\text{def}}{=} \gamma_2(T, \ell_2)$
* An equivalent upper bound was proven by Fernique (who minimized some integral over all measures over $T$), but it was reformulated in terms of admissible sequences by Talagrand.
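To see the difference between the two placements of the sup, the sketch below evaluates both expressions for one fixed admissible sequence on a finite point set. It is an illustration under assumptions: the sequence is built by a farthest-first ordering, which is just one convenient admissible choice and not the minimizer, so the values are upper bounds on the two infima rather than $\gamma_2(T, \ell_2)$ itself. All function names and the test data are hypothetical.

```python
import math
import random


def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def sample_points(m, n, rng, half_width=0.4):
    """m random points in a small cube (illustrative test data; for
    n * half_width^2 <= 1 the cube lies inside the unit ball)."""
    return [[rng.uniform(-half_width, half_width) for _ in range(n)]
            for _ in range(m)]


def farthest_first_order(points):
    """Order the points so each new point is the farthest from those chosen."""
    remaining = list(range(1, len(points)))
    order = [0]
    d = [dist(p, points[0]) for p in points]
    while remaining:
        i = max(remaining, key=lambda j: d[j])
        remaining.remove(i)
        order.append(i)
        d = [min(d[j], dist(points[j], points[i])) for j in range(len(points))]
    return [points[i] for i in order]


def admissible_sequence(points):
    """Nested T_0, T_1, ..., T_{k*} = T with |T_0| = 1 and |T_k| <= 2^(2^k)."""
    ordered = farthest_first_order(points)
    seq = [ordered[:1]]
    k = 1
    while len(seq[-1]) < len(ordered):
        seq.append(ordered[:min(len(ordered), 2 ** (2 ** k))])
        k += 1
    return seq


def dist_to_set(x, subset):
    return min(dist(x, t) for t in subset)


def chaining_functionals(points, seq):
    """Return (sup outside the sum, sup inside the sum) for this sequence,
    summing over k >= 1 as on the slide."""
    levels = range(1, len(seq))
    outside = max(sum(2.0 ** (k / 2.0) * dist_to_set(x, seq[k]) for k in levels)
                  for x in points)
    inside = sum(2.0 ** (k / 2.0) * max(dist_to_set(x, seq[k]) for x in points)
                 for k in levels)
    return outside, inside
```

For any fixed sequence the sup-outside value is at most the sup-inside value, since a single $x$ need not be the worst point at every level simultaneously; that gap is what the generic chaining bound exploits.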