An introduction to chaining, and applications to sublinear algorithms


  1. An introduction to chaining, and applications to sublinear algorithms. Jelani Nelson, Harvard. August 28, 2015.

  5. What’s this talk about?
     Given a collection of random variables $X_1, X_2, \ldots$, we would like to say that $\max_i X_i$ is small with high probability. (This happens all over computer science, e.g. the “Chernion” (Chernoff + union) bound.)
     Today’s topic: beating the union bound.
     Disclaimer: this is an educational talk about ideas which aren’t mine.

  10. A first example
     • $T \subset B_{\ell_2^n}$, the unit ball of $\ell_2^n$
     • Random variables $(Z_x)_{x \in T}$, where $Z_x = \langle g, x \rangle$ for a vector $g$ with i.i.d. $N(0,1)$ entries
     • Define the gaussian mean width $g(T) = \mathbb{E}_g \sup_{x \in T} Z_x$
     • How can we bound $g(T)$?
     • This talk: four progressively tighter ways to bound $g(T)$, then applications of the techniques to some TCS problems
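
As a rough numerical companion to the definition of $g(T)$, the sketch below estimates the gaussian mean width of a finite set by Monte Carlo. The function name `gaussian_mean_width`, the example set (signed standard basis vectors in $\mathbb{R}^{20}$), and the trial count are illustrative assumptions, not something taken from the talk.

```python
import math
import random

def gaussian_mean_width(points, trials=1000, seed=0):
    """Monte Carlo estimate of g(T) = E sup_{x in T} <g, x> for a finite set T."""
    rng = random.Random(seed)
    n = len(points[0])
    total = 0.0
    for _ in range(trials):
        # One draw of g with i.i.d. N(0, 1) entries
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # sup over T of the inner product <g, x>
        total += max(sum(gi * xi for gi, xi in zip(g, x)) for x in points)
    return total / trials

# Hypothetical example set: the 2n signed standard basis vectors in R^n,
# for which sup_x <g, x> is simply max_i |g_i|.
n = 20
T = []
for i in range(n):
    for sign in (1.0, -1.0):
        e = [0.0] * n
        e[i] = sign
        T.append(e)

estimate = gaussian_mean_width(T)
print(f"estimated g(T) = {estimate:.3f}, sqrt(2 log|T|) = {math.sqrt(2 * math.log(len(T))):.3f}")
```

For this particular set the estimate should sit somewhat below $\sqrt{2 \log |T|}$, which is the scale the union bound on the next slides produces.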

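  13. Gaussian mean width bound 1: union bound
     • $g(T) = \mathbb{E} \sup_{x \in T} Z_x = \mathbb{E} \sup_{x \in T} \langle g, x \rangle$
     • $Z_x$ is a gaussian with variance at most one (since $\|x\|_2 \le 1$)

     $$
     \begin{aligned}
     \mathbb{E} \sup_{x \in T} Z_x &= \int_0^\infty \mathbb{P}\Big(\sup_{x \in T} Z_x > u\Big)\, du \\
     &= \int_0^{u_*} \underbrace{\mathbb{P}\Big(\sup_{x \in T} Z_x > u\Big)}_{\le 1}\, du + \int_{u_*}^\infty \underbrace{\mathbb{P}\Big(\sup_{x \in T} Z_x > u\Big)}_{\le |T| \cdot e^{-u^2/2} \text{ (union bound)}}\, du \\
     &\le u_* + |T| \cdot e^{-u_*^2/2} \\
     &\lesssim \sqrt{\log |T|} \qquad (\text{set } u_* = \sqrt{2 \log |T|})
     \end{aligned}
     $$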
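
To see the union bound in numbers, the sketch below evaluates $u_* + |T| \cdot e^{-u_*^2/2}$ at $u_* = \sqrt{2 \log |T|}$ and compares it with a Monte Carlo estimate of the expected maximum of $|T|$ independent standard gaussians (the case where $T$ is an orthonormal set, which needs $|T| \le n$). The function names and the sizes tried are assumptions made for illustration.

```python
import math
import random

def union_bound(size):
    """Value of u* + |T| * exp(-u*^2 / 2) at u* = sqrt(2 log |T|)."""
    u_star = math.sqrt(2.0 * math.log(size))
    return u_star + size * math.exp(-u_star ** 2 / 2.0)  # second term equals 1

def mean_max_of_gaussians(size, trials=500, seed=1):
    """Monte Carlo estimate of E max of `size` independent N(0, 1) variables."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(rng.gauss(0.0, 1.0) for _ in range(size))
    return total / trials

for size in (10, 100, 1000):
    print(size, round(mean_max_of_gaussians(size), 3), round(union_bound(size), 3))
```

With this choice of $u_*$ the tail term collapses to exactly 1, so the bound reads $\sqrt{2 \log |T|} + 1$.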

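  19. Gaussian mean width bound 2: ε-net
     • $g(T) = \mathbb{E} \sup_{x \in T} \langle g, x \rangle$
     • Let $S_\varepsilon$ be an ε-net of $(T, \ell_2)$
     • $\langle g, x \rangle = \langle g, x' \rangle + \langle g, x - x' \rangle$, where $x' = \operatorname{argmin}_{y \in S_\varepsilon} \|x - y\|_2$

     $$
     \begin{aligned}
     g(T) &\le g(S_\varepsilon) + \mathbb{E}_g \sup_{x \in T} \underbrace{\langle g, x - x' \rangle}_{\le \varepsilon \cdot \|g\|_2} \\
     &\lesssim \sqrt{\log |S_\varepsilon|} + \varepsilon \big(\mathbb{E}_g \|g\|_2^2\big)^{1/2} \\
     &\lesssim \log^{1/2} \underbrace{N(T, \ell_2, \varepsilon)}_{\text{smallest } \varepsilon\text{-net size}} + \varepsilon \sqrt{n}
     \end{aligned}
     $$

     • Choose ε to optimize the bound; it can never be worse than the last slide (which amounts to choosing ε = 0)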
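
The trade-off in ε can be sketched numerically. The code below builds an ε-net greedily (a simple heuristic, not necessarily a smallest net) and evaluates $\sqrt{2 \log |S_\varepsilon|} + \varepsilon \sqrt{n}$, which uses the standard explicit-constant form of the union bound in place of the slide's $\lesssim$. The helper names, the tightly clustered example set, and the grid of ε values are all assumptions chosen for illustration.

```python
import math
import random

def dist(x, y):
    """Euclidean distance between two points given as lists."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def greedy_net(points, eps):
    """Return a subset S of `points` such that every point is within eps of S."""
    net = []
    for p in points:
        # Keep p only if no existing net point already covers it
        if all(dist(p, s) > eps for s in net):
            net.append(p)
    return net

def net_bound(points, eps):
    """sqrt(2 log |S_eps|) + eps * sqrt(n), with S_eps built greedily."""
    n = len(points[0])
    size = len(greedy_net(points, eps))
    return math.sqrt(2.0 * math.log(size)) + eps * math.sqrt(n)

# Hypothetical example: 200 points clustered tightly inside the unit ball of R^4
rng = random.Random(2)
center = [0.5, 0.0, 0.0, 0.0]
T = [[c + rng.uniform(-0.02, 0.02) for c in center] for _ in range(200)]

grid = [0.0, 0.01, 0.02, 0.05, 0.1]
bounds = {eps: net_bound(T, eps) for eps in grid}
best_eps = min(bounds, key=bounds.get)
print(bounds, best_eps)
```

Because the grid includes ε = 0, the best value found is never worse than the plain union bound over all of $T$, mirroring the remark on the slide.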

  24. Gaussian mean width bound 3: ε-net sequence
     • $S_k$ is a $(1/2^k)$-net of $T$, $k \ge 0$; $\pi_k x$ is the closest point in $S_k$ to $x \in T$, and $\Delta_k x = \pi_k x - \pi_{k-1} x$
     • wlog $|T| < \infty$ (else apply this slide to an ε-net of $T$ for ε small)
     • $\langle g, x \rangle = \langle g, \pi_0 x \rangle + \sum_{k=1}^\infty \langle g, \Delta_k x \rangle$
     • $g(T) \le \underbrace{\mathbb{E}_g \sup_{x \in T} \langle g, \pi_0 x \rangle}_{0} + \sum_{k=1}^\infty \mathbb{E}_g \sup_{x \in T} \langle g, \Delta_k x \rangle$
     • $|\{\Delta_k x : x \in T\}| \le N(T, \ell_2, 1/2^k) \cdot N(T, \ell_2, 1/2^{k-1}) \le (N(T, \ell_2, 1/2^k))^2$
     • $g(T) \lesssim \sum_{k=1}^\infty (1/2^k) \cdot \log^{1/2} N(T, \ell_2, 1/2^k) \lesssim \int_0^\infty \log^{1/2} N(T, \ell_2, u)\, du$ (Dudley’s theorem)
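
The step from the count of increments to Dudley's bound can be spelled out as follows, taking each $S_k$ to be a smallest $(1/2^k)$-net. The explicit constants (3, 6, 12) are one possible bookkeeping and are not taken from the talk, which only tracks them up to $\lesssim$.

```latex
\begin{align*}
\|\Delta_k x\|_2
  &\le \|\pi_k x - x\|_2 + \|x - \pi_{k-1} x\|_2
   \le 2^{-k} + 2^{-(k-1)} = 3 \cdot 2^{-k}, \\
\mathbb{E}_g \sup_{x \in T} \langle g, \Delta_k x \rangle
  &\le 3 \cdot 2^{-k} \sqrt{2 \log |\{\Delta_k x : x \in T\}|}
   \le 6 \cdot 2^{-k} \log^{1/2} N(T, \ell_2, 2^{-k}), \\
2^{-k} \log^{1/2} N(T, \ell_2, 2^{-k})
  &\le 2 \int_{2^{-(k+1)}}^{2^{-k}} \log^{1/2} N(T, \ell_2, u)\, du
   \quad \text{(since $N(T, \ell_2, u)$ is nonincreasing in $u$)}, \\
g(T)
  &\le \sum_{k=1}^{\infty} 6 \cdot 2^{-k} \log^{1/2} N(T, \ell_2, 2^{-k})
   \le 12 \int_0^{\infty} \log^{1/2} N(T, \ell_2, u)\, du .
\end{align*}
```

The second line is the union bound from bound 1 applied to the finite set of increments, each of which is a gaussian with standard deviation at most $3 \cdot 2^{-k}$.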

  28. Gaussian mean width bound 4: generic chaining
     • Again, wlog $|T| < \infty$. Define $T_0 \subseteq T_1 \subseteq \cdots \subseteq T_{k_*} = T$ with $|T_0| = 1$, $|T_k| \le 2^{2^k}$ (call such a sequence “admissible”)
     • Exercise: show Dudley’s theorem is equivalent to
       $$g(T) \lesssim \inf_{\{T_k\} \text{ admissible}} \sum_{k=1}^\infty 2^{k/2} \cdot \sup_{x \in T} d_{\ell_2}(x, T_k)$$
       (should pick $T_k$ to be the best $\varepsilon = \varepsilon(k)$ net of size $2^{2^k}$)
     • Fernique’76*: can pull the $\sup_x$ outside the sum
     • $g(T) \lesssim \inf_{\{T_k\}} \sup_{x \in T} \sum_{k=1}^\infty 2^{k/2} \cdot d_{\ell_2}(x, T_k) \stackrel{\text{def}}{=} \gamma_2(T, \ell_2)$

     * An equivalent upper bound was proven by Fernique (who minimized some integral over all measures over $T$), but it was reformulated in terms of admissible sequences by Talagrand.
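
To make the admissible-sequence definition concrete, the sketch below builds one admissible sequence for a finite point set and evaluates $\sup_{x \in T} \sum_{k \ge 1} 2^{k/2} \cdot d_{\ell_2}(x, T_k)$ for it. The farthest-point ordering used to pick the $T_k$ is a heuristic chosen here for illustration and does not come from the talk; since $\gamma_2$ is an infimum over all admissible sequences, the printed value is only an upper bound on $\gamma_2(T, \ell_2)$ as defined on the slide. All function names and the random example set are assumptions.

```python
import math
import random

def dist(x, y):
    """Euclidean distance between two points given as lists."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def farthest_point_order(points):
    """Greedy ordering: each new index is the point farthest from those already chosen."""
    chosen = [0]
    gap = [dist(p, points[0]) for p in points]  # distance to the chosen set
    remaining = list(range(1, len(points)))
    while remaining:
        i = max(remaining, key=lambda j: gap[j])
        remaining.remove(i)
        chosen.append(i)
        gap = [min(gap[j], dist(points[j], points[i])) for j in range(len(points))]
    return chosen

def admissible_sequence(points):
    """Nested index sets T_0, ..., T_{k*} with |T_0| = 1, |T_k| <= 2^(2^k), T_{k*} = T."""
    order = farthest_point_order(points)
    levels = [order[:1]]
    k = 1
    while len(levels[-1]) < len(points):
        levels.append(order[:min(2 ** (2 ** k), len(points))])
        k += 1
    return levels

def chaining_functional(points, levels):
    """sup over x of sum_{k >= 1} 2^(k/2) * d(x, T_k) for the given levels."""
    best = 0.0
    for x in points:
        total = 0.0
        for k in range(1, len(levels)):
            d = min(dist(x, points[j]) for j in levels[k])
            total += 2 ** (k / 2) * d
        best = max(best, total)
    return best

# Hypothetical example: 150 random points inside the unit ball of R^3
rng = random.Random(3)
T = [[rng.uniform(-1.0, 1.0) / math.sqrt(3) for _ in range(3)] for _ in range(150)]

levels = admissible_sequence(T)
value = chaining_functional(T, levels)
print([len(level) for level in levels], round(value, 3))
```

Taking prefixes of a single ordering guarantees the nesting $T_0 \subseteq T_1 \subseteq \cdots$ automatically, which is why the sketch uses one ordering rather than building each level separately.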
