Two Useful Arrows in that Quiver
Clément Canonne
FOCS Workshop, November 9, 2019
Averaging, Bucketing, and Investing arguments
Suppose you have a function a : X → [0,1] such that E_x[a(x)] ≥ ε. (Let's say you already proved that.) We think of a(x) as the quality of x, and "using" x has cost cost(a(x)).

For instance: a population of coins, each with its own bias. The expected bias is ε; for any given coin, checking bias 0 vs. bias α takes ≈ 1/α² tosses. Goal: find a biased coin.
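A tiny illustration of this setup in code (the population and the numbers, including ε = 0.1 and the four-point bias distribution, are hypothetical choices of ours):

```python
import random

def toss(bias, n):
    """Toss a coin with heads probability `bias` n times; return #heads."""
    return sum(random.random() < bias for _ in range(n))

# Hypothetical population: the quality a(x) of a coin x is its bias.
# Most coins are fair (bias 0 here), a few are noticeably biased,
# and the *average* bias is eps -- exactly the situation above.
random.seed(0)
eps = 0.1
population = [random.choice([0.0, 0.0, 0.0, 0.4]) for _ in range(10_000)]
avg_bias = sum(population) / len(population)  # ~ eps
# Checking "bias 0 vs. bias alpha" for one coin costs ~ 1/alpha^2 tosses,
# so *which* coins we decide to test matters a lot.
```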
How... to convert this into a useful thing? How do we find an x with small cost? That is, can we get Pr_x[ a(x) ≥ blah(ε) ] ≥ bluh(ε) for some "good" functions blah, bluh?
"By a standard averaging argument..." First attempt: Markov

Lemma (Markov). We have
  Pr_x[ a(x) ≥ ε/2 ] ≥ ε/2.  (1)

Proof.
  ε ≤ E[a(x)] ≤ (ε/2) · Pr_x[ a(x) < ε/2 ] + 1 · Pr_x[ a(x) ≥ ε/2 ] ≤ ε/2 + Pr_x[ a(x) ≥ ε/2 ].
"By a standard averaging argument..." First attempt: Markov

Strategy. Sample O(1/ε) x's to find a "good" one; for each, pay cost(ε/2).

Yes, but... Typically, at least quadratic total cost in 1/ε, as cost(α) = Ω(1/α). We should not pay the worst of both worlds.
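A minimal sketch of this naive strategy's budget, assuming the cost model cost(α) = 1/α² from the coin example (the helper name and the O(1/ε) = 2/ε constant are ours):

```python
def markov_strategy_cost(eps, cost=lambda a: 1.0 / a**2):
    """Naive averaging strategy: draw O(1/eps) candidates (so that some
    draw has a(x) >= eps/2 with constant probability, by the Markov
    lemma), and pay the worst-case testing cost cost(eps/2) for each."""
    num_candidates = int(round(2 / eps))  # O(1/eps) draws
    per_candidate = cost(eps / 2)         # each test pays cost(eps/2)
    return num_candidates * per_candidate

# With cost(a) = 1/a^2 this is (2/eps) * (4/eps^2) = 8/eps^3:
# cubic in 1/eps -- the worst of both worlds lamented above.
```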
"By a standard bucketing argument..." Second attempt: my bucket list

Lemma (Bucketing). There exists 1 ≤ j ≤ ⌈log(2/ε)⌉ =: L such that
  Pr_x[ a(x) ≥ 2^{−j} ] ≥ 2^j ε / (4L).  (2)

Proof. Define buckets B_0 := { x : a(x) ≤ ε/2 } and B_j := { x : 2^{−j} ≤ a(x) < 2^{−j+1} } for 1 ≤ j ≤ L. Then
  ε ≤ E[a(x)] ≤ (ε/2) · Pr[x ∈ B_0] + Σ_{j=1}^{L} 2^{−j+1} · Pr[x ∈ B_j] ≤ ε/2 + Σ_{j=1}^{L} 2^{−j+1} · Pr[x ∈ B_j],
so (averaging!) there exists j* such that 2^{−j*+1} · Pr[x ∈ B_{j*}] ≥ ε/(2L).

Strategy. For each j ∈ [L], in case it's the good bucket:
• sample O(log(1/ε)/(2^j ε)) x's to find a "good" one in B_j;
• for each such x, pay cost(2^{−j}).

Total cost (examples):
  Σ_{j=1}^{L} (log(1/ε)/(2^j ε)) · cost(2^{−j}) ≍ log²(1/ε)/ε if cost(α) ≍ 1/α, and ≍ log(1/ε)/ε² if cost(α) ≍ 1/α².

Yes, but... we lose log factors. Do we have to lose log factors?
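To see the log factors concretely, here is a sketch (our own helper, just a plain summation of the displayed bound) tabulating the bucketed strategy's total cost under both cost models:

```python
import math

def bucketing_total_cost(eps, cost):
    """Total cost of the bucketing strategy: for each bucket j = 1..L,
    O(log(1/eps) / (2^j eps)) samples, each paying cost(2^-j)."""
    L = math.ceil(math.log2(2 / eps))
    return sum(math.log2(1 / eps) / (2**j * eps) * cost(2.0**-j)
               for j in range(1, L + 1))

eps = 0.01
linear = bucketing_total_cost(eps, lambda a: 1 / a)        # ~ log^2(1/eps)/eps
quadratic = bucketing_total_cost(eps, lambda a: 1 / a**2)  # ~ log(1/eps)/eps^2
```

For ε = 0.01 this gives roughly 5.3·10³ and 3.4·10⁵ respectively; both carry the extra logarithmic factors the slide laments.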
"By a refined averaging argument..." Third (and last) attempt: strategic investment

Assume that cost(α) is superlinear, e.g., cost(α) = 1/α².

Lemma (Levin's Economical Work Investment Strategy). There exists 1 ≤ j ≤ ⌈log(2/ε)⌉ =: L such that
  Pr_x[ a(x) ≥ 2^{−j} ] ≥ 2^j ε / (8(L+1−j)²).  (3)

Proof. By contradiction: if the inequality fails for every j, then
  ε ≤ E[a(x)] ≤ ε/2 + Σ_{j=1}^{L} 2^{−j+1} · Pr_x[ a(x) ≥ 2^{−j} ]
    < ε/2 + Σ_{j=1}^{L} 2^{−j+1} · 2^j ε/(8(L+1−j)²)
    = ε/2 + (ε/4) Σ_{ℓ=1}^{L} 1/ℓ²
    < ε/2 + (ε/4) Σ_{ℓ=1}^{∞} 1/ℓ² < ε.
"Oops."

Strategy. For each j ∈ [L]:
• sample O((L+1−j)²/(2^j ε)) x's to find a "good" one in B_j;
• for each such x, pay cost(2^{−j}) ≍ 2^{2j}.

Total cost:
  Σ_{j=1}^{L} ((L+1−j)²/(2^j ε)) · 2^{2j} = (1/ε) Σ_{j=1}^{L} (L+1−j)² · 2^j = (2^{L+1}/ε) Σ_{ℓ=1}^{L} ℓ² · 2^{−ℓ} < (4/ε²) Σ_{ℓ=1}^{∞} ℓ² · 2^{−ℓ} = O(1/ε²).
(The last sum is 6.)

Yes, but... No, actually, nothing. This works for any cost(α) ≫ 1/α^{1+δ}. For cost(α) ≍ 1/α it's not so easy, but some results exist.
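The same tabulation for the work-investment schedule, again assuming cost(α) = 1/α² (helper name ours):

```python
import math

def investment_total_cost(eps):
    """Total cost of the work investment strategy with cost(a) = 1/a^2:
    for each bucket j = 1..L, O((L+1-j)^2 / (2^j eps)) samples, each
    paying cost(2^-j) = 2^(2j). The (L+1-j)^2 weights shrink exactly
    as the per-sample cost grows, so the series converges."""
    L = math.ceil(math.log2(2 / eps))
    return sum((L + 1 - j)**2 / (2**j * eps) * 2**(2 * j)
               for j in range(1, L + 1))

eps = 0.01
total = investment_total_cost(eps)
# total * eps^2 stays bounded (around 30 here), i.e. the total cost is
# O(1/eps^2) with no log factors -- unlike the bucketing attempt.
```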
Thomas’ Favorite Lemma
Kullback–Leibler Divergence

Recall the definition of the Kullback–Leibler divergence (a.k.a. relative entropy) between two discrete distributions p, q:
  D(p‖q) = Σ_ω p(ω) log( p(ω)/q(ω) ).

It has some issues (no symmetry, no triangle inequality), yes, but it is everywhere (for a reason). It also has many nice properties.
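The definition in code, on finite supports (a small helper of ours; the conventions 0·log 0 = 0 and D = ∞ when p is not absolutely continuous w.r.t. q are the standard ones):

```python
import math

def kl_divergence(p, q):
    """D(p || q) = sum_w p(w) * log(p(w) / q(w)), for distributions given
    as lists of probabilities over the same finite support (natural log)."""
    d = 0.0
    for pw, qw in zip(p, q):
        if pw > 0:
            if qw == 0:
                return math.inf  # p not absolutely continuous w.r.t. q
            d += pw * math.log(pw / qw)
    return d

# The asymmetry mentioned above, on a two-point example:
p, q = [0.5, 0.5], [0.9, 0.1]
# kl_divergence(p, q) ~ 0.511, while kl_divergence(q, p) ~ 0.368.
```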
Kullback–Leibler Divergence: the dual characterization

Theorem (First). For every p ≪ q,
  D(p‖q) = sup_f { E_{x∼p}[f(x)] − log E_{x∼q}[e^{f(x)}] }.  (4)

Theorem (Second). For every p and every λ,
  log E_{x∼p}[e^{λx}] = max_{q ≪ p} { λ E_{x∼q}[x] − D(q‖p) }.  (5)

Known as: Gibbs variational principle (1902?), Donsker–Varadhan (1975), a special case of Fenchel duality, ...
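A quick numerical sanity check of (4) on a finite support (the example distributions are ours): the supremum is attained at f = log(p/q), where the log-moment term vanishes, and any other f does worse.

```python
import math

p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]

# D(p || q) straight from the definition.
kl = sum(pw * math.log(pw / qw) for pw, qw in zip(p, q))

def dv_objective(f):
    """E_{x~p}[f(x)] - log E_{x~q}[e^{f(x)}], the inner expression of (4)."""
    return (sum(pw * fw for pw, fw in zip(p, f))
            - math.log(sum(qw * math.exp(fw) for qw, fw in zip(q, f))))

# Optimal choice f = log(p/q): then E_q[e^f] = sum_w p(w) = 1, so the
# objective collapses to E_p[log(p/q)] = D(p || q).
f_star = [math.log(pw / qw) for pw, qw in zip(p, q)]
# Any other f gives a strictly smaller value, e.g.:
f_other = [0.1, -0.2, 0.3]
```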
An application

Theorem. Suppose p is subgaussian on R^d. For every function a : R^d → [0,1] (with α := E_{x∼p}[a(x)] > 0),
  ‖ E_{x∼p}[x · a(x)] ‖_2 ≤ C_p · α · √(log(1/α))  (6)
(the constant C_p depends on the subgaussian parameter, not on d).

The proof that follows was communicated to me by Himanshu Tyagi.
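A one-dimensional sanity check of (6) (our own script, not part of the talk): for p = N(0,1) and a(x) = 1{x > t}, both sides are in closed form, since E_{x∼p}[x·a(x)] = φ(t) (the standard normal density) and α = Pr[x > t], and the ratio of the left side to α√(log(1/α)) stays bounded.

```python
import math

def phi(t):
    """Standard normal density."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

def gaussian_tail(t):
    """alpha = Pr[N(0,1) > t]."""
    return 0.5 * math.erfc(t / math.sqrt(2))

# For a(x) = 1{x > t}: E[x * a(x)] = phi(t) and alpha = gaussian_tail(t).
# The theorem predicts phi(t) <= C * alpha * sqrt(log(1/alpha)) for all t.
ratios = []
for t in [0.5, 1.0, 2.0, 4.0, 8.0]:
    alpha = gaussian_tail(t)
    ratios.append(phi(t) / (alpha * math.sqrt(math.log(1 / alpha))))
# The ratios increase towards sqrt(2) ~ 1.414 but remain bounded,
# consistent with a dimension-free constant C_p.
```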