Showing $\widehat{S}(\xi)$ is close to $\widehat{Z}(\xi)$

Note: The Taylor expansion of $\widehat{X}(\xi/\sqrt{n})$ is valid only if $|\xi|$ is small.

Claim: For $|\xi| \le \sqrt{n}/100$, we have
\[ \big|\widehat{S}(\xi) - \widehat{Z}(\xi)\big| = O\Big(\frac{1}{\sqrt{n}}\,|\xi|^3 e^{-\xi^2/3}\Big). \]

Plugging this back into approximate Fourier inversion (with $T = \sqrt{n}/100$),
\[ \big|\Pr[S \le x] - \Pr[Z \le x]\big| \le \frac{1}{2\pi} \int_{\xi=-T}^{\xi=T} \frac{1}{\sqrt{n}}\,|\xi|^2 e^{-\xi^2/3}\, d\xi + O\Big(\frac{1}{T}\Big). \]
Proof technique: Split into the high-$|\xi|$ regime and the low-$|\xi|$ regime. In particular, define
\[ \Gamma_{\mathrm{low}} = \{\xi : |\xi| \le n^{1/6}\} \quad\text{and}\quad \Gamma_{\mathrm{high}} = \{\xi : n^{1/6} < |\xi| \le n^{1/2}/100\}. \]
When $\xi \in \Gamma_{\mathrm{low}}$, we apply Taylor's expansion – recall
\[ \widehat{S}(\xi) = \Big(1 - \frac{\xi^2}{2n} + o\big(|\xi|^2/n\big)\Big)^{n}. \]
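To make the low-regime step concrete, here is a sketch under the illustrative assumption that each summand is a uniform $\pm 1$ bit, so that $\widehat{X}(\xi) = \cos\xi$ (in the general case the error term is driven by the third moment instead of the $\xi^4/n$ below):
\[ \widehat{S}(\xi) = \Big(\cos\tfrac{\xi}{\sqrt{n}}\Big)^{n} = \exp\Big(n \log\cos\tfrac{\xi}{\sqrt{n}}\Big) = e^{-\xi^2/2}\,\exp\Big(O\big(\tfrac{\xi^4}{n}\big)\Big), \]
and since $\xi^4/n \le n^{-1/3}$ on $\Gamma_{\mathrm{low}}$,
\[ \big|\widehat{S}(\xi) - \widehat{Z}(\xi)\big| \le e^{-\xi^2/2}\Big(e^{O(\xi^4/n)} - 1\Big) = O\Big(\tfrac{\xi^4}{n}\, e^{-\xi^2/2}\Big) = O\Big(\tfrac{|\xi|^3}{\sqrt{n}}\, e^{-\xi^2/3}\Big), \]
using $\xi^4/n \le |\xi|^3/\sqrt{n}$ for $|\xi| \le \sqrt{n}$.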
On the other hand, it is not difficult to show that $|\widehat{S}(\xi)| \le e^{-\xi^2/3}$. Using the fact that $|\widehat{Z}(\xi)| = e^{-\xi^2/2}$, when $\xi \in \Gamma_{\mathrm{high}}$ this is enough: there $|\xi|^3/\sqrt{n} \ge 1$, so the triangle inequality $|\widehat{S}(\xi) - \widehat{Z}(\xi)| \le 2e^{-\xi^2/3}$ already implies the claimed bound.
Finishing the proof of Berry-Esséen

For all $|\xi| \le \sqrt{n}/100$, we have
\[ \big|\widehat{S}(\xi) - \widehat{Z}(\xi)\big| = O\Big(\frac{1}{\sqrt{n}}\,|\xi|^3 e^{-\xi^2/3}\Big). \]

Plugging this back into the approximate Fourier inversion formula, which is
\[ \big|\Pr[S \le x] - \Pr[Z \le x]\big| \le \frac{1}{2\pi} \int_{\xi=-T}^{\xi=T} \frac{|\widehat{S}(\xi) - \widehat{Z}(\xi)|}{|\xi|}\, d\xi + O\Big(\frac{1}{T}\Big), \]
we get $|\Pr[S \le x] - \Pr[Z \le x]| = O(n^{-1/2})$.
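Spelling out the final step: with $T = \sqrt{n}/100$, both terms are $O(n^{-1/2})$, since
\[ \frac{1}{2\pi} \int_{-T}^{T} \frac{1}{\sqrt{n}}\,|\xi|^2 e^{-\xi^2/3}\, d\xi \le \frac{1}{2\pi\sqrt{n}} \int_{-\infty}^{\infty} |\xi|^2 e^{-\xi^2/3}\, d\xi = O\Big(\frac{1}{\sqrt{n}}\Big) \quad\text{and}\quad O\Big(\frac{1}{T}\Big) = O\Big(\frac{1}{\sqrt{n}}\Big). \]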
Application of the Berry-Esséen theorem

Stochastic knapsack: Suppose you have $n$ items, each with a profit $c_i$ and a stochastic weight $X_i$, where each $X_i$ is a positive-valued random variable.

Goal: Given a knapsack with capacity $\theta$ and error tolerance probability $p$, pack a subset $S$ of items such that
\[ \Pr\Big[\sum_{j \in S} X_j \le \theta\Big] \ge 1 - p, \]
while maximizing the profit $\sum_{j \in S} c_j$.
Berry-Esséen theorem for stochastic knapsack

Stochastic knapsack: Suppose you have $n$ items, each with a profit $c_i$ and a stochastic weight $X_i$, where each $X_i$ is of the form
\[ X_i = \begin{cases} w_{\ell,i} & \text{w.p. } 1/2, \\ w_{h,i} & \text{w.p. } 1/2. \end{cases} \]
Here all $w_{\ell,i} \in \{1, \ldots, M/4\}$ and $w_{h,i} \in \{3M/4, \ldots, M\}$, where $M = \mathrm{poly}(n)$. Further, all profits $c_i \in \{1, \ldots, M\}$.
Algorithmic result for stochastic knapsack

Result: There is an algorithm which, for any error parameter $\epsilon > 0$, runs in time $\mathrm{poly}(M, n^{1/\epsilon^2})$ and outputs a set $S^*$ with $\sum_{j \in S^*} c_j = \mathrm{OPT}$ such that
\[ \Pr\Big[\sum_{j \in S^*} X_j \le \theta\Big] \ge 1 - p - \epsilon. \]

Key feature: We do not relax the knapsack capacity $\theta$.

See the SODA 2018 paper for the most general version of the results.
Main idea behind the algorithm

Observation I: If we "center" the random variable $X_i$, i.e., set $Y_i = X_i - \mathbf{E}[X_i]$, then it satisfies
\[ \mathbf{E}\big[|Y_i|^3\big] \le \big(\mathbf{E}\big[|Y_i|^2\big]\big)^{3/2}. \]
Thus, we can potentially apply Berry-Esséen to a sum of the $X_i$.
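In fact, for the two-point weights above the inequality holds with equality: the centered $Y_i$ takes the two values $\pm(w_{h,i} - w_{\ell,i})/2$ with probability $1/2$ each. A quick Python sanity check with hypothetical weights (the helper and the numbers are illustrative, not from the paper):

```python
import random

def centered_moments(w_lo, w_hi, trials=200_000):
    """Estimate E[|Y|^2] and E[|Y|^3] for Y = X - E[X], where
    X equals w_lo or w_hi with probability 1/2 each."""
    mean = (w_lo + w_hi) / 2
    ys = [random.choice((w_lo, w_hi)) - mean for _ in range(trials)]
    m2 = sum(y * y for y in ys) / trials
    m3 = sum(abs(y) ** 3 for y in ys) / trials
    return m2, m3

# Hypothetical weights for M = 100: w_lo <= M/4, w_hi >= 3M/4.
m2, m3 = centered_moments(20, 80)
print(m3, m2 ** 1.5)  # both ~ 27000, i.e. E[|Y|^3] = (E[|Y|^2])^(3/2)
```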
Observation II: Consider any subset of items $S$ with $|S| \ge 100/\epsilon^2$. Then,
\[ \max_{j \in S} \mathrm{Var}(X_j) \le \epsilon^2 \cdot \sum_{i \in S} \mathrm{Var}(X_i). \]
Algorithmic idea

Step 1: Either the optimum solution $S_{\mathrm{opt}}$ is such that $|S_{\mathrm{opt}}| \le 100/\epsilon^2$. In this case, we can brute-force search for $S_{\mathrm{opt}}$. The running time is $n^{\Theta(1/\epsilon^2)}$.

Step 2: Otherwise, $|S_{\mathrm{opt}}| > 100/\epsilon^2$. In this case, define $\mu_{\mathrm{opt}}$, $\sigma^2_{\mathrm{opt}}$, and $C_{\mathrm{opt}}$ as
(i) $\mu_{\mathrm{opt}} = \sum_{j \in S_{\mathrm{opt}}} \mathbf{E}[X_j]$; (ii) $\sigma^2_{\mathrm{opt}} = \sum_{j \in S_{\mathrm{opt}}} \mathrm{Var}(X_j)$; (iii) $C_{\mathrm{opt}} = \sum_{j \in S_{\mathrm{opt}}} c_j$.
Observe that $\mu_{\mathrm{opt}}$, $\sigma^2_{\mathrm{opt}}$, and $C_{\mathrm{opt}}$ are all integral multiples of $1/4$ bounded by $M^2$.

We use dynamic programming to find $S^*$ such that $C_* = C_{\mathrm{opt}}$, $\mu_* = \mu_{\mathrm{opt}}$, and $\sigma^2_* = \sigma^2_{\mathrm{opt}}$.
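A minimal sketch of such a dynamic program (my own simplification; the paper's version has more careful bookkeeping): sweep the items, maintaining every achievable triple $(C, \mu, \sigma^2)$ together with a witness set. Since all three coordinates are multiples of $1/4$ and polynomially bounded, the table stays polynomial in $M$ and $n$.

```python
from fractions import Fraction

def dp_profiles(items):
    """items: list of (c_i, E[X_i], Var(X_i)) triples given as Fractions
    (all multiples of 1/4). Returns a dict mapping each achievable
    (C, mu, sigma^2) profile to a witness subset of item indices."""
    table = {(Fraction(0), Fraction(0), Fraction(0)): []}
    for i, (c, mu, var) in enumerate(items):
        updated = dict(table)  # option 1: skip item i
        for (C, m, v), subset in table.items():
            key = (C + c, m + mu, v + var)  # option 2: take item i
            if key not in updated:
                updated[key] = subset + [i]
        table = updated
    return table

# One then scans the profiles in decreasing order of C and returns the first
# whose (mu, sigma^2) certify Pr[sum of weights <= theta] >= 1 - p - eps
# under the Gaussian approximation justified by Berry-Esseen.
```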
Consequence of the Berry-Esséen theorem:
\[ \Pr\Big[\sum_{j \in S^*} X_j \le \theta\Big] \ge \Pr\Big[\sum_{j \in S_{\mathrm{opt}}} X_j \le \theta\Big] - \epsilon. \]
This is because, by Berry-Esséen, the distributions of $\sum_{j \in S^*} X_j$ and $\sum_{j \in S_{\mathrm{opt}}} X_j$ are both essentially Gaussian (and their means and variances match).
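Spelled out (suppressing the constant bookkeeping the slides gloss over): writing $G \sim N(\mu_{\mathrm{opt}}, \sigma^2_{\mathrm{opt}})$, the triangle inequality for Kolmogorov distance gives
\[ \Big| \Pr\Big[\sum_{j \in S^*} X_j \le \theta\Big] - \Pr\Big[\sum_{j \in S_{\mathrm{opt}}} X_j \le \theta\Big] \Big| \le d_K\Big(\sum_{j \in S^*} X_j,\, G\Big) + d_K\Big(\sum_{j \in S_{\mathrm{opt}}} X_j,\, G\Big), \]
and each term on the right is $O(\epsilon)$ by Berry-Esséen combined with Observation II.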
General algorithmic result for stochastic optimization

Suppose the item sizes $\{X_i\}_{i=1}^n$ are all hypercontractive – i.e.,
\[ \mathbf{E}\big[|X_i|^3\big] \le O(1) \cdot \big(\mathbf{E}\big[|X_i|^2\big]\big)^{3/2}. \]

Theorem: When item sizes are hypercontractive, there is an algorithm running in time $n^{O(1/\epsilon^2)}$ such that the output set $S^*$ satisfies
1. $\sum_{j \in S^*} c_j \ge (1 - \epsilon) \cdot \big(\sum_{j \in S_{\mathrm{opt}}} c_j\big)$.
2. $\Pr\big[\sum_{j \in S^*} X_j \le \theta\big] \ge \Pr\big[\sum_{j \in S_{\mathrm{opt}}} X_j \le \theta\big] - \epsilon$.

Read the SODA 2018 paper for more details.
Central limit theorems: Citius, Altius, Fortius

Let's do Altius – as in higher-degree polynomials.

Berry-Esséen says that a sum of independent random variables, under mild conditions, converges to a Gaussian.

What if we replace the sum by a polynomial? Let us think of the easy case when the degree is 2.
Central limit theorem for low-degree polynomials

Consider $p(x) = \big(\frac{x_1 + \cdots + x_n}{\sqrt{n}}\big)^2$. As $n \to \infty$, with $x_1, \ldots, x_n$ i.i.d. copies of unbiased $\pm 1$ random variables, the distribution of $p(x)$ goes to a $\chi^2$ distribution.
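A quick Monte Carlo illustration of this claim (illustrative code, not part of the original slides): by the CLT the inner sum behaves like a standard Gaussian $g$, so $p(x) \approx g^2 \sim \chi^2_1$, which has mean $1$ and satisfies $\Pr[\chi^2_1 \le 1] \approx 0.683$.

```python
import random

def sample_p(n):
    """One draw of p(x) = ((x_1 + ... + x_n) / sqrt(n))^2, x_i uniform in {-1, 1}."""
    s = sum(random.choice((-1, 1)) for _ in range(n))
    return (s / n ** 0.5) ** 2

n, trials = 400, 20_000
draws = [sample_p(n) for _ in range(trials)]
print(sum(draws) / trials)                  # ~ 1.0, the mean of chi^2 with 1 dof
print(sum(d <= 1 for d in draws) / trials)  # ~ 0.683 = Pr[|g| <= 1] for g ~ N(0,1)
```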
In fact, suppose $p(x)$ is of degree 2 and of the following form:
\[ p(x) = \lambda \cdot \ell^2(x) + q(x), \]
where $\ell(x)$ is a linear form and $\lambda = \mathbf{E}[p(x) \cdot \ell^2(x)]$. If $\lambda$ is large, then $p(x)$ is very far from a Gaussian.
Central limit theorem for quadratic polynomials

Theorem: Let $p(x) : \mathbb{R}^n \to \mathbb{R}$ be such that $\mathrm{Var}(p(x)) = 1$ and $\mathbf{E}[p(x)] = \mu$. Express $p(x) = x^T A x + \langle b, x \rangle + c$, where $A \in \mathbb{R}^{n \times n}$ and $b \in \mathbb{R}^n$. Let $\|A\|_{\mathrm{op}} \le \epsilon$ and $\|b\|_\infty \le \epsilon$, and suppose $x \sim \{-1, 1\}^n$. Then,
\[ d_K\big(p(x), N(\mu, 1)\big) = O(\sqrt{\epsilon}). \]

In words: if $p(x)$ is not (noticeably) correlated with the product of two linear forms, then it is distributed approximately as a Gaussian.
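A hedged numerical sketch of the theorem (my own construction, with hypothetical parameters): draw a random multilinear quadratic, normalize it to unit variance, check that $\|A\|_{\mathrm{op}}$ and $\|b\|_\infty$ are small, and estimate the Kolmogorov distance to $N(0,1)$ empirically.

```python
import numpy as np
from math import erf

rng = np.random.default_rng(0)
n = 200
# Random multilinear quadratic p(x) = sum_{i<j} A_ij x_i x_j + <b, x>.
A = np.triu(rng.normal(size=(n, n)), k=1) / n   # strictly upper triangular
b = rng.normal(size=n) / np.sqrt(n)
# Over uniform x in {-1,1}^n the monomials are orthonormal, so
# Var(p) = sum A_ij^2 + sum b_i^2; rescale to make Var(p) = 1.
scale = np.sqrt(np.sum(A ** 2) + np.sum(b ** 2))
A, b = A / scale, b / scale

sym = (A + A.T) / 2                            # symmetric representation of the quadratic
print("||A||_op  =", np.linalg.norm(sym, 2))   # small
print("||b||_inf =", np.abs(b).max())          # small

x = rng.choice([-1.0, 1.0], size=(20_000, n))
p = np.sum((x @ A) * x, axis=1) + x @ b        # samples of p(x); E[p] = 0

# Empirical Kolmogorov distance to N(0, 1).
grid = np.sort(p)
emp = np.arange(1, len(grid) + 1) / len(grid)
cdf = 0.5 * (1 + np.vectorize(erf)(grid / np.sqrt(2)))
print("d_K ~", np.abs(emp - cdf).max())        # small, consistent with O(sqrt(eps))
```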
Central limit theorem for higher-degree polynomials

Corresponding to any multilinear polynomial $p : \mathbb{R}^n \to \mathbb{R}$ of degree $d$, we have a sequence of tensors $(A_d, \ldots, A_0)$, where $A_i \in (\mathbb{R}^n)^{\otimes i}$ is a tensor of order $i$.

For a tensor $A_i$ (where $i > 1$), we use $\sigma_{\max}(A_i)$ to denote the "maximum singular value" obtained by a non-trivial flattening.

Theorem: Let $p : \mathbb{R}^n \to \mathbb{R}$ be a degree-$d$ polynomial with $\mathrm{Var}(p(x)) = 1$ and $\mathbf{E}[p(x)] = \mu$. Let $(A_d, \ldots, A_0)$ denote the tensors corresponding to $p$. Then,
\[ d_K\big(p(x), N(\mu, 1)\big) = O_d(\sqrt{\epsilon}), \]
where $x \sim \{-1, 1\}^n$. Here $\epsilon \ge \max_{j > 1} \sigma_{\max}(A_j)$ and $\epsilon \ge \|A_1\|_\infty$.
Features of the central limit theorem

1. Qualitatively tight: in particular, if $\max_{j > 1} \sigma_{\max}(A_j)$ is large for a polynomial, then the distribution of $p(x)$ does not look like a Gaussian.
2. $\max_{j > 1} \sigma_{\max}(A_j)$ essentially captures the correlation of $p(x)$ with a product of two lower-degree polynomials.
3. The condition for convergence to normality is efficiently checkable (as the sketch below illustrates).
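Here is a minimal sketch of why point 3 holds (the helper is written for this note, not taken from any paper): each non-trivial flattening partitions the $i$ axes of $A_i$ into two non-empty groups and reshapes the tensor into a matrix, whose largest singular value is computable in polynomial time.

```python
import numpy as np
from itertools import combinations

def sigma_max(tensor):
    """Largest singular value over all non-trivial flattenings of a tensor
    with all sides of length n: each flattening sends one group of axes to
    the rows and the complementary group to the columns."""
    order, n = tensor.ndim, tensor.shape[0]
    best = 0.0
    for r in range(1, order // 2 + 1):  # a flattening and its transpose match
        for rows in combinations(range(order), r):
            cols = tuple(ax for ax in range(order) if ax not in rows)
            mat = np.transpose(tensor, rows + cols).reshape(n ** r, -1)
            best = max(best, np.linalg.norm(mat, 2))  # top singular value
    return best

# Example: an all-ones order-3 tensor, scaled like the cubic part of
# ((x_1 + ... + x_n)/sqrt(n))^3; sigma_max = 1, i.e. far from the
# small-eps regime, detecting that a cubed Gaussian is non-Gaussian.
n = 25
print(sigma_max(np.ones((n, n, n)) / n ** 1.5))  # = 1.0
```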
Proof of the central limit theorem

• The first step is to go from $x \sim \{-1,1\}^n$ to $x \sim N(0,1)^n$. This is accomplished via the invariance principle.
• Once in the Gaussian domain, the question is: when does a polynomial of a Gaussian look like a Gaussian?
• Proof technique: Stein's method + Malliavin calculus.
Central limit theorem – application in derandomization

• Derandomization – fertile ground both for applications and for the discovery of central limit theorems (in computer science).
• Why is the central limit theorem useful for derandomization?
• Example: Suppose we are given a halfspace $f : \{-1,1\}^n \to \{-1,1\}$, where $f(x) = \mathrm{sign}\big(\sum_{i=1}^n w_i x_i - \theta\big)$. Deterministically compute $\Pr_{x \in \{-1,1\}^n}[f(x) = 1]$.
Derandomizing halfspaces

• For $f(x) = \mathrm{sign}\big(\sum_{i=1}^n w_i x_i - \theta\big)$, exactly computing $\Pr_{x \in \{-1,1\}^n}[f(x) = 1]$ is #P-hard.
• Computing $\Pr_{x \in \{-1,1\}^n}[f(x) = 1]$ to additive error $\epsilon$ is trivial using randomness.
• What can we do deterministically? Or: how are CLTs going to be useful?
• [Servedio 2007]: Suppose all the $|w_i| \le \epsilon/100$ (where $\|w\|_2 = 1$).
Berry-Esséen in action

\[ \Pr_{x \in \{-1,1\}^n}\Big[\sum_{i=1}^n w_i x_i - \theta \ge 0\Big] \approx_\epsilon \Pr_{g \sim N(0,1)}\big[g - \theta \ge 0\big]. \]

• However, $\Pr_{g \sim N(0,1)}[g - \theta \ge 0]$ can be computed in $O_\epsilon(1)$ time.
• Thus, when all $|w_i| \le \epsilon/100$, $\Pr_{x \in \{-1,1\}^n}\big[\sum_{i=1}^n w_i x_i - \theta \ge 0\big]$ can be computed to $\pm\epsilon$ in $O_\epsilon(1) \cdot \mathrm{poly}(n)$ time.
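A minimal Python sketch of this regular case (assuming $\|w\|_2 = 1$ and $\max_i |w_i| \le \epsilon/100$): by Berry-Esséen the sum behaves like a single $N(0,1)$ Gaussian, so the probability reduces to a closed-form Gaussian tail.

```python
from math import erf, sqrt

def regular_halfspace_prob(theta):
    """Approximates Pr_{x ~ {-1,1}^n}[sum_i w_i x_i - theta >= 0] for a
    regular weight vector (||w||_2 = 1, max |w_i| <= eps/100): Berry-Esseen
    lets us replace the sum by g ~ N(0,1), and Pr[g >= theta] = 1 - Phi(theta)."""
    return 0.5 * (1.0 - erf(theta / sqrt(2.0)))

print(regular_halfspace_prob(0.0))  # 0.5: a threshold at the mean
print(regular_halfspace_prob(1.0))  # ~ 0.159: one standard deviation out
```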
What if $\max |w_i| \ge \epsilon/100$?

• Suppose $|w_1| \ge \epsilon/100$. We recurse on the variable $x_1$:
\[ f_{x_1 = 1} = \mathrm{sign}\Big(\sum_{i=2}^n w_i x_i - \theta + w_1\Big); \qquad f_{x_1 = -1} = \mathrm{sign}\Big(\sum_{i=2}^n w_i x_i - \theta - w_1\Big). \]
• Observe that it suffices to compute
\[ \frac{1}{2} \cdot \Big( \Pr_{x \in \{-1,1\}^{n-1}}[f_{x_1 = 1}(x) = 1] + \Pr_{x \in \{-1,1\}^{n-1}}[f_{x_1 = -1}(x) = 1] \Big). \]
Berry-Esséen in recursive action

• Either $\max_{j \ge 2} |w_j| \le (\epsilon/100) \cdot \sqrt{\sum_{i=2}^n w_i^2}$.
• If yes, we can apply the Berry-Esséen theorem.
• Else, we restrict $x_2$. Note: every time we restrict a variable, we capture an $\epsilon$-fraction of the remaining $\ell_2$ mass.
• Suppose the process goes on for $j$ iterations. Either $j \le \epsilon^{-1} \log(1/\epsilon)$ or $j > \epsilon^{-1} \log(1/\epsilon)$.
• If $j \le \epsilon^{-1} \log(1/\epsilon)$, then this reduces the problem to $\exp(1/\epsilon)$ subproblems – each of which can be solved using Berry-Esséen.
• If $j > \epsilon^{-1} \log(1/\epsilon)$, we simply stop at $j = \epsilon^{-1} \log(1/\epsilon)$: the top $\epsilon^{-1} \log(1/\epsilon)$ weights capture most of the $\ell_2$ mass.
• Non-trivial: Since $\epsilon^{-1} \log(1/\epsilon)$ weights capture most of the mass of the vector $w$, we can just consider the halfspace over these variables. (A sketch of the whole procedure follows.)
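Putting the pieces together, here is a hedged Python sketch of the whole procedure (my own simplification of the bookkeeping; the thresholds and depth cap follow the slides): restrict the largest remaining weight until the tail is regular or the depth cap is hit, then invoke the Gaussian approximation.

```python
from math import erf, sqrt, log, ceil

def deterministic_count(w, theta, eps):
    """Sketch of the recursive derandomization for
    f(x) = sign(sum_i w_i x_i - theta) over uniform x in {-1,1}^n.
    Assumes w is sorted by decreasing |w_i| with ||w||_2 = 1.
    Returns an additive O(eps)-approximation of Pr[f(x) = 1]."""
    max_depth = ceil(log(1.0 / eps) / eps)

    def recurse(i, shift, depth):
        tail = w[i:]
        norm = sqrt(sum(v * v for v in tail))
        if norm == 0.0:  # every variable restricted: f is constant
            return 1.0 if shift - theta >= 0 else 0.0
        if abs(tail[0]) <= (eps / 100) * norm or depth >= max_depth:
            # Regular tail (or depth cap, where the restricted weights already
            # hold most of the l2 mass): by Berry-Esseen the tail sum is close
            # to N(0, norm^2), so return Pr[N(0, norm^2) >= theta - shift].
            return 0.5 * (1.0 - erf((theta - shift) / (norm * sqrt(2.0))))
        # Irregular head: average over both restrictions of the next variable.
        return 0.5 * (recurse(i + 1, shift + tail[0], depth + 1)
                      + recurse(i + 1, shift - tail[0], depth + 1))

    return recurse(0, 0.0, 0)

# Toy run with a hypothetical weight vector (normalized so ||w||_2 = 1).
w = [0.6, 0.5, 0.4] + [0.08] * 30
s = sqrt(sum(v * v for v in w))
print(deterministic_count([v / s for v in w], 0.25, 0.25))
```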