Large Deviation Bounds

1. Large Deviation Bounds

A typical probability theory statement:

Theorem (The Central Limit Theorem). Let $X_1, \dots, X_n$ be independent identically distributed random variables with common mean $\mu$ and variance $\sigma^2$. Then
$$\lim_{n \to \infty} \Pr\left( \frac{\frac{1}{n}\sum_{i=1}^{n} X_i - \mu}{\sigma/\sqrt{n}} \le z \right) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-t^2/2}\, dt .$$

A typical CS probabilistic tool:

Theorem (Chernoff Bound). Let $X_1, \dots, X_n$ be independent Bernoulli random variables such that $\Pr(X_i = 1) = p_i$. Let $\mu = \frac{1}{n}\sum_{i=1}^{n} p_i$; then
$$\Pr\left( \frac{1}{n}\sum_{i=1}^{n} X_i \ge (1+\delta)\mu \right) \le e^{-\mu n \delta^2 / 3} .$$

2. Chernoff's vs. Chebyshev's Inequality

Assume that for all $i$ we have $p_i = p$ and $1 - p_i = q$. Then
$$\mu = E[X] = np , \qquad \mathrm{Var}[X] = npq .$$

Chebyshev's Inequality gives
$$\Pr(|X - \mu| > \delta\mu) \le \frac{npq}{\delta^2 \mu^2} = \frac{npq}{\delta^2 n^2 p^2} = \frac{q}{\delta^2 \mu} ,$$
while the Chernoff bound gives
$$\Pr(|X - \mu| > \delta\mu) \le 2 e^{-\mu \delta^2 / 3} .$$
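To see the gap between the two bounds concretely, here is a minimal numeric sketch (not from the slides; the choice of a fair coin, $\delta = 0.1$, and the values of $n$ are illustrative assumptions):

```python
import math

p = q = 0.5      # fair coin: p_i = p, q = 1 - p
delta = 0.1      # relative deviation

for n in [100, 1_000, 10_000]:
    mu = n * p
    chebyshev = q / (delta**2 * mu)              # Chebyshev: q / (delta^2 * mu)
    chernoff = 2 * math.exp(-mu * delta**2 / 3)  # Chernoff: 2 e^{-mu delta^2 / 3}
    print(f"n={n:6d}  Chebyshev <= {chebyshev:.3g}   Chernoff <= {chernoff:.3g}")
```

The Chebyshev bound decays only polynomially in $n$, while the Chernoff bound decays exponentially.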

3. The Basic Idea of Large Deviation Bounds

For any random variable $X$, by Markov's inequality we have: for any $t > 0$,
$$\Pr(X \ge a) = \Pr(e^{tX} \ge e^{ta}) \le \frac{E[e^{tX}]}{e^{ta}} .$$
Similarly, for any $t < 0$,
$$\Pr(X \le a) = \Pr(e^{tX} \ge e^{ta}) \le \frac{E[e^{tX}]}{e^{ta}} .$$

Theorem (Markov's Inequality). If a random variable $X$ is non-negative ($X \ge 0$), then
$$\Pr(X \ge a) \le \frac{E[X]}{a} .$$

4. The General Scheme

We obtain specific bounds for particular conditions/distributions by:

1. computing $E[e^{tX}]$;
2. optimizing over $t$:
$$\Pr(X \ge a) \le \min_{t > 0} \frac{E[e^{tX}]}{e^{ta}} , \qquad \Pr(X \le a) \le \min_{t < 0} \frac{E[e^{tX}]}{e^{ta}} ;$$
3. simplifying.
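As an illustration of steps 1 and 2, the sketch below (not from the slides; the choice $X \sim \mathrm{Binomial}(20, 1/2)$ and the tail point $a = 15$ are assumptions made for the example) evaluates the moment bound $E[e^{tX}]/e^{ta}$ on a grid of $t > 0$, minimizes it, and compares the result with the exact tail probability:

```python
import math
from math import comb

n, p, a = 20, 0.5, 15          # X ~ Binomial(20, 1/2), tail event {X >= 15}

def moment_bound(t):
    # E[e^{tX}] / e^{ta}; for a Binomial, E[e^{tX}] = (1 - p + p e^t)^n
    return (1 - p + p * math.exp(t))**n / math.exp(t * a)

ts = [k / 1000 for k in range(1, 3001)]          # grid over (0, 3]
t_best = min(ts, key=moment_bound)

exact_tail = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(a, n + 1))
print(f"best t = {t_best:.3f}, optimized bound = {moment_bound(t_best):.4f}, "
      f"exact Pr(X >= {a}) = {exact_tail:.4f}")
```

Every $t > 0$ gives a valid upper bound; minimizing over $t$ just gives the tightest one available from this family.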

5. Chernoff Bound - Large Deviation Bound

Theorem. Let $X_1, \dots, X_n$ be independent, identically distributed $0$-$1$ random variables with $\Pr(X_i = 1) = E[X_i] = p$. Let $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$; then for any $\delta \in [0, 1]$ we have
$$\Pr(\bar{X}_n \ge (1+\delta) p) \le e^{-np\delta^2/3}$$
and
$$\Pr(\bar{X}_n \le (1-\delta) p) \le e^{-np\delta^2/2} .$$

6. Chernoff Bound - Large Deviation Bound

Theorem. Let $X_1, \dots, X_n$ be independent $0$-$1$ random variables with $\Pr(X_i = 1) = E[X_i] = p_i$. Let $\mu = \sum_{i=1}^{n} p_i$; then for any $\delta \in [0, 1]$ we have
$$\Pr\left( \sum_{i=1}^{n} X_i \ge (1+\delta)\mu \right) \le e^{-\mu\delta^2/3}$$
and
$$\Pr\left( \sum_{i=1}^{n} X_i \le (1-\delta)\mu \right) \le e^{-\mu\delta^2/2} .$$

7. Consider $n$ coin flips. Let $X$ be the number of heads.

Markov's inequality gives
$$\Pr\left( X \ge \tfrac{3n}{4} \right) \le \frac{n/2}{3n/4} \le \frac{2}{3} .$$

Using Chebyshev's bound we have
$$\Pr\left( \left| X - \tfrac{n}{2} \right| \ge \tfrac{n}{4} \right) \le \frac{4}{n} .$$

Using the Chernoff bound in this case, we obtain
$$\Pr\left( \left| X - \tfrac{n}{2} \right| \ge \tfrac{n}{4} \right)
= \Pr\left( X \ge \tfrac{n}{2}\left(1 + \tfrac{1}{2}\right) \right) + \Pr\left( X \le \tfrac{n}{2}\left(1 - \tfrac{1}{2}\right) \right)
\le e^{-\frac{1}{3}\cdot\frac{n}{2}\cdot\frac{1}{4}} + e^{-\frac{1}{2}\cdot\frac{n}{2}\cdot\frac{1}{4}}
\le 2 e^{-n/24} .$$
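A small numeric comparison of the three bounds for a concrete $n$ (the value $n = 100$ is an assumption made for this sketch, not from the slides), against the exact binomial tail probability:

```python
import math
from math import comb

n = 100                                          # number of fair coin flips
# exact Pr(|X - n/2| >= n/4) for X ~ Binomial(n, 1/2)
exact = sum(comb(n, k) for k in range(n + 1)
            if abs(k - n / 2) >= n / 4) / 2**n

markov    = (n / 2) / (3 * n / 4)   # bounds the one-sided event {X >= 3n/4} only
chebyshev = 4 / n
chernoff  = 2 * math.exp(-n / 24)

print(f"exact={exact:.2e}  Markov<={markov:.2f}  "
      f"Chebyshev<={chebyshev:.2f}  Chernoff<={chernoff:.2e}")
```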

8. Moment Generating Function

Definition. The moment generating function of a random variable $X$ is defined for any real value $t$ as
$$M_X(t) = E[e^{tX}] .$$

9. Theorem. Let $X$ be a random variable with moment generating function $M_X(t)$. Assuming that exchanging the expectation and differentiation operands is legitimate, for all $n \ge 1$
$$E[X^n] = M_X^{(n)}(0) ,$$
where $M_X^{(n)}(0)$ is the $n$-th derivative of $M_X(t)$ evaluated at $t = 0$.

Proof. $M_X^{(n)}(t) = E[X^n e^{tX}]$. Evaluated at $t = 0$ this gives $M_X^{(n)}(0) = E[X^n]$.
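A quick symbolic illustration of this theorem (not from the slides; the choice of a Bernoulli($p$) variable is an assumption for the example): $M_X(t) = 1 - p + p e^t$, and differentiating at $t = 0$ recovers the moments, which for a $0$-$1$ variable all equal $p$.

```python
import sympy as sp

t, p = sp.symbols('t p')
M = 1 - p + p * sp.exp(t)          # MGF of a Bernoulli(p) random variable

for n in (1, 2, 3):
    moment = sp.diff(M, t, n).subs(t, 0)       # n-th derivative at t = 0
    print(f"E[X^{n}] =", sp.simplify(moment))  # prints p each time, since X^n = X for 0-1 X
```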

10. Theorem. Let $X$ and $Y$ be two random variables. If $M_X(t) = M_Y(t)$ for all $t \in (-\delta, \delta)$ for some $\delta > 0$, then $X$ and $Y$ have the same distribution.

Theorem. If $X$ and $Y$ are independent random variables, then
$$M_{X+Y}(t) = M_X(t)\, M_Y(t) .$$

Proof. $M_{X+Y}(t) = E[e^{t(X+Y)}] = E[e^{tX}]\, E[e^{tY}] = M_X(t)\, M_Y(t)$.

11. Chernoff Bound for Sum of Bernoulli Trials

Theorem. Let $X_1, \dots, X_n$ be independent Bernoulli random variables such that $\Pr(X_i = 1) = p_i$. Let $X = \sum_{i=1}^{n} X_i$ and $\mu = \sum_{i=1}^{n} p_i$.

• For any $\delta > 0$,
$$\Pr(X \ge (1+\delta)\mu) \le \left( \frac{e^{\delta}}{(1+\delta)^{1+\delta}} \right)^{\mu} . \quad (1)$$
• For $0 < \delta \le 1$,
$$\Pr(X \ge (1+\delta)\mu) \le e^{-\mu\delta^2/3} . \quad (2)$$
• For $R \ge 6\mu$,
$$\Pr(X \ge R) \le 2^{-R} . \quad (3)$$
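To get a feel for how the three forms compare, the sketch below (not from the slides; $\mu = 20$ and the range of $\delta$ are arbitrary assumptions) evaluates (1) and (2) side by side, and checks (3) at $R = 6\mu$:

```python
import math

mu = 20.0
for delta in (0.1, 0.5, 1.0):
    bound1 = (math.exp(delta) / (1 + delta)**(1 + delta))**mu   # form (1)
    bound2 = math.exp(-mu * delta**2 / 3)                       # form (2), valid for delta <= 1
    print(f"delta={delta:.1f}  (1)={bound1:.3e}  (2)={bound2:.3e}")

R = 6 * mu
delta = R / mu - 1                                              # R = (1 + delta) * mu
bound1_at_R = (math.exp(delta) / (1 + delta)**(1 + delta))**mu
print(f"R=6*mu: (1)={bound1_at_R:.3e}  2^-R={2.0**-R:.3e}")     # (1) is below 2^{-R}
```

Form (1) is always the tightest; (2) and (3) trade tightness for a simpler expression.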

12. Chernoff Bound for Sum of Bernoulli Trials

Let $X_1, \dots, X_n$ be a sequence of independent Bernoulli trials with $\Pr(X_i = 1) = p_i$. Let $X = \sum_{i=1}^{n} X_i$, and let
$$\mu = E[X] = E\left[ \sum_{i=1}^{n} X_i \right] = \sum_{i=1}^{n} E[X_i] = \sum_{i=1}^{n} p_i .$$

For each $X_i$:
$$M_{X_i}(t) = E[e^{tX_i}] = p_i e^t + (1 - p_i) = 1 + p_i(e^t - 1) \le e^{p_i(e^t - 1)} .$$

13. $M_{X_i}(t) = E[e^{tX_i}] \le e^{p_i(e^t - 1)}$.

Taking the product of the $n$ generating functions, we get for $X = \sum_{i=1}^{n} X_i$
$$M_X(t) = \prod_{i=1}^{n} M_{X_i}(t) \le \prod_{i=1}^{n} e^{p_i(e^t - 1)} = e^{\sum_{i=1}^{n} p_i (e^t - 1)} = e^{(e^t - 1)\mu} .$$
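The key step is the inequality $1 + x \le e^x$ applied with $x = p_i(e^t - 1)$. A minimal numeric check (not from the slides; the specific $p_i$ and $t$ values are arbitrary) compares the exact product $\prod_i \bigl(1 + p_i(e^t - 1)\bigr)$ with the bound $e^{(e^t - 1)\mu}$:

```python
import math

ps = [0.1, 0.3, 0.5, 0.7, 0.9]       # arbitrary heterogeneous p_i
mu = sum(ps)

for t in (0.25, 0.5, 1.0):
    exact_mgf = math.prod(1 + p * (math.exp(t) - 1) for p in ps)   # exact M_X(t)
    bound     = math.exp((math.exp(t) - 1) * mu)                   # e^{(e^t - 1) mu}
    print(f"t={t:.2f}  exact M_X(t)={exact_mgf:.4f}  bound={bound:.4f}")
```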

14. From the previous slide, $M_X(t) = E[e^{tX}] \le e^{(e^t - 1)\mu}$.

Applying Markov's inequality, we have for any $t > 0$
$$\Pr(X \ge (1+\delta)\mu) = \Pr(e^{tX} \ge e^{t(1+\delta)\mu}) \le \frac{E[e^{tX}]}{e^{t(1+\delta)\mu}} \le \frac{e^{(e^t - 1)\mu}}{e^{t(1+\delta)\mu}} .$$

For any $\delta > 0$, we can set $t = \ln(1+\delta) > 0$ to get
$$\Pr(X \ge (1+\delta)\mu) \le \left( \frac{e^{\delta}}{(1+\delta)^{1+\delta}} \right)^{\mu} .$$
This proves (1).
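The choice $t = \ln(1+\delta)$ is in fact the minimizer of the exponent. A short symbolic check of this claim (my sketch, using sympy; not part of the slides) differentiates the log of the bound, $(e^t - 1)\mu - t(1+\delta)\mu$, and confirms the derivative vanishes at $t = \ln(1+\delta)$:

```python
import sympy as sp

t, delta, mu = sp.symbols('t delta mu', positive=True)

exponent = (sp.exp(t) - 1) * mu - t * (1 + delta) * mu   # log of the Markov bound
d_exponent = sp.diff(exponent, t)                        # mu*e^t - (1 + delta)*mu

# Vanishes exactly at t = ln(1 + delta); the second derivative mu*e^t > 0,
# so this stationary point is a minimum.
print(sp.simplify(d_exponent.subs(t, sp.log(1 + delta))))   # prints 0
```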

15. We show that for $0 < \delta < 1$,
$$\frac{e^{\delta}}{(1+\delta)^{1+\delta}} \le e^{-\delta^2/3} ,$$
or equivalently that
$$f(\delta) = \delta - (1+\delta)\ln(1+\delta) + \delta^2/3 \le 0$$
in that interval. Computing the derivatives of $f(\delta)$ we get
$$f'(\delta) = 1 - \frac{1+\delta}{1+\delta} - \ln(1+\delta) + \frac{2}{3}\delta = -\ln(1+\delta) + \frac{2}{3}\delta ,$$
$$f''(\delta) = -\frac{1}{1+\delta} + \frac{2}{3} .$$
$f''(\delta) < 0$ for $0 \le \delta < 1/2$, and $f''(\delta) > 0$ for $\delta > 1/2$, so $f'(\delta)$ first decreases and then increases over the interval $[0, 1]$. Since $f'(0) = 0$ and $f'(1) < 0$, we have $f'(\delta) \le 0$ in the interval $[0, 1]$. Since $f(0) = 0$, it follows that $f(\delta) \le 0$ in that interval. This proves (2).
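A quick numeric sanity check of this inequality (my sketch; the grid resolution is arbitrary): evaluate $f(\delta)$ on a grid over $[0, 1]$ and report its maximum, which should be $0$, attained at $\delta = 0$.

```python
import math

def f(delta):
    # f(delta) = delta - (1 + delta) * ln(1 + delta) + delta^2 / 3
    return delta - (1 + delta) * math.log(1 + delta) + delta**2 / 3

grid = [k / 1000 for k in range(1001)]           # delta in [0, 1]
worst = max(grid, key=f)
print(f"max of f on [0,1] is {f(worst):.6f} at delta = {worst:.3f}")   # ~0 at delta = 0
```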

16. For $R \ge 6\mu$ we have $\delta = R/\mu - 1 \ge 5$, so
$$\Pr(X \ge (1+\delta)\mu) \le \left( \frac{e^{\delta}}{(1+\delta)^{1+\delta}} \right)^{\mu} \le \left( \frac{e}{1+\delta} \right)^{(1+\delta)\mu} \le \left( \frac{e}{6} \right)^{R} \le 2^{-R} ,$$
which proves (3).

17. Theorem. Let $X_1, \dots, X_n$ be independent Bernoulli random variables such that $\Pr(X_i = 1) = p_i$. Let $X = \sum_{i=1}^{n} X_i$ and $\mu = E[X]$. For $0 < \delta < 1$:

• $$\Pr(X \le (1-\delta)\mu) \le \left( \frac{e^{-\delta}}{(1-\delta)^{1-\delta}} \right)^{\mu} . \quad (4)$$
• $$\Pr(X \le (1-\delta)\mu) \le e^{-\mu\delta^2/2} . \quad (5)$$

18. Using Markov's inequality, for any $t < 0$,
$$\Pr(X \le (1-\delta)\mu) = \Pr(e^{tX} \ge e^{t(1-\delta)\mu}) \le \frac{E[e^{tX}]}{e^{t(1-\delta)\mu}} \le \frac{e^{(e^t - 1)\mu}}{e^{t(1-\delta)\mu}} .$$

For $0 < \delta < 1$, we set $t = \ln(1-\delta) < 0$ to get
$$\Pr(X \le (1-\delta)\mu) \le \left( \frac{e^{-\delta}}{(1-\delta)^{1-\delta}} \right)^{\mu} .$$
This proves (4). For (5) we need to show
$$f(\delta) = -\delta - (1-\delta)\ln(1-\delta) + \tfrac{1}{2}\delta^2 \le 0 .$$

19. We need to show
$$f(\delta) = -\delta - (1-\delta)\ln(1-\delta) + \tfrac{1}{2}\delta^2 \le 0 .$$
Differentiating $f(\delta)$ we get
$$f'(\delta) = \ln(1-\delta) + \delta , \qquad f''(\delta) = -\frac{1}{1-\delta} + 1 .$$
Since $f''(\delta) < 0$ for $\delta \in (0, 1)$, $f'(\delta)$ is decreasing in that interval. Since $f'(0) = 0$, we have $f'(\delta) \le 0$ for $\delta \in (0, 1)$, and therefore $f(\delta)$ is non-increasing in that interval. Since $f(0) = 0$ and $f(\delta)$ is non-increasing for $\delta \in [0, 1)$, $f(\delta) \le 0$ in that interval, and (5) follows.
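As with the upper tail, a short numeric check (my sketch; not from the slides) confirms $f(\delta) \le 0$ on $[0, 1)$. The grid stops just short of $1$ because $\ln(1-\delta)$ diverges there.

```python
import math

def f(delta):
    # f(delta) = -delta - (1 - delta) * ln(1 - delta) + delta^2 / 2
    return -delta - (1 - delta) * math.log(1 - delta) + delta**2 / 2

grid = [k / 1000 for k in range(1000)]           # delta in [0, 0.999]
worst = max(grid, key=f)
print(f"max of f on [0, 1) grid is {f(worst):.6f} at delta = {worst:.3f}")   # ~0 at delta = 0
```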

20. Example: Coin Flips

Let $X$ be the number of heads in a sequence of $n$ independent fair coin flips.

Markov's inequality gives
$$\Pr\left( X \ge \tfrac{3n}{4} \right) \le \frac{n/2}{3n/4} \le \frac{2}{3} .$$

Using Chebyshev's bound we have
$$\Pr\left( \left| X - \tfrac{n}{2} \right| \ge \tfrac{n}{4} \right) \le \frac{4}{n} .$$

Using the Chernoff bound in this case, we obtain
$$\Pr\left( \left| X - \tfrac{n}{2} \right| \ge \tfrac{n}{4} \right)
= \Pr\left( X \ge \tfrac{n}{2}\left(1 + \tfrac{1}{2}\right) \right) + \Pr\left( X \le \tfrac{n}{2}\left(1 - \tfrac{1}{2}\right) \right)
\le e^{-\frac{1}{3}\cdot\frac{n}{2}\cdot\frac{1}{4}} + e^{-\frac{1}{2}\cdot\frac{n}{2}\cdot\frac{1}{4}}
\le 2 e^{-n/24} .$$

21. Example: Coin Flips

Theorem (The Central Limit Theorem). Let $X_1, \dots, X_n$ be independent identically distributed random variables with common mean $\mu$ and variance $\sigma^2$. Then
$$\lim_{n \to \infty} \Pr\left( \frac{\frac{1}{n}\sum_{i=1}^{n} X_i - \mu}{\sigma/\sqrt{n}} \le z \right) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-t^2/2}\, dt .$$

Since $\Phi(2.23) \approx 0.99$, we have
$$\lim_{n \to \infty} \Pr\left( \frac{\frac{1}{n}\sum_{i=1}^{n} X_i - \mu}{\sigma/\sqrt{n}} \le 2.23 \right) \approx 0.99 .$$
For coin flips ($\mu = 1/2$, $\sigma = 1/2$):
$$\lim_{n \to \infty} \Pr\left( \frac{\frac{1}{n}\sum_{i=1}^{n} X_i - 1/2}{1/(2\sqrt{n})} \le 2.23 \right) \approx 0.99 ,$$
$$\lim_{n \to \infty} \Pr\left( \sum_{i=1}^{n} X_i - \frac{n}{2} \ge 2.23\,\frac{\sqrt{n}}{2} \right) \approx 0.01 .$$
Since $\Phi(3.5) \approx 0.999$,
$$\lim_{n \to \infty} \Pr\left( \sum_{i=1}^{n} X_i - \frac{n}{2} \ge 3.5\,\frac{\sqrt{n}}{2} \right) \approx 0.001 .$$

22. Example: Coin Flips

Let $X$ be the number of heads in a sequence of $n$ independent fair coin flips.
$$\Pr\left( \left| X - \frac{n}{2} \right| \ge \frac{1}{2}\sqrt{6 n \ln n} \right)
= \Pr\left( X \ge \frac{n}{2}\left( 1 + \sqrt{\frac{6 \ln n}{n}} \right) \right)
+ \Pr\left( X \le \frac{n}{2}\left( 1 - \sqrt{\frac{6 \ln n}{n}} \right) \right)
\le e^{-\frac{1}{3}\cdot\frac{n}{2}\cdot\frac{6 \ln n}{n}} + e^{-\frac{1}{2}\cdot\frac{n}{2}\cdot\frac{6 \ln n}{n}}
\le \frac{2}{n} .$$

Note that the standard deviation is $\sqrt{n/4}$.
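To put numbers on this, the sketch below (not from the slides; the values of $n$ are arbitrary) evaluates the deviation $\frac{1}{2}\sqrt{6 n \ln n}$ in units of the standard deviation $\sqrt{n/4}$ and prints the corresponding bound $1/n + 1/n^{3/2} \le 2/n$:

```python
import math

for n in (100, 10_000, 1_000_000):
    deviation = 0.5 * math.sqrt(6 * n * math.log(n))   # (1/2) * sqrt(6 n ln n)
    sigma = math.sqrt(n / 4)                           # standard deviation of X
    bound = 1 / n + n ** -1.5                          # e^{-ln n} + e^{-(3/2) ln n}
    print(f"n={n:>9,d}  deviation = {deviation / sigma:.1f} sigma   bound <= {bound:.2e}")
```

The deviation grows like $\sqrt{\ln n}$ standard deviations, yet the failure probability still drops as $2/n$.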

23. Example: Estimating the Value of $\pi$

• Choose $X$ and $Y$ independently and uniformly at random in $[0, 1]$.
• Let
$$Z = \begin{cases} 1 & \text{if } \sqrt{X^2 + Y^2} \le 1 , \\ 0 & \text{otherwise.} \end{cases}$$
• $\frac{1}{2} \le p = \Pr(Z = 1) = \frac{\pi}{4} \le 1$.
• $4\, E[Z] = \pi$.

24. • Let $Z_1, \dots, Z_m$ be the values of $m$ independent experiments, and let $W_m = \sum_{i=1}^{m} Z_i$.
• Then
$$E[W_m] = E\left[ \sum_{i=1}^{m} Z_i \right] = \sum_{i=1}^{m} E[Z_i] = \frac{m\pi}{4} .$$
• $W'_m = \frac{4}{m} W_m$ is an unbiased estimate for $\pi$ (i.e. $E[W'_m] = \pi$).
• How many samples do we need to obtain a good estimate?
$$\Pr(|W'_m - \pi| \ge \epsilon) = \, ?$$
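The slide leaves the sample-size question open. As a companion, here is a minimal Monte Carlo sketch (my addition; the sample sizes and seed are arbitrary assumptions) that computes $W'_m$ for a few values of $m$, so the observed error can later be compared against whatever bound one derives from the Chernoff inequalities above:

```python
import math
import random

def estimate_pi(m, seed=0):
    """Monte Carlo estimate W'_m = (4/m) * W_m of pi, as defined on the slide."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(m)
               if rng.random() ** 2 + rng.random() ** 2 <= 1.0)   # Z_i = 1 iff inside quarter circle
    return 4.0 * hits / m

for m in (1_000, 100_000, 1_000_000):
    est = estimate_pi(m)
    print(f"m={m:>9,d}  estimate = {est:.5f}  error = {abs(est - math.pi):.5f}")
```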
