Strong Law of Large Numbers
Will Perkins
February 12, 2013
The Theorem

Theorem (Strong Law of Large Numbers). Let $X_1, X_2, \dots$ be iid random variables with a finite first moment, $\mathbb{E} X_i = \mu$. Then
$$\frac{X_1 + X_2 + \cdots + X_n}{n} \to \mu \quad \text{almost surely as } n \to \infty.$$
The word ‘Strong’ refers to the type of convergence: almost sure. We’ll see the proof today, working our way up from easier theorems.
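Before the proof, a quick numerical sketch of the statement (our illustration, not part of the lecture; the choice of distribution and sample sizes is arbitrary):

```python
import numpy as np

# A minimal simulation of the SLLN: running means of iid Exponential(1)
# draws, so mu = 1. Distribution chosen arbitrarily, for illustration only.
rng = np.random.default_rng(0)
X = rng.exponential(scale=1.0, size=10**6)
running_means = np.cumsum(X) / np.arange(1, X.size + 1)
for n in (10, 10**2, 10**4, 10**6):
    print(f"n = {n:>7}: mean = {running_means[n - 1]:.4f}")
# The running mean settles near mu = 1 as n grows, as the theorem predicts.
```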
Using Chebyshev’s Inequality, we saw a proof of the Weak Law of Large Numbers under the additional assumption that $X_i$ has a finite variance. Under an even stronger assumption we can prove the Strong Law.

Theorem (Take 1). Let $X_1, \dots$ be iid, and assume $\mathbb{E} X_i = \mu$ and $\mathbb{E} X_i^4 = m_4 < \infty$. Then
$$\frac{X_1 + X_2 + \cdots + X_n}{n} \to \mu \quad \text{almost surely as } n \to \infty.$$
Proof with a 4th moment

Proof: Since we have a finite 4th moment, we can try a 4th moment version of Chebyshev:
$$\Pr[|Z - \mathbb{E} Z| > \epsilon] \le \frac{\mathbb{E} |Z - \mathbb{E} Z|^4}{\epsilon^4}.$$
First, to simplify, we can assume $\mathbb{E} X_i = 0$ just by subtracting $\mu$ from each. Now let
$$U_n = \frac{X_1 + X_2 + \cdots + X_n}{n}, \qquad \mathbb{E} U_n = 0.$$
Then calculate
$$\mathbb{E} U_n^4 = \frac{1}{n^4} \, \mathbb{E}\Bigl[ \sum_i X_i^4 + 4 \sum_{i \ne j} X_i X_j^3 + 3 \sum_{i \ne j} X_i^2 X_j^2 + 6 \sum_{i,j,k} X_i^2 X_j X_k + \sum_{i,j,k,l} X_i X_j X_k X_l \Bigr],$$
where the last two sums run over distinct indices.
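As a quick sanity check on these coefficients (our addition, assuming SymPy is available), one can expand $(x_1 + x_2 + x_3)^4$ symbolically and read them off:

```python
import sympy as sp

# Check the coefficients in the expansion of (x1 + x2 + x3)^4.
x1, x2, x3 = sp.symbols("x1 x2 x3")
p = sp.Poly(sp.expand((x1 + x2 + x3) ** 4), x1, x2, x3)
print(p.coeff_monomial(x1**4))            # 1:  coefficient of X_i^4
print(p.coeff_monomial(x1 * x2**3))       # 4:  matches 4 * sum_{i != j}
print(p.coeff_monomial(x1**2 * x2**2))    # 6:  i.e. 3 per ordered pair (i, j)
print(p.coeff_monomial(x1**2 * x2 * x3))  # 12: i.e. 6 per ordered pair (j, k)
```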
Proof with a 4th moment

Now all the terms with an $X_i$ to the first power are 0 in expectation. [Why? Independence, plus $\mathbb{E} X_i = 0$.] Which leaves:
$$\mathbb{E} U_n^4 = \frac{1}{n^4} \Bigl( n \, \mathbb{E} X_i^4 + 3 n (n-1) \, \mathbb{E} X_i^2 X_j^2 \Bigr) \le \frac{m_4}{n^3} + \frac{3 \sigma^4}{n^2}.$$
Now applying the 4th moment Markov’s Inequality:
$$\Pr[|U_n - \mathbb{E} U_n| > \epsilon] \le \frac{m_4/n^3 + 3 \sigma^4/n^2}{\epsilon^4}.$$
Proof with a 4th moment But for ǫ fixed, we can sum the RHS from n = 1 to ∞ and get a finite sum. (1 / n 2 is summable). Now apply Borel-Cantelli: fix ǫ > 0, and let A ǫ n be the event that | U n | > ǫ . We’ve shown that ∞ � Pr( A ǫ n ) < ∞ n =1 and so by the Borel-Cantelli Lemma, with probability 1, only finitely many of the A ǫ n ’s occur. This is precisely what it means for U n → 0 almost surely.
Removing Higher Moment Conditions

What remains is to remove the requirement that the $X_i$ have finite higher moments.
Strong Law with 2nd Moment

Theorem (Take 2). Let $X_1, \dots$ be iid with mean $\mu$ and variance $\sigma^2$. Then
$$\frac{X_1 + X_2 + \cdots + X_n}{n} \to \mu \quad \text{almost surely as } n \to \infty.$$
Two tricks:
1. Assume the $X_i$’s are non-negative.
2. First prove the result for a subsequence.
Non-negativity

Let $X_i = X_i^+ - X_i^-$, where
$$X_i^+ = \max\{0, X_i\}, \qquad X_i^- = -\min\{0, X_i\}.$$
$X_i^+$ and $X_i^-$ are both non-negative, with finite expectation and variance, so if we prove the SLLN holds for non-negative RV’s, we can apply it separately to the two parts and recombine: $\frac{1}{n} \sum X_i = \frac{1}{n} \sum X_i^+ - \frac{1}{n} \sum X_i^- \to \mathbb{E} X_i^+ - \mathbb{E} X_i^- = \mu$.
Subsequence

We will find a subsequence of the natural numbers so that the empirical averages along the subsequence converge almost surely. The subsequence will be explicit: $1, 4, 9, \dots, n^2, \dots$ Let
$$A_{n^2}^\epsilon = \Bigl\{ \Bigl| \frac{X_1 + \cdots + X_{n^2}}{n^2} - \mu \Bigr| > \epsilon \Bigr\}.$$
We bound with Chebyshev:
$$\Pr(A_{n^2}^\epsilon) \le \frac{\operatorname{var}\bigl( (X_1 + \cdots + X_{n^2})/n^2 \bigr)}{\epsilon^2}.$$
Subsequence

$$\operatorname{var}\Bigl( \frac{X_1 + \cdots + X_{n^2}}{n^2} \Bigr) = \frac{1}{n^4} \cdot n^2 \sigma^2 = \frac{\sigma^2}{n^2}.$$
So
$$\sum_n \Pr(A_{n^2}^\epsilon) \le \sum_n \frac{\sigma^2}{\epsilon^2 n^2} < \infty.$$
Applying the Borel-Cantelli Lemma shows that along the subsequence $\{n^2\}$, the empirical averages converge to $\mu$ almost surely.
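A small simulation of this step (our illustration; the distribution and constants are arbitrary), sampling the empirical averages only at the square indices:

```python
import numpy as np

# Empirical averages along the subsequence 1, 4, 9, ..., k^2
# for iid Uniform(0, 2) draws, so mu = 1 and sigma^2 = 1/3.
rng = np.random.default_rng(1)
N = 1000**2
partial_sums = np.cumsum(rng.uniform(0.0, 2.0, size=N))
for k in (10, 100, 1000):
    n2 = k * k
    print(f"n^2 = {n2:>7}: average = {partial_sums[n2 - 1] / n2:.4f}")
# At n = k^2, Chebyshev gives Pr(|average - mu| > eps) <= sigma^2/(eps^2 k^4),
# which is summable in k, so Borel-Cantelli applies along this subsequence.
```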
From Subsequence to Full Sequence

We want to show that for every $\epsilon > 0$, with probability 1 there is $N_0$ large enough so that for all $N \ge N_0$,
$$\Bigl| \frac{X_1 + \cdots + X_N}{N} - \mu \Bigr| < \epsilon.$$
We know this holds for large enough $N = n^2$. And here is where we will use non-negativity. Start by picking $n$ large enough so that
$$\Bigl| \frac{X_1 + \cdots + X_{n^2}}{n^2} - \mu \Bigr| < \epsilon/3 \quad \text{and} \quad \Bigl| \frac{X_1 + \cdots + X_{(n+1)^2}}{(n+1)^2} - \mu \Bigr| < \epsilon/3.$$
From Subsequence to Full Sequence

For $n^2 \le N \le (n+1)^2$, since the $X_i$ are non-negative,
$$\frac{X_1 + \cdots + X_{n^2}}{(n+1)^2} \le \frac{X_1 + \cdots + X_N}{N} \le \frac{X_1 + \cdots + X_{(n+1)^2}}{n^2}$$
and
$$\frac{X_1 + \cdots + X_{n^2}}{(n+1)^2} \ge \frac{n^2}{(n+1)^2} \Bigl( \mu - \frac{\epsilon}{3} \Bigr)$$
and
$$\frac{X_1 + \cdots + X_{(n+1)^2}}{n^2} \le \frac{(n+1)^2}{n^2} \Bigl( \mu + \frac{\epsilon}{3} \Bigr).$$
Chaining these,
$$\frac{n^2}{(n+1)^2} \Bigl( \mu - \frac{\epsilon}{3} \Bigr) \le \frac{X_1 + \cdots + X_N}{N} \le \frac{(n+1)^2}{n^2} \Bigl( \mu + \frac{\epsilon}{3} \Bigr),$$
so if $n$ is large enough that $\frac{n^2}{(n+1)^2}$ is close to 1, both bounds lie within $\epsilon$ of $\mu$, and we are done.
Removing the finite variance condition

To get the full theorem under the fewest conditions we need one more trick: truncation. Again assume that $X_i \ge 0$, with $\mathbb{E} X_i = \mu < \infty$. Let $Y_n = \min\{X_n, n\}$.

Fact: $X_n - Y_n \to 0$ almost surely. Proof:
$$\sum_n \Pr[X_n \ne Y_n] = \sum_n \Pr[X_1 > n] \le \mathbb{E} X_1 < \infty$$
and apply Borel-Cantelli. In particular, it’s enough to prove the strong law for the $Y_n$’s.
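To see how rarely the truncation bites, here is a small experiment (our illustration) with a heavy-tailed but integrable distribution, a Pareto with tail index 1.5 (finite mean, infinite variance):

```python
import numpy as np

# Count how often X_n exceeds n, i.e. how often Y_n = min(X_n, n)
# differs from X_n, for a Pareto(1.5) sample (E X = 3).
rng = np.random.default_rng(2)
N = 10**6
X = 1.0 + rng.pareto(1.5, size=N)  # classical Pareto on [1, inf)
mismatch = np.flatnonzero(X > np.arange(1.0, N + 1.0))
print("indices n with X_n != Y_n:", mismatch + 1)
# Only a handful of small indices appear: sum_n Pr[X_1 > n] <= E X_1 < inf,
# so by Borel-Cantelli, X_n = Y_n for all but finitely many n almost surely.
```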
Removing the finite variance condition

Now we apply the same methods we’ve used before. This time we will use an even sparser subsequence, $1, c, c^2, c^3, \dots$ for some $c > 1$ which will depend on $\epsilon$. The main estimate we need to apply Borel-Cantelli is:
$$\sum_{j=1}^{\infty} \frac{1}{c^j} \min\{X_1, c^j\}^2 = O(X_1)$$
and so, taking expectations,
$$\sum_{j=1}^{\infty} \frac{1}{c^j} \mathbb{E}[Y_{c^j}^2] < \infty.$$
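A quick numerical check of this estimate (our illustration): for fixed $c > 1$, the sum really is bounded by a constant times $x$.

```python
import numpy as np

# Check that sum_j c^{-j} * min(x, c^j)^2 = O(x) for fixed c > 1:
# the ratio of the sum to x should stay bounded as x grows.
c = 2.0
j = np.arange(1, 200)  # enough terms; the tail is a fast geometric series
for x in (1.0, 1e1, 1e2, 1e4, 1e6):
    total = np.sum(np.minimum(x, c**j) ** 2 / c**j)
    print(f"x = {x:>9.0f}: sum = {total:12.2f}, sum/x = {total / x:.3f}")
# The ratio stays below roughly 2c/(c - 1) = 4: both the c^j <= x part and
# the c^j > x part are geometric series of size O(x). This is what turns
# E[X_1] < infinity into the summability of c^{-j} E[Y_{c^j}^2].
```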
Removing the finite variance condition

Now we use Chebyshev: let
$$A_{c^j}^\epsilon = \Bigl\{ \Bigl| \frac{Y_1 + \cdots + Y_{c^j}}{c^j} - \mu \Bigr| > \epsilon \Bigr\}$$
and
$$\Pr(A_{c^j}^\epsilon) \le \frac{\operatorname{var}\bigl( (Y_1 + \cdots + Y_{c^j})/c^j \bigr)}{\epsilon^2} \le \frac{1}{\epsilon^2 c^j} \mathbb{E}[Y_{c^j}^2]$$
(using $\mathbb{E} Y_i^2 \le \mathbb{E}[Y_{c^j}^2]$ for each $i \le c^j$).
Finishing Up

From above,
$$\sum_{j=1}^{\infty} \frac{1}{\epsilon^2 c^j} \mathbb{E}[Y_{c^j}^2] < \infty$$
and so Borel-Cantelli says that along the subsequence $\{c^j\}$, the empirical averages converge almost surely. Again we can use the fact that the $Y_i$’s are non-negative to go from the sparse subsequence to the full sequence.