The Central Limit Theorem: More of the Story
Steven Janke
November 2015
Central Limit Theorem

Theorem (Central Limit Theorem). Let $X_1, X_2, \ldots$ be a sequence of independent and identically distributed random variables, each with expectation $\mu$ and variance $\sigma^2$. Then the distribution of
\[ Z_n = \frac{X_1 + X_2 + \cdots + X_n - n\mu}{\sigma\sqrt{n}} \]
converges to the distribution of a standard normal random variable:
\[ \lim_{n \to \infty} P(Z_n \le x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\, dy \]
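The convergence in the theorem can be seen by simulation. A minimal sketch (an assumed example, not from the slides), using Uniform(0,1) summands with $\mu = 1/2$ and $\sigma^2 = 1/12$:

```python
import math
import random

# Illustrative sketch (assumed example): standardize sums of i.i.d.
# Uniform(0,1) variables (mu = 1/2, sigma^2 = 1/12) and compare the
# empirical CDF of Z_n with the standard normal CDF Phi.
random.seed(1)

def phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

n, trials = 30, 20000
mu, sigma = 0.5, math.sqrt(1 / 12)
z = [(sum(random.random() for _ in range(n)) - n * mu) / (sigma * math.sqrt(n))
     for _ in range(trials)]

for x in (-1.0, 0.0, 1.0):
    empirical = sum(zi <= x for zi in z) / trials
    print(f"x = {x:+.1f}: empirical {empirical:.3f} vs Phi(x) {phi(x):.3f}")
```

Even for $n = 30$ the empirical probabilities track $\Phi(x)$ to two decimal places.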
Central Limit Theorem Applications

- The sampling distribution of the mean is approximately normal.
- The distribution of experimental errors is approximately normal.

But why the normal distribution?
Benford's Law

In an arbitrary table of data, such as populations or lake areas,
\[ P[\text{Leading digit is } d] = \log_{10}\!\left(1 + \frac{1}{d}\right) \]

Data: List of 60 Tallest Buildings

Lead Digit | Meters | Feet  | Benford
1          | 0.433  | 0.300 | 0.301
2          | 0.117  | 0.133 | 0.176
3          | 0.150  | 0.133 | 0.125
4          | 0.100  | 0.100 | 0.097
5          | 0.067  | 0.167 | 0.079
6          | 0.017  | 0.083 | 0.067
7          | 0.033  | 0.033 | 0.058
8          | 0.083  | 0.017 | 0.051
9          | 0.000  | 0.033 | 0.046
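The Benford column of the table comes straight from the formula; a minimal sketch:

```python
import math

# Benford's law: P[leading digit = d] = log10(1 + 1/d).
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
for d, p in benford.items():
    print(f"{d}: {p:.3f}")

# The nine probabilities sum to 1 because the product telescopes:
# prod_{d=1}^{9} (1 + 1/d) = prod (d+1)/d = 10, and log10(10) = 1.
```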
Benford Justification

Simon Newcomb 1881; Frank Benford 1938.

"Proof" arguments:
- Positional number system
- Densities
- Scale invariance
- Scale and base unbiased (Hill 1995)
Central Limit Theorem

Theorem (Central Limit Theorem). Let $X_1, X_2, \ldots$ be a sequence of independent and identically distributed random variables, each with expectation $\mu$ and variance $\sigma^2$. Then the distribution of
\[ Z_n = \frac{X_1 + X_2 + \cdots + X_n - n\mu}{\sigma\sqrt{n}} \]
converges to the distribution of a standard normal random variable:
\[ \lim_{n \to \infty} P(Z_n \le x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\, dy \]
Central Limit Theorem Proof

Proof Sketch:
- Let $Y_i = X_i - \mu$.
- The moment generating function of $Y_i$ is $M_{Y_i}(t) = E e^{tY_i}$.
- The MGF of $Z_n$ is $M_{Z_n}(t) = \left[ M_{Y_1}\!\left(\frac{t}{\sigma\sqrt{n}}\right) \right]^n$.
- $\lim_{n \to \infty} \ln M_{Z_n}(t) = \frac{t^2}{2}$
- The MGF of the standard normal is $e^{t^2/2}$.
- Since the MGFs converge, the distributions converge (Lévy Continuity Theorem).
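The limit in the sketch can be checked numerically for an assumed concrete example: for Bernoulli(1/2) summands, $Y_i = X_i - \frac{1}{2}$ has MGF $M_Y(t) = \cosh(t/2)$ and $\sigma = \frac{1}{2}$, so $\ln M_{Z_n}(t) = n \ln \cosh(t/\sqrt{n})$.

```python
import math

# Assumed example: for X_i Bernoulli(1/2), Y_i = X_i - 1/2 has MGF
# M_Y(t) = cosh(t/2) and sigma = 1/2, so
# ln M_{Z_n}(t) = n * ln M_Y(t/(sigma*sqrt(n))) = n * ln cosh(t/sqrt(n)),
# which should approach t^2/2 as n grows.
t = 1.5
for n in (10, 100, 10000):
    val = n * math.log(math.cosh(t / math.sqrt(n)))
    print(n, round(val, 5), "target:", t * t / 2)
```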
Counter-Examples

- Moment problem: the lognormal random variable is not determined by its moments.
- No first moment: the Cauchy random variable has no MGF and $E|X| = \infty$, so the CLT does not hold.
- No second moment: $f(x) = \frac{1}{|x|^3}$ for $|x| \ge 1$.
- Pairwise independence is not sufficient for the CLT.
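The Cauchy counter-example is easy to see by simulation; a sketch (the sample mean of $n$ i.i.d. standard Cauchy variables is itself standard Cauchy, so it never settles down):

```python
import math
import random

# Sketch of the Cauchy counter-example: sample means of i.i.d. standard
# Cauchy variables do not converge, since the mean of n standard Cauchy
# variables is again standard Cauchy.
random.seed(2)

def cauchy():
    # Standard Cauchy via the inverse-CDF method: tan(pi * (U - 1/2)).
    return math.tan(math.pi * (random.random() - 0.5))

def sample_mean(n):
    return sum(cauchy() for _ in range(n)) / n

for n in (10, 1000, 100000):
    print(n, sample_mean(n))   # wild fluctuations persist for all n
```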
De Moivre's Theorem 1733

Each $X_i$ is Bernoulli (0 or 1).
\[ b(k) = P[S_n = k] = \binom{n}{k} \frac{1}{2^n} \]
\[ n! \approx \sqrt{2\pi}\, n^n \sqrt{n}\, e^{-n} \quad \text{(Stirling's formula)} \]
\[ b\!\left(\tfrac{n}{2}\right) \approx \frac{\sqrt{2}}{\sqrt{\pi n}} \]
\[ \log\!\left( \frac{b(\frac{n}{2} + d)}{b(\frac{n}{2})} \right) \approx -\frac{2d^2}{n} \]
\[ b\!\left(\tfrac{n}{2} + d\right) \approx \frac{\sqrt{2}}{\sqrt{\pi n}}\, e^{-2d^2/n} \]
\[ \lim_{n \to \infty} P\!\left[ a \le \frac{S_n - n/2}{\sqrt{n}/2} \le b \right] = \int_a^b \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx \]
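De Moivre's local approximation is already accurate for moderate $n$; a quick numeric check (assumed values $n = 100$, $d = 6$):

```python
import math

# Numeric check (sketch) of de Moivre's local approximation
# b(n/2 + d) ~ sqrt(2/(pi*n)) * exp(-2*d^2/n) for the symmetric binomial.
n, d = 100, 6
exact = math.comb(n, n // 2 + d) / 2 ** n
approx = math.sqrt(2 / (math.pi * n)) * math.exp(-2 * d * d / n)
print(f"exact {exact:.5f}  approx {approx:.5f}")
```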
Laplace 1810

Dealt with the independent and identically distributed case. Started with discrete variables:
Consider $X_i$ where $p_k = P[X_i = \frac{k}{m}]$ for $k = -m, -m+1, \cdots, m-1, m$.
Generating function: $T(t) = \sum_{k=-m}^{m} p_k t^k$
$q_j = P[\sum X_i = \frac{j}{m}]$ is the coefficient of $t^j$ in $T(t)^n$.
Substitute $e^{ix}$ for $t$ and recall $\frac{1}{2\pi} \int_{-\pi}^{\pi} e^{-itx} e^{isx}\, dx = \delta_{ts}$.
Then,
\[ q_j = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{-ijx} \left[ \sum_{k=-m}^{m} p_k e^{ikx} \right]^n dx \]
Now expand $e^{ikx}$ in a power series around 0 and use the fact that the mean of $X_i$ is zero.
Why Normal?

Normal characteristic function:
\[ f(u) = \frac{1}{\sqrt{2\pi}} \int e^{iux} e^{-x^2/2}\, dx = e^{-u^2/2} \]
With $f$ now the characteristic function of the mean-zero summands $X_i$ (variance $\sigma^2$):
\[ f_{S_n/\sigma\sqrt{n}}(u) = E\!\left[ e^{iu(S_n/\sigma\sqrt{n})} \right] = \left( f\!\left( \tfrac{u}{\sigma\sqrt{n}} \right) \right)^n = \left( 1 - \frac{\sigma^2 u^2}{2\sigma^2 n} + o\!\left( \tfrac{u^2}{n} \right) \right)^n = \left( 1 - \frac{u^2}{2n} + o\!\left( \tfrac{u^2}{n} \right) \right)^n \to e^{-u^2/2} \]
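The final limit is elementary; a numeric illustration (sketch) at an assumed value $u = 2$:

```python
import math

# Numeric illustration (sketch): (1 - u^2/(2n))^n -> e^{-u^2/2},
# the standard normal characteristic function, here at u = 2.
u = 2.0
target = math.exp(-u * u / 2)
for n in (10, 100, 10000):
    approx = (1 - u * u / (2 * n)) ** n
    print(n, round(approx, 6), "target:", round(target, 6))
```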
Lévy Continuity Theorem

If distribution functions $F_n$ converge to $F$, then the corresponding characteristic functions $f_n$ converge to $f$. Conversely, if $f_n$ converges to a function $g$ continuous at 0, then $F_n$ converges to a distribution $F$ with characteristic function $g$.

Proof Sketch:
- The first direction is the Helly-Bray theorem.
- The set $\{e^{iux}\}$ is a separating set for distribution functions.
- In both directions, continuity points and the mass of $F_n$ are critical.
History

- Laplace never presented a general CLT statement (he was concerned with limiting probabilities for particular problems).
- Concern over convergence led Poisson to improvements (the not identically distributed case).
- Dirichlet and Cauchy changed the conception of analysis (epsilon/delta).
- Counter-examples uncovered limitations.
- Chebyshev proved the CLT using convergence of moments (Markov and Liapounov were students).
- First rigorous proof (Liapounov 1900). The CLT holds with independent (but not necessarily i.i.d.) $X_i$ if
\[ \frac{\sum E|X_j|^3}{\left[\sum EX_j^2\right]^{3/2}} = \frac{\sum E|X_j|^3}{s_n^3} \to 0 \]
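Liapounov's condition is easy to evaluate for a concrete family; a sketch with an assumed example of i.i.d. Uniform(-1, 1) summands, where $E|X|^3 = \frac{1}{4}$ and $EX^2 = \frac{1}{3}$:

```python
# Liapounov's condition for an assumed example: i.i.d. Uniform(-1, 1)
# summands have E|X|^3 = 1/4 and EX^2 = 1/3, so the ratio
# n * E|X|^3 / s_n^3 = n*(1/4) / (n/3)^{3/2} decays like 1/sqrt(n);
# the condition holds and the CLT applies.
for n in (10, 1000, 100000):
    ratio = n * 0.25 / (n / 3) ** 1.5
    print(n, ratio)
```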
Liapounov Proof

Assume:
\[ \frac{\sum E|X_j|^3}{\left[\sum EX_j^2\right]^{3/2}} = \frac{\sum E|X_j|^3}{s_n^3} \to 0 \]
\[ g_n(u) = \prod_{k=1}^{n} f_k\!\left(\frac{u}{s_n}\right) = \prod_{k=1}^{n} \left[ 1 + \left( f_k\!\left(\tfrac{u}{s_n}\right) - 1 \right) \right] \]
\[ f_k\!\left(\frac{u}{s_n}\right) = 1 - \frac{u^2}{2 s_n^2} \left( \sigma_k^2 + \delta_k\!\left(\tfrac{u}{s_n}\right) \right) \]
\[ \left| f_k\!\left(\tfrac{u}{s_n}\right) - 1 \right| \le \frac{u^2 \sigma_k^2}{2 s_n^2} \implies \sum_k \left| f_k\!\left(\tfrac{u}{s_n}\right) - 1 \right| \le \frac{u^2}{2} \]
\[ \sigma_k^3 = \left(E|X_k|^2\right)^{3/2} \le E|X_k|^3 \implies \sup_{k \le n} \frac{\sigma_k}{s_n} \to 0 \implies \sup_{k \le n} \left| f_k\!\left(\tfrac{u}{s_n}\right) - 1 \right| \to 0 \]
Use $\log(1+z) = z(1 + \theta z)$ where $|\theta| \le 1$ for $|z| \le \frac{1}{2}$:
\[ \log g_n(u) = \sum_{1}^{n} \left( f_k\!\left(\tfrac{u}{s_n}\right) - 1 \right) + \theta \sum_{1}^{n} \left( f_k\!\left(\tfrac{u}{s_n}\right) - 1 \right)^2 \]
Liapounov Proof Continued

\[ \log g_n(u) = \sum_{1}^{n} \left( f_k\!\left(\tfrac{u}{s_n}\right) - 1 \right) + \theta \sum_{1}^{n} \left( f_k\!\left(\tfrac{u}{s_n}\right) - 1 \right)^2 \]
\[ \left| \theta \sum_{1}^{n} \left( f_k\!\left(\tfrac{u}{s_n}\right) - 1 \right)^2 \right| \le \sup_{k \le n} \left| f_k\!\left(\tfrac{u}{s_n}\right) - 1 \right| \cdot \sum_{1}^{n} \left| f_k\!\left(\tfrac{u}{s_n}\right) - 1 \right| \to 0 \]
\[ f_k\!\left(\frac{u}{s_n}\right) - 1 = -\frac{u^2 \sigma_k^2}{2 s_n^2} + \theta_k \frac{u^3}{s_n^3}\, E|X_k|^3 \]
\[ \sum_{1}^{n} \left( f_k\!\left(\tfrac{u}{s_n}\right) - 1 \right) = -\frac{u^2}{2} + \theta u^3 \sum_{1}^{n} \frac{E|X_k|^3}{s_n^3} \to -\frac{u^2}{2} \]
Lindeberg 1922

Theorem (Central Limit Theorem). Let the variables $X_i$ be independent with $EX_i = 0$ and $EX_i^2 = \sigma_i^2$. Let $s_n$ be the standard deviation of the sum $S_n$ and let $F$ be the distribution of $\frac{S_n}{s_n}$. With $\Phi(x)$ the normal distribution, if
\[ \frac{1}{s_n^2} \sum_k \int_{|x| \ge \epsilon s_n} x^2\, dF_k \to 0, \]
then we have
\[ \sup_x |F(x) - \Phi(x)| \le 5\epsilon \]
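For bounded summands the Lindeberg quantity eventually vanishes exactly; a sketch with an assumed example of $X_k = \pm 1$ with equal probability ($\sigma_k^2 = 1$, $s_n = \sqrt{n}$):

```python
import math

# Sketch of Lindeberg's condition for an assumed example: X_k = +/-1 with
# equal probability, so sigma_k^2 = 1 and s_n = sqrt(n). The per-summand
# integral of x^2 dF_k over |x| >= eps*s_n equals 1 while eps*sqrt(n) <= 1
# and drops to 0 once eps*s_n exceeds the bound |X_k| = 1, so the
# Lindeberg sum tends to 0.
eps = 0.1
for n in (10, 100, 1000):
    s_n = math.sqrt(n)
    per_summand = 1.0 if eps * s_n <= 1 else 0.0
    lindeberg = (1 / s_n ** 2) * n * per_summand
    print(n, lindeberg)
```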
Lindeberg Proof

Pick an auxiliary function $f$.
For an arbitrary distribution $V$, define $F(x) = \int f(x - t)\, dV(t)$.
With $\phi(x)$ the normal density, define $\Psi(x) = \int f(x - t)\, \phi(t)\, dt$.
A Taylor expansion of $f$ to the third power gives
\[ |F(x) - \Psi(x)| < k \int |x|^3\, dV(x) \]
With $U_i$ the distribution of $X_i$,
\[ F_1(x) = \int f(x - t)\, dU_1(t), \quad \ldots, \quad F_n(x) = \int F_{n-1}(x - t)\, dU_n(t) \]
Note that $U$, the distribution of the sum, satisfies
\[ U(x) = \int \cdots \int \mathbf{1}\left[ t_1 + t_2 + \cdots + t_n \le x \right]\, dU_1(t_1) \cdots dU_n(t_n) \]
By selecting $f$ carefully,
\[ |U(x) - \Phi(x)| < 3 \left( \sum_{i=1}^{n} \int |x|^3\, dU_i(x) \right)^{1/4} \]
Still Why Normal?

Let $X = X_1 + X_2$ where $X$ is $N(0,1)$ and $X_1$ is independent of $X_2$.
\[ f(u) = f_1(u) f_2(u) = e^{-u^2/2} \]
$f_1$ extends to an entire, non-vanishing function with $|f_1(z)| \le e^{c|z|^2}$.
The Hadamard factorization theorem $\implies \log f_1(u)$ is a polynomial in $u$ of degree at most 2.
$f_1$ is a characteristic function $\implies f_1(0) = 1$, $f_1(u) = \overline{f_1(-u)}$, and it is bounded.
Hence $\log f_1(u) = iau - bu^2$ with $b \ge 0$. This is the general form of the normal characteristic function.
Feller-Lévy 1935

Theorem (Final Central Limit Theorem). Let the variables $X_i$ be independent with $EX_i = 0$ and $EX_i^2 = \sigma_i^2$. Let $S_n = \sum_{1}^{n} X_i$ and $s_n^2 = \sum_{1}^{n} \sigma_k^2$. $\Phi$ is the normal distribution and $F_k$ is the distribution of $X_k$. Then as $n \to \infty$,
\[ P[S_n / s_n \le x] \to \Phi(x) \quad \text{and} \quad \max_{k \le n} \frac{\sigma_k}{s_n} \to 0 \]
if and only if for every $\epsilon > 0$
\[ \frac{1}{s_n^2} \sum_k \int_{|x| \ge \epsilon s_n} x^2\, dF_k \to 0 \]