Gov 2000: 4. Sums, Means, and Limit Theorems
Matthew Blackwell, Fall 2016


  1. Gov 2000: 4. Sums, Means, and Limit Theorems. Matthew Blackwell, Fall 2016

  2. Outline: 1. Sums and Means of Random Variables; 2. Useful Inequalities; 3. Law of Large Numbers; 4. Central Limit Theorem; 5. More Exotic CLTs*; 6. Wrap-up

  3. Where are we? Where are we going?
     • Probability: a formal way to quantify uncertainty about outcomes/random variables.
     • Last week: how to work with multiple r.v.s at the same time.
     • This week: applying those ideas to study large random samples.

  4. Large random samples
     • In real data, we will have a set of n measurements on a variable: X₁, X₂, …, Xₙ
     • Or we might have a set of n measurements on two variables: (X₁, Y₁), (X₂, Y₂), …, (Xₙ, Yₙ)
     • Empirical analyses: sums or means of these n measurements
       ▶ Almost all statistical procedures involve a sum/mean.
       ▶ What are the properties of these sums and means?
       ▶ Can they tell us anything about the distribution of Xᵢ?
     • Asymptotics: what can we learn as n gets big?

  5. 1/ Sums and Means of Random Variables

  6. Sums and means are random variables
     • If X₁ and X₂ are r.v.s, then X₁ + X₂ is a r.v.
       ▶ It has a mean 𝔼[X₁ + X₂] and a variance 𝕍[X₁ + X₂]
     • The sample mean is a function of sums and so it is a r.v. too: X̄ = (X₁ + X₂)/2

  7. Distribution of sums/means
     Each row is one draw of (X₁, X₂); repeating the draw traces out the distribution of the sum X₁ + X₂ and the distribution of the mean X̄.

         draw    X₁    X₂    X₁ + X₂    X̄
         1       20    71     91        45.5
         2       12    66     78        39
         3       59    75    134        67
         4        3    58     61        30.5
         ⋮        ⋮     ⋮      ⋮         ⋮
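
     A minimal R sketch of the same idea; the discrete-uniform draws on 1 to 100 are an assumption for illustration, since the slides don't say what distribution generated the table:

         ## Each draw produces a fresh (X1, X2) pair, so the sum and the mean
         ## take different values from draw to draw; they are random variables.
         set.seed(1234)
         n.draws <- 4
         X1 <- sample(1:100, size = n.draws, replace = TRUE)
         X2 <- sample(1:100, size = n.draws, replace = TRUE)
         data.frame(draw = 1:n.draws, X1 = X1, X2 = X2,
                    sum = X1 + X2, mean = (X1 + X2) / 2)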

  8. Independent and identically distributed r.v.s
     • We often will work with independent and identically distributed r.v.s, X₁, …, Xₙ
       ▶ Random sample of n respondents on a survey question.
       ▶ Written “i.i.d.”
     • Independent: Xᵢ ⫫ Xⱼ for all i ≠ j
     • Identically distributed: f_Xᵢ(x) is the same for all i
       ▶ 𝔼[Xᵢ] = μ for all i
       ▶ 𝕍[Xᵢ] = σ² for all i

  9. Distribution of the sample mean
     • Sample mean of i.i.d. r.v.s: X̄ₙ = (1/n) ∑ᵢ₌₁ⁿ Xᵢ
     • X̄ₙ is a random variable, so what is its distribution?
       ▶ What is the expectation of this distribution, 𝔼[X̄ₙ]?
       ▶ What is the variance of this distribution, 𝕍[X̄ₙ]?
       ▶ What is the p.d.f. of the distribution?
     • How do these relate to the expectation and variance of X₁, …, Xₙ?

  10. Properties of the sample mean
     Mean and variance of the sample mean: Suppose that X₁, …, Xₙ are i.i.d. r.v.s with 𝔼[Xᵢ] = μ and 𝕍[Xᵢ] = σ². Then:
         𝔼[X̄ₙ] = μ        𝕍[X̄ₙ] = σ²/n
     • Key insights:
       ▶ The sample mean gets the right answer on average.
       ▶ The variance of X̄ₙ depends on the variance of Xᵢ and the sample size.
       ▶ Neither depends on the (full) distribution of Xᵢ!
     • Standard error of the sample mean: √𝕍[X̄ₙ] = σ/√n
     • You’ll prove both of these facts in this week’s HW.
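
     A quick simulation check of these two facts; this is a sketch, with Bernoulli(0.4) as an assumed example distribution:

         ## Simulate many sample means and compare their average and variance
         ## to the theoretical values mu and sigma^2 / n.
         set.seed(1234)
         n <- 100
         mu <- 0.4              ## E[X_i] for a Bernoulli(0.4)
         sigma2 <- 0.4 * 0.6    ## V[X_i] = p(1 - p) = 0.24
         xbars <- replicate(10000, mean(rbinom(n, size = 1, prob = mu)))
         mean(xbars)            ## close to mu = 0.4
         var(xbars)             ## close to sigma2 / n = 0.0024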

  11. 2/ Useful Inequalities

  12. Why inequalities?
     • The behavior of r.v.s depends on their distribution, but we often don’t know (or don’t want to assume) a distribution.
     • Today, we’ll discuss results for r.v.s with any distribution, subject to some restrictions like finite variance.
     • Why study these?
       ▶ Build toward massively important results like the LLN
       ▶ Inequalities are used regularly throughout statistics
       ▶ Gives us some practice with proofs/analytic reasoning

  13. Markov Inequality
     Markov Inequality: Suppose that X is a r.v. such that ℙ(X ≥ 0) = 1. Then, for every real number t > 0,
         ℙ(X ≥ t) ≤ 𝔼[X]/t.
     • For instance, if we know that 𝔼[X] = 1, then ℙ(X ≥ 100) ≤ 0.01
     • Once we know the mean of a r.v., it limits how much probability can be in the tail.
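
     As a sanity check, here is a small simulation comparing the Markov bound to an actual tail probability; the Exponential(1) distribution (nonnegative, with 𝔼[X] = 1) is an assumed test case, not one from the slides:

         ## Markov: P(X >= t) <= E[X] / t for any nonnegative X.
         set.seed(1234)
         x <- rexp(100000, rate = 1)   ## E[X] = 1
         t <- 5
         mean(x >= t)   ## simulated P(X >= 5), about exp(-5) = 0.0067
         1 / t          ## Markov bound E[X] / t = 0.2: valid but loose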

  14. Markov Inequality Proof
     • For discrete X:
         𝔼[X] = ∑ₓ x f_X(x) = ∑_{x<t} x f_X(x) + ∑_{x≥t} x f_X(x)
     • Because X is nonnegative, 𝔼[X] ≥ ∑_{x≥t} x f_X(x)
     • Since x ≥ t in this sum, ∑_{x≥t} x f_X(x) ≥ ∑_{x≥t} t f_X(x)
     • But this is just ∑_{x≥t} t f_X(x) = t ∑_{x≥t} f_X(x) = t ℙ(X ≥ t)
     • Implies 𝔼[X] ≥ t ℙ(X ≥ t)
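
     For reference, the same steps written as one chain (a LaTeX rendering of the argument above):

         \mathbb{E}[X] = \sum_{x} x f_X(x)
           \ge \sum_{x \ge t} x f_X(x)
           \ge \sum_{x \ge t} t f_X(x)
           = t \sum_{x \ge t} f_X(x)
           = t \, \mathbb{P}(X \ge t),

     and dividing both sides by t > 0 gives ℙ(X ≥ t) ≤ 𝔼[X]/t.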

  15. Chebyshev Inequality
     Chebyshev Inequality: Suppose that X is a r.v. for which 𝕍[X] < ∞. Then, for every real number t > 0,
         ℙ(|X − 𝔼[X]| ≥ t) ≤ 𝕍[X]/t².
     • The variance places limits on how far an observation can be from its mean.
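
     Again a quick numerical check; the standard normal (with 𝕍[X] = 1) is an assumed example:

         ## Chebyshev: P(|X - E[X]| >= t) <= V[X] / t^2, whatever the distribution.
         set.seed(1234)
         x <- rnorm(100000, mean = 0, sd = 1)   ## E[X] = 0, V[X] = 1
         t <- 2
         mean(abs(x) >= t)   ## simulated probability, about 0.046
         1 / t^2             ## Chebyshev bound = 0.25: again valid but loose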

  16. Proof of Chebyshev
     • Let Y = (X − 𝔼[X])²
       ▶ ⇝ ℙ(Y ≥ 0) = 1 (nonnegative)
       ▶ 𝔼[Y] = 𝔼[(X − 𝔼[X])²] = 𝕍[X] (definition of variance)
     • Note that if |X − 𝔼[X]| ≥ t then Y ≥ t², because we just squared both sides.
     • Thus, ℙ(|X − 𝔼[X]| ≥ t) = ℙ(Y ≥ t²)
     • Apply Markov’s inequality: ℙ(|X − 𝔼[X]| ≥ t) = ℙ(Y ≥ t²) ≤ 𝔼[Y]/t² = 𝕍[X]/t²
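
     Or, compactly in one line (a LaTeX rendering, with Y = (X − 𝔼[X])² as above):

         \mathbb{P}(|X - \mathbb{E}[X]| \ge t)
           = \mathbb{P}\big((X - \mathbb{E}[X])^2 \ge t^2\big)
           \le \frac{\mathbb{E}[(X - \mathbb{E}[X])^2]}{t^2}
           = \frac{\mathbb{V}[X]}{t^2}.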

  17. Application: planning a survey
     • Suppose we want to estimate the proportion of voters who will vote for Donald Trump, p, from a random sample of size n.
       ▶ X₁, X₂, …, Xₙ indicate voting intention for Trump for each respondent.
       ▶ By our earlier calculation, 𝔼[X̄ₙ] = p and 𝕍[X̄ₙ] = σ²/n
       ▶ Since this is a Bernoulli r.v., we have σ² = p(1 − p)
     • What does n need to be to have at least 0.95 probability that X̄ₙ is within 0.02 of the true p?
       ▶ How do we guarantee a margin of error of ±2 percentage points?

  18. Application: planning a survey
     • What does n have to be so that ℙ(|X̄ₙ − p| ≤ 0.02) ≥ 0.95 ⟺ ℙ(|X̄ₙ − p| ≥ 0.02) ≤ 0.05?
     • Applying Chebyshev:
         ℙ(|X̄ₙ − p| ≥ 0.02) ≤ 𝕍[X̄ₙ]/0.02² = p(1 − p)/(0.0004n)
     • We don’t know 𝕍[Xᵢ] = p(1 − p), but:
       ▶ It is conservative to use the largest possible variance.
       ▶ It can’t be bigger than p(1 − p) ≤ (1/2)·(1/2) = 1/4, so
         ℙ(|X̄ₙ − p| ≥ 0.02) ≤ p(1 − p)/(0.0004n) ≤ 1/(0.0016n)
     • We want this probability to be bounded by 0.05, so we need 1/(0.0016n) ≤ 0.05, which gives us n ≥ 12,500!
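
     The same algebra as a quick calculation in R (just arithmetic restating the numbers above; the variable names are mine):

         ## Solve (1/4) / (0.02^2 * n) <= 0.05 for n.
         worst.var <- 0.25   ## largest possible p(1 - p)
         moe <- 0.02         ## desired margin of error
         alpha <- 0.05       ## allowed probability of missing
         worst.var / (moe^2 * alpha)   ## required n: 12500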

  19. Application: planning a survey
     • Do we really need n ≥ 12,500 to get a margin of error of ±2 percentage points?
     • No! Chebyshev provides a bound that is guaranteed to hold, but actual probabilities are much smaller.
       ▶ We’re also using the “worst-case” variance of 0.25.
     • Let’s simulate 1,000 samples of size n = 12,500 with p = 0.4 and show the distribution of the means.
       ▶ What proportion of these are within 0.02 of p?

  20. Application: planning a survey

         nsims <- 1000
         holder <- rep(NA, times = nsims)
         for (i in 1:nsims) {
           this.samp <- rbinom(n = 12500, size = 1, prob = 0.4)
           holder[i] <- mean(this.samp)
         }
         mean(abs(holder - 0.4) > 0.02)
         ## [1] 0

     [Figure: density histogram of X̄ₙ − p across the 1,000 simulated samples; the x-axis runs from −0.03 to 0.03, and the distribution is concentrated well inside ±0.02.]
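
     One way to see how conservative Chebyshev is here: compare the simulated standard error of the sample means to the ±0.02 target (a small follow-up to the block above, reusing its holder vector):

         sd(holder)                ## simulated SE, about 0.0044
         sqrt(0.4 * 0.6 / 12500)   ## theoretical SE sqrt(p(1 - p)/n), about 0.0044
         ## so 0.02 is roughly 4.5 SEs out; essentially never exceeded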

  21. 3/ Law of Large Numbers

  22. Current knowledge
     • For i.i.d. r.v.s X₁, …, Xₙ with 𝔼[Xᵢ] = μ and 𝕍[Xᵢ] = σ², we know that:
       ▶ The expectation is 𝔼[X̄ₙ] = 𝔼[Xᵢ] = μ
       ▶ The variance is 𝕍[X̄ₙ] = σ²/n, where σ² = 𝕍[Xᵢ]
       ▶ Some bounds on tail probabilities from Chebyshev.
       ▶ None of these rely on a specific distribution for Xᵢ!
     • Can we say more about the distribution of the sample mean?
     • Yes, but we need to think about how X̄ₙ changes as n gets big.

  23. Sequence of sample means
     • What can we say about the sample mean as n gets large?
     • We need to think about sequences of sample means with increasing n:
         X̄₁ = X₁
         X̄₂ = (1/2)·(X₁ + X₂)
         X̄₃ = (1/3)·(X₁ + X₂ + X₃)
         X̄₄ = (1/4)·(X₁ + X₂ + X₃ + X₄)
         X̄₅ = (1/5)·(X₁ + X₂ + X₃ + X₄ + X₅)
         ⋮
         X̄ₙ = (1/n)·(X₁ + X₂ + X₃ + X₄ + X₅ + ⋯ + Xₙ)
     • Note: this is a sequence of random variables!
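
     A minimal R sketch of one realized sequence, via cumulative means; the Bernoulli(0.4) draws are an assumed example:

         ## x.bar[n] is the mean of the first n draws: one realized
         ## sequence X.bar_1, X.bar_2, ..., X.bar_N.
         set.seed(1234)
         N <- 1000
         x <- rbinom(N, size = 1, prob = 0.4)
         x.bar <- cumsum(x) / (1:N)
         x.bar[c(1, 2, 5, 100, 1000)]   ## wobbles early, settles near 0.4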

  24. Convergence in Probability
     Convergence in probability: A sequence of random variables Z₁, Z₂, … is said to converge in probability to a value b if, for every ε > 0,
         ℙ(|Zₙ − b| > ε) → 0
     as n → ∞. We write this Zₙ →ᵖ b.
     • Basically: the probability that Zₙ lies outside any (teeny, tiny) interval around b approaches 0 as n → ∞.
     • Wooldridge writes plim(Zₙ) = b if Zₙ →ᵖ b.

  25. Law of large numbers
     Theorem (Weak Law of Large Numbers): Let X₁, …, Xₙ be i.i.d. draws from a distribution with mean μ and finite variance σ². Let X̄ₙ = (1/n) ∑ᵢ₌₁ⁿ Xᵢ. Then X̄ₙ →ᵖ μ.
     • Intuition: the probability of X̄ₙ being “far away” from μ goes to 0 as n gets big.
       ▶ The distribution of X̄ₙ “collapses” on μ.
     • No assumptions about the distribution of Xᵢ beyond i.i.d. and a finite variance!
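
     A simulation of this “collapse”, estimating ℙ(|X̄ₙ − μ| > ε) for growing n; the choices ε = 0.01 and Bernoulli(0.4) are illustrative assumptions:

         ## For each n, estimate P(|X.bar_n - mu| > eps) across 1000 samples.
         set.seed(1234)
         eps <- 0.01
         mu <- 0.4
         for (n in c(100, 1000, 10000)) {
           xbars <- replicate(1000, mean(rbinom(n, size = 1, prob = mu)))
           cat("n =", n, ": P ~", mean(abs(xbars - mu) > eps), "\n")
         }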
