Unit 3: Foundations for inference Lecture 1: Variability in estimates and CLT Statistics 101 Thomas Leininger May 28 2013
Outline
1 Announcements
2 Variability in estimates
    Example
    Sampling distributions - via simulation
    Sampling distributions - via CLT
Announcements
Labs 2 & 3 due today
PS 3 due tomorrow
Projects
Variability in estimates: Example
http://pewresearch.org/pubs/2191/young-adults-workers-labor-market-pay-careers-advancement-recession
Margin of error
41% ± 2.9%: We are 95% confident that 38.1% to 43.9% of the public believe young adults, rather than middle-aged or older adults, are having the toughest time in today’s economy.
49% ± 4.4%: We are 95% confident that 44.6% to 53.4% of 18-34 year olds have taken a job they didn’t want just to pay the bills.
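The two intervals above are just the point estimate minus and plus the margin of error. A minimal sketch of that arithmetic, where the helper name `confidence_interval` is made up for illustration and the numbers are the ones quoted on the slide:

```python
def confidence_interval(point_estimate, margin_of_error):
    """Return (lower, upper) for point_estimate ± margin_of_error."""
    return point_estimate - margin_of_error, point_estimate + margin_of_error

# Values quoted on the slide, in percentage points.
for estimate, moe in [(41.0, 2.9), (49.0, 4.4)]:
    lower, upper = confidence_interval(estimate, moe)
    print(f"{estimate:.1f}% ± {moe:.1f}% -> ({lower:.1f}%, {upper:.1f}%)")
```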
Parameter estimation
We are often interested in population parameters.
Since complete populations are difficult (or impossible) to collect data on, we use sample statistics as point estimates for the unknown population parameters of interest.
Sample statistics vary from sample to sample.
Quantifying how sample statistics vary provides a way to estimate the margin of error associated with our point estimate.
But before we get to quantifying the variability among samples, let’s try to understand how and why point estimates vary from sample to sample.
Suppose we randomly sample 1,000 adults from each state in the US. Would you expect the sample means of their heights to be the same, somewhat different, or very different?
Variability in estimates: Sampling distributions - via simulation
Average number of Duke games attended
Next let’s look at the population data for the number of Duke basketball games attended:
[Figure: histogram of the population distribution; x-axis: number of Duke games attended (0 to 70); y-axis: frequency]
Average number of Duke games attended (cont.)
Sampling distribution, n = 10:
[Figure: histogram of sample means from samples of n = 10; y-axis: frequency]
What does each observation in this distribution represent?
Each observation is the sample mean, x̄, of one sample of size n = 10.
Is the variability of the sampling distribution smaller or larger than the variability of the population distribution? Why?
Smaller: sample means vary less than individual observations.
Average number of Duke games attended (cont.)
Sampling distribution, n = 30:
[Figure: histogram of sample means from samples of n = 30; y-axis: frequency]
How did the shape, center, and spread of the sampling distribution change going from n = 10 to n = 30?
Shape is more symmetric, center is about the same, spread is smaller.
Average number of Duke games attended (cont.)
Sampling distribution, n = 70:
[Figure: histogram of sample means from samples of n = 70; y-axis: frequency]
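The sampling distributions for n = 10, 30, and 70 can be reproduced in spirit with a short simulation. This is only a sketch: the lecture's population data are not available here, so `population` is a hypothetical right-skewed stand-in (exponential draws), and the helper `sampling_distribution`, the seed, and the number of repeated samples are illustrative choices rather than the lecture's settings.

```python
import numpy as np

rng = np.random.default_rng(seed=101)  # fixed seed so the run is repeatable

# HYPOTHETICAL population: a right-skewed stand-in, NOT the lecture's data.
population = rng.exponential(scale=6.0, size=10_000)

def sampling_distribution(population, n, n_samples=5_000):
    """Draw n_samples random samples of size n and return their sample means."""
    # Sampling with replacement keeps the draws within a sample independent.
    samples = rng.choice(population, size=(n_samples, n), replace=True)
    return samples.mean(axis=1)

print(f"population: mean = {population.mean():.2f}, sd = {population.std():.2f}")

results = {}
for n in (10, 30, 70):
    means = sampling_distribution(population, n)
    results[n] = means
    print(f"n = {n:2d}: mean of sample means = {means.mean():.2f}, "
          f"sd of sample means = {means.std():.2f}")
```

Under these assumptions the center of each sampling distribution should stay close to the population mean, while the spread of the sample means shrinks as n grows.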
Average number of Duke games attended (cont.)
Question
The mean of the sampling distribution is 5.75, and the standard deviation of the sampling distribution (also called the standard error) is 0.75. Which of the following is the most reasonable guess for the 95% confidence interval for the true average number of Duke games attended by students?
(a) 5.75 ± 0.75
(b) 5.75 ± 2 × 0.75
(c) 5.75 ± 3 × 0.75
(d) cannot tell from the information given
Answer: (b) 5.75 ± 2 × 0.75 → (4.25, 7.25)
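The answer to the question follows the pattern "point estimate ± 2 × standard error". A minimal sketch of that calculation, where the function name `approx_95_ci` is hypothetical and the multiplier of 2 is the rounded value used on the slide:

```python
def approx_95_ci(mean, standard_error, multiplier=2.0):
    """Approximate 95% interval: mean ± multiplier × SE (multiplier 2 as on the slide)."""
    margin = multiplier * standard_error
    return mean - margin, mean + margin

lower, upper = approx_95_ci(mean=5.75, standard_error=0.75)
print(f"({lower:.2f}, {upper:.2f})")  # prints (4.25, 7.25)
```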
Variability in estimates: Sampling distributions - via CLT
Central limit theorem
The distribution of the sample mean is well approximated by a normal model:
$$\bar{x} \sim N\left(\text{mean} = \mu,\ SE = \frac{\sigma}{\sqrt{n}}\right)$$
If σ is unknown, use s.
So it wasn’t a coincidence that the sampling distributions we saw earlier were symmetric.
We won’t go into proving why $SE = \sigma / \sqrt{n}$, but note that as n increases SE decreases.
As the sample size increases we would expect samples to yield more consistent sample means, hence the variability among the sample means would be lower.
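To see how quickly SE shrinks with n, the formula SE = σ/√n can be evaluated directly. The sketch below assumes a hypothetical population standard deviation of 6 (the lecture does not state σ), and `standard_error` is an illustrative helper name.

```python
import math

def standard_error(sigma, n):
    """SE of the sample mean: sigma / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    return sigma / math.sqrt(n)

sigma = 6.0  # HYPOTHETICAL population standard deviation, for illustration only
for n in (10, 30, 70, 280):
    print(f"n = {n:3d}: SE = {standard_error(sigma, n):.3f}")
```

Because n sits under a square root, quadrupling the sample size (70 to 280 here) only halves the standard error.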
CLT - conditions
Certain conditions must be met for the CLT to apply:
1. Independence: Sampled observations must be independent. This is difficult to verify, but is more likely if random sampling/assignment is used and, if sampling without replacement, n < 10% of the population.
2. Sample size/skew/outliers: Either (1) the population distribution is normal, or (2) n > 30 and the population distribution is not extremely skewed. This is also difficult to verify for the population, but we can check it using the sample data, and assume that the sample mirrors the population.
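The numeric parts of these conditions can be screened from sample data. The sketch below is a rough aid, not a test from the lecture: the function name `check_clt_conditions` is hypothetical, and the skewness cutoff `max_abs_skew` is an assumed value, since the slide gives no numeric definition of "extremely skewed". It cannot verify independence, which depends on how the data were collected.

```python
import numpy as np

def check_clt_conditions(sample, population_size, max_abs_skew=2.0):
    """Rough screen of the two CLT conditions using sample data only.

    max_abs_skew is a HYPOTHETICAL cutoff for "extremely skewed"; the
    lecture does not give a numeric threshold.
    """
    x = np.asarray(sample, dtype=float)
    n = x.size
    centered = x - x.mean()
    sd = x.std()
    # Moment-based sample skewness (0 for a perfectly symmetric sample).
    skew = 0.0 if sd == 0 else float(np.mean(centered ** 3) / sd ** 3)
    return {
        "n": n,
        "under_10_percent_of_population": n < 0.10 * population_size,
        "n_greater_than_30": n > 30,
        "sample_skewness": skew,
        "not_extremely_skewed": abs(skew) <= max_abs_skew,
    }

# Made-up sample of 40 values drawn from a population of 6,000.
report = check_clt_conditions([0, 1, 1, 2, 3, 5, 8, 13] * 5, population_size=6_000)
print(report)
```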
CLT - sample size/skew condition - simulations (1)
http://onlinestatbook.com/stat_sim/sampling_dist/index.html