Sampling Distributions Confidence Intervals Bootstrap Confidence Intervals
STAT 113
Confidence Intervals
Colin Reimer Dawson
Oberlin College
October 3, 2017
Outline
• Sampling Distributions
• Confidence Intervals
• Bootstrap Confidence Intervals
  • Bootstrap Resampling
  • Bootstrap Confidence Intervals
  • Bootstrap Percentile Intervals
Two Main Goals of Inference
1. Assessing strength of evidence about “yes/no” questions (hypothesis testing)
2. Estimating unknown quantities in a population using a sample (confidence intervals)
Statistics vs. Parameters
• Summary values (like mean, median, standard deviation) can be computed for populations or for samples.
• In a population, such a summary value is called a parameter.
• In a sample, these values are called statistics, and are used to estimate the corresponding parameter.

Value                  Population Parameter   Sample Statistic
Mean                   $\mu$                  $\bar{X}$
Proportion             $p$                    $\hat{p}$
Correlation            $\rho$                 $r$
Slope of a Line        $\beta_1$              $\hat{\beta}_1$
Difference in Means    $\mu_1 - \mu_2$        $\bar{X}_1 - \bar{X}_2$
...                    ...                    ...
Using Samples to Make Estimates About Populations
• The set of all gumballs from my factory is my population.
• The mean flavor-life in the population is a population parameter (write $\mu$ for the population mean).
• Ideally, I can test a random sample.
• The mean flavor-life in the sample is a sample statistic (write $\bar{x}$ for the sample mean).

Statistic : Sample :: Parameter : Population
Variability due to Sampling
• Samples are imperfect reflections of the population.
• However, some populations are more compatible with the sample than others.
• If we imagine a continuum of populations (or just population means), some are more plausible than others because they make the data more likely.
Sampling Distributions
Sampling Distribution (Definition)
Consider all possible random samples of a fixed size, n, from a population. Each one has its own value for a particular statistic (like $\bar{x}$). A sampling distribution is the collection of all of those $\bar{x}$ values (or whatever the statistic is).
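For a very small population, the definition can be carried out literally by listing every possible sample. The sketch below does that in Python; the five population values and the sample size n = 2 are made up for illustration and are not the gumball data from these slides.

```python
from itertools import combinations
from statistics import mean

# Hypothetical tiny population of flavor-life values (minutes); illustration only.
population = [62, 65, 67, 68, 72]
n = 2  # fixed sample size

# Every possible sample of size n (drawn without replacement) and its sample mean.
samples = list(combinations(population, n))
sample_means = [mean(s) for s in samples]

print(len(samples))          # how many possible samples there are
print(sorted(sample_means))  # the sampling distribution of the sample mean
print(mean(population), mean(sample_means))
```

Each possible sample is one case, and its sample mean is the variable. With a realistic population there are far too many samples to list, which is why a sampling distribution is usually imagined or simulated rather than enumerated.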
[Figure: Sampling Distribution of Gumball Means. Top panel: density of population flavor-life (min.), mean = 66.77. Bottom panel: density of sample mean flavor life (n = 10), s = 0.9. Both horizontal axes run from 60 to 75.]
Self-Check Quiz
1. What are the cases in the context of a sampling distribution?
   Answer: Possible samples of a fixed size n
2. What is the variable in the relevant sampling distribution for the gumball life example?
   Answer: Each case has its own sample mean
Standard Error
Standard Error (Definition)
The distribution of a quantitative variable has a standard deviation. The sampling distribution of a quantitative sample statistic (like a mean) has a standard deviation too. This has a special name: the standard error (e.g., “of the mean”).
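One way to see a standard error is to simulate many samples and take the standard deviation of their means. The sketch below assumes a Normal population with mean 66.77 minutes (the value on the gumball figure) and a standard deviation of 2.85 minutes; that standard deviation is my assumption, chosen so the result lands near the s = 0.9 on the figure, and is not stated in the slides.

```python
import random
from statistics import stdev

random.seed(113)  # fixed seed so the run is repeatable

POP_MEAN = 66.77   # population mean from the gumball figure
POP_SD = 2.85      # hypothetical population SD (an assumption, not from the slides)
n = 10             # sample size, as in the figure
num_samples = 10_000

def draw_sample_mean():
    """Draw one random sample of size n and return its sample mean."""
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(n)]
    return sum(sample) / n

# Approximate the sampling distribution with many simulated sample means.
sample_means = [draw_sample_mean() for _ in range(num_samples)]

# The standard error is the standard deviation of that sampling distribution.
standard_error = stdev(sample_means)
print(f"simulated standard error of the mean: {standard_error:.2f}")
```

The simulation only approximates the sampling distribution, since it uses 10,000 samples rather than all possible ones.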
Properties of Sampling Distribution
Most (about 95%) of simple random samples have a sample mean ($\bar{x}$) which is within 2 Standard Errors of the population mean ($\mu$).
The population mean $\mu$ is within 2 Standard Errors of most (about 95%) sample means.
Deeeeeep....
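Both statements can be checked by simulation, and they turn out to count the same samples. This is a minimal sketch under the same assumptions as before: a Normal population with mean 66.77 and a hypothetical standard deviation of 2.85, neither of which would be known in a real study.

```python
import random
from statistics import stdev

random.seed(2017)  # fixed seed so the run is repeatable

MU = 66.77      # population mean from the gumball figure
SIGMA = 2.85    # hypothetical population SD (not given in the slides)
n = 10
num_samples = 10_000

sample_means = [
    sum(random.gauss(MU, SIGMA) for _ in range(n)) / n
    for _ in range(num_samples)
]
se = stdev(sample_means)  # standard error, estimated from the simulation

# Statement 1: the sample mean is within 2 SE of the population mean.
within = sum(abs(xbar - MU) <= 2 * se for xbar in sample_means)

# Statement 2: the population mean is inside the interval x-bar +/- 2 SE.
covers = sum(xbar - 2 * se <= MU <= xbar + 2 * se for xbar in sample_means)

print(f"sample means within 2 SE of mu: {within / num_samples:.1%}")
print(f"intervals x-bar +/- 2 SE containing mu: {covers / num_samples:.1%}")
```

The two counts agree because "x-bar is within 2 SE of mu" and "mu is within 2 SE of x-bar" describe the same distance, just measured from opposite ends.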
Margins of Error
In a Gallup poll released yesterday, a sample of 1500 U.S. adults was asked whether they approved or disapproved of the job that Donald Trump is doing as president. 42% of respondents said “approve” and 54% said “disapprove”. The poll’s margin of error was 3 percentage points.
• What’s the meaning of that 3%?

Margin of Error
It defines a range of “plausible” values for each population proportion. Precisely, a 95% margin of error of 3 points means that 95% of surveys with the same procedure and sample size will yield sample statistics which are within 3 points of the corresponding population parameter.
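The "95% of surveys" reading can be sketched by simulating many polls of the same size. The true approval proportion is unknown in practice, so the 0.42 used below is purely a hypothetical stand-in, and the simulation assumes simple random sampling, which a real poll only approximates.

```python
import random

random.seed(42)  # fixed seed so the run is repeatable

TRUE_P = 0.42       # hypothetical true approval proportion (unknown in reality)
n = 1500            # respondents per survey, as in the poll above
num_surveys = 2000  # number of simulated surveys
moe = 0.03          # stated margin of error: 3 percentage points

def run_survey():
    """Simulate one simple random sample and return the sample proportion."""
    approvals = sum(random.random() < TRUE_P for _ in range(n))
    return approvals / n

p_hats = [run_survey() for _ in range(num_surveys)]

# Share of simulated surveys whose statistic is within 3 points of the parameter.
share_within = sum(abs(p_hat - TRUE_P) <= moe for p_hat in p_hats) / num_surveys
print(f"{share_within:.1%} of simulated surveys were within 3 points")
```

Under these assumptions the share should come out at 95% or a little higher, which is consistent with a margin of error that has been rounded to a whole number of points.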
Confidence Intervals
• A point estimate of some population parameter (like a mean), together with some measure of our confidence/uncertainty (e.g., MoE), defines a confidence interval.
• Can be written in the form “statistic ± MoE”.

Stating Confidence Intervals
• “With 95% confidence, the mean flavor-life of our gumballs is between 65.3 and 67.1 minutes.”
• “With 95% confidence, between 43 (i.e., 46 − 3) and 49 (i.e., 46 + 3) percent of registered voters prefer Hillary Clinton to Donald Trump.”
• “With 95% confidence, between 39 (i.e., 42 − 3) and 45 (i.e., 42 + 3) percent of U.S. adults approve of the president’s job performance.”
Self-Check: Confidence Intervals
HBO Sports/Marist gave 1253 U.S. adults the following poll question in spring 2015 (I have edited for length):

"Top college men’s football and basketball programs bring in a lot of money to their schools... Do you think student athletes in [these top programs] should be paid for the hours they are required to spend practicing, traveling, and playing on the team, OR should not be paid given the value of their scholarship and a chance to earn a degree?"

This poll’s 95% margin of error is 2.8%. The results are given in the following table. Find a 95% confidence interval for the percentage of U.S. adults who chose the first option.

Should be paid    Should not be paid    Unsure
33%               65%                   2%
How to Determine the Margin of Error?
The population mean $\mu$ is within 2 Standard Errors of most (about 95%) sample means (from simple random samples).

Margin of Error
A 95% margin of error of 3 points means that 95% of surveys with the same procedure and sample size will yield sample statistics which are within 3 points of the corresponding population parameter.

If the sampling distribution is approximately Normal (bell-shaped), then the 95% Margin of Error is about 2 Standard Errors.
[Figure: Confidence Interval of Gumball Flavor Life. Top panel: density of sample mean flavor life (n = 10), s = 0.9. Bottom panel: dot plot of one sample's flavor-life values (min.), sample mean = 67.23. Both horizontal axes run from 60 to 75.]
[Figure: A Different Confidence Interval of Gumball Flavor Life. Top panel: density of sample mean flavor life (n = 10), s = 0.9. Bottom panel: dot plot of a different sample's flavor-life values (min.), sample mean = 71.08. Both horizontal axes run from 60 to 75.]
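The two samples above, with means 67.23 and 71.08, give two different intervals. The sketch below computes "statistic ± 2 standard errors" for each, taking the s = 0.9 on the sampling-distribution panel as the standard error and the 66.77 from the earlier population figure as the true mean. The function name `two_se_interval` is my own, and in a real study the population mean would not be available to check against.

```python
def two_se_interval(sample_mean, standard_error):
    """Return (lower, upper) for sample_mean +/- 2 standard errors."""
    margin = 2 * standard_error
    return (sample_mean - margin, sample_mean + margin)

POP_MEAN = 66.77  # population mean from the earlier gumball figure
SE = 0.9          # standard error read off the sampling-distribution panel

for xbar in (67.23, 71.08):
    lo, hi = two_se_interval(xbar, SE)
    captured = lo <= POP_MEAN <= hi
    print(f"x-bar = {xbar}: ({lo:.2f}, {hi:.2f}) captures mu? {captured}")
```

The first interval contains 66.77 and the second does not. About 5% of samples are expected to produce a miss like the second one, and from the sample alone there is no way to tell which kind you have.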
Example: Carbon in Forest Biomass
• Scientists hoping to curb deforestation estimate [1] that the total amount of carbon stored in tropical forests in Latin America, sub-Saharan Africa, and southeast Asia is 247 gigatons.
• To arrive at this estimate, they first estimate the mean amount of carbon per square kilometer.
• Based on a sample of n = 4079 inventory plots, the sample mean is $\bar{x} = 11600$ tons with a standard error of 1000 tons.
• Give and interpret a 95% confidence interval for the mean carbon per km² in the entire set of forests.

[1] Saatchi, S.S. et al. “Benchmark Map of Forest Carbon Stocks in Tropical Regions Across Three Continents,” Proceedings of the National Academy of Sciences, 5/31/11.
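A worked sketch of the arithmetic for this example, assuming the sampling distribution of the mean is approximately Normal so that the 95% margin of error is about 2 standard errors:

```latex
\bar{x} \pm 2\,\mathrm{SE} \;=\; 11600 \pm 2(1000) \;=\; (9600,\ 13600)\ \text{tons per km}^2
```

One way to state it: with 95% confidence, the mean amount of carbon per square kilometer across these forests is between 9600 and 13600 tons.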