M4S1 - Central Limit Theorem Professor Jarad Niemi STAT 226 - Iowa State University September 28, 2018 Professor Jarad Niemi (STAT226@ISU) M4S1 - Central Limit Theorem September 28, 2018 1 / 24
Outline
Sampling distribution
Standard error
Central Limit Theorem
Estimation
Bias
Variability
Sampling distribution

Definition: A summary statistic is a numerical value calculated from the sample.

But this sample is only one of many possibilities. What could have happened if we had a different sample?

Definition: The sampling distribution of a statistic is the distribution of that statistic over different samples of a fixed size.
Sampling distribution - Binomial distribution - Flipping a coin

Suppose we repeatedly tossed a fair coin 10 times and recorded the number of heads. The sampling distribution is the binomial distribution with 10 attempts and probability of success 0.5.

[Figure: Bin(10,0.5) probability mass function, P(Y=y) against y]
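The probabilities plotted on this slide can be reproduced directly from the binomial formula. The following is a minimal sketch using only the Python standard library; the function name `binomial_pmf` and the variable `coin_pmf` are illustrative choices, not part of the course material.

```python
from math import comb


def binomial_pmf(y: int, n: int, p: float) -> float:
    """Return P(Y = y) for Y ~ Bin(n, p)."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)


# Sampling distribution of the number of heads in 10 tosses of a fair coin.
coin_pmf = [binomial_pmf(y, 10, 0.5) for y in range(11)]

for y, prob in enumerate(coin_pmf):
    print(f"P(Y={y:2d}) = {prob:.4f}")
```

The list `coin_pmf` holds one probability per possible number of heads, so its entries should sum to 1 and peak at y = 5.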
Sampling distribution - Binomial distribution - Rolling a die

Suppose we repeatedly rolled a fair 6-sided die 24 times and recorded the number of 1s. The sampling distribution is the binomial distribution with 24 attempts and probability of success 1/6.

[Figure: Bin(24,1/6) probability mass function, P(Y=y) against y]
Sampling distribution - Binomial distribution - Rolling a die

Suppose we repeatedly rolled a fair 6-sided die 120 times and recorded the number of 1s. The sampling distribution is the binomial distribution with 120 attempts and probability of success 1/6.

[Figure: Bin(120,1/6) probability mass function, P(Y=y) against y]
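To see how the shape changes between 24 and 120 rolls, one option is to summarize each distribution by its mean, standard deviation, and skewness, all computed directly from the probabilities. This is a sketch only; the helper names `binomial_pmf`, `pmf_summary`, and `summaries` are assumptions made for illustration, and skewness is used here as one possible measure of asymmetry.

```python
from math import comb, sqrt


def binomial_pmf(y: int, n: int, p: float) -> float:
    """Return P(Y = y) for Y ~ Bin(n, p)."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)


def pmf_summary(n: int, p: float) -> tuple[float, float, float]:
    """Mean, standard deviation, and skewness computed from the pmf itself."""
    pmf = [binomial_pmf(y, n, p) for y in range(n + 1)]
    mean = sum(y * q for y, q in enumerate(pmf))
    var = sum((y - mean) ** 2 * q for y, q in enumerate(pmf))
    skew = sum((y - mean) ** 3 * q for y, q in enumerate(pmf)) / var**1.5
    return mean, sqrt(var), skew


# Number of 1s in 24 rolls and in 120 rolls of a fair die.
summaries = {n: pmf_summary(n, 1 / 6) for n in (24, 120)}

for n, (mean, sd, skew) in summaries.items():
    print(f"n={n:3d}: mean={mean:.2f}, sd={sd:.2f}, skewness={skew:.3f}")
```

A skewness closer to zero for the larger number of rolls is consistent with the more symmetric, bell-like plot for Bin(120,1/6).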
Sampling distribution - Maximum - Rolling a die

Suppose we repeatedly rolled a fair 6-sided die 5 times and recorded the maximum. It’s hard to analytically determine what happens, but we can use a computer to perform the experiment.

[Figure: Histogram of simulated die rolls (maximum of 5 rolls), Density against x]
Sampling distribution - Maximum - Rolling a die

Suppose we repeatedly rolled a fair 6-sided die 50 times and recorded the maximum. It’s hard to analytically determine what happens, but we can use a computer to perform the experiment.

[Figure: Histogram of simulated die rolls (maximum of 50 rolls), Density against x]
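The computer experiment for the maximum can be sketched as follows. The number of repetitions, the seed, and the function names are assumptions chosen for illustration. For the maximum specifically, a closed form happens to be available, P(max = k) = (k/6)^n - ((k-1)/6)^n, and it is included here only as a sanity check on the simulation.

```python
import random
from collections import Counter


def simulate_max(n_rolls: int, n_reps: int = 20_000, seed: int = 1) -> dict[int, float]:
    """Estimate the sampling distribution of the maximum of n_rolls die rolls."""
    rng = random.Random(seed)
    counts = Counter(
        max(rng.randint(1, 6) for _ in range(n_rolls)) for _ in range(n_reps)
    )
    return {k: counts[k] / n_reps for k in range(1, 7)}


def exact_max_pmf(n_rolls: int) -> dict[int, float]:
    """Closed-form P(max = k) for a fair 6-sided die, used as a check."""
    return {k: (k / 6) ** n_rolls - ((k - 1) / 6) ** n_rolls for k in range(1, 7)}


for n_rolls in (5, 50):
    simulated = simulate_max(n_rolls)
    exact = exact_max_pmf(n_rolls)
    print(f"maximum of {n_rolls} rolls")
    for k in range(1, 7):
        print(f"  k={k}: simulated={simulated[k]:.4f}, exact={exact[k]:.4f}")
```

With 50 rolls nearly all of the probability sits on 6, which matches the single tall bar in the second histogram.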
Sampling distribution - Mean - Sample mean

Suppose we repeatedly rolled a fair 6-sided die 8 times and recorded the mean. It’s hard to analytically determine what happens, but we can use a computer to perform the experiment.

[Figure: Histogram of mean of simulated die rolls (8 rolls), Density against x]
Sampling distribution - Mean - Sample mean

Suppose we repeatedly rolled a fair 6-sided die 80 times and recorded the mean. It’s hard to analytically determine what happens, but we can use a computer to perform the experiment.

[Figure: Histogram of mean of simulated die rolls (80 rolls), Density against x]
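A simulation along the lines of these two slides might look like the sketch below. The repetition count, seed, and names (`simulate_means`, `means8`, `means80`) are illustrative assumptions; the slides do not say how their histograms were produced.

```python
import random
import statistics


def simulate_means(n_rolls: int, n_reps: int = 5_000, seed: int = 226) -> list[float]:
    """Repeat the experiment n_reps times; each time record the mean of n_rolls die rolls."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.randint(1, 6) for _ in range(n_rolls)])
        for _ in range(n_reps)
    ]


means8 = simulate_means(8)
means80 = simulate_means(80)

for label, means in (("8 rolls", means8), ("80 rolls", means80)):
    print(
        f"{label}: center={statistics.fmean(means):.3f}, "
        f"spread (sd)={statistics.stdev(means):.3f}"
    )
```

Both sets of simulated means should center near 3.5, with the means of 80 rolls much more tightly concentrated, as in the narrower second histogram.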
Central Limit Theorem

Theorem: Suppose you have a sequence of independent and identically distributed random variables \(X_1, X_2, \ldots\) with population mean \(E[X_i] = \mu\) and population variance \(\mathrm{Var}[X_i] = \sigma^2\). The Central Limit Theorem (CLT) says the sampling distribution of the sample mean converges to a normal distribution. Specifically,

\[ \frac{\overline{X}_n - \mu}{\sigma/\sqrt{n}} \to N(0,1) \quad \text{as } n \to \infty \]

where \(\overline{X}_n = \frac{1}{n}\sum_{i=1}^n X_i\). Thus, for large \(n\), we can approximate the distribution of the sample mean by a normal distribution, i.e.

\[ \overline{X}_n \stackrel{\cdot}{\sim} N(\mu, \sigma^2/n) \]

where \(\stackrel{\cdot}{\sim}\) means “approximately distributed.” The standard deviation of the sampling distribution of a statistic is known as the standard error (SE), i.e. \(\sigma/\sqrt{n}\) is the standard error from the CLT.
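One way to check the theorem empirically is to standardize simulated sample means and see whether they behave like a standard normal, for example by counting how many fall within 1.96 of zero (about 95% for a N(0,1) variable). The sketch below uses a fair die, for which the mean is 3.5 and the variance is 35/12; the sample size of 80, the repetition count, and the seed are assumptions for illustration.

```python
import random
from math import sqrt
from statistics import fmean

MU = 3.5              # population mean of a fair 6-sided die
SIGMA = sqrt(35 / 12)  # population standard deviation of a fair 6-sided die


def standardized_means(n: int, n_reps: int = 5_000, seed: int = 2018) -> list[float]:
    """Simulate (sample mean - mu) / (sigma / sqrt(n)) for n die rolls, n_reps times."""
    rng = random.Random(seed)
    z_values = []
    for _ in range(n_reps):
        rolls = [rng.randint(1, 6) for _ in range(n)]
        z_values.append((fmean(rolls) - MU) / (SIGMA / sqrt(n)))
    return z_values


z = standardized_means(80)
coverage = sum(abs(v) <= 1.96 for v in z) / len(z)
print(f"fraction of standardized means within +/-1.96: {coverage:.3f}")
```

Because die rolls are discrete, the fraction will not be exactly 0.95, but it should land close to it.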
Central Limit Theorem - Mean of the sample mean

Recall the following property:

\[ E[aX + bY + c] = aE[X] + bE[Y] + c \]

If we have \(E[X_i] = \mu\) for all \(i\), then

\[ E[\overline{X}_n] = E\left[\frac{1}{n}\sum_{i=1}^n X_i\right] = \frac{1}{n}E\left[\sum_{i=1}^n X_i\right] = \frac{1}{n}\sum_{i=1}^n E[X_i] = \frac{1}{n}\sum_{i=1}^n \mu = \frac{1}{n}\, n \cdot \mu = \mu \]

So the expectation/mean of the sample mean (\(\overline{X}_n\)) is the population mean \(\mu\).
Central Limit Theorem - Variance of the sample mean

Recall the following property for independent random variables \(X\) and \(Y\):

\[ \mathrm{Var}[aX + bY + c] = a^2\,\mathrm{Var}[X] + b^2\,\mathrm{Var}[Y] \]

If we have \(\mathrm{Var}[X_i] = \sigma^2\) for all \(i\), then

\[ \mathrm{Var}[\overline{X}_n] = \mathrm{Var}\left[\frac{1}{n}\sum_{i=1}^n X_i\right] = \frac{1}{n^2}\mathrm{Var}\left[\sum_{i=1}^n X_i\right] = \frac{1}{n^2}\sum_{i=1}^n \mathrm{Var}[X_i] = \frac{1}{n^2}\sum_{i=1}^n \sigma^2 = \frac{1}{n^2}\, n \cdot \sigma^2 = \sigma^2/n \]

\[ SE[\overline{X}_n] = \sqrt{\mathrm{Var}[\overline{X}_n]} = \sqrt{\sigma^2/n} = \sigma/\sqrt{n} \]

So the variance of the sample mean (\(\overline{X}_n\)) is the population variance (\(\sigma^2\)) divided by the sample size (\(n\)). The standard error, which is the square root of the variance, is the population standard deviation (\(\sigma\)) divided by the square root of the sample size (\(\sqrt{n}\)).
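For a small number of die rolls, these results can be verified exactly by enumerating every possible sample. The sketch below does this with exact fractions; it assumes a fair 6-sided die (mean 7/2, variance 35/12), and the function name is a hypothetical helper introduced for illustration.

```python
from fractions import Fraction
from itertools import product


def exact_mean_and_var_of_sample_mean(n: int) -> tuple[Fraction, Fraction]:
    """Enumerate all 6**n equally likely samples of n die rolls and return
    the exact expectation and variance of the sample mean."""
    outcomes = list(product(range(1, 7), repeat=n))
    prob = Fraction(1, len(outcomes))
    means = [Fraction(sum(outcome), n) for outcome in outcomes]
    expectation = sum(m * prob for m in means)
    variance = sum((m - expectation) ** 2 * prob for m in means)
    return expectation, variance


POP_VAR = Fraction(35, 12)  # variance of a single fair die roll

for n in (1, 2, 3, 4):
    expectation, variance = exact_mean_and_var_of_sample_mean(n)
    print(f"n={n}: E={expectation}, Var={variance}, sigma^2/n={POP_VAR / n}")
```

For each `n` the enumerated variance should equal `POP_VAR / n` exactly, with the expectation fixed at 7/2.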
Central Limit Theorem - Sampling distribution of sample mean

If \(X_1, X_2, \ldots\) are a sequence of independent and identically distributed random variables with population mean \(E[X_i] = \mu\) and population variance \(\mathrm{Var}[X_i] = \sigma^2\), then

\[ E[\overline{X}_n] = \mu \qquad \mathrm{Var}[\overline{X}_n] = \sigma^2/n \]

for any \(n\). The CLT says that, as \(n\) gets large, the sampling distribution of the sample mean converges to a normal distribution.
Central Limit Theorem - Coin flipping

Sampling distribution for the proportion of heads on an unbiased coin flip.

[Figure: three panels, Bin(10,1/2), Bin(30,1/2), and Bin(50,1/2), each showing P(Y=y) against the proportion y from 0 to 1]
Central Limit Theorem - Die rolling

Sampling distribution for the proportion of 1s on an unbiased 6-sided die roll.

[Figure: three panels, Bin(10,1/6), Bin(30,1/6), and Bin(50,1/6), each showing P(Y=y) against the proportion y from 0 to 1]
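The narrowing of these panels as n grows can be quantified with the standard error of a sample proportion. A proportion is a sample mean of 0/1 indicators, and a single indicator with success probability p has variance p(1 - p); this is a standard fact that the slides do not state explicitly. The sketch below applies the SE formula from the CLT under that assumption, and `proportion_se` is an illustrative name.

```python
from math import sqrt


def proportion_se(p: float, n: int) -> float:
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    return sqrt(p * (1 - p) / n)


# Coin (p = 1/2) and die showing a 1 (p = 1/6) at the sample sizes in the panels.
se_table = {
    (label, n): proportion_se(p, n)
    for label, p in (("coin", 1 / 2), ("die", 1 / 6))
    for n in (10, 30, 50)
}

for (label, n), se in se_table.items():
    print(f"{label}, n={n:2d}: SE of proportion = {se:.4f}")
```

The standard error shrinks as n increases for both the coin and the die, in line with the tighter distributions in the right-hand panels.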
Central Limit Theorem - Die rolling

Sampling distribution for the sample mean of an unbiased 6-sided die roll.

[Figure: three panels, n=10, n=30, and n=50, each showing Density against x from 1 to 6]
Central Limit Theorem - Welfare

A certain group of welfare recipients receives SNAP benefits with a mean of $110 per week and a standard deviation of $20. A random sample of 30 people is taken and the sample mean is calculated.

What is the expected value of the sample mean?

Let \(X_i\) be the SNAP benefit for individual \(i\). We know \(E[X_i] = \$110\) and \(\mathrm{Var}[X_i] = \$20^2\). Thus, \(E[\overline{X}_{30}] = \$110\).

What is the standard error of the sample mean?

The standard error is \(\sigma/\sqrt{n} = \$20/\sqrt{30} \approx \$3.65\).

What is the approximate probability the sample mean will be greater than $120?

We know \(\overline{X}_{30} \stackrel{\cdot}{\sim} N(\$110, \$3.65^2)\), so

\[ P(\overline{X}_{30} > \$120) = P\left(\frac{\overline{X}_{30} - \$110}{\$3.65} > \frac{\$120 - \$110}{\$3.65}\right) \approx P(Z > 2.74) = 1 - P(Z < 2.74) = 1 - 0.9969 = 0.0031 \]
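The same calculation can be done without a normal table. The sketch below uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative. Because it keeps the unrounded standard error and z-value, the result may differ from the table-based answer in the last digit or so.

```python
from math import sqrt
from statistics import NormalDist

mu = 110.0     # population mean of weekly SNAP benefit, in dollars
sigma = 20.0   # population standard deviation, in dollars
n = 30         # sample size

# Standard error of the sample mean from the CLT.
standard_error = sigma / sqrt(n)

# Approximate sampling distribution of the sample mean.
sampling_dist = NormalDist(mu=mu, sigma=standard_error)

# P(sample mean > $120) is the upper tail beyond 120.
z_value = (120 - mu) / standard_error
prob_above_120 = 1 - sampling_dist.cdf(120)

print(f"standard error = {standard_error:.2f}")
print(f"z = {z_value:.2f}")
print(f"P(sample mean > 120) = {prob_above_120:.4f}")
```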
Central Limit Theorem - Process to use CLT

Given a scientific question, do the following:
1. Identify the random variables \(X_1, X_2, \ldots\).
2. Verify these are independent and identically distributed.
3. Determine the expectation/mean and variance (or standard deviation) of the \(X_i\).
4. Determine the sample size. Is the sample size large enough for the CLT to apply?
5. If yes, determine the approximate sampling distribution for the sample mean.
6. Write the scientific question in mathematical/probabilistic notation.
7. Calculate your answer.
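Steps 3 through 7 can be sketched as a small function for questions of the form "what is the probability the sample mean exceeds some value?". Steps 1 and 2 remain judgment calls that code cannot make. The function name `clt_upper_tail` and the `min_n=30` cutoff are assumptions for illustration: the slides do not say how large a sample must be, and 30 is only a commonly quoted rule of thumb.

```python
from math import sqrt
from statistics import NormalDist


def clt_upper_tail(mu: float, sigma: float, n: int, threshold: float,
                   min_n: int = 30) -> float:
    """Approximate P(sample mean > threshold) using the CLT.

    mu, sigma: population mean and standard deviation of each X_i (step 3).
    n: sample size (step 4). min_n is a hypothetical rule-of-thumb cutoff.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    # Step 4: is the sample size large enough for the CLT to apply?
    if n < min_n:
        raise ValueError(f"n={n} is below the assumed cutoff of {min_n}")
    # Step 5: approximate sampling distribution N(mu, sigma^2 / n).
    sampling_dist = NormalDist(mu=mu, sigma=sigma / sqrt(n))
    # Steps 6 and 7: the question is P(sample mean > threshold).
    return 1 - sampling_dist.cdf(threshold)


# Example: mean of 80 rolls of a fair die (mu = 3.5, sigma^2 = 35/12) exceeding 4.
die_prob = clt_upper_tail(mu=3.5, sigma=sqrt(35 / 12), n=80, threshold=4.0)
print(f"P(mean of 80 die rolls > 4) is approximately {die_prob:.4f}")
```

Raising an error for small samples is one possible design; returning the approximation with a warning would be another reasonable choice.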