Unit 3: Foundations for inference 1. Variability in estimates and CLT


  1. Unit 3: Foundations for inference 1. Variability in estimates and CLT GOVT 3990 - Spring 2020 Cornell University

  2. Outline
     1. Housekeeping
     2. Main ideas
        1. Sample statistics vary from sample to sample
        2. CLT describes the shape, center, and spread of sampling distributions
        3. CLT only applies when independence and sample size/skew conditions are met
     3. Exercises [time permitting]
     4. Summary

  4. Announcements
     ◮ Decks online
     ◮ Grades
     ◮ Problem Set and Lab now due Friday

  11. Sample statistics vary from sample to sample
      ◮ We are often interested in population parameters.
      ◮ Since complete populations are difficult (or impossible) to collect data on, we use sample statistics as point estimates for the unknown population parameters of interest.
      ◮ Sample statistics vary from sample to sample.
      ◮ Quantifying how sample statistics vary provides a way to estimate the margin of error associated with our point estimate.
      ◮ But before we get to quantifying the variability among samples, let’s try to understand how and why point estimates vary from sample to sample.
      Suppose we randomly sample 1,000 adults from each state in the US. Would you expect the sample means of their ages to be the same, somewhat different, or very different?

  12. We would like to estimate the average number of drinks it takes students to get drunk.
      ◮ We will assume that our population consists of 146 students.
      ◮ Assume also that we don’t have the resources to collect data from all 146, so we will take a sample of size n = 10.
      If we randomly select observations from this data set, which values are most likely to be selected, and which are least likely?
      [Figure: distribution of the population data; x-axis: number of drinks to get drunk]

  14. Social Media Activity Survey
      Back in 2015 we surveyed all 146 students of GOVT 111 and asked them, among other things, about their social media activity. For instance, we asked how many social media accounts they had. These were their answers (ID = student ID, Ans = answer):

       ID  Ans |  ID  Ans |  ID  Ans |  ID  Ans |  ID  Ans |  ID  Ans |  ID  Ans |  ID  Ans
        1    7 |  21    6 |  41    6 |  61   10 |  81    6 | 101    4 | 121    6 | 141    4
        2    5 |  22    2 |  42   10 |  62    7 |  82    5 | 102    7 | 122    5 | 142    6
        3    4 |  23    6 |  43    3 |  63    4 |  83    6 | 103    6 | 123    3 | 143    6
        4    4 |  24    7 |  44    6 |  64    5 |  84    8 | 104    8 | 124    2 | 144    4
        5    6 |  25    3 |  45   10 |  65    6 |  85    4 | 105    3 | 125    2 | 145    5
        6    2 |  26    6 |  46    4 |  66    6 |  86   10 | 106    6 | 126    5 | 146    5
        7    3 |  27    5 |  47    3 |  67    6 |  87    5 | 107    2 | 127   10 |
        8    5 |  28    8 |  48    3 |  68    7 |  88   10 | 108    5 | 128    4 |
        9    5 |  29    0 |  49    6 |  69    7 |  89    8 | 109    1 | 129    1 |
       10    6 |  30    8 |  50    8 |  70    5 |  90    5 | 110    5 | 130    4 |
       11    1 |  31    5 |  51    8 |  71   10 |  91    4 | 111    5 | 131   10 |
       12   10 |  32    9 |  52    8 |  72    3 |  92  0.5 | 112    4 | 132    8 |
       13    4 |  33    7 |  53    2 |  73  5.5 |  93    3 | 113    4 | 133   10 |
       14    4 |  34    5 |  54    4 |  74    7 |  94    3 | 114    9 | 134    6 |
       15    6 |  35    5 |  55    8 |  75   10 |  95    5 | 115    4 | 135    6 |
       16    3 |  36    7 |  56    3 |  76    6 |  96    6 | 116    3 | 136    6 |
       17   10 |  37    4 |  57    5 |  77    6 |  97    4 | 117    3 | 137    7 |
       18    8 |  38    0 |  58    5 |  78    5 |  98    4 | 118    4 | 138    3 |
       19    5 |  39    4 |  59    8 |  79    4 |  99    2 | 119    4 | 139   10 |
       20   10 |  40    3 |  60    4 |  80    5 | 100    5 | 120    8 | 140    4 |

  18. ◮ Now, let's sample, with replacement, ten student IDs (the white cells):
        > sample(1:146, size = 10, replace = TRUE)
        [1] 59 121 88 46 58 72 82 81 5 10
      ◮ Find the students with these IDs.
      ◮ Calculate the sample mean of their answers:
        (8 + 6 + 10 + 4 + 5 + 3 + 5 + 6 + 6 + 6) / 10 = 5.9
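
The lookup and the average on this slide can be reproduced in R. This is a minimal sketch, not part of the original deck: the vector name `answers` is an assumption, its values are transcribed from the table of answers on slide 14 in student-ID order, and the ten IDs are the ones drawn on the slide rather than a fresh random draw.

```r
# Answers of the 146 students, transcribed in ID order from the slide-14 table.
# The name `answers` is hypothetical; it does not appear in the deck.
answers <- c(
  7, 5, 4, 4, 6, 2, 3, 5, 5, 6, 1, 10, 4, 4, 6, 3, 10, 8, 5, 10,      # IDs 1-20
  6, 2, 6, 7, 3, 6, 5, 8, 0, 8, 5, 9, 7, 5, 5, 7, 4, 0, 4, 3,         # IDs 21-40
  6, 10, 3, 6, 10, 4, 3, 3, 6, 8, 8, 8, 2, 4, 8, 3, 5, 5, 8, 4,       # IDs 41-60
  10, 7, 4, 5, 6, 6, 6, 7, 7, 5, 10, 3, 5.5, 7, 10, 6, 6, 5, 4, 5,    # IDs 61-80
  6, 5, 6, 8, 4, 10, 5, 10, 8, 5, 4, 0.5, 3, 3, 5, 6, 4, 4, 2, 5,     # IDs 81-100
  4, 7, 6, 8, 3, 6, 2, 5, 1, 5, 5, 4, 4, 9, 4, 3, 3, 4, 4, 8,         # IDs 101-120
  6, 5, 3, 2, 2, 5, 10, 4, 1, 4, 10, 8, 10, 6, 6, 6, 7, 3, 10, 4,     # IDs 121-140
  4, 6, 6, 4, 5, 5                                                    # IDs 141-146
)
stopifnot(length(answers) == 146)

# The ten IDs drawn on the slide (fixed here so the result is reproducible).
ids <- c(59, 121, 88, 46, 58, 72, 82, 81, 5, 10)

sampled <- answers[ids]       # values: 8 6 10 4 5 3 5 6 6 6
sample_mean <- mean(sampled)  # 5.9
print(sampled)
print(sample_mean)
```

Replacing the fixed `ids` with `sample(1:146, size = 10, replace = TRUE)` gives a new sample, and usually a different sample mean, on each run.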

  19. Activity: Creating a sampling distribution
      Repeat this, now on your own, and report your sample mean.
        > sample(1:146, size = 10, replace = TRUE)
      1. Find the students with these IDs in the table of answers from slide 14.
      2. Calculate the sample mean and round it to 2 decimal places.

  21. Sampling distribution
      What you just constructed is called a sampling distribution.
      What are the shape and center of this distribution? Based on this distribution, what do you think is the true population average?
      The true population average is 5.39.
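
The in-class activity can also be simulated. The sketch below is an illustration under stated assumptions, not the deck's own code: `answers` is a hypothetical name for the slide-14 values in ID order, 5,000 repetitions stand in for the class's reported sample means, and the seed is arbitrary.

```r
# Population: the 146 answers from slide 14, in ID order (name is hypothetical).
answers <- c(
  7, 5, 4, 4, 6, 2, 3, 5, 5, 6, 1, 10, 4, 4, 6, 3, 10, 8, 5, 10,
  6, 2, 6, 7, 3, 6, 5, 8, 0, 8, 5, 9, 7, 5, 5, 7, 4, 0, 4, 3,
  6, 10, 3, 6, 10, 4, 3, 3, 6, 8, 8, 8, 2, 4, 8, 3, 5, 5, 8, 4,
  10, 7, 4, 5, 6, 6, 6, 7, 7, 5, 10, 3, 5.5, 7, 10, 6, 6, 5, 4, 5,
  6, 5, 6, 8, 4, 10, 5, 10, 8, 5, 4, 0.5, 3, 3, 5, 6, 4, 4, 2, 5,
  4, 7, 6, 8, 3, 6, 2, 5, 1, 5, 5, 4, 4, 9, 4, 3, 3, 4, 4, 8,
  6, 5, 3, 2, 2, 5, 10, 4, 1, 4, 10, 8, 10, 6, 6, 6, 7, 3, 10, 4,
  4, 6, 6, 4, 5, 5
)

# Population mean: 787 / 146, which rounds to 5.39.
pop_mean <- mean(answers)

set.seed(3990)  # arbitrary seed, only for reproducibility

# Each repetition draws 10 students with replacement and records the sample mean.
sample_means <- replicate(
  5000,
  mean(sample(answers, size = 10, replace = TRUE))
)

# Center and spread of the simulated sampling distribution.
center <- mean(sample_means)
spread <- sd(sample_means)

print(round(pop_mean, 2))
print(c(center = center, spread = spread))
```

Calling `hist(sample_means)` afterwards shows the shape; the center of the simulated sample means should sit close to the population mean.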

  22. Average number of Syracuse games attended
      Next let’s look at the population data for the number of Syracuse basketball games attended:
      [Figure: population distribution; x-axis: number of games attended]

  25. Average number of Syracuse games attended (cont.)
      Sampling distribution, n = 10:
      [Figure: sampling distribution; x-axis: sample means from samples of n = 10]
      What does each observation in this distribution represent?
      Sample mean, x̄, of samples of size n = 10.
      Is the variability of the sampling distribution smaller or larger than the variability of the population distribution?
      Smaller: sample means will vary less than individual observations.
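
One way to see why the answer is "smaller": assume the n observations are drawn independently from a population with mean μ and standard deviation σ (symbols introduced here, not on the slide). Under that assumption the variance of the sample mean works out as follows.

```latex
\operatorname{Var}(\bar{x})
  = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)
  = \frac{1}{n^{2}}\sum_{i=1}^{n}\operatorname{Var}(x_i)
  = \frac{\sigma^{2}}{n},
\qquad
\operatorname{SD}(\bar{x}) = \frac{\sigma}{\sqrt{n}}
```

With n = 10 this gives SD(x̄) = σ/√10 ≈ 0.32σ, so sample means of size 10 are expected to vary roughly a third as much as individual observations.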

  26. Average number of Syracuse games attended (cont.)
      Sampling distribution, n = 30:
      [Figure: sampling distribution; x-axis: sample means from samples of n = 30]
      How did the shape, center, and spread of the sampling distribution change going from n = 10 to n = 30?
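
The comparison between n = 10 and n = 30 can be explored with a small simulation. The Syracuse data are not reproduced in this deck, so the sketch below uses a hypothetical right-skewed stand-in population drawn from an exponential distribution; the population, the helper names `skewness` and `sampling_dist`, the repetition count, and the seed are all assumptions.

```r
set.seed(3990)  # arbitrary seed, only for reproducibility

# Hypothetical right-skewed stand-in population; NOT the Syracuse data.
population <- rexp(1000, rate = 1 / 5)

# Simple moment-based skewness (base R has no built-in skewness function).
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

# Simulated sampling distribution of the mean for samples of size n.
sampling_dist <- function(n, reps = 5000) {
  replicate(reps, mean(sample(population, size = n, replace = TRUE)))
}

means_10 <- sampling_dist(10)
means_30 <- sampling_dist(30)

# Center, spread, and skew of the two simulated sampling distributions.
summary_table <- data.frame(
  n      = c(10, 30),
  center = c(mean(means_10), mean(means_30)),
  spread = c(sd(means_10), sd(means_30)),
  skew   = c(skewness(means_10), skewness(means_30))
)
print(summary_table)
```

In a run like this the center should stay near the population mean for both sample sizes, while the spread and the skew should both be smaller for n = 30.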
