The General Social Survey


  1. The General Social Survey

     Inference for Categorical Data in R

     Andrew Bray, Assistant Professor of Statistics at Reed College

  7. Exploring GSS

         library(dplyr)
         glimpse(gss)

         Observations: 3,300
         Variables: 25
         $ id     <dbl> 518, 1092, 2094, 229, 979, 554, 491, 319, 3143, 1...
         $ year   <dbl> 1982, 1982, 1982, 1982, 1982, 1982, 1982, 1982, 1...
         $ age    <fct> 49, 22, 26, 75, 71, 33, 56, 33, 69, 40, 44, 42, 5...
         $ class  <fct> WORKING CLASS, WORKING CLASS, WORKING CLASS, LOWE...
         $ degree <fct> HIGH SCHOOL, HIGH SCHOOL, HIGH SCHOOL, LT HIGH SC...
         $ sex    <fct> MALE, MALE, MALE, MALE, FEMALE, FEMALE, MALE, FEM...
         $ happy  <fct> HAPPY, HAPPY, HAPPY, HAPPY, HAPPY, HAPPY, HAPPY, ...

  8. Exploring GSS

         library(ggplot2)  # needed for ggplot() and geom_bar()

         # Keep only the 2016 respondents, then plot the counts of each response
         gss2016 <- filter(gss, year == 2016)
         ggplot(gss2016, aes(x = happy)) +
           geom_bar()

  10. Exploring GSS

          p_hat <- gss2016 %>%
            summarize(prop_happy = mean(happy == "HAPPY")) %>%
            pull()
          p_hat

          0.7733333

  11. General 95% confidence interval

      $(\hat{p} - 2 \times SE,\ \hat{p} + 2 \times SE)$

      Sample proportion plus or minus two standard errors
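
The multiplier of two in this rule is a rounded value. As a minimal sketch (the helper name `ci_two_se` and the input numbers are illustrative assumptions, not taken from the slides), the snippet below prints the normal quantile that the "2" approximates and wraps the rule in a small function.

```r
# The 97.5th percentile of the standard normal is about 1.96,
# which the slides round up to 2.
z <- qnorm(0.975)
print(z)

# Hypothetical helper (not from the slides): sample proportion
# plus or minus two standard errors.
ci_two_se <- function(p_hat, se) {
  c(lower = p_hat - 2 * se, upper = p_hat + 2 * se)
}

# Made-up inputs, for illustration only.
ci <- ci_two_se(p_hat = 0.6, se = 0.05)
print(ci)
```

Using 2 instead of 1.96 makes the interval very slightly wider than a nominal 95% interval.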

  12-25. Bootstrap (figure slides)
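
The bootstrap idea can be sketched in base R without any packages. This is only an illustration under stated assumptions: the sample below is made up (150 responses, 116 of them "HAPPY", chosen to be consistent with the p_hat of about 0.773 shown earlier; the actual sample size and the other response label are not given on these slides), and `boot_prop` is a hypothetical helper name.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Made-up sample; "UNHAPPY" is an assumed label for the non-success level.
happy <- c(rep("HAPPY", 116), rep("UNHAPPY", 34))

# One bootstrap replicate: resample n responses with replacement
# and recompute the sample proportion.
boot_prop <- function(x, success) {
  resample <- sample(x, size = length(x), replace = TRUE)
  mean(resample == success)
}

# 500 replicates, mirroring reps = 500 used later in the deck.
boot_stats <- replicate(500, boot_prop(happy, "HAPPY"))

# The bootstrap standard error is the standard deviation
# of the replicate proportions.
se_boot <- sd(boot_stats)
print(se_boot)
```

Each replicate has the same size as the original sample, so the spread of `boot_stats` estimates how much the sample proportion would vary from sample to sample.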

  26. Bootstrap Confidence Interval

          library(infer)

          boot <- gss2016 %>%
            specify(response = happy, success = "HAPPY") %>%
            generate(reps = 500, type = "bootstrap") %>%
            calculate(stat = "prop")

          boot

          Response: happy (factor)
          # A tibble: 500 x 2
            replicate  stat
                <int> <dbl>
          1         1 0.827
          2         2 0.740
          3         3 0.780
          4         4 0.773
          5         5 0.747
          6         6 0.753

  27. Bootstrap Confidence Interval

          ggplot(boot, aes(x = stat)) +
            geom_density()

  28. Bootstrap Confidence Interval

          SE <- boot %>%
            summarize(sd(stat)) %>%
            pull()
          SE

          0.03482251

      $(\hat{p} - 2 \times SE,\ \hat{p} + 2 \times SE)$

          c(p_hat - 2 * SE, p_hat + 2 * SE)

          0.7051883 0.8412784

  29. Let's practice!

  30. Interpreting a Confidence Interval

  31. Confidence intervals

      Conclusion: the true proportion of Americans that are happy is between 0.705 and 0.841.

      What do we mean by confident?

  32. Dataset 1

          ds1 <- filter(gss, year == 2016)

          p_hat <- ds1 %>%
            summarize(mean(happy == "HAPPY")) %>%
            pull()

          SE <- ds1 %>%
            specify(response = happy, success = "HAPPY") %>%
            generate(reps = 500, type = "bootstrap") %>%
            calculate(stat = "prop") %>%
            summarize(sd(stat)) %>%
            pull()

          c(p_hat - 2 * SE, p_hat + 2 * SE)

          0.7073114 0.8393553

  43. Dataset 2

          ds2 <- filter(gss, year == 2014)

          # Use ds2 here, not ds1, so the interval reflects the 2014 data
          p_hat <- ds2 %>%
            summarize(mean(happy == "HAPPY")) %>%
            pull()

          SE <- ds2 %>%
            specify(response = happy, success = "HAPPY") %>%
            generate(reps = 500, type = "bootstrap") %>%
            calculate(stat = "prop") %>%
            summarize(sd(stat)) %>%
            pull()

          c(p_hat - 2 * SE, p_hat + 2 * SE)

          0.8348831 0.9384503

  44-49. Dataset 3

          ds3 <- filter(gss, year == 2012)

          p_hat <- ds3 %>%
            summarize(mean(happy == "HAPPY")) %>%
            pull()

          SE <- ds3 %>%
            specify(response = happy, success = "HAPPY") %>%
            generate(reps = 500, type = "bootstrap") %>%
            calculate(stat = "prop") %>%
            summarize(sd(stat)) %>%
            pull()

          c(p_hat - 2 * SE, p_hat + 2 * SE)

          0.7626359 0.8906974

  50. Confidence Intervals

      Interpretation: "We're 95% confident that the true proportion of Americans that are happy is between 0.705 and 0.841."

      Width of the interval is affected by n, the confidence level, and p.
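
One common way to read "95% confident" is as a statement about the procedure across many datasets rather than about a single interval. The simulation below is a sketch of that reading under made-up assumptions: a hypothetical population with true proportion 0.77 and repeated samples of size 150. Neither number comes from the slides. For speed it draws the bootstrap success counts from a binomial distribution, which is equivalent to resampling a 0/1 vector with replacement.

```r
set.seed(42)  # arbitrary seed for reproducibility

true_p     <- 0.77   # hypothetical population proportion (assumption)
n          <- 150    # hypothetical sample size (assumption)
n_datasets <- 1000   # number of simulated datasets

covers <- replicate(n_datasets, {
  x <- rbinom(n, size = 1, prob = true_p)   # one new dataset of 0/1 responses
  p_hat <- mean(x)
  # Bootstrap replicates: resampling n values with replacement from x
  # gives a success count distributed as Binomial(n, p_hat).
  boot_stats <- rbinom(500, size = n, prob = p_hat) / n
  se <- sd(boot_stats)
  # Does this dataset's interval contain the true proportion?
  (p_hat - 2 * se <= true_p) && (true_p <= p_hat + 2 * se)
})

coverage <- mean(covers)
print(coverage)
```

The share of intervals that contain `true_p` should land near 0.95. Any single interval either contains the true value or does not.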

  51. Let's practice!

  52. The approximation shortcut

  53. Confidence Intervals

      Standard errors increase when n is small or p is close to 0.5.

          SE
          0.009998905

          SE_small_n
          0.03809731

          SE_low_p
          0.00547912
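
The inputs behind the three standard errors above are not shown on the slide, so the sketch below uses made-up sample sizes and proportions to illustrate the same pattern. `boot_se` is a hypothetical helper that bootstraps a 0/1 sample in base R.

```r
set.seed(7)  # arbitrary seed for reproducibility

# Hypothetical helper: bootstrap SE of a sample proportion for a 0/1 sample
# with the given size and proportion of successes.
boot_se <- function(n, p_hat, reps = 500) {
  k <- round(n * p_hat)                 # number of successes
  x <- c(rep(1, k), rep(0, n - k))      # the observed 0/1 sample
  sd(replicate(reps, mean(sample(x, replace = TRUE))))
}

# Made-up scenarios (assumptions, not the slide's data).
se_base    <- boot_se(n = 2000, p_hat = 0.50)  # large n, p near 0.5
se_small_n <- boot_se(n = 150,  p_hat = 0.50)  # same p, much smaller n
se_low_p   <- boot_se(n = 2000, p_hat = 0.05)  # same n, p far from 0.5

print(c(base = se_base, small_n = se_small_n, low_p = se_low_p))
```

Shrinking n raises the standard error, while moving p away from 0.5 lowers it.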

  56. The normal distribution

      A.K.A. the "bell curve".

      If observations are independent and n is large, then $\hat{p}$ follows a normal distribution.

  57. Standard deviation

      $\sqrt{\dfrac{\hat{p} \times (1 - \hat{p})}{n}}$

  58. Assessing model assumptions

      How do I check "observations are independent"? This depends upon the data collection method.

      What does "n is large" mean?

      $n \times \hat{p} > 10$

      $n \times (1 - \hat{p}) > 10$
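
The two "n is large" conditions can be bundled into a single check. This is a minimal sketch: `n_is_large` is a hypothetical helper name, and the example inputs are made up.

```r
# Hypothetical helper: TRUE when both the expected number of successes
# and the expected number of failures exceed the threshold of 10.
n_is_large <- function(n, p_hat, threshold = 10) {
  (n * p_hat > threshold) && (n * (1 - p_hat) > threshold)
}

# Made-up inputs, for illustration only.
ok_case  <- n_is_large(n = 150, p_hat = 0.77)  # 115.5 successes, 34.5 failures
bad_case <- n_is_large(n = 30,  p_hat = 0.90)  # 27 successes, only 3 failures

print(c(ok_case, bad_case))
```

When the check fails, the bootstrap interval from earlier in the deck remains available, since it does not rely on the normal approximation.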

  59. Calculating standard error: approximation

          p_hat <- gss2016 %>%
            summarize(mean(happy == "HAPPY")) %>%
            pull()
          n <- nrow(gss2016)

          c(n * p_hat, n * (1 - p_hat))

          116 35

          SE_approx <- sqrt(p_hat * (1 - p_hat) / n)
          SE_approx

          0.03418468
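
To see how close the shortcut comes to the bootstrap, the printed values from the slides can be plugged back into the two-standard-error rule. The numbers below are copied from the deck's output. Only the variable names `SE_boot`, `ci_approx`, and `rel_diff` are introduced here for illustration.

```r
# Values copied from the printed output on the slides.
p_hat     <- 0.7733333
SE_approx <- 0.03418468   # normal-approximation standard error
SE_boot   <- 0.03482251   # bootstrap standard error

# Interval based on the approximation.
ci_approx <- c(p_hat - 2 * SE_approx, p_hat + 2 * SE_approx)
print(ci_approx)

# Relative difference between the two standard errors.
rel_diff <- abs(SE_boot - SE_approx) / SE_boot
print(rel_diff)
```

The two standard errors differ by about 2%, so the approximate interval is close to the bootstrap interval of 0.705 to 0.841 reported earlier.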
