Predicting voter turnout from survey data



  1. DataCamp, Supervised Learning in R: Case Studies. Predicting voter turnout from survey data. Julia Silge, Data Scientist at Stack Overflow

  2. Views of the Electorate Research Survey (VOTER)
     Democracy Fund Voter Study Group
     Politically diverse group of analysts and scholars in the United States
     Data is freely available

  3. Views of the Electorate Research Survey (VOTER)
     Life in America today for people like you compared to fifty years ago is better? about the same? worse?
     Was your vote primarily a vote in favor of your choice or was it mostly a vote against his/her opponent?
     How important are the following issues to you? Crime, immigration, the environment, gay rights

  4. Views of the Electorate Research Survey (VOTER)

  5. Interpreting integer survey responses
     AMERICA IS A FAIR SOCIETY WHERE EVERYONE HAS THE OPPORTUNITY TO GET AHEAD

     Response            Code
     Strongly agree      1
     Agree               2
     Disagree            3
     Strongly disagree   4

     Learn more about the data yourself!

  6. Predicting voter turnout

     > voters %>%
     +     count(turnout16_2016)
     # A tibble: 2 x 2
       turnout16_2016     n
       <fct>          <int>
     1 Did not vote     264
     2 Voted           6428
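
The counts on this slide imply a heavily imbalanced outcome. A minimal sketch of that arithmetic in base R, using only the two counts shown above (the variable names are illustrative and do not come from the course):

```r
# Class counts copied from the count() output on the slide
turnout_counts <- c("Did not vote" = 264, "Voted" = 6428)

# Share of each class among all respondents
turnout_share <- turnout_counts / sum(turnout_counts)
print(round(turnout_share, 3))

# Accuracy of a trivial classifier that always predicts the majority class
baseline_accuracy <- max(turnout_share)
print(round(baseline_accuracy, 3))
```

Because roughly 96% of respondents voted, a model can reach about 96% accuracy without learning anything, so accuracy alone says little about how well non-voters are identified.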

  7. Let's get started!

  8. VOTE 2016. Julia Silge, Data Scientist at Stack Overflow

  9. Exploratory data analysis

                    Elections don't matter   Gay rights are very important   Crime is very important
     Did not vote   55.3%                    17.0%                           66.3%
     Voted          34.1%                    25.3%                           57.6%

  10. Exploratory data analysis

  11. Fitting a simple model

  12. Fitting a simple model

      > library(broom)
      >
      > simple_glm %>%
      +     tidy() %>%
      +     filter(p.value < 0.05) %>%
      +     arrange(desc(estimate))
                         term    estimate  std.error statistic      p.value
       1          (Intercept)  2.45703562 0.73272138  3.353301 7.985370e-04
       2         imiss_a_2016  0.39712084 0.13898678  2.857256 4.273207e-03
       3         imiss_l_2016  0.27468893 0.10678119  2.572447 1.009825e-02
       4         imiss_q_2016  0.24456695 0.11909335  2.053573 4.001699e-02
       5           track_2016  0.24107452 0.12146679  1.984695 4.717843e-02
       6 RIGGED_SYSTEM_1_2016  0.23628350 0.08508091  2.777162 5.483579e-03
       7     futuretrend_2016  0.21056782 0.07120079  2.957380 3.102651e-03
       8 RIGGED_SYSTEM_5_2016  0.19025188 0.09645384  1.972466 4.855648e-02
       9          wealth_2016 -0.06940523 0.02634395 -2.634580 8.424157e-03
      10         imiss_k_2016 -0.18103020 0.08272555 -2.188323 2.864611e-02
      11       econtrend_2016 -0.29536980 0.08722417 -3.386330 7.083422e-04
      12         imiss_f_2016 -0.32328040 0.10543220 -3.066240 2.167694e-03
      13         imiss_g_2016 -0.33203385 0.07867346 -4.220405 2.438640e-05
      14         imiss_n_2016 -0.44161183 0.09003981 -4.904628 9.360434e-07
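
Assuming `simple_glm` is a logistic regression (the slides do not show how it was fit, though a later slide labels a model `Logistic regression`), the estimates above are on the log-odds scale. A small sketch of converting a few of them to odds ratios, with the values copied from the table and the vector names chosen only for illustration:

```r
# Three estimates copied from the tidy() output on the slide
estimates <- c(
  imiss_a_2016 =  0.39712084,
  wealth_2016  = -0.06940523,
  imiss_n_2016 = -0.44161183
)

# Exponentiating a log-odds coefficient gives an odds ratio
odds_ratios <- exp(estimates)
print(round(odds_ratios, 3))
```

An odds ratio above 1 means a one-unit increase in the integer response code is associated with higher odds of the outcome level the model treats as success, and a ratio below 1 with lower odds. Since low codes often mean stronger agreement or importance in this survey, the direction is easy to misread without the codebook.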

  13. Let's build some models!

  14. Cross-validation. Julia Silge, Data Scientist at Stack Overflow

  15. Cross-validation
      Partitioning your data into subsets and using one subset for validation

  16. Cross-validation
      Partitioning your data into subsets and using one subset for validation
      method = "cv"
      method = "repeatedcv"
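
The two `method` values on this slide are assumed here to be options for caret's `trainControl()`, since the deck later calls caret's `confusionMatrix()`. Rather than guess at the course's exact call, the sketch below spells out the same idea by hand in base R on simulated data. The `sim` data frame, its columns, and the `cv_accuracy()` helper are all hypothetical.

```r
set.seed(1234)

# Simulated stand-in for the survey data (not the VOTER data)
n <- 500
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y <- factor(
  ifelse(runif(n) < plogis(0.5 + sim$x1), "Voted", "Did not vote"),
  levels = c("Did not vote", "Voted")
)

# k-fold cross-validation, optionally repeated with fresh random folds.
# Returns one held-out accuracy per fold per repeat.
cv_accuracy <- function(data, k = 10, repeats = 1) {
  acc <- numeric(0)
  for (r in seq_len(repeats)) {
    # Randomly assign every row to one of k folds
    fold <- sample(rep(seq_len(k), length.out = nrow(data)))
    for (i in seq_len(k)) {
      train <- data[fold != i, ]
      held  <- data[fold == i, ]
      fit <- glm(y ~ x1 + x2, family = binomial, data = train)
      # glm models the second factor level ("Voted") as success
      p <- predict(fit, newdata = held, type = "response")
      pred <- ifelse(p > 0.5, "Voted", "Did not vote")
      acc <- c(acc, mean(pred == held$y))
    }
  }
  acc
}

acc_cv  <- cv_accuracy(sim, k = 10, repeats = 1)  # analogous to "cv"
acc_rep <- cv_accuracy(sim, k = 10, repeats = 5)  # analogous to "repeatedcv"
print(c(cv = mean(acc_cv), repeatedcv = mean(acc_rep)))
```

Repeating the procedure refits the model `k * repeats` times, which is why repeated cross-validation gives a steadier estimate at a higher computational cost.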

  21. Cross-validation
      Repeated cross-validation can take a long time
      Parallel processing can be worth it

  22. Let's practice!

  23. Comparing model performance. Julia Silge, Data Scientist at Stack Overflow

  24. Confusion matrix

      > confusionMatrix(predict(fit_glm, training),
      +                 training$turnout16_2016)
      Confusion Matrix and Statistics

                    Reference
      Prediction     Did not vote Voted
        Did not vote          149  1633
        Voted                  63  3510

                     Accuracy : 0.6833
                       95% CI : (0.6706, 0.6957)
          No Information Rate : 0.9604
          P-Value [Acc > NIR] : 1

                        Kappa : 0.0847
       Mcnemar's Test P-Value : <2e-16

                  Sensitivity : 0.70283
                  Specificity : 0.68248
               Pos Pred Value : 0.08361
               Neg Pred Value : 0.98237
                   Prevalence : 0.03959
               Detection Rate : 0.02782
         Detection Prevalence : 0.33277
            Balanced Accuracy : 0.69266

             'Positive' Class : Did not vote
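
The summary statistics in this output follow directly from the four cell counts. A worked sketch in base R, with the counts copied from the slide and "Did not vote" treated as the positive class (the object names are illustrative):

```r
# Counts copied from the training-set confusion matrix for fit_glm
# (rows = prediction, columns = reference; matrix() fills column by column)
cm <- matrix(
  c(149, 63, 1633, 3510), nrow = 2,
  dimnames = list(Prediction = c("Did not vote", "Voted"),
                  Reference  = c("Did not vote", "Voted"))
)

tp <- cm["Did not vote", "Did not vote"]  # non-voters predicted as non-voters
fn <- cm["Voted", "Did not vote"]         # non-voters predicted as voters
fp <- cm["Did not vote", "Voted"]         # voters predicted as non-voters
tn <- cm["Voted", "Voted"]                # voters predicted as voters

sensitivity <- tp / (tp + fn)
specificity <- tn / (tn + fp)

metrics <- c(
  accuracy          = (tp + tn) / sum(cm),
  sensitivity       = sensitivity,
  specificity       = specificity,
  pos_pred_value    = tp / (tp + fp),
  neg_pred_value    = tn / (tn + fn),
  balanced_accuracy = (sensitivity + specificity) / 2
)
print(round(metrics, 5))
```

The low positive predictive value reflects the class imbalance: most respondents flagged as non-voters did in fact vote.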

  25. Confusion matrix

      > confusionMatrix(predict(fit_rf, training),
      +                 training$turnout16_2016)
      Confusion Matrix and Statistics

                    Reference
      Prediction     Did not vote Voted
        Did not vote          212     5
        Voted                   0  5138

                     Accuracy : 0.9991
                       95% CI : (0.9978, 0.9997)
          No Information Rate : 0.9604
          P-Value [Acc > NIR] : < 2e-16

                        Kappa : 0.9879
       Mcnemar's Test P-Value : 0.07364

                  Sensitivity : 1.00000
                  Specificity : 0.99903
               Pos Pred Value : 0.97696
               Neg Pred Value : 1.00000
                   Prevalence : 0.03959
               Detection Rate : 0.03959
         Detection Prevalence : 0.04052
            Balanced Accuracy : 0.99951

             'Positive' Class : Did not vote

  26. Confusion matrix for the testing data

      > confusionMatrix(predict(fit_glm, testing),
      +                 testing$turnout16_2016)
      Confusion Matrix and Statistics

                    Reference
      Prediction     Did not vote Voted
        Did not vote           37   428
        Voted                  15   857

                     Accuracy : 0.6687
                       95% CI : (0.6427, 0.6939)
          No Information Rate : 0.9611
          P-Value [Acc > NIR] : 1

                        Kappa : 0.0787
       Mcnemar's Test P-Value : <2e-16

                  Sensitivity : 0.71154
                  Specificity : 0.66693
               Pos Pred Value : 0.07957
               Neg Pred Value : 0.98280
                   Prevalence : 0.03889
               Detection Rate : 0.02767
         Detection Prevalence : 0.34779
            Balanced Accuracy : 0.68923

             'Positive' Class : Did not vote

  27. Confusion matrix for the testing data

      > confusionMatrix(predict(fit_rf, testing),
      +                 testing$turnout16_2016)
      Confusion Matrix and Statistics

                    Reference
      Prediction     Did not vote Voted
        Did not vote            0    14
        Voted                  52  1271

                     Accuracy : 0.9506
                       95% CI : (0.9376, 0.9616)
          No Information Rate : 0.9611
          P-Value [Acc > NIR] : 0.9767

                        Kappa : -0.0168
       Mcnemar's Test P-Value : 5.254e-06

                  Sensitivity : 0.00000
                  Specificity : 0.98911
               Pos Pred Value : 0.00000
               Neg Pred Value : 0.96070
                   Prevalence : 0.03889
               Detection Rate : 0.00000
         Detection Prevalence : 0.01047
            Balanced Accuracy : 0.49455

             'Positive' Class : Did not vote
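
The random forest's 95% test accuracy looks strong until it is set against chance agreement. A sketch of the no-information rate and Cohen's kappa computed from the counts on this slide, using the standard kappa formula (the object names are illustrative):

```r
# Counts copied from the testing-set confusion matrix for fit_rf
# (rows = prediction, columns = reference; matrix() fills column by column)
cm_rf <- matrix(
  c(0, 52, 14, 1271), nrow = 2,
  dimnames = list(Prediction = c("Did not vote", "Voted"),
                  Reference  = c("Did not vote", "Voted"))
)
n <- sum(cm_rf)

# Observed agreement is the accuracy
observed <- sum(diag(cm_rf)) / n

# No-information rate: share of the largest reference class
nir <- max(colSums(cm_rf)) / n

# Agreement expected by chance, from the row and column totals
expected <- sum(rowSums(cm_rf) * colSums(cm_rf)) / n^2

# Cohen's kappa: agreement beyond chance, rescaled so that 1 is perfect
cohen_kappa <- (observed - expected) / (1 - expected)

print(round(c(accuracy = observed, nir = nir, kappa = cohen_kappa), 4))
```

Accuracy falls below the no-information rate and kappa is slightly negative, so on the testing data this model does no better than always predicting "Voted", despite near-perfect results on the training data.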

  28. Comparing model performance

      > library(yardstick)
      >
      > sens(testing_results, truth = turnout16_2016, estimate = `Logistic regression`)
      [1] 0.7115385
      >
      > spec(testing_results, truth = turnout16_2016, estimate = `Logistic regression`)
      [1] 0.6669261
      >
      > sens(testing_results, truth = turnout16_2016, estimate = `Random forest`)
      [1] 0
      >
      > spec(testing_results, truth = turnout16_2016, estimate = `Random forest`)
      [1] 0.9891051

  29. Let's finish this case study!
