DataCamp Analyzing Election and Polling Data in R
The House of Representatives in 2018
G. Elliott Morris, Data Journalist
Why use political prediction as a case study?
In data science:
  - Helps you practice data cleaning, wrangling, modeling, and visualizing skills all at once
  - Helps you understand the limits of (basic) predictive modeling
In politics:
  - Helps you craft fine-tuned expectations about what may happen in upcoming elections, so you can better anticipate outcomes
The US House
The House of Representatives:
  - Made up of 435 voting members representing all 50 US states
  - All members are up for election every two years
Tools you will need
Exercise 1: What might happen in 2018?
  filter():  polls_2018 %>% filter(date > "2018-06-01")
  mutate():  polls_2018 %>% mutate(Dem.Margin = Dem - Rep)
  pull():    polls_2018 %>% pull(Dem.Margin)
  mean():    mean(polls_2018$Dem.Margin)
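The four verbs above can be chained into a single pipeline. The following is a minimal sketch, assuming a `polls_2018` data frame with `date`, `Dem`, and `Rep` columns; the poll values are made up for illustration and are not the course data.

```r
library(dplyr)

# Hypothetical generic-ballot polls (illustrative values, not real data)
polls_2018 <- data.frame(
  date = as.Date(c("2018-05-15", "2018-06-10", "2018-07-01", "2018-08-20")),
  Dem  = c(46, 48, 47, 49),
  Rep  = c(41, 40, 41, 40)
)

recent_margin <- polls_2018 %>%
  filter(date > as.Date("2018-06-01")) %>%  # keep polls taken after June 1
  mutate(Dem.Margin = Dem - Rep) %>%        # Democratic lead in points
  pull(Dem.Margin)                          # extract the column as a vector

# Average Democratic margin across the recent polls
avg_margin <- mean(recent_margin)
print(avg_margin)
```

Pulling the column out before calling `mean()` keeps the final step a plain numeric vector operation, which is easy to check by hand.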
Tools you will need
Exercise 2: Historical polling averages from August and September since 1980
  filter():     polls %>% filter(month(date) %in% c(8,9))
  group_by():   polls %>% group_by(year)
  summarise():  polls %>% group_by(year) %>% summarise(avg = mean(Dem.Margin))
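As a rough sketch of how these steps fit together, the example below computes a per-year August/September average. It assumes that `month()` comes from the lubridate package and that `year` has to be derived from `date`; the slide does not say whether `polls` already has a `year` column. The data values are hypothetical.

```r
library(dplyr)
library(lubridate)

# Hypothetical historical polls (illustrative values only)
polls <- data.frame(
  date = as.Date(c("2014-07-20", "2014-08-05", "2014-09-12",
                   "2016-08-15", "2016-09-30", "2016-10-10")),
  Dem.Margin = c(1, 2, 4, 5, 3, 6)
)

late_summer_avg <- polls %>%
  mutate(year = year(date)) %>%             # assumption: derive year from date
  filter(month(date) %in% c(8, 9)) %>%      # keep August and September polls
  group_by(year) %>%                        # one group per election year
  summarise(avg = mean(Dem.Margin))         # average margin within each year

print(late_summer_avg)
```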
Let's practice!
Training a Model to Predict the Future with Polls
G. Elliott Morris, Data Journalist
From tidying to modeling
(Figure from http://r4ds.had.co.nz/model-intro.html)
The original model
  lm(Dem.Vote.Margin ~ Dem.Poll.Margin)
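To make the one-variable model concrete, here is a minimal sketch that fits it on a tiny hypothetical data set (four made-up elections, not the course's `generic_ballot` data) and reads off the intercept and slope.

```r
# Hypothetical polling and vote margins, in percentage points
toy_elections <- data.frame(
  Dem.Poll.Margin = c(-2, 0, 2, 4),
  Dem.Vote.Margin = c(-1, 0, 3, 4)
)

# Regress the actual vote margin on the polling margin
simple_model <- lm(Dem.Vote.Margin ~ Dem.Poll.Margin, data = toy_elections)

# The slope says how many points of vote margin each point of poll margin is worth
print(coef(simple_model))
```

A slope below 1 would suggest that polling leads tend to shrink on election day; a slope above 1 would suggest the opposite.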
A multivariate model
  ggplot(generic_ballot, aes(x=Dem.Poll.Margin, y=Dem.Vote.Margin, col=party_in_power)) +
    geom_text(aes(label=ElecYear)) +
    geom_smooth(method='lm')
A multivariate model
  model <- lm(Dem.Vote.Margin ~ Dem.Poll.Margin + party_in_power, data=polls_predict)
  summary(model)

  Call:
  lm(formula = Dem.Vote.Margin ~ Dem.Poll.Margin + party_in_power,
      data = polls_predict)

  Residuals:
      Min      1Q  Median      3Q     Max
  -4.3893 -2.4283 -0.2004  2.4982  4.6166

  Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
  (Intercept)      -2.1168     1.1244  -1.883 0.078079 .
  Dem.Poll.Margin   0.8856     0.2070   4.278 0.000577 ***
  party_in_power   -2.1348     0.8809  -2.423 0.027601 *
  Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

  Residual standard error: 3.238 on 16 degrees of freedom
  Multiple R-squared: 0.7498, Adjusted R-squared: 0.7185
  F-statistic: 23.98 on 2 and 16 DF, p-value: 1.535e-05
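Rather than reading numbers off the printed summary, the coefficient table and R-squared can be pulled out programmatically. This is a sketch on a hypothetical `polls_predict` data frame (eight made-up elections, with `party_in_power` coded as -1 or 1 to match the prediction slide), so the estimates will not match the output above.

```r
# Hypothetical training data (illustrative values only)
polls_predict <- data.frame(
  Dem.Poll.Margin = c(-3, 2, 5, -1, 7, 0, 4, -4),
  party_in_power  = c(1, -1, -1, 1, -1, 1, 1, -1),
  Dem.Vote.Margin = c(-7, 2, 6, -4, 8, -3, 1, -3)
)

model <- lm(Dem.Vote.Margin ~ Dem.Poll.Margin + party_in_power,
            data = polls_predict)

# Matrix of estimates, standard errors, t values, and p-values
coef_table <- coef(summary(model))

# Share of variance in the vote margin explained by the two predictors
r_squared <- summary(model)$r.squared

print(coef_table)
print(r_squared)
```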
Predictions on new data
  predict(model, data.frame(Dem.Poll.Margin = 8, party_in_power=-1))
         1
  7.102972
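The prediction can be checked by hand from the fitted coefficients: `predict()` simply plugs the new values into the regression equation. The sketch below uses the estimates from the model summary, rounded to four decimals as printed, so the result agrees with `predict()` only to about three decimal places.

```r
# Coefficients as printed in the summary(model) output (rounded)
intercept <- -2.1168
b_poll    <-  0.8856
b_power   <- -2.1348

# New scenario from the slide: Democrats lead polls by 8, party_in_power = -1
manual_prediction <- intercept + b_poll * 8 + b_power * (-1)
print(manual_prediction)
```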
Margins of error
  Margin of error = model error × 1.96
  From: http://www.icse.xyz/msor/ssim/SDandCI.html
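The 1.96 multiplier is the 97.5th percentile of the standard normal distribution, so the margin of error corresponds to an approximate 95% interval if the model's errors are roughly normal (an assumption the slide does not state explicitly). A quick check:

```r
# Two-sided 95% interval: 2.5% in each tail, so look up the 97.5th percentile
z_95 <- qnorm(0.975)
print(round(z_95, 2))
```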
Calculating a margin of error
Generic root-mean-square error formula:
  sqrt(mean(c(model$fitted.values - data$actual_results)^2)) * 1.96
With our polling data:
  sqrt(mean(c(model$fitted.values - polls_predict$Dem.Vote.Margin)^2)) * 1.96
  [1] 5.823251
The in-sample MoE is typically smaller than the out-of-sample MoE; use the out-of-sample value when it is available.
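One way to estimate an out-of-sample margin of error is leave-one-out cross-validation: refit the model without each election in turn and predict the held-out year. This is a sketch of that idea, not necessarily the approach used in the course, and the `polls_predict` values are hypothetical.

```r
# Hypothetical training data (illustrative values only)
polls_predict <- data.frame(
  Dem.Poll.Margin = c(-3, 2, 5, -1, 7, 0, 4, -4),
  party_in_power  = c(1, -1, -1, 1, -1, 1, 1, -1),
  Dem.Vote.Margin = c(-7, 2, 6, -4, 8, -3, 1, -3)
)

model <- lm(Dem.Vote.Margin ~ Dem.Poll.Margin + party_in_power,
            data = polls_predict)

# In-sample MoE: RMSE of the fitted values, scaled by 1.96
in_sample_moe <- sqrt(mean((fitted(model) - polls_predict$Dem.Vote.Margin)^2)) * 1.96

# Out-of-sample errors: predict each election from a model trained without it
loo_errors <- sapply(seq_len(nrow(polls_predict)), function(i) {
  fit <- lm(Dem.Vote.Margin ~ Dem.Poll.Margin + party_in_power,
            data = polls_predict[-i, ])
  predict(fit, newdata = polls_predict[i, ]) - polls_predict$Dem.Vote.Margin[i]
})
out_of_sample_moe <- sqrt(mean(loo_errors^2)) * 1.96

print(c(in_sample = in_sample_moe, out_of_sample = out_of_sample_moe))
```

For a linear regression the held-out error for each point is at least as large in absolute value as its in-sample residual, which is why the out-of-sample figure comes out larger.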
Your turn!
The Presidency in 2020
G. Elliott Morris, Data Journalist
A final applied example
US presidential elections:
  - Decided by the Electoral College, which does not allocate electors in strict proportion to each state's population
  - The count of everyone's ballots is called the popular vote
  - A candidate can win the presidency without winning the popular vote: 2016, 2000, 1888, 1876, and 1824
We're going to predict the popular vote, but keep in mind that this may not predict the Electoral College outcome very well.
Who wins the popular vote?
The "Time For Change" model, created by Professor Alan Abramowitz, is a model for predicting the popular vote.
Presidential elections can be predicted with:
  - Presidential approval ratings
  - Economic growth
  - How long the White House has been controlled by one party instead of the other
Training the model
  lm(vote_share ~ pres_approve + q2_gdp + two_plus_terms, pres_elecs)
To predict:
  - vote_share: vote share for the president's party
Three input variables:
  - pres_approve: presidential approval
  - q2_gdp: annualized GDP growth in the second quarter
  - two_plus_terms: whether the president's party has held the White House for two or more terms
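A minimal sketch of training this model and predicting a new election follows. The `pres_elecs` values, the units of `pres_approve`, and the 0/1 coding of `two_plus_terms` are all assumptions made for illustration; they are not the course data.

```r
# Hypothetical past elections (illustrative values only)
pres_elecs <- data.frame(
  vote_share     = c(54, 47, 51, 57, 44, 50, 49, 51),     # incumbent party share, %
  pres_approve   = c(20, -10, 5, 30, -25, 10, -5, 15),    # assumed: net approval
  q2_gdp         = c(3.0, 1.0, 2.5, 4.0, -1.0, 2.0, 0.5, 3.5),
  two_plus_terms = c(0, 1, 0, 0, 1, 1, 0, 1)              # assumed: 1 = two or more terms
)

# Fit the three-variable model from the slide
tfc_model <- lm(vote_share ~ pres_approve + q2_gdp + two_plus_terms,
                data = pres_elecs)

# Hypothetical scenario for an upcoming election
new_election <- data.frame(pres_approve = -8, q2_gdp = 2.0, two_plus_terms = 0)
predicted_share <- predict(tfc_model, newdata = new_election)

print(coef(tfc_model))
print(predicted_share)
```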
Performance of the model
  ggplot(pres_elecs, aes(x=predict, y=vote_share, label=Year)) +
    geom_abline() +
    geom_text()
Performance of the model
Calculate the model's margin of error:
  # calculate the model's root-mean-square error and scale it by 1.96
  sqrt(mean(c(pres_elecs$predict - pres_elecs$vote_share)^2)) * 1.96
  [1] 3.273301
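A margin of error is usually reported as a range around a point prediction. As a small worked example, the sketch below applies the margin of error computed above to a hypothetical predicted vote share of 51%; the 51 is an illustrative number, not a forecast from the course.

```r
moe <- 3.273301          # margin of error from the calculation above
predicted_share <- 51    # hypothetical point prediction, in percent

# Approximate 95% range for the incumbent party's popular vote share
interval <- c(lower = predicted_share - moe, upper = predicted_share + moe)
print(interval)

# If the range includes 50, the model cannot confidently call the popular vote
includes_50 <- interval["lower"] < 50 && interval["upper"] > 50
print(includes_50)
```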
The states...
Your turn!
Congratulations!
G. Elliott Morris, Data Journalist
What you learned
Chapter 1: approval polls
  - dplyr: select, filter, mutate, group_by, summarise
  - zoo: rollmean
  - ggplot2: ggplot, geom_line, geom_point
Chapter 2: US House election polls
  - lm
  - ggplot2: geom_smooth()
What you learned
Chapter 3: election results and Brexit
  - choroplethr
  - regression for analyzing relationships between data
Chapter 4: prediction and applied examples
  - multivariate regression
  - ggplot2 for showing the relationship between three variables
  - making predictions on new data
What's next?
DataCamp:
  - Learn more about the tidyverse
  - Learn more about ggplot2
Reading:
  - R for Data Science by Garrett Grolemund and Hadley Wickham
Work!
  - Connect with the #rstats community online
  - Learn by doing!
Congratulations, and thanks!