DataCamp Analyzing Election and Polling Data in R
The 2016 US Presidential Election
G. Elliott Morris, Data Journalist
Understanding presidential elections
United States presidential elections:
- Voters cast ballots for candidates from the two major parties, Democrats and Republicans, and from minor parties
- Results are recorded by county election officials and published by state secretaries of state
- Results can be combined with county-level demographic data from the U.S. Census Bureau
What can we learn from elections?
The big question: are whiter counties more Republican?
Put another way: are areas that vote more Republican in presidential elections also whiter areas of the country?
The data
The datasets used in this lesson combine two official sources:
1. County-level election returns from the 2016 presidential election
2. County-level demographic data from the US Census Bureau, accessed via the choroplethr package
The data
Join the demographic data to the election returns on the county FIPS code and inspect the first rows:
    library(dplyr)
    county_merged <- left_join(df_county_demographics, uspres_county, by = "county.fips")
    head(county_merged)

      county.fips total_population percent_white percent_black percent_asian
    1        1001            54907            76            18             1
    2        1003           187114            83             9             1
    3        1005            27321            46            46             0
    4        1007            22754            75            22             0
    5        1009            57623            88             1             0
    6        1011            10746            22            71             0
      percent_hispanic per_capita_income median_rent median_age county.name
    1                2             24571         668       37.5     autauga
    2                4             26766         693       41.5     baldwin
    3                5             16829         382       38.3     barbour
    4                2             17427         351       39.4        bibb
    5                8             20730         403       39.6      blount
    6                6             18628         276       39.6     bullock
      state.name county.total.count     D    O     R    Dem.pct
    1    alabama              24973  5936  865 18172 0.23769671
    2    alabama              95215 18458 3874 72883 0.19385601
    3    alabama              10469  4871  144  5454 0.46527844
    4    alabama               8819  1874  207  6738 0.21249575
    5    alabama              25588  2156  573 22859 0.08425825
    6    alabama               4710  3530   40  1140 0.74946921
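To make the join and the Dem.pct column concrete, here is a minimal sketch that uses only the six Alabama rows printed above. It assumes Dem.pct is the Democratic share of all ballots cast (D divided by county.total.count), which matches the printed values, and it uses base R's merge() with all.x = TRUE as a stand-in for left_join(). The data frames `demographics`, `returns`, and `merged` are hypothetical illustrative subsets, not the full datasets.

```r
# Illustrative subsets of the two sources (six Alabama counties from the slide)
demographics <- data.frame(
  county.fips   = c(1001, 1003, 1005, 1007, 1009, 1011),
  percent_white = c(76, 83, 46, 75, 88, 22)
)
returns <- data.frame(
  county.fips        = c(1001, 1003, 1005, 1007, 1009, 1011),
  county.total.count = c(24973, 95215, 10469, 8819, 25588, 4710),
  D = c(5936, 18458, 4871, 1874, 2156, 3530),
  O = c(865, 3874, 144, 207, 573, 40),
  R = c(18172, 72883, 5454, 6738, 22859, 1140)
)

# Left join on the FIPS code: keep every demographics row,
# attaching election returns wherever a matching county exists
merged <- merge(demographics, returns, by = "county.fips", all.x = TRUE)

# Assumption: Dem.pct is the Democratic share of all ballots cast
merged$Dem.pct <- merged$D / merged$county.total.count

print(merged[, c("county.fips", "percent_white", "Dem.pct")])
```

In these rows the D, O, and R columns sum to county.total.count, so the share is taken over all votes, including minor parties.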
Exploring relationships between data
Load the ggplot2 package:
    library(ggplot2)
Visualize the relationship between percent_white and Dem.pct:
    ggplot(county_merged, aes(x = percent_white, y = Dem.pct)) +
      geom_point()
Add a trend line:
    ggplot(county_merged, aes(x = percent_white, y = Dem.pct)) +
      geom_point() +
      geom_smooth(method = "lm")
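The trend line drawn by geom_smooth(method = "lm") is an ordinary least squares fit of y on x. As a rough sketch of what sits behind it, the same quantities can be computed in base R. The data frame `toy` is a hypothetical subset holding only the six Alabama counties shown in the merged data, so the numbers it produces illustrate the mechanics rather than the national pattern.

```r
# Six Alabama counties from the merged data shown earlier (illustrative subset only)
toy <- data.frame(
  percent_white = c(76, 83, 46, 75, 88, 22),
  Dem.pct       = c(0.23769671, 0.19385601, 0.46527844,
                    0.21249575, 0.08425825, 0.74946921)
)

# Pearson correlation: the sign gives the direction of the relationship
r <- cor(toy$percent_white, toy$Dem.pct)

# The straight line that geom_smooth(method = "lm") would draw through these points
toy_fit <- lm(Dem.pct ~ percent_white, data = toy)

print(r)
print(coef(toy_fit))
```

A negative correlation and a negative slope both indicate that, in this small subset, counties with a higher white share gave Democrats a lower share of the vote. Six counties from one state are far too few to generalize from.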
Now it's your turn!
Mapping the 2016 US Presidential Election
G. Elliott Morris, Data Journalist
Why maps? Why not?
Mapping can:
- Display continuous or discrete data in a familiar way (most people can find where they live on a map)
- Help analysts identify meaningful patterns in the data: is the South more Republican than the North?
- Put other types of graphics, like scatterplots, into context
Mapping cannot:
- Conduct statistical analysis
- Easily show relationships between two or more variables
Mapping data in R
Choices for mapping:
- choroplethr: fast visualization, low customizability, comes with data
- ggplot2 + geom_sf(): fast, customizable, requires you to supply the map data
- leaflet: interactive, customizable, steep learning curve, requires you to supply the map data
Choroplethr
ggplot + geom_sf()
Leaflet (example from https://rstudio.github.io/leaflet/choropleths.html)
Mapping the 2016 election
... using choroplethr:
Load the packages (dplyr supplies the %>% pipe):
    library(choroplethr)
    library(dplyr)
Give the columns the names choroplethr expects:
    county_map <- county_merged %>%
      dplyr::rename("region" = county.fips, "value" = Dem.pct)
Map!
    county_choropleth(county_map)
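county_choropleth() expects a data frame with a column named region (numeric county FIPS codes) and a column named value, with each county appearing only once. The renaming step can be sketched in base R together with a few sanity checks. The data frame `county_merged_toy` is a hypothetical three-row subset of the merged data, used here so the sketch runs without any packages installed.

```r
# Illustrative subset: three counties from the merged data shown earlier
county_merged_toy <- data.frame(
  county.fips = c(1001, 1003, 1005),
  Dem.pct     = c(0.23769671, 0.19385601, 0.46527844)
)

# Base R equivalent of the dplyr::rename() step
county_map <- county_merged_toy
names(county_map)[names(county_map) == "county.fips"] <- "region"
names(county_map)[names(county_map) == "Dem.pct"]     <- "value"

# Sanity checks before mapping: required columns, numeric FIPS codes,
# and exactly one row per county
stopifnot(all(c("region", "value") %in% names(county_map)))
stopifnot(is.numeric(county_map$region))
stopifnot(!any(duplicated(county_map$region)))

print(county_map)
```

With choroplethr installed, a data frame shaped like this one, but covering all counties, is what gets passed to county_choropleth().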
A map of the 2016 presidential election
Your turn!
Linear Regression and Political Data
G. Elliott Morris, Data Journalist
Regression recap
- Analyzes the relationship between two (or more) variables
- Does so by fitting a "line of best fit" through the data
Election results and linear regression
Linear regression made easy: draw the straight line through the points that best fits the data.
Analyzing results with linear regression
    fit <- lm(Dem.pct ~ percent_white, data = county_merged)
    summary(fit)
Interpreting linear regression results
    summary(fit)

    Call:
    lm(formula = Dem.pct ~ percent_white, data = county_merged)

    Residuals:
         Min       1Q   Median       3Q      Max
    -0.39987 -0.08303 -0.00903  0.07281  0.47761

    Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
    (Intercept)    0.6719046  0.0090408   74.32   <2e-16 ***
    percent_white -0.0045684  0.0001123  -40.68   <2e-16 ***
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 0.1227 on 3097 degrees of freedom
      (44 observations deleted due to missingness)
    Multiple R-squared:  0.3482,    Adjusted R-squared:  0.348
    F-statistic:  1655 on 1 and 3097 DF,  p-value: < 2.2e-16
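One way to read this output is to plug the reported coefficients back into the fitted line. The sketch below copies the numbers from the summary(fit) output above; the helper `predict_dem` is a hypothetical name introduced only for illustration, and percent_white is assumed to be on a 0 to 100 scale, as in the data shown earlier.

```r
# Coefficients copied from the summary(fit) output above
intercept <- 0.6719046
slope     <- -0.0045684   # change in Dem.pct per one-point increase in percent_white

# Predicted Democratic share for a county with a given percent_white (0-100 scale)
predict_dem <- function(percent_white) intercept + slope * percent_white

pred_50 <- predict_dem(50)
pred_90 <- predict_dem(90)

# A 10-point increase in percent_white shifts the prediction by 10 * slope
shift_10 <- 10 * slope

# The t value is the estimate divided by its standard error
t_slope <- slope / 0.0001123

# With a single predictor, the correlation is the signed square root of R-squared
r <- sign(slope) * sqrt(0.3482)

print(c(pred_50 = pred_50, pred_90 = pred_90, shift_10 = shift_10,
        t_slope = t_slope, r = r))
```

Read this way, each additional percentage point of white population is associated with a predicted Democratic share about 0.46 percentage points lower. The R-squared of 0.3482 says that percent_white alone accounts for roughly a third of the county-level variation in Dem.pct. This describes an association across counties, not a causal effect.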
Your turn!
The 2016 UK Referendum to Leave the EU (AKA: Brexit)
G. Elliott Morris, Data Journalist
What was Brexit?
From https://www.bbc.com/news/uk-politics-32810887
The puzzle of Brexit
1. Who was leading the polls? "Remain" versus "Leave"
2. What happened? "Leave" won
3. Why? Non-college educated UKIP voters vs Labour and establishment Tories
Brexit Polling
Polls showed Remain with a slight lead in the final days of the campaign:
    head(brexit_polls, 10)

          Date Remain Leave RemainLead
    1  6/23/16     52    48          4
    2  6/22/16     55    45         10
    3  6/22/16     51    49          2
    4  6/22/16     49    46          3
    5  6/22/16     44    45         -1
    6  6/22/16     54    46          8
    7  6/22/16     48    42          6
    8  6/22/16     41    43         -2
    9  6/20/16     45    44          1
    10 6/19/16     42    44         -2
Brexit Polling: Analysis
Option 1: Average all the polls in the last week of the campaign.
  Conclusion: Remain will win
Option 2: Fit a LOESS smoother (LOcally wEighted Scatter-plot Smoother), which uses local regression to estimate the trend in a variable over time.
  Conclusion: Remain's lead surged in the final week, but uncertainty around the outcome remained high
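Option 1 can be sketched in a few lines of base R. The example below uses only the ten polls printed on the polling slide, so `brexit_polls_toy` is a hypothetical stand-in for the full brexit_polls data, and the seven-day window ending on referendum day (23 June 2016) is one reasonable reading of "the last week of the campaign". The average is a simple unweighted mean, which is an assumption; the slides do not say whether polls were weighted.

```r
# The ten most recent polls shown on the polling slide
brexit_polls_toy <- data.frame(
  Date   = c("6/23/16", "6/22/16", "6/22/16", "6/22/16", "6/22/16",
             "6/22/16", "6/22/16", "6/22/16", "6/20/16", "6/19/16"),
  Remain = c(52, 55, 51, 49, 44, 54, 48, 41, 45, 42),
  Leave  = c(48, 45, 49, 46, 45, 46, 42, 43, 44, 44)
)
brexit_polls_toy$RemainLead <- brexit_polls_toy$Remain - brexit_polls_toy$Leave

# Parse the month/day/two-digit-year strings into Date objects
poll_date <- as.Date(brexit_polls_toy$Date, format = "%m/%d/%y")

# Keep polls from the seven days up to and including referendum day
referendum_day <- as.Date("2016-06-23")
in_last_week <- poll_date > referendum_day - 7 & poll_date <= referendum_day
last_week <- brexit_polls_toy[in_last_week, ]

# Option 1: a simple unweighted average of Remain's lead
avg_lead <- mean(last_week$RemainLead)
print(avg_lead)
```

A positive average points to Remain, which is the conclusion Option 1 reaches. The average alone says nothing about how much the individual polls disagree with one another.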
Brexit Polling: Visualization
    library(ggplot2)
    library(lubridate)  # provides mdy() for parsing dates such as "6/23/16"
    ggplot(brexit_polls, aes(x = mdy(Date), y = Remain - Leave)) +
      geom_point() +
      geom_smooth(method = "loess")
Brexit Polling: Conclusion
- Remain's lead was large and significant
- However, it was not large enough to rule out a Leave victory, as many analysts did
- Misreading the uncertainty in data like this can lead to mistaken beliefs about how probable different outcomes are
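A quick way to see why a Leave victory could not be ruled out is to look at the spread of the polls rather than only their average. The sketch below uses the ten RemainLead values printed on the polling slide. It is a descriptive summary under that small sample, not a forecast model, and the variable names are hypothetical.

```r
# Remain's lead (Remain minus Leave) in the ten polls shown on the polling slide
remain_lead <- c(4, 10, 2, 3, -1, 8, 6, -2, 1, -2)

avg_lead    <- mean(remain_lead)      # central estimate
lead_range  <- range(remain_lead)     # smallest and largest lead
lead_sd     <- sd(remain_lead)        # poll-to-poll spread
leave_ahead <- mean(remain_lead < 0)  # share of polls with Leave in front

print(c(avg = avg_lead, min = lead_range[1], max = lead_range[2],
        sd = lead_sd, leave_ahead = leave_ahead))
```

In these ten polls the spread between individual polls is wider than the average lead itself, and several polls put Leave ahead. That is the kind of uncertainty an average-only reading hides.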