
Elections and Political Parties. G. Elliott Morris, Data Journalist (PowerPoint presentation)



  1. DataCamp: Analyzing Election and Polling Data in R. Elections and Political Parties. G. Elliott Morris, Data Journalist

  2. Measuring party support: the "generic ballot"

     "If the election for the U.S. House of Representatives were held today, would you vote for the Democratic candidate or Republican candidate in your district?"

     Source: RealClearPolitics.com

  3. House polling: exploratory data analysis (EDA)

     > head(generic_ballot)
           Date Democrats Republicans ElecYear   ElecDay DaysTilED DemVote RepVote
       7/4/1945        44          31     1946 11/5/1946       489      45      53
      7/19/1945        38          31     1946 11/5/1946       474      45      53
     10/23/1945        36          51     1946 11/5/1946       378      45      53
     11/28/1945        40          34     1946 11/5/1946       342      45      53
      1/10/1946        40          34     1946 11/5/1946       299      45      53
     > nrow(generic_ballot)    # the number of observations
     [1] 2559
     > length(generic_ballot)  # the number of variables
     [1] 8
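
The DaysTilED column appears to count the days from each poll's Date to ElecDay. The base-R sketch below rebuilds the five rows printed by head() (the name `ballot_head` is a hypothetical stand-in for `generic_ballot`) and recomputes that column, assuming the dates are month/day/year strings as printed. It also shows that length() of a data frame counts its columns, just like ncol().

```r
# Rebuild the five rows printed by head(generic_ballot).
# `ballot_head` is a hypothetical name used only for this illustration.
ballot_head <- data.frame(
  Date        = c("7/4/1945", "7/19/1945", "10/23/1945", "11/28/1945", "1/10/1946"),
  Democrats   = c(44, 38, 36, 40, 40),
  Republicans = c(31, 31, 51, 34, 34),
  ElecYear    = 1946,
  ElecDay     = "11/5/1946",
  DaysTilED   = c(489, 474, 378, 342, 299),
  DemVote     = 45,
  RepVote     = 53
)

# Parse month/day/year strings (a base-R counterpart of lubridate::mdy()).
poll_date <- as.Date(ballot_head$Date, format = "%m/%d/%Y")
elec_date <- as.Date(ballot_head$ElecDay, format = "%m/%d/%Y")

# Days between each poll and Election Day, compared with DaysTilED.
days_until <- as.numeric(elec_date - poll_date)
print(days_until)
print(all(days_until == ballot_head$DaysTilED))

# length() on a data frame counts columns, the same as ncol().
print(c(rows = nrow(ballot_head), cols = ncol(ballot_head), length = length(ballot_head)))
```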

  4. Generic ballot polling: EDA

     library(lubridate)
     library(tidyverse)
     ggplot(generic_ballot, aes(x = mdy(Date), y = Democrats)) +
       geom_point()

  5. How to learn from this data

     After initial data wrangling, you're going to use the generic ballot polling dataset to:
     - Graph trends over time
     - Compare polls (predictions) with election results (observations)
     - Create statistical models that can explain outcomes

  6. Tidyverse refresher

     Tidyverse functions from chapter 1:
       select()     # selects columns
       filter()     # filters a dataset to the chosen value(s) of variable(s)
       group_by()   # groups a dataset by the unique values of variable(s)
       summarise()  # summarises a variable from a grouped dataset with an
                    # aggregation function, like mean()

  7. Step by step

     1. Look at the data with head()
     2. filter() the polls to those in 2016
     3. select() only the relevant variables
     4. mutate() a variable to represent the Democratic margin over Republicans
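
The four steps above can be sketched in base R on the five rows printed by head(), as a stand-in for the dplyr verbs. Because those rows all come from the 1946 cycle, this sketch filters on ElecYear == 1946 rather than 2016; `ballot_head`, `cycle` and `Dem.Margin` are hypothetical names chosen for the illustration.

```r
# The five rows printed by head(generic_ballot), trimmed to the columns used here.
ballot_head <- data.frame(
  Date        = c("7/4/1945", "7/19/1945", "10/23/1945", "11/28/1945", "1/10/1946"),
  Democrats   = c(44, 38, 36, 40, 40),
  Republicans = c(31, 31, 51, 34, 34),
  ElecYear    = 1946,
  DemVote     = 45,
  RepVote     = 53
)

# 1. Look at the data
print(head(ballot_head))

# 2. filter(): keep one election cycle (1946 here, since these rows are all from 1946)
cycle <- ballot_head[ballot_head$ElecYear == 1946, ]

# 3. select(): keep only the relevant variables
cycle <- cycle[, c("Date", "Democrats", "Republicans")]

# 4. mutate(): Democratic margin over Republicans, in percentage points
cycle$Dem.Margin <- cycle$Democrats - cycle$Republicans
print(cycle)

# Roughly the same pipeline in dplyr (not run here):
# generic_ballot %>%
#   filter(ElecYear == 2016) %>%
#   select(Date, Democrats, Republicans) %>%
#   mutate(Dem.Margin = Democrats - Republicans)
```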

  8. Time to practice!

  9. 73 Years of "Generic Ballot" Polls. G. Elliott Morris, Data Journalist

  10. The generic ballot over time

      In the last lesson, you:
      - Explored the data to make a plan for your analysis
      - Mutated a column that better represents the nature of elections (the margin of victory can be more helpful than the share of votes cast)

      Now you will:
      - Analyze long-term trends in the generic ballot
      - Visualize the generic ballot over time

  11. The generic ballot over time (figure from the last slides)

  12. Time series analysis for the generic ballot

      Steps for monthly time series analysis:
      1. Group polls by the month and year in which they were taken:
           data %>% group_by(year, month)
      2. Create an average reading of the Democratic margin in that month:
           data %>%
             group_by(year, month) %>%
             summarise(support = mean(support))
      3. Analyze and visualize (with ggplot()!)
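
A minimal base-R version of the monthly grouping, assuming `support` holds the Democratic margin (Democrats minus Republicans) and using the five polls from head() as toy input. aggregate() plays the role of group_by() plus summarise(); `polls` and `monthly` are hypothetical names. Grouping on year as well as month keeps, for example, January 1946 separate from January 1947.

```r
# Toy input: the five polls from head(generic_ballot); `polls` is a hypothetical name.
polls <- data.frame(
  Date    = c("7/4/1945", "7/19/1945", "10/23/1945", "11/28/1945", "1/10/1946"),
  support = c(44, 38, 36, 40, 40) - c(31, 31, 51, 34, 34)  # Democratic margin
)

# Extract year and month from the month/day/year dates.
d <- as.Date(polls$Date, format = "%m/%d/%Y")
polls$year  <- as.integer(format(d, "%Y"))
polls$month <- as.integer(format(d, "%m"))

# Base-R equivalent of group_by(year, month) %>% summarise(support = mean(support))
monthly <- aggregate(support ~ year + month, data = polls, FUN = mean)

# Put the months in chronological order for plotting.
monthly <- monthly[order(monthly$year, monthly$month), ]
print(monthly)
```

The two July 1945 polls (margins of 13 and 7) collapse into a single monthly reading of 10, which is what keeps months with many polls from dominating the trend line.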

  13. Making a generic ballot ggplot

      1. Make the ggplot object:
           ggplot(data, aes(x = month, y = support))
      2. Add a geometric layer (point, line, etc.):
           ggplot(data, aes(x = month, y = support)) +
             geom_point()

  14. Making a generic ballot ggplot (figure)

  15. Adding a trend line with geom_smooth()

        ggplot(data, aes(x = month, y = support)) +
          geom_point() +
          geom_smooth(span = 0.2)
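
The span argument controls how much of the data each local fit uses: for data sets with fewer than 1,000 points, geom_smooth() uses loess() by default, and span sets its amount of smoothing. The sketch below uses a synthetic series (generated numbers, not polling data) to compare span = 0.2 with loess()'s default of 0.75; the smaller span tracks short-term swings more closely, so its residual sum of squares comes out lower.

```r
set.seed(1)

# Synthetic monthly series (NOT real polling data): a slow cycle plus noise.
month_index <- 1:120
signal  <- 5 * sin(2 * pi * month_index / 40)
support <- signal + rnorm(length(month_index), sd = 1.5)

# Local regression with a small span (like geom_smooth(span = 0.2))
fit_wiggly <- loess(support ~ month_index, span = 0.2)
# Local regression with loess()'s default span of 0.75
fit_smooth <- loess(support ~ month_index, span = 0.75)

# Residual sum of squares: how closely each curve follows the points.
rss <- function(fit) sum(residuals(fit)^2)
print(c(span_0.2 = rss(fit_wiggly), span_0.75 = rss(fit_smooth)))
```

A smaller span is not automatically better: it also chases noise, so span = 0.2 is a judgment call about how fast real opinion shifts are expected to be.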

  16. Your turn!

  17. Calculating and Visualizing Error in Polls. G. Elliott Morris, Data Journalist

  18. Polls have error!

      Important thing #1: Polls have error! Why?
      - It is impractical to ask everyone in the country who they're going to vote for.
      - Some people answer phone calls for polls and some do not. Some people don't even get called.
      - It's hard to tell what the population of voters should look like, which makes it hard to weight a sample correctly.

      Important: sometimes many pollsters all make the same mistake, and the results can be biased against one party or individual.
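
The two kinds of error on this slide behave differently, which a small simulation can show. Under assumed numbers (1,000 respondents per poll, a true Democratic share of 50%, and a 2-point bias shared by every pollster, all hypothetical), a single poll's sampling error has a standard deviation near sqrt(p(1 - p)/n). Averaging 20 polls shrinks that spread, but the shared bias survives the averaging untouched.

```r
set.seed(2018)
n_respondents <- 1000   # hypothetical poll sample size
true_share    <- 0.50   # hypothetical true Democratic share
shared_bias   <- 0.02   # hypothetical error made by every pollster at once
n_polls       <- 20     # number of polls averaged together

# Single unbiased polls: pure sampling error.
single_errors <- replicate(5000,
  rbinom(1, n_respondents, true_share) / n_respondents - true_share)

# Average of 20 polls that all share the same bias.
average_errors <- replicate(2000, {
  shares <- rbinom(n_polls, n_respondents, true_share + shared_bias) / n_respondents
  mean(shares) - true_share
})

print(c(
  theory_sd_single = sqrt(true_share * (1 - true_share) / n_respondents),
  sd_single        = sd(single_errors),
  sd_average       = sd(average_errors),
  mean_average     = mean(average_errors)
))
```

Averaging more polls only reduces the random part of the error; it cannot detect or remove a mistake every pollster makes together.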

  19. Why care about error?

      Not controlling for the right variables, or not treating your data with the proper amount of uncertainty, can lead to over-confident results. (Figure source: http://cdc.gov/cancer/breast/statistics)

  20. Analyzing error in polls

      Steps for calculating error in polls:
      1. Wrangle the data: create polling averages for every year.
      2. Calculate the polling error for each year: subtract the election result from the average poll's prediction.
      3. Visualize the results and the margin of error.

  21. Grouping generic ballot data by year

      Mutate variables for the Democrats' poll and vote margins:
        poll_error <- generic_ballot %>%
          mutate(Dem.Poll.Margin = Democrats - Republicans,
                 Dem.Vote.Margin = DemVote - RepVote)
      Average those variables by year:
        poll_error <- poll_error %>%
          group_by(ElecYear) %>%
          summarise(Dem.Poll.Margin = mean(Dem.Poll.Margin),
                    Dem.Vote.Margin = mean(Dem.Vote.Margin))
      Compare the polling average to the results:
        poll_error <- poll_error %>%
          mutate(error = Dem.Poll.Margin - Dem.Vote.Margin)
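
The same wrangling can be sketched in base R on the five head() rows, with aggregate() standing in for group_by() plus summarise(). These are only five early polls from the 1946 cycle, so the resulting error illustrates the arithmetic and is not a meaningful estimate of 1946 polling error; `generic_head` is a hypothetical name.

```r
# Five early polls from head(generic_ballot); DemVote/RepVote hold that cycle's result.
generic_head <- data.frame(
  ElecYear    = 1946,
  Democrats   = c(44, 38, 36, 40, 40),
  Republicans = c(31, 31, 51, 34, 34),
  DemVote     = 45,
  RepVote     = 53
)

# Margins in percentage points (positive = Democrats ahead).
generic_head$Dem.Poll.Margin <- generic_head$Democrats - generic_head$Republicans
generic_head$Dem.Vote.Margin <- generic_head$DemVote - generic_head$RepVote

# Average both margins within each election year.
poll_error <- aggregate(cbind(Dem.Poll.Margin, Dem.Vote.Margin) ~ ElecYear,
                        data = generic_head, FUN = mean)

# Positive error: the polls overstated the Democratic margin.
poll_error$error <- poll_error$Dem.Poll.Margin - poll_error$Dem.Vote.Margin
print(poll_error)
```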

  22. Calculating generic ballot error

      Calculate the root-mean-square error (RMSE):
        rmse <- sqrt(mean(poll_error$error^2))
      Compute a margin of error:
        # multiply it by 1.96 for our 95% confidence interval
        CI <- rmse * 1.96
      Add upper and lower bound variables to our dataset:
        by_year <- poll_error %>%
          mutate(upper = Dem.Poll.Margin + CI,
                 lower = Dem.Poll.Margin - CI)
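
A worked version of the RMSE and interval arithmetic, using made-up yearly errors and a hypothetical 5-point polling average. The 1.96 multiplier is the 97.5th percentile of the standard normal distribution, which qnorm(0.975) returns.

```r
# Hypothetical yearly polling errors, in points (illustration only).
error <- c(2, -1, 3, -2)

rmse <- sqrt(mean(error^2))   # root-mean-square error
z    <- qnorm(0.975)          # about 1.96: two-sided 95% normal quantile
CI   <- rmse * z              # margin of error

poll_margin <- 5              # hypothetical polling average, in points
print(c(rmse  = rmse,
        z     = z,
        lower = poll_margin - CI,
        upper = poll_margin + CI))
```

Unlike a standard deviation, RMSE does not subtract the mean error first, so a systematic bias in the polls widens the interval as well as random noise does.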

  23. Visualizing error

      Visualizing error with points and error bars, built up one layer at a time:
        # make the ggplot object
        ggplot(by_year) +
          # add the polling geom_point() layer
          geom_point(aes(x = ElecYear, y = Dem.Poll.Margin, col = "Poll")) +
          # add the results geom_point() layer
          geom_point(aes(x = ElecYear, y = Dem.Vote.Margin, col = "Vote")) +
          # add the error geom_errorbar() layer
          geom_errorbar(aes(x = ElecYear, ymin = lower, ymax = upper))

  24. Visualizing error (figure)

  25. Practice

  26. Predicting Winners with Linear Regression. G. Elliott Morris, Data Journalist

  27. Use polls to predict votes

      Steps:
      1. Specify a regression model that predicts the election result (vote margin) from polling.
      2. Use that model to predict a hypothetical election with a 5-point margin for Democrats in the polls.

  28. What is linear regression?

      - Analyzes the relationship between two (or more) variables
      - Does so by fitting a "line of best fit" through the data
      - In other words, it finds the equation that best predicts y with x (and x2, x3, etc.)

  29. What is linear regression?

      Linear regression made easy: draw the line through the points that best fits the data. (figure)
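
For a single predictor, the line of best fit is the least-squares line, whose slope is cov(x, y) / var(x) and whose intercept is mean(y) - slope * mean(x). This sketch checks that closed form against lm() on synthetic data; the numbers are generated and are not real election results.

```r
set.seed(7)

# Synthetic data: x plays the poll margin, y the vote margin (NOT real elections).
x <- runif(14, -10, 15)
y <- -3 + 0.8 * x + rnorm(14, sd = 3)

# Closed-form least-squares line of best fit.
slope     <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)

# The same line from lm().
fit <- lm(y ~ x)
print(rbind(by_hand = c(intercept, slope),
            lm      = unname(coef(fit))))
```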

  30. Use polls to predict votes

        # specify and evaluate the model
        model <- lm(Dem.Vote.Margin ~ Dem.Poll.Margin, by_year)
        summary(model)

        Call:
        lm(formula = Dem.Vote.Margin ~ Dem.Poll.Margin, data = by_year)

        Residuals:
           Min     1Q Median     3Q    Max
        -4.735 -1.788 -1.112  1.965  5.793

        Coefficients:
                        Estimate Std. Error t value Pr(>|t|)
        (Intercept)      -2.8109     1.0886  -2.582 0.024000 *
        Dem.Poll.Margin   0.8031     0.1607   4.998 0.000311 ***
        Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

        Residual standard error: 3.389 on 12 degrees of freedom
        Multiple R-squared:  0.6755,  Adjusted R-squared:  0.6484
        F-statistic: 24.98 on 1 and 12 DF,  p-value: 0.0003105
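
Step 2 from slide 27 can be done by hand with the estimates in this output: the predicted vote margin is the intercept plus the slope times the poll margin. The sketch copies the two coefficients, so it assumes the model shown above. Solving for a predicted margin of zero also gives the polling lead at which this model expects a tied national vote.

```r
# Coefficient estimates copied from the summary(model) output above.
intercept <- -2.8109
slope     <-  0.8031

poll_margin <- 5   # hypothetical 5-point Democratic lead in the polls
predicted_vote_margin <- intercept + slope * poll_margin

# Poll margin at which the predicted vote margin is zero.
break_even <- -intercept / slope

print(c(predicted = predicted_vote_margin, break_even = break_even))
```

With the fitted object in hand, predict(model, newdata = data.frame(Dem.Poll.Margin = 5)) should return the same value without copying coefficients. Since the residual standard error is 3.389 points, a predicted margin of about 1.2 points is well within the model's typical miss.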

  31. Practice
