Predicting the Results of the Scottish Referendum Zhou Fang
(Most of this presentation was prepared on the 31st of July. We’ll have a look at how things have changed later…)
The Referendum Scots (and people living in Scotland) are due to vote on the 18th of September on whether or not Scotland becomes an independent country. ● Huge public interest ● Intensely fought campaign ● Result will shape the future of the UK
Opinion Polls Several companies conduct opinion polls on how people plan to vote in the referendum. (We focus on the more mainstream pollsters that are members of the British Polling Council.) Polls produce a wealth of data on public opinion about the referendum. In the past, poll aggregation has been very successful at predicting the results of, for example, the 2012 US elections. Can this be done for the Scottish Referendum?
A look at the polls Running average of polls
Nate Silver "There's virtually no chance that the 'yes' side will win. If you look at the polls, it's pretty definitive really where the no side is at 60-55% and the yes side is about 40 or so." "There is a wide variety of polls and they all show the 'no' vote ahead, some by modest margins and some by overwhelming margins. The best you can do is take an average of those." (13th August 2013)
Another look at the polls Let’s focus on the Yes share of the (non-undecided) vote: Y / (Y + N). To smooth the polls, we can also use Princeton professor Sam Wang’s median-based method, which was also successful in 2012. We get:
Yes share of vote with 1 month rolling medians
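The two steps above (Yes share of the decided vote, then a rolling median) can be sketched as follows. This is a minimal illustration, not the code behind the figure: the function names, the 30-day trailing window, and the use of plain day numbers are all assumptions made for the example.

```python
import numpy as np

def yes_share(yes, no):
    """Yes share of the decided vote, Y / (Y + N); undecideds are dropped."""
    yes = np.asarray(yes, dtype=float)
    no = np.asarray(no, dtype=float)
    return yes / (yes + no)

def rolling_median(day, share, window=30):
    """Median of all polls in the trailing `window` days, one value per poll.

    `day` is a numeric day index per poll. The window (d - window, d]
    is an assumed reading of "1 month rolling medians".
    """
    day = np.asarray(day, dtype=float)
    share = np.asarray(share, dtype=float)
    out = np.empty(len(day))
    for k, d in enumerate(day):
        in_window = (day > d - window) & (day <= d)
        out[k] = np.median(share[in_window])
    return out
```

A median is less sensitive than a mean to a single outlying poll, which is the usual motivation for this kind of smoothing.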
But... ● Difficult to interpret as a prediction for Day 0 ● Very non-smooth ● Ignores effect of different polling companies ○ Could differences be due to this?
Yes share of vote with 1 month rolling medians
Yes share of vote, with lines aggregating polls from the same pollster
Spline model Assuming ● the underlying pattern of variation is smooth ● the ‘house effect’ of each pollster is constant over time it is natural to opt for a spline model to smooth the data and make extrapolations: $\min_{f, A} \| \mathrm{YesVote}(t, i) - f(t) - A_i \|^2 + P(f)$ with $P$ a smoothness penalty, $A_i$ the house effect of pollster $i$, and $t$, $i$ the day and pollster associated with each poll. Applying this to the data, we get:
Results of spline model with house effect adjustments (Using package ‘mgcv’ in R)
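To make the structure of the penalised fit concrete, here is a rough numpy-only sketch. It is not the mgcv model used for the figure: it swaps in a piecewise-linear basis with a second-difference penalty as a stand-in for the spline, and the function name, `n_knots`, `lam`, and the convention that the referendum is day 0 (earlier days negative) are assumptions for illustration.

```python
import numpy as np

def fit_trend_with_house_effects(day, pollster, yes_share,
                                 n_knots=20, lam=10.0, t_end=0.0):
    """Fit yes_share ~ f(day) + A[pollster] with a smoothness penalty on f.

    f is piecewise linear on equally spaced knots running to `t_end`, so the
    penalty alone determines f beyond the last poll (the extrapolation).
    The first pollster (alphabetically) is the reference, with A = 0.
    """
    day = np.asarray(day, dtype=float)
    y = np.asarray(yes_share, dtype=float)
    names, idx = np.unique(np.asarray(pollster), return_inverse=True)
    n = len(day)
    rows = np.arange(n)

    # Piecewise-linear ("hat") basis: each poll touches its two nearest knots.
    knots = np.linspace(day.min(), max(day.max(), t_end), n_knots)
    pos = (day - knots[0]) / (knots[1] - knots[0])
    left = np.clip(np.floor(pos).astype(int), 0, n_knots - 2)
    w = pos - left
    B = np.zeros((n, n_knots))
    B[rows, left] = 1.0 - w
    B[rows, left + 1] = w

    # Pollster dummies, dropping the reference column for identifiability.
    H = np.zeros((n, len(names)))
    H[rows, idx] = 1.0
    H = H[:, 1:]

    # Second-difference penalty on the knot values, applied by augmenting
    # the least-squares system with sqrt(lam) * D2 rows and zero targets.
    D2 = np.diff(np.eye(n_knots), n=2, axis=0)
    penalty_rows = np.hstack([np.sqrt(lam) * D2,
                              np.zeros((n_knots - 2, H.shape[1]))])
    A_mat = np.vstack([np.hstack([B, H]), penalty_rows])
    b = np.concatenate([y, np.zeros(n_knots - 2)])
    coef, *_ = np.linalg.lstsq(A_mat, b, rcond=None)

    beta = coef[:n_knots]                      # f evaluated at the knots
    house = dict(zip(names.tolist(),
                     np.concatenate([[0.0], coef[n_knots:]])))
    return knots, beta, house
```

With the default `t_end=0.0`, `beta[-1]` is the fitted trend at Day 0 for the reference pollster, and adding `house[name]` gives the Day 0 value for any other pollster. The choice of `lam` controls how smooth f is; mgcv selects its smoothing parameter automatically, which this sketch does not attempt.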
Is this enough? In principle we can make a prediction for referendum day by taking f(0) and adding, say, an average of the house effects across the polling companies. However...
Sampling and weighting Different pollsters represent different methods of ● sampling ● weighting (especially to political affiliation) ● asking the question This can make a big difference! Few previous referendums, so difficult to say which procedure is correct.
‘Game changer’? Sudden changes of public opinion in the last few months of a campaign do happen ... even without an obvious ‘event’ to explain them... Even just before the vote, opinion polls can fail if pollsters make wrong assumptions about whether people who say they will vote actually go and vote. It is thus not so easy to predict. For example, applying the method to another referendum 75 days before the end:
AV referendum at 75 days out
AV referendum, all the data
Including the error We have very little data with which to fully account for these effects. To at least incorporate them, we adopt a randomisation-based approach. 1. Randomly select a polling company and make a prediction at Day 0. 2. Randomly add on +/- a Day-75 prediction error from one of a number of similar previous elections with opinion poll results. Do this many times to create a distribution of predicted results.
Simulated Yes votes. (Considered elections: AV vote, 2010 general election, Welsh devolution referendum, 2011 Scottish Parliament election)
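The randomisation procedure can be sketched in a few lines. This is an illustration of the two steps as described, not the original simulation: the function names, the uniform random choice of pollster, error and sign, and the 50% winning threshold on the Yes share are assumptions, and any inputs passed in would have to come from the fitted model and from the historical Day-75 errors.

```python
import numpy as np

def simulate_yes_share(day0_by_pollster, past_errors, n_sims=10000, seed=0):
    """Simulate Day 0 Yes shares.

    day0_by_pollster: dict of pollster name -> Day 0 prediction of Yes share.
    past_errors: Day-75 prediction errors from earlier votes (same units).
    Each draw picks one pollster's prediction, one past error magnitude,
    and a random sign, all uniformly at random (an assumed scheme).
    """
    rng = np.random.default_rng(seed)
    preds = np.asarray(list(day0_by_pollster.values()), dtype=float)
    magnitudes = np.abs(np.asarray(past_errors, dtype=float))
    base = rng.choice(preds, size=n_sims)        # step 1: pick a pollster
    err = rng.choice(magnitudes, size=n_sims)    # step 2: pick a past error
    sign = rng.choice([-1.0, 1.0], size=n_sims)  # step 2: apply +/-
    return base + sign * err

def prob_no_wins(simulated_yes):
    """Fraction of simulations in which the Yes share falls below 50%."""
    return float(np.mean(np.asarray(simulated_yes) < 0.5))
```

The histogram of the simulated values is the distribution of predicted results, and `prob_no_wins` turns it into a single win probability.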
Conclusion ● According to our model and simulation, at the end of July No has approximately a 69% chance of winning the coming referendum. ● This value will change as more polls come in and we get closer to referendum day. ● Clearly a lot of assumptions have been made! Curiously, our value is essentially identical to the value obtained by David Bell’s (University of Stirling) analysis of bets made on prediction markets. (The Independence Referendum: Predicting the Outcome)
What happened since? Since the presentation was given: ● Two televised debates have taken place ● A number of additional polls have been published ● We are now closer to the referendum...
Yes share of vote, updated - dashed lines denote debates
Yes share of vote, updated - Blue is the original spline, Red is with newer data
Simulated referendum vote shares
Simulated referendum vote shares - final estimate and win probability
Any questions? zhou.fang@bioss.ac.uk zhou.zfang@gmail.com