Online Collaborative Prediction of Regional Vote Results
Vincent Etter, Emtiyaz Khan, Matthias Grossglauser, Patrick Thiran
DSAA, October 17, 2016, Montréal, Canada
Data Opportunity
• Many countries adopt open government initiatives
  • Several datasets published
    • Demographics
    • State affairs
    • Votes and elections
• Unique opportunity
  • Get a better understanding
  • Build tools useful to others
Voting Data
• News agencies, political parties, and polling institutes are all interested in understanding voting behaviors
  • Will the next vote pass easily?
  • What makes two regions vote similarly?
  • Where should we focus our efforts?
Dataset
• Vote results from Switzerland
  • Issue votes between 1981 and 2014
  • Outcome (% of “yes”) at the municipality level
• 281 votes
  • 13 features: voting recommendations of the main parties
• 2352 regions
  • 25 features: languages spoken, demographics, etc.
• Data available at http://vincent.etter.io/dsaa16
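The dataset can be pictured as a partially observed matrix of results plus two feature matrices. The NumPy sketch below only illustrates the array shapes given on this slide: every value is a random placeholder rather than the published data, the -1/0/+1 encoding of party recommendations is an assumption, and the orientation (rows for regions $d$, columns for votes $n$) follows the indexing of the model's $y_{dn}$.

```python
import numpy as np

# Shapes from the slide; values are random placeholders, NOT the published data.
N_REGIONS, N_VOTES = 2352, 281
N_REGION_FEATURES, N_VOTE_FEATURES = 25, 13

rng = np.random.default_rng(0)

# Y[d, n]: % of "yes" in region d for vote n.
Y = rng.uniform(0.0, 100.0, size=(N_REGIONS, N_VOTES))
# X[d]: region features (languages spoken, demographics, ...).
X = rng.normal(size=(N_REGIONS, N_REGION_FEATURES))
# W[n]: vote features; recommendations encoded as -1/0/+1 (assumed encoding).
W = rng.integers(-1, 2, size=(N_VOTES, N_VOTE_FEATURES)).astype(float)

# On the day of a vote (here the last column), only the regions reported so far are known.
observed = np.ones_like(Y, dtype=bool)
observed[:, -1] = False
reported = rng.choice(N_REGIONS, size=100, replace=False)
observed[reported, -1] = True
Y_masked = np.where(observed, Y, np.nan)

print(Y_masked.shape, X.shape, W.shape)       # (2352, 281) (2352, 25) (281, 13)
print(int(np.isnan(Y_masked[:, -1]).sum()))   # 2252
```

Past votes are fully observed, while the current vote's column fills in as results arrive; that missing-entry pattern is what the online prediction has to complete.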
Similarities Between Results
Online Predictions
• On the day of the vote, regional results are released in sequence
  • Use the published results to predict the others
  • … and refine the prediction as more results are published?
Our Approach
• Use a matrix-factorization model to capture the bi-clustering
• Add region and vote features
  • Reduce the cold-start problem
  • More interpretable
• Build the model incrementally to assess the effect of each component
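The slides do not say how the factorization is fitted. One common option, assumed here, is alternating least squares (ALS) on the observed entries only, where every sub-problem is a small closed-form ridge regression. The sketch fits the plain MF model $z_{dn} = \mu_n + v_d^\top u_n$; the rank `k`, the penalties `lam_u` and `lam_v` (mirroring the $\lambda_u$, $\lambda_v$ named on the models slide), and the synthetic data are all hypothetical choices.

```python
import numpy as np

def fit_mf(Y, mask, k=2, lam_u=1.0, lam_v=1.0, n_iters=50, seed=0):
    """Fit z_dn = mu_n + v_d^T u_n to the entries of Y (regions x votes) where mask is True.

    Alternating least squares: each step is a closed-form ridge regression.
    """
    D, N = Y.shape
    rng = np.random.default_rng(seed)
    V = 0.1 * rng.normal(size=(D, k))  # region factors v_d
    U = 0.1 * rng.normal(size=(N, k))  # vote factors u_n
    mu = np.zeros(N)                   # per-vote bias mu_n
    for _ in range(n_iters):
        # Votes: regress y_dn on [v_d, 1] over the observed regions d.
        for n in range(N):
            rows = mask[:, n]
            A = np.hstack([V[rows], np.ones((rows.sum(), 1))])
            reg = lam_u * np.eye(k + 1)
            reg[k, k] = 1e-8  # bias almost unpenalized; the jitter keeps the system invertible
            sol = np.linalg.solve(A.T @ A + reg, A.T @ Y[rows, n])
            U[n], mu[n] = sol[:k], sol[k]
        # Regions: regress y_dn - mu_n on u_n over the observed votes n.
        for d in range(D):
            cols = mask[d]
            B = U[cols]
            rhs = B.T @ (Y[d, cols] - mu[cols])
            V[d] = np.linalg.solve(B.T @ B + lam_v * np.eye(k), rhs)
    return mu, U, V

def predict_mf(mu, U, V):
    """Predicted z_dn for every region d and vote n."""
    return mu[None, :] + V @ U.T

# Usage on synthetic low-rank data (not the Swiss dataset), 70% of entries observed:
rng = np.random.default_rng(1)
Y_true = rng.normal(size=40)[None, :] + rng.normal(size=(60, 2)) @ rng.normal(size=(40, 2)).T
mask = rng.random(Y_true.shape) < 0.7
mu, U, V = fit_mf(Y_true, mask, k=2, lam_u=0.1, lam_v=0.1)
Z = predict_mf(mu, U, V)
held_out_rmse = np.sqrt(np.mean((Z[~mask] - Y_true[~mask]) ** 2))
```

Regions with similar factors $v_d$ tend to vote alike across all votes, which is the bi-clustering the slide refers to; gradient-based or fully Bayesian fitting would be equally valid readings of the slide.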
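Our Model
$y_{dn} = z_{dn} + \epsilon$
$z_{dn} = \mu_n + f_n(x_d) + f_d(w_n) + v_d^\top u_n$, where $d$ indexes regions and $n$ indexes votes
• $\mu_n$: bias
• $f_n(x_d)$: regression on region features
• $f_d(w_n)$: regression on vote features
• $v_d^\top u_n$: matrix factorization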
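With linear choices for both regression terms, $f_n(x_d) = \beta_n^\top x_d$ and $f_d(w_n) = \gamma_d^\top w_n$ as in the LIN variants, the predictor can be evaluated for every region and vote at once with matrix products. A minimal NumPy sketch; the array names and shapes are illustrative assumptions, and the random arrays only serve to check the vectorized form against the per-entry formula.

```python
import numpy as np

def predict_z(mu, B, X, G, W, V, U):
    """z[d, n] = mu_n + beta_n^T x_d + gamma_d^T w_n + v_d^T u_n for all regions d and votes n.

    mu: (N,) per-vote bias         B: (N, P) beta_n, one row per vote
    X:  (D, P) region features     G: (D, Q) gamma_d, one row per region
    W:  (N, Q) vote features       V: (D, K) region factors, U: (N, K) vote factors
    """
    return mu[None, :] + X @ B.T + G @ W.T + V @ U.T

# Check one entry against the per-entry formula (D=4 regions, N=3 votes, random values).
rng = np.random.default_rng(0)
D, N, P, Q, K = 4, 3, 5, 2, 2
mu, B, X = rng.normal(size=N), rng.normal(size=(N, P)), rng.normal(size=(D, P))
G, W = rng.normal(size=(D, Q)), rng.normal(size=(N, Q))
V, U = rng.normal(size=(D, K)), rng.normal(size=(N, K))
Z = predict_z(mu, B, X, G, W, V, U)
d, n = 2, 1
z_dn = mu[n] + B[n] @ X[d] + G[d] @ W[n] + V[d] @ U[n]
print(np.isclose(Z[d, n], z_dn))  # True
```

Dropping a term (for example setting `G` and `V` to zero) recovers the simpler variants.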
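Our Models
General form: $z_{dn} = \mu_n + f_n(x_d) + f_d(w_n) + v_d^\top u_n$
• LIN(r): $z_{dn} = \mu_n + \beta_n^\top x_d$
• LIN(v): $z_{dn} = \mu_n + \gamma_d^\top w_n$
• LIN(r) + LIN(v): $z_{dn} = \mu_n + \beta_n^\top x_d + \gamma_d^\top w_n$
• MF: $z_{dn} = \mu_n + v_d^\top u_n$
• MF + LIN(r): $z_{dn} = \mu_n + \beta_n^\top x_d + v_d^\top u_n$
• MF + GP(r): $z_{dn} = \mu_n + \mathrm{GP}(x_d) + v_d^\top u_n$
• MF + GP(r) + LIN(v): $z_{dn} = \mu_n + \mathrm{GP}(x_d) + \gamma_d^\top w_n + v_d^\top u_n$
Hyperparameters: $\lambda_\beta, \lambda_\gamma, \lambda_u, \lambda_v$; $\theta, \sigma_s, \lambda_\gamma$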
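For a single vote, LIN(r) amounts to regressing the results revealed so far on the region features of the reporting regions and extrapolating to the rest. The slides do not state how the models are fitted; this sketch assumes a closed-form ridge solution with an unpenalized bias $\mu_n$ and a hypothetical penalty `lam`, presumably corresponding to $\lambda_\beta$.

```python
import numpy as np

def lin_r_predict(X, y, observed, lam=1.0):
    """LIN(r) for one vote: fit z_d = mu + beta^T x_d on the observed regions, predict all regions.

    X: (D, P) region features; y: (D,) results, only read where `observed` is True;
    observed: (D,) boolean mask with at least one True entry; lam: ridge penalty on beta.
    """
    Xo, yo = X[observed], y[observed]
    A = np.hstack([np.ones((Xo.shape[0], 1)), Xo])  # leading column of ones for mu
    reg = lam * np.eye(A.shape[1])
    reg[0, 0] = 0.0                                  # leave the bias mu unpenalized
    coef = np.linalg.solve(A.T @ A + reg, A.T @ yo)
    mu, beta = coef[0], coef[1:]
    return mu + X @ beta

# Usage on synthetic, noise-free data: 200 of 500 regions revealed so far.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = 40.0 + X @ np.array([3.0, -2.0, 0.5, 0.0, 1.0])
observed = np.zeros(500, dtype=bool)
observed[:200] = True
pred = lin_r_predict(X, y, observed, lam=1e-6)
```

Because each vote gets its own $\beta_n$, this model cannot borrow strength from past votes, which is the gap the MF and LIN(v) terms are meant to fill.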
Performance Evaluation
• Last 50 votes as test data
  • Simulate 500 random reveal orders
  • Last 10% of regions as test regions
• Observe an increasing number of regions
  • Predict the results of the test regions
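The protocol above can be sketched for a single test vote as follows. Assumptions: RMSE is averaged over reveal orders, the checkpoints (1, 10, 100, 1000 observed regions) mirror the axis range of the result plots, and `mean_predict` is a hypothetical baseline standing in for the actual models.

```python
import numpy as np

def mean_predict(X, y, observed):
    """Hypothetical baseline: predict every region as the mean of the regions observed so far."""
    return np.full(len(y), y[observed].mean())

def evaluate(y, X, predict, n_orders=500, test_frac=0.1,
             checkpoints=(1, 10, 100, 1000), seed=0):
    """Mean RMSE on the test regions of one vote, per number of observed regions.

    For each random reveal order, the last `test_frac` of the regions are the test regions;
    the predictor sees only the first k regions of the order.
    """
    rng = np.random.default_rng(seed)
    D = len(y)
    n_test = max(1, int(round(test_frac * D)))
    ks = [k for k in checkpoints if k <= D - n_test]
    errors = {k: [] for k in ks}
    for _ in range(n_orders):
        order = rng.permutation(D)
        test = order[D - n_test:]
        for k in ks:
            observed = np.zeros(D, dtype=bool)
            observed[order[:k]] = True
            pred = predict(X, y, observed)
            errors[k].append(np.sqrt(np.mean((pred[test] - y[test]) ** 2)))
    return {k: float(np.mean(v)) for k, v in errors.items()}

# Usage on synthetic results for a single vote (placeholder data, fewer orders for speed):
rng = np.random.default_rng(0)
y = np.clip(rng.normal(50.0, 10.0, size=2352), 0.0, 100.0)
X = rng.normal(size=(2352, 25))
curve = evaluate(y, X, mean_predict, n_orders=20)
```

Any function with the signature `predict(X, y, observed)` can be plugged in; repeating the evaluation over each of the 50 test votes and averaging yields one curve per model.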
Results
[Figure: RMSE on the last 10% of regions (%) vs. number of observed regions (log scale, 1 to 1000); curves for LIN(r), MF, and MF + LIN(r)]
Bayesian vs. Non-Bayesian
[Figure: RMSE on the last 10% of regions (%) vs. number of observed regions (log scale, 1 to 1000); curves for MF + LIN(r) and MF + GP(r)]
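MF + GP(r) replaces the linear regression on region features with a Gaussian process, and the summary credits the Bayesian treatment with proper hyperparameter setting. The slides specify neither the kernel nor the inference procedure, so this sketch assumes a squared-exponential kernel, omits the MF term, and picks (lengthscale, signal variance, noise variance) from a small hypothetical grid by maximizing the log marginal likelihood; how these map onto the slide's $\theta$ and $\sigma_s$ is not confirmed.

```python
import numpy as np

def rbf(A, B, lengthscale, signal_var):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return signal_var * np.exp(-0.5 * sq_dist / lengthscale**2)

def gp_solve(X_obs, r, lengthscale, signal_var, noise_var):
    """Return (K + noise I)^{-1} r and the log marginal likelihood of r under a zero-mean GP."""
    K = rbf(X_obs, X_obs, lengthscale, signal_var) + noise_var * np.eye(len(r))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, r))
    log_ml = (-0.5 * r @ alpha - np.log(np.diag(L)).sum()
              - 0.5 * len(r) * np.log(2.0 * np.pi))
    return alpha, log_ml

def gp_predict(X_obs, y_obs, X_new, grid):
    """Posterior mean at X_new, with hyperparameters chosen from `grid` by maximum evidence."""
    offset = y_obs.mean()  # crude stand-in for the per-vote bias mu_n
    r = y_obs - offset
    best_h, best_alpha, best_ml = None, None, -np.inf
    for h in grid:  # h = (lengthscale, signal_var, noise_var)
        alpha, log_ml = gp_solve(X_obs, r, *h)
        if log_ml > best_ml:
            best_h, best_alpha, best_ml = h, alpha, log_ml
    mean = offset + rbf(X_new, X_obs, best_h[0], best_h[1]) @ best_alpha
    return mean, best_h

# Usage on a toy 1-D feature (placeholder for the region features):
X_all = np.linspace(0.0, 10.0, 60)[:, None]
y_all = np.sin(X_all[:, 0])
obs = np.arange(60) % 2 == 0  # every other "region" is revealed
grid = [(ls, s, nv) for ls in (0.3, 1.0, 3.0) for s in (0.5, 1.0) for nv in (1e-4, 1e-2)]
pred, best = gp_predict(X_all[obs], y_all[obs], X_all[~obs], grid)
```

Selecting hyperparameters by evidence needs no held-out split, which matters when only a handful of regions have reported; a gradient-based optimizer over the same objective is the usual alternative to a grid.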
Final Model
[Figure: RMSE on the last 10% of regions (%) vs. number of observed regions (log scale, 1 to 1000); curves for LIN(v), MF + GP(r), and MF + GP(r) + LIN(v)]
Interpretation
[Figure: relative importance (0 to 1) of each feature: Röstigraben, x, y, Election CVP, Election BDP, Election SVP, Age 20-64, Election other right, Election PST, Elevation, Election SP, Age 0-19, Election Greens, Election GL, Election FDP, Election PEV, Foreigners, Social aid, Age 65+, Speaks French, Population density, Speaks Romansh, Jobs, Population, Speaks German, Speaks Italian]
Summary
• Individual models have different strengths
  • Regression on vote features for the cold start
  • Region features and bi-clustering once more observations are available
• Bayesian methods are useful
  • Proper hyperparameter setting
  • Accurate and interpretable results
Thank you!
Code and data available at http://vincent.etter.io/dsaa16
Any questions?