
Online Collaborative Prediction of Regional Vote Results



  1. Online Collaborative Prediction of Regional Vote Results
     Vincent Etter, Emtiyaz Khan, Matthias Grossglauser, Patrick Thiran
     DSAA, October 17, 2016, Montréal, Canada

  2. Data Opportunity
     • Many countries adopt open government initiatives
     • Several datasets published
        • Demographics
        • State affairs
        • Votes and elections
     • Unique opportunity
        • Get a better understanding
        • Build tools useful to others

  3. Voting Data
     • News agencies, political parties, and polling institutes are all interested in understanding voting behaviors
        • Will the next vote pass easily?
        • What makes two regions vote similarly?
        • Where should we focus our efforts?

  4. Dataset
     • Vote results from Switzerland
     • Issue votes between 1981 and 2014
     • Outcome (% of "yes") at the municipality level
     • 281 votes
        • 13 features: voting recommendation of the main parties
     • 2352 regions
        • 25 features: languages spoken, demographics, etc.
     • Data available at http://vincent.etter.io/dsaa16

  5. Similarities Between Results

  6. Online Predictions
     • On the day of the vote, regional results are released in sequence
     • Use published results to predict others
     • … and refine the prediction as more results are published?

  7. Our Approach
     • Use a matrix-factorization model to capture the bi-clustering
     • Add region and vote features
        • Reduce the cold-start problem
        • More interpretable
     • Build the model incrementally to assess the effect of each component

  8. Our Model
     $y_{dn} = z_{dn} + \epsilon$
     $z_{dn} = \mu_n + f_n(x_d) + f_d(w_n) + v_d^T u_n$
     Terms of $z_{dn}$, left to right: bias ($\mu_n$), regression on region ($f_n(x_d)$), regression on vote ($f_d(w_n)$), matrix factorization ($v_d^T u_n$).
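The decomposition on this slide can be illustrated with a small NumPy sketch. It assumes both regression terms are linear, i.e. $f_n(x_d) = \beta_n^T x_d$ and $f_d(w_n) = \gamma_d^T w_n$. The function name `predict_latent`, the array shapes, and the random example data are all hypothetical and are not taken from the authors' code.

```python
import numpy as np

def predict_latent(mu, beta, gamma, U, V, X, W):
    """Return the D x N matrix Z with
    Z[d, n] = mu[n] + beta[n] . X[d] + gamma[d] . W[n] + V[d] . U[n].

    Assumed shapes (illustrative only):
      mu    (N,)    per-vote bias
      beta  (N, P)  per-vote weights on region features
      gamma (D, Q)  per-region weights on vote features
      U     (N, K)  vote latent factors
      V     (D, K)  region latent factors
      X     (D, P)  region features
      W     (N, Q)  vote features
    """
    bias = mu[np.newaxis, :]      # (1, N), broadcast over regions
    region_reg = X @ beta.T       # (D, N): beta_n^T x_d
    vote_reg = gamma @ W.T        # (D, N): gamma_d^T w_n
    factorization = V @ U.T       # (D, N): v_d^T u_n
    return bias + region_reg + vote_reg + factorization

# Tiny random example: 4 regions, 3 votes, 25 region and 13 vote features.
rng = np.random.default_rng(0)
D, N, P, Q, K = 4, 3, 25, 13, 2
mu = rng.normal(size=N)
beta = rng.normal(size=(N, P))
gamma = rng.normal(size=(D, Q))
U = rng.normal(size=(N, K))
V = rng.normal(size=(D, K))
X = rng.normal(size=(D, P))
W = rng.normal(size=(N, Q))
Z = predict_latent(mu, beta, gamma, U, V, X, W)
```

Setting `beta`, `gamma`, or the factor matrices to zero removes the corresponding term, which is one way to read the incremental model variants.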

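  9. Our Models
     General form: $z_{dn} = \mu_n + f_n(x_d) + f_d(w_n) + v_d^T u_n$
     • LIN(r): $z_{dn} = \mu_n + \beta_n^T x_d$
     • LIN(v): $z_{dn} = \mu_n + \gamma_d^T w_n$
     • LIN(r) + LIN(v): $z_{dn} = \mu_n + \beta_n^T x_d + \gamma_d^T w_n$
     • MF: $z_{dn} = \mu_n + v_d^T u_n$
     • MF + LIN(r): $z_{dn} = \mu_n + \beta_n^T x_d + v_d^T u_n$
     • MF + GP(r): $z_{dn} = \mu_n + \mathrm{GP}(x_d) + v_d^T u_n$
     • MF + GP(r) + LIN(v): $z_{dn} = \mu_n + \mathrm{GP}(x_d) + \gamma_d^T w_n + v_d^T u_n$
     Hyperparameters noted on the slide: $\lambda_\beta, \lambda_\gamma, \lambda_u, \lambda_v$ and $\theta, \sigma_s, \lambda_\gamma$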
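To make the MF variant concrete in the online setting, here is a minimal sketch of how the vote-specific parameters $\mu_n$ and $u_n$ could be estimated once a few regions have reported. The slides do not say how the model is fitted, so this is an assumption for illustration: the region factors $v_d$ are treated as already learned from past votes, and the new vote is fitted by ridge regression. The function `fit_new_vote`, the penalty `lam`, and the synthetic data are hypothetical.

```python
import numpy as np

def fit_new_vote(V_obs, y_obs, lam=1.0):
    """Ridge estimate of (mu_n, u_n) for one new vote.

    V_obs: (n_obs, K) latent factors of the regions revealed so far
           (assumed fixed, e.g. learned from past votes).
    y_obs: (n_obs,) observed outcomes of those regions.
    Requires at least one observed region.
    """
    n_obs, K = V_obs.shape
    A = np.hstack([np.ones((n_obs, 1)), V_obs])  # column of ones carries the bias mu_n
    reg = lam * np.eye(K + 1)
    reg[0, 0] = 0.0                              # leave the bias unpenalized
    theta = np.linalg.solve(A.T @ A + reg, A.T @ y_obs)
    return theta[0], theta[1:]

# Synthetic check: 200 regions, 3 latent dimensions, 50 regions revealed.
rng = np.random.default_rng(1)
V = rng.normal(size=(200, 3))
u_true = np.array([5.0, -3.0, 2.0])
y = 50.0 + V @ u_true + 0.1 * rng.normal(size=200)

order = rng.permutation(200)
obs, hidden = order[:50], order[50:]
mu_hat, u_hat = fit_new_vote(V[obs], y[obs], lam=1e-3)
pred = mu_hat + V[hidden] @ u_hat                # predictions for unrevealed regions
rmse = np.sqrt(np.mean((pred - y[hidden]) ** 2))
```

Calling `fit_new_vote` again each time more regions are revealed refines the prediction for the remaining ones.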

  10. Performance Evaluation
      • Last 50 votes as test data
      • Simulate 500 random reveal orders
         • Last 10% of regions as test regions
         • Observe increasing number of regions
         • Predict result of test regions
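The evaluation protocol on this slide can be sketched for a single test vote as follows. The predictor interface `predict_fn`, the checkpoint values, the mean-of-observed baseline, and the synthetic outcomes are assumptions made for illustration; only the 500 random orders and the 10% test split come from the slide.

```python
import numpy as np

def online_rmse_curve(y, predict_fn, n_orders=500, test_frac=0.1,
                      checkpoints=(1, 10, 100), seed=0):
    """RMSE on the held-out regions after revealing k regions, for each k in checkpoints.

    y: (D,) outcomes of one vote in all D regions.
    predict_fn(obs_idx, y_obs, test_idx) -> predictions for test_idx.
    Each checkpoint must not exceed D minus the number of test regions.
    """
    rng = np.random.default_rng(seed)
    D = y.shape[0]
    n_test = max(1, int(round(test_frac * D)))
    sq_err = np.zeros(len(checkpoints))
    for _ in range(n_orders):
        order = rng.permutation(D)        # one random reveal order
        test = order[-n_test:]            # last 10% of the order: never revealed
        for i, k in enumerate(checkpoints):
            obs = order[:k]               # first k regions have reported
            pred = predict_fn(obs, y[obs], test)
            sq_err[i] += np.mean((pred - y[test]) ** 2)
    # Root of the squared error averaged over all reveal orders.
    return np.sqrt(sq_err / n_orders)

def mean_baseline(obs_idx, y_obs, test_idx):
    """Hypothetical baseline: predict the mean of the revealed outcomes everywhere."""
    return np.full(len(test_idx), y_obs.mean())

# Synthetic outcomes (% of "yes") for 200 regions.
y = np.random.default_rng(42).uniform(30.0, 70.0, size=200)
curve = online_rmse_curve(y, mean_baseline)
```

Any of the models from the deck could be plugged in as `predict_fn`; averaging the curve over the test votes would give plots of the kind shown in the results slides.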

  11. Results
      [Figure: RMSE on the last 10% of regions [%] (y-axis, 5 to 13) against the number of observed regions (x-axis, 10^0 to 10^3). Curves: LIN(r), MF, MF + LIN(r).]

  12. Bayesian vs. Non-Bayesian
      [Figure: RMSE on the last 10% of regions [%] (y-axis, 5 to 13) against the number of observed regions (x-axis, 10^0 to 10^3). Curves: MF + LIN(r), MF + GP(r).]

  13. Final Model
      [Figure: RMSE on the last 10% of regions [%] (y-axis, 5 to 13) against the number of observed regions (x-axis, 10^0 to 10^3). Curves: LIN(v), MF + GP(r), MF + GP(r) + LIN(v).]

  14. Interpretation
      [Figure: relative importance (x-axis, 0.0 to 1.0) of the 25 region features, with an annotation "Röstigraben". Features, in the order they appear on the slide: x, y, Election CVP, Election BDP, Election SVP, Age 20-64, Election other right, Election PST, Elevation, Election SP, Age 0-19, Election Greens, Election GL, Election FDP, Election PEV, Foreigners, Social aid, Age 65+, Speaks French, Population density, Speaks Romansh, Jobs, Population, Speaks German, Speaks Italian.]

  15. Summary
      • Individual models have different strengths
         • Vote features regression for cold start
         • Region features and bi-clustering when more observations are available
      • Bayesian methods are useful
         • Proper hyperparameter setting
         • Accurate and interpretable results

  16. Thank you!
      Code and data available at http://vincent.etter.io/dsaa16
      Any questions?
