Beyond Binary Labels: Political Ideology Prediction of Twitter Users
Daniel Preoțiuc-Pietro
Joint work with Ye Liu (NUS), Daniel J. Hopkins (Political Science), Lyle Ungar (CS)
2 August 2017
Motivation
User attribute prediction from text is successful:
◮ Age (Rao et al. 2010 ACL)
◮ Gender (Burger et al. 2011 EMNLP)
◮ Location (Eisenstein et al. 2010 EMNLP)
◮ Personality (Schwartz et al. 2013 PLoS One)
◮ Impact (Lampos et al. 2014 EACL)
◮ Political Orientation (Volkova et al. 2014 ACL)
◮ Mental Illness (Coppersmith et al. 2014 ACL)
◮ Occupation (Preoțiuc-Pietro et al. 2015 ACL)
◮ Income (Preoțiuc-Pietro et al. 2015 PLoS One)
... and useful in many applications.
Political Ideology & Text
Hypothesis: the political ideology of a user is disclosed through language use
◮ partisan political mentions or issues
◮ cultural differences
Political Ideology & Text
Previous CS / NLP research used data sets with user labels identified through:
1. User descriptions
H1 Users are far more likely to be politically engaged
Political Ideology & Text
2. Partisan hashtags
H2 The prediction problem has so far been over-simplified
Political Ideology & Text
3. Lists of conservative / liberal users
H3 Neutral users can be identified
Political Ideology & Text
4. Followers of partisan accounts
H4 Differences in language use exist between moderate and extreme users
Data
◮ Political ideology
◮ specific to country and culture
◮ our use case is US politics (as in all previous work)
◮ the major US ideology spectrum is Conservative – Liberal
◮ seven-point scale
Data
We collect a new data set:
◮ 3,938 users (4.8M tweets)
◮ public Twitter handle with > 100 posts
Political ideology is self-reported through an online survey
◮ the only way to obtain unbiased ground-truth labels (Flekova et al. 2016 ACL, Carpenter et al. 2016 SPPS)
◮ age, gender and other demographics also reported
Data
◮ Data available at preotiuc.ro
◮ full data for research purposes
◮ aggregates for replicability
◮ Twitter Developer Agreement & Policy VII.A4: "Twitter Content, and information derived from Twitter Content, may not be used by, or knowingly displayed, distributed, or otherwise made available to any entity to target, segment, or profile individuals based on [...] political affiliation or beliefs"
◮ Study approved by the Institutional Review Board (IRB) of the University of Pennsylvania
Class Distribution
[Bar chart: number of users per point on the seven-point scale; class sizes: 696, 692, 594, 501, 453, 401, 195.]
Data
For comparison to previous work, we collect a second data set:
◮ 13,651 users (25.5M tweets)
◮ users who follow liberal / conservative politicians on Twitter
Hypotheses
H1 Previous studies used users far more likely to be politically engaged
H2 The prediction problem has so far been over-simplified
H3 Neutral users can be identified
H4 Differences in language use exist between moderate and extreme users
Engagement
H1 Previous studies used users far more likely to be politically engaged
Manually coded:
◮ Political words (234)
◮ Political NEs: mentions of politicians' proper names (39)
◮ Media NEs: mentions of political media sources and pundits (20)
Engagement
Data set obtained using previous methods
[Bar chart: average percentage of political word usage across user groups, stacked by term type; Political Words: 2.64 / 2.95, Politician Names: 0.73 / 0.79, Media/Pundit Names: 0.11 / 0.18.]
Engagement
Our data set
[Bar chart: average percentage of political word usage across the seven survey groups; Political Words per group: 0.76, 0.55, 0.42, 0.36, 0.46, 0.51, 0.76, with Politician Names (0.07–0.24) and Media/Pundit Names (0.02–0.04) far lower; the automatically identified groups (2.64 / 2.95 Political Words) shown for comparison.]
Engagement
Takeaways:
◮ ~3x more political terms for automatically identified users compared to the highest survey-based scores
◮ an almost perfectly symmetrical U-shape across all three types of political terms
◮ the difference between groups 1–2 / 6–7 is larger than between 2–3 / 5–6
Hypotheses
H1 Previous studies used users far more likely to be politically engaged
H2 The prediction problem has so far been over-simplified
H3 Neutral users can be identified
H4 Differences in language use exist between moderate and extreme users
Over-simplification
H2 The prediction problem was so far over-simplified
[Bar chart: ROC AUC, logistic regression, 10-fold cross-validation, with three feature sets (Topics, Political Terms, Domain Adaptation):
CvL: .891, .972, .976
1v7: .785, .785, .789
2v6: .662, .679, .690
3v5: .581, .590, .625]
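The evaluation setup on this slide can be sketched as follows. This is a minimal illustration using scikit-learn on synthetic data: the features here are random stand-ins, not the topic, political-term, or domain-adaptation features from the talk, so the resulting score carries no meaning for the study itself.

```python
# Sketch of the evaluation protocol: logistic regression scored by
# ROC AUC under 10-fold cross-validation. Synthetic features stand
# in for the talk's actual user-level feature sets.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Toy binary task, e.g. Conservative vs. Liberal (CvL)
X, y = make_classification(n_samples=500, n_features=50, random_state=0)

clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, X, y, cv=10, scoring="roc_auc")
print(f"mean ROC AUC over 10 folds: {scores.mean():.3f}")
```

The same protocol is repeated for each binary split of the scale (CvL, 1v7, 2v6, 3v5), with harder splits yielding lower AUC.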
Over-simplification
Predicting continuous political leaning (1 – 7)
[Bar chart: Pearson r between predictions and true labels, linear regression, 10-fold cross-validation, for feature sets Unigrams, LIWC, Topics, Emotions, Political, All; values range from .145 to .369 (.145, .256, .286, .294, .300, .369).]
Over-simplification
Seven-class classification
[Bar chart: accuracy, 10-fold cross-validation; values 19.60%, 22.20%, 24.20%, 26.20%, 27.60%.]
GR – logistic regression with Group Lasso regularisation
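A sketch of the seven-class setup, again on synthetic stand-in features. Note that scikit-learn does not ship a Group Lasso penalty, so plain multinomial logistic regression is used here as a stand-in for the GR model named on the slide; chance level for seven balanced classes is about 14.3%.

```python
# Seven-class classification sketch: multinomial logistic regression
# (a stand-in for the slide's Group-Lasso-regularised model, which
# scikit-learn does not provide) with 10-fold cross-validated accuracy.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Toy seven-class task mirroring the seven-point ideology scale
X, y = make_classification(n_samples=700, n_features=50, n_informative=20,
                           n_classes=7, random_state=0)

clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, X, y, cv=10, scoring="accuracy")
print(f"mean accuracy over 10 folds: {scores.mean():.3f}")
```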
Hypotheses
H1 Previous studies used users far more likely to be politically engaged
H2 The prediction problem has so far been over-simplified
H3 Neutral users can be identified
H4 Differences in language use exist between moderate and extreme users
Neutral Users
H3 Neutral users can be identified
[Word clouds: words associated with either extreme conservative or liberal users vs. words associated with neutral users; word size indicates correlation strength.]
Correlations are age- and gender-controlled. Extreme groups are combined using matched age and gender distributions.
Political Engagement
H3a There is a separate dimension of political engagement
Combine the classes into a scale: 4 – 3&5 – 2&6 – 1&7
[Bar chart: Pearson r between predictions and true labels, linear regression, 10-fold cross-validation, for feature sets Unigrams, LIWC, Topics, Emotions, Political, All; Leaning: .145–.369, Engagement: .079–.196.]
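The folding of the seven-point leaning scale into the engagement scale on this slide can be written as a one-line mapping around the neutral midpoint (the function name `engagement` is ours, for illustration):

```python
def engagement(ideology: int) -> int:
    """Fold the 1-7 leaning scale around its midpoint (4 = neutral):
    4 -> 0, 3 & 5 -> 1, 2 & 6 -> 2, 1 & 7 -> 3 (most engaged)."""
    return abs(ideology - 4)

# The full scale folds symmetrically:
print([engagement(x) for x in range(1, 8)])  # [3, 2, 1, 0, 1, 2, 3]
```

This makes explicit that engagement is orthogonal to leaning: users at 1 and 7 differ maximally in leaning but share the same engagement level.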
Hypotheses
H1 Previous studies used users far more likely to be politically engaged
H2 The prediction problem has so far been over-simplified
H3 Neutral users can be identified
H4 Differences in language use exist between moderate and extreme users
Moderate Users
H4 Differences in language use exist between moderate and extreme users
[Word clouds: words associated with moderate liberals (5 and 6) vs. words associated with extreme liberals (7); word size indicates correlation strength and relative frequency.]
Correlations are age- and gender-controlled.
Takeaways
◮ User-level trait acquisition methodologies can generate non-representative samples
◮ Political ideology:
◮ goes beyond binary classes
◮ the problem has to date been over-simplified
◮ New data set available for research
◮ New model to identify political leaning and engagement
Questions? www.preotiuc.ro wwbp.org