Mining socio-political and socio-economic signals from social media content
Vasileios Lampos, Department of Computer Science, University College London
@lampos | lampos.net
Summer School on “Big Data & Networks in Social Sciences”, University of Warwick, 21–23 Sept. 2016
Structure of the presentation
1. Introductory remarks
2. Collective inference tasks — mining emotions; modelling voting intention
3. Personalised inference tasks — occupational class; income; socioeconomic status
4. Concluding remarks
Context and motivation
> the Internet, the World Wide Web, connectivity
> numerous web products feeding from user activity
> user-generated content, publicly available, especially on social media platforms (e.g. Twitter)
> large-scale digitised data: ‘Big Data’, ‘Data Science’
How can we use online user-generated content to enhance our understanding of our world?
About Twitter
About Twitter
> up to 140 characters per published status (tweet)
> users can follow and be followed
> embedded usage of topics (using #hashtags)
> user interaction (retweets, @mentions, likes)
> real-time nature
> biased demographics (13–15% of the UK’s population, age bias etc.)
> information is noisy and not always accurate
Inferring collective information from user-generated content
> mood / emotions
> voting intention
Lampos (Ph.D. Thesis, 2012); Lansdall-Welfare, Lampos & Cristianini (WWW 2012); Lampos, Preotiuc-Pietro & Cohn (ACL 2013)
Emotion taxonomies and quantification
> WordNet Affect (Strapparava & Valitutti, 2004)
> Linguistic Inquiry and Word Count, LIWC (Pennebaker et al., 2001, 2007)
‘Emotional’ keywords, representing:
+ anger, e.g. angry, irritate
+ fear, e.g. fearful, afraid
+ joy, e.g. cheerful, enthusiastic
+ sadness, e.g. depressed, gloomy
+ other emotions
A simple approach (though perhaps not good enough): compute the mean keyword frequency score per emotion.
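The mean keyword frequency scoring described above can be sketched as follows; the tiny word lists here are illustrative stand-ins for the full WordNet Affect / LIWC lexicons, and the tokenisation is deliberately naive.

```python
# Sketch: score a batch of tweets per emotion as the mean frequency of
# that emotion's keywords. The lexicon below is an illustrative
# stand-in for the WordNet Affect / LIWC word lists.
from collections import Counter

LEXICON = {
    "anger": ["angry", "irritate"],
    "fear": ["fearful", "afraid"],
    "joy": ["cheerful", "enthusiastic"],
    "sadness": ["depressed", "gloomy"],
}

def emotion_scores(tweets):
    """Mean keyword frequency per emotion over a list of tweets."""
    tokens = [tok for tweet in tweets for tok in tweet.lower().split()]
    counts = Counter(tokens)
    n = max(len(tokens), 1)  # guard against an empty batch
    return {
        emotion: sum(counts[w] for w in words) / (len(words) * n)
        for emotion, words in LEXICON.items()
    }
```

Each keyword’s frequency is its count divided by the total number of tokens; the emotion score is the mean of those frequencies over the emotion’s keyword list.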
Circadian emotion patterns from Twitter (UK)
[Figure] 24-hour emotion patterns for ‘joy’ and ‘sadness’ (aggregated data, hourly intervals), in summer and winter, with 95% confidence intervals.
‘Joy’ time series based on Twitter (UK)
[Figure] 933-day time series of normalised emotional valence for joy (raw signal and 14-day smoothed), Jul. 2009 to Jan. 2012; annotated events include Christmas, Valentine’s Day, Easter, Halloween, the royal wedding, the budget cuts, and the riots.
Clear peaking pattern during Christmas and other annual celebrations (Valentine’s Day, Easter).
Recession, riots, and Twitter emotions (UK)
[Figure] Difference in mean mood score (anger, fear) in the 50 days prior to and after the dates of the UK budget cuts and the UK riots; peaks indicate an increase in mood change.
Inferring voting intention — Data sets
United Kingdom
+ 3 political parties (Conservatives, Labour, Liberal Democrats)
+ 42,000 Twitter users distributed proportionally to the UK’s regional population figures
+ 60 million tweets, 80,976 1-grams
+ 240 polls from 30 Apr. 2010 to 13 Feb. 2012
Austria
+ 4 political parties (SPÖ, ÖVP, FPÖ, GRÜ)
+ 1,100 active Twitter users selected by political scientists
+ 800,000 tweets, 22,917 1-grams
+ 98 polls from 25 Jan. to 25 Dec. 2012
Regularised text regression
observations x_i ∈ R^m, i ∈ {1, …, n} — X
responses y_i ∈ R, i ∈ {1, …, n} — y
weights, bias w_j, β ∈ R, j ∈ {1, …, m} — w* = [w; β]
f(x_i) = x_i^T w + β
Elastic Net (Zou & Hastie, 2005):
argmin_{w, β}  Σ_{i=1}^{n} ( y_i − β − Σ_{j=1}^{m} x_ij w_j )²  +  λ₁ Σ_{j=1}^{m} |w_j|  +  λ₂ Σ_{j=1}^{m} w_j²
(the L1-norm and L2-norm penalties, respectively)
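The elastic net objective can be minimised by cyclic coordinate descent with soft-thresholding; the sketch below is a pedagogical stand-in (the λ values in the test are illustrative), not the tuned solver used in the experiments.

```python
# Minimal elastic-net sketch via coordinate descent for
#   argmin_{w,b}  sum_i (y_i - b - x_i.w)^2 + lam1*||w||_1 + lam2*||w||_2^2
import numpy as np

def soft_threshold(rho, t):
    """Soft-thresholding operator S(rho, t)."""
    return np.sign(rho) * max(abs(rho) - t, 0.0)

def elastic_net(X, y, lam1=1.0, lam2=1.0, n_iter=200):
    n, m = X.shape
    w, b = np.zeros(m), 0.0
    col_sq = (X ** 2).sum(axis=0)            # ||X_j||^2 per feature
    for _ in range(n_iter):
        b = float(np.mean(y - X @ w))        # closed-form bias update
        for j in range(m):
            r = y - b - X @ w + X[:, j] * w[j]   # residual excluding j
            rho = float(X[:, j] @ r)
            # stationarity condition: w_j = S(rho, lam1/2) / (||X_j||^2 + lam2)
            w[j] = soft_threshold(rho, lam1 / 2) / (col_sq[j] + lam2)
    return w, b
```

The L1 term zeroes out weak features (sparsity over the 1-gram vocabulary), while the L2 term keeps correlated features jointly shrunk rather than arbitrarily dropped.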
Bilinear (users + text) regularised regression
users p ∈ Z+
observations Q_i ∈ R^{p×m}, i ∈ {1, …, n} — X
responses y_i ∈ R, i ∈ {1, …, n} — y
weights, bias u_k, w_j, β ∈ R, k ∈ {1, …, p}, j ∈ {1, …, m} — u, w, β
f(Q_i) = u^T Q_i w + β
Bilinear elastic net (BEN)
f(Q_i) = u^T Q_i w + β
argmin_{u, w, β}  Σ_{i=1}^{n} ( u^T Q_i w + β − y_i )²  +  ψ(u, θ_u)  +  ψ(w, θ_w)
where ψ(x, λ₁, λ₂) = λ₁ ‖x‖₁ + λ₂ ‖x‖₂²
Training the bilinear elastic net (BEN)
argmin_{u, w, β}  Σ_{i=1}^{n} ( u^T Q_i w + β − y_i )²  +  ψ(u, θ_u)  +  ψ(w, θ_w)
Biconvex problem:
+ fix u, learn w and vice versa
+ iterate through convex optimisation tasks
Large-scale solvers in SPAMS (Mairal et al., 2010)
[Figure] Global objective function during training (red) and the corresponding prediction error (RMSE) on held-out data (blue), per training step.
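The alternating scheme exploits that, with u fixed, u^T Q_i w + β is linear in w with features Q_i^T u (and symmetrically for u). The sketch below substitutes plain ridge penalties for the full elastic-net ψ to keep each subproblem in closed form; the paper’s experiments use SPAMS for the elastic-net subproblems instead.

```python
# Sketch of biconvex training by alternating ridge regressions:
# fix u -> linear model in w with features Q_i^T u; fix w -> linear
# model in u with features Q_i w. Ridge stands in for the elastic-net
# psi penalty here, purely for brevity.
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge solution with an unpenalised bias term."""
    n, m = X.shape
    Xc, yc = X - X.mean(0), y - y.mean()
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(m), Xc.T @ yc)
    b = float(y.mean() - X.mean(0) @ w)
    return w, b

def train_bilinear(Q, y, lam=0.1, n_rounds=20):
    """Q: (n, p, m) user-by-word frequency tensor; y: (n,) responses."""
    n, p, m = Q.shape
    u = np.ones(p) / p                      # init: uniform user weights
    w, b = np.zeros(m), 0.0
    for _ in range(n_rounds):
        Xw = np.einsum("ipm,p->im", Q, u)   # features Q_i^T u (u fixed)
        w, b = ridge(Xw, y, lam)
        Xu = np.einsum("ipm,m->ip", Q, w)   # features Q_i w (w fixed)
        u, b = ridge(Xu, y, lam)
    return u, w, b
```

Note that u and w are only identified up to a shared scale, so convergence is best judged on the objective or held-out predictions, as in the training plot above.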
Bilinear and multi-task regression
tasks τ ∈ Z+, users p ∈ Z+
observations Q_i ∈ R^{p×m}, i ∈ {1, …, n} — X
responses y_i ∈ R^τ, i ∈ {1, …, n} — Y
weights, biases u_tk, w_tj ∈ R, β ∈ R^τ, k ∈ {1, …, p}, j ∈ {1, …, m} — U, W, β
f_t(Q_i) = u_t^T Q_i w_t + β_t, for each task t ∈ {1, …, τ}
Bilinear Group ℓ2,1 (BGL)
argmin_{U, W, β}  Σ_{t=1}^{τ} Σ_{i=1}^{n} ( u_t^T Q_i w_t + β_t − y_ti )²  +  λ_u Σ_{k=1}^{p} ‖U_k‖₂  +  λ_w Σ_{j=1}^{m} ‖W_j‖₂
+ a nonzero-weighted feature (user or word) is encouraged to be nonzero for all tasks, but with potentially different weights
+ intuitive for political preference inference
Voting intention inference performance
[Figure] Root Mean Squared Error (RMSE) of voting intention inference for the baselines (mean poll, last poll, Elastic Net on words) and the bilinear models (BEN, BGL) on the UK and Austria data sets; reported values range from 1.439 to 3.067.
Voting intention comparative plots (UK)
[Figure] Inferred voting intention (%) for CON, LAB, and LIB over time, from BEN and BGL, compared with YouGov polls.
Voting intention comparative plots (Austria)
[Figure] Inferred voting intention (%) for SPÖ, ÖVP, FPÖ, and GRÜ over time, from BEN and BGL, compared with polls.