
Mining socio-political and socio-economic signals from social media



  1. Mining socio-political and socio-economic signals from social media content
     Vasileios Lampos
     Department of Computer Science, University College London
     @lampos | lampos.net
     Summer School on “Big Data & Networks in Social Sciences”
     University of Warwick, Sept. 21-23, 2016

  2. Structure of the presentation
     1. Introductory remarks
     2. Collective inference tasks
        - Mining emotions
        - Modelling voting intention
     3. Personalised inference tasks
        - Occupational class
        - Income
        - Socioeconomic status
     4. Concluding remarks

  3. Context and motivation
     > the Internet, the World Wide Web, connectivity
     > numerous web products feeding from user activity
     > user-generated content, publicly available, esp. on social media platforms (e.g. Twitter)
     > large-scale digitised data, ‘Big Data’, ‘Data Science’
     How can we use online user-generated content to enhance our understanding of our world?


  5. About Twitter

  6. About Twitter
     > 140 characters per published status (tweet)
     > users can follow and be followed
     > embedded usage of topics (using #hashtags)
     > user interaction (re-tweets, @mentions, likes)
     > real-time nature
     > biased demographics (13-15% of UK’s population, age bias etc.)
     > information is noisy and not always accurate

  7. Inferring collective information from user-generated content
     > mood / emotions
     > voting intention
     Lampos (Ph.D. Thesis, 2012); Lansdall-Welfare, Lampos & Cristianini (WWW 2012); Lampos, Preotiuc-Pietro & Cohn (ACL 2013)

  8. Emotion taxonomies and quantification
     > WordNet Affect (Strapparava & Valitutti, 2004)
     > Linguistic Inquiry and Word Count (LIWC) (Pennebaker et al., 2001, 2007)
     ‘Emotional’ keywords, representing
     + anger, e.g. angry, irritate
     + fear, e.g. fearful, afraid
     + joy, e.g. cheerful, enthusiastic
     + sadness, e.g. depressed, gloomy
     + other emotions
     Simply (but maybe not good enough!), we compute the mean keyword frequency score per emotion.
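The mean keyword frequency score can be read in more than one way; the sketch below takes one plausible reading, in which a keyword's frequency is its number of occurrences divided by the number of tweets in the batch, and the emotion score is the average of those frequencies over the emotion's keywords. The lexicon holds only the example words from the slide, and the names `EMOTION_KEYWORDS` and `emotion_scores`, as well as the whitespace tokenisation, are assumptions made for illustration.

```python
from collections import Counter

# Toy lexicon: only the example keywords listed on the slide (the real
# WordNet Affect / LIWC lists are much longer).
EMOTION_KEYWORDS = {
    "anger": ["angry", "irritate"],
    "fear": ["fearful", "afraid"],
    "joy": ["cheerful", "enthusiastic"],
    "sadness": ["depressed", "gloomy"],
}


def emotion_scores(tweets, lexicon=EMOTION_KEYWORDS):
    """Return the mean keyword frequency per emotion for one batch of tweets.

    Assumption: frequency of a keyword = occurrences / number of tweets.
    Tokenisation is a plain lowercase whitespace split (no stemming,
    no punctuation handling).
    """
    if not tweets:
        raise ValueError("need at least one tweet")
    n_tweets = len(tweets)
    counts = Counter(tok for tweet in tweets for tok in tweet.lower().split())
    scores = {}
    for emotion, keywords in lexicon.items():
        freqs = [counts[k] / n_tweets for k in keywords]
        scores[emotion] = sum(freqs) / len(keywords)
    return scores


batch = [
    "I am so angry today",
    "feeling cheerful and enthusiastic",
    "angry angry",
    "nothing here",
]
scores = emotion_scores(batch)
```

Computing this score for every hour or day gives one time series per emotion, which is the kind of signal plotted on the following slides.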


  10. Circadian emotion patterns from Twitter (UK)
     [Figure: 24h emotion patterns for ‘joy’ and ‘sadness’ for summer and winter with 95% confidence intervals. Legend: Winter, Summer, Aggregated Data. x-axis: Hourly Intervals (3 to 24); y-axes: Sadness Score and Joy Score (-0.1 to 0.1).]

  11. ‘Joy’ time series based on Twitter (UK)
     [Figure: 933 Day Time Series for Joy in Twitter Content. Series: raw joy signal and 14-day smoothed joy. x-axis: Date (Jul 09 to Jan 12); y-axis: Normalised Emotional Valence. Annotated events: XMAS, valentine, easter, halloween, roy.wed., CUTS, RIOTS.]
     Clear peaking pattern during XMAS or other annual celebrations (Valentine’s Day, Easter)

  12. Recession, riots, and Twitter emotions (UK)
     [Figure: two panels, Budget Cuts (UK) and Riots (UK). Series: Anger and Fear, with markers for the date of the budget cuts and the date of the riots. x-axis: Date (Jul 09 to Jan 12); y-axis: Difference in mean (-1 to 1.5).]
     Difference in mean mood score between the 50 days before and the 50 days after each date; peaks indicate an increase in mood change
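The before/after comparison described in the caption can be sketched as a sliding computation over a daily mood series. This is a minimal illustration under stated assumptions: the function names `mean_shift` and `scan` are hypothetical, day `t` itself is counted in the "after" window, and the signed difference (after minus before) is used, none of which the slide specifies.

```python
def mean_shift(series, t, window=50):
    """Mean of the `window` days from day t onwards minus the mean of the
    `window` days before day t. Assumes day t belongs to the 'after' side."""
    if t < window or t + window > len(series):
        raise ValueError("not enough data around day t")
    before = series[t - window:t]
    after = series[t:t + window]
    return sum(after) / window - sum(before) / window


def scan(series, window=50):
    """Return (day, mean_shift) pairs for every day with full windows."""
    return [(t, mean_shift(series, t, window))
            for t in range(window, len(series) - window + 1)]


# Toy daily mood series with a level change at day 5 (made-up numbers).
toy = [0.0] * 5 + [2.0] * 5
shifts = scan(toy, window=3)
peak_day = max(shifts, key=lambda pair: pair[1])[0]
```

On the toy series the largest shift is found at the day where the level changes, which mirrors how a peak in the plotted curve lines up with an event date.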

  13. Inferring voting intention: Data sets
     United Kingdom
     + 3 political parties (Conservatives, Labour, Lib Dem)
     + 42,000 Twitter users distributed proportionally to UK’s regional population figures
     + 60 million tweets, 80,976 1-grams
     + 240 polls from 30 Apr. 2010 to 13 Feb. 2012
     Austria
     + 4 political parties (SPÖ, ÖVP, FPÖ, GRÜ)
     + 1,100 active Twitter users selected by political scientists
     + 800,000 tweets, 22,917 1-grams
     + 98 polls from 25 Jan. to 25 Dec. 2012

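  14. Regularised text regression
     observations: $\mathbf{x}_i \in \mathbb{R}^m$, $i \in \{1, \dots, n\}$, collected in $\mathbf{X}$
     responses: $y_i \in \mathbb{R}$, $i \in \{1, \dots, n\}$, collected in $\mathbf{y}$
     weights, bias: $w_j, \beta \in \mathbb{R}$, $j \in \{1, \dots, m\}$, collected in $\mathbf{w}^{*} = [\mathbf{w}; \beta]$
     $f(\mathbf{x}_i) = \mathbf{x}_i^{T} \mathbf{w} + \beta$
     Elastic Net (Zou & Hastie, 2005)
     $$\underset{\mathbf{w}, \beta}{\operatorname{argmin}} \left\{ \sum_{i=1}^{n} \Big( y_i - \beta - \sum_{j=1}^{m} x_{ij} w_j \Big)^2 + \lambda_1 \sum_{j=1}^{m} |w_j| + \lambda_2 \sum_{j=1}^{m} w_j^2 \right\}$$
     The $\lambda_1$ term is the L1-norm penalty; the $\lambda_2$ term is the L2-norm penalty.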
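The elastic net objective on this slide can be minimised in several ways, and the slide does not say which solver was used. The sketch below uses cyclic coordinate descent with soft-thresholding, a standard choice swapped in purely for illustration; the function names and the toy data are assumptions.

```python
def soft_threshold(z, t):
    """Shrink z towards zero by t (the proximal operator of t * |.|)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def elastic_net(X, y, lam1, lam2, n_iter=200):
    """Minimise sum_i (y_i - beta - x_i.w)^2 + lam1*sum|w_j| + lam2*sum w_j^2
    by cyclic coordinate descent. X is a list of n rows with m features each."""
    n, m = len(X), len(X[0])
    w = [0.0] * m
    beta = sum(y) / n
    for _ in range(n_iter):
        for j in range(m):
            rho = 0.0  # correlation of feature j with the partial residual
            sq = 0.0   # squared norm of feature j
            for i in range(n):
                rest = beta + sum(X[i][k] * w[k] for k in range(m) if k != j)
                rho += X[i][j] * (y[i] - rest)
                sq += X[i][j] ** 2
            denom = sq + lam2
            # Exact minimiser along coordinate j for the objective above.
            w[j] = soft_threshold(rho, lam1 / 2.0) / denom if denom > 0 else 0.0
        # The bias is unpenalised: set it to the mean residual.
        beta = sum(y[i] - sum(X[i][k] * w[k] for k in range(m))
                   for i in range(n)) / n
    return w, beta


# Toy data generated from y = 2 * x0 + 1; the second feature is irrelevant.
X = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 0.0]]
y = [1.0, 3.0, 5.0, 7.0]
w_plain, b_plain = elastic_net(X, y, lam1=0.0, lam2=0.0, n_iter=2000)
w_sparse, b_sparse = elastic_net(X, y, lam1=1000.0, lam2=0.0)
```

With both penalties at zero the sketch reduces to least squares; with a large `lam1` every weight is driven to exactly zero and only the bias remains, which is the sparsity effect of the L1 term.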


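  16. Bilinear (users+text) regularised regression
     users: $p \in \mathbb{Z}^{+}$
     observations: $\mathbf{Q}_i \in \mathbb{R}^{p \times m}$, $i \in \{1, \dots, n\}$, collected in $\mathcal{X}$
     responses: $y_i \in \mathbb{R}$, $i \in \{1, \dots, n\}$, collected in $\mathbf{y}$
     weights, bias: $u_k, w_j, \beta \in \mathbb{R}$, $k \in \{1, \dots, p\}$, $j \in \{1, \dots, m\}$, collected in $\mathbf{u}, \mathbf{w}, \beta$
     $f(\mathbf{Q}_i) = \mathbf{u}^{T} \mathbf{Q}_i \mathbf{w} + \beta$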
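As a small worked instance of the bilinear form, the sketch below evaluates $\mathbf{u}^{T} \mathbf{Q}_i \mathbf{w} + \beta$ for one observation. The dimensions imply that rows of $\mathbf{Q}_i$ index users and columns index words; the function name `bilinear_predict` and all numbers are made up for illustration.

```python
def bilinear_predict(u, Q, w, beta):
    """Return u^T Q w + beta for a p x m matrix Q (list of p rows),
    user weights u (length p) and word weights w (length m)."""
    p, m = len(u), len(w)
    if len(Q) != p or any(len(row) != m for row in Q):
        raise ValueError("Q must have shape (len(u), len(w))")
    return sum(u[k] * Q[k][j] * w[j] for k in range(p) for j in range(m)) + beta


# Two users, three words (toy values).
u = [1.0, 2.0]
Q = [[1.0, 0.0, 2.0],
     [0.0, 3.0, 1.0]]
w = [1.0, 1.0, 1.0]
prediction = bilinear_predict(u, Q, w, beta=0.5)
```

Here $\mathbf{u}^{T}\mathbf{Q} = [1, 6, 4]$, so the prediction is $1 + 6 + 4 + 0.5 = 11.5$: each word count is weighted both by who wrote it and by which word it is.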

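  17. Bilinear elastic net (BEN)
     $$\underset{\mathbf{u}, \mathbf{w}, \beta}{\operatorname{argmin}} \left\{ \sum_{i=1}^{n} \left( \mathbf{u}^{T} \mathbf{Q}_i \mathbf{w} + \beta - y_i \right)^2 + \psi(\mathbf{u}, \theta_u) + \psi(\mathbf{w}, \theta_w) \right\}$$
     where $\psi(\mathbf{x}, \lambda_1, \lambda_2) = \lambda_1 \|\mathbf{x}\|_{\ell_1} + \lambda_2 \|\mathbf{x}\|_{\ell_2}^2$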

  18. Training bilinear elastic net (BEN)
     $$\underset{\mathbf{u}, \mathbf{w}, \beta}{\operatorname{argmin}} \left\{ \sum_{i=1}^{n} \left( \mathbf{u}^{T} \mathbf{Q}_i \mathbf{w} + \beta - y_i \right)^2 + \psi(\mathbf{u}, \theta_u) + \psi(\mathbf{w}, \theta_w) \right\}$$
     Biconvex problem
     + fix u, learn w and vice versa
     + iterate through convex optimisation tasks
     Large-scale solvers in SPAMS (Mairal et al., 2010)
     [Figure: global objective function during training (red) and corresponding prediction error (RMSE) on held out data (blue). x-axis: Step (0 to 30).]
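The alternating scheme (fix u, learn w, then fix w, learn u) can be sketched in a few lines. This is not the SPAMS-based implementation mentioned on the slide: the inner solver is a plain warm-started coordinate descent chosen for self-containment, and the function names, initial values and toy data are all assumptions.

```python
def soft_threshold(z, t):
    """Shrink z towards zero by t."""
    return z - t if z > t else z + t if z < -t else 0.0


def enet_sweeps(X, y, w, beta, lam1, lam2, sweeps=20):
    """Coordinate descent on sum_i (y_i - beta - x_i.w)^2 + lam1*|w|_1
    + lam2*|w|_2^2, warm-started from (w, beta)."""
    n, m = len(X), len(w)
    w = list(w)
    for _ in range(sweeps):
        for j in range(m):
            rho = sq = 0.0
            for i in range(n):
                rest = beta + sum(X[i][k] * w[k] for k in range(m) if k != j)
                rho += X[i][j] * (y[i] - rest)
                sq += X[i][j] ** 2
            denom = sq + lam2
            w[j] = soft_threshold(rho, lam1 / 2.0) / denom if denom > 0 else 0.0
        beta = sum(y[i] - sum(X[i][k] * w[k] for k in range(m))
                   for i in range(n)) / n
    return w, beta


def predict(u, Q, w, beta):
    """Bilinear prediction u^T Q w + beta."""
    return sum(u[k] * Q[k][j] * w[j]
               for k in range(len(u)) for j in range(len(w))) + beta


def objective(Qs, y, u, w, beta, theta_u, theta_w):
    """Global BEN objective: squared error plus psi(u) plus psi(w)."""
    def psi(x, theta):
        return theta[0] * sum(abs(v) for v in x) + theta[1] * sum(v * v for v in x)
    sse = sum((predict(u, Q, w, beta) - yi) ** 2 for Q, yi in zip(Qs, y))
    return sse + psi(u, theta_u) + psi(w, theta_w)


def train_ben(Qs, y, theta_u, theta_w, steps=10):
    """Alternate between the two convex sub-problems; theta = (lam1, lam2)."""
    p, m = len(Qs[0]), len(Qs[0][0])
    u, w, beta = [1.0 / p] * p, [0.0] * m, 0.0  # assumed initialisation
    history = []
    for _ in range(steps):
        # Fix u: each Q_i collapses to the m-dimensional feature vector u^T Q_i.
        Xw = [[sum(u[k] * Q[k][j] for k in range(p)) for j in range(m)] for Q in Qs]
        w, beta = enet_sweeps(Xw, y, w, beta, *theta_w)
        # Fix w: each Q_i collapses to the p-dimensional feature vector Q_i w.
        Xu = [[sum(Q[k][j] * w[j] for j in range(m)) for k in range(p)] for Q in Qs]
        u, beta = enet_sweeps(Xu, y, u, beta, *theta_u)
        history.append(objective(Qs, y, u, w, beta, theta_u, theta_w))
    return u, w, beta, history


# Toy problem: 4 observations, 2 users, 2 words (made-up numbers).
Qs = [[[1.0, 0.0], [0.0, 1.0]], [[2.0, 1.0], [0.0, 1.0]],
      [[0.0, 1.0], [1.0, 2.0]], [[1.0, 1.0], [1.0, 0.0]]]
y = [1.0, 2.0, 3.0, 2.5]
theta = (0.1, 0.1)
u, w, beta, history = train_ben(Qs, y, theta, theta, steps=5)
start = objective(Qs, y, [0.5, 0.5], [0.0, 0.0], 0.0, theta, theta)
```

Because every inner update exactly minimises the global objective along one coordinate, `history` never increases from one step to the next, which matches the decreasing objective curve described for the training plot. Nothing here guarantees a global optimum, since the joint problem is only biconvex.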


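  20. Bilinear and multi-task regression
     tasks: $\tau \in \mathbb{Z}^{+}$
     users: $p \in \mathbb{Z}^{+}$
     observations: $\mathbf{Q}_i \in \mathbb{R}^{p \times m}$, $i \in \{1, \dots, n\}$, collected in $\mathcal{X}$
     responses: $\mathbf{y}_i \in \mathbb{R}^{\tau}$, $i \in \{1, \dots, n\}$, collected in $\mathbf{Y}$
     weights, bias: $\mathbf{u}_k, \mathbf{w}_j, \boldsymbol{\beta} \in \mathbb{R}^{\tau}$, $k \in \{1, \dots, p\}$, $j \in \{1, \dots, m\}$, collected in $\mathbf{U}, \mathbf{W}, \boldsymbol{\beta}$
     $f(\mathbf{Q}_i) = \operatorname{tr}\left( \mathbf{U}^{T} \mathbf{Q}_i \mathbf{W} \right) + \boldsymbol{\beta}$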

  21. Bilinear Group L2,1 (BGL)
     $$\underset{\mathbf{U}, \mathbf{W}, \boldsymbol{\beta}}{\operatorname{argmin}} \left\{ \sum_{t=1}^{\tau} \sum_{i=1}^{n} \left( \mathbf{u}_t^{T} \mathbf{Q}_i \mathbf{w}_t + \beta_t - y_{ti} \right)^2 + \lambda_u \sum_{k=1}^{p} \|\mathbf{U}_k\|_2 + \lambda_w \sum_{j=1}^{m} \|\mathbf{W}_j\|_2 \right\}$$
     + a nonzero weighted feature (user or word) is encouraged to be nonzero for all tasks, but with potentially different weights
     + intuitive for political preference inference
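To see why the row-wise penalty favours features that are shared across tasks, the sketch below computes the L2,1 term (the sum of the Euclidean norms of the rows) for two toy weight matrices with the same nonzero entries arranged differently. The function name `l21_norm` and the matrices are hypothetical.

```python
import math


def l21_norm(M):
    """Sum over rows of the Euclidean norm of each row.
    Rows index features (users or words), columns index tasks."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in M)


# Two features, two tasks. Same two nonzero weights, arranged differently.
shared = [[1.0, 1.0],     # feature 0 is used by both tasks
          [0.0, 0.0]]     # feature 1 is unused
scattered = [[1.0, 0.0],  # feature 0 is used by task 0 only
             [0.0, 1.0]]  # feature 1 is used by task 1 only

penalty_shared = l21_norm(shared)
penalty_scattered = l21_norm(scattered)
```

The shared layout costs $\sqrt{2} \approx 1.414$ while the scattered one costs $2$, so for the same amount of weight the penalty is lower when tasks reuse the same feature rows, and whole rows can be switched off together.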

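  22. Voting intention inference performance
     [Figure: bar chart of Root Mean Squared Error for the UK and Austria, comparing Mean poll, Last poll, Elastic Net (words), BEN and BGL. Numeric labels recovered from the chart (not matched to individual bars): 3.067, 2, 1.851, 1.723, 1.699, 1.69, 1.573, 1.478, 1.47, 1.442, 1.439.]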


  24. Voting intention comparative plots
     [Figure: voting intention (%) over time for CON, LAB and LIB, in three panels: BEN, BGL and YouGov. x-axis: Time (5 to 45); y-axis: Voting Intention % (0 to 40).]

  25. Voting intention comparative plots
     [Figure: voting intention (%) over time for SPÖ, ÖVP, FPÖ and GRÜ, in three panels: BGL, BEN and Polls. x-axis: Time (5 to 45); y-axis: Voting Intention % (0 to 30).]
