
Detecting Events and Patterns in the Social Web with Statistical Learning



  1. Detecting Events and Patterns in the Social Web with Statistical Learning
     Vasileios Lampos (bill@lampos.net), Computer Science Department, University of Sheffield

  2. Outline
     • Motivation, Aims
     • Data
     • Nowcasting Events from the Social Web
     • Extracting Mood Patterns from the Social Web
     • Conclusions

  3. Facts
     We started to work on this idea in 2008, when:
     • The Web contained 1 trillion unique pages (Google)
     • Social networks were rising, e.g.
       ◦ Facebook: 100m users in 2008, 955m in 2012 (June)
       ◦ Twitter: 6m users in 2008, 500m active users in 2012 (April)
     • User behaviour was changing
       ◦ Socialising via the Web
       ◦ Giving up privacy (Debatin et al., 2009)

  4. Questions
     • Does user-generated text posted on Social Web platforms include useful information?
     • How can we extract this useful information automatically, i.e. by machine rather than by hand?
     • Are there practical, real-life applications?
     • Can those large samples of human input assist studies in other scientific fields, e.g. Social Sciences or Psychiatry?

  5. One slide on @Twitter: what does a ‘tweet’ look like?
     Figure 1: Some biased and anonymised examples of tweets (limit of 140 characters per tweet; # denotes a topic): (a) user will remain anonymous, (b) they live around us, (c) citizen journalism, (d) flu attitude.

  6. Data Collection
     • Considered to be the easiest part of the process, but this is not true:
       ◦ Storage space
       ◦ Crawler implementation, parallel data processing
       ◦ Equipment, new technologies (e.g. Map-Reduce)
     • Data collected and used in the following experiments:
       ◦ tweets geo-located in 54 urban centres in the UK
       ◦ collected periodically (every 3 or 5 minutes per urban centre)
       ◦ approx. 0.5 billion tweets by 10 million users (06/2009 to 01/2012)
       ◦ ground truth (regional flu & local rainfall rates)

  7. Nowcasting Events from the Social Web

  8. ‘Nowcasting’?
     We do not predict the future, but infer the present − δ, i.e. the very recent past.
     Figure 2: Nowcasting the magnitude of an event (ε) emerging in the real world from Web information.
     Our case studies: nowcasting (a) flu rates and (b) rainfall rates (?!)

  9. What do we get in the end?
     Figure 3: Inferred rainfall rates for Bristol, UK (October, 2009). [Plot: actual vs. inferred rainfall rate (mm) over 30 days.]

  10. Core Methodology (1/3): Turning text into numbers
     • Candidate features (n-grams): $\mathcal{C} = \{c_i\}$
     • Set of Twitter posts for a time interval $u$: $\mathcal{P}^{(u)} = \{p_j\}$
     • Frequency of $c_i$ in $p_j$ ($g$ is Boolean; the maximum value for $\varphi$ is 1):
       $$g(c_i, p_j) = \begin{cases} \varphi & \text{if } c_i \in p_j, \\ 0 & \text{otherwise.} \end{cases}$$
     • Score of $c_i$ in $\mathcal{P}^{(u)}$:
       $$s\left(c_i, \mathcal{P}^{(u)}\right) = \frac{\sum_{j=1}^{|\mathcal{P}^{(u)}|} g(c_i, p_j)}{|\mathcal{P}^{(u)}|}$$
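
The scoring step on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: posts are already tokenised into word lists, an n-gram "occurs" in a post when its tokens appear contiguously, and the function names (`ngram_in_post`, `g`, `score`) and the toy posts are hypothetical, not taken from the original work.

```python
def ngram_in_post(ngram, post_tokens):
    """True if the token tuple `ngram` appears contiguously in `post_tokens`."""
    n = len(ngram)
    return any(
        tuple(post_tokens[k:k + n]) == tuple(ngram)
        for k in range(len(post_tokens) - n + 1)
    )


def g(ngram, post_tokens, phi=1.0):
    """Indicator from the slide: phi if the n-gram is in the post, else 0."""
    return phi if ngram_in_post(ngram, post_tokens) else 0.0


def score(ngram, posts, phi=1.0):
    """s(c_i, P^(u)): sum of g over the posts of interval u, divided by |P^(u)|."""
    if not posts:
        return 0.0  # assumption: an empty interval scores zero
    return sum(g(ngram, p, phi) for p in posts) / len(posts)


# Made-up posts for one time interval u (illustrative only).
posts_u = [
    "i have a sore throat".split(),
    "sore throat and a fever".split(),
    "lovely weather today".split(),
    "stuck in traffic again".split(),
]

print(score(("sore", "throat"), posts_u))  # 2 of the 4 posts contain the 2-gram
```

With φ = 1 the score is simply the fraction of posts in the interval that contain the candidate n-gram.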

  11. Core Methodology (2/3)
     • Set of time intervals: $\mathcal{U} = \{u_k\}$ (approx. 1 hour, 1 day, ...)
     • Time series of candidate feature scores:
       $$X^{(\mathcal{U})} = \left[ x^{(u_1)} \; \dots \; x^{(u_{|\mathcal{U}|})} \right]^T, \quad \text{where} \quad x^{(u_i)} = \left[ s\left(c_1, \mathcal{P}^{(u_i)}\right) \; \dots \; s\left(c_{|\mathcal{C}|}, \mathcal{P}^{(u_i)}\right) \right]^T$$
     • Target variable (event):
       $$y^{(\mathcal{U})} = \left[ y_1 \; \dots \; y_{|\mathcal{U}|} \right]^T$$
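
As a rough sketch of how the matrix on this slide could be assembled, the snippet below builds one row of scores per time interval and pairs it with a target vector. It assumes unigram candidates, tokenised posts and φ = 1; the helper names (`score`, `build_X`) and all data values are invented for illustration.

```python
def score(term, posts):
    """Fraction of posts (token lists) that contain `term`; phi is fixed to 1."""
    if not posts:
        return 0.0
    return sum(term in p for p in posts) / len(posts)


def build_X(candidates, posts_by_interval):
    """One row per time interval u_i, one column per candidate feature c_j."""
    return [[score(c, posts) for c in candidates] for posts in posts_by_interval]


candidates = ["flu", "rain"]

# Made-up tokenised posts for three consecutive intervals u_1, u_2, u_3.
posts_by_interval = [
    [["flu", "season"], ["rain", "again"]],
    [["flu", "jab"], ["got", "flu"]],
    [["sunny", "day"], ["heavy", "rain"]],
]

X = build_X(candidates, posts_by_interval)  # |U| x |C| score matrix
y = [12.0, 30.0, 5.0]                       # hypothetical ground-truth rates

for row, target in zip(X, y):
    print(row, target)
```

Each row of `X` corresponds to one interval and is aligned with the ground-truth value for the same interval in `y`.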

  12. Core Methodology (3/3): Feature selection
     Solve the following optimisation problem:
       $$\min_{w} \left\| X^{(\mathcal{U})} w - y^{(\mathcal{U})} \right\|_{\ell_2}^{2} \quad \text{s.t.} \quad \|w\|_{\ell_1} \le t, \quad t = \alpha \cdot \|w_{\text{OLS}}\|_{\ell_1}, \quad \alpha \in (0, 1].$$
     • Least Absolute Shrinkage and Selection Operator (LASSO) (Tibshirani, 1996)
     • Enforces sparsity on $w$ (feature selection)
     • Least Angle Regression (LARS) computes the entire regularisation path (Efron et al., 2004)
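
To make the sparsity effect concrete, here is a minimal pure-Python sketch. Note the substitution: the slide states the constrained form and solves it with LARS, whereas this sketch uses the penalised (Lagrangian) form, minimising (1/2)·‖Xw − y‖² + λ·‖w‖₁ by cyclic coordinate descent. Each bound t corresponds to some penalty λ, but this is not the LARS algorithm and it does not compute the full regularisation path. The function names and the toy data are assumptions.

```python
def soft_threshold(z, lam):
    """Shrink z towards zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for 0.5 * ||Xw - y||^2 + lam * ||w||_1."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(d)]
    for _ in range(n_iter):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the residual that ignores feature j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * w[k] for k in range(d) if k != j))
                for i in range(n)
            )
            w[j] = soft_threshold(rho, lam) / col_sq[j]
    return w


# Toy orthogonal design (made up): the third feature is only weakly related to y.
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
y = [3.0, -2.0, 0.4]

w_ols = lasso_cd(X, y, lam=0.0)    # no penalty: ordinary least squares solution
w_lasso = lasso_cd(X, y, lam=0.5)  # penalty shrinks weights and zeroes the weak one
print(w_ols, w_lasso)
```

In the penalised solution the weakly related feature receives a weight of exactly zero, which is the feature-selection behaviour the slide refers to, and the ℓ1 norm of the weights is smaller than that of the unpenalised solution.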

  13. Flu rates: Example of selected features
     Figure 4: Font size is proportional to the weight of each feature; flipped n-grams are negatively weighted. All words are stemmed (Porter, 1980). (Lampos and Cristianini, 2012)

  14. Rainfall rates: Example of selected features
     Figure 5: Font size is proportional to the weight of each feature; flipped n-grams are negatively weighted. All words are stemmed (Porter, 1980). (Lampos and Cristianini, 2012)

  15. Examples of inferences
     Figure 6: Examples of flu and rainfall rate inferences from Twitter content: (a) Central England/Wales (flu), (b) South England (flu), (c) Bristol (rain). [Each panel plots actual vs. inferred rates over 30 days.] (Lampos and Cristianini, 2012)

  16. Flu Detector
     URL: http://geopatterns.enm.bris.ac.uk/epidemics
     Figure 7: Flu Detector uses the content of Twitter to nowcast flu rates in several UK regions (Lampos, De Bie and Cristianini, 2010)

  17. Extracting Mood Patterns from the Social Web

  18. Computing a mood score
     Table 1: Mood terms from WordNet Affect

       Fear         Sadness        Joy            Anger
       afraid       depressed      admire         angry
       fearful      discouraged    cheerful       despise
       frighten     disheartened   enjoy          enviously
       horrible     dysphoria      enthusiastic   harassed
       panic        gloomy         exciting       irritate
       ...          ...            ...            ...
       (92 terms)   (115 terms)    (224 terms)    (146 terms)

     Mood score computation for a time interval $u$ using $n$ mood terms and a sample of $|D|$ days:
       $$M_s(u) = \frac{1}{|D|} \sum_{j=1}^{|D|} \left( \frac{1}{n} \sum_{i=1}^{n} sf_i(t_{j,u}) \right)$$
       $$sf_i(t_{d,u}) = \frac{f_i(t_{d,u}) - \bar{f}_i}{\sigma_{f_i}}, \quad i \in \{1, \dots, n\}.$$
     $f_i(t_{d,u})$: normalised frequency of mood term $i$ during time interval $u$ in day $d \in D$.
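
The mood score can be sketched as follows. The slide does not say over which observations the mean and standard deviation of each term are taken; this sketch assumes they are computed over all (day, interval) frequencies of that term and uses the population standard deviation. The function names and the frequency values are made up for illustration.

```python
from statistics import mean, pstdev


def standardise(freqs):
    """Turn freqs[d][u] into standardised values (f - mean) / std for one term.

    Assumption: mean and std are taken over every (day, interval) observation.
    """
    flat = [v for day in freqs for v in day]
    mu, sigma = mean(flat), pstdev(flat)
    if sigma == 0:
        return [[0.0 for _ in day] for day in freqs]  # constant term: no signal
    return [[(v - mu) / sigma for v in day] for day in freqs]


def mood_score(term_freqs, u):
    """M_s(u): average over days of the mean standardised frequency of n terms.

    term_freqs[i][d][u] is the normalised frequency of term i on day d, interval u.
    """
    standardised = [standardise(f) for f in term_freqs]
    n = len(standardised)
    n_days = len(standardised[0])
    per_day = [
        sum(standardised[i][d][u] for i in range(n)) / n for d in range(n_days)
    ]
    return sum(per_day) / n_days


# Two hypothetical mood terms, two days, two intervals per day (invented numbers).
term_freqs = [
    [[1.0, 3.0], [1.0, 3.0]],  # term 0
    [[2.0, 6.0], [2.0, 6.0]],  # term 1
]

print(mood_score(term_freqs, u=0), mood_score(term_freqs, u=1))
```

Standardising each term before averaging keeps very frequent terms from dominating the score, which appears to be the purpose of the $sf_i$ step on the slide.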

  19. Circadian mood patterns (1/2)
     Figure 8: Circadian (24-hour) mood patterns based on UK Twitter content. [Panels: Fear, Sadness, Joy and Anger scores over hourly intervals (1 to 24), with curves for winter, summer and aggregated data.]

  20. Circadian mood patterns (2/2)
     Figure 9: Autocorrelation of circadian mood patterns based on hourly lags, revealing periodicities: (a) Fear, (b) Sadness, (c) Joy, (d) Anger. [Each panel plots autocorrelation and a confidence bound against lags of 1 to 168 hours.]
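
A periodicity check of this kind can be sketched with the standard sample autocorrelation. The estimator below and the approximate 95% bound of 1.96/√N are common textbook choices and are assumptions here, since the slide does not state how its autocorrelation or confidence bound was computed. The input is a synthetic hourly signal with a 24-hour cycle, not real mood data.

```python
import math


def autocorr(x, lag):
    """Sample autocorrelation of the series x at the given non-negative lag."""
    n = len(x)
    mu = sum(x) / n
    denom = sum((v - mu) ** 2 for v in x)
    num = sum((x[t] - mu) * (x[t + lag] - mu) for t in range(n - lag))
    return num / denom


# Synthetic hourly series covering one week with a pure 24-hour cycle.
hours = 168
series = [math.cos(2 * math.pi * t / 24) for t in range(hours)]

# Approximate 95% bound for a white-noise series of the same length.
bound = 1.96 / math.sqrt(hours)

for lag in (12, 24, 48):
    print(lag, round(autocorr(series, lag), 3), abs(autocorr(series, lag)) > bound)
```

For a daily rhythm, the autocorrelation peaks at lags that are multiples of 24 hours and dips near the half-period, which is the pattern Figure 9 is described as revealing.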

  21. The mood of the nation (1/4)
     Figure 10: Daily time series for the mood of Joy based on Twitter content geo-located in the UK. [Plot: 933-day time series for Joy in Twitter content, showing the raw joy signal and the 14-day smoothed joy as normalised emotional valence, from Jul 09 to Jan 12; annotated events: XMAS, valentine, easter, halloween, CUTS, roy.wed., RIOTS.] (Lansdall, Lampos and Cristianini, 2012)
