Can Social Media tell us something about our lives?
Vasileios Lampos
Computer Science Department, University of Sheffield
March 2013
Outline
• Motivation, Aims [Facts, Questions]
• Data
• Nowcasting Events
• Extracting Mood Patterns
• TrendMiner – Extracting Political Opinion
• Conclusions
Facts
We started to work on those ideas back in 2008, when...
• the Web contained 1 trillion unique pages (Google)
• Social Networks were rising, e.g.
  ◦ Facebook: 100m (2008) → more than 1 billion active users (October 2012)
  ◦ Twitter: 6m (2008) → 500m active users (July 2012)
• user behaviour was changing
  ◦ socialising via the Web
  ◦ giving up privacy (Debatin et al., 2009)
Some general questions
• Does user-generated text posted on Social Web platforms include useful information?
• How can we extract this useful information... automatically? Therefore, not we, but a machine.
• Practical / real-life applications?
• Can those large samples of human input assist studies in other scientific fields?
  Social Sciences, Psychology, Epidemiology
The Data (1/3)
Why Twitter?
• Has a lot of content that is publicly accessible
• Provides a well-documented API for several types of data collection
• Opinions and personal statements on various domains
• Connection with current affairs (usually in real time)
• Some content is geo-located
• Option for personalised modelling
• ... and we got good results from the very first, simple experiment!
The Data (2/3)
What does a @tweet look like?
Figure 1: Some biased and anonymised examples of tweets (limit of 140 characters per tweet; # denotes a topic): (a) the user will remain anonymous, (b) they live around us, (c) citizen journalism, (d) flu attitude.
The Data (3/3)
Data Collection & Preprocessing
• The easiest part of the process...
  ◦ not true! → storage space, crawler implementation, parallel data processing, new technologies (e.g., MapReduce) (Preotiuc et al., 2012)
• Data collected via Twitter's Search API:
  ◦ collective sampling
  ◦ tweets geo-located in 54 urban centres in the UK
  ◦ periodical crawling (every 3 or 5 minutes per urban centre)
• Data collected via Twitter's REST API:
  ◦ user-centric sampling
  ◦ preprocessing to approximate the user's location (city & country)
  ◦ ... or manual user selection by domain experts
  ◦ get their latest tweets (3,000 or more)
• Several forms of ground truth (flu/rainfall rates, polls)
Nowcasting Events from the Social Web
‘Nowcasting’?
We do not predict the future, but infer the present − δ, i.e. the very recent past.
Figure 2: Nowcasting the magnitude of an event (ε) emerging in the real world from Web information [diagram: state of the world → Web content → model → inferred magnitude]
Our case studies: nowcasting (a) flu rates & (b) rainfall rates (?!)
What do we get in the end?
This is a regression problem (text regression in NLP): for every time interval i we aim to infer y_i ∈ R using text input x_i ∈ R^n.
Figure 3: Inferred and actual rainfall rates (mm) per day for Bristol, UK (October 2009)
Methodology (1/5) — Text in Vector Space
Candidate features (n-grams): C = {c_i}
Set of Twitter posts for a time interval u: P(u) = {p_j}
Frequency of c_i in p_j:
  g(c_i, p_j) = φ if c_i ∈ p_j, and 0 otherwise
  – g is Boolean, i.e. the maximum value of φ is 1
Score of c_i in P(u):
  s(c_i, P(u)) = (1/|P(u)|) · Σ_{j=1..|P(u)|} g(c_i, p_j)
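The Boolean scoring above is straightforward to implement. The following is a minimal illustrative sketch (not the talk's code), assuming tweets arrive pre-tokenised as sets of stemmed terms; the function and variable names are made up for the example.

```python
from typing import List, Set


def g(candidate: str, tweet_tokens: Set[str]) -> float:
    """Boolean frequency: 1 if the candidate n-gram occurs in the tweet, else 0."""
    return 1.0 if candidate in tweet_tokens else 0.0


def score(candidate: str, tweets: List[Set[str]]) -> float:
    """Score of c_i in P(u): mean Boolean frequency over the interval's tweets."""
    if not tweets:
        return 0.0
    return sum(g(candidate, t) for t in tweets) / len(tweets)


# toy interval of three tokenised tweets
tweets_u = [{"i", "feel", "sick"}, {"flu", "is", "terrible"}, {"sunny", "day"}]
print(score("flu", tweets_u))  # 0.333...
```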
Methodology (2/5)
Set of time intervals: U = {u_k} ∼ 1 hour, 1 day, ...
Time series of candidate feature scores: X(U) = [x(u_1) ... x(u_|U|)]^T,
  where x(u_i) = [s(c_1, P(u_i)) ... s(c_|C|, P(u_i))]^T
Target variable (event): y(U) = [y_1 ... y_|U|]^T
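A hedged sketch of how the score time series and the target could be assembled into the matrices above; the toy data, candidate list and ground-truth values are entirely illustrative, and the scoring function is an inlined copy of the one from the previous sketch.

```python
import numpy as np


def score(candidate, tweets):
    """Same Boolean scoring as in the previous sketch."""
    return sum(candidate in t for t in tweets) / len(tweets) if tweets else 0.0


candidates = ["flu", "fever", "rain"]                 # candidate n-grams C
intervals = [                                         # tweet sets P(u_1), P(u_2), ...
    [{"flu", "fever", "today"}, {"sunny", "day"}],
    [{"rain", "again"}, {"heavy", "rain"}, {"flu"}],
]
ground_truth = [12.3, 4.1]                            # e.g. flu or rainfall rate per interval

X = np.array([[score(c, tweets) for c in candidates] for tweets in intervals])
y = np.array(ground_truth)
print(X.shape, y.shape)  # (|U|, |C|) = (2, 3) and (|U|,) = (2,)
```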
Methodology (3/5) — Feature selection
Solve the following optimisation problem:
  min_w ‖X(U)·w − y(U)‖²_{ℓ2}  s.t.  ‖w‖_{ℓ1} ≤ t,  with t = α · ‖w_OLS‖_{ℓ1}, α ∈ (0, 1]
• Least Absolute Shrinkage and Selection Operator (LASSO), in its equivalent Lagrangian form:
  argmin_w ‖X(U)·w − y(U)‖²_{ℓ2} + λ‖w‖_{ℓ1}   (Tibshirani, 1996)
• Expect a sparse w (feature selection)
• Least Angle Regression (LARS) computes the entire regularisation path (w's for different values of λ) (Efron et al., 2004)
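For concreteness, a minimal sketch of LASSO fitted via LARS on synthetic data, using scikit-learn's LassoLars and lars_path (which returns the full regularisation path); the data and parameter values stand in for the Twitter feature matrices and are not from the talk.

```python
import numpy as np
from sklearn.linear_model import LassoLars, lars_path

# synthetic data: 60 time intervals, 200 candidate n-grams, 5 truly relevant ones
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 200))
w_true = np.zeros(200)
w_true[:5] = 3.0
y = X @ w_true + rng.normal(scale=0.5, size=60)

# single LASSO fit; scikit-learn's `alpha` is the regularisation strength (the λ above),
# not the α of the constrained formulation
model = LassoLars(alpha=0.05)
model.fit(X, y)
print("non-zero features:", np.flatnonzero(model.coef_))

# entire regularisation path (weights for a sequence of λ values)
alphas, _, coefs = lars_path(X, y, method="lasso")
print(coefs.shape)  # (n_features, n_alphas)
```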
Methodology (4/5)
LASSO is model-inconsistent:
• the inferred sparsity pattern may deviate from the true model, e.g., when predictors are highly correlated (Zhao and Yu, 2006)
• bootstrap LASSO (Bolasso) performs a more robust feature selection (Bach, 2008):
  ◦ in each bootstrap, the input space is sampled with replacement
  ◦ apply LASSO (LARS) to select features
  ◦ select features with nonzero weights in all bootstraps
• a better alternative — soft-Bolasso:
  ◦ a less strict feature selection
  ◦ select features with nonzero weights in p% of the bootstraps
  ◦ (learn p using a separate validation set)
• weights of the selected features are then determined via OLS regression
Methodology (5/5) — Simplified summary
Observations: X ∈ R^{m×n} (m time intervals, n features)
Response variable: y ∈ R^m

For i = 1 to number of bootstraps
    Form X_i ⊂ X by sampling X with replacement
    Solve LASSO for X_i and y, i.e. learn w_i ∈ R^n
    Get the k ≤ n features with nonzero weights
End For
Select the v ≤ n features with nonzero weights in p% of the bootstraps
Learn their weights with OLS regression on X^(v) ∈ R^{m×v} and y
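The summary above translates into a short routine. The sketch below is illustrative rather than the authors' implementation: it uses synthetic data, a fixed selection threshold p instead of one learned on a validation set, and scikit-learn's LassoLars as the LASSO/LARS solver.

```python
import numpy as np
from sklearn.linear_model import LassoLars, LinearRegression


def soft_bolasso(X, y, n_bootstraps=100, alpha=0.05, p=0.7, seed=0):
    """Keep features with nonzero LASSO weight in at least p of the bootstraps,
    then re-learn the weights of the selected features with OLS."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    counts = np.zeros(n)
    for _ in range(n_bootstraps):
        idx = rng.integers(0, m, size=m)                 # sample intervals with replacement
        lasso = LassoLars(alpha=alpha).fit(X[idx], y[idx])
        counts += (lasso.coef_ != 0)                     # record which features were selected
    selected = np.flatnonzero(counts >= p * n_bootstraps)
    ols = LinearRegression().fit(X[:, selected], y)      # OLS on the selected features only
    return selected, ols


# toy data: 60 intervals, 200 candidate n-grams, 5 truly relevant ones
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 200))
w_true = np.zeros(200)
w_true[:5] = 3.0
y = X @ w_true + rng.normal(scale=0.5, size=60)

selected, ols = soft_bolasso(X, y)
print("selected features:", selected)
print("their OLS weights:", np.round(ols.coef_, 2))
```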
How do we form candidate features?
• Commonly formed by indexing the entire corpus (Manning, Raghavan and Schütze, 2008)
• We instead extract them from Wikipedia, Google Search results and Public Authority websites (e.g., the NHS)
Why?
◦ to reduce dimensionality and thereby bound the error of LASSO: L(w) ≤ L(ŵ) + Q, where the bound Q increases with the number of candidate features p and with the ℓ1-norm bound ‖ŵ‖_{ℓ1} ≤ W_1, and decreases with the number of samples N (Bartlett, Mendelson and Neeman, 2011)
◦ the ‘Harry Potter’ effect!
The ‘Harry Potter’ effect (1/2)
Figure 4: Events co-occurring (correlated) with the inference target may affect feature selection, especially when the sample size is small. [Plot: event score per day (days 180–340 of 2009) for flu rates in England & Wales and two hypothetical co-occurring events.] (Lampos, 2012a)
The ‘Harry Potter’ effect (2/2)
Table 1: Top 1-grams correlated with flu rates in England/Wales (06–12/2009)

1-gram        Event                  Corr. Coef.
latitud       Latitude Festival      0.9367
flu           Flu epidemic           0.9344
swine         Flu epidemic           0.9212
harri         Harry Potter Movie     0.9112
slytherin     Harry Potter Movie     0.9094
potter        Harry Potter Movie     0.8972
benicassim    Benicàssim Festival    0.8966
graduat       Graduation (?)         0.8965
dumbledor     Harry Potter Movie     0.8870
hogwart       Harry Potter Movie     0.8852
quarantin     Flu epidemic           0.8822
gryffindor    Harry Potter Movie     0.8813
ravenclaw     Harry Potter Movie     0.8738
princ         Harry Potter Movie     0.8635
swineflu      Flu epidemic           0.8633
ginni         Harry Potter Movie     0.8620
weaslei       Harry Potter Movie     0.8581
hermion       Harry Potter Movie     0.8540
draco         Harry Potter Movie     0.8533

Solution: ground truth with some degree of variability (Lampos, 2012a)
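Rankings such as Table 1 can be reproduced by correlating each 1-gram's daily score series with the ground-truth flu rates (the correlation measure is assumed here to be Pearson's); the sketch below illustrates this with entirely synthetic series, not the original Twitter or flu data.

```python
import numpy as np

rng = np.random.default_rng(1)
days = 200
flu_rates = np.clip(np.sin(np.linspace(0, 3, days)) + rng.normal(0, 0.1, days), 0, None)

# synthetic daily score series for a few (stemmed) 1-grams
scores = {
    "flu":    flu_rates * 0.8 + rng.normal(0, 0.05, days),
    "potter": flu_rates * 0.7 + rng.normal(0, 0.08, days),   # co-occurring event
    "sunny":  rng.normal(0, 0.1, days),
}

# rank terms by Pearson correlation with the ground-truth signal
ranked = sorted(
    ((term, np.corrcoef(series, flu_rates)[0, 1]) for term, series in scores.items()),
    key=lambda kv: kv[1],
    reverse=True,
)
for term, r in ranked:
    print(f"{term:8s} {r:.4f}")
```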
About n-grams
1-grams
• decent (dense) representation in the Twitter corpus
• unclear semantic interpretation
  Example: “I am not sick. But I don’t feel great either!”
2-grams
• very sparse representation in tweets
• sometimes clearer semantic interpretation
The experimental process indicated that a hybrid combination* of 1-grams and 2-grams delivers the best inference performance.
* refer to (Lampos, 2012a)