Mining the social web: A series of statistical NLP case studies

Vasileios Lampos
Department of Computer Science, University College London
May 2014
v.lampos@ucl.ac.uk
Slides: http://bit.ly/1v3Jeiy

Key assumptions about social media
• a significant sample of the population uses them
• a significant amount of the published content is geo-located
• this content reflects on collective portions of real life (opinions, events, phenomena)
  ◦ usually forming a real-time relationship
• it is easy (?) to collect, store and process this content
• and everyone seems to know how to use this “big data”

Twitter in one slide
• 140 characters per published status (tweet)
• users can follow and can be followed
• embedded usage of topics (#rbnews, #inception in statistics)
• retweets (RT), @replies, @mentions, favourites
• real-time nature
• biased user demographics (13-15% of UK’s population is now on Twitter)

In this talk
Ways for harnessing social media information...
• to extract simplified collective mood patterns (Lansdall et al., 2012)
• to nowcast phenomena (an infectious disease or rainfall rates) (Lampos, Cristianini, 2010 & 2012)
• to model voting intention (Lampos et al., 2013)
• to understand characteristics related to user impact (Lampos et al., 2014)

Proof of concept and a little more: extracting collective mood patterns

Time series of joy and anger based on UK tweets
[Figure, top panel: "933 Day Time Series for Joy in Twitter Content". Raw joy signal and 14-day smoothed joy; y-axis: normalised emotional valence; x-axis: date (Jul 09 to Jan 12). Labelled events: XMAS, valentine, easter, halloween, roy.wed., CUTS, RIOTS. Example joy terms: happy, enjoy, love, glad, joyful, elated, ...]
[Figure, bottom panel: anger and fear, with the date of the budget cuts and the date of the riots marked; y-axis: difference in mean (derivative of anger & fear); x-axis: date (Jul 09 to Jan 12).]
(Lansdall et al., 2012), (Strapparava, Valitutti, 2004) → WordNet Affect
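A minimal sketch of how a daily joy signal of this kind could be computed, assuming the score is simply the fraction of tokens that match a mood word list, followed by standardisation and a trailing moving average. The tiny word list, the sample tweets and all function names are hypothetical; the study itself relied on WordNet Affect and its own normalisation, which are not reproduced here.

```python
import re
from statistics import mean, pstdev

# Illustrative subset of joy terms; not the WordNet Affect list.
JOY_TERMS = {"happy", "enjoy", "love", "glad", "joyful", "elated"}

def mood_score(tweets, lexicon):
    """Fraction of tokens in one day's tweets that belong to the lexicon."""
    tokens = [t for tweet in tweets for t in re.findall(r"[a-z']+", tweet.lower())]
    if not tokens:
        return 0.0
    return sum(t in lexicon for t in tokens) / len(tokens)

def standardise(series):
    """Subtract the mean and divide by the standard deviation."""
    mu, sigma = mean(series), pstdev(series)
    return [(v - mu) / sigma if sigma else 0.0 for v in series]

def moving_average(series, window):
    """Trailing moving average; shorter windows are used at the start."""
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

# Hypothetical tweets, grouped by day.
daily_tweets = [
    ["so happy today", "love this weather"],
    ["stuck in traffic again"],
    ["glad it is friday", "traffic is awful"],
]
raw = [mood_score(day, JOY_TERMS) for day in daily_tweets]
smoothed = moving_average(standardise(raw), window=2)
```

With real data the window would be set to 14 days to match the smoothed curve in the figure.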

Mood projections
Projections of 4-dimensional mood score signals (joy, sadness, anger and fear) on their top-2 principal components (2011 Twitter data)
[Figure, left panel: "Days of the Week" (Monday to Sunday); right panel: "Days in 2011" (points labelled by day number, 1 to 365); axes in both panels: 1st principal component vs 2nd principal component.]
Notable days: New Year (1), Valentine’s (45), Christmas Eve (358), New Year’s Eve (365), O.B. Laden’s death (122), Winehouse’s death & Breivik (204), UK riots (221)
(Lampos, 2012), (Strapparava, Valitutti, 2004) → WordNet Affect
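The projection step can be sketched as follows, assuming each day is a 4-dimensional vector (joy, sadness, anger, fear) and that the principal components are taken from the sample covariance matrix. Power iteration with deflation is used here only to keep the example dependency-free; it is a swapped-in technique, not necessarily what was used for the slide, and the mood scores below are made up.

```python
import math
import random

def centre(rows):
    """Subtract the column means from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(rows):
    """Sample covariance matrix of already centred rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[a] * r[b] for r in rows) / (n - 1) for b in range(d)]
            for a in range(d)]

def mat_vec(mat, v):
    return [sum(m * x for m, x in zip(row, v)) for row in mat]

def power_iteration(mat, rng, iters=1000):
    """Dominant eigenvector and eigenvalue of a symmetric PSD matrix."""
    v = [rng.random() + 0.1 for _ in mat]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = mat_vec(mat, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigval = sum(a * b for a, b in zip(v, mat_vec(mat, v)))
    return v, eigval

def top_components(cov, k=2, seed=0):
    """First k principal directions, found by power iteration and deflation."""
    rng = random.Random(seed)
    mat = [row[:] for row in cov]
    comps = []
    for _ in range(k):
        v, lam = power_iteration(mat, rng)
        comps.append(v)
        # Deflate: remove the component just found.
        mat = [[mat[i][j] - lam * v[i] * v[j] for j in range(len(v))]
               for i in range(len(v))]
    return comps

def project(rows, comps):
    return [[sum(a * b for a, b in zip(r, c)) for c in comps] for r in rows]

# Hypothetical daily mood scores: (joy, sadness, anger, fear).
mood = [
    [2.0, 0.1, 0.2, 0.1], [1.5, 0.3, 0.1, 0.2], [0.2, 1.8, 0.4, 0.9],
    [0.1, 0.5, 2.1, 1.7], [0.3, 0.4, 1.9, 1.5], [1.8, 0.2, 0.3, 0.2],
    [0.4, 1.6, 0.5, 0.8],
]
centred = centre(mood)
components = top_components(covariance(centred), k=2)
points_2d = project(centred, components)  # one (PC1, PC2) pair per day
```

Each pair in `points_2d` corresponds to one plotted point; labelling the points by weekday or day number gives the two panels described above.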

Supervised learning
Primary outcomes

Regression basics: Ordinary Least Squares
• observations $\mathbf{x}_i \in \mathbb{R}^m$, $i \in \{1, ..., n\}$, collected in $\mathbf{X}$
• responses $y_i \in \mathbb{R}$, $i \in \{1, ..., n\}$, collected in $\mathbf{y}$
• weights, bias $w_j, \beta \in \mathbb{R}$, $j \in \{1, ..., m\}$, collected in $\mathbf{w}_* = [\mathbf{w}; \beta]$

Ordinary Least Squares (OLS)
$$\operatorname*{argmin}_{\mathbf{w}_*} \|\mathbf{X}_* \mathbf{w}_* - \mathbf{y}\|_{\ell_2}^2 \;\Rightarrow\; \mathbf{w}_* = \left(\mathbf{X}_*^T \mathbf{X}_*\right)^{-1} \mathbf{X}_*^T \mathbf{y}$$
Here $\mathbf{X}_*$ is presumably $\mathbf{X}$ augmented with a column of ones, so that it matches $\mathbf{w}_* = [\mathbf{w}; \beta]$.

Why not?
− $\mathbf{X}_*^T \mathbf{X}_*$ may be singular (thus difficult to invert)
− high-dimensional models difficult to interpret
− unsatisfactory prediction accuracy (estimates have large variance)
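As a worked illustration of the closed form above, the sketch below builds the normal equations and solves them by Gauss-Jordan elimination in plain Python. It assumes the bias is handled by appending a constant 1 to every observation; the function names and the toy data are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def ols(X, y):
    """Return [w_1, ..., w_m, beta] from the normal equations."""
    Xs = [list(row) + [1.0] for row in X]  # constant column for the bias
    d = len(Xs[0])
    gram = [[sum(r[a] * r[b] for r in Xs) for b in range(d)] for a in range(d)]
    xty = [sum(r[a] * yi for r, yi in zip(Xs, y)) for a in range(d)]
    return solve(gram, xty)

# Toy data generated from y = 2*x1 - x2 + 0.5 (no noise).
X = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 3.0]]
y = [-0.5, 2.5, 3.5, 3.5]
w_star = ols(X, y)  # approximately [2.0, -1.0, 0.5]
```

Passing two identical feature columns makes the Gram matrix singular and `solve` raises `ValueError`, which is the first objection listed under "Why not?".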

Regression basics: Ridge Regression
• observations $\mathbf{x}_i \in \mathbb{R}^m$, $i \in \{1, ..., n\}$, collected in $\mathbf{X}$
• responses $y_i \in \mathbb{R}$, $i \in \{1, ..., n\}$, collected in $\mathbf{y}$
• weights, bias $w_j, \beta \in \mathbb{R}$, $j \in \{1, ..., m\}$, collected in $\mathbf{w}_* = [\mathbf{w}; \beta]$

Ridge Regression (RR)
$$\operatorname*{argmin}_{\mathbf{w}_*} \left\{ \|\mathbf{X}_* \mathbf{w}_* - \mathbf{y}\|_{\ell_2}^2 + \lambda \|\mathbf{w}\|_{\ell_2}^2 \right\}$$
+ size constraint on the weight coefficients (regularisation) → resolves problems caused by collinear variables
+ fewer degrees of freedom, better predictive accuracy than OLS
− does not perform feature selection (nonzero coefficients)
(Hoerl, Kennard, 1970)
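A minimal sketch of the ridge solution, assuming the standard closed form in which λ is added to the diagonal of the Gram matrix for the weights while the bias is left unpenalised, as in the objective above. The solver, the function names and the toy data are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def ridge(X, y, lam):
    """Return [w_1, ..., w_m, beta]; only the weights are penalised."""
    Xs = [list(row) + [1.0] for row in X]  # constant column for the bias
    d = len(Xs[0])
    gram = [[sum(r[a] * r[b] for r in Xs) for b in range(d)] for a in range(d)]
    for j in range(d - 1):  # skip the last entry: the bias is not penalised
        gram[j][j] += lam
    xty = [sum(r[a] * yi for r, yi in zip(Xs, y)) for a in range(d)]
    return solve(gram, xty)

# Toy data generated from y = 2*x1 - x2 + 0.5 (no noise).
X = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 3.0]]
y = [-0.5, 2.5, 3.5, 3.5]
unregularised = ridge(X, y, lam=0.0)  # coincides with OLS
shrunk = ridge(X, y, lam=10.0)        # weights pulled towards zero

# Two identical columns: OLS would fail, ridge splits the weight evenly.
collinear = ridge([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]], [1.0, 2.0, 3.0], lam=1.0)
```

The collinear case shows both properties from the slide: the regularised system is solvable, but neither duplicated feature is driven to zero.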

Regression basics: Lasso
• observations $\mathbf{x}_i \in \mathbb{R}^m$, $i \in \{1, ..., n\}$, collected in $\mathbf{X}$
• responses $y_i \in \mathbb{R}$, $i \in \{1, ..., n\}$, collected in $\mathbf{y}$
• weights, bias $w_j, \beta \in \mathbb{R}$, $j \in \{1, ..., m\}$, collected in $\mathbf{w}_* = [\mathbf{w}; \beta]$

$\ell_1$-norm regularisation or lasso (Tibshirani, 1996)
$$\operatorname*{argmin}_{\mathbf{w}_*} \left\{ \|\mathbf{X}_* \mathbf{w}_* - \mathbf{y}\|_{\ell_2}^2 + \lambda \|\mathbf{w}\|_{\ell_1} \right\}$$
− no closed form solution: quadratic programming problem
+ Least Angle Regression (LAR) explores entire reg. path (Efron et al., 2004)
+ sparse $\mathbf{w}$, interpretability, better performance (Hastie et al., 2009)
− if $m > n$, at most $n$ variables can be selected
− strongly corr. predictors → model-inconsistent (Zhao, Yu, 2009)
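To make the sparsity property concrete, here is a minimal sketch that minimises the objective above by cyclic coordinate descent with soft-thresholding. This is a swapped-in solver chosen for brevity, not the LAR algorithm cited on the slide. It assumes features and responses are centred so that the unpenalised bias can be recovered afterwards; all names and the toy data are hypothetical.

```python
def soft_threshold(rho, t):
    """Shrink rho towards zero by t, returning exactly 0 inside [-t, t]."""
    if rho > t:
        return rho - t
    if rho < -t:
        return rho + t
    return 0.0

def lasso_cd(X, y, lam, sweeps=200):
    """Minimise ||X w + beta - y||^2 + lam * ||w||_1; returns (w, beta)."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(m)] for row in X]
    yc = [v - y_mean for v in y]
    w = [0.0] * m
    for _ in range(sweeps):
        for j in range(m):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for row, target in zip(Xc, yc):
                partial = sum(row[k] * w[k] for k in range(m) if k != j)
                rho += row[j] * (target - partial)
                z += row[j] * row[j]
            # The factor 1/2 comes from the unscaled squared loss.
            w[j] = soft_threshold(rho, lam / 2.0) / z if z else 0.0
    beta = y_mean - sum(wj * mj for wj, mj in zip(w, x_mean))
    return w, beta

# Toy data: y = 3*x1 + 1; the second and third features are irrelevant.
X = [[1.0, 0.5, -1.0], [2.0, -0.5, 1.0], [3.0, 0.5, 1.0],
     [4.0, -0.5, -1.0], [5.0, 0.5, -1.0], [6.0, -0.5, 1.0]]
y = [4.0, 7.0, 10.0, 13.0, 16.0, 19.0]
w, beta = lasso_cd(X, y, lam=1.0)
```

On this toy data the two irrelevant features receive weights of exactly zero, while the weight of the first feature is shrunk slightly below 3.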

Lasso for text regression
• n-gram frequencies $\mathbf{x}_i \in \mathbb{R}^m$, $i \in \{1, ..., n\}$, collected in $\mathbf{X}$
• target phenomenon $y_i \in \mathbb{R}$, $i \in \{1, ..., n\}$, collected in $\mathbf{y}$
• weights, bias $w_j, \beta \in \mathbb{R}$, $j \in \{1, ..., m\}$, collected in $\mathbf{w}_* = [\mathbf{w}; \beta]$

$\ell_1$-norm regularisation or lasso
$$\operatorname*{argmin}_{\mathbf{w}_*} \left\{ \|\mathbf{X}_* \mathbf{w}_* - \mathbf{y}\|_{\ell_2}^2 + \lambda \|\mathbf{w}\|_{\ell_1} \right\}$$

Nowcasting ILI rates from Twitter (1/2)
Assumptions
• Twitter users post about their health condition
• We can turn this information into an influenza-like-illness (ILI) rate
Is there a signal in the data?
• 41 illness related keyphrases (e.g. flu, fever, sore throat, headache)
• z-scored cumulative frequency vs z-scored official ILI rates
[Figure: Twitter’s Flu-score vs HPA’s Flu rate for region D (England & Wales), both as z-scores; x-axis: day number (2009); r = .856]
(Lampos, Cristianini, 2010)
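The comparison in the figure can be sketched as follows, assuming both series are standardised to z-scores and compared with the Pearson correlation coefficient. The two short series are invented for illustration and are not the data behind r = .856.

```python
from statistics import mean, pstdev

def z_scores(series):
    """Standardise a series to zero mean and unit (population) variance."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]
    return [(v - mu) / sigma for v in series]

def pearson_r(a, b):
    """Pearson correlation: the mean product of the two z-scored series."""
    za, zb = z_scores(a), z_scores(b)
    return sum(x * y for x, y in zip(za, zb)) / len(za)

# Hypothetical daily values: keyphrase-based flu score and official ILI rate.
twitter_flu_score = [0.8, 1.1, 1.9, 3.2, 2.7, 1.4]
official_ili_rate = [10.0, 14.0, 25.0, 41.0, 30.0, 18.0]
r = pearson_r(twitter_flu_score, official_ili_rate)
```

Because both series are z-scored, they can be drawn on the same axis even though the flu score and the official rate are measured in different units.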

Nowcasting ILI rates from Twitter (2/2)
• create a pool of unigram features by indexing all words in relevant web pages (Wikipedia, NHS pages)
• stop-words removed, Porter-stemming
• automatic unigram selection and weighting via lasso
Selected unigrams: ‘unwel’, ‘temperatur’, ‘headach’, ‘appetit’, ‘symptom’, ‘diarrhoea’, ‘muscl’, ‘feel’, ‘flu’, ‘cough’, ‘nose’, ‘vomit’, ‘diseas’, ‘sore’, ‘throat’, ‘fever’, ‘ach’, ‘runni’, ‘sick’, ‘ill’, ...
[Figure: HPA flu rate vs inferred flu rate, England & Wales; x-axis: day number (2009); r = .968]
(Lampos, Cristianini, 2010)
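The feature-construction steps above can be sketched as follows. This is only an illustration under stated assumptions: the suffix-stripping function is a crude stand-in for Porter stemming, the stop-word list is a tiny hypothetical one, the reference text and tweets are invented, and a unigram's daily frequency is assumed to be its count divided by the number of tweets that day.

```python
import re

# Tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "and", "the", "i", "have", "include", "all"}

def crude_stem(word):
    """Strip a few common suffixes. A stand-in for Porter stemming."""
    for suffix in ("ing", "es", "s", "e"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def stems(text):
    """Lower-case, tokenise, drop stop-words and stem what remains."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]

def build_vocabulary(reference_pages):
    """Candidate unigram pool: every stem found in the reference pages."""
    return sorted({s for page in reference_pages for s in stems(page)})

def daily_frequencies(tweets, vocabulary):
    """One feature row: occurrences of each unigram per tweet on that day."""
    counts = {term: 0 for term in vocabulary}
    for tweet in tweets:
        for s in stems(tweet):
            if s in counts:
                counts[s] += 1
    n = len(tweets) or 1
    return [counts[term] / n for term in vocabulary]

# Hypothetical reference text and one day of tweets.
reference_pages = ["Flu symptoms include fever, headache and coughing."]
vocabulary = build_vocabulary(reference_pages)
day = ["I have a headache and a fever", "coughing all night", "lovely day"]
row = daily_frequencies(day, vocabulary)
```

Stacking one such row per day gives the matrix of unigram frequencies; an ℓ1-regularised regression against the official ILI rates would then select and weight the unigrams.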

