Predicting the Future with Deep Learning and Signals from Social Media
Svitlana Volkova, PhD
Senior Research Scientist, Data Sciences and Analytics Group, National Security Directorate, Pacific Northwest National Laboratory
ACL Workshop on Natural Language Processing and Computational Social Science, August 10, 2017
Social Media Analytics: Predictive Analytics and Forecasting Analytics
[Overview figure: neural network diagrams for the predictive and forecasting tasks. Recoverable task labels: identify suspicious accounts; forecast perspective dynamics; deceptive news; language change; future real-world events; events and conflict; influenza (predicted weekly ILI proportions from ILI and social media predictors); instability; native language (ES, DE, FR, JA, IN). Case studies named: Brussels Bombings, March 2016; Russia-Ukraine Conflict, 2014 – 2015.]
Outline
- Predicting Suspicious and Trusted News on Twitter (joint work with K. Shaffer, J. Yea Jang, and N. Hodas)
- Analyzing and Forecasting Targeted Perspectives in Social Media (collaboration with H. Rashkin and Y. Choi)
- Forecasting Short-Term Change in Text Representations during Crisis Events from VK (joint work with I. Stewart, D. Arendt, and E. Bell)
Motivation and Background
- 62% of U.S. adults get news on social media (Pew Research, Oct 2016)
- 64% of U.S. adults said that “made-up news” has caused a “great deal of confusion” about the facts of current events (Pew Research, Dec 2016)
Previous work on deception detection:
- Deceptive Amazon reviews (Choi, Mihalcea)
- Satirical news (Rubin et al., 2015)
- Rumors (Qazvinian et al., 2011; Liu et al., 2015)
Reference: S. Volkova, K. Shaffer, J. Yea Jang and N. Hodas. Separating Facts from Fiction: Linguistic Models to Classify Suspicious and Trusted News Posts on Twitter. ACL 2017.
Deceptive News
- Google Fact Checking: https://www.blog.google/topics/journalism-news/expanding-fact-checking-google/
- Facebook 3rd Party Verification: http://newsroom.fb.com/news/2016/12/news-feed-fyi-addressing-hoaxes-and-fake-news/
Deceptive News Types
Types placed on a spectrum from "Intent to Deceive" to "No Intent to Deceive": Propaganda, Hoax, Clickbait, Satire
- Propaganda: deliberately spreads misinformation in order to appeal to certain groups
- Hoax: seeks to mislead, rather than entertain, readers for financial or political gain
- Clickbait: takes bits of true stories but insinuates and makes up other details to sow fear
- Satire: makes fun of the news, has a satirical bent, or parodies the news
Twitter News Data
- Propaganda, Hoax, Clickbait, Satire: 130K tweets total, 65K suspicious
- Disinfo, Propaganda, Conspiracy, Hoax, Clickbait: 2M suspicious tweets
[Figure: both sets of news types are placed on an axis between "Intent to Deceive" and "No Intent to Deceive".]
News Categorization http://www.marketwatch.com/story/how-does-your-favorite-news-source-rate-on-the-truthiness-scale-consult-this-chart-2016-12-15
Alternative News Categorization http://www.marketwatch.com/story/how-does-your-favorite-news-source-rate-on-the-truthiness-scale-consult-this-chart-2016-12-15
Annotations
- Brussels bombing dataset: March 15 – March 29, 2016 (one week before and after March 22nd, 2016)
- Account-level vs. tweet-level annotations
- Fake news annotations: http://www.fakenewswatch.com/
- PropOrNot: http://www.propornot.com/p/the-list.html (manually verified)
Signs of propaganda:
- Tries to persuade
- Influences the emotions, attitudes, opinions, and actions of target audiences for political, ideological, and religious purposes
- Uses selective omission and one-sided messages
Task Definition
Build tweet-level neural network models to differentiate between:
- Verified vs. unverified news posts (130K)
- Types of unverified news posts: propaganda, hoax, clickbait, satire (65K)
- Disinformation, propaganda, conspiracy, clickbait, hoaxes (2M)
Model
- Baselines: logistic regression with TFIDF and Doc2Vec representations
- Our models: neural networks (RNN/CNN) with social network interaction and linguistic cues: hedging, assertive, factive, implicative verbs
[Architecture diagram, bottom to top: two inputs (word sequences; network/linguistic cues) -> embedding layer (200 units) -> LSTM/convolutional layer (100 units) -> dense layers (100 units) -> tensor concatenation -> dense layer (100 units) -> probability activation layer (sigmoid/softmax) -> final output probabilities]
Keras: https://keras.io/, scikit-learn: http://scikit-learn.org/stable/, Doc2Vec: https://pypi.python.org/pypi/gensim
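The fusion step in the diagram (tensor concatenation followed by a dense layer and a softmax) can be sketched in plain Python. This is a toy forward pass only, not the authors' Keras model: the layer sizes, the random weights, the stand-in text vector, and the helper names (`dense`, `init_layer`, `fused_forward`) are all assumptions made for illustration, and the exact wiring of the branches is inferred from the slide.

```python
import math
import random


def dense(inputs, weights, biases, activation=None):
    """Fully connected layer: one output per (weight row, bias) pair."""
    outputs = []
    for row, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(row, inputs)) + bias
        outputs.append(max(0.0, z) if activation == "relu" else z)
    return outputs


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(rng, n_out, n_in):
    """Random small weights and zero biases (illustrative, untrained)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def fused_forward(text_vec, cue_vec, params):
    # Branch for the network/linguistic cue vector.
    cue_hidden = dense(cue_vec, *params["cue"], activation="relu")
    # Tensor concatenation of the text branch and the cue branch.
    merged = text_vec + cue_hidden
    hidden = dense(merged, *params["merged"], activation="relu")
    # Softmax output over the news types.
    return softmax(dense(hidden, *params["out"]))


rng = random.Random(0)
# Toy sizes; the slide uses 100-unit layers and a 200-unit embedding.
TEXT_DIM, CUE_DIM, HIDDEN, N_CLASSES = 8, 5, 6, 4
params = {
    "cue": init_layer(rng, HIDDEN, CUE_DIM),
    "merged": init_layer(rng, HIDDEN, TEXT_DIM + HIDDEN),
    "out": init_layer(rng, N_CLASSES, HIDDEN),
}
text_vec = [rng.random() for _ in range(TEXT_DIM)]  # stand-in for the LSTM/CNN output
cue_vec = [1.0, 0.0, 2.0, 0.0, 1.0]                 # hypothetical cue counts
probs = fused_forward(text_vec, cue_vec, params)
print(probs)
```

For the binary verified vs. unverified task the final layer would have a single sigmoid unit instead of a softmax over several classes.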
Linguistic Analysis
- Moral Foundation Theory (Haidt and Graham, 2007; Graham et al., 2009): Harm, Care, Loyalty, Betrayal, Authority
- Biased Language (Recasens et al., 2013): Assertive, Factive, Hedging, Implicative, Report Verbs
- Subjective Language (Volkova et al., 2013; Liu et al., 2005; Riloff et al., 2003)
[Figure: direction of cue usage for three groups; the group labels are not preserved.]
- Betrayal↑, Care↑, Loyalty↓, Hedging↓, Implicative↓
- Loyalty↑, Hedges↑, Subj↑, Betrayal↓
- Care↓, Subjective↓, Factive↓, Bias↓
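One simple way to turn the biased-language cue classes into model inputs is to count lexicon matches per tweet. The sketch below assumes tiny hand-made word lists and an invented example sentence; the real lexicons (Recasens et al., 2013) are much larger, and nothing here reproduces the authors' feature extraction.

```python
import re
from collections import Counter

# Hypothetical mini-lexicons, for illustration only.
CUE_LEXICONS = {
    "hedging": {"may", "might", "possibly", "apparently"},
    "assertive": {"claim", "insist", "assert"},
    "factive": {"know", "realize", "reveal"},
    "implicative": {"manage", "fail", "forget"},
}


def cue_counts(tweet):
    """Return one count per cue class, in the order of CUE_LEXICONS."""
    tokens = re.findall(r"[a-z']+", tweet.lower())
    counts = Counter()
    for token in tokens:
        for cue, words in CUE_LEXICONS.items():
            if token in words:
                counts[cue] += 1
    return [counts[cue] for cue in CUE_LEXICONS]


example = "Officials claim the suspect might possibly know more"  # invented text
features = cue_counts(example)
print(features)  # [2, 1, 1, 0]: hedging, assertive, factive, implicative
```

A vector like this can serve as the cue input that is combined with the text representation.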
Verified vs. Suspicious Prediction Results
Binary: linguistic and social graph features (130K tweets, 10-fold cross-validation)
[Bar chart: accuracy (axis 0.6 to 1) for LR D2V, LR TFIDF, RNN, and CNN; feature sets: text, + graph, + ling. cues, all. Bar labels shown: 0.95, 0.93, 0.81, 0.76; the model each value belongs to is not preserved.]
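The 10-fold cross-validation protocol behind the accuracy figures can be sketched as follows. The majority-class "model", the synthetic labels, and the function names are assumptions for illustration; the actual experiments used the classifiers named on the Model slide.

```python
import random


def k_fold_indices(n_items, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k shuffled folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, folds[i]


def cross_validated_accuracy(examples, labels, train_fn, k=10):
    """Train on k-1 folds, test on the held-out fold, and average accuracy."""
    accuracies = []
    for train_idx, test_idx in k_fold_indices(len(examples), k):
        predict = train_fn([examples[i] for i in train_idx],
                           [labels[i] for i in train_idx])
        correct = sum(predict(examples[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)


def majority_baseline(train_x, train_y):
    """Placeholder model: always predicts the most frequent training label."""
    majority = max(set(train_y), key=train_y.count)
    return lambda x: majority


# Synthetic data: 60 "verified" and 40 "suspicious" items.
examples = list(range(100))
labels = ["verified"] * 60 + ["suspicious"] * 40
accuracy = cross_validated_accuracy(examples, labels, majority_baseline, k=10)
print(round(accuracy, 3))
```

Every item is used for testing exactly once, so the averaged score reflects performance on the whole dataset rather than on a single held-out split.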
Suspicious News Prediction Results (1)
Multi-class prediction: satire, hoaxes, clickbait, propaganda (65K)
[Bar chart: F1 macro (axis 0.2 to 0.8) for RNN, CNN, LR TFIDF, and LR D2V; feature sets: text, + network, + ling. markers, all. Bar labels shown: 0.71, 0.66, 0.63, 0.63; the model each value belongs to is not preserved.]
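The multi-class results are reported as macro-averaged F1, which weights every class equally regardless of how many tweets it contains. A minimal sketch of the computation, using made-up labels and predictions rather than data from the study:

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1_scores.append(2 * precision * recall / (precision + recall))
        else:
            f1_scores.append(0.0)
    return sum(f1_scores) / len(f1_scores)


# Invented toy labels for the four-way task.
y_true = ["propaganda", "propaganda", "hoax", "clickbait", "satire", "satire"]
y_pred = ["propaganda", "hoax", "hoax", "clickbait", "satire", "propaganda"]
score = macro_f1(y_true, y_pred)
print(round(score, 4))  # 0.7083
```

Here the per-class F1 values are 1/2 (propaganda), 2/3 (hoax), 1 (clickbait), and 2/3 (satire), so the macro average is 17/24.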
Suspicious News Prediction Results (2)
Multi-class prediction: disinformation, propaganda, conspiracy, clickbait, hoaxes (2M)
[Bar chart: F1 macro (axis 0 to 1) for 4-way (no disinfo) and 5-way classification; feature sets: words, + network, + deepwalk. Bar labels shown: 0.85, 0.84, 0.78, 0.76, 0.67, 0.65. A second set of values also appears: 0.98, 0.92, 0.71, 0.64, 0.61. The series each value belongs to is not preserved.]