The spread of misinformation in social media Filippo Menczer Center for Complex Networks and Systems Research School of Informatics and Computing Indiana University, Bloomington
1. Detection of misinformation 2. How misinformation spreads 3. Can the spread of misinformation be mitigated?
EPJ Data Science 2014
#snow on 22 Jan 2016
Social Media Observatory API Timeline Middleware Network Vis Geo Maps NoSQL Analytics Distrib. DB Dynamic Vis Algorithm Data Collection Videos System Long-term Backup Stream Sample
politics celebrities spam astroturf
Astroturf Detection

Feature            Description
nodes              Number of nodes
edges              Number of edges
mean_k             Mean degree
mean_s             Mean strength
mean_w             Mean edge weight in largest connected component
max_k(i,o)         Maximum (in,out)-degree
max_k(i,o)_user    User with max. (in,out)-degree
max_s(i,o)         Maximum (in,out)-strength
max_s(i,o)_user    User with max. (in,out)-strength
std_k(i,o)         Std. dev. of (in,out)-degree
std_s(i,o)         Std. dev. of (in,out)-strength
skew_k(i,o)        Skew of (in,out)-degree dist.
skew_s(i,o)        Skew of (in,out)-strength dist.
mean_cc            The mean size of connected components
max_cc             The size of the largest connected component
entry_nodes        Number of unique injections
num_truthy         Number of times 'truthy' button was clicked for the meme
sentiment scores   The six GPOMS sentiment dimensions

Classifier   Accuracy   AUC
AdaBoost     96.4%      0.99
SVM          95.6%      0.95

ICWSM 2011

About ~ seconds! ~
Real-time query (Twitter search API)
Real-time feature (>1K) extraction
Real-time analysis and classification
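A few of the network features listed above can be computed directly from a weighted, directed edge list. The following is a minimal sketch, not the pipeline behind the reported results: the function name `network_features`, the input format, and the choice to treat mean_k as the mean total (in plus out) degree are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def network_features(weighted_edges):
    """Compute a subset of the slide's features from (source, target, weight) triples.

    Assumes a non-empty edge list in which each directed pair appears once.
    """
    in_k, out_k = defaultdict(int), defaultdict(int)
    in_s, out_s = defaultdict(float), defaultdict(float)
    neighbors = defaultdict(set)  # undirected view, used for connected components
    n_edges = 0
    for src, dst, weight in weighted_edges:
        n_edges += 1
        out_k[src] += 1
        in_k[dst] += 1
        out_s[src] += weight
        in_s[dst] += weight
        neighbors[src].add(dst)
        neighbors[dst].add(src)
    nodes = sorted(neighbors)

    # Sizes of weakly connected components via depth-first search.
    seen, sizes = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for other in neighbors[node]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        sizes.append(size)

    return {
        "nodes": len(nodes),
        "edges": n_edges,
        "mean_k": mean(in_k[n] + out_k[n] for n in nodes),  # assumption: total degree
        "max_k_in": max(in_k[n] for n in nodes),
        "max_k_out": max(out_k[n] for n in nodes),
        "max_k_in_user": max(nodes, key=lambda n: in_k[n]),
        "max_s_in": max(in_s[n] for n in nodes),
        "std_k_in": pstdev([in_k[n] for n in nodes]),
        "mean_cc": mean(sizes),
        "max_cc": max(sizes),
    }


# Hypothetical toy diffusion network: an edge (u, v, w) means v retweeted u w times.
features = network_features([("a", "b", 2), ("c", "b", 1), ("d", "e", 1)])
```

A feature dictionary like this, computed per meme, is the kind of input a classifier such as AdaBoost or an SVM would take; the classification step itself is omitted here.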
AUC 0.95 Comm. ACM 2016
WWW 2016 Developers Day
DARPA Twitter bot detection challenge

Team            Score
Sentimetrix     50.75
USC             45.00
IU              43.25
IBM             43.00
Boston Fusion   41.75
Georgia Tech    24.00

IEEE Computer, June 2016
1. Detection of misinformation 2. How misinformation spreads (Truthy case study) 3. Can the spread of misinformation be mitigated?
[Figure: timeline of dated events, 2014: 25 Aug, 26 Aug, 28 Aug, 3 Sep, 18 Oct, 21 Oct, 22 Oct, 23 Oct, 24 Oct, 3 Nov, 4 Nov, 10 Nov]
September 2014 1 March 2016
Competition between hoaxes and fact checking chemtrails anti-vax
Hoax vs. fact checking: model
Number of active believers Segregation Credibility
Hoaxy Social Networks News Sites API Crawler RSS Parser URL Tracker (stream api) Scrapy Spider Monitors Store DATABASE Fetch Analysis Dashboard
Source          Sites   Tweets      Users     URLs
fake news       71      1,287,769   171,035   96,400
fact checking   6       154,526     78,624    11,183

[Figures: tweets/retweets; ρ; distributions Pr(N ≥ n), Pr(A ≥ a), Pr(P ≥ p); remaining labels lost in extraction]

WWW SNOW 2016
1. Detection of misinformation 2. How misinformation spreads 3. Can the spread of misinformation be mitigated?
computational fact-checking?
carnivorous cat bird is a eat is a is a is a is a animal
[Figure: matrix of scores for the relation "is the spouse of", color scale from 0.1 to 0.45.
US presidents: Woodrow Wilson, Warren G. Harding, Calvin Coolidge, Herbert Hoover, Franklin D. Roosevelt, Harry S. Truman, Dwight D. Eisenhower, John F. Kennedy, Lyndon B. Johnson, Richard Nixon, Gerald Ford, Jimmy Carter, Ronald Reagan, George H. W. Bush, Bill Clinton, George W. Bush, Barack Obama.
First ladies: Edith Bolling Galt Wilson, Florence Harding, Grace Coolidge, Lou Henry Hoover, Eleanor Roosevelt, Bess Truman, Mamie Eisenhower, Jacqueline Kennedy Onassis, Lady Bird Johnson, Pat Nixon, Betty Ford, Rosalynn Carter, Nancy Reagan, Barbara Bush, Hillary Rodham Clinton, Laura Bush, Michelle Obama.
A second panel covers the relation "is the capital of".]
PLoS ONE 2015
a b Obama Barack Obama Columbia University Association of American Universities Canada Stephen Harper Calgary Naheed Nenshi Islam Muslim
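One way to read the knowledge-graph slides above is that a claim linking two entities is more plausible when a short path through specific (low-degree) concepts connects them. The sketch below illustrates that idea only; the scoring rule (one over one plus the summed log-degree of intermediate nodes), the function names, and the toy graph are all assumptions for illustration and are not taken from the slides.

```python
import heapq
import math
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def path_score(graph, source, target):
    """Score in (0, 1] for the best path from source to target, 0.0 if none exists.

    Assumed rule: passing through an intermediate node v costs log(degree(v)),
    so generic hub nodes are penalized; the endpoints cost nothing.
    """
    if source == target:
        return 1.0
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:  # Dijkstra's algorithm on non-negative node costs
        cost, node = heapq.heappop(heap)
        if node == target:
            return 1.0 / (1.0 + cost)
        if cost > best.get(node, math.inf):
            continue  # stale heap entry
        for other in graph[node]:
            step = 0.0 if other == target else math.log(len(graph[other]))
            new_cost = cost + step
            if new_cost < best.get(other, math.inf):
                best[other] = new_cost
                heapq.heappush(heap, (new_cost, other))
    return 0.0


# Hypothetical toy graph loosely echoing the cat/bird/animal slide.
toy = build_graph([
    ("cat", "animal"), ("bird", "animal"), ("fish", "animal"),
    ("cat", "feline"), ("lion", "feline"),
])
direct = path_score(toy, "cat", "animal")     # adjacent nodes
via_hub = path_score(toy, "cat", "bird")      # path through the generic "animal" node
via_specific = path_score(toy, "cat", "lion") # path through the more specific "feline" node
missing = path_score(toy, "cat", "rock")      # no connecting path
```

Under this assumed rule, the pair connected through the specific node scores higher than the pair connected through the hub, which is the behavior a path-based fact checker would want.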
Does fact-checking work?
‣ Echo chambers
‣ Selective exposure
‣ Confirmation bias
ICWSM 2011
Predicting political alignment

Features          Accuracy
Text (TF-IDF)     79%
Hashtags          91%
Retweet network   95%
Tags + Network    95%

SocialCom 2011
Activity K-core follower network K-core retweet network EPJ Data Science 2012
Homogeneous Exposure: Social bubbles
[Figure: homogeneity of exposure by traffic source, axis from 0.0 to 1.0. Sources: gmail, live mail, yahoo mail, aol mail, youtube, reddit, tumblr, facebook, twitter, pinterest, wikipedia, ask, bing, aol search, yahoo search, google, google news. Categories: email, search, social media, news aggregator, wiki. Baseline: random walker.]
PeerJ CS 2015
Competition for attention 2B Views 55M Followers Hashtag Popularity User Popularity # daily retweets # followers [Twitter] [Yahoo! Meme]
Can the competition for limited user attention help explain the broad heterogeneity of meme popularity and our vulnerability to misinformation?
Toy agent-based model
[Figure: agents A, B, C, D linked by follower relations, shown before and after a post. An agent posts a new topic with probability Pn, or posts existing topics with probability 1 - Pn. Figure labels for existing topics: Screen (Pr), Memory (1 - Pr), (Pm). Example topics: #jan25, #jobs, #apple, #justinbieber, #ladygaga.]
Toy model predictions: Role of social network a b
Toy model predictions: Role of limited attention a b Nature Sci. Rep. 2012
• Spread among agents with limited attention on a social network is sufficient to explain meme virality
• Not necessary to invoke more complicated explanations based on intrinsic meme value or external factors
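The limited-attention mechanism summarized above can be sketched in a few lines. This is a simplified illustration, not the model from the slides: it keeps the probability of posting a new topic (called `p_new` here, standing in for Pn) and a finite feed as the attention limit, but it drops the separate screen and memory steps (Pr, Pm). The function names, the ring network, and all parameter values are assumptions.

```python
import random
from collections import Counter, deque


def ring_network(n, k=2):
    """Hypothetical follower network: agent i is followed by the next k agents on a ring."""
    return {i: [(i + j) % n for j in range(1, k + 1)] for i in range(n)}


def simulate(followers, steps, p_new=0.1, attention=5, seed=0):
    """Run a simplified meme-diffusion model with limited attention.

    followers maps each agent to the agents who see its posts.
    Each agent's feed holds at most `attention` memes; older ones are forgotten.
    """
    rng = random.Random(seed)
    agents = list(followers)
    feeds = {agent: deque(maxlen=attention) for agent in agents}
    popularity = Counter()  # number of times each meme was posted
    next_meme = 0
    for _ in range(steps):
        agent = rng.choice(agents)
        if not feeds[agent] or rng.random() < p_new:
            # Post a brand-new meme (always the case when the feed is empty).
            meme = next_meme
            next_meme += 1
        else:
            # Repost a meme currently in the agent's limited feed.
            meme = rng.choice(list(feeds[agent]))
        popularity[meme] += 1
        for follower in followers[agent]:
            feeds[follower].append(meme)  # deque drops the oldest meme when full
    return popularity, feeds


popularity, feeds = simulate(ring_network(20), steps=500)
top_memes = popularity.most_common(3)
```

All memes in this sketch are identical in quality, so any spread in the resulting popularity counts comes only from the network and the finite feed, which is the point the bullets make.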
Do the best ideas win?
[Figure: (a) mean popularity vs. fitness for µ = 0.1, 0.2, 0.4, 0.6, 0.8, 1.0; (b) average popularity vs. fitness for α = 1 to 10 at µ = 0.1; schematic with memes m1 to m9.]
Efficiency vs. diversity
[Figure: panels (a), (b), (c) for parameter values 0.01, 0.2, 0.9 (parameter symbol lost in extraction); scale from 0.0 to 1.0; label: intensity of competition.]
Limited attention
[Figure: efficiency τ (0 to 0.25) vs. diversity H (0 to 3) for α = 1 to 10.]
• Structural, temporal, content, and user features can be used to detect astroturf and social bots.
• Social media and traditional media work together to spread misinformation; gullible people are exposed through gullible connections.
• Social network structure and limited attention may amplify our natural biases and make us more vulnerable to misinformation.
Marcella Tambuscio
Thanks!