You Won’t Believe It: Exploring the Advertising Ecosystem of Fake News Websites Catherine Han Allen Tong CS 261N 1
Misinformation Flavors: Deception 2
Misinformation Flavors: Junk Science 3
Misinformation Flavors: Unreliable Clickbait 4
Misinformation Related Work ● Detection ● Fact-checking ● Prevention ● Propagation modeling 5
Online Advertisement Problem Space ● Big revenue stream for “free” services ○ For news publishers too ● Impact on user experience and privacy ○ Annoyance ○ Performance ○ Trackers ○ Fear 6
Online Advertisement Problem Space, cont’d ● Targeting algorithms ○ Algorithmic bias ● RTB ecosystem 7
Real-Time Bidding (RTB) Simplified [Diagram. Labels: User Browser; Publisher; Ad Exchange Server; Advertisers A, B, C, D; "PII + request" and "user profile"; "request for bids"; bids $A, $B, $C, $D; "winning bid ($A), embed creative (A)"; ad slot filled by A] 8
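The exchange's role on this slide can be sketched as a single auction round. This is a minimal sketch, assuming a plain highest-bid-wins rule; the slide does not say whether the exchange uses first-price or second-price auctions. The `Bid` type, the dollar amounts, and the creative names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Bid:
    advertiser: str   # e.g. "A", as labelled on the slide
    amount: float     # bid in dollars (hypothetical values below)
    creative: str     # identifier of the ad to embed if this bid wins


def run_auction(bids: List[Bid]) -> Optional[Bid]:
    """Return the highest bid, or None if no advertiser responded."""
    if not bids:
        return None
    return max(bids, key=lambda bid: bid.amount)


# Hypothetical responses to one "request for bids" for a single ad slot.
bids = [
    Bid("A", 2.50, "creative-A"),
    Bid("B", 1.75, "creative-B"),
    Bid("C", 0.90, "creative-C"),
    Bid("D", 2.10, "creative-D"),
]
winner = run_auction(bids)
```

Here `winner.creative` is what the publisher's page would embed in the ad slot.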
Misinformation and Ad Revenue 9
Misinformation and Ad Revenue, cont’d 10
Misinformation and Ad Revenue, cont’d 11
Research Problem ● Understanding the ecosystem of online ads on fake news sites ○ Identifying third-party mediators ○ Identifying advertisers ○ Categorizing ads (e.g., medical, health) ● Comparative analysis with popular benign news sites 12
Construction of Site Corpus ● Fake News ○ Zimdars’ False, Misleading, Clickbait-y and/or Satirical “News” Sources List ■ Domain, “About Us”, source, style, aesthetic, social media analysis ● “Real” (Benign) News ○ Alexa Top Sites (News category) 13
Methodology - Collecting Ads [Diagram: crawl tree for foo.com. The home page links to foo_1, foo_2, and foo_3; each of those links to three further pages (foo_1_1 through foo_3_3). Rendered HTML is collected for foo.com.] 14
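The crawl tree on this slide suggests a breadth-first walk that follows three links per page to a depth of two. The sketch below assumes that reading; the fan-out, the depth, and the `get_links` callable are illustrative assumptions, and a toy link graph stands in for a real browser.

```python
from collections import deque
from typing import Callable, Iterable, List


def crawl(start: str,
          get_links: Callable[[str], Iterable[str]],
          fanout: int = 3,
          max_depth: int = 2) -> List[str]:
    """Breadth-first crawl following at most `fanout` new links per page."""
    visited = [start]
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand pages at the depth limit
        picked = 0
        for link in get_links(page):
            if picked == fanout:
                break
            if link in seen:
                continue
            seen.add(link)
            visited.append(link)
            queue.append((link, depth + 1))
            picked += 1
    return visited


# Toy link graph standing in for pages fetched by a real browser.
TOY_SITE = {"foo.com": ["foo_1", "foo_2", "foo_3", "foo_4"]}
for i in (1, 2, 3):
    TOY_SITE[f"foo_{i}"] = [f"foo_{i}_{j}" for j in (1, 2, 3)]

pages = crawl("foo.com", lambda page: TOY_SITE.get(page, []))
```

With the toy graph, the crawl covers the home page, three first-level pages, and nine second-level pages; `foo_4` is skipped because of the fan-out limit.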
Methodology - Collecting Ads [Diagram: links (<a href="...">) extracted from the rendered HTML for foo.com are matched against EasyList, Malware Domains, and 12 uBlock lists to obtain the ad URLs for foo.com.] 15
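The link-matching step can be sketched as follows. This is a simplification: real EasyList and uBlock lists use a richer rule syntax, so a plain set of blocked domains stands in for them here. The domain `example-adnet.test` and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser
from typing import List, Set
from urllib.parse import urlparse


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def is_ad_url(url: str, blocked_domains: Set[str]) -> bool:
    """True if the URL's host is a blocked domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in blocked_domains)


def find_ad_urls(html: str, blocked_domains: Set[str]) -> List[str]:
    parser = LinkExtractor()
    parser.feed(html)
    return [u for u in parser.links if is_ad_url(u, blocked_domains)]


# Hypothetical blocklist entry and rendered HTML.
BLOCKED = {"example-adnet.test"}
HTML = (
    '<a href="https://ads.example-adnet.test/click?id=1">Sponsored</a>'
    '<a href="https://foo.com/story">Story</a>'
    '<a href="/about">About</a>'
)
ad_urls = find_ad_urls(HTML, BLOCKED)
```

Only the first link is flagged; the same-site and relative links pass through untouched.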
Methodology - Categorizing Ads [Diagram: ad URLs → tracked URL redirects → final ad landing pages → LDA → ad topics → dictionary of ad words] 16
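The last stage of this pipeline, applying a dictionary of ad words to landing-page text, might look like the sketch below. The LDA step that produces the topics is not shown, and the word lists are invented for illustration, not taken from the study. A page may fall into several categories at once.

```python
import re
from typing import Dict, List, Set

# Hypothetical dictionary of ad words; the study's actual dictionary
# is not given on the slides.
AD_WORDS: Dict[str, Set[str]] = {
    "Medicine & Health": {"supplement", "doctor", "vaccine", "coronavirus", "covid"},
    "Coronavirus": {"coronavirus", "covid"},
    "Finance": {"loan", "mortgage", "invest"},
}


def categorize(text: str, ad_words: Dict[str, Set[str]] = AD_WORDS) -> List[str]:
    """Return every category whose word list overlaps the page text."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(cat for cat, words in ad_words.items() if tokens & words)


labels = categorize("Coronavirus masks shipped fast")
```

Because every Coronavirus word also appears under Medicine & Health, any page tagged Coronavirus is also tagged Medicine & Health.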
Methodology - Challenges & Lessons Learned ● Choosing the corpus of fake news sites ● Web crawling woes ○ Headless browsing detection ○ Dynamically loaded content (ads) ○ Asynchronicity in JS ● Categorization of websites ○ Alexa Web Information Service (AWIS) API 17
Limitations & Ethics ● Inability to account for fingerprinting ● Corpus size and robustness ● Gathering (consistent) corpus metadata ● Clicking on ads 18
Results - Third-Party Ad Servers ● Third-party ads were found on 246 fake news sites and 716 benign sites [figure content truncated in source] 19
Results - Site Ranking and Third-Party Ad Servers 20
Results - Advertisers [Figure panels: Misinfo, Benign] 21
Results - Site Ranking and Advertisers 22
Results - Products Advertised
Advertisement Category      Unique Fake News Sites (N=246)   Unique Benign News Sites (N=716)
Technology                  235 (96%)                        403 (56%)
Medicine & Health           133 (57%)                        383 (53%)
Coronavirus*                 36 (15%)                        151 (21%)
Finance                      84 (34%)                        329 (46%)
Politics                     24 (10%)                         40 (6%)
Cannabis and CBD             23 (10%)                         46 (6%)
* Proper subset of Medicine & Health
23
Future Work ● Larger corpus ● Categorization of ads ● Longitudinal crawling ● Motivations of users navigating to such sites ○ Browser extension + user study with MTurk 24