Fifteen Minutes of Unwanted Fame: Detecting and Characterizing Doxing
Peter Snyder*, Periwinkle Doerfler+, Chris Kanich*, Damon McCoy+

Overview
• Doxing is a targeted form of online abuse
• Prior work is qualitative or focused on defense
• We don't understand the scope and targets of the problem
• This work is the first quantitative, large-scale measurement of doxing

Outline
• Problem area
• Measurement methodology
• Results and findings
• Discussion and conclusions

What is Doxing? (1/2)
• Method of targeted online abuse
• Attackers compile sensitive information about the target
  • Personal: name, addresses, age, photographs, SSN
  • Relationships: family members, partners, friends
  • Financial: work history, investments, criminal history
  • Online: email, social network accounts, passwords, IPs

What is Doxing? (2/2)
• Information is compiled into plain text files
• Released "anonymously"
  • Text sharing sites (e.g. pastebin.com, skidpaste.com)
  • Online forums (e.g. 4chan, 8chan)
  • Torrents
  • IRC, Twitch, social networks, etc.

====================================================
Full Name: █████ ██████
Aliases> ████████████
Age: ██
DOB: ██ / ██ / ████
Address: ██ ███████ █████ ███████████ , ███████ ██████ // Confirmed
Mobile Number: + █ ( ███ ) ███ - ████ // Confirmed
Email: ██████████ @ ███████ . ███ // Confirmed
Illness: Asthma
====================================================
ISP Records>
ISP: Rogers Cable // Previous
IP Address: ███ . ███ . ███ . ███ // Previous
====================================================
Parental Information>
Father: █ █ ██████
Age: ██

Aliases) ███████████ , ███████████ , █████
Name) ██████ ████
DOB █ / ██ / ██
Address) ██ █ ████ █ , ██████ , ██ █████
Cell Phone) ███ - ███ - ████ – Sprint, Mobile
Caller ID) ██████ ████
Old Home Phone) ███ - ███ - ████ – CenturyLink, Landline
Last 4 of Mastercard) ████
Emails) ██████████████ @ █████ . ███ , ████████ @ █████ . ███
Snapchat) ███████████
Twitter) @ ███████████
Facebook) https://facebook.com/ █████████ , ███████████
Skype) █████████ , ████████

Doxing Harms

Frequency, Targets and Effects
• Prior work is based on qualitative or preventative / risk management approaches
• Research questions:
  1. How frequently does doxing happen?
  2. What information is shared in doxes? Who is targeted?
  3. What is knowable about the large-scale effects and harms?
  4. Are anti-abuse tools effective?

General Measurement Strategy
• Find places online where doxes are frequently shared
• Train a classifier to determine how much activity is doxing
• Measure extracted doxes to determine contained information
• Watch the online social network (OSN) accounts of doxing victims for abuse

Steps to Protect Victims
• Worked closely with IRBs; multiple rounds of study design
• Only recorded publicly available data
• Careful data storage / analysis methods: only recorded high-level summary data
• Data protection best practices (key-based encryption, single data store, strict access controls)

Dox Collection Pipeline
• Fully automated
• Single IP at the University of Illinois at Chicago
• Two recording periods:
  • Summer of 2016
  • Winter of 2016-17

Figure: dox collection pipeline
Sources (text files recorded): Pastebin 1.45m; 4Chan pol 144k; 4Chan b 138k; 8Ch pol 3.4k; 8Ch baphomet 512
1. Dox Classifier (Sec 3.1.2): 5,330 files classified as dox; 1.73m files not dox
2. OSN Extractor (Sec 3.1.3): 5,330 files; counts 748, 117, 328, 127, 345, 245
3. Dox De-Duplication (Sec 3.1.4): 4,328 files kept; 1,002 files duplicate
4. Social Network Account Verifier & Scraper (Sec 3.1.5): 552, 228, 305, 200 accounts

Text File Collection
• Data recorded from
  • pastebin.com
  • 4chan.org (pol, b)
  • 8ch.net (pol, baphomet)
• API and scrapers
• Selected because:
  • "Original" sources of doxes
  • Anecdotal reputation for toxic behavior / doxing activity

Text File Classification
• Scikit-Learn, TfidfVectorizer, SGDClassifier
• Training data:
  • Manual labeling of Pastebin crawl
  • "proof-of-work" sets

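The classifier setup named above can be sketched roughly as follows. This is a minimal illustration that only assumes the two scikit-learn components listed on the slide; the training strings, the default hyperparameters, and the `random_state` value are hypothetical placeholders, not the study's data or configuration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

# Hypothetical, hand-written examples. The study instead trained on a
# manually labeled Pastebin crawl and "proof-of-work" sets.
train_texts = [
    "Full Name: John Example DOB: 01/01/1990 Address: 1 Example St Phone: 555-0100",
    "Name) Jane Sample Aliases) jsample Skype) jane.sample Facebook) facebook.com/jsample",
    "def main():\n    print('hello world')\n    return 0",
    "Meeting notes: discuss the quarterly budget and the release schedule",
]
train_labels = ["dox", "dox", "not", "not"]

# TF-IDF features feeding a linear model trained with stochastic gradient
# descent. Default hyperparameters are an assumption made for this sketch.
classifier = make_pipeline(
    TfidfVectorizer(),
    SGDClassifier(random_state=0),
)
classifier.fit(train_texts, train_labels)

# Classify previously unseen text files (also hypothetical).
new_files = [
    "Full Name: Alex Placeholder Address: 2 Sample Ave Phone: 555-0199",
    "import os\nprint(os.getcwd())",
]
predictions = classifier.predict(new_files)
print(list(zip(predictions, new_files)))
```

With only four training examples the predictions are not meaningful; the point is the shape of the pipeline, in which raw text goes in and a "dox" / "not" label comes out.
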
Text File Classification
Label        Precision  Recall  # Samples
Dox          0.81       0.89    258
Not          0.99       0.98    3,546
Avg / Total  0.98       0.98    3,804

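As a reading aid for the table above, the sketch below assumes the "Avg / Total" row is a sample-weighted average of the per-class rows, which is how scikit-learn's classification report has labeled that row; the slides do not state this. Because the per-class inputs are already rounded to two decimals, the result only approximately matches the reported 0.98 values. The F1 score for the "Dox" class is derived here and does not appear in the source.

```python
# Per-class values copied from the table above: (precision, recall, samples).
per_class = {
    "Dox": (0.81, 0.89, 258),
    "Not": (0.99, 0.98, 3546),
}

total = sum(samples for _, _, samples in per_class.values())


def weighted(index):
    """Average one metric column, weighting each class by its sample count."""
    return sum(row[index] * row[2] for row in per_class.values()) / total


avg_precision = weighted(0)
avg_recall = weighted(1)

# F1 is the harmonic mean of precision and recall, here for the "Dox" class.
dox_precision, dox_recall, _ = per_class["Dox"]
dox_f1 = 2 * dox_precision * dox_recall / (dox_precision + dox_recall)

print(f"total samples: {total}")
print(f"weighted precision: {avg_precision:.3f}")
print(f"weighted recall: {avg_recall:.3f}")
print(f"Dox F1: {dox_f1:.3f}")
```
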
Social Networking Account Extractor
• Extract social networking accounts
• Custom, heuristic-based identifier
• Example:
  • Facebook:https://facebook.com/example
  • FB example
  • fbs: example - example2 - example3
  • facebooks; example and example2
• Evaluated on 125 labeled doxes

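A heuristic of this kind might look like the sketch below. It is not the study's extractor: the regular expressions, the separator list, and the function name `extract_facebook_accounts` are assumptions chosen only to cover the four example formats listed above.

```python
import re

# Label variants from the examples: "Facebook", "FB", "fbs", "facebooks",
# optionally followed by a delimiter such as ":", ";", ")" or "-".
LABEL = re.compile(r"^\s*(?:facebooks?|fbs?)\b\s*[:;)\-]?\s*(.+)$", re.IGNORECASE)
# Separators between several accounts on one line: " - ", " and ", commas.
SEPARATOR = re.compile(r"\s+-\s+|\s+and\s+|\s*,\s*", re.IGNORECASE)
# Optional URL prefix in front of the account name.
URL_PREFIX = re.compile(r"^(?:https?://)?(?:www\.)?facebook\.com/", re.IGNORECASE)


def extract_facebook_accounts(line):
    """Return the Facebook account names referenced on one line of a dox."""
    match = LABEL.match(line)
    if not match:
        return []
    accounts = []
    for part in SEPARATOR.split(match.group(1)):
        name = URL_PREFIX.sub("", part.strip()).strip("/ ")
        if name:
            accounts.append(name)
    return accounts


for example in [
    "Facebook:https://facebook.com/example",
    "FB example",
    "fbs: example - example2 - example3",
    "facebooks; example and example2",
]:
    print(example, "->", extract_facebook_accounts(example))
```

A real extractor would need one such rule set per network and would still miss unusual formats, which is consistent with the extractor being evaluated against manually labeled doxes.
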
Social Networking Account Extractor
Network    % Doxes Including  Extractor Accuracy (%)
Instagram  11.2               95.2
Twitch     9.7                95.2
Google+    18.4               90.4
Twitter    34.4               86.4
Facebook   48.0               84.8
YouTube    40.0               80.0

Dox De-duplication
• Similar doxes, identical target
• Hash-based comparison fragile to marginal updates
• Compare referenced OSN accounts
• ~14.2% of doxes were duplicates

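The contrast between hashing and account comparison can be illustrated with the sketch below. The matching rule used here (two doxes are duplicates when they reference exactly the same set of accounts) is an assumption for illustration; the slides only say that referenced OSN accounts are compared. The sample texts and the `deduplicate` helper are hypothetical.

```python
import hashlib


def deduplicate(doxes):
    """Keep the first dox seen for each set of referenced accounts.

    `doxes` is a list of (dox_id, accounts) pairs, where `accounts` is an
    iterable of (network, username) tuples. Doxes that reference no
    accounts are never treated as duplicates.
    """
    seen = set()
    unique, duplicates = [], []
    for dox_id, accounts in doxes:
        key = frozenset((net.lower(), user.lower()) for net, user in accounts)
        if key and key in seen:
            duplicates.append(dox_id)
        else:
            seen.add(key)
            unique.append(dox_id)
    return unique, duplicates


# A marginal update changes the file hash but not the referenced accounts.
original = "Name: A. Placeholder\nTwitter: @example\n"
updated = original + "Update: added old phone number\n"
hash_original = hashlib.sha256(original.encode()).hexdigest()
hash_updated = hashlib.sha256(updated.encode()).hexdigest()

unique, duplicates = deduplicate([
    ("dox-1", [("twitter", "example")]),
    ("dox-2", [("Twitter", "Example")]),   # re-post of dox-1 with small edits
    ("dox-3", [("facebook", "someone")]),
])
print(hash_original == hash_updated, unique, duplicates)
```
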
Social Network Account Verifier & Scraper
• Repeatedly visit referenced OSN accounts
  • After 1, 2, 3, 7, 14… days
• Only record the status of the account:
  • public
  • private
  • inactive
• Single IP @ UIC

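The revisit schedule and status-only recording described above could be sketched as follows. The names `AccountStatus`, `revisit_dates`, and `record_visit` are hypothetical, and counting the day offsets from the date the dox was first observed is an assumption. Only the offsets actually listed are used, since the continuation of the schedule is not given.

```python
from datetime import date, timedelta
from enum import Enum


class AccountStatus(Enum):
    PUBLIC = "public"
    PRIVATE = "private"
    INACTIVE = "inactive"


# Only the offsets listed above; later offsets are not specified in the source.
REVISIT_OFFSETS_DAYS = (1, 2, 3, 7, 14)


def revisit_dates(dox_seen_on):
    """Dates on which an account referenced by a dox is checked again."""
    return [dox_seen_on + timedelta(days=d) for d in REVISIT_OFFSETS_DAYS]


def record_visit(log, account, visited_on, status):
    """Store only the account's status, never any of its content."""
    log.append((account, visited_on.isoformat(), status.value))


log = []
schedule = revisit_dates(date(2016, 7, 1))
record_visit(log, "example", schedule[0], AccountStatus.PUBLIC)
record_visit(log, "example", schedule[1], AccountStatus.PRIVATE)
print(schedule)
print(log)
```

Recording a three-valued status per visit is enough to observe whether an account is locked down or closed after a dox appears, without storing anything the account posted.
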
Manual Dox Labeling
• Randomly selected 464 doxes
• Manually labeled each dox to understand the contents
  • Did it include name, address, phone #, email, etc.?
  • Age and gender of the target (if included)
  • Categorization of the victim
  • Categorization of the motive of the attacker ("why I doxed this person…")

Collection Statistics
Study Period          Summer 2016  Winter 2016-17  Combined
Text Files Recorded   484,185      1,253,702       1,737,887
Classified as Dox     2,976        2,554           5,530
Doxes w/o Duplicates  2,326        2,202           4,528
Manually Labeled      270          194             464

Outline
• Results and findings
  • Doxing targets
  • Doxing perpetrators
  • Effects on social networks

Doxing Targets

Victim Demographics
• Taken from the 464 manually labeled doxes
• Only based on data in doxes
• Harm prevention steps (e.g. not taking demographic data from OSN accounts)

Min Age         10 years old
Max Age         74 years old
Mean Age        21.7 years old
Gender, Female  16.3%
Gender, Male    82.2%
Gender, Other   0.4%
Located in USA  64.5% (of 300 files that included address)