Fifteen Minutes of Unwanted Fame: Detecting and Characterizing Doxing
Peter Snyder*, Periwinkle Doerfler+, Chris Kanich*, Damon McCoy+ 1
Overview • Doxing is a targeted form of online abuse • Prior work is qualitative or focuses on defensive techniques • We don't understand the scale or targets of the problem • This work is the first quantitative, large-scale measurement of doxing 2
Outline • Problem area • Measurement methodology • Results and findings • Discussion and conclusions 3
What is Doxing? (1/2) • Method of targeted online abuse • Attackers compile sensitive information about the target • Personal: Name, addresses, age, photographs, SSN • Relationships: Family members, partners, friends • Financial: Work history, investments, CCN • Online: Email, social network accounts, passwords, IPs 5
What is Doxing? (2/2) • Information is compiled into plain text files • Released "anonymously" • Text sharing sites (e.g. pastebin.com, skidpaste.com) • Online forums (e.g. 4chan, 8chan) • Torrents • IRC, Twitch, social networks, etc. 6
==================================================== Full Name: █████ ██████ Aliases> ████████████ Age: ██ DOB: ██ / ██ / ████ Address: ██ ███████ █████ ███████████ , ███████ ██████ // Confirmed Mobile Number: + █ ( ███ ) ███ - ████ // Confirmed Email: ██████████ @ ███████ . ███ // Confirmed Illness: Asthma ==================================================== ISP Records> ISP: Rogers Cable // Previous IP Address: ███ . ███ . ███ . ███ // Previous ==================================================== Parental Information> Father: █ █ ██████ Age: ██ 7
Aliases) ███████████ , ███████████ , █████ Name) ██████ ████ DOB █ / ██ / ██ Address) ██ █ ████ █ , ██████ , ██ █████ Cell Phone) ███ - ███ - ████ – Sprint, Mobile Caller ID) ██████ ████ Old Home Phone) ███ - ███ - ████ – CenturyLink, Landline Last 4 of Mastercard) ████ Emails) ██████████████ @ █████ . ███ , ████████ @ █████ . ███ Snapchat) ███████████ Twitter) @ ███████████ Facebook) https://facebook.com/ █████████ , ███████████ Skype) █████████ , ████████ 8
Doxing Harms 9
Frequency, Targets and Effects • Prior work is based in qualitative or preventative / risk management approaches • Research Questions: 1. How frequently does doxing happen? 2. What information is shared in doxes? Who is targeted? 3. What is knowable about the large scale effects and harms? 4. Are anti-abuse tools effective? 10
Outline • Problem area • Measurement methodology • Results and findings • Discussion and conclusions 11
Steps to Protect Victims • Worked closely with IRBs; multiple rounds of study design • Only recorded publicly available data, careful to not use it to record data • Careful data storage / analysis methods: only recorded high-level summary data • Data protection best practices (key-based encryption, single data store, strict access controls) 12
General Measurement Strategy • Find places online where doxes are frequently shared • Train a classifier to determine how much activity is doxing • Measure extracted doxes to determine contained information • Watch the OSN accounts of doxing victims for abuse 13
Dox Collection Pipeline • Fully automated • Single IP at the University of Illinois at Chicago • Two recording periods: • Summer of 2016 • Winter of 2016
[Figure: dox collection pipeline. Sources: Pastebin (1.45m), 4Chan pol (144k), 4Chan b (138k), 8Ch pol (3.4k), 8Ch baphomet (512). Dox Classifier (Sec 3.1.2): 5,330 files kept, 1.73m files "Not Dox". OSN Extractor (Sec 3.1.3): 5,330 files in; outputs 748, 117, 328, 127, 345, 245. Dox De-Duplication (Sec 3.1.4): 4,328 files kept, 1,002 files "Duplicate". Social Network Account Verifier & Scraper (Sec 3.1.5): 552 Acct, 228 Acct, 305 Acct, 200 Acct.]
14
Text File Collection • Data recorded from • pastebin.com • 4chan.org (pol, b) • 8ch.net (pol, baphomet) • Selected because: • "Original" sources of doxes • Anecdotal reputation for doxing 15
Text File Classification • Scikit-learn, TfidfVectorizer, SGDClassifier • Training Data: • Manual labeling of Pastebin crawl • "proof-of-work" sets 16
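The classifier named on this slide can be sketched as a scikit-learn pipeline. This is a minimal illustration only: the four training texts and their labels are made-up placeholders (not the study's data), and the default hyperparameters are an assumption, since the slide does not state the ones actually used.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus; 1 = dox, 0 = not dox.
train_texts = [
    "Full Name: Jane Example Address: 1 Main St Phone: 555-0100 Email: jane@example.com",
    "Name) John Sample DOB 1/1/90 Cell Phone) 555-0101 Twitter) @jsample",
    "def main(): print('hello world')",
    "server log: connection reset by peer at 12:00",
]
train_labels = [1, 1, 0, 0]

# TF-IDF features feeding a linear model trained with stochastic gradient descent.
# Default settings are an assumption, not the study's configuration.
model = make_pipeline(
    TfidfVectorizer(),
    SGDClassifier(random_state=0),
)
model.fit(train_texts, train_labels)

# Classify one unseen (also made-up) text file.
preds = model.predict(["Name: A B Address: 2 Oak Ave Phone: 555-0102"])
print(preds)
```

With a corpus this small the prediction itself is not meaningful; the point is only the shape of the pipeline: raw text in, a dox / not-dox label out.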
Text File Classification
Label         Precision   Recall   # Samples
Dox           0.81        0.89     258
Not           0.99        0.98     3,546
Avg / Total   0.98        0.98     3,804
17
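As a worked check on the table, the sketch below recomputes the "Avg / Total" row from the two per-class rows. It assumes that row is a support-weighted mean, which is how scikit-learn's classification report summarizes per-class metrics; the slide itself does not say how the average was taken.

```python
# Per-class values copied from the table: (precision, recall, number of samples).
per_class = {
    "Dox": (0.81, 0.89, 258),
    "Not": (0.99, 0.98, 3546),
}

total = sum(n for _, _, n in per_class.values())

def weighted(index):
    """Support-weighted mean of one metric column (0 = precision, 1 = recall)."""
    return sum(row[index] * row[2] for row in per_class.values()) / total

avg_precision = weighted(0)
avg_recall = weighted(1)
print(total, round(avg_precision, 3), round(avg_recall, 3))
```

Under that assumption the total is 3,804 and the weighted precision is about 0.978, matching the table's 0.98. The recomputed recall is about 0.974, slightly under the reported 0.98, which is expected when the per-class inputs have already been rounded to two decimals.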
Social Networking Account Extractor • Extract social networking accounts • Custom, heuristic-based identifier • Evaluated on 125 labeled doxes 18
Social Networking Account Extractor
Network     % Doxes Including   Extractor Accuracy
Instagram   11.2                95.2
Twitch      9.7                 95.2
Google+     18.4                90.4
Twitter     34.4                86.4
Facebook    48.0                84.8
YouTube     40.0                80.0
19
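The slides describe the extractor only as a custom, heuristic-based identifier. The sketch below shows one plausible shape for such a heuristic: a regular expression per network that pulls account names out of profile URLs. The patterns, the `extract_accounts` name, and the lowercase normalization are all assumptions for illustration, not the authors' rules, and label-style mentions such as "Twitter) @name" are not handled.

```python
import re

# Hypothetical URL patterns, one per network; each captures the account name.
PATTERNS = {
    "facebook": re.compile(r"facebook\.com/([A-Za-z0-9.]+)", re.I),
    "twitter": re.compile(r"twitter\.com/([A-Za-z0-9_]+)", re.I),
    "instagram": re.compile(r"instagram\.com/([A-Za-z0-9_.]+)", re.I),
    "twitch": re.compile(r"twitch\.tv/([A-Za-z0-9_]+)", re.I),
}

def extract_accounts(text):
    """Return a set of (network, account) pairs found in a text file."""
    found = set()
    for network, pattern in PATTERNS.items():
        for name in pattern.findall(text):
            # Lowercase so the same account written two ways compares equal.
            found.add((network, name.lower()))
    return found

# Made-up sample text in the style of the redacted examples.
sample = (
    "Facebook) https://facebook.com/example.user "
    "Twitter) https://twitter.com/Example_User "
    "Twitch: twitch.tv/example_streamer"
)
accounts = extract_accounts(sample)
print(sorted(accounts))
```

Returning a set of (network, account) pairs makes the output easy to compare across files, which is what a later de-duplication step needs.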
Dox De-duplication • Similar doxes, identical target • Hash-based comparison fragile to marginal updates • Compare referenced OSN accounts • ~14.2% of doxes were duplicates 20
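The contrast on this slide can be illustrated with a short sketch: a content hash changes when a dox is edited slightly, while the set of referenced OSN accounts stays the same. The matching rule used here (any shared account marks a later file as a duplicate) is an assumption; the slide says only that referenced accounts are compared. All texts and account names are made up.

```python
import hashlib

# Two near-identical made-up files: the second adds one line.
v1 = "Name: Example Person\nTwitter: twitter.com/example_user\n"
v2 = v1 + "Updated: added phone number\n"
hashes_differ = (
    hashlib.sha256(v1.encode()).hexdigest()
    != hashlib.sha256(v2.encode()).hexdigest()
)

def deduplicate(doxes):
    """Split (dox_id, accounts) pairs into first-seen and duplicate ids.

    Assumed rule: a dox is a duplicate if it shares at least one
    (network, account) pair with any dox seen earlier.
    """
    seen_accounts = set()
    unique_ids, duplicate_ids = [], []
    for dox_id, accounts in doxes:
        if accounts & seen_accounts:
            duplicate_ids.append(dox_id)
        else:
            unique_ids.append(dox_id)
        seen_accounts |= accounts
    return unique_ids, duplicate_ids

doxes = [
    ("a", {("twitter", "example_user")}),
    ("b", {("twitter", "example_user"), ("facebook", "example.user")}),
    ("c", {("facebook", "someone.else")}),
]
unique_ids, duplicate_ids = deduplicate(doxes)
print(hashes_differ, unique_ids, duplicate_ids)
```

Note that under this rule a dox that references no accounts can never be flagged as a duplicate, which is one limitation of comparing accounts alone.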
Social Network Status Watcher • Repeatedly visit referenced OSN accounts • After 1, 2, 3, 7, 14… days • Only record the status of the account: • public, private, inactive • Single IP @ UIC 21
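A revisit schedule like the one on this slide could be represented as follows. The sketch uses only the day offsets the slide lists and leaves the trailing "…" unfilled; the names `AccountStatus`, `revisit_dates`, and `record_visit` are hypothetical, and no actual network request is made.

```python
from datetime import date, timedelta
from enum import Enum

class AccountStatus(Enum):
    """The three states the slide says are recorded."""
    PUBLIC = "public"
    PRIVATE = "private"
    INACTIVE = "inactive"

# Only the offsets shown on the slide; later offsets are not stated there.
REVISIT_OFFSETS_DAYS = (1, 2, 3, 7, 14)

def revisit_dates(first_seen):
    """Dates on which an account referenced in a dox would be revisited."""
    return [first_seen + timedelta(days=d) for d in REVISIT_OFFSETS_DAYS]

def record_visit(log, account, visited_on, status):
    """Store only the account, the visit date, and its status."""
    log.append((account, visited_on.isoformat(), status.value))

schedule = revisit_dates(date(2016, 7, 1))
log = []
record_visit(log, ("twitter", "example_user"), schedule[0], AccountStatus.PRIVATE)
print(schedule, log)
```

Keeping the log to a status value per visit mirrors the slide's point that nothing beyond public, private, or inactive is recorded about an account.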
Manual Dox Labeling • Randomly selected 464 doxes • Manually label each dox to understand the contents. • Did it include name, address, phone #, email, etc.? • Age and gender of the target (if included) • Categorization of the victim • Categorization of the motive of attacker 22
Collection Statistics
Study Period            Summer 2016   Winter 2016-17   Combined
Text Files Recorded     484,185       1,253,702        1,737,887
Classified as Dox       2,976         2,554            5,530
Doxes w/o Duplicates    2,326         2,202            4,528
Manually Labeled        270           194              464
23
Outline • Problem area • Measurement methodology • Results and findings • Discussion and conclusions 24
Outline • Results and findings • Doxing targets • Doxing perpetrators • Effects on social networks 25
Doxing Targets 26
Victim Demographics • Taken from the 464 manually labeled doxes • Only based on data in doxes • Careful to avoid further harm (e.g. not taking demographic data from OSN accounts)
Min Age           10 years old
Max Age           74 years old
Mean Age          21.7 years old
Gender, Female    16.3%
Gender, Male      82.2%
Gender, Other     0.4%
Located in USA    64.5% (of 300 files that included address)
27
Types of Data in Doxes
Frequently Occurring Data
Category         # of Doxes   % of Doxes*
Address          422          90.1%
Phone #          284          61.2%
Family Info      235          50.6%
Email            249          53.7%
Zip Code         227          48.9%
Date of Birth    155          33.4%
Highly Sensitive Data
Category          # of Doxes   % of Doxes*
School            48           10.3%
ISP               100          21.6%
Passwords         40           8.6%
Criminal Record   6            1.3%
CCN               20           4.3%
SSN               10           2.6%
*All numbers from 464 manually labeled doxes
28