Needle in a Haystack: Tracking Down Elite Phishing Domains in the Wild
Ke Tian, Steve T.K. Jan, Hang Hu, Danfeng Yao, Gang Wang
Computer Science, Virginia Tech
Phishing is a Big Threat
• Phishing: a fraudulent attempt to obtain credentials (e.g., passwords)
• Big threat: an estimated $30M in losses in 2017 [1]
  • The 2014 Yahoo data breach affected 500 million Yahoo! user accounts
  • Ubiquiti Networks lost $46.7M to scammers in 2015
• Exploiting the human factor is easier than exploiting system vulnerabilities
[1] Internet Crime Report, FBI, 2017.
Some Phishing Websites are Easy to Tell
• Phishing is a long-standing problem
• Good news: some phishing websites are easy to detect
  • http://account-updates-center-service.beedoces.com.br: the URL is not related to PayPal, so it is phishing
  • http://178.128.85.7/banks/National: the URL does not include a domain name, so it is phishing
More Sophisticated Phishing Example
• http://www.apple.com vs. http://www.apple.com: the two URLs look identical, but one of them contains a different, lookalike character
• http://get.adoḅe.com/es/flashplayer
• This is an IDN (Internationalized Domain Name) homograph attack
• Homograph domain squatting exploits the fact that many different characters look alike
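A minimal Python sketch (not the authors' tool) of how an IDN homograph hides behind an ASCII-looking name: it lists the non-ASCII characters in a domain and shows the punycode ("xn--") form that resolvers actually see. The Cyrillic spoof of apple.com is a hypothetical example; note that Python's built-in `idna` codec implements IDNA 2003, whereas registries today follow IDNA 2008.

```python
import unicodedata


def non_ascii_chars(domain: str) -> list:
    """Return (character, Unicode name) pairs for every non-ASCII character in the domain."""
    return [(ch, unicodedata.name(ch, "UNKNOWN")) for ch in domain if ord(ch) > 127]


def to_ascii(domain: str) -> str:
    """Convert a Unicode domain to its ASCII (punycode) form with the built-in IDNA 2003 codec."""
    return domain.encode("idna").decode("ascii")


# Hypothetical spoof: the first "a" is U+0430 CYRILLIC SMALL LETTER A, not Latin "a".
spoof = "www.\u0430pple.com"
print(non_ascii_chars(spoof))  # [('а', 'CYRILLIC SMALL LETTER A')]
print(to_ascii(spoof))         # the middle label becomes an "xn--" label instead of "apple"
```

A domain with no non-ASCII characters cannot be an IDN homograph, although ASCII lookalikes such as faceb00k.com still can.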
How can we systematically capture these sophisticated phishing websites in practice?
This Study
• We focus on squatting phishing domains
  • Web content: phishing content that mimics real websites
  • Domain name: a “squatting” domain that impersonates a popular brand
• Research questions
  • How can we systematically detect squatting phishing domains in practice?
  • What types of impersonation/evasion techniques do they use?
  • How effective are existing blacklists at detecting them?
• Large-scale empirical measurements
  • Search over 224 million DNS records
  • 702 popular brands
Outline
• Introduction
• Detection methodology
  • Detect squatting domains
  • Detect phishing pages under squatting domains
• Measuring squatting-based phishing
• Conclusion
Detection Methodology
• Our detection methodology is based on a series of filtering steps:
  • Input: 224,810,532 DNS records and 702 popular brands
  • Squatting domain detection → 657,663 squatting domains
  • Phishing classifier → 1,741 phishing pages
  • Manual check → confirmed phishing: 857 on web, 908 on mobile
Detect Squatting Domains
• Goal: detect squatting domains that impersonate brands
• Given a brand (e.g., facebook.com), search the DNS records for its squatting domains
• We capture five types of squatting domains:
  1. Homograph: looks similar to the target domain (faceb00k.com)
  2. Bits: flips one bit of the target domain (facebnok.com)
  3. Typo: mimics a mistyped version of the target domain (fcaebook.com)
  4. Combo: combines the target domain with other strings (facebook-stroty.com)
  5. WrongTLD: uses a different TLD from the target domain (facebook.audi)
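The five squatting types above can be sketched as follows. This is a simplified illustration, not the authors' implementation: the homoglyph map is a tiny hypothetical subset, "Typo" covers only adjacent-character swaps, and every name is assumed to be a plain "label.tld" (no subdomains or multi-part suffixes such as .com.br).

```python
import itertools
import string
from typing import Optional, Set

# Hypothetical ASCII lookalike map; a real homograph table would be far larger.
HOMOGLYPHS = {"o": "0", "l": "1", "i": "1"}
DNS_CHARS = set(string.ascii_lowercase + string.digits + "-")


def homographs(label: str) -> Set[str]:
    """All variants that replace any subset of characters with a lookalike."""
    options = [(ch, HOMOGLYPHS[ch]) if ch in HOMOGLYPHS else (ch,) for ch in label]
    return {"".join(p) for p in itertools.product(*options)} - {label}


def bit_flips(label: str) -> Set[str]:
    """Variants with exactly one bit flipped in one character, kept only if still a valid DNS character."""
    out = set()
    for i, ch in enumerate(label):
        for bit in range(8):
            flipped = chr(ord(ch) ^ (1 << bit))
            if flipped in DNS_CHARS:
                out.add(label[:i] + flipped + label[i + 1:])
    return out


def typos(label: str) -> Set[str]:
    """Adjacent-character swaps, one simple kind of typo."""
    return {
        label[:i] + label[i + 1] + label[i] + label[i + 2:]
        for i in range(len(label) - 1)
        if label[i] != label[i + 1]
    }


def classify(candidate: str, brand: str) -> Optional[str]:
    """Return the squatting type of `candidate` relative to `brand`, or None."""
    c_label, _, c_tld = candidate.lower().rpartition(".")
    b_label, _, b_tld = brand.lower().rpartition(".")
    if c_label == b_label:
        return "WrongTLD" if c_tld != b_tld else None
    if c_label in homographs(b_label):
        return "Homograph"
    if c_label in bit_flips(b_label):
        return "Bits"
    if c_label in typos(b_label):
        return "Typo"
    if b_label in c_label:
        return "Combo"
    return None


for d in ["faceb00k.com", "facebnok.com", "fcaebook.com", "facebook-stroty.com", "facebook.audi"]:
    print(d, classify(d, "facebook.com"))  # Homograph, Bits, Typo, Combo, WrongTLD
```

At the scale of hundreds of millions of DNS records, one plausible design is to precompute the homograph, bit-flip, and typo sets per brand and match them by set lookup, leaving only the Combo check to need a substring scan.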
Detect Squatting Domains
• 224,810,532 DNS records → 657,663 squatting domains
• Crawl the web and mobile versions of pages that are still alive
• Dynamic crawler: loads JavaScript and follows redirections
• 6,115 squatting domains (1.7%) redirect to the original brand
  • Some businesses purchase squatting domains to protect their own customers
  • Example: the squatting domain pricelin.com redirects to the original brand priceline.com
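Separating brand-owned redirects from everything else could be sketched like this, assuming the dynamic crawler (e.g., a headless browser) reports the URL it finally landed on after JavaScript and HTTP redirects. The function names are hypothetical.

```python
from urllib.parse import urlsplit


def final_host(url: str) -> str:
    """Lower-cased hostname of a URL (urlsplit already lower-cases it), without a trailing dot."""
    return (urlsplit(url).hostname or "").rstrip(".")


def redirects_to_brand(final_url: str, brand_domain: str) -> bool:
    """True if the crawler's final landing page is the brand's domain or one of its subdomains."""
    host = final_host(final_url)
    brand = brand_domain.lower()
    return host == brand or host.endswith("." + brand)


print(redirects_to_brand("https://www.priceline.com/", "priceline.com"))  # True
print(redirects_to_brand("http://pricelin.com/login", "priceline.com"))   # False
```

Matching on "." + brand rather than a bare suffix keeps a lookalike host such as priceline.com.example.net from being counted as the brand.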
Phishing Classifier
• Goal: classify phishing pages under squatting domains
• Ground-truth data:
  • 1,731 phishing pages from PhishTank (manually confirmed)
  • 1,565 benign pages from squatting domains (manually confirmed)
• Our classifier is motivated by observations of evasion techniques:
  1. Layout obfuscation
  2. String obfuscation
  3. Code obfuscation
Layout Obfuscation
• Change the style/color/layout of the target brand's website
• Evade screenshot-similarity-based detection methods
[Figure: phishing websites vs. the target brand; some are detected by existing methods, while others are not]
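To see why restyling defeats screenshot similarity, here is a small sketch of one common perceptual-hash approach, average hash (aHash) compared by Hamming distance. It is an assumed stand-in, not necessarily what existing detectors use; screenshots are assumed to be already downscaled to a small grayscale grid, and the `max_distance` threshold is hypothetical.

```python
from typing import List


def average_hash(pixels: List[List[float]]) -> List[int]:
    """aHash: one bit per pixel, set when the pixel is brighter than the image mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]


def hamming(a: List[int], b: List[int]) -> int:
    """Number of positions where two equal-length hashes differ."""
    if len(a) != len(b):
        raise ValueError("hashes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def looks_like_brand(page, brand, max_distance: int = 10) -> bool:
    """Hypothetical screenshot-similarity detector: flag pages whose hash is close to the brand's."""
    return hamming(average_hash(page), average_hash(brand)) <= max_distance


brand_shot = [[0, 255], [255, 0]]
copied = [[0, 255], [255, 0]]
recolored = [[255, 0], [0, 255]]  # same content, inverted colors
print(hamming(average_hash(copied), average_hash(brand_shot)))     # 0
print(hamming(average_hash(recolored), average_hash(brand_shot)))  # 4
```

Inverting the colors of this toy 2x2 "screenshot" flips every bit, so a page that still shows the same login form ends up at the maximum distance and slips past any fixed threshold.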
String/Code Obfuscation
• Hide important text and keywords in the HTML source code
• Evade keyword-similarity-based or source-code-similarity-based detection
• Target brand HTML: <title> Log in to your PayPal </title>
  • A phishing page that copies this title verbatim can be detected by keyword-similarity-based methods
• String obfuscation in phishing HTML: <title> Log in to your PayPa1 </title>
• Code obfuscation in phishing HTML: <script> String.fromCharCode(50) + “a” + ….
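A small sketch of why both tricks defeat a naive keyword matcher over the HTML source. The helper names are hypothetical, and because the slide's code-obfuscation snippet is elided, the JavaScript below is a made-up example; `decode_from_char_code` handles only `String.fromCharCode` calls with numeric literal arguments.

```python
import re


def naive_brand_match(html: str, brand: str) -> bool:
    """Case-insensitive literal search for a brand keyword in raw HTML."""
    return brand.lower() in html.lower()


def decode_from_char_code(js: str) -> str:
    """Replace String.fromCharCode(n, n, ...) calls with numeric literals by the string they build."""
    def repl(match):
        codes = [int(x) for x in match.group(1).split(",") if x.strip()]
        return '"' + "".join(chr(c) for c in codes) + '"'

    return re.sub(r"String\.fromCharCode\(([\d\s,]+)\)", repl, js)


original = "<title> Log in to your PayPal </title>"
string_obf = "<title> Log in to your PayPa1 </title>"
code_obf = "<script>document.title = String.fromCharCode(80, 97, 121) + 'Pal';</script>"

print(naive_brand_match(original, "paypal"))    # True
print(naive_brand_match(string_obf, "paypal"))  # False
print(naive_brand_match(code_obf, "paypal"))    # False
print(naive_brand_match(decode_from_char_code(code_obf), "paypal"))  # still False
```

Even after partial deobfuscation the keyword is split across two string literals, which illustrates why matching what the user actually sees on screen is more robust than matching the source.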
Our Design
• Intuition 1: Phishing pages must be visually displayed to users
• Extract keywords from their screenshots with OCR
  • Tesseract OCR: extract keywords from the image
  • NLTK spell check: correct OCR errors in the extracted keywords
[Figure: OCR keyword list (Paypol, Email, passward, ……) → NLTK spell check → corrected keyword list (Paypal, Email, password, ……)]
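The slides pair Tesseract OCR with NLTK spell checking. As a dependency-free stand-in for the spell-check step only (OCR itself is out of scope here), this sketch snaps each OCR token to the closest word in a small keyword vocabulary using the standard library's `difflib.get_close_matches`. The vocabulary and the 0.6 cutoff are assumptions, not the authors' settings.

```python
import difflib

# Hypothetical vocabulary of phishing-related keywords (not the authors' list).
VOCABULARY = ["paypal", "email", "password", "login", "account", "apple", "sign"]


def correct_keywords(ocr_words, vocabulary=VOCABULARY, cutoff=0.6):
    """Map each OCR token to its closest vocabulary word, or keep it (lower-cased) if none is close."""
    corrected = []
    for word in ocr_words:
        w = word.lower()
        match = difflib.get_close_matches(w, vocabulary, n=1, cutoff=cutoff)
        corrected.append(match[0] if match else w)
    return corrected


print(correct_keywords(["Paypol", "Email", "passward"]))  # ['paypal', 'email', 'password']
```

Restricting correction to a closed vocabulary means typical OCR slips such as "Paypol" and "passward" are snapped back to the brand and credential keywords the classifier looks for.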
Our Design Cont.
• Intuition 2: Phishing pages contain forms to collect user credentials
• Extract keywords from HTML forms
• Use text-based features from the source code as a complement
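A minimal sketch of intuition 2 using the standard library's `html.parser`: it collects visible text and a few attribute values only from inside `<form>` elements. The attribute list (name, placeholder, type, value, aria-label) and the class and function names are assumptions; the authors' form features may differ.

```python
from html.parser import HTMLParser


class FormKeywordExtractor(HTMLParser):
    """Collect lower-cased text and selected attribute values that appear inside <form> elements."""

    ATTRS = ("name", "placeholder", "type", "value", "aria-label")

    def __init__(self):
        super().__init__()
        self.depth = 0  # how many <form> elements we are currently inside
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.depth += 1
        if self.depth > 0:
            for key, val in attrs:
                if key in self.ATTRS and val:
                    self.keywords.append(val.lower())

    def handle_endtag(self, tag):
        if tag == "form" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.keywords.extend(data.lower().split())


def form_keywords(html: str) -> list:
    parser = FormKeywordExtractor()
    parser.feed(html)
    parser.close()
    return parser.keywords


page = ('<p>Welcome</p><form action="/x"><label>Email</label>'
        '<input type="text" name="email">'
        '<input type="password" name="pass" placeholder="Password">'
        '<button>Sign in</button></form>')
print(form_keywords(page))
```

Text outside the form (here "Welcome") is ignored, so the features focus on what the page actually asks the user to enter.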
Ground Truth Evaluation
• Feed features to machine learning classifiers
  • Image (OCR) features, form features, text-based features
  • Naive Bayes, KNN, and Random Forest
• Results of 10-fold cross-validation:

  Classifier      False Positive   False Negative   AUC
  Naive Bayes     0.5              0.05             0.64
  KNN             0.04             0.1              0.92
  Random Forest   0.03             0.06             0.97

• Random Forest is highly accurate
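The table reports false positive and false negative rates plus AUC. The slides do not spell out the definitions, so this sketch assumes the standard ones: FPR = FP / (FP + TN), FNR = FN / (FN + TP), and AUC as the probability that a randomly chosen phishing page scores higher than a randomly chosen benign page. In 10-fold cross-validation these would be computed on each held-out fold; the toy labels and scores below are made up.

```python
from typing import Sequence, Tuple


def error_rates(labels: Sequence[int], preds: Sequence[int]) -> Tuple[float, float]:
    """False-positive and false-negative rates, with 1 = phishing and 0 = benign."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fpr = fp / (fp + tn) if fp + tn else 0.0
    fnr = fn / (fn + tp) if fn + tp else 0.0
    return fpr, fnr


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the fraction of (phishing, benign) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
preds = [1 if s >= 0.5 else 0 for s in scores]
print(error_rates(labels, preds))  # (0.5, 0.5)
print(auc(labels, scores))         # 0.75
```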
Detection in Practice
• Input: 224,810,532 DNS records, 702 popular brands
• Squatting domains: 657,663
• Detected phishing pages: 1,741
• Confirmed phishing pages on web and mobile: 1,175
  • On web: 857 (web only: 267)
  • On mobile: 908 (mobile only: 318)
  • On both: 590
• Squatting phishing websites indeed exist
• There are more phishing websites on mobile
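The confirmed counts are consistent with a two-set overlap. This short arithmetic check uses only the numbers from the slide; the function name is illustrative.

```python
def overlap_totals(web_only: int, mobile_only: int, both: int) -> dict:
    """Recover per-platform and overall totals from a two-set (web/mobile) breakdown."""
    web = web_only + both
    mobile = mobile_only + both
    return {"web": web, "mobile": mobile, "union": web + mobile - both}


print(overlap_totals(267, 318, 590))  # {'web': 857, 'mobile': 908, 'union': 1175}
```

This is why 857 + 908 exceeds the 1,175 confirmed pages: the 590 pages seen on both platforms would otherwise be counted twice.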
Can Current Blacklists Detect Them?
• Checked the pages against 70+ phishing blacklists, including PhishTank, eCrimeX, and VirusTotal
• We reported the pages; over 90% were still live over a month later
• Existing blacklists/tools are not yet capable of capturing squatting phishing
[Figure: number of pages reported to PhishTank, VirusTotal, and eCrimeX, and number that evaded these blacklists]
Squatting Domain Types
• Combo squatting domains contain the largest number of phishing pages
• Bits and homograph squatting domains are hard to register
[Figure: number of phishing pages per squatting type (Homograph, Bits, Typo, Combo, WrongTLD), web vs. mobile]
Example Study: Uber
• Attackers steal Uber truck drivers' accounts
• Squatting domain: go-uberfreight.com; target domain: freight.uber.com
Example Study: Office 365
• Attackers compromise users' Office 365 accounts
• Squatting domain: outlook-office365.net; target domain: office365.com
Conclusion
• An extensive measurement of squatting phishing domains
  • From 224,810,532 DNS records and 700+ brands
  • Detected and identified 1,175 squatting phishing pages
• Open-sourced our tool at: https://github.com/SquatPhish
• Future work
  • Adversarial attacks on OCR-based phishing detection
  • Deploying the system for long-term measurement
Thank You
APPENDIX
Evasions in Squatting Phishing
• Layout obfuscation: average Hamming distance of 28.5
• String obfuscation: adopted by 68%
• Code obfuscation: adopted by 35%
• Obfuscation is common in squatting phishing
IP Location
• Checked the geolocation of 1,021 IP addresses, hosted in 53 different countries
• The U.S. hosts the most sites, followed by Germany
False Positive Prediction
• Example: http://paypal.me