
Web crawler system for collecting malicious activities (FIRST TC Mauritius 2016)

FIRST TC Mauritius 2016. Hisao Nashiwa, Internet Initiative Japan Inc.


  1. Web crawler system for collecting malicious activities
     FIRST TC Mauritius 2016
     Hisao Nashiwa, Internet Initiative Japan Inc.

  2. Who am I?
     Hisao Nashiwa, CISSP
     • Threat analyst at “Internet Initiative Japan Inc.” (“IIJ” for short).
       – IIJ is a Japanese ISP (the first commercial ISP in Japan).
     • Member of the CSIRT team called “IIJ-SECT”.
       – Our jobs include:
         • Malware analysis
         • Forensic investigation
         • Incident response and handling
         • Developing and operating a honeypot and a web crawler system
         • Surveying trends in malware and attack techniques
         • Writing articles for our quarterly report (called “IIR”) and for blogs

  3. Motivation
     • Typical recent malware infection vectors are drive-by downloads and email with malware attached.
     • We want to observe web-based threat trends, especially exploit kit activity.
     • For this reason, we need to crawl a large number of websites used by Japanese users (more than 300,000 sites/day) and detect threats.

  4. Issues
     • Simple web crawling tools do not implement a DOM parser that behaves like Internet Explorer's.
       – wget, curl, and even SpiderMonkey and jsunpack-n cannot process JavaScript content written for Internet Explorer.
       – Thug has a DOM parser, but it is not the same as Internet Explorer's.
     • Several sandbox products can parse the DOM with a real Internet Explorer on a Windows VM, but they take a long time per analysis (5-15 min/website).

  5. Solution
     • We built an in-house light-weight sandbox that runs a real Internet Explorer on a Windows VM to crawl websites (in other words, a web client honeypot).
       – It takes 15-60 sec/website for light-weight analysis.
     • We use a customized proxy server to analyze the crawling traffic.

  6. Flow and Components
     [Diagram: Web crawler system. Components shown: Mgmt server (Controller/DB, redirection detector/analyzer, phase 1 analyzer, phase 2 analyzer); light-weight sandbox (client honeypot) with Windows VM honey clients and a custom MITM proxy; sandbox product; the Internet.]
     (1) The controller orders the target website.
     (2) The client honeypot crawls the website through the proxy. The system then analyzes the traffic and the VM activity (phase 1 analysis).
     (3) If suspicious activity is detected in phase 1 analysis, another sandbox product crawls the same website for deep analysis (takes several minutes).
     (4) The controller collects the sandbox report and the phase 1 analysis result, then runs phase 2 analysis to classify the threat.
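Steps (2) to (4) amount to a two-tier escalation: every site gets the cheap analysis, and only flagged sites reach the slow sandbox. The sketch below illustrates that control flow only. All names in it (`Phase1Result`, `Verdict`, `crawl_one`, and the three callables) are hypothetical and are not taken from the system described in the slides.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Phase1Result:
    url: str
    suspicious: bool
    reasons: List[str] = field(default_factory=list)


@dataclass
class Verdict:
    url: str
    label: str                          # "benign" or a threat name
    sandbox_report: Optional[str] = None


def crawl_one(
    url: str,
    phase1: Callable[[str], Phase1Result],
    deep_sandbox: Callable[[str], str],
    phase2: Callable[[Phase1Result, str], str],
) -> Verdict:
    """Run steps (2)-(4) for one URL ordered by the controller."""
    result = phase1(url)                # (2) light-weight crawl and analysis
    if not result.suspicious:
        return Verdict(url, "benign")   # most sessions stop here
    report = deep_sandbox(url)          # (3) slow deep analysis, flagged sites only
    label = phase2(result, report)      # (4) classify the threat
    return Verdict(url, label, report)
```

Passing the three stages in as callables keeps the sketch independent of any particular sandbox product or proxy.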

  7. Details of analysis
     • Phase 1 analysis
       – HTTP header info
         • FQDN length, destination port number, domain transitions, User-Agent, Content-Type, header length, response size and so on
       – Simple content analysis
         • File magic checking (PE, ZIP (jar, xap) and PDF are suspicious)
         • Decompiling the ActionScript of SWF content with an SWF dumping tool and grepping for suspicious statements (such as XOR operations)
       – Windows VM activity
         • Sysmon log, RAM usage, Java-related processes, ...
         • Flash trace log
     These analyses are applied to all sessions (15-60 sec/website).
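The file magic check above can be illustrated with a few well-known leading byte sequences. This is a minimal sketch, not the crawler's actual code: the function name `suspicious_file_type` and the table layout are assumptions, and only the first bytes of the response body are inspected.

```python
from typing import Optional

# Well-known leading bytes ("magic numbers") for the formats the slide flags.
MAGIC_PREFIXES = (
    (b"MZ", "PE"),            # DOS/PE header of Windows executables and DLLs
    (b"PK\x03\x04", "ZIP"),   # ZIP local file header; jar and xap are ZIP containers
    (b"%PDF-", "PDF"),
)


def suspicious_file_type(body: bytes) -> Optional[str]:
    """Return the flagged format name if the body starts with a known magic."""
    for prefix, name in MAGIC_PREFIXES:
        if body.startswith(prefix):
            return name
    return None
```

Checking the body rather than the Content-Type header matters because a server can label an executable with any content type it likes.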

  8. Details of analysis (cont.)
     • Sandbox analysis
       – We use a commercial sandbox product for automated analysis.
     This analysis is applied to 0.1 percent of all sessions (5-15 min/website).

  9. Details of analysis (cont.)
     • Phase 2 analysis
       – Analyzes the results of phase 1 analysis and the report from the sandbox product.
       – Classifies the threat and identifies the exploit kit by name using our in-house pattern-matching signatures.
       – The false positive rate of pattern matching is fairly low because this analysis is applied only to sessions flagged by phase 1 analysis.
     This analysis is applied to 0.1 percent of all sessions (several sec/website).
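Signature-based classification of a flagged session might look like the sketch below. The slides do not show the in-house signatures, so the `Signature` structure, the `classify` function, and the single `ExampleKit` rule are invented for illustration and do not describe any real exploit kit.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Signature:
    name: str
    url_pattern: "re.Pattern"        # must match at least one URL in the session
    content_types: FrozenSet[str]    # all must appear in the session


# Hypothetical rule for illustration only; not a real exploit kit signature.
SIGNATURES = (
    Signature(
        name="ExampleKit",
        url_pattern=re.compile(r"/index\.php\?[A-Za-z]{10,}="),
        content_types=frozenset({"application/x-shockwave-flash"}),
    ),
)


def classify(session: Iterable[Tuple[str, str]]) -> Optional[str]:
    """session: (url, content_type) pairs recorded for one flagged crawl."""
    pairs = list(session)
    seen_types = {ctype for _, ctype in pairs}
    for sig in SIGNATURES:
        url_hit = any(sig.url_pattern.search(url) for url, _ in pairs)
        if url_hit and sig.content_types <= seen_types:
            return sig.name
    return None
```

Requiring both a URL match and a set of content types is one way to keep a loose URL pattern from firing on its own.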

  10. Websites to crawl
      • We crawl about 400,000 websites/day.
        – .jp domain websites from the Alexa top 1 million (automatic collection)
        – Public offices, local governments, listed companies, media companies, ... (manual collection)
          • Googling for the lists and then parsing them... very tiring.
        – Hot websites from search engine keywords (automatic collection)
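The figures quoted in the slides (about 400,000 websites/day, 15-60 sec/website for phase 1, 0.1 percent escalated to a 5-15 min sandbox run) give a rough idea of the parallelism required. The calculation below is a back-of-the-envelope estimate. It assumes one session per website and evenly spread load, neither of which the slides state, and the helper `concurrent_slots` is a name introduced here.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def concurrent_slots(items_per_day: float, seconds_per_item: float) -> float:
    """Average number of parallel workers needed to keep up with the load."""
    return items_per_day * seconds_per_item / SECONDS_PER_DAY


sites_per_day = 400_000      # from the slides
escalation_rate = 0.001      # 0.1 percent of sessions go to the sandbox

phase1_low = concurrent_slots(sites_per_day, 15)
phase1_high = concurrent_slots(sites_per_day, 60)

sandbox_per_day = sites_per_day * escalation_rate
sandbox_low = concurrent_slots(sandbox_per_day, 5 * 60)
sandbox_high = concurrent_slots(sandbox_per_day, 15 * 60)

print(f"phase 1 VMs needed: {phase1_low:.1f} to {phase1_high:.1f}")      # 69.4 to 277.8
print(f"sandbox runs per day: {sandbox_per_day:.0f}")                    # 400
print(f"sandbox instances needed: {sandbox_low:.1f} to {sandbox_high:.1f}")  # 1.4 to 4.2
```

Under these assumptions, running the slow sandbox on every site would need the same 5-15 min per site for all 400,000 sites, which is why escalating only flagged sessions keeps the deep analysis small.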

  11. Recent observations
      [Figure: Encounter rate of Exploit kit threat in Japan. Vertical axis: encounter rate (%), 0 to 0.009. Horizontal axis: date, 1-Jul-16 to 1-Sep-16. Series: Sundown, KaiXin, Rig, Neutrino.]

  12. Recent activities of Rig exploit kit
      [Figure: Number of defaced websites redirecting to Rig EK. Vertical axis: 0 to 100. Horizontal axis: date, 29-Sep-16 to 14-Oct-16.]

  13. Rig Exploit kit
      • The Rig EK pandemic is a worldwide trend.
      • Heavily obfuscated landing page.
      • Exploits Internet Explorer, Adobe Flash and Silverlight.
      • We observed Locky, Cerber and Ursnif as exploit payloads.

  14. Case: a certain blog defaced and redirected visitors to Rig EK
      http://blogs.XXXXX.com @Mar-2016

  15. Case: a certain blog defaced and redirected visitors to Rig EK (Cont.)
      1. GET http://blogs.XXXXX.com/ HTTP/1.1 200 0 25833 text/html (Compromised webpage)
      2. GET http://blogs.XXXXX.com/wp-includes/js/wp-emoji-release.min.js HTTP/1.1 200 0 6519 application/javascript
      3. GET http://xc.rottencouchtomatoes.com/hlfvviewforumym.php HTTP/1.1 200 0 901 text/javascript (Redirector)
      4. GET http://ef.scber.com/?wXqBcrWVLRbJCII=l3SKfPrfJxzFGMSUb-nJDa9BMEXCRQLPh4SGhKrXCJ-ofSih17OIFxzsmTu2KV_OpqxveN0SZFSOzQfZPVQlyZAdChoB_Oqki0vHjUnH1cmQ9laHYghP7ZWSELQy2AnyyuAUI5kvxhPU6WJVyO1MAwlB4AwSzqrJBKqE HTTP/1.1 200 0 5254 text/html (Infector)
      5. GET http://ef.scber.com/index.php?wXqBcrWVLRbJCII=l3SMfPrfJxzFGMSUb-nJDa9BMEXCRQLPh4SGhKrXCJ-ofSih17OIFxzsmTu2KV_OpqxveN0SZFSOzQfZPVQlyZAdChoB_Oqki0vHjUnH1cmQ9laHYghP7ZWSELQy2AnyyuAUI5kvxhPU6WJVyO1MAwlB4AwSzqrJBKqKp0N6RgBnEB_CbJQlqw-BF3H6PXl5gv2pHn4oieWX_P93mpMmmA HTTP/1.1 200 0 14779 application/x-shockwave-flash
      6. GET http://ef.scber.com/index.php?wXqBcrWVLRbJCII=l3SMfPrfJxzFGMSUb-nJDa9BMEXCRQLPh4SGhKrXCJ-ofSih17OIFxzsmTu2KV_OpqxveN0SZFSOzQfZPVQlyZAdChoB_Oqki0vHjUnH1cmQ9laHYghP7ZWSELQy2AnyyuAUI5kvxhPU6WJVyO1MAwlB4AwSzqrJBKqKp0N6RgBnEB_CbJQlqw-KAWf6PXl5gv2pHn4oieWX_PR3lJImmA HTTP/1.1 200 0 13938 application/x-silverlight-app
      7. GET http://ef.scber.com/index.php?wXqBcrWVLRbJCII=l3SMfPrfJxzFGMSUb-nJDa9BMEXCRQLPh4SGhKrXCJ-ofSih17OIFxzsmTu2KV_OpqxveN0SZFSOzQfZPVQlyZAdChoB_Oqki0vHjUnH1cmQ9laHYghP7ZWSELQy2AnyyuAUI5kvxhPU6WJVyO1MAwlB4AwSzqrJBKqKp0N6RgBnEB_CbJQlqw-fECT6PXl5gv2pHn4oieWX_PJwnJAmmA&dfsdf=11010 HTTP/1.1 200 0 323584 application/x-msdownload

  17. Case: a certain blog defaced and redirected visitors to Rig EK (Cont.)
      • Same HTTP trace as slide 15 (requests 1-7), annotated with the script tag whose source is the redirector URL:
        SCRIPT src = "http://xc.rottencouchtomatoes.com/hlfvviewforumym.php"
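The request sequence in this case shows two of the phase 1 header features at work: domain transitions and content types. The sketch below summarizes a trace along those lines. The helper names and the `RISKY_TYPES` set are assumptions made for illustration, and the long query strings from the trace are shortened to `...` for readability.

```python
from urllib.parse import urlsplit

# Content types treated as risky in this sketch (an assumption, not the
# crawler's actual rule set).
RISKY_TYPES = {
    "application/x-shockwave-flash",
    "application/x-silverlight-app",
    "application/x-msdownload",
}


def host_chain(requests):
    """Hosts in request order, with consecutive repeats collapsed."""
    chain = []
    for url, _ in requests:
        host = urlsplit(url).hostname   # note: hostname is lower-cased
        if not chain or chain[-1] != host:
            chain.append(host)
    return chain


def summarize(requests):
    chain = host_chain(requests)
    return {
        "hosts": chain,
        "domain_transitions": max(len(chain) - 1, 0),
        "risky_types": sorted({c for _, c in requests if c in RISKY_TYPES}),
    }


# Hosts, paths and content types as in the trace; query strings shortened.
trace = [
    ("http://blogs.XXXXX.com/", "text/html"),
    ("http://blogs.XXXXX.com/wp-includes/js/wp-emoji-release.min.js", "application/javascript"),
    ("http://xc.rottencouchtomatoes.com/hlfvviewforumym.php", "text/javascript"),
    ("http://ef.scber.com/?...", "text/html"),
    ("http://ef.scber.com/index.php?...", "application/x-shockwave-flash"),
    ("http://ef.scber.com/index.php?...", "application/x-silverlight-app"),
    ("http://ef.scber.com/index.php?...", "application/x-msdownload"),
]

print(summarize(trace))
```

For this trace the summary reports three hosts, two domain transitions, and all three risky content types, ending in an executable download.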
