When threat hunting fails: Identifying malvertising domains using lexical clustering



  1. When threat hunting fails: Identifying malvertising domains using lexical clustering. Tucson, January 9th, 2018

  2. Authors: kitty, Matt Foley, David Rodriguez, Dhia Mahjoub

  3. Agenda: Background; Ad Network Profiling and Filtering; Lexical Clustering; Hosting space and top talkers

  4. Background

  5. Exploit Kits and Malvertising (diagram labels): Victim; Compromised Site; Step 1. Gets lander (proxy); EK Server; Publisher; Ad Net.; Staged Site (Ad)

  6. What is Malvertising (diagram labels): Ad Agencies; DSPs; Ad Exchanges; Ad Networks; Ad Servers; Publishers; Visitors

  7. [no slide text]

  8. Ad Campaign Flow: user visits publisher site; publisher site includes ad network JavaScript; ad network fingerprints and sends user to malvertisement. Examples: tech support scam, Rig Exploit Kit, fake Flash/Java update. (Diagram labels: Publisher Site, Compromised Ad Net.)

  9. Exploit Kits

  10. Tech Support Scams

  11. Fake Flash and Java Updates

  12. Ad Network Profiling and Filtering

  13. Filtering on non-residential IP Address

  14. Proxy Network: Squid proxy; choice of region; rotating IPs (diagram label: 403)

  15. Filtering on non-residential IP Address: browsing the ad network through a DigitalOcean proxy, the GET returns a 403

  16. Attempts with other VPS providers

  17. Attempts with other VPS providers

  18. [no slide text]

  19. [no slide text]

  20. Lexical Clustering

  21. Attention to Details

  22. Fake Flash and Java Updates

  23. [no slide text]

  24. More or Less Traveled Roads

  25. Consider the almighty RegEx. Known keywords: safe, content, build, free, apple, click. Unknown keywords: synonyms, typos

  26. Consider the almighty RegEx: grep "*.fake.*"
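
The keyword approach on the two slides above can be sketched roughly as follows. The keyword list comes from the slide; the function names, the `min_hits` threshold, and the matching rule are assumptions made for illustration, not the authors' implementation.

```python
import re

# Keywords listed on the slide; everything else here is an illustrative assumption.
KEYWORDS = ["safe", "content", "build", "free", "apple", "click"]
KEYWORD_RE = re.compile("|".join(map(re.escape, KEYWORDS)))


def keyword_hits(hostname):
    """Return the distinct known keywords found in a host name, in order of first appearance."""
    hits = []
    for match in KEYWORD_RE.finditer(hostname.lower()):
        if match.group() not in hits:
            hits.append(match.group())
    return hits


def is_suspicious(hostname, min_hits=2):
    """Flag a host name that contains at least `min_hits` distinct known keywords."""
    return len(keyword_hits(hostname)) >= min_hits
```

A host name built from unknown keywords or typos, such as `call-mlcrosoftnw-err81711102.win`, produces no hits at all, which is the limitation the "unknown keywords" column points at.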

  27. Traffic Pattern of Fake Update Sites

  28. Traffic Pattern of Fake Update Sites: look for burst in traffic
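
One simple way to "look for a burst" is to compare each hour's query count for a domain against a short trailing average. The slides do not describe the actual detection rule, so the sketch below is only an assumption: `window`, `factor`, and `min_count` are hypothetical parameters, and the input is assumed to be a list of hourly query counts.

```python
def find_bursts(counts, window=3, factor=5.0, min_count=10):
    """Return indexes whose count is at least `factor` times the mean of the previous `window` counts.

    `counts` is assumed to be a list of per-hour query counts for one domain.
    """
    bursts = []
    for i in range(window, len(counts)):
        baseline = sum(counts[i - window:i]) / window
        # Floor the baseline at 1.0 so a run of zeros does not make every query a burst.
        if counts[i] >= min_count and counts[i] >= factor * max(baseline, 1.0):
            bursts.append(i)
    return bursts
```

For example, `find_bursts([0, 1, 0, 2, 150, 140, 3, 1])` flags only index 4: the following hour is still busy, but by then the trailing average has already absorbed the spike.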

  29. For one word, many

  30. Shingling (Fake Flash and Java Update): host name contentfreeandsafe4update becomes trigrams {'con', 'ont', 'nte', 'ten', 'ent', …, 'ate'}
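
A minimal sketch of the trigram shingling shown on this slide. The exact Jaccard similarity is added here as an assumed way of comparing two shingle sets (it is the quantity MinHash approximates); the function names are illustrative.

```python
def shingles(name, k=3):
    """Return the set of overlapping character k-grams (trigrams by default) of a host name."""
    return {name[i:i + k] for i in range(len(name) - k + 1)}


def jaccard(a, b):
    """Exact Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

Usage: `shingles("contentfreeandsafe4update")` contains `'con'`, `'ont'`, `'nte'`, and ends with `'ate'`, matching the set on the slide.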

  31. Shingling (Fake Flash and Java Update): host name contentfreeandsafe4update becomes trigrams {'con', 'ont', 'nte', 'ten', 'ent', …, 'ate'}, then MinHash, then LSH

  32. Locality Sensitive Hashing (Fake Flash): 3 domains with a lot of shingles in common: contentfreeandsafe4update, contentfreeandforupdate, content4freeandsafeupdate. Shared pieces shown: con, tent, fre, and, saf, dat
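
The MinHash and LSH steps can be sketched with the standard library alone. This is not the authors' pipeline: the seeded SHA-1 hash stands in for a family of hash functions, and `num_perm` and `bands` are hypothetical parameters. Host names that share an identical band of their MinHash signature land in the same bucket and become candidate pairs, so only those pairs need a full comparison.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(name, k=3):
    """Overlapping character k-grams of a host name."""
    return {name[i:i + k] for i in range(len(name) - k + 1)}


def minhash(tokens, num_perm=64):
    """MinHash signature: for each seeded hash, keep the minimum hash value over all tokens.

    Assumes `tokens` is non-empty, i.e. the host name has at least k characters.
    """
    return [
        min(
            int.from_bytes(hashlib.sha1(f"{seed}:{t}".encode("utf-8")).digest()[:8], "big")
            for t in tokens
        )
        for seed in range(num_perm)
    ]


def candidate_pairs(names, num_perm=64, bands=32):
    """Return index pairs (i, j), i < j, whose signatures agree on at least one band."""
    rows = num_perm // bands
    buckets = defaultdict(set)
    for idx, name in enumerate(names):
        sig = minhash(shingles(name), num_perm)
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(idx)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs
```

More bands with fewer rows each make collisions more likely and so catch looser matches; fewer, wider bands are stricter.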

  33. On to production

  34. Clustering Pipeline (diagram labels): Realtime/Batch; count-min sketch; hasher pipeline; Out pipeline; Cluster DB; Analyst Dashboard. Example host name: goodnewcontentssafe.download
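
The slide names a count-min sketch but does not say what it counts. As a hedged illustration, the sketch below assumes it tracks approximate query counts per host name in fixed memory; the class, its `width` and `depth` parameters, and the seeded SHA-1 hashing are all assumptions.

```python
import hashlib


class CountMinSketch:
    """Approximate counter: estimates never undercount, but may overcount on hash collisions."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, item):
        # One column per row, chosen by a row-seeded hash of the item.
        for row in range(self.depth):
            digest = hashlib.sha1(f"{row}:{item}".encode("utf-8")).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row, col in self._cells(item):
            self.table[row][col] += count

    def estimate(self, item):
        # The smallest cell is the one least inflated by collisions.
        return min(self.table[row][col] for row, col in self._cells(item))
```

Usage: call `add("goodnewcontentssafe.download")` for each observed query, then `estimate(...)` to decide whether a host name is busy enough to pass on to the hasher stage.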

  35. Payday

  36. Fake Flash and Java Update: Lexical Clustering
      cluster_1: goodnewcontentssafe.download, goodnewfreecontentsload.date, goodnewfreecontentall.trade, ...
      cluster_2: call-mlcrosoftnw-err81711102.win, call-mlcrosoftnw-err99817109.win, call-mlcrosoftnw-err81711101.win, ...
      cluster_3: artificialintelligencesweden.se, artificialintelligencechip.com, artificialintelligence.net.cm, ...
      cluster_4: mkto-sj220048.com, mkto-sj220146.com, mkto-sj220162.com, ...

  37. We need help

  38. Simple Flask App Dashboard

  39. Hosting space and top talkers

  40. Where are these hosted? Any patterns?
      ● Take 1 week’s worth of detections and their hosting space; Jan 1-7
      ● Some hosters are consistently abused: AS12876, FR; AS14618 Amazon AWS; and more. Some IPs are actively hosting thousands of domains for months
      ● Some hosters are highly infested with shady, toxic content; dedicated? AS202023, LLHOST, RO; phishing, tech support scams, fake updates, porn

  41. Who is querying these domains?
      ● Take 1 week’s worth of detections and user IPs; Jan 1-7
      ● 10 busiest hours: 20000+ user IPs querying 2000+ malvertising domains
      ● Some top talker clusters emerge: ranges owned by security companies querying hundreds of domains; some rogue networks querying hundreds of domains
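
Ranking "top talkers" amounts to counting how many distinct detected domains each user IP queried. The sketch below assumes the detections are available as `(user_ip, domain)` pairs; that record shape and the function name are hypothetical, and the same grouping would work with hosting IPs or ASNs as the key.

```python
from collections import defaultdict


def top_talkers(records, n=10):
    """Rank user IPs by the number of distinct malvertising domains they queried.

    `records` is assumed to be an iterable of (user_ip, domain) pairs.
    """
    domains_by_ip = defaultdict(set)
    for user_ip, domain in records:
        domains_by_ip[user_ip].add(domain)
    # Sort by distinct-domain count (descending), then by IP string for a stable order.
    ranked = sorted(domains_by_ip.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(ip, len(domains)) for ip, domains in ranked[:n]]
```

Counting distinct domains rather than raw queries keeps one client hammering a single domain from outranking a scanner that touches hundreds.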

  42. Summary

  43. Summary: grep "*.fake.*"; look for burst in traffic; user IPs; hosting IPs

  44. Current and Future Work: NLP on misspellings and common typos; models to categorize clusters; identifying malicious file hosts using belief propagation

  45. Thank you. Questions? Matt Foley, matfoley@cisco.com; David Rodriguez, davrodr3@cisco.com; Dhia Mahjoub, dmahjoub@cisco.com. We are hiring
