Melting the Snow: Detecting Snowshoe Spam Domains Using Active DNS Measurements

Olivier van der Toorn <o.i.vandertoorn@utwente.nl> November 13, 2018 University of Twente, Design and Analysis of Communication Systems NOMS 2018 Melting the Snow: Detecting Snowshoe Spam Domains Using Active DNS Measurements


  1. Melting the Snow: Detecting Snowshoe Spam Domains Using Active DNS Measurements
     Olivier van der Toorn <o.i.vandertoorn@utwente.nl>
     University of Twente, Design and Analysis of Communication Systems
     NOMS 2018, November 13, 2018

  2. Introduction

  10. Snowshoe Spam
      • Spam: few hosts, many messages per host
      • Snowshoe spam: many hosts, few messages per host

  17. Assumption: Background
      Typical usage of SPF:
          v=spf1 a mx ip4:167.160.22.0/24 -all
      SPF record from ‘consultant.com’:
          v=spf1 ip4:213.165.64.0/23 ip4:74.208.5.64/26 ip4:74.208.122.0/26 ip4:212.227.126.128/25 ip4:212.227.15.0/24 ip4:212.227.17.0/27 ip4:74.208.4.192/26 ip4:82.165.159.0/24 ip4:217.72.207.0/27 -all
      Email from ‘paypoint_sanchez@consultant.com’:
          Received-SPF: fail (google.com: domain of paypoint_sanchez@consultant.com does not designate 103.10.4.139 as permitted sender) client-ip=103.10.4.139;
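The SPF failure shown on this slide can be reproduced offline. The following is a minimal sketch, assuming that only the `ip4:` mechanisms matter; a real SPF evaluator must also resolve `a`, `mx`, `include` and the other mechanisms. The function names `ip4_networks` and `ip4_permits` are hypothetical and do not come from the presentation.

```python
import ipaddress

# SPF record from 'consultant.com', copied from the slide.
SPF_RECORD = (
    "v=spf1 ip4:213.165.64.0/23 ip4:74.208.5.64/26 ip4:74.208.122.0/26 "
    "ip4:212.227.126.128/25 ip4:212.227.15.0/24 ip4:212.227.17.0/27 "
    "ip4:74.208.4.192/26 ip4:82.165.159.0/24 ip4:217.72.207.0/27 -all"
)


def ip4_networks(spf_record):
    """Return the networks named by the ip4: mechanisms of an SPF record."""
    networks = []
    for term in spf_record.split():
        mechanism = term.lstrip("+-~?")  # drop an optional SPF qualifier
        if mechanism.lower().startswith("ip4:"):
            networks.append(ipaddress.ip_network(mechanism[4:], strict=False))
    return networks


def ip4_permits(spf_record, client_ip):
    """True if client_ip falls inside any ip4: range of the record."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in ip4_networks(spf_record))


# The sending host from the Received-SPF header is in none of the listed
# ranges, which is consistent with the "fail" result on the slide.
print(ip4_permits(SPF_RECORD, "103.10.4.139"))
```

An address inside one of the listed ranges, for example `212.227.15.1`, is accepted by the same check.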

  22. Hypothesis
      While snowshoe spammers are hard to detect, they still leave a trace in the DNS.
      • Snowshoe spam + SPF
      • → Many hosts + a DNS record for each host or a long SPF record
      • → Domain with many records or long SPF records
      Active DNS measurements are a good way to detect snowshoe spam domains.

  23. Methodology

  27. Overview
      OpenINTEL (DNS data source) → Machine Learning (processing) → Realtime Blackhole List (storage) → SURFnet (validation)

  31. OpenINTEL: Background
      • Active DNS measurement platform
      • Queries more than 60% of registered domain names (in total more than 206 million)
      • Record types queried: A, AAAA, MX, NS, …
      • Every 24 hours a measurement is started

  35. OpenINTEL: Datasets & Features
      37 features
      • Simple: number of MX addresses
      • Complex: number of IP addresses inside an SPF record
      These features are not computed for every domain in OpenINTEL.
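As an illustration of the "complex" feature, the number of IP addresses inside an SPF record can be approximated by summing the sizes of its `ip4:` ranges. This is a sketch under assumptions: the slides do not give the exact feature definition, so the handling here (IPv4 only, no `include` expansion, overlapping ranges counted twice) is a guess, and `count_spf_ip4_addresses` is a hypothetical name.

```python
import ipaddress


def count_spf_ip4_addresses(spf_record):
    """Sum the sizes of all ip4: ranges in an SPF record (assumed feature)."""
    total = 0
    for term in spf_record.split():
        mechanism = term.lstrip("+-~?")  # drop an optional SPF qualifier
        if mechanism.lower().startswith("ip4:"):
            network = ipaddress.ip_network(mechanism[4:], strict=False)
            total += network.num_addresses
    return total


# The "typical usage" record from the Background slide: one /24 range.
typical = "v=spf1 a mx ip4:167.160.22.0/24 -all"

# The 'consultant.com' record from the same slide: nine ranges.
consultant = (
    "v=spf1 ip4:213.165.64.0/23 ip4:74.208.5.64/26 ip4:74.208.122.0/26 "
    "ip4:212.227.126.128/25 ip4:212.227.15.0/24 ip4:212.227.17.0/27 "
    "ip4:74.208.4.192/26 ip4:82.165.159.0/24 ip4:217.72.207.0/27 -all"
)

print(count_spf_ip4_addresses(typical))     # 256
print(count_spf_ip4_addresses(consultant))  # 1408
```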

  36. OpenINTEL: Long Tail Analysis

  37. Machine Learning: the processing step

  40. Machine Learning: 12 algorithms
      We have trained and evaluated 12 Machine Learning algorithms.
      • Training dataset from domains on the long tail which appear in known blacklists.
      The performance of each classifier is compared based on the precision metric.
          Precision = True Positives / (True Positives + False Positives)
      We selected the ‘AdaBoost’ classifier as our classifier of choice, since it had the highest precision (98% with an FPR of 1%).
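The two metrics quoted on this slide can be computed from confusion-matrix counts as follows. This is a minimal sketch; the counts in the example are made up so that they land on the reported 98% precision and 1% FPR, and they are not taken from the presentation.

```python
def precision(true_positives, false_positives):
    """Precision = TP / (TP + FP), as defined on the slide."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("precision is undefined when nothing was flagged")
    return true_positives / flagged


def false_positive_rate(false_positives, true_negatives):
    """FPR = FP / (FP + TN): the share of benign domains wrongly flagged."""
    benign = false_positives + true_negatives
    if benign == 0:
        raise ValueError("FPR is undefined without benign samples")
    return false_positives / benign


# Hypothetical counts, chosen only for illustration.
tp, fp, tn = 98, 2, 198
print(f"precision = {precision(tp, fp):.2%}")       # precision = 98.00%
print(f"FPR = {false_positive_rate(fp, tn):.2%}")   # FPR = 1.00%
```

Precision is a natural selection criterion when the output feeds a blocklist, because it measures how many of the flagged domains are actually malicious.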

  41. Realtime Blackhole List (RBL): the storage step

  42. SURFnet: the validation step

  44. Methodology: Recap
      • Active DNS measurements of more than 60% of registered domain names form the source of our data.
      • We filter out large domains via the Long Tail Analysis.
      • We have selected the AdaBoost classifier as our classifier of choice, since it had the highest precision.
      • The results of our daily detections are stored in an RBL.
      • We have evaluated the RBL in SURFmailfilter.
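The slides do not say how the RBL is served to mail filters. Assuming it follows the conventional DNS-based list scheme (a domain is looked up as a name under the list's zone, and an A record in 127.0.0.0/8 means "listed"), the lookup logic could be sketched as below. The zone `rbl.example.org` and both function names are hypothetical, and no DNS query is sent; the answer is passed in as a list of strings.

```python
import ipaddress


def rbl_query_name(domain, zone):
    """Build the DNS name to query for `domain` in a domain-based list."""
    return f"{domain.strip('.').lower()}.{zone.strip('.').lower()}"


def is_listed(a_records):
    """Interpret the A records returned for an RBL query name.

    By convention a listed entry resolves to an address in 127.0.0.0/8,
    and an unlisted entry returns no A record at all.
    """
    loopback = ipaddress.ip_network("127.0.0.0/8")
    return any(ipaddress.ip_address(record) in loopback for record in a_records)


# Hypothetical zone name; a real deployment would use its own zone.
print(rbl_query_name("Spam-Domain.example.", "rbl.example.org"))
print(is_listed(["127.0.0.2"]))  # answer for a listed domain
print(is_listed([]))             # no answer: not listed
```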
