Longtime Behavior of Harvesting Spam Bots

  1. Longtime Behavior of Harvesting Spam Bots. Oliver Hohlfeld (TU Berlin / DT Labs), Thomas Graf (Modas GmbH), Florin Ciucu (TU Berlin / DT Labs). IMC’12.

  4. Why you? Scope: address harvesting from public web sites. Image source: http://www.flickr.com/photos/twistermc/3382403844/ (CC BY-SA 2.0)

  8. Approach. Our infrastructure: 9 web sites (1 US), database, SMTP servers. Address harvester: web crawler collects addresses from the web sites over HTTP. Harvester and spammer: addresses exchanged for money. Spammer: might pay botmaster to send spam e-mail via a botnet, which arrives at our SMTP servers.

  17. Host Properties. How many harvesting hosts? > 1k. Geolocation? Figure: number of distinct IPs by country of the requesting IP (countries shown: DE, US, GB, CN, NL, ES, CI, RO, TW, MY; bar labels 60.6% and 2%). Figure: spam e-mails by country (countries shown: RO, BG, DE, NL, CN, US, PL, VN, PT, CH; bar labels 46%, 26%, 10%; one bar annotated "1 Host"). 24 massive harvesting hosts in Romania (≈ 10k page requests / day). How are they connected? 73% hosted in ADSL / cable networks. Using Tor anonymity service? No.

  23. Blocking. Does blacklisting help? → Yes (26% of hosts blacklisted at access time). HTTP user agent string fingerprinting? Variability might imply only few active parties. “Java/1.6.0_17” UA: 3% of harvesting hosts, 88% of harvesting page requests, 55% of total spam volume, 99.9% of Romanian harvesting bots. → Blocking certain user agent strings currently helps.
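The user agent observation on the Blocking slide can be illustrated with a minimal sketch of a server-side check. It is not the authors' tooling. It assumes the UA string on the slide is the standard Java client string `Java/1.6.0_17` (underscore where the transcript shows a space). The blocklist, the function names, and the 403 response are all hypothetical choices made for illustration.

```python
import re

# Hypothetical blocklist. Only the Java/1.6.0_17 string is taken from the slide;
# a real deployment would maintain and review its own list.
BLOCKED_UA_PATTERNS = [re.compile(r"^Java/1\.6\.0_17$")]


def is_blocked_user_agent(user_agent: str) -> bool:
    """Return True if the User-Agent value matches a blocked pattern."""
    ua = user_agent.strip()
    return any(pattern.search(ua) for pattern in BLOCKED_UA_PATTERNS)


def status_for_request(headers: dict) -> int:
    """Return 403 for blocked user agents and 200 otherwise."""
    # HTTP header names are case-insensitive, so normalise them before lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    if is_blocked_user_agent(normalised.get("user-agent", "")):
        return 403
    return 200
```

The slide itself says such blocking "currently" helps, which fits the fact that a user agent string is sent by the client and can be changed at any time.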

  24. Proxies Revisited: Search Engines. Search engines exploited for malicious activities. Also used by harvesters?

  29. Proxies Revisited: Search Engines. Our infrastructure: 9 web sites (1 US). Search engine: web crawler fetches the web sites over HTTP. Address harvester (ECrawl, ...): obtains addresses from the search engine. ECrawl v2.63: “Access to the Google cache (VERY fast harvesting)”. Fast Email Harvester 1.2: “collector supports all major search engines, such as Google, Yahoo, MSN”.

  31. Proxies Revisited: Search Engines. Harvesting via search engines (ECrawl, ...): 0.5% of addresses spammed, 0.2% of total spam. → You don’t want to block Google!

  32. Address Usage. Figure: CDF of address turnaround time in days (x-axis from 0.1 to 1000) for the full data set and for search engines; annotation: 70% < 11 days. 50% spammed < 4 days (general), 11 days (search engines).
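The CDF and median figures on the Address Usage slide can be computed as in the short sketch below. It assumes that turnaround time means the number of days between an address being harvested and the first spam it receives. The sample values are invented for illustration and are not the study's data. The helper name `empirical_cdf` is hypothetical.

```python
from bisect import bisect_right
from statistics import median


def empirical_cdf(samples, x):
    """Return the fraction of samples that are less than or equal to x."""
    ordered = sorted(samples)
    return bisect_right(ordered, x) / len(ordered)


# Made-up turnaround times in days, one value per spammed address.
turnaround_days = [0.5, 1, 2, 4, 9, 40]

# Median: the point below which 50% of the addresses had been spammed.
median_days = median(turnaround_days)

# Share of addresses whose first spam arrived within 4 days of harvesting.
share_within_4_days = empirical_cdf(turnaround_days, 4)

print(median_days, share_within_4_days)
```

Running the same two calls separately on the full data set and on the search-engine subset would give the kind of comparison the slide reports.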
