Cloak of Visibility: Detecting When Machines Browse a Different Web


  1. Cloak of Visibility: Detecting When Machines Browse a Different Web. Luca Invernizzi*, Kurt Thomas*, Alexandros Kapravelos†, Oxana Comanescu*, Jean-Michel Picod*, and Elie Bursztein*. (* Google, Anti-fraud and abuse research; † North Carolina State University)

  2. Web cloaking: cloaking site

  3. Web cloaking. Search: effective for search engine optimization. Ads: effective to infringe policies. Malware: effective to evade security crawlers.

  4. Responsive design vs cloaking. This is not cloaking.

  5. Responsive design vs cloaking. 404: this is cloaking.

  6. Research goals: keep up with the arms race; identify trends; explore alternatives.

  7. Blackmarket investigation: acquired top 10 cloaking software samples. Quotes: “Can’t go wrong with Cloaky McCloakyFace.” “I swear by NowYouSeeMe!”

  8. $3500+ cloaking software: HTTP reverse proxy. Decision based on: network, browser, browsing context.

  9. $3500+ cloaking software: the admin interface configures and generates the HTTP reverse proxy.

  10. Admin interface. Input: keywords => http://money.site. Features: find similar sites through SERPs; content/template spinning; drip-feeding. Added services: plagiarism detection; SERP ranking.

  11. Cloaking techniques

  12. Technique: referer-based cloaking. Three example requests: (a) GET / with Referer: blank; (b) GET / with Referer: ... tiffany + cheap ...; (c) GET / with Referer: ... tiffany ...
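A minimal sketch of the kind of decision the three requests above probe. The slide text does not say which request gets which page, so the rule here (serve the cloaked page only when every targeted keyword appears in the referring search query) is an assumption, as are the name `choose_page`, the keyword set, and the `q` query parameter.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical keyword list, borrowed from the slide's example queries.
TARGET_KEYWORDS = {"tiffany", "cheap"}

def choose_page(referer):
    """Return 'cloaked' or 'benign' based only on the Referer header."""
    if not referer:
        # Blank Referer: a direct visit or a simple crawler.
        return "benign"
    # Assumption: the search query travels in a "q" parameter.
    query = parse_qs(urlparse(referer).query).get("q", [""])[0]
    terms = set(query.lower().split())
    # Assumption: all target keywords must be present in the query.
    return "cloaked" if TARGET_KEYWORDS <= terms else "benign"
```

A crawler that sends no Referer, or one that does not match the targeted query, would only ever see the benign page under this rule.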

  13. Technique: IP blacklisting. 51m blacklisted IPs; 983 subnets; 30 security companies; 2 hacking collectives; 3 proxy networks; 122 entities (companies, universities, registrars).
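A blacklist of subnets can be checked with Python's standard `ipaddress` module. This is only a sketch: the subnets below are reserved documentation ranges standing in for real blacklist entries, and `is_blacklisted` is a hypothetical name.

```python
import ipaddress

# Placeholder subnets (documentation ranges), not entries from any real blacklist.
BLACKLISTED_SUBNETS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]

def is_blacklisted(ip):
    """Return True if the visitor IP falls inside any blacklisted subnet."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLACKLISTED_SUBNETS)
```

A linear scan is fine for a handful of subnets; a list with millions of entries would more plausibly use a prefix tree or sorted ranges.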

  14. Crowdsourced blacklist: 50k blacklisted IPs; honeypot; $350+ subscription.

  15. Technique: rDNS cloaking. Query: host 66.249.66.1? Answer: crawl.googlebot.com. Crawlers matched: Google (.*1e100.*, .*google.*), Microsoft, Yahoo, Yandex, Baidu, Ask, Rambler, DirectHit, Teoma.
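The reverse-DNS check can be sketched as a PTR lookup followed by a regular-expression match. The two patterns are the ones listed on the slide for Google; the function names are hypothetical, and how real cloaking software wires this together is not shown in the slides.

```python
import re
import socket

# Patterns quoted on the slide for Google crawlers.
GOOGLE_PATTERNS = [re.compile(r".*1e100.*"), re.compile(r".*google.*")]

def hostname_is_google(hostname):
    """Return True if a reverse-DNS hostname matches a Google pattern."""
    name = hostname.lower()
    return any(p.fullmatch(name) for p in GOOGLE_PATTERNS)

def ip_is_google(ip):
    """Resolve the PTR record for an IP and test the resulting hostname."""
    try:
        hostname, _aliases, _addrs = socket.gethostbyaddr(ip)
    except OSError:
        # No PTR record or lookup failure: treat as not a crawler.
        return False
    return hostname_is_google(hostname)
```

For example, `hostname_is_google("crawl.googlebot.com.")` is true because the name contains `google`.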

  16. Technique: browsing pattern cloaking. GET / returns Set-Cookie: now(); a later GET /clicked follows.
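One way to read this slide is that the first request stamps the visitor with a timestamp cookie, and the cloaked content is released only on a later click that carries the cookie. The sketch below models that reading as a pure function. The delay thresholds, the return values, and the name `handle_request` are assumptions, not details from the slides.

```python
# Hypothetical thresholds: clicks faster or much slower than these look automated.
MIN_DELAY_S = 1.0
MAX_DELAY_S = 600.0

def handle_request(path, cookie_ts, now):
    """Return (page, cookie_value) for one request.

    cookie_ts is the timestamp stored in the visitor's cookie, or None.
    """
    if path == "/":
        # First visit: serve the benign page and set the cookie to now().
        return "benign", now
    if path == "/clicked" and cookie_ts is not None:
        elapsed = now - cookie_ts
        if MIN_DELAY_S <= elapsed <= MAX_DELAY_S:
            # Landing page was visited first and the timing looks human.
            return "cloaked", cookie_ts
    # No cookie, odd timing, or unknown path: stay benign.
    return "benign", cookie_ts
```

A crawler that fetches `/clicked` directly, without cookies, never reaches the cloaked branch in this model.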

  17. More techniques: User-Agent; JS support; Flash/JS fingerprints; geolocation at country, city, and carrier level.

  18. Prevalence and dominant techniques. Is this cloaking? (404) How do they cloak?

  19. Browser farm, three client profiles. Pretend Google bots: User-Agent: GoogleBot; Referer: blank; Google IP. Simple honey clients: User-Agent: Chrome; Referer: blank, or simple; cloud provider IPs. Realistic honey clients: User-Agent: Chrome; Referer: context-aware; residential and mobile IPs. Slide labels: “wget”, “wget”, “I’m real!”

  20. Features. Syntactic: content similarity (HTML), screenshot similarity (image). Semantic: topic similarity (HTML), screenshot topic similarity (image).
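As an illustration of a syntactic content-similarity feature, the sketch below compares the text served to two client profiles with Jaccard similarity over word shingles. This is a swapped-in, simple measure chosen for clarity. The slides do not say which similarity function the authors use, and the names `shingles` and `jaccard` are hypothetical.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles in a text (at least one shingle)."""
    words = text.lower().split()
    count = max(len(words) - k + 1, 1)
    return {tuple(words[i:i + k]) for i in range(count)}

def jaccard(text_a, text_b, k=3):
    """Jaccard similarity of two texts' shingle sets, in [0, 1]."""
    a, b = shingles(text_a, k), shingles(text_b, k)
    return len(a & b) / len(a | b)
```

A score near 1 means both profiles saw nearly the same text; a score near 0 between the crawler's view and a realistic client's view is a signal worth feeding to a classifier.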

  21. Classification: 82% true positive rate; 0.9% false positive rate. 95k labeled samples: 75k legitimate websites (Alexa) + 20k cloaked storefronts.
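For reference, the two rates are defined from confusion-matrix counts as below. The counts in the example are illustrative only: they are what the slide's rates would imply if applied directly to 20k cloaked and 75k legitimate samples, not numbers reported in the slides.

```python
def rates(tp, fn, fp, tn):
    """Return (true positive rate, false positive rate)."""
    tpr = tp / (tp + fn)  # share of cloaked pages that are flagged
    fpr = fp / (fp + tn)  # share of legitimate pages wrongly flagged
    return tpr, fpr

# Illustrative counts (assumption): 20,000 cloaked and 75,000 legitimate pages.
tpr, fpr = rates(tp=16_400, fn=3_600, fp=675, tn=74_325)
print(f"TPR={tpr:.1%} FPR={fpr:.1%}")
```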

  22. Prevalence. 4.9%: cloaking pages in Google AdWords, for health and software ads. 11.7%: cloaking pages in Google Search, for luxury storefronts keywords.

  23. Traditional techniques: only IP, Referer, and User-Agent. Search: 1 out of 5. Ads: 1 out of 4.

  24. Current techniques: JavaScript support. Search: half. Ads: 1 out of 4.

  25. Current techniques: wait for click. Search: 1 out of 10. Ads: 1 out of 5.

  26. Delivery: same-page cloaking (uncloaked vs cloaked). Search: 1 out of 5. Ads: 2 out of 3.

  27. Delivery: 40x/50x errors to bots (404). Search: 1 out of 7. Ads: 1 out of 8.

  28. Future: client-side detection. Search/Ads links add a parameter with the topics found by the bot. Check that the page matches the same topics.
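The proposal above can be sketched in two steps: tag the outgoing link with the topics the crawler saw, then compare them with the topics of the page the user's browser actually loads. The parameter name `topics`, the overlap threshold, and both function names are assumptions; topic extraction itself is out of scope here and the topics are passed in as plain sets.

```python
from urllib.parse import urlencode, urlparse, parse_qs

def tag_link(url, bot_topics):
    """Append the crawler-observed topics to a search/ads link."""
    sep = "&" if urlparse(url).query else "?"
    return url + sep + urlencode({"topics": ",".join(sorted(bot_topics))})

def page_matches(tagged_url, page_topics, min_overlap=0.5):
    """Client-side check: does the loaded page cover the expected topics?"""
    raw = parse_qs(urlparse(tagged_url).query).get("topics", [""])[0]
    expected = {t for t in raw.split(",") if t}
    if not expected:
        return True  # nothing to compare against
    overlap = len(expected & set(page_topics)) / len(expected)
    return overlap >= min_overlap
```

For example, a link tagged with `{"health", "vitamins"}` passes the check for a page about health, vitamins, and diet, and fails for a page about luxury handbags.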

  29. Takeaways. Prevalence: 5% of ads and 12% of search results for cloaking-prone keywords cloak. Techniques: IP/User-Agent/Referer only gets ⅕ of cloaking. Moving forward: client-side, semantic features needed for hard cases.

  30. Thank you! Luca Invernizzi invernizzi@google.com
