Tracing Cross Border Web Tracking
Costas Iordanou, Georgios Smaragdakis, Ingmar Poese, Nikolaos Laoutaris

Web advertising fuels the web

The rise of targeted ads
Why targeted ads?
• Users get relevant ads
• Increase user engagement
• More efficient ad campaigns
• Higher ROI for the advertisers
• Better use of resources
• Etc.
How it works?
• Tracking and profiling users
• Real-time auctions of ads (RTB)
• Cookie synchronization
• Etc.
Example: user typed in “used cars for sale”

The reaction of users and regulators
• Regulators
• Users
• Browser extensions
• Browsers

General Data Protection Regulation - Details
One of the biggest changes with respect to privacy and regulation on the web in the last few years (enforcement date: 25th May, 2018).
In general, the new legislation tries to regulate:
1. how users’ data are collected, processed and stored, and
2. whether they include any sensitive information about the user
Implementation – per member state Data Protection Authority (DPA)
DPA: responsible for complaints, investigations & enforcement
Investigation starting point – the location of the entry-point servers of ad & tracking flows
RQ: How can we identify the physical locations of such servers?

Challenges
1. How to effectively detect ad and tracking related domains in the wild?
2. How to ensure correct geolocation of infrastructure servers?
3. How to ensure that all possible ad and tracking servers are observed?
4. How to maintain a balance between accuracy and scalability?

Why real users instead of just Web crawling?
Real users:
• User interaction
• Geo load balancing

Mapping 3rd party domains to IPs
Chrome Browser Extension
Chrome API event listeners:
• chrome.webRequest.onBeforeSendHeaders
• chrome.webRequest.onCompleted
Example: a visit to http://www.example.com issues requests to tracker.com and analytics.com
Mapping Table - example.com
Domain          IP
tracker.com     213.121.66.99
analytics.com   …

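The table-building step can be sketched as follows. This is a minimal Python illustration, not the extension's code: it assumes each completed request is available as a record with `url` and `ip` fields (the details object passed to `chrome.webRequest.onCompleted` carries fields with those names). The sample records, the first-party IP, and the name `build_mapping_table` are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def build_mapping_table(first_party_url, completed_requests):
    """Map each third-party hostname seen on a page to the server IPs that answered.

    `completed_requests` is an iterable of dicts with "url" and "ip" keys
    (an assumed record shape, modelled on webRequest.onCompleted details).
    """
    first_party_host = urlsplit(first_party_url).hostname
    table = defaultdict(set)
    for request in completed_requests:
        host = urlsplit(request["url"]).hostname
        ip = request.get("ip")
        # Skip records without a host or an IP (e.g. answered from cache).
        if not host or not ip:
            continue
        # Simplification: "third party" means a different hostname here;
        # a fuller version would compare registrable domains instead.
        if host == first_party_host:
            continue
        table[host].add(ip)
    return {host: sorted(ips) for host, ips in table.items()}


# Hypothetical records for a visit to http://www.example.com
requests = [
    {"url": "http://www.example.com/index.html", "ip": "93.184.216.34"},
    {"url": "http://tracker.com/pixel.gif?uid=1", "ip": "213.121.66.99"},
    {"url": "http://tracker.com/sync", "ip": "213.121.66.99"},
    {"url": "http://analytics.com/collect", "ip": None},
]
mapping = build_mapping_table("http://www.example.com", requests)
```

Keeping a set of IPs per hostname, rather than a single value, leaves room for a domain that resolves to different servers across requests.
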
Identify Ad and Tracking related domains
Flow diagram (steps numbered 1 to 3 on the slide):
• Input: url 1 + meta data, url 2 + meta data, url 3 + meta data, …
• ABP Parser, using easylist and easyprivacy: Should block? (YES / NO)
• Custom keywords: Ad + Tracking related? (YES / NO)
• Correction Script
• Output: AD + Tracking Domains

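A minimal sketch of the two checks named on the slide, a filter-list test followed by a keyword test. It is not the authors' ABP parser: only domain-anchored rules of the form `||domain^` are handled, and the sample rules, the keyword list, and the function names are hypothetical placeholders. The correction script is not modelled.

```python
from urllib.parse import urlsplit


def load_domain_rules(filter_lines):
    """Extract blocked domains from rules of the form ||domain^ (a small subset of ABP syntax)."""
    domains = set()
    for line in filter_lines:
        rule = line.strip()
        # Comments, exception rules (@@) and non-anchored rules are ignored.
        if rule.startswith("||") and rule.endswith("^"):
            domains.add(rule[2:-1].lower())
    return domains


def matches_list(url, blocked_domains):
    """Stage 1: is the URL's host a blocked domain or one of its subdomains?"""
    host = (urlsplit(url).hostname or "").lower()
    parts = host.split(".")
    return any(".".join(parts[i:]) in blocked_domains for i in range(len(parts)))


def matches_keywords(url, keywords):
    """Stage 2: does the URL contain any of the custom keywords?"""
    lowered = url.lower()
    return any(keyword in lowered for keyword in keywords)


def classify(url, blocked_domains, keywords):
    """Label a URL using the list check first and the keyword check second."""
    if matches_list(url, blocked_domains) or matches_keywords(url, keywords):
        return "Ad + Tracking"
    return "Clean"


# Hypothetical inputs, for illustration only.
rules = ["! comment line", "||tracker.com^", "@@||example.com^"]
keywords = ["cookiesync", "usermatch"]
blocked = load_domain_rules(rules)

labels = {
    url: classify(url, blocked, keywords)
    for url in [
        "http://cdn.tracker.com/pixel.gif",
        "http://ads.example.net/usermatch?id=42",
        "http://example.com/index.html",
    ]
}
```
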
Accurate geo-location of server IPs
RIPE IPmap validation process - infrastructure servers IPs
prefix            region           service
46.51.128.0/18    eu-west-1        AMAZON
46.51.216.0/21    ap-southeast-1   AMAZON
13.73.232.0/21    japaneast        AZURE
20.19.14.128/25   koreacentral     AZURE
…                 …                …
Regions maps
eu-west-1: Ireland, Ireland
ap-southeast-1: Singapore, Singapore
RIPE IPmap: 99.6% match with the reported country

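The validation idea can be sketched as below: look up each IP in the cloud providers' published prefixes, translate the region into a country through the region map, and compare that with the country a geolocation source reports. This is an illustration under assumptions, not the authors' tooling. The `reported` dictionary stands in for geolocation output, its IPs and countries are made up for the example, and only the two regions whose mapping appears on the slide are included.

```python
import ipaddress

# Prefix table rows taken from the slide: (prefix, region, service).
CLOUD_PREFIXES = [
    ("46.51.128.0/18", "eu-west-1", "AMAZON"),
    ("46.51.216.0/21", "ap-southeast-1", "AMAZON"),
]
# Region map entries from the slide, reduced to the country.
REGION_COUNTRY = {"eu-west-1": "Ireland", "ap-southeast-1": "Singapore"}

NETWORKS = [(ipaddress.ip_network(p), region) for p, region, _ in CLOUD_PREFIXES]


def ground_truth_country(ip):
    """Return the country implied by the most specific matching prefix, or None."""
    address = ipaddress.ip_address(ip)
    matches = [(net, region) for net, region in NETWORKS if address in net]
    if not matches:
        return None
    _, region = max(matches, key=lambda item: item[0].prefixlen)
    return REGION_COUNTRY.get(region)


def match_rate(reported):
    """Share of IPs with known ground truth whose reported country agrees with it."""
    checked = agreed = 0
    for ip, country in reported.items():
        truth = ground_truth_country(ip)
        if truth is None:
            continue
        checked += 1
        agreed += country == truth
    return agreed / checked if checked else None


# Hypothetical geolocation output for three addresses.
reported = {
    "46.51.130.10": "Ireland",
    "46.51.217.5": "Singapore",
    "46.51.218.9": "Ireland",  # disagrees with the region map
}
rate = match_rate(reported)
```
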
Avoiding pitfalls …
- Identify all domains behind each IP (Reverse DNS query)
Query: https://freeapi.robtex.com/pdns/reverse/93.184.216.34
Response:
rrname:example.org, rrdata:93.184.216.34, rrtype:A, time_first:1440526884, time_last:1535919774, count:18
rrname:www.example.org, rrdata:93.184.216.34, rrtype:A, time_first:1440723354, time_last:1527899734, count:18
rrname:www.example.com, rrdata:93.184.216.34, rrtype:A, time_first:1441108386, time_last:1535371292, count:18
rrname:www.example.net, rrdata:93.184.216.34, rrtype:A, time_first:1436692690, time_last:1527900018, count:18
rrname:imrek.org, rrdata:93.184.216.34, rrtype:A, time_first:1440827324, time_last:1508103356, count:18
rrname:example.net, rrdata:93.184.216.34, rrtype:A, time_first:1440526998, time_last:1533895598, count:18
…
- Identify all IPs for each domain (Forward DNS query)
Query: https://freeapi.robtex.com/pdns/forward/example.com
Response:
rrname:example.com, rrdata:a.iana-servers.net, rrtype:NS, time_first:1246678898, time_last:1535952170, count:2
rrname:example.com, rrdata:b.iana-servers.net, rrtype:NS, time_first:1246678898, time_last:1535952170, count:2
rrname:example.com, rrdata:2606:280::::::1946, rrtype:AAAA, time_first:1441278890, time_last:1535952170, count:18
rrname:example.com, rrdata:93.184.216.34, rrtype:A, time_first:1441278890, time_last:1535952170, count:18
rrname:example.com, rrdata:208.77.188.166, rrtype:A, time_first:1246678898, time_last:1246678898, count:1

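A sketch of how such passive DNS records could be turned into the two lookups. It makes no network request and assumes each record is available as one JSON object per line, using the field names shown in the responses. That encoding, the shortened sample data, and the helper names are assumptions for illustration.

```python
import json
from collections import defaultdict

ADDRESS_TYPES = {"A", "AAAA"}


def parse_records(text):
    """Parse newline-delimited JSON records, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def domains_behind_ip(reverse_records):
    """Reverse view: IP -> set of domain names that resolved to it."""
    result = defaultdict(set)
    for record in reverse_records:
        if record["rrtype"] in ADDRESS_TYPES:
            result[record["rrdata"]].add(record["rrname"])
    return dict(result)


def ips_for_domain(forward_records):
    """Forward view: domain -> set of IPs, ignoring non-address records such as NS."""
    result = defaultdict(set)
    for record in forward_records:
        if record["rrtype"] in ADDRESS_TYPES:
            result[record["rrname"]].add(record["rrdata"])
    return dict(result)


reverse_text = """
{"rrname": "example.org", "rrdata": "93.184.216.34", "rrtype": "A", "time_first": 1440526884, "time_last": 1535919774, "count": 18}
{"rrname": "www.example.com", "rrdata": "93.184.216.34", "rrtype": "A", "time_first": 1441108386, "time_last": 1535371292, "count": 18}
"""
forward_text = """
{"rrname": "example.com", "rrdata": "a.iana-servers.net", "rrtype": "NS", "time_first": 1246678898, "time_last": 1535952170, "count": 2}
{"rrname": "example.com", "rrdata": "93.184.216.34", "rrtype": "A", "time_first": 1441278890, "time_last": 1535952170, "count": 18}
{"rrname": "example.com", "rrdata": "208.77.188.166", "rrtype": "A", "time_first": 1246678898, "time_last": 1246678898, "count": 1}
"""

by_ip = domains_behind_ip(parse_records(reverse_text))
by_domain = ips_for_domain(parse_records(forward_text))
```

The `time_first`, `time_last` and `count` fields are kept in the parsed records, so a later step could discard stale entries such as an address last seen years before the measurement.
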
Joining everything together
• Browser extension with real users
• Mapping Table - example.com
  Domain          IP
  tracker.com     213.121.66.99
  analytics.com   130.12.88.110
  …               …
• ABP Parser & Correction Script
• RIPE IPmap (https://ipmap.ripe.net/)
Resulting table:
Source country   3rd party flow       Mapping IP(s)   Filtering       Destination country
Spain            http://tracker.com   213.121.66.99   Ad + Tracking   Germany
France           http://example.com   145.100.210.5   Clean           USA
…                …                    …               …               …

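The join can be sketched in a few lines. The sketch assumes the earlier steps are already available as plain lookups: a domain-to-IP mapping table, a filtering label per domain, and an IP-to-country geolocation result. The dictionaries reuse the example values from the slide's table; the `Flow` type, `join_flows`, and the default label for unlisted domains are hypothetical choices.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Flow:
    source_country: str
    url: str
    ip: str
    filtering: str
    destination_country: str


def join_flows(observations, mapping_table, labels, ip_country):
    """Combine per-step outputs into one record per (user country, third-party URL)."""
    flows = []
    for source_country, url in observations:
        domain = urlsplit(url).hostname
        ip = mapping_table.get(domain)
        # Drop flows that cannot be mapped to an IP or geolocated.
        if ip is None or ip not in ip_country:
            continue
        # Assumption: a domain without a label is treated as "Clean".
        label = labels.get(domain, "Clean")
        flows.append(Flow(source_country, url, ip, label, ip_country[ip]))
    return flows


observations = [("Spain", "http://tracker.com"), ("France", "http://example.com")]
mapping_table = {"tracker.com": "213.121.66.99", "example.com": "145.100.210.5"}
labels = {"tracker.com": "Ad + Tracking", "example.com": "Clean"}
ip_country = {"213.121.66.99": "Germany", "145.100.210.5": "USA"}

flows = join_flows(observations, mapping_table, labels, ip_country)
tracking_flows = [f for f in flows if f.filtering == "Ad + Tracking"]
```
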
Results - EU 28 member states confinement level
[Figures: confinement level using MaxMind geo-location and using RIPE IPmap geo-location]

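The slides do not define the confinement level. One plausible reading, assumed here, is the share of ad and tracking flows from users inside a region that terminate on servers inside the same region. The sketch below computes that share for hypothetical flows; the country set is a small illustrative subset of the EU 28, and the numbers are not results from the study.

```python
# Illustrative subset of EU 28 members; a full analysis would list all of them.
EU28_SUBSET = {"Spain", "France", "Germany", "Ireland"}


def confinement_level(flows, region):
    """Percentage of flows originating in `region` that also terminate in it."""
    from_region = [(src, dst) for src, dst in flows if src in region]
    if not from_region:
        return None
    confined = sum(1 for _, dst in from_region if dst in region)
    return 100.0 * confined / len(from_region)


# Hypothetical (source country, destination country) pairs of ad + tracking flows.
flows = [
    ("Spain", "Germany"),
    ("Spain", "USA"),
    ("France", "Ireland"),
    ("France", "France"),
    ("USA", "USA"),  # ignored: the user is outside the region
]
level = confinement_level(flows, EU28_SUBSET)
```

Running the same computation with destination countries taken from two different geolocation sources is one way to see how much the choice of source changes the outcome.
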
What about sensitive websites?
Sensitive categories as defined by GDPR:
• Genetic & biometric data
• Race & Ethnicity
• Political beliefs
• Religion
• Health
• Sexual Orientation

Results - Sensitive websites based on EU 28 users
[Figure: sensitive category vs. destination continent]

Scaling up – From real users to ISP flows
Datasets:
• ISPs datasets
• List of Ad + Tracking IPs (< 28k IPs)
