CrowdSurf: Empowering Transparency in the Web
Hassan Metwalley, Stefano Traverso, Marco Mellia, Stanislav Miskovic, Mario Baldi
25 Aug 2016, ACM SIGCOMM, Florianopolis
Introduction 26 August 2016 CrowdSurf - Stefano Traverso 2
Do you know what you HTTP?
Example: Web tracking
Thousands of Web trackers collect our data
- Browsing histories
- Religious, sexual, and political preferences
- On average, the first tracker is met as soon as the browser starts [1]
- Some trackers reach 96% of users [1]
- 71% of websites host at least one tracker [1]
[1] Metwalley, H. et al. “The Online Tracking Horde: A View from Passive Measurements”, TMA 2015
The Open Question
How to know and choose which services our data is exchanged with and how?
Partial solutions
In-network devices
- Firewalls and proxies
  - Fail in case of encrypted traffic (HTTPS)
  - Lack scalability
  - Managed by third parties
On-client
- Browser plugins
  - Limited scope
  - No control on device traffic
  - Not transparent
A New System
Goal
Let users regain visibility and control on the information they exchange with Web services
Design Principles
- Crowd-sourced: knowledge built on a community of users
- Automatic: little engagement of the user
- Privacy-safe: never compromise users’ privacy
- Holistic: working in any scenario
- Client-centric: available on any kind of device
- Practical, not revolutionary: use existing technology
CrowdSurf
CrowdSurf
Cloud
- A controller collects information about the services users visit
  - Explicit -> their opinion
  - Implicit -> traffic samples
- Users’ contributions processed by data-analyzers and the advising community
- Results = suggestions about the reputation of services
Client
- Users download the suggestions they like
- The CrowdSurf Layer translates them into rules
- Rules = actions on users’ traffic
  - Regexp + action
CrowdSurf Controllers
Open Controller
- Collaborative approach
- Users improve the wisdom of the system
  - Traffic samples and opinions
  - Build data analyzers and suggestions
Corporate Controller
- Builds directly rules for employees
- Employees cannot customize rules
- All devices follow the same rules
The CrowdSurf Layer
[Figure: the CrowdSurf Layer placed between HTTP and TLS/TCP. Inputs: suggestions from the Open Controller, rules from the Corporate Controller. Components: Rule Processor (regular expression matching, rules to action) and Anonymization. Actions: Block, Redirect, Allow, Modify, Log and Report.]
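The rule processor named on this slide (regular expression matching, then an action) can be sketched roughly as follows. This is a minimal illustration in Python, not the prototype's code: the `Rule` type, the `process` function, the first-match-wins ordering, the default-allow fallback, and the example patterns are all assumptions. Only the "regexp + action" idea and the action names come from the slides.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    REDIRECT = "redirect"
    MODIFY = "modify"
    LOG_AND_REPORT = "log_and_report"


@dataclass(frozen=True)
class Rule:
    pattern: str                  # regular expression matched against the request URL
    action: Action
    target: Optional[str] = None  # replacement URL, used only by REDIRECT


# Hypothetical rule set; hosts and patterns are illustrative only.
RULES = [
    Rule(r"^https?://([^/]+\.)?acmetrack\.com/", Action.BLOCK),
    Rule(r"^https?://search\.google\.com/", Action.REDIRECT, "http://search.bing.com/"),
]


def process(url, rules=RULES):
    """Return (action, url_to_fetch) for the first rule whose regexp matches.

    First-match-wins and default-allow are assumptions of this sketch.
    """
    for rule in rules:
        if re.search(rule.pattern, url):
            if rule.action is Action.BLOCK:
                return rule.action, None          # request is dropped
            if rule.action is Action.REDIRECT:
                return rule.action, rule.target   # request goes elsewhere
            return rule.action, url               # request passes through
    return Action.ALLOW, url
```

In this sketch a blocked request yields no URL to fetch, a redirected one yields the rule's target, and anything unmatched is allowed unchanged. A real layer would also need to define how `MODIFY` rewrites a request, which the slides do not specify.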
CrowdSurf in a picture
[Figure: architecture diagram. Labels: Web Services, Ruled Interaction, Suggestions, Opinions + Traffic samples, Rules, Traffic samples, Open Controller, Corporate Controller.]
Proof of Concept
Prototype
Controller
- Java-based web service
- Communicates with CrowdSurf devices
- Hosts a data analyzer for identification of tracking sites
- Collects traffic samples
- Distributes suggestions
Client
- Implemented as a Firefox plugin
- Supports block, redirect, log&report
Example of Data Analyzer: Automatic Tracker Detector
Unsupervised methodology to identify third-party trackers [2]
- Observation: trackers usually embed UIDs as URL parameters
- Procedure:
  1. Input: HTTP traffic samples provided by CS users
  2. Take all HTTP queries to third-party services, e.g. http://acmetrack.com/query?key1=X&key2=Y
  3. Extract keys (key1, key2) and their values
  4. Check the presence of key values uniquely associated to the users
[2] Metwalley, H. et al. “Unsupervised Detection of Web Trackers”, IEEE Globecom 2015
Example of Data Analyzer: Automatic Tracker Detector
http://acmetrack.com/query?sid=X&tmp=Y&uid=Z
Key   Visit 1   Visit 2   Visit 3
sid   a b c     d e f     g h i
tmp   m m m     n n n     p p p
uid   x y z     x y z     x y z
(time runs from Visit 1 to Visit 3)
34 new third-party trackers found
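The check illustrated by this example can be sketched in a few lines of Python. This is a simplified illustration, not the method of [2]: the function name `find_uid_keys` is hypothetical, and the sketch assumes that the three values shown per visit belong to three different users (labelled `u1` to `u3` here). A key is flagged when its value stays the same for each user across visits and differs from one user to the next.

```python
from collections import defaultdict
from urllib.parse import urlsplit, parse_qsl


def find_uid_keys(samples):
    """samples: iterable of (user, url) pairs.

    Return the query keys whose value is stable per user across visits
    and different for every user.
    """
    # key -> user -> set of values seen for that user
    seen = defaultdict(lambda: defaultdict(set))
    for user, url in samples:
        for key, value in parse_qsl(urlsplit(url).query):
            seen[key][user].add(value)

    uid_keys = []
    for key, per_user in seen.items():
        if len(per_user) < 2:
            continue  # need at least two users to compare
        stable = all(len(values) == 1 for values in per_user.values())
        distinct = len({next(iter(v)) for v in per_user.values()}) == len(per_user)
        if stable and distinct:
            uid_keys.append(key)
    return sorted(uid_keys)


# Values taken from the slide's table; the user labels u1..u3 are assumed.
USERS = ["u1", "u2", "u3"]
TABLE = {"sid": "abcdefghi", "tmp": "mmmnnnppp", "uid": "xyzxyzxyz"}
samples = [
    (USERS[i % 3],
     "http://acmetrack.com/query?" + "&".join(f"{k}={v[i]}" for k, v in TABLE.items()))
    for i in range(9)
]
print(find_uid_keys(samples))  # prints ['uid']
```

Under these assumptions `sid` changes on every request and `tmp` changes on every visit, so only `uid` is flagged. The sketch needs more than one visit per user: with a single visit, a per-session key such as `sid` would be indistinguishable from a user identifier.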
Performance Implications of running CrowdSurf
Different user profiles
Paranoid Profile
- Blocks
  - adv/tracking
  - JS code
- Does not report traffic samples
Kid Profile
- Activates child protection rules
- Reports traffic to trackers
Corporate Profile
- Redirects search.google.com to search.bing.com
- Blocks social networks, e-commerce sites, trackers
- Reports activity on DropBox
Impact on Web site loading time
- Paranoid is 1.07 times faster than baseline
- Kid is 1.08 times slower
- Corporate is 1.18 times slower
[Figure: loading time for the Paranoid, Kid, and Corporate profiles]
Conclusion
Open Problems
- A lot of details to consider
- Design/develop/standardize a new network layer
- Protecting users’ privacy
- Anonymizing HTTP/S traffic
- Usability
- Involve users to join
- Protection from malicious biases
CrowdSurf
Holistic, crowd-sourced system for the auditing of the information we expose in the Web
https://www.myermes.com
Thank you!
Need a new model that…
- Enables transparency and visibility: monitor the HTTP traffic before encryption takes place
- Takes actions: block/manipulate/report transactions to undesired services
- Under user’s control: automatic, but configurable
Example of Data Analyzer: Automatic Tracker Detector
Dataset vs Automatic Tracker Detector
Dataset: HTTP trace from ISP running Tstat
- 10 days of October 2014
- ~19k monitored users
- ~240k HTTP transactions per day
34 new third-party trackers found

Third-party Trackers    Keys
cl.adform.net           xid
atemda.com              bidderuid
x.bidswitch.net         user_id
www.77tracking.com      rand
rack.movad.net          us
ovo01.webtrekk.net      cs2
dis.criteo.com          uid
p.rfihub.com            bk-uuid
ib.adnxs.com            xid

Website         Embedded Third-party Trackers
Portal1         26
News1           13
E-commerce1     12
E-commerce2     9
E-commerce3     4
Portal2         4
Porn            3
Sportnews       1
SearchEngine    1
Example: A growing business around our data
[3] Metwalley, H. et al. “The Online Tracking Horde: A View from Passive Measurements”, TMA 2015
Loss of visibility and control
- HTTPS protects our privacy, but…
- …prevents third parties from checking what’s going on under the hood of encryption
- …and severely limits network functions
“Child protection through the use of Internet Watch Foundation blacklists has become ineffective, with just 5% of entries still being blocked when HTTPS is deployed” [2]
[2] Naylor, D. et al. “The Cost of the "S" in HTTPS”, CoNEXT 2014
Time to collect a dataset
[Figure label: googleanalytics]
Monitoring the Web
HTTP [1]
HTTPS/HTTP 2.0
[1] Popa, L. et al., “HTTP As the Narrow Waist of the Future Internet”, ACM HotNets, 2010
CrowdSurf Controllers
Open Controller
- Collaborative approach
- Users improve the wisdom of the system
  - Traffic samples and opinions
  - Build data analyzers and suggestions
Third-party Controller
- Suggestions for commercial purposes
- Opens to a market of suggestions
Corporate Controller
- Builds directly rules for employees
- Employees cannot customize rules
- All devices follow the same rules
CrowdSurf in a picture
[Figure: architecture diagram. Labels: Open controller, Third-party controller, Corporate controller, Data Analyzer, Web Services, Private User Device, Corporate Device, Suggestions, Corporate Rules, Web Browsing, Traffic samples.]