k-fingerprinting: A Robust, Scalable Website Fingerprinting Technique
George Danezis, Jamie Hayes
University College London
August 12, 2016
How does website fingerprinting work? - Training
[Figure: the adversary loads a set of monitored webpages over the Tor network (Relay 1, Relay 2, Relay 3) and creates a fingerprint for each of them.]
How does website fingerprinting work? - Attack
[Figure: the client loads a webpage (www) over the Tor network (Relay 1, Relay 2, Relay 3); the adversary, observing the client's traffic, checks whether its fingerprint matches the fingerprint of any monitored webpage.]
Experimental attack set-up
- Closed World: the client can access only the monitored webpages.
- Open World: the client can access any webpage.
Contributions
- k-FP - a new attack based on Random Forests and k-NN [1].
- An analysis of the features used in this and prior work, to determine which yield the most information about an encrypted or anonymized webpage.
- A large open world setting: in total we tested k-FP on 101,130 unique webpages.
- Experiments with both standard websites and Tor hidden services.
[1] Wang et al., "Effective Attacks and Provable Defenses for Website Fingerprinting", 2014.
Feature Analysis
Features need to be drawn from a diverse set to bypass targeted WF defenses.
[Figure: feature importance score (roughly 0.000-0.040) against feature rank (roughly 1-150).]
The "best" features were the number of packets (incoming/outgoing) and information leaked from the first few seconds of loading a webpage.
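As a rough illustration (not the paper's full feature set), the sketch below builds a few features of the kind named above from a packet trace and ranks them with a Random Forest's impurity-based importances. The trace format, the feature names, and the labelled training set (`traces`, `labels`) are assumptions for this example.

```python
# Rough illustration only. Assumptions (not from the slides): a trace is a list of
# (timestamp, direction, size) tuples with direction +1 for outgoing and -1 for
# incoming packets; `traces` and `labels` are a hypothetical labelled training set.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

FEATURE_NAMES = ["n_outgoing", "n_incoming", "n_total",
                 "pkts_first_2s", "bytes_first_2s"]

def extract_features(trace):
    """Turn one page-load trace into a small numeric feature vector."""
    t0 = trace[0][0]
    n_out = sum(1 for t, d, s in trace if d > 0)
    n_in = sum(1 for t, d, s in trace if d < 0)
    early = [(t, d, s) for t, d, s in trace if t - t0 <= 2.0]  # first ~2 seconds
    return [n_out, n_in, len(trace), len(early), sum(s for _, _, s in early)]

def rank_features(traces, labels, n_trees=150):
    """Rank the features by Random Forest impurity-based importance."""
    X = np.array([extract_features(tr) for tr in traces])
    forest = RandomForestClassifier(n_estimators=n_trees).fit(X, labels)
    order = np.argsort(forest.feature_importances_)[::-1]
    return [(FEATURE_NAMES[i], float(forest.feature_importances_[i])) for i in order]
```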
k-FP Attack
- Train on a classification task with network traffic information as features.
- Use the Random Forest output as the fingerprint of a website load.
- Then use k-NN over those fingerprints for classification.
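A minimal sketch of this two-stage idea: the Random Forest maps each traffic instance to a fingerprint (the vector of leaf indices it reaches, one per tree), and test pages are classified by k-NN over those fingerprints. The Hamming metric and all parameter values are illustrative assumptions, not necessarily the paper's exact settings.

```python
# Sketch: Random Forest leaf vectors as fingerprints, then k-NN over fingerprints.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import NearestNeighbors

def fit_fingerprinter(X_train, y_train, n_trees=150):
    forest = RandomForestClassifier(n_estimators=n_trees).fit(X_train, y_train)
    train_fps = forest.apply(X_train)            # leaf indices, shape (n_samples, n_trees)
    return forest, train_fps

def nearest_labels(forest, train_fps, y_train, X_test, k=3):
    test_fps = forest.apply(X_test)
    # Hamming distance = fraction of trees whose leaves differ between two fingerprints.
    nn = NearestNeighbors(n_neighbors=k, metric="hamming").fit(train_fps)
    _, idx = nn.kneighbors(test_fps)
    return np.asarray(y_train)[idx]              # k candidate labels per test instance
```

In the open-world setting the k candidate labels would then be turned into a monitored/unmonitored decision; one possible rule is sketched after the parameter-tuning slide.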
Base Rate
Previous attacks reported a very high True Positive Rate (TPR) and a very low False Positive Rate (FPR), but as the number of tested samples rises, so too does the number of false alarms: in a large enough open world, the vast majority of alarms will be false positives.
Base Rate
The FPR needs to be very low for an accurate attack, because the number of false alarms grows with the number of fingerprints tested.
Suppose we have an FPR of 1%:
- If a client loads 100 unmonitored webpages, the attacker will mark 1 webpage incorrectly as monitored.
- If a client loads 1,000,000 unmonitored webpages, the attacker will mark 10,000 webpages incorrectly as monitored.
Accuracy metrics
TPR - the probability that a monitored page is classified as the correct monitored page.
FPR - the probability that an unmonitored page is incorrectly classified as a monitored page.
BDR - the probability that a page corresponds to the correct monitored page given that the classifier recognized it as that monitored page.
Assuming a uniform distribution of pages, the BDR can be found from the TPR and FPR using the formula

BDR = (TPR · Pr(M)) / (TPR · Pr(M) + FPR · Pr(U)),

where Pr(M) = |Monitored| / |Total Pages| and Pr(U) = 1 − Pr(M).
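For concreteness, a minimal helper that evaluates the formula above; the example reuses the 1% FPR from the previous slide and the 55-site Alexa monitored set described later, under the slide's uniform-prior assumption.

```python
# Bayesian detection rate under the slide's uniform-prior assumption.
def bdr(tpr, fpr, n_monitored, n_unmonitored):
    pr_m = n_monitored / (n_monitored + n_unmonitored)   # Pr(M)
    pr_u = 1.0 - pr_m                                     # Pr(U)
    return (tpr * pr_m) / (tpr * pr_m + fpr * pr_u)

# Even a perfect TPR with a 1% FPR yields a low BDR in a large open world:
print(bdr(tpr=1.0, fpr=0.01, n_monitored=55, n_unmonitored=100_000))   # ~0.05
```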
Tor hidden services
Protect receiver anonymity in addition to sender anonymity.
Sensitive servers such as SecureDrop use Tor hidden services.
Tor hidden services
[Figure, built up over several slides: the client connects to a hidden service (www) through the Tor network via an Introduction Point (IP) and a Rendezvous Point (RP), while the adversary observes the client's traffic.]
Prelims
- All traffic was collected via Tor.
- Websites monitored by the adversary: Alexa sites (Google, Facebook, Wikipedia, etc.) and popular Tor hidden services.
- Only the landing page of each website was collected.
- The Alexa monitored set consisted of 100 samples of each of 55 websites.
- The hidden services monitored set consisted of 80 samples of each of 30 hidden services.
- Extra sites for testing purposes: 100,000 websites (chosen from the top Alexa list).
Parameter tuning - number of neighbours and number of trees
[Figure, two panels: "Number of neighbours" (true positive rate against false positive rate, FPR roughly 0.006-0.020, TPR roughly 0.82-0.92) and "Number of Trees" (max and min accuracy against number of trees, 0-200).]
Using different k, the number of neighbours, allows us to tune the TPR and FPR.
After about 15 decision trees there is only incremental benefit in adding more.
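To make the neighbours panel concrete, below is a minimal sketch of one plausible way k trades TPR against FPR in the k-NN stage; the unanimity rule is an assumption for illustration, not necessarily the paper's exact rule.

```python
# Sketch of one possible open-world decision rule (assumption): raise a "monitored"
# alarm only when all k nearest training fingerprints carry the same monitored label.
# Larger k gives fewer but more certain alarms, i.e. lower FPR at the cost of lower TPR.
UNMONITORED = "unmonitored"

def decide(neighbour_labels):
    first = neighbour_labels[0]
    if first != UNMONITORED and all(lbl == first for lbl in neighbour_labels):
        return first        # unanimous monitored vote: raise an alarm
    return UNMONITORED      # any disagreement: stay silent
```

Sweeping k upwards under such a rule traces out points further along the TPR/FPR curve shown in the left panel.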
Alexa monitored set results
[Figure: true positive rate (roughly 0.70-1.00) and false positive rate (roughly 0.000-0.040) against the number of unmonitored sites (20,000-100,000), for k = 1, 5, 10.]
Tor hidden service monitored set results
[Figure: true positive rate (roughly 0.70-1.00) and false positive rate (roughly 0.000-0.040) against the number of unmonitored sites (20,000-100,000), for k = 1, 5, 10.]
BDR
[Figure: Bayesian detection rate (0.0-1.0) against the number of unmonitored sites (10,000-100,000), for k = 1, 5, 10; one panel for the Tor hidden services monitored set and one for the Alexa monitored set.]
Limitations
"The BDR implicitly assumes a base rate, with no particular backing in reality." - We assume a uniform expectation of visiting a webpage.
"I would like to better understand how these techniques would work if the attacker did not know the start/stop time that the user visits each website." - Website fingerprinting evaluations may not reflect practical risks.
Conclusion
- The open world is not as much of a problem as we had thought, and using state-of-the-art machine learning we expect to be able to tackle other obstacles such as start/stop time identification and multiple tabs.
- The attack is highly accurate over a large number of webpages.
- Tor hidden services are distinguishable from non-hidden-service websites.
Thanks
Questions?
j.hayes@cs.ucl.ac.uk
@_jamiedh
http://www.homepages.ucl.ac.uk/~ucabaye/