Website Fingerprinting
Claudia Diaz, KU Leuven – PowerPoint PPT Presentation



  1. Website Fingerprinting – Claudia Diaz, KU Leuven – COSIC (With thanks to Marc Juarez and Bekah Overdorf). Summer School on real-world crypto and privacy, June 2017

  2. Outline • Website Fingerprinting for https sites • Website Fingerprinting for Tor • From the lab to reality: reviewing assumptions • Fingerprintability of hidden services

  3. https

  4. https [diagram: train and test traffic captures over https]

  5. Side channel leaks in web applications (Chen et al, 2010) • Interactive pages that are responsive to user actions such as choices in drop-down menus, mouse clicks, typing • Examples: healthcare diagnosis, taxation, web search (auto-complete) • Characteristics: – Stateful communication: transitions to next states depend both on the current state and on its input – Low entropy input: small input space – Uniqueness of traffic: disparate sizes and patterns for each possibility

  6. “I know why you went to the clinic” (Miller et al, 2014) • Hidden Markov Models used to leverage link structure in websites • Impact of caching and cookies was 17% (train with one option, test with the other)

  7. Tor [diagram: the client downloads public (onion) keys from directory servers and builds a circuit through Guard, Middle, and Exit relays]

  8. Tor [diagram: user connecting through Tor to the Web]

  9. Website Fingerprinting [diagram: Tor and the Web]

  10. Website Fingerprinting [diagram: Tor and the Web]

  11. Website Fingerprinting [diagram: Tor and the Web]

  12. Website Fingerprinting [diagram: Tor and the Web] – Open world

  13. Tor Hidden (“Onion”) Services (HS) [diagram: the Client reaches xyz.onion via an Introduction Point (IP) and a Rendezvous Point (RP); circuits: HS-IP, Client-RP, HS-RP; HSDir]. HS-RP circuits are distinguishable from normal circuits (Kwon et al, 2015). The size of the HS world is estimated at a few thousand (closed world!)

  14. State-of-the-art attacks • kNN • CUMUL • k-Fingerprinting

  15. kNN classifier (Wang et al, 2014) • Features (3,000): – total size, total time, number of packets, packet ordering – the lengths of the first 20 packets – traffic bursts (sequences of packets in the same direction) • Classification: – k-NN – tune the weights of the distance metric to minimize the distance among instances that belong to the same site • Results: – 90%–95% accuracy on a closed world of 100 non-onion-service websites
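The weighted-distance idea on this slide can be sketched as follows. This is a toy illustration, not Wang et al.'s implementation: the two-element feature vectors stand in for the ~3,000 real features, and the weights, which the real attack learns, are fixed by hand here.

```python
# Toy sketch of a weighted-distance k-NN website classifier.
# A trace is represented by a small feature vector (here: [total size,
# number of packets] as stand-ins for the ~3,000 real features).
from collections import Counter

def weighted_l1(a, b, w):
    """Weighted L1 distance; the attack tunes w so that traces of the
    same site end up close to each other."""
    return sum(wi * abs(ai - bi) for ai, bi, wi in zip(a, b, w))

def knn_classify(trace, train, weights, k=3):
    """train: list of (feature_vector, site_label) pairs.
    Returns the majority label among the k nearest training traces."""
    nearest = sorted(train, key=lambda t: weighted_l1(trace, t[0], weights))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical training data: two "sites" with distinctive features.
train = [([1500, 40], "siteA"), ([1480, 42], "siteA"),
         ([300, 10], "siteB"), ([320, 12], "siteB")]
weights = [1.0, 1.0]  # hand-picked here; learned in the real attack
print(knn_classify([1490, 41], train, weights))  # -> siteA
```

The learning step of the real attack adjusts `weights` iteratively; the sketch above only shows the classification side.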

  16. kNN

  17. CUMUL (Panchenko et al, 2016) • Features: – a 104-coordinate vector formed by the number of bytes and packets in each direction and 100 interpolation points of the cumulative sum of packet lengths (with direction) • Classification: – Support Vector Machine (SVM) with a Radial Basis Function (RBF) kernel • Results: – 90%–93% for 100 non-HS sites – open world of 9,000 pages
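The 104-coordinate CUMUL vector (4 aggregate counts plus 100 interpolated points of the cumulative sum of signed packet lengths) can be sketched like this. The direction convention (positive = outgoing) is an assumption of this sketch, and the real attack feeds these features to an RBF-SVM, which is omitted here.

```python
# Sketch of the CUMUL feature construction (simplified).
# A trace is a list of signed packet lengths: positive = outgoing,
# negative = incoming (a convention assumed for this illustration).

def cumul_features(trace, n=100):
    out_pkts = sum(1 for p in trace if p > 0)
    in_pkts = sum(1 for p in trace if p < 0)
    out_bytes = sum(p for p in trace if p > 0)
    in_bytes = -sum(p for p in trace if p < 0)
    # Cumulative sum of signed lengths…
    cum, s = [], 0
    for p in trace:
        s += p
        cum.append(s)
    # …linearly interpolated at n equidistant points, so every trace
    # maps to a fixed-length vector regardless of its packet count.
    feats = []
    for i in range(n):
        x = i * (len(cum) - 1) / (n - 1)
        lo = int(x)
        hi = min(lo + 1, len(cum) - 1)
        frac = x - lo
        feats.append(cum[lo] * (1 - frac) + cum[hi] * frac)
    return [in_pkts, out_pkts, in_bytes, out_bytes] + feats  # 4 + n = 104

v = cumul_features([600, -1500, -1500, 600, -1500])
print(len(v))  # -> 104
```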

  18. SVM

  19. k-Fingerprinting (Hayes et al, 2016) • Features (175): – timing and size features such as #packets/second • Classification: – Random Forest (RF) + k-NN • Results: – 90% accuracy on 30 onion services – open world of 100,000 pages

  20. Random Forest • Train decision trees with web traffic features • Training set is randomized per tree • Random Forest is an ensemble of decision trees • Use the Random Forest output as the fingerprint of a website
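A toy sketch of the "forest output as fingerprint" idea from this slide: hand-made threshold stumps stand in for trees trained on randomized subsets of web-traffic features, and Hamming-distance matching stands in for the k-NN step of k-Fingerprinting. None of the thresholds or feature values below come from the original work.

```python
# Toy sketch: the vector of per-tree outputs is used as a trace's
# "fingerprint", and fingerprints are matched by nearest neighbour.

def make_stump(feature_idx, threshold):
    """A depth-1 decision 'tree' standing in for a trained tree."""
    return lambda x: int(x[feature_idx] > threshold)

# Hypothetical forest over [total size, number of packets] features.
forest = [make_stump(0, 1000), make_stump(1, 25), make_stump(0, 500)]

def fingerprint(trace_features):
    """Vector of per-tree outputs = the trace's fingerprint."""
    return [tree(trace_features) for tree in forest]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

labelled = {"siteA": fingerprint([1500, 40]), "siteB": fingerprint([300, 10])}
query = fingerprint([1400, 38])
print(min(labelled, key=lambda s: hamming(labelled[s], query)))  # -> siteA
```

In the real attack each tree is trained on a randomized subset of the data, so the ensemble output carries far more information than these three stumps.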

  21. Why Do We Care? • Tor is the most advanced anonymity network • WF allows an adversary to discover a user's browsing history • It can be deployed by a low-resource adversary (the kind Tor aims to protect against) • Series of successful attacks in the lab • … how concerned should we be about these attacks in practice? – Critical review of WF attacks (Juarez et al, 2014)

  22. Assumptions: client settings (e.g., browser version, single-tab browsing) [diagram: User, Tor, Web, Adversary]

  23. Effect of multi-tab browsing • Firefox users use 2 or 3 tabs on average • Experiment with 2 tabs and time gaps of 0.5s, 3s, 5s • Success: detection of either page

  24. Experiments: multi-tab [chart: accuracy for different time gaps between Tab 1 and Tab 2] – Control: 77.08%; Test (3s): 9.8%; Test (0.5s): 7.9%; Test (5s): 8.23%

  25. Experiments: TBB version • TBB: Tor Browser Bundle • Several versions coexist at any given time – Control (3.5.2.1): 79.58%; Test (3.5): 66.75%; Test (2.4.7): 6.51%

  26. Assumptions: adversary (e.g., replicability) [diagram: User, Tor, Web, Adversary]

  27. Experiments: network conditions – VM Leuven (KU Leuven), VM New York and VM Singapore (DigitalOcean virtual private servers)

  28. Experiments: network conditions – Control (LVN): 66.95%; Test (NY): 8.83%

  29. Experiments: network conditions – Control (LVN): 66.95%; Test (SI): 9.33%

  30. Experiments: network conditions – Control (SI): 76.40%; Test (NY): 68.53%

  31. Assumptions: the Web (e.g., staleness) [diagram: User, Tor, Web, Adversary]

  32. Data staleness [plot: accuracy (%) over time (days)] – accuracy drops below 50% after 9 days

  33. Effect of false negatives: base rate fallacy • Breathalyzer test: – identifies truly drunk drivers with probability 0.88 (true positives) – 0.05 false positive rate • Alice tests positive – What is the probability that she is indeed drunk (the BDR)? – Is it 0.95? Is it 0.88? Something in between? – Only 0.1!

  34. The base rate fallacy: example • The circle represents the world of drivers • Each dot represents a driver

  35. The base rate fallacy: example • 1% of drivers are driving drunk (the base rate, or prior)

  36. The base rate fallacy: example • Of the drunk drivers, 88% are identified as drunk by the test

  37. The base rate fallacy: example • Of the sober drivers, 5% are erroneously identified as drunk

  38. The base rate fallacy: example • Alice must be within the black circle • Ratio of red dots within the black circle: BDR = 7/70 = 0.1!
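The dot-counting above is Bayes' rule. A small sketch of the Bayesian detection rate (BDR) computation using the slide's test rates; note that a 1% base rate gives roughly 0.15, and the picture's 7/70 = 0.1 corresponds to a slightly lower base rate (about 0.6%, an inference from the diagram, not a figure stated on the slide).

```python
# Bayesian detection rate: P(actually drunk | test says drunk).

def bdr(tpr, fpr, base_rate):
    """Bayes' rule with true-positive rate tpr, false-positive rate fpr,
    and prior probability base_rate of being drunk."""
    return tpr * base_rate / (tpr * base_rate + fpr * (1 - base_rate))

print(round(bdr(0.88, 0.05, 0.01), 2))   # 1% base rate -> 0.15
print(round(bdr(0.88, 0.05, 0.006), 2))  # ~0.6% base rate -> 0.1 (the 7/70 picture)
print(round(bdr(0.88, 0.05, 0.5), 2))    # 50% base rate -> 0.95
```

The same formula underlies the WF discussion on the next slides: when the prior probability of visiting a monitored page is tiny, even a high-accuracy classifier produces mostly false alarms.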

  39. The base rate fallacy in WF • The base rate must be taken into account • In WF: – blue: webpages – red: monitored – base rate?

  40. Experiment: BDR in a 35K world • World of 35K sites • 4 target pages • Uniform prior • For 30K sites BDR is 0.4%

  41. Disparate impact • WF attacks normally report average success • But… – Are certain websites more susceptible to website fingerprinting attacks than others? – What makes some sites more vulnerable to the attack than others?

  42. Misclassifications of onion services: sites that are “safe”

  43. Misclassifications: sites that are “safe” – Some sites are hidden from all methods!

  44. Median of total incoming packet size for misclassified instances [scatter plot: Predicted Site − Median vs. Median − True Site, range 0.000–0.100]

  45. Site-level Feature Analysis • Trace features are not always helpful • Can we determine what characteristics of a website affect its fingerprintability? • Site-level features: – total HTTP download size – HTTP duration – screenshot size – number of scripts – …

  46. Site Level Feature Analysis

  47. WF countermeasures • Network layer – Add padding • Constant rate is unreasonable • Leakage: how to optimize padding? – Add latency to disrupt the traffic pattern • Bad idea • Page design – Small size – Dynamism
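A minimal illustration of the padding trade-off mentioned above (an assumption-laden sketch, not a deployed Tor defense): rounding every packet length up to a bucket boundary collapses many distinct sizes into one, at a bandwidth cost. Constant-rate padding would hide even more, but its overhead is what the slide calls unreasonable.

```python
# Sketch of size padding: round each packet length up to a multiple of
# `bucket`, preserving direction (positive = outgoing, negative = incoming,
# a convention assumed for this illustration).

def pad_trace(trace, bucket=512):
    # -(-x // b) is ceiling division for positive x.
    return [-(-abs(p) // bucket) * bucket * (1 if p > 0 else -1) for p in trace]

orig = [557, -1448, -876, 557]          # hypothetical packet lengths
padded = pad_trace(orig)
print(padded)                            # -> [1024, -1536, -1024, 1024]

# The defense's price: extra bytes on the wire.
overhead = (sum(map(abs, padded)) - sum(map(abs, orig))) / sum(map(abs, orig))
print(round(overhead, 2))                # -> 0.34
```

Even this toy example shows the leakage-vs-overhead tension the slide asks about: larger buckets leak less but cost more bandwidth.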

  48. To conclude • WF can be deployed by adversaries with only local access to the communications network • WF seriously undermines the protection offered by https • WF threatens the anonymity properties of Tor – Though it is unclear to what extent lab results would hold in the wild – The attack is costly in terms of resources • Disparate impact: some pages are more fingerprintable than others, which is not captured if you only look at average results • Countermeasures involve additional traffic and/or dynamism
