Website Fingerprinting Attacks and Defenses in the Tor Onion Space
Marc Juarez, imec-COSIC KU Leuven
COSIC Seminar, 23 October 2017, Leuven
Introduction • Contents of this presentation: - PETS’17: “Website Fingerprinting Defenses at the Application Layer” - CCS’17: “How Unique is Your Onion?” 2
What is Website Fingerprinting (WF)? [Diagram: the user connects to the WWW through Tor entry, middle, and exit relays; the WF adversary observes the encrypted traffic between the user and the entry relay] 3
Website Fingerprinting: deployment [Diagram: the attack pipeline, with a training phase and a testing phase] 4
Why Do We Care? • Tor is the most popular anonymity network and aims to protect against such adversaries. • There is a series of successful attacks with accuracies greater than 90% • … but how concerned should we be in practice? – Critical review of WF attacks (Juarez et al., 2014)
Closed vs Open World [Diagram contrasting the closed world (the user only visits sites the attacker has trained on) with the open world (the user may also visit unmonitored sites)] 6
Tor Hidden Services (HS) [Diagram: a user visiting xyz.onion over Tor] • HS: the user visits xyz.onion without resolving it to an IP address • Examples: Wikileaks, GlobaLeaks, Facebook, ... 7
Website Fingerprinting on Hidden Services (HSes) • A WF adversary can distinguish HSes from regular sites • Website Fingerprinting on HSes is more threatening: - The smaller number of sites makes HSes more identifiable (~ closed world) - HS users are more vulnerable because the content is sensitive 8
The SecureDrop case • Freedom of the Press Foundation • Whistleblowing platform • Vulnerable to website fingerprinting (?) 9
Website Fingerprinting Defenses at the Application Layer Giovanni Cherubin 1 Jamie Hayes 2 Marc Juarez 3 1 Royal Holloway University of London 2 University College London 3 imec-COSIC KU Leuven Presented at PETS 2017, Minneapolis, MN, USA
Website Fingerprinting defenses • WF defenses: BuFLO, CS-BuFLO, Tamaraw, WTF-PAD, … [Diagram: the defense injects dummy packets among the real ones on the link between the user and the entry relay; these are TCP packets or Tor messages] 11
Application-layer Defenses • Existing defenses are designed at the network layer • Key observation: the identifying information originates at the application layer! [Diagram: the web content generates 'latent' features F1, …, Fn at the HTTP(S) layer; the lower layers (Tor, TLS — the last layer of encryption — and TCP) apply a transformation T(·), and the adversary only observes the transformed features O1, ..., On] 12
Pros and Cons of app-layer Defenses • The main advantage is that they are easier to implement: they do not depend on Tor • Cons: padding runs end-to-end, so it may require server collaboration… but HSes have incentives! 13
LLaMA vs ALPaCA (two different solutions, not a client-server solution):
• LLaMA: client-side (FF add-on), applied on website requests, more latency overhead
• ALPaCA: server-side (the first server-side defense), applied on hosted content, more bandwidth overhead
14
ALPaCA [Diagram: original page → target → morphed page] • Abstracts web pages as the number of objects and the object sizes: pads them to match a target page • Does not impact the user experience: padding can be hidden in, e.g., comments in HTML/JS, images' metadata, or "display: none" styles 15
ALPaCA strategies (1) Example: protect a SecureDrop page - Strategy 1: the target page is Facebook [Diagram: the securedrop page (index.html, securedrop.png) is padded, and a fake.css object is added, so that it matches the facebook page (index.html, facebook.png, style.css)] 16
ALPaCA strategies (2) - Strategy 2: pad to an "anonymity set" target page [Diagram: both the securedrop and the facebook pages are padded towards a common target] The target defines the number of objects and the object sizes: ● Deterministic: next multiple of λ, δ ● Probabilistic: sampled from an empirical distribution (see the sketch below) 17
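A minimal sketch (Python) of the two target-selection strategies, assuming λ is the size multiple and δ the object-count multiple; the parameter values and the "covering" condition are illustrative, not ALPaCA's actual logic:

```python
import math
import random

def deterministic_target(sizes, lam=512, delta=5):
    """Deterministic strategy: pad every object size up to the next multiple of
    lam bytes and the object count up to the next multiple of delta.
    lam and delta are illustrative values, not ALPaCA's defaults."""
    padded = [math.ceil(s / lam) * lam for s in sizes]
    n_target = math.ceil(len(sizes) / delta) * delta
    padded += [lam] * (n_target - len(sizes))  # fake objects of minimum size
    return padded

def probabilistic_target(sizes, count_dist, size_dist):
    """Probabilistic strategy: sample a target page (object count and sizes)
    from empirical distributions until it covers the original page, i.e. it
    has at least as many objects and each is at least as large.
    Note: this loops forever if the distributions cannot cover the page."""
    original = sorted(sizes, reverse=True)
    while True:
        n_target = random.choice(count_dist)
        if n_target < len(original):
            continue
        target = sorted(random.choices(size_dist, k=n_target), reverse=True)
        if all(t >= o for t, o in zip(target, original)):
            return target

# Example: pad a page with three objects, using counts/sizes observed in a crawl
print(deterministic_target([1300, 4200, 700]))
print(probabilistic_target([1300, 4200, 700],
                           count_dist=[3, 4, 5, 8],
                           size_dist=[512, 1024, 2048, 4096, 8192]))
```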
Evaluation: methodology • Collect traffic with and without the defense: 100 HSes (cached) ○ Security: accuracy of the attacks kNN, k-Fingerprinting (kFP), and CUMUL ○ Performance: overheads - latency (extra delay) - bandwidth (extra padding / time) 18
ALPaCA: results • From 40% to 60% decrease in accuracy • 50% latency and 85% bandwidth overheads 19
How Unique is Your Onion? An Analysis of the Fingerprintability of Tor Onion Services Rebekah Overdorf 1 Marc Juarez 2 Gunes Acar 2 Rachel Greenstadt 1 Claudia Diaz 2 1 Drexel University 2 imec-COSIC KU Leuven To be presented at CCS 2017, Dallas, TX, USA
Disparate impact • WF attacks normally report average success • But… – Are certain websites more susceptible to website fingerprinting attacks than others? – What makes some sites more vulnerable to the attack than others? Credit: Claudia Diaz
State-of-the-Art Attacks - k-NN (Wang et al., 2015) - CUMUL (Panchenko et al., 2016) - k-Fingerprinting (Hayes and Danezis, 2016) 23
k-NN (Wang et al. 2015) • Features – number of outgoing packets in spans of 30 packets – the lengths of the first 20 packets – traffic bursts (sequences of packets in the same direction) • Classification – k-NN – Tunes the weights of the distance metric to minimize the distance among instances that belong to the same site. • Results – From 90% to 95% accuracy on a closed world of 100 non-hidden-service websites. (See the feature-extraction sketch below.) Credit: Bekah Overdorf
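A rough sketch of this kind of feature extraction from a trace of signed packet lengths (+ outgoing, − incoming); it illustrates the feature families listed on the slide, not the authors' exact feature set or the weighted-distance k-NN itself:

```python
def knn_features(trace):
    """trace: list of signed packet lengths, + for outgoing, - for incoming."""
    feats = []
    # number of outgoing packets in consecutive spans of 30 packets
    for i in range(0, len(trace), 30):
        feats.append(sum(1 for p in trace[i:i + 30] if p > 0))
    # lengths of the first 20 packets (zero-padded if the trace is shorter)
    first20 = trace[:20]
    feats += first20 + [0] * (20 - len(first20))
    # bursts: lengths of maximal runs of packets in the same direction
    bursts, run = [], 1
    for prev, cur in zip(trace, trace[1:]):
        if (prev > 0) == (cur > 0):
            run += 1
        else:
            bursts.append(run)
            run = 1
    bursts.append(run)
    feats += [len(bursts), max(bursts), sum(bursts) / len(bursts)]
    return feats

print(knn_features([600, -1500, -1500, 600, -1500, 600, 600]))
```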
CUMUL (Panchenko et al. 2016) • Features – 100 interpolation points of the cumulative sum of packet lengths (with direction) • Classification – SVM with a Radial Basis Function (RBF) kernel • Results – From 90% to 93% for 100 non-HS sites. (See the sketch below.) Credit: Bekah Overdorf
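A minimal sketch of the CUMUL-style feature vector (the cumulative sum of signed packet lengths, interpolated at 100 points); the SVM hyperparameters and the `load_dataset` helper are placeholders, not the paper's values:

```python
import numpy as np
from sklearn.svm import SVC

def cumul_features(trace, n_points=100):
    """trace: signed packet lengths (+ outgoing, - incoming).
    Interpolate the cumulative sum at n_points equidistant positions."""
    cum = np.cumsum(trace)
    xs = np.linspace(0, len(cum) - 1, n_points)
    return np.interp(xs, np.arange(len(cum)), cum)

# One feature vector per visit, one RBF-kernel SVM over all sites.
# C and gamma below are placeholders; the paper tunes them by grid search.
# traces, labels = load_dataset()   # hypothetical loader
# X = np.vstack([cumul_features(t) for t in traces])
# clf = SVC(kernel="rbf", C=2**11, gamma=2**-3).fit(X, labels)
```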
k-Fingerprinting (Hayes and Danezis 2016) • Features – Timing and size features from the literature • Classification – Random Forest (RF) + k-NN (see the sketch below) • Results – 90% accuracy on 30 hidden services Credit: Bekah Overdorf
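A hedged sketch of one way to combine a random forest with k-NN (leaf-index fingerprints compared under Hamming distance); the data is synthetic and the pipeline illustrates the idea, not necessarily the paper's exact procedure:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

# Synthetic placeholder data: 200 visits x 50 timing/size features, 30 site labels
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
y = rng.integers(0, 30, size=200)
X_train, X_test, y_train, y_test = X[:150], X[150:], y[:150], y[150:]

# Stage 1: a random forest over the hand-crafted timing/size features
rf = RandomForestClassifier(n_estimators=150, random_state=0).fit(X_train, y_train)

# Stage 2: the vector of leaf indices a trace falls into serves as its
# "fingerprint"; fingerprints are compared with k-NN under Hamming distance
fp_train, fp_test = rf.apply(X_train), rf.apply(X_test)
knn = KNeighborsClassifier(n_neighbors=3, metric="hamming").fit(fp_train, y_train)
accuracy = (knn.predict(fp_test) == y_test).mean()
```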
Data • Crawled 790 sites over Tor (homepages) • Removed – Offline sites – Failed visits – Duplicates • 482 sites fit our criteria with 70 visits each Credit: Bekah Overdorf
Credit: Bekah Overdorf
SecureDrop sites • There was a SecureDrop site in our dataset: – Project On Gov't Oversight (POGO) • CUMUL achieved 99% accuracy! – As compared to 80% on average
Misclassifications of Hidden Services Credit: Bekah Overdorf
Median of total incoming packet size for misclassified instances Credit: Bekah Overdorf
Low-level Feature Analysis • Intra-class variance: the spread among instances of the same site. – Lower intra-class variance (more consistent visits) improves identification. • Inter-class variance: the spread among instances of different sites. – Higher inter-class variance (sites look more different from each other) improves identification. (See the sketch below.) Top features: 1. Total size of all outgoing packets 2. Total size of incoming packets 3. Number of incoming packets 4. Number of outgoing packets
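A simple way to compute both quantities for a single low-level feature (the exact definitions and normalization used in the paper may differ):

```python
import numpy as np

def class_variances(feature, labels):
    """feature: one value per visit (e.g., total outgoing bytes);
    labels: the site each visit belongs to.
    Returns (intra-class variance, inter-class variance)."""
    feature = np.asarray(feature, dtype=float)
    labels = np.asarray(labels)
    per_site = [feature[labels == s] for s in np.unique(labels)]
    # intra: average spread of visits around their own site's mean
    intra = float(np.mean([v.var() for v in per_site]))
    # inter: spread of the per-site means across different sites
    inter = float(np.var([v.mean() for v in per_site]))
    return intra, inter

# A feature helps identification when visits to a site are consistent
# (low intra-class variance) and sites differ from one another
# (high inter-class variance).
print(class_variances([10, 12, 11, 50, 52, 49], ["a", "a", "a", "b", "b", "b"]))
```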
Site-level Feature Analysis • Can we determine what characteristics of a website affect its fingerprintability? • Site-Level Features: – Number of embedded resources – Number of fonts – Screenshot size – Use of a CMS? – …
Can we predict if a site will be fingerprintable? "Meta-classifier": a random forest regressor trained on site-level features (see the sketch below)
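A minimal sketch of such a meta-classifier: a random forest regressor mapping site-level features to a per-site fingerprintability score. The feature names, values, scores, and hyperparameters below are illustrative only, not the paper's data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Toy data: one row per site with site-level features; the target is a
# per-site fingerprintability score (e.g., the attacks' F1 on that site).
sites = np.array([
    # [num_resources, num_fonts, screenshot_size_kb, uses_cms]
    [3,  1,   90, 0],
    [12, 2,  340, 1],
    [80, 5, 1200, 0],
    [45, 3,  800, 1],
])
fingerprintability = np.array([0.99, 0.95, 0.60, 0.70])

reg = RandomForestRegressor(n_estimators=200, random_state=0)
reg.fit(sites, fingerprintability)

# Feature importances indicate which site characteristics drive
# fingerprintability (cf. the results slide that follows).
print(dict(zip(["num_resources", "num_fonts", "screenshot_size_kb", "uses_cms"],
               reg.feature_importances_)))
```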
Results: importance of site-level features
Takeaways • WF threatens Tor, especially its Hidden Services. • Disparate impact: some pages are more fingerprintable than others (there is a bias in reporting average results). • WF defenses that alter the website design (app layer) are easier to implement and as effective as network-layer defenses. • Changes to the page that protect against WF: – Small (e.g., fewer resources) and dynamic.
• Future work: re-design ALPaCA to follow these guidelines.
Software and Data • HSes have incentives to support server-side defenses: SecureDrop has implemented a prototype of ALPaCA • ALPaCA is running on a HS: 3tmaadslguc72xc2.onion • Source code of the defenses: github.com/camelids • Source code and data for the fingerprintability analysis: cosic.esat.kuleuven.be/fingerprintability 40
The HS world • Exploratory crawl: 5,000 HSes (from Ahmia.fi) • Stats for the HS world (from intercepted HTTP headers) - Distribution of types, sizes and number of resources • Most HSes are small compared to an average website • Few HSes have any JS or 3rd-party content - JS: less than 13% (assumption: no JS) - 3rd-party content: less than 20% (assumption: no 3rd parties) 41
Limitations and Future Work • ALPaCA can only make sites bigger, not smaller • What is the optimal padding at the app layer? We lack a thorough feature analysis. • How do the distributions change over time? How do we update our defenses accordingly? - How does the strategy need to be adapted as HSes adopt our defense(s)? 42
LLaMA [Diagram: client-server timeline of HTTP requests C1 and C2; C1 is delayed by δ and a copy C1' is re-sent later] • Inspired by Randomized Pipelining; goal: randomize HTTP requests • The same goal, achieved from a FF add-on: - Random delays (δ) - Repeat previous requests (C1) (see the sketch below) 43
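A minimal client-side sketch of the idea, assuming the `requests` library; the real LLaMA is a Firefox add-on that intercepts the browser's own requests, so this wrapper and its parameters are only an illustration:

```python
import random
import threading
import time

import requests  # assumed available; the real LLaMA is a Firefox add-on

recent = []  # URLs requested recently, candidates for dummy re-requests

def llama_fetch(url, max_delay=1.0, repeat_prob=0.3):
    """Fetch url after a random delay (delta); with some probability, also
    re-issue an earlier request in the background as cover traffic.
    max_delay and repeat_prob are illustrative, not the add-on's values."""
    time.sleep(random.uniform(0.0, max_delay))
    if recent and random.random() < repeat_prob:
        dummy = random.choice(recent)
        threading.Thread(target=requests.get, args=(dummy,), daemon=True).start()
    recent.append(url)
    return requests.get(url)
```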
LLaMA: results • Accuracy drops between 20% and 30% • Less than 10% latency and bandwidth overheads 44