Website fingerprinting on Tor: attacks and defenses
Claudia Diaz, KU Leuven


  1. Website fingerprinting on Tor: attacks and defenses. Claudia Diaz, KU Leuven. Joint work with: Marc Juarez, Sadia Afroz, Gunes Acar, Rachel Greenstadt, Mohsen Imani, Mike Perry, Matthew Wright. Post-Snowden Cryptography Workshop, Brussels, December 10, 2015

  2. Metadata. It's not just about communications content: SIGINT covers time, duration, size, identities, location, and traffic patterns. Metadata is exposed by default in communications protocols; bulk collection of it is much smaller in size than content; it is machine readable, cheap to analyze, and highly revealing; and it enjoys a much lower level of legal protection. Dedicated systems to protect metadata: the Tor network, targeted by the NSA program "Egotistical Giraffe".

  3. Introduction: how does WF work? [Diagram: the user (Alice) fetches a webpage over Tor; the adversary observes her traffic and tries to infer which webpage she is visiting.]

  4. Why is WF so important? Tor is the most advanced anonymity network (according to the NSA). WF allows an adversary to recover users' web browsing history. There is a series of successful attacks, under a weak adversary model (a local adversary). [Chart: number of top-conference publications on WF (30).]

  5. Introduction: assumptions. Client settings: the Tor Browser. Browsing behaviour: the pages a user visits, loaded one at a time. [Diagram: User, Tor, Web, Adversary.]

  6. Introduction: assumptions. Adversary: can replicate the user's system and Web configuration, can parse traffic into page loads (detect the start and end of a page), and works with clean traces. [Diagram: User, Tor, Web, Adversary.]

  7. Introduction: assumptions. Web: no personalisation or staleness of the web pages. [Diagram: User, Tor, Web, Adversary.]

  8. Methodology. Based on Wang and Goldberg's methodology: batches and k-fold cross-validation; the fast-Levenshtein attack (SVM). Comparative experiments; the key is to isolate the variable under evaluation (e.g., the TBB version).
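
To make the evaluation step concrete, here is a minimal sketch of k-fold cross-validation with a generic SVM standing in for the fast-Levenshtein attack. The feature extraction, the scikit-learn classifier and the placeholder data are assumptions for illustration, not the authors' code.

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def evaluate_attack(X, y, folds=10):
    # X: one row of trace features per page load, y: page labels
    clf = SVC(kernel="rbf", C=1.0)                 # stand-in for the SVM-based attack
    scores = cross_val_score(clf, X, y, cv=folds)  # k-fold cross-validation accuracy
    return scores.mean(), scores.std()

# Placeholder data standing in for a crawl of 100 pages, 10 visits each:
X = np.random.rand(1000, 50)
y = np.repeat(np.arange(100), 10)
print(evaluate_attack(X, y))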

  9. Comparative experiments: example ● Step 1: ● Step 2:

  10. Comparative experiments: example ● Step 1 (Control): train on data with the default value; test on data with the default value; record the accuracy. ● Step 2:

  11. Comparative experiments: example ● Step 1 (Control): train on data with the default value and record the accuracy. ● Step 2 (Test): keep the same training data, but test on data with the value of interest.
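
A minimal sketch of this two-step protocol; the function and variable names are illustrative, not taken from the original code.

from sklearn.svm import SVC

def comparative_experiment(X_train, y_train,
                           X_control, y_control,
                           X_test, y_test):
    # Train once on traces collected with the default value of the variable.
    clf = SVC(kernel="rbf").fit(X_train, y_train)
    control_acc = clf.score(X_control, y_control)  # Step 1: test on default-value traces
    test_acc = clf.score(X_test, y_test)           # Step 2: test on value-of-interest traces
    return control_acc, test_acc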

  12. Experiments: multitab browsing ● Firefox users use 2 or 3 tabs on average

  13. Experiments: multitab browsing ● Firefox users use 2 or 3 tabs on average ● Experiment with 2 tabs, opened with a gap of 0.5s, 3s or 5s

  15. Experiments: multitab browsing [Diagram: foreground and background page loads overlapping in time] ● Firefox users use 2 or 3 tabs on average ● Experiment with 2 tabs, opened with a gap of 0.5s, 3s or 5s ● Background page picked at random

  16. Experiments: multitab browsing ● Firefox users use 2 or 3 tabs on average ● Experiment with 2 tabs, opened with a gap of 0.5s, 3s or 5s ● Background page picked at random for each batch ● Success: detection of either page (scored as in the sketch below)
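
A small sketch of that success metric (names assumed): a prediction counts as correct if it matches either the foreground or the background page of the trace.

def multitab_accuracy(predictions, foreground, background):
    # A guess counts as a hit if it matches either of the two open tabs.
    hits = sum(pred in (fg, bg)
               for pred, fg, bg in zip(predictions, foreground, background))
    return hits / len(predictions)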

  17. Experiments: multitab browsing [Diagram: bandwidth over time for Tab 1 and Tab 2. Bar chart: accuracy for the different time gaps. Control: 77.08%; with a background tab opened after 0.5s, 3s or 5s, accuracy drops below 10% (9.8%, 7.9% and 8.23%).]

  18. Experiments: TBB versions. Coexisting Tor Browser Bundle (TBB) versions. Versions: 2.4.7, 3.5 and 3.5.2.1 (changes in RP, randomized pipelining, etc.)

  19. Experiments: TBB versions. Coexisting TBB versions: 2.4.7, 3.5 and 3.5.2.1. [Bar chart: Control (3.5.2.1): 79.58%; Test (3.5): 66.75%; Test (2.4.7): 6.51%.]

  20. Experiments: network conditions. VMs in Leuven (at KU Leuven), New York and Singapore (DigitalOcean virtual private servers).

  21. Experiments: network conditions. [Bar chart: Control (Leuven): 66.95%; Test (New York): 8.83%.]

  22. Experiments: network conditions. [Bar chart: Control (Leuven): 66.95%; Test (Singapore): 9.33%.]

  23. Experiments: network conditions. [Bar chart: Control (Singapore): 76.40%; Test (New York): 68.53%.]

  24. Experiments: data staleness. Staleness of our collected data over 90 days (Alexa Top 100): accuracy drops below 50% after 9 days. [Plot: accuracy (%) vs. time (days).]

  25. Summary

  26. Closed vs Open world. Early WF works considered a closed world of pages the user may browse (train and test on that world). In practice, in the Tor case, there is an extremely large universe of web pages. How likely is the user (a priori) to visit a target web page? - If the adversary has a good prior, the attack becomes a "confirmation attack". - BUT it may be hard for the adversary to have a good prior, particularly for less popular pages. - If the prior is not a good estimate: base rate fallacy → many false positives. "False positives matter a lot" [1]. [1] Mike Perry, "A Critique of Website Traffic Fingerprinting Attacks", Tor Project Blog, 2013. https://blog.torproject.org/blog/critique-website-traffic-fingerprinting-attacks

  27. The base rate fallacy: example. Breathalyzer test: it identifies truly drunk drivers with probability 0.88 (true positives) and has a 0.05 false positive rate. Alice tests positive. What is the probability that she is indeed drunk (the BDR)? Is it 0.95? Is it 0.88? Something in between?

  28. The base rate fallacy: example. Breathalyzer test: 0.88 true positives, 0.05 false positives. Alice tests positive. What is the probability that she is indeed drunk (the BDR)? Is it 0.95? Is it 0.88? Something in between? Only about 0.1!

  29. The base rate fallacy: example ● The circle represents the world of drivers. ● Each dot represents a driver.

  30. The base rate fallacy: example ● 1% of drivers are driving drunk (the base rate, or prior).

  31. The base rate fallacy: example ● Of the drunk drivers, 88% are identified as drunk by the test.

  32. The base rate fallacy: example ● Of the sober drivers, 5% are erroneously identified as drunk.

  33. The base rate fallacy: example ● Alice must be within the black circle (the positives). ● Ratio of red (drunk) dots within the black circle: BDR = 7/70 = 0.1!
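
The same calculation can be written out with Bayes' rule. The dot diagram rounds the counts; with the rates quoted on slide 27 and a 1% base rate the exact value is about 0.15, the same order of magnitude and still far below the test's 0.88 true-positive rate:

$$
\mathrm{BDR} = P(\mathrm{drunk}\mid +) =
\frac{P(+\mid \mathrm{drunk})\,P(\mathrm{drunk})}
     {P(+\mid \mathrm{drunk})\,P(\mathrm{drunk}) + P(+\mid \mathrm{sober})\,P(\mathrm{sober})}
= \frac{0.88 \times 0.01}{0.88 \times 0.01 + 0.05 \times 0.99} \approx 0.15
$$

The 5% false positives apply to the much larger sober population, which is what drags the BDR down.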

  34. The base rate fallacy in WF. The base rate must be taken into account. In WF: blue dots are webpages, red dots are monitored pages. What is the base rate?

  35. The base rate fallacy in WF. What is the probability of visiting a monitored page? Experiment: - 4 monitored pages - train on the Alexa top 100, test on the Alexa top 35K - binary classification: monitored / non-monitored. Prior probability of visiting a monitored page: - uniform over the 35K pages - priors estimated from the Active Linguistic Authentication Dataset (ALAD) (3.5%): real-world users (80 users, 40K unique URLs).
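
The WF case uses the same BDR formula. A small sketch: the true-positive and false-positive rates below are placeholders (the real values come from evaluating the binary monitored/non-monitored classifier); only the priors are taken from the slide.

def bdr(tpr, fpr, prior):
    # P(page is monitored | classifier says "monitored")
    return tpr * prior / (tpr * prior + fpr * (1.0 - prior))

uniform_prior = 4 / 35000   # 4 monitored pages, uniform over a 35K-page world
alad_prior    = 0.035       # prior estimated from the ALAD dataset (3.5%)

# Hypothetical classifier rates, for illustration only:
for prior in (uniform_prior, alad_prior):
    print(bdr(tpr=0.90, fpr=0.01, prior=prior))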

  36. Experiment: BDR in a 35K world. [Plot: BDR vs. the size of the world, for a uniform world and for non-popular pages from ALAD; values shown: 0.8, 0.13 and 0.026.]

  37. Classify, but verify. A verification step tests the classifier's confidence. The number of false positives is reduced, but the BDR is still very low for non-popular pages.
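
The slides do not detail the verification mechanism; one minimal way to "classify, but verify" is to accept a "monitored" decision only when the classifier's confidence exceeds a threshold. The threshold value here is an arbitrary example.

import numpy as np

def classify_and_verify(clf, X, monitored_labels, threshold=0.9):
    # Requires a classifier exposing predict_proba (e.g., SVC(probability=True)).
    probs = clf.predict_proba(X)
    preds = clf.classes_[probs.argmax(axis=1)]
    confidence = probs.max(axis=1)
    # Flag as monitored only if the top guess is a monitored page AND we are confident:
    return [(p in monitored_labels) and (c >= threshold)
            for p, c in zip(preds, confidence)]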

  38. Cost for the adversary Adversary's cost will depend on: Number of pages (versions, personalisation)

  39. Cost for the adversary Adversary's cost will depend on: Number of pages (versions, personalisation) Number of target users (system configuration, location)

  40. Cost for the adversary Adversary's cost will depend on: Number of pages (versions, personalisation) Number of target users (system configuration, location) Training and testing complexities of the classifier

  41. Cost for the adversary. The adversary's cost will depend on: the number of pages (versions, personalisation); the number of target users (system configuration, location); and the training and testing complexity of the classifier. Maintaining a successful WF system is costly.

  42. Defenses against WF attacks. High level (randomized pipelining, HTTPOS): ineffective. Supersequence approaches and traffic morphing (grouping pages to create anonymity sets): infeasible. BuFLO (constant rate): expensive (bandwidth) and poor usability (latency). Tamaraw, CS-BuFLO: still expensive (bandwidth) with poor usability (latency).

  43. Requirements for defenses: effectiveness; no increase in latency; no need for computing / distributing auxiliary information; no server-side cooperation needed; bandwidth: some increase is tolerable on the input connections to the network.

  44. Adaptive padding. Based on a proposal by Shmatikov and Wang as a defense against end-to-end traffic confirmation attacks. It generates traffic packets at random times, with inter-packet timings following the distribution of general web traffic. It does NOT introduce latency: real packets are not delayed. It disturbs the key traffic features exploited by classifiers (burst features, total size) in an unpredictable way that differs for each visit to the same page.
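
A rough sketch of the core adaptive-padding loop, not the actual pluggable-transport code: sample a gap from the inter-packet-timing distribution of general web traffic; if a real packet shows up before the gap expires, forward it immediately, otherwise fill the unusually long silence with a dummy. The queue-based interface and dummy size are assumptions.

import queue
import random

DUMMY = b"\x00" * 512   # fixed-size dummy cell (size chosen arbitrarily here)

def adaptive_padding(real_packets, delay_bins, send):
    # real_packets: queue.Queue of outgoing packets
    # delay_bins:   list of (gap_seconds, weight) pairs from general web traffic
    gaps, weights = zip(*delay_bins)
    while True:
        gap = random.choices(gaps, weights=weights)[0]   # sample an expected inter-packet gap
        try:
            pkt = real_packets.get(timeout=gap)
            send(pkt)            # real traffic is forwarded as-is: no added latency
        except queue.Empty:
            send(DUMMY)          # the gap was longer than expected: pad it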

  45. Adaptive padding implementation. Implemented as a pluggable transport, run by both ends (OP ↔ Guard or Bridge) and controlled by the client (OP). The distribution of inter-packet delays must be obtained from a crawl.

  46. Adaptive padding

  47. Modifications to adaptive padding. Interactivity: two additional histograms to generate dummies in response to a packet received from the other end. Control messages: let the client tell the server the padding parameters. Soft-stop condition: sampling an "infinity" value (probabilistic), as sketched below.
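
The soft-stop condition can be sketched by giving the delay histogram an extra "infinity" bin; when that bin is sampled, the padding machine stops instead of emitting another dummy. The bin values and weights below are purely illustrative.

import math
import random

# Illustrative histogram: finite gap bins plus an "infinity" bin acting as a
# probabilistic soft-stop condition.
bins = [(0.01, 40), (0.05, 30), (0.2, 20), (1.0, 5), (math.inf, 5)]

def sample_gap(bins):
    gaps, weights = zip(*bins)
    return random.choices(gaps, weights=weights)[0]

gap = sample_gap(bins)
if math.isinf(gap):
    print("soft stop: no more padding for this burst")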

  48. Adaptive padding evaluation. Classifier: kNN (Wang et al.). Experimental setup: training on the Alexa Top 100; monitored pages: 10, 25, 80; open world: 5K-30K pages.
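
A minimal sketch of that evaluation setup; the feature extraction and the actual kNN attack of Wang et al. are not reproduced, a generic kNN and binary open-world scoring stand in.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def open_world_eval(X_train, y_train, X_test, y_test, monitored):
    clf = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
    pred = clf.predict(X_test)
    pred_mon = np.isin(pred, list(monitored))    # classifier says "monitored"
    true_mon = np.isin(y_test, list(monitored))  # trace really is a monitored page
    tpr = (pred_mon & true_mon).sum() / max(true_mon.sum(), 1)
    fpr = (pred_mon & ~true_mon).sum() / max((~true_mon).sum(), 1)
    return tpr, fpr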

  49. Evaluation results. Comparison with other defenses. Closed world: 100 pages; ideal attack conditions.
