Website Fingerprinting
Claudia Diaz, KU Leuven – COSIC
(With thanks to Marc Juarez and Bekah Overdorf)
Summer School on real-world crypto and privacy, June 2017

Outline
• Website Fingerprinting for HTTPS sites
• Website Fingerprinting for Tor
• From the lab to reality: reviewing assumptions
• Fingerprintability of hidden services

[Figure: website fingerprinting over HTTPS, train and test phases]

Side-channel leaks in web applications (Chen et al, 2010)
• Interactive pages that respond to user actions such as choices in drop-down menus, mouse clicks, typing
• Examples: healthcare diagnosis, taxation, web search (auto-complete)
• Characteristics:
– Stateful communication: transitions to the next state depend both on the current state and on its input
– Low-entropy input: small input space
– Uniqueness of traffic: disparate sizes and patterns for each possibility
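A toy sketch of how these three characteristics combine into a leak. It assumes, hypothetically, that the eavesdropper has recorded the encrypted response size for every possible typed prefix; the function name `infer_input`, the two-letter alphabet and all sizes are illustrative and not taken from Chen et al. Because the communication is stateful, a size that is ambiguous on its own is resolved by the prefix that led to it.

```python
# Toy model of a stateful size side channel in an auto-complete box.
# Hypothetical assumption: the eavesdropper has pre-recorded, for every
# possible typed prefix, the size in bytes of the encrypted suggestion
# response. All sizes below are made up for illustration.


def infer_input(observed_sizes, alphabet, response_size):
    """Return every typed string consistent with the observed response sizes.

    observed_sizes: one encrypted response size per keystroke, in order.
    alphabet: the small (low-entropy) set of possible keystrokes.
    response_size: mapping from typed prefix to its recorded response size.
    """
    candidates = [""]
    for size in observed_sizes:
        # Stateful: the next response depends on the prefix typed so far,
        # so only prefixes that are still consistent get extended.
        candidates = [
            prefix + ch
            for prefix in candidates
            for ch in alphabet
            if response_size.get(prefix + ch) == size
        ]
    return candidates


if __name__ == "__main__":
    # Hypothetical profile: "ab" and "ba" produce the same response size.
    sizes = {"a": 500, "b": 620, "aa": 410, "ab": 455, "ba": 455, "bb": 700}
    print(infer_input([620, 455], "ab", sizes))  # ['ba']
```

Here "ab" and "ba" look identical at the second keystroke, but the first response size has already ruled out "ab".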

“I know why you went to the clinic” (Miller et al, 2014)
• Hidden Markov Models used to leverage the link structure of websites
• Impact of caching and cookies was 17% (train with one option, test with the other)
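One way to see how link structure helps: treat pages as the hidden states of an HMM, links as transitions, and coarse traffic observations as emissions, then decode the most likely page sequence with the Viterbi algorithm. This is a generic textbook Viterbi sketch, not Miller et al.'s exact model; the page names, probabilities and the size symbols "S"/"M"/"L" are hypothetical.

```python
import math


def _log(p):
    # log(0) is represented as -inf so impossible paths are never chosen.
    return math.log(p) if p > 0 else float("-inf")


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely sequence of hidden pages given observed traffic symbols."""
    best = [{s: _log(start_p.get(s, 0)) + _log(emit_p[s].get(observations[0], 0))
             for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Best predecessor r: path score so far plus the link probability r -> s.
            prev, score = max(
                ((r, best[t - 1][r] + _log(trans_p[r].get(s, 0))) for r in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + _log(emit_p[s].get(observations[t], 0))
            back[t][s] = prev
    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


if __name__ == "__main__":
    states = ["home", "flu", "allergy", "clinic"]
    # Hypothetical link structure: the clinic page is linked only from "allergy".
    trans_p = {
        "home": {"flu": 0.5, "allergy": 0.5},
        "flu": {"home": 1.0},
        "allergy": {"clinic": 1.0},
        "clinic": {"home": 1.0},
    }
    # Hypothetical observations (coarse response sizes); "flu" and "allergy" look identical.
    emit_p = {
        "home": {"L": 1.0},
        "flu": {"M": 1.0},
        "allergy": {"M": 1.0},
        "clinic": {"S": 1.0},
    }
    print(viterbi(["L", "M", "S"], states, {"home": 1.0}, trans_p, emit_p))
    # ['home', 'allergy', 'clinic']
```

The two symptom pages produce the same observation, but only one of them links to the clinic, so the decoder resolves the ambiguity from the next step.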

[Figure: Tor. The client downloads the relays' public (onion) keys from the directory servers and builds a circuit through a Guard, a Middle and an Exit relay]

[Figure: User, Tor, Web]

Website Fingerprinting
[Figure: website fingerprinting against a Tor user browsing the Web; final build: open world]

Tor Hidden (“Onion”) Services (HS)
[Figure: a client reaching xyz.onion via the HSDir, an Introduction Point (IP) and a Rendezvous Point (RP); circuits HS-IP, Client-RP and HS-RP]
• HS-RP circuits are distinguishable from normal circuits (Kwon et al, 2015)
• The size of the HS world is estimated at a few thousand sites (closed world!)

State-of-the-art attacks
• k-NN
• CUMUL
• k-Fingerprinting

kNN classifier (Wang et al, 2014)
• Features: 3,000
– Total size, total time, number of packets, packet ordering
– The lengths of the first 20 packets
– Traffic bursts (sequences of packets in the same direction)
• Classification
– k-NN
– The weights of the distance metric are tuned to minimize the distance among instances that belong to the same site
• Results
– 90%–95% accuracy on a closed world of 100 non-onion-service websites
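A minimal sketch of the classification step. It assumes traces are encoded as signed packet sizes (+ outgoing, - incoming) and uses only four of the roughly 3,000 features. The weights are passed in rather than learned, and the vote is a plain majority; Wang et al. tune the weights as described above, so `burst_features`, `weighted_l1` and `knn_predict` are hypothetical helpers, not their implementation.

```python
from collections import Counter


def burst_features(trace):
    """A few simple trace features.

    trace: signed packet sizes, + outgoing and - incoming (assumed encoding).
    Returns [total packets, outgoing packets, incoming packets, number of bursts].
    """
    outgoing = sum(1 for p in trace if p > 0)
    incoming = len(trace) - outgoing
    # A burst ends whenever the packet direction changes.
    changes = sum(1 for a, b in zip(trace, trace[1:]) if (a > 0) != (b > 0))
    bursts = changes + 1 if trace else 0
    return [len(trace), outgoing, incoming, bursts]


def weighted_l1(x, y, weights):
    """Weighted L1 distance between two feature vectors (assumed metric form)."""
    return sum(w * abs(a - b) for a, b, w in zip(x, y, weights))


def knn_predict(train, query, weights, k=3):
    """Majority site label among the k training instances closest to query.

    train: list of (feature_vector, site_label) pairs.
    """
    nearest = sorted(train, key=lambda item: weighted_l1(item[0], query, weights))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

In an attack the feature vectors would come from many recorded visits to each monitored site; larger weights make the corresponding features matter more in the distance.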

CUMUL (Panchenko et al, 2016)
• Features: a 104-coordinate vector formed by the number of bytes and packets in each direction and 100 interpolation points of the cumulative sum of packet lengths (with direction)
• Classification: SVM with a Radial Basis Function (RBF) kernel
• Results
– 90%–93% accuracy for 100 non-HS sites
– Open world of 9,000 pages
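The CUMUL feature vector can be sketched as below. The signed-size trace encoding, the ordering of the four count features and the linear interpolation over packet index are assumptions of this sketch, and `cumul_features` is a hypothetical name; the RBF-kernel SVM itself (available, for example, as scikit-learn's `SVC(kernel="rbf")`) is not reproduced.

```python
def cumul_features(trace, n_points=100):
    """CUMUL-style feature vector: 4 counts + n_points interpolated values.

    trace: signed packet sizes in bytes, + outgoing and - incoming (assumed).
    """
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    # Packet and byte counts per direction (order chosen for this sketch).
    n_in = sum(1 for p in trace if p < 0)
    n_out = sum(1 for p in trace if p > 0)
    bytes_in = sum(-p for p in trace if p < 0)
    bytes_out = sum(p for p in trace if p > 0)

    # Cumulative sum of signed packet sizes, starting from 0.
    cumulative = [0]
    for p in trace:
        cumulative.append(cumulative[-1] + p)

    # Sample the piecewise-linear cumulative curve at n_points equidistant
    # positions along the packet index.
    last = len(cumulative) - 1
    samples = []
    for i in range(n_points):
        x = i * last / (n_points - 1)
        lo = min(int(x), last)
        hi = min(lo + 1, last)
        frac = x - lo
        samples.append(cumulative[lo] + frac * (cumulative[hi] - cumulative[lo]))
    return [n_in, n_out, bytes_in, bytes_out] + samples
```

With the default `n_points=100` this yields the 104 coordinates described above, whatever the trace length, which is what lets traces of different lengths be compared by one SVM.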

k-Fingerprinting (Hayes et al, 2016)
• Features: 175
– Timing and size features such as packets per second
• Classification: Random Forest (RF) + k-NN
• Results
– 90% accuracy on 30 onion services
– Open world of 100,000 pages

Random Forest
• Train decision trees with web traffic features
• Each tree is trained on a randomized sample of the training set
• A Random Forest is an ensemble of decision trees
• The Random Forest output (the leaf reached in each tree) is used as the fingerprint of a website download
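Once a forest is trained, each trace can be described by the leaf it reaches in every tree (scikit-learn, for instance, exposes this as `RandomForestClassifier.apply`). The sketch below assumes those leaf vectors are already available and compares them by Hamming distance with k-NN. The open-world rule used here (return a label only if all k neighbours agree, otherwise treat the page as unmonitored) is an assumption of the sketch, and `hamming` and `classify_fingerprint` are hypothetical names.

```python
# Sketch of the fingerprint-matching step. Assumption: every trace has
# already been turned into a "leaf vector", i.e. the index of the leaf it
# reaches in each tree of a trained random forest.


def hamming(a, b):
    """Number of trees in which two traces land in different leaves."""
    return sum(1 for x, y in zip(a, b) if x != y)


def classify_fingerprint(query, labelled, k=3):
    """k-NN over leaf vectors; None means "unmonitored" (neighbours disagree).

    labelled: list of (leaf_vector, site_label) pairs.
    The unanimity rule below is an assumption of this sketch.
    """
    nearest = sorted(labelled, key=lambda item: hamming(item[0], query))[:k]
    labels = {label for _, label in nearest}
    return labels.pop() if len(labels) == 1 else None
```

Two downloads of the same page tend to fall into the same leaves in most trees, so a small Hamming distance signals a likely match; requiring unanimity trades some true positives for fewer false positives.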

Why Do We Care?
• Tor is the most advanced anonymity network
• WF allows an adversary to discover a user's browsing history
• Can be deployed by a low-resource adversary (the kind of adversary Tor aims to protect against)
• Series of successful attacks in the lab
• … how concerned should we be about these attacks in practice?
– Critical review of WF attacks (Juarez et al, 2014)

Assumptions
• Client settings: e.g., browser version, single-tab browsing
[Figure: User, Tor, Web, Adversary]

Effect of multi-tab browsing
● Firefox users use 2 or 3 tabs on average
● Experiment with 2 tabs opened 0.5 s, 3 s or 5 s apart
● Success: detection of either page

Experiments: multi-tab
[Figure: bandwidth (BW) over time for Tab 1 and Tab 2]
• Accuracy for different time gaps: Control 77.08%; Tests with 0.5 s, 3 s and 5 s gaps between 7.9% and 9.8%

Experiments: TBB version
• TBB: Tor Browser Bundle
• Several versions coexist at any given time
• Accuracy: Control (3.5.2.1): 79.58%; Test (3.5): 66.75%; Test (2.4.7): 6.51%

Assumptions
• Adversary: e.g., replicability
[Figure: User, Tor, Web, Adversary]

Experiments: network conditions
[Figure: VMs in Leuven (KU Leuven), New York and Singapore (DigitalOcean virtual private servers)]

Experiments: network conditions (accuracy)
• Control (LVN): 66.95% vs. Test (NY): 8.83%
• Control (LVN): 66.95% vs. Test (SI): 9.33%
• Control (SI): 76.40% vs. Test (NY): 68.53%

Assumptions
• Web: e.g., staleness
[Figure: User, Tor, Web, Adversary]

Data staleness
[Figure: accuracy (%) vs. time (days)]
• Accuracy drops below 50% after 9 days

Effect of false positives: Base rate fallacy
• Breathalyzer test:
– Identifies 88% of truly drunk drivers (true positive rate 0.88)
– False positive rate 0.05
• Alice tests positive
– What is the probability that she is indeed drunk (Bayesian detection rate, BDR)?
– Is it 0.95? Is it 0.88? Something in between?
– Only 0.1!

The base rate fallacy: example
● The circle represents the world of drivers.
● Each dot represents a driver.

The base rate fallacy: example
● 1% of drivers are driving drunk (base rate or prior).

The base rate fallacy: example
● Of the drunk drivers, 88% are identified as drunk by the test.

The base rate fallacy: example
● Of the sober drivers, 5% are erroneously identified as drunk.

The base rate fallacy: example
● Alice must be within the black circle
● Ratio of red dots within the black circle: BDR = 7/70 = 0.1!
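The dot count in the figure (7/70) is an approximation; Bayes' theorem with the slides' rates gives the exact value. With a 1% prior, a 0.88 true positive rate and a 0.05 false positive rate, BDR = 0.0088 / (0.0088 + 0.0495) ≈ 0.15, still far below the 0.88 or 0.95 one might guess. A minimal sketch (the function name is hypothetical):

```python
def bayesian_detection_rate(prior, tpr, fpr):
    """P(target | test positive), by Bayes' theorem.

    prior: base rate of the target class (e.g. fraction of drunk drivers).
    tpr: true positive rate; fpr: false positive rate.
    """
    true_pos = prior * tpr            # P(target and positive)
    false_pos = (1 - prior) * fpr     # P(non-target and positive)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Breathalyzer numbers from the slides: 1% base rate, TPR 0.88, FPR 0.05.
    print(round(bayesian_detection_rate(0.01, 0.88, 0.05), 3))  # 0.151
```

For WF, `prior` would be the fraction of visits that go to monitored pages; when that fraction is tiny, even a small false positive rate makes most positives false alarms.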

The base rate fallacy in WF
• Base rate must be taken into account
• In WF:
– Blue: web pages
– Red: monitored pages
– Base rate?

Experiment: BDR in a 35K world
• World of 35K sites
• 4 target pages
• Uniform prior
• At a world size of 30K sites, the BDR is 0.4%

Disparate impact
• WF attacks normally report average success
• But…
– Are certain websites more susceptible to website fingerprinting attacks than others?
– What makes some sites more vulnerable to the attack than others?

Misclassifications of onion services: Sites that are “safe”

Misclassifications: Sites that are “safe”
• Some sites are hidden from all methods!

[Figure: median of total incoming packet size for misclassified instances; axes: Predicted Site − Median and Median − True Site]

Site-level Feature Analysis
• Trace features are not always helpful
• Can we determine what characteristics of a website affect its fingerprintability?
• Site-level features:
– Total HTTP download size
– HTTP duration
– Screenshot size
– Number of scripts
– …

Site Level Feature Analysis

WF countermeasures
• Network layer
– Add padding (constant rate is unreasonable; leakage: how to optimize padding?)
– Add latency to disrupt the traffic pattern (bad idea)
• Page design
– Small size
– Dynamism
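A sketch of one simple padding strategy and its cost: append dummy packets until the trace length reaches a multiple of a block size, so pages whose lengths fall in the same block become indistinguishable by length. The block size, the 512-byte dummy packets and the names `pad_to_block` and `bandwidth_overhead` are hypothetical; this illustrates the padding versus leakage trade-off, not a specific deployed defense.

```python
def pad_to_block(trace, block, dummy_size=-512):
    """Append dummy packets until len(trace) is a multiple of block.

    trace: signed packet sizes (+ outgoing, - incoming), assumed encoding.
    dummy_size: hypothetical size and direction of each dummy packet.
    """
    if block <= 0:
        raise ValueError("block must be positive")
    missing = -len(trace) % block  # 0 if already a multiple of block
    return trace + [dummy_size] * missing


def bandwidth_overhead(original, padded):
    """Extra bytes sent by the defense, as a fraction of the original bytes."""
    original_bytes = sum(abs(p) for p in original)
    return (sum(abs(p) for p in padded) - original_bytes) / original_bytes


if __name__ == "__main__":
    site_a = [512] * 3 + [-512] * 10   # 13 packets (hypothetical page)
    site_b = [512] * 5 + [-512] * 12   # 17 packets (hypothetical page)
    padded_a, padded_b = pad_to_block(site_a, 20), pad_to_block(site_b, 20)
    print(len(padded_a), len(padded_b))                    # 20 20
    print(round(bandwidth_overhead(site_a, padded_a), 3))  # 0.538
```

A larger block hides more pages behind the same length but costs more bandwidth, which is the optimization question raised on the slide.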

To conclude
• WF can be deployed by adversaries with only local access to the communications network
• WF seriously undermines the protection offered by HTTPS
• WF threatens the anonymity properties of Tor
– Though it’s unclear to what extent lab results would hold in the wild
– The attack is costly in terms of resources
• Disparate impact: some pages are more fingerprintable than others, which is not captured if you only look at average results
• Countermeasures involve additional traffic and/or dynamism
