
You cannot hide for long: De-anonymization of real-world dynamic behaviour



  1. You cannot hide for long: De-anonymization of real-world dynamic behaviour • George Danezis (University College London) • Carmela Troncoso (Gradiant)

  2. Privacy beyond confidentiality • Common belief: “if I encrypt my data, then the data is private” • Encryption works and gets more and more efficient! • But it does not hide all data: origin and destination, timing, frequency, location, …

  3. Anonymization • Decouple user identity from actions [Figure: anonymizer] • Enabler for privacy-preserving technologies: • Anonymous credentials • eVoting • Privacy-preserving statistics computation

  4. Anonymity in reality • Difficult to guarantee perfect anonymity due to practical constraints • Observations allow for inferences (e.g., behavioral profiles) • [Figure: Alice and Bob communicate through an anonymizer; prior information on users' behaviour lets an observer infer "Alice is speaking to Bob with probability X"] • State-of-the-art limitation: static behavior

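  5. A model for dynamic behaviour • Users: Alice sends messages to Bob at rate $\lambda_{AB}$ and to others at rate $\lambda_{AO}$; others send messages to Bob at rate $\lambda_{OB}$ and to each other at rate $\lambda_{OO}$ • Dynamism: epochs $t$ of stationary behaviour, linked by a profile evolution probability • Anonymizer: perfect anonymity, divided into batches ($n$ batches per epoch) • Given the observation, what is $\lambda_{AB}$? • Visible volumes in epoch $t$, generated from the hidden rates: $V_A^t \sim \mathrm{Pois}(\lambda_{AB}^t + \lambda_{AO}^t)$, $V_O^t \sim \mathrm{Pois}(\lambda_{OB}^t + \lambda_{OO}^t)$, $V_B^t \sim \mathrm{Pois}(\lambda_{AB}^t + \lambda_{OB}^t)$, $V_{O'}^t \sim \mathrm{Pois}(\lambda_{AO}^t + \lambda_{OO}^t)$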
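The generative model can be simulated directly. This Python sketch is an illustration, not the authors' code. It assumes rates are expected messages per epoch and draws the four hidden pair counts independently. It then builds the visible volumes as sums of those counts, so each volume follows the $\mathrm{Pois}(\cdot)$ law listed on the slide. The names `poisson` and `simulate_epoch`, and the example rates, are hypothetical.

```python
import math
import random

def poisson(lam, rng):
    """Draw from Poisson(lam) with Knuth's method (adequate for lam up to a few hundred)."""
    if lam <= 0:
        return 0
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k

def simulate_epoch(rates, rng):
    """Sample one epoch: hidden per-pair counts first, then the four visible volumes.

    rates: dict with keys "AB", "AO", "OB", "OO" (expected messages per epoch).
    """
    n = {pair: poisson(lam, rng) for pair, lam in rates.items()}
    return {
        "V_A": n["AB"] + n["AO"],   # sent by Alice
        "V_O": n["OB"] + n["OO"],   # sent by everyone else
        "V_B": n["AB"] + n["OB"],   # received by Bob
        "V_O'": n["AO"] + n["OO"],  # received by everyone else
    }

rng = random.Random(1)
# Hypothetical profile: Alice talks to Bob for two epochs, then falls silent.
talking = {"AB": 5.0, "AO": 10.0, "OB": 50.0, "OO": 400.0}
silent = {"AB": 0.0, "AO": 15.0, "OB": 50.0, "OO": 400.0}
observations = [simulate_epoch(r, rng) for r in (talking, talking, silent)]
```

The visible volumes share hidden counts, so they are not independent. Every simulated epoch satisfies $V_A + V_O = V_B + V_{O'}$.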

  6. Sequential Monte Carlo, a.k.a. particle filters • Infer hidden parameters of sequential models • Our case: $\lambda_{AB}$ at epoch $t$ depends on $\lambda_{AB}$ at epoch $t-1$ • Core idea: • Particles represent samples of the hidden state $(\lambda_{AB}, \lambda_{OB})$ • They are distributed according to the posterior distribution given the evidence ($V_X$) • This allows computing statistics (mean, std, …) of the hidden variables • From Bayes' theorem: $\Pr[(\lambda_{AB}^t, \lambda_{OB}^t) \mid V^*] \propto L(V^* \mid \lambda^*)\, E(\lambda_{AB}^t \mid \lambda_{AB}^{t-1})\, \Pr[(\lambda_{AB}^{t-1}, \lambda_{OB}^{t-1})]$, i.e. probability at epoch $t$ ∝ likelihood of the observation given the hidden current state × probability of evolving to $\lambda_{AB}^t$ × prior (epoch $t-1$)
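Once particles and their weights are available, posterior statistics are weighted averages over the particles. The following minimal Python sketch assumes a scalar hidden variable such as $\lambda_{AB}$ and non-negative, unnormalised weights. The helper names are hypothetical.

```python
import math

def posterior_summary(particles, weights):
    """Weighted mean and standard deviation of a scalar hidden variable."""
    total = sum(weights)
    probs = [w / total for w in weights]
    mean = sum(p * x for p, x in zip(probs, particles))
    var = sum(p * (x - mean) ** 2 for p, x in zip(probs, particles))
    return mean, math.sqrt(var)

def weighted_quantile(particles, weights, q):
    """Smallest particle value whose cumulative normalised weight reaches q."""
    total = sum(weights)
    acc = 0.0
    for x, w in sorted(zip(particles, weights)):
        acc += w / total
        if acc >= q:
            return x
    return max(particles)  # guards against rounding when q is close to 1

mean, std = posterior_summary([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
median = weighted_quantile([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0], 0.5)
```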

  7. Toy example • [Figure: observations $V_A^t$, $V_B^t$, $V_O^t$, $V_{O'}^t$; particles moving from $\lambda^{t-1}$ to $\lambda^t$] • 1. Propose new particles given the observation and the previous state • 2. Weight particles by i. likelihood, ii. evolution, iii. proposal, targeting $\Pr[(\lambda_{AB}^t, \lambda_{OB}^t) \mid V^*]$ • 3. Re-sample
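Step 3 can use several resampling schemes, and the slides do not say which one is used. As an illustration, here is systematic resampling, a common low-variance choice, in Python. `systematic_resample` is a hypothetical helper that returns the indices of the particles to keep.

```python
import itertools
import random

def systematic_resample(weights, rng):
    """Return particle indices drawn by systematic resampling.

    One uniform offset u is shared by n evenly spaced positions (i + u) / n,
    each mapped to the particle whose cumulative-weight interval contains it.
    """
    n = len(weights)
    total = sum(weights)
    cumulative = list(itertools.accumulate(w / total for w in weights))
    cumulative[-1] = 1.0  # remove rounding error in the last bucket
    u = rng.random()
    indices, j = [], 0
    for i in range(n):
        position = (i + u) / n
        while j < n - 1 and position >= cumulative[j]:
            j += 1
        indices.append(j)
    return indices

rng = random.Random(0)
print(systematic_resample([0.1, 0.6, 0.3], rng))
```

Particles with large weights are duplicated and those with negligible weights disappear, while the total number of particles stays fixed.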

  8. In pseudocode • Take the observations in all epochs • Initialize particles • Propose the current state given the observation (all types of samples) • Compute the likelihood of the observation given the current and previous state • Reweight by the proposal likelihood given the proposal distributions • Resample to obtain new particles distributed according to the posterior
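The loop can be sketched end to end on a deliberately simplified problem. This is not the authors' implementation, and it rests on several assumptions:

- a single hidden rate $\lambda^t$ stands in for $\lambda_{AB}^t$, with a known background rate, so that $V^t \sim \mathrm{Pois}(\lambda^t + \text{background})$;
- a log-normal random-walk evolution replaces the slides' two-stage mixture;
- the proposal is log-normal and centred on the rate implied by the observation.

All names and parameter values are hypothetical.

```python
import math
import random

def log_poisson(k, lam):
    """log Pois(k; lam) for lam > 0."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def log_lognormal(x, mu, sigma):
    """log density at x > 0 of a log-normal whose log has mean mu and sd sigma."""
    z = (math.log(x) - mu) / sigma
    return -0.5 * z * z - math.log(x * sigma * math.sqrt(2.0 * math.pi))

def particle_filter(volumes, background, n_particles=2000,
                    walk_sigma=0.2, prop_sigma=0.3, seed=0):
    """Guided SMC for one hidden rate, assuming V^t ~ Pois(lam^t + background).

    Returns the weighted posterior mean of lam^t for every epoch.
    """
    rng = random.Random(seed)
    # Initialise particles from a broad (hypothetical) log-normal prior.
    particles = [math.exp(rng.gauss(math.log(10.0), 1.5)) for _ in range(n_particles)]
    means = []
    for v in volumes:  # take the observations epoch by epoch
        centre = math.log(max(v - background, 1.0))  # rate suggested by this observation
        proposed, log_w = [], []
        for prev in particles:
            lam = math.exp(rng.gauss(centre, prop_sigma))  # propose given the observation
            lw = (log_poisson(v, lam + background)                 # likelihood
                  + log_lognormal(lam, math.log(prev), walk_sigma)  # evolution from prev
                  - log_lognormal(lam, centre, prop_sigma))         # divide by proposal
            proposed.append(lam)
            log_w.append(lw)
        top = max(log_w)
        weights = [math.exp(lw - top) for lw in log_w]  # subtract max for stability
        total = sum(weights)
        means.append(sum(w * x for w, x in zip(weights, proposed)) / total)
        # Resample (multinomially) to get equally weighted particles for the next epoch.
        particles = rng.choices(proposed, weights=weights, k=n_particles)
    return means

means = particle_filter([30] * 10, background=10.0)
```

The proposal is not the evolution density, so each weight is likelihood × evolution ÷ proposal. Resampling then resets all weights to equal before the next epoch.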

  9. The likelihood function $L(V^* \mid \lambda^*)$ • How likely is an observation $V^*$ given the sending rates $\lambda^*$? • Probability of the total volume in the epoch given $\lambda^*$ (just Poisson) • Probability of each of the rounds: binomial, where $p_{ab}$ is the probability that a message received by B was sent by A, $p_{ab} = \lambda_{AB} / (\lambda_{AB} + \lambda_{OB})$
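One way to make these two ingredients concrete is to marginalise over the hidden number $k$ of Alice-to-Bob messages in an epoch. Under the Poisson model, the total received by Bob is $\mathrm{Pois}(\lambda_{AB} + \lambda_{OB})$. Given that total, $k$ is binomial with parameter $p_{ab}$. The remaining counts $V_A - k$ and $V_O - V_B + k$ are Poisson with rates $\lambda_{AO}$ and $\lambda_{OO}$.

The Python sketch below computes this epoch-level likelihood. It treats the whole epoch as one batch instead of multiplying per-round probabilities as the slide describes, and all names are hypothetical.

```python
import math

def log_pois(k, lam):
    """log Pois(k; lam); -inf for impossible counts."""
    if k < 0:
        return -math.inf
    if lam == 0:
        return 0.0 if k == 0 else -math.inf
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def log_binom(k, n, p):
    """log Binomial(k; n, p); -inf for impossible counts."""
    if k < 0 or k > n:
        return -math.inf
    if p == 0.0:
        return 0.0 if k == 0 else -math.inf
    if p == 1.0:
        return 0.0 if k == n else -math.inf
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def log_likelihood(v, lam):
    """log L(V* | lambda*) for one epoch of the Poisson model.

    v:   dict with "V_A", "V_O", "V_B", "V_O'" (observed volumes)
    lam: dict with "AB", "AO", "OB", "OO" (candidate rates)
    """
    if v["V_A"] + v["V_O"] != v["V_B"] + v["V_O'"]:
        return -math.inf  # messages sent and received must balance
    rate_b = lam["AB"] + lam["OB"]
    p_ab = lam["AB"] / rate_b if rate_b > 0 else 0.0
    terms = []
    for k in range(v["V_B"] + 1):  # k = hidden number of Alice -> Bob messages
        terms.append(log_binom(k, v["V_B"], p_ab)                    # who sent Bob's mail
                     + log_pois(v["V_A"] - k, lam["AO"])              # Alice -> others
                     + log_pois(v["V_O"] - v["V_B"] + k, lam["OO"]))  # others -> others
    top = max(terms)
    if top == -math.inf:
        return -math.inf
    log_sum = top + math.log(sum(math.exp(t - top) for t in terms))
    return log_pois(v["V_B"], rate_b) + log_sum  # Poisson total to Bob times the rest
```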

  10. The profile evolution probability $E(\lambda_{AB}^t \mid \lambda_{AB}^{t-1})$ • Probability of $\lambda_{AB}$ at $t$ given $\lambda_{AB}$ at $t-1$ • Two stages: 1) probability of transitions between silence and communication 2) probability of a given difference: a mixture with heavy tails
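A sampler for the two-stage evolution can be sketched as follows. The slides give only the structure, so the following choices are hypothetical and for illustration only: the transition probabilities, the exponential prior for a fresh rate, the Gaussian/Cauchy mixture for the heavy-tailed difference, and the reflection at zero. The evaluation below reports that the real parameters were inferred using EM.

```python
import math
import random

def sample_evolution(lam_prev, rng, p_stop=0.1, p_start=0.05, prior_mean=5.0,
                     p_wide=0.1, narrow_sd=0.5, wide_scale=5.0):
    """Draw lambda_AB^t given lambda_AB^{t-1} in two stages (hypothetical parameters).

    Stage 1: silent/communicating transition.
    Stage 2: if still communicating, add a difference drawn from a mixture of a
    narrow Gaussian and a heavy-tailed Cauchy component.
    """
    communicating = lam_prev > 0
    if not communicating:
        if rng.random() < p_start:
            return rng.expovariate(1.0 / prior_mean)  # start talking: draw a fresh rate
        return 0.0  # stay silent
    if rng.random() < p_stop:
        return 0.0  # stop talking
    if rng.random() < p_wide:
        delta = wide_scale * math.tan(math.pi * (rng.random() - 0.5))  # Cauchy: heavy tail
    else:
        delta = rng.gauss(0.0, narrow_sd)  # typical small change
    return abs(lam_prev + delta)  # reflect at zero so rates stay non-negative
```

The heavy-tailed component lets a profile occasionally jump to a very different rate without making small day-to-day changes unlikely.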

  11. Evaluation • Three datasets: • eMail: Enron dataset, ~0.5M emails, 150 users • Mailing list: Indymedia, ~300K posts from 28237 senders to 693 lists • Location: Gowalla dataset, ~6.5M check-ins from ~200K users • Parameters empirically inferred using EM, in two sets [Table: communication transitions (stop talking, stay silent); mixture evolution; prior silent] • Anonymity system: • 1 day delay (anonymity vs. delay trade-off given 1 week epochs) • Thresholds: eMail/mailing list ~100, location ~15K

  12. Evaluation – an example trace (Avg(Batch) = 244) • State of the art: Statistical Disclosure Attack (SDA) • Background traffic: messages to Bob • Use the background to estimate the volume in her rounds • Assumes static behaviour, both short and long term
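Background rounds can be used this way with a simplified, SDA-style difference estimator. This Python sketch is not the original Statistical Disclosure Attack. It assumes a threshold mix where each round has a known batch size. Under that assumption, the others' share of Bob's traffic in Alice's rounds scales with the $n - a$ batch slots she does not fill. Like the SDA, it assumes a single static profile. Names and data are hypothetical.

```python
def sda_estimate(alice_rounds, background_rounds):
    """SDA-style estimate of the fraction of Alice's messages addressed to Bob.

    alice_rounds: (alice_sent, bob_received, batch_size) for rounds where Alice sends.
    background_rounds: (bob_received, batch_size) for rounds where she does not.
    Assumes static behaviour: one fraction for the whole observation period.
    """
    # Fraction of background messages that go to Bob.
    bg_frac = sum(b for b, _ in background_rounds) / sum(n for _, n in background_rounds)
    # Expected others -> Bob volume in Alice's rounds, given the slots she leaves.
    expected_bg = sum(bg_frac * (n - a) for a, _, n in alice_rounds)
    observed = sum(b for _, b, _ in alice_rounds)
    sent = sum(a for a, _, _ in alice_rounds)
    return (observed - expected_bg) / sent

print(sda_estimate([(10, 5, 100), (10, 7, 100)], [(2, 100), (3, 100)]))
```

Because the estimate pools every round into one number, a profile that changes over time is averaged away. This is the static-behaviour limitation the slide points out.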

  13. Evaluation – estimation accuracy as squared error • [Table: MSE Comm, MSE Silent; figure: squared error per epoch for a trace]
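Squared error can be reported separately for communicating and silent epochs, matching the "MSE Comm" and "MSE Silent" labels. This is a minimal Python sketch. It assumes the true per-epoch rates are known, as with labelled traces, and that a silent epoch has a true rate of exactly zero. The function name is hypothetical.

```python
def mse_by_state(true_rates, estimates):
    """Mean squared error over communicating (true rate > 0) and silent (== 0) epochs."""
    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    comm = [(e - t) ** 2 for t, e in zip(true_rates, estimates) if t > 0]
    silent = [(e - t) ** 2 for t, e in zip(true_rates, estimates) if t == 0]
    return mean(comm), mean(silent)

print(mse_by_state([0.0, 0.0, 4.0, 6.0], [1.0, 0.0, 5.0, 3.0]))  # (5.0, 0.5)
```

Splitting the error keeps many easy silent epochs from masking large errors in the rarer communicating ones.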

  14. Evaluation – communication detection • Are Alice and Bob communicating? • Base rate fallacy! • [Figure: detection using the particle distribution vs. using the rate directly]
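A detector can read its answer from the whole particle distribution instead of thresholding a point estimate of the rate. Whatever the detector, its alarms must also be interpreted against the base rate. The Python sketch below illustrates both points. The cut-off `eps`, the function names, and the detector rates in the example are hypothetical.

```python
def prob_communicating(particles, weights, eps=1e-9):
    """Posterior probability that lambda_AB > 0, read off the weighted particles."""
    total = sum(weights)
    return sum(w for x, w in zip(particles, weights) if x > eps) / total

def positive_predictive_value(tpr, fpr, base_rate):
    """Bayes' rule: P(communicating | detector fires) for a given base rate."""
    hit = tpr * base_rate
    return hit / (hit + fpr * (1.0 - base_rate))

print(prob_communicating([0.0, 0.0, 2.0, 3.0], [1.0, 1.0, 1.0, 1.0]))
print(positive_predictive_value(0.99, 0.01, 0.001))
```

Take a detector with a 99% true-positive rate and a 1% false-positive rate, where one pair in a thousand is communicating. Its alarms are correct only about 9% of the time.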

  15. Conclusions • Structured model for traffic analysis based on known Bayesian inference techniques • easy to extend • allows assessment of inference quality • avoids the base rate fallacy • Attacks on real-world traces • can be effective for rather low action rates • can be effective over a much shorter period of time than previously thought • can be effective for secure configurations of the anonymity system • Rethink current evaluations and figures of merit

  16. Thanks!! ctroncoso@gradiant.org g.danezis@ucl.ac.uk
