  1. Bayesian Inference and Traffic Analysis. Carmela Troncoso, George Danezis. September-November 2008. Microsoft Research Cambridge / KU Leuven (COSIC)

  2. Anonymous Communications
  - "Tell me who your friends are..." => anonymous communications hide who talks to whom.
  - High-latency systems (e.g. anonymous remailers) use mixes [Chaum 81], which hide the relationship between their inputs and outputs. (Figure: a message relayed through a chain of mixes, MIX -> MIX -> MIX.)

  3. Anonymous Communications
  - Attacks on mix networks:
  - Restricted routes [Dan03]
  - Bridging and Fingerprinting [DanSyv08]
  - Social information: Disclosure Attack [Kes03], Statistical Disclosure Attack [Dan03], Perfect Matching Disclosure Attack [Tron08]
  - Heuristics and specific models

  4. Mix networks and traffic analysis
  - Determine the probability distribution over the senders (A, B, C) for each output.
  - (Figure: A and B enter MIX 1, so each of its outputs is "A or B" with distribution (1/2, 1/2, 0); C enters MIX 2 together with one MIX 1 output, giving "A or B or C" outputs with distribution (1/4, 1/4, 1/2); MIX 3 mixes the remaining flows, yielding Q = (3/8, 3/8, 1/4), R = (3/8, 3/8, 1/4), S = (1/4, 1/4, 1/2).)

  5. Mix networks and traffic analysis
  - Constraints change the picture, e.g. path length = 2.
  - (Figure: the same network; under the length-2 constraint the distributions become Q = (1/4, 1/4, 1/2), R = (1/4, 1/4, 1/2), S = (1/2, 1/2, 0).)
  - Non-trivial given the observation!

  6. "The real thing"
  - (Figure: senders, mixes (threshold = 3), receivers.)
  - How to compute probabilities systematically?

  7. Mix networks and traffic analysis
  - Find the "hidden state" HS of the mixes: Pr(HS | O, C)? (observation O, constraints C; figure: the example network with inputs A, B, C and outputs Q, R, S)
  - Prior information: Pr(HS | O, C) = Pr(O | HS, C) Pr(HS | C) / K, with K = Σ_HS Pr(HS, O | C)
  - The sum over HS is too large to enumerate!

  8. Mix networks and traffic analysis
  - "Hidden state" + observation = paths. (Figure: P1 = A -> M1 -> M2 -> M3 -> R; P2 = B -> M1 -> M3 -> Q; P3 = C -> M2 -> S.)
  - Pr(HS | O, C) = Pr(Paths | C) / K

  9. Bayesian Inference
  - Actually... we want marginal probabilities, e.g. Pr(A -> Q | O, C). (Figure: the example network with Q = (3/8, 3/8, 1/4), R = (3/8, 3/8, 1/4), S = (1/4, 1/4, 1/2).)
  - Pr(A -> Q | O, C) = Σ_HS I_{A->Q}(HS) Pr(HS | O, C)
  - But... we cannot obtain them directly.

  10. Bayesian Inference - sampling
  - If we obtain samples HS_1, HS_2, HS_3, HS_4, ..., HS_j ~ Pr(HS | O, C), score each with the indicator: (A -> Q)? 0 1 0 1 ... 1
  - Pr(A -> Q | O, C) ≈ (1/j) Σ_j I_{A->Q}(HS_j)
  - Markov Chain Monte Carlo methods: the Metropolis-Hastings algorithm, sampling from Pr(Paths | C) ∝ Pr(HS | O, C).
  - What does Pr(Paths | C) look like?
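The indicator average on this slide can be sketched in a few lines of Python; the list of indicator values below is hypothetical, standing in for I_{A->Q}(HS_j) computed on real samples:

```python
# Estimate Pr(A -> Q | O, C) as the fraction of MCMC samples whose
# hidden state routes message A to output Q (indicator average).
indicators = [0, 1, 0, 1, 1, 0, 1, 1]  # hypothetical I_{A->Q}(HS_j) values

p_hat = sum(indicators) / len(indicators)
print(p_hat)  # 0.625
```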

  11. Probabilistic model - Basic constraints
  - Users decide independently: Pr(Paths | C) = Π_x Pr(P_x | C)
  - Length restrictions, with any distribution Pr(L = l | C); e.g. uniform on (L_min, L_max): Pr(L = l | C) = 1 / (L_max - L_min + 1)
  - Node choice restrictions: choose l out of the N_mix nodes available, Pr(M | L = l, C) = 1 / P(N_mix, l); or choose a set, indicator I_set(M_x)
  - Pr(P_x | C) = Pr(L = l | C) Pr(M | L = l, C) I_set(M_x)
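The per-path prior above can be sketched as follows. This is a minimal sketch under two assumptions: lengths are uniform on {L_min, ..., L_max}, and P(N_mix, l) is the number of ordered selections of l distinct mixes; the function name `path_prior` is hypothetical, and the set indicator I_set is taken to be 1:

```python
import math

def path_prior(l, n_mix, l_min, l_max):
    """Pr(P_x | C): uniform length choice times one over the number of
    ordered selections of l distinct mixes out of n_mix (sketch; the
    indicator I_set(M_x) is assumed to be 1 here)."""
    if not (l_min <= l <= l_max):
        return 0.0
    pr_length = 1.0 / (l_max - l_min + 1)  # Pr(L = l | C), uniform
    pr_mixes = 1.0 / math.perm(n_mix, l)   # Pr(M | L = l, C) = 1 / P(N_mix, l)
    return pr_length * pr_mixes

print(path_prior(2, 3, 1, 3))  # 1/3 * 1/6 = 1/18
```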

  12. Probabilistic model - Basic constraints
  - Unknown destinations: when only the first l_obs hops of a path are observed (figure: a partial path, L_max = 3), sum over the possible lengths:
  - Pr(P_x | C) = Σ_{l = l_obs}^{L_max} Pr(L = l | C) Pr(M | L = l, C) I_set(M_x)

  13. Probabilistic model - More constraints
  - Bridging: known nodes enter through an indicator I_bridging(M_x)
  - Non-compliant clients (with probability p_cp): do not respect the length restrictions (L_min,cp, L_max,cp); choose l out of the N_mix nodes available, allowing repetitions: Pr(M | L = l, C, I_cp(Path_x)) = 1 / P_r(N_mix, l)
  - Pr(Paths | C) = Π_x Pr(P_x | C) = Π_{i ∈ P_cp} p_cp Pr(P_i | C, I_cp) · Π_{j ∉ P_cp} (1 - p_cp) Pr(P_j | C)

  14. Probabilistic model - More constraints
  - Social network information: assuming we know the sending profiles Pr(Sen_x -> Rec_x):
  - Pr(P_x | C) = Pr(L = l | C) Pr(M | L = l, C) I_set(M_x) Pr(Sen_x -> Rec_x)
  - Other constraints: unknown origin, dummies, other mixing strategies, ...

  15. Markov Chain Monte Carlo
  - Sample from a distribution that is difficult to sample from directly: Pr(HS | O, C) = Pr(O | HS, C) Pr(HS | C) / K ∝ Pr(Paths | C), with K = Σ_HS Pr(HS, O | C)
  - Key advantages: requires only a generative model (we know how to compute it!); good estimation of errors; no false positives or negatives; systematic

  16. Metropolis Hastings Algorithm
  - Constructs a Markov chain with stationary distribution Pr(HS | O, C), using a proposal Q(HS_candidate | HS_current) from the current state to a candidate state.
  - 1. Compute α = [Pr(HS_candidate) Q(HS_current | HS_candidate)] / [Pr(HS_current) Q(HS_candidate | HS_current)]
  - 2. If α >= 1, accept: HS_current <- HS_candidate; else draw u ~ U(0, 1) and accept HS_current <- HS_candidate if u < α, otherwise keep HS_current <- HS_current
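The two steps above can be sketched on a toy finite state space. This is a generic Metropolis-Hastings sampler, not the talk's sampler: the target weights are made up, and the proposal is uniform over all states (symmetric, so the Q terms cancel in the acceptance ratio):

```python
import random

def metropolis_hastings(target, states, n_steps, seed=0):
    """Generic Metropolis-Hastings over a finite state space with a
    uniform (symmetric) proposal, so Q(a|b)/Q(b|a) = 1 and the
    acceptance ratio reduces to target(candidate)/target(current)."""
    rng = random.Random(seed)
    current = states[0]
    samples = []
    for _ in range(n_steps):
        candidate = rng.choice(states)                 # propose
        alpha = target(candidate) / target(current)    # acceptance ratio
        if alpha >= 1 or rng.random() < alpha:
            current = candidate                        # accept
        samples.append(current)                        # else keep current
    return samples

# Toy unnormalised weights standing in for Pr(HS | O, C).
weights = {"HS1": 1.0, "HS2": 2.0, "HS3": 1.0}
samples = metropolis_hastings(lambda s: weights[s], list(weights), 20000)
print(samples.count("HS2") / len(samples))  # close to 2/(1+2+1) = 0.5
```

The frequency of each state in the chain converges to its normalised weight, which is exactly the property the sampler exploits to estimate marginals.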

  17. Our sampler: Q transition
  - The proposal Q(Paths_candidate | Paths_current) operates on paths, since Pr(Paths | C) ∝ Pr(HS | O, C); the acceptance ratio compares Pr(Paths_candidate) Q(Paths_current | Paths_candidate) with Pr(Paths_current) Q(Paths_candidate | Paths_current).
  - Transition Q: a swap operation on the paths (figure: two paths through M1, M2, M3 exchanging links).
  - More complicated transitions for non-compliant clients.
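A swap-style proposal can be sketched as follows. The list-of-hops encoding and this particular swap rule (exchange the suffixes of two paths from a common position) are assumptions for illustration, not the authors' exact operation:

```python
import random

def swap_transition(paths, rng=random):
    """Candidate generator Q: pick two paths and exchange their suffixes
    from a randomly chosen position onwards. The multiset of hops in the
    system is preserved, only the linking of inputs to outputs changes."""
    candidate = [list(p) for p in paths]          # copy, leave input intact
    i, j = rng.sample(range(len(candidate)), 2)   # two distinct paths
    pos = rng.randrange(1, min(len(candidate[i]), len(candidate[j])))
    candidate[i][pos:], candidate[j][pos:] = candidate[j][pos:], candidate[i][pos:]
    return candidate

rng = random.Random(1)
paths = [["A", "M1", "M2", "R"], ["B", "M1", "M3", "Q"]]
print(swap_transition(paths, rng))
```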

  18. Iterations
  - Consecutive samples are dependent, so keep only samples Paths_i, ..., Paths_j that are sufficiently separated in the chain (figure: intermediate states between recorded samples are discarded).

  19. Error estimation
  - Each sample contributes an indicator: (A -> Q)? Paths_1: 1, Paths_2: 0, Paths_3: 1, Paths_4: 0, ...
  - Error estimation: the indicators are Bernoulli distributed, Pr[Paths_1, Paths_2, Paths_3, ... | Pr(A -> Q)]
  - With the prior Beta(1, 1) ~ uniform, the posterior Pr[Pr(A -> Q) | Paths_1, Paths_2, Paths_3, ...] is
  - Pr(A -> Q) ~ Beta(Σ_i I_{A->Q}(Paths_i) + 1, Σ_i (1 - I_{A->Q}(Paths_i)) + 1)
  - Confidence intervals follow from this posterior.
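The conjugate update above is two lines of arithmetic; a minimal sketch (the helper name `beta_posterior` is hypothetical):

```python
def beta_posterior(indicators):
    """Beta(1, 1) prior updated with Bernoulli indicator samples:
    the posterior is Beta(successes + 1, failures + 1)."""
    s = sum(indicators)
    f = len(indicators) - s
    a, b = s + 1, f + 1
    mean = a / (a + b)  # posterior mean estimate of Pr(A -> Q)
    return a, b, mean

print(beta_posterior([1, 0, 1, 0]))  # (3, 3, 0.5)
```

Credible intervals for Pr(A -> Q) can then be read off the Beta(a, b) posterior.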

  20. Evaluation
  1. Create an instance of a network
  2. Run the sampler
  3. Choose a target sender Sen and a receiver Rec
  4. Estimate the probability Pr(Sen -> Rec) ≈ (1/j) Σ_j I_{Sen->Rec}(Paths_j)
  5. Check whether Sen actually chose Rec as receiver: I_{Sen->Rec}(network)
  6. Choose a new network and go to 2
  - Events should happen with the estimated probability: (1/j) Σ_j I_{Sen->Rec}(Paths_j) = E(I_{Sen->Rec}(network))
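The final check (events should happen with the estimated probability) is a calibration test. A minimal sketch, with a hypothetical helper and made-up numbers: bin the estimated probabilities and compare each bin with the empirical frequency of the event in that bin:

```python
def calibration(estimates, outcomes, n_bins=10):
    """Group estimated probabilities into n_bins equal-width bins and
    report the empirical event frequency per bin; a well-calibrated
    estimator matches each bin's probability range."""
    bins = [[] for _ in range(n_bins)]
    for p, outcome in zip(estimates, outcomes):
        bins[min(int(p * n_bins), n_bins - 1)].append(outcome)
    return [sum(b) / len(b) if b else None for b in bins]

# Hypothetical estimates Pr(Sen -> Rec) and outcomes I_{Sen->Rec}(network).
print(calibration([0.05, 0.12, 0.95, 0.93], [0, 0, 1, 1]))
# -> [0.0, 0.0, None, None, None, None, None, None, None, 1.0]
```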

  21. Results - compliant clients
  - (Figure: empirical frequency E(I_{Sen->Rec}(network)) plotted against the estimated probability (1/j) Σ_j I_{Sen->Rec}(Paths_j).)

  22. Results – 50 messages

  23. Results – 10 messages

  24. Results – big networks

  25. Performance - RAM usage

  Nmix  t   Nmsg    Samples  RAM (MB)
  3     3   10      500      16
  3     3   50      500      18
  5     10  100     500      19
  10    20  1 000   1 000    24
  10    20  10 000  1 000    125

  - RAM grows with the size of the network and population.
  - Results are kept in memory during the simulation.
  - The number of samples collected increases.

  26. Performance - Running time

  Nmix  t   Nmsg   iter  Full analysis (min)  One sample (ms)
  3     3   10     6011  2.33                 267.68
  3     3   50     6011  2.55                 306.00
  5     10  100    4011  1.58                 190.35
  10    20  1 000  7011  3.16                 379.76

  - Operations should be O(1).
  - Includes writing the results to a file.
  - Different numbers of iterations.

  27. Conclusions
  - Traffic analysis is non-trivial when there are constraints.
  - Probabilistic model: incorporates most attacks, including non-compliant clients.
  - Markov Chain Monte Carlo methods to extract marginal probabilities.
  - Future work: an SDA based on Bayesian inference. Added value?

  28. Thanks for your attention. Carmela.Troncoso@esat.kuleuven.be. Microsoft technical report coming soon...

  29. Bayes theorem
  - Joint probability: Pr(X, Y) = Pr(X | Y) Pr(Y) = Pr(Y | X) Pr(X)
  - Hence Pr(O, HS | C) = Pr(HS | O, C) Pr(O | C) and Pr(O, HS | C) = Pr(O | HS, C) Pr(HS | C)
  - So Pr(HS | O, C) = Pr(O | HS, C) Pr(HS | C) / Pr(O | C), with Pr(O | C) = Σ_HS Pr(HS, O | C)
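The derivation above can be checked numerically on a tiny two-state example; the prior and likelihood numbers are made up for illustration:

```python
# Numerical check of Bayes' theorem on a two-state hidden state:
# Pr(HS | O) = Pr(O | HS) Pr(HS) / sum_HS Pr(O | HS) Pr(HS).
prior = {"HS1": 0.5, "HS2": 0.5}        # Pr(HS | C), hypothetical
likelihood = {"HS1": 0.8, "HS2": 0.2}   # Pr(O | HS, C), hypothetical

evidence = sum(likelihood[h] * prior[h] for h in prior)  # Pr(O | C)
posterior = {h: likelihood[h] * prior[h] / evidence for h in prior}
print(posterior)  # {'HS1': 0.8, 'HS2': 0.2}
```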
