Bayesian inference to evaluate information leakage in complex scenarios
  Carmela Troncoso, Gradiant, Spain
  17th July 2013
  GALICIAN RESEARCH AND DEVELOPMENT CENTER IN ADVANCED TELECOMMUNICATIONS
Privacy beyond encryption
  Common belief: “if I encrypt my data, then the data is private”
  Encryption works and keeps getting more efficient!
  But it does not hide all data
    Origin and destination
    Timing
    Frequency
    Location
    …
  These data contain a lot of information
    WWII: the English recognized German Morse code operators
    Nowadays: Phonotactic Reconstruction of Encrypted VoIP Conversations: Hookt on fon-iks, A. White, A. Matthews, K. Snow, and F. Monrose, IEEE Symposium on Security and Privacy, May 2011
  CENTRO TECNOLÓXICO DE TELECOMUNICACIÓNS DE GALICIA
Easy, let’s hide this information!
  Delay messages to change frequency and timing patterns
    Messages cannot be delayed for too long
  Add dummy events to confuse the adversary
  Pad packets to hide their length
    Bandwidth is in general limited
  Reroute messages to hide origin and destination
    Delays messages
    Needs collaboration or dedicated infrastructure
  Obfuscate the location
    Obfuscation must not prevent usability
Maybe it is not that easy…
  Design decisions to:
    Balance available resources and privacy
    Balance usability and privacy
  Information will leak!!
  And do not forget there is an adversary who
    not only observes the public inputs/outputs of the system…
    … but also knows how the privacy-preserving mechanism operates
    e.g., ISPs, system administrators, data retention, …
  How to quantify the information leaked?
This is a problem we all have
  Given an observation…
    Location privacy mechanisms: Which is the real location?
    Anonymous communications: Who speaks with whom?
    Source identification: What device originated the image?
    Image forensics: Was the image tampered?
Case study: Anonymous communications
Anonymous communications
  Hide who speaks to whom
    sender, receiver, type of service, network address, friendship network, frequency, relationship status
  Main building block for privacy-preserving applications
    Desirable privacy (comms, surveys, …)
    Mandatory privacy (eVoting)
  Subject to constraints (bandwidth, delay, …)
    They must leak information!
Traffic analysis of Anonymous Communications
  Systems are evaluated against one attack at a time
    Network constraints
    Users’ knowledge
    Persistent communications
    …
  Based on heuristics and simplified models
  Exact calculation of probability distributions in complex systems was considered an intractable problem
Mix networks as an example
  Mixes hide relations between inputs and outputs
  Mixes are combined in networks in order to
    Distribute trust (one good mix is enough)
    Load balancing (no mix is big enough)
The traffic analysis game
  Who speaks to whom?
  [Figure: mix network with each link labelled by a probability (values 1/2, 3/8 and 1/4)]
Routing constraints
  Max length = 2 hops
  [Figure: mix network with link probabilities under the length constraint (values 1/2, 1/4, 1 and 0)]
  Non-trivial given the observation!!
Routing constraints
  Really, non-trivial!
  (we could think about user knowledge in the same way)
(Re)Defining Traffic analysis
  Find hidden state of mixes
(Re)Defining Traffic analysis
  Find hidden state of mixes: Pr[HS | O, C]?
  (HS: hidden state, O: observation, C: constraints)
  $$\Pr[HS \mid O, C] = \frac{\Pr[O \mid HS, C]\,\Pr[HS \mid C]}{\sum_{HS'} \Pr[O \mid HS', C]\,\Pr[HS' \mid C]}$$
(Re)Defining Traffic analysis
  Find hidden state of mixes: Pr[HS | O, C]?
  $$\Pr[HS \mid O, C] = \frac{\Pr[O \mid HS, C]\,\Pr[HS \mid C]}{\sum_{HS'} \Pr[O \mid HS', C]\,\Pr[HS' \mid C]} = \frac{\Pr[O \mid HS, C]\,K}{Z}$$
  where K denotes the prior Pr[HS | C] and Z the normalizing sum in the denominator
  The sum over all hidden states is too large to enumerate
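To make the normalization concrete, here is a minimal sketch that computes the posterior exactly for a toy round of a single mix with three messages. Everything in it is an assumption made for illustration: the PROFILES matrix, the function names, and the choice of a product of profile entries as a stand-in for Pr[O | HS, C] K. It is not the model used in the talk.

```python
from itertools import permutations

# Hypothetical sender profiles: PROFILES[s][r] = Pr[sender s writes to receiver r].
PROFILES = [
    [0.8, 0.1, 0.1],        # sender 0 ("Alice") mostly writes to receiver 0 ("Bob")
    [1 / 3, 1 / 3, 1 / 3],  # the other senders pick receivers uniformly
    [1 / 3, 1 / 3, 1 / 3],
]


def likelihood(hs):
    """Unnormalized weight of a hidden state; hs[s] is the receiver matched to sender s."""
    weight = 1.0
    for sender, receiver in enumerate(hs):
        weight *= PROFILES[sender][receiver]
    return weight


def exact_posterior(n):
    """Enumerate every hidden state (a permutation) and divide by the sum Z."""
    states = list(permutations(range(n)))
    weights = [likelihood(hs) for hs in states]
    z = sum(weights)
    return {hs: w / z for hs, w in zip(states, weights)}


posterior = exact_posterior(3)
# Marginal probability that Alice (sender 0) is matched to Bob (receiver 0).
p_alice_bob = sum(p for hs, p in posterior.items() if hs[0] == 0)
print(len(posterior), round(p_alice_bob, 3))
```

Three messages give 3! = 6 hidden states. Twenty messages in a single mix would already give 20! ≈ 2.4 × 10^18 states, before several mixes are combined into a network, which is why the sum Z cannot be computed by enumeration.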
Sampling to get probabilities
  Computing Pr[HS | O, C] is infeasible: too many HS
  … but we only care about marginal distributions
    Is Alice speaking to Bob?
  If we had many samples of HS drawn according to Pr[HS | O, C], we could simply count how many times Alice speaks to Bob
  Markov Chain Monte Carlo methods
    Sample from a distribution that is difficult to sample from directly
Metropolis-Hastings
  Simple:
    1. Given HS_0 (an internal configuration of the mixes)
    2. Propose a new state HS_1
    3. Accept with probability min(1, α), reject otherwise
  $$\alpha = \frac{\Pr[HS_1 \mid O, C]\; Q(HS_0 \mid HS_1)}{\Pr[HS_0 \mid O, C]\; Q(HS_1 \mid HS_0)} = \frac{\Pr[O \mid HS_1, C]\,\frac{K}{Z}\; Q(HS_0 \mid HS_1)}{\Pr[O \mid HS_0, C]\,\frac{K}{Z}\; Q(HS_1 \mid HS_0)}$$
  K/Z cancels, so Z never has to be computed
  Pr[O | HS, C] is a generative model (in general simple)
  Q() is a proposal function, e.g., swap two links in a mix
  The stationary distribution corresponds to Pr[HS | O, C]
  We can sample!
  The Bayesian traffic analysis of mix networks, C. Troncoso and G. Danezis, 16th ACM Conference on Computer and Communications Security (CCS 2009)
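The three steps can be sketched on a toy round of a single mix with five messages. This is only an illustration under stated assumptions: the PROFILES matrix, the function names and the product-of-profiles weight standing in for Pr[O | HS, C] are all hypothetical. The proposal swaps two links, so Q is symmetric and cancels in α.

```python
import random

N = 5  # messages in the toy mix round (assumption)

# Hypothetical profiles: sender 0 ("Alice") favours receiver 0 ("Bob"); others are uniform.
PROFILES = [[0.6] + [0.1] * (N - 1)] + [[1.0 / N] * N for _ in range(N - 1)]


def likelihood(hs):
    """Unnormalized weight of a hidden state; hs[s] is the receiver matched to sender s."""
    weight = 1.0
    for sender, receiver in enumerate(hs):
        weight *= PROFILES[sender][receiver]
    return weight


def metropolis_hastings(steps, burn_in, seed=0):
    """Estimate Pr[Alice -> Bob | O, C] by counting over the sampled hidden states."""
    rng = random.Random(seed)
    hs = list(range(N))
    rng.shuffle(hs)                               # HS_0: an arbitrary starting matching
    hits = kept = 0
    for step in range(steps):
        i, j = rng.sample(range(N), 2)            # propose HS_1 by swapping two links
        proposal = hs[:]
        proposal[i], proposal[j] = proposal[j], proposal[i]
        alpha = likelihood(proposal) / likelihood(hs)  # K/Z and the symmetric Q cancel
        if rng.random() < min(1.0, alpha):
            hs = proposal                         # accept, otherwise keep the current state
        if step >= burn_in:
            kept += 1
            hits += hs[0] == 0                    # does Alice speak to Bob in this sample?
    return hits / kept


estimate = metropolis_hastings(steps=50_000, burn_in=1_000)
print(round(estimate, 2))
```

In this toy model the exact marginal is 0.6, because the uniform senders contribute the same factor to every matching, so the estimate can be checked against it. In a real mix network no such closed form is available, and the sampled frequency is the quantity of interest.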
Why is this useful?
  Evaluating information-theoretic metrics for anonymity
    $$H = -\sum_{R_i} \Pr[A \to R_i \mid O, C]\,\log \Pr[A \to R_i \mid O, C]$$
    e.g., comparison of network topologies
  Estimating the probability of arbitrary events
    Input message to output message?
    Alice speaking to Bob ever?
    Two messages having the same sender?
  Accommodate new constraints
    Key to evaluate new mix network proposals
  Impact of Network Topology on Anonymity and Overhead in Low-Latency Anonymity Networks, C. Diaz, S. J. Murdoch, and C. Troncoso, 10th Privacy Enhancing Technologies Symposium (PETS 2010)
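As a small worked example of the entropy metric, the sketch below evaluates H for two hypothetical receiver distributions for Alice's message. The function name, the base-2 logarithm and both distributions are assumptions chosen for illustration.

```python
import math


def anonymity_entropy(probs, base=2.0):
    """Shannon entropy H = -sum p_i log p_i of a receiver distribution."""
    if not math.isclose(sum(probs), 1.0, rel_tol=1e-9, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    # Terms with p = 0 contribute nothing to the sum and are skipped.
    return -sum(p * math.log(p, base) for p in probs if p > 0)


uniform = [1 / 6] * 6                      # six equally likely receivers
skewed = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]    # one receiver is much more likely

h_uniform = anonymity_entropy(uniform)     # equals log2(6), the maximum for six receivers
h_skewed = anonymity_entropy(skewed)       # lower: the adversary has learned something
print(round(h_uniform, 3), round(h_skewed, 3))
```

The uniform case is the best the system can offer for an anonymity set of six. Any information the adversary extracts from the observation skews the distribution and lowers H.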
Persistent communications
  [Figure: round T_1 with senders (Alice, Others) and receivers (B, Others)]
  Perfect!
  Anonymity set size = 6
  Entropy metric H_A = log 6
Persistent communications
  [Figure: rounds T_1, T_2, T_3, …, T_ρ, each with senders (Alice, Others) and receivers (B, Others)]
  Rounds in which Alice participates output a message to her friends
  Her friends appear more often
  We can infer set of friends!
Statistical Disclosure Attacks
  Statistically finds frequent receivers
  Count & subtract “noise”
  20 users, 5 msgs/batch
  Alice’s friends: [0, 13, 19]
  [Figure: bar chart with axis ticks 0, 5, 10, 15; receiver 13 labelled]

  Round  Receivers              SDA
  1      [15, 13, 14, 5, 9]     [13, 14, 15]
  2      [19, 10, 17, 13, 8]    [13, 17, 19]
  3      [0, 7, 0, 13, 5]       [0, 5, 13]
  4      [16, 18, 6, 13, 10]    [5, 10, 13]
  5      [1, 17, 1, 13, 6]      [10, 13, 17]
  6      [18, 15, 17, 13, 17]   [13, 17, 18]
  7      [0, 13, 11, 8, 4]      [0, 13, 17]
  8      [15, 18, 0, 8, 12]     [0, 13, 17]
  9      [15, 18, 15, 19, 14]   [13, 15, 18]
  10     [0, 12, 4, 2, 8]       [0, 13, 15]
  11     [9, 13, 14, 19, 15]    [0, 13, 15]
  12     [13, 6, 2, 16, 0]      [0, 13, 15]
  13     [1, 0, 3, 5, 1]        [0, 13, 15]
  14     [17, 10, 14, 11, 19]   [0, 13, 15]
  15     [12, 14, 17, 13, …]    [0, 13, 17]
Statistical Disclosure Attacks
  Statistically finds frequent receivers
  Count & subtract “noise”
  20 users, 5 msgs/batch
  Alice’s friends: [0, 13, 19]
  [Table: the 15 rounds of receivers and SDA estimates from the previous slide]
  [Figure: bar chart with axis ticks 0, 5, 10, 15; receivers 13 and 19 labelled]
  Efficient
  Needs a lot of data for reliability
  More complex models (replies, pool mixes)
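The "count and subtract noise" idea can be sketched as follows. This is a simplified illustration, not the exact attack behind the table: it assumes Alice sends exactly one message per round to a uniformly chosen friend and that all other senders choose receivers uniformly at random. The function names and the simulated traffic are hypothetical.

```python
import random
from collections import Counter


def simulate_rounds(n_rounds, n_users, batch, friends, rng):
    """Generate the receivers seen in each round (hypothetical traffic model)."""
    rounds = []
    for _ in range(n_rounds):
        receivers = [rng.choice(friends)]                                # Alice's message
        receivers += [rng.randrange(n_users) for _ in range(batch - 1)]  # background traffic
        rng.shuffle(receivers)
        rounds.append(receivers)
    return rounds


def sda(rounds, n_users, batch, n_friends):
    """Count receivers, subtract the expected background, keep the top candidates."""
    counts = Counter(r for receivers in rounds for r in receivers)
    noise = len(rounds) * (batch - 1) / n_users   # expected background count per receiver
    scores = {user: counts[user] - noise for user in range(n_users)}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return sorted(ranked[:n_friends])


rng = random.Random(1)
rounds = simulate_rounds(n_rounds=2000, n_users=20, batch=5, friends=[0, 13, 19], rng=rng)
guess = sda(rounds, n_users=20, batch=5, n_friends=3)
print(guess)
```

With thousands of rounds the friends stand well clear of the background. With only a handful of rounds, as in the table, the estimate fluctuates from round to round, which is the "needs a lot of data" limitation.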
Co-inferring routing and profiles
  A simple approach
    Iterate profile and routing
    Introduces systematic errors if done naively
  Actually we want to find Pr[M, Ψ | O, C]
    M is the routing, Ψ are the profiles (multinomial distribution)
  Sounds familiar… Gibbs sampling
    MCMC to sample from a joint distribution Pr[X, Y | O, C]
    Iterate X ← Pr[X | Y, O, C] and Y ← Pr[Y | X, O, C]
  Perfect matching disclosure attacks, C. Troncoso, B. Gierlichs, B. Preneel, and I. Verbauwhede, 8th International Symposium on Privacy Enhancing Technologies (PETS 2008)
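The alternation X ← Pr[X | Y, O, C], Y ← Pr[Y | X, O, C] can be sketched on a tiny discrete example. The JOINT table below is a hypothetical stand-in for Pr[X, Y | O, C], small enough that both conditionals can be read off directly. In the setting of the slide, X would be the routing M and Y the profiles Ψ, whose conditionals need their own samplers.

```python
import random

# Hypothetical joint distribution over X in {0, 1} (rows) and Y in {0, 1, 2} (columns).
JOINT = [
    [0.30, 0.10, 0.05],
    [0.05, 0.20, 0.30],
]


def sample_index(weights, rng):
    """Draw an index with probability proportional to the given weights."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def gibbs(steps, burn_in, seed=0):
    """Alternate draws from the two conditionals and tally the visited (x, y) pairs."""
    rng = random.Random(seed)
    x, y = 0, 0
    counts = [[0] * 3 for _ in range(2)]
    for step in range(steps):
        # X <- Pr[X | Y]: column y of the joint, renormalized by rng.choices
        x = sample_index([JOINT[i][y] for i in range(2)], rng)
        # Y <- Pr[Y | X]: row x of the joint, renormalized by rng.choices
        y = sample_index(JOINT[x], rng)
        if step >= burn_in:
            counts[x][y] += 1
    kept = steps - burn_in
    return [[c / kept for c in row] for row in counts]


estimate = gibbs(steps=60_000, burn_in=1_000)
for row in estimate:
    print([round(p, 2) for p in row])
```

The empirical frequencies approach the joint table even though each step only ever samples one variable given the other. That property is what lets routing and profiles be inferred together rather than fixed one after the other.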