
Modeling Data Correlations in Private Data Mining with Markov Model - PowerPoint PPT Presentation



  1. Modeling Data Correlations in Private Data Mining with Markov Model and Markov Networks. Yang Cao, Emory University, 2017.11.15

  2. Outline
• Data Mining with Differential Privacy (DP)
• Scenario: Spatiotemporal Data Mining using DP
• Markov Chain for temporal correlations
• Gaussian Markov Random Field for user-user correlations
• Summary and open problems

  3. Outline
• Data Mining with Differential Privacy
• Scenario: Spatiotemporal Data Mining using DP
• Markov Chain for temporal correlations
• Gaussian Markov Random Field for user-user correlations
• Summary and open problems

  4. Data Mining
(Figure: a company or institute mines a sensitive database; the published results are exposed to attack by a public adversary.)

  5. Privacy-Preserving Data Mining (PPDM)
• How? ε-Differential Privacy!
(Figure: the institute releases noisy data instead of the sensitive data, so the adversary's attack fails.)

  6. What is Differential Privacy
• Privacy: the right to be forgotten.
• DP: the output of an algorithm should NOT be significantly affected by any individual's data, i.e., M(Q(D)) ≈ M(Q(D')) for neighboring databases D and D'.
• Formally, M satisfies ε-DP if for all neighboring D, D' and every output r:
  log [ Pr(M(Q(D)) = r) / Pr(M(Q(D')) = r) ] ≤ ε
• ε ⬆, privacy ⬇: e.g., 2ε-DP means more privacy loss than ε-DP.
• e.g., the Laplace mechanism: add Lap(1/ε) noise to Q(D).
• Sequential composition: e.g., running M twice → 2ε-DP.
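As a minimal sketch of the two bullets above (the Laplace mechanism and its ε bound), the snippet below adds Lap(1/ε) noise to a sensitivity-1 query and evaluates the privacy-loss ratio directly; the function names are illustrative, not from the slides:

```python
import math
import random

def laplace_mechanism(true_answer, epsilon, sensitivity=1.0):
    """Release a noisy answer: add Lap(sensitivity/epsilon) noise (epsilon-DP)."""
    scale = sensitivity / epsilon
    # A Laplace variable is the difference of two i.i.d. exponentials.
    noise = random.expovariate(1 / scale) - random.expovariate(1 / scale)
    return true_answer + noise

def privacy_loss(r, q_d, q_d_prime, epsilon, sensitivity=1.0):
    """Log-ratio of the two Laplace output densities at output r, for
    neighboring databases whose true query answers are q_d and q_d_prime."""
    scale = sensitivity / epsilon
    return (abs(r - q_d_prime) - abs(r - q_d)) / scale

noisy_count = laplace_mechanism(42, epsilon=0.5)  # one private release
```

For every output r, |privacy_loss| stays at most ε whenever |Q(D) − Q(D')| ≤ sensitivity, which is exactly the ε-DP inequality above.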

  7. An open problem of DP on correlated data
• When data are independent: changing one user's value changes Q(D) by at most the sensitivity, so M(Q(D)) ≈ M(Q(D')) ⇒ ε-DP.
• When data are correlated (e.g., u1 and u3 are always the same): changing u1's value implies u3's value also changes, so M(Q(D)) and M(Q(D')) can differ noticeably ⇒ ?-DP.
• The "guarantee" of DP on correlated data is still controversial [*][**].
[*] Differential Privacy as a Causal Property, https://arxiv.org/abs/1710.05899
[**] https://github.com/frankmcsherry/blog/blob/master/posts/2016-08-29.md
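A minimal numeric sketch of the second bullet, assuming a count query over three bits with u1 and u3 perfectly correlated (the scenario and numbers are illustrative, not from the slides): flipping u1 moves the count by 2, so Lap(1/ε) noise only bounds the observable privacy loss by 2ε, not ε.

```python
import math

def laplace_log_density(x, mean, scale):
    return -math.log(2 * scale) - abs(x - mean) / scale

eps = 1.0
scale = 1.0 / eps  # Lap(1/eps) noise, calibrated for sensitivity 1

# Count query over bits (u1, u2, u3). If u1 and u3 are perfectly correlated,
# flipping u1 also flips u3, so the true count moves by 2, not 1:
count_d = 2        # e.g. bits (1, 0, 1)
count_d_prime = 0  # flipping u1 forces u3 to flip too: (0, 0, 0)

# Worst-case log-ratio of output densities an attacker who knows the
# correlation can observe (scanned over a grid of possible outputs r):
loss = max(
    abs(laplace_log_density(r, count_d, scale)
        - laplace_log_density(r, count_d_prime, scale))
    for r in [x / 10 for x in range(-50, 100)]
)
# loss reaches 2 * eps: the effective guarantee degrades to 2ε-DP.
```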

  8. Quantifying DP on correlated data
• A few recent papers [Cao17][Yang15][Song17] use a quantification approach to achieve ε-DP (protecting each user's private data value).
• Traditional approach (if the attacker knows the correlations, ε-DP may not hold):
  sensitive data → Laplace mechanism with Lap(1/ε) → ε-DP data
• Quantification approach (protects against attackers with knowledge of the correlations):
  sensitive data + a model of data correlations + attacker inference → Laplace mechanism with calibrated Lap(1/ε') → ε-DP data
• [Cao17]: Markov Chain; [Yang15]: Gaussian Markov Random Field (GMRF); [Song17]: Bayesian Network

  9. Outline
• Data Mining with Differential Privacy
• Scenario: Spatiotemporal Data Mining using DP
• Markov Chain for temporal correlations
• Gaussian Markov Random Field for user-user correlations
• Summary and open problems

  10. Spatiotemporal Data Mining with DP
• At each timestamp t = 1, 2, 3, …, a count query is answered over the sensitive location data D_t, Lap(1/ε) noise is added, and the noisy counts r_t are released under ε-DP.
(Figure: (a) Location Data D_1, D_2, D_3, …, users u1–u4 with locations loc1–loc5 at t = 1, 2, 3, …; (b) True Counts, per-location counts at each timestamp; (c) Private Counts r_1, r_2, r_3, …, the true counts plus Laplace noise.)
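The release pipeline in the figure can be sketched as follows, assuming each per-location count has sensitivity 1 to match the slide's Lap(1/ε) calibration; the function names and the five-location universe are illustrative:

```python
import random
from collections import Counter

def lap_noise(scale):
    # Laplace(scale) as the difference of two i.i.d. exponentials.
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def private_counts(snapshot, locations, epsilon):
    """One ε-DP release at a single timestamp: count users per location,
    then add Lap(1/epsilon) noise to every count."""
    true = Counter(snapshot.values())
    return {loc: true[loc] + lap_noise(1.0 / epsilon) for loc in locations}

# Snapshot of the location table at one timestamp (values from the slide):
snapshot = {"u1": "loc3", "u2": "loc2", "u3": "loc2", "u4": "loc4"}
noisy = private_counts(snapshot, ["loc1", "loc2", "loc3", "loc4", "loc5"],
                       epsilon=1.0)
```

Repeating this at every timestamp produces the stream r_1, r_2, r_3, … of private counts shown in panel (c).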

  11. What types of data correlations?
• Temporal (spatial) correlation for a single user, e.g., movements constrained by the road network ((a) Road Network: loc3, loc4, loc5).
• User-user correlation, e.g., from social ties ((b) Social Ties: u1 and u2 are colleagues; u2 and u3 are a couple).
(Table (a), Location Data at 7:00, 8:00, 9:00: u1: loc3, loc1, loc1; u2: loc2, loc1, loc1; u3: loc2, loc4, loc5; u4: loc4, loc5, loc3.)

  12. Outline
• Data Mining with Differential Privacy
• Scenario: Spatiotemporal Data Mining using DP
• Markov Chain for temporal correlations
  - what is MC
  - how can (attacker) learn MC from data
  - how can (attacker) infer private data using MC
• Gaussian Markov Random Field for user-user correlations
• Summary and open problems

  13. What is a Markov Chain
• A Markov chain is a stochastic process with the Markov property.
• 1st-order Markov property: the state at time t depends only on the state at time t-1: Pr(x_t | x_{t-1}) = Pr(x_t | x_{t-1}, …, x_1).
• Time-homogeneous: the transition matrix is the same after each step: ∀t > 0, Pr(x_{t+1} | x_t) = Pr(x_{t+2} | x_{t+1}).
(Tables: Transition Matrix, rows = state at t, columns = state at t+1: loc1 → (0.2, 0.1, 0.7); loc2 → (0.1, 0.2, 0.7); loc3 → (0.3, 0.4, 0.3). Raw Trajectories at 7:00, 8:00, 9:00: u1: loc1, loc3, loc2; u2: loc2, loc2, loc2; u3: loc3, loc1, loc1; u4: loc1, loc2, loc2.)
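The two properties above can be sketched with the slide's transition matrix; a minimal simulation in which each step looks only at the current state and always reuses the same matrix:

```python
import random

# Transition matrix from the slide: row = state at t, column = state at t+1.
STATES = ["loc1", "loc2", "loc3"]
P = {
    "loc1": {"loc1": 0.2, "loc2": 0.1, "loc3": 0.7},
    "loc2": {"loc1": 0.1, "loc2": 0.2, "loc3": 0.7},
    "loc3": {"loc1": 0.3, "loc2": 0.4, "loc3": 0.3},
}

def step(state):
    """Sample the next state; it depends only on the current state (1st-order
    Markov property) and always uses the same P (time-homogeneous)."""
    r, acc = random.random(), 0.0
    for nxt in STATES:
        acc += P[state][nxt]
        if r < acc:
            return nxt
    return STATES[-1]  # guard against floating-point round-off

def simulate(start, length):
    traj = [start]
    for _ in range(length - 1):
        traj.append(step(traj[-1]))
    return traj

trajectory = simulate("loc1", 5)  # a short synthetic trajectory
```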

  14. How can (attacker) learn MC
• If the attacker knows partial user trajectories, he can directly learn the transition matrix by maximum-likelihood estimation.
• If the attacker knows the road network, he may learn the MC using a Google-like model [*].
[*] E. Crisostomi, S. Kirkland, and R. Shorten, "A Google-like model of road network dynamics and its application to regulation and control," International Journal of Control, vol. 84, no. 3, pp. 633–651, Mar. 2011.
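The first bullet can be sketched as follows, using the raw trajectories from the "What is a Markov Chain" slide: for a time-homogeneous chain, the maximum-likelihood estimate is just the normalized transition counts (the function name is illustrative):

```python
from collections import Counter, defaultdict

def estimate_transition_matrix(trajectories):
    """MLE for a time-homogeneous Markov chain: P[a][b] is the number of
    observed a -> b transitions divided by all transitions leaving a."""
    counts = defaultdict(Counter)
    for traj in trajectories:
        for cur, nxt in zip(traj, traj[1:]):
            counts[cur][nxt] += 1
    return {
        cur: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
        for cur, nxts in counts.items()
    }

# Raw trajectories (u1..u4 at 7:00, 8:00, 9:00) from the earlier slide:
trajs = [
    ["loc1", "loc3", "loc2"],
    ["loc2", "loc2", "loc2"],
    ["loc3", "loc1", "loc1"],
    ["loc1", "loc2", "loc2"],
]
P_hat = estimate_transition_matrix(trajs)
# All three observed transitions out of loc2 go to loc2,
# so P_hat["loc2"]["loc2"] is 1.0.
```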

  15. How can (attacker) infer private data using MC [Model Attacker → Define TPL → Find structure of TPL]
• Model temporal correlations using a Markov chain, e.g., user i: loc1 → loc3 → loc2 → …
• (a) Backward transition matrix P_i^B = Pr(l_i^{t-1} | l_i^t), the backward temporal correlation; rows = location at time t, columns = location at time t-1: loc1 → (0.2, 0.3, 0.5); loc2 → (0.1, 0.1, 0.8); loc3 → (0.6, 0.2, 0.2).
• (b) Forward transition matrix P_i^F = Pr(l_i^t | l_i^{t-1}), the forward temporal correlation; rows = location at time t-1, columns = location at time t: loc1 → (0.1, 0.2, 0.7); loc2 → (0, 0, 1); loc3 → (0.3, 0.3, 0.4).

  16. How can (attacker) infer private data using MC
• DP can protect against an attacker who knows all tuples except the victim's; what if the attacker also knows the temporal correlations?
• Adversary A_i(D_K): knows the other users' tuples D_K (e.g., at t = 1: u1 → loc3, u2 → loc2, u3 → loc2, u4 → loc4) but not the victim's location l_i (?).
• Adversaries with temporal correlations:
  (i) A_i^T(D_K, P_i^B, ∅): also knows the backward correlations;
  (ii) A_i^T(D_K, ∅, P_i^F): also knows the forward correlations;
  (iii) A_i^T(D_K, P_i^B, P_i^F): knows both.

  17. How can (attacker) infer private data using MC
• Recall the definition of DP via privacy loss: if PL_0(M) ≤ ε, then M satisfies ε-DP, where PL_0(M) = sup_{D, D', r} log [ Pr(M(D) = r) / Pr(M(D') = r) ] over neighboring D, D'.
• Definition of TPL: the temporal privacy loss, i.e., the privacy loss against the adversaries A_i^T who also know the temporal correlations (expanded as Eqn(2) on the next slides).

  18. How can (attacker) infer private data using MC
• Definition of TPL, expanded over all releases r_1, …, r_T:
  Eqn(2) = log [Pr(r_1 | l_i^t, D_K) / Pr(r_1 | l_i^{t'}, D_K)] + … + log [Pr(r_t | l_i^t, D_K) / Pr(r_t | l_i^{t'}, D_K)] + … + log [Pr(r_T | l_i^t, D_K) / Pr(r_T | l_i^{t'}, D_K)]
• If there is no temporal correlation, every term except the one at time t is 0, and the time-t term is PL_0, so TPL = PL_0.

  19. How can (attacker) infer private data using MC
• If there is temporal correlation, TPL = ? The same expansion Eqn(2) applies:
  Eqn(2) = log [Pr(r_1 | l_i^t, D_K) / Pr(r_1 | l_i^{t'}, D_K)] + … + log [Pr(r_t | l_i^t, D_K) / Pr(r_t | l_i^{t'}, D_K)] + … + log [Pr(r_T | l_i^t, D_K) / Pr(r_T | l_i^{t'}, D_K)]
• Now the time-t term is still PL_0, but the other terms are no longer 0 and are hard to quantify (?).

  20. How can (attacker) infer private data using MC
• Structure of TPL, split by the adversary's knowledge over the releases r_1, …, r_{t-1}, r_t, r_{t+1}, …, r_T:
  (i) A_i^T(D_K, P_i^B, ∅) ⇒ backward privacy loss (BPL), over r_1, …, r_t;
  (ii) A_i^T(D_K, ∅, P_i^F) ⇒ forward privacy loss (FPL), over r_t, …, r_T;
  (iii) A_i^T(D_K, P_i^B, P_i^F) ⇒ both.

  21. How can (attacker) infer private data using MC: BPL
• Analyze BPL: the backward temporal correlations yield Eqn(6), the backward privacy loss function. ⇒ How to calculate it?

  22. How can (attacker) infer private data using MC: FPL
• Analyze FPL: the forward temporal correlations yield the forward privacy loss function. ⇒ How to calculate it?

  23. Calculating BPL & FPL [Privacy Quantification → Upper bound]
• We convert the calculation of BPL/FPL to finding an optimal solution of a linear-fractional programming problem.
• This problem can be solved by the simplex algorithm in O(2^n) time.
• We designed an O(n^2) algorithm for quantifying BPL/FPL.
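A minimal sketch of the key fact behind such reductions (illustrative, not the paper's actual algorithm): a linear-fractional objective attains its optimum at a vertex of the feasible polytope, so over the probability simplex it suffices to check the n unit vectors:

```python
def linear_fractional_max_on_simplex(c, d, alpha, beta):
    """Maximize (c·x + alpha) / (d·x + beta) over the probability simplex
    {x >= 0, sum(x) = 1}, assuming d·x + beta > 0 there.
    A linear-fractional objective is quasi-linear, so its maximum over a
    polytope is attained at a vertex; the simplex's vertices are the unit
    vectors e_i, so checking all n of them suffices."""
    return max((c[i] + alpha) / (d[i] + beta) for i in range(len(c)))

# Tiny illustrative instance (numbers are made up, not from the paper):
best = linear_fractional_max_on_simplex(c=[3.0, 1.0], d=[1.0, 2.0],
                                        alpha=0.0, beta=1.0)
```

Exploiting structure like this, instead of running a generic simplex solver over all vertices of a larger polytope, is the kind of observation that lets the BPL/FPL computation drop to polynomial time.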

  24. Calculating BPL & FPL
• Example of BPL under different temporal correlations: (i) strong temporal correlation, (ii) moderate temporal correlation, (iii) no temporal correlation.
(Plot: Privacy Loss vs. Time, t = 1 … 10; the stronger the temporal correlation, the larger the accumulated privacy loss.)

  25. Calculating BPL & FPL
• BPL over time (t = 1 … 100) under four settings (refer to Theorem 5 in our paper):
  case 1: q = 0.8, d = 0.1, ε = 0.23, P_i^B = [0.8 0.2; 0.1 0.9]
  case 2: q = 0.8, d = 0, ε = 0.15, P_i^B = [0.8 0.2; 0 1]
  case 3: q = 0.8, d = 0, ε = 0.23, P_i^B = [0.8 0.2; 0 1]
  case 4: q = 1, d = 0, ε = 0.23, P_i^B = [1 0; 0 1]
(Plots (a)-(d): Privacy Loss vs. time for each case.)

  26. Outline
• Data Mining with Differential Privacy
• Scenario: Spatiotemporal Data Mining using DP
• Markov Chain for temporal correlations
• Gaussian Markov Random Field for user-user correlations
  - what is GMRF
  - how can (attacker) learn GMRF from data
  - how can (attacker) infer private data using GMRF
• Summary and open problems
