Privacy-preserving Mechanisms for Correlated Data


  1. Privacy-preserving Mechanisms for Correlated Data Kamalika Chaudhuri University of California, San Diego Joint work with Shuang Song and Yizhen Wang

  2. Sensitive Data Medical Records Search Logs Social Networks

  3. Talk Agenda: How do we analyze sensitive data while still preserving privacy? (Focus on correlated data)

  4. Correlated Data User information in social networks Physical Activity Monitoring

  5. Why is Privacy Hard for Correlated Data? Because neighbor’s information leaks information on user

  6. Talk Agenda: 1. Privacy for Correlated Data - How to define privacy (for uncorrelated data)

  7. Differential Privacy [DMNS06] Data + Randomized Algorithm and (neighboring) Data + Randomized Algorithm produce "similar" output distributions: the participation of a single person does not change the output

  8. Differential Privacy: Attacker's View Prior Knowledge + Output on (Data & Alice) = Conclusion; Prior Knowledge + Output on (Data without Alice) = same Conclusion Note: a. The algorithm could still draw personal conclusions about Alice b. Alice has the agency to participate or not
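The guarantee above is typically achieved with the Laplace mechanism of [DMNS06]. A minimal sketch; the function name and the threshold query are our own illustration, not from the talk:

```python
import numpy as np

def laplace_mechanism(data, query, sensitivity, epsilon, rng=None):
    """Release query(data) + Laplace noise with scale sensitivity/epsilon.

    The output distributions on two datasets differing in one person
    then differ by at most a factor of e^epsilon [DMNS06].
    """
    rng = rng or np.random.default_rng()
    return query(data) + rng.laplace(0.0, sensitivity / epsilon)

# Example: privately count records above a threshold. Adding or removing
# one person changes the count by at most 1, so the sensitivity is 1.
data = [3, 7, 2, 9, 4]
noisy = laplace_mechanism(data, lambda d: sum(x > 5 for x in d),
                          sensitivity=1, epsilon=1.0)
```

The noisy count is unbiased: averaged over many runs it concentrates around the true count of 2.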

  9. What happens with correlated data?

  10. Example 1: Activity Monitoring Goal: Share aggregate data on physical activity with doctor, while hiding activity at each specific time. Agency is at the individual level.

  11. Example 2: Spread of Flu in a Network Interaction Network Goal: Publish aggregate statistics over a set of schools; prevent the adversary from knowing who has the flu. Agency is at the school level.

  12. Why does correlated data require a different notion of privacy?

  13. Example: Activity Monitoring D = (x_1, ..., x_T), x_t = activity at time t Correlation Network Goal: (1) Publish activity histogram (2) Prevent adversary from knowing activity at time t

  14. Example: Activity Monitoring D = (x_1, ..., x_T), x_t = activity at time t Correlation Network Goal: (1) Publish activity histogram (2) Prevent adversary from knowing activity at time t Agency is at the individual level, not the time-entry level

  15. Example: Activity Monitoring D = (x_1, ..., x_T), x_t = activity at time t Correlation Network 1-DP: Output histogram of activities + noise with stdev T Too much noise - no utility!

  16. Example: Activity Monitoring D = (x_1, ..., x_T), x_t = activity at time t Correlation Network 1-entry-DP: Output histogram of activities + noise with stdev 1 Not enough - activities across time are correlated!

  17. Example: Activity Monitoring D = (x_1, ..., x_T), x_t = activity at time t Correlation Network 1-entry-group-DP: Output histogram of activities + noise with stdev T Too much noise - no utility!

  18. Pufferfish Privacy [KM12] Secret Set S: the information to be protected, e.g., "Alice's age is 25", "Bob has a disease"

  19. Pufferfish Privacy [KM12] Secret Pairs Set Q: pairs of secrets we want to be indistinguishable, e.g., ("Alice's age is 25", "Alice's age is 40"), ("Bob is in the dataset", "Bob is not in the dataset")

  20. Pufferfish Privacy [KM12] Distribution Class Θ: a set of distributions that plausibly generate the data, e.g., (connection graph G, disease transmits w.p. in [0.1, 0.5]), (Markov chain with transition matrix in a set P); may be used to model correlation in the data

  21. Pufferfish Privacy [KM12] An algorithm A is ε-Pufferfish private with parameters (S, Q, Θ) if for all (s_i, s_j) in Q, all θ ∈ Θ with X ∼ θ, and all outputs t, P(A(X) = t | s_i, θ) ≤ e^ε · P(A(X) = t | s_j, θ) whenever P(s_i | θ), P(s_j | θ) > 0

  22. Pufferfish "Includes" DP [KM12] Theorem: Pufferfish = Differential Privacy when: S = { s_{i,a} := "person i has value a", for all i, all a in domain X }, Q = { (s_{i,a}, s_{i,b}) for all i and all pairs (a, b) in X × X }, Θ = { distributions where each person i is independent }

  23. Pufferfish "Includes" DP [KM12] Theorem: Pufferfish = Differential Privacy when: S = { s_{i,a} := "person i has value a", for all i, all a in domain X }, Q = { (s_{i,a}, s_{i,b}) for all i and all pairs (a, b) in X × X }, Θ = { distributions where each person i is independent } Theorem: No utility is possible when Θ = { all possible distributions }
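To make the definition concrete, here is a toy check (our own illustration, not from the talk) that randomized response on a single bit satisfies the Pufferfish inequality for the secret pair (X = 0, X = 1). Since the mechanism's output given the true bit does not depend on θ, one channel matrix suffices for every θ:

```python
import math

def rr_channel(flip_prob):
    """Randomized response: channel[s][t] = P(output t | true bit s)."""
    return [[1 - flip_prob, flip_prob], [flip_prob, 1 - flip_prob]]

def pufferfish_epsilon(channel):
    """Smallest eps with P(A=t|s_i) <= e^eps * P(A=t|s_j) for all t, i, j."""
    eps = 0.0
    for t in (0, 1):
        for i in (0, 1):
            for j in (0, 1):
                if channel[i][t] > 0 and channel[j][t] > 0:
                    eps = max(eps, math.log(channel[i][t] / channel[j][t]))
    return eps

# Flipping with probability 1/(1 + e) gives exactly eps = 1. Here
# Pufferfish with secrets (X = 0, X = 1) coincides with 1-DP, since a
# single bit has no correlations to worry about.
eps = pufferfish_epsilon(rr_channel(1 / (1 + math.e)))
```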

  24. Talk Agenda: 1. Privacy for Correlated Data - How to define privacy (for uncorrelated data) - How to define privacy (for correlated data) 2. Privacy Mechanisms - A General Pufferfish Mechanism

  25. How to get Pufferfish privacy? Special-case mechanisms exist [KM12, HMD12] Is there a more general Pufferfish mechanism for a large class of correlated data? Our work: Yes, two - a. Wasserstein Mechanism b. Markov Quilt Mechanism (Also concurrent work [GK16])

  26. Correlation Measure: Bayesian Networks Node: variable; structure: directed acyclic graph Joint distribution of variables: Pr(X_1, X_2, ..., X_n) = ∏_i Pr(X_i | parents(X_i))

  27. A Simple Example X_1 → X_2 → X_3 → ... → X_n Model: X_i in {0, 1} State transition probabilities: stay in the same state w.p. p, switch w.p. 1 - p

  28. A Simple Example X_1 → X_2 → X_3 → ... → X_n Model: X_i in {0, 1}, stay w.p. p, switch w.p. 1 - p, so Pr(X_2 = 0 | X_1 = 0) = p, Pr(X_2 = 0 | X_1 = 1) = 1 - p, ...

  29. A Simple Example X_1 → X_2 → X_3 → ... → X_n Model: X_i in {0, 1}, stay w.p. p, switch w.p. 1 - p In general: Pr(X_i = 0 | X_1 = 0) = 1/2 + (2p - 1)^(i-1)/2, Pr(X_i = 0 | X_1 = 1) = 1/2 - (2p - 1)^(i-1)/2 Influence of X_1 diminishes with distance
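The diminishing-influence claim can be checked numerically. A small sketch (ours) comparing the exact matrix power against the slide's closed form:

```python
import numpy as np

def chain_marginal(p, i):
    """Pr(X_i = 0 | X_1 = 0) for the 2-state chain, via matrix power."""
    P = np.array([[p, 1 - p], [1 - p, p]])  # stay w.p. p, switch w.p. 1 - p
    return np.linalg.matrix_power(P, i - 1)[0, 0]

def chain_marginal_closed_form(p, i):
    """The slide's closed form: 1/2 + (2p - 1)^(i-1) / 2."""
    return 0.5 + (2 * p - 1) ** (i - 1) / 2

# The influence of X_1 decays geometrically at rate |2p - 1|: with
# p = 0.8, chain_marginal(0.8, 2) is 0.8, while chain_marginal(0.8, 50)
# is essentially 0.5, i.e. X_50 is almost independent of X_1.
```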

  30. Algorithm: Main Idea X_1 → X_2 → X_3 → ... → X_n Goal: Protect X_1

  31. Algorithm: Main Idea X_1 → X_2 → X_3 → ... → X_n Goal: Protect X_1 Split into local nodes (high correlation) and the rest (almost independent)

  32. Algorithm: Main Idea X_1 → X_2 → X_3 → ... → X_n Goal: Protect X_1 Split into local nodes (high correlation) and the rest (almost independent) Add noise to hide the local nodes + a small correction for the rest

  33. Measuring "Independence" Max-influence of X_i on a set of nodes X_R: e(X_R | X_i) = max_{a,b} sup_{θ ∈ Θ} max_{x_R} log [ Pr(X_R = x_R | X_i = a, θ) / Pr(X_R = x_R | X_i = b, θ) ] Low e(X_R | X_i) means X_R is almost independent of X_i To protect X_i, the correction term needed for X_R is exp(e(X_R | X_i))
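For the binary chain from the earlier slides, the max-influence of X_1 on a single distant node follows from the closed-form marginals. This is a single-θ sketch of ours (the general definition also takes a sup over Θ):

```python
import math

def max_influence_single_node(p, dist):
    """e({X_j} | X_1) for the 2-state chain with j = 1 + dist, fixed theta.

    Pr(X_j = 0 | X_1) = 1/2 +/- (2p - 1)^dist / 2, so the max-influence
    is the log-ratio of the two conditionals; assumes 0 < p < 1.
    """
    d = abs(2 * p - 1) ** dist / 2
    return math.log((0.5 + d) / (0.5 - d))

# Influence decays with distance: with p = 0.8 the one-step influence is
# log 4, and by distance 20 it has dropped below 0.01.
```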

  34. How to find large "almost independent" sets? Brute-force search is expensive; instead, use structural properties of the Bayesian network

  35. Markov Blanket Markov Blanket(X_i) = X_S: a set of nodes X_S s.t. X_i is independent of X \ (X_i ∪ X_S) given X_S (usually: parents, children, and other parents of children)

  36. Define: Markov Quilt X_Q is a Markov Quilt of X_i if: 1. Deleting X_Q breaks the graph into X_N and X_R 2. X_i lies in X_N 3. X_R is independent of X_i given X_Q (For the Markov Blanket, X_N = {X_i})

  37. Recall: Algorithm X_1 → X_2 → X_3 → ... → X_n Goal: Protect X_1 Split into local nodes (high correlation) and the rest (almost independent) Add noise to hide the local nodes + a small correction for the rest

  38. Why do we need Markov Quilts? Given a Markov Quilt: X_N = local nodes for X_i, and X_Q ∪ X_R = the rest

  39. Why do we need Markov Quilts? Given a Markov Quilt: X_N = local nodes for X_i, and X_Q ∪ X_R = the rest Need to search over Markov Quilts X_Q to find the one which needs the optimal amount of noise

  40. From Markov Quilts to Amount of Noise Let X_Q = Markov Quilt for X_i Stdev of noise needed to protect X_i: Score(X_Q) = card(X_N) / (ε - e(X_Q | X_i)) (numerator: noise due to X_N; denominator: correction for X_Q ∪ X_R)

  41. The Markov Quilt Mechanism For each X_i: find the Markov Quilt X_Q for X_i with minimum score s_i Output F(D) + (max_i s_i) · Z, where Z ∼ Lap(1)

  42. The Markov Quilt Mechanism For each X_i: find the Markov Quilt X_Q for X_i with minimum score s_i Output F(D) + (max_i s_i) · Z, where Z ∼ Lap(1) Theorem: This preserves ε-Pufferfish privacy Advantage: poly-time in special cases
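A runnable single-θ sketch of the mechanism for the binary chain, using the closed-form marginals from the earlier slides. This is our simplification; the actual algorithm handles general Markov chains and a full class Θ:

```python
import math
import numpy as np

def influence(p, dist):
    """Max-influence of a chain node on a single node `dist` steps away."""
    d = abs(2 * p - 1) ** dist / 2
    return math.log((0.5 + d) / (0.5 - d))

def min_score(p, eps, n, i):
    """Minimum Score(X_Q) = card(X_N) / (eps - e(X_Q | X_i)) over quilts
    X_Q = {X_{i-a}, X_{i+b}} for node i in a chain of length n (1-indexed).

    a = i (resp. b = n - i + 1) means the quilt has no left (right) node
    because X_N runs into the chain boundary; card(X_N) = a + b - 1.
    The trivial empty quilt (hide everything, score n / eps) is the
    group-DP-style fallback.
    """
    best = n / eps
    for a in range(1, i + 1):
        for b in range(1, n - i + 2):
            e = (influence(p, a) if a <= i - 1 else 0.0) \
              + (influence(p, b) if b <= n - i else 0.0)
            if e < eps:
                best = min(best, (a + b - 1) / (eps - e))
    return best

def markov_quilt_mechanism(F, data, p, eps, rng=None):
    """Release F(data) + (max_i s_i) * Lap(1), as on the slide."""
    rng = rng or np.random.default_rng()
    n = len(data)
    scale = max(min_score(p, eps, n, i) for i in range(1, n + 1))
    return F(data) + rng.laplace(0.0, scale)
```

For a length-100 chain with p = 0.8 and eps = 1, the resulting noise scale is far below the group-DP scale of 100 / eps = 100, which is the point of exploiting the decaying correlation.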

  43. Example: Activity Monitoring D = (x_1, ..., x_T), x_t = activity at time t

  44. Example: Activity Monitoring D = (x_1, ..., x_T), x_t = activity at time t (Minimal) Markov Quilts for X_i have the form X_Q = {X_{i-a}, X_{i+b}}, with X_N the nodes between them and X_R the rest Efficiently searchable

  45. Example: Activity Monitoring X: set of states; each θ ∈ Θ is described by a transition matrix P_θ

  46. Example: Activity Monitoring X: set of states; each θ ∈ Θ is described by a transition matrix P_θ Under some assumptions, the relevant parameters are: π_Θ = min_{x ∈ X, θ ∈ Θ} π_θ(x) (min probability of any state x under any stationary distribution) g_Θ = min_{θ ∈ Θ} min { 1 - |λ| : P_θ x = λx, |λ| < 1 } (min eigengap of any P_θ)

  47. Example: Activity Monitoring X: set of states; each θ ∈ Θ is described by a transition matrix P_θ Under some assumptions, the relevant parameters are: π_Θ = min_{x ∈ X, θ ∈ Θ} π_θ(x) (min probability of any state x under any stationary distribution) g_Θ = min_{θ ∈ Θ} min { 1 - |λ| : P_θ x = λx, |λ| < 1 } (min eigengap of any P_θ) Max-influence of X_Q = {X_{i-a}, X_{i+b}} for X_i: e(X_Q | X_i) ≤ log[(π_Θ + exp(-g_Θ b)) / (π_Θ - exp(-g_Θ b))] + 2 log[(π_Θ + exp(-g_Θ a)) / (π_Θ - exp(-g_Θ a))] Score(X_Q) = (a + b - 1) / (ε - e(X_Q | X_i))
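The bound, as transcribed from the slide, can be evaluated directly. A sketch (ours) showing that the bound shrinks, and the score becomes finite, as the quilt nodes move away from X_i:

```python
import math

def max_influence_bound(pi_min, gap, a, b):
    """Slide's upper bound on e(X_Q | X_i) for the quilt {X_{i-a}, X_{i+b}}.

    pi_min: minimum stationary probability of any state under any theta;
    gap: minimum eigengap of any P_theta. The bound is only meaningful
    when exp(-gap * a) and exp(-gap * b) fall below pi_min.
    """
    def term(dist):
        x = math.exp(-gap * dist)
        if x >= pi_min:
            return math.inf  # quilt node too close: the bound is vacuous
        return math.log((pi_min + x) / (pi_min - x))
    return term(b) + 2 * term(a)

def score(pi_min, gap, eps, a, b):
    """Score(X_Q) = (a + b - 1) / (eps - e(X_Q | X_i)), when finite."""
    e = max_influence_bound(pi_min, gap, a, b)
    return (a + b - 1) / (eps - e) if e < eps else math.inf
```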
