
Providing Input-Discriminative Protection for Local Differential Privacy

Providing Input-Discriminative Protection for Local Differential Privacy. Xiaolan Gu*, Ming Li*, Li Xiong# and Yang Cao†. *University of Arizona, #Emory University, †Kyoto University. IEEE International Conference on Data Engineering


  1. Providing Input-Discriminative Protection for Local Differential Privacy
     Xiaolan Gu*, Ming Li*, Li Xiong# and Yang Cao†
     *University of Arizona, #Emory University, †Kyoto University
     IEEE International Conference on Data Engineering (ICDE), April 2020

  2. Overview • Background on LDP • Our Privacy Notion: ID-LDP • Our Privacy Mechanism on ID-LDP • Evaluation • Conclusion

  3. Background
     • Companies are collecting our private data to provide better services (Google, Facebook, Apple, Yahoo, Uber, …)
     • However, privacy concerns arise:
       • Yahoo: massive data breaches impacted 3 billion user accounts, 2013
       • Facebook: 267 million users' data has reportedly been leaked, 2019
       • …
     • Possible solution: locally private data collection model
     [Figure: each user perturbs raw data $x_i$ with a randomized mechanism $M$ and uploads the perturbed data $y_i$; the untrusted server runs the analysis on $y_1, y_2, \dots, y_n$.]

  4. Local Differential Privacy (LDP) [Duchi et al, FOCS'13]
     A mechanism $M$ satisfies $\epsilon$-LDP if and only if for any pair of inputs $x, x'$ and any output $y$:
     $\frac{\Pr(M(x) = y)}{\Pr(M(x') = y)} \le e^{\epsilon}$
     • $x, x'$: the possible input (raw) data (generated by the user)
     • $y$: the output (perturbed) data (public and known by the adversary)
     • $\epsilon$: privacy budget (a smaller $\epsilon$ indicates stronger privacy)
     An adversary cannot infer with high confidence whether the input is $x$ or $x'$ (the confidence is controlled by $\epsilon$).
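
The definition above can be checked mechanically for a small discrete mechanism. The following is a minimal sketch, assuming the mechanism is given as a table `mechanism[x][y] = Pr(M(x) = y)`; the function name `satisfies_ldp` and the tolerance parameter are illustrative choices, not part of the slides.

```python
import math

def satisfies_ldp(mechanism, epsilon, tol=1e-12):
    """Return True if Pr(M(x)=y) <= e^epsilon * Pr(M(x')=y) for all x, x', y.

    mechanism[x][y] is assumed to hold Pr(M(x) = y).
    """
    bound = math.exp(epsilon)
    n_outputs = len(mechanism[0])
    for row_x in mechanism:
        for row_x2 in mechanism:
            for y in range(n_outputs):
                # Written multiplicatively so zero probabilities do not divide by zero.
                if row_x[y] > bound * row_x2[y] + tol:
                    return False
    return True

# Binary mechanism that reports the truth with probability 0.75.
rr = [[0.75, 0.25],
      [0.25, 0.75]]

print(satisfies_ldp(rr, math.log(3)))  # worst-case ratio is 0.75 / 0.25 = 3
print(satisfies_ldp(rr, math.log(2)))  # a budget of ln 2 is too small for this table
```

The check loops over every ordered pair of inputs, which mirrors the "for any pair" quantifier in the definition.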

  5. Applications of LDP
     • Apple: discovering popular emojis under LDP
       Source: https://machinelearning.apple.com/2017/12/06/learning-with-privacy-at-scale.html
     • Second source cited on the slide: https://developers.googleblog.com/2019/09/enabling-developers-and-organizations.html

  6. Limitations of LDP
     • The LDP notion requires the same privacy budget for all pairs of possible inputs
     • Existing LDP protocols perturb the data in the same way for all inputs
     • However, in many practical scenarios, different inputs have different degrees of sensitivity and thus require distinct levels of privacy protection.

     Scenarios               High sensitivity     Low sensitivity
     Website-click records   Politics-related     Facebook and Amazon
     Medical records         HIV and cancer       Anemia and headache

     • LDP protocols can provide excessive protection for some inputs that do not need such strong privacy (leading to an inferior privacy-utility tradeoff)

  7. Our Privacy Notion: Input-Discriminative LDP (ID-LDP)
     • Given a privacy budget set $\mathcal{E} = \{\epsilon_x\}_{x \in \mathcal{D}}$, a randomized mechanism $M$ satisfies $\mathcal{E}$-ID-LDP if and only if for any pair of inputs $x, x' \in \mathcal{D}$ and any output $y \in \mathrm{Range}(M)$:
       $\frac{\Pr(M(x) = y)}{\Pr(M(x') = y)} \le e^{r(\epsilon_x, \epsilon_{x'})}$
       where $\epsilon_x$ is the privacy budget of an input $x$ and $r(\cdot, \cdot)$ is a function of two privacy budgets.
     • In this paper, we focus on an instantiation called MinID-LDP with $r(\epsilon_x, \epsilon_{x'}) = \min\{\epsilon_x, \epsilon_{x'}\}$
     Intuition: for any pair of inputs $x, x'$, MinID-LDP guarantees that the adversary's capability of distinguishing them does not exceed the bound controlled by both $\epsilon_x$ and $\epsilon_{x'}$ (thus achieving differentiated privacy protection for each pair).
     MinID-LDP has sequential composition like LDP, which guarantees the overall privacy for a sequence of mechanisms.
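
As with plain LDP, the MinID-LDP condition can be verified by brute force on a small table. The sketch below is only an illustration under stated assumptions: the helper name `satisfies_minid_ldp`, the list-based representation, and the hand-built three-input mechanism `toy` are all hypothetical and do not come from the paper.

```python
import math

def satisfies_minid_ldp(mechanism, budgets, tol=1e-12):
    """Check Pr(M(x)=y) <= e^{min(eps_x, eps_x')} * Pr(M(x')=y) for all x, x', y.

    mechanism[x][y] is assumed to hold Pr(M(x) = y); budgets[x] is eps_x.
    """
    n_outputs = len(mechanism[0])
    for x, row_x in enumerate(mechanism):
        for x2, row_x2 in enumerate(mechanism):
            bound = math.exp(min(budgets[x], budgets[x2]))
            for y in range(n_outputs):
                if row_x[y] > bound * row_x2[y] + tol:
                    return False
    return True

# Hand-built toy mechanism: input 0 is the sensitive one.
toy = [[0.4, 0.3, 0.3],
       [0.2, 0.6, 0.2],
       [0.2, 0.2, 0.6]]

budgets = [math.log(2), math.log(4), math.log(4)]
print(satisfies_minid_ldp(toy, budgets))            # pairs involving input 0 stay within ratio 2
print(satisfies_minid_ldp(toy, [math.log(2)] * 3))  # uniform ln 2 fails: inputs 1 and 2 differ by ratio 3
```

With equal budgets the same function reduces to the ordinary LDP check, so the second call shows that this table would not pass under the uniform budget ln 2 even though it passes under the input-discriminative budgets.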

  8. Relationships with LDP
     1. If $\epsilon_x = \epsilon$ for all $x \in \mathcal{D}$, then $\mathcal{E}$-MinID-LDP $\Leftrightarrow$ $\epsilon$-LDP
     2. If $\min\{\mathcal{E}\} \ge \epsilon$, then $\epsilon$-LDP $\Rightarrow$ $\mathcal{E}$-MinID-LDP
     3. If $\epsilon \ge \min\{\max\{\mathcal{E}\}, 2\min\{\mathcal{E}\}\}$, then $\mathcal{E}$-MinID-LDP $\Rightarrow$ $\epsilon$-LDP
     The factor 2 is due to the symmetric property of the indistinguishability definition.
     MinID-LDP can be regarded as a relaxation compared with LDP. It captures the user's fine-grained privacy requirement when LDP is too strong (i.e., provides overprotection).
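
One way to see where the threshold in item 3 comes from is the short derivation below. It is a sketch that follows from the MinID-LDP definition rather than a reproduction of the paper's proof; $x^*$ is a symbol introduced here for an input with the smallest budget, and $\Pr(M(x^*) = y) > 0$ is assumed.

```latex
% Route 1: go through the most sensitive input x^* (gives the factor 2)
\frac{\Pr(M(x)=y)}{\Pr(M(x')=y)}
  = \frac{\Pr(M(x)=y)}{\Pr(M(x^*)=y)} \cdot \frac{\Pr(M(x^*)=y)}{\Pr(M(x')=y)}
  \le e^{\min\{\epsilon_x,\epsilon_{x^*}\}} \cdot e^{\min\{\epsilon_{x^*},\epsilon_{x'}\}}
  = e^{2\min\{\mathcal{E}\}}

% Route 2: apply the definition directly to the pair (x, x')
\frac{\Pr(M(x)=y)}{\Pr(M(x')=y)}
  \le e^{\min\{\epsilon_x,\epsilon_{x'}\}}
  \le e^{\max\{\mathcal{E}\}}
```

Both bounds hold simultaneously, so the ratio is at most $e^{\min\{\max\{\mathcal{E}\},\, 2\min\{\mathcal{E}\}\}}$, which is the condition on $\epsilon$ in item 3.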

  9. Related Privacy Notions
     • Personalized LDP (PLDP) [Chen et al, ICDE'16]: user-discriminative
     • Geo-indistinguishability (GI) [Andres et al, CCS'13]: distance-discriminative
     • Condensed LDP (CLDP) [Gursoy et al, TDSC'19]: distance-discriminative
     • Utility-optimized LDP (ULDP) [Murakami and Kawamoto, USENIX Security'19]
     • ID-LDP: input-discriminative
     [Figure: privacy budget of a pair of inputs in several related notions]
     • PLDP: $\epsilon_u$ is the privacy budget of a user $u$ for all pairs of inputs (different users may have different $\epsilon_u$)
     • GI or CLDP: $\epsilon d_{ij}$ is the privacy budget for a pair of inputs $x_i, x_j$ for all users, where $d_{ij}$ is the distance between $x_i$ and $x_j$
     • ID-LDP: $\epsilon_{ij}$ is the privacy budget of a pair of inputs $x_i, x_j$, and $\epsilon_i$ is the privacy budget of $x_i$; for MinID-LDP, $\epsilon_{ij} = \min\{\epsilon_i, \epsilon_j\}$
     [Figure: ULDP, with sensitive inputs $\mathcal{X}_S$, non-sensitive inputs $\mathcal{X}_N$, output sets $\mathcal{Y}_P$ and $\mathcal{Y}_I$, and an $\epsilon$-LDP label]
     ULDP does not guarantee the indistinguishability between the sensitive and non-sensitive inputs when observing some outputs, thus ULDP does not guarantee LDP.

  10. Privacy Mechanism Design under ID-LDP
     Problem Statement
     • Data types: categorical (two cases: each user has only one item or an item-set)
     • Analysis task/application: frequency estimation (which is the building block for many applications)
     • Objective: minimize the MSE of frequency estimation while satisfying ID-LDP
     ID-LDP protocols perturb inputs with different probabilities.
     Challenges
     • The number of variables (perturbation parameters) and privacy constraints (to be satisfied for any $x, x', y$) can be very large (especially for a large domain or item-set data). Example: assume domain size $m$, then $m^2$ variables and $m^3$ constraints
     • The objective function (MSE) depends on the unknown true frequencies
     Preliminaries: LDP protocols
     • Randomized Response
     • Unary Encoding (our protocol satisfying ID-LDP is based on this)

  11. LDP Protocol: Randomized Response
     • Randomized Response (RR) [Warner, 1965]: reports the truth with some probability (for a binary answer: yes or no). Advanced versions: Unary Encoding, Generalized RR, …
     • Example: Is your annual income more than 100k?

     Truth x   Response y
     1         1 w.p. $p$,      0 w.p. $1 - p$
     0         1 w.p. $1 - p$,  0 w.p. $p$

     • Frequency of response: $f$ (observed); true frequency: $f^*$
       $\mathbb{E}[f] = f^* p + (1 - f^*)(1 - p) = (2p - 1) f^* + (1 - p)$
     • Frequency estimation: $\hat{f} = \frac{f - (1 - p)}{2p - 1}$
     • Unbiasedness: $\mathbb{E}[\hat{f}] = f^*$
     • To satisfy $\epsilon$-LDP: $p = \frac{e^{\epsilon}}{e^{\epsilon} + 1}$ (since $\frac{p}{1 - p} = e^{\epsilon}$)
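
The randomized-response formulas on this slide can be exercised with a short simulation. This is a minimal sketch: the function names, the seed, and the synthetic population of 10,000 users with true frequency 0.3 are assumptions made for illustration only.

```python
import math
import random

def rr_truth_probability(epsilon):
    """p = e^eps / (e^eps + 1), so that p / (1 - p) = e^eps."""
    return math.exp(epsilon) / (math.exp(epsilon) + 1)

def rr_perturb(bit, p, rng):
    """Report the true bit with probability p, the flipped bit otherwise."""
    return bit if rng.random() < p else 1 - bit

def rr_estimate(responses, p):
    """Unbiased estimate of the true frequency: (f - (1 - p)) / (2p - 1)."""
    f = sum(responses) / len(responses)
    return (f - (1 - p)) / (2 * p - 1)

rng = random.Random(0)                   # fixed seed, chosen arbitrarily
p = rr_truth_probability(math.log(3))    # epsilon = ln 3 gives p = 0.75
truth = [1] * 3000 + [0] * 7000          # synthetic data with f* = 0.3
responses = [rr_perturb(x, p, rng) for x in truth]
estimate = rr_estimate(responses, p)
print(p, estimate)
```

The estimate fluctuates around 0.3 from run to run with different seeds; the correction step divides by $2p - 1$, so smaller budgets (p closer to 0.5) amplify the sampling noise.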

  12. LDP Protocol: Unary Encoding (UE)
     • To handle the more general case (domain size is $d$), UE represents the input/output by multiple bits.
     • Step 1. Encode the input $x = i$ into a vector $\mathbf{x} = [0, \cdots, 0, 1, 0, \cdots, 0]$ with length $d$
     • Step 2. Perturb each bit independently

     RAPPOR [Erlingsson et al, CCS'14]
     x[k]   y[k]
     1      1 w.p. $p$,      0 w.p. $1 - p$
     0      1 w.p. $1 - p$,  0 w.p. $p$
     To satisfy $\epsilon$-LDP: $p = \frac{e^{\epsilon/2}}{e^{\epsilon/2} + 1}$

     OUE [Wang et al, USENIX Security'17]
     x[k]   y[k]
     1      1 w.p. 0.5,  0 w.p. 0.5
     0      1 w.p. $q$,  0 w.p. $1 - q$
     To satisfy $\epsilon$-LDP: $q = \frac{1}{e^{\epsilon} + 1}$, obtained by minimizing the approximate MSE of frequency estimation
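
The two unary-encoding parameterizations on this slide can be compared in a few lines. The sketch below assumes the usual worst case for distinguishing two different inputs under UE, where one differing bit is reported as 1 and the other as 0, which gives the ratio p(1 − q) / (q(1 − p)); the function names are illustrative.

```python
import math
import random

def rappor_params(epsilon):
    """Symmetric flip probabilities: p = e^{eps/2} / (e^{eps/2} + 1), q = 1 - p."""
    p = math.exp(epsilon / 2) / (math.exp(epsilon / 2) + 1)
    return p, 1 - p

def oue_params(epsilon):
    """OUE keeps a 1-bit with probability 0.5 and sets a 0-bit with q = 1 / (e^eps + 1)."""
    return 0.5, 1 / (math.exp(epsilon) + 1)

def encode(i, d):
    """Step 1: one-hot encode item i (0-based here) into d bits."""
    return [1 if k == i else 0 for k in range(d)]

def perturb(bits, p, q, rng):
    """Step 2: a 1-bit stays 1 w.p. p; a 0-bit becomes 1 w.p. q."""
    return [int(rng.random() < (p if bit == 1 else q)) for bit in bits]

def worst_case_ratio(p, q):
    """Largest probability ratio between two inputs that differ in two bits."""
    return (p * (1 - q)) / (q * (1 - p))

eps = math.log(4)
for name, (p, q) in [("RAPPOR", rappor_params(eps)), ("OUE", oue_params(eps))]:
    print(name, p, q, worst_case_ratio(p, q))

print(perturb(encode(2, 5), *oue_params(eps), random.Random(0)))
```

Both settings spend the full budget (the worst-case ratio equals $e^{\epsilon}$), but they split it differently between the 1-bit and the 0-bits.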

  13. Overview of Our Protocol for ID-LDP
     Recall the two challenges: 1) high complexity of the optimization problem; 2) the MSE depends on unknown true frequencies.
     For single-item data: IDUE (Input-Discriminative Unary Encoding)
     1. We propose a Unary Encoding based protocol with only $2m$ variables and $m^2$ constraints
     2. We address the second challenge by developing three variants of optimization models (some models can further reduce the problem complexity)
     For item-set data: IDUE-PS (with Padding-and-Sampling protocol)
     1. We extend IDUE to item-set data (by combining it with a sampling protocol) to solve the scalability issue
     2. We show IDUE-PS also satisfies MinID-LDP (if the base protocol IDUE satisfies MinID-LDP)

  14. Privacy Mechanism for Single-Item Data
     • Step 1. Encode the input $x = i$ into $\mathbf{x} = [0, \cdots, 0, 1, 0, \cdots, 0]$
     • Step 2. Perturb each bit independently (with different probabilities)

     x[k]   y[k]
     1      1 w.p. $a_k$,  0 w.p. $1 - a_k$
     0      1 w.p. $b_k$,  0 w.p. $1 - b_k$

     • Step 3. Estimate the frequency/count by $\hat{c}_i = \frac{\sum_u y_u[i] - n b_i}{a_i - b_i}$
     Notation: $n$ is the number of users; $a_i, b_i$ are the perturbation probabilities; $c_i^*$ is the true frequency; $\hat{c}_i$ is the estimated frequency.
     Privacy constraints: $\frac{a_i (1 - b_j)}{b_i (1 - a_j)} \le e^{r(\epsilon_i, \epsilon_j)} \quad (\forall i, j)$
     Estimation error: $\mathrm{MSE}_{\hat{c}_i} = \mathrm{Var}[\hat{c}_i] = \frac{n b_i (1 - b_i)}{(a_i - b_i)^2} + c_i^* \frac{1 - a_i - b_i}{a_i - b_i}$
     Benefits
     1. The optimization problem only has $2m$ variables and $m^2$ constraints
     2. The frequency estimator is unbiased, and its MSE is composed of two terms, of which only the second depends on the true frequencies $c_i^*$
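
The three steps and the privacy constraints on this slide can be sketched as follows. Everything concrete here is an assumption for illustration: the function names, the seed, the synthetic counts, and in particular the parameters `a` and `b`, which are hand-picked feasible values and not the optimized parameters produced by the paper's optimization models. The budgets follow the health-survey example (ln 4 for the sensitive item, ln 6 for the rest), and r is instantiated as min (MinID-LDP).

```python
import math
import random

def satisfies_minid_constraints(a, b, budgets, tol=1e-12):
    """Check a_i (1 - b_j) <= e^{min(eps_i, eps_j)} * b_i (1 - a_j) for all i, j."""
    m = len(a)
    for i in range(m):
        for j in range(m):
            bound = math.exp(min(budgets[i], budgets[j]))
            if a[i] * (1 - b[j]) > bound * b[i] * (1 - a[j]) + tol:
                return False
    return True

def idue_perturb(i, a, b, rng):
    """Steps 1-2: one-hot encode item i, then perturb bit k with its own a_k or b_k."""
    return [int(rng.random() < (a[k] if k == i else b[k])) for k in range(len(a))]

def idue_estimate(reports, a, b):
    """Step 3: c_hat_i = (sum_u y_u[i] - n b_i) / (a_i - b_i)."""
    n = len(reports)
    return [(sum(r[i] for r in reports) - n * b[i]) / (a[i] - b[i])
            for i in range(len(a))]

def idue_variance(n, true_counts, a, b):
    """Var[c_hat_i] = n b_i (1 - b_i) / (a_i - b_i)^2 + c*_i (1 - a_i - b_i) / (a_i - b_i)."""
    return [n * b[i] * (1 - b[i]) / (a[i] - b[i]) ** 2
            + true_counts[i] * (1 - a[i] - b[i]) / (a[i] - b[i])
            for i in range(len(a))]

budgets = [math.log(4)] + [math.log(6)] * 4   # item 0 is the sensitive one
a = [0.5, 0.6, 0.6, 0.6, 0.6]                 # hand-picked, feasible but not optimal
b = [0.25] * 5
true_counts = [500, 1500, 1000, 1000, 1000]   # synthetic data, n = 5000

rng = random.Random(0)
reports = [idue_perturb(i, a, b, rng)
           for i, count in enumerate(true_counts) for _ in range(count)]
estimates = idue_estimate(reports, a, b)
variances = idue_variance(len(reports), true_counts, a, b)

print(satisfies_minid_constraints(a, b, budgets))
print([round(c) for c in estimates])
print([round(v) for v in variances])
```

With these values the sensitive item has a smaller gap between `a[0]` and `b[0]`, so its estimate carries a larger variance than the other items, which matches the intent of giving it stronger protection.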

  15. Comparison with LDP Protocols
     Example: a health organization is taking a survey which asks $n$ participants to return a perturbed response from the categories {HIV, anemia, headache, stomachache, toothache}, where HIV ($i = 1$) is more sensitive. We therefore set different privacy budgets, such as $\epsilon_1 = \ln 4$ and $\epsilon_i = \ln 6$ ($i = 2, \cdots, 5$).
     [Figure annotations: more perturbation noise for $i = 1$; less perturbation noise for $i \ne 1$]
     The total variance of IDUE lies in a range because it depends on the distribution of the true input data, and its upper bound is still less than that of RAPPOR and OUE.
