Privacy Accounting and Quality Control in the Sage Differentially Private ML Platform
Mathias Lécuyer, with Riley Spahn, Kiran Vodrahalli, Roxana Geambasu, and Daniel Hsu
Machine Learning (ML) introduces a dangerous double standard for data protection
Example: messaging app
Example: messaging app
[Figure: an ML platform (e.g. TFX) trains ad targeting, recommendation, and auto-complete models over a growing database of messages, likes, clicks, ...]
Example: messaging app
[Figure: a user's messages are exposed through the app's API subject to per-user access control restrictions, while the ML platform trains its models over the full growing database]
Example: messaging app
[Figure: models and/or predictions released through the API are based on everyone's messages, likes, and clicks; each model on the platform is annotated (ε_g, δ_g)-DP]
Example: messaging app
ML should capture only general trends from the data, but it often captures specific information about individual entries in the dataset.
[Figure: same platform diagram as before]
Example: messaging app
Language models over users' emails leak secrets (Carlini+ '18).
[Figure: same platform diagram as before]
Example: messaging app
Recommenders leak information across users (Calandrino+ '11). Membership in a training set can be inferred through prediction APIs (Shokri+ '17).
[Figure: same platform diagram as before]
Example: messaging app
• Making individual training algorithms Differentially Private (DP) is good but insufficient, because old data is reused many times.
• No system exists for managing multiple DP training algorithms to enforce a global DP guarantee.
[Figure: each model on the platform is individually (ε, δ)-DP, but no global (ε_g, δ_g)-DP guarantee is enforced across them]
Can we make Differential Privacy practical for ML applications?
Sage
• Enforces a global (ε_g, δ_g)-DP guarantee across all models ever released from a growing database.
• Tackles in practical ways two difficult DP challenges:
  1. "Running out of budget"
  2. "Privacy-utility tradeoff"
[Figure: Sage access control sits between the ML platform (e.g. TFX) and the growing database, enforcing the global (ε_g, δ_g)-DP guarantee over all released models]
Outline
Motivation
Differential Privacy
Two practical challenges
Sage design
Evaluation
Differential Privacy (DP) (Dwork+ '06)
• Developed to allow privacy-preserving statistical analyses on sensitive datasets (e.g., census, drug purchases, ...).
• First (and only) rigorous definition of privacy suitable for this use case.
Definition
• DP is a stability constraint on computations running on datasets: it requires that no single data point in an input dataset has a significant influence on the output.
• To achieve stability, randomness is added into the computation.
• A randomized computation f: D → O is (ε, δ)-DP if for any pair of datasets D and D' differing in one entry, and for any output set S ⊆ O:
  P(f(D) ∈ S) ≤ e^ε · P(f(D') ∈ S) + δ
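As an illustration of the definition (not part of Sage itself), here is a minimal sketch of the classic Laplace mechanism: a count query changes by at most 1 when one entry is added or removed, so adding Laplace noise with scale 1/ε makes it (ε, 0)-DP.

```python
import numpy as np

def dp_count(dataset, epsilon):
    """(epsilon, 0)-DP count query via the Laplace mechanism.

    A count has sensitivity 1 (it changes by at most 1 when one entry
    is added or removed), so Laplace noise with scale 1/epsilon gives
    the stability the DP definition requires.
    """
    true_count = len(dataset)
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise
```

Smaller ε means more noise and stronger privacy; larger ε means a more accurate count but a weaker guarantee.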
DP in ML
• Approach: make training algorithms DP.
• It prevents membership query and reconstruction attacks (Steinke-Ullman '14; Dwork+ '15; Carlini+ '18).
• DP versions exist for most ML training algorithms:
  • Stochastic gradient descent (SGD) (Abadi+ '16, Yu+ '19).
  • Various regressions (Chaudhuri+ '08, Kifer+ '12, Nikolaenko+ '13, Talwar+ '15).
  • Collaborative filtering (McSherry+ '09).
  • Language models (McMahan+ '18).
  • Feature and model selection (Chaudhuri+ '13, Smith+ '13).
  • Model evaluation (Boyd+ '15).
• Tensorflow/privacy implements several of these algorithms (McMahan+ '19).
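To make the DP-SGD idea above concrete, here is an illustrative NumPy sketch (not tensorflow/privacy's API) of one DP-SGD step in the style of Abadi+ '16: clip each example's gradient to bound any single entry's influence, average, then add Gaussian noise calibrated to the clip norm.

```python
import numpy as np

def dp_sgd_step(w, per_example_grads, lr=0.1, clip_norm=1.0,
                noise_multiplier=1.1, rng=None):
    """One DP-SGD step (illustrative sketch).

    Clipping bounds each example's contribution to the averaged
    gradient; Gaussian noise proportional to the clip norm then hides
    any single example's influence on the update.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    clipped = [g * min(1.0, clip_norm / max(np.linalg.norm(g), 1e-12))
               for g in per_example_grads]
    avg = np.mean(clipped, axis=0)
    sigma = noise_multiplier * clip_norm / len(per_example_grads)
    return w - lr * (avg + rng.normal(0.0, sigma, size=avg.shape))
```

The privacy guarantee of repeated applications of such a step is tracked by a privacy accountant; the hyperparameter names here (`clip_norm`, `noise_multiplier`) follow common usage but are otherwise assumptions of this sketch.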
Outline
Motivation
Differential Privacy
Two practical challenges
Sage design
Evaluation
Challenge 1 - Running out of privacy budget
Most DP work focuses on a fixed database model:
• Each model consumes some privacy budget.
• When the budget is exhausted, the data cannot be used anymore: the system can "run out of budget".
[Figure: privacy loss accumulates over time as (ε, δ)-DP models are released, until the global (ε_g, δ_g) budget over the fixed dataset is exhausted]
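The budget-exhaustion problem above can be sketched with a minimal accountant using basic sequential composition (the ε's and δ's of released models simply add up); this is an illustrative toy, not Sage's accounting, and the class and method names are assumptions of this sketch.

```python
class BudgetAccountant:
    """Toy privacy-budget accountant over a fixed dataset.

    Uses basic sequential composition: the privacy losses of all
    released models add up, and must stay within the global
    (eps_g, delta_g) guarantee.
    """

    def __init__(self, eps_g, delta_g):
        self.eps_g, self.delta_g = eps_g, delta_g
        self.eps_used, self.delta_used = 0.0, 0.0

    def request(self, eps, delta):
        """Grant a training request only if it keeps cumulative loss
        within the global budget; once it returns False forever, the
        system has 'run out of budget'."""
        if (self.eps_used + eps <= self.eps_g
                and self.delta_used + delta <= self.delta_g):
            self.eps_used += eps
            self.delta_used += delta
            return True
        return False
```

For example, with a global budget of ε_g = 1.0, two models at ε = 0.5 each exhaust the budget, and every later request is denied; tighter composition theorems stretch the budget further, but over a fixed dataset it is still finite.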
Challenge 2 - Privacy/utility trade-off