Towards Practical Differentially Private Convex Optimization
Roger Iyengar (Carnegie Mellon University), Joseph P. Near (University of Vermont), Dawn Song (University of California, Berkeley), Om Thakkar (Boston University), Abhradeep Thakurta (University of California, Santa Cruz), Lun Wang (University of California, Berkeley)
Contributions
• New algorithm for differentially private convex optimization: Approximate Minima Perturbation (AMP)
  • Can leverage any off-the-shelf optimizer
  • Works for all convex loss functions
  • Has a competitive hyperparameter-free variant
• Broad empirical study
  • 6 state-of-the-art techniques
  • 2 models: logistic regression and Huber SVM
  • 13 datasets: 9 public (4 high-dimensional), 4 real-world use cases
• Open-source repo: https://github.com/sunblaze-ucb/dpml-benchmark
This Talk
• Why Privacy for Learning?
• Background
  • Differential Privacy (DP)
  • Convex Optimization
• Approximate Minima Perturbation (AMP)
• Broad Empirical Study
Why Privacy for Learning?
[Diagram: sensitive data $D$ (input) → training algorithm $A$ → trained model $\theta$ (output)]
• Models can leak information about training data
  • Membership inference attacks [Shokri Stronati Song Shmatikov’17, Carlini Liu Kos Erlingsson Song’18, Melis Song Cristofaro Shmatikov’18]
  • Model inversion attacks [Fredrikson Jha Ristenpart’15, Wu Fredrikson Jha Naughton’16]
• Solution?
Differential Privacy [Dwork McSherry Nissim Smith ’06]
[Figure, build 1: a dataset $D$ with records Alice, Bob, Cathy, Doug, Emily, Om is fed to a randomized algorithm $A$; a plot shows $\Pr(A(D) = \theta)$ over random outcomes $\theta \in \Theta$.]
[Figure, build 2: a neighboring dataset $D'$ is shown, in which Felix appears among the records.]
[Figure, build 3: $\Pr(A(D') = \theta)$ is plotted alongside $\Pr(A(D) = \theta)$; the gap between the two is labeled "Small".]
Differential Privacy [Dwork McSherry Nissim Smith ’06]
• Privacy parameters: $(\epsilon, \delta)$
• A randomized algorithm $A$, mapping datasets of $n$ records to outcomes in $\Theta$, is $(\epsilon, \delta)$-DP if
  • for all neighboring datasets $D, D'$ of $n$ records, i.e., $\mathrm{dist}(D, D') = 1$
  • for all sets of outcomes $S \subseteq \Theta$,
  we have $\Pr(A(D) \in S) \le e^{\epsilon} \Pr(A(D') \in S) + \delta$
• $\epsilon$: multiplicative change. Typically, $\epsilon = O(1)$
• $\delta$: additive change. Typically, $\delta = O(1/n^2)$
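To make the definition concrete, the following minimal sketch checks the $(\epsilon, \delta)$-DP inequality by brute force for a tiny mechanism. Randomized response on a single private bit is used purely as an illustration; it is not discussed in the talk, and the helper names `rr_distribution` and `satisfies_dp` are hypothetical.

```python
import itertools
import math


def rr_distribution(bit, eps):
    """Output distribution of randomized response on one private bit.

    ASSUMPTION: randomized response is only an illustrative mechanism here.
    The true bit is reported with probability e^eps / (1 + e^eps).
    """
    p_truth = math.exp(eps) / (1.0 + math.exp(eps))
    return {bit: p_truth, 1 - bit: 1.0 - p_truth}


def satisfies_dp(dist_d, dist_d_prime, eps, delta, tol=1e-12):
    """Check Pr(A(D) in S) <= e^eps * Pr(A(D') in S) + delta for every S.

    Both directions are checked, since neighboring datasets are symmetric.
    """
    outcomes = sorted(set(dist_d) | set(dist_d_prime))
    for size in range(len(outcomes) + 1):
        for subset in itertools.combinations(outcomes, size):
            p = sum(dist_d.get(o, 0.0) for o in subset)
            q = sum(dist_d_prime.get(o, 0.0) for o in subset)
            if p > math.exp(eps) * q + delta + tol:
                return False
            if q > math.exp(eps) * p + delta + tol:
                return False
    return True


# Neighboring "datasets": a single record whose bit is 0 versus 1.
on_d = rr_distribution(0, eps=1.0)
on_d_prime = rr_distribution(1, eps=1.0)
holds_at_eps_1 = satisfies_dp(on_d, on_d_prime, eps=1.0, delta=0.0)
holds_at_eps_half = satisfies_dp(on_d, on_d_prime, eps=0.5, delta=0.0)
```

Enumerating every subset of outcomes is only feasible for tiny outcome spaces; it is shown here because it mirrors the "for all sets of outcomes" quantifier directly.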
Convex Optimization
• Input:
  • Dataset $D$ of $n$ records
  • Loss function $L(\theta, D)$, where
    • $\theta \in \mathbb{R}^p$ is a model
    • Loss $L$ is convex in the first parameter $\theta$
• Goal: Output model $\hat{\theta}$ such that $\hat{\theta} \in \arg\min_{\theta \in \mathbb{R}^p} L(\theta, D)$
• Applications: Machine Learning, Deep Learning, Collaborative Filtering, etc.
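As a worked instance of a loss that is convex in the model, consider the logistic loss, one of the two losses in the empirical study. The per-example notation $(x_i, y_i)$ and the $1/n$ averaging below are assumptions made for illustration; the slide does not spell them out.

```latex
\begin{align*}
\ell(\theta; (x, y)) &= \log\bigl(1 + e^{-y \langle \theta, x \rangle}\bigr),
  \qquad y \in \{-1, +1\}, \\
L(\theta, D) &= \frac{1}{n} \sum_{i=1}^{n} \ell(\theta; (x_i, y_i)), \\
\nabla_\theta^2\, \ell(\theta; (x, y)) &= \sigma(z)\,\bigl(1 - \sigma(z)\bigr)\, x x^{\top} \succeq 0,
  \qquad z = y \langle \theta, x \rangle,\quad \sigma(z) = \frac{1}{1 + e^{-z}}.
\end{align*}
```

Each Hessian is a nonnegative multiple of the rank-one matrix $x x^{\top}$, so every $\ell$ is convex in $\theta$, and so is their average.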
DP Convex Optimization - Prior Work
[Diagram: sensitive data $D$ (input) → training algorithm $A$ → trained model $\theta$ (output)]
• DP Objective Perturbation [Chaudhuri Monteleoni Sarwate’11, Kifer Smith Thakurta’12, Jain Thakurta’14]
• DP GD/SGD [Song Chaudhuri Sarwate’13, Bassily Smith Thakurta’14, Abadi Chu Goodfellow McMahan Mironov Talwar Zhang’16]
• DP Frank-Wolfe [Talwar Thakurta Zhang’14]
• Output Perturbation [CMS’11, KST’12, JT’14]
• Permutation-based SGD [Wu Li Kumar Chaudhuri Jha Naughton’17]
- Requires minima of loss
- Requires custom optimizer
Approximate Minima Perturbation (AMP)
• Input:
  • Dataset $D$, loss function $L(\theta, D)$
  • Privacy parameters: $b = (\epsilon, \delta)$
  • Gradient norm bound $\gamma$
• Algorithm (high-level):
  1. Split privacy budget into 2 parts, $b_1$ and $b_2$
  2. Perturb loss: $L_{priv}(\theta, D) = L(\theta, D) + \mathrm{Reg}(\theta, b_1)$ (similar to standard Objective Perturbation [KST’12])
  3. Let $\theta_{approx} = \theta$ s.t. $\|\nabla L_{priv}(\theta, D)\|_2 \le \gamma$
  4. Output $\theta_{approx} + \mathrm{Noise}(b_2, \gamma)$
[Figure: $L(\theta, D)$ and $L_{priv}(\theta, D)$ plotted against $\theta$, marking the minimizer $\theta_{priv}$ of $L_{priv}$ and an approximate minimizer $\theta_{approx}$ in the region where $\|\nabla L_{priv}(\theta, D)\|_2 \le \gamma$.]
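The four high-level steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the perturbation term is assumed to be an L2 regularizer plus a random linear term, in the style of objective perturbation; the noise scales `sigma1` and `sigma2` and the regularization weight `lam` are hypothetical inputs standing in for the calibration derived from the two budget parts; and plain gradient descent stands in for an off-the-shelf optimizer. With arbitrary noise scales the sketch carries no privacy guarantee.

```python
import math
import random


def logistic_grad(theta, data):
    """Gradient of the average logistic loss; data holds (x, y), y in {-1, +1}."""
    grad = [0.0] * len(theta)
    for x, y in data:
        margin = y * sum(t * xi for t, xi in zip(theta, x))
        coeff = -y / (1.0 + math.exp(margin))  # d/d(margin) of log(1 + e^-margin), times y
        for j, xi in enumerate(x):
            grad[j] += coeff * xi / len(data)
    return grad


def norm(v):
    return math.sqrt(sum(vj * vj for vj in v))


def amp_sketch(data, dim, gamma, lam, sigma1, sigma2, rng,
               step=0.1, max_iters=100_000):
    # Step 1 (budget split) is represented only by the two noise scales.
    # ASSUMPTION: sigma1, sigma2 and lam are supplied by the caller; their
    # calibration from the privacy parameters is not reproduced here.

    # Step 2: perturb the loss. ASSUMPTION: the added term is taken to be
    # (lam / 2) * ||theta||^2 + <noise1, theta>.
    noise1 = [rng.gauss(0.0, sigma1) for _ in range(dim)]

    def grad_priv(theta):
        g = logistic_grad(theta, data)
        return [gj + lam * tj + nj for gj, tj, nj in zip(g, theta, noise1)]

    # Step 3: run an optimizer until the gradient norm is at most gamma.
    theta = [0.0] * dim
    for _ in range(max_iters):
        g = grad_priv(theta)
        if norm(g) <= gamma:
            break
        theta = [tj - step * gj for tj, gj in zip(theta, g)]
    else:
        raise RuntimeError("gradient norm bound not reached")

    # Step 4: add output noise to the approximate minimizer.
    return [tj + rng.gauss(0.0, sigma2) for tj in theta]


# Tiny, non-separable toy dataset; the second feature is a constant bias term.
data = [([1.0, 1.0], 1), ([2.0, 1.0], 1), ([-0.5, 1.0], 1),
        ([-1.0, 1.0], -1), ([-2.0, 1.0], -1), ([0.5, 1.0], -1)]
gamma = 1.0 / len(data) ** 2
theta_out = amp_sketch(data, dim=2, gamma=gamma, lam=0.1,
                       sigma1=0.05, sigma2=0.05, rng=random.Random(0))
```

The point of the structure is that the optimizer is treated as a black box: only the stopping condition on the gradient norm matters, which is what lets any standard solver be plugged in at step 3.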
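Utility guarantees
• Let $\hat{\theta}$ minimize $L(\theta; D)$, and let the regularization parameter be $\Lambda = \Theta\left(\frac{\xi \sqrt{p}}{\epsilon n \|\hat{\theta}\|}\right)$.
• Objective Perturbation [KST’12]: if $\theta_{priv}$ is the output of objective perturbation,
  $\mathbb{E}\left(L(\theta_{priv}; D) - L(\hat{\theta}; D)\right) = O\left(\frac{\xi \sqrt{p}\, \|\hat{\theta}\|}{\epsilon n}\right)$.
• AMP (adapted from [KST’12]): for output $\theta_{AMP}$,
  $\mathbb{E}\left(L(\theta_{AMP}; D) - L(\hat{\theta}; D)\right) = O\left(\frac{\xi \sqrt{p}\, \|\hat{\theta}\|}{\epsilon n} + \|\hat{\theta}\| \gamma n\right)$.
  • For $\gamma = O(1/n^2)$, the utility of AMP is asymptotically the same as that of Objective Perturbation.
• Private PSGD [WLK+17]: for output $\theta_{PSGD}$ and model space radius $R$,
  $\mathbb{E}\left(L(\theta_{PSGD}; D) - L(\hat{\theta}; D)\right) = O\left(\frac{\xi \sqrt{p}\, R}{\epsilon \sqrt{n}}\right)$.
  • For $\gamma = O(1/n^2)$, the utility of AMP has a better dependence on $n$ than Private PSGD.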
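To see how the three bounds compare numerically, the sketch below evaluates the expressions inside the big-O terms with all hidden constants set to 1, which is an assumption made purely for illustration. The function names and the example parameter values are hypothetical.

```python
import math


def objective_perturbation_bound(xi, p, theta_norm, eps, n):
    """xi * sqrt(p) * ||theta_hat|| / (eps * n), constants dropped."""
    return xi * math.sqrt(p) * theta_norm / (eps * n)


def amp_bound(xi, p, theta_norm, eps, n, gamma):
    """Objective perturbation term plus the extra ||theta_hat|| * gamma * n."""
    return (objective_perturbation_bound(xi, p, theta_norm, eps, n)
            + theta_norm * gamma * n)


def private_psgd_bound(xi, p, radius, eps, n):
    """xi * sqrt(p) * R / (eps * sqrt(n)), constants dropped."""
    return xi * math.sqrt(p) * radius / (eps * math.sqrt(n))


# Hypothetical example values.
n, p, xi, theta_norm, radius, eps = 10_000, 20, 1.0, 1.0, 1.0, 0.1
gamma = 1.0 / n ** 2

obj = objective_perturbation_bound(xi, p, theta_norm, eps, n)
amp = amp_bound(xi, p, theta_norm, eps, n, gamma)
psgd = private_psgd_bound(xi, p, radius, eps, n)
```

With $\gamma = 1/n^2$ the extra AMP term becomes $\|\hat{\theta}\|/n$, so it decays at the same $1/n$ rate as the objective perturbation term, while the Private PSGD expression decays only as $1/\sqrt{n}$. The ratio of the objective perturbation expression to the PSGD expression is $\|\hat{\theta}\| / (R \sqrt{n})$.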
AMP - Takeaways
• Can leverage any off-the-shelf optimizer
• Works for all standard convex loss functions
• For $\gamma = O(1/n^2)$, the utility of AMP:
  • is asymptotically the same as Objective Perturbation [KST’12]
  • has a better dependence on $n$ than Private PSGD [WLK+17]
• $\gamma = 1/n^2$ is achievable using standard Python libraries
Empirical Evaluation
• Algorithms evaluated:
  • Approximate Minima Perturbation (AMP)
  • Private SGD [BST’14, ACG+17]
  • Private Frank-Wolfe (FW) [TTZ’14]
  • Private Permutation-based SGD (PSGD) [WLK+17]
  • Private Strongly-convex (SC) PSGD [WLK+17]
  • Hyperparameter-free (HF) AMP
    • Splitting the privacy budget: we provide a schedule for low- and high-dimensional data by evaluating AMP only on synthetic data
  • Non-private (NP) Baseline
Empirical Evaluation
• Loss functions considered: [callout: This talk]
  • Logistic loss
  • Huber SVM
• Procedure:
  • 80/20 train/test random split
  • Fix $\delta = 1/n^2$, and vary $\epsilon$ from 0.01 to 10
  • Measure accuracy of final tuned* private model over test set
  • Report the mean accuracy and std. dev. over 10 independent runs
*Does not apply to Hyperparameter-free AMP.
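The procedure can be sketched as a small evaluation loop. Several details here are assumptions rather than facts from the talk: the callback `train_and_score` is a hypothetical stand-in for training and tuning a private model and returning its test accuracy, the four-point privacy grid is only one possible way to cover the range 0.01 to 10, a fresh random split is drawn for each run, $n$ is taken to be the total number of records, and the sample standard deviation is reported.

```python
import random
import statistics


def split_80_20(records, rng):
    """Random 80/20 train/test split of a list of records."""
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = len(shuffled) * 4 // 5
    return shuffled[:cut], shuffled[cut:]


def evaluate(records, train_and_score, eps_grid=(0.01, 0.1, 1.0, 10.0),
             runs=10, seed=0):
    """Return {eps: (mean accuracy, std. dev.)} over independent runs.

    ASSUMPTION: train_and_score(train, test, eps, delta, rng) is a
    hypothetical callback that trains a private model and returns accuracy.
    """
    rng = random.Random(seed)
    delta = 1.0 / len(records) ** 2  # delta fixed to 1 / n^2
    results = {}
    for eps in eps_grid:
        scores = []
        for _ in range(runs):
            train, test = split_80_20(records, rng)
            scores.append(train_and_score(train, test, eps, delta, rng))
        results[eps] = (statistics.mean(scores), statistics.stdev(scores))
    return results


def dummy_scorer(train, test, eps, delta, rng):
    """Placeholder scorer: returns the training fraction, not a real accuracy."""
    return len(train) / (len(train) + len(test))


summary = evaluate(list(range(100)), dummy_scorer)
```

Replacing `dummy_scorer` with a real training routine is all that is needed to reuse the loop; hyperparameter tuning, where it applies, would live inside that callback.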
Synthetic Datasets
[Plots: Synthetic-L (10k × 20) and Synthetic-H (2k × 2k). Legend: NP Baseline, AMP, HF AMP, Private SGD, Private PSGD, Private SC PSGD, Private FW.]
- Synthetic-H is high-dimensional, but low-rank
- Private Frank-Wolfe performs the best on Synthetic-H
High-dimensional Datasets
[Plots: Real-sim (72k × 21k) and RCV-1 (50k × 47k). Legend: NP Baseline, AMP, HF AMP, Private SGD, Private PSGD, Private SC PSGD, Private FW.]
- Both variants of AMP almost always provide the best performance
Real-world Use Cases (Uber)
[Plots: Dataset 1 (4m × 23) and Dataset 2 (18m × 294). Legend: NP Baseline, AMP, HF AMP, Private SGD, Private PSGD, Private SC PSGD, Private FW.]
- DP as a regularizer [BST’14, Dwork Feldman Hardt Pitassi Reingold Roth ’15]
- Even for $\epsilon = 10^{-2}$, accuracy of AMP is close to the non-private baseline
Conclusions
• For large datasets, cost of privacy is low
  • Private model is within 4% accuracy of the non-private one for $\epsilon = 0.01$, and within 2% for $\epsilon = 0.1$
• AMP almost always provides the best accuracy, and is easily deployable in practice
• Hyperparameter-free AMP is competitive w.r.t. tuned state-of-the-art private algorithms
• Open-source repo: https://github.com/sunblaze-ucb/dpml-benchmark
Thank You!