Localish Distributed Models for Statistical Data Privacy
Adam Smith, BU Computer Science
PPML 2018 Workshop, December 8, 2018
Based on
• L. Reyzin, A. Smith, S. Yakoubov, https://eprint.iacr.org/2018/997
• A. Cheu, A. Smith, J. Ullman, D. Zeber, M. Zhilyaev, https://arxiv.org/abs/1808.01394

Privacy in Statistical Databases
[Figure: individuals give their data to an "agency"; researchers send queries and receive answers in the form of summaries, complex models, synthetic data, …]
Many domains
• Census
• Medical
• Advertising
• Education
• …

Privacy in Statistical Databases
[Figure: individuals give their data to an "agency"; researchers send queries and receive answers in the form of summaries, complex models, synthetic data, …]
"Aggregate" outputs can leak lots of information
• Reconstruction attacks
• Example: Ian Goldberg's talk on "the secret sharer"

[Figure: the utility vs. privacy trade-off, redrawn with three axes: utility, privacy, trust model]

Differential Privacy [Dwork, McSherry, Nissim, S. 2006]
[Figure: algorithm A, using local random coins, maps x to A(x) and x' to A(x')]
• $x'$ is a neighbor of $x$ if they differ in one data point
• Neighboring databases induce close distributions on outputs
Definition: A is $(\epsilon, \delta)$-differentially private if, for all neighbors $x, x'$, for all sets of outputs $S$:
$$\Pr_{\text{coins of } A}[A(x) \in S] \le e^{\epsilon} \cdot \Pr_{\text{coins of } A}[A(x') \in S] + \delta$$
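As a concrete instance of the definition, here is a minimal sketch of the Laplace mechanism for a counting query, which is not on the slide but is the standard textbook example. The names `laplace_noise`, `private_count` and `laplace_density`, and the toy databases, are assumptions made for illustration. The last lines check numerically that the output densities on two neighboring databases differ by a factor of at most $e^{\epsilon}$ (here $\delta = 0$).

```python
import math
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as a scaled difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def private_count(data, predicate, epsilon, rng):
    """Release a count plus Laplace noise of scale 1/epsilon.

    Changing one data point moves the true count by at most 1,
    so noise of scale 1/epsilon gives (epsilon, 0)-differential privacy.
    """
    true_count = sum(1 for row in data if predicate(row))
    return true_count + laplace_noise(1.0 / epsilon, rng)

def laplace_density(y, center, scale):
    """Density of the mechanism's output when the true count is `center`."""
    return math.exp(-abs(y - center) / scale) / (2.0 * scale)

rng = random.Random(0)
epsilon = 0.5
x = [1, 0, 1, 1, 0, 1]        # hypothetical database, true count 4
x_prime = [1, 0, 1, 1, 0, 0]  # neighbor: differs in the last data point, true count 3

noisy = private_count(x, lambda bit: bit == 1, epsilon, rng)

# The ratio of output densities under x and x' is at most e^epsilon at every point.
ratio = laplace_density(noisy, 4, 1.0 / epsilon) / laplace_density(noisy, 3, 1.0 / epsilon)
print(noisy, ratio, math.exp(epsilon))
```

This is the central model: the party running `private_count` sees the raw data, which is exactly the trust assumption the rest of the talk tries to weaken.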

Outline
• Local model
• Models for DP + MPC
• Lightweight architectures
  Ø "From HATE to LOVE MPC"
• Minimal primitives
  Ø "Differential Privacy via Shuffling"

Local Model for Privacy
(Equivalent to [Evfimievski, Gehrke, Srikant '03])
[Figure: each party $i$ applies a randomizer A, with local random coins, to its own input $x_i$ and sends the result to an untrusted aggregator]
• "Local" model
  Ø Person $i$ randomizes their own data
  Ø Attacker sees everything except player $i$'s local state
• Definition: A is $\epsilon$-locally differentially private if for all $i$:
  Ø for all neighbors $x, x'$,
  Ø for all behavior $B$ of other parties,
  Ø for all transcripts $t$:
$$\Pr_{\text{coins of } A}[A(x, B) = t] \le e^{\epsilon} \cdot \Pr_{\text{coins of } A}[A(x', B) = t]$$
  ($\delta = 0$ w.l.o.g.)
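A minimal sketch of a local randomizer, assuming one private bit per person and using classic randomized response (a standard example, not taken from the slide). The function names are hypothetical. Each person reports the true bit with probability $e^{\epsilon}/(e^{\epsilon}+1)$, so the probability of any report changes by a factor of at most $e^{\epsilon}$ when the input bit changes; the untrusted aggregator debiases the mean of the reports.

```python
import math
import random

def truth_probability(epsilon):
    """Probability of reporting the true bit under randomized response."""
    return math.exp(epsilon) / (math.exp(epsilon) + 1.0)

def randomized_response(bit, epsilon, rng):
    """Local randomizer: keep the bit with probability p, flip it otherwise."""
    return bit if rng.random() < truth_probability(epsilon) else 1 - bit

def report_probability(output, bit, epsilon):
    """Pr[randomizer outputs `output` | input is `bit`]."""
    p = truth_probability(epsilon)
    return p if output == bit else 1.0 - p

def estimate_proportion(reports, epsilon):
    """Aggregator: invert E[report] = (1 - p) + q * (2p - 1) to estimate q."""
    p = truth_probability(epsilon)
    mean = sum(reports) / len(reports)
    return (mean - (1.0 - p)) / (2.0 * p - 1.0)

rng = random.Random(0)
epsilon = 1.0
bits = [1] * 6000 + [0] * 14000  # hypothetical population, true proportion 0.3
reports = [randomized_response(b, epsilon, rng) for b in bits]
print(estimate_proportion(reports, epsilon))
```

The aggregator only ever sees `reports`, never `bits`, which is why no trusted curator is needed.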

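Local Model for Privacy
[Figure: each party applies a randomizer A, with local random coins, to its own input and sends the result to an untrusted aggregator]
• Apple: https://developer.apple.com/videos/play/wwdc2016/709/
• Google RAPPOR: https://github.com/google/rappor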

Local Model for Privacy
[Figure: each party applies a randomizer A, with local random coins, to its own input and sends the result to an untrusted aggregator]
• Pros
  Ø No trusted curator
  Ø No single point of failure
  Ø Highly distributed
  Ø Beautiful algorithms
• Cons
  Ø Low accuracy
    • Proportions: $\Theta\left(\frac{1}{\epsilon\sqrt{n}}\right)$ error [BMO'08, CSS'12] vs $O\left(\frac{1}{\epsilon n}\right)$ central
  Ø Correctness requires honesty
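To make the accuracy gap for proportions concrete, the sketch below evaluates two closed-form error expressions, under assumptions not stated on the slide: the local protocol is randomized response with a debiased mean, and the central algorithm adds Laplace noise of scale $1/(\epsilon n)$ to the true proportion. Sampling error of the population itself is ignored. The local error shrinks like $1/\sqrt{n}$ and the central error like $1/n$.

```python
import math

def local_error(n, epsilon):
    """Worst-case standard error of the debiased randomized-response mean.

    Each report has variance at most 1/4, and debiasing divides by
    (e^eps - 1) / (e^eps + 1), giving roughly 1 / (eps * sqrt(n)) for small eps.
    """
    debias = (math.exp(epsilon) + 1.0) / (math.exp(epsilon) - 1.0)
    return debias / (2.0 * math.sqrt(n))

def central_error(n, epsilon):
    """Standard deviation of Laplace noise with scale 1 / (eps * n)."""
    return math.sqrt(2.0) / (epsilon * n)

epsilon = 1.0
for n in (10**3, 10**5, 10**7):
    print(n, local_error(n, epsilon), central_error(n, epsilon))
```

Quadrupling `n` halves the local error but quarters the central error, so the gap between the two models widens as the population grows.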

Selection Lower Bounds [DJW'13, Ullman '17]
[Figure: binary data matrix with $n$ rows (people) and $d$ columns (attributes)]
• Suppose each person has $d$ binary attributes
• Goal: Find index $j$ with highest count ($\pm\alpha$)
• Central model: $n = O(\log(d)/\epsilon\alpha)$ suffices [McSherry Talwar '07]
• Local model: Any noninteractive local DP protocol with nontrivial error requires $n = \Omega(d \log(d)/\epsilon^2)$
  Ø [DJW'13, Ullman '17]
  Ø (No lower bound known for interactive protocols)
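The central-model upper bound cited from [McSherry Talwar '07] comes from the exponential mechanism. Below is a minimal sketch for the selection problem, assuming a trusted curator who holds the full data matrix; the function names and the toy matrix are hypothetical. An index is sampled with probability proportional to $\exp(\epsilon \cdot \text{count}/2)$, which is $\epsilon$-DP because replacing one person's row changes each column count by at most 1.

```python
import math
import random

def column_counts(rows):
    """Number of people having each attribute (column sums of a 0/1 matrix)."""
    return [sum(column) for column in zip(*rows)]

def exponential_mechanism(counts, epsilon, rng):
    """Sample index j with probability proportional to exp(epsilon * counts[j] / 2)."""
    top = max(counts)
    # Subtracting the maximum leaves the distribution unchanged and avoids overflow.
    weights = [math.exp(epsilon * (c - top) / 2.0) for c in counts]
    return rng.choices(range(len(counts)), weights=weights, k=1)[0]

rng = random.Random(0)
# Hypothetical data: 4 people, 3 binary attributes.
rows = [
    [0, 1, 1],
    [0, 1, 0],
    [1, 1, 0],
    [0, 1, 1],
]
counts = column_counts(rows)
print(counts, exponential_mechanism(counts, 1.0, rng))
```

With only four people the output is close to uniform; the mechanism becomes reliable once the gap between the top count and the rest is large compared with $\log(d)/\epsilon$.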

Local Model for Privacy
[Figure: each party applies a randomizer A, with local random coins, to its own input and sends the result to an untrusted aggregator]
What other models allow similarly distributed trust?

Two great tastes that go great together
• How can we get accuracy without a trusted curator?
• Idea: Replace central algorithm A with multiparty computation (MPC) protocol for A (randomized), and either
  Ø Secure channels + honest majority
  Ø Computational assumptions + PKI
• Questions:
  Ø What definition does this achieve?
  Ø Are there special-purpose protocols that are more efficient than generic reductions?
  Ø What models make sense?
  Ø What primitives are needed?

Definitions
What definitions are achieved? (The two below are not equivalent.)
• Simulation of an $(\epsilon, \delta)$-DP protocol
• Computational DP [Mironov, Pandey, Reingold, Vadhan '08]
Definition: A is $(t, \epsilon, \delta)$-computationally differentially private if, for all neighbors $x, x'$, for all distinguishers $T \in \mathrm{time}(t)$:
$$\Pr_{\text{coins of } A}[T(A(x)) = 1] \le e^{\epsilon} \cdot \Pr_{\text{coins of } A}[T(A(x')) = 1] + \delta$$

Question 1: Special-purpose protocols
• [Dwork Kenthapadi McSherry Mironov Naor '06] Special-purpose protocols for generating Laplace/exponential noise via finite field arithmetic
  Ø ⇒ honest-majority MPC
  Ø Satisfies simulation, follows existing MPC models
  Ø Lots of follow-up work
• [He, Machanavajjhala, Flynn, Srivastava '17, Mazloom, Gordon '17, maybe others?] Use DP statistics to speed up MPC
  Ø Leaks more than ideal functionality

Question 2: What MPC models make sense?
• Recall: secure MPC protocols require
  Ø Communication between all pairs of parties
  Ø Multiple rounds, so parties have to stay online
• Protocols involving all Google/Apple users wouldn't work

Question 2: What MPC models make sense?
Applications of DP suggest a few different settings
[Figure, both settings: $Y = f(X_1, \ldots, X_n)$]
• "Few hospitals"
  Ø Small set of computationally powerful data holders
  Ø Each holds many participants' data
  Ø Data holders have their own privacy-related concerns
    • Sometimes can be modeled explicitly, e.g. [Haney, Machanavajjhala, Abowd, Graham, Kutzbach, Vilhuber '17]
    • Data holders' interests may not align with individuals'
• "Many phones"
  Ø Many weak clients (individual data holders)
  Ø One server or small set of servers
  Ø Unreliable, client-server network
  Ø Calls for lightweight MPC protocols, e.g. [Shi, Chan, Rieffel, Chow, Song '11, Boneh, Corrigan-Gibbs '17, Bonawitz, Ivanov, Kreuter, Marcedone, McMahan, Patel, Ramage, Segal, Seth '17]
• DP does not need full MPC
  Ø Sometimes, leakage helps [HMFS '17, MG '17]
  Ø Sometimes, we do not know how to take advantage of it [McGregor Mironov Pitassi Reingold Talwar Vadhan '10]

Question 3: What MPC primitives do we need?
• Observation: Most DP algorithms rely on 2 primitives
  Ø Addition + Laplace/Gaussian noise
  Ø Threshold (summation + noise)
    • Sufficient for "sparse vector" and "exponential mechanism"
    • [Shafi's talk mentions others for training nonprivate deep nets.]
  Ø Relevant for PATE framework
• Lots of work focuses on addition
  Ø "Federated learning"
  Ø Relies on users to introduce small amounts of noise
• Thresholding remains complicated
  Ø Because highly nonlinear
  Ø Though maybe approximate thresholding is easier (e.g. HEAAN)
• Recent papers look at weaker primitives
  Ø Shufflers as a useful primitive [Erlingsson, Feldman, Mironov, Raghunathan, Talwar, Thakurta] [Cheu, Smith, Ullman, Zeber, Zhilyaev 2018]
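The two primitives named on this slide can be written down as ideal functionalities, that is, what a trusted party would compute and what an MPC protocol would have to emulate. The sketch below is only that: plain Python with no cryptography, using hypothetical names, and assuming each input contributes at most `sensitivity` to the sum.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as a scaled difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def noisy_sum(values, epsilon, rng, sensitivity=1.0):
    """Primitive 1: addition plus Laplace noise calibrated to the sensitivity."""
    return sum(values) + laplace_noise(sensitivity / epsilon, rng)

def noisy_threshold(values, threshold, epsilon, rng, sensitivity=1.0):
    """Primitive 2: reveal only whether the noisy sum reaches the threshold."""
    return noisy_sum(values, epsilon, rng, sensitivity) >= threshold

rng = random.Random(0)
votes = [1] * 700 + [0] * 300  # hypothetical 0/1 inputs, one per client
print(noisy_sum(votes, 1.0, rng))
print(noisy_threshold(votes, 500, 1.0, rng))
```

The comparison in `noisy_threshold` is the nonlinear step: the sum can be computed share by share, but the single output bit cannot, which is why thresholding is the harder primitive to realize in MPC.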

Turning HATE into LOVE MPC Scalable Multi-Party Computation With Limited Connectivity Leonid Reyzin, Adam Smith, Sophia Yakoubov https://eprint.iacr.org/2018/997

Goals
• Clean formalism for "many phones" model
  • Inspired by protocols of [Shi et al. 2011; Bonawitz et al. 2017]
• Identify
  • Fundamental limits
  • Potentially practical protocols
  • Open questions

Large-scale One-server Vanishing-participants Efficient MPC
MPC [Goldreich, Micali, Wigderson 87; Yao 87]
[Figure: four parties holding $X_1, X_2, X_3, X_4$ jointly compute $Y = f(X_1, X_2, X_3, X_4)$; each party receives $Y$]
No party learns anything other than the output!

Large-scale One-server Vanishing-participants Efficient MPC
[Figure: four parties holding $X_1, X_2, X_3, X_4$ jointly compute $Y = A(X_1, X_2, X_3, X_4)$]
• Can compute differentially private statistic A(X) without server learning anything but the output! [Dwork, Kenthapadi, McSherry, Mironov, Naor 06]
• Central model level accuracy!
• Local model level privacy!
• A(X) is often linear, so we will focus on MPC for addition
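A minimal sketch of MPC for addition in a one-server setting, using pairwise additive masks in the spirit of the secure-aggregation protocols cited earlier in the talk. It is a simplification under loud assumptions: every pair of clients is assumed to already share a random mask (real protocols derive these from key agreement), no client drops out, and everything runs in one process, so this shows only why the masks cancel, not a secure implementation. All names and the modulus are hypothetical.

```python
import random

MODULUS = 2 ** 32  # assumed large enough that the true sum does not wrap around

def pairwise_masks(num_clients, rng):
    """masks[(i, j)] for i < j is a random value known only to clients i and j."""
    return {
        (i, j): rng.randrange(MODULUS)
        for i in range(num_clients)
        for j in range(i + 1, num_clients)
    }

def masked_input(i, x_i, num_clients, masks):
    """Client i adds the masks shared with higher-indexed clients
    and subtracts the masks shared with lower-indexed clients."""
    total = x_i
    for j in range(num_clients):
        if i < j:
            total += masks[(i, j)]
        elif j < i:
            total -= masks[(j, i)]
    return total % MODULUS

def server_sum(messages):
    """Server adds the masked messages; each mask appears once with + and once with -."""
    return sum(messages) % MODULUS

rng = random.Random(0)
inputs = [3, 5, 7, 11]  # hypothetical private inputs X_1..X_4
masks = pairwise_masks(len(inputs), rng)
messages = [masked_input(i, x, len(inputs), masks) for i, x in enumerate(inputs)]
print(messages, server_sum(messages))
```

Each individual message looks uniformly random to the server, yet the sum is exact. To obtain a differentially private statistic, noise still has to be added to that sum, for example by having clients add small amounts of noise to their inputs before masking.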

Large-scale (millions of clients) One-server Vanishing-participants Efficient MPC
[Figure: clients holding $X_1, \ldots, X_n$ around a single server that outputs $Y = f(X_1, X_2, \ldots, X_n)$]
• Star communication graph, as in noninteractive multiparty computation (NIMPC) [Beimel, Gabizon, Ishai, Kushilevitz, Meldgaard, Paskin-Cherniavsky 14]

                       Clients          Server
Computational power    weak             strong
Direct communication   only to server   to everyone
