Privacy guarantees in statistical estimation: How to formalize the problem?
Martin Wainwright
UC Berkeley, Departments of Statistics and EECS
van Dantzig Seminar, University of Leiden
October 2015

The modern landscape
Modern data sets are often very large:
- biological data (genes, proteins, etc.)
- medical imaging (MRI, fMRI, etc.)
- astronomy datasets
- social network data
- recommender systems (Amazon, Netflix, etc.)
Statistical considerations interact with:
1. Computational constraints: (low-order) polynomial-time is essential!
2. Communication/storage constraints: distributed implementations are often needed
3. Privacy constraints: tension between hiding/sharing data

From Classical Minimax Risk...
Choose estimator to minimize the worst-case risk:
$$\text{Classical minimax risk} = \inf_{\hat{\theta}_n} \sup_{\theta \in \Omega} \mathbb{E}\big[ L(\hat{\theta}_n, \theta) \big].$$
[Photo: Abraham Wald, 1902–1950]
Two-party game:
- Nature chooses parameter $\theta \in \Omega$ in a potentially adversarial manner
- Statistician takes infimum over all estimators $(X_1, \ldots, X_n) \mapsto \hat{\theta}_n \in \Omega$ (arbitrary measurable functions)
Classical questions about minimax risk:
- how fast does it decay as a function of sample size $n$?
- dependence on dimensionality, smoothness, etc.?
- characterization of optimal estimators?
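
As a small illustration of the inf-sup structure above, the following sketch compares the worst-case squared-error risk of two estimators of a Bernoulli mean. The Bernoulli model, the grid approximation of the supremum, and all function names are assumptions chosen for illustration; they do not come from the talk. The second estimator, $(S + \sqrt{n}/2)/(n + \sqrt{n})$ with $S = \sum_i X_i$, is the textbook example whose risk does not depend on $\theta$.

```python
import math

def risk_sample_mean(theta: float, n: int) -> float:
    """Exact squared-error risk of the sample mean for Bernoulli(theta) data."""
    return theta * (1.0 - theta) / n

def risk_shrunk(theta: float, n: int) -> float:
    """Exact risk of (S + sqrt(n)/2) / (n + sqrt(n)), where S is the sum of the X_i."""
    root = math.sqrt(n)
    variance = n * theta * (1.0 - theta) / (n + root) ** 2
    bias = (root / 2.0 - root * theta) / (n + root)
    return variance + bias ** 2

def worst_case(risk, n: int, grid_size: int = 1001) -> float:
    """Approximate the sup over theta in [0, 1] by a maximum over a uniform grid."""
    return max(risk(i / (grid_size - 1), n) for i in range(grid_size))

n = 25
sup_mean = worst_case(risk_sample_mean, n)   # theory: 1 / (4 n), attained at theta = 1/2
sup_shrunk = worst_case(risk_shrunk, n)      # theory: 1 / (4 (1 + sqrt(n))^2) for every theta
print(sup_mean, sup_shrunk)
```

The sample mean has lower risk near $\theta = 0$ or $1$, but the minimax criterion only looks at the worst case, where the shrunk estimator is better.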

....to Constrained Minimax Risk
Classical framework imposes no constraints on the choice of estimators $\hat{\theta}_n$:
- Unbounded memory and computational power.
- Provided centralized access to all $n$ samples.
- Data is fully revealed: no privacy-preserving properties.
On-going research: statistical minimax with constraints
- Computationally-constrained estimators (e.g., Rigollet & Berthet, 2013; Ma & Wu, 2014; Zhang, W. & Jordan, 2014)
- Communication constraints (e.g., Zhang et al., 2013; Ma et al., 2014; Braverman et al., 2015)
- Privacy constraints (e.g., Dwork, 2006; Hardt & Rothblum, 2010; Hall et al., 2011; Duchi, W. & Jordan, 2013)

Why be concerned with privacy?
Many sources of data have both statistical utility and privacy concerns.
[Figure: (a) Personal genome project; (b) Privacy breach (Scientific American, August 2013)]
Question: How to obtain principled tradeoffs between these competing criteria?

Basic model of local privacy
[Diagram: $\theta \to (X_1, X_2, X_3, \ldots, X_n) \to Q(Z_1^n \mid X_1^n) \to Z_1^n \to \hat{\theta}$]
- each individual $i \in \{1, 2, \ldots, n\}$ has personal data $X_i \sim P_{\theta^*}$
- conditional distribution $Q$ between private data $X_1^n$ and public data $Z_1^n$
- estimator $Z_1^n \mapsto \hat{\theta}$ of unknown parameter $\theta^*$

Local privacy at level α
[Figure: log likelihoods $\log Q(\cdot \mid x)$ and $\log Q(\cdot \mid \bar{x})$ plotted against $z$]
Definition: Conditional distribution $Q$ is locally α-differentially private if
$$e^{-\alpha} \le \sup_{z} \frac{Q(z \mid x_1^n)}{Q(z \mid \bar{x}_1^n)} \le e^{\alpha} \quad \text{for all } x_1^n \text{ and } \bar{x}_1^n \text{ such that } d_{\mathrm{HAM}}(x_1^n, \bar{x}_1^n) = 1.$$
(Dwork et al., 2006)

Illustration of Laplacian mechanism
Add α-Laplacian noise (Dwork et al., 2006):
$$Z = x + W, \quad \text{where } W \text{ has density} \propto e^{-\alpha |w|}.$$
For all $x, x' \in [-1/2, 1/2]$:
$$\sup_{z \in \mathbb{R}} \left| \log \frac{Q(z \mid x)}{Q(z \mid x')} \right| = \alpha \sup_{z \in \mathbb{R}} \Big| |z - x| - |z - x'| \Big| \le \alpha.$$
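
The bound above can be checked numerically. The sketch below is a minimal illustration, not code from the talk: the function names are hypothetical, the noise is drawn as the difference of two independent exponentials (which has density proportional to $e^{-\alpha|w|}$), and the supremum over $z$ is approximated on a finite grid.

```python
import random

def laplace_noise(alpha: float, rng: random.Random) -> float:
    """Draw W with density proportional to exp(-alpha * |w|)."""
    # The difference of two independent Exponential(rate=alpha) draws is Laplace with scale 1/alpha.
    return rng.expovariate(alpha) - rng.expovariate(alpha)

def privatize(x: float, alpha: float, rng: random.Random) -> float:
    """Release Z = x + W for a private value x in [-1/2, 1/2]."""
    if not -0.5 <= x <= 0.5:
        raise ValueError("x must lie in [-1/2, 1/2]")
    return x + laplace_noise(alpha, rng)

def log_likelihood_ratio(z: float, x: float, x_prime: float, alpha: float) -> float:
    """log Q(z | x) - log Q(z | x') for the mechanism above (normalizers cancel)."""
    return alpha * (abs(z - x_prime) - abs(z - x))

alpha = 1.0
z_grid = [i / 100 for i in range(-300, 301)]
# Worst case over the grid for the two most distant inputs x = -1/2 and x' = 1/2.
max_ratio = max(abs(log_likelihood_ratio(z, -0.5, 0.5, alpha)) for z in z_grid)

rng = random.Random(0)
released = [privatize(0.25, alpha, rng) for _ in range(20000)]
mean_released = sum(released) / len(released)  # noise has mean zero, so this is close to 0.25
```

The noise is independent of $x$ and has mean zero, so averages of released values remain unbiased; the price is extra variance $2/\alpha^2$ per sample.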

Various mechanisms for α-privacy
Choices from past work:
- randomized response in survey questions (Warner, 1965)
- Laplacian noise (Dwork et al., 2006)
- exponential mechanism (McSherry & Talwar, 2007)
Some past work on privacy and estimation:
- local differential privacy and PAC learning (Kasiviswanathan et al., 2008)
- linear queries over discrete-valued data sets (Hardt & Rothblum, 2010)
- global differential privacy and histogram estimators (Hall et al., 2011)
- lower bounds for certain 1-D statistics (Chaudhuri & Hsu, 2012)
Questions:
- Can we provide a general characterization of trade-offs between α-privacy and statistical utility?
- Can we identify optimal “mechanisms” for privacy?
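
To make the first mechanism in the list concrete, here is a minimal sketch of randomized response for a single yes/no survey answer. The parameterization (report the truth with probability $e^{\alpha}/(1+e^{\alpha})$) is a common choice assumed here so that the likelihood ratio equals $e^{\alpha}$; the talk does not specify it, and the function names are hypothetical.

```python
import math
import random

def keep_probability(alpha: float) -> float:
    """Probability of reporting the true bit; gives likelihood ratio exp(alpha)."""
    return math.exp(alpha) / (1.0 + math.exp(alpha))

def randomized_response(x: int, alpha: float, rng: random.Random) -> int:
    """Report the private bit x in {0, 1} truthfully with probability p, else flip it."""
    return x if rng.random() < keep_probability(alpha) else 1 - x

def debiased_mean(reports: list, alpha: float) -> float:
    """Invert E[Z] = (2p - 1) * theta + (1 - p) to estimate theta = P(X = 1)."""
    p = keep_probability(alpha)
    return (sum(reports) / len(reports) - (1.0 - p)) / (2.0 * p - 1.0)

alpha = 1.0
rng = random.Random(0)
theta = 0.3  # assumed true proportion, used only to simulate private answers
private_bits = [1 if rng.random() < theta else 0 for _ in range(50000)]
reports = [randomized_response(x, alpha, rng) for x in private_bits]
estimate = debiased_mean(reports, alpha)
```

The division by $2p - 1$ inflates the variance of the estimate as $\alpha \to 0$, which is a simple instance of the privacy/utility trade-off raised in the questions above.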

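Minimax optimality with α-privacy
- family of distributions $\{P \in \mathcal{F}\}$, and functional $P \mapsto \theta(P)$
- samples $X_1^n \equiv \{X_1, \ldots, X_n\} \sim P$ and estimator $X_1^n \mapsto \hat{\theta}(X_1^n)$
- loss function (e.g., squared error, 0-1 error, $\ell_1$-error) $(\hat{\theta}, \theta) \mapsto L(\hat{\theta}, \theta)$, measuring the quality of $\hat{\theta}$ as an estimate of $\theta$
Ordinary minimax risk:
$$\mathfrak{M}_n(\mathcal{F}) := \underbrace{\inf_{\hat{\theta}}}_{\text{best estimator}} \; \underbrace{\sup_{P \in \mathcal{F}}}_{\text{worst-case distribution}} \mathbb{E}\Big[ L\big(\hat{\theta}(X_1^n), \theta(P)\big) \Big]$$
Minimax risk with α-privacy: estimators now depend on privatized samples $Z_1^n$
$$\mathfrak{M}_n(\alpha; \mathcal{F}) := \underbrace{\inf_{Q \in \mathcal{Q}_\alpha}}_{\text{best } \alpha\text{-private channel}} \; \inf_{\hat{\theta}} \sup_{P \in \mathcal{F}} \mathbb{E}\Big[ L\big(\hat{\theta}(Z_1^n), \theta(P)\big) \Big]$$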

Vignette A: α-private location estimation
Consider estimation of the mean functional $\theta(P) = \mathbb{E}[X]$ over the family
$$\mathcal{F}_k := \big\{ \text{distributions } P \text{ such that } \mathbb{E}[X] \in [-1, 1] \text{ and } \mathbb{E}[|X|^k] \le 1 \big\}.$$
For $k \ge 2$ and the non-private setting, the sample mean $\hat{\theta} = \frac{1}{n} \sum_{i=1}^n X_i$ achieves rate $1/n$.
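
The non-private $1/n$ rate can be seen in a quick simulation. The sketch below assumes one particular member of $\mathcal{F}_k$, the Uniform$[-1, 1]$ distribution (mean $0$, $\mathbb{E}[|X|^k] = 1/(k+1) \le 1$), chosen only for illustration; for it the mean-squared error of the sample mean is exactly $1/(3n)$, so $n$ times the error should stay near $1/3$ as $n$ grows.

```python
import random

def sample_mean_mse(n: int, trials: int, rng: random.Random) -> float:
    """Monte Carlo estimate of E[(theta_hat - theta)^2] for the sample mean."""
    true_mean = 0.0  # mean of Uniform[-1, 1]
    total = 0.0
    for _ in range(trials):
        theta_hat = sum(rng.uniform(-1.0, 1.0) for _ in range(n)) / n
        total += (theta_hat - true_mean) ** 2
    return total / trials

rng = random.Random(0)
# If the error decays like 1/n, the rescaled values n * MSE are roughly constant in n.
scaled_errors = {n: n * sample_mean_mse(n, trials=2000, rng=rng) for n in (50, 200)}
print(scaled_errors)
```

This only illustrates the non-private baseline for one distribution; it says nothing about the worst case over $\mathcal{F}_k$ or about the private rate.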
