
Locally Differentially Private Frequency Estimation - PowerPoint PPT Presentation



  1. Locally Differentially Private Frequency Estimation Exploiting Consistency
  Tianhao Wang, Purdue University
  Joint work with Milan Lopuhaä-Zwakenberg, Zitao Li, Boris Skoric, Ninghui Li

  2. Privacy in Practice
  • Local differential privacy is deployed:
    • In the Google Chrome browser, to collect browsing statistics
    • In Apple iOS and macOS, to collect typing statistics
    • In Microsoft Windows, to collect telemetry data over time
    • At Alibaba, we built a system to collect user transaction info
  • Different algorithms have been proposed.
  • They work for different tasks and different settings.
  • They are all based on Randomized Response.

  3. Randomized Response
  • Survey technique for private questions
  • Survey people: “Do you have disease X?”
  • Each person:
    • Flips a secret coin
    • Answers truthfully if heads (w.p. 0.5)
    • Answers randomly if tails (w.p. 0.5): replies “yes”/“no” each w.p. 0.5
  • Pr[disease → yes] = Pr[disease → yes ∧ head] + Pr[disease → yes ∧ tail] = 0.5 × 1 + 0.5 × 0.5 = 0.75
  • Similarly: Pr[disease → no] = 0.25, Pr[no disease → yes] = 0.25, Pr[no disease → no] = 0.75
  S. L. Warner. Randomized response: A survey technique for eliminating evasive answer bias. JASA, 1965.
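The coin-flip mechanism above is easy to simulate. A minimal sketch (the function name and the simulated population are illustrative, not from the talk):

```python
import random

def randomized_response(truth: bool) -> bool:
    """Warner's randomized response: answer truthfully on heads,
    otherwise give a uniformly random yes/no answer."""
    if random.random() < 0.5:          # heads (w.p. 0.5): tell the truth
        return truth
    return random.random() < 0.5       # tails (w.p. 0.5): random "yes"/"no"

random.seed(0)
n = 100_000
# Simulate a population where everyone truly has the disease:
# the observed "yes" rate should approach Pr[disease -> yes] = 0.75.
yes_rate = sum(randomized_response(True) for _ in range(n)) / n
```

Running this, `yes_rate` lands close to 0.75, matching the slide's calculation.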

  4. Randomized Response: Estimating the Distribution
  Recall: Pr[disease → yes] = 0.75, Pr[disease → no] = 0.25, Pr[no disease → yes] = 0.25, Pr[no disease → no] = 0.75
  • To estimate the distribution: if n_dis out of n people have the disease, we expect to see E[n_yes] = 0.75·n_dis + 0.25·(n − n_dis) “yes” answers
  • Inverting the above equation: n̂_dis = (n_yes − 0.25·n) / 0.5
  • This is an unbiased estimate of the number of patients: (E[n_yes] − 0.25·n) / 0.5 = n_dis
  • Similarly for the “no” answers
  • An algorithm A is ε-LDP if and only if, for any v and v′ and any valid output y, Pr[A(v) = y] / Pr[A(v′) = y] ≤ e^ε
  • Enumerating the possibilities of v and v′ taking “disease” or “no disease”, and y as “yes” or “no”, binary randomized response is (ln 3)-LDP (since 0.75 / 0.25 = 3)
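The debiasing step can be checked numerically. This sketch assumes a synthetic population (30% patients, a number chosen for illustration) and the 0.5-coin mechanism from the previous slide:

```python
import random

def randomized_response(truth: bool) -> bool:
    # Truthful on heads (w.p. 0.5), uniformly random answer on tails.
    return truth if random.random() < 0.5 else random.random() < 0.5

random.seed(1)
n, n_dis = 100_000, 30_000                 # 30% of respondents have the disease
n_yes = sum(randomized_response(i < n_dis) for i in range(n))

# E[n_yes] = 0.75*n_dis + 0.25*(n - n_dis); inverting gives the unbiased estimate.
n_dis_hat = (n_yes - 0.25 * n) / 0.5
```

Across repeated runs, `n_dis_hat` averages to `n_dis`, matching the unbiasedness argument on the slide.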

  5. Local Differential Privacy (LDP)
  • A is ε-LDP iff, for any v and v′ and any valid output y, Pr[A(v) = y] / Pr[A(v′) = y] ≤ e^ε
  • Each user keeps their data v inside their own trust boundary: A takes the input value v and outputs a noisy report y = A(v)
  • The aggregator takes the reports from all users and outputs estimations f̂(v) for any value v; estimation is done independently for each value v
  • The result is not consistent:
    • Some estimates may be negative.
    • The sum may not be n (the original number of users).
  • In this work, we explore 10 different methods that improve the accuracy of LDP by enforcing consistency.
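The slide's ε-LDP condition can be checked mechanically by enumerating all v, v′, and y. The dictionary of transition probabilities below encodes binary randomized response:

```python
import math
from itertools import product

# Pr[answer y | true value v] for binary randomized response.
P = {(True, True): 0.75, (True, False): 0.25,
     (False, True): 0.25, (False, False): 0.75}

# epsilon-LDP requires Pr[A(v)=y] / Pr[A(v')=y] <= e^epsilon for all v, v', y;
# the tightest epsilon is the maximum log-ratio over all combinations.
eps = max(math.log(P[v, y] / P[v_prime, y])
          for v, v_prime, y in product([True, False], repeat=3))
# eps equals ln(0.75/0.25) = ln 3
```

This reproduces the (ln 3)-LDP claim from the previous slide.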

  6. Making Estimations Consistent
  Two consistency constraints:
  1) The estimated frequency of each value is non-negative.
  2) The sum of the estimated frequencies is 1.

  Method    | Description                                                  | Non-neg | Sum to 1 | Complexity
  Base      | Use existing estimation                                      | No      | No       | N/A
  Base-Pos  | Convert negative est. to 0                                   | Yes     | No       | O(d)
  Post-Pos  | Convert negative query result to 0                           | Yes     | No       | N/A
  Base-Cut  | Convert est. below threshold θ to 0                          | Yes     | No       | O(d)
  Norm      | Add δ to est.                                                | No      | Yes      | O(d)
  Norm-Mul  | Convert negative est. to 0, then multiply γ to positive est. | Yes     | Yes      | O(d)
  Norm-Cut  | Convert negative and small positive est. below θ to 0        | Yes     | Almost   | O(d)
  Norm-Sub  | Convert negative est. to 0 while adding δ to positive est.   | Yes     | Yes      | O(d)
  MLE-Apx   | Convert negative est. to 0, then add δ to positive est.      | Yes     | Yes      | O(d)
  Power     | Fit Power-Law dist., then minimize expected squared error.   | Yes     | No       | O(d log d)
  PowerNS   | Apply Norm-Sub after Power                                   | Yes     | Yes      | O(d log d)

  (Base, Base-Pos, Post-Pos, and Base-Cut are baselines; Norm, Norm-Mul, Norm-Cut, and Norm-Sub are normalization-based; MLE-Apx is MLE-based; Power and PowerNS need more prior knowledge.)
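Two of the table's methods are easy to sketch in code. Base-Pos just clips; Norm-Sub is shown as an iterative projection (zero out negatives, then shift the remaining positives by a common δ until they sum to 1). This is a sketch of my reading of the table's descriptions, not the authors' implementation:

```python
def base_pos(est):
    """Base-Pos: convert negative estimates to 0 (other estimates unchanged)."""
    return [max(e, 0.0) for e in est]

def norm_sub(est, total=1.0):
    """Norm-Sub: zero out negatives, then shift the remaining positive
    estimates by a common delta so that they sum to `total`."""
    est = [float(e) for e in est]
    while True:
        pos = [i for i, e in enumerate(est) if e > 0]
        if not pos:                      # everything was clipped away
            return [0.0] * len(est)
        delta = (sum(est[i] for i in pos) - total) / len(pos)
        nxt = [0.0] * len(est)
        for i in pos:
            nxt[i] = est[i] - delta
        if all(nxt[i] >= 0 for i in pos):
            return nxt
        est = nxt                        # some entries went negative; repeat

demo = norm_sub([0.5, 0.4, 0.2, -0.1])   # non-negative estimates summing to 1
```

`base_pos` satisfies only the non-negativity constraint; `norm_sub` satisfies both, matching the table's "Non-neg" and "Sum to 1" columns.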

  7. Post-Processing: Toy Example
  [Bar charts comparing estimated vs. true ratios per occupation: the raw estimates include negative values and sum to 106%.]
  • Constraint 1: each estimation is non-negative
  • Constraint 2: the sum of the estimations is known
  • Base-Pos: convert negative estimates to 0 (satisfies Constraint 1)
  • Norm-Sub: additively normalize the result (satisfies both constraints)
  • Norm-Sub is the solution to Constrained Least Squares (CLS) and approximate Maximum Likelihood Estimation (MLE)

  8. Analysis of the Estimation in LDP
  • Estimation function: n̂_yes = (n_yes − 0.25·n) / 0.5; more generally, n̂_v = (ñ_v − n·q) / (p − q), where p is the probability that A(v) supports v (disease → yes) and q is the probability that A(v′) supports v for v′ ≠ v (no disease → yes)
  • The noise comes from ñ_v, which is the sum of two Binomials: Bin(n_v, p) + Bin(n − n_v, q) ≈ Bin(n, p′) with p′ = (n_v/n)·p + ((n − n_v)/n)·q
  • When n is large, the noise ≈ N(n·p′, n·p′·(1 − p′))
  • Takeaway: the noise of the LDP estimation approximately follows a Gaussian distribution; this makes the analysis easier (Norm-Sub is the solution to MLE)
  J. Jia and N. Gong. Calibrate: Frequency estimation and heavy hitter identification with local differential privacy via incorporating prior knowledge. INFOCOM 2019.
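The binomial-mixture claim can be sanity-checked by simulation. The parameter values below are illustrative, not from the talk:

```python
import random
import statistics

random.seed(2)
n, f_v = 10_000, 0.3          # n users; true frequency of value v
p, q = 0.75, 0.25             # Pr[A(v) supports v], Pr[A(v') supports v], v' != v
n_v = int(n * f_v)

def support_count():
    # Noisy support count for v: Bin(n_v, p) + Bin(n - n_v, q)
    truthful = sum(random.random() < p for _ in range(n_v))
    noise = sum(random.random() < q for _ in range(n - n_v))
    return truthful + noise

samples = [support_count() for _ in range(200)]
p_prime = f_v * p + (1 - f_v) * q      # mixture success probability, here 0.4

mean = statistics.fmean(samples)       # close to n * p_prime = 4000
var = statistics.pvariance(samples)    # close to n * p_prime * (1 - p_prime) = 2400
```

The empirical mean and variance match Bin(n, p′), which for large n is well approximated by the Gaussian on the slide.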

  9. Empirical Understanding: Bias
  • 1 million reports following a Zipf distribution (s = 1.5) with 1024 values; 5000 runs (each dot is the mean estimated frequency of a value)
  • Base-Pos (convert negative to 0): systematic positive bias on infrequent values
  • Norm-Sub (additively normalize the result): systematic negative bias on frequent values
  • Bias is a bad thing; should we stop post-processing? No, because it prevents impossible events. But how does it affect the utility?

  10. Empirical Understanding: Variance
  • Same setup: 1 million reports following a Zipf distribution (s = 1.5) with 1024 values; 5000 runs (each dot is the variance of the estimate)
  • For both Base-Pos (convert negative to 0) and Norm-Sub (additively normalize the result), the variance is smaller for infrequent values
  • Takeaway message:
    • Utility is composed of bias and variance
    • Post-processing introduces bias but reduces variance
    • Different methods achieve different bias-variance tradeoffs
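The bias-variance effect is visible even in a one-value simulation: per the Gaussian takeaway of slide 8, model the raw estimate of a zero-frequency value as Gaussian noise and clip it as Base-Pos does. The noise scale below is illustrative:

```python
import random
import statistics

random.seed(3)
sigma = 0.05                   # illustrative stand-in for the LDP noise scale
runs = 2000

# Noisy estimates of a value whose true frequency is 0.
raw = [random.gauss(0.0, sigma) for _ in range(runs)]
pos = [max(e, 0.0) for e in raw]          # Base-Pos post-processing

bias_raw = statistics.fmean(raw)          # ~0: the raw estimate is unbiased
bias_pos = statistics.fmean(pos)          # > 0: clipping introduces positive bias
var_raw = statistics.pvariance(raw)       # ~ sigma^2
var_pos = statistics.pvariance(pos)       # smaller: clipping reduces variance
```

Clipping trades a systematic positive bias for a lower variance, which is exactly the tradeoff summarized on the slide.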

  11. Comparison of Different Methods
  • Norm-Mul: multiplicatively normalize the result
  • In mean squared error, as privacy increases: Norm-Sub > Base-Pos > Base > Norm-Mul (from most to least accurate)
  • Exploiting the constraints may or may not be helpful
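For reference, Norm-Mul can be sketched the same way as the earlier methods; again, this is a reading of the one-line description, not the authors' code:

```python
def norm_mul(est, total=1.0):
    """Norm-Mul: convert negative estimates to 0, then multiply the
    positive estimates by a common factor gamma so they sum to `total`."""
    pos = [max(e, 0.0) for e in est]
    s = sum(pos)
    if s == 0:
        return pos                 # nothing positive to rescale
    gamma = total / s
    return [e * gamma for e in pos]

demo = norm_mul([0.5, 0.4, 0.2, -0.1])   # rescales [0.5, 0.4, 0.2, 0] to sum to 1
```

Unlike Norm-Sub's additive shift, the multiplicative rescaling shrinks large estimates the most, one plausible reason it fares worst in the MSE comparison above.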

  12. Comparison of Different Methods (Set-Value Queries)
  • MSE of estimating a subset of values (set-value queries): uniformly sample ρ% of the elements from the domain
  • Normalization-based methods work better
  • MSE is symmetric around ρ = 50 if the estimates sum up to 1

  13. Summary
  (The method table from slide 6 is shown again.)
  • LDP noise follows a Gaussian distribution
  • Norm-Sub is the solution to MLE
  • Exploiting priors is helpful
  • Different methods work for different tasks
