  1. Techniques for Private Data Analysis
     Sofya Raskhodnikova, Penn State University
     Based on joint work with Shiva Kasiviswanathan, Homin Lee, Kobbi Nissim and Adam Smith

  2. Private data analysis
     [Figure: individuals (Alice, Bob, ...) supply data to a trusted agency for collection & sanitization; users (government, researchers, marketers, ...) query the agency.]
     Collections of personal and sensitive data:
     • census
     • medical and public health data
     • social networks
     • recommendation systems
     • trace data: search records, click data
     • intrusion detection

  3. Meta Question
     What information can be released? Two conflicting goals:
     • utility: users can extract "global" statistics
     • privacy: individual information stays hidden

  4. Related work
     Other fields: huge amount of work
     • in statistics (statistical disclosure limitation)
     • in data mining (privacy-preserving data mining)
     • largely: no precise privacy definition (only security against specific attacks)
     In cryptography (private data analysis):
     • [Dinur Nissim 03, Dwork Nissim 04, Chawla Dwork McSherry Smith Wee 05, Blum Dwork McSherry Nissim 05, Chawla Dwork McSherry Talwar 05, Dwork McSherry Nissim Smith 06, ...]
     • rigorous privacy guarantees

  5. Differential privacy [DMNS06]
     Intuition: users learn the same thing about me whether or not I participate in the census.
     Two databases are neighbors if they differ in one row (a row is arbitrarily complex information supplied by one person):
     $x = (x_1, x_2, \ldots, x_n)$ and $x' = (x_1, x_2', \ldots, x_n)$.
     Privacy definition: algorithm $A$ is $\varepsilon$-differentially private if for all neighbor databases $x, x'$ and for all sets of answers $S$,
     $\Pr[A(x) \in S] \le (1 + \varepsilon) \cdot \Pr[A(x') \in S]$.
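Editor's note, not from the slides: a minimal Monte Carlo sketch of this definition for a Laplace-noised count (the mechanism the deck introduces on slide 10). The database, the set S, and ε are arbitrary choices of mine; the standard definition bounds the ratio by e^ε, of which the slide's (1 + ε) is the small-ε form.

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 0.5

x  = np.array([0, 1, 1, 0, 1])           # a tiny database of bits
x2 = x.copy(); x2[0] = 1                 # a neighbor: one row changed

def A(db):
    # Laplace-noised count; the count has sensitivity 1
    return db.sum() + rng.laplace(scale=1.0 / eps)

trials = 200_000
in_S = lambda answer: answer <= 3.0      # an arbitrary set S of answers
p  = np.mean([in_S(A(x))  for _ in range(trials)])
p2 = np.mean([in_S(A(x2)) for _ in range(trials)])
# the DP guarantee bounds this ratio by e^eps (~ 1 + eps for small eps)
print(max(p / p2, p2 / p))
```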

  6. Properties of differential privacy
     [Figure: database x = (x_1, ..., x_n) is fed to an ε-diff. private algorithm A; users see only A(x).]
     • ε is non-negligible (at least 1/n)
     • composition: if A_1 and A_2 are ε-differentially private, then (A_1, A_2) is 2ε-differentially private
     • robust in the presence of arbitrary side information
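Editor's note: a small sketch (my own, not on the slide) of how the composition property is used in practice, splitting a fixed privacy budget across two counting queries; the data, budget, and function names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def laplace_count(db, eps):
    """An eps-differentially private count of 1s (sensitivity 1)."""
    return db.sum() + rng.laplace(scale=1.0 / eps)

# By composition, two (eps/2)-private answers are jointly eps-private,
# so an analyst can budget a total eps across several queries.
x = np.array([0, 1, 1, 0, 1, 1])
total_eps = 0.2
noisy_ones  = laplace_count(x, total_eps / 2)
noisy_zeros = laplace_count(1 - x, total_eps / 2)
print(noisy_ones, noisy_zeros)
```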

  7. What can we compute privately? Research so far:
     • Definitions [DiNi, DwNi, EGS, DMNS, DwNa, DKMMN, GKS]
     • Function approximation: given x, release A(x) ≈ f(x)
       – Protocols [DiNi, DwNi, BDMN, DMNS, NRS, BCDKMT]
       – Impossibility results [DiNi, DMNS, DwNa, DwMT, DwY]
       – Distributed protocols [DKMMN, BNiO]
     • Mechanism design [McSherry Talwar 07]
     • Learning [Blum Dwork McSherry Nissim 05, KLNRS08]
     • Releasing classes of functions [Blum Ligett Roth 08]
     • Synthetic data [Machanavajjhala Kifer Abowd Gehrke Vilhuber 08]

  8. Road map
     I. Function approximation
        • Global sensitivity framework [DMNS06]
        • Smooth sensitivity framework [NRS07]
        • Sample-and-aggregate [NRS07]
     II. Learning
        • Exponential mechanism [MT07, KLNRS08]

  9. Function Approximation
     [Figure: users ask the trusted agency to compute f(x); the agency runs A and returns A(x) = f(x) + noise.]
     For which functions f can we have:
     • privacy: differential privacy [DMNS06]
     • utility: output A(x) is close to f(x)

  10.–12. Global sensitivity framework [DMNS06]
     Intuition: f can be released accurately when it is insensitive to individual entries x_1, ..., x_n.
     Global sensitivity: $GS_f = \max_{\text{neighbors } x, x'} \|f(x) - f(x')\|_1$.
     Example: $GS_{\text{average}} = 1/n$ if $x \in [0,1]^n$, so Noise $= \mathrm{Lap}(1/(\varepsilon n))$.
     Compare to: estimating frequencies (e.g., the proportion of people with blue eyes) from n samples incurs sampling error $1/\sqrt{n}$.
     Theorem: if $A(x) = f(x) + \mathrm{Lap}(GS_f/\varepsilon)$, then A is ε-diff. private.
     Functions with low global sensitivity:
     • means, variances for data in a bounded interval
     • histograms, contingency tables
     • singular value decomposition
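Editor's note: a direct rendering of the theorem for the running average example. A minimal sketch, assuming entries lie in [0, 1] as on the slide; the function and parameter names are mine.

```python
import numpy as np

rng = np.random.default_rng(0)

def private_average(x, eps):
    """Release mean(x) with eps-differential privacy, for x_i in [0, 1].

    GS_average = 1/n, so by the theorem it suffices to add Laplace
    noise with scale GS_f / eps = 1 / (eps * n).
    """
    x = np.clip(np.asarray(x, dtype=float), 0.0, 1.0)  # enforce the [0,1] assumption
    n = len(x)
    return x.mean() + rng.laplace(scale=1.0 / (eps * n))

print(private_average([0.2, 0.4, 0.9, 0.5], eps=0.5))
```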

  13. Instance-Based Noise
     Big picture for the global sensitivity framework:
     • add enough noise to cover the worst case for f
     • the noise distribution depends only on f, not on the database x
     Problem: for some functions that is too much noise.
     Smooth sensitivity framework [Nissim Raskhodnikova Smith 07]: noise tuned to the database x.

  14.–16. Local sensitivity
     Local sensitivity: $LS_f(x) = \max_{x'\text{ neighbor of }x} \|f(x) - f(x')\|_1$.
     Reminder: $GS_f = \max_x LS_f(x)$.
     Example: the median for $0 \le x_1 \le \cdots \le x_n \le 1$, odd n, with m = (n+1)/2.
     [Figure: points x_1, ..., x_{m-1}, x_m, x_{m+1}, ..., x_n on the interval [0,1]; the median is x_m; setting x'_1 = 1 moves the median to x_{m+1}, and setting x'_n = 0 moves it to x_{m-1}.]
     $LS_{\text{median}}(x) = \max(x_m - x_{m-1},\, x_{m+1} - x_m)$.
     Goal: release f(x) with less noise when $LS_f(x)$ is lower.
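Editor's note: the slide's formula for the median's local sensitivity as a short sketch. Treating positions past the ends of the data as the interval endpoints 0 and 1 is my assumption, consistent with x in [0, 1].

```python
def local_sensitivity_median(x):
    """LS_median(x) = max(x_m - x_{m-1}, x_{m+1} - x_m) for values in [0,1].

    Changing one entry shifts the median by at most one order statistic:
    to x_{m+1} (set the smallest entry to 1) or x_{m-1} (set the largest to 0).
    """
    xs = sorted(x)
    n = len(xs)
    assert n % 2 == 1, "odd n, as on the slide"
    m = n // 2                                   # 0-based median index
    below = xs[m - 1] if m - 1 >= 0 else 0.0     # clamp at the interval ends
    above = xs[m + 1] if m + 1 < n else 1.0
    return max(xs[m] - below, above - xs[m])

print(local_sensitivity_median([0.1, 0.3, 0.5, 0.8, 0.9]))  # max(0.2, 0.3) = 0.3
```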

  17. Instance-based noise: first attempt
     Noise magnitude proportional to $LS_f(x)$ instead of $GS_f$?
     No! The noise magnitude itself reveals information.
     Lesson: the noise magnitude must be an insensitive function of x.
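Editor's note: a concrete counterexample (my own illustration, not on the slide) showing why LS-scaled noise leaks:

```python
# Two neighboring databases (n = 5, median = 0 in both):
x  = [0, 0, 0, 1, 1]   # LS_median = max(0 - 0, 1 - 0) = 1
x2 = [0, 0, 0, 0, 1]   # LS_median = max(0 - 0, 0 - 0) = 0
# Adding Lap(LS/eps) noise would give
#   A(x)  = 0 + Lap(1/eps): exactly 0 with probability 0,
#   A(x2) = 0 + Lap(0):     exactly 0 with probability 1,
# so the two output distributions are perfectly distinguishable and
# differential privacy fails, even though f(x) = f(x2).
```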

  18.–19. Smooth bounds on local sensitivity
     Design a sensitivity function S(x). S(x) is an ε-smooth upper bound on $LS_f(x)$ if:
     – for all x: $S(x) \ge LS_f(x)$
     – for all neighbors x, x': $S(x) \le e^{\varepsilon} S(x')$
     [Figure: S(x) as a smooth envelope lying above the graph of LS_f(x).]
     Theorem: if $A(x) = f(x) + \mathrm{noise}(S(x)/\varepsilon)$, then A is ε'-differentially private.
     Example: $GS_f$ is always a smooth bound on $LS_f(x)$.

  20.–21. Smooth Sensitivity
     Smooth sensitivity: $S^*_f(x) = \max_y \left( LS_f(y)\, e^{-\varepsilon \cdot \mathrm{dist}(x,y)} \right)$.
     Lemma: for every ε-smooth bound S, $S^*_f(x) \le S(x)$ for all x.
     Intuition: little noise when far from sensitive instances.
     [Figure: database space with regions of high and low local sensitivity; smooth sensitivity is low only far from the high-sensitivity regions.]
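Editor's note: a brute-force O(n^2) sketch of the smooth sensitivity of the median, following my reading of the closed form in the NRS07 paper (not spelled out on the slides): S*(x) = max over k = 0..n of e^(-eps*k) * max over t = 0..k+1 of (x_{m+t} - x_{m+t-k-1}), with the sentinel convention x_i = 0 for i < 1 and x_i = 1 for i > n. Treat the exact form as an assumption of this sketch.

```python
import math

def smooth_sensitivity_median(x, eps):
    """Smooth sensitivity of the median (values in [0,1], n odd)."""
    xs = sorted(x)
    n = len(xs)
    m = (n + 1) // 2                 # 1-based median index

    def val(i):                      # 1-based access with sentinels 0 and 1
        if i < 1:
            return 0.0
        if i > n:
            return 1.0
        return xs[i - 1]

    best = 0.0
    for k in range(n + 1):
        # widest interval the median can be pushed across by k changes
        width = max(val(m + t) - val(m + t - k - 1) for t in range(k + 2))
        best = max(best, math.exp(-eps * k) * width)
    return best

print(smooth_sensitivity_median([0.1, 0.3, 0.5, 0.8, 0.9], eps=1.0))
```

At k = 0 the inner maximum is exactly LS_median(x), so S* is never below the local sensitivity, matching the lemma on the slide.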

  22. Computing smooth sensitivity
     Example functions with computable smooth sensitivity:
     • Median & minimum of numbers in a bounded interval
     • MST cost when weights are bounded
     • Number of triangles in a graph
     Approximating smooth sensitivity:
     • only smooth upper bounds on LS are meaningful
     • simple generic methods for smooth approximations work for the median and 1-median in $L_1^d$

  23. Road map
     I. Function approximation
        • Global sensitivity framework [DMNS06]
        • Smooth sensitivity framework [NRS07]
        • Sample-and-aggregate [NRS07]
     II. Learning
        • Exponential mechanism [MT07, KLNRS08]

  24. New goal
     • The smooth sensitivity framework requires understanding the combinatorial structure of f, which is hard in general.
     • Goal: an automatable transformation from an arbitrary f into an ε-diff. private A with A(x) ≈ f(x) for "good" instances x (see the sketch below).
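Editor's note: this goal is met by sample-and-aggregate [NRS07] from the road map. Below is a heavily simplified sketch of the idea (names and parameters are mine): it replaces the paper's center-of-attention aggregation with a clamped, Laplace-noised mean of block outputs, and assumes f's output is meaningful after clamping to [0, 1].

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_and_aggregate(x, f, eps, blocks):
    """Privatize an arbitrary f by evaluating it on disjoint random
    blocks of the data and aggregating the block outputs privately.

    Each person's row lands in exactly one block, so changing one row
    changes at most one of the `blocks` outputs; after clamping the
    outputs to [0, 1], their mean has global sensitivity 1/blocks.
    """
    x = np.asarray(x)
    block_idx = np.array_split(rng.permutation(len(x)), blocks)
    outputs = np.clip([f(x[idx]) for idx in block_idx], 0.0, 1.0)
    return outputs.mean() + rng.laplace(scale=1.0 / (eps * blocks))

# e.g., privatize the median of values in [0, 1]:
data = rng.uniform(size=1000)
print(sample_and_aggregate(data, np.median, eps=0.5, blocks=25))
```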
