Smooth Sensitivity and Sampling, CompSci 590.03, Instructor: Ashwin Machanavajjhala (PowerPoint PPT Presentation)


  1. Smooth Sensitivity and Sampling. CompSci 590.03, Instructor: Ashwin Machanavajjhala. Lecture 7, 590.03 Fall 12.

  2. Project Topics • 2-3 minute presentations about each project topic. • 1-2 minutes of questions about each presentation.

  3. Recap: Differential Privacy • For every pair of inputs D1, D2 that differ in one value, and for every output O, the adversary should not be able to distinguish between D1 and D2 based on O: • | log( Pr[A(D1) = O] / Pr[A(D2) = O] ) | < ε, for ε > 0.

  4. Recap: Laplacian Distribution • The researcher issues query q; the database returns the true answer q(d) plus noise η, i.e., q(d) + η. • Privacy depends on the parameter λ of the noise η. • Laplace distribution Lap(λ): density h(η) ∝ exp(−|η|/λ), mean 0, variance 2λ². [Figure: the Lap(λ) density curve.]

  5. Recap: Laplace Mechanism [Dwork et al., TCC 2006] • Theorem: If the sensitivity of the query is S, then adding noise from Lap(λ) with λ = S/ε guarantees ε-differential privacy. • Sensitivity: the smallest number S(q) such that for any d, d' differing in one entry, || q(d) − q(d') || ≤ S(q).
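The Laplace mechanism above can be sketched in a few lines using inverse-CDF sampling; this is an illustrative sketch, not from the slides, and `laplace_noise`, `laplace_mechanism`, and the example data are names I introduce here:

```python
import math
import random

def laplace_noise(lam):
    """One draw from Lap(lam): density h(eta) proportional to exp(-|eta|/lam)."""
    u = random.random() - 0.5  # uniform on (-0.5, 0.5)
    return -lam * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def laplace_mechanism(query, d, sensitivity, epsilon):
    """Release query(d) + Lap(S/epsilon), per the theorem above (lambda = S/epsilon)."""
    return query(d) + laplace_noise(sensitivity / epsilon)

# Example: COUNT has sensitivity 1, so lambda = 1/epsilon.
salaries = [30_000, 45_000, 60_000, 80_000, 120_000]
noisy_count = laplace_mechanism(len, salaries, sensitivity=1, epsilon=0.5)
```

The inverse-CDF draw reproduces the distribution on slide 4: mean 0 and variance 2λ².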

  6. Sensitivity of Median function • Consider a dataset containing salaries of individuals; each salary can be anywhere between $200 and $200,000. • A researcher wants to compute the median salary. • What is the sensitivity?

  7. Queries with Large Sensitivity • Median, MAX, MIN, … • Let {x_1, …, x_10} be numbers in [0, Λ] (assume the x_i are sorted). • q_med(x_1, …, x_10) = x_5. • Sensitivity of q_med = Λ: – d_1 = {0, 0, 0, 0, 0, Λ, Λ, Λ, Λ, Λ}, q_med(d_1) = 0. – d_2 = {0, 0, 0, 0, Λ, Λ, Λ, Λ, Λ, Λ}, q_med(d_2) = Λ.
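The worst-case pair from this slide can be checked directly. A small sketch, with an illustrative value for Λ:

```python
LAM = 1000  # illustrative value for the range bound Lambda

def q_med(xs):
    """Median as on the slide: x_5, the 5th smallest of 10 values."""
    return sorted(xs)[4]

d1 = [0] * 5 + [LAM] * 5          # q_med(d1) = 0
d2 = [0] * 4 + [LAM] * 6          # differs from d1 in one entry; q_med(d2) = Lambda
gap = abs(q_med(d1) - q_med(d2))  # the full range: global sensitivity of q_med is Lambda
```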

  8. Minimum Spanning Tree • Graph G = (V, E); each edge has weight in [0, Λ]. • What is the global sensitivity of the cost of the minimum spanning tree? • Consider the complete graph on 4 vertices with all edge weights = Λ: cost of MST = 3Λ. • Suppose one edge's weight is changed to 0: cost of MST = 2Λ. • So the global sensitivity is Λ.

  9. k-means Clustering • Input: a set of points x_1, x_2, …, x_n from R^d. • Output: a set of k cluster centers c_1, c_2, …, c_k such that the k-means objective Σ_i min_j || x_i − c_j ||² is minimized.

  10. Global Sensitivity of Clustering

  11. Queries with Large Sensitivity • However, for most inputs q_med is not very sensitive. • Let d' differ from d in k = 1 entry (the changed entry can take any value in [0, Λ]). Then x_4 ≤ q_med(d') ≤ x_6. • Sensitivity of q_med at d = max(x_5 − x_4, x_6 − x_5) << Λ.

  12. Local Sensitivity of q at d, LS_q(d) [Nissim et al., STOC 2007] • The smallest number such that for any d' differing in one entry from d, || q(d) − q(d') || ≤ LS_q(d). • Global sensitivity: S(q) = max_d LS_q(d). • Can we add noise proportional to the local sensitivity?
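For the 10-element median, the definition specializes to one line. A sketch following the slide's indexing (x_5 is the median, so changing one entry can move it at most to x_4 or x_6):

```python
def local_sensitivity_median(xs):
    """LS_qmed(d) = max(x_5 - x_4, x_6 - x_5) for 10 sorted values."""
    xs = sorted(xs)
    return max(xs[4] - xs[3], xs[5] - xs[4])  # 1-indexed x_5 is xs[4]

LAM = 1000  # illustrative Lambda
print(local_sensitivity_median(list(range(1, 11))))   # evenly spread data -> 1
print(local_sensitivity_median([0] * 5 + [LAM] * 5))  # data split at the median -> 1000
```

This already hints at the problem on the next slides: the output jumps from 1 to Λ between nearby datasets.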

  13. Noise proportional to Local Sensitivity • d_1 = {0, 0, 0, 0, 0, 0, Λ, Λ, Λ, Λ} • d_2 = {0, 0, 0, 0, 0, Λ, Λ, Λ, Λ, Λ} • d_1 and d_2 differ in one value.

  14. Noise proportional to Local Sensitivity • d_1 = {0, 0, 0, 0, 0, 0, Λ, Λ, Λ, Λ}: q_med(d_1) = 0, LS_qmed(d_1) = 0 ⇒ noise sampled from Lap(0). • d_2 = {0, 0, 0, 0, 0, Λ, Λ, Λ, Λ, Λ}: q_med(d_2) = 0, LS_qmed(d_2) = Λ ⇒ noise sampled from Lap(Λ/ε). • Pr[answer > 0 | d_2] > 0 while Pr[answer > 0 | d_1] = 0, so Pr[answer > 0 | d_2] / Pr[answer > 0 | d_1] = ∞, violating differential privacy.

  15. Local Sensitivity • LS_qmed(d_1) = 0 and LS_qmed(d_2) = Λ imply S(LS_q(·)) ≥ Λ: LS_qmed(d) itself has very high sensitivity. • Adding noise proportional to local sensitivity does not guarantee differential privacy.

  16. Sensitivity [Figure: local sensitivity, global sensitivity, and smooth sensitivity plotted across databases D1, D2, D3, D4, D5, D6.]

  17. Smooth Sensitivity [Nissim et al., STOC 2007] • S_q(·) is a β-smooth upper bound on the local sensitivity if: – for all d, S_q(d) ≥ LS_q(d); – for all d, d' differing in one entry, S_q(d) ≤ exp(β) · S_q(d'). • The smallest such upper bound is called the β-smooth sensitivity: S*_q(d) = max_{d'} ( LS_q(d') · exp(−mβ) ), where d and d' differ in m entries.

  18. Smooth sensitivity of q_med • Let d' differ from d in k = 3 entries (each changed entry can be pushed to 0 or Λ). • Then x_{5−k} ≤ q_med(d') ≤ x_{5+k}, and LS(d') = max(x_{med+1} − x_{med}, x_{med} − x_{med−1}) for the new median position med. • S*_qmed(d) = max_k ( exp(−kβ) · max_{5−k ≤ med ≤ 5+k} ( x_{med+1} − x_{med}, x_{med} − x_{med−1} ) ).

  19. Smooth sensitivity of q_med • For instance, let Λ = 1000, β = 2, and d = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}. • S*_qmed(d) = max( max_{0≤k≤4} exp(−βk) · 1, max_{5≤k≤10} exp(−βk) · Λ ) = 1.
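The formula and the worked example can be checked numerically. The sketch below makes one boundary assumption explicit that the slide leaves implicit: entries pushed off the ends of the data are treated as the range extremes, via sentinels x_0 = 0 and x_{n+1} = Λ.

```python
import math

def smooth_sensitivity_median(xs, lam, beta):
    """S*_qmed(d) = max_k exp(-beta*k) * max_{5-k<=med<=5+k}(x_{med+1}-x_{med}, x_{med}-x_{med-1}).
    Sentinels x_0 = 0 and x_{n+1} = lam handle the boundary cases (my assumption)."""
    xs = sorted(xs)
    n = len(xs)
    m = (n + 1) // 2  # 1-indexed median position: 5 for n = 10
    p = [0.0] + [float(x) for x in xs] + [float(lam)]  # p[i] = x_i, 1-indexed
    best = 0.0
    for k in range(n + 1):
        ls = 0.0
        for med in range(max(1, m - k), min(n, m + k) + 1):
            ls = max(ls, p[med + 1] - p[med], p[med] - p[med - 1])
        best = max(best, math.exp(-beta * k) * ls)
    return best

# The slide's example: Lambda = 1000, beta = 2, d = {1, ..., 10}
print(smooth_sensitivity_median(list(range(1, 11)), lam=1000, beta=2.0))  # -> 1.0
```

The large gaps only become reachable at distance k ≥ 5, where the exp(−βk) factor has already shrunk them below 1, so the maximum is attained at k = 0.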

  20. Calibrating noise to smooth sensitivity

  21. Calibrating noise to smooth sensitivity • Theorem: If h is an (α, β)-admissible distribution, and S_q is a β-smooth upper bound on the local sensitivity of query q, then adding noise from h(S_q(D)/α) guarantees: Pr[f(D) ∈ O] ≤ e^ε Pr[f(D') ∈ O] + δ, for all D, D' that differ in one entry and for all outputs O.

  22. Calibrating Noise for Smooth Sensitivity • A(d) = q(d) + Z · (S*_q(d) / α), where: – Z is sampled from h(z) ∝ 1/(1 + |z|^γ), γ > 1; – α = ε/4γ; – S* is the (ε/γ)-smooth sensitivity. • Then Pr[f(D) ∈ O] ≤ e^ε Pr[f(D') ∈ O].
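For γ = 2 the density h(z) ∝ 1/(1 + |z|^γ) is the standard Cauchy, which is easy to sample by inverting its CDF. The sketch below reads the slide's α = ε/4γ as ε/(4γ); both that reading and the function names are my assumptions:

```python
import math
import random

def cauchy_noise():
    """Sample Z with density h(z) proportional to 1/(1 + z^2), i.e. gamma = 2."""
    return math.tan(math.pi * (random.random() - 0.5))

def smooth_mechanism(true_answer, smooth_sens, epsilon, gamma=2):
    """A(d) = q(d) + Z * S*_q(d)/alpha, with alpha = epsilon/(4*gamma)
    (my reading of the slide's alpha = epsilon/4gamma)."""
    alpha = epsilon / (4 * gamma)
    return true_answer + cauchy_noise() * smooth_sens / alpha
```

Note the heavy tails: the Cauchy draw has no finite variance, which is the price for a pure ε guarantee without the δ term.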

  23. Calibrating Noise for Smooth Sensitivity • Laplace- and normally-distributed noise can also be used. • They guarantee (ε, δ)-differential privacy.

  24. Summary of Smooth Sensitivity • Many functions have large global sensitivity. • Local sensitivity captures the sensitivity of the current instance, but: – local sensitivity is itself very sensitive; – adding noise proportional to local sensitivity causes privacy breaches. • Smooth sensitivity is: – not sensitive; – often much smaller than global sensitivity.

  25. Computing the (Smooth) Sensitivity • There is no known automatic method to compute the (smooth) sensitivity. • For some complex functions it is hard to analyze even the sensitivity of the function.

  26. Sample and Aggregate Framework [Figure: sample from the original data without replacement, apply the original function to each sample, then combine the results with a new aggregation function.]

  27. Example: Statistical Analysis [Smith, STOC 2011] • Let T be some statistical point estimator on the data (assumed to be drawn i.i.d. from some distribution). • Suppose T takes values in [−Λ/2, Λ/2], so its sensitivity is Λ. • Solution: – divide the data X into K parts; – compute T on each of the K parts: z_1, z_2, …, z_K; – compute the average (z_1 + z_2 + … + z_K)/K.

  28. Example: Statistical Analysis [Smith, STOC 2011] • Solution (continued): compute Ave_{K,T} = (z_1 + z_2 + … + z_K)/K. • Utility theorem: [stated on the slide as an image].

  29. Example: Statistical Analysis [Smith, STOC 2011] • Privacy: the average by itself is a deterministic algorithm, so it does not guarantee differential privacy. • Instead, add noise calibrated to the sensitivity of the average, which is at most Λ/K since each record affects only one z_i.
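The construction on slides 27-29 can be sketched end to end, assuming Laplace noise calibrated to the average's sensitivity Λ/K (each record lands in exactly one block); all names here are illustrative:

```python
import math
import random

def sample_and_aggregate(data, T, K, lam, epsilon):
    """Split data into K disjoint blocks, run the estimator T on each,
    average the K results, and add Lap(lam/(K*epsilon)) noise: one record
    affects one block, so the average moves by at most lam/K."""
    data = list(data)
    random.shuffle(data)
    blocks = [data[i::K] for i in range(K)]
    ave = sum(T(b) for b in blocks) / K
    scale = lam / (K * epsilon)
    u = random.random() - 0.5  # inverse-CDF Laplace draw
    return ave + (-scale) * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
```

With T the sample mean and well-concentrated data, the output stays close to the truth: the block estimates agree and the added noise has scale Λ/(Kε), which shrinks as K grows.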

  30. Widened Winsorized Mean • α-Winsorized mean W(z_1, z_2, …, z_k): – round up the αk smallest values to z_{αk}; – round down the αk largest values to z_{(1−α)k}; – compute the mean of the new set of values. • If the statistician knows a = z_{(1−α)k} and b = z_{αk}, then the sensitivity is |a − b|/k, so Laplace noise of scale |a − b|/kε suffices. • If not known, a and b can be estimated using the exponential mechanism.
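A sketch of the Winsorized mean described above; clipping ⌊αk⌋ values on each side is my reading of the slide's indexing:

```python
def winsorized_mean(zs, alpha):
    """Clip the alpha*k smallest values up to z_(alpha*k), the alpha*k largest
    down to z_((1-alpha)*k), then take the ordinary mean."""
    zs = sorted(zs)
    k = len(zs)
    t = int(alpha * k)             # number of values clipped on each side
    lo, hi = zs[t], zs[k - 1 - t]  # b = z_{alpha*k} and a = z_{(1-alpha)*k}
    return sum(min(max(z, lo), hi) for z in zs) / k

print(winsorized_mean([1, 2, 3, 4, 100], alpha=0.2))  # outlier 100 clipped to 4 -> 3.0
```

Because every value is confined to [b, a], moving one record changes the sum by at most |a − b|, giving the |a − b|/k sensitivity stated on the slide.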

  31. Summary • Local sensitivity can be much smaller than global sensitivity, but local sensitivity may itself be a very sensitive function. • We need to use a smooth upper bound on the local sensitivity. • The Sample and Aggregate framework helps apply differential privacy when computing the sensitivity is hard.

  32. Next Class • Optimizing noise when a workload of queries is known.

  33. References • C. Dwork, F. McSherry, K. Nissim, A. Smith, "Calibrating noise to sensitivity in private data analysis", TCC 2006. • K. Nissim, S. Raskhodnikova, A. Smith, "Smooth sensitivity and sampling in private data analysis", STOC 2007. • A. Smith, "Privacy-preserving statistical estimation with optimal convergence rates", STOC 2011.
