Differential Privacy (Part III)
Approximate (or (ε, δ))-differential privacy
• Generalized definition of differential privacy allowing for a (supposedly small) additive factor
• Used in a variety of applications
A query mechanism M is (ε, δ)-differentially private if, for any two adjacent databases D and D′ (differing in just one entry) and any C ⊆ range(M):
Pr(M(D) ∈ C) ≤ e^ε · Pr(M(D′) ∈ C) + δ
The Gaussian mechanism
The ℓ₂-sensitivity of f : ℕ^|X| → ℝ^k is defined as
Δ₂(f) = max ||f(x) − f(y)||₂ over all x, y ∈ ℕ^|X| with ||x − y||₁ = 1
For c² > 2 ln(1.25/δ), the Gaussian mechanism with parameter σ ≥ c·Δ₂(f)/ε is (ε, δ)-differentially private
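A minimal sketch of this mechanism in Python (the function name and interface are illustrative, not from the slides): it perturbs a vector-valued answer f(x) with Gaussian noise calibrated to the ℓ₂-sensitivity exactly as in the statement above, assuming ε ∈ (0, 1).

```python
import numpy as np

def gaussian_mechanism(f_x, l2_sensitivity, eps, delta):
    """Add N(0, sigma^2) noise to each coordinate of the query answer f(x),
    with sigma >= c * Delta_2(f) / eps and c^2 > 2 ln(1.25 / delta),
    as in the theorem above (assumes eps in (0, 1))."""
    c = np.sqrt(2 * np.log(1.25 / delta)) + 1e-9   # just above the required bound
    sigma = c * l2_sensitivity / eps
    return np.asarray(f_x, dtype=float) + np.random.normal(0.0, sigma, np.shape(f_x))
```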
Sparse Vector Technique
✦ [Hardt-Rothblum, FOCS’10] study the problem of k adaptively chosen, low-sensitivity queries where
• only a very small number of these queries (say c) take values above a certain threshold T
• the data analyst is only interested in such queries
• useful to learn correlations, e.g., whether there is a dependency between smoking and cancer
✦ The data analyst could ask only the significant queries, but she does not know them in advance!
✦ Goal: answer only the significant queries, pay only for them, and ignore the others
Histograms and linear queries
✦ A histogram x ∈ ℝ^N represents a database (or a distribution) over a universe U of size |U| = N
• Databases have support of size n, whereas distributions do not necessarily have a small support
✦ We assume x is normalized so that Σ_{i∈U} x_i = 1
✦ Here we focus on linear queries f : ℝ^N → [0, 1]
• can be seen as the inner product ⟨x, f⟩ for f ∈ [0, 1]^N
• counting queries (i.e., how many elements in the database fulfill a certain predicate) are a special case
✦ Example: U = {1, 2, 3}, D = [1, 2, 2, 3, 1]
• x = (2, 2, 1), after normalization (2/5, 2/5, 1/5)
• “how many entries ≤ 2” ⇒ f = (1, 1, 0)
✦ By normalization, linear queries have sensitivity 1/n
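The example above can be checked directly; this tiny sketch just evaluates the counting query as the inner product ⟨x, f⟩:

```python
import numpy as np

# Example from the slide: U = {1, 2, 3}, D = [1, 2, 2, 3, 1]
x = np.array([2, 2, 1]) / 5      # normalized histogram of D
f = np.array([1, 1, 0])          # counting query "how many entries <= 2"

print(np.dot(f, x))              # 0.8, i.e., 4 out of the 5 entries
```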
SVT: algorithm
✦ Intuition: answer only those queries whose sanitized result is above the sanitized threshold
✦ We need to sanitize the threshold too, otherwise the conditional branch would leak information
✦ We pay only for c queries (see the sketch below)
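A Python sketch of the Sparse algorithm (the numeric variant after Dwork-Roth; the half-and-half budget split between comparisons and released answers is illustrative, not the canonical one). It assumes sensitivity-1 queries; for the normalized queries of the previous slide, rescale the noise by 1/n.

```python
import numpy as np

def sparse(queries, D, T, c, eps):
    """Sketch of the Sparse Vector Technique (numeric variant, after
    Dwork-Roth). Releases noisy answers only for queries whose noisy
    value exceeds a noisy threshold; aborts after c such answers.
    Assumes sensitivity-1 queries; the eps/2 + eps/2 budget split is
    illustrative."""
    eps1, eps2 = eps / 2, eps / 2
    T_hat = T + np.random.laplace(scale=2 * c / eps1)   # sanitized threshold
    count, answers = 0, []
    for Q in queries:
        nu = np.random.laplace(scale=4 * c / eps1)      # per-query comparison noise
        if Q(D) + nu >= T_hat:
            # significant query: pay for it and release a noisy answer
            answers.append(Q(D) + np.random.laplace(scale=c / eps2))
            T_hat = T + np.random.laplace(scale=2 * c / eps1)  # refresh threshold noise
            count += 1
            if count >= c:
                break             # budget for significant queries exhausted
        else:
            answers.append(None)  # bottom: insignificant, (almost) free
    return answers
```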
SVT: accuracy
We say Sparse is (α, β)-accurate for a sequence of k queries Q₁, …, Q_k if, except with probability at most β, the algorithm does not abort before Q_k, and
• for all a_i ∈ ℝ: |a_i − Q_i(D)| ≤ α
• for all a_i = ⊥: Q_i(D) ≤ T + α
• α captures the distance between the sanitized result and the real result
• β captures the error probability
SVT: accuracy theorem
For any sequence of k queries Q₁, …, Q_k such that L(T) = |{i : Q_i(D) ≥ T − α}| ≤ c, Sparse(D, {Q_i}, T, c) is (α, β)-accurate for:
α = 4c·(log k + log(2/β)) / (εn)
• The larger β, the smaller α
• The accuracy loss is logarithmic in the number of queries
SVT: privacy theorem
The Sparse vector algorithm is ε-differentially private
• So, what did we prove in the end?
• You can estimate the actual answers of the significant queries up to α, and only results in the range (T + α, ∞) are reported
• We can filter out insignificant queries almost “for free”, paying only logarithmically for them in terms of accuracy
SVT: approximate differential privacy
✦ Setting σ = √(32c·ln(1/δ)) / (εn), we get the following theorems:
The Sparse vector algorithm is (ε, δ)-differentially private
For any sequence of k queries Q₁, …, Q_k such that L(T) = |{i : Q_i(D) ≥ T − α}| ≤ c, Sparse(D, {Q_i}, T, c) is (α, β)-accurate for:
α = √(128c·ln(1/δ)) · (log k + log(2/β)) / (εn)
Limitations
✦ Differential privacy is a general-purpose privacy definition, originally conceived for databases and later applied to a variety of different settings
✦ At the moment, it is considered the state of the art
✦ Still, it is not the holy grail, and it is not immune from concerns, criticisms, and limitations
✦ It is also typically accompanied by some over-claims
No free lunch in data privacy
✦ Privacy and utility cannot be provided without making assumptions about how data are generated (no-free-lunch theorem) [Kifer-Machanavajjhala, SIGMOD’11]
✦ Privacy means hiding the evidence of an individual’s participation in the data-generating process
✦ If database rows are not independent, this is different from removing one row
• Bob’s participation in a social network may cause new edges between pairs of his friends
✦ If there is group structure, differential privacy may not work very well…
No free lunch in data privacy (cont’d)
✦ This work disputes three popular over-claims
✦ Over-claim 1: “DP requires no assumptions on the data”
• database rows must actually be independent, otherwise removing one row does not suffice to remove the individual’s participation
✦ If rows are not independent, deciding how many entries should be removed, and which ones, is far from easy…
No free lunch in data privacy (cont’d)
✦ Over-claim 2: “the more an attacker knows, the greater the privacy risk”, so we should protect against the strongest attacker, one who knows all entries of the database except one
✦ Careful! In DP, the more the attacker knows, the less noise we actually add
• intuitively, this is because we have less to hide
No free lunch in data privacy (cont’d)
✦ Over-claim 3: “DP is robust to arbitrary background knowledge”
✦ Actually, DP is robust when certain subsets of the tuples are known to the attacker
✦ Other types of background knowledge may instead be harmful
• e.g., previous exact query answers
✦ DP composes well with itself, but not necessarily with other privacy definitions or release mechanisms
✦ One can get a new, more generic DP-style guarantee if, after releasing exact query answers, a set of tuples (not just one), called neighbours, is altered in a way that is still consistent with the previously answered queries (plausible deniability)
Geo-indistinguishability
• Goal: protect the user’s exact location, while allowing approximate information (typically needed to obtain a certain desired service) to be released
• Idea: protect the user’s location within a radius r with a level of privacy that depends on r
• This corresponds to a generalized version of the well-known concept of differential privacy
Pictorially…
• Achieve ℓ-privacy within radius r
• the provider cannot easily infer the user’s location within, say, the 7th arrondissement of Paris
• the provider can infer with high probability that the user is located in Paris rather than, say, London
More formally…
• A mechanism K satisfies ε-geo-indistinguishability if, for all locations x, x′ and all sets Z of outputs: Pr(K(x) ∈ Z) ≤ e^(ε·d(x,x′)) · Pr(K(x′) ∈ Z)
• Here K(x) denotes the distribution (of locations) generated by the mechanism K applied to location x, and d is the geographical distance
• Achieved through a variant of the Laplace mechanism (see the sketch below)
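A small sketch of the planar Laplace mechanism of Andrés et al. (CCS’13): the angle is uniform and the radius is drawn by inverting the CDF of the distribution with density proportional to ε²·r·e^(−εr). The coordinates are assumed to be in a planar projection (e.g., meters); function names are illustrative.

```python
import numpy as np
from scipy.special import lambertw

def planar_laplace(x, y, eps):
    """Sample a sanitized location around (x, y) achieving
    eps-geo-indistinguishability: uniform angle, radius drawn via the
    inverse CDF r = -(W_{-1}((p - 1)/e) + 1) / eps of the polar Laplacian."""
    theta = np.random.uniform(0, 2 * np.pi)   # direction: uniform
    p = np.random.uniform(0, 1)               # radius: inverse-CDF sampling
    r = -(lambertw((p - 1) / np.e, -1).real + 1) / eps
    return x + r * np.cos(theta), y + r * np.sin(theta)
```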
Browser extension
Malicious aggregators
[Figure: users submit x₁, …, xₙ to an aggregator, which computes f(x₁, …, xₙ) for the analyst]
• So far we focused on malicious analysts…
• …but aggregators can be malicious (or at least curious) too!
Existing approaches • Secure hardware (or trusted server)-based mechanisms • Fully distributed mechanisms with individual noise
Distributed Differential Privacy “What’s the average age of your self-help group?” How to compute differentially private queries in a distributed setting (attacker model, cryptographic protocols…)?
Smart-metering
✦ Remote reads: one read every 15-30 min
✦ Manual reads: one read every 3 months to 1 year
✦ Fine-grained smart-metering has multiple uses:
• time-of-use billing, providing energy advice, settlement, forecasting, demand response, and fraud detection
✦ USA: Energy Independence and Security Act of 2007
• American Recovery and Reinvestment Act (2009, $4.5bn)
✦ EU: Directive 2009/72/EC
✦ UK: deployment of 47 million smart meters by 2020
Smart-metering: privacy issues ✦ Meter readings are sensitive • Were you in last night? • You do like watching TV, don’t you? • Another ready meal in the microwave? • Has your boyfriend moved in?
Smart-metering: privacy issues (cont’d)
Privacy-friendly smart metering
✦ Goals:
• precise billing of consumption while revealing no consumption information to third parties
• privacy-friendly real-time aggregation
Protocol overview
✦ r_i: answer from client i
✦ k_ij: key shared between client i and aggregator j
✦ t: label classifying the kind of reading
✦ w_i: weight given to i’s answers
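As a rough illustration only (an assumption about how such a protocol might work, not the lecture’s exact scheme), here is a toy masking construction using the notation above: each client blinds its reading r_i with pseudorandom masks derived from its keys k_ij and the label t, and each aggregator j can only strip the masks derived from its own keys, so the sum is recoverable while no single party sees an individual reading. Weights w_i and the DP noise are omitted.

```python
import hashlib, secrets

M = 2**32  # modulus for blinded readings

def prf(key, label):
    """Toy PRF: hash of key and label reduced mod M (illustrative only)."""
    return int.from_bytes(hashlib.sha256(key + label.encode()).digest(), "big") % M

# Setup (hypothetical): every client i shares a key k_ij with every aggregator j
n_clients, n_aggs = 5, 3
keys = [[secrets.token_bytes(16) for _ in range(n_aggs)] for _ in range(n_clients)]

t = "reading-round-42"               # label for this kind of reading
readings = [7, 3, 5, 2, 8]           # clients' private readings r_i

# Each client blinds its reading with masks derived from all its keys
blinded = [(r + sum(prf(keys[i][j], t) for j in range(n_aggs))) % M
           for i, r in enumerate(readings)]

# Each aggregator j removes only the masks it can compute from its keys;
# no single aggregator can unblind an individual reading
total = sum(blinded) % M
for j in range(n_aggs):
    total = (total - sum(prf(keys[i][j], t) for i in range(n_clients))) % M

assert total == sum(readings)        # aggregate recovered, individual r_i hidden
```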
Protocol overview (cont’d)
✦ The geometric distribution Geom(α), with α > 1, is the discrete distribution with support ℤ and probability mass function Pr[X = k] = ((α − 1)/(α + 1)) · α^(−|k|)
✦ Discrete counterpart of the Laplace distribution
Let f : D → ℤ be a function with sensitivity Δf. Then g = f(X) + Geom(e^(ε/Δf)) is ε-differentially private.
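A minimal sketch of the geometric mechanism implied by the theorem above: two-sided geometric noise Geom(α) with α = e^(ε/Δf) can be sampled as the difference of two one-sided geometric variables, which has exactly the pmf ((α − 1)/(α + 1))·α^(−|k|).

```python
import numpy as np

def geometric_mechanism(value, eps, sensitivity=1):
    """Add two-sided geometric noise Geom(alpha), alpha = exp(eps / sensitivity).
    The difference of two one-sided geometric(p) variables with p = 1 - 1/alpha
    has pmf ((alpha - 1)/(alpha + 1)) * alpha**(-|k|) on the integers."""
    alpha = np.exp(eps / sensitivity)
    p = 1 - 1 / alpha
    noise = np.random.geometric(p) - np.random.geometric(p)
    return value + noise
```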