Crowd-Blending Privacy
Johannes Gehrke, Michael Hay, Edward Lui, Rafael Pass
Cornell University
Data Privacy

[Figure: individuals Alice, Bob, Jack, ..., Jane contribute their data to a database; a sanitization mechanism San releases output to users. The database contains data such as census data, medical records, etc.]

• Utility: Accurate statistical info is released to users
• Privacy: Each individual's sensitive info remains hidden
Simple Anonymization Techniques are Not Good Enough!

• Governor of Massachusetts Linkage Attack [Swe02]
  – "Anonymized" medical data + public voter registration records ⇒ Governor of MA's medical record identified!
• Netflix Attack [NS08]
  – "Anonymized" Netflix user movie rating data + public IMDb database ⇒ Netflix dataset partly deanonymized!
Privacy Definitions

• k-anonymity [Sam01, Swe02]
  – Each record in the released data table is indistinguishable from k-1 other records w.r.t. certain identifying attributes
• Differential privacy [DMNS06]
  – ∀ databases D, D' differing in only one row, San(D) ≈_ε San(D')
• Zero-knowledge privacy [GLP11]
  – ∀ adversary A interacting with San, ∃ a simulator S s.t. ∀ D, z, i, the simulator S can simulate A's output given just k random samples from D \ {i}:
    Out_A(A(z) ↔ San(D)) ≈_ε S(z, RS_k(D \ {i}))
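The notation ≈_ε above has its standard meaning from the differential privacy literature: the two output distributions assign probabilities within a multiplicative factor e^ε of each other on every event. In LaTeX:

```latex
% X \approx_\varepsilon Y: the distributions are pointwise within
% a factor e^\varepsilon of each other on every event S.
X \approx_\varepsilon Y \;\iff\;
\forall S:\;
\Pr[X \in S] \le e^{\varepsilon}\Pr[Y \in S]
\;\wedge\;
\Pr[Y \in S] \le e^{\varepsilon}\Pr[X \in S]
```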
Privacy Definitions

• k-anonymity
  – Good: Simple; efficient; practical
  – Bad: Weak privacy protection; known attacks
• Differential privacy
  – Good: Strong privacy protection; lots of mechanisms
  – Bad: Have to add noise. Efficient? Practical?
• Zero-knowledge privacy
  – Good: Even stronger privacy protection; lots of mechanisms
  – Bad: Have to add even more noise. Efficient? Practical?
Practical Sanitization?

• Differential privacy and zero-knowledge privacy
  – Mechanism needs to be randomized: noise is added to the exact answer/output (sometimes quite a lot!)
• In practice
  – Don't want to add (much) noise
  – Want simple and efficient sanitization mechanisms
• Problem: Is there a practical way of sanitizing data while ensuring privacy and good utility?
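For context, a minimal sketch of the kind of noise addition differential privacy requires: the standard Laplace mechanism for a counting query. The function name and parameters here are illustrative, not from the slides.

```python
import numpy as np

def laplace_count(data, predicate, epsilon):
    """Epsilon-differentially private count: a counting query has
    sensitivity 1, so Laplace noise with scale 1/epsilon suffices."""
    true_count = sum(1 for row in data if predicate(row))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# E.g., with epsilon = 0.1 the noise has standard deviation ~14 --
# the utility cost the slides are pointing at.
```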
Privacy from Random Sampling

• In practice, data is often collected via random sampling from some population (e.g., surveys)

[Figure: a population (Alice, Bob, Jack, ..., Jane) is randomly sampled; the sample is given to San]

• Already known: If San is differentially private, then the random sampling step amplifies the privacy of San [KLNRS08]
• Can we use a qualitatively weaker privacy def. for San and still have the combined process satisfy a strong notion of privacy?
Leveraging Random Sampling

• Goal: Provide a privacy definition such that if San satisfies the privacy definition, then:

  Random Sampling + San ⇒ Differential privacy or zero-knowledge privacy

• Should be weaker than differential privacy ⇒ Better utility!
• Should be meaningful by itself (without random sampling)
  – Strong fall-back guarantee if the random sampling is corrupted or completely leaked
k-Anonymity Revisited

• k-anonymity: Each record in the released data table is indistinguishable from k-1 other records w.r.t. certain identifying attributes
• Based on the notion of "blending in a crowd"
• Simple and practical
• Problem: The definition restricts the output, not the mechanism that generates it
  – Leads to practical attacks on k-anonymity
k-Anonymity Revisited

• A simple example illustrating the problem (see the sketch below):
  – Use any existing algorithm to generate a data table satisfying k-anonymity
  – At the end of each row, attach the personal data of some fixed individual from the original database
• The output satisfies k-anonymity but reveals personal data about some individual!
• There are plenty of other examples!
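A toy Python illustration of this attack. The anonymizer here is a deliberately crude stand-in; any off-the-shelf k-anonymization algorithm would expose the same loophole.

```python
def trivial_k_anonymizer(table, k):
    """Stand-in anonymizer: fully generalize every attribute.
    (Real k-anonymizers generalize far less aggressively.)"""
    return [("*",) * len(row) for row in table]

def malicious_sanitizer(table, k, victim_row):
    """The released table still satisfies k-anonymity w.r.t. the
    identifying attributes, yet leaks one record verbatim."""
    anon = trivial_k_anonymizer(table, k)
    # Attach the victim's full record to every row: the rows remain
    # mutually indistinguishable on the quasi-identifiers, so the
    # syntactic definition holds -- but privacy is destroyed.
    return [row + tuple(victim_row) for row in anon]

# Example:
# table = [("Alice", 34, "flu"), ("Bob", 35, "cold"), ("Jane", 34, "flu")]
# malicious_sanitizer(table, k=2, victim_row=table[0])
```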
Towards a New Privacy Definition

• k-anonymity does not impose restrictions on the mechanism
  – Does not properly capture "blending in a crowd"
• One of the key insights of differential privacy: Privacy should be a property of the mechanism!
• We want a privacy definition that imposes restrictions on the mechanism and properly captures "blending in a crowd"
Our Main Results

• We provide a new privacy definition called crowd-blending privacy
• We construct simple and practical mechanisms for releasing histograms and synthetic data points
• We show:

  Crowd-blending privacy + Random Sampling ⇒ Zero-knowledge privacy
Blending in a Crowd

• Two individuals (with data values) t and t' are ε-indistinguishable by San if San(D, t) ≈_ε San(D, t') ∀ D
• Differential privacy: Every individual t in the universe is ε-indistinguishable by San from every other individual t' in the universe
  – In any database D, each individual in D is ε-indistinguishable by San from every other individual in D
Blending in a Crowd

• First attempt at a privacy definition: ∀ D of size ≥ k, each individual in D is ε-indistinguishable by San from at least k-1 other individuals in D
  – Collapses back down to differential privacy: If DP doesn't hold, then ∃ t and t' s.t. San can ε-distinguish t and t'; now consider the database D = (t, t', t', ..., t')
• Solution: D can have "outliers", but we require San to essentially delete/ignore them
Crowd-Blending Privacy

• Definition: San is (k, ε)-crowd-blending private if ∀ D and ∀ t in D, either
  – t is ε-indistinguishable from ≥ k individuals in D, or
  – t is essentially ignored: San(D) ≈_ε San(D \ {t})
• Weaker than differential privacy ⇒ Better utility!
• Meant to be used in conjunction with random sampling, but still meaningful by itself
Privately Releasing Histograms

• A (k, 0)-crowd-blending private mechanism for releasing a histogram:
  – Compute the histogram
  – For bins with counts < k, suppress the count to 0

[Figure: original histogram vs. suppressed histogram; bins with counts below k are zeroed out]

• Simple and similar to what is done in practice! (Not differentially private)
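A minimal sketch of this mechanism; the binning function and input format are illustrative choices, not specified on the slide.

```python
from collections import Counter

def suppressed_histogram(data, bin_of, k):
    """(k, 0)-crowd-blending private histogram release: any individual
    in a bin with count >= k blends with the others in that bin
    (swapping two of them leaves the histogram unchanged), while an
    individual in a bin with count < k is essentially ignored, since
    that bin reads 0 whether or not the individual is present."""
    counts = Counter(bin_of(x) for x in data)
    return {b: (c if c >= k else 0) for b, c in counts.items()}

# Example: ages binned by decade, suppressing bins with fewer than 5 people.
# suppressed_histogram(ages, bin_of=lambda age: age // 10, k=5)
```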
Privately Releasing Synthetic Data Points

• Impossible to efficiently and privately release synthetic data points for answering general classes of counting queries [DNRRV09, UV11]
• We focus on answering smooth query functions

[Figure: a (k, ε)-crowd-blending private mechanism that adds noise to the data points; an outlier point is ignored]

• The above CBP mechanism: Useful for answering all smooth query functions with decent accuracy
  – Not possible with differentially private synthetic data points
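The slide only gestures at the mechanism (noise the points, ignore the outlier). The following is a hypothetical sketch in that spirit: the neighborhood radius r, the noise distribution, and the suppression rule are all invented here for illustration and differ from the paper's actual construction.

```python
import numpy as np

def synthetic_points_sketch(points, k, r, noise_scale):
    """Hypothetical crowd-blending-style release of data points:
    points with at least k points within radius r are released with
    additive noise; isolated outliers are dropped entirely."""
    points = np.asarray(points, dtype=float)
    released = []
    for p in points:
        # Count how many points (including p itself) lie within radius r.
        crowd = np.sum(np.linalg.norm(points - p, axis=1) <= r)
        if crowd >= k:
            released.append(p + np.random.laplace(scale=noise_scale,
                                                  size=p.shape))
        # else: p is an outlier and is essentially ignored.
    return np.array(released)
```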
Our Main Theorem

[Figure: each individual in the population (Alice, Bob, Jack, ..., Jane) is sampled with probability p; the sample is given to a (k, ε)-crowd-blending private mechanism San]

• Theorem (Informal): The combined process satisfies zero-knowledge privacy, and thus differential privacy as well
• Our theorem holds even if the random sampling is slightly biased, as follows:
  – Most individuals are sampled w.p. ≈ p
  – The remaining individuals are sampled with arbitrary probability
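An end-to-end sketch of the combined process the theorem analyzes, reusing the histogram mechanism sketched earlier; the sampling probability p is the theorem's parameter, everything else is illustrative.

```python
import random

def sample_then_sanitize(population, p, k, bin_of):
    """Pre-sample each individual independently with probability p, then
    apply a (k, 0)-crowd-blending private mechanism to the sample. By the
    main theorem, the combined process satisfies zero-knowledge privacy
    (and hence differential privacy)."""
    sample = [x for x in population if random.random() < p]
    return suppressed_histogram(sample, bin_of, k)  # defined in the earlier sketch
```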
Thank you!