The Privacy of Secured Computations
Adam Smith, Penn State
Crypto & Big Data Workshop, December 15, 2015
[Cartoon, caption: “Relax – it can only see metadata.”]
Big Data
Every <length of time> your <household object> generates <metric scale modifier> bytes of data about you
• Everyone handles sensitive data
• Everyone delegates sensitive computations
Secured computations
• Modern crypto offers powerful tools, from zero-knowledge proofs to program obfuscation
• Broadly: specify outputs to reveal … and outputs to keep secret
  Reveal only what is necessary
• Bright lines
  E.g., psychiatrist and patient
• Which computations should we secure?
  Consider the average salary in a department before and after professor X resigns
Today: settings where we must release some data at the expense of others
Which computations should we secure?
• This is a social decision. True, but…
• The technical community can offer tools to reason about the security of secured computations
• This talk: privacy in statistical databases
• Where else can technical insights be valuable?
Privacy in Statistical Databases
[Diagram: individuals contribute data to a “curator” A, which answers queries from users (government, researchers, businesses) or from a malicious adversary]
Large collections of personal information:
• census data
• national security data
• medical/public health data
• social networks
• recommendation systems
• trace data: search records, etc.
Privacy in Statistical Databases
• Two conflicting goals
  Utility: users can extract “aggregate” statistics
  “Privacy”: individual information stays hidden
• How can we define these precisely?
Variations on this model are studied in
• Statistics (“statistical disclosure control”)
• Data mining / databases (“privacy-preserving data mining”)
Recently: rigorous foundations & analysis
Privacy in Statistical Databases
• Why is this challenging? A partial taxonomy of attacks
• Differential privacy: “aggregate” as insensitive to individual changes
• Connections to other areas
External Information
[Diagram: the same individuals / server-agency / users picture, where users also draw on the Internet, social networks, and other anonymized data sets]
• Users have external information sources
  Can’t assume we know the sources
  Anonymous data (often) isn’t.
A partial taxonomy of attacks
• Reidentification attacks
  Based on external sources or other releases
• Reconstruction attacks
  “Too many, too accurate” statistics allow data reconstruction
• Membership tests
  Determine whether a specific person is in the data set (when you already know much about them)
• Correlation attacks
  Learn about me by learning about the population
Reidentification attack example [Narayanan, Shmatikov 2008]
[Diagram: anonymized Netflix data for Alice, Bob, Charlie, Danielle, Erica, and Frank is linked with public, incomplete IMDb data to produce identified Netflix data]
On average, four movies uniquely identify a user.
Image credit: Arvind Narayanan
Other reidentification attacks
• … based on external sources, e.g.
  social networks, computer networks, microtargeted advertising, recommendation systems, genetic data [Yaniv’s talk]
• … based on composition attacks
  Combining independent anonymized releases
[Citations omitted]
Is the problem granularity?
• Examples so far: releasing individual information
  What if we release only “aggregate” information?
• Defining “aggregate” is delicate
  E.g., a support vector machine’s output reveals individual data points (see the sketch below)
• Statistics may together encode the data
  Reconstruction attacks: too many, “too accurate” stats ⇒ reconstruct the data
  Robust even to fairly significant noise
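To make the SVM bullet concrete, here is a minimal sketch of my own (not from the talk), assuming scikit-learn and NumPy are installed: the support vectors stored inside a trained model are verbatim rows of the training set, so releasing the model releases those individuals’ records.

```python
# Minimal sketch (assumes scikit-learn and numpy): a released SVM model can
# expose exact training records, because its support vectors are literal
# rows of the training data.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(100, 5)).astype(float)  # 100 people, 5 binary attributes
y = (X[:, 0] + X[:, 1] > 1).astype(int)              # some label to learn

model = SVC(kernel="linear").fit(X, y)

# Every support vector is an exact copy of some training row.
for sv in model.support_vectors_:
    assert any(np.array_equal(sv, row) for row in X)
print(len(model.support_vectors_), "training records are embedded in the model")
```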
Reconstruction Attack Example [Dinur, Nissim ’03]
• Data set: n people, each with d “public” attributes and 1 “sensitive” attribute
[Diagram: the release is computed from the data; reconstruction recovers y’ ≈ y, the sensitive column]
• Suppose the release reveals correlations between attributes
  Assume one can learn ⟨a_j, y⟩ + error, where a_j is the j-th public attribute column and y is the sensitive column
  If error = o(√n), the a_j are uniformly random, and d > 4n, then one can reconstruct n − o(n) entries of y (sketched below)
• Too many, “too accurate” stats ⇒ reconstruct data
  Cannot release everything everyone would want to know
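As an illustration, here is my own sketch of the standard LP-decoding version of this attack (not code from the talk; the data sizes and noise level are illustrative choices, and NumPy and SciPy are assumed): the attacker sees noisy answers to roughly 4n random subset-sum queries about the hidden 0/1 column y and recovers most of it by linear programming.

```python
# Sketch of a [Dinur-Nissim '03]-style reconstruction attack via linear
# programming: given noisy answers to random subset-sum queries over a secret
# 0/1 column y, recover most of y. Assumes numpy and scipy are installed.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
n = 50                      # number of people
d = 4 * n                   # number of released statistics (queries)
y = rng.integers(0, 2, n)   # secret "sensitive" column

A = rng.integers(0, 2, size=(d, n))   # random 0/1 query vectors a_j
noise = rng.uniform(-2, 2, size=d)    # per-answer error, well below sqrt(n)
answers = A @ y + noise               # what the curator releases

# LP: find y_hat in [0,1]^n and slacks t >= |A y_hat - answers|, minimizing sum(t).
c = np.concatenate([np.zeros(n), np.ones(d)])
A_ub = np.block([[A, -np.eye(d)], [-A, -np.eye(d)]])
b_ub = np.concatenate([answers, -answers])
bounds = [(0, 1)] * n + [(0, None)] * d
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")

y_hat = (res.x[:n] > 0.5).astype(int)   # round the relaxed solution
print("fraction of entries recovered:", np.mean(y_hat == y))
```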
Reconstruction attacks as linear encoding [DMT ’07, …]
• Data set: n people, each with d “public” attributes and 1 “sensitive” attribute
[Diagram: reconstruction from the release yields y’ ≈ y]
• Idea: view the released statistics as a noisy linear encoding M·y + e of the sensitive column y, where the rows of M are built from the public attributes a_i
• Reconstruction depends on the geometry of the matrix M (illustrated below)
  Mathematics related to “compressed sensing”
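To illustrate the “geometry of M” point, here is a small sketch of my own (not from the cited papers; NumPy assumed): least-squares decoding of M·y + e, where the recovery error is bounded by ‖e‖ divided by the smallest singular value of M.

```python
# Small illustration: statistics as a noisy linear encoding M @ y + e, with
# recovery error controlled by the geometry of M (its smallest singular value).
import numpy as np

rng = np.random.default_rng(2)
n, d = 100, 400
y = rng.integers(0, 2, n)                            # secret 0/1 column
M = rng.integers(0, 2, size=(d, n)).astype(float)    # encoding matrix from released statistics
e = rng.normal(0, 1.0, size=d)                       # noise added to each statistic
release = M @ y + e

y_ls, *_ = np.linalg.lstsq(M, release, rcond=None)   # least-squares decoding
y_hat = (y_ls > 0.5).astype(int)

sigma_min = np.linalg.svd(M, compute_uv=False).min()
print("smallest singular value of M:", round(sigma_min, 2))
print("||y_ls - y|| <= ||e|| / sigma_min ?",
      np.linalg.norm(y_ls - y) <= np.linalg.norm(e) / sigma_min)
print("fraction recovered after rounding:", np.mean(y_hat == y))
```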
Membership Test Attacks
• [Homer et al. 2008] Exact high-dimensional summaries allow an attacker with knowledge of the population to test membership in a data set
• Membership is sensitive
  Not specific to genetic data (no-fly list, census data, …)
  Learn much more if statistics are provided by subpopulation
• Recently:
  Strengthened membership tests [Dwork, S., Steinke, Ullman, Vadhan ’15]
  Tests based on learned face-recognition parameters [Fredrikson et al. ’15]
Membership tests from marginals
• X: a set of n binary vectors drawn from a distribution P over {0,1}^d
• q_X = x̄ ∈ [0,1]^d: the proportion of 1’s for each attribute
• z ∈ {0,1}^d: Alice’s data
• Eve wants to know if Alice is in X.
Eve knows:
  q_X = x̄
  z: either a row of X or a fresh draw from P
  Y: n fresh samples from P
Example:
  X =
    0 1 1 0 1 0 0 0 1
    0 1 0 1 0 1 0 0 1
    1 0 1 1 1 1 0 1 0
    1 1 0 0 1 0 1 0 0
  q_X = ½ ¾ ½ ½ ¾ ½ ¼ ¼ ½
  z   = 1 0 1 1 1 1 0 1 0
• [Sankararaman et al. ’09] Eve reliably guesses whether z ∈ X when d > c·n
Strengthened membership tests [DSSUV ’15]
• X: a set of n binary vectors drawn from a distribution P over {0,1}^d
• q_X ± α: approximate proportions
• z ∈ {0,1}^d: Alice’s data
• Eve wants to know if Alice is in X.
Eve knows:
  q_X ± α (approximate marginals)
  z: either a row of X or a fresh draw from P
  Y: m fresh samples from P
Example:
  X =
    0 1 1 0 1 0 0 0 1
    0 1 0 1 0 1 0 0 1
    1 0 1 1 1 1 0 1 0
    1 1 0 0 1 0 1 0 0
  q_X ≈ ½ ¾ ½ ½ ¾ ½ ¼ ¼ ½
  z   = 1 0 1 1 1 1 0 1 0
• [DSSUV ’15] Eve reliably guesses whether z ∈ X when d > c′·(n + α²n² + n²/m) (a simulation sketch follows)
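Here is a small simulation of my own of an inner-product membership test in the spirit of [DSSUV ’15]; the exact statistic and threshold calibration are my simplifications, not necessarily the paper’s, and NumPy is assumed. The parameters n = 100, m = 200, d = 5,000 match the experiment on the next slide, and the released proportions are rounded to the nearest 0.1.

```python
# Sketch of an inner-product membership test: Eve sees perturbed attribute
# proportions q of a data set X, holds a reference sample Y from the same
# population, and tests whether a target z was in X.
import numpy as np

rng = np.random.default_rng(4)
n, m, d = 100, 200, 5000                      # data set size, reference size, #attributes

p = rng.uniform(0.05, 0.95, d)                # population attribute frequencies
X = (rng.random((n, d)) < p).astype(float)    # the private data set
Y = (rng.random((m, d)) < p).astype(float)    # Eve's reference sample

q = np.round(X.mean(axis=0), 1)               # released statistics, rounded to nearest 0.1

def ip_statistic(z, q, Y):
    """Correlate (z - reference mean) with (q - reference mean); large => likely a member."""
    y_bar = Y.mean(axis=0)
    return float((z - y_bar) @ (q - y_bar))

# Calibrate a threshold from fresh population samples (non-members share a common baseline).
null_scores = [ip_statistic((rng.random(d) < p).astype(float), q, Y) for _ in range(200)]
threshold = np.quantile(null_scores, 0.95)

member_score = ip_statistic(X[0], q, Y)                                  # someone in the data set
outsider_score = ip_statistic((rng.random(d) < p).astype(float), q, Y)   # someone who is not

print("member   score %7.1f  flagged as in X? %s" % (member_score, member_score > threshold))
print("outsider score %7.1f  flagged as in X? %s" % (outsider_score, outsider_score > threshold))
```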
Robustness to perturbation
• n = 100, m = 200, d = 5,000
• Two tests: LR [Sankararaman et al. ’09] and IP [DSSUV ’15]
• Two publication mechanisms: statistics rounded to the nearest multiple of 0.1, and exact statistics
[Figure: ROC curves (true positive rate vs. false positive rate) for each test under each publication mechanism]
Conclusion: the IP test is robust; calibrating the LR test seems difficult
“Correlation” attacks
• Suppose you know that I smoke, and…
  a public health study tells you that I am at risk for cancer
  you decide not to hire me
• Learn about me by learning about the underlying population
  It does not matter which data were used in the study
  Any representative data for the population will do
• Widely studied
  De Finetti attacks [Kifer ’09]
  Model inversion [Fredrikson et al. ’15] *
  Many others
• Correlation attacks are fundamentally different from the others
  They do not rely on (or imply) individual data
  Provably impossible to prevent **
* “Model inversion” is used in a few different ways in [Fredrikson et al.]
** Details later.
A partial taxonomy of attacks
• Reidentification attacks
  Based on external sources or other releases
• Reconstruction attacks
  “Too many, too accurate” statistics allow data reconstruction
• Membership tests
  Determine whether a specific person is in the data set (when you already know much about them)
• Correlation attacks
  Learn about me by learning about the population
Privacy in Statistical Databases
• Why is this challenging? A partial taxonomy of attacks
• Differential privacy
  “Aggregate” ≈ stability to small changes in the input
  Handles arbitrary external information
  Rich algorithmic and statistical theory
• Connections to other areas
Differential Privacy [Dwork, McSherry, Nissim, S. 2006]
• Intuition: changes to my data not noticeable by users
  Output is “independent” of my data
Differential Privacy [Dwork, McSherry, Nissim, S. 2006]
[Diagram: algorithm A, with its own local random coins, maps data set x to output A(x)]
• Data set x
  The domain D can be numbers, categories, tax forms
  Think of x as fixed (not random)
• A = randomized procedure
  A(x) is a random variable
  Randomness might come from adding noise, resampling, etc.
Differential Privacy [Dwork, McSherry, Nissim, S. 2006]
[Diagram: A run on x and on x’, each with its own local random coins, producing A(x) and A(x’)]
• A thought experiment
  Change one person’s data (or remove them)
  Will the distribution on outputs change much?
Differential Privacy [Dwork, McSherry, Nissim, S. 2006]
[Diagram: A run on x and on a neighbor x’, each with its own local random coins, producing A(x) and A(x’)]
• x’ is a neighbor of x if they differ in one data point
• Neighboring databases should induce close distributions on outputs
Definition: A is ε-differentially private if, for all neighbors x, x’ and all subsets S of outputs,
  Pr[A(x) ∈ S] ≤ e^ε · Pr[A(x’) ∈ S]
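A minimal sketch of a mechanism satisfying this definition (a standard textbook example, not spelled out on the slide; NumPy assumed): a counting query answered with Laplace noise of scale 1/ε. Changing one person’s record changes the true count by at most 1, which is exactly what keeps the two output distributions within a factor of e^ε.

```python
# Minimal sketch of an ε-differentially private release: a counting query with
# Laplace noise of scale 1/ε. Since one person changes the count by at most 1,
# Pr[A(x) in S] <= exp(ε) * Pr[A(x') in S] for neighboring x, x' and all S.
import numpy as np

def dp_count(data, predicate, epsilon, rng):
    """Noisy count of the records satisfying `predicate`."""
    true_count = sum(1 for record in data if predicate(record))
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

rng = np.random.default_rng(3)
x  = [{"smoker": True}, {"smoker": False}, {"smoker": True}]
x2 = x[:-1]                          # neighboring data set: one person removed

eps = 0.5
print("A(x)  =", dp_count(x,  lambda r: r["smoker"], eps, rng))
print("A(x') =", dp_count(x2, lambda r: r["smoker"], eps, rng))
```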