Do Computer Science Definitions of Privacy Satisfy Legal Definitions of Privacy? The Case of FERPA and Differential Privacy
Kobbi Nissim, Ben-Gurion University; Center for Research on Computation and Society, Harvard University
Privacy Enhancing Technologies for Biometric Data, Haifa University, Jan 17, 2016
This Work: a Collaboration
Product of a working group (meeting since Nov 2014). Contributing to this project:
• Center for Research on Computation and Society (CRCS): Kobbi Nissim, Aaron Bembenek, Mark Bun, Marco Gaboardi, Thomas Steinke, and Salil Vadhan
• Berkman Center for Internet & Society: David O'Brien, Alexandra Wood, and Urs Gasser
Privacy Tools for Sharing Research Data • Goal: help social scientists share privacy-sensitive research data via a collection of technological and legal tools • A problem: privacy protection techniques repeatedly shown to fail to provide reasonable privacy
Data Privacy
• Studied (at least) since the '60s
• Approaches: de-identification, redaction, auditing, noise addition, synthetic datasets, …
• Focus on how to provide privacy, not on what privacy protection is; may have been suitable for the pre-internet era
• Re-identification [Sweeney '00, …]: GIS data, health data, clinical trial data, DNA, pharmacy data, text data, registry information, …
• Blatant non-privacy [Dinur, Nissim '03], …
• Auditors [Kenthapadi, Mishra, Nissim '05]
• AOL debacle '06
• Genome-wide association studies (GWAS) [Homer et al. '08]
• Netflix Prize [Narayanan, Shmatikov '09]; Netflix canceled the second contest
• Social networks [Backstrom, Dwork, Kleinberg '11]
• Genetic research studies [Gymrek, McGuire, Golan, Halperin, Erlich '11]
• Microtargeted advertising [Korolova '11]
• Recommendation systems [Calandrino, Kilzer, Narayanan, Felten, Shmatikov '11]
• Israeli CBS [Mukatren, Nissim, Salman, Tromer '14]
• Attacks on statistical aggregates [Homer et al. '08], [Dwork, Smith, Steinke, Vadhan '15]
• …
Slide idea stolen shamelessly from Or Sheffet
Privacy Tools for Sharing Research Data • Goal: help social scientists share privacy-sensitive research data via a collection of technological and legal tools • A problem: privacy protection techniques repeatedly shown to fail to provide reasonable privacy • Differential privacy [Dwork, McSherry, N, Smith 2006] • A formal mathematical privacy concept • Addresses weaknesses of traditional schemes (and more) • Has a rich theory, in first steps of implementation and testing
The Protagonists
• Differential Privacy: a mathematical definition of privacy. $M : X^n \to T$ satisfies $\varepsilon$-differential privacy if $\forall x, x' \in X^n$ s.t. $\mathrm{dist}_H(x, x') = 1$ and $\forall S \subseteq T$: $\Pr[M(x) \in S] \le e^{\varepsilon} \Pr[M(x') \in S]$.
• FERPA (the Family Educational Rights and Privacy Act): a legal standard of privacy.
A use case: The Privacy Tools for Sharing Research Data Project * http://privacytools.seas.harvard.edu/
A use-case scenario: Alice has cool MOOC data containing student info protected by FERPA; she deposits it in the Dataverse Network. Bob, a researcher who may find it useful, wonders: "Should I apply for access? IRB policies, terms of use… is it worth the trouble?" He asks for Alice's data and is told: "Restricted!" The Privacy Tools instead offer access to Alice's data with differential privacy. Question: does DP satisfy FERPA? (http://dataverse.org/, http://privacytools.seas.harvard.edu/)
Short digression: Motivating Differential Privacy
Just before this talk… an interesting discussion: "How many Justin Bieber fans attend the workshop?" "Highly sensitive personal info; how can this be done?" A trusted party: "I will do the survey… I will only publish the result… and immediately forget the data!" "Great!" "Hooray!" "Yay!"
3 #JustinBieber fans attend #Haifa-privacy-workshop
A few minutes later, I come in: "A survey! How many JB fans attend the workshop? What are you doing?" The trusted party: "I will do the survey… publish the result… and forget the data!" Me: "Me too!"
3 #JustinBieber fans attend #Haifa-privacy-workshop. "The tweet hides my info: only a 4/100 chance that each attendee is a JB fan." Then (after @Kobbi joins): 4 #JustinBieber fans attend #Haifa-privacy-workshop. "Aha! Kobbi is a JB fan!"
Composition
• Differencing attack: how is my privacy affected when an attacker sees an analysis from before and after I join/leave?
• More generally, composition: how is my privacy affected when an attacker combines results from two or more privacy-preserving analyses?
• Fundamental law of information: the more information we extract from our data, the more is learned about individuals!
• So, privacy will deteriorate as we use our data more and more
• The best we can ask for: deterioration that is quantifiable and controllable, not abrupt
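A minimal sketch of these two points (hypothetical data and parameter values, not code from the slides or the project): with exact counts, a before/after comparison reveals one person's bit, while the basic composition theorem quantifies how the privacy parameter accumulates across releases.

```python
# Hypothetical attendee data: 1 = Justin Bieber fan, 0 = not a fan.
before = [1, 0, 1, 0, 1]      # trusted party tweets: 3 fans
after = before + [1]          # Kobbi joins; the tweet becomes: 4 fans

# Differencing attack: with exact counts, the two releases differ by
# exactly Kobbi's bit, so his value is fully revealed.
kobbis_bit = sum(after) - sum(before)
print("Kobbi is a JB fan:", bool(kobbis_bit))

# Basic composition for differential privacy: if release i is
# eps_i-differentially private, then publishing all of them together is
# (sum of eps_i)-differentially private -- privacy degrades gradually
# and quantifiably rather than abruptly.
def total_epsilon(epsilons):
    return sum(epsilons)

print("Privacy budget spent:", total_epsilon([0.1, 0.1, 0.5]))  # 0.7
```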
The Protagonists: Differential Privacy
My Privacy Desiderata
Real world: Kobbi's data is part of the data; an analysis (computation) on it produces an outcome.
My ideal world: the same analysis runs on the data with my info removed.
Desideratum: same outcome in both worlds.
Things to Note
• In this talk, we only consider the outcome of analyses
  • Security flaws, hacking, implementation errors, … are very important but very different questions
• My privacy desiderata would hide whether I'm a JB fan!
  • Resilient to differencing attacks
• Does not mean I'm fully protected
  • I'm only protected to the extent I'm protected in my ideal world
  • Some harm could happen to me even in my ideal world: Bob smokes in public; a study teaches that smoking causes cancer; Bob's health insurer raises his premium. Bob is harmed even if he does not participate in the study!
Our Privacy Desiderata
The outcome should ignore Kobbi's info… and Gertrude's! and Mark's! … and everybody's!
Real world: the analysis (computation) on the data produces an outcome.
Each person's ideal world: the same analysis runs on the data with that person's info removed.
Desideratum: same outcome in every such ideal world.
A Realistic Privacy Desideratum
Real world: the analysis (computation) on the data produces an outcome.
Each person's ideal world: the same analysis runs on the data with that person's info removed, producing an ε-"similar" outcome.
Differential Privacy [Dwork McSherry N Smith 06]
Real world: the analysis (computation) on the data produces an outcome.
A person's ideal world: the same analysis runs on the data with that person's info removed; the two outcome distributions are ε-"similar".
*See also: Differential Privacy: An Introduction for Social Scientists.
Why Differential Privacy? • DP: Strong, quantifiable, composable mathematical privacy guarantee • Provably resilient to known and unknown attack modes! • Natural interpretation: I am protected (almost) to the extent I’m protected in my privacy-ideal scenario • Theoretically, DP enables many computations with personal data while preserving personal privacy • Practicality in first stages of validation
Differential Privacy [Dwork McSherry N Smith 06]
$M : X^n \to T$ satisfies $\varepsilon$-differential privacy if $\forall x, x' \in X^n$ s.t. $\mathrm{dist}_H(x, x') = 1$ and $\forall S \subseteq T$:
$$\Pr[M(x) \in S] \le e^{\varepsilon} \cdot \Pr[M(x') \in S].$$
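The deck does not work out a concrete instance of the definition; as an illustration (a standard example from the DP literature, not taken from the slides), randomized response applied independently to each record satisfies the definition with ε = ln 3:

```latex
\begin{align*}
&\text{Randomized response, applied independently to each bit } x_i \in \{0,1\}:\\
&\qquad \Pr[M(x_i) = x_i] = \tfrac{3}{4}, \qquad \Pr[M(x_i) = 1 - x_i] = \tfrac{1}{4}.\\
&\text{For neighboring inputs } x_i = 1,\ x_i' = 0 \text{ and output set } S = \{1\}:\\
&\qquad \frac{\Pr[M(x_i) \in S]}{\Pr[M(x_i') \in S]} = \frac{3/4}{1/4} = 3 = e^{\ln 3},
  \qquad \text{and the ratio is } \tfrac{1}{3} \le 3 \text{ for } S = \{0\}.\\
&\text{Hence randomized response satisfies } \varepsilon\text{-differential privacy with } \varepsilon = \ln 3.
\end{align*}
```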
It ’ s Real!
How is Differential Privacy Achieved?
• Careful addition of random noise into the computation: Randomized Response [W65], framework of global sensitivity [DMNS06], framework of smooth sensitivity [NRS07], sample and aggregate [NRS07], exponential mechanism [MT07], propose-test-release [DL09], sparse vector technique [DNRRV09], private multiplicative weights [HR10], matrix mechanism [LHRMM10], choosing mechanism [BNS13], large margin mechanism [CHS14], dual query mechanism [GGHRW14], …
• Differentially private algorithms exist for many tasks: statistics, machine learning, private data release, …
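A minimal sketch of the global-sensitivity approach [DMNS06] for a single counting query (hypothetical data and parameter choices, not code from the project): a count changes by at most 1 when one record changes, so Laplace noise with scale 1/ε yields an ε-differentially private release.

```python
import random

def laplace_noise(scale):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def dp_count(data, predicate, epsilon):
    """epsilon-DP count of the records satisfying `predicate`.

    A counting query has global sensitivity 1 (it changes by at most 1
    when a single record is added, removed, or changed), so Laplace
    noise with scale 1/epsilon suffices for epsilon-differential privacy.
    """
    true_count = sum(1 for record in data if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon)

# Hypothetical usage: a noisy count of Justin Bieber fans at the workshop.
attendees = [{"jb_fan": True}, {"jb_fan": False}, {"jb_fan": True}, {"jb_fan": True}]
print(dp_count(attendees, lambda r: r["jb_fan"], epsilon=0.5))
```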
Some Other Efforts to Bring DP to Practice
• Microsoft Research "PINQ"
• CMU-Cornell-PennState "Integrating Statistical and Computational Approaches to Privacy" (see http://onthemap.ces.census.gov/)
• UCSD "Integrating Data for Analysis, Anonymization, and Sharing" (iDASH)
• UT Austin "Airavat: Security & Privacy for MapReduce"
• UPenn "Putting Differential Privacy to Work"
• Stanford-Berkeley-Microsoft "Towards Practicing Privacy"
• Duke-NISS "Triangle Census Research Network"
• MIT/CSAIL/ALFA "MoocDB: Privacy Tools for Sharing MOOC Data"
• …
The Protagonists: FERPA