  1. www.privitar.com Engineering Privacy London, UK

  2. WHY ARE ORGANISATIONS SLOW TO ADOPT PETS? Differential Privacy as a case study Theresa Stadler, SuRI at EPFL 2018

  3. “What are you talking about? Everybody wants data privacy as fast as possible!” INTRO More and more organisations show their commitment to protecting user privacy by adopting privacy-enhancing technologies. Google, …

  4. “What are you talking about? Everybody wants data privacy as fast as possible!” INTRO More and more organisations show their commitment to protecting user privacy by adopting privacy-enhancing technologies. Google, US Census Bureau, …

  5. “What are you talking about? Everybody wants data privacy as fast as possible!” INTRO More and more organisations show their commitment to protecting user privacy by adopting privacy-enhancing technologies. Google, US Census Bureau, Apple, …

  6. “What are you talking about? Everybody wants data privacy as fast as possible!” INTRO But some are struggling to get it right

  7. “What are you talking about? Everybody wants data privacy as fast as possible!” INTRO But some are struggling to get it right… And these are just the ones who tried to use their data. Many other organisations would like to use their data (for good) but do not know what they can do, what they should do, or which technologies are the right ones to use. Instead, they either lock down their data or rely on laborious manual access controls and human monitoring, which slows down innovation.

  8. “What are you talking about? Everybody wants data privacy as fast as possible!” INTRO Despite the current push for stronger privacy regulations and increased customer awareness of data privacy, many organisations are slower to adopt PETs than one would expect. Why?

  9. PART I “What are the hard questions that need solving for PETs to become easier to adopt?”

  10. MOTIVATION “But academia already offers solutions such as Differential Privacy.” Industry need – Safely release aggregate statistics. Privacy-enhancing technology – Differential Privacy.

      Table A: School population: primary, secondary and all pupils: Schools in England, 2006–2017

      Year | State-funded primary schools | State-funded secondary schools | All school types (incl. independent schools)
      2006 | 4,150,595 | 3,347,500 | 8,231,055
      2007 | 4,110,750 | 3,325,625 | 8,167,715
      2008 | 4,090,400 | 3,294,575 | 8,121,955
      2009 | 4,077,350 | 3,278,130 | 8,092,280
      2010 | 4,096,580 | 3,278,485 | 8,098,360
      2011 | 4,137,755 | 3,262,635 | 8,123,865
      2012 | 4,217,000 | 3,234,875 | 8,178,200
      2013 | 4,309,580 | 3,210,120 | 8,249,810
      2014 | 4,416,710 | 3,181,360 | 8,331,385
      2015 | 4,510,310 | 3,184,730 | 8,438,145
      2016 | 4,615,170 | 3,193,420 | 8,559,540
      2017 | 4,689,660 | 3,223,090 | 8,669,085
      Source: school census

      Sources: Founders4Schools, LinkedIn Salary, SFR28/2017; Dwork and Roth, 2014

  11. DIFFERENTIAL PRIVACY How to protect against privacy risks in aggregate statistics

      Vote   | Count | Noisy count (= count + noise)
      Remain | 961   | 962, 896, 2107
      Leave  | 446   | 440, 447, 348

      • Enables bounding the information leakage about individuals
      • Allows inference about groups
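      The noise in the counts above is characteristic of the Laplace mechanism, the standard way to release differentially private counts. Below is a minimal sketch of that idea, assuming a counting query (sensitivity 1) and an illustrative ε; the function name and parameters are mine, not from the talk.

```python
import numpy as np

def noisy_count(true_count: int, epsilon: float, rng=None) -> float:
    """Laplace mechanism for a counting query.

    A count changes by at most 1 when one person is added or removed,
    so the sensitivity is 1 and the noise scale is 1/epsilon.
    """
    if rng is None:
        rng = np.random.default_rng()
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

votes = {"Remain": 961, "Leave": 446}
for choice, count in votes.items():
    print(f"{choice}: true = {count}, noisy = {noisy_count(count, epsilon=0.5):.1f}")
```

      Smaller ε means more noise and stronger privacy; re-running a mechanism like this with different noise scales or seeds gives spreads of noisy counts roughly like the three draws shown above.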

  12. MOTIVATION “But academia already offers solutions such as Differential Privacy.” Industry need (for real) – Safely release aggregate statistics multiple times, about several related entities, where data is aggregated from a relational database, with high accuracy. Privacy-enhancing technology – Differential Privacy. The slide overlays excerpts from papers that address each requirement:

      Joseph, Roth, Ullman and Waggoner, “Local Differential Privacy for Evolving Data”, 2018. Theorem 1.1 (Protocol for Bernoulli Means, informal version of Theorem 4.3): in the above model, there is an $\varepsilon$-differentially private local protocol that, with probability at least $1 - \delta$, outputs estimates $\tilde{p}_t$ such that $|p_t - \tilde{p}_t| = O\big(\big((\tfrac{1}{\ell} + \tfrac{k^2}{\varepsilon^2 n}) \log \tfrac{nT}{\delta}\big)^{1/3}\big)$ for all $t = 1, \ldots, T$, where $k$ is the number of times $p_t$ changes, $\ell$ is the epoch length, $T$ is the number of epochs, and $n$ is the number of users.

      McSherry, “Privacy Integrated Queries: An Extensible Platform for Privacy-Preserving Data Analysis”, CACM 2010. DOI: 10.1145/1810891.1810916.

      Johnson, Near and Song, “Towards Practical Differential Privacy for SQL Queries”, 2017. Definition 6 (Local Sensitivity at Distance): the local sensitivity of $f$ at distance $k$ from database $x$ is $A_f^{(k)}(x) = \max_{y \in D^n : d(x,y) \le k} LS_f(y)$.

      Balle and Wang, “Improving the Gaussian Mechanism for Differential Privacy: Analytical Calibration and Optimal Denoising”, 2018. Theorem 5: a mechanism $M : X \to Y$ is $(\varepsilon, \delta)$-DP if and only if the following holds for every $x \simeq x'$: $\Pr[L_{M,x,x'} \ge \varepsilon] - e^{\varepsilon} \Pr[L_{M,x',x} \le -\varepsilon] \le \delta$.

      Source: SFR28/2017, Johnson et al. 2017, Joseph et al. 2018, Balle and Wang 2018, McSherry 2010
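      One reason “release statistics multiple times” is hard in practice: under basic sequential composition, the ε of each release adds up, so a fixed privacy budget is exhausted after a few queries. Here is a hedged sketch of that accounting, with a hypothetical BudgetTracker class and illustrative numbers, not taken from any of the papers above:

```python
import numpy as np

class BudgetTracker:
    """Tracks a total epsilon budget under basic sequential composition."""

    def __init__(self, total_epsilon: float):
        self.remaining = total_epsilon

    def spend(self, epsilon: float) -> None:
        if epsilon > self.remaining:
            raise RuntimeError("privacy budget exhausted")
        self.remaining -= epsilon

rng = np.random.default_rng()
budget = BudgetTracker(total_epsilon=1.0)
true_count = 961

for release in range(4):  # four releases at epsilon = 0.25 each
    budget.spend(0.25)
    noisy = true_count + rng.laplace(scale=1.0 / 0.25)
    print(f"release {release + 1}: {noisy:.1f} (budget left: {budget.remaining:.2f})")

# A fifth spend(0.25) would raise: the total budget of 1.0 is used up.
```

      Advanced composition theorems and the papers cited above tighten this accounting, but the underlying trade-off — more releases means more noise or more budget — remains.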

  13. MOTIVATION “But academia already offers solutions such as Differential Privacy.” Theoretical risk vs. what organisations are worried about. The slide shows the reconstruction-attack results of Dinur and Nissim:

      Definition 4. A fractional linear query is specified by a vector $b \in [0,1]^n$; the exact answer is $q_b(s) = \frac{1}{n} b^{\top} s$ (which lies in $[0,1]$ as long as $s$ is binary). An answer $\hat{q}_b$ is $\alpha$-accurate if $|\hat{q}_b - q_b(s)| \le \alpha$.

      If a collection of fractional linear query statistics, given by the rows of a matrix $B$, is answered to within some error $\alpha$, we get the following problem:

      Definition 5 ($B$-reconstruction problem). Given a matrix $B$ and a vector $\hat{q} = \frac{1}{n} B s + e$, where $\|e\|_{\infty} \le \alpha$ and $s \in \{0,1\}^n$, find $\hat{s}$ with $\mathrm{Ham}(\hat{s}, s) \le \frac{n}{10}$. The reconstruction error is the fraction $\frac{\mathrm{Ham}(\hat{s}, s)}{n}$.

      Theorem 6 (Dinur & Nissim, 2003). When $B \in \{0,1\}^{2^n \times n}$ has all possible rows in $\{0,1\}^n$, there is an attack $A$ that solves the $B$-reconstruction problem with reconstruction error at most $4\alpha$ (given $\alpha$-accurate query answers), for every $\alpha > 0$. In particular, every mechanism that releases such statistics is blatantly non-private when $\alpha < 1/40$.

      Theorem 8. There exists an attack $A$ such that, if $B$ is chosen uniformly at random in $\{0,1\}^{m \times n}$ and $1.1\,n \le m \le 2^n$, then, with high probability over the choice of $B$, $A(B, \hat{q})$, given any $\alpha$-accurate answers $\hat{q}$, solves $B$-reconstruction with error $\beta = o(1)$ as long as $\alpha = o\big(\sqrt{\log(m/n)/n}\big)$. In particular, there is a $c > 0$ such that every mechanism for answering the queries in $B$ with error $\alpha \le c\sqrt{\log(m/n)/n}$ is blatantly non-private.

      Source: Dwork et al. 2016
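      To make Theorem 8 concrete, here is a small simulation of a reconstruction attack: answer many random linear queries a bit too accurately, and plain least squares recovers almost the entire secret bit vector. The parameters are illustrative, not taken from Dwork et al. 2016:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 100, 1000                   # n secret bits, m random linear queries
s = rng.integers(0, 2, size=n)     # the secret database s in {0,1}^n
B = rng.integers(0, 2, size=(m, n))

alpha = 0.005                      # per-query error: too accurate to be private
q_hat = B @ s / n + rng.uniform(-alpha, alpha, size=m)

# Least-squares estimate of s from the noisy answers, rounded to {0,1}
s_ls, *_ = np.linalg.lstsq(B / n, q_hat, rcond=None)
s_hat = (s_ls > 0.5).astype(int)

print("reconstruction error:", np.mean(s_hat != s))  # typically near 0
```

      Per Theorem 8, pushing α below roughly $\sqrt{\log(m/n)/n}$ lets attacks like this succeed, which is why some floor of noise in every answer is unavoidable.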

  14. MOTIVATION “If these problems were all solved, would PETs become a plug-and-play technique?”
      Privacy expert: We offer you strong privacy protection for your data product.
      Client: Great. What’s the level of privacy?
      Privacy expert: ε = 0.5
      Client: …?
      Client: But does it preserve my data utility?
      Privacy expert: Yes. Average distortion is only 3.27.
      Client: …?
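      One hedged way to answer the client: under ε-differential privacy, any adversary’s odds about a single individual can shift by a factor of at most e^ε. This translation is a standard interpretation from the differential privacy literature; the function and the ε values below are illustrative, not from the talk:

```python
import math

def max_odds_shift(epsilon: float) -> float:
    """Upper bound on the multiplicative change in an adversary's odds
    about one individual, guaranteed by epsilon-differential privacy."""
    return math.exp(epsilon)

for eps in (0.1, 0.5, 1.0, 2.0):
    print(f"epsilon = {eps}: odds change by at most {max_odds_shift(eps):.2f}x")
```

      So “ε = 0.5” means roughly: whatever the adversary believed about you before the release, the release can shift those odds by at most a factor of e^0.5 ≈ 1.65.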

  15. QUESTIONS “What are we required to do?” Singling out? Unlinkability? Anonymity? Unobservability? Inference? Pseudonymity? From a business perspective:
      • Understanding regulations is hard for businesses
      • It is unclear what the legal terms translate into technically
      • Even the privacy expert community can’t provide definitive answers
      Source: European Union Article 29 Data Protection Working Party Opinion on Anonymization

  16. QUESTIONS “What could we do?” Decentralisation, Differential Privacy, Homomorphic Encryption, Privacy-Preserving Machine Learning, Aggregation, Synthetic data, Suppression. From a business perspective:
      • There’s no good overview of what technologies are out there
      • There’s no clear overview of which PETs are fit for which use case

  17. QUESTIONS “What should we do?”

      Technique             | Singling out | Linkage | Inference
      Differential Privacy  | ?            | ?       | ?
      Aggregation           | ?            | ?       | ?
      Hashing               | ?            | ?       | ?
      Suppression           | ?            | ?       | ?
      Decentralisation      | ?            | ?       | ?

      From a business perspective:
      • There’s no good overview of what technologies are out there
      • No clear mapping from a privacy harm to the techniques that reduce the risk of that harm
      • No best-practice examples
      • Few guidelines
      Source: European Union Article 29 Data Protection Working Party Opinion on Anonymization

  18. QUESTIONS “What do we gain?” Theory vs. industry (£ € $). The slide contrasts excerpts of theory (a k-anonymity definition; “Recommended 192-bit Elliptic Curve Domain Parameters over F_p”) with industry’s monetary framing. From a business perspective:
      • What is the transactional value of privacy?
      • Businesses want to measure the value of privacy in $
      • This requires clearer measures of risk and risk reduction
      Source: European Union Article 29 Data Protection Working Party Opinion on Anonymization
