Patterns and Packages in PostgreSQL for Privacy Preservation - PowerPoint PPT Presentation

PostgreSQL, Planning, PostGIS, Partitioning, PaaS, Permissions and now… Patterns and Packages in PostgreSQL for Privacy Preservation. Atif Rahman, mantaq10, 15 November 2019, Sydney, www.2019.pgdu.org


  1. PostgreSQL, Planning, PostGIS, Partitioning, PaaS, Permissions and now… Patterns and Packages in PostgreSQL for Privacy Preservation. Atif Rahman, mantaq10, 15 November 2019, Sydney, www.2019.pgdu.org

  2. I was like her. According to Pearson-R, we were both outliers. • Data Engineering • ML Pipelines • Herding Cats

  3. April 2018 – March 2019: 964 NDB Breach Notifications (OAIC Report 2019). [Chart: breach causes (Human Error, Attack, Others) and sectors (Healthcare, Financial); values shown: 55%, 35%, 60%, 41%]

  4. OAIC Report 2019

  5. You can have security but not necessarily privacy

  6. Security: Protection, Binary. Privacy: Usage, Contextual. (ISO/IEC 29100:2011: Privacy Framework)

  7. Privacy Guarantees: (1) De-Identification (Record Keys (PK, FK, SK)); (2) Re-Identification (Brute Force & Decryption); (3) Re-Identification (Record Linkage & Math); (4) Ethical Computing (Permissibility & Compliance). [Diagram: x vs f(x), 𝐺; Loss-less Functions vs Lossy Functions; PII and Attribute Augmentation] “Homomorphic encryption schemes are often repackaging vulnerabilities (practical chosen-ciphertext attacks) as features.” – The Internet

  8. Record Linkage. “87% of the U.S. population is uniquely identified by date of birth, gender, postal code.” Latanya Sweeney (k-anonymity). “Decreasing the precision of the data, or perturbing it statistically, makes re-identification gradually harder at a substantial cost to utility.” Chris Culnane, Benjamin Rubinstein, Vanessa Teague @UniMelb
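The record-linkage risk quoted on this slide can be estimated on any table by counting how many records share each combination of quasi-identifiers. A minimal Python sketch, not taken from the talk: the records, the column names (`dob`, `gender`, `postcode`) and the helper `smallest_group` are all hypothetical.

```python
from collections import Counter

def smallest_group(rows, quasi_identifiers):
    """Return k: the size of the smallest group of rows that share
    the same values for every quasi-identifier column."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())

# Hypothetical records, for illustration only.
people = [
    {"dob": "1962-06-12", "gender": "F", "postcode": "63456"},
    {"dob": "1962-06-12", "gender": "F", "postcode": "63456"},
    {"dob": "1971-10-18", "gender": "F", "postcode": "54367"},
]

# k == 1 means at least one person is unique on these three attributes.
k = smallest_group(people, ["dob", "gender", "postcode"])
print(k)
```

A table is k-anonymous for a chosen set of quasi-identifiers when this value is at least k, so a result of 1 signals that some record can be singled out by linkage.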

  9. Privacy vs Utility Trade-off. [Chart: DP, SM, HE and AN placed by Better Privacy Guarantee and Utility, in bands Bleeding Edge, Cutting Edge, Established] SM: Secure Multiparty Computing; DP: Differential Privacy; HE: Homomorphic Encryption; AN: Anonymisation

  10. 1. AN: (Pseudo)Anonymisation. Techniques: REPLACEMENT, SUPPRESSION, PERTURBATION, GENERALISATION. This slide: REPLACEMENT (reversible or random), using PG String Functions or PGAnonymizer.

      Before:

      | ID | NAME | DOB | EMPLOYER | ZIPCODE | FK_SHOP |
      |---|---|---|---|---|---|
      | 101 | SARAH CONNOR | 12-06-1962 | JB Vet | 63456 | 12 |
      | 112 | PAMELA LANDY | 18-10-1971 | FBI | 54367 | 45 |

      After:

      | ID | NAME | DOB | EMPLOYER | ZIPCODE | FK_SHOP |
      |---|---|---|---|---|---|
      | 101 | MIKE OBAMA | 13-07-1982 | JB Vet | 63456 | 12 |
      | 112 | BRUCE LEE | 19-11-1991 | FBI | 54367 | 45 |
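Replacement can be sketched outside the database as a keyed, repeatable substitution. The Python below is an illustration under stated assumptions, not the approach used by PGAnonymizer or by the talk: the function `pseudonym`, the `P-` token format and the key are hypothetical. Keeping a protected lookup table is what makes the replacement reversible; dropping the table (and the key) makes it effectively one-way.

```python
import hashlib
import hmac

def pseudonym(value: str, key: bytes) -> str:
    """Replace a value with a repeatable token derived from a secret key.
    The same input and key always give the same token."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "P-" + digest[:16]

key = b"hypothetical-secret-key"  # assumption: stored outside the dataset

# Optional reverse lookup, kept separately under access control.
lookup = {}
for name in ["SARAH CONNOR", "PAMELA LANDY"]:
    lookup[pseudonym(name, key)] = name

token = pseudonym("SARAH CONNOR", key)
print(token, "->", lookup[token])
```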

  11. 1. AN: (Pseudo)Anonymisation. Techniques: REPLACEMENT, SUPPRESSION, PERTURBATION, GENERALISATION. This slide: SUPPRESSION (Wildcard or Removal), using PG String Functions or PGAnonymizer; 18 PII Attributes.

      Before:

      | ID | NAME | DOB | EMPLOYER | ZIPCODE | FK_SHOP |
      |---|---|---|---|---|---|
      | 101 | SARAH CONNOR | 12-06-1962 | JB Vet | 63456 | 12 |
      | 112 | PAMELA LANDY | 18-10-1971 | FBI | 54367 | 45 |

      After:

      | ID | NAME | EMPLOYER | ZIPCODE | FK_SHOP |
      |---|---|---|---|---|
      | 101 | M*** ****A | JB Vet | 63456 | 12 |
      | 112 | B**** **E | FBI | 54367 | 45 |
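The wildcard and removal variants of suppression shown on this slide can be sketched in a few lines of Python. The helpers `mask_name` and `drop_columns` are hypothetical names introduced for illustration; the masking rule (keep the first and last letter, star the rest) is inferred from the slide's output rows.

```python
def mask_name(full_name: str) -> str:
    """Wildcard suppression: keep the first and last letter of the whole
    name, replace every other letter with '*', and leave spaces alone."""
    chars = list(full_name)
    letter_positions = [i for i, c in enumerate(chars) if c != " "]
    for i in letter_positions[1:-1]:
        chars[i] = "*"
    return "".join(chars)

def drop_columns(row: dict, columns: list) -> dict:
    """Removal suppression: return a copy of the row without the given columns."""
    return {k: v for k, v in row.items() if k not in columns}

row = {"ID": 101, "NAME": "MIKE OBAMA", "DOB": "13-07-1982", "ZIPCODE": "63456"}
suppressed = drop_columns(row, ["DOB"])
suppressed["NAME"] = mask_name(suppressed["NAME"])
print(suppressed)
```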

  12. 1. AN: (Pseudo)Anonymisation. Techniques: REPLACEMENT, SUPPRESSION, PERTURBATION, GENERALISATION. This slide: PERTURBATION (Additive Noise, PDF, Data Imputation), using PGAnonymizer, Google DP or Uber DP.

      Before:

      | ID | NAME | DOB | EMPLOYER | ZIPCODE | FK_SHOP |
      |---|---|---|---|---|---|
      | 101 | SARAH CONNOR | 12-06-1962 | JB Vet | 63456 | 12 |
      | 112 | PAMELA LANDY | 18-10-1971 | FBI | 54367 | 45 |

      After:

      | ID | NAME | DOB | EMPLOYER | ZIPCODE | FK_SHOP |
      |---|---|---|---|---|---|
      | 101 | SARAH CONNOR | 12-07-1958 | JB Vet | 64532 | 12 |
      | 112 | PAMELA LANDY | 18-11-1973 | FBI | 57843 | 45 |
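Additive noise on a date column can be illustrated as a bounded random shift. This is a rough Python sketch only: `perturb_dob`, the five-year bound and the fixed seed are assumptions made for the example, and plain uniform noise like this carries no formal differential-privacy guarantee.

```python
import random
from datetime import datetime, timedelta

def perturb_dob(dob: str, max_days: int, rng: random.Random) -> str:
    """Shift a DD-MM-YYYY date by a uniformly random number of days
    in the range [-max_days, max_days]."""
    original = datetime.strptime(dob, "%d-%m-%Y")
    shifted = original + timedelta(days=rng.randint(-max_days, max_days))
    return shifted.strftime("%d-%m-%Y")

rng = random.Random(2019)  # fixed seed only so the sketch is repeatable
noisy = perturb_dob("12-06-1962", max_days=5 * 365, rng=rng)
print(noisy)
```

The wider the bound, the harder linkage on the exact date becomes, and the less useful the column is for age-based analysis.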

  13. 1. AN: (Pseudo)Anonymisation. Techniques: REPLACEMENT, SUPPRESSION, PERTURBATION, GENERALISATION. This slide: GENERALISATION (K-Anonymity or Masking), using PGAnonymizer or PG Aggregate Functions.

      Before:

      | ID | NAME | DOB | EMPLOYER | ZIPCODE | FK_SHOP |
      |---|---|---|---|---|---|
      | 101 | SARAH CONNOR | 12-06-1962 | JB Vet | 63456 | 12 |
      | 112 | PAMELA LANDY | 18-10-1971 | FBI | 54367 | 45 |

      After:

      | ID | NAME | DOB | EMPLOYER | 𝝉_ZIPCODE | FK_SHOP |
      |---|---|---|---|---|---|
      | 101 | SARAH CONNOR | 1960s | JB Vet | 0.37 | 12 |
      | 112 | PAMELA LANDY | 1970s | FBI | -0.99 | 45 |
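The DOB column on this slide is generalised from an exact date to a decade. A minimal Python sketch of that one step follows; `dob_to_decade` is a hypothetical helper, and the 𝝉_ZIPCODE transform is not reproduced because the slide does not say how it is computed.

```python
from datetime import datetime

def dob_to_decade(dob: str) -> str:
    """Generalise a DD-MM-YYYY date of birth to its decade, e.g. '1960s'."""
    year = datetime.strptime(dob, "%d-%m-%Y").year
    return f"{year // 10 * 10}s"

for dob in ["12-06-1962", "18-10-1971"]:
    print(dob, "->", dob_to_decade(dob))
```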

  14. Privacy vs Utility Trade-off. [Chart: DP, SM, HE and AN placed by Better Privacy Guarantee and Utility, in bands Bleeding Edge, Cutting Edge, Established] SM: Secure Multiparty Computing; DP: Differential Privacy; HE: Homomorphic Encryption; AN: Anonymisation

  15. Differential Privacy. [Diagram: a query (?) to The Oracle; Private Database, Ned in it; Database with Statistical Perturbations (Noise), not sure if Ned is there anymore] Properties: • Works on the data itself, not on the management environment • Considerably faster than encryption techniques • Quantum Safe (ish)

  16. Differential Privacy on PostgreSQL. https://github.com/google/differential-privacy Algorithms: Count; Sum; Mean; Variance; Standard deviation; Order statistics (including min, max, and median); Laplace Functions for UDFs. Privacy Loss: Epsilon & Delta; Risk Score for every attribute used for a particular person; Risk Score for total number of records with similar values; (rule of thumb) k = 11
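The idea behind a differentially private count with Laplace noise can be sketched in plain Python. This is not the API of the google/differential-privacy library: `laplace_noise` and `dp_count` are hypothetical names, the sensitivity of 1 assumes each person contributes at most one row, and the sampling is not hardened against floating-point attacks.

```python
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two independent
    exponential variables with rate 1/scale."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def dp_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Return a noisy count. Assumes sensitivity 1, so the noise scale is 1/epsilon."""
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)

rng = random.Random(42)  # fixed seed only so the sketch is repeatable
print(dp_count(100, epsilon=1.0, rng=rng))
```

A smaller epsilon means a larger noise scale: stronger privacy and a less accurate count, which is the privacy versus utility trade-off in miniature.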

  17. HE: Homomorphic Encryption. Operators: Malleable (ability to apply computations on encrypted data!). Trade-Offs: Performance. Categories: Partial HE, Full HE. Schemes: BFV, BGV, CKKS. Libraries: Microsoft SEAL, PALISADE, HELib, HEAAN, TFHE

  18. Privacy vs Utility Trade-off. [Chart: DP, SM, HE and AN placed by Better Privacy Guarantee and Utility, in bands Bleeding Edge, Cutting Edge, Established] SM: Secure Multiparty Computing; DP: Differential Privacy; HE: Homomorphic Encryption; AN: Anonymisation

  19. Secure Multi-party Computing, K-Anonymity. [Diagram: parties A, B, C, D in a ring; A starts with a random number, 876532] X1 = A_pay + 876532; X2 = B_pay + X1; X3 = C_pay + X2; X4 = D_pay + X3; A removes its random number, so (X4 - 876532)/4 = Avg_pay
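The ring on this slide can be simulated in Python to show why the average comes out right while no party sees another's pay directly. This is a toy sketch under stated assumptions: `ring_average` and the pay figures are hypothetical, the first party must subtract its own random mask before dividing, and the scheme does not protect a party whose two neighbours collude (they can subtract the value passed in from the value passed out).

```python
import secrets

def ring_average(pays):
    """Simulate the ring: the first party adds a secret random mask,
    each party adds its own pay to the running value it receives,
    and the first party removes the mask at the end."""
    mask = secrets.randbelow(10**6)   # plays the role of 876532 on the slide
    running = mask
    passed_along = []                 # X1, X2, ... as seen by the next party
    for pay in pays:
        running += pay
        passed_along.append(running)
    total = running - mask            # only the first party can do this
    return total / len(pays), passed_along

# Hypothetical pay figures for parties A, B, C, D.
average, passed_along = ring_average([50000, 60000, 70000, 80000])
print(average)
```

Production protocols typically work in modular arithmetic and use secret sharing so that collusion between a few parties reveals nothing; the plain integer version here only mirrors the slide.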

  20. Privacy Guarantees: (1) De-Identification (Record Keys (PK, FK, SK)); (2) Re-Identification (Brute Force & Decryption); (3) Re-Identification (Record Linkage & Math); (4) Ethical Computing (Permissibility & Compliance). [Diagram: x vs f(x), 𝐺; Loss-less Functions vs Lossy Functions; PII and Attribute Augmentation] “Homomorphic encryption schemes are often repackaging vulnerabilities (practical chosen-ciphertext attacks) as features.” – The Internet

  21. Typical Data Pipelines. [Diagram: Sources, Landing, Processing, Serving, with numbered Privacy Gates along the pipeline and a Unified Key Management System] Privacy Gates: (1) De-Identification (Record Keys (PK, FK, SK)); (2) Re-Identification (Brute Force & Decryption); (3) Re-Identification (Record Linkage); (4) Ethical Computing (Permissibility & Compliance)

  22. Emerging Data Architecture (Data Fabrics) [HTAP = OLTP + OLAP]. [Diagram: Sources, Persistence, Processing & Serving, with a Unified Key Management System] *Gaps to Close: • Encryption Performance • Developer UX • Admin Tooling • Extensions! Privacy Gates: De-Identification (Record Keys (PK, FK, SK)); Re-Identification (Brute Force & Decryption); Re-Identification (Record Linkage); Ethical Computing (Permissibility & Compliance)

  23. Key Takeaways • Securing your database doesn’t guarantee data privacy. • There are trade-offs between privacy and utility. • You can provision privacy controls within PostgreSQL. • PostgreSQL fits emerging (data) architecture patterns. • Atif is pledging to build an extension, he needs my help!

  24. Questions
