

  1. Introduction to Cybersecurity Database Privacy

  2. Review: Anonymity vs. Privacy
  Privacy: the claim of individuals, groups, or institutions to determine for themselves when, how, and to what extent information about them is communicated to others.
  Anonymity: the state of not being identifiable within a set of subjects/individuals; it is a property exclusively of individuals.
  Privacy != Anonymity: anonymity is one way to maintain privacy, and sometimes it is not necessary.
  Foundations of Cybersecurity 2016

  3. Review: Anonymous Communication (AC) Protocols
  Various AC protocols with different goals:
  - low latency overhead
  - low communication overhead
  - high traffic-analysis resistance
  Typically categorized by latency overhead:
  - low-latency AC protocols, e.g. Tor, DC nets, Crowds
  - high-latency AC protocols, e.g. mix networks

  4. A Glimpse on Research: Privacy Assessment with MATor
  Setting: clients randomly choose an entry, a middle, and an exit node; the adversary corrupts a subset of the relays.
  Goal: derive worst-case quantitative anonymity guarantees.
  Budget adversary B_g, with cost function g: N → ℝ and budget C. Finding the worst-case set A of corrupted nodes is an integer maximization problem:
    maximize Σ_{a∈A} adv-node(a)  subject to  A ⊆ N, g(A) ≤ C
  Impact of single-node corruption (anonymity degradation per node i):
    ε_entry(i)  = Σ_{(m,x)∈N²} Pr[(i, m, x) ← Tor]
    ε_middle(i) = Σ_{(e,x)∈N²} Δ_st · Pr[(e, i, x) ← Tor]
    ε_exit(i)   = Δ_st · Σ_{(e,m)∈N²} Pr[(e, m, i) ← Tor]
  Computational soundness (encryption as terms): |γ_algebraic(η) − γ_crypto(η)| ≤ 1/poly(η)
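The budget-adversary optimization above can be illustrated with a tiny brute-force search. This is our own toy sketch, not MATor's actual algorithm; the node names, impact values, and costs are invented, and g(A) is assumed to be the sum of per-node costs:

```python
from itertools import combinations

def worst_case_adversary(nodes, adv_impact, cost, budget):
    """Brute-force the integer maximization from the slide: pick a node
    set A maximizing the summed per-node anonymity impact adv_impact[a],
    subject to total cost sum(cost[a] for a in A) <= budget.
    (Illustrative only; exponential in the number of nodes.)"""
    best_set, best_impact = set(), 0.0
    for r in range(len(nodes) + 1):
        for subset in combinations(nodes, r):
            if sum(cost[a] for a in subset) <= budget:
                impact = sum(adv_impact[a] for a in subset)
                if impact > best_impact:
                    best_set, best_impact = set(subset), impact
    return best_set, best_impact

# Invented toy relay set: impact = anonymity degradation per node,
# cost = price of corrupting that node.
nodes = ["r1", "r2", "r3", "r4"]
adv_impact = {"r1": 0.4, "r2": 0.3, "r3": 0.2, "r4": 0.1}
cost = {"r1": 3, "r2": 1, "r3": 1, "r4": 1}

# With budget 3, corrupting the three cheap relays beats the big one.
print(worst_case_adversary(nodes, adv_impact, cost, budget=3))
```

Note how the optimum is not simply the single highest-impact node: under a budget, many cheap relays can degrade anonymity more than one expensive one.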

  5. A Glimpse on Research: Privacy Assessment with MATor
  Setting: clients randomly choose an entry, a middle, and an exit node; the adversary corrupts a subset of the relays.
  Goal: derive worst-case quantitative anonymity guarantees.
  [Figures: a live anonymity monitor for Tor over time (2012-2014), and the anonymity of alternative path-selection algorithms (Tor, LASTor, Uniform, US-Exit) plotted against relay bandwidth in MB/s]
  Challenges: comprehensive network-layer attackers, extension beyond structural corruption, content-sensitive assessment.
  Potential killer arguments: attackers overly powerful, hence too pessimistic guarantees; assessment only for Tor, not a tailored attack.

  6. Lecture Summary – Part I
  Basic Database Privacy
  - Motivation
  - Data Sanitization
  - k-Anonymity and l-Diversity
  Principal Approaches to Data Protection
  - Sanitization before Publication
  - Protection after Publication
  - Publication without Control

  7. Data Privacy: Attribute Disclosure
  Sanitized medical records:
    Gender  Age    Region    Condition
    female  25-30  Saarland  Addison disorder
    female  25-30  Saarland  Addison disorder
    male    30-35  Saarland  healthy
  Public social-network profile: female, 29 years old, Saarbrücken.
  Linking the two: Alice suffers from the Addison disorder!
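The linking attack on this slide can be sketched in a few lines of Python. This is a toy reconstruction: the records and the profile are invented to mirror the slide, and the attribute names are our own.

```python
# Toy linking attack: join "sanitized" medical records with a public
# social-network profile on quasi-identifiers. All data invented.
medical = [  # names removed, quasi-identifiers kept
    {"gender": "female", "age": "25-30", "region": "Saarland", "diagnosis": "Addison disorder"},
    {"gender": "female", "age": "25-30", "region": "Saarland", "diagnosis": "Addison disorder"},
    {"gender": "male",   "age": "30-35", "region": "Saarland", "diagnosis": "healthy"},
]
profile = {"name": "Alice", "gender": "female", "age": 29, "region": "Saarland"}

def in_bucket(age, bucket):
    """True if an exact age falls into an 'lo-hi' age range."""
    lo, hi = map(int, bucket.split("-"))
    return lo <= age <= hi

matches = [r for r in medical
           if r["gender"] == profile["gender"]
           and r["region"] == profile["region"]
           and in_bucket(profile["age"], r["age"])]

# Alice is not uniquely identified (two candidate rows), but both rows
# carry the same sensitive value, so the attribute is disclosed anyway.
diagnoses = {r["diagnosis"] for r in matches}
if len(diagnoses) == 1:
    (diagnosis,) = diagnoses
    print(f"{profile['name']} suffers from the {diagnosis}!")
```

Note that the disclosure succeeds even without pinpointing Alice's exact row, which foreshadows the homogeneity attack on K-Anonymity later in the deck.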

  8. Cryptographic Solutions
  Why not just delete the data? In contrast to cryptography, privacy often requires a certain utility, and deleting the data destroys that utility.
  Why can't we encrypt? Storing or transmitting data in encrypted form is a good idea, but someone has (needs to have) the key, and that key holder can still read the data.

  9. Sanitization
  Legally, data has to be "sanitized": removal of "identifying" information.
  Unsanitized data: Name, Gender, Age, Address, Phone Number, Field of Studies, Grades
  Sanitized data: the same records with identifying fields such as Name, Address, and Phone Number removed

  10. Benefits of Sanitization
  Sanitized data can (still) be used for:
  - Research statistics (science!)
  - Healthcare
  - Governmental statistics
  - Improving business models

  11. Does Sanitization Suffice?
  Sanitization = Privacy?
  - No identity
  - No identifying information ("quasi-identifiers") such as address or phone number
  But: if only one female student of this age attends a course, her record in the sanitized data is still unique, and her grades can be read off: a privacy breach.

  12. Attacks on Databases
  Early defense mechanism: query sanitization.
  Sanitization rule: queries must not depend on identifiers!

    Name     Age  Gender  Semester  Grade
    Alice    19   Female  1         1.3
    Bob      18   Male    1         2.0
    Charlie  18   Male    1         1.7
    Dave     18   Male    1         3.7
    Eve      17   Female  1         1.0
    Fritz    19   Male    3         1.3
    Gerd     21   Male    3         2.3
    Hans     23   Male    3         3.0
    Isa      20   Female  3         3.7
    John     20   Male    3         1.7
    Kale     21   Male    5         1.7
    Leonard  23   Male    5         failed
    Martin   20   Male    5         2.7
    Nils     22   Male    5         3.0
    Otto     20   Male    5         1.0

  Rejected: SELECT SUM(Grade) WHERE Name = 'Isa'
  (would return 3.7, i.e. Isa's grade)

  13. Attacks on Databases
  Early defense mechanisms: query sanitization (on the same student table).
  Sanitization rule 1: queries must not depend on identifiers!
  Sanitization rule 2: queries must not be answered if the number of matching records is below a threshold.
  Rejected: SELECT SUM(Grade) WHERE Semester = 3 AND Gender = Female
  (only Isa matches, so the answer 3.7 would reveal her grade)

  14. Attacks on Databases
  The threshold rule can be bypassed by differencing two large queries over the same student table:
    SELECT SUM(Grade)                                               →  30.1
    SELECT SUM(Grade) WHERE NOT (Semester = 3 AND Gender = Female)  →  26.4
  Local computation: Isa's grade = 30.1 - 26.4 = 3.7
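The slide's differencing attack can be reproduced with a small script. This is our own toy model of both defenses: the table mirrors the slides, Leonard's "failed" grade is stored as None and skipped in sums, and the threshold value 3 is an assumption.

```python
# Toy model of the threshold defense and the differencing attack that
# bypasses it. Table and queries follow the slides; helpers are ours.
students = [
    ("Alice", 19, "Female", 1, 1.3), ("Bob", 18, "Male", 1, 2.0),
    ("Charlie", 18, "Male", 1, 1.7), ("Dave", 18, "Male", 1, 3.7),
    ("Eve", 17, "Female", 1, 1.0), ("Fritz", 19, "Male", 3, 1.3),
    ("Gerd", 21, "Male", 3, 2.3), ("Hans", 23, "Male", 3, 3.0),
    ("Isa", 20, "Female", 3, 3.7), ("John", 20, "Male", 3, 1.7),
    ("Kale", 21, "Male", 5, 1.7), ("Leonard", 23, "Male", 5, None),  # "failed"
    ("Martin", 20, "Male", 5, 2.7), ("Nils", 22, "Male", 5, 3.0),
    ("Otto", 20, "Male", 5, 1.0),
]

def guarded_sum(predicate, threshold=3):
    """SUM(Grade) over matching rows; refuses small result sets."""
    rows = [r for r in students if predicate(r)]
    if len(rows) < threshold:
        raise PermissionError("result set below threshold")
    return round(sum(r[4] for r in rows if r[4] is not None), 1)

# The direct query is blocked: only Isa matches.
try:
    guarded_sum(lambda r: r[3] == 3 and r[2] == "Female")
except PermissionError:
    print("direct query refused")

# But two large, individually harmless queries slip through...
total = guarded_sum(lambda r: True)                                 # 30.1
rest = guarded_sum(lambda r: not (r[3] == 3 and r[2] == "Female"))  # 26.4
print(round(total - rest, 1))  # Isa's grade
```

Both allowed queries cover many students, yet their difference isolates a single individual; this is why per-query guards alone cannot stop inference.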

  15. K-Anonymity (Intuitive Idea)
  K-Anonymity: privacy means that one can hide within a set of (at least) K other people with the same quasi-identifiers.
  Quasi-identifiers: attributes that could identify a person (name, age, etc.)
  Example: K = 6

  16. K-Anonymity (Definition)
  Definition: Data satisfies K-Anonymity if each person contained in the data cannot be distinguished from at least K-1 other individuals also in the data.

  17. Achieving K-Anonymity
  Reduce the information so that the records collapse into indistinguishable groups:

  Suppression (replace identifying values with *), shown for the first-semester rows:
    Name  Age  Gender  Semester  Grade
    *     19   *       1         1.3
    *     18   *       1         2.0
    *     18   *       1         1.7
    *     18   *       1         3.7
    *     17   *       1         1.0

  Generalization (replace exact values with ranges), shown for the fifth-semester rows:
    Name  Age    Gender  Semester  Grade
          21-25          5         1.7
          21-25          5         failed
          18-20          5         2.7
          21-25          5         3.0
          18-20          5         1.0

  18. K-Anonymity (3)
  Example: K-Anonymity for a list of students with K = 5.
    Name  Semester  Grade
    *     1         1.3
    *     1         2.0
    *     1         1.7
    *     1         3.7
    *     1         1.0
    *     3         1.3
    *     3         2.3
    *     3         3.0
    *     3         failed
    *     3         1.7
    *     5         1.7
    *     5         failed
    *     5         2.7
    *     5         3.0
    *     5         1.0
  For each semester, there are at least 5 individuals present that cannot be distinguished.
  Idea/Goal: consequently, one cannot be identified, but hides in a group of K = 5 people.
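Checking K-Anonymity mechanically is straightforward: group the rows by their quasi-identifier values and take the smallest group size. A minimal sketch (the helper is our own; the table is the slide's sanitized student list):

```python
from collections import Counter

def k_anonymity(rows, quasi_identifiers):
    """Largest K such that every combination of quasi-identifier
    values occurs at least K times in the table."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(counts.values())

# Sanitized student list from the slide: names suppressed ("*"),
# so semester is the only remaining quasi-identifier.
sanitized = [{"semester": s, "grade": g} for s, g in [
    (1, 1.3), (1, 2.0), (1, 1.7), (1, 3.7), (1, 1.0),
    (3, 1.3), (3, 2.3), (3, 3.0), (3, "failed"), (3, 1.7),
    (5, 1.7), (5, "failed"), (5, 2.7), (5, 3.0), (5, 1.0),
]]
print(k_anonymity(sanitized, ["semester"]))  # each semester group has 5 members
```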

  19. Attacks on K-Anonymity – Homogeneity
  One may still learn a lot about an individual if all K people in a group share the same sensitive value.
    Name  Semester  Grade
    *     1         1.3
    *     1         2.0
    *     1         1.7
    *     1         3.7
    *     1         1.0
    *     3         failed
    *     3         failed
    *     3         failed
    *     3         failed
    *     3         failed
    *     5         1.7
    *     5         failed
    *     5         2.7
    *     5         3.0
    *     5         1.0
  This table satisfies K-Anonymity with K = 5.
  But: if we know that a particular student, say, Isa is in the 3rd semester, then we immediately learn that she has failed the exam.
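This weakness is exactly what l-diversity (mentioned in the lecture summary) addresses: besides groups of size at least K, each group must also contain at least l distinct sensitive values. A minimal checker, our own sketch over the slide's homogeneous table:

```python
from collections import defaultdict

def min_group_size_and_diversity(rows, quasi, sensitive):
    """Return (K, l): the smallest group size over all quasi-identifier
    groups, and the smallest number of distinct sensitive values in any
    group. K measures anonymity, l measures diversity."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[q] for q in quasi)].append(row[sensitive])
    k = min(len(g) for g in groups.values())
    l = min(len(set(g)) for g in groups.values())
    return k, l

# Slide 19's table: 5-anonymous, but the semester-3 group is homogeneous.
table = [{"semester": s, "grade": g} for s, g in
         [(1, 1.3), (1, 2.0), (1, 1.7), (1, 3.7), (1, 1.0)]
         + [(3, "failed")] * 5
         + [(5, 1.7), (5, "failed"), (5, 2.7), (5, 3.0), (5, 1.0)]]

print(min_group_size_and_diversity(table, ["semester"], "grade"))
```

The result shows K = 5 but l = 1: the semester-3 group offers no diversity at all, so knowing that Isa is in that group reveals her grade despite the 5-anonymity.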
