Introduction to Cybersecurity Database Privacy
Review: Anonymity vs. Privacy

Privacy
- The claim of individuals, groups, or institutions to determine for themselves when, how, and to what extent information about them is communicated to others.

Anonymity
- The state of not being identifiable within a set of subjects/individuals.
- It is a property exclusively of individuals.

Privacy != Anonymity
- Anonymity is one way to maintain privacy, and sometimes it is not necessary.

Foundations of Cybersecurity 2016
Review: Anonymous Communication (AC) Protocols

Various AC protocols with different goals:
- Low latency overhead
- Low communication overhead
- High traffic-analysis resistance

Typically categorized by latency overhead:
- Low-latency AC protocols, e.g., Tor, DC-nets, Crowds
- High-latency AC protocols, e.g., mix networks

[Figure: protocols compared along the axes complexity, latency overhead, communication overhead, and traffic-analysis resistance]
A Glimpse on Research: Privacy Assessment with MATor

Tor path selection: randomly choose an entry, a middle, and a related exit node; an adversary corrupts a subset of the nodes.

Goal: Derive worst-case quantitative anonymity guarantees.

Budget adversary B_g, with cost function g: N → ℝ and budget C. Finding the worst-case corrupted node set A is an integer maximization problem:

  maximize  Σ_{a ∈ A} adv_node(a)   subject to  A ⊆ N,  g(A) ≤ C

Impact of single node corruption (anonymity degeneration of node i):

  ε_entry(i)  = Σ_{(m,x) ∈ N²} Pr[(i, m, x) ← Tor]
  ε_middle(i) = Σ_{(e,x) ∈ N²} Δ_st · Pr[(e, i, x) ← Tor]
  ε_exit(i)   = Σ_{(e,m) ∈ N²} Δ_st · Pr[(e, m, i) ← Tor]

Computational soundness (treating encryption as terms):

  |adv_algebraic(n) − adv_crypto(n)| ≤ 1/poly(n)
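The budget-adversary optimization can be sketched in a few lines. This is a minimal illustration with made-up relay impacts and unit costs, not the actual MATor implementation; the relay names and impact values are assumptions for the example.

```python
# Sketch of MATor's budget-adversary step. The adversary selects a node set A
# with total cost g(A) <= C that maximizes the summed per-node impact
# adv_node(a). For unit costs a greedy pick of the highest-impact nodes is
# optimal; in general this is a knapsack problem and needs an exact solver.

def best_corruption_set(adv_node, cost, budget):
    # Greedy by impact-per-cost ratio (optimal here because all costs are 1).
    order = sorted(adv_node, key=lambda n: adv_node[n] / cost[n], reverse=True)
    chosen, spent = [], 0
    for n in order:
        if spent + cost[n] <= budget:
            chosen.append(n)
            spent += cost[n]
    return chosen, sum(adv_node[n] for n in chosen)

# Hypothetical relay impacts (e.g., derived from path-selection probabilities):
adv = {"relay1": 0.30, "relay2": 0.25, "relay3": 0.10, "relay4": 0.05}
cost = {n: 1 for n in adv}

chosen, impact = best_corruption_set(adv, cost, budget=2)
print(chosen)  # the two highest-impact relays
```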
A Glimpse on Research: Privacy Assessment with MATor

Goal: Derive worst-case quantitative anonymity guarantees.

[Figures: (1) live anonymity monitor for Tor over time, 2012–2014; (2) anonymity of alternative path-selection algorithms (Tor, LASTor, uniform path selection, US-exit) as a function of bandwidth in MB/s]

Challenges: comprehensive network-layer attackers, extension beyond structural corruption, content-sensitive assessment.

Potential killer arguments: attackers overly powerful, hence too pessimistic guarantees; assessment only for Tor, not a tailored attack.
Lecture Summary – Part I

Basic Database Privacy
• Motivation
• Data Sanitization
• k-anonymity and l-diversity

Principal Approaches to Data Protection
• Sanitization before Publication
• Protection after Publication
• Publication without Control
Data Privacy: Attribute Disclosure

Published (sanitized) medical records:

  Gender  Age    Region    Condition
  female  25-30  Saarland  Addison Disorder
  female  25-30  Saarland  Addison Disorder
  male    30-35  Saarland  Healthy

Linking with a social network profile (female, 29y, Saarbrücken) reveals: Alice suffers from the Addison disorder!
Cryptographic Solutions

Why not just delete the data?
- In contrast to cryptography, privacy often requires a certain utility.
- Deleting data destroys utility.

Why can't we just encrypt?
- Storing or transmitting data encrypted is a good idea.
- But someone has (needs to have) the key.
Sanitization

Legally, data has to be "sanitized":
- Removal of "identifying" information

Unsanitized data: Name, Gender, Age, Address, Phone Number, Field of studies, Grades
Sanitized data: Gender, Age, Field of studies, Grades (identifying attributes removed)
Benefits of Sanitization

Sanitized data can (still) be used for:
- Research / Science
- Statistics
- Healthcare
- Governmental statistics
- Improving business models
Does Sanitization Suffice?

Sanitization = Privacy?
- No identity
- No identifying information ("quasi-identifiers") such as address or phone number

But: if only 1 female student of this age attends a course, the remaining attributes (Gender, Age, Field of studies, Grades) still identify her: a privacy breach.
Attacks on Databases

Early defense mechanism: query sanitization.

Sanitization rule: queries must not depend on identifiers!

  SELECT SUM(Grade) WHERE Name = 'Isa'   →  3.7   (reveals Isa's grade directly)

Student table:

  Name     Age  Gender  Semester  Grade
  Alice    19   Female  1         1.3
  Bob      18   Male    1         2.0
  Charlie  18   Male    1         1.7
  Dave     18   Male    1         3.7
  Eve      17   Female  1         1.0
  Fritz    19   Male    3         1.3
  Gerd     21   Male    3         2.3
  Hans     23   Male    3         3.0
  Isa      20   Female  3         3.7
  John     20   Male    3         1.7
  Kale     21   Male    5         1.7
  Leonard  23   Male    5         failed
  Martin   20   Male    5         2.7
  Nils     22   Male    5         3.0
  Otto     20   Male    5         1.0
Attacks on Databases

Early defense mechanism: query sanitization (same student table as before).

Sanitization rules:
- Queries must not depend on identifiers!
- Queries must not be answered if the number of matching records is below a threshold.

  SELECT SUM(Grade) WHERE Semester = 3 AND Gender = Female   →  3.7

Only Isa matches this predicate, so the answer again reveals her grade.
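The threshold rule can be sketched directly. This is an illustration only: the threshold value of 3 and the tuple layout of the rows are assumptions for the example; the semester-3 rows are taken from the slide's student table.

```python
# Sketch of threshold-based query sanitization: an aggregate query is refused
# when fewer records match the predicate than a minimum result-set size.

def guarded_sum(rows, predicate, threshold=3):
    matching = [r[4] for r in rows if predicate(r) and r[4] is not None]
    if len(matching) < threshold:
        return None  # refuse: the answer would be based on too few individuals
    return round(sum(matching), 1)

# Semester-3 rows from the slide's student table:
students = [("Fritz", 19, "Male", 3, 1.3), ("Gerd", 21, "Male", 3, 2.3),
            ("Hans", 23, "Male", 3, 3.0), ("Isa", 20, "Female", 3, 3.7),
            ("John", 20, "Male", 3, 1.7)]

# Only Isa matches, so the query is refused:
print(guarded_sum(students, lambda r: r[3] == 3 and r[2] == "Female"))  # None
# A broad query over enough rows is answered:
print(guarded_sum(students, lambda r: r[3] == 3))  # 12.0
```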
Attacks on Databases

Even with both sanitization rules (no identifiers, minimum result-set size), a differencing attack succeeds (same student table as before):

  SELECT SUM(Grade)                                               →  30.1
  SELECT SUM(Grade) WHERE NOT (Semester = 3 AND Gender = Female)  →  26.4

Local computation: Isa's grade = 30.1 − 26.4 = 3.7
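The differencing attack can be replayed on the slide's table. A small sketch: 'failed' is modeled as a missing grade (None) and excluded from the sums, which matches the totals on the slide.

```python
# Replaying the differencing attack: both queries are "legal" (no identifier,
# many matching rows), yet their difference isolates Isa's grade.

students = [
    ("Alice", 19, "Female", 1, 1.3), ("Bob", 18, "Male", 1, 2.0),
    ("Charlie", 18, "Male", 1, 1.7), ("Dave", 18, "Male", 1, 3.7),
    ("Eve", 17, "Female", 1, 1.0), ("Fritz", 19, "Male", 3, 1.3),
    ("Gerd", 21, "Male", 3, 2.3), ("Hans", 23, "Male", 3, 3.0),
    ("Isa", 20, "Female", 3, 3.7), ("John", 20, "Male", 3, 1.7),
    ("Kale", 21, "Male", 5, 1.7), ("Leonard", 23, "Male", 5, None),  # failed
    ("Martin", 20, "Male", 5, 2.7), ("Nils", 22, "Male", 5, 3.0),
    ("Otto", 20, "Male", 5, 1.0),
]

def sum_grades(rows, predicate=lambda r: True):
    # SELECT SUM(Grade) WHERE <predicate>; 'failed' (None) is skipped.
    return round(sum(r[4] for r in rows if r[4] is not None and predicate(r)), 1)

total = sum_grades(students)
without_isa = sum_grades(students,
                         lambda r: not (r[3] == 3 and r[2] == "Female"))
print(total, without_isa, round(total - without_isa, 1))  # 30.1 26.4 3.7
```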
K-Anonymity (Intuitive Idea)

K-Anonymity: privacy means that one can hide within a set of (at least) K other people with the same quasi-identifiers.

Quasi-identifiers: attributes that could identify a person (name, age, etc.)

[Figure: a crowd of K = 6 indistinguishable people]
K-Anonymity (Definition)

Definition: Data satisfies K-Anonymity if each person contained in the data cannot be distinguished from at least K−1 other individuals also contained in the data.
Achieving K-Anonymity

Reduce the information such that records collapse into indistinguishable groups.

Suppression (replace values by *), shown for the semester-1 rows:

  Name  Age  Gender  Semester  Grade
  *     19   *       1         1.3
  *     18   *       1         2.0
  *     18   *       1         1.7
  *     18   *       1         3.7
  *     17   *       1         1.0

Generalization (replace values by coarser ranges), shown for the semester-5 rows:

  Age    Semester  Grade
  21-25  5         1.7
  21-25  5         failed
  18-20  5         2.7
  21-25  5         3.0
  18-20  5         1.0
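Suppression and generalization can be sketched as small transformations. Note the fixed five-year age bins starting at 16 are an assumption for illustration; the slide picks its ranges by hand.

```python
# Minimal sketch of the two operations: suppression replaces a value by '*',
# generalization maps an exact age onto a coarser interval.

def suppress(value):
    return "*"

def generalize_age(age, width=5, low=16):
    # Assumed fixed-width bins 16-20, 21-25, ... (the slide's ranges are
    # hand-chosen instead).
    lo = low + ((age - low) // width) * width
    return f"{lo}-{lo + width - 1}"

record = ("Kale", 21, "Male", 5, 1.7)
name, age, gender, semester, grade = record
print((suppress(name), generalize_age(age), suppress(gender), semester, grade))
# → ('*', '21-25', '*', 5, 1.7)
```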
K-Anonymity (3)

Example: K-Anonymity for a list of students with K = 5.

  Name  Semester  Grade
  *     1         1.3
  *     1         2.0
  *     1         1.7
  *     1         3.7
  *     1         1.0
  *     3         1.3
  *     3         2.3
  *     3         3.0
  *     3         failed
  *     3         1.7
  *     5         1.7
  *     5         failed
  *     5         2.7
  *     5         3.0
  *     5         1.0

For each semester, there are at least 5 individuals present that cannot be distinguished.

Idea/Goal: consequently, one cannot be identified, but hides in a group of K = 5 people.
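A k-anonymity check over such a table is straightforward to sketch. Assumption for the example: quasi-identifier columns are passed as tuple indices, and the table is the sanitized student list from the slide.

```python
# Sketch of a k-anonymity check: every combination of quasi-identifier values
# must occur in at least k records.

from collections import Counter

def is_k_anonymous(rows, quasi_ids, k):
    groups = Counter(tuple(r[i] for i in quasi_ids) for r in rows)
    return all(count >= k for count in groups.values())

# The slide's sanitized table: (Name, Semester, Grade), names suppressed.
table = [("*", 1, 1.3), ("*", 1, 2.0), ("*", 1, 1.7), ("*", 1, 3.7),
         ("*", 1, 1.0), ("*", 3, 1.3), ("*", 3, 2.3), ("*", 3, 3.0),
         ("*", 3, "failed"), ("*", 3, 1.7), ("*", 5, 1.7), ("*", 5, "failed"),
         ("*", 5, 2.7), ("*", 5, 3.0), ("*", 5, 1.0)]

print(is_k_anonymous(table, quasi_ids=(0, 1), k=5))  # True: 5 rows per semester
```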
Attacks on K-Anonymity – Homogeneity

One may still learn a lot about an individual if all k people in a group share the same sensitive value.

  Name  Semester  Grade
  *     1         1.3
  *     1         2.0
  *     1         1.7
  *     1         3.7
  *     1         1.0
  *     3         failed
  *     3         failed
  *     3         failed
  *     3         failed
  *     3         failed
  *     5         1.7
  *     5         failed
  *     5         2.7
  *     5         3.0
  *     5         1.0

This table satisfies K-Anonymity with K = 5.

But: if we know that a particular student, say Isa, is in the 3rd semester, then we immediately learn that she has failed the exam.
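The homogeneity attack works because a group can be k-anonymous while all of its sensitive values coincide. A simple check, a sketch in the spirit of "distinct l-diversity" (the column-index interface is an assumption for the example), counts distinct sensitive values per quasi-identifier group:

```python
# Minimum number of distinct sensitive values over all quasi-identifier
# groups; a value of 1 means some group is completely homogeneous.

from collections import defaultdict

def min_diversity(rows, quasi_ids, sensitive):
    groups = defaultdict(set)
    for r in rows:
        groups[tuple(r[i] for i in quasi_ids)].add(r[sensitive])
    return min(len(vals) for vals in groups.values())

# The slide's table: semester 3 is 5-anonymous but homogeneous ('failed').
table = [("*", 1, 1.3), ("*", 1, 2.0), ("*", 1, 1.7), ("*", 1, 3.7),
         ("*", 1, 1.0)] \
      + [("*", 3, "failed")] * 5 \
      + [("*", 5, 1.7), ("*", 5, "failed"), ("*", 5, 2.7), ("*", 5, 3.0),
         ("*", 5, 1.0)]

print(min_diversity(table, quasi_ids=(0, 1), sensitive=2))  # 1: homogeneous group
```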