CS573 Data Privacy and Security: Anonymization methods

CS573 Data Privacy and Security: Anonymization methods (Li Xiong)

Today
• Permutation-based anonymization methods (cont.)
• Other privacy principles for microdata publishing
• Statistical databases

Anonymization methods
• Non-perturbative: don't distort the data
  – Generalization
  – Suppression
• Perturbative: distort the data
  – Microaggregation/clustering
  – Additive noise
• Anatomization and permutation
  – Break the association between the quasi-identifiers (QID) and the sensitive attribute
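Three of the operations listed above can be illustrated on a single toy record. This is a minimal Python sketch, not the lecture's code: the helper names (`generalize_age`, `mask_zipcode`, `suppress`, `add_noise`), the 10-year age bins, the 3-digit zipcode prefix, and the Gaussian noise are hypothetical choices made for illustration.

```python
import random


def generalize_age(age: int, width: int = 10) -> str:
    """Generalization: replace an exact age by the `width`-year interval containing it."""
    low = (age // width) * width
    return f"[{low},{low + width - 1}]"


def mask_zipcode(zipcode: str, keep: int = 3) -> str:
    """Generalization: keep the first `keep` digits and mask the rest with '*'."""
    return zipcode[:keep] + "*" * (len(zipcode) - keep)


def suppress(_value: str) -> str:
    """Suppression: withhold the value entirely."""
    return "*"


def add_noise(value: float, scale: float, rng: random.Random) -> float:
    """Additive noise (perturbative): add zero-mean Gaussian noise with std. dev. `scale`."""
    return value + rng.gauss(0.0, scale)


# Hypothetical record; the fixed seed only makes the noisy output reproducible.
record = {"age": 27, "zipcode": "47678", "salary": 13000.0}
rng = random.Random(42)
print(generalize_age(record["age"]))                      # [20,29]
print(mask_zipcode(record["zipcode"]))                    # 476**
print(suppress(record["zipcode"]))                        # *
print(add_noise(record["salary"], scale=500.0, rng=rng))  # salary plus random noise
```

Generalization and suppression keep every released value truthful but coarser, while additive noise keeps the value's precision but makes it false for the individual.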

Concept of the Anatomy Algorithm
• Release two tables: a quasi-identifier table (QIT) and a sensitive table (ST)
• Use the same QI-groups (which satisfy l-diversity), but replace the sensitive attribute values with a Group-ID column
• Then produce a sensitive table with the Disease statistics (the count of each disease in each group)

QIT
tuple ID | Age | Sex | Zipcode | Group-ID
1 | 23 | M | 11000 | 1
2 | 27 | M | 13000 | 1
3 | 35 | M | 59000 | 1
4 | 59 | M | 12000 | 1
5 | 61 | F | 54000 | 2
6 | 65 | F | 25000 | 2
7 | 65 | F | 25000 | 2
8 | 70 | F | 30000 | 2

ST
Group-ID | Disease | Count
1 | headache | 2
1 | pneumonia | 2
2 | bronchitis | 1
2 | flu | 2
2 | stomach ache | 1

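Specifications of Anatomy
• Definition 3 (Anatomy). Given an l-diverse partition of T into QI-groups $QI_1, \dots, QI_m$, anatomy creates a QIT and an ST:
  – The QIT has schema $(A^{qi}_1, \dots, A^{qi}_d, \text{Group-ID})$; for each QI-group $QI_j$ and each tuple $t \in QI_j$, it contains the row $(t[1], \dots, t[d], j)$
  – The ST has schema $(\text{Group-ID}, A^s, \text{Count})$; for each QI-group $QI_j$ and each distinct sensitive value $v$ in $QI_j$, it contains the row $(j, v, c_j(v))$, where $c_j(v)$ is the number of tuples in $QI_j$ with sensitive value $v$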
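Definition 3 translates almost directly into code. A minimal Python sketch, assuming the l-diverse partition is already given as a list of QI-groups whose members are dict records; the function names `anatomize` and `row` and the record layout are hypothetical. The sample data are the eight tuples of table 1 (from the Preserving Data Correlation slides), with tuples 1 to 4 in group 1 and tuples 5 to 8 in group 2.

```python
from collections import Counter


def anatomize(groups, qi_attrs, sensitive_attr):
    """Build the QIT and ST from an l-diverse partition (Definition 3).

    groups: list of QI-groups, each a list of records (dicts); group j is groups[j - 1].
    QIT rows: (t[1], ..., t[d], j) for every tuple t in QI-group j.
    ST rows:  (j, v, c_j(v)) for every distinct sensitive value v in QI-group j.
    """
    qit, st = [], []
    for j, group in enumerate(groups, start=1):
        for t in group:
            # Exact QI values are published, followed by the Group-ID.
            qit.append(tuple(t[a] for a in qi_attrs) + (j,))
        # c_j(v): number of tuples in group j whose sensitive value is v.
        for v, c in Counter(t[sensitive_attr] for t in group).items():
            st.append((j, v, c))
    return qit, st


def row(age, sex, zipcode, disease):
    return {"age": age, "sex": sex, "zipcode": zipcode, "disease": disease}


T = [
    row(23, "M", 11000, "pneumonia"), row(27, "M", 13000, "dyspepsia"),
    row(35, "M", 59000, "dyspepsia"), row(59, "M", 12000, "pneumonia"),
    row(61, "F", 54000, "flu"), row(65, "F", 25000, "stomach pain"),
    row(65, "F", 25000, "flu"), row(70, "F", 30000, "bronchitis"),
]
qit, st = anatomize([T[:4], T[4:]], ["age", "sex", "zipcode"], "disease")
print(qit[0])  # (23, 'M', 11000, 1)
print(st)      # [(1, 'pneumonia', 2), (1, 'dyspepsia', 2), (2, 'flu', 2), (2, 'stomach pain', 1), (2, 'bronchitis', 1)]
```

Note that the QIT carries no sensitive values and the ST carries no QI values; only the Group-ID links them.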

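Privacy properties
• Theorem 1. Given a pair of QIT and ST, the probability of inferring the sensitive value of any individual is at most 1/l
• Joining the QIT and ST on Group-ID shows everything an adversary can learn; e.g., the individual (23, M, 11000) has dyspepsia or pneumonia, each with probability 2/4 = 1/2

Age | Sex | Zipcode | Group-ID | Disease | Count
23 | M | 11000 | 1 | dyspepsia | 2
23 | M | 11000 | 1 | pneumonia | 2
27 | M | 13000 | 1 | dyspepsia | 2
27 | M | 13000 | 1 | pneumonia | 2
35 | M | 59000 | 1 | dyspepsia | 2
35 | M | 59000 | 1 | pneumonia | 2
59 | M | 12000 | 1 | dyspepsia | 2
59 | M | 12000 | 1 | pneumonia | 2
61 | F | 54000 | 2 | bronchitis | 1
61 | F | 54000 | 2 | flu | 2
61 | F | 54000 | 2 | stomachache | 1
65 | F | 25000 | 2 | bronchitis | 1
65 | F | 25000 | 2 | flu | 2
65 | F | 25000 | 2 | stomachache | 1
65 | F | 25000 | 2 | bronchitis | 1
65 | F | 25000 | 2 | flu | 2
65 | F | 25000 | 2 | stomachache | 1
70 | F | 30000 | 2 | bronchitis | 1
70 | F | 30000 | 2 | flu | 2
70 | F | 30000 | 2 | stomachache | 1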
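Theorem 1 can be checked numerically on this example: within QI-group j, an adversary who knows an individual's QI-values can do no better than assign each sensitive value v the probability c_j(v)/|QI_j|. A minimal Python sketch using the QIT and ST of this slide (tuple IDs omitted); the function name `inference_probabilities` is hypothetical. In this partition the most frequent value in each group has frequency 1/2, so the partition is 2-diverse and the bound is 1/2.

```python
from collections import Counter, defaultdict


def inference_probabilities(qit, st):
    """For each QIT row, map each candidate sensitive value v to c_j(v) / |QI_j|,
    where j is the row's Group-ID (its last field)."""
    group_size = Counter(row[-1] for row in qit)
    counts = defaultdict(dict)
    for j, v, c in st:
        counts[j][v] = c
    return [{v: c / group_size[row[-1]] for v, c in counts[row[-1]].items()} for row in qit]


qit = [
    (23, "M", 11000, 1), (27, "M", 13000, 1), (35, "M", 59000, 1), (59, "M", 12000, 1),
    (61, "F", 54000, 2), (65, "F", 25000, 2), (65, "F", 25000, 2), (70, "F", 30000, 2),
]
st = [(1, "dyspepsia", 2), (1, "pneumonia", 2),
      (2, "bronchitis", 1), (2, "flu", 2), (2, "stomachache", 1)]

probs = inference_probabilities(qit, st)
print(probs[0])  # {'dyspepsia': 0.5, 'pneumonia': 0.5}
# Worst case over all individuals; for this 2-diverse partition it should not exceed 1/2.
print(max(max(p.values()) for p in probs))  # 0.5
```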

Comparison with generalization
• Compare with generalization under two assumptions:
  – A1: the adversary knows the QI-values of the target individual
  – A2: the adversary also knows that the individual is definitely in the microdata
• If A1 and A2 are true, anatomy is as good as generalization: the 1/l bound holds
• If A1 is true and A2 is false, generalization is stronger
• If A1 and A2 are both false, generalization is still stronger

Preserving Data Correlation
• Examine the correlation between Age and Disease in T using a probability density function (pdf)
• Example: tuple t1 (Bob)

Table 1
tuple ID | Age | Sex | Zipcode | Disease
1 (Bob) | 23 | M | 11000 | pneumonia
2 | 27 | M | 13000 | dyspepsia
3 | 35 | M | 59000 | dyspepsia
4 | 59 | M | 12000 | pneumonia
5 | 61 | F | 54000 | flu
6 | 65 | F | 25000 | stomach pain
7 (Alice) | 65 | F | 25000 | flu
8 | 70 | F | 30000 | bronchitis

Preserving Data Correlation
• To reconstruct an approximate pdf of t1 from the generalized table:

Table 2
tuple ID | Age | Sex | Zipcode | Disease
1 | [21,60] | M | [10001, 60000] | pneumonia
2 | [21,60] | M | [10001, 60000] | dyspepsia
3 | [21,60] | M | [10001, 60000] | dyspepsia
4 | [21,60] | M | [10001, 60000] | pneumonia
5 | [61,70] | F | [10001, 60000] | flu
6 | [61,70] | F | [10001, 60000] | stomach pain
7 | [61,70] | F | [10001, 60000] | flu
8 | [61,70] | F | [10001, 60000] | bronchitis
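A sketch of this reconstruction, under an assumption the slide does not state: Age is treated as an integer spread uniformly over its generalized interval, and t1's disease follows the disease frequencies of its generalized group. Sex and Zipcode are ignored because the pdf is over Age and Disease only. The function name `pdf_from_generalization` is hypothetical.

```python
from collections import Counter
from fractions import Fraction


def pdf_from_generalization(age_range, diseases):
    """Approximate pdf of a tuple over (Age, Disease) from its generalized QI-group.

    age_range: inclusive integer interval (low, high) published for Age.
    diseases:  the Disease values of all tuples in the group.
    Assumes every integer age in the interval is equally likely and the tuple's
    disease follows the group's disease frequencies.
    """
    low, high = age_range
    n_ages = high - low + 1
    freq = Counter(diseases)
    return {
        (age, d): Fraction(1, n_ages) * Fraction(c, len(diseases))
        for age in range(low, high + 1)
        for d, c in freq.items()
    }


# t1's group in table 2: Age [21,60] and the diseases of tuples 1-4.
pdf_t1 = pdf_from_generalization((21, 60), ["pneumonia", "dyspepsia", "dyspepsia", "pneumonia"])
print(len(pdf_t1))                # 80 points: 40 ages x 2 diseases
print(pdf_t1[(23, "pneumonia")])  # 1/80
```

Under this assumption the true point (23, pneumonia) keeps only 1/80 of the probability mass; the rest is smeared across ages t1 does not have.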

Preserving Data Correlation
• To reconstruct an approximate pdf of t1 from the QIT and ST tables:

QIT
tuple ID | Age | Sex | Zipcode | Group-ID
1 | 23 | M | 11000 | 1
2 | 27 | M | 13000 | 1
3 | 35 | M | 59000 | 1
4 | 59 | M | 12000 | 1
5 | 61 | F | 54000 | 2
6 | 65 | F | 25000 | 2
7 | 65 | F | 25000 | 2
8 | 70 | F | 30000 | 2

ST
Group-ID | Disease | Count
1 | headache | 2
1 | pneumonia | 2
2 | bronchitis | 1
2 | flu | 2
2 | stomach ache | 1
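Because anatomy publishes QI-values without generalizing them, t1's age is known exactly and only its disease is uncertain. A minimal Python sketch of the reconstruction from the QIT and ST on this slide; the function name `pdf_from_anatomy` and the row layout (Age first, Group-ID last) are assumptions for illustration.

```python
from fractions import Fraction


def pdf_from_anatomy(qit, st, tuple_index):
    """Approximate pdf of a tuple over (Age, Disease) from QIT and ST.

    The tuple's Age is published exactly in the QIT (first field); its Disease is v
    with probability c_j(v) / |QI_j|, where j is its Group-ID (last field).
    """
    age, group_id = qit[tuple_index][0], qit[tuple_index][-1]
    size = sum(1 for row in qit if row[-1] == group_id)
    return {(age, v): Fraction(c, size) for j, v, c in st if j == group_id}


qit = [
    (23, "M", 11000, 1), (27, "M", 13000, 1), (35, "M", 59000, 1), (59, "M", 12000, 1),
    (61, "F", 54000, 2), (65, "F", 25000, 2), (65, "F", 25000, 2), (70, "F", 30000, 2),
]
st = [(1, "headache", 2), (1, "pneumonia", 2),
      (2, "bronchitis", 1), (2, "flu", 2), (2, "stomach ache", 1)]
print(pdf_from_anatomy(qit, st, 0))
# {(23, 'headache'): Fraction(1, 2), (23, 'pneumonia'): Fraction(1, 2)}
```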

Preserving Data Correlation
• For a more rigorous comparison, compute the L2 distance between the actual pdf $G_t$ of a tuple t and its reconstructed pdf $\tilde{G}_t$:
  $\mathrm{Err}(t) = \int \big(G_t(x) - \tilde{G}_t(x)\big)^2 \, dx$
• For t1, the distance is 0.5 for anatomy and 22.5 for generalization
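For discrete pdfs the L2 distance becomes a sum of squared differences over points. A minimal Python sketch that reproduces the anatomy figure of 0.5 for t1: the actual pdf puts all mass on (23, pneumonia), while the anatomized tables split it evenly between the two diseases of t1's group (dyspepsia here, per table 1; the name of the second disease does not affect the result). It covers only the anatomy case. The function name `l2_error` is hypothetical.

```python
from fractions import Fraction


def l2_error(actual, reconstructed):
    """Squared L2 distance between two discrete pdfs given as {point: probability}.
    Points absent from a dict have probability 0."""
    points = set(actual) | set(reconstructed)
    return sum((actual.get(x, 0) - reconstructed.get(x, 0)) ** 2 for x in points)


actual_t1 = {(23, "pneumonia"): Fraction(1)}
anatomy_t1 = {(23, "pneumonia"): Fraction(1, 2), (23, "dyspepsia"): Fraction(1, 2)}
print(l2_error(actual_t1, anatomy_t1))  # 1/2, i.e. (1 - 1/2)^2 + (0 - 1/2)^2
```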

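Preserving Data Correlation
• Idea: measure the error of each tuple t as the L2 distance Err(t) between its actual and reconstructed pdfs
• Objective: minimize the re-construction error (RCE) summed over all tuples t in T: $\mathrm{RCE} = \sum_{t \in T} \mathrm{Err}(t)$
• Algorithm: Nearly-Optimal Anatomizing Algorithm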
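Because anatomy publishes QI-values exactly, a tuple's error depends only on the sensitive-value counts of its group: with n tuples, counts c_v, and t's own value s, Err(t) = (1 − c_s/n)² + Σ_{v≠s} (c_v/n)², which equals 1 − 1/l when the group holds l distinct values once each. The Python sketch below computes RCE this way and forms groups by repeatedly taking one tuple from each of the l largest sensitive-value buckets. That grouping rule is an assumption in the spirit of the Nearly-Optimal Anatomizing Algorithm, not a reproduction of it; in particular, leftover tuples are returned rather than assigned. All function names are hypothetical.

```python
from collections import Counter, defaultdict
from fractions import Fraction


def err_anatomy(group, sensitive, t):
    """Err(t) for tuple t in an anatomized group: sum over v of (1[v = t's value] - c_v/n)^2."""
    n = len(group)
    counts = Counter(u[sensitive] for u in group)
    return sum((int(v == t[sensitive]) - Fraction(c, n)) ** 2 for v, c in counts.items())


def rce(groups, sensitive):
    """Reconstruction error: sum of Err(t) over all tuples."""
    return sum(err_anatomy(g, sensitive, t) for g in groups for t in g)


def bucket_grouping(tuples, sensitive, l):
    """Hypothetical grouping: while at least l sensitive-value buckets are non-empty,
    form a group from one tuple of each of the l largest buckets.
    Returns (groups, leftover tuples); leftovers are not assigned here."""
    buckets = defaultdict(list)
    for t in tuples:
        buckets[t[sensitive]].append(t)
    groups = []
    while sum(1 for b in buckets.values() if b) >= l:
        largest = sorted((v for v in buckets if buckets[v]),
                         key=lambda v: len(buckets[v]), reverse=True)[:l]
        groups.append([buckets[v].pop() for v in largest])
    return groups, [t for b in buckets.values() for t in b]


# Diseases of tuples 1-8 in table 1 (QI attributes do not affect Err under anatomy).
T = [{"id": i + 1, "disease": d} for i, d in enumerate(
    ["pneumonia", "dyspepsia", "dyspepsia", "pneumonia",
     "flu", "stomach pain", "flu", "bronchitis"])]

print(rce([T[:4], T[4:]], "disease"))  # 9/2 for the two groups of four
groups, leftovers = bucket_grouping(T, "disease", l=2)
print(len(groups), leftovers, rce(groups, "disease"))  # 4 [] 4
```

On this toy table, four groups of two distinct diseases lower RCE from 9/2 to 4, illustrating why smaller groups with evenly spread sensitive values preserve correlation better.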

Experiments
• Dataset: CENSUS, containing the personal information of 500k American adults, with 9 discrete attributes
• Created two sets of microdata tables:
  – Set 1: 5 tables denoted OCC-3, ..., OCC-7, where OCC-d (3 ≤ d ≤ 7) uses the first d attributes as QI-attributes and Occupation as the sensitive attribute $A^s$
  – Set 2: 5 tables denoted SAL-3, ..., SAL-7, where SAL-d (3 ≤ d ≤ 7) uses the first d attributes as QI-attributes and Salary-class as the sensitive attribute $A^s$

Experiments [figure]

Today
• Permutation-based anonymization methods (cont.)
• Other privacy principles for microdata publishing
• Statistical databases
• Differential privacy

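Attacks on k-Anonymity
• k-Anonymity does not provide privacy if:
  – Sensitive values in an equivalence class lack diversity (homogeneity attack)
  – The attacker has background knowledge (background knowledge attack)

A 3-anonymous patient table
Zipcode | Age | Disease
476** | 2* | Heart Disease
476** | 2* | Heart Disease
476** | 2* | Heart Disease
4790* | ≥40 | Flu
4790* | ≥40 | Heart Disease
4790* | ≥40 | Cancer
476** | 3* | Heart Disease
476** | 3* | Cancer
476** | 3* | Cancer

• Homogeneity attack: Bob (Zipcode 47678, Age 27) falls in the 476**/2* class, where every record has Heart Disease
• Background knowledge attack: Carl (Zipcode 47673, Age 36) falls in the 476**/3* class; an attacker whose background knowledge rules out heart disease for Carl concludes that Carl has cancer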
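Both attacks start by matching a target's exact QI-values against the released equivalence classes. A minimal Python sketch, assuming a simple matcher where '*' stands for any single digit; range labels such as '≥40' are not handled by it. The function names are hypothetical. The background-knowledge step for Carl (ruling out heart disease) happens outside the code.

```python
from collections import defaultdict


def matches(pattern: str, value: str) -> bool:
    """True if `value` fits a masked pattern such as '476**' or '2*' ('*' = any digit).
    Range labels such as '≥40' are not handled by this simple matcher."""
    return len(pattern) == len(value) and all(p in ("*", c) for p, c in zip(pattern, value))


def equivalence_classes(rows):
    """Group released rows (zipcode, age, disease) by their generalized QI values."""
    classes = defaultdict(list)
    for zipcode, age, disease in rows:
        classes[(zipcode, age)].append(disease)
    return dict(classes)


def candidate_values(classes, zipcode: str, age: str):
    """Sensitive values an attacker who knows the exact QI values cannot rule out."""
    return sorted({d for (z, a), ds in classes.items()
                   if matches(z, zipcode) and matches(a, age) for d in ds})


table = [
    ("476**", "2*", "Heart Disease"), ("476**", "2*", "Heart Disease"), ("476**", "2*", "Heart Disease"),
    ("4790*", "≥40", "Flu"), ("4790*", "≥40", "Heart Disease"), ("4790*", "≥40", "Cancer"),
    ("476**", "3*", "Heart Disease"), ("476**", "3*", "Cancer"), ("476**", "3*", "Cancer"),
]
classes = equivalence_classes(table)
print(candidate_values(classes, "47678", "27"))  # Bob:  ['Heart Disease']
print(candidate_values(classes, "47673", "36"))  # Carl: ['Cancer', 'Heart Disease']
```

A single candidate value, as for Bob, is exactly the homogeneity attack; for Carl, background knowledge must eliminate one of the two candidates.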

l-Diversity [Machanavajjhala et al. ICDE '06]
• Sensitive attributes must be "diverse" within each quasi-identifier equivalence class

Caucas | 787XX | Flu
Caucas | 787XX | Shingles
Caucas | 787XX | Acne
Caucas | 787XX | Flu
Caucas | 787XX | Acne
Caucas | 787XX | Flu
Asian/AfrAm | 78XXX | Flu
Asian/AfrAm | 78XXX | Flu
Asian/AfrAm | 78XXX | Acne
Asian/AfrAm | 78XXX | Shingles
Asian/AfrAm | 78XXX | Acne
Asian/AfrAm | 78XXX | Flu

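Distinct l-Diversity
• Each equivalence class has at least l well-represented sensitive values
• Doesn't prevent probabilistic inference attacks
  – Example: an equivalence class of 10 records in which 8 records have HIV and 2 records have other values may satisfy distinct l-diversity for small l, yet an attacker infers HIV with probability 8/10 = 80%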
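The gap between "enough distinct values" and "low inference probability" shows up directly when both are computed for the 10-record example. A minimal Python sketch; the function names are hypothetical, and the labels "flu" and "cold" are assumed stand-ins for the two unspecified other values.

```python
from collections import Counter


def is_distinct_l_diverse(values, l):
    """Distinct l-diversity: at least l distinct sensitive values in the class."""
    return len(set(values)) >= l


def max_inference_probability(values):
    """Probability of the most frequent sensitive value in the class."""
    return max(Counter(values).values()) / len(values)


# 10 records: 8 with HIV and 2 with other values ("flu" and "cold" are hypothetical labels).
ec = ["HIV"] * 8 + ["flu", "cold"]
print(is_distinct_l_diverse(ec, 3))   # True
print(max_inference_probability(ec))  # 0.8
```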

Other Versions of l-Diversity
• Probabilistic l-diversity
  – The frequency of the most frequent value in an equivalence class is bounded by 1/l
• Entropy l-diversity
  – The entropy of the distribution of sensitive values in each equivalence class is at least log(l)
• Recursive (c,l)-diversity
  – $r_1 < c\,(r_l + r_{l+1} + \dots + r_m)$, where $r_i$ is the frequency of the i-th most frequent value
  – Intuition: the most frequent value does not appear too frequently
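The three variants can be written as checks on one equivalence class's list of sensitive values. A minimal Python sketch; the function names and the floating-point tolerance in the entropy check are assumptions, and the "skewed" class reuses the hypothetical 8-HIV example from the distinct l-diversity slide. Counts are used instead of frequencies in the recursive check, which does not change the inequality.

```python
import math
from collections import Counter


def is_probabilistic_l_diverse(values, l):
    """Most frequent value has frequency at most 1/l."""
    return max(Counter(values).values()) / len(values) <= 1 / l


def is_entropy_l_diverse(values, l, tol=1e-12):
    """Entropy of the sensitive-value distribution is at least log(l)
    (tol absorbs floating-point rounding when the two are equal)."""
    n = len(values)
    entropy = -sum((c / n) * math.log(c / n) for c in Counter(values).values())
    return entropy >= math.log(l) - tol


def is_recursive_cl_diverse(values, c, l):
    """r_1 < c * (r_l + r_{l+1} + ... + r_m), with r_i the i-th largest count."""
    r = sorted(Counter(values).values(), reverse=True)
    if len(r) < l:  # fewer than l distinct values: the right-hand side is 0
        return False
    return r[0] < c * sum(r[l - 1:])


skewed = ["HIV"] * 8 + ["flu", "cold"]  # hypothetical labels for the 2 other values
balanced = ["HIV", "flu", "cold"] * 2
for name, ec in [("skewed", skewed), ("balanced", balanced)]:
    print(name, is_probabilistic_l_diverse(ec, 3), is_entropy_l_diverse(ec, 3),
          is_recursive_cl_diverse(ec, c=2, l=3))
# skewed False False False
# balanced True True True
```

All three variants reject the skewed class that distinct 3-diversity accepts.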

Neither Necessary, Nor Sufficient
Original dataset
Cancer
Cancer
Cancer
Flu
Cancer
Cancer
Cancer
Cancer
Cancer
Cancer
Flu
Flu
• 99% have cancer

Neither Necessary, Nor Sufficient
Original dataset | Anonymization A
Cancer | Q1 Flu
Cancer | Q1 Flu
Cancer | Q1 Cancer
Flu | Q1 Flu
Cancer | Q1 Cancer
Cancer | Q1 Cancer
Cancer | Q2 Cancer
Cancer | Q2 Cancer
Cancer | Q2 Cancer
Cancer | Q2 Cancer
Flu | Q2 Cancer
Flu | Q2 Cancer
• Original dataset: 99% have cancer
• Anonymization A: 50% cancer in Q1 ⇒ the quasi-identifier group is "diverse"

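Neither Necessary, Nor Sufficient
Original dataset | Anonymization A | Anonymization B
Cancer | Q1 Flu | Q1 Flu
Cancer | Q1 Flu | Q1 Cancer
Cancer | Q1 Cancer | Q1 Cancer
Flu | Q1 Flu | Q1 Cancer
Cancer | Q1 Cancer | Q1 Cancer
Cancer | Q1 Cancer | Q1 Cancer
Cancer | Q2 Cancer | Q2 Cancer
Cancer | Q2 Cancer | Q2 Cancer
Cancer | Q2 Cancer | Q2 Cancer
Cancer | Q2 Cancer | Q2 Cancer
Flu | Q2 Cancer | Q2 Flu
Flu | Q2 Cancer | Q2 Flu
• Original dataset: 99% have cancer
• Anonymization A: 50% cancer ⇒ quasi-identifier group is "diverse"
• Anonymization B: 99% cancer ⇒ quasi-identifier group is not "diverse"
• A's "diverse" group tells an attacker that its members are far less likely (50%) to have cancer than the 99% base rate, so it leaks information; B's groups match the base rate and reveal little. Hence l-diversity is neither necessary nor sufficient for privacy.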
