Towards Plausible Graph Anonymization Yang Zhang, Mathias Humbert, Bartlomiej Surma, Praveen Manoharan, Jilles Vreeken, Michael Backes

Graph sharing

Graph anonymization
[Figure: example graph with nodes id 1 to id 8]

Our work
▪ Find a fundamental flaw in graph anonymization designs
▪ Exploit it to recover original graph
▪ Use our findings to enhance anonymization designs
▪ Evaluate privacy and usability of enhanced techniques on 3 real-life datasets: Enron, NO, SNAP

Graph anonymization methods
▪ ’08 Liu et al. - k-anonymity (k-DA)
▪ ’08 Zhou et al. - k-anonymity (k-NA)
▪ ’10 Cheng et al. - k-anonymity (k-iso)
▪ ’11 Sala et al. - differential privacy
▪ ’12 Mittal et al. - random walk privacy
▪ ’14 Xiao et al. - differential privacy

k-DA algorithm
[Figure: example graph with nodes id 1 to id 8 and its degree histogram (x-axis: node degree, y-axis: # nodes), shown before and after 2-DA]
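The degree histograms on this slide illustrate the property that k-DA enforces: every degree value that occurs in the graph must be shared by at least k nodes. A minimal sketch of that check, assuming an undirected graph given as an edge list (so isolated nodes are not represented); the function names and the toy graphs are illustrative and do not come from the slides.

```python
from collections import Counter

def degree_sequence(edges):
    """Return {node: degree} for an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)

def is_k_degree_anonymous(edges, k):
    """True if every degree value that occurs is shared by at least k nodes."""
    histogram = Counter(degree_sequence(edges).values())
    return all(count >= k for count in histogram.values())

# Hypothetical toy graphs (not the graph drawn on the slide).
path = [(1, 2), (2, 3), (3, 4)]   # degrees 1, 2, 2, 1
star = [(1, 2), (1, 3), (1, 4)]   # degrees 3, 1, 1, 1

print(is_k_degree_anonymous(path, 2))
print(is_k_degree_anonymous(star, 2))
```

The star fails the check because its centre is the only node of degree 3. An anonymizer would have to change edges until no degree value is unique; this sketch only tests the property and does not perform that step.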

SalaDP algorithm
[Figure: original graph (nodes id 1 to id 8) → dK-2 series → ε-DP → perturbed dK-2 series → anonymized graph (nodes id 1 to id 8)]
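The pipeline on this slide starts from the dK-2 series, which counts, for each pair of degrees, how many edges connect a node of one degree to a node of the other. The sketch below computes that series and adds Laplace noise to each count. It rests on assumptions: the function names are illustrative, `scale` is a free parameter here (how the noise is calibrated to ε is not given on the slides), and rebuilding a graph from the perturbed series is left out.

```python
import random
from collections import Counter

def dk2_series(edges):
    """Count edges per (degree, degree) pair in an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    series = Counter()
    for u, v in edges:
        # Sort the pair so (1, 2) and (2, 1) land in the same entry.
        key = tuple(sorted((deg[u], deg[v])))
        series[key] += 1
    return dict(series)

def perturb_dk2(series, scale, seed=0):
    """Add Laplace(0, scale) noise to every entry (scale is an assumption)."""
    rng = random.Random(seed)
    noisy = {}
    for key, count in series.items():
        # The difference of two exponentials with rate 1/scale is Laplace(0, scale).
        noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        noisy[key] = count + noise
    return noisy

# Hypothetical toy graph: the path 1-2-3-4.
path = [(1, 2), (2, 3), (3, 4)]
series = dk2_series(path)
noisy = perturb_dk2(series, scale=1.0)
print(series)
```

The noisy counts are real numbers and may be negative, so a practical pipeline would need a rounding or clamping step before generating a graph from them.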

Social network graph properties
[Figure: example graph with nodes id 1 to id 8]

Graph recovery attack - overview

Graph recovery attack - graph embedding
▪ Node embeddings with node2vec (Grover and Leskovec ’16)
▪ Mapping users into continuous vector space
▪ User’s vector reflects structural properties

Graph recovery attack - graph embedding
▪ Plausibility is cosine similarity between embeddings
[Figure: histogram of edge plausibility for original edges and fake edges (x-axis: edge plausibility, −0.2 to 1.0; y-axis: number of edges, ×10^4)]
[Figure: AUC on Enron, NO and SNAP for Cosine, Euclidean, Bray-Curtis, Embeddedness, Jaccard and Adamic-Adar]
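Edge plausibility as defined on this slide is the cosine similarity between the embedding vectors of an edge's two endpoints. A minimal sketch, assuming the embeddings are already available as a dictionary from node id to a list of floats; the two-dimensional vectors below are made up for illustration and are not node2vec output.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)

def edge_plausibility(edges, embeddings):
    """Map each edge (u, v) to the cosine similarity of its endpoints."""
    return {(u, v): cosine_similarity(embeddings[u], embeddings[v])
            for u, v in edges}

# Hypothetical embeddings: nodes 1 and 2 point the same way, node 3 is orthogonal.
embeddings = {1: [1.0, 0.0], 2: [2.0, 0.0], 3: [0.0, 1.0]}
scores = edge_plausibility([(1, 2), (1, 3)], embeddings)
print(scores)
```

Under this measure, an edge between structurally similar nodes scores close to 1, while an edge between unrelated nodes scores near 0 or below.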

Graph recovery attack - graph embedding
▪ Find a cutoff point and remove non-plausible edges
[Figure: histogram of edge plausibility for original edges and fake edges, with the F1 score of the cutoff (x-axis: edge plausibility, −0.2 to 1.0; y-axis: number of edges, ×10^4)]
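The cutoff step can be sketched as a threshold sweep: edges whose plausibility falls below the cutoff are treated as fake and removed. The sketch below picks the cutoff that maximizes F1 for detecting fake edges. This assumes ground-truth labels are available, which holds only in an evaluation setting; how an attacker chooses the cutoff without labels is not shown on this slide and is not covered here. All names and toy values are illustrative.

```python
def f1_at_cutoff(plausibility, is_fake, cutoff):
    """F1 of predicting 'fake' for every edge with plausibility < cutoff."""
    tp = fp = fn = 0
    for edge, score in plausibility.items():
        predicted_fake = score < cutoff
        if predicted_fake and is_fake[edge]:
            tp += 1
        elif predicted_fake:
            fp += 1
        elif is_fake[edge]:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def best_cutoff(plausibility, is_fake):
    """Try each observed score as a cutoff and keep the one with the highest F1."""
    candidates = sorted(set(plausibility.values()))
    return max(candidates, key=lambda c: f1_at_cutoff(plausibility, is_fake, c))

def recover(plausibility, cutoff):
    """Keep only the edges at or above the cutoff."""
    return [edge for edge, score in plausibility.items() if score >= cutoff]

# Hypothetical scores and labels for four edges.
plausibility = {(1, 2): 0.9, (2, 3): 0.8, (1, 4): 0.1, (3, 5): 0.2}
is_fake = {(1, 2): False, (2, 3): False, (1, 4): True, (3, 5): True}
cutoff = best_cutoff(plausibility, is_fake)
print(cutoff, recover(plausibility, cutoff))
```

In this toy case the two low-scoring edges are cleanly separated from the rest, so the sweep finds a cutoff with perfect F1; on real data the two histograms overlap and the cutoff trades precision against recall.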

Enhancing anonymization
▪ get fake edges with highest plausibility?
▪ the distribution will look unnatural
▪ draw fake edges from same plausibility distribution?
[Figure: k-DA (k=100) compared with Enhanced k-DA (k=100)]
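One way to read the last question on this slide is as histogram matching: pick fake edges so that their plausibility scores fall into bins with the same frequencies as the original edges. The sketch below is an assumption about how that could be done, not the authors' procedure. It ignores the k-DA or differential privacy constraints the added edges must also satisfy, and the function names, bin count and toy scores are hypothetical.

```python
import random
from collections import defaultdict

def bin_index(score, n_bins):
    """Map a cosine score in [-1, 1] to one of n_bins equal-width bins."""
    idx = int((score + 1.0) / 2.0 * n_bins)
    return min(max(idx, 0), n_bins - 1)

def sample_fake_edges(original_scores, candidate_scores, n_fake, n_bins=10, seed=0):
    """Pick candidate edges whose scores follow the original edges' histogram."""
    rng = random.Random(seed)
    original_bins = [bin_index(s, n_bins) for s in original_scores]
    by_bin = defaultdict(list)
    for edge, score in candidate_scores.items():
        by_bin[bin_index(score, n_bins)].append(edge)
    chosen = []
    attempts = 0
    # Bounded loop: a bin may run out of candidates before n_fake is reached.
    while len(chosen) < n_fake and attempts < 100 * n_fake:
        attempts += 1
        b = rng.choice(original_bins)  # bins drawn in proportion to original edges
        if by_bin[b]:
            chosen.append(by_bin[b].pop(rng.randrange(len(by_bin[b]))))
    return chosen

# Hypothetical scores: original edges are all highly plausible.
original_scores = [0.9, 0.85, 0.95]
candidate_scores = {(1, 5): 0.92, (2, 6): 0.1, (3, 7): 0.88}
fake = sample_fake_edges(original_scores, candidate_scores, n_fake=2)
print(sorted(fake))
```

The low-plausibility candidate (2, 6) is never chosen because no original edge falls into its bin, which is the effect the slide is after: fake edges that cannot be separated from original ones by a plausibility cutoff.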

Resilience to graph recovery attack
▪ F1 score for original anonymizations
k-DA drops by: 26~51%
SalaDP drops by: 37~48%
▪ F1 score for enhanced anonymizations

Utility of Enhanced anonymization
[Figure: scatter plot of utility of G_F (y-axis, 0.6 to 1.0) against utility of G_A (x-axis, 0.6 to 1.0); series: eigencentrality, degree distribution and triangle count, each on Enron, NO and SNAP]

Resilience to deanonymization attack
[Figure: anonymity gain (%), 0 to 30, on Enron, NO and SNAP for k-DA (k = 50, 75, 100) and SalaDP (ε = 100, 50, 10)]

Conclusion
▪ We find flaws in current graph anonymizations
▪ We recover the original, pre-anonymized graph
▪ We enhance the anonymization techniques
▪ We evaluate privacy and utility of enhanced anonymization
