Towards Plausible Graph Anonymization Yang Zhang, Mathias Humbert, Bartlomiej Surma, Praveen Manoharan, Jilles Vreeken, Michael Backes
Graph sharing
Graph anonymization
[Figure: example graph with eight nodes, id 1 to id 8]
Our work
▪ Find a fundamental flaw in graph anonymization designs
▪ Exploit it to recover the original graph
▪ Use our findings to enhance anonymization designs
▪ Evaluate privacy and usability of the enhanced techniques on three real-life datasets: Enron, NO, SNAP
Graph anonymization methods
▪ ’08 Liu et al. - k-anonymity (k-DA)
▪ ’08 Zhou et al. - k-anonymity (k-NA)
▪ ’10 Cheng et al. - k-anonymity (k-iso)
▪ ’11 Sala et al. - differential privacy
▪ ’12 Mittal et al. - random walk privacy
▪ ’14 Xiao et al. - differential privacy
k-DA algorithm
[Figure: example graph (id 1 to id 8) with its degree histogram (# nodes vs. node degree), shown before and after 2-DA anonymization]
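k-DA enforces k-anonymity on node degrees: in the anonymized graph, every degree value that occurs is shared by at least k nodes. The check itself can be sketched as follows, assuming an undirected graph given as an edge list. The function name and the toy graphs are illustrative assumptions, not taken from the slides, and the sketch only verifies the property; it does not perform the anonymization.

```python
from collections import Counter

def is_k_degree_anonymous(edges, k):
    """Return True if every occurring degree value is shared by at least k nodes.

    Assumption: `edges` is a list of undirected (u, v) pairs without duplicates.
    Isolated nodes do not appear in an edge list and are therefore ignored.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Degree histogram: degree value -> number of nodes with that degree.
    histogram = Counter(degree.values())
    return all(count >= k for count in histogram.values())

# Hypothetical toy graphs (not the graph shown on the slide).
path = [(1, 2), (2, 3), (3, 4)]   # degrees: 1, 2, 2, 1
star = [(1, 2), (1, 3), (1, 4)]   # degrees: 3, 1, 1, 1

print(is_k_degree_anonymous(path, 2))  # both degree values occur twice
print(is_k_degree_anonymous(star, 2))  # the hub's degree is unique
```

An attacker who knows only a target's degree cannot narrow the target down to fewer than k candidates in a graph that passes this check.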
SalaDP algorithm
[Figure: original graph (id 1 to id 8) and its dK-2 series; after ɛ-DP, the perturbed dK-2 series and the resulting graph (id 1 to id 8)]
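The dK-2 series that SalaDP perturbs is the joint degree distribution of a graph: for each pair of degrees, the number of edges whose endpoints have those degrees. A minimal sketch of extracting it from an edge list is shown below. The function name and the toy graph are illustrative assumptions, and the ɛ-DP noise step and the regeneration of a graph from the perturbed series are deliberately left out.

```python
from collections import Counter

def dk2_series(edges):
    """Map each (smaller degree, larger degree) pair to its number of edges.

    Assumption: `edges` is a list of undirected (u, v) pairs without duplicates.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    series = Counter()
    for u, v in edges:
        # Sort the pair so that (1, 2) and (2, 1) are counted together.
        pair = tuple(sorted((degree[u], degree[v])))
        series[pair] += 1
    return dict(series)

# Hypothetical toy graph: a path 1-2-3-4 with degrees 1, 2, 2, 1.
toy_edges = [(1, 2), (2, 3), (3, 4)]
print(dk2_series(toy_edges))
```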
Social network graph properties
[Figure: example graph with eight nodes, id 1 to id 8]
Graph recovery attack - overview
Graph recovery attack - graph embedding
▪ Node embeddings with node2vec (’16 Grover and Leskovec)
▪ Mapping users into a continuous vector space
▪ A user’s vector reflects structural properties
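node2vec learns its embeddings by generating biased second-order random walks over the graph and feeding them to a skip-gram model. The walk generation alone can be sketched as below. The function name, the adjacency-dict representation, and the toy graph are assumptions made for illustration, and the skip-gram training step that turns walks into vectors is omitted. The parameters p and q are node2vec's return and in-out parameters.

```python
import random

def node2vec_walk(adj, start, length, p=1.0, q=1.0, rng=None):
    """Generate one biased random walk of up to `length` nodes.

    Assumption: `adj` maps each node to the set of its neighbors (undirected).
    """
    rng = rng or random.Random(0)
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(adj[cur])
        if not neighbors:
            break  # dead end: isolated node
        if len(walk) == 1:
            # The first step has no previous node, so it is uniform.
            walk.append(rng.choice(neighbors))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)  # step back to the previous node
            elif nxt in adj[prev]:
                weights.append(1.0)      # stay at distance 1 from the previous node
            else:
                weights.append(1.0 / q)  # move away from the previous node
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk

# Hypothetical toy graph: a triangle 1-2-3 with a pendant node 4 attached to 3.
toy_adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(node2vec_walk(toy_adj, start=1, length=8, p=0.5, q=2.0))
```

Because nodes that co-occur in walks end up with similar vectors, the resulting embedding reflects each user's structural neighborhood.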
Graph recovery attack - graph embedding
▪ Plausibility is cosine similarity between embeddings
[Figure: histogram of number of edges (×10^4) vs. edge plausibility, for original edges and fake edges]
[Figure: AUC on Enron, NO, and SNAP for Cosine, Euclidean, Bray-Curtis, Embeddedness, Jaccard, and Adamic-Adar]
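Edge plausibility as defined on this slide is the cosine similarity between the embedding vectors of an edge's two endpoints. A minimal sketch, assuming the embeddings are already available as a dict from node to vector; the function names and the two-dimensional toy vectors are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)  # raises ZeroDivisionError for a zero vector

def edge_plausibility(edges, embedding):
    """Map every edge (u, v) to the cosine similarity of its endpoint vectors."""
    return {(u, v): cosine_similarity(embedding[u], embedding[v]) for u, v in edges}

# Hypothetical 2-dimensional embeddings; real node2vec vectors are much longer.
toy_embedding = {"a": [1.0, 0.0], "b": [2.0, 0.0], "c": [0.0, 3.0]}
toy_scores = edge_plausibility([("a", "b"), ("a", "c")], toy_embedding)
print(toy_scores)
```

In the toy data, the edge between "a" and "b" connects nodes with parallel vectors and scores high, while the edge between "a" and "c" connects orthogonal vectors and scores low, which is the separation between original and fake edges that the histogram depicts.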
Graph recovery attack - graph embedding
▪ Find a cutoff point and remove non-plausible edges
[Figure: histogram of number of edges (×10^4) vs. edge plausibility, for original edges and fake edges, with an F1 score label]
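Once a cutoff is chosen, the recovery step reduces to dropping every edge whose plausibility falls below it. The sketch below takes the cutoff as a given input, since the slide does not say how it is found, and scores the result with an F1 measure over kept edges versus original edges. That scoring is one plausible reading of the slide's F1 label, not a confirmed definition, and it needs ground truth that only an evaluator has. All names and numbers are illustrative.

```python
def recover_graph(plausibility, cutoff):
    """Keep only the edges whose plausibility is at least `cutoff`."""
    return {edge for edge, score in plausibility.items() if score >= cutoff}

def f1_score(recovered, original):
    """F1 of the recovered edge set against the original edge set."""
    true_pos = len(recovered & original)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(recovered)
    recall = true_pos / len(original)
    return 2 * precision * recall / (precision + recall)

# Hypothetical plausibility scores for the edges of an anonymized graph.
toy_plausibility = {(1, 2): 0.9, (2, 3): 0.7, (1, 3): 0.1, (3, 4): 0.2}
toy_original = {(1, 2), (2, 3), (3, 4)}  # (1, 3) is the fake edge

recovered = recover_graph(toy_plausibility, cutoff=0.5)
print(sorted(recovered), f1_score(recovered, toy_original))
```

The toy data also shows the trade-off behind the cutoff: the fake edge is removed, but so is the low-plausibility original edge (3, 4), which costs recall.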
Enhancing anonymization
▪ Pick the fake edges with the highest plausibility?
▪ The distribution will look unnatural
▪ Draw fake edges from the same plausibility distribution?
[Figure: k-DA (k=100) vs. enhanced k-DA (k=100)]
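One way to draw fake edges from the same plausibility distribution as the original edges is to sample a target plausibility from the original edges' scores and then take the unused candidate edge whose score is closest to that target. The sketch below is an assumption about how such sampling could look, not the authors' procedure: the function name, parameters, and toy data are hypothetical, and the constraints that k-DA or SalaDP impose on which edges may be added are not modeled.

```python
import random

def sample_fake_edges(original_scores, candidate_scores, n_fake, seed=0):
    """Pick up to `n_fake` candidate edges whose plausibilities mimic the originals.

    original_scores: list of plausibility values of the original edges.
    candidate_scores: dict mapping each candidate fake edge to its plausibility.
    """
    rng = random.Random(seed)
    remaining = dict(candidate_scores)
    chosen = []
    for _ in range(min(n_fake, len(remaining))):
        # Draw a target from the empirical distribution of original edges.
        target = rng.choice(original_scores)
        # Take the unused candidate whose plausibility is nearest to the target.
        best = min(remaining, key=lambda edge: abs(remaining[edge] - target))
        chosen.append(best)
        del remaining[best]
    return chosen

# Hypothetical data: all original edges have plausibility 0.8.
toy_candidates = {(1, 2): 0.79, (3, 4): 0.10, (5, 6): 0.85}
print(sample_fake_edges([0.8], toy_candidates, n_fake=2))
```

Unlike always taking the highest-plausibility candidates, this keeps the histogram of fake-edge plausibilities shaped like that of the original edges, so a single cutoff separates them less cleanly.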
Resilience to graph recovery attack
▪ F1 score for original anonymizations
k-DA drops by: 26~51%
SalaDP drops by: 37~48%
▪ F1 score for enhanced anonymizations
Utility of enhanced anonymization
[Figure: utility of G_F vs. utility of G_A, both axes from 0.6 to 1.0, for eigencentrality, degree distribution, and triangle count on Enron, NO, and SNAP]
Resilience to deanonymization attack
[Figure: anonymity gain (%), from 0 to 30, on Enron, NO, and SNAP for k-DA (k = 50, 75, 100) and SalaDP (ɛ = 10, 50, 100)]
Conclusion
We find flaws in current graph anonymizations
We recover the original, pre-anonymized graph
We enhance the anonymization techniques
We evaluate privacy and utility of enhanced anonymization