  1. Quantifying Privacy Loss of Human Mobility Graph Topology. The 18th Privacy Enhancing Technologies Symposium, July 24–27, 2018. Dionysis Manousakas∗, Cecilia Mascolo∗,†, Alastair R. Beresford∗, Dennis Chan∗, Nikhil Sharma‡. ∗ University of Cambridge, † The Alan Turing Institute, ‡ UCL

  2. Mobility data: privacy vs. utility analytics
• Information sharing enables data-driven customization and large-scale context-awareness: transportation management, health studies, urban development
• Utility-preserving anonymized data representations: timestamped GPS, CDR, etc. measurements; histograms; heatmaps; graphs
• How privacy-conscientious are these representations? Often poorly understood, leading to privacy breaches

  3. Deanonymizing mobility: inference on individual traces from raw mobility data
1. Sparsity- and regularity-based attacks [Zang and Bolot, 2011; de Montjoye et al., 2013; Naini et al., 2016]:
• "top-N" location attacks
• unicity of spatio-temporal points
• matching of individual mobility histograms

  4. Deanonymizing mobility: inference on individual traces (cont.)
2. Probabilistic models:
• Markovian mobility models [De Mulder et al., 2008]
• mobility Markov chains [Gambs et al., 2014]

  5. Deanonymizing mobility: inference on population statistics (cont.)
3. Attacks on aggregate information:
• individual trajectory recovery from aggregated mobility data [Xu et al., 2017]
• probabilistic inference on location time-series [Pyrgelis et al., 2017]

  6. Mobility representations span a spectrum from raw mobility data to sequences of pseudonymised regions of interest (e.g. the MDC research track, Device Analyzer), trading off storage cost, utility, and inference difficulty; the privacy loss of the coarser representations is the open question.

  7. Motivation: let's remove
• temporal information (except for the ordering of states),
• geographic information, and
• cross-referencing information.
– What is the privacy leakage of this representation?
– Does topology still bear identifiable information?
– Can an adversary exploit it in a deanonymization attack?

  8. Mobility information flow: mobility data → (removal of geographic-temporal information) → graph topology; sparsity and recurrence determine the privacy loss.

  9. Differences of our approach
Versus mobility deanonymization:
• no fine-grained temporal information
• no cross-referencing between information sources
Versus privacy on graphs [Narayanan and Shmatikov, 2008; Sharad and Danezis, 2014; Lin et al., 2015]:
• each user's information is an entire graph: no need for node matching
• no social network information (the graphs encode locations, as opposed to social ties)

  10. Data: Device Analyzer, a global dataset collected from mobile devices, with system information and cellular and wireless location.
• 1500 users with the most cid (cell-ID) location datapoints
• an average of 430 days of observation
• 200 regions of interest
• cids pseudonymized per handset

  11. Mobility networks: graphs with nodes corresponding to ROIs and edges to recorded transitions between ROIs.
• network order selection via Markov chain modeling of sequential data [Scholtes, 2017]
• node attributes carry no temporal/geographic information
• edge weights correspond to frequencies of transitions
• location pruning to top-N networks by keeping the N most frequently visited regions in the user's routine (a first-order construction is sketched below)
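A minimal first-order sketch of this construction (not the authors' code; the talk additionally selects the Markov order following [Scholtes, 2017]). The helper name `build_mobility_network`, the dropping of self-transitions, and the toy sequence are illustrative assumptions:

```python
from collections import Counter

import networkx as nx


def build_mobility_network(sequence, top_n=20):
    """Weighted directed graph: nodes are pseudonymised ROIs,
    edge weights count observed transitions between them."""
    # Location pruning: keep only the top_n most frequently visited ROIs.
    kept = {roi for roi, _ in Counter(sequence).most_common(top_n)}
    pruned = [roi for roi in sequence if roi in kept]

    g = nx.DiGraph()
    for src, dst in zip(pruned, pruned[1:]):
        if src == dst:
            continue  # assumption: drop self-transitions (dwelling in one ROI)
        w = g[src][dst]["weight"] + 1 if g.has_edge(src, dst) else 1
        g.add_edge(src, dst, weight=w)
    return g


# Toy pseudonymised sequence: labels carry no geographic or temporal meaning.
seq = ["A", "B", "A", "C", "A", "B", "D", "A", "B", "C"]
print(build_mobility_network(seq, top_n=3).edges(data=True))
```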

  12. Empirical statistics: graphs with
• heavy-tailed degree distributions
• a large number of rarely repeated transitions
• a small number of frequent transitions
• a high recurrence rate
(see the sketch below)
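A sketch of how such summary statistics could be read off a weighted mobility network, reusing the illustrative `build_mobility_network` helper above. The weight thresholds and the recurrence-rate definition (fraction of transitions observed more than once) are assumptions, not the talk's exact definitions:

```python
def mobility_stats(g):
    """Summary statistics of a weighted mobility network (illustrative)."""
    weights = [w for _, _, w in g.edges(data="weight")]
    degrees = [d for _, d in g.degree()]
    return {
        "max_degree": max(degrees),                        # heavy tail shows up here
        "rare_transitions": sum(w == 1 for w in weights),  # observed only once
        "frequent_transitions": sum(w >= 10 for w in weights),
        "recurrence_rate": sum(w > 1 for w in weights) / len(weights),
    }


print(mobility_stats(build_mobility_network(seq, top_n=3)))
```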

  13. Privacy framework: k-anonymity via graph isomorphism [Sweeney, 2002]. Graph k-anonymity is the minimum cardinality of the isomorphism classes within a population of graphs (see the sketch below).
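A minimal sketch of graph k-anonymity under this definition, assuming graphs small enough for pairwise isomorphism tests; the function names are illustrative:

```python
import networkx as nx


def isomorphism_classes(graphs):
    """Greedily bucket graphs into isomorphism classes."""
    classes = []
    for g in graphs:
        for cls in classes:
            if nx.is_isomorphic(g, cls[0]):
                cls.append(g)
                break
        else:
            classes.append([g])
    return classes


def graph_k_anonymity(graphs):
    """k = cardinality of the smallest isomorphism class in the population."""
    return min(len(cls) for cls in isomorphism_classes(graphs))


# Toy population: two isomorphic 3-node paths and one triangle -> k = 1,
# because the triangle is topologically unique within the population.
population = [nx.path_graph(3), nx.path_graph(3), nx.cycle_graph(3)]
print(graph_k_anonymity(population))  # 1
```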

  14. Identifiability of top-N mobility networks
• 15 and 19 locations suffice to form uniquely identifiable directed and undirected networks, respectively
• 5 and 8 are the corresponding theoretical upper bounds

  15. Anonymity set size of top-N mobility networks
• small isomorphism clusters even for very few locations
• median anonymity becomes one for network sizes of 5 and 8 in directed and undirected networks, respectively

  16. Recurring patterns in a typical user's mobility: 1st half vs. 2nd half of the observation period. (Figure: shown edges correspond to the 10% most frequent transitions in the respective observation window.)

  17. Threat model
• closed-world: G_train contains graphs with disclosed IDs, G_test graphs with undisclosed IDs
• the partition point for each user is drawn randomly from (0.3, 0.7) of the total observation period
• the adversary also has state (node) frequency information
(a sketch of the split follows)
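A sketch of this closed-world setup, reusing the illustrative `build_mobility_network` helper from above; the toy sequence data and the seeded RNG are stand-ins:

```python
import random


def split_user_sequence(sequence, rng):
    """Cut a user's observations at a random point in (0.3, 0.7) of the period."""
    cut = int(len(sequence) * rng.uniform(0.3, 0.7))
    return sequence[:cut], sequence[cut:]


rng = random.Random(0)
user_sequences = {"user1": ["A", "B", "A", "C", "A", "B", "D", "A", "B", "C"]}

g_train, g_test = {}, {}  # disclosed IDs vs. undisclosed IDs
for user, user_seq in user_sequences.items():
    first_half, second_half = split_user_sequence(user_seq, rng)
    g_train[user] = build_mobility_network(first_half)
    g_test[user] = build_mobility_network(second_half)
```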

  18. Attacks: uninformed adversary. P(l_G′ = l_Gi) = 1/|L| for every Gi ∈ G_train, hence the expected rank of the true user is |L|/2.

  19. Attacks: informed adversary. P(l_G′ = l_Gi | G_train, K) ∝ f(K(Gi, G′)) for every Gi ∈ G_train, where K is a graph similarity metric and f is non-decreasing (see the sketch below).
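A sketch of the informed attack under this formulation. The Jaccard edge-overlap stand-in for K and the identity choice of f are assumptions; the talk uses the graph kernels described later:

```python
def edge_overlap(g1, g2):
    """Stand-in similarity K: Jaccard overlap of edge sets, in [0, 1]."""
    e1, e2 = set(g1.edges()), set(g2.edges())
    return len(e1 & e2) / max(1, len(e1 | e2))


def posterior(test_graph, train_graphs, K=edge_overlap):
    """Score every disclosed graph against G' and normalize (f = identity)."""
    scores = {uid: K(g, test_graph) for uid, g in train_graphs.items()}
    total = sum(scores.values()) or 1.0
    return {uid: s / total for uid, s in scores.items()}


def rank_of_true_user(test_graph, true_uid, train_graphs):
    """Rank 1 means perfect deanonymization; |L|/2 is the uninformed baseline."""
    post = posterior(test_graph, train_graphs)
    ranking = sorted(post, key=post.get, reverse=True)
    return ranking.index(true_uid) + 1
```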

  20. Attacks: informed adversary (cont.)
• Posterior probability: P(l_G′ = l_Gi | G_train, K) ∝ f(K(Gi, G′)) for every Gi ∈ G_train
• Privacy loss: PL(G′; G_train, K) = P(l_G′ = l_G′,true | G_train, K) / P(l_G′ = l_G′,true) − 1
(computed in the sketch below)
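The privacy-loss formula from this slide, computed on top of the `posterior` sketch above; in the closed world the uninformed prior is 1/|L|:

```python
def privacy_loss(test_graph, true_uid, train_graphs):
    """PL = posterior probability of the true label / prior - 1.
    0 means the attack learned nothing; len(train_graphs) - 1 is the maximum."""
    post = posterior(test_graph, train_graphs)
    prior = 1.0 / len(train_graphs)
    return post[true_uid] / prior - 1.0
```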

  21. Graph similarity functions: graph kernels express similarity as an inner product of vectors of graph statistics [Vishwanathan et al., 2010].
• On atomic substructures (e.g. shortest paths, Weisfeiler-Lehman subtrees): K(G, G′) = ⟨φ(G), φ(G′)⟩ / (‖φ(G)‖ ‖φ(G′)‖)
• Deep kernels [Yanardag and Vishwanathan, 2015]: K(G, G′) = φ(G)ᵀ M φ(G′), where M encodes similarities between substructures
(a hand-rolled sketch follows)
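A hand-rolled sketch of a Weisfeiler-Lehman subtree feature map combined with the normalized inner product from this slide. Using node degrees as initial labels is an assumption for unlabeled graphs; libraries such as GraKeL provide production implementations:

```python
import math
from collections import Counter


def wl_features(g, iterations=2):
    """Count WL subtree patterns; initial node labels are degrees."""
    labels = {v: str(g.degree(v)) for v in g.nodes()}
    feats = Counter(labels.values())
    for _ in range(iterations):
        # WL refinement: relabel each node by its own label plus the
        # sorted multiset of its neighbours' labels.
        labels = {
            v: labels[v] + "|" + ",".join(sorted(labels[u] for u in g.neighbors(v)))
            for v in g.nodes()
        }
        feats.update(labels.values())
    return feats


def wl_kernel(g1, g2):
    """Normalized inner product of WL feature vectors, i.e. the cosine form above."""
    f1, f2 = wl_features(g1), wl_features(g2)
    dot = sum(f1[k] * f2[k] for k in f1.keys() & f2.keys())
    norm = math.sqrt(sum(v * v for v in f1.values())) * math.sqrt(sum(v * v for v in f2.values()))
    return dot / norm if norm else 0.0
```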
