
Time Distortion Anonymization for the Publication of Mobility Data



  1. Time Distortion Anonymization for the Publication of Mobility Data with High Utility Vincent Primault, Sonia Ben Mokhtar, Cédric Lauradoux and Lionel Brunie

  2. Mobility data usefulness… Real-time traffic, traffic prediction. Companies collecting data. Long-term place prediction.

  3. … and threats. Gambs et al. Show Me How You Move and I Will Tell You Who You Are. Transactions on Data Privacy, 2011.

  4. Privacy-preserving data publication. [Diagram: raw mobility traces go through a protection mechanism, which outputs anonymized mobility traces. A researcher applies data mining, machine learning, and simulations to them and obtains useful analysis results. An attacker uses public maps, clustering, and white pages and obtains partially re-identified mobility traces.]

  5. Outline • Introduction • State of the art • Making a PROMESSE • Experimental evaluation • Conclusion

  6. A mobility trace: a trace is a temporally ordered list of records belonging to the same user. A record is a triplet (user, location, timestamp).
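
The definition above can be written down as a small data structure. This is only a sketch: the names `Record` and `build_traces` are hypothetical, and storing the location as a latitude/longitude pair is an assumption, since the slide only says "location".

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One record: the (user, location, timestamp) triplet from the slide."""
    user: str
    lat: float        # assumption: location stored as latitude/longitude
    lon: float
    timestamp: float  # seconds since an arbitrary epoch


def build_traces(records):
    """Group records by user and sort each group by time.

    Returns a dict mapping each user to that user's trace, i.e. a
    temporally ordered list of records.
    """
    by_user = defaultdict(list)
    for record in records:
        by_user[record.user].append(record)
    return {
        user: sorted(user_records, key=lambda r: r.timestamp)
        for user, user_records in by_user.items()
    }
```

Grouping first and sorting afterwards means the input records may arrive in any order, which is typical of raw collected data.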

  7. Extraction of points of interest (POIs): done by using, e.g., an appropriate clustering algorithm. Points of interest convey semantic information about habits and can lead to the re-identification of users.

  8. Location privacy protection mechanisms for data publication. k-anonymity: Wait For Me [Abul et al., 2010]. Differential privacy: Geo-Indistinguishability [Andrés et al., 2013]. Abul et al. Anonymization of moving objects databases by clustering and perturbation. Information Systems, 2010. Andrés et al. Geo-indistinguishability: Differential privacy for Location-based Systems. CCS, 2013.

  9. Wait For Me: ∂ represents the uncertainty that comes from GPS measurements. Wait For Me enforces (k,∂)-anonymity, i.e., there are always at least k users in a cylinder of radius ∂/2. Abul et al. Anonymization of moving objects databases by clustering and perturbation. Information Systems, 2010.

  10. Geo-Indistinguishability: level of privacy l_i within a radius r_i, proportional to ε. [Figure: a real location and a protected location, with nested regions labeled (l1, r1) to (l4, r4).] Andrés et al. Geo-indistinguishability: Differential privacy for Location-based Systems. CCS, 2013.
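
As an illustration of how a protected location can be drawn, here is a minimal sketch of planar Laplace noise, the mechanism proposed by Andrés et al. It assumes locations are already projected to planar coordinates in meters; the function name `planar_laplace` and the `rng` parameter are hypothetical. The radius is sampled from a Gamma distribution with shape 2 and scale 1/ε, which is the radial distribution of planar Laplace noise.

```python
import math
import random


def planar_laplace(x, y, epsilon, rng=random):
    """Return a noisy copy of the planar point (x, y), in meters.

    epsilon is expressed in 1/meters: a smaller epsilon means more noise.
    The expected displacement is 2 / epsilon meters.
    """
    # Direction of the displacement: uniform over the circle.
    theta = rng.uniform(0.0, 2.0 * math.pi)
    # Length of the displacement: Gamma(shape=2, scale=1/epsilon),
    # i.e. density epsilon^2 * r * exp(-epsilon * r).
    r = rng.gammavariate(2.0, 1.0 / epsilon)
    return x + r * math.cos(theta), y + r * math.sin(theta)
```

With `epsilon = 0.01`, for example, protected locations land on average about 200 meters away from the real ones, which shows why this family of mechanisms alters the geographical information.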

  11. Outline • Introduction • State of the art • Making a PROMESSE • Experimental evaluation • Conclusion

  12. Intuition behind our work: no state-of-the-art mechanism is both privacy-preserving and useful for data scientists. Almost all of them alter the geographical information in some way. We believe geographical information is the most important one, so we propose a new mechanism that minimally distorts the location.

  13. Hiding POIs with speed smoothing. The idea: guarantee a constant speed along a trace, which makes it more challenging to identify where a user stops, and therefore her POIs. How? Divide traces into smaller trajectories, typically one day long, and enforce an equal duration and length between two consecutive records.
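
The two steps can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: coordinates are planar and in meters, positions between records are linearly interpolated, distance is measured along the path, and the names `smooth_speed` and `epsilon` (the spacing between consecutive output records) are hypothetical.

```python
import math


def smooth_speed(trace, epsilon):
    """Resample one trajectory so that it has a constant apparent speed.

    trace: list of (x, y, t) tuples sorted by t, with x and y in meters.
    epsilon: distance in meters between two consecutive output records.
    Returns a list of (x, y, t) tuples.
    """
    if len(trace) < 2:
        return list(trace)

    # Step 1: walk along the path and emit a point every epsilon meters.
    px, py = trace[0][0], trace[0][1]
    points = [(px, py)]
    remaining = epsilon  # distance left before the next emitted point
    for x, y, _t in trace[1:]:
        seg = math.hypot(x - px, y - py)
        while seg >= remaining:
            ratio = remaining / seg
            px, py = px + ratio * (x - px), py + ratio * (y - py)
            points.append((px, py))
            seg -= remaining
            remaining = epsilon
        remaining -= seg
        px, py = x, y

    # Step 2: spread timestamps evenly between the first and last ones,
    # so every pair of consecutive records has the same duration.
    t_start, t_end = trace[0][2], trace[-1][2]
    if len(points) == 1:
        return [(points[0][0], points[0][1], t_start)]
    step = (t_end - t_start) / (len(points) - 1)
    return [(x, y, t_start + i * step) for i, (x, y) in enumerate(points)]
```

Records accumulated while the user stays in one place add no distance along the path, so they produce no output points. The stop is therefore no longer visible as a cluster of records, while the emitted locations still lie on the original path.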

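  14. Speed smoothing. [Figure: records timestamped 10:05, 10:06, 10:07, and 10:08 around a point of interest, shown before and after smoothing; epsilon is marked as the distance between consecutive records.]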

  15. Outline • Introduction • State of the art • Making a PROMESSE • Experimental evaluation • Conclusion

  16. Experimenting with three real-life datasets
      Dataset             Cabspotting  Geolife  MDC
      Records             8.9M         3.8M     1.1M
      Traces              5.5k         2.4k     4.6k
      Avg trace duration  32 h         3 h      3 h
      Avg sampling rate   72 s         7 s      32 s

  17. POI retrieval: POIs are extracted with a maximum diameter of 200 meters and a minimum duration of 15 minutes. Two POIs match if their centroids are within 100 meters.
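
The evaluation criterion above can be sketched in code. The slides do not name the clustering algorithm, so the extraction below is a simple stay-point heuristic chosen for illustration: it uses the distance to the first record of a stay as a stand-in for the diameter. Planar coordinates in meters and the names `extract_pois` and `f_score` are also assumptions.

```python
import math


def extract_pois(trace, max_diameter=200.0, min_duration=900.0):
    """Return the centroids (x, y) of the stays found in a trace.

    trace: list of (x, y, t) tuples sorted by t, x and y in meters,
    t in seconds. A stay is a run of consecutive records that remain
    within max_diameter of the run's first record for min_duration.
    """
    pois = []
    i, n = 0, len(trace)
    while i < n:
        j = i + 1
        while j < n and math.hypot(trace[j][0] - trace[i][0],
                                   trace[j][1] - trace[i][1]) <= max_diameter:
            j += 1
        if trace[j - 1][2] - trace[i][2] >= min_duration:
            window = trace[i:j]
            pois.append((sum(p[0] for p in window) / len(window),
                         sum(p[1] for p in window) / len(window)))
            i = j  # continue after the stay
        else:
            i += 1
    return pois


def f_score(real_pois, protected_pois, match_dist=100.0):
    """F-score of the POIs retrieved from protected data (lower = more private)."""
    if not real_pois or not protected_pois:
        return 0.0

    def matched(p, others):
        return any(math.hypot(p[0] - q[0], p[1] - q[1]) <= match_dist
                   for q in others)

    precision = sum(matched(p, real_pois) for p in protected_pois) / len(protected_pois)
    recall = sum(matched(p, protected_pois) for p in real_pois) / len(real_pois)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

An F-score of 0 means that no POI found in the protected data corresponds to a real one, which is the outcome a protection mechanism aims for.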

  18. POI retrieval (F-score), lower is better. [Bar chart: F-score, on an axis from 0% to 60%, for Cabspotting, Geolife, and MDC.]

  19. Average spatial error. [Figure: a real trace and the corresponding protected trace.]

  20. Average spatial error, lower is better (log scale). [Bar chart: spatial error in meters, on an axis from 0.1 to 100,000, for Cabspotting, Geolife, and MDC.]

  21. Range queries distortion: 1,000 different queries, covering from 2 to 8 hours. Distortion is |Q(D) – Q(D’)|/Q(D).
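
The distortion formula can be illustrated with a short sketch. The slide does not say what a query Q counts, so this example assumes that Q returns the number of distinct users seen inside a rectangular area during a time window; the names `count_query` and `distortion` and the planar coordinates are hypothetical.

```python
def count_query(dataset, x_min, x_max, y_min, y_max, t_min, t_max):
    """Count the distinct users seen in an area during a time window.

    dataset: iterable of (user, x, y, t) records.
    """
    users = {
        user
        for user, x, y, t in dataset
        if x_min <= x <= x_max and y_min <= y <= y_max and t_min <= t <= t_max
    }
    return len(users)


def distortion(raw, protected, query):
    """Relative error |Q(D) - Q(D')| / Q(D) for one query.

    query: tuple (x_min, x_max, y_min, y_max, t_min, t_max).
    Returns None when Q(D) is zero, since the ratio is undefined there.
    """
    q_raw = count_query(raw, *query)
    if q_raw == 0:
        return None
    q_protected = count_query(protected, *query)
    return abs(q_raw - q_protected) / q_raw
```

Averaging this value over many queries gives one figure per dataset; queries with an empty raw result have to be skipped or handled separately.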

  22. Range queries distortion, lower is better. [Bar chart: distortion, on an axis from 0% to 120%, for Cabspotting, Geolife, and MDC.]

  23. Outline • Introduction • State of the art • Making a PROMESSE • Experimental evaluation • Future work • Conclusion

  24. Summary • Introduced time distortion, opening a new research direction. • Implemented a new protection mechanism for data publishing, addressing a severe threat while maintaining high utility. • Evaluated against three real-life datasets.

  25. Questions
