Time Distortion Anonymization for the Publication of Mobility Data with High Utility
Vincent Primault, Sonia Ben Mokhtar, Cédric Lauradoux and Lionel Brunie
Mobility data usefulness… • Real-time traffic, traffic prediction • Companies collecting data • Long-term place prediction
… and threats
Gambs et al. Show Me How You Move and I Will Tell You Who You Are. Transactions on Data Privacy, 2011.
Privacy-preserving data publication
[Diagram: raw mobility traces go through a protection mechanism, which outputs anonymized mobility traces. An attacker (public maps, clustering, white pages) obtains partially re-identified mobility traces. A researcher (data mining, machine learning, simulations) obtains useful analysis results.]
Outline • Introduction • State of the art • Making a PROMESSE • Experimental evaluation • Conclusion
A mobility trace
A trace is a temporally ordered list of records belonging to the same user. A record is a triplet (user, location, timestamp).
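The definition above can be sketched as a small data structure. This is only an illustration: the `Record` class, its field names, the latitude/longitude representation of a location and the `build_trace` helper are all assumptions, not something the slides specify.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Record:
    """One (user, location, timestamp) triplet; field names are hypothetical."""
    user: str
    lat: float  # latitude in degrees (assumed representation of a location)
    lon: float  # longitude in degrees
    timestamp: datetime


def build_trace(records: List[Record], user: str) -> List[Record]:
    """Return the records of a single user, ordered by timestamp."""
    return sorted((r for r in records if r.user == user),
                  key=lambda r: r.timestamp)
```

For example, `build_trace(records, "u1")` keeps only the records of user `u1` and returns them from oldest to most recent.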
Extraction of points of interest (POIs)
POIs are extracted using, e.g., an appropriate clustering algorithm. They convey semantic information about a user's habits and can lead to user re-identification.
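As a rough illustration of what such an extraction might look like, here is a minimal stay-point sketch. It is not the clustering algorithm used by the authors, which the slides do not name. The function names are hypothetical, the trace is assumed to be a time-sorted list of `(lat, lon, t_seconds)` tuples, and the default thresholds (200 meters, 15 minutes) mirror those quoted in the evaluation. The diameter is approximated by the distance to the first record of the stay.

```python
import math


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000 * math.asin(math.sqrt(h))


def extract_pois(trace, max_diameter_m=200.0, min_duration_s=900.0):
    """Return POI centroids (lat, lon) from a time-sorted list of (lat, lon, t)."""
    pois = []
    i, n = 0, len(trace)
    while i < n:
        j = i + 1
        # Extend the candidate stay while records remain close to its first record
        # (an approximation of the "maximum diameter" constraint).
        while j < n and haversine_m(trace[i][:2], trace[j][:2]) <= max_diameter_m:
            j += 1
        if trace[j - 1][2] - trace[i][2] >= min_duration_s:
            stay = trace[i:j]
            pois.append((sum(p[0] for p in stay) / len(stay),
                         sum(p[1] for p in stay) / len(stay)))
            i = j  # continue after the stay
        else:
            i += 1  # not long enough: try the next record as a starting point
    return pois
```

A user who stays in the same place for about twenty minutes and then drives away yields a single POI located at that place.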
Location privacy protection mechanisms for data publication
• k-anonymity: Wait For Me [Abul et al., 2010]
• Differential privacy: Geo-Indistinguishability [Andrés et al., 2013]
Abul et al. Anonymization of moving objects databases by clustering and perturbation. Information Systems, 2010.
Andrés et al. Geo-indistinguishability: Differential privacy for Location-based Systems. CCS, 2013.
Wait For Me
δ represents the uncertainty that comes from GPS measurements. Wait For Me enforces (k,δ)-anonymity, i.e., there are always at least k users in a cylinder of radius δ/2.
Abul et al. Anonymization of moving objects databases by clustering and perturbation. Information Systems, 2010.
Geo-Indistinguishability
Level of privacy l_i within radius r_i, proportional to ε.
[Figure: a real location and its protected location, with nested levels and radii (l1, r1) to (l4, r4).]
Andrés et al. Geo-indistinguishability: Differential privacy for Location-based Systems. CCS, 2013.
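To make the idea concrete, the sketch below perturbs a location with planar Laplace noise, the mechanism described in the cited paper. It is a simplified illustration under stated assumptions: coordinates are planar and expressed in meters, ε is expressed in 1/meters, and the function name `planar_laplace` is hypothetical. The noise radius is drawn from a Gamma distribution with shape 2 and scale 1/ε, which is the radial distribution of planar Laplace noise.

```python
import math
import random


def planar_laplace(x, y, epsilon, rng=random):
    """Return a protected copy of the planar point (x, y).

    x, y are in meters and epsilon in 1/meters (both assumptions of this sketch).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    theta = rng.uniform(0.0, 2.0 * math.pi)   # direction of the noise, uniform
    r = rng.gammavariate(2.0, 1.0 / epsilon)  # distance, density eps^2 * r * exp(-eps * r)
    return x + r * math.cos(theta), y + r * math.sin(theta)
```

With this distribution the protected location lies on average 2/ε away from the real one, so a smaller ε means more noise and more privacy.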
Intuition behind our work
No state-of-the-art mechanism is both privacy-preserving and useful for data scientists. Almost all of them alter the geographical information in some way. We believe geographical information is the most important, so we propose a new mechanism that minimally distorts locations.
Hiding POIs with speed smoothing
The idea: guarantee a constant speed along a trace. This makes it more challenging to identify where a user stops, and therefore her POIs.
How? Divide traces into smaller trajectories, typically one day long. Enforce an equal duration and an equal distance between two consecutive records.
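Speed smoothing
[Figure: a real trace and its protected trace near a point of interest, with timestamps from 10:05 to 10:08; consecutive protected records are separated by a distance epsilon.]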
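Speed smoothing can be sketched as a resampling step followed by a re-timing step. The code below is a simplified illustration, not the authors' implementation. It assumes planar coordinates in meters, measures epsilon along the path rather than as a straight-line distance, keeps the first and last timestamps of the trajectory, and leaves out the splitting into one-day trajectories. The name `smooth_speed` is hypothetical.

```python
import math


def smooth_speed(trace, epsilon):
    """Resample a time-sorted list of (x, y, t) so that consecutive records are
    epsilon meters apart along the path and evenly spaced in time."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if len(trace) < 2:
        return list(trace)
    points = [trace[0][:2]]
    carried = 0.0  # path length travelled since the last emitted point
    for (x0, y0, _), (x1, y1, _) in zip(trace, trace[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        if seg == 0.0:
            continue  # the user did not move: this is where stops disappear
        pos = epsilon - carried  # offset of the next emitted point on this segment
        while pos <= seg:
            f = pos / seg
            points.append((x0 + f * (x1 - x0), y0 + f * (y1 - y0)))
            pos += epsilon
        carried = seg - (pos - epsilon)
    t_start, t_end = trace[0][2], trace[-1][2]
    if len(points) == 1:
        return [(points[0][0], points[0][1], t_start)]
    # Spread timestamps uniformly, which yields a constant apparent speed.
    step = (t_end - t_start) / (len(points) - 1)
    return [(x, y, t_start + k * step) for k, (x, y) in enumerate(points)]
```

In this sketch, a ten-minute stop followed by a 100-meter move, smoothed with epsilon set to 25 meters, becomes five records that are 25 meters and 175 seconds apart: the stop is no longer visible in the timestamps.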
Experimenting with three real-life datasets
                     Cabspotting   Geolife   MDC
Records              8.9M          3.8M      1.1M
Traces               5.5k          2.4k      4.6k
Avg trace duration   32 h          3 h       3 h
Avg sampling rate    72 s          7 s       32 s
POI retrieval
POIs have a maximum diameter of 200 meters and a minimum duration of 15 minutes. Two POIs match if their centroids are within 100 meters of each other.
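The matching rule can be turned into an F-score along the following lines. This is a sketch under assumptions: POIs are given as planar centroids in meters, precision and recall are computed with a simple "has at least one match" rule, and the function name `poi_f_score` is hypothetical. The slides do not detail how the authors compute precision and recall.

```python
import math


def poi_f_score(real_pois, found_pois, threshold_m=100.0):
    """F-score of the POIs found in a protected trace against the real POIs.

    Both arguments are lists of (x, y) centroids in planar meters (an assumption).
    """
    def has_match(p, others):
        return any(math.hypot(p[0] - q[0], p[1] - q[1]) <= threshold_m
                   for q in others)

    if not real_pois or not found_pois:
        return 0.0
    # Precision: share of found POIs that correspond to a real one.
    precision = sum(has_match(p, real_pois) for p in found_pois) / len(found_pois)
    # Recall: share of real POIs that were found.
    recall = sum(has_match(p, found_pois) for p in real_pois) / len(real_pois)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

From a privacy standpoint a lower score is better: it means an attacker recovers fewer of the real POIs from the protected trace.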
POI retrieval (F-score), lower is better
[Plot: F-score, from 0% to 60%, for Cabspotting, Geolife and MDC.]
Average spatial error
[Figure: a real trace compared with its protected trace.]
Average spatial error, lower is better (log scale)
[Plot: spatial error in meters, from 0.1 to 100,000 on a log scale, for Cabspotting, Geolife and MDC.]
Range queries distortion • 1,000 different queries • From 2 to 8 hours • Distortion is |Q(D) - Q(D’)|/Q(D)
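The distortion formula can be illustrated as follows. The slides do not say what a query Q counts, so this sketch assumes that Q returns the number of records falling inside a spatial box during a time window, that D is the original dataset and D’ the protected one, and that records are planar `(x, y, t)` tuples. The names `count_query` and `query_distortion` are hypothetical.

```python
def count_query(dataset, box, window):
    """Number of (x, y, t) records inside a spatial box during a time window."""
    xmin, ymin, xmax, ymax = box
    tmin, tmax = window
    return sum(1 for (x, y, t) in dataset
               if xmin <= x <= xmax and ymin <= y <= ymax and tmin <= t <= tmax)


def query_distortion(original, protected, box, window):
    """|Q(D) - Q(D')| / Q(D), or None when Q(D) is zero and the ratio is undefined."""
    q_original = count_query(original, box, window)
    if q_original == 0:
        return None
    q_protected = count_query(protected, box, window)
    return abs(q_original - q_protected) / q_original
```

In an evaluation like the one described, this would be averaged over many randomly drawn boxes and time windows of 2 to 8 hours.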
Range queries distortion, lower is better
[Plot: distortion, from 0% to 120%, for Cabspotting, Geolife and MDC.]
Outline • Introduction • State of the art • Making a PROMESSE • Experimental evaluation • Future work • Conclusion
Summary • Introduced time distortion, opening a new research direction. • Implemented a new protection mechanism for data publishing, addressing a severe threat while maintaining high utility. • Evaluated against three real-life datasets.
Questions