walk2friends: Inferring Social Links from Mobility Profiles Yang Zhang joint work with Michael Backes, Mathias Humbert, and Jun Pang
Location Privacy • 4 spatial-temporal points can identify 95% of the individuals • Mobility traces can be effectively de-anonymized • You are where you go • Demographics • Social relations
Social Relation Privacy • Social relations can be sensitive, e.g., office romance • 17.2% -> 56.2% (Facebook users in New York) • NSA’s co-traveler program
Predict whether two users are friends based on the locations they have visited
• Solution 1: common locations two users have visited • Almost all data mining methods take this approach • Location entropy • Cannot work when two users share no common locations
• Solution 2: mobility profiles/features • Summarize each user’s mobility profile • Friends share more similar mobility profiles than strangers • Feature engineering • Tedious effort and domain expert knowledge, every single time! • Time-consuming
Representation Learning • Learning features (representation/deep learning) • Follow a general objective (unsupervised) • Graph representation learning (graph embedding) • Preserve each user’s neighbors in a social network • Mobility feature learning
Assumption: A user’s mobility neighbors can reflect his mobility profile/features • Define each user’s mobility neighbors • Learn mobility features/profiles • Infer two users’ social relation
Mobility Neighbors • A user’s mobility neighbors include • Locations a user has visited • Others who have visited similar locations and their locations • Breadth first search • Not considering the visiting frequencies • Random walk sampling
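The random walk sampling step can be illustrated with a minimal sketch. It assumes a bipartite user-location graph whose edge weights are visit counts, so that frequently visited locations are sampled more often (the point plain breadth-first search misses). The names `build_graph`, `random_walk`, and `walk_length` and the toy check-ins are hypothetical and not taken from the slides.

```python
import random
from collections import defaultdict

def build_graph(checkins):
    """Build a bipartite user-location graph; edge weight = number of visits."""
    graph = defaultdict(dict)
    for user, location in checkins:
        u, l = ("user", user), ("loc", location)
        graph[u][l] = graph[u].get(l, 0) + 1
        graph[l][u] = graph[l].get(u, 0) + 1
    return graph

def random_walk(graph, start, walk_length, rng):
    """Return a walk of `walk_length` nodes; each step is chosen with
    probability proportional to the visit count on the edge (assumption)."""
    walk = [start]
    while len(walk) < walk_length:
        neighbors = graph[walk[-1]]
        nodes = list(neighbors)
        weights = [neighbors[n] for n in nodes]
        walk.append(rng.choices(nodes, weights=weights, k=1)[0])
    return walk

# Toy data (hypothetical): alice visits MoMA twice, so MoMA is sampled more often.
checkins = [("alice", "MoMA"), ("alice", "MoMA"), ("alice", "cafe"),
            ("bob", "cafe"), ("bob", "gym")]
graph = build_graph(checkins)
walk = random_walk(graph, ("user", "alice"), walk_length=7, rng=random.Random(0))
```

The nodes that appear near a user in such walks would then serve as that user's mobility neighbors.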
Feature Learning • Learn a function $\theta : U \to \mathbb{R}^d$ • Each node predicts its neighbors • Softmax • Objective: $\arg\max_\theta \prod p(\text{neighbor} \mid \text{node}; \theta)$, with one factor for each node and each of its mobility neighbors
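As a rough illustration of the softmax objective on this slide, the sketch below computes p(neighbor | node; θ) as a softmax over dot products of feature vectors and sums the log-probabilities over (node, neighbor) pairs. The dictionary `theta`, the helper names, and the toy vectors are assumptions for illustration. Skip-gram style models commonly approximate the full softmax (for example with negative sampling) rather than evaluating it exactly as done here.

```python
import math

def softmax_prob(theta, node, neighbor):
    """p(neighbor | node; theta): softmax over dot products with every node's vector."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    scores = {m: dot(theta[m], theta[node]) for m in theta}
    max_score = max(scores.values())  # subtract the max for numerical stability
    denom = sum(math.exp(s - max_score) for s in scores.values())
    return math.exp(scores[neighbor] - max_score) / denom

def log_likelihood(theta, pairs):
    """Log of the product of p(neighbor | node; theta) over all (node, neighbor) pairs."""
    return sum(math.log(softmax_prob(theta, v, n)) for v, n in pairs)

# Hypothetical 2-dimensional feature vectors for users and locations.
theta = {"alice": [1.0, 0.0], "bob": [0.9, 0.1],
         "MoMA": [0.8, 0.2], "gym": [-1.0, 0.5]}
pairs = [("alice", "MoMA"), ("alice", "bob")]
objective = log_likelihood(theta, pairs)
```

Training would adjust `theta` to increase `objective`, which pulls each node's vector toward the vectors of its mobility neighbors.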
Social Relation Inference • Cosine similarity • Unsupervised • Predict any social relation • Example: user pairs ranked by similarity score s: 0.9, 0.8, 0.6, 0.4, 0.3, 0.2
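The inference step can be sketched as scoring every user pair by the cosine similarity of their learned feature vectors and ranking the pairs, with no labeled friendships needed. The function names and the toy vectors below are hypothetical.

```python
import math
from itertools import combinations

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_pairs(features):
    """Score every user pair and sort from most to least similar."""
    scored = [((u, v), cosine_similarity(features[u], features[v]))
              for u, v in combinations(sorted(features), 2)]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical learned features: alice and bob point in nearly the same direction.
features = {"alice": [1.0, 0.0], "bob": [0.9, 0.1], "carol": [0.0, 1.0]}
ranking = rank_pairs(features)
```

Pairs near the top of `ranking` would be predicted as friends. Where to cut the ranking is a separate choice, which is why a threshold-free measure such as AUC suits the evaluation.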
Evaluation: dataset • Instagram users’ check-ins • New York, Los Angeles and London • Foursquare (location semantics) • Social relations (two users follow each other)
Evaluation: ROC curve
Evaluation: distance metric [Figure: AUC of each distance metric (Cosine, Euclidean, Correlation, Chebyshev, Bray-Curtis, Canberra, Manhattan) in New York, Los Angeles, and London]
Evaluation: baseline models [Figure: AUC in New York, Los Angeles, and London for our approach and the baselines aa_ent, w_geodist, common_p, min_ent, pp, overlap_p, aa_p, diversity, w_common_p, w_frequency, min_p, personal, w_overlap_p, geodist]
Evaluation: hyperparameters [Figure: three panels, AUC versus $l_w$ (10 to 100), $t_w$ (2 to 20), and $\log_2(d)$ (4 to 8), for New York, Los Angeles, and London]
Evaluation: check-in numbers [Figure: AUC versus number of check-ins (5 to 30) for New York, Los Angeles, and London]
Evaluation: common locations [Figure: AUC versus number of common locations (0 to 4) for New York, Los Angeles, and London]
Evaluation: geo-coordinates [Figure: AUC versus grid granularity in degrees ($10^{-3}$ to $10^{-1}$) for New York, Los Angeles, and London]
Defense Mechanisms • Hiding • Delete a certain proportion of check-ins • Replacement • Random walk to replace locations
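Both defenses on this slide can be sketched briefly. The version below assumes that check-ins to obfuscate are chosen uniformly at random, and that replacement walks an odd number of steps from the user on the user-location graph so the walk ends on a location. The function names, parameters, and toy data are hypothetical.

```python
import random
from collections import defaultdict

def hide(checkins, proportion, rng):
    """Hiding: delete a random `proportion` of the check-ins."""
    keep = len(checkins) - round(len(checkins) * proportion)
    kept_idx = sorted(rng.sample(range(len(checkins)), keep))
    return [checkins[i] for i in kept_idx]

def replace(checkins, proportion, steps, rng):
    """Replacement: swap a random `proportion` of check-in locations for the
    location reached by a `steps`-step random walk starting at the user."""
    if steps % 2 == 0:
        raise ValueError("steps must be odd so the walk ends on a location")
    neighbors = defaultdict(list)  # repeated visits appear repeatedly
    for user, loc in checkins:
        neighbors[("user", user)].append(("loc", loc))
        neighbors[("loc", loc)].append(("user", user))
    count = round(len(checkins) * proportion)
    chosen = set(rng.sample(range(len(checkins)), count))
    result = []
    for i, (user, loc) in enumerate(checkins):
        if i in chosen:
            node = ("user", user)
            for _ in range(steps):
                node = rng.choice(neighbors[node])
            loc = node[1]
        result.append((user, loc))
    return result

data = [("alice", "MoMA"), ("alice", "cafe"), ("bob", "cafe"),
        ("bob", "gym"), ("carol", "gym")]
hidden = hide(data, 0.4, random.Random(1))
replaced = replace(data, 1.0, 5, random.Random(1))
```

Longer walks move the replacement location further from the user's real behavior, trading utility for privacy.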
Defense Mechanisms • Generalization • Geo-coordinate and location semantics • MoMA -> art (40.76N, 73.97W) • Recover location first • art (40.76N, 73.97W) -> MoMA or Tom Otterness Frog?
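Geo-coordinate generalization can be pictured as snapping each coordinate to a grid cell of a chosen size in degrees. This is a minimal sketch under that assumption. `grid_cell` is a hypothetical helper, and the second point is an invented nearby coordinate, not the real position of any venue.

```python
import math

def grid_cell(lat, lon, granularity):
    """Map a coordinate to the integer index of its grid cell.
    `granularity` is the cell size in degrees (assumption)."""
    return (math.floor(lat / granularity), math.floor(lon / granularity))

# Approximate MoMA coordinates and a hypothetical nearby point.
moma_cell = grid_cell(40.7614, -73.9776, 0.01)
nearby_cell = grid_cell(40.7625, -73.9771, 0.01)

# At a finer granularity the two points fall into different cells.
moma_fine = grid_cell(40.7614, -73.9776, 0.001)
nearby_fine = grid_cell(40.7625, -73.9771, 0.001)
```

With a 0.01-degree grid both points share one cell, so an attacker who sees only the cell and the category has to guess which venue was visited.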
Utility Metric • Each user’s check-in distribution • Both original and obfuscated • Jensen-Shannon divergence • Average over all users
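The utility metric can be sketched as follows: build each user's check-in distribution before and after obfuscation, compute the Jensen-Shannon divergence between the two, and average over users. Base-2 logarithms keep the divergence in [0, 1]. Skipping users whose check-ins were all removed is an assumption of this sketch, and all names and toy data are hypothetical.

```python
import math
from collections import Counter

def distribution(locations):
    """Empirical check-in distribution over locations."""
    counts = Counter(locations)
    total = sum(counts.values())
    return {loc: c / total for loc, c in counts.items()}

def kl(p, q):
    """Kullback-Leibler divergence in bits; q must be positive wherever p is."""
    return sum(pv * math.log2(pv / q[k]) for k, pv in p.items() if pv > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence between two distributions (0 = identical, 1 = disjoint)."""
    keys = set(p) | set(q)
    p = {k: p.get(k, 0.0) for k in keys}
    q = {k: q.get(k, 0.0) for k in keys}
    m = {k: 0.5 * (p[k] + q[k]) for k in keys}
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def mean_js(original, obfuscated):
    """Average divergence over users that still have check-ins after obfuscation."""
    users = [u for u in original if obfuscated.get(u)]
    return sum(js_divergence(distribution(original[u]), distribution(obfuscated[u]))
               for u in users) / len(users)

original = {"alice": ["MoMA", "cafe"], "bob": ["gym"]}
obfuscated = {"alice": ["MoMA", "cafe"], "bob": ["cafe"]}
average = mean_js(original, obfuscated)
```

A lower average divergence means the obfuscated check-ins stay closer to the original ones, so more utility is preserved.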
Defense Evaluation [Figure: two panels, AUC and utility versus proportion of obfuscation (10% to 90%), for hiding and for replacement with step 5, 15, 25, and 35]
Defense Evaluation [Figure: utility versus AUC for hiding, replacement, and generalization]
Conclusion • A new social relation inference attack with mobility profiles • Learning user profiles • Unsupervised, and can predict any social relation • Three general defense mechanisms • Replacement and hiding outperform generalization • Contact: yang.zhang@cispa.saarland, @yangzhangalmo