Location Privacy: Where do we stand and where are we going? Fernando Pérez-González, Signal Theory and Communications Department, Universidad de Vigo, SPAIN
Why do we like location based apps?
Google Maps
Foursquare
Facebook place tips
Waze
And, of course …
How can you be geolocated? (without your full knowledge)
IP-based Geolocation Source: GeoIPTool
Meta-data based Geolocation
Landmark recognition Geolocation
Biometric geolocation
Credit card usage Geolocation
Triangulation and other geolocation techniques
Signal strength-based triangulation Source: The Wrongful Convictions Blog
Multilateration: Time Difference of Arrival (TDOA) Source: [Fujii et al. 2015]
Wardriving geolocation (Wigle) Source: Wigle.net
Electrical Network Frequency Geolocation
Why is it dangerous?
Buster busted!
6 months in the life of Malte Spitz (2009-2010) Source: http://www.zeit.de/datenschutz/malte-spitz-data-retention
Are we concerned about it?
Are people really concerned about location privacy?
• Survey by Skyhook Wireless (July 2015) of 1,000 smartphone app users.
• 40% hesitate or don't share location with apps.
• 20% turned off location for all their apps.
• Why don't people share location?
  • 50% privacy concerns.
  • 23% don't see value in location data.
  • 19% say it drains their battery.
• Why do people turn off location?
  • 63% battery draining.
  • 45% privacy.
  • 20% avoid advertising.
How much is geolocation data worth?
How much value do we give to location data? [Staiano et al. 2014]
• Many participants opted out of revealing geolocation information.
• Average daily value of location info: €3.
• Strong correlation between the amount traveled and the value given to location data.
Earn money as you share data
• GeoTask: £1 PayPal cash voucher per 100 days of location data sharing (£0.01/day).
• Financial Times in 2013: advertisers are willing to pay a mere $0.0005 per person for general information such as their age, gender and location, or $0.50 per 1,000 people.
Pay as you drive
• The premium formula can be a function of the number of miles driven, the type of driving, the age of the driver, the type of roads used…
• Up to a 40% reduction in the cost of insurance.
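As an illustration only (not from the slides), a minimal sketch of what a pay-as-you-drive pricing function might look like; the function name, every coefficient and factor are invented assumptions, with the 40% discount cap borrowed from the figure above.

```python
# Illustrative sketch only: a made-up usage-based ("pay as you drive") premium.
# None of these coefficients or factors come from the slides or any real insurer.

def pay_as_you_drive_premium(base_premium: float,
                             miles_per_year: float,
                             harsh_braking_rate: float,   # harsh brakes per 100 miles
                             night_fraction: float,       # fraction of miles driven at night
                             driver_age: int) -> float:
    """Return a yearly premium that scales with observed driving behaviour."""
    mileage_factor = min(miles_per_year / 12000.0, 1.5)          # fewer miles -> cheaper
    behaviour_factor = 1.0 + 0.05 * harsh_braking_rate + 0.2 * night_fraction
    age_factor = 1.3 if driver_age < 25 else 1.0
    premium = base_premium * mileage_factor * behaviour_factor * age_factor
    # Cap the discount at 40%, the figure quoted on the slide.
    return max(premium, 0.6 * base_premium)

print(pay_as_you_drive_premium(600.0, 6000, 0.5, 0.1, 40))
```

The privacy-relevant point is that any such formula requires the insurer to collect fine-grained driving (i.e., location) data.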
BIA/Kelsey projects U.S. location-targeted mobile ad spending to grow from $9.8 billion in 2015 to $29.5 billion in 2020. That's roughly $90 per person per year (assuming a U.S. population of about 330 million)!
SAP (Germany) estimates wireless carriers' revenue from selling mobile-user behavior data at $5.5 billion in 2015 and predicts $9.6 billion for 2016.
How about anonymization/pseudonymization?
Anonymity
[Diagram: user's location → anonymity provider (local/central) → service provider.]
Problems:
• Difficult authentication and personalization.
• The operating system or apps may access the location before anonymization.
Pseudonymity
[Diagram: location reported under a pseudonym → service provider.]
Problems:
• The operating system or apps may access location data before pseudonymization.
• Deanonymization.
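To make the pseudonymity idea concrete, here is a minimal sketch of client-side pseudonymization before a location report is sent; the HMAC-based pseudonym, the key handling and the report format are illustrative assumptions, not a protocol described in the slides.

```python
# Minimal sketch of client-side pseudonymization before reporting a location.
# The key handling and report format are illustrative assumptions, not a real protocol.
import hmac, hashlib, json

SECRET_KEY = b"device-local-secret"   # hypothetical key, kept on the device

def pseudonymize(user_id: str) -> str:
    """Derive a stable pseudonym from the user id with a keyed hash (HMAC-SHA256)."""
    return hmac.new(SECRET_KEY, user_id.encode(), hashlib.sha256).hexdigest()[:16]

def location_report(user_id: str, lat: float, lon: float) -> str:
    """Build the message sent to the service provider: pseudonym + location."""
    return json.dumps({"pseudonym": pseudonymize(user_id), "lat": lat, "lon": lon})

print(location_report("alice@example.com", 42.2406, -8.7207))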
Deanonymization based on home location [Hoh, Gruteser 2006]
• Data from GPS traces of the greater Detroit area (1 min resolution).
• No data when the vehicle is parked.
• K-means algorithm for clustering locations, plus 2 heuristics:
  • Eliminate centroids that don't have evening visits.
  • Eliminate centroids outside residential areas (manually).
Source: [Hoh, Gruteser 2006]
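A minimal sketch of the clustering idea, assuming a trace of (hour_of_day, lat, lon) rows and using scikit-learn's KMeans; the number of clusters and the evening window are arbitrary choices, and the manual residential-area filter used in the paper is omitted.

```python
# Cluster a pseudonymous GPS trace and keep centroids with evening visits as
# home candidates, in the spirit of [Hoh, Gruteser 2006]. Column layout, the
# number of clusters and the "evening" window are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans

def home_candidates(trace, n_clusters=5, evening=(19, 23)):
    """trace: array of rows (hour_of_day, lat, lon) from a single pseudonymous user."""
    trace = np.asarray(trace, dtype=float)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(trace[:, 1:3])
    candidates = []
    for c in range(n_clusters):
        hours = trace[km.labels_ == c, 0]
        # Heuristic 1: keep centroids that receive visits in the evening hours.
        if np.any((hours >= evening[0]) & (hours <= evening[1])):
            candidates.append(km.cluster_centers_[c])
    return candidates  # Heuristic 2 (residential zoning) would further prune these.

# Toy trace: daytime points downtown, evening points near a fixed "home".
rng = np.random.default_rng(0)
day = np.c_[rng.integers(9, 18, 200), 42.23 + 0.01 * rng.standard_normal(200),
            -8.72 + 0.01 * rng.standard_normal(200)]
night = np.c_[rng.integers(19, 24, 60), 42.21 + 0.001 * rng.standard_normal(60),
              -8.70 + 0.001 * rng.standard_normal(60)]
print(home_candidates(np.vstack([day, night])))
```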
Deanonymization based on home location [Krumm 2007]
• Two weeks of GPS data from 172 subjects (avg. 6 s resolution).
• A heuristic is used to single out trips made by car.
• Then several heuristics are applied: the destination closest to 3 a.m. is home; the place where the individual spends most time is home; the center of the cluster with most points is home.
• Reverse geocoding and white pages are used to deanonymize. Success is measured by finding out the name of the individual.
• Positive identification rates are around 5%.
• Even noise addition with std = 500 m leaves around 5% success when measured by finding out the correct address.
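A small sketch of two of Krumm's home-finding heuristics (the dwell-time variant is omitted, as are the reverse-geocoding and white-pages lookups); the trace format (datetime, lat, lon) and the grid-cell size are assumptions for illustration.

```python
# Two home-finding heuristics in the spirit of [Krumm 2007].
from collections import Counter
from datetime import datetime

def closest_to_3am(trace):
    """'Last destination' heuristic: the point recorded closest to 3 a.m. is home."""
    def minutes_from_3am(t):
        m = t.hour * 60 + t.minute
        d = abs(m - 180)              # 3 a.m. is minute 180 of the day
        return min(d, 1440 - d)       # wrap around midnight
    _, lat, lon = min(trace, key=lambda rec: minutes_from_3am(rec[0]))
    return lat, lon

def densest_cell(trace, cell=0.001):
    """'Most points' heuristic: the centre of the grid cell with most fixes is home."""
    counts = Counter((round(lat / cell), round(lon / cell)) for _, lat, lon in trace)
    (i, j), _ = counts.most_common(1)[0]
    return i * cell, j * cell

trace = [(datetime(2007, 5, 1, 2, 40), 42.21, -8.70),
         (datetime(2007, 5, 1, 14, 5), 42.23, -8.72)]
print(closest_to_3am(trace), densest_cell(trace))
```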
Mobile trace uniqueness [de Montjoye et al. 2013]
• Study on 15 months of mobility data from 0.5M individuals.
• With hourly updates and spatial resolution given by the carrier's antennas, only 4 spatio-temporal points suffice to identify 95% of individuals.
• Uniqueness of mobility traces decays roughly as the 1/10th power of their resolution.
Source: [de Montjoye et al. 2013]
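A rough sketch of how trace uniqueness can be measured in the spirit of the paper: sample p spatio-temporal points from a user and check how often no other user matches them all. The data format (user → set of (hour, antenna) pairs) and the synthetic toy dataset are assumptions, not the actual carrier data.

```python
# Estimate the fraction of users uniquely identified by p random points of
# their trace, on a synthetic dataset.
import random

def fraction_unique(traces, p=4, trials=20, seed=1):
    rng = random.Random(seed)
    unique = 0
    for user, points in traces.items():
        hits = 0
        for _ in range(trials):
            sample = set(rng.sample(sorted(points), min(p, len(points))))
            matching = [u for u, pts in traces.items() if sample <= pts]
            hits += (matching == [user])
        unique += hits / trials
    return unique / len(traces)

# Synthetic toy dataset: 50 users, 100 hourly observations over 20 antennas.
rng = random.Random(0)
traces = {u: {(h, rng.randrange(20)) for h in range(100)} for u in range(50)}
print(f"fraction uniquely identified by 4 points: {fraction_unique(traces):.2f}")
```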
Location privacy protection mechanisms
Location white lies Source: Caro Spark (CC BY-NC-ND)
Location based privacy mechanisms
[Diagram: input location X → LPPM → output pseudolocation Z.] Source: Motherboards.org
Location privacy protection mechanisms (LPPMs)
• An LPPM maps the true location X to a released pseudolocation Z.
• The mechanism may be deterministic (e.g., quantization) or stochastic (e.g., noise addition).
• The function may depend on other contextual (e.g., time) or user-tunable (e.g., privacy level) parameters.
• When the mechanism is stochastic, there is an underlying probability density function, i.e., f(Z|X).
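A minimal sketch of the two flavours of LPPM just defined, one deterministic (grid quantization) and one stochastic (a draw from f(Z|X), here Gaussian noise); the grid size and noise scale stand in for the user-tunable privacy parameters and are arbitrary values, not taken from the slides.

```python
# Toy LPPMs: the mechanism takes the true location X and returns a
# pseudolocation Z, either deterministically or by sampling from f(Z|X).
import numpy as np

rng = np.random.default_rng(0)

def deterministic_lppm(x, cell=0.01):
    """Quantization: snap the location to a fixed grid."""
    return np.round(np.asarray(x) / cell) * cell

def stochastic_lppm(x, sigma=0.01):
    """Noise addition: draw Z from f(Z|X=x), here a Gaussian centred at x."""
    return np.asarray(x) + rng.normal(scale=sigma, size=2)

x = (42.2406, -8.7207)           # true location
print(deterministic_lppm(x), stochastic_lppm(x))
```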
Hiding
Perturbation: (independent) noise addition
Perturbation: quantization
Obfuscation
Spatial Cloaking
How to commit the perfect murder
Space-time Cloaking
Dummies
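Minimal sketches of two of the mechanisms named on the preceding slides, spatial cloaking and dummies; the grid size, the number of dummies and the uniform dummy-generation strategy are illustrative assumptions (naively generated dummies are easy for an adversary to filter out).

```python
# Toy spatial cloaking (report a region instead of a point) and dummies
# (hide the real location among fabricated ones).
import random

def spatial_cloak(lat, lon, cell=0.05):
    """Return the bounding box of the grid cell containing the true location."""
    south, west = (lat // cell) * cell, (lon // cell) * cell
    return (south, west, south + cell, west + cell)

def with_dummies(lat, lon, k=4, spread=0.05, seed=None):
    """Return the true location shuffled among k-1 random dummy locations."""
    rng = random.Random(seed)
    reports = [(lat + rng.uniform(-spread, spread),
                lon + rng.uniform(-spread, spread)) for _ in range(k - 1)]
    reports.append((lat, lon))
    rng.shuffle(reports)
    return reports

print(spatial_cloak(42.2406, -8.7207))
print(with_dummies(42.2406, -8.7207, seed=0))
```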
User-centric vs. Centralized LPPM: User-centric
User-centric vs. Centralized LPPM: Centralized
Utility vs. Privacy
• In broad terms, there is a trade-off: gaining privacy means losing utility, and vice versa.
Very nice, but …
• There are two main problems:
  • How do we measure utility?
  • How do we measure privacy?
How to measure utility?
How to measure utility?
[Figure: distance between the real position and the reported pseudolocation.]
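One common choice (an assumption here, not necessarily the metric on the slide) is the average distance between the real positions and the reported pseudolocations; the sketch below uses plain Euclidean distance on planar coordinates, where a real system would use geodesic distance.

```python
# Average quality loss between real and reported locations.
import numpy as np

def average_loss(real, reported):
    real, reported = np.asarray(real), np.asarray(reported)
    return np.mean(np.linalg.norm(real - reported, axis=1))

real = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
reported = real + np.random.default_rng(0).normal(scale=0.3, size=real.shape)
print(f"average quality loss: {average_loss(real, reported):.3f}")
```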
A note about distances (d1 vs. d2)
Adversarial definition of privacy [Shokri et al. 2011-]
• Assume a stochastic mechanism f(Z|X) for the user.
• The adversary constructs a (possibly stochastic) estimation "remapping" r(X̂|Z).
• The prior π(X) is assumed available to the adversary.
• d_p(x̂, x): distance between the estimate x̂ and the true location x.
• d_q(x, z): distance between x and the released location z.
[Diagram: x → LPPM → z → Adversary → x̂]
Adversarial definition of privacy [Shokri et al. 2011-]
• Establish a cap on the average utility loss: E{d_q(X, Z)} ≤ QL.
• This is a Stackelberg game in which the user chooses first and the adversary plays second.
• Find the optimal adversarial "remapping": r*(X̂|Z) = arg min_r E{d_p(X̂, X) | Z}.
• The optimal remapping depends on f(Z|X) and the prior π(X):
  E{d_p(X̂, X) | Z} = Σ_{x̂,x} r(x̂|Z) f(x|Z) d_p(x̂, x),
  where f(X|Z) = f(Z|X) π(X) / f(Z)   (LPPM × prior / f(Z)).
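A small numerical sketch of the remapping computation above for a toy discrete world: given a prior π(x), a mechanism f(z|x) and a distance d_p, compute the posterior f(x|z) and, for each z, the estimate x̂ minimizing the expected distance. All numbers are made up for illustration.

```python
# Bayesian adversary: posterior and optimal (deterministic) remapping on a toy example.
import numpy as np

pi = np.array([0.5, 0.3, 0.2])                 # prior over 3 locations
f_z_given_x = np.array([[0.8, 0.1, 0.1],       # rows: x, columns: z
                        [0.1, 0.8, 0.1],
                        [0.1, 0.1, 0.8]])
d_p = np.abs(np.subtract.outer(np.arange(3), np.arange(3)))  # |x_hat - x| on a line

f_z = pi @ f_z_given_x                         # marginal f(z)
posterior = (f_z_given_x * pi[:, None]) / f_z  # f(x|z) = f(z|x) pi(x) / f(z)

# For each z, expected distance of every candidate x_hat, then pick the minimizer.
expected_dist = d_p @ posterior                # entry (x_hat, z)
r_star = expected_dist.argmin(axis=0)          # optimal remapping, one x_hat per z
print("posterior f(x|z):\n", posterior)
print("optimal x_hat for each z:", r_star)
```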
Example: uniform noise addition
[Figure: prior π(x), true location x, LPPM f(Z = z|X = x), released location z, and adversary's estimate x̂.]
Adversarial definition of privacy [Shokri et al. 2011-]
• When for a given Z there are several minimizers X̂, the function r*(X̂|Z) becomes stochastic.
• The user now must maximize privacy:
  max_f E{d_p(X̂, X)} = max_f Σ_{z,x̂,x} r(x̂|z) f(z|x) π(x) d_p(x̂, x).
• The maximum is achieved for some mechanism f*(Z|X).
• Privacy is defined as E{d_p(X̂, X)} after solving this maxmin problem.
An interesting result
• When d_p = d_q, the optimal mechanism f*(Z = z|X) is such that z already minimizes E{d_p(z, X) | Z = z}, so the adversary's optimal remapping is r*(X̂|Z = z) = δ(X̂ − z), i.e., do nothing!
• When d_p = d_q is the squared Euclidean distance, the following identity must hold: z = E{X | Z = z}.
• When both user and adversary play optimally: Privacy = Utility Loss.
The Utility Loss-Privacy plane
[Figure: Privacy vs. Utility Loss axes, the line P = UL, the achievable region, several adversary strategies (1-4), the optimal adversary, and the optimal mechanism.]
What's wrong with priors?
• Is it realistic to assume that the adversary knows the prior?
• The adversary no longer plays optimally with the 'wrong' prior.
• Shokri's privacy definition is prior-dependent.
• The definition of differential privacy is prior-independent:
  |log Pr{A(D_1) ∈ S} − log Pr{A(D_2) ∈ S}| ≤ ε
  - D_1, D_2: two databases differing in a single element.
  - A: a randomized algorithm.
  - S: any subset of im(A).
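For comparison, the standard textbook example of a differentially private primitive (not taken from the slides): the Laplace mechanism for a counting query, whose guarantee holds regardless of any prior the adversary may hold.

```python
# Laplace mechanism for a counting query: adding Laplace noise with scale
# sensitivity/epsilon (sensitivity 1 for a count) yields epsilon-DP output.
import numpy as np

def dp_count(values, predicate, epsilon=0.5, seed=None):
    """epsilon-DP count of how many records satisfy `predicate`."""
    rng = np.random.default_rng(seed)
    true_count = sum(1 for v in values if predicate(v))
    return true_count + rng.laplace(scale=1.0 / epsilon)

ages = [23, 31, 45, 52, 29, 38]
print("noisy count of people over 30:", dp_count(ages, lambda a: a > 30, seed=0))
```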