Location Privacy. Where do we stand and where are we going? - PowerPoint PPT Presentation

SLIDE 1

Location Privacy.

Where do we stand and where are we going?

Fernando Pérez-González
Signal Theory and Communications Department
Universidad de Vigo - SPAIN

SLIDE 2

2

Why do we like location based apps?

SLIDE 3

Google maps

3

SLIDE 4

Foursquare

4

SLIDE 5

Facebook place tips

5

SLIDE 6

Waze

6

SLIDE 7

And, of course…

7

SLIDE 8

8

How can you be geolocated? (without you fully knowing)

SLIDE 9

IP-based Geolocation

9

Source: GeoIPTool

SLIDE 10

Meta-data based Geolocation

10

SLIDE 11

Landmark recognition Geolocation

11

SLIDE 12

Biometric geolocation

12

SLIDE 13

Credit card usage Geolocation

14

SLIDE 14

Triangulation and other geolocation techniques

15

SLIDE 15

Signal strength-based triangulation

16

Source: The Wrongful Convictions Blog

SLIDE 16

17

Source: The Wrongful Convictions Blog

Signal strength-based triangulation

SLIDE 17

Multilateration: Time Difference of Arrival (TDOA)

18

Source: [Fujii et al. 2015]
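The geometry behind TDOA multilateration is easy to demonstrate numerically. Below is a toy sketch, not the method of [Fujii et al. 2015]: hypothetical anchor positions, an assumed propagation speed, and a brute-force grid search in place of a real hyperbolic solver.

```python
import math

# Toy TDOA multilateration: hypothetical anchors on a 100 m square.
ANCHORS = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (100.0, 100.0)]
C = 3.0e8  # assumed propagation speed (m/s)

def predicted_tdoas(p):
    """TDOAs of anchors 1..n relative to reference anchor 0."""
    dists = [math.hypot(ax - p[0], ay - p[1]) for ax, ay in ANCHORS]
    return [(d - dists[0]) / C for d in dists[1:]]

def locate(tdoas, step=1.0):
    """Brute-force grid search over the square; fine for a toy example.
    A real solver would intersect the hyperbolas analytically."""
    best, best_err = None, float("inf")
    for i in range(101):
        for j in range(101):
            p = (i * step, j * step)
            err = sum((a - b) ** 2
                      for a, b in zip(predicted_tdoas(p), tdoas))
            if err < best_err:
                best, best_err = p, err
    return best

estimate = locate(predicted_tdoas((30.0, 70.0)))
```

Three time differences pin down a unique 2-D position here, which is why the search recovers the true point.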

SLIDE 18

Wardriving geolocation (Wigle)

19

Source: Wigle.net

SLIDE 19

Electrical Network Frequency Geolocation

20

SLIDE 20

21

SLIDE 21

22

Why is it dangerous?

SLIDE 22

23

SLIDE 23

Buster busted!

24

SLIDE 24

25

SLIDE 25

26

SLIDE 26

6 months in the life of Malte Spitz (2009-2010)

29

Source: http://www.zeit.de/datenschutz/malte-spitz-data-retention

SLIDE 27

31

Are we concerned about it?

SLIDE 28

Are people really concerned about location privacy?

  • Survey by Skyhook Wireless (July 2015) of 1,000 smartphone app users.
  • 40% hesitate to share, or don't share, their location with apps.
  • 20% turned off location for all their apps.
  • Why don't people share their location?
    • 50% privacy concerns.
    • 23% don't see value in location data.
    • 19% say it drains their battery.
  • Why do people turn off location?
    • 63% battery drain.
    • 45% privacy.
    • 20% to avoid advertising.

32

SLIDE 29

33

How much is geolocation data worth?

SLIDE 30

34

SLIDE 31

How much value do we give to location data? [Staiano et al. 2014]

35

  • Many participants opted out of revealing geolocation information.
  • Average daily value of location info: 3 €.
  • Strong correlation between the amount traveled and the value given to location data.

SLIDE 32

Earn money as you share data

36

  • GeoTask: £1 PayPal cash voucher per 100 days of location data sharing (£0.01/day).
  • Financial Times in 2013: advertisers are willing to pay a mere $0.0005 per person for general information such as their age, gender and location, or $0.50 per 1,000 people.

SLIDE 33

Pay as you drive

38

  • The premium formula can be a function of the amount of miles driven, the type of driving, the age of the driver, the type of roads used…
  • Up to 40% reduction in the cost of insurance.

SLIDE 34

39

BIA/Kelsey projects U.S. location-targeted mobile ad spending to grow from $9.8 billion in 2015 to $29.5 billion in 2020. That's $90 per person per year!

SLIDE 35

40

SAP, Germany, estimates wireless carrier revenue from selling mobile-user behavior data at $5.5 billion in 2015 and predicts $9.6 billion for 2016.

SLIDE 36

47

How about anonymization/pseudonymization?

SLIDE 37

Anonymity

Problems:

  • Difficult authentication and personalization.
  • Operating system or apps may access location before anonymization.

48

(Diagram: location → anonymity provider (local/central) → location service provider.)

SLIDE 38

Pseudonymity

Problems:

  • Operating system or apps may access location data before pseudonymization.
  • Deanonymization.

49

(Diagram: user → pseudonym → location service provider.)

SLIDE 39

Deanonymization based on home location [Hoh, Gruteser 2006]

  • Data from GPS traces of larger Detroit area (1 min resolution).
  • No data when vehicle parked.
  • K-means algorithm for clustering locations + 2 heuristics:
  • Eliminate centroids that don’t have evening visits.
  • Eliminate centroids outside residential areas (manually).

50

Source: [Hoh, Gruteser 2006]
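The clustering step above can be sketched in a few lines. This is a toy stand-in for the paper's pipeline, not its exact setup: plain k-means with deterministic first-k initialization, plus the evening-visit heuristic; the coordinates, hours, and radius below are invented for illustration.

```python
import math

def kmeans(points, k, iters=20):
    """Plain k-means with deterministic first-k initialization (toy)."""
    cents = points[:k]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: math.dist(p, cents[c]))
            groups[j].append(p)
        cents = [
            (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g))
            if g else cents[j]
            for j, g in enumerate(groups)
        ]
    return cents

def home_candidates(visits, k=2, radius=2.0):
    """Cluster visit positions, then keep only centroids with at least
    one evening (7 p.m.-7 a.m.) visit nearby - the first heuristic.
    visits: list of ((x, y), hour_of_day) pairs."""
    cents = kmeans([p for p, _ in visits], k)
    def has_evening(c):
        return any(h >= 19 or h < 7
                   for p, h in visits if math.dist(p, c) < radius)
    return [c for c in cents if has_evening(c)]
```

The second heuristic of the paper (discarding centroids outside residential areas) was manual, so it has no code counterpart here.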

SLIDE 40

Deanonymization based on home location [Krumm 2007]

  • 2-week GPS data from 172 subjects (avg. 6 s resolution).
  • Use a heuristic to single out trips by car.
  • Then use several heuristics: the destination closest to 3 a.m. is home; the place where the individual spends most time is home; the center of the cluster with most points is home.
  • Use reverse geocoding and white pages to deanonymize. Success measured by finding out the name of the individual.
  • Positive identification rates around 5%.
  • Even noise addition with std = 500 m gives around 5% success when measured by finding the correct address.
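The 3 a.m. heuristic is easy to sketch directly; the (hour, place) trace format below is assumed for illustration:

```python
def home_by_3am(trace):
    """Krumm-style heuristic sketch: guess that the point whose
    timestamp is closest to 3 a.m. (wrapping at midnight) is home.
    trace: list of (hour_of_day, place) pairs."""
    def gap(hour):
        d = abs(hour - 3.0)
        return min(d, 24.0 - d)  # circular distance on the 24 h clock
    return min(trace, key=lambda rec: gap(rec[0]))[1]
```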

51

SLIDE 41

Mobile trace uniqueness [de Montjoye et al 2013]

  • Study on 15 months of mobility data; 0.5M individuals.
  • In a dataset with hourly updates and spatial resolution given by the carrier's cell antennas, only 4 points suffice to identify 95% of individuals.
  • Uniqueness of mobility traces decays as the 1/10th power of their resolution.

52

Source: [de Montjoye et al. 2013]

SLIDE 42

53

Location privacy protection mechanisms

SLIDE 43

Location white lies

54

Source: Caro Spark (CC BY-NC-ND)

SLIDE 44

Location based privacy mechanisms

55

(Diagram: input location X → LPPM → output pseudolocation Z.)

Source: Motherboards.org

SLIDE 45

Location privacy protection mechanisms (LPPMs)

  • The mechanism may be deterministic (e.g., quantization) or stochastic (e.g., noise addition).
  • The function Z = ψ(X) may depend on other contextual (e.g., time) or user-tunable (e.g., privacy level) parameters.
  • When the mechanism is stochastic, there is an underlying probability density function, i.e., f(Z|X).

56

SLIDE 46

Hiding

57

SLIDE 47

Perturbation: (independent) noise addition

58

SLIDE 48

Perturbation: quantization

59
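A minimal sketch of quantization as a deterministic LPPM; the 500 m grid cell is an assumed parameter:

```python
def quantize(x, cell=500.0):
    """Deterministic LPPM: snap a coordinate pair to the nearest grid
    point, so every location in a cell reports the same point."""
    return (round(x[0] / cell) * cell, round(x[1] / cell) * cell)
```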

SLIDE 49

Obfuscation

60

SLIDE 50

Spatial Cloaking

61

SLIDE 51

How to commit the perfect murder

62

SLIDE 52

Space-time Cloaking

63


SLIDE 53

Dummies

64

SLIDE 54

User-centric vs. Centralized LPPM

65

User-centric

SLIDE 55

User-centric vs. Centralized LPPM

66

Centralized

SLIDE 56

67

SLIDE 57

Utility vs. Privacy

68

(Diagram: the privacy-utility trade-off.)

  • In broad terms: the more privacy, the less utility.
SLIDE 58

Very nice, but…

  • There are two main problems:
    • How do we measure utility?
    • How do we measure privacy?

69

SLIDE 59

How to measure utility?

70

SLIDE 60

71

How to measure utility?

SLIDE 61

How to measure utility?

72

(Diagram: real position x vs. reported pseudolocation z.)

SLIDE 62

A note about distances

76

(Diagram: two distances, d1 and d2, between locations.)

SLIDE 63

Adversarial definition of privacy [Shokri et al 2011-]

  • Assume a stochastic mechanism f(Z|X) for the user.
  • The adversary constructs a (possibly stochastic) estimation remapping r(X̂|Z).
  • The prior π(X) is assumed available to the adversary.
  • d_p(x, x̂): distance between x and x̂.
  • d_q(x, z): distance between x and z.

77

(Diagram: x → LPPM → z → adversary → x̂.)

SLIDE 64

Adversarial definition of privacy [Shokri et al 2011-]

  • Establish a cap on the average utility loss: E_q{d_q(X, Z)} ≤ QL.
  • This is a Stackelberg game in which the user chooses f(Z|X) first and the adversary plays second.
  • Find the optimal adversarial 'remapping':
    r*(x̂|z) = arg min_r E_p{d_p(X, x̂) | z},
    where E_p{d_p(X, x̂) | z} = Σ_{x, x̂} r(x̂|z) f(x|z) d_p(x, x̂) and f(x|z) = f(z|x) π(x) / f(z).
  • The optimal remapping depends on the LPPM f(Z|X) and the prior π(X).

78
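On a finite location set the optimal remapping can be computed directly. A sketch under assumed discrete inputs: a prior `pi`, a mechanism matrix `F[x][z] = f(z|x)`, and a distance function `d`:

```python
def optimal_remap(locs, pi, F, d):
    """Bayes-optimal deterministic remapping: for each observed z,
    pick x_hat minimizing the posterior expected distance
    E{d(X, x_hat) | z}."""
    remap = {}
    for z in locs:
        fz = sum(pi[x] * F[x][z] for x in locs)          # marginal f(z)
        if fz == 0:
            continue  # z is never reported
        post = {x: pi[x] * F[x][z] / fz for x in locs}   # posterior f(x|z)
        remap[z] = min(locs,
                       key=lambda xh: sum(post[x] * d(x, xh) for x in locs))
    return remap
```

As a sanity check, against a noiseless mechanism the optimal remapping is the identity.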

SLIDE 65

Example: uniform noise addition

79

(Diagram: prior on x → uniform LPPM f(Z = z | X = x) → reported z → adversarial estimate x̂.)
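Sampling such a uniform mechanism is straightforward; a sketch that reports the true location plus a point drawn uniformly from a disc (the radius is a user-chosen parameter):

```python
import math
import random

def uniform_disc_lppm(x, radius, rng=random):
    """Uniform noise addition (sketch): report x plus a point drawn
    uniformly from a disc of the given radius."""
    r = radius * math.sqrt(rng.random())   # sqrt gives uniform area density
    theta = 2 * math.pi * rng.random()
    return (x[0] + r * math.cos(theta), x[1] + r * math.sin(theta))
```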

SLIDE 66

Adversarial definition of privacy [Shokri et al 2011-]

  • When for a given z there are several minimizers x̂, the function r*(x̂|z) becomes stochastic.
  • The user now must maximize privacy:
    max_f E_p{d_p(X̂, X)} = max_f Σ_{x, x̂, z} r*(x̂|z) f(z|x) π(x) d_p(x̂, x),
    which is achieved for some mechanism f*(Z|X).
  • Privacy is defined as E_p{d_p(X̂, X)} after solving this maxmin problem.

80

SLIDE 67

An interesting result

  • When d_p = d_q: r*(X̂ = x̂ | z) = δ(x̂ − z), i.e., do nothing!
  • When d_p = d_q = d² (squared distance), the following identity must hold: ẑ = E{X | Z = z}, with f*(Z = z | X) = arg min_z E_p{d(z, X)}.
  • When both user and adversary play optimally: Privacy = Utility Loss.

81

SLIDE 68

The Utility Loss-Privacy plane

85

(Diagram: the Utility Loss-Privacy plane, showing the achievable region of the optimal mechanism, the achievable region of the optimal adversary, and several adversary strategies playing on the line P = UL.)
SLIDE 69

What’s wrong with priors?

  • Is it realistic to assume that the adversary knows the prior?
  • The adversary no longer plays optimally with the 'wrong' prior.
  • Shokri's privacy definition is prior-dependent.
  • The definition of differential privacy is prior-independent:
    • D1, D2: two databases differing in a single element.
    • A: randomized algorithm.
    • S: set of possible subsets of im(A).
    |log Pr{A(D1) ∈ S} − log Pr{A(D2) ∈ S}| ≤ ε

86

SLIDE 70

Geoindistinguishability [Chatzikokolakis et al 2013-]

  • A mechanism is geo-indistinguishable iff
    |log f(z | X = x) − log f(z | X = x′)| ≤ ε · d_p(x, x′)   for all x, x′, z.
  • Differential privacy corresponds to d_p = Hamming distance.
  • The definition is prior-independent.
  • It guarantees a small leakage of information BUT is no defense against EVERY adversary: with proper side information, the adversary can learn a lot!

87

SLIDE 71

Uniform mechanisms do not provide geo-ind

88

(Diagram: two nearby locations x and x′ with uniform densities f(Z|X = x) and f(Z|X = x′); outside the overlap of their supports, |log f(z|X = x) − log f(z|X = x′)| is unbounded.)

SLIDE 72

Laplacian mechanism

  • Laplacian distribution in polar coordinates:
    f(z | X = x) = (ε² / 2π) · e^(−ε·d(x, z)).
  • Then,
    |log f(z | X = x) − log f(z | X = x′)| = ε · |d(z, x′) − d(z, x)| ≤ ε · d(x, x′)
    by the triangle inequality.
  • The Laplacian mechanism therefore satisfies the geo-ind condition.

89

SLIDE 73

Laplacian mechanism

90
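Sampling the planar Laplacian is simple in polar coordinates: the angle is uniform and the radius has density ε²·r·e^(−εr), a Gamma(2, 1/ε) law, i.e., the sum of two exponentials. A sketch:

```python
import math
import random

def planar_laplace(x, eps, rng=random):
    """Draw z from f(z|x) = (eps^2 / 2*pi) * exp(-eps * d(x, z)):
    uniform angle, radius = sum of two Exp(eps) draws (Gamma(2, 1/eps))."""
    theta = 2 * math.pi * rng.random()
    # 1 - random() lies in (0, 1], so the logs are always finite.
    r = -(math.log(1.0 - rng.random()) + math.log(1.0 - rng.random())) / eps
    return (x[0] + r * math.cos(theta), x[1] + r * math.sin(theta))
```

The mean reported distance from the true location is 2/ε, which is how ε trades privacy for utility.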

SLIDE 74

Optimal mechanisms for geo-ind

  • Minimize the quality loss E_q{d_q(X, Z)} = Σ_{x,z} f(z|x) π(x) d_q(x, z), subject to the geo-ind constraint.
  • Fact: the geo-ind constraint is kept under any adversarial remapping r(x̂|z).
  • The optimal mechanism is then f*(Z|X) = arg min_f E_q{d_q(X, Z)} subject to geo-ind.
  • The optimal adversarial remapping would find r*(x̂|z) = arg min_r E_p{d_p(X, x̂) | z}.

91

SLIDE 75

Optimal mechanisms for geo-ind

  • If d_p = d_q, the adversary does nothing: minimization of the QL has already been done by the mechanism!
  • But if the adversary does nothing, Privacy = QL.
  • The operating value thus depends on ε (the smaller ε, the larger the privacy).

92

SLIDE 76

98

Where are we going?

SLIDE 77

Sensitivity [Bertino et al. 2010]

99

SLIDE 78

Sensitivity

  • The mechanism should weigh the importance given by the user to each location.
  • This can be specified semantically by defining categories.
  • Sensitivity of a region: the probability that the user, known to be in that region, is actually in a sensitive place.
  • For other mechanisms: open problem.

100

SLIDE 79

Graph-based models

101

SLIDE 80

Graph-based models

102

SLIDE 81

Graph-based models

103

Trace

SLIDE 82

Graph-based models

  • A trace is a path together with times: {X_i, t_i}, i = 1, …, N.
  • Common assumption for an adversary: the true trace can be described through a Markov chain.
  • Prior state probabilities P(S_m) and transition probabilities P(S_m | S_n) between states can be estimated if training traces are (at least partially) available.

104

(Diagram: a state graph annotated with priors P(S_k), P(S_l), P(S_m), P(S_n) and transition probabilities such as P(S_l | S_k), P(S_n | S_l), P(S_m | S_k), P(S_m | S_n), estimated from training data.)
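Estimating the transition priors from training traces amounts to a simple counting pass; a sketch (the state names are invented for illustration):

```python
from collections import Counter

def estimate_transitions(traces):
    """Estimate Markov transition probabilities P(next | current) from
    training traces (each a list of discrete states) - the kind of
    prior an adversary could build from partially available data."""
    counts, totals = Counter(), Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
            totals[a] += 1
    return {pair: c / totals[pair[0]] for pair, c in counts.items()}
```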

SLIDE 83

Graph-based models

  • Shokri et al.'s approach: depending on what the adversary wants to learn, apply a different method.
  • Maximum likelihood: find the most likely trace given the observed trace:
    arg max_{X_i, t_i} f({X_i, t_i} | {Z_i, t_i}).
  • Dynamic programming (e.g., the Viterbi algorithm) can be used.

105
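A textbook Viterbi sketch for this maximum-likelihood step, over discrete states and observations; the probability tables in the test are illustrative, not from the talk:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden state sequence (true locations) given the
    observed sequence (e.g., pseudolocations)."""
    # best[s] = (probability of the best path ending in s, that path)
    best = {s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}
    for o in obs[1:]:
        best = {
            s: max(
                ((p * trans_p[prev][s] * emit_p[s][o], path + [s])
                 for prev, (p, path) in best.items()),
                key=lambda t: t[0],
            )
            for s in states
        }
    return max(best.values(), key=lambda t: t[0])[1]
```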

SLIDE 84

Graph-based models

  • Distribution estimation: estimate the probabilities of all traces using the Metropolis-Hastings algorithm.

106

SLIDE 85

Graph-based models

  • Location estimation: find the most likely node at time t_k:
    arg max_{X_k} f(X_k | {Z_i, t_i}).
  • This can be solved using the forward-backward algorithm to recursively compute the probabilities.

107

SLIDE 86

Privacy as a zero-sum game

109

(Diagram: the Utility Loss-Privacy plane with the achievable regions of the optimal mechanism and the optimal adversary; along the line P = UL, Privacy + Utility = constant, i.e., a zero-sum game.)

SLIDE 87

Adding a new dimension: bandwidth

110

(Diagram: with n = 3 queries plus 8 dummies, utility loss and privacy are expressed through squared distances over the small region s and the enlarged region S; dummies buy privacy with bandwidth.)

SLIDE 88

The Utility Loss-Privacy-Bandwidth region

111

(Diagram: the Utility Loss-Privacy plane with the achievable regions of the optimal mechanism and the optimal adversary. Dummying moves the adversary's playing line from P = UL to P = 3·UL at the cost of 9× more bandwidth; the gap between the service provider's and the user's utility loss is the privacy gain due to dummying.)

SLIDE 89

Space-time cloaking

112

  • k-anonymity ≈ area × time × pop. density.
  • Utility loss grows with the cloaked area and the added delay; so does privacy.

SLIDE 90

Privacy-preserving queries

(Diagram: encrypted query → retrieval in the encrypted domain → encrypted reply.)

SLIDE 91

114

SLIDE 92

Thanks!

fperez@gts.uvigo.es www.gpsc.uvigo.es

Grupo Procesado de Señal en Comunicaciones

SLIDE 93

What utility? An example

116

  • k-anonymity ≈ area × time × pop. density.
  • Utility ∝ 1/area and decreases with delay (up to some d_max); privacy grows with the area.

SLIDE 94

But delay also counts…

117

(Diagram: utility-privacy curves for delays of 5, 10 and 15 minutes.)

SLIDE 95

118

SLIDE 96

What utility? Another example

  • Space-time slicing
  • Is this related to bandwidth?

119

SLIDE 97

120