DBSec’13
Differentially Private Multi-Dimensional Time Series Release for Traffic Monitoring
Liyue Fan, Li Xiong, Vaidy Sunderam
Department of Math & Computer Science, Emory University

9/4/2013 DBSec'13: Privacy Preserving Traffic Monitoring
Outline
• Traffic Monitoring
• User Privacy
• Challenges
• Proposed Solutions
  • Temporal Estimation
  • Spatial Estimation
• Empirical Evaluation

Monitoring Traffic
• Congestion / trending places / everyday life
• How many cars are there? Where are they?
[Images: Monital Metropol, Brazil; Google Traffic View]

Traffic Monitoring
• Real-time GPS data → traffic histogram
• At any timestamp: real-time user locations are aggregated into a 2D histogram

User Privacy
• User privacy should be protected when releasing their data!
• Real-time location data is sensitive
  • pleaserobme.com
• GPS traces are identifying
  • “We study fifteen months of human mobility data for one and a half million individuals and find that human mobility traces are highly unique. … in a dataset where the location of an individual is specified hourly, and with a spatial resolution equal to that given by the carrier's antennas, four spatio-temporal points are enough to uniquely identify 95% of the individuals.”
    De Montjoye, Yves-Alexandre, Cesar A. Hidalgo, Michel Verleysen, and Vincent D. Blondel. "Unique in the Crowd: The Privacy Bounds of Human Mobility." Scientific Reports 3 (2013).

Differentially Private Data Sharing

Differential Privacy (in a nutshell)
• Rigorous definition
• Doesn’t stipulate the prior knowledge of the attacker
• Upon seeing the published data, an attacker should gain little knowledge about any specific individual.
• α-Differential Privacy [BLR08]
• Privacy budget α: smaller α values (α < 1) indicate a stronger privacy guarantee

Static α-Differential Privacy
• Laplace perturbation of a query f over dataset D: $A(D) = f(D) + \mathrm{Lap}(\Delta f / \alpha)^d$
• Strong privacy → high perturbation noise
• Global sensitivity: $\Delta f = \max_{D, D'} \lVert f(D) - f(D') \rVert_1$
• Example with $\Delta f = 1$: each cell count is perturbed as $\tilde{c}_i = c_i + \mathrm{Lap}(1/\alpha)$
  f(D): c1: 2, c2: 1, c3: 3, c4: 4
  A(D): c1: 1, c2: 0, c3: 5, c4: 3
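
A minimal Python sketch of the Laplace perturbation on this slide, assuming a counting query with sensitivity 1. The function names and the fixed seed are illustrative and not from the talk; the noise is sampled as the difference of two exponential draws so that only the standard library is needed.

```python
import random

def laplace_noise(scale, rng):
    """Sample Lap(scale): the difference of two Exp(1/scale) draws is Laplace."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def laplace_mechanism(counts, alpha, sensitivity=1.0, rng=None):
    """Add independent Lap(sensitivity / alpha) noise to every count."""
    rng = rng or random.Random()
    scale = sensitivity / alpha
    return [c + laplace_noise(scale, rng) for c in counts]

rng = random.Random(0)      # fixed seed only to make the demo repeatable
cells = [2, 1, 3, 4]        # f(D): the four example cell counts
noisy = laplace_mechanism(cells, alpha=1.0, rng=rng)
print(noisy)
```

Halving α doubles the noise scale, which is the "strong privacy → high perturbation noise" trade-off stated on the slide.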

Composability of Differential Privacy
• Sequential Composition [McSherry10]
  • Let $A_k$ each provide $\alpha_k$-differential privacy. A sequence of $A_k(D)$ over dataset $D$ provides $\sum_k \alpha_k$-differential privacy.
• Timestamp $k = 0, \dots, T-1$
• $f_k(D)$: 2D cell histogram at time $k$
• $A_k(D)$: released 2D histogram that satisfies $\frac{\alpha}{T}$-DP
• $A_0(D), \dots, A_{T-1}(D)$ satisfies $\alpha$-DP

Baseline Solution: LPA
• Laplace Perturbation Algorithm
• For each timestamp k:
  • Release $A_k(D) = f_k(D) + \mathrm{Lap}(T/\alpha)^d$
• High perturbation noise for long time series, i.e. when T is large
• Low utility output since data is sparse
  True counts: c1: 2, c2: 1, c3: 3, c4: 4
  Released counts: c1: 1, c2: 0, c3: 5, c4: 3
  Relative error: c1: 50%, c2: 100%
• Fact: location data is VERY sparse.
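
The baseline can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: each timestamp's histogram is a flat list of counts, the total budget α is split evenly so every timestamp gets α/T, and the helper names are hypothetical.

```python
import random

def laplace_noise(scale, rng):
    """Sample Lap(scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def lpa_noise_scale(T, alpha):
    """Per-timestamp budget is alpha / T, so the Laplace scale is T / alpha."""
    return T / alpha

def lpa_release(histograms, alpha, rng):
    """Perturb every cell of every timestamp with Lap(T / alpha) noise."""
    scale = lpa_noise_scale(len(histograms), alpha)
    return [[c + laplace_noise(scale, rng) for c in hist] for hist in histograms]

rng = random.Random(0)                                 # seed chosen for the demo
series = [[2, 1, 3, 4], [2, 0, 3, 5], [1, 1, 4, 4]]    # T = 3 toy histograms
released = lpa_release(series, alpha=1.0, rng=rng)
print(lpa_noise_scale(100, 1.0))   # scale for T = 100 timestamps, alpha = 1
```

With T = 100 and α = 1 the noise scale is 100, far larger than the single-digit counts of a sparse cell, which is why the baseline loses utility on long series.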

Two Proposed Solutions
• Temporal Estimation for each cell
  Utilize time series model and posterior estimation to reduce perturbation error.
• Spatial Estimation within each partition
  Group similar cells together to overcome data sparsity.
Example 4×4 cell histogram:
1 1 0 0
1 2 1 0
2 3 4 4
3 3 6 10

Framework
• Diagram components: Raw Series, Laplace Perturbation, Modeling/Partitioning, Estimation, Differentially Private Series
• Domain knowledge: known Sparse or Dense label for each cell.
• Doesn’t incur extra differential privacy cost

Temporal Estimation
• For each cell, its count series $\{x_k\}$, $k = 0, \dots, T-1$
  • e.g. {3, 3, 4, 5, 4, 3, 2, …}
• Process Model: $x_{k+1} = x_k + \omega$, $\omega \sim \mathcal{N}(0, Q)$
  • Q: small value for Sparse cells; large value for Dense cells.
• Measurement Model: $z_k = x_k + \nu$, $\nu \sim \mathrm{Lap}(T/\alpha)$
• Goal: given $z_k$ and the above models, estimate $x_k$.

Temporal Estimation (cont.)
• Estimation algorithm based on the Kalman filter
  • O(1) computation per timestamp
• Gaussian approximation: $\nu \sim \mathcal{N}(0, R)$, $R \propto T^2 / \alpha^2$
• Model-based Prediction → Posterior Estimate/Output
  • Linearly combine prediction and measurement
Fan and Xiong, CIKM’12, TKDE’13

Temporal Estimation Example
• For cell c, at time k:
  • Suppose $x_k = 4$
  • Prediction $\hat{x}_k^-$, e.g. 2
  • Measurement/Laplace perturbed value $z_k$, e.g. 8
  • Posterior estimate $\hat{x}_k$, e.g. 3
• The impact of perturbation noise is reduced by taking into account the process model and prediction!
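
The predict/correct cycle can be sketched as a scalar Kalman filter for the random-walk process model. This is a simplified illustration, not the authors' implementation: the function names, the initial state, and the variances used in the demo are assumptions, and the variances were picked only so that the step reproduces the example numbers (prediction 2, measurement 8, posterior 3).

```python
def kalman_step(x_prev, p_prev, z, q, r):
    """One timestamp: predict from the random-walk model, then correct with z."""
    x_prior = x_prev                 # prediction: x_{k+1} = x_k
    p_prior = p_prev + q             # prediction variance grows by Q
    gain = p_prior / (p_prior + r)   # weight given to the noisy measurement
    x_post = x_prior + gain * (z - x_prior)   # linear combination
    p_post = (1.0 - gain) * p_prior
    return x_post, p_post

def kalman_filter(noisy_series, q, r, x0=0.0, p0=1.0):
    """Filter a whole perturbed series; O(1) work per timestamp."""
    x, p = x0, p0
    estimates = []
    for z in noisy_series:
        x, p = kalman_step(x, p, z, q, r)
        estimates.append(x)
    return estimates

# Illustrative variances: prior variance 0.5 + Q 0.5 = 1, measurement variance R = 5.
x_post, p_post = kalman_step(x_prev=2.0, p_prev=0.5, z=8.0, q=0.5, r=5.0)
print(x_post)
```

With a prediction variance of 1 and R = 5 the gain is 1/6, so the estimate moves from 2 toward 8 by one sixth of the gap. A larger R (longer series or smaller α) lowers the gain and makes the filter trust the model more than the perturbed value.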

Spatial Estimation
• Goal: group cells to overcome data sparsity.
• First partition the space until each partition contains Sparse or Dense cells only
  • Top-down algorithm based on QuadTree
  • Data independence and efficiency
  Example label grid (S = Sparse, D = Dense):
  S S S S
  S S S S
  S S S S
  S S D D
• For each timestamp k:
  • $f'_k(D)$: partition counts, with $\Delta f' = 1$
  • $A'_k(D) = f'_k(D) + \mathrm{Lap}(T/\alpha)^{d'}$
  • Release $\hat{f}_k(D)$ estimated from $A'_k(D)$
• Each cell is visited O(1) times at each timestamp.
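
A simplified sketch of the top-down partitioning step, assuming a square grid whose side is a power of two and a known S/D label per cell. It splits a block into four quadrants until every block holds a single label; the function name and the tuple layout are hypothetical, and the algorithm in the talk may differ in its details.

```python
def quadtree_partition(labels, r0=0, c0=0, size=None):
    """Return homogeneous blocks as (row, col, size, label) tuples."""
    if size is None:
        size = len(labels)
    seen = {labels[r][c]
            for r in range(r0, r0 + size)
            for c in range(c0, c0 + size)}
    if len(seen) == 1:               # block is all Sparse or all Dense
        return [(r0, c0, size, seen.pop())]
    half = size // 2                 # otherwise split into four quadrants
    blocks = []
    for dr in (0, half):
        for dc in (0, half):
            blocks += quadtree_partition(labels, r0 + dr, c0 + dc, half)
    return blocks

labels = [list("SSSS"),
          list("SSSS"),
          list("SSSS"),
          list("SSDD")]              # the label grid from the slide
parts = quadtree_partition(labels)
for part in parts:
    print(part)
```

Because the split depends only on the public Sparse/Dense labels and not on the counts, the partitioning itself consumes no privacy budget, which matches the "data independence" bullet.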

Spatial Estimation Example
• At time k, perturbation noise is evenly distributed to every cell within the partition.
Original cell histogram $f_k(D)$:
1 1 0 0
1 2 1 0
2 3 4 4
3 3 6 10
[Figure: Original Cell Histogram $f_k(D)$ → Partition Histogram $f'_k(D)$ → Laplace Perturbed Histogram $A'_k(D)$ → Estimated Cell Histogram $\hat{f}_k(D)$]
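
The per-timestamp estimation step can be sketched as follows: sum each partition, perturb the sum once, and spread the noisy sum evenly over the partition's cells. The partition into four 2×2 quadrants is a hypothetical choice for illustration (the partition used in the figure is not recoverable from the slide text), and the function names are assumptions.

```python
import random

def laplace_noise(scale, rng):
    """Sample Lap(scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def spatial_estimate(counts, partitions, alpha, T, rng):
    """partitions: list of (row, col, size) square blocks covering the grid."""
    est = [[0.0] * len(row) for row in counts]
    for r0, c0, size in partitions:
        cells = [(r, c) for r in range(r0, r0 + size)
                        for c in range(c0, c0 + size)]
        total = sum(counts[r][c] for r, c in cells)      # partition count
        noisy = total + laplace_noise(T / alpha, rng)    # one draw per partition
        for r, c in cells:
            est[r][c] = noisy / len(cells)               # spread evenly
    return est

hist = [[1, 1, 0, 0],
        [1, 2, 1, 0],
        [2, 3, 4, 4],
        [3, 3, 6, 10]]                                   # histogram from the slide
quadrants = [(0, 0, 2), (0, 2, 2), (2, 0, 2), (2, 2, 2)] # hypothetical partition
estimate = spatial_estimate(hist, quadrants, alpha=1.0, T=100, rng=random.Random(0))
```

A 2×2 partition shares a single noise draw across four cells, so each cell carries a quarter of that draw instead of a full one. The price is that cells in the same partition all receive the same estimate.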

Evaluation: Data
• Generated moving objects on a road network
  • City of Oldenburg, Germany
  • 500K objects at the beginning
  • 25K new objects at every timestamp
  • Total time: 100 timestamps
• Two-dimensional 1024 by 1024 grid over the city map
  • Each cell represents 400 m²
  • Record object locations at cell resolution
• 95% of cells are labeled Sparse!
Generator: http://iapg.jade-hs.de/personen/brinkhoff/generator/

Temporal Estimation
[Figure: cell count vs. time for a single cell; series: original, Laplace, Kalman]

Spatial Partitions
[Figures: Oldenburg Road Network; Partitions by QuadTree]

Evaluation: Utility vs. Privacy
• Utility of each cell: Average Relative Error of released series
• For each α value, median utility among each class is plotted
DFT: Rastogi and Nath, SIGMOD’10
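
A small Python sketch of the utility metric, under stated assumptions: the slide does not give the exact formula, so the lower bound `delta` on the denominator (to avoid dividing by zero for empty cells) and the function name are hypothetical. The toy numbers reuse the four-cell example from the earlier slides.

```python
from statistics import median

def average_relative_error(true_series, released_series, delta=1.0):
    """Mean of |released - true| / max(true, delta) over all timestamps."""
    errors = [abs(r - x) / max(x, delta)
              for x, r in zip(true_series, released_series)]
    return sum(errors) / len(errors)

# Two toy cells with two timestamps each.
cell_a = average_relative_error([2, 1], [1, 0])   # errors 50% and 100%
cell_b = average_relative_error([3, 4], [5, 3])   # errors 2/3 and 1/4
print(cell_a, cell_b, median([cell_a, cell_b]))
```

Taking the median per class (Sparse or Dense) keeps a few badly perturbed cells from dominating the plotted value.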

Evaluation: Range Queries
• How many objects are in an area of m by m cells at every timestamp?
• For each m, 100 areas are randomly selected and evaluated.
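
The range query itself is a sum over an m by m block of cells. The sketch below assumes a histogram stored as a list of rows and draws the top-left corners of the query areas uniformly at random; the function names and the seed are illustrative, and the 4×4 grid is the example histogram from the spatial estimation slide rather than the evaluation data.

```python
import random

def range_count(hist, r0, c0, m):
    """Number of objects in the m-by-m area whose top-left cell is (r0, c0)."""
    return sum(hist[r][c]
               for r in range(r0, r0 + m)
               for c in range(c0, c0 + m))

def random_areas(grid_size, m, n, rng):
    """Draw n top-left corners so that every m-by-m area fits in the grid."""
    limit = grid_size - m + 1
    return [(rng.randrange(limit), rng.randrange(limit)) for _ in range(n)]

hist = [[1, 1, 0, 0],
        [1, 2, 1, 0],
        [2, 3, 4, 4],
        [3, 3, 6, 10]]
areas = random_areas(grid_size=4, m=2, n=100, rng=random.Random(0))
answers = [range_count(hist, r, c, 2) for r, c in areas]
print(range_count(hist, 2, 2, 2))   # bottom-right 2x2 area: 4 + 4 + 6 + 10
```

Running the same queries on the true and the released histograms and comparing the answers gives the per-query error for each m.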

Evaluation: Runtime
• Overall runtime is plotted in milliseconds.

Conclusion
• Differentially private release is difficult when the time series is long and the data is sparse!
• Domain knowledge can be used for temporal modeling as well as spatial partitioning.
• Output utility is improved with the same privacy guarantee.
• We observe no extra time cost from our solutions.
• Ongoing work:
  • Utilize rich information in spatio-temporal data.
  • Model learning and parameter learning.
• Contact: liyue.fan@emory.edu
• AIMS Group: www.mathcs.emory.edu/aims

Q&A