
Time Series Compressibility and Privacy



  1. Time Series Compressibility and Privacy. Spiros Papadimitriou*, Feifei Li+, George Kollios+, Philip S. Yu* (*IBM TJ Watson, +Boston University)

  2. Intuition / Motivation
      • Introduce uncertainty about individual values, while still allowing interesting pattern mining
      [Figure: speed vs. time, with a 55 mph highway band and a 35 mph city band]

  3. Intuition / Motivation
      • Introduce uncertainty about individual values, while still allowing interesting pattern mining
      • Need to publish some value within the band: which one?
      [Figure: speed vs. time, with a 55 mph highway band and a 35 mph city band]

  4. Random (white noise)?
      • Completely random perturbation?
      • Cars (typically) don't drive like this ⇒ noise can be filtered out
      [Figure: speed vs. time]
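The filtering argument can be illustrated with a minimal sketch. The smooth speed trace, the noise level, and the moving-average attacker below are all assumptions chosen for illustration; the talk itself evaluates a wavelet-shrinkage filter.

```python
import math
import random

def moving_average(values, window):
    """Centered moving average; the window shrinks near the edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

def rms_error(a, b):
    """Root-mean-square difference between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

rng = random.Random(0)
n, sigma = 2000, 5.0
# Hypothetical smooth speed trace oscillating between about 35 and 55 mph.
true = [45.0 + 10.0 * math.sin(2 * math.pi * t / 500) for t in range(n)]
# Publish the true values plus independent (white) Gaussian noise.
published = [x + rng.gauss(0.0, sigma) for x in true]
# A simple linear filter already strips most of the white noise.
recovered = moving_average(published, window=51)

discord = rms_error(published, true)    # uncertainty that was added
remaining = rms_error(recovered, true)  # uncertainty left after filtering
print(f"discord={discord:.2f}  remaining={remaining:.2f}")
```

Because the signal is smooth and the noise is not, averaging neighbors cancels most of the noise while barely distorting the signal, so the remaining uncertainty is far below the discord.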

  5. Deterministic?
      • Completely “deterministic” perturbation?
      • True value leaks
      [Figure: speed vs. time, with the published series offset by δ]
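A minimal sketch of why a deterministic perturbation is fragile, assuming the simplest case of a constant offset δ and an attacker who learns a handful of true values (the leaked indices and the helper `fit_line` are hypothetical):

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of ys ≈ a * xs + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

# Hypothetical smooth speed trace and a fixed offset delta.
true = [45.0 + 10.0 * math.sin(2 * math.pi * t / 500) for t in range(2000)]
delta = 4.0
published = [x + delta for x in true]

# The attacker learns the true values at three time points (assumed leak).
leaked_idx = [10, 400, 1500]
a, b = fit_line([published[i] for i in leaked_idx],
                [true[i] for i in leaked_idx])

# The fitted map undoes the perturbation everywhere, not just at the leaks.
recovered = [a * y + b for y in published]
max_err = max(abs(r - x) for r, x in zip(recovered, true))
print(f"a={a:.3f}  b={b:.3f}  max_err={max_err:.2e}")
```

A few leaked pairs determine the whole mapping, so every other published value is exposed as well.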

  6. First extreme case: white noise (completely random)

  7. Summary of extreme cases: completely random at one end, completely “deterministic” at the other, and an open question (“?”) in between

  8. Summary of extreme cases: completely random at one end, completely “deterministic” at the other
      • Adaptively combine completely random and completely “deterministic”

  9. Main challenge
      • Completely random: the concern is knowledge of the signal's subspace (“shape”) with arbitrary precision
      • Completely “deterministic”: the concern is knowledge of an arbitrary number of true values
      • Combining both

  10. Goals
      • Partial “information hiding” via data perturbation, for time series
      • Perturbation adapts to data properties
      • Automatically combines “random” and “deterministic” at appropriate scales
      • Evaluate against both filtering and true value leaks
      • Suitable for on-the-fly, streaming perturbation

  11. Overview: Definitions, Method, Experiments, Conclusion

  12. Utility = discord
      • Published values are (in expectation) within the discord σ of the true values
      [Figure: series over time]

  13. Privacy = final uncertainty
      • Recovered values are (in expectation) within the final uncertainty of the true values
      [Figure: series over time]

  14. Goal
      • Recovery of true values is based on assumptions about the attack model, with specific background knowledge: linear filtering, and linear reconstruction (based on true values)
      • Goal: [formula not recovered]
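As a hedged sketch, discord and final uncertainty could be formalized with RMS-style definitions, which would be consistent with the “% RMS” axes used in the experiments. Here $x_t$ is the true value and $y_t$ the published value; the attacker's estimate $\tilde{x}_t$ and the symbol $\tilde{\sigma}$ are assumptions introduced only for this illustration.

```latex
\sigma^2 = \frac{1}{N}\sum_{t=1}^{N} \mathrm{E}\big[(y_t - x_t)^2\big],
\qquad
\tilde{\sigma}^2 = \frac{1}{N}\sum_{t=1}^{N} \mathrm{E}\big[(\tilde{x}_t - x_t)^2\big]
```

Under these assumed definitions, a perturbation protects privacy better the closer $\tilde{\sigma}$ stays to $\sigma$ after the attacker applies linear filtering or linear reconstruction.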

  15. Overview: Definitions, Method, Experiments, Conclusion

  16. Wavelet and Fourier representations: one-slide refresher
      [Figure: wavelet panel with axes scale (frequency) vs. time; Fourier panel with axes frequency vs. time]

  17. Our work
      • Fourier-based perturbation: batch
      • Wavelet-based perturbation: batch and streaming

  18. Fourier-based perturbation: intuition
      • Energy concentrated in few coefficients: high compression
      [Figure: original series + perturbation = perturbed series, shown in the frequency domain and the time domain; the original coefficients are 0 except for one equal to 100, and the perturbed coefficients are labeled ≈ σ and ≈ 100 ± σ]

  19. Fourier-based perturbation: intuition and summary
      [Figure: frequency vs. time]
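The Fourier intuition can be sketched in a few lines: put the noise in the same frequencies where the signal's energy is, so that it cannot be separated by a filter tuned to the signal's shape. The selection rule (`keep` strongest frequencies), the proportional allocation, and the function name `fourier_perturb` are illustrative assumptions, not the allocation used in the talk. A naive DFT keeps the sketch dependency-free.

```python
import cmath
import math
import random

def dft(x):
    """Naive O(N^2) discrete Fourier transform (fine for short examples)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft_real(X):
    """Inverse DFT, returning the real part of each sample."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n for t in range(n)]

def rms(v):
    return math.sqrt(sum(e * e for e in v) / len(v))

def fourier_perturb(x, sigma, keep=4, seed=0):
    """Add noise of RMS sigma, shaped like the `keep` strongest frequencies of x."""
    rng = random.Random(seed)
    n = len(x)
    X = dft(x)
    # Rank strictly positive frequencies (below Nyquist) by magnitude.
    ranked = sorted(range(1, (n + 1) // 2), key=lambda k: abs(X[k]), reverse=True)
    noise_spec = [0j] * n
    for k in ranked[:keep]:
        # Noise proportional to the coefficient's magnitude (assumed rule).
        z = abs(X[k]) * complex(rng.gauss(0, 1), rng.gauss(0, 1))
        noise_spec[k] = z
        noise_spec[n - k] = z.conjugate()  # keeps the time-domain noise real
    noise = idft_real(noise_spec)
    scale = sigma / rms(noise)  # normalize so the discord equals sigma
    return [a + scale * e for a, e in zip(x, noise)]

n = 128
x = [10.0 * math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
y = fourier_perturb(x, sigma=1.0)
achieved = rms([b - a for a, b in zip(x, y)])
print(f"achieved discord = {achieved:.3f}")
```

For this pure sinusoid nearly all of the noise lands on the signal's own frequency, so the perturbation looks like a small random change in amplitude and phase and not like white noise.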

  20. Wavelet-based perturbation: intuition and summary
      [Figure: scale (frequency) vs. time]
      • Next: how to do this online? (1) Wavelet transform; (2) noise allocation
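A batch wavelet-based perturbation can be sketched along the same lines, here with an orthonormal Haar transform. The threshold rule, the proportional noise allocation, and the name `wavelet_perturb` are assumptions for illustration only. Because the transform is orthonormal, noise energy in the coefficient domain equals noise energy in the time domain, which makes it easy to hit a target discord.

```python
import math
import random

def haar_forward(x):
    """Orthonormal Haar DWT of a power-of-two-length list.

    Returns [approximation, coarsest details, ..., finest details].
    """
    approx, details = list(x), []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details = [(a - b) / math.sqrt(2) for a, b in pairs] + details
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return approx + details

def haar_inverse(c):
    """Inverse of haar_forward."""
    approx, pos = c[:1], 1
    while pos < len(c):
        d = c[pos:pos + len(approx)]
        pos += len(approx)
        nxt = []
        for a, b in zip(approx, d):
            nxt += [(a + b) / math.sqrt(2), (a - b) / math.sqrt(2)]
        approx = nxt
    return approx

def wavelet_perturb(x, sigma, threshold, seed=0):
    """Perturb only detail coefficients with magnitude >= threshold (assumed rule)."""
    rng = random.Random(seed)
    c = haar_forward(x)
    noise = [abs(v) * rng.gauss(0, 1) if i > 0 and abs(v) >= threshold else 0.0
             for i, v in enumerate(c)]
    energy = sum(e * e for e in noise)
    if energy == 0.0:
        raise ValueError("no coefficient exceeds the threshold")
    # Orthonormality: coefficient-domain energy equals time-domain energy.
    scale = sigma * math.sqrt(len(x) / energy)
    return haar_inverse([v + scale * e for v, e in zip(c, noise)])

# Hypothetical trace: city speed, then highway speed, with a short burst.
x = [35.0] * 32 + [55.0] * 32
for t in range(40, 48):
    x[t] += 3.0 if t % 2 == 0 else -3.0
y = wavelet_perturb(x, sigma=1.0, threshold=1.0)
achieved = math.sqrt(sum((b - a) ** 2 for a, b in zip(x, y)) / len(x))
print(f"achieved discord = {achieved:.3f}")
```

In this example the fine-scale noise is confined to the burst, while the flat segments only receive the coarse-scale change, which is the time-adaptive behavior that a global Fourier basis cannot provide.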

  21. Streaming perturbation (1): wavelet transform, summary
      • Forward transform: post-order traversal
      • O(lg N) space
      • O(1) time (amortized)
      [Figure: coefficient tree with nodes numbered in post-order: 1, 2, 4, 5 at the bottom; 3, 6, 7 above]
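A minimal sketch of a streaming forward Haar transform with these bounds, assuming an orthonormal Haar basis (the class name `StreamingHaar` and its interface are hypothetical). It keeps at most one unpaired approximation value per level, so space is O(lg N), and it behaves like a binary counter, so the work per arriving value is O(1) amortized. Coefficients come out in post-order: each detail is emitted only after the details beneath it.

```python
import math

class StreamingHaar:
    """Incremental orthonormal Haar transform with one pending value per level."""

    def __init__(self):
        # pending[l] holds an unpaired approximation at level l, or None.
        self.pending = []

    def push(self, value):
        """Consume one value; return the (level, detail) pairs it completes."""
        out, level, carry = [], 0, float(value)
        while True:
            if level == len(self.pending):
                self.pending.append(None)
            if self.pending[level] is None:
                # No partner yet: park the value and wait for the next one.
                self.pending[level] = carry
                return out
            left = self.pending[level]
            self.pending[level] = None
            # Pair completed: emit the detail, carry the average upward.
            out.append((level + 1, (left - carry) / math.sqrt(2)))
            carry = (left + carry) / math.sqrt(2)
            level += 1

stream = StreamingHaar()
coeffs = [c for v in [1, 2, 3, 4, 5, 6, 7, 8] for c in stream.push(v)]
levels = [level for level, _ in coeffs]
print(levels)
```

For eight inputs the emitted levels follow the post-order pattern of a seven-node tree: two level-1 details, their level-2 parent, two more level-1 details, their parent, and finally the level-3 root.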

  22. Streaming perturbation (1): wavelet transform, summary
      • Inverse transform: pre-order traversal
      • O(lg N) space
      • O(1) time (amortized)
      [Figure: coefficient tree with nodes numbered in traversal order]

  23. Streaming perturbation (2): noise allocation, summary
      • Challenge: knowing only the wavelet coefficients up to the current time, how can we allocate the noise online so that it is as close as possible to the batch allocation?
      • Indefinite publication delay?
      [Figure: series with the current value marked]

  24. Streaming perturbation (2): noise allocation, summary
      [Figure labels: batch; per-band lookahead; exceeds threshold; perturbed. See paper for details.]

  25. Overview: Definitions, Method, Experiments, Conclusion

  26. Experimental overview: datasets
      • Chlorine: chlorine concentration in a drinking-water distribution network
      • Light: light intensity measurements (Intel Berkeley)
      • SP500: Standard & Poor's 500 index
      [Figure: time-series plots of the Chlorine, Light, and SP500 datasets]

  27. Experimental overview
      • Varying discord levels
      • Varying perturbation methods: IID, Fourier-based (FFT), batch wavelet-based (DWT), streaming wavelet-based (str. DWT)
      • Filter: wavelet shrinkage [Donoho / TOIT95]
      • True values: linear regression
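For readers unfamiliar with wavelet shrinkage, its core step can be sketched as soft thresholding of wavelet coefficients. The universal threshold σ·sqrt(2 ln N) used below is a common choice in the shrinkage literature; whether the experiments used exactly this threshold is not stated on the slide, and the coefficient values are made up for illustration.

```python
import math

def soft_threshold(coeffs, threshold):
    """Shrink each coefficient toward zero by `threshold`; small ones become zero."""
    return [math.copysign(max(abs(c) - threshold, 0.0), c) for c in coeffs]

def universal_threshold(noise_sigma, n):
    """Universal threshold sigma * sqrt(2 ln n) for a series of length n."""
    return noise_sigma * math.sqrt(2.0 * math.log(n))

# Hypothetical wavelet coefficients of a perturbed series of length 64.
coeffs = [-80.0, 0.4, -0.9, 12.0, 0.1]
thr = universal_threshold(noise_sigma=0.5, n=64)
shrunk = soft_threshold(coeffs, thr)
print(f"threshold = {thr:.3f}")
print(shrunk)
```

Small coefficients, which white noise tends to produce, are zeroed, while large ones survive almost unchanged. This is why noise that hides inside the large coefficients is much harder to filter.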

  28. Removed uncertainty
      [Figure: removed noise (%) by perturbation method and discord σ (% RMS)]

  29. Removed uncertainty, average (over ten runs)
      • IID noise: excellent resilience to leaks, very poor for filtering
      • Other methods: comparable

  30. Removed uncertainty, maximum (over ten runs)
      • Fourier may perform poorly for “non-smooth” signals

  31. Removed uncertainty, maximum (over ten runs)
      • Fourier may perform poorly for “non-smooth” signals
      [Figure: Light dataset series with accompanying panels]

  32. “True” uncertainty
      [Figure: remaining noise (% RMS) vs. discord σ (% RMS)]

  33. “True” uncertainty, average (over ten runs)
      • IID noise: very poor overall
      • Other methods: comparable

  34. “True” uncertainty, maximum (over ten runs)
      • Fourier may perform poorly for “non-smooth” signals

  35. Scalability: constant per measurement

  36. Overview: Definitions, Method, Experiments, Conclusion

  37. Related work (1/2)
      • Privacy-preserving data mining
      • SMC [Lindell & Pinkas / CRYPTO00], [Vaidya & Clifton / KDD02]
      • Partial information hiding
      • Perturbation [Agrawal & Srikant / SIGMOD00], [Du & Zhan / KDD03], [Kargupta, Datta, Wang & Sivakumar / ICDM03], [Agrawal & Aggarwal / EDBT04], [Chen & Liu / ICDM05], [Huang, Du & Chen / SIGMOD05], [Liu, Ryan & Kargupta / TKDE05], [Li et al. / ICDE07]
      • k-anonymity [Sweeney / IJUFKS02], [Aggarwal & Yu / EDBT04], [Bertino, Ooi, Yang & Deng / ICDE05], [Kifer & Gehrke / SIGMOD06], [Machanavajjhala, Gehrke & Kifer / ICDE06], [Xiao & Tao / SIGMOD06]
      • Interactive privacy [Blum, Dwork, McSherry & Nissim / PODS05], [Dwork, McSherry, Nissim & Smith / TCC06]
      • SSDBs [Denning / TODS80]
      • Wavelets in DM [Gilbert, Kotidis, Muthukrishnan & Strauss / VLDB01], [Garofalakis & Gibbons / SIGMOD02], [Bulut & Singh / ICDE03], [Papadimitriou, Brockwell & Faloutsos / VLDB04], [Lin, Vlachos, Keogh & Gunopulos / EDBT04], [Karras & Mamoulis / VLDB05]
      • Compression and DM [Keogh, Lonardi & Ratanamahatana / KDD04]

  38. Related work (2/2)
      • Correlated perturbation [Kargupta, Datta, Wang & Sivakumar / ICDM03], [Huang, Du & Chen / SIGMOD05], for streams [Li et al. / ICDE07]
      • L-diversity [Machanavajjhala, Gehrke & Kifer / ICDE06] and personalized privacy [Xiao & Tao / SIGMOD06]
      • Dimensionality curse and privacy [Aggarwal / VLDB05]
      • Watermarking [Sion, Atallah & Prabhakar / TKDE06]
      • Compressed sensing [Donoho / TOIT06], [Candès, Romberg & Tao / TOIT06]

  39. Conclusion
      • Partial information hiding via data perturbation
      • User-defined discord (utility)
      • Adapts to data properties
      • Automatically combines “random” and “deterministic” at appropriate scales
      • Additionally preserves spectral properties
      • Evaluate against both filtering and true value leaks
      • Suitable for on-the-fly, streaming perturbation
      • Perturbing data objects with any “structure” is non-trivial, even under fixed attack model(s)

  40. Thank you. Time Series Compressibility and Privacy. Spiros Papadimitriou*, Feifei Li+, George Kollios+, Philip S. Yu* (*IBM TJ Watson, +Boston University)

  41. BACKUP: Per-band allocation
      • Fourier equal alloc.: “spreads” noise if signal is non-smooth
      • Wavelets: time-adaptive anyway

  42. BACKUP: Per-band allocation

  43. BACKUP: Marginals
      [Figure: CDFs P(z) of z ≡ |y_t − x_t| for the Light and Chlorine datasets, comparing IID, Fourier, and Wavelet perturbation]
