Differentially-Private Batch Query Answering: Exploiting the Workload vs. Exploiting the Data
Gerome Miklau, University of Massachusetts, Amherst
DIMACS Workshop on Recent Work on Differential Privacy across Computer Science, October 2012


  5. Queries and workloads
     • 1-dimensional range queries: intervals
     • Marginals / data cube queries / contingency tables: aggregate over excluded dimensions
     • k-dimensional range queries: axis-aligned rectangles
     • Predicate counting queries: only 0 or 1 coefficients
     • Linear counting queries: arbitrary coefficients

  6. Privacy definitions & mechanisms
     • Differential privacy: a randomized algorithm $A$ provides $(\varepsilon, \delta)$-differential privacy if, for all neighboring databases $D$ and $D'$ and for any set of outputs $S$:
       $\Pr[A(D) \in S] \le e^{\varepsilon} \Pr[A(D') \in S] + \delta$
     • If $\delta = 0$, standard $\varepsilon$-differential privacy:
       • Laplace(0, b) noise, where $b = \|q\|_1 / \varepsilon$
     • If $\delta > 0$, approximate $(\varepsilon, \delta)$-differential privacy:
       • Gaussian(0, σ) noise, where $\sigma = \|q\|_2 \sqrt{2 \ln(2/\delta)} / \varepsilon$
     • The multi-query Laplace/Gaussian mechanism adds independent noise to each query answer.
     • Exponential mechanism
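The Laplace case above can be sketched in a few lines. This is a minimal illustration, not a hardened implementation: the function names (`laplace_scale`, `sample_laplace`, `noisy_answer`) are hypothetical, and floating-point samplers like this one are known to need extra care before real deployment.

```python
import random

def laplace_scale(l1_sensitivity, epsilon):
    """Scale b of the Laplace(0, b) noise: b = ||q||_1 / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return l1_sensitivity / epsilon

def sample_laplace(scale, rng):
    """Draw one Laplace(0, scale) sample.

    The difference of two independent exponentials with mean `scale`
    follows a Laplace(0, scale) distribution.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)

def noisy_answer(true_answer, l1_sensitivity, epsilon, rng):
    """Answer a single query under epsilon-differential privacy."""
    scale = laplace_scale(l1_sensitivity, epsilon)
    return true_answer + sample_laplace(scale, rng)

# Fixed seed only so that the sketch is repeatable; the count 52 is illustrative.
rng = random.Random(0)
private_count = noisy_answer(52, l1_sensitivity=1, epsilon=0.5, rng=rng)
```

A smaller ε gives a larger scale and therefore noisier answers; the variance of Laplace(0, b) is 2b².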

  11. The sensitivity of a query matrix
      • For two neighboring databases $D$ and $D'$, their frequency vectors $x$ and $x'$ differ in one position, by exactly 1. In the example, $x'$ is $x$ with $x_7$ increased by 1.
      The answers $y = Wx$ for a query matrix $W$ over $x = (x_1, \dots, x_{10})^T$:
      $$\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 & -1 \end{pmatrix} x$$
      Here $\|W\|_1 = 4$.
      • The L1 sensitivity of a query matrix is the maximum L1 norm of its columns.
      • The L2 sensitivity of a query matrix is the maximum L2 norm of its columns.
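Both sensitivities are simple column norms, so they can be checked directly on the example matrix. The sketch below assumes the matrix is stored as a list of rows; the helper names are illustrative.

```python
import math

# The 4 x 10 query matrix W from the slide, one row per query.
W = [
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, -1, -1, -1, -1, -1],
]

def l1_sensitivity(matrix):
    """Maximum L1 norm over the columns of the matrix."""
    return max(sum(abs(v) for v in col) for col in zip(*matrix))

def l2_sensitivity(matrix):
    """Maximum L2 norm over the columns of the matrix."""
    return max(math.sqrt(sum(v * v for v in col)) for col in zip(*matrix))

l1 = l1_sensitivity(W)  # 4, attained at columns with four nonzero entries
l2 = l2_sensitivity(W)  # 2.0, the L2 norm of a column with four unit entries
```

Changing one position of x by 1 shifts the answer vector by exactly one column of W, which is why the column norms bound the change in the output.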

  12. Outline
      1. Preliminaries
      2. Approach 1: workload-aware
         • Fixed Observations
         • Optimized Observations
      3. Approach 2: data-aware
      4. Conclusions

  16. Answering all range queries
      Goal: answer all range-count queries over $x$.
      $\text{AllRange} = \{ w \mid w = x_i + \dots + x_j \text{ for } 1 \le i \le j \le n \}$
      Workload W for n = 4, with true answers on x = (10, 23, 16, 3):
      Query   Range           Sum                  Answer
      w1      range(x1,x4)    x1 + x2 + x3 + x4    52
      w2      range(x1,x3)    x1 + x2 + x3         49
      w3      range(x2,x4)    x2 + x3 + x4         42
      w4      range(x1,x2)    x1 + x2              33
      w5      range(x2,x3)    x2 + x3              39
      w6      range(x3,x4)    x3 + x4              19
      w7      range(x1,x1)    x1                   10
      w8      range(x2,x2)    x2                   23
      w9      range(x3,x3)    x3                   16
      w10     range(x4,x4)    x4                   3
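The workload table can be generated mechanically. The following sketch builds the AllRange matrix for a given n, ordering queries by decreasing width as on the slide; `all_range_workload` and `answer` are illustrative names, not part of the talk.

```python
def all_range_workload(n):
    """Return the AllRange workload as a list of 0/1 rows.

    Rows are ordered by decreasing width, then by increasing start,
    which reproduces w1..w10 for n = 4.
    """
    rows = []
    for width in range(n, 0, -1):
        for start in range(0, n - width + 1):
            rows.append([1 if start <= k < start + width else 0
                         for k in range(n)])
    return rows

def answer(workload, x):
    """True (non-private) answers W x."""
    return [sum(w_k * x_k for w_k, x_k in zip(row, x)) for row in workload]

W = all_range_workload(4)
true_answers = answer(W, [10, 23, 16, 3])
# → [52, 49, 42, 33, 39, 19, 10, 23, 16, 3]
```

The workload has n(n+1)/2 rows, so it grows quadratically with the domain size.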

  20. Method 1: basic Laplace mechanism
      Add independent Laplace noise of scale $\|W\|_1 / \varepsilon = 6/\varepsilon$ to each workload answer.
      Query   Sum                  True answer   Laplace noise   Private output
      w1      x1 + x2 + x3 + x4    52            b1 = 8.2        w'1 = 60.2
      w2      x1 + x2 + x3         49            b2 = -5.4       w'2 = 44.6
      w3      x2 + x3 + x4         42            b3 = -3.1       w'3 = 38.9
      w4      x1 + x2              33            b4 = 6.6        w'4 = 39.6
      w5      x2 + x3              39            b5 = -7.9       w'5 = 31.1
      w6      x3 + x4              19            b6 = 2.4        w'6 = 21.4
      w7      x1                   10            b7 = -3.0       w'7 = 7.0
      w8      x2                   23            b8 = -4.9       w'8 = 18.1
      w9      x3                   16            b9 = 6.7        w'9 = 22.7
      w10     x4                   3             b10 = 4.6       w'10 = 7.6
      The private answers are not consistent with each other: w'7 + w'8 + w'9 + w'10 gives Σ = 55.4, while w'1 = 60.2.
                          n = 4                                         general n
      Sensitivity         $\|W\|_1 = 6$                                 $O(n^2)$
      Error per query     $2(\|W\|_1/\varepsilon)^2 = 72/\varepsilon^2$   $2(\|W\|_1/\varepsilon)^2 = O(n^4)/\varepsilon^2$
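The sensitivity and error figures follow from counting how many ranges contain a given cell. A short sketch, with hypothetical helper names, that reproduces the n = 4 column:

```python
def all_range_sensitivity(n):
    """L1 sensitivity of AllRange over a domain of size n.

    Cell k (1-indexed) lies in every range [i, j] with i <= k <= j,
    i.e. in k * (n - k + 1) ranges; the maximum is at the middle cell.
    """
    return max(k * (n - k + 1) for k in range(1, n + 1))

def laplace_error_per_query(sensitivity, epsilon):
    """Variance of Laplace noise with scale sensitivity / epsilon."""
    return 2 * (sensitivity / epsilon) ** 2

s4 = all_range_sensitivity(4)             # 6
err4 = laplace_error_per_query(s4, 1.0)   # 72.0, i.e. 72 / eps^2 at eps = 1
s1024 = all_range_sensitivity(1024)       # 262656, roughly n^2 / 4
```

Because the sensitivity grows as O(n²) and the error is quadratic in the sensitivity, the per-query error grows as O(n⁴).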

  24. Method 2: noisy frequency counts
      Use the Laplace mechanism to get noisy estimates for each $x_i$.
      Observation queries submitted: the identity matrix $I$ (the counts $x_1, x_2, x_3, x_4$), with $\|I\|_1 = 1$.
      Private output: $z_i = x_i + (1/\varepsilon)\, b_i$ for $i = 1, \dots, 4$.
      Derived workload answers:
      w'1 = z1 + z2 + z3 + z4
      w'2 = z1 + z2 + z3
      w'3 = z2 + z3 + z4
      w'4 = z1 + z2
      w'5 = z2 + z3
      w'6 = z3 + z4
      w'7 = z1
      w'8 = z2
      w'9 = z3
      w'10 = z4
      For $w = \text{range}(x_i, x_j)$: $\text{Error}(w) = 2(j - i + 1)/\varepsilon^2$, from $8/\varepsilon^2$ for the full range down to $2/\varepsilon^2$ for a single count.
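The error formula is just the sum of the variances of the independent noisy counts that make up a range. A minimal sketch (the function name is illustrative):

```python
def noisy_counts_error(i, j, epsilon):
    """Error of range(x_i, x_j) when answered by summing noisy counts.

    Each z_k carries independent Laplace noise of scale 1 / epsilon,
    so each contributes variance 2 / epsilon^2 to the sum.
    """
    if not 1 <= i <= j:
        raise ValueError("expected 1 <= i <= j")
    width = j - i + 1
    return 2 * width / epsilon ** 2

full_range = noisy_counts_error(1, 4, 1.0)    # 8.0
single_cell = noisy_counts_error(4, 4, 1.0)   # 2.0
```

The error grows linearly with the width of the range, which is why this method does well on small ranges and poorly on large ones.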

  31. [Hay, PVLDB 10] Method 3: hierarchical observations
      Hierarchical queries: recursively partition the domain, computing sums of each interval.
      Observation queries submitted (matrix H), each answered with Laplace noise of scale $3/\varepsilon$:
      z1 = x1 + x2 + x3 + x4 + (3/ε) b1
      z2 = x1 + x2 + (3/ε) b2
      z3 = x3 + x4 + (3/ε) b3
      z4 = x1 + (3/ε) b4
      z5 = x2 + (3/ε) b5
      z6 = x3 + (3/ε) b6
      z7 = x4 + (3/ε) b7
      $\|H\|_1 = 3 = \log n + 1$
      The workload answers w'1, ..., w'10 must be derived from z1, ..., z7, and more than one derivation is possible.
      Possible estimates for the query range(x2,x3) = x2 + x3:
      • z5 + z6
      • z2 - z4 + z6
      • z1 - z4 - z7
      • Least-squares estimate: (6z1 + 3z2 + 3z3 - 9z4 + 12z5 + 12z6 - 9z7)/21
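The least-squares coefficients can be reproduced exactly with rational arithmetic. The sketch below solves the normal equations $(H^T H) v = w^T$ and reads off the coefficient of each $z_k$ as the k-th entry of $Hv$; the tiny Gauss-Jordan solver and all names are illustrative choices, not the method used in the cited paper.

```python
from fractions import Fraction

# Hierarchical observation matrix H for n = 4 (rows z1..z7).
H = [
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
w = [0, 1, 1, 0]  # the query range(x2, x3) = x2 + x3

def gram(A):
    """Return A^T A."""
    cols = list(zip(*A))
    return [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]

def solve(M, rhs):
    """Solve M v = rhs exactly by Gauss-Jordan elimination."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [row[n] for row in aug]

v = solve(gram(H), w)
# The estimate w (H^T H)^{-1} H^T z has coefficient (H v)_k on z_k.
coeffs = [sum(h * vk for h, vk in zip(row, v)) for row in H]
scaled = [c * 21 for c in coeffs]  # → [6, 3, 3, -9, 12, 12, -9]
```

The sum of squared coefficients is 8/7, smaller than the 2 of z5 + z6 and the 3 of the other two estimates, so the least-squares estimate has the lowest variance of the four.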

  32. Error rates: workload of all range queries
      [Figure: mean squared error versus query width (as a fraction of the domain, from small ranges to big ranges) under ε-differential privacy, with ε = 0.1 and n = 1024. Series: Noisy counts, Hierarchical (2).]

  33. [Xiao, ICDE 10] Method 4: wavelet queries
      Wavelet: use the Haar wavelet as observations.
      Observation queries submitted (matrix Y), each answered with Laplace noise of scale $3/\varepsilon$:
      z1 = x1 + x2 + x3 + x4 + (3/ε) b1
      z2 = x1 + x2 - x3 - x4 + (3/ε) b2
      z3 = x1 - x2 + (3/ε) b3
      z4 = x3 - x4 + (3/ε) b4
      $\|Y\|_1 = 3 = \log n + 1$
      The workload answers w'1, ..., w'10 are derived from z1, ..., z4.
      Estimate for the query range(x2,x3) = x2 + x3: 0.5 z1 + 0 z2 - 0.5 z3 + 0.5 z4
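Because Y is square and invertible, the combination of wavelet answers that reconstructs a query is unique, and it is easy to verify. The sketch below checks the coefficients given for x2 + x3 and computes the variance of the resulting estimate; helper names are illustrative.

```python
from fractions import Fraction

# Haar wavelet observation matrix Y for n = 4 (rows z1..z4).
Y = [
    [1, 1, 1, 1],
    [1, 1, -1, -1],
    [1, -1, 0, 0],
    [0, 0, 1, -1],
]
coeffs = [Fraction(1, 2), Fraction(0), Fraction(-1, 2), Fraction(1, 2)]

def combine(coeffs, rows):
    """Linear combination of the rows, i.e. the query the estimate answers."""
    return [sum(c * row[k] for c, row in zip(coeffs, rows))
            for k in range(len(rows[0]))]

def estimate_variance(coeffs, sensitivity, epsilon):
    """Variance of sum_k c_k z_k with independent Laplace noise per z_k."""
    per_observation = 2 * (sensitivity / epsilon) ** 2
    return float(sum(c * c for c in coeffs)) * per_observation

query = combine(coeffs, Y)                      # → [0, 1, 1, 0], i.e. x2 + x3
variance = estimate_variance(coeffs, 3, 1.0)    # 13.5 at epsilon = 1
```

The true-answer terms combine to exactly x2 + x3, so the estimate is unbiased and all remaining error comes from the three noisy observations with nonzero coefficients.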

  35. Error: workload of all range queries
      [Figure: mean squared error versus query width (as a fraction of the domain) under ε-differential privacy, with ε = 0.1 and n = 1024. Series: Identity, Hierarchical (2), Wavelet, Hierarchical (4).]

  36. Observations for the workload of all range queries
      • Noisy counts (I): x1; x2; x3; x4. Very low sensitivity, but large ranges are estimated badly.
      • Hierarchical (H): x1 + x2 + x3 + x4; x1 + x2; x3 + x4; x1; x2; x3; x4.
      • Wavelet (Y): x1 + x2 + x3 + x4; x1 + x2 - x3 - x4; x1 - x2; x3 - x4.
      • Hierarchical and wavelet: low sensitivity, and all range queries can be estimated using no more than log n output entries.
      Max/avg error, 1-dim: $O(n/\varepsilon^2)$ for I; $O(\log^3 n/\varepsilon^2)$ for H; $O(\log^3 n/\varepsilon^2)$ for Y.
      Max/avg error, k-dim: $O(\log^{3k} n/\varepsilon^2)$

  37. Observations for alternative workloads
      • Workload: sets of 2D range queries
      • Observations: [Cormode, ICDE ’12]
        • Quad-tree queries
        • Geometrically increasing ε by level (figure labels: less accurate, more accurate)
      • Workload: sets of low-order marginals
      • Observations: [Barak, PODS ’07]
        • Fourier basis queries, built recursively: $H_i = \begin{pmatrix} H_{i-1} & H_{i-1} \\ H_{i-1} & -H_{i-1} \end{pmatrix}$

  40. Questions raised
      Non-adaptive observations:
      Workload                        Observations             Citation
      low-order marginals             Fourier basis queries    [Barak, PODS ’07]
      all one-dim range queries       Hierarchical ranges      [Hay, PVLDB ’10]
      all (multi-dim) range queries   Haar wavelet queries     [Xiao, ICDE ’10]
      2-dim range queries             Quad-tree queries        [Cormode, ICDE ’12]
      • Are these observations optimal for the targeted workloads?
      • Which observations should we use for other custom workloads?
      Adapt observations to workload.

  41. Outline
      1. Preliminaries
      2. Approach 1: workload-aware
         • Fixed Observations
         • Optimized Observations
      3. Approach 2: data-aware
      4. Conclusions

  44. Laplace mechanism (matrix notation)
      $\text{Laplace}(W, x) = Wx + (\|W\|_1/\varepsilon)\, b$
      • $W$: $m \times n$ workload
      • $x$: $n \times 1$ database
      • $\|W\|_1$: sensitivity (a scalar)
      • $b$: $m \times 1$ noise, independent samples from Laplace(1)
      $\text{Error}(w) = 2(\|W\|_1/\varepsilon)^2$

  53. The matrix mechanism: justification
      ➊ (Select observations) Choose a (full rank) query matrix $A$.
      ➋ (Apply Laplace) Use the Laplace mechanism to answer $A$: $z = Ax + (\|A\|_1/\varepsilon)\, b$
      ➌ (Derive answers) Compute an estimate $\hat{x}$ of $x$ using the answers $z$.
        • Compute the estimate $\hat{x}$ of $x$ that minimizes the squared error $\|A\hat{x} - z\|_2^2$.
        • The solution is the ordinary least squares estimator: $\hat{x} = A^+ z$, where $A^+ = (A^T A)^{-1} A^T$.
          Thm: $\hat{x}$ is unbiased and has the least variance among all linear unbiased estimators.
        • Compute the workload queries using the estimate $\hat{x}$: $W\hat{x}$

  59. The matrix mechanism
      Given a workload $W$ and any full-rank strategy matrix $A$, the following randomized algorithm is ε-differentially private:
      $\text{Matrix}_A(W, x) = Wx + (\|A\|_1/\varepsilon)\, W A^+ b$, where $b = \text{Lap}(1)$
      • $\text{Matrix}_A$: the mechanism instantiated with observations $A$
      • $Wx$: true answer
      • $\|A\|_1/\varepsilon$: scaling by $\|A\|_1$
      • $W A^+$: transformation by $WA^+$
      Compare with the Laplace mechanism: $\text{Laplace}(W, x) = Wx + (\|W\|_1/\varepsilon)\, b$
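The three steps (observe with A, add Laplace noise, derive workload answers by least squares) fit in one short function. This is a plain-Python sketch under stated assumptions: matrices are lists of rows, `A` has full column rank, the `noise` argument is a hypothetical hook for injecting a fixed noise vector in tests, and the float-based sampler is for illustration only.

```python
import random

def l1_sensitivity(A):
    return max(sum(abs(v) for v in col) for col in zip(*A))

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def solve(M, rhs):
    """Solve M v = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [[float(v) for v in row] + [float(r)] for row, r in zip(M, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [row[n] for row in aug]

def matrix_mechanism(W, A, x, epsilon, rng=None, noise=None):
    """Return W x_hat, where x_hat is the least-squares fit to noisy A x."""
    rng = rng or random.Random()
    scale = l1_sensitivity(A) / epsilon
    if noise is None:
        # Difference of two Exp(1) samples is a Laplace(1) sample.
        noise = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in A]
    z = [a + scale * b for a, b in zip(matvec(A, x), noise)]
    At = [list(col) for col in zip(*A)]
    AtA = [[sum(p * q for p, q in zip(ci, cj)) for cj in At] for ci in At]
    x_hat = solve(AtA, matvec(At, z))  # x_hat = (A^T A)^{-1} A^T z
    return matvec(W, x_hat)

# Hierarchical strategy from the earlier slides, used here as an example.
H = [[1, 1, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1],
     [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
private = matrix_mechanism([[0, 1, 1, 0]], H, [10, 23, 16, 3], epsilon=1.0,
                           rng=random.Random(0))
```

Privacy comes entirely from the noisy observations z; the least-squares step and the multiplication by W are post-processing and do not touch the data again.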

  60. Instances of the matrix mechanism
      Given workload W of linear queries:
      Observation matrix A      Resulting mechanism Matrix_A
      A = W                     Never worse than Laplace, sometimes better
      A = identity matrix       A common baseline
      A = Haar wavelet          [Xiao, ICDE ’10]
      A = tree based            [Hay, PVLDB ’10], [Cormode, ICDE ’12]
      A = Fourier basis         [Barak, PODS ’07]

  64. Observation matrices equivalent to wavelet
      $$Y = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & -1 \end{pmatrix}, \quad Y' = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \quad Y'' = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ \sqrt{2} & 0 & 0 & 0 \\ 0 & \sqrt{2} & 0 & 0 \\ 0 & 0 & \sqrt{2} & 0 \\ 0 & 0 & 0 & \sqrt{2} \end{pmatrix}$$
      • Wavelet $Y$: $\|Y\|_1 = 3$
      • $Y'$: $\|Y'\|_1 = 3$, equivalent error to $Y$ for all queries ($Y \equiv Y'$)
      • $Y''$: $\|Y''\|_1 = 2.414$, lower error for all queries
      The Haar wavelet observation matrix $Y$ is dominated by the alternative matrix $Y''$.
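The error of the matrix mechanism depends on the observation matrix only through its sensitivity and the product $A^T A$, so the equivalence can be checked numerically: all three matrices share the same $A^T A$, and $Y''$ has the smallest sensitivity. A small sketch with illustrative helper names:

```python
import math

R2 = math.sqrt(2)
IDENTITY = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

Y = [[1, 1, 1, 1], [1, 1, -1, -1], [1, -1, 0, 0], [0, 0, 1, -1]]
Y1 = [[1, 1, 0, 0], [0, 0, 1, 1]] + IDENTITY + IDENTITY          # Y'
Y2 = [[1, 1, 0, 0], [0, 0, 1, 1]] + [[R2 * v for v in row]
                                     for row in IDENTITY]        # Y''

def gram(A):
    """Return A^T A."""
    cols = list(zip(*A))
    return [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]

def l1_sensitivity(A):
    return max(sum(abs(v) for v in col) for col in zip(*A))

def same_matrix(G1, G2):
    return all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
               for r1, r2 in zip(G1, G2) for a, b in zip(r1, r2))

equivalent = same_matrix(gram(Y), gram(Y1)) and same_matrix(gram(Y), gram(Y2))
sensitivities = (l1_sensitivity(Y), l1_sensitivity(Y1), l1_sensitivity(Y2))
# equivalent is True; sensitivities are 3, 3 and 1 + sqrt(2) ≈ 2.414
```

With equal $A^T A$, the error of every query scales with the squared sensitivity, so moving from 3 to about 2.414 lowers every error by the same factor.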

  65. Error of matrix mechanism
      Given an observation matrix $A$ and workload $W$, the error under the mechanism $\text{Matrix}_A$ is:
      • For a single query $w$ in $W$: $\text{Error}_A(w) = (2/\varepsilon^2)\, \|A\|_1^2 \; w (A^T A)^{-1} w^T$
      • Total error for workload $W$: $\text{TotalError}_A(W) = (2/\varepsilon^2)\, \|A\|_1^2 \; \text{trace}\big(W (A^T A)^{-1} W^T\big)$
      The error is independent of the input data.
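Since the error does not depend on the data, strategies can be compared before touching the database. The sketch below evaluates the total-error formula for the AllRange workload with n = 4 under three strategies; the helper names and the float-based solver are illustrative assumptions.

```python
def l1_sensitivity(A):
    return max(sum(abs(v) for v in col) for col in zip(*A))

def solve(M, rhs):
    """Solve M v = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [[float(v) for v in row] + [float(r)] for row, r in zip(M, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [row[n] for row in aug]

def total_error(W, A, epsilon):
    """(2 / eps^2) * ||A||_1^2 * trace(W (A^T A)^{-1} W^T)."""
    cols = list(zip(*A))
    AtA = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    # trace(W M^{-1} W^T) is the sum over queries w of w M^{-1} w^T.
    trace = sum(sum(wk * vk for wk, vk in zip(w, solve(AtA, w))) for w in W)
    return (2 / epsilon ** 2) * l1_sensitivity(A) ** 2 * trace

W = [[1, 1, 1, 1], [1, 1, 1, 0], [0, 1, 1, 1], [1, 1, 0, 0], [0, 1, 1, 0],
     [0, 0, 1, 1], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
I4 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
H = [[1, 1, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1]] + I4

errors = {
    "identity": total_error(W, I4, 1.0),
    "hierarchical": total_error(W, H, 1.0),
    "workload": total_error(W, W, 1.0),
}
```

For comparison, answering each of the ten queries directly with the Laplace mechanism costs 72/ε² per query, or 720/ε² in total. At this tiny domain size the identity strategy has the lowest total; the plots earlier in the talk show the hierarchical and wavelet strategies overtaking it on wider ranges at n = 1024.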

  68. Optimal selection of observations
      Objective: given workload $W$, find the observation matrix $A$ that minimizes the total error $\text{TotalError}_A(W)$.
      • Optimization objective: given $W$ consisting of data cube queries, choose $A$ consisting of data cube queries to minimize a simplified error measure. [Ding, SIGMOD ’11]
      • Problem type: set-cover approx
      • Runtime: O(n)
      • Privacy: ε-DP
