Queries and workloads
• 1-dimensional range queries: intervals
• Marginals / data cube queries / contingency tables: aggregate over excluded dimensions
• k-dimensional range queries: axis-aligned rectangles
• Predicate counting queries: only 0 or 1 coefficients
• Linear counting queries: arbitrary coefficients
Privacy definitions & mechanisms
• Differential privacy: a randomized algorithm A provides (ε, δ)-differential privacy if, for all neighboring databases D and D′, and for any set of outputs S:
    Pr[A(D) ∈ S] ≤ e^ε Pr[A(D′) ∈ S] + δ
• If δ = 0, standard ε-differential privacy: Laplace(0, b) noise where b = ||q||₁/ε
• If δ > 0, approximate (ε, δ)-differential privacy: Gaussian(0, σ) noise where σ = ||q||₂ (2 ln(2/δ))^(1/2) / ε
• The multi-query Laplace/Gaussian mechanism adds independent noise to each query answer.
• Exponential mechanism
The sensitivity of a query matrix
• For two neighboring databases D and D′, their frequency vectors x and x′ will differ in one position, by exactly 1 (e.g. x′ = x with +1 in one entry).

y = Wx, with query matrix W:
  y1:  1  1  1  1  1  1  1  1  1  1
  y2:  1  1  1  1  1  0  0  0  0  0
  y3:  0  1  0  0  0  0  1  0  0  0
  y4:  1  1  1  1  1 -1 -1 -1 -1 -1

||W||₁ = 4

The L1 sensitivity of a query matrix is the maximum L1 norm of its columns.
The L2 sensitivity of a query matrix is the maximum L2 norm of its columns.
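The column-norm definitions above are easy to compute directly. A minimal sketch in plain Python, using the example query matrix W from this slide:

```python
def l1_sensitivity(W):
    # L1 sensitivity: maximum L1 norm over the columns of W
    return max(sum(abs(row[j]) for row in W) for j in range(len(W[0])))

def l2_sensitivity(W):
    # L2 sensitivity: maximum L2 norm over the columns of W
    return max(sum(row[j] ** 2 for row in W) ** 0.5 for j in range(len(W[0])))

W = [
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, -1, -1, -1, -1, -1],
]
print(l1_sensitivity(W))  # 4 (column 2: |1|+|1|+|1|+|1|)
print(l2_sensitivity(W))  # 2.0 (same column)
```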
Outline
1. Preliminaries
2. Approach 1: workload-aware
   • Fixed Observations
   • Optimized Observations
3. Approach 2: data-aware
4. Conclusions
Answering all range queries
Goal: answer all range-count queries over x
AllRange = { w | w = x_i + ... + x_j for 1 ≤ i ≤ j ≤ n }

With x = (10, 23, 16, 3), the workload W and its answers:
  w1 = range(x1,x4) = x1+x2+x3+x4 = 52
  w2 = range(x1,x3) = x1+x2+x3 = 49
  w3 = range(x2,x4) = x2+x3+x4 = 42
  w4 = range(x1,x2) = x1+x2 = 33
  w5 = range(x2,x3) = x2+x3 = 39
  w6 = range(x3,x4) = x3+x4 = 19
  w7 = range(x1,x1) = x1 = 10
  w8 = range(x2,x2) = x2 = 23
  w9 = range(x3,x3) = x3 = 16
  w10 = range(x4,x4) = x4 = 3
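The answers above are just the matrix-vector product Wx. A quick sketch that builds the all-ranges workload for n = 4 and evaluates it on x = (10, 23, 16, 3):

```python
def range_query(i, j, n):
    # coefficient vector for range(x_i, x_j) over a domain of size n (1-indexed)
    return [1 if i <= k <= j else 0 for k in range(1, n + 1)]

n = 4
x = [10, 23, 16, 3]
ranges = [(1, 4), (1, 3), (2, 4), (1, 2), (2, 3), (3, 4),
          (1, 1), (2, 2), (3, 3), (4, 4)]
W = [range_query(i, j, n) for i, j in ranges]
answers = [sum(w * v for w, v in zip(row, x)) for row in W]  # Wx
print(answers)  # [52, 49, 42, 33, 39, 19, 10, 23, 16, 3]
```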
Method 1: basic Laplace mechanism
Add Laplace(6/ε) noise independently to each workload answer (||W||₁ = 6):
  workload query          true   noise       private output
  w1 = x1+x2+x3+x4        52     b1 = 8.2    w′1 = 60.2
  w2 = x1+x2+x3           49     b2 = -5.4   w′2 = 44.6
  w3 = x2+x3+x4           42     b3 = -3.1   w′3 = 38.9
  w4 = x1+x2              33     b4 = 6.6    w′4 = 39.6
  w5 = x2+x3              39     b5 = -7.9   w′5 = 31.1
  w6 = x3+x4              19     b6 = 2.4    w′6 = 21.4
  w7 = x1                 10     b7 = -3.0   w′7 = 7.0
  w8 = x2                 23     b8 = -4.9   w′8 = 18.1
  w9 = x3                 16     b9 = 6.7    w′9 = 22.7
  w10 = x4                 3     b10 = 4.6   w′10 = 7.6
  (Σ = 55.4)

For n = 4: sensitivity ||W||₁ = 6 = O(n²)
Error per query: 2(||W||₁/ε)² = 72/ε² = O(n⁴)/ε²
Method 2: noisy frequency counts
Use the Laplace mechanism to get a noisy estimate z_i for each x_i: submit the identity observations I (||I||₁ = 1) and add Laplace(1/ε) noise to each count.
Derived workload answers:
  w′1 = z1 + z2 + z3 + z4   (Error = 8/ε²)
  w′2 = z1 + z2 + z3
  w′3 = z2 + z3 + z4
  w′4 = z1 + z2
  w′5 = z2 + z3
  w′6 = z3 + z4
  w′7 = z1   (Error = 2/ε²)
  w′8 = z2
  w′9 = z3
  w′10 = z4
For w = range(x_i, x_j): Error(w) = 2(j - i + 1)/ε²
[Hay, PVLDB '10]
Method 3: hierarchical observations
Hierarchical queries: recursively partition the domain, computing sums of each interval.
Observation queries submitted (H, with ||H||₁ = 3 = log n + 1), each answered with Laplace(3/ε) noise:
  z1: x1 + x2 + x3 + x4
  z2: x1 + x2
  z3: x3 + x4
  z4: x1
  z5: x2
  z6: x3
  z7: x4
Possible estimates for query range(x2, x3) = x2 + x3:
  z5 + z6
  z2 - z4 + z6
  z1 - z4 - z7
Least-squares estimate: (6z1 + 3z2 + 3z3 - 9z4 + 12z5 + 12z6 - 9z7)/21
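The least-squares combination above can be derived mechanically: it is the row of WH⁺ for the query w = (0,1,1,0), where H⁺ = (HᵀH)⁻¹Hᵀ. A sketch in plain Python using exact rational arithmetic (Gauss-Jordan elimination stands in for a linear-algebra library):

```python
from fractions import Fraction

def transpose(M):
    return [list(r) for r in zip(*M)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def inverse(M):
    # Gauss-Jordan elimination over exact rationals
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# Hierarchical strategy H for n = 4 (observations z1..z7 from the slide)
H = [[1, 1, 1, 1],
     [1, 1, 0, 0],
     [0, 0, 1, 1],
     [1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1]]

Ht = transpose(H)
H_pinv = matmul(inverse(matmul(Ht, H)), Ht)  # H+ = (H^T H)^-1 H^T

# coefficients combining z1..z7 into the estimate of range(x2,x3) = x2 + x3
w = [[0, 1, 1, 0]]
coeffs = matmul(w, H_pinv)[0]
print([str(c) for c in coeffs])
# ['2/7', '1/7', '1/7', '-3/7', '4/7', '4/7', '-3/7'], i.e. (6, 3, 3, -9, 12, 12, -9)/21
```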
Error rates: workload of all range queries (ε-differential privacy, ε = 0.1, n = 1024)
[Plot: mean squared error vs. query width (as fraction of the domain, from small ranges to big ranges), comparing Noisy counts and Hierarchical (2).]
[Xiao, ICDE '10]
Method 4: wavelet queries
Wavelet: use the Haar wavelet as observations.
Observation queries submitted (Y, with ||Y||₁ = 3 = log n + 1), each answered with Laplace(3/ε) noise:
  z1: x1 + x2 + x3 + x4
  z2: x1 + x2 - x3 - x4
  z3: x1 - x2
  z4: x3 - x4
Estimate for query range(x2, x3) = x2 + x3:
  .5z1 + 0z2 - .5z3 + .5z4
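Because the Haar matrix Y is square and full rank, the least-squares estimate reduces to x̂ = Y⁻¹z, and the coefficients above fall out of wY⁻¹. A sketch with exact arithmetic (the small Gauss-Jordan inverse is repeated so the block stands alone):

```python
from fractions import Fraction

def matmul(A, B):
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def inverse(M):
    # Gauss-Jordan elimination over exact rationals
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# Haar wavelet strategy Y for n = 4 (observations z1..z4 from the slide)
Y = [[1, 1, 1, 1],
     [1, 1, -1, -1],
     [1, -1, 0, 0],
     [0, 0, 1, -1]]

# coefficients combining z1..z4 into the estimate of range(x2,x3) = x2 + x3
w = [[0, 1, 1, 0]]
coeffs = matmul(w, inverse(Y))[0]
print([str(c) for c in coeffs])  # ['1/2', '0', '-1/2', '1/2']
```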
Error: workload of all range queries (ε-differential privacy, ε = 0.1, n = 1024)
[Plot: mean squared error vs. query width (as fraction of the domain), comparing Identity, Hierarchical (2), Wavelet, and Hierarchical (4).]
Observations for the workload of all range queries
Noisy counts (I): x1; x2; x3; x4. Very low sensitivity, but large ranges are estimated badly.
Hierarchical (H): x1+x2+x3+x4; x1+x2; x3+x4; x1; x2; x3; x4
Wavelet (Y): x1+x2+x3+x4; x1+x2-x3-x4; x1-x2; x3-x4
H and Y: low sensitivity, and all range queries can be estimated using no more than log n output entries.
Max/Avg error (1-dim): I: O(n/ε²); H: O(log³n/ε²); Y: O(log³n/ε²). For k-dim: O(log³ᵏn/ε²).
Observations for alternative workloads
• Workload: sets of 2D range queries. Observations: quad-tree queries, with ε increasing geometrically by level (levels given larger ε are answered more accurately). [Cormode, ICDE '12]
• Workload: sets of low-order marginals. Observations: Fourier basis queries, built recursively as H_i = [H_{i-1} H_{i-1}; H_{i-1} -H_{i-1}]. [Barak, PODS '07]
Questions raised
Non-adaptive approaches (workload → observations):
  low-order marginals → Fourier basis queries [Barak, PODS '07]
  all one-dim range queries → hierarchical ranges [Hay, PVLDB '10]
  all (multi-dim) range queries → Haar wavelet queries [Xiao, ICDE '10]
  2-dim range queries → quad-tree queries [Cormode, ICDE '12]
• Are these observations optimal for the targeted workloads?
• Which observations should we use for other custom workloads?
→ Adapt observations to the workload
Outline
1. Preliminaries
2. Approach 1: workload-aware
   • Fixed Observations
   • Optimized Observations
3. Approach 2: data-aware
4. Conclusions
Laplace mechanism (matrix notation)
Laplace(W, x) = Wx + (||W||₁/ε) b
  W: m × n workload matrix
  x: n × 1 database (frequency vector)
  ||W||₁: sensitivity (a scalar)
  b: m × 1 vector of independent samples from Laplace(1)
Error(w) = 2(||W||₁/ε)²
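A minimal sketch of this mechanism in plain Python, drawing Laplace samples by inverse-CDF; the workload and database are the running n = 4 all-ranges example from earlier slides:

```python
import math
import random

def laplace_sample(scale):
    # inverse-CDF sampling: X = -scale * sgn(u) * ln(1 - 2|u|), u ~ Uniform(-1/2, 1/2)
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def laplace_mechanism(W, x, eps):
    # ||W||_1 sensitivity: maximum column L1 norm
    sens = max(sum(abs(row[j]) for row in W) for j in range(len(W[0])))
    scale = sens / eps
    true_answers = [sum(w * v for w, v in zip(row, x)) for row in W]  # Wx
    return [a + laplace_sample(scale) for a in true_answers]

# all-ranges workload over n = 4 and the example database
W = [[1, 1, 1, 1], [1, 1, 1, 0], [0, 1, 1, 1], [1, 1, 0, 0], [0, 1, 1, 0],
     [0, 0, 1, 1], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
x = [10, 23, 16, 3]
noisy = laplace_mechanism(W, x, eps=0.1)  # noise scale ||W||_1/eps = 60 per query
print(noisy)
```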
The matrix mechanism: justification
➊ (Select observations) Choose a (full rank) query matrix A.
➋ (Apply Laplace) Use the Laplace mechanism to answer A:
    z = Ax + (||A||₁/ε) b
➌ (Derive answers) Compute an estimate x̂ of x using the answers z:
  • compute the estimate x̂ that minimizes the squared error ||Ax̂ - z||₂²
  • the solution is the ordinary least squares estimator x̂ = A⁺z, where A⁺ = (AᵀA)⁻¹Aᵀ
    Thm: x̂ is unbiased and has the least variance among all linear unbiased estimators.
  • compute the workload queries using the estimate x̂: Wx̂
The matrix mechanism
Given a workload W, and any full-rank strategy matrix A, the following randomized algorithm is ε-differentially private:
    Matrix_A(W, x) = Wx + (||A||₁/ε) WA⁺ b,   b = Lap(1)
    (Wx: true answer; instantiated with observations A; scaled by ||A||₁; transformed by WA⁺)
Compare with the Laplace mechanism: Laplace(W, x) = Wx + (||W||₁/ε) b
Instances of the matrix mechanism
Given a workload W of linear queries:
  A = W: never worse than Laplace, sometimes better
  A = identity matrix: a common baseline
  A = Haar wavelet [Xiao, ICDE '10]
  A = tree based [Hay, PVLDB '10] [Cormode, ICDE '12]
  A = Fourier basis [Barak, PODS '07]
Observation matrices equivalent to wavelet
Wavelet matrix Y (||Y||₁ = 3):
  1  1  1  1
  1  1 -1 -1
  1 -1  0  0
  0  0  1 -1
Y ≡ Y′ (equivalent error for all queries, ||Y′||₁ = 3) > Y″ (lower error for all queries, ||Y″||₁ = 1 + √2 ≈ 2.414).
The Haar wavelet observation matrix Y is dominated by the alternative matrix Y″.
Error of matrix mechanism
Given an observation matrix A and workload W, the error under the mechanism Matrix_A is:
  For a single query w in W:
    Error_A(w) = (2/ε²) ||A||₁² w(AᵀA)⁻¹wᵀ
  Total error for workload W:
    TotalError_A(W) = (2/ε²) ||A||₁² trace(W(AᵀA)⁻¹Wᵀ)
The error is independent of the input data.
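The total-error formula can be evaluated exactly for the small strategies in this talk. A sketch with exact rationals: for the n = 4 all-ranges workload it gives 40/ε² for the identity strategy and (876/7)/ε² ≈ 125/ε² for the hierarchical one (at this tiny n the identity strategy happens to win; the hierarchical strategy's advantage is asymptotic, as the O(n) vs. O(log³n) rates indicate):

```python
from fractions import Fraction

def transpose(M):
    return [list(r) for r in zip(*M)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def inverse(M):
    # Gauss-Jordan elimination over exact rationals
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def total_error(W, A, eps=1):
    # TotalError_A(W) = (2/eps^2) * ||A||_1^2 * trace(W (A^T A)^-1 W^T)
    sens = max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))
    M = matmul(matmul(W, inverse(matmul(transpose(A), A))), transpose(W))
    trace = sum(M[i][i] for i in range(len(M)))
    return Fraction(2) * Fraction(sens) ** 2 * trace / Fraction(eps) ** 2

# all-ranges workload over n = 4
W = [[1, 1, 1, 1], [1, 1, 1, 0], [0, 1, 1, 1], [1, 1, 0, 0], [0, 1, 1, 0],
     [0, 0, 1, 1], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
I = [[int(i == j) for j in range(4)] for i in range(4)]
H = [[1, 1, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1],
     [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
print(total_error(W, I))  # 40
print(total_error(W, H))  # 876/7 (about 125.1)
```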
Optimal selection of observations
Objective: given workload W, find the observation matrix A that minimizes the total error.
One optimization problem [Ding, SIGMOD '11]:
  Objective: given W consisting of data cube queries, choose A consisting of data cube queries to minimize a simplified error measure
  Problem type: set-cover approximation
  Runtime: O(n)
  Privacy: ε-DP