Randomness in Computing
Lecture 27
Last time
• Stationary distributions
• Random walks on graphs
• Algorithm for $s$-$t$-PATH
Today
• Sublinear algorithms
• Differential privacy
4/29/2020 Sofya Raskhodnikova; Randomness in Computing; based on slides by Baranasuriya et al.
A Sublinear-Time Algorithm
[Figure: a randomized algorithm queries a few positions of a long input (B L A - B L A - … - B L A) and outputs an approximate answer.]
Resources
• number of queries
• running time
Quality of approximation
Goal: Fundamental Understanding of Sublinear Computation
• What computational tasks?
• How to measure quality of approximation?
• What type of access to the input?
• Can we make our computations robust (e.g., to noise or erased data)?
Fundamental Computational Tasks
• Property testing: need to answer YES or NO
  Intuition: only require correct answers on two sets of instances that are very different from each other
• Learning: need an approximate representation of an object
  Input is from a given class (or is close to it)
• Classical approximation: need to compute a value
  Output should be close to the desired value
Property Testing: Definition [Rubinfeld Sudan, Goldreich Goldwasser Ron]
Randomized algorithm
• YES: accept with probability ≥ 2/3
• NO: reject with probability ≥ 2/3
Property tester
• YES: accept with probability ≥ 2/3
• Close to YES (not $\varepsilon$-far): don't care
• $\varepsilon$-far from YES: reject with probability ≥ 2/3
$\varepsilon$-far = differs in many places (≥ $\varepsilon$ fraction of places)
Example: Lipschitz Testing [Jha R]
Input: a list of $n$ numbers $x_1, x_2, \dots, x_n$
• A list of numbers is Lipschitz if $|x_{i+1} - x_i| \le 1$ for all $i$.
• Question: Is the list Lipschitz? Requires reading the entire list: $\Omega(n)$ time
• Approximate version: Is the list Lipschitz or $\varepsilon$-far from Lipschitz? (At least an $\varepsilon$ fraction of the $x_i$'s have to be changed to make it Lipschitz.)
Our result: $O((\log n)/\varepsilon)$ time
[Figure: plot of $x_i$ against $i$ for the Lipschitz list 5 6 5 4 5 4 3 2 2 1, $i = 1, \dots, 10$.]
Lipschitz Testing: Attempts
1. Test: Pick a random $i$ and reject if $|x_{i+1} - x_i| > 1$.
   Fails on: 0 1 2 3 5 6 7 8 (1/2-far from Lipschitz, but only one adjacent pair is violated)
2. Test: Pick random $i < j$ and reject if $|x_j - x_i| > j - i$.
   Fails on: 0 2 1 3 2 4 3 5 4 6 (1/2-far from Lipschitz, but only adjacent pairs are violated, and a random pair is rarely adjacent)
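To see how badly the two attempts do, the sketch below computes their exact rejection probabilities on the two counterexample lists. It assumes the Lipschitz condition is read as $|x_{i+1} - x_i| \le 1$; the helper names (`staircase`, `zigzag`, and the two `*_reject_prob` functions) are made up for this illustration and generalize the slide's lists to any even length.

```python
from fractions import Fraction
from itertools import combinations


def adjacent_test_reject_prob(xs):
    """Test 1: probability that a uniformly random adjacent pair is violated."""
    n = len(xs)
    violated = sum(1 for i in range(n - 1) if abs(xs[i + 1] - xs[i]) > 1)
    return Fraction(violated, n - 1)


def random_pair_test_reject_prob(xs):
    """Test 2: probability that a uniformly random pair i < j is violated."""
    pairs = list(combinations(range(len(xs)), 2))
    violated = sum(1 for i, j in pairs if abs(xs[j] - xs[i]) > j - i)
    return Fraction(violated, len(pairs))


def staircase(n):
    """0, 1, ..., n/2 - 1 followed by n/2 + 1, ..., n: a single jump of size 2."""
    half = n // 2
    return list(range(half)) + list(range(half + 1, 2 * half + 1))


def zigzag(n):
    """The pattern 0 2 1 3 2 4 ...: every other adjacent pair jumps by 2."""
    return [i // 2 + 2 * (i % 2) for i in range(n)]


if __name__ == "__main__":
    print(staircase(8), adjacent_test_reject_prob(staircase(8)))   # 1/7
    print(zigzag(10), random_pair_test_reject_prob(zigzag(10)))    # 1/9
```

On a list of length $n$, Test 1 rejects the staircase with probability $1/(n-1)$ and Test 2 rejects the zigzag with probability $1/(n-1)$ as well, so either test would need about $n$ repetitions to catch its counterexample.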
Is a list Lipschitz or $\varepsilon$-far from Lipschitz?
Idea: Associate positions in the list with vertices of the directed line $1 \to 2 \to 3 \to \dots \to n-1 \to n$.
Construct a graph (2-spanner) with $\le n \log n$ edges [Bhattacharyya Grigorescu Jung R Woodruff]
• by adding a few "shortcut" edges $(i, j)$ for $i < j$
• where each pair of vertices $i < j$ is connected by a path of length at most 2
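The slide does not spell out how the 2-spanner is built. One standard way, sketched here as an assumption rather than as the cited construction, is divide and conquer: connect every vertex of an interval to the interval's midpoint, then recurse on the two halves. The function name `two_spanner_edges` is hypothetical.

```python
def two_spanner_edges(lo, hi):
    """Edges (i, j) with i < j of a 2-spanner on the directed line lo..hi (inclusive).

    Every vertex left of the midpoint gets an edge into it, and the midpoint
    gets an edge to every vertex on its right; then both halves are handled
    recursively. Two vertices i < j are joined through the first midpoint
    that lies between them (or equals one of them).
    """
    if hi <= lo:
        return []
    mid = (lo + hi) // 2
    edges = [(i, mid) for i in range(lo, mid)]
    edges += [(mid, j) for j in range(mid + 1, hi + 1)]
    return edges + two_spanner_edges(lo, mid - 1) + two_spanner_edges(mid + 1, hi)


if __name__ == "__main__":
    for edge in two_spanner_edges(1, 7):
        print(edge)
```

Each level of the recursion adds fewer than $n$ edges and there are about $\log_2 n$ levels, which is where the $n \log n$ bound comes from.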
Is a list Lipschitz or $\varepsilon$-far from Lipschitz?
Test: Pick a random edge $(i, j)$ from the 2-spanner and reject if $|x_j - x_i| > j - i$.
[Figure: an example list with entries $x_i$, $x_k$, $x_j$ marked.]
Analysis:
• Call a pair $(i, j)$ violated if $|x_j - x_i| > j - i$, and satisfied otherwise.
• If $i$ is an endpoint of a violated edge, call $x_i$ bad. Otherwise, call it good.
Claim 1. All pairs of good numbers are satisfied.
Proof: Consider any two good numbers $x_i$ and $x_j$ with $i < j$. They are connected by a path of (at most) two edges $(i, k)$, $(k, j)$, and both edges are satisfied because every edge touching a good number is satisfied.
$\Rightarrow |x_k - x_i| \le k - i$ and $|x_j - x_k| \le j - k$
$\Rightarrow |x_j - x_i| \le |x_j - x_k| + |x_k - x_i| \le (j - k) + (k - i) = j - i$
Claim 2. An $\varepsilon$-far list violates $\ge \varepsilon/(2 \log n)$ fraction of edges in the 2-spanner.
Proof:
• If a list is $\varepsilon$-far from Lipschitz, it has $\ge \varepsilon n$ bad numbers. (By Claim 1, the good numbers are pairwise satisfied, so changing only the bad numbers suffices to make the list Lipschitz.)
• Each violated edge contributes at most 2 bad numbers.
• So the 2-spanner has $\ge \varepsilon n/2$ violated edges out of at most $n \log n$.
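The counting in the proof of Claim 2 can be written out in one line. The only inputs are the three facts stated on the slide: at least $\varepsilon n$ bad numbers, at most two bad numbers per violated edge, and at most $n \log n$ edges in the 2-spanner.

```latex
\[
  \#\{\text{violated edges}\}
  \;\ge\; \frac{\#\{\text{bad numbers}\}}{2}
  \;\ge\; \frac{\varepsilon n}{2},
  \qquad\text{so}\qquad
  \frac{\#\{\text{violated edges}\}}{\#\{\text{edges}\}}
  \;\ge\; \frac{\varepsilon n / 2}{n \log n}
  \;=\; \frac{\varepsilon}{2 \log n}.
\]
```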
Algorithm: Sample $(4 \log n)/\varepsilon$ edges $(i, j)$ from the 2-spanner and reject if $|x_j - x_i| > j - i$ for some sampled edge. Otherwise, accept.
Guarantee: All Lipschitz lists are accepted. All lists that are $\varepsilon$-far from Lipschitz are rejected with probability ≥ 2/3.
Time: $O((\log n)/\varepsilon)$
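A minimal sketch of the tester follows. Several choices are assumptions made for illustration: the logarithm is taken base 2, the spanner is the midpoint construction (not necessarily the cited one), and the edge list is built explicitly, which costs about $n \log n$ time and so is not sublinear; an actual implementation would have to sample spanner edges without materializing them. The guarantee follows from Claim 2: if a $p = \varepsilon/(2 \log n)$ fraction of edges is violated, then $k = (4 \log n)/\varepsilon$ independent samples all miss with probability at most $(1 - p)^k \le e^{-pk} = e^{-2} < 1/3$.

```python
import math
import random


def two_spanner_edges(lo, hi):
    """Midpoint construction of a 2-spanner on positions lo..hi (an assumed construction)."""
    if hi <= lo:
        return []
    mid = (lo + hi) // 2
    edges = [(i, mid) for i in range(lo, mid)]
    edges += [(mid, j) for j in range(mid + 1, hi + 1)]
    return edges + two_spanner_edges(lo, mid - 1) + two_spanner_edges(mid + 1, hi)


def lipschitz_tester(xs, eps, rng=random):
    """Return True (accept) or False (reject) after sampling spanner edges.

    Lipschitz lists are always accepted, because every pair i < j of a
    Lipschitz list satisfies |x_j - x_i| <= j - i.
    """
    n = len(xs)
    if n < 2:
        return True
    edges = two_spanner_edges(0, n - 1)  # explicit list: for illustration only
    num_samples = math.ceil(4 * math.log2(n) / eps)
    for _ in range(num_samples):
        i, j = rng.choice(edges)  # uniform edge, sampled with replacement
        if abs(xs[j] - xs[i]) > j - i:
            return False  # found a violated edge
    return True


if __name__ == "__main__":
    lipschitz_list = [5, 6, 5, 4, 5, 4, 3, 2, 2, 1]
    far_list = list(range(32)) + list(range(33, 65))  # 1/2-far staircase
    print(lipschitz_tester(lipschitz_list, 0.5))
    print(lipschitz_tester(far_list, 0.5))
```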
Testing if a List is Lipschitz: Summary
• [Jha R]: We can determine if a list of $n$ numbers is Lipschitz or $\varepsilon$-far from Lipschitz in $O((\log n)/\varepsilon)$ time.
• [Jha R, Blais R Yaroslavtsev, Chakrabarty Dixit Jha Seshadhri]: This cannot be improved.
Testing Properties of High-Dimensional Functions
In polylogarithmic time, we can test a large class of properties of functions $f: \{1, \dots, n\}^d \to \mathbb{R}$, including:
• Lipschitz property [Jha R]
• Monotonicity [Goldreich Goldwasser Lehman Ron, Dodis Goldreich Lehman R Ron Samorodnitsky]
• Bounded-derivative properties [Chakrabarty Dixit Jha Seshadhri]
• Unateness [Baleshzar Chakrabarty Pallavoor R Seshadhri]
Sublinear Algorithms: Summary
• Many problems admit sublinear-time algorithms
• Algorithms are often simple
• Analysis requires creation of interesting combinatorial, geometric and algebraic tools
• Unexpected connections to other areas
• Many open questions
Private Data Analysis
[Figure: individuals contribute their data $x = (x_1, x_2, x_3, \dots, x_{d-1}, x_d)$ to a curator; data analysts send queries to the curator and receive answers.]
Typical examples: census, medical studies, what big companies want to publish about our data, …
Two conflicting goals
• Protect privacy of individuals: differential privacy [Dwork McSherry Nissim Smith 06]
• Give accurate answers
Neighboring Datasets
Two datasets $x, x'$ are neighbors if they differ in one person's data.
[Figure: $x = (x_1, x_2, x_3, \dots, x_{d-1}, x_d)$ and $x' = (x_1, x_2, x'_3, \dots, x_{d-1}, x_d)$ differ only in the third entry.]
Differential Privacy [Dwork McSherry Nissim Smith]
Privacy Definition: An algorithm $A$ is $\varepsilon$-differentially private if for all pairs of neighbors $x, x'$ and all sets of answers $S$:
$\Pr[A(x) \in S] \le e^{\varepsilon} \cdot \Pr[A(x') \in S]$
Properties of Differential Privacy
• Composition: If algorithms $A_1$ and $A_2$ are $\varepsilon$-differentially private, then the algorithm that outputs $(A_1(x), A_2(x))$ is $2\varepsilon$-differentially private.
• Meaningful in the presence of arbitrary external information
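The composition property can be checked directly from the definition. The short derivation below is a sketch under two simplifying assumptions that the slide does not state: $A_1$ and $A_2$ use independent randomness, and their outputs are discrete, so it is enough to compare probabilities of single outcomes $(s_1, s_2)$ and then sum over the outcomes in $S$.

```latex
\begin{align*}
\Pr[(A_1(x), A_2(x)) = (s_1, s_2)]
  &= \Pr[A_1(x) = s_1] \cdot \Pr[A_2(x) = s_2] \\
  &\le e^{\varepsilon} \Pr[A_1(x') = s_1] \cdot e^{\varepsilon} \Pr[A_2(x') = s_2] \\
  &= e^{2\varepsilon} \Pr[(A_1(x'), A_2(x')) = (s_1, s_2)].
\end{align*}
```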
Output Perturbation
Frameworks for designing differentially private algorithms
Output Perturbation
[Figure: individuals send their data $x = (x_1, x_2, x_3, \dots, x_{d-1}, x_d)$ to the curator, who evaluates $f(x)$ and releases $A(x) = f(x) + \text{noise}$ to the data analysts.]
Global Sensitivity Framework
Global sensitivity of a function $f$ is
$GS_f = \max_{\text{neighbors } x, x'} |f(x) - f(x')|$.
Example: $x_1, \dots, x_n \in [0,1]$, $\text{ave}(x) = \frac{x_1 + \dots + x_n}{n}$
• $GS_{\text{ave}} = ?$
Example (continued): $GS_{\text{ave}} = 1/n$
Theorem [Dwork McSherry Nissim Smith]: If $A(x) = f(x) + \text{Lap}\left(\frac{GS_f}{\varepsilon}\right)$, then $A$ is $\varepsilon$-differentially private.
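The value $GS_{\text{ave}} = 1/n$ follows from a one-line calculation. It assumes each entry lies in $[0,1]$ (the same bound holds if entries are restricted to $\{0,1\}$) and that neighbors $x, x'$ differ only in coordinate $i$.

```latex
\[
  |\mathrm{ave}(x) - \mathrm{ave}(x')|
  = \frac{|x_i - x'_i|}{n}
  \le \frac{1}{n},
  \qquad\text{with equality when } x_i = 0,\ x'_i = 1 .
\]
```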
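Global Sensitivity: Noise Distribution
Laplace Mechanism: the algorithm $A(x) = f(x) + \text{Lap}\left(\frac{GS_f}{\varepsilon}\right)$ from the theorem of [Dwork McSherry Nissim Smith]
Laplace distribution $\text{Lap}(\lambda)$ has density $h(y) = \frac{1}{2\lambda} \cdot e^{-|y|/\lambda}$ (mean 0, standard deviation $\sqrt{2} \cdot \lambda$)
Sliding Property of $\text{Lap}\left(\frac{GS_f}{\varepsilon}\right)$: $\frac{h(y)}{h(y+\delta)} \le e^{\varepsilon \cdot |\delta| / GS_f}$ for all $y, \delta$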
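The sliding property is a consequence of the triangle inequality, and it yields the privacy guarantee almost immediately. The derivation below is a sketch with $\lambda = GS_f/\varepsilon$; the symbol $a$ for a possible output of $A$ is introduced here for illustration.

```latex
\begin{align*}
\frac{h(y)}{h(y+\delta)}
  &= \exp\!\left(\frac{|y+\delta| - |y|}{\lambda}\right)
   \le \exp\!\left(\frac{|\delta|}{\lambda}\right)
   = \exp\!\left(\frac{\varepsilon \cdot |\delta|}{GS_f}\right). \\
\intertext{The density of $A(x)$ at output $a$ is $h(a - f(x))$. For neighbors $x, x'$, take
$y = a - f(x)$ and $\delta = f(x) - f(x')$, so that $|\delta| \le GS_f$:}
\frac{h(a - f(x))}{h(a - f(x'))}
  &\le \exp\!\left(\frac{\varepsilon \cdot |f(x) - f(x')|}{GS_f}\right)
   \le e^{\varepsilon}.
\end{align*}
```

Integrating this pointwise bound over any set of answers $S$ gives $\Pr[A(x) \in S] \le e^{\varepsilon} \Pr[A(x') \in S]$.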
When is Laplace Mechanism Useful?
• Laplace mechanism is always private.
• When is it accurate?
Example: $x_1, \dots, x_n \in [0,1]$, $\text{ave}(x) = \frac{x_1 + \dots + x_n}{n}$
• $GS_{\text{ave}} = 1/n$
• Noise $= \text{Lap}\left(\frac{1}{\varepsilon n}\right)$
Accurate when GS is low (and $n$, the size of the database, is sufficiently large)
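The example can be turned into a few lines of code. This is a sketch under stated assumptions: every entry lies in $[0,1]$, so that $GS_{\text{ave}} = 1/n$, and Laplace noise is drawn as the difference of two independent exponential samples, which is one standard way to sample it using only the Python standard library. The names `sample_laplace` and `private_average` are hypothetical.

```python
import random


def sample_laplace(scale, rng=random):
    """Draw from Lap(scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_average(xs, eps, rng=random):
    """Release ave(xs) + Lap(GS_ave / eps), assuming every entry lies in [0, 1]."""
    n = len(xs)
    if any(not 0 <= x <= 1 for x in xs):
        raise ValueError("entries must lie in [0, 1] for GS_ave = 1/n to hold")
    global_sensitivity = 1 / n  # changing one entry moves the average by at most 1/n
    return sum(xs) / n + sample_laplace(global_sensitivity / eps, rng)


if __name__ == "__main__":
    rng = random.Random(2020)
    for n in (10, 1_000, 100_000):
        data = [i % 2 for i in range(n)]  # true average is 0.5
        print(n, private_average(data, eps=0.1, rng=rng))
```

The noise has standard deviation $\sqrt{2}/(\varepsilon n)$, so for a fixed $\varepsilon$ the released average becomes more accurate as the database grows.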