Sublinear Algorithms
LECTURE 10
Last time
• Multipurpose sketches
• Count-min and count-sketch
• Range queries, heavy hitters, quantiles
Today
• Limitations of streaming algorithms
• Communication complexity
10/6/2020
Sofya Raskhodnikova; Boston University

Recall: Frequency Moments Estimation
Input: a stream $a_1, a_2, \dots, a_m \in [n]^m$
• The frequency vector of the stream is $f = (f_1, \dots, f_n)$, where $f_i$ is the number of times $i$ appears in the stream
• The $p$-th frequency moment is $F_p = \|f\|_p^p = \sum_{i=1}^{n} f_i^p$
• $F_0$ is the number of nonzero entries of $f$ (# of distinct elements)
• $F_1 = m$ (# of elements in the stream)
• $F_2 = \|f\|_2^2$ is a measure of non-uniformity, used e.g. for anomaly detection in network analysis
• $F_\infty = \max_i f_i$ is the frequency of the most frequent element
We obtained streaming algorithms for $F_0$, $F_1$, $F_2$. What about $F_3$ to $F_\infty$?

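To make the definitions concrete, here is a minimal offline sketch that computes $F_p$ exactly by storing the whole frequency vector. It is not a streaming algorithm (it uses space linear in the number of distinct elements); the function name `frequency_moment` and the example stream are illustrative assumptions, not part of the lecture.

```python
from collections import Counter

def frequency_moment(stream, p):
    """Exact F_p of a stream, computed offline from the full frequency vector."""
    freq = Counter(stream)  # freq[i] = f_i for every i that appears
    if p == 0:
        return len(freq)  # number of distinct elements
    if p == float("inf"):
        return max(freq.values(), default=0)  # largest frequency
    return sum(f ** p for f in freq.values())

# Hypothetical example stream over [n] with n = 6; element 3 appears twice.
stream = [3, 4, 1, 3, 5]
for p in (0, 1, 2, float("inf")):
    print(p, frequency_moment(stream, p))
```
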
Communication Complexity: A Method for Proving Lower Bounds
(Randomized) Communication Complexity
Shared random string: 1101000101110101110101010110 …
Alice (input: $x$) and Bob (input: $y$) exchange messages (0100, 11, 001, ⋯, 0011) to compute $C(x, y)$.
Goal: minimize the number of bits exchanged.
• Communication complexity of a protocol is the maximum number of bits exchanged by the protocol.
• Communication complexity of a function $C$, denoted $R(C)$, is the communication complexity of the best protocol for computing $C$.
Partially based on slides by Eric Blais

Example: Set Disjointness $\mathrm{DISJ}_k$
Shared random string: 1101000101110101110101010110 …
Alice's input: $S \subseteq [n]$, $|S| = k$. Bob's input: $T \subseteq [n]$, $|T| = k$.
Compute $\mathrm{DISJ}_k(S, T)$ = accept if $S \cap T = \emptyset$, reject otherwise.
Theorem [Kalyanasundaram Schnitger 92, Razborov 92]
$R(\mathrm{DISJ}_k) \ge \Omega(k)$ for all $k \le n/2$.

One-Way Communication Complexity
Shared random string: 1101000101110101110101010110 …
Alice (input: $x$) sends a single message $m_1$ to Bob (input: $y$), who computes $C(x, y)$.
Goal: minimize the number of bits Alice sends to Bob.
One-way communication complexity of a function $C$, denoted $R^{\rightarrow}(C)$, is the communication complexity of the best one-way protocol for computing $C$.

3-Player One-Way Communication Complexity
Shared random string: 1101000101110101110101010110 …
Alice (input: $x$) sends $m_1$ to Bob (input: $y$); Bob sends $m_2$ to Carol (input: $z$); Carol computes $C(x, y, z)$.
Goal: minimize $|m_1| + |m_2|$.
• Require correct output w.p. at least 2/3 over the random string

Converting Streaming Algorithm to CC Protocol
Let $\mathcal{P}$ be a streaming problem.
• Suppose there is a transformation $x \to s_1$, $y \to s_2$, $z \to s_3$ such that $\mathcal{P}(s_1 \circ s_2 \circ s_3)$ suffices to compute $C(x, y, z)$
• An $s$-bit algorithm $A$ for $\mathcal{P}$ gives a $2s$-bit protocol for $C$:
• Alice runs $A$ on $s_1$ and sends the memory state, $m_1$, to Bob
• Bob instantiates $A$ with $m_1$, runs $A$ on $s_2$, and sends the memory state, $m_2$, to Carol
• Carol instantiates $A$ with $m_2$, runs $A$ on $s_3$ to get $\mathcal{P}(s_1 \circ s_2 \circ s_3)$, and computes $C(x, y, z)$
Based on Andrew McGregor’s slides: https://people.cs.umass.edu/~mcgregor/711S18/lowerbounds-1.pdf

Converting Streaming Algorithm to CC Protocol (continued)
• If there are $p$ players, then the protocol uses $(p - 1)s$ bits
• A lower bound $L$ for computing $C$ implies $s = \Omega(L/p)$

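The simulation argument can be sketched in a few lines. The helper below is a hypothetical illustration (the name `simulate_one_way_protocol` and the toy length-counting algorithm are assumptions): each player feeds its own segment to the streaming algorithm and hands the memory state to the next player, so $p$ players exchange $p - 1$ messages, each no larger than the algorithm's memory.

```python
def simulate_one_way_protocol(initial_state, update, answer, segments):
    """Run a streaming algorithm as a one-way protocol, one player per segment.

    Returns the final answer and the list of memory states that were sent
    between consecutive players (len(segments) - 1 messages in total).
    """
    state = initial_state
    messages = []
    for player, segment in enumerate(segments):
        for item in segment:
            state = update(state, item)  # the player processes its own segment
        if player < len(segments) - 1:
            messages.append(state)  # memory state handed to the next player
    return answer(state), messages

# Toy streaming algorithm (an assumption for illustration): count the stream
# length, i.e. F_1, using a single integer as the memory state.
result, messages = simulate_one_way_protocol(
    0,
    lambda state, item: state + 1,
    lambda state: state,
    [[3, 4], [1, 3, 5], [3]],  # segments of Alice, Bob, Carol
)
print(result, messages)
```
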
A lower bound using CC method: Approximating $F_\infty$
Application: Approximating $F_\infty$
Theorem: Every algorithm that computes a 4/3-approximation of $F_\infty$ (w.p. $\ge 2/3$) needs $\Omega(n)$ space.
Proof: Reduction from Set Disjointness
• On input $x, y \in \{0,1\}^n$, players generate $s_1 = \{j : x_j = 1\}$ and $s_2 = \{j : y_j = 1\}$
• Example: $x$ = 0 0 1 1 0 0, $y$ = 1 0 1 0 1 0 → ⟨3,4; 1,3,5⟩
• Then $F_\infty = 1$ if $x, y$ represent disjoint sets (output $\le 4/3$), and $F_\infty = 2$ otherwise (output $\ge 3/2$).
• An $s$-space algorithm implies an $s$-bit protocol: $s = \Omega(n)$ by communication complexity of Set Disjointness

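The reduction can be checked on the slide's example with a short sketch. The helper names are illustrative assumptions, and an exact offline computation of $F_\infty$ stands in for the 4/3-approximation streaming algorithm; any estimator with that guarantee could be passed instead.

```python
from collections import Counter

def to_stream(bits):
    """Positions (1-indexed) of the 1s in a bit vector."""
    return [j for j, b in enumerate(bits, start=1) if b == 1]

def exact_f_infinity(stream):
    """Stand-in for a streaming estimator: the exact maximum frequency."""
    return max(Counter(stream).values(), default=0)

def disjoint_via_f_infinity(x, y, estimate_f_infinity=exact_f_infinity):
    """Decide disjointness from a 4/3-approximation of F_inf on s1 followed by s2."""
    output = estimate_f_infinity(to_stream(x) + to_stream(y))
    # F_inf = 1 gives output <= 4/3; F_inf = 2 gives output >= 3/2.
    return output <= 4 / 3

x = [0, 0, 1, 1, 0, 0]
y = [1, 0, 1, 0, 1, 0]
print(to_stream(x), to_stream(y), disjoint_via_f_infinity(x, y))
```
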
A lower bound using CC method: Computing the median of a stream
Index
• Alice gets an $n$-bit string $x$, and Bob gets an index $j \in [n]$.
• Define $\mathrm{Index}(x, j) = x_j$.
• One-way communication complexity of $\mathrm{Index}(x, j)$ is $\Omega(n)$

Application: Finding the Median of a Stream
Theorem: Every algorithm that computes the median of a $(2n - 1)$-element stream exactly (w.p. $\ge 2/3$) needs $\Omega(n)$ space.
Proof: Reduction from Index.
• On input $x \in \{0,1\}^n$, Alice generates $s_1 = \{2i + x_i : i \in [n]\}$
• Example: 0 0 1 1 0 1 1 → ⟨2,4,7,9,10,13,15⟩
• On input $j \in [n]$, Bob generates $s_2$ = $n - j$ copies of 0 and $j - 1$ copies of $2n + 2$
• Example: $j = 2$ → ⟨0,0,0,0,0,16⟩
• Then $\mathrm{median}(s_1 \circ s_2) = 2j + x_j$ and $\mathrm{Index}(x, j) = (2j + x_j) \bmod 2$
• An $s$-space algorithm implies an $s$-bit protocol: $s = \Omega(n)$ by 1-way communication complexity of Index

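The padding trick is easy to verify on the slide's example. In this sketch the function names are illustrative assumptions, and sorting stands in for the exact-median streaming algorithm: Bob's $n - j$ zeros and $j - 1$ large values shift the median so that it lands on Alice's $j$-th element.

```python
def alice_stream(x):
    """s1 = {2i + x_i : i in [n]}, with i 1-indexed."""
    return [2 * i + xi for i, xi in enumerate(x, start=1)]

def bob_stream(n, j):
    """s2 = n - j copies of 0 followed by j - 1 copies of 2n + 2."""
    return [0] * (n - j) + [2 * n + 2] * (j - 1)

def index_via_median(x, j):
    """Recover x_j from the median of the combined (2n - 1)-element stream."""
    n = len(x)
    stream = alice_stream(x) + bob_stream(n, j)
    median = sorted(stream)[n - 1]  # n-th smallest of 2n - 1 elements
    return median % 2  # median = 2j + x_j, so its parity is x_j

x = [0, 0, 1, 1, 0, 1, 1]
print(alice_stream(x), bob_stream(len(x), 2), index_via_median(x, 2))
```
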
A lower bound using CC method: Approximating Frequency Moments [Bar-Yossef, Jayram, Kumar, Sivakumar 04]
Multi-party Set Disjointness
• Consider a $p \times n$ binary matrix $M$ where each column has weight 0, 1 or $p$
Example:
0 0 1 1 0 0
1 0 1 0 1 0
0 0 1 0 0 0
0 0 1 0 0 1
• The input of player $i$ is row $i$ of $M$
• $\mathrm{DISJ}_p(M)$ = 0 if there is a column of 1s, 1 otherwise
• Communication complexity of $\mathrm{DISJ}_p(M)$ is $\Omega(n/p)$

Application: Frequency Moments for $k > 2$
Thm. Every algorithm that 2-approximates $F_k$ (w.p. $\ge 2/3$) needs $\Omega(n^{1 - 2/k})$ space
Proof: Reduction from multi-party Set Disjointness
• On input $M \in \{0,1\}^{p \times n}$, player $i$ generates $s_i = \{j : M_{ij} = 1\}$
Example:
0 0 1 1 0 0
1 0 1 0 1 0
0 0 1 0 0 0
0 0 1 0 0 1
→ ⟨3,4; 1,3,5; 3; 3,6⟩
• If all columns have weight 0 or 1 then $F_k = \sum_{i=1}^{n} f_i^k \le n$
• If there is a column of weight $p$ then $F_k \ge p^k$
• A 2-approximation of $F_k$ distinguishes the cases if $p^k > 4n \Leftrightarrow p > (4n)^{1/k}$
• An $s$-space algorithm implies an $s(p - 1)$-bit protocol: $s = \Omega(n / p^2) = \Omega(n / (4n)^{2/k}) = \Omega(n^{1 - 2/k})$ by communication complexity of $\mathrm{DISJ}_p$ for constant $k$

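The gap between the two cases can be seen numerically on the slide's $4 \times 6$ example. This is a sketch under stated assumptions: the helper names are hypothetical, $k = 3$ is chosen for illustration, and $F_k$ is computed exactly offline rather than by a streaming algorithm.

```python
from collections import Counter

def player_streams(M):
    """Player i's stream: 1-indexed columns j with M[i][j] = 1."""
    return [[j for j, b in enumerate(row, start=1) if b == 1] for row in M]

def f_k(stream, k):
    """Exact k-th frequency moment of a stream."""
    return sum(c ** k for c in Counter(stream).values())

# Example matrix from the slide: column 3 has weight p = 4.
M = [
    [0, 0, 1, 1, 0, 0],
    [1, 0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 1],
]
p, n, k = len(M), len(M[0]), 3
streams = player_streams(M)
combined = [j for s in streams for j in s]
# Here p**k = 64 > 4n = 24, so a 2-approximation separates the two cases.
print(streams, f_k(combined, k), p ** k > 4 * n)
```
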
A lower bound using CC method: Distinct Elements
Gap Hamming
• Alice and Bob get $n$-bit strings $x$ and $y$, respectively.
• Hamming distance $\mathrm{Ham}(x, y)$ is the number of positions on which $x$ and $y$ differ.
• Output: $\mathrm{Ham}(x, y)$ with additive error $\sqrt{n}$ w.p. $\ge 2/3$
• Communication complexity of $\mathrm{Ham}(x, y)$ is $\Omega(n)$, even when $|x|$ and $|y|$ are known to both players

Application: Distinct Elements
Thm. Every algorithm $(1 + \varepsilon)$-approximating $F_0$ (w.p. $\ge 2/3$) needs $\Omega(1/\varepsilon^2)$ space
Proof: Reduction from Gap Hamming
• On input $x, y \in \{0,1\}^n$, players generate $s_1 = \{j : x_j = 1\}$ and $s_2 = \{j : y_j = 1\}$
• Example: $x$ = 0 0 1 1 0 0, $y$ = 1 0 1 0 1 0 → ⟨3,4; 1,3,5⟩
• Then $2F_0 = |x| + |y| + \mathrm{Ham}(x, y)$
• When $|x|$ is known to Bob, a $(1 + \varepsilon)$-approximation of $F_0$ gives an additive approximation to $\mathrm{Ham}(x, y)$: $\varepsilon \cdot \frac{|x| + |y| + \mathrm{Ham}(x, y)}{2} \le \varepsilon n \le \sqrt{n}$ for $\varepsilon \le 1/\sqrt{n}$
• An $s$-space algorithm implies an $s$-bit protocol: $s = \Omega(n) = \Omega(1/\varepsilon^2)$ by communication complexity of Gap Hamming

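The identity $2F_0 = |x| + |y| + \mathrm{Ham}(x, y)$ is what lets Bob turn an $F_0$ estimate into a Hamming distance estimate. The sketch below checks it on the slide's example; the function names are illustrative assumptions, and the exact distinct count stands in for a $(1 + \varepsilon)$-approximation.

```python
def to_stream(bits):
    """Positions (1-indexed) of the 1s in a bit vector."""
    return [j for j, b in enumerate(bits, start=1) if b == 1]

def hamming(x, y):
    """Number of positions on which x and y differ."""
    return sum(a != b for a, b in zip(x, y))

def hamming_from_f0(x, y, f0):
    """Solve 2 * F_0 = |x| + |y| + Ham(x, y) for Ham(x, y)."""
    return 2 * f0 - sum(x) - sum(y)

x = [0, 0, 1, 1, 0, 0]
y = [1, 0, 1, 0, 1, 0]
f0 = len(set(to_stream(x) + to_stream(y)))  # exact F_0 of the combined stream
print(f0, hamming(x, y), hamming_from_f0(x, y, f0))
```
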