

  1. Optimality of Linear Sketching under Modular Updates. Shachar Lovett (UCSD), Kaave Hosseini (UCSD → CMU), Grigory Yaroslavtsev (Indiana)

  2. Streaming and sketching

  3. Streaming with binary updates
     • Counters x_1, …, x_n ∈ F_2
     • Stream of updates: x_i ← x_i ⊕ 1
     • At the end, want to compute a function f(x_1, …, x_n)
     • For which functions can we do it using ≪ n bits of memory?

  4. Example
     • Initially 000000
     • Flip x_1: 100000
     • Flip x_5: 100010
     • Flip x_2: 110010
     • Flip x_5: 110000
     • …
     • Compute f(x_1, …, x_n)
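This update model is easy to simulate. A minimal sketch in Python; the choice of f as the parity of all counters is mine, purely for illustration:

```python
def run_stream(n, updates, f):
    """Apply bit-flip updates x_i <- x_i XOR 1, then evaluate f."""
    x = [0] * n
    for i in updates:
        x[i] ^= 1
    return x, f(x)

# The stream from the example (0-indexed here): flip x_1, x_5, x_2, x_5.
parity = lambda x: sum(x) % 2
state, value = run_stream(6, [0, 4, 1, 4], parity)
assert state == [1, 1, 0, 0, 0, 0]  # x_5 was flipped twice, back to 0
```

Storing all n bits is the naive solution; the question on the previous slide is when far less memory suffices.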

  5. Linear sketching
     • Linear sketching is a useful primitive for streaming
     • Let f: F_2^n → {0,1}
     • f has a linear sketch of size k if it factors as f(x) = p(Lx) where:
       (i) L: F_2^n → F_2^k is a linear function
       (ii) p: F_2^k → {0,1} is a post-processing function
     • Equivalently, the "Fourier dimension" of f is k

  6. Linear sketching implies streaming
     • Assume f: F_2^n → {0,1} factors as f(x) = p(Lx) where:
       (i) L: F_2^n → F_2^k is a linear function
       (ii) p: F_2^k → {0,1} is a post-processing function
     • To compute f in the streaming model, maintain Lx ∈ F_2^k
     • Easy to maintain under updates x_i ← x_i ⊕ 1
     • Requires only k bits of memory
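In code, maintaining the k-bit sketch s = Lx under bit flips looks like this: an update x_i ← x_i ⊕ 1 simply XORs column i of L into s. The matrix L and the stream below are arbitrary illustrative choices, not from the talk:

```python
import random

n, k = 8, 2
rng = random.Random(0)
# L stored as n columns, each a k-bit vector.
L = [[rng.randint(0, 1) for _ in range(k)] for _ in range(n)]

s = [0] * k                # sketch of the all-zeros x
stream = [3, 5, 3, 0]      # indices of bit flips
for i in stream:
    for j in range(k):
        s[j] ^= L[i][j]    # s <- s XOR (column i of L)

# Sanity check: recompute Lx offline from the final x.
x = [0] * n
for i in stream:
    x[i] ^= 1
offline = [sum(L[i][j] * x[i] for i in range(n)) % 2 for j in range(k)]
assert s == offline
```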

  7. Randomized linear sketching
     • Randomization makes linear sketching more powerful
     • f: F_2^n → {0,1} has a randomized linear sketch of size k if it can be approximated by a distribution over linear sketches of size k
     • That is, if there exists a distribution over (L, p), where:
       (i) L: F_2^n → F_2^k is a linear function
       (ii) p: F_2^k → {0,1} is a post-processing function
       such that Pr_{L,p}[f(x) = p(Lx)] ≥ 1 − ε

  8. Randomized sketching gives additional power
     • Consider the OR function: OR(x_1, …, x_n) = x_1 ∨ ⋯ ∨ x_n
     • Deterministic sketching requires size n
     • Randomized sketching can be done in size O(log 1/ε) (random parities)
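The random-parity sketch for OR can be sketched as follows; t plays the role of O(log 1/ε), and all names are mine. The sketch keeps t parities of x over uniformly random subsets: if x = 0 they are all 0, and if x ≠ 0 each random parity equals 1 with probability 1/2, so the error is one-sided with probability 2^(-t):

```python
import random

def or_sketch(x, t, rng):
    """One-sided-error OR sketch: keep t random parities of x."""
    n = len(x)
    masks = [[rng.randint(0, 1) for _ in range(n)] for _ in range(t)]
    parities = [sum(r[i] & x[i] for i in range(n)) % 2 for r in masks]
    return 1 if any(parities) else 0

rng = random.Random(0)
assert or_sketch([0] * 10, 7, rng) == 0   # never errs when x = 0
hits = sum(or_sketch([0] * 9 + [1], 7, rng) for _ in range(200))
print(hits)  # near 200: each trial errs with probability 2^-7
```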

  9. Is linear sketching universal?
     • Linear sketching seems like a very useful primitive for streaming
     • Is it universal?
     • That is: given a streaming algorithm that computes f using k bits of memory, can we extract from it a linear sketch for f of size ≈ k?

  10. Universality of linear sketching

  11. Universality of linear sketching
     • Let f: F_2^n → {0,1}
     • Assume: a randomized streaming algorithm supporting N updates and using k bits of memory
     • Goal: extract a randomized linear sketch of size ≈ k
     • True if N ≥ 2^(2^(2n)) [Li-Nguyen-Woodruff '14, Ai-Hu-Li-Woodruff '16]
     • True if N = Ω(n) for random inputs [Kannan-Mossel-Sanyal-Yaroslavtsev '18]
     • True if N = Ω(n^2) [This work]

  12. Main theorem: streaming
     • Let f: F_2^n → {0,1}
     • Assume there exists a randomized streaming algorithm for f supporting N = Ω(n^2) updates which uses k bits of memory
     • Then there exists a randomized linear sketch for f of size O(k)

  13. Extensions (that I will not talk about)
     • Extends to approximate real-valued functions f: F_2^n → [0,1]
     • Extends to functions over other fields
     • Assuming only N = Ω(n) updates are supported, we can still extract a randomized linear sketch, but its size will be poly(k) instead of O(k)

  14. One-way communication complexity

  15. One-way communication complexity
     • Model a streaming algorithm as a one-way communication protocol
     • Break the N updates into M = N/n chunks of size n each
     • Setup: M players, holding inputs x_1, …, x_M ∈ F_2^n (x_i is the aggregate of the n updates in the i-th chunk)
     • Goal: compute f(x_1 + ⋯ + x_M)
     • Communication model: one-way

  16. One-way communication complexity
     • M players, holding inputs x_1, …, x_M ∈ F_2^n
     • Model: one-way communication with shared randomness
     • Goal: output f(x_1 + ⋯ + x_M) w.h.p. over the shared randomness
     • [Diagram: Player 1, …, Player M in a chain; Player i holds x_i ∈ F_2^n and forwards a message m_i ∈ {0,1}^k; Player M announces out ∈ {0,1}]
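One direction of this connection is immediate: a size-k linear sketch for f gives a one-way protocol with k-bit messages, since each player can add L·x_i (over F_2) to the incoming message, so the last player holds L(x_1 + ⋯ + x_M). A toy simulation with k = 1 and L(x) = parity(x), both illustrative choices of mine (f is then the parity of all coordinates):

```python
def player(message, x_i):
    # Forward message XOR L*x_i; here k = 1 and L(x) = parity(x).
    return (message + sum(x_i)) % 2

def protocol(inputs):
    m = 0                      # the k-bit message, initially 0
    for x_i in inputs:         # one-way: Player 1, 2, ..., M
        m = player(m, x_i)
    return m                   # out = p(m); p is the identity here

inputs = [[1, 0, 1], [1, 1, 0], [0, 0, 1]]
total = [sum(col) % 2 for col in zip(*inputs)]   # x_1 + x_2 + x_3 over F_2
assert protocol(inputs) == sum(total) % 2
```

The theorem on the next slide is about the hard converse: extracting a linear sketch from an arbitrary one-way protocol.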

  17. Main theorem: one-way communication
     • Let f: F_2^n → {0,1}
     • Assume there exists a one-way communication protocol for computing f(x_1 + ⋯ + x_M) for M = Ω(n) players with k-bit messages (recall: this corresponds to N = Mn = Ω(n^2) binary updates)
     • Then there exists a randomized linear sketch for f of size O(k)
     • For M = Ω(1) players, get a linear sketch of size poly(k)

  18. Proof

  19. Proof
     • The proof uses:
       1. Standard techniques in communication complexity
       2. Additive combinatorics

  20. Proof step 1: Yao's minimax principle
     • Let f: F_2^n → {0,1}
     • Fix a "hard distribution" μ over inputs
     • Goal: a linear sketch for f(x) where x ∼ μ
     • Embed the hard distribution into the M players:
       • The first M-1 players' inputs x_1, …, x_{M-1} are uniform in F_2^n
       • The last player's input x_M is set so that x_1 + ⋯ + x_M = x
     • Intuition: the protocol has no information on x until the last player

  21. Proof step 2: protocol structure
     • Target: x ∼ μ
     • Players' inputs: x_1, …, x_{M-1} ∈ F_2^n uniform, x_M = x_1 + ⋯ + x_{M-1} + x
     • We may assume the protocol is deterministic
     • Messages: m_1(x_1), m_2(m_1, x_2), m_3(m_1, m_2, x_3), …
     • Output: out(m_1, …, m_{M-1}, x_M)
     • With good probability, out = f(x_1 + ⋯ + x_M) = f(x)
     • Can fix the messages (of the first M-1 players) to "typical messages" without hurting the success probability too much

  22. Proof step 3: fixing to typical messages
     • Fix typical messages m_1*, m_2*, …, m_{M-1}*
     • These correspond to sets of inputs for the first M-1 players:
       • A_1 = {x_1 ∈ F_2^n : m_1(x_1) = m_1*}
       • A_2 = {x_2 ∈ F_2^n : m_2(m_1*, x_2) = m_2*}
       • …
     • The sets are big: if the protocol uses k bits, then |A_i| ≥ 2^(n-k)
     • After conditioning on x_1 ∈ A_1, …, x_{M-1} ∈ A_{M-1}, the protocol output is a function of only x_M = x_1 + ⋯ + x_{M-1} + x
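The "sets are big" step is a pigeonhole argument, easy to see in miniature: a k-bit message partitions F_2^n into at most 2^k preimages, so some message value m* has at least 2^(n-k) preimages. The message function below (two parity bits) is an arbitrary illustrative choice:

```python
from itertools import product
from collections import Counter

n, k = 6, 2

def m1(x):
    # First player's k-bit message: any function to {0,1}^k works here.
    return (sum(x[:3]) % 2, sum(x[3:]) % 2)

counts = Counter(m1(x) for x in product([0, 1], repeat=n))
m_star, a1_size = counts.most_common(1)[0]
assert a1_size >= 2 ** (n - k)   # |A_1| = |preimage of m*| >= 2^(n-k)
```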

  23. Proof step 4: mixing
     • Large sets A_1, …, A_{M-1} ⊂ F_2^n of density 2^(-k)
     • If we sample x_1 ∈ A_1, …, x_{M-1} ∈ A_{M-1} and x ∼ μ, then with high probability out(x_1 + ⋯ + x_{M-1} + x) = f(x)
     • Technical lemma: for M = Ω(n), the sum x_1 + ⋯ + x_{M-1} mixes in F_2^n
     • More precisely, there exists a subspace V ⊂ F_2^n of co-dimension O(k), such that the sum is near-invariant to a random shift from V
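The mixing phenomenon can be seen numerically in a toy experiment (small parameters and random sets, my own choices, not from the talk): sums of independent samples from large random subsets of F_2^n rapidly approach the uniform distribution. Adversarial sets can instead stall inside an affine subspace, which is why the lemma only promises mixing up to a subspace V of small co-dimension:

```python
import random

n, k = 6, 2                              # sets of density 2^-k in F_2^6
size = 2 ** n
rng = random.Random(1)

def convolve(p, q):
    # Distribution of X + Y over F_2^n (XOR convolution, done naively).
    r = [0.0] * size
    for a, pa in enumerate(p):
        if pa:
            for b, qb in enumerate(q):
                r[a ^ b] += pa * qb
    return r

def dist_to_uniform(p):
    return 0.5 * sum(abs(pa - 1.0 / size) for pa in p)

distances = []
p = None
for m in range(8):                       # 8 "players"
    A = rng.sample(range(size), size // 2 ** k)
    q = [1.0 / len(A) if a in A else 0.0 for a in range(size)]
    p = q if p is None else convolve(p, q)
    distances.append(dist_to_uniform(p))

print([round(d, 4) for d in distances])  # distances shrink toward 0
```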

  24. Proof step 5: extracting the linear sketch
     • We found a large subspace V of co-dimension O(k)
     • If we sample x_1 ∈ A_1, …, x_{M-1} ∈ A_{M-1}, x ∼ μ and v ∈ V, then with high probability out(x_1 + ⋯ + x_{M-1} + x + v) = f(x)
     • This allows us to "factor out" V from the output function and extract a linear sketch for f(x)

  25. Open problems

  26. Linear sketching for modular updates
     • For binary updates (or, more generally, modular updates), we prove that linear sketching is universal
     • Any streaming algorithm which supports N = Ω(n^2) updates implies a randomized linear sketch with similar guarantees
     • Open problem 1: can this be improved to N = Ω(n)?
     • [Kannan-Mossel-Sanyal-Yaroslavtsev '18] proved a partial result in this regime, giving a linear sketch for f on random inputs
     • Our results in this regime incur a polynomial loss in the sketch size

  27. Integer updates
     • Streaming is often considered in the integer case
     • Integer counters x_1, …, x_n
     • Updates x_i += 1 or x_i -= 1
     • Sketching corresponds to linear functions over the integers
     • The results of [Li-Nguyen-Woodruff '14, Ai-Hu-Li-Woodruff '16] work in this regime as well, but require assuming N ≥ 2^(2^(2n))
     • Open problem 2: can our techniques be imported to this regime?
     • Challenge: it is not clear what "mixing" should mean here
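Maintaining an integer linear sketch s = Ax under ±1 updates works the same way as in the binary case: an update to x_i adds or subtracts column i of A. The matrix A and the stream below are illustrative choices of mine:

```python
import random

n, k = 10, 3
rng = random.Random(7)
# A stored as n columns, each a vector of k small integer coefficients.
A = [[rng.randint(-2, 2) for _ in range(k)] for _ in range(n)]

s = [0] * k
x = [0] * n
for i, delta in [(2, +1), (7, +1), (2, -1), (9, +1), (9, +1)]:
    x[i] += delta
    for j in range(k):
        s[j] += delta * A[i][j]   # s += delta * (column i of A)

# Sanity check: the maintained sketch equals A x computed offline.
assert s == [sum(A[i][j] * x[i] for i in range(n)) for j in range(k)]
```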
