

  1. Optimality of Linear Sketching under Modular Updates. Shachar Lovett (UCSD), Kaave Hosseini (UCSD → CMU), Grigory Yaroslavtsev (Indiana)

  2. Streaming and sketching

  3. Streaming with binary updates
     • Counters x_1, …, x_n ∈ F_2
     • Stream of updates: x_i ← x_i ⊕ 1
     • At the end, want to compute a function f(x_1, …, x_n)
     • For which functions can we do it using ≪ n bits of memory?

  4. Example
     • Initially 000000
     • Flip x_1: 100000
     • Flip x_5: 100010
     • Flip x_2: 110010
     • Flip x_5: 110000
     • …
     • Compute f(x_1, …, x_n)
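This update model is easy to simulate. A minimal sketch in Python; the choice of f as the parity of all counters is mine, purely for illustration:

```python
def run_stream(n, updates, f):
    """Apply bit-flip updates x_i <- x_i XOR 1, then evaluate f."""
    x = [0] * n
    for i in updates:
        x[i] ^= 1
    return x, f(x)

# The stream from the example (0-indexed here): flip x_1, x_5, x_2, x_5.
parity = lambda x: sum(x) % 2
state, value = run_stream(6, [0, 4, 1, 4], parity)
assert state == [1, 1, 0, 0, 0, 0]  # x_5 was flipped twice, back to 0
```

Storing all n bits is the naive solution; the question on the previous slide is when far less memory suffices.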

  5. Linear sketching
     • Linear sketching is a useful primitive for streaming
     • Let f: F_2^n → {0,1}
     • f has a linear sketch of size k if it factors as f(x) = p(Lx) where:
       (i) L: F_2^n → F_2^k is a linear function
       (ii) p: F_2^k → {0,1} is a post-processing function
     • Equivalently, the "Fourier dimension" of f is k

  6. Linear sketching implies streaming
     • Assume f: F_2^n → {0,1} factors as f(x) = p(Lx) where:
       (i) L: F_2^n → F_2^k is a linear function
       (ii) p: F_2^k → {0,1} is a post-processing function
     • To compute f in the streaming model, maintain Lx ∈ F_2^k
     • Easy to maintain under updates x_i ← x_i ⊕ 1
     • Requires only k bits of memory
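In code, maintaining the k-bit sketch s = Lx under bit flips looks like this: an update x_i ← x_i ⊕ 1 simply XORs column i of L into s. The matrix L and the stream below are arbitrary illustrative choices, not from the talk:

```python
import random

n, k = 8, 2
rng = random.Random(0)
# L stored as n columns, each a k-bit vector.
L = [[rng.randint(0, 1) for _ in range(k)] for _ in range(n)]

s = [0] * k                # sketch of the all-zeros x
stream = [3, 5, 3, 0]      # indices of bit flips
for i in stream:
    for j in range(k):
        s[j] ^= L[i][j]    # s <- s XOR (column i of L)

# Sanity check: recompute Lx offline from the final x.
x = [0] * n
for i in stream:
    x[i] ^= 1
offline = [sum(L[i][j] * x[i] for i in range(n)) % 2 for j in range(k)]
assert s == offline
```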

  7. Randomized linear sketching
     • Randomization makes linear sketching more powerful
     • f: F_2^n → {0,1} has a randomized linear sketch of size k if it can be approximated by a distribution over linear sketches of size k
     • That is, if there exists a distribution over (L, p), where:
       (i) L: F_2^n → F_2^k is a linear function
       (ii) p: F_2^k → {0,1} is a post-processing function
       such that Pr_{L,p}[f(x) = p(Lx)] ≥ 1 − ε

  8. Randomized sketching gives additional power
     • Consider the OR function: OR(x_1, …, x_n) = x_1 ∨ ⋯ ∨ x_n
     • Deterministic sketching requires size n
     • Randomized sketching can be done in size O(log 1/ε) (random parities)
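The random-parity sketch for OR can be sketched as follows; t plays the role of O(log 1/ε), and all names are mine. The sketch keeps t parities of x over uniformly random subsets: if x = 0 they are all 0, and if x ≠ 0 each random parity equals 1 with probability 1/2, so the error is one-sided with probability 2^(-t):

```python
import random

def or_sketch(x, t, rng):
    """One-sided-error OR sketch: keep t random parities of x."""
    n = len(x)
    masks = [[rng.randint(0, 1) for _ in range(n)] for _ in range(t)]
    parities = [sum(r[i] & x[i] for i in range(n)) % 2 for r in masks]
    return 1 if any(parities) else 0

rng = random.Random(0)
assert or_sketch([0] * 10, 7, rng) == 0   # never errs when x = 0
hits = sum(or_sketch([0] * 9 + [1], 7, rng) for _ in range(200))
print(hits)  # near 200: each trial errs with probability 2^-7
```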

  9. Is linear sketching universal?
     • Linear sketching seems like a very useful primitive for streaming
     • Is it universal?
     • That is: given a streaming algorithm that computes f using k bits of memory, can we extract from it a linear sketch for f of size ≈ k?

  10. Universality of linear sketching

  11. Universality of linear sketching
     • Let f: F_2^n → {0,1}
     • Assume: a randomized streaming algorithm supporting N updates and using k bits of memory
     • Goal: extract a randomized linear sketch of size ≈ k
     • True if N ≥ 2^(2^(2n)) [Li-Nguyen-Woodruff '14, Ai-Hu-Li-Woodruff '16]
     • True if N = Ω(n) for random inputs [Kannan-Mossel-Sanyal-Yaroslavtsev '18]
     • True if N = Ω(n^2) [This work]

  12. Main theorem: streaming
     • Let f: F_2^n → {0,1}
     • Assume there exists a randomized streaming algorithm for f supporting N = Ω(n^2) updates which uses k bits of memory
     • Then there exists a randomized linear sketch for f of size O(k)

  13. Extensions (that I will not talk about)
     • Extends to approximate real-valued functions f: F_2^n → [0,1]
     • Extends to functions over other fields
     • Assuming only N = Ω(n) updates are supported, we can still extract a randomized linear sketch, but its size will be poly(k) instead of O(k)

  14. One-way communication complexity

  15. One-way communication complexity
     • Model a streaming algorithm as a one-way communication protocol
     • Break the N updates into M = N/n chunks of size n each
     • Setup: M players, holding inputs x_1, …, x_M ∈ F_2^n (x_i is the aggregate of the n updates in the i-th chunk)
     • Goal: compute f(x_1 + ⋯ + x_M)
     • Communication model: one-way

  16. One-way communication complexity
     • M players, holding inputs x_1, …, x_M ∈ F_2^n
     • Model: one-way communication with shared randomness
     • Goal: output f(x_1 + ⋯ + x_M) w.h.p. over the shared randomness
     • [Diagram: Player 1, …, Player M in a chain; Player i holds x_i ∈ F_2^n and forwards a message m_i ∈ {0,1}^k; Player M announces out ∈ {0,1}]
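One direction of this connection is immediate: a size-k linear sketch for f gives a one-way protocol with k-bit messages, since each player can add L·x_i (over F_2) to the incoming message, so the last player holds L(x_1 + ⋯ + x_M). A toy simulation with k = 1 and L(x) = parity(x), both illustrative choices of mine (f is then the parity of all coordinates):

```python
def player(message, x_i):
    # Forward message XOR L*x_i; here k = 1 and L(x) = parity(x).
    return (message + sum(x_i)) % 2

def protocol(inputs):
    m = 0                      # the k-bit message, initially 0
    for x_i in inputs:         # one-way: Player 1, 2, ..., M
        m = player(m, x_i)
    return m                   # out = p(m); p is the identity here

inputs = [[1, 0, 1], [1, 1, 0], [0, 0, 1]]
total = [sum(col) % 2 for col in zip(*inputs)]   # x_1 + x_2 + x_3 over F_2
assert protocol(inputs) == sum(total) % 2
```

The theorem on the next slide is about the hard converse: extracting a linear sketch from an arbitrary one-way protocol.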

  17. Main theorem: one-way communication
     • Let f: F_2^n → {0,1}
     • Assume there exists a one-way communication protocol for computing f(x_1 + ⋯ + x_M) for M = Ω(n) players with k-bit messages (recall: this corresponds to N = Mn = Ω(n^2) binary updates)
     • Then there exists a randomized linear sketch for f of size O(k)
     • For M = Ω(1) players, get a linear sketch of size poly(k)

  18. Proof

  19. Proof
     • The proof uses:
       1. Standard techniques in communication complexity
       2. Additive combinatorics

  20. Proof step 1: Yao's minimax principle
     • Let f: F_2^n → {0,1}
     • Fix a "hard distribution" μ over inputs
     • Goal: a linear sketch for f(x) where x ∼ μ
     • Embed the hard distribution into the M players:
       • The first M-1 players' inputs x_1, …, x_{M-1} are uniform in F_2^n
       • The last player's input x_M is set so that x_1 + ⋯ + x_M = x
     • Intuition: the protocol has no information on x until the last player

  21. Proof step 2: protocol structure
     • Target: x ∼ μ
     • Players' inputs: x_1, …, x_{M-1} ∈ F_2^n uniform, x_M = x_1 + ⋯ + x_{M-1} + x
     • We may assume the protocol is deterministic
     • Messages: m_1(x_1), m_2(m_1, x_2), m_3(m_1, m_2, x_3), …
     • Output: out(m_1, …, m_{M-1}, x_M)
     • With good probability, out = f(x_1 + ⋯ + x_M) = f(x)
     • Can fix the messages (of the first M-1 players) to "typical messages" without hurting the success probability too much

  22. Proof step 3: fixing to typical messages
     • Fix typical messages m_1*, m_2*, …, m_{M-1}*
     • These correspond to sets of inputs for the first M-1 players:
       • A_1 = {x_1 ∈ F_2^n : m_1(x_1) = m_1*}
       • A_2 = {x_2 ∈ F_2^n : m_2(m_1*, x_2) = m_2*}
       • …
     • The sets are big: if the protocol uses k bits, then |A_i| ≥ 2^(n-k)
     • After conditioning on x_1 ∈ A_1, …, x_{M-1} ∈ A_{M-1}, the protocol output is a function of only x_M = x_1 + ⋯ + x_{M-1} + x
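The "sets are big" step is a pigeonhole argument, easy to see in miniature: a k-bit message partitions F_2^n into at most 2^k preimages, so some message value m* has at least 2^(n-k) preimages. The message function below (two parity bits) is an arbitrary illustrative choice:

```python
from itertools import product
from collections import Counter

n, k = 6, 2

def m1(x):
    # First player's k-bit message: any function to {0,1}^k works here.
    return (sum(x[:3]) % 2, sum(x[3:]) % 2)

counts = Counter(m1(x) for x in product([0, 1], repeat=n))
m_star, a1_size = counts.most_common(1)[0]
assert a1_size >= 2 ** (n - k)   # |A_1| = |preimage of m*| >= 2^(n-k)
```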

  23. Proof step 4: mixing
     • Large sets A_1, …, A_{M-1} ⊂ F_2^n of density 2^(-k)
     • If we sample x_1 ∈ A_1, …, x_{M-1} ∈ A_{M-1} and x ∼ μ, then with high probability out(x_1 + ⋯ + x_{M-1} + x) = f(x)
     • Technical lemma: for M = Ω(n), the sum x_1 + ⋯ + x_{M-1} mixes in F_2^n
     • More precisely, there exists a subspace V ⊂ F_2^n of co-dimension O(k), such that the sum is near-invariant to a random shift from V
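The mixing phenomenon can be seen numerically in a toy experiment (small parameters and random sets, my own choices, not from the talk): sums of independent samples from large random subsets of F_2^n rapidly approach the uniform distribution. Adversarial sets can instead stall inside an affine subspace, which is why the lemma only promises mixing up to a subspace V of small co-dimension:

```python
import random

n, k = 6, 2                              # sets of density 2^-k in F_2^6
size = 2 ** n
rng = random.Random(1)

def convolve(p, q):
    # Distribution of X + Y over F_2^n (XOR convolution, done naively).
    r = [0.0] * size
    for a, pa in enumerate(p):
        if pa:
            for b, qb in enumerate(q):
                r[a ^ b] += pa * qb
    return r

def dist_to_uniform(p):
    return 0.5 * sum(abs(pa - 1.0 / size) for pa in p)

distances = []
p = None
for m in range(8):                       # 8 "players"
    A = rng.sample(range(size), size // 2 ** k)
    q = [1.0 / len(A) if a in A else 0.0 for a in range(size)]
    p = q if p is None else convolve(p, q)
    distances.append(dist_to_uniform(p))

print([round(d, 4) for d in distances])  # distances shrink toward 0
```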

  24. Proof step 5: extracting the linear sketch
     • We found a large subspace V of co-dimension O(k)
     • If we sample x_1 ∈ A_1, …, x_{M-1} ∈ A_{M-1}, x ∼ μ and v ∈ V, then with high probability out(x_1 + ⋯ + x_{M-1} + x + v) = f(x)
     • This allows us to "factor out" V from the output function and extract a linear sketch for f(x)

  25. Open problems

  26. Linear sketching for modular updates
     • For binary updates (or, more generally, modular updates), we prove that linear sketching is universal
     • Any streaming algorithm which supports N = Ω(n^2) updates implies a randomized linear sketch with similar guarantees
     • Open problem 1: can this be improved to N = Ω(n)?
     • [Kannan-Mossel-Sanyal-Yaroslavtsev '18] proved a partial result in this regime, giving a linear sketch for f on random inputs
     • Our results in this regime incur a polynomial loss in the sketch size

  27. Integer updates
     • Streaming is often considered in the integer case
     • Integer counters x_1, …, x_n
     • Updates x_i += 1 or x_i -= 1
     • Sketching corresponds to linear functions over the integers
     • The results of [Li-Nguyen-Woodruff '14, Ai-Hu-Li-Woodruff '16] work in this regime as well, but require assuming N ≥ 2^(2^(2n))
     • Open problem 2: can our techniques be imported to this regime?
     • Challenge: it is not clear what "mixing" should mean here
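Maintaining an integer linear sketch s = Ax under ±1 updates works the same way as in the binary case: an update to x_i adds or subtracts column i of A. The matrix A and the stream below are illustrative choices of mine:

```python
import random

n, k = 10, 3
rng = random.Random(7)
# A stored as n columns, each a vector of k small integer coefficients.
A = [[rng.randint(-2, 2) for _ in range(k)] for _ in range(n)]

s = [0] * k
x = [0] * n
for i, delta in [(2, +1), (7, +1), (2, -1), (9, +1), (9, +1)]:
    x[i] += delta
    for j in range(k):
        s[j] += delta * A[i][j]   # s += delta * (column i of A)

# Sanity check: the maintained sketch equals A x computed offline.
assert s == [sum(A[i][j] * x[i] for i in range(n)) for j in range(k)]
```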
