
Streaming Algorithms for Set Cover
Piotr Indyk
With: Sepideh Mahabadi, Ali Vakilian

Set Cover
• Input: a collection S of sets S_1, ..., S_m that covers U = {1, ..., n}
  – I.e., S_1 ∪ S_2 ∪ … ∪ S_m = U
• Output: a subset I of S such that:
  – I covers U
  – |I| is minimized
• Classic optimization problem:
  – NP-hard
  – Greedy ln(n)-approximation algorithm
  – Can't do better unless P=NP (or something like that)
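The greedy ln(n)-approximation mentioned above can be sketched in a few lines of Python. This is a minimal offline illustration, not code from the talk: the function name `greedy_set_cover` and the toy instance are assumptions made for the example.

```python
def greedy_set_cover(universe, sets):
    """Return indices of the sets picked by the greedy rule.

    At every step, pick the set that covers the most still-uncovered
    elements (ties go to the lowest index).
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(range(len(sets)), key=lambda i: len(sets[i] & uncovered))
        if not sets[best] & uncovered:
            raise ValueError("the collection does not cover the universe")
        chosen.append(best)
        uncovered -= sets[best]
    return chosen


# Hypothetical toy instance with n = 6 elements and m = 4 sets.
universe = range(1, 7)
sets = [{1, 2, 3, 4}, {1, 2, 5}, {3, 4, 6}, {5, 6}]
cover = greedy_set_cover(universe, sets)
```

On this instance the first pick is the 4-element set and the second is {5, 6}, so the greedy cover has size 2. Note that this version holds the whole input in memory, which is exactly what the streaming model tries to avoid.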

Streaming Set Cover [SG09]
• Model
  – Sequential access to S_1, S_2, ..., S_m
  – One (or few) passes, sublinear (i.e., o(mn)) storage
  – (Hopefully) decent approximation factor
• Why?
  – A classic optimization problem (see previous slide)
  – Several "big data" uses
  – One of few NP-hard problems studied in streaming
    • Other examples: max-cut, sub-modular opt, FPT

The "Big Table"

Result     Approximation   Passes         Space         R/D
Greedy     ln(n)           1              O(mn)         D
Greedy     ln(n)           n              O(n)          D
[SG09]     O(log n)        O(log n)       O(n log n)    D
[ER14]     O(n^(1/2))      1              Õ(n)          D
[DIMV14]   O(4^(1/δ) ρ)    O(4^(1/δ))     Õ(mn^δ)       R
[CW]       n^δ/δ           1/δ − 1        Θ̃(n)          D
[Nis02]    log(n)/2        O(log n)       Ω(m)          R
[DIMV14]   O(1)            O(log n)       Ω(mn)         D
[IMV]      O(ρ/δ)          O(1/δ)         Õ(mn^δ)       R
[IMV]      1               1/(2δ) − 1     Ω̃(mn^δ)       R
[IMV]      1               1/(2δ) − 1     Ω̃(ms)         R
[IMV]      3/2             1              Ω(mn)         R

A few observations: algorithms

Result     Approximation   Passes         Space         R/D
Greedy     ln(n)           1              O(mn)         D
Greedy     ln(n)           n              O(n)          D
[SG09]     O(log n)        O(log n)       O(n log n)    D
[ER14]     O(n^(1/2))      1              Õ(n)          D
[DIMV14]   O(4^(1/δ) ρ)    O(4^(1/δ))     Õ(mn^δ)       R
[CW]       n^δ/δ           1/δ − 1        Θ̃(n)          D
[IMV]      O(ρ/δ)          O(1/δ)         Õ(mn^δ)       R

• Most of the algorithms are deterministic
• All of the algorithms are "clean"
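The second Greedy row (ln(n) approximation, n passes, O(n) space) can be illustrated with a multi-pass variant that never stores the input. This is a sketch under stated assumptions: `stream` is a hypothetical callable that replays the sets in order as `(index, set)` pairs, and the only state kept between sets is the uncovered elements plus the best candidate of the current pass.

```python
def multipass_greedy(n, stream):
    """Greedy set cover with one pass over the stream per chosen set.

    Working memory: the uncovered elements and the best gain seen in
    the current pass, i.e. O(n) elements in total.
    """
    uncovered = set(range(1, n + 1))
    chosen = []
    passes = 0
    while uncovered:
        passes += 1
        best_idx, best_gain = None, set()
        for idx, s in stream():          # one sequential pass
            gain = uncovered & s
            if len(gain) > len(best_gain):
                best_idx, best_gain = idx, gain
        if best_idx is None:
            raise ValueError("the stream does not cover the universe")
        chosen.append(best_idx)
        uncovered -= best_gain
    return chosen, passes


# Hypothetical toy stream; a real stream would be read from disk or network.
toy_sets = [{1, 2, 3, 4}, {1, 2, 5}, {3, 4, 6}, {5, 6}]
picked, num_passes = multipass_greedy(6, lambda: enumerate(toy_sets))
```

Each pass covers at least one new element, so at most n passes are needed; the trade-off against the one-pass O(mn)-space Greedy row is passes versus memory.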

A few observations: lower bounds

Result     Approximation   Passes         Space         R/D
[Nis02]    log(n)/2        O(log n)       Ω(m)          R
[DIMV14]   O(1)            O(log n)       Ω(mn)         D
[CW]       n^δ/δ           1/δ − 1        Θ̃(n)          D
[IMV]      1               1/(2δ) − 1     Ω̃(mn^δ)       R
[IMV]      3/2             1              Ω(mn)         R

Algorithm

Result     Approximation   Passes         Space         R/D
[IMV]      O(ρ/δ)          O(1/δ)         Õ(mn^δ)       R

• Approach: "dimensionality reduction"
  – Covers all but a 1/n^δ fraction of elements using ρ·k sets (k = min cover size)
  – Uses Õ(mn^δ) space
  – Two passes
• Repeat O(1/δ) times:
  – O(1/δ) passes
  – O(ρ/δ) approximation

Dimensionality reduction:
• Covers all but a 1/n^δ fraction of elements
• Uses mn^δ space
• Two passes

• Suppose we know k = min cover size
• Pass 1:
  – For each set S_i, select S_i if it covers Ω(n/k) elements
  – Compute V = set of elements not covered by selected sets
  – Fact: each not-selected set covers O(n/k) elements in V
• Select a set R of k·n^δ·log m random elements from V
• Pass 2:
  – Store all sets projected on R
  – Compute a ρ-approximate set cover I'
  – Fact [DIMV14, KMVV13]: I' covers all but a 1/n^δ fraction of V
• Report sets found in Pass 1 and Pass 2
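The two passes above can be sketched as follows. This is an illustrative reading of the slide, not the authors' implementation, and it makes several assumptions: the constant hidden in Ω(n/k) is taken to be 1, "covers" in Pass 1 is read as "covers still-uncovered elements", plain greedy stands in for the ρ-approximate offline solver, and `stream` is a hypothetical callable that replays the sets as `(index, set)` pairs.

```python
import math
import random


def greedy_on_projection(target, projected):
    """Offline greedy cover of `target` using the stored projections."""
    uncovered = set(target)
    chosen = []
    while uncovered:
        best = max(projected, key=lambda i: len(projected[i] & uncovered))
        if not projected[best] & uncovered:
            raise ValueError("projected sets do not cover the sample")
        chosen.append(best)
        uncovered -= projected[best]
    return chosen


def two_pass_partial_cover(n, stream, k, delta, rng):
    # Pass 1: take every set that covers at least n/k uncovered elements.
    uncovered = set(range(1, n + 1))
    heavy, m = [], 0
    for idx, s in stream():
        m += 1
        if len(s & uncovered) >= n / k:   # assumed constant 1 in Omega(n/k)
            heavy.append(idx)
            uncovered -= s
    v = uncovered                          # elements still uncovered

    # Sample R: about k * n^delta * log m random elements of V.
    size = min(len(v), math.ceil(k * n ** delta * math.log(max(m, 2))))
    r = set(rng.sample(sorted(v), size))

    # Pass 2: store only the projection of each set onto R.
    projected = {}
    for idx, s in stream():
        proj = s & r
        if proj:
            projected[idx] = proj

    # Solve the small projected instance offline (greedy as a stand-in).
    light = greedy_on_projection(r, projected)
    return heavy, light


# Hypothetical toy run: n = 6, guessed cover size k = 2, delta = 0.5.
toy_sets = [{1, 2, 3, 4}, {1, 2, 5}, {3, 4, 6}, {5, 6}]
heavy, light = two_pass_partial_cover(
    6, lambda: enumerate(toy_sets), k=2, delta=0.5, rng=random.Random(0)
)
```

On an instance this small the sample R is all of V, so the reported sets cover everything; the "all but a 1/n^δ fraction" guarantee only becomes interesting when R is much smaller than V.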

Dimensionality reduction: space accounting
(bracketed annotations give the space cost of each step)
• Suppose we know k = min cover size [× log n]
• Pass 1: [n]
  – For each set S_i, select S_i if it covers Ω(n/k) elements
  – Compute V = set of elements not covered by selected sets
  – Fact: each not-selected set covers O(n/k) elements in V
• Select a set R of k·n^δ·log m random elements from V
• Pass 2:
  – Store all sets projected on R [m·(n/k)·|R|/n = m·n^δ·log m]
  – Compute a ρ-approximate set cover I'
  – Fact [DIMV14, KMVV13]: I' covers all but a 1/n^δ fraction of V
• Report sets found in Pass 1 and Pass 2
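The Pass 2 space bound can be written out term by term. The labels under the braces are an interpretation of the slide's expression rather than text from the talk: each of the m sets has O(n/k) elements in V, and each such element survives the sampling with probability roughly |R|/n, with |R| = k·n^δ·log m.

```latex
\[
\underbrace{m}_{\text{sets}}
\cdot
\underbrace{\frac{n}{k}}_{\text{elements of } V \text{ per set}}
\cdot
\underbrace{\frac{|R|}{n}}_{\text{sampling rate}}
\;=\;
m \cdot \frac{n}{k} \cdot \frac{k\, n^{\delta} \log m}{n}
\;=\;
m\, n^{\delta} \log m
\]
```

The factors n and k cancel, which is why the stored projections fit in Õ(mn^δ) space regardless of the cover size k.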

Lower bound: single pass

Result     Approximation   Passes         Space         R/D
[IMV]      3/2             1              Ω(mn)         R

• Have seen that O(1) passes can reduce space requirements
• What can(not) be done in one pass?
• We show that distinguishing between k=2 and k=3 requires Ω(mn) space

Proof idea
• Two sets cover U iff their complements are disjoint
• Consider the following one-way communication complexity problem:
  – Alice: sets S_1, ..., S_m
  – Bob: set S
  – Question: is S disjoint from one of the S_i's?
• Lemma: the randomized one-way c.c. of this problem is Ω(mn) if error prob. is 1/poly(m)
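The first bullet is easy to check mechanically. A minimal sketch, with the function names chosen here for illustration: `two_sets_cover` tests coverage directly and `complements_disjoint` tests the equivalent condition on complements.

```python
from itertools import combinations


def two_sets_cover(universe, a, b):
    """True iff a and b together cover the universe."""
    return set(universe) <= (a | b)


def complements_disjoint(universe, a, b):
    """True iff the complements of a and b (within the universe) are disjoint."""
    u = set(universe)
    return (u - a).isdisjoint(u - b)


# Exhaustive check on a small universe: both tests agree on every pair of subsets.
universe = {1, 2, 3, 4}
subsets = [set(c) for r in range(5) for c in combinations(sorted(universe), r)]
agree = all(
    two_sets_cover(universe, a, b) == complements_disjoint(universe, a, b)
    for a in subsets
    for b in subsets
)
```

This equivalence is what lets a question about covers of size 2 be phrased as a set-disjointness question between Alice and Bob.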

Proof idea ctd.
• Lemma: the one-way c.c. of this problem is Ω(mn) if error prob. is 1/poly(m).
• Proof:
  – Suppose the S_i's are selected uniformly at random
  – We show that there exist poly(m) sets S such that if Bob learns the answers to all of them, he can recover all S_i's with high probability

Proof idea ctd.
• Bob's queries:
  – poly(m) random "seed" queries of size c log m for some constant c>0
  – For each seed query S, all "extension" queries of the form S ∪ {i}
• Recovery procedure
  – Suppose that a seed S is disjoint from exactly one S_i (we do not know which one)
    • Call it a "good seed" for S_i
  – Then extension queries recover the complement of S_i
• poly(m) queries suffice to generate a good seed for each S_i
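The recovery step can be simulated directly. In this sketch the oracle answers Bob's question ("is the query disjoint from one of Alice's sets?") exactly, whereas in the proof the answers come from Alice's message; the function names and the toy sets are assumptions for illustration, and the seed is assumed to be good.

```python
def make_oracle(alice_sets):
    """Answer: is the query set disjoint from at least one of Alice's sets?"""
    def is_disjoint_from_some(query):
        return any(query.isdisjoint(s) for s in alice_sets)
    return is_disjoint_from_some


def recover_complement(n, good_seed, oracle):
    """Given a seed disjoint from exactly one S_i, recover the complement of S_i.

    The extension good_seed ∪ {j} is still disjoint from S_i iff j is not in
    S_i, and it cannot be disjoint from any other set because the seed
    already intersects all of them.
    """
    return {j for j in range(1, n + 1) if oracle(good_seed | {j})}


# Hypothetical instance over U = {1, ..., 6}.
alice_sets = [{1, 2, 3}, {2, 4, 5}, {1, 5, 6}]
oracle = make_oracle(alice_sets)
seed = {4, 5}            # disjoint from {1, 2, 3} only, so a good seed for it
recovered = recover_complement(6, seed, oracle)
```

Here the extension queries return the set {4, 5, 6}, the complement of {1, 2, 3}. Since Bob can rebuild every S_i this way, Alice's message must carry about mn bits, which is the source of the Ω(mn) bound.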

Lower bound: multipass

Result     Approximation   Passes         Space         R/D
[IMV]      1               1/(2δ) − 1     Ω̃(mn^δ)       R
[IMV]      1               1/(2δ) − 1     Ω̃(ms)         R

• Reduction from Intersection Set Chasing [Guruswami-Onak'13]
• Very "brittle", hence works only for the exact problem

Conclusions

Result     Approximation   Passes         Space         R/D
Greedy     ln(n)           1              O(mn)         D
Greedy     ln(n)           n              O(n)          D
[SG09]     O(log n)        O(log n)       O(n log n)    D
[ER14]     O(n^(1/2))      1              Õ(n)          D
[DIMV14]   O(4^(1/δ) ρ)    O(4^(1/δ))     Õ(mn^δ)       R
[CW]       n^δ/δ           1/δ − 1        Θ̃(n)          D
[Nis02]    log(n)/2        O(log n)       Ω(m)          R
[DIMV14]   O(1)            O(log n)       Ω(mn)         D
[IMV]      O(ρ/δ)          O(1/δ)         Õ(mn^δ)       R
[IMV]      1               1/(2δ) − 1     Ω̃(mn^δ)       R
[IMV]      1               1/(2δ) − 1     Ω̃(ms)         R
[IMV]      3/2             1              Ω(mn)         R
