
Streaming Set Cover
Amit Chakrabarti, Dartmouth College
Joint work with A. Wirth
Sublinear Algorithms Workshop, JHU, Jan 2016

Combinatorial Optimisation Problems
- 1950s, 60s: Operations research
- 1970s, 80s: NP-hardness
- 1990s, 2000s: Approximation algorithms, hardness of approximation
- 2010s: Space-constrained settings, e.g., streaming

Set Cover


Set Cover with Sets Streamed
- Input: stream of m sets, each ⊆ [n]
- Goal: cover universe [n] using as few sets as possible
  • Use sublinear (in m) space
  • Ideally O(n polylog n) ... "semi-streaming"
  • Need Ω(n log n) space to certify: for each item, who covered it?
Think m ≥ n


Background and Related Work
Offline results:
- Best possible poly-time approx: (1 ± o(1)) ln n [Johnson'74] [Slavík'96] [Lund-Yannakakis'94] [Dinur-Steurer'14]
- Simple greedy strategy gets ln n-approx:
  • Repeatedly add set with highest contribution
  • Contribution := number of new elements covered
Streaming results:
- One pass semi-streaming: O(√n) approx
- This is best possible in one semi-streaming pass [Emek-Rosén'14]
- O(log n) semi-streaming passes allow O(log n) approx [Saha-Getoor'09] [Cormode-Karloff-Wirth'10]
- There's more: wait till the end! [Nisan'02] [Demaine-Indyk-Mahabadi-Vakilian'14] [Indyk-M-V'16]
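The simple greedy strategy described above can be sketched in a few lines of Python. This is only an illustrative offline version: the function name `greedy_set_cover`, the 0-based set indices, and the toy instance are assumptions made for the example, not part of the talk.

```python
def greedy_set_cover(sets, n):
    """Offline greedy: repeatedly add the set with the highest contribution.

    `sets` is a list of Python sets over the universe {1, ..., n}.
    Returns the (0-based) indices of the chosen sets, in the order chosen.
    """
    uncovered = set(range(1, n + 1))
    chosen = []
    while uncovered:
        # Contribution := number of new (still uncovered) elements covered.
        best = max(range(len(sets)), key=lambda i: len(sets[i] & uncovered))
        gain = sets[best] & uncovered
        if not gain:
            raise ValueError("the given sets do not cover the universe")
        chosen.append(best)
        uncovered -= gain
    return chosen


# Hypothetical toy instance over the universe {1, ..., 6}.
sets = [{1, 2, 3, 4}, {1, 2}, {3, 4, 5}, {5, 6}, {6}]
cover = greedy_set_cover(sets, 6)
print(cover)
```

Note that this version rescans every set each time it picks one, which is exactly what a space-constrained streaming algorithm cannot afford to do with few passes.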

Related Work: In Greater Detail
Algorithms using p passes, S space, giving α-approximation
Upper bounds:
- p = 1, S = Õ(n), α = O(√n) [Emek-Rosén'14]
- p = O(log n), S = Õ(n), α = O(log n) [Cormode-Karloff-Wirth'10]
- S = Õ(m n^(1/Ω(log p))), α = O(p) [Demaine-Indyk-Mahabadi-Vakilian'14]
- S = Õ(m n^(1/Ω(p))), α = O(p) [Indyk-Mahabadi-Vakilian'16]
Lower bounds:
- p = 1, S = Õ(n) ⇒ α = Ω(n^(1/2 − δ)) [Emek-Rosén'14]
- α < (1/2) log_2 n ⇒ S = Ω(m) [Nisan'02]
- α = O(1), deterministic ⇒ S = Ω(mn) [Demaine-I-M-V'14]
- α = 1 ⇒ S = Ω̃(n^(1 + 1/(2(p+1)))) [Indyk-Mahabadi-Vakilian'16]
- p = 1, α = 3/2 ⇒ S = Ω(mn) [Indyk-Mahabadi-Vakilian'16]

Our Results
Upper bound
- With p passes, semi-streaming space, get O(n^(1/(p+1)))-approx
- Algorithm giving this approx based on very simple heuristic
- Deterministic
Lower bound
- Randomised
- In p passes, semi-streaming space, need Ω(n^(1/(p+1)) / p^2) approx
- Upper bound tight for all constant p
- Semi-streaming O(log n) approx requires Ω(log n / log log n) passes

Progressive Greedy Algorithm
Recall simple greedy:
- Repeatedly add set with highest contribution
- Contribution := number of new elements covered
Progressive greedy:
- In first pass, add all sets with contribution ≥ n^(1 − 1/p)
- In second pass, add all sets with contribution ≥ n^(1 − 2/p)
- ...
- In p-th pass, add all sets with contribution ≥ 1

Progressive Greedy Algorithm

procedure GreedyPass(stream σ, threshold τ, set Sol, array Coverer)
    for each set S_i in σ do
        C ← {x : Coverer[x] ≠ 0}          ▷ the already covered elements
        if |S_i \ C| ≥ τ then             ▷ set's contribution ≥ threshold
            Sol ← Sol ∪ {i}
            for each x ∈ S_i \ C do Coverer[x] ← i

procedure ProgGreedyNaive(stream σ, integer n, integer p ≥ 1)
    Coverer[1...n] ← 0^n; Sol ← ∅
    for j = 1 to p do GreedyPass(σ, n^(1 − j/p), Sol, Coverer)
    output Sol, Coverer
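A minimal Python sketch of the pseudocode above, for experimentation. It assumes the stream is modelled as a zero-argument callable that yields `(i, S_i)` pairs with 1-based set ids (so that 0 can mean "uncovered", as in the Coverer array); that modelling choice, the function names, and the toy instance are assumptions of this example rather than anything specified in the talk.

```python
def greedy_pass(stream, threshold, sol, coverer):
    """One pass: admit every set whose contribution meets the threshold."""
    for i, s in stream():
        # Elements of S_i that are not yet covered (S_i \ C).
        new = [x for x in s if coverer[x] == 0]
        if len(new) >= threshold:
            sol.append(i)
            for x in new:
                coverer[x] = i


def prog_greedy_naive(stream, n, p):
    """Naive progressive greedy: p passes with thresholds n^(1 - j/p)."""
    coverer = [0] * (n + 1)  # index 0 unused; value 0 means "uncovered"
    sol = []
    for j in range(1, p + 1):
        # Floating-point threshold; exact for this toy instance, but a
        # careful implementation would guard against rounding error.
        greedy_pass(stream, n ** (1 - j / p), sol, coverer)
    return sol, coverer


# Hypothetical instance: universe {1, ..., 16}, six sets, p = 2 passes.
SETS = {
    1: {1, 2},
    2: set(range(1, 9)),
    3: {9, 10, 11, 12},
    4: {13, 14},
    5: {15, 16},
    6: set(range(9, 15)),
}

def stream():
    """Replayable stream: each call starts a fresh pass over the sets."""
    return iter(SETS.items())

sol, coverer = prog_greedy_naive(stream, 16, 2)
print(sol)
```

On this instance the first pass (threshold 4) admits sets 2 and 3, and the second pass (threshold 1) admits sets 4 and 5 to finish the cover. Only `sol` and `coverer` persist between passes, which is the O(n log n)-bit state the semi-streaming model allows.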


Progressive Greedy: Analysis Idea
Consider p = 2 passes
- First pass: admit sets iff contribution ≥ √n
- Thus, first pass adds at most √n sets to Sol
- Second pass: Opt covers remaining items with sets of contrib ≤ √n
- Thus, Sol will cover the same using ≤ √n |Opt| sets
But wait, this uses two passes for O(√n) approx!
- Logic of last pass especially simple: add set if positive contrib
- Can fold this into previous one
Final result: p passes, O(n^(1/(p+1)))-approx
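The two-pass argument extends to the naive p-pass algorithm (before the last pass is folded in) roughly as follows. This is a sketch of the counting argument, not the authors' exact proof; the symbols τ_j, U_j and A_j are notation introduced here for the threshold, the uncovered elements at the start of pass j, and the sets admitted in pass j.

```latex
\begin{align*}
\tau_j &= n^{1 - j/p}, \qquad j = 1, \dots, p, \\
|A_1| &\le \frac{n}{\tau_1} = n^{1/p}
  && \text{(admitted sets cover disjoint groups of $\ge \tau_1$ new elements)}, \\
|U_j| &\le |\mathrm{Opt}| \cdot \tau_{j-1}
  && \text{(after pass $j-1$, every set has contribution $< \tau_{j-1}$)}, \\
|A_j| &\le \frac{|U_j|}{\tau_j} \le |\mathrm{Opt}| \cdot n^{1/p}
  && \text{for } j \ge 2, \\
|\mathrm{Sol}| &= \sum_{j=1}^{p} |A_j| \le p \, n^{1/p} \, |\mathrm{Opt}|.
\end{align*}
```

For p = 2 this gives |Sol| ≤ 2√n |Opt|, in line with the O(√n) figure above. The improvement to O(n^(1/(p+1))) with p passes comes from folding the final pass into the one before it, as the slide indicates.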


Lower Bound Idea: One Pass
Reduce from INDEX: Alice gets x ∈ {0,1}^n, Bob gets j ∈ [n], Alice talks to Bob, who must determine x_j. Requires Ω(n)-bit message. [Ablayev'96]
Universe: F_q^2, so n = q^2
[Figure: Alice's sets and Bob's set in F_q^2]
If Alice has Bob's missing line, then |Opt| = 2, else |Opt| ≥ q
So Θ(√n) approx requires Ω(#lines) = Ω(q^2) = Ω(n) space
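The gap between |Opt| = 2 and |Opt| ≥ q rests on a geometric fact: two distinct lines of F_q^2 meet in at most one point, so covering a missing line without using that line itself takes at least q other lines. The Python sketch below checks this for a small prime q. It assumes, as the figure labels suggest, that Alice's sets are lines of F_q^2 and Bob's set is the plane minus one line; the helper `affine_lines` and the choice q = 5 are assumptions of this example.

```python
from itertools import combinations


def affine_lines(q):
    """All q*q + q lines of the affine plane F_q^2, for a prime q."""
    lines = []
    # Non-vertical lines y = a*x + b.
    for a in range(q):
        for b in range(q):
            lines.append(frozenset((x, (a * x + b) % q) for x in range(q)))
    # Vertical lines x = c.
    for c in range(q):
        lines.append(frozenset((c, y) for y in range(q)))
    return lines


q = 5  # assumed small prime, for illustration only
lines = affine_lines(q)

# Any two distinct lines share at most one point.
max_overlap = max(len(l1 & l2) for l1, l2 in combinations(lines, 2))

# Take one line as "Bob's missing line": every other line covers at most
# max_overlap of its q points, so at least q // max_overlap lines are needed.
missing = lines[0]
lower_bound = len(missing) // max_overlap
print(len(lines), max_overlap, lower_bound)
```

With Θ(q^2) lines available, each of Alice's bits can be encoded by the presence or absence of one line, which is where the Ω(q^2) = Ω(n) space bound comes from.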


Next Steps
Goal: p semi-streaming passes require Ω(n^(1/(p+1))) approx
- Handle more passes
  • Can't start from INDEX, need harder communication problem
- Increase space bound
  • Need ω(n) to rule out semi-streaming

Tree Pointer Jumping
Multiplayer game TPJ_{p+1,t} defined on complete (p+1)-level t-ary tree
- Pointer to child at each internal level-i node (known to Player i)
- Bit at each leaf node (known to Player 1)
- Goal: output (whp) bit reached by following pointers from root
Model: p rounds of communication
Each round: player 1, player 2, ..., player p+1
[Figure: three-level tree (Level 3, Level 2, Level 1) with leaf bits 1 0 0 1 1 1 0 0 1]
Theorem: Longest message is Ω(t/p^2) bits [C.-Cormode-McGregor'08]
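To make the function being computed concrete, here is a small Python sketch that evaluates a tree pointer jumping instance by following pointers from the root down to a leaf. The array layout (one pointer list per internal level, root first, nodes numbered left to right) and the pointer values are assumptions of this example; only the nine leaf bits are taken from the slide's figure.

```python
def tpj_value(pointers, leaf_bits, t):
    """Follow child pointers from the root and return the leaf bit reached.

    pointers[k][v] is the child index (0..t-1) chosen at node v of the k-th
    internal level, counting the root level as k = 0. Nodes on each level
    are numbered left to right, so child c of node v has number v*t + c.
    """
    node = 0
    for level in pointers:
        node = node * t + level[node]
    return leaf_bits[node]


# Three-level ternary tree (p + 1 = 3, t = 3) with the slide's leaf bits.
leaf_bits = [1, 0, 0, 1, 1, 1, 0, 0, 1]
level2_pointers = [0, 2, 1]  # hypothetical pointers at the three middle nodes

# Hypothetical root pointers: 2 leads to leaf 7, while 1 leads to leaf 5.
answer_a = tpj_value([[2], level2_pointers], leaf_bits, 3)
answer_b = tpj_value([[1], level2_pointers], leaf_bits, 3)
print(answer_a, answer_b)
```

Evaluating the function is trivial when one party sees everything. The difficulty in the communication game is that each level's data is held by a different player and the players speak in an order that works against following the pointers top-down.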


Multi-Pass Set Cover: First Attempt
Two passes, reducing from TPJ_{3,t}, using universe F_q^3 (so n = q^3)
- Three players: Alice, Bob, Carol
  • Alice encodes leaf bits: lines in F_q^3
  • Bob encodes lower pointers: planes in F_q^3 with a line deleted
  • Carol encodes root pointer: F_q^3 with a plane deleted
- (Carol set) ∪ (corresp. Bob set) = F_q^3 \ (a line)
- If Alice has the missing line, then |Opt| = 3, else |Opt| ≥ q (*)
How good is this?
- Each pointer encoded by Bob can choose from only as many leaves as there are lines in a specific plane ⟹ t = Θ(q^2) = Θ(n^(2/3))
