
Amit Chakrabarti, Dartmouth College. WAPMDS, IIT Kanpur, Dec 2009



  1. Multi-Pass Lower Bounds, Dec 20, 2009
     Multi-pass Data Stream Lower Bounds via Round Elimination
     Amit Chakrabarti, Dartmouth College
     WAPMDS, IIT Kanpur, Dec 2009

  2. Lower Bounds Paradigms
     Algorithm design:
     Lower bounds:

  3. Lower Bounds Paradigms
     Algorithm design: divide & conquer, greedy, dynamic programming, LP relaxation, ...
     Lower bounds: ???

  4. Lower Bounds Paradigms
     Algorithm design: divide & conquer, greedy, dynamic programming, LP relaxation, ...
     Lower bounds: ???
     • Information complexity paradigm [C.-Shi-Wirth-Yao’01]
     • Round elimination paradigm [Miltersen-Nisan-Safra-Wigderson’95]

  5. Multi-Pass Lower Bounds
     Data streams: two broad application scenarios
     • Networks: Busy router, packets whizzing by
       – Web traffic statistics
       – Intrusion detection
     • Databases: Huge DB, linear scan cheaper than random access
       – Query optimisation: join size estimation
       – Log analysis

  6. Multi-Pass Lower Bounds
     Data streams: two broad application scenarios
     • Networks: Busy router, packets whizzing by
       – Web traffic statistics
       – Intrusion detection
     • Databases: Huge DB, linear scan cheaper than random access
       – Query optimisation: join size estimation
       – Log analysis
     • DB setting: Multiple passes meaningful
     This talk: Pass/space tradeoffs for some basic stream problems

  7. Data Stream Model
     • Formally: input stream = $n$ tokens, each token $\in [m]$
       – Assume $\log m = \Theta(\log n)$
     • Compute some function of stream, using
       – Small space, $s \ll m, n$ ... ideally, $s = O(\log n)$
       – Small number of passes, $p$

  8. Problems of Interest
     Class A:
     • Median
     Class B:
     • Distinct elements
     • Frequency moments
     • Empirical entropy

  9. Problems of Interest
     Class A:
     • Median
     Class B:
     • Distinct elements, $F_0$
     • Frequency moments, $F_k = \sum_{i=1}^{m} \mathrm{freq}(i)^k$
     • Empirical entropy, $H = \sum_{i=1}^{m} (\mathrm{freq}(i)/m) \cdot \log(m/\mathrm{freq}(i))$

  10. Problems of Interest
      Class A:
      • Median
      • Key question: Want $s = O(\log n)$; then $p = ??$
        – Dates back to first “data streams” paper [Munro-Paterson’78]
      Class B:
      • Distinct elements, $F_0$
      • Frequency moments, $F_k = \sum_{i=1}^{m} \mathrm{freq}(i)^k$
      • Empirical entropy, $H = \sum_{i=1}^{m} (\mathrm{freq}(i)/m) \cdot \log(m/\mathrm{freq}(i))$

  11. Problems of Interest
      Class A:
      • Median
      • Key question: Want $s = O(\log n)$; then $p = ??$
        – Dates back to first “data streams” paper [Munro-Paterson’78]
      Class B:
      • Distinct elements, $F_0$
      • Frequency moments, $F_k = \sum_{i=1}^{m} \mathrm{freq}(i)^k$
      • Empirical entropy, $H = \sum_{i=1}^{m} (\mathrm{freq}(i)/m) \cdot \log(m/\mathrm{freq}(i))$
      • Key question: Want $\varepsilon$-approx; then $s = ??$
        – One-pass: $\tilde{O}(\varepsilon^{-2})$, $\tilde{\Omega}(\varepsilon^{-2})$ [BarYossef-J.-K.-S.-T.’02]; [Woodruff’04]
        – Dependence of $s$ on $n$: [A-M-S’96]; [C.-Khot-Sun’03]; [Gronemeier’09]
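For reference, the Class B quantities can be computed exactly when space is not a concern. The sketch below is only a brute-force baseline that stores the whole frequency table (exactly what a small-space streaming algorithm cannot afford); it is not an algorithm from the talk. The function name `stream_statistics`, the base-2 logarithm, and the choice to normalise the entropy by the total number of tokens are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def stream_statistics(stream, k=2):
    """Exact F_0, F_k and empirical entropy of a token stream.

    Brute-force baseline: keeps one counter per distinct token,
    so it uses far more than O(log n) space.
    """
    freq = Counter(stream)          # freq[i] = number of occurrences of token i
    length = sum(freq.values())     # total number of tokens (assumed normaliser for H)
    f0 = len(freq)                  # number of distinct elements
    fk = sum(c ** k for c in freq.values())
    # Tokens with frequency 0 contribute nothing, so only seen tokens are summed.
    entropy = sum((c / length) * log2(length / c) for c in freq.values())
    return f0, fk, entropy


# Frequencies here are 2, 2 and 4 over a stream of 8 tokens.
f0, f2, h = stream_statistics([1, 2, 2, 3, 3, 3, 3, 1], k=2)
print(f0, f2, h)
```

An ε-approximation algorithm of the kind discussed in the key question would aim to return values within a (1 ± ε) factor of these exact ones.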

  12. Our Results (Answering the Key Questions)
      Class A: Median [C.-Cormode-McGregor’08]
      • Achieving $s = O(\log n)$ requires $p = \Omega(\log n)$
      • If tokens randomly ordered, requires $p = \Omega(\log \log n)$
      • Above lower bounds are tight [Guha-McGregor’07]

  13. Our Results (Answering the Key Questions)
      Class A: Median [C.-Cormode-McGregor’08]
      • Achieving $s = O(\log n)$ requires $p = \Omega(\log n)$
      • If tokens randomly ordered, requires $p = \Omega(\log \log n)$
        – Specifically: $s \approx \Omega(n^{1/p})$ $\left[\Omega(n^{2^{-p}})\right]$ for adversarial [random] order
      • Above lower bounds are tight [Guha-McGregor’07]

  14. Our Results (Answering the Key Questions)
      Class A: Median [C.-Cormode-McGregor’08]
      • Achieving $s = O(\log n)$ requires $p = \Omega(\log n)$
      • If tokens randomly ordered, requires $p = \Omega(\log \log n)$
        – Specifically: $s \approx \Omega(n^{1/p})$ $\left[\Omega(n^{2^{-p}})\right]$ for adversarial [random] order
      • Above lower bounds are tight [Guha-McGregor’07]
      Class B: Distinct elements [Brody-C.’09]
      • Need $s = \Omega(1/\varepsilon^2)$ space for any $p = O(1)$
        – Specifically: $s = \tilde{\Omega}(1/(\varepsilon^2 p^2))$ [Brody-C.-Regev-Vidick-deWolf’10]
      • Holds under random order, and even random data
      • Matching upper bound, even with one pass and adversarial data

  15. Method: Reduce from Communication Complexity
      [Figure: an input stream of 32 tokens: 14 22 9 4 12 32 17 10 1 11 29 28 2 7 25 31 3 18 5 23 30 8 6 27 20 26 16 19 21 15 24 13]
      $p$-pass streaming algorithm $\Longrightarrow$ $\Theta(p)$-round communication protocol
      messages = memory contents of streaming algorithm

  16. Communication vs Data Stream
      [Figure: the 32-token input stream 14 22 9 4 ... 24 13, split amongst many players: Alice, Bob, Carl]
      $p$-pass streaming algorithm $\Longrightarrow$ $\Theta(p)$-round communication protocol
      messages = memory contents of streaming algorithm

  17. Communication vs Data Stream
      [Figure: the 32-token input stream 14 22 9 4 ... 24 13, split amongst many players: Alice, Bob, Carl]
      take special case input + interpret combinatorially
      [Figure: the resulting 32-bit string: 1 1 0 1 0 0 0 1 0 1 1 0 1 1 0 0 1 1 0 1 1 1 0 0 1 0 0 1 0 1 0 0]
      $p$-pass streaming algorithm $\Longrightarrow$ $\Theta(p)$-round communication protocol
      messages = memory contents of streaming algorithm
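The simulation behind "messages = memory contents of streaming algorithm" can be sketched as follows. This is only an illustration under stated assumptions: the toy two-pass algorithm (find the maximum, then count its occurrences), the state layout, and the names `step`, `end_pass`, `run_streaming` and `run_protocol` are all hypothetical and not taken from the talk. Each player runs the algorithm on its own segment and hands the memory state to the next player.

```python
def step(state, token):
    """One update of a toy 2-pass algorithm.

    Pass 0 tracks the maximum token; pass 1 counts occurrences of that maximum.
    """
    pass_no, best, count = state
    if pass_no == 0:
        return (pass_no, token if best is None else max(best, token), count)
    return (pass_no, best, count + (token == best))


def end_pass(state):
    """Advance the pass counter at the end of a pass."""
    pass_no, best, count = state
    return (pass_no + 1, best, count)


def run_streaming(stream, passes):
    """Run the algorithm directly over the whole stream."""
    state = (0, None, 0)
    for _ in range(passes):
        for token in stream:
            state = step(state, token)
        state = end_pass(state)
    return state


def run_protocol(segments, passes):
    """Simulate the algorithm with one player per consecutive segment.

    Every hand-off of the memory state to the next player is one message.
    """
    state = (0, None, 0)
    messages = 0
    for p in range(passes):
        for i, segment in enumerate(segments):
            for token in segment:
                state = step(state, token)
            is_final = (p == passes - 1 and i == len(segments) - 1)
            if not is_final:
                messages += 1   # send memory contents to the next player
        state = end_pass(state)
    return state, messages


stream = [14, 22, 9, 4, 12, 32, 17, 10, 1, 11, 29, 28, 2, 7, 25, 31,
          3, 18, 5, 23, 30, 8, 6, 27, 20, 26, 16, 19, 21, 15, 24, 13]
alice, bob, carl = stream[:11], stream[11:22], stream[22:]
print(run_protocol([alice, bob, carl], passes=2))
print(run_streaming(stream, passes=2))
```

In this sketch a protocol with k players and p passes sends p·k − 1 messages, each no longer than the algorithm's memory, so a communication lower bound translates into a space lower bound.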

  18. The Round Elimination Paradigm
      If there exists...
      [Figure: a 3-round protocol among players A, B, C, D on a given input; msg1 is passed in Round 1, msg2 in Round 2, msg3 in Round 3]
      with short messages, then there exists...
      [Figure: a protocol among A, B, C, D with only Round 2 (msg2) and Round 3 (msg3); labels: Input, Padding]

  19. The Round Elimination Paradigm
      If there exists...
      [Figure: a 3-round protocol among players A, B, C, D on a given input; msg1 is passed in Round 1, msg2 in Round 2, msg3 in Round 3]
      with short messages, then there exists...
      [Figure: a protocol among A, B, C, D with only Round 2 (msg2) and Round 3 (msg3); labels: Input, Padding]
      Eventually, if original protocol too short, then 0-round protocol for a nontrivial problem $\Longrightarrow$ Contradiction

  20. Class A: Median

  21. Tree Pointer Jumping
      Complete $k$-level $t$-ary tree $T$
      Input $\phi : V(T) \to [t]$ with $\phi(\text{leaf}) \in \{0, 1\}$
      Player $i$ knows $\phi$ at level $i$
      $g_\phi(v) := \begin{cases} \phi(v)\text{-th child of } v, & \text{if } v \text{ internal} \\ \phi(v), & \text{if } v \text{ leaf} \end{cases}$
      Desired output $= g_\phi(g_\phi(\cdots g_\phi(\text{root}) \cdots))$
      Model: $k - 1$ rounds of communication
      Each round: (Plr 1, Plr 2, ..., Plr $k$)
      Call this $\mathrm{tpj}_{k,t}$
      [Figure: a 3-level ternary tree with levels labelled Level 1, Level 2, Level 3 and leaf values 1 0 0 1 1 1 0 0 1]

  22. Weight-Based TPJ
      Theorem: For uniform random input, $\tfrac{1}{3}$-error, $\mathrm{CC}_p(\mathrm{tpj}_{p+1,t}) = \Omega(t/p^2)$
      Contrast: $D_p(\mathrm{tpj}_{p+1,t}) = O(t)$ and $D_{p+1}(\mathrm{tpj}_{p+1,t}) = O(p \log t)$
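The function computed in tree pointer jumping can be evaluated centrally in a few lines, which helps make the definition of $g_\phi$ concrete. The sketch below ignores the communication model entirely. It assumes a representation chosen for illustration: nodes are tuples of child indices from the root, and child indices are 0-based rather than drawn from $[t]$. The names `tpj` and `random_instance` are hypothetical, and the internal pointers in the example are made up; only the nine leaf bits follow the slide's figure.

```python
import random
from itertools import product


def tpj(phi, k):
    """Evaluate tree pointer jumping on a complete k-level tree.

    phi maps each node (a tuple of child indices; the root is ()) to a
    child index for internal nodes, or to a bit for leaves.
    """
    v = ()                       # start at the root
    while len(v) < k - 1:        # internal nodes have depth 0 .. k-2
        v = v + (phi[v],)        # g_phi(v) = phi(v)-th child of v
    return phi[v]                # at a leaf, g_phi(v) = phi(v)


def random_instance(k, t, rng):
    """Uniform random input: random pointers inside, random bits at the leaves."""
    phi = {}
    for depth in range(k):
        for v in product(range(t), repeat=depth):
            phi[v] = rng.randrange(2) if depth == k - 1 else rng.randrange(t)
    return phi


# 3-level ternary tree; leaf bits read left to right are 1 0 0 1 1 1 0 0 1.
leaf_bits = [1, 0, 0, 1, 1, 1, 0, 0, 1]
phi = {(): 1, (0,): 2, (1,): 0, (2,): 1}   # hypothetical internal pointers
for index, bit in enumerate(leaf_bits):
    phi[(index // 3, index % 3)] = bit

print(tpj(phi, k=3))   # root -> child 1 -> its child 0 -> leaf bit
print(tpj(random_instance(4, 3, random.Random(0)), k=4))
```

The difficulty captured by the theorem is that player $i$ sees only the level-$i$ part of `phi`, so following the path from the root takes one player at a time unless long messages are sent.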
