Communication Complexity in the Field: New Questions from Practice

Qin Zhang, Indiana University Bloomington. BIRS Workshop, March 20, 2017.


  1. Communication Complexity in the Field: New Questions from Practice. Qin Zhang, Indiana University Bloomington. BIRS Workshop, March 20, 2017.

  2. This talk. Not on a particular problem. I try to present a few new questions that I have encountered when trying to apply comm. complexity in various settings.

  3. Agenda. I will talk about:
     1. Number-in-hand CC with input sharing – Distributed computation of graph problems
     2. Primitive problems overlap; direct-sum does not apply – Distributed joins
     3. Higher LB in simultaneous comm. than one-way comm.? – Sketching edit distance

  4. Distributed graph computation. Real-world systems: Pregel, Giraph, GPS, GraphLab, etc.

  5. The coordinator model. We have $k$ machines (sites) $S_1, \ldots, S_k$ and one central server (coordinator) $C$.
     – Each site has a 2-way comm. channel with the coordinator.
     – Each site $S_i$ has a piece of data $x_i$.
     – Task: compute $f(x_1, \ldots, x_k)$ together via comm., for some $f$. The coordinator outputs the answer.
     – Goal: minimize total communication.
     [Figure: coordinator $C$ connected to sites $S_1, S_2, S_3, \ldots, S_k$.]

  11. Distributed graph computation. Let’s think about the graph connectivity problem: $k$ sites each hold a portion of a graph. Goal: compute whether the graph is connected.
     – A trivial solution: each $S_i$ sends a local spanning forest to $C$. Cost: $O(kn \log n)$ bits, where $n$ is the number of nodes of the graph.
     – Can we do better, e.g., $o(kn)$ bits of comm. in total?
     – If the graph is edge partitioned among the $k$ sites: $\Omega(kn)$ [Woodruff, Z. ’13].
     [Figure: coordinator $C$ connected to sites $S_1, S_2, S_3, \ldots, S_k$.]
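
The trivial solution can be sketched as follows. This is a minimal illustration, assuming nodes are numbered 0 to n-1 and each site's input is a list of edges; the names `UnionFind`, `local_spanning_forest`, and `coordinator_is_connected` are made up for this sketch and do not come from the talk. Each forest has at most n-1 edges and each edge takes about 2 log n bits, which is where the O(kn log n) bound comes from.

```python
from typing import Iterable

Edge = tuple[int, int]


class UnionFind:
    """Disjoint-set structure over nodes 0..n-1."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> bool:
        """Merge the components of a and b; return True if they were distinct."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True


def local_spanning_forest(n: int, edges: Iterable[Edge]) -> list[Edge]:
    """What a site sends: at most n-1 edges spanning its local components."""
    uf = UnionFind(n)
    return [(u, v) for u, v in edges if uf.union(u, v)]


def coordinator_is_connected(n: int, site_edges: list[list[Edge]]) -> bool:
    """Coordinator merges the k received forests and counts components."""
    uf = UnionFind(n)
    components = n
    for edges in site_edges:
        for u, v in local_spanning_forest(n, edges):
            if uf.union(u, v):
                components -= 1
    return components == 1
```

For example, with n = 4, the two sites `[(0, 1), (1, 2), (0, 2)]` and `[(2, 3)]` together hold a connected graph, and the first site only needs to send two of its three edges.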

  13. LB graph for edge partition. For each $i \in [k]$, $(X_i, Y) \sim \mu$, where $\mu$ is a hard input distribution for set-disjointness.
     – Each site $S_i$, holding $X_i = \{X_{i,1}, \ldots, X_{i,n}\}$, creates an edge $(u_i, v_j)$ for each $X_{i,j} = 1$.
     – The coordinator, holding $Y = \{Y_1, \ldots, Y_n\}$, creates a path containing $\{v_j \mid Y_j = 1\}$ and a path containing $\{v_j \mid Y_j = 0\}$.
     – Graph connected $\Leftrightarrow$ $\mathrm{DISJ}(X_1, Y) \vee \ldots \vee \mathrm{DISJ}(X_k, Y) = 1$ (LB: $\Omega(kn)$).
     [Figure: top row, a path $v_{j_1}, v_{j_2}, \ldots, v_{j_{|Y|}}$ through the nodes with $Y_j = 1$ and a path $v_{j_{|Y|+1}}, v_{j_{|Y|+2}}, v_{j_{|Y|+3}}, \ldots, v_{j_n}$ through the nodes with $Y_j = 0$; bottom row, nodes $u_1, u_2, u_3, \ldots, u_k$ labeled $(X_1), (X_2), (X_3), \ldots, (X_k)$.]
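
A small sketch of this construction, useful for checking the equivalence on toy inputs. It rests on assumptions not stated on the slide: DISJ(X_i, Y) = 1 is read as "X_i and Y intersect", Y is nonempty, and every X_i contains at least one element outside Y (so each u_i is attached to the Y_j = 0 path). The node numbering (v_j is node j, u_i is node n + i) and all function names are hypothetical.

```python
from collections import deque


def build_lb_graph(n, xs, y):
    """Return (number of nodes, edge list) for sets xs[i] and y over range(n)."""
    edges = []
    # Site i: edge (u_i, v_j) for every j in X_i.
    for i, x in enumerate(xs):
        edges += [(n + i, j) for j in sorted(x)]
    # Coordinator: one path through {v_j : j in Y}, one through the rest.
    ones = [j for j in range(n) if j in y]
    zeros = [j for j in range(n) if j not in y]
    for path in (ones, zeros):
        edges += list(zip(path, path[1:]))
    return n + len(xs), edges


def is_connected(num_nodes, edges):
    """Breadth-first search from node 0; True if every node is reached."""
    adj = [[] for _ in range(num_nodes)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    seen = {0}
    queue = deque([0])
    while queue:
        a = queue.popleft()
        for b in adj[a]:
            if b not in seen:
                seen.add(b)
                queue.append(b)
    return len(seen) == num_nodes


def some_pair_intersects(xs, y):
    """OR over i of 'X_i intersects Y'."""
    return any(x & y for x in xs)
```

Under the stated assumptions, every u_i hangs off the Y_j = 0 path, so the graph is connected exactly when some u_i also touches the Y_j = 1 path.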

  17. What if the graph is node partitioned? In most practical systems, the graph is node partitioned. Can we prove a similar LB?
     – In the LB graph for edge partition (graph connected $\Leftrightarrow$ $\mathrm{DISJ}(X_1, Y) \vee \ldots \vee \mathrm{DISJ}(X_k, Y) = 1$), basically only the bottom nodes $u_1, \ldots, u_k$ (and their adjacent edges) are partitioned.
     – If we also partition the top nodes (and their adjacent edges), then the $\Omega(kn)$ LB does not hold.
     – Not a surprise: if a graph is node partitioned, $\tilde{O}(n)$ suffices [Ahn, Guha, McGregor ’12].

  19. Input sharing. To prove LB in the node partition model, one needs to deal with input sharing: each edge may be stored in two sites. Need new techniques?
     A concrete problem: Breadth First Search Tree. Given a node $u$, the parties want to jointly compute a BFS tree rooted at $u$. The coordinator outputs the final BFS tree. What is the comm. complexity?

  20. Distributed joins

  21. Set-intersection join. $A_1, \ldots, A_m \subseteq [n] = \{1, 2, \ldots, n\}$, and $B_1, \ldots, B_m \subseteq [n]$.
     [Figure: matrix $A$ with rows $A_1, \ldots, A_m$ (e.g., skills of applicants) and matrix $B$ with columns $B_1, \ldots, B_m$ (e.g., skills required by job positions).]
     Set-Intersection Join (cardinality version): $\mathrm{SIJ}(A, B) = |\{(i, j) \mid C_{i,j} > 0, \text{ where } C = A \cdot B\}|$.
     An important operation in databases.
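
To make the definition concrete, here is a brute-force sketch that treats each A_i and B_j as a Python set, so that C[i][j] is the size of their intersection. It is only a centralized reference for what SIJ counts, not a communication protocol, and the function names are illustrative.

```python
def intersection_matrix(a_sets, b_sets):
    """C[i][j] = |A_i ∩ B_j|, i.e. the product of the two 0/1 incidence matrices."""
    return [[len(a & b) for b in b_sets] for a in a_sets]


def sij(a_sets, b_sets):
    """Number of pairs (i, j) with C[i][j] > 0."""
    c = intersection_matrix(a_sets, b_sets)
    return sum(1 for row in c for entry in row if entry > 0)


# Toy instance: applicants' skills vs. skills required by positions.
applicants = [{1, 2}, {3}]
positions = [{2, 3}, {4}]
print(sij(applicants, positions))  # 2: both applicants match the first position only
```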

  24. Set-intersection join (cont.) The problem: estimate $\mathrm{SIJ}(A, B)$ up to a $(1 + \epsilon)$ factor. Useful e.g. in query planning.
     Current LB: $\Omega(n/\epsilon^{2/3})$ (Van Gucht, Williams, Woodruff, Z. ’15)
     – For each $i \in [m]$, choose $(A_i, B_i) \sim \mu$, where $\mu$ is a hard input distribution for set-disjointness.
     – Define $\mathrm{SUM}(A, B) = \sum_{i \in [m]} \mathrm{DISJ}(A_i, B_i)$. W.h.p. $\mathrm{SIJ}(A, B) = \mathrm{SUM}(A, B) + m(m - 1)$.
     – Using basically a direct-sum (Gap-hamming + DISJ), any rand. algo. that computes $\mathrm{SUM}(A, B)$ w.pr. 0.99 up to an additive error $\sqrt{m}/2$ needs $\Omega(mn)$ comm.
     – Set $m = 1/\epsilon^{2/3}$ to get the $\Omega(n/\epsilon^{2/3})$ LB.
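
The choice of $m$ can be checked with a short calculation. This is a sketch that assumes the additive error on the slide reads $\sqrt{m}/2$ (the usual Gap-Hamming scale) and ignores constant factors in the final step.

```latex
\begin{align*}
\mathrm{SIJ}(A,B) &= \mathrm{SUM}(A,B) + m(m-1) \le m^2, \\
\text{additive error of a } (1+\epsilon)\text{-approximation} &\le \epsilon \cdot \mathrm{SIJ}(A,B) \le \epsilon m^2, \\
\epsilon m^2 \le \tfrac{\sqrt{m}}{2} &\iff m \le (2\epsilon)^{-2/3}, \\
\Omega(mn) \text{ with } m = \Theta(\epsilon^{-2/3}) &= \Omega(n/\epsilon^{2/3}).
\end{align*}
```

So for $m$ of order $\epsilon^{-2/3}$, a $(1+\epsilon)$-approximation of SIJ, after subtracting $m(m-1)$, already determines SUM within the additive error that the direct-sum argument shows to be expensive.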

  25. Set-intersection join (cont.) The current best UB: $\tilde{O}(m/\epsilon^2)$ using an $F_0$-sketch, and it is one-way. Can we prove an $\Omega(n/\epsilon^2)$ LB?
     Not enough to apply a direct-sum type argument on $(A_1, B_1), \ldots, (A_m, B_m)$, since each $A_i$ is going to join each $B_j$. In other words, the primitive problems overlap. Need new techniques?

  26. Sketching threshold edit distance

  30. Edit Distance. Definition: given two strings $s, t \in \Sigma^n$, $\mathrm{ed}(s, t)$ = minimum number of character operations (insertion/deletion/substitution) that transform $s$ to $t$. Example: ed(banana, ananas) = 2.
     Applications: numerous. E.g., bioinformatics (measuring similarity between DNA seq.), automatic spelling correction.
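
For reference, the definition corresponds to the textbook dynamic program below. This is a standard centralized computation, not something from the talk, and the function name is illustrative.

```python
def edit_distance(s: str, t: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning s into t."""
    # prev[j] holds the distance between the processed prefix of s and t[:j].
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        cur = [i]  # turning s[:i] into the empty string takes i deletions
        for j, ct in enumerate(t, start=1):
            cur.append(min(
                prev[j] + 1,               # delete cs
                cur[j - 1] + 1,            # insert ct
                prev[j - 1] + (cs != ct),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


print(edit_distance("banana", "ananas"))  # 2: delete the leading 'b', append 's'
```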

  32. Problems. The threshold version of ED: given two strings $s, t \in \{0, 1\}^n$ and a threshold $K$, output all the edits if $\mathrm{ed}(s, t) \le K$; output “Error” otherwise.
     Document exchange (one-way comm.): the party holding $s$ sends a sketch $\mathrm{sk}(s)$ to the party holding $t$. App: remote file sync; file transmission through a noisy channel.
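
The required output can be pinned down with a centralized reference implementation. This sketch sees both strings, so it says nothing about sketch size or communication; it only shows what "output all the edits or Error" means. The edit tuple format `(operation, position in s, character)` and the function name are assumptions made for this illustration.

```python
def threshold_edits(s: str, t: str, k: int):
    """Return one optimal list of edits turning s into t if ed(s, t) <= k, else "Error"."""
    n, m = len(s), len(t)
    if abs(n - m) > k:
        return "Error"  # the length difference alone already needs more than k edits
    # d[i][j] = edit distance between s[:i] and t[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (s[i - 1] != t[j - 1]))
    if d[n][m] > k:
        return "Error"
    # Trace back from (n, m) to (0, 0), recording each non-matching step.
    edits = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and s[i - 1] == t[j - 1] and d[i][j] == d[i - 1][j - 1]:
            i, j = i - 1, j - 1  # characters match, no edit
        elif i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + 1:
            edits.append(("substitute", i - 1, t[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            edits.append(("delete", i - 1, s[i - 1]))
            i -= 1
        else:
            edits.append(("insert", i, t[j - 1]))
            j -= 1
    edits.reverse()
    return edits
```

For instance, `threshold_edits("0110", "0100", 1)` reports a single substitution at position 2, while `threshold_edits("0000", "1111", 2)` returns `"Error"` because four edits are needed.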
