sublinear algorithms for big data
Sublinear Algorithms for Big Data, by Qin Zhang. Part 2: Sublinear in Communication


  1. Sublinear Algorithms for Big Data. Qin Zhang

  2. Part 2: Sublinear in Communication

  3. Sublinear in communication. The model: $k$ sites hold inputs $x_1, x_2, \ldots, x_k$, for example $x_1 = 010011$, $x_2 = 111011$, $x_3 = 111111$, ..., $x_k = 100011$. They want to jointly compute $f(x_1, x_2, \ldots, x_k)$. Goal: minimize total bits of communication. Applications etc.

  4. A natural approach. The model: sites $S_1, S_2, S_3, \ldots, S_k$ hold inputs $x_1 = 010011$, $x_2 = 111011$, $x_3 = 111111$, ..., $x_k = 100011$ and each talks to a coordinator $C$. They want to jointly compute $f(x_1, x_2, \ldots, x_k)$. Goal: minimize total bits of communication. The natural approach: each $S_i$ computes a sketch $sk(S_i)$ of its input and sends it to $C$, and then $C$ computes $f(x_1, \ldots, x_k)$ based on $sk(S_1), \ldots, sk(S_k)$. The slides from the next page on are borrowed from Andrew McGregor.
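
A minimal sketch of the natural approach, under assumptions the slide does not make: the function is taken to be $f = \|x_1 + \cdots + x_k\|_2^2$, and the sketch is a random $\pm 1$ projection whose squared entries are averaged (an AMS-style estimator, picked here only for illustration). The names `sign_matrix`, `sketch` and `coordinator` are hypothetical. The point is that sketches are linear, so the coordinator can add them without ever seeing the inputs.

```python
import random


def sign_matrix(rows, n, seed):
    """Random +/-1 matrix; every site derives the same matrix from a shared seed."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(rows)]


def sketch(matrix, bits):
    """Site side: sk(S_i) = matrix * x_i, a short vector of integers."""
    x = [int(b) for b in bits]
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]


def coordinator(sketches):
    """Coordinator side: add the sketches, then average the squared entries."""
    total = [sum(col) for col in zip(*sketches)]
    return total, sum(t * t for t in total) / len(total)


inputs = ["010011", "111011", "111111", "100011"]  # x_1, x_2, x_3, x_k from the slide
M = sign_matrix(rows=4, n=6, seed=0)               # toy sizes; rows << n in practice
total, estimate = coordinator([sketch(M, x) for x in inputs])
```

Here `total` equals the sketch of the summed input vector exactly, and `estimate` is an unbiased (but, with only 4 rows, noisy) estimate of its squared norm.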

  5. Outline: I. Connectivity, II. k-Connectivity, III. Min-Cut

  6. I. Connectivity. Theorem (testing connectivity): a) Dynamic graph stream: $O(n \,\mathrm{polylog}\, n)$ space. b) Simultaneous messages: $O(\mathrm{polylog}\, n)$ length.

  17. Ingredient 1: Basic Algorithm. Algorithm (Spanning Forest):
      1. For each node: pick an incident edge.
      2. For each connected component: pick an incident edge.
      3. Repeat until there are no edges between connected components.
      Lemma: After $O(\log n)$ rounds, the selected edges include a spanning forest.
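
The three steps above can be sketched as a Borůvka-style loop. This is an illustrative version, assuming nodes are numbered `0..n-1` and the graph is given as an edge list; the function name `spanning_forest` and the union-find bookkeeping are choices made here, not taken from the slides.

```python
def spanning_forest(n, edges):
    """Return (forest_edges, rounds) for a graph on nodes 0..n-1."""
    parent = list(range(n))

    def find(u):
        # Union-find root lookup with path halving.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    forest, rounds = [], 0
    while True:
        # Steps 1-2: every current component picks one edge that leaves it.
        pick = {}
        for u, v in edges:
            ru, rv = find(u), find(v)
            if ru != rv:
                pick.setdefault(ru, (u, v))
                pick.setdefault(rv, (u, v))
        # Step 3: stop when no edge joins two different components.
        if not pick:
            return forest, rounds
        rounds += 1
        for u, v in pick.values():
            ru, rv = find(u), find(v)
            if ru != rv:          # skip edges that would now close a cycle
                parent[ru] = rv
                forest.append((u, v))
```

Every component that still has an outgoing edge merges with at least one other component in each round, so the number of such components at least halves per round, which is where the $O(\log n)$ bound comes from.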

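  23. Ingredient 2: Sketching Neighborhoods. For node $i$, let $a_i$ be a vector indexed by node pairs. Non-zero entries, one per neighbor $j$ of $i$: $a_i[i,j] = 1$ if $j > i$ and $a_i[i,j] = -1$ if $j < i$. Example on a 5-node graph (figure omitted), with coordinates ordered $\{1,2\}, \{1,3\}, \{1,4\}, \{1,5\}, \{2,3\}, \{2,4\}, \{2,5\}, \{3,4\}, \{3,5\}, \{4,5\}$:
      $a_1 = (1, 1, 0, 0, 0, 0, 0, 0, 0, 0)$
      $a_2 = (-1, 0, 0, 0, 1, 0, 0, 0, 0, 0)$
      $a_1 + a_2 = (0, 1, 0, 0, 1, 0, 0, 0, 0, 0)$
      Lemma: For any subset of nodes $S \subset V$, $\mathrm{support}\left(\sum_{i \in S} a_i\right) = E(S, V \setminus S)$.
      Lemma: There exists a random $M: \mathbb{R}^N \to \mathbb{R}^k$ with $k = O(\mathrm{polylog}\, N)$ such that for any $a \in \mathbb{R}^N$, with high probability $Ma \longrightarrow e \in \mathrm{support}(a)$, that is, some $e \in \mathrm{support}(a)$ can be recovered from $Ma$.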
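
The first lemma can be checked by brute force on a small graph. The sketch below assumes a hypothetical 5-node edge list `EDGES` (the slide's full graph is not recoverable from the text) and uses illustrative helper names. Edges inside $S$ get $+1$ from one endpoint and $-1$ from the other, so they cancel in the sum and only cut edges remain.

```python
from itertools import combinations

# Hypothetical example graph on nodes 1..5 (an assumption, not the slide's figure).
EDGES = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]


def node_vector(i, nodes, edges):
    """Signed incidence vector a_i, stored as {pair: value} over all node pairs."""
    vec = dict.fromkeys(combinations(sorted(nodes), 2), 0)
    for u, v in edges:
        u, v = min(u, v), max(u, v)
        if i == u:
            vec[(u, v)] = 1      # the other endpoint is larger than i
        elif i == v:
            vec[(u, v)] = -1     # the other endpoint is smaller than i
    return vec


def support_of_sum(S, nodes, edges):
    """Support of the sum of a_i over i in S."""
    total = {}
    for i in S:
        for pair, val in node_vector(i, nodes, edges).items():
            total[pair] = total.get(pair, 0) + val
    return {pair for pair, val in total.items() if val != 0}


def cut_edges(S, edges):
    """E(S, V \\ S): edges with exactly one endpoint in S."""
    return {(min(u, v), max(u, v)) for u, v in edges if (u in S) != (v in S)}
```

For example, with $S = \{1, 2\}$ the edge $\{1,2\}$ cancels and the support is $\{\{1,3\}, \{2,3\}\}$, which is exactly the cut.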

  31. Recipe: Sketch & Compute on Sketches.
      Sketch: each player $j$ sends $M a_j$.
      Central player runs the algorithm in sketch space:
      - Use $M a_j$ to get an incident edge on each node $j$.
      - For $i = 2$ to $\log n$: to get an incident edge on component $S \subset V$, use
        $\sum_{j \in S} M a_j = M\left(\sum_{j \in S} a_j\right) \longrightarrow e \in \mathrm{support}\left(\sum_{j \in S} a_j\right) = E(S, V \setminus S)$.
      Detail: actually each player sends $\log n$ independent sketches $M_1 a_j, M_2 a_j, \ldots$ and the central player uses $M_i a_j$ when emulating the $i$-th iteration of the algorithm.
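
The recipe can be emulated end to end with a toy sampler. This is a sketch under loud assumptions: `L0Sketch` is a simplified stand-in for the random map $M$ (one 1-sparse-recovery cell per subsampling level, with a fingerprint check), not the construction behind the lemma, and it can return `None` with constant probability where a real sampler would use repetitions. All class and function names are hypothetical.

```python
import random

P = (1 << 61) - 1  # Mersenne prime used as the fingerprint modulus


class L0Sketch:
    """Toy linear sketch: sketches sharing a seed can be added cell by cell."""

    def __init__(self, seed, levels=16):
        self.seed, self.levels = seed, levels
        self.r = random.Random(f"r:{seed}").randrange(2, P)
        # One cell per level: [sum of values, sum of value*index, fingerprint].
        self.cells = [[0, 0, 0] for _ in range(levels)]

    def _depth(self, idx):
        # idx is kept at levels 0..depth; it reaches level l with probability 2^-l.
        bits = random.Random(f"h:{self.seed}:{idx}").getrandbits(self.levels)
        depth = 0
        while depth < self.levels - 1 and (bits >> depth) & 1:
            depth += 1
        return depth

    def update(self, idx, delta):
        for cell in self.cells[: self._depth(idx) + 1]:
            cell[0] += delta
            cell[1] += delta * idx
            cell[2] = (cell[2] + delta * pow(self.r, idx, P)) % P

    def __add__(self, other):
        out = L0Sketch(self.seed, self.levels)
        out.cells = [[a[0] + b[0], a[1] + b[1], (a[2] + b[2]) % P]
                     for a, b in zip(self.cells, other.cells)]
        return out

    def sample(self):
        """Return an index in the support, or None if no level is 1-sparse."""
        for s0, s1, s2 in self.cells:
            if s0 != 0 and s1 % s0 == 0:
                idx = s1 // s0
                if idx >= 0 and s2 == (s0 * pow(self.r, idx, P)) % P:
                    return idx
        return None


def node_sketch(j, n, edges, seed):
    """Player j's message M a_j; edge {u, v} with u < v has index u*n + v."""
    sk = L0Sketch(seed)
    for u, v in edges:
        u, v = min(u, v), max(u, v)
        if j == u:
            sk.update(u * n + v, +1)
        elif j == v:
            sk.update(u * n + v, -1)
    return sk


def player_messages(n, edges, rounds):
    """One independent sketch per round (seed = round number) for every node."""
    return [[node_sketch(j, n, edges, seed=t) for j in range(n)]
            for t in range(rounds)]


def spanning_forest_from_sketches(n, messages):
    """Central player: emulate the basic algorithm using only the sketches."""
    comp = list(range(n))                      # comp[j] = label of j's component
    forest = []
    for round_sketches in messages:
        groups = {}
        for j in range(n):
            groups.setdefault(comp[j], []).append(j)
        picked = set()
        for members in groups.values():
            total = round_sketches[members[0]]
            for j in members[1:]:
                total = total + round_sketches[j]   # sum of M a_j = M(sum of a_j)
            idx = total.sample()
            if idx is not None:
                picked.add(divmod(idx, n))          # decode index back to (u, v)
        for u, v in picked:
            if comp[u] != comp[v]:
                old, new = comp[u], comp[v]
                comp = [new if c == old else c for c in comp]
                forest.append((u, v))
    return forest
```

Using a fresh seed per round mirrors the "Detail" on the slide: the partition in round $i$ depends on the earlier sketches, so reusing one sketch would break the independence the sampler's guarantee relies on.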

  33. II. k-Connectivity. Theorem (checking that every cut has size $\geq k$): a) Dynamic graph stream: $O(nk \,\mathrm{polylog}\, n)$ space. b) Simultaneous messages: $O(k \,\mathrm{polylog}\, n)$ length.

  38. Ingredient 1: Basic Algorithm. Algorithm (k-Connectivity):
      1. Let $F_1$ be a spanning forest of $G(V, E)$.
      2. For $i = 2$ to $k$:
         2.1. Let $F_i$ be a spanning forest of $G(V, E - F_1 - \cdots - F_{i-1})$.
      Lemma: $G(V, F_1 + \cdots + F_k)$ is $k$-connected iff $G(V, E)$ is.
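
A small offline sketch of the forest-peeling algorithm, assuming a simple graph on nodes `0..n-1` and reading "$k$-connected" as "every cut has at least $k$ edges", as in the theorem. The brute-force `min_cut` is only there to check the lemma on toy inputs; all function names are illustrative.

```python
from itertools import combinations


def spanning_forest(n, edges):
    """Greedy spanning forest via union-find."""
    parent = list(range(n))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    forest = []
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            forest.append((u, v))
    return forest


def k_forests(n, edges, k):
    """F_1, ..., F_k, where F_i is a spanning forest of G minus F_1, ..., F_{i-1}."""
    remaining, forests = list(edges), []
    for _ in range(k):
        forest = spanning_forest(n, remaining)
        forests.append(forest)
        chosen = set(forest)
        remaining = [e for e in remaining if e not in chosen]
    return forests


def min_cut(n, edges):
    """Brute-force minimum cut size; exponential in n, for toy graphs only."""
    return min(
        sum((u in S) != (v in S) for u, v in edges)
        for size in range(1, n // 2 + 1)
        for S in map(set, combinations(range(n), size))
    )


def is_k_connected(n, edges, k):
    """Test k-connectivity on the union of the k forests instead of on G."""
    union = [e for forest in k_forests(n, edges, k) for e in forest]
    return min_cut(n, union) >= k
```

The union has at most $k(n-1)$ edges, so the check runs on a sparse certificate rather than on all of $E$.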

  44. Ingredient 2: Connectivity Sketches.
      Sketch: simultaneously construct $k$ independent connectivity sketches $\{M_1 G, M_2 G, \ldots, M_k G\}$.
      Run the algorithm in sketch space:
      - Use $M_1 G$ to find a spanning forest $F_1$ of $G$.
      - Use $M_2 G - M_2 F_1 = M_2 (G - F_1)$ to find $F_2$.
      - Use $M_3 G - M_3 F_1 - M_3 F_2 = M_3 (G - F_1 - F_2)$ to find $F_3$.
      - etc.
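
Written for a general step $i$, the pattern above is a single linearity identity. The display below is a restatement under the slides' own notation, with the message-length count following from the $O(\mathrm{polylog}\, n)$ length of one connectivity sketch stated in the connectivity theorem.

```latex
% F_1, ..., F_{i-1} are already known to the central player,
% so each M_i F_j can be computed locally and subtracted.
M_i G - \sum_{j=1}^{i-1} M_i F_j
  \;=\; M_i \Bigl( G - \sum_{j=1}^{i-1} F_j \Bigr),
  \qquad i = 2, \ldots, k.

% Message length: k independent sketches of O(polylog n) each
% give O(k polylog n) per player.
```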

  46. III. Min-Cut. Theorem ($(1+\epsilon)$-approximating the minimum cut): a) Dynamic graph stream: $O(\epsilon^{-2} n \,\mathrm{polylog}\, n)$ space. b) Simultaneous messages: $O(\epsilon^{-2} \,\mathrm{polylog}\, n)$ length.

  50. Ingredient 1: Subsampling. Lemma (Karger): Define subgraph $G_i$ by sampling each edge with probability $2^{-i}$ (so $G = G_0$). Then $\text{Min-Cut}(G) = (1 \pm \epsilon) \cdot 2^i \cdot \text{Min-Cut}(G_i)$ if $i < -\log p^*$, where $p^* = 6 \epsilon^{-2} \log n / \text{Min-Cut}(G)$.
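
A toy sketch of the subsampling estimator. Assumptions made here and not stated on the slide: $\log n$ in $p^*$ is taken as the natural logarithm, $-\log p^*$ is taken base 2 to match the $2^{-i}$ sampling rate, and the minimum cut is found by brute force, so this only runs on very small graphs. Function names are hypothetical.

```python
import math
import random
from itertools import combinations


def min_cut(n, edges):
    """Brute-force minimum cut size; exponential in n, for toy graphs only."""
    return min(
        sum((u in S) != (v in S) for u, v in edges)
        for size in range(1, n // 2 + 1)
        for S in map(set, combinations(range(n), size))
    )


def subsample(edges, i, rng):
    """G_i: keep each edge independently with probability 2^-i (G_0 = G)."""
    return [e for e in edges if rng.random() < 2.0 ** -i]


def p_star(n, eps, lam):
    """p* = 6 eps^-2 log n / Min-Cut(G); natural log is an assumption here."""
    return 6 * eps ** -2 * math.log(n) / lam


def max_level(p):
    """Largest integer i with i < -log2(p); 0 means only G itself qualifies."""
    return max(0, math.ceil(-math.log2(p)) - 1)


def estimate_min_cut(n, edges, i, rng):
    """2^i * Min-Cut(G_i); the lemma covers this only for i < -log p*."""
    return 2 ** i * min_cut(n, subsample(edges, i, rng))
```

On graphs small enough for the brute-force cut, $p^*$ is usually above 1, so only level $i = 0$ is covered by the lemma; the subsampled levels become useful when the minimum cut is large compared with $\epsilon^{-2} \log n$.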
