Sublinear Algorithms for Big Data
Qin Zhang
Part 2: Sublinear in Communication
Sublinear in communication
The model: $k$ parties hold inputs $x_1 = 010011$, $x_2 = 111011$, $x_3 = 111111$, ..., $x_k = 100011$.
They want to jointly compute $f(x_1, x_2, \ldots, x_k)$.
Goal: minimize the total number of bits of communication.
Applications etc.
A natural approach
The model: sites $S_1, S_2, S_3, \ldots, S_k$ hold inputs $x_1 = 010011$, $x_2 = 111011$, $x_3 = 111111$, ..., $x_k = 100011$, and each site communicates with a coordinator $C$.
They want to jointly compute $f(x_1, x_2, \ldots, x_k)$.
Goal: minimize the total number of bits of communication.
The natural approach: each $S_i$ computes a sketch $sk(S_i)$ of its input and sends it to $C$; then $C$ computes $f(x_1, \ldots, x_k)$ based on $sk(S_1), \ldots, sk(S_k)$.
The slides from the next page on are borrowed from Andrew McGregor.
I. Connectivity
II. k-Connectivity
III. Min-Cut
I. Connectivity
Theorem (Testing Connectivity):
a) Dynamic graph stream: $O(n \,\mathrm{polylog}\, n)$ space.
b) Simultaneous messages: $O(\mathrm{polylog}\, n)$ length.
Ingredient 1: Basic Algorithm
Algorithm (Spanning Forest):
1. For each node: pick an incident edge.
2. For each connected component: pick an incident edge.
3. Repeat until there are no edges between connected components.
Lemma: After $O(\log n)$ rounds, the selected edges include a spanning forest.
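The rounds above can be sketched in a few lines of Python. This is a minimal illustration rather than the slides' own code: the function name `spanning_forest`, the 0-based node labels, and the union-find bookkeeping are all assumptions made for the example.

```python
def spanning_forest(n, edges):
    """Round-based spanning forest on nodes 0..n-1 (hypothetical helper).

    In every round each current component picks one edge leaving it;
    the picked edges are merged, and rounds repeat until no edge
    connects two different components.
    """
    parent = list(range(n))

    def find(u):
        # Union-find lookup with path halving.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    forest = []
    rounds = 0
    while True:
        picked = {}  # component root -> one edge leaving that component
        for u, v in edges:
            ru, rv = find(u), find(v)
            if ru != rv:
                picked.setdefault(ru, (u, v))
                picked.setdefault(rv, (u, v))
        if not picked:
            break  # no edges between components remain
        rounds += 1
        for u, v in picked.values():
            ru, rv = find(u), find(v)
            if ru != rv:  # skip edges that would now close a cycle
                parent[ru] = rv
                forest.append((u, v))
    return forest, rounds
```

Every component that still has an outgoing edge is merged with at least one other component in each round, so the number of such components at least halves per round, which matches the $O(\log n)$ bound in the lemma.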
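Ingredient 2: Sketching Neighborhoods
For node $i$, let $a_i$ be a vector indexed by node pairs. Non-zero entries: for each edge $\{i,j\}$ of the graph, $a_i[i,j] = 1$ if $j > i$ and $a_i[i,j] = -1$ if $j < i$.
[Figure: example graph on nodes 1, 2, 3, 4, 5]
Example:
$$
\begin{array}{r|cccccccccc}
 & \{1,2\} & \{1,3\} & \{1,4\} & \{1,5\} & \{2,3\} & \{2,4\} & \{2,5\} & \{3,4\} & \{3,5\} & \{4,5\} \\ \hline
a_1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
a_2 & -1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
a_1 + a_2 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0
\end{array}
$$
Lemma: For any subset of nodes $S \subset V$,
$$\mathrm{support}\Big(\sum_{i \in S} a_i\Big) = E(S, V \setminus S).$$
Lemma: There exists a random $M: \mathbb{R}^N \to \mathbb{R}^k$ with $k = O(\mathrm{polylog}\, N)$ such that for any $a \in \mathbb{R}^N$, with high probability,
$$M a \longrightarrow e \in \mathrm{support}(a).$$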
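The first lemma can be checked by brute force on a small graph. The sketch below is illustrative only: the helper names are hypothetical, vectors are stored as sparse dictionaries, and because the example figure does not survive in the text, the edges (3,4) and (4,5) are assumed in order to complete a 5-node graph.

```python
from itertools import combinations

def node_vector(i, edges):
    """Sparse a_i: +1 on pair {i,j} if j > i, -1 if j < i, for each edge at i."""
    a = {}
    for u, v in edges:
        u, v = min(u, v), max(u, v)
        if i == u:
            a[(u, v)] = 1
        elif i == v:
            a[(u, v)] = -1
    return a

def sum_support(S, edges):
    """Support of the sum of a_i over i in S."""
    total = {}
    for i in S:
        for pair, val in node_vector(i, edges).items():
            total[pair] = total.get(pair, 0) + val
    return {pair for pair, val in total.items() if val != 0}

def cut_edges(S, edges):
    """E(S, V \\ S): edges with exactly one endpoint in S."""
    return {(min(u, v), max(u, v)) for u, v in edges if (u in S) != (v in S)}

def lemma_holds(nodes, edges):
    """Check support(sum_{i in S} a_i) == E(S, V \\ S) for every nonempty proper S."""
    for r in range(1, len(nodes)):
        for S in combinations(nodes, r):
            if sum_support(set(S), edges) != cut_edges(set(S), edges):
                return False
    return True

# Edges (1,2), (1,3), (2,3) follow the example vectors; (3,4), (4,5) are assumed.
example_edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
```

An edge with both endpoints in $S$ contributes $+1$ from its smaller endpoint and $-1$ from its larger one, so it cancels in the sum; only edges crossing the cut survive.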
Recipe: Sketch & Compute on Sketches
Sketch: Each player sends $M a_j$.
Central player runs the algorithm in sketch space:
Use $M a_j$ to get an incident edge on each node $j$.
For $i = 2$ to $\log n$: to get an incident edge on a component $S \subset V$, use
$$\sum_{j \in S} M a_j = M\Big(\sum_{j \in S} a_j\Big) \longrightarrow e \in \mathrm{support}\Big(\sum_{j \in S} a_j\Big) = E(S, V \setminus S).$$
Detail: Actually, each player sends $\log n$ independent sketches $M_1 a_j, M_2 a_j, \ldots$, and the central player uses $M_i a_j$ when emulating the $i$-th iteration of the algorithm.
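The key property used here is linearity: the coordinator can add the players' sketches and then sample from the sum. The class below is a simplified stand-in for $M$, not the construction behind the lemma. It subsamples indices by a hash at geometrically decreasing rates and keeps a 1-sparse recovery cell per level. The class name `L0Sketch`, the hash family, and the fingerprint check are assumptions for illustration, and no success-probability guarantee is claimed for it.

```python
import random

P = (1 << 61) - 1  # prime modulus used for hashing and fingerprints

class L0Sketch:
    """Simplified linear sketch that may return an index in the support."""

    def __init__(self, levels, seed):
        rng = random.Random(seed)  # the shared seed plays the role of M
        self.levels, self.seed = levels, seed
        self.a = rng.randrange(1, P)
        self.b = rng.randrange(P)
        self.r = rng.randrange(2, P)
        # Per level: [sum of values, sum of index*value, fingerprint]
        self.cells = [[0, 0, 0] for _ in range(levels)]

    def update(self, index, value):
        h = (self.a * index + self.b) % P
        for level in range(self.levels):
            if h % (1 << level) == 0:  # index survives subsampling here
                c = self.cells[level]
                c[0] += value
                c[1] += index * value
                c[2] = (c[2] + value * pow(self.r, index, P)) % P

    def add(self, other):
        """Linearity: sketch(x) + sketch(y) equals sketch(x + y)."""
        out = L0Sketch(self.levels, self.seed)
        for c, x, y in zip(out.cells, self.cells, other.cells):
            c[0], c[1] = x[0] + y[0], x[1] + y[1]
            c[2] = (x[2] + y[2]) % P
        return out

    def sample(self):
        """Return an index from a level that looks 1-sparse, else None."""
        for s0, s1, fp in self.cells:
            if s0 != 0 and s1 % s0 == 0:
                idx = s1 // s0
                if idx >= 0 and fp == (s0 * pow(self.r, idx, P)) % P:
                    return idx
        return None
```

With node pairs numbered 0 to 9 in the order {1,2}, {1,3}, ..., {4,5}, two players can sketch $a_1$ and $a_2$ separately with the same seed; adding the sketches gives exactly the sketch of $a_1 + a_2$, in which the shared pair {1,2} has cancelled.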
II. k-Connectivity
Theorem (Checking every cut has size $\ge k$):
a) Dynamic graph stream: $O(n k \,\mathrm{polylog}\, n)$ space.
b) Simultaneous messages: $O(k \,\mathrm{polylog}\, n)$ length.
Ingredient 1: Basic Algorithm
Algorithm (k-Connectivity):
1. Let $F_1$ be a spanning forest of $G(V, E)$.
2. For $i = 2$ to $k$:
2.1. Let $F_i$ be a spanning forest of $G(V, E - F_1 - \cdots - F_{i-1})$.
Lemma: $G(V, F_1 + \cdots + F_k)$ is $k$-connected iff $G(V, E)$ is.
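The forest-peeling algorithm above can be tried directly on small graphs. The following is a sketch under stated assumptions: a simple graph with 0-based nodes and each edge listed once, hypothetical helper names, and a brute-force `min_cut` that is only meant for checking the lemma on tiny inputs.

```python
from itertools import combinations

def spanning_forest(n, edges):
    """Greedy spanning forest via union-find (hypothetical helper)."""
    parent = list(range(n))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    forest = []
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            forest.append((u, v))
    return forest

def forest_certificate(n, edges, k):
    """Return F_1 + ... + F_k, peeling one spanning forest at a time."""
    remaining = list(edges)
    certificate = []
    for _ in range(k):
        forest = spanning_forest(n, remaining)
        certificate.extend(forest)
        chosen = set(forest)
        remaining = [e for e in remaining if e not in chosen]
    return certificate

def min_cut(n, edges):
    """Brute-force minimum cut over all nonempty proper subsets of nodes."""
    best = len(edges)
    for r in range(1, n):
        for S in combinations(range(n), r):
            S = set(S)
            best = min(best, sum((u in S) != (v in S) for u, v in edges))
    return best
```

The certificate has at most $k(n-1)$ edges, and the lemma says its minimum cut, capped at $k$, agrees with that of the original graph.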
Ingredient 2: Connectivity Sketches
Sketch: Simultaneously construct $k$ independent connectivity sketches $\{M_1 G, M_2 G, \ldots, M_k G\}$.
Run the algorithm in sketch space:
Use $M_1 G$ to find a spanning forest $F_1$ of $G$.
Use $M_2 G - M_2 F_1 = M_2 (G - F_1)$ to find $F_2$.
Use $M_3 G - M_3 F_1 - M_3 F_2 = M_3 (G - F_1 - F_2)$ to find $F_3$.
etc.
III. Min-Cut
Theorem ($(1+\epsilon)$-approximating the minimum cut):
a) Dynamic graph stream: $O(\epsilon^{-2} n \,\mathrm{polylog}\, n)$ space.
b) Simultaneous messages: $O(\epsilon^{-2} \,\mathrm{polylog}\, n)$ length.
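Ingredient 1: Subsampling
Lemma (Karger): Define the subgraph $G_i$ by sampling each edge with probability $2^{-i}$ (so $G_0 = G$). Then
$$\text{Min-Cut}(G) = (1 \pm \epsilon) \cdot 2^i \cdot \text{Min-Cut}(G_i) \quad \text{if } i < -\log p^*,$$
where
$$p^* = 6 \epsilon^{-2} \log n \,/\, \text{Min-Cut}(G).$$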
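To make the condition on $i$ concrete, the sketch below computes $p^*$ and the deepest sampling level the lemma allows, and shows the sampling step itself. The function names are hypothetical. The slide does not state the base of $\log n$, so the natural logarithm is assumed here; base 2 is assumed for $-\log p^*$ because the sampling rates are powers of two.

```python
import math
import random

def p_star(eps, n, min_cut):
    """p* = 6 * eps^-2 * log(n) / Min-Cut(G); natural log is an assumption."""
    return 6 * eps ** -2 * math.log(n) / min_cut

def max_level(eps, n, min_cut):
    """Largest integer i >= 0 with i < -log2(p*), or None if there is none."""
    bound = -math.log2(p_star(eps, n, min_cut))
    i = math.ceil(bound) - 1
    return i if i >= 0 else None

def subsample(edges, i, seed):
    """G_i: keep each edge independently with probability 2^-i."""
    rng = random.Random(seed)
    p = 2.0 ** -i
    return [e for e in edges if rng.random() < p]

def scaled_estimate(i, min_cut_of_sample):
    """Estimate of Min-Cut(G) from the minimum cut of G_i."""
    return 2 ** i * min_cut_of_sample
```

When the minimum cut is small relative to $\epsilon^{-2} \log n$, `max_level` returns `None`: $p^*$ exceeds 1 and no level satisfies the condition, so under this reading the lemma only licenses subsampling when the minimum cut is large.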