Distributed Graph-Parallel Computation on Natural Graphs
Joseph Gonzalez
Joint work with: Yucheng Low, Haijie Gu, Danny Bickson, Carlos Guestrin
Graphs are ubiquitous.
Social Media, Science, Advertising, Web
• Graphs encode relationships between: people, products, ideas, facts, and interests
• Big: billions of vertices and edges, with rich metadata
Graphs are Essential to Data-Mining and Machine Learning
• Identify influential people and information
• Find communities
• Target ads and products
• Model complex data dependencies
Natural Graphs: graphs derived from natural phenomena.
Problem: Existing distributed graph computation systems perform poorly on natural graphs.
PageRank on Twitter Follower Graph
Natural graph with 40M users and 1.4 billion links.
[Chart: runtime per iteration for Hadoop, Twister, Piccolo, GraphLab, and PowerGraph. PowerGraph gains an order of magnitude by exploiting properties of natural graphs.]
Hadoop results from [Kang et al. '11]; Twister (in-memory MapReduce) [Ekanayake et al. '10].
Properties of Natural Graphs: Power-Law Degree Distribution
Power-Law Degree Distribution
[Log-log plot: number of vertices vs. degree for the AltaVista web graph, 1.4B vertices, 6.6B edges.]
• More than 10^8 vertices have one neighbor.
• The top 1% of vertices are adjacent to 50% of the edges!
Power-Law Degree Distribution: high-degree vertices form a “star-like” motif (e.g., President Obama and his followers).
Power-Law Graphs are Difficult to Partition
• Power-law graphs do not have low-cost balanced cuts [Leskovec et al. 08, Lang 04]
• Traditional graph-partitioning algorithms perform poorly on power-law graphs [Abou-Rjeili et al. 06]
Properties of Natural Graphs: a power-law degree distribution produces high-degree vertices, which lead to low-quality partitions.
Program For This, Run on That
• Split high-degree vertices across machines
• New abstraction → equivalence on split vertices
How do we program graph computation?
“Think like a Vertex.” - Malewicz et al. [SIGMOD’10]
The Graph-Parallel Abstraction
• A user-defined Vertex-Program runs on each vertex
• The graph constrains interaction along edges
  – Using messages (e.g., Pregel [PODC’09, SIGMOD’10])
  – Through shared state (e.g., GraphLab [UAI’10, VLDB’12])
• Parallelism: run multiple vertex-programs simultaneously
Example
What’s the popularity of this user? It depends on the popularity of her followers, which in turn depends on the popularity of their followers.
PageRank Algorithm
R[i] = 0.15 + Σ_{j ∈ Nbrs(i)} w_ji · R[j]
(Rank of user i = 0.15 plus the weighted sum of the neighbors’ ranks.)
• Update ranks in parallel
• Iterate until convergence
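A minimal sketch of this update rule in plain Python, assuming hypothetical inputs: in_nbrs[i] lists the vertices j with an edge j → i, and w[(j, i)] is that edge's weight.

  # Iterative PageRank following the slide's update rule.
  def pagerank(vertices, in_nbrs, w, tol=1e-6, max_iters=100):
      R = {i: 1.0 for i in vertices}                 # initial ranks
      for _ in range(max_iters):
          new_R = {}
          for i in vertices:                         # update every vertex (in parallel in practice)
              total = sum(w[(j, i)] * R[j] for j in in_nbrs[i])
              new_R[i] = 0.15 + total
          converged = max(abs(new_R[i] - R[i]) for i in vertices) < tol
          R = new_R
          if converged:
              break
      return R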
The Pregel Abstraction
Vertex-Programs interact by sending messages.

Pregel_PageRank(i, messages):
  // Receive all the messages
  total = 0
  foreach (msg in messages):
    total = total + msg
  // Update the rank of this vertex
  R[i] = 0.15 + total
  // Send new messages to neighbors
  foreach (j in out_neighbors[i]):
    Send msg(R[i] * w_ij) to vertex j

Malewicz et al. [PODC’09, SIGMOD’10]
The GraphLab Abstraction
Vertex-Programs directly read the neighbors’ state.

GraphLab_PageRank(i):
  // Compute sum over neighbors
  total = 0
  foreach (j in in_neighbors(i)):
    total = total + R[j] * w_ji
  // Update the PageRank
  R[i] = 0.15 + total
  // Trigger neighbors to run again
  if R[i] not converged then
    foreach (j in out_neighbors(i)):
      signal vertex-program on j

Low et al. [UAI’10, VLDB’12]
Challenges of High-Degree Vertices
• Sequentially process edges
• Send many messages (Pregel)
• Touch a large fraction of the graph (GraphLab)
• Edge meta-data too large for a single machine
• Asynchronous execution requires heavy locking (GraphLab)
• Synchronous execution is prone to stragglers (Pregel)
Communication Overhead for High-Degree Vertices: Fan-In vs. Fan-Out
Pregel Message Combiners on Fan-In
• A user-defined commutative, associative (+) message operation lets each machine sum the messages destined for the same vertex before sending them across the network.
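A small sketch of the combiner idea, assuming a hypothetical list of (destination vertex, value) pairs queued on one machine; this is not the Pregel API, just an illustration of pre-aggregation before sending.

  from collections import defaultdict

  # Sum combiner: messages from this machine to the same destination vertex
  # collapse into a single message before crossing the network.
  def combine_outgoing(outgoing):
      combined = defaultdict(float)
      for dest, value in outgoing:
          combined[dest] += value        # user-defined commutative, associative (+)
      return list(combined.items())      # one message per destination vertex

  # Example: three partial ranks headed to vertex D become one message.
  print(combine_outgoing([("D", 0.2), ("D", 0.5), ("D", 0.1), ("E", 0.3)]))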
Pregel Struggles with Fan-Out
• Broadcast sends many copies of the same message to the same machine!
Fan-In and Fan-Out Performance
• PageRank on synthetic power-law graphs
  – Piccolo was used to simulate Pregel with combiners
[Chart: total communication (GB) vs. power-law constant α from 1.8 to 2.2; lower α means more high-degree vertices.]
GraphLab Ghosting
• Changes to the master vertex are synced to its ghost copies on other machines.
GraphLab Ghosting
• Changes to the neighbors of high-degree vertices create substantial network traffic.
Fan-In and Fan-Out Performance
• PageRank on synthetic power-law graphs
• GraphLab is undirected
[Chart: total communication (GB) vs. power-law constant α from 1.8 to 2.2; lower α means more high-degree vertices.]
Graph Partitioning
• Graph-parallel abstractions rely on partitioning:
  – Minimize communication
  – Balance computation and storage
• Data transmitted across the network is O(# cut edges).
Random Partitioning
• Both GraphLab and Pregel resort to random (hashed) partitioning on natural graphs; the expected fraction of cut edges is
  E[ |Edges Cut| ] / |E| = 1 − 1/p,  where p is the number of machines.
• 10 machines → 90% of edges cut
• 100 machines → 99% of edges cut!
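The arithmetic behind those two bullet points, as a tiny Python check (nothing here beyond the formula on the slide):

  # Expected fraction of edges cut by random (hashed) vertex placement on p machines:
  # an edge is cut whenever its two endpoints hash to different machines.
  def expected_edge_cut_fraction(p):
      return 1.0 - 1.0 / p

  for p in (10, 100):
      print(f"{p} machines -> {expected_edge_cut_fraction(p):.0%} of edges cut")
  # 10 machines -> 90% of edges cut; 100 machines -> 99% of edges cut.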
In Summary
GraphLab and Pregel are not well suited for natural graphs:
• Challenges of high-degree vertices
• Low-quality partitioning
PowerGraph
• GAS Decomposition: distribute vertex-programs
  – Move computation to data
  – Parallelize high-degree vertices
• Vertex Partitioning:
  – Effectively distribute large power-law graphs
A Common Pattern for Vertex-Programs
GraphLab_PageRank(i):
  // Gather information about the neighborhood: compute sum over neighbors
  total = 0
  foreach (j in in_neighbors(i)):
    total = total + R[j] * w_ji
  // Update the vertex: update the PageRank
  R[i] = 0.15 + total
  // Signal neighbors & modify edge data: trigger neighbors to run again
  if R[i] not converged then
    foreach (j in out_neighbors(i)):
      signal vertex-program on j
GAS Decomposition
• Gather (Reduce): accumulate information about the neighborhood.
  User defined: Gather(Y, neighbor) → Σ, combined with a parallel sum Σ1 + Σ2 → Σ3.
• Apply: apply the accumulated value to the center vertex.
  User defined: Apply(Y, Σ) → Y'.
• Scatter: update adjacent edges and vertices; activate neighbors.
  User defined: Scatter(Y', neighbor) → updated edge data.
PageRank in PowerGraph
R[i] = 0.15 + Σ_{j ∈ Nbrs(i)} w_ji · R[j]

PowerGraph_PageRank(i):
  Gather(j → i): return w_ji * R[j]
  sum(a, b): return a + b
  Apply(i, Σ): R[i] = 0.15 + Σ
  Scatter(i → j): if R[i] changed then trigger j to be recomputed
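A minimal single-machine sketch of the same vertex-program expressed as separate gather / sum / apply / scatter functions, driven by a toy synchronous engine. The function and variable names are illustrative, not the actual PowerGraph API, and the convergence tolerance is an assumption.

  # Toy GAS engine executing the PageRank vertex-program above.
  def gather(j, i, R, w):                 # runs once per in-edge (j -> i)
      return w[(j, i)] * R[j]

  def gas_sum(a, b):                      # commutative, associative combiner
      return a + b

  def apply_fn(i, acc, R):                # update the center vertex
      R[i] = 0.15 + acc

  def scatter(i, j, R, old_R_i, active, tol=1e-3):
      if abs(R[i] - old_R_i) > tol:       # if R[i] changed, trigger neighbor j
          active.add(j)

  def gas_pagerank(vertices, in_nbrs, out_nbrs, w):
      R = {v: 1.0 for v in vertices}
      active = set(vertices)
      while active:
          frontier, active = active, set()
          for i in frontier:
              acc = 0.0
              for j in in_nbrs[i]:
                  acc = gas_sum(acc, gather(j, i, R, w))
              old = R[i]
              apply_fn(i, acc, R)
              for j in out_nbrs[i]:
                  scatter(i, j, R, old, active)
      return R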
Distributed Execution of a PowerGraph Vertex-Program
[Figure: a vertex split across four machines, with one master and three mirrors. Each machine computes a partial gather Σ1…Σ4 over its local edges; the partial sums are combined (+) at the master, which applies the update Y → Y'; the new value Y' is sent to the mirrors, and scatter then runs locally on each machine.]
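A hedged sketch of that communication pattern for a single high-degree vertex i. The data structures (local_edges, R_local) are assumptions chosen to make the pattern explicit: only one partial sum per machine and one updated value per mirror cross the network.

  # Distributed gather/apply for one vertex i replicated on several machines.
  # local_edges[m] holds the in-edges (j -> i) stored on machine m;
  # R_local[m] holds the neighbor ranks replicated on that machine.
  def distributed_gather_apply(i, machines, local_edges, R_local, w):
      # Gather: each machine reduces its local edges to one partial sum.
      partials = []
      for m in machines:
          sigma_m = sum(w[(j, i)] * R_local[m][j] for j in local_edges[m])
          partials.append(sigma_m)          # one number per machine goes to the master
      # Apply: the master combines the partials and updates the vertex once.
      R_i = 0.15 + sum(partials)
      # Scatter setup: the new value is copied back to every mirror.
      for m in machines:
          R_local[m][i] = R_i
      return R_i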
Minimizing Communication in PowerGraph
• Communication is linear in the number of machines each vertex spans.
• A vertex-cut minimizes the number of machines each vertex spans.
• Percolation theory suggests that power-law graphs have good vertex cuts. [Albert et al. 2000]
New Approach to Partitioning
• Rather than cut edges (which must synchronize many edges between machines), we cut vertices (which must synchronize only a single vertex).
• New Theorem: For any edge-cut we can directly construct a vertex-cut which requires strictly less communication and storage.
Constructing Vertex-Cuts
• Evenly assign edges to machines
  – Minimize the machines spanned by each vertex
• Assign each edge as it is loaded
  – Touch each edge only once
• We propose three distributed approaches:
  – Random Edge Placement
  – Coordinated Greedy Edge Placement
  – Oblivious Greedy Edge Placement
Random Edge-Placement
• Randomly assign edges to machines.
[Figure: a balanced vertex-cut across three machines; one vertex spans 3 machines, another spans 2 machines, and low-degree vertices are not cut at all.]
Analysis of Random Edge-Placement
• The expected number of machines spanned by a vertex can be predicted accurately, which lets us estimate memory and communication overhead in advance.
[Chart: expected number of machines spanned vs. number of machines (8 to 48) on the Twitter follower graph, 41 million vertices, 1.4 billion edges; the prediction closely matches the measured random placement.]
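The slide does not show the formula, but under random edge placement the expectation follows a standard balls-into-bins argument: a degree-d vertex whose edges are hashed uniformly to p machines spans p(1 − (1 − 1/p)^d) machines in expectation. A small sketch (the degree values below are made up, not the Twitter data):

  # Expected number of machines spanned by a degree-d vertex when each of its
  # d edges is placed uniformly at random on one of p machines.
  def expected_span(d, p):
      return p * (1.0 - (1.0 - 1.0 / p) ** d)

  # Expected replication factor for a whole graph, given its degree sequence.
  def expected_replication(degrees, p):
      return sum(expected_span(d, p) for d in degrees) / len(degrees)

  print(expected_span(1_000_000, 48))   # a very high-degree vertex spans ~all 48 machines
  print(expected_span(3, 48))           # a low-degree vertex spans ~3 machines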
Random Vertex-Cuts vs. Edge-Cuts
• Expected improvement from vertex-cuts:
[Chart: reduction in communication and storage vs. number of machines (up to ~150); an order-of-magnitude improvement.]
Greedy Vertex-Cuts
• Place edges on machines which already have the vertices in that edge.
Greedy Vertex-Cuts
• De-randomization: greedily minimize the expected number of machines spanned.
• Coordinated Edge Placement
  – Requires coordination to place each edge
  – Slower, but higher-quality cuts
• Oblivious Edge Placement
  – Approximates the greedy objective without coordination
  – Faster, but lower-quality cuts
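A hedged sketch of the oblivious greedy idea: place each edge on a machine that already holds one of its endpoints when possible, breaking ties by current load. This simplifies the published placement rules, and the data structures (A, load) are assumptions for illustration.

  from collections import defaultdict

  # A[v] is the set of machines already spanned by vertex v; load[m] counts edges on m.
  def place_edge(u, v, A, load):
      both = A[u] & A[v]
      if both:
          candidates = both                # a machine already holding both endpoints
      elif A[u] or A[v]:
          candidates = A[u] | A[v]         # a machine holding at least one endpoint
      else:
          candidates = set(load)           # no history: consider every machine
      m = min(candidates, key=lambda k: load[k])   # break ties by current load
      A[u].add(m); A[v].add(m); load[m] += 1
      return m

  def oblivious_cut(edges, num_machines):
      A = defaultdict(set)
      load = {m: 0 for m in range(num_machines)}
      return [place_edge(u, v, A, load) for (u, v) in edges]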