

  1. Distributed Graph-Parallel Computation on Natural Graphs
     Joseph Gonzalez. Joint work with: Yucheng Low, Haijie Gu, Danny Bickson, Carlos Guestrin

  2. Graphs are ubiquitous…

  3. Social Media, Science, Advertising, Web
     • Graphs encode relationships between: people, products, ideas, facts, interests
     • Big: billions of vertices and edges, and rich metadata

  4. Graphs are Essential to Data-Mining and Machine Learning
     • Identify influential people and information
     • Find communities
     • Target ads and products
     • Model complex data dependencies

  5. Natural Graphs: graphs derived from natural phenomena

  6. Problem: Existing distributed graph computation systems perform poorly on Natural Graphs.

  7. PageRank on Twitter Follower Graph
     Natural graph with 40M users, 1.4 billion links.
     [Chart: runtime per iteration for Hadoop, Twister, Piccolo, GraphLab, and PowerGraph]
     PowerGraph gains an order of magnitude by exploiting properties of natural graphs.
     Hadoop results from [Kang et al. ’11]; Twister (in-memory MapReduce) [Ekanayake et al. ‘10]

  8. Properties of Natural Graphs: Power-Law Degree Distribution

  9. Power-Law Degree Distribution
     [Log-log plot: number of vertices vs. degree, AltaVista WebGraph: 1.4B vertices, 6.6B edges]
     More than 10^8 vertices have one neighbor.
     The top 1% of vertices (the high-degree vertices) are adjacent to 50% of the edges!
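     A small Python sketch of what such a skewed distribution means in practice: it draws a power-law degree sequence and checks how much of the edge mass the top 1% of vertices carry. The exponent, vertex count, and sampling scheme are illustrative, not the AltaVista data.

       import random

       # Illustrative: sample a power-law degree sequence and measure what fraction
       # of the total degree (edge endpoints) the top 1% of vertices hold.
       random.seed(0)
       alpha = 2.0                  # assumed power-law exponent, for illustration
       num_vertices = 100_000

       degrees = [int(random.paretovariate(alpha - 1)) for _ in range(num_vertices)]
       degrees.sort(reverse=True)

       top = degrees[: num_vertices // 100]          # top 1% highest-degree vertices
       share = sum(top) / sum(degrees)
       print(f"Top 1% of vertices carry {share:.0%} of the edge endpoints")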

  10. Power-Law Degree Distribution: the “star-like” motif (e.g., President Obama and his followers)

  11. Power-Law Graphs are Difficult to Partition
      [Figure: a graph split between CPU 1 and CPU 2]
      • Power-law graphs do not have low-cost balanced cuts [Leskovec et al. 08, Lang 04]
      • Traditional graph-partitioning algorithms perform poorly on power-law graphs [Abou-Rjeili et al. 06]

  12. Properties of Natural Graphs: power-law degree distribution → high-degree vertices → low-quality partitions

  13. Program for this (the graph), run on this (the cluster: Machine 1, Machine 2)
      • Split high-degree vertices
      • New abstraction → equivalence on split vertices

  14. How do we program graph computation? “Think like a Vertex.” (Malewicz et al. [SIGMOD’10])

  15. The Graph-Parallel Abstraction
      • A user-defined vertex-program runs on each vertex
      • The graph constrains interaction along edges
        – Using messages (e.g., Pregel [PODC’09, SIGMOD’10])
        – Through shared state (e.g., GraphLab [UAI’10, VLDB’12])
      • Parallelism: run multiple vertex-programs simultaneously

  16. Example: What’s the popularity of this user? Is she popular? It depends on the popularity of her followers, which in turn depends on the popularity of their followers.

  17. PageRank Algorithm
      R[i] = 0.15 + Σ_{j ∈ Nbrs(i)} w_ji · R[j]
      (the rank of user i is 0.15 plus the weighted sum of the neighbors’ ranks)
      • Update ranks in parallel
      • Iterate until convergence
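      A minimal worked example of this update rule in Python, on a made-up three-vertex graph with w_ji = 0.85 / out_degree(j) (the standard PageRank weighting; the graph itself is illustrative):

        # Toy illustration of R[i] = 0.15 + sum_{j in Nbrs(i)} w_ji * R[j]
        # on a made-up 3-vertex graph.
        out_nbrs = {0: [1, 2], 1: [0, 2], 2: [0]}
        in_nbrs  = {0: [1, 2], 1: [0],    2: [0, 1]}
        w = {(j, i): 0.85 / len(out_nbrs[j]) for j in out_nbrs for i in out_nbrs[j]}

        R = {v: 1.0 for v in out_nbrs}
        for _ in range(50):              # update all ranks in parallel, iterate to convergence
            R = {i: 0.15 + sum(w[(j, i)] * R[j] for j in in_nbrs[i]) for i in in_nbrs}
        print(R)                         # ranks after 50 synchronous iterations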

  18. The Pregel Abstraction
      Vertex-programs interact by sending messages.
        Pregel_PageRank(i, messages):
          // Receive all the messages
          total = 0
          foreach (msg in messages):
            total = total + msg
          // Update the rank of this vertex
          R[i] = 0.15 + total
          // Send new messages to neighbors
          foreach (j in out_neighbors[i]):
            Send msg(R[i] * w_ij) to vertex j
      Malewicz et al. [PODC’09, SIGMOD’10]
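      A single-machine Python sketch of this message-passing pattern, run as synchronous supersteps. The toy graph, weights, and driver loop are illustrative; this is not Pregel’s actual API.

        # Single-machine sketch of Pregel-style supersteps for PageRank.
        out_nbrs = {0: [1, 2], 1: [0, 2], 2: [0]}
        w = {(i, j): 0.85 / len(out_nbrs[i]) for i in out_nbrs for j in out_nbrs[i]}
        R = {v: 1.0 for v in out_nbrs}

        inbox = {v: [1.0] for v in out_nbrs}          # dummy initial messages
        for superstep in range(30):
            outbox = {v: [] for v in out_nbrs}
            for i, messages in inbox.items():
                total = sum(messages)                 # receive all the messages
                R[i] = 0.15 + total                   # update the rank of this vertex
                for j in out_nbrs[i]:                 # send new messages to neighbors
                    outbox[j].append(R[i] * w[(i, j)])
            inbox = outbox
        print(R)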

  19. The GraphLab Abstraction
      Vertex-programs directly read their neighbors’ state.
        GraphLab_PageRank(i):
          // Compute sum over neighbors
          total = 0
          foreach (j in in_neighbors(i)):
            total = total + R[j] * w_ji
          // Update the PageRank
          R[i] = 0.15 + total
          // Trigger neighbors to run again
          if R[i] not converged then
            foreach (j in out_neighbors(i)):
              signal vertex-program on j
      Low et al. [UAI’10, VLDB’12]
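      A single-machine Python sketch of this shared-state, signal-driven pattern, using a simple worklist as the scheduler. The toy graph, convergence tolerance, and scheduling order are illustrative; this is not GraphLab’s actual API.

        # Single-machine sketch of GraphLab-style execution with signaling.
        from collections import deque

        out_nbrs = {0: [1, 2], 1: [0, 2], 2: [0]}
        in_nbrs  = {0: [1, 2], 1: [0],    2: [0, 1]}
        w = {(j, i): 0.85 / len(out_nbrs[j]) for j in out_nbrs for i in out_nbrs[j]}
        R = {v: 1.0 for v in out_nbrs}

        active = deque(out_nbrs)                      # initially signal every vertex
        while active:
            i = active.popleft()
            total = sum(R[j] * w[(j, i)] for j in in_nbrs[i])   # read neighbor state directly
            new_rank = 0.15 + total
            if abs(new_rank - R[i]) > 1e-6:           # not converged: signal out-neighbors
                R[i] = new_rank
                active.extend(j for j in out_nbrs[i] if j not in active)
        print(R)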

  20. Challenges of High-Degree Vertices
      • Sequentially process edges
      • Sends many messages (Pregel)
      • Touches a large fraction of the graph (GraphLab)
      • Edge meta-data too large for a single machine
      • Asynchronous execution requires heavy locking (GraphLab)
      • Synchronous execution prone to stragglers (Pregel)

  21. Communication Overhead for High-Degree Vertices: Fan-In vs. Fan-Out

  22. Pregel Message Combiners on Fan-In
      [Figure: A, B, and C on Machine 1 combine their messages with a sum (+) before sending one message to D on Machine 2]
      • User-defined commutative, associative (+) message operation
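      A small Python sketch of such a combiner: messages bound for the same destination are folded together with the user-defined (+) before crossing the network. Function and variable names here are illustrative.

        # Sketch of a sum combiner applied on the sending machine.
        def combine(a, b):            # user-defined commutative, associative (+) operation
            return a + b

        outgoing = [("D", 0.25), ("D", 0.5), ("D", 0.25), ("E", 0.3)]   # (dest vertex, message)
        combined = {}
        for dest, msg in outgoing:
            combined[dest] = combine(combined[dest], msg) if dest in combined else msg

        print(combined)               # one message per destination vertex: {'D': 1.0, 'E': 0.3}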

  23. Pregel Struggles with Fan-Out
      [Figure: fan-out from one vertex to A, B, and C across Machine 1 and Machine 2]
      • Broadcast sends many copies of the same message to the same machine!

  24. Fan-In and Fan-Out Performance
      • PageRank on synthetic power-law graphs
        – Piccolo was used to simulate Pregel with combiners
      [Chart: total communication (GB) vs. power-law constant α from 1.8 to 2.2; smaller α means more high-degree vertices]

  25. GraphLab Ghosting
      [Figure: vertices A, B, C, D replicated as ghosts across Machine 1 and Machine 2]
      • Changes to the master are synced to the ghosts

  26. GraphLab Ghosting
      [Figure: the same ghosted graph across Machine 1 and Machine 2]
      • Changes to the neighbors of high-degree vertices create substantial network traffic

  27. Fan-In and Fan-Out Performance
      • PageRank on synthetic power-law graphs
      • GraphLab is undirected, so fan-in and fan-out behave the same
      [Chart: total communication (GB) vs. power-law constant α from 1.8 to 2.2; smaller α means more high-degree vertices]

  28. Graph Partitioning
      • Graph-parallel abstractions rely on partitioning:
        – Minimize communication
        – Balance computation and storage
      [Figure: an edge cut between Machine 1 and Machine 2]
      • Data transmitted across the network is O(# cut edges)

  29. Random Partitioning
      • Both GraphLab and Pregel resort to random (hashed) partitioning on natural graphs.
      • If vertices are hashed to p machines, the expected fraction of cut edges is
        E[ |Edges Cut| / |E| ] = 1 − 1/p
      • 10 machines → 90% of edges cut; 100 machines → 99% of edges cut!
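      A quick Python check of this expectation by simulation; the random toy graph and hashing scheme are illustrative.

        import random

        # Check E[|edges cut| / |E|] = 1 - 1/p under random (hashed) vertex placement.
        random.seed(1)
        p = 10                                            # number of machines
        edges = [(random.randrange(10_000), random.randrange(10_000)) for _ in range(100_000)]

        machine = lambda v: hash(v) % p                   # hashed vertex placement
        cut = sum(1 for u, v in edges if machine(u) != machine(v))
        print(cut / len(edges), 1 - 1 / p)                # both are roughly 0.9 for p = 10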

  30. In Summary: GraphLab and Pregel are not well suited for natural graphs
      • Challenges of high-degree vertices
      • Low-quality partitioning

  31. PowerGraph
      • GAS Decomposition: distribute vertex-programs
        – Move computation to data
        – Parallelize high-degree vertices
      • Vertex partitioning:
        – Effectively distribute large power-law graphs

  32. A Common Pattern for Vertex-Programs
        GraphLab_PageRank(i):
          // Gather information about neighborhood: compute sum over neighbors
          total = 0
          foreach (j in in_neighbors(i)):
            total = total + R[j] * w_ji
          // Update vertex: update the PageRank
          R[i] = 0.15 + total
          // Signal neighbors & modify edge data: trigger neighbors to run again
          if R[i] not converged then
            foreach (j in out_neighbors(i)):
              signal vertex-program on j

  33. GAS Decomposition
      • Gather (Reduce): accumulate information about the neighborhood.
        User defined: Gather(edge) → Σ, with a parallel sum Σ1 + Σ2 → Σ3
      • Apply: apply the accumulated value to the center vertex.
        User defined: Apply(Y, Σ) → Y’
      • Scatter: update adjacent edges and vertices; activate neighbors.
        User defined: Scatter(Y’, edge) → updated edge data & activated neighbors

  34. PageRank in PowerGraph
      R[i] = 0.15 + Σ_{j ∈ Nbrs(i)} w_ji · R[j]
        PowerGraph_PageRank(i)
          Gather(j → i):  return w_ji * R[j]
          sum(a, b):      return a + b
          Apply(i, Σ):    R[i] = 0.15 + Σ
          Scatter(i → j): if R[i] changed then trigger j to be recomputed
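      A single-machine Python sketch of how a GAS engine could run this vertex-program end to end. The toy graph and the engine loop are illustrative; this is not PowerGraph’s actual C++ API.

        # Single-machine sketch of a GAS engine running the PageRank vertex-program.
        out_nbrs = {0: [1, 2], 1: [0, 2], 2: [0]}
        in_nbrs  = {0: [1, 2], 1: [0],    2: [0, 1]}
        w = {(j, i): 0.85 / len(out_nbrs[j]) for j in out_nbrs for i in out_nbrs[j]}
        R = {v: 1.0 for v in out_nbrs}

        def gather(j, i): return w[(j, i)] * R[j]      # Gather(j -> i)
        def gsum(a, b):   return a + b                 # commutative, associative sum
        def apply_(i, s): R[i] = 0.15 + s              # Apply(i, Σ)

        active = set(out_nbrs)
        while active:
            next_active = set()
            for i in active:
                acc = 0.0
                for j in in_nbrs[i]:                   # Gather phase (parallel in PowerGraph)
                    acc = gsum(acc, gather(j, i))
                old = R[i]
                apply_(i, acc)                         # Apply phase
                if abs(R[i] - old) > 1e-6:             # Scatter phase: trigger out-neighbors
                    next_active.update(out_nbrs[i])
            active = next_active
        print(R)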

  35. Distributed Execution of a PowerGraph Vertex-Program
      [Figure: a vertex replicated across Machines 1-4 with one master and three mirrors; each replica gathers a partial sum Σ1…Σ4, the partial sums are combined (+) at the master, Apply updates Y → Y’ at the master, Y’ is sent back to the mirrors, and each replica scatters locally]

  36. Minimizing Communication in PowerGraph
      • Communication is linear in the number of machines each vertex spans.
      • A vertex-cut minimizes the number of machines each vertex spans.
      • Percolation theory suggests that power-law graphs have good vertex cuts. [Albert et al. 2000]

  37. New Approach to Partitioning
      • Rather than cut edges (which requires synchronizing many edges between CPU 1 and CPU 2), we cut vertices (which requires synchronizing a single vertex).
      • New theorem: for any edge-cut we can directly construct a vertex-cut which requires strictly less communication and storage.

  38. Constructing Vertex-Cuts
      • Evenly assign edges to machines
        – Minimize machines spanned by each vertex
      • Assign each edge as it is loaded
        – Touch each edge only once
      • Propose three distributed approaches:
        – Random Edge Placement
        – Coordinated Greedy Edge Placement
        – Oblivious Greedy Edge Placement

  39. Random Edge-Placement
      • Randomly assign edges to machines
      [Figure: a balanced vertex-cut across Machines 1-3; vertex Y spans 3 machines, vertex Z spans 2 machines, the remaining vertices are not cut]

  40. Analysis of Random Edge-Placement
      • Expected number of machines spanned by a vertex:
      [Chart: expected # of machines spanned vs. number of machines (8 to 48), predicted vs. random, on the Twitter follower graph (41 million vertices, 1.4 billion edges)]
      • The prediction accurately estimates memory and communication overhead
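      Under uniformly random edge placement, the expected span of a degree-d vertex follows a balls-into-bins argument: p · (1 − (1 − 1/p)^d). A small Python sketch comparing that prediction with simulation; the degree and machine counts are illustrative.

        import random

        # Expected machines spanned by a degree-d vertex when its d edges are placed
        # uniformly at random on p machines.
        def predicted_span(d, p):
            return p * (1 - (1 - 1 / p) ** d)

        def simulated_span(d, p, trials=10_000):
            return sum(len({random.randrange(p) for _ in range(d)}) for _ in range(trials)) / trials

        d = 100                                   # a moderately high-degree vertex
        for p in (8, 28, 48):
            print(p, round(predicted_span(d, p), 2), round(simulated_span(d, p), 2))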

  41. Random Vertex-Cuts vs. Edge-Cuts
      • Expected improvement from vertex-cuts:
      [Chart: reduction in communication and storage (log scale, 1x to 100x) vs. number of machines (0 to 150); an order-of-magnitude improvement]

  42. Greedy Vertex-Cuts
      • Place edges on machines which already have the vertices in that edge.
      [Figure: Machine 1 and Machine 2 hold previously placed edges among A, B, C, D, E; a new edge goes to a machine that already holds its endpoints]

  43. Greedy Vertex-Cuts
      • De-randomization → greedily minimizes the expected number of machines spanned
      • Coordinated Edge Placement
        – Requires coordination to place each edge
        – Slower, but higher quality cuts
      • Oblivious Edge Placement
        – Approximates the greedy objective without coordination
        – Faster, but lower quality cuts
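      A Python sketch of the greedy placement rule: put each edge on a machine that already holds its endpoints when possible, breaking ties by load. PowerGraph’s coordinated and oblivious variants differ in how this placement state is shared, so treat this as an illustrative approximation.

        from collections import defaultdict

        def greedy_place(edges, p):
            load = [0] * p
            placed_on = defaultdict(set)             # vertex -> machines that already hold it
            assignment = {}
            for (u, v) in edges:
                both = placed_on[u] & placed_on[v]   # machines holding both endpoints
                either = placed_on[u] | placed_on[v] # machines holding at least one endpoint
                candidates = both or either or set(range(p))
                m = min(candidates, key=lambda k: load[k])   # least-loaded candidate machine
                assignment[(u, v)] = m
                load[m] += 1
                placed_on[u].add(m)
                placed_on[v].add(m)
            return assignment, placed_on

        edges = [("A", "B"), ("B", "C"), ("A", "C"), ("D", "E"), ("E", "F"), ("D", "F"), ("C", "D")]
        assignment, replicas = greedy_place(edges, p=2)
        print({v: len(ms) for v, ms in replicas.items()})   # machines spanned (replication) per vertex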
