CSE 6350 File and Storage System Infrastructure in Data Centers Supporting Internet-wide Services
PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs
Presenter: Mengxiao Wang
Problem: Existing distributed graph computation systems perform poorly on natural graphs.
Properties of Natural Graphs: Power-Law Degree Distribution
Properties of Natural Graphs: Low-Quality Partition
• Power-law graphs do not have low-cost balanced cuts
• Traditional graph-partitioning algorithms perform poorly on power-law graphs
Q1: Use Figure 1 to illustrate the highly skewed power-law degree distribution in a graph and explain how this presents challenges to graph-parallel execution engines like Pregel (in terms of work balance, partitioning, communication, storage, and computation).
• Work balance: Under a power-law degree distribution, most vertices have few neighbors while a few have very many. The cost of a vertex-program grows with the degree of its vertex, so per-vertex runtime varies widely.
• Partitioning: Pregel places vertices on machines randomly, which results in poor locality: most edges are cut and span machines.
• Communication: High-degree vertices exchange far more messages than other vertices, which creates bottlenecks (much like traffic congestion) and forces many identical messages to be sent.
• Storage: All edge metadata of a high-degree vertex must be stored on a single machine, which can exceed that machine's memory.
• Computation: Vertex-programs run in parallel with one another, but the work inside a single vertex-program is not parallelized, so a high-degree vertex does far more sequential computation than other vertices.
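The work-balance point can be made concrete with a small simulation. The sketch below is illustrative only: the exponent `alpha`, the degree cap, the machine count, and the helper names `zipf_degrees` and `machine_loads` are assumptions chosen for the example, not values from the paper. It samples vertex degrees from a truncated power law, places vertices on machines at random, and compares the busiest machine with the average.

```python
import random

def zipf_degrees(num_vertices, alpha, max_degree, seed=0):
    """Sample degrees with P(d) proportional to d**(-alpha), 1 <= d <= max_degree."""
    rng = random.Random(seed)
    degrees = list(range(1, max_degree + 1))
    weights = [d ** -alpha for d in degrees]
    return rng.choices(degrees, weights=weights, k=num_vertices)

def machine_loads(degrees, num_machines, seed=0):
    """Random vertex placement: each machine's work is the sum of the
    degrees of the vertices it owns (work per vertex is linear in degree)."""
    rng = random.Random(seed)
    loads = [0] * num_machines
    for d in degrees:
        loads[rng.randrange(num_machines)] += d
    return loads

# Hypothetical parameters, picked only to show the skew.
degrees = zipf_degrees(10_000, alpha=2.0, max_degree=5_000)
loads = machine_loads(degrees, num_machines=16)
imbalance = max(loads) / (sum(loads) / len(loads))

print("max degree:", max(degrees))
print("median degree:", sorted(degrees)[len(degrees) // 2])
print("busiest machine / average machine:", round(imbalance, 2))
```

A handful of very high-degree vertices dominate the total work, so whichever machines receive them tend to finish last.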
Q2: Use Algorithm 1 and Figure 3 about SSSP to explain how an SSSP problem is solved in the execution of GAS vertex programs.
• Gather: Collect information from the in-neighbors and in-edges of the vertex. For each in-edge, add the in-neighbor's distance to the edge weight, and keep the minimum of these sums.
• Apply: Apply the value to the master vertex (keeping the smaller of the old and the gathered distance) and propagate the update to its mirrors on other machines.
• Scatter: If the value changed, scatter the new value to all out-neighbors and activate them so that they start their own GAS vertex programs.
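The three phases above can be sketched as a single-machine loop. This is a minimal illustration, not PowerGraph's API: the function name `sssp_gas`, the edge-dictionary input format, and the choice to start by activating the source's out-neighbors are all assumptions made for the example. It also assumes non-negative edge weights and ignores masters, mirrors, and distribution entirely.

```python
import math
from collections import deque

def sssp_gas(edges, source):
    """edges: dict mapping (u, v) -> weight of the directed edge u -> v."""
    in_nbrs, out_nbrs = {}, {}
    vertices = {source}
    for (u, v), w in edges.items():
        in_nbrs.setdefault(v, []).append((u, w))
        out_nbrs.setdefault(u, []).append(v)
        vertices.update((u, v))

    dist = {v: math.inf for v in vertices}
    dist[source] = 0.0
    # Assumption: the source is settled up front and its out-neighbors start active.
    active = deque(out_nbrs.get(source, []))

    while active:
        v = active.popleft()
        # Gather: one candidate per in-edge; the "sum" operation is min.
        gathered = math.inf
        for u, w in in_nbrs.get(v, []):
            gathered = min(gathered, dist[u] + w)
        # Apply: keep the smaller of the old and the gathered distance.
        new_dist = min(dist[v], gathered)
        changed = new_dist < dist[v]
        dist[v] = new_dist
        # Scatter: only a changed vertex activates its out-neighbors.
        if changed:
            active.extend(out_nbrs.get(v, []))
    return dist

example = {("a", "b"): 1, ("b", "c"): 2, ("a", "c"): 5, ("c", "d"): 1, ("e", "a"): 1}
result = sssp_gas(example, "a")
print(result)
```

In the example, `c` is reached through `b` (distance 3) rather than directly from `a` (distance 5), and `e` stays at infinity because no path leads to it from the source.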
Q3: Explain how the load balancing issue with a graph of highly skewed power-law degree distribution in Pregel can be addressed in PowerGraph.
• Evenly assign edges to machines
• Minimize machines spanned by each vertex
• Assign each edge as it is loaded
• Touch each edge only once
• Propose three distributed approaches
  – Random Edge Placement
  – Coordinated Greedy Edge Placement
  – Oblivious Greedy Edge Placement
For greedy vertex-cuts:
• De-randomization: greedily minimizes the expected number of machines spanned
• Coordinated
  – Requires coordination to place each edge
  – Slower: higher quality cuts
• Oblivious
  – Approx. greedy objective without coordination
  – Faster: lower quality cuts
• Rather than an edge-cut, which must synchronize the cut edges across machines
• Prefer a vertex-cut, which must synchronize only the replicas of the cut vertices
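A rough sketch of edge placement for a vertex-cut follows. It is a simplification under stated assumptions, not the paper's exact heuristic: ties are broken by current machine load, and the per-machine capacity controlled by the hypothetical `slack` parameter is added here to keep the example balanced. The names `place_random`, `place_greedy`, and `replication_factor` are illustrative. Each edge is touched once, in load order, and placed where it adds the fewest new vertex replicas.

```python
import math
import random

def place_random(edges, num_machines, seed=0):
    """Random edge placement: each edge goes to a uniformly chosen machine."""
    rng = random.Random(seed)
    return [rng.randrange(num_machines) for _ in edges]

def place_greedy(edges, num_machines, slack=1.1):
    """Simplified greedy placement (assumption: load cap = slack * |E| / p)."""
    cap = math.ceil(slack * len(edges) / num_machines)
    spans = {}                      # vertex -> machines holding a replica of it
    loads = [0] * num_machines
    placement = []
    for u, v in edges:
        a_u = spans.setdefault(u, set())
        a_v = spans.setdefault(v, set())
        allowed = {m for m in range(num_machines) if loads[m] < cap}
        # Prefer a machine already holding both endpoints, then either one,
        # then any machine that still has room.
        candidates = (a_u & a_v & allowed) or ((a_u | a_v) & allowed) or allowed
        m = min(candidates, key=lambda i: (loads[i], i))
        placement.append(m)
        loads[m] += 1
        a_u.add(m)
        a_v.add(m)
    return placement

def replication_factor(edges, placement):
    """Average number of machines spanned by a vertex."""
    spans = {}
    for (u, v), m in zip(edges, placement):
        spans.setdefault(u, set()).add(m)
        spans.setdefault(v, set()).add(m)
    return sum(len(s) for s in spans.values()) / len(spans)

# A star graph: vertex 0 is a high-degree hub with 100 neighbors.
star = [(0, i) for i in range(1, 101)]
greedy = place_greedy(star, num_machines=4)
rand = place_random(star, num_machines=4)
rf_greedy = replication_factor(star, greedy)
rf_random = replication_factor(star, rand)
print("greedy replication factor:", rf_greedy)
print("random replication factor:", rf_random)
```

On the star graph the hub's edges are spread over all machines, so no single machine stores or processes the whole neighborhood, while each leaf still lives on exactly one machine.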
Distributed Execution of a PowerGraph Vertex-Program
Thank You! Questions?