PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs



  1. CSE 6350: File and Storage System Infrastructure in Data Centers Supporting Internet-wide Services. PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs. Presenter: Mengxiao Wang

  2. Problem: Existing distributed graph computation systems perform poorly on natural graphs.

  3. Properties of Natural Graphs: Power-Law Degree Distribution

  4. Properties of Natural Graphs: Low-Quality Partition • Power-Law graphs do not have low cost balanced cut • Traditional graph-partitioning algorithms perform poorly on Power-Law graphs

  5. Q1: Use Figure 1 to illustrate the highly skewed power-law degree distribution in a graph, and explain how it presents challenges to graph-parallel execution engines like Pregel (in terms of work balance, partitioning, communication, storage, and computation).
     • Work balance: Under a power-law degree distribution, the work done by a vertex-program grows with the vertex's degree, so running time varies widely across vertices and the machines holding high-degree vertices become stragglers.
     • Partitioning: Pregel places vertices by random (hashed) partitioning, which has poor locality: with p machines, an expected fraction 1 − 1/p of the edges is cut.
     • Communication: A high-degree vertex exchanges messages with a very large number of neighbors, so its machine becomes a communication bottleneck, and many of those messages carry the same value.
     • Storage: All edge metadata for a high-degree vertex must be stored on a single machine, which can exceed that machine's memory.
     • Computation: Vertex-programs run in parallel with one another, but existing abstractions do not parallelize the work within a single vertex-program, so a high-degree vertex performs far more sequential computation than the others.
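The partitioning point can be checked numerically. The following is a minimal sketch, not Pregel's actual placement code: the function names `hashed_vertex_placement` and `cut_fraction` are hypothetical, and a seeded random generator stands in for a hash function.

```python
import random


def hashed_vertex_placement(vertices, num_machines, seed=0):
    """Assign each vertex to a pseudo-random machine (stand-in for hashing)."""
    rng = random.Random(seed)
    return {v: rng.randrange(num_machines) for v in vertices}


def cut_fraction(edges, machine_of):
    """Fraction of edges whose two endpoints live on different machines."""
    cut = sum(1 for u, v in edges if machine_of[u] != machine_of[v])
    return cut / len(edges)
```

For an edge whose endpoints are placed independently on p machines, the endpoints coincide with probability 1/p, so the measured cut fraction on a large graph should sit close to 1 − 1/p (about 0.875 for p = 8).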

  6. Q2: Use Algorithm 1 and Figure 3 about SSSP to explain how an SSSP problem is solved in the execution of GAS vertex programs.
     • Gather: For each in-edge (u, v) of vertex v, compute the candidate distance D[u] + w(u, v), that is, the in-neighbor's current distance plus the edge weight. Combine the candidates by taking their minimum.
     • Apply: Write the resulting distance to the master copy of the vertex, then propagate the updated value to its mirrors on other machines.
     • Scatter: If the value changed, send the new value along the out-edges and activate all out-neighbors so that they run their own GAS vertex programs.
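The three phases above can be sketched on a single machine as follows. This is an illustrative simulation, not PowerGraph's API: the function name `gas_sssp` and the `(u, v, weight)` edge format are assumptions, there are no masters or mirrors, and edge weights are assumed non-negative.

```python
import math
from collections import defaultdict


def gas_sssp(edges, source):
    """Single-machine GAS-style SSSP over directed (u, v, weight) edges."""
    in_edges = defaultdict(list)   # v -> [(u, weight), ...]
    out_nbrs = defaultdict(list)   # u -> [v, ...]
    vertices = {source}
    for u, v, w in edges:
        in_edges[v].append((u, w))
        out_nbrs[u].append(v)
        vertices.update((u, v))

    dist = {v: math.inf for v in vertices}
    dist[source] = 0.0
    active = set(out_nbrs[source])  # the source "scatters" first

    while active:
        v = active.pop()
        # Gather: candidate distance per in-edge, combined with min.
        gathered = min((dist[u] + w for u, w in in_edges[v]), default=math.inf)
        # Apply: keep the new distance only if it improves the old one.
        if gathered < dist[v]:
            dist[v] = gathered
            # Scatter: the value changed, so activate all out-neighbors.
            active.update(out_nbrs[v])
    return dist
```

For example, `gas_sssp([("a", "b", 1), ("a", "c", 4), ("b", "c", 2), ("c", "d", 1)], "a")` settles on distances 0, 1, 3, and 4 for `a`, `b`, `c`, and `d`, regardless of the order in which active vertices are processed.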

  7. Q3: Explain how the load balancing issue with a graph of highly skewed power-law degree distribution in Pregel can be addressed in PowerGraph.
     • Evenly assign edges to machines
     • Minimize machines spanned by each vertex
     • Assign each edge as it is loaded
     • Touch each edge only once
     • Propose three distributed approaches:
       – Random Edge Placement
       – Coordinated Greedy Edge Placement
       – Oblivious Greedy Edge Placement
     For greedy vertex-cuts:
     • De-randomization: greedily minimizes the expected number of machines spanned
     • Coordinated
       – Requires coordination to place each edge
       – Slower: higher quality cuts
     • Oblivious
       – Approx. greedy objective without coordination
       – Faster: lower quality cuts
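A greedy edge placement of this kind can be sketched as below. This is a simplified single-process illustration, not PowerGraph's implementation: the names `greedy_vertex_cut` and `replication_factor` are hypothetical, ties are broken by lowest machine load, and the sketch precomputes vertex degrees in a first pass (an assumption made here so that "the vertex with more unplaced edges" is well defined).

```python
from collections import Counter, defaultdict


def greedy_vertex_cut(edges, num_machines):
    """Place each (u, v) edge on one machine, reusing existing replicas."""
    edges = list(edges)
    remaining = Counter()              # unplaced edges per vertex
    for u, v in edges:
        remaining[u] += 1
        remaining[v] += 1

    placed = defaultdict(set)          # vertex -> machines holding a replica
    load = [0] * num_machines          # edges per machine
    assignment = []                    # machine chosen for each edge, in order

    for u, v in edges:
        a_u, a_v = placed[u], placed[v]
        if a_u & a_v:                  # both endpoints share a machine
            candidates = a_u & a_v
        elif a_u and a_v:              # disjoint: follow the busier vertex
            candidates = a_u if remaining[u] >= remaining[v] else a_v
        elif a_u or a_v:               # only one endpoint placed so far
            candidates = a_u or a_v
        else:                          # neither placed: any machine
            candidates = range(num_machines)
        m = min(candidates, key=lambda i: (load[i], i))
        assignment.append(m)
        load[m] += 1
        placed[u].add(m)
        placed[v].add(m)
        remaining[u] -= 1
        remaining[v] -= 1
    return assignment, dict(placed)


def replication_factor(placed):
    """Average number of machines spanned by each vertex."""
    return sum(len(ms) for ms in placed.values()) / len(placed)
```

In this form every decision reads the shared `placed` and `load` state, which corresponds to the coordinated variant; an oblivious variant would run the same rules on each machine against only its local view. Note that the sketch has no balance constraint beyond tie-breaking, so a star graph would land entirely on one machine.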

  8. • Rather than an edge-cut, which must synchronize the cut edges across machines • Prefer a vertex-cut, which must synchronize only the cut vertices (one master and its mirrors)

  9. Distributed Execution of a PowerGraph Vertex-Program

  10. Thank You! Questions?
