  1. HelP: High-level Primitives for Large-Scale Graph Processing
     Semih Salihoglu, Stanford University
     Jennifer Widom, Stanford University

  2. Large-scale Graph Processing
      10s or 100s of billions of vertices and edges
      Distributed shared-nothing systems, e.g., Pregel and PowerGraph
     [Diagram: Machine 1, Machine 2, …, Machine k on top of distributed storage]

  3. APIs of Existing Systems
      Specialized map()- and reduce()-type APIs
        Pregel’s compute()
        PowerGraph’s gather(), apply(), scatter()
      Vertex-centric / graph-parallel
      Message-passing

  4. Advantages
      Transparent parallelism
      Flexible: can express many graph algorithms, e.g., PageRank, HITS, Shortest Paths, Collaborative Filtering, Affinity Propagation, Loopy Belief Propagation, Weakly Connected Components, Triangle Counting, Strongly Connected Components, Betweenness Centrality, Minimum Spanning Tree, Diameter Estimation, …

  5. Disadvantages
      Custom code for common operations, such as:
        Initializing vertex values
        Aggregating neighbor values
      Difficult to read and understand some programs: complex UDFs hide higher-level graph operations
          …
          graph = Pregel.compute(UDF1)
          graph = Pregel.compute(UDF2)
          graph = Pregel.compute(UDF3)
          …
      Too low-level for some operations
        E.g., forming supervertices in a minimum spanning tree
        Multiple rounds of complex messaging inside compute()

  6. HelP Primitives
      Large-scale data processing: low-level map() and reduce(); higher-level Pig and Hive (join, group by, select, …)
      Large-scale graph processing: low-level compute(), gather(), apply(), scatter(); higher-level: HelP?

  7. Steps in Our Work
     1. Implemented a wide suite of distributed graph algorithms
     2. Identified the commonly appearing operations
     3. Abstracted the operations into HelP primitives
     4. Implemented HelP on GraphX
     5. Reimplemented the suite of algorithms on GraphX

  8. Graph Algorithms We Implemented
     PageRank, HITS, Conductance, Approx. Betweenness Centrality, Clustering Coefficient, Semi-clustering, Multi-level Clustering, Approx. Maximum Weight Matching, Random Bipartite Matching, Weakly Connected Components, Strongly Connected Components, Single Source Shortest Paths, Graph Coloring, Maximal Independent Set, K-core, Triangle Counting, Diameter Estimation, K-truss, Minimum Spanning Forest

  9. HelP Primitives
     Primitive                                        Type of Operation
     Aggregate Neighbor Values (ANV)                  Vertex-centric Update
     Local Update of Vertices (LUV)                   Vertex-centric Update
     Update Vertices Using One Other Vertex (UVUOV)   Vertex-centric Update
     Filter                                           Topology Modification
     Form Supervertices (FS)                          Topology Modification
     Aggregate Global Value (AGV)                     Global Aggregation

  10. Algorithms & HelP Primitives (each × marks one of Filter, ANV, LUV, UVUOV, FS, AGV)
      PageRank                          × ×
      HITS                              × × ×
      Conductance                       × ×
      Approx. Betweenness Centrality    × × ×
      Clustering Coefficient            × ×
      Semi-clustering                   × × ×
      Multi-level Clustering            × × ×
      Approx. Maximum Weight Matching   × ×
      Random Bipartite Matching         × × ×
      Weakly Connected Components       × ×
      Strongly Connected Components     × × × ×
      Single Source Shortest Paths      × ×
      Graph Coloring                    × × ×
      Maximal Independent Set           × × ×
      K-core                            × ×
      Triangle Counting                 ×
      Diameter Estimation               × × ×
      K-truss                           ×
      Minimum Spanning Forest           × × × ×

  11. Example: Aggregate Neighbor Values
       Vertices aggregate some or all of their neighbors’ values
       Update own value with the aggregated value
       Version 1: Non-iterative => aggregateNeighborValues
       Ex: PageRank

      for (i = 0; i < 10; ++i) {
        g.aggregateNeighborValues(
          v -> true,                      /* which vertices to update */
          nbr -> true,                    /* which neighbors to aggregate */
          nbr -> nbr.val.pr / nbr.degree,
          AggrFnc.SUM,
          (v, sumPr) -> { v.val.pr = 0.85*sumPr + 0.15/g.numV; })
      }
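The semantics of this non-iterative primitive can be mimicked with a minimal in-memory Python sketch. The function name, signature, and toy graph below are illustrative assumptions, not the actual HelP/GraphX API; the loop mirrors the ten PageRank rounds on the slide.

```python
# Illustrative in-memory sketch of an aggregateNeighborValues-style primitive.
# Not the HelP/GraphX API: names, signature, and graph are made up for this demo.

def aggregate_neighbor_values(adj, vals, update_pred, nbr_pred,
                              map_fn, aggr_fn, update_fn, identity):
    """For each vertex passing update_pred, aggregate map_fn over the
    neighbors passing nbr_pred, then compute the vertex's new value."""
    new_vals = dict(vals)
    for v, nbrs in adj.items():
        if not update_pred(v):
            continue
        acc = identity
        for n in nbrs:
            if nbr_pred(n):
                acc = aggr_fn(acc, map_fn(n, vals))
        new_vals[v] = update_fn(v, acc)
    return new_vals

# Toy directed graph a->b, a->c, b->c, c->a; adj maps each vertex to its
# in-neighbors, since PageRank sums contributions from vertices linking to it.
adj = {"a": ["c"], "b": ["a"], "c": ["a", "b"]}
out_degree = {"a": 2, "b": 1, "c": 1}
num_v = len(adj)
pr = {v: 1.0 / num_v for v in adj}

for _ in range(10):                              # ten rounds, as on the slide
    pr = aggregate_neighbor_values(
        adj, pr,
        lambda v: True,                          # update all vertices
        lambda n: True,                          # aggregate all neighbors
        lambda n, vals: vals[n] / out_degree[n], # neighbor's contribution
        lambda acc, x: acc + x,                  # AggrFnc.SUM
        lambda v, s: 0.85 * s + 0.15 / num_v,    # damped PageRank update
        0.0)
```

Because the graph has no dangling vertices, the ranks stay normalized: they sum to 1 after every round, and vertex c (with two in-links) ends up ranked above b.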

  12. Version 2: Iterative => propagateAndAggregate
       Continue aggregations until vertex values converge
       Ex: Weakly Connected Components
      [Figure: component IDs propagate round by round until every vertex holds the maximum ID in its weakly connected component]

      g.propagateAndAggregate(
        EdgeDirection.BOTH,
        v -> true,            /* start propagation from all */
        v -> v.val.wccID,
        AggrFnc.MAX,
        (v, aggrWCCID) -> { v.val.wccID = aggrWCCID; })
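The iterative version can likewise be sketched in plain Python: keep re-aggregating neighbor values synchronously until no vertex changes. The helper below is an illustrative stand-in, not the HelP/GraphX API; the graph has two weakly connected components, {1, 2, 3} and {7, 9}.

```python
# Illustrative sketch of a propagateAndAggregate-style primitive: repeat a
# synchronous aggregation round until vertex values converge.
# Not the HelP/GraphX API: names and signature are made up for this demo.

def propagate_and_aggregate(adj, vals, start_pred, map_fn, aggr_fn, update_fn):
    vals = dict(vals)
    # start_pred decides whether propagation runs at all in this simple sketch
    changed = {v for v in adj if start_pred(v)}
    while changed:
        changed = set()
        snapshot = dict(vals)          # synchronous round: read old values
        for v in adj:
            agg = snapshot[v]
            for n in adj[v]:
                agg = aggr_fn(agg, map_fn(n, snapshot))
            if agg != snapshot[v]:
                vals[v] = update_fn(v, agg)
                changed.add(v)
    return vals

# EdgeDirection.BOTH -> symmetric adjacency; two components {1,2,3} and {7,9}
adj = {1: [2], 2: [1, 3], 3: [2], 7: [9], 9: [7]}
wcc = propagate_and_aggregate(
    adj, {v: v for v in adj},
    lambda v: True,                 # start propagation from all vertices
    lambda n, vals: vals[n],        # propagate the neighbor's component ID
    max,                            # AggrFnc.MAX
    lambda v, agg: agg)             # adopt the aggregated ID
```

At convergence every vertex carries the maximum vertex ID in its component, matching the slide's figure: {1, 2, 3} converge to 3 and {7, 9} to 9.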

  13. Related Work (see paper)
       Vertex-centric APIs
       MapReduce-based APIs
       Higher-level data analysis languages
       Domain-specific graph languages
       MPI-based libraries

  14. GraphX Implementation, Limitations, Future Work: See Our Paper & Poster!

  15. Questions?

  16. GraphX Implementation (Non-iterative Version)
      [Dataflow diagram:]
       Graph = VerticesRDD + EdgesRDD
       EdgesRDD + VerticesRDD --mapReduceTriplets (join + map + reduceBy)--> MessagesRDD of (vertexID, aggrMsg)
       VerticesRDD join MessagesRDD --> VerticesMsgsRDD of (vertexID, val, aggrMsg)
       VerticesMsgsRDD --map--> NewVerticesRDD of (vertexID, newval)
       Replace VerticesRDD with NewVerticesRDD.
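The slide's dataflow can be traced with plain dictionaries standing in for the RDDs: one pass emits a message per edge and reduces messages by destination, then a join plus a map rebuilds the vertex set. The stage names mirror the slide; the helper functions and toy data are illustrative assumptions, not GraphX code.

```python
# Sketch of the non-iterative dataflow: mapReduceTriplets, then join + map.
# Dicts stand in for RDDs; helpers and data are illustrative, not GraphX.

def map_reduce_triplets(edges, vertices, msg_fn, reduce_fn):
    messages = {}  # MessagesRDD: destination vertex ID -> aggregated message
    for src, dst in edges:
        msg = msg_fn(src, vertices[src])
        messages[dst] = msg if dst not in messages else reduce_fn(messages[dst], msg)
    return messages

def join_and_map(vertices, messages, update_fn):
    # VerticesRDD join MessagesRDD -> VerticesMsgsRDD, then map -> NewVerticesRDD
    return {vid: update_fn(val, messages.get(vid)) for vid, val in vertices.items()}

edges = [(1, 2), (1, 3), (2, 3), (3, 1)]   # EdgesRDD
vertices = {1: 1.0, 2: 1.0, 3: 1.0}        # VerticesRDD
messages = map_reduce_triplets(edges, vertices,
                               lambda src, val: val,   # message = source value
                               lambda a, b: a + b)     # reduceBy: sum per dst
new_vertices = join_and_map(vertices, messages,
                            lambda val, m: m if m is not None else val)
# new_vertices replaces VerticesRDD before the next primitive call
```

Vertex 3 receives messages over two edges, so its aggregated message is the sum 2.0; vertices with a single in-edge keep a message of 1.0.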
