
  1. GraphReduce: Large-Scale Graph Analytics on Accelerator-Based HPC Systems — Dipanjan Sengupta, Kapil Agarwal, Karsten Schwan (CERCS, Georgia Tech); Shuaiwen Leon Song (Pacific Northwest National Lab)

  2. Talk Outline — Motivation — Background on GAS — Hybrid Programming model — GraphReduce Architecture — Experimental Results — Conclusion — Future Work

  3. Motivation — Why use GPUs? – GPU-based frameworks are orders of magnitude faster — Previous GPU-based graph processing doesn't handle datasets that don't fit in GPU memory — The Yahoo-web graph with 1.4 billion vertices requires 6.6 GB of memory just to store its vertex values — Several challenges in large-scale graph processing: — How to partition the graph? — How and when to move the partitions between the host and the GPU? — How to best extract multi-level parallelism in GPUs?

  4. Background – GAS model — Gather phase: each vertex aggregates values associated with its incoming edges and source vertices — Apply phase: each vertex updates its state using the gather result — Scatter phase: each vertex updates the state of every outgoing edge. [Figure: the Gather, Apply, and Scatter phases illustrated on a small graph with vertices U1–U4 and edges a–d]
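The three GAS phases can be sketched in plain Python. This is a minimal host-side illustration, not GraphReduce's GPU implementation; the adjacency-list layout and the choice of PageRank as the example program are assumptions for demonstration.

```python
# Minimal vertex-centric GAS sketch (serial, host-side; not GPU code).
# PageRank chosen as the example program; out_edges is an adjacency list.

def gas_pagerank(out_edges, num_iters=20, damping=0.85):
    n = len(out_edges)
    rank = [1.0 / n] * n
    # Precompute incoming edges so the gather phase can read them.
    in_edges = [[] for _ in range(n)]
    for u, targets in enumerate(out_edges):
        for v in targets:
            in_edges[v].append(u)

    for _ in range(num_iters):
        # Gather: each vertex aggregates values from its in-edges
        # and source vertices.
        gathered = [sum(rank[u] / len(out_edges[u]) for u in in_edges[v])
                    for v in range(n)]
        # Apply: each vertex updates its state using the gather result.
        rank = [(1 - damping) / n + damping * g for g in gathered]
        # Scatter: each vertex exposes its new state along its out-edges
        # (implicit here: the next gather reads `rank` directly).
    return rank

ranks = gas_pagerank([[1, 2], [2], [0]])  # tiny 3-vertex example
```

The serial loop makes the phase boundaries explicit; on a GPU each phase would instead be a data-parallel kernel over vertices or edges.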

  5. Hybrid Programming Model

  Vertex-centric GAS:
    vertex_scatter(vertex v): send updates over outgoing edges of v
    vertex_gather(vertex v): apply updates from inbound edges of v
    while not done:
      for all vertices v that need to scatter updates: vertex_scatter(v)
      for all vertices v that have updates: vertex_gather(v)

  Edge-centric GAS:
    edge_scatter(edge e): send update over e
    update_gather(update u): apply update u to u.destination
    while not done:
      for all edges e: edge_scatter(e)
      for all updates u: update_gather(u)

  — Existing systems choose either a vertex-centric or an edge-centric GAS programming model for graph execution. — Different processing phases have different types of parallelism and memory access characteristics. — GraphReduce adopts a hybrid model combining the vertex- and edge-centric models.
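The contrast between the two scheduling styles can be seen in a toy level-synchronous BFS, written both ways. This is a hedged illustration, not GraphReduce code; the graph and edge-list layouts are assumptions, and the function names loosely mirror the slide's pseudocode.

```python
# Vertex-centric: iterate over frontier vertices; work tracks the frontier,
# but neighbor lists give irregular memory access.
def bfs_vertex_centric(out_edges, root):
    level = {root: 0}
    frontier = [root]
    depth = 0
    while frontier:
        depth += 1
        nxt = []
        for v in frontier:            # for all vertices that need to scatter
            for w in out_edges[v]:    # vertex_scatter(v)
                if w not in level:
                    level[w] = depth
                    nxt.append(w)
        frontier = nxt
    return level

# Edge-centric: stream the whole edge list every round; more total work,
# but fully regular, streaming memory access.
def bfs_edge_centric(edge_list, num_vertices, root):
    INF = float("inf")
    level = [INF] * num_vertices
    level[root] = 0
    changed, depth = True, 0
    while changed:
        changed, depth = False, depth + 1
        for u, v in edge_list:        # for all edges: edge_scatter(e)
            if level[u] == depth - 1 and level[v] == INF:
                level[v] = depth
                changed = True
    return level

lv = bfs_vertex_centric([[1, 2], [3], [3], []], 0)
le = bfs_edge_centric([(0, 1), (0, 2), (1, 3), (2, 3)], 4, 0)
```

Both return the same levels; which style wins on a GPU depends on frontier size and memory-access regularity, which is the motivation for mixing them per phase.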

  6. GraphReduce Architecture

  7. GraphReduce Architecture Contd… Three major components — Partition Engine — Data Movement Engine — Computation Engine
  Partition Engine has two responsibilities — Load-balanced shard creation, such that each shard contains approximately an equal number of edges — Ordering the edges in a shard based on their source or destination vertices for efficient data movement and memory access
  Data Movement Engine has the following responsibilities — Moving shards in and out of limited GPU memory to process large-scale graphs — Efficiently utilizing GPU hardware resources using CUDA streams and Hyper-Q to achieve high performance — Saturating the data transfer bandwidth of the PCIe bus connecting the host and the GPUs
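The Partition Engine's two responsibilities can be sketched as follows. This is a simplified assumption-laden sketch (equal edge counts per shard, edges ordered by destination); the paper's actual shard format and source/destination ordering details differ.

```python
# Sketch of load-balanced shard creation: split an edge list into shards
# holding roughly equal numbers of edges, with each shard's edges ordered
# by destination vertex for regular memory access during gather.

def make_shards(edge_list, num_shards):
    edges = sorted(edge_list, key=lambda e: e[1])  # order by destination
    target = -(-len(edges) // num_shards)          # ceil(edges / shards)
    return [edges[i:i + target] for i in range(0, len(edges), target)]

shards = make_shards([(0, 3), (1, 0), (2, 3), (1, 2), (0, 1), (3, 2)], 3)
# Three shards of two edges each, each internally ordered by destination.
```

Equal-sized shards keep per-shard GPU kernel launches and host-to-device transfers balanced, which is what lets the Data Movement Engine overlap them with CUDA streams.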

  8. Compute Engine — Four phases of computation — Gather Map: fetches all the updates/messages along the in-edges — Gather Reduce: reduces all the collected updates for each vertex — Apply: applies the update to each vertex — Scatter: distributes the updated states of the vertices along the out-edges — Combination of vertex- and edge-centric implementations — Gather Map – edge-centric — Gather Reduce – vertex-centric — Apply – vertex-centric — Scatter – edge-centric
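The split between an edge-centric Gather Map and a vertex-centric Gather Reduce can be sketched serially. On the GPU the reduce step would be a segmented reduction; the data layout and the sum combiner here are illustrative assumptions.

```python
# Hybrid gather sketch: Gather Map emits one update per in-edge
# (edge-centric, regular access); Gather Reduce combines the updates
# per destination vertex (vertex-centric).

def gather_map(edge_list, vertex_value):
    # Edge-centric: one unit of work per edge.
    return [(dst, vertex_value[src]) for src, dst in edge_list]

def gather_reduce(updates, num_vertices, combine=lambda a, b: a + b, init=0):
    # Vertex-centric: one reduction per destination vertex.
    acc = [init] * num_vertices
    for dst, val in updates:
        acc[dst] = combine(acc[dst], val)
    return acc

updates = gather_map([(0, 2), (1, 2), (3, 0)], [5, 7, 1, 2])
summed = gather_reduce(updates, 4)
# summed[2] == 5 + 7 (two in-edges); summed[0] == 2 (one in-edge)
```

Splitting the phases this way lets each run with the parallelism that suits it: the map parallelizes over edges, the reduce over vertices.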

  9. Experimental Setup — Node configuration — Two Intel Xeon E5-2670 processors running at 2.6 GHz with 32 GB of RAM — NVIDIA Tesla K20c GPU with 4.8 GB of DRAM

  10. Benchmarks and Dataset — Graph algorithms used are BFS and PageRank — 9 real-world and synthetic graph datasets, as shown in the table.

  11. Results

  12. Conclusions — GraphReduce develops a graph processing framework for input datasets that may or may not fit in GPU memory — Adopts a combination of both edge- and vertex-centric implementations of the GAS programming model — Leverages CUDA streams and hardware support like Hyper-Q to stream data in and out of the GPU for high performance — Outperforms CPU-based out-of-core graph processing frameworks across a variety of real datasets

  13. Future Work — Extending the GraphReduce framework to multiple nodes in a cluster using communication models like MPI — Addressing the limited on-node memory size through the use of SSDs and other storage devices — Processing dynamically evolving graphs — Understanding how dynamic profiling could be integrated into GraphReduce

  14. Thank You!
