GraphReduce: Large-Scale Graph Analytics on Accelerator-Based HPC Systems
Dipanjan Sengupta, Shuaiwen Leon Song, Kapil Agarwal, Karsten Schwan
Pacific Northwest National Lab | CERCS - Georgia Tech
Talk Outline
- Motivation
- Background on GAS
- Hybrid Programming Model
- GraphReduce Architecture
- Experimental Results
- Conclusion
- Future Work
Motivation
Why use GPUs?
- GPU-based frameworks are orders of magnitude faster
- Previous GPU-based graph processing doesn't handle datasets that don't fit in GPU memory
- The Yahoo-web graph with 1.4 billion vertices requires 6.6 GB of memory just to store its vertex values
Several challenges in large-scale graph processing:
- How to partition the graph?
- How and when to move the partitions between host and GPU?
- How to best extract multi-level parallelism in GPUs?
Background – GAS model
[Figure: the Gather, Apply, and Scatter phases acting on a vertex v and its neighboring vertices U1..U4]
- Gather phase: each vertex aggregates values associated with its incoming edges and source vertices
- Apply phase: each vertex updates its state using the gather result
- Scatter phase: each vertex updates the state of every outgoing edge
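As a hedged illustration (not the authors' GPU code), one GAS superstep can be sketched on a small in-memory graph. The single-float vertex state, edge weights, and the PageRank-style apply rule are assumptions made for concreteness; the real model is generic in the reduction and update functions:

```python
# Minimal sketch of one Gather-Apply-Scatter superstep.
# Assumptions: vertex state is one float, gather reduces by weighted sum,
# and apply uses a PageRank-style damping rule (0.15 + 0.85 * acc).

def gas_superstep(num_vertices, in_edges, out_edges, state):
    # Gather: each vertex aggregates values from its incoming edges
    # and their source vertices.
    acc = [0.0] * num_vertices
    for v in range(num_vertices):
        for (src, weight) in in_edges[v]:
            acc[v] += weight * state[src]
    # Apply: each vertex updates its state using the gather result.
    for v in range(num_vertices):
        state[v] = 0.15 + 0.85 * acc[v]
    # Scatter: each vertex updates the state of every outgoing edge.
    for v in range(num_vertices):
        for e in out_edges[v]:
            e["value"] = state[v]
    return state
```

For a two-vertex graph with a single edge 0 -> 1, vertex 1 gathers from vertex 0 while vertex 0 gathers nothing and falls back to the base value.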
Hybrid Programming Model

Vertex-centric GAS:
    vertex_scatter(vertex v)
        send updates over outgoing edges of v
    vertex_gather(vertex v)
        apply updates from inbound edges of v
    while not done
        for all vertices v that need to scatter updates
            vertex_scatter(v)
        for all vertices v that have updates
            vertex_gather(v)

Edge-centric GAS:
    edge_scatter(edge e)
        send update over e
    update_gather(update u)
        apply update u to u.destination
    while not done
        for all edges e
            edge_scatter(e)
        for all updates u
            update_gather(u)

- Existing systems choose either a vertex-centric or an edge-centric GAS programming model for graph execution
- Different processing phases have different types of parallelism and memory access characteristics
- GraphReduce adopts a hybrid model combining the vertex- and edge-centric models
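The two loop structures above can be contrasted in a hedged sketch. The dictionary-based vertex and edge representations are illustrative assumptions; the point is only that the outer loop runs over vertices in one model and over edges/updates in the other:

```python
# Vertex-centric GAS: the outer loops run over vertices.
def vertex_centric_step(vertices, scatter_set):
    updates = []
    for v in scatter_set:                 # vertices that need to scatter
        for e in v["out_edges"]:          # vertex_scatter(v)
            updates.append((e["dst"], v["value"]))
    for (dst, val) in updates:            # gather/apply the queued updates
        vertices[dst]["value"] += val
    return vertices

# Edge-centric GAS: the outer loops run over edges and updates directly,
# which yields more uniform work items for massively parallel hardware.
def edge_centric_step(vertices, edges):
    updates = [(e["dst"], vertices[e["src"]]["value"])
               for e in edges]            # edge_scatter(e)
    for (dst, val) in updates:            # update_gather(u)
        vertices[dst]["value"] += val
    return vertices
```

On the same input graph the two formulations compute the same result; they differ in which entity the parallel loop is keyed on, and hence in load balance and memory access pattern on a GPU.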
GraphReduce Architecture
GraphReduce Architecture Contd…
Three major components: Partition Engine, Data Movement Engine, Computation Engine
Partition Engine has two responsibilities:
- Load-balanced shard creation, such that each shard contains approximately the same number of edges
- Ordering the edges in a shard by their source or destination vertices for efficient data movement and memory access
Data Movement Engine has the following responsibilities:
- Moving shards in and out of limited GPU memory to process large-scale graphs
- Efficiently utilizing GPU hardware resources via CUDA streams and Hyper-Q to achieve high performance
- Saturating the data transfer bandwidth of the PCIe bus connecting the host and the GPUs
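As a hypothetical sketch of the Partition Engine's two responsibilities (the slides do not give the actual algorithm), shard creation could split the edge list into pieces with roughly equal edge counts and then sort each shard's edges by destination vertex:

```python
# Hypothetical sketch of load-balanced shard creation.
# Assumptions: edges are (src, dst) pairs, shards are balanced by a
# simple equal-size split of the edge list, and ordering is by
# destination vertex (source-ordered shards would sort by e[0]).

def make_shards(edges, num_shards):
    # Each shard gets approximately the same number of edges.
    per_shard = (len(edges) + num_shards - 1) // num_shards
    shards = []
    for i in range(0, len(edges), per_shard):
        # Order edges within the shard by destination vertex so that
        # updates to the same vertex land in adjacent memory locations.
        shard = sorted(edges[i:i + per_shard], key=lambda e: e[1])
        shards.append(shard)
    return shards
```

Balancing by edge count rather than vertex count matters because per-shard work in an edge-centric phase is proportional to the number of edges, not vertices.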
Compute Engine
Four phases of computation:
- Gather Map: fetches all the updates/messages along the in-edges
- Gather Reduce: reduces all the collected updates for each vertex
- Apply: applies the update to each vertex
- Scatter: distributes the updated states of the vertices along the out-edges
Combination of vertex- and edge-centric implementations:
- Gather Map – edge-centric
- Gather Reduce – vertex-centric
- Apply – vertex-centric
- Scatter – edge-centric
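The four phases above can be sketched end to end as a hedged illustration; which loops are keyed on edges versus vertices follows the slide, while the averaging apply rule and the message dictionary are assumptions made so the sketch is self-contained:

```python
# Hedged sketch of the four compute phases. Gather Map and Scatter
# iterate over edges (edge-centric); Gather Reduce and Apply iterate
# over vertices (vertex-centric). The 50/50 averaging apply rule is
# an illustrative assumption, not the framework's fixed semantics.

def compute_superstep(num_vertices, edges, state):
    # Gather Map (edge-centric): fetch the message along each in-edge.
    contributions = [(dst, state[src]) for (src, dst) in edges]
    # Gather Reduce (vertex-centric): reduce collected updates per vertex.
    acc = [0.0] * num_vertices
    for (dst, val) in contributions:
        acc[dst] += val
    # Apply (vertex-centric): update each vertex from its reduced value.
    state = [0.5 * state[v] + 0.5 * acc[v] for v in range(num_vertices)]
    # Scatter (edge-centric): push the new state along every out-edge.
    messages = {(src, dst): state[src] for (src, dst) in edges}
    return state, messages
```

Splitting gather into an edge-centric map and a vertex-centric reduce lets each phase use the loop shape with the better parallelism and memory access pattern for that step.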
Experimental Setup
Node configuration:
- Two Intel Xeon E5-2670 processors running at 2.6 GHz and 32 GB of RAM
- NVIDIA Tesla K20c GPU with 4.8 GB of DRAM
Benchmarks and Datasets
- Graph algorithms used are BFS and PageRank
- 9 real-world and synthetic graph datasets, as shown in the table
Results
Conclusions
- GraphReduce develops a graph processing framework for input datasets that may or may not fit in GPU memory
- Adopts a combination of both edge- and vertex-centric implementations of the GAS programming model
- Leverages CUDA streams and hardware support such as Hyper-Q to stream data in and out of the GPU for high performance
- Outperforms CPU-based out-of-core graph processing frameworks across a variety of real datasets
Future Work
- Extending the GraphReduce framework to multiple nodes in a cluster using communication models like MPI
- Addressing the limited on-node memory size through the use of SSDs and other storage devices
- Processing dynamically evolving graphs
- Understanding how dynamic profiling could be integrated into GraphReduce
Thank You!