nvgraph firehose pagerank

NVGRAPH, FIREHOSE, PAGERANK: GPU ACCELERATED ANALYTICS, NOV 2016, Joe Eaton Ph.D.



  1. NVGRAPH, FIREHOSE, PAGERANK: GPU ACCELERATED ANALYTICS. Nov 2016. Joe Eaton, Ph.D.

  2. AGENDA: Accelerated Computing; nvGRAPH; New Features Coming Soon; Dynamic Graphs; GraphBLAS

  3. ACCELERATED COMPUTING: 10x Performance & 5x Energy Efficiency. [Figure labels: CPU, GPU Accelerator, # GPU Developers]

  4. PERFORMANCE GAP CONTINUES TO GROW. [Figure: two charts comparing NVIDIA GPU and x86 CPU, 2008 to 2016. Left: Peak Double Precision FLOPS (axis labeled GFLOPS, 0.0 to 5.5). Right: Peak Memory Bandwidth (GB/s, 0 to 800). GPU points labeled M1060, M2090, K20, K80, Pascal.]

  5. GRAPHS ARE FUNDAMENTAL: Tight connection between data and graphs. Data View -> Graph View: Data Element/Entity -> Graph Vertex; Entity Attributes -> Vertex labels; Binary Relation (1 to 1) -> Graph Edge; N-ary Relation (many to 1) -> Hypergraph edge; Relation Attributes -> Edge labels; Group of relations over entities -> Sets of Vertices and Edges.

  6. NVGRAPH: Easy Onramp to GPU Accelerated Graph Analytics. GPU Optimized Algorithms: reduced cost & increased performance. Standard formats and primitives: semi-rings, load-balancing. Performance Constantly Improving.

  7. nvGRAPH: Accelerated Graph Analytics. nvGRAPH for high performance graph analytics. Deliver results up to 3x faster than CPU-only. Solve graphs with up to 2 billion edges on a single GPU (M40). Accelerates a wide range of graph analytics applications: PageRank (Search, Recommendation Engines, Social Ad Placement); Single Source Shortest Path (Robotic Path Planning, Power Network Planning, Logistics & Supply Chain Planning); Single Source Widest Path (IP Routing, Chip Design / EDA, Traffic sensitive routing). [Figure: "nvGRAPH: 3.4x Speedup", PageRank on Twitter 1.5B edge dataset, Iterations/s (0 to 3); nvGraph on P100 vs. GraphMat on 2 socket 12-core Xeon E5-2697 v2 CPU @ 2.70 GHz.] developer.nvidia.com/nvgraph

  8. Motivating example. Power law graph: wiki2003.bin, 455,436 vertices (n), 2,033,173 edges (nnz), sparsity = 4.464234 (nnz / n). cuSPARSE csrmv time: 8.05 ms. Merge Path csrmv time: 1.08 ms, ~7.45x faster! PSG Cluster, K40.
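The gap between the two csrmv timings comes from load balance: on a power-law graph a few rows hold most of the nonzeros, so a row-per-worker split leaves most workers idle. The following is a minimal sequential Python sketch of the merge-path idea, assuming the usual formulation in which the combined list of row ends and nonzeros is cut into equal-sized pieces. The function names and the `num_workers` parameter are illustrative assumptions, not nvGRAPH or cuSPARSE API, and the loop over workers stands in for what a GPU would run in parallel.

```python
from math import ceil


def csrmv_reference(row_ptr, col_idx, vals, x):
    """Row-by-row CSR y = A x; a single long row dominates the work."""
    y = []
    for r in range(len(row_ptr) - 1):
        y.append(sum(vals[k] * x[col_idx[k]]
                     for k in range(row_ptr[r], row_ptr[r + 1])))
    return y


def merge_path_search(diagonal, row_end, nnz):
    """Return (row, nonzero) where the merge path crosses a diagonal."""
    lo, hi = max(diagonal - nnz, 0), min(diagonal, len(row_end))
    while lo < hi:
        mid = (lo + hi) // 2
        if row_end[mid] <= diagonal - mid - 1:
            lo = mid + 1
        else:
            hi = mid
    return lo, diagonal - lo


def csrmv_merge_path(row_ptr, col_idx, vals, x, num_workers):
    """CSR y = A x where each worker gets an equal share of rows + nonzeros."""
    n_rows, nnz = len(row_ptr) - 1, len(vals)
    row_end = row_ptr[1:]
    total = n_rows + nnz
    per_worker = ceil(total / num_workers)
    y = [0.0] * n_rows
    carries = []  # (row, partial sum) for rows split across workers
    for w in range(num_workers):  # each iteration is independent of the others
        d0 = min(w * per_worker, total)
        d1 = min(d0 + per_worker, total)
        r, k = merge_path_search(d0, row_end, nnz)
        r_end, k_end = merge_path_search(d1, row_end, nnz)
        running = 0.0
        while r < r_end:  # rows that finish inside this worker's share
            while k < row_end[r]:
                running += vals[k] * x[col_idx[k]]
                k += 1
            y[r] = running
            running = 0.0
            r += 1
        while k < k_end:  # leading part of a row that finishes elsewhere
            running += vals[k] * x[col_idx[k]]
            k += 1
        carries.append((r_end, running))
    for r, partial in carries:  # fix-up pass for split rows
        if r < n_rows:
            y[r] += partial
    return y
```

With a skewed matrix such as `row_ptr = [0, 6, 6, 7, 8]`, every worker still processes about `(n_rows + nnz) / num_workers` items, whereas a row-based split would give one worker six of the eight nonzeros.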

  9. SEMI-RINGS: Definition / Axioms. A set R with two binary operators, + and *, that satisfy: (1) (R, +) is associative and commutative with additive identity 0 (0 + a = a); (2) (R, *) is associative with multiplicative identity 1 (1 * a = a); (3) left and right multiplication are distributive over addition; (4) the additive identity 0 is the multiplicative null operator (0 * a = a * 0 = 0).

  10. SEMI-RINGS: Examples (SEMIRING: SET, PLUS, TIMES, 0, 1). Real: ℝ, +, *, 0, 1. MinPlus: ℝ ∪ {−∞, ∞}, min, +, ∞, 0. MaxMin: ℝ ∪ {−∞, ∞}, max, min, −∞, ∞. Boolean: {0, 1}, ∨, ∧, 0, 1.
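The four example semirings can be written down as data and spot-checked against the axioms from the previous slide. This is a minimal sketch: the `Semiring` class, the constant names, and `satisfies_axioms` are hypothetical names chosen for illustration, and checking a handful of sample values is a sanity test, not a proof.

```python
from dataclasses import dataclass
from itertools import product
from typing import Any, Callable

INF = float("inf")


@dataclass(frozen=True)
class Semiring:
    name: str
    plus: Callable[[Any, Any], Any]
    times: Callable[[Any, Any], Any]
    zero: Any  # additive identity, multiplicative null
    one: Any   # multiplicative identity


REAL = Semiring("Real", lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)
MIN_PLUS = Semiring("MinPlus", min, lambda a, b: a + b, INF, 0.0)
MAX_MIN = Semiring("MaxMin", max, min, -INF, INF)
BOOLEAN = Semiring("Boolean", lambda a, b: a or b, lambda a, b: a and b,
                   False, True)


def satisfies_axioms(sr, samples):
    """Check the four semiring axioms on every triple of sample values."""
    for a, b, c in product(samples, repeat=3):
        if sr.plus(sr.plus(a, b), c) != sr.plus(a, sr.plus(b, c)):
            return False  # + not associative
        if sr.plus(a, b) != sr.plus(b, a):
            return False  # + not commutative
        if sr.times(sr.times(a, b), c) != sr.times(a, sr.times(b, c)):
            return False  # * not associative
        if sr.times(a, sr.plus(b, c)) != sr.plus(sr.times(a, b), sr.times(a, c)):
            return False  # left distributivity fails
        if sr.times(sr.plus(a, b), c) != sr.plus(sr.times(a, c), sr.times(b, c)):
            return False  # right distributivity fails
    for a in samples:
        if sr.plus(sr.zero, a) != a:
            return False  # 0 is not the additive identity
        if sr.times(sr.one, a) != a or sr.times(a, sr.one) != a:
            return False  # 1 is not the multiplicative identity
        if sr.times(sr.zero, a) != sr.zero or sr.times(a, sr.zero) != sr.zero:
            return False  # 0 does not annihilate
    return True


# Small integer-valued floats keep the arithmetic exact. The MinPlus samples
# omit -inf because inf + (-inf) is NaN in floating point.
SAMPLES = {
    "Real": [0.0, 1.0, 2.0, -3.0],
    "MinPlus": [0.0, 1.0, 5.0, INF],
    "MaxMin": [-INF, -2.0, 0.0, 3.0, INF],
    "Boolean": [False, True],
}
```

Calling `satisfies_axioms(MIN_PLUS, SAMPLES["MinPlus"])` exercises the table row for MinPlus; swapping its `zero` for `0.0` makes the check fail, because `min(0, a)` is no longer `a`.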

  11. APPLICATIONS: PageRank (+, *) • Ideal application: runs on web and social graphs • Each iteration involves computing y = A x • Standard csrmv • PlusTimes semiring • α = 1.0 (multiplicative identity) • β = 0.0 (multiplicative nullity) • //sw/gpgpu/naga/src/pagerank.cpp
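A PageRank iteration built on a plus-times csrmv can be sketched as follows. This is an illustrative pure-Python version, not the nvGRAPH source: the damping factor of 0.85, the uniform handling of dangling vertices, the convergence tolerance, and all function names are assumptions. The `csrmv` helper follows the common `alpha * A x + beta * y` form and is called with `alpha = 1.0` and `beta = 0.0` as on the slide.

```python
def csrmv(alpha, row_ptr, col_idx, vals, x, beta, y):
    """Return alpha * A x + beta * y over the (+, *) semiring, A in CSR."""
    out = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += vals[k] * x[col_idx[k]]
        out.append(alpha * acc + beta * y[r])
    return out


def build_transition_csr(n, edges):
    """CSR matrix with A[dst][src] = 1 / out_degree(src) for each edge."""
    out_deg = [0] * n
    for src, _ in edges:
        out_deg[src] += 1
    rows = [[] for _ in range(n)]
    for src, dst in edges:
        rows[dst].append((src, 1.0 / out_deg[src]))
    row_ptr, col_idx, vals = [0], [], []
    for row in rows:
        for src, weight in sorted(row):
            col_idx.append(src)
            vals.append(weight)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, vals, out_deg


def pagerank(n, edges, damping=0.85, tol=1e-10, max_iter=200):
    """Power iteration; each step is one csrmv plus a scalar correction."""
    row_ptr, col_idx, vals, out_deg = build_transition_csr(n, edges)
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        spread = csrmv(1.0, row_ptr, col_idx, vals, rank, 0.0, rank)
        # Rank held by vertices without out-edges is shared uniformly.
        dangling = sum(rank[v] for v in range(n) if out_deg[v] == 0)
        new_rank = [damping * (s + dangling / n) + (1.0 - damping) / n
                    for s in spread]
        if sum(abs(a - b) for a, b in zip(new_rank, rank)) < tol:
            return new_rank
        rank = new_rank
    return rank
```

For example, `pagerank(3, [(0, 1), (1, 2), (2, 0)])` on a directed 3-cycle leaves every vertex with the same rank, since each vertex passes all of its rank to exactly one neighbor.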

  12. APPLICATIONS: Single Source Shortest Path (min, +). Common Usage Examples: • Path-finding algorithms: Navigation, Modeling, Communications Network • Breadth first search building block • Graph 500 Benchmark. [Figure: example weighted graph]
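Over the (min, +) semiring, one sparse matrix-vector product relaxes every edge once, so repeating it until nothing changes gives a Bellman-Ford style shortest-path computation. The sketch below assumes non-negative edge weights and stores the graph by incoming edges; the function names are illustrative, not nvGRAPH API.

```python
INF = float("inf")


def build_incoming_csr(n, edges):
    """CSR of the transposed graph: row v lists (u, weight) for edges u -> v."""
    rows = [[] for _ in range(n)]
    for u, v, w in edges:
        rows[v].append((u, w))
    row_ptr, col_idx, vals = [0], [], []
    for row in rows:
        for u, w in row:
            col_idx.append(u)
            vals.append(w)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, vals


def min_plus_csrmv(row_ptr, col_idx, vals, x):
    """y[r] = min over k of (vals[k] + x[col_idx[k]]); identity of min is inf."""
    y = []
    for r in range(len(row_ptr) - 1):
        acc = INF
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc = min(acc, vals[k] + x[col_idx[k]])
        y.append(acc)
    return y


def sssp(n, edges, source):
    """Distances from source; unreachable vertices stay at inf."""
    row_ptr, col_idx, vals = build_incoming_csr(n, edges)
    dist = [INF] * n
    dist[source] = 0.0
    for _ in range(n - 1):  # at most n - 1 rounds are needed
        relaxed = min_plus_csrmv(row_ptr, col_idx, vals, dist)
        new_dist = [min(d, r) for d, r in zip(dist, relaxed)]
        if new_dist == dist:
            break
        dist = new_dist
    return dist
```

With unit weights on every edge the same loop computes breadth-first levels, which is one way to read the "building block" bullet on the slide.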

  13. APPLICATIONS: Widest Path (max, min). Common Usage Examples: • Maximum bipartite graph matching • Graph partitioning • Minimum cut. Common application areas: • power grids • chip circuits
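Swapping the operators to (max, min) turns the same iteration into a widest-path (bottleneck) computation: the width of a path is its smallest edge capacity, and each vertex keeps the largest width seen so far. This is a minimal sketch with illustrative names; the source starts at the multiplicative identity ∞ and unreached vertices stay at the additive identity −∞.

```python
INF = float("inf")


def widest_path(n, edges, source):
    """Best bottleneck capacity from source to every vertex.

    edges is a list of (u, v, capacity) for directed edges u -> v.
    """
    # Group edges by destination so each vertex can scan its incoming edges.
    incoming = [[] for _ in range(n)]
    for u, v, cap in edges:
        incoming[v].append((u, cap))

    width = [-INF] * n   # additive identity of (max, min)
    width[source] = INF  # multiplicative identity of (max, min)
    for _ in range(n - 1):
        new_width = list(width)
        for v in range(n):
            for u, cap in incoming[v]:
                # "times" is min along the path, "plus" is max across paths.
                new_width[v] = max(new_width[v], min(cap, width[u]))
        if new_width == width:
            break
        width = new_width
    return width
```

For instance, with edges `(0, 1, 4)`, `(1, 3, 2)`, `(0, 2, 3)`, `(2, 3, 3)` the route through vertex 1 is limited to 2 while the route through vertex 2 carries 3, so vertex 3 ends up with width 3.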

  14. PROPERTY GRAPHS: Many simple graphs overlaid

  15. SUBGRAPH EXTRACTION: Focus on a specific area

  16. COMING SOON: Features in next release: Partitioning, Clustering, BFS, Graph Contraction

  17. PARTITIONING AND CLUSTERING: Spectral Min Edge Cut Partition

  18. BREADTH FIRST SEARCH: Key subroutine in several graph algorithms; naturally leads to random access. MPI version implementations: pack or use a bitmap to exchange the frontier at the end of each step. NVSHMEM version: directly updates the frontier map at the target using atomics. Benefits with smaller graphs (likely behavior with strong scaling). Test system: 4 P100 GPUs, all-to-all connected with NVLink. [Figure: two bar charts of GTEPS (0 to 25) versus Scale (20 to 23), cuMPI vs. NVSHMEM, one for a 1x4 process grid and one for a 2x2 process grid; bars annotated with differences ranging from 8% to 41%.]
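The frontier-bitmap scheme can be illustrated with a single-process, level-synchronous BFS. This is only a sketch under stated assumptions: there is no MPI or NVSHMEM here, the function name is hypothetical, and the test-then-set on `visited` stands in for the atomic update a multi-GPU version would perform on the owner's frontier map.

```python
def bfs_levels(row_ptr, col_idx, source):
    """Level-synchronous BFS over a CSR graph using bitmap frontiers.

    Returns the BFS level of every vertex, or -1 if it is unreachable.
    """
    n = len(row_ptr) - 1
    level = [-1] * n
    visited = bytearray(n)   # one flag per vertex
    frontier = bytearray(n)  # current frontier as a bitmap
    visited[source] = 1
    frontier[source] = 1
    level[source] = 0
    depth = 0
    while any(frontier):
        next_frontier = bytearray(n)
        for u in range(n):
            if not frontier[u]:
                continue
            for k in range(row_ptr[u], row_ptr[u + 1]):
                v = col_idx[k]
                # Stand-in for an atomic test-and-set on the owner of v.
                if not visited[v]:
                    visited[v] = 1
                    next_frontier[v] = 1
                    level[v] = depth + 1
        # In an MPI-style version, this is where frontiers are exchanged.
        frontier = next_frontier
        depth += 1
    return level
```

The neighbor lookups `col_idx[k]` jump around memory in an order set by the graph, which is the random access pattern the slide refers to.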

  19. Graph Contraction

  20. Graph Contraction

  21. DYNAMIC GRAPHS: cuSTINGER brings STINGER to GPUs. Oded Green presented "cuSTINGER: Supporting Dynamic Graph Algorithms for GPUs" at HPEC 2016. https://www.researchgate.net/publication/308174457

  22. GRAPHBLAS: nvGRAPH is part of a GraphBLAS implementation. SPMV and SPMM semi-rings are the first parts. Draft 1.0 of GraphBLAS spec.
