NVGRAPH, FIREHOSE, PAGERANK: GPU ACCELERATED ANALYTICS, NOV 2016, Joe Eaton Ph.D.


SLIDE 1

NVGRAPH, FIREHOSE, PAGERANK

GPU ACCELERATED ANALYTICS NOV 2016

Joe Eaton Ph.D.

SLIDE 2

Agenda

  • Accelerated Computing
  • nvGRAPH
  • New Features Coming Soon
  • Dynamic Graphs
  • GraphBLAS

SLIDE 3

ACCELERATED COMPUTING

10x Performance & 5x Energy Efficiency

[Figure: CPU paired with a GPU accelerator; chart labeled "# GPU Developers"]

SLIDE 4

PERFORMANCE GAP CONTINUES TO GROW

[Figure: Peak Memory Bandwidth (GB/s, axis 100 to 800), 2008 to 2016, NVIDIA GPU vs. x86 CPU. GPUs plotted: M1060, M2090, K20, K80, Pascal.]

[Figure: Peak Double Precision FLOPS (axis labeled GFLOPS, 0.0 to 5.5), 2008 to 2016, NVIDIA GPU vs. x86 CPU. GPUs plotted: M1060, M2090, K20, K80, Pascal.]

SLIDE 5

GRAPHS ARE FUNDAMENTAL

Tight connection between data and graphs

Data View                          Graph View
Data Element / Entity              Graph Vertex
Entity Attributes                  Vertex labels
Binary Relation (1 to 1)           Graph Edge
N-ary Relation (many to 1)         Hypergraph edge
Relation Attributes                Edge labels
Group of relations over entities   Sets of Vertices and Edges

SLIDE 6

NVGRAPH

Easy Onramp to GPU Accelerated Graph Analytics

  • GPU Optimized Algorithms: reduced cost & increased performance
  • Standard formats and primitives: semi-rings, load-balancing
  • Performance constantly improving

SLIDE 7

nvGRAPH

Accelerated Graph Analytics

nvGRAPH for high performance graph analytics

  • Deliver results up to 3x faster than CPU-only
  • Solve graphs with up to 2 billion edges on a single GPU (M40)
  • Accelerates a wide range of graph analytics applications:

PageRank                 Single Source Shortest Path         Single Source Widest Path
Search                   Robotic Path Planning               IP Routing
Recommendation Engines   Power Network Planning              Chip Design / EDA
Social Ad Placement      Logistics & Supply Chain Planning   Traffic sensitive routing

developer.nvidia.com/nvgraph

[Figure: PageRank on Twitter 1.5B edge dataset, iterations/s. nvGRAPH on P100 vs. GraphMat on a 2-socket 12-core Xeon E5-2697 v2 CPU @ 2.70 GHz. nvGRAPH: 3.4x speedup.]

SLIDE 8

Motivating example

Power law graph: wiki2003.bin

  • 455,436 vertices (n)
  • 2,033,173 edges (nnz)
  • sparsity (nnz / n) = 4.464234
  • cuSPARSE csrmv time: 8.05 ms
  • Merge Path csrmv time: 1.08 ms
  • ~7.45x faster!
  • PSG Cluster, K40
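
The slide gives only timings, so the following is a hedged, sequential Python sketch of the idea, assuming "Merge Path" refers to the merge-based load-balancing scheme in which the list of row-end offsets is conceptually merged with the list of nonzero indices and the combined work (n + nnz steps) is split evenly among workers. On a power law graph this keeps one very long row from landing on a single worker. The function names and the `num_workers` parameter are illustrative and are not nvGRAPH or cuSPARSE API.

```python
def merge_path_search(diagonal, row_end, nnz):
    """Return (row, nz): where the merge path crosses the given diagonal."""
    lo = max(diagonal - nnz, 0)
    hi = min(diagonal, len(row_end))
    while lo < hi:
        mid = (lo + hi) // 2
        # Row `mid` is finished once its end offset is <= the next nonzero index.
        if row_end[mid] <= diagonal - mid - 1:
            lo = mid + 1
        else:
            hi = mid
    return lo, diagonal - lo


def merge_path_csrmv(row_ptr, col_idx, vals, x, num_workers):
    """Compute y = A @ x for a CSR matrix, splitting n + nnz steps evenly."""
    n = len(row_ptr) - 1
    nnz = len(vals)
    row_end = row_ptr[1:]
    total = n + nnz
    per_worker = -(-total // num_workers)  # ceiling division
    y = [0.0] * n
    carries = []  # (row, partial sum) left unfinished by each worker

    for w in range(num_workers):  # each iteration stands in for one thread
        start = min(w * per_worker, total)
        end = min(start + per_worker, total)
        row, nz = merge_path_search(start, row_end, nnz)
        row_stop, nz_stop = merge_path_search(end, row_end, nnz)
        acc = 0.0
        while row < row_stop:  # rows that end inside this worker's share
            while nz < row_end[row]:
                acc += vals[nz] * x[col_idx[nz]]
                nz += 1
            y[row] = acc
            acc = 0.0
            row += 1
        while nz < nz_stop:  # leading part of a row finished by a later worker
            acc += vals[nz] * x[col_idx[nz]]
            nz += 1
        carries.append((row, acc))

    for row, acc in carries:  # fix-up pass for rows that span workers
        if row < n:
            y[row] += acc
    return y
```

Every worker handles at most `per_worker` combined row-end and nonzero steps regardless of how skewed the row lengths are; a row-per-thread csrmv has no such bound.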

SLIDE 9

SEMI-RINGS

Definition / Axioms

Set R with two binary operators, + and *, that satisfy:

  1. (R, +) is associative and commutative, with additive identity 0 (0 + a = a)
  2. (R, *) is associative, with multiplicative identity 1 (1 * a = a)
  3. Left and right multiplication is distributive over addition
  4. Additive identity 0 = multiplicative null operator (0 * a = a * 0 = 0)
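
As a rough illustration of these axioms, the sketch below bundles the two operators and two identities into a small Python class and spot-checks the four properties on a handful of sample values. The names `Semiring` and `satisfies_axioms` are hypothetical, and a check on samples is only a sanity test, not a proof.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable


@dataclass(frozen=True)
class Semiring:
    add: Callable  # the "+" operator
    mul: Callable  # the "*" operator
    zero: object   # additive identity, also the multiplicative null
    one: object    # multiplicative identity


def satisfies_axioms(s, samples):
    """Spot-check the four semiring axioms on the given sample values."""
    for a in samples:
        if s.add(s.zero, a) != a:                       # axiom 1: identity
            return False
        if s.mul(s.one, a) != a or s.mul(a, s.one) != a:  # axiom 2: identity
            return False
        if s.mul(s.zero, a) != s.zero or s.mul(a, s.zero) != s.zero:  # axiom 4
            return False
    for a, b in product(samples, repeat=2):
        if s.add(a, b) != s.add(b, a):                  # axiom 1: commutative
            return False
    for a, b, c in product(samples, repeat=3):
        if s.add(s.add(a, b), c) != s.add(a, s.add(b, c)):  # axiom 1
            return False
        if s.mul(s.mul(a, b), c) != s.mul(a, s.mul(b, c)):  # axiom 2
            return False
        if s.mul(a, s.add(b, c)) != s.add(s.mul(a, b), s.mul(a, c)):  # axiom 3
            return False
        if s.mul(s.add(a, b), c) != s.add(s.mul(a, c), s.mul(b, c)):  # axiom 3
            return False
    return True


PLUS_TIMES = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0, 1)
# Not a semiring: with max as "+", 0 as its identity and ordinary addition
# as "*", the identity 0 fails to act as a multiplicative null.
BROKEN = Semiring(max, lambda a, b: a + b, 0, 0)
```

With integer samples such as `[0, 1, 2, 5]`, `satisfies_axioms(PLUS_TIMES, ...)` holds while `BROKEN` fails on axiom 4.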

SLIDE 10

SEMI-RINGS

Examples

SEMIRING   SET             PLUS   TIMES   0     1
Real       ℝ               +      *       0     1
MinPlus    ℝ ∪ {−∞, ∞}     min    +       ∞     0
MaxMin     ℝ ∪ {−∞, ∞}     max    min     −∞    ∞
Boolean    {0, 1}          ∨      ∧       0     1
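
To show how one sparse matrix-vector routine can serve all four semirings, here is a minimal Python sketch over a CSR matrix. The dictionary layout and the function name `semiring_spmv` are assumptions made for illustration and do not mirror the nvGRAPH API; the identities are the standard ones for each operator pair.

```python
import operator

INF = float("inf")

# name -> (plus, times, additive identity, multiplicative identity)
SEMIRINGS = {
    "Real": (operator.add, operator.mul, 0.0, 1.0),
    "MinPlus": (min, operator.add, INF, 0.0),
    "MaxMin": (max, min, -INF, INF),
    "Boolean": (operator.or_, operator.and_, False, True),
}


def semiring_spmv(name, row_ptr, col_idx, vals, x):
    """y[r] = PLUS over the stored entries k of row r of TIMES(vals[k], x[col_idx[k]])."""
    plus, times, zero, _one = SEMIRINGS[name]
    y = []
    for r in range(len(row_ptr) - 1):
        acc = zero  # an empty row yields the additive identity
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc = plus(acc, times(vals[k], x[col_idx[k]]))
        y.append(acc)
    return y


# 2x2 example: row 0 stores (col 0: 1.0, col 1: 2.0), row 1 stores (col 1: 3.0)
ROW_PTR = [0, 2, 3]
COL_IDX = [0, 1, 1]
VALS = [1.0, 2.0, 3.0]
```

Only the operator pair and the starting value of the accumulator change between semirings; the traversal of the CSR arrays is identical.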

SLIDE 11

APPLICATIONS

Pagerank (+, *)

//sw/gpgpu/naga/src/pagerank.cpp

  • Ideal application: runs on web and social graphs
  • Each iteration involves computing: 𝑧 = 𝐵 𝑦
  • Standard csrmv
  • PlusTimes Semiring
  • α = 1.0 (multiplicative identity)
  • β = 0.0 (multiplicative nullity)
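
A minimal sketch of the iteration described above, in plain Python rather than the pagerank.cpp source, which is not shown here. It assumes csrmv has the usual form z = α·B·y + β·z, that B is the column-stochastic transition matrix of the graph, and that a damping factor of 0.85 is applied as a separate vector update. The damping step, the helper names and the lack of dangling-vertex handling are assumptions of this example.

```python
def csrmv(row_ptr, col_idx, vals, x, y, alpha=1.0, beta=0.0):
    """Return alpha * (B @ x) + beta * y for a CSR matrix B (PlusTimes semiring)."""
    out = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += vals[k] * x[col_idx[k]]
        out.append(alpha * acc + beta * y[r])
    return out


def build_transition_csr(n, edges):
    """CSR of B where B[dst][src] = 1 / out_degree(src) for each edge src -> dst."""
    out_deg = [0] * n
    for src, _dst in edges:
        out_deg[src] += 1
    rows = [[] for _ in range(n)]
    for src, dst in edges:
        rows[dst].append((src, 1.0 / out_deg[src]))
    row_ptr, col_idx, vals = [0], [], []
    for entries in rows:
        for col, val in sorted(entries):
            col_idx.append(col)
            vals.append(val)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, vals


def pagerank(n, edges, damping=0.85, iters=100):
    """Power iteration; assumes every vertex has at least one outgoing edge."""
    row_ptr, col_idx, vals = build_transition_csr(n, edges)
    y = [1.0 / n] * n
    for _ in range(iters):
        # Core step from the slide: z = B y with alpha = 1.0, beta = 0.0.
        z = csrmv(row_ptr, col_idx, vals, y, y, alpha=1.0, beta=0.0)
        y = [damping * zi + (1.0 - damping) / n for zi in z]
    return y
```

For example, `pagerank(3, [(0, 1), (0, 2), (1, 2), (2, 0)])` ranks vertex 2 above vertex 1, since vertex 2 receives links from both other vertices.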

SLIDE 12

APPLICATIONS

Single Source Shortest Path (min, +)

Common Usage Examples:

  • Path-finding algorithms:
      • Navigation
      • Modeling
      • Communications Network
  • Breadth first search building block
  • Graph 500 Benchmark

[Figure: example graph with integer labels]
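
As a hedged illustration of how the (min, +) semiring yields shortest paths, the sketch below repeats a min-plus relaxation over an edge list until the distances stop changing, which is the Bellman-Ford scheme. It is not the nvGRAPH implementation; the edge-list format and function names are assumptions of this example.

```python
INF = float("inf")


def min_plus_step(edges, dist):
    """One (min, +) product: new[v] = min(dist[v], min over u of dist[u] + w(u, v))."""
    new = list(dist)  # keeping dist[v] plays the role of a zero-weight self loop
    for u, v, w in edges:
        new[v] = min(new[v], dist[u] + w)
    return new


def sssp(n, edges, source):
    """Shortest distances from source; assumes no negative-weight cycles."""
    dist = [INF] * n   # INF is the additive identity of (min, +)
    dist[source] = 0.0  # 0 is the multiplicative identity
    for _ in range(n - 1):
        new = min_plus_step(edges, dist)
        if new == dist:  # converged early
            break
        dist = new
    return dist
```

With unit edge weights the resulting distances are BFS levels, which is one way to read the "breadth first search building block" item above.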

SLIDE 13

APPLICATIONS

Widest Path (max,min)

Common Usage Examples:

  • Maximum bipartite graph matching
  • Graph partitioning
  • Minimum cut
  • Common application areas:
      • power grids
      • chip circuits
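
For the (max, min) semiring, the "width" of a path is the smallest capacity along it, and the widest path to a vertex maximizes that bottleneck. The following sketch, under the same hedges as any illustration here (hypothetical names, plain edge list, not nvGRAPH code), iterates a max-min relaxation to a fixed point.

```python
INF = float("inf")
NEG_INF = float("-inf")


def widest_path(n, edges, source):
    """Best bottleneck capacity from source to every vertex over directed edges."""
    width = [NEG_INF] * n  # -inf is the additive identity of (max, min)
    width[source] = INF    # +inf is the multiplicative identity
    for _ in range(n - 1):
        new = list(width)
        for u, v, capacity in edges:
            # "times" is min (bottleneck along the path), "plus" is max (best path).
            new[v] = max(new[v], min(width[u], capacity))
        if new == width:  # converged early
            break
        width = new
    return width
```

Unreachable vertices keep the value −∞, mirroring how unreachable vertices keep ∞ under (min, +).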

SLIDE 14

PROPERTY GRAPHS

Many simple graphs overlaid

SLIDE 15

SUBGRAPH EXTRACTION

Focus on a specific area

SLIDE 16

COMING SOON

Features in next release

  • Partitioning
  • Clustering
  • BFS
  • Graph Contraction

SLIDE 17

PARTITIONING AND CLUSTERING

Spectral Min Edge Cut Partition

SLIDE 18

BREADTH FIRST SEARCH

  • Key subroutine in several graph algorithms; naturally leads to random access
  • MPI version implementations: pack or use a bitmap to exchange the frontier at the end of each step
  • NVSHMEM version: directly updates the frontier map at the target using atomics
  • Benefits with smaller graphs (likely behavior with strong scaling)

[Figure: two charts (1x4 process grid and 2x2 process grid) of GTEPS (axis 5 to 25) versus Scale (20 to 23), comparing cuMPI and NVSHMEM on 4 P100 GPUs all-to-all connected with NVLink. Bars annotated 41%, 33%, 20%, 13%, 25%, 18%, 8%.]
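
The bitmap variant of the frontier exchange can be sketched in a single Python process by simulating the ranks. In the sketch below each simulated rank expands the frontier vertices it owns into its own bitmap, and the end-of-step exchange is modeled as a bitwise OR of all bitmaps. No real MPI or NVSHMEM calls are made; the round-robin ownership rule and the function name are assumptions of this example.

```python
def bfs_bitmap(adj, source, num_ranks=2):
    """Level-synchronous BFS; returns the level of each vertex (-1 if unreachable)."""
    n = len(adj)
    level = [-1] * n
    level[source] = 0
    frontier = 1 << source  # bit v set means vertex v is in the frontier
    visited = frontier
    depth = 0
    while frontier:
        depth += 1
        local_maps = [0] * num_ranks
        for u in range(n):
            if (frontier >> u) & 1:
                rank = u % num_ranks  # hypothetical round-robin vertex ownership
                for v in adj[u]:
                    local_maps[rank] |= 1 << v
        # End-of-step exchange: combine every rank's bitmap. An NVSHMEM-style
        # variant would instead set these bits directly in the owning rank's
        # frontier map with atomic operations as neighbors are discovered.
        next_frontier = 0
        for bitmap in local_maps:
            next_frontier |= bitmap
        next_frontier &= ~visited  # drop vertices that were already reached
        for v in range(n):
            if (next_frontier >> v) & 1:
                level[v] = depth
        visited |= next_frontier
        frontier = next_frontier
    return level
```

The number of simulated ranks changes only how the work is grouped, not the resulting levels.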

SLIDE 19

Graph Contraction

SLIDE 20

Graph Contraction

SLIDE 21

DYNAMIC GRAPHS

cuSTINGER brings STINGER to GPUs

Oded Green presented "cuSTINGER: Supporting Dynamic Graph Algorithms for GPUs" at HPEC 2016.
https://www.researchgate.net/publication/308174457

SLIDE 22

GRAPHBLAS

  • nvGRAPH is part of a GraphBLAS implementation
  • SPMV and SPMM semi-rings are the first parts
  • Draft 1.0 of GraphBLAS spec

SLIDE 23