nvGRAPH, FIREHOSE, PageRank
GPU-Accelerated Analytics, Nov 2016
Joe Eaton Ph.D.
Agenda:
- Accelerated Computing
- nvGRAPH
- New Features Coming Soon
- Dynamic Graphs
- GraphBLAS

Accelerated Computing: 10x performance & 5x energy efficiency with GPUs
[Figure: CPU + GPU accelerator model; growth in the number of GPU developers]
[Figure: Peak memory bandwidth (GB/s), 2008-2016: NVIDIA GPUs (M1060, M2090, K20, K80, Pascal) vs. x86 CPU]
[Figure: Peak double-precision FLOPS, 2008-2016: NVIDIA GPUs (M1060, M2090, K20, K80, Pascal) vs. x86 CPU]
GPU-Optimized Algorithms
- Reduced cost & increased performance
- Standard formats and primitives
- Semi-rings, load balancing
- Performance constantly improving
nvGRAPH for high-performance graph analytics
- Delivers results up to 3x faster than CPU-only
- Solves graphs with up to 2 billion edges on a single GPU (M40)
- Accelerates a wide range of graph analytics applications

developer.nvidia.com/nvgraph

Algorithms: PageRank, Single-Source Shortest Path, Single-Source Widest Path
Applications: Search, Robotic Path Planning, IP Routing, Recommendation Engines, Power Network Planning, Chip Design / EDA, Social Ad Placement, Logistics & Supply-Chain Planning, Traffic-Sensitive Routing
[Figure: PageRank on Twitter (1.5B-edge dataset), iterations/s: nvGRAPH on P100 vs. GraphMat on a 2-socket 12-core Xeon E5-2697 v2 CPU @ 2.70 GHz]
Semiring identities:
- 0 + a = a (additive identity)
- 1 * a = a (multiplicative identity)
- 0 * a = a * 0 = 0 (zero annihilates under multiplication)
Example semirings:
- Real: (+, *) with identities 0 and 1 (standard arithmetic, e.g. PageRank)
- MinPlus: (min, +) with identities +inf and 0 (shortest paths)
- MaxMin: (max, min) (widest paths)
- Boolean: (OR, AND) (reachability / BFS)
//sw/gpgpu/naga/src/pagerank.cpp
[Figure: BFS performance, GTEPS vs. graph scale (2^20 to 2^23): cuMPI vs. NVSHMEM]
Frontier exchange is a key subroutine in several graph algorithms and naturally leads to random access.
- MPI implementations: pack the frontier, or exchange a bitmap at the end of each step
- NVSHMEM version: directly updates the frontier map at the target using atomics
- Benefits are largest on smaller graphs (the likely behavior under strong scaling)
Hardware: 4 P100 GPUs, all-to-all connected with NVLink; 1x4 and 2x2 process grids.
cuSTINGER brings the STINGER dynamic-graph data structure to GPUs.