15-618 Final Project: Parallel Eigensolver for Graph Spectral Analysis on GPU
Yimin Liu (yiminliu@andrew.cmu.edu), Heran Lin (lin1@andrew.cmu.edu)
Carnegie Mellon University, May 11, 2015
Overview
◮ Undirected graph G = (V, E)
◮ Symmetric square matrix M associated with G (adjacency matrix A, graph Laplacian L, etc.)
◮ The eigenvalues of M encode interesting properties of the graph: Mx = λx (a small worked example follows below)
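As a small worked example (added here for illustration, not from the slides): for the path graph on 3 nodes, the graph Laplacian L = D − A and its eigenvalues are

```latex
L = D - A =
\begin{pmatrix} 1 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 1 \end{pmatrix},
\qquad \lambda(L) = \{0,\; 1,\; 3\}
```

The multiplicity of the eigenvalue 0 equals the number of connected components, one instance of the structural information the spectrum carries.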
Eigendecomposition Overview
◮ Transform M into a symmetric tridiagonal matrix T_m (Lanczos)
◮ Calculate the eigenvalues of T_m (easy)
The Lanczos Algorithm for Tridiagonalization

T_m = \begin{pmatrix} \alpha_1 & \beta_2 & & \\ \beta_2 & \alpha_2 & \ddots & \\ & \ddots & \ddots & \beta_m \\ & & \beta_m & \alpha_m \end{pmatrix}

1. v_0 ← 0, v_1 ← norm-1 random vector, β_1 ← 0
2. for j = 1, …, m
  ◮ w_j ← M v_j
  ◮ α_j ← w_j^⊤ v_j
  ◮ w_j ← w_j − α_j v_j − β_j v_{j−1}
  ◮ β_{j+1} ← ‖w_j‖_2
  ◮ v_{j+1} ← w_j / β_{j+1}

Potential parallelism for CUDA: matrix-vector product, dot product, SAXPY (a host-side sketch follows below)
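A minimal host-side sketch of the Lanczos loop, assuming a device SPMV routine `csr_spmv` (a hypothetical stand-in for the project's kernels) and using cuBLAS for the dot product, SAXPY, norm, and scaling steps:

```cuda
#include <cublas_v2.h>

// Hypothetical device SPMV: w = M * v for an n x n CSR matrix (defined elsewhere).
void csr_spmv(const int *rowPtr, const int *colIdx, const float *vals,
              const float *v, float *w, int n);

// Sketch of the Lanczos iteration (float version); all vectors live on the GPU.
// alpha[0..m-1] and beta[0..m] are host arrays that assemble the tridiagonal T_m.
void lanczos(cublasHandle_t handle,
             const int *rowPtr, const int *colIdx, const float *vals,
             float *v_prev, float *v_cur, float *w,   // device vectors, length n
             float *alpha, float *beta, int n, int m) {
    beta[0] = 0.0f;                                   // beta_1 = 0, v_0 = 0
    for (int j = 0; j < m; ++j) {
        csr_spmv(rowPtr, colIdx, vals, v_cur, w, n);          // w_j = M v_j
        cublasSdot(handle, n, w, 1, v_cur, 1, &alpha[j]);     // alpha_j = w_j^T v_j
        float neg_a = -alpha[j], neg_b = -beta[j];
        cublasSaxpy(handle, n, &neg_a, v_cur, 1, w, 1);       // w_j -= alpha_j v_j
        cublasSaxpy(handle, n, &neg_b, v_prev, 1, w, 1);      // w_j -= beta_j v_{j-1}
        cublasSnrm2(handle, n, w, 1, &beta[j + 1]);           // beta_{j+1} = ||w_j||_2
        float inv_b = 1.0f / beta[j + 1];
        cublasSscal(handle, n, &inv_b, w, 1);                 // v_{j+1} = w_j / beta_{j+1}
        float *tmp = v_prev; v_prev = v_cur; v_cur = w; w = tmp;  // rotate buffers
    }
}
```

The SPMV step dominates the running time on large sparse graphs, which is why the rest of the slides focus on it.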
Challenges
Characteristics of M:
◮ Very sparse
◮ Skewed distribution of non-zero elements
◮ Example: power-law node degree distribution in social networks
Compressed Sparse Row (CSR) Matrix-Vector Multiplication (SPMV)
[Figure: CSR stores each row's non-zeros contiguously, indexed by row pointers and column indices; y = M × x is computed row by row. A data-layout sketch follows below.]
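For reference, the standard CSR layout (field names are illustrative, not necessarily the project's actual structs):

```cuda
// Standard CSR layout for an n x n matrix with nnz stored non-zeros.
// Row i's entries occupy positions rowPtr[i] .. rowPtr[i+1]-1 of colIdx/vals,
// so y[i] = sum over that range of vals[k] * x[colIdx[k]].
struct CsrMatrix {
    int    n;        // number of rows
    int    nnz;      // number of stored non-zeros
    int   *rowPtr;   // length n + 1
    int   *colIdx;   // length nnz: column index of each non-zero
    float *vals;     // length nnz: value of each non-zero (all 1s for an adjacency matrix)
};
```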
Naive Work Assignment
[Figure: Thread 0, Thread 1, Thread 2, … each mapped to Row 0, Row 1, Row 2, …; each thread produces one row's result.]
◮ Each thread is responsible for one row
◮ Work imbalance issues (a kernel sketch follows below)
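A minimal sketch of the one-thread-per-row kernel (names and launch configuration are assumptions, not the project's exact code):

```cuda
// One thread per row: simple, but a thread assigned to a high-degree row does
// far more work than its neighbors, which hurts on power-law degree graphs.
__global__ void spmv_csr_naive(int n, const int *rowPtr, const int *colIdx,
                               const float *vals, const float *x, float *y) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n) {
        float sum = 0.0f;
        for (int k = rowPtr[row]; k < rowPtr[row + 1]; ++k)
            sum += vals[k] * x[colIdx[k]];
        y[row] = sum;
    }
}
```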
Warp-based Work Assignment
[Figure: Warp 0, Warp 1, Warp 2, … each mapped to Row 0, Row 1, Row 2, …; per-lane partial sums are reduced into one result per row.]
◮ Each warp (32 threads) is responsible for one row
◮ Reduce partial sums in shared memory (a kernel sketch follows below)
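A sketch of the warp-per-row variant with a shared-memory reduction, as described on the slide (again an illustrative version, assuming blockDim.x is a multiple of 32 and the kernel is launched with blockDim.x * sizeof(float) bytes of dynamic shared memory):

```cuda
// One 32-thread warp per row: lanes stride across the row's non-zeros and the
// partial sums are tree-reduced in shared memory.
__global__ void spmv_csr_warp(int n, const int *rowPtr, const int *colIdx,
                              const float *vals, const float *x, float *y) {
    extern __shared__ float partial[];               // blockDim.x floats
    int tid  = blockIdx.x * blockDim.x + threadIdx.x;
    int lane = threadIdx.x & 31;
    int row  = tid / 32;                             // global warp id == row
    float sum = 0.0f;
    if (row < n)
        for (int k = rowPtr[row] + lane; k < rowPtr[row + 1]; k += 32)
            sum += vals[k] * x[colIdx[k]];
    partial[threadIdx.x] = sum;
    // Tree reduction within the warp's 32 shared-memory slots.
    for (int offset = 16; offset > 0; offset >>= 1) {
        __syncwarp();                                // needed on post-Volta GPUs
        if (lane < offset)
            partial[threadIdx.x] += partial[threadIdx.x + offset];
    }
    if (row < n && lane == 0)
        y[row] = partial[threadIdx.x];
}
```

Striding 32 lanes across each row keeps the lanes of a warp busy even when row lengths vary, at the cost of wasted lanes on very short rows.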
Warp-based Work Assignment for Row Groups
[Figure: Warp 0, Warp 1, … each mapped to a group of rows (Row 0, Row 1, Row 2, …), producing one result per row in the group.]
◮ Each warp is responsible for a group of rows
◮ Group size depends on the average row sparsity of the matrix (a kernel sketch follows below)
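A sketch of the row-group variant; `rowsPerWarp` would be derived from the average non-zeros per row (the project's exact heuristic is not shown here), and the reduction is written with warp shuffles, though shared memory works equally well:

```cuda
// Each warp handles rowsPerWarp consecutive rows; within each row, lanes
// stride over the non-zeros and reduce as in the warp-per-row kernel.
__global__ void spmv_csr_warp_group(int n, int rowsPerWarp,
                                    const int *rowPtr, const int *colIdx,
                                    const float *vals, const float *x, float *y) {
    int lane   = threadIdx.x & 31;
    int warpId = (blockIdx.x * blockDim.x + threadIdx.x) / 32;
    int first  = warpId * rowsPerWarp;
    for (int row = first; row < first + rowsPerWarp && row < n; ++row) {
        float sum = 0.0f;
        for (int k = rowPtr[row] + lane; k < rowPtr[row + 1]; k += 32)
            sum += vals[k] * x[colIdx[k]];
        // Warp-level reduction using shuffles.
        for (int offset = 16; offset > 0; offset >>= 1)
            sum += __shfl_down_sync(0xffffffff, sum, offset);
        if (lane == 0) y[row] = sum;
    }
}
```

Grouping rows amortizes the per-warp launch overhead when the matrix is so sparse that a single row has far fewer than 32 non-zeros.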
Evaluation Environment
Amazon Web Services EC2 g2.2xlarge:
◮ NVIDIA GK104 GPU, 1,536 CUDA cores, CUDA 7.0 Toolkit
◮ Intel Xeon E5-2670 CPU, 8 cores, gcc/g++ 4.8.2 with -O3 optimization
Reference for comparison: the SPMV implementation in cuSparse (http://docs.nvidia.com/cuda/cusparse/)
Dataset: scale-free networks generated with the Barabási-Albert model using Python NetworkX
float SPMV Performance: Similar to cuSparse
[Figure: Speedup of GPU SPMV over CPU (float) vs. graph node count (×10³, up to 3,200) for Group SPMV, cuSparse SPMV, and Naive SPMV.]
double SPMV Performance: Better than cuSparse
[Figure: Speedup of GPU SPMV over CPU (double) vs. graph node count (×10³, up to 3,200) for Group SPMV, cuSparse SPMV, and Naive SPMV.]
Real-world Graphs
◮ as-Skitter: ~1,700,000 nodes, ~11,000,000 edges
◮ cit-Patents: ~3,800,000 nodes, ~17,000,000 edges
Converted to symmetric double-precision adjacency matrices
Data source: SNAP (http://snap.stanford.edu/data/index.html)
SPMV: Better than cuSparse on Large Real-world Graphs
[Bar chart: Speedup of GPU SPMV over CPU on as-Skitter and cit-Patents for Group SPMV, cuSparse SPMV, and Naive SPMV; reported speedups range from 2.5× to 11.6×, with Group SPMV highest on both graphs.]
Faster Eigenvalue Solver on GPU
[Bar chart: Running time of eigensolvers (sec) on as-Skitter and cit-Patents, GPU eigensolver vs. CPU eigensolver; reported times are 1.6, 3.1, 9, and 31.8 sec, with the GPU solver faster on both graphs.]
Discussion
SLEPc (http://slepc.upv.es)
◮ A state-of-the-art parallel CPU framework that uses MPI to solve sparse-matrix eigenvalue problems
◮ Took 84.9 sec to compute the 10 largest eigenvalues of the cit-Patents graph, while our CPU eigensolver took only 31.8 sec
◮ Unfair to compare?
◮ There are many variants of the Lanczos algorithm
◮ Accuracy vs. performance tradeoff