
A Preliminary Study of Compiler Transformations for Graph Applications on the EMU System
Prasanth Chatarasi and Vivek Sarkar
Habanero Extreme Scale Software Research Project
Georgia Institute of Technology, Atlanta, USA
Prasanth Chatarasi et al., MCHPC'18. CS 6245, Fall 2018 (V. Sarkar)

Introduction – Graph applications
• Increasingly important in high-performance computing
  - Driven by the advent of "big data"
• Random memory access patterns
  - Inefficient use of memory and caches on CPUs and GPUs
• Growing interest in new architectures
  - To handle applications with weak locality

EMU [Kogge et al., IA3’16]
• A highly scalable near-memory multiprocessor
  - 8 nodes → 8 nodelets/node → 4 cores/nodelet → 64 threads/core
  - Cilk programming model for expressing parallelism
[Figure: a comparison between EMU and Xeon on a pointer-chasing benchmark, Hein et al. [IPDPSW’18]]
Image source: http://www.emutechnology.com/products/#lightbox/0/

1) Key features – Thread migration
• Automatic thread migration on access to non-local data
  - Computation moves instead of data
[Figure: thread migration for the statements y = x+1; z = y+1 over the data items x, y, and z]
• Benefits of thread migration
  - Sparse matrix-vector multiply: Kogge et al. [IA3’17]
  - BFS algorithm: Belviranli et al. [HPEC’18]
Image source: http://www.emutechnology.com/products/#lightbox/0/
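To make the cost model concrete, here is a minimal Python sketch that counts migrations for the slide's two statements. It is a toy model rather than Emu code: the `Thread` class, the `home` placement map, and the nodelet ids are hypothetical, and it follows the slide's wording in assuming that any access to non-local data moves the thread (real hardware may treat stores differently).

```python
# Toy model of migrating threads (an illustration, not the Emu API).
# Assumption: a thread moves to the owning nodelet on every non-local access.

class Thread:
    def __init__(self, start_nodelet):
        self.nodelet = start_nodelet
        self.migrations = 0

    def touch(self, home_nodelet):
        """Access a word that lives on home_nodelet, migrating if needed."""
        if home_nodelet != self.nodelet:
            self.nodelet = home_nodelet
            self.migrations += 1


def run_example(home):
    """Execute y = x + 1; z = y + 1 and return (memory, migrations)."""
    mem = {"x": 5, "y": 0, "z": 0}
    t = Thread(start_nodelet=home["x"])
    t.touch(home["x"]); x = mem["x"]        # read x (local at start)
    t.touch(home["y"]); mem["y"] = x + 1    # write y
    t.touch(home["y"]); y = mem["y"]        # read y (thread is already there)
    t.touch(home["z"]); mem["z"] = y + 1    # write z
    return mem, t.migrations


# Hypothetical placements: one variable per nodelet vs. all on one nodelet.
spread_mem, spread_moves = run_example({"x": 0, "y": 1, "z": 2})
local_mem, local_moves = run_example({"x": 0, "y": 0, "z": 0})
print(spread_moves, local_moves)
```

Both placements compute the same values; only the number of moves differs, which is why data layout and loop structure matter on this kind of machine.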

2) Key features – Remote atomic updates
• Atomic updates that do NOT cause a thread migration
  - Send a packet carrying the data and the operation to be performed
[Figure: ATOMIC_ADD(Z, 1) sends a packet with Data: 1, Operation: Add to the location of Z]
• Used when a thread does not need the return value of the atomic operation
  - Otherwise, an explicit FENCE is required to block the thread
Image source: http://www.emutechnology.com/products/#lightbox/0/
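The difference between a migrating atomic and a remote atomic can be sketched with a small counter workload. This is a toy model under stated assumptions, not the Emu API: `NUM_NODELETS`, the `i % NUM_NODELETS` data placement, and the `increment_all` helper are all hypothetical.

```python
# Toy comparison of migrating vs. remote atomic increments (not the Emu API).
# Assumption: counters[i] lives on nodelet i % NUM_NODELETS.

NUM_NODELETS = 4  # hypothetical machine size


def increment_all(targets, use_remote):
    """Apply +1 to counters[t] for each t; return (counters, migrations, packets)."""
    counters = [0] * 8
    here = 0                      # nodelet the thread currently runs on
    migrations = packets = 0
    for t in targets:
        owner = t % NUM_NODELETS
        if owner == here:
            counters[t] += 1      # local update, no traffic
        elif use_remote:
            packets += 1          # ship (data=1, op=add); the thread stays put
            counters[t] += 1      # applied at the owner in a real system
        else:
            migrations += 1       # the thread moves to the owner, then updates
            here = owner
            counters[t] += 1
    return counters, migrations, packets


targets = [1, 2, 1, 3, 0, 2]
mig_result = increment_all(targets, use_remote=False)
rem_result = increment_all(targets, use_remote=True)
print(mig_result[1:], rem_result[1:])
```

The final counters are identical in both modes. The remote variant trades migrations for one-way packets, which only works because the thread never reads the updated value back.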

Challenges with the EMU system
• Overheads from thread migration, thread creation, and synchronization
• We focus on exploring compiler transformations to reduce these overheads and improve performance
  - High-level compiler transformations
    - Node fusion and edge flipping
  - Low-level compiler transformations
    - Use of remote atomic updates

Agenda
• Introduction
• Compiler transformations
• Evaluation
  - Conductance
  - Bellman-Ford’s algorithm for single-source shortest path
  - Triangle counting
• Conclusions and future work

1) Node fusion

Before:
    parallel-for(v ∈ vertices) {
        p1[v] = …
    } // Implicit barrier
    parallel-for(v ∈ vertices) {
        p2[v] = f(p1[v], …)
    }

After:
    parallel-for(v ∈ vertices) {
        p1[v] = …
        p2[v] = f(p1[v], …)
    }

• Repeated migrations for
  - The same property across parallel loops
  - Different properties of the same vertex across parallel loops
    - Can be reduced by fusing parallel loops
• Fusion can also reduce thread creation and synchronization overhead
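A minimal sequential sketch of node fusion in Python follows. Ordinary `for` loops stand in for the parallel loops, and the per-vertex function `f`, the initial value `v * v`, and the `visits` counter are hypothetical placeholders for the elided parts of the slide's pseudocode. Each visit to a vertex is counted as one potential migration to that vertex's home nodelet.

```python
# Sequential stand-in for the two parallel loops and their fused form.
# Assumption: each loop visit to vertex v may cost one migration to v's nodelet.

def f(x):
    """Hypothetical per-vertex function; any pure function of p1[v] works."""
    return 2 * x + 1


def unfused(vertices):
    p1, p2, visits = {}, {}, 0
    for v in vertices:            # first parallel-for
        p1[v] = v * v
        visits += 1
    # implicit barrier here
    for v in vertices:            # second parallel-for
        p2[v] = f(p1[v])
        visits += 1
    return p1, p2, visits


def fused(vertices):
    p1, p2, visits = {}, {}, 0
    for v in vertices:            # single parallel-for, no barrier in between
        p1[v] = v * v
        p2[v] = f(p1[v])          # legal: depends only on this vertex's p1
        visits += 1
    return p1, p2, visits


verts = range(6)
a = unfused(verts)
b = fused(verts)
print(a[2], b[2])
```

Fusion is only valid here because `p2[v]` reads `p1` of the same vertex. If the second loop read `p1` of a neighbor, the barrier would be needed and the loops could not be merged this way.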

2) Edge flipping

Before:
    for(t-loop) {
        parallel-for(v ∈ vertices) {
            for(u ∈ incoming_neighbors(v))
                p1[v] = f(p1[u], …);
        }
    }

After:
    for(t-loop) {
        parallel-for(v ∈ vertices) {
            cont = f(p1[v], …);
            for(u ∈ outgoing_neighbors(v))
                atomic_update(p1[u], cont);
        }
    }

• Back-and-forth migrations
  - From a vertex to each of its incoming neighbor vertices
    - Can be reduced by pushing the vertex's contribution to its outgoing neighbors
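The equivalence of the pull and push forms can be checked with a small Python sketch. It assumes the update is a sum, so that contributions commute; the graph, the `pull_step` and `push_step` helpers, and the counters are hypothetical and not taken from the slides.

```python
# Pull (incoming-neighbor) vs. push (outgoing-neighbor) forms of one step.
# Assumption: the update is a sum, so contributions can be applied in any order.

edges = [(0, 1), (0, 2), (1, 2), (2, 0), (3, 2)]   # hypothetical directed graph
n = 4
out_nbrs = [[] for _ in range(n)]
in_nbrs = [[] for _ in range(n)]
for src, dst in edges:
    out_nbrs[src].append(dst)
    in_nbrs[dst].append(src)


def pull_step(old):
    """Each vertex reads the property of every incoming neighbor."""
    new = [0] * n
    remote_reads = 0
    for v in range(n):
        for u in in_nbrs[v]:
            new[v] += old[u]       # read of p1[u]: go to u and come back
            remote_reads += 1
    return new, remote_reads


def push_step(old):
    """Each vertex computes its contribution once and pushes it outward."""
    new = [0] * n
    updates = 0
    for v in range(n):
        cont = old[v]              # local read
        for u in out_nbrs[v]:
            new[u] += cont         # stands in for atomic_update(p1[u], cont)
            updates += 1
    return new, updates


old = [1, 10, 100, 1000]
pulled, reads = pull_step(old)
pushed, updates = push_step(old)
print(pulled, pushed)
```

Both forms traverse every edge once. What changes is the direction of the traffic: a pull needs the remote value returned to the reader, while a push is a one-way update that does not need a reply.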


Experimental setup
• Evaluation on a single node of the Emu system
  - Actual hardware on FPGA
• Two experimental variants
  - Original version of a graph algorithm
  - Transformed version after manually applying compiler transformations

Graph applications
• Graph applications
  - Conductance
  - Bellman-Ford’s algorithm for single-source shortest path
  - Triangle counting
    - Developed using the MEATBEE framework (https://github.gatech.edu/ehein6/meatbee)
• Input data sets
  - RMAT graphs from scale 6 to 14 as specified by Graph500
    - #vertices = 2^scale
    - #edges = 16 * #vertices
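For a sense of the input sizes, the short Python sketch below tabulates vertex and edge counts for scales 6 to 14. It assumes the Graph500 convention that the number of vertices is 2 raised to the scale, together with the slide's edge factor of 16; the `rmat_size` helper name is hypothetical.

```python
# Vertex and edge counts for the Graph500-style RMAT inputs on the slide.
EDGE_FACTOR = 16  # from the slide: #edges = 16 * #vertices


def rmat_size(scale):
    """Return (#vertices, #edges) for a given scale, assuming #vertices = 2^scale."""
    vertices = 2 ** scale
    return vertices, EDGE_FACTOR * vertices


sizes = {scale: rmat_size(scale) for scale in range(6, 15)}
for scale, (v, e) in sizes.items():
    print(f"scale {scale:2d}: {v:6d} vertices, {e:7d} edges")
```

Under these assumptions the smallest input has 64 vertices and 1,024 edges, and the largest has 16,384 vertices and 262,144 edges.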

1) Conductance algorithm
• Computes a flow from a given partition of the graph to the others
• Repeated migrations to the same nodelet for the same property from multiple parallel loops
• All the parallel loops can be fused to avoid these overheads
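As an illustration of what the fused loop might compute, here is a minimal Python sketch. The slides do not spell out the formula, so this assumes the common textbook definition (edges leaving the partition divided by the smaller of the two volumes); the `conductance` function and the example graph are hypothetical.

```python
# Conductance of one partition, with the per-vertex reductions fused.
# Assumption: conductance = cut_edges / min(volume_inside, volume_outside),
# the common textbook definition; the slides do not give the formula.

def conductance(adj, part, target):
    """adj: adjacency lists; part[v]: partition id; target: partition of interest."""
    cut = vol_in = vol_out = 0
    for v, nbrs in enumerate(adj):         # one fused pass over the vertices
        if part[v] == target:
            vol_in += len(nbrs)
            cut += sum(1 for u in nbrs if part[u] != target)
        else:
            vol_out += len(nbrs)
    denom = min(vol_in, vol_out)
    return cut / denom if denom else 0.0


# Hypothetical undirected graph: two triangles joined by the edge 2-3.
adj = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
part = [0, 0, 0, 1, 1, 1]
phi = conductance(adj, part, target=0)
print(phi)
```

An unfused version would use one loop per accumulator (`cut`, `vol_in`, `vol_out`), visiting every vertex three times. The single pass touches each vertex's data once.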

Results after node fusion
[Figure: bar chart of speedup after applying loop fusion versus scale of RMAT graphs specified by Graph500 (scales 6 to 14). Bar labels as extracted, not matched to scales: 2.17, 2.12, 2.08, 2.06, 2.02, 2.01, 1.97, 1.66, 1.58]
• Speedups of up to 2.2x (geometric mean: 1.95x)
  - Also, a geometric mean reduction of 6.06% in thread migrations
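The reported geometric mean can be checked against the nine bar labels that survive in the slide text. The sketch below uses only the Python standard library. Which label belongs to which scale is not recoverable from the text, but the geometric mean does not depend on the order.

```python
import math

# Bar labels as they appear in the extracted slide text for conductance.
speedups = [2.17, 2.12, 2.08, 2.06, 2.02, 2.01, 1.97, 1.66, 1.58]


def geometric_mean(values):
    """n-th root of the product, computed via logs for numerical stability."""
    return math.exp(sum(math.log(x) for x in values) / len(values))


gm = geometric_mean(speedups)
print(f"{gm:.2f}")
```

The result rounds to the 1.95x stated on the slide. A geometric mean is the usual choice for averaging ratios such as speedups, because it treats a 2x gain and a 0.5x loss symmetrically.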

2) Bellman-Ford’s algorithm
• Computes shortest paths from a single source vertex to all other vertices in a weighted directed graph
[Figure: back-and-forth migration for every incoming neighbor]
• Edge flipping followed by remote updates can avoid back-and-forth migrations
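A push-style Bellman-Ford, as it might look after edge flipping, can be sketched sequentially in Python. This is an illustration under stated assumptions rather than the authors' implementation: the `relax` helper stands in for a remote atomic min, the example graph is hypothetical, and negative-cycle detection is omitted.

```python
import math

# Push-style Bellman-Ford (after edge flipping), sequential stand-in.
# Assumption: relax() models an atomic min applied at the target's home nodelet.

def bellman_ford_push(n, edges, source):
    """edges: list of (src, dst, weight). Returns shortest distances from source."""
    out_edges = [[] for _ in range(n)]
    for src, dst, w in edges:
        out_edges[src].append((dst, w))

    dist = [math.inf] * n
    dist[source] = 0

    def relax(target, candidate):
        # Stand-in for a remote atomic min: no return value is needed.
        if candidate < dist[target]:
            dist[target] = candidate

    for _ in range(n - 1):                 # the t-loop
        for v in range(n):                 # parallel-for over vertices
            if dist[v] == math.inf:
                continue                   # nothing to contribute yet
            for u, w in out_edges[v]:
                relax(u, dist[v] + w)      # push contribution to out-neighbor
    return dist


# Hypothetical weighted directed graph with no negative cycles.
edges = [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 1), (2, 3, 5)]
dist = bellman_ford_push(4, edges, source=0)
print(dist)
```

Each vertex reads only its own distance and sends one-way updates along its outgoing edges, so no value has to travel back to the sender. That property is what makes the update a candidate for a remote atomic.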

Results after edge flipping + remote updates
[Figure: bar chart of speedup after applying edge flipping + remote updates versus scale of RMAT graphs specified by Graph500 (scales 6 to 12). Bar labels as extracted, not matched to scales: 3.76, 1.94, 1.80, 1.16, 0.99, 0.83, 0.74]
• Speedups of up to 3.8x (geometric mean: 1.38x)
  - Also, a geometric mean reduction of 36.39% in thread migrations

3) Triangle counting
• Computes the number of triangles in a given undirected graph
  - Also computes the number of triangles that each node belongs to
• Regular atomic updates can be replaced with remote updates
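A minimal Python sketch of the computation follows, producing both the global count and the per-vertex counts. It is not the MEATBEE implementation; the `count_triangles` function and the example graph are hypothetical. Every increment below stands in for an atomic add whose return value is unused, which is the case where a remote update applies.

```python
# Triangle counting with a global count and per-vertex counts.
# Assumption: every "+= 1" below stands in for an atomic add whose return
# value is unused, which is the case where a remote update applies.

def count_triangles(adj):
    """adj: adjacency lists of an undirected graph without self-loops."""
    neighbor_sets = [set(nbrs) for nbrs in adj]
    per_vertex = [0] * len(adj)
    total = 0
    for u in range(len(adj)):
        for v in adj[u]:
            if v <= u:
                continue
            for w in adj[v]:
                # Enforce u < v < w so each triangle is found exactly once.
                if w > v and w in neighbor_sets[u]:
                    total += 1
                    per_vertex[u] += 1
                    per_vertex[v] += 1
                    per_vertex[w] += 1
    return total, per_vertex


# Hypothetical graph: triangles 0-1-2 and 1-2-3.
adj = [[1, 2], [0, 2, 3], [0, 1, 3], [1, 2]]
total, per_vertex = count_triangles(adj)
print(total, per_vertex)
```

The counters for `v` and `w` usually live on other nodelets, so with ordinary atomics each increment could pull the thread away from `u`. Since the loop never reads the counters back, those increments can be sent as one-way updates instead.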

Results after using remote updates
[Figure: bar chart of speedup after using remote atomic updates versus scale of RMAT graphs specified by Graph500 (scales 6 to 14). Bar labels as extracted, not matched to scales: 1.26, 1.10, 1.03, 1.03, 1.01, 1.01, 1.01, 1.01, 1.01]
• Speedups of up to 1.3x (geometric mean: 1.05x)
  - Also, a geometric mean reduction of 54.55% in thread migrations

Conclusions & Future work
• The EMU architecture is a potential choice for graph applications
  - But careful attention is required to ensure that the overheads do not outweigh the benefits
  - Evaluated compiler transformations for three graph applications

    Applications                 Transformations
    Conductance                  Node fusion
    Bellman-Ford’s algorithm     Edge flipping + Remote updates
    Triangle counting            Remote updates

• Future work
  - Systematically explore and evaluate more compiler transformations

Any questions?

Acknowledgements
• MCHPC’18 program committee
• Eric Hein and Jeff Young
  - Help getting set up with the EMU machine and the MEATBEE framework
• CRNCH center at Georgia Tech
  - Rogues Gallery: http://crnch.gatech.edu/rogues-emu
