GPU research in the ES-group
Henk Corporaal (professor)
Gert-Jan van den Braak (postdoc)
Roel Jordans (postdoc)
Erkan Diken (PhD)
Rik Jongerius (PhD)
Ang Li (PhD)
Maurice Peemen (PhD)
Luc Waeijen (PhD)
Mark Wijtvliet (PhD)
PARsE research – http://parse.ele.tue.nl/
PARsE: Parallel Architecture Research Eindhoven
• Using advanced heterogeneous platforms
  • Multi-core CPUs
  • GPUs
  • DSPs
  • FPGAs
• Efficient code generation
  • Code transformation & generation
  • Compilers
• Even more efficient: new architectures
  • SIMD, CGRA, R-GPU
  • Accelerators
    − Neural networks (CNNs)
GPU research – overview (selection)
• Application mapping
  • Histogram, CNN
• Understanding GPUs
  • Modeling of GPU L1 cache
  • Cache bypassing
• Architecture modification
  • Hash functions in scratchpad memory
• Code generation
  • Bones source-to-source tools
Application mapping
• Histogram
• Convolutional Neural Networks (CNN)
Application mapping: histogram
• Load pixel
• Update votes
[1] High Performance Predictable Histogramming on GPUs: Exploring and Evaluating Algorithm Trade-offs
Histogram – replication
• Load pixel
• Update votes in replicated histograms
[2] GPU-Vote: A Framework for Accelerating Voting Algorithms on GPU
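A minimal CUDA sketch of this replication scheme, assuming 8-bit pixels, 256 bins and one sub-histogram per thread block in scratchpad (shared) memory; it illustrates the idea only and is not the kernel from [1] or [2]:

#define NUM_BINS 256

// Each thread block votes into its own sub-histogram in shared memory, so
// atomics from different blocks never contend with each other.
__global__ void histogram_replicated(const unsigned char *image, int n,
                                     unsigned int *global_hist)
{
    __shared__ unsigned int sub_hist[NUM_BINS];

    // Clear this block's private copy of the histogram.
    for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x)
        sub_hist[b] = 0;
    __syncthreads();

    // Load pixels and update votes with shared-memory atomics.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += blockDim.x * gridDim.x)
        atomicAdd(&sub_hist[image[i]], 1u);
    __syncthreads();

    // Merge the per-block sub-histograms into the global result.
    for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x)
        atomicAdd(&global_hist[b], sub_hist[b]);
}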
Scratchpad memory layout
• Scratchpad memory is divided in 32 banks
• Each bank has 32 lock-bits, 1024 in total
[Figure: streaming multiprocessor with warp schedulers, dispatch units, cores, LD/ST units, SFUs, register file, instruction cache and scratchpad memory / L1 cache; the scratchpad is drawn as 32 banks, each with its own lock-bits]
Application mapping: CNN
• Convolutional Neural Network (CNN)
• GTX 460: 35 fps
• Tegra X1: ~20 fps
[Figure: network topology – 720x1280 input; Layer 1: 6x6 conv. with 2x2 subsample, 6x358x638; Layer 2: 6x6 conv. with 2x2 subsample, 16x177x317; Layer 3: 5x5 conv., 80x173x313; Layer 4: 1x1 conv., 8x173x313 – object category + position at (x,y)]
[4] Speed Sign Detection and Recognition by Convolutional Neural Networks
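Each layer of such a network is essentially a nest of convolution and subsample loops. The C sketch below shows one output feature map of a merged 6x6 convolution / 2x2 subsample layer; the array layout, dimensions, tanh activation and function name are illustrative assumptions, not the code from [4]:

#include <math.h>

#define MAPS_IN 6       /* illustrative: feature maps of the previous layer */
#define H_IN 358
#define W_IN 638
#define K 6             /* 6x6 convolution kernel */
#define S 2             /* 2x2 subsampling */
#define H_OUT ((H_IN - K + 1) / S)
#define W_OUT ((W_IN - K + 1) / S)

/* One output feature map of a 6x6 convolution merged with 2x2 subsampling:
 * the output is only evaluated at every second position, so convolution and
 * subsampling become a single loop nest (boundary handling simplified). */
void conv_layer(const float in[MAPS_IN][H_IN][W_IN],
                const float w[MAPS_IN][K][K], float bias,
                float out[H_OUT][W_OUT])
{
    for (int y = 0; y < H_OUT; y++) {
        for (int x = 0; x < W_OUT; x++) {
            float acc = bias;
            for (int m = 0; m < MAPS_IN; m++)      /* input feature maps */
                for (int ky = 0; ky < K; ky++)     /* kernel window      */
                    for (int kx = 0; kx < K; kx++)
                        acc += w[m][ky][kx] * in[m][S * y + ky][S * x + kx];
            out[y][x] = tanhf(acc);                /* assumed activation */
        }
    }
}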
Understanding GPUs
• Modeling of GPU L1 cache
• Cache bypassing
• Transit model
Understanding GPUs: L1 cache modeling
• GPU cache model:
  • Execution model (threads, thread blocks)
  • Memory latencies
  • MSHRs (pending memory requests)
  • Cache associativity
[5] A Detailed GPU Cache Model Based on Reuse Distance Theory
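Reuse distance theory, on which [5] builds, assigns to every access the number of distinct cache lines touched since the previous access to the same line; an LRU cache of associativity A then hits exactly when the per-set reuse distance is below A. The model in [5] extends this basic idea with the GPU execution model, latencies and MSHRs. Below is a minimal, quadratic-time C sketch of the reuse-distance computation itself (trace representation and function name are illustrative):

#include <stddef.h>

// Reuse distance of access i: number of DISTINCT lines referenced since the
// previous access to trace[i]; (size_t)-1 ("infinite") for a first-time access.
size_t reuse_distance(const unsigned long *trace, size_t i)
{
    unsigned long line = trace[i];
    size_t distinct = 0;

    for (size_t j = i; j-- > 0; ) {          // walk back to the previous use
        if (trace[j] == line)
            return distinct;
        int seen = 0;                        // count each line only once
        for (size_t k = j + 1; k < i && !seen; k++)
            seen = (trace[k] == trace[j]);
        if (!seen)
            distinct++;
    }
    return (size_t)-1;                       // no previous access: "infinite"
}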
L1 cache model – results
• Mean absolute error of 6.4%
Understanding GPUs: cache bypassing
[6] Adaptive and Transparent Cache Bypassing for GPUs
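On NVIDIA GPUs an individual load can be steered past the L1 with the .cg (cache-global, L2-only) PTX qualifier, or a whole compilation with -Xptxas -dlcm=cg. The sketch below shows only this static, hard-coded per-load mechanism; it is not the adaptive, transparent selection scheme that [6] proposes:

// Load a float while bypassing the L1 cache (cached in L2 only) via the .cg
// qualifier in inline PTX; per-load equivalent of -Xptxas -dlcm=cg.
__device__ float load_bypass_l1(const float *p)
{
    float v;
    asm volatile("ld.global.cg.f32 %0, [%1];" : "=f"(v) : "l"(p));
    return v;
}

// Example use: stream a large array through L2 only, keeping L1 free for data
// that is actually reused. Deciding *which* loads to bypass is what [6]
// automates; here the choice is simply hard-coded.
__global__ void scale_streaming(const float *in, float *out, int n, float a)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = a * load_bypass_l1(&in[i]);
}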
Cache bypassing – results
[6] Adaptive and Transparent Cache Bypassing for GPUs
Understanding GPUs: Transit model
• Transit model: computation and memory sub-systems
[7] Transit: A Visual Analytical Model for Multithreaded Machines
Architecture modifications
• Scratchpad memory hash functions
• R-GPU
GPU modifications: bank & lock conflicts
• addr = 32 * id: all addresses fall in bank 0
[Figure: 16 cores issuing scratchpad accesses; part of the address selects the lock-bit, part selects one of the 32 banks – with a stride of 32 every access selects bank 0]
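Why a stride of 32 words is the worst case follows directly from the bank mapping. A small C sketch; the word-addressed "address mod 32" mapping below is the usual 32-bank scheme, assumed for illustration:

#include <stdio.h>

#define NUM_BANKS 32

// Classic scratchpad mapping: consecutive 32-bit words go to consecutive
// banks, so the bank index is simply the word address modulo 32.
static unsigned bank_of(unsigned word_addr) { return word_addr % NUM_BANKS; }

int main(void)
{
    // One warp of 32 threads, each accessing addr = 32 * id: every thread
    // lands in bank 0, a 32-way bank conflict, so the accesses are
    // serialized instead of being served in a single cycle.
    for (unsigned id = 0; id < 32; id++)
        printf("thread %2u: addr %4u -> bank %u\n", id, 32 * id, bank_of(32 * id));
    return 0;
}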
Resolving bank conflicts: hash functions
• addr = 32 * id: the scratchpad address is passed through a hash function before bank selection, spreading the accesses over the banks
[8] Simulation and Architecture Improvements of Atomic Operations on GPU Scratchpad Memory
Resolving bank conflicts: configurable hash functions
• addr = 33 * id: bank selected through a configurable hash function
[9] Configurable XOR Hash Functions for Banked Scratchpad Memories in GPUs
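A configurable XOR hash of the kind studied in [9] computes each bank-select bit as the parity (XOR) of a programmable subset of address bits. The C sketch below is a generic illustration only; the mask values and names are made up, not the configurations from [9]:

// Each of the 5 bank-select bits is the XOR (parity) of the address bits
// chosen by a per-bit mask; reprogramming the masks changes the hash.
// __builtin_parity is a GCC/Clang builtin.
static unsigned xor_hash_bank(unsigned addr, const unsigned mask[5])
{
    unsigned bank = 0;
    for (int b = 0; b < 5; b++)
        bank |= (unsigned)__builtin_parity(addr & mask[b]) << b;
    return bank;                      // 0..31
}

// Illustrative configuration: fold address bits 5..9 onto the classic
// "word address mod 32" bits 0..4.
static const unsigned example_mask[5] = {
    0x001 | 0x020,   // bank bit 0 = addr bit 0 XOR addr bit 5
    0x002 | 0x040,   // bank bit 1 = addr bit 1 XOR addr bit 6
    0x004 | 0x080,
    0x008 | 0x100,
    0x010 | 0x200,
};

// With this configuration, xor_hash_bank(32 * id, example_mask) maps the 32
// threads of a warp to 32 different banks, removing the conflict above.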
Architecture modifications: R-GPU
[Figure: R-GPU organization – the SM's cores, LD/ST units and shared memory / L1 cache interconnected through CR blocks]
Code generation: ASET & Bones
• How to generate efficient code for all these devices from sequential C code?
• Targets: CPU-OpenMP, CPU-OpenCL-Intel, CPU-OpenCL-AMD, GPU-CUDA, GPU-OpenCL-AMD, Multi-GPU (CUDA / OpenCL), XeonPhi-OpenCL, FPGA
[10] Automatic Skeleton-Based Compilation through Integration with an Algorithm Classification
Code generation: ASET & Bones
• sequential C code
  → 'ASET': Algorithmic Species Extraction Tool (PET / LLVM)
• species-annotated C code
  → 'Bones': skeleton-based compiler
• Targets: CPU-OpenMP, CPU-OpenCL-Intel, CPU-OpenCL-AMD, GPU-CUDA, GPU-OpenCL-AMD, Multi-GPU (CUDA / OpenCL), XeonPhi-OpenCL, FPGA
[10] Automatic Skeleton-Based Compilation through Integration with an Algorithm Classification
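As an illustration of what the intermediate "species-annotated C code" looks like, a hedged sketch is given below; the exact pragma syntax is reproduced from memory of the Bones/ASET publications and may differ in detail from the actual tools:

// Sequential C loop annotated with an algorithmic species: each output
// element of r is produced from one element of a (an "element -> element"
// pattern), which Bones maps onto a matching parallel skeleton for the
// chosen target (CUDA, OpenCL, OpenMP, ...).
#pragma species kernel a[0:N-1]|element -> r[0:N-1]|element
for (int i = 0; i < N; i++)
    r[i] = 2 * a[i];
#pragma species endkernel scale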
Example C to CUDA transformation
[Figure: tree-based summation of the example array 3 1 7 0 4 1 6 3]

Example 1: Sum (sequential C code)

int sum = 0;
for (int i = 0; i < N; i++) {
    sum = sum + in[i];
}

Optimized CUDA reduction kernel (one partial sum per thread block):

template <unsigned int blockSize>
__device__ void warpReduce(volatile int *sm, unsigned int tid) {
    if (blockSize >= 64) sm[tid] += sm[tid + 32];
    if (blockSize >= 32) sm[tid] += sm[tid + 16];
    if (blockSize >= 16) sm[tid] += sm[tid +  8];
    if (blockSize >=  8) sm[tid] += sm[tid +  4];
    if (blockSize >=  4) sm[tid] += sm[tid +  2];
    if (blockSize >=  2) sm[tid] += sm[tid +  1];
}

template <unsigned int blockSize>
__global__ void reduce6(int *g_idata, int *g_odata, unsigned int n) {
    extern __shared__ int sm[];
    unsigned int tid = threadIdx.x;
    unsigned int i = blockIdx.x * (blockSize * 2) + tid;
    unsigned int gridSize = blockSize * 2 * gridDim.x;
    sm[tid] = 0;
    while (i < n) {
        sm[tid] += g_idata[i];
        sm[tid] += g_idata[i + blockSize];
        i += gridSize;
    }
    __syncthreads();
    if (blockSize >= 512) { if (tid < 256) { sm[tid] += sm[tid + 256]; } __syncthreads(); }
    if (blockSize >= 256) { if (tid < 128) { sm[tid] += sm[tid + 128]; } __syncthreads(); }
    if (blockSize >= 128) { if (tid <  64) { sm[tid] += sm[tid +  64]; } __syncthreads(); }
    if (tid < 32) { warpReduce<blockSize>(sm, tid); }
    if (tid == 0) { g_odata[blockIdx.x] = sm[0]; }
}