SOCAO: Source-to-Source OpenCL Compiler for Intel-Altera FPGAs - PowerPoint PPT Presentation

SOCAO: Source-to-Source OpenCL Compiler for Intel-Altera FPGAs Johanna Rohde 1 , Marcos Martinez-Peiró 2 , Rafael Gadea-Gironés 2 Date: 7.9.2017 1 Computer Systems Group, TU Darmstadt 2 Department of Electronic Engineering, Universitat Politècnica de València 07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 1

Overview ● Introduction ● Background ● Design ● Implementation ● Evaluation ● Conclusion 07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 2

Introduction Problem: Accelerate a program with an FPGA ● How do I program the FPGA? ● How do I communicate with the FPGA? ● How much time do I need to rewrite the code? 07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 3

Introduction Low Level OpenCL SOCAO Compiler ASIC Tools for FPGA for C to OpenCL FPGA Programmers Parallel Programmers Sofuware Programmers 07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 4

Background ● OpenCL Open programming standard for heterogeneous parallel systems – Calculations are passed to external accelerator – Accelerator can be – ● CPU ● GPU ● FPGA ● ... 07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 5

Background OpenCL Platform ● Host – Manages the system ● Host Is connected to one or more ● compute devices Compute Device – Compute Device Executes a kernel Compute Unit Compute Unit Compute Unit ● PE PE PE PE PE PE Consists of multiple compute ... ... ... ... ● units PE PE PE PE PE PE Compute Unit – Consists of multiple processing ● elements 07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 6

Background OpenCL ● Memory Model – Global Memory ● Used to transfer data ● Accessible by all work groups DEVICE Global ● Normally the slowest memory Memory WORK GROUP WORK Host HOST – Constant Memory ITEM Memory Local Private Memory Memory ● Used to save constants Constant WORK ITEM Memory – Local Memory Private Memory ... ● Accessible by all work items of one work group ● Not accessible by host – Private Memory ● Accessible by one work item ● Holds intermediate values 07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 7

Background OpenCL Host Program Flow ● Host OpenCL Host OpenCL Program Kernel Program Kernel Set Kernel Write Buffer Arguments Initialize Load Kernel Environment Launch Kernel Read Buffer Execution 07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 8

Background OpenCL for FPGAs ● 2 additional forms of parallelism ● Instruction-level parallelism – Instructions that are independent of each other can be calculated at the same time a b + a b c d c d + + + x = ( a + b )*( c + d ); y = x - e ; * e 4 * time e z = x << 4 ; - << - y z y 4 << z 07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 9

Background OpenCL for FPGAs ● Loop Pipelining – Iterations are overlapped – Ideal case: One Iteration per clock cycle – Problem when the loop has loop-carried dependencies time +/+ * <</- for( int a = 0 ; a < 4 ; a ++) iteration 1: { +/+ * <</- iteration 2: x = ( a + b )*( c + d ); +/+ * <</- y = x << 4 ; iteration 3: z = x - e ; +/+ * <</- } iteration 4: 07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 10

Background OpenCL for FPGAs ● Intel’s SDK for OpenCL Host Code OpenCL Accelerator Code void sum ( int * A , int * B , __kernel void void sum ( int * A , int * B , __kernel void int * res , int size ) sum ( __global int * A , int * res , int size ) sum ( __global int * A , { __global int * B , { __global int * B , clEnqueueWriteBuffer (...); __global int * res , clEnqueueWriteBuffer (...); __global int * res , ClEnqueueTask (...); int size ) ClEnqueueTask (...); int size ) ClEnqueueReadBuffer (...); { ClEnqueueReadBuffer (...); { } for( int i = 0 ; i < size ; i ++) } for( int i = 0 ; i < size ; i ++) Intel Intel res [ i ] = A [ i ] + B [ i ]; res [ i ] = A [ i ] + B [ i ]; Cross Cross Offmine } Offmine } Compiler Compiler Compiler Compiler . exe . exe . aocx . aocx 07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 11

Design Input Program //Altera_OpenCL_Accelerate //Altera_OpenCL_Accelerate //Altera_OpcnCL_size A size //Altera_OpcnCL_size A size //Altera_OpenCL... //Altera_OpenCL... void sum ( int * A , int * B , int * res , int size ) void sum ( int * A , int * B , int * res , int size ) { { for( int i = 0 ; i < size ; i ++) for( int i = 0 ; i < size ; i ++) res [ i ] = A [ i ]+ B [ i ]; res [ i ] = A [ i ]+ B [ i ]; } } SOCAO Compiler SOCAO Compiler SOCAO Compiler SOCAO Compiler Host Code OpenCL Accelerator Code void sum ( int * A , int * B , __kernel void void sum ( int * A , int * B , __kernel void int * res , int size ) aocl_generated_kernel ( int * res , int size ) aocl_generated_kernel ( { __global int * A , { __global int * A , clEnqueueWriteBuffer ( ... ); __global int * B , clEnqueueWriteBuffer ( ... ); __global int * B , clEnqueueTask ( … ); __global int * res , clEnqueueTask ( … ); __global int * res , clEnqueueReadBuffer ( … ); int size ) clEnqueueReadBuffer ( … ); int size ) } { } { for( int i = 0 ; i < size ; i ++) for( int i = 0 ; i < size ; i ++) res [ i ] = A [ i ] + B [ i ]; res [ i ] = A [ i ] + B [ i ]; } } 07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 12

Implementation Front-end Middle-end Back-end Semantic Check Unparse Start Function Detection yes Parse Successful End Function Analysis & no yes Transformation Successful Abort no Host Program Abort Transformation Kernel Creation • ROSE Framework – Open-source compiler framework – Provides front-end, back-end and additional functionalities 07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 13

Implementation Front-end Middle-end Back-end Semantic Check Unparse Start Function Detection yes Parse Successful End Function Analysis & no yes Transformation Successful Abort no Host Program Abort Transformation Kernel Creation • ROSE Framework – Open-source compiler framework – Provides front-end, back-end and additional functionalities 07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 14

Implementation Inline Constant Value Constant Constant Array 2D to 1D Array Transformation Transformation Folding Analysis Transformation Memory Typedef Parameter Loop Unrolling In/Out Analysis Analysis Analysis Analysis ● The Function Analysis & Transformation phase is the most important ● All decisions are made during this phase ● Consists of 10 analysis/transformation steps 07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 15

Implementation Inline Constant Value Constant Constant Array 2D to 1D Array Transformation Transformation Folding Analysis Transformation Memory Typedef Parameter Loop Unrolling In/Out Analysis Analysis Analysis Analysis void vector_process ( char * input , void vector_update ( char * input , char value ) int ilen ) { { int i ; { for( i = 0 ; i < 64 ; i ++) char value_1 = 'c' ; input [ i ] += value ; int i ; } for( i = 0 ; i < 64 ; i ++) input [ i ] += value_1 ; void vector_update ( char * input , } int ilen ) } { vector_process ( input , 'c' ); } 07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 16

Implementation Inline Constant Value Constant Constant Array 2D to 1D Array Transformation Transformation Folding Analysis Transformation Memory Typedef Parameter Loop Unrolling In/Out Analysis Analysis Analysis Analysis program.cpp aocl_kernel.cl const uint32_t K [] = { 0x428A2F98 , __constant const uint32_t K [] = { …}; 0x428A2F98 , …}; //Altera_OpenCL_Accelerate __kernel void //Altera_OpenCL_size K 64 aocl_generated_kernel ( …. ) //Altera_OpenCL_const_vec K { … … void update_accelerated ( … ) } 07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 17

Implementation Inline Constant Value Constant Constant Array 2D to 1D Array Transformation Transformation Folding Analysis Transformation Memory Typedef Parameter Loop Unrolling In/Out Analysis Analysis Analysis Analysis #define WIDTH 512 #define WIDTH 512 int [ 5 ][ WIDTH ] A ; __kernel void aocl_generated_kernel ( int //Altera_OpenCL_Accelerate __global __restrict__ * A , ...) //Altera_OpenCL_size A 2560 { void func (...) ... { int tmp = A [ 2 * WIDTH + 3 ]; int tmp = A [ 2 ][ 3 ]; } } 07.09.17 | TU Darmstadt | Computer Systems Group | Johanna Rohde | 18

SOCAO: Source-to-Source OpenCL Compiler for Intel-Altera FPGAs - PowerPoint PPT Presentation

SOCAO: Source-to-Source OpenCL Compiler for Intel-Altera FPGAs Johanna Rohde 1 , Marcos Martinez-Peir 2 , Rafael Gadea-Girons 2 Date: 7.9.2017 1 Computer Systems Group, TU Darmstadt 2 Department of Electronic Engineering, Universitat

OpenCL Kernel Compilation Slides taken from Hands On OpenCL by Simon McIntosh-Smith, Tom Deakin,

Compiler Construction Chapter 11 1 Compiler Construction Compiler Construction A New Compiler

Gzip Compression Using Altera OpenCL Mohamed Abdelfattah (University of Toronto) Andrei Hagiescu

Real Time Linux Scheduling Comparison Vince Bridgers Software Architect Altera Corporation Who

Intel Labs Haskell Research Compiler Hai (Paul) Liu with Neal Glew, Leaf Peterson, Todd A.

Investigation of the OpenCL support in the GeantV's Vectorized Geometry Gabor Biro 22.09.2014.

The OpenCL C++ API Slides taken from Hands On OpenCL by Simon McIntosh-Smith, Tom Deakin, James

Introduction to OpenCL David Black-Schaffer david.black-schaffer@it.uu.se 1 Disclaimer I

OpenCL on FPGAs Contains material from Hands On OpenCL by Simon McIntosh-Smith, Tom Deakin, James

Synchronization in OpenCL Slides taken from Hands On OpenCL by Simon McIntosh-Smith, Tom Deakin,

Digital System Design Lecture 12: Altera-Xilinx SOPC Amir Masoud Gharehbaghi

Intel Case Intel Case Processor Serial Number (PSN) Processor Serial Number (PSN) 5/9/99 Group

Validation Labs with OpenStack Shuquan Huang, Intel IT Engineering Computing Weibo: @

5G Cloud Native from RAN to Core Christian Maciocco, Intel Shilpa Talwar, Intel Saikrishna

AFS at Intel AFS at Intel Travis Broughton Travis Broughton Agenda Agenda Intels

11/8/2012 The Structure of a Compiler (2) The Structure of a Compiler (1) Any compiler must

Symmetry Breaking Marco Chiarandini Department of Mathematics & Computer Science University

Lecture 18: Applications of Dynamic Programming Steven Skiena Department of Computer Science

BREAK POINTS BREAK POINTS Matthieu R Bloch Tuesday, March 3, 2020 1 LOGISTICS LOGISTICS TAs

22 nd May 2013 Welcome Annie Gunner Logan Director, CCPS Matt will be taking photographs today

5.1 Online versus Offline SVMs We start with a review of the Offline Support Vector Machine.

Budget Managers Meeting April 19, 2013 Agenda Allocation Communications NUPlans Update

Iron Mountain Reports Fourth Quarter and Full Year 2017 Results BOSTON February 16, 2018

CCL Industries Inc. CCL Industries Inc. Investor Update 4th Quarter 2013 Review February 20,

Sambuz

Useful Links

Newsletter

Mail Us

SOCAO: Source-to-Source OpenCL Compiler for Intel-Altera FPGAs - PowerPoint PPT Presentation

SOCAO: Source-to-Source OpenCL Compiler for Intel-Altera FPGAs Johanna Rohde 1 , Marcos Martinez-Peir 2 , Rafael Gadea-Girons 2 Date: 7.9.2017 1 Computer Systems Group, TU Darmstadt 2 Department of Electronic Engineering, Universitat

OpenCL Kernel Compilation Slides taken from Hands On OpenCL by Simon McIntosh-Smith, Tom Deakin,

Compiler Construction Chapter 11 1 Compiler Construction Compiler Construction A New Compiler

Gzip Compression Using Altera OpenCL Mohamed Abdelfattah (University of Toronto) Andrei Hagiescu

Real Time Linux Scheduling Comparison Vince Bridgers Software Architect Altera Corporation Who

Intel Labs Haskell Research Compiler Hai (Paul) Liu with Neal Glew, Leaf Peterson, Todd A.

Investigation of the OpenCL support in the GeantV's Vectorized Geometry Gabor Biro 22.09.2014.

The OpenCL C++ API Slides taken from Hands On OpenCL by Simon McIntosh-Smith, Tom Deakin, James

Introduction to OpenCL David Black-Schaffer david.black-schaffer@it.uu.se 1 Disclaimer I

OpenCL on FPGAs Contains material from Hands On OpenCL by Simon McIntosh-Smith, Tom Deakin, James

Synchronization in OpenCL Slides taken from Hands On OpenCL by Simon McIntosh-Smith, Tom Deakin,

Digital System Design Lecture 12: Altera-Xilinx SOPC Amir Masoud Gharehbaghi

Intel Case Intel Case Processor Serial Number (PSN) Processor Serial Number (PSN) 5/9/99 Group

Validation Labs with OpenStack Shuquan Huang, Intel IT Engineering Computing Weibo: @

5G Cloud Native from RAN to Core Christian Maciocco, Intel Shilpa Talwar, Intel Saikrishna

AFS at Intel AFS at Intel Travis Broughton Travis Broughton Agenda Agenda Intels

11/8/2012 The Structure of a Compiler (2) The Structure of a Compiler (1) Any compiler must

Symmetry Breaking Marco Chiarandini Department of Mathematics &amp; Computer Science University

Lecture 18: Applications of Dynamic Programming Steven Skiena Department of Computer Science

BREAK POINTS BREAK POINTS Matthieu R Bloch Tuesday, March 3, 2020 1 LOGISTICS LOGISTICS TAs

22 nd May 2013 Welcome Annie Gunner Logan Director, CCPS Matt will be taking photographs today

5.1 Online versus Offline SVMs We start with a review of the Offline Support Vector Machine.

Budget Managers Meeting April 19, 2013 Agenda Allocation Communications NUPlans Update

Iron Mountain Reports Fourth Quarter and Full Year 2017 Results BOSTON February 16, 2018

CCL Industries Inc. CCL Industries Inc. Investor Update 4th Quarter 2013 Review February 20,

Sambuz

Useful Links

Newsletter

Mail Us

Symmetry Breaking Marco Chiarandini Department of Mathematics & Computer Science University