HeteroRefactor: Refactoring for Heterogeneous Computing with FPGA
Jason Lau*, Aishwarya Sivaraman*, Qian Zhang*, Muhammad Ali Gulzar, Jason Cong, Miryung Kim
University of California, Los Angeles
*Equal co-first authors in alphabetical order
FPGA*-based Acceleration
Fast and efficient, but at the cost of programming effort.
* FPGA: Field Programmable Gate Array
Evolution of Programming Model: Verilog HDL*
Annotations: typeless; registers; instructions; goto-style control.

    module vecdot(a, b, c, clk, rst);
      input [67:0] a, b;
      output [16:0] c;
      reg [5:0] s;
      reg [16:0] prod [0:7];
      ...
      always @(posedge clk or posedge rst)
        if (!rst) begin
          if (s == 6'b00001)
            prod[0] = a[..] * b[..]; prod[1] =...
            s = 6'b00010;
          else if (s == 6'b00010)
            reg1 = prod[0] + prod[1] + prod[2];
            s = 6'b00100; // goto L00100;
          else if (s == 6'b00100)
            reg1 = reg1 + prod[3] + prod[4];
            s = 6'b01000;
          else ... ;
      ...
    endmodule

* HDL: Hardware Description Language
Evolution of Programming Model: Merlin HLS*, etc.
Annotations: typed; auto schedule; auto resource; auto optimization.

    fpga_float<8,15> vecdot(fpga_float<8,15> a[],
                            fpga_float<8,15> b[],
                            fpga_int<31> n) {
      fpga_float<8,15> sum = 0;  // accumulator (declaration omitted on the slide)
      for (fpga_int<31> i = 0; i < n; i++)
        sum += a[i] * b[i];
      return sum;
    }

* HLS: High-Level Synthesis
Something is missing... bit-width.
Merlin HLS*, etc.

    fpga_float<8,15> vecdot(fpga_float<8,15> a[],
                            fpga_float<8,15> b[],
                            fpga_int<31> n) {
      fpga_float<8,15> sum = 0;  // accumulator (declaration omitted on the slide)
      for (fpga_int<31> i = 0; i < n; i++)
        sum += a[i] * b[i];
      return sum;
    }

bitwidth = 31: an over-wide integer wastes scarce memory! FPGA memory: < 100 MB.
* HLS: High-Level Synthesis
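The cost of an over-wide integer can be made concrete with a small sketch. It assumes unsigned values whose maximum is known (for example, from profiling); `bits_needed` and `storage_bits` are hypothetical helpers introduced here for illustration and are not part of Merlin, Vivado HLS, or HeteroRefactor.

```cpp
#include <cassert>
#include <cstdint>

// Smallest number of bits that can represent every unsigned value in
// [0, max_value]. Hypothetical helper, for illustration only.
inline unsigned bits_needed(std::uint64_t max_value) {
    unsigned bits = 1;                    // even the value 0 occupies one bit
    while ((max_value >>= 1) != 0) ++bits; // one more bit per remaining binary digit
    return bits;
}

// Total storage, in bits, for `count` elements that are `width` bits wide.
inline std::uint64_t storage_bits(std::uint64_t count, unsigned width) {
    assert(width > 0);
    return count * width;
}
```

Under these assumptions, a counter that never exceeds 1000 needs `bits_needed(1000)` = 10 bits, so 1000 such values take 10,000 bits instead of the 31,000 bits required at width 31.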
Something is missing... bit-width. floating-point precision.
Merlin HLS*, etc.
fpga_float<8,15>: exponent 8 bits, fraction 15 bits.

    fpga_float<8,15> vecdot(fpga_float<8,15> a[],
                            fpga_float<8,15> b[],
                            fpga_int<31> n) {
      fpga_float<8,15> sum = 0;  // accumulator (declaration omitted on the slide)
      for (fpga_int<31> i = 0; i < n; i++)
        sum += a[i] * b[i];
      return sum;
    }

Precision? Memory?
* HLS: High-Level Synthesis
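To get a feel for what a 15-bit fraction costs in precision, the sketch below emulates it on a CPU by truncating the 23-bit fraction of an IEEE-754 single-precision `float`. This is an assumption-laden approximation: `truncate_fraction` is a hypothetical helper, it truncates rather than rounds, and the real `fpga_float<8,15>` type may behave differently.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Keep only the top `fraction_bits` bits of the 23-bit fraction of an
// IEEE-754 binary32 value, zeroing the rest (truncation, not rounding).
// Assumes `float` is IEEE-754 binary32, which holds on common platforms.
inline float truncate_fraction(float x, unsigned fraction_bits) {
    assert(fraction_bits <= 23);
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);          // well-defined type punning
    const unsigned dropped = 23 - fraction_bits;  // low fraction bits to clear
    const std::uint32_t mask = ~((std::uint32_t{1} << dropped) - 1u);
    bits &= mask;                                 // sign and exponent stay intact
    std::memcpy(&x, &bits, sizeof bits);
    return x;
}
```

For a normal value, the relative error introduced by truncating to 15 fraction bits stays below 2^-15, while each stored value shrinks from 32 to 24 bits (1 sign + 8 exponent + 15 fraction).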
Something is missing... bit-width. floating-point precision. recursive data structure.
Merlin HLS*, etc.: 4 errors in 14 lines of code

    struct Node {
      Node *left, *right;
      int val;
    };
    void init(Node **root) {
      *root = (Node *)malloc(sizeof(Node));
    }
    void insert(Node **root, int *arr);
    void delete_tree(Node *root) {
      // …
      free(root);
    }
    void traverse(Node *curr) {
      if (curr == NULL) return;
      int ret = visit(curr->val);
      traverse(curr->left);
      traverse(curr->right);
    }

Errors flagged: nested pointers; dynamic mem mgmt; pointer operations; recursive functions.
Open question: preallocated size?
* HLS: High-Level Synthesis
Evolution of Programming Model
[Figure: programmability (language features, programming difficulty, etc.) versus year, 1989 to 2017. The CPU line keeps evolving through ANSI C, C++03, C++11, C++14, and C++17. The FPGA line evolves from HDL through untimed C/C++ descriptions and Vivado HLS C/C++ to Merlin HLS with simplified pragmas. The gap between the two lines is annotated: significant human effort & error-prone; waste scarce memory.]
Credit: A Multi-Paradigm Programming Infrastructure for Heterogeneous Architectures by Cong et al.
I want it to run!
I want it to run efficiently!
Automation!
HeteroRefactor
One-click pipeline: C++ Inputs → Instrumentation → Refactoring → Selective Offloading → Vivado HLS / Merlin
Recursive Data Structures: Support and Optimization
Integers: Bitwidth Optimization
Floating Points: Bitwidth Optimization
Part 1. Dynamic Data Structures
Dynamic Data Structures: Instrumentation
For recursive data structures, instrumentation of the C++ inputs yields two quantities: data structure size and recursion depth.
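A minimal sketch of what such instrumentation could look like on the CPU side is shown below. It is not HeteroRefactor's actual instrumentation: `Profile`, `profiled_alloc`, `profiled_free`, and `DepthGuard` are hypothetical names, and the sketch simply tracks the peak number of live nodes and the peak recursion depth during a profiling run.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>

struct Node { Node *left, *right; int val; };

// Hypothetical profiling counters gathered during a CPU run.
struct Profile {
    std::size_t live_nodes = 0, max_live_nodes = 0;  // data structure size
    std::size_t depth = 0, max_depth = 0;            // recursion depth
};
static Profile g_profile;

// Wraps malloc and records the peak number of simultaneously live nodes.
static Node *profiled_alloc(int val) {
    Node *n = static_cast<Node *>(std::malloc(sizeof(Node)));
    assert(n != nullptr);
    n->left = n->right = nullptr;
    n->val = val;
    ++g_profile.live_nodes;
    g_profile.max_live_nodes = std::max(g_profile.max_live_nodes, g_profile.live_nodes);
    return n;
}

static void profiled_free(Node *n) {
    if (n == nullptr) return;
    std::free(n);
    --g_profile.live_nodes;
}

// RAII guard: the constructor marks function entry, the destructor marks return.
struct DepthGuard {
    DepthGuard() {
        ++g_profile.depth;
        g_profile.max_depth = std::max(g_profile.max_depth, g_profile.depth);
    }
    ~DepthGuard() { --g_profile.depth; }
};

static void traverse(Node *curr) {
    DepthGuard guard;              // counts this activation, including null calls
    if (curr == nullptr) return;
    traverse(curr->left);
    traverse(curr->right);
}
```

After a representative run, `g_profile.max_live_nodes` and `g_profile.max_depth` give candidate sizes for a preallocated node array and an explicit call stack.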
Dynamic Data Structures: Refactoring
Recursive Data Structures: Support and Optimization
Inputs from instrumentation: data structure size, recursion depth.
Refactoring rules: Rewrite Memory Management; Modify Pointer Access; Convert Recursion.
Example Program

    void init(Node **root) {
      *root = (Node *)malloc(sizeof(Node));
    }
    void delete_tree(Node *root) {
      // …
      free(root);
    }
    void traverse(Node *curr) {
      // entry
      if (curr == NULL) return;
      int ret = visit(curr->val);
      traverse(curr->left);
      traverse(curr->right);
      // return
    }
    // top-level function
    float kernel(float input[], int n) {
      float value = computation(float(..), ..);
    }
Refactoring Rule 1: Rewrite Mem. Mgmt.

Before:

    void init(Node **root) {
      *root = (Node *)malloc(sizeof(Node));
    }
    void delete_tree(Node *root) {
      // …
      free(root);
    }

After:

    void init(Node_ptr *root) {
      *root = (Node_ptr)Node_malloc(sizeof(Node));
    }
    void delete_tree(Node_ptr root) {
      // …
      Node_free(root);
    }
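The slide does not show how `Node_malloc` and `Node_free` are implemented. One way to picture them is a fixed-size pool with a free list, sketched below. Everything beyond the names on the slide is an assumption: `Node_ptr` is taken to be an array index, index 0 is reserved to play the role of NULL (`NODE_NULL`), the pool size is arbitrary, and the allocator used by the real tool may be more elaborate.

```cpp
#include <cassert>
#include <cstddef>

typedef int Node_ptr;                 // assumption: a "pointer" is an index into Node_arr
const Node_ptr NODE_NULL = 0;         // assumption: slot 0 is reserved and means NULL
struct Node { Node_ptr left, right; int val; };

const int NODE_ARR_SIZE = 8;          // would come from the profiled data structure size
static Node Node_arr[NODE_ARR_SIZE];  // preallocated storage replacing the heap
static Node_ptr free_head = NODE_NULL; // head of the list of freed slots
static int next_unused = 1;           // first slot that has never been handed out

// The size argument is kept only to mirror the call on the slide.
static Node_ptr Node_malloc(std::size_t size) {
    assert(size == sizeof(Node));
    if (free_head != NODE_NULL) {      // reuse a previously freed slot first
        Node_ptr p = free_head;
        free_head = Node_arr[p].left;
        return p;
    }
    if (next_unused < NODE_ARR_SIZE) return next_unused++;
    return NODE_NULL;                  // pool exhausted
}

static void Node_free(Node_ptr p) {
    if (p == NODE_NULL) return;
    Node_arr[p].left = free_head;      // the left field doubles as the free-list link
    free_head = p;
}
```

Because the pool is a plain static array, its size is known at compile time, which is what lets a profiled data structure size stand in for an unbounded heap.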
Refactoring Rule 2: Rewrite Pointer Access

Before:

    void traverse(Node_ptr curr) {
      if (curr == NULL) return;
      int ret = visit(curr->val);
      traverse(curr->left);
      traverse(curr->right);
    }

After:

    Node Node_arr[NODE_ARR_SIZE];
    void traverse(Node_ptr curr) {
      if (curr == NULL) return;
      int ret = visit(Node_arr[curr].val);
      traverse(Node_arr[curr].left);
      traverse(Node_arr[curr].right);
    }
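The rewritten access pattern can be exercised on a CPU with a small self-contained sketch. As assumptions for illustration, `Node_ptr` is an integer index, index 0 stands in for NULL (`NODE_NULL`), and `visit` simply accumulates the values it sees; none of these details are stated on the slide.

```cpp
#include <cassert>

typedef int Node_ptr;                 // assumption: index into Node_arr
const Node_ptr NODE_NULL = 0;         // assumption: slot 0 is reserved as NULL
struct Node { Node_ptr left, right; int val; };

const int NODE_ARR_SIZE = 8;
static Node Node_arr[NODE_ARR_SIZE];  // every node lives in this array

static int visited_sum = 0;           // hypothetical visit(): sums the values
static int visit(int val) { visited_sum += val; return val; }

// Same shape as the slide's "after" code: each curr->field becomes
// Node_arr[curr].field, so no raw pointer is dereferenced.
static void traverse(Node_ptr curr) {
    assert(curr >= 0 && curr < NODE_ARR_SIZE);
    if (curr == NODE_NULL) return;
    visit(Node_arr[curr].val);
    traverse(Node_arr[curr].left);
    traverse(Node_arr[curr].right);
}
```

The function is still recursive at this point; removing the recursion is the job of the next rule.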
Refactoring Rule 3: Convert Recursion

Before:

    void traverse(Node_ptr curr) {
      traverse(Node_arr[curr].left);
      traverse(Node_arr[curr].right);
    }

After (pseudocode):

    void traverse_converted(Node_ptr curr) {
      stack<context> s(STACK_SIZE);
      while (!s.empty()) {
        context c = s.pop();
        goto c.location;
        L0: // traverse(Node_arr[curr].left);
          c.location = L1;
          s.push(c);
          s.push({curr: Node_arr[curr].left});
          continue;
        L1: // ...
      }
    }
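The pseudocode above relies on `goto c.location`, which standard C++ does not offer. A compilable approximation replaces the jump with a `switch` on a saved location number and pushes the initial call before entering the loop. This is a sketch under stated assumptions, not the tool's generated code: the `context` layout, the fixed `STACK_SIZE`, the `NODE_NULL` convention, and the recording `visit` are all hypothetical.

```cpp
#include <cassert>

typedef int Node_ptr;                 // assumption: index into Node_arr
const Node_ptr NODE_NULL = 0;         // assumption: slot 0 is reserved as NULL
struct Node { Node_ptr left, right; int val; };

const int NODE_ARR_SIZE = 8;
const int STACK_SIZE = 16;            // would come from the profiled recursion depth
static Node Node_arr[NODE_ARR_SIZE];

static int visit_order[NODE_ARR_SIZE]; // hypothetical visit(): records visiting order
static int visit_count = 0;
static int visit(int val) {
    assert(visit_count < NODE_ARR_SIZE);
    visit_order[visit_count++] = val;
    return val;
}

// One saved activation: the argument plus the point at which to resume.
struct context { Node_ptr curr; int location; };

static void traverse_converted(Node_ptr root) {
    context stack[STACK_SIZE];
    int top = 0;
    stack[top++] = context{root, 0};           // the initial call
    while (top > 0) {
        context c = stack[--top];
        if (c.curr == NODE_NULL) continue;     // base case: return immediately
        switch (c.location) {
        case 0:                                // function entry
            visit(Node_arr[c.curr].val);
            assert(top + 2 <= STACK_SIZE);
            stack[top++] = context{c.curr, 1}; // resume here after the left call
            stack[top++] = context{Node_arr[c.curr].left, 0};
            break;
        case 1:                                // back from traverse(left)
            assert(top + 2 <= STACK_SIZE);
            stack[top++] = context{c.curr, 2}; // resume here after the right call
            stack[top++] = context{Node_arr[c.curr].right, 0};
            break;
        default:                               // back from traverse(right): return
            break;
        }
    }
}
```

The visiting order matches the recursive pre-order traversal, and the bounded stack array is what makes a profiled recursion depth useful as a sizing hint.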
Part 2. Integers