High-Level Synthesis
Xilinx Vivado HLS
Hao Zheng
Comp Sci & Eng
University of South Florida
Reading
➜ The Zynq Book, chapters 14 and 15
➜ Vivado Design Suite Tutorial: High-Level Synthesis
Overview
Implementation Considerations
➜ Resources / area
➜ Throughput
➜ Clock frequency
➜ Latency
➜ Power consumption
➜ I/O requirements
(Controlled by synthesis directives)
Native Types in C/C++
Arbitrary Precision – Integer
➜ Bit width N: 1 ≤ N ≤ 1024
Typical C/C++ Construct to RTL Mapping
C Constructs  →  HW Components
Functions     →  Modules
Arguments     →  Input/output ports
Operators     →  Functional units
Scalars       →  Wires or registers
Arrays        →  Memories
Control flows →  Control logic
Function Hierarchy
➜ Each function is synthesized to an RTL module
➜ Function inlining eliminates hierarchy
➜ The function main() cannot be synthesized
   ➜ Used to develop the C testbench

Source code:
    void A() { .. body A .. }
    void C() { .. body C .. }
    void B() { C(); }
    void TOP( ) { A(…); B(…); }

RTL hierarchy: TOP contains A and B; B contains C.
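The hierarchy rules above can be sketched with a small design. The function names (`scale`, `offset`, `top`) and their bodies are hypothetical, chosen only for illustration. `#pragma HLS INLINE` is the Vivado HLS directive for dissolving a function into its caller; a standard C++ compiler ignores it.

```cpp
// Hypothetical leaf function. With the INLINE directive, HLS would merge it
// into its caller instead of generating a separate RTL module.
int scale(int x) {
#pragma HLS INLINE
    return 3 * x;
}

// Hypothetical leaf function left as is, so it would stay a separate module.
int offset(int x) {
    return x + 7;
}

// Top-level function selected for synthesis (plays the role of TOP).
int top(int a, int b) {
    return scale(a) + offset(b);
}
```

A C testbench would call `top` from `main()` and compare its results against expected values; `main()` itself is not synthesized.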
Function Arguments
➜ Function arguments become module ports
➜ The interface follows a protocol to synchronize data exchange

    void TOP(int* in1, int* in2, int* out1)
    {
        *out1 = *in1 + *in2;
    }

[Figure: module TOP containing a datapath and an FSM; data ports in1, in2, out1 with valid signals in1_vld, in2_vld, out1_vld]
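As a minimal sketch of how a port protocol might be requested for the adder above, the version below attaches the `ap_vld` protocol to each pointer argument with `#pragma HLS INTERFACE`. The function name `top_add` is an assumption, and the exact ports generated depend on the tool version and settings.

```cpp
// Hypothetical top-level adder. Each pointer argument becomes a data port;
// the ap_vld protocol adds a valid signal alongside it.
void top_add(int *in1, int *in2, int *out1) {
#pragma HLS INTERFACE ap_vld port=in1
#pragma HLS INTERFACE ap_vld port=in2
#pragma HLS INTERFACE ap_vld port=out1
    *out1 = *in1 + *in2;
}
```

In plain C++ the pragmas have no effect, so the same source still runs unchanged in a software testbench.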
Expressions
➜ Expressions and operations are synthesized to datapath components
➜ Timing constraints influence the degree of registering

    char A, B, C, D;
    int P;
    P = (A+B)*C+D;

[Figure: datapath computing P from A, B, C, and D]
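The expression above can be wrapped in a function to check its software behavior. The name `mac` is hypothetical. In C/C++ the `char` operands are promoted to `int` before the arithmetic, so the intermediate sum and product do not wrap at 8 bits, which is why `P` is declared wider than its operands.

```cpp
// Hypothetical wrapper around the slide's expression.
// A, B, C, D are 8-bit operands; integer promotion makes the
// addition and multiplication happen at int width.
int mac(char A, char B, char C, char D) {
    int P = (A + B) * C + D;
    return P;
}
```

For example, `mac(100, 100, 2, 5)` evaluates to 405, a value that would not fit in a `char`.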
Arrays
➜ An array is typically implemented by a memory block
➜ Read/write array -> RAM; constant array -> ROM
➜ An array can be partitioned and mapped to multiple RAMs
➜ Multiple arrays can be merged and mapped to one RAM
➜ An array can be partitioned into individual elements and mapped to registers

    void TOP(int)
    {
        int A[N];
        …
        for (i = 0; i < N; i++)
            A[i+x] = A[i] + i;
    }

[Figure: array A[N] (elements 0 to N-1) inside TOP, implemented as a RAM with ports DIN, DOUT, ADDR, CE, WE; array ports A_in and A_out]
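The last bullet, partitioning an array into individual registers, might be requested as sketched below with `#pragma HLS ARRAY_PARTITION`. The function name `add_index`, the buffer name `buf`, and the size `N = 8` are assumptions for illustration; without the pragma the local array would typically map to a RAM.

```cpp
const int N = 8;  // assumed array size for this sketch

// Hypothetical function: adds each element's index to the element.
void add_index(const int in[N], int out[N]) {
    int buf[N];
    // Ask HLS to split buf into N separate elements (registers)
    // instead of one memory block.
#pragma HLS ARRAY_PARTITION variable=buf complete dim=1
    for (int i = 0; i < N; i++) {
        buf[i] = in[i] + i;
    }
    for (int i = 0; i < N; i++) {
        out[i] = buf[i];
    }
}
```

Complete partitioning removes the memory port bottleneck at the cost of more registers, so it is usually reserved for small arrays.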
Loops
➜ By default, loops are rolled
➜ Each loop iteration corresponds to a "sequence" of states (possibly a DAG)
➜ This state sequence is repeated according to the loop trip count

    void TOP (…) {
        ...
        for (i = 0; i < N; i++)
            b += a[i];
    }

[Figure: states S1 and S2 in TOP; a[i] is loaded (LD) and added to b]
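A rolled accumulation loop like the one above is sketched here with a variable bound. The names `accumulate` and `MAX_N` are hypothetical. When the trip count is not a compile-time constant, `#pragma HLS LOOP_TRIPCOUNT` can give the tool a range for latency reporting; it is understood to affect only the reports, not the generated hardware.

```cpp
const int MAX_N = 64;  // assumed upper bound on the number of elements

// Hypothetical rolled loop: the same load/add state sequence is
// repeated n times.
int accumulate(const int a[MAX_N], int n) {
    int b = 0;
    for (int i = 0; i < n; i++) {
        // Reporting hint only: n is expected to lie between 1 and 64.
#pragma HLS LOOP_TRIPCOUNT min=1 max=64
        b += a[i];
    }
    return b;
}
```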
Loop Unrolling
➜ To expose higher parallelism and achieve shorter latency
➜ Pros
   ➜ Decrease loop overhead
   ➜ Increase parallelism for scheduling
   ➜ Facilitate constant propagation and array-to-scalar promotion
➜ Cons – increase operation count, which may negatively impact area, power, and timing

Rolled:
    for (int i = 0; i < N; i++)
        A[i] = C[i] + D[i];

Unrolled:
    A[0] = C[0] + D[0];
    A[1] = C[1] + D[1];
    A[2] = C[2] + D[2];
    .....
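The sketch below shows two ways the vector addition above might be unrolled by a factor of 2: with `#pragma HLS UNROLL`, and by hand. The function names and `N = 8` are assumptions. Both versions compute the same result; the manual version only makes explicit what the directive asks the tool to do.

```cpp
const int N = 8;  // assumed trip count (even, so a factor of 2 divides it)

// Directive-based: HLS is asked to replicate the loop body twice.
void vadd_pragma(const int C[N], const int D[N], int A[N]) {
    for (int i = 0; i < N; i++) {
#pragma HLS UNROLL factor=2
        A[i] = C[i] + D[i];
    }
}

// Manual equivalent: two additions per iteration, half as many iterations.
void vadd_manual(const int C[N], const int D[N], int A[N]) {
    for (int i = 0; i < N; i += 2) {
        A[i]     = C[i]     + D[i];
        A[i + 1] = C[i + 1] + D[i + 1];
    }
}
```

The two additions in each unrolled iteration are independent, so they can be scheduled in parallel, provided the arrays offer enough memory ports.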
Loop Pipelining
➜ Loop pipelining is one of the most important optimizations for high-level synthesis
➜ Allows a new iteration to begin processing before the previous iteration is complete
➜ Key metric: Initiation Interval (II), in number of cycles

    for (i = 0; i < N; ++i)
        p[i] = x[i] * y[i];

[Figure: pipeline schedule with II = 1; iterations i=0 to i=3 each perform ld, ×, st and start one cycle apart. ld – Load, st – Store]
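A minimal sketch of the pipelined multiply loop follows, using `#pragma HLS PIPELINE II=1` to request an initiation interval of one cycle. The two helper functions are a simplified cycle model, not tool output: they assume a fixed per-iteration latency and ignore loop entry and exit overhead. All names and `N = 16` are assumptions.

```cpp
const int N = 16;  // assumed trip count

// Pipelined element-wise multiply: a new iteration may start every II cycles.
void vmul(const int x[N], const int y[N], int p[N]) {
    for (int i = 0; i < N; ++i) {
#pragma HLS PIPELINE II=1
        p[i] = x[i] * y[i];
    }
}

// Simplified model: a rolled loop runs iterations back to back.
int rolled_cycles(int trip_count, int iter_latency) {
    return trip_count * iter_latency;
}

// Simplified model: a pipelined loop starts one iteration every ii cycles
// and the last iteration still needs iter_latency cycles to finish.
int pipelined_cycles(int trip_count, int iter_latency, int ii) {
    return (trip_count - 1) * ii + iter_latency;
}
```

Assuming a three-cycle iteration (load, multiply, store) and four iterations, this model gives 12 cycles rolled and 6 cycles pipelined with II = 1.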
Synthesis of Loops – Case Study
By default, Vivado HLS optimizes for area, so loops are kept rolled.
Merging Loops
Interface Synthesis
Port Directions
Port Protocols
➜ Simple: ap_none, ap_stable, ap_ack
➜ Ports with validation: ap_vld, ap_ovld, ap_hs
➜ Memory interface: ap_memory, bram
➜ ap_fifo
➜ ap_bus
➜ AXI: axis, s_axilite, m_axi
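As a sketch of how the AXI protocols listed above might be selected, the function below requests an AXI4 master port for a buffer and an AXI4-Lite slave interface for a scalar and the block-level control. The function name `scale_buffer`, the length `LEN`, and the option values are assumptions; the exact options should be checked against the tool documentation for the version in use.

```cpp
const int LEN = 8;  // assumed buffer length for this sketch

// Hypothetical kernel: multiplies every element of a buffer by a factor.
void scale_buffer(int *data, int factor) {
    // Buffer accessed through an AXI4 master port; depth is used for co-simulation.
#pragma HLS INTERFACE m_axi port=data offset=slave depth=8
    // Scalar argument and block-level control exposed as AXI4-Lite registers.
#pragma HLS INTERFACE s_axilite port=factor
#pragma HLS INTERFACE s_axilite port=return
    for (int i = 0; i < LEN; i++) {
        data[i] = data[i] * factor;
    }
}
```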
Backup