High-Level Synthesis: Xilinx Vivado HLS, by Hao Zheng

  1. High-Level Synthesis: Xilinx Vivado HLS. Hao Zheng, Comp Sci & Eng, University of South Florida

  2. Reading
     ➜ The Zynq Book, chapters 14 and 15
     ➜ Vivado Design Suite Tutorial: High-Level Synthesis

  3. Overview

  7. Implementation Considerations
     ➜ Resources / area
     ➜ Throughput
     ➜ Clock frequency
     ➜ Latency
     ➜ Power consumption
     ➜ I/O requirements
     Controlled by synthesis directives.

  9. Native Types in C/C++

  10. Arbitrary Precision – Integer: bit width N, with 1 ≤ N ≤ 1024
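
In Vivado HLS C++ code, arbitrary-precision integers are declared with `ap_int<N>` and `ap_uint<N>` from the Xilinx header `ap_int.h`, which is not available to an ordinary compiler. As a portable sketch of what an N-bit unsigned value means, the hypothetical helpers below (`wrap_unsigned` and `add_n_bits`, neither part of Vivado HLS) model wrap-around by masking. They assume 1 ≤ N ≤ 63 and are not the Xilinx implementation.

```cpp
#include <cstdint>

// Hypothetical helper (not part of Vivado HLS): keep only the low `bits`
// bits of `value`, modelling how an N-bit unsigned integer wraps around.
// Assumes 1 <= bits <= 63 so the shift below is well defined.
constexpr std::uint64_t wrap_unsigned(std::uint64_t value, unsigned bits) {
    return value & ((std::uint64_t{1} << bits) - 1);
}

// Add two values as if the result were stored in a `bits`-wide register.
constexpr std::uint64_t add_n_bits(std::uint64_t a, std::uint64_t b,
                                   unsigned bits) {
    return wrap_unsigned(a + b, bits);
}
```

For example, `add_n_bits(4095, 1, 12)` wraps to 0 because 4095 is the largest 12-bit unsigned value. Inside Vivado HLS the same 12-bit quantity would be declared as `ap_uint<12>`.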

  11. Typical C/C++ Construct to RTL Mapping
      C Constructs    →  HW Components
      Functions       →  Modules
      Arguments       →  Input/output ports
      Operators       →  Functional units
      Scalars         →  Wires or registers
      Arrays          →  Memories
      Control flows   →  Control logic

  12. Function Hierarchy
      ➜ Each function is synthesized to an RTL module
      ➜ Function inlining eliminates hierarchy
      ➜ The function main() cannot be synthesized; it is used to develop the C testbench
      Source code:
          void A() { .. body A .. }
          void C() { .. body C .. }
          void B() { C(); }
          void TOP( ) { A(…); B(…); }
      RTL hierarchy: TOP contains modules A and B; B contains module C.
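
The function hierarchy on slide 12 can be sketched as compilable code. The function bodies and argument lists below are invented for illustration, since the slide elides them. `#pragma HLS INLINE` is the Vivado HLS directive that dissolves a function into its caller, and `run_testbench` is a hypothetical stand-in for the checks a C testbench's `main()` would perform.

```cpp
// Leaf function: would become its own RTL module.
int A(int x) { return x + 1; }

// Leaf function marked for inlining: the directive asks the tool to
// dissolve C into its caller, removing one level of RTL hierarchy.
int C(int x) {
#pragma HLS INLINE
    return x * 2;
}

// B calls C, mirroring the hierarchy on the slide.
int B(int x) { return C(x) - 3; }

// Top-level function for synthesis; its arguments become module ports.
int TOP(int in1, int in2) { return A(in1) + B(in2); }

// Testbench body (hypothetical name). In a real project main() would call
// this and return its result; a return value of 0 signals a passing run.
int run_testbench() {
    int failures = 0;
    if (TOP(1, 5) != 9) ++failures;   // A(1) = 2, B(5) = 2*5 - 3 = 7
    if (TOP(0, 0) != -2) ++failures;  // A(0) = 1, B(0) = -3
    return failures;
}
```

An ordinary compiler ignores the pragma, so the same source serves both C simulation and synthesis.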

  13. Function Arguments
      ➜ Function arguments become module ports
      ➜ The interface follows a certain protocol to synchronize data exchange
      Source code:
          void TOP(int* in1, int* in2, int* out1)
          {
              *out1 = *in1 + *in2;
          }
      RTL: TOP contains a datapath and an FSM, with data ports in1, in2, out1 and valid signals in1_vld, in2_vld, out1_vld.

  14. Expressions
      ➜ Expressions and operations are synthesized to datapath components
      ➜ Timing constraints influence the degree of registering
      Source code:
          char A, B, C, D; int P;
          P = (A+B)*C+D;
      Datapath: A and B feed an adder, the sum is multiplied by C, and D is added to produce P.
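
As a small sketch of the expression on slide 14, the function below wraps it so it can be simulated in C. The name `expr` and the use of `signed char` are assumptions made here, because plain `char` has implementation-defined signedness. The example only shows why the result is declared as `int`: the operands are 8 bits wide, but the result is not.

```cpp
// Same expression as on the slide, with explicitly signed 8-bit operands.
int expr(signed char A, signed char B, signed char C, signed char D) {
    // C promotes the operands to int before the arithmetic, so the
    // intermediate sum and product are not truncated to 8 bits.
    int P = (A + B) * C + D;
    return P;
}
```

With all four inputs at 127 the result is 32385, far outside the 8-bit range, so the adder and multiplier outputs in the datapath must be wider than their inputs.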

  15. Arrays
      ➜ An array is typically implemented by a memory block
      ➜ Read & write array -> RAM; constant array -> ROM
      ➜ An array can be partitioned and mapped to multiple RAMs
      ➜ Multiple arrays can be merged and mapped to one RAM
      ➜ An array can be partitioned into individual elements and mapped to registers
      Source code:
          void TOP(int)
          {
              int A[N];
              …
              for (i = 0; i < N; i++)
                  A[i+x] = A[i] + i;
          }
      RTL: A[N] (elements 0 to N-1) maps to a RAM inside TOP with ports DIN (A_in), DOUT (A_out), ADDR, CE, and WE.
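
The array mappings listed on slide 15 can be illustrated with a minimal sketch. The function `array_top`, the table `offsets`, and the size `N = 8` are hypothetical. `#pragma HLS ARRAY_PARTITION` is the Vivado HLS directive for splitting an array, and `complete` requests one register per element.

```cpp
const int N = 8;  // assumed array size for this sketch

// Hypothetical top function: adds a constant table to the input.
void array_top(const int in[N], int out[N]) {
    // Constant array: never written, so it is a candidate for a ROM.
    static const int offsets[N] = {0, 1, 2, 3, 4, 5, 6, 7};

    // Read/write array: a RAM by default. Complete partitioning asks the
    // tool to split it into N individual registers instead.
    int buf[N];
#pragma HLS ARRAY_PARTITION variable=buf complete dim=1

    for (int i = 0; i < N; i++) buf[i] = in[i] + offsets[i];
    for (int i = 0; i < N; i++) out[i] = buf[i];
}
```

Partitioning trades memory blocks for registers and multiplexers, so it is usually applied only where the number of RAM ports limits parallel access.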

  16. Loops
      ➜ By default, loops are rolled
      ➜ Each loop iteration corresponds to a “sequence” of states (possibly a DAG)
      ➜ This state sequence is repeated multiple times based on the loop trip count
      Source code:
          void TOP (…) {
              ...
              for (i = 0; i < N; i++)
                  b += a[i];
          }
      RTL: states S1 and S2 inside TOP; a[i] is loaded (LD) and added to b.

  17. Loop Unrolling
      ➜ To expose higher parallelism and achieve shorter latency
      ➜ Pros
          ➜ Decrease loop overhead
          ➜ Increase parallelism for scheduling
          ➜ Facilitate constant propagation and array-to-scalar promotion
      ➜ Cons – increase operation count, which may negatively impact area, power, and timing
      Rolled loop:
          for (int i = 0; i < N; i++) A[i] = C[i] + D[i];
      Unrolled:
          A[0] = C[0] + D[0];
          A[1] = C[1] + D[1];
          A[2] = C[2] + D[2];
          .....
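
The unrolling transformation on slide 17 can be sketched in two equivalent forms: a rolled loop carrying the Vivado HLS `UNROLL` directive, and the same loop unrolled by hand. The function names, the factor of 4, and `N = 8` are assumptions chosen for illustration. The manual version requires N to be a multiple of the factor.

```cpp
const int N = 8;  // assumed trip count, a multiple of the unroll factor

// Rolled loop; the directive asks the tool to replicate the body 4 times.
void add_rolled(int A[N], const int C[N], const int D[N]) {
    for (int i = 0; i < N; i++) {
#pragma HLS UNROLL factor=4
        A[i] = C[i] + D[i];
    }
}

// Manual equivalent: four independent additions per iteration, which a
// scheduler can place in the same cycle if enough adders and memory
// ports are available.
void add_unrolled_by_4(int A[N], const int C[N], const int D[N]) {
    for (int i = 0; i < N; i += 4) {
        A[i]     = C[i]     + D[i];
        A[i + 1] = C[i + 1] + D[i + 1];
        A[i + 2] = C[i + 2] + D[i + 2];
        A[i + 3] = C[i + 3] + D[i + 3];
    }
}
```

Both functions compute the same result. The difference shows up only in the generated hardware, where the unrolled form uses more functional units in exchange for fewer loop iterations.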

  18. Loop Pipelining
      ➜ Loop pipelining is one of the most important optimizations for high-level synthesis
      ➜ Allows a new iteration to begin processing before the previous iteration is complete
      ➜ Key metric: Initiation Interval (II) in # cycles
      Source code:
          for (i = 0; i < N; ++i)
              p[i] = x[i] * y[i];
      Schedule: each iteration loads x[i] and y[i] (ld), multiplies (×), and stores p[i] (st). With II = 1, iterations i=0, i=1, i=2, i=3 start in consecutive cycles.
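
The pipelined loop on slide 18 can be written with the Vivado HLS `PIPELINE` directive, as sketched below. The function name `vec_mul` and `N = 16` are assumptions. The two cycle-count helpers are an idealized model added here to make the effect of II concrete, not a tool report: they assume a trip count of at least 1, no stalls, and a fixed iteration latency `depth`.

```cpp
const int N = 16;  // assumed trip count

// Element-wise multiply; the directive requests a new iteration every cycle.
void vec_mul(const int x[N], const int y[N], int p[N]) {
    for (int i = 0; i < N; ++i) {
#pragma HLS PIPELINE II=1
        p[i] = x[i] * y[i];
    }
}

// Idealized model: the first iteration takes `depth` cycles, and each
// later iteration starts `ii` cycles after the previous one.
constexpr int pipelined_cycles(int trip_count, int ii, int depth) {
    return depth + ii * (trip_count - 1);
}

// Without pipelining, iterations run back to back.
constexpr int rolled_cycles(int trip_count, int depth) {
    return trip_count * depth;
}
```

With the three-stage load, multiply, store iteration from the slide, 16 iterations take 48 cycles rolled but 18 cycles at II = 1 under this model.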

  19. Synthesis of Loops – Case Study
      By default, Vivado HLS aims to optimize area, so loops are rolled.

  20. Synthesis of Loops – Case Study

  21. Merging Loops

  22. Merging Loops

  23. Interface Synthesis

  24. Port Directions

  25. Port Protocols
      ➜ Simple: ap_none, ap_stable, ap_ack
      ➜ Ports with validation: ap_vld, ap_ovld, ap_hs
      ➜ Memory interface: ap_memory, bram
      ➜ ap_fifo
      ➜ ap_bus
      ➜ AXI: axis, s_axilite, m_axi
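
A port protocol is selected per argument with the Vivado HLS `INTERFACE` directive. The sketch below applies `ap_vld` to the adder from slide 13, matching the valid signals drawn there. The choice of `ap_vld` for all three ports is an assumption made for illustration, and any other protocol from the list could be substituted.

```cpp
// Adder from the Function Arguments slide, with an explicit protocol on
// each port. ap_vld adds a valid signal alongside the data port.
void TOP(int* in1, int* in2, int* out1) {
#pragma HLS INTERFACE ap_vld port=in1
#pragma HLS INTERFACE ap_vld port=in2
#pragma HLS INTERFACE ap_vld port=out1
    *out1 = *in1 + *in2;
}
```

The directives change only the generated ports and handshaking. The C behavior is unaffected, so the same testbench can be reused when switching protocols.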

  26. Backup
