
PCIeHLS: Malte Vesper, Dirk Koch and Khoa Pham



  1. PCIeHLS Malte Vesper, Dirk Koch and Khoa Pham

  2. High level synthesis – half a solution • Easy generation of kernels from popular languages (C, C++) • Good results require tuning with knowledge about FPGA architecture • No infrastructure for kernel

  3. Vendor kernel integration • Intel SDK for OpenCL • For selected boards only: popular academic board VC709 missing • Vendor partial flow

  4. Partial flow: ours vs. Xilinx • Ours: potentially calls for minor manual adjustments on the static system; relocation of modules; combining partial regions; synthesis largely independent of static system; synthesis of partial and static with different tool versions • Xilinx: commercial stability

  5. Things you don’t want to know • ICAP • PCIe • Memory controller • Decoupling • Clock domain crossing • Timing closure

  6. System diagram (figure) • Module 0, Module 1, …, Module n • Labels 256 and 32 • Partial reconfiguration (ICAP)

  7. Floorplan • Up to 4 user modules • Each user module ≈13.5% Slices • Static system ≈46.0% Slices • Adjacent user module areas can be combined
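
As a quick sanity check on the floorplan figures, the quoted shares can be summed. This is only a sketch: the constant and function names are illustrative, the percentages are the approximate ones from the slide, and which regions are actually adjacent (and therefore combinable) is not modeled.

```python
# Approximate slice shares quoted on the floorplan slide.
USER_MODULE_SHARE = 13.5   # percent of slices per user module region
STATIC_SHARE = 46.0        # percent of slices used by the static system
NUM_USER_MODULES = 4       # "up to 4 user modules"


def combined_share(num_regions: int) -> float:
    """Slice share of `num_regions` combined user regions.

    Assumption: the caller only combines regions that are adjacent in the
    real floorplan; adjacency itself is not checked here.
    """
    if not 1 <= num_regions <= NUM_USER_MODULES:
        raise ValueError("num_regions must be between 1 and 4")
    return num_regions * USER_MODULE_SHARE


# All user regions plus the static system should cover the whole device.
total = NUM_USER_MODULES * USER_MODULE_SHARE + STATIC_SHARE
```

With these numbers the four user regions and the static system together account for roughly the whole device, and two combined regions give a module about 27% of the slices.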

  8. Flow

  9. Steps of our flow • Bus macro • Clock constraining • Block: fabric differences, sites used by static system, pips used by static system • Timing constraints • Cut out bitstream with Bitman

  10. Bus Macro • LUT – wire – LUT

  11. Bus Macro • LUT – wire – LUT • Constrained: • LOC/BEL

  12. Bus Macro • LUT – wire – LUT • Constrained: • LOC/BEL • LOCK_PINS

  13. Bus Macro • LUT – wire – LUT • Constrained: • LOC/BEL • LOCK_PINS • FIXED_ROUTE

  14. Bus Macro • LUT – wire – LUT • Constrained: • LOC/BEL • LOCK_PINS • FIXED_ROUTE
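
The constraint list above can be sketched as a small generator that emits XDC-style `set_property` lines for one LUT – wire – LUT bus macro. This is not the authors' script: the function name, cell and net names, site, BEL, pin mapping and the route placeholder are all hypothetical, and a real FIXED_ROUTE string would have to be copied from a routed design. BEL is emitted before LOC, following the ordering commonly recommended for Vivado placement constraints.

```python
def bus_macro_constraints(cell, site, bel, pin_map, net, route):
    """Return XDC lines that pin one bus-macro LUT and its connecting wire.

    cell/net are design names, site/bel the physical location, pin_map maps
    logical LUT pins to physical pins, route is a FIXED_ROUTE string.
    All values passed in are supplied by the caller (hypothetical here).
    """
    pins = " ".join(f"{logical}:{physical}" for logical, physical in pin_map.items())
    return [
        f"set_property BEL {bel} [get_cells {cell}]",               # which LUT in the slice
        f"set_property LOC {site} [get_cells {cell}]",              # which slice
        f"set_property LOCK_PINS {{{pins}}} [get_cells {cell}]",    # freeze pin mapping
        f"set_property FIXED_ROUTE {route} [get_nets {net}]",       # freeze the wire
    ]


# Hypothetical example values, for illustration only.
lines = bus_macro_constraints(
    cell="bus_macro/lut_static_0",
    site="SLICE_X10Y20",
    bel="A6LUT",
    pin_map={"I0": "A1"},
    net="bus_macro/wire_0",
    route="{ <route string from a routed design> }",
)
```

Generating the lines from a table of positions keeps the static and partial sides of every bus macro bit consistent, which is what makes a module usable in more than one region.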

  15. Clock constraining • Ensure clock is driven • Block other h-wires • Issues: timing differences on relocation, positive and negative skew

  16. Fabric differences • Special cells disturb regularity of fabric (e.g. PCIe, ICAP, …) • Simply block differences

  17. Sites used by static system • Block I/O • I/O does not actually matter, since I/O is not reconfigured

  18. Optimization prevention • Floating wires tied off to 0 • Optimization might remove logic

  19. Optimization prevention • Floating wires tied off to 0 • Optimization might remove logic • Flop marked as DONT_TOUCH prevents logic optimization • Works for signals into the partial region as well
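
Why a tie-off to 0 is dangerous can be shown with a toy constant folder for a 2-input AND gate. This is a simplified model and not how any synthesis tool is implemented; the signal names are hypothetical. A literal 0 lets the folder replace the gate with a constant, while the output of a flop that the tool may not touch is modeled as an opaque signal name, so the gate survives.

```python
def fold_and(a, b):
    """Constant-fold a 2-input AND.

    Each input is the constant 0 or 1, or a string naming an opaque signal
    whose value the optimizer cannot see.
    """
    if a == 0 or b == 0:
        return 0              # output is constant 0: the gate can be removed
    if a == 1:
        return b              # AND with 1 is a plain wire
    if b == 1:
        return a
    return ("and", a, b)      # both inputs opaque: the gate must be kept


# Floating wire tied directly to 0: the logic is optimized away.
tied_off = fold_and("user_signal", 0)

# Same wire driven by a flop treated as untouchable (modeled as opaque).
protected = fold_and("user_signal", "tieoff_ff_q")
```

In the first case the result is the constant 0, in the second the AND gate is preserved, which mirrors the effect the slide attributes to the DONT_TOUCH flop.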

  20. Routing used by static system

  21. Routing used by static system • Route a blocker from outside the PR region through the wires (pips)

  22. Timing constraints • Extract timing to the bus macro in the static system • Take the slowest path as WORST • Constrain the path from the partial module to the bus macro to period - WORST
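
The budget calculation above can be written out as a short sketch. The function name, the clock period and the delay values are made-up illustration values, not measurements from the presented system.

```python
def partial_path_budget(period_ns, static_delays_ns):
    """Return (worst, budget) for paths between a partial module and the bus macro.

    worst  = slowest extracted delay to the bus macro on the static side
    budget = period - worst, the time left for the partial-module side
    """
    if not static_delays_ns:
        raise ValueError("at least one static-side delay is required")
    worst = max(static_delays_ns)
    budget = period_ns - worst
    if budget <= 0:
        raise ValueError("static side already uses the whole clock period")
    return worst, budget


# Hypothetical numbers: a 4 ns clock and three extracted static-side delays.
worst, budget = partial_path_budget(4.0, [0.8, 1.1, 0.95])
```

Because the budget is derived from the slowest static-side path, a module that meets it should meet timing at any bus macro of the static system, which fits the goal of building a module once and using it in several locations.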

  23. Bitman cutting • Extract partial bitstreams • Relocate bitstreams for modules

  24. Summary • Build modules: once, use in multiple locations; independent of static system • Infrastructure provided: ICAP partial reconfiguration; PCIe link to host; MMCM to adjust clock for partial modules; memory

  25. Thank you for your attention • Questions
