Parallel Programming and Heterogeneous Computing: FPGA Accelerators


  1. Parallel Programming and Heterogeneous Computing: FPGA Accelerators
     Max Plauth, Sven Köhler, Felix Eberhardt, Lukas Wenzel and Andreas Polze
     Operating Systems and Middleware Group

  2. FPGA Hardware Characteristics
     ■ Application Specific Integrated Circuits (ASIC) implement a single fixed and usually highly optimized hardware architecture (e.g. CPUs, GPUs, …)
     ■ Field Programmable Gate Arrays (FPGA) are configured to implement a variety of hardware designs
       ➢ FPGA fabric consists of a regular structure of hardware primitives, signal lines and routers
     Figure labels: Logic Blocks, Programmable Interconnect, RAM/ALU/... Blocks, IO Blocks
     EnSem 2019 FPGA Accelerators Lukas Wenzel Chart 2

  3. FPGA Hardware Characteristics
     ■ Hardware primitives include:
       □ Logic Blocks (CLB) with flip-flops, lookup tables, multiplexers, …
       □ Memory Blocks (BRAM) to act as single port, dual port or FIFO memories
       □ Arithmetic Blocks (DSP) with hardware multipliers, adders, shifters, …
       □ Clock Management Blocks (MMCM) to derive clock signals with specific frequency and phase relations
       □ IO Banks with logic for various signaling standards
     Figure from: Xilinx UG 474, Figure 5-1
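The lookup tables named in the logic block bullet can be pictured as tiny configurable truth tables. The following is a minimal software sketch of that idea, not a description of any vendor's CLB: the 4-input width and the names `Lut4` and `configure_lut4` are assumptions chosen for brevity (Xilinx 7 series devices use 6-input LUTs).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical software model of a 4-input lookup table (LUT).
// The 16-bit configuration word stores one output bit per input pattern.
struct Lut4 {
    std::uint16_t truth_table;  // bit i holds the output for input pattern i

    // `inputs` packs the four input signals into bits 0..3.
    bool eval(unsigned inputs) const {
        assert(inputs < 16 && "a 4-input LUT has exactly 16 input patterns");
        return ((truth_table >> inputs) & 1u) != 0;
    }
};

// "Configures" the LUT by tabulating an arbitrary boolean function of four inputs.
template <typename F>
Lut4 configure_lut4(F f) {
    unsigned table = 0;
    for (unsigned i = 0; i < 16; ++i) {
        const bool a = (i & 1u) != 0;
        const bool b = (i & 2u) != 0;
        const bool c = (i & 4u) != 0;
        const bool d = (i & 8u) != 0;
        if (f(a, b, c, d)) {
            table |= (1u << i);
        }
    }
    return Lut4{static_cast<std::uint16_t>(table)};
}
```

Configuring the same structure with a different function only changes the stored word, which is the sense in which the fabric is programmable rather than fixed.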

  4. FPGA Hardware Characteristics
     Figure: Floorplan of a Xilinx Kintex UltraScale XCKU060 FPGA

  5. FPGA Performance Characteristics
     ■ ASICs are rated by maximum operating clock frequency
     ■ FPGAs have no uniform clock frequency rating
       ➢ Maximum clock frequency is design specific and constrained by the longest combinatorial path delay
     ■ Individual logic delays range from 0.1 ns to 0.5 ns
       ➢ Small and tightly coupled design sections may run at 1 GHz
       ➢ Common frequency is 250 MHz
     ■ Specific blocks like BRAMs may have maximum clock frequency ratings
       □ BRAMs on current Xilinx FPGAs can run at 800 MHz
     Figure: example circuit annotated with individual logic delays (0.1 ns to 0.4 ns), path delays (0.4 ns, 0.6 ns, 1.1 ns, 1.4 ns, 1.7 ns, 0.8 ns) and the label ~550 MHz
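The relation between the longest combinatorial path and the achievable clock frequency can be sketched as a one-line calculation. This is a simplified model under stated assumptions: it ignores setup time, clock skew and routing details that a real timing analysis includes, and the function name `max_clock_mhz` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Simplified model: the clock period cannot be shorter than the slowest
// register-to-register combinatorial path. Delays are given in nanoseconds.
double max_clock_mhz(const std::vector<double>& path_delays_ns) {
    assert(!path_delays_ns.empty());
    const double critical_path_ns =
        *std::max_element(path_delays_ns.begin(), path_delays_ns.end());
    assert(critical_path_ns > 0.0);
    return 1000.0 / critical_path_ns;  // a period of 1 ns corresponds to 1000 MHz
}
```

Under this model a design whose slowest path takes 4 ns tops out at 250 MHz, no matter how short its other paths are.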

  6. FPGA Performance Characteristics
     FPGA designs operate at up to an order of magnitude lower clock frequencies than ASIC accelerators!
     How do FPGAs achieve speedups over fixed-function ASIC implementations?
       ➢ Avoid overheads of general-purpose hardware:
         □ CPUs invest a large amount of logic and cycles into fetching and decoding a general-purpose instruction stream
         □ CPUs must accommodate a wide variety of applications by providing a compromise set of execution facilities (function units, forwarding paths, …)
       ➢ FPGAs permit application-specific microarchitectures, leveraging:
         □ Parallelization
         □ Pipelining
     Figure: tasks drawn against clock cycles, illustrating parallelization and pipelining
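How parallelization and pipelining offset a lower clock frequency can be illustrated with a small cycle-count model. This is a sketch under idealized assumptions (no stalls, every task needs the same number of cycles, one pipeline stage per cycle); the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// One unit processes tasks strictly one after another.
std::uint64_t sequential_cycles(std::uint64_t tasks, std::uint64_t cycles_per_task) {
    return tasks * cycles_per_task;
}

// Pipelined unit: a new task enters every cycle, so after the first result
// (available once the pipeline is filled) one result completes per cycle.
std::uint64_t pipelined_cycles(std::uint64_t tasks, std::uint64_t stages) {
    assert(stages > 0);
    return tasks == 0 ? 0 : stages + tasks - 1;
}

// Parallel units: `lanes` independent copies each take an equal share of the tasks.
std::uint64_t parallel_cycles(std::uint64_t tasks, std::uint64_t cycles_per_task,
                              std::uint64_t lanes) {
    assert(lanes > 0);
    const std::uint64_t tasks_per_lane = (tasks + lanes - 1) / lanes;  // round up
    return tasks_per_lane * cycles_per_task;
}
```

For 100 tasks of 4 cycles each, this model gives 400 cycles sequentially, 103 cycles with a 4-stage pipeline and 100 cycles with 4 parallel units, so either technique alone recovers roughly a factor of four in cycle count.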

  7. FPGA Design Process
     Hardware development toolchains and steps are significantly different from software development, as final artifacts are not executable binaries but hardware configurations.

  8. High Level Synthesis and Data Streams

         void hls_operator(stream & in, stream & out, stream_data offset) {
             // offset is expected to stay constant while the operator runs
             #pragma HLS interface ap_stable port=offset
             stream_element in_element, out_element;
             do {
                 // ask the tool to overlap successive loop iterations
                 #pragma HLS pipeline
                 in_element = in.read();
                 out_element.tdata = in_element.tdata + offset;
                 // forward the end-of-stream marker unchanged
                 out_element.tlast = in_element.tlast;
                 out.write(out_element);
             } while (!in_element.tlast);
         }

     Figure: hls_operator block with an input stream (in) and an output stream (out), each carrying data, last, valid and ready signals, plus a separate offset input
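The behaviour of the streaming operator can be tried out on an ordinary CPU with a plain C++ model. The sketch below is not HLS code and does not use the vendor stream library: `StreamElement`, `Stream`, `model_operator` and `run_packet` are hypothetical stand-ins built on `std::queue`, and the 32-bit data width is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <vector>

// Hypothetical stand-ins for the stream types used on the slide.
struct StreamElement {
    std::uint32_t tdata;  // payload word
    bool tlast;           // marks the final element of a packet
};
using Stream = std::queue<StreamElement>;

// Same control flow as the slide's operator: add `offset` to every element
// and stop after forwarding the element flagged with tlast.
void model_operator(Stream& in, Stream& out, std::uint32_t offset) {
    StreamElement in_element{};
    do {
        assert(!in.empty() && "input packet must end with a tlast element");
        in_element = in.front();
        in.pop();
        out.push(StreamElement{in_element.tdata + offset, in_element.tlast});
    } while (!in_element.tlast);
}

// Helper: sends `values` as one packet and collects the output payloads.
std::vector<std::uint32_t> run_packet(const std::vector<std::uint32_t>& values,
                                      std::uint32_t offset) {
    assert(!values.empty());
    Stream in, out;
    for (std::size_t i = 0; i < values.size(); ++i) {
        in.push(StreamElement{values[i], i + 1 == values.size()});
    }
    model_operator(in, out, offset);
    std::vector<std::uint32_t> result;
    while (!out.empty()) {
        result.push_back(out.front().tdata);
        out.pop();
    }
    return result;
}
```

The model captures only the data transformation and the tlast handshake; the valid and ready signals of the hardware interface have no counterpart here because a queue never applies backpressure.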

  9. Links
     ■ CAPI + MetalFS on Nallatech N250S / POWER8
       https://github.com/osmhpi/metal_fs/
       https://github.com/open-power/snap/
     ■ Zynq SoC + PYNQ on Ultra96 Boards
       https://ultra96-pynq.readthedocs.io/

  10. Roadmap
      15. Oct  Introduction             17. Dec  Work/Consultation
      22. Oct  Terminology, OpenMP      24. Dec  <No lecture>
      29. Oct  <No lecture>             31. Dec  <No lecture>
      05. Nov  SIMD, Profiling I        07. Jan  <maybe no lecture>
      12. Nov  Heatmap Discussion       14. Jan
      19. Nov  FPGA Accelerators        21. Jan  Bring Five Simple
      26. Nov  [Heatmap|MatMul|FFT]     28. Jan  Implementations
      03. Dec  Study GPU Literature     03. Feb  Final Presentation
      10. Dec  Roadmap Presentation
