From Dataflow Specifications to Customised Reconfigurable Datapaths Using HLS: the OpenCL Case for FPGAs
Rubén Salvador
[Kindly hosted by INSA: KDesnos, MPelcat, JFNezan, DMenard, LMorin …]
Universidad Politécnica de Madrid (UPM)
School of Telecommunications Systems and Engineering (ETSIST)
Research Center on Software Technologies and Multimedia Systems (CITSEM)
Dataflow Workshop
Rennes, 12-14 December 2017
Context Rubén Salvador From Dataflow to Customised FPGA Datapaths 2
OUTLINE: Motivation; OpenCL FPGA; Dataflow Mapping On Top Of OpenCL FPGA; Implementation Steps
Dataflow: a (naive) view from a newcomer
Dataflow: a flow of data…
• … moves between (2) points in space …
• … is transformed along its way …
• … and sinks.
Spatial Computing: hardware datapath, FPGA on-chip memory, point to point comms architecture
Customised FPGA-based datapaths for dataflow graphs
[Figure: dataflow graph with actors A, B and C; labels: Dataflow, FPGA, HLS]
• System Level Integration
• Wide Community Embrace: (SW) developers love
Conquering Computing Community Embrace
What can dataflow bring to the OpenCL community?
Pros
  OpenCL:
  • Functionally portable
  • Wide community acceptance
  • Support for HLS
  • …
  Dataflow:
  • Graph analysis & Guarantees
    • Schedulability, deadlocks, FIFO sizing
  • Concurrent execution model
  • Comms interaction
Cons
  OpenCL:
  • Not dataflow (streaming) friendly
    • Global memory comms
  • Compute accelerator model
    • Data offload (writes/reads)
  • Throughput oriented (vs latency)
  Dataflow:
  • Niche domain
  • Most work for multi/manycore
OpenCL: framework for heterogeneous/parallel computing
• Task parallelism
• Data parallelism
• SIMD
• Work Group (WG)
• Work Item (WI)
OpenCL FPGA Model
Intel FPGA SDK for OpenCL: Best Practices Guide
https://www.altera.com/en_US/pdfs/literature/hb/opencl-sdk/aocl-best-practices-guide.pdf
https://www.altera.com/products/design-software/embedded-software-developers/opencl/developer-zone.html
Dataflow On Top Of OpenCL: SoC FPGAs
Desired Features & Expected Gains
• Hardware acceleration (custom datapath)
• Reduced processor (communication) overhead
• Reduced memory transactions
• Self-timed Execution
Dataflow On Top Of OpenCL FPGA
Dataflow Community
• Leverage OpenCL FPGA constructs to generate efficient dataflow
• Dataflow-driven “OpenCL” code generation
• Tool Expertise & Design Space Exploration
OpenCL Community
• OpenCL Khronos Group Standard
• Recent (2017) proposal: add dataflow semantics to OpenCL standard
MoCs semantics for OpenCL Pipes
Kapre, Nachiket, and Hiren Patel. Applying Models of Computation to OpenCL Pipes for FPGA Computing. Proc. 5th IWOCL. ACM, 2017.
• MoCs semantics to OpenCL
• OpenCL compute model + MoC Comms Schemes
• Proposal for the OpenCL Standard
• a.k.a.: compiler’s job
• Synchronous Dataflow (SDF)
• Bulk Synchronous Parallel (BSP)
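To make the SDF case concrete: because every SDF actor produces and consumes a fixed number of tokens per firing, the balance equations can be solved statically to find how many times each actor fires per graph iteration. The Python sketch below is an illustration only and is not taken from the cited paper; the three-actor graph, its rates, and the name `repetition_vector` are assumptions made for the example (Python 3.9+ is assumed for `math.lcm`).

```python
from fractions import Fraction
from math import lcm


def repetition_vector(actors, edges):
    """Solve the SDF balance equations reps[src] * prod == reps[dst] * cons.

    actors: list of actor names (hypothetical).
    edges:  list of (src, prod_rate, dst, cons_rate) tuples.
    """
    rate = {actors[0]: Fraction(1)}  # fix one actor, propagate to the rest
    changed = True
    while changed:
        changed = False
        for src, prod, dst, cons in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * prod / cons
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * cons / prod
                changed = True
    if len(rate) != len(actors):
        raise ValueError("graph is not connected")
    # Every edge must balance, otherwise FIFOs grow or starve without bound.
    for src, prod, dst, cons in edges:
        if rate[src] * prod != rate[dst] * cons:
            raise ValueError("inconsistent rates")
    # Scale the rational solution to integers.
    scale = lcm(*(r.denominator for r in rate.values()))
    return {a: int(rate[a] * scale) for a in actors}


# Assumed example graph: A --(2:3)--> B --(1:2)--> C
EDGES = [("A", 2, "B", 3), ("B", 1, "C", 2)]
REPS = repetition_vector(["A", "B", "C"], EDGES)
print(REPS)
```

With these assumed rates, A fires three times, B twice and C once per iteration, so every FIFO returns to its initial fill level.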
OpenCL Increasing Streaming Support
Pipes (OpenCL 2.0)
• Standard OpenCL Kernel-to-Kernel communication
• Overlap multi-kernel operation
Channels (Intel FPGA)
• Preferred Kernel-to-Kernel communication
• Self-triggered kernels (free run decoupled from host)
Host-Kernel Pipes
• Kang, K., and P. Yiannacouras. Host Pipes: Direct Streaming Interface Between OpenCL Host and Kernel. Proc. 5th IWOCL. ACM, 2017.
• … only prototype demo so far
Kernel Operation Possibilities
Autorun kernels
• No host-kernel communication logic
• Autostart & Auto-restart
• Communicate through channels
Intel FPGA SDK for OpenCL: Best Practices Guide
https://www.altera.com/en_US/pdfs/literature/hb/opencl-sdk/aocl-best-practices-guide.pdf
Channels
• Kernel execution decoupled from host
• Blocking/Non-blocking Read/Write API
• Synchronization mechanisms
• I/O Channels -> Streaming DSP
Intel FPGA SDK for OpenCL: Programming Guide
https://www.altera.com/content/dam/altera-www/global/en_US/pdfs/literature/hb/opencl-sdk/aocl_programming_guide.pdf
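The difference between blocking and non-blocking channel accesses can be modelled on a host in plain Python. This is a behavioural sketch only: it does not use the Intel FPGA SDK API, `queue.Queue` merely stands in for a bounded channel, and the depth of 4, the doubling "kernel" and the helper `try_read` are assumptions made for the illustration.

```python
import queue
import threading

CHANNEL_DEPTH = 4  # hypothetical FIFO depth
channel = queue.Queue(maxsize=CHANNEL_DEPTH)


def producer(n):
    """Models a kernel that writes n tokens with blocking writes."""
    for i in range(n):
        channel.put(i)  # stalls while the channel is full


def consumer(n, out):
    """Models a kernel that reads n tokens with blocking reads."""
    for _ in range(n):
        out.append(channel.get() * 2)  # stalls while the channel is empty


def try_read(ch):
    """Non-blocking read: returns (value, valid) instead of stalling."""
    try:
        return ch.get_nowait(), True
    except queue.Empty:
        return None, False


results = []
threads = [
    threading.Thread(target=producer, args=(16,)),
    threading.Thread(target=consumer, args=(16, results)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Both "kernels" are done and the channel is drained, so this read is invalid.
value, valid = try_read(channel)
print(results, valid)
```

The two threads synchronise only through the channel, with no host-side coordination between individual tokens, which is the decoupling the slide refers to.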
Mapping (Pi)SDF Dataflow Graphs To OpenCL Model
[Figure: dataflow graph with actors A, B and C; Kernel K]
1.- Actor Firing Rules within OpenCL Kernels
    • Check overhead vs. performance. Acceptable?
2.- Code Generation from PREESM
    2.1.- Actor firing rules - scheduler
    2.2.- FIFO analysis & Buffer generation
    2.3.- Memory Access Optimization
Mapping (PiSDF) Dataflow Graphs To OpenCL Model
[Figure: Kernel K]
2.1.- Actor firing rules (scheduler)
• Is channel synchronization enough?
• Actor I/O IFs, firing rules, templates
• Borrow from CAPH?
• Host code:
  • Platform initialization: automatic
  • Job management: automatic
    • input data & result data
    • "only" necessary for the host/device frontier
    • pointers mapped to device buffers
• Kernel code:
  • I/O interfaces (firing rules): automatic
  • Functionality: manual (provided by user)
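The split between automatically generated I/O interfaces and user-provided functionality can be sketched as a wrapper around a user function. The Python model below only illustrates that idea under stated assumptions: `make_actor`, the FIFO representation with `collections.deque`, and the `pair_sum` actor are all hypothetical and do not reflect the actual PREESM or CAPH templates.

```python
from collections import deque


def make_actor(inputs, outputs, body):
    """Wrap a user function with a firing rule (the "generated" part).

    inputs / outputs: lists of (fifo, rate) pairs.
    body: user-provided function; takes one token list per input and
          returns one token list per output.
    """
    def can_fire():
        # Firing rule: every input FIFO holds at least `rate` tokens.
        return all(len(fifo) >= rate for fifo, rate in inputs)

    def fire():
        if not can_fire():
            return False
        args = [[fifo.popleft() for _ in range(rate)] for fifo, rate in inputs]
        results = body(*args)
        for (fifo, rate), tokens in zip(outputs, results):
            assert len(tokens) == rate, "body must produce `rate` tokens"
            fifo.extend(tokens)
        return True

    return fire


def pair_sum(tokens):
    """User-provided functionality: consume 2 tokens, produce 1."""
    return [[tokens[0] + tokens[1]]]


fifo_in, fifo_out = deque(), deque()
fire_b = make_actor([(fifo_in, 2)], [(fifo_out, 1)], pair_sum)

fifo_in.append(1)
first = fire_b()  # not enough tokens yet, so the actor does not fire
fifo_in.extend([2, 3, 4])
while fire_b():  # fire until the firing rule is no longer satisfied
    pass
print(first, list(fifo_out), list(fifo_in))
```

Only `pair_sum` would be written by hand in this model; the token counting and FIFO accesses are the part a code generator could derive from the graph's rates.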
Mapping (PiSDF) Dataflow Graphs To OpenCL Model
[Figure: Kernel K]
2.1.- Actor firing rules (scheduler)
• Is channel synchronization enough?
• Actor I/O IFs, firing rules, templates
• Borrow from CAPH?
2.2.- Buffer generation
• Leverage current PREESM buffer generation
• Pipes vs Channels vs Ad-hoc Buffer
2.3.- Memory Accesses Optimization
• Streaming Dataflow
• Shared/Global Memory
• Different workloads?
• Local FPGA DDRs (kernel only)
• Out-of-order accesses?
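One simple way to reason about buffer generation is to simulate one graph iteration and record the peak occupancy of each FIFO. The Python sketch below assumes a greedy schedule that fires every ready actor in turn; it is not PREESM's buffer generation algorithm, and the graph, rates, repetition counts and the name `fifo_sizes` are hypothetical. Other schedules can yield different sizes.

```python
def fifo_sizes(edges, reps):
    """Peak token count per edge for one iteration of a greedy schedule.

    edges: list of (src, prod_rate, dst, cons_rate) tuples.
    reps:  dict actor -> number of firings per iteration.
    """
    tokens = [0] * len(edges)
    peak = [0] * len(edges)
    remaining = dict(reps)
    progress = True
    while progress and any(remaining.values()):
        progress = False
        for actor in reps:
            if remaining[actor] == 0:
                continue
            # Firing rule: all incoming edges hold enough tokens.
            ready = all(tokens[i] >= cons
                        for i, (_, _, dst, cons) in enumerate(edges)
                        if dst == actor)
            if not ready:
                continue
            for i, (src, prod, dst, cons) in enumerate(edges):
                if dst == actor:
                    tokens[i] -= cons
                if src == actor:
                    tokens[i] += prod
                    peak[i] = max(peak[i], tokens[i])
            remaining[actor] -= 1
            progress = True
    if any(remaining.values()):
        raise RuntimeError("deadlock: no actor can fire")
    return peak


# Assumed example graph: A --(2:3)--> B --(1:2)--> C
EDGES = [("A", 2, "B", 3), ("B", 1, "C", 2)]
REPS = {"A": 3, "B": 2, "C": 1}
SIZES = fifo_sizes(EDGES, REPS)
print(SIZES)
```

Under these assumptions the A-to-B FIFO needs room for 4 tokens and the B-to-C FIFO for 2, which is the kind of figure that would parameterise a pipe, a channel depth, or an ad-hoc buffer.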
future(future)
3.- Hack the Flow
• Kernel (actor) functionality
  • Component library
  • Plug HDL / CAPH
  • DSE: Predictability
  • Area / Latency / Throughput
• Host-Device communications
  • Device Wrapper
  • DSE: Predictability
  • Compute upper bound?
• Open Run Time?
  • Graph Reconfiguration
  • Dynamic Reconfiguration?
4.- New devices
• Intel Xeon + FPGA
• HPC community
Thanks for your attention!!
ruben.salvador@upm.es
https://twitter.com/RubenSalvadorP
http://blogs.upm.es/rubensalvador/