Ariane + NVDLA: Seamless Third-Party IP Integration with ESP
Davide Giri, Kuan-Lin Chiu, Guy Eichler, Paolo Mantovani, Nandhini Chandramoorthy (IBM Research), Luca P. Carloni
CARRV 2020
Motivation
• SoCs are increasingly heterogeneous [1]
• Heterogeneity increases the engineering effort [2]
  → IP reuse enables the design of complex SoCs
• Thanks to the open-source hardware (OSH) movement [3]
  → Proliferation of open-source IPs
Seamless third-party IP integration is key!
[1] Shao, SLCA'15 [2] Khailany, DAC'18 [3] Gupta, IEEE Computer'17
In this work
Enhance ESP with support for third-party accelerators
• ESP is our open-source platform for SoC design [4]
Demonstrate integration capabilities of ESP
• Integration of Ariane [5] and NVDLA [6]
• Rapid FPGA prototyping
Open-source release as part of ESP
• Hands-on tutorial: esp.cs.columbia.edu/docs/thirdparty_acc
[4] ESP: esp.cs.columbia.edu [5] Ariane: github.com/pulp-platform/ariane [6] NVDLA: nvdla.org
ESP overview
ESP architecture
ESP methodology
Accelerator Flow
• Simplified design
• HLS design flows: Vivado HLS, Catapult HLS, Stratus HLS
• RTL design flows
SoC Flow
• Automated integration
• Mix&match floorplanning GUI
• Rapid FPGA prototyping
[Figure: ESP methodology diagram. The accelerator design flows feed an IP Library (accelerators, third-party accelerators, Ariane, …), which feeds SoC Integration and Rapid Prototyping. Image credits: * Nvidia Corporation; ** Larry Ewing (lewing@isc.tamu.edu) and The GIMP]
ESP methodology: SoC flow
[Figure: SoC flow portion of the ESP methodology diagram. The IP Library (accelerators, third-party accelerators, Ariane, …) feeds SoC Integration and Rapid Prototyping. Image credit: ** Larry Ewing (lewing@isc.tamu.edu) and The GIMP]
Third-party IP integration with ESP
ESP accelerator tile
ESP accelerator flow
[Figure: flow chart comparing the ESP accelerator flow with the third-party accelerator flow; steps are marked as automated or manual]
ESP accelerator
• Accelerator skeleton
• Accelerator-specific functions
• Test behavior
• Generate RTL
• Test RTL
Third-party accelerator
• RTL and SW files list
• Makefile targets definition
• Accelerator definition (xml)
• RTL wrapper wiring
Both flows end with: Instantiate into SoC
Ariane + NVDLA with ESP
Integration of Ariane
ESP processor tile
• RISC-V Ariane (new!) or Sparc-v8 Leon3
• Boot unmodified Linux
• AXI4 (new!) or AHB bus to access memory
• APB bus to access peripherals
• Optional L2 private cache
• Processor-specific interrupt controller placed in the I/O tile
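The processor tile options listed above can be pictured as a small configuration record. The following is only an illustrative sketch: the `ProcessorTile` class and `BUS_FOR_CORE` table are hypothetical names, not part of ESP, and the pairing of Ariane with AXI4 and Leon3 with AHB is an assumption inferred from the "(new!)" tags rather than something the slide states outright.

```python
from dataclasses import dataclass

# Assumed pairing of core and memory bus (inferred, not stated on the slide).
BUS_FOR_CORE = {"ariane": "axi4", "leon3": "ahb"}


@dataclass(frozen=True)
class ProcessorTile:
    """Hypothetical model of the processor tile options on the slide."""
    core: str                 # "ariane" (RISC-V) or "leon3" (Sparc-v8)
    private_l2: bool = False  # the L2 private cache is optional

    def __post_init__(self):
        # Reject cores that the slide does not list.
        if self.core not in BUS_FOR_CORE:
            raise ValueError(f"unknown core: {self.core}")

    @property
    def memory_bus(self) -> str:
        # Bus used to access memory (AXI4 or AHB).
        return BUS_FOR_CORE[self.core]

    @property
    def peripheral_bus(self) -> str:
        # Peripherals are reached over APB for either core.
        return "apb"


tile = ProcessorTile(core="ariane", private_l2=True)
print(tile.memory_bus, tile.peripheral_bus)
```

The record leaves out the interrupt controller on purpose, since the slide places it in the I/O tile rather than in the processor tile.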
NVDLA
NVIDIA Deep Learning Accelerator
• Open source
• Fixed function
• Highly configurable
NVDLA small
• 8-bit integer precision
• 64 MAC units
• 128 KB local memory
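As a rough back-of-the-envelope reading of the NVDLA small configuration, the sketch below turns the 64 MAC units into a peak MAC rate. It assumes each MAC unit completes one multiply-accumulate per cycle, which the slide does not state. The 50 MHz and 1 GHz clocks are the ones quoted on the evaluation results slide, and the function name is hypothetical.

```python
def peak_macs_per_second(mac_units: int, clock_hz: float) -> float:
    """Peak MAC rate, assuming one MAC per unit per cycle (an assumption)."""
    return mac_units * clock_hz


NVDLA_SMALL_MAC_UNITS = 64  # from the slide

fpga_peak = peak_macs_per_second(NVDLA_SMALL_MAC_UNITS, 50e6)  # FPGA prototype clock
asic_peak = peak_macs_per_second(NVDLA_SMALL_MAC_UNITS, 1e9)   # reference clock

print(f"{fpga_peak / 1e9:.1f} GMAC/s at 50 MHz")
print(f"{asic_peak / 1e9:.1f} GMAC/s at 1 GHz")
```

These are upper bounds under the stated assumption. Measured frame rates also depend on memory bandwidth and on the network being run.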
Evaluation: setup
SoCs evaluated on FPGA (Xilinx XCVU440)
• Ariane core
• 1-4 NVDLA tiles
• 1-4 memory channels
[Figure: evaluation networks]
Evaluation: results
[Figure: Performance of NVDLA small in ESP @ 50 MHz. Bar chart for LeNet, Convnet, SimpleNet and ResNet50 on 1 NVDLA]
[Figure: Scaling NVDLA instances and DDR channels @ 50 MHz. Bar chart for LeNet on 1 NVDLA + 1 mem ctrl, 2 NVDLA + 2 mem ctrl, 3 NVDLA + 3 mem ctrl, 4 NVDLA + 4 mem ctrl]
Axes: frames / second; frames / second (normalized)
Annotations: "18x lower than NVIDIA's results @ 1GHz"; "performance preserved"
Data labels (bar assignment not recoverable): 4.5, 3.9, 3.8, 3.1, 2.1, 1.3, 1, 0.4
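The "18x lower" figure can be put in context with the two clock frequencies on the slide: the FPGA prototype runs at 50 MHz, while NVIDIA's results are quoted at 1 GHz. The sketch below does only that arithmetic. Reading the outcome as per-cycle performance being roughly preserved is an interpretation, not a claim taken from the slide, and the variable names are illustrative.

```python
FPGA_CLOCK_HZ = 50e6        # ESP prototype clock, from the slide
REFERENCE_CLOCK_HZ = 1e9    # clock of NVIDIA's reported results, from the slide
REPORTED_SLOWDOWN = 18.0    # "18x lower", from the slide

# How much slower the FPGA clock is than the reference clock.
clock_ratio = REFERENCE_CLOCK_HZ / FPGA_CLOCK_HZ

# Throughput per clock cycle relative to the reference; a value near 1
# means the slowdown is explained by the clock difference alone.
per_cycle_ratio = clock_ratio / REPORTED_SLOWDOWN

print(f"clock ratio: {clock_ratio:.0f}x")
print(f"per-cycle throughput relative to reference: {per_cycle_ratio:.2f}x")
```

The clock ratio (20x) is close to the reported slowdown (18x), so the gap appears to be accounted for almost entirely by the lower FPGA frequency.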
Thank you from the ESP team!
sld.cs.columbia.edu
ColumbiaSld
esp.cs.columbia.edu
sld-columbia/esp
ESP channel