AIgean: An Open Framework for Machine Learning on a Heterogeneous Cluster

  1. AIgean: An Open Framework for Machine Learning on a Heterogeneous Cluster. Naif Tarafdar¹, Giuseppe Di Guglielmo², Philip C Harris³, Jeffrey D Krupa³, Vladimir Loncar⁴, Dylan S Rankin³, Nhan Tran⁵, Zhenbin Wu⁶, Qianfeng Shen¹ and Paul Chow¹. University of Toronto¹, Columbia University², Massachusetts Institute of Technology³, CERN⁴, Fermilab⁵, University of Illinois⁶

  2. Takeaways • Galapagos: Platform for multi-FPGA application deployment – A scalable giant FPGA composed of individual FPGAs • AIgean: Mapping an ML application onto the giant FPGA – Could also be your own applications • Depending on your area of expertise and interest, you can use different parts of this project November 13, 2020 H2RC 2020

  3. Machine Learning • One of the most popular topics of research – In many areas, many applications (e.g., medical, financial, safety, transportation, etc.) – Also within the computing community • Wide usage in the world pushes the limits of devices – Metrics include performance and energy – Leading many researchers to consider heterogeneity!

  4. Heterogeneity All Around Us [Three photos by unknown authors, licensed under CC BY-NC, CC BY-SA-NC, and CC BY-NC-ND]

  5. Applying Machine Learning to a Heterogeneous Environment • Challenge: How do you design machine learning algorithms for a heterogeneous space? – Hard enough with a homogeneous computing environment – Is there a framework for such a thing? • Challenge: If such a framework exists, can we get both flexibility and performance?

  6. Outline • Brief Motivation • Overview of machine learning frameworks – Categorized as an abstraction layer stack • Overview of AIgean – HLS4ML – Galapagos • Results

  7. MACHINE LEARNING FRAMEWORKS

  8. Many Popular Examples! • Such as – TensorFlow – PyTorch – Caffe – Intel DLA – Xilinx XfDNN • What do these different frameworks offer? – Depends on who you ask!

  9. Machine Learning Stack • Applications & Algorithms • Cluster Deployment & Communication • Hardware

  10. Machine Learning Stack • Applications & Algorithms – E.g., neural net layers, quantization, compression, pruning • Cluster Deployment & Communication • Hardware

  11. Machine Learning Stack • Applications & Algorithms • Cluster Deployment & Communication – E.g., physical connections (PCIe, Ethernet, etc.), communication protocols • Hardware

  12. Machine Learning Stack • Applications & Algorithms • Cluster Deployment & Communication • Hardware – E.g., hardware circuits (multipliers, shifters), memory architecture (caching, etc.)

  13. Machine Learning Stack • Allows researchers to pick and choose the layers they wish to configure • Collapsible/expandable for a specific application and infrastructure! [Stack diagram: Applications & Algorithms, Cluster Deployment & Communication, Hardware]

  14. AIGEAN OVERVIEW

  15. AIgean Introduction • Like the archipelago and sea • Combines two existing frameworks: – HLS4ML: HLS IP cores for ML – Galapagos: connects and deploys heterogeneous distributed applications across multiple nodes

  16. HLS4ML • Open source project • Input: – Description of FPGA resources (LUT, BRAM, DSP) – Description of neural net (PyTorch, Keras, ONNX support) • Output: – HLS synthesizable C++ that implements the neural net and fits within the resource constraints • Tunable HLS code, made to fit the FPGA

  17. Galapagos • User can define an FPGA cluster using cluster description files and AXI-Stream kernels [Diagram labels: File, Tool Flow, Network, VM, Kernel, AXI-Stream, FPGA 1, FPGA 2, FPGA 3]

  21. Galapagos Communication Layer • Heterogeneous stack [Diagram labels: Middleware/Network Layer, Hypervisor Layer, Physical Hardware] • Allows users to create flexible heterogeneous clusters across CPUs/FPGAs • Seamlessly prototype by implementing both on CPU and FPGA – Galapagos ensures functional portability for network communication – Essentially "network-connected" HLS kernels • For both SW and HW – Iterative development: selectively move bottlenecks from SW to hardware without modifying code • Flexibly change communication protocol without modifying user application – TCP, UDP, L1, etc. – User application is agnostic to this

  22. Birth of AIgean • HLS4ML creates HLS IP cores to maximize FPGA utilization • Galapagos can provide a multi-FPGA fabric • Tools combined to deploy a neural net on a multi-FPGA fabric

  23. AIgean Tool Flow [Diagram labels, in flow order: Training (Keras, PyTorch) → ML Model (HDF5 & JSON) → Hls4ml* → HLS ML Layers (C++ & TCL) → Partitioner → IP Cluster (not connected) → ML2G, combined with the HLS ML to Galapagos Bridge (C++ & TCL) → CPU/FPGA Cluster; a Tuning path feeds back into the flow; the stages from Hls4ml* to ML2G are marked as the AIgean Automated Flow]

  26. HLS4ML Modifications • HLS4ML modified to create independent layers as separate HLS IP cores – Each IP core is a streaming core with one stream per dimension of the particular layer [Diagram labels: Model (HDF5 & JSON), Hls4ml*, HLS ML Layers (C++ & TCL), HLS ML to Galapagos Bridge (C++ & TCL)]

  27. HLS4ML Galapagos Bridge • Bridges custom made for the layers used in the network (different bridges needed for different numbers of dimensions) • If the user has a different application layer, then they would need a different bridge [Diagram: a Galapagos Kernel containing Bridge, ML Layer … ML Layer, Bridge; one 512-bit stream on the outside of each bridge, one 8-bit stream per dimension on the ML-layer side]
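The width conversion a bridge performs, between one 8-bit stream per dimension and a single 512-bit stream, can be sketched in Python. This is only an illustration: the slides do not specify the packing order, so the little-endian lane layout, the zero padding of a partial word, and the names `pack_flits` and `unpack_flits` are all assumptions rather than the AIgean implementation.

```python
FLIT_BITS = 512                           # width of the Galapagos-side stream
LANE_BITS = 8                             # width of each per-dimension stream
LANES_PER_FLIT = FLIT_BITS // LANE_BITS   # 64 values fit in one 512-bit word


def pack_flits(values):
    """Pack 8-bit values (one per dimension) into 512-bit words.

    Assumed layout: value i of a chunk sits in bits [8*i, 8*i + 7];
    a final partial word is zero padded.
    """
    flits = []
    for start in range(0, len(values), LANES_PER_FLIT):
        word = 0
        for lane, value in enumerate(values[start:start + LANES_PER_FLIT]):
            if not 0 <= value < (1 << LANE_BITS):
                raise ValueError(f"value {value} does not fit in 8 bits")
            word |= value << (lane * LANE_BITS)
        flits.append(word)
    return flits


def unpack_flits(flits, n_values):
    """Inverse of pack_flits: recover the first n_values 8-bit values."""
    values = []
    for word in flits:
        for lane in range(LANES_PER_FLIT):
            if len(values) == n_values:
                return values
            values.append((word >> (lane * LANE_BITS)) & 0xFF)
    return values
```

Under these assumptions a layer with 100 dimensions needs two 512-bit words per sample, and `unpack_flits(pack_flits(v), len(v))` returns `v` unchanged.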

  29. Partitioner • Partitioner separates IP cores onto different FPGAs • Currently uses IP resource estimates from HLS place and route and performs a simple greedy approach • Does not place the bridges, as that is AIgean specific and this partitioner is general for all Galapagos IP kernels [Diagram labels: HLS ML Layers (C++ & TCL), Partitioner, IP Cluster (not connected)]
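One way to read "simple greedy approach" is a first-fit pass over the IP cores in dataflow order. The sketch below assumes exactly that; the slide does not describe the actual algorithm, and the function name `greedy_partition`, the resource keys, and every number in the example are hypothetical.

```python
def greedy_partition(kernels, capacity):
    """First-fit greedy split of a kernel chain across identical FPGAs.

    kernels:  list of (name, {resource: estimated usage}) in dataflow order.
    capacity: {resource: budget available on one FPGA}.
    Returns a list of kernel-name lists, one per FPGA.
    """
    partitions = [[]]
    used = {res: 0 for res in capacity}
    for name, need in kernels:
        if any(need.get(res, 0) > capacity[res] for res in capacity):
            raise ValueError(f"{name} does not fit on a single FPGA")
        # Open a new FPGA as soon as any resource budget would overflow.
        if any(used[res] + need.get(res, 0) > capacity[res] for res in capacity):
            partitions.append([])
            used = {res: 0 for res in capacity}
        partitions[-1].append(name)
        for res in capacity:
            used[res] += need.get(res, 0)
    return partitions


# Illustrative numbers only, not measurements from the talk.
capacity = {"LUT": 100, "BRAM": 10, "DSP": 20}
kernels = [
    ("conv1", {"LUT": 60, "BRAM": 4, "DSP": 12}),
    ("conv2", {"LUT": 50, "BRAM": 4, "DSP": 6}),
    ("dense1", {"LUT": 30, "BRAM": 5, "DSP": 10}),
    ("dense2", {"LUT": 10, "BRAM": 2, "DSP": 6}),
]
plan = greedy_partition(kernels, capacity)
# plan == [["conv1"], ["conv2", "dense1"], ["dense2"]]
```

Keeping the kernels in dataflow order means each FPGA holds a contiguous run of layers, so data only crosses the network between neighbouring groups. A greedy pass like this is fast but can leave resources unused compared with an optimal packing.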

  31. Machine Learning to Galapagos (ML2G) • Adds the appropriate bridges on the interfaces of the FPGAs • Creates the local connections for kernels on the same FPGA [Diagram: IP Cluster (not connected) + HLS ML to Galapagos Bridge (C++ & TCL) → ML2G → CPU/FPGA Cluster]
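The two ML2G steps can be illustrated with a small Python sketch that walks a chain of kernels and decides, for each link, whether it stays local or needs bridges because it crosses an FPGA boundary. The chain topology, the function name `plan_connections`, and the kernel names are assumptions made for illustration; the real tool works on Galapagos cluster descriptions that the slides do not show.

```python
def plan_connections(partitions):
    """Classify the links of a kernel chain as local or bridged.

    partitions: list of kernel-name lists, where the list index is the FPGA
                and the kernels form one chain in the order given.
    Returns (local, bridged), each a list of (source, destination) pairs.
    """
    placement = [(kernel, fpga)
                 for fpga, group in enumerate(partitions)
                 for kernel in group]
    local, bridged = [], []
    for (src, src_fpga), (dst, dst_fpga) in zip(placement, placement[1:]):
        if src_fpga == dst_fpga:
            local.append((src, dst))    # direct connection on the same FPGA
        else:
            bridged.append((src, dst))  # needs a bridge on each FPGA interface
    return local, bridged


# Hypothetical placement of four layers on three FPGAs.
local, bridged = plan_connections([["conv1"], ["conv2", "dense1"], ["dense2"]])
# local   == [("conv2", "dense1")]
# bridged == [("conv1", "conv2"), ("dense1", "dense2")]
```

In this reading, bridges appear only where a link leaves an FPGA, which matches the slide's point that kernels sharing an FPGA are connected locally.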

  32. RESULTS

  33. Experiment Setup • CPUs – Xeon E5-2650 • 24 cores at 2.2 GHz • FPGAs – Fidus Sidewinder • ZU19EG FPGA – ~1 million logic cells, 35 MB BRAM, 1968 DSP slices • 100 GB network interface – 100 GB UDP core

  34. Microbenchmarks • Latency: send a single flit • Throughput: maximum throughput of link (varying packet size for software)
      Link                  Latency      Throughput
      Software to Hardware  0.029 ms     0.244 GB/s
      Hardware to Hardware  0.00017 ms   100 GB/s
      Hardware to Software  0.0203 ms    N/A
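To put the microbenchmark numbers on a common scale, the short Python calculation below converts the reported latencies to microseconds and computes the ratios between links. The values are copied from the slide; the dictionary keys and variable names are made up for this example.

```python
# Latencies (ms) and throughputs (GB/s) as reported on the slide.
latency_ms = {"sw_to_hw": 0.029, "hw_to_hw": 0.00017, "hw_to_sw": 0.0203}
throughput_gbs = {"sw_to_hw": 0.244, "hw_to_hw": 100.0}

# Same latencies in microseconds: 29, 0.17 and 20.3 respectively.
latency_us = {link: ms * 1000 for link, ms in latency_ms.items()}

# How much faster the hardware-to-hardware link is than software-to-hardware.
latency_ratio = latency_ms["sw_to_hw"] / latency_ms["hw_to_hw"]
throughput_ratio = throughput_gbs["hw_to_hw"] / throughput_gbs["sw_to_hw"]
```

The hardware-to-hardware link is roughly 170 times lower in latency and about 410 times higher in throughput than the software-to-hardware link, which is the gap the slide is pointing at.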
