FastTrack: Leveraging Heterogeneous FPGA Wires to Design Low-cost High-performance Soft NoCs


  1. FastTrack: Leveraging Heterogeneous FPGA Wires to Design Low-cost High-performance Soft NoCs
     Nachiket Kapre + Tushar Krishna
     nachiket@uwaterloo.ca, tushar@ece.gatech.edu

  2. Claim
     FPGA overlay NoCs designed to exploit interconnect properties of the FPGA fabric can surpass existing state-of-the-art NoCs by:
     ◮ 2.5–2.8× throughput ↑
     ◮ 2.2× energy ↓
     ◮ at 2.5× LUT cost ↑
     Xilinx Virtex-7 485T FPGA, 8×8 system size, synthetic + real-world traffic.

  3. Context
     ◮ FPGAs finding comfortable home in datacenters
     ◮ Offloading compute-intensive workloads to the FPGA
     ◮ Energy-efficiency, fast coupling to networking
     ◮ Common infrastructure: NoCs for apps + system IO

  9. Landscape of contemporary FPGA NoC Routers
     [Figure: plot of Peak S/W Bandwidth (packets/ns) versus Cost per Switch, max(LUTs, FFs), for Split-Merge, BLESS, Hoplite, CONNECT, OpenSMART, and FastTrack]
     ◮ ASIC clones transplanted onto FPGAs fare poorly! → expensive buffers, virtual channels, multi-ported switches
     ◮ Even contemporary FPGA routers are expensive and slow
     ◮ FastTrack: Deflection-routing + Bufferless + Torus

  10. Qualitative Comparison of FPGA NoC Routers
      Router       Cost
                   Xbar+Arb  Buffers  VCs
      OpenSMART    ✗         ✗        ✗
      BLESS        ✗         ✓        ✓
      CONNECT      ✗         ✗        ✗
      Split-Merge  ✗         ✗        ✓
      Hoplite      ✓✓        ✓        ✓

  11. Quick Tutorial on Hoplite
      [Figure: grid of switches (sw) labeled with coordinates 0,0 through 3,3]
      Hoplite: A Deflection-Routed Directional Torus NoC for FPGAs, TRETS 2017
      Hoplite: Building Austere Overlay NoCs for FPGAs, FPL 2015

  12. Quick Tutorial on HopliteRT
      [Figure: grid of switches (sw) labeled with coordinates 0,0 through 3,3]
      HopliteRT: An Efficient FPGA NoC for Real-Time Applications, FPT 2017

  14. Qualitative Comparison of FPGA NoC Routers
      Router       Cost                      Perf
                   Xbar+Arb  Buffers  VCs    Tput  Latency
      OpenSMART    ✗         ✗        ✗      ✓     ✓
      BLESS        ✗         ✓        ✓      ✓     ✗
      CONNECT      ✗         ✗        ✗      ✓     ✓
      Split-Merge  ✗         ✗        ✓      ✓     ✓
      Hoplite      ✓✓        ✓        ✓      ✗     ✗

  15. Challenge
      ◮ Deflection routing → inefficient use of wiring resources
      ◮ Deflected packets stay in the network longer → latency ↑
      ◮ Deflected packets steal bandwidth from other traffic → throughput ↓
      ◮ Can we improve NoC performance under deflection routing?
      ◮ Are there unique opportunities provided by the FPGA fabric?
      ◮ Hoplite is cheap in LUT cost...
      ◮ FastTrack → inspect FPGA interconnect

  16. Outline
      ◮ Introduction and Motivation
      ◮ FastTrack NoC Organization
      ◮ FastTrack Router Operation
      ◮ Evaluation

  18. FPGA Wire Speeds
      [Figure: distances not to scale]

  19. FastTrack NoC Organization
      [Figure: grid of switches (sw)]

  20. Depopulated Topology Generation
      [Figure: grid of switches (sw)]

  21. Parametric Topology Generation
      ◮ FPGA NoC parameterized by three terms:
        ◮ N: system size
        ◮ D: distance of express link
        ◮ R: depopulation parameter → controls how many routers are FastTrack vs. vanilla Hoplite
      ◮ Fully populated 4×4 NoC → FT(16,2,1)
      ◮ Half population 4×4 NoC → FT(16,2,2)
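
The FT(N, D, R) parameterization can be illustrated with a minimal sketch of one ring of the torus. The slides do not say which routers keep their express links when R > 1, so the rule used here (router i has an express link exactly when i % R == 0) is an assumption, as are the function names `ring_links` and `fasttrack_ring`.

```python
from math import isqrt


def ring_links(n, d, r):
    """Return (short, express) link lists for one unidirectional ring of n routers.

    Assumption (not from the slides): router i carries an express link
    exactly when i % r == 0, and that link skips ahead by d routers.
    """
    short = [(i, (i + 1) % n) for i in range(n)]
    express = [(i, (i + d) % n) for i in range(n) if i % r == 0]
    return short, express


def fasttrack_ring(num_pes, d, r):
    """Describe one ring of a hypothetical FT(num_pes, d, r) torus."""
    side = isqrt(num_pes)
    if side * side != num_pes:
        raise ValueError("num_pes must be a perfect square")
    if not 1 <= d < side or r < 1:
        raise ValueError("need 1 <= d < side and r >= 1")
    short, express = ring_links(side, d, r)
    return {"side": side, "short": short, "express": express}


if __name__ == "__main__":
    # FT(16,2,1): every router on the 4-router ring has an express link.
    print(fasttrack_ring(16, 2, 1)["express"])
    # FT(16,2,2): only every second router has one (half population).
    print(fasttrack_ring(16, 2, 2)["express"])
```

Under this assumed rule, FT(16,2,2) keeps express links on half of the routers in each ring, which matches the "half population" label on the slide.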

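  23. FastTrack Switch Organization
      [Figure: switch datapaths in two panels, (a) Base HopliteRT and (b) FastTrack. Port labels shown: N, W, E, S, PE, and the short (Sh) and express (Ex) variants N Sh, N Ex, W Sh, W Ex, E Sh, E Ex, S Sh, S Ex. Multiplexer sizes shown: 5:1, 4:1, 3:1]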

  24. Switch Operation
      [Figure: FastTrack switch with ports N Sh, N Ex, W Sh, W Ex, E Sh, E Ex, S Sh, S Ex, PE and one 5:1 and three 4:1 multiplexers]
      ◮ Packets can start in either short or express links
      ◮ DOR routing function: travel in X first, then Y
      ◮ Packets can upgrade to fast links if they can
      ◮ Packets can downgrade to slow links only on turn!
      ◮ Livelock avoidance: W → S > N → S
      ◮ Express links = higher priority, deflected packets acquire higher priority → progress
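
The routing rules above can be sketched as a contention-free hop planner for a single packet. This is only an illustration: it ignores arbitration and deflection entirely, and the upgrade rule (take the express link only when the remaining offset in the current dimension is a multiple of D, so the packet never has to downgrade before its turn) is an assumption rather than something the slides state. The function name `dor_route` and the hop labels are hypothetical.

```python
def dor_route(src, dst, n, d):
    """Plan hops for one packet on an n x n unidirectional torus.

    Packets move east (X) first, then south (Y), as in dimension-ordered
    routing. Assumption (not from the slides): a packet uses the express
    link of length d only when its remaining offset in the current
    dimension is a multiple of d, so it can stay on express links until
    it turns or arrives. No contention or deflection is modeled.
    """
    x, y = src
    hops = []
    # X dimension: travel east until the destination column is reached.
    while x != dst[0]:
        remaining = (dst[0] - x) % n
        if remaining % d == 0:
            x = (x + d) % n
            hops.append("E_Ex")
        else:
            x = (x + 1) % n
            hops.append("E_Sh")
    # Y dimension: after the turn, the short/express choice is made afresh.
    while y != dst[1]:
        remaining = (dst[1] - y) % n
        if remaining % d == 0:
            y = (y + d) % n
            hops.append("S_Ex")
        else:
            y = (y + 1) % n
            hops.append("S_Sh")
    return hops


if __name__ == "__main__":
    # One short hop east, then express east, then express south.
    print(dor_route((0, 0), (3, 2), n=4, d=2))
```

In this sketch a packet that has upgraded keeps a remaining offset that is a multiple of D, so it only leaves the express links at its turn or at its destination.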
