Hermes-A: An Asynchronous NoC Router with Distributed Routing

  1. Hermes-A: An Asynchronous NoC Router with Distributed Routing
     Julian Pontes, Matheus Moreira, Fernando Moraes, Ney Calazans

  2. Outline
     • Introduction
     • Related Work
     • Architecture
       – Input Port
         • Path Calculation
       – Output Port
         • Output Control
     • Results
     • Future Work
     • Conclusions

  3. Introduction
     • Dual-rail encoding
     • Four-phase protocol
     • DIMS logic

     A B | S
     0 0 | SF
     0 1 | ST
     1 0 | ST
     1 1 | SF

     [Figure: dual-rail pipeline of registers (Reg) and logic blocks (Logic), with completion detectors (CD)]
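
A minimal behavioral sketch in Python of the three ideas on this slide. It assumes the common dual-rail convention of one (true, false) wire pair per bit, with (0, 0) as the four-phase spacer, and it reads SF and ST in the table as the false and true rails of output S, which makes the table a two-input XOR. DIMS is modeled as one C-element per input minterm, with the minterm outputs ORed into the output rails. The completion detector plays the role of the CD blocks in the figure. All names (`encode`, `decode`, `Dims2`, `completion`) are hypothetical.

```python
# Behavioral sketch: dual-rail encoding, Muller C-elements and a DIMS gate.
# Assumed rail convention per bit: (t, f) with (0, 0) = spacer, (1, 0) = 1, (0, 1) = 0.
from typing import Dict, List, Optional, Tuple

Pair = Tuple[int, int]
SPACER: Pair = (0, 0)


def encode(bit: int) -> Pair:
    """Map a logic bit onto its (true rail, false rail) pair."""
    return (1, 0) if bit else (0, 1)


def decode(pair: Pair) -> Optional[int]:
    """Return the bit, or None while the pair holds the spacer."""
    if pair == (1, 1):
        raise ValueError("illegal dual-rail code (1, 1)")
    if pair == SPACER:
        return None
    return pair[0]


def c_element(inputs: List[int], prev: int) -> int:
    """Muller C-element: switches when all inputs agree, otherwise holds its state."""
    if all(inputs):
        return 1
    if not any(inputs):
        return 0
    return prev


def rail(pair: Pair, value: int) -> int:
    """Pick the wire that signals `value`: true rail for 1, false rail for 0."""
    return pair[0] if value else pair[1]


class Dims2:
    """Two-input DIMS gate: one C-element per input minterm, ORed into S.T / S.F."""

    def __init__(self, truth: Dict[Tuple[int, int], int]) -> None:
        self.truth = truth
        self.minterm = {k: 0 for k in truth}  # state held by each minterm C-element

    def step(self, a: Pair, b: Pair) -> Pair:
        for x, y in self.minterm:
            ins = [rail(a, x), rail(b, y)]
            self.minterm[(x, y)] = c_element(ins, self.minterm[(x, y)])
        s_t = int(any(v for k, v in self.minterm.items() if self.truth[k]))
        s_f = int(any(v for k, v in self.minterm.items() if not self.truth[k]))
        return (s_t, s_f)


def completion(pairs: List[Pair], prev: int) -> int:
    """Completion detector (CD): 1 once every pair is valid, 0 once all are spacers."""
    return c_element([t | f for t, f in pairs], prev)


# Truth table from the slide: SF for inputs 00 and 11, ST for 01 and 10.
xor = Dims2({(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0})
out = xor.step(encode(0), encode(1))  # (1, 0): true rail of S, i.e. logic 1
xor.step(SPACER, SPACER)              # return to zero before the next token
```

Each data token is followed by a spacer before the next one. This is the return-to-zero half of the four-phase protocol, and it is what resets the minterm C-elements between evaluations.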

  4. Introduction
     • Asynchronous circuits
       – Less simultaneous switching ☺
         • Less EMI
         • Less IR drop (lighter power plan)
         • Lower peak power (no decap cells)
         • Possibly fewer crosstalk problems in data links (DI codes with the four-phase protocol; partial shielding of data links)
       – Average-case delay ☺
       – Reduced dynamic power (5 times lower in a 65 nm comparison) ☺

  5. Introduction
     • Asynchronous circuits
       – Area and leakage overhead (~3 to 5 times more, 65 nm) ☹
       – Lack of CAD tools and standards ☹
         • Synthesis tools
           – Traditional tools (~45 thousand loop breakers in a 3x3 NoC)
           – Asynchronous synthesis tools (Balsa, Teak)
             » Lack of traditional optimizations (pin swapping, reordering, retiming, …)
         • STA
           – Liberty file support (is_async_reg)
           – New set of constraints (cycle time definition)

  6. Introduction
     • Networks on Chip
       – Offer large communication parallelism
       – Can provide alternate paths
     • Asynchronous Networks on Chip
       – Enable the design of complex GALS Systems on Chip

  7. Objective
     • Design an asynchronous router architecture capable of supporting the design of GALS systems
       – High throughput
       – Low power
       – Allow the implementation of fine-grain power control
         » MVS
         » Power shut-off

  8. Related Work

     NoC | Topology | Routing / Flow Control | Network Interface | Asynchronous Style | Links and Encoding | Implementation
     As. QNoC | 2D Mesh (Irreg/Reg) | Source / wormhole / credit-based with preemption, 8 VCs | N/A | 4-phase bundled-data | 10-bit flits | 180 nm, 200 Mflits/s, ASIC
     RasP | Framework (Irreg/Reg) | Source / bit serial / point-to-point | Ad hoc | QDI | Point-to-point pipelined serial links | 180 nm, 700 Mb/s
     ASPIN | 2D Mesh (Reg) | Distributed XY / wormhole / EOP | A2S, S2A FIFOs | Bundled-data / QDI | Dual-rail, 4-phase, 34-bit flits | 90 nm, 714 Mflits/s
     ANoC | 2D Mesh | Source / 2 VCs | Adaptive | QDI | One-of-four | 130 nm, 5 Gb/s (router)
     Hermes-A | 2D Mesh | Distributed XY / wormhole / BOP-EOP | Dual-rail SCAFFI (clock stretching) | QDI | Dual-rail | 180 nm, 727 Mbits/s (454 Mflits/s, 3.6 Gb/s per router), ASIC

  15. Router Architecture
     • Distributed routing
     • Independent ports
     • Dual-rail encoding
     • Weak-Conditioned Half Buffer (WCHB)
     • DIMS logic
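
The router's buffers are Weak-Conditioned Half Buffer stages. The Python sketch below models a single one-bit dual-rail WCHB stage, assuming the textbook equations r.t = C(l.t, r.e), r.f = C(l.f, r.e) and l.e = NOT(r.t OR r.f), where the enables are inverted acknowledges of the four-phase protocol. It is a behavioral illustration, not the Hermes-A circuit; `WchbStage` and its signal names are hypothetical.

```python
# Behavioral sketch of one dual-rail WCHB stage (illustrative, not the Hermes-A netlist).
# Assumed equations: r_t = C(l_t, r_e), r_f = C(l_f, r_e), l_e = NOT(r_t OR r_f),
# where l_e / r_e are active-high enables (inverted acknowledges).
from typing import Tuple


def c_element(a: int, b: int, prev: int) -> int:
    """Two-input Muller C-element: follows the inputs when they agree, otherwise holds."""
    return a if a == b else prev


class WchbStage:
    def __init__(self) -> None:
        self.r_t = 0  # output true rail
        self.r_f = 0  # output false rail

    @property
    def l_e(self) -> int:
        """Enable sent back to the previous stage: high while the output is a spacer."""
        return int(not (self.r_t or self.r_f))

    def evaluate(self, l_t: int, l_f: int, r_e: int) -> Tuple[Tuple[int, int], int]:
        """Update the output rails from the input rails and the next stage's enable."""
        self.r_t = c_element(l_t, r_e, self.r_t)
        self.r_f = c_element(l_f, r_e, self.r_f)
        return (self.r_t, self.r_f), self.l_e
```

The weak condition lives in the C-elements: the output returns to the spacer only after the input has returned to the spacer and the next stage has lowered its enable. Because every token is followed by a spacer, a chain of WCHB stages holds at most one data token per two stages, hence "half buffer".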

  16. Input Port
     • Packet
       – The first flit contains the address
       – BOP and EOP (begin-of-packet and end-of-packet) delimiters
       – Three main paths
         • First flit (1), last flit (3) and other flits (2)
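
A small Python sketch of how BOP and EOP could select among the three input-port paths. It assumes one BOP bit and one EOP bit per flit, with BOP = 1 on the first (address) flit and EOP = 1 on the last one. It also treats BOP = EOP = 1 as reserved for closing a connection, as described for the S-Element. `Flit`, `classify` and `make_packet` are hypothetical names.

```python
# Sketch of the three input-port paths selected by the BOP/EOP delimiters.
# Assumptions: one BOP and one EOP bit per flit; BOP = 1 marks the first flit,
# EOP = 1 the last one; BOP = EOP = 1 is reserved for closing a connection.
from dataclasses import dataclass
from typing import List

FIRST_FLIT, OTHER_FLIT, LAST_FLIT = 1, 2, 3  # path numbers used on the slide


@dataclass(frozen=True)
class Flit:
    payload: int
    bop: bool = False  # begin-of-packet delimiter
    eop: bool = False  # end-of-packet delimiter


def classify(flit: Flit) -> int:
    """Return the input-port path (1, 2 or 3) a flit would take."""
    if flit.bop and flit.eop:
        raise ValueError("BOP = EOP = 1 closes a connection; it is not a packet flit")
    if flit.bop:
        return FIRST_FLIT  # carries the address, so it feeds path calculation
    if flit.eop:
        return LAST_FLIT
    return OTHER_FLIT


def make_packet(address: int, payloads: List[int]) -> List[Flit]:
    """Frame a packet: address flit first, EOP on the final payload flit."""
    if not payloads:
        raise ValueError("a packet needs at least one payload flit")
    body = [Flit(p) for p in payloads[:-1]]
    return [Flit(address, bop=True)] + body + [Flit(payloads[-1], eop=True)]
```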

  17. Input Port [Figure: input port diagram, annotated "10"]

  18. Path Calculation
     • All logic employs Delay-Insensitive Minterm Synthesis (DIMS)
     • The first flit contains the XY address
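
The path calculation itself is DIMS hardware. The Python sketch below only shows the XY decision that such logic implements. Several details are assumptions for illustration: the address layout (target X in the upper half of the first flit, Y in the lower half), the port names and the orientation of the axes. `xy_path` applies the decision router by router. This is what distributed routing means in the comparison table: each router picks only its own output port, instead of reading a precomputed path from the header as in source routing.

```python
# Behavioral sketch of XY path calculation (the router computes this with DIMS logic).
# Assumptions, not from the slides: the first flit packs target X in its upper half
# and Y in its lower half; X grows toward EAST and Y toward NORTH.
from enum import Enum
from typing import List, Tuple

Coord = Tuple[int, int]


class Port(Enum):
    EAST = "E"
    WEST = "W"
    NORTH = "N"
    SOUTH = "S"
    LOCAL = "L"


STEP = {Port.EAST: (1, 0), Port.WEST: (-1, 0), Port.NORTH: (0, 1), Port.SOUTH: (0, -1)}


def unpack_address(first_flit: int, half_bits: int = 4) -> Coord:
    """Split the address flit into (x, y)."""
    mask = (1 << half_bits) - 1
    return (first_flit >> half_bits) & mask, first_flit & mask


def xy_route(here: Coord, target: Coord) -> Port:
    """Correct X first, then Y; deliver locally when both match."""
    (hx, hy), (tx, ty) = here, target
    if tx > hx:
        return Port.EAST
    if tx < hx:
        return Port.WEST
    if ty > hy:
        return Port.NORTH
    if ty < hy:
        return Port.SOUTH
    return Port.LOCAL


def xy_path(src: Coord, dst: Coord) -> List[Port]:
    """Output port chosen at each router on the way from src to dst."""
    hops: List[Port] = []
    here = src
    while True:
        port = xy_route(here, dst)
        hops.append(port)
        if port is Port.LOCAL:
            return hops
        dx, dy = STEP[port]
        here = (here[0] + dx, here[1] + dy)
```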

  19–30. Input Port [Figure sequence: step-by-step walkthrough of the input port, annotated with 10, 4, 14 and K]

  31. Input Port [Figure: S-Control block]

  32. S-Element: Enclosure
     • Starts with a handshake at the input port
     • Performs two handshakes
       – The first to send the last flit
       – The second to close the communication section at the output port (EOP = BOP = 1)
     • Speed-independent design
       – Circuit generated with Petrify
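
An event-order sketch in Python of the S-Element enclosure. The input handshake starts, then two complete four-phase handshakes run on the output side (first the last flit, then the BOP = EOP = 1 close token), and only after that is the input acknowledged. The event names and the single `output` channel are assumptions. The real controller is a speed-independent circuit generated with Petrify, not this code.

```python
# Event-order sketch of the S-Element enclosure (not the Petrify-generated circuit).
# Assumption: four-phase handshakes written as req+, ack+, req-, ack-; both output
# handshakes complete while the input request is still high (enclosure).
from typing import List, Tuple

Event = Tuple[str, str]  # (channel, transition)


def four_phase(channel: str, data: str) -> List[Event]:
    """One complete four-phase (return-to-zero) handshake on a channel."""
    return [(channel, "req+ " + data), (channel, "ack+"), (channel, "req-"), (channel, "ack-")]


def s_control(last_flit: str) -> List[Event]:
    """Input handshake enclosing two output handshakes: last flit, then close."""
    events = [("input", "req+ " + last_flit)]        # last flit arrives at the input
    events += four_phase("output", last_flit)          # 1st handshake: forward the last flit
    events += four_phase("output", "BOP=EOP=1")        # 2nd handshake: close the connection
    events += [("input", "ack+"), ("input", "req-"), ("input", "ack-")]
    return events


def is_enclosed(events: List[Event]) -> bool:
    """True if every output event falls between input req+ and input ack+."""
    start = next(i for i, (ch, ev) in enumerate(events) if ch == "input" and ev.startswith("req+"))
    end = events.index(("input", "ack+"))
    return all(start < i < end for i, (ch, _) in enumerate(events) if ch == "output")
```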

  33. S-Control [Figure: S-Control signals. Input: LAST FLIT, Input Ack; S-Control – Output Port A: LAST FLIT, Ack A; S-Control – Output Port B: BOP = EOP = 1, Ack B]

  34. Output Port
     • Arbitration
     • Kill Section
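
The slide lists arbitration and a kill section for the output port. The Python sketch below reads them as two rules: grant the port to one input at a time, and release it when the BOP = EOP = 1 token arrives. Granting in arrival order is an assumption made for the sketch. In an asynchronous router, truly simultaneous requests are resolved by an arbiter circuit. `OutputPort` and its methods are hypothetical.

```python
# Sketch of an output port that serves one input at a time and releases on the
# BOP = EOP = 1 token. Assumption: waiting requests are granted in arrival order.
from collections import deque
from typing import Deque, List, Optional, Tuple


class OutputPort:
    def __init__(self) -> None:
        self.owner: Optional[str] = None        # input port currently connected
        self.waiting: Deque[str] = deque()      # inputs waiting for a grant
        self.sent: List[Tuple[str, int]] = []   # flits forwarded so far

    def request(self, inp: str) -> None:
        """Arbitration: grant immediately if free, otherwise queue the request."""
        if self.owner is None:
            self.owner = inp
        else:
            self.waiting.append(inp)

    def send(self, inp: str, payload: int = 0, bop: bool = False, eop: bool = False) -> None:
        """Forward a flit from the owner; BOP = EOP = 1 kills the connection."""
        if inp != self.owner:
            raise RuntimeError(f"{inp} has not been granted this output port")
        if bop and eop:
            # Kill: drop the connection and hand the port to the next waiting input.
            self.owner = self.waiting.popleft() if self.waiting else None
            return
        self.sent.append((inp, payload))
```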

  35. Output Port
