
The Simulation of the Dynamic Link Allocation Router (DyLAR)



  1. The Simulation of the Dynamic Link Allocation Router (DyLAR)
     Wei Song, Advanced Processor Technology Group, The School of Computer Science, 2014/5/13

  2. Overview
     • A brief review of the Dynamic Link Allocation flow control method
     • The new simulation platform
     • Some simple performance analyses
     • An alternative method of the task request procedure
     • Future schedule

  3. Serial is better than Parallel
     [Figure: a parallel link built from four dual-rail channels (Dual-rail 0 to Dual-rail 3). Each channel n has input wires DIn 0 and DIn 1 and output wires DOn 0 and DOn 1 passing through C-elements; a further group of C-elements and the acknowledge signals ACKI and ACKO are also shown.]

  4. Bandwidth efficiency is less than 50%
     [Figure: timing diagram between a Master and a Slave. Labels in order of appearance: request to reserve a path, false ACK, request to reserve a path, OK ACK, a run of data transmissions ending with "Data transmissions (end)", false ACK.]

  5. The high Loss Rate
     Simulation results of a 6x6 NoC.
     [Figure: two plots against frame injection rate (kfps, 0 to 400). One shows loss rate (0 to 0.9) with curves for flit-level loss rate and frame-level retry rate; the other shows average frame latency (ns, 0 to 16000).]

  6. Some hypotheses of DyLAR
     • Asynchronous circuits prefer serial channels to parallel channels
     • Connection-oriented communication has a bandwidth efficiency of less than 50%
     • The high retry rate of connection-oriented communication can be reduced by adding virtual channels
     • The input buffer could be smaller than the flit size when serial channels are used

  7. The DyLAR Router
     [Figure: block diagram of the DyLAR router. Blocks shown: Arbiter, Request Switch, Data Switch, Tran Control units, Input Buffers and Output Buffers. Three ports (0 to 2) are drawn, each with sub-links Sub-link (p,0) to Sub-link (p,2) and matching credit lines Credit (p,0) to Credit (p,2).]

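  8. Flit Formats
     [Figure: field layouts of the data flit and the header flit. Labels recovered: data flit with a flit type field and a 128 bit field; header flit with a flit type field, an 8 bit Y field, an 8 bit X field and REQ_NUM; further width labels "4 bit *2" and "4" whose fields cannot be identified from the extracted text.]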

  9. The Flow Control Procedures
     [Figure: two routers, each with an Arbiter and three Tran Control units, with four numbered steps (1 to 4) marked on the connections between them.]

  10. Overview
      • A brief review of the Dynamic Link Allocation flow control method
      • The new simulation platform
      • Some simple performance analyses
      • An alternative method of the task request procedure
      • Future schedule

  11. Basic information
      – Mesh topology
      – Only send XY frames
      – Parameter reconfigurable
      – Latency is set according to the 1-of-4 CHAIN link
      – SystemC 2.2.0
      – GNU g++
      – Makefile
      – Batch simulation and automatic result analysis (accepted traffic, latency, loss rate)
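      The slide does not define "XY frames". A minimal sketch, assuming it refers to dimension-order XY routing on the mesh (correct the X offset first, then the Y offset). The `Port` names, the `Coord` type and the axis orientation are hypothetical and are not taken from the DyLAR model.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical port names; the deck does not name the router ports this way.
enum class Port { Local, East, West, North, South };

// Hypothetical mesh coordinate: x grows to the east, y grows to the north.
struct Coord { int x; int y; };

// Dimension-order (XY) routing: move along X until the column matches,
// then along Y. Returns Local when the frame has arrived.
Port xy_route(Coord here, Coord dest) {
    if (dest.x > here.x) return Port::East;
    if (dest.x < here.x) return Port::West;
    if (dest.y > here.y) return Port::North;
    if (dest.y < here.y) return Port::South;
    return Port::Local;
}

// Number of router-to-router hops on the XY path (Manhattan distance).
int hop_count(Coord src, Coord dest) {
    return std::abs(dest.x - src.x) + std::abs(dest.y - src.y);
}
```

      Under this assumption a frame from (0,0) to (2,1) leaves through the east port first and needs three hops in total.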

  12. Configurable parameters
      – Dimension (>1)
      – Injected traffic (kfps) (>0)
      – Channel number (>0)
      – Request number (>0)
      – Random seed (0: random seed; other values: used as the seed)
      – Random delay
      – Simulation time
      – VCD file (generates waveform and debug logs)
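      The parameter list above can be sketched as a plain C++ structure with a bounds check. This is an illustration only: `SimConfig`, `validate`, all field names, default values and the nanosecond time unit are assumptions, and the positive simulation time check is added here rather than stated on the slide.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical container for the parameters named on the slide.
struct SimConfig {
    int dimension = 4;             // mesh is dimension x dimension; slide: > 1
    double injected_kfps = 20.0;   // injected traffic in kfps; slide: > 0
    int channels = 1;              // channel number; slide: > 0
    int requests = 1;              // request number; slide: > 0
    std::uint32_t seed = 0;        // 0 is read here as "choose a random seed"
    bool random_delay = true;      // enable random delay
    double sim_time_ns = 1.0e6;    // simulation time; the unit is an assumption
    bool dump_vcd = false;         // write a VCD waveform file
};

// Returns one message per violated bound; an empty vector means the
// configuration passes every check.
std::vector<std::string> validate(const SimConfig& c) {
    std::vector<std::string> errors;
    if (c.dimension <= 1) errors.push_back("dimension must be > 1");
    if (!(c.injected_kfps > 0.0)) errors.push_back("injected traffic must be > 0");
    if (c.channels <= 0) errors.push_back("channel number must be > 0");
    if (c.requests <= 0) errors.push_back("request number must be > 0");
    // Assumption: a run needs a positive duration (not stated on the slide).
    if (!(c.sim_time_ns > 0.0)) errors.push_back("simulation time must be > 0");
    return errors;
}
```

      Checking the bounds before a batch run starts is cheap compared with a 4x4 simulation, so rejecting a bad configuration early is the main point of the sketch.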

  13. Current Problems
      • The router design
        – Multiple request lines sharing one channel lead to deadlocks (still being debugged and modified)
      • The simulation model
        – Slow (can exceed 20 min in 4x4 cases)
        – Memory-consuming (can exceed 2 GB in some 4x4 cases)
      Simulation environment: AMD 2.4 GHz, 64-bit, 4 GB memory

  14. Deadlock Avoidance 1
      [Figure: diagram only; no labels recovered.]

  15. Deadlock Avoidance 2
      [Figure: diagram only; no labels recovered.]

  16. Deadlock Recovery 1
      [Figure: DyLAR router block diagram with Arbiter, Request Switch, Data Switch, Tran Control units, Input Buffers and Output Buffers.]

  17. Deadlock Recovery 2
      [Figure: DyLAR router block diagram with Arbiter, Request Switch, Data Switch, Tran Control units, Input Buffers and Output Buffers.]

  18. Overview
      • A brief review of the Dynamic Link Allocation flow control method
      • The new simulation platform
      • Some simple performance analyses
      • An alternative method of the task request procedure
      • Future schedule

  19. Simulation parameters
      • Dimension: 4x4
      • Channels: 1 to 3
      • Request lines: 1 to 8
      • Frame injection rate: 20 to 500 kfps
      • Random delay and random uniform traffic pattern
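      The parameter ranges above can be enumerated for a batch run roughly as follows. `SweepPoint` and `build_sweep` are hypothetical names, and the injection-rate step is left as an argument because the slides do not state which rates or request counts inside the ranges were actually simulated.

```cpp
#include <cassert>
#include <vector>

// One simulation run in the sweep (hypothetical type).
struct SweepPoint {
    int dimension;  // mesh dimension, fixed at 4 (4x4)
    int channels;   // 1 to 3
    int requests;   // request lines, 1 to 8
    int kfps;       // frame injection rate, 20 to 500 kfps
};

// Enumerates every combination in the ranges listed on the slide.
// kfps_step is an assumption; a non-positive step yields an empty sweep.
std::vector<SweepPoint> build_sweep(int kfps_step) {
    std::vector<SweepPoint> points;
    if (kfps_step <= 0) return points;
    for (int ch = 1; ch <= 3; ++ch) {
        for (int rq = 1; rq <= 8; ++rq) {
            for (int kfps = 20; kfps <= 500; kfps += kfps_step) {
                points.push_back({4, ch, rq, kfps});
            }
        }
    }
    return points;
}
```

      With a step of 20 kfps this gives 3 x 8 x 25 = 600 runs, which at the reported run times of up to 20 minutes each shows why batch execution and automatic result analysis matter.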

  20. 1 channel with multiple requests
      [Figure: two plots against injection rate (kfps) with curves for 1, 2, 4 and 6 requests: accepted traffic (MByte/s, 0 to 320) and average latency (ns, 0 to 70000).]

  21. 1 channel with multiple requests
      [Figure: retry rate (0 to 0.9) against injection rate (kfps, 0 to 140) with curves for 1, 2, 4 and 6 requests.]

  22. 1 request with multiple channels
      [Figure: two plots against injection rate (kfps) with curves 1C1R, 2C1R and 3C1R: accepted traffic (MByte/s, 0 to 300) and average latency (ns, 0 to 50000).]

  23. 1 request with multiple channels
      [Figure: average latency (ns, 0 to 6000) against injection rate (kfps) with curves 1C1R, 2C1R and 3C1R.]

  24. 2 channels with multi-requests
      [Figure: two plots against injection rate (kfps, 0 to 350) with curves C1R1, C2R1, C2R2, C2R4, C2R6 and C2R8: accepted traffic (MByte/s, 0 to 800) and average latency (ns, 0 to 50000).]

  25. 3 channels with multi-requests
      [Figure: two plots against injection rate (kfps, 0 to 500) with curves 3C1R, 3C2R, 3C4R, 3C6R and 3C8R: accepted traffic (MByte/s, 0 to 1200) and average latency (ns, 0 to 80000).]

  26. Throughput (unit: MByte/s)

                   1 request   2 requests   4 requests   6 requests   8 requests
      1 channel    186         266          300          300          300
      2 channels   265         512          710          >710         >710
      3 channels   300         650          >1000        >1000        >1000
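      To read the scaling out of the throughput table, each entry can be divided by the 1 channel, 1 request baseline of 186 MByte/s; for example 512 / 186 is about 2.75 for 2 channels with 2 requests. A small sketch of that arithmetic follows. The type and function names are hypothetical, and entries reported as ">N" are stored as N and flagged as lower bounds, so their ratios are lower bounds as well.

```cpp
#include <cassert>
#include <cstdio>

// One table entry in MByte/s; lower_bound marks values reported as ">N".
struct Cell { double mbytes; bool lower_bound; };

const int kRequests[5] = {1, 2, 4, 6, 8};
const Cell kThroughput[3][5] = {
    {{186, false}, {266, false}, {300, false}, {300, false}, {300, false}},
    {{265, false}, {512, false}, {710, false}, {710, true}, {710, true}},
    {{300, false}, {650, false}, {1000, true}, {1000, true}, {1000, true}},
};

// Throughput relative to the 1 channel, 1 request configuration.
// channels is 1 to 3; req_index selects a column of kRequests (0 to 4).
double speedup(int channels, int req_index) {
    return kThroughput[channels - 1][req_index].mbytes / kThroughput[0][0].mbytes;
}

// Prints one line per table entry, prefixing lower bounds with ">=".
void print_speedups() {
    for (int ch = 1; ch <= 3; ++ch) {
        for (int i = 0; i < 5; ++i) {
            const char* prefix = kThroughput[ch - 1][i].lower_bound ? ">=" : "";
            std::printf("%d channel(s), %d request(s): %s%.2fx\n",
                        ch, kRequests[i], prefix, speedup(ch, i));
        }
    }
}
```

      Calling `print_speedups()` lists all fifteen ratios. The single-channel row stops at 300 / 186, about 1.61, which matches the table flattening at 300 MByte/s from 4 requests onward.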

  27. Overview
      • A brief review of the Dynamic Link Allocation flow control method
      • The new simulation platform
      • Some simple performance analyses
      • An alternative method of the task request procedure
      • Future schedule

  28. The Original Task Request Procedure
      [Figure: message sequence between one node labelled M and three nodes labelled S. Message labels: TRF(3), TRF(2), TRF(1), TRF(0), TAF, VRF/TAF.]
      TRF: task request flit
      VRF: volunteer request flit
      TAF: task acknowledge flit

  29. The alternative method
      [Figure: two message sequences, each between one node labelled M and three nodes labelled S. Message labels: TRF(3), TRF(2), TRF(1), TRF(3), VRF/TAF, TAF.]
