The Simulation of the Dynamic Link Allocation Router (DyLAR)
Wei Song
Advanced Processor Technology Group, The School of Computer Science
2014/5/13
Overview
• A brief review of the Dynamic Link Allocation flow control method
• The new simulation platform
• Some simple performance analyses
• An alternative method of the task request procedure
• Future schedule
Serial is better than parallel
[Figure: a 4-bit parallel asynchronous channel built from four dual-rail bit slices (inputs DI0 to DI3, outputs DO0 to DO3), each rail passing through C-elements, with further C-elements merging the slices into the acknowledgement (ACKI/ACKO).]
Bandwidth efficiency is less than 50%
[Figure: timing diagram between master and slave. A request to reserve a path is answered by a False ACK and retried; a later request receives an OK ACK, after which data transmissions follow until the final (end) transmission.]
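One way to read the timing diagram, using a simple model assumed here for illustration (the symbols are not from the original): let T_req be the round trip of one path-reservation request and its ACK, r the number of False ACKs received before the OK ACK, T_data the time spent transmitting data and T_rel the time needed to release the path. The fraction of a transfer's elapsed time spent moving data is then

```latex
\eta = \frac{T_{\mathrm{data}}}{(r+1)\,T_{\mathrm{req}} + T_{\mathrm{data}} + T_{\mathrm{rel}}}
```

Even with no retries (r = 0) and a free release (T_rel = 0), the model gives η ≤ 50% whenever T_req ≥ T_data, and every False ACK lowers η further.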
The high loss rate
Simulation results of a 6x6 NoC.
[Figure: left, flit-level loss rate and frame-level retry rate vs. frame injection rate (kfps); right, average frame latency (ns) vs. frame injection rate (kfps).]
Some hypotheses of DyLAR
• Asynchronous circuits prefer serial channels to parallel channels
• Connection-oriented communication has a bandwidth efficiency of less than 50%
• The high retry rate of connection-oriented communication can be reduced by adding virtual channels
• The input buffer can be smaller than a flit when serial channels are used
The DyLAR Router
[Figure: DyLAR router architecture. A router arbiter and a request switch; three Tran Control units, each with an input buffer, an output buffer and three sub-links Sub-link (i,0) to (i,2) with matching credit wires Credit (i,0) to (i,2), for i = 0 to 2; a data switch connects the input buffers to the output buffers.]
Flit Formats
[Figure: data flit and header flit formats. Recoverable field labels: flit type, flit data (128 bit), REQ_NUM, Y (8 bit), X (8 bit); other widths shown: 4 bit, 4 bit ×2.]
The Flow Control Procedures
[Figure: flow control between routers, showing the arbiters and Tran Control units involved and four numbered steps (1 to 4).]
Basic information
– Mesh topology
– Only XY-routed frames are sent
– Parameters are reconfigurable
– Link latency is set according to a 1-of-4 CHAIN link
– SystemC 2.2.0
– GNU g++
– Makefile
– Batch simulation and automatic result analysis (accepted traffic, latency, loss rate)
Configurable parameters
– Dimension (>1)
– Injected traffic (kfps) (>0)
– Channel number (>0)
– Request number (>0)
– Random seed (0 selects a random seed; any other value is used as the seed)
– Random delay
– Simulation time
– VCD file (generates waveforms and debug logs)
Current Problems
• The router design
– Multiple request lines sharing one channel lead to deadlocks (still being debugged and modified)
• The simulation model
– Slow (possibly more than 20 min for 4x4 cases)
– High memory consumption (possibly more than 2 GB for some 4x4 cases)
Simulation environment: AMD 2.4 GHz 64-bit CPU, 4 GB memory
Deadlock Avoidance 1
Deadlock Avoidance 2
Deadlock Recovery 1
[Figure: DyLAR router block diagram (router arbiter, request switch, Tran Control units, input and output buffers, data switch) illustrating deadlock recovery.]
Deadlock Recovery 2
[Figure: DyLAR router block diagram (router arbiter, request switch, Tran Control units, input and output buffers, data switch) illustrating deadlock recovery.]
Simulation parameters
• Dimension: 4x4
• Channels: 1 to 3
• Request lines: 1 to 8
• Frame injection rate: 20 to 500 kfps
• Random delay and uniform random traffic pattern
1 channel with multiple requests
[Figure: accepted traffic (MByte/s) and average latency (ns) vs. injection rate (kfps) for 1, 2, 4 and 6 request lines.]
1 channel with multiple requests
[Figure: retry rate vs. injection rate (kfps) for 1, 2, 4 and 6 request lines.]
1 request with multiple channels
[Figure: accepted traffic (MByte/s) and average latency (ns) vs. injection rate (kfps) for 1C1R, 2C1R and 3C1R (nCmR: n channels, m request lines).]
1 request with multiple channels
[Figure: close-up of average latency (ns) vs. injection rate (kfps) at low injection rates for 1C1R, 2C1R and 3C1R.]
2 channels with multiple requests
[Figure: accepted traffic (MByte/s) and average latency (ns) vs. injection rate (kfps) for C1R1, C2R1, C2R2, C2R4, C2R6 and C2R8.]
3 channels with multiple requests
[Figure: accepted traffic (MByte/s) and average latency (ns) vs. injection rate (kfps) for 3C1R, 3C2R, 3C4R, 3C6R and 3C8R.]
Throughput (unit: MByte/s)
              1 request   2 requests   4 requests   6 requests   8 requests
1 channel        186         266          300          300          300
2 channels       265         512          710         >710         >710
3 channels       300         650        >1000        >1000        >1000
The Original Task Request Procedure
[Figure: message sequence between a master (M) and three slaves (S) using TRF(3), TRF(2), TRF(1), TRF(0), TAF and VRF/TAF flits.]
TRF: task request flit
VRF: volunteer request flit
TAF: task acknowledge flit
The alternative method
[Figure: the alternative message sequence between a master (M) and three slaves (S) using TRF(3), TRF(2), TRF(1), TRF(3), VRF/TAF and TAF flits.]