Fifth International Conference on Broadband and Wireless Computing, Communication and Applications, Nov. 4, 2010

Advanced Design Issues for OASIS Network-on-Chip Architecture

Kenichi Mori, Adam Esch, Abderazek Ben Abdallah, Kenichi Kuroda
The University of Aizu, Japan
2010/11/4 BWCCA 2010 1
Contents
• Background
• Original OASIS NoC
  – Architecture
  – Drawback
• Our contribution
• Proposed ONoC mechanism
  – Stall-go flow control methodology
  – ONoC (Optimized NoC) architecture
• Simulation results
• Summary
Background
• Network-on-Chip (NoC) can solve the problems of bus-based systems
• It is a scalable architectural platform with huge potential to handle growing complexity
• Processing elements are connected via a packet-switched communication network
[Figure: bus-based system vs. Network-on-Chip system with processing elements P1 to P6; P: processing element, S: switch]
OASIS NoC: Network
• The original OASIS* NoC is a 4x4 mesh network
• Each router has one processing element
[Figure: OASIS whole network]
* A. Ben Abdallah, M. Sowa, "Basic Network-on-Chip Interconnection for Future Gigascale MCSoCs Applications: Communication and Computation Orthogonalization," JASSST 2006, Dec. 4-9, 2006.
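The mesh described above can be sketched in a few lines of Python. The (x, y) coordinate convention (x growing East, y growing North) and the function names are assumptions for illustration. The port names follow the Local/North/South/East/West ports in the router diagrams.

```python
# Minimal sketch of a 4x4 mesh where each router hosts one processing element.
# Assumption: routers are indexed by (x, y), x growing East, y growing North.

MESH_W, MESH_H = 4, 4

def neighbors(x: int, y: int) -> dict:
    """Return the mesh neighbours of router (x, y), keyed by output port."""
    candidates = {
        "East": (x + 1, y),
        "West": (x - 1, y),
        "North": (x, y + 1),
        "South": (x, y - 1),
    }
    # Edge and corner routers simply have fewer router-to-router links.
    return {port: (nx, ny) for port, (nx, ny) in candidates.items()
            if 0 <= nx < MESH_W and 0 <= ny < MESH_H}

def link_count() -> int:
    """Number of bidirectional router-to-router links in the mesh."""
    degree_sum = sum(len(neighbors(x, y)) for x in range(MESH_W) for y in range(MESH_H))
    return degree_sum // 2  # each link is counted once from each end

if __name__ == "__main__":
    print(neighbors(0, 0))  # corner router: only East and North links
    print(link_count())     # 12 horizontal + 12 vertical links
```

A 4x4 mesh has 24 router-to-router links. Corner routers have two mesh neighbours and interior routers have four, and every router additionally keeps a Local port for its processing element.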
OASIS NoC: Routing
• The routing algorithm is static XY routing
• The switching method is wormhole switching
[Figure: route from source to destination in the OASIS whole network; flit structure with routing information]
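Here is a minimal Python sketch of static XY (dimension-order) routing. It assumes (x, y) mesh coordinates with x growing East and y growing North, and the function names are hypothetical. A packet first moves along X until its column matches the destination, then along Y. Under wormhole switching, the head flit sets up this path and the body and tail flits follow it.

```python
# Sketch of static XY routing on a 2D mesh (hypothetical coordinate convention).

def xy_next_port(cur: tuple, dst: tuple) -> str:
    """Output port chosen by XY routing at router `cur` for destination `dst`."""
    (cx, cy), (dx, dy) = cur, dst
    if dx > cx:
        return "East"
    if dx < cx:
        return "West"
    if dy > cy:
        return "North"
    if dy < cy:
        return "South"
    return "Local"  # arrived: deliver to the attached processing element

def xy_path(src: tuple, dst: tuple) -> list:
    """Routers visited from src to dst, both inclusive."""
    step = {"East": (1, 0), "West": (-1, 0), "North": (0, 1), "South": (0, -1)}
    path, cur = [src], src
    while (port := xy_next_port(cur, dst)) != "Local":
        ox, oy = step[port]
        cur = (cur[0] + ox, cur[1] + oy)
        path.append(cur)
    return path

if __name__ == "__main__":
    # X first (two hops East), then Y (three hops North).
    print(xy_path((0, 0), (2, 3)))
```

Every packet finishes its X moves before any Y move, so XY routing is deterministic and deadlock-free on a 2D mesh. The trade-off is that it cannot steer around a congested link.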
OASIS NoC: Router design
• One router has three pipeline stages
  – First stage: buffering and routing mechanisms
  – Second stage: scheduling and flow control mechanism
  – Third stage: sends each flit to the appropriate next port
[Figure: OASIS router block diagram with five Input_port modules (Local, South, North, West, East), switch allocator (sw_alloc) and crossbar; 76-bit data paths per port, data_in/data_out buses [379:0], port_req[24:0], cntrl[24:0], drop_in/drop_out]
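The second pipeline stage decides which input port may use each output port. The ONoC simulation parameters list round-robin scheduling. The following is a minimal Python sketch of a round-robin arbiter, with a hypothetical class name and interface rather than the actual router RTL:

```python
from typing import Optional

# Port order is an assumption; the slides name Local, North, South, East, West.
PORTS = ["Local", "North", "South", "East", "West"]

class RoundRobinArbiter:
    """Grants one of n requesters per call, rotating priority after each grant."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.last = n - 1  # start so that requester 0 has top priority

    def grant(self, requests: list) -> Optional[int]:
        """Return the index of the granted requester, or None if none request.

        The search starts just after the previous winner, so each active
        requester is granted within n calls (no starvation)."""
        for offset in range(1, self.n + 1):
            i = (self.last + offset) % self.n
            if requests[i]:
                self.last = i
                return i
        return None

if __name__ == "__main__":
    arb = RoundRobinArbiter(len(PORTS))
    # All five input ports compete for the same output port every cycle.
    for cycle in range(6):
        winner = arb.grant([True] * len(PORTS))
        print(cycle, PORTS[winner])
```

Rotating the priority after every grant keeps one busy input from starving the others. In exchange, the arbiter ignores how long each request has been waiting.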
OASIS NoC drawback
• The original OASIS NoC has an overhead problem
  – Under congestion, a large number of flits are dropped
  – Dropped flits must be sent again by the source node, which causes large overhead
[Figure: two PEs connected through routers; under congestion the receiving buffer is full and flits are dropped]
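A toy cycle-level model can illustrate the cost of dropping flits. This Python sketch is an illustration under stated assumptions, not the OASIS implementation. It models one sender and one receiver FIFO of a given depth. A congested downstream path forwards one flit every `drain_every` cycles, and there is no back-pressure, so a flit that reaches a full FIFO is dropped and offered again. The function name and parameters are hypothetical.

```python
from collections import deque

def simulate_drop_on_full(n_flits: int, depth: int, drain_every: int) -> tuple:
    """Toy link whose receiver drops flits when its FIFO is full.

    Each cycle the sender offers its next flit. The receiver FIFO forwards
    one flit downstream every `drain_every` cycles (congestion). A flit that
    reaches a full FIFO is dropped and must be sent again.

    Returns (transmissions, drops); transmissions counts every attempt to
    push a flit over the link, including the dropped ones."""
    fifo = deque()
    next_flit = delivered = drops = transmissions = cycle = 0
    while delivered < n_flits:
        cycle += 1
        # Downstream side: the congested path accepts one flit every few cycles.
        if cycle % drain_every == 0 and fifo:
            fifo.popleft()
            delivered += 1
        # Upstream side: no back-pressure, so the sender always tries to send.
        if next_flit < n_flits:
            transmissions += 1
            if len(fifo) < depth:
                fifo.append(next_flit)
                next_flit += 1
            else:
                drops += 1  # flit lost; the same flit is offered again later

    return transmissions, drops

if __name__ == "__main__":
    print(simulate_drop_on_full(n_flits=12, depth=4, drain_every=2))
```

Consider 12 flits, a 4-deep FIFO and a receiver that forwards one flit every other cycle. The sketch returns (16, 4): four flits are dropped and sent again. It only counts wasted link transmissions. The extra cost of resending from the source node is not modelled.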
Our contribution
• Optimized NoC (ONoC) can overcome the OASIS overhead problem
• To avoid dropped flits, an efficient stall-go (ESG) algorithm is proposed
Efficient stall-go (ESG) algorithm
[Figure: Mealy machine for the ESG algorithm, with states Stop, Go and Sent; inputs Nearly_full and Data_sent; output Out]
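The transition labels that survive extraction pair Nearly_full = 1 with Out = 0, and Nearly_full = 0 with Out = 1. The following Python sketch is one simplified reading of such a Mealy machine, not the authors' exact transition table, which the slide does not let us recover. It assumes that Out = 1 lets the upstream router forward a flit, that Out = 0 stalls it, and that the Sent state only records that a flit has just been forwarded. The names `esg_step` and `run` are hypothetical.

```python
from enum import Enum

class State(Enum):
    GO = "Go"
    STOP = "Stop"
    SENT = "Sent"

def esg_step(state: State, nearly_full: int, data_sent: int) -> tuple:
    """One clock step of a simplified stall-go Mealy machine.

    Returns (next_state, out). `state` is kept to preserve the Mealy signature
    (next state and output from current state and inputs); this simplified,
    assumed transition table happens not to need it."""
    if nearly_full:
        return State.STOP, 0   # receiver FIFO almost full: stop sending
    if data_sent:
        return State.SENT, 1   # a flit was just forwarded and space remains
    return State.GO, 1         # space remains, nothing sent this cycle

def run(inputs: list, state: State = State.GO) -> tuple:
    """Feed (nearly_full, data_sent) pairs through the machine."""
    outs = []
    for nearly_full, data_sent in inputs:
        state, out = esg_step(state, nearly_full, data_sent)
        outs.append(out)
    return state, outs

if __name__ == "__main__":
    print(run([(0, 1), (0, 0), (1, 0), (1, 0), (0, 0)]))
```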
ONoC: Architecture
[Figure: two connected routers, each with an ESG unit between the input port and the scheduler; signals nearly full, stop, data_sent, block and grant; 20-bit data_in/data_out]
• ESG is implemented between the input port and the scheduler
• ESG receives the nearly-full and data-sent signals
• If the receiver FIFO is about to become full, stall-go stops the sender from sending flits
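Here is a toy cycle-level sketch of the stall-go idea in Python. The receiver FIFO raises a nearly-full flag, and the sender stops rather than sending into a full buffer. The class and function names, the `margin` threshold and the drain pattern are hypothetical. The model also has no wire latency. In hardware, a nonzero margin typically leaves room for flits already in flight when the flag is raised.

```python
from collections import deque

class NearlyFullFifo:
    """Receiver FIFO exposing a nearly_full flag to the upstream sender.

    Assumption: nearly full means at most `margin` free slots remain."""

    def __init__(self, depth: int, margin: int = 1) -> None:
        if not 0 <= margin < depth:
            raise ValueError("margin must be in [0, depth)")
        self.depth = depth
        self.margin = margin
        self.slots = deque()

    @property
    def nearly_full(self) -> bool:
        return self.depth - len(self.slots) <= self.margin

    def push(self, flit: int) -> None:
        # Stall-go should make overflow impossible; fail loudly if it happens.
        assert len(self.slots) < self.depth, "FIFO overflow"
        self.slots.append(flit)

    def pop(self) -> int:
        return self.slots.popleft()

def simulate_stall_go(n_flits: int, depth: int, drain_every: int, margin: int = 1) -> tuple:
    """Toy link with stall-go back-pressure; no flit is ever dropped.

    The receiver forwards one flit every `drain_every` cycles (congestion).
    Each cycle the sender checks nearly_full: if set, it stalls and keeps the
    flit; otherwise it sends. Returns (transmissions, stall_cycles)."""
    fifo = NearlyFullFifo(depth, margin)
    next_flit = delivered = transmissions = stalls = cycle = 0
    while delivered < n_flits:
        cycle += 1
        if cycle % drain_every == 0 and fifo.slots:
            fifo.pop()
            delivered += 1
        if next_flit < n_flits:
            if fifo.nearly_full:
                stalls += 1           # "stop": hold the flit, nothing is lost
            else:
                fifo.push(next_flit)  # "go": send the next flit
                next_flit += 1
                transmissions += 1
    return transmissions, stalls

if __name__ == "__main__":
    print(simulate_stall_go(n_flits=12, depth=4, drain_every=2))
```

Consider 12 flits, a 4-deep FIFO and a receiver that forwards one flit every other cycle. The sketch returns (12, 6): every flit crosses the link exactly once, and congestion shows up as 6 stall cycles at the sender rather than as dropped flits.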
ONoC: Router design
[Figure: ONoC router block diagram with ESG implemented; five Input_port modules (Local, South, North, West, East), switch allocator (sw_alloc) and crossbar; 20-bit data paths (data_in_X[19:0], data_out_X[19:0]); control signals stop[4:0], sw_req[4:0], data_sent[4:0], tail_sent[4:0], Nearly_full, port_req[24:0], cntrl[24:0], xaddr[2:0], yaddr[2:0]]
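What efficient stall-go achieves
• Under congestion, when the receiving buffer is full, the sending router just stops sending
• Flits are delivered without the resend overhead
[Figure: two PEs connected through routers; the downstream buffer is full and the upstream router stops sending]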
Simulation parameters
Parameter            Configuration
Network size         3x3 mesh
Buffer depth         4, 8, 16 and 32
Flit size            20 bits (header: 12 bits, payload: 8 bits)
Forwarding           Wormhole switching
Scheduling           Round-robin
Flow control         Stall-go
Routing              Static XY routing
Target application   JPEG codec
Target device        Altera Stratix III
Input data size      120,015 bytes (200x200 resolution)
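The 20-bit flit format from the table can be illustrated with a small packing helper. The placement of the header in bits [19:8] and the payload in bits [7:0] is an assumption, since the slides give only the field widths. The function names are hypothetical. With an 8-bit payload, each flit carries one byte of data.

```python
HEADER_BITS, PAYLOAD_BITS = 12, 8
FLIT_BITS = HEADER_BITS + PAYLOAD_BITS  # 20-bit flit

def pack_flit(header: int, payload: int) -> int:
    """Pack a 12-bit header and an 8-bit payload into one 20-bit flit.

    Assumed layout: header in bits [19:8], payload in bits [7:0]."""
    if not 0 <= header < 1 << HEADER_BITS:
        raise ValueError("header must fit in 12 bits")
    if not 0 <= payload < 1 << PAYLOAD_BITS:
        raise ValueError("payload must fit in 8 bits")
    return (header << PAYLOAD_BITS) | payload

def unpack_flit(flit: int) -> tuple:
    """Split a 20-bit flit back into (header, payload)."""
    if not 0 <= flit < 1 << FLIT_BITS:
        raise ValueError("flit must fit in 20 bits")
    return flit >> PAYLOAD_BITS, flit & ((1 << PAYLOAD_BITS) - 1)

if __name__ == "__main__":
    flit = pack_flit(0xABC, 0x5D)
    print(hex(flit))          # 0xabc5d
    print(unpack_flit(flit))  # (2748, 93)
```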
ONoC communication time analysis
• ONoC's total communication time is lower than that of OASIS at small buffer depths
[Figure: total communication time (cycles, 0 to 250,000) of OASIS and ONoC versus buffer depth (4, 8, 16, 32)]
ONoC complexity analysis
Buffer size  Architecture  Area (ALUTs)   Power (mW)  Speed (MHz)
4            ONoC          5,485 (5%)     649.17      185.87
4            OASIS         5,282 (5%)     649.03      207.90
8            ONoC          8,269 (7%)     660.02      186.60
8            OASIS         7,890 (7%)     659.31      195.05
16           ONoC          10,538 (9%)    682.80      161.26
16           OASIS         10,279 (9%)    681.63      177.43
32           ONoC          17,416 (15%)   716.87      153.96
32           OASIS         16,569 (15%)   716.02      172.38
• ONoC requires 4.38% extra hardware compared with OASIS
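The per-depth cost of ONoC relative to OASIS can be read off the table with a few lines of Python. This is plain arithmetic on the table values. The slide does not say how its single 4.38% extra-hardware figure was aggregated, so the sketch does not try to reproduce it. The function names are hypothetical.

```python
# Values copied from the complexity table: {buffer depth: (ONoC, OASIS)}.
AREA_ALUTS = {4: (5485, 5282), 8: (8269, 7890), 16: (10538, 10279), 32: (17416, 16569)}
SPEED_MHZ = {4: (185.87, 207.90), 8: (186.60, 195.05), 16: (161.26, 177.43), 32: (153.96, 172.38)}

def relative_change_pct(onoc: float, oasis: float) -> float:
    """Change of ONoC relative to OASIS, in percent (positive = ONoC larger)."""
    return 100.0 * (onoc - oasis) / oasis

def overhead_table() -> dict:
    """Per buffer depth: (area change %, speed change %) of ONoC vs. OASIS."""
    return {d: (relative_change_pct(*AREA_ALUTS[d]), relative_change_pct(*SPEED_MHZ[d]))
            for d in sorted(AREA_ALUTS)}

if __name__ == "__main__":
    for depth, (area, speed) in overhead_table().items():
        print(f"buffer depth {depth:2d}: area {area:+.2f}%, speed {speed:+.2f}%")
```

On these values, ONoC's area is larger by about 2.5% to 5.1% and its speed is lower by about 4.3% to 10.7%, depending on buffer depth.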
Summary
• This research presents an optimization technique and the architecture of an Optimized NoC (ONoC)
• ONoC achieves 14.18% less communication time than OASIS, while its area is only 4.38% larger than that of OASIS
Ongoing work
• Buffer borrowing algorithm
• Shortcut bus
Thank you for listening