Advanced Design Issues for OASIS Network-on-Chip Architecture


  1. Advanced Design Issues for OASIS Network-on-Chip Architecture
     Kenichi Mori, Adam Esch, Abderazek Ben Abdallah, Kenichi Kuroda
     The University of Aizu, Japan
     Fifth International Conference on Broadband and Wireless Computing, Communication and Applications (BWCCA 2010), Nov. 4, 2010

  2. Contents
     • Background
     • Original OASIS NoC
       – Architecture
       – Drawback
     • Our contribution
     • Proposed ONoC mechanism
       – Stall-go flow control methodology
       – ONoC (Optimized NoC) architecture
     • Simulation results
     • Summary

  3. Background
     • Network-on-Chip (NoC) can solve the problems of bus-based systems
     • It is a scalable architectural platform with huge potential to handle growing complexity
     • Processing elements are connected via a packet-switched communication network
     [Figure: a bus-based system and a Network-on-Chip system, each with processing elements P1 to P6; in the NoC, the processing elements are connected through switches. P: processing element, S: switch]

  4. OASIS NoC: Network
     • The original OASIS* NoC has a 4x4 mesh network
     • Each router has one processing element
     [Figure: OASIS whole network]
     * A. Ben Abdallah, M. Sowa, "Basic Network-on-Chip Interconnection for Future Gigascale MCSoCs Applications: Communication and Computation Orthogonalization," JASSST2006, Dec. 4-9, 2006.

  5. OASIS NoC: Routing
     • The routing algorithm is static XY routing
     • The switching method is wormhole switching
     [Figure: OASIS whole network with a source node and a destination node; flit structure with routing information]
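
Static XY routing, as named on this slide, can be sketched in a few lines. This is a minimal illustration, not the OASIS router logic: the coordinate convention (x grows toward East, y grows toward North) and the function names `xy_route` and `xy_path` are assumptions made for the example. The port names follow the router slides.

```python
def xy_route(cur, dst):
    """Pick the output port for a flit at router `cur` heading to `dst`.

    Assumed convention (not from the slides): coordinates are (x, y),
    x grows toward East and y grows toward North.
    """
    cx, cy = cur
    dx, dy = dst
    # Correct the X coordinate first, then the Y coordinate.
    if dx > cx:
        return "East"
    if dx < cx:
        return "West"
    if dy > cy:
        return "North"
    if dy < cy:
        return "South"
    return "Local"  # arrived: deliver to the attached processing element


def xy_path(src, dst):
    """List the routers visited from `src` to `dst` under XY routing."""
    step = {"East": (1, 0), "West": (-1, 0), "North": (0, 1), "South": (0, -1)}
    path = [src]
    cur = src
    while (port := xy_route(cur, dst)) != "Local":
        ox, oy = step[port]
        cur = (cur[0] + ox, cur[1] + oy)
        path.append(cur)
    return path


if __name__ == "__main__":
    print(xy_path((0, 0), (2, 1)))
```

Because the route depends only on the current and destination coordinates, every router can make the decision locally from the routing information carried in the flit.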

  6. OASIS NoC: Router design
     • One router has three pipeline stages
     • First stage: the input ports (Local, South, North, West, East) have buffering and routing mechanisms
     • Second stage: sw_alloc has scheduling and flow control mechanisms
     • Third stage: the crossbar sends each flit to the adequate next port
     [Figure: router block diagram with five input ports, sw_alloc, and crossbar. Recoverable signal names: data_in_L/S/N/W/E[379:0], data_out_L/S/N/W/E[379:0], port_req[24:0], cntrl[24:0], tail, drop_in, drop_out]

  7. OASIS NoC drawback
     • The original OASIS NoC has an overhead problem
       – A large number of flits are dropped under congested communication
     [Figure: two PE/router pairs; under congestion the receiving buffer is full, flits are dropped, and the node must send them again, which causes a large overhead]

  8. Our contribution
     • The Optimized NoC (ONoC) can overcome the OASIS overhead problem
     • To avoid dropped flits, an efficient stall-go (ESG) algorithm is proposed

  10. Efficient stall-go (ESG) algorithm
      [Figure: Mealy machine for the ESG algorithm with states Stop, Go, and Sent; transitions are labelled with the inputs Nearly_full and Data_sent and the output Out]

  11. ONoC: Architecture
      • ESG is implemented between the input port and the scheduler
      • ESG receives the nearly full and data sent signals
      • If the receiver FIFO is about to become full, stall-go stops the sender from sending flits
      [Figure: two connected routers, each with data_in, ESG, and scheduler blocks; 1-bit signals nearly full, stop, data_sent, block, and grant; 20-bit data_in and data_out]
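
The difference between dropping flits on a full buffer and stalling on a nearly full signal can be illustrated with a toy cycle-level model. This is a sketch under stated assumptions, not the authors' ESG state machine: the function `simulate`, the `drain_every` receiver model, and the `threshold` used to raise nearly full are all hypothetical.

```python
from collections import deque


def simulate(n_flits, depth, drain_every, stall_go, threshold=1):
    """Deliver `n_flits` through a receiver FIFO of `depth` entries.

    The receiver pops one flit every `drain_every` cycles. With
    stall_go=True the sender holds its flit while nearly_full is raised;
    otherwise it sends every cycle and a flit hitting a full FIFO is
    dropped and queued for resending (reordering is ignored in this toy
    model). Returns (cycles, dropped).
    """
    if depth <= threshold:
        raise ValueError("depth must exceed threshold or the sender never sends")
    fifo = deque()
    pending = deque(range(n_flits))  # flits the sender still has to deliver
    delivered = dropped = cycle = 0
    while delivered < n_flits:
        cycle += 1
        # Assumed definition: nearly_full rises `threshold` entries before full.
        nearly_full = len(fifo) >= depth - threshold
        if pending and not (stall_go and nearly_full):
            flit = pending.popleft()
            if len(fifo) < depth:
                fifo.append(flit)
            else:
                dropped += 1          # receiver buffer full: flit is lost
                pending.append(flit)  # sender must send it again
        if cycle % drain_every == 0 and fifo:
            fifo.popleft()
            delivered += 1
    return cycle, dropped


if __name__ == "__main__":
    print("drop on full:", simulate(100, 4, 3, stall_go=False))
    print("stall-go:    ", simulate(100, 4, 3, stall_go=True))
```

In this model the stall-go sender never writes into a full FIFO, so no flit is dropped; the toy receiver is the bottleneck in both modes, so the sketch shows the drop count only and says nothing about the communication times reported later in the deck.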

  12. ONoC: Router design
      • ESG is implemented in the router
      [Figure: ONoC router block diagram with five input ports (Local, South, North, West, East), sw_alloc, and crossbar. Recoverable signal names: data_in_L/S/N/W/E[19:0], data_out_L/S/N/W/E[19:0], xaddr[2:0], yaddr[2:0], port_req[24:0], cntrl[24:0], sw_req[4:0], stop[4:0], data_sent[4:0], tail_sent[4:0], Nearly_full]
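
The simulation parameters slide lists round-robin as the scheduling policy. How sw_alloc implements it is not shown in the deck, so the following is only a generic round-robin arbiter over the five input-port requests; the class name `RoundRobinArbiter` and the port ordering in `PORTS` are assumptions for illustration.

```python
PORTS = ["Local", "South", "North", "West", "East"]  # assumed ordering


class RoundRobinArbiter:
    """Grant one requester per call, rotating priority after each grant."""

    def __init__(self, n):
        self.n = n
        self.last = n - 1  # index 0 gets the highest priority on the first call

    def grant(self, requests):
        """Return the index of the granted requester, or None if none request."""
        # Search starts just after the most recently granted port.
        for offset in range(1, self.n + 1):
            i = (self.last + offset) % self.n
            if requests[i]:
                self.last = i
                return i
        return None


if __name__ == "__main__":
    arb = RoundRobinArbiter(len(PORTS))
    reqs = [True, False, True, False, False]  # Local and North both request
    for _ in range(3):
        print(PORTS[arb.grant(reqs)])
```

Rotating the starting point after each grant keeps any one input port from monopolizing an output when several ports request it at once.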

  13. Efficient stall-go achievement
      [Figure: two PE/router pairs; when the receiving buffer is full under congestion, the sender just stops sending, so flits are sent without overhead]

  15. Simulation parameters (ONoC configuration)
      Parameter            Value
      Network size         3x3 mesh
      Buffer depth         4, 8, 16, and 32
      Flit size            20 bits (header: 12 bits, payload: 8 bits)
      Forwarding           Wormhole switching
      Scheduling           Round-robin
      Flow control         Stall-go
      Routing              Static X-Y routing
      Target application   JPEG codec
      Target device        Altera Stratix III
      Input data size      120,015 bytes (ratio 200x200)
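
The 20-bit flit with a 12-bit header and an 8-bit payload can be illustrated by packing and unpacking the two fields. The bit ordering (header in the upper bits) and the helper names are assumptions; the slides give only the field widths, and the internal layout of the header is not reproduced here.

```python
HEADER_BITS = 12   # from the parameters slide
PAYLOAD_BITS = 8   # from the parameters slide
FLIT_BITS = HEADER_BITS + PAYLOAD_BITS  # 20


def pack_flit(header, payload):
    """Combine header and payload into one 20-bit flit value.

    Assumed layout: header occupies the upper 12 bits.
    """
    if not 0 <= header < (1 << HEADER_BITS):
        raise ValueError("header does not fit in 12 bits")
    if not 0 <= payload < (1 << PAYLOAD_BITS):
        raise ValueError("payload does not fit in 8 bits")
    return (header << PAYLOAD_BITS) | payload


def unpack_flit(flit):
    """Split a 20-bit flit value back into (header, payload)."""
    return flit >> PAYLOAD_BITS, flit & ((1 << PAYLOAD_BITS) - 1)


def payload_flits(n_bytes):
    """Flits needed to carry n_bytes of payload, ignoring any extra flits."""
    return -(-n_bytes * 8 // PAYLOAD_BITS)  # ceiling division


if __name__ == "__main__":
    flit = pack_flit(0xABC, 0x5A)
    print(hex(flit), unpack_flit(flit), payload_flits(120_015))
```

With one byte of payload per flit, the 120,015-byte input corresponds to 120,015 payload-carrying flits under this assumed layout.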

  16. ONoC communication time analysis
      • ONoC total communication time is less than that of OASIS at small buffer depths
      [Figure: total communication time in cycles (axis from 0 to 250,000) versus buffer depth (4, 8, 16, 32) for OASIS and ONoC]

  17. ONoC complexity analysis
      Buffer size  Architecture  Area (ALUTs)   Power (mW)  Speed (MHz)
      4            ONoC          5,485 (5%)     649.17      185.87
      4            OASIS         5,282 (5%)     649.03      207.90
      8            ONoC          8,269 (7%)     660.02      186.60
      8            OASIS         7,890 (7%)     659.31      195.05
      16           ONoC          10,538 (9%)    682.80      161.26
      16           OASIS         10,279 (9%)    681.63      177.43
      32           ONoC          17,416 (15%)   716.87      153.96
      32           OASIS         16,569 (15%)   716.02      172.38
      • 4.38% extra hardware
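
The relative area cost at each buffer size can be read off the table with a short calculation. This sketch uses only the ALUT counts from the slide; the function name `area_overhead_percent` is hypothetical, and the slide does not state how its single 4.38% figure was aggregated, so only the per-depth ratios are computed here.

```python
# Area in ALUTs per buffer size, copied from the complexity table.
AREA_ALUTS = {
    4: {"ONoC": 5485, "OASIS": 5282},
    8: {"ONoC": 8269, "OASIS": 7890},
    16: {"ONoC": 10538, "OASIS": 10279},
    32: {"ONoC": 17416, "OASIS": 16569},
}


def area_overhead_percent(depth):
    """Extra ONoC area relative to OASIS at one buffer size, in percent."""
    row = AREA_ALUTS[depth]
    return 100.0 * (row["ONoC"] - row["OASIS"]) / row["OASIS"]


if __name__ == "__main__":
    for depth in sorted(AREA_ALUTS):
        print(f"buffer size {depth:>2}: {area_overhead_percent(depth):.2f}% extra ALUTs")
```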

  18. Summary
      • This research presents an optimization technique and the architecture of an Optimized NoC (ONoC)
      • ONoC achieves 14.18% less communication time than OASIS, and its area is only 4.38% larger than that of OASIS
      Ongoing work
      • Buffer borrowing algorithm
      • Short-cut bus

  19. Thank you for listening
