
ONoC-SPL: Customized Network-on-Chip (NoC) Architecture and Prototyping for Data-intensive Computation Applications

iCAST 2012 Seoul, Korea July 21-24 2012 ONoC-SPL: Customized Network-on-Chip (NoC) Architecture and Prototyping for Data-intensive Computation Applications Akram Ben Ahmed, Kenichi Mori, Abderazek Ben Abdallah The University of Aizu School of


  1. iCAST 2012, Seoul, Korea, July 21-24, 2012. ONoC-SPL: Customized Network-on-Chip (NoC) Architecture and Prototyping for Data-intensive Computation Applications. Akram Ben Ahmed, Kenichi Mori, Abderazek Ben Abdallah. The University of Aizu, School of Computer Science and Engineering, Adaptive Systems Laboratory, Aizu-Wakamatsu, Japan. Email: d8141104@u-aizu.ac.jp

  2. Outline • Background • ONoC-SPL architecture – OASIS2-NoC overview – SPL Insertion Algorithm • Evaluation • Conclusion


  4. Background: Bus-based system vs. NoC. [Figure: bus-based system in which Core1, Core2 and Core3 share one bus to Memory 1, Memory 2 and I/O; cores are shown waiting for the bus.] Problems: limited parallelism, high latency.

  5. Background: Bus-based system vs. NoC. [Figure: NoC-based system; labeled parts: processing element, network interface, router, input buffer, unidirectional link.] [Carloni2009, Ben2006]

  6. Background: Bus-based system vs. NoC. [Figure: NoC-based system.] [Carloni2009, Ben2006]

  7. Background: NoC Challenges - Routing [Sulivan1977, Seo2005]: path selection has an impact on system performance.

  8. Background: NoC Challenges - Routing [Sulivan1977, Seo2005] - Flow control [Agarwal2009, Pullini2005]: efficient flow control is crucial.

  9. Background: NoC Challenges - Routing [Sulivan1977, Seo2005] - Flow control [Agarwal2009, Pullini2005] - Topology – Mesh [Zhang2011]: uniform connection, large hop count. The long distance between nodes affects latency, throughput and power.

  10. Background: NoC Challenges - Routing [Sulivan1977, Seo2005] - Flow control [Agarwal2009, Pullini2005] - Topology – Mesh [Zhang2011] – Torus [Dally1986]: connects the network extremities to reduce the inter-node distance. Drawbacks: increased complexity, different wire lengths, clock skew.
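As a rough illustration of the hop-count argument on the topology slides, the following Python sketch compares minimal hop counts on a 4x4 mesh and a 4x4 torus. The function names and the (x, y) coordinate convention are assumptions made for illustration; they are not taken from the slides.

```python
SIZE = 4  # 4x4 network, as in OASIS2-NoC


def mesh_hops(src, dst):
    """Minimal hop count on a 2D mesh (Manhattan distance)."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def torus_hops(src, dst):
    """Minimal hop count on a SIZE x SIZE torus with wrap-around links."""
    dx = abs(src[0] - dst[0])
    dy = abs(src[1] - dst[1])
    return min(dx, SIZE - dx) + min(dy, SIZE - dy)


def average_hops(hops):
    """Average hop count over all ordered (source, destination) pairs."""
    nodes = [(x, y) for x in range(SIZE) for y in range(SIZE)]
    pairs = [(s, d) for s in nodes for d in nodes if s != d]
    return sum(hops(s, d) for s, d in pairs) / len(pairs)


if __name__ == "__main__":
    print("corner to corner, mesh :", mesh_hops((0, 0), (3, 3)))
    print("corner to corner, torus:", torus_hops((0, 0), (3, 3)))
    print("average, mesh :", average_hops(mesh_hops))
    print("average, torus:", average_hops(torus_hops))
```

For the corner pair (0,0) -> (3,3), the mesh needs 6 hops while the torus wrap-around links reduce this to 2, at the cost of the wiring drawbacks listed on the slide.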

  11. Background: NoC Challenges - Routing [Sulivan1977, Seo2005] - Flow control [Agarwal2009, Pullini2005] - Topology – Mesh [Zhang2011] – Torus [Dally1986] – Customized [Bolotin2004]: specially designed for a specific application. Drawbacks: long design time, difficult to implement.

  12. Background: OASIS2-NoC • 4x4 mesh topology • Wormhole-like switching • Stall-and-Go flow control • 20-bit flits. [Figure: OASIS2-NoC 4x4 network system [*].] [*] K. Mori, A. Esch, A. Ben Abdallah, K. Kuroda, "Advanced Design Issue for OASIS Network-on-Chip Architecture", IEEE International Conference on BWCCA, pp. 74-79, 2010.

  13. Background: Motivation • In OASIS2-NoC, PEs are connected uniformly, so the network suffers from a large hop count between distant (source, destination) pairs – This significantly degrades overall performance, especially for data-intensive applications • Synthetic traffic in high-level simulation does not reveal the real system performance – It is not enough to evaluate the effects and trade-offs of the NoC router's parameters (flow control, buffer size and routing) – It does not give an accurate hardware and performance evaluation

  14. Background: Contributions • Proposal of an optimized version of OASIS2-NoC, named ONoC-SPL, customized with Short-Pass-Links (SPL) – To reduce the communication latency for long-range, high-frequency communication • Prototyping of ONoC-SPL on FPGA with synthetic and real applications – To evaluate power consumption, area utilization and performance accurately

  15. Outline • Background • ONoC-SPL architecture – OASIS2-NoC architecture – SPL Insertion Algorithm • Evaluation • Conclusion


  17. OASIS2-NoC: Router architecture. [Figure: router block diagram with the stages BW, RC/SA and CT.]

  18. OASIS2-NoC: Router architecture. Input module: incoming data enters these modules - Input buffer (BW) - Look-Ahead-XY routing (RC)
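The idea behind Look-Ahead-XY routing can be sketched in a few lines of Python: each router computes the output port that the flit will need at the next router, so the routing decision is already available when the flit arrives there. This is a minimal sketch under stated assumptions: the port names, the coordinate convention (x grows to the east, y grows to the north) and the helper names are hypothetical and do not come from the OASIS2-NoC design.

```python
# Assumed convention: x grows to the EAST, y grows to the NORTH.
STEP = {
    "EAST": (1, 0),
    "WEST": (-1, 0),
    "NORTH": (0, 1),
    "SOUTH": (0, -1),
    "LOCAL": (0, 0),
}


def xy_port(cur, dst):
    """Plain XY routing: correct the X offset first, then the Y offset."""
    if dst[0] > cur[0]:
        return "EAST"
    if dst[0] < cur[0]:
        return "WEST"
    if dst[1] > cur[1]:
        return "NORTH"
    if dst[1] < cur[1]:
        return "SOUTH"
    return "LOCAL"


def look_ahead_port(cur, port, dst):
    """Port needed at the NEXT router, computed while still at `cur`."""
    dx, dy = STEP[port]
    nxt = (cur[0] + dx, cur[1] + dy)
    return xy_port(nxt, dst)


def route(src, dst):
    """Follow a flit from src to dst, carrying the precomputed port along."""
    path = [src]
    cur = src
    port = xy_port(src, dst)  # first port is computed at injection time
    while port != "LOCAL":
        next_port = look_ahead_port(cur, port, dst)
        dx, dy = STEP[port]
        cur = (cur[0] + dx, cur[1] + dy)
        path.append(cur)
        port = next_port  # already known on arrival at the next router
    return path


if __name__ == "__main__":
    print(route((0, 3), (1, 0)))
```

Because the next port is computed one hop early, route computation can overlap with other router work instead of sitting in front of arbitration.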

  19. OASIS2-NoC: Router architecture. Arbiter and flow control • Arbiter: handles the arbitration between requests from the different input ports (SA) • Stall/Go: includes the flow control module
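A Stall/Go scheme of the kind named on this slide can be modeled as a sender that transmits only while the downstream buffer signals "go". The Python sketch below is a simplified cycle-level model, not the OASIS2-NoC hardware: the buffer depth, the drain rate and the function name are assumptions chosen for illustration.

```python
from collections import deque


def simulate_stall_go(num_flits, depth=4, drain_every=3):
    """Push num_flits through a FIFO of `depth` slots.

    Returns (cycles needed, maximum buffer occupancy observed).
    """
    fifo = deque()
    sent = received = 0
    cycle = max_occupancy = 0
    go = True  # state signal seen by the sender; True means "go"
    while received < num_flits:
        cycle += 1
        # The sender transmits only while the receiver signals "go".
        if go and sent < num_flits:
            fifo.append(sent)
            sent += 1
        max_occupancy = max(max_occupancy, len(fifo))
        # Assumed slow receiver: drains one flit every `drain_every` cycles.
        if fifo and cycle % drain_every == 0:
            fifo.popleft()
            received += 1
        # The receiver asserts "stall" as soon as its buffer is full.
        go = len(fifo) < depth
    return cycle, max_occupancy


if __name__ == "__main__":
    cycles, occupancy = simulate_stall_go(10)
    print("cycles:", cycles, "max occupancy:", occupancy)
```

In this model the occupancy never exceeds the buffer depth, which is the overflow-avoidance property the slides attribute to Stall/Go.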

  20. OASIS2-NoC: Router architecture. Crossbar: handles the transfer of flits to their appropriate channels, depending on the information received from the arbiter (CT)

  21. OASIS2-NoC: Arbitration & flow control • Flow control mechanism: Stall/Go is the method used to avoid buffer overflow • Arbitration mechanism: matrix arbiter. When the priority of i is higher than that of j, P(i,j) becomes 1 and P(j,i) becomes 0. [Figure: panels (a) and (b), with the highest-priority entries marked.]
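The priority matrix described on this slide can be sketched as follows. The slide only states that P(i,j) = 1 and P(j,i) = 0 when i has priority over j; the update rule used here (the granted requester becomes the lowest priority, i.e. least-recently-served order) is the textbook matrix-arbiter policy and is an assumption about OASIS2-NoC, as are the class and method names.

```python
class MatrixArbiter:
    """Matrix arbiter over n requesters.

    p[i][j] == 1 means requester i currently has priority over requester j.
    """

    def __init__(self, n):
        self.n = n
        # Assumed initial order: lower index has higher priority.
        self.p = [[1 if i < j else 0 for j in range(n)] for i in range(n)]

    def grant(self, requests):
        """Return the index of the granted requester, or None if none request."""
        for i in range(self.n):
            if not requests[i]:
                continue
            # i loses if any other active requester has priority over it.
            beaten = any(
                requests[j] and self.p[j][i]
                for j in range(self.n)
                if j != i
            )
            if not beaten:
                # Assumed update rule: the winner drops to lowest priority.
                for j in range(self.n):
                    if j != i:
                        self.p[i][j] = 0
                        self.p[j][i] = 1
                return i
        return None


if __name__ == "__main__":
    arbiter = MatrixArbiter(3)
    print([arbiter.grant([True, True, True]) for _ in range(4)])
```

With all three requesters active, the grants rotate through the requesters, so no input port is starved.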

  22. Outline • Background • ONoC-SPL architecture – OASIS2-NoC architecture – Short-Pass-Link (SPL) Customization • Evaluation • Conclusion

  23. Short-Pass-Link (SPL) Customization • ONoC-SPL employs a mesh topology with Short-Pass-Links (SPL) – To reduce the latency caused by the high number of hops

  24. SPL insertion process: Algorithm. [Figure: flow chart with the steps: decide the number of SPLs; select the communications for insertion; simulation and insertion.]

  25. SPL insertion process: Example • Dimension reversal with SPL – Distance: (3,0) -> (0,3): 6; (0,3) -> (3,0): 6 – Communication frequency: (3,0) -> (0,3): 0.125; (0,3) -> (3,0): 0.125 – 2 SPL inserted: (3,0) -> (0,3) and (0,3) -> (3,0) • Hotspot with SPL – Communication frequency: (0,3) -> (1,0): 0.294; (3,3) -> (1,1): 0.235; (2,0) -> (2,3): 0.235 – Distance: (0,3) -> (1,0): 4; (3,3) -> (1,1): 4 – 2 SPL inserted: (0,3) -> (1,0) and (3,3) -> (1,1)
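The example above can be reproduced with a small ranking sketch in Python. The slides do not spell out the selection rule, so the score used here (communication frequency multiplied by mesh distance) is a hypothetical choice that happens to agree with the example; the function names are also assumptions, and the simulation step of the insertion process is not modeled.

```python
def manhattan(src, dst):
    """Hop distance between two nodes of a 2D mesh."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def select_spl_candidates(traffic, num_spl):
    """Pick the num_spl (source, destination) pairs with the highest score.

    traffic maps (src, dst) -> communication frequency.
    Assumed score: frequency * distance (not confirmed by the slides).
    """
    ranked = sorted(
        traffic,
        key=lambda pair: traffic[pair] * manhattan(*pair),
        reverse=True,
    )
    return ranked[:num_spl]


# Communication frequencies taken from the Hotspot example on the slide.
hotspot = {
    ((0, 3), (1, 0)): 0.294,
    ((3, 3), (1, 1)): 0.235,
    ((2, 0), (2, 3)): 0.235,
}

# Communication frequencies taken from the Dimension Reversal example.
dimension_reversal = {
    ((3, 0), (0, 3)): 0.125,
    ((0, 3), (3, 0)): 0.125,
}

if __name__ == "__main__":
    print(select_spl_candidates(hotspot, 2))
    print(select_spl_candidates(dimension_reversal, 2))
```

With the Hotspot frequencies, this ranking picks (0,3) -> (1,0) and (3,3) -> (1,1), matching the two SPLs shown on the slide. The distance of 3 for (2,0) -> (2,3) is computed by the sketch and is not given on the slide.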

  26. Outline • Background • ONoC-SPL architecture – OASIS2-NoC overview – SPL Insertion Algorithm • Evaluation • Conclusion

  27. Evaluation: Evaluation methodology • Design tools – Language: Verilog-HDL – Software: Quartus II 11.0 – Simulation tool: ModelSim-Altera 6.6 – Device: Stratix III FPGA board • Target applications – Dimension Reversal – Hotspot – JPEG encoder. [Figure: evaluation flow with the labels: Dimension Reversal, Hotspot, JPEG; network size info; NoC partitioning parameter; behavior model; Verilog-HDL RTL code; compile; synthesis (Quartus II); bitstream; Stratix III FPGA; hardware; execution time; hardware complexity. Sample RGB input words are shown as 24-bit values, e.g. 24'b001101100101001101101110.]

  28. Evaluation: Simulation Configuration

  29. Evaluation: Hardware complexity • Extra area less than 5% • 6.5% speed reduction • Slight 1% power overhead

  30. Evaluation: Performance (execution time). ONoC-SPL execution time decreased by 30.1% on average. [Figure: bar chart of execution time for OASIS, ONoC-SPL1, ONoC-SPL2 and ONoC-SPL3; series: Dimension Reversal (μs), Hotspot (μs), JPEG time (x10^1 ms); data labels: -16.9, -16.1, +7.3, +11.3, -29.7, -31.0, -43.7.]

  31. Evaluation: Performance (throughput). ONoC-SPL throughput improved by 32.3% on average. [Figure: chart of throughput (flits/cycle); data labels: +49.6, +0.01, +24.8, +24.8, +11.3, +22.6, 0.0.]

  32. Outline • Background • ONoC-SPL architecture – OASIS2-NoC overview – SPL Insertion Algorithm • Evaluation • Conclusion

  33. Conclusion • An optimized version of a 2D NoC, named ONoC-SPL, was proposed • An SPL insertion algorithm was proposed to reduce the latency of high-frequency communication • The design was prototyped on FPGA for accurate performance and hardware complexity evaluation using synthetic traffic and a real workload

  34. Conclusion • Execution time decreased by 30.1% and throughput improved by 32.3% on average when comparing the proposed system with previous systems • The performance gain was obtained with less than 5% extra hardware and a slight 0.49% average power consumption overhead
