
Design of Adaptive Communication Channel Buffers for Low-Power Area-Efficient Network-on-Chip Architecture - PowerPoint PPT Presentation



  1. Design of Adaptive Communication Channel Buffers for Low-Power Area-Efficient Network-on-Chip Architecture
     Avinash Kodi †, Ashwini Sarathy * and Ahmed Louri *
     † Department of Electrical Engineering and Computer Science, Ohio University, Athens, OH 45701
     * Department of Electrical and Computer Engineering, University of Arizona, Tucson, AZ 85719
     E-mail: kodi@ohio.edu, sarathya@ece.arizona.edu, louri@ece.arizona.edu
     Sponsored: National Science Foundation (NSF) grant ECCS-0725765 (at the High Performance Computing Architectures and Technologies Lab, University of Arizona, Tucson)
     ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS’07), Dec 3-4, 2007

  2. Talk Outline
     • Motivation & Introduction
     • iDEAL – Inter-router Dual-function Energy and Area-efficient Links for NoC architectures
       – Link and Router Architecture
     • Performance Evaluation
       – Power & Area estimation for the Links & Routers
       – Simulation results for Throughput, Latency & Overall network power
     • Conclusions

  3. Motivation
     System-on-Chip (SoC) paradigm
     • Processing Elements (Processors, DSPs, Peripheral Controllers, Memory Subsystems)
     - Increasing wire delay with decreasing feature size
     - Scalable, modular interconnect – Network-on-Chip (NoC)
     [Figure: SoC with Processor Cores, UART / GPIO, USB / Ethernet controllers, and SRAM/Flash & Memory Controllers, connected by a 4 x 4 NoC of Routers labelled (0,0) to (3,3) and joined by Channels or Links]

  4. Motivation
     Recent NSF-sponsored workshop on On-Chip Interconnection Networks [1]:
     • “The most important technology constraint for on-chip networks is power consumption”.
     • Power consumption of OCINs implemented with current techniques exceeds expected needs by a factor of 10.
     Power Break-up in the NoC Router: Buffers 46%, Crossbar 35%, Clock Buffer 16%, Arbiter 3%
     [Figure: Generic NoC Router with Route Computation (RC), Virtual Channel (VC), Switch Allocator (SA), Input Buffers, Crossbar Switch, ports +x, -x, +y, -y, and Processing Element (PE)]
     [1] Reference: J.D.Owens, W.J.Dally, R.Ho, D.N.Jayasimha, S.W.Keckler and L.S.Peh, “Research Challenges for On-Chip Interconnection Networks”, IEEE Micro, vol. 27, no. 5, pp. 96 – 108, September-October 2007.

  5. iDEAL – Inter-router Dual-function Energy and Area-efficient Links for NoC architectures
     iDEAL Methodology (circuit and architectural techniques)
     - Reduce the number of router buffers
     - To prevent performance degradation, use adaptive channel buffers to store data along the links when required
     - Dynamic buffer allocation within the router buffers
     [Figure: Generic NoC architecture next to the iDEAL architecture; the iDEAL router has a reduced router buffer size and adaptive channel buffers along the link]

  6. Conventional Links
     [Figure: link from the Output Port of Router A to the Input Port of Router B]

  7. iDEAL – Channel Buffer Design (1/2)
     [Figure: link from the Output Port of Router A to the Input Port of Router B, with a Congestion signal and Control blocks along the link]

  8. iDEAL – Channel Buffer Design (2/2)
     • No congestion: functions as a conventional repeater. Control block is turned ‘OFF’.
     • During congestion: repeater is tri-stated and holds the sampled value. Control block is turned ‘ON’.

  9. iDEAL – Control Block
     - Power efficient
     - Stable at varying frequencies
     [Figure: control block circuit between O/P Port (Router A) and I/P Port (Router A), with signals Congestion, CLK, CLK1 and CLK2]

  10. iDEAL: Dual-function Link
      [Figure: three snapshots of the link]
      Cycle 1: Data-In, flits 3 2 1 0, Congestion Signal
      Cycle 2: Data-In, flits 0 3 2 1, Congestion Signal
      Cycle 3: Data-In, Data-Out, flits 0 3 2 1, Congestion Release
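The dual-function behavior described on slides 8 to 10 can be illustrated with a small behavioral model. This is a minimal sketch, not the authors' circuit: the class name `DualFunctionLink`, the `step` method, the one-flit-per-stage capacity, and the rule that flits keep advancing into empty downstream stages during congestion are all assumptions made for illustration. Timing is simplified too, since each stage is treated as a one-cycle hop.

```python
from typing import List, Optional, Tuple


class DualFunctionLink:
    """Behavioral sketch of a link whose repeater stages can also hold flits.

    Assumptions (not from the slides): one flit per stage, and flits keep
    advancing into empty downstream stages while the link is congested.
    """

    def __init__(self, num_stages: int) -> None:
        # stages[0] is nearest the upstream router, stages[-1] the downstream one.
        self.stages: List[Optional[int]] = [None] * num_stages

    def step(self, data_in: Optional[int], congested: bool) -> Tuple[Optional[int], bool]:
        """Advance one cycle; return (flit delivered downstream, data_in accepted)."""
        delivered = None
        if not congested:
            # Repeater mode at the output: the last stage drives the downstream port.
            delivered = self.stages[-1]
            self.stages[-1] = None
        # Walk from downstream to upstream so each flit moves at most one stage.
        for i in range(len(self.stages) - 1, 0, -1):
            if self.stages[i] is None and self.stages[i - 1] is not None:
                self.stages[i] = self.stages[i - 1]
                self.stages[i - 1] = None
        # A new flit is accepted only if the first stage is free (buffer mode limit).
        accepted = data_in is not None and self.stages[0] is None
        if accepted:
            self.stages[0] = data_in
        return delivered, accepted


link = DualFunctionLink(num_stages=4)
for flit in range(4):
    link.step(flit, congested=True)      # downstream router signals congestion
print(link.stages)                       # flits held along the link
print(link.step(4, congested=True))      # link is full, so flit 4 is refused
print(link.step(None, congested=False))  # congestion released, oldest flit leaves
```

In this model the link absorbs as many flits as it has stages before back-pressure reaches the upstream router, which is the role the slides assign to the channel buffers.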

  11. Link – Power & Area Estimation
      • P_segment(repeater): dynamic, leakage, short-circuit
      • P_segment(chl-buffer): leakage, control block
      • P_control-blk: inverters, clock, switched-cap.
      [Figure: link from the Output Port of Router A to the Input Port of Router B with Control blocks, and the control block circuit with signals Congestion, CLK, CLK1 and CLK2]

  12. iDEAL – Router Buffer Design
      • Static buffer allocation
        - Fixed number of buffers per VC
        - HoL (head-of-line) blocking
      [Figure: Input Port P with a DEMUX steering flits by VCID into per-VC buffers vc 1 to vc v (Flit 1 to Flit r each), a MUX on the output side, Credit Return, Congestion Control, and a VC State Table with columns VC, RP, WP, OP, OVC, CR, C*, Status]
      RP = read pointer, WP = write pointer, OP = output port, OVC = output VC, CR = credits, C* = congestion
      Status = status of the VC (idle, waiting, RC, VA, SA, ST)

  13. iDEAL – Router Buffer Design
      • Dynamic buffer allocation
        - Approximately (z + c)/v buffers per VC (z = router buffers, c = channel buffers, v = # of VCs)
      [Figure: Input Port P with a DEMUX writing into a unified buffer (Flit 1 to Flit z) and a MUX reading from it, with Write Pointer, Read Pointer, Free Buffer Slot, Credit Return, Congestion Control, Input Flit Tracking, Output Flit Tracking, Buffer Slot Availability Tracking (slots 1 to z, marked Y/N), and a Unified VC State Table with columns VC, RP, WP, OP, OVC, CR, Status, F 0, F 1, …, F (z+c)/v]
      RP = read pointer, WP = write pointer, OP = output port, OVC = output VC, CR = credits, C* = congestion
      Status = status of the VC (idle, waiting, RC, VA, SA, ST)
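The dynamic allocation idea on this slide, where any free slot of a shared buffer can be given to any VC, can be sketched as follows. This is an illustrative model under stated assumptions, not the router design from the talk: `UnifiedBuffer`, `write_flit`, `read_flit` and `congested` are hypothetical names, and the per-VC slot queues stand in for the read/write pointers and flit-tracking fields of the Unified VC State Table.

```python
from collections import deque
from typing import Deque, Dict, List, Optional


class UnifiedBuffer:
    """Shared flit buffer with per-VC bookkeeping (hypothetical sketch)."""

    def __init__(self, num_slots: int, num_vcs: int) -> None:
        self.slots: List[Optional[str]] = [None] * num_slots
        # Slot availability tracking: indices of slots that are currently free.
        self.free: Deque[int] = deque(range(num_slots))
        # For each VC, the slots it occupies, in arrival order.
        self.vc_slots: Dict[int, Deque[int]] = {vc: deque() for vc in range(num_vcs)}

    def congested(self) -> bool:
        """True when no slot is free, i.e. upstream flits must wait on the link."""
        return not self.free

    def write_flit(self, vcid: int, flit: str) -> bool:
        """Store an incoming flit in any free slot; return False if none is free."""
        if not self.free:
            return False
        slot = self.free.popleft()
        self.slots[slot] = flit
        self.vc_slots[vcid].append(slot)
        return True

    def read_flit(self, vcid: int) -> Optional[str]:
        """Remove and return the oldest flit of a VC, freeing its slot."""
        if not self.vc_slots[vcid]:
            return None
        slot = self.vc_slots[vcid].popleft()
        flit = self.slots[slot]
        self.slots[slot] = None
        self.free.append(slot)
        return flit


buf = UnifiedBuffer(num_slots=8, num_vcs=4)
for i in range(5):
    buf.write_flit(1, f"vc1-flit{i}")   # VC 1 takes more than an equal share of 2
for i in range(3):
    buf.write_flit(0, f"vc0-flit{i}")
print(buf.congested())                  # all 8 slots are in use
print(buf.read_flit(1))                 # oldest flit of VC 1 leaves, freeing a slot
```

With static allocation and the same 8 slots, VC 1 would be limited to 2 flits; here a busy VC can borrow slots that idle VCs are not using.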

  14. iDEAL – Router Buffer Design
      • Example illustrating Dynamic buffer allocation in iDEAL
      [Figure: Unified VC State Table (columns VC, RP, WP, OP, OVC, CR, Status, F 0, F 1, F 3, F 4), Buffer Slot Availability Tracking for slots 0 to 7 (marked Y/N), Write Pointer, Read Pointer, Free Buffer Slot, Input Flit Tracking, Output Flit Tracking and Congestion Control, shown for an incoming flit with VCID = 1]
      RP = read pointer, WP = write pointer, OP = output port, OVC = output VC, CR = credits, C* = congestion
      Status = status of the VC (idle, waiting, RC, VA, SA, ST)

  15. Router – Power & Area Estimation
      • Buffer Power (P_write + P_read)
      • Crossbar Power (Switch + Arbiter)
      • Power decreases as the buffer size is reduced
      [Figure: generic router with Route Computation (RC), Virtual Channel (VC), Switch Allocator (SA), Input Buffers, Crossbar Switch and Processing Element (PE); 6T SRAM cell with Bitlines, Wordlines and Sense Amp]

  16. Performance Evaluation
      • Evaluated on a cycle-accurate on-chip network simulator
      • Simulated 8 x 8 Mesh and 8 x 8 Folded Torus topologies
      • Synthetic benchmarks were evaluated: uniform and non-uniform workloads (Butterfly, Complement, Perfect Shuffle, Matrix Transpose, Bit Reversal)
      • Parameters evaluated include throughput, latency and overall network power
      • Considered 5 different configurations, written v n_V – r n_R – c n_C (n_V = No. of VCs per input port, n_R = No. of router buffers per VC, n_C = number of channel buffers)
        – Baseline = 440; others: 434, 428, 344, 531
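The three-digit configuration labels can be decoded mechanically. The sketch below assumes each label is read digit by digit as n_V, n_R, n_C, which matches the baseline 440 (4 VCs, 4 router buffers per VC, 0 channel buffers); `Config` and `parse_config` are hypothetical names. Under that reading, n_V * n_R + n_C works out to the same total for all five labels, which is an observation derived from the labels rather than a statement made on the slide.

```python
from typing import NamedTuple


class Config(NamedTuple):
    n_v: int  # number of VCs per input port
    n_r: int  # number of router buffers per VC
    n_c: int  # number of channel buffers

    @property
    def total_slots(self) -> int:
        """Router buffers across all VCs plus channel buffers."""
        return self.n_v * self.n_r + self.n_c


def parse_config(label: str) -> Config:
    """Decode a label such as '434' into (n_V, n_R, n_C), one digit each (assumed)."""
    if len(label) != 3 or not label.isdigit():
        raise ValueError(f"expected a three-digit label, got {label!r}")
    return Config(*(int(ch) for ch in label))


configs = {label: parse_config(label) for label in ["440", "434", "428", "344", "531"]}
for label, cfg in configs.items():
    print(f"{label}: v{cfg.n_v}-r{cfg.n_r}-c{cfg.n_c}, total slots = {cfg.total_slots}")
```

Holding the total constant means each configuration trades router buffers for channel buffers rather than adding storage, which makes the throughput and latency comparisons like for like.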

  17. Power Estimation - Summary

      vnV-rnR-cnC   Buffer       % Change   Mesh Link + Control   % Change   Folded Torus Link +   % Change
                    Power (mW)              Power (mW)                       Control Power (mW)
      v4-r4-c0      2.020        -          2.032 + 0             -          4.068 + 0             -
      v4-r3-c4      1.646        -18.51     2.164 + 0.0122        +7.0       4.195 + 0.0122        +3.4
      v4-r2-c8      1.272        -37.02     2.296 + 0.0205        +13.9      4.327 + 0.0205        +6.8
      v3-r4-c4      1.646        -18.51     2.164 + 0.0122        +7.0       4.195 + 0.0122        +3.4
      v3-r3-c7      1.365        -32.41     2.263 + 0.0184        +12.2      4.294 + 0.0184        +6.0
      v5-r2-c6      1.459        -27.76     2.230 + 0.0164        +10.5      4.261 + 0.0164        +5.1
      v5-r3-c1      1.926        -4.65      2.065 + 0.0059        +1.8       4.096 + 0.0059        +0.8

      n V = number of VCs per input port, n R = number of router buffers per VC, n C = number of channel buffers
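The buffer power "% Change" figures on slide 17 can be re-derived from the absolute values. The sketch below assumes the change is computed as (P - P_baseline) / P_baseline with v4-r4-c0 as the baseline; the dictionary and function names are hypothetical, and the power values are copied from the slide.

```python
# Buffer power in mW for each configuration, as listed on slide 17.
BUFFER_POWER_MW = {
    "v4-r4-c0": 2.020,  # baseline
    "v4-r3-c4": 1.646,
    "v4-r2-c8": 1.272,
    "v3-r4-c4": 1.646,
    "v3-r3-c7": 1.365,
    "v5-r2-c6": 1.459,
    "v5-r3-c1": 1.926,
}


def percent_change(value: float, baseline: float) -> float:
    """Relative change of value with respect to baseline, in percent."""
    return 100.0 * (value - baseline) / baseline


baseline = BUFFER_POWER_MW["v4-r4-c0"]
changes = {
    cfg: percent_change(power, baseline)
    for cfg, power in BUFFER_POWER_MW.items()
    if cfg != "v4-r4-c0"
}
for cfg, change in changes.items():
    print(f"{cfg}: {BUFFER_POWER_MW[cfg]:.3f} mW ({change:+.2f} %)")
```

The recomputed values agree with the slide to within a few hundredths of a percentage point, which is consistent with the slide's figures having been derived from less heavily rounded power numbers.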
