Dynamic Reconfiguration of 3D Photonic Networks-on-Chip for Maximizing Performance and Improving Fault Tolerance
Randy Morris†, Avinash Kodi† and Ahmed Louri‡
† School of Electrical Engineering and Computer Science, Ohio University
‡ Department of Electrical and Computer Engineering, University of Arizona
E-mail: kodi@ohio.edu, louri@email.arizona.edu
45th International Symposium on Microarchitecture (MICRO), December 1 – December 5, 2012, Vancouver BC, Canada

Talk Outline
• Motivation & Background
• R-3PO: Architecture & Reconfiguration
• Performance Analysis
• Conclusions

Multicores & Networks-on-Chip
[Figure: Tilera-64 [1], 80-core Intel TeraFlops [2], 512-core FERMI (Nvidia) [3]]
• With increasing core counts, the communication-centric design paradigm (Networks-on-Chip) is becoming important
• Energy for communication is increasing
• Delivered throughput is decreasing
1. http://www.tilera.com/products/processors/TILE64
2. http://techresearch.intel.com/ProjectDetails.aspx?Id=151
3. http://www.nvidia.com/object/fermi_architecture.html

Energy Discrepancy & Throughput
• Energy discrepancy between computation and global communication with technology scaling
• Need to reduce global communication energy
[Figure: On-die energy. Relative compute energy and interconnect energy versus technology node (45, 32, 22, 14, 10, 7 nm). Source: Shekhar Borkar, Intel]
• Reduced throughput due to aggressive voltage and clock scaling
• Need to provide scalable bandwidth without sacrificing performance
[Figure: Tile power, Intel Tera-Flops (65 nm) [1]. Power (watts) versus voltage: 1 Tflops at 97 W, 1.33 Tflops at 230 W]
=> Potential solutions: Nanophotonics, 3D Stacking
1. Y. Hoskote, "A 5-GHz Mesh Interconnect for a Teraflops Processor," IEEE Computer Society, 2007, pp. 51-61.

Nanophotonics & Optical 3D Stacking
• Nanophotonics offers several advantages:
  • Low energy (7.9 fJ/bit)
  • Small footprint (~2.5 µm)
  • High bandwidth (~40 Gbps)
  • CMOS compatibility
• Optical 3D stacking offers several advantages:
  • Shorter interconnect length
  • Higher bandwidth density
  • Optical vias create power-efficient inter-layer communication
[Figure: stacked optical layers, labeled Layer 1 and Layer 2]
1. L. Xu, W. Zhang, Q. Li, J. Chan, H. L. R. Lira, M. Lipson, K. Bergman, "40-Gb/s DPSK Data Transmission Through a Silicon Microring Switch," IEEE Photonics Technology Letters 24.
2. S. Manipatruni, K. Preston, L. Chen, and M. Lipson, "Ultra-low voltage, ultra-small mode volume silicon microring modulator," Opt. Express 18, 18235-18242 (2010).
3. P. Koonath and B. Jalali, "Multilayer 3-D photonics in silicon," Opt. Express, vol. 15, pp. 12686-12691, 2007.
4. A. Biberman, K. Preston, G. Hendry, N. Sherwood-Droz, J. Chan, J. S. Levy, M. Lipson, and K. Bergman, "Photonic network-on-chip architectures using multilayer deposited silicon materials for high performance chip multiprocessors," J. Emerg. Technol. Comput. Syst., vol. 7, pp. 1-25, July 2011.

Recent Work on Photonic NoC, among others
• Shared-Bus [Cornell, MICRO'06]
• Circuit Switch [Columbia, NoCs'07]
• CORONA [HP/Wisconsin, ISCA'08]
• Processor-DRAM [MIT, Hot Int'08]
• Firefly [Northwestern, ISCA'09]
• Phastlane [Cornell, ISCA'09]
• Flexishare [Northwestern, HPCA'10]
• Oblivious Router [Cornell, ASPLOS'10]
• ATAC [MIT, PACT'10]
• MPNoC [Arizona, DAC'10]
• Free-Space Architecture [ISCA'10]
• Optical Proximity [Sun, ISCA'10]
• PROPEL [Ohio, NoCs'10]
• System Level Trimming [UC Davis, HPCA'11]
• Atomic Coherence [Wisconsin/HP, HPCA'11]
• FeatherWeight [Northwestern/KAIST, MICRO'11]
• Resilient Microring Design [UC Davis, MICRO'11]
• Tolerating Process Variations [Pittsburgh, ISCA'12]
However, there are several issues not addressed:
• 2D planar connections have waveguide crossings
• Static network resource allocation
• Lack of fault tolerance

R-3PO Architecture
• Decomposed optical crossbar
  • Reduces optical hardware complexity by having smaller crossbars
  • Reduces crossover losses (~0.05 dB/crossing)
• Optical vias
  • Light switched via photonic rings (reduces electrical power)
  • Eases fabrication as optical and electrical dies can be separately grown
• Reconfiguration of network resources by re-allocating bandwidth
  • Reduces application execution time by monitoring link and buffer utilization
  • Provides fault tolerance as faulty channels are bypassed

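To make the crossover-loss figure concrete, the short sketch below converts a number of waveguide crossings into accumulated loss, using the ~0.05 dB/crossing value from the slide. The crossing counts are hypothetical and are not taken from the talk; the function names are illustrative only.

```python
# Sketch: accumulated waveguide-crossing loss, assuming a uniform
# per-crossing loss of 0.05 dB (the figure quoted on the slide).
# The crossing counts used below are hypothetical examples.

def crossing_loss_db(num_crossings, loss_per_crossing_db=0.05):
    """Total insertion loss in dB contributed by waveguide crossings."""
    return num_crossings * loss_per_crossing_db

def remaining_power_fraction(loss_db):
    """Fraction of optical power left after a loss given in dB."""
    return 10 ** (-loss_db / 10)

if __name__ == "__main__":
    for n in (10, 60, 200):  # hypothetical crossing counts
        loss = crossing_loss_db(n)
        frac = remaining_power_fraction(loss)
        print(f"{n} crossings: {loss:.2f} dB, {frac:.1%} of power remains")
```

Under these assumptions the loss grows linearly in dB with the number of crossings, which is why a decomposed crossbar with fewer crossings per path eases the laser power budget.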
R-3PO Architecture (1/6)
[Figure: 3D chip stack. Labels: electrical contact; optical die with Optical Layers 3, 2, 1 and 0; electro-optic transceivers; external laser; TSVs; electrical die (core + cache + MC); heat sink]

R-3PO Architecture (1/6)
[Figure: electrical die (core + cache + MC) above the heat sink. Tile detail: Cores 0-3, each with an L1 cache, and a shared L2]

R-3PO Architecture (2/6)
[Figure: electro-optic transceivers between Core A and Core B. Labels: driver, buffer chain, photodetector, TIA, limiting amplifier for electronics; four transmitters (Tx) and four receivers (Rx) with micro-ring resonators on wavelengths λ1-λ4; off-chip laser. Stack labels: electro-optic transceivers, external laser, TSVs, electrical die (core + cache + MC), heat sink]

R-3PO Architecture (3/6)
[Figure: Optical Layer 0 connecting Groups 0-3, above the electro-optic transceivers, external laser, TSVs, electrical die (core + cache + MC) and heat sink]

R-3PO Architecture (4/6)
[Figure: Optical Layer 1 connecting Groups 0-3, stacked on Optical Layer 0, above the electro-optic transceivers, external laser, TSVs, electrical die (core + cache + MC) and heat sink]

R-3PO Architecture (5/6)
[Figure: Optical Layer 2 connecting Groups 0-3, stacked on Optical Layers 1 and 0, above the electro-optic transceivers, external laser, TSVs, electrical die (core + cache + MC) and heat sink]

R-3PO Architecture (6/6)
[Figure: Optical Layer 3 connecting Groups 0-3, completing the optical die (Optical Layers 3, 2, 1, 0) with electrical contact, above the electro-optic transceivers, external laser, TSVs, electrical die (core + cache + MC) and heat sink]

Router Microarchitecture
[Figure: router for Tile 0. Transmit side: header route computation (RC), token capture and token release (Req + Rel), input buffers IB 0 to IB 3, demux, E/O Tx with MRR modulators to Optical Layers 0 to 3. Receive side: O/E Rx with MRR filters from Optical Layers 0 to 3, mux, token control and token re-generation, switch allocator (SA), output buffers OB 0 to OB 3, L2 shared cache. Pipeline diagram built from the stages listed below]
• RC: Route Computation
• BWS: Buffer Write (Source)
• EO: Electrical to Optical Driver
• OL: Optical link latency (1-3 cycles)
• OE: Optical to Electrical (Dest)
• BWD: Buffer Write (Dest)
• SA: Switch Allocation

Static Communication
[Figure: Layer 2 with Groups 0-3; source tile highlighted]
• Communication demand between Tile 0 and Tile 15 is high based on application
• If there are under-utilized links, then the bandwidth can be re-allocated to improve the performance

Network Reconfiguration
[Figure: Layer 0 and Layer 1, each with Groups 0-3, showing the source, the switch point, the combine point and the destination]
• 2x increase in bandwidth is obtained by routing half the data through two other nanophotonic channels

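The bandwidth gain from borrowing a channel can be illustrated with a toy model. The sketch below stripes a packet's flits round-robin across the channels available to a source and counts serialization cycles. It assumes one flit per channel per cycle and ignores switch-point and combine-point latency; the channel names and function names are hypothetical and not from the talk.

```python
import math

# Sketch: striping flits across a source's own channel plus a borrowed one.
# Assumptions (not from the slides): one flit per channel per cycle,
# round-robin striping, no extra latency at the switch or combine points.

def stripe_flits(flits, channels):
    """Assign flits to channels round-robin; returns {channel: [flits]}."""
    lanes = {ch: [] for ch in channels}
    for i, flit in enumerate(flits):
        lanes[channels[i % len(channels)]].append(flit)
    return lanes

def serialization_cycles(num_flits, num_channels):
    """Cycles needed to send num_flits over num_channels in parallel."""
    return math.ceil(num_flits / num_channels)

if __name__ == "__main__":
    flits = list(range(8))
    print(stripe_flits(flits, ["layer0_own", "layer1_borrowed"]))
    print(serialization_cycles(len(flits), 1))  # static allocation
    print(serialization_cycles(len(flits), 2))  # after reconfiguration
```

In this simplified model, doubling the channels halves the serialization time, which corresponds to the 2x bandwidth figure on the slide.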
Reconfiguration
• Reconfiguration in R-3PO takes place between the different layers as follows:
  • R-3PO-L1: Reconfiguration between Layer0/Layer1 & Layer2/Layer3
  • R-3PO-LA: Reconfiguration between adjacent layers
  • R-3PO-L2: Reconfiguration between two adjacent layers
  • R-3PO-L3: Reconfiguration between all layers
• Reconfiguration algorithm monitors network resources
  • Link & buffer utilization
  • Accomplished with hardware counters & electrical circuitry

Reconfiguration Algorithm
Step 1: Wait for the reconfiguration window, R_W^t
Step 2: RC_i sends a request packet to all local tiles requesting Link_util and Buffer_util for the previous window, R_W^(t-1)
Step 3: Each hardware counter sends Link_util and Buffer_util statistics from the previous window, R_W^(t-1), to RC_i
Step 4: RC_i classifies the link statistic for each hardware counter as:
  If Link_util = 0.0: Not-Utilized, use β4
  If Link_util ≤ L_min: Under-Utilized, use β3
  If Link_util ≥ L_min and Buffer_util < B_con: Normal-Utilized, use β2
  If Buffer_util > B_con: Over-Utilized, use β1
Step 5: Each RC_i sends bandwidth-available information to RC_j (i ≠ j)
Step 6: If RC_j can use any of the free links, it notifies RC_i of their use; else RC_j forwards to the next RC_j
Step 7a: RC_i receives the response back from RC_j and activates the corresponding microrings
Step 7b: RC_j notifies the tiles of additional bandwidth and RC_i notifies RC_j that the additional bandwidth is now available
Step 8: Go to Step 1

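The classification in Step 4 can be sketched in a few lines. This is an illustrative reading of the slide, not the authors' implementation: the rules are checked in the order listed, the boundary cases the slide leaves open (Link_util equal to L_min, Buffer_util equal to B_con) are resolved by that ordering, and the threshold values and link names in the usage example are hypothetical. Treating not-utilized and under-utilized links as the ones offered in Step 5 is also an assumption.

```python
from enum import Enum

class LinkState(Enum):
    """Step 4 classes; the values name the beta label from the slide."""
    NOT_UTILIZED = "beta4"
    UNDER_UTILIZED = "beta3"
    NORMAL_UTILIZED = "beta2"
    OVER_UTILIZED = "beta1"

def classify_link(link_util, buffer_util, l_min, b_con):
    """Classify one link from its counters for the previous window.

    Rules are tested in slide order. Assumption: any link that reaches
    the last rule (buffer_util >= b_con) is treated as over-utilized.
    """
    if link_util == 0.0:
        return LinkState.NOT_UTILIZED
    if link_util <= l_min:
        return LinkState.UNDER_UTILIZED
    if buffer_util < b_con:
        return LinkState.NORMAL_UTILIZED
    return LinkState.OVER_UTILIZED

def links_to_offer(stats, l_min, b_con):
    """Step 5 sketch: ids of links whose bandwidth could be offered.

    Assumption: only not-utilized and under-utilized links are offered.
    stats maps link id -> (link_util, buffer_util).
    """
    free = {LinkState.NOT_UTILIZED, LinkState.UNDER_UTILIZED}
    return [lid for lid, (lu, bu) in stats.items()
            if classify_link(lu, bu, l_min, b_con) in free]

if __name__ == "__main__":
    # Hypothetical counters and thresholds, for illustration only.
    stats = {"L0": (0.0, 0.0), "L1": (0.05, 0.1),
             "L2": (0.4, 0.2), "L3": (0.9, 0.8)}
    for lid, (lu, bu) in stats.items():
        print(lid, classify_link(lu, bu, l_min=0.1, b_con=0.5).name)
    print(links_to_offer(stats, l_min=0.1, b_con=0.5))
```

Steps 6 to 8 (negotiation between reconfiguration controllers and microring activation) are left out because the slide does not specify their message formats.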