Dynamic Reconfiguration of 3D Photonic Networks-on-Chip for Maximizing Performance and Improving Fault Tolerance

Randy Morris Ϯ, Avinash Kodi Ϯ and Ahmed Louri ‡
Ϯ School of Electrical Engineering and Computer Science, Ohio University
‡ Department of Electrical and Computer Engineering, University of Arizona
E-mail: kodi@ohio.edu, louri@email.arizona.edu

45th International Symposium on Microarchitecture (MICRO), December 1 – December 5, 2012, Vancouver BC, Canada

Talk Outline
• Motivation & Background
• R-3PO: Architecture & Reconfiguration
• Performance Analysis
• Conclusions

Multicores & Network-on-Chips
[Figure: Tilera-64 (1), 80-core Intel TeraFlops (2), 512-core FERMI (Nvidia) (3)]
• With increasing core counts, the communication-centric design paradigm (Network-on-Chips) is becoming important
• Energy for communication is increasing
• Delivered throughput is decreasing
1. http://www.tilera.com/products/processors/TILE64
2. http://techresearch.intel.com/ProjectDetails.aspx?Id=151
3. http://www.nvidia.com/object/fermi_architecture.html

Energy Discrepancy & Throughput
• Energy discrepancy between computation and global communication with technology scaling
  => Need to reduce global communication energy
[Figure: On-die energy. Relative compute energy vs. interconnect energy across technology nodes (45, 32, 22, 14, 10, 7 nm). Source: Shekhar Borkar, Intel]
• Reduced throughput due to aggressive voltage and clock scaling
  => Need to provide scalable bandwidth without sacrificing performance
[Figure: Tile power, Intel Tera-Flops (65 nm) (1). Power (watts) vs. voltage; 1 Tflops at 97 W, 1.33 Tflops at 230 W]
=> Potential solutions: Nanophotonics, 3D Stacking
1. Y. Hoskote, "A 5-GHz Mesh Interconnect for a Teraflops Processor," IEEE Computer Society, 2007, pp. 51-61

Nanophotonics & Optical 3D Stacking
• Nanophotonics offers several advantages:
  • Low energy (7.9 fJ/bit)
  • Small footprint (~2.5 µm)
  • High bandwidth (~40 Gbps)
  • CMOS compatibility
• Optical 3D stacking offers several advantages:
  • Shorter interconnect length
  • Higher bandwidth density
  • Optical vias create power-efficient inter-layer communication
[Figure: optical via between Layer 1 and Layer 2]
1. L. Xu, W. Zhang, Q. Li, J. Chan, H. L. R. Lira, M. Lipson, K. Bergman, "40-Gb/s DPSK Data Transmission Through a Silicon Microring Switch," IEEE Photonics Technology Letters 24.
2. S. Manipatruni, K. Preston, L. Chen, and M. Lipson, "Ultra-low voltage, ultra-small mode volume silicon microring modulator," Opt. Express 18, 18235-18242 (2010).
3. P. Koonath and B. Jalali, "Multilayer 3-D photonics in silicon," Opt. Express, vol. 15, pp. 12686-12691, 2007.
4. A. Biberman, K. Preston, G. Hendry, N. Sherwood-Droz, J. Chan, J. S. Levy, M. Lipson, and K. Bergman, "Photonic network-on-chip architectures using multilayer deposited silicon materials for high performance chip multiprocessors," J. Emerg. Technol. Comput. Syst., vol. 7, pp. 1-25, July 2011.
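To put the energy and bandwidth figures above in perspective, the short sketch below multiplies them to get the average power of one wavelength channel. The two numbers come from different device papers, so combining them is an illustrative assumption rather than a measured result, and the function name is hypothetical.

```python
# Figures quoted on the slide. They describe two different devices, so
# multiplying them is an illustrative assumption, not a measurement.
ENERGY_PER_BIT_J = 7.9e-15   # 7.9 fJ/bit
DATA_RATE_BPS = 40e9         # ~40 Gbps


def channel_power_watts(energy_per_bit_j: float, data_rate_bps: float) -> float:
    """Average power of one channel driven continuously at the given rate."""
    return energy_per_bit_j * data_rate_bps


power_w = channel_power_watts(ENERGY_PER_BIT_J, DATA_RATE_BPS)
print(f"{power_w * 1e3:.3f} mW per wavelength channel")  # prints 0.316 mW ...
```

Under these assumptions a fully loaded channel draws roughly a third of a milliwatt for modulation, which is the kind of margin the "low energy" bullet points at.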

Recent Work on Photonic NoC, among others
• Shared-Bus [Cornell, MICRO’06]
• Circuit Switch [Columbia, NoCs’07]
• CORONA [HP/Wisconsin, ISCA’08]
• Processor-DRAM [MIT, Hot Int’08]
• Firefly [Northwestern, ISCA’09]
• Phastlane [Cornell, ISCA’09]
• Flexishare [Northwestern, HPCA’10]
• Oblivious Router [Cornell, ASPLOS’10]
• ATAC [MIT, PACT’10]
• MPNoC [Arizona, DAC’10]
• Free-Space Architecture [ISCA’10]
• Optical Proximity [Sun, ISCA’10]
• PROPEL [Ohio, NoCs’10]
• System Level Trimming [UC Davis, HPCA’11]
• Atomic Coherence [Wisconsin/HP, HPCA’11]
• FeatherWeight [Northwestern/KAIST, MICRO’11]
• Resilient Microring Design [UC Davis, MICRO’11]
• Tolerating Process Variations [Pittsburgh, ISCA’12]
• However, there are several issues not addressed:
  • 2D planar connections have waveguide crossings
  • Static network resource allocation
  • Lack of fault tolerance

R-3PO Architecture
• Decomposed optical crossbar
  • Reduces optical hardware complexity by having smaller crossbars
  • Reduces crossover losses (~0.05 dB/crossing)
• Optical vias
  • Light switched via photonic rings (reduces electrical power)
  • Eases fabrication as optical and electrical dies can be separately grown
• Reconfiguration of network resources by re-allocating bandwidth
  • Reduces application execution time by monitoring link and buffer utilization
  • Provides fault tolerance as faulty channels are bypassed
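The crossover-loss figure above can be turned into a quick estimate of why fewer waveguide crossings matter. The sketch below is a minimal illustration: only the 0.05 dB/crossing value comes from the slide, while the crossing counts and function names are hypothetical.

```python
LOSS_PER_CROSSING_DB = 0.05  # approximate value quoted on the slide


def crossing_loss_db(num_crossings: int,
                     loss_per_crossing_db: float = LOSS_PER_CROSSING_DB) -> float:
    """Total insertion loss, in dB, accumulated over the given crossings."""
    return num_crossings * loss_per_crossing_db


def power_fraction_remaining(loss_db: float) -> float:
    """Fraction of optical power left after a loss expressed in dB."""
    return 10 ** (-loss_db / 10)


# Hypothetical crossing counts, chosen only to show the trend.
for n in (10, 60, 200):
    loss = crossing_loss_db(n)
    print(f"{n:3d} crossings: {loss:5.2f} dB, "
          f"{power_fraction_remaining(loss):.1%} of the power remains")
```

Because dB losses add linearly while the remaining power falls exponentially, a path with a few hundred crossings loses most of its light, which is consistent with the motivation for decomposing the crossbar across layers.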

R-3PO Architecture (1/6)
[Figure: 3D stack, labels in the order shown: electrical contact; optical die with Optical Layers 3, 2, 1, 0; electro-optic transceivers; external laser; TSVs; electrical die (core + cache + MC); heat sink]

R-3PO Architecture (1/6)
[Figure: electrical die tile with Cores 0-3, each with an L1 cache, and a shared L2]

R-3PO Architecture (2/6)
[Figure: electro-optic transceivers between Core A and Core B. Labels: driver, buffer chain, photodetector, TIA, limiting amplifier, electronics; four transmitters (Tx) and four receivers (Rx); micro-ring resonators on wavelengths λ1-λ4; off-chip laser]

R-3PO Architecture (3/6)
[Figure: Optical Layer 0 above the electro-optic transceivers, with Groups 0-3]

R-3PO Architecture (4/6)
[Figure: Optical Layer 1 added above Optical Layer 0, with Groups 0-3]

R-3PO Architecture (5/6)
[Figure: Optical Layer 2 added above Optical Layers 0-1, with Groups 0-3]

R-3PO Architecture (6/6)
[Figure: complete stack with Optical Layers 0-3, Groups 0-3 and the electrical contact]

Router Microarchitecture
[Figure: router microarchitecture for Tile 0. Labels: header route computation (RC); input buffers IB 0 to IB 3 with token request and release (Req + Rel), token capture and token release; demux to E/O Tx MRR modulators for Optical Layers 0 to 3; O/E Rx MRR filters from Optical Layers 0 to 3; output buffers OB 0 to OB 3 with token control and token re-generation; mux; switch allocator (SA); shared L2 cache]
Pipeline stage legend:
• RC: Route Computation
• BWS: Buffer Write (Source)
• EO: Electrical to Optical Driver
• OL: Optical link latency (1-3 cycles)
• OE: Optical to Electrical (Dest)
• BWD: Buffer Write (Dest)
• SA: Switch Allocation

Static Communication
[Figure: Layer 2 with Groups 0-3 and the source marked]
• Communication demand between Tile 0 and Tile 15 is high based on the application
• If there are under-utilized links, then the bandwidth can be re-allocated to improve performance

Network Reconfiguration
[Figure: Layer 0 and Layer 1, each with Groups 0-3; source, destination, switch point and combine point marked]
• A 2x increase in bandwidth is obtained by routing half the data through two other nanophotonic channels
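The 2x claim above follows from simple arithmetic once a payload is striped over two channels instead of one. The sketch below assumes identical channels and ignores any overhead at the switch and combine points; the function name, payload size and channel rate are hypothetical.

```python
def transfer_time_s(payload_bits: int, channel_rate_bps: float,
                    num_channels: int = 1) -> float:
    """Time to send a payload striped evenly over identical channels.

    Assumes no overhead at the switch or combine points.
    """
    if num_channels < 1:
        raise ValueError("num_channels must be at least 1")
    return payload_bits / (channel_rate_bps * num_channels)


# Hypothetical 512-bit packet on 40 Gbps channels.
static_time = transfer_time_s(512, 40e9, num_channels=1)
reconfigured_time = transfer_time_s(512, 40e9, num_channels=2)
print(f"speed-up: {static_time / reconfigured_time:.1f}x")
```

In practice the gain would be somewhat lower than this idealized figure, since the borrowed channel adds switching and recombination latency that the sketch leaves out.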

Reconfiguration
• Reconfiguration in R-3PO takes place between the different layers as follows:
  • R-3PO-L1: Reconfiguration between Layer0/Layer1 & Layer2/Layer3
  • R-3PO-LA: Reconfiguration between adjacent layers
  • R-3PO-L2: Reconfiguration between two adjacent layers
  • R-3PO-L3: Reconfiguration between all layers
• Reconfiguration algorithm monitors network resources
  • Link & buffer utilization
  • Accomplished with hardware counters & electrical circuitry
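As a software analogy for the hardware counters mentioned above, the sketch below accumulates link and buffer utilization over one reconfiguration window and then resets. The class, its fields and the per-cycle sampling scheme are all hypothetical; the slides do not describe how the counters are built.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class UtilizationCounter:
    """Hypothetical per-link counter, sampled once per cycle in a window."""
    buffer_slots: int
    cycles: int = 0
    busy_cycles: int = 0
    occupied_slot_cycles: int = 0

    def sample(self, link_busy: bool, buffer_occupancy: int) -> None:
        """Record one cycle of link activity and buffer occupancy."""
        self.cycles += 1
        self.busy_cycles += int(link_busy)
        self.occupied_slot_cycles += buffer_occupancy

    def report_and_reset(self) -> Tuple[float, float]:
        """Return (link_util, buffer_util) for the window, then clear it."""
        if self.cycles == 0:
            result = (0.0, 0.0)
        else:
            result = (
                self.busy_cycles / self.cycles,
                self.occupied_slot_cycles / (self.cycles * self.buffer_slots),
            )
        self.cycles = self.busy_cycles = self.occupied_slot_cycles = 0
        return result


counter = UtilizationCounter(buffer_slots=4)
counter.sample(link_busy=True, buffer_occupancy=2)
counter.sample(link_busy=False, buffer_occupancy=0)
print(counter.report_and_reset())  # (0.5, 0.25)
```

Both values are normalized to the range 0 to 1 so that they can be compared directly against utilization thresholds.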

Reconfiguration Algorithm
Step 1: Wait for the reconfiguration window, R_W^t
Step 2: RC_i sends a request packet to all local tiles requesting Link_util and Buffer_util for the previous window, R_W^(t-1)
Step 3: Each hardware counter sends its Link_util and Buffer_util statistics from the previous window, R_W^(t-1), to RC_i
Step 4: RC_i classifies the link statistic for each hardware counter as:
  If Link_util = 0.0: Not-Utilized, use β4
  If Link_util ≤ L_min: Under-Utilized, use β3
  If Link_util ≥ L_min and Buffer_util < B_con: Normal-Utilized, use β2
  If Buffer_util > B_con: Over-Utilized, use β1
Step 5: Each RC_i sends its available-bandwidth information to RC_j (i ≠ j)
Step 6: If RC_j can use any of the free links, it notifies RC_i of their use; otherwise RC_j forwards the information to the next RC_j
Step 7a: RC_i receives the response from RC_j and activates the corresponding microrings
Step 7b: RC_j notifies the tiles of the additional bandwidth, and RC_i notifies RC_j that the additional bandwidth is now available
Step 8: Go to Step 1
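The classification in Step 4 can be sketched as a small function. This is a minimal illustration, not the authors' implementation: the rules on the slide overlap at some boundaries, so the sketch assumes they are checked in the order listed and treats any leftover case (for example Buffer_util exactly equal to B_con) as Normal-Utilized. The threshold values in the usage lines are hypothetical.

```python
from enum import Enum


class LinkState(Enum):
    """Link classes from Step 4, tagged with the beta level each one uses."""
    NOT_UTILIZED = "beta4"
    UNDER_UTILIZED = "beta3"
    NORMAL_UTILIZED = "beta2"
    OVER_UTILIZED = "beta1"


def classify_link(link_util: float, buffer_util: float,
                  l_min: float, b_con: float) -> LinkState:
    """Classify one link from its statistics for the previous window.

    Assumption: the slide's rules are evaluated in the order listed.
    """
    if link_util == 0.0:
        return LinkState.NOT_UTILIZED
    if link_util <= l_min:
        return LinkState.UNDER_UTILIZED
    if buffer_util > b_con:
        return LinkState.OVER_UTILIZED
    # Remaining case: link_util > l_min and buffer_util <= b_con.
    return LinkState.NORMAL_UTILIZED


# Hypothetical thresholds, chosen only for illustration.
L_MIN, B_CON = 0.1, 0.5
for link_util, buffer_util in [(0.0, 0.0), (0.05, 0.1), (0.6, 0.2), (0.9, 0.8)]:
    state = classify_link(link_util, buffer_util, L_MIN, B_CON)
    print(f"link={link_util:.2f} buffer={buffer_util:.2f} -> {state.name}")
```

Links classified as not utilized or under-utilized are the candidates whose bandwidth can be offered to other routers in Steps 5 and 6, while over-utilized links are the ones that benefit from receiving it.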
