Hybrid On-chip Data Networks
Gilbert Hendry, Keren Bergman
Lightwave Research Lab, Columbia University
Chip-Scale Interconnection Networks
• Chip multi-processors create need for high performance interconnects
• Performance bottleneck of on-chip networks and I/O
• Power dissipation constraints of the chip package
• > 50% of total power comes from interconnects*
[Images: Intel Polaris, IBM Cell, AMD Opteron]
* N. Magen et al., “Interconnect-power dissipation in a microprocessor,” SLIP 2004.
Motivation
• CMPs of the future = 3D stacking
• Lots of data on chip
• Photonics offers key advantages
Why Photonics?
Photonics changes the rules for Bandwidth, Energy, and Distance.
OPTICS:
• Modulate/receive high bandwidth data stream once per communication event.
• Broadband switch routes entire multi-wavelength stream.
• Off-chip BW = On-chip BW for nearly same power.
ELECTRONICS:
• Buffer, receive and re-transmit at every router.
• Each bus lane routed independently (P ∝ N_LANES).
• Off-chip BW is pin-limited and power hungry.
[Diagram: a single optical TX/RX pair end to end, versus electronic TX/RX pairs at every hop]
Hybrid Network Premise
• Optical processing difficult and limited
• Source, destination routing inefficient
• Use electronics for routing, optics for switching and transmission
→ Hybrid Circuit-Switching
Hybrid Circuit-Switched Networks
Step 1: Path SETUP request
[Diagram: electronic SETUP message, source core to destination core]
Hybrid Circuit-Switched Networks
Step 2: Path ACK
[Diagram: electronic ACK message]
Hybrid Circuit-Switched Networks
Step 3: Transmit Data
[Diagram labels: Photonic Switch Use Information]
Hybrid Circuit-Switched Networks
Meanwhile: Path Contention
[Diagram: path BLOCKED message (backoff)]
Hybrid Circuit-Switched Networks
Step 4: Path TEARDOWN
[Diagram: electronic TEARDOWN message, source core to destination core]
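The four steps and the contention case can be sketched as a small reservation model. This is only an illustration under stated assumptions: the mesh coordinates, the X-then-Y routing policy, and the names `xy_route` and `HybridNetwork` are hypothetical and do not come from the slides. A SETUP claims each link along the path, a conflict releases the partial reservation and returns BLOCKED, and TEARDOWN frees the circuit.

```python
from typing import Dict, List, Tuple

Node = Tuple[int, int]
Link = Tuple[Node, Node]  # directed link between adjacent nodes


def xy_route(src: Node, dst: Node) -> List[Link]:
    """Dimension-order (X then Y) route on a mesh. Assumed routing policy."""
    links: List[Link] = []
    x, y = src
    while x != dst[0]:
        nx = x + (1 if dst[0] > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dst[1]:
        ny = y + (1 if dst[1] > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


class HybridNetwork:
    """Electronic control plane that reserves photonic links (hypothetical model)."""

    def __init__(self) -> None:
        self.reserved: Dict[Link, int] = {}  # link -> id of the circuit holding it

    def setup(self, circuit_id: int, src: Node, dst: Node) -> str:
        """Steps 1-2: walk the path claiming links; answer ACK or BLOCKED."""
        claimed: List[Link] = []
        for link in xy_route(src, dst):
            if link in self.reserved:
                # Contention: release the partial path; the source backs off.
                for held in claimed:
                    del self.reserved[held]
                return "BLOCKED"
            self.reserved[link] = circuit_id
            claimed.append(link)
        return "ACK"  # Step 3 (optical transmission) may now proceed

    def teardown(self, circuit_id: int) -> None:
        """Step 4: free every link held by this circuit."""
        for link in [l for l, c in self.reserved.items() if c == circuit_id]:
            del self.reserved[link]
```

In this model a second SETUP that shares a link with an established circuit is blocked until the first circuit is torn down. Nothing arbitrates between competing sources, which is one way to read the "no fairness" remark about path setup contention.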
Hybrid Circuit-Switched Networks
Pros:
• Energy-efficient end-to-end transmission
• High bandwidth through WDM
• Electronic network still available for small control messages*
• Network-level support for secure regions
Cons:
• Path setup latency
• Path setup contention (no fairness)
* [G. Hendry et al. Analysis of Photonic Networks for a Chip Multiprocessor Using Scientific Applications. In NOCS, 2009]
Programming and Communication
Shared Memory scaling
“… [OpenMP on large systems] often performs worse than message passing due to a combination of false sharing, coherence traffic, contention, and system issues that arise from the difference in scheduling and network interface moderation” ~ Exascale Report
[Spectrum bar: Implicit Communication to Explicit Communication]
Partitioned Global Address Space
Access          Method
Local Read      Optical receive
Local Write     Optical send
Remote Read     Electronic request, optical receive
Remote Write    Optical send
Shared R/W      ?
[G. Hendry et al. Circuit-Switched Memory Access in Photonic Interconnection Networks for HPEC. In Supercomputing, Nov. 2010]
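The access table can be read as a lookup from access type to the sequence of network operations it needs. A minimal sketch, assuming hypothetical names (`Plane`, `ACCESS_METHODS`, `plan_access`) that are not part of the cited work. The open "Shared R/W" entry is deliberately left unresolved, as it is in the table.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Plane(Enum):
    ELECTRONIC = "electronic"
    OPTICAL = "optical"


# Each access type maps to an ordered list of (plane, operation) steps,
# transcribed from the table. The key names are illustrative only.
ACCESS_METHODS: Dict[str, List[Tuple[Plane, str]]] = {
    "local_read": [(Plane.OPTICAL, "receive")],
    "local_write": [(Plane.OPTICAL, "send")],
    "remote_read": [(Plane.ELECTRONIC, "request"), (Plane.OPTICAL, "receive")],
    "remote_write": [(Plane.OPTICAL, "send")],
}


def plan_access(kind: str) -> List[Tuple[Plane, str]]:
    """Return the network steps for an access type listed in the table."""
    if kind == "shared_rw":
        # The slide marks this case with "?", so no method is invented here.
        raise NotImplementedError("shared read/write is an open question")
    if kind not in ACCESS_METHODS:
        raise ValueError(f"unknown access type: {kind}")
    return ACCESS_METHODS[kind]
```

Only the remote read touches the electronic plane for more than circuit setup: it needs a small request message before the optical reply.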
Message Passing
• Complex, dynamic access patterns
• Relatively larger blocks of data
• Scientific computing
* [G. Hendry et al. Analysis of Photonic Networks for a Chip Multiprocessor Using Scientific Applications. In NOCS, 2009]
Streaming
• Embedded / specialized systems (Graphics, Image + Signal Proc.)
• Execution mode of general-purpose systems (Cell Processor)
[Diagram: input data, four numbered stages, output data, connected by persistent optical circuits]
Electronic Plane
Electronic Router
• Low frequency operation (~1 GHz)
• 1 VC (typically)
• Small buffers (64-28)
• Narrow Channels (8-32)
[Diagram: control router data path with buffers, buffer control, routing logic, arbiter, crossbar allocation, flow control (credits in), crossbar and data switch, plus a request bus and ring/crossbar control outputs]
Network Gateway
External Concentration [P. Kumar et al. Exploring concentration and channel slicing in on-chip network router. In NOCS, 2009]
[Diagram: four cores attach through a network interface to an electronic crossbar; the control plane connects to/from the electronic control router over a bidirectional electronic channel; the data plane passes through serialization/deserialization, drivers and receivers (Tx/Rx) to a 5-port photonic switch over a bidirectional waveguide]
The Photonic Plane
Silicon Photonic Waveguide Technology
• Low-loss (1.7 dB/cm), high-bandwidth (> 200 nm) silicon photonic waveguides can be fabricated in commercial CMOS process. [Vlasov and McNab, Optics Express 12 (8) 1622 (2004)]
• 1.28 Tb/s Data Transmission Experiment (occupies small slice of available WG BW) [B. G. Lee et al., Photon. Technol. Lett. 20 (10) 767 (2008)]
[Figure: eye diagrams (100 ps scale) before injection into waveguide and after 5-cm waveguide, with EDFA, for channels C23 (1559 nm), C28 (1555 nm), C46 (1541 nm), C51 (1537 nm)]
Silicon photonic waveguides provide low-power optical interconnects in CMOS-compatible platform.
Silicon Photonic Modulator and Detector Technology
[Diagram: (CW) laser, modulator, detector]
Modulators:
• 18 Gb/s demonstrated
• 85 fJ/bit demonstrated at 10 Gb/s
• Scalable to < 25 fJ/bit
Ge-on-Si Detectors:
• 40-GHz bandwidths
• 1 A/W responsivities
Receivers (detectors w/ CMOS amplifiers):
• 1.1 pJ/bit demonstrated at 10 Gb/s
• Scalable to < 50 fJ/bit
[M Lipson, Optics Express (2007)] [M Watts, Group Four Photonics (2008)] [S Koester, J. Lightw. Technol. (2007)]
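As a rough worked example, the per-bit figures on this slide can be summed into an end-to-end link energy. This sketch assumes the 85 fJ/bit figure refers to the modulator and the 1.1 pJ/bit figure to the receiver. It ignores laser power, thermal tuning, and serialization, none of which the slide quantifies. The function name `link_power_mw` is hypothetical.

```python
# Energy figures from the slide, in pJ/bit. The "scalable" values are
# upper bounds (< 25 fJ/bit and < 50 fJ/bit), so their sum is a bound too.
MODULATOR_DEMO_PJ = 0.085
RECEIVER_DEMO_PJ = 1.1
MODULATOR_SCALED_PJ = 0.025
RECEIVER_SCALED_PJ = 0.050


def link_power_mw(energy_pj_per_bit: float, rate_gbps: float) -> float:
    """Power of one wavelength channel: pJ/bit times Gb/s gives mW."""
    return energy_pj_per_bit * rate_gbps


demo_pj = MODULATOR_DEMO_PJ + RECEIVER_DEMO_PJ        # modulator + receiver
scaled_pj = MODULATOR_SCALED_PJ + RECEIVER_SCALED_PJ  # projected upper bound

demo_mw = link_power_mw(demo_pj, 10.0)      # at the demonstrated 10 Gb/s
scaled_mw = link_power_mw(scaled_pj, 10.0)  # same rate, projected devices
```

Under these assumptions the receiver dominates the demonstrated energy per bit, and the projected devices would cut the combined figure by more than an order of magnitude.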
Silicon Photonic Micro-Ring Switch Explanation
• Switch states: bar state, cross state (ports in0, in1, out0, out1)
• Ring conditions: no current, on-resonance; current, off-resonance
• Fast control of resonance wavelength via carrier injection
[Plot: Transmission (in_i → out_i)]
Higher Order Switch Designs
On-Chip Topology Exploration
• Photonic Torus [A. Shacham et al., Trans. on Comput., 2008]
• Nonblocking Photonic Torus [M. Petracca et al. IEEE Micro, 2008]
On-Chip Topology Exploration
• TorusNX
• Square Root
[J. Chan et al. JLT, May 2010]
Photonic Plane Characteristics
• Insertion Loss
• Noise
• Power
Insertion Loss and Optical Power Budget
[Figure labels: Nonlinear Effects, Optical Power Budget, WDM Factor, Worst-case Insertion Loss, Detector Sensitivity]
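One common way to read these labels, stated here as an assumption rather than as the authors' exact formulation: the optical power budget is the gap in dB between the power ceiling set by nonlinear effects and the detector sensitivity, and it must cover the worst-case insertion loss plus a WDM factor of 10·log10(Nλ). The sketch below solves that inequality for the largest Nλ. The function name and the numbers in the usage note are hypothetical placeholders, not values from the talk.

```python
import math


def max_wavelengths(p_ceiling_dbm: float,
                    sensitivity_dbm: float,
                    worst_case_il_db: float) -> int:
    """Largest N with worst_case_il_db + 10*log10(N) <= budget (assumed model)."""
    budget_db = p_ceiling_dbm - sensitivity_dbm  # optical power budget
    margin_db = budget_db - worst_case_il_db     # what is left for the WDM factor
    if margin_db < 0:
        return 0  # even a single wavelength does not close the link
    # The small epsilon guards against floating-point rounding at exact powers of ten.
    return math.floor(10 ** (margin_db / 10) + 1e-9)
```

For example, with a placeholder ceiling of 20 dBm and a placeholder sensitivity of -20 dBm, the budget is 40 dB. A 30 dB worst-case insertion loss then leaves 10 dB for the WDM factor, which allows 10 wavelengths. Every additional 10 dB of insertion loss costs a factor of ten in wavelength count.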
Insertion Loss vs. Bandwidth
[Diagram labels: Topologies, Number of λ, Network Size]
Simulation Results
[Figure: four stacked bar charts of Insertion Loss (dB) versus Topology Size (nodes), 4×4 up to 18×18, for the Torus, Non-Blocking Torus, TorusNX, and Square Root topologies. Bars are split into Propagation, Crossing, and Dropping Into a Ring. Bar labels across the panels range from 12.2 dB to 63.2 dB.]
Simulation Results
Original is based on the IL results from previous slide; Improved is based on a hypothetical improvement in crossing loss from 0.15 dB to 0.05 dB.
[Figure: four plots of Number of Wavelength Channels (log scale, 1 to 100) versus Number of Access Points for the Torus, Non-Blocking Torus, TorusNX, and Square Root topologies, each marked with the optical power budget]
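The effect of the crossing-loss improvement can be estimated with simple dB arithmetic. This is a sketch under stated assumptions: insertion loss is taken as the plain sum of the three contributions named in the previous figure (propagation, crossings, ring drops), and the saved loss is assumed to be spent entirely on extra wavelengths. The path length, crossing count, drop count, and per-drop loss in the usage note are hypothetical. Only the 0.15 dB and 0.05 dB crossing losses and the 1.7 dB/cm propagation loss come from the slides.

```python
def insertion_loss_db(length_cm: float, n_crossings: int, n_drops: int,
                      prop_db_per_cm: float, crossing_db: float,
                      drop_db: float) -> float:
    """Additive loss model: propagation + waveguide crossings + ring drops."""
    return (length_cm * prop_db_per_cm
            + n_crossings * crossing_db
            + n_drops * drop_db)


def wavelength_gain(n_crossings: int,
                    old_crossing_db: float = 0.15,
                    new_crossing_db: float = 0.05) -> float:
    """Factor by which the wavelength count could grow if all saved dB
    go to the WDM factor (10*log10 of the number of wavelengths)."""
    saved_db = n_crossings * (old_crossing_db - new_crossing_db)
    return 10 ** (saved_db / 10)
```

On a hypothetical worst-case path of 2 cm with 100 crossings and 2 ring drops at 0.5 dB each, the original loss is 3.4 + 15 + 1.0 = 19.4 dB. Cutting each crossing to 0.05 dB saves 10 dB, which is a tenfold increase in the number of wavelengths that fit in the same budget.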
Noise and Crosstalk
[Diagram labels: Laser Noise, Modulation Noise, Inter-Message Crosstalk, Intra-Message Crosstalk, Coherent noise, Incoherent noise, Filter]
Effects of Noise
[Diagram labels: Optical SNR, Number of λ, Network Size, Network Load]
Simulation Results
Results
• Results are plotted for a network size of 8 × 8 at saturation, measured at the detectors.
• Maximum OSNR = ~45 dB (due to laser noise)
• Minimum OSNR < 17 dB (due to message-to-message crosstalk)
• Variations between networks are due to the varying likelihood of two messages intersecting on each network topology.
System Performance
• SNR measures the likelihood of error-free transmission.
• Lower SNR designs will require additional retransmission, resulting in lower throughput performance.
The line at OSNR = 16.9 dB is where a bit-error-rate of 10^-12 can be achieved, assuming an ideal binary receiver circuit and orthogonal signaling.
[Plot: Optical SNR (dB), 0 to 50, versus Message Size (bit), 10^0 to 10^7, for Torus, Non-blocking Torus, TorusNX, and Square Root]
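The 16.9 dB threshold can be checked against the textbook error probability for binary orthogonal signaling with an ideal receiver in Gaussian noise, BER = Q(√SNR) = ½·erfc(√(SNR/2)). Treating the plotted OSNR as the SNR in that formula is an assumption made for this sketch, and the function name is hypothetical.

```python
import math


def ber_binary_orthogonal(osnr_db: float) -> float:
    """BER = Q(sqrt(SNR)) for ideal detection of binary orthogonal signals.

    Assumes the OSNR in dB can be used directly as the SNR of the formula.
    """
    snr_linear = 10 ** (osnr_db / 10)
    # Q(x) = 0.5 * erfc(x / sqrt(2)), evaluated at x = sqrt(SNR).
    return 0.5 * math.erfc(math.sqrt(snr_linear / 2))


threshold_ber = ber_binary_orthogonal(16.9)
```

Under that assumption the function gives a value on the order of 10^-12 at 16.9 dB, consistent with the line drawn on the plot. The error rate falls off steeply above the threshold, so designs near the 17 dB minimum are the ones at risk of retransmissions.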