Building Manycore Processor-to-DRAM Networks with Monolithic Silicon Photonics
Christopher Batten 1, Ajay Joshi 1, Jason Orcutt 1, Anatoly Khilo 1, Benjamin Moss 1, Charles Holzwarth 1, Miloš Popović 1, Hanqing Li 1, Henry Smith 1, Judy Hoyt 1, Franz Kärtner 1, Rajeev Ram 1, Vladimir Stojanović 1, Krste Asanović 2
1 Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA
2 Department of Electrical Engineering and Computer Science, University of California, Berkeley, CA
Symposium on High Performance Interconnects, August 27, 2008
Motivation Photonic Technology Network Architecture Full System Design The manycore memory bandwidth challenge Christopher Batten MIT/UCB 2 / 25
Cost of electrical processor-to-DRAM networks
Seamless On-Chip/Off-Chip Photonic Link
• Light coupled into waveguide on chip A
• Transmitter off: Light extracted by ring modulator
• Transmitter on: Light passes by ring modulator
• Light continues to receiver on chip B
• Light extracted by receiver's ring filter and guided to photodetector
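The link behavior in the bullets above can be mimicked with a toy on-off-keying model. This is only a sketch under stated assumptions: the function names, the `loss_db` path-loss parameter, and the half-level decision threshold are hypothetical and do not come from the slides, which describe the devices qualitatively.

```python
def ring_modulate(bits, laser_power=1.0):
    """Optical power left in the waveguide for each transmitted bit."""
    # Transmitter on (bit 1): light passes by the ring modulator.
    # Transmitter off (bit 0): the ring modulator extracts the light.
    return [laser_power if bit else 0.0 for bit in bits]


def propagate(powers, loss_db=0.0):
    """Apply a hypothetical chip-A-to-chip-B path loss given in dB."""
    scale = 10 ** (-loss_db / 10)
    return [p * scale for p in powers]


def receive(powers, on_level):
    """Ring filter drops the light to a photodetector; decide at half the on level."""
    threshold = on_level / 2
    return [1 if p > threshold else 0 for p in powers]


bits = [1, 0, 1, 1, 0]
loss_db = 3.0  # assumed value, for illustration only
received = receive(propagate(ring_modulate(bits), loss_db),
                   on_level=10 ** (-loss_db / 10))
print(received == bits)
```

The model ignores noise, finite extinction ratio, and wavelength-division multiplexing; it only restates the on/off behavior listed on the slide.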
Photonic Component Characterization
Standard CMOS process
• Waveguides
• Ring Modulators
• Ring Filters
• Photodetectors
Characterization sources shown on the slide: 65 nm Test Chip, Simulation
Photonic Component: Waveguide
Photonic Component: Ring Modulator
Photonic Component: Ring Filter
Photonic Component: Photodetector
Silicon photonics' energy and area advantage

Link                                                   Energy (pJ/b)   Bandwidth Density (Gb/s/µm)
Global on-chip photonic link                           0.25            160-320
Global on-chip optimally repeated M9 wire in 32 nm     1               5
Off-chip photonic link (50 µm coupler pitch)           0.25            13-26
Off-chip electrical SERDES (50 µm pitch)               5               0.2
On-chip/off-chip seamless photonic link                0.25
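The advantage the table implies can be read off as simple ratios. The sketch below only divides the table's own numbers; the dictionary keys and the `advantage` helper are illustrative names, not part of the original material.

```python
# Values copied from the table above: energy in pJ/b and the
# bandwidth-density range in Gb/s/um as (low, high).
LINKS = {
    "on_chip_photonic": (0.25, (160.0, 320.0)),
    "on_chip_m9_wire": (1.0, (5.0, 5.0)),
    "off_chip_photonic": (0.25, (13.0, 26.0)),
    "off_chip_serdes": (5.0, (0.2, 0.2)),
}


def advantage(photonic, electrical):
    """Return (energy ratio, (low, high) density ratio) of photonic over electrical."""
    p_energy, (p_lo, p_hi) = LINKS[photonic]
    e_energy, (e_lo, e_hi) = LINKS[electrical]
    energy_ratio = e_energy / p_energy
    density_ratio = (p_lo / e_hi, p_hi / e_lo)
    return energy_ratio, density_ratio


for pair in [("on_chip_photonic", "on_chip_m9_wire"),
             ("off_chip_photonic", "off_chip_serdes")]:
    energy, (lo, hi) = advantage(*pair)
    print(f"{pair[0]} vs {pair[1]}: "
          f"{energy:.0f}x energy, {lo:.0f}x to {hi:.0f}x density")
```

On these numbers the photonic links come out at about 4x lower energy and 32x to 64x higher density on chip, and about 20x lower energy and 65x to 130x higher density off chip.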
Leveraging silicon photonics to address the memory bandwidth challenge
Baseline Network Architecture: Mesh Topology
Figure panels: Logical View, Physical View
Analytical modeling of energy and throughput tradeoffs
• 22 nm – 256 cores @ 2.5 GHz
• Performance will most likely be energy constrained
• Fixed 8 nJ/cycle energy budget (20 W)
• Use simple gate-level models to estimate energy, ideal throughput under uniform random traffic, and zero-load latency
Figure: Total Energy (nJ/cycle) vs. Mesh Channel Bitwidth (b/cycle), broken down into Off-chip I/O Channels, Mesh Routers, and Mesh Channels
Figure: Total Ideal Throughput (Kb/cycle) vs. Mesh Channel Bitwidth (b/cycle), with Mesh Limited and I/O Limited regions for 5 pJ/b and 250 fJ/b off-chip I/O
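The budget figures above are linked by simple arithmetic, which the sketch below checks. It also computes a loose upper bound on off-chip bandwidth, assuming the entire budget went to off-chip I/O. That bound is an illustration, not the slide's model, which also spends energy on mesh routers and mesh channels, so the plotted curves sit below it. Function names are hypothetical.

```python
CLOCK_HZ = 2.5e9            # 2.5 GHz, from the slide
BUDGET_J_PER_CYCLE = 8e-9   # 8 nJ/cycle, from the slide


def budget_watts(energy_j_per_cycle, clock_hz):
    """Power implied by a per-cycle energy budget."""
    return energy_j_per_cycle * clock_hz


def io_bound_bits_per_cycle(energy_j_per_cycle, io_energy_j_per_bit):
    """Upper bound on off-chip bits per cycle if all energy went to I/O."""
    return energy_j_per_cycle / io_energy_j_per_bit


print(f"{budget_watts(BUDGET_J_PER_CYCLE, CLOCK_HZ):.1f} W")
for label, energy_per_bit in [("5 pJ/b", 5e-12), ("250 fJ/b", 250e-15)]:
    bound = io_bound_bits_per_cycle(BUDGET_J_PER_CYCLE, energy_per_bit)
    print(f"{label}: at most {bound / 1000:.1f} Kb/cycle")
```

The first line confirms that 8 nJ/cycle at 2.5 GHz is 20 W. The bounds work out to 1.6 Kb/cycle at 5 pJ/b and 32 Kb/cycle at 250 fJ/b, which is why the lower-energy I/O shifts the limit from the off-chip channels toward the mesh.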
Ideal throughput vs. off-chip I/O energy efficiency
• Decreased off-chip I/O energy results in more I/O bandwidth and mesh bandwidth
• Latency decreases slightly due to lower serialization latency
• In the photonic range almost all of the energy is being spent on the mesh
• A more energy-efficient on-chip interconnect should further improve throughput
Figure: Ideal Throughput (Kb/cycle) vs. Off-chip I/O Energy (pJ/b), with the photonic and electrical ranges marked
Figure: Zero-Load Latency (cycles) vs. Off-chip I/O Energy (pJ/b), with the photonic and electrical ranges marked
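The link between channel width and serialization latency can be illustrated with the usual textbook decomposition of zero-load latency into hop latency plus serialization latency. This is a sketch, not the authors' gate-level model, and the hop count, per-hop delay, and message size below are hypothetical values chosen only for illustration.

```python
import math


def zero_load_latency(hops, cycles_per_hop, message_bits, channel_bits_per_cycle):
    """Hop latency plus the cycles needed to push the message through one channel."""
    serialization = math.ceil(message_bits / channel_bits_per_cycle)
    return hops * cycles_per_hop + serialization


# Hypothetical parameters: 10 hops, 2 cycles per hop, 512-bit message.
narrow = zero_load_latency(10, 2, 512, 16)
wide = zero_load_latency(10, 2, 512, 64)
print(narrow, wide)
```

With the hop term held fixed, widening the channel from 16 to 64 b/cycle cuts the serialization term from 32 to 8 cycles, which is the effect the second bullet describes.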
Mesh Augmented with Global Crossbar
Figure panels: Logical View, Physical View
Analytical modeling of global crossbar topology
• Global crossbar increases energy efficiency of the on-chip interconnect, improving throughput
• Global traffic is moved from energy-inefficient mesh channels to energy-efficient on-chip silicon photonics
• Global crossbar has little impact in the electrical range since very little energy is being spent in the on-chip interconnect to begin with
• Latency decreases due to lower serialization and hop latency
Figure: Ideal Throughput (Kb/cycle) vs. Off-chip I/O Energy (pJ/b) for Simple Mesh, Mesh w/ 4 Groups, and Mesh w/ 16 Groups
Figure: Zero-Load Latency (cycles) vs. Off-chip I/O Energy (pJ/b), with the photonic and electrical ranges marked
Simulation Methodology
• Execution-driven, cycle-accurate network simulator
• Models pipeline latencies, router contention, credit-based flow control, and serialization overheads
• Configuration same as in analytical modeling except:
  – Mesh networks use dimension-ordered routing
  – 16 DRAM modules distributed around chip
  – Memory channels cache-line interleaved
  – Normalized buffering in terms of bits
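Cache-line interleaving across the 16 DRAM modules can be sketched as a modulo mapping on the cache-line index. This is one common way to realize it, shown as an assumption: the slides do not give the actual address mapping, and the 64-byte line size and the function name are hypothetical.

```python
NUM_DRAM_MODULES = 16   # from the slide
CACHE_LINE_BYTES = 64   # assumed line size, not stated in the slides


def dram_module_for(address, line_bytes=CACHE_LINE_BYTES,
                    num_modules=NUM_DRAM_MODULES):
    """Map a byte address to a DRAM module by interleaving whole cache lines."""
    line_index = address // line_bytes
    return line_index % num_modules


# Consecutive cache lines land on consecutive modules and wrap after 16 lines.
print([dram_module_for(line * CACHE_LINE_BYTES) for line in range(18)])
```

Under this mapping, sequential accesses spread evenly over all memory channels.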