Fiber-to-the-processor and other challenges for photonics in future systems


  1. Fiber-to-the-processor and other challenges for photonics in future systems. A. F. J. Levi, http://www.usc.edu/alevi, with contributions from Bindu Madhavan (USC and Agilent Technologies). Stanford, April 21, 2005.

  2. What is a system?
  • Understand electronics in systems
    – Definition of a system: complex enough to require a system area network
    – Multi-processor rack-based systems, routers, data centers, telephone switches, automobiles, etc., are systems
    – Cell phones, telephone handsets, cameras, pocket calculators, etc., are not complex enough to be systems
    – Chip IO performance
    – Backplane performance
  • Chassis systems are composed of a passive backplane with connectors for line cards
    – The backplane supplies power to the line cards
    – Connectors are interconnected by traces in the backplane
  • Chassis systems have slots for line cards that plug into the backplane at connectors
  • Total chip-to-chip interconnect length is up to 1 meter
  • Interconnect loss is a tradeoff between
    – Cost: improved line characteristics using costlier dielectric materials, blind-via techniques, and counterboring of backplane press-fit connector vias
    – Density: reduced signal density at the line-card/backplane interface allows cheaper PCB manufacturing options
  [Figure: VSR interconnect path from IC package through line-card trace and via, connector, backplane trace, and backplane via; a 5 RU (8.75") chassis carrying 8 × 8 × 40 × 2 Gb/s = 5.12 Tb/s across its line cards and 128 ports × 40 × 2 Gb/s = 10.24 Tb/s across the backplane; the arithmetic is checked in the sketch below.]
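The aggregate-bandwidth figures above are simple products. A minimal sketch checking them; the helper name is mine, and reading the trailing factor of 2 as the two directions of each duplex link is my assumption, not stated on the slide:

```python
# Quick check of the slide's aggregate-bandwidth figures.
# Assumption (mine, not from the slide): the factor of 2 counts
# the two directions of each duplex link.

def aggregate_tbps(*factors: float) -> float:
    """Multiply the per-link factors (in Gb/s) and convert to Tb/s."""
    total = 1.0
    for f in factors:
        total *= f
    return total / 1000.0

# 8 line cards x 8 links x 40 Gb/s x 2 directions
print(aggregate_tbps(8, 8, 40, 2))   # 5.12 Tb/s
# 128 backplane ports x 40 Gb/s x 2 directions
print(aggregate_tbps(128, 40, 2))    # 10.24 Tb/s
```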

  3. System interconnect hierarchy and advanced optical solutions
  [Figure: interconnect hierarchy versus length scale (0.1 nm to 1 km) and transfer bit rate (10 k to 10 T): single atom and electron Bohr radius in GaAs (quantum effects accessed by photonics) at the short end, then gate-to-gate, chip-to-chip, substrate-to-substrate, board-to-board, shelf-to-shelf, frame-to-frame, and LAN, with increasing system functionality. Conventional optical data links, POLO (Parallel Optical Data Link), PONI (Parallel Optical Interconnect), and FTTP (fiber to the processor) applications are marked, along with the length at which electrical transmission lines are required.]
  Reference: A. F. J. Levi, "Optical Interconnects in Systems," Proc. IEEE 88, 1264-1270 (2000).

  4. Parallel optical interconnect products emerge from DARPA-funded POLO, PONI, and MAUI programs
  • POLO (1994-1997) and PONI (1997-2000) inspired products for 10 m to 600 m interconnect lengths: Agilent, Zarlink, Picolight, Gore, Emcore, Paracer, E20, Silicon Light Machines, Cielo
  • Agilent announced 12 × 3.3 Gb/s = 40 Gb/s modules in November 2000; full production November 2001; customers: Nortel, Cisco, IBM
  • 12 × 10 Gb/s = 120 Gb/s demonstrated in 2003
  • MAUI (2002 to present): combination of VCSEL WDM and parallel fiber-optic technology for FTTP, targeting 1 m to 100 m interconnect length applications; 240 Gb/s at < 1 W demonstrated in 2004 in an 8 mm × 6 mm PMOSA (240-1000 Gb/s, < 1 W)
  [Figure: module construction showing VCSELs/PINs, guide pin, passives, optics, silicon IC, flex circuit, and metal base; timeline 1995, 2000, 2004.]

  5. Parallel optics and CMOS integration
  • POLO / Afterburner (July 1995): HP experimental point-to-point host interface for parallel optics, 1 Gb/s Tx / 1 Gb/s Rx
  • JetStream (October 1997): ring network for parallel optics; high-speed parallel fiber-optic host interface board (210 mm × 144 mm), 16 Gb/s Tx / 16 Gb/s Rx
  • JetStream on a chip (December 2000): ring network interface integrated in a single CMOS IC, 20 Gb/s Tx / 20 Gb/s Rx; Link Adapter Chip for the parallel fiber-optic ring network with 400,000 transistors including the ring MAC, 10.2 mm × 7.3 mm in 0.5 µm CMOS, tape-out 8.17.00, received 11.10.00

  6. New markets for optical interconnects: solving the electronics interconnect and packaging mess!
  • The memory access bottleneck
    – The integration trend places multiple processors on a single chip: chip multi-processor (CMP), e.g. Broadcom's SiByte BCM1250
    – Main memory is likely to remain separate in most systems
    – 10 nm CMOS circuits have 100M transistors/mm²: at 6 transistors per bit, SRAM gives 16 Mb = 2 MB/mm² or 200 MB/cm²; at 1 transistor per bit, DRAM gives 100 Mb = 12 MB/mm² or 1.2 GB/cm² (this arithmetic is checked in the sketch below)
    – That might be useful for a single-chip notebook computer or make an interesting L2 cache for a CMP
  • Multiple processor boards in chassis systems are connected by switches
  [Figure: CPU with memory controller and IO controller, main memory, PCI cards, and the SAN.]
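A minimal sketch checking the slide's on-chip memory density arithmetic, assuming the quoted 100M transistors/mm² and standard 6T SRAM / 1T DRAM cells (the helper name is mine):

```python
# Check the slide's memory-density arithmetic for 10 nm CMOS
# at 100M transistors/mm^2.

TRANSISTORS_PER_MM2 = 100e6

def megabytes_per_mm2(transistors_per_bit: int) -> float:
    """Storage density in MB/mm^2 for a given memory-cell size."""
    bits_per_mm2 = TRANSISTORS_PER_MM2 / transistors_per_bit
    return bits_per_mm2 / 8 / 1e6   # bits -> bytes -> MB

sram = megabytes_per_mm2(6)  # 6T SRAM cell
dram = megabytes_per_mm2(1)  # 1T DRAM cell

print(f"SRAM: {sram:.1f} MB/mm^2 = {sram * 100:.0f} MB/cm^2")          # ~2 -> ~200
print(f"DRAM: {dram:.1f} MB/mm^2 = {dram * 100 / 1000:.2f} GB/cm^2")   # ~12.5 -> ~1.25
```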

  7. System example: a 1U (1.75") thick, 20-port GbE switch/router for chassis servers (2001)
  • 96 W, hot-swappable, 20-port GbE router powered by a 100 W, 48 V, 20 A brick
  • 15.5" × 5.35" board with ~2300 components, ~7000 nets, ~11000 pins
  • Electrical and optical GbE IO: 8 GbE optical links, 8 GbE backplane links, 4 GbE Cat-5 links
  • Eight GbE serial backplane interconnects run over low-cost CPCI connectors via a quad serial link (SERDES) IC
  • GbE PHY IC, clock generation, dual quad-channel MMF GbE optical modules, management microprocessor and support circuitry
  • Quad 8-port, mesh-connected GbE switch ICs provide the 20 external ports

  8. Integration and packaging driven processor crisis: the case for fiber-to-the-processor (FTTP)
  • Electronics fails to deliver
    – Power crisis: a projected kW CPU is not viable [Figure: log-scale processor power, 1 W to 1000 W, versus year 1980-2010, from the i386SX through the Pentium 4 and Itanium]
    – Processor crisis is driving multi-core processor design with increased IO demand and only a fraction of transistors active at any one time; Intel moves to CMP, and Pentium IV uni-processor development is terminated (2005)
    – Bandwidth density and latency crisis: increasing mismatch between memory bus bandwidth and the CPU, so many CPU cycles are wasted after a cache miss [Figure: external memory bandwidth versus internal CPU bandwidth, 0.1-1000 Gb/s, from the i386Dx-16 through the Itanium-2; internal CPU bandwidth accounts for superscalar microprocessor architecture by multiplying internal datapath width by the number of instructions that can be issued simultaneously]
    – Signal integrity crisis: EMI, reflections, crosstalk, and device noise may lead the way to optical interconnects; high-speed electrical signaling is not reliable, as shown by the $400M i820 memory translator hub recall because of electrical noise (5.10.00) and the 1.13 GHz PIII recall because of electrical noise in a circuit element (8.28.00)
  • Fiber-to-the-processor is a new design point
    – Less power, less power density in a distributed system
    – Better signal integrity, optical isolation
    – More bandwidth density gives reduced latency in the node and the SAN using WDM
    – Removes the electrical backplane bottleneck for future multi-processor systems
  [Figure: Ethernet switch-port and data-rate deployment, 0.01-10 Gb/s over 1994-2004, against Moore's Law, 2× every 2 years (on-chip high-performance local clock, SIA '97); a trend sketch follows below.]
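A minimal sketch of the Moore's-Law trend line on the slide's deployment plot, assuming the stated doubling every 2 years; the 0.1 Gb/s starting point in 1994 is my reading of the plot's axis, not a stated number:

```python
# Moore's-Law trend line: rate doubles every `doubling_years` years.
# Starting point (0.1 Gb/s in 1994) is an assumption read off the plot.

def doubling_trend(rate0_gbps: float, year0: int, year: int,
                   doubling_years: float = 2.0) -> float:
    """Exponential growth with a fixed doubling period."""
    return rate0_gbps * 2 ** ((year - year0) / doubling_years)

for year in (1994, 1998, 2002, 2004):
    print(year, f"{doubling_trend(0.1, 1994, year):.2f} Gb/s")
# 2x every 2 years turns 0.1 Gb/s in 1994 into ~3.2 Gb/s by 2004.
```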

  9. Optical interconnects and the memory access bottleneck
  • Optical interconnect can fill the memory-access performance gap with a bandwidth edge density of 60 - 600 Gb/s/mm (put in context by the sketch below)
  [Figure: external memory bandwidth versus internal CPU bandwidth, bus bandwidth 0.1-1000 Gb/s, for processors from the i386Dx-16 through the Itanium-2.]
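To put the quoted 60 - 600 Gb/s/mm edge density in context, a small illustrative calculation; the 1 Tb/s target and the helper name are mine, not from the slide:

```python
# How much chip edge does a given aggregate bandwidth need at the
# slide's quoted optical edge density? (1 Tb/s target is an example.)

def edge_mm(bandwidth_gbps: float, density_gbps_per_mm: float) -> float:
    """Chip-edge length (mm) needed to escape a given bandwidth."""
    return bandwidth_gbps / density_gbps_per_mm

for density in (60, 600):   # Gb/s per mm of chip edge, from the slide
    print(f"1 Tb/s at {density} Gb/s/mm needs {edge_mm(1000, density):.1f} mm of edge")
# ~16.7 mm at 60 Gb/s/mm, ~1.7 mm at 600 Gb/s/mm
```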

  10. FTTP: a new architecture enabled by optical interconnects and high-performance CMOS integration
  • New technology: optical interconnect with ultra-high bandwidth, low power, and low latency, driving to a "technology convergence point"
  • Integration: CMOS interface to optics and a high-performance crossbar switch
  • New switch-based architecture: next-generation scalable NUMA with the switch integrated into processor and memory
  [Figure: switch-based FTTP architecture in which processors P1 and P2 with L3 caches connect at 5 Tb/s over parallel optics and WDM to the SAN; a multi-processor switch-based network built on VCSELs and a high-performance CMOS optical interface.]

  11. Example latency estimate
  • Per-element round-trip latencies: memory 10 Cy at 125 MHz (80 ns); controller 5 Cy at 500 MHz (10 ns); crossbar 4+4 Cy at 500 MHz (16 ns); link 15 Cy at 500 MHz (30 ns) per segment
  • Round-trip tally along the P → Ctl → Cross Bar → Cross Bar → Ctl → Memory path and back: 80 ns + 10 ns + 16 ns + 30 ns + 16 ns + 30 ns + 16 ns + 30 ns + 16 ns + 10 ns + 20 ns + 50 ns = 324 ns (reproduced in the sketch below)
  • A 10× increase in clock rate reduces round-trip time ~10×
  • Assume time-of-flight ~ 0 ns
  [Figure: multi-processor network of processors (P), controllers (Ctl), crossbars, and memory, annotated with 50 ns, 30 ns, 20 ns, 16 ns, and 10 ns segment latencies.]
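The 324 ns total is just the sum of the per-hop latencies. A minimal sketch reproducing the tally; the path composition, including the 20 ns and 50 ns figure terms, follows my reading of the slide:

```python
# Sum the per-hop latencies from the slide's latency-estimate figure.
# The 20 ns and 50 ns terms are taken directly from the figure
# annotations; the hop ordering is my reading of the diagram.

def latency_ns(cycles: float, clock_mhz: float) -> float:
    """Latency in ns for a given cycle count at a given clock rate."""
    return cycles / clock_mhz * 1e3

memory = latency_ns(10, 125)        # 80 ns: memory, 10 Cy at 125 MHz
controller = latency_ns(5, 500)     # 10 ns: controller, 5 Cy at 500 MHz
crossbar = latency_ns(4 + 4, 500)   # 16 ns: crossbar, 4+4 Cy at 500 MHz
segment = latency_ns(15, 500)       # 30 ns: link, 15 Cy at 500 MHz

hops = [memory, controller, crossbar, segment, crossbar, segment,
        crossbar, segment, crossbar, controller, 20, 50]
print(f"round-trip estimate: {sum(hops):.0f} ns")  # 324 ns

# A 10x clock-rate increase scales every cycle-based term by ~1/10,
# consistent with the slide's claim, assuming time-of-flight ~ 0 ns.
```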
