Multiplying Moore's Law with Proximity Communication
Robert Drost, Ph.D., Director and Distinguished Engineer, Sun Microsystems Laboratories
Outline
• The Bandwidth Motivation
• Proximity Communication Technology
• Multiplying Moore's Law
The Team
VLSI Research Group at Sun Labs: Igor Benko, Alex Chow, Wes Clark, Bill Coates, Robert Drost, Jo Ebergen, Scott Fairbanks, Jonathan Gainsley, Gilda Garreton, Yaeko Hirotsuka, Ron Ho, David Hopkins, Ian Jones, Russell Kao, Jon Lexau, Dimitri Nadezhin, Tarik Ono, Steve Rubin, Jeff Rulifson, Justin Schauer, Ivan Sutherland
And friends: David Harris, Mark Greenstreet, Ken Yang
And many others at Sun
Why do we want more off-chip bandwidth anyway?
Motivation: CPU vs. DRAM
[Figure source: J.L. Hennessy and D.A. Patterson, Computer Organization and Design, 2nd ed. (Ref 6)]
Motivation: Bisection Bandwidth (BBW) vs. Flops
[Chart (Ref 1): Performance (TFlops/sec, 10 to 3,000) versus Bisection Bandwidth (TBytes/sec, 0.01 to 2,000) for Blue Gene/L (2005), ¼ Blue Gene/L, SX-8, NASA Columbia, Sandia Red Storm, Colsa Mach5, Earth Sim, ASCI-Q, LLNL Thunder, and NCSA Tungsten, grouped as Vector, MPP, Thin-node Cluster, and Fat-node Cluster. Diagonal lines mark 0.001, 0.01, 0.1, 1, and 10 bytes/flop; higher ratios mean more bandwidth per flop.]
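The diagonal lines in this chart are constant ratios of bisection bandwidth to compute rate. As a hedged illustration, the ratio for any machine can be computed as below; the function name and the machine numbers are hypothetical and are not read from the chart.

```python
def bytes_per_flop(bisection_tbytes_per_s: float, perf_tflops_per_s: float) -> float:
    """Return bisection bandwidth per floating-point operation.

    Both inputs carry the same 10**12 (tera) prefix, so the prefix cancels
    and the result is in bytes/flop.
    """
    if perf_tflops_per_s <= 0:
        raise ValueError("performance must be positive")
    return bisection_tbytes_per_s / perf_tflops_per_s


if __name__ == "__main__":
    # Hypothetical machine: 100 TFlops/sec with 10 TBytes/sec of bisection bandwidth.
    ratio = bytes_per_flop(10.0, 100.0)
    print(f"{ratio:.2f} bytes/flop")
```

A machine on the 0.1 bytes/flop line, for example, can move one byte across its bisection for every ten floating-point operations it performs.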
Bandwidth versus Memory Capacity (Ref 1)
Motivation: Lack of Data Locality
[Figure (Ref 2): processor-pair communication patterns for Dense Linear Algebra and 3D FFT. Black = no communication between a processor pair; white = heavy communication between a processor pair.]
Proximity Communication Technology
Proximity Communication (PxC)
● Avoids Off-Chip Wires
● Increases Bandwidth/Area
● Makes Chips Replaceable
● Enhances Testing Capability
● Enables Smaller Chips
● Obviates ESD Protection
● Shrinks Transceiver Circuits
[Diagram: Chip1, Chip2, and Chip3 with paired transmit and receive regions]
[Chart: Proximity Communication versus area ball bonding I/O, 2003 to 2009, log scale from 10 to 1,000]
Simple Circuits
Proximity Packaging Challenges
• Performance is a function of Z, Ψ, and Φ misalignments
[Diagram: packaging concerns: power connection, alignment, heat extraction, force vector]
• With reasonable misalignment control, tens of Tbps of bandwidth per chip can be realized
Alignment is Multi-Dimensional
[Diagram: Chip1 and Chip2 with translations X, Y, Z and rotations Θ, Ψ, Φ]
Alignment is the major challenge
• Must align chips in X, Y, Θ, Z, Ψ, and Φ
• X, Y, and Θ misalignments are corrected electronically
[Diagram: "Measure... ...and correct... ...on-chip." Chip1 Tx micropads (active = 1, inactive = 0) facing Chip2 Rx pads; X and Y verniers with Tstrobe and Rstrobe signals]
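The "measure" half of this loop relies on verniers. A minimal sketch of how a vernier can resolve an offset finer than either mark pitch, assuming a hypothetical one-dimensional vernier whose Rx marks are spaced slightly wider than its Tx marks; each element reports one bit, giving a thermometer code similar to the 0/1 patterns in the diagram. The pitches, element count, and function names are illustrative assumptions, not the dimensions or circuits of the Sun test chips.

```python
from typing import List


def vernier_bits(offset_nm: int, tx_pitch_nm: int, rx_pitch_nm: int, n: int) -> List[int]:
    """Sense one bit per vernier element (hypothetical model).

    Element i has a Tx mark at i * tx_pitch_nm + offset_nm and an Rx mark at
    i * rx_pitch_nm, and reads 1 when the Tx mark lies at or beyond the Rx mark.
    Because rx_pitch_nm > tx_pitch_nm, the Rx marks pull ahead by
    (rx_pitch_nm - tx_pitch_nm) per element, so the bits form a thermometer code.
    """
    if rx_pitch_nm <= tx_pitch_nm:
        raise ValueError("Rx pitch must exceed Tx pitch")
    return [1 if i * tx_pitch_nm + offset_nm >= i * rx_pitch_nm else 0 for i in range(n)]


def decode_offset(bits: List[int], resolution_nm: int) -> int:
    """Convert the thermometer code into an offset, rounded down to the resolution.

    Saturates at (n - 1) * resolution_nm when every bit is 1.
    """
    ones = sum(bits)
    if ones == 0:
        raise ValueError("negative offset: outside this vernier's range")
    return (ones - 1) * resolution_nm


if __name__ == "__main__":
    tx_pitch, rx_pitch = 10_000, 10_500  # hypothetical mark pitches, in nm
    bits = vernier_bits(2_300, tx_pitch, rx_pitch, n=16)
    print(bits)
    print(decode_offset(bits, rx_pitch - tx_pitch), "nm")
```

With a 500 nm difference between the two pitches, the offset is resolved to 500 nm even though neighboring marks are 10 µm apart. A second vernier oriented along the other axis would measure Y in the same way.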
Steering Circuit
[Diagram: steering the Tx pad in two dimensions across one receiver pad pitch; labels B1, B2, C1, C2]
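Once an offset is measured, the transmitter can be steered by choosing which micropads drive each logical Tx pad. A sketch under stated assumptions: each logical pad is a hypothetical window of micropads inside a larger row and column of micropads, the measured X/Y offsets are already within half a receiver pad pitch, and the correction is rounded to the nearest whole micropad. All names and numbers are illustrative, and rotation (Θ) is not modeled.

```python
from typing import List, Tuple


def nearest_step(offset_nm: int, micropad_pitch_nm: int) -> int:
    """Round an offset to the nearest whole number of micropads (halves round up)."""
    return (offset_nm + micropad_pitch_nm // 2) // micropad_pitch_nm


def active_mask(shift: int, window: int, total: int) -> List[int]:
    """1-D mask of active micropads: a `window`-wide run, shifted from the center."""
    start = (total - window) // 2 + shift
    if start < 0 or start + window > total:
        raise ValueError("shift exceeds the steering range")
    return [1 if start <= i < start + window else 0 for i in range(total)]


def steer(dx_nm: int, dy_nm: int, micropad_pitch_nm: int, rx_pitch_nm: int,
          window: int, total: int) -> Tuple[List[int], List[int], Tuple[int, int]]:
    """Choose X and Y micropad masks and report the residual misalignment in nm."""
    half = rx_pitch_nm // 2
    if abs(dx_nm) > half or abs(dy_nm) > half:
        raise ValueError("offset must be within half a receiver pad pitch")
    sx = nearest_step(dx_nm, micropad_pitch_nm)
    sy = nearest_step(dy_nm, micropad_pitch_nm)
    residual = (dx_nm - sx * micropad_pitch_nm, dy_nm - sy * micropad_pitch_nm)
    return active_mask(sx, window, total), active_mask(sy, window, total), residual


if __name__ == "__main__":
    # Hypothetical geometry: 1 um micropads, 20 um receiver pitch, 8-micropad
    # windows in 28-micropad rows/columns (enough spares for +/-10 micropads).
    mx, my, res = steer(2_300, -1_200, 1_000, 20_000, window=8, total=28)
    print("X mask:", mx)
    print("Y mask:", my)
    print("residual (nm):", res)
```

Rounding to the nearest micropad bounds the leftover error to half a micropad pitch per axis, which is why a fine micropad grid lets a coarse receiver pad tolerate large placement errors.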
Pads Cross-Section
[Diagram: Chip 1 transmitter plates facing Chip 2 receiver plates across the plate separation; a 50 μm dimension is marked]
Signal and Noise
[Figure: simulated coupling]
Combining estimates for
● Channel speed
● Receiver sensitivity
● Signal vs. noise for pads
● Clocking and overhead
We can estimate...
where G = pad separation, or gap, in microns
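To see why the gap G dominates signal strength, a first-order model treats each Tx/Rx pad pair as a parallel-plate capacitor, C = ε0·εr·A/G, forming a voltage divider with the receiver's input capacitance. This is a simplified sketch that ignores fringing fields and crosstalk from neighboring pads; the pad size, swing, and parasitic capacitance are hypothetical, not the values behind the simulated coupling plot.

```python
EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m


def coupling_capacitance(pad_side_um: float, gap_um: float, eps_r: float = 1.0) -> float:
    """Parallel-plate capacitance (F) between square pads; ignores fringing fields."""
    area_m2 = (pad_side_um * 1e-6) ** 2
    return EPS0 * eps_r * area_m2 / (gap_um * 1e-6)


def received_swing(v_tx: float, c_couple: float, c_rx_parasitic: float) -> float:
    """Voltage (V) at the receiver node from the capacitive divider."""
    return v_tx * c_couple / (c_couple + c_rx_parasitic)


if __name__ == "__main__":
    # Hypothetical numbers: 30 um pads, 1.8 V swing, 10 fF receiver input capacitance.
    for gap in (1.0, 3.0, 10.0):
        c = coupling_capacitance(30.0, gap)
        v = received_swing(1.8, c, 10e-15)
        print(f"G = {gap:4.1f} um: C = {c * 1e15:5.2f} fF, Vrx = {v * 1e3:6.1f} mV")
```

Because C scales as 1/G while the parasitic load stays fixed, the received swing drops quickly as the chips separate, which is consistent with the emphasis on Z alignment.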
A tileable PxC block
[Diagram: a row of Tx data channels and a row of Rx data channels, a clock channel, and Align structures at each end]
Measured results
• TSMC 180 nm CMOS
• 72 transmit, 72 receive channels
• 1.8 Gb/s per channel, 10⁻¹⁵ bit error rate
• Aggregate 260 Gb/s per chip, density 430 Gb/s/mm²
• 3 pJ/bit
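These headline numbers are related by simple arithmetic. The sketch below assumes the aggregate counts both transmit and receive channels (144 × 1.8 Gb/s ≈ 260 Gb/s) and that the 3 pJ/bit figure applies to that aggregate; the resulting power and area are derived estimates, not additional measured results.

```python
TX_CHANNELS = 72
RX_CHANNELS = 72
GBPS_PER_CHANNEL = 1.8
ENERGY_PJ_PER_BIT = 3.0
DENSITY_GBPS_PER_MM2 = 430.0

# Assumption: the aggregate counts both directions.
aggregate_gbps = (TX_CHANNELS + RX_CHANNELS) * GBPS_PER_CHANNEL
# Power (W) = bits per second * joules per bit.
power_w = aggregate_gbps * 1e9 * ENERGY_PJ_PER_BIT * 1e-12
# Communication area (mm^2) implied by the aggregate and the reported density.
implied_area_mm2 = aggregate_gbps / DENSITY_GBPS_PER_MM2

print(f"aggregate ~ {aggregate_gbps:.1f} Gb/s")
print(f"I/O power ~ {power_w:.2f} W")
print(f"implied area ~ {implied_area_mm2:.2f} mm^2")
```

Under these assumptions, the full 260 Gb/s costs under a watt of I/O power and occupies roughly 0.6 mm² of silicon.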
Experimental Setup
[Figure: views of the test setup labeled PCB1, PCB2, Chip1, and Chip2]
BER vs. chip separation
Eye opening at 1.8 Gb/s
How do we multiply Moore's Law?
The Key Idea in Moore's Law
• Double the number of transistors/chip (for the same cost) every 24 months
  > The principal driving force behind the past 40 years of integrated circuit industry advancement
  > An amazing prediction in 1965 based on fewer than a hundred transistors/chip
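The doubling rule on this slide can be written as N(t) = N0 · 2^((t − t0)/2), with t in years. A small sketch; the 1965 starting point of 64 transistors is an illustrative assumption consistent with "fewer than a hundred," not a figure from the slide.

```python
def moore_transistors(year: float, base_year: float = 1965.0,
                      base_count: float = 64.0, doubling_years: float = 2.0) -> float:
    """Transistors per chip if the count doubles every `doubling_years` years."""
    return base_count * 2.0 ** ((year - base_year) / doubling_years)


if __name__ == "__main__":
    for year in (1965, 1985, 2005):
        print(year, f"{moore_transistors(year):,.0f}")
```

Forty years at this rate is 20 doublings, a factor of about one million, which is why the charts that follow need a logarithmic vertical axis.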
The Key Idea in Proximity Communication
• We connect chips with enough bandwidth that they can perform as a single integrated chip
• Hence, PxC increases the effective number of transistors/chip over and above Moore's Law
Multiplying Moore's Law
• Assuming Moore's Law continues
[Chart: transistors per chip, 1970 to 2020, log scale from 1,000 to 1,000,000,000; PxC arrays with increasing chip counts rise above the Moore's Law scaling line]
What if Moore's Law stalls?
• Many have (incorrectly) predicted the demise of Moore's Law
• Technical causes
  > Short-channel effects in transistors, leading to too much leakage and hence power consumption
  > Wire delay limiting performance
• Financial causes
  > Fabs cost too much to yield a return on investment
    • 65 nm fabs cost $3 billion to build (and the cost goes up 2x per generation)
  > Chips cost too much to yield a return on investment
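Taking the slide's numbers at face value, a fab cost that starts at $3 billion at 65 nm and doubles each generation compounds quickly. The sketch below projects that rule forward; the nodes after 65 nm are the conventional successors (45, 32, and 22 nm), and the costs are extrapolations of the stated 2x rule, not reported figures.

```python
def fab_cost_billion(generations_after_65nm: int, base_cost: float = 3.0) -> float:
    """Fab cost in $B, assuming cost doubles with each process generation."""
    return base_cost * 2 ** generations_after_65nm


NODES_NM = [65, 45, 32, 22]

for k, node in enumerate(NODES_NM):
    print(f"{node} nm: ${fab_cost_billion(k):.0f}B")
```

Three generations past 65 nm, the same rule implies an eightfold increase, which is the financial pressure the slide describes.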
Multiplying a stalled Moore's Law
• Proximity Communication keeps increasing transistors/chip without any further contribution from fabrication technology
[Chart: transistors per chip, 1970 to 2020, log scale from 1,000 to 1,000,000,000; the Moore's Law scaling line flattens "If Moore's Law stalls," while PxC arrays with increasing chip counts continue to rise]
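The two charts can be summarized in one model: the effective transistor count of a PxC array equals the transistors on each die times the number of dies acting as one chip. The sketch below uses hypothetical parameters (a 2000 baseline of 100 million transistors per die, 2-year doubling, an optional stall year, and a 16-chip array); it illustrates the multiplication, not a product roadmap.

```python
from typing import Optional


def transistors_per_die(year: int, stall_year: Optional[int] = None,
                        base_year: int = 2000, base_count: int = 100_000_000,
                        doubling_years: float = 2.0) -> float:
    """Per-die transistor count; growth freezes after stall_year if one is given."""
    effective_year = year if stall_year is None else min(year, stall_year)
    return base_count * 2.0 ** ((effective_year - base_year) / doubling_years)


def pxc_array_transistors(year: int, chips: int, stall_year: Optional[int] = None) -> float:
    """Effective transistors when `chips` dies behave as one chip via PxC."""
    return chips * transistors_per_die(year, stall_year)


if __name__ == "__main__":
    for chips in (1, 16):
        normal = pxc_array_transistors(2020, chips)
        stalled = pxc_array_transistors(2020, chips, stall_year=2010)
        print(f"{chips:2d} chips: {normal:.3g} (Moore continues), {stalled:.3g} (stalled in 2010)")
```

In this model, a stalled 16-chip array still holds 16 times as many transistors as a single stalled die, so growth can continue through chip count after per-die scaling stops.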
Summary
• The need for off-chip bandwidth motivates PxC
• Good mechanical alignment enables PxC and its tremendous bandwidth increase
• PxC multiplies Moore's Law by providing enough bandwidth to realize wafer-scale integration
Multiplying Moore's Law with Proximity Communication http://research.sun.com/vlsi
References
(1) D. Hopkins, et al., “Circuit Techniques to Enable 430 Gb/s/mm² Proximity Communication,” IEEE Int'l Solid-State Circuits Conference, Feb. 2007.
(2) R. Drost, et al., “Challenges in building a flat-bandwidth memory hierarchy for a large-scale computer with proximity communication,” Proc. 13th Symposium on High Performance Interconnects, pp. 13-22, Aug. 2005.
(3) K. Asanović, et al., “The Landscape of Parallel Computing Research: A View from Berkeley,” EECS Technical Report, in press, December 3, 2006.
(4) R. Drost, R. Ho, R. D. Hopkins, I. Sutherland, “Electronic Alignment for Proximity Communication,” IEEE Int'l Solid-State Circuits Conference, Feb. 2004.
(5) R. Drost, R. D. Hopkins, I. Sutherland, “Proximity Communications,” IEEE Custom Integrated Circuits Conference, pp. 469-472, Sept. 2003.
(6) J.L. Hennessy and D.A. Patterson, Computer Organization and Design, 2nd ed., Morgan Kaufmann Publishers, San Francisco, 1997.