Wide I/O DRAM Architecture Utilizing Proximity Communication
  by Qawi Harvard
  Thesis Defense, October 8th, 2009
Introduction
  Bandwidth and power consumption of dynamic random access memory stifle computer performance scaling.
  Outline:
    Background
    Status of Proximity Communication
    DRAM Market Analysis
    4 Gb DRAM Architecture
    Wide I/O DRAM Architecture Utilizing Proximity Communication
  Qawi Harvard, Oct. 8th, 2009, Thesis Defense
Background
  Memory gap: main memory does not scale with processor performance.
  Power: current consumption is rising; bandwidth increases power; voltage scaling masks the issue.
  Density: memory channel loading limits bandwidth.
  Proximity communication: proposed by Ivan Sutherland (US Patent #6,500,696); promises to reduce power and increase bandwidth.
Proximity Communication
  Capacitively coupled proximity communication:
    Top metal forms the parallel plates.
    Chip-to-chip communication takes place through the coupling capacitor.
  [Figure: Chip 1 and Chip 2 with paired transmit and receive pads, shown in both directions.]
  Ref: [1]
Proximity Communication
  Benefits:
    Increased I/O density
    Avoids on/off-chip wires
    Eases chip replacement at the system level
    Enhances system-level testability
    Enables smaller chip sizes
    Removes the need for ESD protection
  Challenges:
    Mechanical misalignment
    Applying power to the chips
    Thermal solution
  Ref: [1-5]
Proximity Communication
  Parallel plate capacitance: $C = \varepsilon_0 A / d$, with $\varepsilon_0 \approx 8.9\ \mathrm{aF/\mu m}$, which gives roughly 10 pF/mm² at a chip-to-chip separation of d = 1 µm.
  One channel: 50 fF, which corresponds to 200 signals/mm² [1].
  [Figure: I/O density per mm² (log scale, 10 to 4000) versus year (2003 to 2010), comparing proximity communication with area ball bonding.]
  Ref: [1]
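The parallel plate numbers on this slide can be checked with a short sketch. The function names and the relative permittivity parameter are hypothetical, added only for illustration. The code assumes an ideal parallel plate with no fringing fields, which the slide does not state.

```python
# Vacuum permittivity expressed in aF/um (8.854e-12 F/m); the slide rounds this to 8.9.
EPS0_AF_PER_UM = 8.854


def plate_capacitance_fF(area_um2, gap_um, eps_r=1.0):
    """Ideal parallel-plate capacitance in fF (assumption: no fringing fields)."""
    return eps_r * EPS0_AF_PER_UM * area_um2 / gap_um / 1000.0  # aF -> fF


def channels_per_mm2(density_pF_per_mm2, channel_fF=50.0):
    """How many channels of channel_fF fit in the capacitance available per mm^2."""
    return density_pF_per_mm2 * 1000.0 / channel_fF  # pF -> fF


# Capacitance of 1 mm^2 (1e6 um^2) of plate at d = 1 um, in pF.
density_pF_per_mm2 = plate_capacitance_fF(1e6, 1.0) / 1000.0

# Using the slide's rounded 10 pF/mm^2 and 50 fF per channel.
slide_channels = channels_per_mm2(10.0)

if __name__ == "__main__":
    print(f"{density_pF_per_mm2:.2f} pF/mm^2 at d = 1 um")
    print(f"{slide_channels:.0f} signals/mm^2 with the slide's rounded density")
```

With the unrounded vacuum value the density comes out slightly below the slide's 10 pF/mm², so the 200 signals/mm² figure is best read as an order-of-magnitude estimate.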
Proximity Communication
  Mechanical misalignment: six axes, multiple sources.
  [Figure: the six misalignment axes: translation (x, y), separation (z), tilt (θx, θy), and rotation (θz).]
  Ref: [5]
Proximity Communication
  Electronic sensors:
    Chip-to-chip separation sensors (0.2 µm resolution)
    Vernier scale incorporated on chip (1.0 µm resolution)
  Electrical re-alignment:
    Receive array
    Micro-transmit array
    Electronic steering circuit
  [Figure: normalized received signal (0 to 1.0) versus misalignment (0 to 120 µm).]
  [Figure: transmit chip with active and inactive transmit pads positioned over the receiver pads of the receive chip.]
  Ref: [1-5]
DRAM Market Analysis
  Revisit the memory gap:
    "Performance" becomes a relative term.
    Dichotomy in scaling.
    Why density? Why not latency?
  [Figure, two panels: relative performance (%, log scale from 1 to 100000) versus year (1980 to 2010). One panel compares DRAM density with processor IPS; the other compares processor IPS with DRAM latency.]
  Ref: [6]
DRAM Market Analysis
  Moore's Law: 41% increase in transistor count per year.
  Selling price: 36% historic decline per year.
  Putting it into perspective:
    1 Gb, 2009 → $2.00
    2 Gb, 2011 → $1.64
  Historically the price per bit has declined by 9% every quarter (1974 to 2008).
  Density or bust!
  [Figure: price per bit (millicents, log scale from 0.01 to 100,000) versus cumulative bit volume (10^12 bits, log scale), with data points labeled by year from 1974 to 2005.]
  Ref: [7-8]
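One reading that reproduces the slide's price projection is that density doubles while the price per bit falls 36% per year for two years. The sketch below works through that arithmetic. The function names are hypothetical, and the compounding model is an assumption, not something the slide states.

```python
import math


def projected_price(price_now, density_ratio, annual_decline, years):
    """Price of a denser part after `years`, assuming price per bit compounds downward."""
    return price_now * density_ratio * (1.0 - annual_decline) ** years


def doubling_time_years(annual_growth):
    """Years needed to double at a constant annual growth rate."""
    return math.log(2.0) / math.log(1.0 + annual_growth)


# 1 Gb at $2.00 in 2009 -> 2 Gb in 2011, with a 36% yearly decline in price per bit.
price_2gb_2011 = projected_price(2.00, density_ratio=2, annual_decline=0.36, years=2)

# A 41% yearly increase in transistor count is roughly a doubling every two years.
moore_doubling = doubling_time_years(0.41)

if __name__ == "__main__":
    print(f"2 Gb in 2011: ${price_2gb_2011:.2f}")
    print(f"Doubling time at 41% per year: {moore_doubling:.2f} years")
```

Under these assumptions a part with twice the bits sells for less than the 1 Gb part did two years earlier, which is the point behind "Density or bust".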
DRAM Market Analysis
  Cost:
    Low-cost manufacturing process
      3 metal layers: increased usage of each level
      Small chip size: limits I/O count
  Moore's Law:
    41% scaling per year
      Wordline cross-sectional area
      Tight metal pitch
      Contact resistance
  Physics of scaling:
    Latency must increase
  Generations and features:
    PM: simple RAS/CAS
    FPM: fast CAS access
    EDO: latched output
    SDR: programmable burst and latency; synchronous with clock; multi-bank; LVTTL interface
    DDR: data clocked on both clock edges; data strobe; SSTL 2.5 interface
    DDR2: ODT; posted CAS; OCD; SSTL 1.8 interface
    DDR3: standard low-voltage option; drive/ODT calibration; dynamic ODT; write leveling
    DDR4: faster; lower power
  Ref: [7-8,13]
DRAM Market Analysis
  Home PC: plugged into a wall.
  Mobile: battery life.
  Server: power consumption; cooling.
  Current consumption is trending up, with poor efficiency, and is bandwidth driven.
  [Figure: current consumption (mA, 0 to 700) versus data rate (MHz, 266 to 3200) for DDR, DDR2, DDR3, and DDR4 (projections).]
  [Figure: memory mezzanine tray and UltraSPARC processors (Sun SPARC Enterprise T5240 Server only).]
  Ref: [14-17]
DRAM Market Analysis
  Interface versus core:
    Interface bears the burden
    Core cycles
  DRAM pre-fetch:
    Doubles at each generation: SDR (1n), DDR (2n), DDR2 (4n), DDR3 (8n), DDR4 (16n)
  Density limited by bandwidth:
    SSTL loading in memory channel
    Increase chip count per module
  [Figure: interface and core frequency (MHz, 0 to 4000) for SDR, DDR, DDR2, DDR3, and DDR4.]
  [Figure: DIMMs or devices per channel versus data rate (MHz, 400 to 1600).]
  Ref: [14-17]
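The pre-fetch doubling explains why the core can stay slow while the interface speeds up: each core access supplies n bits per pin, so the core runs at roughly the interface data rate divided by the pre-fetch depth. A minimal sketch of that relationship follows. The table and function name are hypothetical, and the data rates used are common speed grades chosen for illustration.

```python
# Pre-fetch depth per generation, as listed on the slide (1n to 16n).
PREFETCH = {"SDR": 1, "DDR": 2, "DDR2": 4, "DDR3": 8, "DDR4": 16}


def core_frequency_mhz(data_rate_mhz, generation):
    """Approximate core (array) frequency implied by a data rate and pre-fetch depth."""
    return data_rate_mhz / PREFETCH[generation]


if __name__ == "__main__":
    for generation, data_rate in [("SDR", 133), ("DDR", 400), ("DDR2", 800),
                                  ("DDR3", 1600), ("DDR4", 2666)]:
        core = core_frequency_mhz(data_rate, generation)
        print(f"{generation:5s} {data_rate:5d} MHz interface -> {core:6.1f} MHz core")
```

Across these examples the core frequency stays within a narrow band while the interface rate grows by more than an order of magnitude, which is what "interface bears the burden" summarizes.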
4 Gb DRAM Architecture
  Possible architecture, compared to ITRS:
    2012 production release
    74 mm²
    56% array efficiency
    40 nm
  [Figure: chip floorplan with COLUMN, SPINE, and ROW regions.]
  Ref: [10,12,20-24]
4 Gb DRAM Architecture
  6F² memory cell:
    Feature size F = 40 nm
    Cell area = 0.0096 µm²
    3F pitch per wordline
    2F pitch per bitline
  256 kb array macro:
    Core array: 512 bitlines ≈ 43.6 µm; 512 wordlines ≈ 65.4 µm
    Periphery circuitry: 4 µm space allocated
  [Figure: unit cell layout showing isolation, gate, bitline, wordline, and memory capacitor; the array macro is tiled from unit cells.]
  Ref: [24]
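The cell area follows directly from the 6F² layout and the 40 nm feature size. The sketch below reproduces it, along with the macro bit count and the total cell area of a 4 Gb device. All names are hypothetical, and 4 Gb is taken as 4 × 2^30 bits, which is an assumption about the slide's convention.

```python
def cell_area_um2(feature_nm, cell_factor=6):
    """Area of one memory cell in um^2 for a cell_factor * F^2 layout."""
    f_um = feature_nm / 1000.0  # nm -> um
    return cell_factor * f_um ** 2


# 6F^2 cell at F = 40 nm.
CELL_AREA_UM2 = cell_area_um2(40)

# One array macro: 512 bitlines x 512 wordlines, one bit per crossing.
MACRO_BITS = 512 * 512

# Total area occupied by cells alone in a 4 Gb device (assumed 4 * 2**30 bits), in mm^2.
TOTAL_CELL_AREA_MM2 = 4 * 2**30 * CELL_AREA_UM2 * 1e-6

if __name__ == "__main__":
    print(f"cell area: {CELL_AREA_UM2:.4f} um^2")
    print(f"macro size: {MACRO_BITS // 1024} kb")
    print(f"total cell area for 4 Gb: {TOTAL_CELL_AREA_MM2:.2f} mm^2")
```

The total cell area is the numerator of any array efficiency figure for this device, so it sets a floor on chip size regardless of floorplan.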
4 Gb DRAM Architecture
  [Figure only.]
  Ref: [25]
4 Gb DRAM Architecture
  256 Mb array:
    32 x 32 256 kb macros
    32 × (43.6 µm + 4 µm) ≈ 1.5 mm (slide value: 1.532 mm)
  1 Gb array:
    Four 256M arrays
    Multiple implementations
  [Figure: 256 Mb array and 1 Gb array floorplans with ROW, COLUMN, GLOBAL COLUMN, and CORNER blocks; dimension labels, in slide order: 3.5 mm, 1.9 mm, 1.6 mm, 2.3 mm, 2.5 mm, 4.9 mm.]
4 Gb DRAM Architecture
  Chip size = 71.4 mm² (7.0 mm × 10.2 mm)
  Array efficiency = 57.7%
  ITRS: 74 mm², 56% array efficiency
  Wide I/O architecture:
    Moving the pads
    Centralized row
    Centralized column
  [Figure: chip floorplan of sixteen 256M arrays with row and column blocks and a 0.4 mm central spine.]
  Ref: [29]
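Array efficiency here can be read as the fraction of chip area occupied by memory cells. The sketch below recomputes the slide's figure from the 0.0096 µm² cell area and the 7.0 mm × 10.2 mm chip outline. The function name is hypothetical, and 4 Gb is assumed to mean 4 × 2^30 bits.

```python
CELL_AREA_UM2 = 0.0096   # 6F^2 cell at F = 40 nm, from the cell slide
BITS_4GB = 4 * 2**30     # assumption: binary gigabits


def array_efficiency(bits, cell_area_um2, chip_area_mm2):
    """Fraction of the chip area that is taken up by memory cells."""
    cell_area_mm2 = bits * cell_area_um2 * 1e-6  # um^2 -> mm^2
    return cell_area_mm2 / chip_area_mm2


# Proposed 4 Gb floorplan: 7.0 mm x 10.2 mm.
proposed = array_efficiency(BITS_4GB, CELL_AREA_UM2, 7.0 * 10.2)

# ITRS reference point: 74 mm^2 chip.
itrs = array_efficiency(BITS_4GB, CELL_AREA_UM2, 74.0)

if __name__ == "__main__":
    print(f"proposed: {100 * proposed:.1f}%")
    print(f"ITRS:     {100 * itrs:.1f}%")
```

Both results land on the slide's rounded values, which suggests the slide's efficiency figures were derived the same way.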
Wide I/O Chip Architecture
  64 bytes per chip
  Chip size = 67.0 mm² (6.7 mm × 10.0 mm): 6.2% chip size reduction
  Array efficiency = 61.5%: 6.6% increase in array efficiency
  Challenges:
    Routing from the edge
    Array I/O route increase: 2.3 mm → 4.6 mm
    Additional row decode
  Create eight internal banks
  [Figure: chip floorplan of 256M arrays with row decode blocks and the proximity interface along the chip edge.]
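The two percentages on this slide are relative changes against the previous 4 Gb floorplan (71.4 mm², 57.7% array efficiency). A minimal sketch of that comparison follows; the function name is hypothetical.

```python
def relative_change_percent(old, new):
    """Signed change from old to new, as a percentage of old."""
    return (new - old) / old * 100.0


# Chip area: 71.4 mm^2 (previous floorplan) -> 67.0 mm^2 (wide I/O).
area_change = relative_change_percent(71.4, 67.0)

# Array efficiency: 57.7% -> 61.5%.
efficiency_change = relative_change_percent(57.7, 61.5)

if __name__ == "__main__":
    print(f"chip size change:        {area_change:+.1f}%")
    print(f"array efficiency change: {efficiency_change:+.1f}%")
```

Note that the efficiency gain is relative (61.5 / 57.7), not a difference in percentage points.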
4 Gb DRAM Architecture
  1 mm allocation for proximity channel
  Chip size = 69.68 mm² (6.7 mm × 10.4 mm)
  Array efficiency = 59.2%
  Buffers at the center:
    Increase global I/O metal usage
    Array I/O routing reduced to 2.3 mm
  Architecture NOT efficient for proximity communication:
    6.7 mm versus 10.4 mm
    Buffers required
    Large metal usage
  [Figure: chip floorplan of eight 512M banks with ROW blocks and the proximity interface; additional dimension labels: 3.2 mm, 2.3 mm, 7.0 mm.]