Wide I/O DRAM Architecture Utilizing Proximity Communication by Qawi Harvard, Thesis Defense, October 8th, 2009

  1. Wide I/O DRAM Architecture Utilizing Proximity Communication
     Qawi Harvard, Thesis Defense, October 8th, 2009

  2. Introduction
     Bandwidth and power consumption of dynamic random access memory (DRAM) stifle computer performance scaling.
     - Background
     - Status of Proximity Communication
     - DRAM Market Analysis
     - 4 Gb DRAM Architecture
     - Wide I/O DRAM Architecture Utilizing Proximity Communication

  3. Background
     - Memory gap
       - Main memory does not scale with processor performance
     - Power
       - Current consumption is rising
       - Bandwidth increases power
       - Voltage scaling masks the issue
     - Density
       - Memory channel loading limits bandwidth
     - Proximity Communication
       - Proposed by Ivan Sutherland (US Patent #6,500,696)
       - Promises to reduce power and increase bandwidth

  4. Proximity Communication
     - Capacitively coupled Proximity Communication
       - Top metal forms the parallel plates
       - Chip-to-chip communication through the coupling capacitor
     [Figure: two configurations, Chip 1 transmitting to Chip 2 and Chip 2 transmitting to Chip 1]
     Ref: [1]

  5. Proximity Communication
     - Benefits
       - Increased I/O density
       - Avoids on/off-chip wires
       - Eases chip replacement at the system level
       - Enhances system-level testability
       - Enables smaller chip sizes
       - Removes the need for ESD protection
     - Challenges
       - Mechanical misalignment
       - Applying power to the chips
       - Thermal solution
     Ref: [1-5]

  6. Proximity Communication
     - Parallel plate capacitance: $C = \varepsilon_0 A / d$, with $\varepsilon_0 \approx 8.9\ \text{aF}/\mu\text{m}$
     - Chip-to-chip separation $d = 1\ \mu\text{m}$ gives $C/A \approx 10\ \text{pF/mm}^2$
     - One channel ≈ 50 fF, so about 200 signals/mm² [1]
     [Figure: I/O density per mm² (log scale, 10 to 4000) versus year, 2003 to 2010, for Proximity Communication and area ball bonding]
     Ref: [1]
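
     The numbers on this slide follow from the parallel-plate formula. A minimal Python sketch, using only the slide's values (ε0 ≈ 8.9 aF/µm, d = 1 µm, about 50 fF per channel); the function names are illustrative, not from the thesis:

```python
# Parallel-plate estimate for a capacitively coupled Proximity Communication channel.
# Inputs are the slide's values: eps0 ~ 8.9 aF/um, d = 1 um, ~50 fF per channel.

EPS0_AF_PER_UM = 8.9        # permittivity of free space in aF/um, as quoted on the slide
UM2_PER_MM2 = 1_000_000     # 1 mm^2 = 10^6 um^2
AF_PER_PF = 1_000_000       # 1 pF = 10^6 aF


def cap_density_pf_per_mm2(d_um: float, eps_af_per_um: float = EPS0_AF_PER_UM) -> float:
    """C/A = eps/d, converted from aF/um^2 to pF/mm^2."""
    af_per_um2 = eps_af_per_um / d_um
    return af_per_um2 * UM2_PER_MM2 / AF_PER_PF


def signals_per_mm2(channel_ff: float, density_pf_per_mm2: float) -> float:
    """Channels per mm^2 if each channel needs channel_ff of plate capacitance."""
    area_mm2 = (channel_ff / 1000.0) / density_pf_per_mm2  # fF -> pF, then divide by C/A
    return 1.0 / area_mm2


exact = cap_density_pf_per_mm2(d_um=1.0)  # ~8.9 pF/mm^2
print(f"C/A at d = 1 um: {exact:.1f} pF/mm^2")
print(f"signals/mm^2 (unrounded C/A): {signals_per_mm2(50, exact):.0f}")
print(f"signals/mm^2 (C/A = 10 pF/mm^2): {signals_per_mm2(50, 10.0):.0f}")
```

     With the unrounded 8.9 pF/mm² the same 50 fF channel gives about 178 signals/mm²; the slide's 200 signals/mm² follows from rounding C/A up to 10 pF/mm².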

  7. Proximity Communication
     - Mechanical misalignment
       - Six axes
       - Multiple sources
     [Figure: the six misalignment axes: separation (z), tilt (θx, θy), translation (x, y), and rotation (θz)]
     Ref: [5]

  8. Proximity Communication
     - Electronic sensors
       - Chip-to-chip separation sensors (0.2 µm resolution)
       - Vernier scale incorporated on chip (1.0 µm resolution)
     - Electrical re-alignment
       - Receive array
       - Micro-transmit array
       - Electronic steering circuit
     [Figure: normalized received signal (0 to 1.0) versus misalignment (0 to 120 µm); diagram of a transmit chip whose micro-transmit pads are switched active (1) or inactive (0) to line up with the receiver pads on the receive chip]
     Ref: [1-5]
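
     The steering idea can be sketched in one dimension. This is a hypothetical Python model: the pad pitch, the receiver pad width, and the `steer` function are assumptions for illustration, not the circuit described in the thesis. It enables only the micro-transmit pads that land over the receiver pad once the measured misalignment is applied:

```python
# Hypothetical 1-D model of electronic re-alignment. The transmit chip carries
# micro-pads at a fine pitch; a steering circuit enables only those that sit over
# the receiver pad after the measured misalignment (e.g. from the on-chip vernier).
# All dimensions are illustrative, not taken from the thesis.

from typing import List


def steer(num_micro_pads: int, micro_pitch_um: float,
          rx_center_um: float, rx_width_um: float,
          misalignment_um: float) -> List[int]:
    """Return a 0/1 enable vector for the micro-transmit pads.

    Pad i nominally sits at i * micro_pitch_um and is displaced by misalignment_um
    relative to the receive chip. It is enabled when its displaced center falls
    inside the receiver pad [center - width/2, center + width/2].
    """
    lo = rx_center_um - rx_width_um / 2
    hi = rx_center_um + rx_width_um / 2
    enables = []
    for i in range(num_micro_pads):
        x = i * micro_pitch_um + misalignment_um
        enables.append(1 if lo <= x <= hi else 0)
    return enables


# Receiver pad 40 um wide centered at 50 um; micro-pads every 10 um.
print(steer(12, 10.0, rx_center_um=50.0, rx_width_um=40.0, misalignment_um=0.0))
print(steer(12, 10.0, rx_center_um=50.0, rx_width_um=40.0, misalignment_um=-20.0))
```

     When the chips slip by 20 µm the active window shifts by two micro-pads, mirroring the active/inactive transmit pads in the slide's diagram; a full design would also cover the second in-plane axis, and its correction granularity is set by the micro-pad pitch.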

  9. DRAM Market Analysis
     - Revisit the memory gap
       - "Performance" becomes a relative term
       - Dichotomy in scaling
     - Why density?
     - Why not latency?
     [Figure: two panels of relative performance (%, log scale, 1 to 100,000) versus year, 1980 to 2010: DRAM density compared with processor IPS, and processor IPS compared with DRAM latency]
     Ref: [6]

  10. DRAM Market Analysis
      - Moore's Law
        - 41% increase in transistor count per year
      - Selling price
        - 36% historic decline per year
      - Putting it into perspective
        - 1 Gb in 2009 → $2.00
        - 2 Gb in 2011 → $1.64
      - Historically, the price per bit has declined by 9% every quarter (1974 to 2008).
      - Density or bust!
      [Figure: price per bit (millicents, log scale, 0.01 to 100,000) versus cumulative bit volume (10¹²), with yearly points from 1974 to 2005]
      Ref: [7-8]
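
      The $1.64 projection is consistent with compounding the slide's two trends over two years. A minimal sketch, assuming the 36% per year price-per-bit decline holds from 2009 to 2011 (`projected_price` is a hypothetical helper):

```python
# Projecting DRAM part price from the slide's trends.
# Assumption: the 36%/year decline in price per bit continues from 2009 to 2011.

PRICE_PER_BIT_DECLINE = 0.36   # historic decline in selling price per bit, per year
DENSITY_GROWTH = 0.41          # transistor count growth per year (Moore's law, per the slide)


def projected_price(base_price: float, base_gbit: float,
                    new_gbit: float, years: float) -> float:
    """Price of a new_gbit part after `years`, starting from base_price for base_gbit."""
    price_per_gbit = base_price / base_gbit
    price_per_gbit *= (1 - PRICE_PER_BIT_DECLINE) ** years  # compound the per-bit decline
    return price_per_gbit * new_gbit


p = projected_price(2.00, 1, 2, 2)  # 1 Gb at $2.00 in 2009 -> 2 Gb in 2011
print(f"2 Gb in 2011: ${p:.2f}")
print(f"density growth over 2 years: {(1 + DENSITY_GROWTH) ** 2:.2f}x")
```

      Compounding 41% per year for two years gives about 1.99×, so the next density step (2 Gb) arrives roughly two years after 1 Gb. Over the same period the per-bit price falls to 0.64² ≈ 41% of its 2009 value, so the doubled-density part costs 2 × 0.41 × $2.00 ≈ $1.64.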

  11. DRAM Market Analysis
      - Cost
        - Low-cost manufacturing process
          - 3 metal layers: increased usage of each level
          - Small chip size: limits I/O count
      - Moore's Law
        - 41% scaling per year
          - Wordline cross-sectional area
          - Tight metal pitch
          - Contact resistance
      - Physics of scaling
        - Latency must increase
      DRAM generations and features:
        PM     Simple RAS/CAS
        FPM    Fast CAS access
        EDO    Latched output
        SDR    Synchronous with clock; programmable burst & latency; multi-bank; LVTTL interface
        DDR    Data clocked on both clock edges; data strobe; SSTL 2.5 interface
        DDR2   ODT; posted CAS; OCD; SSTL 1.8 interface
        DDR3   Standard low-voltage option; drive/ODT calibration; dynamic ODT; write leveling
        DDR4   Faster; lower power
      Ref: [7-8,13]

  12. DRAM Market Analysis
      - Home PC
        - Plugged into a wall
      - Mobile
        - Battery life
      - Server
        - Power consumption
        - Cooling
      - Trending up
      - Poor efficiency
      - Bandwidth driven
      [Figure: current consumption (mA, 0 to 700) versus data rate (266 to 3200 MHz) for DDR, DDR2, DDR3, and DDR4 projections]
      [Figure: memory mezzanine tray with UltraSPARC processors (Sun SPARC Enterprise T5240 server only)]
      Ref: [14-17]

  13. DRAM Market Analysis
      - Interface versus core
        - Interface bears the burden
        - Core cycles
      - DRAM prefetch: SDR (1n), DDR (2n), DDR2 (4n), DDR3 (8n), DDR4 (16n)
        - Doubles at each generation
      - Density limited by bandwidth
        - SSTL loading in memory
        - Increase chip count per module
      [Figure: interface and core frequency (MHz) for SDR through DDR4; interface frequency rises to 2666 MHz while core frequency stays between 67 and 200 MHz]
      [Figure: DIMMs or devices per channel versus data rate (400 to 1600 MHz)]
      Ref: [14-17]
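
      Prefetch is what lets the interface outrun the core: the interface data rate equals the core frequency multiplied by the prefetch depth. A small Python sketch; the prefetch depths come from the slide, while the speed grades passed in are example values chosen for illustration:

```python
# Interface data rate = core (array) frequency x prefetch depth.
# Prefetch depths are from the slide (SDR 1n ... DDR4 16n); the data rates below
# are example speed grades, so the resulting core frequencies are illustrative.

PREFETCH = {"SDR": 1, "DDR": 2, "DDR2": 4, "DDR3": 8, "DDR4": 16}


def core_mhz(data_rate_mhz: float, generation: str) -> float:
    """Core frequency needed to sustain data_rate_mhz at the generation's prefetch."""
    return data_rate_mhz / PREFETCH[generation]


for gen, rate in [("SDR", 133), ("DDR", 400), ("DDR2", 800), ("DDR3", 1600), ("DDR4", 3200)]:
    print(f"{gen:5s} {rate:5d} MHz data rate -> core {core_mhz(rate, gen):6.1f} MHz")
```

      Each DDR grade here needs only a 200 MHz core (133 MHz for SDR-133), which matches the slide's point that the interface, not the core, bears the burden of higher data rates.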

  14. 4 Gb DRAM Architecture
      - Possible architecture
        - 40 nm feature size
      - Compared to ITRS 2012 production release
        - 74 mm²
        - 56% array efficiency
      [Figure: floorplan with COLUMN, ROW, and SPINE regions]
      Ref: [10,12,20-24]

  15. 4 Gb DRAM Architecture
      - 6F² memory cell
        - Feature size F = 40 nm
        - Cell area = 6F² = 0.0096 µm²
        - 3F pitch per wordline
        - 2F pitch per bitline
      - 256 kb array macro
        - Core array
          - 512 bitlines ≈ 43.6 µm
          - 512 wordlines ≈ 65.4 µm
        - Periphery circuitry
          - 4 µm space allocated
      [Figure: unit-cell layout labeled with isolation, gate, bitline, wordline, and memory capacitor, and a tiling of unit cells]
      Ref: [24]
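
      The cell and macro figures can be checked directly. A minimal sketch with F = 40 nm as on the slide:

```python
# 6F^2 cell arithmetic for the 256 kb array macro (F = 40 nm, from the slide).

F_UM = 0.040                      # feature size in um

cell_area_um2 = 6 * F_UM ** 2     # 6F^2 cell
bits_per_macro = 512 * 512        # 512 bitlines x 512 wordlines, one cell per crossing

print(f"cell area: {cell_area_um2:.4f} um^2")           # 0.0096 um^2
print(f"macro capacity: {bits_per_macro // 1024} kb")  # 256 kb
```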

  16. 4 Gb DRAM Architecture
      Ref: [25]

  17. 4 Gb DRAM Architecture
      - 256 Mb array
        - 32 × 32 256 kb macros
        - $32 \times (43.6\ \mu\text{m} + 4\ \mu\text{m}) \approx 1.523\ \text{mm}$
      - 1 Gb array
        - Multiple implementations
      [Figure: floorplans of a 256 Mb array and of a 1 Gb array built from four 256M arrays with ROW, COLUMN, GLOBAL, and CORNER blocks; labeled dimensions include 1.6, 1.9, 2.3, 2.5, 3.5, and 4.9 mm]
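
      Stacking the macros gives the 256 Mb array. A sketch using the slide's macro dimension of 43.6 µm plus the 4 µm periphery allocation along one axis; the variable names are illustrative:

```python
# Building the 256 Mb array from 256 kb macros, per the slide: 32 x 32 macros,
# each 43.6 um along this axis plus 4 um allocated to periphery circuitry.

MACRO_KB = 256
MACROS_PER_SIDE = 32
MACRO_UM = 43.6
PERIPHERY_UM = 4.0

array_mb = MACROS_PER_SIDE ** 2 * MACRO_KB // 1024              # total capacity in Mb
edge_mm = MACROS_PER_SIDE * (MACRO_UM + PERIPHERY_UM) / 1000    # macro pitch x count

print(f"array capacity: {array_mb} Mb")                       # 256 Mb
print(f"array edge along this axis: {edge_mm:.3f} mm")
print(f"256 Mb arrays per 4 Gb die: {4 * 1024 // array_mb}")  # 16
```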

  18. 4 Gb DRAM Architecture
      - Chip size = 71.4 mm²
        - Array efficiency = 57.7%
      - ITRS
        - 74 mm²
        - 56% array efficiency
      - Wide I/O architecture
        - Moving the pads
        - Centralized row
        - Centralized column
      [Figure: 7.0 mm × 10.2 mm floorplan of sixteen 256M arrays with row and column decoders around a 0.4 mm central spine]
      Ref: [29]
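
      The quoted array efficiency can be reproduced if array area is taken to be the total 6F² cell area. That interpretation is an assumption, but 2³² cells × 0.0096 µm² ≈ 41.2 mm² gives 57.7% for the 7.0 mm × 10.2 mm die:

```python
# Array efficiency as total cell area / chip area.
# Assumption: "array area" means 2^32 cells x 6F^2 (0.0096 um^2 each).

CELL_AREA_UM2 = 0.0096
BITS = 4 * 2 ** 30                              # 4 Gb
cell_area_mm2 = BITS * CELL_AREA_UM2 / 1e6      # um^2 -> mm^2


def array_efficiency(width_mm: float, height_mm: float) -> float:
    """Fraction of the die occupied by memory cells."""
    return cell_area_mm2 / (width_mm * height_mm)


print(f"total cell area: {cell_area_mm2:.2f} mm^2")
print(f"7.0 x 10.2 mm die: {7.0 * 10.2:.1f} mm^2, efficiency {array_efficiency(7.0, 10.2):.1%}")
```

      The same ratio also matches the 61.5% and 59.2% efficiencies quoted for the 67.0 mm² and 69.68 mm² floorplans.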

  19. Wide I/O Chip Architecture
      - 64 bytes per chip
      - Chip size = 67.0 mm²
        - 6.2% chip size reduction
      - Array efficiency = 61.5%
        - 6.6% increase in array efficiency
      - Challenges
        - Routing from the edge
        - Array I/O route increase: 2.3 mm → 4.6 mm
        - Additional row decode
        - Create eight internal banks
      [Figure: 6.7 mm × 10.0 mm floorplan of sixteen 256M arrays with row decoders and the proximity interface along one edge]
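
      Assuming array efficiency is total 6F² cell area divided by chip area, the wide I/O floorplan's improvements follow from the die dimensions alone. A minimal sketch comparing 6.7 mm × 10.0 mm with the 7.0 mm × 10.2 mm baseline:

```python
# Wide I/O floorplan (6.7 mm x 10.0 mm) versus baseline (7.0 mm x 10.2 mm).
# Assumption: array efficiency = total 6F^2 cell area (2^32 x 0.0096 um^2) / chip area.

CELL_AREA_MM2 = 4 * 2 ** 30 * 0.0096 / 1e6

baseline = 7.0 * 10.2   # mm^2
wide_io = 6.7 * 10.0    # mm^2

size_reduction = 1 - wide_io / baseline
eff_base = CELL_AREA_MM2 / baseline
eff_wide = CELL_AREA_MM2 / wide_io

print(f"chip size: {baseline:.1f} -> {wide_io:.1f} mm^2 ({size_reduction:.1%} smaller)")
print(f"array efficiency: {eff_base:.1%} -> {eff_wide:.1%} "
      f"({eff_wide / eff_base - 1:.1%} relative increase)")
print(f"I/O width: {64 * 8} bits per chip")  # 64 bytes per chip
```

      The 6.6% figure is a relative increase (61.5% / 57.7%), not a difference in percentage points.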

  20. 4 Gb DRAM Architecture
      - 1 mm allocation for the proximity channel
      - Chip size = 69.68 mm²
      - Array efficiency = 59.2%
      - Buffers at the center
        - Increase global I/O metal usage
        - Array I/O routing reduced to 2.3 mm
      - Architecture NOT efficient for Proximity Communication
        - 6.7 mm versus 10.4 mm
        - Buffers required
        - Large metal usage
      [Figure: 6.7 mm × 10.4 mm floorplan of eight 512M banks with row decoders and the proximity interface along one edge; labeled dimensions include 2.3, 3.2, and 7.0 mm]
