

  1. Intelligent RAM (IRAM)
     Richard Fromm, David Patterson, Krste Asanovic, Aaron Brown, Jason Golbus, Ben Gribstad, Kimberly Keeton, Christoforos Kozyrakis, David Martin, Stylianos Perissakis, Randi Thomas, Noah Treuhaft, Katherine Yelick, Tom Anderson, John Wawrzynek
     rfromm@cs.berkeley.edu
     http://iram.cs.berkeley.edu/
     EECS, University of California, Berkeley, CA 94720-1776 USA
     Richard Fromm, IRAM tutorial, ASP-DAC ’98, February 10, 1998

  2. IRAM Vision Statement
     Microprocessor & DRAM on a single chip:
     • on-chip memory latency 5-10X, bandwidth 50-100X
     • improve energy efficiency 2X-4X (no off-chip bus)
     • serial I/O 5-10X vs. buses
     • smaller board area/volume
     • adjustable memory size/width
     [Figure: conventional system, with processor, caches, and L2$ built in a logic fab and DRAM built in a separate DRAM fab, connected by buses; versus IRAM, with processor and DRAM together on one chip built in a DRAM fab.]

  3. Outline
     • Today’s Situation: Microprocessor & DRAM
     • IRAM Opportunities
     • Initial Explorations
     • Energy Efficiency
     • Directions for New Architectures
     • Vector Processing
     • Serial I/O
     • IRAM Potential, Challenges, & Industrial Impact

  4. Processor-DRAM Gap (latency)
     [Figure: relative performance (log scale, 1 to 1000) vs. time, 1980-2000. Microprocessor performance ("Moore’s Law") improves 60%/yr; DRAM improves 7%/yr; the processor-memory performance gap grows 50%/year.]
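Because both curves compound yearly, the gap grows by the ratio of the two growth factors, not their difference: 1.60 / 1.07 ≈ 1.50, which is where the "grows 50% / year" label comes from. A minimal sketch of that arithmetic, assuming the 60%/yr and 7%/yr rates read off the chart are exact; the function names are illustrative only:

```python
# Annual improvement rates read off the chart above (assumed exact for this sketch).
PROC_GROWTH = 0.60  # microprocessor performance, per year
DRAM_GROWTH = 0.07  # DRAM, per year


def gap_growth(proc: float, dram: float) -> float:
    """Yearly growth rate of the processor/DRAM performance ratio.

    Growth factors compound, so the gap grows by (1 + proc) / (1 + dram) per year.
    """
    return (1 + proc) / (1 + dram) - 1


def gap_after(years: int, proc: float = PROC_GROWTH, dram: float = DRAM_GROWTH) -> float:
    """Size of the gap after `years`, normalized to 1.0 at year zero."""
    return ((1 + proc) / (1 + dram)) ** years


if __name__ == "__main__":
    print(f"gap grows {gap_growth(PROC_GROWTH, DRAM_GROWTH):.1%} per year")  # about 49.5%
    print(f"gap after 20 years: {gap_after(20):,.0f}x")
```

Subtracting the rates (60% - 7% = 53%) would overstate the trend slightly; dividing the factors gives the compounding rate the chart plots on its log axis.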

  5. Processor-Memory Performance Gap “Tax”
     Share of each processor devoted to caches:

     Processor          % Area (~cost)   % Transistors (~power)
     Alpha 21164        37%              77%
     StrongARM SA110    61%              94%
     Pentium Pro        64%              88%

     • Pentium Pro: 2 dies per package (Proc/I$/D$ + L2$)
     • Caches have no inherent value; they only try to close the performance gap

  6. Today’s Situation: Microprocessor
     • Rely on caches to bridge gap
     • Microprocessor-DRAM performance gap
       ◦ time of a full cache miss in instructions executed:
           1st Alpha (7000):   340 ns / 5.0 ns =  68 clks × 2 = 136 instructions
           2nd Alpha (8400):   266 ns / 3.3 ns =  80 clks × 4 = 320 instructions
           3rd Alpha (t.b.d.): 180 ns / 1.7 ns = 108 clks × 6 = 648 instructions
       ◦ ½X latency × 3X clock rate × 3X instructions/clock ⇒ ~5X
     • Power limits performance (battery, cooling)
     • Shrinking number of desktop ISAs?
       ◦ No more PA-RISC; questionable future for MIPS and Alpha
       ◦ Future dominated by IA-64?
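The miss cost in the table is miss latency divided by clock period (clocks spent waiting), times the number of instructions the processor could issue per clock. A small sketch of that calculation; the function name is hypothetical, and because the slide rounds the clock counts, recomputing the 2nd and 3rd Alpha rows gives slightly different values (about 322 and 635 instructions):

```python
def miss_cost_instructions(latency_ns: float, cycle_ns: float, issue_width: int) -> float:
    """Instruction slots lost during one full cache miss.

    latency_ns / cycle_ns is the miss time in clock cycles; multiplying by the
    number of instructions issued per clock gives the instructions that could
    have executed in that time.
    """
    clocks = latency_ns / cycle_ns
    return clocks * issue_width


# 1st Alpha figures from the slide: 340 ns miss, 5.0 ns clock, 2 instructions/clock.
first = miss_cost_instructions(340, 5.0, 2)  # 68 clocks x 2 = 136 instructions

# From the 1st to the 3rd Alpha: about half the latency, 3X the clock rate,
# and 3X the instructions per clock, so the miss cost grows about 4.5X (~5X).
scaling = 0.5 * 3 * 3
```

The table's own figures agree: 648 / 136 ≈ 4.8, consistent with the ~5X estimate.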

  7. Today’s Situation: DRAM
     [Figure: DRAM revenue per quarter, in millions of dollars ($0 to $20,000), 1Q94 through 1Q97; annotated values $16B and $7B.]
     • Intel: 30%/year since 1987; 1/3 income profit

  8. Today’s Situation: DRAM
     • Commodity, second-source industry ⇒ high volume, low profit, conservative
     • Little organizational innovation (vs. processors) in 20 years: page mode, EDO, Synch DRAM
     • DRAM industry at a crossroads:
       ◦ Fewer DRAMs per computer over time
       ◦ Growth bits/chip DRAM: 50%-60%/yr
       ◦ Nathan Myhrvold (Microsoft): mature software growth (33%/yr for NT), growth MB/$ of DRAM (25%-30%/yr)
       ◦ Starting to question buying larger DRAMs?

  9. Fewer DRAMs/System over Time
     [Figure (from Pete MacWilliams, Intel): number of DRAM chips per system by DRAM generation (’86 1 Mb, ’89 4 Mb, ’92 16 Mb, ’96 64 Mb, ’99 256 Mb, ’02 1 Gb) and minimum memory size (4 MB to 256 MB). Memory per DRAM grows @ 60%/year while memory per system grows @ 25%-30%/year, so the chip count per system falls from 32 (4 MB of 1 Mb parts) to as few as 1 or 2.]
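Each entry in the chart is simply the minimum memory size divided by the capacity of one DRAM part. A sketch of that calculation, assuming no ECC chips and ignoring bus-width constraints (a real system may need extra chips just to fill the memory bus); the function name is hypothetical:

```python
def chips_per_system(memory_mb: int, chip_mbit: int) -> int:
    """Minimum number of DRAM chips holding `memory_mb` megabytes,
    using parts of `chip_mbit` megabits each (no ECC, no bus-width rule)."""
    memory_mbit = memory_mb * 8  # 8 bits per byte
    # Ceiling division: a partially used chip still costs a whole chip.
    return -(-memory_mbit // chip_mbit)


# Examples matching entries in the chart:
assert chips_per_system(4, 1) == 32       # 4 MB of 1 Mb parts
assert chips_per_system(32, 256) == 1     # 32 MB of 256 Mb parts
assert chips_per_system(256, 1024) == 2   # 256 MB of 1 Gb parts
```

Since chip capacity grows about 60%/year and system memory only 25%-30%/year, the quotient shrinks every generation, which is the slide's point.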

  10. Multiple Motivations for IRAM
     • Some apps: energy, board area, memory size
     • Gap means performance challenge is memory
     • DRAM companies at crossroads?
       ◦ Dramatic price drop since January 1996
       ◦ Dwindling interest in future DRAM?
       ◦ Too much memory per chip?
     • Alternatives to IRAM: fix capacity but shrink DRAM die, packaging breakthrough, more out-of-order CPU, ...

  11. DRAM Density
     • Density of DRAM (in DRAM process) is much higher than SRAM (in logic process)
     • Pseudo-3-dimensional trench or stacked capacitors give very small DRAM cell sizes

                                StrongARM          64 Mbit DRAM       Ratio
                                (0.35 µm logic)    (0.40 µm DRAM)
     Transistors/cell           6                  1                  6:1
     Cell size (µm²)            26.41              1.62               16:1
     Cell size (λ²)             216                10.1               21:1
     Density (Kbits/mm²)        10.1               390                1:39
     Density (Kbits/10⁶λ²)      1.23               62.3               1:51
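The λ-normalized rows let the two processes be compared independently of feature size. Taking λ to be the drawn feature size (0.35 µm and 0.40 µm) is an assumption, but it reproduces the table: cell area in λ² is µm² divided by λ², and since 10⁶ λ² equals λ² expressed in mm² (λ in µm), density per 10⁶ λ² is density per mm² times λ². A minimal check using the table's numbers; the dictionary layout and function names are illustrative:

```python
# Figures from the table above. ASSUMPTION: lambda = drawn feature size.
LOGIC = {"cell_um2": 26.41, "feature_um": 0.35, "kbit_per_mm2": 10.1}
DRAM = {"cell_um2": 1.62, "feature_um": 0.40, "kbit_per_mm2": 390.0}


def cell_lambda2(p: dict) -> float:
    """Cell area in lambda^2: um^2 divided by (feature size in um)^2."""
    return p["cell_um2"] / p["feature_um"] ** 2


def kbit_per_mlambda2(p: dict) -> float:
    """Density per 10^6 lambda^2.

    10^6 * (f um)^2 = f^2 * 10^6 um^2 = f^2 mm^2, so multiply Kbits/mm^2 by f^2.
    """
    return p["kbit_per_mm2"] * p["feature_um"] ** 2


for name, proc in (("logic SRAM", LOGIC), ("DRAM", DRAM)):
    print(f"{name}: {cell_lambda2(proc):.1f} lambda^2/cell, "
          f"{kbit_per_mlambda2(proc):.2f} Kbits per 10^6 lambda^2")
```

The same conversion shows why the density row must be per mm² rather than per µm²: a 26.41 µm² cell cannot hold 10.1 Kbits in a single µm².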

  12. Potential IRAM Latency: 5-10X
     • No parallel DRAMs, memory controller, bus to turn around, SIMM module, pins…
     • New focus: latency-oriented DRAM?
       ◦ Dominant delay = RC of the word lines
       ◦ Keep wire length short & block sizes small?
     • 10-30 ns for 64b-256b IRAM “RAS/CAS”?
     • AlphaStation 600: 180 ns = 128b, 270 ns = 512b
       Next generation (21264): 180 ns for 512b?
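The case for short word lines follows from the delay of a distributed RC wire: both its resistance and its capacitance grow linearly with length, so the delay grows with the square of the length. A sketch using the standard Elmore approximation (R·C/2 for a uniform line driven from one end); the per-micron resistance and capacitance values are hypothetical placeholders, and only the ratio between the two lengths matters:

```python
def distributed_rc_delay(r_ohm_per_um: float, c_f_per_um: float, length_um: float) -> float:
    """Elmore delay (seconds) of a uniform RC line driven from one end.

    Total resistance R = r * L and total capacitance C = c * L; the Elmore
    delay of a distributed line is R * C / 2, so it scales with L squared.
    """
    total_r = r_ohm_per_um * length_um
    total_c = c_f_per_um * length_um
    return 0.5 * total_r * total_c


# HYPOTHETICAL per-unit-length values, chosen only to make the example concrete.
R_PER_UM = 5.0       # ohms per micron
C_PER_UM = 0.2e-15   # farads per micron (0.2 fF/um)

long_line = distributed_rc_delay(R_PER_UM, C_PER_UM, 1000.0)
short_line = distributed_rc_delay(R_PER_UM, C_PER_UM, 500.0)
speedup = long_line / short_line  # about 4: halving the length quarters the delay
```

Splitting an array into smaller blocks therefore cuts word-line delay faster than it adds block-selection overhead, at least until the added decoding and wiring dominate.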

  13. Potential IRAM Bandwidth: 50-100X
     • 1024 1 Mbit modules (1 Gb), each 256b wide
       ◦ 20% @ 20 ns RAS/CAS = 320 GBytes/sec
     • If crossbar switch delivers 1/3 to 2/3 of BW of 20% of modules ⇒ 100-200 GBytes/sec
     • FYI: AlphaServer 8400 = 1.2 GBytes/sec
       ◦ 75 MHz, 256-bit memory bus, 4 banks
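The 320 GBytes/sec figure comes from the number of active modules times the bits each delivers per access, divided by the access time. A sketch reproducing the slide's arithmetic with the slide's own parameters; the constant and function names are illustrative. The exact result is about 328 GB/s (decimal), which the slide rounds to 320, and the crossbar range comes out near 110-220 GB/s:

```python
MODULES = 1024          # 1 Mbit modules -> 1 Gbit total
WIDTH_BITS = 256        # bits delivered per module access
ACTIVE_FRACTION = 0.20  # share of modules accessed at once
CYCLE_S = 20e-9         # 20 ns RAS/CAS access time


def peak_bandwidth_bytes_per_s(modules: int = MODULES,
                               width_bits: int = WIDTH_BITS,
                               active: float = ACTIVE_FRACTION,
                               cycle_s: float = CYCLE_S) -> float:
    """Aggregate bandwidth of the active modules, in bytes per second."""
    bytes_per_cycle = modules * active * width_bits / 8
    return bytes_per_cycle / cycle_s


peak = peak_bandwidth_bytes_per_s()   # about 3.28e11 B/s
crossbar = (peak / 3, 2 * peak / 3)   # crossbar delivers 1/3 to 2/3 of that
print(f"peak {peak / 1e9:.0f} GB/s, crossbar "
      f"{crossbar[0] / 1e9:.0f}-{crossbar[1] / 1e9:.0f} GB/s")
```

Against the AlphaServer 8400's 1.2 GBytes/sec, the 100-200 GBytes/sec range is roughly two orders of magnitude, in line with the slide's 50-100X headline.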

  14. Potential Energy Efficiency: 2X-4X
     • Case study of StrongARM memory hierarchy vs. IRAM memory hierarchy (more later...)
       ◦ cell size advantages ⇒ much larger cache ⇒ fewer off-chip references ⇒ up to 2X-4X energy efficiency for memory
       ◦ less energy per bit access for DRAM

  15. Potential Innovation in Standard DRAM Interfaces
     • Optimizations when chip is a system vs. chip is a memory component
       ◦ Lower power via on-demand memory module activation?
       ◦ Improve yield with variable refresh rate?
       ◦ “Map out” bad memory modules to improve yield?
       ◦ Reduce test cases/testing time during manufacturing?
     • IRAM advantages even greater if innovate inside DRAM memory interface? (ongoing work...)

  16. “Vanilla” Approach to IRAM
     • Estimate performance of IRAM implementations of conventional architectures
     • Multiple studies:
       ◦ “Intelligent RAM (IRAM): Chips that remember and compute”, 1997 Int’l. Solid-State Circuits Conf., Feb. 1997.
       ◦ “Evaluation of Existing Architectures in IRAM Systems”, Workshop on Mixing Logic and DRAM, 24th Int’l. Symp. on Computer Architecture, June 1997.
       ◦ “The Energy Efficiency of IRAM Architectures”, 24th Int’l. Symp. on Computer Architecture, June 1997.

  17. “Vanilla” IRAM - Performance Conclusions
     • IRAM systems with existing architectures provide only moderate performance benefits
     • High bandwidth / low latency used to speed up memory accesses but not computation
     • Reason: existing architectures developed under the assumption of a low-bandwidth memory system
       ◦ Need something better than “build a bigger cache”
       ◦ Important to investigate alternative architectures that better utilize high bandwidth and low latency of IRAM

  18. IRAM Energy Advantages
     • IRAM reduces the frequency of accesses to lower levels of the memory hierarchy, which require more energy
     • IRAM reduces energy to access various levels of the memory hierarchy
     • Consequently, IRAM reduces the average energy per instruction:

       $$\text{Energy per memory access} = AE_{L1} + MR_{L1} \times \left( AE_{L2} + MR_{L2} \times AE_{\text{off-chip}} \right)$$

       where $AE$ = access energy and $MR$ = miss rate ($MR_{L2}$ is the fraction of L2 accesses that miss)

  19. Energy to Access Memory by Level of Memory Hierarchy
     • For 1 access, measured in nJoules:

       Level                              Conventional   IRAM
       on-chip L1$ (SRAM)                 0.5            0.5
       on-chip L2$ (SRAM vs. DRAM)        2.4            1.6
       L1 to Memory (off- vs. on-chip)    98.5           4.6
       L2 to Memory (off-chip)            316.0          (n.a.)

     • Based on Digital StrongARM, 0.35 µm technology
     • Calculated energy efficiency (nanoJoules per instruction)
     • See “The Energy Efficiency of IRAM Architectures,” 24th Int’l. Symp. on Computer Architecture, June 1997
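The per-access energies above plug directly into the energy-per-access formula from the previous slide. A minimal sketch; the miss rates are hypothetical placeholders, not values from the study, and using the same L1 miss rate for every configuration is a simplifying assumption (the slides argue IRAM's larger on-chip memory would also lower miss rates). Function names are illustrative:

```python
def energy_two_level(ae_l1: float, mr_l1: float, ae_l2: float, mr_l2: float,
                     ae_mem: float) -> float:
    """AE_L1 + MR_L1 * (AE_L2 + MR_L2 * AE_mem), in nJ per memory access.

    mr_l2 is the L2 local miss rate (fraction of L2 accesses that miss).
    """
    return ae_l1 + mr_l1 * (ae_l2 + mr_l2 * ae_mem)


def energy_one_level(ae_l1: float, mr_l1: float, ae_mem: float) -> float:
    """Same formula with no L2: every L1 miss goes straight to memory."""
    return ae_l1 + mr_l1 * ae_mem


# HYPOTHETICAL miss rates, for illustration only.
MR_L1 = 0.05
MR_L2 = 0.20

# Per-access energies (nJ) from the table above.
conventional_l2 = energy_two_level(0.5, MR_L1, 2.4, MR_L2, 316.0)  # off-chip memory behind L2
conventional_l1 = energy_one_level(0.5, MR_L1, 98.5)               # off-chip memory behind L1
iram_l1 = energy_one_level(0.5, MR_L1, 4.6)                        # on-chip DRAM behind L1

print(f"conventional (L1+L2): {conventional_l2:.2f} nJ/access")
print(f"conventional (L1 only): {conventional_l1:.2f} nJ/access")
print(f"IRAM (L1 + on-chip DRAM): {iram_l1:.2f} nJ/access")
```

Even with identical miss rates, the IRAM configuration wins because the miss term is multiplied by 4.6 nJ instead of 98.5 or 316.0 nJ; how large the win is depends on the miss rate, which is why the next slide notes that benefits depend on how memory-intensive the application is.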

  20. IRAM Energy Efficiency Conclusions
     • IRAM memory hierarchy consumes as little as 29% (Small) or 22% (Large) of the energy of the corresponding conventional models
     • In the worst case, IRAM energy consumption is comparable to conventional: 116% (Small), 76% (Large)
     • Total energy of IRAM CPU and memory as little as 40% of conventional, assuming StrongARM as CPU core
     • Benefits depend on how memory-intensive the application is

  21. A More Revolutionary Approach
     • “...wires are not keeping pace with scaling of other features. … In fact, for CMOS processes below 0.25 micron ... an unacceptably small percentage of the die will be reachable during a single clock cycle.”
     • “Architectures that require long-distance, rapid interaction will not scale well ...”
       ◦ “Will Physical Scalability Sabotage Performance Gains?” Matzke, IEEE Computer (9/97)
