Novel Nonvolatile Memory Hierarchies to Realize "Normally-Off Mobile Processors"
ASP-DAC 2014
Shinobu Fujita, Kumiko Nomura, Hiroki Noguchi, Susumu Takeda, Keiko Abe
Toshiba Corporation, R&D Center, Advanced LSI Technology Laboratory
Acknowledgement: This work was partly supported by the Normally-off Computing PJ (NEDO) in Japan.
OUTLINE
Introduction: Normally-off (N-off) Processor (from ver.0 to ver.1)
Key Point 1: Advanced STT-MRAM
Key Point 2: Power reduction for short CPU standby states by applying a new memory cell design
Key Point 3: Power reduction for long CPU standby states by ultra-fast power gating
Conclusions: Towards N-off ver.2
History of the Normally-Off Computer Concept
Normally-Off Computer Ver.0 (2001): proposed by K. Ando, AIST, Japan (FED Journal, Japan, 2001).
[Figure: two memory hierarchies, each with ALU/flip-flops, register files, L1 cache, L2 cache, main memory, and storage. Conventional hierarchy: volatile memory down to main memory, non-volatile storage. Ver.0 hierarchy: non-volatile memory (MRAM) throughout, with non-volatile storage.]
The same Ver.0 concept was presented by T. Kawahara at ASP-DAC 2011: "Nonvolatile memory and normally-off computing" (based on MRAM).
Rethinking the Normally-Off Concept Ver.0
Ver.0 (an all-nonvolatile memory hierarchy) is not suitable for decreasing power.
[Figure: power vs. time for each level of the hierarchy (ALU, register, L1 cache, L2 cache, main memory, storage). Active power is dominant at the levels nearest the ALU; standby power is dominant at the levels nearest storage.]
Attention:
- The active power (write power) of nonvolatile memory is very large.
- Nonvolatile memory is much slower than SRAM.
(CPU core power and performance are largely degraded by Ver.0!)
History of the Normally-Off Computer Concept (2)
Normally-Off Computer Ver.0 (2001): K. Ando, AIST, Japan (FED Journal, Japan, 2001).
Ver.0 (2011): T. Kawahara, ASP-DAC 2011. CPU core power and performance are largely degraded! Ultra-low-power applications such as sensor networks, etc.
Ver.1 (2010): K. Abe, S. Fujita et al., Toshiba, SSDM 2010.
[Figure: three memory hierarchies, each with ALU/flip-flops, register files, L1 cache, L2 cache, main memory, and non-volatile storage. Conventional: volatile memory. Ver.0: non-volatile memory (MRAM) throughout. Ver.1: volatile memory in the upper levels and non-volatile memory (MRAM) in the lower levels.]
Why Nonvolatile L2, L3, and Last-Level Cache?
The capacity of cache memory in CPUs is increasing, which increases the standby power of processors!
[Figure: cache capacity (MB, 0.5 to 64) by year ('08 to '13) for server/workstation CPUs, desktop PC CPUs, notebook PC CPUs, and smartphone CPUs. Inset: CPU core with registers, register file, L1 cache, and L2/L3 cache.]
More cache, more leakage.
Background:
- Performance is increased not by raising the clock frequency.
- Multi-core.
Especially for Mobile Processors, Not Standby Power but Leakage Power Is Dominant!
[Figure: energy consumed (%, 0 to 100%) in the active state, the clock-gated state, and the power-gated state (except L2, which is in retention). Evaluation from a one-day use case.]
Consumed energy caused by the leakage power of the last-level cache (L2$).
STT-MRAM Is the Best NVM, but...
[Figure: access speed (ns, 1 to 10,000,000) vs. memory capacity (bit, 1M to 1T). RAM with practically unlimited endurance: SRAM, DRAM. Nonvolatile memories: STT-MRAM, MRAM, FeRAM, ReRAM, PCM. Storage with limited endurance: NAND Flash, HDD.]
For cache memory, the operation speed of STT-MRAM is slow and its power is high.
"Dilemma of Nonvolatile Memory!"
Standby power is low, but even with conventional STT-MRAM the active energy is far higher than that of SRAM.
[Figure: energy consumed (%, 0 to 100%) in the active state, the clock-gated state, and the power-gated state (L2 retention). With general STT-MRAM, active power increases drastically while standby power decreases largely.]
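The dilemma on this slide can be sketched as a two-term energy model: dynamic energy grows with the number of accesses, while leakage energy grows with the time the cache stays powered. The sketch below is only an illustration; the class, the function, and every number in it are hypothetical placeholders, not values from the slides.

```python
from dataclasses import dataclass


@dataclass
class CacheTech:
    name: str
    energy_per_access_nj: float  # dynamic energy per access (hypothetical)
    leakage_power_mw: float      # leakage while powered (hypothetical)


def total_energy_mj(tech: CacheTech, accesses: float, powered_seconds: float) -> float:
    """Total energy in mJ = dynamic energy + leakage energy."""
    dynamic_mj = tech.energy_per_access_nj * accesses * 1e-6  # nJ -> mJ
    leakage_mj = tech.leakage_power_mw * powered_seconds      # mW * s = mJ
    return dynamic_mj + leakage_mj


# Hypothetical technologies: the NVM leaks far less but costs more per access.
SRAM = CacheTech("SRAM", energy_per_access_nj=0.1, leakage_power_mw=50.0)
STT_MRAM = CacheTech("general STT-MRAM", energy_per_access_nj=1.0, leakage_power_mw=1.0)
DAY_S = 24 * 3600

for label, accesses in [("mostly idle", 1e9), ("heavily active", 1e13)]:
    for tech in (SRAM, STT_MRAM):
        energy = total_energy_mj(tech, accesses, DAY_S)
        print(f"{label:15s} {tech.name:18s} {energy:14.1f} mJ")
```

With these placeholder numbers the nonvolatile cache wins when the workload is mostly idle and loses when it is heavily active, which is the trade-off the slide describes.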
OUTLINE
Introduction: New Design Concept, Normally-off (N-off) Processor (from ver.0 to ver.1)
Key Point 1: Advanced STT-MRAM
Key Point 2: Power reduction for short CPU standby states (in the CPU active state) by applying a new memory cell design
Key Point 3: Power reduction for long CPU standby states by ultra-fast power gating
Conclusions: Towards N-off ver.2
Advanced STT-MRAM Has Been Developed!
[Figure: the same access speed (ns) vs. memory capacity (bit) map, now including Advanced p-STT-MRAM alongside SRAM, DRAM, STT-MRAM, MRAM, FeRAM, ReRAM, PCM, NAND Flash, and HDD.]
Breakthrough by Toshiba's Advanced STT-MRAM
[Figure: programming current (A, 1.0E-05 to 1.0E-02) vs. programming time (nsec, 1.E-10 to 1.E-07), with data points [1] to [7] and Toshiba 2012 (Advanced STT-MRAM, 28-30 nm). A line marks the breakeven point for replacing hp-SRAM in terms of cache memory power: on one side the power of cache memory is increased, on the other it is reduced. Arrows: power down (lower current), higher speed (shorter time).]
[1] Sony Corp., IEDM (2005).
[2] New York Univ., Applied Physics Letters 97, 242510 (2010).
[3] Cornell Univ., Applied Physics Letters 95, 012506 (2009).
[4] Minnesota Univ., J. Phys. D: Appl. Phys. 45, 025001 (2012).
[5] NEC Corp., Symposium on VLSI Circuits 7.3 (2012).
[6] IBM Corp., Appl. Phys. Lett. 98, 022501 (2011).
[7] TDK-Headway, Applied Physics Express 5, 093008 (2012).
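One simple way to read the two axes together is through the energy of a single programming pulse, roughly current times voltage times pulse time. The sketch below assumes a fixed 1.0 V across the cell and uses two made-up operating points; neither the voltage nor the points are taken from the plot or the cited papers, and the slide does not state how its breakeven line was derived.

```python
def write_energy_pj(current_a: float, pulse_s: float, voltage_v: float = 1.0) -> float:
    """Energy of one programming pulse in pJ, assuming E = I * V * t.

    voltage_v is an assumed constant cell voltage, not a value from the slides.
    """
    return current_a * voltage_v * pulse_s * 1e12  # J -> pJ


# Hypothetical operating points (current in A, pulse time in s).
slow_high_current = write_energy_pj(1e-3, 1e-8)
fast_low_current = write_energy_pj(5e-5, 3e-9)

print(f"slow, high-current pulse: {slow_high_current:.2f} pJ")
print(f"fast, low-current pulse:  {fast_low_current:.2f} pJ")
```

Under this model, lowering the current and shortening the pulse both reduce write energy, so a point that moves toward the lower-left corner of the plot costs less per write.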
Embedded Memory Integration (by Toshiba N-off PJ)
STT-MRAM test chip
[Figure: macro layout with four 256R x 256C sub-arrays, each with a 4:1 MUX, plus 1K WL, 1K BL, XDEC, read and write driver, and SA. Labeled dimensions: 1059.32 μm, 592.81 μm, 560.12 μm, 502.03 μm, 55.26 μm, 32.69 μm.]
Process: 65-nm CMOS process
Macro size: 0.628 mm^2
Organization: 4K words x 256 bits = 1 Mb
Measurements: 17.8 mW @ 250 MHz; read 3.3 ns @ 1.2 V, 4 ns @ 1.05 V
[Figure: access time measurements for the 2T-2MTJ cell, showing pass/fail regions for voltage (V, 0.8 to 1.2) vs. access time (ns, 3 to 6). Source: H. Noguchi et al., VLSI Circuits Symposium, 2013.]
Access time < 4 ns
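The macro figures quoted on this slide can be cross-checked with a little arithmetic. The sketch below assumes that the 17.8 mW figure is average power with one 256-bit access per clock cycle at 250 MHz; the slide does not state this, so the per-access and per-bit energies are estimates only.

```python
WORDS = 4 * 1024          # 4K words
BITS_PER_WORD = 256
CLOCK_HZ = 250e6          # 250 MHz
POWER_W = 17.8e-3         # 17.8 mW

capacity_bits = WORDS * BITS_PER_WORD                # 1 Mb = 2**20 bits
cycle_ns = 1e9 / CLOCK_HZ                            # clock period in ns
# Assumption: one full-word access per cycle at the quoted power.
energy_per_access_pj = POWER_W / CLOCK_HZ * 1e12     # J -> pJ
energy_per_bit_pj = energy_per_access_pj / BITS_PER_WORD

print(f"capacity: {capacity_bits} bits")
print(f"cycle time at 250 MHz: {cycle_ns:.1f} ns")
print(f"energy per access: {energy_per_access_pj:.1f} pJ")
print(f"energy per bit: {energy_per_bit_pj:.3f} pJ")
```

The 4 ns cycle time at 250 MHz lines up with the access time of under 4 ns reported on the slide.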
High-Speed STT-MRAM Is NOT for Higher CPU Performance, but for a Lower-Power CPU!
[Figure, two panels. Panel 1: average time per instruction (ns) vs. access time (ns, 0 to 15) for STT-MRAM and SRAM, annotated "< 3% @5ns". Panel 2: relative average power for L2 cache memory (1 MB, HP-SRAM = 1) vs. access time of STT-MRAM (ns, 0 to 30), with higher-power and lower-power regions marked.]
Development of "STT-MRAM-top Integration"
A specific MRAM integration process is applied on top of a conventional CMOS process (in-house fab, foundry, etc.).
[Figure: cross-section image showing STT-MRAM above the wiring layer and CMOS. Analogy: pool-top construction (Marina Bay Sands Hotel).]
To be presented at VLSI-TSA 2014.
OUTLINE
Introduction: New Design Concept, Normally-off (N-off) Processor (from ver.0 to ver.1)
Key Point 1: Advanced STT-MRAM
Key Point 2: Power reduction for short CPU standby states (in the CPU active state) by applying a new memory cell design (normally-off type design)
Key Point 3: Power reduction for long CPU standby states by ultra-fast power gating
Conclusions: Towards N-off ver.2
From "Normally-On Type Memory with Power Gating" to "Normally-Off Type Memory without Power Gating"
Normally-On Type
(1) SRAM and nonvolatile SRAM without power gating
[Figure: SRAM cell and nonvolatile SRAM cell (with MTJs) on BL, /BL, and WL, with leakage paths marked. Nonvolatile SRAM (2004, Toshiba): NV-SRAM for high speed.]
[Figure: power vs. time. Active periods are separated by short standby periods of 10 to 30 ns, during which leakage power is still consumed. A second timeline asks whether power gating can be applied between the active periods.]
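The "Power Gating?" question on this slide can be framed as a break-even check: gating a block saves leakage only if the standby period is longer than the time needed to recover the energy spent switching it off and on. The sketch below uses this standard break-even model with hypothetical overhead and leakage numbers; none of them come from the slides.

```python
def break_even_ns(overhead_energy_nj: float, leakage_power_mw: float) -> float:
    """Standby time at which the leakage saved equals the gating overhead."""
    # nJ / mW = microseconds, so multiply by 1e3 to get nanoseconds.
    return overhead_energy_nj / leakage_power_mw * 1e3


def gating_saves_energy(standby_ns: float, overhead_energy_nj: float,
                        leakage_power_mw: float) -> bool:
    """True if power gating for standby_ns saves more than it costs."""
    return standby_ns > break_even_ns(overhead_energy_nj, leakage_power_mw)


# Hypothetical numbers: 5 nJ to switch the block off and on, 10 mW leakage.
OVERHEAD_NJ = 5.0
LEAKAGE_MW = 10.0

print(f"break-even time: {break_even_ns(OVERHEAD_NJ, LEAKAGE_MW):.0f} ns")
for standby_ns in (10, 30, 10_000):
    verdict = gating_saves_energy(standby_ns, OVERHEAD_NJ, LEAKAGE_MW)
    print(f"standby {standby_ns:>6} ns -> gating pays off: {verdict}")
```

With these placeholder values the 10 to 30 ns standby periods shown on the slide fall well below the break-even time, so gating them would cost more energy than it saves.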
From "Normally-On Type Memory with Power Gating" to "Normally-Off Type Memory without Power Gating"
Normally-On Type
(1) SRAM and nonvolatile SRAM with power gating
[Figure: SRAM cell and nonvolatile SRAM cell (with MTJs) on BL, /BL, and WL, with the leakage path and the power gating switch marked. Area comparison: SRAM, NV-SRAM, and the power gating switch. Nonvolatile SRAM (2004, Toshiba).]
- The overhead of the power gating switch is very large (in area, and also in delay and power).
(2) Normally-off type memory without power gating
New design: "Normally-off Type" STT-MRAM (next page).
[Figure: area comparison against SRAM, labeled x2 to x4.]
No leakage path, no power gating switch. The STT-MRAM cell is much smaller than the SRAM cell.
Various Normally-Off Type Memory Cell Designs Using Advanced p-STT-MRAM, Presented by Toshiba
[Figure: four cell schematics with MTJs and signal lines BL, /BL, WL, WWL, RWL, SWL, and SL.]
(a) D-MRAM: K. Abe et al., IEDM 2012 (Toshiba)
(b) 3T-2MTJ: A. Kawasumi et al., IMW 2013 (Toshiba)
(c) 2T-2MTJ: H. Noguchi et al., VLSI Circuits 2013 (Toshiba)
(d) 4T-2MTJ: C. Tanaka et al., SSDM 2013 (Toshiba)
Because these cells have no leakage paths like those in SRAM, no power gating switch is needed in the memory arrays.