RESISTIVE MEMORY TECHNOLOGY
Mahdi Nazm Bojnordi, Assistant Professor
School of Computing, University of Utah
CS/ECE 7810: Advanced Computer Architecture

Overview
¨ Upcoming deadlines
  ¤ March 29th: Sign up for your student paper presentation
¨ This lecture
  ¤ Resistive memory technology
  ¤ Write optimization techniques
  ¤ Wear leveling
  ¤ MLC technologies

Resistive Memory Technology
¨ Main benefits
  ¤ Non-volatile memory
  ¤ Multi-level storage
  ¤ Denser cells
  ¤ Better scalability
¨ Shortcomings
  ¤ Limited endurance
  ¤ High switching delay and energy
What can we do?

Comparison of Technologies
¨ Compared to NAND Flash, PCM is byte-addressable and has orders of magnitude lower latency and higher endurance.
                     DRAM              PCM                   NAND Flash
Page size            64B               64B                   4KB
Page read latency    20-50ns           ~50ns                 ~25 µs
Page write latency   20-50ns           ~1 µs                 ~500 µs
Write bandwidth      ~GB/s per die     50-100 MB/s per die   5-40 MB/s per die
Erase latency        N/A               N/A                   ~2 ms
Endurance            ∞                 10^6 - 10^8           10^4 - 10^5
Read energy          0.8 J/GB          1 J/GB                1.5 J/GB [28]
Write energy         1.2 J/GB          6 J/GB                17.5 J/GB [28]
Idle power           ~100 mW/GB        ~1 mW/GB              1-10 mW/GB
Density              1×                2-4×                  4×
Sources: [Doller ’09] [Lee et al. ’09] [Qureshi et al. ’09]

Comparison of Technologies
¨ Compared to DRAM, PCM has better density and scalability, with similar read latency but longer write latency.
(Same DRAM / PCM / NAND Flash comparison table as on the previous slide.)
Sources: [Doller ’09] [Lee et al. ’09] [Qureshi et al. ’09]

Latency Comparison
[Figure: read and write latency of DRAM, PCM, NAND Flash, and hard disk on a log scale from 10ns to 10ms] [Qureshi’09]

Read Compare Write
¨ A cache line is written in several cycles
¨ Read-compare-write (differential write)
  ¤ Write only modified bits rather than the entire cache line
¨ Skip parts with no modified bits
[Figure: bits of the incoming cache line compared against the bits stored in PCM]
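The read-compare-write idea can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `read_compare_write`, the list-of-bits representation, and the 8-bit chunk size are assumptions made for the example, not details taken from the slides.

```python
def read_compare_write(stored, new, chunk_bits=8):
    """Differential write of `new` over `stored` (equal-length lists of 0/1).

    The line is processed chunk by chunk (one chunk per write cycle).
    A chunk with no modified bits is skipped; otherwise only the bits
    that differ are written.
    Returns (updated_bits, bits_written, chunks_skipped).
    """
    assert len(stored) == len(new)
    updated = list(stored)
    bits_written = 0
    chunks_skipped = 0
    for base in range(0, len(new), chunk_bits):
        end = min(base + chunk_bits, len(new))
        # Read the old chunk and compare it with the incoming data.
        modified = [i for i in range(base, end) if stored[i] != new[i]]
        if not modified:
            chunks_skipped += 1  # nothing changed: no write cycle needed
            continue
        for i in modified:
            updated[i] = new[i]  # write only the modified bits
        bits_written += len(modified)
    return updated, bits_written, chunks_skipped
```

For a 16-bit line where only two bits in the second chunk change, the sketch writes two bits and skips the first chunk entirely.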

Reducing Bit Flips
¨ Flip-N-Write [MICRO’09]: encode write data into either its regular or inverted form, then pick the encoding that yields fewer flips when compared against the old data. The slide's example saves 4 bit flips.
¨ Flip-Min [HPCA’13]: encode write data into a set of data vectors, then pick the vector that yields the fewest flips when compared against the old data. The slide's example saves 5 bit flips.
[Figure: old data compared against the regular and inverted encodings (Flip-N-Write) and against candidate vectors New 1, New 2, New 3 (Flip-Min)]
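The regular-versus-inverted choice in Flip-N-Write can be illustrated with a short Python sketch. It assumes one flip flag per word, stored alongside the data and counted in the flip cost. The names `flip_n_write`, `decode`, and `hamming` are hypothetical and chosen for this example.

```python
def hamming(a, b):
    """Number of positions where two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def flip_n_write(stored_bits, stored_flag, new_data):
    """Choose the regular or inverted encoding of new_data, whichever
    flips fewer cells relative to what is currently stored.

    stored_bits: bits currently in the cells (as stored, possibly inverted)
    stored_flag: current flip flag (1 means stored_bits are inverted)
    Returns (bits_to_store, new_flag, cells_flipped).
    """
    regular = list(new_data)
    inverted = [1 - b for b in new_data]
    # The flag is itself a cell, so changing it counts as one flip.
    cost_regular = hamming(stored_bits, regular) + (stored_flag != 0)
    cost_inverted = hamming(stored_bits, inverted) + (stored_flag != 1)
    if cost_inverted < cost_regular:
        return inverted, 1, cost_inverted
    return regular, 0, cost_regular


def decode(stored_bits, flag):
    """Recover the logical data from the stored bits and the flip flag."""
    return [1 - b for b in stored_bits] if flag else list(stored_bits)
```

Because the two candidate costs always sum to the word width plus one, the cheaper encoding never flips more than about half of the cells.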

Limited Lifetime
Challenge: each cell can endure 10-100 million writes
With uniform write traffic, system lifetime ranges from 4-20 years
[Figure: system lifetime across workloads; annotations: 16 yrs, 4 yrs] [Qureshi’09]

Non-Uniform Writes
¨ Even with 64K spare lines, the baseline gets 5% of the ideal lifetime
[Figure: non-uniform write distribution, average marked] [Qureshi’09]

Impact of Non-Uniformity
¨ Even with 64K spare lines, the baseline gets 5% of the ideal lifetime
Norm. Endurance = (Num. writes before system failure / Num. writes before failure with uniform writes) x 100%
[Figure: normalized endurance (%) for Baseline w/o spares and Baseline (64K spare lines) across oltp, db1, db2, fft, stride, stress, and Gmean; annotation: 20x lower] [Qureshi’09]

Making Writes Uniform
¨ Wear Leveling: make writes uniform by remapping frequently written lines
Line Addr.   Lifetime Count   Period Count
A            99K (Low)        1K (Low)
B            100K (Med)       3K (High)
C            101K (High)      2K (Med)
Line   Remap Addr
A      C
B      A
C      B
Indirection Table: Physical Address → PCM Address [Qureshi’09]

How to Remap
¨ Tables
  ¤ Area of several (tens of) megabytes
  ¤ Indirection latency (table in EDRAM/DRAM)
¨ Area overhead can be reduced with more lines per region
  ¤ Reduced effectiveness (e.g. Line0 always written)
  ¤ Support for swapping large memory regions (complex)
[Qureshi’09]

Start-Gap Wear Leveling
¨ Two registers (Start & Gap) + 1 line (GapLine) to support movement
¨ Move GapLine every 100 writes to memory
Layout: lines A, B, C, D at PCM addresses 0-3 (START at 0); GapLine at address 4 (GAP)
PCMAddr = (Start + Addr) mod N; if (PCMAddr >= Gap) PCMAddr++   (N = number of lines, excluding GapLine)
Storage overhead: less than 8 bytes (GapLine taken from spares)
Latency: two additions (no table lookup)
Write overhead: one extra write every 100 writes → 1%
[Qureshi’09]
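The address translation and gap movement can be sketched as a small Python model. This is a minimal sketch under stated assumptions, not a definitive implementation: the class name `StartGap`, the use of a Python list to stand in for PCM lines, and the wrap-around step (copy the last line into slot 0, reset Gap, advance Start) reflect one reading of the scheme, with the address calculation wrapping modulo the number of lines N.

```python
class StartGap:
    """Toy model of Start-Gap wear leveling over N lines plus one GapLine."""

    def __init__(self, num_lines, gap_interval=100):
        self.n = num_lines
        self.interval = gap_interval          # writes between gap movements
        self.start = 0                        # Start register
        self.gap = num_lines                  # Gap register: index of the empty slot
        self.mem = [None] * (num_lines + 1)   # N lines + GapLine
        self.writes_since_move = 0

    def translate(self, addr):
        """Logical line address -> PCM address (no table lookup)."""
        pcm_addr = (self.start + addr) % self.n
        if pcm_addr >= self.gap:
            pcm_addr += 1
        return pcm_addr

    def _move_gap(self):
        if self.gap == 0:
            # Gap reached the top: move the last line into slot 0,
            # send the gap back to the bottom, and advance Start.
            self.mem[0] = self.mem[self.n]
            self.mem[self.n] = None
            self.gap = self.n
            self.start = (self.start + 1) % self.n
        else:
            # Copy the line just above the gap into the gap, then move the gap up.
            self.mem[self.gap] = self.mem[self.gap - 1]
            self.mem[self.gap - 1] = None
            self.gap -= 1

    def write(self, addr, value):
        self.mem[self.translate(addr)] = value
        self.writes_since_move += 1
        if self.writes_since_move == self.interval:
            self.writes_since_move = 0
            self._move_gap()                  # the one extra line write per interval

    def read(self, addr):
        return self.mem[self.translate(addr)]
```

With `gap_interval=100`, each gap movement costs one extra line write per 100 demand writes, which matches the 1% write overhead stated on the slide.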

Start-Gap Results
¨ On average, Start-Gap gets 53% normalized endurance
[Figure: normalized endurance (%) for Baseline, Start Gap, and Perfect across oltp, db1, db2, fft, stride, stress, and Gmean] [Qureshi’09]

Multi-Level Cells
[Figure: voltage vs. time for cell states 11, 10, 01, 00] [Yoon’14]

Sensing Multi-level Cells [Yoon’14]

Multi-Level Cells
Time to determine Bit 1's value
[Figure: voltage vs. time for cell states 11, 10, 01, 00] [Yoon’14]

Multi-Level Cells
Time to determine Bit 0's value
[Figure: voltage vs. time for cell states 11, 10, 01, 00] [Yoon’14]

Decoupled Bit Mapping
MLC-PCM cell: Bit 1 (fast read, FR), Bit 0 (fast write, FW)
Coupled (baseline): contiguous bits alternate between FR and FW
  Bit 1 (FR): bits 1 3 5 7 9 11 13 15
  Bit 0 (FW): bits 0 2 4 6 8 10 12 14
Decoupled: contiguous regions alternate between FR and FW
  Bit 1 (FR): bits 8 9 10 11 12 13 14 15
  Bit 0 (FW): bits 0 1 2 3 4 5 6 7
[Yoon’14]
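The two layouts can be expressed as index arithmetic. The sketch below assumes the 8-cell, 16-bit example from the slide. The function names `coupled_map` and `decoupled_map` are hypothetical, and each returns a `(cell, level)` pair where level 1 is the fast-read bit and level 0 is the fast-write bit.

```python
def coupled_map(bit, num_cells=8):
    """Baseline: consecutive logical bits share a cell, so they
    alternate between the FW bit (level 0) and the FR bit (level 1)."""
    assert 0 <= bit < 2 * num_cells
    return bit // 2, bit % 2


def decoupled_map(bit, num_cells=8):
    """Decoupled: the first num_cells logical bits use the FW bit of
    every cell, the next num_cells logical bits use the FR bit."""
    assert 0 <= bit < 2 * num_cells
    return bit % num_cells, bit // num_cells


# Logical bits 0..7 form a fast-write region, 8..15 a fast-read region.
fw_region = [decoupled_map(b) for b in range(8)]
fr_region = [decoupled_map(b) for b in range(8, 16)]
```

Under the decoupled mapping a contiguous range of addresses has uniform read and write behavior, which is what makes per-region (for example per-page) placement decisions possible.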

Decoupled Bit Mapping
¨ By decoupling, we've created regions with distinct characteristics
  ¤ We examine the use of 4KB regions (e.g., OS page size)
[Figure: physical address space divided into fast read pages and fast write pages]
¨ Want to match frequently read data to FR pages and frequently written data to FW pages
¨ Toward this end, we propose a new OS page allocation scheme
[Yoon’14]

Performance Results
[Figure: normalized speedup for Conventional, All fast write, All fast read, DBM, DBM+APM+SRB, and Ideal; bar annotations: +31%, +19%, +10%, +16%, +13%] [Yoon’14]
