RELIABILITY OF RESISTIVE MEMORIES
Mahdi Nazm Bojnordi
Assistant Professor
School of Computing, University of Utah
CS/ECE 7810: Advanced Computer Architecture

Overview
¨ Upcoming deadlines
  ¤ April 6th: student paper presentation
¨ This lecture
  ¤ Hard errors in resistive memories
  ¤ Increasing reliability by replication, ECP, SAFER, FREE-p
  ¤ Resistive computing

Recall: Resistive vs. Dynamic RAM
¨ Phase-Change RAM
  ¤ Nonvolatile
  ¤ Projected to be more scalable
  ¤ Cells may be written individually
  ¤ Slower, with more energy-intensive writes
  ¤ Susceptible to hard errors
¨ DRAM
  ¤ Volatile, charge based
  ¤ Difficult to further scale down the capacitor
  ¤ All of the accesses are through the row buffer
  ¤ Faster, with acceptable energy consumption
  ¤ Vulnerable to soft errors

Solutions to Memory Hard Errors
¨ Accept failure of some fraction of pages
  ¤ Map failed pages out of logical memory
¨ Wear-level data pages/blocks, and within blocks
  ¤ Shift/rotate data randomly (intervals/locations)
¨ Differential writes
  ¤ Write only cells with values that change
¨ Correct errors when possible
  ¤ Error correction techniques
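The differential-write bullet can be illustrated with a minimal sketch. It assumes a hypothetical 8-bit word modeled as a Python integer; the function name and the word width are illustrative and not taken from the slides. Only the cells whose stored value differs from the new value are programmed, which is what reduces wear.

```python
def differential_write(stored: int, new: int, width: int = 8) -> tuple[int, int]:
    """Return (value after the write, number of cells actually programmed)."""
    mask = (1 << width) - 1
    changed = (stored ^ new) & mask          # 1 wherever a cell must change
    cells_written = bin(changed).count("1")  # only these cells see a write
    return (stored ^ changed) & mask, cells_written


value, writes = differential_write(0b10101100, 0b10100101)
print(f"{value:08b}", writes)  # prints "10100101 2"
```

A conventional write would program all 8 cells here; the differential write programs 2.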

Error Correction Techniques
¨ No correction (detection only)
  ¤ Inefficient
  ¤ A page must be retired when the first cell fails
¨ SECDED ECC
  ¤ With a 12.5% memory overhead
  ¤ A page must be retired when a block within the page suffers a second error
[Figure: 64 data bits spread over 8 chips (8 bits/chip); SEC/SECDED adds 7/8 check bits, a 10.9%/12.5% overhead]
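The 10.9% and 12.5% overhead figures follow from the standard Hamming bound: a single-error-correcting code over k data bits needs the smallest r with 2^r >= k + r + 1, and SECDED adds one overall parity bit. The helper name below is an illustrative assumption; the arithmetic is a sketch for the 64-bit case.

```python
def sec_check_bits(data_bits: int) -> int:
    """Smallest r such that 2**r >= data_bits + r + 1 (Hamming bound for SEC)."""
    r = 0
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r


data_bits = 64
sec = sec_check_bits(data_bits)   # single error correction
secded = sec + 1                  # plus one overall parity bit for double error detection
print(sec, f"{sec / data_bits:.1%}")        # prints "7 10.9%"
print(secded, f"{secded / data_bits:.1%}")  # prints "8 12.5%"
```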

Error Correction Codes
¨ Good for soft errors
  ¤ Transient errors
¨ Not good for hard errors
  ¤ ECC has high entropy and can hasten wear-out
  ¤ Flipping just one data bit changes about half of the ECC bits
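The claim that one data-bit flip disturbs roughly half of the check bits can be checked with a small experiment. The slide does not say which code it has in mind, so this sketch assumes a textbook Hamming SEC code over 64 data bits (check bits at power-of-two codeword positions); the function names are illustrative.

```python
def hamming_check_bits(data: int, data_bits: int = 64) -> int:
    """Check bits of a textbook Hamming SEC code, packed into one integer."""
    check = 0
    position = 1   # codeword positions are numbered from 1
    placed = 0     # how many data bits have been assigned a position
    while placed < data_bits:
        if position & (position - 1) == 0:
            position += 1        # power of two: reserved for a check bit
            continue
        if (data >> placed) & 1:
            check ^= position    # toggles every check bit that covers this position
        placed += 1
        position += 1
    return check


def flipped_check_bits(data: int, bit: int) -> int:
    """Number of check bits that change when one data bit is flipped."""
    before = hamming_check_bits(data)
    after = hamming_check_bits(data ^ (1 << bit))
    return bin(before ^ after).count("1")


counts = [flipped_check_bits(0, bit) for bit in range(64)]
print(min(counts), max(counts), sum(counts) / len(counts))
```

Under these assumptions every data-bit flip rewrites between 2 and 6 of the 7 check bits, about 3.2 on average, so the check cells are written far more often than a typical data cell.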

Dynamically Replicated Memory
¨ Goal: handle hard errors by pairing two pages that have faults in different locations; replicate data across the two pages
¨ How: errors are detected with parity bits; replica reads are issued if the initial read is faulty
[ASPLOS’10]
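The pairing condition can be sketched as a disjointness test on fault locations. This is a minimal illustration, not the allocation policy from the paper: the greedy first-fit matching, the page names, and the fault sets are all assumptions.

```python
def compatible(faults_a: set[int], faults_b: set[int]) -> bool:
    """Two pages can back each other up if no location is faulty in both."""
    return faults_a.isdisjoint(faults_b)


def pair_pages(faulty_pages: dict[str, set[int]]) -> list[tuple[str, str]]:
    """Greedy first-fit pairing of faulty pages (illustrative policy only)."""
    unpaired = list(faulty_pages)
    pairs = []
    while unpaired:
        first = unpaired.pop(0)
        for other in unpaired:
            if compatible(faulty_pages[first], faulty_pages[other]):
                pairs.append((first, other))
                unpaired.remove(other)
                break   # stop iterating right after mutating the list
    return pairs


pages = {"A": {3, 17}, "B": {17, 40}, "C": {5}, "D": {3}}
print(pair_pages(pages))  # prints "[('A', 'C'), ('B', 'D')]"
```

Pages A and B cannot be paired because both are faulty at location 17; every location in a valid pair is readable from at least one of the two replicas.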

Dynamically Replicated Memory
¨ Improves the lifetime of PCM by up to 40x over conventional error-detection techniques
[ASPLOS’10]

Error Correction Pointers
¨ Key idea: instead of using ECC to handle a few transient faults in DRAM, use error-correcting pointers to handle hard errors in specific locations
¨ For a 512-bit line with 1 failed bit, maintain a 9-bit field to track the failed location and another bit to store the value in that location
¨ Can store multiple such pointers and can recover from faults in the pointers too
[ISCA’10]
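A minimal sketch of the read path and the storage cost, under stated assumptions: each entry is a (pointer, replacement bit) pair, later entries override earlier ones, one extra bit marks the entry table as full, and six entries are provisioned. The function names and the six-entry configuration are illustrative choices, and faults inside the pointer bits themselves are not modeled.

```python
LINE_BITS = 512
POINTER_BITS = 9   # 2**9 == 512, enough to address any cell in the line


def ecp_overhead_bits(entries: int) -> int:
    """Each entry holds a pointer plus one replacement cell; one extra 'full' bit."""
    return entries * (POINTER_BITS + 1) + 1


def ecp_read(raw_cells: list[int], entries: list[tuple[int, int]]) -> list[int]:
    """Patch the raw line with replacement bits; later entries take precedence."""
    corrected = list(raw_cells)
    for pointer, replacement in entries:
        corrected[pointer] = replacement
    return corrected


intended = [0] * LINE_BITS
raw = list(intended)
raw[5] = 1                               # cell 5 is stuck at 1
print(ecp_read(raw, [(5, 0)]) == intended)  # prints "True"
print(ecp_overhead_bits(6), f"{ecp_overhead_bits(6) / LINE_BITS:.1%}")  # prints "61 11.9%"
```

Letting a later entry override an earlier one is one way to tolerate a failed replacement cell: a new entry pointing at the same data bit supplies a working replacement.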

Error Correction Pointers
[Figure: a 512-bit row of data cells (bits 511 to 0) followed by one correction entry, made of a 9-bit correction pointer (bits 8 to 0) and a replacement cell (R), plus a "Full?" bit]
[ISCA’10]

Error Correction Pointers
[Figure: a 512-bit row of data cells (bits 511 to 0) with six correction entries (5 to 0) and a "Full?" bit; one entry is in use, holding a 9-bit pointer (bits 8 to 0) and a replacement cell (R)]
[ISCA’10]

Error Correction Pointers
[Figure: the same 512-bit row with six correction entries (5 to 0) and a "Full?" bit; two entries are in use, each holding a 9-bit pointer (bits 8 to 0) and a replacement cell (R)]
¨ What if a correction entry fails?
[ISCA’10]

Stuck-At-Fault Error Recovery
¨ Observation: a failed cell with a stuck-at value is still readable
¨ Goal: either write the word or its flipped version so that the failed bit is made to store the stuck-at value
¨ For multi-bit errors, the line can be partitioned such that each partition has a single error
¨ Errors are detected by verifying a write; recently failed bit locations are cached so multiple writes can be avoided
[MICRO’10]
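The write-or-flip idea for a single partition with one stuck cell can be sketched as follows. The 8-bit width, the function names, and the explicit invert flag are illustrative assumptions; partition selection for multi-bit errors is not modeled.

```python
def encode(data: int, width: int, stuck_pos: int, stuck_val: int) -> tuple[int, int]:
    """Pick the data or its complement so the stuck cell already holds the right bit."""
    mask = (1 << width) - 1
    invert = ((data >> stuck_pos) & 1) != stuck_val
    stored = (data ^ mask) if invert else data
    return stored & mask, int(invert)


def program_cells(pattern: int, stuck_pos: int, stuck_val: int) -> int:
    """Model the array: the stuck cell ignores the write and keeps its value."""
    return (pattern & ~(1 << stuck_pos)) | (stuck_val << stuck_pos)


def decode(stored: int, width: int, invert: int) -> int:
    """Undo the inversion recorded in the flag bit."""
    mask = (1 << width) - 1
    return (stored ^ mask) if invert else stored


pattern, flag = encode(0b00001111, 8, stuck_pos=0, stuck_val=0)
cells = program_cells(pattern, stuck_pos=0, stuck_val=0)
print(f"{cells:08b}", flag, f"{decode(cells, 8, flag):08b}")  # prints "11110000 1 00001111"
```

Because the chosen pattern always agrees with the stuck cell, the value read back decodes to the original data for either stuck-at value.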

Stuck-At-Fault Error Recovery
¨ Three partition candidates in SAFER
¨ How to detect two fails? (read the paper)
[MICRO’10]

Stuck-At-Fault Error Recovery ¨ Fail recovery [MICRO’10]

Multi-tiered ECC for Hard/Soft Errors
¨ FREE-p: fine-grained remapping with ECC and embedded pointer
  ¤ Re-use a “dead” 64B block for storing a remap pointer
  ¤ Architectural techniques to accelerate address remapping
¨ Detection/correction at the memory controller
  ¤ Allow simple NVRAM devices
  ¤ Tolerate hard/soft errors in the cell array, periphery, etc.
[HPCA’11]

FREE-p
¨ Embed a 64-bit pointer within a faulty block
  ¤ There are still-functional bits in a faulty block
  ¤ 1-bit D/P (data/pointer) flag per 64B block
    n Identifies whether a block is remapped or not
  ¤ Avoid chained remapping
    n Always embed the FINAL pointer
[HPCA’11]
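The D/P flag and the "always embed the final pointer" rule can be sketched with a toy block store. The class, method names, and addresses are hypothetical; the sketch ignores how the pointer is protected inside the surviving bits of the dead block.

```python
class RemappedMemory:
    """Toy model: each block is (is_pointer, payload), mirroring the D/P flag."""

    def __init__(self) -> None:
        self.blocks: dict[int, tuple[bool, object]] = {}

    def write_data(self, addr: int, data: object) -> None:
        self.blocks[addr] = (False, data)

    def remap(self, addr: int, spare: int, data: object) -> None:
        """The block backing addr died during a write: move data to a spare block."""
        self.blocks[spare] = (False, data)
        self.blocks[addr] = (True, spare)   # the original always names the final block

    def read(self, addr: int) -> object:
        is_pointer, payload = self.blocks[addr]
        if is_pointer:                      # at most one extra hop, never a chain
            is_pointer, payload = self.blocks[payload]
            if is_pointer:
                raise RuntimeError("chained remapping should not occur")
        return payload


mem = RemappedMemory()
mem.write_data(0x40, "payload")
mem.remap(0x40, 0x1000, "payload")   # first failure: remap to a spare
mem.remap(0x40, 0x2000, "payload")   # the spare fails later: update the original pointer
print(mem.blocks[0x40], mem.read(0x40))
```

Updating the pointer in the original block, rather than embedding a second pointer in the failed spare, keeps every remapped read to a single extra access.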

Capacity vs. Lifetime [HPCA’11]

Resistive Computation
¨ Leverage STT-MRAM for energy efficiency
  ¤ Near-zero leakage power
  ¤ Low-energy read operation
¨ Goal: selectively migrate on-chip storage and combinational logic to STT-MRAM to reduce power
  ¤ On-chip storage: caches, TLBs, register files, queues
  ¤ Combinational logic: lookup-table (LUT) based computing
[ISCA’10]
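LUT-based computing replaces a gate network with a small memory that stores the function's truth table and is indexed by the input bits. The sketch below builds a 1-bit full adder from two 3-input LUTs; the helper names and the choice of a full adder are illustrative assumptions, not a circuit from the paper.

```python
def make_lut(func, inputs: int) -> list[int]:
    """Precompute the truth table: entry i holds func applied to the bits of i."""
    return [
        func(*(((index >> bit) & 1) for bit in range(inputs)))
        for index in range(1 << inputs)
    ]


def lut_read(lut: list[int], *bits: int) -> int:
    """Evaluating the logic function is a single memory read."""
    index = sum(bit << position for position, bit in enumerate(bits))
    return lut[index]


sum_lut = make_lut(lambda a, b, c: a ^ b ^ c, 3)
carry_lut = make_lut(lambda a, b, c: (a & b) | (c & (a ^ b)), 3)

a, b, carry_in = 1, 1, 0
print(lut_read(sum_lut, a, b, carry_in), lut_read(carry_lut, a, b, carry_in))  # prints "0 1"
```

Each LUT holds only 8 one-bit entries, so evaluating the adder costs two reads, which is where a low-energy read operation would matter.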

Hybrid CMT Pipeline
¨ Small arrays and simple logic in CMOS
¨ Large arrays and complex logic in STT-MRAM
[Figure: pipeline diagram with a legend for pure CMOS, STT-MRAM LUTs, and STT-MRAM arrays. Labeled blocks include I$, I-TLB, fetch logic, thread select logic, instruction buffers (x8), decode logic, register files (x8), function units (ALU, FPU), store buffers (x8), D$, D-TLB, shared L2$ banks, and queue and logic blocks for memory controllers MC 0 to MC 3]
[ISCA’10]

System Power
[Figure: system power comparison chart; axis and series labels are not recoverable]
[ISCA’10]

System Performance
[Figure: system throughput normalized to CMOS (y-axis from 0 to 1), one bar group per benchmark; benchmark labels are not recoverable]
[ISCA’10]
