
Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery (PowerPoint presentation)


  1. Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery. Yu Cai, Yixin Luo, Erich F. Haratsch*, Ken Mai, Onur Mutlu. Carnegie Mellon University, *LSI Corporation

  2. You Probably Know: flash memory has many use cases. Its key advantages are high performance and low energy consumption.

  3. NAND Flash Memory Challenges: requires erase before program (write); high raw bit error rate. [Figure: the CPU connects to a flash controller with ECC, which manages the raw flash memory chips.]

  4. Limited Flash Memory Lifetime. Goal: extend flash memory lifetime at low cost. [Figure: raw bit error rate (RBER) vs. program/erase (P/E) cycles (or writes per cell). The P/E cycle lifetime ends where RBER exceeds the ECC-correctable RBER; axis marks at ~2000 and ~3000 P/E cycles.]

  5. Retention Loss: charge leaks out of the flash cell over time, so a cell that stored 0 can eventually be read as 1 (a flash cell error). Retention loss is one dominant source of flash memory errors [DATE '12, ICCD '12].

  6. Before I show you how we extend flash lifetime … NAND Flash 101

  7. Threshold Voltage (Vth). [Figure: flash cells placed along the normalized Vth axis; a cell's Vth determines whether it stores 0 or 1.]

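  8. Threshold Voltage (Vth) Distribution. [Figure: probability density function (PDF) of cell Vth over the normalized Vth axis, with one distribution for cells storing 0 and one for cells storing 1.]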

  9. Read Reference Voltage (Vref). [Figure: a read reference voltage Vref placed between the two Vth distributions separates cells read as 0 from cells read as 1.]

  10. Multi-Level Cell (MLC). [Figure: Vth PDF with four states: Erased (11), P1 (10), P2 (00) and P3 (01). Three read reference voltages separate them: ER-P1 Vref, P1-P2 Vref and P2-P3 Vref.]
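The following minimal Python sketch shows how the three read reference voltages on the MLC slide partition the Vth axis into four states. It uses the 2-bit values shown above. The function name and the example voltages are hypothetical, and real chips perform this comparison in hardware rather than on a floating-point Vth value.

```python
from bisect import bisect_right

# States in increasing Vth order, with the 2-bit values from the slide.
MLC_STATES = [("ER", "11"), ("P1", "10"), ("P2", "00"), ("P3", "01")]


def read_mlc_cell(vth, vrefs):
    """Classify a normalized Vth into an MLC state.

    vrefs = (ER-P1 Vref, P1-P2 Vref, P2-P3 Vref), in increasing order.
    A Vth at or above a reference voltage is read as the higher state.
    """
    if list(vrefs) != sorted(vrefs):
        raise ValueError("read reference voltages must be in increasing order")
    # bisect_right counts how many reference voltages are <= vth,
    # which is exactly the index of the state the cell is read as.
    return MLC_STATES[bisect_right(vrefs, vth)]


# Hypothetical normalized reference voltages.
VREFS = (100.0, 200.0, 300.0)
for vth in (50.0, 150.0, 250.0, 350.0):
    print(vth, read_mlc_cell(vth, VREFS))
```

Adjacent states differ in exactly one bit (11, 10, 00, 01). A cell that drifts down by one state because of retention loss therefore causes a single raw bit error.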

  11. Threshold Voltage Reduces Over Time. [Figure: Vth distributions of P1 (10), P2 (00) and P3 (01) before retention loss and after some retention loss; the distributions shift toward lower Vth.]

  12. Fixed Read Reference Voltage Becomes Suboptimal. [Figure: before retention loss, the fixed P1-P2 Vref and P2-P3 Vref separate P1 (10), P2 (00) and P3 (01). After some retention loss, the shifted distributions cross the fixed Vrefs, causing raw bit errors.]
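To make the effect concrete, this Python sketch counts raw bit errors at one fixed read reference voltage between two adjacent states (P1 and P2). It assumes that retention loss can be modeled as a uniform downward Vth shift and uses made-up normalized Vth values. Real distributions also change shape, so treat this as an illustration only. The function names are hypothetical.

```python
def count_boundary_errors(lower_cells, upper_cells, vref):
    """Count misreads at one read reference voltage between two adjacent states.

    lower_cells: normalized Vth of cells programmed to the lower state (e.g., P1).
    upper_cells: normalized Vth of cells programmed to the upper state (e.g., P2).
    A lower-state cell at or above vref, or an upper-state cell below vref,
    is read as the wrong state.
    """
    errors = sum(1 for v in lower_cells if v >= vref)
    errors += sum(1 for v in upper_cells if v < vref)
    return errors


def apply_retention_loss(cells, shift):
    """Simplified model: every cell loses the same amount of Vth."""
    return [v - shift for v in cells]


# Made-up normalized Vth values, not measurements.
p1 = [120.0, 140.0, 160.0, 180.0]
p2 = [220.0, 240.0, 260.0, 280.0]
FIXED_VREF = 200.0

for shift in (0.0, 30.0, 50.0):
    errors = count_boundary_errors(apply_retention_loss(p1, shift),
                                   apply_retention_loss(p2, shift), FIXED_VREF)
    print(f"shift={shift}: {errors} raw bit errors at fixed Vref")
```

As the shift grows, more P2 cells fall below the unchanged Vref. A Vref that was error-free right after programming thus becomes suboptimal.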

  13. Optimal Read Reference Voltage (OPT). [Figure: after some retention loss, the optimal read reference voltages (P1-P2 OPT and P2-P3 OPT) lie below the original P1-P2 Vref and P2-P3 Vref. Reading at OPT yields the minimal raw bit errors.]
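OPT can be defined as the Vref that minimizes misreads between two adjacent states. The Python sketch below finds it by trying a list of candidate voltages. It assumes the originally programmed state of every cell is known, as on a characterization platform. A controller in the field does not have this information. The cell values are made up and the names are hypothetical.

```python
def count_boundary_errors(lower_cells, upper_cells, vref):
    """Misreads at vref: lower-state cells at/above it plus upper-state cells below it."""
    return (sum(1 for v in lower_cells if v >= vref)
            + sum(1 for v in upper_cells if v < vref))


def find_opt(lower_cells, upper_cells, candidates):
    """Return the candidate Vref with the fewest misreads.

    Ties are broken toward the lowest candidate, an arbitrary choice
    made for this sketch.
    """
    return min(candidates,
               key=lambda v: (count_boundary_errors(lower_cells, upper_cells, v), v))


# Made-up P1/P2 cells after some retention loss (uniform 30-unit drop).
p1 = [90.0, 110.0, 130.0, 150.0]
p2 = [190.0, 210.0, 230.0, 250.0]
opt = find_opt(p1, p2, range(100, 301, 10))
print(opt, count_boundary_errors(p1, p2, opt), count_boundary_errors(p1, p2, 200))
# prints: 160 0 1
```

In this toy data the optimal voltage (160) lies below the original Vref (200). This matches the slide: OPT moves down as cells lose charge.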

  14. Goal 1: Design a low-cost mechanism that dynamically finds the optimal read reference voltage.

  15. Retention Failure. [Figure: after some retention loss, the raw bit errors at the P1-P2 Vref and P2-P3 Vref are correctable by ECC. After significant retention loss, the P1 (10), P2 (00) and P3 (01) distributions have shifted far enough that the errors become uncorrectable.]

  16. Goal 1: Design a low-cost mechanism that dynamically finds the optimal read reference voltage. Goal 2: Design an offline mechanism to recover data after detecting uncorrectable errors.

  17. To understand the effects of retention loss: characterize retention loss using real chips.

  18. To understand the effects of retention loss: characterize retention loss using real chips. Goal 1: Design a low-cost mechanism that dynamically finds the optimal read reference voltage. Goal 2: Design an offline mechanism to recover data after detecting uncorrectable errors.

  19. Characterization Methodology: FPGA-based flash memory testing platform [Cai+, FCCM '11].

  20. Characterization Methodology • FPGA-based flash memory testing platform • Real 20- to 24-nm MLC NAND flash chips • 0 to 40 days' worth of retention loss • Room temperature (20°C) • 0 to 50k P/E cycles

  21. Characterize the effects of retention loss on: (1) the threshold voltage distribution, (2) the optimal read reference voltage, and (3) RBER and P/E cycle lifetime.

  22. (1) Threshold Voltage (Vth) Distribution. [Figure: Vth PDF of states P1, P2 and P3.]

  23. (1) Threshold Voltage (Vth) Distribution. [Figure: Vth distributions of P1, P2 and P3 after 0 days and after 40 days of retention loss.] Finding: a cell's threshold voltage decreases over time.

  24. (2) Optimal Read Reference Voltage (OPT). [Figure: OPT between adjacent states P1, P2 and P3 after 0 days and after 40 days of retention loss.] Finding: OPT decreases over time.

  25. (3) RBER and P/E Cycle Lifetime. [Figure: RBER vs. P/E cycles.]

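  26. (3) RBER and P/E Cycle Lifetime. [Figure: RBER vs. P/E cycles when reading data with 7 days' worth of retention loss. Curves are shown for Vref values progressively closer to the actual OPT and for the actual OPT itself. The ECC-correctable RBER line marks the nominal lifetime and the extended lifetime.] Finding: using the actual OPT achieves the longest lifetime.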
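The lifetime on this slide is the P/E cycle count at which RBER crosses the ECC-correctable limit. The minimal Python sketch below reads that point off a list of (P/E cycles, RBER) samples. It assumes RBER grows monotonically with P/E cycles. The function name is hypothetical, and the sample numbers used with it should be placeholders, not results from the slides.

```python
def pe_lifetime(rber_by_pe, correctable_rber):
    """Largest sampled P/E cycle count whose RBER is still ECC-correctable.

    rber_by_pe: iterable of (pe_cycles, rber) pairs.
    Assumes RBER increases with P/E cycles, so the scan stops at the first
    sample above the ECC limit. Returns 0 if even the first sample fails.
    """
    lifetime = 0
    for pe, rber in sorted(rber_by_pe):
        if rber > correctable_rber:
            break
        lifetime = pe
    return lifetime


# Placeholder curves: reading at a fixed Vref vs. at the actual OPT.
fixed_vref = [(1000, 2e-4), (2000, 6e-4), (3000, 1.5e-3), (4000, 3e-3)]
actual_opt = [(1000, 1e-4), (2000, 3e-4), (3000, 7e-4), (4000, 1.2e-3)]
print(pe_lifetime(fixed_vref, 1e-3), pe_lifetime(actual_opt, 1e-3))
```

A lower RBER curve, such as the one obtained by reading at the actual OPT, crosses the same ECC limit later. That later crossing is the extended lifetime the slide shows.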

  27. Characterization Summary. Due to retention loss: a cell's threshold voltage (Vth) decreases over time, and the optimal read reference voltage (OPT) decreases over time. Using the actual OPT for reading achieves the longest lifetime.

  28. To understand the effects of retention loss: characterize retention loss using real chips. Goal 1: Design a low-cost mechanism that dynamically finds the optimal read reference voltage. Goal 2: Design an offline mechanism to recover data after detecting uncorrectable errors.

  29. Naïve Solution: Sweeping Vref. Key idea: read the data multiple times with different read reference voltages until the raw bit errors are correctable by ECC. Pro: finds the optimal read reference voltage. Con: requires many read-retries → higher read latency.
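The sweeping approach is a simple retry loop. In this Python sketch, `read_page` and `ecc_decode` are hypothetical hooks standing in for the chip's read command and the controller's ECC decoder. The sketch assumes the decoder returns `None` when the errors exceed its correction capability. It reports how many reads were issued, which is the source of the extra latency.

```python
from typing import Callable, Iterable, Optional, Tuple


def sweep_read(read_page: Callable[[int], bytes],
               ecc_decode: Callable[[bytes], Optional[bytes]],
               vref_candidates: Iterable[int]) -> Tuple[Optional[bytes], int]:
    """Naive read-retry: try each Vref in turn until ECC decoding succeeds.

    Returns (corrected data or None, number of reads issued).
    """
    reads = 0
    for vref in vref_candidates:
        reads += 1
        data = ecc_decode(read_page(vref))
        if data is not None:
            return data, reads
    return None, reads


# Toy hooks: the "page" just records the Vref used, and decoding succeeds
# only once Vref has dropped to 170 or below (a made-up condition).
def fake_read(vref: int) -> bytes:
    return str(vref).encode()


def fake_decode(raw: bytes) -> Optional[bytes]:
    return raw if int(raw.decode()) <= 170 else None


print(sweep_read(fake_read, fake_decode, range(200, 100, -10)))
```

Each failed attempt costs a full page read plus a decode. When OPT has drifted far from the starting Vref, many such reads are needed.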

  30. Comparison of Flash Read Techniques by lifetime (P/E cycles) and performance (read latency): Fixed Vref: short lifetime, low read latency. Sweeping Vref: long lifetime, high read latency. Our goal: long lifetime and low read latency.

  31. Observations. (1) The optimal read reference voltage gradually decreases over time. Key idea: record the old OPT as a prediction (Vpred) of the actual OPT. Benefit: Vpred is close to the actual OPT → fewer read-retries. (2) The amount of retention loss is similar across pages within a flash block. Key idea: record only one Vpred for each block. Benefit: small storage overhead (768KB out of 512GB).
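The slide does not give the block size or how Vpred is encoded. The Python sketch below uses one set of assumptions that reproduces the 768KB figure: 2 MiB blocks, with three 1-byte Vpred values per block (one each for ER-P1, P1-P2 and P2-P3). These parameters are guesses for illustration, not values taken from the slides.

```python
KiB, MiB, GiB = 2**10, 2**20, 2**30


def vpred_storage_bytes(capacity_bytes, block_bytes,
                        vrefs_per_block=3, bytes_per_vref=1):
    """Bytes needed to keep one set of predicted Vrefs (Vpred) per flash block."""
    num_blocks = capacity_bytes // block_bytes
    return num_blocks * vrefs_per_block * bytes_per_vref


# Assumed: 512 GiB drive, 2 MiB blocks, three 1-byte Vpred values per block.
print(vpred_storage_bytes(512 * GiB, 2 * MiB) // KiB)  # prints 768
```

With these assumptions there are 262,144 blocks at 3 bytes each, or 786,432 bytes (768 KiB). That is about 0.00015% of the drive's capacity.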

  32. Retention Optimized Reading (ROR). Components: (1) Online pre-optimization algorithm: periodically records a Vpred for each block. (2) Improved read-retry technique: uses the recorded Vpred to minimize the read-retry count.

  33. (1) Online Pre-Optimization Algorithm • Triggered periodically (e.g., once per day) • Finds and records an OPT as the per-block Vpred • Performed in the background • Small storage overhead. [Figure: as the Vth distribution shifts down, a new Vpred is recorded below the old Vpred.]
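The slides do not say how OPT is found online. The Python sketch below shows one plausible shape for a single background pass. For each block, it tries Vrefs from the old Vpred downward, since OPT only decreases with retention loss. It then keeps the Vref with the fewest errors reported for a sample page. `count_errors` is a hypothetical hook, assuming the ECC decoder reports how many bits it corrected. The step size and search depth are also assumptions. Firmware would call this pass once per period (e.g., per day).

```python
from typing import Callable, Dict, Iterable


def pre_optimize(blocks: Iterable[int],
                 vpred: Dict[int, float],
                 count_errors: Callable[[int, float], int],
                 step: float,
                 max_steps: int) -> Dict[int, float]:
    """One background pass that refreshes the per-block Vpred table in place."""
    for block in blocks:
        start = vpred[block]
        # Search only downward from the old Vpred: OPT decreases over time.
        candidates = [start - i * step for i in range(max_steps + 1)]
        # min() returns the first minimum, so ties keep the candidate
        # closest to the old Vpred.
        vpred[block] = min(candidates, key=lambda v: count_errors(block, v))
    return vpred


# Toy error model: errors grow with distance from a made-up per-block OPT.
true_opt = {0: 170.0, 1: 185.0}
table = {0: 200.0, 1: 200.0}
pre_optimize(table.keys(), table, lambda b, v: abs(v - true_opt[b]), 5.0, 10)
print(table)
```

Because the search starts at yesterday's Vpred, the pass only has to cover one period's worth of drift. It never has to cover the full voltage range.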

  34. (2) Improved Read-Retry Technique • The first read is performed as a normal read at Vpred • Vpred is already close to the actual OPT • If reading at Vpred fails, decrease Vref and retry. [Figure: Vpred lies very close to OPT on the Vth distribution.]
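The retry policy on this slide can be sketched in Python as below. The sketch starts from the block's recorded Vpred and steps Vref down only when ECC fails. `read_and_decode` is a hypothetical hook that returns corrected data or `None`. The step size and retry limit are assumptions, not values from the slides.

```python
from typing import Callable, Dict, Optional, Tuple


def ror_read(block: int,
             vpred: Dict[int, float],
             read_and_decode: Callable[[int, float], Optional[bytes]],
             step: float,
             max_retries: int) -> Tuple[Optional[bytes], int]:
    """Read at the recorded Vpred; on ECC failure, lower Vref and retry.

    Returns (data or None, number of reads issued). At most
    1 + max_retries reads are issued.
    """
    vref = vpred[block]
    for attempt in range(1, max_retries + 2):
        data = read_and_decode(block, vref)
        if data is not None:
            return data, attempt
        vref -= step  # OPT only moves down, so retry at a lower Vref
    return None, max_retries + 1


# Toy hook: decoding succeeds once Vref is at or below 172 (made up).
def fake_read_and_decode(block: int, vref: float) -> Optional[bytes]:
    return b"ok" if vref <= 172 else None


print(ror_read(0, {0: 175.0}, fake_read_and_decode, 5.0, 8))
```

With Vpred at 175, the toy read succeeds on the second attempt. A sweep starting from a fixed Vref of 200 in steps of 5 would need seven reads under the same hook.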

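  35. Retention Optimized Reading: Summary. Flash read techniques compared by lifetime (P/E cycles) and performance (read latency): Fixed Vref: short lifetime, low read latency. Sweeping Vref: long lifetime, high read latency. ROR: lifetime 64% ↑; read latency 2.4% ↓ at the nominal lifetime and 70.4% ↓ at the extended lifetime.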

  36. To understand the effects of retention loss: characterize retention loss using real chips. Goal 1: Design a low-cost mechanism that dynamically finds the optimal read reference voltage. Goal 2: Design an offline mechanism to recover data after detecting uncorrectable errors.

  37. Retention Failure. [Figure: after some retention loss, the raw bit errors at the P1-P2 Vref and P2-P3 Vref are correctable by ECC. After significant retention loss, the P1 (10), P2 (00) and P3 (01) distributions have shifted far enough that the errors become uncorrectable.]

  38. Leakage Speed Variation. [Figure: a Vth distribution containing slow-leaking cells (S) and fast-leaking cells (F).]

  39. Initially, Right After Programming. [Figure: P2 and P3 distributions, each containing both slow-leaking (S) and fast-leaking (F) cells.]

  40. After Some Retention Loss. Fast-leaking cells have lower Vth; slow-leaking cells have higher Vth. [Figure: within the P2 and P3 distributions, F cells have shifted to lower Vth than S cells.]

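  41. Eventually: Retention Failure. [Figure: P2 and P3 distributions with slow-leaking (S) and fast-leaking (F) cells and the OPT between them. Fast-leaking P3 cells and slow-leaking P2 cells end up near OPT.]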

  42. Retention Failure Recovery (RFR). Key idea: guess the original state of a cell from its leakage speed. Three steps: (1) identify risky cells; (2) identify fast- and slow-leaking cells; (3) guess the original states.

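  43. (1) Identify Risky Cells: cells whose Vth lies between OPT - σ and OPT + σ. Key formula: risky cell + S (slow-leaking) = P2; risky cell + F (fast-leaking) = P3. [Figure: risky cells around OPT on the Vth distribution.]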

  44. (2) Identifying Fast- vs. Slow-Leaking Cells. [Figure: risky cells between OPT - σ and OPT + σ, each with unknown leakage speed (?).] Key formula: risky + S = P2; risky + F = P3.

  45. (2) Identifying Fast- vs. Slow-Leaking Cells. [Figure: the risky cells between OPT - σ and OPT + σ are now labeled slow-leaking (S) or fast-leaking (F).] Key formula: risky + S = P2; risky + F = P3.

  46. (3) Guess Original States. Key formula: risky cell + S = P2; risky cell + F = P3. [Figure: each risky cell is assigned to P2 or P3 according to its leakage speed.]
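The three RFR steps can be sketched in Python for the P2/P3 boundary shown on these slides. The reasoning works like this. A cell that now sits near OPT but leaks fast must have started higher, so it was P3. A slow-leaking cell near OPT has barely moved from P2. How leakage speed is measured is an assumption here: Vth is read a second time after some extra retention time, and a drop larger than a hypothetical `fast_drop_threshold` marks the cell as fast-leaking. The `Cell` type and the threshold value are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cell:
    vth_at_failure: float  # Vth when the uncorrectable error was detected
    vth_later: float       # Vth after extra retention time (assumed second read)


def is_risky(cell: Cell, opt: float, sigma: float) -> bool:
    """Step 1: a cell is risky if its Vth lies within OPT - sigma .. OPT + sigma."""
    return opt - sigma <= cell.vth_at_failure <= opt + sigma


def is_fast_leaking(cell: Cell, fast_drop_threshold: float) -> bool:
    """Step 2 (assumed method): a large Vth drop means a fast-leaking cell."""
    return cell.vth_at_failure - cell.vth_later > fast_drop_threshold


def guess_state(cell: Cell, opt: float, sigma: float,
                fast_drop_threshold: float) -> Optional[str]:
    """Step 3: risky + S = P2, risky + F = P3; non-risky cells are left as read."""
    if not is_risky(cell, opt, sigma):
        return None
    return "P3" if is_fast_leaking(cell, fast_drop_threshold) else "P2"


cells = [Cell(205.0, 195.0), Cell(198.0, 196.0), Cell(150.0, 140.0)]
print([guess_state(c, opt=200.0, sigma=10.0, fast_drop_threshold=5.0) for c in cells])
```

Only the risky cells are reassigned. The corrected reads are then handed back to ECC, which handles the errors that remain.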

  47. RFR Evaluation • Expected to eliminate 50% of raw bit errors • ECC can correct the remaining errors. [Figure: timeline. Program with random data; after 28 days, detect the failure and back up the data; after 12 additional days, recover the data.]

  48. To understand the effects of retention loss: characterize retention loss using real chips. Goal 1: Design a low-cost mechanism that dynamically finds the optimal read reference voltage. Goal 2: Design an offline mechanism to recover data after detecting uncorrectable errors.

  49. Conclusion. Problem: retention loss reduces flash lifetime. Overall goal: extend flash lifetime at low cost. Flash characterization: developed an understanding of the effects of retention loss in real chips. Retention Optimized Reading: a low-cost mechanism that dynamically finds the optimal read reference voltage (64% lifetime ↑, 70.4% read latency ↓). Retention Failure Recovery: an offline mechanism that recovers data after detecting uncorrectable errors (raw bit error rate 50% ↓, reduces data loss).


  51. Backup Slides

  52. RFR Motivation. Data loss can happen in many ways: (1) high P/E cycle count; (2) high temperature, which accelerates retention loss; (3) high retention age (lost power for a long time).
