
ZombieNAND: Resurrecting Dead NAND Flash for Improved SSD Longevity



  1. ZombieNAND: Resurrecting Dead NAND Flash for Improved SSD Longevity
     Ellis H. Wilson III (1, 2), Myoungsoo Jung (3), Mahmut Kandemir (1)
     (1) Department of Computer Science and Engineering, The Pennsylvania State University
     (2) Panasas, Inc.
     (3) Department of Electrical Engineering, The University of Texas at Dallas
     September 10th, 2014

  2. Motivation Simulation Results Before We Begin: Get the Slides and Paper Slides and Paper are Available At: www.ellisv3.com ellis (www.ellisv3.com) ZombieNAND

  3. Contents
     1) Motivation and Background for ZombieNAND: Background on Flash; Proof-of-Concept; Problem Statement
     2) Simulation Model and ZombieNAND Wear-Leveling: High-Fidelity Longevity Simulation; Fixing Current Wear-Leveling Shortcomings
     3) Synthetic and Trace-Driven Simulation Results: Experimental Setup; Synthetic Experiment Results; Trace-Driven Experiment Results

  4. The Present and Future of Flash
     Well-Known Flash Dynamics:
       SLC: Fast, Long Life, Small Size
       MLC: Medium, Medium Life, Medium Size
       TLC: Slow, Short Lived, Large Size
     Cells are getting smaller (i.e., slower, shorter-lived)!
     Future Flash: As consumers push towards higher capacity and SSDs slowly replace HDDs, longevity will return to the forefront of the discussion.

  5. Leveraging the Little-Known
     A Little-Known Flash Fact: SLC, MLC, and TLC are solely logical differentiations. Same underlying NAND material!
     Our Question: Given impending longevity concerns and increasing NAND diversity, can we develop a scheme that will increase longevity without sacrificing manufacturer longevity guarantees or performance?

  6. Proof-Of-Concept on Real Hardware
     [Figure: two panels, SSD A (MLC) and SSD B (MLC). Each plots Latency (us) against P/E cycle (log scale, 1 to 131072) with LSB, MSB, and Typical series, and marks the Transition Occurrence.]
     Take-away: Potential! But this (write all pages, erase, repeat) is an extremely simplified scenario.

  7. Problem Statement: How Best to Leverage This Trick?
     Sounds Simple: Just transition a block down a bit-level when it approaches death.
     Open Problems Targeted:
       Upon bit-switch, how long will the new MLC (or SLC) block survive? Can we do a double-death?
       How much does ZombieNAND extend lifetime?
       Do current-gen algorithms (e.g., wear-leveling) work with this?
       What is the impact on performance (before and after rebirth)?
       Don’t break any manufacturer guarantees!
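
     The "transition a block down a bit-level" idea can be pictured as a tiny state machine. The following is a toy sketch only: the names BitLevel, demote, and rebirths are hypothetical and do not come from the slides, and the sketch deliberately says nothing about how long a block survives at each level, which is exactly the open problem listed above.

```python
from enum import IntEnum


class BitLevel(IntEnum):
    """Bits stored per cell; DEAD means the block is retired (hypothetical names)."""
    DEAD = 0
    SLC = 1
    MLC = 2
    TLC = 3


def demote(level: BitLevel) -> BitLevel:
    """Drop a worn-out block one bit-level; a worn-out SLC block is retired."""
    if level is BitLevel.DEAD:
        raise ValueError("block is already retired")
    return BitLevel(level - 1)


def rebirths(start: BitLevel) -> int:
    """Count how many times a block that starts at `start` can be reborn."""
    count = 0
    level = start
    while demote(level) is not BitLevel.DEAD:
        level = demote(level)
        count += 1
    return count
```

     Under this sketch a TLC block can be reborn twice (as MLC, then as SLC), which is the "double-death" case, while an MLC block can be reborn once.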

  8. Simulation Framework
     Existing Simulators Fall Short:
       Existing simulators use simple block counters.
       This works when bit-levels remain constant, but when you switch...
     Extending DiskSim:
       Add a physics-accurate stress model.
       Add support to existing mechanics (e.g., garbage collection) to handle blocks of varying bit-levels.

  9. ZombieNAND Oxide Stress Model
       procedure calc_stress(cycle)
           A ← 0.08
           B ← 5.0
           Cox ← 2.15e−17
           q ← 1.6e−19
           δN_it ← A ∗ cycle^0.62
           δN_ot ← B ∗ cycle^0.30
           δV_it ← (δN_it ∗ q) / Cox
           δV_ot ← (δN_ot ∗ q) / Cox
           return (δV_it + δV_ot)
       end procedure
     Conservative estimate: We ignore charge leakage (cell recovery) due to manufacturing variability.
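
     The stress procedure on this slide translates almost line for line into Python. This is a minimal sketch, not the DiskSim extension itself: the constants are copied from the slide, whose units are not stated there, and reading the "it" and "ot" subscripts as interface traps and oxide traps (and the result as a voltage shift) is an assumption based on common flash notation.

```python
# Constants as given on the slide; the slide does not state their units.
A = 0.08         # coefficient of the cycle^0.62 term
B = 5.0          # coefficient of the cycle^0.30 term
C_OX = 2.15e-17  # Cox on the slide
Q = 1.6e-19      # q on the slide (magnitude of the elementary charge)


def calc_stress(cycle: float) -> float:
    """Return the accumulated stress (delta V_it + delta V_ot) after `cycle` P/E cycles."""
    if cycle < 0:
        raise ValueError("cycle must be non-negative")
    delta_n_it = A * cycle ** 0.62  # assumed: interface-trap term
    delta_n_ot = B * cycle ** 0.30  # assumed: oxide-trap term
    delta_v_it = (delta_n_it * Q) / C_OX
    delta_v_ot = (delta_n_ot * Q) / C_OX
    return delta_v_it + delta_v_ot
```

     Because both exponents are below 1, stress grows sub-linearly: each additional P/E cycle adds less stress than the one before it. Calling calc_stress for a few cycle counts (for example 1_000, 6_000, and 75_000) shows how slowly the curve flattens.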

  10. Limitations of Existing Wear-Leveling
     Existing Wear-Leveling Algorithms Overdo It: Early experiments with adapted DiskSim demonstrate limited improvements.
     Problem: We actually don’t want all of the cells to switch simultaneously.
     Solution: Controlled Wear-Unleveling for Lifetime

       Early Blocks ≤ ((R − W) × B) / (2 S − 2)    (1)

     R = reserved percentage, W = high-watermark percentage, B = number of blocks per element, S = starting bit-level.
     See the paper for the rest of the wear-leveling and GC algorithms.

  11. Experimental Setup: Timings
     Fixed access latencies and lifetime by bit-level:

       Access Type (unit)   SLC (2KB)   MLC (4KB)   TLC (8KB)
       Read (page)          0.025 ms    0.05 ms     0.15 ms
       Write (page)         0.2 ms      0.5 ms      1.0 ms
       Erase (block)        1.5 ms      1.5 ms      3.0 ms
       Lifetime (cycle)     75,000      6,000       1,000

     Derived from specification documents from Micron. Fixed access latencies are not reasonable for small studies, but for lifetime studies they work fine.

  12. Experimental Setup: SSD Configuration
     Key experimental SSD configurations:

                            Synthetic   Trace-Driven
       Flash Chips          1           4
       Blocks per Element   128         512
       Planes per Element   8           8
       Blocks per Plane     16          64
       Pages per Block      128         128

     Yes, these are “small” configurations (128MB and 1GB SSD sizes) relative to modern drives (often 128GB to 1TB) due to raw duration of simulation.

  13. Synthetic TLC Results: 50% Read/Write Ratio
     [Figure: two heat maps, Normalized Latency (Relative Change from Baseline) and Normalized Lifetime (Relative Improvement Over Baseline), each plotting Reserved Area (% of SSD) against Working Set Size (% of SSD).]
     Take-aways:
       1) All lifetimes are at least as long as the baseline.
       2) Latency degradations occur largely after death of baseline.
       3) Large lifetime gains (up to 16x) are not unreasonable due to huge differences in TLC/MLC/SLC P/E cycles; latency gains are accordingly less drastic.
       4) Other R/W ratios follow same trend (see paper).
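
     The third take-away can be sanity-checked against the rated P/E cycles listed on the timings slide. The sketch below only divides those rated values; it is not a lifetime prediction, and the names RATED_CYCLES and cycle_ratio are hypothetical.

```python
# Rated P/E cycles per bit-level, copied from the "Experimental Setup: Timings" slide.
RATED_CYCLES = {"SLC": 75_000, "MLC": 6_000, "TLC": 1_000}


def cycle_ratio(reborn_as: str, baseline: str = "TLC") -> float:
    """Ratio of rated P/E cycles between two bit-levels."""
    return RATED_CYCLES[reborn_as] / RATED_CYCLES[baseline]


print(cycle_ratio("MLC"))  # 6.0
print(cycle_ratio("SLC"))  # 75.0
```

     A reported gain of up to 16x over a TLC baseline sits between the 6x (MLC) and 75x (SLC) ratios of rated cycles, which is why it is plausible once blocks are reborn at lower bit-levels.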

  14. Trace-Driven TLC Simulation Lifetime Results
     [Figure: bar chart of Longevity Improvement (Relative to Baseline) per Application Trace (Fin-A, Fin-B, NFS-A, NFS-B, NFS-C, User-A, User-B, User-C, SQL-A, SQL-B) for Baseline, 10% Reserve, 20% Reserve, and 30% Reserve.]
     Take-aways:
       1) Even in the worst case, we still match the baseline.
       2) Lifetime improvements vary widely across applications.
       3) Efficacy correlates strongly with address reuse of writes (see paper for details).

  15. Trace-Driven TLC Simulation Latency Changes Over Lifetime
     For TLC, 20% Reserved Scenario (see paper for rest)
     [Figure: Latency Relative to Baseline against Normalized Lifetime (100=Baseline Death, 200=Death), one series per trace (Fin-A, Fin-B, NFS-A, NFS-B, NFS-C, User-A, User-B, User-C, SQL-A, SQL-B), with Baseline Death marked.]
     Take-aways:
       1) Performance matches or exceeds the baseline up until the last 5% of baseline life.
       2) Some traces get even better after baseline death and some get far worse, which is still preferable to complete death.
       3) Spill-over accesses drive performance loss in the post-baseline area.

  16. Questions?
