Ouroboros Wear-leveling: A Two-level Hierarchical Wear-leveling Model for NVRAM
Qingyue Liu, Peter Varman
ECE Department, Rice University
May 18, 2017
New Challenges for New Technologies
RRAM, PCM, 3D XPoint
• Advantages
  – High density: easy to scale down under 10 nm
  – Non-volatile
  – In-place update
  – Low leakage power
• Major drawback
  – Lifetime endurance problem
  – PCM: 10^7 to 10^8 writes per cell
  – In practice, lifetime is around 20x shorter without wear-leveling
Wear-leveling (WL)
• A technique for prolonging the service life of some kinds of erasable computer storage media
• Block migration across the memory according to certain rules
  – Move high-usage blocks to low-usage frames
• Aim: distribute writes evenly across the memory
[Figure: frames 0 to 7 holding blocks A to H with per-frame usage counts; on a write to A, hot block A is migrated from a high-usage frame to a low-usage frame.]
SSD WL vs. NVRAM WL
• Solid State Disk (SSD)
  – Written out-of-place
  – Granularity:
    ➢ Read/write: page
    ➢ Erase: block
  – Requires garbage collection
• NVRAM
  – In-place writing
  – Granularity:
    ➢ Read/write: byte
    ➢ No erase
  – No garbage collection
• NVRAM has more freedom and can do better
  – No complex design for garbage collection
  – Fine-grained wear-leveling
  – Allows both algebraic and fully associative logical-to-physical mappings
Outline
• Background
• Previous Work
• Our Contributions
  – Hierarchical Ouroboros Wear-leveling
  – System Design
    • Architecture
    • Parameter selection
  – Experiments and Results
• Conclusion
Previous Work: NVRAM
• Wear-leveling using restricted algebraic mappings
  – No address mapping table
  – Granularity: memory line (cache line)
  – Example: Start-Gap Wear-leveling [1]
• Wear-leveling using fully associative mappings
  – Additional address mapping table needed
  – Granularity: block
  – Example: Segment Swapping [2], PCM-aware swap [3]
[1] Qureshi et al., "Enhancing lifetime and security of PCM-based main memory with start-gap wear-leveling," MICRO, 2009.
[2] Zhou et al., "A durable and energy efficient main memory using phase change memory technology," ISCA, 2009.
[3] A. P. Ferreira, M. Zhou, S. Bock, B. Childers, R. Melhem, and D. Mossé, "Increasing PCM main memory lifetime," in Proceedings of the Conference on Design, Automation and Test in Europe. European Design and Automation Association, 2010, pp. 914-919.
Start-Gap Method Analysis
• Advantages:
  – Distributes writes smoothly within the frame
  – Small space overhead
  – Simple algorithm
• Disadvantages:
  – Region size is limited since only 1 line is relocated at a time
  – May not use the whole region to distribute the writes
[Figure: a frame of memory lines with Start and Gap pointers; successive panels show the GapLine moving through the frame as lines are written.]
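The Start-Gap rule referenced above can be sketched as follows. This is a minimal simulation of the scheme described by Qureshi et al. [1], not code from the talk: a region of n logical lines occupies n + 1 physical slots, and two registers (Start and Gap) define the mapping. The class name, the `gap_interval` parameter, and the wear counters are assumptions added for illustration.

```python
class StartGap:
    """Start-Gap style remapping for one region of n logical lines.

    The region has n + 1 physical slots; the extra slot is the gap.
    """

    def __init__(self, n_lines, gap_interval):
        self.n = n_lines
        self.start = 0                      # rotation applied to every logical address
        self.gap = n_lines                  # physical index of the empty slot
        self.gap_interval = gap_interval    # hypothetical: demand writes between gap moves
        self.writes = 0
        self.cells = [None] * (n_lines + 1)  # simulated physical lines
        self.wear = [0] * (n_lines + 1)      # per-slot write counts

    def translate(self, logical):
        physical = (logical + self.start) % self.n
        if physical >= self.gap:
            physical += 1                   # skip over the gap slot
        return physical

    def _move_gap(self):
        if self.gap == 0:
            # Wrap around: the line in the last slot moves into slot 0.
            self.cells[0] = self.cells[self.n]
            self.cells[self.n] = None
            self.wear[0] += 1
            self.gap = self.n
            self.start = (self.start + 1) % self.n
        else:
            # The line just before the gap moves into the gap.
            self.cells[self.gap] = self.cells[self.gap - 1]
            self.cells[self.gap - 1] = None
            self.wear[self.gap] += 1
            self.gap -= 1

    def write(self, logical, value):
        slot = self.translate(logical)
        self.cells[slot] = value
        self.wear[slot] += 1
        self.writes += 1
        if self.writes % self.gap_interval == 0:
            self._move_gap()

    def read(self, logical):
        return self.cells[self.translate(logical)]
```

Only one line is copied per gap move, which is the source of the small overhead and also of the limited region size noted in the disadvantages.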
Segment Swap vs. PCM-aware Swap
• PCM-aware Swap:
  – Periodically swap content in highest-usage frame with content in random frame
• Segment Swap:
  – Periodically swap content in highest-usage frame with content in lowest-usage frame
• Advantages:
  – Can involve all of the memory space in wear-leveling
  – Can easily be implemented
[Figure: frames 0 to 7 holding blocks A to H, with the content of two frames being swapped.]
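The two swap rules, as this slide describes them, differ only in how the partner frame is chosen. The sketch below makes that explicit. The function names, the usage values, and the assumption that a swap costs one extra write per frame are all illustrative and not taken from the cited papers.

```python
import random


def segment_swap_target(usage, hot_frame):
    """Deterministic choice: the lowest-usage frame other than hot_frame."""
    candidates = [f for f in range(len(usage)) if f != hot_frame]
    return min(candidates, key=lambda f: usage[f])


def pcm_aware_swap_target(usage, hot_frame, rng):
    """Randomized choice: any frame other than hot_frame, picked uniformly."""
    candidates = [f for f in range(len(usage)) if f != hot_frame]
    return rng.choice(candidates)


def swap_step(frame_to_block, usage, pick_target):
    """Swap the content of the highest-usage frame with the chosen target frame."""
    hot = max(range(len(usage)), key=lambda f: usage[f])
    target = pick_target(usage, hot)
    frame_to_block[hot], frame_to_block[target] = (
        frame_to_block[target],
        frame_to_block[hot],
    )
    # Assumption: the swap itself writes each of the two frames once.
    usage[hot] += 1
    usage[target] += 1
    return hot, target


# Illustrative usage counts for eight frames holding blocks A to H.
usage = [800, 20, 270, 6, 80, 600, 100, 96]
frames = list("ABCDEFGH")
swap_step(frames, usage, segment_swap_target)   # A moves to the coldest frame

rng = random.Random(0)
swap_step(frames, usage, lambda u, h: pcm_aware_swap_target(u, h, rng))
```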
Analysis of 2 Swap Methods: A* Pattern
[Figure: three panels titled Without Wear-leveling, Segment Swap, and PCM-aware Swap.]
• A* Pattern: write to the same logical block A continuously
• Deterministic swap is better than randomized swap under the right conditions
Analysis of 2 Swap Methods: AB* Pattern
[Figure: three panels titled Without Wear-leveling, Segment Swap, and PCM-aware Swap.]
• AB* Pattern: alternate writes to two logical blocks A and B (catastrophic pattern for Segment Swap)
• Randomized swap is better than deterministic swap in bad cases
NVRAM Model
• Memory is partitioned into frames
• Each frame holds a block
• A block holds a set of memory lines
• A block is assumed to have a consecutive address range
[Figure: memory divided into frames, each holding one block made up of memory lines.]
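A small sketch of this model, assuming fixed-size blocks so that a block's consecutive address range follows from integer division. The constant `LINES_PER_BLOCK` and the helper names are hypothetical; the value 4 is chosen only to keep the example small.

```python
LINES_PER_BLOCK = 4  # hypothetical block size, in memory lines


def block_of(line_addr):
    """Logical block that contains a given logical line address."""
    return line_addr // LINES_PER_BLOCK


def offset_of(line_addr):
    """Position of the line inside its block."""
    return line_addr % LINES_PER_BLOCK


def lines_of(block):
    """Consecutive logical line addresses covered by a block."""
    first = block * LINES_PER_BLOCK
    return range(first, first + LINES_PER_BLOCK)


# frame_of_block[b] is the frame currently holding block b (one block per frame).
frame_of_block = [0, 1, 2, 3, 4, 5]
```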
Hierarchical Ouroboros Wear-leveling
• Aim:
  – Make the write distribution as smooth as possible
• Level 1: Local WL within frames
  – Start-Gap-like rule
  – Smooth distribution of writes within a frame
  – Granularity: memory line
  – Aim: make the expensive, large-block global WL less frequent
Hierarchical Ouroboros Wear-leveling
• Level 2: Global WL across frames
  – Exploit demand prediction to direct global wear-leveling
  – Use randomization in block migration to avoid worst-case behavior
  – Smooth distribution of writes across frames
  – Granularity: frame
  – Aim: involve all of the memory space in wear-leveling
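One way to picture the two levels together is as a two-step address translation: a table lookup for the block-to-frame mapping (global level), followed by an algebraic Start-Gap-like rotation inside the frame (local level). The sketch below rests on several assumptions not stated in the slides: each frame carries one spare line, the Start and Gap registers are kept per frame, and frames are laid out back to back.

```python
LINES_PER_BLOCK = 4  # hypothetical block size, in memory lines


def local_slot(offset, start, gap, n=LINES_PER_BLOCK):
    """Level 1: Start-Gap-like mapping of a line offset to a slot in the frame."""
    slot = (offset + start) % n
    return slot + 1 if slot >= gap else slot


def translate(line_addr, block_to_frame, frame_regs):
    """Map a logical line address to a physical line address.

    block_to_frame: list, block number -> frame number (level 2 table)
    frame_regs:     dict, frame number -> (start, gap) registers (level 1)
    """
    block, offset = divmod(line_addr, LINES_PER_BLOCK)
    frame = block_to_frame[block]
    start, gap = frame_regs[frame]
    # Assumption: every frame spans LINES_PER_BLOCK + 1 physical lines.
    return frame * (LINES_PER_BLOCK + 1) + local_slot(offset, start, gap)


block_to_frame = [2, 0, 1]
frame_regs = {0: (0, 4), 1: (1, 2), 2: (3, 0)}
physical = [translate(a, block_to_frame, frame_regs) for a in range(12)]
```

Only the level 2 table needs storage proportional to the number of blocks; the level 1 mapping needs two registers per frame.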
Global Wear-Leveling Framework
Demand-based Ouroboros Migration
• Inputs
  1. Usage counter of each physical frame (U)
  2. Prediction of the number of future writes to each logical block (P)
    – Repetitive workloads
    – Program analysis (embedded applications)
    – Use recent activity (demand) as predictor
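The last option, using recent activity as the predictor, could look like the sketch below: writes per logical block are counted over a window, and the counts become the prediction vector P for the next window. The class name, the windowing, and the optional smoothing weight `alpha` are assumptions for illustration; the slides do not specify how the recent-activity predictor is computed.

```python
from collections import Counter


class RecentActivityPredictor:
    """Predict per-block demand P from writes observed in recent windows."""

    def __init__(self, n_blocks, alpha=1.0):
        # Hypothetical smoothing weight: 1.0 uses only the last window,
        # smaller values blend in older windows.
        self.alpha = alpha
        self.window = Counter()
        self.prediction = [0.0] * n_blocks

    def record_write(self, block):
        self.window[block] += 1

    def end_window(self):
        """Close the current window and return the updated vector P."""
        for b in range(len(self.prediction)):
            self.prediction[b] = (
                self.alpha * self.window[b]
                + (1 - self.alpha) * self.prediction[b]
            )
        self.window.clear()
        return list(self.prediction)


predictor = RecentActivityPredictor(n_blocks=3)
for block in (0, 0, 1):
    predictor.record_write(block)
P = predictor.end_window()
```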
Global Wear-Leveling Framework
1. Collect statistics:
  • Estimate future demand of each block to form a vector P
  • Collect current usage for each frame to form a vector U
2. Generate raw block migration mapping
  • Aim: map the i-th hottest (highest demand) block to the i-th coldest (lowest usage) frame
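Step 2 amounts to sorting blocks by descending demand, sorting frames by ascending usage, and pairing them by rank. A minimal sketch follows; the function name and the demand and usage values are illustrative assumptions, and ties are broken by input order, which the slides do not specify.

```python
def raw_migration(demand, usage):
    """Pair the i-th hottest block with the i-th coldest frame.

    demand: dict, block -> predicted writes (vector P)
    usage:  list, frame index -> writes so far (vector U)
    Returns a dict, block -> destination frame.
    """
    hot_to_cold_blocks = sorted(demand, key=lambda b: -demand[b])
    cold_to_hot_frames = sorted(range(len(usage)), key=lambda f: usage[f])
    return dict(zip(hot_to_cold_blocks, cold_to_hot_frames))


demand = {"A": 5, "B": 10, "C": 15, "D": 0, "E": 0, "F": 0}
usage = [10, 5, 100, 40, 6, 20]
mapping = raw_migration(demand, usage)

# Final block order, listed by frame index.
final_order = "".join(sorted(mapping, key=mapping.get))
```

Note that the raw mapping relocates every block, including those with no predicted demand, which is the cost the later pruning step is meant to avoid.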
Raw Block Migration
[Figure: worked example of raw block migration. Panels: Initialization (Logical Block with demand P, Physical Frame with usage U), Hot-to-Cold Blocks, Cold-to-Hot Frames, Final Block Order.]
Global Wear-Leveling Framework
1. Collect statistics:
  • Estimate future demand of each block to form a vector P
  • Collect current usage for each frame to form a vector U
2. Generate raw block migration mapping
  • Aim: map the i-th hottest (highest demand) block to the i-th coldest (lowest usage) frame
3. Classification step:
  • Identify a hot pool with up to K hottest blocks that meet a minimum demand threshold
4. Pruning step:
  • Move only blocks in the hot pool to deterministic frames
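Steps 3 and 4 can be sketched as below. Only hot-pool blocks are sent to their rank-matched cold frames; a cold block displaced by a hot one is placed in the frame vacated at the start of its chain, which closes the migration ring. The function names, the threshold value, and the example data are assumptions for illustration.

```python
def hot_pool(demand, k, min_demand):
    """Up to k hottest blocks whose predicted demand meets the threshold."""
    ranked = sorted(demand, key=lambda b: -demand[b])
    return [b for b in ranked if demand[b] >= min_demand][:k]


def pruned_migration(block_to_frame, demand, usage, k, min_demand):
    """Return the new block -> frame mapping after pruned migration."""
    hot = hot_pool(demand, k, min_demand)
    coldest = sorted(range(len(usage)), key=lambda f: usage[f])[:len(hot)]
    target = dict(zip(hot, coldest))             # hot block -> cold frame
    mover_into = {f: b for b, f in target.items()}
    occupant = {f: b for b, f in block_to_frame.items()}

    new_map = dict(block_to_frame)
    new_map.update(target)
    for frame in coldest:
        displaced = occupant[frame]
        if displaced in target:
            continue                             # a hot block; it has its own target
        # Walk back along the chain of movers to the frame that ends up empty.
        vacated = frame
        while vacated in mover_into:
            vacated = block_to_frame[mover_into[vacated]]
        new_map[displaced] = vacated
    return new_map


block_to_frame = {"A": 0, "B": 1, "C": 2, "D": 3, "E": 4, "F": 5}
demand = {"A": 5, "B": 10, "C": 15, "D": 0, "E": 0, "F": 0}
usage = [10, 5, 100, 40, 6, 20]
new_map = pruned_migration(block_to_frame, demand, usage, k=2, min_demand=10)
final_order = "".join(sorted(new_map, key=new_map.get))
```

With these values only three blocks move (C, B, and the displaced E), compared with all six under the raw mapping.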
Block Migration with Pruning Method
[Figure: worked example of block migration with pruning, |H| = |C| = 2. Panels: Initialization (Logical Block with demand P, Physical Frame with usage U), Hot-to-Cold Blocks, Cold-to-Hot Frames, Final Block Order.]
Deterministic Block Migration Ring
[Figure: Deterministic Block Migration ring linking hot blocks and a cold block across frames.]
Ouroboros Block Migration Ring
[Figure: two rings side by side. Deterministic Block Migration: a ring of hot and cold blocks only. Ouroboros Block Migration Ring: the same ring extended with a random free block drawn from the Free Frame Pool.]
Global Wear-Leveling Framework
1. Collect statistics:
  • Estimate future demand of each block to form a vector P
  • Collect current usage for each frame to form a vector U
2. Generate raw block migration mapping
  • Aim: map the i-th hottest (highest demand) block to the i-th coldest (lowest usage) frame
3. Classification step:
  • Identify a hot pool with up to K hottest blocks that meet a minimum demand threshold
4. Pruning step:
  • Move only blocks in the hot pool to deterministic frames
5. Randomization step:
  • Identify free frame pool with more than K free frames for randomization
6. Form Ouroboros block migration ring for block relocation
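Steps 5 and 6 can be sketched as an extension of the pruned ring: instead of sending a displaced cold block straight to the vacated frame, it is sent to a randomly chosen frame from the free frame pool, and the block that was in that free frame takes the vacated frame. The slides do not spell out how the free frame pool is selected, so this sketch takes it as an input; the function name, the pool contents, and the example data are assumptions.

```python
import random


def ouroboros_ring(block_to_frame, target, free_pool, rng):
    """Return the new block -> frame mapping with a randomized ring.

    target:    dict, hot block -> deterministic cold frame (from pruning)
    free_pool: frames available for randomization (assumed given)
    """
    occupant = {f: b for b, f in block_to_frame.items()}
    mover_into = {f: b for b, f in target.items()}
    new_map = dict(block_to_frame)
    new_map.update(target)

    # Keep only pool frames not otherwise involved in the migration.
    pool = [f for f in free_pool
            if f not in mover_into and occupant[f] not in target]
    rng.shuffle(pool)

    for frame in target.values():
        displaced = occupant[frame]
        if displaced in target:
            continue                        # a hot block; it has its own target
        vacated = frame
        while vacated in mover_into:        # walk back to the frame left empty
            vacated = block_to_frame[mover_into[vacated]]
        if pool:
            free_frame = pool.pop()
            new_map[displaced] = free_frame            # random destination
            new_map[occupant[free_frame]] = vacated    # free block closes the ring
        else:
            new_map[displaced] = vacated               # fall back to deterministic
    return new_map


block_to_frame = {"A": 0, "B": 1, "C": 2, "D": 3, "E": 4, "F": 5}
target = {"C": 1, "B": 4}                   # hot pool of size 2
new_map = ouroboros_ring(block_to_frame, target, free_pool=[3, 5],
                         rng=random.Random(0))
```

The hot blocks still land in the coldest frames, while the destination of the displaced block is no longer predictable from the usage counters alone.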
Block Migration with Randomization
[Figure: worked example of block migration with randomization, |H| = |C| = 2, |F| = 2. Panels: Initialization (Logical Block with demand P, Physical Frame with usage U), Hot-to-Cold Blocks, Cold-to-Hot Frames, Free Frame Pool, Final Block Order.]