Mellow Writes: Extending Lifetime in Resistive Memories through Selective Slow Write Backs • Lunkai Zhang, Brian Neely, Diana Franklin, Dmitri Strukov, Yuan Xie, Frederic T. Chong • Presented at ISCA 2016 • EECS 573, University of Michigan, Ann Arbor • Presented by Nishil Talati and Tarunesh Verma 1
Executive Summary • DRAM scaling problem • RRAM to the rescue • Improve RRAM lifetime • Trade off performance for extended lifetime 2
RRAM to the Rescue • Resistive RAM (RRAM) • More scalable compared to DRAM • Does not solve the DRAM scaling problem • Shortcomings – Longer write latency – Higher write energy – Limited endurance 3
Trade-off Between Endurance and Write Latency in RRAM • Write operation: high power, short time, low endurance vs. low power, long time, high endurance (Courtesy: Zhang et al., ISCA 2016) • For typical resistive memory technologies, slower writes are predicted to have a quadratic endurance advantage. D. B. Strukov, “Endurance-write-speed tradeoffs in nonvolatile memories,” Applied Physics A, vol. 122, no. 4, pp. 1–4, 2016. 4
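The trade-off on this slide can be written as a one-line model. The sketch below is illustrative only: the helper name `endurance_multiplier` and its `exponent` parameter are assumptions made for this example, with `exponent=2` standing for the quadratic prediction quoted above.

```python
def endurance_multiplier(latency_multiplier: float, exponent: float = 2.0) -> float:
    """Relative endurance of a write slowed down by `latency_multiplier`.

    Assumes endurance scales as latency_multiplier ** exponent.
    exponent=2 is the quadratic case quoted on the slide; the function
    itself is a hypothetical helper, not something defined in the paper.
    """
    if latency_multiplier <= 0:
        raise ValueError("latency_multiplier must be positive")
    return latency_multiplier ** exponent


if __name__ == "__main__":
    for slowdown in (1.0, 2.0, 3.0):
        gain = endurance_multiplier(slowdown)
        print(f"{slowdown:.1f}x latency -> {gain:.2f}x endurance")
```

Passing a different `exponent` gives a quick way to see how sensitive the benefit is to the assumed device model.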
Is Having a Single Write Latency Wise? • Short write latency: limited lifetime • Long write latency: poor performance • Is it possible to let a system adaptively use different write latencies to improve the lifetime without loss of performance? 5
Motivation – Typical Bank Utilization Trends (Courtesy: Zhang et al., ISCA 2016) • Memory banks are idle most of the time • Is it possible to use the bank idle time to slowly write back the data? 6
Proposed Schemes • Mellow Writes – Bank-Aware Mellow Writes – Eager Mellow Writes • Wear Quota 7
Motivation: Bank Level Imbalance [Figure: # awaiting writes per bank: 1, 2, 3, 1] (Courtesy: Zhang et al., ISCA 2016) • Bank 0 has only 1 memory block to be written back • Bank 2 has more memory blocks to be written back • Intuition: slow writes to bank 0 and fast writes to bank 2 9
Bank-Aware Mellow Writes [Figure: # awaiting writes per bank: 1, 2, 3, 1] (Courtesy: Zhang et al., ISCA 2016) • Proposed approach: write back a memory block slowly only when no other memory block is queued for the same bank • Write back the only memory block for Bank 0 at slow speed • Write back the current memory block for Bank 2 at normal speed 10
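The rule on this slide fits in a few lines. The following is a minimal sketch, assuming the write queue is modelled as a plain list of target bank numbers; `pick_write_speed` and the example queue contents are hypothetical, not taken from the paper.

```python
from collections import Counter
from typing import Sequence

SLOW, NORMAL = "slow", "normal"


def pick_write_speed(write_queue_banks: Sequence[int], bank: int) -> str:
    """Speed for the next write issued to `bank`.

    `write_queue_banks` holds the target bank of every queued write,
    including the one about to be issued. The write is slow only when
    it is the sole write queued for its bank.
    """
    awaiting = Counter(write_queue_banks)
    if awaiting[bank] == 0:
        raise ValueError(f"no write queued for bank {bank}")
    return SLOW if awaiting[bank] == 1 else NORMAL


if __name__ == "__main__":
    # Hypothetical queue: one write for bank 0, three for bank 2.
    queue = [0, 2, 2, 2]
    for b in (0, 2):
        print(f"bank {b}: {pick_write_speed(queue, b)} write")
```

A real memory controller would track per-bank counts incrementally rather than recounting the queue on every issue; the recount here just keeps the sketch short.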
Simulated System • OoO Alpha core • 32KB L1 I/D-$, 256KB L2$, 2MB L3$ (LLC) • 4GB resistive main memory (ReRAM technology), 16 banks (across 4 ranks), 32-entry read/write queues, write drain, Start-Gap wear leveling (1.0x latency = 150ns, 1.00x endurance = 5.0 × 10^6): – Norm Writes (1.0x): 1.00x latency, 1.00x endurance – Slow Writes (3.0x): 3.00x latency, 9.00x endurance • Eight-year lifetime requirement 11
Effectiveness of Bank-Aware Mellow Writes (Courtesy: Zhang et al., ISCA 2016) • No noticeable performance degradation. • Geomean 87% lifetime improvement compared with All-Norm. • 4 out of 11 applications meet the 8-year lifetime requirement. 12
Schemes • Mellow Writes – Bank-Aware Mellow Writes – Eager Mellow Writes • Wear Quota 13
Motivation: Write Scheduling Imbalance in a Memory Bank [Figure: with Bank-Aware Mellow Writes, some intervals are too crowded and others are wasted; compared against the case where the writes could be evenly rescheduled] (Courtesy: Zhang et al., ISCA 2016) • Is it possible to reschedule the writes? 14
Eager Mellow Writes • Predict which dirty cache lines in the Last Level Cache will not be written again before their eviction, and write these cache lines back eagerly and slowly • In effect, the Last Level Cache is treated as a large write buffer from which suitable write backs are picked to fill the idle memory intervals 15
Choosing Cache Lines for Eager Mellow Writes [Figure: Last Level Cache sets 0 to 3, each split into lines predicted useful and lines predicted useless; the predicted-useless lines are candidates for Eager Mellow Writes if dirty] (Courtesy: Zhang et al., ISCA 2016) • This paper chooses dirty cache lines that are predicted to be useless as the candidates for Eager Mellow Writes, that is, cache lines that will not be accessed again before their eviction 16
A Utility-Based Approach to Predict Useless Cache Lines (Courtesy: Zhang et al., ISCA 2016) For an LRU set-associative Last Level Cache (LLC): • Add an access counter for each LRU stack position in the LLC • Increment the corresponding access counter whenever an access hits at an LRU position • For every time slice (500,000 cycles), choose the consecutive least-used LRU positions whose counters sum to less than 1/32 of the LLC accesses • In the next time slice, treat the cache lines at these LRU positions as useless; they can be eagerly written back Moinuddin K. Qureshi & Yale N. Patt, “Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches”, MICRO'06. 17
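The selection step at the end of each time slice can be sketched as follows. This is a rough illustration under stated assumptions: position 0 is the most recently used end of the stack, the "least-used" positions are taken to be the ones at the LRU end, and `useless_lru_positions` with its parameters is a hypothetical name chosen for this example.

```python
from typing import List


def useless_lru_positions(hits_per_position: List[int],
                          total_accesses: int,
                          threshold_fraction: float = 1 / 32) -> List[int]:
    """LRU stack positions to treat as useless in the next time slice.

    hits_per_position[i] is the hit count observed at stack position i
    during the slice (0 = most recently used, last = least recently used).
    Walks from the LRU end toward the MRU end and keeps extending the
    group while its summed hits stay below threshold_fraction of the
    LLC accesses.
    """
    budget = threshold_fraction * total_accesses
    running = 0
    useless = []
    for pos in range(len(hits_per_position) - 1, -1, -1):
        running += hits_per_position[pos]
        if running >= budget:
            break
        useless.append(pos)
    return sorted(useless)


if __name__ == "__main__":
    # Made-up counters for an 8-way set-associative LLC.
    hits = [500, 200, 100, 50, 20, 10, 5, 3]
    print(useless_lru_positions(hits, total_accesses=1000))
```

With these made-up counters the budget is 1000/32 = 31.25 hits, so positions 7, 6 and 5 (3 + 5 + 10 = 18 hits) qualify, and adding position 4 would exceed the budget.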
Architectural Modifications (Courtesy: Zhang et al., ISCA 2016) • Eager Mellow Write Requests • Eager Mellow Queue: lowest priority, no write drains, slow writes only 18
Effectiveness of Eager Mellow Writes (Courtesy: Zhang et al., ISCA 2016) • No performance degradation, even some performance benefit. • Geomean 158% lifetime improvement compared with All-Norm. • 6 out of 11 applications meet the 8-year lifetime requirement. • 5 applications still suffer from short lifetime! 19
Schemes • Mellow Writes – Bank-Aware Mellow Writes – Eager Mellow Writes • Wear Quota 20
Wear Quota • Partition the time into time slices [Figure: the total amount of available wear of the resistive main memory is spread over the expected lifetime, giving a wear quota per time slice] (Courtesy: Zhang et al., ISCA 2016) • Wear Quota (per bank): the average available wear of each time slice. 21
Wear Quota [Figure: wear compared with the wear quota at each of four time slices] (Courtesy: Zhang et al., ISCA 2016) • Time Slice 1 (within wear quota): Mellow Writes Policy • Time Slice 2 (within wear quota): Mellow Writes Policy • Time Slice 3 (exceeding wear quota): All-Slow Writes Policy • Time Slice 4 (within wear quota): Mellow Writes Policy 22
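The per-slice policy choice can be sketched as below. Two points are assumptions of this sketch rather than statements from the slides: wear is compared cumulatively (total wear so far against quota times slices elapsed), and the function names and example wear numbers are hypothetical.

```python
from typing import List

MELLOW, ALL_SLOW = "mellow", "all-slow"


def wear_quota_per_slice(total_available_wear: float, lifetime_slices: int) -> float:
    """Average wear a bank may spend per time slice over the expected lifetime."""
    if lifetime_slices <= 0:
        raise ValueError("lifetime_slices must be positive")
    return total_available_wear / lifetime_slices


def policies_for_slices(wear_per_slice: List[float], quota: float) -> List[str]:
    """Policy picked at the start of each slice.

    wear_per_slice[k] is the wear the bank accumulated during slice k.
    A slice runs the all-slow policy when the wear accumulated before it
    exceeds the quota granted for the slices already elapsed.
    """
    policies = []
    cumulative_wear = 0.0
    for elapsed, wear in enumerate(wear_per_slice):
        within_quota = cumulative_wear <= quota * elapsed
        policies.append(MELLOW if within_quota else ALL_SLOW)
        cumulative_wear += wear
    return policies


if __name__ == "__main__":
    quota = wear_quota_per_slice(total_available_wear=100.0, lifetime_slices=10)
    # Made-up wear trace in which slice 2 overshoots the quota.
    print(policies_for_slices([8.0, 14.0, 6.0, 9.0], quota))
```

With this made-up trace the overshoot in the second slice forces the third slice into the all-slow policy, after which the bank is back within quota, mirroring the four-slice sequence on the slide.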
Effectiveness of Wear Quota (Courtesy: Zhang et al., ISCA 2016) • All 11 applications meet the 8-year lifetime requirements. • Does not degrade the performance if the lifetime requirement is already met. • Degrades the performance only when necessary! 23
Technical Insights and Conclusion – A dynamic trade-off between write latency and endurance. – Two Mellow Writes schemes which improve the lifetime without sacrificing the performance. – Wear Quota scheme which guarantees a minimal lifetime with relatively small performance loss. – Low hardware overhead and easy to implement.
Discussion • System biased to support their hypothesis – Single-core vs. multi-core environment – Open page vs. relaxed closed page policy • Memory-controller-level write forwarding for the eager mellow write queue • Performance improves in the case of eager mellow writes • Wear quota vs. remapping dead memory cells 25
Backup Slides 26
How About Energy? • Operation level: a 3x slow write consumes 66% more energy than a normal write. • Total memory energy consumption of the execution (Courtesy: Zhang et al., ISCA 2016): on average, less than 50% more memory energy compared with the All-Norm policy. • An affordable cost compared with the lifetime benefit. 27
Sensitivity to Analytic Model (Courtesy: Zhang et al., ISCA 2016) • In a typical ReRAM technology, compared with default speed writes, slow writes are predicted to achieve a quadratic endurance benefit. Based on a wider range of device parameters, the endurance benefit could be linear to cubic. • What will happen if we have a different endurance benefit? • Even with a pessimistic linear endurance benefit, we can still achieve 47% lifetime improvement. D. B. Strukov, “Endurance-write-speed tradeoffs in nonvolatile memories,” Applied Physics A, vol. 122, no. 4, pp. 1–4, 2016. 28