Nonblocking Memory Refresh
Kate Nguyen, Kehan Lyu, Xianze Meng, Vilas Sridharan, Xun Jian
History of DRAM
[Figure: timeline from the DRAM patent (1968) through DDR (2000), DDR2 (2003), DDR3 (2007), and DDR4 (2014) to the 50th anniversary of the DRAM patent (2018), plotting refresh latency, bus cycle time, and minimum read latency (ns). Work on skipping refresh appears from 2012 to 2017: ISCA '12, HPCA '13, HPCA '14, ISCA '15, ISCA '17, MICRO '17.]
Issues with Skipping Refresh
[Figure: memory cells vs. refresh interval (ms) for DRAM chips tested from different manufacturers.]
Y. Kim, R. Daly, J. Kim, C. Fallin, J. H. Lee, D. Lee, C. Wilkerson, K. Lai, and O. Mutlu, "Flipping bits in memory without accessing them: An experimental study of DRAM disturbance errors," in 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA), pp. 361–372, June 2014.
Skipping refresh reduces memory security: a longer refresh interval leaves more time for repeated row accesses to flip bits in neighboring cells (disturbance errors) before those cells are restored.
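Why DRAM Refresh Hurts Performance
[Figure: a DRAM cell (one transistor and a storage capacitor, on an address line and a bit line) next to a six-transistor SRAM cell (T1–T6, on a word line and complementary bit lines), contrasting blocking refresh with nonblocking refresh.]
DRAM stores each bit as charge on a leaky capacitor, so it must be refreshed periodically, and conventional refresh blocks accesses to the data being refreshed.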
Our Proposal: Nonblocking Refresh
• Improve performance while retaining the same level of security as the conventional baseline.
• Transform DRAM refresh, at the system level, into a background operation, as in static RAM (SRAM).
• Refresh DRAM in the background without stalling read accesses to the memory blocks being refreshed.
How Nonblocking Refresh Works
[Figure: with conventional refresh, pending read requests to a refreshing memory block are stalled; with Nonblocking Refresh, the refreshing data is calculated from the block's redundant data.]
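The contrast between stalling and calculating can be illustrated with a toy latency model. This is a sketch under stated assumptions, not the authors' simulator: the names `Refresh`, `read_completion_ns`, `READ_LATENCY_NS`, and `RECONSTRUCT_NS` are hypothetical, and all timing constants are illustrative values rather than datasheet numbers.

```python
from dataclasses import dataclass

# Hypothetical timing parameters in nanoseconds (illustrative only).
READ_LATENCY_NS = 50   # assumed time to serve a read from accessible chips
RECONSTRUCT_NS = 5     # assumed cost of calculating data from redundancy


@dataclass
class Refresh:
    start_ns: int  # when refresh of the memory block begins
    end_ns: int    # when refresh of the memory block completes


def read_completion_ns(arrival_ns: int, refresh: Refresh, nonblocking: bool) -> int:
    """Return the time at which a read arriving at arrival_ns completes."""
    refreshing = refresh.start_ns <= arrival_ns < refresh.end_ns
    if not refreshing:
        return arrival_ns + READ_LATENCY_NS
    if nonblocking:
        # Read the accessible chips and calculate the refreshing data
        # from redundant data instead of waiting for refresh to finish.
        return arrival_ns + READ_LATENCY_NS + RECONSTRUCT_NS
    # Conventional refresh: the pending read stalls until refresh ends.
    return refresh.end_ns + READ_LATENCY_NS


if __name__ == "__main__":
    ref = Refresh(start_ns=0, end_ns=350)
    print(read_completion_ns(100, ref, nonblocking=False))  # 400: stalled
    print(read_completion_ns(100, ref, nonblocking=True))   # 155: not stalled
```

In this model the blocking penalty grows with the remaining refresh time, while the nonblocking path pays only a small, assumed reconstruction cost.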
Leveraging Existing Redundant Data for Free
[Figure, left: each memory block in server memory holds program data plus redundant data (12.5%–40.6%) for hardware failure protection. Right: percentage of pages that remain fault-free, on average, in each year of operation (years 1–7); average 97%.]
Because about 97% of pages remain fault-free on average, the redundant data is rarely needed for error correction. For server systems, Nonblocking Refresh can leverage this existing, underutilized redundant data without storage overhead.
Primer on Server Memory Organization
[Figure: an example memory rank with four data chips (data chips 1–4) and two redundant chips (redundant chips 1–2); each memory block fetched from the rank is spread across all six chips.]
Nonblocking Refresh for Server Memory
[Figure: the same example rank during refresh. Data in the refreshing chips is inaccessible; the memory block is fetched from the accessible chips, and the inaccessible data is calculated from them using the redundant data.]
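A minimal sketch of calculating a refreshing chip's data, assuming (for illustration only) that one redundant chip stores the bytewise XOR parity of the four data chips. Real server memory uses stronger symbol-based codes, and the helper names `xor_all`, `split_block`, and `read_during_refresh` are hypothetical; XOR parity is simply the smallest code that can recompute one inaccessible chip.

```python
from functools import reduce

# Toy model (assumption): four data chips plus one redundant chip that
# stores the bytewise XOR parity of the data chips.
NUM_DATA_CHIPS = 4


def xor_all(chunks: list[bytes]) -> bytes:
    """Bytewise XOR of equally sized chunks."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), chunks)


def split_block(block: bytes) -> list[bytes]:
    """Stripe a memory block evenly across the data chips."""
    size = len(block) // NUM_DATA_CHIPS
    return [block[i * size:(i + 1) * size] for i in range(NUM_DATA_CHIPS)]


def read_during_refresh(accessible: dict[int, bytes], parity: bytes) -> bytes:
    """Fetch a block while exactly one data chip is refreshing.

    `accessible` maps data-chip index to chunk for chips not refreshing.
    """
    refreshing = next(i for i in range(NUM_DATA_CHIPS) if i not in accessible)
    # parity = c0 ^ c1 ^ c2 ^ c3, so XOR-ing it with the readable chunks
    # cancels them and leaves the refreshing chip's chunk.
    recovered = xor_all([parity, *accessible.values()])
    chunks = {**accessible, refreshing: recovered}
    return b"".join(chunks[i] for i in range(NUM_DATA_CHIPS))


block = bytes(range(64))     # a 64-byte memory block
chunks = split_block(block)  # 16 bytes per data chip
parity = xor_all(chunks)     # stored in the redundant chip
# Data chip index 2 is refreshing, so only chips 0, 1, and 3 can be read.
readable = {i: c for i, c in enumerate(chunks) if i != 2}
assert read_during_refresh(readable, parity) == block
```

The read never waits on the refreshing chip; it costs one extra XOR over the chunks that were already fetched.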
Challenge 1: Ensuring the Same Amount of Refresh
[Figure: refresh activity over time for chip IDs 1–6 of the example rank (data chips 1–4, redundant chips 1–2), marking each chip as refreshing (inaccessible) or not refreshing (accessible). With conventional (blocking) refresh, all chips of the rank refresh together.]
[Figure: the same rank under Nonblocking Refresh, with the refresh interval marked. Chips refresh at different times, and each chip still receives the same amount of refresh per refresh interval as under conventional refresh.]
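One way to check the "same amount of refresh" requirement is to count refresh operations per chip. The sketch below compares a conventional schedule, where all six chips refresh together, against one possible staggered schedule. The parameters `REFRESHES_PER_CHIP` and `MAX_CONCURRENT` are hypothetical, and the rotation order is an assumption, not the schedule used in the paper.

```python
from collections import Counter

NUM_CHIPS = 6           # chip IDs 1-6: four data chips + two redundant chips
REFRESHES_PER_CHIP = 4  # hypothetical refresh operations per chip per interval
MAX_CONCURRENT = 1      # assumed: redundancy can recompute one chip at a time


def conventional_schedule() -> list[set[int]]:
    """Every refresh slot refreshes all chips of the rank together."""
    return [set(range(1, NUM_CHIPS + 1)) for _ in range(REFRESHES_PER_CHIP)]


def staggered_schedule() -> list[set[int]]:
    """Rotate refreshes so at most MAX_CONCURRENT chips refresh per slot."""
    chips = list(range(1, NUM_CHIPS + 1))
    slots = []
    for _ in range(REFRESHES_PER_CHIP):
        for i in range(0, NUM_CHIPS, MAX_CONCURRENT):
            slots.append(set(chips[i:i + MAX_CONCURRENT]))
    return slots


def refresh_counts(schedule: list[set[int]]) -> Counter:
    """Number of refresh operations each chip receives in a schedule."""
    return Counter(chip for slot in schedule for chip in slot)


# Same per-chip refresh count, but never more than one chip inaccessible.
assert refresh_counts(staggered_schedule()) == refresh_counts(conventional_schedule())
assert all(len(slot) <= MAX_CONCURRENT for slot in staggered_schedule())
```

The staggered schedule spreads the same refresh work over more, narrower slots; whether those slots fit within a real refresh interval depends on DRAM timing parameters that this sketch does not model.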
Challenge 2: Ensuring Memory Write Bandwidth
[Figure: write path from the processor through the write queue and shared memory bus to ranks 1 to N, in conventional systems and with Nonblocking Refresh. Annotations show per-rank write shares of 100%/N, 0% for the refreshing rank, and a 36 KB/channel writeback cache.]
Challenge 3: Preserving Baseline Hardware Failure Protection
Read a block from a refreshing rank, and use the block's existing redundant data both to calculate the inaccessible data stored in the refreshing chips and to detect unknown hardware errors.
• No hardware error detected: the read completes.
• Hardware error detected: wait for refresh to complete, re-read the block from memory, and perform error correction; then the read completes.
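The read flow above can be sketched in code under a deliberately simple toy code: one redundant chip holds XOR parity (used to calculate the refreshing chip's data) and the other holds a CRC-32 of the block (used for detection). This split, the brute-force `correct` routine, and the `wait_and_reread` callback are all hypothetical stand-ins; real server memory uses symbol-based ECC, not CRC-32.

```python
import zlib
from functools import reduce
from typing import Callable

NUM_DATA_CHIPS = 4  # toy rank: 4 data chips + parity chip + CRC chip (assumed)


def xor_all(chunks: list[bytes]) -> bytes:
    """Bytewise XOR of equally sized chunks."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), chunks)


def correct(chunks: list[bytes], parity: bytes, crc: int) -> bytes:
    """Toy correction: assume one faulty data chip and try each in turn."""
    for bad in range(NUM_DATA_CHIPS):
        others = [c for i, c in enumerate(chunks) if i != bad]
        candidate = list(chunks)
        candidate[bad] = xor_all([parity, *others])
        block = b"".join(candidate)
        if zlib.crc32(block) == crc:
            return block
    raise RuntimeError("uncorrectable error")


def read_from_refreshing_rank(
    accessible: dict[int, bytes],
    parity: bytes,
    crc: int,
    wait_and_reread: Callable[[], list[bytes]],  # hypothetical callback
) -> bytes:
    """Read a block while one data chip is refreshing (toy flow)."""
    # 1. Calculate the refreshing chip's data from the redundant data.
    refreshing = next(i for i in range(NUM_DATA_CHIPS) if i not in accessible)
    chunks = {**accessible, refreshing: xor_all([parity, *accessible.values()])}
    block = b"".join(chunks[i] for i in range(NUM_DATA_CHIPS))
    # 2. Use the remaining redundancy (the CRC) to detect unknown errors.
    if zlib.crc32(block) == crc:
        return block  # no error detected: the read completes without waiting
    # 3. Error detected: wait for refresh, re-read every chip, then correct.
    return correct(wait_and_reread(), parity, crc)


block = bytes(range(64))
chunks = [block[i * 16:(i + 1) * 16] for i in range(NUM_DATA_CHIPS)]
parity, crc = xor_all(chunks), zlib.crc32(block)

# A faulty data chip 3 returns zeros while data chip 1 is refreshing.
faulty = chunks[:3] + [bytes(16)]
readable = {i: c for i, c in enumerate(faulty) if i != 1}
assert read_from_refreshing_rank(readable, parity, crc, lambda: faulty) == block
```

In this toy, the parity is spent on recomputing the refreshing chip, so only the CRC is left for detection; correction happens after refresh, when every chip can be read again.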
Methodology
• Two memory systems:
  • Intel/AMD server memory systems
  • IBM server memory system
• Baselines:
  • Conventional Refresh: fully compliant with manufacturer specifications
  • Insecure Refresh: skips 75% of refresh operations
• Evaluated 7 multi-threaded and 7 multi-program workloads
• 16 Gb and future 32 Gb DRAM chips
• 4 memory channels with 4 ranks per channel
Performance Improvement
[Figure: performance improvement of Nonblocking Refresh vs. Conventional Refresh for Intel/AMD server memory and IBM server memory, with 16 Gb and 32 Gb chips.]
Performance Improvement
[Figure: performance improvement of Nonblocking Refresh vs. Insecure Refresh for Intel/AMD server memory and IBM server memory, with 16 Gb and 32 Gb chips.]
Power Consumption
[Figure: power of Nonblocking Refresh vs. Conventional Refresh and vs. Insecure Refresh for Intel/AMD server memory and IBM server memory, with 16 Gb and 32 Gb chips.]
Performance of Systems with Faulty Chips
[Figure: performance of Nonblocking Refresh on faulty systems relative to fault-free systems, with 1, 2, and 3 faulty ranks per channel and on average, for Intel/AMD server memory and IBM server memory with 16 Gb and 32 Gb chips.]
Conclusion
• Since its invention 50 years ago, DRAM has always required expensive refresh operations that stall accesses to the data being refreshed.
• We propose Nonblocking Refresh, which refreshes data in DRAM without stalling read accesses to the data being refreshed.
• For server memory systems, Nonblocking Refresh improves average performance by 16.2% and 30.3% for 16 Gb and 32 Gb chips, respectively.
• Nonblocking Refresh preserves the security level of the conventional baseline by ensuring the same amount of refresh.
Questions?