Efficient Hardware-based Undo+Redo Logging for Persistent Memory Systems
Matheus Ogleari, Prof. Ethan Miller, Prof. Jishen Zhao
Overview
● Background/Motivation
● Hardware-Driven Logging Design
● Evaluation
Background
● Persistent Memory: What is it?
  ○ A “hybridization” of storage and memory
  ○ Combines the data persistence of disks with the byte-addressability and load/store interface of memory
  ○ Avoids paging data blocks to and from a storage device, and avoids context switches while servicing page faults
  ○ Built on non-volatile memory (NVM) technology
Motivation
● “There is no such thing as a free lunch”
  ○ Implementing persistent memory carries overhead and complexity costs
  ○ There is “a large performance gap between a system with a persistent memory and a ‘native system’ (i.e., with no persistence support)” [Zhao, Micro’13]
[Figure: image from the Zhao Micro’13 paper]
Our Work
● What are we doing?
  ○ Hardware-driven logging for persistent memory systems
  ○ Shifting the balance between hardware and software while improving the performance of persistent memory applications
Baseline
● Baseline Assumptions
  ■ System uses either 100% NVRAM or hybrid NVRAM/DRAM
  ■ In a hybrid system, we can distinguish between DRAM and NVRAM
  ■ Cache is SRAM
[Figure: two baseline configurations, each with the processor and SRAM cache on chip and main memory off chip: NVRAM only, or hybrid DRAM/NVRAM]
Baseline
● Baseline Assumptions
  ■ Use logging for memory persistence
  ■ Log has a finite size (it cannot grow forever) and is uncacheable
  ■ Log is a circular buffer with head and tail pointers that wraps around
[Figure: circular log buffer with head and tail pointers]
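The circular-log assumption above can be sketched in a few lines of Python. This is only an illustrative model: the class name `CircularLog`, the `append`/`truncate` methods, and the choice to raise an error when the log is full are all assumptions made for the sketch, since the slides say only that the log is finite and wraps around.

```python
class CircularLog:
    """Fixed-capacity circular log (hypothetical names, not from the slides)."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.head = 0   # index of the oldest live entry
        self.tail = 0   # index where the next entry will be written
        self.count = 0  # number of live entries

    def append(self, entry):
        # Assumed policy: refuse to overwrite live entries when full.
        if self.count == len(self.slots):
            raise BufferError("log full: truncate before appending")
        self.slots[self.tail] = entry
        self.tail = (self.tail + 1) % len(self.slots)  # wrap around
        self.count += 1

    def truncate(self, n):
        """Release the n oldest entries, e.g. once their data is durable."""
        n = min(n, self.count)
        self.head = (self.head + n) % len(self.slots)
        self.count -= n


log = CircularLog(capacity=4)
for i in range(4):
    log.append(("entry", i))
log.truncate(2)            # free the two oldest slots
log.append(("entry", 4))   # tail has wrapped, so this reuses slot 0
```

Because the log cannot grow, some truncation step has to advance the head before old slots can be reused; when that happens in the real design is not specified here.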
Baseline
● Software-Driven Logging
Our Work
● What we want:
Hardware-Driven Logging
● How does it work?
● Two primary mechanisms:
  ○ Hardware Redo+Undo Logging (write-ahead logging, WAL)
  ○ Cache Forced Write-Back (FWB)
Hardware-Driven Logging
● Design
  ○ Hardware Redo+Undo Logging (WAL)
    ■ Every write within a persistent memory block automatically triggers a write to the log
    ■ Log entries of the form <addr, old_val, new_val, txid> accumulate in an on-chip log buffer
    ■ Once a transaction completes (or the buffer is full), the log entries are written out to NVRAM
Hardware-Driven Logging
● Design
  ○ Hardware Redo+Undo Logging (WAL)
    ■ For a log entry <addr, old_val, new_val, txid>, addr and new_val come from the write request, and old_val is read from the write-allocated cache line
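A small software model may help make the logging path concrete. The sketch below assumes a toy write-allocate cache tracked at word granularity; `LogEntry`, `LoggingCache`, `buffer_entries`, and the dictionaries standing in for memory, the on-chip log buffer, and the NVRAM log are all hypothetical names introduced for illustration, not the actual hardware design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    addr: int
    old_val: int
    new_val: int
    txid: int


class LoggingCache:
    """Toy model of a write-allocate cache with undo+redo logging."""

    def __init__(self, memory, buffer_entries=4):
        self.memory = memory            # backing store: addr -> value
        self.lines = {}                 # cached data: addr -> value
        self.log_buffer = []            # stand-in for the on-chip log buffer
        self.nvram_log = []             # stand-in for the log region in NVRAM
        self.buffer_entries = buffer_entries

    def write(self, addr, new_val, txid):
        if addr not in self.lines:      # write-allocate: fetch on a miss
            self.lines[addr] = self.memory.get(addr, 0)
        old_val = self.lines[addr]      # old value is read from the cache line
        self.log_buffer.append(LogEntry(addr, old_val, new_val, txid))
        self.lines[addr] = new_val
        if len(self.log_buffer) >= self.buffer_entries:
            self.flush_log()            # buffer full

    def commit(self, txid):
        # Simplification: the whole buffer is flushed when a transaction ends.
        self.flush_log()

    def flush_log(self):
        self.nvram_log.extend(self.log_buffer)
        self.log_buffer.clear()


cache = LoggingCache(memory={0x10: 7})
cache.write(0x10, 8, txid=1)
cache.write(0x10, 9, txid=1)
cache.commit(txid=1)
```

After the commit, the NVRAM log holds both the old and new value of every write, while the updated data itself is still only in the cache. Getting that data to memory is the job of the write-back mechanism.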
Hardware-Driven Logging
● Design
  ○ Forced Write-Back (FWB)
    ■ New cache write-back scheme that extends the standard write-back scheme
    ■ Ensures that no cache line remains “too long” in the cache: a line is written back when the mechanism triggers an FWB
    ■ Checks cache lines and updates their state periodically (e.g., once per cycle)
Hardware-Driven Logging
● Design
  ○ Forced Write-Back (FWB)
    ■ Guarantees persistence through smarter cache line write-back, unlike the software approach
    ■ Allows writes to coalesce in cache lines before they are written back to memory
    ■ Is more efficient than always writing back after each persistent update
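One way to picture FWB is as a periodic scan that uses an extra tag bit per line. The sketch below is an assumed reading of the mechanism, not a description of the real state machine: the `fwb` bit, the two-scan rule (flag a dirty line on one pass, force it out on the next), and the names `Line`, `FwbCache`, and `scan` are all hypothetical.

```python
class Line:
    def __init__(self, value):
        self.value = value
        self.dirty = False
        self.fwb = False    # hypothetical "was dirty on the last scan" tag bit


class FwbCache:
    def __init__(self, memory):
        self.memory = memory    # backing store: addr -> value
        self.lines = {}         # cached lines: addr -> Line

    def write(self, addr, value):
        if addr not in self.lines:
            self.lines[addr] = Line(self.memory.get(addr, 0))
        line = self.lines[addr]
        line.value = value
        line.dirty = True       # writes coalesce here until a write-back

    def scan(self):
        """One periodic pass over all lines; returns addresses written back."""
        written = []
        for addr, line in self.lines.items():
            if not line.dirty:
                continue
            if line.fwb:                    # dirty across two scans: force out
                self.memory[addr] = line.value
                line.dirty = False
                line.fwb = False
                written.append(addr)
            else:                           # first time seen dirty: flag only
                line.fwb = True
        return written


memory = {0xA0: 0}
cache = FwbCache(memory)
cache.write(0xA0, 1)
cache.write(0xA0, 2)       # coalesces with the previous write
first = cache.scan()       # flags the line, no write-back yet
second = cache.scan()      # forces the write-back
```

Under this reading, a dirty line stays in the cache for at most two scan intervals, and the two writes above reach memory as a single write-back.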
Trade-Offs
● Hardware-Driven Logging for Persistent Memory Systems
  ○ Primary idea: let hardware take care of it, but there are trade-offs
PROS:
  + Provides redo+undo logging at low cost
  + Removes burden from the programmer
  + More efficient utilization of underlying hardware
  + Guarantees persistence
  + Improves performance
CONS:
  - More complex hardware
  - Higher chip area
  - More on-chip power consumption
Evaluation
● Experiments
  ○ Microbenchmarks (hash, rbtree, sps, ssca2, btree) along with real workloads (WHISPER)
● Results (averages)
  ○ Performance (measured by IPC) improves by 1.42x over the baseline without CLWB and by 1.51x over the baseline with CLWB
  ○ Dynamic memory energy consumption is reduced by 1.53x relative to the baseline without CLWB and by 1.72x relative to the baseline with CLWB
  ○ Operational throughput improves by 1.45x over the baseline without CLWB and by 1.60x over the baseline with CLWB
  ○ Memory write traffic is reduced by 2.36x relative to the baseline without CLWB and by 3.12x relative to the baseline with CLWB
Hardware Cost?
● Hardware Overhead
  ○ Table below shows the major new hardware components
  ○ Relatively low overhead compared to existing cache state
  ○ Other added logic consists of small- to medium-sized components such as decoders and muxes

Mechanism                  Logic Type   Size
Transaction ID register    flip-flops   1 Byte
Log head pointer register  flip-flops   8 Bytes
Log tail pointer register  flip-flops   8 Bytes
Log buffer                 SRAM         964 Bytes
FWB tag bits               SRAM         1040 Bytes
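Adding up the listed components gives a rough sense of scale. The short calculation below only sums the sizes from the table; the dictionary and variable names are illustrative, and the total excludes the unlisted decoder and mux logic.

```python
# Sizes in bytes, copied from the hardware overhead table.
overhead_bytes = {
    "Transaction ID register": 1,
    "Log head pointer register": 8,
    "Log tail pointer register": 8,
    "Log buffer": 964,
    "FWB tag bits": 1040,
}

total = sum(overhead_bytes.values())
sram = overhead_bytes["Log buffer"] + overhead_bytes["FWB tag bits"]
print(f"total: {total} bytes, of which SRAM: {sram} bytes")
# prints "total: 2021 bytes, of which SRAM: 2004 bytes"
```

Nearly all of the added state is SRAM (the log buffer and the FWB tag bits); the three registers account for only 17 bytes.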
Impact
● Why does this matter?
  ○ Provides redo+undo logging at low cost, with the benefits of both
  ○ Improves persistent memory performance, closing the gap with non-persistent systems
  ○ Addresses common issues with current software-based persistent memory models
  ○ Removes the need for persistent memory-related instructions (mfence, clwb, clflush)
  ○ Opens a new avenue of research as hardware-based solutions become more appealing
Thank You!