
Efficient Hardware-based Undo+Redo Logging for Persistent Memory - PowerPoint PPT Presentation



  1. Efficient Hardware-based Undo+Redo Logging for Persistent Memory Systems. Matheus Ogleari, Prof. Ethan Miller, Prof. Jishen Zhao

  2. Overview ● Background/Motivation ● Hardware-Driven Logging Design ● Evaluation

  3. Background ● Persistent Memory: what is it? ○ A “hybridization” of storage and memory ○ Combines the persistence of disk storage with the byte-addressability and load/store interface of memory ○ Avoids paging data blocks to and from a storage device, and the context switches incurred while servicing page faults ○ Built on NVM (non-volatile memory) technology

  4. Motivation ● “There is no such thing as a free lunch” ○ Implementing persistent memory carries overhead and complexity costs ○ There is “a large performance gap between a system with a persistent memory and a ‘native system’ (i.e., with no persistence support)” [Zhao, MICRO’13] [Figure from the Zhao MICRO’13 paper]

  5. Our Work ● What are we doing? ○ Hardware-Driven Logging for Persistent Memory Systems ○ Shifting the balance of logging work between hardware and software while improving the performance of persistent memory applications

  6. Baseline ● Baseline Assumptions ■ The system uses either 100% NVRAM or a hybrid NVRAM/DRAM main memory ■ In a hybrid system, DRAM and NVRAM accesses can be distinguished ■ Caches are SRAM [Figure: two configurations, each with an on-chip processor and SRAM cache; off-chip memory is either NVRAM only, or DRAM plus NVRAM]

  7. Baseline ● Baseline Assumptions ■ Logging is used to provide memory persistence ■ The log has a finite size (it cannot grow indefinitely) and is uncacheable ■ The log is a circular buffer with head and tail pointers that wraps around [Figure: circular log buffer with head and tail pointers]
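
The head/tail behavior of the baseline log can be modeled in a few lines. This is a minimal Python sketch, assuming a fixed number of entry slots and that the oldest entries are reclaimed only once their transaction is durable; the class and method names (`CircularLog`, `append`, `truncate`) are hypothetical, and the "uncacheable" property is not modeled.

```python
class CircularLog:
    """Hypothetical fixed-capacity circular log with head and tail indices.

    New entries are written at `tail`; the oldest live entry sits at `head`;
    both indices wrap around modulo `capacity`.
    """

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.slots = [None] * capacity
        self.head = 0   # index of the oldest live entry
        self.tail = 0   # index where the next entry will be written
        self.count = 0  # number of live entries

    def is_full(self) -> bool:
        return self.count == self.capacity

    def append(self, entry) -> None:
        # The log is finite: never overwrite entries that are still live.
        if self.is_full():
            raise OverflowError("log full; truncate old entries first")
        self.slots[self.tail] = entry
        self.tail = (self.tail + 1) % self.capacity  # wrap around
        self.count += 1

    def truncate(self, n: int) -> None:
        """Reclaim the n oldest entries (assumed safe once their data is durable)."""
        if n > self.count:
            raise ValueError("cannot truncate more entries than are live")
        for _ in range(n):
            self.slots[self.head] = None
            self.head = (self.head + 1) % self.capacity
        self.count -= n

    def live_entries(self) -> list:
        # Walk from head to tail, following the wrap-around.
        return [self.slots[(self.head + i) % self.capacity] for i in range(self.count)]
```

With a capacity of 4, appending `a`, `b`, `c`, truncating two entries, and then appending `d` and `e` leaves `c`, `d`, `e` live, with `e` stored in slot 0 after the tail wrapped.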

  8. Baseline ● Software-Driven Logging
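
The slide's figure did not survive extraction. As a generic illustration only, assuming a conventional per-store software undo-logging scheme (not necessarily the exact baseline evaluated), the sketch below records the instruction sequence software must issue: the undo entry has to be made durable with an explicit cache-line write-back and fence before the data is updated. `SoftwareUndoLogger` and its methods are hypothetical, and the `clwb`/`sfence` steps are only recorded in a trace, not executed.

```python
from dataclasses import dataclass, field


@dataclass
class SoftwareUndoLogger:
    """Hypothetical model of simple per-store software undo logging."""
    memory: dict = field(default_factory=dict)  # address -> value
    log: list = field(default_factory=list)     # (txid, addr, old_val)
    trace: list = field(default_factory=list)   # recorded instruction sequence

    def persistent_store(self, txid: int, addr: int, new_val: int) -> None:
        old_val = self.memory.get(addr, 0)
        # 1. Write the undo entry and force it to NVRAM before touching the data.
        self.log.append((txid, addr, old_val))
        self.trace += ["store log", "clwb log", "sfence"]
        # 2. Update the data in place and force it to NVRAM as well.
        self.memory[addr] = new_val
        self.trace += ["store data", "clwb data", "sfence"]

    def abort(self, txid: int) -> None:
        # Roll back this transaction's updates, newest first.
        for t, addr, old_val in reversed(self.log):
            if t == txid:
                self.memory[addr] = old_val
        self.log = [e for e in self.log if e[0] != txid]
```

Two stores inside one transaction already cost four fences in this variant, which is the kind of software overhead that hardware-driven logging aims to remove.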

  9. Our Work ● What we want:

  10. Hardware-Driven Logging ● How does it work? ● Two primary mechanisms: ○ Hardware Redo+Undo Logging (WAL) ○ Cache forced write-back (FWB)

  11. Hardware-Driven Logging ● Design ○ Hardware Redo+Undo Logging (WAL) ■ Every write to a persistent memory block automatically triggers a log write ■ Log entries of the form <addr, old_val, new_val, txid> accumulate in an on-chip log buffer ■ When a transaction completes (or the buffer fills), the log entries are written out to NVRAM
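
The buffering policy on this slide can be sketched as follows. This is a minimal Python model under stated assumptions: the class name `OnChipLogBuffer`, the capacity parameter, and draining the whole buffer on transaction completion are illustrative choices, and a plain list stands in for the NVRAM log. Because each entry carries both `old_val` and `new_val`, it can serve as an undo and a redo record.

```python
from typing import List, NamedTuple


class LogEntry(NamedTuple):
    addr: int
    old_val: int  # undo information
    new_val: int  # redo information
    txid: int


class OnChipLogBuffer:
    """Hypothetical on-chip buffer that drains to the NVRAM log."""

    def __init__(self, capacity: int, nvram_log: List[LogEntry]) -> None:
        self.capacity = capacity
        self.entries: List[LogEntry] = []
        self.nvram_log = nvram_log  # stands in for the circular log in NVRAM

    def record(self, entry: LogEntry) -> None:
        self.entries.append(entry)
        if len(self.entries) == self.capacity:
            self.flush()  # buffer full: drain early

    def on_transaction_complete(self) -> None:
        # Simplification: drain every buffered entry when a transaction completes.
        self.flush()

    def flush(self) -> None:
        self.nvram_log.extend(self.entries)
        self.entries.clear()
```

With a capacity of 2, the second recorded entry triggers a drain, and a third entry stays on chip until the transaction completes.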

  12. Hardware-Driven Logging ● Design ○ Hardware Redo+Undo Logging (WAL) ■ For a log entry <addr, old_val, new_val, txid>, addr and new_val come from the write request, and old_val is read from the cache line fetched on write-allocate
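
The point of this slide is that the old value is already in the cache when the store arrives, so no extra memory read is needed to build the undo half of the entry. The sketch below is a hypothetical Python model of a write-allocate, write-back cache that emits entries this way; `WriteAllocateCache`, the 64-byte line size, and storing one value per address are simplifying assumptions.

```python
from typing import Dict, List, Tuple

LINE_SIZE = 64  # addresses per cache line (assumption)


class WriteAllocateCache:
    """Hypothetical write-allocate cache that builds <addr, old_val, new_val, txid>."""

    def __init__(self, memory: Dict[int, int]) -> None:
        self.memory = memory                        # backing NVRAM: addr -> value
        self.lines: Dict[int, Dict[int, int]] = {}  # line tag -> {addr: value}
        self.log: List[Tuple[int, int, int, int]] = []

    def _line_of(self, addr: int) -> Dict[int, int]:
        tag = addr // LINE_SIZE
        if tag not in self.lines:
            # Write-allocate: on a store miss, fetch the whole line first.
            base = tag * LINE_SIZE
            self.lines[tag] = {a: self.memory.get(a, 0) for a in range(base, base + LINE_SIZE)}
        return self.lines[tag]

    def store(self, addr: int, new_val: int, txid: int) -> None:
        line = self._line_of(addr)
        old_val = line[addr]  # old_val comes from the allocated cache line
        # addr and new_val come straight from the write request.
        self.log.append((addr, old_val, new_val, txid))
        line[addr] = new_val  # write-back cache: memory is not updated here
```

A second store to the same address logs the first store's value as its `old_val`, while the backing memory keeps the original value until the line is written back.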

  13. Hardware-Driven Logging ● Design ○ Forced Write-Back (FWB) ■ A new cache write-back scheme that extends the standard write-back policy ■ Ensures that no cache line remains in the cache “too long”: lines are written back when the mechanism triggers an FWB ■ Checks cache lines and updates their state periodically (e.g., once per cycle)
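
One way to guarantee that no line stays dirty "too long" is a periodic two-pass scan: mark dirty lines on one pass and force back any line that is still dirty and marked on the next. The Python sketch below models that policy; the per-line `fwb` bit, the two-pass rule, and the names `FWBCache` and `scan` are assumptions for illustration, not a specification of the presented hardware.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CacheLine:
    value: int
    dirty: bool = False
    fwb: bool = False  # hypothetical per-line forced-write-back tag bit


class FWBCache:
    """Hypothetical forced write-back: a dirty line is written back by the
    second scan that sees it, so no dirty line survives two scan periods."""

    def __init__(self) -> None:
        self.lines: Dict[int, CacheLine] = {}
        self.nvram: Dict[int, int] = {}
        self.writebacks: List[int] = []

    def store(self, addr: int, value: int) -> None:
        line = self.lines.setdefault(addr, CacheLine(0))
        line.value = value
        line.dirty = True  # the fwb bit is deliberately not reset here

    def scan(self) -> None:
        for addr, line in self.lines.items():
            if not line.dirty:
                continue
            if line.fwb:
                # Dirty and already marked: force it back to NVRAM.
                self.nvram[addr] = line.value
                self.writebacks.append(addr)
                line.dirty = False
                line.fwb = False
            else:
                line.fwb = True  # first sighting: mark and let writes coalesce
```

Not clearing the mark on every store is what bounds the delay: even a line that is written continuously is forced back by the second scan after it becomes dirty.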

  14. Hardware-Driven Logging ● Design ○ Forced Write-Back (FWB) ■ Guarantees persistence through smarter cache-line write-back, unlike the software approach ■ Allows writes to coalesce in cache lines before being written back to memory ■ Is more efficient than always writing back after each persistent update
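
The coalescing benefit can be made concrete by counting NVRAM data write-backs under two policies. This simplified Python sketch assumes a periodic policy that writes each dirty line back once at the end of its period (an approximation of FWB) and ignores log traffic; the function names are hypothetical.

```python
from typing import Iterable, List, Set


def eager_writebacks(writes: Iterable[int]) -> int:
    """Write back after every persistent update: one NVRAM write per store."""
    return sum(1 for _ in writes)


def periodic_writebacks(write_batches: List[List[int]]) -> int:
    """Write back dirty lines once per period: repeated stores to the same
    line within a period coalesce into a single NVRAM write."""
    total = 0
    for batch in write_batches:
        dirty: Set[int] = set(batch)  # distinct lines dirtied this period
        total += len(dirty)
    return total
```

For the periods `[[0, 0, 0, 1], [1, 1, 2]]` (cache-line IDs), eager write-back issues 7 NVRAM writes, while the periodic policy issues 4.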

  15. Trade-Offs ● Hardware-Driven Logging for Persistent Memory Systems ○ Primary idea: let hardware take care of logging, but there are trade-offs ○ PROS: + Provides redo+undo logging at low cost + Removes the burden from the programmer + Uses the underlying hardware more efficiently + Guarantees persistence + Improves performance ○ CONS: - More complex hardware - Higher chip area - More on-chip power consumption

  16. Evaluation ● Experiments ○ Microbenchmarks (hash, rbtree, sps, ssca2, btree) and real workloads (WHISPER) ● Results (averages) ○ Performance (IPC): 1.42x improvement over the baseline without CLWB and 1.51x over the baseline with CLWB ○ Dynamic memory energy consumption: reduced by 1.53x relative to the baseline without CLWB and by 1.72x relative to the baseline with CLWB ○ Operational throughput: 1.45x improvement over the baseline without CLWB and 1.60x over the baseline with CLWB ○ Memory write traffic: reduced by 2.36x relative to the baseline without CLWB and by 3.12x relative to the baseline with CLWB

  17. Hardware Cost? ● Hardware Overhead ○ The table below shows the major new hardware components ○ The overhead is relatively low compared to existing cache state ○ Other logic is also added, but it consists of small- to medium-sized components such as decoders and muxes

    Mechanism                    Logic type    Size
    Transaction ID register      flip-flops    1 byte
    Log head pointer register    flip-flops    8 bytes
    Log tail pointer register    flip-flops    8 bytes
    Log buffer                   SRAM          964 bytes
    FWB tag bits                 SRAM          1040 bytes
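
Summing the sizes listed in the table gives the total added storage; this excludes the decoders and muxes, which the slide does not size:

$$
1 + 8 + 8 + 964 + 1040 = 2021~\text{bytes} \approx 1.97~\text{KiB}
$$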

  18. Impact ● Why does this matter? ○ Provides redo+undo logging at low cost, with the benefits of both ○ Improves persistent memory performance, narrowing the gap with non-persistent systems ○ Addresses common issues that arise with current software-based persistent memory models ○ Removes the need for persistence-related instructions (mfence, clwb, clflush) ○ Opens a new avenue of research as hardware-based solutions become more appealing

  19. Thank You!

  20. Efficient Hardware-based Undo+Redo Logging for Persistent Memory Systems ● What we want: [Figure: circular log buffer with head and tail pointers]
