CCHL: Compression-Consolidation Hardware Logging for Efficient Failure-Atomic Persistent Memory Updates
Xueliang Wei, Dan Feng, Wei Tong, Jingning Liu, Chengning Wang, Liuqing Ye
Huazhong University of Science and Technology
Persistent Memory
[Figure: a conventional system (CPU with DRAM on a fast memory interface, no persistence; disk/flash on a slow I/O interface, persistence) next to a persistent-memory system (CPU with DRAM and PM on a fast memory interface, persistence). PM technologies: PCM, ReRAM, 3D XPoint, etc.]
• Provide data persistence at the main-memory level
• Reduce persistence overhead compared with traditional storage devices
Failure-Atomic Updates
• Example: Insert a node C into a linked list (… A B …) in persistent memory
[Figure: the insert consists of Operation 1 and Operation 2. If an unexpected system failure happens between them, the linked list is broken and data are lost.]
• Failure-atomic updates: Persist a group of writes in an all-or-nothing manner in the presence of system failures
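Durable Transactions
• Example: A durable transaction with write-ahead logging, inserting C into the list … A B …
• Transaction code: Tx_Begin; compute new data; Log A; Log C; CLWB; MFENCE; St C, c1; St A, a1; CLWB; MFENCE; Tx_End
[Figure: cores, caches, and PM with a home region (a0, b0, c0) and a log region. Log(A) and Log(C) go to the log region; the new values a1 and c1 go to the home region.]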
Full/Delayed Transaction Durability
• Example: Fully/delayed durable transactions with redo logging
• Transaction code: Tx_Begin; ❶ Compute; ❷ Write Log (Log a1; Log c1); ❸ Persist Log (CLWB; MFENCE); ❹ Write Data (St C, c1; St A, a1); ❺ Persist Data (CLWB; MFENCE); Tx_End
• Full durability: The transaction is persisted during commit
  [Timeline: Compute, Write Log, Persist Log, Write Data, and Persist Data all fall between Tx_Begin and Tx_End.]
• Delayed durability: The transaction can be persisted after commit
  [Timeline: Compute, Write Log, and Write Data fall between Tx_Begin and Tx_End; Persist Log and Persist Data can complete after Tx_End.]
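The redo-logging steps above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyPM`, its `committed` set (standing in for a commit record), and the `crash` parameter are hypothetical names that do not appear in the slides, and "persisting" is modeled as an ordinary Python assignment.

```python
class ToyPM:
    """Toy model of persistent memory with a home region and a redo log."""

    def __init__(self, home):
        self.home = dict(home)   # home region: address -> value
        self.log = []            # log region: (tx_id, address, new_value)
        self.committed = set()   # hypothetical commit records

    def run_tx(self, tx_id, writes, crash=None):
        """Run one transaction; `crash` stops it early to mimic a failure."""
        for n, (addr, value) in enumerate(writes):
            if crash == "during_log" and n == 1:
                return                             # fail after the first log entry
            self.log.append((tx_id, addr, value))  # write and persist the log
        self.committed.add(tx_id)                  # the whole log is now persistent
        if crash == "before_data":
            return                                 # fail before data reach home
        for addr, value in writes:                 # write and persist the data
            self.home[addr] = value

    def recover(self):
        """Replay redo entries of committed transactions and skip the rest."""
        for tx_id, addr, value in self.log:
            if tx_id in self.committed:
                self.home[addr] = value
        return self.home


writes = [("C", "c1"), ("A", "a1")]

pm = ToyPM({"A": "a0", "B": "b0", "C": "c0"})
pm.run_tx(1, writes, crash="before_data")
print(pm.recover())   # both writes are applied from the log

pm = ToyPM({"A": "a0", "B": "b0", "C": "c0"})
pm.run_tx(1, writes, crash="during_log")
print(pm.recover())   # neither write is applied
```

In both failure cases the home region ends up with either all of the transaction's writes or none of them, which is the all-or-nothing property the log is there to provide.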
Software/Hardware Logging
• Example: Durable transactions with software/hardware redo logging
• Software logging: Software performs log writes on the critical path of transaction execution, causing up to 70% performance degradation [ATOM, HPCA’17]
  Code: Tx_Begin; Compute; Log a1; Log c1; CLWB; MFENCE; St C, c1; St A, a1; CLWB; MFENCE; Tx_End
  [Timeline: Compute, Write Log, Persist Log, Write Data, and Persist Data run one after another between Tx_Begin and Tx_End.]
• Hardware logging: Hardware performs log writes, asynchronous to volatile execution
  Code: Tx_Begin; Compute; St C, c1; St A, a1; Tx_End
  [Timeline: Compute and Write Data run between Tx_Begin and Tx_End, while hardware performs Write Log, Persist Log, and Persist Data alongside them.]
Overview
• Motivation: In hardware logging, many log writes are still executed on the critical path, particularly in multi-core systems with many threads.
• Our approach: Eliminate unnecessary log writes and enable delayed transaction durability.
• Intra-Tx Log Compression
  • Observation 1: 29.5% of data updated in transactions are dirty.
  • Avoid redundant log writes by logging only dirty data.
• Inter-Tx Log Consolidation
  • Observation 2: 53.4% of data are updated by two close transactions (distance < 4).
  • Avoid redundant log writes by combining successive transactions when they update the same data.
• Evaluation: Improves performance by 47.4%, reduces PM write traffic by 36.1%, and reduces memory dynamic energy by 18.7%.
Outline
• Motivation
• CCHL: Compression-Consolidation Hardware Logging
  • Intra-Tx Log Compression
  • Inter-Tx Log Consolidation
• Evaluation
• Conclusion
Execution Flow with Hardware Logging
• Example: Transaction execution flow with hardware redo logging (Tx_Begin; Compute; St C, c1; St A, a1; Tx_End)
• Initial state: the caches hold a0 and c0, the log buffer is empty, and the PM home region holds a0, b0, c0
• ❶ Compute: the cores produce the new values a1 and c1
• ❷ Write Data: St C, c1 and St A, a1 update the cached copies to c1 and a1
• ❸ Write Log: hardware places Log(C) and Log(A) in the log buffer
• ❹ Persist Log: the log entries move from the log buffer through the memory controller's write queue to the PM log region
• ❺ Persist Data: the updated data are written to the PM home region
Analysis of Hardware Logging Overhead
• Some log writes are still executed on the critical path
• Example 1: Evict a cache line when the write queue is full
[Figure: the write queue is already occupied by Log(A), Log(B), Log(C), and a1 while Log(D) sits in the log buffer; the PM home region holds a0, b0, c0, d0.]
Analysis of Hardware Logging Overhead
• Some log writes are still executed on the critical path
• Example 2: Commit a transaction when some log entries are buffered
[Figure: at commit, Log(D) is still in the log buffer and Log(B), Log(C), a1, b1 are in the write queue; only Log(A) has reached the PM log region.]
Analysis of Hardware Logging Overhead
• Hardware logging overhead increases as the number of threads increases
• The percentage of log writes increases as the number of threads increases
Intra-Tx Log Compression
• Dirty data: Data whose values are actually modified by transactions
• Observation 1: Only 29.5% of the bytes in all updated words are dirty
Intra-Tx Log Compression
[Figure: recovery example with a0 = 0x00000000 in the PM home region and Log(A) = 0x01020030, Log(B) = 0x12345678 in the log region; recovery restores a1 = 0x01020030. The figure also shows b1 = 0x12345678.]
• Only the log data for dirty data are essential for recovery
Intra-Tx Log Compression
• Key idea: Avoid redundant log writes by logging only dirty bytes
• A (p,q) dirty flag is added to each log entry to track the dirtiness of data
• (p,q) means that the dirtiness of every q bytes of data is tracked with p flag bits
• Log entry layout for Log(A): Metadata | Dirty Flag | Log Data
• Example with a0 = 0 0 0 0 0 0 0 0 and a1 = 0 1 0 2 0 0 3 0:
  • (1,1): dirty flag = 0 1 0 1 0 0 1 0, log data = 1 2 3 (lower clean-data cost)
  • (1,2): dirty flag = 1 1 0 1, log data = 0 1 0 2 3 0 (lower dirty-flag cost)
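The (1,1) and (1,2) cases from the slide can be reproduced with a few lines of Python. This is a minimal sketch, not the hardware design: it only covers p = 1, the function name `compress` is hypothetical, and each list element stands for one of the units drawn in the figure.

```python
def compress(old, new, q=1):
    """Return (dirty_flag, log_data) for a (1, q) dirty flag.

    `old` and `new` are equal-length lists of units; every group of q
    units gets one flag bit, and only dirty groups are kept as log data.
    """
    assert len(old) == len(new) and len(new) % q == 0
    flag, data = [], []
    for i in range(0, len(new), q):
        dirty = old[i:i + q] != new[i:i + q]
        flag.append(1 if dirty else 0)
        if dirty:
            data.extend(new[i:i + q])
    return flag, data


a0 = [0, 0, 0, 0, 0, 0, 0, 0]
a1 = [0, 1, 0, 2, 0, 0, 3, 0]
print(compress(a0, a1, q=1))  # ([0, 1, 0, 1, 0, 0, 1, 0], [1, 2, 3])
print(compress(a0, a1, q=2))  # ([1, 1, 0, 1], [0, 1, 0, 2, 3, 0])
```

The two calls show the trade-off named on the slide: a smaller q logs less clean data but needs more flag bits, and a larger q does the opposite.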
Intra-Tx Log Compression
• How does intra-tx log compression reduce log writes?
• Legend: MD = metadata, Flag = dirty flag, LD = log data, CLD = compressed log data
• Without compression: four entries (MD, LD 0) … (MD, LD 3) take 8 log writes
• With intra-tx log compression: each entry becomes (MD, Flag, CLD), taking 5 log writes
• With log packing [Jeong+, MICRO’18]: the compressed entries (CLD 0 … CLD 3) are packed into 3 log writes
• Result: 5 fewer log writes
Intra-Tx Log Compression
• Implementation: Get the dirty flag by comparing the old and the new value
• Example: with a0 = 0x00000000 in the caches and a1 = 0x01020030 written by the cores, the log entry (A, 0x52, 0x01020030) in the log buffer is compressed to (A, 0x52, 0x123)
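The numbers in this example can be checked with a short sketch that compares the old and new value unit by unit. It rests on assumptions: the slide's values 0x52 and 0x123 match a comparison at hex-digit granularity on a 32-bit word, so that is what the code uses, and the names `encode` and `decode` are hypothetical rather than taken from the slides.

```python
UNITS = 8       # units per word in the slide's example (hex digits)
UNIT_BITS = 4   # bits per unit


def encode(old, new):
    """Compare old and new; return (dirty_flag, compressed_log_data)."""
    flag, cld = 0, 0
    for i in range(UNITS - 1, -1, -1):        # most significant unit first
        shift = i * UNIT_BITS
        old_unit = (old >> shift) & 0xF
        new_unit = (new >> shift) & 0xF
        flag <<= 1
        if old_unit != new_unit:
            flag |= 1                         # mark this unit dirty
            cld = (cld << UNIT_BITS) | new_unit
    return flag, cld


def decode(home, flag, cld):
    """Apply the dirty units of a log entry to the value found in PM."""
    value = home
    for i in range(UNITS):                    # least significant unit first
        if (flag >> i) & 1:
            shift = i * UNIT_BITS
            value = (value & ~(0xF << shift)) | ((cld & 0xF) << shift)
            cld >>= UNIT_BITS
    return value


flag, cld = encode(0x00000000, 0x01020030)
print(hex(flag), hex(cld))                    # 0x52 0x123
print(hex(decode(0x00000000, flag, cld)))     # 0x1020030
```

In this sketch `decode` yields the new value whether the home region still holds the old value or already holds the new one, because clean units are identical in both.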
Inter-Tx Log Consolidation
• Transaction distance: The number of transactions between two transactions that update the same words
• Observation 2: 53.4% of the updated words are written by two transactions whose distance is less than 4
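One way to measure transaction distance on a trace is sketched below. It reads the definition literally, so two back-to-back transactions that update the same word have distance 0; how the slides' 53.4% figure was actually counted is not stated, so the function `update_distances` and the trace format (one set of updated words per transaction) are assumptions made for illustration.

```python
def update_distances(write_sets):
    """For each repeated update of a word, return the number of
    transactions between it and the previous transaction that
    updated the same word."""
    last_writer = {}     # word -> index of the last transaction that wrote it
    distances = []
    for tx_id, words in enumerate(write_sets):
        for word in sorted(words):
            if word in last_writer:
                distances.append(tx_id - last_writer[word] - 1)
        for word in words:
            last_writer[word] = tx_id
    return distances


trace = [{"A", "B"}, {"B", "C"}, {"D"}, {"E"}, {"F"}, {"G"}, {"A"}]
dist = update_distances(trace)
print(dist)                                    # [0, 5]
print(sum(d < 4 for d in dist) / len(dist))    # 0.5
```

Updates with a small distance are the ones inter-tx log consolidation can target, since the earlier log entry is likely to be superseded soon.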
Inter-Tx Log Consolidation
• Example: Tx 1 (Tx_Begin; St A, a1; St B, b1; Tx_End) is followed by Tx 2 (Tx_Begin; St C, c2; St B, b2; Tx_End)
• Log region: Log(A) = a1, Log(B) = b1, Log(C) = c2, Log(B) = b2
• Recovery takes the home region from a0, b0, c0 to a1, b2, c2, so Log(B) = b1 is unused
• Reduce log writes by not writing unused log entries when several transactions update the same data
Inter-Tx Log Consolidation
• Key idea: Combine several successive transactions into a larger one if they update the same data, and log only the newest values of the data
• Before: Tx 1 (St A, a1; St B, b1) and Tx 2 (St C, c2; St B, b2) produce 4 log entries: Log(A) = a1, Log(B) = b1, Log(C) = c2, Log(B) = b2
• After inter-tx log consolidation: one combined transaction (St A, a1; St B, b1; St C, c2; St B, b2) produces 3 log entries: Log(A) = a1, Log(C) = c2, Log(B) = b2
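The 4-to-3 reduction in this example can be reproduced with a small sketch. It is a simplification under stated assumptions: the function `consolidate` is a hypothetical name, transactions are plain lists of (address, value) stores, and a transaction is merged into the previous group whenever the two share an address, with no limit on how large a group may grow.

```python
def consolidate(transactions):
    """Merge successive transactions that update the same address and
    keep only the newest value per address in each merged group."""
    groups = []                                  # each group: address -> newest value
    for stores in transactions:
        addresses = {addr for addr, _ in stores}
        if groups and not addresses.isdisjoint(groups[-1]):
            groups[-1].update(stores)            # later stores overwrite older values
        else:
            groups.append(dict(stores))
    return groups


tx1 = [("A", "a1"), ("B", "b1")]
tx2 = [("C", "c2"), ("B", "b2")]

before = len(tx1) + len(tx2)
groups = consolidate([tx1, tx2])
after = sum(len(group) for group in groups)
print(before, after)   # 4 3
print(groups)          # [{'A': 'a1', 'B': 'b2', 'C': 'c2'}]
```

Because the merged group is logged as one larger transaction, the superseded entry Log(B) = b1 is never written.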