Logging in Persistent Memory: to Cache, or Not to Cache?
Mengjie Li, Matheus Ogleari, Jishen Zhao

Persistent Memory
• Memory (DRAM): accessed with load/store; not persistent
• Persistent memory (NVRAM: STT-RAM, PCM, ReRAM, NVDIMM, battery-backed DRAM, etc.): accessed with load/store; persistent
• Storage (disk/flash): accessed with fopen, fread, fwrite, ...; persistent
These nonvolatile devices are able to retain data in a consistent state in case of power loss.

Logging in Persistent Memory
Update persistent memory with transactions:
    Tx_begin
        do some reads
        do some computation
        Rlog(addr(C), new_val(C))
        memory_barrier
        write C
    Tx_commit
Micro-ops over time: store Log_C'1, store Log_C'2, ...
[Diagram: cores with private L1 caches share an LLC in front of NVRAM. NVRAM holds a tree (Root, A, B, C, D) and the log. The log entry Log_C' reaches NVRAM before C is updated to C'; the memory barrier separates the two.]
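The transaction flow on this slide can be sketched in C. This is a minimal, single-threaded illustration and not the authors' implementation: the names redo_log_entry, redo_log, tx_write, and redo_replay are hypothetical, the log lives in an ordinary array, and atomic_thread_fence only stands in for the slide's memory_barrier. On real persistent memory the log would also have to be flushed from the cache or written with non-temporal stores before the in-place update.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical redo-log record: the target address and its new value,
 * mirroring Rlog(addr(C), new_val(C)) on the slide. */
typedef struct {
    uint64_t *addr;
    uint64_t  new_val;
} redo_log_entry;

#define LOG_CAPACITY 64

typedef struct {
    redo_log_entry entries[LOG_CAPACITY];
    size_t         count;
} redo_log;

/* Transactional write: record the new value in the log, issue a barrier,
 * and only then update the data in place. */
static void tx_write(redo_log *log, uint64_t *addr, uint64_t new_val)
{
    assert(log->count < LOG_CAPACITY);
    log->entries[log->count].addr = addr;
    log->entries[log->count].new_val = new_val;
    log->count++;

    /* Stand-in for memory_barrier: orders the log write before the data write. */
    atomic_thread_fence(memory_order_seq_cst);

    *addr = new_val;
}

/* Recovery sketch: re-apply every logged value in order. */
static void redo_replay(const redo_log *log)
{
    for (size_t i = 0; i < log->count; ++i)
        *log->entries[i].addr = log->entries[i].new_val;
}
```

Because the log holds the new value, replaying it after a crash brings the data to the committed state even if the in-place write never reached NVRAM.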

To cache, or not to cache? That is the question. [Mengjie Li+, MEMSYS 2017]

Experimental Setup
• Desktop: Dell OptiPlex 7040 Tower
  o CPU: 4-core 3.4 GHz Intel Core i7
  o Cache: 8 MB last-level cache
• Measurement tools: perf and rdtsc
• Micro-benchmarks: run 20 times; report the average performance, excluding initialization time
  o Various working set sizes
  o Various transaction sizes and write intensities
  o Various data structures: hashtable, rbtree, array, ...

Microbenchmarks Example
// initialization
Create an array of strings

// Uncacheable log
    for (i = 0; i < array_size; ++i) {
        value = random_string;
        key = i;
        // Log updates
        // Intrinsic functions to invoke movnti
        _mm_stream_si32(&log[2 * i], key);
        _mm_stream_si32(&log[2 * i + 1], value);
        asm volatile ("sfence");
        array[i] = value;
    }

// Cacheable log
    for (i = 0; i < array_size; ++i) {
        value = random_string;
        key = i;
        // Log updates
        log[2 * i] = key;
        log[2 * i + 1] = value;
        asm volatile ("sfence");
        array[i] = value;
    }

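Issue with Cacheable Log
Cache pollution: log entries occupy space in the L1 data cache and the last-level cache on their way to NVM.
[Diagram: each core has an L1i and an L1d cache above a shared last-level cache; the memory bus connects to DRAM and NVM. Copies of the log appear in the L1d cache, the last-level cache, and NVM.]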

LLC Miss Rate and Execution Time
[Figure: LLC miss rate (axis 50% to 90%) and execution time in million cycles (axis 0.0 to 1.4) for the uncacheable and the cacheable log.]

How about uncacheable log performance?

How do we make the log uncacheable?
Example: x86 processors provide uncacheable (non-temporal) write instructions (movnti, movntq, etc.).
These instructions can be invoked by
• Inline assembly (__asm__())
• Intrinsic functions (_mm_stream_si32)
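As an illustration of the intrinsic route, the sketch below appends one (key, value) pair to an int log with _mm_stream_si32, which compiles to movnti, and then issues _mm_sfence. The helper name log_append_nt and the two-ints-per-entry layout (taken from the microbenchmark) are assumptions for this example. On targets without SSE2 the code falls back to ordinary cacheable stores so that it still compiles.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE2__)
#include <emmintrin.h>   /* _mm_stream_si32 (movnti); also provides _mm_sfence */
#endif

/* Hypothetical helper: write log entry i as two ints, bypassing the
 * cache hierarchy when non-temporal stores are available. */
static void log_append_nt(int *log, size_t i, int key, int value)
{
    assert(log != NULL);
#if defined(__SSE2__)
    _mm_stream_si32(&log[2 * i], key);        /* non-temporal store of the key   */
    _mm_stream_si32(&log[2 * i + 1], value);  /* non-temporal store of the value */
    _mm_sfence();  /* drain the write-combining buffer before later stores */
#else
    /* Fallback for non-x86 builds: plain cacheable stores. */
    log[2 * i] = key;
    log[2 * i + 1] = value;
#endif
}
```

A caller would invoke log_append_nt(log, i, key, value) immediately before the in-place update array[i] = value, as in the microbenchmark loop.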

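Write Combining Buffer (WCB)
Each core has a WCB of 4-6 cache lines.
[Diagram: each core has a WCB next to its L1 cache. Uncacheable log writes are collected in the WCB and travel over the memory bus to NVM; the L1 caches and the last-level cache hold no log data.]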

Issues with Uncacheable Log
• Existing uncacheable writing schemes are sub-optimal
  o Partial writes in WCB
  o Overhead of uncacheable write instructions
  o Limited WCB size

Partial Writes in WCB
• Full write: 64B leaves the WCB for memory in 1 bus clock
• Partial write: less than 64B leaves the WCB for memory, also in 1 bus clock
Partial writes are inefficient because they underutilize the memory bus bandwidth.
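The cost of partial writes can be made concrete with a small back-of-the-envelope model. It assumes, as the slide states, that one transfer out of the WCB occupies one bus clock whether it carries 64B or less, and that every write is drained on its own (for example by an sfence after each log update). The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE_BYTES 64u

/* Fraction of a 64-byte bus transfer that carries useful data. */
static double bus_utilization(size_t write_bytes)
{
    assert(write_bytes > 0 && write_bytes <= CACHE_LINE_BYTES);
    return (double)write_bytes / (double)CACHE_LINE_BYTES;
}

/* Number of bus transfers needed to move total_bytes when each
 * transfer carries only write_bytes (rounded up). */
static size_t bus_transfers(size_t total_bytes, size_t write_bytes)
{
    assert(write_bytes > 0 && write_bytes <= CACHE_LINE_BYTES);
    return (total_bytes + write_bytes - 1) / write_bytes;
}
```

Under this model a 4B write uses 4/64 = 6.25% of each transfer, and moving 8MB takes 2097152 transfers at 4B per write versus 131072 transfers at 64B per write, a factor of 16.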

Execution Time vs. Transaction Size: Partial Writes
• Partial writes: string size 4B, 2097152 iterations, 8MB total data
• Full writes: string size 64B, 131072 iterations, 8MB total data
[Figure: execution time (axis 0% to 100%) of the uncacheable and the cacheable log for partial writes and full writes. Bar labels: 1.28E09 cycles (partial writes) and 1.15E08 cycles (full writes).]

Overhead of Uncacheable Write Instructions
// Uncacheable log
    for (i = 0; i < array_size; ++i) {
        value = random_string;
        key = i;
        // Log updates
        // Intrinsic functions to invoke movnti
        _mm_stream_si32(&log[2 * i], key);
        _mm_stream_si32(&log[2 * i + 1], value);
        asm volatile ("sfence");
        array[i] = value;
    }

// Cacheable log
    for (i = 0; i < array_size; ++i) {
        value = random_string;
        key = i;
        // Log updates
        log[2 * i] = key;
        log[2 * i + 1] = value;
        asm volatile ("sfence");
        array[i] = value;
    }

Overhead of Uncacheable Write Instructions
Writing data that is not an integer requires extra type casting, which adds overhead.
    void _mm_stream_si32(int *p, int a)
    asm("movnti %1, %0" : "=m" (*p) : "r" (v));  // int *p, int v;
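To illustrate the casting overhead, the sketch below streams four bytes of non-integer data (for example a float or a piece of a string) through _mm_stream_si32. The helper stream_bytes4 is hypothetical. It first copies the bytes into an int with memcpy, because the intrinsic accepts only an int value and an int pointer; that extra copy and cast is the kind of work the slide refers to. Targets without SSE2 fall back to a plain memcpy.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>   /* _mm_stream_si32 (movnti); also provides _mm_sfence */
#endif

_Static_assert(sizeof(int) == 4, "this sketch assumes a 32-bit int");

/* Hypothetical helper: non-temporally store 4 bytes of arbitrary data.
 * dst must be 4-byte aligned. */
static void stream_bytes4(void *dst, const void *src)
{
    int word;

    assert(((uintptr_t)dst % sizeof(int)) == 0);

    /* Extra step for non-integer data: reinterpret the bytes as an int. */
    memcpy(&word, src, sizeof word);

#if defined(__SSE2__)
    _mm_stream_si32((int *)dst, word);
    _mm_sfence();
#else
    /* Fallback for non-x86 builds: ordinary store. */
    memcpy(dst, &word, sizeof word);
#endif
}
```

Data wider than four bytes needs one such call per 4-byte word, so a 64B string costs sixteen movnti instructions in this scheme.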

Issues with Limited WCB Size
[Diagram: log updates from the transactions issued by the program pass through the WCB before reaching the NVRAM bus.]

Inefficiencies of Uncacheable Log

String size (bytes)    Iterations
4                      2097152
8                      1048576
16                     524288
32                     262144
64                     131072
128                    65536
256                    32768

[Figure: execution time in billion cycles (axis 0.0 to 3.5) of the uncacheable and the cacheable log, and the speedup (axis 1.0 to 1.6), plotted against string size from 4 to 256 bytes. Annotations: "Partial writes and sfence" and "WCB size limit".]

Summary
• Tradeoff between cacheable and uncacheable log
  o Issues with cacheable log: cache contamination
  o Issues with uncacheable log: sub-optimal design in
    • Uncacheable write instructions and programming interface
    • Hardware components, e.g., write-combining buffer design and the way it is used
• More results
  o Sensitivity study on read/write ratio in transactions
  o Sensitivity study on transaction size
  o Other data structures: hash table, rbtree, b+tree, etc.
