
NVthreads: Practical Persistence for Multi-threaded Applications



  1. NVthreads: Practical Persistence for Multi-threaded Applications
     Terry Hsu*, Purdue University
     Helge Brügner*, TU München
     Indrajit Roy*, Google Inc.
     Kimberly Keeton, Hewlett Packard Labs
     Patrick Eugster, TU Darmstadt and Purdue University
     * Work was done at Hewlett Packard Labs.
     NVMW 2018
     ❖ NVthreads was published in EuroSys 2017
     ❖ This work was supported by Hewlett Packard Labs, NSF TC-1117065, NSF TWC-1421910, and ERC FP7-617805.

  2. What is non-volatile memory (NVM)?
     • Key features: persistence, good performance, byte addressability
     • Persistence
       - Retains data without power
     • Good performance
       - Outperforms the traditional filesystem interface
     • Byte addressability
       - Allows pure memory operations

  3. Programming interfaces for NVM
     • NVM-aware filesystems: BPFS, PMFS, PMEM
       - Pro: provide good performance
       - Con: require applications to use file-system interfaces and may need hardware modifications
     • Durable transactions and heaps: NV-Heaps, Mnemosyne
       - Pro: allow fine-grained NVM access
       - Con: force programs to use transactions and require non-trivial effort to retrofit transactions in lock-based programs
     ☞ Problem: Can we provide a simpler programming interface?

  4. NVM-aware apps programming
     Challenges: 1. data consistency; programmability; volatile caches; performance
       # Add element to the tail of list
       pthread_lock(&m);
       malloc(&e, sizeof(*e));
       e->value = 5;
       e->next = NULL;
       head->next = e; // crash
       tail = e;
       pthread_unlock(&m);
     [Figure: linked list in NVM with head and tail pointers, an existing element (value 1), and the new element e (value 5, next NULL)]

  5. NVM-aware apps programming
     Challenges: 1. data consistency; 2. programmability; volatile caches; performance
       # Add element to the tail of list
       pthread_lock(&m);
       malloc(&e, sizeof(*e));
       <save old value of e->value>
       e->value = 5;
       <save old value of e->next>
       e->next = NULL;
       <save old value of head->next>
       head->next = e;
       <save old value of tail>
       tail = e;
       pthread_unlock(&m);
     [Figure: linked list in NVM with head and tail pointers, an existing element (value 1), and the new element e (value 5, next NULL)]

  6. NVM-aware apps programming
     Challenges: 1. data consistency; 2. programmability; 3. volatile caches; performance
       # Add element to the tail of list
       pthread_lock(&m);
       malloc(&e, sizeof(*e));
       <save old value of e->value>
       <flush log entry to NVM>
       e->value = 5;
       <save old value of e->next>
       <flush log entry to NVM>
       e->next = NULL;
       <save old value of head->next>
       <flush log entry to NVM>
       head->next = e;
       <save old value of tail>
       <flush log entry to NVM>
       tail = e;
       pthread_unlock(&m);
     [Figure: cache flushing to NVM; linked list in NVM with head and tail pointers, an existing element (value 1), and the new element e (value 5, next NULL)]

  7. NVM-aware apps programming
     Challenges: 1. data consistency; 2. programmability; 3. volatile caches; 4. performance
       # Add element to the tail of list
       pthread_lock(&m);
       malloc(&e, sizeof(*e));
       <save old value of e->value>
       <flush log entry to NVM>
       e->value = 5;
       <save old value of e->next>
       <flush log entry to NVM>
       e->next = NULL;
       <save old value of head->next>
       <flush log entry to NVM>
       head->next = e;
       <save old value of tail>
       <flush log entry to NVM>
       tail = e;
       pthread_unlock(&m);
     [Figure: cache flushing to NVM; linked list in NVM with head and tail pointers, an existing element (value 1), and the new element e (value 5, next NULL)]
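The save-then-flush pattern in the listing above can be sketched in plain C. This is a minimal illustration, not code from NVthreads or any NVM library: the types `undo_entry` and `undo_log` and the functions `persist`, `logged_store`, `rollback`, and `append` are hypothetical names, `persist` is a no-op stand-in for a cache-line flush plus fence, and only the pointer updates are logged because the new element is unreachable until it is linked. The sketch links the element through `tail` rather than `head`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration; not taken from any NVM library. */
struct node { int value; struct node *next; };

/* One undo record: which pointer was overwritten and its old value. */
struct undo_entry { struct node **addr; struct node *old; };

struct undo_log { struct undo_entry entries[4]; size_t len; };

/* Stand-in for a cache-line flush plus fence. A no-op here (assumption):
 * real code would use a platform-specific flush instruction. */
void persist(const void *p, size_t n) { (void)p; (void)n; }

/* Save the old value, flush the log entry, and only then do the store. */
void logged_store(struct undo_log *log, struct node **addr, struct node *val) {
    assert(log->len < sizeof log->entries / sizeof log->entries[0]);
    struct undo_entry *u = &log->entries[log->len];
    u->addr = addr;
    u->old = *addr;
    persist(u, sizeof *u);
    log->len++;
    persist(&log->len, sizeof log->len);
    *addr = val;
    persist(addr, sizeof *addr);
}

/* Recovery: undo the logged stores in reverse order and empty the log. */
void rollback(struct undo_log *log) {
    while (log->len > 0) {
        struct undo_entry *u = &log->entries[--log->len];
        *u->addr = u->old;
    }
}

/* Append e after *tail. If crash_midway is nonzero, return 0 right after
 * the first pointer update, leaving the log uncommitted. Returns 1 on commit. */
int append(struct undo_log *log, struct node **tail, struct node *e, int crash_midway) {
    e->next = NULL;               /* e is not yet reachable, so no logging */
    persist(e, sizeof *e);
    logged_store(log, &(*tail)->next, e);
    if (crash_midway)
        return 0;                 /* simulated crash: *tail is still the old node */
    logged_store(log, tail, e);
    log->len = 0;                 /* commit: discard the undo records */
    persist(&log->len, sizeof log->len);
    return 1;
}
```

Calling `append` with `crash_midway` set and then `rollback` leaves the list exactly as it was before the append. Every store pays for a log write and a flush, which is the programmability and performance cost the slides point to.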

  8. Challenges of using NVM
     • Data consistency
       - Ensure data consistency even after crash
     • Volatile caches
       - Manage data movement from volatile caches to NVM
     • Programmability
       - Avoid extensive program modifications
     • Performance
       - Minimize runtime overhead
     Proposal: NVthreads, a programming model and runtime that adds persistence to multi-threaded C/C++ programs

  9. Goals of NVthreads
     • Make existing lock-based C/C++ applications crash tolerant
     • Minimize porting effort
       - Drop-in replacement for pthreads library
       - No need for transactions
     • Advantages of NVthreads
       - Good performance
       - Easier to develop NVM-aware applications

  10. Key ideas
      • Use synchronization points to infer consistent regions (cf. Atlas [OOPSLA’14])
        - Does not require applications to use transactions
      • Execute multithreaded program as multi-process program (cf. DThreads [SOSP’11])
        - Process memory buffers uncommitted writes
      • Track data modifications at page granularity
        - Amortizes logging overhead vs fine-grained tracking

  11. Using NVthreads
      • Ease of use: bash$ gcc foo.c -o foo.out -rdynamic libnvthread.so -ldl
      • Modifications: add recovery code, specify persistent allocations
        - Allocate data in NVM: nvmalloc()
        - Recover data in NVM: nvrecover()
      [Figure: software stack, top to bottom]
      • User space: unmodified C/C++ application, linked to the NVthreads library
      • User space: NVthreads library (multi-process, intercepting synchronization, tracking data, maintaining log)
      • Kernel space: operating system (memory allocation and file system interface for both DRAM and NVM)
      • Hardware: DRAM (volatile main memory, e.g., stacks) and NVM (persistent regions, e.g., linked list on heap)

  12. NVthreads: programming model
        void main(){
          if( crashed() ){
            int *c = (int*) nvmalloc(sizeof(int), "c");
            *c = nvrecover(c, sizeof(int), "c");
          }
          else{ // normal execution
            int *c = (int*) nvmalloc(sizeof(int), "c");
            ... // thread creation
            m.lock()
            *c = *c+1;
            ...
            m.unlock()
          }
        }
      Annotation on the else branch: locks mark the boundary of the durable code section.

  13. NVthreads: programming model
        void main(){
          if( crashed() ){
            int *c = (int*) nvmalloc(sizeof(int), "c");
            *c = nvrecover(c, sizeof(int), "c");
          }
          else{ // normal execution
            int *c = (int*) nvmalloc(sizeof(int), "c");
            ... // thread creation
            m.lock()
            *c = *c+1;
            ...
            m.unlock()
          }
        }
      Annotation on the if branch: application-specific recovery code. The programmer needs to add it.

  14. Example: linked list
      • NVthreads guarantees that the linked list is atomically appended w.r.t. failures
        # L is a persistent list
        Start threads {T1, T2, T3}
        …
        # Add element to the tail of list
        pthread_lock(&m);
        nvmalloc(&e, sizeof(*e));
        e->val = localVal;
        tail->next = e;
        e->prev = tail; // crash!
        tail = e;
        pthread_unlock(&m)
      [Figure: timeline of critical sections T1 (add e1), T2 (add e2), T3 (add e3), followed by a recovery phase (execute redo ops). NVM state of the list data structure "L": L={}, L={e1}, L={e1, e2}]
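A runnable version of the critical section above, written against standard pthreads, might look like the sketch below. It is an illustration under stated assumptions: the element is allocated with plain `malloc` (the slide allocates with `nvmalloc`), and `list_head`, `append_worker`, and `run_three_appenders` are hypothetical names. Since NVthreads is presented as a drop-in replacement for pthreads, the lock and unlock calls are the points that would delimit the durable region.

```c
#include <pthread.h>
#include <stdlib.h>

struct node { int val; struct node *prev, *next; };

static struct node list_head = {0, NULL, NULL};   /* sentinel, so tail is never NULL */
static struct node *tail = &list_head;
static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

/* Thread body: append one element carrying the thread's local value. */
void *append_worker(void *arg) {
    int localVal = *(int *)arg;
    /* Plain malloc here; the slide allocates with nvmalloc instead. */
    struct node *e = malloc(sizeof *e);
    if (e == NULL)
        return NULL;
    pthread_mutex_lock(&m);       /* start of the critical section */
    e->val = localVal;
    e->next = NULL;
    tail->next = e;
    e->prev = tail;
    tail = e;
    pthread_mutex_unlock(&m);     /* end of the critical section */
    return NULL;
}

/* Run T1..T3, then return the sum of the values found in the list. */
int run_three_appenders(void) {
    pthread_t t[3];
    int vals[3] = {1, 2, 3};
    int created = 0;
    int sum = 0;
    for (int i = 0; i < 3; i++)
        if (pthread_create(&t[created], NULL, append_worker, &vals[i]) == 0)
            created++;
    for (int i = 0; i < created; i++)
        pthread_join(t[i], NULL);
    /* Walk the list, add up the values, and free the nodes. */
    struct node *n = list_head.next;
    while (n != NULL) {
        struct node *next = n->next;
        sum += n->val;
        free(n);
        n = next;
    }
    list_head.next = NULL;        /* reset so the function can be called again */
    tail = &list_head;
    return sum;
}
```

The order in which the three elements land in the list depends on scheduling, but each append is all-or-nothing with respect to the other threads because the four pointer updates happen under one lock.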

  15. Implementing atomic durability
      • Convert threads to processes (cf. DThreads [SOSP’11])
        - Each process works on private memory, no undo log
        - [Figure: shared address space becomes disjoint address spaces]
      • At synchronization points, propagate private updates, execute processes sequentially
      • Track dirty pages and log them to NVM for recovery
        - Apply redo log in the event of crash

  16. From threads to processes
      [Figure: timelines for T1 and T2. Stages per thread: Start, track dirty pages, NVM log write, merge shared state, Wait, Stop. T1 passes the token to T2. Phases: parallel phase, critical section (one thread executes while the other waits), parallel phase]

  17. Redo logging
      [Figure: T1 moves from the parallel phase into a critical section. Legend: clean page, dirtied page. Arrows: log dirty pages to the redo log in NVM, sync(), merge updated bytes into shared state, write back to NVM]
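Page-granularity redo logging can be sketched as follows. This is a simplified illustration, not the NVthreads implementation: it detects dirty pages by comparing a private working copy against a pristine twin copy, which is an assumption made to keep the example portable, and the names `redo_entry`, `redo_log`, `log_dirty_pages`, and `apply_redo_log` are hypothetical. The log here holds at most one synchronization point's worth of pages.

```c
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096
#define NUM_PAGES 4

/* One redo record: the page index and the page's new contents. */
struct redo_entry { size_t page; unsigned char data[PAGE_SIZE]; };

struct redo_log { struct redo_entry entries[NUM_PAGES]; size_t len; };

/* Compare the private working copy with its pristine twin and append every
 * page that differs to the redo log. Returns the number of pages logged. */
size_t log_dirty_pages(struct redo_log *log,
                       const unsigned char *working,
                       const unsigned char *twin) {
    size_t dirty = 0;
    for (size_t p = 0; p < NUM_PAGES; p++) {
        const unsigned char *w = working + p * PAGE_SIZE;
        if (memcmp(w, twin + p * PAGE_SIZE, PAGE_SIZE) == 0)
            continue;                       /* clean page: nothing to log */
        if (log->len == NUM_PAGES)
            break;                          /* log full in this sketch */
        struct redo_entry *r = &log->entries[log->len++];
        r->page = p;
        memcpy(r->data, w, PAGE_SIZE);      /* whole page, not single bytes */
        dirty++;
    }
    return dirty;
}

/* Recovery: replay the logged pages, in order, onto a stale image. */
void apply_redo_log(const struct redo_log *log, unsigned char *image) {
    for (size_t i = 0; i < log->len; i++)
        memcpy(image + log->entries[i].page * PAGE_SIZE,
               log->entries[i].data, PAGE_SIZE);
}
```

Logging whole pages writes more bytes than a per-store log, but it needs one record per dirtied page no matter how many stores hit that page, which is the amortization argument made for page-granularity tracking.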

  18. Tracking data dependencies
      NVthreads maintains metadata for memory pages per lockset to track data dependencies.
      [Figure: initially X=Y=0. T1 (region A): X=1, then cond_wait(). T2 (region B): Y=X, then cond_signal(). A dependence links A and B. Log1, Log2, and Log3 reside in NVM]

  19. Evaluation
      • Environment
        - Ubuntu 14.04 (Linux 3.16.7)
        - Two Intel Xeon X5650 processors (12 cores @ 2.67GHz)
        - 198GB RAM and 600GB SSD
      • Applications
        - PARSEC benchmarks, Phoenix benchmarks, PageRank, K-means
      • NVM emulator
        - Linux tmpfs on DRAM emulating nvmfs (provided by Hewlett Packard Labs)
        - Injected 1000ns delay to each 4KB page write via RDTSCP instruction
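The delay injection used by the emulator can be approximated with a busy-wait after each page write. The sketch below is an assumption-laden stand-in: it spins on `clock_gettime(CLOCK_MONOTONIC)` instead of the RDTSCP instruction so that it stays portable across POSIX systems, and `now_ns`, `emulated_page_write`, and `NVM_WRITE_DELAY_NS` are hypothetical names rather than part of nvmfs.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <string.h>
#include <time.h>

#define PAGE_SIZE 4096
#define NVM_WRITE_DELAY_NS 1000   /* per-page delay, as stated in the evaluation setup */

/* Monotonic time in nanoseconds. */
uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Copy one 4KB page, then spin until the extra write latency has elapsed. */
void emulated_page_write(void *dst, const void *src) {
    memcpy(dst, src, PAGE_SIZE);
    uint64_t start = now_ns();
    while (now_ns() - start < NVM_WRITE_DELAY_NS) {
        /* busy-wait: sleeping would be too coarse for a 1000ns delay */
    }
}
```

With this scheme, writing 100 pages takes at least 100 microseconds longer than the same copies to plain DRAM, so the emulated cost scales with the number of dirty pages logged.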

  20. Performance vs pthreads
      • No recovery protocol
      • Phoenix and PARSEC benchmarks
      [Figure: bar chart. Y-axis: slowdown (x), 0 to 16. Series: Pthreads, Dthreads, NVthreads (nvmfs 1000ns), Atlas. Benchmarks: histogram, kmeans, linear regression, matrix multiply, pca, reverse index, string match, word count, blackscholes, canneal, dedup, ferret, streamcluster, swaptions]
