Durable Transactional Memory Can Scale With TimeStone


  1. Durable Transactional Memory Can Scale With TimeStone
     R. Madhava Krishnan, Jaeho Kim*, Ajit Mathew, Xinwei Fu, Anthony Demeri, Changwoo Min, Sudarsun Kannan+

  2. Executive Summary
     ➢ TimeStone is a highly scalable Durable Transactional Memory (DTM)
       ○ Goals: high scalability, performance, and low write amplification
       ○ Technique: hybrid DRAM-NVMM logging and MVCC
     ➢ A novel hybrid DRAM-NVMM logging approach for
       ○ High performance and low write amplification
     ➢ TimeStone adopts the Multi-Version Concurrency Control (MVCC) model
       ○ For high scalability and support for multiple isolation levels
     ➢ Scales up to 112 cores and has write amplification <= 1

  3. Talk Outline
     ➢ Motivation
     ➢ Overview
     ➢ Design
     ➢ Evaluation

  4. Non-Volatile Main Memory (NVMM)
     ➢ NVMM has arrived!
     ➢ Storage-like characteristics
       ○ Data persistence
       ○ Large capacity
     ➢ Memory-like performance
       ○ ~100x faster than SSDs
       ○ Offers byte-addressability

  5. Durable Transactional Memory (DTM)
     ➢ DTMs are software frameworks that support ACID properties
     ➢ DTMs make NVMM programming easier
     ➢ They relieve the burden on NVMM application developers
     ➢ There are some serious problems that need immediate attention
       ○ Poor scalability
       ○ High write amplification (up to 6x)

  6. Review of Existing DTMs
     ➢ State-of-the-art DTMs focus on reducing the crash consistency cost
       ○ DudeTM [ASPLOS-17]
       ○ Romulus [SPAA-18]
     ➢ To reduce the crash consistency overhead
       ○ DudeTM keeps logging operations out of the critical path
       ○ Romulus maintains a backup heap to eliminate logging operations
     ➢ Existing DTMs incur high write amplification in the course of reducing the crash consistency cost

  7. Review of Existing DTMs
     ➢ What is Write Amplification (WA)?
       ○ Additional bytes written to NVMM for each user-requested byte
     ➢ Why is it a serious problem?
       ○ Low write endurance of NVMM
       ○ Additional writes generate unnecessary traffic at the NVMM
     ➢ Hence critical path latency increases and performance drops
     ➢ None of the DTMs consider many-core scalability
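
The definition on slide 7 can be read as a simple ratio. The sketch below is only an illustration: the function name, its parameters, and the byte counts are hypothetical, and it takes the slide's wording literally by counting only the additional bytes (log and metadata writes) per user-requested byte.

```python
def write_amplification(user_bytes: int, log_bytes: int, metadata_bytes: int) -> float:
    """Additional NVMM bytes written per user-requested byte.

    This is an assumed reading of the slide's definition, not a formula
    taken from the TimeStone paper.
    """
    if user_bytes <= 0:
        raise ValueError("user_bytes must be positive")
    return (log_bytes + metadata_bytes) / user_bytes


# Hypothetical transaction: a 64-byte update that also writes a 64-byte
# log record and 64 bytes of metadata to NVMM.
wa = write_amplification(user_bytes=64, log_bytes=64, metadata_bytes=64)
print(f"write amplification: {wa:.1f}x")
```

Under this reading, a system that writes no extra log or metadata bytes to NVMM has a write amplification of 0, and every extra byte in the critical path raises the ratio.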

  8. Existing DTMs Are Not Scalable
     ➢ Performance scalability is inevitable!!
     ➢ None of the DTMs scale: performance saturates beyond 16 cores
     [Figure: poor scalability of Romulus, DudeTM, and PMDK]

  9. The Reasons for Poor Scalability: 1. Low RW Parallelism
     ➢ Poor scalability of the underlying STM
       ○ e.g., DudeTM [ASPLOS-17]
     ➢ Supports only a single writer
       ○ e.g., Romulus [SPAA-18], PMDK [Intel]
     [Figure: scalability of Romulus, DudeTM, and PMDK]

  10. The Reasons for Poor Scalability: 2. High Write Amplification
      ➢ Additional bytes written to NVMM
      ➢ Crash consistency overhead
      ➢ Metadata overhead
      ➢ High WA in the critical path
        ○ Impacts the system throughput

      DTM System     Write Amplification (WA)
      Libpmemobj     70x
      Romulus        2x
      DudeTM         4-6x
      KaminoTx       2x
      Mnemosyne      4-7x

  11. So What Do We Need Now?
      ➢ A scalable and high-performance DTM
      ➢ Low write amplification
      Our solution: TimeStone

  12. Talk Outline
      ➢ Motivation
      ➢ Overview
      ➢ Design
      ➢ Evaluation

  13. Two Main Goals of TimeStone
      1) Achieve high scalability and performance
      2) Reduce write amplification significantly

  14. Goal 1 - To Achieve High Scalability
      ➢ TimeStone adopts Multi-Version Concurrency Control (MVCC)
      ➢ Supports non-blocking reads and concurrent disjoint writes
      ➢ MVCC provides better RW parallelism
      ➢ Let's illustrate how MVCC works!

  15. Illustration - MVCC Programming Model
      CASE 1: Concurrent Readers
      [Figure: Reader-1, Reader-2, Reader-3, and Reader-4 reading Node A, Node B, Node C, and Node D]
      ➢ TimeStone supports non-blocking reads

  16. Illustration - MVCC Programming Model
      CASE 2: Concurrent Writers
      [Figure: Writer-1, Writer-2, and Writer-3 updating Node A, Node B, Node C, and Node D]
      ➢ Disjoint writers: TimeStone supports disjoint writes
      ➢ Conflicting writers: one of the writers succeeds and the others abort
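
The two writer cases on slide 16 can be sketched with a per-node ownership check. This is a single-threaded illustration under assumed names (`Node`, `try_lock`, and the writer ids are hypothetical, not TimeStone's API); a real implementation would need an atomic compare-and-swap rather than a plain assignment.

```python
class Node:
    """A node that at most one writer may own at a time."""

    def __init__(self, name: str):
        self.name = name
        self.owner = None  # id of the writer currently holding the node, if any


def try_lock(node: Node, writer_id: str) -> bool:
    """Return True if writer_id now owns the node, False if it must abort."""
    if node.owner is None:
        node.owner = writer_id
        return True
    return node.owner == writer_id


node_a, node_b = Node("A"), Node("B")

# Disjoint writers: each one targets a different node, so both proceed.
w1_ok = try_lock(node_a, "Writer-1")
w2_ok = try_lock(node_b, "Writer-2")

# Conflicting writer: Node A is already owned, so Writer-3 aborts.
w3_ok = try_lock(node_a, "Writer-3")

print(w1_ok, w2_ok, w3_ok)
```

Readers never call `try_lock` in this sketch, which mirrors the non-blocking reads shown in case 1.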

  17. Goal 1 - To Achieve High Scalability
      ➢ MVCC provides better RW parallelism
      ➢ But that alone is not enough for better scalability!
      ➢ Two reasons for poor scalability
        ○ Low RW parallelism ⇒ solved by adopting MVCC
        ○ High write amplification
      ➢ MVCC can incur very high write amplification

  18. Goal 1 - To Achieve High Scalability
      ➢ MVCC for better RW parallelism
      ➢ Optimize MVCC for NVMM
      ➢ We optimize MVCC for NVMM to achieve better scalability
        ○ Reduce write amplification
        ○ Asynchronous garbage collection (refer to the paper)

  19. Goal 2 - Low Write Amplification
      ➢ TOC logging is a multilayered hybrid DRAM-NVMM logging scheme
        ○ Transient version log in DRAM (Tlog)
          ■ To leverage faster DRAM for better coalescing
        ○ Operational log in NVMM (Olog)
          ■ To guarantee immediate durability
        ○ Checkpoint log in NVMM (Clog)
          ■ To guarantee correct recovery
      ➢ TOC logging is key to achieving low write amplification

  20. Reducing Write Amplification in TimeStone
      ➢ Olog for low crash consistency overhead
        ○ Immediate durability with low overhead
      ➢ Log coalescing for low metadata overhead
      [Figure: update_node(A, V1), update_node(A, V2), and update_node(A, V3) are recorded in the Olog in NVMM, and versions V1, V2, V3 of Node A are kept in the Tlog in DRAM. "Tlog is 70% filled, I need to free up some space!! Let me trigger checkpointing": the writes are coalesced and V3 is checkpointed to the Clog. "Clog is 70% filled, I need to free up some space!! Let me trigger Writeback": the Clog holds V3, V5, V7, V9 of Node A, the checkpoints are coalesced, and V9 is written back to Node A in NVMM. Metadata overhead is reduced at both steps.]
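
The coalescing flow on slide 20 can be sketched as a toy model. Everything here is an assumption made for illustration: the class and method names, the capacity of 4 entries, and the way the 70% trigger is checked are hypothetical, and Olog reclamation and recovery are left out entirely.

```python
class TocLog:
    """Toy model of the Tlog -> Clog -> master pipeline (hypothetical names)."""

    def __init__(self, capacity: int = 4, threshold: float = 0.7):
        self.capacity = capacity
        self.threshold = threshold
        self.olog = []    # NVMM: one operation record per update
        self.tlog = []    # DRAM: transient versions as (key, value)
        self.clog = []    # NVMM: checkpointed versions as (key, value)
        self.master = {}  # NVMM: master objects
        self.checkpointed = 0
        self.written_back = 0

    def _is_filling_up(self, log: list) -> bool:
        return len(log) >= self.threshold * self.capacity

    def update(self, key: str, value: str) -> None:
        self.olog.append((key, value))  # durability point
        self.tlog.append((key, value))  # new transient version in DRAM
        if self._is_filling_up(self.tlog):
            self.checkpoint()

    def checkpoint(self) -> None:
        # Coalesce: later entries overwrite earlier ones for the same key,
        # so only the newest version of each object reaches the Clog.
        latest = dict(self.tlog)
        self.clog.extend(latest.items())
        self.checkpointed += len(latest)
        self.tlog.clear()
        if self._is_filling_up(self.clog):
            self.writeback()

    def writeback(self) -> None:
        # Coalesce again: only the newest checkpoint per object is written
        # back to the master object.
        latest = dict(self.clog)
        self.master.update(latest)
        self.written_back += len(latest)
        self.clog.clear()


log = TocLog()
for i in range(1, 10):
    log.update("A", f"V{i}")
print(len(log.olog), log.checkpointed, log.written_back, log.master)
```

In this run, nine updates to Node A produce three checkpointed versions and a single writeback of V9, which is the effect the slide attributes to coalescing in the Tlog and the Clog.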

  21. Talk Outline
      ➢ Motivation
      ➢ Overview
      ➢ Design
      ➢ Evaluation

  22. Object Structure in TimeStone: Master Object
      ➢ TimeStone is an object-based DTM
      ➢ The user-defined persistent structure is called the master object
      ➢ For example, a simple linked list
      [Figure: DRAM and NVMM regions, with a linked list of Master Objects A, B, C, and D]

  23. Object Structure in TimeStone: Version Object
      ➢ The different versions of one master object are called version objects
      [Figure: DRAM and NVMM regions; each of Master Objects A, B, C, and D has a version chain of version objects, for example Version Object B2 followed by Version Object B1]

  24. Writes in TimeStone
      ➢ Any number of writers can simultaneously work on disjoint master objects
      [Figure: Update(B, B1) on Master Object B across DRAM and NVMM. Labels: Version Object B1 in the Tlog, assign the wrt-clk (77), Olog (durability point), linearization point]

  25. Dereferencing - Finding the Right Version
      ➢ Which version object to dereference?
      ➢ Read the first version object with wrt-clk <= local-clk
      ➢ Any number of readers can simultaneously traverse the version chain without being blocked
      [Figure: a reader with local-clk = 55 walks the version chain of Master Object B. Version Object B4 (wrt-clk = 70) has wrt-clk >= local-clk and is skipped; Version Object B3 (wrt-clk = 50) has wrt-clk <= local-clk and is read; Version Object B2 (wrt-clk = 40) is older.]
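
The dereference rule on slide 25 can be written as a short chain walk. This is a minimal sketch with hypothetical names (`VersionObject`, `dereference`, and the `older` link are assumptions); falling back to the master object when no version qualifies is also an assumption and is not stated on the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VersionObject:
    name: str
    wrt_clk: int
    older: Optional["VersionObject"] = None  # next (older) entry in the chain


def dereference(newest: Optional[VersionObject], local_clk: int, master: str) -> str:
    """Return the first version, newest to oldest, with wrt_clk <= local_clk."""
    version = newest
    while version is not None:
        if version.wrt_clk <= local_clk:
            return version.name
        version = version.older
    # Assumption: with no qualifying version, the reader uses the master object.
    return master


# Version chain from the slide: B4 (70) -> B3 (50) -> B2 (40).
b2 = VersionObject("B2", 40)
b3 = VersionObject("B3", 50, older=b2)
b4 = VersionObject("B4", 70, older=b3)

chosen = dereference(b4, local_clk=55, master="Master Object B")
print(chosen)
```

The walk only reads the chain, so any number of readers can run it at the same time without taking a lock.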

  26. Other Interesting Features in TimeStone
      ➢ Mixed isolation support
      ➢ Asynchronous time-based garbage collection
      ➢ More details on the design

  27. Talk Outline
      ➢ Motivation
      ➢ Overview
      ➢ Design
      ➢ Evaluation

  28. Evaluation Questions
      ➢ What is the write amplification in TimeStone?
      ➢ Is log coalescing beneficial?
      ➢ Does TimeStone scale?
      ➢ What is the impact on real-world workloads?

  29. Evaluation Settings
      ➢ Real NVMM server (Intel DCPMEM)
        ○ 1TB NVMM and 337GB DRAM
        ○ 2.5 GHz 112-core Intel Cascade Lake processor
      ➢ Benchmarks
        ○ Microbenchmarks: List, Hash Table, BST
        ○ Application benchmarks: KyotoCabinet and YCSB
      ➢ Workloads
        ○ Different update ratios, access patterns, and data set sizes
      ➢ Compared against state-of-the-art DTM systems

  30. Write Amplification for Write-Intensive (80% Update) Hash Table
      ➢ Write amplification of PMDK is 70 even for the 2% update case
      ➢ Write amplification of TimeStone is always <= 1

  31. Write Coalescing in TOC Logging
      ➢ Only 7% of writes are checkpointed from the Tlog
      ➢ The rest are coalesced in the Tlog
      ➢ Only 0.01% of writes are written back to the master
      ➢ The rest are coalesced in the Tlog and Clog
      [Figure: chart with values 100%, 16%, 7%, and 0.01%]

  32. Scalability for Read-Mostly Hash Table (2% Update)
      ➢ TimeStone scales linearly
      ➢ TimeStone is 70x faster than Romulus

  33. Scalability for Write-Intensive Hash Table (80% Update)
      ➢ TimeStone still scales linearly
      ➢ With MVCC, TimeStone supports better RW parallelism than existing DTMs and hence scales better
      ➢ TimeStone performs 100x faster than DudeTM
      ➢ Low write amplification in TimeStone shortens the critical path, which leads to better performance and scalability

  34. Real-World Application - KyotoCabinet
      ➢ TimeStone-enabled KyotoCabinet scales well in addition to offering crash consistency
      ➢ Performs up to 3x better while additionally supporting crash consistency
      [Figure: compared against vanilla KyotoCabinet running on DRAM and vanilla KyotoCabinet running on NVMM without crash consistency]

  35. Discussion
      ➢ Durable Transactional Memory systems
        ○ Romulus [SPAA-18], DudeTM [ASPLOS-17], PMDK, Mnemosyne [ASPLOS-11]
      ➢ Inspired by in-memory databases
        ○ Ermia [SIGMOD-16], Cicada [SIGMOD-17]
      ➢ Also by non-linearizable synchronization algorithms
        ○ RCU [OLS-02], RLU [SOSP-15], MV-RLU [ASPLOS-19]
      ➢ Future work
        ○ Provide memory safety and reliability in TimeStone
        ○ Extend TimeStone to support distributed transactions
