Distributed Shared Persistent Memory (SoCC ’17), Yizhou Shan and Yiying Zhang
  1. Distributed Shared Persistent Memory (SoCC ’17) Yizhou Shan, Yiying Zhang

  2. Persistent Memory (PM/NVM): byte addressable, persistent, low latency, large capacity, cost effective. (Figure: PM sits next to DRAM, below the CPU and its caches, in the memory hierarchy.)

  3. Much PM Work, but All in a Single Machine
     • Local memory models: NV-Heaps [ASPLOS ’11], Mnemosyne [ASPLOS ’11], Memory Persistency [ISCA ’14], Synchronous Ordering [MICRO ’16]
     • Local file systems: BPFS [SOSP ’09], PMFS [EuroSys ’14], SCMFS [SC ’11], HiNFS [EuroSys ’16]
     • Local transaction/logging systems: NVWAL [ASPLOS ’16], SCT/DCT [ASPLOS ’16], Kamino-Tx [EuroSys ’17]

  4. Moving PM into Datacenters
     • PM fits the datacenter: applications need a lot of memory, fast access to persistent data, and low monetary cost
     • Challenges: handling node failures, ensuring good performance and scalability, providing an easy-to-use abstraction

  5. How to Use PM in Distributed Environments?
     • As distributed memory? As distributed storage?
     • Mojim [Zhang et al., ASPLOS ’15]: the first PM work in distributed environments, with efficient PM replication, but far from a full-fledged distributed NVM system

  6. Resource Allocation in Datacenters (Figure: VMs, containers, and applications pinned to cores and fixed memory slices, e.g. 3 GB and 4 GB out of 8 GB of main memory, on Node 1 and Node 2.)

  7. Resource Utilization in Production Clusters: unused resources, plus waiting/killed jobs, because of physical-node constraints. (Google production cluster trace data, https://github.com/google/cluster-data)

  8. Q1: How to achieve better resource utilization? Use remote memory

  9. Distributed (Remote) Memory (Figure: the same workloads as on slide 6, but applications can now use memory on both Node 1 and Node 2.)

  10. Modern Datacenter Applications Have Significant Memory Sharing (e.g., PowerGraph, TensorFlow)

  11. Q2: How to scale out parallel applications? Distributed shared memory

  12. What about persistence? Data persistence is useful:
     • Many existing data storage systems ➡ performance
     • Memory-based, long-running applications ➡ checkpointing

  13. Q3: How to provide data persistence?

  14. From DSM to Distributed Shared Persistent Memory (DSPM): a significant step towards using PM in datacenters

  15. DSPM
     • Native memory load/store interface: local or remote (transparent); pointers and in-memory data structures
     • Supports memory read/write sharing

  16. (Recap of slide 14) From DSM to Distributed Shared Persistent Memory (DSPM)

  17. DSPM: One-Layer Approach
     • (Distributed) memory side: memory load/store interface, local or remote (transparent), pointers and in-memory data structures, read/write sharing
     • (Distributed) storage side: persistent naming, data durability and reliability
     • Benefits of both memory and storage: no redundant layers, no data marshaling/unmarshaling

  18. Hotpot: A Kernel-Level, RDMA-Based DSPM System
     • Easy to use: native memory interface
     • Fast and scalable
     • Flexible consistency levels
     • Data durability & reliability

  19.-24. Hotpot Architecture (architecture diagram, built up step by step across six slides)

  25. Hotpot Code Example
     /* Open a dataset named 'boilermaker' */
     int fd = open("/mnt/hotpot/boilermaker", O_CREAT | O_RDWR);
     /* Map it into the application's virtual address space */
     void *base = mmap(0, 40960, PROT_WRITE, MAP_PRIVATE, fd, 0);
     /* First access: Hotpot fetches the page from a remote node */
     *(char *)base = 9;
     /* Later accesses: direct memory loads/stores */
     memset(base, 0x27, PAGE_SIZE);
     /* Commit data: make it coherent, durable, and replicated */
     msync(sg_addr, sg_len, MSYNC_HOTPOT);
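For readers who want to try the flow above, here is a hedged expansion of the snippet into one compilable C program. It keeps the slide's mount point and MSYNC_HOTPOT commit flag; the flag's numeric value, the error handling, and the choice to commit the whole mapping (rather than the slide's sg_addr/sg_len range) are assumptions for illustration, not part of Hotpot's documented API.

    /* Sketch only: assumes a Hotpot mount at /mnt/hotpot and a Hotpot-specific
     * MSYNC_HOTPOT flag; the value below is a placeholder, not the real one. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #ifndef MSYNC_HOTPOT
    #define MSYNC_HOTPOT 0x4        /* placeholder, not taken from Hotpot headers */
    #endif

    int main(void)
    {
        /* Open (or create) a named, persistent dataset. */
        int fd = open("/mnt/hotpot/boilermaker", O_CREAT | O_RDWR, 0644);
        if (fd < 0) { perror("open"); return 1; }

        /* Map it into the application's virtual address space. */
        size_t len = 40960;
        char *base = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
        if (base == MAP_FAILED) { perror("mmap"); return 1; }

        /* Plain loads/stores; Hotpot fetches remote pages on first access. */
        base[0] = 9;
        memset(base, 0x27, 4096);

        /* Commit point: make the written range coherent, durable, replicated. */
        if (msync(base, len, MSYNC_HOTPOT) != 0)
            perror("msync");

        munmap(base, len);
        close(fd);
        return 0;
    }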

  26. How to efficiently add P to “DSM”?
     • Distributed shared memory: caches remote memory on demand for fast local access, producing multiple redundant copies
     • Distributed storage systems: actively add redundancy to provide data reliability
     • One Layer Principle: integrate the two forms of redundancy with morphable page states

  27. Morphable Page States
     • A PM page can serve different purposes, possibly at different times:
       - as a local cached copy, to improve performance
       - as a redundant data page, to improve data reliability
     (Figure: example where Node 2 accesses page 3 and ends up holding a copy of it alongside its other pages.)
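The following is a rough, hypothetical illustration of the morphable-page-state idea, not Hotpot's actual kernel structures: the same PM page changes roles by flipping a metadata field, so a copy cached for performance can later be counted as a redundant replica for reliability. Every name here is made up for the sketch.

    /* Hypothetical page descriptor: the data never moves, only its role changes. */
    enum dspm_page_state {
        PAGE_INVALID,     /* no valid data for this page on this node      */
        PAGE_CACHED,      /* local cached copy, serves fast local reads    */
        PAGE_DIRTY,       /* locally modified, not yet committed           */
        PAGE_REDUNDANT,   /* committed replica, counts toward reliability  */
    };

    struct dspm_page {
        void                 *pm_addr;   /* where the page lives in local PM */
        enum dspm_page_state  state;
    };

    /* After a successful commit, a cached or dirty page can morph into a
     * redundant replica in place: only the metadata is updated. */
    static void morph_to_replica(struct dspm_page *pg)
    {
        if (pg->state == PAGE_CACHED || pg->state == PAGE_DIRTY)
            pg->state = PAGE_REDUNDANT;
    }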

  28. How to efficiently add P to “DSM”? (continued)
     • When to make cached copies coherent? When to make data durable and reliable?
     • Observations:
       - Data-store applications have well-defined commit points
       - A commit point is the time to make data persistent
       - Data made visible to storage devices can also be made visible to other nodes
     • Exploit application behavior: make data coherent only at commit points (see the sketch below)
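As a sketch of how an application's existing commit point lines up with this design (hypothetical helper and record layout; MSYNC_HOTPOT as on the code-example slide): the stores are ordinary memory writes, and the single msync at the commit point is where the data becomes coherent, durable, and replicated.

    #include <stddef.h>
    #include <stdint.h>
    #include <sys/mman.h>

    #ifndef MSYNC_HOTPOT
    #define MSYNC_HOTPOT 0x4               /* placeholder value, as before */
    #endif
    #define DSPM_PAGE_SIZE 4096UL

    struct kv_record { char key[32]; char val[96]; };

    /* Append one record to a DSPM-mapped log, then commit just that record.
     * The assignment is a plain store into PM; msync is the commit point at
     * which other nodes may observe the update and replicas are created. */
    static int log_append(struct kv_record *log, size_t *next,
                          const struct kv_record *rec)
    {
        log[*next] = *rec;
        (*next)++;

        /* Commit the page range covering the newly written record. */
        uintptr_t lo  = (uintptr_t)&log[*next - 1] & ~(DSPM_PAGE_SIZE - 1);
        size_t    len = (uintptr_t)&log[*next] - lo;
        return msync((void *)lo, len, MSYNC_HOTPOT);
    }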

  29. Commit Point (Figure: three nodes, each with a CPU cache and PM; updated pages such as A' start in CPU caches and, after the commit, reside in PM and in replicas on other nodes.)
     • Commits make data durable, coherent, and reliable
     • Single-node and distributed consistency
     • Two consistency modes: single writer / multiple writers

  30. Flexible Coherence Levels (see the sketch below)
     • Multiple Reader Multiple Writer (MRMW)
       - Allows multiple concurrent dirty copies
       - Great parallelism, but weaker consistency
       - Three-phase commit protocol
     • Multiple Reader Single Writer (MRSW)
       - Allows only one dirty copy
       - Trades parallelism for stronger consistency
       - Single-phase commit protocol
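A conceptual sketch of the difference between the two commit flows: the broadcast/push helpers are stubs standing in for Hotpot's RDMA messaging, and the phase names are paraphrased from this slide rather than taken from the paper's exact protocol.

    #include <stdbool.h>
    #include <stddef.h>

    struct commit_req { const void **pages; size_t npages; };

    /* Hypothetical messaging stubs (stand-ins for the RDMA layer). */
    static bool bcast_prepare(const struct commit_req *r) { (void)r; return true; }
    static bool bcast_apply  (const struct commit_req *r) { (void)r; return true; }
    static bool bcast_done   (const struct commit_req *r) { (void)r; return true; }
    static bool push_update  (const struct commit_req *r) { (void)r; return true; }

    /* MRMW: several nodes may hold dirty copies of the same pages, so a commit
     * takes three phases -- stage the new data everywhere, have all nodes apply
     * it, then confirm completion -- before it becomes globally visible. */
    static bool commit_mrmw(const struct commit_req *r)
    {
        return bcast_prepare(r) && bcast_apply(r) && bcast_done(r);
    }

    /* MRSW: only one dirty copy can exist, so the single writer's data is
     * authoritative and one push-and-acknowledge round is enough. */
    static bool commit_mrsw(const struct commit_req *r)
    {
        return push_update(r);
    }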

  31. MongoDB Results
     • Modify MongoDB with ~120 LOC, use MRMW mode
     • Compare with tmpfs, PMFS, Mojim, and Octopus using YCSB

  32. Conclusion
     • One-layer approach: challenges and benefits
     • Hotpot: a kernel-level, RDMA-based DSPM system
     • Hides complexity behind a simple abstraction
     • Calls for attention to using PM in datacenters
     • Many open problems in distributed PM!

  33. Thank You Questions? Get Hotpot at: https://github.com/WukLab/Hotpot wuklab.io
