NATIVE OS SUPPORT FOR PERSISTENT MEMORY WITH REGIONS
Mohammad Chowdhury (mchow017@fiu.edu)
Raju Rangaswami (raju@cs.fiu.edu)
Florida International University

PERSISTENT MEMORY (PM)
Hybrid characteristics of memory and storage
Memory:
• Volatile
• Byte-addressable access
• Fast
Storage:
• Non-volatile/Persistent
• Block I/O access
• Slow
Persistent Memory:
• Non-volatile/Persistent
• Byte-addressable access
• Fast
• Read/write latency: 4X-10X of memory
2 5/19/2017

PM CHALLENGES
PM is directly accessible by CPU BUT …
• Caches and the memory controller sit between PM and CPU
• Caches write dirty cache lines to DRAM/PM according to the cache eviction policy
• The memory controller optimizes performance by reordering the updates
• PM resident data can be corrupted after a system failure if ordering of updates is violated
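
The ordering hazard can be illustrated with a toy model. This is only a sketch under stated assumptions: the two-field record (`payload`, `valid`), the helper names, and the crash model are hypothetical and are not taken from the talk. It enumerates every PM state a power failure could leave behind when two writes may persist in either order, and compares that with the case where a flush and barrier force program order.

```python
from itertools import permutations

INITIAL = {"payload": None, "valid": False}   # hypothetical record layout

def crash_states(writes, reorder):
    """PM states reachable if power fails after any prefix of a persist order."""
    orders = permutations(writes) if reorder else [tuple(writes)]
    states = []
    for order in orders:
        pm = dict(INITIAL)
        states.append(dict(pm))        # failure before anything persists
        for field, value in order:
            pm[field] = value          # this write reaches PM
            states.append(dict(pm))    # failure right after it
    return states

def consistent(pm):
    # Invariant: a record marked valid must carry its payload.
    return (not pm["valid"]) or pm["payload"] is not None

# Program order: payload first, then the valid flag.
writes = [("payload", "hello"), ("valid", True)]

# No ordering enforced: the flag may persist before the payload.
corrupt = [s for s in crash_states(writes, reorder=True) if not consistent(s)]

# Ordering enforced (flush + barrier between the writes): program order only.
ordered_ok = all(consistent(s) for s in crash_states(writes, reorder=False))
```

In this model the only corrupt state is the one where `valid` reached PM ahead of `payload`, which is exactly the case that enforced ordering rules out.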

PM CHALLENGES: THE COSTS OF ORDERING
• Ordering requires cache line flushes, barriers, and ADR (asynchronous DRAM refresh)
• Increased cost of operations
• More redundant metadata means more ordering required
• GOAL
  • Reduce ordering requirements

PM CHALLENGES: ATOMIC DATA DURABILITY
[Figure: timeline (t0 to t3) in which processes P1, P2, and P3 update a shared PM value (values shown: 2, 4, 8) and P3 then calls msync. Panel labels: Memory: Null ("All good!!"); PM: version "4" ("shouldn't be here").]
Requirements:
1. Make data atomically durable (ALL or NONE)
2. Revert back to initial state in case of failure

PM OPPORTUNITIES: SHARED CONSISTENCY
DAX/Regular MMAP (P1 and P2 share PM with cache coherent visibility)
• MAP_SHARED
• Updates immediately reflected in process address spaces
• NOT atomically durable
NOVA ATOMIC MMAP (P1 and P2 each work on a private copy)
• MAP_ATOMIC, MAP_PRIVATE
• Updates only visible to the process
• Atomically durable
• Forfeits sharing/cache coherency support
Requirements:
1. Updates should be visible to all sharing processes
2. Should support atomic durability of all updates across a shared region

PM OPPORTUNITIES: SIMPLE MEMORY-LIKE TRANSACTIONS
Program A:
  Allocate persistent Obj1;
  Allocate persistent Obj2;
  Begin Transaction
    Obj1 operations
  End transaction
  Begin Transaction
    Obj2 operations
  End transaction
Programmers:
1. Must track all updates to persistent objects
2. Must annotate individual transactions
Program B:
  A = mmap(PM);
  Allocate objects Obj1, Obj2 from mapped area
  Operations involving Obj1, Obj2.
  Sync()
  More operations on both Obj1, Obj2
  Sync()
Programmers simply call Sync() to persist all updates in a mapped area

APPLICATION REQUIREMENTS FOR PM
A PM based application needs:
• Arbitrary & Unordered Allocation
• Persistent Namespace
• Mapped Data Consistency
• Consistent Sharing Support
• Simple Memory Like Transactions

CONTEMPORARY SOLUTIONS
• DAX File Systems: NOVA, EXT4-DAX, PMFS
• Memory Subsystem: OS
• Persistent Heaps: Mnemosyne, NV-Heaps, LibpmemObj
• Regular File Sys.: EXT4, BTRFS, etc.
• Atomic Msync: Failure Atomic Msync (EXT4-JBD)
• Replication: Mojim, RDMA

CONTEMPORARY SOLUTIONS
[Table: Region System, DAX File Systems, Memory Subsystem, Persistent Heaps, Regular File Sys., Atomic Msync, and Replication compared on Arbitrary and Unordered Allocation, Consistent Sharing Support, Simple Memory Like Transactions, Mapped Data Consistency (full or partial), and Persistent Namespace.]

REGION SYSTEM
We present “Region System”, a kernel subsystem that supports persistent memory, with the following goals:
• Minimize unwanted latency in the persistent memory access path;
• Provide users with direct and consistent access to shared persistent memory; and
• Demonstrate modifications of the existing applications for optimized usage.

REDEFINED OS MEMORY/STORAGE STACK
• NOT intended as a replacement for File Systems or the Memory Subsystem
• The Region System (RS) should serve as a core “Persistent Memory Support System” usable by applications, file systems, and other kernel subsystems.

ARCHITECTURE
• Region: Collection of persistent pages
• PPAGES: 4KB PM pages

CONSISTENCY STATES
Current | Snapshot | State
0       | 0        | No ppage
0       | y        | Invalid: there cannot be a snapshot without a current page
x       | 0        | Un-synced page, mapped to the address space
x       | y        | x == y: page in synced state
x       | y        | x != y: page in unsynced state; "y" is the consistent version
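
The state table can be read as a small classification function. The sketch below is illustrative only: the function names and the returned labels are hypothetical, and 0 is taken to mean "no page", as in the table.

```python
def ppage_state(current, snapshot):
    """Classify a ppage from its current and snapshot entries (0 = no page)."""
    if current == 0 and snapshot == 0:
        return "no ppage"
    if current == 0:
        return "invalid"      # a snapshot cannot exist without a current page
    if snapshot == 0:
        return "unsynced"     # mapped, but never synced
    if current == snapshot:
        return "synced"
    return "unsynced"         # the snapshot is the consistent version

def consistent_version(current, snapshot):
    """Return the entry that holds the consistent version, or None if there is none."""
    state = ppage_state(current, snapshot)
    if state == "invalid":
        raise ValueError("snapshot without a current page")
    return snapshot if snapshot != 0 else None
```

For example, `ppage_state(7, 7)` is `"synced"`, while `consistent_version(8, 7)` returns `7`, the snapshot.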

REGION SYSTEM (RS) INTERFACE
Class                 | System Call
Namespace             | region_d open (char region_name, flags f)
                      | int close (region_d rd)
                      | int delete (region_d rd)
Allocation            | ppage_no alloc_ppage (region_d rd)
                      | int free_ppage (region_d rd, ppage_no ppn)
Mapping & Consistency | vaddr pmmap (vaddr va, region_d rd, ppage_no, int nbytes, flags f)
                      | int pmunmap (vaddr va)
                      | pmsync (vaddr va)
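
To show how the calls might fit together, here is an in-memory mock that borrows the call names from the table. It is not the kernel interface: the class, its internals, and the `mem` helper are hypothetical, `pmmap` is simplified to map a single ppage (no address hint, length, or flags), and "durable" just means a separate Python bytes object.

```python
class MockRegionSystem:
    """In-memory stand-in mirroring the call names above; not the kernel API."""
    PPAGE_SIZE = 4096                      # ppages are 4KB

    def __init__(self):
        self.regions = {}                  # region name -> {ppage_no: durable bytes}
        self.open_rds = {}                 # region descriptor -> region name
        self.mappings = {}                 # mock vaddr -> (rd, ppage_no, working copy)
        self._next = 1

    def _new_id(self):
        self._next += 1
        return self._next - 1

    def open(self, region_name, flags=0):
        self.regions.setdefault(region_name, {})
        rd = self._new_id()
        self.open_rds[rd] = region_name
        return rd

    def close(self, rd):
        del self.open_rds[rd]
        return 0

    def alloc_ppage(self, rd):
        ppn = self._new_id()
        self.regions[self.open_rds[rd]][ppn] = bytes(self.PPAGE_SIZE)
        return ppn

    def free_ppage(self, rd, ppn):
        del self.regions[self.open_rds[rd]][ppn]
        return 0

    def pmmap(self, rd, ppn):
        va = self._new_id()
        durable = self.regions[self.open_rds[rd]][ppn]
        self.mappings[va] = (rd, ppn, bytearray(durable))
        return va

    def pmunmap(self, va):
        del self.mappings[va]
        return 0

    def pmsync(self, va):
        rd, ppn, working = self.mappings[va]
        self.regions[self.open_rds[rd]][ppn] = bytes(working)
        return 0

    def mem(self, va):
        return self.mappings[va][2]        # the mapped bytes (mock helper)


rs = MockRegionSystem()
rd = rs.open("accounts")                   # namespace
ppn = rs.alloc_ppage(rd)                   # allocation
va = rs.pmmap(rd, ppn)                     # mapping
rs.mem(va)[0:5] = b"hello"                 # plain memory writes
durable_before = rs.regions["accounts"][ppn][:5]
rs.pmsync(va)                              # consistency point
durable_after = rs.regions["accounts"][ppn][:5]
rs.pmunmap(va)
rs.close(rd)
```

The point of the usage sequence is the programming model: writes are ordinary memory stores, and a single `pmsync` call is the only place where durability is requested.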

METADATA OPERATIONS
• Persistent Operations
  • Modify persistent metadata
• Volatile Operations
  • No updates to persistent metadata
• Persistent operations are designed to achieve atomic durability

METADATA OPERATION COMPARISON
[Figure: comparison of persistent operations and volatile operations; bar annotations: 1.1x, 2.8x, 2.2x, 1.25x, 2.3x.]

MAPPED DATA CONSISTENCY CHALLENGES
• Avoid Unwanted Durability
  • Applications want to make updates durable only upon an msync() invocation.
  • Updates can become durable in PM before an msync() call.
  • In case of a failure, the mapped PM area will then contain uncommitted data.
• Protecting the Sync
  • During a sync operation, no application should be allowed to write to the mapped PM; this is difficult to achieve due to direct CPU access.
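
The "unwanted durability" problem can be shown with a tiny model. This is a sketch with made-up names and values: a dictionary stands in for PM, another for the CPU cache, and a power failure simply discards the cache.

```python
def value_after_failure(evict_before_failure):
    """Return what PM holds for x after a power failure with no msync issued."""
    pm = {"x": 2}            # durable contents as of the last sync
    cache = {"x": 4}         # application update sitting in the CPU cache
    if evict_before_failure:
        pm.update(cache)     # the eviction policy writes the line back early
        cache.clear()
    cache.clear()            # power failure: cache contents are lost
    return pm["x"]

committed_only = value_after_failure(evict_before_failure=False)
leaked_update = value_after_failure(evict_before_failure=True)
```

Without the early eviction PM still holds the committed value 2; with it, the uncommitted value 4 survives the failure even though the application never asked for it to be durable.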

ATOMIC DURABILITY WITH PMSYNC
1. Identify the dirty pages
2. Write protect the pages
3. Flush dirty cache lines
4. Copy-on-write protection for future writes to a sync'ed page
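
The four steps above can be sketched as a user-level simulation. Everything here is an assumption made for illustration: the `Ppage` class, its fields, and `recover` are hypothetical, step 3 is a placeholder because Python cannot issue cache flushes, and a page counts as synced when its snapshot and current entries refer to the same object.

```python
class Ppage:
    def __init__(self, size=8):
        self.current = bytearray(size)   # working copy that applications write
        self.snapshot = None             # last consistent version (None until synced)
        self.dirty = False
        self.write_protected = False
        self.cow_copies = 0

    def write(self, offset, data):
        if self.write_protected:
            # Step 4: the first write after a sync copies the page,
            # so the snapshot keeps the synced contents.
            self.current = bytearray(self.current)
            self.cow_copies += 1
            self.write_protected = False
        self.current[offset:offset + len(data)] = data
        self.dirty = True

def pmsync(pages):
    dirty = [p for p in pages if p.dirty]      # 1. identify the dirty pages
    for p in dirty:
        p.write_protected = True               # 2. write protect the pages
    for p in dirty:
        pass                                   # 3. flush dirty cache lines (not modeled)
    for p in dirty:
        p.snapshot = p.current                 # synced: snapshot == current
        p.dirty = False
    return len(dirty)

def recover(pages):
    """After a simulated failure, fall back to each page's snapshot."""
    return [bytes(p.snapshot) if p.snapshot is not None else None for p in pages]

pages = [Ppage(), Ppage()]
pages[0].write(0, b"AB")
synced_count = pmsync(pages)       # only page 0 was dirty
pages[0].write(0, b"CD")           # triggers copy-on-write, snapshot keeps "AB"
after_failure = recover(pages)
```

After the second write the page is unsynced again, and a failure at that point would fall back to the snapshot holding "AB"; the page that was never synced has no consistent version to return.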

AVOIDING COW PROPAGATION
[Figure: copy-on-write for page 9 under Conventional CoW versus Region System CoW; in the Region System panel, entries are marked c and s.]
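
One reading of this comparison, offered as an assumption rather than a description of the actual implementation: in a conventional CoW tree, copying a page changes its address, so every ancestor must be copied as well, whereas an entry that keeps both a current (c) and a snapshot (s) pointer lets the parent be updated in place. The tree shape and function names below are hypothetical.

```python
# Hypothetical tree: child -> parent (None marks the root).
PARENT = {1: None, 2: 1, 4: 2, 9: 4}

def conventional_cow(parent, page):
    """Pages copied when a CoW must propagate from the page up to the root."""
    copied = []
    node = page
    while node is not None:
        copied.append(node)      # this node gets a new copy
        node = parent[node]      # so its parent must change too
    return copied

def two_pointer_cow(parent, page):
    """Pages copied when the parent entry holds separate c and s pointers:
    only the page itself is copied and the parent's c pointer is updated."""
    return [page]

conventional = conventional_cow(PARENT, 9)
two_pointer = two_pointer_cow(PARENT, 9)
```

In this toy tree a write to page 9 costs four page copies with propagation and one without.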

PMSYNC EXAMPLE
[Figure: PMSYNC on region A. rs_root points to rnode A (locked) and rnode B; ppage entries in PM carry c and s pointers. Other labels: Task Y, Task Z, CPU 1, CPU 2, mm, vma, volatile page tables, tlb, mmu.]
Steps shown:
1. IPI
2. Write protect; wait for CPU 1
3. IPI returns
4. Flush cache (cache line for E2)
5. PMSYNC_IN_PROGRESS
6. Change s
7. PMSYNC_COMPLETE

PMSYNC COMPARISON WITH EXT4-DAX
[Figure: latency (μs) versus file/region size.]

LIBPMEM-REGION
Non-transactional pmem-flush
• All or None policy does not work
• A portion of the updates can be lost
Outcome
1. Add atomic durability guarantee to libpmem
2. Reduce risk factor for libraries built on top of libpmem

LIBPMEM COMPARISONS
[Figures: libpmem comparison charts, two slides.]

SUMMARY
• Region System Features
  • Provides arbitrary and unordered allocation and de-allocation
  • Minimizes ordering requirements by eliminating redundancy
  • Provides transparent sharing and atomic durability of mapped data with competitive performance
  • Usable by file systems, applications, libraries, and other kernel subsystems or modules.
• Source code will be made public soon!

Thanks! QUESTIONS?