Outline
• Target usage scenario
• High-level design
• Handling data operations
  • Handling file reads and updates
  • Handling file appends
• Consistency guarantees
• Evaluation
Handling reads and updates

In the common case, file reads and updates do not pass through the kernel.

[Figure: the application issues reads and updates; U-Split (user space) serves them directly from DAX-mmaps of the file on PM. K-Split (ext4-DAX) is invoked only to perform the mmap.]
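As a rough sketch of this read path (the function names, the per-chunk mapping cache, and the 2 MB mapping granularity are illustrative assumptions, not SplitFS's actual code), a read in U-Split reduces to a memcpy from the file's DAX-mmapped region, entering the kernel only to establish a missing mapping:

    /* Illustrative sketch of the U-Split read path (not SplitFS's actual code).
     * Assumption: each open file caches mmap'ed chunks obtained from K-Split
     * (ext4-DAX) the first time the corresponding range is touched.
     * Error handling and chunk-boundary handling are omitted. */
    #include <string.h>
    #include <sys/mman.h>
    #include <stddef.h>
    #include <sys/types.h>

    #define MAP_CHUNK (2UL * 1024 * 1024)   /* hypothetical mapping granularity */

    struct ufile {
        int   kernel_fd;                    /* fd backed by K-Split (ext4-DAX) */
        void *dax_map[1024];                /* per-chunk DAX-mmap cache */
    };

    static void *chunk_addr(struct ufile *f, off_t off)
    {
        size_t idx = (size_t)(off / MAP_CHUNK);
        if (f->dax_map[idx] == NULL) {
            /* Slow path: ask the kernel (ext4-DAX) to map this chunk once. */
            f->dax_map[idx] = mmap(NULL, MAP_CHUNK, PROT_READ | PROT_WRITE,
                                   MAP_SHARED, f->kernel_fd,
                                   (off_t)(idx * MAP_CHUNK));
        }
        return (char *)f->dax_map[idx] + off % MAP_CHUNK;
    }

    /* Common case: a read is just a memcpy from persistent memory,
     * with no kernel involvement. */
    ssize_t u_split_read(struct ufile *f, void *buf, size_t len, off_t off)
    {
        memcpy(buf, chunk_addr(f, off), len);
        return (ssize_t)len;
    }

Updates to existing data are handled symmetrically: stores into the same mapping, again without entering the kernel in the common case.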
Handling appends

In the common case, file appends do not pass through the kernel.

• At application start, a staging file is created through K-Split (inode size = 100 in the figure, vs. size = 10 for foo) and mmapped by U-Split.
• append(foo, "abc") is serviced in user space with stores into the staging file's mapping.
• A subsequent read(foo) of the appended data is serviced with loads from the same staging mapping.
• On fsync(foo), U-Split invokes relink(), which moves the staged blocks from the staging file into foo in a single ext4-journal transaction.

[Figure: user/kernel view of foo and the staging file on persistent memory, with "abc" stored into the staging file and then relinked into foo.]
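A minimal sketch of this flow follows. The staging-file bookkeeping, the SPLITFS_RELINK ioctl request, and all struct and field names are assumptions made for illustration; the source only states that relink() moves the staged blocks into foo inside one ext4-journal transaction.

    /* Illustrative sketch of append-via-staging (not SplitFS's actual code). */
    #include <string.h>
    #include <stddef.h>
    #include <sys/ioctl.h>
    #include <sys/types.h>

    #define SPLITFS_RELINK 0xBEEF   /* hypothetical ioctl request for relink() */

    struct staged_file {
        int    fd;            /* foo, backed by K-Split (ext4-DAX)       */
        int    staging_fd;    /* pre-created, pre-sized staging file     */
        char  *staging_map;   /* DAX-mmap of the staging file            */
        off_t  size;          /* logical size of foo seen by the app     */
        off_t  staged_bytes;  /* bytes appended but not yet relinked     */
    };

    /* Common case: an append is a store into the staging file's mapping.
     * A real implementation would persist the data (non-temporal stores
     * or cache-line writebacks), omitted here for brevity. */
    ssize_t u_split_append(struct staged_file *f, const void *buf, size_t len)
    {
        memcpy(f->staging_map + f->staged_bytes, buf, len);
        f->staged_bytes += (off_t)len;
        f->size += (off_t)len;    /* reads of this range hit the staging map */
        return (ssize_t)len;
    }

    /* fsync: hand the staged blocks over to foo without copying data. */
    int u_split_fsync(struct staged_file *f)
    {
        struct { int src_fd; int dst_fd; off_t dst_off; off_t len; } args = {
            f->staging_fd, f->fd, f->size - f->staged_bytes, f->staged_bytes
        };
        int ret = ioctl(f->fd, SPLITFS_RELINK, &args);  /* one journal txn */
        if (ret == 0)
            f->staged_bytes = 0;
        return ret;
    }

Because relink() only changes metadata (which file owns the staged blocks) inside the journal transaction, the appended data itself is never copied; this is the "no data copy" benefit listed in the techniques summary below.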
Consistency Guarantees

Mode     File Systems                    Metadata Atomicity   Synchronous Operations   Data Atomicity
POSIX    ext4-DAX, SplitFS-POSIX         yes                  no                       no
Sync     PMFS, SplitFS-Sync              yes                  yes                      no
Strict   NOVA, Strata, SplitFS-Strict    yes                  yes                      yes

SplitFS uses optimized logging to provide the stronger guarantees of the sync and strict modes.
Optimized logging

SplitFS employs a per-application log in sync and strict mode, which logs every logical operation.

In the common case:
• Each log entry fits in one cache line
• Each entry is persisted with a single non-temporal store followed by an sfence instruction
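A minimal sketch of such a log append follows. The entry layout, the field names, and the use of an AVX-512 64-byte streaming store are assumptions; the slides only say that an entry fits in one cache line and is persisted with a single non-temporal store plus an sfence.

    /* Illustrative cache-line-sized logical log entry (not SplitFS's actual
     * layout). Compile with AVX-512F support (e.g., -mavx512f). */
    #include <immintrin.h>
    #include <stdint.h>

    struct __attribute__((aligned(64))) log_entry {
        uint64_t op;           /* logical operation: append, write, ...     */
        uint64_t inode;        /* target file                               */
        uint64_t offset;       /* file offset                               */
        uint64_t length;       /* number of bytes                           */
        uint64_t staging_off;  /* where the data lives in the staging file  */
        uint64_t pad[2];
        uint64_t checksum;     /* lets recovery detect torn/partial entries */
    };

    /* Append one 64-byte entry at 'slot' in a PM-resident log. */
    static void log_append(struct log_entry *slot, const struct log_entry *e)
    {
        /* One 64-byte non-temporal store covers the whole cache line and
         * bypasses the CPU caches on its way to persistent memory. */
        _mm512_stream_si512((void *)slot, _mm512_load_si512((const void *)e));
        _mm_sfence();   /* drain/order the write-combining buffers */
    }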
Flexible SplitFS

Each application runs its own U-Split, and different applications can use different modes at the same time. In the figure, App 1 uses U-Split-strict, App 2 uses U-Split-sync, and App 3 uses U-Split-POSIX, each with its own data and metadata state, while all of them share a single K-Split (ext4-DAX) managing files 1-4 on PM.
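One way to picture this per-process flexibility (purely hypothetical: SPLITFS_MODE and splitfs_init are names invented for illustration, not SplitFS's actual interface) is that each process's U-Split fixes its mode at initialization, independently of other processes and of the shared K-Split:

    /* Hypothetical U-Split initialization: the consistency mode is a
     * per-process (per-U-Split) choice; K-Split is shared and unchanged. */
    #include <stdlib.h>
    #include <string.h>

    enum splitfs_mode { MODE_POSIX, MODE_SYNC, MODE_STRICT };

    static enum splitfs_mode splitfs_mode;

    __attribute__((constructor))
    static void splitfs_init(void)
    {
        const char *m = getenv("SPLITFS_MODE");   /* hypothetical knob */
        if (m && strcmp(m, "strict") == 0)
            splitfs_mode = MODE_STRICT;           /* logging + atomic data ops */
        else if (m && strcmp(m, "sync") == 0)
            splitfs_mode = MODE_SYNC;             /* logging + synchronous ops */
        else
            splitfs_mode = MODE_POSIX;            /* default: ext4-DAX-like */
    }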
Visibility

When are updates from one application visible to another?
• All metadata operations are immediately visible to all other processes
• Writes become visible to all other processes on the subsequent fsync()
• Memory-mapped files have the same visibility guarantees as ext4-DAX
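Concretely, the second rule plays out as in the sketch below (the file path, offsets, and sizes are assumed for illustration; the two functions stand for two separate processes):

    /* Illustration of the visibility rules; assumed path and offsets. */
    #include <unistd.h>
    #include <fcntl.h>

    void writer(void)                      /* process A */
    {
        int fd = open("/mnt/pmem/foo", O_WRONLY | O_APPEND);
        write(fd, "abc", 3);               /* staged in A's U-Split only   */
        fsync(fd);                         /* relink: "abc" is now part of
                                              foo and visible to everyone  */
        close(fd);
    }

    void reader(void)                      /* process B, running concurrently */
    {
        char buf[3];
        int fd = open("/mnt/pmem/foo", O_RDONLY);
        /* Before A's fsync(), this read is not guaranteed to see "abc";
         * after A's fsync(), it is. Metadata operations by A (create,
         * rename, unlink) are visible here immediately, with no fsync(). */
        pread(fd, buf, sizeof buf, 10);    /* 10 = old end of foo, e.g.   */
        close(fd);
    }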
SplitFS Techniques

Technique                                  Benefit
SplitFS architecture                       Low-overhead data operations, correct metadata operations
Staging + relink                           Optimized appends, no data copy
Optimized logging + out-of-place writes    Stronger guarantees
Evaluation

Setup:
• 2-socket, 96-core machine with 32 MB LLC
• 768 GB Intel Optane DC PMM, 378 GB DRAM

File systems compared:
• ext4-DAX, PMFS, NOVA, Strata
• Does SplitFS reduce software overhead compared to other file systems?
• How does SplitFS perform on data-intensive workloads?
• How does SplitFS perform on metadata-intensive workloads?
  • < 15% overhead for metadata-intensive workloads
Software Overhead of SplitFS
• Append 4 KB of data to a file
• Time taken to copy the user data to PM: ~700 ns

[Bar chart: total time per 4 KB append, with software overhead relative to the 700 ns device copy in parentheses — device copy only: 700 ns; SplitFS-strict: 1251 ns (0.8x); Strata: 2450 ns (2.5x); NOVA: 3021 ns (3x); PMFS: 4150 ns (5x); ext4-DAX: 9002 ns (12x).]
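Reading the chart: the overhead multiples appear to be (total time - 700 ns) / 700 ns, i.e., software time relative to the raw device copy. For example, ext4-DAX: (9002 - 700) / 700 ≈ 12x, while SplitFS-strict: (1251 - 700) / 700 ≈ 0.8x.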
Workloads

Microbenchmarks: sequential writes, sequential reads, appends, random reads, random writes
Data intensive: YCSB on LevelDB, Redis, TPCC on SQLite
Metadata intensive: tar, git, rsync
YCSB on LevelDB

Yahoo! Cloud Serving Benchmark: an industry-standard macro-benchmark.
Insert 5M keys, then run 5M operations. Key size = 16 bytes, value size = 1 KB.

Workloads:
• Load A - 100% writes
• Run A - 50% reads, 50% writes
• Run B - 95% reads, 5% writes
• Run C - 100% reads
• Run D - 95% reads (latest), 5% writes
• Load E - 100% writes
• Run E - 95% range queries, 5% writes
• Run F - 50% reads, 50% read-modify-writes

[Bar chart: normalized throughput of NOVA and SplitFS-Strict for each workload, with absolute throughputs (kops/s) annotated on the bars.]