SplitFS: Reducing Software Overhead in File Systems for Persistent Memory. Rohan Kadekodi, Se Kwon Lee, Sanidhya Kashyap*, Taesoo Kim, Aasheesh Kolli, Vijay Chidambaram (* on the job market). Persistent Memory (PM): non-volatile, fast.


  1. Outline • Target usage scenario • High-level design • Handling data operations • Handling file reads and updates • Handling file appends • Consistency guarantees • Evaluation

  2. Handling reads and updates [diagram: the application and U-Split sit in user space; K-Split (ext4-DAX) sits in the kernel; the file lives on PM]

  3. Handling reads and updates [diagram: the application issues a read / update to U-Split]

  4. Handling reads and updates [diagram: U-Split issues an mmap call into the kernel, and K-Split (ext4-DAX) performs the mmap]

  5. Handling reads and updates [diagram: the resulting DAX-mmaps are now established in user space]

  6. Handling reads and updates [diagram: subsequent reads / updates are served directly through the DAX-mmaps]

  7. Handling reads and updates [diagram as before] In the common case, file reads and updates do not pass through the kernel.
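To make that common case concrete, here is a minimal sketch of the idea, not SplitFS's actual code: a file on an ext4-DAX mount (the path /mnt/pmem/foo and the standalone program structure are illustrative assumptions) is mmapped once through the kernel, and after that, reads and in-place updates are plain loads and stores on PM with no system call.

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        /* Hypothetical file on an ext4-DAX mount; SplitFS intercepts POSIX
         * calls transparently, this sketch only shows the data path. */
        int fd = open("/mnt/pmem/foo", O_RDWR);
        if (fd < 0) { perror("open"); return 1; }

        size_t len = 4096;
        /* One mmap() goes through the kernel (K-Split, ext4-DAX); with DAX
         * this maps the file's PM pages directly into the address space. */
        char *pm = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (pm == MAP_FAILED) { perror("mmap"); return 1; }

        char buf[16];
        memcpy(buf, pm, sizeof(buf));   /* "read": a load from PM, no kernel entry   */
        memcpy(pm, "update", 6);        /* in-place "update": a store, still no trap */
        /* For durability the store would still need a cache-line write-back
         * (clwb/clflushopt) plus a fence, or msync(); omitted to keep it short. */

        printf("first bytes: %.16s\n", buf);
        munmap(pm, len);
        close(fd);
        return 0;
    }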

  8. Outline • Target usage scenario • High-level design • Handling data operations • Handling file reads and updates • Handling file appends • Consistency guarantees • Evaluation

  9. Handling appends [diagram: file foo on persistent memory; its inode in the kernel records size = 10]

  10. Handling appends [diagram: the application starts]

  11. Handling appends [diagram: a staging file is created on PM; its inode in the kernel records size = 100]

  12. Handling appends [diagram: the staging file is mmapped into the application's address space]

  13. Handling appends [diagram: append(foo, "abc") is serviced by a store into the mmapped staging file; "abc" now sits in the staging file on PM]

  14. Handling appends [diagram: a subsequent read(foo) of the appended data is serviced by a load from the mmapped staging file]

  15. Handling appends [diagram: the application calls fsync(foo)]

  16. Handling appends [diagram: fsync(foo) triggers a relink() call into the kernel]

  17. Handling appends [diagram: relink() moves the staged data from the staging file to foo inside an ext4-journal transaction]

  18. Handling appends [diagram as before] In the common case, file appends do not pass through the kernel.
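A minimal sketch of this append flow, under stated assumptions: the in-memory buffer stands in for the mmapped staging file, the function names are invented for illustration, and relink() is a stub standing in for SplitFS's relink primitive, which is added to ext4-DAX and moves the staged extents to the tail of the target file inside a journal transaction without copying data.

    #include <stdio.h>
    #include <string.h>

    #define STAGING_SIZE 4096

    /* Stand-ins for the mmapped staging file on PM and its fill level. */
    static char staging[STAGING_SIZE];
    static size_t staged_bytes;

    /* Stub for the relink primitive: logically moves the staged blocks to the
     * end of the target file by relinking extents, not by copying bytes. */
    static void relink(const char *target)
    {
        printf("relink: %zu staged bytes become part of %s\n", staged_bytes, target);
        staged_bytes = 0;
    }

    /* append(): in the common case, just a store into the staging mmap. */
    static void split_append(const char *data, size_t len)
    {
        memcpy(staging + staged_bytes, data, len);
        staged_bytes += len;
    }

    /* read() of appended data: a load from the staging mmap, no kernel entry. */
    static size_t split_read_appended(char *buf, size_t len)
    {
        size_t n = len < staged_bytes ? len : staged_bytes;
        memcpy(buf, staging, n);
        return n;
    }

    /* fsync(): the only step that enters the kernel, to relink the staged blocks. */
    static void split_fsync(const char *target)
    {
        relink(target);
    }

    int main(void)
    {
        char buf[8] = {0};
        split_append("abc", 3);                 /* append(foo, "abc")   */
        split_read_appended(buf, sizeof(buf));  /* read(foo)            */
        split_fsync("foo");                     /* fsync(foo) -> relink */
        printf("read back: %s\n", buf);
        return 0;
    }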

  19. Outline • Target usage scenario • High-level design • Handling data operations • Consistency guarantees • Evaluation

  20. Consistency Guarantees
      Mode   | Metadata Atomicity | Synchronous Operations | Data Atomicity | File Systems
      POSIX  | yes                | no                     | no             | ext4-DAX, SplitFS-POSIX
      Sync   | yes                | yes                    | no             | PMFS, SplitFS-Sync
      Strict | yes                | yes                    | yes            | NOVA, Strata, SplitFS-Strict
  21. Consistency Guarantees [table as above] Optimized logging is used to provide the stronger guarantees of the sync and strict modes.

  22. Optimized logging

  23. Optimized logging SplitFS employs a per-application log in the sync and strict modes, which logs every logical operation.

  24. Optimized logging SplitFS employs a per-application log in the sync and strict modes, which logs every logical operation. In the common case: • each log entry fits in one cache line • it is persisted using a single non-temporal store and an sfence instruction
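As a sketch of how a 64-byte (one cache line) entry can be persisted with a single non-temporal store plus sfence, assuming AVX-512 support; the entry layout and function name are invented for illustration and are not SplitFS's actual code.

    #include <immintrin.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical 64-byte (one cache line) log entry layout. */
    struct log_entry {
        uint64_t op;           /* logical operation (append, write, ...) */
        uint64_t file_id;
        uint64_t offset;
        uint64_t size;
        uint64_t staging_off;  /* where the data was staged */
        uint64_t seq;
        uint64_t pad;
        uint64_t checksum;
    } __attribute__((aligned(64)));

    /* Persist one entry at a 64-byte-aligned slot in the PM-resident log.
     * Compile with -mavx512f; requires AVX-512 at run time. */
    void log_persist(struct log_entry *slot_in_pm, const struct log_entry *entry)
    {
        __m512i line;
        memcpy(&line, entry, sizeof(line));             /* 64-byte payload           */
        _mm512_stream_si512((void *)slot_in_pm, line);  /* single NT store to PM     */
        _mm_sfence();                                   /* order and drain the store */
    }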

  25. Flexible SplitFS [diagram: three applications (App 1, App 2, App 3) in user space; one K-Split (ext4-DAX) in the kernel; File 1 to File 4 on PM]

  26. Flexible SplitFS [diagram: each application links its own U-Split, each running in a different mode: U-Split-strict, U-Split-sync, U-Split-POSIX]

  27. Flexible SplitFS [diagram: each application's data operations are handled by its own U-Split, while metadata operations from all three go to the shared K-Split (ext4-DAX)]

  28. Visibility

  29. Visibility When are updates from one application visible to another?

  30. Visibility When are updates from one application visible to another? • All metadata operations are immediately visible to all other processes

  31. Visibility When are updates from one application visible to another? • All metadata operations are immediately visible to all other processes • Writes are visible to all other processes on the subsequent fsync()

  32. Visibility When are updates from one application visible to another? • All metadata operations are immediately visible to all other processes • Writes are visible to all other processes on the subsequent fsync() • Memory-mapped files have the same visibility guarantees as ext4-DAX

  33. SplitFS Techniques [table: Technique | Benefit]

  34. SplitFS Techniques
      Technique            | Benefit
      SplitFS architecture | Low-overhead data operations; correct metadata operations

  35. SplitFS Techniques
      Technique            | Benefit
      SplitFS architecture | Low-overhead data operations; correct metadata operations
      Staging + relink     | Optimized appends; no data copy

  36. SplitFS Techniques
      Technique                               | Benefit
      SplitFS architecture                    | Low-overhead data operations; correct metadata operations
      Staging + relink                        | Optimized appends; no data copy
      Optimized logging + out-of-place writes | Stronger guarantees

  37. Outline • Target usage scenario • High-level design • Handling data operations • Consistency guarantees • Evaluation

  38. Evaluation

  39. Evaluation Setup: • 2-socket, 96-core machine with 32 MB LLC • 768 GB Intel Optane DC PMM, 378 GB DRAM

  40. Evaluation Setup: • 2-socket, 96-core machine with 32 MB LLC • 768 GB Intel Optane DC PMM, 378 GB DRAM File systems compared: • ext4-DAX, PMFS, NOVA, Strata

  41. Does SplitFS reduce software overhead compared to other file systems? How does SplitFS perform on data-intensive workloads? How does SplitFS perform on metadata-intensive workloads?

  42. Does SplitFS reduce software overhead compared to other file systems? How does SplitFS perform on data-intensive workloads? How does SplitFS perform on metadata-intensive workloads? • < 15% overhead for metadata-intensive workloads

  43. Software Overhead of SplitFS • Append 4 KB of data to a file • Time taken to copy the user data to PM: ~700 ns [bar chart, time per append in ns, with software overhead shown in parentheses: device 700, Strata 2450 (2.5x), NOVA 3021 (3x), PMFS 4150 (5x), ext4-DAX 9002 (12x)]

  44. Software Overhead of SplitFS • Append 4 KB of data to a file • Time taken to copy the user data to PM: ~700 ns [bar chart, time per append in ns, with software overhead shown in parentheses: device 700, SplitFS-strict 1251 (0.8x), Strata 2450 (2.5x), NOVA 3021 (3x), PMFS 4150 (5x), ext4-DAX 9002 (12x)]
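The parenthesized multiples are consistent with software overhead expressed as a multiple of the ~700 ns device copy time (this reading is inferred from the numbers rather than stated on the slide):

    overhead = (time_fs - time_device) / time_device
    ext4-DAX:       (9002 - 700) / 700 ≈ 11.9x, shown as 12x
    SplitFS-strict: (1251 - 700) / 700 ≈ 0.8x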

  45. Workloads
      Microbenchmarks: sequential writes, sequential reads, appends, random reads, random writes
      Data intensive: YCSB on LevelDB, Redis, TPCC on SQLite
      Metadata intensive: Tar, Git, Rsync

  46. YCSB on LevelDB Yahoo! Cloud Serving Benchmark, an industry-standard macro-benchmark. Insert 5M keys, then run 5M operations; key size = 16 bytes, value size = 1 KB. [bar chart: normalized throughput of NOVA and SplitFS-Strict for Load A, Run A, Run B, Run C, Run D, Load E, Run E, Run F]
      Load A: 100% writes; Run A: 50% reads, 50% writes; Run B: 95% reads, 5% writes; Run C: 100% reads; Run D: 95% reads (latest), 5% writes; Load E: 100% writes; Run E: 95% range queries, 5% writes; Run F: 50% reads, 50% read-modify-writes

  47. YCSB on LevelDB [same chart as above, with absolute throughputs labeled on the bars: 139.94, 174.85, 191.54, 32.24, 13.39, 66.54, 13.59, 17.75 kops/s]
