

  1. CompoundFS: Compounding I/O Operations in Firmware File Systems
     Yujie Ren¹, Jian Zhang², and Sudarsun Kannan¹
     ¹Rutgers University  ²ShanghaiTech University

  2. Outline • Background • Analysis • Design • Evaluation • Conclusion

  3. In-storage Processors Are Powerful
     Device:   Intel X25-M   Samsung 840    Samsung 970
     Year:     2008          2013           2018
     Price:    $7.4/GB       $0.92/GB       $0.80/GB
     CPU:      2-core        3-core         5-core
     RAM:      128MB DDR2    512MB LPDDR2   1GB LPDDR4
     B/W:      250 MB/s      500 MB/s       3300 MB/s
     Latency:  ~70 µs        ~60 µs         ~40 µs

  4. Software Latency Matters Now
     [Figure: an application write() traverses the OS kernel - kernel trap, VFS
     layer, page cache, the actual file system (e.g., PMFS, ext4), block I/O
     layer, and device driver - incurring data copies and on the order of
     1-4 µs of software overhead. OS kernel overhead matters!]

  5. Current Solutions
     • DirectFS designs (e.g., Strata, SplitFS, DevFS) reduce software overhead
       by partially or fully bypassing the OS kernel
     [Figure: three architectures, each split into data-plane and control-plane
     paths - Strata (SOSP'17): Application -> FS Lib -> Kernel FS Server ->
     Storage; SplitFS (SOSP'19): Application -> FS Lib -> DAX FS -> Storage;
     DevFS (FAST'18): Application -> FS Lib -> Firmware FS -> Storage]

  6. Limitation of Current Solutions
     • DirectFS designs do not reduce boundary crossings
       - Strata needs boundary crossings between the FS Lib and the FS Server
       - SplitFS needs a kernel trap for control-plane operations
       - DevFS suffers high PCIe latency for every operation
     • DirectFS designs do not efficiently reduce data copies
       - Current solutions copy data back and forth multiple times between the
         application and the storage stack
     • DirectFS designs do not utilize in-storage computation
       - Current solutions use only host CPUs for I/O-related operations

  7. Outline • Background • Analysis • Design • Evaluation • Conclusion

  8. Analysis Methodology
     • Storage
       - Persistent memory emulated on DRAM, as in prior work (e.g., SplitFS)
     • File systems
       - ext4-DAX: ext4 on byte-addressable storage, bypassing the page cache
       - SplitFS: direct-access file system bypassing the kernel for data-plane ops
     • Application
       - LevelDB: well-known persistent key-value store
       - db_bench: random-write and random-read benchmarks

  9. LevelDB Overhead Breakdown
     [Figure: run-time breakdown (%) for LevelDB on ext4-DAX and SplitFS with
     256 B and 4096 B values; categories include data allocation (OS/user),
     data copy (OS/user), filesystem update (OS), lock (OS), and CRC32 (user)]
     • LevelDB spends significant time (~50%) in the OS storage stack
     • ~15% of time goes to data copies between the application and the OS
     • ~20% of time goes to application-level crash consistency (CRC of data)

  10. Outline • Background • Analysis • Design • Evaluation • Conclusion

  11. Our Solution: CompoundFS
      • Combine (compound) multiple file system I/O ops into one
      • Offload I/O pre- and post-processing to storage-level CPUs
      • Bypass the OS kernel and provide direct access

  12. Our Solution: CompoundFS
      • Combine (compound) multiple file system I/O ops into one
        - e.g., a write() after a read() is compounded into write-after-read()
        - Reduces boundary crossings between host and storage (e.g., syscalls)
      • Offload I/O pre- and post-processing to storage-level CPUs
        - e.g., a checksum() after a write() is compounded into write-and-checksum()
        - Storage CPUs perform the computation (e.g., checksum) and persist the result
        - Reduces data movement cost across boundaries
      • Bypass the OS kernel and provide direct access
        - A firmware file system design provides direct access for data-plane
          and most control-plane operations

  13. I/O-Only Compound Operations: Read-Modify-Write
      [Figure: traditional FS path - read(data) and write(data) each cross into
      kernel space, costing 2 syscalls and 2 data copies, with the modify step in
      user space; CompoundFS path - a single read_modify_write(data) goes
      directly to StorageFS, which performs the compound op inside the device,
      so only 1 data copy with direct access]
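     The contrast above can be made concrete with a small host-side sketch in C.
     The compoundfs_read_modify_write() call below is hypothetical (the slides
     only name the operation); it is stubbed with plain pread()/pwrite() so the
     example compiles, whereas the real UserLib would submit a single compound
     command that StorageFS executes in the device, eliminating the second
     boundary crossing and the extra data copy.

        /* Sketch: traditional read-modify-write vs. a compounded call.
         * compoundfs_read_modify_write() is a hypothetical UserLib entry point,
         * stubbed here with POSIX calls so the sketch compiles and runs. */
        #include <fcntl.h>
        #include <stdio.h>
        #include <unistd.h>

        /* Hypothetical modify callback applied to the data in place. */
        typedef void (*modify_fn)(char *buf, size_t len);

        /* Stand-in for the CompoundFS UserLib call (names are illustrative). */
        static ssize_t compoundfs_read_modify_write(int fd, char *buf, size_t len,
                                                    off_t off, modify_fn modify)
        {
            ssize_t n = pread(fd, buf, len, off);   /* device-side in real design */
            if (n < 0)
                return -1;
            modify(buf, (size_t)n);                 /* would run on storage CPUs  */
            return pwrite(fd, buf, (size_t)n, off);
        }

        static void to_upper(char *buf, size_t len)
        {
            for (size_t i = 0; i < len; i++)
                if (buf[i] >= 'a' && buf[i] <= 'z')
                    buf[i] -= 32;
        }

        int main(void)
        {
            char buf[8] = {0};
            int fd = open("demo.txt", O_RDWR | O_CREAT, 0644);
            if (fd < 0)
                return 1;
            pwrite(fd, "hello", 5, 0);

            /* Traditional path: read() + write() = 2 syscalls, 2 data copies. */
            pread(fd, buf, 5, 0);
            to_upper(buf, 5);
            pwrite(fd, buf, 5, 0);

            /* Compound path: one request crosses the host/storage boundary. */
            compoundfs_read_modify_write(fd, buf, 5, 0, to_upper);

            close(fd);
            return 0;
        }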

  14. I/O + Compute Compound Operations: Write-and-Checksum
      [Figure: traditional FS path - write(data) followed by write(checksum),
      with the checksum computed in user space, costing 2 syscalls and 2 data
      copies; CompoundFS path - a single write_and_checksum(data) goes directly
      to StorageFS, which handles the checksum calculation inside the device,
      so only 1 data copy with direct access]
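     A minimal sketch of the device-side half of this operation, assuming
     StorageFS computes the checksum on a storage CPU and places it at the head
     of the record (checksum_pos=head, as in the architecture slide). The NVM
     layout, function names, and the software CRC32 are illustrative stand-ins,
     not the actual CompoundFS code.

        /* Sketch of the device side of write_and_checksum(): StorageFS computes
         * the CRC32 on the storage CPU and persists checksum + data together,
         * so the host never copies the data a second time just to checksum it. */
        #include <stdint.h>
        #include <stdio.h>
        #include <string.h>

        /* Minimal CRC32 (IEEE, bitwise) - stand-in for a firmware CRC routine. */
        static uint32_t crc32(const uint8_t *data, size_t len)
        {
            uint32_t crc = 0xFFFFFFFFu;
            for (size_t i = 0; i < len; i++) {
                crc ^= data[i];
                for (int b = 0; b < 8; b++)
                    crc = (crc >> 1) ^ (0xEDB88320u & (uint32_t)(-(int32_t)(crc & 1)));
            }
            return ~crc;
        }

        static uint8_t nvm[4096];   /* stand-in for a byte-addressable NVM block */

        /* checksum_pos=head (as in the Op3 example): 4-byte CRC precedes data. */
        static void storagefs_write_and_checksum(size_t off, const uint8_t *buf,
                                                 size_t len)
        {
            uint32_t crc = crc32(buf, len);            /* done by the device CPU */
            memcpy(nvm + off, &crc, sizeof(crc));      /* persist the checksum   */
            memcpy(nvm + off + sizeof(crc), buf, len); /* persist the data       */
            /* A real firmware FS would also flush/fence and journal this op. */
        }

        int main(void)
        {
            const uint8_t payload[] = "compounded payload";
            storagefs_write_and_checksum(0, payload, sizeof(payload));
            uint32_t stored;
            memcpy(&stored, nvm, sizeof(stored));
            printf("stored crc = 0x%08x\n", stored);
            return 0;
        }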

  15. CompoundFS Architecture
      [Figure: application threads issue ops such as Op1 open(File1) -> fd1,
      Op2+ read_modify_write(fd2, buf, off=30, sz=5), Op3* write_and_checksum(fd1,
      buf, off=10, sz=1K, checksum_pos=head), and Op4 read(fd2, buf, off=30, sz=5).
      UserLib (in the host) converts POSIX I/O syscalls into CompoundFS compound
      ops and places them in per-inode I/O queues with per-inode data buffers.
      StorageFS (in the device) runs I/O request processing threads on the device
      CPU cores, compounds I/O ops (e.g., performs the CRC calculation before the
      write()), checks a per-CPUID credential table, and journals updates
      (TxB ... metadata, data block addresses ... TxE) to NVM]
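     A rough sketch of the shared structures this architecture implies: a
     compound command descriptor filled in by UserLib and a per-inode I/O queue
     with its data buffer that StorageFS request-processing threads drain. All
     field names and sizes are assumptions for illustration; the real on-device
     layout is not given in the slides.

        /* Sketch of hypothetical host/device shared structures for CompoundFS. */
        #include <stdint.h>
        #include <stdio.h>

        enum compound_opcode {
            OP_OPEN,
            OP_READ,
            OP_READ_MODIFY_WRITE,   /* Op2+ in the slide */
            OP_WRITE_AND_CHECKSUM,  /* Op3* in the slide */
        };

        enum checksum_pos { CHECKSUM_NONE, CHECKSUM_HEAD, CHECKSUM_TAIL };

        /* One entry in a per-inode I/O queue shared by UserLib and StorageFS. */
        struct compound_cmd {
            uint8_t  opcode;        /* enum compound_opcode                   */
            uint8_t  checksum_pos;  /* where to place the CRC, if any         */
            uint32_t fd;            /* file the op targets                    */
            uint64_t offset;        /* file offset                            */
            uint64_t size;          /* payload size                           */
            uint64_t buf_off;       /* offset into the per-inode data buffer  */
            uint32_t cred_id;       /* index into the device credential table */
            uint32_t status;        /* filled in by StorageFS on completion   */
        };

        struct inode_io_queue {
            uint32_t head;          /* consumed by StorageFS request threads  */
            uint32_t tail;          /* produced by UserLib                    */
            struct compound_cmd cmds[64];
            uint8_t  data_buf[64 * 4096];  /* per-inode data buffer           */
        };

        int main(void)
        {
            static struct inode_io_queue q;
            struct compound_cmd *c = &q.cmds[q.tail % 64];
            c->opcode = OP_WRITE_AND_CHECKSUM;
            c->checksum_pos = CHECKSUM_HEAD;
            c->fd = 1; c->offset = 10; c->size = 1024; c->buf_off = 0;
            q.tail++;   /* UserLib publishes; a StorageFS thread will consume */
            printf("queued opcode %u at slot %u\n",
                   (unsigned)c->opcode, (unsigned)(q.tail - 1));
            return 0;
        }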

  16. CompoundFS Implementation
      • Command-based architecture based on PMFS (EuroSys '14)
        - Control-plane ops (e.g., open) are issued as commands via ioctl()
        - ioctl() carries the arguments for each I/O op
      • Avoids VFS overhead
        - Control-plane ops go through ioctl(), not the VFS layer
      • Avoids system call overhead
        - UserLib and StorageFS share a command buffer
        - UserLib adds requests to the command buffer
        - StorageFS processes requests from the buffer
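     A sketch of how this command-based interface might look from the host: a
     control-plane open is packaged into an ioctl() command against the
     CompoundFS device node, while data-plane requests go into the shared
     command buffer (see the queue sketch above) without any system call. The
     device path, ioctl command number, and struct layout below are invented
     for illustration.

        /* Sketch of a hypothetical control-plane command issued via ioctl(). */
        #include <fcntl.h>
        #include <stdint.h>
        #include <stdio.h>
        #include <string.h>
        #include <sys/ioctl.h>
        #include <unistd.h>

        /* Hypothetical open-by-path command; StorageFS fills in out_fd. */
        struct cfs_open_cmd {
            char    path[256];
            int32_t flags;
            int32_t out_fd;
        };
        #define CFS_IOC_OPEN _IOWR('C', 1, struct cfs_open_cmd)

        int main(void)
        {
            /* Control plane: one ioctl() instead of the whole VFS path. */
            int dev = open("/dev/compoundfs", O_RDWR);   /* illustrative node */
            if (dev < 0) {
                perror("open /dev/compoundfs");
                return 1;
            }
            struct cfs_open_cmd cmd = { .flags = O_RDWR };
            strncpy(cmd.path, "/mnt/cfs/file1", sizeof(cmd.path) - 1);
            if (ioctl(dev, CFS_IOC_OPEN, &cmd) == 0)
                printf("StorageFS returned fd %d\n", cmd.out_fd);

            /* Data plane: requests are appended to the UserLib<->StorageFS
             * command buffer; no system call is needed. */
            close(dev);
            return 0;
        }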

  17. CompoundFS Challenges
      • Crash-consistency model for compound I/O operations
      • All-or-nothing model (current solution)
        - An entire compound operation is one transaction
        - Partially completed operations cannot be recovered
        - e.g., for write-and-checksum, only the data is persisted but the
          checksum is not
      • All-or-something model (ongoing)
        - Fine-grained journaling and partial recovery are supported
        - Recovery could become complex
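     A minimal sketch of the all-or-nothing model: each compound operation is
     journaled as one transaction bracketed by TxB and TxE records, and
     recovery honors only transactions whose TxE reached the journal. Record
     types and layout are assumed for illustration, not taken from CompoundFS.

        /* Sketch: journaling a compound op as a single all-or-nothing tx. */
        #include <stdint.h>
        #include <stdio.h>

        enum rec_type { TX_BEGIN, TX_DATA, TX_CHECKSUM, TX_END };

        struct journal_rec {
            uint8_t  type;
            uint64_t tx_id;
            uint64_t blk_addr;   /* where the payload will live in NVM */
            uint64_t len;
        };

        static struct journal_rec journal[128];
        static size_t jtail;

        static void jappend(struct journal_rec r)
        {
            journal[jtail++] = r;   /* real code would persist + fence here */
        }

        /* Journal a write_and_checksum() as a single transaction. */
        static void journal_write_and_checksum(uint64_t tx, uint64_t blk, uint64_t len)
        {
            jappend((struct journal_rec){ TX_BEGIN,    tx, 0,       0   });
            jappend((struct journal_rec){ TX_CHECKSUM, tx, blk,     4   });
            jappend((struct journal_rec){ TX_DATA,     tx, blk + 4, len });
            jappend((struct journal_rec){ TX_END,      tx, 0,       0   });  /* commit */
        }

        /* Recovery: a transaction counts only if its TX_END was persisted;
         * otherwise the partially completed compound op is discarded. */
        static int tx_committed(uint64_t tx)
        {
            for (size_t i = 0; i < jtail; i++)
                if (journal[i].type == TX_END && journal[i].tx_id == tx)
                    return 1;
            return 0;
        }

        int main(void)
        {
            journal_write_and_checksum(7, 4096, 1024);
            printf("tx 7 committed: %d\n", tx_committed(7));
            return 0;
        }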

  18. Outline • Background • Analysis • Design • Evaluation • Conclusion

  19. Evaluation Goals
      • Effectiveness at reducing boundary crossings
      • Effectiveness at reducing data copy overheads
      • Ability to exploit the compute capability of modern storage

  20. Experimental Setup
      • Hardware platform
        - Dual-socket 64-core Xeon Scalable CPU @ 2.6 GHz
        - 512 GB Intel Optane DC NVM
      • Emulated firmware-level FS
        - Dedicated device threads reserved for handling I/O requests
        - PCIe latency added to every I/O operation
        - Device CPU frequency reduced to 1.2 GHz
      • State-of-the-art file systems
        - ext4-DAX (kernel-level file system)
        - SplitFS (user-level file system)
        - DevFS (device-level file system)
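     A sketch of the emulation approach, assuming the firmware-level FS is
     modeled by dedicated device threads that charge a fixed PCIe round-trip
     delay to every request. The 900 ns value below is a placeholder, since the
     slides do not state the exact latency added, and throttling the device
     cores to 1.2 GHz would be done outside the program (e.g., via cpufreq).

        /* Sketch: a dedicated "device" thread drains requests and charges an
         * emulated PCIe delay to each one. */
        #include <pthread.h>
        #include <stdatomic.h>
        #include <stdio.h>
        #include <time.h>

        #define EMULATED_PCIE_DELAY_NS 900   /* placeholder, not from the slides */

        static atomic_int pending = 8;       /* pretend 8 I/O requests are queued */

        static void charge_pcie_latency(void)
        {
            struct timespec t0, t1;
            clock_gettime(CLOCK_MONOTONIC, &t0);
            do {                              /* busy-wait, standing in for a DMA */
                clock_gettime(CLOCK_MONOTONIC, &t1);
            } while ((t1.tv_sec - t0.tv_sec) * 1000000000L +
                     (t1.tv_nsec - t0.tv_nsec) < EMULATED_PCIE_DELAY_NS);
        }

        static void *device_thread(void *arg)
        {
            (void)arg;
            while (atomic_fetch_sub(&pending, 1) > 0) {
                charge_pcie_latency();        /* added to every emulated I/O */
                /* ... process one compound command from the shared queue ... */
            }
            return NULL;
        }

        int main(void)
        {
            pthread_t dev;
            pthread_create(&dev, NULL, device_thread, NULL);
            pthread_join(dev, NULL);
            puts("drained emulated I/O queue");
            return 0;
        }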

  21. Micro-Benchmark
      [Figure: throughput (MB/s) for ext4-DAX, SplitFS, DevFS, CompoundFS, and
      CompoundFS-slowCPU at 256 B and 4096 B value sizes, for read-modify-write
      and write-and-checksum; annotated speedups of 1.25x and 2.1x]
      • CompoundFS reduces unnecessary data movement and system call overhead
        by combining operations
      • Even with slow device CPUs, CompoundFS still provides gains from
        in-storage computation

  22. LevelDB
      [Figure: db_bench random read and random write (500k keys) at 512 B and
      4096 B value sizes, reporting throughput (MB/s) and latency (us/op) for
      ext4-DAX, SplitFS, DevFS, CompoundFS, and CompoundFS-slowCPU; annotated
      speedup of 1.75x]
      • CompoundFS also shows a promising speedup for LevelDB

  23. Conclusion
      • Storage hardware is moving into the microsecond era
        - Software overhead matters, and providing direct access is critical
        - Storage compute capability can benefit I/O-intensive applications
      • CompoundFS combines I/O ops and offloads computation
        - Reduces boundary crossing (system call) and data copy overhead
        - Takes advantage of in-storage compute resources
      • Our ongoing work
        - Fine-grained crash-consistency mechanism
        - Efficient I/O scheduler for managing computation in storage

  24. Thanks! yujie.ren@rutgers.edu Questions?
