Direct-FUSE: A User-level File System with Multiple Backends
Yue Zhu (yzhu@cs.fsu.edu)
Florida State University
Outline
Ø Background & Motivation
Ø The Overview of Direct-FUSE
Ø Performance Evaluation
Ø Conclusions
User Space vs. Kernel-level File Systems
Ø Kernel-level and user space file systems differ in development complexity, reliability, and portability.
– Kernel-level File System
• Development complexity: 1) system crashes and restarts during debugging; 2) language limitations.
• Reliability: a kernel bug can crash the entire production system.
• Portability: porting a special file system to a different system takes significant effort.
– User-level File System
• Development complexity: 1) few system crashes and restarts during debugging; 2) numerous user-space tools for debugging; 3) fewer language limitations and more useful libraries; 4) file systems can be mounted and developed by non-privileged users.
• Reliability: lower possibility of crashing the kernel.
• Portability: easy to port to other systems.
Filesystem in Userspace
Ø What is Filesystem in Userspace (FUSE)?
– A software interface for Unix-like operating systems.
– Non-privileged users can create their own file systems without editing kernel code.
– However, the FUSE kernel module needs to be pre-installed by a system administrator.
– Examples:
• SSHFS: a file system client that interacts with directories and files on a remote server over an SSH connection.
• FusionFS (BigData'14): a distributed file system that supports metadata-intensive and write-intensive operations.
• IndexFS Client (SC'14): the client of IndexFS, which redirects applications' file operations to the appropriate destination.
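To make the FUSE programming model concrete, below is a minimal sketch of a user-level file system written against the libfuse high-level API (FUSE 2.x style). It serves a single read-only file; the file name /hello and its contents are illustrative assumptions, not taken from the slides.

/* Minimal read-only FUSE file system (sketch, libfuse 2.x high-level API).
 * Build (assuming libfuse is installed):
 *   gcc hello_fuse.c -o hello_fuse `pkg-config fuse --cflags --libs`
 * Run:  ./hello_fuse /tmp/mnt
 */
#define FUSE_USE_VERSION 26
#include <fuse.h>
#include <errno.h>
#include <string.h>
#include <sys/stat.h>

static const char *hello_path = "/hello";          /* illustrative name     */
static const char *hello_data = "Hello, FUSE!\n";  /* illustrative contents */

/* Report file attributes for "/" and "/hello". */
static int hello_getattr(const char *path, struct stat *st)
{
    memset(st, 0, sizeof(*st));
    if (strcmp(path, "/") == 0) {
        st->st_mode = S_IFDIR | 0755;
        st->st_nlink = 2;
        return 0;
    }
    if (strcmp(path, hello_path) == 0) {
        st->st_mode = S_IFREG | 0444;
        st->st_nlink = 1;
        st->st_size = (off_t)strlen(hello_data);
        return 0;
    }
    return -ENOENT;
}

/* List the single file in the root directory. */
static int hello_readdir(const char *path, void *buf, fuse_fill_dir_t filler,
                         off_t offset, struct fuse_file_info *fi)
{
    (void)offset; (void)fi;
    if (strcmp(path, "/") != 0)
        return -ENOENT;
    filler(buf, ".", NULL, 0);
    filler(buf, "..", NULL, 0);
    filler(buf, hello_path + 1, NULL, 0);
    return 0;
}

/* Copy the requested byte range of the in-memory contents. */
static int hello_read(const char *path, char *buf, size_t size, off_t offset,
                      struct fuse_file_info *fi)
{
    size_t len = strlen(hello_data);
    (void)fi;
    if (strcmp(path, hello_path) != 0)
        return -ENOENT;
    if ((size_t)offset >= len)
        return 0;
    if (size > len - (size_t)offset)
        size = len - (size_t)offset;
    memcpy(buf, hello_data + offset, size);
    return (int)size;
}

static struct fuse_operations hello_ops = {
    .getattr = hello_getattr,
    .readdir = hello_readdir,
    .read    = hello_read,
};

int main(int argc, char *argv[])
{
    /* Every access under the mount point is routed by the kernel through
     * the FUSE module back to the callbacks above in user space. */
    return fuse_main(argc, argv, &hello_ops, NULL);
}

Every read of /tmp/mnt/hello therefore pays the user-kernel round trip that the next slides quantify.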
How Does a FUSE File System Work?
Ø Execution path of a function call:
1. Send the request to the user-level file system
• Application program → VFS → FUSE kernel module → user-level file system process
2. Return the data back to the application program
• User-level file system process → FUSE kernel module → VFS → application program
[Figure: the application program and the user-level file system reside in user space; the VFS, the FUSE kernel module, and the in-built file system sit in kernel space above the storage device; numbered arrows 1-6 trace the round trip of a request.]
FUSE File System vs. Native File System
Ø Overheads in a FUSE file system
– 4 user-kernel mode switches
• App ↔ kernel
• Kernel ↔ file system process
– 2 context switches
• App ↔ file system process
– 2 or 3 memory copies
• Write: App → page cache → file system process → page cache (the last copy only if the write is made to a native file system)
Ø Overheads in a native file system (Ext4)
– 2 user-kernel mode switches
• App ↔ kernel
– 0 context switches
– 1 memory copy
• Write: App → page cache
[Figure: same FUSE architecture diagram as the previous slide.]
Number of Context Switches & I/O Bandwidth
Ø Measuring the number of context switches and the bandwidth of a FUSE file system and a native file system.
– The dd microbenchmark and perf are used in the tests.
– FUSE-tmpfs is a FUSE file system deployed on top of tmpfs and mounted with tuned option values.

Block Size (KB) | FUSE-tmpfs Throughput (MB/s) | FUSE-tmpfs # Context Switches | tmpfs Throughput (GB/s) | tmpfs # Context Switches
4    | 163 | 1012 | 1.3 | 7
16   | 372 | 1012 | 1.6 | 7
64   | 519 | 1012 | 1.7 | 7
128  | 549 | 1012 | 2.0 | 7
256  | 569 | 2012 | 2.4 | 7
1024 | 576 | 8012 | 2.5 | 7
Breakdown of Metadata Operation Latency
Ø The create() and close() latency on tmpfs and FUSE-tmpfs.
– Real Operation: the time spent in the operation itself (the actual create or close time).
– Overhead: the cost besides the real operation, e.g., the involvement of the FUSE kernel module.
Ø The real operation time accounts for only a small portion of a complete FUSE function call.
[Figure: create and close latency (µs) on tmpfs and FUSE-tmpfs, split into real operation and overhead; on FUSE-tmpfs the real operation accounts for only 2.17%-11.18% of the total latency.]
Breakdown of Data Operation Latency
Ø The write latency on tmpfs and FUSE-tmpfs.
– Data Movement: the actual write operation within a complete write function call.
– Overhead: the cost besides the data movement.
Ø The data movement time accounts for only a small portion of a complete FUSE I/O call.
[Figure: write latency (µs) for transfer sizes from 1 KB to 256 KB, split into data movement and overhead; data movement accounts for roughly 10.08%-38.21% of the total latency, growing with the transfer size.]
Desirable Objectives
Ø Some file systems, such as TableFS (USENIX'13), are leveraged as libraries to avoid the involvement of the FUSE kernel module.
– However, this approach may not support multiple FUSE libraries with distinct file paths and file descriptors.
Ø We propose Direct-FUSE to provide multiple backend services for one application without going through the FUSE kernel.
– To reduce the overhead of the FUSE modules, we adopt libsysio to provide FUSE client service without going through the kernel.
– Libsysio
• Developed by the Scalable I/O team at Sandia National Laboratories.
• POSIX-like file I/O.
• Name space support for file systems from the application's user-level address space.
Outline
Ø Background & Motivation
Ø The Overview of Direct-FUSE
Ø Performance Evaluation
Ø Conclusions
The Overview of Direct-FUSE
Ø Direct-FUSE's components include the adopted-libsysio, the lightweight-libfuse, and the backend services (a conceptual sketch follows below).
– Adopted-libsysio
• Distinguishes file paths and file descriptors for the different backends.
– Lightweight-libfuse
• Not the real libfuse.
• Exposes the file system operations of the underlying backend services.
– Backend services
• Provide the defined file system operations.
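The layering can be pictured as a per-backend table of file system operations that the adopted-libsysio layer dispatches to. The following is only a conceptual sketch under that assumption; the names (backend_ops, direct_fuse_register_backend, the prefixes) are hypothetical and not the actual Direct-FUSE source.

/* Conceptual sketch of Direct-FUSE's layering (all names hypothetical). */
#include <stddef.h>
#include <sys/types.h>

/* lightweight-libfuse: the FUSE-style callbacks of one backend exposed
 * directly as a function table, with no FUSE kernel module in the path. */
struct backend_ops {
    int     (*open)(const char *path, int flags, mode_t mode);
    ssize_t (*read)(int bfd, void *buf, size_t len, off_t off);
    ssize_t (*write)(int bfd, const void *buf, size_t len, off_t off);
    int     (*close)(int bfd);
};

/* One entry per backend service (e.g., "sshfs:", "fusionfs:"). */
struct backend {
    const char               *prefix; /* path prefix that selects this backend */
    const struct backend_ops *ops;    /* its defined file system operations    */
};

/* adopted-libsysio keeps a small mount table so that one application can
 * use several backends, each with its own paths and descriptors. */
#define MAX_BACKENDS 8
static struct backend mount_table[MAX_BACKENDS];
static int n_backends;

int direct_fuse_register_backend(const char *prefix,
                                 const struct backend_ops *ops)
{
    if (n_backends >= MAX_BACKENDS)
        return -1;
    mount_table[n_backends].prefix = prefix;
    mount_table[n_backends].ops    = ops;
    return n_backends++;
}

Keeping each backend behind its own function table is what allows a single application to reach several FUSE backends without ever entering the kernel.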
Path and File Descriptor Operations
Ø To support multiple FUSE backends, file system operations are divided into two categories: path operations and file descriptor operations (a dispatch sketch follows below).
– Path operations
• Apply a prefix to the path (e.g., sshfs:/sshfs/test.txt).
• Intercept the prefix and path to return the mount information, which contains the pointers to the defined operations.
• When a new file is opened, the file descriptor returned by the backend is mapped to a new file descriptor assigned by adopted-libsysio.
– File descriptor operations
• The file record is located via the file descriptor in the open file table.
• The file record contains pointers to the operations, the current stream position, etc.
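Below is a minimal sketch of the two lookups described above, continuing the hypothetical mount_table/backend_ops sketch from the previous slide; resolve_backend, fd_table, and direct_fuse_open are illustrative names, not the real implementation.

/* Sketch of path-prefix resolution and descriptor mapping (hypothetical
 * names, building on the mount_table sketch above). */
#include <string.h>
#include <sys/types.h>

struct open_file {
    const struct backend_ops *ops;  /* operations of the owning backend   */
    int   backend_fd;               /* descriptor returned by the backend */
    off_t pos;                      /* current stream position            */
};

#define MAX_OPEN_FILES 256
static struct open_file fd_table[MAX_OPEN_FILES];

/* Path operation: match "sshfs:/sshfs/test.txt" against the registered
 * prefixes; the matching entry holds the pointers to the defined operations. */
static struct backend *resolve_backend(const char *path, const char **rest)
{
    for (int i = 0; i < n_backends; i++) {
        size_t len = strlen(mount_table[i].prefix);
        if (strncmp(path, mount_table[i].prefix, len) == 0) {
            *rest = path + len;             /* path seen by the backend */
            return &mount_table[i];
        }
    }
    return NULL;                            /* not a Direct-FUSE path */
}

/* Open: call the backend, then map its descriptor to a new descriptor
 * (the index into fd_table) assigned by adopted-libsysio. */
int direct_fuse_open(const char *path, int flags, mode_t mode)
{
    const char *rest;
    struct backend *b = resolve_backend(path, &rest);
    if (b == NULL)
        return -1;

    int bfd = b->ops->open(rest, flags, mode);
    if (bfd < 0)
        return -1;

    for (int fd = 0; fd < MAX_OPEN_FILES; fd++) {
        if (fd_table[fd].ops == NULL) {
            fd_table[fd].ops        = b->ops;
            fd_table[fd].backend_fd = bfd;
            fd_table[fd].pos        = 0;
            return fd;
        }
    }
    b->ops->close(bfd);
    return -1;                              /* open file table is full */
}

Subsequent file descriptor operations index fd_table with the returned descriptor and forward to ops->read/write/close using the stored backend_fd and stream position.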
Requirements for New Backends
• The file system operations work with paths and file names instead of inodes.
Ø An independent library that contains the FUSE file system operations, an initialization function, and an unmount function (a wrapper sketch follows below).
– If there is no existing library for the backend, we have to build the library ourselves.
– If there is an existing library for the backend, we have to wrap its APIs and provide the initialization function.
Ø No user data is passed to the FUSE module via the fuse_mount() function.
– If the file system passes user data via fuse_mount() at mount time, additional effort is needed to globalize that user data for the other file system operations.
Ø Implemented in C or C++.
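As an illustration of these requirements, a new backend might be packaged as shown below. The wrapper names (mybackend_init, mybackend_unmount, the stubbed operations) and the reuse of the hypothetical backend_ops table from the earlier sketches are assumptions, not Direct-FUSE's actual interface.

/* Sketch of a backend library meeting the requirements above
 * (hypothetical names; reuses the backend_ops table sketched earlier). */
#include <sys/types.h>

/* Path/name-based operations that wrap an existing FUSE file system's
 * callbacks; no inode-based interfaces are exposed.  The bodies are
 * stubs here -- a real wrapper would call into the wrapped library. */
static int mybackend_open(const char *path, int flags, mode_t mode)
{
    (void)path; (void)flags; (void)mode;
    return -1;
}
static ssize_t mybackend_read(int bfd, void *buf, size_t len, off_t off)
{
    (void)bfd; (void)buf; (void)len; (void)off;
    return -1;
}
static ssize_t mybackend_write(int bfd, const void *buf, size_t len, off_t off)
{
    (void)bfd; (void)buf; (void)len; (void)off;
    return -1;
}
static int mybackend_close(int bfd)
{
    (void)bfd;
    return 0;
}

static const struct backend_ops mybackend_ops = {
    .open  = mybackend_open,
    .read  = mybackend_read,
    .write = mybackend_write,
    .close = mybackend_close,
};

/* Initialization: recreate whatever state the original file system set up
 * in its FUSE init path, then register with Direct-FUSE.  Any "user data"
 * the original code passed through fuse_mount() must instead live in
 * globals that the operations above can reach. */
int mybackend_init(void)
{
    return direct_fuse_register_backend("mybackend:", &mybackend_ops);
}

/* Unmount: release the backend's resources when the application is done. */
void mybackend_unmount(void)
{
    /* flush caches, close connections, free global state, ... */
}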
Outline
Ø Background & Motivation
Ø The Overview of Direct-FUSE
Ø Performance Evaluation
Ø Conclusions
Experimental Methodology
Ø We compare the bandwidth of Direct-FUSE with a local FUSE file system and the native file system on disk and RAM-disk using IOzone.
– Disk
• Ext4-fuse: a FUSE file system layered on top of Ext4.
• Ext4-direct: Ext4-fuse with the FUSE kernel module bypassed.
• Ext4-native: the original Ext4 on disk.
– RAM-disk
• Tmpfs-fuse, Tmpfs-direct, and Tmpfs-native are defined analogously to the three disk configurations.
Ø We also compare the I/O bandwidth of a distributed FUSE file system with Direct-FUSE.
– FusionFS: a distributed file system that supports metadata-intensive and write-intensive operations.
Ø Breakdown analysis of I/O processing in Direct-FUSE.
Sequential Write Bandwidth
Ø The bandwidth of Direct-FUSE is very close to that of the native file system.
[Figure: sequential write bandwidth (MB/s, log scale) of Ext4-fuse, Ext4-direct, Ext4-native, tmpfs-fuse, tmpfs-direct, and tmpfs-native for transfer sizes from 4 KB to 16 MB.]
Sequential Read Bandwidth
Ø Similar to the sequential write case, the read bandwidth of Direct-FUSE is close to that of the native file system.
[Figure: sequential read bandwidth (MB/s, log scale) of Ext4-fuse, Ext4-direct, Ext4-native, tmpfs-fuse, tmpfs-direct, and tmpfs-native for transfer sizes from 4 KB to 16 MB.]
I/O Bandwidth of FusionFS
Ø According to the figure, doubling the number of nodes roughly doubles both read and write throughput, which demonstrates the nearly linear scalability of FusionFS and Direct-FUSE up to 16 nodes.
[Figure: read and write bandwidth (MB/s, log scale) of fusionfs and direct-fusionfs on 1, 2, 4, 8, and 16 nodes.]
Breakdown Analysis of I/O Processing in Direct-FUSE
Ø The dummy read/write takes only about 38 ns, which accounts for less than 3% of the complete I/O function time in Direct-FUSE, even when the I/O size is very small.
– Dummy write/read: no actual data movement; the call returns immediately after reaching the backend service.
– Real write/read: the actual Direct-FUSE read and write I/O calls.
[Figure: latency (ns, log scale) of dummy vs. real write and dummy vs. real read for transfer sizes from 1 B to 1 KB.]
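One way to obtain this kind of breakdown is to time the same Direct-FUSE calls against a backend whose read and write return immediately, so the remaining time is pure dispatch overhead. The sketch below is an assumption about how such a dummy backend could look (again using the hypothetical backend_ops table), not the authors' measurement code.

/* Hypothetical dummy backend for isolating Direct-FUSE path overhead:
 * the I/O callbacks report success without moving any data. */
#include <sys/types.h>

static ssize_t dummy_write(int bfd, const void *buf, size_t len, off_t off)
{
    (void)bfd; (void)buf; (void)off;
    return (ssize_t)len;   /* pretend the whole buffer was written */
}

static ssize_t dummy_read(int bfd, void *buf, size_t len, off_t off)
{
    (void)bfd; (void)buf; (void)off;
    return (ssize_t)len;   /* pretend the whole buffer was read */
}

Timing loops of Direct-FUSE reads and writes against this backend and against a real one yields the dummy/real split shown in the figure.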