BFO: Batch-File Operations on Massive Files for Consistent Performance Improvement. Yang Yang, Qiang Cao, Hong Jiang, Li Yang, Jie Yao, Yuanyuan Dong, Puyuan Yang. Huazhong University of Science and Technology, University of Texas at Arlington, Alibaba Group.
Outline: Background, BFO Design, Evaluation, Conclusion.
Background: Batch-file operations. A batch-file operation accesses a batch of files. Many applications need batch-file operations: backup applications, file-level data replication and archiving, big data analytics systems, and social media and online shopping websites. Traditional access approaches access files one by one (the single-file access pattern), which is inefficient for small files.
Background: Small files in file systems. Desktop file systems: more than 80% of accesses are to files smaller than 32B. Cloud and HPC clusters: 25%~40% of files are smaller than 4KB. The single-file access pattern for a small file involves accessing metadata, fetching file data, and so on. I/O operations dominate batch-file access: metadata access contributes 40% of the time for accessing a small file on disk, and the data I/Os are random.
Overall access performance: Read performance. [Figure: two panels, HDD (HDD_R, HDD_S) and SSD (SSD_R, SSD_S); x-axis: file size in different file sets (4KB to 4MB); y-axis: execution time (s), log scale.] Setup: file sets of 4GB of data with different file sizes (from 4KB to 4MB); devices: HDD and SSD; orders: random (R) and sequential (S). Observations: there is a large performance gap between random and sequential access, especially for small files (57.8X on HDD and 2.6X on SSD for the 4KB file set); there is also a large performance gap among different file sizes.
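To make the setup concrete, the number of files in each 4GB file set can be worked out directly. The sketch below assumes binary units (4GB = 4 * 1024^3 bytes, 4KB = 4096 bytes) and the six file sizes shown on the figure axis; the slides do not state the exact counts.

```python
# Assumption: "GB", "MB", and "KB" on the slide are binary units.
TOTAL_BYTES = 4 * 1024**3

# File sizes taken from the x-axis labels of the figure.
FILE_SIZES = {
    "4KB": 4 * 1024,
    "16KB": 16 * 1024,
    "64KB": 64 * 1024,
    "256KB": 256 * 1024,
    "1MB": 1024**2,
    "4MB": 4 * 1024**2,
}


def files_per_set(total_bytes, file_size):
    """Number of equally sized files needed to fill total_bytes."""
    return total_bytes // file_size


counts = {label: files_per_set(TOTAL_BYTES, size) for label, size in FILE_SIZES.items()}
for label, count in counts.items():
    print(f"{label:>5}: {count} files")
```

Under these assumptions the 4KB set holds about a million files while the 4MB set holds about a thousand, so the per-file overhead is paid roughly a thousand times more often in the smallest set.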
Problem: Write performance. [Figure: two panels, HDD (HDD_R, HDD_S) and SSD (SSD_R, SSD_S); x-axis: file size in different file sets (4KB to 4MB); y-axis: execution time (s), log scale.] Setup: file sets of 4GB of data with different file sizes (from 4KB to 4MB); devices: HDD and SSD; orders: random vs. sequential. Observation: the single-file access approach is very inefficient for small files (below 1MB) and for accesses in a random manner.
Related Works. Application-level optimization (FastCopy): multi-threading and large buffers. Prefetching mechanisms (DiskSeen, ATC’07): depend on future access behaviors. Block-level I/O schedulers (split-level I/O scheduling, SOSP’15): serialize the file accesses. Packing metadata and data together (CFFS, FAST’16): requires redesigning the file system.
Problem Analysis: File access behaviors. Reading a file set with three representative file systems; writing a file set with three representative file systems. Insufficiency #1: The single-file access approach leads to back-and-forth seek operations between the metadata area and the data area, resulting in many non-sequential I/Os.
Problem Analysis: File access behaviors. Data access behaviors (excluding the metadata). [Figure: the application requests files A, B, C, D, E; their disk blocks are laid out in the order E, D, B, A, C. Caption: expected access order.]
Problem Analysis: File access behaviors. Data access behaviors (excluding the metadata). [Figure: the application requests files A, B, C, D, E; their disk blocks are laid out in the order E, D, B, A, C. Caption: actual access order (alphabetic).]
Problem Analysis: File access behaviors. Data access behaviors (excluding the metadata). [Figure: logical block address (x10^6) versus time (s).] Insufficiency #2: The single-file access approach is unaware of the underlying data layout and may read these files in any order, also leading to random I/Os.
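The cost of ignoring the layout can be illustrated with a toy head-travel calculation. In the sketch below only the relative disk order E, D, B, A, C comes from the slides; the block addresses, the starting head position, and the `head_travel` helper are hypothetical, and file lengths are ignored.

```python
# Hypothetical block addresses; only the relative order E, D, B, A, C
# is taken from the slide.
LAYOUT = {"E": 100, "D": 200, "B": 300, "A": 400, "C": 500}


def head_travel(order, layout, start=0):
    """Sum of absolute address jumps when visiting files in the given order."""
    position = start
    total = 0
    for name in order:
        total += abs(layout[name] - position)
        position = layout[name]
    return total


alphabetic = sorted(LAYOUT)                 # A, B, C, D, E
by_layout = sorted(LAYOUT, key=LAYOUT.get)  # E, D, B, A, C

print("alphabetic order:", head_travel(alphabetic, LAYOUT))
print("layout order:    ", head_travel(by_layout, LAYOUT))
```

With these made-up addresses the alphabetic order travels more than twice as far as the layout order, which is the kind of gap a layout-aware access order is meant to close.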
Outline: Background, BFO Design (BFOr, BFOw), Evaluation, Conclusion.
BFOr: Two-phase read. Objective: separately read the metadata and the file data of all accessed files in batches. Phase 1: scanning the inodes. Phase 2: fetching all files’ data. Layout-aware scheduler. [Figure labels: 2MB, 128MB data group.]
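The two phases can be mimicked at user level to show the control flow. This is only an illustration under stated assumptions: BFOr itself is described as working on inodes and disk addresses, whereas the hypothetical `two_phase_read` below uses `os.stat` for the metadata pass and orders the data pass by inode number, which is a rough stand-in for layout and not a physical block address.

```python
import os
import tempfile


def two_phase_read(paths):
    """Read a batch of files in two phases (user-level illustration only)."""
    # Phase 1: collect the metadata of every file before reading any data.
    stats = {path: os.stat(path) for path in paths}

    # Phase 2: read all file data in a single pass.
    # Assumption: inode-number order approximates on-disk layout.
    ordered = sorted(paths, key=lambda path: stats[path].st_ino)
    data = {}
    for path in ordered:
        with open(path, "rb") as f:
            data[path] = f.read()
    return data


# Usage on a small throwaway file set.
with tempfile.TemporaryDirectory() as tmp:
    contents = {"a.txt": b"alpha", "b.txt": b"beta", "c.txt": b"gamma"}
    paths = []
    for name, payload in contents.items():
        path = os.path.join(tmp, name)
        with open(path, "wb") as f:
            f.write(payload)
        paths.append(path)
    result = two_phase_read(paths)
    read_back = {os.path.basename(p): d for p, d in result.items()}
```

The point of the split is that no data read is issued until every metadata lookup has finished, so the two kinds of I/O are not interleaved.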
BFOr: Two-phase read. Layout-aware scheduler: (1) extracting the addresses from the inodes; (2) sorting the addresses of all files; (3) issuing read I/Os in the order of the list. The order list consists of Order_node entries, each with the fields Inode (2 bytes), Start-point (8 bytes), Length (4 bytes), and Num (4 bytes). Example Order_node for File A: Inode -> File A, Start-point -> 3000#, Length -> 8192 bytes, Num -> 0. [Figure: order list over files A, B, C, D, E; disk blocks in the order A, C, E, D, B.]
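The Order_node record and the sorting step can be sketched as follows. The field widths (2, 8, 4, 4 bytes) and File A's values come from the slide; the little-endian unpadded packing, the integer inode identifiers, the start points of files B to E, and the meaning of Num (kept at 0 here) are assumptions made for illustration.

```python
import struct
from dataclasses import dataclass

# Inode (2 bytes), Start-point (8 bytes), Length (4 bytes), Num (4 bytes).
# Assumption: little-endian, no padding, 18 bytes per record.
ORDER_NODE = struct.Struct("<HQII")


@dataclass
class OrderNode:
    inode: int        # hypothetical numeric identifier for the file
    start_point: int  # starting block address
    length: int       # length in bytes
    num: int          # meaning not given on the slide; kept at 0

    def pack(self) -> bytes:
        return ORDER_NODE.pack(self.inode, self.start_point, self.length, self.num)


def build_order_list(nodes):
    """Sort the nodes by starting address so reads follow the disk layout."""
    return sorted(nodes, key=lambda node: node.start_point)


FILE_NAMES = {1: "A", 2: "B", 3: "C", 4: "D", 5: "E"}
nodes = [
    OrderNode(1, 3000, 8192, 0),  # File A, values from the slide
    OrderNode(2, 3008, 8192, 0),  # hypothetical
    OrderNode(3, 3002, 8192, 0),  # hypothetical
    OrderNode(4, 3006, 8192, 0),  # hypothetical
    OrderNode(5, 3004, 8192, 0),  # hypothetical
]

order_list = build_order_list(nodes)
read_order = [FILE_NAMES[node.inode] for node in order_list]
print(read_order)
```

The hypothetical start points were chosen so that the sorted list reproduces the disk order A, C, E, D, B drawn on the slide; reads would then be issued by walking `order_list` from front to back.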
BFOw: Two-phase write. Phase 1: creating a global file to store all data at once: creating the G inode for the file, and creating the Order_list to record the order of the written files. Phase 2: creating the inodes for all files: extracting the address from the G inode, and creating all inodes with the address information and the Order_list, where Current_FileAddr = Previous_FileAddr + FileLength. Light-weight consistency strategy. [Figure: global file G (ABCDE) on disk blocks, followed by inodes for A, B, C, D, E over the same blocks.]
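The address rule Current_FileAddr = Previous_FileAddr + FileLength can be sketched in a few lines. The function name `write_batch`, the use of byte offsets, and the in-memory buffer standing in for the global file are assumptions for illustration; the slides describe the real mechanism in terms of inodes and disk addresses.

```python
def write_batch(files, global_start=0):
    """Illustrate the two write phases on in-memory data.

    files: list of (name, data) pairs in write order.
    Returns the global buffer, the order list, and per-file (address, length).
    """
    # Phase 1: store all data once, back to back, and record the write order.
    order_list = [(name, len(data)) for name, data in files]
    global_data = b"".join(data for _, data in files)

    # Phase 2: derive each file's address from the global start address.
    # Current_FileAddr = Previous_FileAddr + FileLength (of the previous file).
    extents = {}
    current = global_start
    for name, length in order_list:
        extents[name] = (current, length)
        current += length
    return global_data, order_list, extents


files = [("A", b"aa"), ("B", b"bbb"), ("C", b"c")]
global_data, order_list, extents = write_batch(files)

# Each file can be recovered from the global buffer through its extent.
recovered = {name: global_data[addr:addr + length]
             for name, (addr, length) in extents.items()}
print(extents)
```

Because every address follows from the global start and the recorded lengths, the per-file metadata can be created after the data has been written in one sequential pass.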
BFOw: Two-phase write. Light-weight consistency strategy: writing the Order_list into journal files as an atomic operation; recreating all inodes with the Order_list and the G inode. [Figure: global file G (ABCDE) on disk blocks with the inodes for A, B, C, D, E.]
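One way to picture this strategy at user level is to persist the order list atomically and rebuild the per-file extents from it. The sketch below swaps in a common write-then-rename technique (`os.fsync` followed by `os.replace`) as a stand-in for the file system journal; the JSON record format and the function names are hypothetical and are not taken from the slides.

```python
import json
import os
import tempfile


def journal_order_list(journal_path, g_start, order_list):
    """Persist the order list so that it is either fully present or absent."""
    tmp_path = journal_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"g_start": g_start, "order_list": order_list}, f)
        f.flush()
        os.fsync(f.fileno())
    # Renaming over the target is atomic on POSIX within one file system.
    os.replace(tmp_path, journal_path)


def recover_extents(journal_path):
    """Rebuild each file's (address, length) from the journaled record."""
    with open(journal_path) as f:
        record = json.load(f)
    extents = {}
    current = record["g_start"]
    for name, length in record["order_list"]:
        extents[name] = (current, length)
        current += length
    return extents


with tempfile.TemporaryDirectory() as tmp:
    journal = os.path.join(tmp, "order_list.journal")
    journal_order_list(journal, 0, [("A", 2), ("B", 3), ("C", 1)])
    recovered_extents = recover_extents(journal)
    leftover_tmp = os.path.exists(journal + ".tmp")
print(recovered_extents)
```

The design choice illustrated here is that only the small order list needs to be made durable atomically; everything else about the individual files can be recomputed from it together with the global file's start address.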
Outline: Background, BFO Design, Evaluation, Conclusion.