BFO: Batch-File Operations on Massive Files for Consistent Performance Improvement


  1. BFO: Batch-File Operations on Massive Files for Consistent Performance Improvement
     Yang Yang, Qiang Cao, Hong Jiang, Li Yang, Jie Yao, Yuanyuan Dong, Puyuan Yang
     Huazhong University of Science and Technology, University of Texas at Arlington, Alibaba Group

  2. Outline
     - Background
     - BFO Design
     - Evaluation
     - Conclusion

  3. Background
     - Batch-file operations
       - Accessing a batch of files
       - Many applications need batch-file operations
         - Backup applications
         - File-level data replication and archiving
         - Big data analytics systems
         - Social media and online shopping websites
     - Traditional access approaches access files one by one
       - Called the single-file access pattern
       - Inefficient for small files

  4. Background
     - Small files in file systems
       - Desktop file system: more than 80% of accesses are to files smaller than 32B.
       - Cloud and HPC cluster: 25%~40% of files are < 4KB.
     - Single-file access pattern for small files
       - Accessing metadata
       - Fetching file data, and so on
     - I/O operations dominate batch-file access
       - Metadata access contributes 40% of the time for accessing a small file on disk.
       - Random data I/Os

  5. Overall access performance
     - Read performance
       [Figure: two panels (HDD, SSD) plotting execution time (s) against file size in different file sets (4KB, 16KB, 64KB, 256KB, 1MB, 4MB); series HDD_R, HDD_S, SSD_R, SSD_S.]
     - Setup:
       - File sets: 4GB of data with different file sizes (from 4KB to 4MB)
       - Devices: HDD & SSD
       - Orders: Random & Sequential

  6. Overall access performance
     - Read performance
       [Figure: two panels (HDD, SSD) plotting execution time (s) against file size in different file sets (4KB, 16KB, 64KB, 256KB, 1MB, 4MB); series HDD_R, HDD_S, SSD_R, SSD_S. Annotated gaps: 57.8X (HDD panel) and 2.6X (SSD panel).]
     - Large performance gap between random and sequential access, especially for small files
     - Setup:
       - File sets: 4GB of data with different file sizes (from 4KB to 4MB)
       - Devices: HDD & SSD
       - Orders: Random & Sequential

  7. Overall access performance
     - Read performance
       [Figure: two panels (HDD, SSD) plotting execution time (s) against file size in different file sets (4KB, 16KB, 64KB, 256KB, 1MB, 4MB); series HDD_R, HDD_S, SSD_R, SSD_S.]
     - Large performance gap among different file sizes
     - Setup:
       - File sets: 4GB of data with different file sizes (from 4KB to 4MB)
       - Devices: HDD & SSD
       - Orders: Random vs Sequential

  8. Problem
     - Write performance
       [Figure: two panels (HDD, SSD) plotting execution time (s) against file size in different file sets (4KB, 16KB, 64KB, 256KB, 1MB, 4MB); series HDD_R, HDD_S, SSD_R, SSD_S.]
     - Setup:
       - File sets: 4GB of data with different file sizes (from 4KB to 4MB)
       - Devices: HDD & SSD
       - Orders: Random vs Sequential
     - Observation: the single-file access approach is very inefficient
       - for small files (below 1MB);
       - in a random manner.

  9. Related Works
     - Application-level optimization (Fastcopy)
       - Multi-threading, large buffer
     - Prefetching mechanism (Diskseen, ATC’07)
       - Depending on the future access behaviors
     - Block-level I/O scheduler (split-level I/O scheduling, SOSP’15)
       - Serializing the file accesses
     - Packing metadata and data together (CFFS, FAST’16)
       - Redesigning new file systems

  10. Problem Analysis
      - File access behaviors
        - Reading a file set with three representative file systems

  12. Problem Analysis
      - File access behaviors
        - Reading a file set with three representative file systems
        - Writing a file set with three representative file systems
      - Insufficiency #1: The single-file access approach leads to back-and-forth seek operations between the metadata area and the data area, resulting in many non-sequential I/Os.
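
Insufficiency #1 can be illustrated with a toy seek model. The sketch below is not from the slides: it assumes a hypothetical disk where inodes sit in a metadata area at low block addresses and file data sits in a separate data area, and it compares the total head travel of the single-file pattern (inode, data, inode, data, and so on) with a batched pattern (all inodes, then all data). The addresses and the `head_travel` helper are invented for illustration only.

```python
def head_travel(addresses, start=0):
    """Sum of absolute distances the head moves when visiting addresses in order."""
    total, pos = 0, start
    for addr in addresses:
        total += abs(addr - pos)
        pos = addr
    return total


# Hypothetical layout: (inode block, data block) for four small files.
files = [(10, 100_000), (11, 100_008), (12, 100_016), (13, 100_024)]

# Single-file pattern: each file's inode is read right before its data.
single = [blk for inode, data in files for blk in (inode, data)]

# Batched pattern: all inodes first, then all data.
batched = [inode for inode, _ in files] + [data for _, data in files]

single_travel = head_travel(single)
batched_travel = head_travel(batched)
print(f"single-file: {single_travel} blocks, batched: {batched_travel} blocks")
```

In this toy model the single-file pattern crosses between the two areas once per access, while the batched pattern crosses only once in total.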

  13. Problem Analysis
      - File access behaviors
      - Data access behaviors (excluding the metadata)
      [Diagram: the App requests files A, B, C, D, E; the disk blocks are laid out in the order E, D, B, A, C. Caption: Expected access order.]

  14. Problem Analysis
      - File access behaviors
      - Data access behaviors (excluding the metadata)
      [Diagram: the App requests files A, B, C, D, E; the disk blocks are laid out in the order E, D, B, A, C. Caption: Actual access order (alphabetic).]

  15. Problem Analysis
      - File access behaviors
      - Data access behaviors (excluding the metadata)
      [Figure: logical block address (×10^6, roughly 135 to 137) plotted against time (234 to 235 s).]
      - Insufficiency #2: The single-file access approach is unaware of the underlying data layout and may read these files in any order, also leading to random I/Os.
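
Insufficiency #2 can be made concrete with the five-file example from the diagrams. The sketch below assumes hypothetical block addresses that follow the on-disk order E, D, B, A, C shown earlier, and compares head travel for the alphabetic access order with the layout-sorted order. The addresses and the `head_travel` helper are assumptions for illustration, not measurements from the talk.

```python
def head_travel(addresses, start=0):
    """Sum of absolute distances the head moves when visiting addresses in order."""
    total, pos = 0, start
    for addr in addresses:
        total += abs(addr - pos)
        pos = addr
    return total


# Hypothetical start addresses matching the on-disk order E, D, B, A, C.
layout = {"E": 0, "D": 100, "B": 200, "A": 300, "C": 400}

# Alphabetic order, as issued by an application unaware of the layout.
alphabetic = sorted(layout)
# Layout-aware order: ascending block address.
by_address = sorted(layout, key=layout.get)

alphabetic_travel = head_travel([layout[name] for name in alphabetic])
sorted_travel = head_travel([layout[name] for name in by_address])
print(alphabetic, alphabetic_travel)
print(by_address, sorted_travel)
```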

  16. Outline
      - Background
      - BFO Design
        - BFOr
        - BFOw
      - Evaluation
      - Conclusion

  17. BFOr
      - Two-phase read
        - Objective: separately read the metadata and file data of all accessed files in batches
        - Phase 1: scanning the inodes
        - Phase 2: fetching all files’ data
      - Layout-aware scheduler
      [Diagram labels: 2MB, 128MB data group.]
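
A rough user-space analogue of the two-phase read is sketched below. It is not the BFOr implementation: BFOr, as described on the slides, works with the addresses extracted from inodes, whereas this sketch only has `os.stat` available and uses the inode number as a crude stand-in for on-disk position. That proxy, the function name `two_phase_read`, and the demo files are all assumptions.

```python
import os
import tempfile


def two_phase_read(paths):
    """Read a batch of files: all metadata first, then all data."""
    # Phase 1: scan the metadata of every file before touching any data.
    batch = [(os.stat(p).st_ino, p) for p in paths]
    # Assumption: inode number order roughly follows disk layout.
    batch.sort()
    # Phase 2: fetch all file data in the chosen order.
    contents = {}
    for _, path in batch:
        with open(path, "rb") as f:
            contents[path] = f.read()
    return contents


# Small demo on three temporary files.
with tempfile.TemporaryDirectory() as d:
    paths = []
    for name in ("a", "b", "c"):
        p = os.path.join(d, name)
        with open(p, "wb") as f:
            f.write(name.encode() * 4)
        paths.append(p)
    result = two_phase_read(paths)
    demo = {os.path.basename(p): data for p, data in result.items()}
```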

  18. BFOr
      - Two-phase read
      - Layout-aware scheduler
        - Extracting the addresses from the inodes
        - Sorting the addresses of all files
        - Issuing read I/O in the order of the list
      [Diagram: an Order list of Order_node entries, each with the fields Inode (2 bytes), Start-point (8 bytes), Length (4 bytes), Num (4 bytes); disk blocks in the order A, C, E, D, B.]

  19. BFOr
      - Two-phase read
      - Layout-aware scheduler
        - Extracting the addresses from the inodes
        - Sorting the addresses of all files
        - Issuing read I/O in the order of the list
      [Diagram: an Order list of Order_node entries A, B, C, D, E, each with the fields Inode (2 bytes), Start-point (8 bytes), Length (4 bytes), Num (4 bytes). Example entry: Inode -> File A, Start-point -> 3000#, Length -> 8192 bytes, Num -> 0. Disk blocks in the order A, C, E, D, B.]
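
The Order_node record and the sort step can be sketched as follows. The field widths (2, 8, 4, 4 bytes) and the File A values come from the slide; the little-endian packing, the numeric inode identifiers, and the start points and lengths of files B to E are assumptions chosen so that the sorted result matches the disk order A, C, E, D, B in the diagram. The slide does not spell out what Num means, so it is carried here as an opaque field.

```python
import struct
from dataclasses import dataclass

# Inode (2 bytes), Start-point (8 bytes), Length (4 bytes), Num (4 bytes).
ORDER_NODE = struct.Struct("<HQII")  # byte order is an assumption


@dataclass(frozen=True)
class OrderNode:
    inode: int        # hypothetical numeric identifier of the file's inode
    start_point: int  # starting address of the file data
    length: int       # data length in bytes
    num: int          # opaque field; its meaning is not given on the slide

    def pack(self):
        return ORDER_NODE.pack(self.inode, self.start_point, self.length, self.num)

    @classmethod
    def unpack(cls, raw):
        return cls(*ORDER_NODE.unpack(raw))


# File A uses the slide's example values; the other entries are invented.
names = {1: "A", 2: "B", 3: "C", 4: "D", 5: "E"}
nodes = [
    OrderNode(1, 3000, 8192, 0),
    OrderNode(2, 3008, 4096, 0),
    OrderNode(3, 3002, 4096, 0),
    OrderNode(4, 3006, 4096, 0),
    OrderNode(5, 3004, 4096, 0),
]

# Layout-aware scheduling: sort by start address, then issue reads in that order.
order_list = sorted(nodes, key=lambda n: n.start_point)
read_order = "".join(names[n.inode] for n in order_list)
print(read_order)
```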

  22. BFOw
      - Two-phase write
        - Phase 1: creating a global file to store all data once
          - Creating the G inode for the file
          - Creating the Order_list to record the order of the written files
        - Phase 2: creating all inodes for all files
          - Extracting the address from the G inode
          - Creating all inodes with the address information and the Order_list
            Current_FileAddr = Previous_FileAddr + FileLength
      - Light-weight consistency strategy
      [Diagram: G -> ABCDE; disk blocks: G (global file).]

  23. BFOw
      - Two-phase write
        - Phase 1: creating a global file to store all data once
          - Creating the G inode for the file
          - Creating the Order_list to record the order of the written files
        - Phase 2: creating all inodes for all files
          - Extracting the address from the G inode
          - Creating all inodes with the address information and the Order_list
            Current_FileAddr = Previous_FileAddr + FileLength
      - Light-weight consistency strategy
      [Diagram: G -> ABCDE; disk blocks: G, A B C D E.]

  24. BFOw
      - Two-phase write
        - Phase 1: creating a global file to store all data once
          - Creating the G inode for the file
          - Creating the Order_list to record the order of the written files
        - Phase 2: creating all inodes for all files
          - Extracting the address from the G inode
          - Creating all inodes with the address information and the Order_list
            Current_FileAddr = Previous_FileAddr + FileLength
      - Light-weight consistency strategy
      [Diagram: G -> ABCDE; disk blocks: G A B A B C D E C D E.]
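
The address rule Current_FileAddr = Previous_FileAddr + FileLength can be worked through in a few lines. The sketch below reads FileLength as the length of the previous file, so that files sit back to back inside the global file G; that reading, the function name `assign_addresses`, and the file lengths are assumptions, since the slide gives only the formula.

```python
def assign_addresses(global_start, order_list):
    """Derive each file's start address from the global file's start address.

    order_list holds (name, length) pairs in the order the files were written.
    """
    layout = {}
    addr = global_start  # the first file starts where the global file starts
    for name, length in order_list:
        layout[name] = addr
        addr += length   # the next file starts right after this one
    return layout


# Hypothetical lengths (bytes) for the five files written into G.
order_list = [("A", 4096), ("B", 8192), ("C", 4096), ("D", 16384), ("E", 4096)]
layout = assign_addresses(0, order_list)
print(layout)
```

Because each per-file address is derived from the start of G and the Order_list alone, Phase 2 needs no additional data writes in this model.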

  25. BFOw
      - Two-phase write
      - Light-weight consistency strategy
        - Writing the Order_list into journal files as an atomic operation
        - Recreating all inodes with the Order_list and the G inode
      [Diagram: G -> ABCDE; disk blocks: G G A B A B C D E C D E.]
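
A user-space analogue of this consistency strategy is sketched below. It is not the BFOw journal mechanism: it assumes the Order_list can be serialized as JSON and made durable with the common write-to-temporary-file, `fsync`, then `os.replace` pattern, and that recovery rebuilds each file's address from the journaled list and the start address of G. The file names, function names, and lengths are hypothetical.

```python
import json
import os
import tempfile


def journal_order_list(journal_path, order_list):
    """Persist the Order_list so that readers see either the old or the new version."""
    tmp = journal_path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(order_list, f)
        f.flush()
        os.fsync(f.fileno())       # force the journal contents to stable storage
    os.replace(tmp, journal_path)  # rename over the old journal in one step


def recover_inodes(journal_path, global_start):
    """Recreate per-file address records from the journaled Order_list."""
    with open(journal_path) as f:
        order_list = json.load(f)
    inodes, addr = {}, global_start
    for name, length in order_list:
        inodes[name] = {"addr": addr, "length": length}
        addr += length
    return inodes


# Demo: journal three hypothetical files, then rebuild their records.
with tempfile.TemporaryDirectory() as d:
    journal = os.path.join(d, "order_list.journal")
    journal_order_list(journal, [["A", 4096], ["B", 8192], ["C", 4096]])
    recovered = recover_inodes(journal, global_start=1000)
    leftover_tmp = os.path.exists(journal + ".tmp")
```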

  26. Outline
      - Background
      - BFO Design
      - Evaluation
      - Conclusion
