
Performance Improvement of Btrfs



  1. Performance Improvement of Btrfs
     Miao Xie <miaox@cn.fujitsu.com>
     Li Zefan <lizf@cn.fujitsu.com>

  2. Agenda
     • Comparison between Btrfs and Ext3/4
     • Issue analysis (cases we have investigated)
       • Small file sequential read
       • Large file random write (Direct I/O and fsync)
       • File creation/deletion
     • Future work

  3. Comparison between Btrfs and Ext3/4
     • Performance test environment
       • Hardware
         • CPU: Xeon(TM) X5260 3.33 GHz x 2 (4 cores)
         • Memory: 4 GB
         • Disk: 20 GB
       • Software
         • OS: RHEL6 (x86_64)
         • Kernel: 2.6.38
         • Glibc: 2.12
         • Btrfs-progs: 0.9
         • Sysbench: 0.4.12

  4. Comparison between Btrfs and Ext3/4
     • 73 cases in total
     • 72 file I/O cases, combining the following conditions:
       • Small file / Large file
       • Write / Read
       • Random / Sequential
       • Sync / Async / Direct I/O
       • Single-thread / Multi-thread
       • Different block sizes (1 KB, 4 KB, 32 KB) *
     • File creation/deletion
       • Measure the speed of empty file creation and deletion
     * Block size (bs): the number of bytes read or written per request.
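The count of 72 file I/O cases can be reproduced by enumerating the conditions above. The sketch below is one enumeration that matches the charts in this deck. The rule for which I/O modes apply to which workload (`io_modes`) is an assumption read off the chart labels, not something the slides state, and all names are illustrative.

```python
from itertools import product

# Block sizes per file class, as labelled on the charts.
BLOCK_SIZES = {"small": ("1K", "4K"), "large": ("4K", "32K")}
THREADS = (1, 8)


def io_modes(size, op):
    """I/O modes charted for a workload (assumption based on the chart labels)."""
    if op == "read":
        return ("direct", "general")
    if size == "small":
        return ("direct", "fsync")
    return ("direct", "fsync", "general")


def file_io_cases():
    """Return every (size, op, pattern, mode, threads, block size) combination."""
    cases = []
    for size, op, pattern in product(
        ("small", "large"), ("read", "write"), ("random", "sequential")
    ):
        for mode, threads, bs in product(
            io_modes(size, op), THREADS, BLOCK_SIZES[size]
        ):
            cases.append((size, op, pattern, mode, threads, bs))
    return cases
```

Adding the single file creation/deletion case to the result of `file_io_cases()` gives the 73 cases in total.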

  5. Small file random read performance
     [Figure: bar chart of I/O speed (Kb/s) for Ext3, Ext4, and Btrfs. Groups: Direct I/O and general read, each with 1 thread and 8 threads, at bs = 1K and bs = 4K.]

  6. Small file random write performance
     [Figure: bar chart of I/O speed (Kb/s) for Ext3, Ext4, and Btrfs. Groups: Direct I/O and write (fsync), each with 1 thread and 8 threads, at bs = 1K and bs = 4K.]
     Write (fsync): write data into the file and call fsync every 100 requests.
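The "write (fsync)" workload can be pictured as the loop below: block-sized writes at random offsets, with an `fsync` after every 100 requests. This is a minimal sketch of the access pattern, not the benchmark's own code; the function name, parameters, and the fixed seed are assumptions, and `os.pwrite` requires a Unix-like system.

```python
import os
import random
import tempfile


def random_write_with_fsync(path, file_size, block_size, requests,
                            fsync_every=100, seed=0):
    """Write `requests` blocks at random block-aligned offsets.

    Calls fsync after every `fsync_every` requests and returns the
    number of fsync calls that were issued.
    """
    rng = random.Random(seed)
    block = b"\0" * block_size
    blocks_in_file = file_size // block_size
    fsync_calls = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, file_size)  # size the file before writing into it
        for i in range(1, requests + 1):
            offset = rng.randrange(blocks_in_file) * block_size
            os.pwrite(fd, block, offset)
            if i % fsync_every == 0:
                os.fsync(fd)  # force data and metadata to stable storage
                fsync_calls += 1
    finally:
        os.close(fd)
    return fsync_calls
```

Each `fsync` forces the filesystem to make the preceding writes durable, which is why this mode is sensitive to how much metadata the filesystem has to write per sync.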

  7. Small file sequential read performance
     [Figure: bar chart of I/O speed (Mb/s) for Ext3, Ext4, and Btrfs. Groups: Direct I/O and general read, each with 1 thread and 8 threads, at bs = 1K and bs = 4K.]

  8. Small file sequential write performance
     [Figure: bar chart of I/O speed (Kb/s) for Ext3, Ext4, and Btrfs. Groups: Direct I/O and write (fsync), each with 1 thread and 8 threads, at bs = 1K and bs = 4K.]
     Write (fsync): write data into the file and call fsync every 100 requests.

  9. Large file random read performance
     [Figure: bar chart of I/O speed (Mb/s) for Ext3, Ext4, and Btrfs. Groups: Direct I/O and general read, each with 1 thread and 8 threads, at bs = 4K and bs = 32K.]

  10. Large file sequential read performance
      [Figure: bar chart of I/O speed (Mb/s) for Ext3, Ext4, and Btrfs. Groups: Direct I/O and general read, each with 1 thread and 8 threads, at bs = 4K and bs = 32K.]

  11. Large file random write performance (1/2)
      [Figure: bar chart of I/O speed (Mb/s) for Ext3, Ext4, and Btrfs. Groups: Direct I/O and write (fsync), each with 1 thread and 8 threads, at bs = 4K and bs = 32K.]

  12. Large file random write performance (2/2)
      [Figure: bar chart of I/O speed (Mb/s) for Ext3, Ext4, and Btrfs. Group: general write with 1 thread and 8 threads, at bs = 4K and bs = 32K.]

  13. Large file sequential write performance (1/2)
      [Figure: bar chart of I/O speed (Mb/s) for Ext3, Ext4, and Btrfs. Group: Direct I/O with 1 thread and 8 threads, at bs = 4K and bs = 32K.]

  14. Large file sequential write performance (2/2)
      [Figure: bar chart of I/O speed (Mb/s) for Ext3, Ext4, and Btrfs. Groups: write (fsync) and general write, each with 1 thread and 8 threads, at bs = 4K and bs = 32K.]

  15. File creation/deletion performance
      • Create and delete a large number of empty files to measure the speed of file creation and deletion.
      [Figure: bar chart of creation and deletion speed (files/sec) for Ext3, Ext4, and Btrfs.]
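A measurement of this kind can be sketched as follows: create a batch of empty files, then delete them, and report files per second for each phase. This is an illustration only; the slides do not say which tool produced the numbers, and the function name and file naming scheme are assumptions.

```python
import os
import tempfile
import time


def measure_create_delete(directory, count):
    """Create then delete `count` empty files; return (create_rate, delete_rate)."""
    paths = [os.path.join(directory, f"f{i:07d}") for i in range(count)]

    start = time.perf_counter()
    for p in paths:
        # O_EXCL makes sure every open really creates a new file.
        os.close(os.open(p, os.O_CREAT | os.O_WRONLY | os.O_EXCL, 0o600))
    created = time.perf_counter()

    for p in paths:
        os.unlink(p)
    deleted = time.perf_counter()

    # Guard against a zero interval on very coarse timers.
    create_rate = count / max(created - start, 1e-9)
    delete_rate = count / max(deleted - created, 1e-9)
    return create_rate, delete_rate
```

To compare filesystems, `directory` would point at a mount of the filesystem under test, for example `measure_create_delete("/mnt/test", 100000)` with a hypothetical mount point.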

  16. Comparison between Btrfs and Ext3/4
      • Btrfs performs poorly (more than 20% lower than Ext3/4) in the following cases:
        • Small file random read (non-inline files)
        • Small file sequential read
        • Small file random/sequential write
        • Large file random write (Direct I/O and fsync)
        • Large file random write (general write, bs = 4 KB)
        • File creation and deletion

  17. Agenda
      • Comparison between Btrfs and Ext3/4
      • Issue analysis (cases we have investigated)
        • Small file sequential read
        • Large file random write (Direct I/O and fsync)
        • File creation/deletion
      • Future work

  18. Small file sequential read
      • Reasons
        • Metadata fragmentation -> file extent read latency -> delayed file data reads
        • Btrfs must read the file extent item before it can read the file data (whether or not the small file is inlined). Because the metadata is fragmented, the disk has to reposition the read offset frequently, and readahead cannot work well. So …
      [Figure: leaves of the fs/file tree scattered across the disk.]

  19. Small file sequential read
      • Reason verification
        • Run the small file sequential read test again after defragmentation.
      [Figure: bar chart of I/O speed (Mb/s), "No Defrag" vs. "After Defrag". Groups: Direct I/O and general read, each with 1 thread and 8 threads, at bs = 1K and bs = 4K.]

  20. Small file sequential read
      • Solution
        • Pre-allocation for the B+ tree: introduce a free space cluster for each node in the tree, so that contiguous free space can be allocated from the parent node's cluster and sibling leaves are stored close together. (The patch for this solution is still under test and has not been posted yet.)
      [Figure: fs/file tree nodes, each mapped to its own cluster of contiguous space on disk.]
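The effect of per-parent clusters can be shown with a toy model. The sketch below is not the Btrfs allocator and not the patch mentioned above; it only contrasts a single global "next free block" policy with one that reserves a contiguous cluster per parent node. Block numbers, the cluster size, and all function names are assumptions made for illustration.

```python
def count_seeks(positions):
    """Count jumps between consecutive reads that are not physically adjacent."""
    return sum(1 for a, b in zip(positions, positions[1:]) if b != a + 1)


def allocate_global(requests):
    """Give each new leaf the next free block, whatever its parent is."""
    placement = {}
    for block, parent in enumerate(requests):
        placement.setdefault(parent, []).append(block)
    return placement


def allocate_clustered(requests, cluster_size):
    """Reserve a contiguous cluster per parent and place its leaves inside it."""
    next_free = 0
    clusters = {}   # parent -> [next block in cluster, end of cluster]
    placement = {}
    for parent in requests:
        cur = clusters.get(parent)
        if cur is None or cur[0] == cur[1]:
            # No cluster yet, or the current one is full: reserve a new one.
            cur = [next_free, next_free + cluster_size]
            clusters[parent] = cur
            next_free += cluster_size
        placement.setdefault(parent, []).append(cur[0])
        cur[0] += 1
    return placement


# Two parents ("A" and "B") allocate leaves in an interleaved order.
requests = ["A", "B"] * 4
scattered = allocate_global(requests)
clustered = allocate_clustered(requests, cluster_size=4)
```

In this model, reading the leaves of parent "A" in order needs three repositionings with the global policy and none with clusters, which is the behavior the pre-allocation approach aims for.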

  21. Small file sequential read
      • Improvement result
      [Figure: bar chart of I/O speed (Mb/s) for Ext3, Ext4, Btrfs, and Btrfs + patch. Groups: Direct I/O and general read, each with 1 thread and 8 threads, at bs = 1K and bs = 4K.]
      • Further improvement
        • Introduce automatic defragmentation for metadata
        • Apply the new metadata readahead API written by Arne

  22. Agenda
      • Comparison between Btrfs and Ext3/4
      • Issue analysis (cases we have investigated)
        • Small file sequential read
        • Large file random write (Direct I/O and fsync)
        • File creation/deletion
      • Future work

  23. Large file random write (Direct I/O and fsync)
      • Background: what is tree logging?
        Tree logging is a special write-ahead log of dirty metadata.
        • Purpose: reduce the number of metadata write requests when fsync or O_SYNC writes happen.
        • Implementation: copy the changed items into a special tree (the log tree, one per fs/file tree), and then write that tree to disk. After a crash, Btrfs recovers the fs/file tree from the log tree.
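The copy-then-replay idea can be sketched with a toy model in which trees are plain dictionaries. This is a heavily simplified illustration and not how Btrfs stores or replays its log; the class and method names are hypothetical.

```python
class ToyFs:
    """Toy model of an fs/file tree with a log tree for fsync."""

    def __init__(self):
        self.committed = {}  # fs/file tree as of the last full commit (on disk)
        self.in_memory = {}  # current tree, including uncommitted changes
        self.dirty = set()   # keys changed since the last fsync or commit
        self.log_tree = {}   # log tree (on disk)

    def write(self, key, value):
        self.in_memory[key] = value
        self.dirty.add(key)

    def fsync(self):
        # Copy only the changed items into the log tree and "write" it out,
        # instead of committing the whole fs/file tree.
        for key in self.dirty:
            self.log_tree[key] = self.in_memory[key]
        self.dirty.clear()

    def commit(self):
        # A full commit makes the log tree unnecessary.
        self.committed = dict(self.in_memory)
        self.log_tree.clear()
        self.dirty.clear()

    def crash_and_recover(self):
        # After a crash only on-disk state survives: replay the log tree
        # on top of the last committed fs/file tree.
        recovered = dict(self.committed)
        recovered.update(self.log_tree)
        self.in_memory = recovered
        self.dirty.clear()
        return recovered
```

In this model a change that was followed by `fsync()` survives a crash, while a change that was neither synced nor committed is lost.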

  24. Large file random write (Direct I/O and fsync)
      • Reasons
        • A lot of unchanged metadata is logged (e.g. csum items, file extent items)
      [Figure: the application changes one extent of a file (Extent 1 … Extent N) and the related checksums in the csum tree change; all csum data of the file is copied to the log tree and written to disk. Legend: checksum of a file extent; checksum of a changed extent; the changed extent.]

  25. Large file random write (Direct I/O and fsync)
      • Reason verification
        • Run the large file random write test with the tree log disabled (mount with -o notreelog).
      [Figure: bar chart of I/O speed (Mb/s), Btrfs vs. Btrfs (no tree log). Groups: Direct I/O and write (fsync), each with 1 thread and 8 threads, at bs = 4K and bs = 32K.]

  26. Large file random write (Direct I/O and fsync)
      • Solution
        • Don't log unchanged metadata: introduce a sub-transaction id to filter out the unchanged metadata (v2.6.41)
      [Figure: the application changes one extent of a file (Extent 1 … Extent N) and the related checksums in the csum tree change; at this step of the diagram, all csum data of the file is still copied to the log tree and written to disk. Legend: checksum of a file extent; checksum of a changed extent; the changed extent.]


  28. Large file random write (Direct I/O and fsync)
      • Solution
        • Don't log unchanged metadata: introduce a sub-transaction id to filter out the unchanged metadata (v2.6.41)
      [Figure: the application changes one extent of a file (Extent 1 … Extent N) and the related checksums in the csum tree change; only the changed csum data of the file is copied to the log tree and written to disk. Legend: checksum of a file extent; checksum of a changed extent; the changed extent.]
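The filtering step can be illustrated as follows. The slides do not show how the sub-transaction id is stored or compared, so the item layout, the field name `sub_transid`, and the comparison against the last logged id are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class CsumItem:
    extent: int        # which extent of the file this checksum covers
    checksum: int
    sub_transid: int   # sub-transaction in which the item was last changed


def items_to_log(items, last_logged_sub_transid):
    """Keep only items changed after the last sub-transaction that was logged."""
    return [it for it in items if it.sub_transid > last_logged_sub_transid]


# Four extents; only extent 3 was rewritten after sub-transaction 2 was logged.
items = [
    CsumItem(extent=1, checksum=0xA1, sub_transid=1),
    CsumItem(extent=2, checksum=0xB2, sub_transid=1),
    CsumItem(extent=3, checksum=0xC3, sub_transid=3),
    CsumItem(extent=4, checksum=0xD4, sub_transid=1),
]
changed = items_to_log(items, last_logged_sub_transid=2)
```

Without the filter, all four csum items would be copied to the log tree on every fsync; with it, only the item for extent 3 is logged.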
