Write Optimization of Log-structured Flash File System for Parallel I/O on Manycore Servers


  1. Write Optimization of Log-structured Flash File System for Parallel I/O on Manycore Servers. Chang-Gyu Lee, Hyunki Byun, Sunghyun Noh, Hyeongu Kang, Youngjae Kim. Department of Computer Science and Engineering, Sogang University, Seoul, Republic of Korea. SYSTOR '19 1

  2. Data Intensive Applications – Massive data explosion in recent years, expected to keep growing: 281 EB (2007), 1.2 ZB (2010), 4.4 ZB (2013), ~44 ZB projected for 2020. – Growing capacity demands for both memory and storage. – Database applications drive parallel writes. 2

  3. Manycore CPU and NVMe SSD – [Diagram: a manycore server issues parallel writes through the OS file system (F2FS) to a high-performance NVMe SSD.] 3

  4. What are Parallel Writes? – Shared File Writes (DWOM from FxMark[ATC'16]): multiple processes issue direct I/O writes to private regions of a single shared file. – Private File Write with fsync (DWSL from FxMark[ATC'16]): multiple processes write private files, then call the fsync system call. A minimal sketch of the DWOM pattern is shown after this slide. * FxMark[ATC'16]: Min et al., "Understanding Manycore Scalability of File Systems", USENIX ATC 2016 4
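
A minimal C sketch of the DWOM-style shared-file write pattern, assuming a hypothetical file name, region size, and process count (error handling omitted): each process issues direct, block-aligned writes only to its own region of one shared file.

```c
/*
 * Hypothetical sketch of the DWOM-style shared-file write pattern:
 * N processes each write a private, block-aligned region of one file
 * using direct I/O. File name, region size, and process count are
 * illustrative, not taken from FxMark itself.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

#define NPROC      4
#define REGION_SZ  (1 << 20)   /* 1 MiB private region per process */
#define BLOCK_SZ   4096        /* alignment required by O_DIRECT    */

int main(void)
{
    for (int rank = 0; rank < NPROC; rank++) {
        if (fork() == 0) {
            int fd = open("shared.dat", O_WRONLY | O_CREAT | O_DIRECT, 0644);
            void *buf;
            posix_memalign(&buf, BLOCK_SZ, BLOCK_SZ);
            memset(buf, rank, BLOCK_SZ);

            /* Each process touches only its own offset range. */
            off_t base = (off_t)rank * REGION_SZ;
            for (off_t off = 0; off < REGION_SZ; off += BLOCK_SZ)
                pwrite(fd, buf, BLOCK_SZ, base + off);

            close(fd);
            _exit(0);
        }
    }
    while (wait(NULL) > 0)
        ;
    return 0;
}
```

Because every process holds its own file descriptor and touches a disjoint byte range, nothing in the workload itself forces serialization; the serialization seen later comes from the file system's inode lock.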

  5. Preliminary Results – [Charts: K IOPS vs. # of cores (1 to 120) for the DWOM and DWSL workloads.] – In the DWOM workload, performance does not scale at all. – In the DWSL workload, performance stops scaling after 42 cores. 5

  6. Contents – Introduction and Motivation – Background: F2FS – Research Problems – Parallel writes do not scale with the increased number of cores on manycore servers. – Approaches – Applying Range Locking – NVM Node Logging for file and file system metadata – Pin-Point Update to completely eliminate checkpointing – Evaluation Results – Conclusion 6

  7. F2FS: Flash Friendly File System – F2FS is a log-structured file system designed for NAND flash SSDs. – F2FS employs two types of logs to benefit from the flash device's internal parallelism and to ease garbage collection: a data log for directory entries and user data, and a node log for inodes and indirect nodes. – The Node Address Table (NAT) translates a node id (NID) to a block address. – In memory, the block address in a NAT entry is updated when the corresponding node log block is flushed. – The entire NAT is flushed to the storage device during checkpointing. – [On-disk layout: file system metadata area (CP, NAT, SIT, SSA; random writes) followed by the main log area (node log and data log; sequential writes).] The NID-to-block translation is sketched after this slide. 7
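
As a rough illustration of the NAT role described above (field names and sizes are assumptions, not the actual F2FS on-disk format):

```c
/*
 * Simplified illustration of the NAT translation described above;
 * field names and types are illustrative, not the exact F2FS
 * on-disk definitions.
 */
#include <stdint.h>

struct nat_entry {
    uint32_t nid;        /* node id                          */
    uint32_t ino;        /* inode the node belongs to        */
    uint32_t block_addr; /* current on-flash block address   */
};

/* In-memory NAT: when a node log block is flushed, only the cached
 * entry is updated; the on-disk NAT copy is rewritten at checkpoint. */
static struct nat_entry nat_cache[1024];   /* illustrative capacity */

static uint32_t nid_to_block(uint32_t nid)
{
    return nat_cache[nid].block_addr;
}

static void on_node_flush(uint32_t nid, uint32_t new_block_addr)
{
    nat_cache[nid].block_addr = new_block_addr; /* dirty until checkpoint */
}
```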

  8. Problem(1): Serialized Shared File Writes – [Diagram: processes A, B, and C write to a single file; the inode lock is granted to one writer while the others are blocked.] – Writes to a shared file are serialized on the file's inode lock. 8

  9. Problem(2): fsync Processing in F2FS – [Diagram: DRAM holds the cached NAT (node id to block address) and the dirty inode; the SSD holds the on-disk NAT, data log, and node log.] – On fsync, F2FS ❶ flushes the new data to the data log and ❷ flushes the updated inode block to the node log, then updates the in-memory NAT entry to point at the new node block. A toy sketch of this path follows. 9
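
A toy, self-contained sketch of this fsync path (in-memory arrays stand in for the on-flash logs; all names are illustrative, not F2FS code):

```c
#include <stdint.h>
#include <string.h>

#define BLK 4096

/* Toy in-memory stand-ins for the data log, node log, and cached NAT. */
static uint8_t  data_log[64][BLK], node_log[64][BLK];
static uint32_t data_head, node_head;
static uint32_t nat[256];                 /* nid -> node log block index */

struct mini_inode {
    uint32_t nid;              /* node id                                */
    uint32_t data_block_addr;  /* where this file's data block now lives */
};

static uint32_t append_to_data_log(const void *page)
{
    memcpy(data_log[data_head], page, BLK);
    return data_head++;
}

static uint32_t append_to_node_log(const struct mini_inode *inode)
{
    /* A whole 4 KiB node block is consumed even for a tiny inode change. */
    memcpy(node_log[node_head], inode, sizeof(*inode));
    return node_head++;
}

static void fsync_path(struct mini_inode *inode, const void *dirty_data)
{
    inode->data_block_addr = append_to_data_log(dirty_data);  /* step 1 */
    nat[inode->nid] = append_to_node_log(inode);              /* step 2:
        the NAT update stays in DRAM; the on-disk NAT waits for checkpoint */
}
```

The point of the sketch is that step 2 always writes a full node block even when only a few inode bytes changed, and that the NAT update remains in memory until the next checkpoint.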

  10. Problem(3): I/O Blocking during Checkpointing – [Diagram: as in the fsync path, ❶ data is flushed to the data log and ❷ the node to the node log; the NAT checkpointing step is annotated "60 Sec.".] 10

  11. Problem(3): I/O Blocking during Checkpointing – [Diagram, continued: ❸ checkpointing flushes the entire dirty NAT from DRAM to the on-disk NAT.] 11

  12. Problem(3): I/O Blocking during Checkpointing – [Diagram, continued: while the NAT is being flushed, incoming I/O requests at both the user level and the file system level are blocked.] 12

  13. Summary – We identified the causes of bottlenecks in F2FS for parallel writes as follows. 1. Serialization of parallel writes on a single file 2. High latency of fsync system call 3. I/O blocking by checkpointing of F2FS 13

  14. Approach(1): Range Locking – In F2FS, parallel writes to a single file are serialized by the inode mutex lock. We employ a range-based lock to allow parallel writes on a single file: writers A, B, and C targeting disjoint regions are each granted their range, and only overlapping ranges block each other. A sketch of such a file-level range lock follows. 14
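
A minimal user-space sketch of a file-level range lock, assuming a linked list of granted ranges guarded by a mutex and condition variable (the paper's in-kernel implementation will differ; all names are illustrative):

```c
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>

struct range {
    off_t start, end;          /* [start, end) byte range */
    struct range *next;
};

struct range_lock {
    pthread_mutex_t lock;
    pthread_cond_t  released;
    struct range   *held;      /* currently granted ranges */
};
#define RANGE_LOCK_INIT { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, NULL }

static int overlaps(const struct range *r, off_t start, off_t end)
{
    return r->start < end && start < r->end;
}

/* Block until no granted range overlaps [start, end), then grant it. */
void range_lock_acquire(struct range_lock *rl, off_t start, off_t end)
{
    pthread_mutex_lock(&rl->lock);
retry:
    for (struct range *r = rl->held; r; r = r->next)
        if (overlaps(r, start, end)) {
            pthread_cond_wait(&rl->released, &rl->lock);
            goto retry;                  /* rescan after wake-up */
        }
    struct range *nr = malloc(sizeof(*nr));
    nr->start = start; nr->end = end;
    nr->next  = rl->held;
    rl->held  = nr;
    pthread_mutex_unlock(&rl->lock);
}

void range_lock_release(struct range_lock *rl, off_t start, off_t end)
{
    pthread_mutex_lock(&rl->lock);
    for (struct range **pp = &rl->held; *pp; pp = &(*pp)->next)
        if ((*pp)->start == start && (*pp)->end == end) {
            struct range *dead = *pp;
            *pp = dead->next;
            free(dead);
            break;
        }
    pthread_cond_broadcast(&rl->released);
    pthread_mutex_unlock(&rl->lock);
}
```

Writers acquire the lock for their byte range before writing and release it afterward; disjoint ranges are granted concurrently, only overlapping ones block.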

  15. Approach(2): High Latency of fsync Processing – When fsync is called, F2FS has to flush both data and metadata. – Even if only a small portion of the metadata has changed, a whole block has to be flushed, so the latency of fsync is dominated by block I/O latency. – To mitigate the high latency of fsync, we propose NVM Node Logging and a fine-grained inode: instead of slow block I/O to the SSD, with its write amplification, metadata goes to byte-addressable NVM for better latency. 15

  16. Approach(2): Node Logging on NVM – [Diagram: ❶ data is flushed to the data log on the SSD, while ❷ the node is flushed to a node log placed on byte-addressable NVM; the in-memory NAT entry is then updated as before.] 16

  17. Approach(3): Fine-grained inode Structure – The inode in baseline F2FS occupies a full 4 KB block holding data block addresses plus the NIDs of its direct, indirect, and double-indirect nodes. – The proposed fine-grained inode is about 0.4 KB, small enough to be written to byte-addressable NVM without flushing a whole block. An illustrative struct comparison follows. 17
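
For illustration only, a rough C comparison of the two layouts (field names and counts are assumptions, not the exact F2FS on-disk format):

```c
/*
 * Illustrative comparison only; field names and counts are assumptions,
 * not the exact F2FS on-disk layout. The point is the size difference:
 * the baseline inode fills a 4 KB block, while the fine-grained inode
 * is a few hundred bytes that can be logged to byte-addressable NVM.
 */
#include <stdint.h>

struct baseline_inode {                 /* ~4 KB node block             */
    uint8_t  metadata[128];             /* mode, size, timestamps, ...  */
    uint32_t addr[923];                 /* direct data block addresses  */
    uint32_t nid[5];                    /* (double) indirect node ids   */
    uint8_t  pad[4096 - 128 - 923*4 - 5*4];
};

struct fine_grained_inode {             /* ~0.4 KB, byte-addressable    */
    uint8_t  metadata[128];             /* same inode fields            */
    uint32_t addr[64];                  /* a small set of addresses     */
    uint32_t nid[5];                    /* node ids for the rest        */
};
```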

  18. Approach(4): Pin-Point NAT Update – Frequent fsync calls trigger checkpointing in F2FS. – However, F2FS blocks all incoming I/O requests during checkpointing. – To eliminate checkpointing, we propose Pin-Point NAT Update. 18

  19. Approach(4): Pin-Point NAT Update – In Pin-Point NAT Update, we update only the modified NAT entry, directly in NVM, when fsync is called. Therefore, checkpointing is not necessary to persist the entire NAT. – [Diagram: ❶ data flushed to the data log on the SSD, ❷ the node flushed to the node log on NVM, ❸ the corresponding NAT entry updated in place in NVM.] A toy sketch of this path follows. 19
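
A toy, self-contained sketch of the Pin-Point NAT Update fsync path; arrays stand in for the SSD data log and the NVM-resident node log and NAT, and persist() stands in for the cache-line flush and fence primitives a real NVM implementation would use (all names are illustrative, not the paper's actual code):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLK 4096

static uint8_t  ssd_data_log[64][BLK];      /* data log stays on the SSD  */
static uint8_t  nvm_node_log[64 * 512];     /* byte-addressable node log  */
static uint32_t nvm_nat[256];               /* nid -> node log offset     */
static uint32_t data_head, node_head;

static void persist(const void *addr, size_t len)
{
    (void)addr; (void)len;   /* stand-in for cache-line flush + fence */
}

static uint32_t append_data_block(const void *page)
{
    memcpy(ssd_data_log[data_head], page, BLK);
    return data_head++;
}

/* fsync path: ❶ data block to the SSD data log, ❷ only the changed
 * inode bytes to the NVM node log, ❸ pin-point update of one NAT entry. */
static void pinpoint_fsync(uint32_t nid, const void *inode, size_t inode_len,
                           const void *dirty_data)
{
    append_data_block(dirty_data);                            /* ❶ */

    uint32_t node_off = node_head;
    memcpy(&nvm_node_log[node_off], inode, inode_len);        /* ❷ */
    persist(&nvm_node_log[node_off], inode_len);
    node_head += inode_len;

    nvm_nat[nid] = node_off;                                  /* ❸ */
    persist(&nvm_nat[nid], sizeof(nvm_nat[nid]));
    /* The entire NAT never needs to be checkpointed for durability. */
}
```

Compared with the earlier fsync sketch, only the inode bytes that changed go to the node log, and durability of the mapping needs nothing beyond the single NAT entry written in step ❸.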

  20. Approach(4): Pin-Point NAT Update – [Diagram from the previous slide, next animation step: the state after ❶ the data log write, ❷ the NVM node log write, and ❸ the in-place NAT entry update.] 20

  21. Evaluation Setup – Microbenchmark (FxMark): DWOM (shared file write) and DWSL (private file write with fsync). – Test-bed: IBM x3950 X6 manycore server; CPU: Intel Xeon E7-8870 v2 2.3 GHz, 8 CPU nodes (15 cores per node), 120 cores total; RAM: 740 GB; SSD: Intel SSD 750 Series 400 GB (NVMe), read 2200 MB/s, write 900 MB/s; NVM: 32 GB of RAM emulated as a PMEM device; OS: Linux kernel 4.14.11. * FxMark[ATC'16]: Min et al., "Understanding Manycore Scalability of File Systems", USENIX ATC 2016 21

  22. Shared File Write (DWOM Workload) – [Chart: K IOPS vs. # of cores (1 to 120) for baseline, range lock, node logging, and integrated; annotated gains of 15x and 6.8x over the baseline.] – The baseline and node logging lines overlap: node logging does not help at all here, because the DWOM workload issues no fsync calls. 22

  23. Frequent fsync (DWSL Workload) – [Chart: K IOPS vs. # of cores (1 to 120) for baseline, range lock, node logging, and integrated; annotated gain of 1.6x over the baseline.] 23

  24. Conclusion – We identified the performance bottlenecks of F2FS for parallel writes: 1. Serialization of shared file writes on a single file 2. High latency of fsync operations in F2FS 3. High I/O blocking times during checkpointing – To solve these problems, we proposed: 1. File-level Range Lock to allow parallel writes on a shared file 2. NVM Node Logging to provide lower latency for updating file and file system metadata 3. Pin-Point NAT Update to eliminate the I/O blocking times of checkpointing 24

  25. Q&A Thank you! – Contact: Changgyu Lee (changgyu@sogang.ac.kr) Department of Computer Science and Engineering Sogang University, Seoul, Republic of Korea 25
