Ziggurat: A Tiered File System for Non-Volatile Main Memories and Disks
Shengan Zheng†, Morteza Hoseinzadeh§, Steven Swanson§
†Shanghai Jiao Tong University
§University of California, San Diego

Background
• Non-volatile main memory (NVMM)
– Byte-addressability
– Persistence
– Direct access (DAX)
[Figure labels: 3D-XPoint NVDIMM; DRAM + Flash NVDIMM]
• NVMM file systems
– PMFS, SCMFS, NOVA
– EXT4-DAX, XFS-DAX
– Capacity?

Motivation
[Figure: bandwidth (100MB/s to 10GB/s) versus cost ($/GB, 0.01 to 10) for DRAM, NVMM, Optane SSD, NVMe SSD, SATA SSD, and Hard Disk Drive]

Tiered Storage System
• SSD for speed
• HDD for capacity
[Figure: SSD and HDD tiers]

Tiered Storage System
• NVMM for speed
• Disks for capacity
[Figure: NVMM, SSD, and HDD tiers]

Ziggurat Overview
• Intelligent data placement policy
– Send writes to the most suitable tier
– High NVMM space utilization
• Accurate predictors
– Predict the synchronicity of each file (synchronicity predictor)
– Predict the size of future writes to each file (write size predictor)
• Efficient migration mechanism
– Only migrate cold data in cold files
– Migrate file data in groups

Outline
• Motivation
• Data placement policy
• Migration mechanism
• Evaluation
• Conclusion

Data Placement Policy
• Although NVMM is the fastest tier in Ziggurat, file writes should not always go to NVMM.
[Figure: data placement decision tree]
• Synchronicity predictor
– Synchronously-updated file → NVMM
– Asynchronously-updated file → write size predictor
• Write size predictor
– Large writes → Disk
– Small writes → NVMM

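The decision tree on this slide can be written as a small function. This is a minimal sketch, not Ziggurat's implementation: the names `Tier` and `choose_tier` are hypothetical, and the two boolean inputs stand in for the outputs of the synchronicity predictor and the write size predictor.

```python
from enum import Enum


class Tier(Enum):
    """Storage tiers named on the slide (hypothetical enum for illustration)."""
    NVMM = "nvmm"
    DISK = "disk"


def choose_tier(synchronously_updated: bool, large_write: bool) -> Tier:
    """Pick a tier for an incoming write, following the slide's decision tree.

    synchronously_updated: assumed output of the synchronicity predictor.
    large_write: assumed output of the write size predictor.
    """
    if synchronously_updated:
        # Synchronously-updated files go to NVMM regardless of write size.
        return Tier.NVMM
    if large_write:
        # Asynchronously-updated file, large writes: send to disk.
        return Tier.DISK
    # Asynchronously-updated file, small writes: keep in NVMM.
    return Tier.NVMM
```

Only one of the four input combinations (asynchronous and large) leaves NVMM, which matches the slide's point that NVMM is the default but not the universal destination.
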
Synchronicity Predictor
• Predict whether the future accesses are likely to be synchronous
• Write entry: offset, length
• Each write appends a write entry to the file log and adds its length to the "Data blocks written" counter; fsync() resets the counter to 0. The threshold in this example is 4 blocks.
[Figure, animated: two files of 8 blocks (0 to 7), each with its file log and counter]
• First sequence: write(0,2); fsync(); write(2,2); fsync(); write(4,2); fsync();
– File log: 0,2 → 2,2 → 4,2
– Data blocks written: 2 / 4 after each write, back to 0 / 4 after each fsync()
– Result: Synchronous
• Second sequence: write(0,2); write(2,2); write(4,2); fsync();
– File log: 0,2 → 2,2 → 4,2
– Data blocks written: 2 / 4, 4 / 4, 6 / 4, then 0 / 4 after fsync()
– Result: Asynchronous

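The counter behavior in the animation can be sketched as follows. This is an illustrative model only: the class and method names are hypothetical, the initial classification (synchronous) is an assumption, and so is treating a count that exactly reaches the threshold as asynchronous, since the slide does not make that boundary explicit.

```python
class SynchronicityPredictor:
    """Per-file counter of data blocks written between two fsync() calls."""

    def __init__(self, threshold_blocks: int = 4):
        # 4 is the threshold used in the slide's example.
        self.threshold_blocks = threshold_blocks
        self.blocks_since_fsync = 0
        # Assumption: a file starts out classified as synchronously updated.
        self.synchronous = True

    def on_write(self, offset: int, length: int) -> None:
        """Account for a write of `length` blocks (offset is unused here)."""
        self.blocks_since_fsync += length
        if self.blocks_since_fsync >= self.threshold_blocks:
            # Many blocks written without an fsync: asynchronous behavior.
            self.synchronous = False

    def on_fsync(self) -> None:
        """Classify the file from the count so far, then reset the counter."""
        self.synchronous = self.blocks_since_fsync < self.threshold_blocks
        self.blocks_since_fsync = 0


# First sequence from the slide: every write is followed by fsync().
sync_file = SynchronicityPredictor()
for off in (0, 2, 4):
    sync_file.on_write(off, 2)
    sync_file.on_fsync()

# Second sequence from the slide: three writes, then a single fsync().
async_file = SynchronicityPredictor()
for off in (0, 2, 4):
    async_file.on_write(off, 2)
async_file.on_fsync()
```

After these two runs, `sync_file.synchronous` is `True` and `async_file.synchronous` is `False`, matching the labels on the final frame.
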
Write Size Predictor
• Predict whether the incoming writes are both large and stable
• Write entry: offset, length, counter
[Figure, animated: a file of 8 blocks (0 to 7) and its file log]
• Initial file log: 0,4,3  4,4,1  5,1,0  6,1,0
• Incoming writes: write(0,4); write(6,1); write(4,4);
– write(0,4) → 0,4,?: length ≥ 4 (large), predecessor found (stable) → 0,4,4
– write(6,1) → 6,1,?: length < 4 (small), predecessor found (stable) → 6,1,0
– write(4,4) → 4,4,?: length ≥ 4 (large), predecessor not found (unstable)

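The three incoming writes above can be replayed with a small model of the file log. This is a sketch under stated assumptions rather than Ziggurat's code: `WriteEntry`, `FileLog`, `install`, and `write` are hypothetical names; a "predecessor" is assumed to exist only when every block in the written range currently belongs to one and the same log entry; and an unstable write is assumed to get a counter of 0, which the slide does not show.

```python
from dataclasses import dataclass
from typing import Dict, Optional

LARGE_BLOCKS = 4  # "Length ≥ 4" on the slide


@dataclass
class WriteEntry:
    offset: int
    length: int
    counter: int


class FileLog:
    """Maps each block to the most recent write entry that covers it."""

    def __init__(self) -> None:
        self.block_map: Dict[int, WriteEntry] = {}

    def install(self, entry: WriteEntry) -> None:
        """Record an entry as the current owner of its blocks."""
        for block in range(entry.offset, entry.offset + entry.length):
            self.block_map[block] = entry

    def _predecessor(self, offset: int, length: int) -> Optional[WriteEntry]:
        # Assumption: the range has a predecessor only if all of its blocks
        # are owned by one single existing entry.
        owners = [self.block_map.get(b) for b in range(offset, offset + length)]
        first = owners[0]
        if first is not None and all(owner is first for owner in owners):
            return first
        return None

    def write(self, offset: int, length: int) -> WriteEntry:
        """Append a write entry and compute its counter."""
        if length <= 0:
            raise ValueError("length must be positive")
        pred = self._predecessor(offset, length)
        if length >= LARGE_BLOCKS and pred is not None:
            counter = pred.counter + 1  # large and stable: inherit and bump
        else:
            counter = 0  # small or unstable (assumed reset)
        entry = WriteEntry(offset, length, counter)
        self.install(entry)
        return entry


# Seed the log with the slide's initial state, then replay the three writes.
log = FileLog()
for off, length, counter in [(0, 4, 3), (4, 4, 1), (5, 1, 0), (6, 1, 0)]:
    log.install(WriteEntry(off, length, counter))

first = log.write(0, 4)   # large, predecessor 0,4,3 found
second = log.write(6, 1)  # small, predecessor 6,1,0 found
third = log.write(4, 4)   # large, blocks 4 to 7 belong to three entries
```

Under these assumptions `first` is `0,4,4` and `second` is `6,1,0`, as on the slide. The write to blocks 4 to 7 finds no predecessor because the earlier `4,4,1` entry has since been partly overwritten by `5,1,0` and `6,1,0`.
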