Towards Application-level I/O Proportionality with a Weight-aware Page Cache Management
Jonggyu Park*, Kwonje Oh, and Young Ik Eom
Sungkyunkwan University, South Korea
Server Consolidation is Pervasive
• Multiple virtualized instances run on a single host
• They compete for system resources
• Efficient resource scheduling is necessary
[Figure: four containers issuing I/O requests through the operating system to the hardware (SSD, HDD)]
Proportional I/O Sharing by Cgroups
• Cgroups share I/O resources proportionally using I/O weights
• The I/O bandwidth ratio follows the ratio of the I/O weights
[Figure: cgroup tree in which the root (I/O proportion 1.0) has three groups with I/O weights 100, 400, and 500, giving I/O proportions 0.1, 0.4, and 0.5]
Cgroups and the Block Layer
• The blkio subsystem controls I/O resources in collaboration with the block layer
• The I/O scheduler in the block layer uses the I/O weights when scheduling
  • Based on I/O service time (CFQ) or the number of sectors to serve (BFQ)
[Figure: containers, the operating system (cgroups and the block layer with its I/O schedulers such as NOOP, CFQ, and deadline), and the hardware (SSD, HDD)]
The Page Cache
• The page cache is often used to enhance I/O performance
• When possible, it serves I/O requests directly, without delivering them to the block layer
• Cgroups cannot control I/O requests that are served by the page cache
[Figure: I/O requests from the containers are returned by the page cache before they reach the block layer and cgroups]
Buffered I/O vs. Direct I/O
• Direct I/O
  • Proportional I/O sharing according to I/O weight
  • Lower performance due to bypassing the page cache
• Buffered I/O
  • Poor proportionality
  • Better performance due to the page cache
[Figures: I/O bandwidth (MB/s) and normalized I/O bandwidth of direct I/O and buffered I/O versus weight set by cgroups (100, 200, 400, 800), for the fileserver workload and the re-read workload]
The Life of the Page Cache
• Page allocation
  • Allocates a new page for a new page cache entry
  • Qspinlock serializes page allocation
  • Critical to write performance
• Page reclamation
  • Deallocates pages that are not in use in order to secure new pages
  • Reclaims the pages at the tail of the inactive list
  • Decides which pages will reside in the page cache
  • Affects read performance
Qspinlock of Page Allocation
• Qspinlock prevents race conditions
  • Consists of a qspinlock and per-CPU qnodes
  • Allows one CPU to hold the qspinlock while the head node busy-waits
  • After the qspinlock is released, the head node acquires the qspinlock
• FIFO-based holder selection
  • The conventional qspinlock for page allocation selects the next holder in a FIFO manner
  • No consideration of I/O weight
[Figure: An overview of qspinlock. Four applications with weights 100, 200, 400, and 800 run on CPU1 to CPU4; their qnodes busy-wait in the lock waiting queue and acquire the qspinlock in arrival order]
Page Reclamation
• Page cache
  • Maintains 2Q LRU
  • Keeps frequently accessed data in the active list, and other data in the inactive list
  • Reclaims pages at the tail of the inactive list
• Page reclamation
  • Ignores the I/O weight during reclamation
  • Pages used by higher-weighted apps can be evicted earlier
  • No scheme to reflect I/O weight
[Figure: An overview of page reclamation. APP 1 to APP 4 have I/O weights 100, 200, 400, and 800; pages in the inactive list are reclaimed from the tail regardless of which app owns them]
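The two-list behavior described on this slide can be sketched in a few lines. This is a simplified model under stated assumptions, not the kernel's implementation: the class name `TwoListCache`, the promotion rule (a second access moves a page to the active list), and the use of a plain weight value as the owner are all hypothetical choices made for illustration.

```python
from collections import OrderedDict


class TwoListCache:
    """Toy model of an active/inactive page cache (assumed, simplified)."""

    def __init__(self):
        # In both lists the first entry is the tail (oldest) and the
        # last entry is the head (most recently added or used).
        self.active = OrderedDict()
        self.inactive = OrderedDict()

    def access(self, page_id, owner_weight):
        if page_id in self.active:
            # Already hot: refresh its position in the active list.
            self.active.move_to_end(page_id)
        elif page_id in self.inactive:
            # Second access: promote from the inactive to the active list.
            self.active[page_id] = self.inactive.pop(page_id)
        else:
            # First access: new pages start in the inactive list.
            self.inactive[page_id] = owner_weight

    def reclaim(self):
        """Evict the tail of the inactive list; the weight is never consulted."""
        if not self.inactive:
            return None
        return self.inactive.popitem(last=False)


cache = TwoListCache()
cache.access("p1", 800)  # page of a high-weight app
cache.access("p2", 100)
cache.access("p3", 100)
cache.access("p2", 100)  # p2 is promoted to the active list
victim = cache.reclaim()  # p1 is evicted first despite its owner's high weight
```

In this model the page of the weight-800 application is the first victim simply because it is the oldest entry in the inactive list, which is the behavior the slide identifies as the problem.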
Justitia
Problem #1: Cgroups focus on block-level I/O proportionality
Problem #2: Page allocation and reclamation do not reflect I/O weight
Weight-awareness!
Justitia: a new page cache management scheme for application-level I/O proportionality
A. Weight-aware Qspinlock for Page Cache Allocation
B. Weight-aware Page Reclamation
Weight-aware Qspinlock for Page Cache Allocation
• Weight-aware qspinlock
  • Stores the weight in the qnode
  • Reflects I/O weight by the following procedure
    1. The qspinlock is released
    2. Justitia iterates over the lock waiting queue to find the qnode (maxNode) with the highest I/O weight
    3. It moves the maxNode next to the head node
    4. Next time, when the head node acquires the qspinlock, the maxNode becomes the head node
In short, Justitia reorders the lock waiting queue based on I/O weight
[Figure: An overview of weight-aware qspinlock. Each qnode in the lock waiting queue carries a weight, and the waiting qnode with the highest weight is moved directly behind the head node]
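The four-step procedure above can be sketched as a user-space simulation. This is only an illustration under stated assumptions: the waiting queue is modeled as a Python list whose first element is the head node, and the names `QNode`, `reorder_waiters`, `release_lock`, and `acquisition_order` are hypothetical, not taken from the kernel or from Justitia's source.

```python
from dataclasses import dataclass


@dataclass
class QNode:
    cpu: str
    weight: int  # I/O weight stored in the qnode


def reorder_waiters(queue):
    """Move the highest-weight waiter directly behind the head node."""
    if len(queue) < 3:
        return  # fewer than two waiters behind the head: nothing to reorder
    # Search only the nodes behind the head (index 0); ties keep FIFO order.
    max_idx = max(range(1, len(queue)), key=lambda i: queue[i].weight)
    queue.insert(1, queue.pop(max_idx))


def release_lock(queue):
    """On release: reorder the waiters, then let the head node take the lock."""
    reorder_waiters(queue)
    return queue.pop(0) if queue else None


def acquisition_order(queue):
    """Return the order in which the waiting CPUs acquire the lock."""
    order = []
    while queue:
        order.append(release_lock(queue).cpu)
    return order


waiting = [QNode("CPU1", 100), QNode("CPU2", 200),
           QNode("CPU3", 400), QNode("CPU4", 800)]
print(acquisition_order(waiting))  # ['CPU1', 'CPU4', 'CPU3', 'CPU2']
```

The current head (CPU1) still acquires the lock first, as in step 4; only the nodes behind it are reordered, so the remaining acquisitions follow descending weight instead of arrival order.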
Preventing Starvation
• How about the starvation problem?
  • When there are many high-weighted apps, the low-weighted apps can starve
• We adopt an aging technique to prevent the starvation problem
  • Whenever reordering occurs, Justitia increases the I/O weight of the qnodes in the lock waiting queue
  • Justitia thus considers not only the I/O weight but also the waiting time
[Figure: the weights of the qnodes in the lock waiting queue before and after aging]
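A minimal sketch of the aging idea follows, under stated assumptions. The slide does not say by how much the weights are increased, so `AGING_STEP = 100` is a hypothetical value; the arrival pattern (one new weight-1000 waiter per lock release) and the names `QNode`, `reorder_with_aging`, and `rounds_until_served` are likewise made up for illustration.

```python
from dataclasses import dataclass

AGING_STEP = 100  # hypothetical increment; the slides do not give the real value


@dataclass
class QNode:
    cpu: str
    weight: int  # effective weight, which grows while the node waits


def reorder_with_aging(queue, step=AGING_STEP):
    """Reorder as in the weight-aware qspinlock, then age the waiters."""
    if len(queue) < 3:
        return  # no reordering happens, so no aging either
    max_idx = max(range(1, len(queue)), key=lambda i: queue[i].weight)
    queue.insert(1, queue.pop(max_idx))
    for node in queue[1:]:
        node.weight += step  # waiting time is folded into the weight


def rounds_until_served(step, max_rounds=50):
    """Count lock releases until the low-weight waiter gets the lock."""
    queue = [QNode("head", 1000), QNode("low", 100), QNode("hi0", 1000)]
    for rnd in range(1, max_rounds + 1):
        reorder_with_aging(queue, step)
        served = queue.pop(0)
        if served.cpu == "low":
            return rnd
        # Assumed workload: a new high-weight waiter arrives every round.
        queue.append(QNode(f"hi{rnd}", 1000))
    return None  # starved for the whole simulation


starved = rounds_until_served(step=0)            # no aging
with_aging = rounds_until_served(step=AGING_STEP)
```

Without aging the low-weight node is overtaken in every round of this workload, so `starved` is `None`; with aging its effective weight eventually reaches that of the newcomers and it is served after a bounded number of rounds.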
Weight-aware Page Reclamation
Justitia imposes weight-awareness by the following procedures:
• Calculating the I/O proportion of each application
• Recording page ownership information on the page structure
• Page reclamation considering the I/O proportion
Weight-aware Page Reclamation
• Calculating the I/O proportion of each application
  • New variables are added to cgroups
    • proportion: the proportion of I/O weight (weight / total weight)
    • nrp_pages: the number of pages in the page cache that this cgroup is currently using
[Figure: cgroup nodes of APP 1 to APP 4 with I/O weights 100, 200, 400, and 800 and proportions 0.07, 0.13, 0.27, and 0.53; for example, the proportion of APP 1 is 100 / (100 + 200 + 400 + 800)]
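The proportion formula (weight / total weight) can be checked against the figure's numbers with a short sketch. The function name `io_proportions` and the dictionary layout are hypothetical; only the weights and the formula come from the slide.

```python
def io_proportions(weights):
    """Return each application's share of the total I/O weight."""
    total = sum(weights.values())
    return {app: weight / total for app, weight in weights.items()}


weights = {"APP 1": 100, "APP 2": 200, "APP 3": 400, "APP 4": 800}
proportions = io_proportions(weights)
rounded = [round(p, 2) for p in proportions.values()]
print(rounded)  # [0.07, 0.13, 0.27, 0.53]
```

The rounded values match the proportions shown on the slide for weights 100, 200, 400, and 800.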
Weight-aware Page Reclamation
• Recording page ownership information on the page structure
  • New variables in the page structure
    • I/O weight
    • Pointer to the corresponding cgroups node
[Figure: on page allocation, the new page of APP 1 records weight 100 and a pointer to the corresponding cgroups node, and the nrp_pages of that node goes from 0 to 1]
Weight-aware Page Reclamation
• Page reclamation considering the I/O proportion
  • Justitia reclaims pages whose cgroup holds more pages than its threshold
  • Threshold = proportion × the total number of pages in the page cache
[Figure: An overview of weight-aware page reclamation. The cgroup nodes of APP 1 to APP 4 (I/O weights 100, 200, 400, 800; proportions 0.07, 0.13, 0.27, 0.53) hold 1, 6, 1, and 2 pages; a page of APP 2 is reclaimed from the inactive list, so its nrp_pages goes from 6 to 5]
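The threshold rule can be sketched as follows, using the numbers from the figure. Several details are assumptions made for this illustration and are not stated on the slides: the inactive list is scanned from its tail, the total number of cached pages is taken as the sum of all `nrp_pages`, and the plain tail page is reclaimed when no cgroup exceeds its threshold. The names `Cgroup`, `Page`, `pick_victim`, and `reclaim_one` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cgroup:
    name: str
    weight: int
    nrp_pages: int = 0
    proportion: float = 0.0


@dataclass
class Page:
    owner: Cgroup  # models the pointer to the corresponding cgroups node


def pick_victim(inactive, cgroups):
    """Index of the page closest to the tail whose owner is over threshold."""
    total_pages = sum(cg.nrp_pages for cg in cgroups)  # assumed definition
    for i in range(len(inactive) - 1, -1, -1):  # the tail is the last element
        owner = inactive[i].owner
        if owner.nrp_pages > owner.proportion * total_pages:
            return i
    return len(inactive) - 1  # assumed fallback: plain tail eviction


def reclaim_one(inactive, cgroups):
    if not inactive:
        return None
    victim = inactive.pop(pick_victim(inactive, cgroups))
    victim.owner.nrp_pages -= 1
    return victim


apps = [Cgroup("APP 1", 100, 1), Cgroup("APP 2", 200, 6),
        Cgroup("APP 3", 400, 1), Cgroup("APP 4", 800, 2)]
total_weight = sum(cg.weight for cg in apps)
for cg in apps:
    cg.proportion = cg.weight / total_weight

# Inactive list owned by apps 1, 3, 2, 4, 2 (the last entry is the tail).
inactive = [Page(apps[i]) for i in (0, 2, 1, 3, 1)]
first = reclaim_one(inactive, apps)   # APP 2 is over threshold: 6 -> 5
second = reclaim_one(inactive, apps)  # the APP 4 page at the tail is skipped
```

The second call shows the effect of weight-awareness: the tail page belongs to APP 4, which is below its threshold, so the scan moves on and reclaims another page of APP 2 instead.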
Experimental Setup
• CPU: Intel i7-6700
• Memory: 16GB DRAM
• Storage: SATA SSD 256GB
• Benchmarks: FIO (re-read) and Filebench (fileserver)
  • Read → Dummy write → Read
  * All applications were containerized by Docker
• Metric: Proportionality Variation (PV), introduced in [1], which quantitatively measures I/O proportionality
Ref [1] J. Kim et al., “I/O scheduling schemes for better I/O proportionality on flash-based SSDs”
Evaluation (Fileserver)
[Figure: normalized I/O bandwidth versus weight set by cgroups (100 to 800) for Ideal, Conventional, Justitia, and CPM]
• Compared with the conventional scheme, Justitia achieves better I/O proportionality
  • Conventional: 1 : 1.51 : 2.02 : 2.40 : 2.63 : 2.71 : 3.07 : 3.31
  • Justitia: 1 : 1.73 : 2.24 : 2.65 : 3.04 : 3.75 : 4.37 : 6.26
Evaluation (Aging Technique)
[Figure: normalized I/O bandwidth of containers C1 to C8 for Ideal, Justitia w/o Aging, and Justitia]
Extreme case where C1’s weight is 100 and the weights of C2-C8 are 1000
• Justitia without aging: 1 : 12.57 : 13.31 : 11.72 : 12.443 : 13.31 : 12.77 : 13.35 (PV: 2.31)
• Justitia: 1 : 8.94 : 9.36 : 9.08 : 8.83 : 9.49 : 9.77 : 9.43 (PV: 0.64)
Evaluation (Re-read)
[Figure: normalized I/O bandwidth versus weight set by cgroups (100, 200, 400, 800) for Ideal, Conv, Justitia, CPM, and Direct]
• Justitia achieves better I/O proportionality than the other cases
  • PV of Conventional: 1.4
  • PV of Justitia: 0.33
  • PV of Direct I/O: 0.61
Conclusion
• Cgroups support only block-level I/O proportionality, not application-level I/O proportionality
• Conventional page cache management does not consider I/O weight in either page allocation or page reclamation
• Justitia: a new page cache management scheme for application-level I/O proportionality
  • Weight-aware qspinlock for page allocation
  • Weight-aware page reclamation
• Justitia is available at github.com/kzeoh/Justitia.git
Thank you! Any questions? Feel free to contact jonggyu@skku.edu