Enlightening the I/O Path: A Holistic Approach for Application Performance
Jinkyu Jeong, Sungkyunkwan University
Appeared in FAST '17
Data-Intensive Applications
• Relational, document, key-value, column, search
Data-Intensive Applications
• Common structure: a client sends requests to the application; server threads (T1-T4) issue I/Os through the operating system to the storage device and send responses back
• Example: MongoDB
  - Client-serving threads (foreground)
  - Checkpointer, log writer, eviction worker, … (background)
• Background tasks are problematic for application performance
Application Impact
• Illustrative experiment: YCSB update-heavy workload against MongoDB
Application Impact
• Under CFQ, the regular checkpoint task causes periodic throughput collapses and 30-second latency at the 99.99th percentile
[Chart: operation throughput (ops/sec) over 1,800 seconds of elapsed time under CFQ]
Application Impact
• I/O priority does not help: CFQ-IDLE shows the same throughput collapses as CFQ
[Chart: operation throughput (ops/sec) over 1,800 seconds for CFQ and CFQ-IDLE]
Application Impact
• State-of-the-art schedulers do not help much either
[Chart: operation throughput (ops/sec) over 1,800 seconds for CFQ, CFQ-IDLE, SPLIT-A, SPLIT-D, and QASIO]
What’s the Problem?
• Independent policies in multiple layers: each layer processes I/Os with limited information
• I/O priority inversion: background I/Os can arbitrarily delay foreground tasks
Multiple Independent Layers
• Each layer processes I/Os independently
[Diagram: read()/write() calls pass from the application through the caching layer (buffer cache), the file system layer, and the block layer to the storage device; foreground (FG) and background (BG) I/Os are reordered along the way]
I/O Priority Inversion
• Task dependency: a foreground task waits on a lock or condition variable held by a background task
[Diagram: task dependencies span the caching, file system, and block layers]
I/O Priority Inversion
• I/O dependency: a foreground task waits on an outstanding background I/O in the block layer
Our Approach
• Request-centric I/O prioritization (RCP)
  - Critical I/O: I/O in the critical path of request handling
  - Policy: holistically prioritize critical I/Os along the I/O path
• With RCP, latency at the 99.99th percentile drops to 100 ms
[Chart: operation throughput (ops/sec) over 1,800 seconds for CFQ, CFQ-IDLE, SPLIT-A, SPLIT-D, QASIO, and RCP]
Challenges
• How to accurately identify I/O criticality
• How to effectively enforce I/O criticality
Critical I/O Detection
• Enlightenment API: interface for tagging foreground tasks
• I/O priority inheritance
  - Handling task dependency
  - Handling I/O dependency
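The tagging idea above can be sketched in user-space C. This is a minimal illustrative model, not the paper's actual kernel interface: the type and function names (`set_foreground`, `submit_bio`) are assumptions. The application tags its request-handling threads as foreground, and every I/O such a thread submits is marked critical at submission time.

```c
#include <assert.h>
#include <stdbool.h>

struct task {
    bool foreground;            /* set through the enlightenment API */
};

struct bio {
    bool critical;              /* criticality travels with the I/O */
};

/* Tag (or untag) a task as foreground. */
static void set_foreground(struct task *t, bool fg)
{
    t->foreground = fg;
}

/* On submission, an I/O inherits the criticality of the submitting task. */
static void submit_bio(struct task *submitter, struct bio *b)
{
    b->critical = submitter->foreground;
}
```

Keeping the tag on the task rather than on each call site is what lets the rest of the stack classify I/Os without application changes beyond the initial tagging.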
I/O Priority Inheritance
• Handling task dependency
  - Locks: when a foreground task blocks on a lock held by a background task, the holder inherits the foreground I/O priority, submits its I/Os as critical, and reverts on unlock
  - Condition variables: a waiting foreground task registers with the condition variable; the background task that will signal it inherits the priority until the wakeup completes
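The lock case can be sketched as follows. This is an illustrative simplification under assumed names (`pi_lock_acquire` returns instead of sleeping, where a real kernel would block until the lock is free): on contention by a critical task, the holder's effective priority is boosted, and it reverts to its base priority on release.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum io_prio { PRIO_NONCRITICAL, PRIO_CRITICAL };

struct pi_task {
    enum io_prio base;          /* the task's own priority */
    enum io_prio effective;     /* possibly boosted by inheritance */
};

struct pi_lock {
    struct pi_task *holder;     /* NULL when free */
};

/* Try to take the lock; on contention, boost the holder (a real
 * implementation would then sleep until the lock is released). */
static bool pi_lock_acquire(struct pi_lock *lk, struct pi_task *t)
{
    if (lk->holder) {
        if (t->effective == PRIO_CRITICAL)
            lk->holder->effective = PRIO_CRITICAL;  /* inherit */
        return false;           /* caller would block here */
    }
    lk->holder = t;
    return true;
}

static void pi_lock_release(struct pi_lock *lk)
{
    lk->holder->effective = lk->holder->base;  /* revert on unlock */
    lk->holder = NULL;
}
```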
I/O Priority Inheritance
• Handling I/O dependency: non-critical I/O tracking
  - Per-device descriptors record outstanding non-critical I/Os (NCIOs) at the block-layer admission and scheduler queueing stages
  - A location resolver maps a sector number to the blocking NCIO so it can be promoted
  - Descriptors are deleted on I/O completion
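The NCIO bookkeeping above can be sketched like this. The fixed-size array and the function names (`ncio_record`, `ncio_resolve`, `ncio_promote`) are stand-ins for the paper's per-device structure, assumed for illustration: record at admission, resolve by sector when a critical I/O collides with an outstanding non-critical one, promote it, and delete on completion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NCIO_MAX 64

struct ncio {
    bool in_use;
    unsigned long sector;
    bool promoted;              /* raised to critical on dependency */
};

static struct ncio ncio_table[NCIO_MAX];

/* Record an outstanding non-critical I/O at the admission stage. */
static struct ncio *ncio_record(unsigned long sector)
{
    for (int i = 0; i < NCIO_MAX; i++) {
        if (!ncio_table[i].in_use) {
            ncio_table[i].in_use = true;
            ncio_table[i].sector = sector;
            ncio_table[i].promoted = false;
            return &ncio_table[i];
        }
    }
    return NULL;                /* table full: fall back in a real system */
}

/* Location resolver: map a sector to the blocking NCIO, if any. */
static struct ncio *ncio_resolve(unsigned long sector)
{
    for (int i = 0; i < NCIO_MAX; i++)
        if (ncio_table[i].in_use && ncio_table[i].sector == sector)
            return &ncio_table[i];
    return NULL;
}

/* Promote the NCIO a critical I/O is waiting on. */
static bool ncio_promote(unsigned long sector)
{
    struct ncio *n = ncio_resolve(sector);
    if (!n)
        return false;
    n->promoted = true;
    return true;
}

/* Delete on completion. */
static void ncio_complete(struct ncio *n)
{
    n->in_use = false;
}
```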
Handling Transitive Dependency
• Possible states of a dependent task: blocked on another task, blocked on an I/O, or blocked at the admission stage
Handling Transitive Dependency
• Recording blocking status: each task records the task or I/O it is currently blocked on, so inheritance can be re-applied (re-prioritized and retried) along the chain until it reaches the end
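The chain walk can be sketched as below. The names (`boost_chain`, `dep_task`) are illustrative assumptions: boosting a dependency follows each task's recorded blocking status, marking every task on the chain critical and promoting the I/O if the chain ends at one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum blocked_on { BLK_NONE, BLK_TASK, BLK_IO };

struct dep_io {
    bool critical;
};

struct dep_task {
    bool critical;
    enum blocked_on state;      /* recorded blocking status */
    struct dep_task *on_task;   /* valid when state == BLK_TASK */
    struct dep_io *on_io;       /* valid when state == BLK_IO */
};

/* Boost a dependency chain: inherit criticality transitively, and
 * promote the I/O if the chain ends at one. */
static void boost_chain(struct dep_task *t)
{
    while (t) {
        t->critical = true;
        if (t->state == BLK_IO) {
            t->on_io->critical = true;
            return;
        }
        t = (t->state == BLK_TASK) ? t->on_task : NULL;
    }
}
```

Recording the blocking status up front is what makes this walk possible; without it, the booster could not tell where a blocked background task's priority should flow next.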
Challenges
• How to accurately identify I/O criticality
  - Enlightenment API
  - I/O priority inheritance
  - Recording blocking status
• How to effectively enforce I/O criticality
Criticality-Aware I/O Prioritization
• Caching layer
  - Apply a low dirty ratio for non-critical writes (1% by default)
• Block layer
  - Isolate allocation of block queue slots
  - Maintain two FIFO queues and schedule critical I/Os first
  - Limit the number of outstanding non-critical I/Os (1 by default)
  - Support queue promotion to resolve I/O dependency
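The block-layer policy above can be sketched as a two-queue dispatcher. The queue capacity, integer request ids, and function names are simplifying assumptions: critical I/Os always dispatch first, and at most one non-critical I/O (the paper's default) is allowed in flight at a time.

```c
#include <assert.h>
#include <stdbool.h>

#define QCAP 32
#define NC_LIMIT 1              /* max outstanding non-critical I/Os */

struct fifo {
    int req[QCAP];
    int head, tail;
};

static bool fifo_empty(const struct fifo *q) { return q->head == q->tail; }
static void fifo_push(struct fifo *q, int r)  { q->req[q->tail++] = r; }
static int  fifo_pop(struct fifo *q)          { return q->req[q->head++]; }

static struct fifo critq, ncq;  /* the two FIFO queues */
static int nc_outstanding;

/* Pick the next request to send to the device; -1 means nothing eligible. */
static int dispatch(void)
{
    if (!fifo_empty(&critq))
        return fifo_pop(&critq);            /* critical I/O first */
    if (!fifo_empty(&ncq) && nc_outstanding < NC_LIMIT) {
        nc_outstanding++;
        return fifo_pop(&ncq);
    }
    return -1;                              /* throttled or idle */
}

static void nc_complete(void)
{
    nc_outstanding--;
}
```

Capping outstanding non-critical I/Os at one keeps the device queue shallow, so a newly arriving critical I/O waits behind at most one background request.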
Evaluation
• Implementation on Linux 3.13 with ext4
• Application studies
  - PostgreSQL relational database: backend processes as foreground tasks; I/O priority inheritance on LWLocks (semop)
  - MongoDB document store: client threads as foreground tasks; I/O priority inheritance on Pthread mutexes and condition variables (futex)
  - Redis key-value store: master process as foreground task
Evaluation
• Experimental setup
  - Two Dell PowerEdge R530 machines (server and client)
  - 1 TB Micron MX200 SSD
• I/O prioritization schemes compared
  - CFQ (default), CFQ-IDLE
  - SPLIT-A (priority), SPLIT-D (deadline) [SOSP '15]
  - QASIO [FAST '15]
  - RCP
Application Throughput
• PostgreSQL with TPC-C workload: RCP improves transaction throughput by 28%-37% across dataset sizes
[Chart: transaction throughput (trx/sec) for CFQ, CFQ-IDLE, SPLIT-A, SPLIT-D, QASIO, and RCP on 10 GB, 60 GB, and 200 GB datasets]
Application Throughput
• Impact on the background task: our scheme improves application throughput without penalizing background tasks
[Chart: transaction log size (GB) over 1,800 seconds of elapsed time for each scheme]
Application Latency
• PostgreSQL with TPC-C workload: our scheme is effective at improving tail latency
• Existing schemes exceed 2 seconds at the 99.9th percentile, while RCP stays at 300 ms even at the 99.999th percentile
[Chart: CCDF P[X >= x] of transaction latency (msec), from the 0th to the 99.999th percentile]
Summary of Other Results
• Performance results
  - MongoDB: 12%-201% higher throughput, 5x-20x lower latency at the 99.9th percentile
  - Redis: 7%-49% higher throughput, 2x-20x lower latency at the 99.9th percentile
• Analysis results
  - System latency analysis using LatencyTOP
  - System throughput vs. application latency
  - The need for a holistic approach
Conclusions
• Key observation: all layers in the I/O path should be considered as a whole, with I/O priority inversion in mind, for effective I/O prioritization
• Request-centric I/O prioritization
  - Enlightens the I/O path solely for application performance
  - Improves throughput and latency of real applications
• Ongoing work
  - Making the implementation practical
  - Applying RCP to a database cluster with multiple replicas
Thank You!
• Questions and comments