Enlightening the I/O Path: A Holistic Approach for Application Performance
FAST ’17 · Jinkyu Jeong, Sungkyunkwan University


  1. Enlightening the I/O Path: A Holistic Approach for Application Performance (FAST ’17). Jinkyu Jeong, Sungkyunkwan University.

  2. Data-Intensive Applications: relational, document, key-value, column, and search stores.

  3. Data-Intensive Applications
     • Common structure (example: MongoDB): a client sends requests to the application and receives responses.
     • Client-serving (foreground) threads T1-T4 issue I/Os through the operating system to the storage device; application performance depends on them.
     • Background tasks (checkpointer, log writer, eviction worker, ...) issue I/Os along the same path.

  4. Data-Intensive Applications
     • Same structure, with the problem highlighted: background tasks (checkpointer, log writer, eviction worker, ...) share the I/O path with the client-serving foreground threads and are problematic for application performance.

  5. Application Impact • Illustrative experiment: YCSB update-heavy workload against MongoDB.

  6. Application Impact • Same experiment under the default CFQ scheduler.
     [Figure: operation throughput (ops/sec) over elapsed time (sec); throughput collapses whenever the regular checkpoint task runs, and latency at the 99.99th percentile reaches 30 seconds.]

  7. Application Impact • Assigning idle I/O priority to background tasks (CFQ-IDLE) does not help.
     [Figure: operation throughput (ops/sec) over elapsed time (sec) for CFQ and CFQ-IDLE; the periodic collapses remain.]

  8. Application Impact • State-of-the-art schedulers do not help much.
     [Figure: operation throughput (ops/sec) over elapsed time (sec) for CFQ, CFQ-IDLE, SPLIT-A, SPLIT-D, and QASIO; all show the periodic collapses.]

  9. What’s the Problem?
     • Independent policies in multiple layers: each layer processes I/Os with limited information.
     • I/O priority inversion: background I/Os can arbitrarily delay foreground tasks.

  10. What’s the Problem? (build slide repeating the outline; the next slide expands on the first issue, independent layers)

  11. Multiple Independent Layers • Independent I/O processing.
     [Diagram: read()/write() calls from the application pass through the caching layer (buffer cache), the file system layer (abstraction), and the block layer, which reorders foreground (FG) and background (BG) I/Os before they reach the storage device.]

  12. What’s the Problem? (build slide repeating the outline; the next slides expand on the second issue, I/O priority inversion)

  13. I/O Priority Inversion
     • Task dependency: a foreground task can block on locks or condition variables held or to be signaled by background tasks anywhere along the I/O path (caching, file system, and block layers).

  14. I/O Priority Inversion
     • I/O dependency: a foreground task can block waiting for outstanding I/Os that were already submitted with non-critical priority.

  15. Our Approach
     • Request-centric I/O prioritization (RCP).
     • Critical I/O: I/O in the critical path of request handling.
     • Policy: holistically prioritize critical I/Os along the I/O path.
     [Figure: operation throughput (ops/sec) over elapsed time (sec) for CFQ, CFQ-IDLE, SPLIT-A, SPLIT-D, QASIO, and RCP; RCP sustains throughput with 100 ms latency at the 99.99th percentile.]

  16. Challenges • How to accurately identify I/O criticality • How to effectively enforce I/O criticality

  17. Critical I/O Detection
     • Enlightenment API: an interface for tagging foreground tasks.
     • I/O priority inheritance: handles task dependency and I/O dependency.

  18. I/O Priority Inheritance • Handling task dependency.
     • Locks: when a foreground task blocks on a lock held by a background task, the holder inherits critical I/O priority; its I/O is submitted and completed at the inherited priority, and the boost is reverted on unlock.
     • Condition variables: the foreground task registers with the condition variable before waiting; the background task that will signal it inherits critical priority until it wakes the waiter.
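The lock case above can be sketched as a toy model (plain Python, not kernel code; the class and method names are invented for illustration):

```python
class Task:
    def __init__(self, name, critical=False):
        self.name = name
        self.critical = critical          # True for foreground tasks
        self.inherited = False            # temporarily boosted?

    def io_priority(self):
        # I/Os are submitted as critical if the task is foreground
        # or is currently inheriting criticality from a blocked waiter.
        return "critical" if (self.critical or self.inherited) else "non-critical"

class InheritingLock:
    def __init__(self):
        self.holder = None

    def acquire(self, task):
        if self.holder is None:
            self.holder = task
        elif task.critical:
            # A foreground waiter blocks: the background holder
            # inherits critical priority until it releases the lock.
            self.holder.inherited = True

    def release(self):
        self.holder.inherited = False     # revert the boost on unlock
        self.holder = None

fg = Task("foreground", critical=True)
bg = Task("background")
lock = InheritingLock()
lock.acquire(bg)          # background holds the lock
lock.acquire(fg)          # foreground blocks -> holder inherits
print(bg.io_priority())   # -> critical
lock.release()
print(bg.io_priority())   # -> non-critical
```

The condition-variable case differs only in who gets boosted: the registered signaler inherits instead of the lock holder.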

  19. I/O Priority Inheritance • Handling I/O dependency.
     • Non-critical I/O tracking: the block layer tracks non-critical I/Os (NCIOs) in per-device structures at both the admission stage and the scheduler queueing stage.
     • When a foreground task blocks on such an I/O, a location resolver finds the NCIO descriptor by sector number so its priority can be raised; the descriptor is deleted on I/O completion.
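A minimal model of that bookkeeping (illustrative Python; the kernel keeps per-device structures, this sketch just uses a dict keyed by sector number):

```python
class NCIOTracker:
    """Tracks outstanding non-critical I/Os (NCIOs) for one device."""
    def __init__(self):
        self.by_sector = {}               # sector number -> NCIO descriptor

    def submit(self, sector):
        # Admission/queueing stage: remember each non-critical I/O.
        self.by_sector[sector] = {"sector": sector, "priority": "non-critical"}

    def resolve_and_promote(self, sector):
        # A foreground task blocked on this I/O: look up the descriptor
        # by sector (the "location resolver") and raise its priority.
        desc = self.by_sector.get(sector)
        if desc is not None:
            desc["priority"] = "critical"
        return desc

    def complete(self, sector):
        # Delete the descriptor on I/O completion.
        self.by_sector.pop(sector, None)

dev = NCIOTracker()
dev.submit(4096)
dev.submit(8192)
dev.resolve_and_promote(8192)
print(dev.by_sector[8192]["priority"])   # -> critical
dev.complete(8192)
print(8192 in dev.by_sector)             # -> False
```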

  20. Handling Transitive Dependency
     • Possible states of a dependent task: it may itself be blocked on another task, blocked on an outstanding I/O, or blocked at the admission stage, so inheritance must propagate along the chain (FG → BG → BG → I/O).

  21. Handling Transitive Dependency
     • Recording blocking status: each task records what it is blocked on (the blocking task or I/O is recorded).
     • On inheritance, the recorded chain is walked: blocked tasks inherit in turn, a blocked I/O is reprioritized, and a blocked admission is retried.
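Transitive propagation can be modeled as walking the recorded blocking chain (again a toy Python sketch; the names are made up for illustration):

```python
class BlockedTask:
    def __init__(self, name):
        self.name = name
        self.critical = False
        self.blocked_on = None            # recorded blocking status:
                                          # another task, an I/O dict, or None

def boost(task):
    # Inherit criticality down the dependency chain: each blocked-on
    # task is boosted in turn; a blocked-on I/O is reprioritized.
    while task is not None:
        if isinstance(task, dict):        # an outstanding I/O descriptor
            task["priority"] = "critical"
            return
        task.critical = True
        task = task.blocked_on

io = {"priority": "non-critical"}
bg2 = BlockedTask("bg2"); bg2.blocked_on = io    # bg2 waits on a non-critical I/O
bg1 = BlockedTask("bg1"); bg1.blocked_on = bg2   # bg1 waits on bg2
boost(bg1)                                       # a foreground task blocks on bg1
print(bg1.critical, bg2.critical, io["priority"])  # -> True True critical
```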

  22. Challenges
     • How to accurately identify I/O criticality: enlightenment API, I/O priority inheritance, recording blocking status.
     • How to effectively enforce I/O criticality.

  23. Criticality-Aware I/O Prioritization
     • Caching layer: apply a low dirty ratio for non-critical writes (1% by default).
     • Block layer: isolate allocation of block queue slots; maintain 2 FIFO queues and schedule critical I/Os first; limit the number of outstanding non-critical I/Os (1 by default); support queue promotion to resolve I/O dependency.
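The block-layer policy on this slide (two FIFO queues, critical first, a cap of one outstanding non-critical I/O, plus promotion) might be sketched like this (illustrative Python, not the actual kernel scheduler):

```python
from collections import deque

class TwoQueueScheduler:
    NONCRIT_LIMIT = 1                     # max outstanding non-critical I/Os

    def __init__(self):
        self.critical = deque()
        self.noncritical = deque()
        self.outstanding_noncrit = 0

    def enqueue(self, io, critical):
        (self.critical if critical else self.noncritical).append(io)

    def promote(self, io):
        # Resolve an I/O dependency: move a queued non-critical I/O
        # to the critical queue so a blocked foreground task can proceed.
        self.noncritical.remove(io)
        self.critical.append(io)

    def dispatch(self):
        # Critical I/Os always go first; non-critical I/Os are throttled.
        if self.critical:
            return self.critical.popleft()
        if self.noncritical and self.outstanding_noncrit < self.NONCRIT_LIMIT:
            self.outstanding_noncrit += 1
            return self.noncritical.popleft()
        return None                       # non-critical limit reached

    def complete_noncrit(self):
        self.outstanding_noncrit -= 1

s = TwoQueueScheduler()
s.enqueue("bg-1", critical=False)
s.enqueue("fg-1", critical=True)
s.enqueue("bg-2", critical=False)
print(s.dispatch())   # -> fg-1 (critical first)
print(s.dispatch())   # -> bg-1 (one non-critical allowed)
print(s.dispatch())   # -> None (non-critical limit reached)
```

Keeping the non-critical queue depth at 1 bounds how long a newly arrived critical I/O can wait behind work already sent to the device.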

  24. Evaluation
     • Implementation on Linux 3.13 with ext4.
     • Application studies:
       • PostgreSQL relational database: backend processes as foreground tasks; I/O priority inheritance on LWLocks (semop).
       • MongoDB document store: client threads as foreground tasks; I/O priority inheritance on pthread mutexes and condition variables (futex).
       • Redis key-value store: master process as foreground task.

  25. Evaluation
     • Experimental setup: two Dell PowerEdge R530 machines (server and client); 1 TB Micron MX200 SSD.
     • I/O prioritization schemes: CFQ (default), CFQ-IDLE, SPLIT-A (priority) and SPLIT-D (deadline) [SOSP ’15], QASIO [FAST ’15], RCP.

  26. Application Throughput • PostgreSQL with TPC-C workload.
     [Figure: transaction throughput (trx/sec) under CFQ, CFQ-IDLE, SPLIT-A, SPLIT-D, QASIO, and RCP for 10 GB, 60 GB, and 200 GB datasets; RCP improves throughput by 28-37% across the datasets.]

  27. Application Throughput • Impact on the background task: our scheme improves application throughput without penalizing background tasks.
     [Figure: transaction log size (GB) over elapsed time (sec) for all six schemes.]

  28. Application Latency • PostgreSQL with TPC-C workload: our scheme is effective for improving tail latency.
     [Figure: CCDF P[X >= x] of transaction latency (msec) from the 0th to the 99.999th percentile; the baseline schemes exceed 2 sec at the 99.9th percentile, while RCP stays around 300 msec even at the 99.999th.]

  29. Summary of Other Results
     • Performance results:
       • MongoDB: 12%-201% higher throughput; 5x-20x better latency at the 99.9th percentile.
       • Redis: 7%-49% higher throughput; 2x-20x better latency at the 99.9th percentile.
     • Analysis results: system latency analysis using LatencyTOP; system throughput vs. application latency; the need for a holistic approach.

  30. Conclusions
     • Key observation: all the layers in the I/O path should be considered as a whole, with I/O priority inversion in mind, for effective I/O prioritization.
     • Request-centric I/O prioritization: enlightens the I/O path solely for application performance; improves throughput and latency of real applications.
     • Ongoing work: making the implementation practical; applying RCP to a database cluster with multiple replicas.

  31. Thank You! • Questions and comments
