Cake: Enabling High-level SLOs on Shared Storage Systems


  1. Cake: Enabling High-level SLOs on Shared Storage Systems Andrew Wang, Shivaram Venkataraman, Sara Alspaugh, Randy Katz, Ion Stoica University of California, Berkeley SOCC 2012

  2. Contents  Introduction  Problem and Challenges  Solutions  System Design  Implementation  Evaluation  Conclusion  Future Work

  3. Introduction  Rich web applications  A single slow storage request can dominate the overall response time  High-percentile latency SLOs  Address the latency present at the 95th or 99th percentile
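A back-of-envelope illustration of why a single slow request dominates (the numbers here are illustrative, not from the slides): a page render that fans out to many parallel storage requests is only as fast as the slowest one.

```python
# Tail-latency amplification: with independent parallel requests and a
# tail fraction p_slow, the chance that at least one request is slow
# grows quickly with fan-out.
def p_at_least_one_slow(n_requests: int, p_slow: float = 0.01) -> float:
    """Probability that at least one of n_requests parallel requests
    lands in the slow tail (e.g. beyond the 99th percentile)."""
    return 1.0 - (1.0 - p_slow) ** n_requests

# Fanning out to 100 servers, roughly 63% of page loads include at least
# one 99th-percentile request, so the tail sets the page's latency.
print(round(p_at_least_one_slow(100), 2))  # 0.63
```

This is why the deck targets high-percentile latency SLOs rather than averages.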

  4. Introduction  Datacenter applications  Latency-sensitive  Throughput-oriented  Both access distributed storage systems  Applications traditionally do not share storage systems  Service-level objectives (SLOs) on throughput or latency

  5. Introduction  SLOs  Reflect the application's performance expectations  Amazon, Google, and Microsoft have identified SLO violations as a major cause of user dissatisfaction  For example:  A web client might require a 99th-percentile latency SLO of 100ms  A batch job might require a throughput SLO of 100 scan requests per second
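The slide's two example SLOs could be encoded as a small specification object; the field names below are my illustration, not Cake's actual interface.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SLO:
    """Illustrative encoding of a high-level SLO as a client would
    state it (field names are assumptions, not from the paper)."""
    client: str
    percentile: Optional[float] = None      # e.g. 99.0 for a p99 target
    latency_ms: Optional[float] = None      # latency bound at that percentile
    throughput_rps: Optional[float] = None  # requests per second

# The two examples from the slide:
web = SLO("front-end", percentile=99.0, latency_ms=100)  # p99 <= 100ms
batch = SLO("batch", throughput_rps=100)                 # 100 scans/sec
```

Stating SLOs at this level is the point of Cake: the client never mentions MB/s or disk queues.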

  6. Problem and Challenges  Physically separating storage systems  Each system must be provisioned for its individual peak load  Segregation of data degrades the user experience  Operational complexity  Requires additional maintenance staff  More software bugs and configuration errors

  7. Problem and Challenges  Focusing solely on controlling disk-level resources  High-level storage SLOs require consideration of resources beyond the disk  Disconnect between high-level SLOs and low-level performance parameters like MB/s  Requires tedious, manual translation  More burden on the programmer or system operator

  8. Solutions  Cake  A coordinated, multi-resource scheduler for shared distributed storage environments, with the goal of achieving both high throughput and bounded latency.

  9. System Design  Architecture

  10. System Design  First-level schedulers at each resource layer  Provide mechanisms for differentiated scheduling  Split large requests into smaller chunks  Limit the number of outstanding device requests
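The first of these mechanisms, splitting a large request into smaller chunks, can be sketched as follows; the function name and default chunk size are illustrative, not Cake's actual API.

```python
def chunk_request(offset: int, length: int, chunk_size: int = 64 * 1024):
    """Split one large read into fixed-size chunks so that small,
    latency-sensitive requests can be interleaved between chunks
    instead of waiting behind the whole large request."""
    chunks = []
    while length > 0:
        n = min(chunk_size, length)
        chunks.append((offset, n))
        offset += n
        length -= n
    return chunks

# A 10-byte read with 4-byte chunks yields three sub-requests:
print(chunk_request(0, 10, 4))  # [(0, 4), (4, 4), (8, 2)]
```

Smaller chunks mean a latency-sensitive request waits at most one chunk's worth of device time, at the cost of some per-request overhead.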

  11. System Design  Cake's second-level scheduler acts as a feedback loop  Continually adjusts resource allocations at each of the first-level schedulers  Maximizes SLO compliance while attempting to increase utilization

  12. First-level Resource Scheduling  Differentiated scheduling [figures a and b]
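Differentiated scheduling with per-client queues and proportional shares can be sketched with a simple work-counting scheme; this is my illustration of the idea, not Cake's exact algorithm.

```python
class ProportionalShareQueue:
    """One FIFO per client, served in proportion to per-client weights:
    each dequeue picks the backlogged client with the least weighted
    service so far (a sketch; weights and names are illustrative)."""
    def __init__(self, weights):
        self.weights = weights
        self.queues = {c: [] for c in weights}
        self.served = {c: 0 for c in weights}

    def enqueue(self, client, request):
        self.queues[client].append(request)

    def dequeue(self):
        ready = [c for c, q in self.queues.items() if q]
        if not ready:
            return None
        # Least served-per-weight goes next, so service converges to the weights.
        c = min(ready, key=lambda c: self.served[c] / self.weights[c])
        self.served[c] += 1
        return self.queues[c].pop(0)
```

With weights 3:1, a saturated front-end client receives roughly three dequeues for every batch dequeue.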

  13. First-level Resource Scheduling  Split large requests  Control the number of outstanding requests [figures c and d]

  14. Second-level Scheduling  Multi-resource request lifecycle  Request processing in a storage system involves far more than just accessing the disk  Necessitates a coordinated, multi-resource approach to scheduling

  15. Second-level Scheduling  Multi-resource request lifecycle

  16. Second-level Scheduling  High-level SLO enforcement  Cake's second-level scheduler  Satisfies the latency requirements of latency-sensitive front-end clients  Maximizes the throughput of throughput-oriented batch clients  Two phases of second-level scheduling decisions  Disk: the SLO compliance-based phase  Non-disk resources: the queue occupancy-based phase

  17. Second-level Scheduling  The initial SLO compliance-based phase  Decides on disk allocations based on client performance  The queue occupancy-based phase  Balances allocations in the rest of the system to keep the disk utilized and improve overall performance

  18. Implementation  Chunking Large Requests

  19. Implementation  Number of Outstanding Requests
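Bounding outstanding device requests can be sketched with a counting semaphore; class name and cap value are illustrative, not Cake's implementation.

```python
import threading

class DeviceGate:
    """Cap how many requests may be in flight at the storage device at
    once. A small cap keeps device queues short (better latency for
    requests arriving behind them); a larger cap favors throughput.
    A sketch of the mechanism, with an illustrative default cap."""
    def __init__(self, max_outstanding: int = 4):
        self._slots = threading.Semaphore(max_outstanding)

    def submit(self, io_fn, *args, **kwargs):
        with self._slots:  # blocks the caller when the device is saturated
            return io_fn(*args, **kwargs)
```

Together with chunking, this gives the first-level scheduler frequent opportunities to reorder work in favor of the latency-sensitive client.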

  20. Implementation  Cake Second-level Scheduler: SLO Compliance-based Scheduling
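One iteration of the SLO compliance-based phase could look like the feedback step below; the step size, hysteresis threshold, and bounds are my illustrative choices, not Cake's actual algorithm.

```python
def adjust_disk_share(share: float, measured_p99_ms: float,
                      target_p99_ms: float, step: float = 0.05) -> float:
    """SLO compliance-based feedback step (sketch): raise the
    latency-sensitive client's disk share when its measured 99th-
    percentile latency misses the SLO, and release capacity back to
    the batch client when the SLO is comfortably met."""
    if measured_p99_ms > target_p99_ms:
        share += step                    # missing the SLO: take more disk
    elif measured_p99_ms < 0.8 * target_p99_ms:
        share -= step                    # well under the SLO: give some back
    return min(max(share, 0.05), 0.95)   # keep both clients schedulable
```

The dead band between 80% and 100% of the target avoids oscillating allocations when the client is close to its SLO.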

  21. Implementation  Cake Second-level Scheduler: Queue Occupancy-based Scheduling
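The queue occupancy-based phase can be sketched as shifting soft resources (e.g. handler threads) toward the client with the fullest queue, so non-disk resources do not undermine the disk allocation chosen in the first phase. Function and parameter names here are illustrative.

```python
def rebalance_workers(occupancy: dict, workers: dict, step: int = 1) -> dict:
    """Queue occupancy-based step (sketch): move one worker thread from
    the client with the emptiest queue to the client with the fullest,
    keeping every client at least one worker."""
    fullest = max(occupancy, key=occupancy.get)
    emptiest = min(occupancy, key=occupancy.get)
    if fullest != emptiest and workers[emptiest] > step:
        workers = dict(workers)          # leave the caller's dict untouched
        workers[emptiest] -= step
        workers[fullest] += step
    return workers

# Front-end queue is backing up, so it borrows a thread from batch:
print(rebalance_workers({"fe": 10, "batch": 2}, {"fe": 4, "batch": 4}))
# {'fe': 5, 'batch': 3}
```

Run periodically, this drains whichever stage's queue is growing, which keeps the disk, the bottleneck resource, utilized.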

  22. Evaluation  Proportional Shares and Reservations  When the front-end client sends low throughput, reservations are an effective way of reducing queue time at HDFS

  23. Evaluation  Proportional Shares and Reservations  When the front-end sends high throughput, proportional share is an effective mechanism for reducing latency

  24. Evaluation  Single vs. Multi-resource Scheduling  CPU contention arises within HBase when running many concurrent threads without separate queues and differentiated scheduling

  25. Evaluation  Single vs. Multi-resource Scheduling  Thread-per-request shows greatly increased latency with chunked request sizes

  26. Evaluation  Convergence Time  Diurnal Workload  Spike Workload  Latency-Throughput Trade-off  Quantifying the Benefits of Consolidation

  27. Conclusion  Coordinates resource allocation across multiple software layers  Allows application programmers to specify high-level SLOs directly to the storage system  Allows consolidation of latency-sensitive and throughput-oriented workloads

  28. Conclusion  Allows users to flexibly move within the storage latency vs. throughput trade-off by choosing different high-level SLOs  Using Cake has concrete economic and business advantages

  29. Future Work  SLO admission control  Influence of DRAM and SSDs  Composable application-level SLOs  Automatic parameter tuning  Generalization to multiple SLOs

  30. Thank You
