Cake: Enabling High-level SLOs on Shared Storage Systems
Andrew Wang, Shivaram Venkataraman, Sara Alspaugh, Randy Katz, Ion Stoica
University of California, Berkeley
SoCC 2012
Contents
- Introduction
- Problem and Challenge
- Solutions
- System Design
- Implementation
- Evaluation
- Conclusion
- Future Work
Introduction
- Rich web applications: a single slow storage request can dominate the overall response time
- High-percentile latency SLOs target the latency at the 95th or 99th percentile
Introduction
- Datacenter applications are either latency-sensitive or throughput-oriented
- Both kinds access distributed storage systems, yet applications do not currently share storage systems
- Each application has service-level objectives on throughput or latency
Introduction
- SLOs reflect the performance expectations of applications
- Amazon, Google, and Microsoft have identified high-percentile latency as a major cause of user dissatisfaction
- For example:
  - A web client might require a 99th percentile latency SLO of 100ms
  - A batch job might require a throughput SLO of 100 scan requests per second
Problem and Challenge
- Physically separating storage systems:
  - Each system must be provisioned for its individual peak load
  - Segregation of data leads to a degraded user experience
  - Operational complexity: requires additional maintenance staff and invites more software bugs and configuration errors
Problem and Challenge
- Focusing solely on controlling disk-level resources:
  - High-level storage SLOs require consideration of resources beyond the disk
  - The disconnect between high-level SLOs and low-level performance parameters like MB/s requires tedious, manual translation by the programmer or system operator
Solutions
- Cake: a coordinated, multi-resource scheduler for shared distributed storage environments, with the goal of achieving both high throughput and bounded latency
System Design
- Architecture [figure: Cake architecture diagram]
System Design
- First-level schedulers, one per software layer (see the sketch below):
  - Provide mechanisms for differentiated scheduling per client
  - Split large requests into smaller chunks
  - Limit the number of outstanding device requests
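A minimal Java sketch of the first of these mechanisms: separate per-client queues served by proportional share. The class name and the catch-up selection policy are illustrative assumptions, not Cake's actual code.

    import java.util.ArrayDeque;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Queue;

    // Hypothetical first-level scheduler: one FIFO queue per client,
    // with the next request chosen by proportional share.
    public class FirstLevelScheduler {
        private final Map<String, Queue<Runnable>> queues = new HashMap<>();
        private final Map<String, Integer> shares = new HashMap<>(); // relative weights
        private final Map<String, Long> served = new HashMap<>();    // requests served so far

        public synchronized void register(String client, int share) {
            queues.put(client, new ArrayDeque<>());
            shares.put(client, share);
            served.put(client, 0L);
        }

        public synchronized void submit(String client, Runnable request) {
            queues.get(client).add(request);
        }

        // Pick the backlogged client that is furthest behind its share.
        public synchronized Runnable next() {
            String best = null;
            double bestRatio = Double.MAX_VALUE;
            for (String c : queues.keySet()) {
                if (queues.get(c).isEmpty()) continue;
                double ratio = (double) served.get(c) / shares.get(c);
                if (ratio < bestRatio) { bestRatio = ratio; best = c; }
            }
            if (best == null) return null;
            served.put(best, served.get(best) + 1);
            return queues.get(best).poll();
        }
    }

Keeping one queue per client is what makes differentiated treatment possible at all; a single shared FIFO would let a burst of batch requests delay every front-end request behind it.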
System Design
- Cake's second-level scheduler acts as a feedback loop:
  - Continually adjusts resource allocations at each of the first-level schedulers
  - Maximizes SLO compliance of the system while attempting to increase utilization
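A hypothetical skeleton of that feedback loop, assuming a fixed decision interval; the method names are placeholders, and the two adjustment phases are detailed on later slides.

    // Sketch of the second-level feedback loop: each interval, measure
    // client performance and nudge the first-level allocations.
    public class SecondLevelScheduler implements Runnable {
        private final long intervalMs = 10_000;  // assumed decision interval

        @Override
        public void run() {
            while (!Thread.currentThread().isInterrupted()) {
                double p99 = measureLatencyP99();   // observe front-end performance
                adjustDiskAllocation(p99);          // SLO compliance-based phase
                adjustNonDiskAllocations();         // queue occupancy-based phase
                try {
                    Thread.sleep(intervalMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }

        private double measureLatencyP99() { return 0.0; }  // stub
        private void adjustDiskAllocation(double p99) {}     // stub
        private void adjustNonDiskAllocations() {}           // stub
    }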
First-level Resource Scheduling
- Differentiated scheduling [figure panels (a), (b)]
First-level Resource Scheduling
- Split large requests
- Control the number of outstanding requests
[figure panels (c), (d)]
Second-level Scheduling
- Multi-resource request lifecycle:
  - Request processing in a storage system involves far more than just accessing the disk
  - This necessitates a coordinated, multi-resource approach to scheduling
Second-level Scheduling
- Multi-resource request lifecycle [figure]
Second-level Scheduling
- High-level SLO enforcement by Cake's second-level scheduler:
  - Satisfy the latency requirements of latency-sensitive front-end clients
  - Maximize the throughput of throughput-oriented batch clients
- Two phases of second-level scheduling decisions:
  - Disk allocation in the SLO compliance-based phase
  - Non-disk resources in the queue occupancy-based phase
Second-level Scheduling
- The initial SLO compliance-based phase decides on disk allocations based on client performance
- The queue occupancy-based phase balances allocations in the rest of the system to keep the disk utilized and improve overall performance
Implementation
- Chunking large requests
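A small illustrative sketch of chunking, assuming a fixed 64 KB chunk size (the actual chunk sizing in Cake may differ): splitting a large scan into chunks creates preemption points where small, latency-sensitive requests can be interleaved instead of waiting behind one monolithic request.

    import java.util.ArrayList;
    import java.util.List;

    // Split a large byte range into fixed-size (offset, length) chunks.
    public class Chunker {
        static final int CHUNK_BYTES = 64 * 1024;  // assumed chunk size

        static List<long[]> split(long offset, long length) {
            List<long[]> chunks = new ArrayList<>();
            for (long off = offset; off < offset + length; off += CHUNK_BYTES) {
                long len = Math.min(CHUNK_BYTES, offset + length - off);
                chunks.add(new long[] { off, len });
            }
            return chunks;
        }

        public static void main(String[] args) {
            for (long[] c : split(0, 300 * 1024)) {
                System.out.printf("chunk at %d, %d bytes%n", c[0], c[1]);
            }
        }
    }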
Implementation
- Number of outstanding requests
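One plausible way to bound outstanding device requests is a counting semaphore; this sketch is an assumption about the mechanism, not Cake's code. Keeping device-side queues short preserves the scheduler's control over request ordering.

    import java.util.concurrent.Semaphore;

    // Cap how many requests may be in flight at the device at once.
    public class OutstandingLimiter {
        private final Semaphore slots;

        public OutstandingLimiter(int maxOutstanding) {
            this.slots = new Semaphore(maxOutstanding);
        }

        public void issue(Runnable deviceRequest) throws InterruptedException {
            slots.acquire();            // block if too many requests in flight
            try {
                deviceRequest.run();    // e.g., a disk read
            } finally {
                slots.release();
            }
        }
    }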
Implementation
- Cake second-level scheduler: SLO compliance-based scheduling
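A toy sketch of the SLO compliance-based phase: compare the measured 99th percentile latency against the target and shift disk share between the front-end and batch clients. The step size, slack threshold, and bounds are invented for illustration.

    // Nudge disk proportional share toward whichever client needs it.
    public class SloCompliancePhase {
        private int frontEndShare = 50;   // percent of disk share
        private static final int STEP = 5;

        public void step(double measuredP99Ms, double targetP99Ms) {
            if (measuredP99Ms > targetP99Ms) {
                // SLO missed: give the front-end more disk share
                frontEndShare = Math.min(90, frontEndShare + STEP);
            } else if (measuredP99Ms < 0.8 * targetP99Ms) {
                // Ample slack: return share to the batch client
                frontEndShare = Math.max(10, frontEndShare - STEP);
            }
        }

        public int frontEndShare() { return frontEndShare; }
        public int batchShare() { return 100 - frontEndShare; }
    }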
Implementation
- Cake second-level scheduler: queue occupancy-based scheduling
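A toy sketch of the queue occupancy-based phase: if requests pile up at a non-disk resource, such as RPC handler threads, grant more of that resource so the disk stays fed; if the queue sits empty, reclaim it. The thresholds are illustrative assumptions.

    // Resize a non-disk resource allocation based on queue occupancy.
    public class QueueOccupancyPhase {
        public int adjust(double occupancy, int currentHandlers, int maxHandlers) {
            if (occupancy > 0.8 && currentHandlers < maxHandlers) {
                return currentHandlers + 1;   // queue mostly full: add a handler
            }
            if (occupancy < 0.2 && currentHandlers > 1) {
                return currentHandlers - 1;   // queue mostly empty: reclaim a handler
            }
            return currentHandlers;
        }
    }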
Evaluation: Proportional Shares and Reservations
- When the front-end client is sending low throughput, reservations are an effective way of reducing queue time at HDFS
Evaluation: Proportional Shares and Reservations
- When the front-end is sending high throughput, proportional share is an effective mechanism for reducing latency
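To make the distinction concrete, a small worked example with hypothetical numbers: a reservation pins a minimum number of device slots to the front-end regardless of load, while proportional shares only divide whatever capacity remains.

    // Reservation first, then split the remainder by proportional share.
    public class Allocation {
        public static int[] allocate(int totalSlots, int reservedForFrontEnd,
                                     int frontEndShare, int batchShare) {
            int remaining = totalSlots - reservedForFrontEnd;
            int frontEndExtra = remaining * frontEndShare / (frontEndShare + batchShare);
            int frontEnd = reservedForFrontEnd + frontEndExtra;
            return new int[] { frontEnd, totalSlots - frontEnd };
        }

        public static void main(String[] args) {
            // Reserve 2 of 10 slots, split the other 8 at a 1:9 share.
            int[] a = allocate(10, 2, 1, 9);
            System.out.println("front-end=" + a[0] + " batch=" + a[1]);
        }
    }

With low front-end load the 2 reserved slots dominate the allocation, which matches the observation above that reservations help most when the front-end sends little traffic.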
Evaluation: Single vs. Multi-resource Scheduling
- CPU contention arises within HBase when running many concurrent threads without separate queues and differentiated scheduling
Evaluation: Single vs. Multi-resource Scheduling
- Thread-per-request displays greatly increased latency with chunked request sizes
Evaluation
- Convergence Time
- Diurnal Workload
- Spike Workload
- Latency vs. Throughput Trade-off
- Quantifying Benefits of Consolidation
Conclusion
- Coordinates resource allocation across multiple software layers
- Allows application programmers to specify high-level SLOs directly to the storage system
- Allows consolidation of latency-sensitive and throughput-oriented workloads
Conclusion
- Allows users to flexibly move within the storage latency vs. throughput trade-off by choosing different high-level SLOs
- Using Cake has concrete economic and business advantages
Future Work
- SLO admission control
- Influence of DRAM and SSDs
- Composable application-level SLOs
- Automatic parameter tuning
- Generalization to multiple SLOs
Thank You