
Capacity planning with phased workloads



  1. Capacity planning with phased workloads
     Arif Merchant
     Storage Systems Program, Computer Systems Laboratory, Hewlett-Packard Laboratories, Palo Alto, CA
     Joint work with E. Borowsky, R. Golding, P. Jacobson, L. Schreier, M. Spasojevic and John Wilkes
     WOSP '98, Santa Fe, NM, 12-16 October 1998
     10/16/98

  2. Attribute-managed storage: A day in the life of a System Administrator
     ❏ Need more capacity.
     ❏ Need better performance.
     ❏ Need high availability.
     ❏ Must rebalance the load.
     ❏ Must add devices.
     ❏ Quality of service guarantees.
     ❏ Network attached storage.
     ❏ More demanding applications.
     UGH!... AAAGH!... my head hurts! Brain exploding!
     Headache today? Migraine tomorrow!!!

  3. Attribute-managed storage: Motivation
     ❏ Growing complexity of storage systems
        Complexity: growing number of disks
        [Chart: HP 300GB TPC-D benchmark list price: 37% Storage, 21% Software, 42% SPUs]
     ❏ Growing cost of ownership for storage systems
        [Plot: $ and capacity against time; management costs vs. initial disk purchase cost]

  4. Attribute-managed storage: Opportunity
     [Diagram: clients; servers (services); high speed back-end storage network; storage utility presenting virtual stores through a storage-utility interface]
     Optional: direct client access to storage network

  5. Attribute-managed storage: A closer look
     [Diagram of the storage utility]
     ❏ Main storage interface (to clients)
     ❏ Virtual stores
     ❏ Shared, network-attached storage devices
     ❏ Distributed storage-management functions
     ❏ Storage-management interface (to storage managers)

  6. Attribute-managed storage: The goal
     Say what you want, not how to do it!
     What you want:
     • business-critical availability
     • 100 IOs/sec
     • 200ms response time
     Not how to do it:
     RAID 3 data layout, across 5 of the disks on disk array F, using 64KB stripe size, 3MB dedicated buffer cache with 128KB sequential readahead buffer, delayed write-back with 1MB NVRAM buffer and max 10s residency time, dual 256Kb/s links via host interfaces 12.4.3 and 16.0.4, 1Gb/s trunk links between FibreChannel switches A-3 and B-1, …

  7. Attribute-managed storage: The mechanism
     [Diagram: the applications' workload requirements and the storage device abilities feed the assignment engine (solver), which produces the storage-system configuration]

  8. Attribute-managed storage: The assignment problem

  9. Constraints: Does it fit?
     ❏ Capacity constraints
        ❏ Is there enough space?
     ❏ Availability constraints
        ❏ Is it up often enough?
     ❏ Performance constraints
        ❏ Is response time adequate? E.g.: Are 95% of requests satisfied within 0.2 sec?
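The three checks above can be sketched in Python. This is only an illustration: the dictionary keys (`size_gb`, `free_gb`, `availability`, `min_availability`) and both function names are hypothetical, and only the "95% of requests within 0.2 sec" example comes from the slide.

```python
def fraction_within(response_times, limit):
    """Fraction of observed response times (seconds) that are <= limit."""
    if not response_times:
        raise ValueError("need at least one response time")
    return sum(1 for t in response_times if t <= limit) / len(response_times)


def fits(store, device, response_times, limit=0.2, target=0.95):
    """Return True if the store passes all three constraint checks.

    store and device are plain dicts with hypothetical keys (assumption).
    """
    capacity_ok = store["size_gb"] <= device["free_gb"]
    availability_ok = device["availability"] >= store["min_availability"]
    performance_ok = fraction_within(response_times, limit) >= target
    return capacity_ok and availability_ok and performance_ok


store = {"size_gb": 40, "min_availability": 0.999}
device = {"free_gb": 100, "availability": 0.9995}
observed = [0.05] * 96 + [0.5] * 4  # 96 of 100 requests finish within 0.2 sec
print(fits(store, device, observed))
```

Here the response times are given as measurements; the following slides are about predicting the performance check from a workload model instead.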

  10. Short Term Utilization: Intuition
     ❏ Queues form in stable system because of variation in workload arrival rate.
     ❏ Queueing delays can be controlled by controlling variability in work arrival rate.
     [Figure: request arrivals and work in system over time]

  11. Short Term Utilization: A theorem
     If the work arriving in every period of length T is such that the device can do it in T seconds, then the response time is always less than T seconds.
     [Figure: request arrivals and work in system over time]
     ❏ Setting T = maximum response time allowed meets requirements.
     ❏ But ... this requirement is too strict.
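The condition in the theorem can be checked directly on a trace. The sketch below assumes a single device that serves requests first-come-first-served and measures work in seconds of service time; the function names and the sample trace are made up for illustration and are not from the talk.

```python
def max_work_in_window(arrivals, T):
    """Largest total service time arriving in any window [t, t + T).

    arrivals: list of (arrival_time, service_seconds), sorted by time.
    It is enough to test windows that start at an arrival instant.
    """
    worst = 0.0
    for i, (start, _) in enumerate(arrivals):
        work = sum(s for t, s in arrivals[i:] if t < start + T)
        worst = max(worst, work)
    return worst


def fifo_response_times(arrivals):
    """Response time of each request on one FIFO device (assumption)."""
    free_at = 0.0
    responses = []
    for t, s in arrivals:
        begin = max(t, free_at)   # wait until the device is idle
        free_at = begin + s
        responses.append(free_at - t)
    return responses


T = 1.0
trace = [(0.0, 0.3), (0.1, 0.3), (0.2, 0.3), (1.5, 0.5), (1.6, 0.4)]
print(max_work_in_window(trace, T) <= T)      # short-term condition holds
print(max(fifo_response_times(trace)) < T)    # so every response is under T
```

A burst such as two 0.6-second requests arriving together violates the condition for T = 1.0, and the second request then takes longer than T.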

  12. Short Term Utilization: An approximation
     Pr{Work arriving in T < what device can do in T} > p  =>  Pr{Response time < T} > p
     ❏ Translates bound on response time tail into a bound on tail of Work(T)
     ❏ Approximation is exact for p=1
     ❏ Distribution of Work arriving in time T frequently easy to calculate or approximate for simple workloads.
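As an example of a "simple workload" for which the distribution of Work(T) is easy to calculate, the sketch below assumes Poisson request arrivals at a fixed rate and a constant service time per request. Both assumptions, and the function names, are illustrative and not taken from the slides. Under them, the work arriving in T fits whenever the number of arrivals is at most the number of requests the device can serve in T.

```python
import math


def poisson_cdf(k, mean):
    """Pr{N <= k} for N ~ Poisson(mean), summed term by term."""
    term = math.exp(-mean)
    total = term
    for n in range(1, k + 1):
        term *= mean / n
        total += term
    return total


def prob_work_fits(rate, service_time, T):
    """Pr{requests arriving in T can all be served in T seconds}.

    rate: arrivals per second (Poisson, assumption)
    service_time: constant seconds per request (assumption)
    """
    # Number of requests the device can complete in T seconds;
    # the small epsilon guards against floating-point round-down.
    k = math.floor(T / service_time + 1e-9)
    return poisson_cdf(k, rate * T)


# 0.2 sec response-time target, 5 ms per request
print(prob_work_fits(rate=100, service_time=0.005, T=0.2))
print(prob_work_fits(rate=190, service_time=0.005, T=0.2))
```

With these numbers the device can serve 40 requests in 0.2 sec. At 100 IOs/sec the mean arrival count is 20 and the probability is close to 1; at 190 IOs/sec the mean is 38 and the probability falls well below a 0.95 target.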

  13. Workload Characterization: TPC-D workload traces: application phases
     [Figure: four trace plots of IOs/sec (0 to 200) against time (sec)]

  14. Workload characterization: Phased correlated model
     Each workload is modeled as an ON-OFF Poisson process
     ❏ Parameters: ON time average, OFF time average, IO rate during ON period
     ❏ Correlation between workloads: p_ij = Pr{A_j is ON when A_i comes ON}
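A single ON-OFF workload with the three parameters above can be simulated as follows. The slide gives only the average ON and OFF times, so drawing both durations from exponential distributions is an assumption of this sketch, as are the function names and the sample parameter values. Correlation between workloads is not modeled here.

```python
import random


def on_off_arrivals(mean_on, mean_off, rate_on, horizon, seed=0):
    """Arrival times in [0, horizon) for one ON-OFF Poisson workload.

    ON and OFF durations are exponential (assumption); during an ON
    period requests arrive as a Poisson process with rate rate_on.
    The workload starts in the ON state at time 0.
    """
    rng = random.Random(seed)
    t = 0.0
    arrivals = []
    while t < horizon:
        on_end = min(t + rng.expovariate(1.0 / mean_on), horizon)
        a = t + rng.expovariate(rate_on)
        while a < on_end:
            arrivals.append(a)
            a += rng.expovariate(rate_on)
        # Skip the OFF period; no requests arrive in it.
        t = on_end + rng.expovariate(1.0 / mean_off)
    return arrivals


def mean_rate(mean_on, mean_off, rate_on):
    """Long-run average IO rate: rate_on times the fraction of time ON."""
    return rate_on * mean_on / (mean_on + mean_off)


arrivals = on_off_arrivals(10.0, 30.0, 50.0, 1000.0, seed=1)
print(len(arrivals) / 1000.0, mean_rate(10.0, 30.0, 50.0))
```

The simulated average rate fluctuates around the long-run value because a 1000-second run contains only a few dozen ON-OFF cycles.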

  15. Phasing and Short term utilization: Combining forces
     ❏ Response times increase only when some workload goes ON
     ❏ Sufficient to test response time bounds only at the times workloads change state from OFF to ON
     ❏ Workload distribution is easy to estimate given a workload just went ON.
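The idea of testing only at OFF-to-ON transitions can be illustrated with a simplified, rate-based stand-in for the talk's bound on Work(T). The sketch assumes that, given workload i has just come ON, every other workload j is ON independently with probability p_ij, and it asks how likely the combined ON rates are to exceed what the device can sustain. The independence assumption, the function names and the numbers are all illustrative and not from the slides.

```python
from itertools import product


def overload_prob_when_on(i, rates, p, device_rate):
    """Pr{total ON rate > device_rate} at the instant workload i comes ON.

    rates[j]: IO rate of workload j while ON
    p[i][j]:  Pr{workload j is ON when workload i comes ON}
    Other workloads are treated as conditionally independent (assumption).
    """
    others = [j for j in range(len(rates)) if j != i]
    prob = 0.0
    for states in product([0, 1], repeat=len(others)):
        pr = 1.0
        load = rates[i]
        for j, on in zip(others, states):
            pr *= p[i][j] if on else 1.0 - p[i][j]
            load += on * rates[j]
        if load > device_rate:
            prob += pr
    return prob


def worst_overload_prob(rates, p, device_rate):
    """Check every OFF-to-ON transition and keep the worst case."""
    return max(overload_prob_when_on(i, rates, p, device_rate)
               for i in range(len(rates)))


rates = [100, 80, 50]              # IOs/sec while ON
p = [[1.0, 0.5, 0.1],
     [0.5, 1.0, 0.2],
     [0.1, 0.2, 1.0]]
print(worst_overload_prob(rates, p, device_rate=160))
```

The enumeration visits every ON/OFF combination of the other workloads, so it is only practical for the small number of workloads that share one device.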

  16. Validation and testing: Tasting the stew
     Compared simulation and modelling results
     ❏ Baseline case: 8 streams, correlated sets of 4, 2, 2. All predictions were correct.
     ❏ Checking tightness of predictions - are the bounds optimistic (wrong) or pessimistic?
     [Figure: tightness against inter arrival time for streams cosa0 to cosa7, with inaccuracies marked]

  17. Validation and testing: The validation loop
     [Diagram: Forum validation loop. Labels: application, KItrace measurement, workload specs, Panopticon, Forum solver, assignments, predictions, compare measurements]

  18. Validation and testing: The pudding
     [Figure: query execution times, 25 vs. 15 disks. Execution time (sec, 0 to 4000) for queries 1 to 15]

  19. Capacity planning: What next?
     ❏ Better device models
     ❏ Better workload models
     ❏ Fault-tolerant on-line management

  20. Attribute-managed storage: The future
     ❏ Need guaranteed quality of service?
     ❏ Storage distributed across the network?
     ❏ Continually changing workload?
     NO PROBLEM!
     http://www.hpl.hp.com/SSP
