Dynamic Resource Allocation for Database Servers Running on Virtual Storage


  1. Dynamic Resource Allocation for Database Servers Running on Virtual Storage
     Gokul Soundararajan, Daniel Lupei, Saeed Ghanbari, Adrian Daniel Popescu, Jin Chen, Cristiana Amza
     University of Toronto

  2. Multi-tier Resource Allocation
     ‣ Applications are composed of several tiers: web server, application server, database server
     ‣ In a consolidated environment, Application-A and Application-B share resources in each tier
     ‣ Sharing can lead to interference

  3. Our Focus: Storage Hierarchy
     ‣ Disk is a bottleneck for database apps
     ‣ Want to use all resources efficiently
     [Figure: Application-A and Application-B sharing the database buffer pool, network, storage cache, and storage disk bandwidth]

  4. State of the Art
     ‣ Previous work studied resources in isolation
       - Memory partitioning: MRC [ASPLOS’04]
       - Disk bandwidth: Facade [FAST’03], Argon [FAST’07], etc.
       - ... and many more
     ‣ Want to use the storage hierarchy efficiently
     ‣ However, performance depends on all layers
       - Interdependency between resources
       - E.g., increasing the buffer pool reduces the number of storage accesses

  5. Motivating Scenario
     ‣ Two workloads generated using the Oracle ORION I/O tool
       - Cache-friendly ("Small"): 1 outstanding I/O
       - Cache-unfriendly ("Large"): 10 outstanding I/Os
     [Figure: both workloads running over the buffer pool, storage cache, and disk bandwidth]

  6. Motivating Scenario
     [Figure: bar chart of normalized latency (0 to 7.5) for the Small and Large workloads under four settings: Shared, Cache, Disk, Cache & Disk]
     ‣ Chart annotations: "Benefits cache-friendly workload"; "Avoids disk interference"; "Has best performance"

  7. Contributions
     ‣ Build performance models dynamically
       - Account for interdependencies between resources
       - Lightweight but still accurate
     ‣ Multi-level Resource Allocator
       - Uses performance models to guide resource allocation
       - Corrects model errors through runtime sampling
       - Uses global utility (SLOs) to partition resources
       - Minimizes the sum of application latencies

  8. Approach
     ‣ Build performance models
       - One per application
       - Derive a function to predict application latency given a configuration: $L_{avg} = f(\rho_c, \rho_s, \rho_d)$
     ‣ Find resource partitioning setting
       - Minimize sum of application latencies
       - Find best setting using hill climbing
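The search step on this slide can be sketched as follows. This is a minimal illustration, not the authors' allocator: it assumes two applications splitting one cache of 32 chunks, a one-dimensional neighborhood (move one chunk at a time), and two made-up latency functions `latency_a` and `latency_b` standing in for the per-application performance models.

```python
CHUNKS = 32  # total cache chunks to split between two applications (assumed)


def latency_a(chunks):
    """Hypothetical latency model for application A (not from the slides)."""
    return 10.0 / (1 + chunks)


def latency_b(chunks):
    """Hypothetical latency model for application B (not from the slides)."""
    return 40.0 / (1 + chunks)


def total_latency(chunks_for_a):
    """Objective from the slide: the sum of application latencies."""
    return latency_a(chunks_for_a) + latency_b(CHUNKS - chunks_for_a)


def hill_climb(objective, start, lo, hi):
    """Greedy hill climbing over an integer setting in [lo, hi]."""
    current = start
    while True:
        neighbors = [n for n in (current - 1, current + 1) if lo <= n <= hi]
        best = min(neighbors, key=objective)
        if objective(best) >= objective(current):
            return current  # no neighbor improves: local minimum
        current = best


best_for_a = hill_climb(total_latency, start=16, lo=0, hi=CHUNKS)
print(best_for_a, CHUNKS - best_for_a)
```

Hill climbing only guarantees a local minimum; with these toy convex models the local minimum is also the global one.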

  9. Outline
     ‣ Online Performance Models
       - What are they?
       - Why are they hard to build?
     ‣ Multi-level Resource Allocator
     ‣ Prototype Implementation
     ‣ Experimental Results
     ‣ Conclusions

  10. One-Level Cache Model
      ‣ Allocate the cache in 32MB chunks
      ‣ m = CacheSize/ChunkSize = 1GB/32MB = 32 choices
      [Figure: average latency curve Rd(A) versus cache size (32, 64, ..., 512, ..., 1G; 32 choices), with 512MB chosen]

  11. MRC Cache Model
      ‣ Computes the miss ratio given an I/O trace
      ‣ Multiplying the miss ratio by the I/O latency gives the average latency
      [Figure: miss-ratio curve Rd(A) versus cache size (32, 64, ..., 512, ..., 1G)]
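The slide does not say how the miss-ratio curve is computed. One standard way, shown here as an assumption, is Mattson's LRU stack algorithm, which yields the miss ratio for every cache size in a single pass over the trace. The trace and the 5 ms I/O latency below are illustrative values, not measurements from the talk.

```python
def lru_miss_ratio_curve(trace, max_size):
    """Return {cache_size: miss_ratio} for LRU caches of 1..max_size blocks."""
    stack = []                           # most recently used block first
    hits_at_depth = [0] * (max_size + 1)
    for block in trace:
        if block in stack:
            depth = stack.index(block) + 1   # LRU stack distance
            if depth <= max_size:
                hits_at_depth[depth] += 1
            stack.remove(block)
        stack.insert(0, block)
    curve, hits = {}, 0
    for size in range(1, max_size + 1):
        hits += hits_at_depth[size]      # a cache of this size hits at depths <= size
        curve[size] = (len(trace) - hits) / len(trace)
    return curve


def avg_latency_ms(miss_ratio, io_latency_ms):
    """Average latency as on the slide: miss ratio times I/O latency."""
    return miss_ratio * io_latency_ms


trace = [8, 8, 1, 6, 8, 4, 5, 6, 3, 4]   # illustrative block reference string
curve = lru_miss_ratio_curve(trace, max_size=5)
for size, ratio in curve.items():
    print(size, ratio, avg_latency_ms(ratio, io_latency_ms=5.0))
```

The list-based stack is quadratic in the worst case; it is chosen here for readability, not speed.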

  12. Two-Level Cache Model
      ‣ Performance affected by
        - DB buffer pool size (m choices), which changes the I/O trace seen at storage
        - Storage cache size (n choices)
      ‣ Performance model
        - Needs to consider all parameters (m*n choices)
        - 1GB caches allocated in 32MB chunks
        - m = n = 1GB/32MB = 32 settings each
        - m*n = 1024 distinct settings

  13. Two-Level Cache Model
      ‣ 2 caches create a 3D surface
      ‣ 32x32 = 1024 data points, versus 32 data points for one cache
      [Figure: average latency (low to high) as a surface over storage cache size (MB) and buffer pool size (MB)]

  14. Overall Performance Model
      ‣ Per-application parameters: buffer pool $\rho_c$, storage cache $\rho_s$, disk bandwidth $\rho_d$
      ‣ Needs 32x32x10 = 10240 samples
      ‣ At 15 mins/sample, this takes 3 months!
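The sampling cost quoted on the slides can be checked with a few lines of arithmetic. The constants are the ones given in the slides; reading the factor of 10 as the number of disk bandwidth settings is an assumption.

```python
CACHE_MB = 1024           # 1GB buffer pool and 1GB storage cache
CHUNK_MB = 32             # allocation granularity
DISK_SETTINGS = 10        # assumed meaning of the "x10" factor
MINUTES_PER_SAMPLE = 15

m = CACHE_MB // CHUNK_MB  # buffer pool choices
n = CACHE_MB // CHUNK_MB  # storage cache choices
samples = m * n * DISK_SETTINGS
days = samples * MINUTES_PER_SAMPLE / (60 * 24)

print(m, m * n, samples, round(days, 1))
```

Exhaustive sampling works out to roughly 107 days per application, which is in line with the slide's "3 months".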

  15. Outline
      ‣ Online Performance Models
      ‣ Multi-level Resource Allocator
        - Building performance models
        - Allocating resources using models
      ‣ Prototype Implementation
      ‣ Experimental Results
      ‣ Conclusions

  16. Key Observations
      ‣ Known cache replacement policies
        - Most cache replacement algorithms are LRU
        - A two-level LRU hierarchy is only as effective as its largest cache (cache inclusiveness)
      ‣ Disk is a closed-loop system
        - Rate of responses is the same as rate of requests
        - Performance is proportional to the disk bandwidth fraction

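  17. Cache Inclusiveness
      [Figure: block reference string 8 8 1 6 8 4 5 6 3 4 fed through two stacked LRU caches; opening animation frames with block 8 cached and I/O counts of 0 and 1]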

  18. Cache Inclusiveness
      ‣ Storage cache includes data in the buffer pool
      [Figure: after the reference string 8 8 1 6 8 4 5 6 3 4, the small LRU buffer pool holds 4 3 and the larger LRU storage cache holds 4 3 6 5 8; I/Os: 6]

  19. Cache Inclusiveness
      ‣ Buffer pool includes data in the storage cache
      [Figure: after the reference string 8 8 1 6 8 4 5 6 3 4, the larger LRU buffer pool holds 4 3 6 5 8 and the small LRU storage cache holds 3 5; I/Os: 6]

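  20. Approximate Single Cache Model (LRU)
      ‣ A two-level LRU hierarchy is approximated by one LRU cache of the larger size: $M_c(\max[\rho_c, \rho_s])$
      ‣ Same number of I/Os
      [Figure: the reference string 8 8 1 6 8 4 5 6 3 4 through the two-level hierarchy (buffer pool 4 3) and through a single LRU cache holding 4 3 6 5 8; I/Os: 6 in both cases]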
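The inclusiveness argument can be replayed with a small simulation. This is a sketch under assumptions: the reference string is the one the slide figures appear to use (reconstructed from the extracted figure text), both levels are plain LRU, and the storage cache sees only buffer pool misses. The class and function names are illustrative.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()          # least recently used first

    def access(self, block):
        """Return True on a hit; on a miss, insert the block and evict the LRU one."""
        if block in self.blocks:
            self.blocks.move_to_end(block)
            return True
        self.blocks[block] = None
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)
        return False


def two_level_disk_ios(trace, buffer_pool_size, storage_cache_size):
    """Count disk I/Os when buffer pool misses go to the storage cache."""
    buffer_pool = LRUCache(buffer_pool_size)
    storage_cache = LRUCache(storage_cache_size)
    ios = 0
    for block in trace:
        if buffer_pool.access(block):
            continue                          # served from the buffer pool
        if not storage_cache.access(block):
            ios += 1                          # missed both caches: go to disk
    return ios


def single_cache_ios(trace, size):
    """Count misses of one LRU cache of the given size."""
    cache = LRUCache(size)
    return sum(not cache.access(block) for block in trace)


TRACE = [8, 8, 1, 6, 8, 4, 5, 6, 3, 4]
print(two_level_disk_ios(TRACE, 2, 5))   # small buffer pool, large storage cache
print(two_level_disk_ios(TRACE, 5, 2))   # large buffer pool, small storage cache
print(single_cache_ios(TRACE, 5))        # single cache of size max(2, 5)
```

On this trace all three counts agree at 6 I/Os. In general the single-cache model is an approximation, as the later slide on model inaccuracies notes.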

  21. Cache Model (DEMOTE)
      ‣ Maintain cache exclusiveness
        - E.g., using DEMOTEs [USENIX’02]
        - A block brought into the buffer pool is not cached below
        - Only evictions from the buffer pool are cached in the storage cache
      ‣ Approximate performance using a single cache
        - $M_c(\rho_c + \rho_s)$
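The two single-cache approximations differ only in the effective cache size passed to the miss-ratio curve. A minimal sketch follows; `toy_miss_ratio` is a made-up linear curve used purely for illustration, and the function names are hypothetical.

```python
def effective_cache_mb(buffer_pool_mb, storage_cache_mb, policy):
    """Size of the single cache that approximates the two-level hierarchy."""
    if policy == "LRU":        # inclusive: only the larger cache matters
        return max(buffer_pool_mb, storage_cache_mb)
    if policy == "DEMOTE":     # exclusive: the two caches add up
        return buffer_pool_mb + storage_cache_mb
    raise ValueError(f"unknown policy: {policy}")


def toy_miss_ratio(cache_mb):
    """Hypothetical miss-ratio curve (not measured data)."""
    return max(0.0, 1.0 - cache_mb / 1024)


lru_miss = toy_miss_ratio(effective_cache_mb(512, 256, "LRU"))
demote_miss = toy_miss_ratio(effective_cache_mb(512, 256, "DEMOTE"))
print(lru_miss, demote_miss)
```

With a 512MB buffer pool and a 256MB storage cache, the LRU model evaluates the curve at 512MB and the DEMOTE model at 768MB.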

  22. Find Best Partitioning Setting
      ‣ Find the best resource allocation setting
      ‣ Minimize sum of application latencies
      [Figure: latency surfaces over storage cache size and buffer pool size, one for App-1 and one for App-2]

  23. Disk Model
      ‣ Observation: closed-loop system
        - Rate of responses is the same as rate of requests
        - Use the interactive response time law
      ‣ Performance is proportional to the disk bandwidth fraction
        - Measure base disk latency: $L_d(1)$
        - Predict latency for smaller bandwidth fractions: $L_d(\rho_d) = \frac{L_d(1)}{\rho_d}$
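Reading the slide's formula as $L_d(\rho_d) = L_d(1)/\rho_d$ (latency grows as the bandwidth share shrinks), the disk model is a one-line function. The 5 ms base latency below is an illustrative value, not a measurement from the talk.

```python
def disk_latency_ms(base_latency_ms, bandwidth_fraction):
    """Predicted disk latency when the application gets a fraction of the disk."""
    if not 0 < bandwidth_fraction <= 1:
        raise ValueError("bandwidth_fraction must be in (0, 1]")
    return base_latency_ms / bandwidth_fraction


base = 5.0  # hypothetical L_d(1), measured with the whole disk
for fraction in (1.0, 0.5, 0.25):
    print(fraction, disk_latency_ms(base, fraction))
```

Halving the bandwidth fraction doubles the predicted latency.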

  24. Putting it All Together
      ‣ Approximate the buffer pool and storage cache as a single-level cache
      ‣ Can now be solved using MRC
      $M_c(\rho_c)\, M_s(\rho_c, \rho_s)\, N = M_c(\max[\rho_c, \rho_s])\, N$

  25. Putting it All Together
      $L_{avg} = H_c(\rho_c)\, L_c + M_c(\rho_c)\, H_s(\rho_c, \rho_s)\, L_{net} + M_c(\rho_c)\, M_s(\rho_c, \rho_s)\, L_d(\rho_d)$
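The terms on this slide can be read as a sum over where a request is served: the buffer pool, the storage cache across the network, or the disk. The sketch below makes that reading explicit. It assumes H and M are hit and miss ratios with H = 1 - M, and that disk latency scales as base latency divided by bandwidth fraction. All numeric values are illustrative.

```python
def average_latency_ms(miss_c, miss_s, lat_cache_ms, lat_net_ms,
                       lat_disk_base_ms, disk_fraction):
    """Average latency across buffer pool hits, storage cache hits, and disk I/Os."""
    hit_c = 1.0 - miss_c                       # assumed: H_c = 1 - M_c
    hit_s = 1.0 - miss_s                       # assumed: H_s = 1 - M_s
    lat_disk_ms = lat_disk_base_ms / disk_fraction
    return (hit_c * lat_cache_ms               # served by the buffer pool
            + miss_c * hit_s * lat_net_ms      # served by the storage cache
            + miss_c * miss_s * lat_disk_ms)   # served by the disk


latency = average_latency_ms(miss_c=0.5, miss_s=0.5, lat_cache_ms=0.0,
                             lat_net_ms=1.0, lat_disk_base_ms=8.0,
                             disk_fraction=0.5)
print(latency)
```

With these toy numbers the disk term dominates: a quarter of requests reach the disk, but they account for 4.0 of the 4.25 ms average.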

  26. Inaccuracies in the Model
      ‣ Cache Model
        - Approximations to LRU, e.g., CLOCK
        - Large fraction of writes in the workload
      ‣ Disk Model
        - Using quanta-based scheduler [Wachs et al., FAST’07]
        - Interference due to disk seeks at small quanta
      ‣ Inaccuracies localized in known regions
        - E.g., small disk quanta

  27. Iterative Refinement
      ‣ Build model
        - Use trace collected at the database buffer pool
      ‣ Refine the model
        - Use cross-validation to measure quality
        - Selectively sample where error is high
        - Interpolate computed and measured samples using regression (SVM)
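The "selectively sample where error is high" step can be illustrated with a much simpler stand-in than the cross-validation and SVM regression named on the slide: compare the computed model against a few measured samples and flag the settings whose relative error exceeds a threshold. The function name, the 20% threshold, and all latency values are hypothetical.

```python
def high_error_settings(computed, measured, threshold=0.2):
    """Return measured settings whose relative model error exceeds threshold, worst first."""
    errors = {
        setting: abs(computed[setting] - measured[setting]) / measured[setting]
        for setting in measured
    }
    flagged = [s for s, err in errors.items() if err > threshold]
    return sorted(flagged, key=lambda s: errors[s], reverse=True)


# Hypothetical latencies (ms) keyed by disk bandwidth fraction.
computed = {0.2: 25.0, 0.5: 10.0, 1.0: 5.0}
measured = {0.2: 40.0, 0.5: 11.0, 1.0: 5.0}

to_resample = high_error_settings(computed, measured)
print(to_resample)
```

In this toy data only the smallest disk fraction is flagged, which mirrors the slides' remark that inaccuracies concentrate at small disk quanta.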

  28. Virtual Storage Prototype
      [Figure: a CLIENT running MySQL with its buffer pool and a SERVER providing the storage cache and quanta-based disk scheduling, connected over the network via NBD; both sides use the Linux block layer and SCSI disks]

  29. Experimental Setup
      ‣ Benchmarks
        - UNIFORM (microbenchmark), TPC-W, and TPC-C
      ‣ LAMP Architecture
        - Linux, Apache 1.3, MySQL/InnoDB 5.0, and PHP 5.0
      ‣ Cache Configuration
        - MySQL buffer pool = 1GB
        - Storage cache = 1GB
        - Using InnoDB cache replacement in MySQL, CLOCK in storage cache

  30. Our Algorithms
      ‣ GLOBAL
        - Gather trace at the buffer pool
        - Measure base disk latency
        - Compute performance using performance model
      ‣ GLOBAL+
        - Run GLOBAL
        - Evaluate model accuracy
        - Refine model using runtime samples

  31. Algorithms for Comparison
      ‣ MRC
        - Partition cache (independently) using miss-ratio curves
      ‣ DISK
        - Partition caches equally, determine best disk quanta
      ‣ MRC+DISK
        - Run MRC, then DISK
      ‣ IDEAL*
        - Build model with SVM using 16*16*5 = 1280 sampled configurations

  32. Roadmap of Results
      ‣ Multi-level cache allocator
        - Using LRU and DEMOTE cache replacement policies
      ‣ Multi-level cache and disk
      ‣ Accuracy of computed models

  33. Miss-Ratio Curves
      [Figure: miss ratio (%) versus buffer pool size (MB), 0 to 1024, for TPC-W, TPC-C, and UNIFORM]

  34. Multi-Level Caching (LRU)
      ‣ 2 TPC-W instances
      [Figure: heat map over buffer pool size (A) and storage cache size (A), each from 0 to 1G; lighter is better, darker is worse; optimal settings marked]

  35. Multi-Level Caching (DEMOTE)
      ‣ 2 TPC-W instances
      [Figure: heat map over buffer pool size (A) and storage cache size (A), each from 0 to 1G; optimal setting marked]

  36. Roadmap of Results
      ‣ Multi-level cache allocator
      ‣ Multi-level cache and disk
        - Using two identical applications
        - Using different applications
      ‣ Accuracy of computed models

  37. UNIFORM/UNIFORM
      [Figure: bar chart of average latency (ms), 0 to 8.41, for the two UNIFORM instances under GLOBAL, GLOBAL+, MRC, DISK, MRC+DISK, and IDEAL*]
      ‣ Chart annotations: "Allocate caches to 50/50"; "Matches GLOBAL"

  38. TPC-W/UNIFORM
      [Figure: bar chart of average latency (ms), 0 to 7.5, for TPC-W and UNIFORM under GLOBAL, GLOBAL+, MRC, DISK, MRC+DISK, and IDEAL*]
      ‣ Chart annotations: "Allocate caches to 50/50"; "Not enough buffer pool to UNIFORM"; "Compensate for MRC settings"

  39. TPC-W/TPC-C
      [Figure: bar chart of average latency (ms), 0 to 0.80, for TPC-W and TPC-C under GLOBAL, GLOBAL+, MRC, DISK, MRC+DISK, and IDEAL*]
      ‣ Chart annotations: "TPC-C allocated more in both"; "Corrects imbalance in MRC"; "Corrects model at runtime"

  40. Roadmap of Results
      ‣ Multi-level cache allocator
      ‣ Multi-level cache and disk
      ‣ Accuracy of computed models
        - Cache model
        - Disk model

  41. Cache Model Accuracy (TPC-W)
      ‣ Error is localized in the middle
      [Figure: heat map of model error (%), 0 to 50, over buffer pool size (MB) and storage cache size (MB), each from 0 to 1024]

  42. Disk Model Accuracy (TPC-W)
      [Figure: measured versus computed latency (ms), 0 to 50, as a function of disk quota, 0 to 1]
