
AHAB: Data-Driven Virtual Cluster Hunting



  1. AHAB: Data-Driven Virtual Cluster Hunting
     Johannes Zerwas*, Patrick Kalmbach*, Carlo Fuerst°, Arne Ludwig°, Andreas Blenk*, Wolfgang Kellerer*, Stefan Schmid^
     *Technical University of Munich, Germany; °Technical University of Berlin, Germany; ^University of Vienna, Austria
     Chair of Communication Networks, Department of Electrical and Computer Engineering, Technical University of Munich
     IFIP Networking 2018, Zurich, Switzerland

  2. Context
     • Increased use of data-intensive applications in shared data centers
     • Many provider-tenant interfaces neglect the network as a resource
     • Problems:
       − Unpredictable application performance
       − Limited applicability of the cloud
       − Inefficiencies in production data centers
     • Solution: Network-aware abstraction - Virtual Cluster (ACM SIGCOMM 2011)
     [Figure: VMs labeled VM 1, VM 2, VM 3, ..., VM N, with "?" marks]
     Johannes Zerwas (TUM)

  3. Background: Virtual Cluster Abstraction
     Physical Cluster
     • Compute Units (CUs)
     • Bandwidth Units (BUs)
     • Tree-like topology (abstracted from Fat-Tree)
     Virtual Cluster (VC)
     • Number of VMs (N)
     • Size of VMs (S)
     • Bandwidth (B)
     • Lifetime given resource fulfillment
     [Figure: tree topology with labels of the form used/total BUs (0/8, 0/4, 0/2)]

  4. Background: Virtual Cluster Abstraction
     Physical Cluster
     • Compute Units (CUs)
     • Bandwidth Units (BUs)
     • Tree-like topology (abstracted from Fat-Tree)
     Virtual Cluster (VC)
     • Number of VMs (N)
     • Size of VMs (S)
     • Bandwidth (B)
     • Lifetime given resource fulfillment
     Example embedding: Footprint F = 6, Utilization U = 9/32
     [Figure: the same tree topology after embedding one VC, with used/total labels 0/8, 1/4, 0/2 and 1/2]
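
The three VC parameters (N, S, B) can be made concrete with a small sketch. The class and method names below are hypothetical. The per-link rule `min(n, N - n) * B` is the hose-model bandwidth requirement from the Oktopus paper cited on slide 2; it is not spelled out on these slides, and the slides' own footprint and utilization definitions are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualCluster:
    """A VC request; field names are illustrative, not taken from the slides."""
    num_vms: int    # N
    vm_size: int    # S, in compute units (CUs) per VM
    bandwidth: int  # B, in bandwidth units (BUs) per VM

    def total_cus(self) -> int:
        # Compute demand of the whole cluster: N * S.
        return self.num_vms * self.vm_size

    def link_demand(self, vms_below: int) -> int:
        # Hose model (Oktopus): a link that separates n VMs from the
        # remaining N - n VMs needs min(n, N - n) * B bandwidth units.
        if not 0 <= vms_below <= self.num_vms:
            raise ValueError("vms_below must be between 0 and N")
        return min(vms_below, self.num_vms - vms_below) * self.bandwidth


vc = VirtualCluster(num_vms=3, vm_size=1, bandwidth=1)
print(vc.total_cus())     # 3
print(vc.link_demand(1))  # 1
```

Placing all VMs under one link (`vms_below == N`) yields a demand of zero on that link, which is why compact placements keep the footprint small.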

  5. Problem: Resource Fragmentation
     Existing allocation algorithms focus on a single request:
     ▪ Oktopus (ACM SIGCOMM 2011)
     ▪ Kraken (IEEE/ACM TON 2018)
     Fragmentation of resources
     Contribution 1: TETRIS - Sacrifice the footprint
     Contribution 2: AHAB - Admission Control
     [Figure: three-level tree topologies at times t = 1 and t = 2, with used/total labels out of 16, 8 and 4]

  6. TETRIS: Sacrifice Footprint for Fragmentation
     Choose hosts with max. ratio of residual resources
     [Figure: three-level tree topology at t = 1 with used/total labels out of 16, 8 and 4; hosts annotated with the ratio 2/3, next to the terms 4 − 2 and 4 − 1]

  7. TETRIS: Sacrifice Footprint for Fragmentation
     Choose hosts with max. ratio of residual resources
     Resources still usable
     [Figure: the same topology at times t = 1 and t = 2, with used/total labels out of 16, 8 and 4; hosts annotated with 1/1]
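
The host-selection rule on slides 6 and 7 can be sketched as a greedy loop. This is an illustration only: the slides do not define the ratio precisely, so the sketch assumes it is free CUs divided by total CUs per host, and it ignores bandwidth entirely. The function name `pick_hosts` and the host names are hypothetical.

```python
from typing import Dict, List, Tuple


def pick_hosts(hosts: Dict[str, Tuple[int, int]], num_vms: int, vm_size: int) -> List[str]:
    """Place VMs one by one on the host with the largest residual ratio.

    hosts maps a host name to (used CUs, total CUs); totals must be positive.
    Assumption: residual ratio = free CUs / total CUs (not defined on the slides).
    Returns one host name per VM, or [] if the request does not fit.
    """
    free = {name: total - used for name, (used, total) in hosts.items()}
    total = {name: t for name, (_, t) in hosts.items()}
    placement: List[str] = []
    for _ in range(num_vms):
        # Sorted names make tie-breaking deterministic (first maximum wins).
        candidates = [n for n in sorted(free) if free[n] >= vm_size]
        if not candidates:
            return []
        best = max(candidates, key=lambda n: free[n] / total[n])
        free[best] -= vm_size
        placement.append(best)
    return placement


hosts = {"h1": (3, 4), "h2": (1, 4), "h3": (0, 4)}
print(pick_hosts(hosts, num_vms=2, vm_size=1))  # ['h3', 'h2']
```

In this toy run the two VMs land on different hosts although `h3` could hold both, which mirrors the slide title: the footprint grows, but no host is filled up completely.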

  8. Algorithm Evaluation
     ▪ Baseline: OKTOPUS (ACM SIGCOMM 2011), KRAKEN (IEEE/ACM TON 2018)
     ▪ Physical Cluster: Fat-Tree with k=12, 8 CUs and 8 BUs
     ▪ Performance metrics: CU Utilization, avg. VC Footprint
     ▪ Virtual Cluster Requests:
       ▪ 1000 / run with varying arrival rates
       ▪ Num. VMs, size VMs, BW similar to traces from Google & Microsoft
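
For a sense of scale, the size of the evaluated topology can be worked out from the standard k-ary fat-tree construction, which has k³/4 hosts. The sketch assumes the "8 CUs" on the slide are per host, which the slide does not state explicitly; the constant name is hypothetical.

```python
def fat_tree_hosts(k: int) -> int:
    """Number of hosts in a standard k-ary fat-tree (k pods, (k/2)^2 hosts each)."""
    if k <= 0 or k % 2 != 0:
        raise ValueError("k must be a positive even number")
    return k ** 3 // 4


CUS_PER_HOST = 8  # assumption: the slide's "8 CUs" refers to each host

hosts = fat_tree_hosts(12)
print(hosts)                 # 432
print(hosts * CUS_PER_HOST)  # 3456
```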

  9. TETRIS Evaluation
     +5% utilization, +10% footprint

  10. TETRIS Evaluation
      Add Admission Control
      [Figure: plot with axes Num. VMs and Size VMs (CU), for Bandwidth (BU) values 1, 2, 4, 8]

  11. AHAB: The Case for Data-Driven Admission Control
      [Diagram labels, in extraction order: Data-Driven, Leverage, Monte Carlo, Decision, Tree Search, Knowledge]

  12. AHAB: The Case for Data-Driven Admission Control
      [Figure: empty cluster (used/total labels 0/4 and 0/2) and an arriving request with two branches, "accept" and "reject"]

  13. AHAB: The Case for Data-Driven Admission Control
      [Figure: decision tree branching on "accept" and "reject" for successive requests; cluster labels 1/4, 1/2 and 0/2; a utilization value of 12 shown on an accept path]

  14. AHAB: The Case for Data-Driven Admission Control
      Parameters: num. requests / sequence, num. sequences
      Decision: A > B? If so, "accept"
      Works with every VC embedding algorithm (Oktopus, Kraken, Tetris)
      [Figure: decision tree with "accept" and "reject" branches for the arriving request; the values 12 and B = 9 appear at the ends of the simulated sequences and are compared as A > B]
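
The accept/reject comparison on slides 12 to 14 can be sketched as a Monte Carlo rollout. This is a toy illustration under loud assumptions, not the authors' implementation: the cluster is a single pool of CUs, requests never depart, the embedding algorithm is a trivial capacity check, and all names (`make_embed`, `rollout`, `admit`, `sample`) are hypothetical. `num_sequences` and `seq_len` correspond to the slide's "num. sequences" and "num. requests / sequence".

```python
import random
from statistics import mean
from typing import Callable, List, Optional

State = int    # toy state: number of CUs currently in use
Request = int  # toy request: number of CUs demanded

Embed = Callable[[State, Request], Optional[State]]


def make_embed(capacity: int) -> Embed:
    """Toy stand-in for any VC embedding algorithm (Oktopus, Kraken, Tetris)."""
    def embed(used: State, req: Request) -> Optional[State]:
        return used + req if used + req <= capacity else None
    return embed


def rollout(state: State, future: List[Request], embed: Embed) -> State:
    """Greedily embed a sampled future sequence; return the final CUs in use."""
    for req in future:
        nxt = embed(state, req)
        if nxt is not None:
            state = nxt
    return state


def admit(state: State, request: Request, embed: Embed,
          sample: Callable[[random.Random], Request], rng: random.Random,
          num_sequences: int = 20, seq_len: int = 5) -> bool:
    """Accept the request only if the accept branch beats the reject branch."""
    accepted = embed(state, request)
    if accepted is None:
        return False  # the request cannot be embedded at all
    # Use the same sampled futures for both branches for a fair comparison.
    futures = [[sample(rng) for _ in range(seq_len)] for _ in range(num_sequences)]
    a = mean(rollout(accepted, f, embed) for f in futures)  # value if accepted
    b = mean(rollout(state, f, embed) for f in futures)     # value if rejected
    return a > b


rng = random.Random(0)
sample_from_history = lambda r: r.choice([1, 2, 4])  # stands in for trace data
print(admit(0, 3, make_embed(8), sample_from_history, rng))
```

Because `embed` is passed in as a function, the same admission logic can wrap any embedding algorithm, which is the property the slide highlights.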

  15. AHAB improves utilization
      +10% utilization, -25% footprint

  16. Why is AHAB better?
      [Figure: two panels, Kraken and AHAB(Kraken), each with axes Num. VMs and Size VMs (CU) for Bandwidth (BU) values 1, 2, 4, 8; annotations: Small VMs, Large VMs, Large BW, Small BW]

  17. Why is AHAB better?
      AHAB accepts more valuable requests
      [Figure: Acceptance Ratio over Size VM / BW, comparing Kraken & Tetris with AHAB]

  18. Optimization Opportunities
      Trade-Off: Utilization - Use ML for Computations speed-up

  19. Summary
      ▪ TETRIS sacrifices footprint to increase utilization
      ▪ AHAB employs a data-driven approach for Admission Control
      ▪ AHAB evaluates the impact of a single request on future requests
      ▪ AHAB’s approach also applies to other use cases
      ▪ Future Work: Use ML to predict AHAB’s decisions

  20. Thank you! Questions?
