AHAB: Data-Driven Virtual Cluster Hunting
Johannes Zerwas*, Patrick Kalmbach*, Carlo Fuerst°, Arne Ludwig°, Andreas Blenk*, Wolfgang Kellerer*, Stefan Schmid^
*Technical University of Munich, Germany; °Technical University of Berlin, Germany; ^University of Vienna, Austria
Chair of Communication Networks, Department of Electrical and Computer Engineering, Technical University of Munich
IFIP Networking 2018, Zurich, Switzerland
Context
• Increased use of data-intensive applications in shared data centers
• Many provider-tenant interfaces neglect the network as a resource
• Problems:
  − Unpredictable application performance
  − Limited applicability of the cloud
  − Inefficiencies in production data centers
• Solution: a network-aware abstraction, the Virtual Cluster (ACM SIGCOMM 2011)
[Figure: VM 1, VM 2, VM 3, ..., VM N, with "?" marks]
Johannes Zerwas (TUM) 2
Background: Virtual Cluster Abstraction
Physical Cluster
• Compute Units (CUs)
• Bandwidth Units (BUs)
• Tree-like topology (abstracted from Fat-Tree)
Virtual Cluster (VC)
• Number of VMs (N)
• Size of VMs (S)
• Bandwidth (B)
• Lifetime given resource fulfillment
[Figure: empty tree topology; link labels give used/total BUs: 0/8 at the top, 0/4 below, and 0/2 at each of the four lowest links]
Background: Virtual Cluster Abstraction (continued)
• Footprint F = 6
• Utilization U = 9/32
[Figure: the same tree topology with a VC embedded; used/total BUs per link: 0/8 at the top, 1/4 below, and 0/2, 1/2, 1/2, 1/2 at the four lowest links]
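Note: the bandwidth a VC needs on a single tree link can be sketched as below. This is a minimal sketch under two assumptions that the slide does not state: the hose-model rule of the Virtual Cluster abstraction (a link with m of the N VMs below it needs min(m, N − m) · B), and a footprint defined as allocated CUs plus reserved BUs. The names VirtualCluster, link_demand and footprint are hypothetical, and the example values are not those of the figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualCluster:
    n_vms: int      # N: number of VMs
    vm_size: int    # S: size of each VM in CUs
    bandwidth: int  # B: bandwidth per VM in BUs


def link_demand(vc: VirtualCluster, vms_below: int) -> int:
    """BUs needed on a link that has `vms_below` of the VC's VMs beneath it.

    Assumed hose-model rule: traffic crossing the link is bounded by the
    smaller of the two VM groups that the link separates.
    """
    return min(vms_below, vc.n_vms - vms_below) * vc.bandwidth


def footprint(vc: VirtualCluster, vms_below_each_link: list[int]) -> int:
    """Assumed definition: allocated CUs plus BUs reserved over all links."""
    compute = vc.n_vms * vc.vm_size
    network = sum(link_demand(vc, m) for m in vms_below_each_link)
    return compute + network


# Hypothetical example: 4 VMs of size 2 with bandwidth 3, placed two per host
# under one rack switch. The two host uplinks each carry 2 VMs below them,
# the rack uplink carries all 4 (so it needs no bandwidth for this VC).
vc = VirtualCluster(n_vms=4, vm_size=2, bandwidth=3)
total = footprint(vc, [2, 2, 4])
```

Placing all VMs under one switch keeps the higher links free, which is why footprint-minimizing embeddings tend to pack a VC into a small subtree.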
Problem: Resource Fragmentation
Existing allocation algorithms focus on a single request:
▪ Oktopus (ACM SIGCOMM 2011)
▪ Kraken (IEEE/ACM TON 2018)
Result: fragmentation of resources
Contribution 1: TETRIS - Sacrifice the footprint
Contribution 2: AHAB - Admission Control
[Figure: tree topology at times t = 1 and t = 2, with used/total labels per level (x/16, x/8, x/4)]
TETRIS: Sacrifice Footprint for Fragmentation
Choose hosts with max. ratio of residual resources
[Figure: tree topology at t = 1 with used/total labels per level (x/16, x/8, x/4); hosts are annotated with their ratio, e.g. (4 − 2) / (4 − 1) = 2/3]
TETRIS: Sacrifice Footprint for Fragmentation (continued)
• Resources still usable
[Figure: tree topology at t = 2 with used/total labels per level (x/16, x/8, x/4); hosts are annotated with ratio 1/1]
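Note: the host-selection rule "max. ratio of residual resources" could look roughly like the sketch below. The slides do not define the ratio precisely; this sketch assumes it is the residual CUs divided by the residual BUs on the host uplink after the VM is placed. Host, residual_ratio and pick_host are hypothetical names, and the handling of a zero denominator is an arbitrary choice of this sketch, not part of TETRIS as presented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Host:
    name: str
    free_cu: int  # residual compute units on the host
    free_bu: int  # residual bandwidth units on the host uplink


def residual_ratio(host: Host, vm_size: int, bandwidth: int) -> Optional[float]:
    """Assumed ratio: residual CUs over residual BUs after placing one VM.

    Returns None if the VM does not fit on this host.
    """
    cu_left = host.free_cu - vm_size
    bu_left = host.free_bu - bandwidth
    if cu_left < 0 or bu_left < 0:
        return None
    if bu_left == 0:
        # Arbitrary choice for this sketch: avoid division by zero.
        return float("inf") if cu_left > 0 else 0.0
    return cu_left / bu_left


def pick_host(hosts: list[Host], vm_size: int, bandwidth: int) -> Optional[Host]:
    """Return the fitting host with the maximum ratio, or None if none fits."""
    scored = [(residual_ratio(h, vm_size, bandwidth), h) for h in hosts]
    fitting = [(r, h) for r, h in scored if r is not None]
    if not fitting:
        return None
    return max(fitting, key=lambda pair: pair[0])[1]


# Hypothetical hosts; placing a VM of size 2 with bandwidth 1.
hosts = [Host("A", 4, 4), Host("B", 2, 4), Host("C", 4, 2), Host("D", 1, 4)]
best = pick_host(hosts, vm_size=2, bandwidth=1)
```

Host D is skipped because the VM does not fit, and host C wins because it keeps the most compute per remaining unit of bandwidth.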
Algorithm Evaluation
▪ Baselines: OKTOPUS (ACM SIGCOMM 2011), KRAKEN (IEEE/ACM TON 2018)
▪ Physical Cluster: Fat-Tree with k = 12, 8 CUs and 8 BUs
▪ Performance metrics: CU utilization, avg. VC footprint
▪ Virtual Cluster Requests:
  ▪ 1000 per run with varying arrival rates
  ▪ Num. VMs, VM size and BW similar to traces from Google & Microsoft
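Note: for orientation, the size of a standard k-ary Fat-Tree follows from k alone, as sketched below. The counts use the usual three-tier Fat-Tree construction (k pods, k/2 edge and k/2 aggregation switches per pod, (k/2)² core switches, k³/4 hosts). Reading "8 CUs" as a per-host capacity is an assumption, and fat_tree_size is a hypothetical helper name.

```python
def fat_tree_size(k: int) -> dict[str, int]:
    """Element counts of a standard three-tier k-ary Fat-Tree (k must be even)."""
    if k <= 0 or k % 2 != 0:
        raise ValueError("k must be a positive even number")
    half = k // 2
    return {
        "pods": k,
        "edge_switches": k * half,
        "aggregation_switches": k * half,
        "core_switches": half * half,
        "hosts": k * half * half,  # equals k**3 / 4
    }


size = fat_tree_size(12)
# Assumption: each host offers 8 CUs.
total_cus = size["hosts"] * 8
```

Under that assumption the evaluated cluster has 432 hosts and 3456 CUs in total.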
TETRIS Evaluation
• +5% utilization
• +10% footprint
TETRIS Evaluation
[Figure: axes Num. VMs and Size VMs (CU); Bandwidth (BU) values 1, 2, 4, 8]
Add Admission Control
AHAB: The Case for Data-Driven Admission Control
• Data-driven decision
• Leverage knowledge
• Monte Carlo Tree Search
AHAB: The Case for Data-Driven Admission Control
[Figure: empty tree topology (links 0/4, 0/4 and 0/2, 0/2, 0/2, 0/2) and an incoming request 1 with two options: accept or reject]
AHAB: The Case for Data-Driven Admission Control
[Figure: decision tree for request 1 with accept and reject branches; following the accept branch through further accept/reject decisions gives Utilization = 12]
AHAB: The Case for Data-Driven Admission Control
• Accept branch: A = 12
• Reject branch: B = 9
• Decision: A > B? If so, "accept"
• Parameters: Num. requests / sequence, Num. sequences
• Works with every VC embedding algorithm (Oktopus, Kraken, Tetris)
[Figure: decision tree for request 1; both the accept and the reject branch are expanded with sequences of further requests]
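Note: the accept-versus-reject comparison can be illustrated with a toy Monte Carlo sketch. It is not the authors' implementation: the cluster is reduced to a single pool of CUs, requests never depart, future requests are embedded greedily, and both branches are evaluated on the same sampled sequences. The names greedy_embed, rollout, admit and sample_request are hypothetical; num_sequences and num_requests mirror the two parameters named on the slide.

```python
import random
from typing import Callable, Optional

Embed = Callable[[int, int], Optional[int]]


def greedy_embed(free: int, request: int) -> Optional[int]:
    """Toy embedding: remaining capacity, or None if the request does not fit."""
    return free - request if request <= free else None


def rollout(free: int, used: int, future: list[int], embed: Embed) -> int:
    """Embed a sequence of future requests greedily; return the CUs in use."""
    for req in future:
        remaining = embed(free, req)
        if remaining is not None:
            used += free - remaining
            free = remaining
    return used


def admit(free: int, request: int,
          sample_request: Callable[[random.Random], int],
          num_sequences: int, num_requests: int,
          embed: Embed = greedy_embed,
          rng: Optional[random.Random] = None) -> bool:
    """Accept the request only if the accept branch scores higher (A > B)."""
    rng = rng or random.Random(0)
    after = embed(free, request)
    if after is None:
        return False  # the request cannot be embedded at all
    a_total = b_total = 0
    for _ in range(num_sequences):
        future = [sample_request(rng) for _ in range(num_requests)]
        a_total += rollout(after, free - after, future, embed)  # accept branch
        b_total += rollout(free, 0, future, embed)              # reject branch
    return a_total > b_total


# Hypothetical workload: future requests always need 5 CUs; 10 CUs are free.
# Accepting a request of size 3 leaves room for only one future request.
decision = admit(10, 3, lambda r: 5, num_sequences=4, num_requests=3)
```

Because the embedding step is passed in as a function, the same comparison can wrap any embedding algorithm, in line with the slide's remark about Oktopus, Kraken and Tetris.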
AHAB improves utilization
• +10% utilization
• -25% footprint
Why is AHAB better?
[Figure: two panels, Kraken and AHAB(Kraken); axes Num. VMs and Size VMs (CU); Bandwidth (BU) values 1, 2, 4, 8; annotated regions: small VMs with large BW, large VMs with small BW]
Why is AHAB better?
[Figure: Acceptance Ratio over Size VM / BW for Kraken & Tetris and for AHAB]
AHAB accepts more valuable requests
Optimization Opportunities
• Trade-off: utilization vs. computations
• Use ML for speed-up
Summary
▪ TETRIS sacrifices footprint to increase utilization
▪ AHAB employs a data-driven approach to admission control
▪ AHAB evaluates the impact of a single request on future requests
▪ AHAB's approach also applies to other use cases
▪ Future work: use ML to predict AHAB's decisions
Thank you! Questions?