DC-DRF: Adaptive Multi-Resource Sharing at Public Cloud Scale
ACM Symposium on Cloud Computing 2018
Ian A. Kash, Greg O’Shea, Stavros Volos

Public cloud DC hosting enterprise customers: O(100K) servers, mostly small tenants
Small customer: one VM accessing storage
• One VM in a compute server in the compute rack
• One VHD in a storage server in the storage rack
[Figure: compute and storage racks, with the resources along the path labeled TX1, VTOR1, VTORb, RXb, SSDb, TXb, VTORa, VTOR2, RX1]
Result: a multi-resource “demand vector”
• Encodes resource IDs and proportions
• Any element could be a bottleneck to performance
[Figure: demand vector over TX1, VTOR1, VTORb, RXb, SSDb, TXb, VTORa, VTOR2, RX1]
Demand vectors form a sparse demand matrix
• Columns are shared physical resources
• Rows are tenants’ demand vectors
• Entries are shown as fractions of a resource
• Large and very sparse: the DC matrix is 100K by 100K, and rows are mostly empty

       r0    r1    r2    r3    r4    r5    r6    r7    r8    r9
n0      -     -   1.0     -     -     -     -     -     -   .92
n1    .95     -   .47     -     -     -     -     -   1.0     -
n2    .54   1.0     -   .30   .33   .23   .55     -   .56   .31
n3      -   .41   .20   .12   .13   .09   .23   1.0   .23   .13
n4      -   1.0     -   .30     -   .23   .55     -     -   .31
n5      -   .41   .09   .12   1.0   .64   .23   .20   .13   .13
n6    .32     -   .09   .12   1.0   .64   .23   .20   .13   .13
n7      -     -     -     -     -     -   1.0     -     -   .57
n8      -     -   .56   .64   .20   .32   .13   .09   1.0   .23
n9    .90   .27   .45   .64   .20   .32   .13   .09   1.0   .56
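A sparse matrix like this is usually stored without its empty cells. The sketch below is one possible representation, not the one used by DC-DRF: a dictionary of per-tenant dictionaries holding the n0 and n1 rows from the slide. The helper `column_load` and the reading of a column sum above 1.0 as an oversubscribed resource are assumptions made for illustration.

```python
from collections import defaultdict

# Sparse demand matrix: only non-empty cells are stored.
# Rows n0 and n1 are taken from the slide; the layout itself is an assumption.
demand_matrix = {
    "n0": {"r2": 1.0, "r9": 0.92},
    "n1": {"r0": 0.95, "r2": 0.47, "r8": 1.0},
}


def column_load(matrix):
    """Sum the demanded fractions per resource (hypothetical helper)."""
    load = defaultdict(float)
    for vector in matrix.values():
        for resource, fraction in vector.items():
            load[resource] += fraction
    return dict(load)


load = column_load(demand_matrix)
# Assumed interpretation: total demand above 1.0 means the resource is contended.
oversubscribed = sorted(r for r, total in load.items() if total > 1.0)
print(oversubscribed)
```

With only these two rows, r2 is the one column where tenants compete, which is where an allocator has to decide who gets what.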
Provider has a multi-resource allocation problem
• Goal: maintain an acceptable service level for all tenants
  • Acceptable means always “willing to pay”
  • Avoid abrupt performance collapse for any tenant
  • Assuming aggressive (noisy) neighbors and oversubscription
• DC-DRF builds on existing multi-resource algorithms
  • DRF [Ghodsi et al., NSDI’11]
  • EDRF [Parkes et al., EC 2012]
• Challenging at DC scale: EDRF iterates and is …
Systems aspects
Systems challenges
• How to capture multi-resource demand vectors?
• How to enforce multi-resource allocations?
• DRF implies a central SDN-like controller: good or bad?
  • Good: simpler algorithm and global view
  • Bad: EDRF at public cloud DC scale
SIGCOMM 2015 demonstration
• Central controller running EDRF
  • Pass 1: reservation-based SLAs
  • Pass 2: work conservation of the residual
• 4 tenants, 30 VMs each, spread over 10 servers
• Read/write to 2x storage servers
• 40Gb RDMA switch
• Demand estimation and enforcement in Hyper-V
• Aggressive red tenant: performance collapses for the blue, yellow, and green tenants
[Video]
• What did we learn from the prototype? Potentially very powerful, but the EDRF algorithm does not scale well.
The algorithms
• To understand DC-DRF, first understand EDRF
• To understand DRF, first understand max-min
Max-min fairness: mice before elephants
• Maximize the minimum allocation across competing tenants
• Allocate fractions of a single shared resource based on demand
• No tenant gets a larger fraction than its demand
• Tenants with unsatisfiable demand obtain an equal share
[Figure: demand vs. allocated share for tenants A to D. Round 1: residual resource = 1.0, tenants remaining = 4, current share = 1.0/4 = 0.25. Round 2: residual resource = 0.7, tenants remaining = 2, current share = 0.7/2 = 0.35. Allocated: A = 0.1, B = 0.2, C = 0.35, D = 0.35.]
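The rounds in the figure can be reproduced with a short water-filling loop. This is a minimal sketch of textbook max-min fairness on one resource, not code from the talk. The demands for A and B follow the slide; the demands for C and D (0.5 and 0.6) are assumed, since the figure values did not survive extraction, and the function name `max_min_share` is hypothetical.

```python
def max_min_share(demands, capacity=1.0):
    """Max-min fair split of one resource (illustrative sketch)."""
    alloc = {}
    residual = capacity
    remaining = dict(demands)
    while remaining:
        # Equal share of what is left among tenants not yet served.
        share = residual / len(remaining)
        satisfied = {t: d for t, d in remaining.items() if d <= share}
        if not satisfied:
            # Every remaining demand exceeds the share: split equally and stop.
            for t in remaining:
                alloc[t] = share
            break
        # Serve the "mice" in full and return their unused share to the pool.
        for t, d in satisfied.items():
            alloc[t] = d
            residual -= d
            del remaining[t]
    return alloc


# A and B as on the slide; C and D are assumed values above the fair share.
demands = {"A": 0.1, "B": 0.2, "C": 0.5, "D": 0.6}
alloc = max_min_share(demands)
print(alloc)
```

In the first round the share is 0.25, so A and B are served in full. The residual 0.7 is then split between C and D at 0.35 each, matching the two rounds on the slide.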
How to handle multiple resources?
[Figure: the same 10-tenant by 10-resource demand matrix]
Dominant Resource Fairness (DRF)
• For each tenant, identify its dominant resource
  • The resource of which it demands the largest fraction
• Apply max-min fairness across dominant shares
  • Maximize the smallest dominant share in the system
  • Then the second smallest, and so on…
• Think: find the smallest mouse across all columns
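One common way to realize max-min over dominant shares is progressive filling: raise every active tenant's dominant share at the same rate until some resource saturates, freeze the tenants that use it, and repeat. The sketch below assumes demand vectors already normalized so the dominant resource has value 1.0, every resource has capacity 1.0, and demands are unbounded in volume. It is an illustration of the idea, not the EDRF or DC-DRF implementation, and all names are hypothetical.

```python
def progressive_filling(demands, eps=1e-9):
    """Equalize dominant shares by progressive filling (illustrative sketch)."""
    resources = {r for vec in demands.values() for r in vec}
    residual = {r: 1.0 for r in resources}
    share = {t: 0.0 for t in demands}
    # Tenants with no positive demand never become active.
    active = {t for t, vec in demands.items() if any(v > 0 for v in vec.values())}
    while active:
        # Resource consumed per unit increase of every active dominant share.
        load = {r: sum(demands[t].get(r, 0.0) for t in active) for r in resources}
        # Largest uniform increase before some resource runs out.
        step = min(residual[r] / load[r] for r in resources if load[r] > 0)
        for t in active:
            share[t] += step
        saturated = set()
        for r in resources:
            residual[r] -= step * load[r]
            if load[r] > 0 and residual[r] <= eps:
                residual[r] = 0.0
                saturated.add(r)
        # Freeze every tenant that needs a saturated resource.
        active = {t for t in active
                  if not any(demands[t].get(r, 0.0) > 0 for r in saturated)}
    return share


# Illustrative two-tenant, two-resource input (not from the talk).
two = progressive_filling({"A": {"cpu": 0.5, "mem": 1.0},
                           "B": {"cpu": 1.0, "mem": 1 / 6}})
# Two tenants share r0; a third uses only r1 and keeps growing after r0 fills.
three = progressive_filling({"t1": {"r0": 1.0}, "t2": {"r0": 1.0}, "t3": {"r1": 1.0}})
print(two, three)
```

In the first call the CPU saturates when both dominant shares reach 2/3. In the second, t1 and t2 freeze at 0.5 while t3 continues to 1.0, which is the "then second smallest, and so on" step. Each pass over all tenants and resources is also a hint of why iterating at 100K-by-100K scale is costly.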
Demand vectors normalized by dominant resource
[Figure: the same demand matrix; each row's largest entry is 1.0]
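Normalizing by the dominant resource amounts to dividing a tenant's vector by its largest entry. A minimal sketch follows, assuming a non-empty vector with a positive maximum; the raw fractions and the function name are hypothetical, chosen so that the result reproduces the n1 row of the matrix.

```python
def normalize_by_dominant(vector):
    """Return (dominant resource, vector scaled so the dominant entry is 1.0)."""
    dominant = max(vector, key=vector.get)
    peak = vector[dominant]  # assumed > 0
    return dominant, {r: v / peak for r, v in vector.items()}


# Hypothetical raw fractions of each resource's capacity for one tenant.
raw = {"r0": 0.19, "r2": 0.094, "r8": 0.2}
dominant, normalized = normalize_by_dominant(raw)
print(dominant, normalized)
```

After scaling, every row has a 1.0 in its dominant column, so a tenant's allocation is fully described by one number: its dominant share.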