SNC-Meister: Admitting More Tenants With Tail Latency SLOs

Authors: Timothy Zhu (Carnegie Mellon University), Daniel S. Berger (University of Kaiserslautern), Mor Harchol-Balter (Carnegie Mellon University). Presented by: Zane Ma & Shuo Feng.


  1. SNC-Meister: Admitting More Tenants With Tail Latency SLOs. Timothy Zhu (Carnegie Mellon University), Daniel S. Berger (University of Kaiserslautern), Mor Harchol-Balter (Carnegie Mellon University). Presented by: Zane Ma & Shuo Feng. (Footer on every slide: SNC-Meister: Admitting More Tenants with Tail Latency SLOs ▪︎ Zane Ma)

  2. Cloud Request Latency. High-performance cloud computing in a single datacenter (e.g., MapReduce, Heron, HDFS). Cloud networks provide latency service-level objectives (SLOs), which typically guarantee 99th- or 99.9th-percentile request latency rather than packet latency.

  3. Cloud Request Latency. High-performance cloud computing in a single datacenter (e.g., MapReduce, Heron, HDFS). Cloud networks provide latency service-level objectives (SLOs), which typically guarantee 99th- or 99.9th-percentile request latency rather than packet latency. Goal: achieving high tenancy while meeting tail latency SLOs.

  4. Latency Causes. Assumption: typical behavior (no hardware failures, flash crowds, etc.). Short-lived bursts caused by network queues and services. [Diagram: datacenter network in which Tenant VM 1 and Tenant VM 2 send through a queue at the Switch and a queue at the Server VM.]

  5. Modeling Latency. Deterministic Network Calculus (DNC): calculate fixed maximum rate/burst constraints from historical traces; consider the worst-case scenario from adversarial coordination (i.e., 100% latency). Used by Silo (SIGCOMM 2015), QJump (NSDI 2015), and PriorityMeister (SoCC 2014).

  6. Modeling Latency. Deterministic Network Calculus (DNC): calculate fixed maximum rate/burst constraints from historical traces; consider the worst-case scenario from adversarial coordination (i.e., 100% latency). Used by Silo (SIGCOMM 2015), QJump (NSDI 2015), and PriorityMeister (SoCC 2014). Stochastic Network Calculus (SNC): model maximum rate/burstiness as a probabilistic distribution; does not assume all tenants are adversarially correlated, which allows a lower target latency percentile (e.g., 99.9%).
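The gap between the two approaches can be illustrated with a toy calculation. This is only a sketch under assumptions that are not in the slides: 20 tenants, each bursting independently with probability 0.1 at any instant, and hypothetical helper names. A deterministic analysis provisions for every tenant bursting at once, while a stochastic analysis provisions for the 99.9th percentile of the number of simultaneous bursts.

```python
from math import comb

def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def percentile_bursting(n, p, target):
    """Smallest k such that at most k tenants burst with probability >= target."""
    for k in range(n + 1):
        if binomial_cdf(k, n, p) >= target:
            return k
    return n

# Assumed toy workload: independent tenants (real tenants may be correlated).
tenants, burst_prob = 20, 0.1

dnc_bursts = tenants                                            # worst case: all burst together
snc_bursts = percentile_bursting(tenants, burst_prob, 0.999)    # 99.9th percentile

print(dnc_bursts, snc_bursts)  # 20 7
```

Under these assumptions the 99.9% target needs capacity for 7 simultaneous bursts instead of 20, which is the kind of headroom that lets more tenants be admitted. The independence assumption is the toy's, not the paper's.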

  7. Modeling Latency. [Figure: Deterministic Network Calculus vs. Stochastic Network Calculus.]

  8. Modeling Latency. [Figure: Deterministic Network Calculus vs. Stochastic Network Calculus, labeled "99.9% latency".]

  9. Modeling Latency. [Figure: Deterministic Network Calculus vs. Stochastic Network Calculus, labeled "99.9% latency".]

  10. SNC Example. [Diagram: Tenant VM 1 and Tenant VM 2 send through a queue at the Switch, then a queue at the Server VM.]

  11. SNC Example. [Same diagram, adding arrival processes: A1 from Tenant VM 1, A2 from Tenant VM 2, and A3 entering the Server VM queue.]

  12. SNC Example. [Same diagram, adding service processes: S1 at the Switch and S2 at the Server VM.]

  13. SNC Example. [Same diagram.] Goal: get a 99% latency SLO bound between Tenant VM 1 and Server VM.

  14. SNC Example. [Same diagram.] Goal: get a 99% latency SLO bound between Tenant VM 1 and Server VM. Total latency = switch latency + server latency.

  15. SNC Example. [Same diagram.] Goal: get a 99% latency SLO bound between Tenant VM 1 and Server VM. Total latency = Latency(A1, S1, 0.99) + Latency(A3, S2, 0.99).

  16. SNC Example. [Same diagram.] Goal: get a 99% latency SLO bound between Tenant VM 1 and Server VM. Total latency = Latency(A1, S1, 0.99) + Latency(A3, S2, 0.99). S1 is slowed down by A2!

  17. SNC Example. [Same diagram, with S1 replaced by S’1.] Goal: get a 99% latency SLO bound between Tenant VM 1 and Server VM. Total latency = Latency(A1, S1, 0.99) + Latency(A3, S2, 0.99). S1 is slowed down by A2, so S’1 = Leftover(S1, A2).

  18. SNC Example. [Same diagram, with S1 replaced by S’1.] Goal: get a 99% latency SLO bound between Tenant VM 1 and Server VM. Total latency = Latency(A1, S’1, 0.99) + Latency(A3, S2, 0.99), where S’1 = Leftover(S1, A2).

  19. SNC Example. [Same diagram, with S1 replaced by S’1.] Goal: get a 99% latency SLO bound between Tenant VM 1 and Server VM. Total latency = Latency(A1, S’1, 0.99) + Latency(A3, S2, 0.99), where S’1 = Leftover(S1, A2) and A3 = Output(A1, S’1).

  20. SNC Example. [Same diagram, with S1 replaced by S’1.] Goal: get a 99% latency SLO bound between Tenant VM 1 and Server VM. Total latency = Latency(A1, S’1, 0.99) + Latency(A3, S2, 0.99), where S’1 = Leftover(S1, A2) and A3 = Output(A1, S’1). Adding latencies does not preserve the SLO percentile!

  21. SNC Example. [Same diagram, with S1 replaced by S’1.] Goal: get a 99% latency SLO bound between Tenant VM 1 and Server VM. Total latency = Latency(A1, S’1, 0.99) + Latency(A3, S2, 0.99), where S’1 = Leftover(S1, A2) and A3 = Output(A1, S’1). Adding latencies does not preserve the SLO percentile! Use Convolution(L1, L2, 0.99) instead.
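Why summing two 99% latencies does not yield a 99% bound can be checked on a small hand-built example. The distributions below are invented for illustration (100 equally likely scenarios, with each hop slow in a different scenario) and `percentile` is a hypothetical helper; none of this comes from the paper.

```python
from fractions import Fraction

def percentile(values, p):
    """Smallest x such that at least a fraction p of the equally likely values are <= x."""
    ordered = sorted(values)
    for x in ordered:
        if Fraction(sum(v <= x for v in ordered), len(ordered)) >= p:
            return x

# 100 equally likely scenarios; each hop is slow (100) in exactly one of them.
switch = [1] * 100
server = [1] * 100
switch[0] = 100   # the switch is slow in scenario 0
server[1] = 100   # the server is slow in scenario 1
total = [a + b for a, b in zip(switch, server)]

p99 = Fraction(99, 100)
naive_bound = percentile(switch, p99) + percentile(server, p99)        # 1 + 1 = 2
coverage = Fraction(sum(t <= naive_bound for t in total), len(total))  # 98/100
true_p99 = percentile(total, p99)                                      # 101

print(naive_bound, coverage, true_p99)  # 2 49/50 101
```

Each hop stays within its own 99% latency in 99 scenarios out of 100, but the two bad scenarios do not overlap, so the summed bound holds only 98% of the time. This is the general union-bound effect, and it is why the per-hop results have to be combined with an operator that accounts for the percentile.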

  22. SNC Operators (operator: meaning). Latency(A, S, N): N% latency for a given A, S. Leftover(S, A): S adjusted/reduced by A. Output(A, S): resultant output distribution of A and S. Convolution(L1, L2): combine latencies L1, L2. Aggregation(A1, A2): multiplexed A1 and A2.
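The slides do not give the mathematical form of these operators. As a rough sketch only, their deterministic network calculus counterparts can be written down for token-bucket arrivals and rate-latency servers, using the standard textbook bounds. The class names `Arrival` and `Service`, the numbers, and the units are illustrative assumptions, not SNC-Meister's implementation, and the results are worst-case (100%) bounds rather than percentiles.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Arrival:
    """Token bucket: at most burst + rate * t units arrive in any window of length t."""
    rate: float
    burst: float

@dataclass(frozen=True)
class Service:
    """Rate-latency server: at least rate * (t - delay) units are served in time t."""
    rate: float
    delay: float

def aggregation(a1, a2):
    """Multiplex two flows: rates and bursts add."""
    return Arrival(a1.rate + a2.rate, a1.burst + a2.burst)

def leftover(s, a):
    """Service left for other traffic when flow `a` may be served first."""
    rate = s.rate - a.rate
    if rate <= 0:
        raise ValueError("competing flow saturates the server")
    return Service(rate, (a.burst + s.rate * s.delay) / rate)

def latency(a, s):
    """Worst-case delay of flow `a` at server `s` (deterministic, not a percentile)."""
    if a.rate > s.rate:
        raise ValueError("flow exceeds service rate; delay is unbounded")
    return s.delay + a.burst / s.rate

def output(a, s):
    """Arrival bound for flow `a` after leaving server `s`: burstiness grows."""
    return Arrival(a.rate, a.burst + a.rate * s.delay)

# Assumed toy numbers following the example topology.
A1, A2 = Arrival(rate=2.0, burst=4.0), Arrival(rate=2.0, burst=8.0)
S1, S2 = Service(rate=10.0, delay=0.0), Service(rate=4.0, delay=0.0)

S1_prime = leftover(S1, A2)                       # Service(rate=8.0, delay=1.0)
A3 = output(A1, S1_prime)                         # Arrival(rate=2.0, burst=6.0)
total = latency(A1, S1_prime) + latency(A3, S2)   # 1.5 + 1.5
print(total)  # 3.0
```

Because these deterministic bounds hold in every case, adding the per-hop delays is valid here. In the stochastic setting each hop's bound can be violated with some probability, so the per-hop results cannot simply be summed.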

  23. SNC Implementation Challenges. SNC order-of-operations optimizations. Tunable dependencies between tenants. Modeling burstiness: Markov Modulated Poisson Process. Programming language abstraction for applying SNC operators.

  24. SNC Implementation Challenges. SNC order-of-operations optimizations. Tunable dependencies between tenants. Modeling burstiness: Markov Modulated Poisson Process. Programming language abstraction for applying SNC operators.

  25. SNC Implementation Challenges. SNC order-of-operations optimizations. Tunable dependencies between tenants. Modeling burstiness: Markov Modulated Poisson Process (switching between high and low phases). Programming language abstraction for applying SNC operators.

  26. SNC Implementation Challenges. SNC order-of-operations optimizations. Tunable dependencies between tenants. Modeling burstiness: Markov Modulated Poisson Process. Programming language abstraction for applying SNC operators.
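A Markov Modulated Poisson Process that switches between a low and a high phase can be simulated in a few lines. This is a minimal sketch: the function name `simulate_mmpp`, the two-phase restriction, and all rates are assumptions chosen for illustration, not parameters from the paper.

```python
import random

def simulate_mmpp(rates, switch_rates, horizon, seed=0):
    """Simulate a two-phase MMPP and return sorted arrival times in [0, horizon).

    rates[i]        -- Poisson arrival rate while in phase i (0 = low, 1 = high)
    switch_rates[i] -- rate of leaving phase i (mean dwell time is 1 / switch_rates[i])
    """
    rng = random.Random(seed)
    t, phase, arrivals = 0.0, 0, []
    while t < horizon:
        # Exponentially distributed time spent in the current phase.
        end = min(t + rng.expovariate(switch_rates[phase]), horizon)
        # Poisson arrivals at the current phase's rate until the phase ends.
        next_arrival = t + rng.expovariate(rates[phase])
        while next_arrival < end:
            arrivals.append(next_arrival)
            next_arrival += rng.expovariate(rates[phase])
        # Switch to the other phase; memorylessness lets arrivals restart cleanly.
        t, phase = end, 1 - phase
    return arrivals

# Assumed toy parameters: long quiet phases, short bursts at 20x the rate.
arrivals = simulate_mmpp(rates=(1.0, 20.0), switch_rates=(0.5, 2.0), horizon=100.0, seed=1)
print(len(arrivals), "arrivals in 100 time units")
```

Fitting such a model to a tenant's trace means estimating the per-phase rates and the switching rates; the sketch only covers generating traffic from a given set of parameters.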

  27. Experimental Setup. Silo: DNC, fixed 1.5Kb bursts, trial-and-error manual bandwidth selection. Silo++: Silo with dynamic bandwidth selection. QJump: manual priority class assignment. QJump++: QJump with automatically assigned priority class. PriorityMeister: automatically derived rates from tenant trace. Workload: real 2015 production traces from a large internet company.

  28. Results. [Figures: more tenants; high network utilization.]

  29. Results. [Figures: number of tenants scales to cluster size; scales to high SLO %.]

  30. Future Work / Discussion. (1) Bootstrapping representative historical traces/logs is a chicken-and-egg problem. How can we improve the process? (2) How can we build fault tolerance into SNC-Meister? Any practical SLO mechanism should account for as many failure scenarios as possible. (3) The paper makes an assumption about latency within a single datacenter. Why do we need this assumption, and what happens if it is not met? (4) When most of the tenants are dependent on one another, why does SNC show higher latency than DNC?

  31. Backup Slides

  32. SNC Operators
