Kairos: Preemptive Data Center Scheduling Without Runtime Estimates

  1. Kairos: Preemptive Data Center Scheduling Without Runtime Estimates • Pamela Delgado, Diego Didona, Florin Dinu and Willy Zwaenepoel • Symposium on Cloud Computing (SoCC) • October 11, 2018

  2. Kairos: data center scheduling without task runtime estimates (Kairos: Preemptive Data Center Scheduling Without Runtime Estimates | SoCC’18)

  3. Kairos key idea • New preemption approach ✓ No head-of-line blocking ✓ Good scheduling performance

  4. Data center scheduling challenge • Heavy-tailed workloads [Diagram: a scheduler dispatching work to a cluster]

  5. Problem: head-of-line blocking • Short tasks waiting behind long ones • High likelihood

  6. Historical use of runtime estimates • Categories: per-task estimations, dual classification, no estimations • Systems shown: Yarn’13, Sparrow’13, Apollo’14, Hawk’15, Mercury*’15, Borg’15, Yaq’16, Tetrisched’16, Eagle’16, Firmament’16, … • Callouts: "Do not avoid head-of-line!" and "Depend on runtime estimates"

  7. Hard to obtain reliable estimates • Mis-estimations happen: unseen jobs, skewed input, failures/spikes • Consequences: poor scheduling decisions*, violated SLOs^, complex designs to compensate • *Job-aware scheduling in Eagle: Divide and Stick to Your Probes (SoCC’16) • ^Tetrisched: global rescheduling with adaptive plan-ahead in dynamic heterogeneous clusters (EuroSys’16)

  8. Can we dispense with task runtime estimates altogether?

  9. Can we dispense with task runtime estimates altogether? • Kairos: ✓ Avoid head-of-line blocking ✓ No task runtime estimates

  10. Kairos insight: Use preemption!

  11. Preemption in Kairos • Preempt long! • Costly resuming elsewhere: do preemption locally!

  12. Kairos architecture • Centralized component: Kairos centralized scheduler (load balancing) • Distributed component: a Kairos node scheduler on each node (Node j, Node x, Node y, …), each doing local preemption

  15. Least-Attained Service (LAS) • Preemptive policy • Give resources to the task that has received the least service ✓ A new task runs immediately ✓ It keeps running as long as it is the task with the least received service

  16. LAS rationale • Good for heavy-tailed workloads* • Benefits: 1. Shorter tasks have priority (no head-of-line blocking) 2. Shorter tasks very likely execute until completion • *Performance Modeling and Design of Computer Systems: Queueing Theory in Action, M. Harchol-Balter, 2013
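The LAS policy on the two slides above can be illustrated with a small single-core simulation. This is only a sketch under stated assumptions: the function name `simulate_las`, the fixed time `quantum`, and the alphabetical tie-break are hypothetical choices made for the example, and the slides do not say how Kairos measures attained service or how often it preempts.

```python
def simulate_las(tasks, quantum=1):
    """Simulate Least-Attained Service on one core.

    tasks: list of (name, arrival_time, service_demand) with integer times.
    Returns a dict mapping task name to completion time.
    """
    pending = sorted(tasks, key=lambda t: t[1])  # not yet arrived, by arrival
    attained = {}   # service received so far, per active task
    remaining = {}  # service still needed, per active task
    finish = {}
    clock = 0
    i = 0
    while i < len(pending) or remaining:
        # Admit every task that has arrived by now; it starts with 0 service.
        while i < len(pending) and pending[i][1] <= clock:
            name, _, demand = pending[i]
            attained[name] = 0
            remaining[name] = demand
            i += 1
        if not remaining:
            clock = pending[i][1]  # core is idle: jump to the next arrival
            continue
        # LAS: run the task with the least attained service
        # (ties broken by name, an assumption made only for determinism).
        name = min(remaining, key=lambda n: (attained[n], n))
        run = min(quantum, remaining[name])
        clock += run
        attained[name] += run
        remaining[name] -= run
        if remaining[name] == 0:
            finish[name] = clock
            del remaining[name]
    return finish


if __name__ == "__main__":
    # A long task is already running when a short one arrives at t=2.
    print(simulate_las([("long", 0, 10), ("short", 2, 3)]))
```

In this run the short task preempts the long one as soon as it arrives and finishes at t=6, whereas under FIFO it would have waited until t=13; the long task still completes at t=13.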

  17. Kairos distributed scheduling • Node schedulers • LAS at the nodes • How to dispatch tasks among nodes?

  19. Kairos centralized scheduling • 1st: Load balancing • 2nd: Maximize LAS effectiveness [Diagram: the centralized scheduler deciding which node scheduler (Node j, Node x, Node y, …) receives a task, given the number of tasks at each node]

  20. Load balancing rationale 1. Lowest # tasks: no idle nodes • Bound max # tasks • Avoid: Node j with 0 tasks while Node y has 100 tasks

  21. Load balancing rationale 2. LAS-aware policy to break ties: • Heavy-tailed for each node • Maximize LAS effectiveness • Node with lowest AS (attained service) variance* • Avoid: Node j with only short tasks while Node y has only long tasks • *Minimizing total flow time and total completion time with immediate dispatching, Avrahami et al., 2003; Multi-layered round robin routing for parallel servers, Down et al., 2006
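The two dispatching rules on the load balancing slides (fewest tasks first, then lowest attained-service variance as the tie-break) could be sketched as follows. This is an illustration only: `pick_node`, its `max_tasks` parameter, and the dict-of-lists representation are hypothetical, and the slides do not specify how the centralized scheduler learns the attained service of tasks on each node.

```python
from statistics import pvariance


def pick_node(nodes, max_tasks=None):
    """Choose a node for a new task.

    nodes: dict mapping node id -> list of attained-service values,
           one per task currently on that node.
    max_tasks: optional bound on tasks per node (assumed parameter).
    Returns the chosen node id, or None if every node is at the bound.
    """
    # Respect the bound on the maximum number of tasks per node.
    candidates = {
        n: s for n, s in nodes.items()
        if max_tasks is None or len(s) < max_tasks
    }
    if not candidates:
        return None

    # Rule 1: prefer nodes with the lowest number of tasks (no idle nodes).
    fewest = min(len(s) for s in candidates.values())
    tied = [n for n, s in candidates.items() if len(s) == fewest]

    # Rule 2: among those, take the node with the lowest AS variance.
    def as_variance(node_id):
        service = candidates[node_id]
        # pvariance needs at least one value; treat an empty node as 0.
        return pvariance(service) if service else 0.0

    # Node id is a final tie-break, only to make the result deterministic.
    return min(tied, key=lambda n: (as_variance(n), n))


if __name__ == "__main__":
    cluster = {"j": [1.0, 50.0], "x": [5.0, 6.0], "y": [1.0, 2.0, 3.0]}
    print(pick_node(cluster))  # j and x tie on task count; x has lower variance
```

A new task starts with zero attained service, so placing it on the most homogeneous of the least-loaded nodes is one way to read the slide's goal of keeping a mix of short and long tasks on each node.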

  22. Kairos recap 1. Distributed: ✓ LAS at node level 2. Centralized: ✓ LAS-aware load balancing technique

  23. Evaluation • Yarn and Docker containers • 120 cores in 30 nodes • Heavy-tailed workload (100 jobs) • Metrics: job runtime and slowdown • Compare to: Big-C [ATC’17], FIFO • Simulation: Google trace, compare to Eagle [SoCC’16]

  24. What is the slowdown? $\text{job slowdown} = \dfrac{\text{observed job runtime}}{\text{uncontended job runtime}}$ • Best job slowdown = 1
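As a small worked example of the metric, slowdown is the ratio of a job's observed runtime to its uncontended runtime. The helper name `job_slowdown` and the sample numbers below are made up for illustration and do not come from the evaluation.

```python
def job_slowdown(observed_runtime, uncontended_runtime):
    """Return observed runtime divided by uncontended runtime.

    A value of 1 means the job ran as fast as it would on an idle cluster.
    """
    if uncontended_runtime <= 0:
        raise ValueError("uncontended runtime must be positive")
    return observed_runtime / uncontended_runtime


if __name__ == "__main__":
    # Hypothetical job: 100 s when run alone, 180 s under contention.
    print(job_slowdown(180.0, 100.0))
```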

  25. Kairos vs Big-C and FIFO: job slowdown [Plot: CDF of job slowdown for Kairos, Big-C and FIFO; x-axis: job slowdown, 0 to 20]

  26. Kairos vs Big-C and FIFO: job slowdown • Slowdown in Kairos <1.8X [Plot: CDF of job slowdown for Kairos, Big-C and FIFO; x-axis: job runtime/expected job runtime, 0 to 20]

  27. Kairos vs Big-C and FIFO: job running times [Plot: CDF of job runtime for Kairos, Big-C and FIFO; x-axis: job runtime [s], 0 to 4000; annotations: 1.6X, 2X, 2.3X]

  28. Kairos vs Big-C and FIFO: job running times • Kairos better across the board [Plot: CDF of job runtime for Kairos, Big-C and FIFO; x-axis: job runtime [s], 0 to 4000; annotations: 1.6X, 2X, 2.3X]

  29. Kairos vs Eagle • Short jobs runtime • Google trace [Plot: Kairos/Eagle ratio at the 50th, 90th and 99th percentiles; x-axis: cluster load, lower to higher; y-axis: 0 to 1.4]

  30. Kairos vs Eagle • Short jobs runtime • Google trace • Kairos works well at large scale [Plot: Kairos/Eagle ratio at the 50th, 90th and 99th percentiles; x-axis: cluster load, lower to higher; y-axis: 0 to 1.4]
