Symposium on Cloud Computing (SoCC)
Kairos: Preemptive Data Center Scheduling Without Runtime Estimates
Pamela Delgado, Diego Didona, Florin Dinu and Willy Zwaenepoel
October 11, 2018

Kairos
Data center scheduling without task runtime estimates
Kairos: Preemptive Data Center Scheduling Without Runtime Estimates | SoCC’18

Kairos key idea
• New preemption approach
✓ No head-of-line blocking
✓ Good scheduling performance

Data center scheduling challenge
• Heavy-tailed workloads
[Diagram: scheduler and cluster]

Problem: head-of-line blocking
• Short tasks wait behind long tasks
• High likelihood of blocking

Historical use of runtime estimates
• Three approaches: per-task estimations, dual classification, no estimations
• Schedulers shown: Yarn’13, Sparrow’13, Apollo’14, Hawk’15, Mercury*’15, Borg’15, Yaq’16, Tetrisched’16, Eagle’16, Firmament’16, …
• Schedulers with no estimations do not avoid head-of-line blocking!
• The others depend on runtime estimates

Hard to obtain reliable estimates
• Mis-estimations happen
  • unseen jobs, skewed input, failures/spikes
• Consequences:
  • poor scheduling decisions*, SLO violations^
  • complex designs to compensate
* Job-aware scheduling in Eagle: Divide and Stick to Your Probes (SoCC’16)
^ Tetrisched: global rescheduling with adaptive plan-ahead in dynamic heterogeneous clusters (EuroSys’16)

Can we dispense with task runtime estimates altogether?
Kairos:
✓ Avoid head-of-line blocking
✓ No task runtime estimates

Kairos insight
Use preemption!

Preemption in Kairos
• Preempt long tasks!
• Resuming a task elsewhere is costly: do preemption locally!

Kairos architecture
• Centralized component: Kairos centralized scheduler (load balancing)
• Distributed component: Kairos node schedulers on nodes j, x, y, … (local preemption)

Least-Attained Service (LAS)
• Preemptive policy
• Give resources to the task that has received the least service
✓ A new task runs immediately
✓ It runs as long as it is the one with the least received service

LAS rationale
• Good for heavy-tailed workloads*
• Benefits:
1. Shorter tasks have priority (no head-of-line blocking)
2. Shorter tasks, very likely, execute until completion
* M. Harchol-Balter, Performance Modeling and Design of Computer Systems: Queueing Theory in Action, 2013

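To make the LAS rule concrete, here is a minimal single-core simulation. It is only a sketch under stated assumptions, not the Kairos node scheduler: all tasks arrive at time zero, service is handed out in fixed quanta, and the names `Task`, `las_single_core` and `quantum` are hypothetical. The `demand` field is used only to detect completion; the scheduling decision itself looks at attained service alone, so no runtime estimate is needed.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class Task:
    name: str
    demand: float          # total service needed; hidden from the scheduling decision
    attained: float = 0.0  # service received so far


def las_single_core(tasks, quantum=1.0):
    """Simulate LAS on one core and return {task name: completion time}.

    Assumption: every task is present at time zero.
    """
    pending = list(tasks)
    clock = 0.0
    finished = {}
    while pending:
        # LAS rule: serve the task that has received the least service so far.
        # On a tie, min() keeps the task that appears first in the list.
        current = min(pending, key=lambda t: t.attained)
        remaining = current.demand - current.attained
        step = min(quantum, remaining)
        current.attained += step
        clock += step
        if step >= remaining:  # the task has received all the service it needs
            finished[current.name] = clock
            pending.remove(current)
    return finished
```

With a long task (demand 10) listed before a short one (demand 2), `las_single_core([Task("long", 10.0), Task("short", 2.0)])` completes the short task at time 4 and the long one at time 12. A FIFO queue with the same order would hold the short task until time 12.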
Kairos distributed scheduling
• Node schedulers
• LAS at the nodes
How to dispatch tasks among nodes?

Kairos centralized scheduling
Which node (j, x, y, …) should receive the next task?
• 1st: Load balancing
• 2nd: Maximize LAS effectiveness

Load balancing rationale
1. Lowest # tasks: no idle nodes
• Bound max # tasks
Avoid! Node j with 0 tasks while node y has 100 tasks

Load balancing rationale
2. LAS-aware policy to break ties:
• Heavy-tailed for each node
• Maximize LAS effectiveness
• Node with lowest AS (attained service) variance*
Avoid! Node j with only short tasks while node y has only long tasks
* Avrahami et al., Minimizing total flow time and total completion time with immediate dispatching, 2003
* Down et al., Multi-layered round robin routing for parallel servers, 2006

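The two dispatching rules can be sketched as a single ranking: fewest tasks first, then lowest variance of attained service as the tie-breaker, taken literally from the slide. This is an illustrative sketch, not the Kairos implementation. The function name `pick_node`, the `max_tasks` parameter, the dictionary representation of nodes and the use of population variance are all assumptions.

```python
from statistics import pvariance


def pick_node(nodes, max_tasks=None):
    """Return the name of the node that should receive the next task.

    nodes maps a node name to the attained-service values of the tasks
    currently on that node (a hypothetical representation).
    """
    # Optionally skip nodes that already hold max_tasks tasks (bounded load).
    candidates = {
        name: attained
        for name, attained in nodes.items()
        if max_tasks is None or len(attained) < max_tasks
    }
    if not candidates:
        return None  # every node is at the bound; what happens next is left open here

    def rank(item):
        _, attained = item
        # pvariance needs at least one value; treat an empty node as variance 0.
        variance = pvariance(attained) if attained else 0.0
        # 1st criterion: number of tasks. 2nd criterion: attained-service variance.
        return (len(attained), variance)

    return min(candidates.items(), key=rank)[0]
```

For example, `pick_node({"j": [5.0, 5.0], "x": [1.0, 9.0], "y": [1.0, 2.0, 3.0]})` returns `"j"`: nodes j and x tie on task count, and j has the lower variance. An idle node always wins, since it has the lowest task count.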
Kairos recap
1. Distributed:
✓ LAS at the node level
2. Centralized:
✓ LAS-aware load balancing technique

Evaluation
• Yarn and Docker containers
• 120 cores in 30 nodes
• Heavy-tailed workload (100 jobs)
• Metrics: job runtime and slowdown
• Compare to: Big-C [ATC’17], FIFO
• Simulation: Google trace, compare to Eagle [SoCC’16]

What is the slowdown?
$$\text{job slowdown} = \frac{\text{observed job runtime}}{\text{uncontended job runtime}}$$
Best job slowdown = 1

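As a small worked example, slowdown is the ratio of a job's observed runtime to its uncontended runtime, so a job that runs without any contention scores 1. The helper name `job_slowdown` and the numbers in the usage note are hypothetical and are not taken from the evaluation.

```python
def job_slowdown(observed_runtime, uncontended_runtime):
    """Ratio of observed to uncontended runtime; 1.0 is the best possible value."""
    if uncontended_runtime <= 0:
        raise ValueError("uncontended runtime must be positive")
    return observed_runtime / uncontended_runtime
```

A job that would take 100 s alone but takes 150 s in the shared cluster has `job_slowdown(150.0, 100.0) == 1.5`.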
Kairos vs Big-C and FIFO: job slowdown
[Figure: CDF of job slowdown (job runtime / expected job runtime) for Kairos, Big-C and FIFO; lower slowdown is better]
• Slowdown in Kairos < 1.8X

Kairos vs Big-C and FIFO: job running times
[Figure: CDF of job runtime [s] for Kairos, Big-C and FIFO; annotations: 1.6X, 2X, 2.3X; lower runtime is better]
• Kairos better across the board

Kairos vs Eagle
• Short jobs runtime
• Google trace
[Figure: Kairos/Eagle ratio at the 50th, 90th and 99th percentiles, with cluster load going from lower to higher; a lower ratio is better]
• Kairos works well at large scale
