Heterogeneity-Aware Resource Allocation and Scheduling in the Cloud Gunho Lee (UC Berkeley) Byung-Gon Chun (Yahoo! Research) Randy H. Katz (UC Berkeley)
We have resources and jobs/tasks
Allocate resources (slots)
Then schedule jobs/tasks on them
Goal 1. Minimize the cluster size while providing good performance (dynamic resource allocation)
Goal 2. Provide each job with a “fair share” of resources (fair scheduling)
Heterogeneity makes the problem more complex: how should resources be allocated, and how should jobs be scheduled?
Our Approach
• Consider job affinity to match more suitable resources to jobs
• Redefine the share metric to provide fairness
• Allocation – Core Nodes + Accelerator Nodes
• Scheduling – Progress Share
Fair Share Metric
• The scheduler tries to equalize the “share” of all jobs
  – SlotShare: number of slots owned
    • Does not work well in heterogeneous environments
  – ProgressShare: progress being made with owned slots / progress with all slots
    • Captures the contribution of a slot to a job’s progress rate
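The two metrics can be compared on a toy cluster. The sketch below is illustrative only: the function names, the slot layout (2 gray and 8 white slots) and Job A's per-slot progress rates (4.0 on gray, 1.0 on white) are assumptions made for the example, not values from the talk.

```python
from typing import Dict, List


def slot_share(owned: List[int], slot_types: List[str]) -> float:
    """SlotShare: fraction of the cluster's slots that the job owns."""
    return len(owned) / len(slot_types)


def progress_share(owned: List[int], slot_types: List[str],
                   rate: Dict[str, float]) -> float:
    """ProgressShare: progress rate on owned slots / progress rate on all slots."""
    total = sum(rate[t] for t in slot_types)          # job runs alone on the cluster
    mine = sum(rate[slot_types[s]] for s in owned)    # job runs on its own slots only
    return mine / total


# Hypothetical cluster: slots 0-1 are gray, slots 2-9 are white.
slot_types = ["gray"] * 2 + ["white"] * 8
# Hypothetical progress rate of Job A per slot type (A prefers gray slots).
rate_a = {"gray": 4.0, "white": 1.0}

white_only = [2, 3, 4, 5, 6]   # five white slots
gray_only = [0, 1]             # the two gray slots

print(slot_share(white_only, slot_types), progress_share(white_only, slot_types, rate_a))
print(slot_share(gray_only, slot_types), progress_share(gray_only, slot_types, rate_a))
```

Under these assumed rates, owning half of the slots (all white) gives Job A a slot share of 0.5 but a progress share of only 0.3125, while owning just the two gray slots gives a slot share of 0.2 and a progress share of 0.5. This is the sense in which SlotShare is misleading in a heterogeneous cluster.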
Progress Share
[Figure: progress (0 to 1) versus time. One line shows progress without sharing (1 job); a second line shows “just good” progress with sharing (2 jobs). The region above the second line is marked “even better” and the region below it “under-served”. Slopes are labeled a and b.]
Progress Share of Job A = ratio of progress slopes (b/a)
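The slope ratio can be estimated from sampled progress curves. The following is a minimal sketch, assuming that b is the slope of the job's progress while sharing and a is its slope when running alone (the figure labels do not survive in this transcript, so that reading is an assumption), and that progress is roughly linear between the first and last sample. The helper names and sample values are hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (time, progress in [0, 1])


def slope(samples: List[Sample]) -> float:
    """Average progress rate between the first and last sample."""
    (t0, p0), (t1, p1) = samples[0], samples[-1]
    return (p1 - p0) / (t1 - t0)


def progress_share_from_curves(shared: List[Sample], alone: List[Sample]) -> float:
    """Ratio b/a: slope while sharing divided by slope without sharing."""
    return slope(shared) / slope(alone)


# Hypothetical measurements for Job A.
alone = [(0.0, 0.0), (8.0, 1.0)]     # finishes in 8 time units when run alone
shared = [(0.0, 0.0), (8.0, 0.25)]   # only 25% done after 8 units when sharing

share_a = progress_share_from_curves(shared, alone)
print(share_a)
```

With two jobs, a value below 0.5 would fall in the "under-served" region of the figure, and a value above 0.5 in the "even better" region.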
Homogeneous case
[Figure: slot share over time, together with the progress and progress share of Job A and Job B over time.]
Heterogeneous case: Job A runs faster on gray slots
[Figure: grid of slots assigned to Job A and Job B, with progress versus time for each job.]
Heterogeneous case 1: Using SlotShare
Job A is making less progress with the same number of slots.
[Figure: slot assignment grid for Jobs A and B, slot share over time, and the progress and progress share of Job A and Job B over time.]
Heterogeneous case 2: Using ProgressShare
Both jobs are making progress >= 0.5.
[Figure: slot assignment grid for Jobs A and B, slot share over time, and the progress and progress share of Job A and Job B over time.]
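One way to picture scheduling by progress share is a greedy loop that always serves the job with the lowest progress share. The sketch below is not the authors' scheduler: the greedy rule, the affinity measure (a job's rate on a slot type relative to the sum of all jobs' rates on that type), the function names and the rates are all assumptions chosen to illustrate the idea.

```python
from typing import Dict, List, Tuple


def assign_slots(slot_types: List[str],
                 rates: Dict[str, Dict[str, float]]
                 ) -> Tuple[Dict[int, str], Dict[str, float]]:
    """Greedily hand each free slot to the job with the lowest progress share."""
    # Progress rate each job would achieve if it owned the whole cluster.
    total = {j: sum(r[t] for t in slot_types) for j, r in rates.items()}
    share = {j: 0.0 for j in rates}
    owner: Dict[int, str] = {}
    free = set(range(len(slot_types)))

    def affinity(job: str, slot: int) -> float:
        # Hypothetical affinity: how much better this job uses the slot
        # type compared with all jobs combined.
        t = slot_types[slot]
        return rates[job][t] / sum(r[t] for r in rates.values())

    while free:
        # Most under-served job first (ties broken by job name).
        job = min(share, key=lambda j: (share[j], j))
        # Give it the free slot it has the highest affinity for.
        best = max(free, key=lambda s: (affinity(job, s), -s))
        owner[best] = job
        share[job] += rates[job][slot_types[best]] / total[job]
        free.remove(best)
    return owner, share


# Hypothetical cluster and rates: Job A runs 4x faster on gray slots.
slot_types = ["gray"] * 2 + ["white"] * 8
rates = {
    "A": {"gray": 4.0, "white": 1.0},
    "B": {"gray": 1.0, "white": 1.0},
}

owner, share = assign_slots(slot_types, rates)
print(owner)
print(share)
```

In this toy setup Job A ends up with both gray slots and fewer slots overall than Job B, yet both jobs reach a progress share of at least 0.5, which mirrors the outcome described on the slide.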
Performance Gain of Using Progress Share
Summary
• Heterogeneity should be taken into account at both levels of two-level scheduling
  – Resource allocation and job scheduling
• “Share” needs to be redefined to provide performance and fairness simultaneously in heterogeneous environments
  – We propose “progress share”
• Future Work
  – Combine with a sub-linear performance model
  – Consider interference between co-located jobs