THE LINUX SCHEDULER: A DECADE OF WASTED CORES

Jean-Pierre Lozi (jplozi@unice.fr)
Baptiste Lepers (baptiste.lepers@epfl.ch)
Fabien Gaud (me@fabiengaud.net)
Vivien Quéma (vivien.quema@imag.fr)
Alexandra Fedorova (sasha@ece.ubc.ca)
Justin Funston (jfunston@ece.ubc.ca)
INTRODUCTION

Take a machine with a lot of cores (64 in our case)

Run two CPU-intensive processes in two terminals (e.g. R scripts):

    R < script.R --nosave &
    R < script.R --nosave &

Compile your kernel in a third terminal:

    make -j 62 kernel

Here is what might happen:

    Two NUMA nodes with many idle cores (white)
    Other NUMA nodes with many overloaded cores (orange, red)

Performance degradation: 14% for the make process!
INTRODUCTION

General-purpose schedulers aim to be work-conserving on multicore architectures

Basic invariant: no idle cores if some cores have several threads in their runqueues

    Violations can actually happen, but only in transient situations!

We found four major bugs that break this invariant in the Linux scheduler (CFS)!

This talk: presentation of the CFS scheduler + issues we found + discussion

Disclaimer: this is a motivation paper! Don't expect a solved problem
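The invariant above can be stated as a small check. This is only a sketch: the function name and the input (a list with the number of runnable threads per core, counting the thread currently running) are assumptions made for illustration, not something taken from the kernel.

```python
def violates_work_conservation(runqueue_lengths):
    """Return True if some core is idle while another core has
    several threads in its runqueue.

    runqueue_lengths: hypothetical snapshot, one entry per core,
    counting the running thread as part of the runqueue.
    """
    has_idle_core = any(n == 0 for n in runqueue_lengths)
    has_overloaded_core = any(n >= 2 for n in runqueue_lengths)
    return has_idle_core and has_overloaded_core


if __name__ == "__main__":
    print(violates_work_conservation([1, 1, 1, 1]))  # every core busy, none overloaded
    print(violates_work_conservation([0, 2, 1, 1]))  # one idle core, one core with 2 threads
```

A single snapshot that returns True is not necessarily a bug, since transient violations are expected; a bug would show up as a violation that persists across many snapshots.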
THE COMPLETELY FAIR SCHEDULER (CFS): CONCEPT

One runqueue, threads sorted by runtime

When thread done running for its timeslice: enqueued again

Lower niceness = longer timeslice (tasks allowed to run longer)

Cores: next task from runqueue

In practice: cannot work with single runqueue because of contention!

[Figure: a single runqueue holding threads with R = 103, R = 82, R = 24, R = 18 and R = 12, shared by Core 0 to Core 3; a thread that finished its timeslice is enqueued again with R = 112]
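The concept on this slide can be sketched with a priority queue keyed on runtime. This is a simplified illustration, not the kernel implementation: the class and function names, the thread names A to E and the timeslice of 10 are hypothetical, and a binary heap stands in for whatever sorted structure the scheduler uses. The starting runtimes are the ones shown on the slide.

```python
import heapq


class RunQueue:
    """Single runqueue: threads are kept sorted by accumulated runtime."""

    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker so equal runtimes keep insertion order

    def enqueue(self, runtime, name):
        heapq.heappush(self._heap, (runtime, self._seq, name))
        self._seq += 1

    def pick_next(self):
        """Remove and return the thread with the lowest runtime."""
        runtime, _, name = heapq.heappop(self._heap)
        return runtime, name


def run_one_slice(rq, timeslices):
    """A core takes the next thread, runs it for its timeslice,
    then enqueues it again with its updated runtime."""
    runtime, name = rq.pick_next()
    runtime += timeslices[name]
    rq.enqueue(runtime, name)
    return name, runtime


if __name__ == "__main__":
    rq = RunQueue()
    for name, runtime in [("A", 12), ("B", 18), ("C", 24), ("D", 82), ("E", 103)]:
        rq.enqueue(runtime, name)
    # Hypothetical: every thread gets the same timeslice here; a thread with
    # lower niceness would get a larger value.
    timeslices = {name: 10 for name in "ABCDE"}
    for _ in range(3):
        print(run_one_slice(rq, timeslices))
```

With one shared queue, every core would call `pick_next` on the same structure, which is the contention problem the slide ends on.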
CFS: IN PRACTICE

One runqueue per core to avoid contention

CFS periodically balances "loads":

    load(task) = weight (1) x % cpu use (2)

    (1) Lower niceness = higher weight
    (2) Prevent high-priority thread from taking whole CPU just to sleep

Since there can be many cores: hierarchical approach!

[Figure: runqueues of Core 0 and Core 1 holding six threads with W=1 and one thread with W=6]
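The load formula above can be worked through with a few lines of code. This is a sketch under assumptions: the function names are hypothetical, CPU use is given as an integer percentage, and the placement of the W=1 and W=6 threads on the two cores is a guess at what the figure showed.

```python
def task_load(weight, cpu_percent):
    """load(task) = weight x % cpu use."""
    return weight * cpu_percent


def runqueue_load(tasks):
    """Load of a core: sum of the loads of the tasks in its runqueue.

    tasks: list of (weight, cpu_percent) pairs.
    """
    return sum(task_load(weight, cpu) for weight, cpu in tasks)


if __name__ == "__main__":
    # Hypothetical placement: six W=1 threads on one core, one W=6 thread on the other.
    core0 = [(1, 100)] * 6
    core1 = [(6, 100)]
    print(runqueue_load(core0), runqueue_load(core1))

    # The same high-priority thread, but sleeping 90% of the time.
    core1_sleepy = [(6, 10)]
    print(runqueue_load(core1_sleepy))
```

When the W=6 thread uses the CPU fully, both cores carry the same load even though one runs six threads and the other runs one. When it mostly sleeps, its load drops by the same factor, so it no longer justifies keeping a whole core to itself.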
CFS: BALANCING THE LOAD

[Figure: runqueues of Core 0 to Core 3 holding threads with loads L=6000, L=3000 (twice), L=2000 and L=1000 (ten times); the final build marks the cores as "Balanced!"]
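Balancing by load rather than by thread count can be illustrated with a greedy sketch. It is not the algorithm CFS uses: the function names and the initial placement of threads on cores are hypothetical, and only the individual thread loads come from the figure.

```python
def core_load(runqueue):
    """Load of a core: sum of the loads of its threads."""
    return sum(runqueue)


def balance(runqueues):
    """Greedy sketch: repeatedly move one thread from the busiest core to the
    least loaded core, as long as the move narrows the gap between them."""
    queues = [list(q) for q in runqueues]
    while True:
        busiest = max(queues, key=core_load)
        idlest = min(queues, key=core_load)
        gap = core_load(busiest) - core_load(idlest)
        # Moving a thread only helps if its load is smaller than the gap.
        candidates = [load for load in busiest if load < gap]
        if not candidates:
            return queues
        load = max(candidates)
        busiest.remove(load)
        idlest.append(load)


if __name__ == "__main__":
    # Hypothetical starting point: Core 1 is overloaded, Core 3 is idle.
    start = [
        [6000],
        [3000, 3000, 2000, 1000, 1000, 1000, 1000],
        [1000] * 6,
        [],
    ]
    for core, queue in enumerate(balance(start)):
        print(f"Core {core}: load {core_load(queue)}, {len(queue)} thread(s)")
```

From this starting point every core ends with a load of 6000, while the thread counts stay uneven: 1, 5, 6 and 2 threads. That is the sense in which the cores are balanced: equal load, not an equal number of threads.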