NEPTUNE: Scheduling Suspendable Tasks for Unified Stream/Batch Applications

Panagiotis Garefalakis, Imperial College London (pgaref@imperial.ac.uk)
Konstantinos Karanasos, Microsoft (kokarana@microsoft.com)
Peter Pietzuch, Imperial College London (prp@imperial.ac.uk)

SoCC, Santa Cruz, California, November 2019
Unified application example

[Figure: a unified stream/batch application. An inference (stream) job consumes real-time data and produces low-latency responses; a training (batch) job consumes historical data and produces a trained model. The two jobs iterate, with the trained model feeding the inference job.]
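To make the example concrete, here is a minimal sketch of such a unified application in Spark (the framework NEPTUNE extends), with a batch training-style job and a streaming inference-style job sharing one SparkSession and its executors. The rate source and the toy aggregation/scoring logic are illustrative stand-ins, not the workload from the talk:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object UnifiedApp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("unified-stream-batch")
      .master("local[4]")
      .getOrCreate()

    // Batch (training) job: runs over historical data; a toy aggregation
    // stands in for iterative model fitting.
    val historical = spark.range(0, 1000000)
      .withColumn("label", (col("id") % 2).cast("double"))
    historical.groupBy("label").count().show()

    // Stream (inference) job: a low-latency query over live data, sharing
    // the same SparkSession (and executors) as the batch job.
    val live = spark.readStream
      .format("rate")                 // built-in test source
      .option("rowsPerSecond", "100")
      .load()
    val scored = live.withColumn("score", col("value") % 2) // toy "inference"
    scored.writeStream.format("console").start().awaitTermination()
  }
}
```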
Evolution of analytics frameworks

[Figure: timeline. Batch and stream frameworks (around 2010), then frameworks with hybrid stream/batch support (around 2014), then unified stream/batch frameworks running hybrid applications (around 2018, e.g., Spark Structured Streaming).]
Stream/Batch application requirements

> Latency: execute the inference job with minimum delay
> Throughput: batch jobs should not be compromised
> Efficiency: achieve high cluster resource utilization

Challenge: schedule stream/batch jobs to satisfy their diverse requirements
Stream/Batch application scheduling

[Figure: the driver runs the application code, which submits jobs through the App Context to the DAG scheduler. Running example: the inference (stream) job has two stages, each with 2 tasks, of durations 2T and T respectively; the training (batch) job has two stages, with 3 tasks of duration 3T and 4 tasks of duration T.]
Stream/Batch application scheduling

> Static allocation: dedicate executors to each job

[Figure: Gantt chart of cores over time; each job runs only on its own dedicated executor, leaving idle slots on both.]

Resources cannot be shared across jobs, so resources are wasted
Stream/Batch application scheduling

> FIFO (shared executors): the first job runs to completion

[Figure: Gantt chart; stream tasks queue behind the long-running batch tasks.]

Long batch jobs increase stream job latency
Stream/Batch application scheduling

> FAIR (shared executors): weighted fair sharing of resources across jobs

[Figure: Gantt chart; stream and batch tasks interleave, but stream tasks still queue behind running batch tasks.]

Better packing, but queuing still yields non-optimal latency
Stream/Batch application scheduling

> KILL (shared executors): avoid queuing by preempting batch tasks

[Figure: Gantt chart; stream tasks start immediately, while killed batch tasks restart from scratch.]

Better latency at the expense of extra (repeated) work
Stream/Batch application scheduling

> NEPTUNE (shared executors): minimize both queuing and wasted work!

[Figure: Gantt chart; batch tasks are suspended when stream tasks arrive and resumed afterwards, so stream tasks run immediately and no batch work is lost.]
Challenges

> How to minimize queuing for latency-sensitive jobs and wasted work?
  Implement suspendable tasks

> How to natively support stream/batch applications?
  Provide a unified execution framework

> How to satisfy different stream/batch application requirements and high-level objectives?
  Introduce custom scheduling policies
NEPTUNE: an execution framework for stream/batch applications

> How to minimize queuing for latency-sensitive jobs and wasted work?
  Support suspendable tasks

> How to natively support stream/batch applications?
  Unified execution framework on top of Structured Streaming

> How to satisfy different stream/batch application requirements and high-level objectives?
  Introduce pluggable scheduling policies
Typical tasks

> Tasks apply a function to a partition of data
> They run as subroutines on the executor, to completion
> Preemption problem:
  > Killing loses all progress
  > Checkpointing has unpredictable preemption times

[Figure: a task (state, iterator, context, function, value) running on the executor's stack.]
Suspendable tasks

> Idea: use coroutines
  > Separate stacks store task state
  > Yield points hand control back to the executor
> Cooperative preemption:
  > Suspend and resume in milliseconds
  > Work-preserving
  > Transparent to the user

https://github.com/storm-enroute/coroutines
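A minimal sketch of the idea, assuming the org.coroutines API (coroutine, yieldval, call, resume, result) of the linked library; the summation task and the 100-record yield interval are illustrative stand-ins, not NEPTUNE's actual task code:

```scala
import org.coroutines._

// A suspendable "task": sums a partition, yielding control back to the
// caller every 100 records.
val suspendableSum = coroutine { (partition: Array[Int]) =>
  var sum = 0
  var i = 0
  while (i < partition.length) {
    sum += partition(i)              // the task's actual work
    i += 1
    if (i % 100 == 0) yieldval(i)    // yield point: executor regains control
  }
  sum                                // returned on completion
}

// The executor drives the task: each resume runs until the next yield
// point, and between resumes it may park the instance to run a
// latency-sensitive task. The coroutine keeps its own stack, so no
// progress is lost (work-preserving suspension).
val task = call(suspendableSum((1 to 1000).toArray))
while (task.resume) { /* a scheduler could suspend/relaunch here */ }
println(task.result)                 // 500500
```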
Execution framework

> Problem: the scheduler must not only assign tasks, but also suspend and resume them
> Idea: a centralized task scheduler with pluggable policies

[Figure: the DAG scheduler (with incrementalizer and optimizer) submits tasks of high- and low-priority jobs to the task scheduler; the scheduling policy uses app/job priorities and executor metrics to launch, suspend, and resume running and paused tasks on the executors.]
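As a rough illustration of the suspend-and-launch decision (all types and names here are hypothetical, not NEPTUNE's internals): when a high-priority task arrives at a full executor, a low-priority task is suspended at its next yield point and resumed once a slot frees up.

```scala
// Hypothetical sketch of suspend-and-launch bookkeeping in a task scheduler.
sealed trait Priority
case object High extends Priority
case object Low extends Priority

final case class Task(id: Long, priority: Priority)

final class ExecutorSlots(slots: Int) {
  private var running = Vector.empty[Task]
  private var paused = Vector.empty[Task]

  def submit(task: Task): Unit =
    if (running.size < slots) {
      running :+= task                       // free slot: launch immediately
    } else if (task.priority == High) {
      running.find(_.priority == Low).foreach { victim =>
        // Cooperative preemption: the victim suspends at its next yield
        // point and its progress is kept (work-preserving).
        running = running.filterNot(_.id == victim.id)
        paused :+= victim
        running :+= task
      }
    } // low-priority tasks would queue in a real scheduler

  def finished(task: Task): Unit = {
    running = running.filterNot(_.id == task.id)
    // Resume paused (batch) work as soon as a slot frees up.
    paused.headOption.foreach { p =>
      paused = paused.tail
      running :+= p
    }
  }
}
```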
Scheduling policies

> Idea: policies trigger task suspension and resumption
  > Guarantee that stream tasks bypass batch tasks
  > Satisfy higher-level objectives, e.g., balancing cluster load
  > Avoid starvation by suspending a task only up to a bounded number of times
> Load-balancing (LB): takes executors' memory conditions into account and equalizes the number of tasks per node (see the sketch below)
> Locality- and memory-aware (LMA): additionally respects task locality preferences
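A minimal sketch of what the LB executor choice might look like, with hypothetical types; NEPTUNE's real policies also track suspension counts and, for LMA, locality preferences:

```scala
// Hypothetical sketch of a load-balancing (LB) executor choice: filter by
// available memory, then equalize running tasks per node.
final case class ExecutorState(id: String, runningTasks: Int, freeMemoryMb: Long)

def chooseExecutorLB(executors: Seq[ExecutorState], taskMemMb: Long): Option[String] =
  executors
    .filter(_.freeMemoryMb >= taskMemMb)   // respect memory conditions
    .sortBy(_.runningTasks)                // fewest running tasks first
    .headOption
    .map(_.id)
```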
Implementation

> Built as an extension to Apache Spark 2.4.0 (https://github.com/lsds/Neptune)
> Ported all ResultTask and ShuffleMapTask functionality across programming interfaces to coroutines
> Extended Spark's DAG scheduler to allow job stages with different requirements (priorities)
> Added executor performance metrics to the heartbeat mechanism
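For intuition on how applications might tag jobs with priorities: SparkContext.setLocalProperty is a real, standard Spark hook (the built-in FAIR scheduler uses it for pool assignment), but the property key below is a hypothetical placeholder, not necessarily the one Neptune's extended DAG scheduler reads.

```scala
// Sketch: tagging jobs with priorities via Spark local properties.
// "job.priority" is a hypothetical key, used here only for illustration.
val sc = spark.sparkContext

sc.setLocalProperty("job.priority", "high")
// ... actions triggered here run as the latency-sensitive (stream) job ...

sc.setLocalProperty("job.priority", "low")
// ... actions triggered here run as the batch (training) job ...
```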
Azure deployment

> Cluster: 75 nodes, each with 4 cores and 32 GB of memory
> Workloads:
  > LDA: ML training/inference application uncovering hidden topics in a group of documents
  > Yahoo Streaming Benchmark (YSB): ad analytics on a stream of ad impressions
  > TPC-H: decision-support benchmark
Benefit of NEPTUNE in stream latency

> LDA: a training (batch) job uses all available resources, while a latency-sensitive inference (stream) job uses 15% of the resources

[Figure: streaming latency (s), showing the 5th percentile, median, and 99th percentile under static allocation, FIFO, FAIR, KILL, and NEPTUNE's LMA and LB policies; annotated improvements for NEPTUNE range from 13% to 61%.]

NEPTUNE achieves latencies comparable to the ideal for the latency-sensitive jobs
Impact of resource demands on performance

> YSB: the stream job's resource demands increase while the batch job uses all available resources

[Figure: streaming latency (s) and batch throughput (M events/s) as the share of cores used for streaming grows from 0% to 100%; batch throughput drops by only about 1.5%.]

NEPTUNE efficiently shares resources with low impact on throughput
Summary

NEPTUNE supports complex unified applications with diverse job requirements!

> Suspendable tasks using coroutines
> Pluggable scheduling policies
> Continuous unified analytics

https://github.com/lsds/Neptune

Thank you! Questions?
Panagiotis Garefalakis, pgaref@imperial.ac.uk