Malleable task-graph scheduling with a practical speed-up model

Malleable task-graph scheduling with a practical speed-up model
Loris Marchal (1), Bertrand Simon (1), Oliver Sinnen (2), Frédéric Vivien (1)
(1) CNRS, INRIA, ENS Lyon and Univ. Lyon, FR. (2) Univ. Auckland, NZ.
New Challenges in Scheduling Theory, Aussois, March 2016
L. Marchal, B. Simon, O. Sinnen, F. Vivien. Malleable task-graph scheduling with a practical speed-up model. 1 / 22

Motivation
Context:
- Optimize the time performance of multifrontal sparse solvers (e.g., MUMPS or QR-MUMPS)
- Computations well described by a tree of tasks
- Generalization to Series-Parallel graphs
- Purpose: find a schedule achieving the lowest makespan
[Figure: a task tree T; the series composition G1 ; G2 and the parallel composition G1 ∥ G2 of two graphs G1 and G2; an example graph with tasks numbered 1 to 6]
Objectives:
- Provide theoretical guarantees on widely used scheduling algorithms
- Design new algorithms that achieve a smaller makespan

Application modeling
Coarse-grain picture: tree of tasks (or SP task graph)
- Each task: partial factorization, graph of smaller sub-tasks
[Figure: sub-task graph of one partial factorization, built from POTRF, TRSM, SYRK, and GEMM kernels]
Expand all tasks and schedule the resulting graph?
- Scheduling trees is simpler than scheduling general graphs (forget sub-tasks)
- Behavior of coarse-grain tasks: parallel and malleable
- Speed-up model → trade-off between:
  Accuracy: fits the data well
  Tractability: amenable to performance analysis and guaranteed algorithms

General speed-up models
Literature: studies with few assumptions
speed-up(p) = time(1 proc.) / time(p proc.)
work(p) = p · time(p proc.)
- Non-decreasing speed-up and work
- Independent tasks: theoretical FPTAS and practical 2-approximations [Jansen 2004, Fan et al. 2012]
- SP-graphs: ≈ 2.6-approximation [Lepère et al. 2001]; with concave speed-up: (2 + ε)-approximation of unspecified complexity [Makarychev et al. 2014]
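
The two definitions above can be sketched in a few lines. This is only an illustration: the timing table is made up, and the names `measured_time`, `speedup`, and `work` are assumptions, not taken from the talk.

```python
# Hypothetical measured execution times (seconds) of one task, indexed by
# processor count. These numbers are made up for illustration.
measured_time = {1: 12.0, 2: 6.5, 4: 4.0, 8: 3.0}

def speedup(time, p):
    """speed-up(p) = time(1 proc.) / time(p proc.)."""
    return time[1] / time[p]

def work(time, p):
    """work(p) = p * time(p proc.), the total processor-time consumed."""
    return p * time[p]

# Print one row per processor count: p, speed-up(p), work(p).
for p in sorted(measured_time):
    print(p, speedup(measured_time, p), work(measured_time, p))
```

A general speed-up model only constrains how these two quantities vary with p; it does not prescribe their exact shape.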

Previous work (Euro-Par 2015, with A. Guermouche)
Prasanna & Musicus model [PM 1996]: speed-up(p) = p^α
[Figure: speed-up versus number of processors for α = 1 (perfect parallelism), 0 < α < 1, and α = 0 (no parallelism)]
Conclusions:
- Average accuracy
- No guarantees for distributed platforms
- Rational numbers of processors
- Task finish times complex to compute
- Optimal algorithm for SP-graphs
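
As a small sketch of the p^α model, the functions below evaluate the speed-up and the resulting execution time. The names `pm_speedup` and `pm_time` are hypothetical, and `w` is assumed to be the task's time on one processor.

```python
def pm_speedup(p, alpha):
    """Prasanna & Musicus speed-up p**alpha, for a (possibly rational) p > 0."""
    if p <= 0 or not 0 <= alpha <= 1:
        raise ValueError("need p > 0 and 0 <= alpha <= 1")
    return p ** alpha

def pm_time(w, p, alpha):
    """Time to process a task of sequential time w on p processors."""
    # By definition of speed-up: time(p) = time(1) / speed-up(p).
    return w / pm_speedup(p, alpha)
```

With α = 1 the task is perfectly parallel, with α = 0 extra processors do not help, and intermediate values give diminishing returns: under α = 0.5, four processors only halve the execution time.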

Today: simpler model
Simple and reasonable model of a parallel malleable task T_i
- Perfect parallelism up to a threshold δ_i: time = w_i / min(p, δ_i)
- Rational allocation for free (McNaughton's wrap-around rule)
[Figure: speed-up versus number of processors, with slope 1 up to δ_i]
Related studies
- 2-approximation [Balmin et al. 13] that we will discuss
- [Kell et al. 2015]: time = w_i / p + (p − 1) c; 2-approximation for p = 3, open for p ≥ 4
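
The threshold model and the wrap-around idea can be sketched as follows. This is a minimal illustration under assumptions, not the authors' code: `task_time` and `wrap_around` are hypothetical names, and the wrap-around layout shown is one reading of how rational processor shares, held constant over a time interval, can be realized on whole processors.

```python
from fractions import Fraction

def task_time(w, p, delta):
    """Execution time of a task of work w on p processors, threshold delta."""
    return Fraction(w) / min(Fraction(p), Fraction(delta))

def wrap_around(shares, length):
    """Lay out rational processor shares over one interval [0, length).

    shares: list of (task_name, share), where share is the (rational)
    number of processors given to the task during the interval.
    Returns a dict: processor index -> list of (task_name, start, end).
    """
    layout = {}
    proc, clock = 0, Fraction(0)
    for name, share in shares:
        remaining = share * length      # processor-time owed to this task
        while remaining > 0:
            chunk = min(remaining, length - clock)
            layout.setdefault(proc, []).append((name, clock, clock + chunk))
            clock += chunk
            remaining -= chunk
            if clock == length:         # processor full: wrap to the next one
                proc, clock = proc + 1, Fraction(0)
    return layout

# Two tasks sharing 2 processors over an interval of length 2.
print(wrap_around([("A", Fraction(3, 2)), ("B", Fraction(1, 2))], 2))
```

In this layout, each task receives exactly share × length units of processor-time, and a task with share s never occupies more than ⌈s⌉ processors at the same instant, so an integer threshold δ_i that bounds the share also bounds the number of processors actually used.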

Outline
1. Problem complexity
2. Analysis of PROPORTIONAL MAPPING [Pothen et al. 1993]
3. Design of a greedy strategy
4. Experimental comparison
5. Conclusion

Overview of the problem
Given an SP-graph and p processors: compute the optimal makespan
- Problem known as P | sp-graph, any, spdp-lin, δ_i | C_max
- Malleability + perfect parallelism ⇒ P
- ... + thresholds ⇒ NP-complete
- Existing proof in [Drozdowski and Kubiak 1999]: arguably complex
Contribution
- New NP-completeness proof
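
To make the problem statement concrete, the sketch below computes two simple lower bounds on the makespan of an SP-graph: total work divided by the number of processors, and the longest path when every task runs at its threshold. The tuple encoding and the function names are assumptions made for illustration; this is not the algorithm or the proof discussed in the talk.

```python
from fractions import Fraction

# Hypothetical encoding: a task is ("task", w, delta) with integer w and
# delta; compositions are ("series", g1, g2) and ("parallel", g1, g2).

def total_work(g):
    """Sum of the work w of every task in the graph."""
    if g[0] == "task":
        return Fraction(g[1])
    return total_work(g[1]) + total_work(g[2])

def critical_path(g):
    """Longest path when every task runs on delta processors."""
    if g[0] == "task":
        return Fraction(g[1], g[2])
    if g[0] == "series":
        return critical_path(g[1]) + critical_path(g[2])
    return max(critical_path(g[1]), critical_path(g[2]))

def makespan_lower_bound(g, p):
    """No schedule on p processors can finish before this time."""
    return max(total_work(g) / p, critical_path(g))

g = ("series", ("task", 6, 2),
     ("parallel", ("task", 8, 4), ("task", 4, 1)))
print(makespan_lower_bound(g, 4))
```

Without thresholds, the first bound is reached by giving all p processors to each task in turn, which is why that case is easy; the thresholds are what make the two bounds interact.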
