  1. Part 3: Memory-Aware DAG Scheduling CR05: Data Aware Algorithms October 12 & 15, 2020

  2. Summary of the course
     ◮ Part 1: Pebble Games: models of computation with limited memory
     ◮ Part 2: External Memory and Cache-Oblivious Algorithms: two-level memory system, some parallelism (work stealing)
     ◮ Part 3: Streaming Algorithms: dealing with big data, distributed computing
     ◮ Part 4: DAG Scheduling (today): structured computations with limited memory
     ◮ Part 5: Communication-Avoiding Algorithms: regular computations (linear algebra) in a distributed setting

  3. Introduction
     ◮ Directed Acyclic Graphs: express task dependencies
       ◮ nodes: computational tasks
       ◮ edges: dependencies (data = output of one task = input of another)
     ◮ Formalism proposed long ago in scheduling
     ◮ Back into fashion thanks to task-based runtimes:
       ◮ decompose an application (scientific computation) into tasks
       ◮ data produced/used by tasks create dependencies
       ◮ task mapping and scheduling done at runtime
     ◮ Numerous projects:
       ◮ StarPU (Inria Bordeaux): several codes for each task, to execute on any computing resource (CPU, GPU, *PU)
       ◮ DAGuE, PaRSEC (ICL, Tennessee): task graph expressed in a symbolic compact form, dedicated to linear algebra
       ◮ StarSs (Barcelona), XKaapi (Grenoble), and others. . .
     ◮ Now included in the OpenMP API

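The DAG view above can be made concrete in a few lines of Python. The tasks A–F and their dependencies below are illustrative (not taken from the slides); the standard library's `graphlib` computes the dependency-respecting execution order that a task-based runtime needs:

```python
# A small task DAG: each task maps to the set of tasks it depends on
# (i.e., whose output data it consumes). Names are invented.
from graphlib import TopologicalSorter

deps = {
    "A": set(),
    "B": {"A"},
    "C": {"A"},
    "D": {"B"},
    "E": {"C"},
    "F": {"D", "E"},
}

# static_order() yields a sequential order in which every task
# appears after all of its dependencies.
order = list(TopologicalSorter(deps).static_order())
assert all(order.index(d) < order.index(t)
           for t, ds in deps.items() for d in ds)
print(order)
```

A runtime such as StarPU does this dynamically (tasks become ready as their inputs are produced), but the ordering constraint it enforces is exactly the one asserted here.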

  5. Task graph scheduling and memory
     ◮ Consider a simple task graph (tasks A, B, C, D, E, F); tasks have durations and memory demands
     [Figure: the task graph annotated with each task's memory demand and duration, then a two-processor schedule (Processor 1: A, B, D; Processor 2: C, E, F) over time; when too many memory-hungry tasks run concurrently, the schedule runs out of memory!]
     ◮ Peak memory: maximum memory usage over the execution
     ◮ Trade-off between peak memory and performance (time to solution)
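One way to check a candidate parallel schedule against a memory bound is an event sweep over task start/end times, assuming each task holds its memory demand for its whole execution interval. The intervals and sizes below are invented for illustration:

```python
# Hypothetical two-processor schedule: (task, start, end, memory).
# Values are invented; the slides' task names A..F are reused.
schedule = [
    ("A", 0, 2, 4), ("B", 2, 5, 6), ("C", 2, 4, 3),
    ("D", 5, 7, 2), ("E", 4, 7, 5), ("F", 7, 9, 1),
]

def peak_memory(schedule):
    """Maximum total memory held by tasks running at the same time."""
    events = []
    for _, start, end, mem in schedule:
        events.append((start, mem))   # acquire memory at start
        events.append((end, -mem))    # release memory at end
    # At equal times, process releases (negative) before acquires.
    events.sort(key=lambda e: (e[0], e[1]))
    peak = cur = 0
    for _, delta in events:
        cur += delta
        peak = max(peak, cur)
    return peak

print(peak_memory(schedule))  # 11 with these invented values (B and E overlap)
```

If the result exceeds the available memory, the schedule above is infeasible and tasks must be reordered or serialized, which is exactly the memory/makespan trade-off the slide describes.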

  10. Going back to sequential processing
     ◮ Temporary data require memory
     ◮ Scheduling influences the peak memory
     [Figure: the same task graph processed in two sequential orders, A B C D E F and A C B D E F, reaching different peak memories]
     When the minimum memory demand exceeds the available memory:
     ◮ store some temporary data on a larger, slower storage (disk)
     ◮ out-of-core computing, with Input/Output operations (I/O)
     ◮ decide both the scheduling and the eviction scheme
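A rough sketch of why the sequential order matters, assuming each task's output stays in memory until all of its consumers have executed. The diamond-shaped graph and all sizes are invented; with them, interleaving the two branches peaks lower than finishing one branch first:

```python
# Each task's output size; F is the final result (size 0 here).
out = {"A": 2, "B": 4, "C": 1, "D": 3, "E": 1, "F": 0}
# Which tasks consume each task's output.
consumers = {"A": {"B", "C"}, "B": {"D"}, "C": {"E"},
             "D": {"F"}, "E": {"F"}, "F": set()}

def peak_memory(order):
    done, freed = set(), set()
    mem = peak = 0
    for t in order:
        mem += out[t]                 # allocate t's output
        peak = max(peak, mem)
        done.add(t)
        for p in done - freed:        # free fully consumed outputs
            if consumers[p] and consumers[p] <= done:
                mem -= out[p]
                freed.add(p)
    return peak

print(peak_memory(["A", "B", "C", "D", "E", "F"]))  # 8
print(peak_memory(["A", "B", "D", "C", "E", "F"]))  # 9
```

In the second order, A's output cannot be freed until C has run, so it sits in memory alongside B and D; the first order frees A earlier. This is the effect the figure on the slide illustrates.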

  13. Research problems
     Several interesting questions:
     ◮ For sequential processing:
       ◮ minimum memory needed to process a graph
       ◮ in case of memory shortage, minimum I/Os required
     ◮ For parallel processing:
       ◮ trade-offs between memory and time (makespan)
       ◮ makespan minimization under bounded memory
     Most (all?) of these problems are NP-hard on general graphs.
     We therefore sometimes restrict to simpler graphs:
     1. Trees (single output, multiple inputs for each task): arise in sparse linear algebra (sparse direct solvers), with large data to handle, so memory is a problem
     2. Series-parallel graphs: a natural generalization of trees, close to the actual structure of regular codes
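The first sequential question (minimum memory needed to process a graph) can at least be answered by brute force on tiny instances, which also hints at why the general problem is hard: the number of valid orders grows exponentially. The graph and sizes below are invented, and the memory model is the same simple one as before (an output is freed once all its consumers have run):

```python
# Enumerate all topological orders of a tiny DAG and keep the one
# with the smallest peak memory. Purely illustrative sizes.
from itertools import permutations

out = {"A": 2, "B": 4, "C": 1, "D": 3, "E": 1, "F": 0}
consumers = {"A": {"B", "C"}, "B": {"D"}, "C": {"E"},
             "D": {"F"}, "E": {"F"}, "F": set()}
deps = {t: {p for p, cs in consumers.items() if t in cs} for t in out}

def peak(order):
    done, freed = set(), set()
    mem = best = 0
    for t in order:
        mem += out[t]
        best = max(best, mem)
        done.add(t)
        for p in done - freed:
            if consumers[p] and consumers[p] <= done:
                mem -= out[p]
                freed.add(p)
    return best

# Keep only orders where every task comes after its dependencies.
valid = [o for o in permutations(out)
         if all(deps[t] <= set(o[:i]) for i, t in enumerate(o))]
print(min(peak(o) for o in valid))  # 6 valid orders here; minimum peak is 8
```

For trees and series-parallel graphs, the point of the next sections is that this exponential search can be replaced by polynomial algorithms.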

  17. Outline
     ◮ Minimize Memory for Trees
     ◮ Minimize Memory for Series-Parallel Graphs
     ◮ Minimize I/Os for Trees under Bounded Memory

  19. Notations: Tree-Shaped Task Graphs
     ◮ In-tree of n nodes
     ◮ Output data of size f_i
     ◮ Execution data of size n_i
     ◮ Input data of leaf nodes have null size
     [Figure: a 5-node in-tree; node 1 is the root with children 2 and 3, node 2 has children 4 and 5; each edge carries the child's output data f_i]
     ◮ Memory needed to process node i:
       MemReq(i) = ( Σ_{j ∈ Children(i)} f_j ) + n_i + f_i
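The MemReq formula transcribes directly into code: to process node i, its children's outputs, its execution data, and its own output must all reside in memory simultaneously. The tree below follows the slide's figure (node 1 is the root, node 2 has children 4 and 5), with invented sizes:

```python
# Tree structure and sizes (sizes invented for illustration).
children = {1: [2, 3], 2: [4, 5], 3: [], 4: [], 5: []}
f = {1: 0, 2: 3, 3: 2, 4: 1, 5: 2}   # output data sizes f_i
n = {1: 4, 2: 2, 3: 1, 4: 1, 5: 1}   # execution data sizes n_i

def mem_req(i):
    """Memory needed while executing node i:
    (sum of children's outputs) + execution data + own output."""
    return sum(f[j] for j in children[i]) + n[i] + f[i]

print(mem_req(2))  # f_4 + f_5 + n_2 + f_2 = 1 + 2 + 2 + 3 = 8
print(mem_req(1))  # f_2 + f_3 + n_1 + f_1 = 3 + 2 + 4 + 0 = 9
```

Note that MemReq(i) only accounts for node i in isolation; the peak memory of a traversal also includes outputs of already-processed nodes that are still waiting to be consumed, which is what the scheduling problem optimizes.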
