Scheduling tree-shaped task graphs to minimize memory and makespan
Lionel Eyraud-Dubois (INRIA, Bordeaux, France), Loris Marchal (CNRS, Lyon, France), Oliver Sinnen (Univ. Auckland, New Zealand), Frédéric Vivien (INRIA, Lyon, France)
New Challenges in Scheduling Theory Workshop, Aussois, March/April 2014
Introduction
Task graph scheduling
◮ Application modeled as a graph
◮ Map tasks on processors and schedule them
◮ Usual performance metric: makespan (time)
Today: focus on memory
◮ Workflows with large temporary data
◮ Computation performance improves much faster than communication: 1/Flops ≪ 1/bandwidth ≪ latency
◮ Gap between processing power and communication cost increasing exponentially
  annual improvements:
    Flops rate       59%
    mem. bandwidth   26%
    mem. latency      5%
◮ Avoid communications
◮ Restrict to in-core memory (out-of-core is expensive)
Focus on Task Trees
Motivation:
◮ Arise in multifrontal sparse matrix factorization
◮ Assembly/elimination tree: the application task graph is a tree
◮ Large temporary data
◮ Memory usage becomes a bottleneck
Outline
◮ Introduction and related work
◮ Complexity of parallel tree processing
◮ Heuristics for weighted task trees
◮ Simulations
◮ Summary and perspectives
Related Work: Register Allocation & Pebble Game
How to efficiently compute the following arithmetic expression with the minimum number of registers?
7 + (1 + x)(5 − z) − ((u − t)/(2 + z)) + v
(figure: the corresponding expression tree, with the leaves 7, 1, x, 5, z, u, t, 2, v and operator nodes)
Pebble-game rules:
◮ Inputs (leaves) can be pebbled anytime
◮ If all of a node's inputs are pebbled, the node can be pebbled
◮ A pebble may be removed anytime
Objective: pebble the root node using a minimum number of pebbles
Complexity results
Problem on trees:
◮ Polynomial algorithm [Sethi & Ullman, 1970]
General problem on DAGs (common subexpressions):
◮ PSPACE-complete [Gilbert, Lengauer & Tarjan, 1980]
◮ Without re-computation: NP-complete [Sethi, 1973]
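For trees, the polynomial algorithm above is the classical Sethi-Ullman labeling: a leaf needs one register, and an internal node whose two subtrees need a and b registers needs max(a, b) when a ≠ b (evaluate the costlier side first) and a + 1 when they tie. A minimal sketch, assuming a hypothetical encoding of binary expression trees as nested `(left, right)` tuples with string leaves:

```python
def sethi_ullman(node):
    """Minimum number of registers (pebbles) to evaluate `node`."""
    if isinstance(node, str):        # leaf: a variable or constant
        return 1
    left, right = node
    a, b = sethi_ullman(left), sethi_ullman(right)
    # Evaluate the costlier subtree first; if both cost the same,
    # one extra register is needed to hold the first result.
    return max(a, b) if a != b else a + 1

# (1 + x) * (5 - z): each factor needs 2 registers, so the product needs 3.
expr = (("1", "x"), ("5", "z"))
print(sethi_ullman(expr))  # 3
```

The label of the root is exactly the minimum number of pebbles in the tree pebble game above.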
Notations: Tree-Shaped Task Graphs
◮ In-tree of n nodes
◮ Output data of size f_i
◮ Execution data of size n_i
◮ Input data of leaf nodes have null size
◮ Memory for node i:
  MemReq(i) = ( Σ_{j ∈ Children(i)} f_j ) + n_i + f_i
(figure: a five-node example tree with its n_i and f_i labels)
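The memory requirement above can be sketched directly: while node i runs, its children's outputs f_j, its execution data n_i, and its output f_i must all be resident. The dict-based tree encoding and the size values below are assumptions chosen for illustration:

```python
def mem_req(i, n, f, children):
    """Memory needed while executing node i: children outputs + n_i + f_i."""
    return sum(f[j] for j in children.get(i, [])) + n[i] + f[i]

# Root 1 with children 2 and 3; leaf inputs have null size, so a leaf
# only needs n_i + f_i.
n = {1: 1, 2: 2, 3: 3}      # execution data n_i (hypothetical sizes)
f = {1: 1, 2: 2, 3: 3}      # output data f_i (hypothetical sizes)
children = {1: [2, 3]}

print(mem_req(1, n, f, children))  # f_2 + f_3 + n_1 + f_1 = 2 + 3 + 1 + 1 = 7
print(mem_req(2, n, f, children))  # leaf: n_2 + f_2 = 4
```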
Impact of Schedule on Memory Peak
(figure: an example tree stepped through node by node under two different traversals; the running peak memory reaches 12 for the first traversal but only 9 for the second, showing that the traversal order alone changes the memory peak)
Two existing optimal sequential schedules:
◮ Best traversal [J. Liu, 1987]
◮ Best post-order traversal [J. Liu, 1986]
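The effect shown in the figure can be reproduced with a small simulation: outputs of already-processed children stay resident until their parent runs, and while node i runs we additionally hold n_i and f_i. The tree and the sizes below are hypothetical, chosen so that two valid sequential orders give different peaks:

```python
def peak_memory(order, n, f, children):
    """Peak memory of a sequential schedule given as a topological order."""
    resident = 0   # total size of outputs currently held in memory
    peak = 0
    for i in order:
        kids = children.get(i, [])
        # Children outputs are already resident; add n_i and f_i while i runs.
        peak = max(peak, resident + n[i] + f[i])
        # After i finishes: free the children's outputs, keep f_i.
        resident += f[i] - sum(f[j] for j in kids)
    return peak

# Root 1 with children 2 and 3; node 2 has leaf children 4 and 5.
children = {1: [2, 3], 2: [4, 5]}
n = {1: 1, 2: 2, 3: 3, 4: 2, 5: 2}   # execution data (hypothetical)
f = {1: 0, 2: 2, 3: 3, 4: 2, 5: 2}   # output data (hypothetical)

print(peak_memory([4, 5, 2, 3, 1], n, f, children))  # 8
print(peak_memory([3, 4, 5, 2, 1], n, f, children))  # 11
```

Both orders process every child before its parent, yet finishing subtree {4, 5, 2} before starting node 3 keeps fewer large outputs resident at once, hence the lower peak.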