Practical Steady-State Scheduling for Tree-Shaped Task Graphs. Sékou Diakité (1), Loris Marchal (2), Jean-Marc Nicod (1), Laurent Philippe (1), 19/11/2009. (1) Laboratoire d’Informatique de Franche-Comté, Université de Franche-Comté, France. (2) Laboratoire de l’Informatique du Parallélisme, CNRS - INRIA - Université de Lyon, France
Outline: Scheduling problem; Principle of steady-state scheduling (Overview, Shortcomings); Reducing the latency (Dependencies, Mixed Integer Program, Heuristic approach, Using non-conservative steady-state solutions); Experimental results (Simulation settings, Inter-period dependencies, Scheduling efficiency, Number of running instances, Running time of the algorithms); Synthesis
Diakité, Marchal, Nicod, Philippe. ROMA/GRAAL working group, 19/11/2009
Scheduling problem: definitions. Execution platform: an undirected graph $G_p = (V_p, E_p)$, where $V_p = \{P_1, \dots, P_n\}$ is the set of $n$ processors and $E_p$ is the set of communication links between the processors. Communications follow the bidirectional one-port model (at any time, a processor sends to at most one neighbor and receives from at most one neighbor); $c_{i,j}$ is the time needed to send one unit of data from $P_i$ to $P_j$. Example (figure): a platform with processors $P_1$, $P_2$, $P_3$, $P_4$.
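As an illustration of the platform model, here is a minimal sketch, assuming a dictionary-based encoding of $G_p$; the names `Platform`, `add_link`, `transfer_time` and `one_port_ok` are hypothetical. It computes the time to send a file of a given size over a link ($\text{size} \times c_{i,j}$) and checks that a list of timed transfers respects the bidirectional one-port model.

```python
from dataclasses import dataclass, field


@dataclass
class Platform:
    """Hypothetical encoding of the platform graph G_p = (V_p, E_p)."""
    processors: list[str]
    # c[(i, j)]: time to send one unit of data from P_i to P_j (only for links in E_p)
    c: dict[tuple[str, str], float] = field(default_factory=dict)

    def add_link(self, a: str, b: str, cost_ab: float, cost_ba: float) -> None:
        # A bidirectional link; c_{a,b} and c_{b,a} are stored separately
        self.c[(a, b)] = cost_ab
        self.c[(b, a)] = cost_ba

    def transfer_time(self, src: str, dst: str, size: float) -> float:
        # Time to send `size` units of data directly over the link src -> dst
        if (src, dst) not in self.c:
            raise ValueError(f"no link {src} -> {dst}")
        return size * self.c[(src, dst)]


def one_port_ok(transfers):
    """transfers: list of (src, dst, start, end).
    Bidirectional one-port model: each processor takes part in at most one
    send and at most one receive at any time (a send and a receive may overlap)."""
    for role in (0, 1):  # 0: check the sender side, 1: check the receiver side
        by_proc = {}
        for tr in transfers:
            by_proc.setdefault(tr[role], []).append((tr[2], tr[3]))
        for intervals in by_proc.values():
            intervals.sort()
            for (_, end1), (start2, _) in zip(intervals, intervals[1:]):
                if start2 < end1:  # two sends (or two receives) overlap
                    return False
    return True
```

Keeping send and receive checks separate is what makes the model bidirectional: $P_1$ may send to $P_2$ while receiving from $P_2$.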
Scheduling problem: definitions. Application: a DAG with no forks (in-tree), $G_a = (V_a, E_a)$, where $V_a = \{T_1, \dots, T_k\}$ is the set of $k$ tasks. Unrelated computation model: $w_{i,k}$ is the time needed by $P_i$ to execute $T_k$ (these times are arbitrary: a processor may be faster than another for some tasks and slower for others). $E_a$ contains the dependencies between tasks; $F_{k,l}$ is the amount of data (file) produced by $T_k$ and consumed by $T_l$. Example (figure): an in-tree with tasks $T_1$, $T_2$, $T_3$, $T_4$, to be executed 10 to 1000 times.
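The "no forks" condition can be checked mechanically. A minimal sketch, assuming tasks are given by name and each edge $(k, l)$ stands for $T_k \to T_l$; the function `is_in_tree` and the sample `w` values are hypothetical. In an in-tree every task has at most one successor, exactly one task (the root) has none, and following successors never loops.

```python
def is_in_tree(tasks, edges):
    """edges: list of (k, l) meaning T_k -> T_l (T_k produces F_{k,l} for T_l).
    Returns True if the DAG has no forks and forms a single in-tree."""
    succ = {}
    for k, l in edges:
        if k in succ:  # a fork: T_k would feed two tasks
            return False
        succ[k] = l
    roots = [t for t in tasks if t not in succ]
    if len(roots) != 1:
        return False
    # Following successors from any task must reach the root without looping
    for t in tasks:
        seen = set()
        while t in succ:
            if t in seen:
                return False
            seen.add(t)
            t = succ[t]
    return True


# Unrelated model (hypothetical values): w[(i, k)] = time for P_i to run T_k,
# with no fixed speed ratio between processors.
w = {("P1", "T1"): 2.0, ("P2", "T1"): 5.0, ("P1", "T2"): 6.0, ("P2", "T2"): 1.0}
```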
Scheduling problem: approach. Problem: executing a batch of instances of the same task graph (from 10 to 1000). Objective: minimizing the makespan $C_{\max}$. Chosen method: steady-state scheduling, which maximizes the throughput and is asymptotically optimal for the makespan.
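A short derivation may clarify what "asymptotically optimal" means here. It is a sketch under simplifying assumptions, in notation introduced only for this purpose: $N$ is the number of instances, $\rho^*$ the optimal steady-state throughput (which bounds the average throughput of any schedule), $L$ the period length, and $T_{init}$, $T_{clean}$ the durations of the initialization and clean-up phases, assumed independent of $N$; we also pessimistically assume the steady-state phase alone completes all $N$ instances.

```latex
C_{\max}^{\mathrm{opt}}(N) \;\ge\; \frac{N}{\rho^*}
\qquad
C_{\max}^{\mathrm{ss}}(N) \;\le\; T_{init} + \left\lceil \frac{N}{\rho^* L} \right\rceil L + T_{clean}
\;\le\; \frac{N}{\rho^*} + L + T_{init} + T_{clean}

\frac{C_{\max}^{\mathrm{ss}}(N)}{C_{\max}^{\mathrm{opt}}(N)}
\;\le\; 1 + \frac{\rho^* \,(L + T_{init} + T_{clean})}{N}
\;\xrightarrow[N \to \infty]{}\; 1
```

The additive term $L + T_{init} + T_{clean}$ is what makes the method weak for small batches: the shorter the period and the two extra phases, the sooner the ratio approaches 1.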
Principle of steady-state scheduling: overview. This study is based on O. Beaumont, A. Legrand, L. Marchal and Y. Robert. Steady-state scheduling on heterogeneous clusters. Int. J. of Foundations of Computer Science, 16(2):163-194, 2005.
Principle of steady-state scheduling: overview. Converting the scheduling problem into a linear program. The steady state is characterized by activity variables: the average number of $T_k$ processed by $P_i$ per time unit, and the average number of $F_{k,l}$ sent by $P_i$ to $P_j$ per time unit. These activity variables let us write constraints on processor speeds and link bandwidths, as well as "conservation laws" stating that each $F_{k,l}$ has to be produced by $T_k$ and consumed by $T_l$. Together, these constraints describe a valid steady-state schedule; adding the objective of maximizing the steady-state throughput yields a linear program.
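For concreteness, the linear program can be written as follows for an in-tree. This is our own rendering of the constraints listed above, not necessarily the exact program of the cited paper; the symbols $\alpha_i^k$ (average number of $T_k$ processed by $P_i$ per time unit), $s_{i,j}^{k,l}$ (average number of $F_{k,l}$ sent by $P_i$ to $P_j$ per time unit), $\rho$ and the root task $T_r$ are notation introduced here, and sums over $j$ range over the neighbors of $P_i$ in $G_p$.

```latex
\begin{aligned}
\text{maximize } & \rho = \sum_{i} \alpha_i^{r} && (T_r:\ \text{root of the in-tree})\\
\text{subject to } & \sum_{k} \alpha_i^{k}\, w_{i,k} \le 1 && \forall P_i \quad \text{(processor speed)}\\
& \sum_{j} \sum_{(k,l)} s_{i,j}^{k,l}\, F_{k,l}\, c_{i,j} \le 1 && \forall P_i \quad \text{(one-port, outgoing)}\\
& \sum_{j} \sum_{(k,l)} s_{j,i}^{k,l}\, F_{k,l}\, c_{j,i} \le 1 && \forall P_i \quad \text{(one-port, incoming)}\\
& \alpha_i^{k} + \sum_{j} s_{j,i}^{k,l} = \alpha_i^{l} + \sum_{j} s_{i,j}^{k,l} && \forall P_i,\ \forall (T_k \to T_l) \in E_a \quad \text{(conservation)}\\
& \alpha_i^{k} \ge 0,\quad s_{i,j}^{k,l} \ge 0
\end{aligned}
```

The conservation line reads: on $P_i$, the files $F_{k,l}$ produced locally or received equal those consumed locally by $T_l$ or forwarded. Summed over all processors, it forces every task of the tree to be executed at the same overall rate $\rho$.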
Principle of steady-state scheduling: overview. From the linear program to a periodic schedule (period): the optimal solution of the linear program gives rational activities, but tasks and files cannot be split → the period length $L$ is the LCM of the denominators of the activities → multiplying every activity by $L$ makes all activities integers. $L$ can be large, but it is bounded. The period allows us to schedule any number of graphs; the final schedule consists of 3 phases: initialization; steady state ($n$ periods); clean-up.
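The conversion from rational activities to an integral period can be sketched with Python's exact fractions. The function names and the activity values below are hypothetical (made-up numbers chosen so that the 5 executions of $T_2$ per period match the 3 + 2 executions of $T_1$); `periods_needed` is only a rough count of steady-state periods, under the simplifying assumption that the steady-state phase alone completes every instance.

```python
from fractions import Fraction
from math import lcm  # Python 3.9+


def integral_period(activities):
    """activities: dict name -> Fraction (rational LP solution, per time unit).
    Returns (L, counts): L is the LCM of the denominators, and counts gives the
    integer number of occurrences of each activity in one period of length L."""
    L = lcm(*(a.denominator for a in activities.values()))
    counts = {name: int(a * L) for name, a in activities.items()}
    return L, counts


def periods_needed(n_instances, instances_per_period):
    # Ceiling division: periods required if only the steady-state phase did the work
    return -(-n_instances // instances_per_period)


acts = {"T1 on P1": Fraction(1, 4), "T1 on P2": Fraction(1, 6), "T2 on P3": Fraction(5, 12)}
L, counts = integral_period(acts)  # L = 12; counts: 3, 2 and 5 per period
```

Because $L$ is an LCM, a single activity with an awkward denominator is enough to make the period long, which is the root of the shortcomings discussed next.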
Principle of steady-state scheduling: overview. Example (figure): platform graph with processors $P_1$, $P_2$, $P_3$ and links $P_1 \to P_3$, $P_2 \to P_3$; task graph $T_1 \to T_2$ with file $F_{1,2}$; one steady-state period of length $L$ showing the occupation of processors $P_1$, $P_2$, $P_3$ and of links $P_1 \to P_3$, $P_2 \to P_3$; allocations $A_1$ and $A_2$.
Principle of steady-state scheduling: shortcomings. Long latency: several periods are necessary to process one instance → a drawback for interactive applications → leads to large buffers: at every time step, a large number of ongoing jobs has to be stored. Long initialization and clean-up phases: the period contains a large number of ongoing jobs → a long initialization phase to enter the steady state → a long clean-up phase to leave it. Initialization and clean-up are done with heuristic scheduling → we lose the benefit of the optimal steady-state phase.
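The link between latency and buffer size can be quantified with Little's law (a general queueing result swapped in here, not taken from the slides): the average number of instances in progress equals the throughput times the latency. A minimal sketch, assuming every instance spans the same number of periods; the function name and parameters are hypothetical.

```python
from fractions import Fraction


def instances_in_flight(instances_per_period, period_length, periods_spanned):
    """Little's law: in-flight instances = throughput x latency.
    Assumes every instance spans `periods_spanned` full periods."""
    throughput = Fraction(instances_per_period, period_length)  # instances per time unit
    latency = periods_spanned * period_length                   # time units per instance
    return throughput * latency


# E.g. 5 instances completed per period of 12 time units, each spanning 3 periods
buffered = instances_in_flight(5, 12, 3)  # 15 instances stored at any time
```

The result simplifies to instances per period × periods spanned: long periods and long latencies both inflate the number of ongoing jobs, and hence the work left to the initialization and clean-up phases.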
Principle of steady-state scheduling: shortcomings. Addressing the shortcomings: the original steady-state algorithm reaches a good $C_{\max}$ as soon as the number of instances is large enough; in this study, we aim at reducing this threshold.
Principle of steady-state scheduling: addressing the shortcomings. Means of action: decrease the length of the period, which is hard to do when we want to keep an optimal period; reduce the latency (inter-/intra-period dependencies), with a side benefit: less work to do in initialization and clean-up (gain on $C_{\max}$); reduce the period length by allowing a small reduction of the throughput, with a side benefit: a reduced latency.
Reducing the latency: dependencies. How to reduce the latency? Intra-period dependencies. The original steady state, with only inter-period dependencies (figure): platform graph with $P_1$, $P_2$ and $P_3$; task graph $T_1 \to T_2$ with file $F_{1,2}$; steady-state schedule in which $P_1$ processes the even-numbered instances of $T_1$ ($T_1^{2n}, T_1^{2n+2}, \dots$) and $P_2$ the odd-numbered ones, while the files $F_{1,2}$ transferred to $P_3$ and the tasks $T_2$ executed on $P_3$ during the same period belong to instances started in earlier periods.
Reducing the latency: dependencies. The steady state with intra-period dependencies (figure): the same platform and task graph, with the schedule drawn over periods $n$, $n+1$ and $n+2$; some dependencies are now intra-period (satisfied within a single period) while others remain inter-period. Legend: inter-period dependency, intra-period dependency.
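The effect of turning dependencies into intra-period ones on latency can be sketched as follows, under a simplifying model of our own: the operations of one instance (tasks and file transfers) form an in-tree, an inter-period dependency delays its successor by one period, and an intra-period dependency does not. The function `periods_spanned` and its inputs are hypothetical.

```python
def periods_spanned(succ, intra):
    """succ: dict op -> successor op (in-tree of tasks and file transfers; the root has none).
    intra: dict (op, succ[op]) -> True if that dependency is intra-period.
    Returns the number of periods one instance spans, i.e.
    1 + the maximum number of inter-period dependencies on a path to the root."""
    def delays_to_root(op):
        d = 0
        while op in succ:
            nxt = succ[op]
            if not intra[(op, nxt)]:
                d += 1  # the successor runs one period later
            op = nxt
        return d

    ops = set(succ) | set(succ.values())
    return 1 + max((delays_to_root(op) for op in ops), default=0)


# Chain T1 -> F12 (transfer) -> T2, as in the example task graph
chain = {"T1": "F12", "F12": "T2"}
all_inter = periods_spanned(chain, {("T1", "F12"): False, ("F12", "T2"): False})  # 3
all_intra = periods_spanned(chain, {("T1", "F12"): True, ("F12", "T2"): True})    # 1
```

In this model every dependency made intra-period on the longest path saves one full period of latency.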
Reducing the latency: Mixed Integer Program. Ordering tasks $T_j$, $T_k$ on the same processor $P_i$: binary variable $y_{j,k} = 1$ if and only if $T_j$ is processed before $T_k$; $t_j$ is the starting time of task $T_j$ and $L$ is the length of the period.
$t_j - t_k \ge -y_{j,k} \times L$ (1)
$y_{j,k} + y_{k,j} = 1$ (2)
$t_k - (t_j + w_{i,j}) \ge (y_{j,k} - 1) \times L$ (3)
$t_j + w_{i,j} \le L$ (4)
When $y_{j,k} = 1$, (3) forces $T_k$ to start after $T_j$ completes and (1) is always satisfied; when $y_{j,k} = 0$, (1) gives $t_j \ge t_k$ and (3) is always satisfied.
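To make constraints (1) to (4) concrete, here is a sketch of a checker for a candidate solution on a single processor $P_i$; it verifies a given assignment and does not solve the MIP. The function `ordering_ok`, its dictionary inputs and the tolerance `eps` are hypothetical, and it additionally assumes non-negative start times ($t_j \ge 0$).

```python
def ordering_ok(t, w, y, L, eps=1e-9):
    """Check constraints (1)-(4) for the tasks sharing one processor P_i.
    t[j]: start time of T_j; w[j]: w_{i,j}; y[(j, k)] in {0, 1} for every ordered
    pair j != k; L: period length. Also assumes t_j >= 0."""
    tasks = list(t)
    for j in tasks:
        if t[j] < -eps or t[j] + w[j] > L + eps:  # t_j >= 0 and constraint (4)
            return False
        for k in tasks:
            if j == k:
                continue
            if y[(j, k)] + y[(k, j)] != 1:  # (2): exactly one order
                return False
            if t[j] - t[k] < -y[(j, k)] * L - eps:  # (1)
                return False
            if t[k] - (t[j] + w[j]) < (y[(j, k)] - 1) * L - eps:  # (3)
                return False
    return True


# Two tasks on one processor, period L = 10: A runs in [0, 4), then B in [4, 9)
y = {("A", "B"): 1, ("B", "A"): 0}
feasible = ordering_ok({"A": 0, "B": 4}, {"A": 4, "B": 5}, y, 10)  # True
```

Starting B at time 3 would violate (3) (overlap with A), and starting it at time 6 would violate (4) (B would end after the period).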
Reducing the latency: Mixed Integer Program. Dependencies: for each dependency $T_j \to T_k$, binary variable $e_{j,k} = 1$ if the dependency is intra-period ($e_{j,k} = 0$ if it is inter-period).
$t_k - (t_j + w_{i,j}) \ge (e_{j,k} - 1) \times L$ (5)
When $e_{j,k} = 1$, (5) forces $T_k$ to start after $T_j$ completes within the same period; when $e_{j,k} = 0$, it is always satisfied. Objective: maximize $\sum_{T_j \to T_k} e_{j,k}$ under the constraints (1), (2), (3), (4) and (5).
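One observation helps in reading the objective: once the start times are fixed, constraint (5) allows $e_{j,k} = 1$ exactly when $t_k \ge t_j + w_{i,j}$, and $e_{j,k} = 0$ is always feasible, so the best flags follow directly. The MIP optimizes start times and flags jointly; the sketch below only illustrates this inner step. The function `best_intra_flags` is hypothetical, and `w[j]` stands for the duration of $T_j$ on its own processor.

```python
def best_intra_flags(deps, t, w):
    """deps: list of (j, k) for dependencies T_j -> T_k.
    For FIXED start times t, constraint (5) t_k - (t_j + w_j) >= (e_{j,k} - 1) * L
    permits e_{j,k} = 1 only if T_k starts after T_j ends, while e_{j,k} = 0 is
    always feasible. Returns the optimal flags and the objective value."""
    e = {(j, k): int(t[k] >= t[j] + w[j]) for j, k in deps}
    return e, sum(e.values())  # objective: number of intra-period dependencies


# Chain T1 -> F12 -> T2 with hypothetical start times and durations in a period of length 10
deps = [("T1", "F12"), ("F12", "T2")]
e, objective = best_intra_flags(deps, {"T1": 0, "F12": 3, "T2": 5}, {"T1": 3, "F12": 3, "T2": 2})
```

Here the transfer can follow $T_1$ within the period, but $T_2$ starts before the transfer ends, so only one of the two dependencies is intra-period.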