Toward a Fully Decentralized Algorithm for Multiple Bag-of-tasks Application Scheduling on Grids
Rémi Bertin, Arnaud Legrand, Corinne Touati
Laboratoire LIG, CNRS-INRIA Grenoble, France
Aussois Workshop
A. Legrand (CNRS-LIG) INRIA-MESCAL Fair and Distributed Scheduling 1 / 24
Outline
1 Framework
2 Lagrangian Optimization
3 Simulations: Early Results
Motivation
Large-scale distributed computing platforms result from the collaboration of many users:
◮ Resources should be shared fairly among users.
◮ The size of these systems prevents the use of centralized approaches ⇒ need for distributed scheduling.
◮ Task regularity (SETI@home, BOINC, . . . ) ⇒ steady-state scheduling.
Goal: design a fair and distributed scheduling algorithm for this framework.
Platform Model
◮ General platform graph $G = (N, E, W, B)$.
◮ Speed of $P_n \in N$: $W_n$ (in MFlops/s).
◮ Bandwidth of $(P_i \to P_j)$: $B_{i,j}$ (in MB/s).
◮ Linear-cost communication and computation model: $X/B_{i,j}$ time units to send a message of size $X$ from $P_i$ to $P_j$.
◮ Communications and computations can be overlapped.
◮ Multi-port communication model.
[Figure: two processors $P_i$ and $P_j$, with speeds $W_i$ and $W_j$, joined by a link $P_i \to P_j$ of bandwidth $B_{i,j}$.]
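To make the linear-cost model concrete, here is a minimal Python sketch. It assumes that computation follows the same linear rule as communication (executing $w$ MFlops on $P_n$ takes $w/W_n$ time units), which the slide implies but does not write out, and that under the multi-port model each link is limited only by its own bandwidth. The class `Platform` and its method names are hypothetical. The `round_time` helper illustrates what overlap buys: local computation and sends to distinct neighbors proceed concurrently, so their durations combine with a max rather than a sum.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Platform:
    """Platform graph G = (N, E, W, B) under a linear-cost model (sketch).

    speed[n]          -> W_n in MFlops/s
    bandwidth[(i, j)] -> B_{i,j} in MB/s for the link P_i -> P_j
    """

    speed: dict[str, float]
    bandwidth: dict[tuple[str, str], float]

    def comm_time(self, src: str, dst: str, size_mb: float) -> float:
        # Sending a message of size X from P_i to P_j takes X / B_{i,j}.
        return size_mb / self.bandwidth[(src, dst)]

    def comp_time(self, node: str, work_mflops: float) -> float:
        # Assumed linear computation cost: w MFlops on P_n take w / W_n.
        return work_mflops / self.speed[node]

    def round_time(self, node: str, work_mflops: float,
                   sends: dict[str, float]) -> float:
        # Overlap + multi-port (assumption: each link is limited only by its
        # own bandwidth): computing and sending to distinct neighbors happen
        # concurrently, so the elapsed time is the max, not the sum.
        times = [self.comp_time(node, work_mflops)]
        times += [self.comm_time(node, dst, mb) for dst, mb in sends.items()]
        return max(times)
```

For example, with `Platform({"P1": 100.0, "P2": 50.0}, {("P1", "P2"): 10.0})`, computing 300 MFlops on P1 while sending 20 MB to P2 takes `round_time("P1", 300.0, {"P2": 20.0}) == 3.0` time units, whereas a model without overlap would need 3.0 + 2.0 = 5.0.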
Application Model
Multiple applications:
◮ A set $A$ of $K$ applications $A_1, \ldots, A_K$.
◮ Each consists of a large number of same-size independent tasks ⇒ each application $A_k$ is defined by a computation cost $w_k$ (in MFlops) and a communication cost $b_k$ (in MB) per task.
◮ Different applications have different communication and computation demands.
[Figure: three applications $A_1$, $A_2$, $A_3$.]
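A minimal sketch of the application model, assuming each application is reduced to its per-task pair $(w_k, b_k)$; the `Application` class and the `task_times` and `bottleneck` helpers are hypothetical names. It shows why differing demands matter: on the same node and link, one application can be computation-bound while another is communication-bound. The numeric values below are illustrative and are not taken from the talk.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Application:
    """Bag-of-tasks application A_k: many identical independent tasks."""

    name: str
    w: float  # computation cost per task, w_k (MFlops)
    b: float  # communication cost per task, b_k (MB)


def task_times(app: Application, speed: float,
               bandwidth: float) -> tuple[float, float]:
    """Return (compute time, transfer time) for one task of `app`.

    speed is W_n (MFlops/s) of the executing node; bandwidth is B_{i,j}
    (MB/s) of a link that the task's input data crosses.
    """
    return app.w / speed, app.b / bandwidth


def bottleneck(app: Application, speed: float, bandwidth: float) -> str:
    # Whichever per-task time is larger limits this application here.
    comp, comm = task_times(app, speed, bandwidth)
    return "computation" if comp >= comm else "communication"
```

With $W_n = 100$ MFlops/s and $B_{i,j} = 10$ MB/s, `Application("A1", w=1000.0, b=1.0)` yields per-task times `(10.0, 0.1)` and is computation-bound, while `Application("A2", w=10.0, b=5.0)` yields `(0.1, 0.5)` and is communication-bound.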
Hierarchical Deployment
◮ Each application $A_k$ originates from a master node $P_{m(k)}$ that initially holds all the input data of $A_k$.
◮ Communications are only required outward from the master nodes: the amount of data returned by the workers is negligible.
◮ Each application $A_k$ is deployed on the platform as a tree. Therefore, if application $k$ uses a node $P_n$, all its data follows a single path from $P_{m(k)}$ to $P_n$, denoted by $(P_{m(k)} \rightsquigarrow P_n)$.
[Figure: platform with two master nodes $P_{m(1)}$ and $P_{m(2)}$.]
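Because each application is deployed as a tree rooted at its master, the path $(P_{m(k)} \rightsquigarrow P_n)$ can be recovered from parent pointers. The sketch below assumes the tree of application $k$ is given as a dictionary `parent` mapping each node to its parent (a hypothetical representation; the talk does not specify one) and returns the links that all of $A_k$'s data sent to $P_n$ must cross.

```python
from __future__ import annotations


def path_from_master(parent: dict[str, str], master: str,
                     node: str) -> list[tuple[str, str]]:
    """Links (u, v) on the unique tree path P_m(k) ~> P_n, master first.

    parent[v] is v's parent in application k's deployment tree
    (hypothetical representation chosen for this sketch).
    """
    links: list[tuple[str, str]] = []
    seen = {node}
    v = node
    while v != master:
        if v not in parent:
            raise ValueError(f"{node} is not in the tree rooted at {master}")
        u = parent[v]
        if u in seen:
            raise ValueError("parent pointers contain a cycle")
        seen.add(u)
        links.append((u, v))
        v = u
    links.reverse()  # we walked node-to-root; return master-to-node order
    return links
```

For the tree `{"P2": "P1", "P3": "P2", "P4": "P1"}` rooted at `"P1"`, the call `path_from_master(tree, "P1", "P3")` gives `[("P1", "P2"), ("P2", "P3")]`. Since data only flows outward, every task of $A_k$ executed on $P_n$ puts $b_k$ MB on each of these links.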
Steady-State Scheduling and Utility
◮ All tasks of a given application are identical and independent ⇒ we do not really need to care about where and when each individual task is executed (as opposed to classical scheduling problems).
◮ We only need to focus on average values in steady state.
◮ Steady-state values:
  ◮ Variables: average number of tasks of type $k$ processed by processor $n$ per time unit: $\varrho_{n,k}$.
  ◮ Throughput of application $k$: $\varrho_k = \sum_{n \in N} \varrho_{n,k}$.
Theorem 1. From "feasible" $\varrho_{n,k}$, it is possible to build an optimal periodic infinite schedule (i.e., one whose steady-state rates are exactly the $\varrho_{n,k}$). Such a schedule is asymptotically optimal for the makespan.
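The theorem relies on the rates being "feasible", which the slide does not define. A natural reading under the model above (overlap, multi-port, outward-only data along tree paths), stated here as an assumption, is that each node's computation load and each link's data load stay within capacity: $\sum_k \varrho_{n,k}\, w_k \le W_n$ for every $P_n$, and $\sum_k \sum_{n \,:\, (i,j) \in (P_{m(k)} \rightsquigarrow P_n)} \varrho_{n,k}\, b_k \le B_{i,j}$ for every link $(P_i \to P_j)$. The Python sketch below checks these constraints with exact rational arithmetic and builds one simple periodic pattern: with period $T$ equal to the lcm of the rate denominators, $P_n$ processes $\varrho_{n,k} T$ tasks of $A_k$ per period. All function names are hypothetical, and this illustrates the idea behind the theorem rather than the authors' construction.

```python
from fractions import Fraction
from math import lcm  # Python 3.9+

# rho[(n, k)]   : Fraction, tasks of A_k processed by P_n per time unit
# w[k], b[k]    : per-task computation (MFlops) and communication (MB) costs
# speed[n]      : W_n (MFlops/s); bandwidth[(i, j)]: B_{i,j} (MB/s)
# paths[(k, n)] : links on (P_m(k) ~> P_n); empty when P_n is the master


def throughput(rho: dict, k: str) -> Fraction:
    """rho_k = sum over n of rho_{n,k}."""
    return sum((r for (_, kk), r in rho.items() if kk == k), Fraction(0))


def is_feasible(rho, w, b, speed, bandwidth, paths) -> bool:
    comp: dict = {}  # computation load per node (MFlops per time unit)
    link: dict = {}  # data load per link (MB per time unit)
    for (n, k), r in rho.items():
        if r < 0:
            return False
        comp[n] = comp.get(n, Fraction(0)) + r * w[k]
        # Outward-only data: each task sent to P_n crosses every link of its path.
        for e in paths[(k, n)]:
            link[e] = link.get(e, Fraction(0)) + r * b[k]
    return (all(load <= speed[n] for n, load in comp.items())
            and all(load <= bandwidth[e] for e, load in link.items()))


def periodic_pattern(rho: dict):
    """Period T (lcm of rate denominators) and integer tasks per period."""
    period = lcm(*(Fraction(r).denominator for r in rho.values()))
    return period, {key: int(Fraction(r) * period) for key, r in rho.items()}
```

On a toy platform with $W_{P1} = 10$, $W_{P2} = 5$, $B_{P1,P2} = 2$ and one application ($w = 20$, $b = 3$, master P1), the rates $\varrho_{P1} = 1/2$ and $\varrho_{P2} = 1/4$ saturate both processors without saturating the link, so they are feasible with throughput $3/4$; `periodic_pattern` then returns period 4 with 2 tasks on P1 and 1 task on P2 per period.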