Scheduling the I/O of HPC applications under congestion
Ana Gainaru, Guillaume Aupy, Anne Benoit, Yves Robert, Franck Cappello & Marc Snir
JLPC Sophia-Antipolis - June 2014
Outline

1 Motivation
2 Model
   Platform
   Applications
   Objectives
3 Algorithms
4 Simulations
   Applications
   Assessment of heuristics
5 Experiments
6 Conclusion
Interconnect technologies: A major challenge

Without efficient interconnect technology, exascale systems would be more like data-centers.

The challenge: Flops are “free”, we need to optimize data-movement!
Figure: Analysis of the Intrepid system @Argonne: I/O throughput decrease (percentage per application, over 400 applications).
Platform

• N unit-speed processors, each equipped with an I/O card of bandwidth $b$
• Centralized I/O system with total bandwidth $B$

Figure: Model instantiation for the Intrepid platform ($b = 0.1$ Gb/s per node; the figure also labels the total bandwidth $B$).
Applications

K applications competing for I/O. For application $\mathrm{App}^{(k)}$:

• Released at time $r_k$;
• Executed on $\beta^{(k)}$ processors;
• $n_{\mathrm{tot}}^{(k)}$ instances: instance $I_i^{(k)}$ consists of $w^{(k,i)}$ units of computation followed by the transfer of a volume $\mathrm{vol}_{\mathrm{io}}^{(k,i)}$;
• The minimum time to transfer $\mathrm{vol}_{\mathrm{io}}^{(k,i)}$ is:
  $$\mathrm{time}_{\mathrm{io}}^{(k,i)} = \frac{\mathrm{vol}_{\mathrm{io}}^{(k,i)}}{\min(\beta^{(k)} b,\, B)};$$
• Last instance finishes at time $d_k$.
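The minimum I/O time above can be checked numerically with a small sketch. The function name `min_io_time` and the sample values are illustrative assumptions, not taken from the slides; only the formula itself comes from the model.

```python
def min_io_time(vol_io, beta, b, B):
    """Minimum time to transfer vol_io for an application on beta processors.

    The application can push at most beta * b (its own I/O cards) and at
    most B (the centralized I/O system), so the usable bandwidth is the
    smaller of the two.
    """
    return vol_io / min(beta * b, B)


# Hypothetical values: 4 processors at b = 0.5 each could push 2.0 in total,
# but the shared system caps the transfer at B = 1.0.
print(min_io_time(8.0, beta=4, b=0.5, B=1.0))  # limited by B
print(min_io_time(8.0, beta=1, b=0.5, B=1.0))  # limited by beta * b
```

The first call is bounded by the shared bandwidth B, the second by the application's own I/O cards.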
Figure (animation): I/O bandwidth usage (axis bw, from 0 to B) over Time for App(1), App(2) and App(3). Each application alternates computation phases $w^{(k,1)}$, $w^{(k,2)}$, $w^{(k,3)}$ with I/O transfers that share the total bandwidth B.
Objectives

Definition (Application efficiency)

$$\tilde{\rho}^{(k)}(t) = \frac{\sum_{i \le n^{(k)}(t)} w^{(k,i)}}{t - r_k},$$

where $n^{(k)}(t)$ is the number of instances of $\mathrm{App}^{(k)}$ executed at time $t$.

Obviously:

$$t - r_k \ge \sum_{i \le n^{(k)}(t)} \left( w^{(k,i)} + \mathrm{time}_{\mathrm{io}}^{(k,i)} \right).$$

Hence:

$$\tilde{\rho}^{(k)}(t) \le \rho^{(k)}(t) = \frac{\sum_{i \le n^{(k)}(t)} w^{(k,i)}}{\sum_{i \le n^{(k)}(t)} \left( w^{(k,i)} + \mathrm{time}_{\mathrm{io}}^{(k,i)} \right)}.$$
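As a small numeric illustration of the two efficiencies, the following sketch computes the achieved efficiency and its congestion-free upper bound for one application. The function names and the sample numbers are assumptions made for the example.

```python
def achieved_efficiency(work, t, release):
    """Achieved efficiency (rho tilde): work of the instances executed by
    time t, divided by the time elapsed since the release date."""
    return sum(work) / (t - release)


def optimal_efficiency(work, io_times):
    """Upper bound (rho): same work, divided by the time the instances
    would take with no congestion (computation plus minimum I/O time)."""
    return sum(work) / (sum(work) + sum(io_times))


# Hypothetical application: two instances of 3 units of work each,
# each followed by an I/O transfer of minimum duration 1.
work = [3.0, 3.0]
io_times = [1.0, 1.0]

rho_tilde = achieved_efficiency(work, t=10.0, release=0.0)
rho = optimal_efficiency(work, io_times)
print(rho_tilde, rho)
```

Here the application needed 10 time units instead of the congestion-free 8, so its achieved efficiency falls below the bound.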
Objectives

• SysEfficiency:
  $$\text{maximize } \frac{1}{N} \sum_{k=1}^{K} \beta^{(k)} \tilde{\rho}^{(k)}(d_k).$$

• Dilation:
  $$\text{minimize } \max_{k=1..K} \frac{\rho^{(k)}(d_k)}{\tilde{\rho}^{(k)}(d_k)}.$$
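Both objectives can be evaluated from per-application values at completion time. The sketch below assumes each application is described by a tuple `(beta, rho_tilde, rho)`; this data layout, the function names and the numbers are illustrative choices, not part of the slides.

```python
def sys_efficiency(apps, N):
    """SysEfficiency: processor-weighted average of achieved efficiencies.

    apps is a list of (beta, rho_tilde, rho) tuples evaluated at d_k.
    """
    return sum(beta * rho_tilde for beta, rho_tilde, _ in apps) / N


def dilation(apps):
    """Dilation: largest ratio between the optimal and achieved efficiency."""
    return max(rho / rho_tilde for _, rho_tilde, rho in apps)


# Hypothetical platform with N = 4 processors and two applications.
apps = [
    (2, 0.5, 0.75),   # slowed down by a factor 1.5
    (2, 0.25, 0.5),   # slowed down by a factor 2
]
print(sys_efficiency(apps, N=4))
print(dilation(apps))
```

SysEfficiency rewards keeping processors busy overall, while Dilation is driven by the single most penalized application, so the two objectives can pull in different directions.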
Scheduler

The scheduler monitors the stream of I/O calls and decides on the fly which applications can perform I/O.

• At each time step, it has access to the state of the system (the efficiency $\tilde{\rho}^{(k)}$ of each application).
• Based on a given strategy, it chooses a subset of applications that are allowed to perform I/O.

When a strategy favors $\mathrm{App}^{(k)}$, it means that the I/O of $\mathrm{App}^{(k)}$ is executed as fast as possible, that is, with bandwidth $\min(b\,\beta^{(k)},\, bw_{\mathrm{avail}})$.
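One possible reading of "favoring" an application is a greedy pass over the applications in priority order, giving each as much of the remaining bandwidth as it can use. The sketch below works under that assumption; the greedy loop, the function name `allocate_bandwidth` and the sample values are illustrative and not confirmed by the slides.

```python
def allocate_bandwidth(order, beta, b, B):
    """Greedy allocation: visit applications in priority order and give each
    min(b * beta[k], bw_avail), where bw_avail is what is left of B."""
    bw_avail = B
    alloc = {}
    for k in order:
        bw = min(b * beta[k], bw_avail)
        alloc[k] = bw
        bw_avail -= bw
    return alloc


# Hypothetical state: three applications on 4 processors each, b = 0.5, B = 3.
beta = {1: 4, 2: 4, 3: 4}
alloc = allocate_bandwidth(order=[1, 2, 3], beta=beta, b=0.5, B=3.0)
print(alloc)
```

With these values the first application runs at full speed, the second gets the leftover bandwidth, and the third has to wait.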
Different strategies

• RoundRobin: Similar to the current scheduler in HPC systems. Applications are served following the “First-Come, First-Served” principle.
• MinDilation: favors applications with high values of $\rho^{(k)}(t) / \tilde{\rho}^{(k)}(t)$.
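The two strategies can be viewed as two ways of ordering the applications that are waiting for I/O. The following sketch assumes that each pending request carries an arrival time and that the current efficiencies are available as dictionaries; these data structures and function names are hypothetical.

```python
def round_robin_order(requests):
    """First-Come, First-Served: order applications by request arrival time.

    requests is a list of (arrival_time, app_id) pairs.
    """
    return [app_id for _, app_id in sorted(requests)]


def min_dilation_order(rho, rho_tilde):
    """MinDilation: serve first the applications with the highest ratio
    rho / rho_tilde, i.e. those currently slowed down the most."""
    return sorted(rho, key=lambda k: rho[k] / rho_tilde[k], reverse=True)


# Hypothetical pending requests and efficiencies for three applications.
requests = [(3.0, 1), (1.0, 2), (2.0, 3)]
rho = {1: 0.75, 2: 0.5, 3: 0.5}
rho_tilde = {1: 0.5, 2: 0.25, 3: 0.5}

print(round_robin_order(requests))
print(min_dilation_order(rho, rho_tilde))
```

RoundRobin ignores the state of the applications, whereas MinDilation uses the efficiencies the scheduler observes at each time step.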