

  1. Periodic I/O Scheduling for Supercomputers. Guillaume Aupy (1), Ana Gainaru (2), Valentin Le Fèvre (3). (1) Inria & U. of Bordeaux; (2) Vanderbilt University; (3) ENS Lyon & Inria. PMBS Workshop, November 2017. Slides available at https://project.inria.fr/dash/

  2. IO congestion in HPC systems. Some numbers for motivation:
◮ Computational power keeps increasing (Intrepid: 0.56 PFlop/s; Mira: 10 PFlop/s; Aurora: 450 PFlop/s (?)).
◮ I/O bandwidth increases at a slower rate (Intrepid: 88 GB/s; Mira: 240 GB/s; Aurora: 1 TB/s (?)).
In other terms: Intrepid can process 160 GB for every PFlop; Mira, 24 GB for every PFlop; Aurora will (?) process 2.2 GB for every PFlop. Congestion is coming.
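
The GB-per-PFlop figures follow directly from the ratios above; a quick check in Python (numbers as quoted on the slide, minor rounding aside):

```python
# Peak compute (PFlop/s) and peak I/O bandwidth (GB/s), as quoted above.
machines = {
    "Intrepid": (0.56, 88),
    "Mira": (10, 240),
    "Aurora (projected)": (450, 1000),  # 1 TB/s = 1000 GB/s
}

for name, (pflops, gbps) in machines.items():
    # GB the storage system can absorb per PFlop of computation.
    print(f"{name}: {gbps / pflops:.1f} GB per PFlop")
# Intrepid: 157.1 (slide rounds to 160), Mira: 24.0, Aurora: 2.2
```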

  3. Burst buffers: the solution? Simplistically:
◮ If I/O bandwidth is available: use it;
◮ Else, fill the burst buffers;
◮ When I/O bandwidth becomes available: empty the burst buffers.
If the burst buffers are big enough, it should work, right?
Average I/O occupation: the sum over all applications of the volume of I/O transferred, divided by the time they execute, normalized by the peak I/O bandwidth. Given a set of data-intensive applications running conjointly:
◮ Intrepid has a max average I/O occupation of 25%;
◮ Mira has an average I/O occupation of 120 to 300%!
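
A minimal sketch of how this metric could be computed, with a hypothetical workload (volumes in GB, times in seconds, bandwidth in GB/s; the numbers are invented, not the Intrepid/Mira traces):

```python
def average_io_occupation(apps, peak_bw):
    """apps: list of (io_volume, exec_time) pairs, one per application.
    Returns the aggregate I/O demand as a fraction of peak I/O bandwidth."""
    return sum(vol / t for vol, t in apps) / peak_bw

# Three hypothetical data-intensive applications on a 240 GB/s machine.
apps = [(400_000, 3_600), (300_000, 1_800), (600_000, 7_200)]
print(f"{average_io_occupation(apps, 240):.0%}")  # 150% -> congestion
```

An occupation above 100% means the applications collectively demand more than the peak bandwidth: no buffer of finite size can hide that forever.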

  4. Previously in I/O cong.: “online” scheduling (Gainaru et al., IPDPS'15):
◮ When an application is ready to do I/O, it sends a message to an I/O scheduler;
◮ Based on the other applications running and a priority function, the I/O scheduler gives a GO or NOGO to the application;
◮ If the application receives a NOGO, it pauses until a GO instruction;
◮ Else, it performs I/O.
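
On the application side, this protocol amounts to a blocking request loop. A minimal sketch, assuming a hypothetical scheduler object with an `ask()` method returning "GO" or "NOGO" (this is not the authors' implementation):

```python
import time

def perform_io(volume):
    print(f"transferring {volume} GB")  # stub for the actual data transfer

def request_io(scheduler, app_id, volume):
    # Ask the central I/O scheduler for permission before touching storage.
    while scheduler.ask(app_id, volume) == "NOGO":
        time.sleep(0.1)  # paused: wait for the scheduler to grant a GO
    perform_io(volume)   # GO received: perform the I/O
```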

  5. Previously in I/O cong.
[Figure, built up over several animation slides: a bandwidth/time Gantt chart of three applications App(1), App(2), App(3), each alternating compute phases w(k) with I/O transfers that share the total bandwidth B.]
Approx. 10% improvement in application performance with a 5% gain in system performance on Intrepid.

  6. This work. Assumption: applications follow I/O patterns (periodic patterns, to be defined) that we can obtain (based on historical data, for instance).
◮ We use this information to compute an I/O time schedule;
◮ Each application then knows its GO/NOGO information and uses it to perform I/O.
Spoiler: it works very well (at least it seems to).

  7. I/O characterization of HPC applications (Hu et al., 2016):
1. Periodicity: computation and I/O phases (write operations such as checkpoints).
2. Synchronization: parallel identical jobs lead to synchronized I/O operations.
3. Repeatability: jobs run several times with different inputs.
4. Burstiness: short bursts of write operations.
Idea: use the periodic behavior to compute periodic schedules.

  8. Platform model.
◮ N unit-speed processors, each equipped with an I/O card of bandwidth b;
◮ Centralized I/O system with total bandwidth B.
[Figure: model instantiation for the Intrepid platform, with b = 0.1 Gb/s per node and total bandwidth B.]

  9. Application model. K periodic applications are already scheduled in the system: App(k) = (β(k), w(k), vol_io(k)).
◮ β(k) is the number of processors onto which App(k) is assigned;
◮ w(k) is the computation time of a period;
◮ vol_io(k) is the volume of I/O to transfer after the w(k) units of time.
The I/O transfer time of one instance is time_io(k) = vol_io(k) / min(β(k) · b, B).
[Figure: the Gantt chart again, each App(k) repeating w(k) compute phases followed by I/O under bandwidth B.]
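
A compact rendering of this model as a sketch; the Intrepid instantiation reuses b = 0.1 Gb/s/node ≈ 0.0125 GB/s from the platform slide, and the application numbers are invented:

```python
from dataclasses import dataclass

@dataclass
class Platform:
    N: int      # number of unit-speed processors
    b: float    # per-node I/O card bandwidth (GB/s)
    B: float    # total centralized I/O bandwidth (GB/s)

@dataclass
class App:
    beta: int      # processors assigned to the application, β(k)
    w: float       # computation time of one period, w(k), in seconds
    vol_io: float  # I/O volume transferred after each compute phase (GB)

    def time_io(self, p: Platform) -> float:
        # An application can use neither more than its aggregate card
        # bandwidth β(k)·b nor more than the centralized bandwidth B.
        return self.vol_io / min(self.beta * p.b, p.B)

intrepid = Platform(N=40_960, b=0.0125, B=88)  # 0.1 Gb/s/node ≈ 0.0125 GB/s
app = App(beta=2_048, w=600, vol_io=500)
print(app.time_io(intrepid))  # ≈ 19.5 s, limited here by β·b = 25.6 GB/s
```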

  10. Objectives. If App(k) runs during a total time T_k and performs n(k) instances, we define
ρ(k) = w(k) / (w(k) + time_io(k))  (the congestion-free efficiency),
ρ̃(k) = n(k) · w(k) / T_k  (the achieved efficiency).
SysEfficiency: maximize peak performance (average number of Flops): maximize (1/N) Σ_{k=1..K} β(k) · ρ̃(k).
Dilation: minimize the largest slowdown (fairness between users): minimize max_{k=1..K} ρ(k) / ρ̃(k).
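
Both objectives are straightforward to evaluate once the ρ values are known; a sketch with invented numbers:

```python
def sys_efficiency(apps, N):
    """apps: list of (beta, rho_tilde) pairs; N: total processors.
    Average fraction of the machine doing useful compute."""
    return sum(beta * rho_t for beta, rho_t in apps) / N

def dilation(ratios):
    """ratios: list of (rho, rho_tilde) pairs, one per application.
    Largest slowdown relative to the congestion-free execution."""
    return max(rho / rho_t for rho, rho_t in ratios)

# Hypothetical: two apps on N = 1000 processors.
# App 1: β=600, ρ=0.9, achieved ρ̃=0.75; App 2: β=400, ρ=0.8, ρ̃=0.8.
print(sys_efficiency([(600, 0.75), (400, 0.8)], N=1000))  # 0.77
print(dilation([(0.9, 0.75), (0.8, 0.8)]))                # 1.2
```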

  11. High-level constraints.
◮ Applications are already scheduled on the machine: it is not (yet) our job to do that;
◮ We want the schedule information distributed over the applications: the goal is not to add a new congestion point;
◮ Computing a full I/O schedule over all iterations of all applications would be too expensive, (i) in time and (ii) in space;
◮ We want minimal overhead for application users: otherwise, our guess is, users might not like it that much.
We introduce Periodic Scheduling.

  12. Periodic schedules.
[Figure (a), periodic schedule (phases): an Init phase ending at time c, then a pattern of length T repeated n times (at c, T + c, 2T + c, ..., (n − 1)T + c), then a Clean-up phase after nT + c.]
[Figure (b), detail of I/O in a period/pattern: each application's vol_io(k) transfers are placed within the period of length T under the bandwidth bound B, delimited by events such as initW_1^(4), initIO_1^(4) and endW_1^(4).]
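
A sketch of the replay step, under the assumption that each application is handed one fixed I/O window per pattern (the parameter names mirror the slide's initIO events but are illustrative, not the paper's exact notation):

```python
def go_windows(init_io, duration, T, c, n):
    """Absolute I/O windows for one application.
    init_io: offset of the app's I/O window inside the pattern;
    duration: window length; T: pattern length;
    c: end of the Init phase; n: number of pattern repetitions."""
    for i in range(n):
        start = c + i * T + init_io  # the window repeats with period T
        yield (start, start + duration)

# Pattern of length T=100 s repeated n=3 times after a 10 s Init phase;
# this app's window opens 40 s into each pattern and lasts 20 s.
print(list(go_windows(init_io=40, duration=20, T=100, c=10, n=3)))
# [(50, 70), (150, 170), (250, 270)]
```

This is what makes the schedule cheap to distribute: each application only needs its offsets within one period, not the full schedule over all iterations.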
