CooRM v2: An RMS with Support for Non-predictably Evolving Applications

Cristian KLEIN, Christian PÉREZ. Avalon / GRAAL, INRIA / LIP, ENS de Lyon. Scheduling Workshop, May 29–June 1, 2011, Aussois.


  1. CooRM v2: An RMS with Support for Non-predictably Evolving Applications
     Cristian KLEIN, Christian PÉREZ
     Avalon / GRAAL, INRIA / LIP, ENS de Lyon
     Scheduling Workshop, May 29–June 1, 2011, Aussois

  2. Adaptive Mesh Refinement Applications (AMR)
     Mesh is dynamically refined / coarsened as required by numerical precision
     ◮ Memory requirements increase / decrease
     ◮ Amount of parallelism increases / decreases
     Generally evolves non-predictably
     Goal: maintain a given target efficiency
     [Figures: duration of a step (s) against number of nodes (1 to 16k) for data sizes from 12 GiB to 3136 GiB; normalized data size against step number (0 to 1000).]

  3. Executing AMR applications on HPC resources (1/2)
     Use static allocations (rigid jobs)
     ◮ E.g., cluster, supercomputing batch schedulers
     Evolution is not known in advance
     → User is forced to over-allocate
     → Inefficient resource usage
     Example: target efficiency 75% (±10%)
     [Figure: relative data size (1/8 to 8) against number of nodes (0 to 5000).]
     Ideally, unused resources should be filled by other applications
     ◮ Needs support from the Resource Management System (RMS)
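
The cost of over-allocating can be made concrete with a small sketch. It compares the node-steps a rigid job reserves (peak demand held for the whole run) with the node-steps the application actually needs. The demand trace and the function name `static_allocation_utilisation` are hypothetical and are not taken from the slides.

```python
def static_allocation_utilisation(demand):
    """Fraction of reserved node-steps that a rigid job actually uses.

    `demand` is a hypothetical list of per-step node requirements.
    A rigid job is assumed to reserve the peak for every step.
    """
    if not demand:
        raise ValueError("demand trace must not be empty")
    peak = max(demand)
    if peak <= 0:
        raise ValueError("peak demand must be positive")
    reserved = peak * len(demand)  # node-steps held by the static allocation
    used = sum(demand)             # node-steps the application really needs
    return used / reserved


# Illustrative trace (made up): the application grows, then shrinks again.
trace = [10, 20, 40, 80, 40, 20, 10]
print(f"utilisation: {static_allocation_utilisation(trace):.2f}")
```

Whatever the trace, the gap between this ratio and 1 is the share of the allocation that other applications could have filled.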

  4. Executing AMR applications on HPC resources (2/2)
     Use dynamic allocations
     Malleable jobs: RMS tells applications to grow/shrink
     Clouds: “The illusion of infinite computing resources available on demand”
     ◮ Infinite? Actually up to 20
     ◮ Even without this limit: “Out of capacity” errors
     → Application may run out-of-memory
     Ideally, RMS guarantees the availability of resources to an AMR application?

  9. Problem
     A Resource Management System (RMS) which allows non-predictably evolving applications
     ◮ To use resources efficiently
     ◮ Guarantee the availability of resources

  10. Outline
      1 Introduction
      2 CooRM v2
        ◮ Resource Requests
        ◮ High-level Operations
        ◮ Views
        ◮ Scheduling Algorithm
      3 Application Examples
        ◮ Non-predictably Evolving: Adaptive Mesh Refinement
        ◮ Malleable: Parameter-Sweep Application
      4 Results
      5 Conclusions

  12. Resource Requests
      Cluster ID, number of nodes, duration
      ◮ RMS chooses start time → node IDs are allocated to the application
      Type
      ◮ Non-preemptible (default in major RMSs)
      ◮ Preemptible (think OAR best-effort jobs)
      ◮ Pre-allocation: “I do not currently need these resources, but make sure I can get them immediately if I need them.”
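
The request described on this slide could be modelled as a small record. This is only a sketch: the class and field names (`ResourceRequest`, `RequestType`, `cluster_id`, `kind`) and the choice of seconds for the duration are assumptions made for illustration, not CooRM v2's actual interface.

```python
from dataclasses import dataclass
from enum import Enum


class RequestType(Enum):
    """The three request types listed on the slide."""
    NON_PREEMPTIBLE = "non-preemptible"
    PREEMPTIBLE = "preemptible"
    PRE_ALLOCATION = "pre-allocation"


@dataclass(frozen=True)
class ResourceRequest:
    """Hypothetical record for one request: cluster ID, node count, duration, type.

    The start time is deliberately absent: per the slide, the RMS chooses it.
    """
    cluster_id: str
    nodes: int
    duration: float  # assumed to be in seconds
    kind: RequestType = RequestType.NON_PREEMPTIBLE

    def __post_init__(self):
        if self.nodes <= 0:
            raise ValueError("nodes must be positive")
        if self.duration <= 0:
            raise ValueError("duration must be positive")


# A pre-allocation: resources not needed yet, but reserved in case they are.
reserve = ResourceRequest("cluster-a", nodes=64, duration=3600.0,
                          kind=RequestType.PRE_ALLOCATION)
print(reserve)
```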

  16. High-level Operations
      Low-level Operations
      ◮ CooRM v2 defines simple, low-level operations on requests
      High-level Operations
      ◮ An update is guaranteed to succeed only inside a pre-allocation

  17. Views
      Apps need to adapt their requests to the availability of the resources
      Each app is presented with two views: non-preemptible, preemptible
      Preemptible view informs when resources need to be preempted
      [Figure: number of nodes (0 to 14) against time (0 to 140 minutes) for the preemptible view and the non-preemptible view.]
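
One way an application might read a view is sketched below. It assumes a view is a step function given as a sorted list of `(time, available_nodes)` pairs, where each entry holds until the next one and the last holds indefinitely. That representation, the function name `earliest_start`, and the sample numbers are assumptions for illustration; the slides do not specify the format.

```python
import math


def earliest_start(view, nodes, duration):
    """Earliest time at which `nodes` are available for `duration`, or None.

    `view` is a hypothetical step function: a list of (time, available_nodes)
    pairs sorted by time; the last entry is assumed to hold forever.
    """
    candidate = None  # start of the current run of steps with enough nodes
    for i, (t, avail) in enumerate(view):
        step_end = view[i + 1][0] if i + 1 < len(view) else math.inf
        if avail >= nodes:
            if candidate is None:
                candidate = t
            if step_end - candidate >= duration:
                return candidate
        else:
            candidate = None  # the run is broken; start over
    return None


# Made-up view in minutes: 4 nodes, then 10, a dip to 2, then 12.
view = [(0, 4), (30, 10), (90, 2), (120, 12)]
print(earliest_start(view, nodes=8, duration=60))
print(earliest_start(view, nodes=8, duration=61))
```

With this sample view, the first call fits exactly into the 60-minute window that opens at minute 30, while the second has to wait until the dip is over.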

  18. Scheduling Algorithm
      Pre-allocations and non-preemptible requests
      ◮ Conservative Back-Filling (CBF)
      Preemptible requests
      ◮ equi-partitioning
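
The slide names equi-partitioning for preemptible requests without giving details. The sketch below shows one common reading of the term: every application gets an equal share, and whatever a small request leaves unused is redistributed among the others. The redistribution rule, the tie-breaking, and the function name `equi_partition` are assumptions, not the scheduler's documented behaviour; CBF is not sketched here.

```python
def equi_partition(total_nodes, requested):
    """Split `total_nodes` among applications as evenly as their requests allow.

    `requested` maps an application name to the node count it asks for.
    Hypothetical policy: small requests are satisfied in full, and the
    remaining nodes are shared equally among the applications still waiting.
    """
    if total_nodes < 0 or any(n < 0 for n in requested.values()):
        raise ValueError("node counts must be non-negative")
    allocation = {app: 0 for app in requested}
    remaining = total_nodes
    # Smallest requests first; sorted() is stable, so ties keep input order.
    waiting = sorted(requested, key=lambda app: requested[app])
    while waiting and remaining > 0:
        share = remaining // len(waiting)
        smallest = waiting[0]
        if requested[smallest] <= share:
            # Fits inside the fair share: grant it fully and recompute.
            allocation[smallest] = requested[smallest]
            remaining -= requested[smallest]
            waiting.pop(0)
        else:
            # Everyone left wants more than the share: split what remains,
            # handing the leftover nodes out one by one.
            extra = remaining - share * len(waiting)
            for i, app in enumerate(waiting):
                allocation[app] = share + (1 if i < extra else 0)
            remaining = 0
            waiting = []
    return allocation


print(equi_partition(12, {"a": 2, "b": 10, "c": 10}))
```

In the example, application `a` asks for less than its share of 4, so the two nodes it leaves are split between `b` and `c`.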

  20. Non-predictably Evolving: Adaptive Mesh Refinement
      Application Model
      ◮ Application knows its speed-up model
      ◮ Cannot predict its data evolution
      ◮ Aim: maintain a given target efficiency
      Behaviour in CooRM v2
      ◮ Sends one pre-allocation
        ◮ Simulation parameter: overcommitFactor
      ◮ Sends non-preemptible requests inside the pre-allocation
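
The behaviour on this slide can be sketched as two steps: size one pre-allocation up front, then, each time the data size changes, request the largest node count that still meets the target efficiency without leaving the pre-allocation. Everything concrete below is an assumption made for illustration: the Amdahl-like speed-up model, the reading of `overcommitFactor` as a multiplier on a user-supplied node estimate, and all function names. The slides define none of these.

```python
import math


def toy_speedup(nodes, data_size):
    """Toy speed-up model (an assumption): serial fraction shrinks as data grows."""
    serial = 1.0 / data_size  # requires data_size >= 1
    return 1.0 / (serial + (1.0 - serial) / nodes)


def pre_allocation_size(estimated_nodes, overcommit_factor):
    """Hypothetical sizing rule: scale the user's estimate by overcommitFactor."""
    return math.ceil(estimated_nodes * overcommit_factor)


def nodes_for_target_efficiency(speedup, data_size, target, max_nodes):
    """Largest n <= max_nodes whose efficiency speedup(n)/n is at least `target`."""
    best = 1
    for n in range(1, max_nodes + 1):
        if speedup(n, data_size) / n >= target:
            best = n
    return best


pre_allocated = pre_allocation_size(estimated_nodes=20, overcommit_factor=1.5)
for size in (8, 32, 128):  # made-up data sizes as the mesh is refined
    n = nodes_for_target_efficiency(toy_speedup, size, 0.75, pre_allocated)
    print(f"data size {size}: request {n} of {pre_allocated} pre-allocated nodes")
```

Capping the search at the pre-allocation size reflects the guarantee stated earlier in the deck: an update is only certain to succeed inside a pre-allocation.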

  21. Malleable: Parameter-Sweep Application
      Application Model
      ◮ Infinite number of single-node tasks
      ◮ All tasks have the same duration (known in advance)
      ◮ Aim: maximize speed-up
      Behaviour in CooRM v2
      ◮ Send preemptible requests
      ◮ Spawn tasks if resources are available
      ◮ Kill tasks if RMS asks to (increases waste)
      ◮ Stop tasks if resources will not be available (no waste)
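
Because every task has the same known duration, the application can avoid waste by only spawning tasks that will finish before the preemptible view shrinks. The sketch below assumes the view is a sorted list of `(time, available_nodes)` steps starting at or before the current time, and it conservatively treats all busy nodes as occupied for the whole window. The representation and the names `min_available` and `tasks_to_spawn` are hypothetical.

```python
def min_available(view, start, end):
    """Minimum node count of a step-function view over the window [start, end)."""
    lowest = None
    for i, (t, avail) in enumerate(view):
        step_end = view[i + 1][0] if i + 1 < len(view) else float("inf")
        if step_end > start and t < end:  # this step overlaps the window
            lowest = avail if lowest is None else min(lowest, avail)
    return 0 if lowest is None else lowest


def tasks_to_spawn(view, now, task_duration, busy_nodes):
    """Number of single-node tasks that can start now and run to completion.

    Conservative simplification: nodes already busy are assumed to stay busy
    until the end of the window.
    """
    safe_nodes = min_available(view, now, now + task_duration)
    return max(0, safe_nodes - busy_nodes)


# Made-up preemptible view in minutes: 10 nodes, a dip to 4, then 12.
view = [(0, 10), (50, 4), (100, 12)]
print(tasks_to_spawn(view, now=0, task_duration=30, busy_nodes=0))
print(tasks_to_spawn(view, now=30, task_duration=30, busy_nodes=6))
```

A task spawned at minute 30 would still be running when the view drops to 4 nodes, so the second call declines to spawn rather than risk a kill.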
