PMIx: Process Management for Exascale Environments




  1. PMIx: Process Management for Exascale Environments
  Ralph H. Castain, David Solt, Joshua Hursey, Aurelien Bouteiller
  EuroMPI/USA 2017, Chicago, IL

  2. What is PMIx?
  [Timeline, 2015-2017: PMI-1 and PMI-2 (supported by SLURM, ALPS, and other RMs, with MPICH and OMPI clients) give way to PMIx v1.2 and then PMIx v2.x (SLURM and JSM, with OMPI, Spectrum MPI, OSHMEM, SOS, PGAS libraries, and others). Drivers as the years go by: wireup support, exascale systems on the horizon, dynamic spawn, key-value publish/lookup, long launch times, new paradigms, and orchestration; launch-time targets of exascale launch in < 30 s, then < 10 s.]

  3. Three Distinct Entities
  • PMIx Standard
    § Defined set of APIs and attribute strings
    § Says nothing about implementation
  • PMIx Reference Library
    § A full-featured implementation of the Standard
    § Intended to ease adoption
  • PMIx Reference Server
    § A full-featured "shim" that brings PMIx support to a non-PMIx RM

  4. The Community
  https://pmix.github.io/pmix
  https://github.com/pmix

  5. Traditional Launch Sequence
  [Diagram: the WLM schedules the job and runs the job script; the RM spawns processes via the launch command; processes wait on the FS for files and libs and query the fabric/NIC for topology; a global barrier exchange (wireup) must complete before GO.]

  6. Newer Launch Sequence
  [Diagram: the same flow, but a proxy (PMIx server) on each node handles the fabric/NIC/topology queries and data exchange on the processes' behalf, shrinking the global barrier exchange before GO.]
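The speed-up in the newer sequence comes from replacing the all-ranks wireup exchange with lazy retrieval. The following toy model (Python, not the real PMIx C API; names like NodeServer are illustrative) sketches the idea: each rank publishes only its own endpoint at launch, and a peer's endpoint is fetched on demand at first contact instead of inside a global barrier.

```python
class NodeServer:
    """Stands in for the per-node PMIx server (the 'proxy' in the slide)."""

    def __init__(self):
        self.store = {}

    def put(self, rank, key, value):
        self.store[(rank, key)] = value

    def get(self, rank, key):
        return self.store.get((rank, key))


server = NodeServer()

# At launch, each rank publishes only its own endpoint; no barrier needed.
for rank in range(4):
    server.put(rank, "endpoint", f"nic-addr-{rank}")

# Direct-fetch: rank 0 retrieves rank 3's endpoint only when it first
# needs to communicate with it, so wireup cost is paid on demand.
assert server.get(3, "endpoint") == "nic-addr-3"
```

In the traditional flow, every rank would wait for every other rank's data before GO; here, a rank that never contacts a peer never pays for that peer's data.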

  7. PMIx-SMS Interactions
  [Diagram: the application (job script, MPI, OpenMP, tools) links against the PMIx client, which talks to the local PMIx server; the server exchanges requests and responses with the system management stack: RM, orchestration manager, FS, fabric manager, NIC, and RAS. Tool support connects through the same server.]

  8. PMIx Launch Sequence
  *RM daemon, mpirun daemon, etc.

  9. PMIx/SLURM* Performance (papers coming in 2018!)
  [Chart: MPI_Init time (sec) vs. number of nodes, comparing the PMIx Reference Server** against srun/PMI2.]
  *LANL/Buffy cluster, 1 ppn   **PMIx Reference Server v2.0, direct-fetch/async

  10. Similar Requirements
  • Notifications/response
    § Errors, resource changes
    § Negotiated response
  • Request allocation changes (difficult for the RM community to support)
    § Shrink/expand
  • Workflow management
    § Steered/conditional execution
  • QoS requests
    § Power, file system, fabric
  Open question: multiple use-specific libraries, or a single multi-purpose library?

  11. PMIx "Standards" Process
  • Modifications/additions
    § Proposed as an RFC, including a prototype implementation
    § Submitted as a pull request to the reference library, with notification sent to the mailing list
  • Reviews conducted
    § Cover both the RFC and the implementation
    § Continue until consensus emerges
  • Approval given
    § On the weekly developer telecon
  (A formal standards document is under development.)

  12. Philosophy
  • Generalized APIs
    § Few hard parameters
    § "Info" arrays to pass information and specify directives
  • Easily extended
    § Add "keys" instead of modifying the API
  • Async operations
  • Thread safe
  • The SMS always has the right to say "not supported"
    § Allows each backend to evaluate what to support, and when
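The "few hard parameters plus an info array" pattern can be sketched as follows. This is an illustrative Python toy, not the real C signatures; only the status names PMIX_SUCCESS and PMIX_ERR_NOT_SUPPORTED echo the library, while the query() helper and its keys are hypothetical. New capabilities become new keys rather than new function arguments, and the backend is free to decline any key it does not implement.

```python
# Directive keys this (hypothetical) backend chooses to implement.
SUPPORTED_KEYS = {"timeout", "nonblocking"}


def query(what, info=()):
    """One hard parameter; everything else rides in the info array.

    Returns (status, result). Unknown directive keys are refused with
    a 'not supported' status rather than breaking the API.
    """
    directives = dict(info)
    unsupported = set(directives) - SUPPORTED_KEYS
    if unsupported:
        return "PMIX_ERR_NOT_SUPPORTED", sorted(unsupported)
    return "PMIX_SUCCESS", {"what": what, **directives}


status, result = query("job.status", info=[("timeout", 10)])
assert status == "PMIX_SUCCESS"

# A directive this backend does not implement is declined, not fatal.
status, refused = query("job.status", info=[("qos.power", "low")])
assert status == "PMIX_ERR_NOT_SUPPORTED"
```

The design choice is that the API surface stays frozen while the vocabulary of keys grows, so old clients and new servers (and vice versa) keep interoperating.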

  13. Messenger, not Doer
  [Diagram: PMIx sits between the SMS, the application, and tools, relaying requests rather than performing the work itself. The philosophy bullets from the previous slide are repeated.]

  14. Current Support
  • Typical startup operations
    § Put, get, commit, barrier, spawn, [dis]connect, publish/lookup
  • Tool connections
    § Debugger, job submission, query
  • Generalized query support
    § Job status, layout, system data, resource availability
  • Event notification
    § App- and system-generated; subscribe, chained
    § Pre-emption, failures, timeout warning, …
  • Logging (job record)
    § Status reports, error output
  • Flexible allocations
    § Release resources, request resources
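The put/commit/barrier/get startup pattern above can be modeled in a few lines. This is a minimal Python sketch of the semantics, not the real C API (the library's actual calls are PMIx_Put, PMIx_Commit, PMIx_Fence, and PMIx_Get); the Client class and the shared dict standing in for the server datastore are illustrative. Data staged with put() stays local until commit() pushes it to the server, and a fence/barrier is where peers synchronize before reading each other's data.

```python
class Client:
    """Toy PMIx-style client: stages data locally, publishes on commit."""

    def __init__(self, rank, global_store):
        self.rank = rank
        self.pending = {}            # staged key-value pairs, not yet visible
        self.store = global_store    # stands in for the server datastore

    def put(self, key, value):
        self.pending[key] = value    # staged locally only

    def commit(self):
        for k, v in self.pending.items():
            self.store[(self.rank, k)] = v   # now visible to peers
        self.pending.clear()

    def get(self, rank, key):
        return self.store.get((rank, key))


store = {}
a, b = Client(0, store), Client(1, store)
a.put("endpoint", "nic-0")
b.put("endpoint", "nic-1")

assert a.get(1, "endpoint") is None   # not visible before commit
a.commit()
b.commit()                            # a fence/barrier would synchronize here
assert a.get(1, "endpoint") == "nic-1"
```

Separating put from commit lets a process batch many keys and pay the publication cost once, which matters at exascale process counts.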

  15. Event Notification Use Case
  • Fault detection and reporting with ULFM MPI
    § ULFM MPI is a fault-tolerant flavor of Open MPI
  • Failures may be detected by the SMS, the RAS, or directly by MPI communications
  • Components produce a PMIx event when they detect an error
  • Fault-tolerant components register for the fault event
  • Components propagate fault events, which are then delivered to registered clients
  [Diagram: MPI processes, PMIx servers, and the RAS exchanging fault events.]
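The register-then-propagate flow above amounts to a publish/subscribe pattern. The sketch below is a Python toy, not the library's C interface (the real entry points are along the lines of PMIx_Register_event_handler and PMIx_Notify_event); the EventBus class and the "PROC_FAILED" code are hypothetical stand-ins.

```python
class EventBus:
    """Toy stand-in for the PMIx event notification machinery."""

    def __init__(self):
        self.handlers = {}   # event code -> list of registered callbacks

    def register(self, code, handler):
        self.handlers.setdefault(code, []).append(handler)

    def notify(self, code, payload):
        # Deliver the event to every component registered for that code.
        for handler in self.handlers.get(code, []):
            handler(payload)


bus = EventBus()
seen = []

# A fault-tolerant component subscribes to process-failure events.
bus.register("PROC_FAILED", lambda info: seen.append(info["rank"]))

# The RAS, the SMS, or the MPI layer detects a failure and raises the event;
# it is then delivered to every registered client.
bus.notify("PROC_FAILED", {"rank": 7})
assert seen == [7]
```

The point of routing faults through one event channel is that the detector (RAS, SMS, or MPI) does not need to know who consumes the event; any fault-tolerant component can subscribe.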

  16. In the Pipeline
  • Network support
    § Security keys, pre-spawn local driver setup, fabric topology and status, traffic reports, fabric manager interaction
  • Debugger/tool support
    § Automatic rendezvous; single interface to all launchers; co-launched daemons; access to fabric info, etc.
  • Job control
    § Pause, kill, signal, heartbeat, resilience support
  • File system support
    § Dependency detection, tiered-storage caching strategies, ++
  • Obsolescence protection
    § Automatic cross-version compatibility, container support
  • Cross-library interoperation
  • Generalized data store

  17. Summary
  We now have an interface library, supported by RMs, for application-directed requests. We now need to collaboratively define what we want to do with it.
  Project: https://pmix.github.io/pmix
  Reference implementation: https://github.com/pmix/pmix
  Reference server: https://github.com/pmix/pmix-reference-server
