PMIx: Process Management for Exascale Environments
Ralph H. Castain, David Solt, Joshua Hursey, Aurelien Bouteiller
EuroMPI/USA 2017, Chicago, IL
What is PMIx?
[Timeline, 2015-2017: process-management interfaces and the RMs/libraries adopting them]
• PMI-1 / PMI-2 (SLURM, ALPS, others): MPICH wireup support; key/value publish/lookup; launch times grew long as the years went by, with exascale systems on the horizon
• PMIx v1.2 (SLURM, JSM, others): adopted by OMPI, Spectrum, OSHMEM; dynamic spawn; exascale launch in < 30s
• PMIx v2.x (SLURM, JSM, others): new paradigms (SOS, PGAS, others); orchestration; exascale launch in < 10s
Three Distinct Entities • PMIx Standard § Defined set of APIs, attribute strings § Nothing about implementation • PMIx Reference Library § A full-featured implementation of the Standard § Intended to ease adoption • PMIx Reference Server § Full-featured “shim” to a non-PMIx RM
The Community https://pmix.github.io/pmix https://github.com/pmix
Traditional Launch Sequence
[Diagram: WLM runs the job script → launch cmd → RM spawns procs on every node → procs wait for files (FS & libs), discover fabric/NIC/topology → global barrier exchange → GO]
Newer Launch Sequence
[Diagram: as before, but the RM spawns per-node proxies that in turn spawn and wire up the procs → global barrier exchange → GO]
PMIx-SMS Interactions
[Diagram: the app's PMIx client passes requests to its local PMIx server, which relays them to the System Management Stack (FS, fabric manager, RM/orchestration, RAS) and returns the responses; OpenMP, MPI, job scripts, and tools use the same path]
PMIx Launch Sequence
[Diagram: server*-mediated launch]  *server = RM daemon, mpirun-daemon, etc.
PMIx/SLURM* Performance (papers coming in 2018!)
[Chart: MPI_Init time (sec) vs. #nodes, PRS** vs. srun/PMI2]
*LANL/Buffy cluster, 1 ppn   **PMIx Reference Server v2.0, direct-fetch/async
Similar Requirements
• Notifications/response
  § Errors, resource changes
  § Negotiated response
• Request allocation changes
  § Shrink/expand (difficult for the RM community to support)
• Workflow management
  § Steered/conditional execution
• QoS requests
  § Power, file system, fabric
Multiple use-specific libs, or a single multi-purpose lib?
PMIx “Standards” Process (Standards Doc under development!)
• Modifications/additions
  § Proposed as RFC
  § Include prototype implementation
• Pull request to reference library
  § Notification sent to mailing list
• Reviews conducted
  § RFC and implementation
  § Continues until consensus emerges
• Approval given
  § Developer telecon (weekly)
Philosophy
• Generalized APIs
  § Few hard parameters
  § “Info” arrays to pass information and specify directives
• Easily extended
  § Add “keys” instead of modifying the API
• Async operations
• Thread safe
• SMS always has the right to say “not supported”
  § Each backend evaluates what to support, and when
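The "info array" idiom above can be sketched as follows, assuming the PMIx v2 client API (pmix.h); this is illustrative only, and the timeout directive is just one example attribute:

```c
/* Sketch: directives travel as key/value attributes in pmix_info_t arrays,
   not as hard API parameters. Assumes the PMIx v2 C API (requires libpmix). */
#include <pmix.h>
#include <stdio.h>

int main(void)
{
    pmix_proc_t myproc, peer;
    pmix_info_t info[1];
    pmix_value_t *val = NULL;
    int timeout = 10;

    if (PMIX_SUCCESS != PMIx_Init(&myproc, NULL, 0)) {
        return 1;
    }

    /* Ask the server to wait up to 10s for the data instead of adding a
       "timeout" parameter to PMIx_Get itself. */
    PMIX_INFO_CONSTRUCT(&info[0]);
    PMIX_INFO_LOAD(&info[0], PMIX_TIMEOUT, &timeout, PMIX_INT);

    peer = myproc;
    peer.rank = 0;
    if (PMIX_SUCCESS == PMIx_Get(&peer, PMIX_LOCAL_RANK, info, 1, &val)) {
        printf("rank 0 local rank: %u\n", (unsigned)val->data.uint16);
        PMIX_VALUE_RELEASE(val);
    }
    /* A backend that does not honor an attribute can simply return
       PMIX_ERR_NOT_SUPPORTED -- new behavior arrives as new keys. */
    PMIX_INFO_DESTRUCT(&info[0]);
    PMIx_Finalize(NULL, 0);
    return 0;
}
```

This is why extending PMIx rarely changes function signatures: a new capability is a new attribute key, and an SMS that does not recognize it declines cleanly.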
Messenger not Doer
[Diagram: PMIx sits between the APP, the SMS, and tools, relaying requests and responses; it transports directives rather than executing them. Same design principles as the previous slide.]
Current Support
• Typical startup operations
  § Put, get, commit, barrier, spawn, [dis]connect, publish/lookup
• Tool connections
  § Debugger, job submission, query
• Generalized query support
  § Job status, layout, system data, resource availability
• Event notification
  § App, system generated
  § Subscribe, chained
  § Pre-emption, failures, timeout warning, …
• Logging (job record)
  § Status reports, error output
• Flexible allocations
  § Release resources, request resources
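The startup operations listed above (put, commit, barrier, get) form the classic wireup exchange. A minimal sketch, assuming the PMIx v2 C API; the "my-endpoint" key and endpoint string are made-up placeholders for illustration:

```c
/* Sketch: business-card exchange via put/commit/fence/get.
   Assumes the PMIx v2 C API (requires libpmix). */
#include <pmix.h>
#include <string.h>

int main(void)
{
    pmix_proc_t myproc, peer;
    pmix_value_t val, *result = NULL;

    if (PMIX_SUCCESS != PMIx_Init(&myproc, NULL, 0)) {
        return 1;
    }

    /* Put: stage our connection info (placeholder key and value) */
    val.type = PMIX_STRING;
    val.data.string = "nic0:port:1234";
    PMIx_Put(PMIX_GLOBAL, "my-endpoint", &val);

    /* Commit: push staged values to the local PMIx server */
    PMIx_Commit();

    /* Barrier: fence across the namespace, exchanging posted data
       (NULL procs => all procs in our namespace) */
    PMIx_Fence(NULL, 0, NULL, 0);

    /* Get: fetch a peer's endpoint by (namespace, rank, key) */
    PMIX_PROC_CONSTRUCT(&peer);
    (void)strncpy(peer.nspace, myproc.nspace, PMIX_MAX_NSLEN);
    peer.rank = 0;
    if (PMIX_SUCCESS == PMIx_Get(&peer, "my-endpoint", NULL, 0, &result)) {
        /* result->data.string now holds rank 0's endpoint */
        PMIX_VALUE_RELEASE(result);
    }
    PMIx_Finalize(NULL, 0);
    return 0;
}
```

The "direct-fetch/async" mode cited on the performance slide avoids the full data exchange at the fence and resolves each PMIx_Get on demand instead.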
Event Notification Use Case: fault detection and reporting with ULFM MPI
[Diagram: MPI procs, their PMIx servers, and the RAS exchange fault events]
• ULFM MPI is a fault-tolerant flavor of Open MPI
• Failures may be detected by the SMS, the RAS, or directly by MPI communications
• Components produce a PMIx event when they detect an error
• Fault-tolerant components register for the fault event
• Components propagate fault events, which are then delivered to registered clients
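Registering for such a fault event can be sketched as below, assuming the PMIx v2 C API; the choice of PMIX_ERR_PROC_ABORTED as the status code of interest is illustrative:

```c
/* Sketch: subscribing to process-failure events.
   Assumes the PMIx v2 C API (requires libpmix). */
#include <pmix.h>
#include <stdio.h>

/* Invoked when the SMS (or a peer) notifies a registered event */
static void fault_handler(size_t evhdlr_registration_id,
                          pmix_status_t status,
                          const pmix_proc_t *source,
                          pmix_info_t info[], size_t ninfo,
                          pmix_info_t results[], size_t nresults,
                          pmix_event_notification_cbfunc_fn_t cbfunc,
                          void *cbdata)
{
    fprintf(stderr, "fault event %s from %s:%u\n",
            PMIx_Error_string(status), source->nspace, source->rank);
    /* Tell the chain we are done; the next registered handler
       (if any) is then allowed to run. */
    if (NULL != cbfunc) {
        cbfunc(PMIX_EVENT_ACTION_COMPLETE, NULL, 0, NULL, NULL, cbdata);
    }
}

int main(void)
{
    pmix_proc_t myproc;
    pmix_status_t code = PMIX_ERR_PROC_ABORTED;  /* event of interest */

    if (PMIX_SUCCESS != PMIx_Init(&myproc, NULL, 0)) {
        return 1;
    }
    /* Registration is async; passing NULL for the confirmation callback
       means we do not wait for acknowledgement */
    PMIx_Register_event_handler(&code, 1, NULL, 0,
                                fault_handler, NULL, NULL);
    /* ... application runs; handler fires on notification ... */
    PMIx_Finalize(NULL, 0);
    return 0;
}
```

The chained-handler completion call is what makes "subscribe, chained" notification from the Current Support slide work: each handler signals how the event should continue to propagate.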
In Pipeline
• Network support
  § Security keys, pre-spawn local driver setup, fabric topology and status, traffic reports, fabric manager interaction
• File system support
  § Dependency detection
  § Tiered storage caching strategies, ++
• Debugger/tool support
  § Automatic rendezvous
  § Single interface to all launchers
  § Co-launch daemons
  § Access fabric info, etc.
• Obsolescence protection
  § Automatic cross-version compatibility
  § Container support
• Cross-library interoperation
• Job control
  § Pause, kill, signal, heartbeat, resilience support
• Generalized data store
Summary
We now have an interface library that RMs will support for application-directed requests. We now need to collaboratively define what we want to do with it.
Project: https://pmix.github.io/pmix
Reference Implementation: https://github.com/pmix/pmix
Reference Server: https://github.com/pmix/pmix-reference-server