Lightweight MPI Communicators with Applications to Perfectly Balanced Quicksort
Michael Axtmann, Peter Sanders, Armin Wiebigke
IPDPS · May 22, 2018
Institute of Theoretical Informatics, Karlsruhe Institute of Technology
KIT – The Research University in the Helmholtz Association · www.kit.edu

Overview
- Communicators and communication
- Disadvantages of communicator construction
- Solutions for MPI
- RBC communicators
- Case study on sorting

Communicators in MPI
[Figure: MPI_COMM_WORLD and a subcommunicator with rank mapping 0 → 3; 1 → 4; 2 → 1]
[Figure: timelines on PEs 0 to 2 for four kinds of communication]
- Blocking point-to-point: Send, Receive
- Nonblocking point-to-point: ISend, then compute and Test; IReceive and Test
- Blocking collective: Scan
- Nonblocking collective: IScan, then compute and Test

MPI Examples
[Figure: communication over rows and columns of a PE grid; divide and conquer over PEs 0 to 7]
Usage of communicators
- Divide tasks into fine-grained subproblems
- Elegant algorithms and comfortable programming
Communicators make life easier at no cost!?

Current Implementations: OpenMPI and MPICH
PE group
- Mapping from PE ID to process ID required (example subcommunicator: 0 → 3; 1 → 4; 2 → 1)
- Explicit representation as table
- Communicator creation takes time linear in the communicator size
Context ID
- Separates communication between communicators
- Part of each message
- Unique for all PEs of the PE group
- Blocking Allgather operation on context ID mask
- Communicator creation is a blocking collective operation
[Figure: subcommunicators with context ID 0 and context ID 1]

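The linear creation cost follows directly from the explicit table. A minimal C++ sketch, assuming a simplified group that stores one process ID per rank; `TableGroup` and `make_subgroup` are hypothetical names for illustration and do not reflect the internal data structures of OpenMPI or MPICH:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an explicitly stored PE group: entry i holds the
// process ID behind communicator rank i (not the real OpenMPI/MPICH layout).
struct TableGroup {
    std::vector<int> process_of_rank;

    int to_process(int rank) const {
        assert(rank >= 0 && static_cast<std::size_t>(rank) < process_of_rank.size());
        return process_of_rank[static_cast<std::size_t>(rank)];
    }
};

// Building the table touches every member once, so both the work and the
// memory grow linearly with the size of the new communicator.
TableGroup make_subgroup(const TableGroup& parent, const std::vector<int>& parent_ranks) {
    TableGroup child;
    child.process_of_rank.reserve(parent_ranks.size());
    for (int r : parent_ranks) {
        child.process_of_rank.push_back(parent.to_process(r));
    }
    return child;
}
```

With a six-process world `TableGroup{{0, 1, 2, 3, 4, 5}}`, the call `make_subgroup(world, {3, 4, 1})` reproduces the mapping 0 → 3; 1 → 4; 2 → 1 from the slide.
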
Blocking Communicator Creation
“... nonblocking collective operations can mitigate possible synchronizing effects ...”
“... enabling communication-computation overlap ...”
“... perform collective operations on overlapping communicators, which would lead to deadlocks with blocking operations.”
– MPI Standard
A collective operation is invoked by all PEs of a communicator
BUT: Communicator creation breaks the nonblocking idea
[Figure: timelines on PEs 0 to 2 comparing a nonblocking collective (IScan, compute and Test) with a nonblocking collective that needs communicator creation first (communicator creation, IScan, Test)]

Communicator Construction
Splitting a communicator into two communicators of half the size
[Figure: running time / comm size [µs] (0 to 0.6) over comm size (2^10 to 2^15 cores) on SuperMUC; curves: IBM MPI_Comm_create_group, IBM MPI_Comm_split, Intel MPI_Comm_create_group, Intel MPI_Comm_split]
Communicator construction time is linear in the PE group size

Communicator Construction
Splitting 2^15 PEs into two communicators of size 2^14, compared with a collective operation on 2^14 cores
[Figure: running time [ms] (10^-1 to 10^2) over message length [B] (2^3 to 2^21) on SuperMUC, 32 768 cores, IBM MPI; curves: MPI_Reduce, MPI_Exscan, MPI_Comm_split]
Communicator construction is expensive compared to collectives

Communicator Construction
Splitting a communicator into overlapping communicators of size four
[Figure: two creation patterns, alternating and cascading, drawn as time over PEs 0 to 10; each PE group invokes MPI_Comm_create_group]
[Figure: running time [ms] (10^-1 to 10^3) over comm size (2^9 to 2^13 cores) on SuperMUC, Intel MPI; curves: Alternating, Cascade]
Blocking communicator creation causes delays

Proposals for MPI
PE group
- Sparse representations, e.g. MPI_Group_range_incl (first=1, last=4, stride=2)
Context ID
- User-defined tag
- Calculated by MPI: concatenation of counters
[Figure: context IDs derived from MPI_COMM_WORLD: {0}, {1}, {2}, {3}; then {0,0}, {0,1}, {2,0}, {2,1}, {2,2}]

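The counter-based context ID can be pictured as a path from MPI_COMM_WORLD down to the communicator. The following C++ sketch is one possible reading of "concatenation of counters", not a specification: `ContextId` and `next_child` are hypothetical names, and how the counters are kept consistent across the PEs of a group is not stated on the slide and is ignored here.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of the proposal: a context ID is the sequence of
// counters on the path from MPI_COMM_WORLD (empty path) to the communicator.
struct ContextId {
    std::vector<int> path;     // e.g. {2, 1} for the second child of child 2
    int children_created = 0;  // local counter, advanced on every creation

    // Derive the ID of the next child locally, without any communication.
    ContextId next_child() {
        ContextId child;
        child.path = path;
        child.path.push_back(children_created++);
        return child;
    }
};
```

Calling `next_child()` three times on the world ID yields {0}, {1}, {2}; three further calls on {2} yield {2,0}, {2,1}, {2,2}, matching the IDs in the figure.
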
Our RBC library
- Range-based communicator in O(1) time
- Local construction
- Select MPI or RBC operations
- Local splitting: Split_RBC_Comm(Comm&, Comm&, int first, int last, int stride)
- Only adjust range
rbc::Comm
+ parent comm : MPI_Comm
+ first : int
+ last : int
+ stride : int
[Figure: initial MPI communicator with PEs 0 to 8 and the range first=1, last=7, stride=3]
Blocking Ops: rbc::Bcast, rbc::Reduce, rbc::Allreduce, rbc::Scan, rbc::Gather, rbc::Gatherv, rbc::Barrier, rbc::Send, rbc::Recv, rbc::Probe, rbc::Wait, rbc::Waitall
Nonblocking Ops: rbc::Ibcast, rbc::Ireduce, rbc::Iallreduce, rbc::Iscan, rbc::Igather, rbc::Igatherv, rbc::Ibarrier, rbc::Isend, rbc::Irecv, rbc::Iprobe, rbc::Test
Classes: rbc::Request, rbc::Comm
Local Ops: rbc::Create_RBC_Comm, rbc::Split_RBC_Comm, rbc::Comm_rank, rbc::Comm_size

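Why splitting needs no communication can be seen from a small model of the range triple. This is a sketch under assumptions, not the RBC implementation: `RangeComm` and `split` are hypothetical names, the parent MPI handle is left out, and `first` and `last` passed to `split` are taken to be ranks within the communicator being split.

```cpp
#include <cassert>

// Hypothetical stand-in for rbc::Comm: a range of ranks of a parent
// communicator (the parent handle itself is omitted in this sketch).
struct RangeComm {
    int first;   // parent rank of the first member
    int last;    // parent rank of the last member
    int stride;  // distance between consecutive members in the parent

    int size() const { return (last - first) / stride + 1; }

    // Translate a rank of this communicator to a rank of the parent.
    int to_parent(int rank) const {
        assert(rank >= 0 && rank < size());
        return first + rank * stride;
    }

    // Translate a parent rank back, or return -1 if it is not a member.
    int from_parent(int parent_rank) const {
        if (parent_rank < first || parent_rank > last) return -1;
        if ((parent_rank - first) % stride != 0) return -1;
        return (parent_rank - first) / stride;
    }
};

// Local split: select ranks first, first + stride, ... up to last of `comm`.
// Only three integers are computed, independent of the group size.
RangeComm split(const RangeComm& comm, int first, int last, int stride) {
    assert(stride > 0 && first >= 0 && first <= last && last < comm.size());
    int members = (last - first) / stride + 1;
    int last_member = first + (members - 1) * stride;
    return RangeComm{comm.to_parent(first), comm.to_parent(last_member),
                     comm.stride * stride};
}
```

For a nine-PE communicator `RangeComm{0, 8, 1}`, `split(world, 1, 7, 3)` gives the range first=1, last=7, stride=3 from the slide, that is, ranks 1, 4 and 7. Splitting that result again multiplies the strides, so nested ranges still refer to the original parent.
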
Our RBC library: Implementation Details
(Non)blocking point-to-point communication
- Maps rank to rank of MPI communicator
- Calls MPI counterpart
(Non)blocking collective operations
- Call point-to-point operations of RBC
- One globally reserved tag
Nonblocking details
- Optional user-defined tag
- Round-based schedule
rbc::Ibcast(void *buff, int cnt, MPI_Datatype datatype, int root, rbc::Comm comm, rbc::Request *request, int tag = RBC_IBCAST_TAG)
[Figure: broadcast over time on PEs 0 to 3]

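A round-based broadcast built from point-to-point messages could look as follows. The slide does not name the schedule, so the binomial tree used here is an assumption, and `Message` and `bcast_rounds` are hypothetical names; the sketch only computes which (sender, receiver) pairs would communicate in each round and performs no MPI calls.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// One (sender, receiver) pair in communicator ranks.
using Message = std::pair<int, int>;

// Assumed schedule: a binomial tree rooted at `root`. In the round with
// distance d = 1, 2, 4, ..., every PE that already holds the data sends it
// to the PE d positions further on.
std::vector<std::vector<Message>> bcast_rounds(int size, int root) {
    assert(size > 0 && root >= 0 && root < size);
    std::vector<std::vector<Message>> rounds;
    for (int dist = 1; dist < size; dist *= 2) {
        std::vector<Message> round;
        for (int rel = 0; rel < dist && rel + dist < size; ++rel) {
            // Ranks are taken relative to the root and wrapped around.
            round.emplace_back((rel + root) % size, (rel + dist + root) % size);
        }
        rounds.push_back(round);
    }
    return rounds;
}
```

For four PEs and root 0 this gives two rounds: 0 sends to 1, then 0 sends to 2 while 1 sends to 3. In a nonblocking variant, each pair would correspond to one point-to-point send and receive, and a test call could advance the operation to the next round once the current round's messages have completed.
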
RBC vs. MPI
Splitting a communicator into two communicators of half the size
[Figure: running time / comm size [µs] (0 to 0.6) over comm size (2^10 to 2^15 cores) on SuperMUC; curves: IBM MPI_Comm_create_group, IBM MPI_Comm_split, Intel MPI_Comm_create_group, Intel MPI_Comm_split, RBC rbc::Split_RBC_Comm]
RBC splitting comes with almost no cost