Lecture 7: Message Passing Programming using MPI

MPI - Message Passing Interface
• MPI is the most widely used message-passing standard
  – By using MPI you obtain portable programs
• MPI is based on earlier message-passing libraries
  – NX, MPL, PVM, P4, TCGMSG, PARMACS, ...
• MPI is a standard that specifies:
  – a programming interface to message passing
  – the semantics of the communication routines in the standard
• MPI 1.1 – 128 routines callable from FORTRAN, C, C++, Ada, ...
• The latest version is MPI 2.0 (www.mpi-forum.org)
• Many implementations exist: MPICH, MPICH-G2, ScaMPI, etc.

MPI-Programming
• Include mpi.h (C)
• All MPI programs have to begin with
    err = MPI_Init(&argc, &argv);
• All MPI programs end with
    err = MPI_Finalize();
  – cleans up, removes internal MPI structures, etc.
• Note:
  – routines not belonging to MPI are local
  – e.g., printf is run by all processes

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        printf("Hello world\n");
        MPI_Finalize();
        return 0;
    }

• On sarek: module add mpich/pgi, compile with mpicc
• On seth: compile with gcc -I/opt/scali/include -L/opt/scali/lib -lmpi
• See the quickstart guides

The MPI Architecture
• SPMD: Single Program Multiple Data
  – Given P processors, run the same program on every processor
• Data types
  – Data is described in a standardized way in MPI
• Communicators
  – An abstraction for choosing the participants in a collection of communications
• Pair-wise communication
  – One participant sends and the other receives
  – Does not interrupt the remaining communications
• Collective communication
  – Reductions, broadcasts, etc.

MPI World
[Figure: eight processes, ranks 0-7, arranged in a ring]
• All processes calling MPI_Init define MPI_COMM_WORLD
• All MPI communication demands a communicator
  – MPI_COMM_WORLD – predefined world composed of all processes taking part in the program
  – You can create your own communicators
• Only MPI processes with the same communicator can exchange messages
• Size of a communicator:
    err = MPI_Comm_size(MPI_COMM_WORLD, &size);
• My identity (rank) in the world (0..size-1):
    err = MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

The MPI world, example

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        printf("Hello world! I'm %d of %d\n", rank, size);
        MPI_Finalize();
        return 0;
    }
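The slides note that you can create your own communicators. As a minimal sketch (not part of the lecture material), the program below splits MPI_COMM_WORLD into two halves with the standard routine MPI_Comm_split; the variable names and the choice of splitting by rank are illustrative assumptions.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int world_rank, world_size, sub_rank, sub_size;
        MPI_Comm subcomm;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
        MPI_Comm_size(MPI_COMM_WORLD, &world_size);

        /* Processes with the same "color" end up in the same new communicator;
           here the lower half of the ranks gets color 0 and the upper half color 1. */
        int color = (world_rank < world_size / 2) ? 0 : 1;
        MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &subcomm);

        /* Rank and size in the new communicator differ from those in MPI_COMM_WORLD. */
        MPI_Comm_rank(subcomm, &sub_rank);
        MPI_Comm_size(subcomm, &sub_size);
        printf("World rank %d of %d is rank %d of %d in its half\n",
               world_rank, world_size, sub_rank, sub_size);

        MPI_Comm_free(&subcomm);
        MPI_Finalize();
        return 0;
    }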
"The minimal set of MPI routines"
➢ In practice, you can do everything with 6 routines:
➢ MPI_Init(int *argc, char ***argv)
➢ MPI_Finalize()
➢ int MPI_Comm_size(MPI_Comm comm, int *size)
➢ int MPI_Comm_rank(MPI_Comm comm, int *rank)
➢ int MPI_Send(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)
➢ int MPI_Recv(void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status)

Things to remember:
• MPI_Init may only be called once during execution
• MPI_Finalize is called last; after that no other MPI routine may be called, not even MPI_Init
• MPI_Comm_size and MPI_Comm_rank refer to size and rank with respect to the communicator comm. MPI_COMM_WORLD is the one we use most of the time, but we can create our own communicators with completely different values for size and rank.
• Standard MPI_Send / MPI_Recv are blocking (see below)
  – They return only when *buf can be used again:
    • the message has been received, or
    • it has been copied to an internal buffer; MPI allows both,
  – so we should assume the first case to be safe!

Point to Point Communication
• Sending a message from one process to another
[Figure: two rings of processes 0-7; process 1 sends to process 6. You can of course send several messages at the same time]

MPI Basic Send/Receive
• We have to fill in the details

    Process 0              Process 1
    Send(data)             Receive(data)

• Things that have to be defined:
  – How can "data" be represented?
  – How can processes be identified?
  – When is an operation finished?
• Demands cooperation between sender and receiver
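The six routines above are all that is needed for a first point-to-point program. As a minimal sketch (not from the lecture material), the code below lets rank 0 send an integer to rank 1 with the blocking MPI_Send / MPI_Recv; the tag value 0 and the buffer names are arbitrary choices.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, data;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            data = 42;   /* something to send */
            MPI_Send(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&data, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
            printf("Rank 1 received %d from rank 0\n", data);
        }

        MPI_Finalize();
        return 0;
    }

Run with at least two processes; since MPI_Recv is blocking, rank 1 does not continue until the message from rank 0 has arrived.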
Synchronous vs Asynchronous Communication
[Figure: synchronous exchange - (1) A asks B "ready?", (2) B replies "ready!", (3) A sends the data to B. Asynchronous exchange - A hands the data to an intermediate buffer, and B picks it up later; the sequence of 1 and 2 does not matter]

Message passing
• Sending side (user mode / system mode):
  – Calls the send subroutine
  – The system copies data from sendbuf to sysbuf; now sendbuf is ready for reuse
  – The system sends the data from sysbuf to the destination
• Receiving side (user mode / system mode):
  – Calls the receive subroutine
  – The system receives data from the source into sysbuf
  – The system copies data from sysbuf to recvbuf; now recvbuf holds valid data

Blocking vs Nonblocking Communication
• Blocking
  – Send: when control is returned, the message has been sent and everything that has to do with the send is finished
  – Receive: when control is returned, the whole message has been received
• Non-blocking
  – Send: control is returned immediately; the actual send is done later by the system
  – Receive: control is returned immediately; the message is not received until it arrives

Synchronization or...?
• Synchronous: MPI_Ssend, MPI_Issend
• Asynchronous: MPI_Send, MPI_Isend
• The send calls are matched against a receive routine, e.g., MPI_Irecv
• You may of course match a blocking send with a non-blocking receive or the other way around (common)

Data types
• Data in a message is described by a triplet (address, count, datatype)
• Predefined MPI data types correspond to data types from the programming language (e.g., MPI_INT, MPI_DOUBLE_PRECISION)
• There are MPI functions to create aggregate data types like arrays (int, float), pairs, or one row of a matrix stored column-wise
• Since type information is sent with all data, an MPI implementation can support communication between processes with different memory representations and sizes of elementary data types (heterogeneous communication)

More about data types
• In order to hide machine-specific differences in how data is stored, MPI defines a number of data types that make communication between heterogeneous processors possible
• MPI_CHAR (signed char), MPI_SHORT (signed short int), MPI_INT (signed int), MPI_LONG (signed long int), MPI_UNSIGNED_CHAR (unsigned char), MPI_UNSIGNED_SHORT (unsigned short int), MPI_UNSIGNED (unsigned int), MPI_UNSIGNED_LONG (unsigned long int), MPI_FLOAT (float), MPI_DOUBLE (double), MPI_LONG_DOUBLE (long double), MPI_BYTE (1 byte), MPI_PACKED (packed non-contiguous data)
• You can create your own MPI types by calling (possibly recursively) some routines (e.g., MPI_Type_struct)

More about data types
Example: construction of a struct type

    struct {
        int nResults;
        double results[RMAX];
    } resultPacket;

    #define RESULT_PACKET_NBLOCKS 2

    /* Get the necessary information for the call to MPI_Type_struct */
    int blocklengths[RESULT_PACKET_NBLOCKS] = {1, RMAX};
    MPI_Aint extent;
    MPI_Type_extent(MPI_INT, &extent);
    MPI_Aint displacements[RESULT_PACKET_NBLOCKS] = {0, extent};
    MPI_Datatype types[RESULT_PACKET_NBLOCKS] = {MPI_INT, MPI_DOUBLE};

    /* Create the new type */
    MPI_Datatype resultPacketType;
    MPI_Type_struct(2, blocklengths, displacements, types, &resultPacketType);
    MPI_Type_commit(&resultPacketType);

    /* Now the new MPI type can be used to send variables of type resultPacket */
    MPI_Send(&myResultPacket, count, resultPacketType, dest, tag, comm);

Tags in MPI
• Messages are sent together with a tag that helps the receiving process to identify the message (multiplexing)
  – Messages can be filtered by the receiver with respect to tag, or not filtered by specifying MPI_ANY_TAG
• Errors in using tags are common for the MPI novice and often cause deadlocks
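As a minimal sketch (not from the lecture material) of the non-blocking calls and tags discussed above, the code below exchanges a value between ranks 0 and 1 with MPI_Irecv / MPI_Isend and waits for both requests to complete; the tag value 17 and the variable names are illustrative assumptions.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, sendval, recvval, partner;
        MPI_Request reqs[2];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank < 2) {                      /* only ranks 0 and 1 take part */
            partner = 1 - rank;
            sendval = rank + 100;

            /* Post the receive first, then the send; neither call blocks. */
            MPI_Irecv(&recvval, 1, MPI_INT, partner, 17, MPI_COMM_WORLD, &reqs[0]);
            MPI_Isend(&sendval, 1, MPI_INT, partner, 17, MPI_COMM_WORLD, &reqs[1]);

            /* The buffers may not be reused or read until the requests complete. */
            MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
            printf("Rank %d received %d from rank %d\n", rank, recvval, partner);
        }

        MPI_Finalize();
        return 0;
    }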