Parallel Programming Concepts: Message Passing
Peter Tröger

Sources:
• Clay Breshears: The Art of Concurrency
• Blaise Barney: Introduction to Parallel Computing
• OpenMP 3.0 Specification
• MPI-2 Specification
• Blaise Barney: OpenMP Tutorial, https://computing.llnl.gov/tutorials/openMP/

Parallel Programming

Parallel application (programming approaches):
• Multi-tasking: PThreads, OpenMP, OpenCL, Linda, Cilk, ...
• Message passing: MPI, PVM, CSP channels, actors, ...
• Implicit parallelism: Map/Reduce, PLINQ, HPF, Lisp, Fortress, ...
• Mixed approaches: Ada, Scala, Clojure, Erlang, X10, ...

Execution environment, by memory model — shared memory (SM) vs. shared nothing / distributed memory (DM) — and parallelism style:
• Data parallel / SIMD on SM: GPU, Cell, SSE, vector processors, ...
• Data parallel / SIMD on DM: processor-array systems, systolic arrays, Hadoop, ...
• Task parallel / MIMD on SM: many-core / SMP systems, ...
• Task parallel / MIMD on DM: cluster systems, MPP systems, ...


Message Passing
• Programming paradigm targeting shared-nothing infrastructures
• Implementations for shared memory are available, but may not be optimal
• Multiple instances of the same application run on a set of nodes (see the sketch below)
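As a hedged illustration (my sketch, not from the original slides), the "multiple instances of the same application" model in MPI, which is introduced later in this section: every node runs the same binary, and instances differ only by their rank.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);                  /* join the fixed set of processes */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* this instance's id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);    /* total number of instances */
    printf("instance %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}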


The Parallel Virtual Machine (PVM)
• Intended for heterogeneous environments; integrated set of software tools and libraries
• User-configured host pool
• Translucent access to hardware; collection of virtual processing elements
• Unit of parallelism in PVM is a task; no process-to-processor mapping is implied
• Support for heterogeneous environments
• Explicit message-passing model, multiprocessor support
• C, C++, and Fortran language bindings


PVM (contd.)
• PVM tasks are identified by an integer task identifier (TID)
• User-named groups
• Programming paradigm:
  • User writes one or more sequential programs containing embedded calls to the PVM library
  • User typically starts one copy of one task by hand
  • This process subsequently starts other PVM tasks
  • Tasks interact through explicit message passing


PVM Example

#include <stdio.h>
#include "pvm3.h"

int main()
{
    int cc, tid, msgtag;
    char buf[100];
    printf("i'm t%x\n", pvm_mytid());           /* print own task id */
    cc = pvm_spawn("hello_other", (char**)0, 0, "", 1, &tid);
    if (cc == 1) {
        msgtag = 1;
        pvm_recv(tid, msgtag);                  /* blocking receive */
        pvm_upkstr(buf);                        /* unpack message content */
        printf("from t%x: %s\n", tid, buf);
    } else
        printf("can't start it\n");
    pvm_exit();
    return 0;
}


PVM Example (contd.)

#include <string.h>
#include <unistd.h>
#include "pvm3.h"

int main()
{
    int ptid, msgtag;
    char buf[100];
    ptid = pvm_parent();                        /* get master's task id */
    strcpy(buf, "hello from ");
    gethostname(buf + strlen(buf), 64);
    msgtag = 1;
    pvm_initsend(PvmDataDefault);               /* initialize send buffer */
    pvm_pkstr(buf);                             /* pack string into the buffer */
    pvm_send(ptid, msgtag);                     /* send with msgtag to the parent */
    pvm_exit();
    return 0;
}


Message Passing Interface (MPI)
• Communication library for sequential programs
• Definition of syntax and semantics for source-code portability
• Maintains implementation freedom for high-performance messaging hardware - shared memory, IP, Myrinet, proprietary interconnects, ...
• MPI 1.0 (1994), 2.0 (1997), 3.0 (2012) - developed by the MPI Forum
• Fixed number of processes, determined at startup
• Point-to-point communication
• Collective communication, for example group broadcast (see the sketch below)
• Focus on efficiency of communication and memory usage, not interoperability
• Fortran / C language bindings - see the examples on the next slides
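As a hedged sketch of collective communication (my example, not part of the original deck): MPI_Bcast distributes a value from one root process to every process in the communicator.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, value = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)
        value = 42;                             /* root provides the data */
    /* every rank calls the same collective; afterwards all ranks hold 42 */
    MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);
    printf("rank %d now has value %d\n", rank, value);
    MPI_Finalize();
    return 0;
}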


Basic MPI
• Communicators (process-group handles)
• MPI_COMM_SIZE (IN comm, OUT size), MPI_COMM_RANK (IN comm, OUT pid)
• Sequential process IDs (ranks), starting with zero
• A message is identified by a 3-tuple: tag, source (or destination) rank, and communicator
• MPI_RECV can block, waiting for a specific source (see the sketch below)
• MPI_SEND (IN buf, IN count, IN datatype, IN destPid, IN msgTag, IN comm)
• MPI_RECV (IN buf, IN count, IN datatype, IN srcPid, IN msgTag, IN comm, OUT status)
• Constants: MPI_COMM_WORLD, MPI_ANY_SOURCE, MPI_ANY_TAG
• Data types: MPI_CHAR, MPI_INT, ..., MPI_BYTE, MPI_PACKED
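A minimal point-to-point sketch with these primitives (my example, not from the slides): rank 0 sends one integer to rank 1, which blocks in MPI_Recv until the matching message arrives.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, value;
    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        value = 123;
        /* tag 42 and MPI_COMM_WORLD identify the message */
        MPI_Send(&value, 1, MPI_INT, 1, 42, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* blocks until a matching message from rank 0 arrives */
        MPI_Recv(&value, 1, MPI_INT, 0, 42, MPI_COMM_WORLD, &status);
        printf("rank 1 received %d\n", value);
    }
    MPI_Finalize();
    return 0;
}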


Blocking Communication
• Synchronous / blocking communication:
  • Data send and receive operations run synchronously with the control flow
  • Return from the communication call implies completion of the operation
  • The sender may wait for the matching receive
  • Buffering may or may not happen
  • Sender- and receiver-side application buffers are in a defined state afterwards

int MPI_Send(void* buf, int count, MPI_Datatype type,
             int dest, int tag, MPI_Comm com);


Blocking Communication (contd.)
• Blocking communication from the MPI perspective: "Do not return until the message data and envelope have been stored away"
• Send modes deal with synchronization and buffering of communication (see the buffered-send sketch below):
  • Default: MPI decides whether outgoing messages are buffered
  • Buffered: MPI_BSEND always returns immediately
    • Might be a problem when the internal send buffer is already filled
  • Synchronous: MPI_SSEND completes once the receiver has started to receive
    • Recommended for most cases; can (!) avoid buffering altogether
  • Ready: the sender application takes care of calling MPI_RSEND only if the matching MPI_RECV is already posted
    • Can omit a handshake operation on some systems
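A hedged sketch of buffered mode (my example, not from the original deck): the application attaches an explicit buffer first, so MPI_Bsend can complete locally regardless of the receiver's state.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int rank, value = 7, bufsize;
    char *buffer;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    /* buffered mode requires an application-provided buffer */
    bufsize = MPI_BSEND_OVERHEAD + sizeof(int);
    buffer = malloc(bufsize);
    MPI_Buffer_attach(buffer, bufsize);
    if (rank == 0) {
        /* returns immediately; the message is copied into the attached buffer */
        MPI_Bsend(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("received %d\n", value);
    }
    MPI_Buffer_detach(&buffer, &bufsize);      /* blocks until buffered sends complete */
    free(buffer);
    MPI_Finalize();
    return 0;
}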


Non-Overtaking Message Order
• "If a sender sends two messages in succession to the same destination, and both match the same receive, then this operation cannot receive the second message if the first one is still pending."

CALL MPI_COMM_RANK(comm, rank, ierr)
IF (rank.EQ.0) THEN
   CALL MPI_BSEND(buf1, count, MPI_REAL, 1, tag, comm, ierr)
   CALL MPI_BSEND(buf2, count, MPI_REAL, 1, tag, comm, ierr)
ELSE   ! rank.EQ.1
   CALL MPI_RECV(buf1, count, MPI_REAL, 0, MPI_ANY_TAG, comm, status, ierr)
   CALL MPI_RECV(buf2, count, MPI_REAL, 0, tag, comm, status, ierr)
END IF


Rendezvous

[Figure: two message sequence diagrams between Process A and Process B - a plain send/receive transferring data, and a rendezvous exchange in which the receiver returns a reply message for the request]

• Special case with rendezvous communication: the sender retrieves a reply message for its request
• Control flow on the sender side only continues after this reply message arrives
• Typical RPC problem
• The ordering problem should be solved by the library (a usage sketch follows below):

int MPI_Sendrecv(void* sbuf, int scount, MPI_Datatype stype, int dest, int stag,
                 void* rbuf, int rcount, MPI_Datatype rtype, int src, int rtag,
                 MPI_Comm com, MPI_Status* status);
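A hedged usage sketch (my example, not in the original deck) of a deadlock-free pairwise exchange: each of two ranks sends its own value and receives its partner's in one combined call, so the library resolves the ordering.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, partner, mine, theirs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    partner = 1 - rank;                        /* assumes exactly two processes */
    mine = rank * 100;
    /* combined send+receive; neither side can deadlock on a blocking send */
    MPI_Sendrecv(&mine, 1, MPI_INT, partner, 0,
                 &theirs, 1, MPI_INT, partner, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    printf("rank %d got %d from rank %d\n", rank, theirs, partner);
    MPI_Finalize();
    return 0;
}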


One-Sided Communication

[Figure: synchronous write (store) and synchronous read (fetch) of data between Process A and Process B]

• No explicit receive operation, but synchronous remote memory access

int MPI_Put(void* src, int srccount, MPI_Datatype srctype,
            int dest, MPI_Aint destoffset, int destcount, MPI_Datatype desttype,
            MPI_Win win);

int MPI_Get(void* dest, int destcount, MPI_Datatype desttype,
            int src, MPI_Aint srcoffset, int srccount, MPI_Datatype srctype,
            MPI_Win win);

#include "mpi.h" #include "stdio.h" #define SIZE1 100 #define SIZE2 200 int main(int argc, char *argv[]) { int rank, destrank, nprocs, *A, *B, i, errs=0; MPI_Group comm_group, group; MPI_Win win;

MPI_Init(&argc,&argv); MPI_Comm_size(MPI_COMM_WORLD,&nprocs); MPI_Comm_rank(MPI_COMM_WORLD,&rank); MPI_Alloc_mem(SIZE2 * sizeof(int), MPI_INFO_NULL, &A); MPI_Alloc_mem(SIZE2 * sizeof(int), MPI_INFO_NULL, &B); MPI_Comm_group(MPI_COMM_WORLD, &comm_group); if (rank == 0) { for (i=0; i