Constructive Communication in MP Jeff Magee, Naranker Dulay and Jeff Kramer Department of Computing, Imperial College of Science, Technology and Medicine, 180 Queen's Gate, London SW7 2BZ, UK. Email:
[email protected]
Abstract MP is a programming environment for message passing parallel computers. The paper describes the basic set of communication primitives provided by MP and demonstrates how higher level communication operations such as symmetric exchange and remote rendezvous can be directly constructed from the basic set. The paper further shows how global parallel operations such as parallel sum, barrier synchronisation and parallel prefix can be elegantly constructed by combining the basic set of primitives with generic process structures described in the configuration language Darwin which forms part of the MP programming environment. Implementation and performance aspects of the primitives and constructed operations are discussed in relation to a transputer based multicomputer.
Keywords: parallel programming environment, parallel programming language, transputers.
1. Introduction

The message passing programming paradigm is well suited to distributed memory multicomputers since, in many cases, the programming language primitive operations can be implemented directly in hardware. This match results in high efficiency for message passing programs. However, in most commercially available parallel programming environments which support message passing [MEI89], this efficiency is achieved only at the expense of burdening the programmer with the task of directly programming complex message interactions from low-level send and receive primitives. These environments provide little support for the construction and subsequent reuse of higher level global communication abstractions. By global communication we mean a communication operation that affects more than two processes - a simple example would be broadcast. One reaction to this problem of managing communication complexity has been environments such as Express [PAR89] which support the Single Program Multiple Data (SPMD) paradigm for parallel programs. This paradigm is suited to scientific and engineering calculations which operate on large arrays of data [FOX88]. Data is divided between the set of available processing elements (PEs) and the same program is executed at each PE. Data is exchanged and results combined using a fixed repertoire of global communication operations. So successful has the SPMD approach been for this category of problem that support tools and compilers have been produced to semi-automatically partition data and insert the required communication operations into a sequential source program [HAT91, IKU11]. The disadvantage of the SPMD approach is the difficulty encountered when problems have a natural functional rather than data parallelism. MP is a programming environment for message passing parallel programs which emphasises the construction of communication operations.
It is intended that the environment should have the flexibility to support a wide range of message passing program styles and paradigms, including SPMD. The programming facilities consist of two parts. The first is a set of communication primitives and datatypes which are embedded in a host sequential programming language, generally by a preprocessor. The second is the configuration language Darwin, which is used to program the construction and subsequent evolution of networks of communicating processes. Darwin supports modular and generic network programs. A detailed description of the language and its use may be found in [MAG91a]. This paper concentrates on the communication operations provided by MP and their combination with Darwin generic network structures to provide global communication operations. Section 2 first outlines the form of MP/Darwin programs and their execution environment and then describes the MP primitive communication operations and datatypes which are embedded in a host programming language. In this paper, the syntax used for examples is from the Pascal embedding of MP. A preprocessor is also available for C. Section 3 demonstrates, using the example of combining networks, how the set of primitives described
in Section 2 may be used in conjunction with Darwin generic structures to elegantly implement global communication operations. Implementation and performance aspects of MP communication are discussed in Section 4 with respect to a transputer based distributed memory machine. Finally, Section 5 discusses the work and presents some conclusions and future directions.
2. Primitive Communication

Programs developed in MP are intended to execute on message passing machines with Pmax identical processing elements (Figure 1). Each processing element consists of a CPU, local memory and an interface to the communication network. Processing elements can support many independent threads of execution (processes). Pairs of processes may communicate directly with each other whether they are resident on the same processor or on different processors. The communication network is thus assumed to support full interconnection between any two processes wherever they are located. The techniques used by the MP runtime system to implement this abstract machine on current generation message passing machines are similar to those employed by its predecessor Tonic (see [MAG91b] for a description).
[Figure 1 shows processing elements PE1, PE2, PE3, ..., PEPmax attached to a communication network.]

Figure 1 - Message Passing Parallel Machine

MP programs consist of networks of intercommunicating processes. An example of a program with Pmax producer processes and 1 consumer process is given in Figure 2. The network is described in the Darwin language by specifying the set of process types and/or subnetwork component types (use) from which the network is constructed, the set of instances of these types (inst) which form the nodes of the network and the set of communication paths between process instances (bind). The example also includes a description of how the network is to be executed on the parallel computer. By default, instances are allocated to the processing element on which their enclosing component resides; otherwise their location is specified by the pe attribute associated with the instance declaration. In the example, each producer process is simply allocated to a different PE. Note that Pmax is not a constant; it is determined by the MP
loader at program execution time.

    component prodcons {
        use producer, consumer;
        inst C:consumer;
        forall k:1..Pmax {
            inst P[k]:producer @ pe=k;
            bind P[k].put -- C.get;
        }
    }
Figure 2 - Darwin Producer-Consumer program.
Ports & Receive

Processes communicate in MP by means of ports. Ports can be considered to be queues of typed messages. Messages are dequeued from ports and copied into the local variables of a process by means of the blocking receive operation (?). In the example below of the consumer process from Figure 2, the port get queues messages consisting of a real and an integer value. Ports are always declared within the context of a process. The directive provide get makes the port visible to a Darwin program using the process type consumer.

    process consumer;
    provide get;
    var get: port(integer,real);
        i:integer;
        r:real;
    begin
        loop
            get ? (i, r);
            writeln(i,r);
        end;
    end.
The MP-Pascal compiler checks that the types of operands of a receive operation are compatible with the port on which the operation is being performed. Ports are similar to files in Pascal in that they may be passed as var parameters to procedures but they may not be assigned.
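The behaviour of a port - a queue of typed messages with a blocking receive - can be sketched in Python as an illustrative analogue (the class name Port and its methods are assumptions for this sketch, not MP identifiers; note that MP's send is synchronous, whereas this queue-based sketch is asynchronous and only illustrates the typing and blocking-receive behaviour):

```python
# Illustrative Python analogue of an MP port: a queue of typed messages.
# receive() blocks, as MP's '?' does; types are checked on send, loosely
# mirroring the compile-time check performed by the MP-Pascal compiler.
import queue

class Port:
    def __init__(self, *types):
        self.types = types          # e.g. (int, float) for port(integer,real)
        self.q = queue.Queue()

    def send(self, *msg):           # plays the role of MP's '!'
        assert len(msg) == len(self.types)
        assert all(isinstance(v, t) for v, t in zip(msg, self.types))
        self.q.put(msg)

    def receive(self):              # plays the role of MP's '?'
        return self.q.get()         # blocks until a message is queued

get = Port(int, float)              # the consumer's port from Figure 2
get.send(1, 1.0)
i, r = get.receive()                # i == 1, r == 1.0
```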
Remote References & Send
Messages are sent to remote references to ports. A remote reference is a pointer to a remote object which, as we will see in the following, is not always a simple port. Remote reference types are declared by prefixing a normal type with an @ symbol. In the example below of the producer process from Figure 2, put is a remote reference to a port which can receive messages consisting of an integer and a real.

    process producer;
    require put;
    var put: @port(integer,real);
        i:integer;
    begin
        for i:=1 to 100 do
            put ! (i, sqrt(i));
    end.
The directive require put makes the port reference variable visible to a Darwin program using the process type producer. It should now be clear that the effect at execution time of a Darwin bind declaration is to assign a remote reference value provided by one process instance into the remote reference variable required by another process. Messages are sent to remote port references by means of the synchronous send operation (!). Again, the compiler checks that the send operation is compatible with the port reference variable. MP compilers generate a compact structural type description of each port provided and each reference required. Darwin uses these descriptions to perform a static type check on bindings between processes. Consequently, the compatibility between messages sent and messages received is enforced. With the exception of ports, files, and pointers, values of any Pascal datatype (including reference types) may be sent in messages.

Remote Rendezvous

The ability to send remote reference values in messages provides a convenient way to construct higher level communication protocols from the simple synchronous send and receive operations described above. For example, a remote rendezvous protocol is implemented by the client and server processes shown in Figure 3.
    bind client.sine -- server.sine
    process client;
    require sine;
    var sine:@port(real,@port(real));
        crep : port(real);
        angle,sinval:real;
    begin
        .......
        sine!(angle,@crep);
        crep?(sinval);
        .......
    end.
    process server;
    provide sine;
    var sine:port(real,@port(real));
        srep : @port(real);
        angle:real;
    begin
        .......
        sine?(angle,srep);
        srep!(sin(angle));
        .......
    end.
Figure 3 - Client-Server

The client process sends a message to the server consisting of a real value and a reference to its local port crep. The reference to crep is computed by the @ operator. The server process receives the real value into the variable angle and the reference value into the variable srep. Srep is used to send the computed sine value back to the client. However, the syntax for declaring port reference types requires simplification to enhance the readability of programs. With no change to the semantics of the underlying operations, an alternative syntax for declaring port references is provided such that ->type_list is equivalent to @port(type_list). Furthermore, since rendezvous interaction is very common, the compiler provides an implicit reply port for the caller together with an implicit port reference variable for the callee of a remote rendezvous. The program of Figure 3 now becomes:

    process client;
    require sine;
    var sine:->real->real;
        angle,sinval:real;
    begin
        .......
        sine!(angle->sinval);
        .......
    end.
    process server;
    provide sine;
    var sine:port(real->real);
        angle:real;
    begin
        .......
        sine?(angle->sin(angle));
        .......
    end.
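The rendezvous protocol of Figure 3 can be rendered as a minimal Python sketch (names such as sine_port and crep are illustrative, not MP identifiers): the client sends a value together with a reference to its local reply queue, and the server uses that reference to return the result.

```python
# Python analogue of the Figure 3 rendezvous: the "remote reference" to the
# client's reply port crep is modelled by passing the queue object itself.
import math, queue, threading

sine_port = queue.Queue()            # the server's provided port

def server():
    angle, srep = sine_port.get()    # sine?(angle, srep)
    srep.put(math.sin(angle))        # srep!(sin(angle))

t = threading.Thread(target=server)
t.start()

crep = queue.Queue()                 # the client's local reply port
sine_port.put((0.0, crep))           # sine!(angle, @crep)
sinval = crep.get()                  # crep?(sinval)
t.join()
```

As in MP, the protocol works unchanged for any number of clients, since each carries its own reply reference.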
An extended rendezvous is also supported to allow the callee to execute a series of statements before replying. Note that the protocol works for any number of clients connected to the server. Further extensions to this basic rendezvous can easily be programmed. For example, a server
can reply in a different order from that in which requests were received by storing more than one reply reference value. To implement such a scheduling server, the programmer must resort to explicit reference variable declaration as in Figure 3.

Port Structures

As noted earlier, remote references do not always refer to ports. They may also reference arrays or records of ports, as shown in the program of Figure 4.

    bind user.S -- sema.S
    type semaphore = record P,V:port() end;

    process user;
    require S;
    var S:@semaphore;
    begin
        .......
        S.P!();
        ............
        S.V!();
    end.
    process sema(init:integer);
    provide S;
    var S:semaphore;
        val:integer;
    begin
        val:=init;
        loop
            select
                when val>0 & S.P?() do val:=val-1;
                when true  & S.V?() do val:=val+1;
            end
        end
    end.
Figure 4 - Semaphore Implementation

The sema process of Figure 4 implements a semaphore. The interface to the process is provided as a record of two ports, P and V. Thus only one binding is required between the user of the semaphore service and the provider. The compiler of the user process can easily compute the references to the individual ports P and V from the reference to S and the structure of the record semaphore. The ability to reference port structures permits the efficient management of complex interfaces in Darwin. Further, it allows servers to return a complex interface to a user process request as a single reference value. Figure 4 also includes an example of the MP guarded command. The select statement allows a process to selectively wait for a message from one of a set of ports. The statement can include both time delays and purely computational when clauses, in a similar fashion to the Occam [INM88] ALT statement or the Ada [DOD83] select statement.
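The guarded selection of the sema process can be approximated in Python with a condition variable (an analogue, not MP's select): the P operation waits until the guard val > 0 holds, while V always proceeds. The class name Sema and its methods are assumptions for this sketch.

```python
# Python analogue of the sema process of Figure 4.  The wait_for predicate
# plays the role of the guard in 'when val>0 & S.P?()'.
import threading

class Sema:
    def __init__(self, init):
        self.val = init
        self.cv = threading.Condition()

    def P(self):                     # when val>0 & S.P?() do val := val - 1
        with self.cv:
            self.cv.wait_for(lambda: self.val > 0)
            self.val -= 1

    def V(self):                     # when true & S.V?() do val := val + 1
        with self.cv:
            self.val += 1
            self.cv.notify()

s = Sema(1)
s.P()                                # guard holds: proceeds immediately
s.V()                                # s.val back to 1
```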
Communication Conjunction

The select statement provides a nondeterministic way of choosing between one of a set of possible communications. Complementary functionality is required by a process which needs to perform all of a set of communication operations in an order which cannot be predetermined locally when the process is programmed.

    bind exch[1].left -- exch[2].right;
         exch[2].left -- exch[1].right;
    process exch;
    provide right;
    require left;
    var right: port(integer);
        left : ->integer;
        valin,valout:integer;
    begin
        { exchange my valout with neighbour }
        conjoin right?(valin) & left!(valout);
        ..........
    end;
Figure 5 - Exchange Program

For example, if two instances of the same process type wish to exchange data, then with the communication statements and operations so far defined we cannot program them to avoid deadlock. Either both processes will send first or both will receive first, since they must perform operations in the same order as a result of being two instances of the same type¹. Since communication is synchronous, they will deadlock. MP provides the conjoin statement to cater for this situation. The statement activates a set of communication operations in parallel and waits until they all complete. The individual communication operations within the conjoin statement may complete in any order. The program of Figure 5 is an example of the use of the conjoin statement to perform two communication operations concurrently. The program consists of two instances of the process type exch connected together so that they can exchange data values. The conjoin statement performs the required data exchange without causing deadlock. This situation is commonly found in scientific calculations when boundary values are exchanged with neighbouring PEs. The program works with n exch processes connected in a ring. The MP
¹ Unless some asymmetry is introduced by messages received from other processes or from parameters.
compiler performs a static check to warn when there is possible interference between variables used in the message operations of a conjoin statement. Because indexed array variables and pointers can be used in message operations, this check is only a warning. Full safety is provided by an optional runtime check. The conjoin statement is a generalisation of the exchange primitives provided by SPMD programming environments such as Express[PAR89]. This section has described the basic communication datatypes and synchronous operations provided by MP. Some simple examples have been given of how the basic primitives may be used to construct higher level abstractions. Discussion as to why these particular primitives have been chosen and their relation to primitives provided in other parallel programming environments is deferred to Section 5. The next section describes how global communication abstractions can be constructed from the basic primitives.
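The deadlock-free exchange achieved by conjoin in Figure 5 can be mimicked in Python (an illustrative analogue, not MP): each process runs its send and its receive concurrently, so neither blocks the other. Here SyncChan is an assumed helper modelling a synchronous channel whose put() waits for the matching get(), as MP's synchronous send does.

```python
# Simulating 'conjoin right?(valin) & left!(valout)' between two exch
# instances.  If each process did the send then the receive sequentially,
# both would block in put(); running them in parallel avoids the deadlock.
import threading, queue

class SyncChan:
    def __init__(self):
        self.data = queue.Queue()
        self.ack = queue.Queue()
    def put(self, v):
        self.data.put(v)
        self.ack.get()               # block until the receiver has taken v
    def get(self):
        v = self.data.get()
        self.ack.put(None)           # release the blocked sender
        return v

def exch(valout, right, left, result):
    got = [None]
    # conjoin: activate both communication operations in parallel
    recv = threading.Thread(target=lambda: got.__setitem__(0, right.get()))
    send = threading.Thread(target=lambda: left.put(valout))
    recv.start(); send.start(); recv.join(); send.join()
    result.append(got[0])

c12, c21 = SyncChan(), SyncChan()    # exch[1] -> exch[2] and back
res1, res2 = [], []
t1 = threading.Thread(target=exch, args=(1, c21, c12, res1))
t2 = threading.Thread(target=exch, args=(2, c12, c21, res2))
t1.start(); t2.start(); t1.join(); t2.join()
# res1 == [2], res2 == [1]: values exchanged without deadlock
```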
3. Global Communication

In the previous section, we demonstrated how protocols such as symmetric exchange and remote rendezvous could be simply constructed from the basic synchronous send and blocking receive primitives. However, these are basically two-party communication protocols. In this section, we examine the construction of n-party communication interactions, in which up to n processes take part in a communication. An example of an n-party interaction would be barrier synchronisation, where n processes communicate to ensure that they all have reached a certain point in their execution before any one of them proceeds beyond that point. We term these n-party communication protocols global communication. These global communication operations are constructed from the base-level primitives with the aid of generic Darwin structures.
Generic Darwin Components

The generic facility of Darwin allows the user to describe network structures of communicating processes without specifying precisely the function of each processing node of the network. For example, Figure 6 shows an n-input combining network. The recursive Darwin description of this network is parameterised with the component type cnode.
    component cnode {
        provide left,right;
        require out;
    }
    component combine (n:int; use cnode) {
        provide in[n];
        require out;
        when n==2 {
            inst root:cnode @pe = (Pmax/2 + 1);
            bind in[0] -- root.left;
                 in[1] -- root.right;
                 root.out -- out;
        }
        when n>2 {
            inst half:combine(n/2 + n%2, cnode);
            bind half.out -- out;
            forall i:0..n/2-1 {
                inst cnode[i] @pe = 2*i*Pmax/n + 1;
                bind in[2*i] -- cnode[i].left;
                     in[2*i+1] -- cnode[i].right;
                     cnode[i].out -- half.in[i];
            }
            when n%2==1
                bind in[n-1] -- half.in[n/2];
        }
    }
Figure 6 - Combine Network

The Darwin description of cnode need only specify the names of the objects the type provides and the names of the reference variables it requires values for. Consequently, many different processes (and process networks) can be substituted for cnode at program instantiation time. Note that the combine program contains information on how the network should be mapped to a physical multicomputer. The mapping ensures maximum parallelism by ensuring that processes at the same level of the combining network will be allocated to different processing elements.
Parallel Sum

By substituting the process add (described below) for cnode, the combine network can be used to sum n values in O(log n) time. In Darwin, the parallel sum network is declared as follows:

    inst parsum:combine(n,add);

    process add;
    provide left, right;
    require out;
    var left,right: port(real->real);
        out: ->real->real;
        leftrep,rightrep:->real;
        val,sum:real;
    begin
        loop
            left?(val,leftrep);
            right?(sum,rightrep);
            sum:=sum+val;
            if not null(out) then out!(sum->sum);
            leftrep!(sum);
            rightrep!(sum);
        end;
    end.
Each process taking part in the parallel addition sends a value to the combining network. Each add process sums the values from its left and right inputs and sends the sum to its output out. When the root of the combining tree is reached, the fact that out is not bound is detected by the predicate null(out). The sum is then diffused back up the tree to the inputs. The program of Figure 7, which computes in parallel a value for π by performing the integration from 0 to 1 of 4/(1+x²), serves to illustrate the use of the combining network to compute parallel sums. Each process calcpi[i] performs a slice of the integration. The results are combined by each process performing the operation add!(sum->sum), which replaces the local sum with the global sum. Process calcpi[0] then prints out the result and terminates the computation.
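The combining-tree behaviour of the add processes can be sketched sequentially in Python (the function name tree_sum is an assumption for this sketch): each tree level pairs up values and adds them, reaching the root in O(log n) levels, after which the root value is what the add processes diffuse back to every input via the reply references.

```python
# Sequential simulation of the combining tree: one iteration per tree level.
def tree_sum(values):
    level = list(values)
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:            # an odd value passes straight up a level
            nxt.append(level[-1])
        level = nxt
    return level[0]                   # the root value, diffused to all inputs

total = tree_sum([1.0, 2.0, 3.0, 4.0])   # -> 10.0
```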
    process calcpi(n,maxn:integer);
    require add;
    const deltas=400000;
    var add:->real->real;
        i:integer;
        sum,x:real;
    begin
        sum:=0.0;
        for i:=n*(deltas div maxn) to (n+1)*(deltas div maxn)-1 do
        begin
            x:=(i+0.5)/deltas;
            sum:=sum+4.0/(1.0+x*x);
        end;
        add!(sum->sum);
        sum:=sum/deltas;
        if n=0 then begin
            writeln("pi:", sum:14:12);
            halt;
        end;
    end.

    component pi {
        use combine,calcpi,add;
        inst parsum:combine(Pmax,add);
        forall i:0..Pmax-1 {
            inst calcpi[i](i,Pmax) @pe=i+1;
            bind calcpi[i].add -- parsum.in[i];
        }
    }
Figure 7 - Parallel computation of π.
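A sequential Python rendering of the calcpi slice computation may clarify Figure 7: the global add!(sum->sum) is replaced here by summing the per-slice partial results directly. The constant deltas matches Figure 7; the value of Pmax is chosen arbitrarily for this sketch.

```python
# Midpoint-rule integration of 4/(1+x^2) over [0,1], split into Pmax slices
# exactly as each calcpi[n] does; the slices' partial sums are then combined.
deltas, Pmax = 400000, 4

def calcpi(n, maxn):
    s = 0.0
    for i in range(n * (deltas // maxn), (n + 1) * (deltas // maxn)):
        x = (i + 0.5) / deltas
        s += 4.0 / (1.0 + x * x)
    return s

pi = sum(calcpi(n, Pmax) for n in range(Pmax)) / deltas   # ~3.141592653...
```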
Barrier Synchronisation

The use of the combine network to perform a parallel sum operation also implicitly synchronises the processes participating in the sum. An explicit barrier synchronisation operation can easily be programmed by substituting the process sync described below for cnode in the combine network.

    process sync;
    provide left, right;
    require out;
    var left,right: port(->void);    { ->void is equivalent to @port() }
        out: ->void->void;
        leftrep,rightrep:->void;
    begin
        loop
            left?(leftrep);
            right?(rightrep);
            if not null(out) then out!(->void);
            leftrep!();
            rightrep!();
        end;
    end.
An n-process barrier network is created by the Darwin instance declaration:
    inst barrier:combine(n,sync);

Each user process would declare a reference variable bsync bound to one of the inputs of barrier and perform the operation bsync!(->void) to synchronise. Since all n inputs must be received before the root node can reply, barrier performs barrier synchronisation.
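The guarantee provided by barrier:combine(n,sync) - no process passes until all n have arrived - corresponds to a standard barrier; Python's threading.Barrier gives the same guarantee directly, though MP builds it from message passing. This sketch only illustrates the semantics.

```python
# Every "before" append happens before any thread returns from wait(),
# so all n "before" entries precede all "after" entries in order.
import threading

n = 4
barrier = threading.Barrier(n)
order = []                          # list.append is thread-safe in CPython

def worker(i):
    order.append(("before", i))
    barrier.wait()                  # bsync!(->void): blocks until all n arrive
    order.append(("after", i))

ts = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
for t in ts: t.start()
for t in ts: t.join()
```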
Parallel Prefix

By substituting the process prefixadd for cnode, the combining network can be used to solve the parallel prefix problem, such that an input to in[i] returns the sum of the values submitted to inputs in[0..i-1]. The solution is due to Kruskal, Rudolph and Snir [KRU88].
    process prefixadd;
    provide left, right;
    require out;
    var left,right: port(real->real);
        out: ->real->real;
        r0,r1:->real;
        v0,v1,sum:real;
    begin
        loop
            left?(v0,r0);
            right?(v1,r1);
            if not null(out) then
                out!(v0+v1->sum)
            else
                sum:=0.0;
            r0!(sum);
            r1!(sum+v0);
        end;
    end.
The prefix network is instantiated as shown below:

    inst presum:combine(n,prefixadd);

A client process i would request the operation by declaring a port reference variable padd bound to presum.in[i] and performing the operation padd!(val->sum) to calculate the sum of the input values of its predecessors 0..i-1. Both the parallel sum and parallel prefix programs work for any binary associative operator. In a language which supports procedure types (such as C), the operator as well as the operands can be passed to the combining network. Consequently, in C, the network together with a suitable implementation of cnode can be used to provide exactly the functionality of the excombine global communication operator of Express [PAR89]. The Express exsync operation is provided by the barrier synchronisation program above. The parallel sum program provides a form of global broadcast operation where the broadcaster would perform the operation add!(val->val) and the remaining processes wishing to receive the
broadcast would perform the operation add!(0.0->val). Each process would have the broadcast value after the parallel sum completed. In the above, the combine network has been used with remote rendezvous communication to return results of operations to the inputting processes. The network can also be used to produce a single output stream of values from a set of input processes, ordered by their connection index to combine.in[]. The output would be available on combine.out.
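The tree computation performed by the prefixadd nodes can be simulated recursively in Python (the function name exclusive_prefix is an assumption for this sketch): adjacent pairs are combined on the way up, and on the way down each node replies to its left child with its own incoming prefix (r0!(sum)) and to its right child with that prefix plus the left value (r1!(sum+v0)), the root starting with sum = 0.

```python
# Recursive simulation of the prefixadd tree: output i is the sum of
# inputs 0..i-1 (an exclusive prefix), matching the paper's in[0..i-1].
def exclusive_prefix(values):
    n = len(values)
    if n == 1:
        return [0.0]                          # root: sum := 0.0
    # combine adjacent pairs, as each node does with its left/right inputs
    pairs = [values[i] + values[i + 1] for i in range(0, n - 1, 2)]
    if n % 2:
        pairs.append(values[-1])              # odd input passes straight up
    upper = exclusive_prefix(pairs)           # prefixes for the pair sums
    out = []
    for i in range(0, n - 1, 2):
        out.append(upper[i // 2])             # r0!(sum)
        out.append(upper[i // 2] + values[i]) # r1!(sum + v0)
    if n % 2:
        out.append(upper[-1])
    return out

exclusive_prefix([1.0, 2.0, 3.0, 4.0])        # -> [0.0, 1.0, 3.0, 6.0]
```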
4. Implementation & Performance

The MP programming environment is currently targeted to produce programs for a transputer based Meiko Computing Surface. This has 32 T800 processors, each with 4 Megabytes of memory. The machine is hosted by a SPARC I based host computer. The Meiko is statically configurable in that the transputers can be connected together into different network structures before an MP program is downloaded from the host. The hardware configuration activity is controlled by a Darwin program which describes the required hardware interconnection network [MAG91a,b]. The MP runtime system implements the abstract fully interconnected machine of Figure 1 by providing routing and virtual channels between processing elements. The MP program, comprising the compiled Darwin program, code for process types and the runtime system, is loaded into each transputer's memory. The Darwin program then executes independently in parallel at each transputer to create the initial network of process instances at each node and the inter-process bindings. After the initial network has been created, process instance execution is started. Note that even if the Darwin program is hierarchically structured, the network computed is a "flat" structure of processes and direct bindings between processes. There is no indirection at run-time. Although not described in this paper, the initial network of processes can be extended at runtime by invoking Darwin configuration operations [MAG91a]. Remote references are represented at runtime by the triple (pe_id, proc_id, offset), where pe_id is the processing element number, proc_id is the process instance identifier and offset is the offset from the base address of the data segment of that process. The pair (pe_id, proc_id) provides a unique identifier for each process instance in a program. Process instance identifiers within a processing element are allocated cyclically over a large range to enable the detection of invalid references.
A reference becomes invalid when the process instance to which it refers has terminated and its data segment has been released. Currently, a remote reference is represented in store by two 32-bit words. References with pe_id = 0 are
used to communicate with the I/O server running on the host computer.
Ports are implemented using a seven-word control block which includes a transputer channel word for data transfer and head and tail pointers to the queue of processes awaiting access to the channel word. A count of the waiting processes is also maintained. The remaining words are required by the implementation of the select statement.

    Send-Receive time in microseconds (µs), for out!(M) and inn?(M):
        Local  (M in off-chip memory):      18 + 0.08 * Ms
        Remote:                             40 + L * (60 + 0.7 * Ms),        where L > 0

    Remote Rendezvous time in microseconds (µs), for out!(M->R) and inn?(M->R):
        Local  (M & R in off-chip memory):  36 + 0.08 * (Ms + Rs)
        Remote:                             90 + L * (67 + 0.7 * (Ms + Rs)), where L > 0

    Ms = size of M in bytes;  Rs = size of R in bytes;  L = number of HW links
Figure 8 - Message Passing Performance

Figure 8 summarises the performance of the MP message passing primitives. As should be expected, local (intra-processor) times for remote rendezvous are exactly twice those of local send-receive, since the remote rendezvous is constructed from the base-level primitives. However, in the remote (inter-processor) case, the remote rendezvous time is much less than twice the single send-receive time. A single pair of send and receive operations across a network of transputers requires an acknowledgement to be transmitted back across the network from the receiving end to the sending end to preserve the semantics of synchronous communication. However, these acknowledgements can be omitted in the case of remote rendezvous while preserving the semantics of the operation. The MP-Pascal compiler automatically detects this optimisation in the case of remote rendezvous. In cases where acknowledgements can be dropped from a sequence of communication operations while preserving the operational semantics of a program, the user can invoke a compiler option to selectively switch off acknowledgements. The communication times reported in Figure 8 are independent of the number of remote reference
variables bound to a port. In this, MP differs from its predecessor Tonic [MAG91b], where communication time was proportional to the fan-in factor; MP uses a queuing mechanism for fan-in where Tonic used the transputer ALT construct. A 32-input parallel sum computation (Section 3) takes 1640 µs when mapped onto an 8 by 4 torus. The program of Figure 7 computes π in 128 milliseconds (ms) on a network of 32 transputers connected as an 8 by 4 torus. This can be compared with the runtimes of the same program reported in [HAT91]: 280 ms for a 32-processor Intel iPSC/2, 350 ms for a 64-processor Ncube 3200 and 550 ms for a 24-processor Sequent Symmetry. All times exclude downloading and result output time. We expect that the communication times quoted in Figure 8 would improve dramatically on a machine constructed from the new T9000/C104 transputer components, since these provide hardware support for routing, virtual channels and many-to-one multiplexing. These functions are currently provided in software by the MP runtime system.
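The cost formulas of Figure 8 can be captured directly as small helper functions; this sketch (the function names are assumptions, the constants are those of Figure 8) estimates message times in microseconds from message size and hop count.

```python
# Cost model from Figure 8.  Ms/Rs are message/reply sizes in bytes;
# L is the number of hardware links traversed (L == 0 means local).
def send_receive_us(Ms, L=0):
    if L == 0:
        return 18 + 0.08 * Ms                    # local, M off-chip
    return 40 + L * (60 + 0.7 * Ms)              # remote, L > 0

def rendezvous_us(Ms, Rs, L=0):
    if L == 0:
        return 36 + 0.08 * (Ms + Rs)             # local, M & R off-chip
    return 90 + L * (67 + 0.7 * (Ms + Rs))       # remote, L > 0

send_receive_us(100)          # local 100-byte message: 26.0 µs
rendezvous_us(8, 8, L=2)      # 8-byte request/reply across 2 links: 246.4 µs
```

Note how the model exposes the optimisation discussed above: a remote rendezvous costs far less than two remote send-receives, because the rendezvous reply doubles as the acknowledgement.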
5. Discussion & Conclusion

The MP communication primitives have been designed to be "constructive" in the sense that they can easily be combined to form more complex communication abstractions. The programmer is given the responsibility of choosing the communication abstractions most appropriate to the problem. Ideally, a programmer can select generic Darwin structures from a library rather than start from the base level. This approach can be compared with that of Linda [GEL85], which provides a fixed set of high-level communication abstractions. The Linda compiler is given the responsibility of deriving efficient implementations in terms of low-level message operations. Typically, the compiler substitutes direct message passing for tuple space update and lookup. At times, the programmer is forced to use inappropriate communication abstractions when the problem has a natural formulation using message passing primitives. Where Linda-style tuple space is appropriate for a problem, we would hope to provide it as a Darwin component, although as yet this claim cannot be substantiated. We share with Linda, Express and numerous parallel programming systems the desire to reuse routines from existing sequential codes. Consequently, the MP communication primitives are designed to extend existing sequential languages. To date, extensions for Pascal, C and C++ have been implemented. The approach is to preprocess programs in the extended language into the original language. In contrast to the above-mentioned systems, we have chosen to "deeply" embed MP message passing. For example, ports and remote references extend the basic data types and type constructors provided by the host language. Our desire is to retain for parallel programming the advantages that the language type system gives to sequential programming.
The compromise of using structural type checking for inter-process binding allows the combination of processes written in different languages into a single program. Further, structural type checks permit the independent compilation of MP programs, as opposed to Ada-style separate compilation. Independent compilation simplifies distributed development of programs. The concept of a communication port is in no way novel. Ports of one form or another have been provided in a number of operating systems and programming languages [BAL89]. The novelty here lies in the combination of ports, remote references and configuration language. However, it is instructive to compare MP ports with the channels of Occam 2 [INM88]. Channels provide one-to-one unidirectional communication, as opposed to the many-to-one communication pattern supported by MP ports. References to ports may be sent in messages, whereas channels may not. Ports are always declared within the scope of the process which receives on that port, whereas a channel is declared global to the scopes of the sending and receiving processes. In essence, these differences represent two sets of consistent design decisions. Ports must have a single location so that we can form and transmit references to them. Since port references can be copied and transmitted to different processes, ports must be capable of receiving messages from many sources. Channels, in contrast, more accurately reflect the nature of the underlying transputer hardware. They are thus statically defined connections between processes, in the same way as links are connections between processors. Static structure, from our experience with previous systems [MAG89, MAG91b], results in programs with predictable performance and facilitates comprehension and reasoning. It also, at times, leads to unnecessary complication where the structure of a program naturally evolves during computation.
In addition to permitting dynamic connection establishment through passing port references in messages, MP permits dynamic process instantiation [MAG91a]. Processes in MP are sequential threads of control. Parallelism is provided by composing multiple process instances using Darwin. Consequently, the equivalent of the PAR statement in Occam is not required. Further, in sequential languages such as Pascal and C, which have pointers and numerous ways of programming side effects, it would be difficult to provide a safe form of PAR which detected interference between created threads. However, with synchronous communication, it is often necessary to allow a number of communication operations to proceed in parallel since there is no a priori sequential ordering. Our solution to this problem is the conjoin statement (section 2), which allows only communication operations to be executed in parallel. Argument values and variable addresses are computed in advance of executing the parallel communication operations. This limits the possibility of interference, allows the compiler to perform some static checking and limits the runtime checks required. Darwin generic components are the skeletal form of parallel programs. Although we have concentrated on a particular example, the reader will realise that other common parallel
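The discipline imposed by conjoin, namely that arguments are fixed sequentially before only the communications themselves run in parallel, can be sketched with a hypothetical `conjoin` helper (this is not MP's actual syntax, which is a statement form in the extended host language):

```python
import queue
import threading

def conjoin(*ops):
    """Run the given zero-argument communication operations in
    parallel and wait for all to complete."""
    threads = [threading.Thread(target=op) for op in ops]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

# Two sources standing in for synchronous communication partners.
left, right = queue.Queue(), queue.Queue()
left.put(1)
right.put(2)

received = {}
# Argument values and destination "addresses" (the dict keys) are
# fixed here, before the parallel step; only the receives overlap.
conjoin(lambda: received.__setitem__("l", left.get()),
        lambda: received.__setitem__("r", right.get()))
```

Because neither operation computes new values during the parallel phase, the opportunities for interference between the two receives are confined to their disjoint destinations.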
algorithmic forms such as pipeline, supervisor-worker and divide & conquer can be expressed with equal facility. The advantage of programming using generic components is that not only the structure but also the associated mapping annotation can be reused between applications. Although not shown in the example, attributes such as priority, in addition to location, can be used to annotate instances in Darwin. It is planned to build a library of generic components as part of the MP programming environment. The functional programming group at Imperial College[DAR91] has also recognised the importance of generic structures (termed skeletons). Transformation directed at a particular skeleton provides a path towards improving the execution efficiency of functional programs on parallel architectures. Future collaboration is planned to investigate the utility of transformation techniques within the MP environment. In conclusion, the MP communication primitives are an attempt to combine the flexibility of systems which provide low-level message passing primitives with the programmability of systems which provide a fixed repertoire of higher level abstractions. The emphasis in MP is on providing facilities to construct communication operations and subsequently to reuse them conveniently. To validate the constructive capability of MP, the development of components which support the functionality of SPMD systems such as Express and the tuple space abstraction of Linda is planned.
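To illustrate the kind of reusable structure a generic component captures, the supervisor-worker form mentioned above can be sketched as a parameterised function. The sketch shows only the process structure; Darwin itself would describe this configuration declaratively and attach mapping annotations, and the function name and parameters here are hypothetical.

```python
import queue
import threading

def supervisor_worker(tasks, work_fn, n_workers=4):
    """Generic supervisor-worker skeleton: distribute independent
    tasks over n_workers worker threads and collect all results."""
    task_q, result_q = queue.Queue(), queue.Queue()
    tasks = list(tasks)
    for t in tasks:
        task_q.put(t)

    def worker():
        # Each worker repeatedly takes a task until none remain.
        while True:
            try:
                t = task_q.get_nowait()
            except queue.Empty:
                return
            result_q.put(work_fn(t))

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return [result_q.get() for _ in range(len(tasks))]
```

Only `work_fn` and the task set vary between applications; the communication structure, like a Darwin generic, is reused unchanged.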
Acknowledgements The authors would like to acknowledge discussions with our colleagues in the Parallel and Distributed Systems Group during the formulation of these ideas. We gratefully acknowledge the financial support of the SERC under grant GR/G31079 and of the CEC through the REX Project (2080).
References [BAL89]
H. Bal, J. Steiner and A. Tanenbaum, “Programming Languages for Distributed Computing Systems”, ACM Computing Surveys, Vol. 21, No. 3, September 1989, pp. 261-322.
[DAR91]
J. Darlington, A. Field, P. Harrison, D. Harper, G. Jouret, P. Kelly, K. Sephton and D. Sharp, “Structured parallel functional programming”, in Glaser and Hartel, editors, Proceedings of the Workshop on the Parallel Implementation of Functional Languages, Technical Report CSTR 91-07, Dept. of Electronics and Computer Science, University of Southampton.
[DOD83]
Department of Defense, U.S., “Reference manual for the Ada programming language”, ANSI/MIL-STD-1815A, DoD, Washington, D.C., January 1983.
[FOX88]
G. Fox, M. Johnson, G. Lyzenga, S. Otto, J. Salmon and D. Walker, “Solving Problems on Concurrent Processors”, Vols. 1 and 2, Prentice Hall, Englewood Cliffs, NJ, 1988.
[GEL85]
D. Gelernter, “Generative communication in Linda”, ACM TOPLAS, 7(1), January 1985, pp. 80-112.
[HAT91]
P.J. Hatcher, M.J. Quinn, A.J. Lapadula, B.K. Seevers, R.J. Anderson and R.R. Jones, “Data-Parallel Programming on MIMD Computers”, IEEE Trans. on Parallel and Distributed Systems, PDS-2(3), July 1991, pp. 377-383.
[IKU11]
K. Ikudome, G. Fox, A. Kolawa and J.W. Flower, “An automatic and symbolic parallelization system for distributed memory architectures”, Proc. 5th Distributed Memory Computing Conference, Charleston, S. Carolina, April 1990.
[INM88]
Inmos Ltd, “OCCAM 2 reference manual”, Prentice Hall, 1988.
[KRU88]
C.P. Kruskal, L. Rudolph and M. Snir, “Efficient Synchronisation on Multiprocessors with Shared Memory”, ACM TOPLAS, Vol. 10, No. 4, October 1988, pp. 579-601.
[MAG89]
J. Magee, J. Kramer and M. Sloman, “Constructing Distributed Systems in Conic”, IEEE Transactions on Software Engineering, SE-15(6), 1989.
[MAG91a]
J. Magee, N. Dulay and J. Kramer, “Structuring Parallel and Distributed Programs”, submitted for publication.
[MAG91b]
J.N. Magee and N. Dulay, “A Configuration Approach to Parallel Programming”, PARLE 91, Vol. II, LNCS 506, pp. 313-330.
[MEI89]
Meiko Ltd, “CS Tools Documentation Guide”, 650 Aztec West, Bristol, 1989.
[PAR89]
Para Soft Corporation, “EXPRESS user manual”, 1989.