ARCHITECTURAL SUPPORT FOR SYNCHRONOUS TASK COMMUNICATION

F. J. Burkowski, G. V. Cormack and G. D. P. Dueck†

Department of Computer Science, University of Waterloo, Waterloo, Ontario N2L 3G1
ABSTRACT
This paper describes the motivation for a set of intertask communication primitives, the hardware support of these primitives, the architecture used in the Sylvan project which studies these issues, and the experience gained from various experiments conducted in this area. We start by describing how these facilities have been implemented in a multiprocessor configuration that utilizes a shared backplane. This configuration represents a single node in the system. The latter part of the paper discusses a distributed multiple node system and the extension of the primitives that are used in this expanded environment.

INTRODUCTION
The use of synchronous communication-based computational architectures has evolved in recent years as a technique for reducing the development complexity of large software systems, particularly in areas where real-time performance is of paramount concern. In concert with an anthropomorphic style of program structuring [BEA82, BOO84, GEN81], this approach has proved to be extremely valuable in reducing both the initial and long-term (maintenance) costs of such developments, and has resulted in the creation of a large number of operating systems that support this organizational model [CHE79, CHE83, GEN83, HIR83, MAN81, WAT84]. For example, [CHE85] discusses the benefits of defining an operating system as a very large number of tasks (larger than the number of processors) and he refers to this technique as "superconcurrency".
This research is funded by a strategic grant from the Natural Sciences and Engineering Research Council of Canada (Grant No. G1581).
Despite the success of this software methodology, all these existing systems suffer from a lack of hardware support for the underlying message-passing operations, limiting their performance in a fashion similar to that evidenced by the minimal support of procedure invocations in architectural designs of the late 1960s. The Sylvan project was begun as a concerted effort to address this concern, and has as its goal the development of a hardware architecture for the direct support of message-passing operations, both in a local context (within a single computer) and
†On leave from Brandon University, Manitoba, Canada, R7A 6A9. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
© 1989 ACM 0-89791-300-0/89/0004/0040 $1.50
in a more general, distributed configuration (with inter-nodal message traffic).
In addition to the software methodology, there is another compelling reason for the support of many tasks in a system. When concurrent tasks are executing in separate processor/memory modules, the ensuing parallel activity raises the performance level of the system. This is especially true if the architecture allows parallel execution of many "small granularity" tasks.

While the anthropomorphic style of program structuring promotes a variety of organizational stereotypes, the most popular paradigm involves the utilization of client - server interactions. This is especially true for systems that are horizontally partitioned as a sequence of layers. In this case the system incorporates various resources or services that are administered by tasks or monitors. These services are organized in layers of increasing abstraction starting with a kernel, and culminating in the user program. A server task at a particular layer provides services to higher level client tasks. These designations do not apply to the tasks themselves but rather to the role that a task plays during an interaction between itself and another task. Thus, a server task will act as a client when interacting with another server task in a lower layer.

We can considerably enhance the effectiveness of this interaction if the following reliability and software methodological principles are carefully observed:

P1) client and server communicate only through a well defined interface,
P2) the client has no knowledge of the mechanisms used to implement the server,
P3) the server has no a priori knowledge of particular clients and their activities, and
P4) the server does not depend on the client's correct behaviour.

Our aim here is to find a simple set of essential primitives that provide suitable communication and synchronization for the inter-layer interfaces and to incorporate them into a computer architecture. Naturally, these primitives must be provided in the kernel layer; without them no additional layers could be built. A primitive is considered essential if it cannot be implemented in terms of the others without violating the principles just listed.

THE COMMUNICATION PRIMITIVES

In this paper we discuss three primitives for inter-task communication: SEND, RECEIVE, and REPLY.

SEND specifies a bounded-length message to be transmitted to a specified receiving task, and the sending task is suspended waiting for a reply message from the receiving task. RECEIVE is issued by a task wishing to accept a message. If a message has already been sent to this task, the message and the sender's name are returned. If no such message has been transmitted, the receiver is suspended until such a message is sent.

REPLY is issued by the receiving task after a successful RECEIVE; it transmits a reply message back to the sender and unblocks the sender so that it may continue execution. These primitives are modelled after those of Thoth [CHE79], and paradigms for the use of the primitives have been described by Gentleman [GEN81]. The primitives are remarkably similar to those that effect rendezvous in Ada [BAR82] (SEND, RECEIVE, and REPLY correspond to ENTRY call, ACCEPT, and END accept respectively), and the algorithms described here may be adapted easily to implement rendezvous. Additional facilities for task creation and deletion must also be established in support of these communication primitives.

SUITABILITY OF THE Thoth PRIMITIVES

We now discuss the advantages of using the Thoth primitives in a system which constrains the interaction between client and server to be such that the client issues a request for service via a SEND primitive while the server cooperates in the interaction by executing a RECEIVE followed by a REPLY.

We observe principle P1 by insisting that client and server synchronize activities solely through the employment of these primitives, avoiding the use of any other "special" synchronization facilities. This enforced parsimony of primitives at the application level encourages a sound software methodology since the synchronization mechanism is uniform throughout the system and is easily understood by all the application programmers working on the system.
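The blocking semantics of the three primitives can be illustrated with a small sketch. The following is hypothetical Python (threads and queues stand in for the Sylvan hardware and task descriptors; the names TaskPort, server_loop, and the reply-channel mechanism are all illustrative assumptions, not part of the paper's design):

```python
import queue, threading

class TaskPort:
    """Mailbox for a Thoth-style server task (illustrative, not Sylvan)."""
    def __init__(self):
        self._inbox = queue.Queue()            # send-blocked clients queue here

    def send(self, msg):
        """SEND: transmit msg, then block until the server replies."""
        reply_ch = queue.Queue(maxsize=1)      # stands in for the reply-blocked state
        self._inbox.put((msg, reply_ch))
        return reply_ch.get()                  # client is suspended here

    def receive(self):
        """RECEIVE: block until some client has sent a message."""
        return self._inbox.get()               # (message, sender's reply channel)

    @staticmethod
    def reply(reply_ch, result):
        """REPLY: unblock the send-blocked client with a result."""
        reply_ch.put(result)

server = TaskPort()

def server_loop():                             # server: RECEIVE, pre-reply code, REPLY
    for _ in range(2):
        msg, client = server.receive()
        TaskPort.reply(client, msg * 2)        # pre-reply code computes the result

threading.Thread(target=server_loop, daemon=True).start()
r1 = server.send(21)                           # client blocks until the reply arrives
r2 = server.send(5)
print(r1, r2)                                  # 42 10
```

Note that the server issues only RECEIVE and REPLY, never SEND, so it is never blocked on any particular client, which is exactly the discipline behind principle P4.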
In practice, there may be performance gains if one uses extra "primitives" such as copy-to and copy-from which effectively move larger amounts of data between client and server. However, these operations are executed by the server (in pre-reply code that is executed after a RECEIVE and before a REPLY) while the client is send-blocked. Thus, the synchronization is still being achieved by the standard SEND, RECEIVE and REPLY.

Principle P2 strives to promote a clean demarcation between the functional responsibilities of server and client. For example, the client need not be aware of the concurrency techniques used by the server. This is fostered by the reply primitive since the server may execute pre-reply code that provides the service or it may act as an administrator which passes the required service to a worker task which will provide a result to the server for eventual reply to the client.

Principle P3 is supported by the "receive-any" nature of the RECEIVE primitive. The server is not held up waiting for a specific sender, rather it will appropriately react to the requests issued by the current active sender (client). This has obvious performance and reliability advantages. It may be noted at this point that, although the server need not concern itself with the internal activities of the sender, it can often capitalize on an extremely useful and important aspect of the pre-reply code: namely, that it executes while the sender is blocked. As a consequence, the sender is in a known state of execution and the server can execute pre-reply code which formulates a reply that relies on this known state. This has implications for the logical structure of the code that go beyond the use of the copy-to and copy-from operations just described. Once again, the existence of pre-reply code has a significant impact on the principles listed earlier.

Finally, principle P4 is guaranteed by stipulating that the server will execute only RECEIVE primitives in its interaction with a client, never executing a SEND. Consequently, the server never becomes blocked waiting for some appropriate action from a client. If a server sends to a client it may block for an extended period of time if the client does not respond in a timely fashion due to a software bug or communication failure. The behaviour of the server is then at the mercy of any client. At best, such a scenario can lead to a compromise of performance by neglecting other clients, assuming we can avoid the worst case situation of a perpetual wait by using a time-out facility which also adds complexity to the system.

Note that with proper authentication and/or typing mechanisms, the server can check the validity of the incoming message and subsequently behave correctly even if the client issues inappropriate or erroneous messages.

OTHER PRIMITIVES COMPARED TO THOTH

We now consider a variety of communication and synchronization primitives, discussing their applicability to the client-server relationship. These classes of primitives will be ranked in an order which exhibits a progressively tighter coupling between synchronization activity and the logic of the program.

S1: Non-Queued Message Communication with Wait

In this environment tasks run on different processors and communicate using SEND, RECEIVE primitives in order to provide a producer - consumer relationship between two tasks. This is exemplified by the Transputer [TRA86]. In the simplest of these systems, a processor will simply wait for a message to arrive if it executes a receive operation before a specific producer task executes a send. A processor will also wait if it must send a message which is not ready to be received by a particular consumer task. In these circumstances a wait is simply a suspension of processor activity without the possibility of a context switch. This arrangement has obvious disadvantages in terms of the system performance and, due to the connection oriented (receive specific) nature of the communication, it is not easily adapted to the client-server interaction. Other drawbacks that mitigate against the client-server interaction include the absence of a REPLY primitive and the lack of a queueing facility which would allow the kernel to build a list of clients trying to access the same server.
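The connection-oriented, zero-buffer rendezvous that characterizes this class can be sketched as follows. This is a hypothetical Python analogue (the Link class and its semaphore handshake are illustrative assumptions; actual Transputer links are hardware channels with no context switch):

```python
import threading

class Link:
    """Zero-buffer, connection-oriented rendezvous link (one sender, one receiver)."""
    def __init__(self):
        self._slot = None
        self._full = threading.Semaphore(0)    # a value is waiting in the slot
        self._taken = threading.Semaphore(0)   # the receiver has consumed it

    def send(self, value):                     # blocks until the receiver takes it
        self._slot = value
        self._full.release()
        self._taken.acquire()

    def recv(self):                            # blocks until a sender provides a value
        self._full.acquire()
        value = self._slot
        self._taken.release()
        return value

link = Link()
received = []

def consumer():
    for _ in range(3):
        received.append(link.recv())

t = threading.Thread(target=consumer)
t.start()
for v in (1, 2, 3):
    link.send(v)                               # each send rendezvouses with one recv
t.join()
print(received)                                # [1, 2, 3]
```

The point-to-point nature of the link is visible here: there is no sender queue and no reply path, which is why this style resists the client-server pattern described above.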
S2: Message Communication with Buffer Synchronization

These systems are characterized by a software kernel which provides SEND and RECEIVE primitives for a buffered producer - consumer interaction. Typically, synchronization is allied with context switching in order to make the best use of CPU resources.
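A minimal sketch of buffered communication, in hypothetical Python (the KERNEL_SLOTS limit and the send/receive helpers are assumptions chosen to expose the finite-buffer problem, not any particular kernel's interface):

```python
import queue

KERNEL_SLOTS = 2                     # assumed kernel buffer limit, for illustration
mailbox = queue.Queue(maxsize=KERNEL_SLOTS)

def send(msg):
    """Buffered SEND: the sender continues immediately; the kernel keeps the
    message in transit.  With a finite buffer area, a send can fail."""
    try:
        mailbox.put_nowait(msg)
        return True
    except queue.Full:
        return False

def receive():
    """Buffered RECEIVE: take the oldest message in transit."""
    return mailbox.get_nowait()

outcomes = [send(m) for m in ("a", "b", "c")]
first = receive()
print(outcomes, first)               # [True, True, False] a
```

The failed third send is the nonuniformity discussed below: once buffer space is exhausted, the primitive's behaviour changes, and the system state now lives partly in the messages in transit rather than solely in the states of the tasks.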
There is a very important distinction between this style of communication and that provided by the Thoth primitives. In Thoth, synchronization is much more coupled with the program logic as provided by the REPLY primitive which is used to unblock the sender after the activities expressed in the pre-reply code have occurred. With buffer synchronization, synchronization activity is coupled with task activity only through the message generation and this has implications in the logical structure of the software since a server cannot exercise the option of having any control over the progression of execution in its clients as it can in a Thoth system.

Another disadvantage of this approach arises from the finite limitations of the buffer area needed to store messages in transit between the communicating tasks. Even if producer tasks are operating correctly, they can possibly generate messages to excess. Thus, the system must take special precautions to ensure that sending tasks do not overrun buffer areas and this introduces extra kernel complexity and a nonuniformity in the operation of the primitives. To express this argument in more concise terms, we would contend that the state of the system is best represented as the collective states of all tasks in the system and not as the collective state of all messages in their various stages of transmission. There is yet another disadvantage which is also exhibited by the next strategy, so we will discuss it in the next point.

S3: Blocking Send and Receive

In this situation we have a SEND and RECEIVE rendezvous with synchronization similar to that described earlier in point S1 except that a hardware wait is replaced by a context switch. Transfer of information occurs only when both the source and destination tasks have invoked the SEND and RECEIVE commands respectively. Therefore, either the source or the destination task may be suspended until the other task is ready with the corresponding output or input. This is essentially a zero length buffer scheme. This approach is exemplified by the CSP primitives described in [HOA78].

While the blocking SEND effectively side-steps buffer management problems, there is still an inherent weakness exhibited by these primitives when they are forced into the client-server interaction. Since the sender is allowed to unblock at the execution of its associated receive, a reply cannot be formulated as in the Thoth environment. The best approximation to this activity would require the execution, in the server, of a SEND primitive following instructions that would have the same functionality as pre-reply code. However, the server must then block and is thus at the mercy of the client in violation of P4.

S4: The Ada Rendezvous

As noted earlier, the SEND, RECEIVE and REPLY of Thoth are very similar to the ENTRY, ACCEPT and END accept of Ada. The similarity is close enough that the P1 to P4 principles can be fulfilled. The primary difference between Ada and Thoth is that the END statement is tied to the ACCEPT statement by its placement within the accept construct while the REPLY and RECEIVE of Thoth are associated only through the joint use of the same task ID, namely that of the sending task. This difference precludes a server from interleaving the processing of several client requests (described as "local delay" by Liskov [LIS86]), except for the following restricted case: Because of the syntax of the Ada construct, an additional ACCEPT and END accept (working with a second sending task) can be placed in code that is separate from the other ACCEPT-END sequence or it can be placed within the ACCEPT statement prior to the END statement. In this latter case the two statements are nested within the outer ACCEPT and END. In Thoth, the RECEIVE and REPLY primitives used in communication with multiple senders may be established in the code with any arbitrary interleaving as long as the REPLY to a sender follows the RECEIVE for that same sender. It is fairly easy to describe task communication scenarios for which the nesting requirement is overly restrictive when compared to the Thoth primitives.

A MESSAGE-PASSING ARCHITECTURE

The objective of the Sylvan architecture has been to establish all the kernel facilities within the hardware architecture. While a hardware implementation of the kernel provides an extra measure of reliability and security, the major reason for this approach is the performance enhancement of primitive execution. By using hardware specifically designed to implement kernel primitives, the SEND, RECEIVE and REPLY operations have been greatly accelerated. This is not only due to the dedicated hardware functionality (a microprogrammed implementation of the primitives) but also because of the reduction in the context switching that is required. If the kernel is
implemented in software, using the same processor as the application code, then the invocation of a primitive essentially requires a "context switch" to the kernel followed by another switch to the next running task when kernel activities are complete. As implied earlier, kernel speed is a worthwhile objective in order to reduce as much as possible the overhead associated with task management. This reduction will promote the utilization of smaller "lightweight" tasks in the application. We believe that a finer granularity of task function has a significant impact on software methodology since the programmer has much more latitude in the task structure of the system.

In our view, the essential features of a message-passing architecture involve the following components and facilities:

1. Processing Complexes

The system consists of a collection of processing complexes each comprised of a processor and associated memory accessible via a local bus.

2. The Taskmaster

Each processing complex relies on the hardware-based kernel facilities provided by a module or subsystem that we will refer to as a Taskmaster. Communication between a processor and its Taskmaster should be fast and efficient. A possible approach would be to involve the Taskmaster as a coprocessor which performs kernel operations in response to the execution of SEND, RECEIVE and REPLY "instructions". The Taskmaster handles message transfers between tasks since it is privy to all the memory mapping information associated with the various tasks and it is also responsible for all the changes of state that a task undergoes as it executes the various communication primitives. State information for each task is retained in a task control block stored in a memory that is local to the Taskmaster.

3. Communication Facilities

Since message-passing obviates the requirement for a shared memory, we rely on a communication facility between the various processing complexes. This can be effected as a hierarchical bus structure or some connection facility such as the binary hypercube [SEI85]. Since communication between tasks is not patterned or specialized to any particular application class, we do not strive for concurrency gains by the employment of a particular connecting topology. It is sufficient that the inter-connecting topology be cost-efficient and reasonably fast.

4. Memory Management

While it is possible for all application tasks to run in a large flat address space, it is much more effective to organize the data and code areas of a task within an environment that has at least minimal facilities for memory management. Ideally, if the hardware can support this, page tables within the memory management facility would be modified by the Taskmaster.

5. Interrupts

In response to external events the system will often run an event handler task which must displace another lower priority task that is currently running within a processing complex. The Taskmaster will perform this context switch by first issuing an interrupt to the processor complex that is required for the execution of the handler task. The service routine that responds to this interrupt executes a Taskmaster coprocessor instruction that initiates the context switch.

A REALIZATION OF THIS ARCHITECTURE

The Sylvan multiprocessor is comprised of a number of nodes each contained in a separate card-cage with a Motorola VERSAbus/RAMbus backplane. Each node (see diagram below) consists of one or more processing complexes, a Taskmaster module and an INTERtask COMmunication module (INTERCOM). The processing complex is comprised of Motorola VERSAmodules M68KVM04 (CPU) and M68KVM13 (dynamic RAM). The CPU module includes a MC68020 32-bit microprocessor with 16K byte on-board cache, MC68881 floating point coprocessor and a four megabyte dynamic RAM with dual-port facilities. It should be noted that the dual-port capability is very important. Most of the processor references will be to its local RAM, these transactions being done over the RAMbus that is local to the pair.
[Figure: Sylvan Node — VM04 processor / VM13 memory pairs on local RAMbuses, connected with the Taskmaster over the node VERSAbus.]
This means that the internal node bus (VERSAbus) is used only for coprocessor invocations and message traffic between complexes in the same node or between complexes in different nodes. VERSAbus bandwidth should easily handle message traffic for the few complexes (1 to a maximum of 7) being used.
The Taskmaster is based on the AMD 29116 bipolar microprocessor and 2910 microprogram sequencer working in conjunction with a fast static RAM that holds task descriptors in various queue structures. There is also a fast RAM which provides the bus registers for coprocessor communication exchanges between the Taskmaster and any one of the processor complexes.

The INTERCOM module also communicates with the Taskmaster over VERSAbus. The INTERCOM is basically a small cross-point switch which allows bidirectional communication with four neighbouring nodes. It also contains a Direct Memory Access (DMA) subsystem that provides high bandwidth data transfers to or from any of the memory modules in a node. The Taskmaster controls INTERCOM DMA operations by writing commands to a command FIFO accessible via the VERSAbus. An INTERCOM status FIFO can be read by the Taskmaster by using the VERSAbus.

A variety of interconnection strategies are made possible by the adopted node architecture. Since each node communicates with four other nodes, we can use any degree-4 interconnection scheme such as the binary hypertree.

KERNEL ALGORITHMS

As mentioned earlier, synchronous message passing is desirable as a set of primitives for intertask communication, because it does not require any dynamic allocation of resources in sending a message: all synchronization is reflected by state changes of the tasks involved in the communication, and information is transferred directly from one task to another without the need for a buffer. As a consequence, each of the primitives can be executed in constant time and, as no exhaustable resources are consumed, delivery of the transmitted information is assured.

However, when the same primitives are implemented on a distributed system, these advantages are typically lost: existing implementations of which we are aware involve dynamic acquisition of resources, either in the form of per-message task creation, or in the creation of surrogate task descriptors to represent communicating tasks on remote machines. In the latter part of this paper, we describe algorithms that implement distributed communication while preserving the aforementioned benefits.

We will first describe the basic communication algorithms as implemented on a uniprocessor or multiprocessor with shared backplane. Next, we describe the distributed environment to which the algorithms are adapted. We assume that each processor may reliably send a fixed-length packet of information to any other processor in constant time. The data structures used to implement distributed communication are modelled very closely after those for the single node implementation. The state of each task is represented by a unique task descriptor on a specific processor. Task descriptors on different processors are linked together into distributed message queues; it is the manipulation of these queues that forms the heart of these algorithms.

Algorithms for a Single Node

Primitives for communication between tasks resident within processors sharing the same backplane are handled by a Taskmaster that is local to this node. Each task is described by a task descriptor in a local RAM within the Taskmaster. In addition to storing the execution context of the task, each descriptor contains

STATE: (READY, SEND_BLOCKED, REPLY_BLOCKED, RECEIVE_BLOCKED)
BLOCKED_ON: TASK_NAME
HEAD, TAIL, LINK: TASK_NAME
MSG, REPLY, SENDER: MEMORY_ADDRESS
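The descriptor fields and the single-node queue manipulation can be rendered as a small executable sketch. This is hypothetical Python mirroring the single-node SEND/RECEIVE/REPLY algorithms (a model of the data structures, not the Taskmaster microcode; messages are passed by value rather than by MEMORY_ADDRESS):

```python
from dataclasses import dataclass

READY, SEND_BLOCKED, REPLY_BLOCKED, RECEIVE_BLOCKED = range(4)

@dataclass
class TaskDescriptor:
    name: str
    state: int = READY
    blocked_on: "TaskDescriptor" = None
    head: "TaskDescriptor" = None      # head of the queue of blocked senders
    tail: "TaskDescriptor" = None
    link: "TaskDescriptor" = None      # next sender in the queue
    msg: object = None
    reply: object = None
    sender: object = None

def SEND(p, t, m):
    """Task p sends message m to task t."""
    if t.state == RECEIVE_BLOCKED:
        t.msg, t.sender, t.state = m, p, READY
        p.state, p.blocked_on = REPLY_BLOCKED, t
    else:                              # receiver not ready: queue the sender
        p.state, p.msg, p.blocked_on = SEND_BLOCKED, m, t
        if t.head is None:
            t.head = t.tail = p
        else:
            t.tail.link = p
            t.tail = p

def RECEIVE(p):
    """Task p receives; returns the sender's descriptor, or None if blocked."""
    if p.head is None:
        p.state = RECEIVE_BLOCKED
        return None
    s = p.head
    p.head = s.link                    # dequeue the sender
    p.msg, p.sender = s.msg, s
    s.state = REPLY_BLOCKED
    return s

def REPLY(p, s, r):
    if s.state != REPLY_BLOCKED or s.blocked_on is not p:
        return                         # ignore invalid replies (cf. principle P4)
    s.reply, s.state = r, READY

client = TaskDescriptor("client")
server = TaskDescriptor("server")
SEND(client, server, "request")        # server not yet receiving: client queued
s = RECEIVE(server)                    # dequeues client, marks it REPLY_BLOCKED
REPLY(server, s, "response")
print(client.state == READY, client.reply)
```

Each operation touches only a constant number of descriptor fields, which is the constant-time property the text relies on.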
We assume here that there is a direct mapping from the name of a task to its descriptor: the STATE of a task named P is denoted as P.STATE. The implementation of the primitives involves manipulating these fields in the task descriptors, and transferring messages from the memory space of one task to another. After a primitive is executed, the CPU is dispatched to execute some task P that has P.STATE = READY.

The following algorithm is executed when the SEND primitive is executed by the currently executing task P.

SEND message M to task T and place reply in R:
    if T is not a valid task name then fail;
    if T.STATE = RECEIVE_BLOCKED then
        transfer message from M to T.MSG;
        T.SENDER := P;
        T.STATE := READY;
        P.STATE := REPLY_BLOCKED;
        P.REPLY := address of R;
        P.BLOCKED_ON := T;
    else
        P.STATE := SEND_BLOCKED;
        P.MSG := address of M;
        P.REPLY := address of R;
        P.BLOCKED_ON := T;
        if T.HEAD = NIL then T.HEAD := P; T.TAIL := P;
        else T.TAIL.LINK := P; T.TAIL := P;

When task P executes a RECEIVE or a REPLY primitive, it is implemented as follows:

RECEIVE message M and place sender's name in S:
    if P.HEAD = NIL then
        P.STATE := RECEIVE_BLOCKED;
        P.MSG := address of M;
        P.SENDER := address of S;
    else
        S := P.HEAD;
        copy message at P.HEAD.MSG to M;
        P.HEAD.STATE := REPLY_BLOCKED;
        P.HEAD := P.HEAD.LINK;

REPLY R to task S:
    if S is not a valid task or S.STATE ≠ REPLY_BLOCKED or S.BLOCKED_ON ≠ P then
        ignore this reply;
    else
        copy R to S.REPLY;
        S.STATE := READY;

It is easily observed that each of these primitives executes in time that is bounded by a constant, and that failure occurs only in the event of programmer error. Execution of the SEND primitive may involve queueing: in this event, the queue is formed by manipulating the existing link fields in the task descriptors; the descriptor for the target task forms the head and the descriptor for each SEND_BLOCKED task is an element in the queue.

In the Sylvan distributed system, each task P and its associated descriptor resides on a particular processor. Each processor has a separate memory and communication takes place only via fixed length communication packets. We assume that packets are delivered reliably and in constant time. The constant-time assumption is necessary only to guarantee that SEND, RECEIVE and REPLY can be executed in constant time. The data structures for the distributed implementation are essentially the same as described previously. However, the queues of task descriptors cannot be manipulated as easily because a single queue may involve descriptors that reside in different memories; the processors involved must cooperate by exchanging information packets.

There are seven types of information packets, each having a different format.

Packet type      Contents
SEND_REQUEST     sender_task, receiver_task, message
SEND_AGAIN       sender_task, receiver_task, next_task, message
SEND_QUEUE       sender_task, receiver_task, next_task
PROCEED          sender_task, receiver_task
REPLY_BLOCK      sender_task, receiver_task
REPLY_REQUEST    sender_task, receiver_task, reply_message
ERROR            sender_task, receiver_task

The first two types of packets are transmitted from the processor on which sender_task resides to the processor on which receiver_task resides; the last five are transmitted from the processor containing receiver_task to the processor containing sender_task.

The algorithms for implementing SEND, RECEIVE and REPLY are expressed as two sets of routines. The first set contains the routines that are invoked when each of the primitives is invoked by the active task on a processor; these routines are the analogues of those in the previous section. The second set of routines are those executed by the processor upon arrival of a packet from another processor: there is one routine for each packet type. These routines must be executed atomically on the processor.

The following routines implement primitives when invoked by a task P. The fields of P's descriptor are available only to the processor on which P resides (denoted P.CPU), and contain the same information as described in the previous section, except that STATE may take on the possible values SEND_REQUEST_BLOCKED, SEND_QUEUE_BLOCKED, RECEIVE_BLOCKED, SEND_AGAIN_BLOCKED, and REPLY_BLOCKED.

SEND message M to task T and place reply in R:
    P.STATE := SEND_REQUEST_BLOCKED;
    P.BLOCKED_ON := T;
    P.MSG := address of M;
    P.REPLY := address of R;
    transmit SEND_REQUEST(P, T, M) to T.CPU;

RECEIVE message M and place sender's name in S:
    if P.HEAD = NIL then
        P.STATE := RECEIVE_BLOCKED;
        P.MSG := address of M;
        P.SENDER := address of S;
    else
        transmit PROCEED(P.HEAD, P) to P.HEAD.CPU;
        P.STATE := PROCEED_BLOCKED;

REPLY R to task S:
    transmit REPLY_REQUEST(S, P, R) to S.CPU;

These routines are executed whenever a packet arrives at processor C.

SEND_REQUEST (SENDER, RECEIVER, MESSAGE)
    if RECEIVER is an invalid task then
        transmit ERROR(SENDER, RECEIVER) to SENDER.CPU;
    else if RECEIVER.STATE = RECEIVE_BLOCKED then
        copy MESSAGE to RECEIVER.MSG;
        copy SENDER to RECEIVER.SENDER;
        RECEIVER.STATE := READY;
        transmit REPLY_BLOCK(SENDER, RECEIVER) to SENDER.CPU;
    else if RECEIVER.HEAD = NIL then
        RECEIVER.HEAD := SENDER; RECEIVER.TAIL := SENDER;
    else
        transmit SEND_QUEUE(RECEIVER.TAIL, RECEIVER, SENDER) to RECEIVER.TAIL.CPU;
        RECEIVER.TAIL := SENDER;

SEND_AGAIN (SENDER, RECEIVER, NEXT, MESSAGE)
    copy MESSAGE to RECEIVER.MSG;
    copy SENDER to RECEIVER.SENDER;
    RECEIVER.HEAD := NEXT;
    RECEIVER.STATE := READY;

SEND_QUEUE (SENDER, RECEIVER, NEXT)
    SENDER.STATE := SEND_QUEUE_BLOCKED;
    SENDER.LINK := NEXT;

PROCEED (SENDER, RECEIVER)
    SENDER.STATE := REPLY_BLOCKED;
    transmit SEND_AGAIN(SENDER, RECEIVER, SENDER.LINK, message at SENDER.MSG) to RECEIVER.CPU;

REPLY_BLOCK (SENDER, RECEIVER)
    SENDER.STATE := REPLY_BLOCKED;

REPLY_REQUEST (SENDER, RECEIVER, REPLY_MESSAGE)
    if SENDER.STATE = REPLY_BLOCKED and SENDER.BLOCKED_ON = RECEIVER then
        copy REPLY_MESSAGE to SENDER.REPLY;
        SENDER.STATE := READY;

ERROR (SENDER, RECEIVER)
    SENDER.STATE := READY;
    move "Invalid Receiver" to SENDER.REPLY;

EXPLANATION

A SEND-RECEIVE-REPLY interaction between two tasks may involve the transmission of 3, 4, or 5 packets, depending on timing circumstances. In the simplest case, a message is sent from a task S1 to a task R that has previously executed a RECEIVE (Figure 1), and is therefore RECEIVE_BLOCKED. In this case, the message in the initial SEND_REQUEST packet (transmitted from S1.CPU to R.CPU) is copied and R is allowed to execute. A REPLY_BLOCK packet is transmitted back to S1.CPU to indicate that the message has been successfully delivered. Later, a REPLY primitive causes a REPLY_REQUEST to be transmitted to S1.CPU, and S1 resumes execution.

The other scenarios occur when a SEND is done before the corresponding RECEIVE. When the SEND_REQUEST packet arrives, and it is observed that R is not RECEIVE_BLOCKED, the message itself is discarded (to be re-transmitted later). A record of the SEND_REQUEST is kept. In the simple case (Figure 2), S1 is the only task to issue a SEND_REQUEST to R. S1's name is recorded in R's descriptor. When the RECEIVE primitive is eventually invoked, a packet PROCEED is transmitted to S1.CPU, and the message is re-transmitted. Later, a REPLY is effected as described above. The total number of packets transmitted in this interaction is 4.

If another task S2 sends to R before R has received the message from S1 (Figure 3), a queue must be formed. S2 is added to the tail of the queue by transmitting a SEND_QUEUE packet to S1.CPU. Later, when S1's message is received, R.CPU sends a PROCEED packet to S1.CPU, and S1.CPU transmits SEND_AGAIN to R.CPU. In addition to transmitting S1's message, this packet indicates that S2 is next in the queue, and R's descriptor is updated accordingly. Finally, a REPLY takes place in the normal way. This interaction is the most complicated that can arise, and involves the exchange of five packets per SEND-RECEIVE-REPLY cycle.

DISCUSSION

Clearly, these algorithms have a cost in the number of packets sent. With unlimited buffering at the receiver's processor, SEND-RECEIVE-REPLY can be accomplished with 2 packets, as opposed to 3-5 packets. The advantage of the approach described here is in its simple semantics, from the point of view of both the implementor and the user. There is no such thing as unlimited buffering, and when a constraint is applied, the possibilities of message failure and deadlock occur.

Two modifications to the algorithms can reduce this cost. First, the packet REPLY_BLOCK is necessary only to ensure the validity of REPLY invocations. If the language system makes it impossible to reply to a task without first receiving from it (as in Ada), these packets may be eliminated, reducing the cost of the simple SEND-RECEIVE-REPLY to 2 packets. In many systems, the vast majority of interactions are of this form. Finally, a hybrid scheme may be used: we may use buffering until the available space is exhausted, and then fall back on the scheme described here.
FIGURE 1. RECEIVE BEFORE SEND
FIGURE 2. SINGLE SEND BEFORE RECEIVE
FIGURE 3. MULTIPLE SENDS BEFORE RECEIVE
(Timing diagrams showing the packets exchanged between S1.CPU, S2.CPU and R.CPU in each scenario; artwork not reproduced.)
IMPLEMENTATION RESULTS

To date we have built a single node with the microcoded kernel running in the Taskmaster. Currently, the time for a SEND-RECEIVE-REPLY cycle is 62 microseconds with a zero-byte message and 78 microseconds with a 16-byte message. These are timings for tasks running on a single processor. If the communicating tasks are on different processor modules (situated on the same backplane), then the respective timings are 48 and 65 microseconds. This faster result is due to the execution overlap that takes place with code running simultaneously on two processors.

The Taskmaster has a 2K by 80-bit microstore. The aforementioned kernel required 495 microinstructions. The distributed algorithms have been simulated and we are now in the process of establishing them in the Taskmaster.

ACKNOWLEDGEMENTS

The authors wish to thank Crispin Cowan, Scott Darlington and Dennis Vadura for their efforts in debugging the hardware and writing the microcode. We also thank John Rogers and Dermot Harris for many supportive suggestions and generally helpful endeavours.

REFERENCES

[BAR82] Barnes, J.G.P., Programming in Ada, International Computer Science Series, Addison-Wesley, 1982, pp. 203-207.

[BEA82] Beach, R., Beatty, J., Booth, K., Fiume, E., and Plebon, D., "The Message is the Medium: Multiprocess Structuring of an Interactive Paint Program", Computer Graphics, Vol. 16, No. 3, Aug. 1982, pp. 277-287.

[BOO84] Booth, K., Schaeffer, J., and Gentleman, M., "Anthropomorphic Programming", TR CS-82-47, Dept. of Comp. Sci., Univ. of Waterloo, Feb. 1984.

[CHE79] Cheriton, D.R., et al., "Thoth, A Portable Real-Time Operating System", Comm. ACM 22, 1979, pp. 105-115.

[CHE83] Cheriton, D.R., and Zwaenepoel, W., "The Distributed V Kernel and its Performance for Diskless Workstations", Proc. Ninth ACM Symp. Operating Systems Principles, Oct. 1983, pp. 128-140.

[CHE85] Cheriton, D.R., "Evaluating Hardware Support for Superconcurrency with the V-Kernel", (Preprint), Computer Science Dept., Stanford University.

[GEN81] Gentleman, W.M., "Message Passing Between Sequential Processes: The Reply Primitive and Administrator Concept", Software - Practice and Experience, Vol. 11, 1981, pp. 435-466.

[GEN83] Gentleman, W.M., "Using The Harmony Operating System", National Research Council Canada, NRCC No. 23030, Dec. 1983.

[HIR83] Hirschy, E., "Hermes: An Operating System for a Modula-2 Environment", Proc. ACM Conf. on Personal and Small Computers, San Diego, Dec. 1983, pp. 163-167.

[HOA78] Hoare, C.A.R., "Communicating Sequential Processes", CACM, 21, 8, Aug. 1978, pp. 666-677.

[LIS86] Liskov, B., et al., "Limitations of Synchronous Communication with Static Process Structure in Languages for Distributed Computing", Proc. 13th ACM Symp. Principles of Programming Languages, 1986, pp. 150-159.

[MAN81] Manning, E., Wong, J.W., Tokuda, H., et al., "Shoshin - A Testbed for Distributed Software", Proc. ICCC '81, June 1981, 25.5.1-25.5.5.

[SEI85] Seitz, C.L., "The Cosmic Cube", Comm. of the ACM, (28) Jan. 1985, pp. 22-33.

[TRA86] "Transputer Reference Manual", Inmos, 1986.

[WAT84] "Waterloo Port User's Guide", Waterloo Microsystems Inc., Waterloo, Feb. 1984, Edited by P.A. Didur, M.A. Malcolm, and P.A. McWeeny.