Architectural CCS Padmanabhan Krishnan
Department of Computer Science University of Canterbury, Private Bag 4800 Christchurch 1, New Zealand E-Mail:
[email protected]
Abstract
In this article, we discuss the effect of architectures on behaviours and on the notions of equivalence for CCS terms. Two types of architectures are considered, viz., shared memory systems and distributed memory systems. Processes can be logically migrated in shared memory systems, while in distributed memory systems processes are bound to a location and require explicit migration. Communication in shared memory systems can follow the CCS principle. A complete equational characterisation of the bisimulation equivalence induced by execution on multiprocessors requires an extended syntax, which captures the process of compiling and loading. To permit a realistic description of communication in distributed systems, asynchronous actions are required. Because of this, the complete axiomatisation of the bisimulation equivalence is more sensitive to the causal structure of the processes. A practical interpretation of the complete axiomatisation of the bisimulation equivalence is also given.
Keywords: Multiprocessors, Distributed systems, CCS
1 Introduction
A number of different types of parallel and distributed machines have been built. These include vector machines [Rus78], data driven/demand driven machines [TBH82], shared memory systems like the Sequent, and distributed memory systems ranging from a network of workstations to well structured architectures like the hypercube. Further parameters, such as heterogeneity and replication for fault-tolerance, add to the variation. Despite the variety of machines, a common factor is the presence of multiple processing elements and interconnections between them. For these machines to be used to their capacity, applications must be aware of the available resources, such as the number of processors and the cost of using the interconnecting elements. While these factors need not (and should not) be considered at the abstract specification level, they need to be addressed at the implementation level. For example, a compiler writer requires a detailed semantics for the architecture to design an efficient code generator. Our aim is to present a framework in which implementation dependent details can be introduced to an abstract specification. The use of our framework in program development is discussed in sections 2.2 and 4.

Process algebras such as ACP [BK88], CCS [Mil89] and CSP [Hoa85] have been used to describe concurrent systems. The relevance of these theories to real systems is illustrated by the ISO specification language LOTOS [BB89, vVD89]. The syntax of these calculi is simple and yet powerful enough to express parallelism, non-determinism and synchronisation. As they are abstract models, they do not address issues related to modelling the physical environment. Also, the bisimulation semantics of CCS [Mil89] reduces parallelism to non-determinism (or interleaved execution of actions: the expansion theorem for CCS). This is due to the presence of only one observer with the power to observe one action at a time.
There are other approaches which describe a non-interleaving semantics [Pra86, DDM88, CH89, KHCB91a] but do so only at a logical level. In this paper we provide a semantics for CCS which takes into account "physical available distribution". Given that there is no uniformity in the nature of distributed architecture, one needs to define "distributed". For that we consider the natural language meaning of the terms concurrent, distributed and parallelism. Concurrent is defined as "happening together in time or place"; distributed as "scattered or spread out over a surface" [LL90]; and parallelism as "without interactive causal relationship". In CCS concurrent has the interpretation of place or processor. Thus it is not surprising that parallelism, given one place, reduces to non-determinism. Our goal is to "scatter parallelism over a surface".

Given that parallelism is to be mapped over a surface, the question arises whether to allow the process to contain some information regarding the mapping. If so, another level of abstraction can be created. The mapping information available in the process can be considered to be a logical representation of the surface. One then studies the effect of mapping a logical surface to a physical surface. In distributed computation, one considers the representation of distribution, also called binding, and the mapping of binding to the physical surface, i.e., configuring. This view is indeed taken by the advocates of the virtual node approach to distributed computing [CM87]. Thus there are two stages in program construction, viz., a) distributed surface with logical binding and b) the configuration of distributed memory.

Our aim is to develop a calculus in which the following can be studied: a) expressing parallelism, b) describing the effect of parallelism and c) mapping the expressed parallelism to a physical environment. We use CCS as a language in which parallelism can be expressed. An operational semantics, as influenced by the type of hardware, is developed. We then show the limitation of the syntax in studying system development/programming issues and present an extended syntax. The theoretical reason for extending the syntax, viz., complete axiomatisation, is also discussed.

In this paper the effect of two main classes of architecture on the specification and execution of concurrent programs is studied. The architecture classes we consider are a) multiprocessor or shared memory systems and b) distributed memory systems. In section 2 a CCS-based semantics for multiprocessor systems is presented, while in section 3 a semantics for loosely coupled systems is presented. In section 4 we show how some of the issues in distributed software development can be accommodated in our framework.

[Figure 1: Multiprocessor Machine Model. Labels in the diagram: Output Lines, C1, C2, ..., Ck, ..., Cn, Clock, Loaded Processes.]
2 Multiprocessors
A multiprocessor system or shared memory system has a number of processors and a single memory unit which is accessed by the processors using a bus [JS80] (see figure 1). This machine model is similar to the Chemical Abstract Machine [BB92]. The Chemical Abstract Machine models processes as being suspended in a solution with the ability to interact with one another. The multiprocessor machine model can be considered to be a Chemical Abstract Machine with a bounded number of catalysts (processing elements) which are essential for any evolution. The machine and language described in [BCM88] form the basis for the semantics described here. In [BCM88] the main focus is on illustrating applications using the model and on the implementation of the model on a concrete architecture (based on transputer like machines). In this paper we develop a theory which can be applied to the implementation of concurrent systems.

We assume that the processors are homogeneous and that memory is uniformly accessible to all processors. This allows `logical migration', i.e., any process can use any processor without any relocation cost. Therefore our semantics generates all possible behaviours resulting from all possible process to processor allocations. Scalability (the ability to add more processors) and fault-tolerance (the ability to function after losing a processor) are important properties of a multiprocessor system. We consider the effect of altering the number of processors of a system on the behaviour exhibited by processes.

In this section we first present an operational semantics, based on labelled transition systems, for CCS processes which are to be executed on a shared memory system. The semantics is indexed by a finite number of processors. Based on the operational semantics, we develop a notion of bisimulation and relate the behaviours of processes on systems with different numbers of processing elements. We also discuss the issues related to a complete axiomatisation of the bisimulation equivalence.

As in CCS, we assume a set of actions $\Lambda$. As usual we assume $\bar{\ }$ to be a bijection on $\Lambda$ such that $\bar{\bar{a}} = a$. This can be extended to sets of actions, where for a set $S$, $\bar{S}$ represents element-wise application of $\bar{\ }$. Typical elements of $\Lambda$ are denoted by $a, b, \ldots$. A special action $\tau$ not in $\Lambda$ is used for synchronisation. We let $\mu$ range over $\Lambda \cup \{\tau\}$. The syntax of the language is as follows.

$$P ::= 0 \;\mid\; (\mu P) \;\mid\; (P + P) \;\mid\; (P \mid P) \;\mid\; (P \backslash H)$$

The process $0$ represents a terminated process and can exhibit no action, juxtaposition with $\mu$ denotes action prefix, `$+$' non-determinism, `$\mid$' parallel composition and `$\backslash H$' restriction to actions not in $H$. For the sake of simplicity we do not consider recursion. A structural operational semantics [Plo81] is defined as a generalisation of the rules for CCS.
We assume that the following black-box is a model of a multiprocessor system which runs a process given n processing elements. There is a `clock' line which, when toggled, advances each processing element by one step. The observer first toggles the `clock' and then notes the behaviour on the n output lines (which may appear at different times with respect to some real clock), and the process continues. This is shown in figure 1. The semantics developed here is similar to the step semantics developed in [vGV87], but the number of actions in a step is bounded. However, as will be seen later, we do not assume a synchronous model. Therefore, our semantics is different from SCCS [Mil83], where asynchronous evolution is disallowed.

Not all processors in the system may be required by a process at all steps. For example, if a system has two processors to execute $\mu P$, only one of them can execute the action $\mu$; the other will necessarily be idle. ($P$ may or may not be able to use both processors.) Let $\delta$ represent idling (of a processor) and let $Act = \Lambda \cup \{\tau, \delta\}$.

For the observations (the labels in the transition relation) we use n-tuples as opposed to multi-sets. This facilitates the requirement that synchronisation of processes occurs on the same processor. It would be unrealistic to assume synchronisation across different processing elements. This captures the intuition that synchronisation occurs at a location, with the processor representing the location. Using the Chemical Abstract Machine analogy, synchronisation can occur only by moving the processes physically close to one another. Architecturally, synchronisation across processors would require the bus to support a particular protocol. It would be unrealistic to encode such a protocol in the operational semantics. This is not to conclude that multisets cannot be used as observations. We shall see later that observing n-tuples while including multisets in the syntax allows us to distinguish between compilation and loading.
The first step in defining the operational semantics is to specify the possible observations.
Definition 1: Let $O_n$ denote the function space from $\{1, \ldots, n\}$ to $Act$ (i.e., the n-tuples over $Act$) and, for any $S \in O_n$, let $Actions(S) = codomain(S) - \{\delta\}$.

The intuition in using $O_n$ is that, given n processing elements, one can observe n actions at every step. $Actions(S)$ identifies all the non-idling actions of the observation $S$.

A single action can be exhibited on any of the n processors. The set of possible options is defined by the function Allocate, as follows.

Definition 2: $Allocate(\mu) = \{\, O \in O_n \mid \exists i,\ O(i) = \mu \text{ and } \forall j \neq i,\ O(j) = \delta \,\}$

Legal combinations of observations are defined as follows.

Definition 3: Define a partial function $+_n : O_n \times O_n \to O_n$ as follows: $O_1 +_n O_2 = O$ where, for every $x$,

$$O(x) = \begin{cases} O_1(x) & \text{if } O_2(x) = \delta \\ O_2(x) & \text{if } O_1(x) = \delta \\ \tau & \text{if } O_1(x) = \overline{O_2(x)} \end{cases}$$

As processes can compete for the processors, one has to define consistency of processor allocation. We assume that only one action can be exhibited by a processor at any time. As mentioned earlier, if two processes are attempting to synchronise, they are required to be on the same processor. As an element of $O_n$ represents observing n actions simultaneously, $+_n$ defines combining observations in a truly parallel fashion. The definition requires a processor to be idle with respect to one process if the other is to be able to use it, except in the case of synchronisation. If neither process uses a processor, it is idle in their combination. If both processes use the processor to exhibit actions which cannot be synchronised, their parallel combination is undefined.
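Definitions 2 and 3 can be illustrated with a small Python sketch. The encoding is an assumption made only for this illustration: observations are tuples of strings, `"idle"` stands for the idling action, `"tau"` for the synchronisation action, and a leading `~` marks the complement of a visible action. The names `allocate`, `plus_n` and `co` are hypothetical helpers, not part of the calculus.

```python
IDLE = "idle"  # assumed encoding of the idling action
TAU = "tau"    # assumed encoding of the synchronisation action


def co(action):
    """Complement of a visible action: 'a' <-> '~a'."""
    return action[1:] if action.startswith("~") else "~" + action


def allocate(action, n):
    """All n-tuples with `action` on exactly one processor and the others idle."""
    return [tuple(action if j == i else IDLE for j in range(n)) for i in range(n)]


def plus_n(o1, o2):
    """Combine two observations processor by processor.

    Returns None where the partial function is undefined, i.e. when both
    observations use the same processor for actions that cannot synchronise.
    """
    combined = []
    for x, y in zip(o1, o2):
        if y == IDLE:
            combined.append(x)
        elif x == IDLE:
            combined.append(y)
        elif x != TAU and y == co(x):
            combined.append(TAU)  # complementary actions on one processor
        else:
            return None
    return tuple(combined)


print(allocate("a", 2))
print(plus_n(("a", IDLE), (IDLE, "b")))
print(plus_n(("a", IDLE), ("~a", IDLE)))
print(plus_n(("a", IDLE), ("b", IDLE)))
```

The last call returns `None` because both observations claim the first processor with actions that are not complementary, which is the case the text describes as an undefined combination.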
Definition 4: The operational semantics is defined by a relation $\longrightarrow_n \;\subseteq\; Processes \times O_n \times Processes$, which is the smallest relation satisfying the axioms in figure 2. It describes the behaviour of processes when n processing elements are available.
A brief and informal explanation of the operational semantics is as follows. The elementary action can be executed on any of the processors and, due to sequentiality, all but one will be idle. We do not require a process to be fixed to a processor. If the machine architecture is to be exploited, the migration of processes to different processors has to be permitted. An atomic action can be considered to be the basic unit of scheduling. The process is preempted after executing a single action and returned to the pool of processes competing for the limited resources. Non-deterministic choice also has the usual definition; i.e., if a process can exhibit an action (or set of actions), so can its non-deterministic combination with other processes. The rules that determine the behaviour under parallel composition are as follows. The first requires the assignment of processors to P and Q to be compatible for the parallel composition to be successful. The second interleaves the execution. The rule for restriction is as usual; i.e., $(P \backslash H)$ cannot exhibit a behaviour in which any action in $H$ or in $\bar{H}$ is involved.

It is possible to impose a step optimal parallelism requirement (under a limited number of processors) by requiring that all possible processor assignments fail before applying the interleaving law. This would be the adaptation of the maximal parallelism model [SM82] to suit limited resources. For example, one could require that the only acceptable behaviour of $(a0 \mid b0)$ given 2 processors is executing them on different processors; interleaving is disallowed (i.e., "no unnecessary waiting" is modelled). Interleaving would have to be permitted for $(a0 \mid b0 \mid c0)$ given 2 processors. However, this results in the parallel operator not being associative, as shown in the example below.
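Prefix:
$$\mu P \xrightarrow{O}_n P \qquad \forall\, O \in Allocate(\mu)$$

Non-Determinism:
$$\frac{P \xrightarrow{S}_n P'}{P + Q \xrightarrow{S}_n P', \qquad Q + P \xrightarrow{S}_n P'}$$

Parallelism:
$$\frac{P \xrightarrow{S_1}_n P', \qquad Q \xrightarrow{S_2}_n Q', \qquad S = S_1 +_n S_2}{P \mid Q \xrightarrow{S}_n P' \mid Q'}$$

Interleaving:
$$\frac{P \xrightarrow{S}_n P'}{P \mid Q \xrightarrow{S}_n P' \mid Q, \qquad Q \mid P \xrightarrow{S}_n Q \mid P'}$$

Restriction:
$$\frac{P \xrightarrow{S}_n P', \qquad Actions(S) \cap (H \cup \bar{H}) = \emptyset}{P \backslash H \xrightarrow{S}_n P' \backslash H}$$

Figure 2: Operational Semantics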
Example 1: A possible behaviour for the process $(a0 \mid b0) \mid c0$ under 2 processors is the 2-tuple of a and b followed by c. However, $a0 \mid (b0 \mid c0)$ cannot exhibit this, as $(b0 \mid c0)$ can only exhibit the 2-tuple containing the actions b and c.

As this goes against the intuition of the parallel operator, the step optimal semantics is not adopted. In the following section we present a few results related to the presence of a limited number of processors.
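The adopted semantics of figure 2 (with interleaving always permitted) can be prototyped as follows. This is a minimal sketch under assumed encodings: processes are nested tuples tagged `"nil"`, `"pre"`, `"sum"`, `"par"` and `"res"`, observations are tuples of strings, and `"idle"`, `"tau"` and the `~` prefix for complements are illustrative conventions. It checks that, unlike in the step optimal variant, both bracketings of the three-way composition offer the same first-step observations on 2 processors.

```python
IDLE, TAU = "idle", "tau"   # assumed encodings of idling and synchronisation
NIL = ("nil",)


def co(a):
    return a[1:] if a.startswith("~") else "~" + a


def allocate(a, n):
    return [tuple(a if j == i else IDLE for j in range(n)) for i in range(n)]


def plus_n(o1, o2):
    out = []
    for x, y in zip(o1, o2):
        if y == IDLE:
            out.append(x)
        elif x == IDLE:
            out.append(y)
        elif x != TAU and y == co(x):
            out.append(TAU)
        else:
            return None          # incompatible use of one processor
    return tuple(out)


def steps(p, n):
    """Set of (observation, successor) pairs of process p on n processors."""
    kind = p[0]
    if kind == "nil":
        return set()
    if kind == "pre":                                   # Prefix
        return {(o, p[2]) for o in allocate(p[1], n)}
    if kind == "sum":                                   # Non-Determinism
        return steps(p[1], n) | steps(p[2], n)
    if kind == "par":
        left, right = steps(p[1], n), steps(p[2], n)
        result = {(s, ("par", l2, p[2])) for s, l2 in left}    # Interleaving
        result |= {(s, ("par", p[1], r2)) for s, r2 in right}
        for s1, l2 in left:                                    # Parallelism
            for s2, r2 in right:
                s = plus_n(s1, s2)
                if s is not None:
                    result.add((s, ("par", l2, r2)))
        return result
    if kind == "res":                                   # Restriction
        hidden = set(p[2]) | {co(a) for a in p[2]}
        return {(s, ("res", q, p[2])) for s, q in steps(p[1], n)
                if not hidden & set(s)}
    raise ValueError(kind)


def labels(p, n):
    return {s for s, _ in steps(p, n)}


a0, b0, c0 = (("pre", x, NIL) for x in "abc")
left = ("par", ("par", a0, b0), c0)
right = ("par", a0, ("par", b0, c0))
print(labels(left, 2) == labels(right, 2))
```

Restriction sets are expected to be `frozenset` values so that processes stay hashable. The sketch compares only first-step labels; a full comparison would use the bisimulation defined in the next section.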
2.1 Resource Induced Properties
The first result is that, as only the parallel operator introduces multiple observations, a process P that exhibits k non-idling actions must be composed of at least k parallel processes. To state this we use the obvious generalisation of the observational preorder introduced in [Mil89]. Intuitively, if P is less than Q in the preorder, Q can exhibit all behaviours of P. This is defined formally as follows.
Definition 5: $P \sqsubseteq_n Q$ iff $\forall S \in O_n$, $P \xrightarrow{S}_n P'$ implies $\exists Q'$: $Q \xrightarrow{S}_n Q'$ and $P' \sqsubseteq_n Q'$.

We are also interested in a bisimulation semantics (written as $\sim_n$) for the n-indexed multiprocessor semantics. We use the established technique [Mil89] and define it as follows:

Definition 6: Let $\sim_n$ be the largest equivalence relation such that $P \sim_n Q$ iff for all $S$ in $O_n$:
- whenever $P \xrightarrow{S}_n P'$, there exists $Q'$ such that $Q \xrightarrow{S}_n Q'$ and $P' \sim_n Q'$;
- whenever $Q \xrightarrow{S}_n Q'$, there exists $P'$ such that $P \xrightarrow{S}_n P'$ and $P' \sim_n Q'$.

Our semantics is a generalisation of the standard CCS semantics, obtained by explicitly considering the number of processors in the system. Clearly, if there is only one processor in the system, the standard behaviour must be exhibited. This is indeed the case.
Proposition 1: $\sim_1 \;=\; \sim_{CCS}$.
Proof: It is easy to verify that $\longrightarrow_1$ is identical to the transition rules for CCS. □

Lemma 1: If $P \xrightarrow{S}_n P'$ and the number of non-idling actions of $S$ (i.e., the cardinality of $Actions(S)$) is greater than 1, then there exist: 1) processes $P_1$, $P_2$ and $P_3$, 2) observations $S_1$ and $S_2$ and 3) a subset $H$ (possibly empty) of $\Lambda$ such that:
1) $P_1 \xrightarrow{S_1}_n P_1'$,
2) $P_2 \xrightarrow{S_2}_n P_2'$,
3) $S_1 +_n S_2 = S$ and
4) either $(P_1' \mid P_2' \mid P_3) \sim_n P'$ (hence $H$ is the empty set) or $((P_1' \mid P_2' \mid P_3) \backslash H) \sim_n P'$.

Proof: By induction on the structure of the process P. Let P be $(R_1 \mid R_2)$. In this case $H$ will be the empty set. If both $R_1$ and $R_2$ contribute to form $S$, then $P_1$ is $R_1$, $P_2$ is $R_2$ and $P_3$ is $0$. If only one evolved, say $R_1$, then by the induction hypothesis there are $R_{11}$, $R_{12}$ and $R_{13}$ such that $R_{11} \xrightarrow{S_1}_n R_{11}'$ and $R_{12} \xrightarrow{S_2}_n R_{12}'$ and $R_{11}' \mid R_{12}' \mid R_{13} \sim_n R_1'$. Now $P' \sim_n (R_1' \mid R_2)$. Then letting $P_1$ be $R_{11}$, $P_2$ be $R_{12}$ and $P_3$ be $(R_{13} \mid R_2)$ satisfies the condition. If P is of the form $(R_1 \mid R_2) \backslash H_1$, an argument similar to the above can be constructed, but with $H$ being non-empty. For this argument we equate $H$ to $H_1$. □

Note that in the above result we do not derive the structure of P. In general P could have made various choices, and one has to introduce choices at every point where an action prefix occurs. For example, consider the process
$$((((aP_1 + P_1') \mid (cQ_1 + Q_1')) + R_1') \mid (((bP_2 + P_2') \mid (dQ_2 + Q_2')) + R_2') \mid P_3) + P_4'$$
under 4 processors and the observation $\langle a, b, c, d \rangle$. Also assume that the processes $P_1'$, $P_2'$, $P_4'$, $Q_1'$, $Q_2'$, $R_1'$ and $R_2'$ do not contribute to the observation. Given a four action observation, one can conclude that the derived process will be of the form $(P_1 \mid Q_1 \mid P_2 \mid Q_2 \mid P_3)$. The process that generated the observation can be of various forms.

As our operational semantics is indexed by the number of available processors, we can study the effect on observable behaviour of adding and removing processors from a multiprocessor system. It is easy to see that if a set of actions is exhibited by a process, any non-idling subset of it can also be exhibited.
This is because we have not assumed any notion of optimality in the operational semantics. This is stated formally in proposition 2.

Definition 7: Let $R, S \in O_n$. Define $R \preceq S$ iff there is a 1-1 map $F$ on $\{1, \ldots, n\}$ such that $\forall i$, $R(i) \neq \delta$ implies $R(i) = S(F(i))$, i.e., $S$ observes more actions but with possibly different processor usage.

Proposition 2: If $P \xrightarrow{S}_n P'$, $R \preceq S$ and $\exists i$, $R(i) \neq \delta$, then $\exists P''$ such that $P \xrightarrow{R}_n P''$.
Proof: By structural induction. □
In general, a behaviour using $n + 1$ processors can be used to predict the behaviour which can use only n processors. All observations involving $n + 1$ non-idling actions have to be discarded. It is easy to see that if two processes are related by the preorder under $n + 1$ processors, they will be related under n processors.
Proposition 3: $P \sqsubseteq_{n+1} Q$ implies $P \sqsubseteq_n Q$.
Proof: From proposition 2.
As adding more resources can expose `true concurrency', $P \sqsubseteq_{n+1} Q$ does not necessarily hold if $P \sqsubseteq_n Q$. For example, $(a0 \mid b0) \sqsubseteq_1 (ab0 + ba0)$, but $(a0 \mid b0) \not\sqsubseteq_2 (ab0 + ba0)$. However, if the process on the right is the `more parallel one', the result holds.
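The counterexample above can be checked mechanically for finite processes. The following sketch is self-contained and covers only prefix, choice and parallel composition (restriction is omitted for brevity). The tuple encoding of processes, the strings `"idle"` and `"tau"`, and the function names `steps` and `below` are assumptions of this illustration.

```python
IDLE, TAU = "idle", "tau"   # assumed encodings of idling and synchronisation
NIL = ("nil",)


def co(a):
    return a[1:] if a.startswith("~") else "~" + a


def plus_n(o1, o2):
    out = []
    for x, y in zip(o1, o2):
        if y == IDLE:
            out.append(x)
        elif x == IDLE:
            out.append(y)
        elif x != TAU and y == co(x):
            out.append(TAU)
        else:
            return None
    return tuple(out)


def steps(p, n):
    """(observation, successor) pairs for prefix, choice and parallel terms."""
    kind = p[0]
    if kind == "nil":
        return set()
    if kind == "pre":
        return {(tuple(p[1] if j == i else IDLE for j in range(n)), p[2])
                for i in range(n)}
    if kind == "sum":
        return steps(p[1], n) | steps(p[2], n)
    left, right = steps(p[1], n), steps(p[2], n)          # kind == "par"
    result = {(s, ("par", l2, p[2])) for s, l2 in left}
    result |= {(s, ("par", p[1], r2)) for s, r2 in right}
    for s1, l2 in left:
        for s2, r2 in right:
            s = plus_n(s1, s2)
            if s is not None:
                result.add((s, ("par", l2, r2)))
    return result


def below(p, q, n):
    """Preorder check: every step of p is matched by an equally labelled step of q."""
    return all(
        any(s == t and below(p2, q2, n) for t, q2 in steps(q, n))
        for s, p2 in steps(p, n)
    )


a0, b0 = ("pre", "a", NIL), ("pre", "b", NIL)
parallel = ("par", a0, b0)                                  # a0 | b0
sequential = ("sum", ("pre", "a", b0), ("pre", "b", a0))    # ab0 + ba0
print(below(parallel, sequential, 1), below(parallel, sequential, 2))
```

With one processor the parallel term can only interleave, so every step is matched; with two processors it can exhibit the pair of a and b in a single step, which the sequential term cannot match. The recursion terminates because the processes are finite.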
Lemma 2: If Q is a process not involving $+$, $P \sqsubseteq_n Q$ implies $P \sqsubseteq_{n+1} Q$.
Proof: Let Q have no $+$, $P \sqsubseteq_n Q$ but $P \not\sqsubseteq_{n+1} Q$. As $P \not\sqsubseteq_{n+1} Q$, either
- there is a transition $P \xrightarrow{S}_{n+1} P'$ and Q has no transition labelled by $S$, or
- there exists $Q'$ such that $Q \xrightarrow{S}_{n+1} Q'$ and $P' \not\sqsubseteq_{n+1} Q'$.
Consider the first case. It is clear that the cardinality of $S$ is $n + 1$ (if less than $n + 1$ it violates $P \sqsubseteq_n Q$). Thus, by lemma 1, $S$ is composed of $S_1$ and $S_2$ such that $P \xrightarrow{S_1}_{n+1}$ and $P \xrightarrow{S_2}_{n+1}$. If the cardinality of $S_1$ and $S_2$ is less than $n + 1$, $Q \xrightarrow{S_1}_n$ and $Q \xrightarrow{S_2}_n$. If Q cannot exhibit $S$, then either 1) $S_1 + S_2$ is not defined, which is not the case, or 2) there is a choice between $S_1$ and $S_2$, in which case Q has a $+$. However, if the cardinality of both $S_1$ and $S_2$ is $n + 1$ (hence $S$ consists of $n + 1$ $\tau$ actions), some sub-term in P has to produce an $S_1$ and another sub-term has to produce $S_2$, which will consist of all the complementary actions of $S_1$. Hence $S_1$ cannot contain any $\tau$ action. By applying lemma 1 to the appropriate sub-terms, the desired result can be proven. The second case can be handled by an induction argument on the size of P and Q. □

This completes the discussion of finite resources in a multiprocessor environment. In the next section we discuss a practical issue, `compilation', and a theoretical issue, complete axiomatisation. We present a limitation of the syntax of CCS and show how it can be overcome.
2.2 Compilation and Axiomatisation
The practical issue we consider is that of compiling for multiprocessors. The operational semantics for CCS on multiprocessors essentially encodes an "on-line" scheduler. For example, executing the process $aQ$ (call it P) involves selecting an arbitrary processor and allocating it to the process P. Similarly the process $(P_1 \mid P_2)$ is scheduled by assigning processors to $P_1$ and $P_2$. If there is an overlap of processors, the processes are required to synchronise. We show how the on-line scheduling can be compiled. The relevance of compilation leads us to the theoretical issue of a sound and complete axiomatisation of the bisimulation semantics for multiprocessor CCS.

As a given process could exhibit more parallelism than the available number of processors, some form of interleaving is necessary. Consider for example the behaviour of the process (say P) $(a0 \mid b0 \mid c0)$ given 2 processors. Two possible observations are two non-idling actions (say `a' and `c') followed by `b', and the action `a' followed by the 2-tuple of `b' and `c'. The various observations represent choices and we believe we should be able to characterise them in our framework. This is related to the expansion theorem in CCS (i.e., reduction of parallelism to non-determinism). For example, $(a0 \mid b0) \sim_{CCS} (ab0 + ba0)$. As the CCS semantics is a special case of the n-processor semantics, one would expect a similar law for the n-processor case. The expansion theorem could be expected to be a reduction of a process which can exhibit $n + 1$ actions, but is given only n processors, to a process which can exhibit only n actions. Unfortunately that is not the case.
We can show that if the process P considered above is related to a term T under $\sim_2$, T can exhibit all 3 actions in one step given 3 processors. The intuitive argument is as follows. Assume T cannot exhibit the 3 actions in one step. As P can exhibit a and evolve to the process $(b0 \mid c0)$, T could involve terms such as $((ab0) \mid c0)$ or $(a(b0 \mid c0))$. Terms of the first type are disallowed as they can exhibit the action c and evolve to the process $(ab0)$, but no c evolution of P is bisimilar to $(ab0)$. A term of the second type is not sufficient as P can exhibit the actions a and b in one step. The lack of an expansion theorem for P can formally be stated as follows.

Proposition 4: Let $P = (a0 \mid b0 \mid c0)$. If $P \sim_2 Q + R$, then either $P \sim_2 Q$ or $P \sim_2 R$.

The reason for this result is that the $\mid$ combinator does not force both its branches to evolve. As the transition rule for parallel composition permits interleaving, it is impossible to force a process to exhibit multiple actions at a particular step. This is mainly due to the on-line scheduling algorithm used implicitly in the operational semantics. The principal problem is that the combinator $\mid$ is too `liberal'. It permits the exhibition of any non-empty subset of the actions that can possibly be exhibited in one step. Therefore it is essential to have a construct which forces multiple actions to be performed in one step. That is, we need to consider terms which have been partially scheduled for execution. For this we alter a single action prefix to a multi-set prefix. A multi-set captures multiple actions that should occur in one step. Interleaving of the actions within a multi-set is not permitted. The compiler translates a CCS process into a process with multi-set prefixes. For example, $(a0 \mid b0)$ can be considered to be an abbreviation for $ab0 + ba0 + \{a,b\}0$. If there is only one processor, the prefix $\{a,b\}$ cannot contribute to the behaviour and $(a0 \mid b0)$ is equivalent to $ab0 + ba0$.
Similarly, $(a0 \mid b0 \mid c0)$ can be thought of as $a(b0 \mid c0) + b(a0 \mid c0) + c(a0 \mid b0) + \{a,b\}c0 + \{a,c\}b0 + \{b,c\}a0 + \{a,b,c\}0$, and if there are only 2 processors, $\{a,b,c\}$ will not contribute to the behaviour. Thus a multi-set prefix represents `forced' parallelism and captures partial scheduling of parallel processes. It is only partial as it is possible that the cardinality of the multi-set is greater than the number of available processors. In such a case no evolution is possible. Eliminating such terms could be perceived as rejection of the alternatives at loading time. In the rest of this section we formalise the above notions and show that if the language permits a multi-set prefix, the resulting bisimulation equivalence for finite processes can be completely axiomatised. We also assume that the number of processors is fixed ($n \geq 1$).
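The one-level expansion described above can be sketched for the simple case of a parallel composition of atomic processes with no complementary actions (so no synchronisation arises). The function name `expand` and the representation of a summand as a pair (multi-set prefix, remaining parallel actions) are assumptions of this illustration.

```python
from itertools import combinations


def expand(actions, n):
    """One-level expansion of (x1 0 | x2 0 | ...) into multi-set prefixed summands.

    Each summand is (prefix, rest): the actions forced to occur in one step and
    the actions that remain in parallel afterwards. Prefixes with more than n
    actions are dropped, mirroring their rejection at loading time.
    """
    summands = []
    indices = range(len(actions))
    for size in range(1, len(actions) + 1):
        if size > n:
            break  # such a prefix could never be scheduled on n processors
        for chosen in combinations(indices, size):
            prefix = tuple(actions[i] for i in chosen)
            rest = tuple(actions[i] for i in indices if i not in chosen)
            summands.append((prefix, rest))
    return summands


for prefix, rest in expand(["a", "b", "c"], 2):
    print(prefix, rest)
```

For three actions there are seven summands when three processors are available and six when only two are available, since the prefix containing all three actions is then discarded.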
Definition 8: Define a multi-set $m$ as a function $m : (\Lambda \cup \{\tau\}) \to \mathbb{N}$. Define the cardinality of a multi-set $m$, $|m|$, as $\sum_{a \in \Lambda \cup \{\tau\}} m(a)$, where $\sum$ indicates integer addition.
The following is the syntax for a multiprocessor language whose bisimulation semantics is axiomatised.

$$P ::= 0 \;\mid\; msP \;\mid\; (P \mid P) \;\mid\; (P + P) \;\mid\; (P \backslash H)$$

The only difference from the initial language is that the action prefix ($\mu$) is replaced by a multi-set prefix ($ms$). The semantics of an atomic action permitted the use of any of the available processors. Similarly, the semantics of a multi-set of actions permits any possible assignment of processors to the actions. The multi-set prefix introduces another level of scheduling. Given a multi-set, an allocation of actions to processors is required. This is defined by the function Assign, which is a generalisation of Allocate. Given an empty set, all the processors in the system are idle and that is the only possible assignment. Given an assignment of k actions, the (k+1)st action can be scheduled on any of the idle processors. Complementary actions within a multi-set prefix cannot synchronise with one another. For example, if $m$ is a multi-set such that $m(a) = 1$ and $m(\bar{a}) = 1$, Assign will require at least two processors to execute it. This does not imply that $(a0 \mid \bar{a}0)$ cannot synchronise. The above process will be translated to $\{a, \bar{a}\}0 + \tau 0 + a\bar{a}0 + \bar{a}a0$. In it the first component represents simultaneous evolution without synchronisation. Definition 11 formalises the intuitive meaning.

Multi-set Prefix:
$$\frac{S \in Assign(ms)}{msP \xrightarrow{S}_n P}$$

Figure 3: Operational Semantics for Multiset Prefix
Definition 9: Assume a fixed n. Assign is the smallest set satisfying the following:
- $Assign\ \emptyset = \{ \langle \delta, \ldots, \delta \rangle \}$
- If $Y \in Assign(m)$ and $Y(i) = \delta$ and
$$X(j) = \begin{cases} a & \text{if } j = i \\ Y(j) & \text{otherwise} \end{cases} \qquad \text{and} \qquad m'(\mu) = \begin{cases} m(\mu) + 1 & \text{if } \mu = a \\ m(\mu) & \text{otherwise} \end{cases}$$
then $X \in Assign\ m'$.

Given an observation, the multiset that gave rise to it can be obtained by the function $Assign^{-1}$, defined as follows.

Definition 10: $Assign^{-1}(S) = m$ such that $m(a) = cardinality(\{ i \mid S(i) = a \})$.

As Assign permits all possible allocations of actions to processors, the following hold.
Proposition 5: If $S \in Assign(m)$ and $S'$ is a permutation of $S$ then $S' \in Assign(m)$.

Proposition 6: If $S \in Assign(m)$ then $Assign^{-1}(S) = m$.

Example 2: Consider a 2 processor system. If $m(a) = 1$, $m(b) = 1$ and $m(c) = 0$ for $c$ other than $a$ or $b$, then $Assign\ m = \{ \langle a, b \rangle, \langle b, a \rangle \}$ and $Assign^{-1}(\langle a, b \rangle) = \{a, b\}$.

The semantics of the multiset prefix ($msP$) is defined by inference and given in figure 3.
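Definitions 9 and 10 translate directly into code. The sketch below follows the inductive reading of Assign (start from the all-idle tuple and place one action at a time on an idle processor). Representing multi-sets with `collections.Counter`, the string `"idle"` for the idling action, and the names `assign` and `assign_inv` are assumptions of this illustration.

```python
from collections import Counter

IDLE = "idle"  # assumed encoding of the idling action


def assign(m, n):
    """All n-tuples that schedule the multi-set m (a Counter) on n processors."""
    tuples = {(IDLE,) * n}               # Assign of the empty multi-set
    for action in m.elements():          # add one action at a time
        tuples = {
            y[:i] + (action,) + y[i + 1:]
            for y in tuples
            for i in range(n)
            if y[i] == IDLE              # only idle processors can be used
        }
    return tuples


def assign_inv(s):
    """Recover the multi-set of non-idling actions from an observation."""
    return Counter(x for x in s if x != IDLE)


m = Counter(["a", "b"])
print(assign(m, 2))
print(assign_inv(("a", "b")) == m)
print(assign(Counter(["a", "~a"]), 1))
```

The last call yields the empty set: a multi-set with two actions cannot be scheduled on a single processor, which matches the remark that complementary actions inside one prefix do not synchronise.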
The transition rules for the other constructs are as before. Although it is possible to give a more direct and simpler definition of $Assign^{-1}$, we adopt this approach to make the process of scheduling (i.e., converting a multi-set into an n-tuple) explicit. Define strong bisimulation equivalence for the language as before.

The principal aim of considering a language with multi-set prefixes is to be able to have an axiomatisation of bisimulation. To do this we need a generalisation of the expansion theorem. The CCS version needs to be generalised not only to handle multiset prefixes but also to combine multiset prefixes from two processes to form another prefix. Towards that aim we define the functions Combine and Choice. $Combine\ m_1\ m_2$ is the set of all possible behaviours that can result from exhibiting the multisets $m_1$ and $m_2$ in one step. Choice is used by Combine to synchronise two elements to exhibit $\tau$.
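Definition 11: Combine of two multisets is the smallest set satisfying the following conditions.
- $Combine\ \emptyset\ m_1 = Combine\ m_1\ \emptyset = \{ m_1 \}$
- Let $(m \upharpoonright d)$ denote the function obtained from $m$ by restriction of the domain to $d$. If $m_1(a) = k_1$ and $m_2(\bar{a}) = k_2$ with $k_1 > 0$ or $k_2 > 0$ then $Combine\ m_1\ m_2 = \{\, S \cup D$ where $S \in Combine\ m_1'\ m_2'$, $m_1' = m_1 \upharpoonright (dom(m_1) - \{a\})$, $m_2' = m_2 \upharpoonright (dom(m_2) - \{\bar{a}\})$ and $D \in Choice\ m_1(a)\ m_2(\bar{a})\ a \,\}$
- $Choice\ k_1\ k_2\ a = \{\, \{ \langle \tau, i \rangle, \langle a, k_1 - i \rangle, \langle \bar{a}, k_2 - i \rangle \}$ where $0 \leq i \leq \min(k_1, k_2) \,\}$

$$\begin{array}{ll}
P + P = P & P + 0 = P \\
(P + Q) + R = P + (Q + R) & (P + Q) \backslash a = (P \backslash a) + (Q \backslash a) \\
(mP) \backslash a = m(P \backslash a) \ \text{ if } m(a) \text{ and } m(\bar{a}) = 0 & (mP) \backslash a = 0 \ \text{ if } m(a) \text{ or } m(\bar{a}) \neq 0 \\
(mP) = 0 \ \text{ if } |m| > n & 0 \backslash a = 0
\end{array}$$

Figure 4: Equations for Multiprocessors

If $P = \sum_{i \in I} m_i P_i$ and $Q = \sum_{j \in J} m_j Q_j$ and $C_{i,j} = Combine\ m_i\ m_j$ then
$$(P \mid Q) = \sum_{i \in I} m_i (P_i \mid Q) + \sum_{j \in J} m_j (P \mid Q_j) + \sum_{i} \sum_{j} \sum_{m \in C_{i,j}} m (P_i \mid Q_j)$$

Figure 5: Expansion Theorem for Multiprocessors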
Two multisets can be combined to yield all possible synchronisations (including none). For example, combining $\{a, b\}$ and $\{\bar{a}, \bar{b}\}$ can result in $\{a, b, \bar{a}, \bar{b}\}$, $\{a, \tau, \bar{a}\}$, $\{b, \tau, \bar{b}\}$ or $\{\tau, \tau\}$. The first represents no synchronisation, the second the synchronisation of b, the third the synchronisation of a and the fourth the synchronisation of both a and b. Not all combinations may contribute to legal behaviour. In the above example, if there are only 2 processors, only the last combination can be observed. Note that in the CCS case, actions can only be combined to yield a set of cardinality 1, viz., only $\tau$ is legal.

We should remark that the multiset prefix could have been replaced by a tuple-prefix without affecting the completeness results. For example, $\{a, b\}P$ (which is a multiset prefix) can be represented as $(\langle a, b \rangle P + \langle b, a \rangle P)$ in the tuple-prefix form. The tuple-prefix representation does not require the auxiliary definitions Assign, Combine and Choice. However, the representation is more concrete than the multiset form. Given the usefulness of multisets for multiprocessor systems [BCM88], we use the multiset prefix.
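The combination of two multi-set prefixes can be sketched as follows. This is an illustrative reading of Combine, not a transcription of Definition 11: for every visible action of the first multi-set it chooses how many copies synchronise with the complementary action of the second, and replaces each synchronised pair by one `"tau"`. Multi-sets as `collections.Counter`, the `~` prefix for complements and the function names are assumptions.

```python
from collections import Counter
from itertools import product

TAU = "tau"  # assumed encoding of the synchronisation action


def co(a):
    return a[1:] if a.startswith("~") else "~" + a


def combine(m1, m2):
    """All multi-sets obtainable by exhibiting m1 and m2 in one step."""
    # Actions of m1 that have at least one complementary partner in m2.
    names = [x for x in m1 if x != TAU and m2[co(x)] > 0]
    # For each such action, 0 .. min(k1, k2) pairs may synchronise.
    ranges = [range(min(m1[x], m2[co(x)]) + 1) for x in names]
    results = []
    for counts in product(*ranges):
        m = m1 + m2
        for x, i in zip(names, counts):
            m[x] -= i
            m[co(x)] -= i
            m[TAU] += i
        m = +m                      # drop entries whose count fell to zero
        if m not in results:        # different choices may coincide
            results.append(m)
    return results


m1, m2 = Counter(["a", "b"]), Counter(["~a", "~b"])
for m in combine(m1, m2):
    print(sorted(m.elements()))
print([m for m in combine(m1, m2) if sum(m.values()) <= 2])
```

The final line keeps only the combinations that fit on 2 processors, which for this example leaves the multi-set of two synchronisations, in line with the remark in the text.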
2.2.1 Completeness
Having defined the auxiliary functions, we can now present a set of axioms which completely axiomatise bisimulation equivalence for multiset prefix CCS. As the operational semantics was defined for a fixed n, the set of axioms also assumes a fixed n. The proof technique used for CCS [Mil89] (pages 160-165) is adequate. That is, we define a normal form and show that all finite processes can be reduced to normal form. Using an absorption lemma we show that the set of axioms is complete. Consider the equations defined in figure 4 (the usual axioms) and figure 5 (the expansion theorem). Note that the expansion theorem corresponds to the compilation of a parallel process for a multiprocessor system.
Proposition 7: The set of axioms is sound; that is, $P = Q$ implies that $P \sim_n Q$.
Proof: Standard.

The proof of completeness involves the definition of a normal form, followed by showing that all processes can be proved to have a normal form. If two processes are bisimilar, they can be proved to have identical normal forms. The proofs are only outlined as the proof techniques are well known.
Definition: 12 Define a process to be in normal form if it is of the form $\sum_{i \in I} m_i P_i$, where each $P_i$ is in normal form and, for all $i$, $|m_i| \le n$.

Proposition 8 All processes can be reduced to normal form using the equational rules.
Proof: By induction on the size of the process. $\Box$

Lemma 3 (Absorption Lemma) Let $P$ be in normal form. If $P \xrightarrow{S}_n P'$ and $P' = Q$ then $P + m_1 Q = P$ where $m_1 = Assign^{-1}(S)$.
Proof: Let $P = \sum_{i \in I} m_i P_i$. If $P \xrightarrow{S}_n P'$ then $\exists i$, $S \in Assign(m_i)$ with $P'$ syntactically identical to $P_i$. Hence $P + m_i Q = P$. $\Box$

Lemma 4 The set of axioms is complete; i.e., $P \sim_n Q$ implies $P = Q$.
Proof: It is sufficient to consider only normal forms, as all processes can be reduced to normal form. We prove the result by induction on the length of the normal forms. Let $P = \sum_{i \in I} m_i P_i$ and $Q = \sum_{j \in J} m'_j Q_j$ such that $P \sim_n Q$. We show that this implies $P = P + Q = Q$. To prove $P + Q = P$, it is sufficient to show $\forall j$, $P + m'_j Q_j = P$. As $P \sim_n Q$, there is an $m_i$ equal to $m'_j$ with $P_i \sim_n Q_j$. Furthermore, by the induction hypothesis, $P_i = Q_j$. Therefore, from the absorption lemma, $P + m'_j Q_j = P$. $\Box$
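The role of normal forms and absorption in the completeness argument can be illustrated with a small Python sketch. It rests on assumptions that are not part of the paper: a finite normal form is encoded as a `frozenset` of (multiset prefix, continuation) pairs, so the usual axioms for `+` (commutativity, associativity, idempotence) hold by construction, and the multiset prefix itself is used as the transition label, which abstracts away from Assign. The names `prefix`, `plus` and `bisimilar` are hypothetical.

```python
NIL = frozenset()  # the inactive process 0: a sum with no summands

def prefix(m, p):
    """Multiset prefix m P, encoded as a normal form with one summand."""
    return frozenset({(tuple(sorted(m)), p)})

def plus(p, q):
    """Choice P + Q; set union makes + commutative, associative and idempotent."""
    return p | q

def bisimilar(p, q):
    """Strong bisimilarity on finite normal forms, checked recursively."""
    def simulated(x, y):
        # Every summand of x is matched by a summand of y that has the same
        # prefix and a bisimilar continuation.
        return all(
            any(m == k and bisimilar(x1, y1) for k, y1 in y)
            for m, x1 in x
        )
    return simulated(p, q) and simulated(q, p)
```

In this encoding absorption is immediate: if `P` already contains the summand `m Q`, then `plus(P, prefix(m, Q)) == P`, and two finite normal forms are bisimilar exactly when they are equal as nested sets.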
The axiomatisation of the bisimulation equivalence required the introduction of multiset prefixes. The analogy between the expansion theorem for CCS and for multiprocessor CCS is that in CCS the parallel operator $\mid$ was translated to choice with action prefix, while in multiset CCS $\mid$ was translated to choice with multiset prefix. The process of compilation can be captured as follows. If a process (say P) is replaced by a bisimilar process (say Q) such that Q does not involve the $\mid$ operator, Q represents the compiled version of P. We note that the syntax of CCS with multiset labels on the transition rules suffices for the study of observable behaviour of multiprocessor systems. Multiset prefixes are necessary when other issues such as a complete axiomatisation are considered. This concludes the semantics for multiprocessor systems. In the next section the semantics for loosely coupled systems is discussed.
3 Distributed Systems
Distributed systems do not share memory and hence the location of a process in the system is important (see Figure 6). Boxes labelled L1, L2, etc. represent subsystems consisting of a processor and memory. Each process is loaded into some memory subsystem and hence can only use the processor associated with that subsystem. If a process has to access another processor, it must be explicitly relocated. As both binding and configuration are to be considered, an extension to CCS is necessary. A notion of location has been introduced in CCS [KHCB91a] to study the distributed nature of processes. Their primary concern is the logical construction of processes without considering the architecture the process is executing on. The idea of location has also been used in other languages [Hud86]. We use the same syntax as in [KHCB91b] but give a different semantics.

[Figure 6: Distributed System Model. Locations L1, L2 and L3 connected by a communication medium.]

In the semantics for CCS, any two processes can synchronise. This cannot be permitted in the distributed case, as a local transition depending on behaviour at a remote site is unrealistic. Consider, for example, the CCS process $((a0 + b0) \mid (\bar{a}0 + \bar{b}0)) \setminus \{a, b\}$. If the two parallel processes are physically distributed, the decision to execute action $a$ or action $b$ has to be taken in co-ordination with action $\bar{a}$ or action $\bar{b}$ respectively. If a general CCS process is to be executed in a physically distributed environment, a non-trivial protocol to effect these decisions is essential. Therefore deriving a distributed implementation from the operational semantics is not straightforward. It is for a similar reason that occam-2 [INM88] disallows output guards in the alt statement.
For the distributed case, a protocol based on the send/receive primitives is more appropriate. This requires a definition of buffers, where the messages sent are stored before being actually `used' by the process. To this end we assume special actions such as $<loc, a>$, which indicates the sending of action a to location loc. This message is buffered on location loc by creating a process which can engage only in action `a'. As we do not limit the number of parallel terms in a process, unbounded buffering is modelled.
Definition: 13 Let $Lact$, the set of local actions, be identical to the set of actions (other than $\tau$) used in the preceding sections, and let $L$ be a finite set of locations. Define a set of sending actions $Sact$ (represented by $<loc, a>$) for $loc \in L$ and $a \in Lact$. The set of actions is $Act = Lact \cup Sact \cup \{\tau, \mathit{idle}\}$, where $\mathit{idle}$ is the idle action. The basic syntax for our language (with $\mu$ being an element of $Act - \{\mathit{idle}\}$, $loc$ a location, and $H$ a subset of $Lact$) is as follows:

$P ::= 0 \;\mid\; \mu P \;\mid\; (P \mid P) \;\mid\; (P + P) \;\mid\; (P \setminus H) \;\mid\; (loc :: P) \;\mid\; rec(X : E)$

The two new constructs for the loosely coupled system are $(loc :: P)$, which restricts the execution of P to location loc, and $rec(X : E)$, which defines a recursive process. To obtain well defined terms we require that terms be closed and that recursion be guarded. The main purpose of recursive terms is to illustrate the encoding of simple processes in the modified syntax (see section 3.1). Issues such as axiomatisation will still be discussed within the context of finite processes.
Given that a process has location information, an observation is now a pair consisting of action and location. As processes on different locations can execute in parallel, behaviour is characterised by a set of observations, with the restriction that processes can exhibit multiple actions only if they are at different locations. Processes can synchronise if they are at the same location. We define an operational semantics which is similar to the multiprocessor case; the difference is that instead of using tuples we index the observations by the locations involved. The following definitions are used in the operational semantics, which is defined in Figure 7.
Definition: 14 Let $O_L$ represent the function space $L \to Act$. This is used by the operational semantics to identify the action observed at any location.
Definition: 15 Define the transition relation $\to_D \subseteq Processes \times O_L \times Processes$. Also define two projection functions, Location and Action, which return the location and action parts of an observation respectively.
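Definition: 16 Define a partial function $+_D$ on $O_L \times O_L \to O_L$ as follows: $S_1 +_D S_2 = S$ where for every $loc$

$$S(loc) = \begin{cases} S_1(loc) & \text{if } S_2(loc) = \mathit{idle} \\ S_2(loc) & \text{if } S_1(loc) = \mathit{idle} \\ \tau & \text{if } S_1(loc) = \overline{S_2(loc)} \end{cases}$$

where $\mathit{idle}$ denotes the idle action.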
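A minimal Python sketch of the partial function in Definition 16 follows. The encoding is an assumption made for illustration: an observation is a dictionary from locations to non-idle actions (a missing location is idle), complements are written with a leading `~`, and `None` signals that the combination is undefined. The names `combine_d` and `co` are hypothetical.

```python
TAU = "tau"  # assumed encoding of the silent action

def co(action):
    """Complement of an action: 'a' <-> '~a' (encoding assumed for this sketch)."""
    return action[1:] if action.startswith("~") else "~" + action

def combine_d(s1, s2):
    """Combine two one-step observations.

    s1 and s2 map locations to non-idle actions; locations that are absent
    are idle. Returns the combined observation, or None when the two
    observations put non-complementary actions on the same location.
    """
    out = dict(s1)
    for loc, act in s2.items():
        if loc not in out:
            out[loc] = act          # s1 is idle at loc: take the action of s2
        elif out[loc] == co(act):
            out[loc] = TAU          # complementary actions synchronise
        else:
            return None             # two unrelated actions on one location
    return out
```

For instance, `combine_d({"l1": "a"}, {"l2": "b"})` keeps both actions because they are on different locations, while `combine_d({"l1": "a"}, {"l1": "~a"})` gives `{"l1": "tau"}`.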
The function $+_D$ combines two behaviours ($S_1$ and $S_2$) to form a new one-step observation if the non-idling actions are on different locations. If a location can exhibit two actions, they must be able to synchronise and the combined behaviour will exhibit $\tau$. It is the distributed analogue of definition 3. The distributed operational semantics is defined in Figure 7. We label the transition with only the non-idle observations. All locations not present on the label are assumed to exhibit the idle action. The only new rule is that for recursion. It states that if a single unfolding of $E_i$ in a set of recursive equations results in an observation $S$ leading to $P$, the process identified by $E_i$ can exhibit $S$ resulting in $P$.
An informal explanation of the semantics follows. A process with no location can be executed at any location. However, the remainder of the process is constrained to be executed at that location. This can be thought of as the initial loading of a process. The sending of a message to a particular location results in creating a process at that location which can only engage in the action contained in the message. Furthermore, the message passing is visible to the observer. As we have not named the processes, one cannot send a message to a particular process. This can be simulated by ensuring that only `one process' is present at a location. Sending messages to the location then has the effect of sending messages to that particular process.
A process already assigned a location can exhibit actions only on that location. Note that in the operational semantics of $loc_1 :: P$ we do not restrict $P'$ to location $loc_1$. The restriction is introduced by the transition of $P$. Furthermore, $P$ could have evolved to $P'$ via a send (say to location $loc_2$), in which case $P'$ would have the structure $(loc_1 :: P'' \mid loc_2 :: a0)$. In this case the process $loc_1 :: P'$ cannot exhibit the action $a$.
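The informal explanation above can be made concrete with a small Python sketch covering only the prefix, send, location and interleaving behaviour; choice, restriction, recursion and synchronising parallelism are omitted. The term encoding (nested tuples), the constructor names `pre`, `send`, `at`, `par` and the generator `steps` are assumptions introduced for this example, and an observation is represented as a (location, action) pair.

```python
NIL = ("nil",)

def pre(action, p):
    """Action prefix; action is a string or a send(...) value."""
    return ("pre", action, p)

def send(loc, a):
    """Sending action: deliver action a to location loc."""
    return ("send", loc, a)

def at(loc, p):
    """Located process loc :: p."""
    return ("at", loc, p)

def par(p, q):
    """Parallel composition p | q."""
    return ("par", p, q)

def steps(p, locations):
    """Yield (observation, successor) pairs for one step of p."""
    kind = p[0]
    if kind == "pre":
        _, action, cont = p
        for l in locations:                  # an unlocated prefix may run anywhere
            if isinstance(action, tuple):    # send: buffer the message at m
                _, m, a = action
                yield (l, action), par(at(l, cont), at(m, pre(a, NIL)))
            else:                            # local action: remainder is bound to l
                yield (l, action), at(l, cont)
    elif kind == "at":
        _, loc, body = p
        for (l, action), succ in steps(body, locations):
            if l == loc:                     # only steps observed at loc are allowed
                yield (l, action), succ
    elif kind == "par":
        _, left, right = p
        for obs, succ in steps(left, locations):
            yield obs, par(succ, right)
        for obs, succ in steps(right, locations):
            yield obs, par(left, succ)
```

Stepping `at("l", pre(send("m", "a"), pre("b", NIL)))` produces a successor of the form `(l :: ...) | (m :: a0)`; wrapping that successor in `at("l", ...)` again leaves only the step at `l`, so the buffered action `a` at `m` cannot be exhibited.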
Figure 7: Operational Semantics

Lact Prefix: for all $l \in L$ and $\mu \in Lact$,
$$\mu P \xrightarrow{\{<l,\, \mu>\}}_D l :: P$$

Sact Prefix: for all $l \in L$,
$$<m, a> P \xrightarrow{\{<l,\, <m, a>>\}}_D (l :: P) \mid (m :: a0)$$

Location:
$$\frac{P \xrightarrow{\{<l,\, \mu>\}}_D P'}{l :: P \xrightarrow{\{<l,\, \mu>\}}_D P'}$$

Non-Determinism:
$$\frac{P \xrightarrow{S}_D P'}{P + Q \xrightarrow{S}_D P', \quad Q + P \xrightarrow{S}_D P'}$$

Interleaving (Asynchronous Evolution):
$$\frac{P \xrightarrow{S}_D P'}{P \mid Q \xrightarrow{S}_D P' \mid Q, \quad Q \mid P \xrightarrow{S}_D Q \mid P'}$$

Parallelism:
$$\frac{P \xrightarrow{S_1}_D P', \quad Q \xrightarrow{S_2}_D Q', \quad S = S_1 +_D S_2}{P \mid Q \xrightarrow{S}_D P' \mid Q', \quad Q \mid P \xrightarrow{S}_D Q' \mid P'}$$

Restriction:
$$\frac{P \xrightarrow{S}_D P', \quad Action(S) \cap (H \cup \overline{H}) = \emptyset}{P \setminus H \xrightarrow{S}_D P' \setminus H}$$

Recursion:
$$\frac{E_i(rec\ X{:}E / X) \xrightarrow{S}_D P}{rec_i\ X{:}E \xrightarrow{S}_D P}$$

The idea of binding a process to a location ($loc :: P$) can be perceived as a special case of relabelling in CCS or CSP. The set of actions could be partitioned into an indexed set of actions, each carrying the location label. However, the semantics of relabelling is not as restricted as the one presented here. We do not permit relabelling of location; that is, once a process is flagged as being on location loc, its subsequent behaviour is only on location loc. Relabelling in CCS can be applied arbitrarily to get different actions. While we could use the idea of an indexed set of actions, we use the explicit location syntax to model message passing.
CCS with location was introduced in [KHCB91b]. The main operational rules in their semantics are as follows:

$$a p \xrightarrow{a}_{u} u :: p \quad (u \in Loc) \qquad \text{and} \qquad \frac{p \xrightarrow{a}_{u} p'}{v :: p \xrightarrow{a}_{vu} v :: p'}$$

The main difference in our semantics is that we do not allow the evolution of processes such as $v :: u :: P$ with $v \neq u$. The evolution is permitted by [KHCB91b] as they are concerned with the structure of the process rather than with implementing a process on a distributed architecture. A location string `vu' indicates that it is a sub-location of `v'. They also assume implicit relocation of processes (or implicit communication between processes at different locations) and thus distinguish between $a(b0 \mid c0)$ and $a(bc0 + cb0)$. Similarly, [CH89] present a distributed bisimulation semantics. The motivation and the results are similar to [KHCB91b]. A detailed comparison between [KHCB91b] and [CH89] is presented in [KHCB91b]. In short, both approaches consider spatial issues in distributed computation at the logical level. They do not consider limitations imposed by a physical architecture. In `distributed semantics' such as [DDM88] the primary concern is causality and not physically distributed computing elements.
3.1 Examples
As the syntax for distributed CCS is different from CCS, we present a few examples which illustrate the modelling of certain aspects of distributed systems.
Example: The following is an encoding of RPC [BN81]. A caller process sends a request to the callee and waits for a response. The callee waits for a request, calls the procedure and sends an acknowledgement. The calling of the procedure is denoted by the action a and the response by the action b. The actual procedure (call-procedure) is modelled as an action.

caller = <m, a> b caller
callee = a call-procedure <l, b> callee
System = ((l :: caller) | (m :: callee)) \ {a, b}
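The message flow of the RPC encoding can be traced with a short Python simulation. This is a plain message-passing reading of the example and not an implementation of the operational semantics: the program lists `CALLER` and `CALLEE`, the per-location buffers, the round-robin scheduling and the function name `run` are all assumptions made for illustration.

```python
from collections import Counter

# Each program is a cyclic list of steps (encoding assumed for this sketch).
CALLER = [("send", "m", "a"), ("recv", "b")]
CALLEE = [("recv", "a"), ("local", "call-procedure"), ("send", "l", "b")]

def run(steps_wanted):
    """Run caller (at l) and callee (at m) round-robin; return the trace."""
    procs = {"l": [CALLER, 0], "m": [CALLEE, 0]}   # location -> [program, position]
    buffers = {"l": Counter(), "m": Counter()}     # unbounded buffer per location
    trace = []
    while len(trace) < steps_wanted:
        progressed = False
        for loc, state in procs.items():
            program, pc = state
            step = program[pc]
            if step[0] == "recv":
                if buffers[loc][step[1]] == 0:
                    continue                       # blocked: nothing buffered yet
                buffers[loc][step[1]] -= 1         # consume the buffered message
            elif step[0] == "send":
                buffers[step[1]][step[2]] += 1     # buffer message at the target
            trace.append((loc, step))
            state[1] = (pc + 1) % len(program)
            progressed = True
            if len(trace) == steps_wanted:
                break
        if not progressed:
            raise RuntimeError("deadlock")
    return trace
```

Calling `run(5)` traces one complete call: the caller sends `a` to `m`, the callee receives it, performs `call-procedure`, sends `b` to `l`, and the caller receives the response.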
Although the encoding of caller and callee above assumes a fixed location to send the response, the location of the caller can be coded to be part of the action.

caller_i = <m, a_i> b caller_i
callee = $\sum_i$ a_i call-procedure <i, b>