server model. For example, in a teleconference, ob- jects representing the participants make a group, i.e. conference, and send messages to others in the group.
Communication Protocol for Group of Distributed Objects Takayuki Tachikawa
and
Makoto Takizawa
Dept. of Computers and Systems Engineering Tokyo Denki University Ishizaka, Hatoyama, Hiki-gun, Saitama 350-03, JAPAN e-mail
ftachi,
g
taki @takilab.k.dendai.ac.jp
Abstract
In distributed applications, a group of multiple objects are cooperated. On receipt of request messages, the objects send back the responses. Kinds of group communication protocols have been discussed so far, which support the reliable and ordered delivery of messages at the network level. Only messages to be ordered at the application level, not necessarily all messages, are required to be causally delivered in the required order. The state of the object depends on in what order the requests are computed and the responses and requests are transmitted. In this paper, we would like to de ne the signi cant precedence order of messages based on the con icting relation among the requests. We would like to discuss a protocol which supports the signi cantly ordered delivery of request and response messages.
1 Introduction
Distributed applications like teleconferences [7] are composed of multiple application objects. Objects support operations for manipulating the states of the objects. A group of multiple application objects have to be communicated in the distributed applications. There are two types of communication for supporting the group. One type is multicast [3,4], where each object sends a message to a group or groups of objects. For example, a client multicasts a request to a group of replicated servers. Thus, the multicast is used by client-server systems. On the other hand, multiple autonomous objects are cooperated in distributed applications like teleconferences and communication networks. Here, the group is rst de ned among multiple objects. Then, messages sent by each object are delivered to the destination objects in the group. This type of group communication is intra-group communication [16{18, 20]. In this paper, we would like to discuss the intra-group communication among multiple application objects. Many papers [1,3, 4,6, 8,9, 12,14, 16{18,20, 23] have discussed how to support the causally ordered delivery of messages at the network level in the presence of message loss and stop faults of the objects. O (n2 ) processing overhead and O (n) to O (n2 ) communication overhead are implied for number n of objects in the group [16]. On the other hand, Cheriton et al . [5]
point out that it is meaningless at the application level to support the causally ordered delivery of all messages transmitted in the network. Only messages required by the applications have to be causally delivered in order to reduce the communication and processing overhead. Ravindran et al . [19] discuss how to support the ordered delivery of messages based on the precedence relation among messages explicitly speci ed by the application. Agrawal et al . [11] de ne the signi cant messages, that is, if the state of the object is changed on receipt of a message m, m is signi cant . The receivers of the signi cant message m have to be rolled back if the sender of m is rolled back. The object o supports abstract operations for manipulating o. For example, a Bank object supports deposit and withdrawal operations. The operations are computed based on the remote procedure call. That is, a request message m with an operation op is sent to o. On receipt of m, o computes op. op might change the state of o, e.g. write on a le object. On completion of op, op sends back a response message m' with the result of op. Here, m precedes m' because m' is the response of m. Next, the states of the objects depend on in what order the operations are computed. The con icting relation [2] among the operations is de ned for each object based on the semantics of the object. If two operations sending and receiving messages con ict in an object, the messages have to be received in the computation order of the operations. Thus, the signi cant precedence relation among the request and response messages can be de ned based on the con icting relation. In this paper, we would like to present a protocol named Object-based Group Communication (OGC) protocol which supports the signi cantly preceded delivery of messages where only messages to be ordered are delivered to the application objects in the order. In the OGC protocol, the messages are ordered by timestamps given by using the real time clock. In section 2, we present the system model. In section 3, we discuss the signi cant precedency among requests and responses at the application level. In sections 4 and 5, the protocol for supporting the signi cantly ordered delivery of messages in the group is discussed and evaluated.
2 System Model
The distributed application is realized by computations on multiple objects and communications among the objects by passing messages in the network. An object o is de ned to be a pair of state Do and a collection Po of operations for manipulating Do . o can be manipulated only through an operation op in Po . On receipt of a request message of op, o computes op and sends back the response . op may change Do . Here, o' and o are the sender and receiver of m, respectively. A group G is a collection of application objects o1 , : : : , on (n 2) [Figure 1]. The objects are autonomous and are cooperated with each other in G by sending requests and responses. This is the intragroup communication [20,21]. That is, there is no predetermined precedence among objects like the clientserver model. For example, in a teleconference, objects representing the participants make a group, i.e. conference, and send messages to others in the group. The routers in the networks are autonomous and are cooperated to exchange the routing information. The communication system is composed of multiple system objects. The communication system takes messages from the application objects and causally delivers the destinations only the messages which have to be causally ordered from the application point of view [4]. We assume that the network is reliable and synchronous , i.e. messages sent by each object are delivered to the destinations with no message loss in the sending order and the delay time for transmitting a message among objects is bounded. O1 DO1 PO1
Oi DOi POi
On DOn POn
Ci
Cn
group G
request response
C1
communication system
network
Figure 1: Group G Each application object o supports operations by which o can be manipulated. For every operation op and state s of o , let op (s) denote a state obtained by applying op to s. [De nition] Two operations op 1 and op 2 supported by an object o are compatible i op 1 (op 2 (s)) = op 2 (op 1 (s)) for every state s of o. 2 op 1 and op 2 con ict i they are not compatible. That is, the state obtained by applying op1 and op2 to o depends on the computation order of op 1 and op 2 , i.e. op 1 (op 2 (s)) 6= op 2 (op 1 (s)) for some state s of o. The con icting relation among the operations is speci ed when o is de ned. Suppose that op 1 is issued to o while op 2 is being computed in o . If op 1 is compatible with op 2 in o, op 1 can be computed. Otherwise, op 1 has to wait until op 2 completes. Two operations op1 and op2 are mutually exclusive
in o i they cannot be interleaved in o. In this paper, we assume that op1 and op2 are mutually exclusive if they con ict. Each time o receives a request of an operation op , a thread for op is created for o. In each thread for op , actions of op are computed sequentially on o. The action is a primitive unit of computation in o. That is, op is computed to be a sequence of actions. The computation of op by the thread is referred to as instance of op in o. Only if all the actions computed in op complete successfully, op completes successfully, i.e. commits . If some action in op fails, no action in op is computed, i.e. aborts . The completion of op means that op commits or aborts. That is, op is atomicaly computed in o. op may invoke another operation op i in an object oi . o sends the request op i to oi. On receipt of op i, the instance of op i is computed in oi and oi sends back the response to o. opi may further invoke operations. Thus, the computation of op is nested .
3 Signi cant Precedence of Messages 3.1 Precedence of operations
It is important to consider in what order the operations are computed in the objects. Let opi1 and opi2 be instances of operations op1 and op2 in an object oi, respectively. Here, opi1 precedes opi2 (op i1 )i op i2 ) in oi i op i2 is computed after op i1 completes. If op i1 and op i2 are mutually exclusive, either one of them has to precede the other. Otherwise, they can be interleaved. We would like to de ne the precedence relation \)" among the operations. [De nition] opi1 precedes opj2 (op i1 ) op j2 ) i (1) op i1 )i op j2 for i = j , (2) op i1 invokes op j2 , or (3) for some op k3 , op i1 ) op k3 ) op j2 . 2 In Figure 2, opi2 and opi3 are computed after opi1 , and j j opi2 invokes op4 . Here, opi1 ) opi2 ) op4 and opi1 ) i i i op3 . Since op2 and op3 are interleaved, neither opi2 ) opi3 nor opi3 ) opi2 . oj
oi opi1
opi3
time
opi2
?
z
j
op4
?
Figure 2: Precedence of operations
3.2 Signi cant precedence
The causality of messages [4] is de ned on the basis of the precedence among the sending and receipt events [10]. That is, a message m1 causally precedes
if the sending event of m1 precedes the sending event of m2 . For example, suppose that o1 sends m1 to o1 and o2 , and o2 sends m2 to o3 after receiving m1 . Here, m1 precedes m2 . If m1 is a question and m2 is the answer of m1 , o3 has to receive m1 before m2 . If both m1 and m2 are independent questions, o3 can receive m1 and m2 in any order. Thus, it is important to de ne what messages are required to be ordered in the application. We would like to de ne the signi cant precedence relation \!" among messages based on the concept of objects. First, suppose that an object oi sends a message m1 before m2 . There are the following cases as shown in Figure 3. S1. m1 and m2 are sent by the same instance opi1 . S2. m1 and m2 are sent by dierent instances opi1 and opi2 , respectively: S2.1. opi1 precedes opi2 (opi1 ) opi2 ). S2.2. opi1 and opi2 are interleaved. m2
oi
oi
z m z
time
?
(S1)
opi1
2
m1
2
?
(S2.1)
z
z m z
m1
opi
time
m1
z
2
m2
time
?
oi opi1
m1
z
z
m2
opi
z
m1
2
oi
oi opi1
opi1
the messages sent by opi1 have to be received before the messages sent by opi2 , i.e. m1 ! m2 . Otherwise, there is no precedence relation between m1 and m2 (m1 k m2 ). In S2.2, opi1 and opi2 are interleaved. Since opi1 and opi2 are not related, m1 k m2 . Next, suppose that oi sends m2 after receiving m1 . There are the following cases as shown in Figure 4. R1. m1 and m2 are received and sent by opi1 . R2. m1 is received by opi1 and m2 is sent by opi2: R2.1. opi1 ) opi2. R2.2. opi1 and opi2 are interleaved.
(S2.2)
Figure 3: Send-send precedence In the rst case S1, since m1 and m2 are sent by the same instance opi1 , m1 signi cantly precedes m2 (m1 ! m2 ). Here, m1 must be a request message because each operation sends the response at the end of the operation. m2 is either a request of an operation which is invoked by opi1 or the response of opi1 which is sent at the end of opi1 . Suppose that m1 is a request of an operation op2 and m2 is the response of opi1 . The response may be in uenced by op2 . For example, m2 may bring data obtained from op2 . Hence, m1 ! m2 . Next, suppose that m1 and m2 are requests. Since m2 may be aected by the response of m1 , m1 ! m2 . In S2, m1 and m2 are sent by dierent instances opi1 and opi2 in oi , respectively. In S2.1, opi1 and opi2 are not interleaved, i.e. opi1 precedes opi2 (opi1 ) opi2 ). Unless opi1 and opi2 con ict, there is no relation between opi1 and opi2 . Hence, neither m1 ! m2 nor m2 ! m1 (written as m1 k m2 ). If opi1 and opi2 con ict, m2 may be aected by m1 . For example, suppose that m1 and m2 are the responses of opi1 and opi2 , respectively. If opi2 ) opi1 , the output data carried by m1 and m2 may be dierent from opi1 ) opi2 because the state obtained by applying opi1 and opi2 depends on the computation order of opi1 and opi2 . Thus, if opi1 and opi2 con ict,
time
?
(R1)
time
?
oi opi1
m1
opi1
z
opi2 m2
(R2.1)
z
opi2
z
m2
time
?
(R2.2)
Figure 4: Receive-send precedence In R1, m1 is received and m2 is sent by the same instance opi1 . Here, m1 is the request of opi1 or a response of an operation invoked by opi1 . m2 is the response of opi1 or a request of an operation invoked by opi1 . For example, suppose that m1 is a response of an operation op2 invoked by opi1 and m2 is a request of op2 . The input data of op2 may be the input of m1 . Hence, m1 ! m2 . In R2, m1 is received by opi1 and m2 is sent by opi2 (6= opi1 ). In R2.1, opi1 and opi2 are not interleaved, i.e. opi1 ) opi2 . Suppose that opi1 and opi2 con ict. For example, if opi1 is write and opi2 is read, m2 is derived from the state obtained by opi1 . Hence, the messages received by opi1 signi cantly precede the messages sent by opi2 , i.e. m1 ! m2 . Unless opi1 and opi2 con ict, there is no precedence relation among m1 and m2 , i.e. m1 k m2 . In R2.2, since opi1 and opi2 are interleaved, there is no relation among opi1 and opi2 . Hence m1 k m2 . Thus, the signi cant precedence relation ! among messages is de ned as follows. [De nition] A message m1 signi cantly precedes m2 (m1 ! m2 ) i one of the following conditions holds: (1) m1 is sent before m2 by an object oi and (1-1) m1 and m2 are sent by the same operation instance, or (1-2) an operation sending m1 con icts with an operation sending m2 in oi . (2) m1 is received before sending m2 by oi and
(2-1)
and m2 are received and sent by the same operation instance, or (2-2) an operation receiving m1 con icts with an operation sending m2 . (3) m1 ! m3 ! m2 for some message m3 . 2 It is clear that there holds the following relation between the signi cant precedence and the network level causality [4]. [Proposition] For every pair of messages m1 and m2 , m1 causally precedes m2 if m1 signi cantly precedes m2 (m1 ! m2 ). 2 According to the causality theory [13], if oi sends m, m is preceded by all the messages which oi has received before sending m. On the other hand, m is signi cantly preceded by only the messages which are related with m.
3.3 Ordered delivery
oph 1
In C2 and C3, m1 and m2 are allowed to be delivered in any order.
3.3.2 Signi cant order
Suppose that oi receives messages m1 and m2 . If k m2 and m1 and m2 are not requests sent to multiple objects, oi can receive m1 and m2 in any order. Suppose that m1 ! m2 . There are the following cases as shown in Figure 6. T1. m1 and m2 are received by opi1. m1
R
) time
?
j
?
m2
9
opk2
op2
?
?
Figure 5: Receive-receive precedence
T2.
and m2 are received by opi1 and tively. T2.1. opi2 ) opi1 . T2.2. opi1 and opi2 are interleaved.
m1
3.3.1 Serialization
2
ok
m1 op1
Now, we would like to discuss how the messages are delivered to each object oi in the group G. Let us consider a case that there are multiple common destination objects of messages m1 and m2 . Suppose that oh sends m1 to oi , oj , and ok , and ok sends m2 to oh , oi , and oj [Figure 5]. oi and oj receive both m1 and m2 . Here, suppose that m1 and m2 are received by the instances opi1 and opi2 . If m1 ! m2 , m1 has to be delivered before m2 as presented later. Hence, we assume that m1 k m2 . There are the following cases: C1. m1 and m2 are requests. C2. Either m1 or m2 is a request. C3. m1 and m2 are responses. In C1, suppose that m1 and m2 are requests of op1 and op2 , respectively, where op1 and op2 con ict in oi and oj . Since m1 k m2 , m1 may be delivered before m2 in oi and m2 before m1 in oj . That is, opi1 ) j j opi2 and op2 ) op1 . The state of oi obtained by the computations may be inconsistent with oj because op1 and op2 con ict in oi and oj . In order to keep the states of oi and oj consistent, m1 and m2 have to be delivered to oi and oj in the same order. [Serialization (S) rule] For every pair of common destinations oi and oj of request messages m1 of op1 and m2 of op2 , if op1 and op2 con ict in oi and oj , m1 and m2 are delivered in oi and oj in the same order.
oj
oi
oh
m1
oi m1
z z
m2
time
?
(T1)
oi opi
1
m1
m2
z
z time ?
opi2 ,
respec-
oi opi
1
opi2
(T2.1)
m1
opi
1
z
opi2
m2
z time ?
(T2.2)
Figure 6: Receive-receive precedence In T1, m1 has to be delivered to oi before m2 since ! m2 . For example, if m1 is a request of opi1 and is the response of an operation invoked by opi1 , m1 has to precede m2 . In T2, m1 and m2 are received by dierent instances. If opi1 and opi2 are interleaved in T2.2, m1 and m2 can be delivered independently to opi1 and opi2 . opi1 and opi2 are not interleaved in T2.1. First, suppose that opi1 and opi2 con ict. If m1 or m2 is a request, m1 has to be delivered before m2 since m1 ! m2 . Next, suppose that m1 and m2 are responses. Here, unless m1 is delivered before m2 , opi1 waits for m1 and opi2 is not computed since opi1 does not complete. That is, deadlock among opi1 and opi2 occurs. Furthermore, suppose that m3 and m4 are sent to opi1 and opi2 , respectively, and m4 ! m3 . Even if opi1 precedes opi2 (opi1 ) opi2 ) and m1 is delivered before m2 , the deadlock occurs because m4 ! m3 . Thus, the messages destined to dierent instances cannot be delivered to oi in ! unless at least one of the messages is a request. If opi1 and opi2 do not con ict, m1 and m2 are delivered in any order. m1 m2
Thus, if the following rule is satis ed for messages and m2 , the communication system supports the signi cantly ordered delivery of messages. m1
[Signi cantly ordered delivery (SO) rule]
m1 is delivered before m2 in a common destination oi of m1 and m2 if (1) if m1 ! m2 , (1-1) m1 and m2 are received by the same operation instance, or (1-2) an operation instance opi1 receiving m1 con icts with an operation instance opi2 receiving m2 in oi , and m1 or m2 is a request message, (2) otherwise, if m1 and m2 are requests, m1 is delivered before m2 in the common destinations of m1 and m2 by the S rule. 2 The condition (1-2) means that opi1 ) opi2 because opi1 and opi2 con ict and m1 is delivered before m2 . [Theorem] No communication deadlock occurs if every messages are delivered by the SO rule. [Proof] Suppose that m1 is received by opi1 and m2 by opi2 . If m1 and m2 are responses or opi1 and opi2 are compatible, m1 and m2 can be received in any order such that neither opi1 nor opi2 wait for the message. Next, suppose that opi1 and opi2 con ict. Suppose that m1 and m2 are delivered by the SO rule but deadlock occurs. Since the deadlock occurs, opi2 ) opi1 and m1 ! m2 . From (1-2) of the SO rule, opi1 ) opi2 . It contradicts the assumption. 2 [Theorem] The system is consistent if messages are delivered by the SO rule. [Proof] Let m1 and m2 be messages. (1) Suppose that m1 k m2 . If request messages m1 and m2 are sent to multiple objects and the operations of m1 and m2 con ict, the operations are computed in the same order by the S rule. (2) Next, suppose that m1 ! m2 . By the SO rule, m1 is delivered before m2 if m1 or m2 is a request and the instances opi1 and opi2 receiving m1 and m2 con ict. Here, opi1 ) opi2 . 2
4 Protocol
We would like to present a protocol for supporting the signi cantly ordered delivery of messages named Object-based Group Communication (OGC) protocol.
4.1 Real time clock
Suppose that a group G is established among objects o1 , ..., on . The communication system delivers messages to the objects in G so that the request and response messages are delivered by the SO rule. In this paper, we assume that the network is reliable and synchronous, i.e. the loss-less, sending order (FIFO) delivery is supported and the communication delay is bounded to be time units. According to the advances of the hardware technologies, the synchronous real-time clock [15] is available. Hence, we assume that each oi has the real-time clock Ci which shows
the same global time in G. oi has a variable Ti which denotes the current value of Ci . Each time oi sends a message m, oi gives the value of Ti to m as the timestamp ts(m). m1 and m2 may be sent at the same time by o1 and o2 . The timestamp is realized by the real time post xed by the object identi er. If the times given to m1 and m2 are the same and the identi er of o1 is smaller than o2 , ts(m1 ) < ts(m2 ). Therefore, it is straightforward for the following proposition to hold. [Proposition] For every pair of messages m1 and m2, ts(m1 ) < ts(m2 ) if m1 ! m2 . 2 If the messages received are delivered in the timestamp order, the messages are causally delivered. In order to support the timestamp order delivery, every message m has to be stayed in the buer until at least one message m such that ts(m ) > ts(m) is received from every object. This means that even messages not to be ordered have to be stayed in the buer. Only the messages to be ordered have to be delivered in the order ! to reduce the processing overhead and the communication time. From the assumptions, if a message m is sent at time ts(m), every destination oi receives m at time t where ts(m) < t ts(m) + . It is sure that every message m' such that m' ! m is sent by oj before ts(m) [Figure 7]. Hence, the following theorem holds. 0
0
oj
oh m'
z
m'
time
?
?
oi ts(m0 ) m
Ti
z z
0
ts(m)
9T ?i
Figure 7: Delivery of message m
[Theorem] Every object oi can receive every message
!
m' signi cantly preceding m, i.e. m' m, until ts(m) + . Following this theorem, oi can deliver m sent by oj in \ " at ts(m) + because oi surely receives all the messages causally preceding m. However, m has to stay in the receipt queue of oi for time units even if m is not required to be ordered in . In this paper, we present a data transmission procedure by which m can be delivered in time units after m is sent.
2
!
!
4.2 Transmission and receipt A. Transmission
Each message m carries the following information: send(m) = sender object of m, say oi . recv(m) = receiver objects of m. ts(m) = timestamp of m. M clock(m) = hM clock1 (m), ..., M clockn (m)i.
Here, let Mj (m) be a set of messages received by oi from oj and signi cantly preceding m, i.e. Mj (m) = f m' j m' ! m, send(m ) = oj , and oi 2 recv (m) g. M clockj (m) denotes the timestamp of a message m' in Mj (m) whose timestamp is the maximum (j 6= i). That is, M clockj (m) denotes a message m' most recently sent by oj such that m' ! m. M clocki(m) denotes the timestamp of a message whose timestamp is the maximum among the messages signi cantly preceding m which oi has sent. Suppose that oi receives a message m from oj . oi enqueues m into the receipt queue RQi , where messages are stored in the timestamp order. Each message does not stay in RQi for longer time units than . The top message m' in RQi is removed if ts(m ) Ti 0 . Each time oi sends a message m, ts(m) := Ti , i.e. real time of oi, and m is enqueued into the sending queue SQi in oi . m stays in SQi in time units. For the top message m' in SQi , if ts(m ) < Ti 0 , m' is removed from SQi because m' is surely received by every destination. For each oj (j 6= i), RQi is searched for a message mj whose timestamp is the maximum among the messages from oj in RQi which precede m by the following ordering rule. If mj is found in RQi , M clockj (m) := ts(mj ) [Figure 8], else M clockj (m) := 0. [Ordering rule] For every pair of messages m1 and m2 , m1 precedes m2 if ts(m1 ) < ts(m2 ) and one of the following conditions holds: (1) m1 and m2 are sent by the same instance in oi . (2) m1 and m2 are sent by the instances opi1 and opi2 , respectively, and opi1 and opi2 con ict. (3) m1 is received and m2 is sent by the same instance in oi. (4) m1 is received by an instance opi1 and m2 is sent by opi2 , and opi1 and opi2 con ict. 2 Then, SQi is searched for a message mi whose timestamp is the maximum among the messages preceding m by the ordering rule. If mi is found, M clocki (m) = ts(mi ) [Figure 8], else M clocki (m) := 0. It is straightforward for the following theorem to hold from the de nition. [Theorem] Let mj be a message which oi receives from oj . If mj precedes a message m in oi by the ordering rule, mj ! m. 2
the messages. We would like to reduce the time for messages' staying in RQi.
0
0
0
B. Delivery
Next, we would like to discuss how to deliver messages in oi . The messages received are stored in RQi in the timestamp order. If the messages are delivered in the timestamp order without gap, it is sure that the messages are delivered so that the SO rule is satis ed. If the following rule is satis ed, a message m can be delivered without gap. [Non-gap delivery] m is delivered in oi if oi receives a message mj from every oj where ts(m) < ts(mj ). 2 However, the messages have to stay in RQi for time units to guarantee that there is no gap between
M_clock j ( m )
oi
RQ i o o o mj o o o
m
mi mj
o o o mi o o o SQ i
m time
M_clock i ( m )
Figure 8: Sending and receipt queues
[Theorem] Let m be a message which oi receives from
oj . All the messages signi cantly preceding m are received by oi if m satis es the delivery condition : maxf M clockk (m) j k = 1, ..., n g < Ti 0 . [Proof] Suppose that M clockh (m) = max f M clockk (m) j k = 1, ..., n g. This means that oj sends m after receiving m' from oh where M clockh (m) = ts(m ) and m' ! m as shown in Figure 7. Hence, if M clockh (m) < Ti 0 , every message m' such that m' ! m is received. 2 [Delivery procedure] RQi is searched from the top in the timestamp order. For each message m searched in RQi , if (1) ts(m) < Ti 0 , or (2) maxf M clockk (m) j k = 1, ..., n g < Ti 0 , and (2-1) RQi includes no request message m of op con icting with op and preceding m, and m satis es the non-gap rule if m is a request of op to multiple destinations. m is removed from RQi and delivered to oi . 2 If m is delivered by the delivery procedure, only the timestamp ts(m) of m is still staying in RQi until Ti becomes ts(m) + . It is sure that every message m stays in RQi for shorter time units than because ts(m) > M clockj (m) for every j . [Theorem] The delivery procedure satis es the SO rule. [Proof] Let m1 and m2 be messages which oi receives. (1) Suppose that m1 ! m2 , one of m1 and m2 is a request, and instances opi1 and opi2 receiving m1 and m2 , respectively, con ict. By the delivery condition, m1 is delivered after delivering every message m where m ! m and ts(m ) < ts(m). (2) Suppose that m1 k m2 . Since both messages are stored in RQi in time units, m1 and m2 are delivered in timestamp order in every common destination so that m1 precedes m2 if ts(m1 ) < ts(m2 ). If m1 and m2 are sent to multiple objects, the con icting operations are computed in timestamp order by the delivery procedure (2-1). 0
0
0
0
0
0
Table 1: Time values ts(m) M clock(m) m1 : 10 h0,0,0i m2 : 15 h 10 , 0 , 0 i m3 : 16 h0,0,0i m4 : 18 h0,0,0i m5 : 20 h 0 , 15 , 16 i That is, the S rule is satis ed.
C. Example
2
Figure 9 shows an example of the group communication among three objects o1 , o2 , and o3 with = 6. In o1 , opi1 is computed and sends a request m1 of op2 to o2 and o3 . On receipt of m1 , o2 computes op22 and o3 computes op32 . op32 sends m3 to o1 and o2 . o1 and o2 compute op13 and op23 on receipt of m3 , respectively. op22 and op23 send m2 and m5 , respectively. op32 and op34 are interleaved in o3 . op34 sends m4 to o1 . Suppose that op22 and op23 con ict in o2 . Each message carries M clock as shown in Table 1. For example, m2 is sent by op22 at time 15, i.e. ts(m2 ) = 15. Here, m1 is in the receipt queue RQ2 and ts(m1 ) = 10. Hence, M clock1 (m2 ) = 10. o1 (10)
o2
op11 m1
o3
j
z
op22
op32
op34 m2
op13
time
) 9 ?
(15)
op m5
m3
(16)
2 m4 3
(18)
(20)
?
? ( ) : clock value
Figure 9: Example It is clear that m3 ! m5 from R1. After receiving m5 , o1 cannot deliver m5 until T1 = 22 since maxfM clock (m5 )g(= 16) + (= 6) = 22. Suppose that o1 receives m3 at T1 = 22. m3 is delivered to o1 since maxfM clock(m3 )g = 0, i.e. there is no message signi cantly preceding m3 . Then, m5 is delivered. If oi receives m4 before m3 , o1 can deliver m4 before m3 since maxfM clock(m4 )g = 0. o1 can receive m2 be-
fore m5 , i.e. m2 ! m5 since m2 and m5 are sent by o2 and the FIFO delivery is supported. Finally, m1 , m2 , m4 , m3 , and m5 are delivered in oi in this order.
5 Evaluation
We would like to evaluate the OGC protocol in terms of the delivery time of each message by comparing with the causally ordered (CO) protocols [4, 18]. Suppose that an object oi sends a message m after receiving m'. In the CO protocol, m' precedes m since m is preceded by all the messages received or sent by oi before sending m. Here, the messages received are ordered by the vector clocks [4] and the vectors of sequence numbers [18]. In the OGC protocol, only messages to be ordered by the ordering rule are delivered in the order !. The timestamp ts(m) denotes the time when m is sent. Let dt(m) denote the time when m is delivered to the destination objects. Let be the maximum propagation delay time in the network. Suppose that a group G includes n objects o1 , ..., on and each oi sends a message randomly to d destination objects in G (d n). We assume that each oi sends one message every Rt time units. When oi receives a message m, the receipt queue RQi includes messages which oi has received before m and which signi cantly precede m. In the OGC protocol, m is delivered if every message m' such that m' ! m is delivered, not all messages causally preceded by m. For each message m' received by oi , let Pc be a probability that m' ! m. In the CO protocol, Pc is 1 and Pc 1 in the OGC protocol. It takes D = dt(m) 0 ts(m) time units to deliver each message m after m is sent. D is given as dt(m) 0 ts(m) = 0 = ( ( = Rt ) 1 ( d + 1 ) 1 Pc + 1 ). Figure 10 shows D for the number d of destinations where n = 10, = 10, and Rt = 5. Pc = 1 means the CO protocol. Following Figure 10, the OGC protocol implies the shorter delay time than the CO protocol. The OGC protocol uses the real time clock. Since each message includes the vector of the real times, the length of the message is O(n). Unless the real time is used, a message has to include the information on messages signi cantly preceding the message from each object. That is, the message length is O(n2 ).
6 Concluding Remarks
In this paper, we have discussed how to support the causally ordered delivery of messages from the application point of view named the signi cant precedence while most group communication protocols are discussed at the network level. Only messages to be causally ordered at the application level are delivered in the signi cant precedence order. The system is modeled to be a collection of objects whose computations are based on the remote procedure calls in this paper. Based on the con icting relation among the abstract operations supported by each object, we have de ned the signi cant precedence among the request and response messages. We have presented a protocol for a group of objects which supports the sig-
n = 10 ,
delta = 10 ,
Broadcast Protocol," ACM Operating Systems Review , Vol.23, No.4, 1989, pp.5{19. [10] Lamport, L., \Time, Clocks, and the Ordering of Events in a Distributed System," Comm . ACM , Vol.21, No.7, 1978, pp.558{565.
Rt = 5
10 Pc = 1.0 Pc = 0.5 Pc = 0.1
8
D
6
4
2
0 1
2
3
4
5
6
7
8
9
10
d
Figure 10: Number of destinations v.s. time ni cantly ordered delivery of messages by using the real time clock and presented the evaluation in term of the delivery time.
References
[1] Amir, Y., Dolev, D., Kramer, S., and Malki, D., \Transis: A Communication Sub-System for High Availability," Proc . of IEEE FTCS-22 , 1993, pp.76{84.
[2] Bernstein, P. A., Hadzilacos, V., Goodman, N., \Concurrency Control and Recovery in Database Systems," Addison-Wesley Publishing Company, 1987. [3] Birman, K. P. and Joseph, T. A., \Reliable Communication in the Presence of Failures," ACM Trans . Computer Systems , Vol.5, No.1, 1987, pp.47-76. [4] Birman, K., Schiper, A., and Stephenson, P., \Lightweight Causal and Atomic Group Multicast," ACM Trans . Computer Systems , Vol.9, No.3, 1991, pp.272-314. [5] Cheriton, D. R. and Skeen, D., \Understanding the Limitations of Causally and Totally Ordered Communication," Proc. of the ACM SIGOPS'93 , 1993, pp.44{57. [6] Chang, J. M. and Maxemchuk, N. F., \Reliable Broadcast Protocols," ACM Trans . Computer Systems , Vol.2, No.3, 1984, pp.251{273. [7] Ellis, C. A., Gibbs, S. J., and Rein, G. L., \Groupware," Comm . ACM , Vol.34, No.1, 1991, pp.38{58. [8] Garcia-Molina, H. and Spauster, A., \Ordered and Reliable Multicast Communication," ACM Trans . Computer Systems , Vol.9, No.3, 1991, pp.242{271. [9] Kaashoek, M. F., Tanenbaum, A. S., Hummel, S. F., and Bal, H. E., \An Ecient Reliable
[11] Leong, H. V. and Agrawal, D., \Using Message Semantics to Reduce Rollback in Optimistic Message Logging Recovery Schemes," Proc. of IEEE ICDCS-14 , 1994, pp.227{234. [12] Luan, S. W. and Gligor, V. D., \A Fault-Tolerant Protocol for Atomic Broadcast," IEEE Trans . Parallel and Distributed Systems , Vol.1, No.3, 1990, pp.271{285. [13] Mattern, F., \Virtual Time and Global States of Distributed Systems," Parallel and Distributed Algorithms (Cosnard, M. and Quinton, P. eds.), North-Holland , 1989, pp.215{226. [14] Melliar-Smith, P. M., Moser, L. E., and Agrawala, V., \Broadcast Protocols for Distributed Systems," IEEE Trans . on Parallel and Distributed Systems , Vol.1, No.1, 1990, pp.17{25. [15] Mills, D. L., \Precision Synchronization of Computer Network Clocks," ACM Computer Communication Review , Vol.24, No.2, 1994, pp.28{ 43. [16] Nakamura, A. and Takizawa, M., \Reliable Broadcast Protocol for Selectively Ordering PDUs," Proc. of IEEE ICDCS-11 , 1991, pp.239{ 246. [17] Nakamura, A. and Takizawa, M., \Priority-Based Total and Semi-Total Ordering Broadcast Protocols," Proc . of IEEE ICDCS-12 , 1992, pp.178{ 185. [18] Nakamura, A. and Takizawa, M., \Causally Ordering Broadcast Protocol," Proc. of IEEE ICDCS-14 , 1994, pp.48{55. [19] Ravindran, K. and Shah, K., \Causal Broadcasting and Consistency of Distributed Shared Data," Proc. of IEEE ICDCS-14 , 1994, pp.40-47. [20] Tachikawa, T. and Takizawa, M., \Selective Total Ordering Broadcast Protocol," Proc. of IEEE ICNP-94 , 1994, pp.2120219. [21] Tachikawa, T. and Takizawa, M., \Multimedia Intra-Group Communication Protocol," Proc. of IEEE HPDC-4 , 1995, pp.1800187. [22] Tachikawa, T., and Takizawa, M., \Distributed Protocol for Selective Intra-group Communication," Proc. of IEEE ICNP-95 , 1995, pp.2340241. [23] Verissimo, P., Rodrigues, L., and Baptista, M., \AMp: A Highly Parallel Atomic Multicast Protocol," Proc . of the ACM SIGCOMM'89 , 1989, pp.83-93.