An implementation for complete asynchronous distributed garbage collection

Fabrice Le Fessant, Ian Piumarta, Marc Shapiro
Project SOR, INRIA Rocquencourt, B.P. 105, 78153 Le Chesnay Cedex, France
fabrice.le [email protected], [email protected], [email protected]

November 21, 1997

Abstract

We expand an acyclic distributed garbage collector (the cleanup protocol of Stub-Scion Pair Chains) with a detector of distributed cycles of garbage. The result is a complete and asynchronous distributed garbage collector.

The detection algorithm for free distributed cycles is inspired by Hughes [5]. A local collector marks outgoing references with dates, which are propagated asynchronously between spaces. A central server computes the minimum of allowed dates, permitting cycles to be detected and cut.

Our algorithm is both asynchronous and fault-tolerant. Moreover, it can be adapted to large-scale systems. Finally, it requires few resources and it is easy to implement.

1 Introduction

Automatic garbage collection is an important feature for modern high-level languages. However, if local garbage collection is pretty much a solved problem, distributed programming still lacks efficient cyclic garbage collection. Indeed, a distributed garbage collector should be asynchronous (all other spaces should continue to work during a local garbage collection in one space), fault-tolerant (because message loss and space crashes are common in such systems), and scalable (as networks connect more and more distant computers).

In this paper we present a distributed garbage collector for distributed languages, matching these three qualities. Moreover, the algorithm is simple to implement and has low resource requirements for participating spaces.

We developed this garbage collector on a system of distributed objects, similar to Network Objects [1], called SSPC (Stub-Scion Pair Chains), implemented in Objective-Caml (a dialect of ML with object-oriented extension). Although our system is based on transparent distributed references, our design assumptions are weak enough to support other kinds of distributed languages, such as those based on channels (π-calculus [11], join-calculus [3], etc.).

In the first part, we introduce the basis of the SSPC system for garbage collection. The second part describes the new algorithm of cycle detection. In the third part, a short analysis of the algorithm is done and some implementation issues are discussed. Finally, in the fourth part, we compare our algorithm with some other recent distributed garbage collectors.

1.1 Background

The system consists of a set of spaces. Each space is a process that has its own memory, its own local roots, and its own local garbage collector. A space can communicate with other spaces (on the same computer or a different one) by sending asynchronous messages. Messages can contain marshaled references to local or remote objects, which can be used in remote computations (e.g. for remote invocation).

Figure 1: A reference from A in space X to B in space Y.

A reference R from object A in space X to object B in space Y is represented by two special objects, stub(R) and scion(R), and materialized by:

- a local pointer in X from A to stub(R);
- a local pointer in Y from scion(R) to B.

A scion serves as a root for its local garbage collector and corresponds to an incoming reference: an object pointed to from some other space is considered live for local garbage collection. The stub is a "proxy", containing the location of its associated scion. Each scion has at most one matching stub, and each stub has exactly one associated scion. If several spaces refer to B, each will use a different scion, one for each stub. Reference R is created by space Y: the new scion scion(R) is first created, then a message containing the location of scion(R) is sent to X, where the message is unmarshaled by creating stub(R).

1.2 Stub-Scion Pair Chains

The SSPC system [15] is a mechanism of distributed references similar to Network Objects [1] with optimizations, such as lazy short-cuts and direct message sending for indirect chains. However, we will only describe here the part needed to understand its garbage collector. The garbage collector is based on reference-listing (a factorization of reference counting) between spaces, with time-stamps on messages to avoid race conditions. We base the explanation on the example in Figure 1. This example is easily generalizable to more references and spaces.

Each space counts the messages it sent to other spaces. Each message is time-stamped by the sender with the value of the counter when it was sent. When a message containing R is sent by Y to X, the time-stamp of the message is put in a field of scion(R), called scionstamp. When the message is received by X, the time-stamp of the message is also put in a field of stub(R), called stubstamp. Thus, for stub(R), stubstamp is the time-stamp of the last message containing R received from Y, and for scion(R), scionstamp is the time-stamp of the last message containing R sent to X.

When A becomes unreachable, stub(R) is collected by the local garbage collector of space X. As a finalization of stub(R), a value called threshold[Y] is set to its stubstamp field, if its stubstamp field is greater than the value of threshold[Y]. Thus, threshold[Y] contains the time-stamp of the last message received from Y that contained a reference whose stub has been collected by a local garbage collection.

After each garbage collection in space X, a message LIVE is sent to all the spaces in the immediate vicinity. The immediate vicinity of the space X is the set of spaces which have stubs and scions whose associated scions and stubs are in X. The LIVE message sent to space Y contains the list of all the references from X to Y whose stubs are still reachable (this is called reference-listing). The value of threshold[Y] is also sent in the LIVE message to Y.

When the LIVE message is received by Y, the list of references is compared to the list of scions whose associated stubs were in X. Those missing were unreachable in X and are called suspect. References to suspect scions may still be contained in messages which had not been received yet by X (at local garbage collection time). Such scions should not be deleted. To prevent an incorrect deletion, the scionstamp field of suspect scions is compared to the threshold[Y] contained in the LIVE message. If threshold[Y] is greater than the scionstamp field of scion(R), then a stub referred to by a message sent after the last one containing R has been collected. This means that the last message containing R must have been received before the garbage collection, and so the scion can be safely deleted. To prevent out-of-order messages from contradicting this last assertion, messages from Y marked with a time-stamp smaller than the current value of threshold[Y] are refused by space X. This is called threshold-filtering.

This garbage collector is fault-tolerant: scions are not deleted in the case of message loss, out-of-order messages or space crashes. Moreover, it is scalable (each space only sends and receives messages in its immediate vicinity), and asynchronous (since garbage collections in each space may happen at any time, and need no synchronization with other spaces).

Since this GC uses reference-listing, it is not complete: distributed cycles will never be collected. We now introduce our algorithm to detect and cut distributed cycles, making the SSPC garbage collector complete.

2 Detection of free distributed cycles

The detector of free distributed cycles is an extension of the SSPC garbage collector. Thus, spaces may decide to use the SSPC GC without the detector extension. Spaces involved in the detection process are called participating spaces, whereas other spaces are called non-participating spaces. Our detector will only detect cycles completely included in the participating spaces.

2.1 Date propagation

The idea of propagating dates from roots was introduced by Hughes [5]. However, our algorithm makes far weaker assumptions on the system than Hughes (no global time, asynchronous communications, etc.).

A Lamport clock [6] is used to simulate global time at participating spaces: each message used by the detection process is marked with the current time of the sender space. When a message is received, the local time of the receiver is increased to be strictly greater than the time contained in the message.

The algorithm propagates dates from a root (either local root or scion) to all stubs reachable from it. This propagation is performed as a side-effect of local GC (thus, we suppose it is a tracing GC). A local root propagates the current time of the local Lamport clock, whereas scions propagate the dates that their associated stubs have been marked with. If a stub is reachable from different roots, it is marked with the largest of their dates. The dates marked on the stubs will later be propagated to the associated scions by modified LIVE messages.

As local time is increasing, reachable stubs are marked with increasing dates. On the contrary, unreachable cycles evolve in two phases: in the first phase, the greatest date on the cycle is propagated to all the elements of the cycle. In the second phase, when no new date can override the greatest one, the dates on the elements in the cycle stay constant forever (see Figure 2). We will use this characterization to detect cycles: all elements of an unreachable cycle are marked with constant dates.

Figure 2: Dates: an unreachable cycle (with constant date 5) and a reachable chain (with increasing dates from 10 to 12).

To recognize them, participating spaces compute a limit date, called globalmin, such that all reachable stubs are marked with a date greater than globalmin. Thus, if a stub is marked with a date smaller than globalmin, it must belong to an unreachable cycle.
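The clock rule and the "largest date wins" rule of date propagation can be sketched as follows. This is only an illustrative Python sketch, not the authors' Objective-Caml code: the names `LamportClock`, `tick`, `receive` and `mark_stub_dates` are hypothetical, and advancing the clock with `tick` before a local garbage collection is an assumption (the text only requires that local time increases).

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple


@dataclass
class LamportClock:
    """Logical clock standing in for global time at one space."""
    time: int = 0

    def tick(self) -> int:
        # Assumption: local time advances before each local garbage collection.
        self.time += 1
        return self.time

    def receive(self, message_time: int) -> None:
        # The receiver's time must end up strictly greater than the message's time.
        if message_time >= self.time:
            self.time = message_time + 1


def mark_stub_dates(
    now: int,
    local_root_stubs: Iterable[str],
    scion_stubs: List[Tuple[int, Set[str]]],
) -> Dict[str, int]:
    """Return {stub: date} after one trace.

    local_root_stubs: stubs reachable from local roots, dated with `now`.
    scion_stubs: (sciondate, stubs reachable from that scion) pairs.
    A stub reachable from several roots keeps the largest date.
    """
    dates: Dict[str, int] = {stub: now for stub in local_root_stubs}
    for sciondate, stubs in scion_stubs:
        for stub in stubs:
            dates[stub] = max(dates.get(stub, sciondate), sciondate)
    return dates


clock = LamportClock(time=3)
clock.receive(7)   # message from the future: jump past it
clock.receive(2)   # older message: local time is already greater
dates = mark_stub_dates(12, {"s1"}, [(5, {"s2"}), (9, {"s1", "s3"})])
```

In this example `s1` is reachable from a local root and keeps the current time 12, while `s2` and `s3` only inherit the dates 5 and 9 of the scions they are reachable from; a stub stuck at a constant date such as 5 is the kind of candidate the detector later compares against globalmin.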

2.2 Computation of globalmin

To compute globalmin, Hughes uses a distributed consensus (using Rana's termination algorithm [14]) relying on time-bounded communications. This approach is unusable in a large-scale fault-tolerant system. In our algorithm, globalmin is computed by a designated detection server. By definition, globalmin is the minimum date on reachable stubs. Each participating space computes its own local minimum on its reachable stubs, called localmin.

The last value of localmin sent by each space is saved by the detection server. Each time a new localmin is received, globalmin is computed as the minimum of the saved values of localmin.

2.3 Computation of localmin

At any given space, localmin is the minimum date on local reachable stubs. In fact a space cannot detect all local reachable stubs, as only those whose dates are increasing are known to be reachable. Thus, localmin is computed as the minimum of dates on local stubs known to be reachable.

If a stub is reachable, but not known to be reachable (as its date T has not increased at last garbage collection), it will not participate in the computation of localmin in its space for this garbage collection. Nevertheless, there is a stub in a space upstream on its chain from which it is reachable whose date T is increasing to T'. Thus, date T will participate in the computation of the localmin value for the upstream space (see Figure 3). As a consequence, the stub will be protected by the localmin value of an upstream space.

Figure 3: A chain of remote references: Stubs and scions marked with 8 are protected by space D, since D knows the chain is reachable (stub passing from 8 to 9).

We now know which dates are to be protected: the old dates on increasing stubs. We must next define how long they should remain protected. A space may only stop protecting a date T on a reachable chain when some other space is protecting the date T on the downstream part of the chain. But a downstream space protects T when its stub on the chain receives a new date greater than T. So, space X must protect date T, replaced by the greater date T', until T' has been propagated to the stubs in the next space Y on the chain. Concretely, space X must wait until:

1. The new date T' has been propagated to scions in space Y by a message.
2. A local garbage collection has occurred in space Y propagating date T' from scions to stubs. Space Y can then decide to protect the old date T on the updated stubs.
3. Space Y has sent its new value of localmin to the detection server and has received an acknowledgment. Thus, the dates protected by Y are protected globally.
4. Space X has received a message from Y signaling that T' has been propagated. Then, space X may safely stop protecting date T in its next computation of localmin.

Figure 4: D begins to protect 8 when one stub is marked with 9 > 8. D stops protecting 8 when the new date 9 has been propagated by E to its stubs (and as a consequence, E protects 8).

Figure 5: Data structures and messages. (Labels recoverable from the figure: spaces C, D, E; Lamport time = 12; message 1001:STUBDATES={12,9}; cyclicthreshold[D]=1001; protected = {(1000,E,6) (1001,E,8) (_,E,_)}; messages CYCLICTHRESHOLD=1001, ACK, LOCALMIN=6, GLOBALMIN=5; detection server state globalmin = 5, spaces = {(C,9) (D,6) (E,5)}.)

2.4 Data structures and messages

Each stub has two fields, stubdate and olddate. stubdate is the new date propagated by the garbage collection trace; olddate is the largest date propagated to the stub by a previous garbage collection (see Figure 5). A scion needs a field sciondate, containing the date propagated from its matching stub, which will be propagated from this scion by the next garbage collection trace.

After some garbage collections, STUBDATES messages propagate dates from stubs to their matching scions. For each stub, the date contained in the STUBDATES message is the maximum of stubdate and olddate. This tolerates message loss, without impacting the correctness of the computation.

STUBDATES messages are marked with SSPC time-stamps. When a message STUBDATES is received by space X from space Y, its time-stamp is compared to a new threshold, called cyclicthreshold[Y], which contains the time-stamp of the latest message STUBDATES received by X from Y. The value of cyclicthreshold[Y] is periodically sent by X to Y in a CYCLICTHRESHOLD message.

On each space, a set, called the protected set, contains the dates which must be protected, in triples of the form timestamp × space × date. A triple (timestamp, Y, olddate) corresponds to a message STUBDATES sent to Y after a local garbage collection, marked with time-stamp timestamp, where olddate is the minimum replaced olddate to protect for stubs pointing to Y. The protected set also contains a special entry for each space, called the floating entry, whose timestamp part is not set and which may increase between two local garbage collections (indeed, it is used to cope with mutator activity).

Finally, each space computes its localmin as the minimum of the dates of all the triples in the protected set. This value is sent to the detection server in a LOCALMIN message. The detection server acknowledges by sending back an ACK message.

Periodically, the detection server sends an up-to-date value of globalmin to all participating spaces in a GLOBALMIN message. To compute globalmin, the detection server uses a table containing the last value of localmin received from each participating space.
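A minimal sketch of these data structures and of the two minimum computations is given below. It is written in Python for brevity rather than in the Objective-Caml of the implementation, and every name in it (`Stub`, `Triple`, `stubdates_payload`, `compute_localmin`, `DetectionServer`) is hypothetical. Representing an unset time-stamp or date as `None`, and ignoring unset dates when taking the minimum, are assumptions of the sketch. The protected set and the server table reuse the values shown in Figure 5; the stub dates are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Stub:
    target_space: str   # space holding the matching scion
    stubdate: int       # date set by the latest trace
    olddate: int        # largest date set by a previous trace


@dataclass
class Triple:
    timestamp: Optional[int]  # None marks the floating entry
    space: str
    date: Optional[int]       # None: nothing to protect yet (assumption)


def stubdates_payload(stubs: List[Stub], dest: str) -> List[int]:
    """Dates carried by a STUBDATES message to `dest`: max(stubdate, olddate)."""
    return [max(s.stubdate, s.olddate) for s in stubs if s.target_space == dest]


def compute_localmin(protected: List[Triple]) -> Optional[int]:
    """Minimum date over all triples of the protected set, floating entries included."""
    dates = [t.date for t in protected if t.date is not None]
    return min(dates) if dates else None


class DetectionServer:
    """Keeps the last localmin of each participating space."""

    def __init__(self) -> None:
        self.last_localmin: Dict[str, int] = {}

    def on_localmin(self, space: str, localmin: int) -> str:
        self.last_localmin[space] = localmin
        return "ACK"

    def globalmin(self) -> Optional[int]:
        values = self.last_localmin.values()
        return min(values) if values else None


# Protected set of space D and server table as drawn in Figure 5.
protected_d = [Triple(1000, "E", 6), Triple(1001, "E", 8), Triple(None, "E", None)]
server = DetectionServer()
server.on_localmin("C", 9)
server.on_localmin("E", 5)
ack = server.on_localmin("D", compute_localmin(protected_d))
payload = stubdates_payload([Stub("E", 12, 10), Stub("E", 8, 9), Stub("C", 3, 2)], "E")
```

With these inputs the localmin of D is 6 and the server's globalmin is 5, which matches the LOCALMIN=6 and GLOBALMIN=5 labels of Figure 5.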

2.5 Algorithm

2.5.1 Local garbage collection in space X

1. Tracing: During the trace, dates are propagated from roots (local roots + scions) to the stubdate field of reachable stubs. cyclicthreshold values are copied to the threshold values of SSPC threshold-filtering. This is important to prevent out-of-order messages from being accepted after the propagation.
2. Updating the protected set: For each space Y in the immediate vicinity of X, stubs in X pointing to Y are examined: if the stubdate field is greater than the olddate field, the olddate value will be protected. This is done by setting the date in the floating entry of Y in the protected set of X to the olddate value, if the olddate value is smaller than the current date of the floating entry.
3. Computing the new localmin value: localmin is computed as the minimum of the dates of all triples in the protected set, for any space and any time-stamp (including floating entries).
4. Sending the new localmin value: The new computed localmin is sent to the detection server. The detection process in X is stalled until an ACK message is received from the detection server, acknowledging the reception of the LOCALMIN message. In fact, only the sending of STUBDATES and CYCLICTHRESHOLD messages is delayed. The mutator is not stalled, and STUBDATES and CYCLICTHRESHOLD messages from other spaces are still received and treated. Nevertheless, this acknowledgment is needed for the space to be sure its reachable stubs are protected before sending messages to other spaces, which could decide to stop protecting these stubs.
5. Propagating dates to other spaces: STUBDATES and CYCLICTHRESHOLD messages are sent to other spaces in the immediate vicinity of X. The time-stamps on the STUBDATES messages are used to set the time-stamps of the floating entries for each space. New floating entries are created for the next garbage collection.

2.5.2 Between two local garbage collections

Between two garbage collections, STUBDATES, CYCLICTHRESHOLD and GLOBALMIN messages are received.

The STUBDATES messages contain new dates for scions. To accept out-of-order messages, the sciondate fields are only updated if the new dates in the STUBDATES messages are greater than the old sciondate values. The time-stamps on STUBDATES messages update the cyclicthreshold values (like the threshold of SSPC). Thus, cyclicthreshold is the greatest time-stamp of STUBDATES messages received from a particular space.

The CYCLICTHRESHOLD messages are used to remove old triples from the protected set: space X receiving a CYCLICTHRESHOLD message from space Y containing the cyclicthreshold T removes all triples (T', Y, date) such that time-stamp T' is smaller than or equal to T.

Finally, when a new value of globalmin is received in a GLOBALMIN message, all scions whose dates are smaller than the new globalmin are excluded from the set of local roots. As a consequence, they will not be traced at the following garbage collection, and their cycles will be collected.
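The three message handlers that run between local garbage collections can be sketched as below. This is an illustrative Python sketch under simplifying assumptions: scions are identified by plain strings, the special value NOW is not modeled, triples are `(timestamp, space, date)` tuples with `None` as the time-stamp of a floating entry, and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[Optional[int], str, Optional[int]]


def on_stubdates(
    sciondates: Dict[str, int],
    cyclicthreshold: Dict[str, int],
    sender: str,
    timestamp: int,
    new_dates: Dict[str, int],
) -> None:
    """Apply a STUBDATES message; safe under out-of-order delivery."""
    for scion, date in new_dates.items():
        # Only larger dates overwrite the current sciondate.
        if scion in sciondates and date > sciondates[scion]:
            sciondates[scion] = date
    # cyclicthreshold is the greatest STUBDATES time-stamp seen from this sender.
    previous = cyclicthreshold.get(sender, timestamp)
    cyclicthreshold[sender] = max(previous, timestamp)


def on_cyclicthreshold(protected: List[Triple], sender: str, threshold: int) -> List[Triple]:
    """Drop triples (T', sender, date) with T' <= threshold; floating entries stay."""
    return [
        (ts, space, date)
        for (ts, space, date) in protected
        if space != sender or ts is None or ts > threshold
    ]


def on_globalmin(sciondates: Dict[str, int], globalmin: int) -> Set[str]:
    """Scions that remain local roots: those whose date is not smaller than globalmin."""
    return {scion for scion, date in sciondates.items() if date >= globalmin}


sciondates = {"a": 5, "b": 9}
cyclicthreshold: Dict[str, int] = {}
on_stubdates(sciondates, cyclicthreshold, "C", 1001, {"a": 4, "b": 12})
on_stubdates(sciondates, cyclicthreshold, "C", 1000, {"b": 10})  # late, older message
protected = on_cyclicthreshold(
    [(1000, "E", 6), (1001, "E", 8), (None, "E", None)], "E", 1000
)
roots = on_globalmin(sciondates, 6)
```

Here scion `a` keeps date 5 and is dropped from the roots once globalmin reaches 6, so the cycle it belongs to is no longer traced from it at the next local garbage collection.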

2.6 New remote references and non-participating spaces

When a new remote reference is created, we must initialize the olddate field of the stub and the sciondate field of the scion. Since the date which will eventually be propagated from a stub to its associated scion is unknown at creation time, the olddate and sciondate fields are initialized to a special value NOW. In a computation, NOW is always replaced by the current local time. As a consequence, a newly created scion propagates the current time, as a normal local root, until a new date is propagated by a STUBDATES message from its matching stub. For the newly created stub, its NOW value will be replaced at next garbage collection time by the local current time.

This mechanism is also used to enable references between spaces participating in the detection algorithm and non-participating spaces. Indeed, scions pointed to from non-participating spaces will never be updated by new propagated dates. Thus, their dates will stay at NOW forever, and they will act as local roots: distributed cycles including these remote references will never be collected. This is safe and does not impact the completeness of the algorithm for participating spaces.

Incoming references from non-participating spaces are well handled by this mechanism, but we must also cope with outgoing references to non-participating spaces. Indeed, we must not put entries for these spaces in the protected set, since no CYCLICTHRESHOLD messages will be sent from these spaces to remove such entries, preventing localmin and hence globalmin from increasing, thus stalling the detection process. A space must therefore only send STUBDATES messages and create entries in its protected set for known participating spaces. The list of participating spaces is maintained by the detection server, and sent to other participating spaces when necessary (new participating space, space quitting the detection process, or space suspected of having crashed).

2.7 Coping with mutator activity

The mutator can create and delete remote references between local garbage collections. As a consequence, dates on a remotely-reachable object might never increase owing to a phantom reference: each time a local garbage collection occurs in a space from where the object is reachable, the remote reference is deleted by the mutator after being copied to another space. Greater dates might never be propagated to the object and the object would be detected as a free cycle, whereas it is still reachable.

To cope with this problem, we use the floating entries in the protected set: each time a stub is used by the mutator (e.g. when a reference is copied), its olddate is compared to the date of the floating entry for the space of its matching scion, and the smaller of the two is kept in the floating entry. Thus, the date of the floating entry always contains the minimum olddate of all the stubs which have been used between two local garbage collections, thereby protecting any object reachable from these stubs by "transient references".
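The life of a floating entry can be sketched as follows, in Python and with hypothetical names (`FloatingEntries`, `note_stub_used`, `close`). The sketch assumes that an empty floating entry is represented by `None`, that NOW dates have already been replaced by the local time, and that the entry is turned into an ordinary triple when the next STUBDATES message is stamped.

```python
from typing import Dict, Optional, Tuple


class FloatingEntries:
    """One floating entry per destination space, updated by the mutator."""

    def __init__(self) -> None:
        self.date: Dict[str, Optional[int]] = {}

    def note_stub_used(self, scion_space: str, olddate: int) -> None:
        # Called whenever the mutator uses a stub (e.g. copies the reference):
        # keep the minimum olddate seen since the last local garbage collection.
        current = self.date.get(scion_space)
        if current is None or olddate < current:
            self.date[scion_space] = olddate

    def close(self, scion_space: str, timestamp: int) -> Tuple[int, str, Optional[int]]:
        # When STUBDATES is sent, the floating entry receives the message's
        # time-stamp and a fresh, empty floating entry replaces it.
        date = self.date.get(scion_space)
        self.date[scion_space] = None
        return (timestamp, scion_space, date)


floating = FloatingEntries()
for olddate in (9, 7, 8):
    floating.note_stub_used("Y", olddate)
kept = floating.date["Y"]
triple = floating.close("Y", 1002)
```

Because the smallest olddate (7 here) stays in the protected set until space Y acknowledges the corresponding STUBDATES message, an object reachable only through a reference in transit keeps a protected date.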

2.8 Fault tolerance

This algorithm is tolerant to message loss between participating spaces (although not with the detection server). Indeed, STUBDATES and CYCLICTHRESHOLD messages only contain increasing values (greatest dates on stubs and greatest time-stamps on received messages). Out-of-order messages are accepted, until the following local garbage collection (cyclicthreshold is then copied to threshold).

Space crashes are handled by the detection server, which can exclude a suspected space from the detection process by sending a special message to all participants. Then, all participating spaces set incoming scions from the suspects to NOW and remove all entries for the suspected spaces in the protected sets.

Finally, the detection server may also crash: this does not stop acyclic garbage collection, and only delays cyclic garbage collection. A detection server can be restarted, dynamically rebuild the list of participating spaces through a special recovery protocol, then wait for each participating space to send a new localmin value before computing a new up-to-date globalmin value.

3 Complements

3.1 Analysis

We can now estimate the maximum time needed to collect a newly unreachable cycle: it is the time needed to propagate dates greater than those on the cycle to all reachable stubs. We define a period as the time necessary for all spaces to make a new local garbage collection. The time needed to collect the cycle is equal to the product of the length of the largest chain of reachable references by the period time:

$\mathit{time}_{\mathit{collection}} = \mathit{max}_{\mathit{lengths}} \times \mathit{time}_{\mathit{period}}$

We can also estimate the number and the size of the messages after one local garbage collection: there is one LIVE message (SSPC garbage collector), one STUBDATES message and one CYCLICTHRESHOLD message sent for each space in the immediate vicinity. The three messages can be sent as one single network message. Thus, there is only one message sent for each space in the vicinity. The message contains one identifier and one date for each live stub pointing to the destination space, a time-stamp and the cyclicthreshold value for the destination space. There is also one LOCALMIN message sent to the detection server, and one ACK message sent by the detection server.

The protected set contains triples for each space in the vicinity. For one space X in the vicinity of Y, the number of triples for X in the protected set of Y is equal to the number of local garbage collections that occurred on Y since the last garbage collection on X. If the frequencies of the garbage collections in the different participating spaces are close, the protected set should not grow too much. If one space needs too many garbage collections, and its protected set becomes too large, it should stop performing cyclic detection after each garbage collection (but not stop garbage collections) until enough entries in its protected set have been removed.

Finally, thousands of spaces may use the same detection server, since the server only contains one date per participating space, and the computation of the minimum of this array is not expensive.

3.2 Implementation

Our algorithm has been incorporated into an implementation of the SSP Chains system written in Objective-Caml [8], using the Unix and Thread modules [9]. The whole implementation contains 1300 lines of code, of which 200 are associated with the cyclic GC algorithm.

The propagation of dates by tracing was implemented by making minor modifications to the existing Caml garbage collector [2]. The main one was to transform the mark(roots) function into mark(roots,date), which simply marks stubs reachable from a set of roots with the given date. This function is then applied first to the normal local roots with the current date (which is always greater than all the dates on scions), and then to sets of scions, sorted by decreasing dates, so that each reachable stub is only marked once, with the date of the first root it is reachable from.

Finalization of stubs (needed to set the threshold when they are collected) is implemented by using a list of pairs of one weak pointer to a stub and one stubstamp field. After a garbage collection, weak pointers are tested and the stubstamp field modifies the threshold if the weak pointer is dangling.

The protected set is implemented as a FIFO queue for each known participating space. The head of the queue contains the floating entry, which can be modified by the mutator between local garbage collections. When a CYCLICTHRESHOLD message is received, entries are removed from the tail of the queue until the last entry has a greater time-stamp than the one in the message. Finally, localmin is computed as the minimum of all entries in all the queues.

To reduce the number of messages, STUBDATES and CYCLICTHRESHOLD messages are consolidated into a single message. Similarly, ACK can contain a previous value of globalmin, thus replacing GLOBALMIN messages.

Objective-Caml has high-level capabilities to automatically marshal and unmarshal symbolic messages, easing the implementation of complex protocols. Modifications of the compiler and the standard object library were needed to enable dynamic creation of classes of stubs and dynamic type verification for SSPC. However, these modifications are unrelated to both the acyclic GC and the cycle detector algorithm.
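The mark(roots,date) scheme described above can be sketched as follows. This is a Python illustration on a toy object graph, not the modified Caml collector: the graph encoding (`edges`, `stubs`), the shared `visited` set and the `trace` driver are assumptions made for the example.

```python
from typing import Dict, Iterable, List, Set


def mark(
    edges: Dict[str, List[str]],
    stubs: Set[str],
    roots: Iterable[str],
    date: int,
    stub_dates: Dict[str, int],
    visited: Set[str],
) -> None:
    """Mark every not-yet-visited stub reachable from `roots` with `date`."""
    stack = list(roots)
    while stack:
        node = stack.pop()
        if node in visited:
            continue            # already reached from a root with a larger date
        visited.add(node)
        if node in stubs:
            stub_dates[node] = date
        else:
            stack.extend(edges.get(node, []))


def trace(
    edges: Dict[str, List[str]],
    stubs: Set[str],
    local_roots: List[str],
    scions: Dict[str, int],
    now: int,
) -> Dict[str, int]:
    """Local roots first (date `now`), then scions by decreasing sciondate."""
    stub_dates: Dict[str, int] = {}
    visited: Set[str] = set()
    mark(edges, stubs, local_roots, now, stub_dates, visited)
    by_date: Dict[int, List[str]] = {}
    for scion, date in scions.items():
        by_date.setdefault(date, []).append(scion)
    for date in sorted(by_date, reverse=True):
        mark(edges, stubs, by_date[date], date, stub_dates, visited)
    return stub_dates


edges = {
    "root": ["a"], "a": ["s1"],
    "sc9": ["b"], "b": ["s1", "s2"],
    "sc5": ["c"], "c": ["s2", "s3"],
}
result = trace(edges, {"s1", "s2", "s3"}, ["root"], {"sc9": 9, "sc5": 5}, now=12)
```

Processing roots in decreasing date order means the first date written on a stub is already the largest one it can receive, so each object is visited once and no maximum has to be computed during the trace.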

4 Related work

Detecting free cycles has been addressed by several researchers. Here, we only present the most recent work. A good survey can be found in [13].

The three recent algorithms are based on partitioning in groups of spaces or nodes. Cycles are only collected when they are included in a single partition. To improve the partition, some heuristics are used. These algorithms are complex and may be difficult to implement. Moreover, their efficiency greatly depends on the heuristic for selecting "suspected objects".

Maheshwari and Liskov's work [10] is based on back tracing: objects that are suspected to belong to a free cycle are traced back. A heuristic based on distance from a root gives a means for selecting "suspected objects". If the back trace does not encounter a root, the object is on a free cycle. Their detector is asynchronous, fault-tolerant and adapted to large-scale systems. Nevertheless, back tracing requires extra data structures for each remote reference. Furthermore, every suspected cycle needs one trace, whereas our algorithm collects all cycles concurrently.

Rodrigues and Jones's cyclic garbage collector [4] is inspired by Lang et al. [7], which divides the network into groups of processes. Groups are able to collect free cycles among their constituents. The main difference between the two algorithms is that only suspected objects (using a heuristic such as Maheshwari and Liskov's distance) are traced. A global synchronization is needed to terminate the detection. It is also difficult to know how the algorithm behaves when the group becomes huge.

The DMOS garbage collector [12] provides good properties: safety, completeness, non-disruptiveness, incrementality and scalability. Spaces are divided into a number of disjoint blocks (cars). Cars from different spaces are grouped together into trains. Reachable data is copied from cars from one train to cars from other trains. Unreachable data and cycles included in one car or in one train are left behind and can be collected. The completeness is guaranteed by the order of collections. This algorithm is really complex. It has not been implemented. Moreover, the problem of fault-tolerance is not addressed by the authors.

5 Conclusion

We have described a complete distributed garbage collector: an acyclic distributed garbage collector, extended by a detector of distributed cyclic garbage. Our garbage collector has some nice properties: asynchrony between participating spaces, fault-tolerance (messages can be lost, participating spaces and servers can crash), low resource requirements (memory, messages and time), and finally ease of implementation. It seems well adapted to large-scale distributed systems, since it supports non-participating spaces, and consequently clusters of spaces. A formal proof of the safety of the cycle detection algorithm has been constructed (see Appendix A).

Acknowledgments

The authors would like to thank Neilze Dorta for her survey on recent cyclic garbage collectors. We also thank Jean-Jacques Levy and Damien Doligez for their valuable comments and suggestions on improving this paper.

References

[1] Andrew Birrell, Greg Nelson, Susan Owicki, and Edward Wobber. Network objects. In Proceedings of the 14th ACM Symposium on Operating Systems Principles, pages 217-230, Asheville, NC (USA), December 1993.
[2] Damien Doligez and Xavier Leroy. A concurrent, generational garbage collector for a multithreaded implementation of ML. In Proc. of the 20th Annual ACM SIGPLAN-SIGACT Symp. on Principles of Programming Lang., pages 113-123, Charleston SC (USA), January 1993.
[3] Cedric Fournet, Georges Gonthier, Jean-Jacques Levy, Luc Maranget, and Didier Remy. A calculus of mobile agents. In LNCS, volume 1119, 1996.
[4] Helena Rodrigues and Richard Jones. A cyclic distributed garbage collector for network objects. In Workshop on Distributed Algorithms (WDAG), Bologna (Italy), October 1996.
[5] John Hughes. A distributed garbage collection algorithm. In Jean-Pierre Jouannaud, editor, Functional Languages and Computer Architectures, number 201 in LNCS, pages 256-272, Nancy (France), September 1985. Springer-Verlag.
[6] Leslie Lamport. Time, clocks, and the ordering of events in a distributed system. Communications of the ACM, 21(7):558-565, July 1978.
[7] Bernard Lang, Christian Queinnec, and Jose Piquer. Garbage collecting the world. In Proc. of the 19th Annual ACM SIGPLAN-SIGACT Symp. on Principles of Programming Lang., Albuquerque, New Mexico (USA), January 1992.
[8] X. Leroy. The objective-caml system software. Technical report, INRIA, 1996.
[9] Xavier Leroy. Unix system programming in caml light. Technical Report No. 147, INRIA, Le Chesnay, France, 1993.
[10] U. Maheshwari and B. Liskov. Collecting distributed garbage cycles by back tracing. In Principles of Distributed Computing, 1997.
[11] Robin Milner, Joachim Parrow, and David Walker. A calculus of mobile processes I and II. Information and Computation, 100:1-40 & 41-77, September 1992.
[12] R.L. Hudson, R. Morrison, J. Eliot B. Moss, and D.S. Munro. Garbage collecting the world: One car at a time. In OOPSLA, Atlanta (U.S.A.), October 1997.
[13] David Plainfosse and Marc Shapiro. A survey of distributed garbage collection techniques. In Proc. Int. Workshop on Memory Management, Kinross Scotland (UK), September 1995.
[14] S. P. Rana. A distributed solution to the distributed termination problem. Information Processing Letters, 17:43-46, July 1983.
[15] Marc Shapiro, Peter Dickman, and David Plainfosse. SSP chains: Robust, distributed references supporting acyclic garbage collection. Rapport de Recherche 1799, INRIA, Rocquencourt (France), November 1992.

A Safety proof

Notation

$\mathit{name}_{S_k}(T)$ is the value of $\mathit{name}$ on space $S_k$ at time $T$.

$\mathrm{MESSAGE}_{S_k(T_k) \to S_l(T_l)}$ is the message sent by $S_k$ at time $T_k$ and received by $S_l$ before $T_l$.

A CYCLICLIVE message is a single message containing both the STUBDATES and CYCLICTHRESHOLD messages for a destination space.

NOW is a special date, which is always replaced by the current time during tracing.

Hypothesis

Let $S$ be the central site.

Let $S_{1..n}$ be the participating and non-participating spaces.

Let $S_i$ be one participating space.

Let $T'$, $T$, $T_{1..n}$ be times such that $\mathit{globalmin}_S(T') = \mathit{globalmin}_S(T)$ and $\mathit{globalmin}_S(T)$ was computed from the values $\mathit{localmin}_{S_k}(T_k)$.
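The role of the central site in this hypothesis can be illustrated with a minimal sketch. It assumes that dates are integers, that the central site keeps only the latest LOCALMIN value reported by each participating space, and that the global minimum is the minimum of those values. The class `CentralSite` and its method names are hypothetical and are not taken from the SSPC implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CentralSite:
    """Hypothetical model of the central site S (not the SSPC code)."""

    # Latest localmin reported by each participating space (name -> date).
    latest_localmin: Dict[str, int] = field(default_factory=dict)

    def receive_localmin(self, space: str, localmin: int) -> None:
        # Only the most recent LOCALMIN message from each space is kept.
        self.latest_localmin[space] = localmin

    def globalmin(self) -> Optional[int]:
        # Nothing can be computed until at least one space has reported.
        if not self.latest_localmin:
            return None
        return min(self.latest_localmin.values())


site = CentralSite()
site.receive_localmin("S1", 12)
site.receive_localmin("S2", 7)
site.receive_localmin("S1", 15)  # replaces the earlier report from S1
current_globalmin = site.globalmin()
```

In this sketch the value returned after the three reports is the minimum of 15 and 7. How non-participating spaces are accounted for is deliberately left out, since the appendix does not specify it.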

Lemma 1

$S_k$ and $S_l$ are two participating spaces. If $\mathrm{CYCLICLIVE}_{S_l(T'_l) \to S_k}$ is received before $T_k$, then $T'_l \le T_l$.

Proof:

LOCALMIN messages must be acknowledged by the central site before any CYCLICLIVE message is sent during a local trace. So, $\mathrm{LOCALMIN}_{S_l(T'_l) \to S}$ has been received by $S$ before $\mathrm{CYCLICLIVE}_{S_l(T'_l) \to S_k}$ has been sent. Since $\mathrm{CYCLICLIVE}_{S_l(T'_l) \to S_k}$ is received by $S_k$ before $T_k \le T$, $\mathrm{LOCALMIN}_{S_l(T'_l) \to S}$ is received by $S$ before $T$. But the latest message received from $S_l$ by $S$ before $T$ is $\mathrm{LOCALMIN}_{S_l(T_l) \to S}$. Thus, we have $T'_l \le T_l$.

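Lemma 2

$S_k$ is a participating space and $S_l$ a space. If $(\mathit{stub}_{S_l}, \mathit{scion}_{S_k})$ is a pair such that $\mathit{scion}_{S_k}.\mathit{date}(T_k) < \mathit{globalmin}_S(T)$, then:

• At time $T_l$, $\mathit{stub}_{S_l}$ is only accessible from scions satisfying: $\mathit{scion}_{S_l}.\mathit{date}(T_l) < \mathit{globalmin}_S(T)$

• $\max(\mathit{stub}_{S_l}.\mathit{date}(T_l), \mathit{stub}_{S_l}.\mathit{olddate}(T_l)) < \mathit{globalmin}_S(T)$

Proof:

$$\mathit{scion}_{S_k}.\mathit{date}(T_k) < \mathit{globalmin}_S(T) \ \text{and}\ \mathit{globalmin}_S(T) \le \mathit{localmin}_{S_k}(T_k) \ \text{and}\ \mathit{localmin}_{S_k}(T_k) \le T_k$$
$$\Rightarrow \mathit{scion}_{S_k}.\mathit{date}(T_k) < T_k$$
$$\Rightarrow \mathit{scion}_{S_k}.\mathit{date}(T_k) \ne \mathrm{NOW}$$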

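Since $\mathit{scion}_{S_k}.\mathit{date}(T_k) \ne \mathrm{NOW}$, $S_k$ must have received a message $\mathrm{CYCLICLIVE}_{S_l(T'_l) \to S_k}$ from $S_l$, sent at time $T'_l$, and received before $T_k$, which previously set the value of $\mathit{scion}_{S_k}.\mathit{date}$. Therefore, $S_l$ is a participating space and, from Lemma 1, $T'_l \le T_l$. Since $\mathit{scion}_{S_k}.\mathit{date}$ is an increasing value, and the date sent by $\mathrm{CYCLICLIVE}_{S_l(T'_l) \to S_k}$ for $\mathit{stub}_{S_l}$ is $\max(\mathit{stub}_{S_l}.\mathit{date}(T'_l), \mathit{stub}_{S_l}.\mathit{olddate}(T'_l))$, we have:

$$\max(\mathit{stub}_{S_l}.\mathit{date}(T'_l), \mathit{stub}_{S_l}.\mathit{olddate}(T'_l)) \le \mathit{scion}_{S_k}.\mathit{date}(T_k)$$
$$\Rightarrow \max(\mathit{stub}_{S_l}.\mathit{date}(T'_l), \mathit{stub}_{S_l}.\mathit{olddate}(T'_l)) < \mathit{globalmin}_S(T)$$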

Now we must prove that, for any time $T''_l$ such that $T'_l \le T''_l \le T_l$, we have $\mathit{stub}_{S_l}.\mathit{date}(T''_l) < \mathit{globalmin}_S(T)$.

Suppose $T''_l$, such that $T'_l \le T''_l \le T_l$, is the first time such that $\mathit{stub}_{S_l}.\mathit{date}(T''_l) \ge \mathit{globalmin}_S(T)$. Since $\mathit{stub}_{S_l}.\mathit{olddate}$ is the maximum of all previous $\mathit{stub}_{S_l}.\mathit{date}$ values, we have $\mathit{stub}_{S_l}.\mathit{olddate}(T''_l) < \mathit{globalmin}_S(T)$. Indeed, $\mathit{stub}_{S_l}.\mathit{olddate}$ may also be modified by an application message containing the location of $\mathit{scion}_{S_k}$. But this is not the case, since $\mathit{scion}_{S_k}.\mathit{date}$ would then have been set to NOW.

$$\Rightarrow \mathit{stub}_{S_l}.\mathit{olddate}(T''_l) < \mathit{stub}_{S_l}.\mathit{date}(T''_l)$$
$$\Rightarrow \mathit{protected}_{S_l}(S_k, T''_l) \le \mathit{stub}_{S_l}.\mathit{olddate}(T''_l)$$
$$\Rightarrow \mathit{protected}_{S_l}(S_k, T''_l) < \mathit{globalmin}_S(T)$$

Moreover, $\mathit{stub}_{S_l}.\mathit{date}(T''_l)$ is contained in $\mathrm{CYCLICLIVE}_{S_l(T''_l) \to S_k}$, and greater values are contained in subsequent CYCLICLIVE messages. Since $\mathit{scion}_{S_k}.\mathit{date}(T_k) < \mathit{globalmin}_S(T)$, none of them have been received by $S_k$ before $T_k$. From Lemma 1, at time $T_l$, only $\mathrm{CYCLICLIVE}_{S_k \to S_l}$ messages sent before $T_k$ have been received by $S_l$. All these messages have a threshold strictly lower than $T''_l$, since $\mathrm{CYCLICLIVE}_{S_l(T''_l) \to S_k}$ and subsequent messages have not been received by $S_k$ at time $T_k$. So, at time $T_l$, $\mathit{protected}_{S_l}(S_k, T''_l)$ is still in the $\mathit{protected}_{S_l}(T_l)$ set:

$$\Rightarrow \mathit{localmin}_{S_l}(T_l) \le \mathit{protected}_{S_l}(S_k, T''_l)$$

But $\mathit{protected}_{S_l}(S_k, T''_l) < \mathit{globalmin}_S(T)$

$$\Rightarrow \mathit{localmin}_{S_l}(T_l) < \mathit{globalmin}_S(T)$$

which is a contradiction, since $\mathit{localmin}_{S_l}(T_l)$ is used to compute $\mathit{globalmin}_S(T)$ on the central site at time $T$.

We have proved that $\max(\mathit{stub}_{S_l}.\mathit{date}(T_l), \mathit{stub}_{S_l}.\mathit{olddate}(T_l)) < \mathit{globalmin}_S(T)$. From this, at time $T_l$, $\mathit{stub}_{S_l}$ is not reachable from any local roots. It is only reachable from scions whose date at time $T_l$ is lower than $\mathit{globalmin}_S(T)$.
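The date propagation used in the proof of Lemma 2 can be sketched as follows. This is only an illustration under stated assumptions: dates are integers, `olddate` is updated each time a trace assigns a new date to the stub, and the scion keeps the maximum of the dates it receives so that its date never decreases. The classes `Stub` and `Scion` and their methods are hypothetical names, not the SSPC data structures.

```python
from dataclasses import dataclass


@dataclass
class Stub:
    date: int     # date given to the stub by the latest local trace
    olddate: int  # maximum of all previous values of `date`

    def retrace(self, new_date: int) -> None:
        # Remember the largest date seen so far before overwriting it.
        self.olddate = max(self.olddate, self.date)
        self.date = new_date

    def date_to_send(self) -> int:
        # Value carried for this stub in a CYCLICLIVE message.
        return max(self.date, self.olddate)


@dataclass
class Scion:
    date: int

    def receive(self, stub_date: int) -> None:
        # Assumption: taking the max keeps scion.date increasing.
        self.date = max(self.date, stub_date)


stub = Stub(date=5, olddate=3)
stub.retrace(4)                     # the new trace assigns a smaller date
scion = Scion(date=2)
scion.receive(stub.date_to_send())  # the scion sees 5, not 4
```

Sending the maximum of the two fields means that a temporarily smaller date on the stub cannot lower the date recorded on the scion, which is the inequality the proof relies on.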

Proof

Let $(\mathit{stub}_{S_j}, \mathit{scion}_{S_i})$ be a stub-scion pair such that $\mathit{scion}_{S_i}.\mathit{date}(T') < \mathit{globalmin}_S(T')$. We must prove that it belongs to a free cycle before erasing it.

As $\mathit{scion}_{S_i}.\mathit{date}()$ is increasing, and $T_i < T < T'$, we have:

$$\mathit{scion}_{S_i}.\mathit{date}(T_i) < \mathit{globalmin}_S(T)$$

From Lemma 2, we know that $\mathit{stub}_{S_j}$ is reachable at time $T_j$ only from scions satisfying the same property at time $T_j$. This property characterises free cycles if and only if, for a stub-scion pair $(\mathit{stub}_{S_l}, \mathit{scion}_{S_k})$ in the cycle, no local reference may have been created with $\mathit{scion}_{S_k}$ by an application message sent from $\mathit{stub}_{S_l}$ at time $T'_l$ before $T_l$ and received by space $S_k$ at time $T'_k$ after time $T_k$.

This event must belong to one of the following two cases: either a $\mathrm{CYCLICLIVE}_{S_l \to S_k}$ message sent after the application message is received before $T_k$ by $S_k$, or no $\mathrm{CYCLICLIVE}_{S_l \to S_k}$ message sent after the application message is received before $T_k$ by $S_k$.

In the first case, all application messages received after $T_k$, but sent before the last $\mathrm{CYCLICLIVE}_{S_l \to S_k}$ received before $T_k$, will be rejected, since their timestamps are lower than the threshold at time $T_k$, which is set to be greater than the timestamps of all $\mathrm{CYCLICLIVE}_{S_l \to S_k}$ messages received before $T_k$.

In the second case, from Lemma 1, at time $T_k$, all $\mathrm{CYCLICLIVE}_{S_l \to S_k}$ messages received were sent before $T_l$. Since no $\mathrm{CYCLICLIVE}_{S_l \to S_k}$ messages sent after the application message have been received before $T_k$, the last $\mathrm{CYCLICLIVE}_{S_l \to S_k}$ received before $T_k$ was sent before the application message. Thus $\mathit{protected}_{S_l}(T_l)$ still contains a protected time lower than $\mathit{stub}_{S_l}.\mathit{olddate}(T'_l)$ (to cope with mutator activity).

$$\Rightarrow \mathit{localmin}_{S_l}(T_l) \le \mathit{stub}_{S_l}.\mathit{olddate}(T'_l)$$

From Lemma 2, $\mathit{stub}_{S_l}.\mathit{olddate}(T'_l) < \mathit{globalmin}_S(T)$

$$\Rightarrow \mathit{localmin}_{S_l}(T_l) < \mathit{globalmin}_S(T)$$

which is a contradiction, since $\mathit{localmin}_{S_l}(T_l)$ is used to compute $\mathit{globalmin}_S(T)$.

So, we have proved that $\mathit{scion}_{S_i}$ belongs to a free cycle at time $T' \ge T$ and can be safely deleted.
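The condition established by this proof can be turned into a small selection step, sketched here under simplifying assumptions: dates are integers, each scion is identified by a string, and scions whose date is NOW are not present in the input. The function name `scions_to_cut` is hypothetical and does not come from the SSPC implementation.

```python
from typing import Dict, List


def scions_to_cut(scion_dates: Dict[str, int], globalmin: int) -> List[str]:
    """Return the scions whose date is strictly below the global minimum."""
    # The proof only covers the strict inequality date < globalmin,
    # so a scion whose date equals globalmin is kept.
    return sorted(name for name, date in scion_dates.items() if date < globalmin)


candidates = scions_to_cut({"a": 3, "b": 9, "c": 7}, globalmin=7)
```

With these inputs only scion `a` is selected. Scion `c`, whose date equals the global minimum, is kept, because the safety argument does not apply to it.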