Comparing Reference Counting and Global Mark-and-Sweep on Parallel Computers

Hirotaka Yamamoto, Kenjiro Taura, and Akinori Yonezawa
Department of Information Science, University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo, Japan
{ymmt,tau,[email protected]

Abstract. We compare two dynamic memory management schemes for distributed-memory parallel computers, one based on reference counting and the other based on global mark-and-sweep. We present a simple model in which one can analyze the performance of the two types of GC schemes, and show experimental results. The two important observations drawn from the analysis are: 1) the performance of reference counting largely depends on the shapes of data structures; specifically, it is bad when applications use deeply nested data structures such as distributed trees; 2) the cost of reference counting has a portion that is independent of the heap size, while that of global mark-and-sweep does not. We confirmed these observations through experiments using three parallel applications.

1 Introduction

Most of the proposed algorithms and implementations of garbage collection (GC) on distributed-memory parallel computers can be classified into two groups: reference counting and global mark-and-sweep [13]. Reference counting has often been considered superior to global mark-and-sweep because it does not require any synchronization, but previous arguments are often superficial, not confirmed by experiments, and lack a macroscopic viewpoint on the overall efficiency of these two collection schemes. This paper reports a performance comparison of these two schemes, with simple analyses of the overall efficiency of both reference counting and global mark-and-sweep, and experimental results. Experiments were done on a parallel computer in which processors communicate solely via message passing, using three parallel applications. Observations drawn from the analyses and experimental results include:

- Performance of reference counting is sensitive to the "shape" of the distributed data structures to be collected. More specifically, it is bad when the application uses deeply nested distributed data structures, such as distributed trees.
- For applications that do not heavily use such deeply nested structures, the performance of a simple global mark-and-sweep and a reference counting are comparable.

The rest of this paper is organized as follows. Section 2 specifies the algorithms and implementations we used; Sect. 3 then analyzes the performance of both reference counting and global mark-and-sweep on a simple model. Section 4 shows the results of the experiments we performed to confirm our analyses. After briefly discussing other performance studies of distributed garbage collection in Sect. 5, we summarize our claims and state future work in Sect. 6.

2 Collection Algorithms and Implementations

2.1 Local Collector and Entry Table

Like other collectors, our reference counting and global mark-and-sweep consist of two levels: local and global. Reference counting-based collectors are often combined with a local tracing collector and keep track of reference counts only for remote references, since managing reference counts on every pointer duplication and deletion would incur very large overhead. Global mark-and-sweep also often has a local collector besides the global collector, to collect local garbage quickly.

To prevent local collections from reclaiming objects that may still be referenced by other processors, an entry table is used. An entry table keeps pointers to an object until a global collector (either a reference counter or a global marker) finds that the object is no longer remotely referenced. The local collector simply treats the entry table as a root of a local collection.

Local collections in our collectors are co-scheduled; that is, when one processor performs a local collection, it requests all processors to perform a local collection. This does not require any synchronization; it merely tries to start local collections at the same time on all processors. At a glance this appears to entail excessive local collections and a waste of time; we confirmed, however, that it does not degrade the performance of the applications we tested, and in fact often improves it, because scheduling local collections independently disturbs the synchronization performed by the applications. Further accounts of whether local collections should be synchronized have been published in separate papers [16, 18].

Garbage collectors generally employ a strategy that adjusts the heap size according to the amount of live data, and dynamically expand the heap (i.e., request memory from the OS) when needed. When the heap is exhausted, garbage collection is triggered if the application has made enough allocations since the last collection; otherwise, the heap is expanded. In [16], we presented an adaptation of this strategy to distributed-memory parallel computers, but for the purpose of the experiments in this paper, we simply fix the heap size when the application starts. As will be described in Sect. 4, we compare reference counting and global mark-and-sweep given the same amount of memory. We implemented our collectors by extending Boehm's conservative garbage collection library [5, 6], which runs mainly on uniprocessor systems.

2.2 Reference Counting

In reference counting collectors, each processor increments the reference count of an object when it sends a reference to another processor, and decrements it when it no longer uses a remote reference to the object. Objects whose reference counts reach zero are unregistered from the entry table and thus can be reclaimed by a local collector. Actual implementations are more complicated in order to deal with communication delays; our algorithm is a variant of Weighted Reference Counting [3, 17], which, in contrast to naive reference counting, does not send any message when it duplicates a remote reference. The details are unimportant for the purpose of this paper, though.

The question is how a processor detects remote references that it will no longer use. Typical reference counting-based collectors in distributed environments, including ours, detect them by local collections; a local collection traverses objects from its local root (including its entry table) and, as a by-product of this traversal, finds remote references that are no longer reachable from the local root. It then sends "delete messages" to the owners of such references. Upon reception of a delete message, the receiving processor decrements the reference count of the corresponding object. The reference counting scheme we focus on in this paper should thus be called a hybrid of a local tracing collector and remote reference counting.

The overhead involved in the above hybrid scheme seems very small; it merely adds a small amount of communication overhead to each local collection. At first glance, it seems as efficient as local mark-and-sweep on uniprocessors, and therefore more involved schemes, such as global mark-and-sweep, appear unnecessary (or necessary only for collecting cycles). A careful analysis and experiments reveal that this is not quite right, however. Leaving detailed discussion to Sect. 3, here we just illustrate a situation in which reference counting can become very inefficient.
Consider a distributed binary tree in which all references between a parent and a child are remote, and suppose the reference count of the root node has just become zero. The first local collection after this point reclaims only the root node, because the reference counts of the other nodes are still non-zero. This local collection not only retains these objects, but also spends time traversing these (garbage) objects. Thus, the efficiency of this local collection (space reclaimed per work) can be as bad as reclaiming a single node by traversing the rest of the entire tree. The second collection reclaims only the two nodes at the second level from the root, whose reference counts have now become zero, again spending time traversing the rest of the tree; the third reclaims the four objects at the third level, and so forth. This is, of course, an overly pessimistic scenario, but the point we would like to make is that hybrid schemes of this form retain objects that could be reclaimed by one stroke of global mark-and-sweep, spending a considerable amount of time marking them repeatedly.
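This pessimistic scenario is easy to quantify with a toy simulation (an illustrative sketch, not part of the measured implementation): for a dead binary tree whose parent-child references are all remote, each local collection reclaims only the level whose reference counts have just dropped to zero, while still traversing every remaining garbage node.

```python
def simulate_tree_reclamation(height):
    """For a dead binary tree of the given height with all-remote edges,
    return (reclaimed, still_traversed) for each successive local collection."""
    remaining = 2 ** (height + 1) - 1          # total garbage nodes
    stats = []
    for level in range(height + 1):
        reclaimed = 2 ** level                 # this level's counts reach zero
        remaining -= reclaimed
        stats.append((reclaimed, remaining))   # the rest is marked in vain
    return stats

# A height-3 tree (15 nodes) needs 4 local collections; the wasted marking
# work totals 14 + 12 + 8 + 0 = 34 node visits, versus none for a single
# global mark-and-sweep over the same garbage.
print(simulate_tree_reclamation(3))
```

The quadratic-looking waste here is exactly the "space reclaimed per work" degradation quantified in Sect. 3.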

2.3 Global Mark-and-Sweep

Our global collector is a natural extension of a uniprocessor mark-and-sweep collector, and has been described in [16]. Upon a global collection, each processor starts marking from its local root (excluding its entry table) and traces remote references by sending "mark messages", so as to traverse the entire object graph. Objects that are not marked when marking finishes are reclaimed. A global collector requires global coordination of all processors to detect the termination of this global marking, and many studies have cited this coordination to argue that naive global mark-and-sweep is worse and less scalable than reference counting. We claim, however, that arguments like this lack a macroscopic viewpoint on the cost of garbage collection. A garbage collection scheme, or any memory management scheme in general, is more efficient than another as long as it reclaims a larger amount of memory with the same amount of work. In the next section, we analyze how much work is required to collect the same amount of garbage for both reference counting and global mark-and-sweep.

There remains a question as to which of the local/global collections should be performed when the heap overflows. We adaptively choose one by comparing the efficiency (space reclaimed per work) of the last local collection and the last global collection. A global collection is also performed whenever the original algorithm would have expanded the heap, because we suppress heap expansion as mentioned above.
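The adaptive choice just described can be sketched as follows; the bookkeeping class and its names are hypothetical, not taken from our implementation:

```python
class CollectorPolicy:
    """On heap overflow, run whichever collection (local or global) was more
    efficient -- space reclaimed per unit of work -- the last time it ran."""

    def __init__(self):
        # Optimistic initial values so each collector gets tried at least once.
        self.efficiency = {"local": float("inf"), "global": float("inf")}

    def record(self, kind, bytes_reclaimed, work):
        """Update the efficiency of the collection that just finished."""
        self.efficiency[kind] = bytes_reclaimed / work

    def choose(self):
        """Pick the collection to perform on the next heap overflow."""
        return max(self.efficiency, key=self.efficiency.get)
```

In the scheme above, a global collection is additionally forced whenever the original uniprocessor algorithm would have expanded the heap.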

3 Performance Analysis on a Simple Model

3.1 Cost of Garbage Collection

A major goal of garbage collection is to reclaim a larger amount of memory with a smaller amount of work; in other words, to minimize the cost of garbage collection, which is defined as:

    Cost = (Time spent on GC) / (Amount of Allocated Memory) .   (1)

The amount of allocated memory becomes equal to the amount of reclaimed memory if the application runs for a sufficiently long term and reuses reclaimed memory many times. Therefore the cost of garbage collection can also be expressed as:

    Cost = (Time spent on GC) / (Amount of Reclaimed Memory) .   (2)

For example, the cost of a tracing collector on uniprocessor systems is:

    C_local = α_local L / (H − L) ,   (3)

where H is the heap size, L is the amount of live objects, and α_local is a constant representing the cost to traverse one word of memory.[1] Obviously, given a larger heap, the cost of GC on uniprocessors becomes smaller. In the rest of this section, we extend this argument to the cost of reference counting and global mark-and-sweep, assuming the following simple behavior of applications:

- Applications repeat the same computation many (= n) times.
- Applications allocate A bytes of objects during one period of computation, and then discard all of them at the end of the period.

3.2 Analysis of Reference Counting

As mentioned in Sect. 2, the intuitive explanation of the possible bad performance is that nodes near the leaves of a distributed tree are marked many times until their reference counts become zero. Below, we formalize this intuition.

Depth of Garbage. First we define the depth of a garbage object as the minimum number that satisfies the following constraints:

- If the reference count of a garbage object o is zero, the depth of o is zero.
- If a garbage object o is referenced by a local reference from an object of depth d, the depth of o is at least d.
- If a garbage object o is referenced by a remote reference from an object of depth d, the depth of o is at least d + 1.

The definition captures our intuition that objects at the end of a long chain of remote references are "deep". Figure 1 shows a graph of garbage objects and their depths. Under this definition, an object whose depth is d when it becomes garbage survives d local collections after it becomes garbage, and is collected by the (d + 1)th local collection.

Impact of Deep Garbage. Let G_i be the total size of the objects that are allocated in a period and whose depths are i when they become garbage. Thus we have A = Σ_{i≥0} G_i. The application repeats the computation n times, so the total amount of allocated memory is:

    nA = n Σ_{i≥0} G_i .   (4)

Since garbage at depth d is marked d times until reclaimed, the total work required to reclaim all objects is:

    W_rc = α_local m L + α_local n Σ_{i≥0} i G_i ,   (5)

[1] The cost of sweeping is ignored. In fact, it merely adds a small constant to (3), because the time spent on sweeping is proportional to the amount of allocated memory.

[Figure 1 appears here: three processors (PE0, PE1, PE2) containing objects annotated with their depths.]

Fig. 1. Depth of Garbage. Squares, circles, and arrows represent processors, objects, and references (either remote or local), respectively. Numbers in circles are the depths of objects.
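The depths illustrated in Fig. 1 can be computed as a fixed point of the three constraints above. The following sketch uses a hypothetical graph encoding (a list of (source, destination, is_remote) edges among garbage objects) and assumes there are no cycles of remote references, which reference counting could not reclaim at any depth:

```python
def garbage_depths(objects, edges):
    """Minimum depth of each garbage object under the constraints:
    depth(o) >= depth(referrer) for a local reference, and
    depth(o) >= depth(referrer) + 1 for a remote reference."""
    depth = {o: 0 for o in objects}        # rule 1: start everything at zero
    changed = True
    while changed:                         # propagate to a fixed point
        changed = False
        for src, dst, is_remote in edges:
            d = depth[src] + (1 if is_remote else 0)
            if d > depth[dst]:
                depth[dst] = d
                changed = True
    return depth

# A three-object chain root -> a -> b where both links are remote:
# the root is at depth 0, a at depth 1, b at depth 2, so b survives
# two local collections and is reclaimed by the third.
print(garbage_depths(["root", "a", "b"],
                     [("root", "a", True), ("a", "b", True)]))
```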

where m is the number of GCs performed during the n periods of computation, which is determined by the heap size and the live objects. The first term (α_local m L) represents the amount of work for live objects, and the second term (α_local n Σ_{i≥0} i G_i) the work for garbage.[2]

The total number of GCs (m) is determined as follows. Objects at depth i when they become garbage survive i local GCs, and the amount of such garbage allocated between two successive GCs is (n/m) G_i, so the total amount of such garbage (objects whose depths were i when they became garbage) just before a local GC (TG_i) is:

    TG_i = (i + 1) (n/m) G_i .   (6)

Therefore the total amount of garbage (TG) just before a local GC is:

    TG = Σ_{i≥0} (n/m) (i + 1) G_i .

From the fact that TG is equal to H − L,

    m = (Σ_{i≥0} i G_i + Σ_{i≥0} G_i) n / (H − L) .   (7)

Thus, the cost of reference counting is:

    C_rc = W_rc / (nA) = C_local (d + 1) + α_local d ,   (8)

where

    d = Σ_{i≥0} i G_i / Σ_{i≥0} G_i .   (9)

d indicates the average depth of garbage; (d + 1) C_local is the cost of marking live objects and α_local d is the cost of marking garbage.

[2] For simplicity, we do not include the amount of work required to send/receive delete messages. It would add a constant (depending on the application) to the cost of reference counting (C_rc).
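Equations (8) and (9) are straightforward to evaluate numerically. The sketch below plugs made-up garbage distributions into the model (all numbers are illustrative; α_local is normalized to 1):

```python
def c_local(alpha, L, H):
    """Uniprocessor tracing cost, eq. (3)."""
    return alpha * L / (H - L)

def avg_depth(G):
    """Average garbage depth d, eq. (9); G[i] is the garbage at depth i."""
    return sum(i * g for i, g in enumerate(G)) / sum(G)

def c_rc(alpha, L, H, G):
    """Hybrid reference counting cost, eq. (8)."""
    d = avg_depth(G)
    return (d + 1) * c_local(alpha, L, H) + alpha * d

shallow = [100.0, 10.0]             # almost all garbage at depth 0
deep = [10.0, 10.0, 10.0, 80.0]     # most garbage at depth 3

for H in (2.0, 4.0, 8.0, 64.0):     # heap sizes in units of L = 1
    print(H, round(c_rc(1.0, 1.0, H, shallow), 3),
             round(c_rc(1.0, 1.0, H, deep), 3))
# With shallow garbage the cost tracks C_local toward zero as H grows;
# with deep garbage it flattens out near alpha * d, however large the heap.
```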

Again, the first term represents the cost for live objects and the second term the cost for dead objects. It is intuitive that the first term decreases as the heap becomes large, and that the second term is proportional to the average depth of garbage (d), because an object at depth d is marked d times after it becomes garbage. Less intuitively, the first term also increases with d. This is because deep garbage survives many local GCs and hence occupies a part of the heap; deep garbage therefore shrinks the effective heap size available for allocation, making GCs more frequent. Two important interpretations of this result are:

- When the average depth of garbage (d) is very close to zero, C_rc ≈ C_local, which means reference counting is as efficient as uniprocessor garbage collection. When d is large, however, this no longer holds; C_rc is much larger than the cost of uniprocessor garbage collection (C_local), especially when d is large or the heap is large (hence C_local is very small).
- The cost cannot be less than α_local d no matter how large the heap is, because α_local d is independent of the heap size. More specifically, the cost approaches a constant (α_local d) as the heap size (H) becomes sufficiently large.

3.3 Analysis of Global Mark-and-Sweep

Cost of Global Collection. Global mark-and-sweep works only on live objects; therefore we can express its cost as:

    C_gms = α_gms L / (H − L) = (α_gms / α_local) C_local ,   (10)

where α_gms is a constant representing the average cost to traverse a remote/local reference. Admittedly, this does not capture dependencies on the number of processors (which affects the cost of synchronization), the communication latency of the network, the proportion of remote references to local references, or load imbalance among processors (which increases idle time during GC). These are simply represented as an increase in α_gms. α_gms nevertheless remains a constant independent of the heap size, as long as we use the same application and the same number of processors on the same machine to compare global mark-and-sweep and reference counting. The experiments in the next section are limited to meet these conditions.[3] The cost of global mark-and-sweep thus may be several times as large as that of a local collector, but it retains a preferable feature of local GC, namely, larger heaps reduce its cost. As mentioned in Sect. 2, we use a local collector together with a global collector, expecting that the local collector is often more efficient than the global collector.

[3] We agree that some of these conditions should be removed to obtain more detailed characteristics of global mark-and-sweep. Our analysis is, however, still usable for the purpose of comparing the two GC schemes.

Cost of Local Collections.

A local collector used with a global collector can reclaim only garbage at depth zero (G_0). Assuming that a global collection is performed every m local collections, the total amount of work of the m local collections is:

    W_lgc = α_local Σ_{j=1}^{m} j Σ_{i≥1} G_i = α_local (m(m + 1)/2) Σ_{i≥1} G_i ,   (11)

since the amount of garbage to be marked by the local collector increases by Σ_{i≥1} G_i with every local collection. The average cost of the m local collections is thus:

    C_lgc = α_local (m + 1) Σ_{i≥1} G_i / (2 G_0) .   (12)

Equation (12) indicates that:

- The cost of local collection gets worse as m becomes large.
- The cost of local collection depends on the proportion of local garbage (G_0) to the rest (Σ_{i≥1} G_i); specifically, it is often bad when applications heavily use remote references.

In our implementation, however, whichever of the local/global collections seems better is selected and performed dynamically, so the cost of local collection never exceeds that of global collection. In addition, since the cost of global collection becomes smaller as the heap becomes larger, the overall performance of global mark-and-sweep improves with larger heaps.
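As a numeric illustration of eq. (12) (all values made up; α_local normalized to 1), the average local-collection cost grows linearly with m and with the ratio of deep garbage to depth-zero garbage:

```python
def c_lgc(alpha, m, g0, g_deep):
    """Average cost of m local collections between two global collections,
    eq. (12); g_deep stands for the sum of G_i over i >= 1."""
    return alpha * (m + 1) * g_deep / (2 * g0)

for m in (1, 2, 4, 8):
    print(m, c_lgc(1.0, m, g0=100.0, g_deep=50.0))
# The adaptive policy of Sect. 2.3 caps this growth: once c_lgc exceeds the
# cost of a global collection, the global collector is chosen instead.
```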

4 Experiments

We performed experiments to compare the performance of the two distributed garbage collectors and to confirm our analyses. We used three parallel applications (Bin-tree, Puzzle, and N-body), written in the concurrent object-oriented language ABCL/f, on a Sun Ultra Enterprise 10000.

4.1 Environments

ABCL/f is a concurrent object-oriented language proposed by Taura et al. [15]. It supports concurrent objects and dynamic thread creation via future calls of procedures and methods. A concurrent object can be created dynamically on any processor and shared among multiple processors via remote references. Besides concurrent objects, a special object called a communication channel is created by each remote procedure/method call to communicate the result value of the invocation. The objects that can be shared among processors thus consist of concurrent objects and communication channels.

The Sun Ultra Enterprise 10000 consists of 64 UltraSPARC I processors operating at 250 MHz and 6 GB of shared memory. We emulated a distributed-memory parallel computer by running ABCL/f on a communication layer in which a group of processes uses the shared memory solely for message passing. We used only 48 of the 64 processors so that the experiments would not be disturbed by other system processes.

4.2 Applications

Bin-tree builds a binary tree of height 13 spanning all processors, and traverses every node of the tree once. The construction and traversal of a tree is repeated 60 times; upon creating a new tree, the old tree becomes garbage. The majority of the garbage therefore consists of the binary trees and communication channels.

Puzzle solves an 8-puzzle (a small version of the 15-puzzle) whose shortest solution is 17 steps, using a distributed list, 30 times in succession. Communication in the 8-puzzle is very frequent, so it creates a large number of garbage communication channels.

N-body solves an N-body problem containing 8000 bodies using a hierarchical oct-tree structure [2]. It simulates the problem for 60 successive steps and replaces the old tree with a new tree at each step. Like Bin-tree, N-body generates distributed tree structures; however, it also generates much more local garbage than Bin-tree while computing.

4.3 Results

To confirm the observations drawn from our analyses, we measured, for each application, 1) the distribution of garbage, and 2) the time spent on GC for both reference counting and global mark-and-sweep.

The distribution of garbage is measured by running each application for only a single period of computation and then repeatedly performing local collections of the reference counting collector, without making any further allocation. The amount of memory reclaimed by the (i + 1)th local collection gives the amount of garbage at depth i (G_i). The result is shown in Fig. 2. From Fig. 2(a), it is clear that Bin-tree generates a considerable amount of deep garbage, which requires many local collections to be collected. Figure 2(c), on the other hand, indicates that most garbage generated by N-body is shallow and thus can be efficiently reclaimed by reference counting. Note that garbage of depth 0 is special in that it can be collected without reference counting; the simple local collection used in the global garbage collector suffices.

Figure 3 shows the time spent on GC in each application. GC time is totaled over all (48) processors. LGC, RRC, and GGC are the elapsed times of local collections, of sending and receiving delete messages, and of global collections, respectively. The x-axis represents the heap size of one processor, where L refers to the minimum heap size required to run that application. L is measured by invoking global mark-and-sweep eagerly and frequently. Each application was then run with heap sizes ranging from 2L to 8L. L was 6 MB for Bin-tree, 2.5 MB for Puzzle, and 5.5 MB for N-body. Global mark-and-sweep was better in two of the applications, and most notably, was significantly better in Bin-tree. In addition, as the heap size increases, global mark-and-sweep consistently becomes better, whereas reference counting does not.

[Figure 2 appears here: three bar charts, (a) Bin-tree, (b) Puzzle, and (c) N-body, plotting the amount of garbage (Kbytes) against the depth of objects.]

Fig. 2. Distribution of Garbage. The x-axis indicates the depth of garbage whose amount is shown by the y-axis.

[Figure 3 appears here: three bar charts, (a) Bin-tree, (b) Puzzle, and (c) N-body, plotting GC time (seconds) against heap size (2L, 4L, 6L, 8L), with each bar broken down into LGC, RRC, and GGC.]

Fig. 3. Elapsed Time in GC. Vertical bars represent the time spent on GC; left bars represent the time of global mark-and-sweep, and right bars represent the time of reference counting.

These experimental results match our analyses fairly well. For Bin-tree and Puzzle, where the weighted average depth of garbage is larger than that of N-body, reference counting performed worse than global mark-and-sweep, as our analyses predicted. For N-body, on the other hand, most garbage is shallow and reference counting performed better. Moreover, larger heaps did not improve the performance of reference counting in Bin-tree and Puzzle, whereas they did improve it in N-body. This also matches our analysis of reference counting, which predicted that the performance of reference counting becomes independent of the heap size when d and H are sufficiently large, as in Bin-tree and Puzzle. The measured performance of global mark-and-sweep fits our prediction less well, in the sense that doubling the heap size did not halve the total time spent on global mark-and-sweep. We suspect that Boehm's library has some inherent overheads when handling such large heaps, which is probably also why the performance of reference counting gradually worsened as the heap grew in Bin-tree and Puzzle.

5 Related Work

A number of algorithms for reference counting and global mark-and-sweep have been proposed in the literature [9, 13]. Performance studies are rare, presumably because most of the algorithms have not been implemented. Existing performance studies [10, 14, 16, 19] only show the performance of either a global mark-and-sweep or a reference counting and do not compare them. Most previous work on distributed garbage collection schemes has claimed that:

- Reference counting can do local collections independently and can incrementally reclaim global garbage without any difficulty, whereas global mark-and-sweep requires global coordination of all processors.
- Naive reference counting-based collectors cannot reclaim cyclic garbage, while mark-and-sweep-based collectors can.

These statements are correct; however, they do not clearly explain the difference in performance between these two collection schemes, or they explain it without experiments or sufficiently convincing analysis.

6 Conclusion and Future Work

The analysis in Sect. 3 clarified that the efficiency of (hybrid) reference counting schemes can potentially be very bad. The experimental results in Sect. 4.3 confirmed that reference counting can perform much worse than a (fairly unsophisticated) global mark-and-sweep. A better understanding of the benefits of reference counting is:

- It is as efficient as uniprocessor mark-and-sweep as long as most distributed garbage is shallow (i.e., not placed at the end of a long chain of remote references).
- Reference counting does not require any extra implementation effort to make it latency-tolerant, because communication latency is naturally masked by the application. Note that while the global mark-and-sweep presented in this paper may be sensitive to latency, it is certainly possible to make a global mark-and-sweep latency-tolerant, in ways described in [7] or [10], for example. Hence, the real benefit is that latency tolerance comes free with reference counting.

To summarize, reference counting will be the choice in environments where communication latency is very long, or where we know that most distributed garbage is shallow. This will be the case, for example, in distributed client-server computing in which most references point from clients to a server, so that long chains of remote references are rarely created. It was not the case, on the other hand, in our parallel applications, in which deeply nested data structures are often created and the communication latency is typically much smaller than that of a LAN. Interestingly enough, the situation may be changing in distributed computing environments too, with the emerging low-latency interconnect technologies [4, 8] and global computing paradigms [1]. We are now planning to:

- adapt our analysis to other GC algorithms, such as Hughes' time-stamp-based global collector [7], indirect reference counting [12], or a generational local collector [11],
- perform more experiments using realistic applications and true distributed-memory computers, and
- extend our analysis of global mark-and-sweep.

Acknowledgment. A discussion with Takashi Chikayama prompted us to extend our previous, incomplete work. Naoki Kobayashi gave us many invaluable comments on drafts of this paper. We are also thankful to the anonymous referees for their insightful comments. We extend our thanks to the many people who read and commented on earlier versions of this paper: Tomio Kamada, Toshio Endo, and Norifumi Gotoh.

References

1. A. D. Alexandrov, M. Ibel, K. E. Schauser, and C. J. Scheiman. SuperWeb: Research Issues in Java-Based Global Computing. Concurrency: Practice and Experience, June 1997.
2. Josh Barnes and Piet Hut. A hierarchical O(N log N) force-calculation algorithm. Nature, (324):446-449, 1986.
3. David I. Bevan. Distributed garbage collection using reference counting. In Parallel Architectures and Languages Europe, number 258 in Lecture Notes in Computer Science, pages 176-187. Springer-Verlag, 1987.
4. Nanette J. Boden, Danny Cohen, Robert E. Felderman, Alan E. Kulawik, Charles L. Seitz, Jakov N. Seizovic, and Wen-King Su. Myrinet: A gigabit-per-second local area network. IEEE Micro, 15(1):29-36, February 1995.
5. Hans-Juergen Boehm. Space efficient conservative garbage collection. In Conference on Programming Language Design and Implementation, SIGPLAN Notices, pages 197-206. ACM, June 1993.
6. Hans-Juergen Boehm and David Chase. A proposal for garbage-collector-safe C compilation. The Journal of C Language Translation, 4(2):126-141, December 1992.
7. John Hughes. A distributed garbage collection algorithm. In Proceedings of Functional Programming Languages and Computer Architecture, number 201 in Lecture Notes in Computer Science, pages 256-272. Springer-Verlag, 1985.
8. IEEE Computer Society, New York, USA. IEEE Standard for Scalable Coherent Interface (SCI), August 1993.
9. Richard Jones and Rafael Lins. Garbage Collection: Algorithms for Automatic Dynamic Memory Management. John Wiley & Sons, 1996.
10. Tomio Kamada, Satoshi Matsuoka, and Akinori Yonezawa. Efficient parallel global garbage collection on massively parallel computers. In Proceedings of SuperComputing, pages 79-88, 1994.
11. Henry Lieberman and Carl Hewitt. A real-time garbage collector based on the lifetimes of objects. Communications of the ACM, June 1983.
12. Jose M. Piquer. Indirect reference-counting: a distributed garbage collection algorithm. In Parallel Architectures and Languages Europe, number 365/366 in Lecture Notes in Computer Science, pages 150-165. Springer-Verlag, June 1991.
13. David Plainfosse and Marc Shapiro. A survey of distributed garbage collection techniques. In Proceedings of International Workshop on Memory Management, number 986 in Lecture Notes in Computer Science. Springer-Verlag, 1995.
14. Kazuaki Rokusawa and Nobuyuki Ichiyoshi. Evaluation of remote reference management in a distributed KL1 implementation. IPSJ SIG Notes 96-PRO-8 (SWoPP'96 PRO), 96:13-18, August 1996. (In Japanese.)
15. Kenjiro Taura, Satoshi Matsuoka, and Akinori Yonezawa. ABCL/f: a future-based polymorphic typed concurrent object-oriented language, its design and implementation. In Specification of Parallel Algorithms, DIMACS, pages 275-291, 1994.
16. Kenjiro Taura and Akinori Yonezawa. An effective garbage collection strategy for parallel programming languages on large scale distributed-memory machines. In Proceedings of Principles and Practice of Parallel Programming, SIGPLAN, pages 264-275. ACM, June 1997.
17. Paul Watson and Ian Watson. An efficient garbage collection scheme for parallel computer architectures. In Parallel Architectures and Languages Europe, number 258 in Lecture Notes in Computer Science, pages 432-443. Springer-Verlag, 1987.
18. Hirotaka Yamamoto, Kenjiro Taura, and Akinori Yonezawa. A performance comparison between reference count and distributed marking for global garbage collection scheme on distributed-memory machines. IPSJ SIG Notes 97-PRO-14 (SWoPP'97 PRO), 97(78):109-114, August 1997. (In Japanese.)
19. Masahiro Yasugi. Evaluation of distributed concurrent garbage collection on a data-driven parallel computer. In Proceedings of Joint Symposium on Parallel Processing, volume 97, pages 345-352, May 1997.
