Dynamic Clustering in an Object-Oriented Distributed System

Yvon Gourhant, Sylvain Louboutin, Vinny Cahill, Andrew Condon, Gradimir Starovic, Brendan Tangney

Distributed Systems Group, Dept. of Computer Science, Trinity College, Dublin 2, Ireland

e-mail:
[email protected]
October 9, 1992
Abstract
In a large object-oriented distributed system, object grouping is crucial in order to optimize communication between objects and disk I/O transfers. In this paper, we present a general-purpose and scalable object clustering method which is integrated with garbage collection and load balancing. We propose a mixed dynamic and programmer-driven approach.

1 Introduction

The evolution of distributed applications is characterized by a growing number of nodes and (possibly persistent) objects, due to an increasing number of users and to code reuse. As a result, object clustering is important for performance purposes: to co-locate objects that communicate often, and to optimize disk I/O. Moreover, object clustering improves not only paging performance but also memory usage, efficiency of garbage collection, and load balancing. In addition, we adopt the following goals: to provide transparent, general-purpose, multi-language and scalable solutions. We discard application-specific solutions (e.g., using B+-trees for databases or R-trees for VLSI applications) because we consider interoperability between applications as more important. We choose a system-level approach, in order to support static languages such as C++ or Eiffel, as well as dynamic languages such as Smalltalk or CLOS. Finally, scalability is necessary to support new applications spanning many nodes and made of numerous objects. Within the framework of Amadeus [2, 11], a general-purpose fine-grained object-oriented distributed system supporting applications written in several languages, we present a general and scalable dynamic clustering algorithm, possibly programmer-driven, tightly coupled to garbage collection and load balancing¹ [4, 14], and executing in parallel with user computations.

This paper is structured as follows. First, an introduction to Amadeus, including the current clustering policy, is presented in section 2. Then, the motivations for clustering are given in section 3. Our general clustering algorithm is described in section 4, and two usages of this algorithm are presented in sections 5 and 6.

¹ In Amadeus, load balancing can be configured for balancing activities (distributed threads of control) or object clusters. In the latter case, clustering can be viewed as a first phase of load balancing: balancing objects between clusters. However, clustering has a long-term effect that load balancing does not have; clustering persistent objects has a (hopefully) beneficial effect on the performance of future executions.

2 Amadeus

The general approach in object-oriented distributed environments, such as Amadeus, is to partition distributed applications into a set of co-operating objects. Amadeus provides support for persistent and distributed objects. An object is a passive entity, instantiated within a context (a local address space). A job is a distributed process consisting of a set of activities. Activities are distributed threads of control, active in at most one context at a time. An activity executes by invoking operations on objects. When invoked, a given object may be located either in the current context, in another context (on the same or a different node), or in secondary storage. If the invoked object is not located in the local context, then an object fault occurs; it is resolved either by performing a cross-context invocation or by mapping the object into some context determined by security, heterogeneity and load-balancing considerations. Each object belongs to some cluster. Each cluster contains a group of objects which may vary dynamically, either by creation of new objects in the cluster, garbage collection of objects in the cluster, or migration of objects between clusters². Clusters are the unit of mapping into a context. Each context contains a set of clusters which may vary dynamically as more clusters are mapped into or unmapped from the context. When an object is required by a job, the entire cluster containing the object is mapped into an appropriate context. Clusters are only unmapped when the owning job terminates and there is no outstanding cross-context invocation active in the context, or as a result of a garbage collection scan. The current clustering policy consists of automatically creating clusters. Newly created objects are then iteratively inserted in the current cluster until it is full. The cluster size is defined at creation time. Different policies can be implemented by using explicit primitives for creating a cluster, inserting an object, mapping/unmapping a cluster on any node or on a particular node, finding the cluster containing an object, and finding the container of a cluster. Clusters are stored in containers. A container is simply a logically or physically contiguous area of disk storage. There may be zero, one or more containers per node. Each container stores a subset of the clusters in the system. Migration of clusters between containers is also supported. The choice of containers is currently explicit and static.

² Object migration between clusters is not supported by the current Amadeus version. We will implement it by replacing the migrated object in the source cluster by a stub acting as a forwarder. Useless stubs will eventually be suppressed by lazy deletion, when next referenced.
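The default policy and the explicit primitives described above can be illustrated with a short sketch. This is a hypothetical, simplified model (the class and function names are ours, not the Amadeus API): a cluster's capacity is fixed at creation time, newly created objects fill the current cluster, and a new cluster is started when it is full.

```python
class Cluster:
    """A simplified model of an Amadeus cluster: the unit of mapping.

    Hypothetical sketch; names do not correspond to the real Amadeus API.
    """
    def __init__(self, size):
        self.size = size          # capacity, fixed at creation time
        self.objects = []
        self.mapped = False       # whether the cluster is mapped into a context

    def is_full(self):
        return len(self.objects) >= self.size

    def insert(self, obj):
        if self.is_full():
            raise ValueError("cluster is full")
        self.objects.append(obj)

def default_policy(objects, cluster_size):
    """Default Amadeus policy: fill the current cluster, then start a new one."""
    clusters = [Cluster(cluster_size)]
    for obj in objects:
        if clusters[-1].is_full():
            clusters.append(Cluster(cluster_size))
        clusters[-1].insert(obj)
    return clusters

clusters = default_policy(range(10), cluster_size=4)
print([len(c.objects) for c in clusters])   # [4, 4, 2]
```

The explicit primitives (create, insert, map/unmap, find) would let a programmer replace `default_policy` with an application-aware arrangement.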
3 Motivations

On the one hand, we consider the following motivations for clustering:

To reduce communication costs. The cost of an intra-context invocation is 1 µs; the cost of a cross-context invocation on the same node can be 100 µs, according to Bershad [1]; the cost of a remote invocation is 8 ms in Amadeus; this cost in a wide area network is higher still. Experiments [5] show that object migration (replication of an object accessed for reading, and object migration for writing) has better overall performance than remote invocation³. They also show that load balancing is more efficient when migrations are initiated by the programmer than by automatic mechanisms.

To increase object locality for garbage collection. If clustering is efficient, garbage objects are automatically separated from the other objects, without global synchronization. Therefore, their cluster remains isolated. Conversely, the graph of objects used for garbage collection can be used for clustering.

To increase object locality for parallelism. If clustering is efficient, activities rarely diffuse (i.e. make remote invocations). Each activity executes the code of related objects. Different activities can be assigned to different nodes for parallel execution. They only diffuse for synchronization (via shared objects).

³ The cost of medium-grained object migration is more or less twice the cost of remote invocation in SOS [13] (without considering concurrency control), and fine-grained object migration [9] in Emerald and Amber is cheaper still.
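These figures admit a simple break-even model. The sketch below is plain arithmetic using the costs quoted above (1 µs intra-context, 100 µs cross-context, 8 ms remote) and assumes migration costs roughly twice a remote invocation, the SOS figure; the function name and invocation counts are ours, for illustration only:

```python
# Invocation costs quoted in the text, in microseconds.
INTRA_CONTEXT = 1          # same context
CROSS_CONTEXT = 100        # same node, different context (Bershad)
REMOTE = 8000              # remote invocation in Amadeus (8 ms)
MIGRATION = 2 * REMOTE     # medium-grained migration ~ 2x remote (SOS figure)

def cheaper_to_migrate(n_invocations):
    """True if migrating the object once and then invoking it locally
    beats performing all n invocations remotely."""
    migrate_then_local = MIGRATION + n_invocations * INTRA_CONTEXT
    stay_remote = n_invocations * REMOTE
    return migrate_then_local < stay_remote

print(cheaper_to_migrate(2))   # False: 16002 us vs 16000 us
print(cheaper_to_migrate(3))   # True: 16003 us vs 24000 us
```

Under these assumptions, migration pays for itself after only a handful of invocations, which is consistent with the experimental results cited above.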
To increase paging performance and memory usage, and to reduce disk I/O. If there is a high probability that an object fault occurs on object A when an object fault occurs on object B, objects A and B should be placed in the same cluster, in order to be mapped/unmapped together at the same time. The consequence is a reduction in the number of page faults, in disk I/O frequency, and in the mapping/unmapping of unreferenced objects located in the same page.

On the other hand, the primary criteria for clustering in a large distributed system are the following:

The number of objects has an impact on who creates clusters. On the one hand, the programmer can exploit his full knowledge of the program to find the best clustering arrangements, but this task is tedious, and its complexity for large applications makes choosing the optimal clustering policy difficult. On the other hand, clustering can be handled automatically by a compiler or a system mechanism, though not always as efficiently. So, we argue for an automatic approach, tunable by programmer hints (for explicit parallelism and shared-object handling).

The number of shared objects is not considered by most systems, which assume that shared objects can be grouped into the same cluster as all their direct accessors. This assertion has not been verified for distributed parallel applications such as TSP⁴ and SOR⁵, and is even more questionable for large multi-media applications, with a high degree of object sharing between documents, or by comparison with files in a traditional system. One solution consists of giving a weight to each pointer, then grouping the objects related by the pointers with the heaviest weights into one cluster. Another solution consists of replicating a shared object in different clusters [7]. The advantage is availability; the disadvantage is the difficulty of ensuring consistency. This solution can be generalized by grouping objects having the same sharing policy on the same pages, in order to avoid ping-pong problems [3]. We choose the solution based on weighted pointers, possibly overridden by the latter one.
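One way to realize the weighted-pointer solution is sketched below: each pointer from an accessor to a shared object carries a weight (e.g. an invocation count), and the shared object is grouped with the accessor whose pointer is heaviest, rather than with all of its direct accessors. The function and the weights are invented for illustration; they are not part of Amadeus:

```python
def place_shared_object(pointer_weights):
    """Pick the cluster whose pointer to the shared object is heaviest.

    pointer_weights maps an accessor's cluster name to the weight
    (e.g. the number of invocations) of its pointer to the shared object.
    """
    return max(pointer_weights, key=pointer_weights.get)

# A shared object invoked 120 times from cluster "a", 15 times from "b":
print(place_shared_object({"a": 120, "b": 15}))   # a
```

A replication scheme [7] would instead copy the object into both clusters; the weighted-pointer rule avoids the consistency problem at the price of making accesses from "b" remote.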
The dynamic nature of large distributed applications has a direct consequence on the choice of a clustering policy⁶.

⁴ The Traveling Salesman Problem consists of finding the shortest path for a salesman to visit each city in a given graph exactly once.

⁵ Successive Over-Relaxation consists of iteratively solving discrete Laplace equations on a grid.

⁶ Some examples of possible clustering policies are depth-first and breadth-first traversals, arrangement based on object type, and random arrangement. According to Stamos [8], no best general grouping scheme has been identified.
Static approaches [8, 17] are based on the static structure of programs. A disadvantage is that they do not suit long-lived, interactive and highly parallel applications well. Dynamic approaches [7, 16] are based on statistics of reference dereferencing. They are attractive because they automatically take the dynamic behavior of distributed applications into account, but they involve some overhead, implying either overly simple algorithms or solutions dedicated to specific application types. The last approach is stochastic [15]. Since the problem is NP-complete, the difficulty is to find a compromise between low cost and a good heuristic that finds close-to-optimal solutions quickly. Therefore, we argue for creating clusters statically (using the actual Amadeus implementation for initial placements), then clustering at context termination, and possibly periodically during execution (called reclustering; criteria are object lifetime and frequency of updates), using a dynamic approach based on a general algorithm capable of partitioning any graph.

Two kinds of objects, and consequently two granularities of clustering, are considered: clustering of global objects (which can be migrated or invoked remotely), and clustering of persistent objects into containers. In the former case, objects which are known to communicate often are periodically clustered together, in order to be localized in the same context. The purpose of the latter level is twofold: to map/unmap related objects at the same time, and to balance clusters between containers. Criteria for clustering global objects are different from criteria for clustering persistent objects. For instance, the number of activities is a primary criterion in the former case, but not in the latter, since activities are not persistent. Moreover, clustering persistent objects does not only depend on the last execution of one application, but also on previous executions of any application that used the same objects before. Different applications can use the same objects but invoke different methods, leading to different optimal arrangements. Therefore, we present in the next section our general clustering algorithm, used for both global (section 5) and persistent (section 6) objects.
4 Algorithm

The clustering algorithm presented in this section is not yet implemented. It is a derived form of V. Lo's Greedy algorithm, described in [10]. We rely on the good performance of this algorithm and on its smooth adaptation to clustering problems. The differences are that our algorithm assigns objects to clusters (possibly localized on several nodes) instead of threads to nodes, and works on small parts of the graph instead of the whole graph. This general algorithm is tuned by attractive and repulsive forces whose definition depends on its usage.

Let G be a graph in which each object o_i is a vertex and in which there is an edge with weight c_ij per object pointer. Each object is initially assigned to a cluster. The algorithm is as follows:

Compute the average of the weights:

C = (1/k²) Σ_{i,j} c_ij

where k is the total number of objects. Mark all pointers between objects o_1, o_2, ..., o_k for which c_ij < C. Attempt to partition the graph into several object groups, in such a way that objects related by heavier pointer weights are in the same group when they can fit in the same cluster, i.e.:

While there are remaining unmarked pointers:

- Find an unmarked pointer p = (o_i, o_j) and mark it. G_i is the object group containing o_i; G_j is the object group containing o_j.
- If there is some arbitrary cluster t_u for which Σ_{o_l ∈ G_i ∪ G_j}
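As a sketch, the steps above (average weight, marking of light pointers, then greedy merging of object groups along the heaviest remaining pointers, subject to cluster capacity) might look as follows. This is our reading of the description, not the Amadeus implementation; in particular, the capacity test simply counts objects, and the heaviest-first processing order is an assumption:

```python
def greedy_cluster(objects, pointers, capacity):
    """Greedy clustering derived from V. Lo's algorithm, as described above.

    objects:  iterable of object identifiers (each starts in its own group)
    pointers: dict mapping (o_i, o_j) pairs to the pointer weight c_ij
    capacity: maximum number of objects that fit in one cluster
    """
    objects = list(objects)
    k = len(objects)
    # Average weight C over all k^2 ordered pairs (absent pointers weigh 0).
    C = sum(pointers.values()) / (k * k)

    # Mark (i.e. discard) all pointers lighter than the average; process
    # the remainder from heaviest to lightest.
    unmarked = sorted((p for p, w in pointers.items() if w >= C),
                      key=lambda p: pointers[p], reverse=True)

    group_of = {o: {o} for o in objects}   # each object starts alone
    for (oi, oj) in unmarked:
        gi, gj = group_of[oi], group_of[oj]
        # Merge the two groups only if the union still fits in one cluster.
        if gi is not gj and len(gi) + len(gj) <= capacity:
            merged = gi | gj
            for o in merged:
                group_of[o] = merged

    # Return the distinct groups.
    seen, groups = set(), []
    for g in group_of.values():
        if id(g) not in seen:
            seen.add(id(g))
            groups.append(sorted(g))
    return sorted(groups)

pointers = {("a", "b"): 10, ("b", "c"): 8, ("c", "d"): 1, ("d", "e"): 9}
print(greedy_cluster("abcde", pointers, capacity=3))
# [['a', 'b', 'c'], ['d', 'e']]
```

The pointer ("c", "d") falls below the average weight and is marked immediately, so the light link between the two groups is never merged; the attractive and repulsive forces mentioned above would further adjust the weights before this phase runs.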