Clusters: A Pragmatic Approach Towards Supporting a Fine Grained Active Object Model in Distributed Systems¹
Gary Craig, Umesh Bellur, Kevin Shank, Dept. ECE, Syracuse University
Doug Lea, Dept. of CS, SUNY Oswego
Abstract
A powerful programming environment for developing distributed applications relies on its ability to abstract away the details of the underlying architecture and present a simplified view to the developer. Along the path to transparency and uniformity lie object orientation, a uniform fine-grained active object model, and powerful semantic analysis to automate the partitioning, distribution, and run-time control of the user's application. This paper presents a model of object composition known as clustering. The construction model, the mapping of clusters to a run-time specifically designed to support them, and related issues are described.
1 Background
DIAMONDS is an environment for the development of distributed applications, under development at Syracuse University, in which the dynamic association of resources to applications reflects algorithmic parallelism, computational needs, and the current state of the system [1]. In this system, networks of heterogeneous processors serve as a pool of computational resources for a variety of applications. Software engineering concerns drive much of the approach, which focuses on providing a distributed system that is easy to program, with a clear, clean programming model. The DIAMONDS environment will include: 1) a uniform fine-grained active object model that supports and enhances good distributed object-oriented analysis, design, and programming practices; 2) a means of composing fine-grained entities (language objects) into coarser-grained run-time objects (the clustering process); and 3) tight coupling between the development environment and the distributed run-time support services. The programming model consists of interacting fine-grained active objects, where every object in the system is capable of executing code. Synchronization is provided by activation conditions called guards. Construction is template or class based. Objects may contain references to other objects and may export these references by sending them in a message. Communication is by message passing, with three modes supported: 1) one-way messages, 2) asynchronous two-way messages (future objects [2]), and 3) synchronous two-way messages (RPC-style). This fine-grained active object model is an ideal programming model, and it would have very efficient run-time properties if a massively parallel MIMD architecture with an infinitely fast interprocessor communication network were available. On more typical hardware, the run-time system would prefer to manage coarser-grained entities. There appear to be two options for supporting fine-grained objects:
To appear in the Proc. of the 9th Intl. Conf. on Systems Engineering, Las Vegas, NV, July 14-16, 1993.
1. Support fine-grained objects at the level of the operating system itself [3, 4].
2. Provide a mechanism for collecting (grouping) fine-grained objects into larger, more easily managed entities.
The first approach has intrinsic costs. Fine-grained object management is a serious problem because of the overhead of maintaining the namespace, access control, and other information for these fine-grained entities [5]. Another disadvantage of this approach is the high cost of mapping requested objects on demand into a running context, especially when supporting a heterogeneous processor pool. The second option appears to be a more approachable alternative, providing a clean and structured mechanism for supporting fine-grained objects on today's coarse-grained machines. This is the approach taken in DIAMONDS.
2 Object Clusters
An object cluster is a group of related objects. The relationship between the objects within a cluster may be "communicates with", "created by", or others. Logically, clusters are the candidate units for distribution, migration, fault tolerance, and persistence. At the application design level, object composition serves as the principal mechanism for bringing together objects with common functionality, i.e., those objects which are logically related and belong to a single subsystem [6]². Clustering employs the same general tactics, but with different aims: it serves as the basis for colocating and mapping objects in a distributed environment. Conversely, object contralocation (placing objects apart) exploits concurrency. Clustering also aids in optimizing communication costs by dynamically binding different communication mechanisms based on object location and visibility. Clustering should also improve paging performance by packing tightly coupled objects together. The idea of clustering fine-grained objects for efficiency has also been proposed for systems such as Elliot [8] and Amadeus [9].
2.1 Cluster Characteristics
Clusters are conceptual entities in the programming model and actual entities in the run-time model. The run-time system deals primarily with clusters, rather than objects. Objects do not exist independently; every object must reside in some cluster. Clusters are formed with the help of static analysis; most clustering is actually performed by the run-time system in conjunction with static analysis and resource management tools. Clusters do not span nodes, i.e., each cluster exists within a single address space. An executing application is mapped into a set of address spaces, one for each utilized node. An address space may contain more than one object cluster. Multiple clusters in a single address space can take advantage of sharing the same protection domain when communicating with each other. Cluster sharing is performed via proxies. Static analysis and dynamic management liberate programmers from the necessity of mapping objects to processes in those cases where the mappings have no semantic or functional meaning and/or are configuration-dependent. They also have the potential to greatly improve performance, since they are based on object communication patterns that are very difficult for developers to take into account when performing manual mappings. On the other hand, developers may pre-specify clusters based on design criteria such as modularity and resource requirements. Such efforts complement the mechanical analysis.
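The containment rules above (every object resides in exactly one cluster, a cluster lives within a single address space, and an address space may hold several clusters) can be captured in a small bookkeeping structure. The following Python sketch is illustrative only; `Cluster`, `AddressSpace`, `Application`, and their methods are hypothetical names, not part of DIAMONDS.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """A group of related objects; the unit the run-time manages."""
    cluster_id: str
    objects: set = field(default_factory=set)


@dataclass
class AddressSpace:
    """One address space per utilized node; it may hold several clusters."""
    node: str
    clusters: dict = field(default_factory=dict)  # cluster_id -> Cluster


class Application:
    """Hypothetical bookkeeping that enforces the containment rules."""

    def __init__(self):
        self.spaces = {}        # node -> AddressSpace
        self.cluster_node = {}  # cluster_id -> node (a cluster never spans nodes)
        self.home = {}          # object id -> cluster_id (exactly one per object)

    def add_cluster(self, node, cluster_id):
        if cluster_id in self.cluster_node:
            raise ValueError(f"cluster {cluster_id} is already placed")
        space = self.spaces.setdefault(node, AddressSpace(node))
        space.clusters[cluster_id] = Cluster(cluster_id)
        self.cluster_node[cluster_id] = node

    def add_object(self, obj_id, cluster_id):
        if obj_id in self.home:
            raise ValueError(f"object {obj_id} already resides in a cluster")
        node = self.cluster_node[cluster_id]  # KeyError if the cluster is unknown
        self.spaces[node].clusters[cluster_id].objects.add(obj_id)
        self.home[obj_id] = cluster_id

    def same_address_space(self, a, b):
        """True if objects a and b could share one protection domain."""
        return self.cluster_node[self.home[a]] == self.cluster_node[self.home[b]]
```

Because an object's cluster determines its address space, questions such as "can these two objects share a protection domain?" reduce to two dictionary lookups.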
Clusters here are conceptually the same as ensembles presented in [7].
2.2 Cluster Dynamics
The dynamic pattern of object communication and computation implies that any a priori, static clustering may need to change. As a computation proceeds, an object's "working set" (its uses relation) may change greatly, and many objects are created and destroyed. It is therefore key for the run-time to have directives for placing a newly created object in a specific cluster, for creating new clusters, and for acquiring additional resources (address spaces). These directives are obtained from a number of different sources. Static analysis augments object constructors with a placement qualifier that references a cluster. This is done in those cases where analysis determines that the default placement (within the same cluster as the creator) would result in poor performance. The resource management service of the run-time monitors inter-cluster communication and resource utilization to determine when new address spaces should be added (or reclaimed) and when clusters should be migrated.
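One way the resource management service might turn monitored inter-cluster communication into a migration suggestion is sketched below. This is a hypothetical heuristic, not the DIAMONDS policy: `CommunicationMonitor`, `record`, `suggest_migration`, and the 0.5 traffic threshold are all assumptions made for illustration.

```python
from collections import Counter


class CommunicationMonitor:
    """Hypothetical resource-management helper: counts the messages each
    cluster sends to each node and suggests migrating a cluster when most
    of its traffic goes to some other node."""

    def __init__(self, cluster_node, threshold=0.5):
        self.cluster_node = dict(cluster_node)  # cluster_id -> node
        self.threshold = threshold              # assumed fraction that triggers a move
        self.traffic = {}                       # cluster_id -> Counter(node -> messages)

    def record(self, src_cluster, dst_cluster, count=1):
        # Attribute traffic to the node currently hosting the destination cluster.
        dst_node = self.cluster_node[dst_cluster]
        self.traffic.setdefault(src_cluster, Counter())[dst_node] += count

    def suggest_migration(self, cluster_id):
        """Return the node the cluster should move to, or None to stay put."""
        counts = self.traffic.get(cluster_id)
        if not counts:
            return None
        total = sum(counts.values())
        node, msgs = counts.most_common(1)[0]
        if node != self.cluster_node[cluster_id] and msgs / total > self.threshold:
            return node
        return None
```

A real service would also weigh resource utilization (load, memory) before acting on such a suggestion; this sketch considers communication alone.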
2.3 The Interaction Model
Clusters have a single entry point for invoking the methods of any of their constituent objects. All invocations on the cluster are handled by a cluster manager, which is in charge of accepting invocation messages, queuing them, and dispatching them to the targeted embedded object.
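The accept, queue, and dispatch cycle of a cluster manager can be illustrated with a single-threaded sketch. It assumes one thread of control per cluster, so invocations are executed one at a time in arrival order; `ClusterManager`, `accept`, `run_until_idle`, and the `Accumulator` test object are hypothetical names.

```python
from collections import deque


class ClusterManager:
    """Single entry point for a cluster: accepts invocation messages,
    queues them, and dispatches each one to its target object."""

    def __init__(self):
        self.objects = {}     # object name -> embedded object
        self.queue = deque()  # pending (target, method, args) invocations

    def register(self, name, obj):
        self.objects[name] = obj

    def accept(self, target, method, *args):
        # Every invocation on any object in this cluster enters here.
        self.queue.append((target, method, args))

    def dispatch_one(self):
        target, method, args = self.queue.popleft()
        return getattr(self.objects[target], method)(*args)

    def run_until_idle(self):
        # Get invocation from queue, execute invocation, repeat.
        results = []
        while self.queue:
            results.append(self.dispatch_one())
        return results


class Accumulator:
    """Toy embedded object used to exercise the manager."""

    def __init__(self):
        self.value = 0

    def add(self, n):
        self.value += n
        return self.value
```

For example, after `m.register("acc", Accumulator())`, `m.accept("acc", "add", 2)`, and `m.accept("acc", "add", 3)`, calling `m.run_until_idle()` returns `[2, 5]`.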
3 Cluster Construction Model
Clustering can be accomplished by partitioning an undirected, weighted graph whose nodes correspond to objects and/or sets of potential objects. The first kind of node represents objects known to be generated during program execution. The second represents unbounded sets of objects (categorized by class) that may be generated during execution, depending on control flow. A metric is defined to indicate the pairwise proximity affinity between any two nodes. These metrics generate a weighted graph that may be represented as a symmetric affinity matrix. At a high level, the process of clustering can be thought of as consisting of the following phases:
1. static analysis of the source to determine the affinity matrix,
2. identification of unique clusters, and
3. embedding the clustering information within the source to aid the run-time in associating dynamic objects with clusters.
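Since the graph is undirected, its weights can be stored as a symmetric matrix. The sketch below builds one from an edge list; the node labels (for instance `"Token*"` standing for the unbounded set of `Token` objects) and the choice to accumulate repeated pairs are illustrative assumptions.

```python
def affinity_matrix(nodes, edges):
    """Build a symmetric affinity matrix from an undirected weighted graph.

    nodes: labels for individual objects or per-class sets of potential objects
    edges: iterable of (a, b, weight); weights for repeated pairs accumulate
    """
    index = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    m = [[0.0] * n for _ in range(n)]
    for a, b, w in edges:
        i, j = index[a], index[b]
        if i == j:
            continue  # self-affinity plays no role in partitioning
        m[i][j] += w
        m[j][i] += w  # undirected graph, so keep the matrix symmetric
    return m
```

Edges `("parser", "Token*", 5.0)` and `("Token*", "parser", 1.0)` describe the same undirected pair, so both entries for that pair become `6.0`.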
3.1 Static Analysis
Per-node affinity metrics may be estimated in two ways. Relations between any two nodes may be analyzed with respect to class-wide information. When individual objects (not merely sets of potential objects) can be identified, flow analysis may yield more exact per-instance affinity measures. The affinity factor for any two objects $O_i$ (an instance of class $C_x$) and $O_j$ (an instance of class $C_y$) is a linear combination of a class-based affinity, $C_{xy}$, common to all objects of classes $C_x$ and $C_y$, and an instance-based affinity, $A_{ij}$, specific to $O_i$ and $O_j$. Class-based affinity factors can be derived from class analysis by applying heuristics such as containment, coupling, operational complexity, and representational costs, while the more dynamic instance-based affinities depend on the specific interaction patterns between $O_i$ and $O_j$. We define "interaction patterns" here to be the methods that the two objects invoke on each other, their type of interaction (synchronous or asynchronous), and the frequency of invocations.
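Writing the linear combination with hypothetical coefficients $\alpha$ and $\beta$ gives $F_{ij} = \alpha\,C_{xy} + \beta\,A_{ij}$. The sketch below computes $A_{ij}$ from a list of interaction patterns and then $F_{ij}$. The coefficients, the `Interaction` record, and the assumption that synchronous calls weigh twice as much as asynchronous ones (because they block the caller) are illustrative choices, not values from DIAMONDS.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    """One kind of invocation between O_i and O_j."""
    method: str
    synchronous: bool  # synchronous (RPC-style) vs. asynchronous
    frequency: int     # estimated number of invocations


# Assumed weights: a blocking synchronous call favors co-location more.
SYNC_WEIGHT = 2.0
ASYNC_WEIGHT = 1.0


def instance_affinity(interactions):
    """A_ij: weighted sum over the methods O_i and O_j invoke on each other."""
    return sum(
        (SYNC_WEIGHT if it.synchronous else ASYNC_WEIGHT) * it.frequency
        for it in interactions
    )


def combined_affinity(class_affinity, interactions, alpha=1.0, beta=1.0):
    """F_ij = alpha * C_xy + beta * A_ij."""
    return alpha * class_affinity + beta * instance_affinity(interactions)
```

For example, with $C_{xy} = 3$, $\alpha = 0.5$, $\beta = 1$, ten synchronous `get` calls, and four asynchronous `notify` messages, $A_{ij} = 2 \cdot 10 + 1 \cdot 4 = 24$ and $F_{ij} = 1.5 + 24 = 25.5$.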
3.2 Cluster Analysis
Cluster analysis subsumes a wide variety of activities, including ordinating and normalizing the affinity matrix, determining the clustering tendency and the "optimal" number of clusters into which to partition the given design, and performing the actual clustering. Many algorithms devised for statistical clustering (see [10]) appear amenable to object clustering.
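As one concrete instance of a statistical clustering technique, the sketch below performs threshold-based single-linkage grouping over an affinity matrix using union-find: two nodes share a cluster if a chain of pairs with affinity at or above the threshold connects them. This is a swapped-in illustration, not the algorithm used by DIAMONDS, and the threshold value is an assumed tuning parameter.

```python
def threshold_clusters(matrix, threshold):
    """Single-linkage grouping of a symmetric affinity matrix.

    Returns a sorted list of clusters, each a list of node indices.
    """
    n = len(matrix)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    # Merge every pair whose affinity reaches the threshold.
    for i in range(n):
        for j in range(i + 1, n):
            if matrix[i][j] >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Raising the threshold yields more, smaller clusters; choosing it is one crude way of addressing the "optimal number of clusters" question.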
3.3 Information Feedback
Once cluster analysis has produced a set of object clusters, this information is encoded within the source code so that it can direct the run-time system in forming physical clusters at run time. We accomplish this by adding a single directive to the parameter list of each object's constructor.
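The effect of such a constructor directive can be sketched with a hypothetical run-time creation hook. Here the keyword parameter `cluster` stands in for the directive emitted by cluster analysis; when it is absent, the object is placed in its creator's cluster, matching the default placement rule. `Runtime`, `create`, and `residence` are illustrative names only.

```python
class Runtime:
    """Hypothetical object-creation hook honoring a placement directive."""

    def __init__(self):
        self.next_id = 0
        self.residence = {}  # object id -> cluster id

    def create(self, cls, *args, creator_cluster, cluster=None):
        # No directive: default placement in the creator's cluster.
        target = creator_cluster if cluster is None else cluster
        obj = cls(*args)
        oid = self.next_id
        self.next_id += 1
        self.residence[oid] = target
        return oid, obj
```

Because the directive is optional, only the constructor calls that analysis marks as poorly placed need to be rewritten; all other creations keep the default.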
4 Run-time Mapping
When an application is executed, the results of the static clustering analysis are reconciled with the current state of the system. This reconciliation involves determining the number of nodes available and instantiating an ensemble at each. An ensemble is an application's presence at a node and is primarily responsible for control and resource tracking. The number of ensembles active within an executing application reflects both the algorithmic parallelism of the application and the current state of the system. The statically identified clusters are then partitioned among the ensembles. As the application executes, requests for new objects are satisfied by the appropriate cluster. Note that in most cases the number of ensembles will be less than the number of clusters.
At run time a cluster becomes an execution environment. Associated with each cluster are a thread of control and a queue of object invocations. Conceptually, the thread executes a simple simulation-like loop: "get invocation from queue, execute invocation, repeat". In this environment an invocation can take place between objects in the same cluster, in different clusters of the same ensemble, or in different ensembles altogether. We transparently bind an appropriate invocation protocol for each of these situations. Within a cluster, references to purely local objects can be direct via pointers, or even inlined by the compiler. References to objects in other clusters of the same ensemble are passed to the appropriate cluster for execution. Ensembles maintain a cache of referenced clusters and their current locations. They also provide an interface that gives the resource management service a dynamic view of inter-cluster affinities (derived from inter-cluster communication reference counts) to aid in application reconfiguration (cluster migration).
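The three invocation cases can be illustrated by a protocol-selection function that consults an ensemble's cache of cluster locations. This is a hypothetical sketch: `Protocol`, `Ensemble`, `bind`, and `cluster_moved` are illustrative names, and a cache miss is simply treated as remote, whereas a real run-time would presumably look the cluster up first.

```python
from enum import Enum


class Protocol(Enum):
    DIRECT = "direct call within the same cluster"
    LOCAL_QUEUE = "enqueue at another cluster in the same ensemble"
    REMOTE_MESSAGE = "message to a different ensemble"


class Ensemble:
    """An application's presence at one node, holding a cache that maps
    cluster ids to the node currently hosting them."""

    def __init__(self, node, location_cache):
        self.node = node
        self.location_cache = dict(location_cache)  # cluster id -> node

    def bind(self, caller_cluster, target_cluster):
        """Choose an invocation protocol from the caller's and target's location."""
        if caller_cluster == target_cluster:
            return Protocol.DIRECT
        if self.location_cache.get(target_cluster) == self.node:
            return Protocol.LOCAL_QUEUE
        return Protocol.REMOTE_MESSAGE  # includes cache misses in this sketch

    def cluster_moved(self, cluster_id, new_node):
        # Refresh the cache after a cluster migration.
        self.location_cache[cluster_id] = new_node
```

Because binding depends only on cached locations, migrating a cluster changes which protocol later calls use without changing the calling code.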
5 Summary
We have outlined an approach to supporting fine-grained active objects in a heterogeneous distributed computing environment. The approach leans heavily on static analysis to group objects into coarser entities, which we term clusters, and on the corresponding run-time support for these clusters. The notable benefits of this model are:
- the fine-grained active object model enhances programmability;
- clusters have finer granularity than classic processes, which enhances:
  - reconfiguration (clusters are easier to migrate)
  - a refined unit for resource management
  - improved locality of reference
  - arbitrary composition into ensembles based on system resource availability
- clustering automates much of the task of mapping program objects to medium- to coarse-grained run-time processes.
References
[1] U. Bellur, G. Craig, K. Shank, A. Villarica, D. Lea, and V. Combs. DIAMONDS: Principles and Philosophy. Technical report, Syracuse University CASE Center, 1993.
[2] A. Ananda, B. Tay, and E. Koh. A Survey of Asynchronous Remote Procedure Calls. ACM Operating Systems Review, 26(2):92–109, April 1992.
[3] Eric Jul, Henry Levy, Norman Hutchinson, and Andrew Black. Fine-Grained Mobility in the Emerald System. ACM Transactions on Computer Systems, 6(1):109–133, February 1988.
[4] J. S. Chase, H. M. Levy, E. D. Lazowska, and M. Baker-Harvey. Lightweight Shared Objects in a 64-Bit Operating System. In Proc. of OOPSLA, 1992.
[5] R. Lea and J. Weightman. COOL: An object support environment co-existing with Unix. In Proceedings AFUU Convention Unix '91, 1991.
[6] R. J. Wirfs-Brock, B. Wilkerson, and L. Wiener. Designing Object-Oriented Software. Prentice-Hall, 1990.
[7] Dennis de Champeaux. Object-oriented analysis and top-down software development. In Proc. of ECOOP '91, pages 360–376, July 1991.
[8] S. Krakowiak, A. Freyssinet, and S. Lacourte. A generic object-oriented virtual machine. In Proceedings of International Workshop on Object Orientation in Operating Systems, pages 73–77, 1991.
[9] Y. Gourhant, S. Louboutin, V. Cahill, A. Condon, G. Starovic, and B. Tangney. Dynamic Clustering in an Object-Oriented Distributed System. In OOPSLA Workshop on Objects in Large Distributed Systems (OLDS-2), 1992.
[10] Anil Jain and Richard Dubes. Algorithms for Clustering Data. Prentice Hall, 1988.