The Harness Metacomputing Framework

Mauro Migliardi and Vaidy Sunderam
Dept. of Math and Computer Science, Emory University
1784 N. Decatur Rd., Suite N. 100, Atlanta, GA, 30322
[email protected]
phone: 404 727 4975, fax: 404 727 5611

Abstract

Metacomputing frameworks have received renewed attention of late, fueled both by advances in hardware and networking and by novel concepts such as computational grids. However, these frameworks are often inflexible and force the application into a fixed environment rather than adapting to the application’s needs. Harness is an experimental metacomputing framework based upon the principle of dynamic reconfigurability, not only in terms of the computers and networks that comprise the virtual machine, but also in the capabilities of the virtual machine itself. This fundamental characteristic of Harness is intended to address the inflexibility of current metacomputing frameworks as well as their inability to incorporate new technologies and avoid rapid obsolescence. The reconfigurability of Harness derives from a user-controlled, distributed "plug-in" mechanism that is a central feature of the system. In this paper we describe how the design of the Harness system allows the dynamic configuration and reconfiguration of distributed virtual machines. We then show how the framework handles the naming and addressing of distributed virtual machines, as well as plug-in location, loading, validation, and synchronization. Finally, we provide some example services and applications and discuss our experience in programming applications and services within the Harness metacomputing framework.

Keywords: Metacomputing, Distributed Virtual Machines, Reconfigurability, Pluggability, Heterogeneous Computational Resources.

1 Introduction

Harness is an experimental metacomputing framework based upon the principle of dynamically reconfigurable, networked virtual machines. Harness supports reconfiguration not only in terms of the computers and networks that comprise the virtual machine, but also in the capabilities of the virtual machine itself. These characteristics may be modified under user control via a "plug-in" mechanism that is the central feature of the system.

The motivation for a plug-in-based approach to reconfigurable virtual machines derives from two observations. First, distributed and cluster computing technologies change often in response to new machine capabilities, interconnection network types, protocols, and application requirements. For example, the availability of Myrinet [1] interfaces and Illinois Fast Messages has recently led to new models for closely coupled network-of-workstations computing systems. Similarly, multicast protocols and better algorithms for video and audio codecs have led to a number of projects that focus on tele-presence over distributed systems. In these instances traditional metacomputing frameworks are subject to rapid obsolescence: the underlying middleware needs to be either changed or reconstructed, which increases the effort involved and hampers interoperability. A virtual machine model that intrinsically incorporates reconfiguration capabilities in terms of the services provided will address these obsolescence-related issues effectively.

The second reason for investigating the plug-in model is to attempt to provide a virtual machine environment that can dynamically adapt to meet an application's needs, rather than forcing the application to fit into a fixed environment. Long-lived simulations evolve through several phases: data input, problem setup, calculation, and analysis or visualization of results. In traditional, statically configured metacomputers, resources needed during one phase are often underutilized in other phases. Moreover, if during execution the application discovers the need for a service that was not anticipated at the outset, there is no simple way to add this new capability to the system. By allowing applications to dynamically reconfigure the system, the overall utilization of the computing infrastructure can be enhanced and the need for unanticipated services fulfilled.


The overall goal of the Harness project encompasses several different issues, such as reconfigurability of distributed virtual machines, fault-tolerance, security and access control. However, in the full paper we focus our attention on investigating and developing, within the framework of a heterogeneous computing environment, the specification and design of plug-in interfaces that allow dynamic extensions to a distributed virtual machine in a partially fault-tolerant way. This aspect involves the development of a generalized plug-in paradigm for distributed virtual machines that allows users or applications to dynamically customize, adapt, and extend the distributed computing environment's features to match their needs without compromising the consistency of the programming environment.

The paper is structured as follows: in section 2 we give a very brief abstract description of the fundamental concepts upon which the Harness framework is built and we provide the terminology adopted in the rest of the paper; in section 3 we describe the actual implementation of the Harness system and the reasons behind our design choices; in section 4 we provide some example plug-ins and applications and we show how the Harness pluggability mechanism allows the building of adaptable virtual machines; in section 5 we compare our approach to those adopted by other projects in the metacomputing field; finally, in section 6 we provide some concluding remarks.

2 Fundamental Concepts and Terminology

In this section we give a brief, abstract description of the fundamental concepts upon which the Harness system is built, the goals our design strives to achieve, and the terminology adopted in the rest of the paper. The degree of success we achieved in pursuing these goals, as well as the constraints imposed by our implementation, are discussed in the next section.

The fundamental abstraction in the Harness metacomputing framework is the Distributed Virtual Machine (DVM) (see figure 1, level 1). Every DVM is associated with a symbolic name that is unique in the Harness name space but, at this level, has no physical entities connected to it. Heterogeneous computational resources may join a DVM (see figure 1, level 2); at this level, however, the DVM is not yet ready to accept requests from users. To become ready to interact with users and applications, the heterogeneous computational resources enrolled in a DVM need to plug in services (see figure 1, level 3) in order to present a consistent baseline (see figure 1, level 4) to the DVM’s users and their applications. Users may reconfigure the DVM at any time (see figure 1, level 4), both in terms of the computational resources enrolled, by having them join or leave the DVM, and in terms of the services available, by loading and unloading plug-ins.

A plug-in is one actual implementation of a service. Each plug-in is characterized by a set of attributes that defines its shareability, exportability and threadability. A plug-in is shareable if and only if several users may use a single instance of it. A plug-in is exportable if and only if users of different DVMs can be given access to it. A plug-in is threadable if and only if it is capable of coexisting with other plug-ins and, possibly, other instances of itself in the same address space.

The availability of plug-ins to heterogeneous computational resources derives from two properties of the framework: the portability of plug-ins and the presence of multiple searchable plug-in repositories.
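The three plug-in attributes defined above can be pictured as a small descriptor attached to each plug-in. The following is only a sketch under stated assumptions: the names PluginAttribute, PluginDescriptor and admits are hypothetical and are not taken from the Harness packages; only the meaning of the three attributes comes from the text.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical names: the text defines the three attributes, not a Java API for them.
enum PluginAttribute { SHAREABLE, EXPORTABLE, THREADABLE }

final class PluginDescriptor {
    private final String serviceName;
    private final EnumSet<PluginAttribute> attributes;

    PluginDescriptor(String serviceName, Set<PluginAttribute> attributes) {
        this.serviceName = serviceName;
        // Start from an empty set so that an empty input collection is accepted.
        this.attributes = EnumSet.noneOf(PluginAttribute.class);
        this.attributes.addAll(attributes);
    }

    String serviceName() { return serviceName; }

    // Shareable: several users may use a single instance.
    boolean isShareable() { return attributes.contains(PluginAttribute.SHAREABLE); }

    // Exportable: users of a different DVM may be given access.
    boolean isExportable() { return attributes.contains(PluginAttribute.EXPORTABLE); }

    // Threadable: may coexist with other plug-ins in the same address space.
    boolean isThreadable() { return attributes.contains(PluginAttribute.THREADABLE); }

    // Assumed admission rule: a request is refused when it needs an attribute the plug-in lacks.
    boolean admits(boolean instanceAlreadyInUse, boolean callerFromOtherDvm) {
        if (instanceAlreadyInUse && !isShareable()) return false;
        if (callerFromOtherDvm && !isExportable()) return false;
        return true;
    }
}

class PluginAttributesDemo {
    public static void main(String[] args) {
        PluginDescriptor multicast = new PluginDescriptor("reliable-multicast",
                EnumSet.of(PluginAttribute.SHAREABLE, PluginAttribute.THREADABLE));
        System.out.println("second local user admitted: " + multicast.admits(true, false));
        System.out.println("user of another DVM admitted: " + multicast.admits(false, true));
    }
}
```

Keeping the attributes in a set rather than in three boolean fields is a design choice of this sketch; it makes it easy to add further attributes without changing the constructor.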

3 System Architecture and Implementation

In this section we give a detailed description of the implementation of the Harness system and of our design choices.

3.1 Synergy Between Harness and Java Technology

Recently, many projects related to high performance computing and distributed systems have focused their attention on Java technology. Java technology includes several desirable features, such as a high degree of portability, the capability to load and link new libraries on demand, and the capability to let Java bytecode and system-dependent native code interact. However, a complete assessment of how efficiently Java technology achieves all these goals is still in progress. For these reasons we think that the Java community could benefit from our experience in leveraging Java technology to build a dynamically reconfigurable metacomputing framework.

At the same time, one of the main goals of the Harness metacomputing framework is to enroll heterogeneous computational resources into a single DVM and to make any of the enrolled resources capable of delivering services to users. This goal requires the programs that make up the framework to be as portable as possible over as large a selection of systems as possible, and this requirement can leverage the portability enabled by Java technology. In fact, the Java Virtual Machine (JVM) [2] represents a uniform standard platform upon which it is possible to develop completely portable code at the cost of reduced efficiency.


Fig. 1. Abstract model of a Harness Distributed Virtual Machine.

Although portability is needed in all the components of the framework, it is possible to distinguish three categories of components that require different levels of portability. The first category consists of the programs that implement the dynamically updated DVM status and the basic loading service; we call them kernel level services. These require the highest achievable degree of portability: to enroll a computational resource into a DVM it is necessary to execute at least the programs implementing these two features. The second category consists of very commonly used services (e.g. a general, network-independent message passing service or a generic event notification mechanism); we call them basic services. These should be highly available, but it is conceivable for some computational resources based on a non-standard architecture to lack them. Two types of components make up the last category. The first type consists of highly architecture-specific services, that is, services that inherently depend on the characteristics of the non-standard architecture of a computational resource (e.g. a low-level image processing service exploiting a SIMD co-processor, or a message passing service exploiting a specific network interface). The second type consists of services that need architecture-dependent optimization in order to fulfill strict performance requirements. We call the services of this category specialized services. For this last category portability is a goal to strive for, but it is acceptable for them to be available only on small subsets of the computational resources.


These different degrees of required portability and efficiency can leverage the capability to link together Java bytecode and system-dependent native code enabled by the Java Native Interface (JNI) [3]. The JNI makes it possible to develop the parts of the framework that are most critical to efficient application execution in ANSI C or FORTRAN and to introduce into them the desired level of architecture-dependent optimization, at the cost of increased development effort or limited portability. Java technology allowed us to provide this same spectrum of possibilities to users of the Harness framework: any plug-in implementing a service can be developed as a pure Java program or can include any quantity of C code if the need for efficiency requires it.

The adoption of the Java language as the development platform for the Harness metacomputing framework has given us several other advantages:
• it allowed us to develop the framework as a collection of cooperating objects with consistent boundaries (Java classes) and to guarantee users an object-oriented development environment;
• it allowed us to define a clear and consistent boundary for plug-ins, since each plug-in is required to appear to the system as a Java class;
• it allowed us to implement all the entities in the framework adopting a robust multithreaded architecture;
• it allows users to develop additional services both in a passive, library-like flavor and in an active, thread-like or process-like flavor;
• it provided us with a generic RPC mechanism to request services from remote computational resources (Java Remote Method Invocation [4]);
• it provided us with a generic methodology to transfer data over the network in a consistent format (Java Object Serialization [5]);
• it allowed us to provide users with the definition of interfaces to be implemented by plug-ins implementing the basic services;
• it provided us with a generic mechanism to implement dynamic, on-demand linking of libraries (the Java class and library loading mechanisms).

3.2 System Requirements: Prototype Implementation vs. Long Term Limitations

The main requirement for a computational resource to be enrolled in a DVM is the capability to run Java programs, i.e. the presence of an implementation of a JVM on the given architecture. This is not a constraint derived from the prototype implementation but rather a general design decision, and we do not foresee, at this moment, any reason to change it. It is not a very restrictive constraint, however: all the major UNIX platforms as well as Microsoft platforms provide a functional JVM [6].

A more restrictive constraint that the current implementation of the framework imposes on computational resources is the requirement to support IP multicast communication. Both the protocol that lets a computational resource discover an active DVM and enroll into it and the protocol used to recover from DVM crashes utilize this type of communication. However, we plan to develop a second version of these two protocols able to work without multicast communication, at the cost of relying on a centralized naming service.

A third constraint imposed by the framework is the requirement for plug-ins to appear to the system as Java classes. This requirement implies only very small restrictions on the way users may develop plug-ins. A plug-in is not required to be a monolithic entity; on the contrary, it may depend on any number of other classes. Moreover, if the efficiency requirements are too strict to be fulfilled by pure Java code, the JNI allows developers to manually wrap native code into Java classes [7], while tools to achieve automatic wrapping of legacy code are currently being studied [8].

3.3 Harness DVMs’ Topology and Protocols

The Harness metacomputing framework allows the definition and establishment of DVMs.
The kernel level services of a Harness DVM are delivered by a distributed system composed of two categories of entities:
• a DVM status server, unique for each DVM;
• a set of Harness kernels, exactly one running on each computational resource currently enrolled or willing to be enrolled into a DVM.

To achieve the highest possible degree of portability for the kernel level services, both the kernel and the DVM status server are implemented as pure Java programs. We have used the multithreading capability of the Java Virtual Machine to exploit the intrinsic parallelism of the different tasks the two entities have to perform, and we have built the framework as a set of Java packages [9].

All control messages and DVM status change messages flow through a star-shaped set of reliable unicast channels whose center is the DVM status server. These connections are implemented through the communication facilities delivered by the java.net package. Messages related to the discovery-and-join protocol and the recover-from-failure protocol are an exception to this rule: these protocols are based on multicast datagram transmission. It is important to notice that the star topology is not meant to be the only topology of connection among intercommunicating entities in the DVM, nor is the java.net package meant to be the only communication fabric of the DVM. On the contrary, other communication services adopt the connection topology that best suits their needs, and additional services can deliver access to different communication fabrics. For this reason, neither the star topology interconnecting the kernels and the DVM status server nor the use of the java.net package represents a major bottleneck in the Harness metacomputing framework: user-generated data streams are not required to flow through them.

The kernels and the DVM status server interact to guarantee a consistent evolution of the status of the DVM, both when users request that new services be added and when computational resources or networks fail. This consistency is enforced by means of a set of protocols executed during the different phases of the DVM life, namely an enrollment protocol, an event notification protocol, a service versioning protocol and a DVM status reconstruction protocol. The enrollment protocol checks the compatibility of the services currently provided by a computational resource willing to join a DVM with the services provided by the DVM. The event notification protocol guarantees that every event changing the current status of the DVM, i.e. the joining or leaving of a computational resource or the addition or removal of a service, is propagated in a totally ordered manner to every computational resource currently enrolled.
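The text does not say how the event notification protocol obtains its total order. One common way to get it in a star topology is to let the central node stamp every event with a sequence number; the sketch below illustrates that technique only. It is an assumption, not the Harness protocol, and all class and method names are hypothetical. The buffering of early arrivals is defensive: over a single reliable unicast channel per kernel, events would normally arrive already in order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: sequence numbers assigned at the center of the star.
final class SequencedEvent {
    final long seq;
    final String description;

    SequencedEvent(long seq, String description) {
        this.seq = seq;
        this.description = description;
    }
}

final class StatusServerSequencer {
    private long next = 0;

    // Every status-changing event receives the next number in one global order.
    synchronized SequencedEvent stamp(String description) {
        return new SequencedEvent(next++, description);
    }
}

final class KernelEventLog {
    private final Map<Long, SequencedEvent> pending = new HashMap<>();
    private final List<String> applied = new ArrayList<>();
    private long expected = 0;

    // Buffer events that arrive early and apply them strictly in sequence order.
    synchronized void receive(SequencedEvent event) {
        pending.put(event.seq, event);
        SequencedEvent nextEvent;
        while ((nextEvent = pending.remove(expected)) != null) {
            applied.add(nextEvent.description);
            expected++;
        }
    }

    synchronized List<String> applied() { return new ArrayList<>(applied); }
}

class EventOrderingDemo {
    public static void main(String[] args) {
        StatusServerSequencer server = new StatusServerSequencer();
        SequencedEvent join = server.stamp("join hostA");
        SequencedEvent load = server.stamp("load multicast on hostA");

        KernelEventLog kernel = new KernelEventLog();
        kernel.receive(load);   // arrives early and is held back
        kernel.receive(join);   // releases both events in order
        System.out.println(kernel.applied());
    }
}
```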
The service versioning protocol checks that every component exists in the DVM in a single version and guarantees the absence of name collisions among different services. The status reconstruction protocol guarantees that the DVM status server does not constitute a single point of failure by enabling the reconstruction of the whole DVM status by incrementally adding the statuses of each enrolled computational resource.
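The incremental reconstruction described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the local status of a computational resource is a host name plus the set of services loaded on it; the actual content of the Harness DVM status is not specified in the text, and the names DvmStatus and addResourceStatus are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: the global status is rebuilt from per-resource reports.
final class DvmStatus {
    private final Map<String, Set<String>> servicesByHost = new TreeMap<>();

    // Incrementally add the local status reported by one enrolled resource.
    void addResourceStatus(String host, Set<String> loadedServices) {
        servicesByHost.computeIfAbsent(host, h -> new TreeSet<>()).addAll(loadedServices);
    }

    // Hosts known so far to the rebuilt status.
    Set<String> hosts() { return new TreeSet<>(servicesByHost.keySet()); }

    // Services recorded for a host; empty if the host has not reported yet.
    Set<String> servicesOn(String host) {
        Set<String> services = servicesByHost.get(host);
        return services == null ? new TreeSet<String>() : new TreeSet<>(services);
    }
}

class StatusReconstructionDemo {
    public static void main(String[] args) {
        DvmStatus rebuilt = new DvmStatus();
        Set<String> onHostA = new TreeSet<>();
        onHostA.add("event-notification");
        onHostA.add("reliable-multicast");
        Set<String> onHostB = new TreeSet<>();
        onHostB.add("event-notification");

        // Each surviving kernel contributes its own part of the status.
        rebuilt.addResourceStatus("hostA", onHostA);
        rebuilt.addResourceStatus("hostB", onHostB);
        System.out.println(rebuilt.hosts() + " " + rebuilt.servicesOn("hostA"));
    }
}
```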

4 Example Services and Applications

In this section we show, by means of two example applications, how the capability of the Harness framework to dynamically add new services at run-time can be used to perform a live update of a running application.

The first application is a simple chat tool that allows a user to connect to a chat session and shows the user every message that any user injects into the session. This application exploits a reliable multicast service to guarantee totally ordered delivery of every message injected. This service presents a single interfacing point to applications and provides the capability to switch among different communication fabrics. We have implemented two different fabrics for the reliable multicast service. The first is a highly portable pure Java implementation that leverages the DVM control topology and protocol to piggyback user messages onto fake events. This implementation can be plugged into any computational resource enrolled into a DVM, but pays for its high level of portability with reduced efficiency. The second implementation is based on the CCTL [10] native library. This implementation is extremely efficient, but the native library is not available for MS-Windows platforms. When started, the chat application checks whether any of the users currently enrolled in the session is unable to adopt the efficient communication fabric. If none is found, it starts using it. Later, if the need arises to incorporate an MS-Windows based computational resource into the session, the session switches to the pure Java communication fabric. As soon as the MS-Windows computational resource leaves, the chat tools in the session can revert to the efficient communication fabric. These changes of communication fabric do not require stopping any of the chat applications, and the users experience only a temporary slowdown or freezing of the connection.
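The switching rule of the chat session can be summarized in a few lines. This is a minimal sketch: the two fabrics and the rule "use the native fabric only if every participant can" come from the text, while the names Fabric, ChatParticipant and FabricSelector are hypothetical and the synchronization performed by the interface layer during a switch is not modeled.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical names; only the two fabrics and the switching rule come from the text.
enum Fabric { PURE_JAVA, NATIVE_CCTL }

final class ChatParticipant {
    final String name;
    final boolean nativeLibraryAvailable;

    ChatParticipant(String name, boolean nativeLibraryAvailable) {
        this.name = name;
        this.nativeLibraryAvailable = nativeLibraryAvailable;
    }
}

final class FabricSelector {
    // Use the efficient native fabric only when every participant can adopt it.
    static Fabric choose(Collection<ChatParticipant> session) {
        for (ChatParticipant p : session) {
            if (!p.nativeLibraryAvailable) return Fabric.PURE_JAVA;
        }
        return Fabric.NATIVE_CCTL;
    }
}

class FabricSwitchDemo {
    public static void main(String[] args) {
        List<ChatParticipant> session = new ArrayList<>();
        session.add(new ChatParticipant("unix-1", true));
        session.add(new ChatParticipant("unix-2", true));
        System.out.println("initial fabric: " + FabricSelector.choose(session));

        ChatParticipant windowsHost = new ChatParticipant("windows-1", false);
        session.add(windowsHost);      // an MS-Windows resource joins
        System.out.println("after join: " + FabricSelector.choose(session));

        session.remove(windowsHost);   // it leaves again
        System.out.println("after leave: " + FabricSelector.choose(session));
    }
}
```

Re-evaluating the choice on every join and leave keeps the rule stateless, which is a simplification of this sketch.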
The synchronization required to avoid message loss is guaranteed by the interface layer; during normal operation, however, no synchronization is required and this layer introduces only a negligible overhead.

The second application is a reconfigurable distributed simulation of crystal growth. This application exploits the run-time reconfigurability of the Harness framework at the application level. The process of crystal growth is modeled as a two-step, discrete-time diffusion-deposition process. At each clock tick a particle already present on the growing surface can diffuse if it fulfills a set of constraints; at the same time there is a small probability that new particles are deposited on the growing surface. The constraints ruling particle diffusion are the core of the physical model; in our application we model a low energy surface on which the attraction between neighboring particles prevents any diffusion.

Our application is built as a network of components distributed in the DVM. The only component of the application residing outside the DVM is the GUI. This component allows the user to control the simulation parameters and visualizes the pattern of crystals growing on the surface in real time. However, the simulation does not need the GUI component in order to run: a user can launch the simulation, set up the parameters by means of the GUI component, exit the GUI component, let the simulation run autonomously and check it later in order to steer it if the need arises. In figure 2 we show the different components of this second application and their relationships.
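The diffusion-deposition model described above can be sketched as a single-process toy. The square grid, the four-neighbor rule, the one-particle-per-tick deposition and all names (CrystalSurface, tick, and so on) are assumptions made for illustration; the distributed Harness implementation is not shown in the text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Toy model of the low energy surface: a particle with an occupied neighbor never diffuses.
final class CrystalSurface {
    private static final int[][] STEPS = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};
    private final boolean[][] occupied;
    private final int size;
    private final double depositionProbability;
    private final Random random;

    CrystalSurface(int size, double depositionProbability, long seed) {
        this.size = size;
        this.occupied = new boolean[size][size];
        this.depositionProbability = depositionProbability;
        this.random = new Random(seed);
    }

    void place(int row, int col) { occupied[row][col] = true; }

    boolean isOccupied(int row, int col) { return occupied[row][col]; }

    int particleCount() {
        int count = 0;
        for (boolean[] row : occupied)
            for (boolean cell : row)
                if (cell) count++;
        return count;
    }

    private boolean inside(int r, int c) { return r >= 0 && r < size && c >= 0 && c < size; }

    private boolean hasOccupiedNeighbor(int r, int c) {
        for (int[] step : STEPS) {
            int nr = r + step[0];
            int nc = c + step[1];
            if (inside(nr, nc) && occupied[nr][nc]) return true;
        }
        return false;
    }

    // One clock tick: diffusion of unbound particles, then a possible deposition.
    void tick() {
        List<int[]> particles = new ArrayList<>();
        for (int r = 0; r < size; r++)
            for (int c = 0; c < size; c++)
                if (occupied[r][c]) particles.add(new int[] {r, c});

        for (int[] p : particles) {
            if (hasOccupiedNeighbor(p[0], p[1])) continue;  // attraction prevents diffusion
            int[] step = STEPS[random.nextInt(STEPS.length)];
            int nr = p[0] + step[0];
            int nc = p[1] + step[1];
            if (inside(nr, nc)) {  // the target is empty because p has no occupied neighbor
                occupied[p[0]][p[1]] = false;
                occupied[nr][nc] = true;
            }
        }
        if (random.nextDouble() < depositionProbability) {
            // Depositing onto an occupied cell leaves the surface unchanged.
            occupied[random.nextInt(size)][random.nextInt(size)] = true;
        }
    }
}

class CrystalGrowthDemo {
    public static void main(String[] args) {
        CrystalSurface surface = new CrystalSurface(16, 0.05, 1L);
        for (int t = 0; t < 1000; t++) surface.tick();
        System.out.println("particles on the surface: " + surface.particleCount());
    }
}
```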


Fig. 2. Crystal growth simulation components and their relationships.

We adopt the farming programming paradigm to exploit the dynamically changing number of resources available to the simulation. The Harness metacomputing framework supports this programming paradigm by means of two interfaces: the Farmer interface and the Worker interface.

The Farmer interface has three methods. The first allows scheduling services to notify a component implementing the Farmer interface that one or more new computational resources are available. A user can manually notify any service implementing the Farmer interface of the arrival of new computational resources, but a Farmer service can also register itself with any scheduling service to have this done automatically. The second method allows a user to set the behavioral object to which the Farmer will delegate its work. The third method allows a user to store in the farmer the behavioral object that will be set into the workers. The farmer automatically sets the registered behavioral object into any new worker that is added to the current set. The Worker interface has three methods: a method to request that a computation be performed on the provided data, a method to set the status of the current computation, and a method to set the behavioral object to which the worker will delegate the actual computation.

Our application copes with the removal of computational resources through graceful performance degradation, as long as the Farmer and at least one Worker are alive: the removal of workers from the application simply places a heavier load on the surviving workers. If the farmer is lost, the application survives in a state of suspended animation as long as a single component survives, because even if the Farmer dies it is possible to recover the whole application status from any surviving worker.
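The two interfaces described above might look roughly as follows. The text gives the methods only in prose, so every name, signature and data type here (long values for data and status, WorkerBehavior, FarmerBehavior, SimpleFarmer, SimpleWorker) is a hypothetical stand-in rather than the actual Harness API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// All names and signatures are hypothetical; the text describes the methods only in prose.
interface WorkerBehavior {
    long compute(long data, long status);
}

interface FarmerBehavior {
    long farm(List<Worker> workers, long[] tasks);
}

interface Worker {
    long compute(long data);                    // request a computation on the provided data
    void setStatus(long status);                // set the status of the current computation
    void setBehavior(WorkerBehavior behavior);  // object the worker delegates the computation to
}

interface Farmer {
    void resourcesAvailable(List<Worker> newWorkers);  // called by a scheduler or by the user
    void setBehavior(FarmerBehavior behavior);         // object the farmer delegates its work to
    void setWorkerBehavior(WorkerBehavior behavior);   // stored, then set into every new worker
}

final class SimpleWorker implements Worker {
    private long status;
    private WorkerBehavior behavior;

    public long compute(long data) {
        if (behavior == null) throw new IllegalStateException("no behavioral object set");
        return behavior.compute(data, status);
    }
    public void setStatus(long status) { this.status = status; }
    public void setBehavior(WorkerBehavior behavior) { this.behavior = behavior; }
}

final class SimpleFarmer implements Farmer {
    private final List<Worker> workers = new ArrayList<>();
    private FarmerBehavior behavior;
    private WorkerBehavior workerBehavior;

    public void resourcesAvailable(List<Worker> newWorkers) {
        for (Worker w : newWorkers) {
            // The registered behavioral object is set into any worker that joins.
            if (workerBehavior != null) w.setBehavior(workerBehavior);
            workers.add(w);
        }
    }
    public void setBehavior(FarmerBehavior behavior) { this.behavior = behavior; }
    public void setWorkerBehavior(WorkerBehavior behavior) { this.workerBehavior = behavior; }

    int workerCount() { return workers.size(); }

    long farm(long[] tasks) {
        if (behavior == null) throw new IllegalStateException("no behavioral object set");
        return behavior.farm(workers, tasks);
    }
}

class FarmingDemo {
    public static void main(String[] args) {
        SimpleFarmer farmer = new SimpleFarmer();
        farmer.setWorkerBehavior((data, status) -> data * data + status);
        farmer.setBehavior((workers, tasks) -> {
            long sum = 0;
            for (int i = 0; i < tasks.length; i++) {
                sum += workers.get(i % workers.size()).compute(tasks[i]);  // round-robin
            }
            return sum;
        });
        farmer.resourcesAvailable(Collections.<Worker>singletonList(new SimpleWorker()));
        System.out.println(farmer.farm(new long[] {1, 2, 3}));
    }
}
```

Separating the interfaces from the behavioral objects is what lets the computation be replaced without replacing the farmer or the workers themselves.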
The capability to plug in new services at run-time, as well as to set behavioral objects into existing ones, allows us to perform live upgrades and reconfiguration in our simulation. The capability to plug in new services allows us to adopt a different scheduling policy at any time during the simulation run, simply by forcing the farmer to request the services of a different scheduler. Likewise, we can change the constraints that steer the diffusion process, or the probability of deposition, simply by setting a new behavioral object into the workers. It is important to notice that none of these new components needs to exist in the system (or even in the application programmer's mind) at the time the simulation is started. Nonetheless, it is not necessary to stop the application to let it take advantage of these new modules.
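Since plug-ins appear to the system as Java classes, a behavioral object that did not exist when the simulation started can in principle be brought in by class name at run-time. The sketch below uses only the standard reflection calls Class.forName, Class.asSubclass and Constructor.newInstance; the DiffusionRule interface, the two rule classes and RuleLoader are hypothetical, and the Harness plug-in repositories and validation steps are not modeled.

```java
// Hypothetical interface and classes; only the reflection calls are standard Java API.
interface DiffusionRule {
    boolean mayDiffuse(int occupiedNeighbors);
}

// Low energy surface from the text: any occupied neighbor blocks diffusion.
class LowEnergyRule implements DiffusionRule {
    public boolean mayDiffuse(int occupiedNeighbors) { return occupiedNeighbors == 0; }
}

// An invented alternative rule that could be written after the simulation has started.
class WeakBondRule implements DiffusionRule {
    public boolean mayDiffuse(int occupiedNeighbors) { return occupiedNeighbors <= 1; }
}

final class RuleLoader {
    // Load a rule by class name and check that it really implements DiffusionRule.
    static DiffusionRule load(String className) {
        try {
            Class<? extends DiffusionRule> type =
                    Class.forName(className).asSubclass(DiffusionRule.class);
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("not a usable DiffusionRule: " + className, e);
        }
    }
}

class RuleLoadingDemo {
    public static void main(String[] args) {
        // The class name is only known at run-time, e.g. from the command line.
        String name = args.length > 0 ? args[0] : LowEnergyRule.class.getName();
        DiffusionRule rule = RuleLoader.load(name);
        System.out.println(name + " lets an isolated particle diffuse: " + rule.mayDiffuse(0));
    }
}
```

The asSubclass check plays the role of a very small validation step: a class that does not implement the expected interface is rejected before any instance is created.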


5 Related Works

Metacomputing frameworks have been popular for nearly a decade, since the advent of high-end workstations and ubiquitous networking in the late 1980s enabled high performance concurrent computing in networked environments. PVM [11] was one of the earliest systems to formulate the metacomputing concept in concrete virtual machine and programming-environment terms, and to explore heterogeneous network computing. PVM is based on the notion of a dynamic, user-specified host pool, over which software emulates a generalized concurrent computing resource. Dynamic process management coupled with strongly typed heterogeneous message passing in PVM provides an effective environment for distributed memory parallel programs. PVM, however, is inflexible in many respects that can be constraining to the next generation of metacomputing and collaborative applications. For example, merging and splitting of multiple DVMs is not supported. Two different users cannot interact, cooperate, and share resources and programs within a live PVM machine. PVM uses Internet protocols, which may preclude the use of specialized network hardware. A “plug-in” paradigm would alleviate all these drawbacks while providing greatly expanded scope and substantial protection against both rigidity and obsolescence.

Legion [12] is a metacomputing system that began as an extension of the Mentat project. Legion can accommodate a heterogeneous mix of geographically distributed high-performance machines and workstations. Legion is an object-oriented system where the focus is on providing transparent access to an enterprise-wide distributed computing framework. As such, it does not attempt to cater to changing needs and it is relatively static in the types of computing models it supports as well as in its implementation.

The model of the Millennium system [13] being developed by Microsoft Research is similar to that of Legion's global virtual machine. Logically there is only one global Millennium system composed of distributed objects. However, at any given instant it may be partitioned into many pieces. Partitions may be caused by disconnected or weakly connected operations. This could be considered similar to the Harness concept of dynamic joining and splitting of DVMs.

Globus [14] is a metacomputing infrastructure built upon the “Nexus” [15] communication framework. The Globus system is designed around the concept of a toolkit that consists of pre-defined modules pertaining to communication, resource allocation, data, etc. However, the assembly of these modules is not supposed to happen dynamically at run-time as in Harness. Moreover, the modularity of Globus remains at the metacomputing system level, in the sense that modules affect the global composition of the metacomputing substrate.

Sun Microsystems' Jini project [16] presents a model where a federation of Java-enabled objects connected through a network can freely interact and deliver services to each other and to end users. Although in principle Jini shares many keywords and goals with Harness, such as services as building blocks and heterogeneity of service providers, Jini focuses on the capability to build up a world of plug-and-play consumer devices and, to cope with such a goal, increases the resolution of the computational resources that can be enrolled in a Jini federation from complete computational resources down to devices such as disks, printers, TVs and VCRs.

The CORBA Object Management Architecture [17] provides a model to which Object Request Brokers of different vendors can refer in order to interact seamlessly; this generality is achieved by defining protocols and interfaces to be used by Object Request Brokers. However, the focus of CORBA is on interoperability of service providers, while the focus of our project is mainly on building, managing and dynamically reconfiguring such service providers.
Almost all the above projects envision a model in which very high performance bricks are statically connected to build a larger system. One of the main ideas of the Harness project is to trade some efficiency for enhanced global availability, upgradability and resilience to failures by dynamically connecting, disconnecting and reconfiguring heterogeneous components. Harness is also seen as a research tool for exploring pluggability and dynamic adaptability within DVMs.

6 Concluding Remarks

In this paper we have described the Harness metacomputing framework. This framework allows users to build, use and dismantle Distributed Virtual Machines (DVMs) that are dynamically reconfigurable not only in terms of the heterogeneous computational resources they enroll but also in the capabilities and services provided by the DVM itself. This fundamental characteristic of Harness is intended to address the inflexibility of current metacomputing frameworks as well as their inability to incorporate new technologies and avoid rapid obsolescence. These results are achieved without compromising the coherency of the programming environment by means of a distributed plug-in mechanism, a high level of code portability and a fault-tolerant dynamic status management mechanism.


At the current development stage of the Harness project the issues of access control and security are still open. However, as we have shown in this paper, our experience with example services and applications demonstrates that the system is currently able:
• to enroll heterogeneous computational resources;
• to define services in terms of abstract Java interfaces in order to have them updated in a user-transparent manner;
• to adapt to changing user needs by adding new services via the plug-in mechanism;
• to safely add services to a distributed DVM through distributed synchronization mechanisms;
• to locate, validate and load locally or remotely stored plug-in modules;
• to cope with network and host failure with a limited overhead;
• to dynamically add and remove heterogeneous computational resources to the DVM via the dynamic DVM management mechanism.

7 Bibliography

[1] N. Boden et al., MYRINET: a Gigabit per Second Local Area Network, IEEE Micro, Vol. 15, No. 1, February 1995.
[2] T. Lindholm and F. Yellin, The Java Virtual Machine Specification, Addison Wesley, 1997.
[3] S. Liang, The Java Native Interface: Programming Guide and Reference, Addison Wesley, 1998.
[4] Sun Microsystems, Remote Method Invocation Specification, available on line at http://java.sun.com/products/jdk/1.2/docs/guide/rmi/index.html, July 1998.
[5] Sun Microsystems, Object Serialization Specification, available on line at http://java.sun.com/products/jdk/1.2/docs/guide/serialization/index.html
[6] Sun Microsystems, The Source for Java Technology: Java Platform Ports, available on line at http://java.sun.com/cgi-bin/java-ports.cgi
[7] V. Getov, S. Flynn-Hummel and S. Mintchev, High Performance Parallel Programming in Java: Exploiting Native Libraries, to appear in Concurrency: Practice and Experience, 1998.
[8] S. Mintchev and V. Getov, Automatic Binding of Native Scientific Libraries to Java, Proceedings of ISCOPE97, December 1997.
[9] M. Migliardi and V. Sunderam, Harness packages on line documentation, http://www.mathcs.emory.edu/~om/HDOCS/overview-summary.html.
[10] V. Sunderam, CCF: Collaborative Computing Framework, Proceedings of 1998 ACM/IEEE SC Conference, Orange County Convention Center, Orlando (Florida), 2-6 November 1998.
[11] A. Geist, A. Beguelin, J. Dongarra, W. Jiang, B. Mancheck and V. Sunderam, PVM: Parallel Virtual Machine a User’s Guide and Tutorial for Networked Parallel Computing, MIT Press, Cambridge, MA, 1994.
[12] A. Grimshaw, W. Wulf, J. French, A. Weaver and P. Reynolds, Legion: the next logical step toward a nationwide virtual computer, Technical Report CS-94-21, University of Virginia, 1994.
[13] Microsoft Corporation, Operating Systems Directions for the Next Millenium, position paper available at http://www.research.microsoft.com/research/os/Millennium/mgoals.html
[14] I. Foster and C. Kesselman, Globus: a Metacomputing Infrastructure Toolkit, International Journal of Supercomputing Application, May 1997.
[15] I. Foster, C. Kesselman and S. Tuecke, The Nexus Approach to Integrating Multithreading and Communication, Journal of Parallel and Distributed Computing, 37:70-82, 1996.
[16] Sun Microsystems, Jini Architecture Overview, available on line at http://java.sun.com/products/jini/whitepapers/architectureoverview.pdf, 1998.
[17] J. Siegel, CORBA Fundamentals and Programming, John Wiley & Sons Inc., New York, 1996.