Distributed, Reconfigurable Simulation in Harness
Mauro Migliardi
Dept. of Math and Computer Science, Emory University
1784 N. Decatur Rd. Suite N. 100, Atlanta, GA 30322
Abstract. Harness is an experimental metacomputing system based upon the principle of dynamic reconfigurability, both in terms of the computers and networks that comprise the virtual machine and in the services offered by the virtual machine itself. In this paper we describe how the capability to reconfigure the virtual machine by plugging in services on demand can be exploited to design dynamically reconfigurable distributed simulation applications. These applications are characterized by a high level of fault tolerance and by the capability to adapt to run-time changes both in the set of available resources and in the simulation model itself. The paper adopts as an example application the simulation of the process of crystal growth. Keywords: distributed systems, reconfigurable systems, adaptable applications, simulation.
1 Introduction Harness [1] is an experimental, Java-centric metacomputing system based upon the principle of dynamically reconfigurable, object oriented, networked computing frameworks. Harness supports reconfiguration not only in terms of the computers and networks that comprise the virtual machine, but also in the capabilities of the VM itself. These characteristics may be modified under user control via an object oriented "plugin" mechanism that is the central feature of the system. The motivation for a plug-in-based approach to reconfigurable virtual machines is derived from two observations. First, distributed
Vaidy Sunderam
Dept. of Math and Computer Science, Emory University
1784 N. Decatur Rd. Suite N. 100, Atlanta, GA 30322
and cluster computing technologies change often in response to new machine capabilities, interconnection network types, protocols, and application requirements. The second reason for investigating the plug-in model is to attempt to provide a virtual machine environment that can dynamically adapt to meet an application's needs, rather than forcing the application to fit into a fixed environment. At the system level, the capability to reconfigure the set of services delivered by the virtual machine overcomes obsolescence-related problems and eases the incorporation of new technologies. At the application level, the reconfiguration capability of the system allows a greater level of code reuse as well as the incorporation of new capabilities into applications directly at run-time. As an example of a category of applications that could greatly benefit from run-time reconfigurability we can cite long-lived simulations. These applications evolve through several phases: data input, problem setup, calculation, and analysis or visualization of results. In traditional, statically configured metacomputers, resources needed during one phase are often underutilized in other phases. Moreover, if during execution the application discovers the need for a service or capability that was not accounted for at the outset, there is no simple way to add this new capability to the system. By contrast, the capability to dynamically plug in new
Fig. 1. A Harness Virtual Machine
services into the virtual machine allows programmers both to adapt the environment to the needs of the application and to update the application itself using behavioral objects. In this paper we focus on the advantages of run-time reconfigurability at the application level, and we show how these capabilities allow designing distributed simulations endowed with:
• a high level of fault tolerance;
• the capability to evolve at execution time to adapt to changes in the set of available resources;
• the capability to evolve at execution time to take advantage of the information gathered so far.
The paper is structured as follows: in section 2 we give an overview of the system architecture; in section 3 we describe our distributed, adaptable simulation and how it capitalizes on the capabilities of our system; in section 4 we compare our approach to other metacomputing-related work; finally, in section 5, we provide some concluding remarks.
2 Fundamental abstractions and system architecture The fundamental abstraction in the Harness metacomputing framework is the Distributed
Virtual Machine (DVM) (see figure 1, level 1). Any DVM is associated with a symbolic name that is unique in the Harness name space, but initially has no physical entities connected to it. Heterogeneous computational resources may enroll into a DVM (see figure 1, level 2) at any time; however, at this level the DVM is not yet ready to accept requests from users. To become ready to interact with users and applications, the heterogeneous computational resources enrolled in a DVM need to plug in services (see figure 1, level 3) in order to present a consistent service baseline (see figure 1, level 4). Users may reconfigure the DVM at any time (see figure 1, level 4), both in terms of the computational resources enrolled, by having them join or leave the DVM, and in terms of the services available, by loading and unloading plug-ins. The main goal of the Harness metacomputing framework is the capability to enroll heterogeneous computational resources into a DVM and make them capable of delivering a consistent service baseline to users. This goal requires the programs building up the framework to be as portable as possible over as large a selection of systems as possible. The availability of services to heterogeneous computational resources derives from two different properties
of the framework: the portability of plug-ins and the presence of multiple searchable plug-in repositories. Harness implements these properties by leveraging different features of Java technology: the capability to layer a homogeneous architecture such as the Java Virtual Machine (JVM) [2] over a large set of heterogeneous computational resources, and the capability to redefine the mechanism adopted to load and link new classes and libraries. Recently, many different projects related to high performance computing and distributed systems have focused their attention on Java technology. Java includes several desirable features, such as a high degree of portability, the capability to load and link new libraries on demand, and the capability to let Java bytecode and system-dependent native code interact. However, a complete assessment of the degree of efficiency with which Java technology is able to achieve all these goals is still in progress. For these reasons we think that the Java community could benefit from our experience in leveraging Java technology to build a dynamically reconfigurable metacomputing framework. The requirement that the framework programs be as portable as possible over as large a selection of systems as possible can optimally leverage the portability enabled by Java technology. In fact, the Java Virtual Machine represents a uniform, standard platform upon which it is possible to develop completely portable code, at the cost of reduced efficiency.
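The second Java feature mentioned above, redefining the mechanism used to load and link new classes, can be sketched with a small custom class loader. This is an illustrative sketch only, not the actual Harness loader: the PluginLoader name and its in-memory repository are assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: a plug-in loader overrides findClass to turn raw bytecode
// (e.g. fetched from a searchable plug-in repository) into a usable Class.
public class PluginLoader extends ClassLoader {
    // Hypothetical in-memory repository: plug-in class name -> bytecode.
    private final Map<String, byte[]> repository = new HashMap<>();

    public void register(String name, byte[] bytecode) {
        repository.put(name, bytecode);
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        byte[] code = repository.get(name);
        if (code == null) {
            // Not a plug-in: fall back to the normal delegation model.
            throw new ClassNotFoundException(name);
        }
        return defineClass(name, code, 0, code.length);
    }
}
```

Classes that are not registered as plug-ins are resolved through the parent loader as usual, so ordinary platform classes remain reachable through the same loader.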
Although portability at large is needed in all the components of the framework, it is possible to distinguish three categories of components that require different levels of portability. The first category is represented by the programs that implement the dynamically updated DVM status and the basic loading service; we call them kernel level services. These require the highest achievable degree of portability: to enroll a computational resource into a DVM it is necessary to execute at least the modules implementing the two above-mentioned features. The second category is represented by very commonly used services (e.g. a general, network-independent message passing service, or a generic event notification mechanism); we call them basic services. These should be highly available, but it is conceivable for some computational resources based on a non-standard architecture to lack them. Two different types of components compose the last category. The first type consists of highly architecture-specific services, i.e. services that are inherently dependent on the characteristics of the non-standard architecture of a computational resource (e.g. a low-level image processing service exploiting a SIMD co-processor, or a message passing service exploiting a specific network interface). The second type consists of services that need architecture-dependent optimization in order to fulfill strict performance requirements. We call the services of this category specialized services. For this last category portability is a goal to strive for, but it is acceptable for them to be available only on small subsets of the computational resources. These different degrees of required portability and efficiency can optimally leverage the capability to link together Java bytecode and system-dependent native code enabled by the Java Native Interface (JNI) [3]. The JNI makes it possible to develop the parts of the framework that are most critical to efficient application execution in ANSI C or FORTRAN, and to introduce into them the desired level of architecture-dependent optimization, at the cost of increased development effort or limited portability. Java technology allowed us to provide this same spectrum of possibilities to users of the Harness framework.
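The portability/efficiency spectrum just described can be sketched as a specialized service that prefers a JNI-backed native implementation but degrades to a pure Java path when the native library is absent. The class, method and library names below are hypothetical and are not part of Harness; the filter itself is a trivial stand-in.

```java
// Sketch: a specialized service that uses an architecture-specific native
// implementation (loaded through the JNI) when available, and a portable
// pure-Java fallback otherwise. "imagefilter" is an invented library name.
public class ImageFilterService {
    private static boolean nativeAvailable;

    static {
        try {
            System.loadLibrary("imagefilter"); // illustrative name
            nativeAvailable = true;
        } catch (UnsatisfiedLinkError e) {
            nativeAvailable = false; // degrade to the pure Java path
        }
    }

    // Native entry point, implemented in C via the JNI when available.
    private static native int[] filterNative(int[] pixels);

    // Portable fallback: clamp pixel values to 255 as a stand-in filter.
    private static int[] filterPortable(int[] pixels) {
        int[] out = new int[pixels.length];
        for (int i = 0; i < pixels.length; i++) {
            out[i] = Math.min(pixels[i], 255);
        }
        return out;
    }

    public static int[] filter(int[] pixels) {
        return nativeAvailable ? filterNative(pixels) : filterPortable(pixels);
    }
}
```

Callers see a single entry point; whether the architecture-dependent optimization is present only affects performance, not the service interface.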
As a matter of fact, any plug-in implementing a service can be developed as a pure Java program, or it can include C or FORTRAN code if the need for efficiency requires it. The Harness metacomputing framework allows the definition and establishment of DVMs. The kernel level services of a Harness DVM are
delivered by a distributed system composed of two categories of entities:
• a DVM status server, unique for each DVM;
• a set of Harness kernels, one and only one running on each computational resource currently enrolled in, or willing to be enrolled into, a DVM.
To achieve the highest possible degree of portability for the kernel level services, both the kernel and the DVM status server are implemented as pure Java programs. We have used the multithreading capability of the Java Virtual Machine to exploit the intrinsic parallelism of the different tasks the two entities have to perform, and we have built the framework as a set of Java packages. All control messages and DVM status change messages flow through a star-shaped set of reliable unicast channels whose center is the DVM status server. These connections are implemented through the communication facilities delivered by the java.net package. Messages related to the discovery-and-join protocol and the recover-from-failure protocol constitute an exception to this rule; these protocols are based on multicast datagram transmission. It is important to notice that the star topology is not meant to be the only connection topology among intercommunicating entities in the DVM, nor is the java.net package meant to be the DVM's only communication fabric. On the contrary, other communication services adopt the connection topology that best suits their needs, and additional services can deliver access to different communication fabrics. For this reason, neither the star topology interconnecting the kernels and the DVM server nor the use of the java.net package represents a major bottleneck in the Harness metacomputing framework, since user-generated data streams are not required to flow through them.
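The reliable unicast channels between a kernel and the DVM status server can be illustrated with the java.net facilities mentioned above. The one-line text protocol below is invented for illustration; it is not the Harness wire format.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

// Sketch: a kernel sends one control message to the status server over a
// reliable unicast (TCP) channel; the server echoes an acknowledgement.
public class StarChannelDemo {
    public static String roundTrip(String controlMsg) throws IOException {
        try (ServerSocket server = new ServerSocket(0)) { // ephemeral port
            Thread hub = new Thread(() -> {
                try (Socket s = server.accept();
                     BufferedReader in = new BufferedReader(
                         new InputStreamReader(s.getInputStream()));
                     PrintWriter out = new PrintWriter(s.getOutputStream(), true)) {
                    out.println("ACK " + in.readLine()); // acknowledge the message
                } catch (IOException ignored) { }
            });
            hub.start();
            try (Socket kernel = new Socket("localhost", server.getLocalPort());
                 PrintWriter out = new PrintWriter(kernel.getOutputStream(), true);
                 BufferedReader in = new BufferedReader(
                     new InputStreamReader(kernel.getInputStream()))) {
                out.println(controlMsg);
                return in.readLine();
            }
        }
    }
}
```

In the real framework each kernel keeps such a channel open for the lifetime of its enrollment, and only control traffic (not user data streams) flows through it.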
The kernels and the DVM status server interact to guarantee a consistent evolution of the status of the DVM, both when users request new services to be added and when computational resources or networks fail. This consistency is enforced by means of a set of
protocols executed during the different phases of the DVM life, namely an enrollment protocol, an event notification protocol, a service versioning protocol and a DVM status reconstruction protocol. The enrollment protocol checks the compatibility of the services currently provided by a computational resource willing to join a DVM with the services provided by the DVM. The event notification protocol guarantees that every event changing the current status of the DVM, i.e. a join or leave of a computational resource or the addition or removal of a service, is propagated to every currently enrolled computational resource in a totally ordered manner. The service versioning protocol checks that any component exists in the DVM in a single version and guarantees the absence of name collisions between different services. The status reconstruction protocol guarantees that the DVM status server does not constitute a single point of failure, by enabling the reconstruction of the whole DVM status through incrementally merging the states of the enrolled computational resources.
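The total order guaranteed by the event notification protocol follows naturally from the star topology: since every status change flows through the single DVM status server, stamping each event with a monotonically increasing sequence number gives all kernels the same view. The sketch below illustrates the idea only; the class and method names are not the Harness implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: the single status server serializes all DVM status changes,
// stamping each with a sequence number and fanning it out to every kernel.
public class StatusServer {
    public static final class Event {
        public final long seq;
        public final String description; // e.g. "JOIN node-3", "PLUG mpi-1.0"
        Event(long seq, String description) {
            this.seq = seq;
            this.description = description;
        }
    }

    private long nextSeq = 0;
    private final List<List<Event>> kernelLogs = new ArrayList<>();

    // Enroll a kernel; it will receive all subsequent events. Returns its id.
    public synchronized int enroll() {
        kernelLogs.add(new ArrayList<>());
        return kernelLogs.size() - 1;
    }

    // One synchronized point stamps and delivers each event, so every
    // kernel observes status changes in the same total order.
    public synchronized void publish(String description) {
        Event e = new Event(nextSeq++, description);
        for (List<Event> log : kernelLogs) log.add(e);
    }

    public List<Event> logOf(int kernelId) { return kernelLogs.get(kernelId); }
}
```

Delivery here is simplified to an in-memory append; in the framework the same ordering is carried over the reliable unicast channels to each kernel.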
3 Simulation of the crystal growth process The process of crystal growth is modeled as a two-step diffusion-deposition, discrete-time process. At each clock tick a particle already present on the growing surface may diffuse if it satisfies a set of constraints; at the same time there is a small probability that new particles are deposited on the growing surface. The constraints ruling particle diffusion are the core of the physical model; in our application we model a low energy surface on which the attraction between neighboring particles prevents any diffusion. Our application is almost completely built as a network of cooperating, layered services. The only component of the application residing outside the DVM is the GUI (see figure 2). This component allows the user to control the simulation parameters and visualizes, in real time, the pattern of crystals growing on the surface. However, the simulation does not need the GUI component in order to run. As a matter of fact, a user can launch the simulation, set up the
Fig. 2. The GUI of the crystal growth application.
parameters by means of the GUI component, exit the GUI component, let the simulation run autonomously, and check it later in order to steer it if the need arises. We adopt the farming programming paradigm to exploit the dynamically changing number of resources available to the simulation. The Harness metacomputing framework supports this programming paradigm by means of two interfaces: the Farmer interface and the Worker interface. The Farmer interface has six methods. The first one allows scheduling services to notify a component implementing the Farmer interface that one or more new computational resources are available. A user can manually notify any service implementing the Farmer interface of the arrival of new computational resources, but it is also possible for a Farmer service to register itself with any scheduling service to have this done automatically. Scheduling services can adopt a greedy approach, trying to plug in the worker class requested by the farmer on every new computational resource, or require a third party to load the requested class before signaling the farmer. Our application requests the service of a
greedy scheduler in order to gather the full computational power of the DVM. The second method allows a user to set the behavioral object the Farmer will delegate its work to. The third method allows a user to store into the farmer the behavioral object that will be set into the workers. The last three methods can be used to start and stop the simulation and to get its current status. The farmer automatically sets the registered behavioral object into any new worker that is added to the current set. The Worker interface has three methods: a method to request a computation to be performed on the provided data, a method to set the status of the current computation, and a method to set the behavioral object to which the worker will delegate the actual computation. Our implementation of the Worker interface also implements the Recoverable interface. This interface allows recovery of the status of the computation from any surviving worker. The coupling of the Farmer/Worker paradigm with the Recoverable interface allows our application to cope with the removal of computational resources with a graceful performance degradation, as long
as the Farmer and a Worker are alive. As a matter of fact, the removal of workers from the application simply causes a heavier load on the surviving workers. Moreover, this design choice allows the application to survive, in suspended animation, the removal of any number of components as long as a single component survives. In fact, even if the Farmer dies it is possible to recollect the whole application status from any surviving worker. The capability to plug in new services at run-time, as well as to set behavioral objects into existing ones, allows us to perform live upgrade and reconfiguration in our simulation. In fact, the capability to plug in new services allows us to adopt a different scheduling policy at any time during the simulation run, simply by forcing the farmer to request the services of a different scheduler. At the same time, we can change the constraints that steer the diffusion process, or the probability of deposition, simply by setting a new behavioral object into the workers. It is important to notice that none of these new components needs to exist in the system (or even in the application programmer's mind) at the time the simulation is started. Nonetheless, it is not necessary to stop the application to let it take advantage of these new modules. The farmer-workers computational paradigm provides the system with the capability to balance the load across the set of heterogeneous architectures enrolled in the computation. As a matter of fact, faster workers will return their results earlier and will be assigned additional work. This mechanism relies on splitting a single simulation step into a number of sub-activities that must be larger than the number of available workers. The number of sub-activities needed to achieve load balancing has to grow as the difference in the workers' computational power grows.
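One possible reading of the Worker interface described above, with the actual computation delegated to a swappable behavioral object, can be sketched as follows. The signatures are guesses inferred from the prose, not the actual Harness API.

```java
// Hypothetical signatures inferred from the text; the real Harness
// interfaces are not reproduced in this paper.
interface Behavior {
    Object compute(Object data); // the delegated unit of work
}

interface Worker {
    Object doWork(Object data);    // perform a computation on provided data
    void setStatus(Object status); // install the current computation status
    void setBehavior(Behavior b);  // delegate the actual computation to b
}

// A minimal worker that delegates to its behavioral object, so the
// simulation model can be swapped at run time without stopping the worker.
class SimpleWorker implements Worker {
    private Behavior behavior;
    private Object status;

    public void setBehavior(Behavior b) { this.behavior = b; }
    public void setStatus(Object status) { this.status = status; }

    public Object doWork(Object data) {
        return behavior.compute(data);
    }
}
```

Replacing the behavioral object between calls changes what the worker computes, which is exactly the mechanism that lets the diffusion constraints or deposition probability evolve mid-run.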
Our implementation of the farmer is able to measure the difference in response time of the workers and to shrink or enlarge the number of sub-activities, by splitting or merging them, according to the measured response times. However, in our experiments we saw that Java RMI introduces an overhead of about 200 ms for each call to a worker. This overhead is comparable to the time required for a simulation step, thus it generally masks out the actual
computational power of the computer hosting the workers, flattening the differences among them. For this reason, in our experiments the system was not able to distinguish between fast and slow workers and generated a number of sub-activities equal to the number of workers, as it assumed it was dealing with a set of workers of equal computational power. This fact greatly reduces the efficiency of our system, effectively negating its load-balancing capabilities and generating a heavy slowdown. However, the problem is induced by the inefficient implementation of the Java RMI mechanism; it is not an intrinsic weakness of our system. Moreover, several recent research projects have focused on efficient implementations of Java RMI (e.g. NinjaRMI [4], Albatross [5] and NexusRMI [6]), and it is our opinion that one of these efficient implementations will eventually become part of the official Java distribution.
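The split/merge mechanism described above can be sketched with a simple heuristic that derives the number of sub-activities from measured per-worker response times. The paper gives no formula, so the one below is an assumption made purely for illustration.

```java
// Illustrative heuristic only: split a simulation step into more
// sub-activities when worker response times diverge (so fast workers can
// grab extra chunks), and merge back toward one chunk per worker when the
// measured times are nearly uniform.
public class SubActivitySizer {
    public static int chooseChunks(long[] responseTimesMs) {
        int workers = responseTimesMs.length;
        long min = Long.MAX_VALUE, max = Long.MIN_VALUE;
        for (long t : responseTimesMs) {
            min = Math.min(min, t);
            max = Math.max(max, t);
        }
        // Spread factor: how many times slower the slowest worker is.
        double spread = (double) max / Math.max(1, min);
        // At least one chunk per worker; more chunks as the spread grows.
        return (int) Math.ceil(workers * spread);
    }
}
```

This sketch also exhibits the failure mode described above: when a uniform ~200 ms RMI overhead dominates every call, measured response times look nearly equal, the spread stays close to 1, and the heuristic degenerates to one sub-activity per worker.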
4 Related work PVM [7] was one of the earliest systems to formulate the metacomputing concept in concrete virtual machine and programming-environment terms, and to explore heterogeneous network computing. PVM, however, is inflexible in many respects that can be constraining to the next generation of metacomputing and collaborative applications. The Harness “plug-in” paradigm effectively alleviates this rigidity, providing greatly expanded scope and substantial protection against obsolescence. Legion [8] is a metacomputing system that began as an extension of the Mentat project. Legion can accommodate a heterogeneous mix of geographically distributed high-performance machines and workstations. Legion is an object oriented system whose focus is on providing transparent access to an enterprise-wide distributed computing framework. As such, it does not attempt to cater to changing needs, and it is relatively static in the types of computing models it supports as well as in implementation. Globus [9] is a metacomputing infrastructure built upon the “Nexus” [10] multi-language communication framework. The Globus system is designed around the concept of a toolkit that consists of pre-defined modules
pertaining to communication, resource allocation, data, etc. However, the assembly of these modules is not supposed to happen dynamically at run-time as in Harness. Moreover, the modularity of Globus remains at the metacomputing system level, in the sense that modules affect the global composition of the metacomputing substrate. Almost all the above projects envision a model in which very high performance bricks are statically connected to build a larger system. One of the main ideas of the Harness project is to trade some efficiency to gain enhanced global availability, upgradability and resilience to failures, by dynamically connecting, disconnecting and reconfiguring heterogeneous components. Harness is also seen as a research tool for exploring pluggability and dynamic adaptability within DVMs.
5 Conclusions and future work Recent advances in hardware and networking have fueled the interest in distributed computing in general and in metacomputing frameworks in particular. However, traditional, statically configured metacomputing frameworks suffer from rapid obsolescence due to those same advances and tend to force applications to adapt to a fixed environment, rather than adapting to the applications' needs. To tackle these problems we have designed Harness, a dynamically reconfigurable system based on an object oriented, distributed plug-in mechanism. The run-time reconfiguration capabilities of Harness allow incorporating new technologies as well as adapting to the changing needs of applications. In this paper we have described how these run-time reconfiguration capabilities can be exploited to build distributed, adaptable applications by means of an example application, namely a crystal growth simulation. Our application is implemented as a network of layered, cooperating services and benefits from the characteristics of the framework, acquiring a high level of fault tolerance as well as the capability of being reconfigured, upgraded and modified at run-time. In the current implementation, performance is heavily reduced by the overhead introduced by Java RMI; however, a
version of the framework based on a light-weight message-passing mechanism is currently under consideration.
6 References
1. M. Migliardi, V. Sunderam, A. Geist, J. Dongarra, Dynamic Reconfiguration and Virtual Machine Management in the Harness Metacomputing System, LNCS, Vol. 1505, pp. 127-134, Springer Verlag, 1998.
2. T. Lindholm and F. Yellin, The Java Virtual Machine Specification, Addison Wesley, 1997.
3. S. Liang, The Java Native Interface: Programming Guide and Reference, Addison Wesley, 1998.
4. NinjaRMI web page, http://www.cs.berkeley.edu/~mdw/proj/ninja/ninjarmi.html.
5. J. Maassen, R. Van Nieuwpoort, R. Veldema, H. E. Bal, A. Plaat, An Efficient Implementation of Java's Remote Method Invocation, Proc. of PPoPP'99, Atlanta, GA, May 1999.
6. F. Breg, D. Gannon, A Customizable Implementation of RMI for High Performance Computing, Proc. of the Workshop on Java for Parallel and Distributed Computing of IPPS/SPDP99, pp. 733-747, San Juan, Puerto Rico, April 12-16, 1999.
7. A. Geist, A. Beguelin, J. Dongarra, W. Jiang, R. Manchek and V. Sunderam, PVM: Parallel Virtual Machine: A User's Guide and Tutorial for Networked Parallel Computing, MIT Press, Cambridge, MA, 1994.
8. A. Grimshaw, W. Wulf, J. French, A. Weaver and P. Reynolds, Legion: The Next Logical Step Toward a Nationwide Virtual Computer, Technical Report CS-94-21, University of Virginia, 1994.
9. I. Foster and C. Kesselman, Globus: A Metacomputing Infrastructure Toolkit, International Journal of Supercomputing Applications, May 1997.
10. I. Foster, C. Kesselman and S. Tuecke, The Nexus Approach to Integrating Multithreading and Communication, Journal of Parallel and Distributed Computing, 37:70-82, 1996.