Towards Monitoring in Parallel and Distributed Environments

Ivan Zoraja¹, Günther Rackl², Thomas Ludwig²

¹ FESB, Department of Electronics and Computer Science, University of Split, 21000 Split, Croatia. E-mail: [email protected]
² LRR-TUM, Institut für Informatik, Technische Universität München, 80290 München, Germany. E-mail: {rackl|ludwig} [email protected]

Research supported by German Science Foundation SFB 342, Subproject A1, Department of Energy grant DE-FG05-91ER25105, National Science Foundation grants ASC-9527186 and CCR-9523544, and Army Research Office grant DAAH04-96-1-0083.

Abstract: Rapid technology transitions and growing complexity in parallel and distributed systems make software development in these environments increasingly difficult. Therefore, among other CASE tools, software developers and users need powerful tools that are able to gather information from running applications as well as to dynamically manipulate their execution. To connect multiple tools to a running application, online monitoring systems have been developed that integrate common tool functionality and provide synchronization among multiple requests. This paper presents and compares three monitoring systems developed at LRR-TUM München which address different programming paradigms: OMIS/OCM is aimed at message passing systems, CORAL at the distributed shared memory (DSM) paradigm, and MIMO at distributed object computing (DOC). Although our monitoring systems are aimed at different programming paradigms, they are based on many similar concepts and solutions. In addition, monitoring features that are specific to a particular programming paradigm are pointed out.

KEYWORDS: Online Monitoring, Parallel Tools, Software Development, Load Balancing, Performance Analysis, Debugging, Checkpointing, Visualization.
1 Introduction

Software development methodologies and processes have not substantially changed over the years. Typically, developers and customers specifying the system requirements go through an incremental process that evolves into the final product through a series of iterations. They usually make use of graphical and textual modeling techniques to improve visualization, understanding and communication with one another. Once the system functionality has been elicited and the system structure
has been defined, developers verify the specification, then design and implement the system. Finally, the system is tested and deployed to the customer. In the test and usage phases both developers and users need sophisticated software tools to identify logical program errors and performance bottlenecks as well as to manipulate running applications and their environment. The necessity for tool support is even more pronounced in parallel and distributed systems, primarily because parallel and distributed applications consist of multiple activities (processes and threads) which may run concurrently. Further issues, such as nondeterministic execution, management of computing resources and load balancing, must be taken into account when producing quality parallel and distributed software and tools. In addition, technology transitions, coupled with competitive pressures that push vendors to ship new systems prematurely and without appropriate tool support, have obstructed the widespread use of parallel software: vendors prioritize the development of new systems and develop tools separately, usually only for well-established programming paradigms. In this paper we present, evaluate and compare three monitoring systems developed at LRR-TUM München which connect parallel tools to running applications and are aimed at three different programming paradigms. The rest of the paper is organized as follows. In Section 2 we present a model of parallel software and describe parallel software paradigms derived from the model. In Sections 3 and 4 we proceed with general requirements imposed on parallel tools, together with a general monitoring architecture. Our monitoring implementations are discussed in Section 5. In the final section we summarize our conclusions and discuss future work.
2 Parallel Software Paradigms

In this section we present a model of parallel software [14] and briefly outline three programming paradigms commonly used in parallel and distributed environments. The model is specified using the UML [3] notation. For the sake of simplicity only basic UML relationships are used – dependency, association (aggregation) and generalization. With reference to Figure 1, the basic building blocks in this model are the Code and Data classes. The first class contains operations in a programming or processor language while the second one describes the data on which the operations are performed. The SeqExecutable class encompasses the previous classes and two associations: Interaction and Intrusion. The Interaction class represents the communication and synchronization facilities used by applications via middleware, while the Intrusion class refers to both observation and modification facilities used by monitoring systems. Three kinds of SeqExecutable may be identified. First, the Object class refers to objects in the sense of object-oriented languages. Second, the Thread class describes both user-level and kernel-level threads. Finally, the Process class identifies processes created by the operating system. The ParExecutable class consists of multiple processes; each process may contain multiple threads and each thread may contain several objects. The Object class may be further refined into data and function members, which are encapsulated only in object-oriented languages. The Application class represents a user application performing a specified algorithm or task, e.g. the traveling salesman problem. The layer between the application and the platform on which the application runs is depicted by the Middleware class, which encompasses software that provides specific functionality to applications. The Monitoring class describes software that is able to make intrusions into both the Application and Middleware classes in order to observe their behavior and manipulate their execution and interactions.
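To make the model concrete before turning to the figure, the following fragment gives a minimal C++ rendering of its core classes. It is a sketch for illustration only; the class names simply mirror Figure 1, and no such code exists in our systems.

// Minimal sketch of the model in Figure 1: a SeqExecutable bundles Code and
// Data; Processes contain Threads, Threads contain Objects, and a
// ParExecutable aggregates the Processes of an Application.
#include <string>
#include <vector>

struct Code { std::string operations; };  // operations in some language
struct Data { std::string contents; };    // data the operations act on

struct SeqExecutable {                    // common base for the three kinds
    std::vector<Code> code;               // 1..* Code
    std::vector<Data> data;               // 1..* Data
};

struct Object : SeqExecutable {};                          // language object
struct Thread : SeqExecutable { std::vector<Object> objects; };
struct Process : SeqExecutable { std::vector<Thread> threads; };

struct ParExecutable { std::vector<Process> processes; };  // 1..* Process
struct Application { ParExecutable executable; };          // e.g. a TSP solver

int main() {
    Application app;
    app.executable.processes.push_back(Process{});  // one process, no threads
    return 0;
}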
Figure 1: Parallel Software Model – UML Notation

According to the semantics of the Interaction class and the structure of the address space, the following breakdown of parallel programming paradigms can be obtained:

Message Passing – The Interaction association in the software model for this programming
paradigm represents message exchange among SeqExecutables which reside in different address spaces. Messages are exchanged through middleware using two primitives, send and receive, whose parameters specify the peers and the communication semantics (a minimal sketch follows this list). The most common message passing libraries are PVM [2] and MPI [4, 5].

Distributed Shared Memory – In this programming paradigm SeqExecutables also reside in different address spaces. The DSM middleware transparently makes use of the underlying message passing capabilities, or of additional hardware support, to provide an abstraction of shared memory on hardware that is not tightly coupled. The Interaction association refers to explicit synchronization among parallel activities, e.g. locks and barriers. Typical software DSM representatives are TreadMarks [8] and Orca [1].

Distributed Object Computing – This paradigm integrates the DSM paradigm, object orientation and the client-server architecture, and therefore hides the Interaction class in runtime systems. Objects may only be accessed by calling their methods through their references; remote objects are transparently accessed utilizing underlying message passing capabilities or remote procedure calls. The DOC functionality is provided by means of services implemented as objects which are distributed across a pool of servers. Services are described by defining interfaces in an interface definition language (IDL). Typical representatives are CORBA [6, 7], DCOM [10] and Java RMI [13]. A detailed discussion on DSMs and DOCs can be found in [17].
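As the minimal sketch promised above: two MPI processes exchange a single integer, with the peers and communication semantics fixed entirely by the send/receive parameters. The example is illustrative only and is not drawn from the monitoring systems discussed later.

// Minimal message passing sketch: process 0 sends one integer to process 1.
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    int rank, value = 42;
    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        // peer and semantics are given by the parameters:
        // destination rank 1, tag 0, default communicator
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        printf("process 1 received %d\n", value);
    }
    MPI_Finalize();
    return 0;
}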
3 Monitoring Issues

Monitoring systems connect parallel tools to running applications. According to Figure 1, software-based monitoring systems have the same structure as other parallel and distributed software. The Intrusion association is the glue that makes monitoring work, as it allows monitoring systems to both observe and manipulate running applications and middleware. In contrast, the Interaction class refers to the communication and synchronization among an application's components via middleware. The key design and implementation issues that developers contemplating building monitoring systems have to deal with are the following:

Implementation Complexity – A monitoring system is itself a distributed application and must cope with the common issues already encountered in the realm of parallel and distributed software. In addition, intrusion into middleware and applications usually requires knowledge about their implementation internals.

Functionality Requirements – The primary issue that functional requirements must address is completeness: monitoring systems must provide services that allow all tools to observe and manipulate all objects being monitored. The set of events that most tools are interested in is largely common, but the actions they want to perform usually differ. For instance, a debugger, a performance analyzer and a program flow visualizer may all be interested in the event "a process acquired a lock". However, the debugger may want the monitor to stop the acquiring process, the performance analyzer may request the time needed to obtain the lock, and the visualization tool may want the monitor to record this event in a trace file (see the sketch after this list).

Transparency – The middleware mechanisms must stay transparent to tools and be recognized and managed by monitoring systems. For instance, process migration should be completely transparent to tools and may only be visible as performance degradation. The monitoring system must be able to save and restore the process state as well as other common resources, e.g. the communication routes and shared data. In order to prevent possible falsification after the migration, monitoring systems must transparently update the other participants. If the migration cannot be performed, the system must be able to cancel the transaction.

Efficiency and Scalability – Efficiency deals with the performance of monitoring systems as well as with their impact on the monitoring target. Performance issues are usually twofold: first, communication among monitoring components and tools must be fast; second, a monitoring system should consume processor time only when unavoidable. Scalability refers to the changes in performance when the system being monitored grows, e.g. when new SeqExecutables are added.

Extensibility and Maintainability – These issues deal with the overhead needed to modify and maintain monitoring systems. In particular, the system should be easily extended with new functionality (services and events), new objects being monitored and new tools. Further, it should be easily adapted to new programming paradigms.

Heterogeneity and Portability – These issues deal with the ability of monitoring systems to work in environments with different hardware architectures, operating systems and programming languages. Portability refers to the overhead needed to port a monitoring system to other platforms. The most common way to deal with portability is layering, as it allows developers to modify only the lower layers when porting the system to a new platform.
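The sketch referred to in the Functionality Requirements item above: one event, three tool-specific actions. The Monitor, Event and Action types are invented for this illustration and do not correspond to the actual interfaces of the systems described in Section 5.

// Hypothetical event-action sketch: three tools register different actions
// for the same event "lock_acquired"; the monitor dispatches to all of them.
#include <functional>
#include <iostream>
#include <map>
#include <string>
#include <vector>

struct Event {            // a paradigm-level event observed by the monitor
    std::string name;
    int process_id;
    double wait_time_ms;  // time the process waited for the lock
};

using Action = std::function<void(const Event&)>;

class Monitor {
    std::map<std::string, std::vector<Action>> actions_;
public:
    // a tool registers an action to run when the named event occurs
    void on(const std::string& event_name, Action action) {
        actions_[event_name].push_back(std::move(action));
    }
    // called by the intruding component when the event is detected
    void dispatch(const Event& e) {
        for (auto& a : actions_[e.name]) a(e);
    }
};

int main() {
    Monitor m;
    // debugger: stop the acquiring process (the stop is only sketched)
    m.on("lock_acquired", [](const Event& e) {
        std::cout << "debugger: stop process " << e.process_id << "\n";
    });
    // performance analyzer: record the time needed to obtain the lock
    m.on("lock_acquired", [](const Event& e) {
        std::cout << "analyzer: lock wait " << e.wait_time_ms << " ms\n";
    });
    // visualizer: write the event to a trace (stdout stands in for a file)
    m.on("lock_acquired", [](const Event& e) {
        std::cout << "trace: P" << e.process_id << " lock_acquired\n";
    });
    m.dispatch(Event{"lock_acquired", 1, 3.5});
    return 0;
}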
4 Monitoring Architecture

The event-action paradigm, the client-server architecture and layering are three important concepts around which the software architecture of monitoring systems must be centered. The event-action paradigm allows tools to register actions that should be executed by the monitoring system when the requested events occur. The client-server architecture naturally fits the tool-monitor relationship – a tool requests a service, and the monitoring system performs the request and replies with the result. A layered approach decomposes the system into logical or functional units and thus may reduce the complexity of the system and improve its portability, maintainability and reusability. The layers should have clear interfaces and communication protocols. The following layers may be identified in our monitoring systems:

Distribution Layer – This layer deals with parallelism and distribution. For instance, a request to stop all application processes must be handled at this layer and transparently distributed to the local components (a sketch of this fan-out follows below). The layer is also responsible for gathering results from the local components and sending replies back to the tools. The distribution layer understands the semantics of the programming paradigm but does not explicitly deal with it. In addition, due to its central position, this layer prevents inconsistencies which may arise when multiple tools are connected at the same time.

Monitoring Layer – This layer implements the common monitoring functionality. It should provide services or actions which may be triggered by the distribution layer or by external events. The monitoring layer understands the semantics of the programming paradigm being monitored and is able to deal with its mechanisms, particular implementations and versions through the adaption layer.

Adaption Layer – This layer hides the specific features that come from different implementations or versions of the same programming paradigm. To port the monitoring system to other platforms, only this layer should be modified or replaced. The adaption layer is also responsible for capturing the events generated by the objects being monitored, and it is able to perform specific actions requested by the monitoring layer. In addition, this layer may be used to integrate monitoring components implemented in hardware.

The common architecture of our monitoring systems is depicted in Figure 2. The distribution component implements the distribution layer, the monitoring layer is implemented by local components, and the adaption layer is implemented by both local and intruding components. Intrusion is made by wrapping middleware calls and inserting code for observation and manipulation of the objects being monitored as well as for communication with the local components.
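As an illustration of the distribution layer's fan-out role mentioned above, the following sketch distributes one tool request to per-node local components and gathers the partial replies. All class names are invented for the example and are not taken from OCM, CORAL or MIMO.

// Hypothetical fan-out/gather sketch for the distribution layer.
#include <iostream>
#include <string>
#include <vector>

struct LocalComponent {
    int node;
    // performs the action on this node and returns a partial result
    std::string handle(const std::string& request) {
        return "node " + std::to_string(node) + ": " + request + " done";
    }
};

class DistributionComponent {
    std::vector<LocalComponent> locals_;
public:
    explicit DistributionComponent(std::vector<LocalComponent> locals)
        : locals_(std::move(locals)) {}
    // distribute the request, gather the replies, return them to the tool
    std::vector<std::string> request(const std::string& req) {
        std::vector<std::string> replies;
        for (auto& lc : locals_) replies.push_back(lc.handle(req));
        return replies;
    }
};

int main() {
    DistributionComponent dist({{0}, {1}, {2}});
    for (const auto& r : dist.request("stop_all_processes"))
        std::cout << r << "\n";
    return 0;
}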
5 Monitoring Systems

In this section we present our monitoring systems according to the issues, requirements and architecture discussed in the previous sections.
[Figure 2 shows the general monitoring architecture: tools exchange requests and replies with the distribution component; the distribution component distributes work to and gathers results from the local components it communicates with; intruding components attached to the objects being monitored exchange events and actions with the local components.]

Figure 2: General Monitoring Architecture
5.1 OMIS and OCM

The OMIS [9] (On-line Monitoring Interface Specification) project proposes a standardized interface between tools and on-line monitoring systems for the message passing programming paradigm. In order to achieve flexibility in managing complex communication patterns between tools and monitoring systems, the interface specification is based on a language. The use of standardized interfaces makes the development and porting of parallel tools easier: with m tools and n target platforms, this approach reduces the development effort from m × n to only m + n combinations (for instance, 5 tools on 4 platforms require 9 implementations instead of 20). OMIS provides extension mechanisms for new functionality of existing tools, for new tools and for new programming paradigms. OMIS-based monitoring systems contain a set of services which are structured according to the event-action paradigm.

The OCM [12] (OMIS Compliant Monitoring System) project is aimed at the implementation of OMIS-based monitoring systems. Currently, the monitoring system has been implemented for PVM applications running on a network of workstations; MPI and Windows NT implementations are in progress. Local information and events are managed by local monitors which are implemented as separate processes. Applications are linked against an instrumentation library which communicates with the local monitors through shared memory segments. A central component, called the NDU (Node Distribution Unit), is responsible for managing requests that address multiple nodes. Currently, communication among OCM components is performed through the PVM library, but in order to avoid disrupting the PVM communication mechanisms being monitored and to ease porting OCM to MPI, the current PVM-based communication will be replaced with the MCI library, which is part of the CORAL project.
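To give a flavor of the language-based interface, the fragment below registers a conditional request in the event-action style. Both the request string and the send_request() helper are hypothetical illustrations; they are not verbatim OMIS syntax, for which the specification [9] is authoritative.

// Illustrative only: a tool hands the monitor an event-action request as a
// string, in the spirit of OMIS's language-based interface.
#include <iostream>
#include <string>

// hypothetical stand-in for the tool side of the tool-monitor interface
void send_request(const std::string& request) {
    std::cout << "registered: " << request << "\n";
}

int main() {
    // event-action style: "when process p_1 terminates, notify the tool";
    // the concrete syntax below is invented, not actual OMIS
    send_request("proc_has_terminated([p_1]): notify_tool([])");
    return 0;
}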
5.2 CORAL

The CORAL [14, 15] (Cooperative Online Monitoring Actions Layer) project is aimed at the design and implementation of online monitoring systems for applications based on the DSM programming paradigm. CORAL instruments only parallel activities and constructs. Sequential constructs, e.g. sequential data and code, are not monitored; they can be covered by legacy sequential software, e.g. debuggers such as dbx and gdb can be used to debug sequential constructs. CORAL is primarily focused on the interaction between parallel applications and DSM runtime systems, transparency of the programming paradigm, and monitoring consistency. Currently, CORAL supports DSM applications based on the TreadMarks library.

The distribution component in CORAL, called the CCU (Coral Control Unit), is responsible for dealing with parallelism. It receives requests from multiple tools, ensures consistency among multiple requests and distributes the requests to the local monitors. The replies from the local monitors are gathered by the CCU and sent to the appropriate tools. CORAL wraps DSM libraries and inserts code which transparently manipulates the DSM mechanisms (the wrapping technique is sketched below). For instance, process migration is completely transparent to the CCU: the local monitors, in conjunction with the intruding components, implement the adaption layer, which is able to save the process state and its shared resources and to update all other processes. After the migration has been performed, all connected tools are informed about the new location of the migrated process. The communication between monitoring components, as well as between the monitoring system and tools, is accomplished through a fast and tiny library called the Monitoring Communication Interface (MCI), which makes use of shared memory segments and TCP sockets for local and remote data transfer, respectively. MCI is derived from the SCIPVM library [16] and is similar in concept to Active Messages. In order to avoid disrupting local computation, as monitoring components potentially could, while still preserving the effectiveness of the monitoring functionality, MCI implements interrupt-driven communication, buffering of early messages and multiplexing among multiple senders. The library does not encode messages to support heterogeneity, but each process is able to discover the peer's architecture and perform encoding on top of MCI.
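The wrapping technique mentioned above can be sketched as follows: the instrumentation interposes on a DSM lock-acquire call, times it, reports an event to the local monitor, and forwards to the real routine. All names (lock_acquire, real_lock_acquire, mci_send_event) are invented stand-ins, not actual TreadMarks or CORAL symbols.

// Hypothetical sketch of intrusion by wrapping a DSM lock-acquire call.
#include <chrono>
#include <cstdio>

// stub standing in for the real DSM primitive (e.g. a TreadMarks routine)
static void real_lock_acquire(int /*lock_id*/) { /* acquire in the DSM */ }

// illustrative stand-in for reporting an event to the local monitor via MCI
static void mci_send_event(const char* name, int lock_id, double wait_ms) {
    std::printf("event %s: lock %d, waited %.3f ms\n", name, lock_id, wait_ms);
}

// the wrapper the application is linked against instead of the DSM routine
void lock_acquire(int lock_id) {
    auto start = std::chrono::steady_clock::now();
    real_lock_acquire(lock_id);   // forward to the wrapped DSM library
    std::chrono::duration<double, std::milli> wait =
        std::chrono::steady_clock::now() - start;
    // observation: tell the local monitor that the lock was acquired
    mci_send_event("lock_acquired", lock_id, wait.count());
}

int main() { lock_acquire(7); return 0; }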
5.3 MIMO

The MIMO (MIddleware MOnitor) system [11] aims to realize a monitoring and management system for distributed, heterogeneous middleware environments. As MIMO targets large client-server architectures, a way to handle the complex structure of such environments is needed. Therefore, MIMO introduces a multi-layer monitoring approach which allows middleware systems to be observed systematically by closely reflecting their hierarchical composition. For this general, abstract multi-layer monitoring model, mappings to various middleware platforms like CORBA or DCOM are defined. MIMO's distribution components provide the tool-monitor interface, which is described in CORBA IDL. Replies to tools are either sent as results of synchronous operation invocations or delivered via CORBA event channels for asynchronous events. Data collection within MIMO is carried out either by intruders, which are transparently integrated into existing applications, or by specific MIMO adapters, which are inserted into the application source code by programmers. Both intruders and adapters use CORBA IDL interfaces to attach to and detach from MIMO, and deliver asynchronous events by means of CORBA event channels (a sketch of this life cycle follows below). This generic intruder-monitor interface has the benefit that further middleware environments can be integrated easily, without having to modify the monitoring system itself. Hence, the MIMO system can be seen as an approach to provide a rather lightweight infrastructure which defines a general framework for monitoring and managing distributed middleware applications, where flexibility and extensibility with respect to heterogeneous middleware platforms and complex client-server environments are the main aspects of interest.
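The attach/event/detach life cycle of an intruder or adapter might look roughly like the following; the MimoCore class and its operations are invented for this sketch, while in the real system the interface is specified in CORBA IDL and events travel over CORBA event channels.

// Hypothetical sketch of an intruder/adapter life cycle with MIMO.
#include <iostream>
#include <string>

struct MimoCore {                       // stand-in for the IDL interface
    int attach(const std::string& entity) {
        std::cout << "attached " << entity << "\n";
        return 1;                       // token identifying the attachment
    }
    void push_event(int token, const std::string& event) {
        std::cout << "event from " << token << ": " << event << "\n";
    }
    void detach(int token) { std::cout << "detached " << token << "\n"; }
};

int main() {
    MimoCore mimo;
    int token = mimo.attach("corba_server:AccountService");
    mimo.push_event(token, "method_invoked: Account::balance");
    mimo.detach(token);
    return 0;
}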
6 Discussion

While the design, implementation and usage of the monitoring systems presented in this paper have proven to be very successful, several significant differences should be pointed out, most pertaining to the way the monitoring systems deal with their programming paradigm. OMIS/OCM is currently aimed at the message passing programming paradigm and supports PVM and MPI applications. CORAL supports TreadMarks applications and can easily be adapted to other DSM libraries. MIMO supports CORBA and DCOM. This is reflected in the ways these systems support parallel tools and in the services they provide. For instance, both OCM and CORAL support process migration, while MIMO supports only object migration. The primary reason is that DOC applications and middleware are loosely coupled, and no process has information about all the others; process migration would have to restore all server resources and update all clients, an overhead that may make migration very expensive to implement. On the other hand, due to the nature of DSM systems, the process migration mechanisms in OCM and CORAL are different, especially since message passing in DSMs is hidden and automatically managed by the DSM runtime system. Further differences come from the way the monitoring components communicate with one another. Both OCM and MIMO make use of the communication mechanisms available in the programming paradigm being monitored: OCM utilizes PVM and MIMO utilizes CORBA for internal communication. CORAL implements a communication library which is independent of the DSM mechanisms, since using them would yield poor performance – for the same reason, synchronization primitives in DSMs utilize message passing rather than DSM mechanisms.
7 Future Work

Our projects have been very enlightening, and the encouraging results strongly motivate further research in the realm of monitoring systems for parallel and distributed environments. As a next step, we intend to integrate our monitoring systems and provide a monitoring infrastructure for several programming paradigms. This integration will be based on the concepts that all three systems have in common: the event-action paradigm, the client-server architecture and layering. The first step towards this goal is the design and implementation of an object-oriented monitoring system providing basic monitoring mechanisms that are independent of programming languages and paradigms. Language- and paradigm-specific features should be handled through the adapter and bridge design patterns (sketched below).
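As a rough sketch of how the adapter pattern could isolate paradigm-specific features in such an integrated system (all class names below are hypothetical):

// Hypothetical adapter sketch: a paradigm-neutral interface used by the core
// monitoring mechanisms, with one adapter per middleware.
#include <iostream>
#include <memory>
#include <string>

// paradigm-independent view used by the core monitoring mechanisms
struct ParadigmAdapter {
    virtual ~ParadigmAdapter() = default;
    virtual void stop_entity(const std::string& id) = 0;
};

// adapter translating the generic request into PVM-specific actions
struct PvmAdapter : ParadigmAdapter {
    void stop_entity(const std::string& id) override {
        std::cout << "stopping PVM task " << id << "\n";
    }
};

// adapter translating the same request for a CORBA environment
struct CorbaAdapter : ParadigmAdapter {
    void stop_entity(const std::string& id) override {
        std::cout << "suspending CORBA object " << id << "\n";
    }
};

int main() {
    std::unique_ptr<ParadigmAdapter> a = std::make_unique<PvmAdapter>();
    a->stop_entity("t_42");       // core code stays paradigm-independent
    a = std::make_unique<CorbaAdapter>();
    a->stop_entity("Account#7");
    return 0;
}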
References

[1] H. E. Bal, M. F. Kaashoek, and A. S. Tanenbaum. Orca: A Language for Parallel Programming of Distributed Systems. IEEE Transactions on Software Engineering, 18(3):190–205, March 1992.

[2] A. Beguelin, J. Dongarra, A. Geist, R. Manchek, W. Jiang, and V. S. Sunderam. PVM: A User's Guide and Tutorial for Networked Parallel Computing. MIT Press, 1994.

[3] G. Booch, J. Rumbaugh, and I. Jacobson. The Unified Modeling Language User Guide. Addison-Wesley, 1999.

[4] The MPI Forum. MPI: A Message-Passing Interface Standard, Version 1.1. Technical report, University of Tennessee, Knoxville, TN, June 1995.

[5] The MPI Forum. MPI-2: Extensions to the Message-Passing Interface. Technical report, University of Tennessee, Knoxville, TN, July 1997.

[6] OMG (Object Management Group). CORBAservices: Common Object Services Specification. Technical report, November 1997.

[7] OMG (Object Management Group). Objects by Value – Joint Revised Submission. Technical report, 1998.

[8] P. Keleher, A. L. Cox, S. Dwarkadas, and W. Zwaenepoel. TreadMarks: Distributed Shared Memory on Standard Workstations and Operating Systems. In Proceedings of the USENIX Winter 1994 Conference, pages 115–131, January 1994.

[9] T. Ludwig, R. Wismüller, V. Sunderam, and A. Bode. OMIS – On-line Monitoring Interface Specification, Version 2.0. Technical Report 9, LRR – Technische Universität München, 1997.

[10] Microsoft Corporation. DCOM Architecture. Technical report, 1998.

[11] Günther Rackl. Multi-Layer Monitoring in Distributed Object-Environments. In Lea Kutvonen, Hartmut König, and Martti Tienari, editors, Distributed Applications and Interoperable Systems II – IFIP TC 6 WG 6.1 Second International Working Conference on Distributed Applications and Interoperable Systems (DAIS'99), pages 265–270, Helsinki, June 1999. Kluwer Academic Publishers.

[12] R. Wismüller, J. Trinitis, and T. Ludwig. OCM – A Monitoring System for Interoperable Tools. In Proceedings of the 2nd SIGMETRICS Symposium on Parallel and Distributed Tools (SPDT'98), pages 1–9. ACM Press, August 1998.

[13] Sun Microsystems Inc. Java Remote Method Invocation Specification – Revision 1.2. Technical report, October 1997.

[14] I. Zoraja. Online Monitoring in Software DSM Systems. PhD thesis, Institut für Informatik, Technische Universität München, to appear.

[15] I. Zoraja, A. Bode, and V. Sunderam. A Framework for Process Migration in Software DSM Environments. Submitted to PDP 2000, 1999.

[16] I. Zoraja, H. Hellwagner, and V. Sunderam. SCIPVM: Parallel Distributed Computing on SCI Workstation Clusters. Concurrency: Practice and Experience, 11(13):121–138, March 1999.

[17] I. Zoraja, G. Rackl, M. Schulz, A. Bode, and V. Sunderam. Modern Software DSM Systems: Models and Techniques. SFB Technical Report, LRR – Technische Universität München, to appear.