ACTIVE MONITORING IN GRID ENVIRONMENTS USING MOBILE AGENT TECHNOLOGY

Orazio Tomarchio, Lorenzo Vita
Dipartimento di Ingegneria Informatica e delle Telecomunicazioni, Università di Catania
Viale A. Doria 6, 95125 Catania - Italy

{tomarchio,lvita}@iit.unict.it

Antonio Puliafito
Dipartimento di Matematica, Università di Messina
Salita Sperone, 98166 Messina - Italy
[email protected]

Abstract

Monitoring distributed computational resources effectively is a crucial factor for high-performance distributed computation. Performance analysis and tuning, scheduling and fault detection are only some of the activities that require monitoring facilities. In this paper we present a mobile agent-based monitoring architecture. After explaining why this technology is well suited to coping with the heterogeneity of Grid systems, we describe the basic components of the proposed system. We also present some considerations on the high degree of flexibility that the proposed approach can achieve.

Keywords: Grid, Mobile agents, Monitoring, Java, Distributed Management.

1. INTRODUCTION

Grid environments have recently emerged as an integrating infrastructure for distributed high-performance scientific applications [2]. Scientific applications in domains such as high energy physics, earth sciences and biology require computing and storage resources on a scale that no single supercomputer can provide. The term Grid indicates an execution environment in which high-speed networks connect supercomputers, clusters of workstations, databases and scientific instruments located at geographically distributed sites.

Such environments have much in common with distributed and parallel systems, but differ from both in some respects. As in a distributed system, resources are heterogeneous, are connected by potentially unreliable networks, and often fall under different administrative domains. However, the need for high performance means that the most common distributed programming paradigms are not always suitable for Grids. Conversely, existing parallel programming tools and techniques are not always able to cope with the heterogeneity of Grid systems. These remarks show that a new generation of techniques, mechanisms and tools is needed, able to cope with the complexity of such systems while providing the performance that scientific computing applications require [2]. The creation of several Grid Forums [4], both in the United States and (more recently) in Europe, shows how important this area is for researchers worldwide, and suggests that these computing environments will spread in the future.

Among the issues that this kind of architecture raises, monitoring this plethora of resources is an important one. Monitoring facilities are needed in such environments for several activities, such as performance analysis, performance tuning, scheduling, fault detection and QoS-driven application adaptation [11]. Key issues for a monitoring service in a Grid system include scalability, discovery of data, validity of information and flexibility. In this scenario we think that mobile agent technology [5] can play a central role, since the ability of mobile agents to cope with system heterogeneity and to deploy user-customized procedures on remote sites seems well suited to Grid environments.
By interacting with a remote host after migrating to it, an agent can perform complex operations on remote data without transferring them: the basic idea of this paradigm is to move the application logic close to the data it needs. In general this yields a significant saving in bandwidth, and it also makes it possible to analyze remote performance data with user-customized algorithms encapsulated in an agent. The mobile agent system we used is MAP, developed at the University of Catania [10]. We opted for our own agent platform because the Grid system in which we operate might require specific improvements and modifications to the agent programming environment itself. Furthermore, interfacing with external software modules might perform worse if done at a high level rather than within the agent system.

The rest of this paper is organized as follows. Section 2 gives an overview of the reasons why, in our opinion, mobile agents may play an important role in monitoring systems for Grid environments. Section 3 describes the main characteristics of the MAP agent system, on which the proposed architecture is based. Section 4 describes the components of the monitoring architecture and the most common ways in which the system is used. Section 5 discusses related research. Finally, Section 6 presents our conclusions.

2. MONITORING USING MOBILE AGENTS

The ability to monitor distributed computational resources effectively is a crucial factor for high-performance distributed computation. Monitoring the behaviour of the resources of a distributed computing system is necessary both for determining the cause of performance problems and for tuning the system, in order to optimize its use and therefore its performance. Fault detection modules and recovery mechanisms rely on monitoring services to determine whether a specific server or remote process is down and, if necessary, to restart the service or redirect requests to other active servers in the system. Finally, building performance prediction models, which sophisticated scheduling algorithms might use, requires a historical record of the monitoring data for some resources or for a specific application.

As the distributed systems on which we operate become bigger and more widely distributed (as is the case for Grid environments), automating monitoring operations becomes important. Activating all the monitoring tools for the different resources involved in an application, collecting their data and filtering them to obtain useful information can become a burdensome and error-prone task for the end user. Besides, the lack of a global architecture for monitoring services in a Grid environment might cause scalability problems and/or waste the resources dedicated to monitoring. In this scenario, we think that a distributed approach based on mobile agent technology might be the key to dealing adequately with some of the above-mentioned problems.

A mobile agent is a software module able to migrate among the hosts of a network in order to carry out a specific task [5]. The agent is not tied to the system where it starts its execution.
After being created in an execution environment, an agent can carry its state and code to another execution environment on another host of the network, where execution can be restarted or continued. By "state" of the agent we mean the set of values of the agent's attributes, which allow it to determine what to do when execution restarts on another host.

The mobile agent programming paradigm overcomes some of the limits of traditional distributed processing techniques, which are typically based on the client/server paradigm [3]. In a mobile agent approach, the agent moves close to the data to be processed, thus eliminating the network traffic due to message exchange (apart from the initial migration) and allowing the execution of operations dynamically defined by the user. A typical case is a client/server application in which the client must retrieve data from the server and apply complex filtering to them: by moving an agent that contains the filtering procedures, only the data that actually concern the client are sent over the network, with a considerable reduction in communication costs [9]. Moreover, such a scheme does not require a permanent connection between client and server: once sent to its destination, the agent can carry on with its operations and communicate the results as soon as the client reconnects to the network.

We think that code mobility can likewise contribute to a more effective and flexible architecture for monitoring a distributed computing system. In particular, the main advantages of the agent paradigm are the following:

Reduction of the network load. Analyzing monitoring data away from the resource that produced them requires a considerable amount of data to be transferred over the network. Conversely, if the code performing the analysis is included in an agent, raw data no longer need to be transferred, since the interaction takes place directly on the remote resource. Only the data actually needed travel over the network.

User-customized, on-demand analysis of monitoring data. A user who develops his/her own algorithm for a customized analysis of monitoring data only needs to develop an appropriate agent, which will be dynamically executed on any host of the system where the user wishes to run it, with no need for complex installation procedures.

Filtering of monitoring data at several abstraction levels, without high overheads for the system. Each agent positioned on a remote resource may perform a "continuous monitoring" of the resource. In some cases, however, the information the user needs consists of average values calculated over different time intervals. With appropriate agents, only the data of interest are communicated, either when they are requested or when certain events occur.

Asynchronous and independent execution of tasks defined by the user. The user of an application often needs to monitor several components of the system, which in general means enabling several remote monitoring services manually. Conversely, an appropriately programmed agent will migrate to the different hosts of the network and enable the services needed, acting asynchronously and independently of the user.

Integration of monitoring tools for heterogeneous resources. Agents may be considered a middleware layer that adapts the view of the diverse low-level monitoring resources and services present in the system. Applications and users therefore interface uniformly with the agents only.

On-demand enabling of the necessary services. Because agents can be enabled dynamically, the number of active modules can be reduced to those actually needed. The consumption of resources dedicated to monitoring is thus kept to the minimum necessary.
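
The ideas of carrying state across hosts and of filtering data where they are produced can be illustrated with a minimal Java sketch. It does not use the MAP API: the class names `ThresholdAverageAgent` and `MigrationSketch` are hypothetical, and standard Java object serialization stands in for the transport step of a migration. The agent's fields play the role of the "state" described above, and only the final average has to travel back to the user.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical agent: its fields are the "state" that travels with it. */
class ThresholdAverageAgent implements Serializable {
    private static final long serialVersionUID = 1L;

    private final double threshold; // user-defined filter parameter
    private int hostsVisited = 0;   // state accumulated along the itinerary
    private double sum = 0.0;
    private long count = 0;

    ThresholdAverageAgent(double threshold) {
        this.threshold = threshold;
    }

    /** Runs next to the data: only samples above the threshold are kept. */
    void analyzeLocally(double[] rawSamples) {
        hostsVisited++;
        for (double sample : rawSamples) {
            if (sample > threshold) {
                sum += sample;
                count++;
            }
        }
    }

    int hostsVisited() {
        return hostsVisited;
    }

    /** The only value that has to travel back to the user. */
    double averageAboveThreshold() {
        return count == 0 ? Double.NaN : sum / count;
    }
}

/** Stand-in for the transport step of a migration (an assumption, not MAP). */
final class MigrationSketch {
    private MigrationSketch() {}

    /** Serializes the agent's state as it would be shipped to the next host. */
    static byte[] pack(ThresholdAverageAgent agent) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(agent);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Rebuilds the agent on the receiving side so execution can continue. */
    static ThresholdAverageAgent unpack(byte[] wire) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(wire))) {
            return (ThresholdAverageAgent) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In this sketch an agent that analyzes three samples on one host, is packed and unpacked, and then analyzes two more samples on a second host still remembers its running sum and visit count. The five raw samples never cross the network; a single number does.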

3. THE MAP PLATFORM

The mobile agent system we used is MAP, developed at the University of Catania [10]. It provides all the basic tools for creating and managing agents, and for their migration and communication. The platform makes it possible to create, run, suspend, wake up, deactivate and reactivate agents, to stop their execution, and to have them communicate and migrate through the network. MAP is also equipped with a simple graphical interface that facilitates access to these management functions.

The MAP platform has been made compliant with the MASIF specification [6, 7]. As a result, each MAP platform can accept and run agents coming from other MASIF-compliant platforms, giving them access to the methods needed for their management. In the same way, a MAP agent can migrate to and run on other platforms that support it. This is particularly important in a heterogeneous environment such as the Grid, where the proposed monitoring architecture might be implemented using different agent systems that interoperate by means of the MASIF standard. Furthermore, some "gateways" are currently being developed within MAP which allow, on one hand, a MAP agent to invoke a CORBA service and, on the other hand, an agent to be invoked as if it were a CORBA object. In our opinion this integration is also very important, since an agent may, during its migration, access various existing services equipped with a CORBA interface.

Security is the other very important aspect of MAP for the development of the monitoring system. In particular, a service of this type must ensure that monitoring tools are started only by users entitled to do so. An adequate security model has been implemented in MAP [8]. This model addresses authentication and authorization, in order to protect hosts from agents on one hand and, on the other, agents from other agents and from attacks coming from the network. Security policies can be defined to specify the conditions under which agents can access host resources. The techniques used are based on public key cryptography, in which each user has a pair of public and private keys with which he/she can prove his/her identity. This way, the security model developed in MAP can be easily linked to the PKI (Public Key Infrastructure) present in Grid systems.
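
As a rough illustration of how a key pair can prove the identity of the user who launches an agent, the following Java sketch signs the bytes of an agent with the user's private key and verifies them on the receiving host with the public key. It uses only the standard `java.security` API. The class name `AgentSignatureSketch`, the choice of RSA with SHA-256, and the idea of signing the whole serialized agent are assumptions made for this example; the actual MAP mechanisms are those described in [8].

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

/** Hypothetical helper; not part of the MAP security model described in [8]. */
final class AgentSignatureSketch {
    // Algorithm choice is an assumption made for this example.
    private static final String ALGORITHM = "SHA256withRSA";

    private AgentSignatureSketch() {}

    /** Creates the public/private key pair that identifies one user. */
    static KeyPair newUserKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** The user signs the agent before sending it out. */
    static byte[] sign(byte[] agentBytes, PrivateKey userKey) {
        try {
            Signature signer = Signature.getInstance(ALGORITHM);
            signer.initSign(userKey);
            signer.update(agentBytes);
            return signer.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** The host accepts the agent only if the signature matches a trusted key. */
    static boolean isAuthentic(byte[] agentBytes, byte[] signature, PublicKey trustedKey) {
        try {
            Signature verifier = Signature.getInstance(ALGORITHM);
            verifier.initVerify(trustedKey);
            verifier.update(agentBytes);
            return verifier.verify(signature);
        } catch (GeneralSecurityException e) {
            return false; // malformed signature or unusable key: reject
        }
    }
}
```

A host would typically run such a check before starting any monitoring tool on behalf of an incoming agent. Authorization (what an authenticated user may do) is a separate policy decision that this sketch does not cover.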

4. SYSTEM ARCHITECTURE

The architecture designed for monitoring a Grid system is based on the features of the MAP agent platform, which serves as the basic environment for the execution of agents in the system. Three different levels of "agents" are present, each with its own purpose, features and field of action: Low-Level Sensors (LLSensors), SensorAgents and High-Level Monitoring Agents (HLMAgents). The first two types are named "sensors" to highlight their role as collectors of data from the different system components. A directory service is also present, allowing some sensor agents to be registered and allowing information consumers to discover and understand the characteristics of the available information. We now describe the basic components of the architecture (shown in Figure 1) and their features in more detail.

Low-Level Sensors (LLSensors) are, in general, not agents. They consist of a set of operating system tools, programs and/or system calls able to monitor a specific resource by measuring one or more of its typical parameters. They are therefore specific to each resource, and in general several LLSensors able to measure the same parameters may exist for the same resource. They are the lowest level of our architecture and interface directly with resources.
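
To give a concrete feel for what a low-level sensor might look like when it is wrapped for use from Java, here is a minimal sketch built on the standard `java.lang.management` API. The `LowLevelSensor` interface, the `JvmHostSensor` class and the parameter names are hypothetical. Note also that `Runtime` reports the memory of the Java virtual machine rather than of the whole host, so this is only an approximation of a real host sensor, which would more likely wrap an operating system tool.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical common shape for a low-level sensor. */
interface LowLevelSensor {
    /** Type of resource measured, e.g. "host", "network", "storage". */
    String resourceType();

    /** One measurement: parameter name mapped to its current value. */
    Map<String, Double> read();
}

/** Reads a few host parameters through the JVM's management beans. */
final class JvmHostSensor implements LowLevelSensor {
    private final OperatingSystemMXBean os =
            ManagementFactory.getOperatingSystemMXBean();

    @Override
    public String resourceType() {
        return "host";
    }

    @Override
    public Map<String, Double> read() {
        Map<String, Double> values = new LinkedHashMap<>();
        values.put("cpu.count", (double) os.getAvailableProcessors());
        // One-minute system load average; negative on platforms where it
        // is not available.
        values.put("load.avg.1min", os.getSystemLoadAverage());
        // JVM heap figures only, not the memory of the whole host.
        Runtime runtime = Runtime.getRuntime();
        values.put("jvm.memory.total.bytes", (double) runtime.totalMemory());
        values.put("jvm.memory.free.bytes", (double) runtime.freeMemory());
        return values;
    }
}
```

Because every sensor in this sketch returns the same map-based shape, a higher layer can iterate over sensors for different resources without knowing which tool produced each value.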

Figure 1. Architecture of the monitoring system. [Diagram not reproduced. Labeled components: Monitoring Applications, Other Agents, Local Monitoring Application, High-Level Monitoring Agents (HLMAgents), Directory Service, SensorAgents, Low-Level Sensor Management (LLSManager), Low-Level Sensors (LLSensors), Agent Cache Repository, Performance Data Repository, Resource.]

The Low-Level Sensor Manager (LLSManager) registers the low-level monitoring tools available on a specific resource in the Directory Service (described below). Note that the registration of an LLSensor for a specific resource takes place only once, during the initialization of the resource itself. From then on, SensorAgents control the execution of LLSensors.

SensorAgents are the components designed to turn the data supplied by LLSensors into uniform information about the state of a resource. They are implemented as mobile agents, each specialized in monitoring a specific type of resource. The resources taken into account include:

host: sensors monitoring hosts should be able to monitor CPU load, total and available memory, and other host-related parameters;

network: sensors monitoring the network should at least be able to retrieve SNMP information from network devices, as well as other information about the characteristics of network links;

storage: sensors monitoring storage resources should be able to measure typical I/O parameters, such as read/write throughput and access time, for a mass data storage system;

process/application: these sensors monitor the life cycle of a process, reporting all relevant information about it; they should also be able to monitor user-defined signals and/or events coming from the application, and are therefore highly application-dependent.

SensorAgents were introduced out of the need to make the monitoring tools specific to the different resources uniform and normalized. As well as enabling more flexible management, their task is to create a uniform and comparable view of performance data about different resources, originally obtained with different tools.

Finally, at a higher level we find High-Level Monitoring Agents (HLMAgents), whose task is to generate and/or gather significant information by relying on all the agents and sensors present in the system. Among these agents we can distinguish the user- or application-initiated ones, whose activity is explicitly started by a user or by an application, from the event-initiated ones, which, once enabled, wait for a specific event before interacting with the user.

CollectSensorAgents and AggregateAgents belong to the first group. They condense the raw data produced by the sensor agents distributed in the system into meaningful information. While doing so, these agents can migrate from one host to another in order to minimize the amount of raw monitoring data that has to be transferred. Monitoring facilities often need a general view of the system performance level, which has to be computed by aggregating raw data from many hosts and devices of the system.

The other type of HLMAgent (event-initiated) makes it possible to disseminate trap functions in the system, which inform the user that certain events are occurring. Users can design their own TrapAgents, which are in charge of continuously monitoring a certain event or a complex sequence of events.
Only when this condition occurs does the TrapAgent (which has been placed on the most suitable host in the system) inform the user, who may in turn enable other monitoring services in order to take a management decision. In the TrapAgent the user can implement any monitoring function, however complex, relying on the services provided by the other sensor agents executing in the system. In this way the user does not need to preinstall special software on every host of the network: any monitoring function can be executed on demand, when it is really needed.

The other basic component of the proposed architecture is the Directory Service. As already mentioned, the LLSManager publishes the services and capabilities provided by the LLSensors of each specific resource. Monitoring applications and HLMAgents use the information in the Directory Service to send the appropriate SensorAgents to the resource involved. Furthermore, the Directory Service maintains information about the active SensorAgents, and therefore about the monitoring information available for the different resources.

A Performance Data Repository is also present in the system for each resource, to store the events and data recorded by LLSensors and SensorAgents. This repository is the local database of each resource, consulted by the different HLMAgents when they need to analyze such data. The format of the data held in the repository is described in the Directory Service, together with the sensor characteristics mentioned above.

Finally, the Agent Cache Repository present in each MAP region caches the code of the agents that have transited through and executed within the region. This way the MAP agent system minimizes transfers of agent code since, after an initial settling stage, the code of an agent is very likely to be in this cache. The dynamic class loading mechanism present in MAP makes this operation completely transparent to the end user.
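
The publish-and-discover role of the Directory Service can be sketched as a small in-memory registry in Java. The paper does not specify the directory's schema or access protocol, so everything here is an assumption made for illustration: the class names `SensorDescriptor` and `DirectoryServiceSketch`, the descriptor fields, and the use of a simple map keyed by resource type. A real deployment would presumably rely on a networked directory.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical entry published for one low-level sensor. */
final class SensorDescriptor {
    final String host;
    final String resourceType; // e.g. "host", "network", "storage"
    final String parameter;    // what the sensor measures
    final String dataFormat;   // how its records are laid out in the repository

    SensorDescriptor(String host, String resourceType,
                     String parameter, String dataFormat) {
        this.host = host;
        this.resourceType = resourceType;
        this.parameter = parameter;
        this.dataFormat = dataFormat;
    }
}

/** In-memory stand-in for the Directory Service (an assumption). */
final class DirectoryServiceSketch {
    private final Map<String, List<SensorDescriptor>> byResourceType =
            new HashMap<>();

    /** Called once per sensor, when the resource is initialized. */
    synchronized void register(SensorDescriptor descriptor) {
        byResourceType
                .computeIfAbsent(descriptor.resourceType, key -> new ArrayList<>())
                .add(descriptor);
    }

    /** Called by consumers to learn where sensors of a given type exist. */
    synchronized List<SensorDescriptor> lookup(String resourceType) {
        List<SensorDescriptor> found = byResourceType.get(resourceType);
        if (found == null) {
            return Collections.<SensorDescriptor>emptyList();
        }
        // Return a copy so callers cannot alter the registry.
        return Collections.unmodifiableList(new ArrayList<>(found));
    }
}
```

In this sketch the LLSManager of each resource would call `register` during initialization, and a monitoring application or HLMAgent would call `lookup("host")` to obtain the hosts to which SensorAgents should be sent, along with the data format it should expect in each local repository.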

5. RELATED WORK

Research in this area is still at an early stage. Even the concept of the Grid needs to be better understood, both as a whole and in its impact on present tools and methods of distributed computation. The issue of monitoring in Grid environments is currently being addressed by the Grid Performance Working Group of GridForum [4]: one of the papers it has produced [12] proposes a possible architecture for maintaining and accessing performance information. A system for managing monitoring sensors is proposed in [11], where Java agents are used in a producer/consumer model. Although this approach has several points in common with our proposal, it makes no mention of the mobility of the different components, an aspect that is very important in our architecture. Finally, another approach to the dynamic measurement of application performance in Grid environments is found in the Pablo scalable information toolkit [1].

6. CONCLUSION

An architecture for monitoring services in Grid environments based on mobile agent technology has been presented. Although previous work applying this technology to the management of distributed systems has been effective, many issues remain open, mainly regarding the scalability of such an infrastructure for Grid systems and its integration with other Grid services for real-world use. Future research includes testing this architecture on the significant testbeds that are currently being set up by several Grid-related projects.

References

[1] L. DeRose and A. Reed. SvPablo: A Multi-Language Architecture-Independent Performance Analysis System. In Proceedings of the International Conference on Parallel Processing (ICPP'99), Fukushima (Japan), September 1999.
[2] I. Foster and C. Kesselman. The Grid: Blueprint for a new Computing Infrastructure. Morgan Kaufmann, August 1998.
[3] A. Fuggetta, G.P. Picco, and G. Vigna. Understanding Code Mobility. IEEE Transactions on Software Engineering, 24(5), May 1998.
[4] GridForum. http://www.gridforum.org.
[5] K. Rothermel and R. Popescu-Zeletin, Eds. Mobile Agents. Lecture Notes in Comp. Science, LNCS1219, 1997.
[6] D. Milojicic, M. Breugst, S. Covaci, et al. MASIF: the OMG Mobile Agent System Interoperability Facility. In 2nd Int. Workshop on Mobile Agents (MA'98), Stuttgart (Germany), September 1998.
[7] E. Di Pietro, A. La Corte, A. Puliafito, and O. Tomarchio. Extending the MASIF Location Service in the MAP Agent System. In IEEE Symposium on Computer Communications (ISCC2000), Antibes (France), July 2000.
[8] A. Puliafito and O. Tomarchio. Security mechanisms for the MAP agent system. In 8th Euromicro Workshop on Parallel and Distributed Processing (PDP2000), Rhodos (Greece), January 2000.
[9] A. Puliafito and O. Tomarchio. Using Mobile Agents to implement flexible Network Management strategies. Computer Communication Journal, 23(8):708–719, April 2000.
[10] A. Puliafito, O. Tomarchio, and L. Vita. MAP: Design and Implementation of a Mobile Agent Platform. Journal of System Architecture, 46(2):145–162, 2000.
[11] B. Tierney, B. Crowley, et al. A Monitoring Sensor Management System for Grid Environments. In High Performance Distributed Computing (HPDC-9), Pittsburgh (Pennsylvania), August 2000.
[12] R. Wolsky, M. Swany, and S. Fitzgerald. White Paper: Developing a Dynamic Performance Information Infrastructure for Grid Systems. February 2000. Available at http://dast.nlanr.net/GridForum/Perf-WG/.