A Monitoring Framework for the Venice Service Grid

Michael Koch, Markus Hillenbrand, Paul Müller
University of Kaiserslautern, D-67653 Kaiserslautern, Germany
{m koch2, hillenbr, pmueller}@informatik.uni-kl.de

1 Motivation

The Venice Service Grid [4] (in short: Venice) provides an environment for building distributed applications on top of widely deployed services. As a Service Grid, Venice abstracts from the underlying hardware and software, hides the complexity of its resources and focuses on the end-user by providing intuitive means for service access. To be lightweight, Venice is designed to be easy to deploy (i.e. quick service creation and deployment), easy to maintain (i.e. tool support for management, monitoring, and configuration at runtime) and easy to use (i.e. no installation for end-users, graphical user interfaces for access).

From a software infrastructure view, Venice is based on a pure service-oriented architecture (SOA) [1] and focuses on openness [5], dependability [2] and security [3]. It is implemented using Web services technology and Peer-to-Peer technology. Its services can be deployed on the Internet; secure and reliable access to the Venice services is managed by the infrastructure. The services Venice offers range from service management at runtime, service information and access, and collaboration and communication to utility services on which applications can be built. The Venice runtime environment can be used for service development and deployment as well as for client development and programmer-friendly service access.

The maintenance of such a highly distributed and open system (where services can be deployed and undeployed at any point in time) needs a highly adaptive and robust monitoring system. Only with such a system can administrators and service providers react quickly and precisely. In the following, the intended monitoring system, its requirements and a first architecture are explained.

2 Functional Requirements

By definition, monitoring is the measurement and publication of the state of resources at a particular point in time. In an SOA, the necessary functionality is spread over several services and components. The functions of these services can be divided into three phases:

1. Data Collection. Before data can be collected, resources have to issue them. In an SOA, there are three predefined data sources, one per resource category, each of which may issue values of different kinds. A Host is a physical (or virtualised) machine that can measure memory and disk usage, CPU load, I/O rates, or the number of running containers. A Container is a software component that acts as a service environment. It can measure the number of service instances or the memory consumption inside the Java virtual machine (if applicable). Finally, a Service itself can measure the number of users, the maximum number of concurrent users, the duration of availability, or self-defined property combinations.

Self-defined properties must be expressible in terms of data types that can be used for measured properties and their metadata. Counters are integer values that are increased or decreased over time. Timers record system clock or wall-clock times. Absolute values may be short, int, long, or float values (for example I/O values). Time series combine one of the above data types with a timestamp and collect several of these pairs in a series. Traces combine an event type, a timestamp and a variable (usually a string) that grows over time (for example to track the sequence of operation calls of one service; each operation call is appended to the variable).

2. Data Storage. As a result of the measurements and data types defined above, the monitoring system has to track two kinds of information. On the one hand, numerical values are stored for long-term analysis. On the other hand, status and metadata are stored. This information is needed to reconstruct the resource topology and its specific environment at a particular point in time. In this way, certain service constraints can be derived. All these values have to be saved in a generic way so that they can be reused for different purposes.
For this reason, an SQL-based database is used, which allows a wide range of analyses. The data enable direct or external observation of resources over time within different contexts and from different perspectives.

3. Data Publication. There is a broad range of possible publication methods. So far, three useful approaches have been identified: 1) The triggering of events with respect to freely definable formulae. If a trigger matches, an associated event is published. 2) The configuration of external applications like Nagios (realtime observation), Nagvis (realtime visualisation) and Munin (historical data representation) to use the collected information. 3) An API for the extraction of data via a dedicated Venice service which can be freely accessed within the Venice Service Federation.
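
The data types and the generic storage described in the phases above could be modelled roughly as follows. This is a minimal sketch in Python and not the Venice implementation: the class names, the `measurement` table layout, the resource identifier `service:demo` and the use of SQLite as the SQL-based database are all assumptions made for illustration.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Counter:
    """Integer value that is increased or decreased over time."""
    value: int = 0

    def add(self, delta: int = 1) -> int:
        self.value += delta
        return self.value


@dataclass
class TimeSeries:
    """Series of (timestamp, value) pairs."""
    points: list = field(default_factory=list)

    def record(self, timestamp: float, value: float) -> None:
        self.points.append((timestamp, value))


@dataclass
class Trace:
    """Event type plus a sequence of entries that grows over time."""
    event_type: str
    entries: list = field(default_factory=list)

    def append(self, timestamp: float, text: str) -> None:
        self.entries.append((timestamp, text))

    def as_string(self) -> str:
        # Join all appended entries into one growing string.
        return ";".join(text for _, text in self.entries)


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create a generic table with one row per value (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS measurement ("
        " resource TEXT, kind TEXT, name TEXT, ts REAL, value TEXT)"
    )
    return conn


def store(conn, resource, kind, name, ts, value) -> None:
    """Save one value in a generic, type-independent form."""
    conn.execute(
        "INSERT INTO measurement VALUES (?, ?, ?, ?, ?)",
        (resource, kind, name, ts, str(value)),
    )


# A service counts its users and traces its operation calls.
users = Counter()
users.add()
users.add()
users.add(-1)

cpu_load = TimeSeries()
cpu_load.record(1.0, 0.25)
cpu_load.record(2.0, 0.5)

calls = Trace("operation_call")
calls.append(1.0, "login")
calls.append(2.0, "getData")

conn = open_store()
store(conn, "service:demo", "counter", "users", 3.0, users.value)
store(conn, "service:demo", "trace", "calls", 3.0, calls.as_string())
rows = conn.execute(
    "SELECT name, value FROM measurement ORDER BY name"
).fetchall()
```

Storing every value as text together with its kind is only one way to keep the table generic; a real schema would more likely separate numerical values from status and metadata, as the two kinds of information described above suggest.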

3 Architecture

This section describes the current architecture of the implementation, which is intended to fulfil the described requirements. The functionality is split across several components.

Probes are objects that are directly responsible for all measured values of one specific resource. Three types of probes are currently envisaged, one for each of the three resource types: hosts, containers and services. Each probe manages its assigned resource and is initiated by the service abstraction layer of the Venice framework. The service implementation can register several measurement values. Each monitored value can have one of several supported data types. Additionally, the granularity of these values can be changed by adjusting the measurement intervals.

A Probe Data Collector is initialised by the first probe that is registered within a Venice service container. It also manages a Host Probe and a Container Probe, which collect information about their specific resource types. The Probe Data Collector reads the measurements of all probes at the given intervals according to their definitions, assigns timestamps as well as the required metadata, and adds the results to the submit queue. After that, the data are sent to the Monitoring Data Collector Service.

All information about the monitored system is sent to the Monitoring Data Collector Service. Its primary function is the storage of data in a database. Its secondary function is the observation of freely definable triggers. Actions can be specified for the case that thresholds or other formulae match. If a threshold is reached or a formula is fulfilled, a notification is sent via the Venice notification services.

In this way, each service is responsible for storing its values by configuring a Service Probe. The Probe Data Collector, in turn, is responsible for a single container on one host and also manages the submission of the data to the domain's repository using the Monitoring Data Collector.
Locally, the Probe Data Collector polls the probes for their data. It then pushes these data to the Monitoring Data Collector and thus to the domain-centric monitoring repository.

A supplementary application programming interface (API) is implemented as a separate service. It provides flexible access to the collected information and can be used by other services. Data can be extracted by using a WHERE clause of an SQL statement.
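
The interplay of probes, the Probe Data Collector and the Monitoring Data Collector Service described in this section can be sketched as follows. The sketch makes several simplifying assumptions: all class and method names are hypothetical, the transfer between the two collectors is a direct method call instead of a Web service invocation, the database is replaced by an in-memory list, triggers are plain Python predicates, and interval scheduling is left to the caller, who passes the current timestamp to `poll`.

```python
from collections import deque
from typing import Callable, Dict, List


class Probe:
    """Holds the measurement callbacks registered for one resource."""

    def __init__(self, resource: str):
        self.resource = resource
        self.sources: Dict[str, Callable[[], float]] = {}

    def register(self, name: str, read: Callable[[], float]) -> None:
        self.sources[name] = read


class MonitoringDataCollector:
    """Stores received records and checks freely definable triggers."""

    def __init__(self):
        self.records: List[dict] = []       # stands in for the database
        self.triggers = []                  # (formula, message) pairs
        self.notifications: List[str] = []  # stands in for notifications

    def add_trigger(self, formula: Callable[[dict], bool], message: str) -> None:
        self.triggers.append((formula, message))

    def submit(self, record: dict) -> None:
        self.records.append(record)             # primary function: storage
        for formula, message in self.triggers:  # secondary function: triggers
            if formula(record):
                self.notifications.append(message)


class ProbeDataCollector:
    """Polls the local probes, then pushes queued records to the target."""

    def __init__(self, host: str, target: MonitoringDataCollector):
        self.host = host
        self.target = target
        self.probes: List[Probe] = []
        self.queue = deque()  # submit queue

    def poll(self, now: float) -> None:
        # Read every registered value, add timestamp and metadata, enqueue.
        for probe in self.probes:
            for name, read in probe.sources.items():
                self.queue.append({
                    "host": self.host,
                    "resource": probe.resource,
                    "name": name,
                    "ts": now,
                    "value": read(),
                })

    def flush(self) -> None:
        # Push all queued records to the Monitoring Data Collector.
        while self.queue:
            self.target.submit(self.queue.popleft())


central = MonitoringDataCollector()
central.add_trigger(
    lambda r: r["name"] == "users" and r["value"] > 100,
    "too many users",
)

collector = ProbeDataCollector("host-a", central)
service_probe = Probe("service:demo")
service_probe.register("users", lambda: 150)
collector.probes.append(service_probe)

collector.poll(now=1.0)
collector.flush()
```

After `flush`, the single record is held by `central` and the threshold trigger has produced one notification. Separating `poll` from `flush` mirrors the local pull and the subsequent push described above, and it would allow records to stay in the queue while the central service is unreachable.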

4 Conclusion and Future Work

The Venice Monitoring Framework provides monitoring functionality for widely distributed services. Probes that provide measurement data are injected into hosts, containers and services. These data are collected by a Probe Data Collector and reliably transmitted to the Monitoring Data Collector Service, where they are stored in a database. Triggers complement the collecting functionality by sending notifications when a threshold has been reached or a complex formula is fulfilled. The separate Monitoring Service can be used to read data from the database for analysis.

In the future, the monitoring framework will be extended by integrating third-party software like Nagios, Nagvis, or Munin. This enables the integration of Venice into widely used monitoring systems. Additionally, the performance of the monitoring framework will be analysed in a field test.

5 Acknowledgements

This work was funded by the German Ministry for Education and Research (BMBF project “iGreen” 01IA08005G). The authors are responsible for the content.

References

[1] T. Erl. Service-Oriented Architecture – Concepts, Technology, and Design. Prentice Hall, 2005.
[2] B. E. Helvik. Perspectives on the dependability of networks and services. Telektronikk, 3:27–44, Mar. 2004.
[3] M. Hillenbrand, J. Götze, J. Müller, and P. Müller. A Single Sign-On Framework for Web Services-based Distributed Applications. In Proceedings of the 8th International Conference on Telecommunications ConTEL (Zagreb, Croatia), June 2005.
[4] M. Hillenbrand, J. Götze, G. Zhang, and P. Müller. A Lightweight Service Grid based on Web Services and Peer-to-Peer. In Proceedings of Kommunikation in verteilten Systemen KiVS (Berne, Switzerland), 2007.
[5] A. S. Tanenbaum and M. van Steen. Distributed Systems – Principles and Paradigms. Prentice Hall, 2002.