Using YAIM Tool for Developing gLite Based Grid Services

M.L.Jayalal

S.Rajeswari

S.A.V. Satya Murty

P.Swaminathan

Electronics and Instrumentation Group, Indira Gandhi Centre for Atomic Research, Kalpakkam

Email-ID: [email protected]

Abstract - YAIM originally stood for Yet Another Installation Manager, which denotes its basic functionality: a set of bash scripts used for installing grid middleware services. As functionality was added, the name came to be read as YAIM Ain't an Installation Manager, which emphasises that it can do much more than a simple installation manager. YAIM can be used as a powerful general purpose configuration tool, not only for grid middleware but also for many other Linux based services. YAIM is a widely used grid services configuration tool for gLite based middleware, developed at CERN as part of the Worldwide LHC Computing Grid (WLCG). The aim of the YAIM tool is to make the installation and configuration of grid services as easy as possible. This paper gives a brief introduction to the WLCG setup and the associated gLite middleware software. The configuration details and functionality of the YAIM tool, including its sequence of operations, are discussed. Then, based on working experience in the WLCG project at CERN, a typical example of developing a configuration module for a grid service using the YAIM tool is covered.

I.

INTRODUCTION

The application of grid computing is recognized in all fields of development, from basic scientific and engineering research to e-commerce and web services. In essence, grid computing is a form of distributed computing which addresses the ever-growing requirement for computational and data storage resources by employing powerful clusters locally and interconnecting them through wide area networks. A grid computing setup whose main goal is to share computational as well as data storage resources can be termed a Computational Data Grid. The Data Grid is another category, in which the grid system deals with the controlled sharing and management of large amounts of distributed data.

Recently, the Large Hadron Collider (LHC) experiments and the associated Worldwide LHC Computing Grid (WLCG) project at the European Organization for Nuclear Research (CERN), Geneva, received worldwide attention when the first proton beams were circulated. At full capacity, the LHC, the world's largest particle accelerator, is expected to produce more than 15 million Gigabytes of data each year. The mission of the WLCG is to build and maintain the data storage and analysis infrastructure for this immense flow of data, thus helping physicists open new frontiers in our understanding of the Universe. This ambitious project connects and combines the IT power of more than 140 computer centres in 33 countries. The software tools and packages that we consider in this paper were developed as part of the WLCG project but can be used for building any typical Computational Data Grid.

II.

gLite MIDDLEWARE

The term middleware represents the software stack that sits conceptually between the operating system at the bottom and the different application software at the top. The middleware's task is to organize and integrate the distributed computational resources of the grid into a coherent structure. In a typical setup, the grid specific services are implemented by the grid middleware. The WLCG project uses middleware called gLite (pronounced “gee-lite”) to implement the grid setup. gLite is the successor of the earlier middleware from the same WLCG project, called LCG-2, and is one of the most widely used grid middleware stacks. The gLite middleware hides much of the complexity of this environment from the user, giving the impression that all of these resources are available in a coherent virtual computer centre.

gLite is an open source software toolkit and provides a leading-edge framework for building grid applications. The toolkit includes software services and libraries for workload management, resource monitoring, data management, security, accounting, information infrastructure and fault detection. It also includes additional software tools such as YAIM for configuring other services, which is considered in detail later in this paper. The initial version, gLite 1.0, was released in April 2005. gLite 3.0, released in May 2006, combines the features of the earlier gLite 1.5 and of a similar middleware stack called LCG 2.7. gLite 3.1 was released in June 2007 and the latest version, gLite 3.2, was released in March 2009.

III.

BASIC CONCEPTS RELATED TO gLite

The gLite middleware provides an almost complete software framework for building a Computational Data Grid. The basic concepts and functional units are discussed here to give a better understanding of gLite.

A.

Virtual Organisation (VO)

A VO refers to a group of individuals and/or institutions and resources that work in collaboration towards a common goal. The users of a grid can be organised into different virtual organisations, each having a different set of policies. Membership of a VO grants specific privileges to a grid user. The gLite service that is responsible for grid user/resource authentication is called the Virtual Organization Membership Service (VOMS).

B.

Resource Broker (RB)

For running user jobs in a grid environment, the key functionality is allocating appropriate resources to the jobs and managing them, and this is done by the system called the RB. In other words, the grid wide workload management service of gLite runs on the RB machine. To achieve its functionality, the RB machine has to interact with all the grid sites present in the setup.

C.

Information Services

The Information Service (IS) provides information about the global grid resources and their status. This information is essential for the whole grid operation because the RB performs resource discovery through this service.

D.

User Interface (UI)

The initial point of access to the grid is the User Interface (UI). This is a machine where grid users have a personal account and where the user's certificate is installed. It is the gateway to grid services. From the UI, a user can be authenticated and authorized to use the grid resources. It provides a command line interface or a graphical interface to perform basic grid operations such as job submission, job status checking and job output retrieval.

E.

Compute Element (CE)

A Computing Element is responsible for the management of a set of computing resources at a grid site. While Resource Brokers are responsible for grid wide workload management, CEs are responsible for site wide workload management. The CE works in a push model, where a job is pushed to the CE by the grid wide workload management system.

F.

Worker Nodes (WN)

The collection of computing resources at a grid site is termed the Worker Nodes, the nodes where jobs are run. The CE and WNs are controlled by the Local Resource Management System (LRMS). The job scheduling software supported as LRMS are OpenPBS/PBSPro, LSF, Maui/Torque, BQS and Condor.

G.

Storage Element (SE)

A Storage Element provides uniform access to data storage resources and hence gives a storage backend for the grid environment. The Storage Element may control simple disk servers, large disk arrays or tape-based Mass Storage Systems (MSS). Most grid sites provide at least one SE. Most storage resources are managed by a Storage Resource Manager (SRM), a gLite service that provides capabilities such as transparent file migration from disk to tape and space reservation.

IV.

YAIM

YAIM is a well-known and widely used grid services configuration tool for gLite-based middleware. The aim of the tool is to make the installation and configuration of grid services as easy as possible. The WLCG project uses this tool to install and configure the middleware at grid sites. It is developed as part of the EGEE (Enabling Grids for E-sciencE) project and can be used for the installation and configuration of gLite middleware services. In our discussion we consider the latest version of YAIM, called YAIM 4.

V.

YAIM FEATURES

A.

Modularized Structure

YAIM follows a modular package structure to support a component based release model, in which all the modules are distributed as separate RPMs. When a new module is developed, it is treated as a project with a standard directory structure, and finally a separate RPM is created and released for that project. Later the user can selectively install only the YAIM modules that are required.

B.

Bash Script Syntax

To ensure that local administrators can adapt YAIM, it has been implemented as a set of bash scripts. YAIM therefore follows bash script syntax: to write functions in YAIM, the user has to follow the bash script syntax.

C.

Node Type Configuration Facility

YAIM allows the developer to define and configure a new node type. A node type here means a grid middleware service that can be configured using a YAIM module to accomplish a particular task. For example, to develop a YAIM module that configures the Nagios monitoring tool on a grid User Interface (UI) machine, a new node type can be developed using YAIM. This particular example is discussed later as a typical example of configuring a grid service using YAIM.

D.

RPM Distribution

YAIM modules are distributed as RPMs, which makes the whole installation easier and user configurable. The default installation directory of YAIM is /opt/glite/yaim, but at the time of RPM installation the user can override it to specify the desired directory.

E.

Directory structure

The essential base module of YAIM is called glite-yaim-core, which contains common functions and definitions. The other modules follow a common naming convention, glite-yaim-node_type, where node_type is the name of the module that implements the functionality to configure a specific node type. When the core YAIM module is installed, a directory structure is created by default under /opt/glite/yaim. The important sub directories are functions, defaults, bin and log.
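The naming convention and directory layout described above can be checked with a few lines of bash. The following is only a minimal sketch: the helper names yaim_module_name and missing_yaim_dirs and the YAIM_ROOT override are assumptions introduced here for illustration and are not part of YAIM itself.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and YAIM_ROOT are hypothetical, not YAIM features.

# Default location from the text; may be overridden for a relocated install.
YAIM_ROOT="${YAIM_ROOT:-/opt/glite/yaim}"

# Build the package name for a node type: glite-yaim-<node_type>.
yaim_module_name() {
    echo "glite-yaim-$1"
}

# Print each of the expected core sub directories that is missing.
missing_yaim_dirs() {
    local d
    for d in functions defaults bin log; do
        [ -d "$YAIM_ROOT/$d" ] || echo "$d"
    done
}
```

Calling yaim_module_name nagios prints glite-yaim-nagios, and missing_yaim_dirs prints nothing when all four sub directories are present under YAIM_ROOT.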

F.

Basic YAIM configuration file

When the basic YAIM package is installed, a sample configuration file named site-info.def is created by default in the /opt/glite/yaim/example/siteinfo directory. The file should be modified to hold all grid site-specific variables and can be moved to a directory that is not world readable.

VI.

DEVELOPING YAIM MODULE FOR NAGIOS BASED MONITORING - INTRODUCTION

Monitoring is the act of collecting information concerning the characteristics and status of resources of interest. Grid monitoring is the activity of measuring significant grid resource-related parameters to analyze the usage, behavior and performance of a grid system, and to detect and notify fault situations, contract violations and user-defined events. One of the major challenges in monitoring is that the grid concept allows resources to join or leave the system dynamically.

Nagios is an established open source monitoring tool in the high performance computing arena, which provides monitoring of and notification for failures of network, hardware and services. Nagios is one of the major monitoring tools used in the WLCG setup. As a resource monitoring tool, Nagios has many features that make it suitable for a grid setup. It has a web interface, which can be configured to monitor the grid hosts and services. It has a wide set of basic sensors and options for developing custom sensors for the requirements of a particular grid setup. Ease of use and a high level of configurability are the major factors that make Nagios a preferred monitoring tool in a typical grid setup. In the following YAIM module development example we consider how to build a configuration module that makes Nagios configuration an easy task for grid site administrators.

VII.

MODULE DEVELOPMENT PROCEDURE

Consider the scenario where we want to install and configure the Nagios monitoring tool on a large number of grid User Interface (UI) machines. A typical grid setup will have a number of UI machines through which grid users submit their jobs to the grid. A monitoring tool like Nagios is very useful on a UI machine to get the current status of grid hosts and services. Installing and configuring Nagios by hand on each of the UI machines is possible but involves a huge administrative workload. Here we analyze how the YAIM tool can help. Even though the installation of Nagios can also be integrated with YAIM, we consider only the configuration part, because configuration involves the major share of the tedious work. Hence we assume that in our example the UI machine already has a proper Nagios version installed. To make the configuration easier, we consider how to develop a YAIM module. Once the module is developed and available to the user, the only required actions are to install the new YAIM module and run the YAIM configuration command. The module will perform all the necessary configuration actions for Nagios.

The typical grid setup that we consider for developing a Nagios configuration module using YAIM has a project development strategy that is controlled by the Concurrent Versions System (CVS). CVS is a well-known version control system for software projects. Note also that this example module is developed on a UI machine where all the basic YAIM packages are already installed and the directory structure described in Section V.E is present. Let us assume that the basic site-specific YAIM configuration file, with all required configuration variables set, is present as /opt/site-info.def on the UI machine. The steps involved in developing the YAIM module for this example are described in the following sections.

A.

Request for new YAIM module

First the developer has to send a request to the CVS administrator to register the new module with CVS. After an initial validity check, the administrator will create a new project, with basic templates for development, under a name that indicates the new module’s functionality. In this example of Nagios module development, the new project is registered as org.glite.yaim.nagios.

B.

Obtain the Project (Check out the Project)

After the CVS administrator notifies the developer of the registration, the developer is allowed to check out the newly registered project, that is, download it to the development platform (here the UI machine). In our example, the project org.glite.yaim.nagios can be obtained by first setting the CVSROOT environment variable to the project repository path on the remote CVS server and then executing the CVS checkout command as follows:

#cvs co org.glite.yaim.nagios

The “co” option specifies the check out action. This command places the project in the org.glite.yaim.nagios directory on the local development machine (we will call this directory the project directory in the following discussion), with the required directory structure and basic template files for development.

C.

Function development

Developing the functions is the most important job of the developer, because the functions carry out the actual configuration of the required software (here, configuring the Nagios monitoring tool on a grid UI machine). Function development for the Nagios module has two main goals. First, on the UI machine where the Nagios monitoring tool is installed, our YAIM function should create a proper configuration file in the Nagios default configuration directory (/etc/nagios). This configuration file should contain all the entries that the Nagios monitoring tool needs, properly set. This relieves the system administrator of the tedious configuration activity, which is the main attraction of YAIM.

In our module development example, this is achieved by the function named config_nagios. The second goal is that, once the Nagios monitoring tool has been properly configured and tested, there should be an authentication mechanism by which all the authorised users of the grid site (not only the user accounts present on that particular UI machine) are allowed to access the Nagios web interface. This is achieved by the function named config_voms2htpasswd. A more detailed description of the two functions is given in the following sections. The basic steps involved in function development for a YAIM module, with the Nagios configuration module as the example, are described here.

a.

Define the Functions to YAIM.

The list of functions that we want to develop has to be defined in the node-info.d subdirectory of the project directory. A template file is created in the node-info.d subdirectory during the project checkout. The file created is glite-nagios. The developer has to define the required function names in this file. It is advised to begin all function names with the prefix “config” to distinguish the YAIM functions globally. During configuration, the functions are executed in the order in which they are defined. Here we require two functions to be defined, named config_nagios and config_voms2htpasswd.

b.

Develop the Functions.

When the new project is installed, a sample template for function development is created by default in the functions subdirectory of the project directory. The template is named config_modulename, where modulename is the registered YAIM module name. In our example, the function template is named config_nagios. Developers are requested to use this template to add the basic functionality, to develop any additional functions that are required, and to follow the same structure as the basic template. The code written in functions follows bash script format, and many additional constructs are provided by YAIM. As described above, for Nagios module development we use the default function named config_nagios and create a new function named config_voms2htpasswd.

Every YAIM function should contain three sub functions: the check sub function, the set environment sub function and the action sub function. The check sub function is needed when the developer wants to verify that the variables required to configure the module have been defined. The set environment sub function is used by the developer to set the required environment variables. All the required configuration actions are performed by the action sub function. The action sub function for config_nagios is named config_nagios() and the one for config_voms2htpasswd is named config_voms2htpasswd().
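The three-part structure of a function, together with execution in the defined order, can be sketched in plain bash as follows. This is a simplified stand-in written for illustration, not the template shipped with YAIM and not YAIM's real driver. The _check and _setenv suffixes, the require_vars helper, the run_functions loop, the NAGIOS_FUNCTIONS list variable, the NAGIOS_DIR override and the site variables SITE_NAME, NAGIOS_ADMIN_EMAIL and VOMS_SERVERS are all assumptions. The file contents written by the action sub functions are placeholders, and the one-server-per-line format of voms2htpasswd.conf is assumed as well.

```shell
#!/usr/bin/env bash
# Sketch only: a simplified stand-in for the YAIM function layout.

# Hypothetical override so the sketch can run without touching /etc/nagios.
NAGIOS_DIR="${NAGIOS_DIR:-/etc/nagios}"

# Function names in execution order (variable name is an assumption).
NAGIOS_FUNCTIONS="config_nagios config_voms2htpasswd"

# Hypothetical helper: fail if any named variable is unset or empty.
require_vars() {
    local name
    for name in "$@"; do
        if [ -z "${!name:-}" ]; then
            echo "ERROR: variable $name is not set" >&2
            return 1
        fi
    done
}

# --- config_nagios: check, set environment, action ---
config_nagios_check() {
    require_vars SITE_NAME NAGIOS_ADMIN_EMAIL   # assumed variable names
}
config_nagios_setenv() {
    export NAGIOS_CFG="$NAGIOS_DIR/nagios.cfg"
}
config_nagios() {
    # Keep a copy of any existing configuration before writing a new one.
    if [ -f "$NAGIOS_CFG" ]; then
        mv "$NAGIOS_CFG" "$NAGIOS_CFG.old"
    fi
    # Placeholder content; a real module would set many more variables.
    cat > "$NAGIOS_CFG" <<EOF
# generated for site $SITE_NAME
admin_email=$NAGIOS_ADMIN_EMAIL
EOF
}

# --- config_voms2htpasswd: check, set environment, action ---
config_voms2htpasswd_check() {
    require_vars VOMS_SERVERS                   # assumed variable name
}
config_voms2htpasswd_setenv() {
    export VOMS_CONF="$NAGIOS_DIR/voms2htpasswd.conf"
}
config_voms2htpasswd() {
    # Word splitting is intended: one server per line (assumed format).
    printf '%s\n' $VOMS_SERVERS > "$VOMS_CONF"
}

# Run check, set environment and action for each function, in listed order.
run_functions() {
    local fn
    for fn in $NAGIOS_FUNCTIONS; do
        "${fn}_check"  || return 1
        "${fn}_setenv" || return 1
        "$fn"          || return 1
    done
}
```

With the three site variables set, for example after sourcing a site configuration file, calling run_functions writes both files under NAGIOS_DIR. If a required variable is missing, the check sub function stops the run before any file is written for that function.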

Function config_nagios

The config_nagios function generates the required configuration file for Nagios, named nagios.cfg, and puts it in the /etc/nagios directory. If a configuration file is already present in that directory, it is renamed nagios.cfg.old before the new nagios.cfg file is created. This function creates around 80 variables for Nagios configuration and sets them to proper values.

Function config_voms2htpasswd

This function’s aim is to generate another configuration file, named voms2htpasswd.conf, in the /etc/nagios directory. The content of the file is the list of all VOMS servers present in the grid setup. This file is then used as input to a command called “voms2htpasswd”, which gets all the user names from all the listed VOMS servers and puts them in a file named “htpasswd” in the /etc/nagios directory. In this way all the grid site users are allowed to enter the web interface of Nagios from the UI machine.

D.

Testing the YAIM module

Once the required functions have been developed, the developer needs to test the intended functionality of the module. YAIM provides a set of commands for this purpose. The developer can test an individual function by running the “yaim” command with the “-f” option. For example, the config_nagios function can be tested as follows:

/opt/glite/yaim/bin/yaim -r -s /opt/site-info.def -n glite-yaim-nagios -f config_nagios

Here the -r option requests that a function be run, the -s option specifies the location of the site-info.def file, which contains all grid site specific configuration variables, the -n option gives the new module or node type name and the -f option gives the name of the function.

E.

Update Documentation

As in all other software development projects, documentation is an important duty of the developer. The man pages for the developed YAIM module should be put under the /config/man sub directory of the project directory. In our example, the name of the man page file is yaim-nagios.1. As in many of the earlier steps, YAIM provides a basic template for the man documentation. The developer has to put the relevant documentation under sub titles such as name, description, configuration variables and examples.

F.

Check in the Module to the YAIM project

Once the developed YAIM module has been properly tested for all its functionality, the next step is to commit the developed project back to the corresponding project repository directory on the CVS server. This can be achieved by executing the CVS commit command as follows:

#cvs commit org.glite.yaim.nagios

This action updates the whole project directory structure on the remote CVS server.
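The test and check-in steps can be combined into a small helper script so that nothing is committed unless every function has been tested. The following is a minimal sketch under stated assumptions: the run, test_function and test_and_commit helpers, the DRY_RUN switch and the commit message are introduced here for illustration, and the node name follows the test command in the text with its hyphenation assumed. In the default dry-run mode the script only prints the commands it would execute.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the DRY_RUN switch are hypothetical.

# DRY_RUN=1 (the default here) prints commands instead of executing them.
DRY_RUN="${DRY_RUN:-1}"

YAIM_BIN="/opt/glite/yaim/bin/yaim"
SITE_INFO="/opt/site-info.def"
NODE_TYPE="glite-yaim-nagios"        # node name as in the text's test command
PROJECT="org.glite.yaim.nagios"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Test a single function in isolation.
test_function() {
    run "$YAIM_BIN" -r -s "$SITE_INFO" -n "$NODE_TYPE" -f "$1"
}

# Test every function, then commit only if all tests succeeded.
# The commit is meant to be run from the parent of the project directory.
test_and_commit() {
    local fn
    for fn in config_nagios config_voms2htpasswd; do
        test_function "$fn" || return 1
    done
    run cvs commit -m "Tested Nagios configuration functions" "$PROJECT"
}
```

Running test_and_commit with DRY_RUN=1 prints the two yaim commands followed by the cvs commit command. Setting DRY_RUN=0 on a machine with YAIM and CVS installed would execute them and stop at the first failing test.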

VIII.

CONCLUSION

gLite is one of the most widely accepted open source middleware tools in the computational grid arena. The gLite grid middleware includes the YAIM configuration tool to configure the middleware services in an easier and more efficient way. In a typical grid setup, there will be many machines with a uniform setup. For example, all the User Interface (UI) machines will have a uniform setup and a uniform requirement to configure a monitoring tool like Nagios. We have seen how to develop a configuration module for Nagios using the YAIM tool. The Nagios configuration module developed in this way automates the configuration activity so that it needs no intervention from the administrator. Similarly, the powerful YAIM tool can be used to develop many other grid service configuration modules.
