EGEE Site Deployment & Management Using the Rocks toolkit

António Pina, Bruno Oliveira, Albano Serrano, Vitor Oliveira

Departamento de Informática da Universidade do Minho
{pina,boliveira,albano,vspo}@di.uminho.pt

Abstract. Several cluster management toolkits have been successfully applied to high performance clusters. However, Grid service deployment, in particular for EGEE sites, still lacks complete solutions for the distribution, installation and configuration of middleware to build managed Grid infrastructures. This paper focuses on an easy-to-use framework that manages the installation of all EGEE components with minimal resources, which resulted from the gLite middleware installation and configuration at Universidade do Minho. The objective is to automate the deployment of the EGEE Grid middleware to offer a common operating environment to the geographically dispersed European Civil Protection (CP) community. We developed the EGEE Roll, based on the Rocks toolkit, to provide a standardized software stack across all the sites of a CP Virtual Organization, tackling two of the most time-consuming aspects of a site installation - the architecture and the mechanisms. The framework enables site administrators to incrementally and programmatically create and modify the graph description of all the appliance (node) types required by the EGEE model, such as CE/SE/MON/UI/WNs. Extending the Rocks approach to configure EGEE proved to be pertinent in the context of the development of infrastructures and services for virtual organizations, including software update and user management. It guarantees interoperability across sites and full customization within each CP site's administrative domain, to explore the capabilities of the Grid in CP applications and to reduce the overall time required for the deployment of an EGEE site.

Keywords: Grid Middleware, Cluster and Grid Integration, Civil Protection

Updated version: September 2008

1 Introduction

In the last few years we have witnessed the emergence of large-scale production Grid infrastructures offering computing services to many demanding scientific and industrial applications. The Enabling Grids for E-sciencE (EGEE) [5] project is a European Community initiative for the establishment of a Europe-wide grid infrastructure providing a 24/7 production-level Grid for highly demanding scientific research. Included in EGEE's objectives are the continuous improvement of the middleware software, the attraction of new users and industry into the project, and the combination of regional, national and thematic grid efforts into a seamless grid infrastructure.

1.1 The EGEE Model

In the EGEE model (as described in [1]), a Grid site is composed of instances of basic node types: the Computing Element (CE), which provides an interface to compute resources via Local Resource Management Systems (LRMS); the Storage Element (SE), which provides an interface to local storage resources; the User Interface (UI), which hosts client software for accessing the Grid; the Worker Nodes (WNs), where the users' jobs run; and the Monitoring Box (MoB), which receives information about grid jobs from the CEs and WNs. The minimum resources that sites must offer are described in the document "Service Level Description between ROCs and sites" [3], which formalizes the services a site provides to its Regional Operations Centre, and vice-versa. The functionality of Grid sites is tested by the regular execution of test jobs (Site Functional Tests) that verify a wide range of Grid operations.

1.2 gLite middleware

The EGEE middleware is split into several elements, each one playing a specific role. gLite [2] is a key middleware component that assures seamless interoperation among sites and provides users with high-level services for scheduling and running computational jobs, accessing and moving data, and obtaining information on the Grid infrastructure as well as Grid applications, all embedded into a consistent security framework. The middleware on a node is usually configured using YAIM (Yet Another Installation Manager) [4], a software tool used by site administrators to implement a configuration method for the gLite software, composed of a set of bash scripts and functions. In order to configure a site, one or more configuration files must be edited and then the YAIM script is manually executed. YAIM separates installation from configuration, enabling site administrators to choose how they want to install and configure grid systems. To support the component-based release model within the EGEE project, YAIM 4 has been modularized: a YAIM core is supplemented by component-specific scripts distributed as RPMs.
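
As a minimal sketch of this workflow (the paths and node type below are illustrative and depend on the gLite release installed), a site administrator edits the configuration file and then invokes YAIM for the node type being configured:

  # edit the global site configuration file (location chosen by the administrator)
  vi /root/siteinfo/site-info.def

  # run the YAIM configuration for a given node type, e.g. a Worker Node
  /opt/glite/yaim/bin/yaim -c -s /root/siteinfo/site-info.def -n glite-WN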

2 Rocks Cluster Management

High-performance clusters have become the computing tool of choice for a wide range of scientific disciplines, but straightforward software installation, management and monitoring of large-scale clusters have been consistently painful for non-cluster experts. The Rocks group has been addressing the difficulties of deploying manageable clusters, driven by the goal of making clusters easier to deploy, manage, upgrade and scale up. With hundreds of deployed clusters, the approach has shown to be easily adapted to different hardware and logical node configurations.

The Rocks toolkit [10, 11] is based on a Red Hat Enterprise Linux based distribution, expanded with packages from many popular cluster and grid specific projects. It provides a singular perspective on cluster installation and management that dramatically simplifies cluster integration: a fully functional cluster can be deployed in a matter of hours, with little work required from the system administrator.

To set up a Rocks cluster, a frontend node (FE) must be installed and configured, following a small set of input screens to supply the network information, the cluster's identification and the optional components to be installed. Once the frontend node is installed, all the remaining cluster nodes are added through a FE console application that only requires the system administrator to specify the node role to start the installation. The frontend uses PXE or DHCP requests, received over the LAN from the nodes to be installed, as a first registration step to produce the information needed to generate a custom installation configuration file for each new node in the cluster.

2.1 Appliances and Rolls

The role of each node in a Rocks cluster is determined by its type, which defines the software to be installed and the node configuration. Based on the type of appliance, the frontend generates a monolithic kickstart file - a customizable, script-based mechanism that automates package installation in Red Hat distributions - that is used to fully install and configure each node of the cluster. The software components are structured by means of a packaging system called roll, which is a self-contained ISO image that holds packages and their configuration scripts. Each roll installs and configures a specific software package, such as PVFS or Condor, with little involvement from the system administrator.

Fig. 1. EGEE Roll: l) Contents r) Directory structure

Figure 1 l) displays the structure of the EGEE roll. This layout is similar to the layout of a Red Hat produced ISO image, with the binary packages that comprise the roll. In this roll there is one package in particular - roll-egee-kickstart-4.2.1-0.noarch.rpm - that holds all the information needed for all the packages included in the roll. Figure 1 r) displays the contents of this package as a directory structure containing XML files in the sub-directories nodes and graphs/default.

2.2 Graph-based configuration

Rocks offers system administrators the ability to customize the packages to be installed by creating rolls, and to create dependencies between them using a graph-based framework. Some graph nodes are used to specify an appliance, which represents a particular computer node role or type. In order to be as flexible as possible, Rocks creates the kickstart file on demand, based upon a graph representation of the dependencies between packages or services [11]. The graph representations, such as those presented in Figure 2, express both the set of packages to be installed and the individual package configuration. Each vertex of the graph represents a package or a service and its configuration.

Fig. 2. EGEE Roll - Configuration Graph

Graphs and nodes are described by XML files, as seen in Figure 3. The graph file (left) specifies the framework hierarchy, where edges connect nodes to each other, and each node file (right) contains a list of Red Hat packages and optional configuration scripts that turn a meta-package into a final software deployment. Along with the package configuration, the install process can run pre- and/or post-install scripts that have access to a global configuration MySQL database, managed by the frontend, that supports complex queries. The configuration of a cluster can thus be thought of as a program used to configure a set of software, whose state, representing a single instantiation of a cluster appliance, is referenced by XML configuration code.

Fig. 3. EGEE Roll - XML Configuration files: l) Graph r) Nodes
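
To make the structure of these files concrete, the following is a minimal sketch of a graph file and a node file in the Rocks XML format; the file names, edges and packages are illustrative and do not reproduce the actual contents of the EGEE roll.

  <?xml version="1.0" standalone="no"?>
  <!-- graphs/default/egee.xml (illustrative): attach new appliances to existing nodes -->
  <graph>
    <edge from="egee"    to="server"/>
    <edge from="egee-wn" to="compute"/>
    <edge from="egee-ce" to="compute"/>
  </graph>

  <?xml version="1.0" standalone="no"?>
  <!-- nodes/egee-wn.xml (illustrative): packages and post-install script for a WN -->
  <kickstart>
    <description>gLite Worker Node appliance</description>
    <package>glite-WN</package>
    <package>lcg-CA</package>
    <post>
    <!-- post-install step: run the YAIM configuration for this node type -->
    /opt/glite/yaim/bin/yaim -c -s /root/siteinfo/site-info.def -n glite-WN
    </post>
  </kickstart>

It is this kind of post-install step that, in the EGEE roll, invokes YAIM for the gLite element corresponding to the appliance (see Section 3.2).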

With the graph approach, node installation can easily be extended by creating a new graph file that adds an edge linking to the node being extended. Once the node file is created and the packages are placed in the installation file hierarchy, other appliances derived from the newly created node can be added to the site.

2.3 Wide Area Kickstart

Traditionally, Rocks clusters used the LAN to perform the full installation of software components from a frontend machine, allowing controlled site customization and incorporation of components into the base distribution. More recently, the Rocks distribution has been enhanced to allow full cluster installations over wide area networks (WAN), with the objective of adapting the installation method to the dynamic needs of systems such as GEON [12]. Any Rocks frontend node can be used as a central server over a wide area network, in a process called WAN kickstart. It allows a client frontend at a remote site to retrieve all the software, rolls and configuration from the central site over the Internet. This includes the environment the client frontend needs to generate its own fully functional kickstart file, which is then used to install itself efficiently and with low administration overhead.

3 Building an EGEE Roll

In the work presented, the Rocks toolkit is used as the foundation cluster distribution for addressing the installation and management of EGEE site components. Taking advantage of the toolkit, an EGEE Roll was developed to automate the deployment of the Grid middleware at sites where a common Grid operating environment must be rapidly set up without requiring significant work at each site. EGEE imposes a regular structure on its middleware constituents to ensure interoperability between services at different levels. It also provides a comprehensive standardized software stack that includes the core operating system, currently Linux.

3.1 gLite configuration requirements

The gLite installation is strongly dependent on the information contained in global site configuration files and on the existence of an X.509 certificate for each of the machines, although certificates are not mandatory for the WN and UI node types. The configuration includes a mandatory file, site-info.def, that contains either a complete list of all the variables needed to configure a site or a minimum set of variables common to all node types. In addition, it contains the list of all VOs supported at the site and their specifications. Virtual Organizations can also be specified in one file per VO, inside the vo.d folder, named after the VO. The personal account of each grid user is mapped to a generic account on the local grid site, as specified in the file users.conf. The file groups.conf contains information about the mapping of the different user groups.
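
As a hedged illustration, the excerpt below sketches the kind of variables found in site-info.def and the format of users.conf; the host names, VO names and numeric values are placeholders, and the full set of mandatory variables depends on the gLite release.

  # site-info.def (excerpt, placeholder values)
  CE_HOST=ce.mysite.example.org
  SE_HOST=se.mysite.example.org
  MON_HOST=mon.mysite.example.org
  BDII_HOST=ce.mysite.example.org          # site BDII installed along with the CE
  USERS_CONF=/root/siteinfo/users.conf
  GROUPS_CONF=/root/siteinfo/groups.conf
  VOS="myvo dteam ops"                     # VOs supported at the site
  QUEUES="myvo"

  # users.conf: one generic pool account per line
  # format: UID:LOGIN:GID:GROUP:VO:FLAG:
  40001:myvo001:4000:myvo:myvo::
  40002:myvo002:4000:myvo:myvo::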

3.2 EGEE Roll appliances

Presently, the EGEE roll includes the following appliances: computing element, worker node, storage element, monitoring box and user interface. These appliances are specific nodes in the Rocks installation graph, each one representing a specific gLite element. For each appliance one or more RPM repositories are defined, but there are also packages that are common to all elements, such as Java and the LCG certificates; see Figure 1 l). In Figure 2 the coloured nodes represent the new appliances added to the standard Rocks configuration during the installation of the EGEE roll. The edges connecting the nodes show that the egee node depends on the server node and that the other coloured nodes depend on the compute node. Although EGEE states that a BDII must be installed at every site, there is no specific appliance defined for it, because the BDII is installed and configured along with the CE node. The design of the appliances, with minor adaptations, closely mimics the installation process used by the gLite software: first the software packages are installed, and then each node type is configured by a script that invokes the YAIM tool.

3.3 EGEE Roll installation

The EGEE roll can be installed during the initial setup, following the standard Rocks procedure described in the Rocks users' guide. To bring up a FE we need to select the following rolls: Kernel/Boot Roll CD, Base Roll CD, HPC Roll CD, Web Server Roll CD and the OS Roll distributions based on Scientific Linux CERN (SLC) 4 update 5, followed by the EGEE roll. When all the rolls have been selected, a sequence of configuration screens is presented in order to collect the information needed to generate the local configuration files required to install an EGEE site; Figure 4 displays the first screen. The information fields in these screens include: fully qualified host names, the identification of the supported VOs, the number of pool account users and the preferred LRMS system.

The EGEE roll may also be added onto an existing FE. There are a number of extra procedures that must be carried out in this situation, but it is the best option when it is not possible, or practical, to create a FE from scratch. A configuration file is provided with the installation of the roll, in which the system administrator must enter the information that would otherwise have been given at the installation screens. The information must then be registered into the FE's database with a script provided for that purpose, and the roll must then be re-installed.

When all the information needed to build the FE has been gathered, the installation of all the configured packages starts and, later, the post-configuration scripts execute in the background until completion. After the frontend reboots, the administrator may log in and proceed to install a new site by running the program insert-ethers, a tool used to identify the new nodes to install (see Fig. 5) that captures computer requests and manages information in a central MySQL database.
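
A minimal sketch of adding the roll to an existing frontend is shown below, assuming the generic Rocks roll-management commands; the ISO name is illustrative, the exact command syntax and paths vary with the Rocks release, and the registration step uses the configuration file and script shipped with the EGEE roll, as described above.

  # add and enable the EGEE roll on an already-installed frontend (illustrative)
  rocks add roll egee-4.2.1-0.i386.disk1.iso
  rocks enable roll egee

  # rebuild the local distribution so the roll's packages become available
  cd /export/rocks/install && rocks create distro

  # after filling in the roll's configuration file and registering it in the FE
  # database with the script provided by the roll, re-run the roll's configuration
  rocks run roll egee | sh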

Fig. 4. EGEE Roll - Configuration Screen 1

The installation of the gLite elements is straightforward and does not require much configuration. The program menu presents five new appliance entries, namely the Computing Element, the Storage Element, the MON Box, the User Interface and the Worker Node, which have been configured during the installation of the EGEE roll. Any number of elements of any type may be added to a site by selecting the appropriate entry in the menu and powering up a computer. For nodes that require an X.509 certificate, the system administrator must copy it to a specific location on the FE, so that the installation process can supply the certificate to the target machine. As soon as the frontend machine receives a DHCP request from the computer, it is inserted into the database and all the configuration files are updated according to the type of the selected appliance, meaning that the computer has successfully requested a kickstart file from the frontend and is going to start the installation automatically. When the node boots for the first time it automatically installs the CA certificates and executes the YAIM configuration according to its element type. The EGEE roll provides all the packages required for each type of gLite element, and no software download is required during the installation of any type of EGEE node, except for the key files of the supported VOs, as they change over time.
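
A minimal sketch of this step follows; the destination directory is hypothetical (the roll defines the actual location expected by the installation process), and insert-ethers is driven interactively from its appliance menu.

  # copy the host certificate and key for the new node to the frontend
  # (the destination directory below is hypothetical)
  scp hostcert.pem hostkey.pem frontend:/export/site/certs/ce.mysite.example.org/

  # on the frontend, capture the installation request for the new appliance:
  # run insert-ethers, select the appliance type (e.g. Computing Element) in the
  # menu, and power on the target machine to trigger the PXE installation
  insert-ethers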

Fig. 5. Insert-ethers: a tool to select the elements to install

3.4 Local repository

One aspect that cannot be overlooked in the maintenance of a cluster is software updates. At a site with a significant number of nodes, this task can rapidly drain the available bandwidth and can represent a bottleneck in the system. The EGEE roll gives system administrators the possibility of creating a local repository of the software, so that nodes can update themselves locally. This repository is created using the mrepo [16] package. The local repository holds a copy of the three major software components of an EGEE site: Scientific Linux CERN, DAG and the gLite middleware.

When installing the roll, the system administrator can choose between the local repository and the normal repositories for the software. The creation of the local repository consumes time and hard-disk space, but in the long run this solution pays off, since only one machine, the frontend, has to fetch the updated packages from the Internet, in a completely transparent and automatic process handled by mrepo. This package also allows total control over which packages can or cannot be upgraded, giving system administrators the ability to restrict upgrades to a single repository or even a single package.
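
As a hedged sketch, an mrepo configuration mirroring these three components could look roughly like the excerpt below; the section names and repository URLs are placeholders that must be replaced by the mirrors actually used by the site.

  ### /etc/mrepo.conf (excerpt, placeholder URLs)
  [main]
  srcdir = /var/mrepo
  wwwdir = /var/www/mrepo
  arch = i386

  [slc4]
  name = Scientific Linux CERN 4 ($arch)
  release = 4
  metadata = yum
  os = http://mirror.example.org/slc4/os/$arch/
  updates = http://mirror.example.org/slc4/updates/$arch/
  dag = http://mirror.example.org/dag/el4/$arch/
  glite = http://mirror.example.org/glite/sl4/$arch/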

4 Civil Protection applications

A goal of the EGEE roll presented previously is to allow Civil Protection (CP) sites to build a Grid infrastructure for testing and evaluating the EGEE software without requiring profound knowledge of the Grid middleware at each of the sites. The authors are involved in several projects - CYCLOPS, EELA-2 and CROSS-FIRE - in which the EGEE roll is being explored.

CYCLOPS [13] is an EU project in the area of Grid computing research that outlines the importance of developing enabling e-infrastructures and virtual organization services to fully exploit the GRID capabilities for Civil Protection (CP) applications. This initiative requires partners to perform an in-depth analysis of the current release of the EGEE software and to identify the enhancements required for the platform to fully support CP requirements.

EELA-2 [15] aims at building a high-capacity, production-quality, scalable Grid facility, providing round-the-clock, worldwide access to the distributed computing, storage and network resources needed by the wide spectrum of applications arising from European - Latin American scientific collaborations.

CROSS-FIRE [14] is a project financed by the Portuguese government aiming to exploit the GRID infrastructure and demonstrate its potential through a CP application deployed among several independent CP-related organizations.

4.1 CP-Virtual Organization

Figure 6 l) shows a Civil Protection Virtual Organization consisting of a set of physically distributed EGEE sites, each located within the administrative domain of a participating European CP organization, belonging to the same VO, sharing resources and interacting over the Internet. The architecture model of each site, shown in Figure 6 r), comprises a set of Worker Nodes (left) in a tightly coupled LAN connected by a private Ethernet switch and several different server components (right) connected both to the internal network and to the Internet. Each site has a variable number of WNs and the minimum set of servers required by EGEE, including: 1) one site BDII; 2) at least one CE with a number of Worker Nodes attached to it, totaling at least eight CPUs/cores; 3) at least one SE with a capacity of 1 TByte or more.

Fig. 6. CP EGEE Sites: l) CP-VO r) CP-Architecture

The FE is a point of presence (PoP) of a site that serves as a software repository for all the different types of nodes. It may also be used as a login and compile host to allow local submission of jobs. The wide-area kickstart functionality of the Rocks toolkit, together with the EGEE roll, provides CP organizations with a distribution platform and a standardized base software stack across all the sites of the Civil Protection VO. The common software base allows enhancements developed at any site to be easily deployed at other partners' sites, and the custom distribution of one site to be transmitted to other sites' frontends over the Internet. The approach guarantees interoperability across sites and full customization in each site's administrative domain.

5 Conclusion

Using standard tools such as YAIM to install EGEE sites still involves a significant technical effort. In order to configure a site using the YAIM toolkit, an administrator must choose how to install and configure the grid by manually editing several configuration files and executing the YAIM script. In the development of the EGEE roll we confirmed that the effort of installing and administering an EGEE site can be drastically reduced by applying the techniques developed by the Rocks group. Extending the Rocks approach to configure EGEE replaces the manual software installation and configuration with a fully automated node installation process, including software update and user management, reducing the administrator's effort and the overall time required to deploy an EGEE site.

References

1. Burke, S. et al.: gLite 3.0 User Guide. http://edms.cern.ch/file/722398//gLite-3UserGuide.pdf
2. gLite public website. http://www.glite.org
3. Service Level Description between ROCs and Sites, EGEE-II SA1. https://edms.cern.ch/document/860386/0.5
4. YAIM 3.1.1 Guide. https://twiki.cern.ch/twiki/bin/view/LCG/YaimGuide311
5. EGEE public website. http://www.eu-egee.org
6. INGRID'06. http://www.lip.pt/ingrid06
7. SeARCH - Serviços e Investigação em Computação Avançada com Clusters HTC/HPC, Ref. CONC-REEQ/443/2001. http://www.di.uminho.pt/search/
8. Field, L., Poncet, L.: LCG Generic Installation and Configuration. http://grid-deployment.web.cern.ch/grid-deployment/documentation/LCG2-Manual-Install.pdf (2005)
9. Foster, I., Kesselman, C., Tuecke, S.: The Anatomy of the Grid - Enabling Scalable Virtual Organizations (2001)
10. Papadopoulos, P. M., Katz, M. J., Bruno, G.: NPACI Rocks: Tools and Techniques for Easily Deploying Manageable Linux Clusters. In: Proceedings of the 2001 IEEE International Conference on Cluster Computing (2001)
11. Katz, M. J., Papadopoulos, P. M., Bruno, G.: Leveraging Standard Core Technologies to Programmatically Build Linux Cluster Appliances. In: Proceedings of the 2002 IEEE International Conference on Cluster Computing (2002)
12. Sacerdoti, F. D., Chandra, S., Bhatia, K.: Grid Systems Deployment & Management Using Rocks. In: IEEE International Conference on Cluster Computing (2004)
13. CYCLOPS Technical Annex: Cyber-Infrastructure for Civil Protection Operative Procedures (CYCLOPS), SSA project N. 031874, March 2006.
14. CROSS-Fire: Collaborative Resources Online to Support Simulations on Forest Fires - a Grid Platform to Integrate Geo-referenced Web Services for Real-Time Management, GRID/GRI/81795/2006.
15. EELA-2: E-science Grid Facility for Europe and Latin America.
16. mrepo: Yum/Apt repository mirroring. http://dag.wieers.com/home-made/mrepo/
