Dynamic clusters available under Clusterix Grid

J. Kwiatkowski (1), M. Pawlik (1), G. Frankowski (2), K. Balos (3), R. Wyrzykowski (4), K. Karczewski (4)

(1) Institute of Applied Informatics, Wroclaw University of Technology, Wybrzeze Wyspianskiego 27, 50-370 Wroclaw, Poland
(2) Poznan Supercomputing and Networking Center, ul. Noskowskiego 10, 61-704 Poznan, Poland
(3) Institute of Computer Science, AGH, ul. Mickiewicza 30, 30-059 Krakow, Poland
(4) Institute of Computer and Information Science, Czestochowa University of Technology, Dabrowskiego 73, 42-200 Czestochowa, Poland

Abstract. The increase in computer network speeds, paired with the ubiquity of inexpensive yet fast and well-equipped hardware, offers many organizations an affordable way to increase their available processing power. Clusters, hyperclusters and even grids, not so long ago seen only in large datacenters, can now be found helping many small organizations meet their computational needs. CLUSTERIX is a truly distributed national computing infrastructure with 12 sites (local Linux clusters) located across Poland. The computing power of CLUSTERIX can be increased dramatically by connecting additional clusters. These clusters are called dynamic because it is assumed that they will be connected to the core infrastructure in a dynamic manner, using an automated procedure. In this paper we present the design foundations of the Cumulus hypercluster deployed at Wroclaw University of Technology, together with the method for its integration as a dynamic cluster into the CLUSTERIX grid.

1 Introduction

In many fields of science, growing computational requirements have made access to powerful computational installations a necessity. Building a dedicated Beowulf-type cluster composed of cheap, commodity hardware is currently within the reach of many organizations. Although this standard approach has many advantages, it is not always the best solution. In this paper we propose an alternative approach: the creation of a dynamic cluster ready to be utilized as a dynamic subsystem of the CLUSTERIX [4] grid. The main objective of the CLUSTERIX national grid project is to develop mechanisms and tools that allow for the deployment of a production grid whose core infrastructure consists of local clusters based on 64-bit Linux machines. The local PC clusters are placed across Poland in independent centers connected by the Polish Optical Network PIONIER. Currently the core infrastructure of CLUSTERIX comprises more than 250 Itanium2 CPUs located at 12 sites.

The computing power of the CLUSTERIX environment can be increased dramatically by connecting additional clusters. In the CLUSTERIX environment these clusters can be connected to the core infrastructure in a dynamic manner, using an automated procedure. This paper presents the design as well as the hardware and software configuration of the Cumulus dynamic hypercluster, and describes the method of its integration into the CLUSTERIX grid environment.

2 Cumulus hypercluster design overview

The Cumulus hypercluster [3] is composed of two clusters, named Calvus and Humilis, located in two separate buildings of the Faculty of Computer Science and Management, Wroclaw University of Technology. The nodes in the clusters either work full-time as computational nodes (dedicated nodes) or are connected to the cluster only during predefined hours, when they are not used for other purposes (dynamic nodes). The dedicated nodes have all of their hard disk space devoted to the scratch and swap space utilized by the cluster system. The Calvus cluster consists of 40 dynamic nodes and 2 dedicated ones; the Humilis cluster is built of 16 dynamic nodes and 2 dedicated ones.

One of the most important design requirements for the Cumulus hypercluster was that its deployment must not interfere with the typical role of the computers, which are used for university courses. During lesson hours the dynamic nodes are detached from their cluster and only the dedicated ones are operational. In this state the cluster cannot be effectively utilized for computations but, with the dedicated nodes working, it is still available for testing purposes. When the classes are finished, the cluster startup procedure is initiated and the woken-up computers begin to form the computational cluster.

In the Calvus cluster the nodes are set up to boot from the network. During lesson hours the DHCP server on the Calvus server is configured to offer boot settings only to the dedicated nodes and to ignore the other ones, so that they fall back to their ordinary hard-disk boot procedure. In the environment where the Humilis cluster is deployed, network booting is already utilized for other purposes. The nodes are therefore started with a slightly modified GRUB bootloader, configured with a default option to download its configuration file from the Humilis server. During lesson hours the configuration file served to the dynamic nodes orders them to boot from the hard disk, while after the classes it instructs them to initiate the cluster boot-up procedure. To keep the nodes operational in the case of a Humilis server failure, the GRUB bootloader installed on the nodes is also configured to fall back to the hard-disk boot after waiting for a response from the cluster server.
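
As an illustration of the Humilis boot selection mechanism, the sketch below builds the GRUB configuration that the server could serve to a requesting node, depending on the node class and the time of day. It is only a sketch assuming a fixed class-hours window, GRUB legacy syntax and example kernel/initrd paths; none of these details are taken from the actual Humilis configuration.

```java
import java.time.LocalTime;

/** Illustrative sketch: chooses the GRUB (legacy) menu served to a Humilis node.
 *  The class-hours window and all paths below are assumptions, not the real setup. */
public class GrubMenuBuilder {

    static final LocalTime CLASSES_START = LocalTime.of(8, 0);   // assumed schedule
    static final LocalTime CLASSES_END   = LocalTime.of(18, 0);

    /** Dedicated nodes always receive the cluster entry; dynamic nodes only after classes. */
    static String menuFor(boolean dedicatedNode, LocalTime now) {
        boolean lessonHours = !now.isBefore(CLASSES_START) && now.isBefore(CLASSES_END);
        if (dedicatedNode || !lessonHours) {
            return "default 0\ntimeout 5\n"
                 + "title Cluster node boot\n"
                 + "  root (nd)\n"                                 // boot from the network device
                 + "  kernel /vmlinuz-cluster root=/dev/nfs ip=dhcp\n"
                 + "  initrd /initrd-cluster.img\n";
        }
        // During lesson hours dynamic nodes are told to boot from the local hard disk.
        return "default 0\ntimeout 5\n"
             + "title Local hard disk boot\n"
             + "  rootnoverify (hd0,0)\n"
             + "  chainloader +1\n";
    }

    public static void main(String[] args) {
        System.out.println(menuFor(false, LocalTime.now()));       // a dynamic node, right now
    }
}
```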

After bootloader initialization, the nodes use the DHCP server to obtain their IP addresses and the location of the TFTP server. Every computational node has its filesystem located in a dedicated directory on the server; to speed up operations and lower the disk space requirements, most of the files are shared by all the nodes. The TFTP server running on the cluster is used by the nodes to download the bootloader configuration, the Linux kernel and the initial ramdisk image. The ramdisk is configured to start the cluster environment: consulting the station's IP address, it determines the type of the node and performs NFS mounts of the appropriate directories from both the cluster and hypercluster servers, as well as mounting the scratch space from the local disk. During the boot-up procedure the TORQUE Resource Manager is started and the Globus Toolkit environment is initialized. The Humilis cluster operates under Globus version 3.2.1, while Calvus utilizes version 4.0.1. Both clusters are monitored by the Ganglia Monitoring System daemons installed on the hypercluster server. Because they run different versions of the Globus Toolkit, the two clusters do not form a single coherent grid environment. If the need arises to increase the computational power of Calvus, a simple reconfiguration of the queuing systems makes it possible to logically move a chosen number of nodes from the Humilis cluster to the Calvus cluster.

3 Architectures of local and dynamic clusters

Several conditions should be satisfied to make the dynamic cluster concept attractive to its potential users [2]. First of all, the attachment and detachment procedures should be simple and automated. Once the conditions necessary for integrating a dynamic cluster have been fulfilled (software installation and an initial verification of the cluster, performed only once), its operator should be able to attach and detach the cluster automatically by calling a single command. This requires developing attachment and detachment procedures that are invoked in reaction to this command. The unified architecture of the local clusters in the core has been tailored to implement this functionality in an efficient and secure manner. In particular, each local cluster is provided with a dedicated firewall/router whose public network interface is the only access point to the external network (the Internet in particular). This solution allows for a balanced implementation of the attachment procedure, giving the possibility to choose the most appropriate local cluster for establishing the connection.
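
The paper does not prescribe how the most appropriate local cluster is chosen. Purely as an illustration, the sketch below picks the core site that currently serves the fewest attached dynamic clusters and still has free capacity; the CoreSite record, its fields and the example data are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical selection policy for attaching a dynamic cluster to one of the core sites. */
public class LocalClusterSelector {

    /** Minimal, assumed description of a core site and its firewall/router access point. */
    record CoreSite(String name, String accessNodeIp, int attachedDynamicClusters, int capacity) {}

    /** Returns the site with the fewest attached dynamic clusters that still has a free slot. */
    static Optional<CoreSite> choose(List<CoreSite> sites) {
        return sites.stream()
                .filter(s -> s.attachedDynamicClusters() < s.capacity())
                .min(Comparator.comparingInt(CoreSite::attachedDynamicClusters));
    }

    public static void main(String[] args) {
        List<CoreSite> sites = List.of(                 // example data only (RFC 5737 addresses)
                new CoreSite("site-a", "192.0.2.1", 3, 16),
                new CoreSite("site-b", "192.0.2.2", 1, 16));
        choose(sites).ifPresent(s ->
                System.out.println("Attach to " + s.name() + " via " + s.accessNodeIp()));
    }
}
```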

4 Concept of attachment and detachment procedures

Every dynamic cluster can be connected to one of the 12 ports corresponding to the local clusters in the CLUSTERIX core. The monitoring system, installed on a dedicated node, is responsible for deciding which port the dynamic cluster will be connected to, so at the beginning the access node of the dynamic cluster contacts this node. The monitoring system notifies the chosen local cluster about the emerging dynamic cluster, and transfers to the dynamic cluster the public IP address of the access node in the core. According to the principles of IP addressing adopted in the CLUSTERIX project, each local cluster possesses a certain subclass of private addresses from the 10.x.x.x pool. Inside this subclass, we distinguish one subclass for the computational network of the local cluster and 16 separate subclasses for dynamic clusters. This means that at most 16 dynamic clusters may be integrated with a given local cluster at the same time.
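
As a worked example of this addressing scheme, the sketch below manages the 16 per-site dynamic-cluster subclasses as a simple pool. The concrete layout (a /16 per local cluster and /24 subnets for dynamic clusters starting at a third octet of 64) is an assumption made for illustration; the paper only fixes the 10.x.x.x pool and the limit of 16 dynamic clusters per local cluster.

```java
/** Illustrative pool of the 16 dynamic-cluster subnets owned by one local cluster.
 *  Assumed layout: local cluster n owns 10.n.0.0/16; dynamic clusters get
 *  10.n.64.0/24 ... 10.n.79.0/24. */
public final class DynamicSubnetPool {

    private final int localClusterId;                 // e.g. 3 -> assumed pool 10.3.0.0/16
    private final boolean[] taken = new boolean[16];  // at most 16 dynamic clusters per site

    public DynamicSubnetPool(int localClusterId) {
        this.localClusterId = localClusterId;
    }

    /** Reserves the next free dynamic-cluster subnet, or returns null if all 16 are in use. */
    public synchronized String allocate() {
        for (int slot = 0; slot < taken.length; slot++) {
            if (!taken[slot]) {
                taken[slot] = true;
                return "10." + localClusterId + "." + (64 + slot) + ".0/24";
            }
        }
        return null;
    }

    /** Releases a previously allocated subnet when its dynamic cluster detaches. */
    public synchronized void release(String subnet) {
        int slot = Integer.parseInt(subnet.split("\\.")[2]) - 64;
        if (slot >= 0 && slot < taken.length) {
            taken[slot] = false;
        }
    }
}
```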

The procedure of detaching a dynamic cluster is much simpler: the dynamic cluster informs the monitoring system, which in turn invokes a set of configuration steps on the firewall assigned to the dynamic cluster. The description of the attachment and detachment procedures shows that they have to be integrated with the installed monitoring system. Since the CLUSTERIX information system is based on JIMS, the JMX-based Infrastructure Monitoring System [1], this goal is achieved by developing an additional module called DynamicClusterManagement. This module is installed in the JIMS agent operating on a dedicated machine. The agent is accessible from the outside world through a DNAT translation mechanism configured on the firewall/router, which redirects SOAP requests from the dynamic cluster to the Access Node. The main function of the DynamicClusterManagement module is to wait for authenticated requests from a dynamic cluster invoking one of two methods: installCluster and uninstallCluster. The first method is responsible for initializing the dynamic cluster attachment procedure; it takes the public IP and the gatekeeper IP of the cluster as parameters and returns the firewall IP together with the assigned IP address pool, its subnet and network mask. The second method is responsible for managing the dynamic cluster removal process and releasing all allocated resources.
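
A minimal sketch of how the DynamicClusterManagement module could be exposed as a JMX MBean inside the JIMS agent is given below. The operation names installCluster and uninstallCluster and their roles follow the paper; the exact signatures, the String[] return value, the ObjectName and the registration code are assumptions made for illustration.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.StandardMBean;

/** Assumed management interface; only the operation names come from the paper. */
public interface DynamicClusterManagementMBean {
    String[] installCluster(String publicIp, String gatekeeperIp);
    void uninstallCluster(String publicIp);
}

/** Skeleton implementation, registered in the agent's MBean server. */
class DynamicClusterManagement implements DynamicClusterManagementMBean {

    /** Attachment: given the dynamic cluster's public IP and gatekeeper IP, reconfigure
     *  the firewall and return { firewall IP, address pool, subnet, netmask }. */
    @Override
    public String[] installCluster(String publicIp, String gatekeeperIp) {
        // ... reserve a subnet (e.g. with a pool like the DynamicSubnetPool sketch above)
        // and set up DNAT/routing on the firewall; dummy example values are returned here:
        return new String[] { "10.3.0.1", "10.3.64.0", "10.3.64.0/24", "255.255.255.0" };
    }

    /** Detachment: tear down the firewall configuration and release all allocated resources. */
    @Override
    public void uninstallCluster(String publicIp) {
        // ... revert firewall rules and free the subnet
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        server.registerMBean(
                new StandardMBean(new DynamicClusterManagement(),
                                  DynamicClusterManagementMBean.class),
                new ObjectName("jims:type=DynamicClusterManagement"));
        // In JIMS the agent is additionally reachable over SOAP; the DNAT rule on the
        // firewall/router forwards the dynamic cluster's requests to this agent.
    }
}
```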

5 Conclusions

The possibility of connecting dynamic clusters to the CLUSTERIX backbone opens access to a shared environment with extraordinary computational power. The Cumulus hypercluster created at Wroclaw University of Technology has already proved to be a beneficial tool for research and educational purposes. Together with access to the CLUSTERIX environment, it can be utilised both as a traditional cluster and as a grid installation, combining the high and easily extendable computational power of the dynamic machines, the advantage of permanently available dedicated nodes, and the computational power delivered by the Polish national computational grid.

Acknowledgments. This research has been partially supported by the Polish Ministry of Science and Information Society Technologies under grant 6T11 2003C/06098 "ClusteriX - National Cluster of Linux Systems" and by the European Community Framework Programme 6 project DeDiSys, contract No. 004152.

References

1. Balos, K., Radziszowski, D., Rzepa, P., Zielinski, K., Zielinski, S.: Monitoring Grid Resources: JMX in Action. Task Quarterly 8, 4 (2004) 487-501.
2. Frankowski, G., Balos, K., Wyrzykowski, R., Karczewski, K., Krzywania, R., Kosinski, J.: Integrating Dynamic Clusters in CLUSTERIX Environment. Proc. CGW'05.
3. Kwiatkowski, J., Pawlik, M., Wyrzykowski, R., Karczewski, K.: Cumulus - Dynamic Cluster Available under Clusterix. Proc. CGW'05.
4. Wyrzykowski, R., Meyer, N., Stroinski, M.: Concept and Implementation of CLUSTERIX: National Cluster of Linux Systems. Proc. LCI Int. Conf. on Linux Clusters: The HPC Revolution 2005, Chapel Hill, NC, April 2005.