Issues of Establishing a Campus-wide Computational Grid Infrastructure in the GERANIUM Project

L.Y. Por, M.T. Su, T.C. Ling, C.S. Liew, T.F. Ang, K.K. Phang
Faculty of Computer Science and Information Technology, University of Malaya, 50603 Kuala Lumpur, Malaysia
Email: {porlip, smting, tchaw, csliew, angtf, kkphang}@um.edu.my

Abstract

A campus-wide grid and cluster infrastructure has been established using the Rocks clustering software. The infrastructure, namely the Grid-Enabled Research Network and Info-structure of University of Malaya (GERANIUM), comprises five distinct clusters located at different faculties and institutes. In this paper, the GERANIUM topology and architecture are presented, and the issues encountered and experiences gained are discussed.

1. Introduction

Pervasive access to computational power without regard to geographical boundaries has been a goal researchers have striven towards since the Internet first evolved. Through large-scale distributed computing environments, diverse resources that include computer cycles, data archives, scientific instruments, and networks are pooled and grouped systematically to form virtual organizations [1]. Users are thereby able to access such resources, increase the utilization of idle capacity, share computational results, and invent new problem-solving techniques, some of which can only exist in Grid computing [2]. Unfortunately, the current computing environment remains deficient in provisioning for computation-intensive purposes. Running a parallel MATLAB application over 30 available desktops for 30 days non-stop exemplifies a typical situation where the whole attempt can be laborious and vulnerable to any kind of interruption. In light of the rapidly growing need for access to computational power, the University of Malaya (UM) has taken the initiative to establish the first campus-wide grid and cluster infrastructure in Malaysia. GERANIUM, the computational grid, is an ensemble of five clusters (the Perdana, COMBI, FCSIT, Alethia, and CadCam clusters), each of which stems from a different faculty or institute. Although it is at a very early stage, the GERANIUM Grid is expected to stimulate researchers and students to avail themselves of this opportunity. This paper shares our experiences in setting up the grid and cluster facilities. The background and motivation of the GERANIUM project are first outlined, followed by some of the deployment and testing issues worthy of their own discussion. Finally, concluding remarks and future research directions are provided.

2. Background and motivation

Grid research is relatively new in Malaysia, and no governmental initiative had been identified until recently. Grid computing is emerging as a key component of the technology agenda under the upcoming Ninth Malaysia Plan. In preparation for this endeavor, the Malaysian Research & Education Network (MYREN), a high-speed network that links 12 local universities, was launched by the Malaysian Government in March 2005 [3]. As part of the effort to inspire Malaysian research communities, the MYREN Roadshow was held three months after the official launch of MYREN. At the roadshow, the GERANIUM Project was started and the UM grid research community was formed to begin establishing the first campus-level grid and cluster infrastructure in Malaysia. The GERANIUM Project has one primary objective: to set up the GERANIUM Grid spanning the university's campus. Its secondary objective is to instil cooperative work among multiple nationwide educational and research organizations. At present, two projects, i.e. Molecular Modelling for Drugs and Simple Job Distribution (SimJD), are running on GERANIUM. The Molecular Modelling for Drugs project uses Amber version 7. In the drug simulations, input parameter variation and process distribution are automated. This automation is particularly useful when the tasks involved must be executed many times and are parameter-intensive.
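
The automation scripts themselves are not shown in the paper; the sketch below illustrates how such a parameter sweep might be driven, assuming a sander-based Amber 7 workflow, a hypothetical input template md.in.template containing a TEMP placeholder, a hypothetical install path for sander on the target cluster, and the standard GT2 globus-job-submit client. It also assumes the generated input files are visible to the target cluster (e.g. on a shared or pre-staged directory).

# Hypothetical sketch: vary one simulation parameter and fan the runs out to a cluster front-end.
for T in 280 300 320 340; do
  sed "s/TEMP/${T}/" md.in.template > md_${T}.in            # one Amber input file per parameter value
  globus-job-submit perdana.geranium.um.edu.my \
    /usr/local/amber7/exe/sander -O -i md_${T}.in \
    -o md_${T}.out -p prmtop -c inpcrd -r md_${T}.rst       # submit each run through GRAM
done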


The other project, SimJD, is being developed with the intention of integrating the GERANIUM Grid with MATLAB. A simple MATLAB code is written to test the performance and the integration of the grid infrastructure. Despite its small scale, the test assists in observing the non-deterministic behaviour of the grid environment, i.e. tightly coupled processors interconnected via a network with high latency and low bandwidth. To help provide insights into the scalability and performance issues, the next step is to run the LINPACK benchmark on GERANIUM. At the time of writing, the metrics defined include flops/sec, compute time, and average response time with respect to problem size and the number of computational nodes available. The performance data collected enables the discovery of the system characteristics and performance expectations, as well as the diagnosis of problems, in the infancy of this grid infrastructure.
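
As a point of reference (this convention is ours, following standard LINPACK/HPL practice rather than anything defined in the paper), the flops/sec metric for a run of problem size N that completes in t seconds is usually taken as (2N^3/3 + 2N^2) / t, so that reported performance is comparable across problem sizes.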

3. Deployment architecture, topology and requirements

Figure 1 shows the deployment architecture and the topology of GERANIUM. GERANIUM sits on top of the MYREN network, which acts as the backbone of the research clusters in UM. The clusters within GERANIUM are constructed using Rocks version 4.0.0, which supports Globus Toolkit 2 (GT2) [4]. Currently, every cluster consists of a front-end node and several compute nodes. Front-end nodes are set up based on the recommended requirements [5]: a disk of 16 GB or more, 512 MB or more of RAM, and two network interface cards. Compute nodes are set up based on similar requirements but each with only one network card. Within a cluster, a switch connects all the compute nodes to the front-end node. Once Rocks is successfully installed on the front-end node, the compute nodes are installed automatically over the network using the Preboot Execution Environment (PXE). During the installation process, each compute node uses the Dynamic Host Configuration Protocol (DHCP) to request an IP


Figure 1. GERANIUM deployment architecture and topology


address from the front-end node. The compute node then automatically downloads the operating system from the front-end node via the Trivial File Transfer Protocol (TFTP). Compared with the installation of the front-end node, installing compute nodes through PXE requires no local installation media and is less time consuming (a sketch of this workflow is given at the end of this section). As part of GT2, the Grid Security Infrastructure version 2 (GSI) maintains trusted security relationships among distinct administrative domains through certificate-based authentication and authorization [6]. GSI comprises the entities of a user, a user proxy, a resource, and a resource proxy. The user creates the user proxy and provides it with a set of temporary credentials that enable the proxy to act on the user's behalf. This feature supports single sign-on, by which further user authentication is no longer needed. The resource proxy, which authenticates with the user proxy, is responsible for allocating resources and mapping the resources to their respective processes. The resource proxy also translates between inter-domain security policies and intra-domain mechanisms, hence binding diverse local security policies into a global framework. Certificate-based authentication and authorization involves generating, signing, and exchanging certificates before a user can distribute jobs to a cluster or to the grid. GERANIUM utilizes a single Certificate Authority (CA) server, which is responsible for signing the generated certificates for all clusters within the grid as well as for external service requesters. The rationale for having only one CA is to enable centralized control and monitoring of certificate signing. The next section delineates how the requisite certificates are generated and signed by the CA server.
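
As referenced above, the compute-node installation step is driven from the front-end. The paper does not list the commands involved; a minimal sketch of the usual Rocks 4.x workflow (our assumption, following Rocks defaults for appliance names and node numbering) is:

# On the front-end, start the node-discovery tool and select the "Compute" appliance
# from its menu. Then power on each compute node with network (PXE) boot enabled:
# insert-ethers captures the node's DHCP request, names it compute-0-0, compute-0-1, ...
# and the node pulls its installation from the front-end and installs itself.
# Quit insert-ethers once every node has been discovered.
insert-ethers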


4. GERANIUM certificate generation and exchange

The grid environment requires certificate signing and exchange as part of the effort to enforce grid security. For users to be entitled to send jobs, the user (and any process operating on behalf of the user) must be authenticated and authorized. This section describes how the certificates are signed by the CA server. Note that each of the following commands is to be entered as a single line. As an example, Figure 2 shows the entities involved in signing and exchanging certificates for the Perdana cluster. The FCSIT, Alethia, and CadCam clusters, which have similar configurations, are not shown in this figure to avoid unnecessary complexity in the diagram. The first step is to set up the GSI environment by downloading GERANIUM's CA utility package from the CA server and installing the package on the front-end node of the cluster. The following command creates the GSI environment and sets the GERANIUM CA as the system's default CA when making certificate requests:

/opt/nmi/setup/globus_simple_ca_41d5cb12_setup/setup-gsi -default

where globus_simple_ca_41d5cb12_setup is the utility package with 41d5cb12 as the unique CA hash.
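
As a quick sanity check (our suggestion, not a step from the paper), the hash in the package name can be compared with the hash of the installed CA certificate, assuming the certificate is in the usual trusted-certificates directory and that the local OpenSSL computes subject hashes with the same algorithm Globus used when naming the file:

# the printed hash should match the 41d5cb12 used in the package and file names
$ openssl x509 -noout -hash -in /etc/grid-security/certificates/41d5cb12.0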


Figure 2. Topology of the Perdana and Combi clusters in relation to signing and exchanging certificates


In the next step the front-end node of the cluster requests a host certificate by using this command:

grid-cert-request -service host -host clustername.geranium.um.edu.my -ca 41d5cb12

The execution of this command results in the generation of three files with the .pem extension (i.e. hostkey.pem, hostcert_request.pem, and hostcert.pem). hostcert_request.pem has to be sent to the CA server for signing before job distribution is allowed within the respective cluster; the file can be sent via email to the administrator of the CA server. When the CA administrator receives the request from the front-end node of a cluster, the administrator can sign the certificate either with the Rocks command local-ca-sign (supported only by the Rocks software) [7] or with the Globus command grid-ca-sign [8]. Both signing methods have pros and cons. Signing certificates with the Rocks command is very efficient, since it signs all pending certificates at once and the grid-mapfile is updated automatically; however, the certificates generated are given arbitrary filenames, so the CA administrator has to open each signed certificate in a text editor to check its owner before sending it back to the requester. Signing certificates with the Globus command is more tedious, but since certificates are signed one at a time, the CA administrator knows whose certificate is currently being signed and can rename the certificate file according to the requestor's ID or DNS name, making the certificate easy to identify. After signing the certificate, the CA administrator emails hostcert.pem (the signed certificate) back to the requester (the front-end node of the cluster). The requester then replaces the empty hostcert.pem file (normally located in /etc/grid-security/) with the signed one.
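
By way of illustration (the file names here are placeholders, not taken from the paper), signing one request with the Globus tool on the CA server and checking its owner before mailing it back might look like this:

# sign a single certificate request explicitly, giving the output a meaningful name
$ grid-ca-sign -in hostcert_request.pem -out hostcert_perdana.pem
# confirm whose certificate it is before emailing it back to the requester
$ openssl x509 -noout -subject -in hostcert_perdana.pem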

In the general case, a Unix user (named globususer, for instance) is created to submit the Globus jobs. Subsequently, the user performs the following steps:

1. Request a user certificate by using the grid-cert-request command (the userkey.pem, usercert_request.pem, and usercert.pem files are generated in /home/user/.globus/).
2. Email the usercert_request.pem file to the GERANIUM CA administrator.
3. The CA administrator processes the request and signs the certificate.
4. Upon receiving the signed certificate from the CA administrator, the user replaces the unsigned certificate in /home/user/.globus/ with the signed certificate (usercert.pem).
5. The Unix root user updates the /etc/grid-security/grid-mapfile by opening usercert.pem and copying the subject line into the grid-mapfile. An example of the subject line is as follows:

"/O=Grid/OU=UM/OU=Geranium/OU=perdana.geranium.um.edu.my/CN=support" support

where support is the Globus user's name.

6. To verify that the above steps are working, the user then launches a Globus proxy as follows:

i) Create a proxy certificate with the grid-proxy-init command:

$ grid-proxy-init
Your identity: /O=Grid/OU=UM/OU=Geranium/OU=perdana.geranium.um.edu.my/CN=support
Enter GRID pass phrase for this identity:
Creating proxy .......... Done
Your proxy is valid until: Mon Aug 29 3:23:41 2005

ii) Request the hostname of the Perdana host with the globusrun command from the Combi cluster:

$ globusrun -a -r perdana
perdana.geranium.um.edu.my

7. The deployment is now also ready for accepting and initiating connections from and to the Combi Cluster with certificates signed by the GERANIUM CA server. After creating a proxy certificate, the user of the Perdana Cluster is able to issue the following commands:

i) Request the front-end hostname of the Combi Cluster with the globusrun command:

$ globusrun -a -r combi
combi.geranium.um.edu.my


ii) Execute a hostname request at the remote cluster (the Combi Cluster) by using the globusrun command:

$ globusrun -a -r combi /bin/hostname
combi.geranium.um.edu.my
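
Once the authentication tests succeed, ordinary executables can be launched in the same way. As an illustration (not a step listed in the paper, and assuming the standard GT2 GRAM job-submission clients that ship with the Rocks grid roll), a Perdana user can run a short command on the Combi front-end and see its output locally:

# run a one-off command through the remote gatekeeper and print its stdout here
$ globus-job-run combi.geranium.um.edu.my /bin/date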


5. Testing GERANIUM topology

A change to the GERANIUM topology was then made: the Combi Cluster, which was originally connected to the MYREN network, was temporarily relocated to the UM network. In addition, a new cluster, the FCSIT cluster, was established and attached to GERANIUM (see Figure 3) such that the Perdana and FCSIT clusters can send jobs to one another. The Alethia and CadCam clusters are again not shown in this figure for the same reason. A problem occurred when trying to send a job from Perdana to Combi; it was attributed to the firewalls at both the UM network and the MYREN network. To remedy this situation, some compromises were made to allow the necessary incoming and outgoing connections. In other words, a network that sits behind a firewall needs to either i) relax the firewall rules or blocking protocol, or ii) permit only trusted network addresses, in order to achieve grid-enabled job submission.
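
The exact firewall changes are not listed in the paper. The sketch below shows what option ii) might look like with iptables, assuming the usual GT2 defaults of TCP port 2119 for the GRAM gatekeeper and 2811 for GridFTP, a fixed callback range advertised via GLOBUS_TCP_PORT_RANGE, and a placeholder peer subnet of 10.1.0.0/16:

# accept GRAM gatekeeper and GridFTP connections only from the trusted peer network
iptables -A INPUT -p tcp -s 10.1.0.0/16 --dport 2119 -j ACCEPT
iptables -A INPUT -p tcp -s 10.1.0.0/16 --dport 2811 -j ACCEPT
# accept the callback/data port range advertised to remote clients (e.g. 40000-40100)
iptables -A INPUT -p tcp -s 10.1.0.0/16 --dport 40000:40100 -j ACCEPT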

6. Findings and Discussion

Several issues that could easily be overlooked during the deployment and configuration phases are outlined below:

1. Keep track of the different passwords requested during the installation process.
2. Host certificates should be requested by the Unix root user.
3. A signing process that takes more than 5 minutes will render the signed certificate invalid.
4. Ensure that the subject line of each certificate is added to the grid-mapfile.
5. Ensure that the IP address, full DNS name, and hostname of the other front-end nodes are present in the hosts file.
6. The proxy certificate generated with the grid-proxy-init command will expire in 3 hours' time (see the check below).
7. Jobs should only be submitted by non-root users.
8. Enable the necessary firewall rules in the iptables configuration if required.
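
Regarding item 6, the remaining lifetime of a proxy can be checked before submitting long jobs, and a longer-lived proxy can be requested up front; the 3-hour figure above is what this deployment reports, while -hours is the standard grid-proxy-init option for overriding the default lifetime:

# show the current proxy, including the time left before it expires
$ grid-proxy-info
# request a longer-lived proxy (e.g. 12 hours) when long-running jobs are planned
$ grid-proxy-init -hours 12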

This project also aims to develop a web portal that will enable researchers to utilize GERANIUM services and resources. Researchers will be able to submit parallel and serial jobs to GERANIUM and manage data transfer across the clusters. At the same time, the web portal will allow users to monitor job status and view the network bandwidth and latency of the clusters.



Figure 3. Internetwork connectivity topology of the UM network and the MYREN network

We also plan to develop applications that use grid services to integrate scientific instruments, computational resources, and databases. Other aspects of future work include the development of new mechanisms for fine-grained access control, an intrusion detection system, network monitoring, as well as grid-based software agents for data mining.

7. Acknowledgements

We would like to thank Associate Professor Dr Putchong Uthayopas, Mr Somsak Sriprayoonsakul and Mr Kittirak Moungminksuk from the High Performance Computing and Network Center, Faculty of Engineering, Kasetsart University, Thailand, for generously sharing their technical expertise with us.

8. References

[1] Foster, I. & Kesselman, C. (2001). 'The Anatomy of the Grid: Enabling Scalable Virtual Organizations'. International Journal of Supercomputer Applications, 15(3).

[2] Foster, I. & Kesselman, C. (1998). 'Computational Grids'. In: The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann, San Mateo, CA, pp. 15-51.

[3] Malaysian Research & Education Network (MYREN). Retrieved 20/09/2005 from http://www.myren.net.my

[4] Rocks Cluster Distribution. Retrieved 25/09/2005 from http://www.rocksclusters.org/Rocks/

[5] NPACI Rocks Cluster Distribution. Users Guide, Chapter 1: Installing a Rocks Cluster. Retrieved 15/08/2005 from http://www.rocksclusters.org/rocks-documentation/4.0.0/getting-started.html

[6] Foster, I., Kesselman, C., Tsudik, G. & Tuecke, S. (1998). 'A Security Architecture for Computational Grids'. In: Proc. 5th ACM Conference on Computer and Communications Security, pp. 83-92.

[7] NPACI Rocks Cluster Distribution. Users Guide, Chapter 3: Using the Grid Roll. Retrieved 15/08/2005 from http://www.rocksclusters.org/roll-documentation/grid/4.0.0/managing-certificates.html

[8] Globus Alliance. GT 4.0 SimpleCA: Admin Guide. Retrieved 20/08/2005 from http://www.globus.org/toolkit/docs/4.0/security/simpleca/admin-index.html

