Installing a 3 node Linux based cluster for scientific computation

Moses O. Sokunbi 1,2,*

1 Abdus Salam International Centre for Theoretical Physics, Trieste, Italy
2 Department of Electronic and Electrical Engineering, Obafemi Awolowo University, Ile-Ife, Nigeria
* E-mail: [email protected]

Abstract

This paper describes the installation of a 3 node Linux based cluster for scientific computation. The cluster consists of one master node and two compute nodes. The compute nodes are installed with a network based kickstart installation system. Scientific packages such as compilers, mathematical libraries, the Local Area Multicomputer (LAM) implementation of the Message Passing Interface (MPI), and batch processing and scheduling software are installed. The output of the Ganglia monitoring package is shown. An MPI job is compiled and run in order to test the application packages.

Keywords: cluster, HPC, Linux

1. Introduction

Serial computing is the use of a single computer having a single Central Processing Unit (CPU) to solve a computational problem [2]. The computational problem is broken into a discrete series of instructions, and the instructions are executed one after another; only one instruction may execute at any moment in time. Traditionally, software has been written for serial computation. Fig. 1 illustrates serial computing.

Fig. 1: Serial computing

Parallel computing is the simultaneous use of multiple compute resources to solve a computational problem [2]. Parallel computing involves using multiple CPUs: a computational problem is broken into discrete parts that can be solved concurrently, each part is further broken down into a series of instructions, and these instructions execute simultaneously on different CPUs. The compute resources can include a single computer with multiple processors, an arbitrary number of computers connected by a network, or a combination of both. The computational problem usually has certain characteristics: it can be broken apart into discrete pieces of work that can be solved simultaneously, it allows multiple program instructions to be executed at any moment in time, and it can be solved in less time with multiple compute resources than with a single compute resource. Fig. 2 describes parallel computing.

Fig. 2: Parallel computing

A computer cluster is a group of computers that work together. A cluster is made up of three basic elements: a collection of individual computers, a network connecting those computers, and batch queuing or message passing software that enables a computer to share a job with the other computers via the network. In a scientific computation cluster, the job is divided among the different computer nodes on the cluster network, each node working on a part of the job and all working at the same time. A good example of a computer cluster is the world’s largest search engine, Google, which consists of more than 10,000 PCs [4]. Clusters can be either commercial or commodity clusters [3]. Commercial clusters often use proprietary computers and software, the software is often tightly integrated into the system, and the CPU and network performance are well matched. The primary disadvantage of commercial clusters is their cost. Examples of commercial clusters are the SUN Ultra and IBM clusters. Commodity clusters are built with freely available, open source software, which enables people to build a cluster when the alternatives are just too expensive. An example is the Beowulf cluster. Clusters have proved their worth in a variety of areas, such as:

• climate modeling and prediction
• industrial design
• molecular dynamics
• astronomical modeling
• data mining
• image processing
• mission critical applications such as web and FTP servers.

1.1 Cluster Architecture

There are basically three types of cluster architecture, namely symmetric, asymmetric and expanded clusters. Fig. 3 describes a symmetric cluster, which is the simplest form of cluster structure. This architecture is straightforward to set up: it is a subnetwork made up of individual computers, which are the nodes, together with cluster specific software. One or two servers may be added depending on specific needs, but this requires some configuration. The drawbacks of this architecture are problems with cluster management, security and workload distribution, which make it difficult to achieve optimal performance.

Fig. 3: Symmetric cluster

In an asymmetric cluster, one computer acts as the master or head node, which is a gateway between the remaining nodes and the users. An asymmetric cluster provides a high level of security because all traffic to the cluster must pass through the master node. The remaining nodes often run very minimal software and are dedicated exclusively to the cluster. The master node often acts as a primary server for the rest of the cluster. The primary disadvantage of this architecture is the performance limitation imposed by the master node. A more powerful computer may be used for the master node, but this helps only to a point as the size of the cluster grows. Fig. 4 is an example of an asymmetric cluster.


Fig. 4: Asymmetric cluster

Fig. 5 illustrates the architecture of an expanded cluster, which results from incorporating additional servers into an asymmetric cluster. Servers for NFS, monitoring and I/O are added to the cluster, and network design and communications are taken into consideration. Expanded clusters are suitable for large clusters. The architectures of Fig. 4 and Fig. 5 are often used in designing scientific computation clusters.

Fig. 5: Expanded Cluster


1.2 Types of Clusters

There are several different types of computer clusters, each offering different advantages to the user. The main types are:

• High Availability Clusters

High-availability clusters are often used in mission critical applications. Such a cluster is composed of multiple machines, but only a single machine, called the primary server, is active; all other machines are secondary servers and remain in standby mode. The secondary servers monitor the primary server to ensure that it is operational, and if the primary server fails, a secondary server takes its place. The key idea behind this type of cluster is redundancy. An example of where a high availability cluster is implemented is a web server application.

• Load-balancing Clusters

A load-balancing cluster provides better performance by dividing the job among multiple computers. This type of cluster can also be implemented in a web server, where the different queries to the server are distributed among the computers in the cluster.

• High Performance Computing (HPC) Clusters

HPC clusters are designed to exploit the parallel processing power of multiple nodes by splitting a computational task among the nodes. They are most commonly used to perform functions that require nodes to communicate as they perform their tasks, for instance when calculation results from one node affect future results from another. HPC clusters are commonly used in scientific computing.

2. Methodology

Fig. 6 describes the Oxygen cluster, which is based on the asymmetric cluster architecture of Fig. 4. Three rack-mounted IBM machines are used, each with a 1.13 GHz dual core processor, 1 GByte of RAM and a 40 GByte hard disk. The master and compute nodes are networked with unshielded twisted pair (UTP) cables via a 10/100 Base-T Ethernet switch. Any distribution of the Linux operating system could be used; here version 4.3 of CentOS, a Red Hat based Linux distribution, is installed as the operating system.


Fig. 6: Oxygen cluster

2.1 Installation and configuration of the Master node

Setting up the master node of the Oxygen cluster involves six tasks:

1. Installing a Linux server
2. Setting up the network and hostname
3. Configuring the Dynamic Host Configuration Protocol (DHCP)
4. Setting up and configuring the Preboot eXecution Environment (PXE) and the Trivial File Transfer Protocol (TFTP) service
5. Configuring the Network File System (NFS)
6. Creating the kickstart file for the client installation

Task 1: Installing a Linux Server

Version 4.3 of the CentOS distribution of the Linux operating system is installed on the master node from CD. The master node is installed as a Linux server by choosing the server option during the installation, and a partition named distro is created and reserved as a repository for the RPMs of the Linux distribution to be used during the kickstart installation. During the package selection, “legacy network servers” is selected so that the master node can also act as a TFTP server.

Task 2: Setting up the network and hostname

1. The external interface eth0 and internal interface eth1 of the master node are set up with the ifconfig command (a sketch of these commands is given after the hosts file below).
2. The hostnames of the master node and compute nodes are set up in the /etc/hosts file. The hosts file of the Oxygen cluster is shown below:

# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1        localhost.localdomain   localhost
147.122.17.208   oxygen.hpc.sissa.it     oxygen

# Cluster Network
10.10.10.1       oxygen.cluster.net      oxygen.cluster
10.10.10.10      node01.cluster.net      node01
10.10.10.20      node02.cluster.net      node02
10.10.11.1       oxygen.nfs.net          oxygen.nfs
10.10.11.10      node01.nfs.net          nfs01
10.10.11.20      node02.nfs.net          nfs02
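A minimal sketch of the ifconfig commands from step 1 of Task 2, using the addresses listed in the hosts file; the netmasks and the use of an eth1 alias for the NFS network are assumptions, since the original report only names the interfaces:

# external interface (netmask assumed)
[root@oxygen ~]# ifconfig eth0 147.122.17.208 netmask 255.255.255.0 up
# internal cluster interface
[root@oxygen ~]# ifconfig eth1 10.10.10.1 netmask 255.255.255.0 up
# NFS network address, possibly configured as an alias of eth1 (assumption)
[root@oxygen ~]# ifconfig eth1:1 10.10.11.1 netmask 255.255.255.0 up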

Task 3: Configuring Dynamic Host Configuration Protocol (DHCP)

Oxygen is configured as a DHCP server by creating the /etc/dhcpd.conf file. This is necessary to set up an automated installation procedure. The following parameters are used to create the dhcpd.conf file:

• the subnet and netmask of the subnet
• the domain name
• the broadcast address
• the name of the bootloader program, which is "pxelinux.0"
• the name of the TFTP (next-server) server
• the hardware Ethernet (MAC) address of the nodes
• a fixed IP address for the nodes
• the hostname for the nodes

The dhcpd.conf file of Oxygen is shown below:

# DHCP Server Configuration file.
# see /usr/share/doc/dhcp*/dhcpd.conf.sample
ddns-update-style none;

shared-network comms {

    subnet 10.10.10.0 netmask 255.255.255.0 {
        option subnet-mask         255.255.255.0;
        option broadcast-address   10.10.10.255;
        option ntp-servers         10.10.10.1;
        next-server                10.10.10.1;
        filename                   "/linux-install/pxelinux.0";
    }

    subnet 10.10.11.0 netmask 255.255.255.0 {
        option subnet-mask         255.255.255.0;
        option broadcast-address   10.10.11.255;
        next-server                10.10.11.1;
    }
}

# Network for management operations
host node01.cluster.net {
    hardware ethernet   00:02:55:C6:28:E8;
    fixed-address       10.10.10.10;
    option host-name    "node01";
}

host node02.cluster.net {
    hardware ethernet   00:02:55:C6:23:F2;
    fixed-address       10.10.10.20;
    option host-name    "node02";
}

# Network File System (NFS) operations
host node01.nfs.net {
    hardware ethernet   00:02:55:C6:28:E9;
    fixed-address       10.10.11.10;
    option host-name    "nfs01";
}

host node02.nfs.net {
    hardware ethernet   00:02:55:C6:23:F3;
    fixed-address       10.10.11.20;
    option host-name    "nfs02";
}
#EOF

The file is saved and the dhcp daemon is started. The dhcp daemon is added to chkconfig so that it starts at boot time:

[root@oxygen ~]# chkconfig --list dhcpd
[root@oxygen ~]# chkconfig --add dhcpd
[root@oxygen ~]# chkconfig --level 2345 dhcpd on

Task 4: Setting up and Configuration of the Preboot eXecution Environment (PXE) and the Trivial File Transfer Protocol (TFTP) Service

Oxygen is also configured as a TFTP server. The following steps are taken to get the TFTP service running.

1. The Network Bootstrap Program (NBP) pxelinux.0 is needed to set up the TFTP service. pxelinux.0 is provided by the system-config-netboot and syslinux packages, which are selected inside the Legacy Software Development package group during the installation of Linux as a server.

2. The vmlinuz and initrd.img files are copied from the installation CD-ROM to the /tftpboot/linux-install/ directory.

3. In the /tftpboot/linux-install/pxelinux.cfg directory, the bootstrap configuration files for the clients are created, using the IP addresses of the clients in hexadecimal notation as file names. The IP addresses are converted from dotted quad notation to hexadecimal notation with the gethostip utility supplied by the syslinux package:

> gethostip -x 10.10.10.10
0A0A0A0A

The install information of the nodes is created in the files 0A0A0A0A and 0A0A0A14.
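The second file name follows from the IP address of node02 in the same way; the output below is what the conversion of 10.10.10.20 yields, not a transcript from the original session:

> gethostip -x 10.10.10.20
0A0A0A14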

The bootstrap configuration file for node01 is then created:

> vi 0A0A0A0A
prompt 1
timeout 50
display /pxelinux.cfg/installmsg.txt
default install

label local
  LOCALBOOT 0

label install
  kernel vmlinuz
  append vga=normal network ip=dhcp ksdevice=eth0 load_ramdisk=1 prompt_ramdisk=0 ramdisk_size=16384 initrd=initrd.img ks=nfs:10.10.11.1:/tftpboot/ks/ks.cfg selinux=0

4. The installation message displayed at the boot prompt is created as follows:

> vi installmsg.txt
Welcome to Oxygen Cluster
We are going to install this node from scratch
Available options:
  local               boot from local devices
  install (default)   install the node by using PXE+TFTP+NFS

5. The TFTP service in /etc/xinetd.d/tftp is enabled by deleting the line “disable=yes”, and the xinetd daemon is reloaded.

Task 5: Configuring Network File System (NFS)

1. The RPMs on all the installation CDs are copied to the distro partition.
2. The /distro, /home and /opt directories are exported by setting them up in the /etc/exports file, and the NFS service is started (a sketch of such an exports file is given below).
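The original exports file is not reproduced in this report; a minimal sketch of what /etc/exports might look like for these three directories is shown here. The export options (read-only for /distro, read-write elsewhere, sync, no_root_squash) and the restriction to the cluster subnets are assumptions:

/distro   10.10.10.0/255.255.255.0(ro,sync,no_root_squash) 10.10.11.0/255.255.255.0(ro,sync,no_root_squash)
/home     10.10.11.0/255.255.255.0(rw,sync,no_root_squash)
/opt      10.10.11.0/255.255.255.0(rw,sync,no_root_squash)

After editing the file, the exports are activated with exportfs -a or by restarting the nfs service.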

Task 6: Creating the kickstart file for the client installation

1. The kickstart file ks.cfg for the client installation is created with the system-config-kickstart utility and placed in the /tftpboot/ks/ directory.
2. The directory /tftpboot/ks/ is exported via NFS, as required by the kernel/anaconda command line option in the bootstrap configuration file, and the NFS service is restarted.

2.2 Installation and configuration of the compute nodes

Tasks:

1. The compute nodes of the Oxygen cluster are installed by the kickstart method using the RPM repository on Oxygen. The kickstart method is used because it is automated and enables many nodes to be installed in a very short time; each time more nodes are added to the cluster, one only needs to create an installation file and install the nodes with kickstart. The nodes are booted and, after some seconds, the install message below is displayed with a boot prompt. After a few seconds the automated installation process begins.

Welcome to Oxygen Cluster
We are going to install this node from scratch
Available options:
  local               boot from local devices
  install (default)   install the node by using PXE+TFTP+NFS

At the end of the installation, before the nodes are rebooted, the default boot instruction in the install files 0A0A0A0A and 0A0A0A14 of the nodes is changed to local so as to prevent an installation loop, and the line “We are going to install this node from scratch” is deleted from installmsg.txt. The nodes are rebooted, and this time they boot from the local hard drive.

2. Creating the /etc/hosts file. The hosts file for the compute nodes is the same as the one for the master node.

3. Creating the /etc/fstab file so that the nodes can mount the /home and /opt directories at boot time. The binaries of the compilers, mathematical libraries and other important application packages are installed in the /opt directory.

2.3 Implementing public-key authentication

Public-key authentication is implemented for the user account moses with the ssh-keygen -d command. The following steps are taken (see the sketch after this list):

• the current directory of the account moses is changed to the .ssh directory;
• the group write privilege on the .ssh directory is removed, otherwise the public-key authentication would not work;
• the two files id_dsa and id_dsa.pub are created with the ssh-keygen -d command, without using a passphrase;
• the file id_dsa.pub is copied under the name authorized_keys in the same directory.
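A minimal sketch of this sequence as shell commands, based on the steps described above; the exact form of the chmod command is an assumption:

[moses@oxygen ~]$ cd ~/.ssh
[moses@oxygen .ssh]$ chmod g-w ~/.ssh            # remove the group write permission (assumed command)
[moses@oxygen .ssh]$ ssh-keygen -d               # accept the default file names, empty passphrase
[moses@oxygen .ssh]$ cp id_dsa.pub authorized_keys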

The user moses is able to ssh to node01 and node02 without a password, since /home/moses is already mounted on the nodes at boot time.

2.4 Installation and configuration of application packages

Scientific software packages, batch scheduling, management and monitoring tools are downloaded from the Internet and installed on the Oxygen cluster. Compilers such as the g95 compiler, the PGI Fortran compiler and the Intel Fortran and C++ compilers are installed on Oxygen from their install scripts. Mathematical libraries such as the ATLAS and MKL libraries are installed. LAM/MPI and MPICH, used for parallel processing on clusters, are also installed. The PBS Torque resource manager and the Maui scheduler, used for batch processing, are installed from source code. The C3 tools for cluster management and the Ganglia monitoring core for monitoring the health of the cluster are also installed. For more information on the installation of these packages refer to [1].
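As an illustration of the batch system, a minimal Torque submission script of the kind that could be used on this cluster is sketched below. The job name, resource request and executable are illustrative assumptions; the pbstest.sh script that appears later in the user's home directory is not reproduced in the original report:

#!/bin/bash
#PBS -N cpi                 # job name (illustrative)
#PBS -l nodes=2:ppn=1       # request two compute nodes, one processor each (assumed)
#PBS -l walltime=00:10:00   # ten minute limit (assumed)

cd $PBS_O_WORKDIR           # run from the directory the job was submitted from
mpirun -np 2 ./cpi.x        # launch the MPI executable built in Section 4

Such a script would be submitted with qsub and monitored with qstat.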

3. Cluster security

The security of the Oxygen cluster is enhanced by configuring the following service and files:

• iptables
• /etc/security/limits.conf
• /etc/security/access.conf
• /etc/ssh/sshd_config

iptables: Since the Oxygen cluster is connected to the Internet through its external interface eth0, it is necessary to secure that interface. An iptables file is created in the /etc/sysconfig directory of the master node, as shown below. The file is saved and the iptables service is started.

*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -i eth0 -p tcp --dport 22 --syn -j ACCEPT
-A INPUT -i eth0 -p tcp --dport 80 -s 147.122.0.0/16 --syn -j ACCEPT
-A INPUT -i eth0 -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -i eth0 -j DROP
COMMIT

/etc/security/limits.conf: This file is edited on the master node by adding the following lines, which limit the CPU time (in minutes) of ordinary user processes on the master node so that users cannot run long jobs there; root is effectively unrestricted.

*     soft   cpu   15
*     hard   cpu   16
root  -      cpu   525600

/etc/security/access.conf: The line below is added to the access.conf file of the master node.

- : root : ALL EXCEPT asgard.sissa.it

Also, pam_access.so is included in /etc/pam.d/sshd:

#%PAM-1.0
auth       required     pam_stack.so service=system-auth
auth       required     pam_nologin.so
account    required     pam_stack.so service=system-auth
password   required     pam_stack.so service=system-auth
session    required     pam_stack.so service=system-auth
session    required     pam_loginuid.so
account    required     pam_access.so

/etc/ssh/sshd_config: The following lines are added to this file to make the access.conf file functional.

Protocol 2
SyslogFacility AUTHPRIV
LogLevel VERBOSE
PasswordAuthentication yes
ChallengeResponseAuthentication no
GSSAPIAuthentication yes
GSSAPICleanupCredentials yes
UsePAM yes
X11Forwarding yes
DenyUsers root@*.hpc
DenyUsers root@*.nfs
DenyUsers root@*.sp
Subsystem sftp /usr/libexec/openssh/sftp-server
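The report does not say how the PAM and sshd changes are applied; presumably the sshd service is restarted afterwards, along the lines of:

[root@oxygen ~]# service sshd restart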

4. Results

It is essential to monitor the state of the cluster in order to ensure that every node is up and running. Examples of tools that can be used to monitor the state of clusters include Ganglia, Clumon and Performance Co-Pilot. Any of these tools could have been used on the Oxygen cluster; Ganglia is the one deployed. Ganglia is a real-time performance monitor for clusters and grids and it uses the round-robin database package [2]. The Ganglia monitoring tool deployed on the Oxygen cluster displays statistics about the cluster such as the load, CPU usage, network traffic and memory usage. Fig. 7 gives an overview of the Oxygen cluster: a master node with internal interface oxygen.cluster.net and two compute nodes, node01.cluster.net and node02.cluster.net. Fig. 8 gives the physical view of the cluster. The reports of the Ganglia monitoring tool show that the Oxygen cluster is healthy, up and running.
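The report does not show how Ganglia is brought up; a typical arrangement, assumed here, is to run the gmond daemon on every node and the gmetad daemon together with the web front end on the master node:

# on the master node and on each compute node
[root@oxygen ~]# service gmond start
# on the master node only (collects the data and feeds the web front end)
[root@oxygen ~]# service gmetad start
[root@oxygen ~]# service httpd start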

Fig. 7: Overview of Oxygen cluster

Fig. 8: Physical view of Oxygen cluster

An MPI job is also compiled and run on the Oxygen cluster in order to test the application packages installed on the cluster. The MPI job is run using LAM in the following way:

[moses@oxygen ~]$ lamboot -v hostfile

LAM 7.1.2/MPI 2 C++/ROMIO - Indiana University

n-1 ssi:boot:base:linear: booting n0 (oxygen.cluster.net)
n-1 ssi:boot:base:linear: booting n1 (node01.cluster.net)
n-1 ssi:boot:base:linear: booting n2 (node02.cluster.net)
n-1 ssi:boot:base:linear: finished

[moses@oxygen ~]$ lamnodes
n0 oxygen.cluster.net:1:origin,this_node
n1 node01.cluster.net:1:
n2 node02.cluster.net:1:

[moses@oxygen ~]$ mpicc cpi.c -o cpi.x
[moses@oxygen ~]$ ls
cpi.c          matvec.f          pbstest.sh   usingmpi
cpi.x          matmat.f          pi3.f
Matmul         mountain.tgz      pi.f
Matmul.tar.gz  mpi_progs.tar.gz  pit.f

[moses@oxygen ~]$ mpirun -np 10 cpi.x
Enter the number of intervals: (0 quits) 10
pi is approximately 3.1424259850010978, Error is 0.0008333314113047
Enter the number of intervals: (0 quits) 100
pi is approximately 3.1416009869231249, Error is 0.0000083333333318
Enter the number of intervals: (0 quits) 1000
pi is approximately 3.1415927369231262, Error is 0.0000000833333331
Enter the number of intervals: (0 quits) 100000
pi is approximately 3.1415926535981269, Error is 0.0000000000083338
Enter the number of intervals: (0 quits) 100000000
pi is approximately 3.1415926535897443, Error is 0.0000000000000488
Enter the number of intervals: (0 quits) 0

[moses@oxygen ~]$ lamhalt

LAM 7.1.2/MPI 2 C++/ROMIO - Indiana University

[moses@oxygen ~]$
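The hostfile passed to lamboot above is not reproduced in the report; judging from the lamnodes output it simply lists the three nodes, one per line, for example:

oxygen.cluster.net
node01.cluster.net
node02.cluster.net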


5. Conclusion

A computer cluster is installed using three homogeneous nodes. It is possible to use relatively old computers, such as 386 or 486 machines, to set up a cluster. A good working knowledge of the open source Linux operating system is necessary for setting up a computer cluster.

Acknowledgement

The author undertook this work with the support of the Abdus Salam International Centre for Theoretical Physics (ICTP) Programme for Training and Research in Italian Laboratories (TRIL), Trieste, Italy. Thanks to Dr. Stefano Cozzini and Clement Onime for their useful consultations. Special thanks to Moreno Baricevic for his support and assistance in the Democritos/INFM laboratory at SISSA.

References

1. Advanced School in High Performance Computing Tools for e-Science – Joint DEMOCRITOS/INFM-ELab/SISSA-ICTP activity. http://www.democritos.it/hpc-wiki/
2. Introduction to Parallel Computing. http://www.llnl.gov/computing/tutorials/parallel_comp/
3. Sloan, J.D. (2004) “High Performance Linux Clusters”, O’Reilly and Associates, Inc., Sebastopol, CA 95472, USA.
4. Smart to get computers to work together, Uppsala Universitet. http://www.it.uu.se/research/info/kluster/