The Story Behind Midnight, a Part Time High Performance Cluster

Andreas Boklund, Christian Jiresjö, Stefan Mankefors
Dept. of Informatics and Mathematics
University of Trollhättan/Uddevalla
P.O. Box 957, SE-461 29 Trollhättan, Sweden
Communicating author: {andreas.boklund, christian.jiresjo, stefan.mankefors}@htu.se

Abstract

In this paper we present the creation process and the purpose behind the Midnight cluster. It is disguised as a computer laboratory during the day, but turns into a high performance compute cluster during the night. The main focus of this paper is on the basic issues of creating a part time compute cluster. The Midnight cluster was constructed to serve both as a CPU harvester and as a platform for further studies. The main goal of our upcoming research is to evaluate different methods for handling shape change and process management on the cluster, and how these factors affect running processes, stability and performance.

Keywords: cluster, metamorphosic computing, grid, Mosix
1. Introduction

During the last decade the view on high performance computing has started to change. Twenty-five years ago researchers were trying to develop fast processors; ten years ago they were developing large MPP class computers [1]. Today we are working on connecting computers all over the world to create grids [2]. The community has taken a step in the chain of evolution, from being hunters of supercomputers to becoming farmers of grids. Now our primary goal is to harvest the largest possible number of CPU cycles. The main problem is that supercomputers are like basements: the larger the basement, the more items you place in it. When our supercomputers get more processing power, we increase the size of our problems [3]. There are many projects today that deal with cluster computing [1], grid computing [2] and truly

distributed computing, such as distributed.net [4], seti@home [5,6] and folding@home [7]. These are three approaches to high performance computing that suit different environments and problems [1]. In this paper we present the construction of a part time Linux cluster out of computers that would otherwise sit idle. This is nothing new; many companies and research institutes have been running demanding applications on their employees' workstations during the day, after work hours, or even in their homes [8,9]. Unfortunately, this approach may soon become unavailable, since many organizations are replacing their UNIX based workstations with Windows [10]. Because of these factors we want to use this cluster as a test bed to evaluate the possibility of creating part time clusters out of otherwise idle computers, and to find out what problems such a procedure might entail. This is accomplished by rebooting the computers into Linux after work hours to form a computational cluster.

2. Computational environment

The computers at our disposal are located in a computer laboratory that is only accessible to computer science students. The building is locked and the alarm is armed between 11pm and 7am.

2.1. Hardware

The cluster consists of one management node and twenty identical part time compute nodes stationed in the laboratory. Their technical characteristics are summarized in Figure 1.

HW \ Node        Management       Compute
Processor        2 * Pentium II   Pentium III
Speed            266 MHz          600 MHz
Cache L1/L2      32KB/512KB       32KB/512KB
L2 speed         133 MHz          300 MHz
Chipset          Intel 440FX      Intel 810
SDRAM            512 Mbyte        256 Mbyte
PCI bus          33MHz/32bit      33MHz/32bit
Hard drive       2*9GB SCSI       30GB IDE
Network card     3com 3c905B      3com 3c920
Network speed    100 Mbit         100 Mbit

Figure 1: Technical characteristics of the hardware.

2.2. Operating system

There are two main approaches to cluster computing with Linux: Beowulf [11,12] and Mosix [13,14,15]. We have chosen to use OpenMosix [16], the free version of the Mosix operating system. The Mosix paradigm suits this project better because it dynamically allocates resources between the available nodes in the cluster [17]. There is no need for an advanced queuing system, such as the Load Sharing Facility (LSF) [18], to checkpoint and stop the jobs at 7am every morning just to restart them at 11pm in the evening. The Linux distribution used is Gentoo Linux version 1.4_rc2 [19], with the OpenMosix install options [20,21]. The reason we chose Gentoo Linux is that it comes with a package for OpenMosix, which makes it easy to install and configure. One advantage that Gentoo Linux has over other distributions is that it uses the portage system [22], which downloads the source code of all packages and compiles it locally with all computer specific optimizations.
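As an illustration of how little is involved on the Gentoo side, the sketch below shows how the OpenMosix kernel sources and user space tools referenced above [20,21] would typically be pulled in through portage. The exact package versions, kernel configuration and service setup used on Midnight are not given in this paper, so treat this as a minimal sketch rather than our actual installation procedure.

# Minimal sketch of an OpenMosix install on Gentoo (assumed steps, not the
# exact procedure used on Midnight).
emerge sync                  # refresh the portage tree
emerge openmosix-sources     # OpenMosix patched kernel sources [20]
emerge openmosix-user        # user space tools such as mosctl and mosmon [21]
# After this, the kernel is built from the installed sources and a node map
# (commonly /etc/openmosix.map) describes the members of the cluster before
# OpenMosix is started on each node.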

2.3. File systems

The file system used on all partitions is the ext3 journaling file system. Since Midnight will initially be used to prove that the part time cluster concept works, we decided against implementing a single I/O space such as the Parallel Virtual File System (PVFS) [23,24], the Global File System (GFS) [25] or the Mosix Shared File System (MFS) [26]. Parallel file systems have been shown to increase the I/O capabilities of workstations and clusters [27], but at this early stage a parallel file system would only increase the complexity of the system. The management node has its partitions spread over two SCSI hard drives, as shown in Figure 2. The swap partitions are placed on separate physical hard drives to load balance the I/O.

Mount point    Device       Size     Filesystem
/              /dev/sda3    8GB      ext3
/home          /dev/sdb2    7.6GB    ext3
/boot          /dev/sda1    101MB    ext3
swap           /dev/sda2    512MB    Linux swap
swap           /dev/sdb1    512MB    Linux swap

Figure 2: File system layout of the management node.

The hard drives of the compute nodes contain four partitions, where the first two support the Microsoft Windows installation and the last two are used for the Linux file system and swap, respectively. The layout is shown in Figure 3.

Mount point    Device       Size     Filesystem
--             /dev/hda1    30MB     FAT16
--             /dev/hda2    10GB     NTFS
/              /dev/hda3    1.5GB    ext3
swap           /dev/hda4    512MB    Linux swap

Figure 3: File system layout of the compute nodes.
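For concreteness, the following is a minimal sketch of what the Linux side of a compute node's /etc/fstab could look like given the layout in Figure 3. The paper does not list the actual mount options, nor whether the Windows partitions are mounted under Linux, so everything beyond the device names and file system types above is an assumption.

# Hypothetical /etc/fstab for a compute node, derived from Figure 3.
# Mount options are assumptions; the Windows partitions (hda1, hda2) are
# left unmounted here.
/dev/hda3   /       ext3    noatime     0 1
/dev/hda4   none    swap    sw          0 0
none        /proc   proc    defaults    0 0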

2.4. Boot loader

Since the cluster needs to reboot twice a day, at 7am and 11pm, we need a boot loader that can be configured to choose different boot alternatives depending on the time of day. Because GRUB [28], the boot loader of our choice, does not support this, we decided to add the functionality ourselves, as explained in Section 3, "Adapting the boot loader".

2.5. Network configuration

All compute nodes are connected to a single VLAN in a Fast Ethernet switch. The management node is connected to the same VLAN but in another Fast Ethernet switch. These two switches are connected to a third switch through a trunk port. All campus computers are connected to VLANs within the same hierarchy of switches.

Figure 4: Networking topology.

This networking topology is not very suitable for a compute cluster, since latencies are very high and a great deal of traffic traverses the network. It should be noted, however, that the network load drops significantly during the night when the campus is closed.

3. Adapting the boot loader

The changes made to GRUB are really a hack in the full meaning of the word. We started out with GRUB version 0.92. To get this to work we had to make three changes: a new configuration value, an hour check when booting, and a function that reads the hour from the BIOS.

3.1. Configuration value

The configuration value that was added makes it possible to specify an hour as an integer. This integer is stored in a global variable that is compared to the hour stored in the BIOS when booting.
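This paper does not show the actual configuration syntax of the patch, so the fragment below is only a sketch of how such a menu.lst entry might look. The directive name "clusterhour" is a placeholder, not the patch's real keyword, and the menu entries are illustrative.

# Hypothetical GRUB 0.92 menu.lst fragment. "clusterhour" stands in for the
# directive the patch actually introduces; the value 23 means that on the
# 11pm reboot the default entry is switched from Windows to Linux, while any
# other hour (for example 7am) leaves Windows as the default.
default 0
timeout 10
clusterhour 23

title Windows XP
rootnoverify (hd0,1)
chainloader +1

title Gentoo Linux / OpenMosix
root (hd0,2)
kernel /boot/vmlinuz root=/dev/hda3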

3.2. Reading the hour from the BIOS

The function that reads the time from the BIOS was the hardest to implement. It is based on the code for reading the seconds, getrtsecs(). The hour value is stored in eight bits as a Binary Coded Decimal (BCD).

ENTRY(getrthour)
        push    %ebp

        call    EXT_C(prot_to_real)     /* enter real mode */
        .code16

        movb    $0x2, %ah               /* BIOS int 0x1a, ah=0x02: read RTC time */
        int     $0x1a
        DATA32  jnc     gothour         /* carry clear: hour is in %ch (BCD) */
        movb    $0xff, %ch              /* error: flag the hour as invalid */
gothour:
        DATA32  call    EXT_C(real_to_prot)
        .code32

        cmp     %ah, 0
        je      gotdell
        movb    %ch, %al
gotdell:
        movb    %al, %ch
/*gotnodell:
        xor     %ch, 15
        xor     %al, 240
        add     %ch, %al */

        pop     %ebp
        ret

Figure 5: Code for reading the hour value from the BIOS.

Unfortunately, different BIOSes store the hour value in different ways. Right now our patch only supports the Phoenix BIOS installed in our computers (version A08), although we are planning on supporting all major BIOSes. When the patch is completed, we will submit it to the GRUB development team so that it can be included in the standard source. Currently it is only available from our web server [29].

3.3. Hour check when booting

When GRUB starts, it reads a configuration file containing boot options. Next, it displays the boot menu and waits for a timeout or for the user to choose an option. If the menu times out, GRUB boots the operating system that has been flagged as default. In our case, this is Windows XP. The main functionality of our patch is that it adds a step between reading the configuration file and displaying the boot menu. The added step compares the hour read from the BIOS with the configured hour value. If the two values match, the default boot option is changed from Windows to Gentoo/OpenMosix.

4. Theory of operation

The concept this cluster is meant to prove is that it is possible to have a set of Windows office computers by day and a high performance Linux cluster by night. This is also why we named the cluster Midnight. The only full time component of Midnight is the management node, which serves as the starting point for all jobs that span the cluster. It should never be switched off.

4.1. Mosix process migration

Mosix supports pre-emptive (completely transparent) process migration [14]. When a process is initiated on a computer, that computer becomes its Unique Home Node (UHN). If the computer becomes overloaded, the process is migrated to other nodes that have available resources [15]. When a process is migrated it is divided into two parts: a deputy process that is kept as a local skeleton process on the UHN, and a remote process that runs on the new computer in user space [14].

Figure 6: A local and a migrated process [14].

The drawback of this system is that all site dependent system calls executed by the process are forwarded from the remote process to the deputy and executed on the UHN [14]. This adds some overhead, especially when a remote process needs to access the network or a file system, unless the Mosix distributed file system has been implemented [27].

4.2. Before 11pm

A couple of parallel jobs are queued on Midnight's management node. The first job will start to run and will spawn twenty (20) processes. All these processes will run on the management node and the performance will be terrible.
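The paper does not describe the actual jobs, so the sketch below is only illustrative: a trivial script run on the management node that spawns twenty CPU-bound worker processes. Under OpenMosix nothing cluster-specific is needed in the job itself; as long as the workers are ordinary Linux processes, the kernel migrates them to the compute nodes once those come online (Section 4.3). The "worker" program and its input/output names are placeholders.

#!/bin/sh
# Illustrative only: spawn twenty workers on the management node and let
# OpenMosix decide where each process runs; no explicit node list is needed.
for i in $(seq 1 20); do
    ./worker input.$i > output.$i &
done
wait

The OpenMosix user space tools [21] include monitors such as mosmon that show how the processes spread out over the nodes once migration starts.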

4.3. At 11pm

The Windows scheduler reboots the workstations in the computer laboratory. The GRUB boot loader compares the hours and decides that it is time to put the Midnight cluster back to work. The compute nodes come online one after another and the management node starts to pass out the twenty processes to them.

4.4. At 7am

If we are lucky, the first job has finished and the next job in the queue is running. A cron script executes the reboot command on the compute nodes and they send all their processes back to the management node. The GRUB boot loader runs again, and this time the hours do not match, so the computers are booted into Windows, ready to serve hundreds of computer science students in their daily work.
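The paper only states that a cron script reboots the compute nodes at 7am; whether it runs locally on each node or is triggered from the management node is not specified. Assuming a local crontab entry on each compute node, it could be as simple as the following.

# Assumed root crontab entry on each compute node: reboot at 7am so that GRUB
# can hand the machine back to Windows for the day.
0 7 * * * /sbin/shutdown -r now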

4.5. Between 7am and 11pm

The management node runs all twenty processes by itself, making a little progress on the job. The compute nodes, on the other hand, are running the Windows operating system along with various applications. A few more jobs may be placed in the queue on the management node.

5. Conclusions and Future Work

The idea behind the development of the Midnight cluster was conceived at the beginning of 2001 [30]. In this paper we have described how we created a workable part time compute cluster. The creation of the cluster is just the first step. The Midnight cluster will serve both as a CPU harvester and as a platform for future studies. The main goal of our upcoming research is to evaluate different methods for handling shape change and process management on the cluster, and how these factors affect running processes, stability and performance. Our goal is to extend the work presented in this paper in several directions. First, we want to gather data about possible node failures, such as nodes not rebooting, being switched off, or experiencing migration problems, and to see how these failures affect the throughput of the cluster.

Next, we want to implement the MFS parallel file system to see how it works in an environment where compute nodes are switched on and off. Our third project is to profile the cluster to find out which application types and problem sizes benefit most from this special environment, where the processes run on the management node during the larger part of the day. We would also like to add a few older, outdated computers to take some load off the management node during the daytime. Hopefully this will increase the throughput and make it possible to run larger jobs.

6. Acknowledgements

We would like to thank Christer Selvefors for letting us use the equipment and Mats Lejon for reconfiguring the campus network to suit our needs. We would also like to thank Richard Torkar, Nima Namaki, Pouya Kheradpour and Anna Johansson for their comments and revisions of this paper.

7. References

[1] R. Buyya, High Performance Cluster Computing, Prentice Hall, New Jersey, 1999.
[2] I. Foster, C. Kesselman, editors, The Grid: Blueprint for a New Computing Infrastructure, Morgan Kaufmann Publishers, Inc., 1999.
[3] D. Greenberg, R. Brightwell, L.A. Fisk, A. Maccabee, R. Riesen, "A System Software Architecture for High-End Computing", in Proceedings of Supercomputing '97, Nov 1997.
[4] Distributed.net homepage, http://distributed.net, 2003-02-20.
[5] G. Lawton, "Distributed net applications create virtual supercomputers", Computer, Vol. 33, Issue 6, Jun 2000, pp. 16.
[6] E. Korpela, D. Werthimer, J. Anderson, J. Cobb, M. Leboisky, "SETI@home: massively distributed computing for SETI", Computing in Science & Engineering, Vol. 3, Issue 1, Jan/Feb 2001, pp. 78.
[7] K. Schreiner, "Distributed projects tackle protein mystery", Computing in Science & Engineering, Vol. 3, Issue 1, Jan/Feb 2001, pp. 13.
[8] G. Mulas, "Turning a group of independent GNU/Linux workstations into an (open)Mosix cluster in an untrusted networking environment and surviving the experience", http://www.democritos.it/events/openMosix/papers/cagliari.pdf, 2003-02-21.
[9] A. Boklund, "Home Clusters", ;login: magazine, Vol. 26, SAGE, Aug 2001, pp. 74.
[10] Gartner research, http://www.gartner.com/resources/109300/109321/109321.pdf, 2003-02-21.
[11] S. Heaney, Beowulf: A New Verse Translation, W.W. Norton & Company, 2001.
[12] Beowulf Project at CESDIS, http://beowulf.gsfc.nasa.gov/, 2003-01-13.
[13] Mosix, http://www.mosix.org/, 2003-02-27.
[14] A. Barak, O. La'adan, A. Shiloh, "Scalable Cluster Computing with MOSIX for LINUX", Proc. Linux Expo '99, Raleigh, N.C., May 1999, pp. 95-100.
[15] A. Barak, S. Guday, R. Wheeler, The MOSIX Distributed Operating System, Load Balancing for UNIX, Lecture Notes in Computer Science, Vol. 672, Springer-Verlag, 1993.
[16] OpenMosix, http://openmosix.sourceforge.net/, 2003-02-03.
[17] Y. Amir, B. Awerbuch, A. Barak, R.S. Borgstrom, A. Keren, "An Opportunity Cost Approach for Job Assignment in a Scalable Computing Cluster", IEEE Trans. Parallel and Distributed Systems, Vol. 11, No. 7, pp. 760, July 2000.
[18] M.Q. Xu, "Effective metacomputing using LSF Multicluster", in Proceedings of the First IEEE/ACM International Symposium on Cluster Computing and the Grid, 2001, pp. 100.
[19] Gentoo Linux Distribution, http://www.gentoo.org/, 2003-02-27.
[20] OpenMosix portage package, http://www.gentoo.org/dyn/pkgs/sys-kernel/openmosix-sources.xml, 2003-02-03.
[21] OpenMosix portage package, http://www.gentoo.org/dyn/pkgs/sys-cluster/openmosix-user.xml, 2003-02-03.
[22] Portage User Guide, http://gentoo.org/doc/en/portage-user.xml, 2003-03-05.
[23] PVFS Homepage, http://parlweb.parl.clemson.edu/pvfs, 2003-02-25.
[24] P. Carns, W. Ligon, R. Ross, R. Thakur, "PVFS: A Parallel File System for Linux Clusters", Proceedings of the 4th Annual Linux Showcase and Conference, 2000.
[25] S. Soltis, G. Erickson, K. Preslan, M. O'Keefe, T. Ruwart, "The Global File System: A File System for Shared Disk Storage", submitted to IEEE Transactions on Parallel and Distributed Systems, 1997.
[26] L. Amar, A. Barak, A. Shiloh, "The MOSIX Direct File System Access Method for Supporting Scalable Cluster File Systems", http://www.mosix.org, March 2003.
[27] L. Amar, A. Barak, A. Shiloh, "The MOSIX Parallel I/O System for Scalable I/O Performance", Proc. 14th IASTED International Conference on Parallel and Distributed Computing and Systems, 2002, pp. 495.
[28] GRUB development site, http://www.gnu.org/software/GRUB, 2003-03-03.
[29] Our GRUB Patch, http://java.thn.htu.se/~caveman/GRUB/, 2003-03-03.
[30] A. Boklund, F. Larsson, "Processmigrationsystem - An efficient way to use a computational cluster?", M.S. thesis, University of Gothenburg, Sweden, 2001.
