Using clusters of computers for large QU-GENE simulation experiments

BIOINFORMATICS APPLICATIONS NOTE

Vol. 17 no. 2 2001 Pages 194–195

Kevin P. Micallef∗, Mark Cooper and Dean W. Podlich

School of Land and Food Sciences, The University of Queensland, Brisbane, Queensland 4072, Australia

Received on August 17, 2000; accepted on October 20, 2000

ABSTRACT

Summary: The QU-GENE Computing Cluster (QCC) is a hardware and software solution for automating and speeding up large QU-GENE (QUantitative GENEtics) simulation experiments that are designed to examine the properties of genetic models, particularly those that involve factorial combinations of treatment levels. QCC achieves the speedup by automatically distributing the components of a simulation experiment among networked single-processor computers.

Contact: [email protected]

Supplementary information: http://pig.ag.uq.edu.au/qu-gene/cluster.htm

QU-GENE is a simulation platform for the quantitative analysis of genetic models (Podlich and Cooper, 1998). QU-GENE has been used for teaching (http://pig.ag.uq.edu.au/qu-gene/teaching.htm), for basic quantitative genetics research (Podlich and Cooper, 1999) and for applied quantitative genetics research, especially the optimisation and sensitivity analysis of plant-breeding programs (Podlich et al., 1999). Typically, a QU-GENE experiment is a factorial design, resulting in hundreds or thousands of individual simulations. Whilst it is generally easy to create the basic set of factor input files for such an experiment, creating the factorial combinations and running the individual simulations is a time-consuming and error-prone process when done manually. The need to automate the creation of combined input files and their on-demand distribution to an array of computers provided the stimulus for the development of the QU-GENE Computing Cluster (QCC).

QCC is both a hardware and a software solution to the automation and speedup of large QU-GENE experiments. The hardware consists of 48 number-crunchers and two servers, connected over an Ethernet network. Client–server software provides for the monitoring and fault tolerance of the cluster and the on-demand distribution of simulation experiments to the number-crunchers. Other software creates the combinations of input files from hand-crafted sets of input files and prepares these for running on the cluster.

∗ To whom correspondence should be addressed.

© Oxford University Press 2001

HARDWARE

The two main factors limiting assembly of sufficient computing power for this type of work are money and storage space. Other than computer equipment, money is required for building shelving to house the computers and installing extra air-conditioning. Computer hardware is a rapidly changing area, but the per-computer costs start at approximately US$ 800, including uninterruptible power supplies, networking infrastructure and KVM (Keyboard Video Mouse) switches. Relatively inexpensive 10 Mb/s hubs provide sufficient networking infrastructure for current QU-GENE experiments. KVM switches allow one set of keyboard, video monitor and mouse to control several computers, thus saving money and space.

QCC currently consists of 48 number-crunchers, each with a 500 MHz Celeron processor and 64 MB of memory, costing about US$ 900 per computer. The current cluster architecture will be expanded in 2001 to have a larger number of processors and more memory per processor.

Alternative hardware, such as a supercomputer, was considered. On a per-processor basis, a supercomputer has similar performance (http://www.spec.org) but is one and a half orders of magnitude more expensive than the current QCC desktop solution. For current QU-GENE experiments, the only benefit a supercomputer would offer is access to a large, unified memory of several GB and large disk storage, useful for experiments generating large matrices.

SOFTWARE

Podlich and Cooper (1998) discussed the two-stage architecture of QU-GENE. The first stage is referred to as the engine and defines the genotype-environment system. The second stage is called the application module and simulates the structure of a plant breeding program. The engine is typically run on a user's desktop computer using hand-crafted input files and generates GES (Genotype Environment System) files. The application module takes as input a GES file in combination with an INB file, which describes the parameters for the breeding program. QCC automates the process of running the application module.

A MIO (module input–output) file identifies which GES and INB files are to be used as input for a particular simulation and also specifies the names of output files. In a typical simulation experiment, there are many GES and INB files, and the factorial combination of these forms the array of individual simulations. Thus, for an experiment with 20 genotype-environment systems and 30 different breeding programs, 600 simulations are required. For each of these simulations, a unique MIO file needs to be created. This step is automated by the QCC software.

The QCC software was written in Tcl/Tk. Tcl/Tk is a scripting language with high-end features such as networking and graphics, designed to run on a variety of platforms. The QCC software consists of three interlinked components (Figure 1). The first component, the creator, creates the MIO files and prepares them to run on the cluster. The second component, the manager, monitors the cluster and distributes simulations to the number-crunchers. The creator and manager software run on the controller computer. The third component, the client, runs the simulations on the number-crunchers.

Linking the three software components, and the central paradigm of the cluster, is a queue of RUN files, known as the RUNList. RUN files are text files containing the information required for a number-cruncher to find and execute a simulation.
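The factorial expansion described in this section (20 GES files combined with 30 INB files, giving 600 simulations) can be illustrated with a minimal Python sketch. This is not the QCC creator, which was written in Tcl/Tk, and it does not reproduce the real MIO file format. The file names, the `mio_plan` function and the naming scheme are all hypothetical, chosen only to show how each GES/INB pair maps to one MIO entry.

```python
from itertools import product


def mio_plan(ges_files, inb_files):
    """Pair every GES file with every INB file: one entry per simulation."""
    plan = []
    for n, (ges, inb) in enumerate(product(ges_files, inb_files), start=1):
        plan.append({
            "mio": f"sim{n:04d}.mio",  # hypothetical MIO file name
            "ges": ges,                # genotype-environment system input
            "inb": inb,                # breeding-program parameter input
            "out": f"sim{n:04d}",      # hypothetical stem for output file names
        })
    return plan


# Hypothetical input sets matching the sizes quoted in the text.
ges_files = [f"system{i:02d}.ges" for i in range(1, 21)]   # 20 systems
inb_files = [f"program{j:02d}.inb" for j in range(1, 31)]  # 30 programs

plan = mio_plan(ges_files, inb_files)
print(len(plan))  # 20 * 30 = 600
```

Generating the pairs with `itertools.product` guarantees that no combination is skipped or duplicated, which is the kind of error the text describes as likely when the combinations are built by hand.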
A RUN file lists the directory in which to find the simulation files, the name of the application module, the name of the MIO file and a list of extensions of files to return at the completion of the simulation. There is one RUN file per MIO file; the RUN files are created, along with the MIO files, and added to the RUNList by the first QCC software component, the creator. The RUNList is part of the manager software component. As each client completes a simulation, it connects to the manager, via the network, and is sent the head element of the RUNList. The client reads the contents of the RUN file, uploads the necessary files to a working directory, runs the simulation and returns the results. Should a client fail, the manager is able to reallocate the failed RUN file to the head of the RUNList for tasking to the next available client. The RUNList can be edited with a mouse, allowing for the deletion or reprioritisation of simulations.

QCC has proven a great benefit to researchers using QU-GENE. Podlich et al. (1999) is one example, evaluating the relative efficiency of two plant breeding strategies. Larger experiments that previously took days to prepare and months to run can now be prepared in hours and run in days.

[Figure 1: diagram of the controller computer (creator commands qgmio, qgrun and qgadd turning hand-crafted files into MIO and RUN files and adding them to the RUNList; manager with RUNList, Cluster Monitor and Messages; Q: files on a physical disk drive) and a number-cruncher computer (client that gets num from the RUNList, runs the experiment in num.run and returns the results; Q: files on a virtual disk drive via network drive mapping).]

Fig. 1. The three components of the QCC software. The creator and manager components run on the controller computer. The client component runs on each of the number-crunchers. The three components communicate over a network. The creator component, consisting of three commands, creates the MIO (qgmio) and RUN (qgrun) files and adds the RUN files to the RUNList (qgadd). The manager component monitors the cluster and manages the RUNList. The client component requests the next RUN file from the RUNList and executes the corresponding simulation. The Q: drive, physically attached to the controller, stores the simulation files. The number-crunchers share the Q: drive using network drive mapping.
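The RUNList behaviour described in the text (clients are sent the head element, and a failed RUN file is returned to the head for the next available client) can be sketched as a small in-memory queue. This is a minimal Python illustration under stated assumptions, not the QCC manager, which was written in Tcl/Tk and communicates over a network. The class names, method names, client names and file names below are hypothetical; only the four RUN file fields come from the text.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class RunFile:
    directory: str   # where to find the simulation files
    module: str      # name of the application module
    mio: str         # name of the MIO file
    return_exts: list = field(default_factory=list)  # extensions of files to return


class RunList:
    """Hypothetical stand-in for the manager's queue of RUN files."""

    def __init__(self):
        self._queue = deque()
        self._in_progress = {}  # client name -> RunFile it is working on

    def add(self, run):
        self._queue.append(run)  # new work joins the tail

    def next_for(self, client):
        """Hand the head element to a client, or None if the queue is empty."""
        if not self._queue:
            return None
        run = self._queue.popleft()
        self._in_progress[client] = run
        return run

    def completed(self, client):
        self._in_progress.pop(client, None)

    def failed(self, client):
        """Return the failed client's RUN file to the head of the queue."""
        run = self._in_progress.pop(client, None)
        if run is not None:
            self._queue.appendleft(run)

    def pending(self):
        return [run.mio for run in self._queue]


runs = RunList()
for n in (1, 2, 3):
    runs.add(RunFile("Q:/exp1", "module.exe", f"sim{n}.mio", [".out"]))

a = runs.next_for("cruncher01")  # receives sim1.mio
b = runs.next_for("cruncher02")  # receives sim2.mio
runs.failed("cruncher01")        # sim1.mio goes back to the head
c = runs.next_for("cruncher03")  # receives sim1.mio again
print(a.mio, b.mio, c.mio, runs.pending())
```

A double-ended queue suits this design because both operations the text relies on, taking from the head and reinserting at the head, are constant-time.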

REFERENCES

Podlich,D.W. and Cooper,M. (1998) QU-GENE: a platform for quantitative analysis of genetic models. Bioinformatics, 14, 632–653.

Podlich,D.W. and Cooper,M. (1999) Modelling plant breeding programs as search strategies on a complex response surface. Lecture Notes in Computer Science 1585, pp. 171–178.

Podlich,D.W., Cooper,M. and Basford,K.E. (1999) Computer simulation of a selection strategy to accommodate genotype-by-environment interaction in a wheat recurrent selection program. Plant Breeding, 118, 17–28.
