Accelerating Gene Regulatory Network Modeling Using Grid-Based Simulation

James M. McCollum, Department of Electrical and Computer Engineering, University of Tennessee–Knoxville, [email protected]
Gregory D. Peterson, Department of Electrical and Computer Engineering, University of Tennessee–Knoxville
Chris D. Cox, Department of Civil and Environmental Engineering, Center for Environmental Biotechnology, University of Tennessee–Knoxville
Michael L. Simpson, Molecular Scale Engineering and Nanoscale Technologies Research Group, Oak Ridge National Laboratory; Department of Material Science and Engineering, Center for Environmental Biotechnology, University of Tennessee–Knoxville

Modeling gene regulatory networks has, in some cases, enabled biologists to predict cellular behavior long before such behavior can be experimentally validated. Unfortunately, the extent to which biologists can take advantage of these modeling techniques is limited by the computational complexity of gene regulatory network simulation algorithms. This study presents a new platform-independent, grid-based distributed computing environment that accelerates biological model simulation and, ultimately, development. Applying this environment to gene regulatory network simulation shows a significant reduction in execution time versus running simulation jobs locally. To analyze this improvement, a performance model of the distributed computing environment is built. Although this grid-based system was specifically developed for biological simulation, the techniques discussed are applicable to a variety of simulation performance problems.

Keywords: Computational biology, grid-based simulation, gene regulatory networks, performance model

SIMULATION, Vol. 80, Issue 4–5, April–May 2004, 231-241. DOI: 10.1177/0037549704045051. © 2004 The Society for Modeling and Simulation International.

1. Introduction High-throughput methods in molecular biology have led to an explosion in the volume of biological information regarding the sequences of genes and the proteins they encode. These methods have made possible the complete sequencing of many genomes ranging from viruses to humans. The science of bioinformatics has developed to process, categorize, store, and manage this vast quantity of information; however, interpretation of this information in
the context of how the genes are turned on and off (regulated) represents an even larger challenge. Accurately modeling gene regulation is crucial to understanding the processes by which cells interact with their environment and each other. The ability to analyze and control these cell interactions could lead to the development of new medical treatments for a variety of illnesses ranging from influenza to cancer. Because developing a model of a gene regulatory network requires extensive computation and analysis, our research focuses on supplying biologists with accurate and efficient software tools to accelerate the modeling process. Our research group is part of the DARPA BioSPICE community (www.biospice.org), which aims to develop a suite
of biological research tools, particularly regarding genetic regulatory modeling and simulation. In recognition of the performance bottlenecks that impair computational biology, we exploit grid-based techniques to accelerate simulation and related tasks. We have developed a parallel computing system called the distributed computing environment (DCE) to distribute jobs across a cluster of machines. Biological modeling applications are then developed to use the DCE to perform parallel simulation and data analysis. Our approach is similar to previous work with Netsolve [1], Condor [2], and Globus [3], but the need to support integration with other BioSPICE tools and formats, combined with the goal of making the interface easy for biologists to use, led us to develop the DCE. For a description of the grid as an emerging computation infrastructure, readers are encouraged to see Foster and Kesselman [3]. Our discussion of the DCE begins with an introduction to gene regulatory networks and the algorithms used to model them. This is followed by a detailed discussion of the interworkings of the DCE and how tools are added to the system. A performance model based on data collected from running our gene regulatory network simulator with the DCE is built to formalize our understanding of the performance gain. Finally, a discussion of how we intend to improve the performance of the DCE and the development of biological models using this system is presented. The article will show that using a grid-based parallel simulation environment, we can greatly accelerate the development and simulation of gene regulatory network models. 2. Gene Regulatory Networks The basic unit of information storage in all cells is deoxyribonucleic acid (DNA). DNA contains instructions for making proteins, which are polymeric chains of amino acids that the cell uses to perform various tasks. Instructions for making a protein are encoded in the sequence of four bases—adenine, cytosine, guanine, and thymine— that are the information content of the DNA chain. A sequence of three bases in the DNA molecule is called a codon. Of the 64 possible three-base codons, 61 of them code for 1 of 20 specific amino acids. The other 3 codons are called stop codons because they signify the end of a coding sequence on the DNA. Therefore, the sequence of bases in the DNA corresponds to a sequence of amino acids that make up the protein. The amino acid sequence of the protein and other structural features, such as its shape upon folding, determine the function of the protein. Some information about the function of a protein can be inferred by comparing its sequence with similar proteins of known function. Once we know the sequence and function of a particular protein coded in the DNA, we can turn to the more complicated subject of gene regulation. While the genome of the organism contains all of the information to code all the proteins that the organism will ever need, to function properly, the correct proteins need to be made at the
correct time in response to specific environmental and/or intracellular conditions. This coordination of which proteins are being made at any given time is accomplished by turning on or off the genes that code for the proteins. The regulation of the gene is accomplished by interactions of proteins and other molecules with control regions of the DNA, called operators. With thousands of different genes within a cell and each gene’s activity potentially being regulated by several proteins, the regulatory network of a cell can become quite complex. Therefore, it is common for biologists to investigate the regulatory behavior of a small number of interregulated genes instead of entire genomes. However, even systems comprising only a few genes can exhibit surprisingly complex behavior. Models of genetic regulatory networks can aid in understanding the control mechanisms used by a cell. A review discussing the many types of models available is given by Smolen, Baxter, and Byrne [4]. Essentially, a model of the network consists of mathematical expressions that represent the rates of the various biochemical reactions that occur within the cell. These reactions include protein production and decay, the interaction of proteins and operators to control the protein production rate, and many others. To date, the most noteworthy successes of genetic models have been to predict certain characteristics of cell behavior based on the architecture of the regulatory network. Tyson and his coworkers [5, 6] have used genetic models to represent the cell cycle in frog embryo extracts. Recently, experimental verification of the model predictions has been obtained [7], and it is important to note that these predictions were made through model simulations long before experimental verification was achieved. Arkin, Ross, and McAdams [8] used models to show that the time-dependent variation (i.e., noise) in the populations of various proteins in phage-infected bacteria is exploited in a switch mechanism that partitions some bacteria to be lysogenic (the phage DNA is incorporated into the host DNA, and the virus remains dormant) or lytic (the viruses multiply rapidly in the host and rupture the cell). Ultimately, the goal is to use such models to develop medical treatments for various illnesses ranging from viral infections to cancer. For example, Endy, Kong, and Yin [9] used models to simulate growth of T7 virus phages in Escherichia coli bacteria. The model accounts for entry of the phage genome into the host bacteria, expression of the phage genes, redirection of resources from the host to replication of phage DNA, and production of intact phage progeny. The model was used to predict the efficacy of various drug strategies on the growth of the virus. 3. Modeling Gene Regulatory Networks Since genetic regulatory networks are essentially systems of chemical reactions, an algorithm capable of estimating the time course evolution of a reacting mixture of chemicals is needed. Because we are concerned with modeling gene regulatory networks in prokaryotes, or cells that do
not contain internal structures, we can model the cell as a spatially homogeneous chemical system, meaning that we do not need to keep track of where the molecules are spatially located within the mixture. This simplification allows us to state our problem as follows: “If a fixed volume V contains a spatially uniform mixture of N chemical species that can interact through M specified chemical reaction channels, then given the numbers of molecules of each species present at some initial time, what will these molecular population levels be at any later time?” [10]. Several strategies exist for approaching this problem. The first and most common technique treats the volume V as a continuous system whose behavior is represented by a set of ordinary differential equations with variables that represent concentrations of chemical species [10, 11]. The second strategy treats the volume V as a discrete system whose behavior is represented by stochastic processes, with integer variables representing the numbers of molecules [10, 11]. Many biologists and software developers prefer continuous modeling because discrete modeling is computationally intensive. This technique is effective in many cases but cannot be applied to systems with small numbers of molecules such as small biological cells or surface processes [11]. Current research in frequency domain analysis of gene regulatory networks has also required the use of discrete modeling to validate theoretical models that allow for the analysis of stochastic fluctuations in gene circuits and networks [12]. Our research focuses on accelerating discrete modeling because of its general applicability and superior accuracy. Two algorithms that perform exact stochastic simulation of coupled chemical reactions were first developed by Gillespie in the 1970s, called the first reaction method and the direct method [10]. Gibson and Bruck [11] modified Gillespie’s first-reaction method and created the next reaction method, the most efficient algorithm to date for exactly stochastically simulating coupled chemical reactions. Recently, Gillespie [13] has proposed approximate methods with improved performance, but care must be exercised in applying these approximations to preserve the effects of noise in system performance that is critical to our group’s current biological research. Below is a summary of Gibson and Bruck’s next reaction method, the core algorithm used in the stochastic simulator described throughout this article. For further information on this and other stochastic simulation algorithms, consult Gillespie [10] and Gibson and Bruck [11]. 3.1 The First Reaction Method The next reaction method is the method used by our simulator, the Exact Stochastic Simulator (ESS), to determine the time evolution of chemical species given a set of reaction pathways. This algorithm is most easily understood by first explaining the algorithm that it is derived from, Gillespie’s [10] first reaction method. This algorithm is given as follows:

1. Initialize a list of n chemical species and their initial numbers of molecules, X1, X2, . . . , Xn.
2. Initialize a list of m chemical reactions and their associated stochastic rate constants, k1, k2, . . . , km.
3. Initialize the current time t ← 0.
4. Calculate the propensity, a1, a2, . . . , am, for each of the m chemical reactions.
5. For each reaction i, generate a putative time τi, according to an exponential distribution with parameter ai.
6. Let µ be the reaction whose putative time, τµ, is least.
7. Change the number of molecules, X1, X2, . . . , Xn, to reflect the execution of reaction µ.
8. Set t ← t + τµ.
9. Go to step 4.

To understand this algorithm fully, we must first define the term propensity used in step 4. Consider the following chemical reaction:

A + B → C.   (1)

The probability that the reaction given in equation (1) occurs, or the probability that a given molecule A reacts with a given molecule B, in a small time dt is

P1 = a1 dt + o(dt).   (2)

The propensity refers to the value a1 given in the equation above, which may be a function of volume, temperature, concentration, and so on. In step 4 of the first reaction method, we calculate this propensity based on the stochastic rate constant associated with the reaction k1 and the current populations of the reactants XA and XB, using the following equation:

a1 = XA · XB · k1.   (3)

Multiplying XA and XB together in equation (3) reflects the number of combinations by which the reaction could occur, thus making the propensity depend on the population of the chemical reactants. The input rate constant k1 is used to account for all other factors, such as volume and temperature, that may determine the propensity of the reaction. Step 5 of the algorithm uses the propensity generated in step 4 to generate a putative time, or the amount of time it will take for this reaction to occur based on the current state of the system. This is accomplished by generating a uniformly distributed random number between 0 and 1 (URN), scaling it to fit the exponential distribution, and
dividing that number by the propensity, as shown in the following:

τi = −(1/ai) log(URN).   (4)
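To make equations (3) and (4) concrete, the following Java sketch computes the propensity of the reaction A + B → C and samples a putative firing time from the corresponding exponential distribution. It is a minimal illustration under stated assumptions, not the ESS implementation; the class, method, and parameter names are hypothetical.

import java.util.Random;

/** Minimal sketch of equations (3) and (4) for the reaction A + B -> C. */
public class FirstReactionStep {

    /** Propensity a1 = XA * XB * k1, equation (3). */
    static double propensity(long xA, long xB, double k1) {
        return (double) xA * xB * k1;
    }

    /** Putative time tau = -(1/a) * log(URN), equation (4). */
    static double putativeTime(double a, Random rng) {
        if (a <= 0.0) {
            return Double.POSITIVE_INFINITY; // the reaction cannot fire
        }
        double urn = 1.0 - rng.nextDouble(); // uniform random number in (0, 1]
        return -Math.log(urn) / a;           // exponentially distributed with parameter a
    }

    public static void main(String[] args) {
        Random rng = new Random(42);             // fixed seed for reproducibility
        double a1 = propensity(100, 50, 0.001);  // hypothetical populations and rate constant
        double tau = putativeTime(a1, rng);
        System.out.printf("a1 = %.3f, tau = %.6f%n", a1, tau);
    }
}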

Step 6 selects the reaction with the smallest putative time using a linear search. Step 7 updates the number of molecules based on the stoichiometry of the reaction. For example, if we were executing the reaction given in equation (1), we would decrement the values XA and XB and increment the value of XC. Step 8 updates the current time based on the putative time selected in step 6.

3.2 The Next Reaction Method

The next reaction method makes several enhancements to the first reaction method to dramatically improve performance. The first enhancement is the creation of a dependency graph to determine which reaction's propensity values need to be updated when a particular reaction is executed. In step 4 of the first reaction method, each propensity value is recalculated for every step of the algorithm, but using the dependency graph, the computation required for this step is drastically reduced. The second enhancement reuses exponentially distributed numbers to accelerate step 5. The final enhancement stores the values of the putative times τi for each reaction in an indexed priority queue. This reduces the search time of step 6 of the first reaction method from O(n) to O(log n). The next reaction method algorithm is stated below, and a brief sketch of the putative-time update in step 5 follows the listing. For more information, consult the work of Gibson and Bruck [11].

1. Initialize:
   (a) initialize X1, X2, . . . , Xn, set t ← 0;
   (b) generate a dependency graph G based on the stoichiometry of the m chemical reactions;
   (c) calculate the propensity function, a1, a2, . . . , am, for each of the m chemical reactions;
   (d) for each reaction i, generate a putative time, τi, according to an exponential distribution with parameter ai;
   (e) store the τi values in an indexed priority queue P.
2. Let µ be the reaction whose putative time, τµ, stored in P is least.
3. Let τ be τµ.
4. Change the number of molecules to reflect the execution of reaction µ. Set t ← τ.
5. For each edge (µ, α) in the dependency graph G,
   (a) update aα;
   (b) if α ≠ µ, set τα ← (aα,old/aα,new)(τα − t) + t;
   (c) if α = µ, generate a random number, ρ, according to an exponential distribution with parameter aµ, and set τα ← ρ + t;
   (d) replace the old τα value in P with the new value.
6. Go to step 2.
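As a rough illustration of step 5, the putative-time bookkeeping can be written as follows. This is a sketch under stated assumptions: the indexed priority queue itself is not shown, and the names are hypothetical rather than taken from the published ESS code.

import java.util.Random;

/** Sketch of the putative-time updates in step 5 of the next reaction method. */
public class NextReactionUpdate {

    /** Step 5(b): rescale tau for a dependent reaction alpha != mu. */
    static double rescaleTau(double aOld, double aNew, double tauOld, double t) {
        if (aNew <= 0.0) {
            return Double.POSITIVE_INFINITY; // reaction can no longer fire
        }
        return (aOld / aNew) * (tauOld - t) + t;
    }

    /** Step 5(c): draw a fresh tau for the executed reaction mu itself. */
    static double freshTau(double aMu, double t, Random rng) {
        double urn = 1.0 - rng.nextDouble(); // uniform in (0, 1]
        return t - Math.log(urn) / aMu;      // rho + t with rho ~ Exp(aMu)
    }
}

Either result would then replace the old τα value in the indexed priority queue P, as in step 5(d).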
3.3 Performance of Stochastic Simulation

The performance of stochastic simulation presents a significant challenge to researchers attempting to model genetic regulatory networks. As reported by Endy and Brent [14], the Gillespie method with the enhancements of Gibson and Bruck [11] is capable of achieving approximately 10^10 simulated reactions per day for a 100-reaction system when executing on an 800-MHz Pentium III computer. This translates to roughly 10^5 reactions per second for the problem. The Exact Stochastic Simulator (ESS) tool developed at the University of Tennessee simulates as many as 420,000 reactions per second for smaller systems of reactions [12]. Researchers investigating cellular systems such as E. coli cell doubling could require simulating 10^14 to 10^16 reactions [14]. This computational demand outstrips the capabilities of single-processor workstations by several orders of magnitude, so techniques to improve simulation performance are needed. Further exacerbating this challenge, because each Gillespie simulation represents a sample of the stochastic system behavior, a number of simulations are required to collect statistically meaningful data. Note that each of the simulations is performed in isolation; the results of one are not needed for other concurrently executing simulations. Hence, our grid-based distributed computing approach addresses a critical computational challenge for biological and biotechnology research.

4. The Distributed Computing Environment (DCE)

The DCE is developed to be a general-purpose, grid-based multiagent environment for distributed simulation and data analysis. Applications associated with the DCE can be classified into one of the four groups described in Table 1. The data flow between these groups is illustrated in Figure 1.

4.1 DCE Flexibility

In an attempt to make the DCE as flexible as possible, the system is designed to be platform independent. To achieve this, each application interface is implemented in Java and communicates using platform-independent Java sockets. Application developers who wish to use a simulator or data analysis tool that is not written in Java can access their tool through the Java native interface (JNI) or by making direct system calls to run their tool as a child process.
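For example, a worker wrapping a native (non-Java) simulator could launch it as a child process roughly as follows; the executable name and argument list are purely hypothetical.

import java.io.File;
import java.io.IOException;

/** Hedged sketch of invoking a non-Java simulation tool as a child process. */
public class NativeToolLauncher {

    /** Runs a hypothetical native simulator binary on one input file. */
    static int runSimulator(File input, File output) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(
                "ess_native",                 // hypothetical simulator executable
                input.getAbsolutePath(),
                output.getAbsolutePath());
        pb.redirectErrorStream(true);         // merge stderr into stdout
        Process p = pb.start();
        return p.waitFor();                   // exit code of the child process
    }
}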
Table 1. Distributed computing environment (DCE) components and the tasks they perform

Server
– Manages all data and connections associated with the grid
– Accepts jobs from clients
– Distributes jobs to workers
– Monitors the progress of all jobs
– Retrieves output data from workers
– Sends output data to the clients

Client
– Connects to the server
– Adds data in the form of files to the server
– Adds jobs to be performed on the data to the server
– Monitors the progress of the executing jobs
– Retrieves the results of the jobs

Worker
– Connects to the server
– Registers a set of commands that can be executed by this worker
– Requests jobs to be executed from the server
– Reports progress as it completes a job
– Returns output results to the server

Monitor
– Connects to the server
– Observes the status of all jobs in the server
– Observes the status of all clients and workers associated with the system
– Does not add or remove data from the system

Figure 1. Distributed computing environment (DCE) data flow diagram

To simplify data exchange between elements of the DCE, as well as the process of integrating a non-Java tool, all data within the system are stored and passed as files. This simplifies the software interface by not requiring data to be classified into types or requiring conversions to be performed from Java classes to other languages. A client can add data in any format to the server as long as the
worker knows how to handle these data. This feature can improve performance when handling large amounts of data because instead of storing large messages in memory, the data are stored on the disk and can be accessed randomly.

4.2 Adding Tools to the DCE

The simple DCE programmer interface allows application developers to quickly integrate their simulator or data-processing tool into the DCE system. The programmer first integrates a simulation tool by creating an object derived from the WorkerCommand class. This class requires the programmer to implement the following four functions:

public abstract String getCommandName();
public abstract int getInputCount();
public abstract int getOutputCount();
public abstract void run(Worker worker, File inputs[], File outputs[]);
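For illustration only, a hypothetical command that wraps a simulator might be derived from this class as follows. The sketch compiles only against the DCE classes described here (WorkerCommand, Worker), and the command name and file handling are assumptions.

import java.io.File;

/** Hypothetical WorkerCommand that runs one stochastic simulation job. */
public class SimulateCommand extends WorkerCommand {

    public String getCommandName() {
        return "simulate";   // the name clients would pass to addJob
    }

    public int getInputCount() {
        return 1;            // one model/input file
    }

    public int getOutputCount() {
        return 1;            // one results file
    }

    public void run(Worker worker, File[] inputs, File[] outputs) {
        // A real command would run the simulator on inputs[0] and write its
        // results to outputs[0]; this sketch only signals that work is under way.
        worker.reportProgress();
    }
}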

The getCommandName, getInputCount, and getOutputCount functions return the name of the command, the number of input files required to execute the command, and the number of output files returned by the command. The run command executes the simulator or postprocessing tool on the files contained in the inputs array. Output data are written to the empty files stored in the outputs array. By calling worker.reportProgress(), the run command can periodically report how much of the task it has completed. If an exception is thrown from within this function, an error message is sent to the server and is eventually passed to the corresponding client who requested the job. Exception handling allows the job request to fail gracefully and prevents the worker from prematurely disconnecting from the server. To complete the worker, the application developer creates an executable class that makes three simple function calls that connect to the server, register WorkerCommand objects, and request work from the server.

To call the functions implemented in the workers, the application developer must develop a corresponding client application to act as a run manager. The following commands are used to implement a client:

public ServerConnection(String host, int port, String type)
public FileId addFile(File file)
public JobId addJob(String command, FileId[] inputs, int outputCount)
public Job[] getJob(JobId jobId[])
public void getFile(FileId fileId, File file)
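Put together, a minimal run manager might use these calls roughly as follows. The host, port, file names, and polling loop are illustrative assumptions, and the Job accessors isComplete and getOutputs are hypothetical names for the status and output FileIds that the text says the Job structure carries.

import java.io.File;

/** Hedged sketch of a DCE client submitting one job and collecting its output. */
public class ExampleClient {

    public static void main(String[] args) throws Exception {
        ServerConnection server =
                new ServerConnection("dce-server.example.edu", 9000, "client"); // assumed host/port/type

        FileId model = server.addFile(new File("model.in"));             // upload the input file
        JobId id = server.addJob("simulate", new FileId[] { model }, 1); // expect one output file

        Job job = server.getJob(new JobId[] { id })[0];
        while (!job.isComplete()) {                                      // hypothetical accessor
            Thread.sleep(1000);                                          // poll periodically
            job = server.getJob(new JobId[] { id })[0];
        }

        server.getFile(job.getOutputs()[0], new File("results.out"));    // hypothetical accessor
    }
}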

The client constructs a ServerConnection object that connects the client to the server. The client then adds
data with the addFile function and calls the addJob function to add work to the server. After work has been added to the server, the client enters a loop that makes periodic calls to the getJob function. This function returns a data structure containing information reflecting the current status of the job (Pending, Running, Complete, Error), the percentage of the job that has been completed, the output FileIds, and a string representing the error message if the job has completed with an error. If the job completes successfully, the client calls the getFile function to retrieve the job’s outputs. 4.3 Comparison to Other Grid Management Software The DCE provides capabilities similar to existing grid computing middleware tools but with some important distinctions. To help in understanding the DCE, we now discuss it in relation to several popular gridware tools. For a good introduction to grid systems and related software infrastructure, the reader is encouraged to refer to Foster and Kesselman [3]. In the NetSolve system, client applications communicate computational requests to one of a set of agents. Each agent maintains status information on a collection of servers, including the specific functions they support, benchmarking results to indicate their relative performance, and their current loading status. Based on this information, when a client requests resources to compute a function, the agent determines which server is the best match for the client and returns the server information to the client. The client then sends the function identity and parameters to the server for execution. Once complete, the server communicates the results back to the client. Note that after assigning a server to the client, the agent does not participate further in the computation. In contrast to the DCE centralized management of job assignment and parameter/result file exchange, NetSolve employs a centralized job assignment via the agents, but parameters and results are exchanged directly between clients and servers in a distributed manner. The NetSolve system requires servers to register their computational capabilities with agents by submitting a program description file describing the prototype for each function supported. Each server also provides the agent with performance results from executing a benchmark (such as LAPACK) to differentiate server capabilities. Agents may support multiple servers, and each server can be registered with multiple agents. The agents act as “matchmakers” by identifying the most appropriate server to execute a given function for a client. An important, implicit assumption made by the NetSolve agents in determining the best server to select is that each server contains one or more processors with a von Neumann architecture. In particular, because agents assess server capabilities with a standard benchmark, the agents do not account for servers with custom computation engines, such as hardware accel-
eration units implemented using reconfigurable computing hardware. In prior work with NetSolve, some investigation has been performed into extending NetSolve to support reconfigurable computing platforms [15], but the NetSolve system does not currently provide such capabilities. In contrast, the DCE was developed with the flexibility to support versions of ESS implemented in software only or using reconfigurable computing-based custom accelerators [16].

A number of job-scheduling systems exist for managing grid resources. One particularly popular system is Condor [2]. Condor supports serial or parallel applications executing on a collection of machines in an attempt to exploit the idle cycles of these machines. Condor's usage policy on desktop machines is to not interfere with users at the console. Because it periodically checkpoints applications, Condor will stop jobs and migrate them to another machine when mouse or keyboard activity indicates that the first machine is no longer idle. As with NetSolve, Condor does not support reconfigurable computing-based hardware acceleration engines.

The Globus toolkit [3] employs a flexible, layered architecture to provide higher level services built on core services. For resource management, a set of local resource managers, known as Globus resource allocation managers (GRAMs), each provides access to its local processors. For example, a GRAM could interface with Condor to use a collection of workstations. In addition, Globus provides a suite of other services, such as security and authentication, access to system status, and communications. In principle, a GRAM could interface with the DCE to access a set of applications as well, although this integration into Globus has not yet been implemented. As mentioned with Condor above, none of the resource managers currently supported with Globus provides support for reconfigurable computing-based hardware acceleration.

Because the DCE grid infrastructure, particularly as implemented in the BioGrid tool, attempts to provide a simple interface to help biological researchers perform computational tasks, the programming interface is much less flexible than what is provided with parallel programming libraries such as PVM [17], MPI [18], or the Globus Nexus communications services. Application developers may choose to employ these communications facilities to create parallel processing workers, but the intent of DCE is to hide such details from users.

5. DCE Performance Model

To evaluate the effectiveness of the DCE approach to accelerating biological simulations, we develop a performance model. This model can indicate performance bottlenecks as well as help in predicting performance for wider deployment on the grid. We first consider the stochastic simulation application running on a single, dedicated computer before adding the effects of other users sharing the single machine. In so doing, we isolate the effects of background loading and validate our modeling of this effect. Next, we
consider multiple dedicated computers being used for the computation before once again considering the impact of background load. Because the grid consists of a vast collection of shared, heterogeneous processors, to accurately model the performance of a grid-based simulation application, one must consider each of these issues.

5.1 Dedicated Simulation Engine

The following simple model can provide good accuracy in describing simulation performance:

Tsimulation = tsetup + NR tR + tresults.   (5)

This model assumes that a single-computer workstation is dedicated to a particular simulation task or problem. We use the NR term to express the number of chemical reactions simulated for our application. We can easily model variations in the number of reactions due to different stimuli, random number seeds, or starting conditions (chemical species initial populations) by taking random samples from some distribution and computing expected results for a known distribution. See Papoulis [19] and Peterson and Chamberlain [20] for more information. The tsetup term represents the time spent initializing the simulation, tR represents the time spent computing the results of each reaction, and tresults represents the time processing the results of the simulation (e.g., plotting or saving to disk). We next turn to the performance impact of shared resources such as multitasking processors or shared communications channels.

5.2 Shared Simulation Engine

When a computational resource is shared, the other applications using the resource limit the performance of the simulation application. When the tasks of other users of the shared resources compete with the simulation application, we refer to this as background load imbalance. To model the effects of background load imbalance, the expected value of the maximum task completion time is expressed as the average task completion time multiplied by a load imbalance factor η. Note that the ideal value of η is 1, which corresponds to a fully dedicated system. As the load imbalance worsens, η increases. Determining the value of η is one of the challenging aspects of developing an accurate performance model. The simple model of equation (5) can be extended to reflect the impact of other users of a single, shared simulation engine as shown in the following:

Tsimulation = η(tsetup + NR tR + tresults).   (6)

This method of accounting for the impact of other users of a shared resource has been developed and validated for a number of serial and parallel applications [20]. The η term can be found based on the expected number of background tasks by using a processor-sharing queuing model in which we characterize the arrival and service rates for classes of user and system tasks. Previous research has focused on the arrival and service rates of typical processes on Unix and other systems [21]. In practice, this approach provides good results for typical background loads on workstations.

η = 1 + E{Number of background tasks}.   (7)

The result of equation (7) is then used in equation (6) to predict the simulation performance.
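As a simple illustration of equations (5) through (7), the following sketch predicts the runtime of a single simulation engine with and without background load; the setup and results times are placeholders, while NR and tR are taken loosely from the serial measurements reported in section 6.2.

/** Sketch of the single-engine performance model, equations (5)-(7). */
public class SingleEngineModel {

    /** Equation (5): dedicated engine. */
    static double dedicatedTime(double tSetup, long nR, double tR, double tResults) {
        return tSetup + nR * tR + tResults;
    }

    /** Equations (6) and (7): shared engine with an expected number of background tasks. */
    static double sharedTime(double tSetup, long nR, double tR, double tResults,
                             double expectedBackgroundTasks) {
        double eta = 1.0 + expectedBackgroundTasks;            // equation (7)
        return eta * dedicatedTime(tSetup, nR, tR, tResults);  // equation (6)
    }

    public static void main(String[] args) {
        long nR = 2830686106L;   // total reactions simulated (section 6.2)
        double tR = 3.96e-6;     // seconds per reaction (section 6.2)
        System.out.printf("dedicated: %.1f s%n", dedicatedTime(5.0, nR, tR, 5.0));
        System.out.printf("shared, one background task: %.1f s%n",
                sharedTime(5.0, nR, tR, 5.0, 1.0));
    }
}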

5.3 Multiple, Dedicated Simulation Engines

Thus far, we have discussed using a single simulation engine. We now consider multiple simulation engines used together, starting with a collection of dedicated simulation engines. For years, researchers have developed the infrastructure to effectively exploit a collection of computational resources using parallel and distributed simulation algorithms [22]. To model the performance on multiple computers, we will consider the setup time and results time on each separate processor i, denoted tsetup,i and tresults,i. Similarly, each processor i simulates NR,i reactions over the course of its execution. Because processors completing early must sit idle at the end of their simulations before the series is completed, a maximum operator gives the time required for the last processor to complete its simulation task. For ease of discussion, assume that each processor performs a single simulation (this can be relaxed by appropriate modifications to the setup time, results time, and number of reactions):

Tsimulation_series = max(1≤i≤P) [tsetup,i + NR,i tR + tresults,i].   (8)

If we view the setup, reaction computation, and results times as random variables, we can then consider samples taken from each of these variables for each simulation. We can consider the mean time per simulation by computing the expected value of the sum of these variables. The barrier synchronization at the end of the simulation series, as modeled with the max operator, represents the effect of a different amount of work for each of the processors in the computation. We refer to this difference in work at each processor as the application load imbalance, which is modeled using a scale factor β. The β scale factor represents the amount of work at the busiest processor as compared to the mean work among the processors. In a similar manner, the performance impact of using heterogeneous computer systems can be modeled with another scale factor relative to a baseline system. Because the overall performance of the system is limited by the slowest component, we can model the effects of load imbalances by finding the worst case among all the components as shown as follows:

η = (1/(∆n Bn)) E{max(1≤i≤P) (ηi)} = (1/(∆n Bn)) E{max(1≤i≤P) (βi δi)}.   (9)
βi and δi represent the application load imbalance and the effect of computer heterogeneity, respectively. The ∆n and Bn terms serve as normalization factors. The overall performance is then

Tsimulation_series = η(tsetup + NR tR + tresults).   (10)

In this performance model, the communications costs between the processors are represented in the setup and results terms. As the communications demands are increased, these terms will be increased in turn.

5.4 Multiple, Shared Simulation Engines

In the case of a set of simulators executing on a set of shared computers such as with grid computing, the accurate modeling of its performance requires the consideration of communications, load imbalance, and synchronization costs [20, 23]. For communications with higher latency, high probability of message loss, or significant contention, the model can be refined to include the interplay among the computation and communication portions of the application. In particular, the setup and results transmission phases of the computation will be affected by communications issues. The impact of other users of systems, or background load, complicates the modeling task. Similarly, predicting communications network performance in the presence of contention can be challenging. Although high-fidelity models of these effects can be included, relatively simple models often predict the achieved performance with good accuracy. The scale factors used for modeling application load imbalance, imbalance due to heterogeneous systems, and background loading are combined to model the overall performance of a collaborative engineering simulation consisting of a number of interconnected components. We refine equation (10) to include the background load imbalance through the γi term:

η = (1/(∆n Bn)) E{max(1≤i≤P) (ηi)} = (1/(∆n Bn)) E{max(1≤i≤P) (βi γi δi)}.   (11)

This result is then used with equation (10). The model quantifies the performance effects of application load imbalance, the variance in runtime at each processor caused by an uneven distribution of the computation among the processors, and background load, the performance degradation resulting from other users of the shared resources. The model also accurately characterizes the performance of applications running on heterogeneous workstations with different computational speeds but similar architectures. It should be noted that the βi values from equation (11) can be set by a scheduler to help balance the workload among the available computational resources. In particular, grid-computing systems such as Netsolve [1] provide loading information as well as computer speed metrics to
support such scheduling decisions by users. Such extensions to DCE are in development. The performance model discussed here has been validated for a set of applications running on parallel and distributed computing platforms. More specifically, homogeneous and heterogeneous configurations were used with shared or dedicated access. The applications included cases with no application load imbalance, deterministic application load imbalance, and random application load imbalance due to the stochastic nature of the problem. Hence, the general modeling framework has been shown to provide good results in predicting distributed application performance [20]. We turn now to the specific case of the DCE executing a set of biological stochastic simulations.

6. Applying the DCE to Biology

To discover more about the underlying biology that controls the gene regulatory network that we are modeling, biologists in our research group often require many simulations to be run on a single model. Commonly, they build models in which several of the rate parameters associated with each of the reactions are unknown. To determine these parameters, they measure the real-life behavior of a cell in a laboratory, collect the data, and then sweep model parameter values to see which configuration best fits the data. They also normally wish to run the simulation for multiple seed values for more statistical confidence. If no parameters exist that fit the data properly, they examine the model and the biology to see if there may be more complexity in the gene regulatory network that is not represented in the model or is not yet biologically known. To accelerate this iterative process, we have implemented workers and clients that allow biologists to easily run multiple-simulation jobs over a network of machines to efficiently sweep a list of parameter and random seed values using the DCE. To demonstrate a biological example of this and to validate our performance model, we present the following use case: a model for quorum sensing in Vibrio fischeri.

6.1 Quorum Sensing in Vibrio fischeri

Quorum sensing is a cell-cell communication mechanism employed by a wide variety of bacteria to determine if a critical mass of cells for a particular purpose are present (for reviews, see Greenberg [24] and Withers, Swift, and Williams [25]). For example, the bacteria that infect cystic fibrosis patients use quorum sensing to determine if enough cells are present to exert virulence in the host. If cells attempt to exert their virulence factors before a critical mass of cells are present, the infection is likely to be unsuccessful. Another example of a process controlled by quorum sensing is bioluminescence by the bacteria V. fischeri, which is the basis of this use case. The bacteria live in a symbiotic relationship with certain species of squid. Light produced by the bacteria helps camouflage the squid; in return, the squid provide the bacteria with a rich growth medium. The
bacteria do not produce light unless their population density exceeds a critical mass, as detected by the quorum-sensing mechanism. A simplified schematic representation of the quorum-sensing process is shown in Figure 2. Small molecules, called autoinducers (AI), are produced by the bacteria (via the LuxI protein) and can freely diffuse in and out of the cell. At low cell densities, the cell maintains low levels of LuxI, and hence AI is produced at a low rate. However, as the cell population density increases, eventually the AI concentration will increase to a level that the cells can detect, using another protein called LuxR. The AI reacts with the LuxR to form a chemical complex, which in turn binds to a regulatory region on the DNA known as the lux box, the effect of which is to increase the transcription rate of both the luxR and luxI genes. Increases in LuxI cause a stronger chemical signal to be sent, while increases in LuxR increase the cells' ability to detect the signal, resulting in a positive feedback loop. The genes responsible for bioluminescence (luxCDABEG) are adjacent to and under the same regulatory control as the luxI genes and thereby are also expressed to a greater extent at high cell densities, causing the cell to bioluminesce. We have developed a stochastic model of the quorum-sensing system in V. fischeri consisting of 23 chemical reactions among 16 chemical species [12]. Stochastic simulations were conducted to determine the population of LuxI protein as a function of 20 different steady-state AI levels that represent different cell population densities and 5 different seed values to determine the induction response of the cells. Each simulation was conducted for 10,000 simulated seconds, with the cell state reported every 10 seconds. We used this set of simulations as a test case to demonstrate the performance of the DCE and to validate the performance model.

6.2 Performance of the DCE

To evaluate the performance of the DCE, we consider speedups and efficiency for a typical problem. We then apply the performance model developed above to better understand the DCE as well as provide for analysis of trends. As a baseline, we executed the 100 simulation jobs sequentially without using the DCE on a lightly loaded Sun Ultra-60 workstation running Sun OS v5.8. The total execution time for this run was 11,222.40 seconds, or over 3 hours. We then performed a series of experiments using the same set of 100 simulation jobs executing on a cluster of up to 14 processors identical to the sequential case. In all of the experiments, the processors were lightly loaded. The simulation execution times and associated data are given in Table 2. As one can see with these results, the DCE provides quite good results, with nearly ideal speedup and efficiency for the simulations of interest. Note that a biologist using the serial version of the simulator must wait more than 3 hours for the set of simulations to complete, as opposed to
less than 14 minutes when using 14 workstations via the DCE. This not only represents an impressive speedup in the simulations but also enables a much more aggressive use of virtual experimentation in understanding biological problems. For the experiments performed with this biological use case, we employed a collection of homogeneous workstations that were lightly loaded. Hence, the background load imbalance and heterogeneity have no appreciable impact, and their associated terms from equation (11) are set to 1. Similarly, we distribute the application workload evenly across the machines for each experiment, so the application load imbalance is similarly negligible, and these terms are also set to 1. From the serial experiments, we measure that the total number of reactions simulated, NR, is 2,830,686,106. We also determine that the computation time per reaction, tR, is 3.96 microseconds. We can measure the setup and results times with the DCE employing a single worker executing on a workstation, as compared to the case with a single workstation executing the simulations as above. Using all these terms in equation (10), we find that the model predicts performance within 3% of the measured values in each case. Therefore, we can conclude that the performance model provides sufficient insight to understand DCE performance with homogeneous, lightly loaded resources. Additional studies under more general conditions remain to be performed.

7. DCE Future Enhancements

Although the current DCE architecture performs well, a number of enhancements could be made to improve its performance and features. The DCE implementation presented in this article is now distributed as the underlying architecture of the BioGrid application. The BioGrid application fits within the BioSPICE (http://www.biospice.org) development program funded by DARPA and the NSF and is intended to seamlessly support the execution of a wide variety of application modules. In particular, our development group has developed the ESS stochastic simulation application, an interface to Octave, and BioSmokey for postprocessing results. Other groups within the BioSPICE community provide such tools as simulators for ordinary differential equations, stochastic differential equation simulation, and parameter estimation tools that generate large numbers of simulations. Many of the future development plans for the software regard enhancing performance. To reduce network bandwidth, we are considering enhancing the worker and client interfaces to compress large files as they are added to the server and to decompress files as they are retrieved. We will also investigate policies for access control for submission of jobs, such as supporting job queues with various buffer sizes, as well as limits on job file sizes or runtimes. Similarly, we will investigate scheduling policies for jobs on workers, including the number of jobs per worker,
Figure 2. Quorum sensing in Vibrio fischeri

Table 2. Distributed computing environment (DCE) performance

Processors   Ideal (sec)   DCE Actual (sec)   Speedup   Utilization (%)
2            5611.20       5663.5             1.98      99.08
4            2805.60       2818.9             3.98      99.53
6            1870.40       1919.7             5.85      97.43
8            1402.80       1439.9             7.79      97.42
10           1122.24       1139.1             9.85      98.52
12            935.20        954.1            11.76      98.02
14            801.60        832.5            13.48      96.28

taking into account the effect of background loading in the scheduling, costs associated with each worker machine, and heterogeneous aspects of the worker machines. Another potential optimization is to account for dependencies between jobs when scheduling jobs on workers to minimize the communications costs. As mentioned above, integration with the Globus GRAM interface will also be investigated. Our goal is that these capabilities be implemented within the DCE interface and remain transparent to the user.

8. Conclusions

The speed of gene regulatory network simulation is a limiting factor in biologists' ability to analyze cell behavior. Using a simple grid-based distributed computing environment, we have shown that we can accelerate this process significantly. By reducing simulation time through improvements to simulation algorithms and high-performance grid-based computing environments, a roadblock to widespread application of these models to important biological problems can be removed.

9. Acknowledgments

This work was partially supported by the National Science Foundation via grants 0075792, 0130843, 0311500, and 9972889; the Defense Advanced Research Projects Agency BioComputation program; the University of Ten-

nessee Center for Environmental Biotechnology; and the University of Tennessee Center for Information Technology Research. We gratefully acknowledge their support. We would also like to thank the reviewers for SIMULATION, whose insightful comments and suggestions greatly strengthened this work.

10. References

[1] Casanova, H., and J. Dongarra. 1997. NetSolve: A network-enabled server for solving computational science problems. International Journal of Supercomputer Applications and High Performance Computing 11 (3): 212-23.
[2] Thain, D., T. Tannenbaum, and M. Livny. 2003. Condor and the Grid. In Grid computing: Making the global infrastructure a reality, edited by F. Berman, A. J. G. Hey, and G. Fox. New York: John Wiley.
[3] Foster, I., and C. Kesselman, eds. 1999. The grid: Blueprint for a new computing infrastructure. New York: Morgan Kaufmann.
[4] Smolen, P., D. A. Baxter, and J. H. Byrne. 2000. Modeling transcriptional control in gene networks: Methods, recent results, and future directions. Bulletin of Mathematical Biology 62:247-92.
[5] Marlovits, G., C. J. Tyson, B. Novak, and J. J. Tyson. 1998. Modeling M-phase control in Xenopus oocyte extracts: The surveillance mechanism for unreplicated DNA. Biophysical Chemistry 72:169-84.
[6] Novak, B., and J. J. Tyson. 1993. Modeling the cell-division cycle: M-phase trigger, oscillations, and size control. Journal of Theoretical Biology 165:101-34.
[7] Sha, W., J. Moore, K. Chen, A. D. Lassaletta, C.-S. Yi, J. J. Tyson, and J. C. Sible. 2003. Hysteresis drives cell-cycle transitions in Xenopus laevis egg extracts. Proceedings of the National Academy of Sciences of the United States of America 100:975-80.
[8] Arkin, A., J. Ross, and H. H. McAdams. 1998. Stochastic kinetic analysis of developmental pathway bifurcation in phage lambda-infected Escherichia coli cells. Genetics 149:1633-48.
[9] Endy, D., D. Kong, and J. Yin. 1997. Intracellular kinetics of a growing virus: A genetically structured simulation for bacteriophage T7. Biotechnology and Bioengineering 55:375-89.
[10] Gillespie, D. T. 1977. Exact stochastic simulation of coupled chemical reactions. Journal of Physical Chemistry 81:2340-61.
[11] Gibson, M. A., and J. Bruck. 2000. Efficient exact stochastic simulation of chemical systems with many species and many channels. Journal of Physical Chemistry A 104:1876-89.
[12] Cox, C. D., G. D. Peterson, M. S. Allen, J. M. Lancaster, J. M. McCollum, D. Austin, L. Yan, G. S. Sayler, and M. L. Simpson. 2003. Analysis of noise in quorum sensing. OMICS: A Journal of Integrative Biology 7 (3): 317-34.
[13] Gillespie, D. T. 2001. Approximate accelerated stochastic simulation of chemically reacting systems. Journal of Chemical Physics 115 (4): 1716-33.
[14] Endy, D., and R. Brent. 2001. Modeling cellular behaviour. Nature 409:391-95.
[15] Lehrter, J. M., F. N. Abu-Khzam, D. W. Bouldin, M. A. Langston, and G. D. Peterson. 2002. On special-purpose hardware clusters for high-performance computational grids. In Proceedings of the 14th IASTED International Conference on Parallel and Distributed Computing and Systems.
[16] McCollum, J. M., J. M. Lancaster, and G. D. Peterson. 2003. Using reconfigurable computing to accelerate simulation applications. In Proceedings of the International Conference on Engineering of Reconfigurable Systems and Algorithms, pp. 308-11.
[17] Sunderam, V. S. 1990. PVM: A framework for parallel distributed computing. Concurrency: Practice and Experience 2 (4): 315-30.
[18] Snir, M., S. Otto, S. Huss-Lederman, D. Walker, and J. Dongarra. 1998. MPI: The complete reference. 2d ed. Cambridge, MA: MIT Press.
[19] Papoulis, A. 1984. Probability, random variables, and stochastic processes. New York: McGraw-Hill.
[20] Peterson, G. D., and R. D. Chamberlain. 1996. Parallel application performance in a shared resource environment. IEE Distributed Systems Engineering Journal 3 (1): 9-19.
[21] Leland, W., and T. Ott. 1986. Load balancing heuristics and process behavior. In Proceedings of PERFORMANCE '86 and ACM SIGMETRICS, Raleigh, NC, pp. 54-69.
[22] Fujimoto, R. M. 1990. Parallel discrete event simulation. Communications of the ACM 33 (10): 30-53.
[23] Peterson, G. D. 2001. Performance tradeoffs for emulation, hardware acceleration, and simulation. In System on chip methodologies and design languages, edited by Jean Mermet. New York: Kluwer Academic.
[24] Greenberg, E. P. 2000. Acyl-homoserine lactone quorum sensing in bacteria. Journal of Microbiology 38:117-21.
[25] Withers, H., S. Swift, and P. Williams. 2001. Quorum sensing as an integral component of gene regulatory networks in Gram-negative bacteria. Current Opinion in Microbiology 4:186-93.

James M. McCollum is a graduate research assistant in the Department of Electrical and Computer Engineering, University of Tennessee–Knoxville. Gregory D. Peterson is an assistant professor in the Department of Electrical and Computer Engineering, University of Tennessee–Knoxville. Chris D. Cox is an associate professor in the Department of Civil and Environmental Engineering, Center for Environmental Biotechnology, University of Tennessee–Knoxville. Michael L. Simpson is a distinguished scientist and professor in the Molecular Scale Engineering and Nanoscale Technologies Research Group, Oak Ridge National Laboratory, Department of Material Science and Engineering, Center for Environmental Biotechnology, University of Tennessee–Knoxville.
