Methodologies in the Use of Computational Models for Theoretical Biology

C. C. Maley
Computer Science Dept.
University of New Mexico
Albuquerque, NM 87131
[email protected] 505-277-8912

Mathematical and computational approaches to biological questions, a marginal activity a short time ago, are now recognized as providing some of the most powerful tools in learning about nature; such approaches guide empirical work and provide a framework for synthesis and analysis. (Levin et al., 1997)

The recent rise of computational models in theoretical biology has placed both methodological and technical demands on researchers. This paper is a brief discussion of the methodological issues I have considered and the approaches I have taken in the course of my research. It is almost assuredly philosophically naive, but it is my hope that these methodologies may prove useful to the reader.

I am chiefly concerned with the use of computational models in theoretical biology. Some of what I will say below may apply to numerical analysis and analytical models, but I will restrict myself to the discussion of configuration models (Caswell and John, 1992), which are also called agent-based or individual-based models. These are models in which each individual is explicitly represented along with the rules that govern the interactions between individuals.

I take a theory to be a proposed explanation for a phenomenon. It is thus a sort of idea. In contrast, I view a model as an instantiation of a theory, an expression of that idea. Just as there are many ways to express a single idea, there are many possible instantiations of a theory. I should note that there is a non-trivial leap between theory and instantiation. A model may be an inaccurate instantiation of the theory. Such a model would be useless in the endeavor to explore the implications of the theory.

We seek to build theories in order to understand biology. That understanding requires the extraction of the essence of a system from the details of the biology. The alternative is an exhaustive description of the system. While descriptions help to expand our knowledge, understanding requires simplification, i.e. theory. We use abstractions, such as computational and analytical models, to say something general about life. Such a model proposes that the missing details are irrelevant. Whether they are or not is a scientific question that can be addressed through experimental tests of the model's predictions in the field or laboratory. The relevance of theoretical results can also be examined through the study of models that relax assumptions and add complexity to the original models.

I see two high-level approaches one might take to the construction of a model that expresses the essence of a biological system. One can either start with a simple model and add complexity incrementally, or one can start with a highly detailed model and then pare it down to its bare essentials.

1 Incremental Complexity

One common theoretical approach in science is to begin with a simple or "toy" model that represents a first attempt to explain the data. Then, as more data is collected and inconsistencies are found between the model and experiment, the model is incrementally elaborated. Complexity is added only when observation of the real world shines light upon an inadequacy of the model. Alternatively, understanding can be approached from the theoretical side, by the extension of theoretical results through the relaxation of simplifying assumptions. A third approach is to explore a formalism that expresses a dynamic of interest.

1.1 Exploring Experimental Hypotheses

A modeler should be able to justify why she included a particular aspect of the real system in her model while leaving others out. Biological systems characteristically exhibit a vast amount of detail and complexity. How might one proceed to develop a toy model in the face of such complexity? The approach I prefer to take is one of hypothesis testing. If we focus on a specific hypothesis, we can let that hypothesis dictate the contents of the model. This methodology follows in six steps:

1. Specify the biological hypothesis.
2. Extract the essential components.
3. Implement those components in a model.
4. Test the model for realism.
5. Run experiments with the model to test the hypothesis.
6. Use the results to:
   (a) Illuminate missing essential components of the hypothesis.
   (b) Qualify the hypothesis.

The first step is to choose and specify a particular hypothesis to be tested. In Maley (1998a) one of the hypotheses we tested was the proposal that geographical isolation has a strong[1] impact on speciation. The second step is to attempt to extract from the hypothesis the necessary and sufficient components that must be modeled in order to test the hypothesis. In this case, in order to ground the species-level dynamics in the microevolutionary dynamics we needed: organisms, heredity, mutation, differential survival, reproductive barriers (to define species), spatial structure, and migratory barriers. The third step is then to implement each of these components in the model.

[1] "Strong" here is understood to be relative to other factors that were tested for their impact on speciation.

Once the model has been constructed, the fourth step is to test or calibrate it. In cases where direct experimental data is currently unavailable, progress toward testing the model can still be made by reference to other theoretical results that have themselves been tested (Maley, 1998b). In a multi-scale model that attempts to ground high-level phenomena in the dynamics of lower levels, it is often the case that we already know something about how those lower levels behave in the real world. In our example of speciation, we have represented the evolution of populations as well as ecosystems. We can test the model to see if it exhibits appropriate evolutionary and ecological dynamics like adaptation, predator-prey oscillations, trophic cascades, competitive exclusion, and even the species-area power law (Maley, 1998b,a).

Finally, after testing the realism of the model, we are ready to run experiments to test our hypothesis. The results can be used either to identify missing essential components of the hypothesis or to qualify the conditions under which the hypothesis holds. In our example, it turned out that geographical isolation was indeed the strongest factor tested that affected speciation, while other hypotheses, like differential selection on populations due to habitat heterogeneity, did not demonstrate statistically significant effects. These results should be viewed as predictions generated by the model, not information about real biology.

One potential problem with this methodology is that choices of implementation can have a significant impact on the results of the model. The indirect approach to biology through modeling is similar to the indirect approaches to cognition used in psychology. It is thus perhaps wise to follow the psychologists in their insistence on obtaining similar results from different approaches before those results are accepted scientifically. In other words, the results of a model should be corroborated by other models that use different implementations before those results can be accepted as insights into the hypothesis.

We expect theoretical biology to be revolutionized by computational models. Mathematical biology has often been ignored by experimentalists because the complexity of life has generally been intractable for analytical models. Computational models hold out the hope of expressing more of that complexity than can be easily manipulated in analytical models. We may thus catch a glimpse of biological dynamics which we have never seen before. This strength of computational models is also their bane. If the dynamics of a system have not yielded to field and laboratory experiments, any predictions generated by a computational model will be difficult to verify.
This is particularly critical in the study of evolution and ecology, where the systems of interest play out over temporal and spatial scales that cannot be easily manipulated in a controlled experiment. As modelers, we bear the burden of generating predictions that can be falsified through experimentation and observation of the real world. Furthermore, if we wish to contribute positively to progress in biology, we should spell out those experiments that should bear upon our results.
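The hypothesis-driven steps of Section 1.1 can be sketched with a deliberately tiny individual-based model. The following Python is an illustration only and is not the model of Maley (1998a): it assumes two demes, a single heritable numeric trait, Gaussian mutation, and a hypothetical `migration_rate` parameter standing in for migratory barriers. It omits differential survival and reproductive barriers, and it uses divergence between deme means as a crude stand-in for speciation. All names and parameter values are invented for the example.

```python
import random
import statistics


def run_replicate(migration_rate, rng, deme_size=50, generations=150,
                  mutation_sd=0.1):
    """Run one toy two-deme simulation.

    Returns the absolute difference between the mean trait values of the
    two demes after the final generation.
    """
    demes = [[0.0] * deme_size, [0.0] * deme_size]
    for _ in range(generations):
        next_demes = []
        for home in (0, 1):
            offspring = []
            for _ in range(deme_size):
                # Migratory barrier: with probability migration_rate the
                # parent is drawn from the other deme, else from the home deme.
                source = 1 - home if rng.random() < migration_rate else home
                parent = rng.choice(demes[source])
                # Heredity plus mutation: inherit the trait with added noise.
                offspring.append(parent + rng.gauss(0.0, mutation_sd))
            next_demes.append(offspring)
        demes = next_demes
    return abs(statistics.fmean(demes[0]) - statistics.fmean(demes[1]))


def mean_divergence(migration_rate, replicates=20, seed=1):
    """Average divergence over several replicate runs with a fixed seed."""
    rng = random.Random(seed)
    runs = [run_replicate(migration_rate, rng) for _ in range(replicates)]
    return statistics.fmean(runs)


# Step 5: run the experiment under two conditions and compare.
isolated = mean_divergence(migration_rate=0.0)
connected = mean_divergence(migration_rate=0.1)
print(f"isolated: {isolated:.3f}  connected: {connected:.3f}")
```

In the spirit of the corroboration argument above, a result from a sketch like this would only count as a prediction of this particular implementation. Repeating the comparison with a differently built model (for example, a different mutation scheme or spatial layout) would be needed before reading anything into it.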
1.2 Extending Theoretical Results

Another form of incremental complexity makes no reference to data but is intended to extend theoretical results. Most theoretical models make unrealistic assumptions. In particular, analytical models tend to lack spatial structure and often describe populations as infinite and by means of continuous variables. Computational models can be used to relax these assumptions in order to determine if the analytical results still hold under less restrictive assumptions.

Collins and Jefferson (1992) did exactly this. They extended the results of Kirkpatrick's analysis of sexual selection. Kirkpatrick's model assumes infinite populations, no mutation, females with global knowledge of the population, complete mixing (no spatial structure), and haploid organisms. Collins and Jefferson (1992) relax each of these assumptions in turn to determine if Kirkpatrick's results apply to populations with more realistic characteristics. With the exception of the invasion of recessive alleles into a diploid population, Kirkpatrick's results do indeed generalize.

In a similar vein, Boerlijst and Hogeweg (1991) examined the hypercycle theory for the origin of life. This theory had suffered in competition with the auto-catalytic RNA hypothesis because hypercycles are unstable to mutations that produce "parasites." However, the addition of spatial structure to the theory produced hypercycles that were robust to the emergence of parasites, because mutant parasitic molecules could be spatially sequestered from the core of a hypercycle by the spiraling waves of protein products produced by the core. Thus, the addition of spatial structure to the hypercycle model has revitalized that theory as an explanation for the origins of life (May, 1991).
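The practice of relaxing one analytical assumption at a time can be illustrated with a case much simpler than Kirkpatrick's model. The sketch below is not the Collins and Jefferson (1992) simulation. It assumes a single haploid locus with a hypothetical selective advantage `s`, and compares the standard deterministic recursion for an infinite population with a finite population in which each generation is formed by random sampling. The population size, number of replicates, and function names are all invented for the example.

```python
import random


def next_frequency(p, s):
    """Deterministic update of allele frequency under haploid selection."""
    return p * (1.0 + s) / (1.0 + s * p)


def deterministic_final(p0, s, generations):
    """Infinite-population prediction: iterate the recursion without noise."""
    p = p0
    for _ in range(generations):
        p = next_frequency(p, s)
    return p


def finite_fixation_fraction(pop_size, s, replicates, seed=0):
    """Relax the infinite-population assumption.

    Each replicate starts from a single copy of the favoured allele and
    runs until the allele is lost or fixed. Returns the fraction of
    replicates that end in fixation.
    """
    rng = random.Random(seed)
    fixed = 0
    for _ in range(replicates):
        count = 1
        while 0 < count < pop_size:
            p = next_frequency(count / pop_size, s)
            # Sampling pop_size offspring introduces genetic drift.
            count = sum(1 for _ in range(pop_size) if rng.random() < p)
        if count == pop_size:
            fixed += 1
    return fixed / replicates


infinite_result = deterministic_final(p0=0.01, s=0.1, generations=200)
finite_result = finite_fixation_fraction(pop_size=100, s=0.1, replicates=200)
print(f"infinite population: final frequency {infinite_result:.3f}")
print(f"finite population: fraction of runs fixed {finite_result:.3f}")
```

The comparison asks the same question as the studies above: does a conclusion drawn under the idealized assumption (here, that a favoured allele always spreads) survive once the assumption is dropped? In this toy case drift can eliminate the allele while it is rare, so the deterministic result holds only in a qualified form.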
1.3 Exploratory Formalisms

A third, and perhaps more radical, approach to understanding biology is to develop a formal structure, usually a computational model, that expresses some dynamic of interest. Then, the researcher may explore how the mechanisms in the model produce the dynamic of interest so as to understand in principle how that dynamic might come about. The question of whether the mechanisms of the model explain the phenomena of nature is a problem that may be addressed separately. Realism and connection to biology may then be increased by incrementally adding complexity to the formalism. Bedau discusses this methodology in the following paper with far more sophistication than I could hope to muster. So I shall leave that topic to him with the exception of one parting thought.

Anil Somayaji (pers. comm.) proposed that the role of artificial life models in biology might be thought of as analogous to the role of mathematics in physics. Much of mathematics has been explored as interesting in its own right, as pure formal structures. However, it just so happens that some of these formal structures can be co-opted as models for physical phenomena. Thus, we may want to provide some space for artificial life researchers to explore the structures and implications of their models inspired by, but without reference to, any particular biological system.
2 Decremental Complexity

An altogether different approach to the theoretical understanding of a system is what I will call "decremental complexity." This process begins with as full a description of the biological system as possible. A model is then constructed from this description, in all its manageable complexity, and then calibrated against experimental data. If the theoretician can get this far, the real game begins. We can now begin removing details of the model one by one and in various combinations. After each reduction in the model's complexity, its dynamics can be measured and compared to the fully detailed system. If the removal of a piece of the model has no significant effect on the results of the model, we may hypothesize that it does not contribute to the essence of the system, and so should be discarded from the model. By picking away at the model in this way, we may be able to distill the signal from the noise, the essence from the description (R. C. Lewontin, pers. comm.).

The efficacy of this approach is, to my knowledge, yet untested. The main obstacle is to find a system whose details are known sufficiently well that it may be "fully described" and then calibrated against experimental results. Bacteriophage lambda suggests itself as a potential test case (Ptashne, 1992).
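The removal procedure described in this section amounts to a loop over model components. The following Python sketch is purely illustrative: the "full" model is an invented discrete-time population model with three hypothetical components (`density_dependence`, `harvest`, `seasonality`), the summary statistic is the mean population size late in the run, and the 5% tolerance is an arbitrary assumption. A real application would start from a calibrated, detailed model and a statistically justified comparison.

```python
import math

# Hypothetical component names for an invented toy model.
FULL_MODEL = frozenset({"density_dependence", "harvest", "seasonality"})


def simulate(components, steps=200):
    """Toy population model; returns mean size over the last 48 steps."""
    n = 10.0
    history = []
    for t in range(steps):
        growth = 0.5
        if "seasonality" in components:
            # Small periodic modulation of the growth rate (period 12 steps).
            growth *= 1.0 + 0.01 * math.sin(2.0 * math.pi * t / 12.0)
        crowding = n / 1000.0 if "density_dependence" in components else 0.0
        harvest = 0.2 if "harvest" in components else 0.0
        n = n + growth * n * (1.0 - crowding) - harvest * n
        history.append(n)
    return sum(history[-48:]) / 48.0


def ablation_report(full=FULL_MODEL, tolerance=0.05):
    """Remove each component in turn and compare with the full model.

    Maps each component to True when removing it changes the summary
    statistic by no more than `tolerance` (relative), i.e. when it is a
    candidate for being discarded.
    """
    baseline = simulate(full)
    report = {}
    for component in sorted(full):
        reduced = simulate(full - {component})
        relative_change = abs(reduced - baseline) / abs(baseline)
        report[component] = relative_change <= tolerance
    return report


for name, removable in ablation_report().items():
    print(f"{name}: {'candidate to discard' if removable else 'keep'}")
```

This sketch removes components only one at a time. Removing them "in various combinations," as the text proposes, would mean iterating over subsets of the component set, for instance with `itertools.combinations`, at the cost of many more runs.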
3 Closure

The promise of computational models is that they can represent the complexity and dynamics of biological systems in a formal context. Configuration models typically have explicit representations of individuals or cells, and genes. The match between these components of configuration models and the objects of experimental research in biology makes the configuration models readily accessible to (and thus criticizable by) experimentalists. If we are clear and rigorous enough in our methodology, computational models may help to close the yawning chasm that has traditionally separated theoretical and experimental biology.
Acknowledgements

I would like to thank Mark Bedau for his helpful comments on this manuscript as well as Stephanie Forrest and Derek Smith for their discussions and support. This work was supported in part by ONR Grant N00014-99-1-0417 to Stephanie Forrest.
References
Boerlijst, M. and Hogeweg, P. (1991). Spiral wave structure in prebiotic evolution: Hypercycles stable against parasites. Physica, 48D:17-28.

Caswell, H. and John, A. M. (1992). From the individual to the population in demographic models. In DeAngelis, D. and Gross, L., editors, Individual-Based Models and Approaches in Ecology, pages 36-61, New York. Chapman and Hall.

Collins, R. and Jefferson, D. (1992). The evolution of sexual selection and female choice. In Varela, F. J. and Bourgine, P., editors, Toward a Practice of Autonomous Systems: Proceedings of the First European Conference on Artificial Life, pages 327-336, Cambridge, MA. MIT Press.

Levin, S. A., Grenfell, B., Hastings, A., and Perelson, A. S. (1997). Mathematical and computational challenges in population biology and ecosystems science. Science, 275:334-343.

Maley, C. C. (1998a). The Evolution of Biodiversity: A Simulation Approach. PhD thesis, Massachusetts Institute of Technology, Cambridge, MA.

Maley, C. C. (1998b). Models in evolutionary ecology and the validation problem. In Adami, C., Belew, R. K., Kitano, H., and Taylor, C. E., editors, Artificial Life VI, pages 423-427. MIT Press, Cambridge, MA.

May, R. M. (1991). Hypercycles spring to life. Nature, 353:607-608.

Ptashne, M. (1992). A Genetic Switch: Phage λ and Higher Organisms. Cell Press and Blackwell Scientific Publications, Cambridge, MA.