Ontologies for Formal Representation of Biological Systems

Nigam Shah and Mark Musen
Stanford Medical Informatics, Stanford, CA, USA 94305

1 Introduction

This chapter provides an overview of how the use of ontologies may enhance biomedical research by providing a basis for formalized and shareable descriptions of models of biological systems. A wide variety of artifacts are labeled as “ontologies” in the biomedical domain, leading to much debate and confusion.

The most widely used ontological artifacts are controlled vocabularies (CVs). A CV provides a list of terms whose meanings are specifically defined. Terms from a CV are usually used for indexing records in a database. The Gene Ontology (GO) is the most widely used CV in databases serving biomedical researchers. The GO provides terms for declaring the molecular function (MF), biological process (BP) and cellular component (CC) of gene products. The statements comprising these MF, BP and CC declarations are called annotations [51], which are predominantly used to interpret results from high-throughput gene expression experiments [27, 53]. Arguably, CVs provide the most value for effort in terms of facilitating database search and interoperability.

The second most prevalent kind of artifact is an information model (or data model). An information model provides an organizing structure for information pertaining to a domain of interest, such as microarray¹ data, and describes how different parts of the information at hand, such as the experimental condition and sample description, relate to each other. In biomedical research, the Microarray Gene Expression Object Model (MAGE-OM) is an example of

¹ An automated technique for simultaneously analyzing thousands of different DNA sequences or proteins affixed to a thumbnail-sized “chip” of glass or silicon. DNA microarrays can be used to monitor changes in the expression levels of genes in response to changes in environmental conditions or in healthy vs. diseased cells. Protein arrays can be used to study protein expression, protein–protein interactions, and interactions between proteins and other molecules. Source: www.niaaa.nih.gov/publications/arh26-3/165-171.htm

S. Staab and R. Studer (eds.), Handbook on Ontologies, International Handbooks on Information Systems, DOI 10.1007/978-3-540-92673-3, © Springer-Verlag Berlin Heidelberg 2009


a widely known information model. MAGE-OM, along with the controlled terms that are used to populate the information model, is referred to as the Microarray Gene Expression Data (MGED) Ontology. The MGED Ontology is used to describe the minimum information about a microarray experiment that is essential to make sense of the numbers comprising the microarray data.

The third kind of artifact is an ontology in its true sense, which is increasingly being used for knowledge representation in biomedicine. In this interpretation, an ontology is a specification of entities (or concepts) and relationships among them in a domain of discourse, along with declarations of the properties of each relationship and, in some cases, a set of explicit axioms defined for those relations and entities. In biomedical research, several ontologies are striving towards this goal. The foremost is the Foundational Model of Anatomy (FMA), which is a computer-based knowledge source for anatomy and represents classes and relationships necessary for the symbolic modeling of the structure of the human body in a form that is understandable to humans and is also navigable, parseable, and interpretable by machine-based systems [44]. The biomedical research community is perhaps the farthest along in recognizing the need for, and starting an organized effort toward, the creation of ontologies that serve as formal knowledge representations [47].

1.1 Uses of Ontologies in Biomedical Research

With the advent of high-throughput technologies,² biomedical research is undergoing a revolution in terms of the amount and types of data available to the scientist. On the one hand, there is an abundance of individual data types such as gene and protein sequences, gene expression data, protein structures, protein interactions and annotations. On the other hand, there is a shortage of tools and methods that can handle this deluge of information and allow a scientist to draw meaningful inferences.
Currently, a significant amount of time and energy is spent in merely locating and retrieving information rather than thinking about what that information means. For example, a researcher trying to understand how the proteins participating in the cell cycle interact with each other has to read several reviews to determine the list of proteins s/he should track, search databases such as UniProt to retrieve annotations for the relevant proteins, follow the citations evidencing the annotations to determine the experiments that were performed on each protein and, in some cases, retrieve the actual data sets from special databases. All this information comes in different formats and from different sources. It is extremely difficult to manually search through the various sources and integrate this diverse information about biological systems to formulate hypotheses (or “models”) spanning a large number of genes and proteins [28].

² High-throughput technologies are large-scale, usually automated, methods to purify, identify, and characterize DNA, RNA, proteins and other molecules. They allow rapid analysis of very large numbers of samples.


Until recently, the predominant use of ontologies in biomedicine has been to facilitate interoperability among databases by indexing them with standard terms to address the problem of locating and retrieving information. Even if the problem of locating information were solved, it would still be difficult to formulate formal hypotheses and models comprising a large number of genes and proteins [28]. The difficulty arises primarily because there is no shared formalism – akin to engineering drawings – in which to express such hypotheses or models and the interpretation they convey. Lack of such a formalism also makes it difficult to determine whether the hypotheses are consistent internally or with data, to refine inconsistent hypotheses and to verify the implications of complicated hypotheses in ‘what if’ thought experiments [11, 24]. This situation needs to be rectified, and tools need to be developed that utilize formal methods to assist in querying and interpreting the information at hand [11, 15, 52].

Besides using ontologies for enhancing interoperability among databases and enabling data exchange, researchers have also used ontologies to create knowledge bases that store large amounts of knowledge in a structured manner [22, 23]. For example, EcoCyc is a comprehensive source of structured knowledge on metabolic pathways in E. coli. When used to create knowledge bases, an ontology enables the declaration and storage of a theory – an experimentally testable explanation of the interactions in a biological system [54]. If the ontologies are well designed, then the resulting knowledge bases can be used to retrieve relevant facts, to organize and interpret disparate knowledge, to infer non-obvious relationships, and to evaluate hypotheses posited by scientists [4, 31, 41].
The emerging trend in the use of ontologies in biomedical research is that, at the outset, ontology terms are used to name things, gradually proceeding toward naming connections between things – first to create information models and then progressing towards the creation of a formal representation³ [15, 52], which allows the creation of formal (both qualitative and quantitative) models⁴ of biological systems. In this chapter we focus on the latter use. Chapter “Ontology-Based Recommender Systems” discusses the current applications of bio-ontologies that are focused around the theme of database interoperability and data integration. Bodenreider and Stevens [5] have recently reviewed in detail the current progress in biomedical ontologies, and we do not review it again in this chapter.

In the next sections, we discuss how the use of ontologies for formal representation of biological systems can aid biomedical research. We then outline the hurdles facing the realization of such use and, in the end, discuss the possible role of the Semantic Web in advancing this particular use.

³ For this current discussion, a formal representation means a computer-interpretable standardized form that can be the basis for creating unambiguous descriptions of biological systems (see Sect. 2.1).

⁴ We use “models” to mean a schematic description of a system or phenomenon that accounts for its known or inferred properties and can be used for further study of its characteristics.


2 Constructing Hypotheses and Models of Biological Systems

The discovery process in biomedical research is cyclical: scientists examine existing data to formulate models that explain the data, design experiments to test the hypotheses and develop new hypotheses that incorporate the data generated during experimentation. Currently, in order to advance this cycle, the experimentalist must perform several tasks: (1) gather information of many different types about the biological entities that participate in a biological process, (2) formulate hypotheses (or models) about the relationships among these entities, (3) examine the different data to evaluate the degree to which his/her hypothesis is supported and (4) refine the hypotheses to achieve the best possible match with the data. In today’s data-rich environment, this is a very difficult, time-consuming and tedious task.

For example, even to evaluate a simple hypothesis such as “protein A is a transcriptional activator of genes X, Y and Z”, the experimentalist must examine the literature for evidence showing that protein A is a transcription factor or exhibits protein sequence homology with known transcription factors. S/he must look for evidence indicating DNA binding activity for protein A and, if found, examine the promoters of X, Y and Z for the presence of binding sequences for protein A. Moreover, each of the preceding steps incorporates a set of implicit assumptions, such as sequence homology implying similarity of function.

Finally, the refined hypotheses are subjected to experimental testing. Hypotheses that survive these tests – validated hypotheses – are published in scientific publications and represent the growing knowledge about biological entities, processes and relationships among them. Validated hypotheses are eventually synthesized into systems of relationships called “models” that account for the known behavior of the system and provide the grounds for further experimentation.
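The evidence checks in the transcriptional-activator example above can be written down explicitly. The following is a minimal Python sketch, not a method from the chapter: the `Evidence` record, the evidence kinds and the `evaluate_activator_hypothesis` function are hypothetical names introduced for illustration, and the facts are made-up placeholders rather than real data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """One piece of (hypothetical) literature or database evidence."""
    kind: str          # e.g. "is_transcription_factor", "tf_homology", "binds_dna", "binding_site"
    subject: str       # the entity the evidence is about
    target: str = ""   # optional second entity, e.g. the gene whose promoter was examined


def evaluate_activator_hypothesis(protein, genes, evidence):
    """Report which checks support 'protein is a transcriptional activator of genes'."""
    kinds = {e.kind for e in evidence if e.subject == protein}
    report = {
        # Direct evidence or sequence homology; accepting homology encodes the
        # implicit assumption that homology implies similarity of function.
        "transcription_factor": bool(kinds & {"is_transcription_factor", "tf_homology"}),
        "dna_binding": "binds_dna" in kinds,
    }
    for gene in genes:
        report[f"binding_site_in_{gene}"] = any(
            e.kind == "binding_site" and e.subject == protein and e.target == gene
            for e in evidence
        )
    return report


# Placeholder facts, assumed for illustration only.
facts = [
    Evidence("tf_homology", "A"),
    Evidence("binds_dna", "A"),
    Evidence("binding_site", "A", "X"),
    Evidence("binding_site", "A", "Y"),
]

report = evaluate_activator_hypothesis("A", ["X", "Y", "Z"], facts)
unsupported = [check for check, ok in report.items() if not ok]
print(unsupported)  # ['binding_site_in_Z']
```

Even this toy version shows why the manual process is tedious: every check is a separate lookup, and the assumptions behind each check have to be stated somewhere to be examined.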
Biologists’ models are generally presented as diagrams showing the type, direction and strength of relationships among biological entities such as genes and proteins. Figure⁵ 1 shows a simplified model of regulation of the mitotic cell cycle in humans.⁶ Usually the goal of constructing a model of a biological system is to predict the outcome (either qualitative or quantitative) from the system at some point in the future. For the moment, for most biological systems, scientists must describe the workings of biological systems in a qualitative manner because not enough is known to formulate quantitative models [11]. Even for qualitative models, we believe that such predictive models, though essential, lie in the future because much of current research uses prior knowledge for

⁵ Source: public-domain, non-copyrighted image.

⁶ The cell cycle is a complicated biological process that comprises the progression of events that occur in a cell during successive cell replications. The process can be described at varying levels of detail, ranging from a high-level qualitative description to a detailed system of differential equations. However, for most biological processes the representation is primarily in terms of qualitative interactions.


Fig. 1. A diagram showing a simple model of regulation of the mitotic cell cycle in humans. The filled rectangles in the cell (gray box) denote proteins and protein complexes that participate in the process. The green oval shows the phases of the cell cycle and the green arrows from the proteins show the phase at which they function. The black arrows depict activating relationships and the red ones show inhibitory relationships. Note how temporal, logical and structural information is mashed together in one representation

interpreting data sets rather than applying prior knowledge as a set of axioms that will elicit new knowledge [56, 59]. In this situation, the most profitable way to use models is to construct a model or a set of models, test them for consistency with the available information and knowledge, revise the models to minimize the inconsistencies, and then pick the most consistent model as a basis for designing further experiments [39, 41, 52].

2.1 Creating a Formal Representation for Hypotheses and Models

If we accept the notion of a hypothesis (or a model) as the basis for an organizing framework for the data and information sources we wish to integrate and interpret, we immediately encounter several problems. As we have discussed, for a large number of participating entities such as genes, proteins,


clinical observations and laboratory data, it is extremely difficult to integrate current knowledge about the relationships within the system under study to formulate hypotheses or models. The difficulty arises primarily because there is no shared formalism – akin to engineering drawings – in which to express such hypotheses or models and the interpretation they convey. Therefore, it is difficult to determine whether such hypotheses are internally consistent or are consistent with data, to refine inconsistent hypotheses and to understand the implications of complicated hypotheses [24].

It is widely recognized that one key challenge in managing this data overload is to represent the results of high-throughput experiments as well as clinical observations and patient records in a formal representation – a computer-interpretable standardized form that can be the basis for unambiguous descriptions of hypotheses and models [15, 52]. This raises the following question: What are the desirable properties of a formal representation for hypotheses or models? Peleg et al. [38] have suggested the following set of desirable properties in a formal representation for models of biological processes:

1. A formal representation should be able to present structural, functional and dynamic views of a biological process. The structural and functional views show the entities that participate in a process and the relationships among them. The dynamic view shows the process over time, including branch points and conditional sub-processes.
2. A formal representation should include an associated ontology that unambiguously identifies the entities and relationships in a process.
3. It should be able to represent biological processes at various scales and should allow hierarchical representation of sub-processes to manage complexity.
4. The representation should be able to incorporate new data as they become available and should be extensible to allow new categories of information as they come into existence.
5. The representation should have a corresponding conceptual (mathematical) framework that allows verification of system properties using simulation and/or logical inference mechanisms.
6. The representation should have an intuitive visual layout.

If we can devise a formal representation for hypotheses and the data at hand, as well as the mechanisms to check the consistency of hypotheses with that data and prior knowledge, we can significantly streamline the task of interpreting diverse data. Moreover, if we develop such a formal representation, we can develop tools that can operate upon current data sets, information and existing knowledge to integrate them in an environment that supports the formulation and testing of hypotheses [4, 9, 14, 41, 54].

Developing such a formal representation is not a trivial task. Moreover, it is unreasonable to expect one representation that will satisfy the needs of all users. However, the need for creating such a representation for a specific domain


(or sub-domain) of study has been proposed multiple times in biomedicine [2, 3, 9, 11, 12, 15, 37]. In the next section we discuss the key issues in creating a formal representation of a domain (see Fig. 2 for a quick overview) and then in Sect. 3 we discuss the pivotal role ontologies can play in the process.

2.2 Challenges for Developing a Formal Representation

Knowledge Representation

The first challenge is the systematic representation of the various kinds of biological entities that participate in any given disease process and the many qualitatively different kinds of relationships among them. This requires the development of an ontology for unambiguously representing biological entities and the interactions among them. Specifically, an ontology allows us to represent domain-specific entities along with their definitions, a set of relationships among them, properties of each relationship, and, in some cases, a set of explicit axioms defined for those relations and entities. We require different ontologies to represent biological processes at different levels of granularity because biological processes and the relevant data can be considered at varying levels of detail, ranging from molecular mechanisms to general processes such as cell division and from raw data matrices to qualitative relationships [6].

Ontologies have gained a lot of popularity in molecular biology over the last several years. The earliest ontologies describe properties of ‘objects’ such as genes, gene products and small molecules. The later ontologies describe the ‘processes’ that genes, proteins and small molecules participate in. Currently, there are several ontologies that allow representation of processes in a biological system by specifying relationships between biological entities for tasks ranging from modeling biological systems to extracting information from literature.
At the simplest level, Gene Ontology BP annotations describe the processes a particular gene product might contribute to, which can be viewed as a minimal model of the biological process that does not contain any declaration of the specific relationships among its participants. At the other end is the Systems Biology Markup Language (SBML), which can represent quantitative models of biochemical processes and pathways [20]. However, for most systems such detailed information is not yet available. There are multiple ontologies to represent biological processes, models and hypotheses at varying degrees of granularity between the two extremes of a GO BP annotation and an SBML representation. For example, EcoCyc’s ontology, which is used to represent information about metabolic pathways for E. coli [22], provides an ontology of biological entity types and processes. An ontology developed by Rzhetsky et al. [49] allows representation of signal transduction pathways at a granularity level that is optimal for programs that extract information about such pathways automatically from published literature [49, 50]. In Sect. 3 we will discuss how ontologies enable the creation of a formal representation by enabling the knowledge representation task.


Conceptual Representation

The second problem is to represent the biological system conceptually. The conceptual framework for a biological system enables a user to reason about a biological system and perform thought experiments. Thought experiments serve two functions: prediction and verification. Prediction allows scientists to make future claims based on models of a system. Verification provides guidance and feedback about the accuracy of the models through comparison, manipulation, and evaluation against available data and knowledge. Prediction and verification enable scientists to ask ‘what if’ questions about a system, form explanations, and make and test predictions. Currently, one of the major limiting factors is the conceptual representation of the “mathematics” of biological systems [11].

A conceptual framework for representing biological systems must accommodate the modularity and temporal behavior of biological systems, as well as handle their non-linearity and redundancy. The conceptual frameworks used to represent biological models vary from ordinary differential equations to Boolean [1, 30] and Bayesian [13, 17, 36] networks as well as Petri Nets [37, 43] to qualitative process calculi [7], special logics [57] and rule systems [21]. The inability to represent disparate kinds of information, at different levels of detail, about biological systems in a common conceptual framework is a major limitation in the creation of formal representations of biological systems, and current efforts usually focus on a limited set of categories of information [52]. A promising approach is to represent the biological processes in a system as a sequence of ‘events’ that link particular ‘states’ of the system [40, 42].
It has been shown before that complex processes, particularly those that exhibit non-linear behavior, are readily described by event-driven dynamics [19] because event dynamics allows description of the process in terms of observed effects of the non-linear behavior rather than requiring that the non-linear behavior itself be represented as a mathematical function. Moreover, event-based approaches can represent processes ranging from simple ones, such as protein phosphorylation, to complex ones, such as the cell cycle, allowing a wide range of resolution. An event-based framework offers several other advantages as well: (1) It can explicitly represent states, which allows for representing information such as commitment to a developmental pathway [38]; (2) It allows hierarchical representation of properties and hence avoids a rapid increase in the number of states that need to be represented [18]; (3) It can represent temporal constraints on when events occur; (4) It can readily accept new categories of information and represent information at different resolution levels.

It is unlikely that any conceptual framework will be adequate to represent all biological systems, and it is much more productive to represent the data, information and knowledge at hand in an explicit ontology and then map the various entity-types and relationships in the ontology to a particular conceptual framework needed under specific situations. For example, Rubin et al. have represented anatomic knowledge about the heart and the circulatory system in an ontology and then mapped the ontology to differential equation


models for blood pressure as well as to a reasoning service to predict the effect of penetrating injuries to the heart [45, 46].

Knowledge and Data Acquisition

The final challenge is the gathering, storage, and encoding of existing information. Even if all of the above challenges are met, getting access to the information and converting (or encoding) it into the relevant ontology in an automated manner is a major challenge because the information resides in separate repositories, each with custom storage formats and diverse access methods. Moreover, most databases do not store information in an explicit ontology, and groups that design ontologies capable of representing models of biological systems [10, 20, 38, 41, 50] do not store all relevant information structured in those ontologies. Hence, efforts aimed at building a unified formal representation need to convert existing information into their ontology. For most formal representations this conversion (or encoding) of existing information and knowledge remains an unsolved problem and is the most common bottleneck preventing the use of the representation.

All the challenges described above are strongly inter-related, as shown in Fig. 2, and need to be solved in tandem for a particular domain of interest.
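The event-based view described under Conceptual Representation, in which events link particular states of a system, can be illustrated with a small sketch. This is only one possible reading under stated assumptions, not the representation used in the cited work: a state is modeled as a set of facts, and the `Event` class, the `apply_event` and `run` functions, and all fact names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A hypothetical event: it can occur only when its preconditions hold."""
    name: str
    requires: frozenset                # facts that must hold before the event
    adds: frozenset                    # facts that become true afterwards
    removes: frozenset = frozenset()   # facts that stop being true


def apply_event(state, event):
    """Return the state that follows `event`, or fail if it cannot occur."""
    if not event.requires <= state:
        missing = sorted(event.requires - state)
        raise ValueError(f"{event.name} cannot occur; missing: {missing}")
    return (state - event.removes) | event.adds


def run(state, events):
    """Apply events in order and return the whole sequence of states."""
    trajectory = [state]
    for event in events:
        state = apply_event(state, event)
        trajectory.append(state)
    return trajectory


# Illustrative facts only; none of these names come from a real ontology.
phosphorylation = Event(
    "phosphorylation",
    requires=frozenset({"kinase_active", "substrate_unphosphorylated"}),
    adds=frozenset({"substrate_phosphorylated"}),
    removes=frozenset({"substrate_unphosphorylated"}),
)
complex_formation = Event(
    "complex_formation",
    requires=frozenset({"substrate_phosphorylated", "partner_present"}),
    adds=frozenset({"complex_formed"}),
)

initial = frozenset({"kinase_active", "substrate_unphosphorylated", "partner_present"})
trajectory = run(initial, [phosphorylation, complex_formation])
final = trajectory[-1]
print(sorted(final))
# ['complex_formed', 'kinase_active', 'partner_present', 'substrate_phosphorylated']
```

In this sketch the ordering constraint between the two events falls out of their preconditions: attempting `complex_formation` from the initial state raises an error because the substrate has not yet been phosphorylated.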

Fig. 2. Components of a Formal Representation: A formal representation is a computer-interpretable standardized form that can be the basis for unambiguous descriptions of hypotheses and models in a domain of discourse. Knowledge representation comprises the methods and processes of systematically representing the various biological entities and the different kinds of relationships between them. The conceptual framework for a biological system enables a user to reason about a biological system and perform thought experiments. The adoption and success of a formal representation depends critically on the ability to gather and encode existing information and knowledge


3 Ontologies Enable the Creation of a Formal Representation

As discussed so far, in order to create a unified formal representation, we need a conceptual framework that can represent models of biological systems. The conceptual framework should represent the temporal dynamics of the process and should not require a complete model rewrite on minor changes. The conceptual framework should provide systematic methods to evaluate, update, extend and revise models represented in that framework. It is obvious that no conceptual framework will be adequate under all circumstances. We are better off representing the data, information and knowledge at hand in an explicit ontology and then mapping the various entity-types and relationships in the ontology to a particular conceptual framework needed under specific situations [45, 46].

The challenge then is to bridge the conceptual framework and the ontology to create the formal representation. For example, the relationship ‘protein A activates protein B’ is different from ‘protein A activates gene X’, though both may be described by the same words. When representing a biological process in a conceptual framework, it is essential to distinguish between the two meanings. An ontology can distinguish between the two relationships by providing different terms to represent the two meanings as well as by clearly specifying the two meanings. Having an associated ontology where each term in the ontology has a corresponding construct in the conceptual framework allows this distinction to be made in the conceptual model as well. Because an ontology unambiguously declares the entities and the relationships among those entities, it can guide the design of the knowledge bases that store the various experimental and clinical data as well as prior domain knowledge in a manner that allows different conceptual frameworks to be overlaid on the primary data.
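The distinction between ‘protein A activates protein B’ and ‘protein A activates gene X’ can be sketched as two separately named relations, each with its own definition and permitted argument types. The sketch below is illustrative only; the relation names, definitions and entity types are assumptions and are not taken from any published ontology.

```python
# Hypothetical entity typing, assumed for illustration.
ENTITY_TYPES = {"A": "Protein", "B": "Protein", "X": "Gene"}

# relation term -> (subject type, object type, plain-language definition)
RELATIONS = {
    "activates_protein": (
        "Protein", "Protein", "increases the activity of the target protein"),
    "activates_transcription_of": (
        "Protein", "Gene", "increases transcription of the target gene"),
}


def check_statement(subject, relation, obj):
    """True if the statement uses the relation with the entity types it is defined for."""
    subject_type, object_type, _definition = RELATIONS[relation]
    return ENTITY_TYPES[subject] == subject_type and ENTITY_TYPES[obj] == object_type


def resolve_activates(subject, obj):
    """Map the ambiguous word 'activates' to the relation terms whose types fit."""
    return [term for term in RELATIONS if check_statement(subject, term, obj)]


print(resolve_activates("A", "B"))  # ['activates_protein']
print(resolve_activates("A", "X"))  # ['activates_transcription_of']
print(check_statement("A", "activates_protein", "X"))  # False
```

Once the two meanings carry different terms, each term can be bound to its own construct in a conceptual framework, which is the correspondence the paragraph above argues for.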
In particular, the use of ontologies will help in maintaining a strict distinction between data and an interpretation based on the data. For example, the current diagnosis criteria for Multiple Sclerosis (MS) are based on observing at least two clinical episodes with certain symptoms at least 3 months apart and the presence of two plaques (on the spinal cord) on MRI. If we design our database of clinical records to store the diagnostic code for MS, then, if the diagnostic criteria for that diagnostic code change, we have to re-examine every record, re-diagnose MS and re-associate the correct codes with each record. However, if we define the interpretation about the existence of MS separately from the data structure used to store observational data, then we can change our criteria for MS and still reliably identify all cases of MS. Such separation of the definition of a biomedical concept from the decision (such as recommending a treatment) and computation (such as searching for correlations with environmental factors) has been demonstrated to increase both maintainability and efficiency of computer reasoning [25, 32].

We believe that a particular conceptual framework along with the associated ontology is the optimal way to create a formal representation fit for


a specific situation. For example, differential equations along with the Systems Biology Markup Language (SBML) create an appropriate formal representation for biochemical signaling pathways. Ontologies will play a central role in enabling such modularity and maintaining a separation between data, information and knowledge and the relevant conceptual (mathematical) framework. The formal representation resulting from such separation will be easily extensible to incorporate new data types as they become available as well as to incorporate novel conceptual frameworks as required. The hypotheses, models and underlying data can become compatible with each other in the context of the relevant conceptual framework, making it possible to bring together the implications of many kinds of data and information in a unified manner [41, 54]. We will gain the ability to test complex interpretations as well as the ability to use data from unrelated research projects [35].

3.1 Unresolved Issues

Although the use of ontologies in the creation of formal representations has a very strong case in its favor, there are several challenging issues that need to be addressed, which we discuss below.

Abstraction Levels

Biomedical researchers study biological systems at various scales, ranging from electron microscopy images to patient populations. No single ontology can span all of these, and multiple ontologies already exist for different abstraction levels. We need to create a mechanism by which ontologies at different abstraction levels can be effectively mapped to each other.

Unambiguous Relationships

Within a particular abstraction level of representing a biological system, relationships need to be explicitly defined so that their interpretation is not subjective. The relationship ontology (RO) [55] is a step in this direction for defining relations for the molecular level.
Although the RO provides explicit logical definitions, no computational implementation of the RO exists that actually allows a user to verify the correct use of the relations. General, well-established mechanisms for verifying the clarity of relationships in ontologies, such as OntoClean (discussed in 9), exist. However, their use in the biomedical domain is minimal.

Consistency Across Abstraction Levels

Biomedical researchers cross multiple abstraction levels when describing biological systems. It is essential that relationships between entities at a particular abstraction level can be consistently interpreted when we move to a


different abstraction level (for example, how does the mechanism of action of a drug at the protein level affect the efficacy of the drug in treating a patient?).

Bindings with Conceptual Frameworks

We have suggested a separation between the ontology used to structure knowledge about a biological system and the conceptual framework used to model the system mathematically. However, the process of establishing a correspondence between constructs in an ontology and constructs in a conceptual framework is currently quite ad hoc. Usually the ontology is designed with one conceptual framework in mind (e.g. SBML [20] and differential equations, or the Biological Process ontology and Petri nets [38]). The general problem of easily mapping a formal representation to conceptual models at different scales is still unsolved and promises to be an exciting research direction.

4 Role of the Semantic Web

The Semantic Web is an evolving extension of the World Wide Web in which web content can be expressed in a form that can be understood, interpreted and used by software agents (besides humans), thus permitting software agents to find, share and integrate information more easily.⁷ Given the heterogeneity of biological data both in form and location, the Semantic Web is of considerable interest to the life sciences community, particularly because key issues such as the need for consistent data and knowledge representation can be addressed using the Resource Description Framework (RDF) and Web Ontology Language (OWL) [16]. A variety of technologies have been built on this foundation of RDF and OWL that, together, support identifying, representing, and reasoning across a wide range of biomedical data [48, 58].

The expectation from the Semantic Web in life sciences is that relationships that exist implicitly in the minds of scientists will be explicitly declared (using OWL ontologies) and then used to aggregate genomic, proteomic, cellular, physiological, and chemical data. Semantic definitions will specify which objects are related to others and how. Such linking will enable semantic tools [34] that can pull together diverse information, render it in a manner defined by the user and possibly reason over the collated information to derive novel insights [8, 29, 33, 48].

However, not everyone is convinced that the Semantic Web will have such a revolutionizing effect on life sciences. There are implicit assumptions in the expected role of the Semantic Web, mainly that: (1) a simple syntax and the semantics of description logics will be sufficient and (2) translation of existing information into the simple syntax as well as inferences on the simple semantics

http://www.w3.org/2001/sw/SW-FAQ

Ontologies for Formal Representation

457

will work right [26]. If these two assumptions are not met, the promise of the Semantic Web might not be realized in the field of biomedical research. Currently there is immense excitement about the Semantic Web and its possible contribution to advancing biomedical research; it remains to be seen wether it bears out in practice.
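The aggregation and reasoning described in this section can be illustrated with a small sketch. It is written in plain Python and only imitates the subject-predicate-object shape of RDF statements; it does not use an RDF or OWL toolkit, and the two data sources, the term names and the single inference rule are hypothetical assumptions chosen for illustration.

```python
# Two hypothetical sources, each publishing (subject, predicate, object) triples.
pathway_db = {
    ("geneA", "encodes", "proteinA"),
    ("proteinA", "participates_in", "glycolysis"),
}
ontology = {
    ("glycolysis", "is_a", "carbohydrate_catabolism"),
    ("carbohydrate_catabolism", "is_a", "metabolic_process"),
}


def ancestors(term, triples):
    """Return every term reachable from `term` by following is_a links."""
    found = set()
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for subject, predicate, obj in triples:
            if subject == current and predicate == "is_a" and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found


def infer_participation(triples):
    """Add the participates_in statements implied by the is_a hierarchy."""
    inferred = set(triples)
    for subject, predicate, obj in triples:
        if predicate == "participates_in":
            for parent in ancestors(obj, triples):
                inferred.add((subject, "participates_in", parent))
    return inferred


# Aggregation is a set union because both sources share one statement shape.
merged = pathway_db | ontology
closed = infer_participation(merged)
new_facts = sorted(closed - merged)
```

Here `new_facts` holds the two statements that neither source declared on its own: that `proteinA` participates in `carbohydrate_catabolism` and in `metabolic_process`. Description-logic reasoners support far richer semantics than this one hand-written rule, and whether those semantics suffice for biology is precisely the open question raised above [26].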

5 Summary

In this chapter we have discussed how the use of ontologies for knowledge representation can aid current biomedical research. We have argued that formally representing biological systems is necessary for advancing that research, and that there is increasing recognition that biologists need computational tools for performing thought experiments. We have described how biomedical ontologies can play a pivotal role in enabling that transition, outlined the hurdles facing the use of ontologies in creating formal representations that enable thought experiments, and discussed the possible role of the Semantic Web in advancing this particular use of ontologies.

References

1. T. Akutsu, S. Miyano, and S. Kuhara. Algorithms for identifying boolean networks and related biological networks based on matrix multiplication and fingerprint function. J Comput Biol, 7(3-4):331–43, 2000.
2. R. Altman, M. Buda, X. Chai, M. Carillo, R. Chen, and N. Abernethy. Riboweb: an ontology-based system for collaborative molecular biology. IEEE Intelligent Systems, 14(5):68–76, 1999.
3. G. An. Concepts for developing a collaborative in silico model of the acute inflammatory response using agent-based modeling. J Crit Care, 21(1):105–10; discussion 110–1, Mar 2006.
4. C. Baral, K. Chancellor, N. Tran, N. Tran, A. Joy, and M. Berens. A knowledge based approach for representing and reasoning about signaling networks. Bioinformatics, 20(suppl 1):15–22, 2004.
5. O. Bodenreider and R. Stevens. Bio-ontologies: current trends and future directions. Brief Bioinform, 7(3):256–274, Sep 2006.
6. A. Brazma. On the importance of standardisation in life sciences. Bioinformatics, 17(2):113–4, 2001.
7. L. Cardelli. Bioware languages. In A. Herbert and K. S. Jones, editors, Computer Systems: Theory, Technology, and Applications, pages 59–65. Springer, New York, 2005.
8. K. Cheung, P. Qi, D. Tuck, and M. Krauthammer. A semantic web approach to biological pathway data reasoning and integration. Journal of Web Semantics, 4:3, 2006.


9. T. Clark and J. Kinoshita. Alzforum and swan: the present and future of scientific web communities. Brief Bioinform, 8(3):163–171, May 2007.
10. E. Demir, O. Babur, U. Dogrusoz, A. Gursoy, A. Ayaz, G. Gulesir, G. Nisanci, and R. Cetin-Atalay. An ontology for collaborative construction and analysis of cellular pathways. Bioinformatics, 20(3):349–356, 2004.
11. N. Fedoroff, S. A. Racunas, and J. Shrager. Making biological computing smarter. The Scientist, 19(11):20–21, 2005.
12. C. Friedman, T. Borlawsky, L. Shagina, H. R. Xing, and Y. A. Lussier. Bio-ontology and text: bridging the modeling gap. Bioinformatics, 22(19):2421–2429, Oct 2006.
13. N. Friedman, M. Linial, I. Nachman, and D. Pe’er. Using bayesian networks to analyze expression data. J Comput Biol, 7(3-4):601–20, 2000.
14. Y. Gao, J. Kinoshita, E. Wu, E. Miller, R. Lee, A. Seaborne, S. Cayzer, and T. Clark. Swan: A distributed knowledge infrastructure for alzheimer disease research. Journal of Web Semantics, in press, 2006.
15. D. K. Gifford. Blazing pathways through genetic mountains. Science, 293(5537):2049–51, 2001.
16. B. M. Good and M. D. Wilkinson. The life sciences semantic web is full of creeps! Brief Bioinform, 7(3):275–286, Sep 2006.
17. A. J. Hartemink, D. K. Gifford, T. S. Jaakkola, and R. A. Young. Using graphical models and genomic expression data to statistically validate models of genetic regulatory networks. Pac Symp Biocomput, pages 422–33, 2001.
18. M. Heiner. On exploiting the analysis power of petri nets for the validation of discrete event systems. In IMACS Symposium on Mathematical Modelling, pages 171–176, Wien, 1997.
19. Y. C. Ho. Special issue on discrete event dynamical systems: Editorial. Proc IEEE, 77(1):24–38, 1989.
20. M. Hucka, A. Finney, H. M. Sauro, H. Bolouri, J. C. Doyle, H. Kitano, A. P. Arkin, B. J. Bornstein, D. Bray, A. Cornish-Bowden, A. A. Cuellar, S. Dronov, E. D. Gilles, M. Ginkel, V. Gor, I. Goryanin, W. J. Hedley, T. C. Hodgman, J. H. Hofmeyr, P. J. Hunter, N. S. Juty, J. L. Kasberger, A. Kremling, U. Kummer, N. Le Novere, L. M. Loew, D. Lucio, P. Mendes, E. Minch, E. D. Mjolsness, Y. Nakayama, M. R. Nelson, P. F. Nielsen, T. Sakurada, J. C. Schaff, B. E. Shapiro, T. S. Shimizu, H. D. Spence, J. Stelling, K. Takahashi, M. Tomita, J. Wagner, and J. Wang. The systems biology markup language (sbml): a medium for representation and exchange of biochemical network models. Bioinformatics, 19(4):524–31, 2003.
21. T. R. Hvidsten, A. Laegreid, and J. Komorowski. Learning rule-based models of biological process from gene expression time profiles using gene ontology. Bioinformatics, 19(9):1116–23, 2003.
22. P. Karp. An ontology for biological function based on molecular interactions. Bioinformatics, 16(3):269–85, 2000.
23. P. Karp, C. Ouzounis, C. Moore-Kochlacs, L. Goldovsky, P. Kaipa, D. Ahren, S. Tsoka, N. Darzentas, V. Kunin, and N. Lopez-Bigas. Expansion of the biocyc collection of pathway/genome databases to 160 genomes. Nucleic Acids Res, 33(19):6083–9, 2005.


24. P. D. Karp. Pathway databases: a case study in computational symbolic theories. Science, 293(5537):2040–4, 2001.
25. V. Kashyap, A. Morales, and T. Hongsermeier. On implementing clinical decision support: achieving scalability and maintainability by combining business rules and ontologies. AMIA Annu Symp Proc, pages 414–418, 2006.
26. T. Kazic. Putting semantics into the semantic web: how well can it capture biology? Pac Symp Biocomput, pages 140–151, 2006.
27. P. Khatri and S. Draghici. Ontological analysis of gene expression data: current tools, limitations, and open problems. Bioinformatics, 21(18):3587–95, 2005.
28. A. Kuchinsky, K. Graham, D. Moh, M. Creech, K. Babaria, and A. Adler. Biological storytelling: a software tool for biological information organization based upon narrative structure. In Advanced Visual Interfaces, pages –, Trento, Italy, 2002.
29. H. Y. K. Lam, L. Marenco, T. Clark, Y. Gao, J. Kinoshita, G. Shepherd, P. Miller, E. Wu, G. T. Wong, N. Liu, C. Crasto, T. Morse, S. Stephens, and K.-H. Cheung. Alzpharm: integration of neurodegeneration data using rdf. BMC Bioinformatics, 8 Suppl 3:S4, 2007.
30. S. Liang, S. Fuhrman, and R. Somogyi. Reveal, a general reverse engineering algorithm for inference of genetic network architectures. Pac Symp Biocomput, pages 18–29, 1998.
31. J. P. Massar, M. Travers, J. Elhai, and J. Shrager. Biolingua: a programmable knowledge environment for biologists. Bioinformatics, 21(2):199–207, Jan 2005.
32. M. A. Musen. Scalable software architectures for decision support. Methods Inf Med, 38(4-5):229–238, Dec 1999.
33. E. Neumann. A life science semantic web: are we there yet? Sci STKE, 2005(283):pe22, 2005.
34. E. Neumann. Biodash: A semantic web dashboard for drug development. In R. Altman, editor, Pacific Symposium in Biocomputing, volume 11, pages 176–187, Hawaii, 2006.
35. M. J. O’Connor, D. L. Buckeridge, M. Choy, M. Crubezy, Z. Pincus, and M. A. Musen. Biostorm: a system for automated surveillance of diverse data sources. AMIA Annu Symp Proc, page 1071, 2003.
36. D. Pe’er, A. Regev, G. Elidan, and N. Friedman. Inferring subnetworks from perturbed expression profiles. Bioinformatics, 17(Suppl):S215–S224, 2001.
37. M. Peleg, S. Tu, A. Manindroo, and R. B. Altman. Modeling and analyzing biomedical processes using workflow/petri net models and tools. Medinfo, 2004:74–8, 2004.
38. M. Peleg, I. Yeh, and R. B. Altman. Modelling biological processes using workflow and petri net models. Bioinformatics, 18(6):825–37, 2002.
39. S. Racunas, C. Griffin, and N. Shah. A finite model theory for biological hypotheses. In Computational Systems Bioinformatics Conference, 2004, pages 585–589. IEEE, 2004.
40. S. Racunas, N. Shah, and N. Fedoroff. A contradiction-based framework for testing gene regulation hypotheses. In Computational Systems Bioinformatics Conference, 2003, pages 634–638. IEEE, 2003.


41. S. A. Racunas, N. H. Shah, I. Albert, and N. V. Fedoroff. Hybrow: a prototype system for computer-aided hypothesis evaluation. Bioinformatics, 20(suppl 1):257–264, 2004.
42. V. N. Reddy. Modeling biological pathways: A discrete event systems approach. Master’s thesis, University of Maryland, College Park, 1994.
43. V. N. Reddy, M. L. Mavrovouniotis, and M. N. Liebman. Petri net representations in metabolic pathways. Proc Int Conf Intell Syst Mol Biol, 1:328–36, 1993.
44. C. Rosse and J. L. V. Mejino. A reference ontology for biomedical informatics: the foundational model of anatomy. J Biomed Inform, 36(6):478–500, Dec 2003.
45. D. L. Rubin, Y. Bashir, D. Grossman, P. Dev, and M. A. Musen. Using an ontology of human anatomy to inform reasoning with geometric models. Stud Health Technol Inform, 111:429–435, 2005.
46. D. L. Rubin, D. Grossman, M. Neal, D. L. Cook, J. B. Bassingthwaighte, and M. A. Musen. Ontology-based representation of simulation models of physiology. AMIA Annu Symp Proc, pages 664–668, 2006.
47. D. L. Rubin, S. E. Lewis, C. J. Mungall, S. Misra, M. Westerfield, M. Ashburner, I. Sim, C. G. Chute, H. Solbrig, M.-A. Storey, B. Smith, J. Day-Richter, N. F. Noy, and M. A. Musen. National center for biomedical ontology: advancing biomedicine through structured organization of scientific knowledge. OMICS, 10(2):185–198, 2006.
48. A. Ruttenberg, T. Clark, W. Bug, M. Samwald, O. Bodenreider, H. Chen, D. Doherty, K. Forsberg, Y. Gao, V. Kashyap, J. Kinoshita, J. Luciano, M. S. Marshall, C. Ogbuji, J. Rees, S. Stephens, G. T. Wong, E. Wu, D. Zaccagnini, T. Hongsermeier, E. Neumann, I. Herman, and K.-H. Cheung. Advancing translational research with the semantic web. BMC Bioinformatics, 8 Suppl 3:S2, 2007.
49. A. Rzhetsky, I. Iossifov, T. Koike, M. Krauthammer, P. Kra, M. Morris, H. Yu, P. A. Duboue, W. Weng, W. J. Wilbur, V. Hatzivassiloglou, and C. Friedman. Geneways: a system for extracting, analyzing, visualizing, and integrating molecular pathway data. J Biomed Inform, 37(1):43–53, 2004.
50. A. Rzhetsky, T. Koike, S. Kalachikov, S. Gomez, M. Krauthammer, S. Kaplan, P. Kra, J. Russo, and C. Friedman. A knowledge model for analysis and simulation of regulatory networks. Bioinformatics, 16(12):1120–8, 2000.
51. N. Shah and M. A. Musen. Which annotation did you mean? Technical Report SMI-2007-1247, Stanford Medical Informatics, May 2007.
52. N. H. Shah. Formal Methods for Genomic Data Integration. PhD thesis, The Pennsylvania State University, University Park, 2005.
53. N. H. Shah and N. V. Fedoroff. Clench: a program for calculating cluster enrichment using the gene ontology. Bioinformatics, 20(7):1196–7, 2004.
54. J. Shrager, R. Waldinger, M. Stickel, and J. P. Massar. Deductive biocomputing. PLoS ONE, 2:e339, 2007.
55. B. Smith, W. Ceusters, B. Klagges, J. Köhler, A. Kumar, J. Lomax, C. Mungall, F. Neuhaus, A. L. Rector, and C. Rosse. Relations in biomedical ontologies. Genome Biol, 6(5):R46, 2005.
56. R. Stevens, C. A. Goble, and S. Bechhofer. Ontology-based knowledge representation for bioinformatics. Brief Bioinform, 1(4):398–414, 2000.


57. C. Talcott, S. Eker, M. Knapp, P. Lincoln, and K. Laderoute. Pathway logic modeling of protein functional domains in signal transduction. Pac Symp Biocomput, pages 568–80, 2004.
58. X. Wang, R. Gorlitsky, and J. S. Almeida. From xml to rdf: how semantic web technologies will change the design of ‘omic’ standards. Nat Biotechnol, 23(9):1099–1103, Sep 2005.
59. L. Yue and W. C. Reisdorf. Pathway and ontology analysis: emerging approaches connecting transcriptome data and clinical endpoints. Curr Mol Med, 5(1):11–21, Feb 2005.