Semantic Web Meets e-Neuroscience: An RDF Use Case

Hugo Y.K. Lam1, Luis Marenco2,3, Tim Clark8,9, Yong Gao9, June Kinoshita10, Gordon Shepherd4, Perry Miller2,3,5, Elizabeth Wu10, Gwen Wong10, Nian Liu2,3, Chiquito Crasto2,4, Thomas Morse4, Susie Stephens11, and Kei-Hoi Cheung2,3,6,7

1 Program in Computational Biology and Bioinformatics, 2 Center for Medical Informatics, 3 Anesthesiology, 4 Neurobiology, 5 Molecular, Cellular and Developmental Biology, 6 Genetics, 7 Computer Science, Yale University, New Haven, CT, USA
8 Harvard University, Initiative in Innovative Computing, Cambridge, MA, USA
9 Massachusetts General Hospital, Boston, MA, USA
10 Alzheimer Research Forum
11 Oracle

{Hugo.YK.Lam, Luis.Marenco, Gordon.Shepherd, Perry.Miller, Chiquito.Crasto, Nian.Liu, Thomas.Morse, Kei.Cheung}@yale.edu, [email protected], [email protected] {EWu, JuneKino}@alzforum.org; [email protected]; [email protected]

Abstract. Presently, neuroscientists have access to a wide range of neuroscience databases through the Internet. However, most of these databases are neither integrated nor interoperable, which creates a barrier to answering complex neuroscience research questions. Agreement upon a domain ontology is typically useful for querying diverse data sets, but is insufficient for integrating neuroscience data spanning multiple domains. To this end, e-Neuroscience seeks to provide an integrated platform for neuroscientists to discover new knowledge through seamless integration of diverse types and levels of neuroscience data. We present a Semantic Web approach to building this e-Neuroscience data integration framework, which involves using RDF as a standard data model to facilitate representation and integration of data. We have converted a subset of the BrainPharm database into RDF and integrated it with SWAN hypothesis and publication data extracted from Alzforum and made available in RDF as the upper ontology. Our implementation uses the RDF Data Model in Oracle Database 10g for data retrieval, integration, and inference. Our approach should be generalizable across many types of biomedical information.

Keywords: Semantic Web, e-Neuroscience, neuroinformatics, database integration, RDF, ontology.

1 Introduction

e-Science involves developing tools, technologies, and infrastructure to support multidisciplinary and collaborative science enabled by the Internet [1]. One of the key problems that e-Science aims to address is data integration. While integrating all the data in science still seems far from reality, applying the concept of e-Science to a particular scientific domain is more feasible. e-Neuroscience [2] (or neuroinformatics) shares the vision of e-Science, but focuses on neuroscience. It also fits the informatics-oriented goal of the Human Brain Project (HBP), initiated by the U.S. National Institutes of Health (NIH) in 1993, which emphasizes the importance of integrating multiple levels of neuroscience-related information ranging from the molecular gene level to the behavioral level [3]. Integrating these different levels of neuroscience data across all dimensions and scales is a positive step towards better understanding brain function [4]. Combining the experimental results produced by multidisciplinary groups allows a more thorough investigation of complex neuroscience research problems, including the study of neurodegenerative diseases such as Alzheimer’s Disease (AD) and Parkinson’s Disease [5]. However, the data in neuroscience are diverse, scattered, and rapidly growing. Below we discuss some of the issues involved in integrating diverse types of neuroscience data provided by different sources, although these issues also arise in other scientific domains.

1. Registry. A large number of neuroscience resources have been developed independently to address different research needs. While keyword search engines (e.g., Google) can help users locate the neuroscience resources of interest, such a keyword-based search approach suffers from poor specificity and sensitivity. For example, a search using the keyword “neuron” returns a large number of hits, most of which are not the resources being sought. To address this problem, central registries of neuroscience resources have been created to categorize and keep track of existing neuroscience databases. These registries provide search interfaces for users to find the database they want. Among these registries, the Neuroscience Database Gateway (NDG) (http://big.sfn.org/NDG/site/) was launched in 2004 as a pilot project sponsored by the Society for Neuroscience (SfN). NDG categorizes neuroscience resources into five types (e.g., databases and software tools). It employs a set of standard terms (e.g., name, description, URL, and species) for describing each resource. As the number of neuroscience resources continues to grow, such a centralized approach to registering resources may not be scalable. A better and more efficient framework is needed to allow resource registration and discovery.

2. Interface. Fig. 1 shows screenshots of the Web pages of three different neuroscience databases listed in NDG: the Cell-Centered Database (CCDB) [4] (cellular imaging data), CoCoDat [5] (cortical cell and microcircuitry data), and SenseLab [6] (integration of multidisciplinary sensory data). These databases contain different but related types of data. They distribute data in different formats and let users access the data through different user interfaces (as shown in Fig. 1). For example, CCDB provides a free-text search interface; SenseLab has a structured form search interface; and CoCoDat is available for download as a Microsoft Access database. Currently, the only way to integrate the data provided by these databases is to do it manually. Such heterogeneity in data format and user interface makes data interoperability and data analysis difficult. A standardized and machine-understandable data format with an open and unified data access model would be crucial to building a data integration framework for e-Neuroscience.

3. Nomenclature. One of the difficulties in making neuroscience data sources broadly sharable is the lack of a standard nomenclature. For example, different terms (e.g., Neural Arch and Vertebral Arch) may be used to describe the same neuro-anatomical region (e.g., part of the spinal cord). Ambiguity also arises when the same term is associated with multiple meanings, for example, spine (vertebral spine) vs. spine (dendritic spine).

4. Granularity. Different neuroscience databases may model the same type of data at different levels of granularity. For example, CCDB uses a single “dendrite” compartment (“D”) for all data associated with dendrites, whereas NeuronDB and CoCoDat subdivide the dendrite into types (e.g., apical and basal) and dendritic compartments (e.g., proximal, medial, and distal). As a result, data within NeuronDB and CoCoDat can be associated with specific dendritic compartments at a more specific (more granular) level than is possible in CCDB.
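The granularity mismatch can be made concrete with a small sketch in Python. It is only an illustration under stated assumptions: the record labels are hypothetical and do not come from CCDB, NeuronDB, or CoCoDat; only the compartment names ("D", apical/basal, proximal/medial/distal) are taken from the text. The point is that fine-grained annotations can be rolled up to the coarse compartment, while the reverse mapping is not recoverable.

```python
from collections import Counter
from typing import List, Tuple

# Coarse compartment code used for all dendritic data (from the text).
COARSE_DENDRITE = "D"

# Fine-grained compartments: dendrite type x segment (names from the text).
FINE_COMPARTMENTS: List[Tuple[str, str]] = [
    (dendrite_type, segment)
    for dendrite_type in ("apical", "basal")
    for segment in ("proximal", "medial", "distal")
]


def to_coarse(compartment: Tuple[str, str]) -> str:
    """Roll a fine-grained dendritic compartment up to the coarse one."""
    if compartment not in FINE_COMPARTMENTS:
        raise ValueError(f"unknown dendritic compartment: {compartment}")
    return COARSE_DENDRITE


# Hypothetical records annotated at the fine-grained level.
records = [
    (("apical", "distal"), "record A"),
    (("apical", "proximal"), "record B"),
    (("basal", "medial"), "record C"),
]

# After the roll-up, all three records land in the single coarse compartment.
coarse_counts = Counter(to_coarse(compartment) for compartment, _ in records)
print(coarse_counts)
```

Integration at the coarse level is therefore always possible, but a query that needs the apical/basal distinction can only be answered by the sources that record it.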

Fig. 1. Examples of three neuroscience database web interfaces.

2 Semantic Web Approach to Representing and Integrating Data

One goal of the Semantic Web is to expose the semantics of Web-accessible data in a standard machine-readable way so that the data can be more easily interpreted and integrated by computer programs (or Web agents). In the following subsections, we describe the component Semantic Web technologies that can be used to realize this vision.

2.1 Ontology

One key component of Semantic Web technology is the development and use of standard machine-readable ontologies to facilitate knowledge representation and semantic interoperability. Unlike data models, which focus on typing and structural association of attributes, ontologies convey the semantics (meanings) of data much more richly. The Semantic Web uses ontologies to capture domain concepts, their relationships, and associated data: an ontology is “the specification of a conceptualization” of a knowledge domain [7]. The Semantic Web data integration approach [8] presents standard machine-readable ontologies ubiquitously on the Web, in the form of triple associations [9] or, more elegantly, of a description logic [10]. The ontologies are thereby exposed to processing by Web-aware agents, as well as to human access and interpretation. This facilitates extensible knowledge representation and semantic interoperability, and critically deepens our ability to treat the Web as a true knowledge base.

An ontology consists of relatively generic knowledge that can be reused by different kinds of applications within a knowledge domain. In multi-domain investigations such as neuroscience, constructing mappings across the domains becomes a critical interoperability concern. One general approach to multi-domain data integration involves mapping between domain schemas or ontologies and a common cross-domain or “upper” schema or ontology.

A significant number of ontologies have been developed in different biomedical domains (including the neuroscience domain). They vary in size and expressiveness. The less expressive ontologies are called “weak ontologies” (e.g., standard vocabularies and taxonomies) and the more expressive ones are called “strong ontologies” (e.g., thesauri). There is an increasing need for expressive bio-ontologies to facilitate machine-based data integration and data mining. Recognizing this need, community efforts have begun to build ontologies for use by computer applications deployed in different domains of the biosciences. Examples include the Gene Ontology [11] (a controlled vocabulary describing gene and gene product attributes in any organism), Plant Ontology [12] (a controlled vocabulary describing plant structures and growth and developmental stages), and Unified Medical Language System [13] (which consists of a vocabulary database about biomedical and health-related concepts). To provide common access to these separately developed bio-ontologies, the Open Biomedical Ontologies [14] (a centralized web resource for providing access to a variety of ontologies related to biomedical sciences) has been created. Many of these bio-ontologies can be cross-referenced or inter-linked to facilitate more comprehensive knowledge acquisition and engineering.

In addition to domain-specific ontologies, there is a desire to create an upper ontology that is limited to concepts that are meta, generic, abstract, and philosophical, and are therefore general enough to address (at a high level) a broad range of domain areas. Concepts specific to given domains will not be included; however, such an upper ontology will provide a structure and a set of general concepts upon which domain-specific ontologies (e.g., microarrays, proteomics, and pathways) could be constructed. Examples of upper ontologies in the biological domain include the Functional Genomics Investigation Ontology (FuGO) (http://fugo.sourceforge.net/) and EXPO (http://expo.sourceforge.net/), which is a general ontology for scientific experiments.

2.2 Ontological languages

In the Semantic Web context, several ontological languages based on the eXtensible Markup Language (XML) have been developed to encode ontologies. XML was introduced to support machine readability by associating meaningful tags with data values. In addition, a hierarchical (element/sub-element) structure can be created using these tags. With such descriptive and hierarchically structured labels, computer applications are given better semantic information to parse data in a meaningful way. Despite its machine readability, as indicated by Wang et al. [15], XML is by nature syntactic and document-centric. This limits its ability to achieve the level of semantic interoperability required by highly dynamic and integrated bioinformatics applications. In addition, XML formats have proliferated redundantly in the life science domain: overlapping XML formats (e.g., SBML [16] and PSI MI [17]) have been developed to represent the same type of biological data (e.g., pathway data).

The Semantic Web has taken the usage of XML to a new level of ontology-based standardization. In the Semantic Web realm, XML is used as an ontological language to implement machine-readable ontologies in conjunction with standard knowledge representation techniques. The Resource Description Framework (RDF) (http://www.w3.org/RDF/) is an important first step in this direction. It offers a simple but useful semantic model based on a directed, labeled graph structure. In essence, RDF is a language for defining statements about resources and the relationships among them. Such resources and relationships are identified using Uniform Resource Identifiers (URIs). Each RDF statement is a triple with a subject, a property (or predicate), and a property value (or object). For example, (Dopamine, Function, Neurotransmitter) is a triple statement expressing that the subject Dopamine has Neurotransmitter as the value of its Function property. The object of one triple may itself be the subject of other triples, which creates a nested structure. RDF also provides a means of defining classes of resources and properties. These classes are used to build statements that assert facts about resources. RDF has its own schema vocabulary (RDF Schema, or RDFS) for writing a schema for a resource. RDFS is more expressive than RDF alone; it includes subclass/superclass relationships as well as constraints on the statements that can be made in a document conforming to the schema.

While RDF and RDFS are commonly used Semantic Web standards, neither is expressive enough to support formal knowledge representation that is intended for processing by computers. Such a knowledge representation includes explicit objects (e.g., the class of all proteins, or “Tau”, one specific protein), and assertions or claims about them (e.g., “acetylcholinesterase is an enzyme”, or “all enzymes are proteins”). Representing knowledge in such an explicit form enables computers to draw conclusions from knowledge already encoded in this form. More sophisticated XML-based knowledge representation languages such as the Web Ontology Language (OWL) [18] have been developed. OWL is based on description logics (DL), which are a family of class-based (concept-based) knowledge representation formalisms [19]. They are characterized by the use of various constructors to build complex classes from simpler ones, an emphasis on the decidability of key reasoning problems, and the provision of sound, complete, and (empirically) tractable reasoning services. Description logics, and insights from DL research, had a strong influence on the design of OWL, particularly on the formalization of the semantics, the choice of language constructors, and the integration of data types and data values. For an in-depth overview of OWL, the reader can refer to [19].

Some biomedical datasets such as the Gene Ontology [11], UniProt (http://expasy3.isb-sib.ch/~ejain//rdf/), and the NCI thesaurus [20] have been made available in RDF format. The pathway data exchange standard called BioPAX (http://www.biopax.org/) has been deployed in OWL to standardize the ontological representation of pathway data [21]. Increasingly, pathway databases including HumanCyc [22] and Reactome [23] have exported data in the OWL-based BioPAX format. In addition, applications that demonstrate how to make use of RDF/OWL datasets have been developed (e.g., [24, 25]).
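The triple model described in this section can be illustrated with a minimal Python sketch. It is not an RDF library and not part of the implementation reported here: the `http://example.org/neuro#` namespace, the `match` helper, and the Serotonin statement are assumptions introduced for illustration, while the Dopamine, acetylcholinesterase, and enzyme/protein statements paraphrase the examples in the text.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical namespace used only for this illustration.
EX = "http://example.org/neuro#"

# Each statement is a (subject, predicate, object) triple of URIs.
TRIPLES: List[Triple] = [
    (EX + "Dopamine", EX + "Function", EX + "Neurotransmitter"),
    (EX + "Serotonin", EX + "Function", EX + "Neurotransmitter"),
    (EX + "Acetylcholinesterase", EX + "type", EX + "Enzyme"),
    (EX + "Enzyme", EX + "subClassOf", EX + "Protein"),
]


def match(
    triples: List[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield the triples that agree with every position that is not None."""
    for subj, pred, obj in triples:
        if s is not None and subj != s:
            continue
        if p is not None and pred != p:
            continue
        if o is not None and obj != o:
            continue
        yield (subj, pred, obj)


# "Which resources have Neurotransmitter as the value of Function?"
neurotransmitters = sorted(
    subj for subj, _, _ in match(TRIPLES, p=EX + "Function", o=EX + "Neurotransmitter")
)
print(neurotransmitters)
```

Leaving a position as `None` plays the role of a query variable, which is the same idea behind the `?drug`-style variables in the RDF queries of Section 3.2.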

2.3 Semantic-Web-aware tools

While creating and encoding ontologies in RDF or OWL format is an important first step in the Semantic Web, we need tools that can understand and manipulate data represented in such formats. In the past several years, a large collection of Semantic-Web-aware tools has been developed and made accessible to the public. These tools can be grouped into the following categories.

1. Ontology editors and visualization tools. There are tools that allow the user to create, edit, and visualize RDF/OWL ontologies. Examples include Protégé, WebOnto (http://kmi.open.ac.uk/projects/webonto/), and GrOWL (http://ecoinformatics.uvm.edu/technologies/growl-knowledge-modeler.html). There are also advanced ontology editors that allow alignment and integration of multiple ontologies (e.g., [26]).

2. Ontology parsers. Different parsers have been developed for different programming languages to parse data in RDF or OWL format. For example, PerlRDF is one of the RDF parsers written in Perl (http://www.gingerall.org/perlrdf.html). Jena is a Java framework for building Semantic Web applications (http://jena.sourceforge.net/). It provides a programmatic environment for RDF, RDFS, and OWL.

3. Database and querying tools. To provide persistence, management, and querying capabilities for RDF-/OWL-formatted data, several RDF database systems have been implemented. Among them, OpenRDF Sesame (http://www.openrdf.org/) and Kowari (http://kowari.sourceforge.net/) are open-source RDF database systems, while the Oracle RDF Data Model [27] is part of a commercial product (Oracle Database 10g). Tools such as D2RQ are also available for mapping a relational database structure to the corresponding RDF structure, which can facilitate migration from a relational database to an RDF database.

4. Reasoners. To go beyond data querying, OWL provides DL-based reasoning capability. A number of reasoners have been developed, including Racer [28], Pellet (http://www.mindswap.org/2003/pellet/), and FaCT (http://www.ontoknowledge.org/tools/fact.shtml).

3 Neuroscience Data Integration Use Case

Our use case demonstrates how to use the Oracle RDF Data Model (installed on a Linux server) to build a Semantic Web data warehouse for integrating datasets extracted from two independently developed neuroscience databases, namely, BrainPharm (a subdatabase of SenseLab) and SWAN (Semantic Web Applications in Neuromedicine) [29].

1. BrainPharm. BrainPharm is a database under development to support research on drugs for the treatment of different neurological disorders (http://senselab.med.yale.edu/BrainPharm). It contains pharmacological agents that act on neuronal receptors and signal transduction pathways in the normal brain and in nervous disorders such as Alzheimer’s Disease (AD). It enables searches for drug actions at the level of key molecular constituents, cell compartments, and individual cells, with links to models of these actions.

2. SWAN/Alzforum. SWAN is a project to develop knowledge management tools and resources for Alzheimer's Disease (AD) researchers, based on an ecosystem model of scientific discourse. The SWAN model uses an upper ontology that includes the following components: scientists, experiments, data, grant applications, publications, scientific databases, bibliographic databases, scientific ontologies, biomedical research collaborations, and scientific web communities. SWAN is implemented using Semantic Web technology and represents data in RDF, which is stored and published as part of the Alzheimer Research Forum website (http://www.alzforum.org). Alzforum is a widely used scientific web community which reports on the latest Alzheimer's scientific research, and develops databases of genes, scientific articles, animal models, antibodies, medications, grants, research jobs, and more.

3.1 Data conversion and loading into the Oracle RDF Data Model

As a pilot demonstration, we integrate the drug-related (chemical) information extracted from BrainPharm with SWAN hypotheses and publications extracted from Alzforum. While SWAN data are already available in RDF (in both RDF/XML and N-Triples formats), BrainPharm exports data in its own XML format called EDSP (Electronic DataSet Protocol) [6]. We convert the EDSP/XML format into the corresponding RDF/XML format using XSL Transformation (XSLT). After converting the BrainPharm data into RDF/XML, we load both the SWAN RDF dataset and the BrainPharm RDF dataset into the Oracle RDF Data Model using its data loader program, which takes RDF data in N-Triples format (conversion from RDF/XML to N-Triples is straightforward, as many tools are available for performing such a conversion).
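To make the target of this conversion concrete, the following Python sketch serializes statements in the line-oriented N-Triples syntax, with one `<subject> <predicate> object .` statement per line. It is an illustration only and not the XSLT-based pipeline or the Oracle loader: the resource identifiers `drug_1` and `target_1` are hypothetical, the literal escaping covers only the most common cases, and only the namespace and the property names (`name`, `molecularTarget`) are taken from the queries in Section 3.2.

```python
from typing import List, Tuple

# Namespace that appears in the BrainPharm queries of Section 3.2.
SENSELAB = "http://senselab.med.yale.edu/senselab#"


def escape_literal(text: str) -> str:
    """Escape backslashes, double quotes, and line breaks in a literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriple(subject: str, predicate: str, obj: str, obj_is_literal: bool) -> str:
    """Format one statement as a single N-Triples line."""
    if obj_is_literal:
        obj_term = '"' + escape_literal(obj) + '"'
    else:
        obj_term = "<" + obj + ">"
    return "<" + subject + "> <" + predicate + "> " + obj_term + " ."


# Hypothetical statements; drug_1 and target_1 are made-up identifiers.
statements: List[Tuple[str, str, str, bool]] = [
    (SENSELAB + "drug_1", SENSELAB + "name", "Donepezil", True),
    (SENSELAB + "drug_1", SENSELAB + "molecularTarget", SENSELAB + "target_1", False),
]

lines = [to_ntriple(*statement) for statement in statements]
print("\n".join(lines))
```

Because every statement is a self-contained line, files in this format can be concatenated and loaded in bulk, which is one reason a loader may prefer it over nested RDF/XML.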

3.2 RDF-based queries

Oracle has extended SQL to provide support for an RDF query language, which allows users to perform queries against multiple RDF datasets. The following two examples illustrate how such queries can be made to retrieve and integrate data from BrainPharm and SWAN.

3.2.1 Example 1

In this example, we query BrainPharm to group and count AD drugs based on their molecular targets. The Oracle query syntax is shown below.

SELECT bpharm.uname USAGE, count(bpharm.drugname) NO_OF_DRUGS, bpharm.drugtarget TARGET
FROM TABLE(SDO_RDF_MATCH(
    '(?drug :name ?drugname)
     (?drug :molecularTarget ?mtarget)
     (?mtarget :name ?drugtarget)
     (?drug :clinicalUsage ?usage)
     (?usage :name ?uname)',
    SDO_RDF_Models('brainpharm'),
    null,
    SDO_RDF_Aliases(SDO_RDF_Alias('', 'http://senselab.med.yale.edu/senselab#')),
    null)) bpharm
WHERE lower(bpharm.uname) like 'alzheimer%'
  AND (lower(bpharm.drugtarget) like '%acetylcholinesterase%'
       OR lower(bpharm.drugtarget) like '%nmda%')
GROUP BY bpharm.drugtarget, bpharm.uname;

The output of this query indicates that there are 2 groups of drugs for AD. The first contains 5 drugs that act as acetylcholinesterase inhibitors. The second contains only 1 drug, which is an N-methyl-D-aspartic acid (NMDA) receptor antagonist. The query demonstrates how to make use of the “GROUP BY” feature (which is supported by standard SQL) to perform aggregation on the data. Current implementations of RDF query languages (RQL) by specialized RDF stores do not support aggregate functions (e.g., COUNT, SUM, and AVERAGE) via “GROUP BY”, although some of these functions are being included in the standard specifications of SPARQL, the RDF counterpart of SQL (http://www.w3.org/TR/rdf-sparql-query). The Oracle RDF query language supports such functions, as it is a hybrid between RQL and SQL.

3.2.2 Example 2

This example shows how a complex query can be formulated to integrate BrainPharm’s drug data and SWAN’s publication data. In particular, the query retrieves the information (stored in BrainPharm) about the AD drug “Donepezil” and the publications (stored in SWAN) whose titles or abstracts contain the term “Donepezil” (case-insensitive). In addition, the query demonstrates the use of RDF inferencing based on the parent-child (is-a) relationship between the Publication class (e.g., original articles retrieved from PubMed) and the ARFPublication class (e.g., PubMed articles that have been commented on by researchers/curators associated with Alzforum). This relationship is defined in the SWAN RDF Schema, which declares ARFPublication to be a subclass of Publication.

The Oracle RDF Data Model, with its support of RDFS, allows us to create rules for hierarchical relationships from the RDFS for the data. Therefore, it enables us to find all the publications, including the ARF publications, that are related to AD drugs (e.g., Donepezil). Although the example here has only two levels, such is-a inference is performed recursively for any number of levels. This type of inference is non-trivial for the relational approach. The syntax of this example query is shown below.

SELECT drugs.drugname drug_name, drugs.description drug_desc,
       journals.pname pub_person, journals.jname pub_name, journals.title pub_title
FROM (
    SELECT bpharm.drugname drugname, bpharm.drugdesc description, bpharm.drugtarget target
    FROM TABLE(SDO_RDF_MATCH(
        '(?drug :name ?drugname)
         (?drug :description ?drugdesc)
         (?drug :clinicalUsage ?usage)
         (?drug :molecularTarget ?mtarget)
         (?mtarget :name ?drugtarget)
         (?usage :name ?uname)',
        SDO_RDF_Models('brainpharm'),
        null,
        SDO_RDF_Aliases(SDO_RDF_Alias('', 'http://senselab.med.yale.edu/senselab#')),
        null)) bpharm
    WHERE lower(bpharm.drugname) = 'donepezil'
) drugs, (
    SELECT distinct swan.title title, swan.abstract abstract,
           swan.lastname || ', ' || swan.firstname pname,
           swan.journal || ' vol.' || swan.volume jname
    FROM TABLE(SDO_RDF_MATCH(
        '(?pub :title ?title)
         (?pub :abstract ?abstract)
         (?pub :createdBy ?creator)
         (?pub :journal ?journal)
         (?pub :volume ?volume)
         (?creator :lastName ?lastname)
         (?creator :firstName ?firstname)
         (?pub rdf:type :Publication)',
        SDO_RDF_Models('swan'),
        SDO_RDF_Rulebases('RDFS'),
        SDO_RDF_Aliases(SDO_RDF_Alias('', 'http://purl.org/swan/0.1#')),
        null)) swan
) journals
WHERE (lower(journals.title) like '%' || lower(drugs.drugname) || '%'
       OR lower(journals.abstract) like '%' || lower(drugs.drugname) || '%');

In the SWAN dataset, publications (e.g., those retrieved from PubMed) are treated as instances of the Publication class. We define publications that have been commented on at Alzforum as instances of the ARFPublication class. With the is-a inference rule incorporated, the query finds a total of 19 publications that are linked to claims and/or hypotheses concerning the effect of Donepezil on AD treatment. One of these 19 publications belongs to the ARFPublication class (i.e., it is ARF-commented). Given the ID (PubMed ID) of the commented publication, the user can retrieve the detailed comments through the Alzforum Web site.
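The is-a inference used in this example can be sketched in a few lines of Python. This is an illustration of the general RDFS subclass rule (an instance of a class is also an instance of every superclass of that class), not Oracle's rulebase implementation. The resource identifiers and the `CommentedReview` class are hypothetical, added only to show that the inference follows more than two levels; `Publication` and `ARFPublication` are the classes named in the text.

```python
from typing import Dict, List, Set, Tuple

# Direct subclass links (child -> parents). CommentedReview is hypothetical.
SUBCLASS_OF: Dict[str, Set[str]] = {
    "ARFPublication": {"Publication"},
    "CommentedReview": {"ARFPublication"},
}

# Explicit type assertions (resource, class); identifiers are made up.
TYPE_ASSERTIONS: List[Tuple[str, str]] = [
    ("pub:1", "Publication"),
    ("pub:2", "ARFPublication"),
    ("pub:3", "CommentedReview"),
    ("person:1", "Person"),
]


def superclasses(cls: str, subclass_of: Dict[str, Set[str]]) -> Set[str]:
    """Return every class reachable from cls by following subclass links."""
    seen: Set[str] = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def instances_of(
    target: str,
    assertions: List[Tuple[str, str]],
    subclass_of: Dict[str, Set[str]],
) -> List[str]:
    """Resources typed as target directly or through any chain of subclasses."""
    return sorted(
        resource
        for resource, cls in assertions
        if cls == target or target in superclasses(cls, subclass_of)
    )


print(instances_of("Publication", TYPE_ASSERTIONS, SUBCLASS_OF))
```

Without the inference step, a pattern such as `(?pub rdf:type :Publication)` would match only resources explicitly typed as Publication; with it, instances of every subclass are returned as well.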

4 Conclusion and Future Directions

In this paper, we have shown how to use the Oracle RDF Data Model technology to create a Semantic Web data warehouse that integrates data (related to Alzheimer’s Disease) extracted from BrainPharm and SWAN. We have also demonstrated the RDF querying and inferencing features provided by the Oracle Database, including support for data aggregation functions and semantic inference rules. Our Semantic Web approach extends the data warehouse solution by providing a richer semantic representation and inferencing capability. It can also be adapted to other integration solutions involving many different types of neuroscience information. Adding an RDF module to an existing relational database can simplify the task of relational-to-RDF migration. This is important, as many existing neuroscience databases are in relational form. As data integration is one of the most important problems in e-Neuroscience, this paper shows how the Semantic Web approach can help address this problem. To increase the use of the Semantic Web in e-Neuroscience, we suggest the following future directions for our work.

1. User-friendly query interface. We will build a Web-based application that provides an intuitive interface (including query forms) to allow the user to easily retrieve data from the data warehouse.

2. Increase of overlap. To support better integrative neuroscience research, we need to increase the ontological and data linkage between BrainPharm and SWAN. While we are in the process of enhancing the ontological representation of BrainPharm and SWAN, more AD-related data are being added to the two databases. The goal is to strengthen the complementary relationship between BrainPharm and SWAN.

3. OWL support. The Oracle RDF Data Model currently supports only RDF/RDFS, although it accepts OWL data for storage. OWL-based inferencing cannot be performed directly, but it can be supported indirectly through Protégé, for instance, as there are Oracle RDF Data Model and Racer plugins. Alternatively, we can export the data to files in OWL format, which can then be input into OWL reasoners. For better convenience and performance, it would also be useful for the Oracle RDF Data Model to have direct or native support for OWL in the future.

4. Query mediation. The data integration system we demonstrated is an example of the data warehouse approach. It would be important to explore the query mediation approach (database federation) in the neuroscience domain [30], especially when a large number of neuroscience databases are involved. Such exploration efforts have begun in the computer science research community (e.g., [31]). In the life science domain, Stephens et al. have demonstrated how to build a federated database using Cerebra Server (http://www.cerebra.com/) for integrating drug safety data [32]. Cerebra Server supports OWL-based inferencing and can mediate queries against multiple data sources, including RDF databases.

Acknowledgments. This work was supported in part by NIH grants K25 HG02378, P01 DC04732, T15 LM 07056, P20 LM07253, NSF grant DBI-0135442, and Ellison Medical Foundation.

References

1. T. Hey, A.E. Trefethen. 2005. Cyberinfrastructure for e-Science. Science. 308(5723): 817-21.
2. M.E. Martone, A. Gupta, M.H. Ellisman. 2004. E-neuroscience: challenges and triumphs in integrating distributed data from molecules to brains. Nat Biotechnol. 7(5): 467-72.
3. M.F. Huerta, S.H. Koslow, A.I. Leshner. 1993. The Human Brain Project: an international resource. Trends Neurosci. 16(11): 436-8.
4. M.E. Martone, S. Zhang, A. Gupta, X. Qian, H. He, D.L. Price, M. Wong, S. Santini, M.H. Ellisman. 2003. The cell-centered database: a database for multiscale structural and protein localization data from light and electron microscopy. Neuroinformatics. 1(4): 379-96.
5. J. Dyhrfjeld-Johnsen, J. Maier, D. Schubert, J. Staiger, H.J. Luhmann, K.E. Stephan, R. Kotter. 2005. CoCoDat: a database system for organizing and selecting quantitative data on single neurons and neuronal microcircuitry. Journal of Neuroscience Methods. 141(2): 291-308.
6. L. Marenco, N. Tosches, C. Crasto, G. Shepherd, P.L. Miller, P.M. Nadkarni. 2003. Achieving Evolvable Web-Database Bioscience Applications Using the EAV/CR Framework: Recent Advances. J Am Med Inform Assoc. 10(5): 444-453.
7. T. Gruber. 1993. Ontolingua: a translation approach to providing portable ontology specifications. Knowledge Acquisition. 5(2): 199-200.
8. T. Berners-Lee, J. Hendler, O. Lassila. 2001. The semantic web. Scientific American. 284(5): 34-43.
9. J. Caroll, G. Klyne, Resource Description Framework (RDF): Concepts and Abstract Syntax. W3C Recommendation (10 February 2004) (http://www.w3.org/TR/rdf-concepts/). 2004.
10. D.L. McGuinness, F.v. Harmelen, OWL Web Ontology Overview. W3C Recommendation (10 February 2004) (http://www.w3.org/TR/owl-features/). 2004.
11. M. Ashburner, C. Ball, J. Blake, D. Botstein, H. Butler, M. Cherry, A. Davis, K. Dolinski, S. Dwight, J. Eppig, et al. 2000. Gene ontology: tool for the unification of biology. Nature Genetics. 25: 25-29.
12. P. Jaiswal, S. Avraham, K. Ilic, E.A. Kellogg, S. McCouch, A. Pujar, L. Reiser, S.Y. Rhee, M.M. Sachs, M. Schaeffer, et al. 2005. Plant Ontology (PO): a controlled vocabulary of plant structures and growth stages. Comparative and Functional Genomics. 6: 388-97.
13. J. Cimino, R. Sideli. 1991. Using the UMLS to bring the library to the bedside. Med Decis Making. 11(4 Suppl): S116-20.
14. P. Burek, R. Hoehndorf, F. Loebe, J. Visagie, H. Herre, J. Kelso. 2006. A top-level ontology of functions and its application in the Open Biomedical Ontologies. Bioinformatics. 22(14): 66-73.
15. X. Wang, R. Gorlitsky, J.S. Almeida. 2005. From XML to RDF: how semantic web technologies will change the design of 'omic' standards. Nat Biotechnol. 23(9): 1099-103.
16. M. Hucka, A. Finney, H. Sauro, H. Bolouri, J. Doyle, H. Kitano, A. Arkin, B. Bornstein, D. Bray, A. Cornish-Bowden, et al. 2003. The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics. 19(4): 524-31.
17. H. Hermjakob, L. Montecchi-Palazzi, G. Bader, J. Wojcik, L. Salwinski, A. Ceol, S. Moore, S. Orchard, U. Sarkans, C.v. Mering, et al. 2004. The HUPO PSI's Molecular Interaction format - a community standard for the representation of protein interaction data. Nature Biotechnology. 22: 177-83.
18. H. Donis-Keller, P. Green, C. Helms, S. Cartinhour, B. Weiffenbach, K. Stephens, T. Keith, D. Bowden, D. Smith, E. Lander, et al. 1987. A Genetic Linkage Map of the Human Genome. Cell. 51: 319-337.
19. F. Baader, D. Calvanese, D. McGuinness, D. Nardi, P. Patel-Schneider. The Description Logic Handbook. 2002: Cambridge University Press.
20. J. Goldbeck, G. Fragoso, F. Hartel, J. Hendler, B. Parsia, J. Oberthaler. 2003. The national cancer institute's thesaurus and ontology. Journal of Web Semantics. 1(1).
21. J.S. Luciano. 2005. PAX of mind for pathway researchers. Science. 10(13): 937-42.
22. P. Romero, J. Wagg, M. Green, D. Kaiser, M. Krummenacker, P. Karp. 2004. Computational prediction of human metabolic pathways from the complete human genome. Genome Biol. 6(1): R2.
23. G. Joshi-Tope, M. Gillespie, I. Vastrik, P. D'Eustachio, E. Schmidt, B. de Bono, B. Jassal, G.R. Gopinath, G.R. Wu, L. Matthews, et al. 2005. Reactome: a knowledgebase of biological pathways. Nucleic Acids Res. 33(Database issue): D428-32.
24. K.-H. Cheung, K.Y. Yip, A. Smith, R. deKnikker, A. Masiar, M. Gerstein. 2005. YeastHub: a semantic web use case for integrating data in the life sciences domain. Bioinformatics. 21(suppl_1): i85-96.
25. E.K. Neumann, D. Quan. Biodash: A Semantic Web Dashboard for Drug Development. in Pacific Symposium on Biocomputing. 2006: World Scientific 176-87.
26. S. Aitken, R. Korf, B. Webber, J. Bard. 2005. COBrA: a bio-ontology editor. Bioinformatics. 21(6): 825-6.
27. E.I. Chong, S. Das, G. Eadon, J. Srinivasan. An efficient SQL-based RDF querying scheme. in VLDB. 2005. Trondheim, Norway.
28. V. Haarslev, R. Moeller, M. Wessel. Querying the Semantic Web with Racer + nRQL. in Proceedings of the KI-04 Workshop on Applications of Description Logics. 2004. Ulm, Germany: Deutsche Bibliothek.
29. Y. Gao, J. Kinoshita, E. Wu, E. Miller, R. Lee, A. Seaborne, S. Cayzer, T. Clark. 2006. SWAN: A Distributed Knowledge Infrastructure for Alzheimer Disease Research. Journal of Web Semantics. 4(3).
30. B. Ludascher, A. Gupta, M.E. Martone. Model-based Mediation with Domain Maps. in 17th International Conference on Data Engineering (ICDE 01). 2001. Heidelberg, Germany 81-90.
31. H. Chen, Z. Wu, H. Wang, Y. Mao. RDF/RDFS-based Relational Database Integration. in ICDE. 2006. Atlanta, Georgia 94.
32. S. Stephens, A. Morales, M. Quinian. 2006. Applying semantic web technologies to drug safety determination. IEEE Intelligent Systems. 21(1): 82-6.
