BioSilico: an integrated metabolic database system - Semantic Scholar

2 downloads 1356 Views 108KB Size Report
ating system, and run on JDK 1.4.0 Java Virtual Machine. The generated codes are written in Java, MySQL (version. 3.23.47) and JavaServer Pages (JSP).
BIOINFORMATICS APPLICATIONS NOTE

Vol. 20 no. 17 2004, pages 3270–3272 doi:10.1093/bioinformatics/bth363

BioSilico: an integrated metabolic database system Bo Kyeng Hou1,† , Jin Sik Kim2,† , Ji Hoon Jun3 , Dong-Yup Lee1 , Yong Wook Kim3 , Sujin Chae4 , Mira Roh3 , Yong-Ho In3 and Sang Yup Lee1,2,∗ 1 Department

of BioSystems and Bioinformatics Research Center and 2 Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering, BioProcess Engineering Research Center, Korea Advanced Institute of Science and Technology, 373-1 Guseong-dong, Yuseong-gu, Daejeon 305-701, Republic of Korea, 3 IDRTech Inc., C-2711, Kolontripolis 1 210, Kumgok-dong, Bundang-gu, Sungnam 463-805, Republic of Korea and 4 Department of Biological Chemistry, College of Pharmacy, Seoul National University, San 56-1 Sillim-dong, Gwanak-gu, Seoul 151-742, Republic of Korea

ABSTRACT Summary: BioSilico is a web-based database system that facilitates the search and analysis of metabolic pathways. Heterogeneous metabolic databases including LIGAND, ENZYME, EcoCyc and MetaCyc are integrated in a systematic way, thereby allowing users to efficiently retrieve the relevant information on enzymes, biochemical compounds and reactions. In addition, it provides well-designed view pages for more detailed summary information. BioSilico is developed as an extensible system with a robust systematic architecture. Availability: BioSilico database system is accessible at http://biosilico.kaist.ac.kr Contact: [email protected]

and MetaCyc (http://www.metacyc.org), which enable users to search for metabolic pathway information and to reconstruct reaction network of their own. However, these databases possess different data collections, which will become much more usable if they are unified into composite resources with common interfaces. Furthermore, a number of discordances are observed among the data contents stored in these databases, which will likely to become more severe as the amount of data increases (Stein, 2003). Therefore, the integration of such public databases is indeed required to properly handle the redundant and heterogeneous data sources scattered over different sites. Such a challenge is addressed in this study by establishing an integrated database system, BioSilico, for an easier navigation of metabolic information.

INTRODUCTION Recent years have seen an explosion in the amount of biological data and information. This has resulted in the development of various databases containing biological information at different levels ranging from sequences to pathways (Baxevanis, 2003). Of them, metabolic databases provide information on metabolic pathways, reactions and enzymes which are essential to the elucidation of the metabolic and physiological characteristics of organisms. Currently, there are several public databases available. They include KEGG/LIGAND (http://www.genome.ad.jp/ligand), ENZYME (http://www.expasy.ch/enzyme), BRENDA (http:// www.brenda.uni-koeln.de), EcoCyc (http://www.ecocyc.org) ∗ To

whom correspondence should be addressed.

† The

authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.

3270

DESCRIPTION OF THE DATABASE AND SYSTEM ARCHITECTURE BioSilico is a web-based database system that provides integrated access to public metabolic databases available. An integrated data structure and a variety of querying logics are developed to allow easy and efficient retrieval of various data in the metabolic network. The system is installed on the IBM RS/6000 server running the IBM AIX 5.0L operating system, and run on JDK 1.4.0 Java Virtual Machine. The generated codes are written in Java, MySQL (version 3.23.47) and JavaServer Pages (JSP). Two Marvin Java applets (MarvinSketch and MarvinView) and JChem libraries from ChemAxon, Inc. (http://www.chemaxon.com) are also part of the system. In this way, the BioSilico system is supported by a database of chemical structures (Chem DB), and its web pages are dynamically generated through JSP scripts (Fig. 1).

Bioinformatics vol. 20 issue 17 © Oxford University Press 2004; all rights reserved.

Downloaded from bioinformatics.oxfordjournals.org by guest on September 18, 2011

Received on March 12, 2004; revised on April 14, 2004; accepted on June 10, 2004 Advance Access publication June 17, 2004

BioSilico: an integrated metabolic database system

Downloaded from bioinformatics.oxfordjournals.org by guest on September 18, 2011

Fig. 1. Overview of system architecture of BioSilico. Public databases are systematically integrated within a uniform and web-searchable framework, allowing easy retrieval of the relevant information on metabolic pathways, reactions, and enzymes and metabolites involved within.

RELATIONAL STRUCTURE FOR DATA INTEGRATION A unified data model is developed for integrating the public source databases, LIGAND, ENZYME, EcoCyc and MetaCyc that are heterogeneously described according to their own schemas. Entities and attributes were first extracted from flatfiles of the public databases, thus resulting in a relational structure of data. On the basis of resultant relations, a logical schema was then defined for efficiently querying the relevant information and for displaying the results in various ways. It should be noticed that the databases are continuously renewed by DB updating system, stored in a local server and parsed into the relational structure according to the logical schema.

The relations in BioSilico are mainly based on the information of Enzyme Commission (EC) number, enzyme, compound and metabolic reaction. Describing briefly, the ENZYME, LIGAND::enzyme, EcoCyc::reaction and MetaCyc::reaction are linked together by using the Enzyme Commission (EC) number. Additionally, the data stored in ENZYME and LIGAND are linked to the EcoCyc::enzyme, EcoCyc::compound, MetaCyc::enzyme and MetaCyc::compound by using EcoCyc::reaction and MetaCyc::reaction. Other information is linked to one another using their own unique IDs. At the time of this writing, BioSilico mainly contains information on 9994 enzymes, 13 554 compounds and 12 248 metabolic reactions with duplications allowed.

3271

B.K.Hou et al.

ACCESSING THE DATABASE VIA WEBSITE—QUERYING LOGICS AND PROCESSING INTERFACES

SUMMARY AND FUTURE DEVELOPMENT BioSilico can be utilized as a robust and integrated database system for the analysis of metabolic pathways and reactions, and enzymes and compounds involved within. It enables

3272

ACKNOWLEDGEMENTS This work was supported by the Advanced Backbone IT Technology Development Project (IMT2000-C3-1) and the Brain Korea 21 Project. Hardware support by the IBM-SUR program is greatly appreciated.

REFERENCES Baxevanis,A.D. (2003) The molecular biology database collection: 2003 update. Nucleic Acids Res., 31, 1–12. Lee,D.-Y., Yun,H., Park,S. and Lee,S.Y. (2003) MetaFluxNet: the management of metabolic reaction information and quantitative metabolic flux analysis. Bioinformatics, 19, 2144–2146. Stein,L.D. (2003) Integrating biological databases. Nat. Rev. Genet., 4, 337–345.

Downloaded from bioinformatics.oxfordjournals.org by guest on September 18, 2011

In BioSilico, various querying logics and user-friendly interfaces are designed to efficiently retrieve the relevant information on enzymes, biochemical compounds and biological reactions from the integrated database. Queries using EC number, name, formula and Chemical Abstracts Service Registry (CAS) number simultaneously examine all the classes of the database; the matched entries are collected; and a view page of query results is dynamically generated, allowing access to more detailed information using the hyperlinks provided. In addition, a Java Applet for chemical structure search allows users to either draw the compound to be searched or enter its SMILES string, resulting in the list of matching compounds satisfying the selected search type (similarity, substructure or exact). Clicking on a returned structure of the compound from Chem DB leads to the corresponding BioSilico compound page. Consequently, one query renders it possible to efficiently search for the entire classes of the integrated database, retrieve the relevant information and finally to display welldesigned view pages automatically for more detailed summary information.

the cross-checking of data present in the source metabolic databases, which are now integrated within BioSilico. It will become more and more useful as the amount of information stored in the source databases increases. We also have a plan to enhance the user interface to make it possible for the users to design and analyze the metabolic pathways of interest. To this end, we are developing standardized interfaces linking BioSilico with other analysis tools or systems including a web-based dynamic simulation environment and our standalone metabolic flux analysis program package MetaFluxNet (Lee et al., 2003). In the future, users will be able to quantitatively and qualitatively analyze the metabolic pathways via the web interface under this integrated environment.

Suggest Documents