Genome Informatics 13: 486–487 (2002)
Development of Database Systems by Integrating Heterogeneous Metabolic Databases

Jin Sik Kim 1 ([email protected])
Sung Ho Yoon 1 ([email protected])
Doheon Lee 3 ([email protected])
Sang Yup Lee 1,2 ([email protected])

1 Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology, 373-1 Guseong-dong Yuseong-gu, Daejeon 305-701, Republic of Korea
2 Bioinformatics Research Center, Korea Advanced Institute of Science and Technology, 373-1 Guseong-dong Yuseong-gu, Daejeon 305-701, Republic of Korea
3 Laboratory of Bio-information Systems, Department of BioSystems, Korea Advanced Institute of Science and Technology, 373-1 Guseong-dong Yuseong-gu, Daejeon 305-701, Republic of Korea
Keywords: database integration, metabolism, relational model
1 Introduction
In recent years, research in many fields has generated a tremendous amount of biological data. The accumulation of raw and processed data makes the data increasingly difficult to handle, so an efficient computational system for integrating biological data is needed. Bergholz et al. developed a model for integrating sequence databases using a relational database approach [2]. Developing local systems for the analysis of genomic and metabolic information still poses many challenges [9, 10]. Our final objective is to develop a system that simulates a dynamic model of metabolism and presents the metabolic information requested by users. In this paper, we report a relational model for an integrated metabolic database.
2 Method and Results
2.1 Target Databases for the Integration
We analyzed three databases that mainly deal with metabolic information, such as pathways and the enzymatic reactions involved in them: LIGAND [3, 13], ENZYME [1, 12] and ECOCYC [7, 11]. The aim was to integrate their metabolic information in a single relational model. The flat files of each database were downloaded from the public FTP sites.
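As an illustration of how such a flat file might be read before loading it into tables, the following is a minimal sketch, not the parser used in this work. It assumes an ENZYME-style layout in which each line starts with a two-letter code (ID, DE, AN, CA), the value begins at column 6, and entries end with a line containing "//". The function name parse_enzyme_flat and the abbreviated SAMPLE text are hypothetical and serve only as an example.

```python
from typing import Dict, Iterator, List


def parse_enzyme_flat(text: str) -> Iterator[Dict[str, List[str]]]:
    """Yield one dict per entry, mapping two-letter line codes to their values.

    Assumption: the value of each line starts at column 6 and every
    entry is terminated by a line that starts with '//'.
    """
    entry: Dict[str, List[str]] = {}
    for line in text.splitlines():
        if line.startswith("//"):
            if entry:  # skip empty entries (e.g. consecutive terminators)
                yield entry
            entry = {}
            continue
        code, value = line[:2], line[5:].strip()
        if code.strip() and value:
            entry.setdefault(code, []).append(value)
    if entry:  # tolerate a missing final terminator
        yield entry


# Abbreviated, hand-written sample used only for illustration.
SAMPLE = """\
ID   1.1.1.1
DE   Alcohol dehydrogenase.
AN   Aldehyde reductase.
CA   An alcohol + NAD(+) = an aldehyde or ketone + NADH.
//
ID   1.1.1.2
DE   Alcohol dehydrogenase (NADP(+)).
//
"""

entries = list(parse_enzyme_flat(SAMPLE))
for e in entries:
    print(e["ID"][0], e["DE"][0], e.get("AN", []))
```

Keeping every line code as a list of values leaves room for repeated lines, such as several AN (alternative name) lines in one entry.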
2.2 Analysis of the Data - Searching for the Relation
We found that the data structures of the ECOCYC database differ in many respects from those of the other two databases and have characteristics of their own [4, 5, 6]. In contrast to ECOCYC, LIGAND and ENZYME share a part related to enzyme information. To find the similarities and differences in the definitions of data types, we described in detail the attributes contained in the data files.
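One simple way to organize such an attribute comparison is to map the fields of each source to common concept names and then compare the resulting sets. The sketch below is only an illustration under stated assumptions: the field lists in FIELD_CONCEPTS and the concept names are hypothetical examples, not a verified or complete description of the ENZYME and LIGAND files.

```python
from typing import Dict, Set

# Hypothetical mapping: source -> {field in the flat file: shared concept name}.
FIELD_CONCEPTS: Dict[str, Dict[str, str]] = {
    "ENZYME": {
        "ID": "ec_number",
        "DE": "enzyme_name",
        "AN": "synonym",
        "CA": "reaction",
    },
    "LIGAND": {
        "ENTRY": "ec_number",
        "NAME": "enzyme_name",
        "REACTION": "reaction",
        "SUBSTRATE": "substrate",
        "PRODUCT": "product",
    },
}


def concepts(source: str) -> Set[str]:
    """Return the set of concepts that a source provides."""
    return set(FIELD_CONCEPTS[source].values())


shared = concepts("ENZYME") & concepts("LIGAND")
only_enzyme = concepts("ENZYME") - concepts("LIGAND")
only_ligand = concepts("LIGAND") - concepts("ENZYME")

print("shared:", sorted(shared))
print("ENZYME only:", sorted(only_enzyme))
print("LIGAND only:", sorted(only_ligand))
```

Concepts in the shared set are candidates for common attributes in the integrated model, while the source-specific ones indicate where additional entities or optional attributes may be needed.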
2.3 Construction of Relational Model - ER (Entity-Relationship) Model Description and Table Formation
The Entity-Relationship model was defined from the results of this analysis. Many candidate models were possible for this system, but only a small number of efficient candidates satisfied the requirements of the database system. Guided by a set of basic candidate queries, a specific data field was divided into more than one attribute to represent key attributes such as the EC number. We generated entities based on the purpose of the database. The enzyme information occupies the most important position in the relations. It contains the EC number and the name of the enzyme. To compensate for the variability in enzyme names, a foreign entity was defined using synonym information. Along with the enzyme information, the enzymatic reaction plays an important role in simulating the dynamic behavior of metabolic pathways. Various parameters required to develop a kinetic equation were included in the reaction information. Other entities were defined in a similar manner. Once the entity-relationship model has been described, converting it to a relational model is straightforward. The relational model was represented by tables. Each table contains attributes, data types, sizes, multiplicities and keys so that its data can be related efficiently to other tables.
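The table design described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration under stated assumptions, not the schema developed in this work. All table and column names (enzyme, enzyme_synonym, reaction, ec_class, km, vmax and so on) are hypothetical, and the two kinetic columns merely stand in for the unspecified kinetic parameters.

```python
import sqlite3

# Hypothetical schema: the EC number is kept whole as the key and also
# split into four attributes so that queries by class are simple.
SCHEMA = """
CREATE TABLE enzyme (
    ec_number      TEXT PRIMARY KEY,
    ec_class       INTEGER NOT NULL,
    ec_subclass    INTEGER,
    ec_subsubclass INTEGER,
    ec_serial      INTEGER,
    name           TEXT NOT NULL
);
CREATE TABLE enzyme_synonym (
    ec_number TEXT NOT NULL REFERENCES enzyme(ec_number),
    synonym   TEXT NOT NULL,
    PRIMARY KEY (ec_number, synonym)
);
CREATE TABLE reaction (
    reaction_id INTEGER PRIMARY KEY,
    ec_number   TEXT NOT NULL REFERENCES enzyme(ec_number),
    equation    TEXT NOT NULL,
    km          REAL,  -- placeholder kinetic parameter (assumption)
    vmax        REAL   -- placeholder kinetic parameter (assumption)
);
"""


def split_ec(ec_number):
    """Split 'a.b.c.d' into four integers; non-numeric parts such as '-' become None."""
    parts = ec_number.split(".")
    if len(parts) != 4:
        raise ValueError(f"not a four-part EC number: {ec_number!r}")
    return tuple(int(p) if p.isdigit() else None for p in parts)


def add_enzyme(conn, ec_number, name, synonyms=()):
    """Insert one enzyme row plus one row per synonym."""
    conn.execute(
        "INSERT INTO enzyme VALUES (?, ?, ?, ?, ?, ?)",
        (ec_number, *split_ec(ec_number), name),
    )
    conn.executemany(
        "INSERT INTO enzyme_synonym VALUES (?, ?)",
        [(ec_number, s) for s in synonyms],
    )


def find_by_name(conn, term):
    """Return EC numbers whose name or any synonym equals the search term."""
    rows = conn.execute(
        "SELECT DISTINCT e.ec_number FROM enzyme e "
        "LEFT JOIN enzyme_synonym s ON s.ec_number = e.ec_number "
        "WHERE e.name = ? OR s.synonym = ? ORDER BY e.ec_number",
        (term, term),
    ).fetchall()
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the REFERENCES clauses
conn.executescript(SCHEMA)
add_enzyme(conn, "1.1.1.1", "Alcohol dehydrogenase", ["Aldehyde reductase"])
add_enzyme(conn, "2.7.1.1", "Hexokinase")
conn.commit()

print(find_by_name(conn, "Aldehyde reductase"))
```

Storing the four parts of the EC number as separate columns lets a query such as "all enzymes of class 1" use a plain equality test, while the synonym table lets a lookup succeed for either the main name or an alternative one.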
3 Conclusion
We have developed an integrated database schema for the efficient integration of three heterogeneous metabolic databases. Integrating heterogeneous databases into a structured system requires considering the schema conflicts that may occur among data stored in the separate databases [8]. If the contents or the objectives of the database change, the entire schema of the database may have to be changed.
4 Acknowledgments
This work was supported by the Advanced Backbone IT Technology Development Project (IMT2000C3-1) of the Ministry of Information and Communication (MIC) and Korean Ministry of Science and Technology (MOST) and by the National Research Laboratory Program of the MOST, and by the Brain Korea 21 Project.
References
[1] Bairoch, A., The ENZYME database in 2000, Nucleic Acids Res., 28(1):304–305, 2000.
[2] Bergholz, A., Heymann, S., Schenk, J.A., and Freytag, J.C., Biological sequences integrated: A relational database approach, Acta Biotheoretica, 49:145–149, 2001.
[3] Goto, S., Okuno, Y., Hattori, M., Nishioka, T., and Kanehisa, M., LIGAND: Database of chemical compounds and reactions in biological pathways, Nucleic Acids Res., 30(1):402–404, 2002.
[4] Karp, P.D., A strategy for database interoperation, Journal of Computational Biology, 2:573–586, 1996.
[5] Karp, P.D., An ontology for biological function based on molecular interactions, Bioinformatics, 16:269–285, 2000.
[6] Karp, P.D., Pathway databases: A case study in computational symbolic theories, Science, 293:2040–2044, 2001.
[7] Karp, P.D., Riley, M., Saier, M., Paulsen, I.T., Collado-Vides, J., Paley, S.M., Pellegrini-Toole, A., Bonavides, C., and Gama-Castro, S., The EcoCyc database, Nucleic Acids Res., 30(1):56–58, 2002.
[8] Lim, E.-P. and Chiang, R.H.L., The integration of relationship instances from heterogeneous databases, Decision Support Systems, 29:153–167, 2000.
[9] Navathe, S.B. and Kogelnik, A.M., The challenges of modeling biological information for genome databases, Conceptual Modeling, LNCS 1565:168–182, 1999.
[10] Rojas, I., Bernardi, L., Ratsch, E., Kania, R., Wittig, U., and Saric, J., A database system for the analysis of biochemical pathways, In Silico Biology, 2:75–86, 2002.
[11] http://www.ecocyc.org/
[12] http://www.expacy.ch/enzyme/
[13] http://www.genome.ad.jp/ligand/