A Grid Environment for Data Integration of Scientific Databases

Recommend Documents

Semantic Data Integration Environment for Biomedical Research ...

of Ontological Framework on top of the data integration environment which is .... edges, with appropriate attributes for each node and edge. .... Figure 6: âVisAntBuilderâ GUI. Provide an ..... Grethe JS, Baru C, Gupta A, James M, Ludaescher B,.

Semantic Data Integration Environment for ... - Semantic Scholar

specialist who works with the domain scientists to capture the requirements of the .... relevant data retrieval using ontology concept to concept paths, ontology to ...

Semantically Based Data Integration Environment for Biomedical ...

This paper presents an overview of the data integration mediation system developed as part of the. Biomedical Informatics Research Network (BIRN;.

A Generic Architecture for Sensor Data Integration with the GRID

suitable for directly hosting Grid Services given current technologies. However, in some cases they are capable of hosting Grid Services directly, while in.

A Generic Architecture for Sensor Data Integration with the GRID

Medical Devices for Everyday Healthâ [13]. It aims to create tools and exercise strategies for unobtrusive medical monitoring. Professor Laybourn-Parry and her ...

A Methodology for GIS Databases Integration - CiteSeerX

for the development of tools to help the integration. The existing methodologies ... This methodology has wide application, contemplates all the characteristics of ...

Characteristics of Scientific Databases

bases have additional stages of data collection and analysis that introduce more complexity and chal- lenges. In section 2. different types of scientific databases.

Semantic Integration of File-based Data for Grid Services

Semantic Integration of File-based Data for Grid Services. Andrew Woolf1, Ray ... at providing a semantically rich data representation layer for service-level ...

Integration of Substation Data - Smart Grid Center - Texas A&M ...

D. Sevcik is with CenterPoint Energy, PO Box 1700, Houston, TX 77251. USA (e-mail: .... Custom software application implemented in this example to handle ...

Distributed Data Mining On Grid Environment

to a single problem at the same time - usually to a scientific or technical problem ..... To study the enhancement of the execution time for the cross validation data ...

DATA MANAGEMENT SYSTEM: Grid enabled environment ... - progress

data accessed from the Internet. The next function is mirroring the external database. In this paper we present the DMS architecture and the functionality of its ...

Network Storage Management in Data Grid Environment

uted storage system, which can be utilized in the Grid environment. NSM architecture ... Data intensive applications, such as experimental analysis, simulations and visualizations ... To enable dependable service, NSM uses coding to add.

Quorum Based Data Replication in Grid Environment

in a logical 2D mesh structure and by using quorums and voting techniques to ... read and write operations is consistency, by proofing the quorum must not have.

098-2009: Data Integration in a Grid-Enabled ... - SAS Support

SAS Marketing Automation. SAS Marketing Optimization. Any SAS program. Figure 4. Products in SAS That Can Exploit SAS Grid Computing. DISTRIBUTED ...

Scientific Software and Databases

Use „Insert symbol“ for Greek letters instead switching to Symbol font ... Other useful features: Count words, characters. 16 .... Plugins for Word and OpenOffice.

Semantically Based Data Integration Environment ... - Semantic Scholar

Semantically Based Data Integration Environment for Biomedical Research ..... keeps a list of all the relations and functions found in a conjunctive query.

Semantically Based Data Integration Environment ... - Semantic Scholar

Wrapper Service: Wrapper is the middleman between the mediator and a source. It accepts the query from the mediator, translates it into source query language,.

A Framework for Integration of Large-Scale Distributed Visual Databases

metasearch agent derives a list of relevant database sites to the given query by matching .... template builder, and metadata re nement module, constitute the ...

Scientific Data Mining, Integration, and Visualization - Informatics ...

The workshop had about fifty participants, ranging from software engineers developing Grid infrastructure software, to computer scientists with expertise in data mining and visualization, to application ...... as bespoke hardware. The DAME ...

Redundant Parallel Data Transfer Schemes for the Grid Environment

schemes for parallel data transfer in a grid environment, which copes up with highly inconsistent network performances of the servers. We have developed an.

Research on The Application of Linked Data in Scientific Databases

May 22, 2015 - TB of online data, see Resource Statistics System, CSDB (2011) .... the concept of linked data (http://www.w3.org/DesignIssues/LinkedData.html) in .... resources browsers (based on B / S, or based on C / S architecture), etc.

Replica Selection in Grid Environment: A Data-mining ... - CiteSeerX

Department of Computer Science, University of Calgary, Canada. 2500 University .... measure Grid file transfer time, we extend the optimization technique by ...

Replica Selection in Grid Environment: A Data-mining ... - CiteSeerX

a solution to many Grid-based applications such as Climate data .... Location Service (RLS) and describe a data Grid architectural ..... mization/ optorsim.html.

Database Support for Data-driven Scientific Applications in the Grid

We then represent the common data access pattern in data-driven applications as a database query. 2.1 Application Scenarios. Oil Reservoir Management.

A Grid Environment for Data Integration of Scientific Databases

Download PDF

0 downloads 0 Views 198KB Size Report

Comment

PubChem, NLM Medical Encyclopedia, etc.). Using the system one can make a database access flow for describing a set of queries as a workflow across the.

A Grid Environment for Data Integration of Scientific Databases Hideo Matsuda Department of Bioinformatic Engineering, Graduate School of Information Science and Technology Osaka University, Toyonaka, Japan. [email protected] Abstract Effective integration of heterogeneous data sources has been studied as the most pressing challenge in various fields; such as, high energy physics, astronomy, and life sciences. In this talk, we present a data integration system by using Globus Toolkit with OGSA-DAI. For associating related data among many databases, we have introduced metadata based on their domain ontologies. Using the system one can make a database access flow for describing a set of queries as a workflow, and can query across the databases without aware of their locations and schemas.

1. Introduction With the rapid progress of various technologies, there is a growing demand for analyzing interdisciplinary areas with an integrated view, such as multiscale and multiphysics computations and data integrations. To tackle this problem, many Grid projects have been conducted (e.g., BioGrid [1] and NAREGI [2] in Japan). In this talk, we focus on the data integration from a large number of scientific databases by employing data grid technology. In these databases, their schema and data description are much diverged since they are based on different data domains. In order to integrate databases across those domains, we introduced metadata for describing semantic relationships among their entities [3]. This connected network of databases makes use of the grid technology for delivering integrated searches of the databases.

2. Overview of Data Grid Environment A Data Grid environment was developed for the data integration from a large number of databases (see Fig. 1). As a target data-source, we took life science databases since there are more than 700 databases and many of them are available on the web [4]. To cope with these issues, we construct metadata for associating all the related data among the databases and develop a dataflow engine for performing multiple queries in pipelining flows. It is composed of a database access management tool for constructing database access flows across those databases, query engines for performing simultaneous queries to the DBs, a distributed file system for storing query results in distributed storages, and a metadata construction system for constructing ontology-based metadata to manipulate heterogeneous data in various scientific fields. Access Access Access DB#3 DB#1 DB#2

DB Access Flow based on Metadata Metadata Metadata (e.g., Genome) Construction System Metadata

Query Engine

(e.g., Astronomy)

OGSA DAI

DB DB

Metadata (e.g., Physics)

OGSA DAI

Query Results Storage

Storage

Storage

Distributed File System

OGSA DAI

DB DB

DB DB

Figure 1. Overview of DB integration system.

The system was developed by using Globus Toolkit 4.0.1 with OGSA-DAI [5], and Gfarm [6] as a distributed file system. On this system, we have imported various life science DBs (e.g., UniProt, NCBI PubChem, NLM Medical Encyclopedia, etc.). Using the system one can make a database access flow for describing a set of queries as a workflow across the databases.

References [1] http://www.biogrid.jp/

[2] http://www.naregi.org/ [3] Y. Tohsato, T. Kosaka, S. Date, S. Shimojo, and H. Matsuda, “Heterogeneous Database Federation using Grid Technology for Drug Discovery Process”, Lecture Notes in Bioinformatics, vol.3370, pp.43-52, 2005. [4] M.Y. Galperin, “The Molecular Biology Database Collection: 2005 Update”, Nucleic Acids Research, vol.33, DB issue, pp.D5-D24, 2005. [5] http://www.ogsadai.org/ [6] http://datafarm.apgrid.org/