A GRID-based multilayer architecture for bioinformatics

Ezio Bartocci¹, Diletta Cacciagrano¹, Nicola Cannata¹, Flavio Corradini¹, Emanuela Merelli¹, Luciano Milanesi², and Paolo Romano³

¹ University of Camerino, Mathematics and Computer Science Department, Camerino, Italy, {name.surname}@unicam.it
² Institute for Biomedical Technologies, National Research Council, Milan, Italy
³ National Cancer Research Institute, Genova, Italy

Abstract. The volume and complexity of the biological data and information available today call for significant computational analysis of both data and processes. A huge number of bioinformatics databases and tools are available. Some of them allow bioinformaticians to compose their in-silico experiments intuitively, in the form of workflows. Others aim at the analysis, modeling and simulation of biological systems and processes. In this context, the search for resources becomes a demanding and time-consuming activity, so a dynamic semantic indexing system for bioinformatics resources becomes essential. Consequently, a virtual desk that fulfils bioinformaticians' needs is an important requirement for modern and future biology. To this end, we propose a GRID-based multilayer architecture intended to support in-silico experiments, resource discovery and biological systems simulation.

1 Introduction

Bioinformatics is playing a fundamental role in the development of e-Science. A huge amount of biological data is organized in databases. Electronic tools and formal frameworks help biologists both to formalize in-vitro experimental protocols and to simulate the behavior of biological systems. In-silico scientists, equipped with text-mining and inference capabilities, are appearing on the horizon [1]. In-silico experiments are naturally specified as workflows of activities, implementing the data analysis process in a standardized environment (e.g. Taverna [2]). Formal methods for biological systems make it possible to verify experimental hypotheses by simulation and to harness the complexity of the biological domain [3]. Mechanisms for sharing, labelling in a machine-understandable way, finding and composing specific bioinformatics resources (both data and computational elements) become necessary to implement these new forms of interaction among scientists and resources [4].

Cyberinfrastructures [5, 6], including web services and GRID technologies, provide a distributed global environment that enables multidisciplinary, geographically dispersed, data- and computation-intensive science. GRID technology [7] supports virtual communities through the sharing of computational and data resources, on which deterministic queries across a distributed and common schema are possible. Its fundamental architecture also supports stateful processes, which are important to the concept of workflow. A successful prototype of GRID technology applied to bioinformatics is the UK e-Science myGRID project [8], a service-based GRID. Agent societies fit well in this scenario: they not only make it possible to describe a biological system as a set of active computational components interacting in a dynamic and unpredictable environment [9], but also provide the flexibility needed to support data- and computation-intensive distributed applications. The drawback of the Web as a wide space of shared resources is that finding a resource with a well-defined meaning is a difficult and time-consuming activity. A new model of the WWW, enriched with semantics, is being developed. The Semantic Web [10] is described as an extension of the current Web that allows resources to be annotated in a standard, machine-readable way, enabling computers and people to work in cooperation. The integration of GRID and Semantic Web technologies, known as the Semantic GRID [11], is indicated as the ideal infrastructure for fulfilling the e-Science vision. To organize the knowledge spread across the Internet, Zhuge [12] proposed the concept of a Knowledge GRID: a platform that enables the sharing and management of heterogeneous resources in a uniform way, so that the desired information and services can be reached more easily and effectively.

2 A GRID-based architecture

In this scenario, we propose a multilayer architecture to satisfy bioinformaticians' needs (Figure 1). At the user layer, it is intended to support in-silico experiments, resource discovery and biological systems simulation. The pivot of the architecture is a component called Resourceome [13], which keeps a live index of resources in the bioinformatics domain using a specific ontology of resource information. A Workflow Management System, called BioWMS, provides a web-based interface to define in-silico experiments as workflows of complex and primitive activities. High-level concepts concerning activities and data could be indexed in the Resourceome, which also supports workflow enactment dynamically by providing the related resources available at runtime. ORION, a multiagent system, is a proposed framework for modeling and engineering complex systems. The agent-oriented approach makes it possible to describe the behavior of the individual components and the rules governing their interactions. Acting as middleware, the agents also provide the flexibility needed to support data- and computation-intensive distributed applications [14]. A GRID infrastructure allows transparent access to the high-performance computing resources required, for example, in biological systems simulation. In the following, we give a more detailed description of each component of the proposed architecture.

Fig. 1. The GRID-based Multilayer Architecture
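The overview describes workflows built from complex and primitive activities, with the Resourceome supplying, at enactment time, the resources related to the concepts an activity refers to. The Python sketch below illustrates only that late-binding idea, under our own assumptions: the classes `Primitive` and `Complex`, the concept labels, and `make_resolver` (a naive stand-in for a Resourceome query that just picks the first registered resource) are hypothetical names, not part of BioWMS or the Resourceome.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union

Tool = Callable[[dict], dict]
# Hypothetical stand-in for a Resourceome query: given an activity concept,
# return a resource (here, a plain function) that implements it right now.
Resolver = Callable[[str], Tool]


@dataclass
class Primitive:
    """Leaf activity, described only by the concept it needs."""
    concept: str

    def run(self, data: dict, resolve: Resolver) -> dict:
        # Late binding: the concrete resource is chosen at enactment time.
        return resolve(self.concept)(data)


@dataclass
class Complex:
    """Composite activity: an ordered sequence of primitive or complex steps."""
    name: str
    steps: List[Union["Complex", Primitive]] = field(default_factory=list)

    def run(self, data: dict, resolve: Resolver) -> dict:
        for step in self.steps:
            data = step.run(data, resolve)
        return data


def make_resolver(index: Dict[str, List[Tool]]) -> Resolver:
    """Naive policy: use the first resource registered for a concept."""
    def resolve(concept: str) -> Tool:
        candidates = index.get(concept, [])
        if not candidates:
            raise LookupError(f"no resource available for {concept!r}")
        return candidates[0]
    return resolve


if __name__ == "__main__":
    index = {
        "fetch-sequence": [lambda d: {**d, "seq": "ACGT"}],
        "gc-content": [lambda d: {**d, "gc": sum(c in "GC" for c in d["seq"]) / len(d["seq"])}],
    }
    workflow = Complex("analysis", [
        Primitive("fetch-sequence"),
        Complex("statistics", [Primitive("gc-content")]),
    ])
    print(workflow.run({}, make_resolver(index)))  # {'seq': 'ACGT', 'gc': 0.5}
```

Because a `Primitive` stores a concept rather than a tool, re-ranking or replacing the resources in the index changes what actually runs without editing the workflow definition.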

2.1 BioWMS

BioWMS is a Workflow Management System that supports the definition, execution and results management of in-silico experiments through a web-based interface. Implemented on the BioAgent/Hermes architecture, BioWMS dynamically generates domain-specific, agent-based workflow engines from a user's workflow specification. Our approach exploits the proactiveness and mobility of agent-based technology to embed the features of the application domain in the agents' behaviour. The resulting workflow engine is a multiagent system (a distributed, concurrent system) that is typically open, flexible and adaptive.
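BioWMS turns a user's workflow specification into a multiagent, concurrent engine. As a rough illustration of that idea, and not of the BioAgent/Hermes implementation, the sketch below generates one asyncio task per activity; each task behaves like an agent that waits only for the results of its own predecessors. The specification format, the functions `enact` and `activity_agent`, and the demo activities are hypothetical, and the specification is assumed to be acyclic (a cycle would leave agents waiting on each other forever).

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List, Tuple

Action = Callable[[Dict[str, Any]], Awaitable[Any]]
# Hypothetical specification format: activity name -> (predecessors, action).
# Assumed acyclic; an action receives its predecessors' outputs by name.
Spec = Dict[str, Tuple[List[str], Action]]


async def activity_agent(name: str, deps: List[str], action: Action,
                         results: Dict[str, "asyncio.Future[Any]"]) -> None:
    # Each agent waits only for its own predecessors, so independent
    # branches of the workflow run concurrently.
    inputs = {d: await results[d] for d in deps}
    results[name].set_result(await action(inputs))


async def enact(spec: Spec) -> Dict[str, Any]:
    """Generate one agent (task) per activity and let them coordinate."""
    loop = asyncio.get_running_loop()
    results = {name: loop.create_future() for name in spec}
    await asyncio.gather(*(activity_agent(name, deps, action, results)
                           for name, (deps, action) in spec.items()))
    return {name: fut.result() for name, fut in results.items()}


async def demo() -> Dict[str, Any]:
    async def fetch(_: Dict[str, Any]) -> str:
        return "ACGTGC"

    async def gc(inp: Dict[str, Any]) -> float:
        seq = inp["fetch"]
        return sum(c in "GC" for c in seq) / len(seq)

    async def length(inp: Dict[str, Any]) -> int:
        return len(inp["fetch"])

    async def report(inp: Dict[str, Any]) -> str:
        return f"len={inp['length']} gc={inp['gc']:.2f}"

    # Declaration order is irrelevant: dependencies drive execution order.
    spec: Spec = {
        "report": (["gc", "length"], report),
        "fetch": ([], fetch),
        "gc": (["fetch"], gc),
        "length": (["fetch"], length),
    }
    return await enact(spec)


if __name__ == "__main__":
    print(asyncio.run(demo())["report"])  # len=6 gc=0.67
```

BioWMS agents are described as mobile and distributed; here all agents share one event loop, which keeps the example self-contained while preserving the dependency-driven coordination.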

2.2 Resourceome

Resourceome [13] is an ontological model for organizing a machine-understandable index of bioinformatics resources. Unlike other bio-ontologies, it can also be used to model domains other than bioinformatics: knowledge of the domain is independent of knowledge of the resources, since the resources related to a concept of the domain are directly connected to that concept. Here too, agents play a key role, since they take care of important issues such as notifying the availability and quality of resources, collecting feedback from previous users, and providing customized assistance for navigating and reasoning over the ontology.
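The key design point above is that the domain ontology and the resource records are kept apart, with each resource attached directly to a domain concept. A minimal sketch of such an index, assuming a simple is-a hierarchy of concepts, follows; `Resource`, `ResourceIndex`, the `available` flag (standing in for agent notifications) and the `ratings` list (standing in for user feedback) are illustrative assumptions, not the Resourceome data model.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Resource:
    name: str
    kind: str                  # e.g. "database" or "tool"
    available: bool = True     # updated by (hypothetical) monitoring agents
    ratings: List[int] = field(default_factory=list)  # user feedback, 1-5

    def score(self) -> float:
        return sum(self.ratings) / len(self.ratings) if self.ratings else 0.0


class ResourceIndex:
    """Domain concepts and their is-a links are stored separately from the
    resource records; each resource is attached directly to a concept."""

    def __init__(self) -> None:
        self.parent: Dict[str, Optional[str]] = {}
        self.children: Dict[str, Set[str]] = defaultdict(set)
        self.attached: Dict[str, List[Resource]] = defaultdict(list)

    def add_concept(self, concept: str, parent: Optional[str] = None) -> None:
        self.parent[concept] = parent
        if parent is not None:
            self.children[parent].add(concept)

    def attach(self, concept: str, resource: Resource) -> None:
        if concept not in self.parent:
            raise KeyError(f"unknown concept: {concept}")
        self.attached[concept].append(resource)

    def find(self, concept: str) -> List[Resource]:
        """Available resources for a concept and all its sub-concepts,
        best-rated first."""
        found: List[Resource] = []
        stack = [concept]
        while stack:
            current = stack.pop()
            found.extend(r for r in self.attached.get(current, []) if r.available)
            stack.extend(self.children.get(current, ()))
        return sorted(found, key=Resource.score, reverse=True)


if __name__ == "__main__":
    idx = ResourceIndex()
    idx.add_concept("sequence analysis")
    idx.add_concept("sequence alignment", parent="sequence analysis")
    idx.add_concept("multiple alignment", parent="sequence alignment")
    a = Resource("aligner-A", "tool", ratings=[5, 4])  # made-up demo data
    b = Resource("aligner-B", "tool", ratings=[4])
    idx.attach("sequence alignment", a)
    idx.attach("multiple alignment", b)
    print([r.name for r in idx.find("sequence analysis")])  # ['aligner-A', 'aligner-B']
    a.available = False  # an agent reports that aligner-A is down
    print([r.name for r in idx.find("sequence analysis")])  # ['aligner-B']
```

Nothing in `ResourceIndex` is specific to bioinformatics, so the same structure could index resources for any other concept hierarchy, which mirrors the domain-independence claim.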

2.3 Orion

Orion provides a workbench to engineer, refine and validate biological models and simulations. It is a computational tool for the behavioural modelling of metabolic reactions. Issues such as interactions, movements, and reactions between enzymes and metabolites have been described from the agent point of view. Knowledge concerning metabolic pathways is mined from related databases and integrated into the domain-specific ontology that has been developed. The implementation of the framework on the Hermes agent platform is under development.
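To make the agent point of view on metabolic reactions concrete, the sketch below simulates a single hypothetical reaction S → P catalysed by enzyme agents: every agent moves randomly on a toroidal grid, and a substrate that shares a cell with at least one enzyme is converted with probability `p_react`. The species, grid, parameters and function names are assumptions chosen for illustration; Orion's actual models draw on pathway databases and its ontology.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple

MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


@dataclass
class Agent:
    kind: str  # "E" (enzyme), "S" (substrate) or "P" (product)
    x: int
    y: int


def step(agents: List[Agent], size: int, p_react: float, rng: random.Random) -> None:
    # Movement: every agent takes one random step on a toroidal grid.
    for a in agents:
        dx, dy = rng.choice(MOVES)
        a.x, a.y = (a.x + dx) % size, (a.y + dy) % size
    # Interaction: group agents by cell; a substrate sharing a cell with at
    # least one enzyme is converted to product with probability p_react.
    cells: Dict[Tuple[int, int], List[Agent]] = {}
    for a in agents:
        cells.setdefault((a.x, a.y), []).append(a)
    for occupants in cells.values():
        if any(a.kind == "E" for a in occupants):
            for a in occupants:
                if a.kind == "S" and rng.random() < p_react:
                    a.kind = "P"


def simulate(n_enzymes: int = 5, n_substrate: int = 200, size: int = 20,
             p_react: float = 0.5, steps: int = 100, seed: int = 0) -> List[int]:
    """Return the number of substrate molecules left after each step."""
    rng = random.Random(seed)
    agents = [Agent("E", rng.randrange(size), rng.randrange(size)) for _ in range(n_enzymes)]
    agents += [Agent("S", rng.randrange(size), rng.randrange(size)) for _ in range(n_substrate)]
    history = []
    for _ in range(steps):
        step(agents, size, p_react, rng)
        history.append(sum(a.kind == "S" for a in agents))
    return history


if __name__ == "__main__":
    counts = simulate()
    print(counts[0], counts[-1])  # substrate left after the first and last step
```

Setting `p_react=0.0` keeps the substrate count constant, a quick sanity check; richer models would add more species, binding states and rates taken from pathway data.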

2.4 Hermes: an agent-based middleware

Hermes [14] is a middleware system that supports the run-time execution of agent societies; namely, it provides primitives for coordination, communication, mobility and other implementation features. In detail, it provides a set of active services (specialized wrappers, domain-specific services, etc.) that allow secure access to resources. Agent mobility is achieved through a mobile code environment, which enables decentralized execution of local activities, reduces network traffic, and shields researchers from network faults.
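The traffic argument for code mobility can be illustrated with a small simulation: instead of shipping every record to the client, an agent visits each place, runs its task on the local data and carries home only a compact result. In the sketch below, `Place`, `MobileAgent`, `pull_cost` and `agent_cost` are hypothetical names; migration is simulated by a loop rather than by moving code between hosts, and the cost of transferring the agent's own code is ignored. It is not the Hermes API.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Place:
    """A hypothetical host holding local data that stays where it is."""
    name: str
    records: List[str]


@dataclass
class MobileAgent:
    """Carries its task and a small state along an itinerary of places."""
    task: Callable[[List[str]], int]
    itinerary: List[str]
    state: Dict[str, int] = field(default_factory=dict)

    def run(self, places: Dict[str, Place]) -> Dict[str, int]:
        for name in self.itinerary:
            # Migration is only simulated: the task executes "at" each place
            # on its local records, and only the result joins the state.
            self.state[name] = self.task(places[name].records)
        return self.state


def pull_cost(places: Dict[str, Place]) -> int:
    """Characters moved if every record were shipped to the client instead."""
    return sum(len(r) for p in places.values() for r in p.records)


def agent_cost(agent: MobileAgent) -> int:
    """Characters in the serialized state the agent carries home."""
    return len(json.dumps(agent.state))


if __name__ == "__main__":
    places = {
        "siteA": Place("siteA", ["GAATTC" + "A" * 1000, "C" * 1000]),
        "siteB": Place("siteB", ["T" * 500 + "GAATTC"]),
    }
    agent = MobileAgent(task=lambda recs: sum("GAATTC" in r for r in recs),
                        itinerary=["siteA", "siteB"])
    print(agent.run(places))  # {'siteA': 1, 'siteB': 1}
    print(agent_cost(agent), "vs", pull_cost(places))
```

The saving grows with the size of the remote data relative to the result, which is the situation the middleware's mobile code environment targets.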

2.5 LITBIO project

The software described above will be installed within the framework of the Laboratory for Interdisciplinary Technologies in Bioinformatics (LITBIO), applied to Genomics and Proteomics (http://www.litbio.it). This project will contribute to creating an infrastructure capable of supporting challenging international research and to developing new bioinformatics analysis strategies applicable to biomedical and biotechnological data. The project's aim is to establish cooperation between private and public structures and to stimulate the growth of new enterprises in the bioinformatics sector. In this context, web services, HPC computers and GRID technology will play an important role in interconnecting the European Infrastructure in Bioinformatics.

3 Conclusion

The user layer of the proposed architecture supports in-silico experiments, resource discovery and biological systems simulation, satisfying bioinformaticians' needs. The agent-based middleware provides several important features.

Agents support data- and computation-intensive distributed applications in a very flexible way; they find available and specific resources and allow reasoning over the ontology of resources; they execute workflows of complex and primitive activities; and they simulate the behavior of biological components and the rules governing their interactions. The user layer allows a virtual laboratory to be implemented and shared, in line with the main goals of the LITBIO project.

Acknowledgements. This work is supported by the Investment Funds for Basic Research (MIUR-FIRB) project Laboratory of Interdisciplinary Technologies in Bioinformatics (LITBIO).

References
1. Wren, J.D.: Engineering in genomics: the emerging in-silico scientist; how text-based bioinformatics is bridging biology and artificial intelligence. IEEE Engineering in Medicine and Biology Magazine 23(2) (2004) 87–93
2. Oinn, T., et al.: Taverna: a tool for the composition and enactment of bioinformatics workflows. Bioinformatics 20(17) (2004) 3045–3054
3. Cardelli, L.: Abstract machines of systems biology. In: T. Comp. Sys. Biology. Volume 3737 of LNCS. (2005) 145–168
4. Goble, C.A.: Using the semantic web for e-science: Inspiration, incubation, irritation. In: International Semantic Web Conference. Volume 3729 of LNCS. (2005) 1–3
5. Hey, T., Trefethen, A.E.: Cyberinfrastructure for e-Science. Science 308(5723) (2005) 817–821
6. Buetow, K.H.: Cyberinfrastructure: Empowering a "Third Way" in Biomedical Research. Science 308(5723) (2005) 821–824
7. Foster, I., Kesselman, C.: The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann Publishers, San Francisco, CA (1998)
8. Stevens, R.D., Robinson, A.J., Goble, C.A.: myGrid: personalised bioinformatics on the information grid. Bioinformatics 19(suppl_1) (2003) i302–i304
9. Cannata, N., Corradini, F., Merelli, E., Omicini, A., Ricci, A.: An agent-oriented conceptual framework for systems biology. In: T. Comp. Sys. Biology. Volume 3737 of LNCS. (2005) 105–122
10. Berners-Lee, T., Hendler, J., Lassila, O.: The semantic web. Sci. Am. 284 (2001) 34–43
11. De Roure, D., Hendler, J.A.: E-science: The grid and the semantic web. IEEE Intelligent Systems 19(1) (2004) 65–71
12. Zhuge, H.: A knowledge grid model and platform for global knowledge sharing. Expert Syst. Appl. 22(4) (2002) 313–320
13. Cannata, N., Merelli, E., Altman, R.B.: Time to organize the bioinformatics resourceome. PLoS Comput. Biol. 1(7) (2005) e76
14. Corradini, F., Merelli, E.: Hermes: agent-based middleware for mobile computing. Volume 3465 of LNCS. (2005) 234–270