
Model-based Design of Distributed Collaborative Bioinformatics Processes in the jABC

Tiziana Margaria∗, Christian Kubczak†, Mark Njoku†, Bernhard Steffen‡

∗ Chair of Service and Software Engineering, University Potsdam, Germany, [email protected]
† Chair Service Engineering for Distributed Systems, University Göttingen, Germany, kubczak,[email protected]
‡ Chair of Programming Systems, University of Dortmund, 44227 Dortmund, Germany, [email protected]

Abstract— Our approach to the model-driven collaborative design of workflows for bioinformatic applications uses the jABC [6] for model-driven mediation and choreography to complement a Webservice-based elementary service provision. jABC is a framework for service development based on Lightweight Process Coordination. Users (product developers and system/software designers) develop services and applications by composing reusable building-blocks into (flow-)graph structures that can be animated, analyzed, simulated, verified, executed, and compiled. This way of handling the collaborative design of complex processes has proven to be effective and adequate for the cooperation of non-programmers (in this case biologists) and technical people, and it is now being rolled out in the operative practice.

I. THE SETTING: DISTRIBUTED BIOINFORMATIC WORKFLOWS

Fig. 1. The Initial VOGS Requirement Diagram

Comparative study of different species requires an extensive use of knowledge provided by other groups, an enormous organizational effort in mediating the data discrepancies, and the use of tools and algorithms for the screening, analysis, and comparison of data. To help, teams distributed world-wide and spanning various consortia, projects, and corporations have made major efforts to provide data repositories (mostly databases) and tools for accessing and analyzing those data. The complexity arises from the physical distribution and from the heterogeneity of the technical platforms, paradigms, formats, and communication mechanisms involved.

In this paper, we show how our approach to the model-driven collaborative design of distributed Webservice-based processes uses the jABC [6] for model-driven mediation and choreography of a workflow for the validation of orthologous gene structures. This is the first step towards adaptive design and deployment of self-reconfiguring workflows. With the Online Conference Service [7] we have shown that the jABC is adequate to design large distributed, role-based applications, in that case a collaborative decision support system. As shown e.g. in [3], the jABC approach can be enhanced with workflow adaptation techniques that can be used in a moderated or automatic mode. We are currently investigating, in a number of application domains, the requirements on adaptation and self-* properties that characterize the mediation and choreography level, with the aim of building a generic autonomic layer that reacts to circumstances in a way analogous to reflexes.

Our concrete case study concerns the validation of orthologous gene structures among a selection of higher organisms. An orthologous gene is a gene present in two or more species that has evolved from a common ancestor. The validation of the structures is carried out via comparison of the orthologous gene structures in different species, with the aim of improving the prediction of regulatory regions. The results have various applications, for instance for phylogenetic footprinting, a technique that identifies regulatory elements by finding unusually well conserved regions in a set of orthologous noncoding DNA sequences from multiple species.

Fig. 2. The Refined VOGS Requirement Diagram

The concrete way biologists proceed uses the standard analysis tool WU-Blast [15], together with several searches of the Ensembl [4] database, and a number of algorithms developed by our project partners. The extracted data and the final results must be adequately pre- and postprocessed. In this sense, the whole task is very similar to the mediation problem addressed in the Semantic Web Service Challenge (SWS'06) [14].

The service provisioning paradigm in use for algorithms and databases is in our case the state-of-the-art Webservice technology. Requirement modelling tools are Microsoft Word and Powerpoint diagrams, as shown e.g. in Fig. 1 and Fig. 2, for a large number of workflow requirement and description documents produced by non-technical project members, together with XML


and WSDL descriptions for the resources (tools and databases) that are already available online.

Current approaches to support workflow design in this application domain [1], [12], [2] do not satisfy the needs of simplicity and flexibility required by the highly heterogeneous, distributed character of the technical platform. The non-technical background of the scientists who conceive, design, and use the workflows (largely biologists, biochemists, and chemists) places high demands on the ease of use of the solution and limits the tradeoff range for finding adequate technical solutions. The requirement here is thus to reconcile the complexity of the required services with a collaborative approach to design that stresses intuitiveness and simplicity, ruling out programming skills as a prerequisite.

II. BASIC CONCEPTS OF THE jABC MODELLING FRAMEWORK

jABC [6] is a mature framework for service development based on Lightweight Process Coordination [10]. Predecessors of jABC have been used since 1995 to design, among others, industrial telecommunication services [11], Web-based distributed decision support systems [7], and test automation environments for Computer-Telephony integrated systems [5]. jABC allows users to easily develop services and applications by composing reusable building-blocks into (flow-)graph structures. This development process is supported by an extensible set of plugins that provide additional functionality to support the activities needed along the development lifecycle, e.g. animation, rapid prototyping, formal verification, debugging, code generation, and evolution. jABC does not replace but rather enhances other modelling practices such as the UML-based RUP (Rational Unified Process, [13]), which are in fact used in our process to design the single components.

Lightweight Process Coordination offers a number of advantages that play a particular role when integrating off-the-shelf, possibly remote functionalities, as in this context.
• Simplicity. jABC focuses on application experts, who are typically non-programmers. The basic ideas of our modelling process have been explained in past projects to new participants in less than one hour.
• Agility. We expect requirements, models, and artefacts to change over time; therefore the process supports evolution as a normal process phase.
• Customizability. The building blocks which form the model can be freely renamed or restructured to fit the habits of the application experts.

Get Orthologue ID’s
• Purpose: produce the set of all genes that are orthologue to each other (starting from the initial gene), in all the given species
• Input interface:
  – String ensemblID
  – organismType(XML)[] organismList
• Output interface: Array with the informations Species-name, ensemblID (Array of orthologues, maybe in XML)
• Available: Webservice returning the orthologous IDs for a pair of species.
  – Input: 2 organismType and an ensemblID
  – Output: Acessionnumber[] idList
• Problem: getting the orthologous IDs must be recursive and computed with all species pairs
• To do: make it recursive and precise the initial interfaces.

Fig. 3. Specification of the Get Orthologue ID’s functionality
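The "Problem" and "To do" entries of this specification can be made concrete with a small sketch. It assumes a hypothetical function `pairwise_orthologs(source, target, gene_id)` that stands in for the available Webservice (which returns the orthologous IDs for one pair of species), and it collects all genes orthologous to the initial gene by querying all species pairs until no new IDs appear. The lookup table, the IDs, and all names are illustrative assumptions and not part of the VOGS project code.

```python
from collections import deque

# Toy stand-in for the pairwise Webservice: (source, target, gene_id) -> list of IDs.
# The IDs below are made up for illustration only.
TOY_ORTHOLOGS = {
    ("human", "mouse", "H1"): ["M1"],
    ("mouse", "rat", "M1"): ["R1"],
    ("rat", "human", "R1"): ["H1"],
}


def pairwise_orthologs(source, target, gene_id):
    """Hypothetical wrapper around the pairwise service (here: a table lookup)."""
    return TOY_ORTHOLOGS.get((source, target, gene_id), [])


def get_ortholog_ids(start_species, start_id, species_list, lookup=pairwise_orthologs):
    """Collect all (species, gene ID) pairs reachable from the initial gene.

    A worklist replaces explicit recursion: every newly found gene is queried
    against every other species until no new IDs turn up.
    """
    found = {(start_species, start_id)}
    worklist = deque(found)
    while worklist:
        species, gene_id = worklist.popleft()
        for target in species_list:
            if target == species:
                continue
            for ortholog_id in lookup(species, target, gene_id):
                entry = (target, ortholog_id)
                if entry not in found:
                    found.add(entry)
                    worklist.append(entry)
    return sorted(found)


if __name__ == "__main__":
    print(get_ortholog_ids("human", "H1", ["human", "mouse", "rat"]))
```

The set of already found entries guarantees termination even when the pairwise results form a cycle across species, which is the situation the recursive formulation has to guard against.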

• Consistency. The same modelling paradigm underlies the whole process, from the very first steps of prototyping up to the final execution, guaranteeing traceability and semantic consistency.
• Verification. With techniques like model checking and local checks we help users modify their models consistently. The basic idea is to define local or global properties that the model must satisfy and to provide automatic checking mechanisms.
• Service orientation. Existing or external features, applications, or services can be easily integrated into a model by wrapping the existing functionality into building blocks that can be used inside the models.
• Executability. The model can have different kinds of execution code. These can be as abstract as textual descriptions (e.g. in the first animations during requirement capture), and as concrete as the final runtime implementation.
• Universality. Thanks to Java as a platform-independent, object-oriented implementation language, jABC can be easily adopted in a large variety of technical contexts and application domains.

III. DESIGNING THE VOGS PROCESS

Fig. 4. SLG of the Orthologous ID Retrieval

A central requirement for the jABC process-oriented models was the capability to bridge the gap between the high-level models of the whole project, typically produced by professionals with no technical background, and the detailed models usable by programmers and engineers at implementation time.

We started with the graphical description of the top-level workflow of Fig. 1, where the colour code and the layout were already meaningful: on the top left we find global inputs (green squares), on the bottom right the desired output (red square), and in the center a succession of individual coarse-grained activities (light yellow squares, indicating functionalities) with their intermediate and partial results (light blue squares). Additionally, for each functionality we had a textual, informal specification with a clearly defined metadata structure: Purpose, Input interface, Output interface, Available, Problem, To do. Fig. 3 shows the description for the Get Orthologue IDs functionality.

In jABC, every functionality used within an application or service is encapsulated within a Service-Independent Building Block (SIB). In fact, we use SIBs to form the workflow of the VOGS within a Service Logic Graph (SLG), jABC's way of defining processes. A SIB can contain a single functionality or a whole subgraph (another SLG), thus serving as a macro that hides more detailed and basic steps.
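The relation between SIBs, SLGs, and graph SIBs can be illustrated with a minimal sketch. It is written in Python for brevity and does not use the jABC API (jABC itself is Java-based); the classes `SIB` and `SLG`, the helper `graph_sib`, and the demo blocks are hypothetical names introduced only for this illustration. Each SIB wraps one functionality and returns a branch label, the SLG connects SIBs through labelled edges, and a graph SIB wraps a whole SLG so that it appears as a single node in its parent graph.

```python
class SIB:
    """A building block: wraps one functionality that returns a branch label."""

    def __init__(self, name, action):
        self.name = name
        self.action = action  # callable: context dict -> branch label (str)

    def execute(self, context):
        return self.action(context)


class SLG:
    """A Service Logic Graph: SIBs connected by labelled edges."""

    def __init__(self, start):
        self.start = start
        self.sibs = {}
        self.edges = {}  # (SIB name, branch label) -> successor SIB name

    def add(self, sib):
        self.sibs[sib.name] = sib
        return self

    def connect(self, source, branch, target):
        self.edges[(source, branch)] = target
        return self

    def run(self, context):
        """Follow branches from the start SIB until no successor is defined."""
        current, branch, trace = self.start, None, []
        while current is not None:
            trace.append(current)
            branch = self.sibs[current].execute(context)
            current = self.edges.get((current, branch))
        return branch, trace


def graph_sib(name, sub_slg):
    """Wrap a whole SLG as one SIB (a macro hiding the detailed steps)."""
    return SIB(name, lambda context: sub_slg.run(context)[0])


def load_ids(context):
    context["ids"] = ["id1", "id2"]  # made-up data for the demo
    return "default"


def count_ids(context):
    context["count"] = len(context["ids"])
    return "default"


def build_demo():
    """Build a parent SLG whose only node is a graph SIB over a two-step SLG."""
    inner = SLG("load").add(SIB("load", load_ids)).add(SIB("count", count_ids))
    inner.connect("load", "default", "count")
    return SLG("retrieve").add(graph_sib("retrieve", inner))


if __name__ == "__main__":
    context = {}
    print(build_demo().run(context), context)
```

In this sketch the parent graph only sees the node `retrieve`; the two inner steps stay hidden behind it, which mirrors the macro role of graph SIBs described above.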

A. Designing the Workflow

We are able to model the big-picture workflow exactly as described by the partners: as shown in Fig. 4, the process flow follows rather closely the refined (but less structured) version pictured in Fig. 2. We have a number of SIBs that provide the elementary services made available by the bioinformatics community, like getOrthologIDs and getIDSequence, grouped in the bioinformatik section of our SIB palette, and a number of application-independent SIBs for useful control structures, called helpers. Some helpers, like getID and nextID, are fully generic; others, like getBlastID or nextArt, are instances of the generic ones with a more semantically suggestive name, tailored to the application under consideration. The second part of the workflow (Fig. 5) deals with the retrieval and comparison of the genes, and with the determination and scoring of the relevant positions.

B. Workflow Granularity

The top-level workflow designed within the jABC shown in Fig. 1 is rather simple: it is, for instance, almost cycle-free and clearly follows the intuition of a phase-oriented, nearly sequential process. This abstract view is common when dealing with application experts.


Fig. 5. SLG of the Gene Comparison and Best-fit Return

As we see in Fig. 5, several cycles centered on the use of an elementary service provided by the bioinformatics community are parametric in the specific type of information searched. It would thus be possible to encapsulate them as distinct workflow-level reusable functionalities by introducing hierarchy. These three workflows could be implemented as SLGs of their own, appearing here as graph SIBs, and simplifying the whole picture nearly to the level of the initial requirements. More generally, loops needed by the detailed tasks can be modelled in the jABC in different ways, mostly depending on the desired abstraction of the workflow:


Fig. 6. Executing the VOGS Process with the jABC Tracer

• They can be modelled within the implementation code of the specific SIBs, e.g., as iterations over variables. This is desirable if there is no need to reason (or prove anything) about that behaviour, which is then considered an implementation issue.
• If we are interested in analyzing the loop behaviour, we can refine the SLG of the workflow and model the (relevant) loops at the workflow level, either for the whole process, or just inside specific graph SIBs if that portion of the workflow needs specific attention.

In principle, workflows can be refined up to the detail of single statements, if desired. Subsequent analysis of the code can also help in cases where the workflow has not been refined to the very end.
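The two modelling options for loops can be contrasted in a short sketch. It is plain Python rather than jABC code, and the toy service `fetch_sequence`, the block names, and the edge table are hypothetical; the iterator block only mimics the role of a helper like nextID and is not its actual implementation. In the first variant the loop is hidden inside one block; in the second it is an explicit cycle in the graph, so that a structural analysis (here a simple cycle check) can see it.

```python
def fetch_sequence(gene_id):
    """Toy stand-in for an elementary service call."""
    return "seq-of-" + gene_id


# Variant 1: the loop is an implementation detail inside a single block.
def fetch_all_block(context):
    context["sequences"] = [fetch_sequence(g) for g in context["ids"]]
    return "default"


# Variant 2: the loop is modelled at the workflow level as a cycle
# next_id -> fetch_one -> next_id, left through the "exhausted" branch.
def next_id(context):
    if context["pending"]:
        context["current"] = context["pending"].pop(0)
        return "next"
    return "exhausted"


def fetch_one(context):
    context["sequences"].append(fetch_sequence(context["current"]))
    return "default"


BLOCKS = {"next_id": next_id, "fetch_one": fetch_one}
EDGES = {
    ("next_id", "next"): "fetch_one",
    ("fetch_one", "default"): "next_id",  # back edge: the visible loop
}


def run_workflow(start, context):
    """Interpret the graph and record every visited block."""
    current, visited = start, []
    while current is not None:
        visited.append(current)
        branch = BLOCKS[current](context)
        current = EDGES.get((current, branch))
    return visited


def has_cycle(edges):
    """Structural check on the graph; it cannot see loops hidden in a block."""
    successors = {}
    for (source, _branch), target in edges.items():
        successors.setdefault(source, set()).add(target)

    def reachable(start):
        seen, stack = set(), list(successors.get(start, ()))
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(successors.get(node, ()))
        return seen

    return any(node in reachable(node) for node in successors)
```

Both variants compute the same sequences for the same input; they differ only in whether the iteration is visible at the workflow level and therefore open to analysis there.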

C. Workflow Execution

After designing the workflow, we can use the tracer plugin to animate, simulate, or interpret it, depending on the kind of executable code associated with the SIBs: mock code, simulation code, or real implementation. In this case, we already have the full implementation of the workflow: as shown in Fig. 6, we started the tracer plugin of the jABC (top right window), which can be run in step-by-step interactive mode or in automatic mode. The tracer window shows the progress of the execution: in this snapshot, we have just run the SIB WUBlast and are about to execute the SIB identifyBlastHit, both implemented via calls to the corresponding Webservices. On the left we see two user-level SIB inspectors. For the SIB loadStartIDandArtlist we see that we start with a list containing the human species, and for getIDSequence we see that the first exon is returned, but we could distinguish whether the first exon, the last exon, or the promoter is returned. At the bottom we see the output of the execution of WUBlast, provided as a remote Webservice.

We can also generate source code for the SLG by invoking one of the jABC code generators. They differ in the structure and efficiency of the generated code, but all of them yield a running application that is independent of the jABC.

D. Workflow Evolution

The whole solution to the P3 redesign challenge can be designed with little initial coding effort by instantiating existing template SIBs (like the SYS SIB used here) and graphically designing and configuring the workflows at the SLG level. In fact, this is already sufficient to support flexible change management and the production of variants, an important requirement for the second project phase.

IV. CONCLUSIONS AND PERSPECTIVES

We have presented an approach to the model-driven collaborative design of workflows for bioinformatic applications. At the center of this approach is the jABC [6], a framework for service development based on Lightweight Process Coordination, which we use here to complement a Webservice-based elementary service provisioning with functionality for mediation and choreography.
This way of handling the collaborative design of complex processes has proven to be effective and adequate for the cooperation of non-programmers (in this case biologists) and technical people, and it is now being rolled out in the operative practice.

The remote character of the tools can also be lifted from simple access, as it is now, into a remote tool integration platform like jETI [8], [9], which constitutes a layer of its own on top of the jABC. This is a matter of further development.

As shown e.g. in [3], the jABC approach can be enhanced with workflow adaptation techniques that can be used in a moderated or automatic mode. We are currently investigating, in a number of application domains, the requirements on adaptation and self-* properties that characterize the mediation and choreography level, with the aim of building a generic autonomic layer that reacts to circumstances in a way analogous to reflexes.

V. ACKNOWLEDGEMENTS

We thank our partners Prof. Dr. Edgar Wingender (Zentrum für Informatik, Statistik und Epidemologie, Abteilung Bioinformatik, Göttingen), Martin Haubrock and Knut Schwarzer (Universitätsklinikum Göttingen) for their open and constructive cooperation.

REFERENCES

[1] BioPerl homepage: http://bioperl.org/wiki/BioPerl
[2] Biospice project homepage: http://biospice.org
[3] V. Braun, T. Margaria, B. Steffen, H. Yoo, T. Rychly: Safe Service Customization, Proc. IN'97, IEEE Communication Soc. Workshop on Intelligent Network, Colorado Springs, CO (USA), 4-7 May 1997, IEEE Comm. Soc. Press.
[4] Ensembl gene database: http://www.ensembl.org
[5] H. Hungar, T. Margaria, B. Steffen: Test-Based Model Generation for Legacy Systems, IEEE International Test Conference (ITC), Charlotte, NC, September 30 - October 2, 2003.
[6] jABC Website: www.jabc.de
[7] M. Karusseit, T. Margaria: Feature-based Modelling of a Complex, Online-Reconfigurable Decision Support Service, WWV'05, 1st Int'l Workshop on Automated Specification and Verification of Web Sites, Valencia, Spain, March 14-15, 2005. Post-workshop proceedings to appear in ENTCS.
[8] T. Margaria: Web Services-Based Tool-Integration in the ETI Platform. SoSyM, Int. Journal on Software and System Modelling, Vol. 4, N. 2, May 2005, pp. 141-156, Springer Verlag.
[9] T. Margaria, R. Nagel, B. Steffen: Remote Integration and Coordination of Verification Tools in jETI. Proc. ECBS 2005, 12th IEEE Int. Conf. on the Engineering of Computer Based Systems, April 2005, Greenbelt (USA), IEEE Computer Soc. Press, pp. 431-436.
[10] T. Margaria, B. Steffen: Lightweight coarse-grained coordination: a scalable system-level approach. STTT 5 2-3 (2004) 107-123.
[11] T. Margaria, B. Steffen, M. Reitenspieß: Service-Oriented Design: The Roots, ICSOC 2005: 3rd ACM SIGSOFT/SIGWEB Intern. Conf. on Service-Oriented Computing, Amsterdam (NL), Dec. 2005, LNCS N. 3826, Springer Verlag, pp. 450-464.
[12] myGrid Project homepage: http://www.mygrid.org.uk/
[13] Rational Unified Process. http://www306.ibm.com/software/awdtools/rup/
[14] Semantic Web Services Challenge 2006: Challenge on Automating Web Services Mediation, Choreography and Discovery, organized by DERI, Stanford (USA). http://www.swschallenge.org/
[15] Wisconsin University BLAST, description available at https://gcg.gwdg.de/wublast/README.html