International Journal of Web Services Practices, Vol. 4 No.1(2009), pp. 36-43
A Web Services based Framework for Uniform Integration of Command-line Bioinformatics Software Tools
Elarbi Badidi, M. Vall Mohamed Salem, Salah Bouktif, and Larbi Esmahi
Abstract—Life scientists use a variety of bioinformatics software tools to perform tasks such as annotation of DNA and protein sequences. Most of these tools are command-line driven and handle various data types (nucleotide, protein, etc.) and data formats (Fasta, Genbank, GCG, etc.). Because an analysis task generally involves many such tools, scientists increasingly require that these heterogeneous tools be integrated in a uniform way. They also require graphical user interfaces for these tools and the ability to compose workflows without much programming effort. In this paper, we propose a Web services based framework that meets these requirements.
Index Terms—Web services, data and tools integration, biological workflows.
I. INTRODUCTION
Life science data and application integration and interoperability are among the most challenging problems facing bioinformatics today. Indeed, to enable the discovery of new biological insights and to create a global perspective from which unifying principles in biology can be discerned, scientists have to interpret many types of information from a variety of sources. These sources include nucleotide and amino acid sequences, protein domains, protein structures, and gene expression profiles. The structure of biological data has its own characteristics, which set it apart from data in other domains. Biological data exists in the form of terabytes of nucleotide sequence data, microarray and other image data, and various other forms of data that result from both experimental and “in silico” research efforts. Due to this huge amount of data, it is often impossible to interpret and understand the data without the support of additional hardware and software facilities.
Manuscript received March 15, 2009. Elarbi Badidi is an Assistant Professor of computer science at the College of Information Technology (CIT) of United Arab Emirates University, Maqam Campus, P.O. Box 17551, Al-Ain, UAE (Phone: 971-3713-5552; Fax: 971-3-762 6309; e-mail: ebadidi@uaeu.ac.ae). M. Vall Mohamed Salem is an Assistant Professor of computer science at the University of Wollongong in Dubai, UAE (e-mail: [email protected]). Salah Bouktif is an Assistant Professor of computer science at the College of Information Technology (CIT) of United Arab Emirates University, Maqam Campus, P.O. Box 17551, Al-Ain, UAE (Phone: 971-3713-5523; Fax: 971-3-762 6309; e-mail: salahb@uaeu.ac.ae). Larbi Esmahi is an Associate Professor at the School for Computing & Information Systems, Athabasca University, Athabasca, Alberta, Canada (e-mail: [email protected]).
While genomic data have a well-known representation as sequences taken from the {A,C,G,T} alphabet, there is no clear model for data representing the expression products of genes: proteins and higher forms of organisms, e.g., cells and the multitude of forms they assume in response to environmental challenges. To accomplish tasks such as annotation and manipulation of DNA and protein sequences, and comparison of genes and genomes across species, life scientists have to use a variety of bioinformatics analysis software tools. These tools may use various data types (nucleotide, protein, taxonomy, etc.) and data formats (Fasta, Staden, Embl, Genbank, etc.). Most of them were originally stand-alone, command-line driven programs with textual input and output. Moreover, the types and styles of their inputs and outputs vary just as widely, as there is no standard for parameter usage. The lack of graphical user interfaces (GUI) makes them cumbersome for the end user. Another major bottleneck is that most of these software applications are incompatible with one another because they use different file formats. As a consequence, the output of one tool cannot be used as input for another without data format conversion. A further complication is that the user has to define a multitude of parameters and options according to the particular data or aim of the analysis, and there is no standard way to describe input parameters and output results. In this paper, we use the terms tool and application interchangeably.
In the past few years, two main approaches have been considered to deal with the issue of bioinformatics tools integration. The first approach consists in developing and deploying interactive environments locally in order to facilitate bioinformatics analyses. Examples of such environments are Isys [1], Turbobench [2], and Applab [3]. The drawback of this approach is that the integration environment as well as the tools must be installed and configured locally, which requires substantial IT expertise. The second approach takes advantage of the growing use of the Web to make the tools accessible through Web interfaces using HTML and various scripting languages (CGI, Perl, etc.) and technologies (Java, RMI, EJB, and CORBA). Examples of environments adopting this approach are Bionavigator [4], NCSA Biology Workbench [5], and Anabench [6]. Recent developments in distributed computing have led to Web and Grid services technologies, which promise to alleviate the integration and
ISSN 1738-6535 © Web Services Research Foundation
interoperability issues. In this paper, we present our proposed Web services based framework for bioinformatics tools integration, which allows wrapping applications as Web services. In contrast to the above frameworks, in which the tools are accessed only from a Web interface, our framework will allow accessing application services through a Web portal as well as programmatically by exporting their WSDL [7] files. The Web portal may be implemented using various Web technologies such as JSP, ASP, and JSF.
II. RELATED WORK
The trend now is to use Web and Grid services technologies as well as semantic Web services to solve the integration problem not only in the life sciences but in enterprise (or business) integration as well. Examples of life sciences frameworks using these technologies are Soaplab, myGrid, and BioMoby. Soaplab [8] is a SOAP-based programmatic interface to command-line applications on remote computers. Soaplab uses Apache Axis [9] to create Sun Java implementation classes and deployment descriptors for all derived analysis services. It uses CORBA on the server side to find, start, control, and use applications. Soaplab has been developed within the UK e-Science initiative as a component of the myGrid [10] project. myGrid provides high-level open source grid middleware to facilitate building high-level services for data and application resource integration, such as resource discovery, workflow execution, and distributed query processing. The emphasis is on data integration, workflow, and personalization. Using the myGrid workflow construction tool Taverna [11], workflows can be composed with semantic descriptions and published. BioMoby [12] is an open source project that aims to provide a framework for the discovery, representation, integration, and retrieval of biological data from widely disparate data hosts and analysis services using Web services technology.
myGrid and BioMoby are very ambitious projects that aim to integrate biological data distributed worldwide in disparate sources. Our framework shares the goals of Soaplab, which is also designed to provide Web services wrappers for command-line applications, mainly those of the Emboss bioinformatics package [13]. However, the main difference between our framework and Soaplab lies in the approach used for describing the inputs and outputs of the analysis programs. In Soaplab, the service generation mechanism expects inputs and outputs described in the ACD language of the Emboss package, while our service generation expects inputs and outputs described in XML using the XML schema for tool description that we have developed. The use of XML and XML Schema greatly reduces the complexity of dealing with heterogeneous analysis tools. By developing Web services wrappers for these command-line tools, we can overcome many of the limitations mentioned above. By using the Java Architecture for XML Binding (JAXB), we can easily implement these Web service wrappers as well as the user interfaces of the tools from their XML descriptors. Also, the composition of Web services using standards such as BPEL [14] will facilitate the composition of biological protocols.
III. BACKGROUND
A. Command-line interfaces
A command line interface (CLI) is a user interface to a computer's operating system or a software application in which the user responds to a visual prompt by typing a command on a specified line to perform a specific task, receives a response back from the system, and then enters another command, and so forth. The Unix prompt and the MS-DOS prompt in a Windows operating system are examples of command line interfaces. CLIs are often used by system administrators and programmers in engineering and scientific environments. A CLI is often used when a large vocabulary of commands, together with a wide range of options, can be entered more rapidly as text than with a pure GUI. This is typically the case with operating system command shells. Today, most users prefer the graphical user interface (GUI) offered by Windows, Mac OS, and others. Many of today's Unix-based systems and software applications, such as MySQL and MATLAB, offer both a command line interface and a graphical user interface, with the benefits of both.
In a CLI, commands are typically written in a particular way. The command name is typed first, with no spaces in the name. Then, after a space, one can sometimes modify the command by adding what are called “options” or “parameters”. Options change or limit the way the command is executed. They are usually preceded by a dash or another symbol. A command may also include the name of a file or directory that one wants the command to work on. The finished command will look something like this:
command -option file
command -option sourceFile destinationFile
Fig. 1. Example of command-line options
A CLI defines a grammar, a set of rules that all commands within the CLI must follow. This is the case for the UNIX operating system. These rules may differ from one CLI to another. Given this heterogeneity of rules, it is only through the documentation of the CLI that one can learn how to run the commands with the right options. In bioinformatics, several software packages and tools, such as Emboss, provide only a command line interface. Fig. 1 provides a short description of the options that can be used with the seqret tool of Emboss.
B. Web services in the Life Sciences
With increasing acceptance among software vendors and rising adoption in the marketplace, Web services are becoming the basis for many Web-based applications. They are interoperable across platforms and neutral to languages, which makes them appropriate for access from heterogeneous environments, enabling mass dissemination of knowledge. In the life sciences, adopting Web services technology is seen as key to achieving coordination and interoperability among incompatible bioinformatics applications available from different providers, an endeavor that is becoming more and more significant to biological research. Within the life sciences community, scientists spend a great deal of their time using various incompatible tools, accessing remote databases, copying and pasting data to combine their analyses, converting data formats, and assembling results in ad-hoc protocols. The service-oriented approach of accessing bioinformatics tools and databases as services in a standardized fashion has therefore very quickly caught the interest of several organizations and research centers, which have published their tools as Web services. The National Center for Biotechnology Information (NCBI; http://www.ncbi.nlm.nih.gov/) has published its Entrez Utilities as Web services. The DNA Data Bank of Japan (DDBJ; http://www.ddbj.nig.ac.jp/) provides a Web API for biology (WABI), which includes several services. The European Bioinformatics Institute (EBI; http://www.ebi.ac.uk/) Web services provide programmatic access to data retrieval and analytical tools for several molecular databases.
C. Biological workflows
Complex analysis, annotation, and data integration typically involve various bioinformatics tools. In the past few years, several software environments and platforms have emerged to enable the orchestration and execution of these tools in biological workflows.
Examples of such systems include Bionavigator [4], Turbobench [2], Pegasys [15], w3h [16], G-Pipe [17], Biopipe [18], VIBE [19], Flosys [20], and Pise [21]. With the current trend towards using Web services technology in the life sciences, new environments and frameworks have emerged to enable the orchestration and execution of biological Web services. The most notable systems include Taverna (part of the myGrid project) [11] and BioMoby [12], as mentioned earlier. Moreover, the different standards in the area of Web services (WS-* standards), in particular the standards for service composition, including the Business Process Execution Language for Web Services (BPEL4WS) and the Business Process Modeling Language (BPML), will allow further progress in biological data and tools integration.
D. Workflow use case: phylogenetic analysis
A phylogenetic analysis workflow of newly sequenced protein genes involves the following steps:
1) Translation of the nucleic acid sequence to the corresponding peptide sequence in six frames (e.g. using tools such as transeq [13] or the ExPASy translate tool [22]),
2) Identification of ORFs (open reading frames) that correspond to conserved proteins by similarity search (e.g. using tools such as blast [23] or getorf [13]),
3) Retrieval of protein sequences from GenBank [24] (e.g. using Entrez [25] or seqret [13]),
4) Multiple protein alignment (e.g. using clustalw [26]),
5) Extraction of well aligned sequence stretches [27],
6) Tree inferences based on various models of evolution, and
7) Tree testing (e.g. using phylip [28] and consel [29]).
Typically, such analysis pipelines are employed several times with different data sets or parameters.
IV. FRAMEWORK OVERVIEW
A. Objectives
With the growing interest in data and tools integration in the life sciences and the limited number of integration frameworks based on Web services, we set out to develop a framework that makes it possible to:
1) Develop a Web service wrapper around each command-line tool,
2) Make the interfaces of these tools unified and remotely accessible,
3) Hide their dependencies on the underlying operating systems, and
4) Access these tools programmatically in order to compose workflows describing many biological protocols.
By converting the command-line applications into Web services, we can overcome many of the limitations and the heterogeneity of styles of these applications that we mentioned in the introduction. In this paper, an application service is an application with a Web service interface that is described by a WSDL document as a collection of network endpoints, or ports. To wrap a command-line tool as a Web service, we first describe the tool properties and its parameters in XML. We then translate the XML specification of the tool, which describes its parameters, its data types, and its data formats, into WSDL, and create an entry in the UDDI registry [30] to advertise the WSDL specification. Web services clients can then look up the WSDL in the registry and interact with the tool as a Web service. Client applications can be written in various programming languages. The XML description of the data types and the data formats supported by command-line tools greatly reduces the complexity of their composition into workflows.
Two given tools may be composed serially in a workflow provided that the data type and data format of the first tool's output are compatible with the data types and data formats of the second tool's input. In the case of incompatible data formats, a data format conversion tool, such as readseq [31], is introduced between the two tools in the workflow. The conversion is then performed without manual user intervention and without data loss. However, readseq is not recommended for very large (100+ MB) sequence files, whether as a single record or multiple records.
B. System components
The architecture of the system is shown in Fig. 2. It is a three-tier architecture composed of a client layer providing presentation logic, a middle-tier layer containing the business logic, and a back-end database. Users may interact with the system through a portal using HTML and JSP (Java Server Pages) screen forms, for example, or by invoking the application services programmatically. The requests for application services submitted by users at the presentation level are handled by an application server equipped with a Servlet and JSP container. Requests may be invocations of individual services or may be part of a workflow. The Workflow Manager allows the user to compose workflows from the application services generated from command-line applications. This may be performed from a user interface or programmatically, given that the input and output types and formats are described for each application, as we will see in the next section. At the back-end level, we find the command-line applications, the session management database, which stores information about users' sessions, and the service registry. We have designed the system architecture based on service oriented architecture principles, and especially on using XML to describe tools and their parameters. The use of XML and XML Schema greatly reduces the system complexity and copes with the heterogeneity of applications.
Fig. 2. Framework components
The key component of the system is the XML schema we have developed to describe the applications and their parameters. This XML schema captures most of the situations and cases of applications and parameters. For instance, with this schema we should be able to describe applications that handle several data types (nucleotide, protein, taxonomic, etc.) and several data formats (Fasta, Genbank, Embl, etc.), as well as heterogeneous parameters with different types and different syntax for assigning values to the parameters. From the XML description of an application, we may generate the user interface in the form of HTML and JSP pages, for example, and the Web service associated with the application. This is performed by the User Interface Generator and the Web Service Generator components. The WSDL of the application service is then published to the service registry.
C. Bioinformatics tools as Web services
Many platforms are now available to develop Web services applications. Some of them are commercial products, while others are freely available, such as the Jakarta Tomcat Web server and the Apache Axis toolkit. Using this toolkit, for example, one can convert a Java application into a Web service. To build a Web service for a given tool, we have to write a Java application to invoke and execute this tool. One can use the Java class described in Fig. 3 to run an application that is external to the Java virtual machine (JVM).
Fig. 3: a Java class to run applications external to the JVM.
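The listing of Fig. 3 is not reproduced in this text. As a stand-in, the following minimal sketch shows one way such a class could launch a tool outside the JVM with the standard `ProcessBuilder` API. The class name `ExternalToolRunner`, the nested `Result` type, and the exact signature of `runApplication()` are assumptions made for illustration; only the three inputs (application name, path, and arguments) come from the text.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the class of Fig. 3; not the authors' code.
final class ExternalToolRunner {

    /** Exit code and console output of one finished tool run. */
    static final class Result {
        final int exitCode;
        final String output;

        Result(int exitCode, String output) {
            this.exitCode = exitCode;
            this.output = output;
        }
    }

    /**
     * Runs the executable `name` located in directory `path` with the given
     * arguments and waits for it to finish.
     */
    static Result runApplication(String name, String path, List<String> args)
            throws IOException, InterruptedException {
        // Build the argument vector: executable first, then its options.
        List<String> command = new ArrayList<>();
        command.add(Paths.get(path, name).toString());
        command.addAll(args);

        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectErrorStream(true); // merge stderr into stdout
        Process process = builder.start();

        // Drain the output before waiting so the child cannot block on a full pipe.
        String output;
        try (InputStream stdout = process.getInputStream()) {
            output = new String(stdout.readAllBytes(), StandardCharsets.UTF_8);
        }
        return new Result(process.waitFor(), output);
    }
}
```

A caller would pass the tool's options as separate list elements, for example `runApplication("command", "/some/dir", List.of("-option", "sourceFile", "destinationFile"))`, where the directory is a placeholder. Passing an argument list rather than a single command string avoids shell quoting issues.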
The method runApplication() requires the name of the application (name of the executable program), the path to the application, and the arguments to be passed to the application (input, output, and other options of the tool). To illustrate this, we will consider a small tool called infoalign, which is part of the Emboss package. Infoalign is a small utility that lists some simple properties of sequences in an alignment. The above code fragment can be converted into a Web service with the following methods: setInfoalignInput(), setInfoAlignOutput(), setInfoAlignOptions(), and runInfoalign(). A WSDL file for this Web service will be automatically created. This feature is provided by most Web services development platforms. This file may be accessed, for example, from: http://localhost:8091/axis/services/RunInfoAlign.wsdl
Using this WSDL file, client applications may be created to consume the newly created Web service. Under this scheme, the client must have prior knowledge of the parameters of the application (input, output, and options), including their syntax and order. To solve this problem, information about the application parameters may be made available from the Web service itself. By adding a new method called getInfoalignParameters() to the Web service, the client can get a description of the parameters of the infoalign application and can then customize the user interface used to invoke the infoalign Web service. This solution is very simplistic, since describing an application and its parameters is more complex in general because of the variety of tools and their parameters.
D. Application and parameters description
Prior to generating a Web service for a command-line application, the parameters of the application should be described in a structured way that can be used by programs. As these command-line applications are developed by several
programmers, and implemented in various programming languages, they do not follow the same rules for specifying their parameters. Therefore, any general description schema for application parameters is necessarily tentative. We have formalized the descriptions of applications and parameters by defining an XML schema for both. The main elements describing an application are: application name, application description, version number, category, documentation URL, application path, minimum number of input data, maximum number of input data, input types, and input formats.
Fig. 4. Definition of input types and input/output data formats.
An application tool may handle one or more data types, including protein, nucleotide, taxonomic, and result. Also, an application tool may handle one or several input and output data formats. This is described in Fig. 4 using XML schema types. The output of an application may likewise be in one of the above data formats. By describing the data types and data formats handled by an application, we can compose workflows by connecting the outputs of an application service to the inputs of other application services. The main elements describing a parameter of an application are: parameter name, parameter description, type, the option used to invoke the parameter, the value to be assigned to the parameter, the syntax describing how a value is assigned to the parameter, the min and max values in the case of integer parameters, and default values. Each parameter belongs to one of the following types: IntegerParameter, FloatParameter, StringParameter, SwitchParameter, ChoiceListParameter, FileParameter, and SequenceParameter. The complete XML schema is available at: http://faculty.uaeu.ac.ae/ebadidi/applSchema.xsd
V. IMPLEMENTATION
A. Generic XML Schema
A prototype of our proposed framework is under construction. We have developed the XML schema for application and parameters description using Stylus Studio enterprise edition [32]. This schema was used to develop and validate the descriptions of some bioinformatics tools such as clustalw, infoalign, seqret, transeq, pepcoil, and silent. A template has also been generated from the schema to allow easy description of any command-line analysis tool. Using this template and the textual documentation of a given tool, one can create its XML descriptor, which may then be validated against our XML schema. Fig. 6 shows an extract from the XML description of the clustalw application for multiple alignments.
B. Web service and User interface generation
To implement the Java Web service for a given XML descriptor, we use the Java Architecture for XML Binding (JAXB 2.0), which provides a fast and convenient way to bind XML schemas to Java representations, making it easy for Java developers to incorporate XML data and processing functions in Java applications. As part of this process, JAXB provides methods for unmarshalling XML descriptor documents into Java content trees of data objects instantiated from the generated JAXB classes. These content trees are then used to implement the associated Web services and user interfaces using JSP and HTML. This process is illustrated in Fig. 5.
Fig. 5. Implementation process
Fig. 7 depicts the JSP interface generated from the above classes for the seqret tool from the Emboss package.
Using the above process, we generate JAXB classes from the XML descriptors for a few command-driven bioinformatics software tools from the Emboss package, such as infoalign, seqret, transeq, getorf, silent, and pepcoil. These classes are used in the implementation of the related Web services. Table I provides a description of the operations of some of these Web services.
TABLE I
SAMPLE WEB SERVICES OPERATIONS
Fig. 6. XML Description of the clustalw application
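The content of Fig. 6 is not reproduced in this text. To give a rough idea of how a descriptor can be read by a program, the sketch below parses a small, entirely hypothetical descriptor. It uses the DOM parser bundled with the JDK rather than JAXB, so that it runs without extra libraries; this is a substitution for illustration, not the authors' implementation. The element and attribute names (`application`, `inputFormat`, `outputFormat`, `parameter`, `option`), the tool name `exampleTool`, and the class `DescriptorReader` are all assumptions and are not taken from the published schema.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Illustrative only: reads a made-up tool descriptor with the JDK's DOM parser.
final class DescriptorReader {

    // Hypothetical descriptor; names are NOT those of the authors' XML schema.
    static final String SAMPLE =
          "<application name=\"exampleTool\">"
        + "  <inputFormat>Fasta</inputFormat>"
        + "  <inputFormat>Genbank</inputFormat>"
        + "  <outputFormat>Fasta</outputFormat>"
        + "  <parameter type=\"SwitchParameter\" option=\"-verbose\"/>"
        + "</application>";

    /** Parses the descriptor text and returns its root element. */
    static Element parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
    }

    /** Collects the text content of every descendant element named `tag`. */
    static List<String> texts(Element root, String tag) {
        NodeList nodes = root.getElementsByTagName(tag);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent().trim());
        }
        return values;
    }

    /** Collects the command-line option string of every declared parameter. */
    static List<String> options(Element root) {
        NodeList nodes = root.getElementsByTagName("parameter");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(((Element) nodes.item(i)).getAttribute("option"));
        }
        return result;
    }
}
```

With JAXB, the same information would instead be exposed through getters on classes generated from the schema, which is what makes generating the service and the user interface from a descriptor largely mechanical.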
Fig. 8. Tree representation of the seqret WSDL file.
Fig. 7. Generated JSP User Interface of Seqret
The getInputTypes() operation returns the list of data types (nucleotide, protein, etc.) that should be provided as input data to the tool. The getInputFormats() operation returns the list of data formats (Fasta, Genbank, GCG, etc.) of the input data of the tool. The getOutputFormats() operation returns the possible data formats of the output files of the tool. The run operations (runInfoalign, runSeqret, …) launch the execution of a tool given input data and a list of arguments. Fig. 8 shows the tree representation of the seqret WSDL file.
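The values returned by operations of this kind are what a workflow composer needs in order to decide whether two services can be chained. The following minimal sketch illustrates such a check under stated assumptions: the class `WorkflowLinkChecker`, the `ToolPort` type, and the three-way `Link` result are hypothetical names, and data types and formats are modelled simply as sets of strings. The rule that a format mismatch can be bridged by a conversion tool such as readseq follows the description given earlier in the paper.

```java
import java.util.Collections;
import java.util.Set;

// Illustrative compatibility check between two application services.
final class WorkflowLinkChecker {

    /** Outcome of checking whether one tool's output can feed another tool. */
    enum Link { DIRECT, NEEDS_FORMAT_CONVERSION, INCOMPATIBLE }

    /** Data types and data formats on one side of a connection (hypothetical type). */
    static final class ToolPort {
        final Set<String> types;
        final Set<String> formats;

        ToolPort(Set<String> types, Set<String> formats) {
            this.types = types;
            this.formats = formats;
        }
    }

    static Link check(ToolPort upstreamOutput, ToolPort downstreamInput) {
        // No shared data type (e.g. protein vs. nucleotide): the link cannot be repaired.
        if (Collections.disjoint(upstreamOutput.types, downstreamInput.types)) {
            return Link.INCOMPATIBLE;
        }
        // Same kind of data but no common format: a format converter is inserted between the tools.
        if (Collections.disjoint(upstreamOutput.formats, downstreamInput.formats)) {
            return Link.NEEDS_FORMAT_CONVERSION;
        }
        return Link.DIRECT;
    }
}
```

For instance, a service producing protein data in Fasta format and a service accepting protein data only in Genbank format would yield `NEEDS_FORMAT_CONVERSION`, whereas a service accepting only nucleotide input would yield `INCOMPATIBLE`.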
C. Biological workflow composition
Using the above operations, it becomes possible to link the tools' Web services in workflows. Indeed, these operations allow checking the compatibility between the types and formats of the output data of a given tool and the types and formats of the input data of another tool. Two of the generated Web services can be composed in a workflow if the output data of the first Web service is compatible, in terms of types and formats, with the input data of the second Web service. As a first attempt to specify biological workflows from the generated Web services, the Business Process Execution Language (BPEL) was the obvious composition language of choice. BPEL provides a rich vocabulary for defining processes and has several features which are not found in programming languages. It enables users to describe business process activities as Web services and to define how they can be connected to accomplish specific tasks. The Netbeans development platform has supported the design of BPEL processes since version 5.5. Our goal is to allow the scientist to visually compose and execute workflows in an easy way by hiding the technical details of BPEL. A graphical user interface will allow the user to specify a workflow simply by dragging and dropping tools onto a canvas. In addition, the composition of workflows should be carried out by checking the compatibility among application services based on their inputs and outputs. The Workflow Manager component is responsible for allowing such visual composition and enactment of workflows from our generated Web services. While investigating existing tools for visual composition, we found a tool called JOpera [33], which provides a language for visual composition and which is implemented as a plugin of the Eclipse development environment.
JOpera is a rapid service composition tool offering a visual language and an execution environment for building processes out of reusable services, which include but are not strictly limited to Web services. It enables composing Web services into processes by visually specifying the order of invocation of each service (control flow) and by modeling the patterns of data exchange between the services (data flow). The JOpera environment provides support for the whole lifecycle of a process; it features a visual monitoring and debugging environment that lets the user interact with a running process. Fig. 9 depicts a biological workflow that we have developed to experiment with the JOpera environment. It is created from the SeqretWS, TranseqWS, and GetorfWS Web services generated respectively for the seqret, transeq, and getorf Emboss tools. The Bioworklow process is composed of three sub-processes: SeqretSubprocess, TranseqSubprocess, and GetorfSubProcess. Each of these subprocesses is composed of tasks that represent the invocation of the associated Web service operations. For instance, the SeqretSubprocess is associated with the SeqretWS Web service.
Fig. 9- Biological workflow created with the JOpera environment.
VI. CONCLUSION
In this paper, we have presented a new framework for integrating bioinformatics tools by wrapping them as Web services. These tools are characterized by the heterogeneity of their styles, their parameters, and the data types and formats they can handle. Our proposed framework allows creating uniform interfaces to these tools without having to modify their code or write additional code. This greatly simplifies composing these applications into workflows to implement biological protocols. The framework is based on a generic XML schema that describes bioinformatics applications and their parameters in an easy way and captures various styles and scenarios for using parameters in a command-line tool. A prototype of our framework is still under development, and some sample application services, such as infoalign, seqret, transeq, silent, getorf, and pepcoil, have been generated and customized. As future work, we intend to add various command-line biological tools to the framework and to integrate JOpera with our workflow manager. In addition to its own tools, the framework will provide support for importing external biological Web services available from the bioinformatics community, and for composing them into workflows in the same way as local Web services.
International Journal of Web Services Practices, Vol. 4 No.1(2009), pp. 36-43 ACKNOWLEDGMENT The authors would like to thank Haifa Al Abdouli, Halima Shehyari, and Mariam Hefaity for their contribution in the implementation of the proposed test-bed. REFERENCES [1]
[2] [3] [4]
[5]
[6]
[7] [8]
[9] [10]
[11]
[12]
[13]
[1] A. Siepel, A. Farmer, A. Tolopko, M. Zhuang, P. Mendes, W. Beavis, and B. Sobral. ISYS: a decentralized, component-based approach to the integration of heterogeneous bioinformatics resources. Bioinformatics, 2001, 17, pp. 83-94.
[2] TurboGenomics Inc (n.d.). TurboBench overview. http://www.turbogenomics.com/products/turbobench_overview.pdf
[3] M. Senger. AppLab - A CORBA-Java based Application Wrapper. http://www.omg.org/docs/corbamed/98-03-08.pdf
[4] T.G. Littlejohn. Bioinformatics tools for genome projects. In Molecular Breeding of Forage Crops, Spangenberg, G. (ed.), Kluwer Acad. Publ., The Netherlands, 2001, pp. 83-99.
[5] R. Unwin, J. Fenton, M. Whitsitt, C. Jamison, M. Stupar, E. Jakobsson, and S. Subramaniam. Biology Workbench: A WWW-based Virtual Computing and Analysis Environment for the Biological Sciences. Bioinformatics (Databases and Systems, S. Letovsky (Ed.)), 1998, pp. 233-244.
[6] E. Badidi, C. DeSousa, F. Lang, and G. Burger. AnaBench: a Web/CORBA-based Workbench for biomolecular sequence Analysis. BMC Bioinformatics, 2003, 4:63.
[7] World Wide Web Consortium. Web Services Description Language 2.0 (W3C working draft 3). http://www.w3.org/tr/wsdl20
[8] M. Senger, P. Rice, and T. Oinn. Soaplab - a unified Sesame door to analysis tools. Paper presented at the UK e-Science All Hands Meeting, 2003, Nottingham, UK.
[9] The Apache Software Foundation (n.d.). Web services – Axis. http://ws.apache.org/axis
[10] R.D. Stevens, A.J. Robinson, and C.A. Goble. myGrid: personalised bioinformatics on the information grid. Bioinformatics, 2003, 19 (Suppl. 1), pp. i302-i304.
[11] T. Oinn, M.J. Addis, J. Ferris, D.J. Marvin, M. Greenwood, T. Carver, A. Wipat, and P. Li. Taverna, lessons in creating a workflow environment for the life sciences. Paper presented at the GGF10, Berlin, Germany, 2004.
[12] M.D. Wilkinson and M. Links. BioMOBY: An open source biological web services proposal. Briefings in Bioinformatics, 2003, 3(4), pp. 331–341.
[13] P. Rice, I. Longden, and A. Bleasby. EMBOSS: The European Molecular Biology Open Software Suite. Trends in Genetics, 2000, 16, pp. 276-277.
[14] OASIS. Web Services Business Process Execution Language Version 2.0. OASIS Standard, 11 April 2007.
[15] S.P. Shah, D.Y. He, J.N. Sawkins, J.C. Druce, G. Quon, D. Lett, G.X. Zheng, T. Xu, and B.F. Ouellette. Pegasys: software for executing and integrating analyses of biological sequences. BMC Bioinformatics, 2004, 5:40.
[16] P. Ernst, K-H. Glatting, and S. Suhai. A task framework for the web interface W2H. Bioinformatics, 2003, 19, pp. 278-282.
[17] A.G. Castro, S. Thoraval, L.J. Garcia, and M.A. Ragan. Workflows in bioinformatics: meta-analysis and prototype implementation of a workflow generator. BMC Bioinformatics, 2005, 6:87.
[18] S. Hoon, K. Kumar Ratnapu, J. Chia, B. Kumarasamy, X. Juguang, M. Clamp, A. Stabenau, S. Potter, L. Clarke, and E. Stupka. Biopipe: A Flexible Framework for Protocol-Based Bioinformatics Analysis. Genome Research, 2003, 13: 1904-1915.
[19] INCOGEN. Visual integrated bioinformatics environment. White paper. http://www.incogen.com/public_documents/vibe/VIBE_Whitepaper.pdf
[20] E. Badidi, G. Burger, and B.F. Lang. FLOSYS - a Web accessible workflow system for protocol-driven biomolecular sequence analysis. Cellular and Molecular Biology Journal, 2004, 50(7): 785-793.
[21] C. Letondal. A web interface generator for molecular biology programs in Unix. Bioinformatics, 2001, 17: 73-82.
[22] Swiss Institute of Bioinformatics (SIB). ExPASy Proteomics tools. http://www.expasy.ch/tools/
[23] S.F. Altschul, W. Gish, W. Miller, E.W. Myers, and D.J. Lipman. Basic local alignment search tool. J. Mol. Biol., 1990, 215: 403-410.
[24] D.A. Benson, I. Karsch-Mizrachi, D.J. Lipman, J. Ostell, and D.L. Wheeler. GenBank. Nucl. Acids Res., 2003, 31: 23-27.
[25] G.D. Schuler, J.A. Epstein, H. Ohkawa, and J.A. Kans. Entrez: molecular biology database and retrieval system. Methods in Enzymology, 1996, 266: 141-162.
[26] J.D. Thompson, D.G. Higgins, and T.J. Gibson. CLUSTALW: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Research, 1994, 22, pp. 4673-4680.
[27] J. Castresana. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Molecular Biology and Evolution, 2000, 17: 540-552.
[28] J. Felsenstein. PHYLIP phylogeny inference package (version 3.2). Cladistics, 1989, 5: 164-166.
[29] H. Shimodaira and M. Hasegawa. CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics, 2001, 17(12): 1246-1247.
[30] OASIS. Universal Description, Discovery and Integration (UDDI) Version 3.0.2. http://uddi.org/pubs/uddi-v3.0.2-20041019.htm
[31] D.G. Gilbert. Sequence file format conversion with command-line Readseq. In Current Protocols in Bioinformatics, A. Baxevanis and D. Davison, eds. Wiley, 2002.
[32] Progress Software Corporation (n.d.). Stylus Studio. http://www.stylusstudio.com/
[33] C. Pautasso. JOpera: Process Support for more than Web services. http://www.iks.ethz.ch/jopera

Elarbi Badidi is an Assistant Professor of computer science at the College of Information Technology (CIT) of United Arab Emirates University. Before joining the CIT, he held the position of bioinformatics group leader at the Biochemistry Department of Université de Montréal from 2001 to July 2004. He received a Ph.D. in computer science in 2000 from Université de Montréal, Québec (Canada). His research interests include Web services and Service Oriented Computing, Middleware, and Bioinformatics data and tools integration.

M. Vall Mohamed Salem is currently an Assistant Professor with the University of Wollongong in Dubai. His current interests are in performance analysis and scalability issues, distributed systems, and software engineering. He received a Ph.D. in computer science in 2002 from Université de Montréal, Québec (Canada). He held an IBM Canada Centre for Advanced Studies fellowship and can be reached at [email protected].

Salah Bouktif is an Assistant Professor of software engineering at the College of Information Technology (CIT) of United Arab Emirates University. Before joining CIT, Dr. Bouktif was a postdoctoral fellow for two years at the Department of Computer Engineering of the École Polytechnique de Montréal. He received his Ph.D. degree in 2005 with high honors from the University of Montreal. Dr. Bouktif's research interests include metrics and software quality models, software quality prediction improvement, search-based software engineering, software testing and test data generation, software evolution, and change and cost modeling.

Larbi Esmahi is an Associate Professor in the School of Computing and Information Systems at Athabasca University. He was the graduate program coordinator at the same school during 2002-2005. He holds a Ph.D. in electrical engineering from Ecole Polytechnique, University of Montreal. His current research interests are in e-services, e-commerce, multiagent systems, and intelligent systems. He is an associate editor for the Journal of Computer Science and the Tamkang Journal of Science and Engineering. He is also a member of the editorial advisory board of the Advances in Web-Based Learning Book Series, IGI Global, and a member of the international editorial review board of the International Journal of Web-Based Learning and Teaching Technologies.
ISSN 1738-6535 © Web Services Research Foundation