BioXSD
An XML Schema for sequence data, features, alignments, and identifiers

Matúš Kalaš (1,2), Edita Karosiene (3), László Kaján (4), Sveinung Gundersen (5), Jon Ison (6), Pål Puntervoll (1), Christophe Blanchet (7), Kristoffer Rapacki (3) and Inge Jonassen (1,2)

(1) Computational Biology Unit, Uni Computing and (2) Department of Informatics, University of Bergen, Bergen, Norway; (3) Center for Biological Sequence Analysis, Technical University of Denmark, Kongens Lyngby, Denmark; (4) Bioinformatics and Computational Biology Department, Technische Universität München, Garching, Germany; (5) Institute for Cancer Research, Oslo University Hospital, Oslo, Norway; (6) European Bioinformatics Institute, EMBL, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK; (7) Institut de Biologie et Chimie des Protéines, CNRS and Université Claude Bernard Lyon 1, Lyon, France.

BACKGROUND

The EMBRACE project (EU FP6, 2005-2010) explored ways of providing smoothly interoperable bioinformatics tools in the form of Web services. It initiated the development of BioXSD.

ADVANTAGES OF XML WITH XSD

Textual and tabular formats, XML, and RDF each have their own advantages in certain usage scenarios.

The table shows certain advantages of XSD-based XML formats over textual formats. Advantages marked with * also hold over schema-less RDF. The table lists advantages for users of tools and advantages for providers of tools, labelled with the kind of benefit: usability, security, maintainability, scalability, or semantics.

Automatic input validation (*)
Auto-generation of objects and GUIs *
Efficient compression (with EXI, a W3C standard) *
Semantic annotation of the format’s details (with SAWSDL)

An XSD (i.e. XML Schema) defines data objects, just as object-oriented programming languages do. In particular with Web services, an XSD is mandatory and useful.
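To illustrate the parallel between an XSD type and a class, here is a minimal Python sketch. The complex type in the comment, the element names `sequenceRecord`, `name`, and `sequence`, and the class `SequenceRecord` are all hypothetical and chosen only for illustration; they are not taken from the actual BioXSD schema.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET

# Hypothetical XSD complex type (an assumption, NOT the real BioXSD schema):
#   <xs:complexType name="SequenceRecord">
#     <xs:sequence>
#       <xs:element name="name" type="xs:string"/>
#       <xs:element name="sequence" type="xs:string"/>
#     </xs:sequence>
#   </xs:complexType>


@dataclass
class SequenceRecord:
    """Class with the same two fields as the hypothetical XSD type above."""
    name: str
    sequence: str

    @classmethod
    def from_xml(cls, text: str) -> "SequenceRecord":
        # Read the two child elements; a missing element becomes "".
        root = ET.fromstring(text)
        return cls(
            name=root.findtext("name", default=""),
            sequence=root.findtext("sequence", default=""),
        )

    def to_xml(self) -> str:
        # Serialise back to the same element layout.
        root = ET.Element("sequenceRecord")
        ET.SubElement(root, "name").text = self.name
        ET.SubElement(root, "sequence").text = self.sequence
        return ET.tostring(root, encoding="unicode")


example = "<sequenceRecord><name>seq1</name><sequence>ACGT</sequence></sequenceRecord>"
record = SequenceRecord.from_xml(example)
```

In practice such classes would typically be generated from the schema by a data-binding tool rather than written by hand; the hand-written version here only shows the correspondence between schema fields and object fields.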

Further advantages listed in the table, labelled with less effort and maintainability:

Easier conversion of formats
Workflow programming easier & faster
Standard parsing
Ready-made I/O building blocks: development easier & faster (*)

EXAMPLE WORKFLOW

[Figure: example workflow built with a) different formats and b) a common format. Blue rectangles are Web-service calls, red ovals are data.]

MIX-AND-MATCHING OF TOOLS

Without a common format, communication between diverse tools demands proprietary parsing, transformations, and “shims”, all of which must be maintained in the future. Using common data formats makes workflow construction and maintenance easier and faster.

The two scenarios show the demands of connecting two tools (such as Web services) that use a) different formats or b) a common format. With a common format, the connection is smooth.
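The two scenarios can be sketched in Python as follows. Everything here is a toy assumption made for illustration: the tools `tool_a_*` and `tool_b_*`, the FASTA-like and tab-separated text formats, and the `<record>` XML layout are hypothetical and do not come from BioXSD or any real service.

```python
import xml.etree.ElementTree as ET


# a) Different formats: tool A emits FASTA-like text,
#    tool B expects a single "name<TAB>sequence" line.
def tool_a_text() -> str:
    return ">seq1\nACGT\nTTGA\n"


def shim_fasta_to_tab(fasta: str) -> str:
    """Hand-written converter ("shim") that has to be maintained
    whenever either tool changes its format."""
    lines = fasta.strip().splitlines()
    name = lines[0].lstrip(">")
    return f"{name}\t{''.join(lines[1:])}"


def tool_b_text(tab_line: str) -> int:
    # Toy downstream tool: reports the sequence length.
    _name, sequence = tab_line.split("\t")
    return len(sequence)


# b) Common format: both tools exchange the same (hypothetical) XML record,
#    so tool B reads tool A's output with a standard XML parser.
def tool_a_xml() -> str:
    return "<record><name>seq1</name><sequence>ACGTTTGA</sequence></record>"


def tool_b_xml(xml_text: str) -> int:
    return len(ET.fromstring(xml_text).findtext("sequence", default=""))


length_via_shim = tool_b_text(shim_fasta_to_tab(tool_a_text()))  # needs the shim
length_direct = tool_b_xml(tool_a_xml())                         # no shim needed
```

Both paths give the same result, but only scenario a) carries an extra conversion step that must be written and kept in sync with both tools.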

EXAMPLE DATA: BioXSD Sequence record

Basic example:

LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATA
FMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLVEWIWGGFSVDKATLNRFFA
FHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLGLLILILLL
LLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLAL
FLSIVILGLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTII
GQMASILYFSIILAFLPIAGXIENY

[Figure: type diagram of the BioXSD Sequence record.]
