BioXSD - the XML Schema for basic bioinformatics data

Matúš Kalaš (1,2), Pål Puntervoll (1), Edita Karosiene (3), Christophe Blanchet (4), Sveinung Gundersen (5), Jon Ison (6), Kristoffer Rapacki (3) and Inge Jonassen (1,2)

(1) Computational Biology Unit, Uni Computing and (2) Department of Informatics, University of Bergen, Bergen, Norway; (3) Center for Biological Sequence Analysis, Technical University of Denmark, Kongens Lyngby, Denmark; (4) Institut de Biologie et Chimie des Protéines, CNRS and Université Claude Bernard Lyon 1, Lyon, France; (5) Institute for Cancer Research, Oslo University Hospital, Oslo, Norway; (6) European Bioinformatics Institute, EMBL, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.

[email protected]

ADVANTAGES OF XSD

STRATEGY

The EMBRACE project (EU FP6) has recommended a way of providing smoothly interoperable bioinformatics tools.

Textual and tabular formats, XML, and RDF each have their advantages in certain usage scenarios.

The table shows certain advantages of XSD-based XML formats over textual formats. Items marked * are also advantages over RDF.

The table has one column of advantages for users of tools and one for providers of tools. Entries are tagged by category: usability, security, maintainability, scalability, semantics, resources.

Advantages listed:

Automatic input validation *
Easier conversion of formats
Parsing “for free”
Auto-generation of objects and GUIs *
Efficient compression (with the EXI standard by W3C) *
Semantic annotation of a type’s details (with SAWSDL)
Workflow programming easier & faster
Ready-made I/O building blocks: development easier & faster (*)

An XSD (i.e. an XML Schema) defines data objects, just as object-oriented programming languages do. With Web services in particular, an XSD is both mandatory and useful.

MIX-AND-MATCHING OF TOOLS

EXAMPLE WORKFLOW

[Figure: two workflow scenarios, 1. proprietary formats and 2. common format. Blue rectangles are Web-service calls, red ovals are data. The figure is annotated “Smooth!”.]

Without a common format, communication between diverse tools demands proprietary parsing, transformations, and “shims”. Using common data formats makes workflow construction and maintenance easier and faster. The two scenarios show what it takes to connect two tools (such as Web services) that use:

1. Proprietary formats
2. Common format
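The contrast between the two scenarios can be sketched in a few lines of Python. Everything here is hypothetical and chosen only for illustration: the tool functions, the FASTA-like and tab-separated formats, and the `Record` class are assumptions, not part of BioXSD or of any real Web service.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Record:
    """Hypothetical common record shared by both tools (not a BioXSD type)."""
    name: str
    sequence: str


# Scenario 1: proprietary formats.
# Tool A emits FASTA-like text; tool B expects "name<TAB>sequence" lines.
def tool_a_proprietary() -> str:
    return ">seq1\nMKTAYIAK\nQRQISFVK\n"


def shim_fasta_to_tsv(fasta: str) -> str:
    """The 'shim': glue code that exists only to bridge the two formats."""
    rows = []
    for block in fasta.split(">")[1:]:
        header, *lines = block.strip().splitlines()
        rows.append(f"{header}\t{''.join(lines)}")
    return "\n".join(rows)


def tool_b_proprietary(tsv: str) -> Dict[str, int]:
    lengths = {}
    for line in tsv.splitlines():
        name, seq = line.split("\t")
        lengths[name] = len(seq)
    return lengths


# Scenario 2: common format.
# Both tools exchange the same record type, so no shim is needed.
def tool_a_common() -> List[Record]:
    return [Record("seq1", "MKTAYIAKQRQISFVK")]


def tool_b_common(records: List[Record]) -> Dict[str, int]:
    return {r.name: len(r.sequence) for r in records}


with_shim = tool_b_proprietary(shim_fasta_to_tsv(tool_a_proprietary()))
direct = tool_b_common(tool_a_common())
```

Both pipelines compute the same result, but the first needs one piece of conversion code per pair of formats, while the second connects the tools directly.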

EXAMPLE DATA: BioXSD Sequence record

Basic example:

LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATA FMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLVEWIWGGFSVDKATLNRFFA FHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLGLLILILLL LLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLAL FLSIVILGLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTII GQMASILYFSIILAFLPIAGXIENY

Type diagram: [figure]
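To illustrate the idea that a schema-defined record behaves like a typed data object with validated input, here is a minimal Python sketch. The element names (`sequenceRecord`, `name`, `sequence`) and the allowed-character pattern are assumptions made for this example and are not taken from the BioXSD schema. The Python standard library has no XSD validator, so the hand-written checks below only stand in for what a schema-aware parser would enforce automatically.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Assumed alphabet constraint; the real schema defines its own restrictions.
SEQUENCE_PATTERN = re.compile(r"^[A-Z*\-]+$")


@dataclass
class SequenceRecord:
    """Hypothetical data object mirroring an XML sequence record."""
    name: str
    sequence: str


def parse_record(xml_text: str) -> SequenceRecord:
    """Parse a record and reject input that breaks the assumed constraints."""
    root = ET.fromstring(xml_text)
    if root.tag != "sequenceRecord":
        raise ValueError(f"unexpected root element: {root.tag}")
    name = root.findtext("name")
    sequence = root.findtext("sequence")
    if name is None or sequence is None:
        raise ValueError("missing <name> or <sequence>")
    sequence = "".join(sequence.split())  # drop whitespace from line wrapping
    if not SEQUENCE_PATTERN.match(sequence):
        raise ValueError("sequence contains characters outside the allowed alphabet")
    return SequenceRecord(name=name.strip(), sequence=sequence)


def to_xml(record: SequenceRecord) -> str:
    """Serialize the record back to XML with the same assumed element names."""
    root = ET.Element("sequenceRecord")
    ET.SubElement(root, "name").text = record.name
    ET.SubElement(root, "sequence").text = record.sequence
    return ET.tostring(root, encoding="unicode")


record = parse_record(
    "<sequenceRecord><name>example</name>"
    "<sequence>LCLYTHIGRN IYYGSYLYSE</sequence></sequenceRecord>"
)
round_trip = parse_record(to_xml(record))
```

With a real XSD and a schema-aware toolkit, both the class and the checks would typically be generated or enforced from the schema rather than written by hand.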
