MIRIAM Resources: a robust annotation and cross-referencing framework
Camille Laibe
HUPO-PSI 2010 Spring Meeting, Seoul, Korea
EBI is an Outstation of the European Molecular Biology Laboratory.
ELIXIR: recommendations
Talk outline
– Background information
– BioModels.net
– MIRIAM Standard
– Annotations
– MIRIAM URNs
– MIRIAM Resources
– Examples of usage
– How could all this help you?
BioModels.net
Community aiming to improve model annotation, interpretation, exchange and reuse.
Not restricted to any format (e.g. SBML, CellML, …)
Current projects:
– A checklist for model annotation: MIRIAM, and its supporting infrastructure: MIRIAM Resources
– A set of relationships (qualifiers) to link model and data: BioModels.net qualifiers
– An ontology to make the semantics of models precise: SBO
– A public database of published models of biological interest: BioModels Database
Several ongoing efforts: MIASE, KiSAO, SED-ML, TEDDY, ...
http://biomodels.net/
Quantitative biochemical models
[Figure: biochemical model, mathematical model, computational model, simulation. Tyson et al (1991) PNAS 88(1):7328-32]
MIRIAM
The Minimum Information Required In the Annotation of a Model
http://biomodels.net/miriam/
MIRIAM Standard
Proposed guidelines for the curation and annotation of quantitative models:
– about encoding and annotation
– applicable to any structured model format
cf. Nicolas Le Novère et al. Minimum Information Requested in the Annotation of biochemical Models (MIRIAM). Nature Biotechnology, 2005
MIRIAM Compliance
Models must:
– be encoded in a public machine-readable format
– be clearly linked to a single publication
– reflect the structure of the biological processes described in the reference paper (list of reactions, ...)
– be instantiable in a simulation (possess initial conditions, ...)
– be able to reproduce the results given in the reference paper
– contain creator’s contact details
– be annotated: each model constituent must be unambiguously identified
Why are annotations important?
Annotations of model components are essential to:
– unambiguously identify model components
– improve understanding of the structure of the model
– allow easier comparison of different models
– ease the integration of models
– allow efficient search strategies
– add a semantic layer to the model
– improve understanding of the biology behind the model
– allow conversion and reuse of the model
– ease the integration of model and biological knowledge
→ True for any kind of data, not only models!
Why should annotations not be raw text?
EMBL bank version 45 (04-DEC-1995): /db_xref="PID:g984120"
EMBL bank version 47 (07-JUN-1996): /db_xref="PID:g984120" /db_xref="SWISS-PROT:P49581"
EMBL bank version 60 (03-SEP-1999): /db_xref="SWISS-PROT:P49581" /protein_id="CAA58766.1"
EMBL bank version 73 (30-NOV-2002): /db_xref="SWISS-PROT:P49581" /protein_id="CAA58766.1" /db_xref="GOA:P49581"
EMBL bank version 79 (08-JUN-2004): /db_xref="UniProt/Swiss-Prot:P49581" /protein_id="CAA58766.1" /db_xref="GOA:P49581"
EMBL bank version 84 (12-SEP-2005): /db_xref="UniProtKB/Swiss-Prot:P49581" /protein_id="CAA58766.1" /db_xref="GOA:P49581"
Not consistent!
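The inconsistency can be made concrete with a minimal sketch. It groups the free-text database labels from the EMBL releases listed on this slide by the accession they point to. The helper name `labels_by_accession` is purely illustrative and does not belong to any EMBL or MIRIAM tool.

```python
from collections import defaultdict

# db_xref values copied from the EMBL releases listed on the slide.
XREFS = [
    "PID:g984120",
    "SWISS-PROT:P49581",
    "GOA:P49581",
    "UniProt/Swiss-Prot:P49581",
    "UniProtKB/Swiss-Prot:P49581",
]


def labels_by_accession(xrefs):
    """Group the free-text database labels by the accession they refer to.

    Illustrative helper (an assumption, not an existing API).
    """
    grouped = defaultdict(set)
    for xref in xrefs:
        # Split on the first colon: label before it, accession after it.
        label, accession = xref.split(":", 1)
        grouped[accession].add(label)
    return dict(grouped)


groups = labels_by_accession(XREFS)

# Labels that name the Swiss-Prot database in one spelling or another.
swissprot_labels = {
    label for label in groups["P49581"] if "swiss" in label.lower()
}
print(sorted(swissprot_labels))
```

The same database ends up under three different spellings, so any tool comparing the raw label strings would treat these cross-references as pointing to unrelated resources.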
Why should annotations not be uncontrolled XML?
Extracted from a BioPAX model: CGD CSSM34
What is “CGD”? Ambiguous!
CGD is the official acronym for:
– Candida Genome Database
– Cattle Genome Database
– Comparative Genomics Database
– Chronic Granulomatous Disease
Hopefully, things are now changing (cf. Pathway Commons)...
Why should annotations not be uncontrolled URLs?
Minimum information requested in the annotation of biochemical models (MIRIAM)
PMID: 16333295
URLs (physical addresses):
– http://www.ebi.ac.uk/citexplore/citationDetails.do?dataSource=MED&externalId=16333295
– http://www.ncbi.nlm.nih.gov/pubmed/16333295
– http://www.hubmed.org/display.cgi?uids=16333295
– http://srs.ebi.ac.uk/srsbin/cgibin/wgetz?view+MedlineFull+[medlinePMID:16333295]
Not unique! Not perennial!
Characteristics of a useful identifier
– Unique: an identifier must never be assigned to two different objects;
– Perennial: the identifier is constant and its lifetime is permanent;
– Standards compliant: must conform to existing standards, such as URI;
– Resolvable: identifiers must be able to be transformed into locations of online resources storing the object or information about the object;
– Free of use: everybody should be able to use and create identifiers, freely and at no cost.
MIRIAM URN
[Diagram: a MIRIAM URN is built from two parts]
– Data type identifier: a URI. Not a URL, not a “Web address”!
– Dataset identifier: a text string. Format depends on the resource identified by the data type.
Examples:
– UniProt and P62158 (human calmodulin): urn:miriam:uniprot:P62158
– EC code and 1.1.1.1 (alcohol dehydrogenase): urn:miriam:ec-code:1.1.1.1
– Gene Ontology and GO:0000186 (activation of MAPKK activity): urn:miriam:obo.go:GO%3A0000186
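The three example URNs suggest a simple construction rule: the `urn:miriam:` prefix, the data type namespace, a colon, and the dataset identifier with reserved characters percent-encoded (the `:` in `GO:0000186` becomes `%3A`). The following is a minimal sketch of that rule, not an official MIRIAM library; the function names `make_miriam_urn` and `parse_miriam_urn` are hypothetical.

```python
from urllib.parse import quote, unquote

URN_PREFIX = "urn:miriam:"


def make_miriam_urn(data_type: str, identifier: str) -> str:
    """Build a MIRIAM URN from a data type namespace and a dataset identifier.

    Hypothetical helper; the encoding rule is inferred from the slide examples.
    """
    # Percent-encode every reserved character (such as ':') in the identifier.
    return f"{URN_PREFIX}{data_type}:{quote(identifier, safe='')}"


def parse_miriam_urn(urn: str) -> tuple[str, str]:
    """Split a MIRIAM URN into (data type namespace, dataset identifier)."""
    if not urn.startswith(URN_PREFIX):
        raise ValueError(f"not a MIRIAM URN: {urn!r}")
    # The first colon after the prefix separates namespace and identifier.
    data_type, _, encoded = urn[len(URN_PREFIX):].partition(":")
    if not data_type or not encoded:
        raise ValueError(f"incomplete MIRIAM URN: {urn!r}")
    return data_type, unquote(encoded)


print(make_miriam_urn("uniprot", "P62158"))
print(make_miriam_urn("ec-code", "1.1.1.1"))
print(make_miriam_urn("obo.go", "GO:0000186"))
print(parse_miriam_urn("urn:miriam:obo.go:GO%3A0000186"))
```

Keeping the identifier encoded means the URN can always be split on the first colon after the namespace, whatever characters the underlying database uses in its accessions.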
Qualification of the annotation
[Diagram labels: model element, annotation, qualifier, represents (twice), biological entity A, biological entity B, relationship]
Current BioModels.net qualifiers
bqmodel:is, bqmodel:isDerivedFrom, bqmodel:isDescribedBy
bqbiol:is, bqbiol:isDescribedBy, bqbiol:hasPart, bqbiol:hasProperty, bqbiol:isPartOf, bqbiol:isPropertyOf, bqbiol:isVersionOf, bqbiol:hasVersion, bqbiol:isHomologTo, bqbiol:encodes, bqbiol:isEncodedBy, bqbiol:occursIn, ...
http://biomodels.net/qualifiers/
SBML and MIRIAM
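As a rough illustration of how a qualified MIRIAM annotation can be attached to an SBML element, the sketch below builds the RDF fragment with Python's standard library. It assumes the RDF layout recommended by the SBML specification (an `rdf:Description` pointing at the element's `metaid`, a qualifier element, and an `rdf:Bag` of resources). The `metaid` value and the function name `annotation_rdf` are made up for the example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BQBIOL_NS = "http://biomodels.net/biology-qualifiers/"

# Ask ElementTree to serialise with the conventional prefixes.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("bqbiol", BQBIOL_NS)


def annotation_rdf(metaid, qualifier, urns):
    """Build an RDF element linking a model element to MIRIAM URNs.

    Hypothetical helper: `metaid` is the SBML metaid of the annotated
    element, `qualifier` a biology qualifier name such as "is".
    """
    rdf = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        rdf, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": f"#{metaid}"}
    )
    relation = ET.SubElement(description, f"{{{BQBIOL_NS}}}{qualifier}")
    bag = ET.SubElement(relation, f"{{{RDF_NS}}}Bag")
    for urn in urns:
        # One rdf:li per annotated resource.
        ET.SubElement(bag, f"{{{RDF_NS}}}li", {f"{{{RDF_NS}}}resource": urn})
    return rdf


# "metaid_calmodulin" is an invented metaid used only for this example.
rdf = annotation_rdf("metaid_calmodulin", "is", ["urn:miriam:uniprot:P62158"])
xml_text = ET.tostring(rdf, encoding="unicode")
print(xml_text)
```

In a real model this fragment would sit inside the `annotation` child of the annotated SBML element; libraries such as LibSBML offer their own API for this, which is not shown here.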
MIRIAM Resources
– browsing
– searching
– editing
– export (XML)
– Web Services
– documentation
http://www.ebi.ac.uk/miriam/ Open Source: available on SourceForge.net
Camille Laibe and Nicolas Le Novère. MIRIAM Resources: tools to generate and resolve robust cross-references in Systems Biology. BMC Systems Biology, 2007
MIRIAM Resources overview
MIRIAM Resources: http://www.ebi.ac.uk/miriam/
MIRIAM data types
MIRIAM data type
Resources
MIRIAM resources reliability
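Resolution, the step that turns a location-independent URN into one or more physical addresses, can be sketched as a lookup from data type to URL templates. The registry below is a hard-coded, hypothetical stand-in for MIRIAM Resources, filled with two of the PubMed URL patterns quoted earlier in the talk. The `pubmed` namespace, the `$id` placeholder and the `resolve` function are assumptions made for this example, not the actual Web Services API.

```python
from urllib.parse import unquote

URN_PREFIX = "urn:miriam:"

# Hypothetical local registry: data type namespace -> URL templates.
# "$id" marks where the dataset identifier is substituted.
REGISTRY = {
    "pubmed": [
        "http://www.ncbi.nlm.nih.gov/pubmed/$id",
        "http://www.hubmed.org/display.cgi?uids=$id",
    ],
}


def resolve(urn):
    """Return every known physical location for a MIRIAM URN."""
    if not urn.startswith(URN_PREFIX):
        raise ValueError(f"not a MIRIAM URN: {urn!r}")
    data_type, _, encoded_id = urn[len(URN_PREFIX):].partition(":")
    templates = REGISTRY.get(data_type)
    if templates is None:
        raise KeyError(f"unknown data type: {data_type!r}")
    # Decode the percent-encoded identifier before building the URLs.
    identifier = unquote(encoded_id)
    return [template.replace("$id", identifier) for template in templates]


urls = resolve("urn:miriam:pubmed:16333295")
for url in urls:
    print(url)
```

Because the annotation stores only the URN, a resource that moves or disappears is handled by updating the registry, and the annotated models themselves stay untouched.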
Applications: whole cell metabolic models
Yeast Metabolic Model
Herrgård M.J., Swainston N., Dobson P., Dunn W.B., Arga K.V., Arvas M., Blüthgen N., Borger S., Costenoble R., Heinemann M., Hucka M., Le Novère N., Li P., Liebermeister W., Mo M.L., Oliveira A.P., Petranovic D., Pettifer S., Simeonidis E., Smallbone K., Spasic I., Weichart D., Brent R., Broomhead D.S., Westerhoff H.V., Kırdar B., Penttilä M., Klipp E., Palsson B.Ø., Sauer U., Oliver S.G., Mendes P., Nielsen J., Kell D.B. (2008) A consensus yeast metabolic network reconstruction obtained from a community approach to systems biology. Nature Biotechnology, 26: 1155-1160.
2152 species, 1857 reactions, 4861 MIRIAM annotations
Human Metabolic Model
4889 species, 8866 reactions, 66968 MIRIAM annotations
Models clustering
Krause F, Liebermeister W (2009) A simple clustering of the BioModels database using semanticSBML. Nature Precedings, doi:10.1038/npre.2009.3444.1
Models comparison
http://www.semanticsbml.org/
Tools developing support for MIRIAM URNs
Data resources:
– BioModels Database (kinetic models)
– Pathway Commons (BioPAX)
– Physiome Model Repository (CellML)
– SABIO-RK (reaction kinetics)
– Yeast consensus model database
– Human consensus model database
– E-MeP (structural genomics)
Application software:
– ARCADIA (graph editor)
– BioUML (modelling and simulation)
– COPASI (simulation)
– LibAnnotationSBML
– LibSBML
– SAINT (semantic annotation)
– SBML2BioPAX
– SBML2LaTeX
– SBMLeditor (model editor)
– SemanticSBML (annotation, merging, comparison, ...)
– Snazer (network analysis, simulations)
– Systems Biology Workbench (model design and simulation)
– The Virtual Cell (simulation)
HUPO-PSI potential usage
IntAct (PSI-MI 2.5):
PRIDE: [...]
Possible cooperation?
Proteomics Standards Initiative Controlled Vocabularies
Released October, 2006
Last maintenance update, April 2009
http://psidev.info/index.php?q=node/159
“As common reference system for databases MIRIAM resource is recommended.”
Acknowledgements
Nicolas Le Novère Nick Juty Camille Laibe
Henning Hermjakob Samuel Kerrien Juan A Vizcaino Sandra Orchard Luisa Montecchi-Palazzi
– The community of computational systems biology, for the development of MIRIAM and the implementation of MIRIAM support
– Data providers who replied, discussed and even complied with MIRIAM rules
Requirements for a MIRIAM compliant data type
– Open access: anybody can access any public data without restriction (no commercial licence, no login page, ...)
– Atomicity: the granularity of the data distributed has to be appropriately selected (a database of “reactions” distributes reactions and not pathways) and consistent (e.g. classes or instances, but not classes AND instances)
– Identifier: each atomic data item is associated with a unique and perennial identifier
– Community recognition: the resource has to be “recognised” by the corresponding experimental community, be reasonably supported, ...