Gayatri Chavali¹ and UniProt Consortium¹,²,³

¹ EMBL-European Bioinformatics Institute, Cambridge, UK
² SIB Swiss Institute of Bioinformatics, Geneva, Switzerland
³ Protein Information Resource, Georgetown University, Washington DC & University of Delaware, USA

Sequence Submissions to UniProtKB: The SPIN Interface

Overview

UniProtKB is a comprehensive, centralised resource for protein sequence and functional information. Sequences in UniProtKB are derived primarily from translations of coding sequences submitted to the International Nucleotide Sequence Database Collaboration (ENA, GenBank and DDBJ). Other sources include the wwPDB and gene predictions from Ensembl and RefSeq. Protein sequences obtained by direct sequencing methods are also incorporated into the database as submissions.

Direct Submissions

UniProt accepts depositions of direct protein/peptide sequences derived from techniques such as Edman degradation and MS/MS. Direct data submissions are accepted from the research community via SPIN, the interactive web-based submission tool:

http://www.ebi.ac.uk/swissprot/Submissions/spin/index.jsp

In addition to the primary sequence, submitters are asked to provide:
• Source Organism
• Source strain/tissue
• Citation details
• Experimental method used to obtain sequence
• Any relevant characterisation data
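
The information requested from submitters can be pictured as a simple record. The following is only an illustrative sketch for a submitter preparing their own data before filling in the web form: the class name `DirectSubmission`, its field names and the helper `missing_fields` are hypothetical and are not part of SPIN, and treating characterisation data as optional is an assumption based on the wording "any relevant".

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class DirectSubmission:
    """Hypothetical local record mirroring the items listed above."""
    sequence: str                    # primary sequence (one-letter amino acid codes)
    source_organism: str
    source_strain_or_tissue: str
    citation: str
    experimental_method: str         # e.g. "Edman degradation" or "MS/MS"
    characterisation_data: str = ""  # assumed optional ("any relevant")


def missing_fields(sub: DirectSubmission) -> List[str]:
    """Return the names of required fields that are still empty."""
    optional = {"characterisation_data"}
    return [
        f.name
        for f in fields(sub)
        if f.name not in optional and not getattr(sub, f.name).strip()
    ]
```

A record with an empty `citation` would be reported as `["citation"]`, which gives the submitter a quick completeness check before starting the interactive submission.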

SPIN submissions are annotated using data provided by the submitter coupled with results from sequence analysis tools and information propagated from homologous sequences already present in the database. Annotation is carried out by maintaining a dialogue with the submitter to ascertain the supporting evidence available for the deposited sequence. A unique accession number is assigned to each submitted sequence, which the submitter can use in subsequent publications.

Deposition Metrics

[Figure: Direct submissions over years (2004-2012). Series: number of unique sequences, number of unique submitters; y-axis 0 to 900.]

[Figure: Distribution of submissions on taxonomy. Categories: Mammalia, Amphibia, Aves, Reptiles, Insects, Molluscs, Plants, Fungi, Prokaryotes, Others.]

Bulk Submissions

Sequence submissions occasionally consist of more than 50 sequences. In such cases, submitters may provide the sequences in a bulk FASTA file. Improvements are currently ongoing to streamline the treatment of bulk submissions.

Sequence Release

Submitted data are either released or moved into a confidential holding area, depending on the release/hold instructions provided by the submitter. Entries marked for release are made available as part of the next UniProt release cycle.

Future Developments

SPIN is currently under re-development to provide an enhanced submission interface along with improvements in handling bulk submission data.
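
For submitters preparing a bulk FASTA file, a quick local check of the record count can be useful. The sketch below is a minimal illustration, not part of SPIN: the function names `read_fasta` and `is_bulk` are hypothetical, and the threshold of 50 is taken from the description of bulk submissions in this poster. It assumes the standard FASTA layout, in which each record starts with a `>` header line followed by one or more sequence lines.

```python
from typing import Iterator, List, Optional, Tuple

# Taken from the text: bulk submissions consist of more than 50 sequences.
BULK_THRESHOLD = 50


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence may wrap over several lines
    if header is not None:
        yield header, "".join(chunks)


def is_bulk(text: str) -> bool:
    """True if the file holds more sequences than the bulk threshold."""
    return sum(1 for _ in read_fasta(text)) > BULK_THRESHOLD
```

Reading the file with `open(path).read()` and passing the text to `is_bulk` indicates whether the set of sequences exceeds the 50-sequence figure mentioned for bulk submissions.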

Funding

UniProt is funded by the European Molecular Biology Laboratory, National Institutes of Health, European Union, Swiss Federal Government, British Heart Foundation and National Science Foundation.

Email: [email protected]
URL: www.uniprot.org