Sequence Submissions to UniProtKB: The SPIN Interface
Gayatri Chavali¹ and UniProt Consortium¹,²,³
¹ EMBL-European Bioinformatics Institute, Cambridge, UK
² SIB Swiss Institute of Bioinformatics, Geneva, Switzerland
³ Protein Information Resource, Georgetown University, Washington DC & University of Delaware, USA
Overview
UniProtKB is a comprehensive, centralised resource for protein sequence and functional information. Sequences in UniProtKB are derived primarily from translations of coding sequences submitted to the International Nucleotide Sequence Database Collaboration of ENA, GenBank and DDBJ. Other sources include the wwPDB and gene predictions from Ensembl and RefSeq. Protein sequences are also incorporated into the database as submissions from direct sequencing methods.
Direct Submissions
UniProt accepts depositions of direct protein/peptide sequences derived from techniques such as Edman degradation and MS/MS. Direct data submissions are accepted from the research community via SPIN, the interactive web-based submission tool.
http://www.ebi.ac.uk/swissprot/Submissions/spin/index.jsp
In addition to the primary sequence, submitters are asked to provide:
• Source Organism
• Source strain/tissue
• Citation details
• Experimental method used to obtain sequence
• Any relevant characterisation data
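The items requested from submitters can be pictured as a simple record with a completeness check. The following is a minimal sketch only: the class `DirectSubmission`, the function `check_submission`, the field names and the accepted residue alphabet are all hypothetical and do not describe how SPIN itself stores or validates data.

```python
from dataclasses import dataclass, fields
from typing import List

# Standard one-letter amino acid codes plus X for an unknown residue.
# Assumption: the real SPIN tool may accept a different alphabet.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY" + "X")


@dataclass
class DirectSubmission:
    """Hypothetical record mirroring the items a submitter is asked for."""
    sequence: str
    source_organism: str
    source_strain_or_tissue: str
    citation: str
    experimental_method: str          # e.g. "Edman degradation" or "MS/MS"
    characterisation_data: str = ""   # optional: "any relevant" data


def check_submission(sub: DirectSubmission) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    optional = {"characterisation_data"}
    for f in fields(sub):
        # Every non-optional field must contain some non-blank text.
        if f.name not in optional and not getattr(sub, f.name).strip():
            problems.append(f"missing {f.name}")
    # Report any characters outside the assumed residue alphabet.
    bad = sorted(set(sub.sequence.upper()) - VALID_RESIDUES)
    if bad:
        problems.append("unexpected residue codes: " + "".join(bad))
    return problems


# Usage with made-up values:
example = DirectSubmission(
    sequence="MKTAYIAK",
    source_organism="Homo sapiens",
    source_strain_or_tissue="liver",
    citation="(placeholder citation)",
    experimental_method="Edman degradation",
)
print(check_submission(example))
```

Returning a list of problems rather than raising on the first one lets a submitter see everything that needs attention in a single pass.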
SPIN submissions are annotated using data provided by the submitter coupled with results from sequence analysis tools and information propagated from homologous sequences already present in the database. Annotation is carried out by maintaining a dialogue with the submitter to ascertain the supporting evidence available for the deposited sequence. A unique accession number is assigned to each submitted sequence, which can be used by the submitter in subsequent publications.
Deposition Metrics
[Figure: Direct submissions over years (2004-2012). Series: number of unique sequences; number of unique submitters.]
[Figure: Distribution of submissions on taxonomy. Categories: Mammalia, Amphibia, Aves, Reptiles, Insects, Molluscs, Plants, Fungi, Prokaryotes, Others.]
Bulk Submissions
Sequence submissions occasionally consist of more than 50 sequences. In such cases, submitters may provide the sequences in a bulk FASTA file. Improvements are currently ongoing to streamline the treatment of bulk submissions.
Sequence Release
Submitted data are either released or moved into a confidential holding area depending on the release/hold instructions provided by the submitter. Entries marked for release are made available as part of the next UniProt release cycle.
Future Developments
SPIN is currently under re-development to provide an enhanced submission interface along with improvements in handling bulk submission data.
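To illustrate what handling a bulk FASTA file might involve, here is a minimal sketch that reads FASTA-formatted lines and checks whether the set exceeds the 50-sequence figure mentioned under Bulk Submissions. The names `read_fasta`, `is_bulk` and `BULK_THRESHOLD` are hypothetical, and the duplicate-header check is an assumption made for this example, not a documented SPIN rule.

```python
from typing import Dict, Iterable

BULK_THRESHOLD = 50  # from the text: "more than 50 sequences"


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {header: sequence} mapping."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            # Assumption: headers must be unique within one file.
            if header in records:
                raise ValueError(f"duplicate header: {header}")
            records[header] = ""
        elif header is None:
            raise ValueError("sequence data before first '>' header")
        else:
            # Sequence lines may be wrapped; join them per record.
            records[header] += line
    return records


def is_bulk(records: Dict[str, str]) -> bool:
    """True when the set is larger than the threshold stated in the text."""
    return len(records) > BULK_THRESHOLD


# Usage with made-up peptides:
sample = [">pep1 example", "MKT", "AYI", ">pep2", "GGA"]
parsed = read_fasta(sample)
print(len(parsed), is_bulk(parsed))
```

Because `read_fasta` accepts any iterable of lines, an open file handle can be passed directly without first loading the whole file into memory.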
Funding
UniProt is funded by the European Molecular Biology Laboratory, National Institutes of Health, European Union, Swiss Federal Government, British Heart Foundation and National Science Foundation.
Email: [email protected]
URL: www.uniprot.org