S6 Annotation transfer and detecting annotation error - PLOS
Protein Subfamily Identification, Supplemental
S6
Annotation transfer and detecting annotation error
To illustrate biological discovery using our classification system, we present examples in which clustering sequences into SCI-PHY subfamilies enables functional assignment and detection of errors in function prediction. By teasing apart subtypes within large protein families, our system enables both accurate function prediction and annotation error detection.

For example, the amidohydrolase family of enzymes is a highly diverse group whose members exhibit a wide range of functions. SCI-PHY is able to separate these functions into discrete subfamilies, which can be used for functional annotation. Table S5 shows one subfamily from the SCI-PHY partition of the PhyloFacts book d1gkpa2 (available at http://phylogenomics.berkeley.edu/book/book info.php?book=d1gkpa2), which contains homologs of the hydantoinase SCOP superfamily. Within this subfamily, protein P20051 has been annotated as a dihydroorotase using experimental evidence. The subfamily clustering allows us to transfer this function to the unannotated sequences in the subfamily, for precise and accurate function prediction.

When a protein family contains multiple functions, individual sequences are easily misannotated or annotated with only a very general function. We provide an example in Table S6, again from the amidohydrolase family. Six sequences (shown in red) within this subfamily of guanine deaminases have been annotated as chlorohydrolases or “cytosine/guanine deaminases.” Based on the experimental evidence for sequences in the subfamily, these sequences are most likely misannotated. In addition, this example again highlights the effectiveness of SCI-PHY in facilitating annotation transfer: fifteen sequences without annotations in their description (and only high-level GO annotation) can be specifically labeled as guanine deaminases.

Accession (Source): caa30444 (Genbank); P20051 (UniProt); zp 00018287 (Genbank); Q9PIN6 (UniProt); eaa31733 (Genbank); Q9UTI0 (UniProt); aam46078 (Genbank); P31301 (UniProt)
Description: unnamed protein product; Dihydroorotase (EC 3.5.2.3) (DHOase); hypothetical protein; Dihydroorotase (EC 3.5.2.3)
Species: Saccharomyces cerevisiae; Saccharomyces cerevisiae; Toxoplasma gondii; Ustilago maydis
GO function [evidence code]: dihydroorotase activity [IMP]; protein binding [IPI]; dihydroorotase activity [IEA]; hydrolase activity [IEA]
Table S5: The N1852 subfamily from the SCI-PHY partition of sequences in the PhyloFacts book d1gkpa2, a member of the hydantoinase SCOP superfamily. The evidence code for each GO annotation is given in brackets after the term. P20051 (blue) has been annotated as a dihydroorotase using experimental evidence, and this annotation can be transferred to the unannotated proteins within the subfamily (green).
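The transfer step described for Table S5 can be sketched in a few lines of Python. This is a minimal illustration, not the SCI-PHY or PhyloFacts implementation: it assumes subfamily membership has already been computed, and it treats a GO term as transferable only when some member carries it with an experimental evidence code. Only P20051 and its [IMP] annotation come from the text; `example_member_1`, `example_member_2`, and the names `Member` and `transfer_annotations` are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# GO evidence codes that denote direct experimental support.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


@dataclass
class Member:
    accession: str
    # Maps GO term -> evidence code, e.g. {"hydrolase activity": "IEA"}.
    go_terms: Dict[str, str] = field(default_factory=dict)


def experimental_terms(members: List[Member]) -> Set[str]:
    """Collect GO terms that at least one member carries with experimental evidence."""
    return {
        term
        for member in members
        for term, code in member.go_terms.items()
        if code in EXPERIMENTAL_CODES
    }


def transfer_annotations(members: List[Member]) -> Dict[str, Set[str]]:
    """Propose, per accession, the experimentally supported terms it does not yet carry."""
    supported = experimental_terms(members)
    proposals = {}
    for member in members:
        missing = supported - set(member.go_terms)
        if missing:
            proposals[member.accession] = missing
    return proposals


# Toy subfamily: P20051 is from the text; the other two records are placeholders.
subfamily = [
    Member("P20051", {"dihydroorotase activity": "IMP"}),
    Member("example_member_1", {"hydrolase activity": "IEA"}),
    Member("example_member_2"),
]

proposals = transfer_annotations(subfamily)
for accession, terms in sorted(proposals.items()):
    print(accession, "->", sorted(terms))
```

Electronically inferred terms ([IEA]) are never used as a source here, so a general annotation such as hydrolase activity is not propagated. Proposed terms would still need review before being recorded.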
Description: Chlorohydrolase family protein; Cytosine/guanine deaminase related protein; Cytosine/guanine deaminase related protein; Hypothetical protein; Similar to sp|Q07729 Saccharomyces cerevisiae YDL238c; Hypothetical protein; Atrazine chlorohydrolase/guanine deaminase; Chlorohydrolase family protein; Hypothetical protein; Hypothetical protein NCU07309.1; Similar to sp|Q07729 Saccharomyces cerevisiae YDL238c singleton; Probable guanine deaminase (EC 3.5.4.3); Similar to sp|Q07729 Saccharomyces cerevisiae YDL238c; CG18143-PA (LD44207p) (RE08243p); ENSANGP00000019574 (Fragment); Hypothetical protein; Atrazine chlorohydrolase/guanine deaminase; Hypothetical protein; Hypothetical protein; Guanine deaminase (EC 3.5.4.3); Hypothetical protein; Chromosome 12 SCAF14652, whole genome shotgun sequence. (Fragment); Hypothetical protein zgc:112282
Species: Enterococcus faecalis; Clostridium acetobutylicum
GO function [evidence code]: hydrolase activity [IEA]; hydrolase activity [IEA]
Table S6: SCI-PHY helps to identify potential annotation errors. A portion of one subfamily of guanine deaminases from the Pfam amidohydrolase family (Amidohydro_1) is shown. Accession numbers in blue have experimental evidence of guanine deaminase activity. Accessions shown in red have been annotated with different functions and are potential misannotations. Accessions shown in green have no annotation (or only a very general GO annotation) and can be labeled as guanine deaminases. A further 17 sequences within this subfamily were labeled as guanine deaminases; these are not shown.
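The three colour classes in Table S6 can be approximated with a simple description check once a subfamily's experimentally supported function is known. The sketch below is an assumed keyword heuristic for illustration only, not how the table was produced: the function names, the list of uninformative prefixes, and the label strings are all hypothetical choices, and the descriptions are a subset of those in the table.

```python
import re

# Description prefixes treated as carrying no functional information (an assumption).
UNINFORMATIVE_PREFIXES = ("hypothetical protein", "similar to", "unnamed protein product")


def normalize(description: str) -> str:
    """Lower-case, drop EC numbers and hedging qualifiers, collapse whitespace."""
    text = description.lower()
    text = re.sub(r"\(ec [\d.\-]+\)", "", text)
    text = re.sub(r"^(probable|putative)\s+", "", text.strip())
    return " ".join(text.split())


def classify(description: str, reference_function: str) -> str:
    """Compare one description against the subfamily's experimentally supported function."""
    norm = normalize(description)
    if not norm or norm.startswith(UNINFORMATIVE_PREFIXES):
        return "unannotated"
    if norm == reference_function.lower():
        return "consistent"
    return "potential misannotation"


descriptions = [
    "Guanine deaminase (EC 3.5.4.3)",
    "Probable guanine deaminase (EC 3.5.4.3)",
    "Chlorohydrolase family protein",
    "Cytosine/guanine deaminase related protein",
    "Atrazine chlorohydrolase/guanine deaminase",
    "Hypothetical protein NCU07309.1",
    "Similar to sp|Q07729 Saccharomyces cerevisiae YDL238c",
]

labels = {d: classify(d, "guanine deaminase") for d in descriptions}
for description, label in labels.items():
    print(f"{label:24s} {description}")
```

Exact matching after normalization is deliberately strict, so compound names such as "Atrazine chlorohydrolase/guanine deaminase" are flagged rather than accepted. Descriptions that consist only of identifiers (for example gene model names) would need additional patterns to be recognised as unannotated.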