Databases and Bioinformatics Tools for Rice Research

Accepted Manuscript
Title: Databases and Bioinformatics Tools for Rice Research
Authors: Priyanka Garg, Pankaj Jaiswal
PII: S2214-6628(16)30081-0
DOI: http://dx.doi.org/10.1016/j.cpb.2016.12.006
Reference: CPB 46

To appear in: Current Plant Biology

Please cite this article as: Priyanka Garg, Pankaj Jaiswal, Databases and Bioinformatics Tools for Rice Research, Current Plant Biology http://dx.doi.org/10.1016/j.cpb.2016.12.006

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Title: Databases and Bioinformatics Tools for Rice Research
Authors: Priyanka Garg¹ and Pankaj Jaiswal¹*
Affiliation: ¹ 2082 Cordley Hall, Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331, USA

*Corresponding Author

Abstract

Rice is one of the world's most important agricultural crops and a widely studied model plant. The completion of the whole-genome sequence of rice (Oryza sativa) and high-throughput experimental platforms have led to the generation of a tremendous amount of data and to the development of specialized databases and bioinformatics tools for data processing, efficient organization, analysis, and visualization. In this article, we discuss a collection of biological databases that host genomics data on sequences, gene expression, genotyping, gene interactomes, and pathways, and that facilitate data analysis and visualization.

Keywords: Biological database, Rice, Gene expression, Biocuration, Ontology, Pathways

Introduction

Over the last decades, an increasing number of genome-scale experimental datasets have become available, and several online, open-source biology databases have emerged. For instance, ~1685 publicly available online databases are currently listed in the NAR online Molecular Biology Database Collection [1]. These databases can be categorized on the basis of data type, data curation method, scope of data coverage, and accessibility. Many such publicly funded resources host data (raw, annotated, and analyzed) for various species, including crops and model and non-model plants, whereas others are dedicated to a group of species from a taxonomic clade and may contain only certain types of data. In addition, an array of tools and web applications is available to facilitate the formatting, analysis, and visualization of various types of genomic data.

Data coverage determines the target user community of a database. Large-scale public repositories or international archives, usually developed and maintained by national and international projects, provide genomic data from many species. Table 1 lists some of these generic large-scale public repositories, archives, and databases, for example, GenBank [2], EMBL [3], INSDC [4], and DDBJ [5] for sequences and annotation, PDB [6] for protein structures, and UniProt [7] for protein information. These are long-term, sustainable repositories for archiving valuable data from many organisms, including rice. In contrast, community-specific databases cater to the needs of a specific research community; examples for plants include PlantGDB [8], Ensembl Plants [9], Gramene [10], PLEXdb [11], Gene Expression Atlas [12], and Planteome [13]. Species-specific databases for model and non-model organisms, for example, RAP-DB [14], the Beijing Genomics Institute-Rice Information System [15], and the Rice SNP-Seek Database [16, 17], provide in-depth coverage of datasets and are tuned to the needs of a smaller, specialized community of researchers. Furthermore, each database can be assigned to one or more categories on the basis of its content, for example, gene expression, molecular interaction, genome annotation, nucleotide or protein sequence, small RNA, genomic variation, phenome, and pathway databases.

Rice is an important crop and serves as a model for monocotyledonous plants. Recent advances in rice genome biology have generated a tremendous amount of data, including fully sequenced, high-quality reference genomes [9, 10]; low-coverage sequencing data, with an average sequencing depth of 14x, from 3010 accessions of the rice germplasm core collection [16-19]; and genetic variation, transcriptome, proteome, and metabolome data. These data necessitate the development of bioinformatics resources and databases for the storage, processing, organization, analysis, and visualization of such data at the systems level.

For the benefit of rice researchers, we provide a comprehensive list of such generic and specialized genomic databases, resources, web applications, and analysis tools (Tables 1 and 2). Some of this information is also useful to plant researchers who are not engaged in rice research. We may have missed some resources, and we expect this list to grow in the future.

1. Resources for genes, genomes, and genetic variation

The MSU Rice Genome Annotation Project [20, 21], the International Rice Genome Sequencing Project's (IRGSP) RAP-DB [14, 22], and the Oryza Genome Evolution (OGE) project (http://oge.gramene.org) are the primary rice-specific resources that provide the rice genome assembly, gene annotation, genome browsers, motif/domain assignments for genes and proteins, information on repetitive DNA sequences, and gene expression data. Gramene uses the annotated genes and genome assembly of O. sativa ssp. japonica cv. Nipponbare from IRGSP [14, 22], as well as those of O. sativa ssp. indica cv. 93-11 and several wild Oryza species sequenced by the OGE and the International Oryza Map Alignment Project (I-OMAP) (http://oge.gramene.org) [23-26]. These genomes are presented in an integrated web resource for rice that includes species-specific genome browsers, whole-genome alignments, synteny, genetic and physical maps with genes, gene trees, ESTs and QTL locations, and genetic diversity data, including SNPs from the 3,000 Rice Genomes Project and their consequences for gene function and structure. The resource also offers descriptions of phenotypic traits and plant pathway databases, such as RiceCyc [27], developed on the BioCyc platform, and the Reactome-based Plant Reactome, which provides pathways for the reference rice and gene homology-based projections to all Oryza species with a sequenced genome or transcriptome [28]. The features and data available at the Gramene genome portal are similar to those of its collaborator Ensembl Plants [9].
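The SNP consequences mentioned above describe how a variant changes a gene product. As a simplified illustration, the Python sketch below classifies a single-nucleotide substitution inside a coding sequence using the standard genetic code. It is not the algorithm of any tool named in this article: it assumes the coding sequence (CDS) is given 5' to 3' on the coding strand and starts in frame at its first base, and it ignores splice sites, UTRs, and overlapping transcripts. The labels are Sequence Ontology consequence terms; the toy CDS is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons are enumerated in
# TCAG order: TTT, TTC, TTA, TTG, TCT, ...  "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def snv_consequence(cds: str, pos: int, alt: str) -> str:
    """Classify a substitution at 0-based position `pos` of an in-frame CDS.

    Simplified sketch: no splice sites, UTRs, or strand handling.
    """
    cds, alt = cds.upper(), alt.upper()
    offset = pos % 3                  # position of the change within its codon
    start = pos - offset              # first base of the affected codon
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "stop_retained_variant" if ref_aa == "*" else "synonymous_variant"
    if alt_aa == "*":
        return "stop_gained"
    if ref_aa == "*":
        return "stop_lost"
    return "missense_variant"


# Hypothetical toy CDS: ATG GCT TGG TAA encodes Met-Ala-Trp-stop.
toy_cds = "ATGGCTTGGTAA"
for pos, alt in [(5, "C"), (3, "A"), (8, "A"), (9, "C")]:
    print(pos, alt, snv_consequence(toy_cds, pos, alt))
```

Building the codon table from the compact 64-letter string avoids typing 64 dictionary entries by hand and keeps the code-to-amino-acid order verifiable against the published table.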

In addition to Gramene, a number of databases provide genetic variation data (SNPs and indels), including PmiRKB [29], the Rice Variation Database [30], RiceVarMap [31], the SNP haplotype database HapRice [32], and the Rice SNP-Seek Database [16, 17]. The largest dataset of rice genetic variants (~29 million SNPs) comes from the 3,000 Rice Genomes Project and is now hosted by IRIC in the Rice SNP-Seek Database, together with phenotype and variety (passport) data [16, 17].
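Variant data are commonly exchanged in the Variant Call Format (VCF). The Python sketch below, which assumes an uncompressed VCF text stream and uses hypothetical toy records, shows how SNPs and indels can be told apart from the REF and ALT columns. It is only an illustration and does not reproduce how any of the databases above classify variants; real files are usually bgzip-compressed and are better handled with a dedicated VCF library.

```python
import io
from collections import Counter


def classify_allele(ref: str, alt: str) -> str:
    """Classify one REF/ALT pair from a VCF record."""
    if alt.startswith("<") or alt in {".", "*"}:
        return "other"          # symbolic, missing, or spanning-deletion allele
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) != len(alt):
        return "indel"
    return "MNP"                # equal-length multi-base substitution


def count_variant_types(vcf_handle) -> Counter:
    """Count allele types in an uncompressed VCF text stream."""
    counts = Counter()
    for line in vcf_handle:
        if line.startswith("#") or not line.strip():
            continue            # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4].split(",")
        for alt in alts:        # multi-allelic sites: one count per ALT allele
            counts[classify_allele(ref, alt)] += 1
    return counts


# Hypothetical toy records; chromosome names and positions are illustrative.
TOY_VCF = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr01\t1000\t.\tA\tG\t50\tPASS\t.\n"
    "chr01\t2000\t.\tAT\tA\t50\tPASS\t.\n"
    "chr01\t3000\t.\tC\tCTT,T\t50\tPASS\t.\n"
)
print(count_variant_types(io.StringIO(TOY_VCF)))
```

Counting per ALT allele rather than per line matters for multi-allelic sites such as the third record, which contributes one indel and one SNP.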

2. Resources for gene expression datasets, gene-interactomes, pathways and ontologies

Gene expression databases play a vital role in extracting, organizing, and interpreting information on the expression profiles of genes and genomes at specific developmental stages or in response to particular treatments, helping to connect the genotype and phenotype of an organism. Transcriptomic data can be obtained using various experimental platforms, such as real-time PCR, microarrays, and RNA-seq. DIURNAL [33], GENEVESTIGATOR [34], and the EMBL-EBI Gene Expression Atlas [12] are popular resources that host gene expression data. The EMBL-EBI Gene Expression Atlas provides information on the baseline expression of a gene or set of genes in a given sample from RNA-seq experiments, as well as differential expression data from both RNA-seq and microarray experiments. This resource also updates its expression data frequently by aligning them against the most recent genome assembly and annotation available from the Ensembl Plants and Gramene databases. Rice-specific expression data are also hosted by resources such as IC4R [30], RiceFREND [35], and RiceXPro [36].
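To make the distinction between expression levels and differential expression concrete, the Python sketch below normalizes raw RNA-seq read counts to counts per million (CPM) and computes a log2 fold change between two conditions. The gene names, sample counts, and pseudocount are hypothetical, and this is not the pipeline of any resource named above: differential expression analyses normally also test significance across replicates, which this sketch omits.

```python
import math
from statistics import mean


def cpm(sample_counts: dict) -> dict:
    """Convert raw read counts for one sample to counts per million."""
    total = sum(sample_counts.values())
    return {gene: count * 1e6 / total for gene, count in sample_counts.items()}


def log2_fold_change(gene, treated, control, pseudocount=1.0):
    """log2 ratio of mean CPM (treated vs control); the pseudocount avoids log(0)."""
    treated_mean = mean(cpm(sample)[gene] for sample in treated)
    control_mean = mean(cpm(sample)[gene] for sample in control)
    return math.log2((treated_mean + pseudocount) / (control_mean + pseudocount))


# Hypothetical counts for two replicates per condition (each library = 1e6 reads).
control = [{"geneA": 200, "geneB": 999_800}, {"geneA": 310, "geneB": 999_690}]
treated = [{"geneA": 1000, "geneB": 999_000}, {"geneA": 1046, "geneB": 998_954}]
print(log2_fold_change("geneA", treated, control))
```

Normalizing to CPM first keeps samples sequenced to different depths comparable; here every library has the same size, so the mean CPM of geneA rises from 255 to 1023 and the fold change is 2 on the log2 scale.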

The MCDRP [37] resource hosts manually curated annotations of rice proteins based on published datasets. The EXPath [38] database provides information on metabolic pathways for several plant species based on analysis of microarray-based gene expression data. Other databases, such as RiceCyc [27] and OryzaCyc (http://www.plantcyc.org/databases/oryzacyc/), host metabolic pathway networks for rice, whereas KEGG rice (http://www.genome.jp/kegg-bin/show_organism?menu_type=pathway_maps&org=dosa) and the Plant Reactome [28] host pathway databases for metabolic, regulatory, developmental, and signaling pathway analysis.

Table 1 also includes a number of publicly available molecular interaction databases. CoP [39] provides information on co-expressed genes based on transcriptome analysis of eight plant species, including rice. DIPOS [40] is a database of interacting proteins in Oryza sativa, while PRIN [41] is a predicted rice interactome based on interologs from six model organisms for which large-scale protein-protein interaction experiments have been performed. RiceNet v2 [42] provides an improved network prioritization server for O. sativa ssp. japonica genes.
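Co-expression resources such as CoP rest on the idea that genes with similar expression profiles across many samples may be functionally related. A minimal Python sketch of that idea, assuming a small hypothetical table of normalized expression values, computes the Pearson correlation for every gene pair and keeps pairs above a threshold. Published resources may use other similarity measures, normalizations, and ranking schemes, so this is illustrative only.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return float("nan")   # correlation is undefined for a constant profile
    return cov / (sd_x * sd_y)


def coexpressed_pairs(profiles, threshold=0.9):
    """Return (gene1, gene2, r) for gene pairs with r >= threshold."""
    pairs = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if r >= threshold:    # comparisons with NaN are False, so constant genes drop out
            pairs.append((g1, g2, round(r, 3)))
    return pairs


# Hypothetical normalized expression values across five samples.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],
    "geneC": [5.0, 4.0, 3.0, 2.0, 1.0],
}
print(coexpressed_pairs(profiles))
```

Only geneA and geneB pass the threshold; geneC is perfectly anti-correlated with geneA, which a signed threshold like this one deliberately excludes.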

The use of ontologies such as the Gene Ontology (GO) [43] and the Plant Ontology (PO) [44] is instrumental in ensuring high quality and consistency in the annotation of genes and gene products for molecular function, role in biological processes, and location in cellular components (the three branches of GO). PO-based annotations associate samples used in expression studies and phenotype observations with plant structures and with plant growth and developmental stages, the two branches of PO. The Plant Trait Ontology (TO) and ontology-based entity-quality (EQ) phenotype annotations are being promoted for mutant phenotype and QTL trait annotations [45-48]. These annotations are often provided by individual species-specific databases. Planteome (www.planteome.org) [49] is a new database that provides a reference set of ontologies for integration into genome, expression, and phenome projects. It also provides a collection of ontology-based annotations for about 85 plant species, including the reference rice subspecies O. sativa ssp. japonica and indica and wild Oryza species.
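Ontology-based annotations are usually interpreted with the "true path rule": a gene annotated to a term is implicitly annotated to all of that term's ancestors. The Python sketch below propagates direct annotations up a small hypothetical hierarchy. The term names only loosely echo plant anatomy and are not taken from the actual PO or GO graphs, and real ontologies distinguish relation types (such as is_a and part_of) that this sketch treats uniformly.

```python
from collections import defaultdict

# Hypothetical toy hierarchy (not the real PO graph): term -> set of parent terms.
PARENTS = {
    "lateral root": {"root"},
    "root": {"plant organ"},
    "leaf": {"plant organ"},
    "plant organ": {"plant structure"},
}


def ancestors(term, parents):
    """All terms reachable from `term` by following parent links."""
    seen, stack = set(), [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate(direct_annotations, parents):
    """Map each term to the genes annotated to it directly or via a descendant."""
    term_to_genes = defaultdict(set)
    for gene, terms in direct_annotations.items():
        for term in terms:
            for t in {term} | ancestors(term, parents):
                term_to_genes[t].add(gene)
    return term_to_genes


# Hypothetical direct annotations for two genes.
annotations = {"geneX": {"lateral root"}, "geneY": {"leaf"}}
for term, genes in sorted(propagate(annotations, PARENTS).items()):
    print(term, sorted(genes))
```

The `seen` set makes the traversal safe for ontologies where a term is reachable through several paths, which is the normal case in a directed acyclic graph.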

3. Web applications and bioinformatics analysis tools

Web-based tools provide an excellent platform for analyzing huge datasets, thus enabling data-driven discoveries. For example, CyVerse [50] and Galaxy [51, 52] are two widely used cyberinfrastructure platforms for data storage, dissemination, and high-throughput bioinformatics analysis. They provide access to popular bioinformatics tools and computing infrastructure, as well as options for running users' own tools. CoGe [53] is another “platform for performing Comparative Genomics research. It provides an open-ended network of interconnected tools to manage, analyze, and visualize next-gen sequencing data” (Source: https://genomevolution.org/coge/). Besides these infrastructure platforms, many of the databases listed in Tables 1 and 2 provide their own data analysis tools, for example, the Variant Effect Predictor (VEP) [54] provided by Ensembl and Gramene, and the pathway enrichment and comparison tools provided by Plant Reactome [28].
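Pathway enrichment tools ask whether a user's gene list overlaps a pathway more than expected by chance. A common way to quantify this is a one-sided hypergeometric test; the Python sketch below implements that test with hypothetical gene and pathway names. It is not Plant Reactome's implementation, and a real analysis would also correct the p-values for multiple testing (for example, with the Benjamini-Hochberg procedure).

```python
from math import comb


def hypergeom_sf(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric: N background genes, K in the
    pathway, n in the query list, k = observed overlap."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def pathway_enrichment(query, pathways, background):
    """Return (pathway, overlap, p-value) tuples sorted by p-value."""
    background = set(background)
    query = set(query) & background
    results = []
    for name, members in pathways.items():
        members = set(members) & background
        k = len(query & members)
        p = hypergeom_sf(k, len(query), len(members), len(background))
        results.append((name, k, p))
    return sorted(results, key=lambda r: r[2])


# Hypothetical background of 20 genes and two pathways.
background = [f"gene{i}" for i in range(1, 21)]
pathways = {
    "pathway_A": background[0:5],    # gene1..gene5
    "pathway_B": background[5:15],   # gene6..gene15
}
query = ["gene1", "gene2", "gene3", "gene4"]
for name, overlap, p in pathway_enrichment(query, pathways, background):
    print(f"{name}\toverlap={overlap}\tp={p:.3g}")
```

Restricting both the query and the pathways to the background set keeps the test's assumptions consistent; an overlap of 4 out of 4 with a 5-gene pathway in a 20-gene background gives p = 5/4845.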


4. Findability, accessibility, and interoperability of genomic data and resources

The primary aim of many plant genomic databases has been to acquire, organize, and represent information and to help users navigate and retrieve the information of interest. Depending on how their data are collected, biological databases are classified as primary, secondary, or both. Primary databases directly archive experimental results, for example, the NCBI Sequence Read Archive (SRA) [55], INSDC and its collaborating archives [4], and RAP-DB [22]. Secondary databases are populated with processed and analyzed data derived from primary datasets, for example, NCBI RefSeq [56] and the EMBL-EBI Gene Expression Atlas [12]. Resources such as Gramene [10, 57], Plant Ontology [44, 58], Planteome [13], Ensembl Plants [9], and Plant Reactome [28] do both: they collect data from external sources and add their own quality annotations, and they also serve as community access points for primary data that have no known archive.

Almost all resources provide some form of biocuration of the data and metadata required for integration and proper interpretation of the experiments from which datasets originate. The collection methods, annotation, and statistical significance of these primary data affect the credibility and reliability of a resource. Data may be accessed physically, by retrieving files from a known location, or live, through application programming interfaces (APIs) and semantic web technologies.
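Live access through an API usually means sending an HTTP request with query parameters and parsing a structured response, often JSON. The Python sketch below shows only the two local steps, building the query URL and parsing a response, so that it runs without network access. The endpoint, parameter names, and response layout are hypothetical; each real resource documents its own API.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint; not the URL of any real rice database API.
BASE_URL = "https://api.example.org/rice/genes"


def build_query_url(base_url, **params):
    """Encode query parameters (sorted for reproducibility) into a URL."""
    return f"{base_url}?{urlencode(sorted(params.items()))}"


def parse_gene_records(json_text):
    """Extract (id, description) pairs from a hypothetical JSON response."""
    payload = json.loads(json_text)
    return [(rec["id"], rec.get("description", "")) for rec in payload["genes"]]


url = build_query_url(BASE_URL, species="oryza_sativa", q="kinase")
# A live client would fetch `url`, for example with urllib.request.urlopen(url).
# Here a hypothetical response body stands in for the server's reply.
sample_response = (
    '{"genes": [{"id": "gene_001", "description": "protein kinase"},'
    ' {"id": "gene_002"}]}'
)
print(url)
print(parse_gene_records(sample_response))
```

Keeping URL construction and response parsing separate from the network call makes each step testable offline and easy to adapt to a real API's documented parameters.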

Biological databases differ greatly in their curation practices depending on the data type, such as genome assembly, gene calls, functional annotation, gene expression, QTL and phenotypes, genetic markers, genetic diversity data in the form of SNPs, SSRs, and indels, genotyping, and germplasm collections. Curation may be fully automated, semi-automated, purely manual, or a combination of these. Therefore, accessing and comparing datasets across two or more resources poses a major challenge. Data accessibility is further compromised by restrictions on data download. Owing to the emerging need for data integration, reuse, and re-analysis, ontology-based semantic concepts and application programming interfaces (APIs) are being employed. Thus, in a new trend, the majority of biological databases are moving toward adopting the Findable, Accessible, Interoperable and Re-usable (FAIR) data principles [59-61].
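Comparing datasets across resources often starts with reconciling identifiers; for rice, MSU gene models (identifiers of the form LOC_OsXXgXXXXX) and RAP-DB gene models (OsXXgXXXXXXX) use different naming schemes. The Python sketch below joins values keyed by one identifier system to annotations keyed by the other through a mapping table, keeping one-to-many mappings and reporting unmapped identifiers. All identifiers and values are hypothetical placeholders in the style of these schemes, not real gene mappings.

```python
def integrate(values_by_msu, annotation_by_rap, msu_to_rap):
    """Join MSU-keyed values to RAP-keyed annotations via a mapping table.

    Returns (merged, unmapped): merged is keyed by RAP ID; unmapped lists MSU
    IDs with no entry in the mapping table. If two MSU IDs map to the same RAP
    ID, the later one overwrites the earlier; a real pipeline should handle
    such many-to-one cases explicitly.
    """
    merged, unmapped = {}, []
    for msu_id, value in values_by_msu.items():
        rap_ids = msu_to_rap.get(msu_id, [])
        if not rap_ids:
            unmapped.append(msu_id)
            continue
        for rap_id in rap_ids:   # one MSU model may correspond to several RAP loci
            merged[rap_id] = {
                "msu_id": msu_id,
                "value": value,
                "annotation": annotation_by_rap.get(rap_id, "NA"),
            }
    return merged, unmapped


# Hypothetical placeholder identifiers and values, not real rice gene mappings.
values_by_msu = {"LOC_Os01g00010": 12.5, "LOC_Os01g00020": 3.0, "LOC_Os01g00030": 7.1}
msu_to_rap = {
    "LOC_Os01g00010": ["Os01g0000100"],
    "LOC_Os01g00020": ["Os01g0000200", "Os01g0000300"],
}
annotation_by_rap = {"Os01g0000100": "example kinase", "Os01g0000200": "example transporter"}

merged, unmapped = integrate(values_by_msu, annotation_by_rap, msu_to_rap)
print(sorted(merged))
print(unmapped)
```

Reporting unmapped identifiers instead of silently dropping them makes it visible how much of a dataset is lost when moving between annotation systems.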

Conclusion

In this article, we have attempted to catalog the web data resources available for rice. Some are well known and widely used, while others are new, small-scale repositories. With the increasing number of repositories, it is evident that an enormous amount of data is available on the web, covering almost every aspect of rice research. Despite the size and diversity of these data, they have not been explored efficiently, as many researchers and prospective users in biology are unfamiliar with the full range of resources for searching and analyzing them. Different databases often use different formats and protocols for data exchange, which makes it difficult to integrate them in one place. Ideally, there should be a single platform for all the databases in a given domain of interest, where a user can search all of them with a single query using APIs and ontologies and compare the results. Araport [62] and Gramene [10] are examples of such platforms. Some databases already link to other databases of similar data types to increase the credibility of their data, which is the first step toward a unified platform. Such integration maximizes the use of data in current resources and may help avoid redundancy. It also gives small databases better visibility and can collectively provide a bigger picture, since small databases generally focus on one specific aspect and present detailed information.

Acknowledgements

PJ greatly appreciates the funding provided by the NSF-funded Gramene project (NSF IOS 1127112) and the Planteome project (NSF IOS 1340112), which supported this work and PG. PG and PJ wrote the manuscript. The funding agencies had no role in the study design, data analysis, or preparation of the manuscript.

References

[1] Rigden, D.J., X.M. Fernandez-Suarez, and M.Y. Galperin, The 2016 database issue of Nucleic Acids Research and an updated molecular biology database collection. Nucleic Acids Res, 2016. 44(D1): p. D1-6.
[2] Benson, D.A., et al., GenBank. Nucleic Acids Res, 2013. 41(Database issue): p. D36-42.
[3] Aken, B.L., et al., The Ensembl gene annotation system. Database (Oxford), 2016. 2016.
[4] Cochrane, G., et al., The International Nucleotide Sequence Database Collaboration. Nucleic Acids Res, 2016. 44(D1): p. D48-50.
[5] Mashima, J., et al., DNA data bank of Japan (DDBJ) progress report. Nucleic Acids Res, 2016. 44(D1): p. D51-7.
[6] Berman, H.M., et al., The Protein Data Bank. Nucleic Acids Res, 2000. 28(1): p. 235-42.
[7] Boutet, E., et al., UniProtKB/Swiss-Prot. Methods Mol Biol, 2007. 406: p. 89-112.
[8] Dong, Q., S.D. Schlueter, and V. Brendel, PlantGDB, plant genome database and analysis tools. Nucleic Acids Res, 2004. 32(Database issue): p. D354-9.
[9] Bolser, D., et al., Ensembl Plants: Integrating Tools for Visualizing, Mining, and Analyzing Plant Genomics Data. Methods Mol Biol, 2016. 1374: p. 115-40.
[10] Tello-Ruiz, M.K., et al., Gramene 2016: comparative plant genomics and pathway resources. Nucleic Acids Res, 2016. 44(D1): p. D1133-40.
[11] Dash, S., et al., PLEXdb: gene expression resources for plants and plant pathogens. Nucleic Acids Res, 2012. 40(Database issue): p. D1194-201.
[12] Petryszak, R., et al., Expression Atlas update--an integrated database of gene and protein expression in humans, animals and plants. Nucleic Acids Res, 2016. 44(D1): p. D746-52.
[13] Cooper, L., et al., The Planteome Project. In: International Conference on Biomedical Ontology and BioCreative (ICBO 2016). 2016. Oregon State University, Corvallis, OR, USA: ICBO. http://icbo.cgrb.oregonstate.edu/node/305.
[14] Ohyanagi, H., et al., The Rice Annotation Project Database (RAP-DB): hub for Oryza sativa ssp. japonica genome information. Nucleic Acids Res, 2006. 34(Database issue): p. D741-4.
[15] He, X. and J. Wang, BGI-RIS V2. Methods Mol Biol, 2007. 406: p. 275-99.
[16] Alexandrov, N., et al., SNP-Seek database of SNPs derived from 3000 rice genomes. Nucleic Acids Res, 2015. 43(Database issue): p. D1023-7.
[17] Mansueto, L., et al., SNP-Seek II: A resource for allele mining and analysis of big genomic data in Oryza sativa. Current Plant Biology, 2016.
[18] Li, J.Y., J. Wang, and R.S. Zeigler, The 3,000 rice genomes project: new opportunities and challenges for future rice research. Gigascience, 2014. 3: p. 8.
[19] The 3,000 rice genomes project, The 3,000 rice genomes project. Gigascience, 2014. 3: p. 7.
[20] Ouyang, S., et al., The TIGR Rice Genome Annotation Resource: improvements and new features. Nucleic Acids Res, 2007. 35(Database issue): p. D883-7.
[21] Yuan, Q., et al., The institute for genomic research Osa1 rice genome annotation database. Plant Physiol, 2005. 138(1): p. 18-26.
[22] Sakai, H., et al., Rice Annotation Project Database (RAP-DB): an integrative and interactive database for rice genomics. Plant Cell Physiol, 2013. 54(2): p. e6.
[23] Wang, M., et al., The genome sequence of African rice (Oryza glaberrima) and evidence for independent domestication. Nat Genet, 2014. 46(9): p. 982-8.
[24] Chen, J., et al., Whole-genome sequencing of Oryza brachyantha reveals mechanisms underlying Oryza genome evolution. Nat Commun, 2013. 4: p. 1595.
[25] Jacquemin, J., et al., The International Oryza Map Alignment Project: development of a genus-wide comparative genomics platform to help solve the 9 billion-people question. Curr Opin Plant Biol, 2013. 16(2): p. 147-56.
[26] Zhang, Y., et al., Genome and Comparative Transcriptomics of African Wild Rice Oryza longistaminata Provide Insights into Molecular Mechanism of Rhizomatousness and Self-Incompatibility. Mol Plant, 2015. 8(11): p. 1683-6.
[27] Dharmawardhana, P., et al., A genome scale metabolic network for rice and accompanying analysis of tryptophan, auxin and serotonin biosynthesis regulation under biotic stress. Rice (N Y), 2013. 6(1): p. 15.
[28] Naithani, S., et al., Plant Reactome: a resource for plant pathways and comparative analysis. Nucleic Acids Res, 2016.

[29] Meng, Y., et al., PmiRKB: a plant microRNA knowledge base. Nucleic Acids Res, 2011. 39(Database issue): p. D181-7.
[30] Hao, L., et al., Information Commons for Rice (IC4R). Nucleic Acids Res, 2016. 44(D1): p. D1172-80.
[31] Zhao, H., et al., RiceVarMap: a comprehensive database of rice genomic variations. Nucleic Acids Res, 2015. 43(Database issue): p. D1018-22.
[32] Yonemaru, J., K. Ebana, and M. Yano, HapRice, an SNP haplotype database and a web tool for rice. Plant Cell Physiol, 2014. 55(1): p. e9.
[33] Mockler, T.C., et al., The DIURNAL project: DIURNAL and circadian expression profiling, model-based pattern matching, and promoter analysis. Cold Spring Harb Symp Quant Biol, 2007. 72: p. 353-63.
[34] Hruz, T., et al., Genevestigator v3: a reference expression database for the meta-analysis of transcriptomes. Adv Bioinformatics, 2008. 2008: p. 420747.
[35] Sato, Y., et al., RiceFREND: a platform for retrieving coexpressed gene networks in rice. Nucleic Acids Res, 2013. 41(Database issue): p. D1214-21.
[36] Sato, Y., et al., RiceXPro version 3.0: expanding the informatics resource for rice transcriptome. Nucleic Acids Res, 2013. 41(Database issue): p. D1206-13.
[37] Gour, P., et al., Manually curated database of rice proteins. Nucleic Acids Res, 2014. 42(Database issue): p. D1214-21.
[38] Chien, C.H., et al., EXPath: a database of comparative expression analysis inferring metabolic pathways for plants. BMC Genomics, 2015. 16 Suppl 2: p. S6.
[39] Ogata, Y., et al., CoP: a database for characterizing co-expressed gene modules with biological information in plants. Bioinformatics, 2010. 26(9): p. 1267-8.
[40] Sapkota, A., et al., DIPOS: database of interacting proteins in Oryza sativa. Mol Biosyst, 2011. 7(9): p. 2615-21.
[41] Gu, H., et al., PRIN: a predicted rice interactome network. BMC Bioinformatics, 2011. 12: p. 161.
[42] Lee, T., et al., RiceNet v2: an improved network prioritization server for rice genes. Nucleic Acids Res, 2015. 43(W1): p. W122-7.
[43] The Gene Ontology Consortium, Expansion of the Gene Ontology knowledgebase and resources. Nucleic Acids Res, 2016.
[44] Cooper, L. and P. Jaiswal, The Plant Ontology: A Tool for Plant Genomics. Methods Mol Biol, 2016. 1374: p. 89-114.
[45] Oellrich, A., et al., An ontology approach to comparative phenomics in plants. Plant Methods, 2015. 11: p. 10.
[46] Deans, A.R., et al., Finding Our Way through Phenotypes. PLoS Biol, 2015. 13(1): p. e1002033.
[47] Thessen, A.E., et al., Emerging semantics to link phenotype and environment. PeerJ, 2015. 3: p. e1470.
[48] Yamazaki, Y. and P. Jaiswal, Biological ontologies in rice databases. An introduction to the activities in Gramene and Oryzabase. Plant Cell Physiol, 2005. 46(1): p. 63-8.
[49] Cooper, L., et al., The Planteome Project. In: International Conference on Biomedical Ontology and BioCreative (ICBO BioCreative 2016). 2016. Corvallis, OR, USA: CEUR-ws.org Volume 1747. http://ceur-ws.org/Vol-1747/IT406-IP35_ICBO2016.pdf.
[50] Devisetty, U.K., et al., Bringing your tools to CyVerse Discovery Environment using Docker. F1000Res, 2016. 5: p. 1442.
[51] Afgan, E., et al., The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2016 update. Nucleic Acids Res, 2016. 44(W1): p. W3-10.
[52] Bornich, C., et al., Galaxy Portal: interacting with the galaxy platform through mobile devices. Bioinformatics, 2016. 32(11): p. 1743-5.
[53] Lyons, E., et al., Finding and comparing syntenic regions among Arabidopsis and the outgroups papaya, poplar, and grape: CoGe with rosids. Plant Physiol, 2008. 148(4): p. 1772-81.
[54] McLaren, W., et al., The Ensembl Variant Effect Predictor. Genome Biol, 2016. 17(1): p. 122.
[55] Kodama, Y., M. Shumway, and R. Leinonen, The Sequence Read Archive: explosive growth of sequencing data. Nucleic Acids Res, 2012. 40(Database issue): p. D54-6.
[56] Pruitt, K.D., et al., RefSeq: an update on mammalian reference sequences. Nucleic Acids Res, 2014. 42(Database issue): p. D756-63.
[57] Tello-Ruiz, M.K., et al., Gramene: A Resource for Comparative Analysis of Plants Genomes and Pathways. Methods Mol Biol, 2016. 1374: p. 141-63.
[58] Cooper, L., et al., The plant ontology as a tool for comparative plant anatomy and genomic analyses. Plant Cell Physiol, 2013. 54(2): p. e1.
[59] Rodriguez-Iglesias, A., et al., Publishing FAIR Data: An Exemplar Methodology Utilizing PHI-Base. Front Plant Sci, 2016. 7: p. 641.

[60] FAIR principles for data stewardship. Nat Genet, 2016. 48(4): p. 343.
[61] Wilkinson, M.D., et al., The FAIR Guiding Principles for scientific data management and stewardship. Sci Data, 2016. 3: p. 160018.
[62] Krishnakumar, V., et al., Araport: the Arabidopsis information portal. Nucleic Acids Res, 2015. 43(Database issue): p. D1003-9.
[63] Yim, W.C., et al., PLANEX: the plant co-expression database. BMC Plant Biol, 2013. 13: p. 83.
[64] Aoki, Y., et al., ATTED-II in 2016: A Plant Coexpression Database Towards Lineage-Specific Coexpression. Plant Cell Physiol, 2016. 57(1): p. e5.
[65] Ohyanagi, H., et al., Plant Omics Data Center: an integrated web repository for interspecies gene expression networks with NLP-based curation. Plant Cell Physiol, 2015. 56(1): p. e9.
[66] Zhang, Z., et al., PMRD: plant microRNA database. Nucleic Acids Res, 2010. 38(Database issue): p. D806-13.
[67] Takeya, M., et al., NIASGBdb: NIAS Genebank databases for genetic resources and plant disease information. Nucleic Acids Res, 2011. 39(Database issue): p. D1108-13.
[68] Droc, G., et al., OryGenesDB: a database for rice reverse genetics. Nucleic Acids Res, 2006. 34(Database issue): p. D736-40.
[69] Ruprecht, C., et al., FamNet: A Framework to Identify Multiplied Modules Driving Pathway Expansion in Plants. Plant Physiol, 2016. 170(3): p. 1878-94.
[70] Goodstein, D.M., et al., Phytozome: a comparative platform for green plant genomics. Nucleic Acids Res, 2012. 40(Database issue): p. D1178-86.
[71] Tomcal, M., N. Stiffler, and A. Barkan, POGs2: a web portal to facilitate cross-species inferences about protein architecture and function in plants. PLoS One, 2013. 8(12): p. e82569.
[72] Conte, M.G., et al., GreenPhylDB: a database for plant comparative genomics. Nucleic Acids Res, 2008. 36(Database issue): p. D991-8.
[73] Hieno, A., et al., ppdb: plant promoter database version 3.0. Nucleic Acids Res, 2014. 42(Database issue): p. D1188-92.
[74] Johnson, C., et al., CSRDB: a small RNA integrated database and browser resource for cereals. Nucleic Acids Res, 2007. 35(Database issue): p. D829-33.
[75] Cognat, V., et al., PlantRNA, a database for tRNAs of photosynthetic eukaryotes. Nucleic Acids Res, 2013. 41(Database issue): p. D273-9.
[76] Kanz, C., et al., The EMBL Nucleotide Sequence Database. Nucleic Acids Res, 2005. 33(Database issue): p. D29-33.
[77] Alter, S., et al., DroughtDB: an expert-curated compilation of plant drought stress genes and their homologs in nine species. Database (Oxford), 2015. 2015: p. bav046.
[78] Naika, M., et al., STIFDB2: an updated version of plant stress-responsive transcription factor database with additional stress signals, stress-responsive transcription factor binding sites and stress-responsive genes in Arabidopsis and rice. Plant Cell Physiol, 2013. 54(2): p. e8.
[79] Yilmaz, A., et al., GRASSIUS: a platform for comparative regulatory genomics across the grasses. Plant Physiol, 2009. 149(1): p. 171-80.
[80] Zhang, Y., et al., IsomiR Bank: a research resource for tracking IsomiRs. Bioinformatics, 2016. 32(13): p. 2069-71.
[81] Nakano, M., et al., Plant MPSS databases: signature-based transcriptional resources for analyses of mRNA and small RNA. Nucleic Acids Res, 2006. 34(Database issue): p. D731-5.
[82] Lee, T.H., et al., RiceArrayNet: a database for correlating gene expression from transcriptome profiling, and its application to the analysis of coexpressed genes in rice. Plant Physiol, 2009. 151(1): p. 16-33.
[83] Austin, R.S., et al., New BAR tools for mining expression data and exploring Cis-elements in Arabidopsis thaliana. Plant J, 2016.
[84] Waese, J. and N.J. Provart, The Bio-Analytic Resource: Data visualization and analytic tools for multiple levels of plant biology. Current Plant Biology.
[85] Riano-Pachon, D.M., et al., PlnTFDB: an integrative plant transcription factor database. BMC Bioinformatics, 2007. 8: p. 42.
[86] Zhang, T., A.P. Marand, and J. Jiang, PlantDHS: a database for DNase I hypersensitive sites in plants. Nucleic Acids Res, 2016. 44(D1): p. D1148-53.
[87] Kurotani, A., et al., Plant-PrAS: a database of physicochemical and structural properties and novel functional regions in plant proteomes. Plant Cell Physiol, 2015. 56(1): p. e11.
[88] Murcha, M.W., et al., MPIC: a mitochondrial protein import components database for plant and non-plant species. Plant Cell Physiol, 2015. 56(1): p. e10.

[89] Duvick, J., et al., PlantGDB: a resource for comparative plant genomics. Nucleic Acids Res, 2008. 36(Database issue): p. D959-65.
[90] Wu, X., Y. Zhang, and Q.Q. Li, PlantAPA: A Portal for Visualization and Analysis of Alternative Polyadenylation in Plants. Front Plant Sci, 2016. 7: p. 889.
[91] Lohse, M., et al., Mercator: a fast and simple web server for genome scale functional annotation of plant sequence data. Plant Cell Environ, 2014. 37(5): p. 1250-8.
[92] Wang, D., et al., The Rice Genome Knowledgebase (RGKbase): an annotation database for rice comparative genomics and evolutionary biology. Nucleic Acids Res, 2013. 41(Database issue): p. D1199-205.
[93] Narsai, R., et al., Rice DB: an Oryza Information Portal linking annotation, subcellular location, function, expression, regulation, and evolutionary information for rice and Arabidopsis. Plant J, 2013. 76(6): p. 1057-73.
[94] Kawahara, Y., et al., TENOR: Database for Comprehensive mRNA-Seq Experiments in Rice. Plant Cell Physiol, 2016. 57(1): p. e7.
[95] Hamada, K., et al., OryzaExpress: an integrated database of gene expression networks and omics annotations in rice. Plant Cell Physiol, 2011. 52(2): p. 220-9.
[96] Lu, T., et al., RICD: a rice indica cDNA database resource for rice functional genomics. BMC Plant Biol, 2008. 8: p. 118.
[97] Edwards, J.D., A.M. Baldo, and L.A. Mueller, Ricebase: a breeding and genetics platform for rice, integrating individual molecular markers, pedigrees and whole-genome-based data. Database (Oxford), 2016. 2016.
[98] Kudo, T., et al., UniVIO: a multiple omics database with hormonome and transcriptome data from rice. Plant Cell Physiol, 2013. 54(2): p. e9.
[99] Helmy, M., M. Tomita, and Y. Ishihama, OryzaPG-DB: rice proteome database based on shotgun proteogenomics. BMC Plant Biol, 2011. 11: p. 63.
[100] Yonemaru, J., et al., Q-TARO: QTL Annotation Rice Online Database. Rice, 2010. 3(2-3): p. 194-203.
[101] Karlowski, W.M., et al., MOsDB: an integrated information resource for rice genomics. Nucleic Acids Res, 2003. 31(1): p. 190-2.
[102] Zhang, Z., et al., RiceWiki: a wiki-based database for community curation of rice genes. Nucleic Acids Res, 2014. 42(Database issue): p. D1222-8.
[103] Jung, K.H., et al., Phylogenomics databases for facilitating functional genomics in rice. Rice (N Y), 2015. 8(1): p. 60.
[104] Sakata, K., et al., RiceGAAS: an automated annotation system and database for rice genome sequence. Nucleic Acids Res, 2002. 30(1): p. 98-102.
[105] Cao, P., et al., The Rice Oligonucleotide Array Database: an atlas of rice gene expression. Rice (N Y), 2012. 5(1): p. 17.
[106] Sakurai, T., et al., RiceFOX: a database of Arabidopsis mutant lines overexpressing rice full-length cDNA that contains a wide range of trait information to facilitate analysis of gene function. Plant Cell Physiol, 2011. 52(2): p. 265-73.
[107] Zhang, J., et al., RMD: a rice mutant database for functional analysis of the rice genome. Nucleic Acids Res, 2006. 34(Database issue): p. D745-8.
[108] Priya, P. and M. Jain, RiceSRTFDB: a database of rice transcription factors containing comprehensive expression, cis-regulatory element and mutant information to facilitate gene function analysis. Database (Oxford), 2013. 2013: p. bat027.
[109] Smita, S., et al., QlicRice: a web interface for abiotic stress responsive QTL and loci interaction channels in rice. Database (Oxford), 2011. 2011: p. bar037.
[110] Ohyanagi, H., et al., OryzaGenome: Genome Diversity Database of Wild Oryza Species. Plant Cell Physiol, 2016. 57(1): p. e1.
[111] Kurata, N. and Y. Yamazaki, Oryzabase. An integrated biological and genome information database for rice. Plant Physiol, 2006. 140(1): p. 12-7.
[112] Copetti, D., et al., RiTE database: a resource database for genus-wide rice genomics and evolutionary biology. BMC Genomics, 2015. 16: p. 538.
[113] Kind, T., M. Scholz, and O. Fiehn, How large is the metabolome? A critical analysis of data exchange practices in chemistry. PLoS One, 2009. 4(5): p. e5440.

Table 1. Generic genomic databases and bioinformatics tools. Descriptions may be copied from the original sources.

Data Types: Gene/gene products including proteins and all RNA types; Genome annotation; Pathway/network/interactions; Expression (transcript/metabolite/protein); Genetic/genomic variation (including SNPs, mutants, indels, wild species, etc.); Others

Database Name and latest release date

Description

Species

URL

Gramene Nov 2016

An open-source data resource for comparative functional genomics in cereals and other plant species [10].

O. sativa and other plant species

http://www.gramene.org

Y

EXPath Oct 21, 2012

Provides information on metabolic pathways inferred from microarray-based transcriptomic data, gene annotation, and orthologous genes [38].

O. sativa and 2 plant species

http://expath.itps.ncku.edu.tw

PLANEX (PLAnt co-Expression) database

Contains publicly available GeneChip data obtained from the Gene Expression Omnibus [63].

O. sativa and 7 plant species

ATTED-II Sep 1, 2015

Provides information on co-expressed gene networks supported by microarray and RNA sequencing-based transcriptomic data [64].

PODC (Plant Omics Data Center) Mar 16, 2016

Y

Y

Y

Y

Y

Y

http://planex.plantbioinformatics.org

Y

Y

Y

Oryza and 8 plant species

http://atted.jp

Y

Y

Y

A repository of annotated gene expression data and omics data analysis tools [65].

O. sativa and 7 plant species

http://bioinf.mind.meiji.ac.jp/podc

Y

Y

Y

Y

PMRD (Plant MicroRNA Database) Nov 17, 2014

A plant miRNA data repository containing associated information on sequence, secondary structure, target genes, and expression profiles of miRNAs, and their mapping to species-specific genome browsers [66].

O. sativa and 120 plant species

http://bioinformatics.cau.edu.cn/PMRD

Y

Y

Y

Y

NIASGBdb (National Institute of Agrobiological Sciences Genebank database)

A database containing information on simple sequence repeat (SSR) polymorphisms in plant genomes [67].

O. sativa and other plant species

http://www.gene.affrc.go.jp/databases_en.php

Y

Y

Y

OryGenesDB

A database for rice reverse genetics, built with flanking sequence tags of various mutagens and functional genomics data [68].

O. sativa ssp. indica and japonica, and 2 other plant species

http://orygenesdb.cirad.fr/index.html

Y

Y

Y

FamNet

Allows users to retrieve data related to conserved structural and functional domains within proteins from one or more plant species [69].

O. sativa and 7 plant species

http://www.gene2function.de/famnet.html

Y

Y

Y

Y

Y


Phytozome Oct 16, 2015

A comparative hub for annotated plant genome and gene family data. Provides a view of the evolutionary history of every plant gene at the level of sequence, gene structure, gene family and genome organization [70].

O. sativa and 64 other plant and algae species

http://www.phytozome.net

Y

Y

Y

POGs (Putative Orthologous Groups 2) Database 2014

A relational database that integrates data from rice, Arabidopsis, and maize into 'putative orthologous groups' (POGs) and allows comparisons among orthologs and extrapolation of annotations among species [71].

O. sativa and 3 plant species

http://pogs.uoregon.edu/

Y

Y

Y

GreenPhylDB Sep 4, 2015

The database contains a catalogue of gene families covering a broad taxonomy of green plants [72].

O. sativa and other plant species

http://www.greenphyl.org/cgi-bin/index.cgi

Y

Y

Y

ppdb (plant promoter database) Jun 5, 2013

Provides information on transcription start sites (TSSs), core promoter structure (TATA boxes, Initiators, Y Patches, GA and CA elements), and regulatory element groups (REGs) [73].

Oryza and 4 plant species

http://ppdb.agr.gifu-u.ac.jp/ppdb/cgi-bin/index.cgi

Y

Y

Y

CSRDB (Cereal Small RNA Database)

Consists of large-scale datasets of maize and rice small RNA (smRNA) sequences generated by high-throughput pyrosequencing [74].

O. sativa and maize

http://sundarlab.ucdavis.edu/smrnas/

Y

Y

PlantRNA Sep 6, 2012

Compiles transfer RNA (tRNA) gene sequences retrieved from fully annotated plant nuclear, plastid and mitochondrial genomes [75].

O. sativa and 10 plant species

http://plantrna.ibmp.cnrs.fr/

Y

Y

UniProtKB Nov 2016

A central resource for annotated proteins consisting of two sections: UniProtKB/Swiss-Prot for manually annotated entries and UniProtKB/TrEMBL for computer-annotated entries [7].

O. sativa and other organism species

http://www.uniprot.org/

Y

Y

GenBank Updated on a daily basis

NIH genetic sequence database, a repository of publicly available DNA sequences [2].

O. sativa and other organism species

http://www.ncbi.nlm.nih.gov

Y

Y

DDBJ Updated on a daily basis

A public repository of nucleotide sequence data [5].

O. sativa and other organism species

http://www.ddbj.nig.ac.jp

Y

Y

EMBL Dec 7, 2016

Comprehensive collection of nucleotide sequences and annotation from available public sources [76].

O. sativa and other organism species

http://www.ebi.ac.uk/about

Y

Y

Y

Ensembl Plants Dec 2016

Provides genome browsers for several plant species, various genomic data sets, and tools for analyzing and visualizing large genome-scale data sets in the context of the genome [9].

O. sativa and other organism species

http://plants.ensembl.org/index.html

Y

DroughtDB

Manually curated genes involved in the drought stress response [77].

O. sativa ssp. japonica cv. Nipponbare and 8 plant species

http://pgsb.helmholtz-muenchen.de/droughtdb/

Y

Y


STIFDB2 (Stress Responsive Transcription Factor Database) Oct 2012

A collection of biotic and abiotic stress-responsive genes with options to identify probable transcription factor binding sites in their promoters. An integrated biocuration and genomic data mining approach has been employed to characterize the data set of transcription factors and consensus binding sites from the literature and stress-responsive genes from the Gene Expression Omnibus [78].

O. sativa ssp. japonica and indica, and Arabidopsis

http://caps.ncbs.res.in/stifdb2

Y

Y

Y

GRASSIUS (Grass Regulatory Information Services) Aug 25, 2014

Composed of a collection of databases that relate to the control of gene expression in the grasses and its relationship with agronomic traits. Includes transcription factors, promoters, co-regulators, and transcription factor ORF clones [79].

O. sativa and 3 plant species

www.grassius.org

Y

Y

Y

CoP database Nov 11, 2009

Integrated database of co-expressed genes and biological processes in plants, derived from microarray data [39].

O. sativa and 7 plant species

http://webs2.kazusa.or.jp/kagiana/cop0911/

Y

Y

IsomiR Bank

The first integrative resource that contains the sequence and expression of isomiRs [80].

O. sativa and 7 organisms

http://mcg.ustc.edu.cn/bsc/isomir/

Y

Y

Plant MPSS (massively parallel signature sequencing) databases

Provides information on gene expression levels and potential novel transcripts (antisense transcripts, alternative splice isoforms, and regulatory intergenic transcripts) [81].

O. sativa ssp. indica and japonica and 3 plant species

http://mpss.udel.edu

Y

Y

PlantArrayNet Jan 10, 2011

Provides information on co-expressed genes using microarray-based transcriptomic data [82].

O. sativa and 2 plant species

http://arraynet.mju.ac.kr/arraynet/

Y

Y

PLEXdb Jun 2013

A unified gene expression resource for plants and plant pathogens. It is a genotype-to-phenotype, hypothesis-building information warehouse that leverages highly parallel expression data with seamless portals to related genetic, physical, and pathway data [11].

Oryza and 12 other plant species

http://www.plexdb.org/

Y

Y

BAR (BioAnalytical Resource for plant biology) Jun 2, 2015

Provides interactive interfaces for the exploratory visualization of gene expression data [83, 84].

Several plant species including O. sativa.

http://bar.utoronto.ca

Y

Y

PmiRKB (Plant miRNA Knowledge Base) Jun 5, 2010

Provides four major functional modules: "SNPs", "Pri-miRNAs", "MiR-Tar", and "Self-reg" [29].

21 O. sativa and Arabidopsis

http://bis.zju.edu.cn/pmirkb/

Y

PlnTFDB (plant transcription factor database)

A web interface to access large sets of transcription factors of several plant species. Information including protein sequences, coding regions, genomic sequences, expressed sequence tags (ESTs), domain architecture and scientific literature is provided for each family [85].

O. sativa ssp. indica and japonica and other plant species

http://plntfdb.bio.uni-potsdam.de/v3.0/

Y

Y

Y

PlantDHS (plant DNase I hypersensitive site database) Feb 23, 2016

Integrates histone modification, RNA sequencing, nucleosome positioning/occupancy, transcription factor binding site, and genomic sequence data [86].

O. sativa ssp. japonica cv. Nipponbare and 2 plant species

http://plantdhs.org/

Y

Y

Plant Homolog Database May 9, 2015

A database of homologous genes [30].

16 plant species, including 10 Oryza species

http://phd.big.ac.cn/

Y

Y

PO (Plant Ontology) Sep 2016

Simple yet robust and extensible controlled vocabularies that accurately reflect the biology of plant structures and developmental stages [58].

O. sativa and other plant species

www.plantontology.org

Y

Y

Plant-PrAS (Plant Protein Annotation Suite) database

Database of physicochemical and structural properties, and novel functional regions, in plant proteomes [87].

O. sativa and 5 plant species

http://plant-pras.riken.jp

Y

Y

PDB (Protein Data Bank) Dec 6, 2016

Worldwide archive of structural data of biological macromolecules [6].

O. sativa and other organism species

http://www.rcsb.org/pdb/

Y

Y

MPIC (Mitochondrial Protein Import Components) database

Searchable information on the protein import apparatus of plant and non-plant mitochondria [88].

O. sativa and 23 other organism species

http://www.plantenergy.uwa.edu.au/applications/mpic

Y

Y

PlantGDB Jul 23, 2012

A database of molecular sequence data from several plant species [89].

O. sativa and other plant species

http://www.plantgdb.org/

Y

PlantAPA (Alternative polyadenylation in plants) Aug 30, 2016

A web server for querying, visualizing, and analyzing poly(A) sites in plants, which can profile heterogeneous cleavage sites and quantify the expression patterns of poly(A) sites across different conditions [90].

O. sativa and 3 plant species

http://bmi.xmu.edu.cn/plantapa/

Planteome

A resource for common reference ontologies for plants and species-specific crop ontologies. Also provides ontology-based annotation of rice genes, QTLs, phenotypes, and germplasm [49].

Oryza and other plant species

http://www.planteome.org

Y

Y

Y

Mercator pipeline

Functional annotation of plant 'omics' data [91].

Oryza and 2 plant species

http://mapman.gabipd.org/web/guest/app/Mercator

Y

Y

CyVerse (formerly iPlant Collaborative)

Provides life scientists with powerful computational infrastructure to handle huge datasets and complex analysis, thus enabling data-driven discovery [50].

Plants, animals, and microbes

http://www.cyverse.org/

Y

Y

Galaxy

A software system that allows experimentalists without informatics or programming expertise to perform complex large-scale analyses with just a web browser [51, 52].

http://galaxyproject.org

Y

Y

Y

Y

MoChA (Molecular Characteristics database for Allergens)

Database of allergenic proteins identified by bioinformatics tools or by evidence of IgE binding. It collects genome, transcriptome, and proteome data from reliable experiments, together with molecular features.

Oryza and 3 plant species

http://lilab.life.sjtu.edu.cn:8080/mocha/main-7.92.html

Y

Diurnal

A web-based tool for accessing the diurnal and circadian genome-wide expression results of genes from several array experiments conducted on common model plants [33].

Plant species

http://diurnal.mocklerlab.org/

Y

Y

Genevestigator

Provides powerful tools to explore gene expression across a wide variety of biological contexts [34].

O. sativa and 16 organism species

https://genevestigator.com/gv/

Y

Y

Y
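Several resources in Table 1, including ATTED-II, PLANEX, PlantArrayNet, and the CoP database, provide co-expression networks built from microarray or RNA-seq data. The sketch below illustrates one way to score such a network. It computes Pearson correlations between expression profiles and combines each pair of reciprocal correlation ranks into a mutual rank (MR), the geometric mean of the two ranks. ATTED-II is reported to use an MR index. This is still a simplified illustration under that assumption, not the pipeline of any listed database. The gene names and expression values are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mutual_rank(profiles: Dict[str, List[float]]) -> Dict[Tuple[str, str], float]:
    """Mutual rank for every ordered pair of genes.

    rank(a -> b) is the position of gene b when all other genes are sorted by
    decreasing correlation with gene a (1 = most correlated).
    MR(a, b) = sqrt(rank(a -> b) * rank(b -> a)); smaller means stronger.
    """
    genes = list(profiles)
    rank: Dict[str, Dict[str, int]] = {}
    for a in genes:
        partners = [b for b in genes if b != a]
        partners.sort(key=lambda b: pearson(profiles[a], profiles[b]), reverse=True)
        rank[a] = {b: i + 1 for i, b in enumerate(partners)}
    return {
        (a, b): math.sqrt(rank[a][b] * rank[b][a])
        for a in genes
        for b in genes
        if a != b
    }


# Hypothetical expression values for four genes across five samples.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [1.1, 2.1, 2.9, 4.2, 5.1],
    "geneC": [5.0, 4.0, 3.0, 2.0, 1.0],
    "geneD": [2.0, 1.0, 2.0, 1.0, 2.0],
}
mr = mutual_rank(profiles)
print(mr[("geneA", "geneB")])  # 1.0: each gene is the other's top partner
```

Using ranks instead of raw correlation values makes scores more comparable between genes whose correlation distributions differ. Real resources compute MR over thousands of samples and genes.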

Table 2. Rice-specific databases. Descriptions may be copied from the original sources.

Rice DB Oct 23, 2013

TENOR (Transcriptome ENcyclopedia Of Rice)

OryzaExpress Apr 22, 2014

Rice Expression Database Aug 2016

RICD (Rice Indica cDNA Database)

Expression (transcript/metabolite/protein) Genetic/genomic variation (including SNP, mutants, indels, wild species etc.)

MCDRP (Manually Curated Database of Rice Proteins) Nov 30, 2016

Pathway/network/interactions

RGKbase (Rice Genome Knowledgebase) Jun 28, 2012

Y

Y

Y

Y

Y

Y

Y

Y

Y

Species

URL

Annotation database for rice comparative genomics and evolutionary biology, which includes genome sequence assemblies, transcriptomic and epigenomic data, genetic variations, quantitative trait loci (QTLs) and the relevant literature [92].

O. brachyantha, O. glaberrima, O. sativa ssp. japonica cv. Nipponbare, ssp. indica 93-11, and PA64s

http://rgkbase.big.ac.cn/RGKbase

Y

Gene expression database for rice proteins, including metabolic pathways and protein interactions, curated from published articles [37].

O. sativa ssp. indica and japonica

http://www.genomeindia.org/biocuration

Y

O. sativa and Arabidopsis

http://ricedb.plantenergy.uwa.edu.au

Y

Y

Y

Y

O. sativa ssp. japonica cv. Nipponbare

http://tenor.dna.affrc.go.jp

Y

Y

Y

Y

O. sativa

http://plantomics.mind.meiji.ac.jp/OryzaExpress/

Y

Y

Y

Oryza spp

http://expression.ic4r.org/

Y

Y

Y

O. sativa ssp. indica Guang-lu-ai 4 and Minghui 63

http://www.ncgr.ac.cn/ricd

Y

Y

Functional genomics database linking annotation, subcellular location, function, expression, regulation, and evolutionary information [93].

Represents transcriptional activity on the rice genome at the nucleotide level, based on RNA-seq data under 140 environmental stress and plant hormone treatment conditions. Expression profiles, information on cis-regulatory elements in promoter regions, and co-expressed transcripts are provided for each transcript [94].

Sub-platform of PlantExpress for single-species gene expression analysis in O. sativa. Consists of gene expression networks and omics annotations derived from microarray data [95].

Accommodates the rice reference genome with standardized and accurate gene annotations derived from RNA-seq data [30].

cDNA resource with comprehensive information for the functional analysis of ssp. indica and for comparative genomics, including sequences, protein domain annotations, similarity search results, SNP and InDel information, and hyperlinks to gene annotations [96].

Y

Others

Description

Gene/gene products including proteins and all RNA types

Database Name and latest release date

Genome annotation

Data Types

Y


OryGenesDB

Ricebase Aug 10, 2016

UniVIO (Uniformed Viewer for Integrated Omics) Oct 22, 2012

A database for rice reverse genetics, built with FSTs (flanking sequence tags) of various mutagens and functional genomics data, collected from both international insertion collections and the literature [68].

An integrative genomic database for rice with an emphasis on combining datasets in a way that maintains key links between past and current genetic studies. Includes DNA sequence data, gene annotations, nucleotide variation data, and molecular marker fragment size data [97].

Displays hormone-metabolome (hormonome) and transcriptome data in a single formatted (uniformed) heat map. Hormonome and transcriptome data obtained from 14 organ parts of rice plants at the reproductive stage and from seedling shoots of three gibberellin signaling mutants are included in the database [98].

O. sativa ssp. indica and japonica, and 2 other plant species

http://orygenesdb.cirad.fr/index.html

Y

Y

Y

O. sativa

http://ricebase.org

Y

Y

Y

http://univio.psc.riken.jp/

Y

http://oryzapg.iab.keio.ac.jp/

Y

Y

Y

http://qtaro.abr.affrc.go.jp/

Y

Y

Y

http://pgsb.helmholtz-muenchen.de/plant/rice/index.jsp

Y

Y

http://rice.plantbiology.msu.edu

Y

Y

http://rapdb.dna.affrc.go.jp/

Y

Y

Several species of Oryza

http://oge.gramene.org

Y

Y

Oryza

http://wiki.ic4r.org/index.php/Main_Page

Y

Y

O. sativa ssp. japonica cv. Nipponbare

OryzaPG-DB (Oryza ProteoGenomic Database) Jan 30, 2012

Rice proteome database based on shotgun proteogenomics. Contains the proteome of rice undifferentiated cultured cells, the corresponding cDNA, transcript, and genome sequences, novel proteogenomic features, and updated gene model annotations [99].

O. sativa

Q-TARO (QTL Annotation Rice Online Database) March 31, 2012

Displays the co-localization of QTLs and distribution of QTL clusters on rice genome [100].

O. sativa ssp. indica and japonica

MOsDB Updated on a daily basis

A resource for publicly available sequences of the rice (Oryza sativa L.) genome. Sub-platform of Plantdb [101].

O. sativa

MSU Rice Genome Annotation Project Feb 7, 2012

Displays sequence and annotation data for the rice genome. Includes a genome browser, motifs/domains within the predicted genes, a rice repeat database, and related sequences identified in other plant species [20, 21].

O. sativa ssp. japonica cv. Nipponbare

RAP-DB (Rice Annotation Project DataBase) Aug 5, 2016

Genomics database providing gene annotations for the genome sequence of rice [14, 22].

O. sativa ssp. japonica cv. Nipponbare


OGE Gramene

RiceWiki

Provides access to the most recent version of the Oryza Genome Evolution (OGE) project and the International Oryza Map Alignment (I-OMAP) project. Includes genome assemblies, annotation, synteny, gene trees, SNPs, and interspecific genome alignments.

A wiki-based, publicly editable, open-content platform for community curation of rice genes [102].

Y

Y

Y

A collection of databases for six large gene families in rice (glycosyltransferases, glycoside hydrolases, kinases, transcription factors, transporters, and cytochrome P450 monooxygenases), viz. Rice Kinase Database, Rice GT database, Rice GH database, Rice TF database, Rice transporter database, and Rice Cytochrome P450 database [103].

DIPOS (database of interacting proteins in Oryza sativa)

RiceCyc Dec 2016

Rice Phylogenomic Database Jun 2015

O. sativa

http://ricephylogenomics.ucdavis.edu/index.shtml

Y

Y

Provides comprehensive information of interacting proteins in rice [40].

O. sativa

http://www.riceresearch.info/

Y

Y

Metabolic network of rice. Provides metabolic pathways, reactions, metabolites and associated gene entities. The analysis tools provide pathway comparison, and gene expression analysis.

O. sativa ssp. japonica

KEGG rice

Provides metabolic and regulatory pathways, enzymes, reactions, metabolites and associated gene entities.

IntAct rice

Experimentally determined gene-gene interaction data

O. sativa

RiceNet Dec 29, 2014

An updated network prioritization server for Oryza sativa ssp. japonica. Gene prioritization allows users to predict new candidate genes for a phenotype or biological pathway by prioritizing rice genes [42].

O. sativa ssp. japonica

IC4R (Information Commons for Rice) May 5, 2015

Rice knowledgebase integrating expression profiles derived from RNA-seq data, genomic variations, plant homologs, post-translational modifications, literature, and community-contributed annotations [30].

Rice GAAS (Rice Genome Automated Annotation System)


http://pathway.gramene.org/RICE/organism-summary?

Y

Y

http://www.genome.jp/kegg-bin/show_organism?menu_type=pathway_maps&org=dosa

http://www.ebi.ac.uk/ebisearch/search.ebi?db=intact-experiments&t=Oryza+sativa

Y

Y

http://www.inetbio.org/ricenet/

Y

Oryza

http://ic4r.org/

Y

Y

A rice genome automated annotation system that integrates programs for predicting and analyzing protein-coding gene structure [104].

O. sativa

http://RiceGAAS.dna.affrc.go.jp

Y

Y

EMBL-EBI Gene Expression Atlas Dec 2, 2016

An open public repository of gene expression pattern data under different biological conditions using both microarray and RNA-seq data [12].

O. sativa

https://www.ebi.ac.uk/gxa/home

Y

Y

RiceFREND Sep 13, 2012

Gene co-expression database derived from microarray data [35].

O. sativa ssp. japonica cv. Nipponbare

http://ricefrend.dna.affrc.go.jp/

Y

Y


Y


RiceXPro (Rice Expression Profile Database) 2013

Repository of gene expression profiles derived from microarray analysis of tissues/organs spanning the entire growth of the rice plant under natural field conditions, of rice seedlings treated with various phytohormones, and of specific cell types/tissues isolated by laser microdissection (LMD) [36].

O. sativa ssp. japonica

http://ricexpro.dna.affrc.go.jp/

Y

Y

http://www.ricearray.org/

Y

Y

ROAD (Rice Oligonucleotide Array Database) Mar 28, 2012

A public resource for gene expression and coexpression analysis in rice derived from microarray data [105].

O. sativa ssp. indica and japonica

RiceFOX

A database of Arabidopsis mutant lines overexpressing rice full-length cDNA that contains a wide range of trait information to facilitate analysis of gene function [106].

O. sativa

http://ricefox.psc.riken.jp

Y

Y

O. sativa

http://bis.zju.edu.cn/prin

Y

Y

http://www.oryzasnp.org/iric-portal/

Y

Y

PRIN (predicted rice interactome network)

Rice SNP-Seek Database Feb 16, 2015

Rice Variation Database

Protein-protein interaction data in PRIN are based on the interologs of six model organisms in which large-scale protein-protein interaction experiments have been performed [41].

Provides genotype, phenotype, and variety information for rice, including SNP genotyping data. It allows quick retrieval of SNP alleles for all varieties in a given genome region, identification of different alleles among predefined varieties, and queries of basic passport and morphological phenotypic information about sequenced rice lines [16, 17].

O. sativa ssp. japonica cv. Nipponbare

An atlas of re-sequencing-based rice genomic variations [30].

Oryza varieties

http://variation.ic4r.org/

Y

Y

RiceVarMap Oct 7, 2015

Database of rice genomic variation. Contains SNPs and indels identified from sequencing data of two sets of cultivated rice germplasm [31].

O. sativa

http://ricevarmap.ncpgr.cn

Y

Y

RMD (Rice Mutant Database) Mar 1, 2012

An archive for collecting, managing, and searching information on T-DNA insertion mutants generated by an enhancer trap system [107].

O. sativa ssp. japonica cv. Zhonghua 11, Zhonghua 15, and Nipponbare

http://rmd.ncpgr.cn

Y

Y

RiceSRTFDB

Represents transcription factors with comprehensive expression, cis-regulatory element, and mutant information, derived from microarray data of a curated set of 456 Affymetrix GeneChip Rice Genome arrays [108].

O. sativa

http://www.nipgr.res.in/RiceSRTFDB.html

QlicRice

A collection of abiotic stress responsive quantitative trait loci (QTLs) in rice and their corresponding sequenced gene loci [109].

O. sativa

http://nabg.iasri.res.in:8080/qlic-rice/

Y

Y

Y

Y

Y


BGI-RIS V2 (Beijing Genomics Institute Rice Information System) Oct 28, 2008

An integrated information resource and comparative analysis workbench for rice genomes, with detailed annotation data including genetic markers, bacterial artificial chromosome (BAC) end sequences, gene contents, cDNAs, oligos, tiling arrays, repetitive elements, and genomic polymorphisms [15].

O. sativa ssp. indica and japonica

http://rice.genomics.org.cn/rice/index2.jsp

Y

OryzaGenome Jun 19, 2015

Sub-platform of Oryzabase. Provides genome sequence information for 21 wild Oryza species together with several cultivated strain reference sequences [110].

O. rufipogon, O. longistaminata, O. sativa ssp. japonica cv. Nipponbare, Nongken-58, and Aus-type Kasalath and indica, Guangluai-4

http://viewer.shigen.info/oryzagenome/

Y

Oryzabase 2016

Contains information about rice development and anatomy, rice mutants, and genetic resources, especially for wild varieties of rice [48, 111].

Oryza spp. (different species of Oryza)

http://www.shigen.nig.ac.jp/rice/oryzabase/top/top.jsp

Y

HapRice

SNP haplotype database [32].

aus, indica, tropical japonica and temperate japonica

http://qtaro.abr.affrc.go.jp/index.html

Y

RiTE Database (Rice TE database) Aug 31, 2015

A genus-wide collection of transposable elements and repeated sequences across 11 diploid species [112].

Genus Oryza and the closely-related out-group Leersia perrieri.

http://www.genome.arizona.edu/cgi-bin/rite/index.cgi

Y

IRRI (International Rice Research Institute)

A research organization involved in serving, understanding, sharing, and using rice genetic diversity; breeding and delivering new varieties; developing and sharing improved crop and environmental management practices; and facilitating the large-scale adoption of technologies.

Oryza

http://irri.org/

The (rice) metabolomics data

Provides restricted access to metabolomics data for rice [113].

Oryza sativa

http://fiehnlab.ucdavis.edu/projects/rice_metabolome

Y

Y

Y
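Table 2 lists two reference annotations of the Nipponbare genome, the MSU Rice Genome Annotation Project and RAP-DB. They use different gene identifier schemes, for example LOC_Os01g01010 (MSU) and Os01g0100100 (RAP-DB). Data from resources keyed to different annotations can therefore mix identifier styles. The minimal sketch below recognizes which scheme an identifier follows before any cross-referencing. The regular expressions are assumptions based on the commonly seen formats, and the function name `classify_rice_id` is hypothetical. The sketch does not map IDs between the two annotations, which requires a separate correspondence table.

```python
import re
from typing import Optional, Tuple

# Assumed identifier patterns; the captured group is the two-digit chromosome.
PATTERNS = {
    "MSU locus": re.compile(r"^LOC_Os(\d{2})g\d{5}$"),
    "MSU transcript": re.compile(r"^LOC_Os(\d{2})g\d{5}\.\d+$"),
    "RAP-DB locus": re.compile(r"^Os(\d{2})g\d{7}$"),
    "RAP-DB transcript": re.compile(r"^Os(\d{2})t\d{7}-\d{2}$"),
}


def classify_rice_id(identifier: str) -> Optional[Tuple[str, int]]:
    """Return (scheme, chromosome number), or None if no pattern matches."""
    ident = identifier.strip()
    for scheme, pattern in PATTERNS.items():
        match = pattern.match(ident)
        if match:
            return scheme, int(match.group(1))
    return None


for ident in ["LOC_Os01g01010", "LOC_Os01g01010.1", "Os01g0100100",
              "Os01t0100100-01", "AT1G01010"]:
    print(ident, classify_rice_id(ident))
```

Non-rice identifiers, such as the Arabidopsis-style AT1G01010 above, return None. This lets a pipeline flag them instead of silently passing them to a rice-specific lookup.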
