Nucleic Acids Research, 2003, Vol. 31, No. 1, 502–504
DOI: 10.1093/nar/gkg011
© 2003 Oxford University Press

RNABase: an annotated database of RNA structures

Venkatesh L. Murthy and George D. Rose*

Department of Biophysics, Johns Hopkins University, Jenkins Hall, 3400 N. Charles Street, Baltimore, MD 21218, USA

Received July 16, 2002; Accepted August 13, 2002

ABSTRACT

RNABase is a unified database of all three-dimensional structures containing RNA deposited in either the Protein Data Bank (PDB) or the Nucleic Acid Database (NDB). For each structure, RNABase contains a brief summary as well as annotation of conformational parameters, identification of possible model errors, Ramachandran-style conformational maps and classification of ribonucleotides into conformers. These same analyses can also be performed on structures submitted by users. To facilitate access, structures are automatically placed into a variety of functional and structural categories, including ribozymes, pseudoknots, etc. RNABase can be freely accessed on the web at http://www.rnabase.org. We are committed to maintaining this database indefinitely.

*To whom correspondence should be addressed. Tel: +410 5167244; Fax: +410 5164118; Email: [email protected]

INTRODUCTION

The pace of RNA structure determination has been accelerating rapidly (Fig. 1). Currently, there are over 500 publicly available structures containing one or more ribonucleotides, including 45 structures of ribozymes and ribozyme fragments as well as 85 structures of partial or complete tRNAs (alone and in complex with other molecules). Inconveniently, neither the Protein Data Bank (PDB) (1) nor the Nucleic Acid Database (NDB) (2) alone contains a complete set of these structures. Furthermore, neither database provides much structural validation or analysis of the structures. To address these issues, we have constructed RNABase, a web-driven relational database. For each structure, RNABase contains a brief summary as well as annotation of conformational parameters, identification of possible model errors, Ramachandran-style conformational maps and classification of ribonucleotides into conformers.

Figure 1. The cumulative number of publicly available RNA-containing structures determined by X-ray crystallography (red), NMR spectroscopy (purple) and all techniques combined (blue) has been steadily increasing since the first RNA-containing structure was released in 1978. There has been a substantial acceleration in RNA structure determination since the mid-1990s.

RNABase ENTRIES AND ANALYSES

Lists of structures in RNABase can be sorted by experimental technique, by structural and functional category (e.g. ribozymes, aptamers, tetraloops, etc.), or by conformational outlier rate. Classification by technique and into structural and functional categories is based on keywords in author-supplied data such as the title, keywords, etc. RNABase also provides both simple and advanced search tools in addition to these general lists.

For each structure, the RNABase summary page contains information extracted from the structure's header records, such as title, authors, experimental technique, keywords, etc. This page also provides links to the corresponding entries in the PDB (1), NDB (2) and MMDB (3), to related Medline records (4), and to the Image Library of Biological Macromolecules (5).

Figure 2. The average number of Ramachandran conformational map outliers per residue has shown no consistent trends over time for all structures (blue) or for the subsets determined by X-ray crystallography (red) and NMR spectroscopy (purple).

In addition to providing powerful search and browsing facilities, it is critical to perform structure analyses and validations. With the development of complete Ramachandran-style maps for RNA (6), it is increasingly possible to classify RNA conformers and to identify likely conformational errors. For each RNA residue in every structure, a complete set of conformational parameters is calculated, including the backbone dihedral angles (α, β, γ, δ, ε and ζ), the sidechain dihedral angle (χ), the ribose dihedral angles (ν0, ν1, ν2, ν3 and ν4), and the ribose puckering phase and amplitude. Using these parameters, complete sets of conformational plots are generated for every structure in RNABase. Furthermore, residues whose dihedral or puckering parameters fall outside the allowed areas are specifically flagged as probable model errors.

These conformational parameters are also used to generate discrete conformational codes (www.rnabase.org/confsum). Each conformational code describes a subset of the multidimensional conformational space available to nucleotides. Correspondingly, residues with the same conformational code have similar conformations. Using these codes, frequently or rarely occurring conformations can easily be identified.
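The conformational parameters listed above can be computed from atomic coordinates with a few lines of vector arithmetic. The following is a minimal Python sketch, not the RNABase code itself: the function names `dihedral` and `pseudorotation` are hypothetical, and the puckering formula is assumed to be the standard Altona-Sundaralingam pseudorotation description, which the text does not explicitly name. Each torsion is defined by four consecutive atoms (for example, ν2 is assumed here to follow the usual C1′-C2′-C3′-C4′ convention).

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four atom positions (x, y, z)."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the two outer bonds onto the plane perpendicular to the central bond.
    k0 = _dot(b0, b1)
    k2 = _dot(b2, b1)
    v = (b0[0] - k0 * b1[0], b0[1] - k0 * b1[1], b0[2] - k0 * b1[2])
    w = (b2[0] - k2 * b1[0], b2[1] - k2 * b1[1], b2[2] - k2 * b1[2])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

# sin(36 deg) + sin(72 deg), the constant in the pseudorotation formula.
_S = math.sin(math.radians(36.0)) + math.sin(math.radians(72.0))

def pseudorotation(nu0, nu1, nu2, nu3, nu4):
    """Return (phase, amplitude) in degrees from the five ribose torsions.

    Assumed formula: tan P = ((nu4 + nu1) - (nu3 + nu0)) / (2 nu2 (sin 36 + sin 72)).
    atan2 selects the correct quadrant when nu2 is negative.
    """
    y = (nu4 + nu1) - (nu3 + nu0)
    x = 2.0 * nu2 * _S
    phase = math.degrees(math.atan2(y, x)) % 360.0
    # Equal to nu2 / cos(P), but well defined when cos(P) is zero.
    amplitude = math.hypot(y, x) / (2.0 * _S)
    return phase, amplitude
```

As a quick check, `dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))` evaluates to 90 degrees. Flagging a residue as a probable model error would then amount to comparing these values against the allowed regions of the conformational maps (6), which are not reproduced here.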

RNABase META-ANALYSIS

RNABase also contains a number of ‘meta-analyses’ of cumulative properties across the entire database (http://www.rnabase.org/metaanalysis/). Firstly, the total cumulative number of available RNA-containing structures is plotted (Fig. 1), showing that the rate of structure determination has been accelerating. It is interesting to note that the first several dozen structures were all solved by X-ray crystallography, but by the late 1990s the number of structures determined by nuclear magnetic resonance (NMR) spectroscopy nearly equaled the number determined crystallographically. However, the last two years have seen the balance again shift toward the use of crystallography for RNA structure determination.

We note that the passage of time has not resulted in any noticeable improvement in the rate of conformational outliers on Ramachandran-type plots (Fig. 2). Neither the structures determined by NMR nor those determined by crystallography have shown any consistent trend over the last two decades. Lastly, the number of conformational outliers varies with resolution for structures determined by X-ray diffraction, as expected (Fig. 3). Structures determined by NMR have outlier rates comparable to those of structures determined to 4 Å resolution by X-ray diffraction.

Figure 3. The average number of Ramachandran conformational map outliers per residue in structures solved by X-ray crystallography decreases with improving resolution (red). The average rate of outliers per residue for NMR structures (purple) is slightly worse than the overall average and is comparable to the rate for structures determined to ~4 Å resolution. The data for X-ray structures are calculated as averages over 0.5 Å windows.

RNABase ARCHITECTURE

RNABase is built on a free-software platform consisting of the Red Hat Linux operating system, the Apache web server and the PostgreSQL relational database system. In addition, a number of custom scripts have been developed in the Python scripting language to parse and analyze the structures. Lastly, PHP scripts are used to dynamically generate the interface seen at the RNABase site. The entire system was designed to be largely self-maintaining and to require minimal ongoing work from its administrators.

The data in RNABase are assembled by a Python script that parses all entries from the PDB and NDB. Because the NDB and PDB use the same nomenclature for both ribonucleotides and deoxyribonucleotides, RNABase classifies a structure as containing RNA if it has one or more A, C, T, G or U residues with an O2′ atom (O2* in PDB notation).

DISCUSSION

RNABase is an integrated database of RNA three-dimensional structures containing a number of annotations. It has been designed to facilitate access to structures of RNA molecules by structural, functional or experimental features. Furthermore, by providing interactive structure analysis, RNABase may help the quality of RNA structures show more consistent improvement in the future. We anticipate that RNABase will continue to expand and evolve. These developments will be reported on the RNABase News page (http://www.rnabase.org/news/).

ACKNOWLEDGEMENTS

Supported by grants from the NIH (GM29458) and the Mathers Foundation.

REFERENCES

1. Berman,H.M., Westbrook,J., Feng,Z., Gilliland,G., Bhat,T.N., Weissig,H., Shindyalov,I.N. and Bourne,P.E. (2000) The Protein Data Bank. Nucleic Acids Res., 28, 235–242.
2. Berman,H.M., Olson,W.K., Beveridge,D.L., Westbrook,J., Gelbin,A., Demeny,T., Shieh,S.H., Srinivasan,A.R. and Schneider,B. (1992) The nucleic acid database. A comprehensive relational database of three-dimensional structures of nucleic acids. Biophys. J., 63, 751–759.
3. Wang,Y., Addess,K.J., Geer,L., Madej,T., Marchler-Bauer,A., Zimmerman,D. and Bryant,S.H. (2000) MMDB: 3D structure data in Entrez. Nucleic Acids Res., 28, 243–245.
4. Wheeler,D.L., Church,D.M., Lash,A.E., Leipe,D.D., Madden,T.L., Pontius,J.U., Schuler,G.D., Schriml,L.M., Tatusova,T.A., Wagner,L. and Rapp,B.A. (2001) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res., 29, 11–16.
5. Reichert,J., Jabs,A., Slickers,P. and Sühnel,J. (2000) The IMB Jena Image Library of Biological Macromolecules. Nucleic Acids Res., 28, 246–249.
6. Murthy,V.L., Srinivasan,R., Draper,D.E. and Rose,G.D. (1999) A complete conformational map for RNA. J. Mol. Biol., 291, 313–327.
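As an illustration of the RNA classification rule described under RNABase ARCHITECTURE, the sketch below scans PDB-format coordinate records for an A, C, T, G or U residue that carries an O2′ atom. It is an assumption-laden example rather than the RNABase parser: the names `contains_rna` and `atom_record` are hypothetical, the column positions follow the fixed-width PDB ATOM/HETATM layout, and accepting the spelling `O2'` alongside `O2*` is an added convenience not mentioned in the text.

```python
RNA_RESIDUE_NAMES = {"A", "C", "G", "T", "U"}
# "O2*" is the notation cited in the text; "O2'" is accepted here as an assumption.
O2_PRIME_NAMES = {"O2*", "O2'"}

def contains_rna(pdb_lines):
    """Return True if any listed residue type carries an O2' atom."""
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        atom_name = line[12:16].strip()  # PDB columns 13-16
        res_name = line[17:20].strip()   # PDB columns 18-20
        if res_name in RNA_RESIDUE_NAMES and atom_name in O2_PRIME_NAMES:
            return True
    return False

def atom_record(serial, atom, res, chain, seq, x=0.0, y=0.0, z=0.0):
    """Build a fixed-column ATOM line for demonstration purposes only."""
    return (f"ATOM  {serial:5d} {atom:<4} {res:>3} {chain}{seq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")

# A guanosine with a 2'-hydroxyl oxygen counts as RNA; the same residue
# name without that atom (as in DNA) does not.
rna_like = [atom_record(1, "P", "G", "A", 1), atom_record(2, "O2*", "G", "A", 1)]
dna_like = [atom_record(1, "P", "G", "A", 1), atom_record(2, "C2*", "G", "A", 1)]
```

With these example records, `contains_rna(rna_like)` is true and `contains_rna(dna_like)` is false. The base carbonyl oxygen of uracil, named `O2`, does not match because the comparison is on the full atom name.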
