CABIOS COMMUNICATION
Vol. 12 no. 3 1996 Pages 227-229
Image library of biological macromolecules
Jürgen Sühnel
Institut für Molekulare Biotechnologie, Postfach 100813, D-07708 Jena, Germany
© Oxford University Press
Downloaded from bioinformatics.oxfordjournals.org by guest on July 14, 2011
Abstract
An Image Library of Biological Macromolecules is described, which contains image and text files related to structures of biological macromolecules. Currently, the Library has ~5000 image files of ~500 structures of biological macromolecules whose coordinates are available in the Protein Data Bank and in the Nucleic Acid Database. The entries include all RNA structures, ~70 DNA structures, 150 proteins and a few carbohydrates. The Library further contains images of amino acids, of standard and modified nucleotides and of nucleic acid model structures. Each entry consists of an annotation file with bibliographic and sequence information and possibly comments, of a color-coded distance plot and of structure images. Almost all of the images are available both in a mono and in a stereo representation. Standard procedures for generating these images were strictly avoided. Therefore, mixed rendering, coloring and labeling techniques were used extensively. Since May 1995 the Library has a growing division of images in the new Virtual Reality Modeling Language (VRML) format. The Image Library of Biological Macromolecules can be accessed via the World-Wide Web (http://www.imb-jena.de/IMAGE.html). There is a large number of structures determined by experimental and/or modeling techniques which, for some reason, are not intended to be included in the Protein Data Bank or Nucleic Acid Database. The Image Library could be a repository of these structures and of images of these and other structures of biological macromolecules, including structures which are not known at atomic detail. Authors who are willing to make images or coordinates available to the scientific community via the Image Library of Biological Macromolecules are requested to contact the author.
Introduction
Structural information on biological macromolecules is an essential requirement for our understanding of biological function and for a deliberate variation of this function by rational or evolutionary approaches. Progress in recombinant DNA technology and RNA synthesis, in X-ray and NMR instrumentation, and in computer and software technology has led to an increasing rate of accumulation of new structures. Structural data of biological macromolecules in terms of coordinate files are available from the Protein Data Bank (PDB) at Brookhaven National Laboratory (Bernstein et al., 1977) and from the Nucleic Acid Database (NDB) at Rutgers University (Berman et al., 1992). Visualization of these structures plays a central role in identifying structural motifs and even in understanding biological function. Often the structure images lead directly to new biological insights or to new hypotheses. Visual information is obviously very appropriate for 'understanding' the complex structures of biological macromolecules (Hall, 1995).
The usual way to visualize the structures of biological macromolecules is to retrieve the coordinate files from PDB or NDB over the Internet and then to use a molecular graphics package for displaying and possibly manipulating the structure. This approach is the method of choice if one is interested in a particular structure in a more detailed way. On the other hand, a rather common situation is that one would prefer to have the image of a structure directly available, without having to spend time generating the image and without needing access to special molecular graphics software.
Recently, network information systems have reached even public consciousness as a result of the dramatic growth in the use of the Internet (Schatz and Hardin, 1994). The World-Wide Web (WWW) has probably already changed the way science is done. One of its most appealing features is that, at least in principle, only one user interface is necessary for accessing almost all types of information. Using these hypermedia protocols it is now very easy to transfer images or videos over the Internet. In addition, in 1995 the first Virtual Reality Modeling Language (VRML) viewers became available.
VRML is a developing standard for describing three-dimensional scenes delivered across the Internet. It enables one to rotate, translate or zoom three-dimensional objects, e.g. biological macromolecules. Of course, this can be done much better using molecular graphics packages. However, one should realize that VRML viewers will be an integral part of the next generation of web browsers. Hence, this approach is very appropriate for making visual structure information on biological macromolecules available to a broader community within and even outside of science. Moreover, dynamic behavior will soon be included in the VRML specification, which makes this format appropriate for collaborative work. We have started to set up an Image Library, which combines the visualization of biological macromolecules with the new network tools.
The Image Library of Biological Macromolecules
Since May 1995 VRML viewers, like WebSpace, i3D or VRweb, have become generally available. More details on these viewers can be found in the VRML division of the Image Library. A more comprehensive overview of VRML is available via the VRML repository at the San Diego Supercomputer Center (URL http://rosebud.sdsc.edu/SDSC/Partners/vrml/). VRML is an open, platform-independent three-dimensional file format which is based on the Open Inventor format of Silicon Graphics Inc. The Image Library now contains an increasing number of VRML images (> 150). They were generated using either the Inventor interface of MIDASPLUS or the explorer tool developed by Silicon Graphics and now supported by the Numerical Algorithms Group Ltd, with EyeChem modules written by Omer Casher (Department of Chemistry, Imperial College, London). In both cases files in Inventor format are generated, which are then converted to the VRML format. The VRML division of the Image Library of Biological Macromolecules was one of the first applications of virtual reality modeling in biology. To the best of our knowledge it is the very first application which is not aimed at demonstration purposes alone. As already noted, one can interact with VRML images without having the coordinates of the underlying structure or molecular graphics software available. This is a brand new development which still suffers from various problems. One problem is that complex structures may yield very large datasets. Fortunately the compression rate is rather high, in many cases 90%. On the other hand, decompression takes time. Hence it may happen that less powerful computers are currently not able to manage larger VRML files. One should realize, however,
The Image Library provides images of biological macromolecules and, in addition, relevant elementary information like images of amino acids and of standard and modified nucleotides, images illustrating the definition of strand direction and of the torsional angles in nucleic acids, and DNA model conformations. It is subdivided into DNA, RNA, protein and carbohydrate divisions, where nucleic acid-protein complexes can be found in the nucleic acid parts and DNA-RNA hybrids in the RNA division. Each entry consists of a text file, of a variety of molecular structure images, and of color-coded distance plots. The text file contains general information on the structure, like the sequence and the citation of the structure report, and a listing of all image files available. The images of molecular structures are intended to provide as much information as possible. Therefore, mixed rendering, coloring and labeling techniques are extensively used. All molecular images are available in a mono and in a stereo representation. In addition to the molecular images, color-coded distance plots are included. Distance plots relate the distances between representative atoms of amino acids and/or nucleic acids in the three-dimensional structure to the sequence (Godzik et al., 1993). Hence they yield very useful structural information in addition to the three-dimensional images. Difference distance plots can be used to display subtle differences between two similar structures. The distance plots are also very useful for protein-nucleic acid complexes. One advantage is that they provide a comprehensive overview of all residues involved in the interaction region in relation to their sequence position. The filenames are created according to the following rules. First the PDB and/or the NDB code or some other name for model structures is used. The second part of the name indicates the image type.
The name of the graphics software used stands for the usual three-dimensional images of molecular structures, distc and dist3d indicate contour and three-dimensional distance plots and sec images of secondary structures. Moreover, an additional s indicates a stereo image. For example, the file 1ecl_midas_3_s.gif represents a stereo representation of the structure of the 67 kDa N-terminal fragment of Escherichia coli DNA topoisomerase with the PDB code 1ecl (Lima et al., 1994) generated using the graphics software MIDASPLUS (Ferrin et al., 1988). The following molecular graphics software packages were used: INSIGHTII (Biosym Technologies Inc.), MIDASPLUS (Ferrin et al., 1988), PRO-EXPLORE (Oxford Molecular Ltd), SETOR (Evans, 1993), VRCHEM, VMD (Theoretical Biophysics Group, Beckman Institute for Advanced Science and Technology, University of Illinois), SYBYL (Tripos Inc.). The distance matrices were calculated using our own code, and the color-coded distance plots (three-dimensional or contour) were generated using Stanford Graphics (3D Visions, Torrance, CA), Origin (MicroCal Software Inc., Northampton, MA) and Spyglass Transform (Spyglass Inc., Savoy, IL). The names of the files with bibliographic information consist simply of the PDB and/or NDB code and the extension txt. All images are stored in GIF format except for the distance plots, which are available both as GIF and PostScript files.
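The distance and difference distance matrices underlying the plots described above can be illustrated with a minimal Python sketch. This is not the code used for the Library. It assumes that each residue is represented by one atom given as an (x, y, z) coordinate in Å, and the function names and example coordinates are hypothetical.

```python
import math


def distance_matrix(coords):
    """Return the symmetric matrix of pairwise Euclidean distances.

    coords: list of (x, y, z) tuples, one representative atom per residue
    (an assumption of this sketch), in sequence order.
    """
    n = len(coords)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix


def difference_distance_matrix(coords_a, coords_b):
    """Return the element-wise difference D_a - D_b of two distance matrices.

    Both structures must have the same number of residues. No superposition
    is needed, because internal distances do not depend on orientation.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of residues")
    d_a = distance_matrix(coords_a)
    d_b = distance_matrix(coords_b)
    return [[a - b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(d_a, d_b)]


# Hypothetical three-residue structures (coordinates in Å, illustration only).
chain_a = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
chain_b = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 5.0)]

for row in difference_distance_matrix(chain_a, chain_b):
    print(" ".join(f"{value:7.3f}" for value in row))
```

Row and column indices correspond to sequence positions, so coloring the matrix entries by value gives a plot of the kind described above. In the difference matrix, entries near zero mark residue pairs whose separation is the same in both structures.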
images. Nevertheless, there are currently only four large image archives of biological macromolecules: the PDB at Brookhaven National Laboratory, Molecules R US at the National Institutes of Health, Swiss-3D-Image at the University of Geneva (Peitsch et al., 1995) and the Image Library of Biological Macromolecules at IMB Jena. The NDB provides only a few images so far and is therefore not classified as a large image archive. There is one basic difference between PDB and Molecules R US on the one hand, and Swiss-3D-Image and our Image Library on the other. The first two archives provide automatically generated images of all structures available. The disadvantage of this approach is that the images generated have a relatively low information content. Swiss-3D-Image and the Image Library, in contrast, provide very instructive images of only a relatively small number of known structures. There is almost no overlap between Swiss-3D-Image and the Image Library. Swiss-3D-Image has almost no nucleic acid structures, and for proteins the two archives are also complementary. Hence both archives together already provide a substantial number of high-quality images of biological macromolecules.
Acknowledgements
I am grateful to F.Haubensak for setting up the IMB Jena WWW server and to K.MehliQ for writing the program DIST, which generates distance matrices and difference distance matrices.
Note added in proof
The VRML division of the Image Library has been substantially extended using the new VRML interface of INSIGHTII.
References
Bernstein,F.C., Koetzle,T.F., Williams,G., Meyer,E.F., Brice,M.D., Rodgers,J.R., Kennard,O., Shimanouchi,T. and Tasumi,M. (1977) The Protein Data Bank: a computer based archival file for macromolecular structures. J. Mol. Biol., 112, 535-542.
Berman,H.M., Olson,W.K., Beveridge,D.L., Westbrook,J., Gelbin,A., Demeny,T., Hsieh,A.R., Srinivasan,A.R. and Schneider,B. (1992) The Nucleic Acid Database. A comprehensive relational database of three-dimensional structures of nucleic acids. Biophys. J., 63, 751-759.
Evans,S.V. (1993) SETOR: hardware lighted three-dimensional solid model representations of macromolecules. J. Mol. Graphics, 11, 134-138.
Ferrin,T.E., Huang,C.C., Jarvis,L.E. and Langridge,R. (1988) The MIDAS display system. J. Mol. Graphics, 6, 13-27, 36-37.
Godzik,A., Skolnick,J. and Kolinski,A. (1993) Regularities in interaction patterns of globular proteins. Prot. Engineering, 6, 801-810.
Hall,S.S. (1995) Protein images update natural history. Science, 267, 620-624.
Lima,C.D., Wang,J.C. and Mondragon,A. (1994) Three-dimensional structure of the 67K N-terminal fragment of E.coli DNA topoisomerase. Nature, 367, 138-146.
Moore,P.B. (1995) Ribosomes seen through a glass less darkly. Structure, 3, 851-852.
Peitsch,M.C., Stampf,D.R., Wells,T.N.C. and Sussman,J.L. (1995) The Swiss-3D Image collection and PDB-browser on the world-wide web. Trends Biochem. Sci., 20, 82-84.
Received on December 14, 1995; accepted on March 20, 1996
that the performance of the viewers, like WebSpace for example, has already dramatically increased since May 1995. Currently (November 1995), the Image Library has ~3000 image files of ~300 biomolecular structures. The entries include all RNA structures stored in the PDB and in the NDB, ~150 proteins, ~70 DNA structures and a few carbohydrates. The Internet address of the Image Library of Biological Macromolecules is http://www.imb-jena.de/IMAGE.html. The entries can be accessed either directly or by a text search of filenames and text files. Because of the file name rules used, it is very simple to search for special file types. The search option can also be used to find author names, for example, or even all structures with a particular modified nucleotide. Readers are asked to check out the Image Library for themselves. Therefore, example images are not included in this report. Many structures of biological macromolecules are, for some reason, not available via the PDB or the NDB. Therefore, authors are encouraged to deposit their structures at the structure databases mentioned. On the other hand, the Image Library could represent a useful addition to these databases. We are willing to include, in a relatively informal way, coordinates of published structures which are not intended to be made available via PDB or NDB for some reason. This also applies to structures obtained by modeling procedures. Further, we would like to encourage authors to make their own images of structures available to the scientific community via the Image Library, regardless of whether or not the coordinates are deposited in the PDB or the NDB. This could be interesting for authors of structure reports because most journals have restrictions on the number of color plates. Recently, electron microscopists have made important progress in reconstructing three-dimensional images of complex biological objects such as the ribosome at a resolution of 25 Å (Moore, 1995).
We expect in the near future a fruitful interplay between structure images of this type and images of building blocks whose structure is already known at atomic resolution. Therefore, it would be useful if images of biological objects not known at atomic resolution could also be included in the Image Library. Anybody interested in contributing to the Image Library of Biological Macromolecules should contact the author (e-mail: [email protected]). When starting this project in December 1993 we were not aware of similar attempts. Now we know that the PDB started to include images almost simultaneously and that the Swiss-3D-Image collection at the University of Geneva provided images of biological macromolecules even earlier. Currently, it is the rule rather than the exception that research reports on the WWW include