A vector-based method for drawing RNA secondary structure

Nov 13, 1998

BIOINFORMATICS

K.Han, D.Kim and H.-J.Kim

Abstract

Motivation: To produce a polygonal display of RNA secondary structure with minimal overlap and distortion of structural elements, with minimal search for positioning them, and with minimal user intervention.

Results: A new algorithm for automatically drawing RNA secondary structure has been developed. The algorithm represents the direction and space for a structural element using a vector and a vector space. Two heuristics are used. The first heuristic is concerned with ordering structural elements to be positioned and the second with positioning them in space. The algorithm and a graphical user interface have been implemented in a working program called VizQFolder on IBM PC compatibles. Experimental results demonstrate that VizQFolder is capable of automatically generating nearly overlap-free polygonal displays for long RNA molecules. The only distortion performed to avoid overlap is the rotation of helices, leading to efficient generation of a polygonal display without sacrificing its readability. VizQFolder is not coupled to a specific prediction program of RNA secondary structure, and thus can be used for visualizing secondary structure models obtained by any means.

Availability: The executable code of VizQFolder is available at http://automation.inha.ac.kr/∼khan. It can also be obtained from the authors upon request.

Contact: [email protected]

Introduction

An early step toward evaluating a predicted RNA secondary structure is to know what the predicted structure looks like, that is, to visualize the predicted structure in graphical form. In a mathematical sense, the secondary structure of RNA is a topological structure rather than a geometric structure. It remains unchanged under distortion, as long as the connectivity relation between nucleotides is not changed. Since RNA secondary structure is concerned with the connectivity relation, it can be described verbally, in natural language, in a very pragmatic sense. Alternatively, it can be described using a table or character string, which shows base pairs. However, such a representation does not serve as a useful aid for the evaluation and comparison of RNA secondary structure. Therefore, in visualizing RNA secondary structure, it is important to represent it so as to facilitate the evaluation and comparison.

Several programs have been developed for representing RNA secondary structure. These include a conventional representation of polygonal display (Shapiro et al., 1984; Bruccoleri and Heinrich, 1988; Matzura and Wennborg, 1996; Chetouani et al., 1997) and unconventional representations such as mountains (Hogeweg and Hesper, 1984), and circles and domes (Nussinov et al., 1978). Among these, the polygonal display, in which loops are drawn as regular polygons, gives the clearest and most compact representation, especially for long molecules. The clear and compact representation is, however, achieved at the cost of an increased computational load relative to the unconventional representations. Muller et al.’s (1993) algorithm, for example, takes exponential time to produce an overlap-free polygonal display automatically, owing to backtracking. In most drawing programs of a polygonal display, computational load is increased because of the work associated with removing overlap of structural elements, performed either by an iterative process of the programs or with user intervention. A recent algorithm (Nakaya et al., 1996) generates a polygonal display in O(NlogN) time by applying an O(NlogN) force-calculation algorithm, originally developed by Barnes and Hut (1986). Their algorithm has been implemented using a parallel programming language on a parallel computer, which is not widely available.

This paper describes a new, practical algorithm for generating a polygonal display of RNA secondary structure. The algorithm uses a vector and a vector space to determine the position of a structural element. The only distortion operation allowed to avoid overlap is the rotation of helices. The algorithm has been implemented in a working program called VizQFolder on IBM PC compatibles, which are the most widespread computers in laboratories.

System and methods

VizQFolder is written in Borland C++ Builder and runs on IBM PC-compatible computers with the Windows 95 operating system. The minimal hardware requirements are 32 Mbyte RAM and a monitor with a resolution of 640 × 480. Executable code is available at http://automation.inha.ac.kr/∼khan. It can also be obtained from the authors on request.

Oxford University Press

Algorithm

Definitions

A structural element refers to either a double-stranded part (i.e. helix) or a single-stranded part such as an internal loop, bulge loop, multiple loop or dangling end. A structural element consists of one or more structural units, each of which is a contiguous segment of a base sequence. Adjacent helices to a loop v mean helices directly connected to v. Adjacent loops to a loop v include all loops connected to v via a single helix. A seed loop of a loop v is an adjacent loop to v, which has already been positioned. A regular secondary structure is one having no bulge loop, dangling end or helices directly adjacent to each other.

We use a vector to designate the direction of a structural element to be positioned. Two non-identical vectors partition the plane into two unbounded wedge regions (see Figure 1). A vector space is used here to designate the unbounded wedge region lying between two vectors starting at a common point; so, ΛBAC denotes a vector space formed by sweeping a vector AB clockwise until it reaches AC, while ΛCAB denotes the complementary vector space. Since two non-identical vectors define two vector spaces, a middle vector is defined to indicate an intended vector space.

An open vector space refers to a potentially open and wide space, which is not obstructed by other structural elements. The open vector space is an ideal region for a structural element to be located. However, it is not always possible to position a loop in an open vector space due to the helices adjacent to its seed loop and the bases of the seed loop. Therefore, we define an allowed vector space as a realistic vector space in which a loop can be located. Considering both the open and allowed vector spaces of a loop, we actually position the loop in the direction of a feasible vector. The detailed methods for computing the vectors and the vector spaces defined above will be discussed later in this paper.

Fig. 1. Two vectors partition the plane into two vector spaces. A middle vector indicates an intended one among the two vector spaces.

Overview

Our algorithm positions a loop such that its center lies on the extension of the line containing its adjacent helix. Since the size of a loop or helix is proportional to the number of bases constituting the structural element, we do not change the size of a structural element in an attempt to resolve overlap. Instead, we rotate a helix and its adjacent loop, which appear later in the drawing priority. With this method, overlap can be avoided without resizing or distorting the earlier structural element. The visualization of RNA secondary structure is composed of four steps, each of which is discussed in detail in the following sections.

1. Regularizing a secondary structure. Transform a secondary structure into a regular one by introducing ‘artificial’ bases so that it does not contain any bulge loop, dangling end or helices directly adjacent to each other. A regularized secondary structure is stored in a data structure called an organization object.
2. Building data structures. Identify structural elements from the organization object, and construct the data structures of secondary structure object and draw list object for each of the identified structural elements.
3. Determining drawing priority. Compute the sizes of all loops and determine the drawing priorities of all structural elements, including helices. A data structure called a priority queue stores these priorities.
4. Positioning and drawing structural elements. Starting from a structural element with the highest priority, compute open and allowed vector spaces and a feasible vector. A structural element shall be positioned in the direction of the feasible vector. For each positioned structural element, compute the coordinates of its constituting bases and display them.

Regularizing a secondary structure

The drawing process is simplified if the following assumptions hold:


K.Han, D.Kim and H.-J.Kim

Fig. 2. An open vector space for a target loop is ΛEAC, bounded by left vector AE and by right vector AC.

1. Both ends of a helix are connected to a single loop or none.
2. A secondary structure consists of loops and helices only.

In order to satisfy the above assumptions, the secondary structure represented in text form is pre-processed as follows.

(a) The helix adjacent to a bulge loop has both a loop and a helix in one end, and thus violates the first assumption. A bulge loop is transformed into an internal loop by adding an artificial base on the opposite side of the bulge loop. For example, ((((((----)))---))) is transformed into (((-(((----)))---))).
(b) When a helix is directly connected to another helix, the end of the helix is another helix, rather than a loop or none. To satisfy the first assumption, an artificial base is inserted between the helices. For example, (((----)))(((----))) is transformed into (((----)))-(((----))).
(c) A dangling end violates the second assumption because it is neither a loop nor a helix. Artificial bases are added in order to pair the first and last bases. For example, ---(((----))) or ---(((----)))--- is transformed into (((---(((----)))---))).

The artificial bases introduced in this step are marked and are not actually drawn in the last step of the algorithm. The regularized secondary structure is stored in the data structure called the organization object, in which the secondary structure is partitioned into structural units.
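The three pre-processing rules can be illustrated with a short Python sketch operating directly on the bracket string. This is not the VizQFolder code: the function names, the treatment of every ')(' junction as a pair of adjacent helices, the padding of the shorter dangling end, and the default of three enclosing artificial pairs (cap_pairs) are all assumptions, chosen only so that the sketch reproduces the three examples given above.

```python
def pair_table(structure):
    """Return table[i] = index of the base paired with i, or -1 if unpaired."""
    table = [-1] * len(structure)
    stack = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            table[i], table[j] = j, i
    return table


def regularize(structure, cap_pairs=3):
    """Return (regularized structure, per-base flags marking artificial bases)."""
    table = pair_table(structure)
    insert_after = set()  # original indexes followed by one artificial '-'

    # (a) Bulge loop: a pair encloses exactly one inner pair and only one side
    #     of the loop has unpaired bases; add a base on the empty side.
    for i, j in enumerate(table):
        if j <= i:  # skip unpaired bases and closing brackets
            continue
        children, k = [], i + 1
        while k < j:
            if table[k] > k:
                children.append((k, table[k]))
                k = table[k] + 1
            else:
                k += 1
        if len(children) == 1:
            p, q = children[0]
            left, right = p - i - 1, j - q - 1
            if left == 0 and right > 0:
                insert_after.add(i)
            elif right == 0 and left > 0:
                insert_after.add(q)

    # (b) Directly adjacent helices: separate every ')(' junction (assumption).
    for k in range(len(structure) - 1):
        if structure[k] == ")" and structure[k + 1] == "(":
            insert_after.add(k)

    out, artificial = [], []
    for k, ch in enumerate(structure):
        out.append(ch)
        artificial.append(False)
        if k in insert_after:
            out.append("-")
            artificial.append(True)

    # (c) Dangling ends: pad the shorter end, then enclose everything in pairs.
    text = "".join(out)
    lead = len(text) - len(text.lstrip("-"))
    trail = len(text) - len(text.rstrip("-"))
    if lead or trail:
        pad = max(lead, trail)
        front = ["("] * cap_pairs + ["-"] * (pad - lead)
        back = ["-"] * (pad - trail) + [")"] * cap_pairs
        out = front + out + back
        artificial = [True] * len(front) + artificial + [True] * len(back)
    return "".join(out), artificial
```

The second return value plays the role of the marking mentioned above: a drawing step can skip every base whose flag is set.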

Building data structures

Structural elements such as loops and helices are identified from the regularized secondary structure and two data structures are initialized for each of the identified structural elements. They are the secondary structure object (SSO) and draw list object (DLO). While the SSO contains display device-independent information, the DLO contains device-dependent information such as the coordinates of objects. The SSO contains the indexes to the structural units of the organization object forming the structural element, indexes to adjacent structural elements, and the index to the DLO corresponding to the structural element. The DLO contains the index to the SSO, and the position of a structural element; center position and radius are maintained for a loop, while start and end positions are maintained for a helix.
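As an illustration of how the two records refer to each other, the following Python sketch models the SSO and DLO with the fields listed above. The class names, field names and the build_tables helper are hypothetical; the actual C++ classes of VizQFolder are not given in this paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class SecondaryStructureObject:
    """Device-independent description of one structural element (SSO)."""
    kind: str                      # "loop" or "helix"
    unit_indexes: List[int]        # structural units in the organization object
    adjacent: List[int] = field(default_factory=list)  # adjacent SSO indexes
    dlo_index: Optional[int] = None


@dataclass
class DrawListObject:
    """Device-dependent placement of one structural element (DLO)."""
    sso_index: int
    center: Optional[Point] = None  # loops only
    radius: Optional[float] = None  # loops only
    start: Optional[Point] = None   # helices only
    end: Optional[Point] = None     # helices only


def build_tables(elements):
    """elements: list of (kind, unit_indexes, adjacent); returns (ssos, dlos)."""
    ssos, dlos = [], []
    for index, (kind, units, adjacent) in enumerate(elements):
        ssos.append(SecondaryStructureObject(kind, list(units), list(adjacent),
                                             dlo_index=index))
        dlos.append(DrawListObject(sso_index=index))
    return ssos, dlos


# A hairpin: helix 0 (units 0 and 2) closed by loop 1 (unit 1).
ssos, dlos = build_tables([("helix", [0, 2], [1]), ("loop", [1], [0])])
```

Keeping coordinates only in the DLO means the positioning step can fill in center, radius, start and end without touching the device-independent description.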

Determining drawing priority

We determine the drawing priority of the structural elements based on the adjacency of structural elements and the radii of loops. When computing a loop radius, we assume that each base is a circle of diameter 1 and that the distance between the centers of adjacent bases is 2. Artificial bases introduced in the regularizing step are also included in computing the radius of a loop. The determination process of the drawing priority is described below. After the determination process is completed, structural elements are stored in a priority queue in decreasing order of drawing priority.

1. The largest loop is added to the priority queue.
2. Among the remaining loops, all loops adjacent to the loops in the priority queue are added to the wait queue.
3. The largest loop in the wait queue is moved to the priority queue.
4. The helices between the last loop and the remaining loops in the priority queue are added to the priority queue.
5. Repeat steps 2, 3 and 4 until all structural elements are stored in the priority queue.
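The five steps can be sketched in Python as follows. The radius formula is an assumption: it treats a loop of n bases as a regular polygon whose adjacent vertices are 2 apart, giving a circumradius of 1/sin(π/n); the paper does not state the formula it uses. The function names, the dictionary-based inputs and the alphabetical tie-breaking are likewise hypothetical.

```python
import math


def loop_radius(n_bases):
    """Circumradius of a regular n-gon with side length 2 (assumed loop model)."""
    return 1.0 / math.sin(math.pi / n_bases)


def drawing_priority(sizes, helices):
    """sizes: loop -> size; helices: helix -> (loop, loop). Returns drawing order."""
    first = max(sizes, key=sizes.get)          # step 1: largest loop
    order, placed = [first], {first}
    while len(placed) < len(sizes):
        # step 2: loops adjacent to already placed loops form the wait queue
        wait = {b if a in placed else a
                for a, b in helices.values()
                if (a in placed) != (b in placed)}
        if not wait:                           # disconnected input: stop early
            break
        nxt = max(sorted(wait), key=sizes.get)  # step 3: largest waiting loop
        order.append(nxt)
        # step 4: helices joining the new loop to loops placed earlier
        for name, (a, b) in helices.items():
            other = b if a == nxt else a
            if nxt in (a, b) and other in placed:
                order.append(name)
        placed.add(nxt)
    return order


sizes = {"A": 10, "B": 6, "C": 4, "D": 8}
helices = {"h1": ("A", "B"), "h2": ("A", "C"), "h3": ("B", "D")}
order = drawing_priority(sizes, helices)
```

In this small example loop D is drawn before loop C although C is adjacent to the largest loop, because D enters the wait queue as soon as B is placed and is larger than C.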

Positioning and drawing structural elements

For the task of searching for the space and direction of structural elements, we use two heuristics that minimize the overlap of the structural elements without increasing search effort and distortion level. The first heuristic is concerned with ordering structural elements to be placed and the second one with placing them in space.

1. Position the largest loop first. The position of a helix is automatically determined from that of a loop adjacent to it.
2. Position a loop in the direction where an open and wide space exists.

Fig. 3. The vector space in which a target loop and helix can rotate is limited by other loops, which are adjacent to the seed loop and have already been positioned. The bases of a seed loop can further narrow the allowed vector space for a target loop. The target loop shown in this example cannot be positioned in the positions marked in dotted lines due to intersection with other structural elements.

The first heuristic has already been used in a previous step for determining the drawing priority. The problem is to find an ‘open and wide’ space, which is used in the second heuristic. In the example of Figure 2, an open and wide space for a target loop exists in the positive y direction instead of the negative y direction. So, a vector space ΛBAC is preferable to ΛCAB. As mentioned earlier, a vector space called an open vector space is used to represent a potentially open space. For efficiency, we find an approximate open vector space instead of an exact open vector space.

An approximate open vector space is determined as follows. The left vector of an open vector space for a target loop is a vector starting at the seed loop of the target loop and directing toward the last loop visited in the traverse of the rightmost loop connected to the seed loop. The right vector of an open vector space is a vector starting at the seed loop and directing toward the last loop visited in the traverse of the leftmost loop connected to the seed loop. The open vector space is an unbounded wedge region between the left and right vectors.

Consider the example of Figure 2. To find the left vector of the open vector space, loop D is first visited. The only unvisited loop adjacent to D is F, so F is visited. Unvisited loops adjacent to F are E and H. Among them, E is the rightmost loop, so it is visited. There is no unvisited loop adjacent to E. Thus, E becomes the end point of the left vector of the open vector space, and AE becomes the left vector. The right vector is found in a similar way, except that the leftmost loop is chosen at each step of the traverse. In the example of Figure 2, AC becomes the right vector of the open vector space. Thus, ΛEAC is the open vector space for the target loop. ΛEAC is found instead of ΛBAC because we use an approximate open vector space instead of an exact open vector space for efficiency. Finding such an exact open vector space requires all identified vectors to be tested for intersection, and this testing increases the complexity of the algorithm, which will be discussed later.

The middle vector of an open vector space is an ideal direction for a target loop to be put. However, positioning a target loop in such a direction may not be possible if the space for a target loop is limited by adjacent helices which have been positioned already (Figure 3). The bases of the seed loop can further restrict the space for a target loop (Figure 3). Thus, we define an allowed vector space, which represents a realistic vector space in which a loop can be located. Considering both the open vector space and the allowed vector space, we determine the direction in which the target loop will actually be positioned (see Figure 4). A feasible vector represents this direction. We distinguish two cases.

Fig. 4. A target loop can be positioned in the intersected space of the open vector space and the allowed vector space.

Case 1: The allowed vector space contains the middle vector of the open vector space (Figure 5). We set the feasible vector to the middle vector of the open vector space.

Case 2: The allowed vector space does not contain the middle vector of the open vector space (Figure 6). Out of the left and right vectors of the allowed vector space, the one closer to the middle vector of the open vector space is selected as the feasible vector.

For each vector, the start position, direction and magnitude are internally maintained. If any of these components is changed, the remaining components of the vector are automatically changed. When the feasible vector for a target loop is determined, the center position of the target loop is computed as follows.

VectorFeasible.dMagnitude = LoopSeed.dRadius + StemTarget.dLength + LoopTarget.dRadius
LoopTarget.dCenterX = VectorFeasible.dStartX + VectorFeasible.dDirectionX
LoopTarget.dCenterY = VectorFeasible.dStartY + VectorFeasible.dDirectionY

The start and end positions of a target helix are computed as follows.

VectorFeasible.dMagnitude = StemTarget.dLength + LoopSeed.dRadius
StemTarget.dEndX = VectorFeasible.dStartX + VectorFeasible.dDirectionX
StemTarget.dEndY = VectorFeasible.dStartY + VectorFeasible.dDirectionY
VectorFeasible.dMagnitude = LoopSeed.dRadius
StemTarget.dStartX = VectorFeasible.dStartX + VectorFeasible.dDirectionX
StemTarget.dStartY = VectorFeasible.dStartY + VectorFeasible.dDirectionY
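The assignments above can be made concrete with a small Python sketch. It assumes that the feasible vector starts at the center of the seed loop and that its direction components are the magnitude multiplied by the cosine and sine of an angle; both points are interpretations of the text rather than documented details of VizQFolder, and the names Vector, tip and place are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Vector:
    start_x: float
    start_y: float
    angle: float            # radians, counter-clockwise from the +x axis
    magnitude: float = 1.0

    def tip(self):
        """End point of the vector; follows any change of the magnitude."""
        return (self.start_x + self.magnitude * math.cos(self.angle),
                self.start_y + self.magnitude * math.sin(self.angle))


def place(seed_center, seed_radius, helix_length, target_radius, angle):
    """Return (helix start, helix end, target loop center) along one direction."""
    feasible = Vector(seed_center[0], seed_center[1], angle)
    feasible.magnitude = seed_radius
    helix_start = feasible.tip()
    feasible.magnitude = seed_radius + helix_length
    helix_end = feasible.tip()
    feasible.magnitude = seed_radius + helix_length + target_radius
    loop_center = feasible.tip()
    return helix_start, helix_end, loop_center


# Seed loop of radius 2 at the origin, helix of length 3, target loop of radius 1.
helix_start, helix_end, loop_center = place((0.0, 0.0), 2.0, 3.0, 1.0, 0.0)
```

Because all three points lie on the same ray from the seed center, the helix always points to the center of both loops it connects, which is the property the drawing relies on.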

Once the start and end positions of a helix, and the center and radius of a loop, have been computed, displaying their bases on a display device is straightforward. The procedure for placing structural elements discussed in this section can be summarized as follows.

1. Take a target loop and a target helix from the DLO. The target helix is one that is adjacent to the target loop and its seed loop, if any.
2. Compute the open and allowed vector spaces for the target loop.
3. Based on the open and allowed vector spaces, compute the feasible vector for the target loop.
4. Position the target loop in the direction of the feasible vector.
5. Position the target helix in the direction of the feasible vector.
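Step 3 of this procedure, together with the definitions of vector space and middle vector, can be sketched in Python by representing each vector only by its direction in degrees. A vector space is then a (left, right) pair swept clockwise from left to right. This angle-based representation and all function names are assumptions made for illustration; the sketch does not compute the open or allowed vector spaces themselves.

```python
def clockwise_span(left, right):
    """Angle swept when rotating clockwise from `left` to `right` (degrees)."""
    return (left - right) % 360.0


def middle(left, right):
    """Direction of the middle vector of the space swept from left to right."""
    return (left - clockwise_span(left, right) / 2.0) % 360.0


def contains(left, right, theta):
    """True if direction theta lies inside the space swept from left to right."""
    return (left - theta) % 360.0 <= clockwise_span(left, right)


def angular_distance(a, b):
    """Smallest unsigned angle between two directions."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def feasible_direction(open_space, allowed_space):
    """Case 1: middle of the open space; Case 2: nearest allowed boundary."""
    mid = middle(*open_space)
    if contains(*allowed_space, mid):
        return mid
    return min(allowed_space, key=lambda edge: angular_distance(edge, mid))


# Open space from 150 degrees clockwise to 30 degrees: its middle vector is 90.
case1 = feasible_direction((150.0, 30.0), (120.0, 60.0))
case2 = feasible_direction((150.0, 30.0), (180.0, 120.0))
```

Swapping the two arguments of a space yields the complementary wedge, which mirrors the distinction between ΛBAC and ΛCAB: middle(150, 30) points up, while middle(30, 150) points down.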

Complexity analysis

There are O(n) structural elements in an RNA secondary structure with n bases. The first step of regularizing a secondary structure takes O(n²) time since it requires O(n) time for each structural element in the worst case. In the second step, data structures are built in O(n) time. The third step of determining drawing priority uses O(n²) time since O(n) time is required for each structural element. The final step of placing and drawing structural elements also takes O(n²) time. Thus, our algorithm has an overall O(n²) time bound. For space, it has an O(n) bound since every data structure used by the algorithm requires O(n) space.

In step 4 of our algorithm we find an approximate open vector space instead of an exact open vector space for the sake of time efficiency. Finding an exact open vector space requires all identified vectors to be tested for intersection. This intersection testing takes O(n²) time for positioning a single target loop, which would increase the overall complexity of the algorithm from O(n²) to O(n³).

Fig. 5. The case in which the middle vector of the open vector space exists within the allowed vector space. The middle vector becomes the feasible vector.

Implementation

VizQFolder is written in Borland C++ Builder and runs on IBM PC-compatible computers with the Windows 95 operating system. VizQFolder is implemented in two parts: a system-independent part and a system-dependent part. Computing a polygonal drawing of RNA secondary structure is the core part of VizQFolder and is independent of system environments, such as operating system and compiler. The core part is written in the standard C and C++ programming languages. The program for displaying the computed drawing on a monitor is system dependent and written using virtual functions of C++. Porting VizQFolder to other systems requires modifying the system-dependent part only.

Fig. 6. The case in which the middle vector of the open vector space does not exist within the allowed vector space. The left vector of the allowed vector space becomes the feasible vector in these examples.

Input

The program VizQFolder takes as input an ASCII file in pairing format. The pairing format describes a secondary structure in one of the following styles. Both styles put no restriction on the line length.

(i) Pairing format I

1 CAGgUCUCUCUGGUUUUAGACCAGA
1 -----(((((((((-----))))))
26 UcUGAG
26 ---)))

(ii) Pairing format II

CAGgUCUCUCUGGUUUUAGACCAGAUcUGAG
-----(((((((((-----))))))---)))

The secondary structure data in the CT or the region table format as produced by MFOLD (Jaeger et al., 1989) can also be given to VizQFolder after converting them into the pairing format using an accompanying program in the VizQFolder software.

Fig. 7. Interface of VizQFolder with four windows: option window, history window, standard view window and outline view window (clockwise from the top right).
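To show what information the pairing format carries, the following Python sketch converts the two lines of pairing format II into a list of base pairs with 1-based positions. It is an illustration only: the function name and the error handling are hypothetical and do not describe the input routines of VizQFolder. Data in pairing format I could be handled the same way after concatenating its numbered chunks.

```python
def parse_pairing_format(sequence, structure):
    """Return sorted (i, j) base pairs, 1-based, from a bracket string."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure lines differ in length")
    stack, pairs = [], []
    for position, symbol in enumerate(structure, start=1):
        if symbol == "(":
            stack.append(position)          # 5' partner waits for its mate
        elif symbol == ")":
            if not stack:
                raise ValueError("unmatched ')' at position %d" % position)
            pairs.append((stack.pop(), position))
        elif symbol != "-":
            raise ValueError("unexpected symbol %r" % symbol)
    if stack:
        raise ValueError("unmatched '(' at position %d" % stack[-1])
    return sorted(pairs)


sequence = "CAGgUCUCUCUGGUUUUAGACCAGAUcUGAG"
structure = "-----(((((((((-----))))))---)))"
pairs = parse_pairing_format(sequence, structure)
```

For the example above the result contains nine pairs, from (6, 31) on the outside to (14, 20) closing the hairpin loop; the three unpaired bases at positions 26 to 28 form a bulge loop.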


Output

As output, VizQFolder produces two kinds of drawings (see Figure 7). In the standard polygonal view, the RNA secondary structure is displayed in the form where bases, symbols between paired bases and base numbering are specified. The outline view displays the structure in the form of a backbone in which loops are replaced by circles and helices by line segments.

User interface and results

VizQFolder allows the user to generate several structure models for different RNA molecules without leaving a VizQFolder session. Each generated model is assigned to a separate window, which has its own window identity assigned by the user for later reference (see the option window of Figure 7). This facility is convenient, particularly when comparing several models to search for structural motifs.


Fig. 8. Standard view of the secondary structure model for Tetrahymena intervening sequence (IVS), consisting of 414 bases. The secondary structure data were taken from Cech et al. (1983). No editing has been performed on the drawing.

Other convenient features of the VizQFolder user interface are as follows:

1. A history window is produced, which keeps a record of operations of a VizQFolder session.
2. The secondary structure data can be read either from a keyboard or from a file in flexible form. Data given in either form can be edited later.
3. The secondary structure data can be shown while they are being visualized.

Examples of secondary structure models drawn by VizQFolder are given in Figures 8–11. The total response time, measured by Zprofiler (Baars, 1998), for drawing the secondary structure of Tetrahymena intervening sequence (Cech et al., 1983) presented in Figure 8 is ∼390 ms on a Pentium MMX 200 MHz processor. A large part of the response time is ascribed to creating a bitmap and drawing, rather than to the computation for positioning.

Figure 9 shows an outline polygonal display of the secondary structure of mouse β globin RNA, consisting of 1830 nucleotides. The secondary structure data were obtained using MFOLD (Jaeger et al., 1989). While the drawing program presented in Shapiro et al. (1984) produces a polygonal display with several overlaps for a subsequence of the same RNA molecule (see Figure 4B of Shapiro et al., 1984), VizQFolder produces an overlap-free drawing for the entire sequence of the molecule. Both Figures 8 and 9 are the original drawings generated by VizQFolder with no user intervention for modification.

Fig. 9. Outline view of the secondary structure model for the entire sequence of mouse β globin RNA, consisting of 1830 bases (sequence accession number V00722). The structure is one of the suboptimal structures predicted by MFOLD (Jaeger et al., 1989). No editing has been performed on the drawing.

VizQFolder provides an interactive editing facility for manually fixing drawings when overlaps occur. The user can drag a loop to any position within a window using the left mouse button. Every helix connected to the moved loop is automatically adjusted. The user can also rotate a helix with respect to a loop using the right mouse button. Editing preserves the orientation of a helix, and the size and shape of a loop. That is, a helix always points to the center of its adjacent loop, and a loop takes the shape of a circle with size proportional to its number of bases. This editing facility is available both in the standard and the outline displays.

Figure 10a shows an outline polygonal display of the secondary structure of Haloferax volcanii 16S rRNA with 1472 nucleotides (Gutell, 1994). The drawing contains several overlaps, as shown in Figure 10a. Figure 10b shows the same structure after removing the overlaps by two successive rotations about the pivot points marked in Figure 10a. Figure 11 shows the outline view of the secondary structure model for the entire sequence of HIVRF, consisting of 9128 nucleotides. The secondary structure was predicted by STRUCTURELAB (Shapiro and Kasprzak, 1996). Overlaps in the initial drawing were removed using the editing facility of VizQFolder.

Discussion

Experimental results of VizQFolder have demonstrated that polygonal displays of secondary structure can be automatically generated with minimal distortion of structural elements and minimal user intervention. The only distortion performed to avoid overlap was the rotation of a helix, thus leading to efficient generation of a polygonal display without sacrificing its readability. This was possible mainly due to the two heuristics discussed earlier. The first heuristic is concerned with positioning order and the second with positioning method, i.e. (1) Position loops in decreasing order of their sizes; (2) Position a structural element based on both its ideal and realistic vector spaces. These heuristics help reduce the search effort for positioning structural elements, and the chance of repositioning them due to overlap.

A drawing program (hereafter referred to as Shapiro’s program) developed by Shapiro et al. (1984) also attempts to produce non-overlapping polygonal displays of RNA secondary structures by rotating helices. Shapiro’s program was later incorporated into an RNA structure analysis system called STRUCTURELAB (Shapiro and Kasprzak, 1996). Another program called NAVIEW (Bruccoleri and Heinrich, 1988) contains some improvements over Shapiro’s program. Both Shapiro’s program and NAVIEW are based on the radial drawing (Shapiro et al., 1984), which is in turn based on the circle graph (Shapiro et al., 1984). The radial drawing and the circle graph have a nice mathematical property in that they have precise definitions and they guarantee non-overlapping drawings. However, neither Shapiro’s program nor NAVIEW guarantees non-overlapping drawings. VizQFolder is different from these programs in several ways:


Fig. 10. (a) Initial outline view of the secondary structure model for Haloferax volcanii 16S rRNA (sequence accession number K00421). The secondary structure data were taken from Gutell (1994). (b) Outline view of the secondary structure model for H.volcanii 16S rRNA after moving the loops marked in (a).

1. While Shapiro’s program and NAVIEW have been developed on the basis of the radial drawing, development of VizQFolder is independent of the radial drawing.
2. NAVIEW attempts to avoid overlaps of a polygonal display by modifying both the size and shape of loops while preserving the directions of helices of the radial drawing. Both VizQFolder and Shapiro’s program avoid overlaps only by rotating helices. Shapiro’s program tries to orient a helix to the direction of its radial vector, which is directly determined by the bases of the helix itself (the radial vector of a helix is a perpendicular bisector of parallel chords connecting the bases of the helix in the circle graph). VizQFolder positions a helix in the direction of the feasible vector, which is either the middle vector of the open vector space or a boundary vector of the allowed vector space. That is, the direction of a helix computed by VizQFolder is not determined by the bases of the helix itself, but by its neighboring structural elements.
3. The appearance of a drawing produced by VizQFolder is different from those produced by the other programs. In a drawing by VizQFolder, a loop always takes the shape of a circle and the size of a loop or a helix is proportional to the number of its bases. A helix connected to a loop, including a multiple loop, always points to the center of the loop. A multiple loop drawn by NAVIEW is often distorted in shape, and may contain extruded single strands.
4. While NAVIEW and Shapiro’s program run on a workstation or minicomputer, VizQFolder runs on a personal computer, which has less computing power than a workstation.
5. While NAVIEW provides no editing facility, VizQFolder and Shapiro’s program provide an interactive editing facility on the structure in terms of structural elements.

Another strength of VizQFolder is that it can draw RNA secondary structure predicted by any method as long as the input to VizQFolder is in proper formats, which are flexible, as described earlier. Many drawing programs, however, do not have this feature. RNAdraw (Matzura and Wennborg, 1996), for example, is tightly coupled to a thermodynamic prediction program, and cannot be used for drawing secondary structure predicted by other methods. Some drawing programs, such as CARD (Winnepenninckx et al., 1995) and manual encoding in RNA_d2 (Perochon-Dorisse et al., 1995), are not coupled to prediction methods, but they require every structural element to be entered directly from the keyboard rather than from a file, which is a tedious process for long RNA molecules.

In the current version of VizQFolder, standard polygonal displays can be generated for RNA with up to ∼700 nucleotides. This limitation is not due to the memory requirement of VizQFolder, but mainly due to the maximum bitmap size supported by a graphic device driver. For example, there is no difficulty in computing the drawing of HIVRF with 9128 nucleotides, as shown in Figure 11. Its standard view could not be displayed on a monitor due to the bitmap size limitation, but the outline view presented in Figure 11 shows that the drawing has been computed.

Fig. 11. Outline view of the secondary structure model for the entire sequence of HIVRF, consisting of 9128 nucleotides. The secondary structure was predicted by STRUCTURELAB (Shapiro and Kasprzak, 1996). Overlaps in the initial drawing were removed using the editing facility of VizQFolder.

For RNA with many multiple loops, VizQFolder tends to generate a drawing with overlap, which can be eliminated with a simple moving operation provided by VizQFolder. This kind of overlap occurs for the following reasons.

1. The algorithm computes an approximate open vector space instead of an exact open vector space for efficiency, as discussed earlier.
2. When computing the allowed vector space for a target loop, the algorithm does not consider the radii of the target loop and other loops.
3. There is no post-processing of a drawing. Moreover, there is no backtracking during the drawing process. Once a structural element is positioned and drawn, it is neither repositioned nor redrawn.

VizQFolder is being expanded to eliminate this kind of overlap and to display larger standard polygonal displays. Although the current implementation of VizQFolder does not produce completely overlap-free polygonal displays for arbitrarily long RNA molecules, its algorithm for positioning and drawing structural elements using vector and vector space would be useful as a basis for developing various visualization methods of RNA structure.

Acknowledgements

We thank Dr Bruce A.Shapiro for providing the secondary structure data of HIVRF. This work has been supported by the Korea Science and Engineering Foundation (KOSEF) under grant 961-0911-062-2.

References

Baars,A. (1998) Zprofiler, a Delphi component for high-resolution timing (Version 2.20). Published electronically on the Internet. ftp://www6.uniovi.es/pub/delphi/ftp/d20free/z_prof.zip
Barnes,J. and Hut,P. (1986) A hierarchical O(NlogN) force-calculation algorithm. Nature, 324, 446–449.
Bruccoleri,R.E. and Heinrich,G. (1988) An improved algorithm for nucleic acid secondary structure display. Comput. Applic. Biosci., 4, 167–173.
Cech,T.R., Tanner,N.K., Tinoco,I., Weir,B.R., Zuker,M. and Perlman,P.S. (1983) Secondary structure of the Tetrahymena ribosomal RNA intervening sequence: Structural homology with fungal mitochondrial intervening sequences. Proc. Natl Acad. Sci. USA, 80, 3903–3907.
Chetouani,F., Monestié,P., Thébault,P., Gaspin,C. and Michot,B. (1997) ESSA: an integrated and interactive computer tool for analyzing RNA secondary structure. Nucleic Acids Res., 25, 3514–3522.
Gutell,R.R. (1994) Collection of small subunit (16S- and 16S-like) ribosomal RNA structures: 1994. Nucleic Acids Res., 22, 3502–3507.
Hogeweg,P. and Hesper,B. (1984) Energy directed folding of RNA sequences. Nucleic Acids Res., 12, 67–74.
Jaeger,J.A., Turner,D.H. and Zuker,M. (1989) Improved predictions of secondary structures for RNA. Proc. Natl Acad. Sci. USA, 86, 7706–7710.
Matzura,O. and Wennborg,A. (1996) RNAdraw: an integrated program for RNA secondary structure calculation and analysis under 32-bit Microsoft Windows. Comput. Applic. Biosci., 12, 247–249.
Muller,G., Gaspin,Ch., Etienne,A. and Westhof,E. (1993) Automatic display of RNA secondary structures. Comput. Applic. Biosci., 9, 551–561.
Nakaya,A., Taura,K., Yamamoto,K. and Yonezawa,A. (1996) Visualization of RNA secondary structures using highly parallel computers. Comput. Applic. Biosci., 12, 205–211.
Nussinov,R., Pieczenik,R., Griggs,G. and Kleitman,J. (1978) Algorithms for loop matching. SIAM J. Appl. Math., 35, 68–82.
Perochon-Dorisse,J., Chetouani,F., Aurel,S. and Iscolo,N. (1995) RNA_d2: a computer program for editing and display of RNA secondary structures. Comput. Applic. Biosci., 11, 101–109.
Shapiro,B.A. and Kasprzak,W. (1996) STRUCTURELAB: A heterogeneous bioinformatics system for RNA structure analysis. J. Mol. Graphics, 14, 194–205.
Shapiro,B.A., Maizel,J., Lipkin,L.E., Currey,K. and Whitney,C. (1984) Generating non-overlapping displays of nucleic acid secondary structure. Nucleic Acids Res., 12, 75–88.
Winnepenninckx,B., Van de Peer,Y., Backeljau,T. and De Wachter,R. (1995) CARD: a drawing tool for RNA secondary structure models. BioTechniques, 16, 1060–1063.
