Proc. Natl. Acad. Sci. USA Vol. 82, pp. 5337-5341, August 1985 Biophysics

Detection of a fundamental modular format common to transfer and ribosomal RNAs: Second-order spectral analysis
(evolution/molecular coding/covariant Fourier resonance analysis)

APOLINARIO D. NAZAREA†‡, DAVID P. BLOCH§¶, AND ANNE C. SEMRAU‖
†Center for Studies in Statistical Mechanics, and Departments of §Botany and ‖Zoology, University of Texas at Austin, Austin, TX 78712; and ‡SDR Intergroup, Waldstrasse 37/H.2, D-1000 West Berlin 21, Federal Republic of Germany

Communicated by Stuart A. Rice, November 26, 1984

ABSTRACT Second-order spectral analysis is used to detect rigorously and to characterize the principal periodicities in the positions of conserved sequences common to tRNAs and rRNAs. It is shown that the shared periodicity having the largest spectral amplitude is 9, followed by 8 and 10, thus forming a closed triad of significant multiplets centered at 9 bases. This conclusion is proposed to reflect a closed triadic set of fundamental tandem repeat lengths in a class of ancestral macromolecules possessing a restricted sequence symmetry. The terms "remanent" and "archeomodular" are used to describe a relic modular format, traces of which are shown here to persist despite the changes that have occurred in the primary structures of ribonucleic acids during the course of their evolution.

Searches for matching sequences using paired combinations of RNAs in which one member is a transfer RNA (tRNA) and the other is a ribosomal RNA (rRNA) revealed that a large minority of the pairs exhibit one or more matches whose expected number based on coincidence is 1 in 10 or lower (1). The matching sequences are distributed along the molecules, having no apparent correlation with structure. Where each member of the pair comes from a different organism, for example a tRNA from yeast and the rRNA from Escherichia coli, the matches cannot be attributed to selection in a common cellular environment (ref. 2; unpublished work). The matches were interpreted as representing conserved sequences having origins in an ancestral molecule common to both tRNAs and rRNAs (perhaps all classes of RNA). Common ancestry is the usual interpretation given similar sequences in macromolecules (3, 4) and, occasionally, similar higher-order structures (5), and it is one of the conditions defining the term "homology" in an evolutionary context. Alignment of the homologies using a standard frame of reference revealed a pattern (6), suggesting conservation of the positions of the homologies as well as the sequences. The pattern was indicated by a periodicity in spacing. In the present work, we attempt to define the periods by using covariant spectral analysis.

MATERIALS AND METHODS

Identification and Placement of the Homologies. The details of the searches have been described (1, 2, 6). The sequences of tRNAs from E. coli, Halobacterium volcanii, and yeast were taken from Sprinzl and Gauss (7). Those for bovine mitochondria were derived from the genome (8). The rRNA sequences for E. coli and yeast were taken from Zwieb et al. (9) and that from H. volcanii was from Gupta et al. (10).

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

Comparisons were done using the Los Alamos routines (11) with an expected number of 0.1 as a cut-off in screening the matches. Rewards and penalties of -1, 2, and 3 for matching bases, mismatches, and gaps, respectively (11), together with the cut-off point, assured low representation of matches containing large numbers of gaps. (For discussion of the gap problem, see refs. 12 and 13.) The beginning and end bases of the homologies were aligned along Cartesian axes, using the positions of the E. coli small subunit rRNA on the abscissa and a consensus tRNA structure (7) along the ordinate.

The ribosomal RNAs of different species share common features of secondary structure and, to a limited extent, common primary structure (9). This permits the positions of homologies identified in the rRNA of one species to be described in terms of the positions in another. The position of a sequence in yeast rRNA, for example, can be approximated on the E. coli rRNA using secondary structure as a guide, and then pinpointed by using nearby conserved primary strings as reference points (see Fig. 1).

Spacings between the homologies were determined by extending a 45° line from the 5' base of a homology to the line's intersection on the abscissa (6) (see Fig. 2). In one instance in which the 3' end could be located more accurately than the 5' end, the 45° line was extended from the 3' end. In three cases in which the homologies were interrupted by insertions (or deletions), the uninterrupted sections of the homologies were used in extending the 45° lines. Of 133 homologies identified, 123 could be located. The points of intersection on the abscissa provided the data for the spectral analyses. These data, extended to a total length of 1600, will be referred to as $\eta_a$, with $a = 0, 1, \ldots, 1599$, where $\eta_a$ is the number of homologies whose intercept lies at position $a$.

Second-Order Spectral Analysis.
Our purpose is to determine whether positional length modularity can be rigorously inferred from the data. A strong, spectrally resolvable characteristic repeat length in the positioning of the homologies would be indicative of fundamental modularity. The well-known method of second-order spectral analysis, as applied to data series (14, 15), is a straightforward numerical exercise, but a few comments are useful for putting our results in perspective. Spectral analysis is a conceptually well-defined and rigorous technique (16, 17). The ordered integers that represent the components of a generalized label set (or $\lambda$ set) form an additive Abelian group. The elements of the multiplicative character group (18, 19) of such an integral $\lambda$ set, which form an orthogonal basis for the transformation of the Abelian group to the complex numbers, are precisely the fundamental roots of unity, $\exp(-i\omega\lambda)$. These are orthogonal functions in the integral argument $\lambda$. Thus, when we write the discrete Fourier transform of the $\lambda$-covariance (14, 15) of our data, $\rho_{\eta\eta}(\lambda) = \mathrm{cov}(\eta_{a+\lambda}, \eta_a)$ with $|\lambda| < m$, this Fourier transform $\Phi_{\eta\eta}(\omega)$ is given by

$$\Phi_{\eta\eta}(\omega) = (2\pi)^{-1} \sum_{|\lambda| < m} g(\lambda)\, \rho_{\eta\eta}(\lambda)\, \exp(-i\omega\lambda) \qquad [1]$$

in the frequency $\omega$, with the prefactor $g(\lambda)$ a well-defined function (see below). In Eq. 1, we are in effect exploiting the orthogonality of the character group of the $\lambda$ set. With the summation over $|\lambda| < m$, we are implicitly assuming a toroidally truncated data condition. In a Fourier orthogonal decomposition that gives rise to the complex-valued spectral density function $\Phi_{\eta\eta}(\omega)$, the trigonometric components in the spectral line shape are orthogonal resonances of the covariance function. The second-order spectral density line shape is effectively a complex-valued function that pinpoints the independent frequency or frequencies that contribute most significantly to the observed mean-square covariance of the data. Eq. 1 forms the basis of second-order spectral analysis, and it is one for which the Parseval-Plancherel theorem, under a given set of data conditions, assures uniqueness of the spectral line shape corresponding to the $\lambda$-covariance of the homology series.

Our homology data $\eta$ were first rescaled to subtract the mean and normalize the covariance function $\rho_{\eta\eta}(\lambda)$, with $|\lambda| < m$ and $m = 1600$. In the numerical calculations, the prefactor $g(\lambda)$ is a convergence factor for finite data length (a lag or $\lambda$-window, better known as the apodization function in Fourier transform spectroscopy); for our purposes, we have used a Parzen-type window (14). The numerical procedure used for the calculation of the spectral line shapes was a discrete IMSL FFT (fast Fourier transform) subroutine. We have sampled our data (equally spaced in the $a$ dimension) at the Nyquist rate, so that the frequency axes of the spectral line shapes are normalized. This means that a doublet resonance, which is the shortest repeat (or wave number) that is physically possible under Nyquist rate sampling, must lie at the farthest end of the (normalized) frequency axis. In general, the repeat length (or period) corresponding to a resolved resonance occurring at $\omega_N$ is given by $r_N = \omega_N^{-1}$.

¶To whom reprint requests should be addressed.
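A lag-windowed covariance spectrum of the kind written in Eq. 1 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the IMSL routine used for the paper: the function names are hypothetical, the covariance is the ordinary biased sample estimate rather than a toroidal (circular) one, the sum is evaluated directly instead of by FFT, and the frequency is expressed in cycles per base so that the repeat length of a peak is simply its reciprocal. The test series is synthetic.

```python
import math


def parzen_window(lag, m):
    """Parzen lag window g(lag) with truncation point m."""
    u = abs(lag) / m
    if u <= 0.5:
        return 1.0 - 6.0 * u ** 2 + 6.0 * u ** 3
    if u <= 1.0:
        return 2.0 * (1.0 - u) ** 3
    return 0.0


def normalized_autocovariance(eta, m):
    """rho[lag] for lag = 0..m-1, mean-subtracted and scaled so rho[0] = 1."""
    n = len(eta)
    mean = sum(eta) / n
    x = [v - mean for v in eta]
    c0 = sum(v * v for v in x) / n
    return [sum(x[a + lag] * x[a] for a in range(n - lag)) / (n * c0)
            for lag in range(m)]


def covariance_spectrum(eta, m, n_freq):
    """Amplitude of Eq. 1 on the grid f_k = k / (2 * n_freq), k = 0..n_freq.

    f is the normalized frequency in cycles per base (omega = 2*pi*f).
    Because rho is even in the lag, the sum over |lag| < m reduces to a
    cosine series.
    """
    rho = normalized_autocovariance(eta, m)
    weights = [parzen_window(lag, m) * rho[lag] for lag in range(m)]
    spectrum = []
    for k in range(n_freq + 1):
        f = k / (2.0 * n_freq)
        total = weights[0] + 2.0 * sum(
            weights[lag] * math.cos(2.0 * math.pi * f * lag)
            for lag in range(1, m))
        spectrum.append((f, abs(total) / (2.0 * math.pi)))
    return spectrum


# Synthetic check (not the homology data): a series with one 9-base period.
synthetic = [1.0 + math.cos(2.0 * math.pi * a / 9.0) for a in range(180)]
spec = covariance_spectrum(synthetic, m=90, n_freq=180)
peak_f, peak_amp = max(spec[1:], key=lambda pair: pair[1])  # skip f = 0
peak_repeat = 1.0 / peak_f
```

For the synthetic series the largest peak falls near a normalized frequency of 1/9, that is, a repeat length near 9 bases. The Parzen window is chosen here because the text names it; any other lag window would slot into `parzen_window` without changing the rest of the sketch.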
We present spectral results only for positive frequencies, since we are dealing with the data set $\eta$, which is a real array. For physical reasons, resonances corresponding to integral and nearly integral repeat lengths, here termed "multiplets," are of particular interest.

RESULTS

Fig. 1 shows a pair of homologies from yeast and E. coli, their locations, and values indicating their expected numbers based on chance. Fig. 2 shows a section of a graph in which the homologies are plotted along the Cartesian axes represented by the positions of the bases in the reference standards.

FIG. 1. Sequence homology between E. coli tRNA^Val (upper left) and yeast 18S rRNA (lower right). Expected number of matches at this level of quality, based on coincidence, is 0.01. Matching bases are shown in bold letters. Conserved primary sequences on the E. coli and yeast small subunit rRNAs used to locate the corresponding regions in the two molecules are enclosed in boxes. (Insets) More comprehensive views of the secondary structure of the relevant portions of the two rRNAs, from Zwieb et al. (9). The regions of the homologies are highlighted. The numbers of the bases of corresponding regions in the two rRNAs, identified by the homology shared by the yeast rRNA, are indicated by the dashed lines.

To estimate the principal resonances from these data, the rescaled $\eta$ was segmented into lengths that are integral powers of 2, and the covariant spectral density over each segment was calculated in accordance with Eq. 1, suitably scaled. The average spectral density was evaluated for the segmented data, giving equal weight to each (segmental) covariance spectrum. We present in Fig. 3 the line-shape plots of the modulus (magnitude squared) of the covariant Fourier transform $|\Phi_{\eta\eta}(\omega)|$, that is, the amplitude spectrum, for four separate estimates, corresponding to segmentation 7, 4, 2, 1, respectively. In the case of Fig. 3a, for instance, the labeling indicates that the rescaled data $\eta$ was first extended to a length of 1792 with zero padding equally placed at both ends, then broken up into seven separate segments. The ensemble-averaged amplitude spectral line shape was calculated over the seven segments, each containing 256 data points. (We lack space to show the averaged covariance spectrum for data segmentation equal to 13, but the results for this case are given in Table 1.) We also made a numerical calculation for the case of no segmentation (and no ensemble averaging), with the entire homology data treated as one data set (Fig. 3d).
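The padding, segmentation, and equal-weight averaging just described can be sketched as follows. This is a minimal illustration, not the authors' code: all function names are hypothetical, the split of any odd padding remainder between the two ends is an assumption, and `dft_amplitude` is only a stand-in for the windowed covariance spectrum of Eq. 1, so that the fragment stays self-contained.

```python
import cmath


def pad_and_segment(data, seg_len, n_seg):
    """Zero-pad data equally at both ends to seg_len * n_seg, then split."""
    total = seg_len * n_seg
    if total < len(data):
        raise ValueError("segments do not cover the data")
    left = (total - len(data)) // 2
    right = total - len(data) - left
    padded = [0.0] * left + list(data) + [0.0] * right
    return [padded[i * seg_len:(i + 1) * seg_len] for i in range(n_seg)]


def dft_amplitude(segment):
    """Stand-in amplitude spectrum: |DFT| at frequencies k/n, k = 0..n//2."""
    n = len(segment)
    return [abs(sum(v * cmath.exp(-2j * cmath.pi * k * a / n)
                    for a, v in enumerate(segment)))
            for k in range(n // 2 + 1)]


def average_spectra(spectra):
    """Equal-weight average of spectra that share one frequency grid."""
    count = len(spectra)
    return [sum(values) / count for values in zip(*spectra)]


def segment_averaged_spectrum(data, seg_len, n_seg, spectrum_fn=dft_amplitude):
    """Average spectrum_fn over the zero-padded segments of data."""
    segments = pad_and_segment(data, seg_len, n_seg)
    return average_spectra([spectrum_fn(seg) for seg in segments])


# The 256:7 case of the text: 1600 points padded to 1792, seven segments of 256.
placeholder = [1.0] * 1600  # placeholder series, not the homology data
segments = pad_and_segment(placeholder, seg_len=256, n_seg=7)
```

With 1600 data points and seven segments of 256, the padding comes to 96 zeros at each end. Passing a different `spectrum_fn` is how a lag-windowed estimate would be substituted for the stand-in.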

FIG. 2. Positions of the homologies relative to a standard frame of reference. Abscissa, numbered positions on the E. coli small subunit rRNA. Ordinate, bases on the standard tRNA, numbered according to the notation of Sprinzl and Gauss (7). The three characters identifying the sources of the homology denote, in order, the source of the tRNA, its amino acid (identified by single-letter code), and the source of the small subunit rRNA. E, E. coli; C, coliphage; Y, yeast; H, H. volcanii; B, bovine mitochondria. Dotted lines provide the points of intersection with the abscissa that are used for the spectral analysis. This figure shows a part of the data, i.e., $a$ = 1035-1120.
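The construction of the intercept series from plots such as Fig. 2 can be sketched in Python. The fragment is an illustration only, not the authors' program. It assumes that each homology is summarized by the coordinates of its 5' base (rRNA position, tRNA position), that the 45° line has slope +1 so that its abscissa intercept is the rRNA position minus the tRNA position, and that intercepts falling outside 0-1599 are simply dropped. The function names and the sample coordinates are hypothetical.

```python
SERIES_LENGTH = 1600  # total length of the data series stated in the text


def abscissa_intercept(rrna_pos, trna_pos):
    """Intercept on the abscissa of a 45-degree line through the point.

    Assumes a slope of +1: descending trna_pos units on the ordinate moves
    back trna_pos units along the abscissa.
    """
    return rrna_pos - trna_pos


def intercept_counts(homologies, length=SERIES_LENGTH):
    """Return eta, where eta[a] is the number of homologies with intercept a."""
    eta = [0] * length
    for rrna_pos, trna_pos in homologies:
        a = abscissa_intercept(rrna_pos, trna_pos)
        if 0 <= a < length:  # assumption: out-of-range intercepts are ignored
            eta[a] += 1
    return eta


# Hypothetical 5' coordinates (rRNA position, tRNA position), not paper data.
example_homologies = [(1100, 20), (1089, 9), (1120, 31), (1060, 25)]
example_eta = intercept_counts(example_homologies)
```

Two of the hypothetical homologies share the intercept 1080, so that position carries a count of 2. It is the spacing between such occupied positions that the spectral analysis examines.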


FIG. 3. Covariant resonance line shapes of second-order amplitude spectra for various homology data segmentations, e.g., 7, 4, 2, 1. Abscissa, normalized frequency. Because of lack of space, the spectrum corresponding to segmentation 13 is not shown, but use is made of it in Table 1. In all cases, a Parzen-type covariance lag window (apodization function) has been used. Normalized frequencies of the resonance peaks and their corresponding repeat lengths [in units of number of (ribonucleotide) bases] are given here to at least three decimal places, but the inherent resolvabilities for various segmentation lengths differ in accordance with the theoretical arguments leading to Eq. 2. The locations of benchmark line shape frequencies defining the single (ribonucleotide)-base resolution limit as well as the 10 (ribonucleotide)-base resolution limit are indicated. The definitions of 1° peaks and 0° peaks in the resonance line shape conform to the definitions in the text.
(a) n = 256. 1°(256): $\omega_N$ = 0.230, $r_N$ = 4.35 (maximum repeat length resolution = 0.074); 0°(256): $\omega_N$ = 0.004, $r_N$ = 250 (maximum repeat length resolution = 244). The single-base resolution limit is at $\omega_N$ = 0.0625; the 10-base resolution limit is at $\omega_N$ = 0.0198.
(b) n = 512. 1°(512): $\omega_N$ = 0.230, $r_N$ = 4.35 (maximum repeat length resolution = 0.037). The single-base resolution limit is at $\omega_N$ = 0.0442; the 10-base resolution limit is at $\omega_N$ = 0.0140.
(c) n = 1024. 1°(1024): $\omega_N$ = 0.111, $r_N$ = 9.01 (maximum repeat length resolution = 0.079); 0°(1024): $\omega_N$ = 0.005, $r_N$ = 200 (maximum repeat length resolution = 39.1). The single-base resolution limit is at $\omega_N$ = 0.0312; the 10-base resolution limit is at $\omega_N$ = 0.0099.
(d) n = 2048. 1°(2048): $\omega_N$ = 0.236, $r_N$ = 4.24 (maximum repeat length resolution = 0.009). The single-base resolution limit is at $\omega_N$ = 0.0221; the 10-base resolution limit is at $\omega_N$ = 0.0070.

In what follows, we will adopt the following definitions: (i) The principal resonance is the largest amplitude peak whose corresponding repeat length (see below) is maximally resolvable to at least 1 unit length. The principal resonance will be indicated in Fig. 3 by 1°. (ii) There will be resonances that are of high spectral amplitude but whose corresponding resolvability is maximally much coarser than unit length. These will be indicated by 0°. When confusion can arise, these labels will be parametrized by the numerical segmentation used, e.g., 1°(512) or 0°(256).


Table 1. Summary of stages in spectral analysis and resolvability calculation

Segmentation*   Resolution†   Frequency‡   Length§   Resolvability¶
128:13 SG       0.008         0.11         9.10      0.65
256:7 SG        0.004         0.23         4.35      0.07
512:4 SG        0.002         0.23         4.35      0.04
1024:2 SG       0.001         0.11         9.01      0.08
2048:1 SG       0.0005        0.24         4.24      0.01

*Segmentation length, n, and number of segments (SG).
†Maximally resolvable (normalized) frequency, $|\Delta\omega_N(n)|_{\max}$.
‡Normalized frequency, $\omega_N(n)$, of principal line 1°(n).
§Corresponding value of repeat length, $r_N(n)$, of 1°(n).
¶Maximal resolvability of repeat length, $|\Delta r_N(n)|_{\max}$, of 1°(n).

Before we go into the details and implications of the results contained in Table 1 and Fig. 3, we need to state the additional fact that for second-order (covariant) spectral analysis, there is a lower bound corresponding to the inherent resolvability dictated by the Nyquist limit, and this can be shown to imply that $|\Delta\omega_N(n)|_{\max} = (1/2)(n/2)^{-1} = n^{-1}$. In Table 1, we present the principal covariant resonances derived from the results presented in Fig. 3. The second column gives the values of the maximum frequency resolvability $|\Delta\omega_N(n)|_{\max}$ for each segmentation length used. This column indicates the number of strictly warranted significant figures in the resolution of $\omega_N$. This leads to the resolved values $\omega_N(n)$ given in column 3, which in turn yield the resolved values for the repeat length $r_N(n)$ shown in column 4. For estimating the maximal resolvability of $r_N(n)$ itself, we can make use of the relation $|\Delta r_N(n)|_{\max} = r_N^2(n)\,|\Delta\omega_N(n)|_{\max}$, which can be directly derived from the fundamental relation connecting $r_N(n)$ to $\omega_N(n)$. Thus, since $|\Delta\omega_N(n)|_{\max} = n^{-1}$, as stated earlier, then

$$|\Delta r_N(n)|_{\max} = n^{-1}\, r_N^2(n). \qquad [2]$$
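As a numerical check on Eq. 2, the short Python sketch below recomputes the resolvabilities and the single-base and 10-base benchmark frequencies. The function names are hypothetical. The benchmark formula is an inference from Eq. 2 rather than a formula printed in the paper: setting $r_N^2/n$ equal to a target of $b$ bases gives $r_N = \sqrt{bn}$, hence $\omega_N = 1/\sqrt{bn}$.

```python
import math


def repeat_length(omega_n):
    """Repeat length r_N for a resonance at normalized frequency omega_N."""
    return 1.0 / omega_n


def max_resolvability(r_n, n):
    """Eq. 2: maximal resolvability of the repeat length for segment length n."""
    return r_n ** 2 / n


def benchmark_frequency(n, bases):
    """Normalized frequency at which the resolvability equals `bases`.

    Setting r**2 / n = bases gives r = sqrt(bases * n), so omega = 1 / r.
    """
    return 1.0 / math.sqrt(bases * n)


# Print the single-base and 10-base benchmarks for each segmentation length.
for n in (128, 256, 512, 1024, 2048):
    print(n, round(benchmark_frequency(n, 1), 4),
          round(benchmark_frequency(n, 10), 4))
```

For n = 256 this gives a single-base benchmark of 0.0625 and a resolvability of about 0.074 for a repeat length of 4.35, in agreement with the values quoted in the legend to Fig. 3 and with Eq. 2.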

Eq. 2 allows us to estimate directly from columns 1 and 4 the inherent maximal resolvability of any estimate $r_N(n)$, which in our case is given in column 5. From Eq. 2, it also follows by direct substitution that one can calculate, for a given segmentation length, the limiting normalized frequencies for unit (monadic) resolution [in our case single (ribonucleotide)-base resolution in $r_N(n)$] and decadic resolution [10 (ribonucleotide)-base resolution in $r_N(n)$]. For our homology data, these benchmark figures are indicated for n = 256, 512, 1024, and 2048 in the legend to Fig. 3. To complete these numerical results, the following are the corresponding figures for n = 128. The principal peak is the covariant resonance 1°(128): $\omega_N$ = 0.11, $r_N$ = 9.1 (maximum repeat-length resolution = 0.65). The single-base resolution limit is at the normalized frequency $\omega_N$ = 0.088, while the 10-base resolution limit is at $\omega_N$ = 0.028.

The single-base and 10-base resolution benchmarks have the following significance. All resonances in the mean-square covariance of the homology data lying to the right of the single-base benchmark on the normalized frequency axis are resolvable to at least 1 base and, similarly, the 10-base benchmark marks off the normalized frequency to the right of which the resonance peaks are all resolvable to at least 10 bases.

By far the most striking feature that stands out in both Table 1 and Fig. 3 is the fact that the spectral results split naturally into two sets, as parametrized by the segmentation length. For one such set (corresponding to n = 128, 1024), the spectral estimate for $r_N(n)$ is broadly, in the number of nucleotide bases, twice that of the corresponding estimate for the other set (corresponding to n = 256, 512, 2048). This fact seems strongly to indicate that the repeat lengths that give rise to the principal resonances in the covariance of the homology data are modular symbolic strings possessing a very restricted symmetry group (17). In particular, our split spectral results could arise if such admissible modular symbolic strings each consisted of two tandemly iterated direct repeats. We will attempt to prove assertions of the above type in a later paper devoted to a computer model of sequences interacting in accordance with well-defined recombinational dynamics appropriate for RNAs.

From the foregoing arguments, the conclusion that emerges is that the principal multiplets detectable by the second-order spectral analysis from the mean-square covariance of the homology data are $R_N$ = 9.1, 8.7, 8.7, 9.0, 8.5 (in units of number of ribonucleotide bases), specifically admitting modular tandem repeats. Confidence intervals for these detected principal multiplets are given in Table 2. This tabulation represents the result of a succession of t tests for the two-sided confidence limits around the mean of $R_N$, which is equal to 8.8, with a corresponding standard deviation of 0.245. This standard deviation is of the same magnitude as the mean of the maximal resolvabilities given in column 5 of Table 1 (taking tandemly iterated multiplets into account), justifying in a more physical sense the use of the empirical two-sided t test. Table 2 allows the following conclusions: We can assert, to the 94% confidence level, that there exists one principal covariant resonance corresponding to the multiplet $R_N$ = 9. Invoking higher confidence levels (95-99.7%), the coexistence of two main resonances corresponding to the multiplets $R_N$ = 8 and 9 has to be admitted as significant. At still higher confidence levels (≥99.8%), the coexistence of the triad $R_N$ = 8, 9, and 10 becomes a fundamental fact and cannot be significantly ruled out. We do not expect these conclusions to be substantively altered if a different spectral approach is applied to the present homology data.

Table 2. Significant principal multiplets detected by covariant spectral resonance analysis

Significance level for     Confidence interval for       Significant multiplets, R_N,
two-sided t test, %        detected multiplets, R_N      within confidence interval
85                         8.6-9.0                       9
87                         8.6-9.0                       9
90                         8.6-9.0                       9
93                         8.6-9.1                       9
94                         8.6-9.1                       9
95                         8.5-9.1                       8, 9
97                         8.4-9.2                       8, 9
99.7                       8.1-9.5                       8, 9
99.8                       8.0-9.6                       8, 9, 10
99.9                       7.9-9.7                       8, 9, 10
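The confidence intervals of Table 2 can be approximately recomputed with the Python sketch below, which makes several assumptions that the text does not state explicitly. It treats the five quoted $R_N$ values as a sample of size 5, uses the interval mean ± t·s/√5 with 4 degrees of freedom, and takes the $R_N$ values to be the Table 1 repeat lengths with the shorter set (4.35, 4.35, 4.24) doubled. The closed-form distribution function for 4 degrees of freedom is a standard result, inverted here by bisection so that only the standard library is needed. All function names are hypothetical.

```python
import math


def t_cdf_4df(t):
    """Closed-form CDF of Student's t distribution with 4 degrees of freedom."""
    s = 1.0 + t * t / 4.0
    return 0.5 + 0.375 * (t / math.sqrt(s)) * (1.0 - (t * t / s) / 12.0)


def t_quantile_4df(p, lo=0.0, hi=1000.0):
    """Invert the CDF by bisection for 0.5 <= p < 1."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if t_cdf_4df(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def confidence_interval(values, level):
    """Two-sided interval mean +/- t * s / sqrt(k); requires k = 5 values."""
    k = len(values)
    if k != 5:
        raise ValueError("this sketch hard-codes 4 degrees of freedom")
    mean = sum(values) / k
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / (k - 1))
    half = t_quantile_4df(0.5 + level / 2.0) * s / math.sqrt(k)
    return mean - half, mean + half


multiplets = [9.1, 8.7, 8.7, 9.0, 8.5]  # R_N values quoted in the text
for level in (0.85, 0.95, 0.999):
    low, high = confidence_interval(multiplets, level)
    print(f"{100 * level:g}%: {low:.1f}-{high:.1f}")
```

Under these assumptions the sample mean is 8.8 and the sample standard deviation is about 0.245, matching the figures in the text, and the computed intervals agree with the tabulated ones to the precision shown. The rule the authors used to decide which integer multiplets count as lying within an interval is not stated, so the last column of Table 2 is not reproduced here.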

DISCUSSION AND PERSPECTIVE

Similarities among nucleic acids permit identification of conserved sequences essential for function. These sequences are thought to be related by common origin (3, 4). Usually the functions of the conserved sequences are similar. In the present case, the two RNAs differ in size and play different roles. It seems unlikely that the shared sequences reflect shared functions. The unpredictable distributions of the matches along and among the molecules and among different species bear this out (6). The matches appear to reflect common origins. The present work demonstrates a regularity in the positioning of the homologies and suggests that where the sequences are conserved, so are their positions.

The sources of the RNAs represent four widely disparate phylogenetic lines, the equivalent of taxonomic kingdoms. They include an archaebacterium, a eubacterium, a eukaryote, and an organelle of a higher eukaryote. In addition, the homologies are defined by matches between, as well as within, members of these groups. A tRNA from yeast matched against an rRNA from Halobacterium, with the homology aligned along the rRNA from E. coli, comprises the typical data that reveal the periodicity. This pooling of sequences might seem to stack the cards against detection of such a relationship, but was done because of the paucity of available sequences. The demonstration of periodicity in spite of the mixed sources provides an additional argument for the reality of the periods.

The sequences, in their spatial contexts, must date from the time of divergence of the lines leading to the organisms from which the RNAs were taken. Eukaryotes (represented by yeast) and prokaryotes (bacteria) diverged a billion or more years ago (20). Eubacteria and archaebacteria diverged 3-4 billion years ago (21). The separation of tRNA and rRNA functions from a restricted class of presumptive macromolecular modules must have occurred before the start of phylogenetic evolution. The periodicity centered at a repeat length of 9 bases is interpreted as a relic of modular tandem repeats involving a triad of ancestral modules from which the two classes of RNAs were derived. A model for the generation of such a repeat and for its subsequent evolution through recombination and mutation to give the observed present-day configurations has been described (6, 22). The latter events must not have disturbed the spatial contexts in which the homologies exist, the basic framework, or archeomodular format, being still discernible (and therefore remanent) even though the conserved sequences are now distributed unpredictably among the members of the divergent lines.

A number of interesting questions arise that might be approached by using spectral analysis.
These concern the relative contributions of different groupings of the nucleic acids to the spacings; the contributions of different families of tRNAs; that of the tRNAs as opposed to the rRNAs; of homologies from inter- versus intraspecies searches; and the participation of members of the other classes of RNAs. In addition, there are questions of minor and cryptic resonances, and the effects that evolutionary events may have had in obscuring what are now unresolvable resonances.

We did address two other questions that are relevant to the present study. The thymidine and dihydrouridine arms of the tRNAs exhibit a relatively high degree of sequence conservation. Some of the homologies located in these structures are represented a number of times, one set by as many as 13 examples. To ensure that these did not unduly influence the results, we used as data for one spectral run homologies whose positions are represented only once. Of 97 homologies (93 locatable), 63 different locations are represented. Covariance spectral analysis of these also showed sharply resolved resonance peaks at 8, 9, and 10 bases for various segmentation lengths, indicating that the period is not a reflection of fortuitous proximity of a few homologies to others that lie in these conserved regions. The contribution of rRNA to the periods is assured by common alignments, where homologies that are noncontiguous but neighboring fall along the same 45° line, and by the occurrence of periodic spacings that are larger than the tRNAs (spacings over 76 bases) (6).


A more detailed compilation of the covariant resonance structure of these data will be published elsewhere. Our aim is to catalog exhaustively the lesser amplitude lines. For instance, it can be shown that a middle amplitude resonance corresponding to $r_N \approx 6$ is present in the segmented and unsegmented data, a particular repeat length that has been discussed by Marliere (23) in connection with certain tRNA comparisons.

To our knowledge, the word remanent has been used in exact scientific discourse only in one evolutionary context: to describe the residual vectorial (directionally reversing) geomagnetization that remains measurable, to the present, in core samples from, e.g., deep sea paleosediments. This extensively studied phenomenon has come to be termed NRM (24), for natural remanent magnetization. In the present paper, we introduce the term remanent in a teleologically similar context. Our aim has been to seek a rigorous confirmation of a remanent archeomodular format shared by modern tRNAs and rRNAs (and possibly by all classes of RNAs), whose existence was suggested by the distributions of spacings between homologies shared by these RNAs (6).

We express our appreciation to Professor Ilya Prigogine for his interest and encouragement. This work was supported in part by Grant GM-23331 from the National Institutes of Health.

1. Bloch, D. P., McArthur, B., Widdowson, R., Spector, D., Guimaraes, R. C. & Smith, J. (1983) J. Mol. Evol. 19, 420-428.
2. Bloch, D. P., McArthur, B., Guimaraes, R. C., Smith, J., Reese, M. & Jayaseelan, S. (1983) J. Cell Biol. 97, 383a.
3. Fitch, W. M. (1970) Syst. Zool. 19, 99-113.
4. Cedergren, R. J., LaRue, B., Sankoff, D., Lapalme, G. & Grosjean, H. (1980) Proc. Natl. Acad. Sci. USA 77, 2791-2795.
5. Matthews, B. W., Remington, S. J., Grutter, M. G. & Anderson, W. F. (1981) J. Mol. Biol. 147, 545-548.
6. Bloch, D. P., McArthur, B. & Mirrop, S. (1985) BioSystems 17, 209-225.
7. Sprinzl, M. & Gauss, D. H. (1982) Nucleic Acids Res. 10, 1-55.
8. Anderson, S., DeBruijn, M. H. L., Coulson, A. R., Eperon, I. C., Sanger, F. & Young, I. G. (1982) J. Mol. Biol. 156, 683-717.
9. Zwieb, C., Glotz, C. & Brimacombe, R. (1982) Nucleic Acids Res. 9, 3621-3640.
10. Gupta, R., Lanter, J. M. & Woese, C. R. (1983) Science 221, 656-659.
11. Goad, W. B. & Kanehisa, M. I. (1982) Nucleic Acids Res. 10, 247-263.
12. Lipman, D. J., Wilbur, W. J., Smith, T. F. & Waterman, S. M. (1984) Nucleic Acids Res. 12, 215-226.
13. Kanehisa, M. I. (1984) Nucleic Acids Res. 12, 203-213.
14. Hannan, E. J. (1960) Time Series Analysis (Methuen, London).
15. Cramer, H. & Leadbetter, M. P. (1967) Stationary and Related Stochastic Processes (Wiley, New York).
16. Nazarea, A. D. (1983) DNA 2, 92 (abstr.).
17. Nazarea, A. D. (1985) Phys. Rev. A, in press.
18. Mann, H. (1958) Addition Theorems: The Addition Theorems of Group Theory and Number Theory (Krieger, New York).
19. Huppert, B. (1967) Endliche Gruppen (Springer, Berlin).
20. Margulis, L. (1981) Symbiosis and Cell Evolution (Freeman, San Francisco).
21. Woese, C. R., Magrum, L. J. & Fox, G. E. (1978) J. Mol. Evol. 11, 245-252.
22. Bloch, D. P., McArthur, B., Widdowson, R., Spector, D., Guimaraes, R. C. & Smith, J. (1984) Origins Life 14, 571-578.
23. Marliere, P. (1983) Dissertation (University of Paris, Paris).
24. Cox, A. (1969) Science 163, 237-245.
