Systematic Botany (1998), 23(1): pp. 157-169 © Copyright 1998 by the American Society of Plant Taxonomists

Datiscaceae Revisited: Monophyly and the Sequence of Breeding System Evolution

SUSAN M. SWENSEN,¹ JENNIFER N. LUTHI, and LOREN H. RIESEBERG

Department of Biology, Indiana University, Bloomington, Indiana 47405

¹Present address: Department of Biology, Ithaca College, Ithaca, New York 14850

Communicating Editor: Lucinda A. McDade

ABSTRACT. Previous studies of the small angiosperm family Datiscaceae have drawn contradictory conclusions regarding its monophyly. Clarification of the relationships among the family components is critical to the interpretation of breeding system evolution within this family. Datisca glomerata is the only androdioecious member of the otherwise dioecious family and an initial phylogenetic study suggested that this rare breeding system was derived from dioecy in this family. A subsequent, broader scope phylogenetic analysis of Datiscaceae and related families has since suggested that Datiscaceae are not monophyletic, calling into question earlier conclusions regarding the evolution of androdioecy in Datiscaceae. In the present study, the phylogenetic relationships of Datiscaceae and the sequence of breeding system evolution are reexamined. DNA sequences from three sources, including nuclear 18S ribosomal DNA, the internal transcribed spacer (ITS) region of nuclear ribosomal DNA, and the chloroplast-encoded rbcL gene, were analyzed phylogenetically using parsimony. Results from analysis of rbcL, 18S, and a combined data set all agree that Datiscaceae do not form a monophyletic assemblage. Datisca appears as a sister group to Begoniaceae in all analyses, but the position of sister taxa Octomeles and Tetrameles relative to Datisca and other members of the Cucurbitales is unresolved. The two species of Datisca form separate monophyletic lineages according to ITS analysis, providing no evidence for a progenitor-derivative relationship for the two species. Phylogenetic trees from analyses of rbcL and 18S disagree as to whether dioecy or monoecy is ancestral to Datisca, and thus provide no evidence as to which sexual system gave rise to androdioecy in D. glomerata; however, there is no evidence for the derivation of androdioecy from hermaphroditism.

The family Datiscaceae comprises three genera and four species (Davidson 1973). Octomeles and Tetrameles are monotypic, containing the tropical, dioecious tree species O. sumatrana Miq. and T. nudiflora R. Br. (Davidson 1973). Octomeles occurs chiefly in the East Indies and Tetrameles in Indochina and Malaysia (Fig. 1). The third genus, Datisca, consists of two species (Davidson 1973). Both are perennial herbs that occur primarily in riparian habitats. Datisca cannabina L. is dioecious and ranges from the southern slopes of the Himalayas in northwestern India to the island of Crete in the Mediterranean (Fig. 1). By contrast, most populations of Datisca glomerata (Presl) Baill. are androdioecious (Fritsch and Rieseberg 1992; Liston et al. 1990), consisting of male and functionally hermaphroditic individuals. However, previous studies have noted the absence of males in several small populations (Liston et al. 1990). Datisca glomerata is distributed from Baja California, Mexico to northern California (Fig. 1). Although this small family is well-studied, its monophyly is disputed (Davidson 1973, 1976; Airy Shaw 1964), and a recent molecular phylogenetic study (Swensen et al. 1994) suggests that the family is not monophyletic.

Without a better understanding of phylogeny, the sequence of breeding system evolution in Datiscaceae is ambiguous. Traditionally, androdioecy has been viewed as an intermediate step in the evolution of dioecy from hermaphroditism (Darwin 1877; Westergaard 1958; Bawa 1980; Richards 1997). Evidence for this view includes 1) the scattered taxonomic distribution of dioecy in flowering plants, suggestive of multiple independent origins from the most common angiosperm breeding system, hermaphroditism (Darwin 1877), and 2) the hypothesized sequence of mutations required to evolve dioecy from hermaphroditism. Because both male and female sterility mutations are unlikely to occur and become established simultaneously, the evolution of dioecy from hermaphroditism may involve an intermediate population that is either gynodioecious (male sterile and hermaphrodite individuals) or androdioecious (female sterile and hermaphrodite individuals) (Westergaard 1958; Charlesworth and Charlesworth 1978). The relative rarity of androdioecy in flowering plants, as compared to gynodioecy, suggests that androdioecy is not an important intermediate in the formation of dioecy from hermaphroditism (Charlesworth 1984). The observed rarity of androdioecy is supported by theoretical predictions indicating that a nuclear female sterility mutation is very unlikely to invade a hermaphrodite population unless the population is predominantly outcrossing and the mutation is accompanied by at least a two-fold increase in fitness (Lloyd 1975). A male sterility mutation requires no such increase in fitness to coexist among hermaphrodites, because male sterility can be caused by cytoplasmic factors.

FIG. 1. Global distribution of Datiscaceae. Shaded areas show the distribution of Datisca glomerata (1), Datisca cannabina (2), and Octomeles and Tetrameles (3). For detailed information on the collection sites for accessions sequenced in this study, see Liston et al. (1992) and Rieseberg et al. (1992).

We are aware of only four documented cases of androdioecy [Datisca glomerata, Liston et al. 1990; Mercurialis annua L. (Euphorbiaceae), Pannell 1997; Saxifraga cernua A. Gray (Saxifragaceae), Molau and Prentice 1992; Phillyrea angustifolia L. (Oleaceae), Lepart and Dommee 1992; Traveset 1994]. Datisca glomerata represents the best documented case of androdioecy so far, as the lability of sexual expression in M. annua and S. cernua has not been well studied, and only a single population of P. angustifolia is known to be androdioecious (Lepart and Dommee 1992; Traveset 1994). Most reported cases of androdioecy are later found to be functionally dioecious (reviewed in Charlesworth 1984), subdioecious (e.g., Anderson et al. 1988) or andromonoecious (e.g., Thomson et al. 1989).

Even where androdioecy exists, there is little evidence that it has served as an intermediate step in the formation of dioecy from hermaphroditism. Neither Saxifraga cernua nor Phillyrea angustifolia has been analyzed phylogenetically. In Datisca, Liston et al. (1990) suggested that androdioecy was actually derived from dioecy rather than vice versa because all other species in the family besides D. glomerata are dioecious. Chloroplast DNA (cpDNA) restriction site analyses (Rieseberg et al. 1992) are consistent with this prediction because D. glomerata appeared in a distal position in the cpDNA tree. In the cpDNA analysis, Rieseberg et al. (1992) assumed monophyly for Datiscaceae and rooted their tree with members of Cucurbitaceae and Begoniaceae, families considered to be the closest relatives of Datiscaceae. Recent phylogenetic analyses based on rbcL and using members of lower Hamamelidae and Magnoliidae as outgroups (Swensen et al. 1994) suggest, in fact, that Datiscaceae are paraphyletic with respect to Begoniaceae and Cucurbitaceae; Octomeles and Tetrameles appear basal to a clade that includes Datisca, Cucurbitaceae, and Begoniaceae. Unfortunately, the rbcL tree is ambiguous regarding the sister taxon to Datisca. Moreover, the reliance on data from a single gene for phylogenetic inference may yield trees that are not congruent with organismal phylogeny and thus may lead to inaccurate biological conclusions (Avise 1989; Rieseberg and Soltis 1991; Doyle 1992).

Here we present results from three different data sets that address the phylogeny and sequence of breeding system evolution in Datiscaceae. To determine whether the genera traditionally placed within Datiscaceae form a monophyletic lineage and to identify the probable sister taxon to Datisca, we present updated results from previous phylogenetic analysis of rbcL data (Swensen et al. 1994) and new 18S ribosomal sequence data and analysis. We have also reconstructed a phylogeny for the genus Datisca based on internal transcribed spacer (ITS) sequences of nuclear ribosomal DNA. More frequent nucleotide substitutions within ITS, as compared to either rbcL or 18S, allowed a finer scale phylogenetic analysis within Datisca. This analysis determined whether each species of Datisca formed distinct, monophyletic groups or whether one of the species might have served as the progenitor of the other, as would be suggested by a paraphyletic relationship for populations of the two species.
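The contrast drawn above, between two species that each form a monophyletic group and a paraphyletic pattern that would hint at a progenitor-derivative relationship, can be made concrete with a minimal Python sketch. The nested-tuple tree encoding, the accession labels (`can1`, `glo1`, and so on), and the helper names are assumptions introduced purely for illustration; they are not the ITS trees or the software used in this study.

```python
def leaves(node):
    """Return the set of leaf names below a node (a leaf is a str, a clade is a tuple)."""
    if isinstance(node, str):
        return {node}
    result = set()
    for child in node:
        result |= leaves(child)
    return result


def clades(node):
    """Yield the leaf set of every clade (single leaves included) in a rooted tree."""
    yield frozenset(leaves(node))
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)


def is_monophyletic(tree, group):
    """True if some clade of the rooted tree contains exactly the taxa in `group`."""
    return frozenset(group) in set(clades(tree))


# Hypothetical accession labels, not the accessions sequenced in the paper.
cannabina = {"can1", "can2"}
glomerata = {"glo1", "glo2"}

# Each species forms its own clade (both monophyletic).
reciprocal_tree = ("outgroup", (("can1", "can2"), ("glo1", "glo2")))

# glomerata is nested inside cannabina, so cannabina is paraphyletic.
nested_tree = ("outgroup", ("can1", ("can2", ("glo1", "glo2"))))

for name, tree in [("reciprocal", reciprocal_tree), ("nested", nested_tree)]:
    print(name,
          "cannabina monophyletic:", is_monophyletic(tree, cannabina),
          "glomerata monophyletic:", is_monophyletic(tree, glomerata))
```

In the second toy tree only the nested species is monophyletic, which is the pattern that would be read as one species having arisen from within the other.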

MATERIALS AND METHODS

Sources of plant material utilized in this study are listed in Table 1. DNA was available for individuals of Datisca cannabina and D. glomerata from previous analyses (Liston et al. 1989, 1992) and the individuals included in this study are a geographically representative subset of this previous collection, including individuals from populations that contained no males. For other species, total cellular DNA was extracted from fresh or silica dried leaf material according to the method of Doyle and Doyle (1987).

Amplification reactions contained 2-5 ng genomic DNA, 30 mM Tricine pH 8.4, 2 mM MgCl2, 50 mM KCl, 100 µM each dNTP and 0.15 µM each amplification primer. The rbcL gene was amplified using the primers described in Swensen (1996). The ITS 1, 5.8S, and ITS 2 regions of the ribosomal DNA repeat were amplified together using primers "ITS-4" (White et al. 1990) and "ITS-5p" (5'-GGAAGGAGAAGTCGTAACAAGG-3'), a plant-specific modification of White et al. (1990). 18S was amplified using primers "25eF" (Nickrent and Starr 1994) and "ITS-2p" (5'-GCTACGTTCTTCATCGATGC-3'), the latter also a plant-specific modification of White et al. (1990). After adding a mineral oil overlay, the tubes were placed in a thermal cycler (MJ Research) and reactions were begun using the hot start procedure (D'Aquila et al. 1991) and 1.2 units Taq DNA polymerase (Gibco BRL). After adding the polymerase, the 18S region and rbcL genes were amplified using 30 cycles of 95°C for 1 min, 55°C for 1 min, 72°C for 1 min followed by a final extension of 6 min at 72°C. For ITS amplifications, cycles 2-30 consisted of 30 sec denaturation at 95°C.

The presence and size of single amplification products were verified by agarose gel electrophoresis and ethidium bromide staining using a 1 Kb ladder as a size standard (Gibco BRL). Pooled 100 µL amplification reactions (usually three) were purified to remove unincorporated nucleotides and primers by using Elu-Quick (Schleicher and Schuell, Inc.), but the DNA was eluted in sterile water at the final step rather than in TE. Purified double-stranded DNA was directly sequenced using the Sequenase version 2.0 kit (Amersham Life Science) according to the manufacturer's protocol. Sequencing of ITS regions 1 and 2 was carried out using four primers: "ITS-1p" (5'-TCCGTAGGTGAACCTGCGG-3'), "ITS-3p" (5'-GCATCGATGAAGAACGTAGC-3'), "ITS-2," and "ITS-4" [sequences are given for primers with plant-specific modifications of White et al. (1990) primers]. 18S and rbcL were sequenced using the internal primers described in Nickrent and Starr (1994) and Swensen (1996), respectively. Sequences were read in both forward and reverse directions.

A total of 18 rbcL sequences were chosen for phylogenetic analysis. Nine rbcL sequences were taken from previous analyses by Swensen et al. (1994) and Swensen (1996), four rbcL sequences were obtained from GenBank, and the remaining five are new to this study (see Table 1). The new rbcL sequences were entered by computer directly into the phylogenetic analysis program PAUP* 4.0 d55 (a test version written by David L. Swofford) and analyzed with existing sequences. The first 30 nucleotide positions were omitted from analysis because they correspond with the forward amplification primer. Of the included rbcL nucleotide positions, 2.6% of the data matrix cells were scored as missing.

The 18S sequences analyzed here were selected to represent the same taxa as were included in the rbcL analysis. Eight of the 18S sequences were obtained from GenBank and 10 are newly reported here. 18S sequences were entered directly into PAUP where they were combined and aligned with existing 18S sequences obtained from GenBank. Nucleotide positions 1-20 of the 18S data matrix were omitted from the analysis due to correspondence with the forward amplification primer. Positions 21-40 were often difficult to read in several taxa and were also excluded from analysis. Positions 675-679 and 1793-1835 were omitted from analysis because of ambiguity in their alignment. For the remaining nucleotide positions of the 18S data matrix, 0.26% of the cells were scored as missing.

A combined data set of 18S and rbcL sequences was constructed by appending the aligned 18S sequence data onto the rbcL data matrix. In the combined data matrix, the same nucleotide positions were excluded from analysis as described for the separate analyses. Of the remaining nucleotide positions, 1.3% of the cells were scored as missing. Incongruence between the two data sets was assessed by using the incongruence length difference test (ILD test; Farris et al. 1994) implemented as the partition heterogeneity test in PAUP* 4.0 d55. This test randomly repartitions the combined data set into two subsets equal in size to the original data sets. One thousand random repartitions of the combined data set were analyzed heuristically using 10 random-taxon-entry additions per search. The sum of tree lengths derived from analysis of the original data sets was then compared to the distribution of the sum of tree lengths generated by analyses of the randomly repartitioned data sets to determine whether the data sets are significantly incongruent. If the probability is low (e.g., P ≤ 0.05) of obtaining a sum of tree lengths from the random repartitions that is smaller than the sum of tree lengths from the original data sets, then the data sets are considered to be significantly incongruent.

A total of 16 new ITS sequences were determined (0.7% scored as missing data) and aligned using the ClustalX program (a Windows interface for ClustalW; Thompson et al. 1994). ClustalX first conducts pairwise alignments to produce a guide tree based on distance values between sequence pairs. The guide tree is then used to direct the sequence of pairwise alignment of larger and larger groups of sequences. Ten different alignments of the ITS sequences were generated by using different gap opening (GOP) and gap extension (GEP) penalties for each alignment. The GOP/GEP values specified for the initial pairwise alignment were the same ones used in the multiple alignment phase. The different GOP/GEP combinations were selected to represent a range of values, but concentrated on less severe penalties for gaps because of the taxonomic range of sequences being analyzed.
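The logic of the ILD test described earlier in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the list of summed tree lengths for the random repartitions is invented toy data (real values would come from a parsimony search on every repartition, which is not reproduced here), and ties are counted as in the wording of the text ("smaller than"), which may differ from how a particular program handles them. The partition sizes (1,398 and 1,747 characters) and the observed lengths (351 and 336 steps) are the rbcL and 18S values reported in this paper.

```python
import random


def random_repartition(n_first, n_second, rng):
    """Randomly split the combined column indices into two subsets that keep
    the sizes of the original partitions."""
    columns = list(range(n_first + n_second))
    rng.shuffle(columns)
    return sorted(columns[:n_first]), sorted(columns[n_first:])


def ild_p_value(observed_sum, random_sums):
    """Proportion of random repartitions whose summed tree length is lower
    than the summed length obtained from the original partitions."""
    lower = sum(1 for s in random_sums if s < observed_sum)
    return lower / len(random_sums)


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# One random repartition with the original rbcL and 18S matrix sizes.
part_a, part_b = random_repartition(1398, 1747, rng)

# Observed sum of the shortest-tree lengths for the separate analyses.
observed_sum = 351 + 336

# TOY DATA (assumption): summed lengths for 1,000 random repartitions.
toy_random_sums = [686] * 70 + [690] * 930

p_value = ild_p_value(observed_sum, toy_random_sums)
print(len(part_a), len(part_b), p_value)
```

With these toy numbers 70 of the 1,000 repartitions give a shorter summed length, so the sketch returns P = 0.07, which would not count as significant incongruence at the 0.05 level.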
The following GOP/GEP values were used: 1/0.1; 1/1; 2/0.1; 2/1; 3/0.1; 3/1; 4/0.1; 4/1; 5/1; and 10/5. Each of the alignments was inspected manually and none appeared to have any obvious misalignments. Each of the 10 alignments was used as the input data set for a separate phylogenetic analysis. In addition, an elision data set, comprising all 10 ITS alignments concatenated into a single data set, was constructed and used as input for another separate phylogenetic analysis. The elision method offers a continuous weighting scheme whereby the characters that are consistently aligned in all data sets are assigned greater weight than those whose


alignments are ambiguous (Wheeler et al. 1995). The elision approach was adopted here in lieu of completely removing alignment-ambiguous nucleotide positions ("culling") because previous analyses using culled data sets have generally resulted in robust but less resolved hypotheses of relationships (Gatesy et al. 1994). Additionally, the ITS data set contained outgroups whose sequences were substantially divergent from those of the ingroup; thus, the culling approach would have removed most of the nucleotide positions.

All data sets were analyzed using unweighted parsimony via the heuristic search option in a test version of PAUP* 4.0 d55 (written by David L. Swofford). For the rbcL, 18S, and the rbcL-18S combined data sets, 1,000 random-taxon-addition (RTA) replications were performed using TBR branch swapping with the MULPARS option selected. For ITS data, 100 RTA replications were conducted for each of the different alignments and the elision data set. Gaps introduced into the 18S sequences (usually single nucleotide indels) for alignment purposes were treated as a fifth character state by changing the PAUP default to GAPMODE=NEWSTATE. Gaps introduced in the alignment of ITS sequences (usually multiple nucleotide indels) were treated as missing data (the PAUP default). The different treatments for gaps reflect the likelihood that single nucleotide gaps in 18S are caused by single evolutionary events, whereas multiple nucleotide gaps within ITS could have been generated by one or more events. In addition, due to sequencing artifacts, some positions within ITS sequences were ambiguous and were scored as missing data; these are better treated as missing rather than as separate evolutionary events.

Bauera, Ceratopetalum (Cunoniaceae), Eucryphia (Eucryphiaceae), and Cephalotus (Cephalotaceae), all members of order Rosales, were designated as outgroups for rbcL, 18S, and rbcL-18S combined analyses on the basis of previous rbcL and 18S analyses (S. Swensen unpubl. data; Soltis et al. 1997). These outgroups represent the closest known taxa to Cucurbitales (based on molecular analysis). For ITS analysis of Datisca, the outgroup (Marah, Coriaria) was chosen based on the results of rbcL and 18S analyses presented here.

Branch support in the trees was assessed using decay (Bremer 1988; Donoghue et al. 1992) and bootstrap analyses (Felsenstein 1985). Decay analyses were implemented by repeating the heuristic search with 100 RTA repetitions and saving all trees up to 5 steps longer than the shortest trees found by


the initial searches. The number of additional steps required for the collapse of various branches was determined by inspecting consensus trees constructed from trees found at each length and all those shorter. For the bootstrap analyses, 100 bootstrapped data sets were analyzed heuristically using 10 RTA repetitions per search. A bootstrap analysis was conducted for each of the aligned ITS data sets; however, bootstrapping is not an appropriate method by which to assess branch support for cladistic analysis of elided data sets. This is because single characters are represented multiple times within the concatenated data set. At minimum, three characters are required to resolve a branch in 95% of the bootstrap estimates (Felsenstein 1985); thus, bootstrapping an elision data set may indicate good support even for branches supported by only a single character (Soltis et al. 1996). For this reason, we chose to use the jackknife method (implemented in PAUP* 4.0 d55) to assess branch support. The jackknife method resamples the data set, but without replacement. Because our elision data set comprises 10 concatenated alignments, a consistently aligned character could potentially be represented 10 times. Thus, by dropping the same proportion of data points through jackknifing (90% in this case), we construct a jackknife data set equivalent to the average size of a single alignment. One hundred jackknifed data sets were analyzed using 10 RTA replications per heuristic search. Although this approach still permits a single character to be represented more than once, the chance of that character being represented many times is substantially lower.

RESULTS

Phylogenetic Analysis of rbcL and 18S Sequences. The analysis of rbcL sequences (1,398 total characters; 136 parsimony informative characters) resulted in a single most parsimonious tree (Fig. 2A, length = 351; CI excluding uninformative characters = 0.63; RI = 0.77). This topology reiterates previous results from analysis of rbcL sequences (Swensen et al. 1994; Swensen 1996) showing a closer relationship between Datisca and Begoniaceae than between Datisca and Octomeles/Tetrameles. In this tree, however, Octomeles and Tetrameles appear in a monophyletic group that includes Cucurbitaceae and Coriaria, suggesting that Datiscaceae are polyphyletic. Decay and bootstrap analyses indicate that the branch uniting Datisca with Begoniaceae has minimal support, decaying in trees only one step longer than the shortest trees, and with a bootstrap value of less than 50%. When the PAUP search was constrained by limiting the search to only trees monophyletic for Datiscaceae, the resulting trees were seven steps longer (2%) than the shortest trees.

The analysis of 18S sequences (1,747 total characters; 107 parsimony informative characters) resulted in a single most parsimonious tree shown in Fig. 2B (length = 336; CI excluding uninformative characters = 0.56; RI = 0.67). The 18S tree agrees with the rbcL tree by indicating that Datiscaceae are not monophyletic; however, the 18S topology suggests Datiscaceae are paraphyletic, not polyphyletic, as with the rbcL tree. In the 18S tree, Octomeles-Tetrameles are sister to Begoniaceae, with Datisca appearing basal to these together. Support for the branches defining relationships of Datisca and Octomeles-Tetrameles is poor (decay index = 1). Constraining the search by specifying monophyly for Datiscaceae resulted in trees that were eight steps longer (2.4%) than the shortest trees.

Phylogenetic analysis of a combined data set consisting of rbcL and 18S sequences (3,145 total characters; 233 parsimony informative characters) resulted in six equally parsimonious trees that are represented by the strict consensus tree shown in Fig. 2C (length = 718; CI excluding uninformative characters = 0.58; RI = 0.71). The combined analysis produced a third topology, different from the rbcL and 18S topologies, indicating that Datiscaceae are paraphyletic to Begoniaceae. The branches defining the relationships of Datisca and Octomeles-Tetrameles within the combined tree have no better support than those in the trees generated by separate rbcL and 18S analyses (Fig. 2C). Using a constrained search, trees showing monophyly for Datiscaceae were only two steps longer (0.28%) than the most parsimonious trees. The ILD test for incongruence between the rbcL and 18S data sets indicated that the data were not significantly incongruent. The test resulted in a P value of 0.07, indicating that 7% of the random repartitions yielded a sum of tree lengths lower than the sum of the trees generated from the rbcL-18S partition.

Alignment and Phylogenetic Analysis of ITS Sequences. The length of the ITS 1 region ranged from 198 nucleotides in Marah to 262 nucleotides in D. cannabina (accession 3). The ITS 2 region varied in length from 213 nucleotides in Tetrameles to 271 nucleotides in Hillebrandia. Sequence data read from autoradiograms of Tetrameles were ambiguous for several regions of ITS 1 and approximately 15%


of the ITS 1 sequence was unreadable and scored as missing. In other cases, the ITS regions were readable in at least one direction. Each of the 10 alignments of the ITS sequences was inspected and none appeared to have any obvious misalignments that required manual manipulation. Phylogenetic analysis of each of the 10 alignments generated between one and 68 trees per search. A total of 243 unique topologies were recovered once all the trees from separate analyses were combined and duplicate trees (only five) were eliminated. A considerable amount of topological variation existed among these trees, such that a strict consensus tree (not shown) showed no resolution among taxa except for Datisca species. Datisca cannabina accessions formed one group and D. glomerata accessions formed another group (both internally unresolved). Bootstrap analyses of the individual data sets indicated that the D. cannabina group is supp
