Sileneae, Caryophyllaceae - Oxford University Press

1 downloads 0 Views 623KB Size Report
Jul 3, 2009 - lineage sorting (Pamilo and Nei 1988; Linder and. Rieseberg 2004). ... James and Abbott 2005; Mir et al. 2006; Poke et al. 2006). A homoploid ...
Syst. Biol. 58(3):328–345, 2009 c Society of Systematic Biologists Copyright  DOI:10.1093/sysbio/syp030 Advance Access publication on July 3, 2009

Hybrid Origins and Homoploid Reticulate Evolution within Heliosperma (Sileneae, Caryophyllaceae)—A Multigene Phylogenetic Approach with Relative Dating ˇ F RAJMAN1,2∗ , F RIDA E GGENS1 , B O ZO

AND

B ENGT O XELMAN1,3

1 Department

of Systematic Botany, Evolutionary Biology Centre, Uppsala University, Norbyv¨agen 18D, SE-75236 Uppsala, Sweden; Department, Biotechnical Faculty, University of Ljubljana, Veˇcna pot 111, SI-1000 Ljubljana, Slovenia; 3 Department of Plant and Environmental Sciences, G¨ oteborg University, Box 461, SE-405 30 G¨oteborg, Sweden; E-mail:[email protected]; ∗ Correspondence to be sent to: Biology Department, Biotechnical Faculty, University of Ljubljana, Veˇcna pot 111, SI-1000 Ljubljana, Slovenia; E-mail: [email protected]. 2 Biology

Abstract.—We used four potentially unlinked nuclear DNA regions from the gene family encoding the second largest subunit of the RNA polymerases, as well as the psbE-petG spacer and the rps16 intron from the chloroplast genome, to evaluate the origin of and relationships within Heliosperma (Sileneae, Caryophyllaceae). Relative dates of divergence times are used to discriminate between hybridization and gene duplication/loss as alternative explanations for topological conflicts between gene trees. The observed incongruent relationships among the three major lineages of Heliosperma are better explained by homoploid hybridization than by gene duplication/losses because species branching events exceed gene coalescence times under biologically reasonable population sizes and generation times, making lineage sorting an unlikely explanation. The origin of Heliosperma is complex and the gene trees likely reflect both reticulate evolution and sorting events. At least two lineages have been involved in the origin of Heliosperma, one most closely related to the ancestor of Viscaria and Atocion and the other to Eudianthe and/or Petrocoptis. [BEAST; homoploid hybridization; incongruence; lineage sorting; PATHd8; r8s; relative dating; RPA2; RPB2; RPD2.]

Molecular phylogenetics in plants is dominated by studies of chloroplast (cp) and nuclear ribosomal (nr) DNA sequences, but there is also an increasing number of studies based on low-copy nuclear genes (reviewed by Sang 2002). There are at least two good reasons for why a large number of unlinked DNA sequence regions are desirable for a proper understanding of plant phylogenetic relationships. First, provided that the method used for phylogenetic inference is consistent, an increased data size will potentially increase accuracy (resolution and support). Second, if the phylogenetic history of the studied lineages includes reticulate events like hybridization, introgression, or lateral gene transfer, data from several unlinked regions can be used to detect and potentially distinguish these events from intragenomic processes such as gene duplication, recombination, and lineage sorting (Pamilo and Nei 1988; Linder and Rieseberg 2004). The application of low-copy nuclear gene trees has improved the understanding of reticulate evolution, in particular of allopolyploids (e.g., Cronn et al. 1999; Ferguson and Sang 2001; Popp and Oxelman 2001; Doyle et al. 2003; Mason-Gamer 2004; Popp et al. 2005, Huber et al. 2006) and to a lesser extent also of diploid taxa (e.g., Cronn and Wendel 2003; Howarth and Baum 2005; Joly and Bruneau 2006; Poke et al. 2006). Hybridization has played an important role in the evolutionary history of plants (reviewed by e.g., Rieseberg and Wendel 1993; Rieseberg and Carney 1998; Soltis et al. 2003). Discordances between gene trees based on cpDNA and nrDNA have often been explained as a result of hybridization (e.g., Soltis and Kuzoff 1995; McKinnon et al. 2001; Okuyama et al. 2005). However, incongruent phylogenetic patterns can result from a variety of processes (Wendel and Doyle 1998; Slowinski

and Page 1999) that can be classified as interlineage (hybridization, lateral gene transfer between organismal lineages) or intralineage (incomplete lineage sorting, orthology/paralogy conflation), both of which may be further complicated by recombination between alleles or genes. Moreover, phylogenetic trees inferred from different genes may disagree due to stochastic or systematic (the phylogenetic models and methods applied fail to converge to the correct solution) errors. A major challenge for contemporary plant phylogenetics is how to distinguish different causes of incongruent phylogenetic patterns from each other. The extent of homoploid hybridization in nature, leading to the formation of new organismal lineages of the same ploidy level as their progenitors, is unclear (reviewed by Rieseberg 1997). There are a few documented cases of homoploid hybrid speciation, thoroughly investigated in Gossypium (Wendel et al. 1991; reviewed by Cronn and Wendel 2003), Helianthus (e.g., Rieseberg 1991; Welch and Rieseberg 2002; Gross et al. 2003) and some other genera (e.g., Wolfe et al. 1998; Ferguson and Sang 2001; Wang et al. 2001; Howarth and Baum 2005; James and Abbott 2005; Mir et al. 2006; Poke et al. 2006). A homoploid hybrid species tends to have a combination of alleles and/or loci that are specific to either parent (Ferguson and Sang 2001), and it is likely that such hybridization leaves traces in more than one nuclear gene (Cronn et al. 1999). A phylogenetic analysis of sequences from several low-copy nuclear genes can therefore be a tool to trace the parental lineages involved in the formation of homoploid hybrids. Comparing the relative ages of conflicting groups in different gene phylogenies can be useful for determining which processes are responsible for incongruent trees. For example, a single hybridization event should

328

2009

FRAJMAN ET AL.—RETICULATE EVOLUTION OF HELIOSPERMA

result in two predominant but conflicting phylogenies where each topology is generally consistent over genes. By contrast, incomplete lineage sorting of alleles and gene duplications followed by random extinctions are expected to generate many conflicting topologies. If the ages of the conflicting groups are within the coalescence time intervals of the descendent lineages, it is hard, if not impossible, to distinguish hybridization from incomplete lineage sorting if just two conflicting gene phylogenies are at hand (Holder et al. 2001; Joly et al. 2006). Hybridizations and incongruences due to orthology/paralogy conflation can be more readily distinguished if the conflicting divergence ages are outside of these intervals. Thus, coalescence theory (reviewed by Rosenberg and Nordborg 2002), knowledge of the ages of divergences, population sizes, and generation times can be used to test whether incomplete lineage sorting of alleles can be rejected as an explanation for observed incongruent gene phylogenies (Maureira-Butler et al. 2008). Recently, Avise and Robinson (2008) introduced a new term “hemiplasy” for “topological discordance between a gene tree and a species tree attributable to lineage sorting.” In Figure 1a, the gene trees I and II have identical topologies but differ significantly with respect to the divergence time between A and B. If Tree I represents the species lineage (“lineage” in the following text) phylogeny, then the gene phylogeny in Tree II is best explained by graph IV, where a duplication of the gene preceded the split between the lineages A and B, which each has lost a copy afterwards. A similar pattern would also result from incomplete lineage sorting. Conversely, if Tree II reflects the organismal phylogeny, then Tree I is best explained by graph III, where hybridization/lateral gene transfer between species A and B after their divergence has caused the age of the copy present in B to reflect the age of this event rather than the lineage split event and that the original copy of the B lineage has gone extinct. The situation in Tree V (Fig. 1b) is analogous to the scenario for Tree I. Tree V is best explained by graph VII, assuming that Tree VI reflects the lineage tree. There is an ancient split between the two lineages B and AC, followed by a split between the lineages A and C. After this, hybridization takes place between lineages A and B, leading to the loss of the copy in B followed by continued divergence of the lineages A and B. Note that the age of the group CBA in Tree V is then expected to be equal to the age of CA in Tree VI. By contrast, if the gene tree looks like Tree VI and Tree V reflects the lineage tree, the divergence of A, B, and C must have been preceded by gene duplication, as seen in graph VIII. One copy from each taxon has later gone extinct (or is not sampled), causing the observed pattern. Thus, the divergence between C and A in Tree VI must be older than that between B and A in Tree V, and the age of the root in Tree VI is expected to be older than the root of Tree V because the age of the root in Tree VI reflects the age of the gene duplication.

329

FIGURE 1. a) Two gene trees (I and II) with identical topologies but different divergence times between A and B, and hypothetical explanations of their history, III (hybridization/lateral gene transfer) and IV (gene duplication followed by gene loss), explaining I and II, respectively, assuming that the other gene tree (II and I, respectively) reflects the “true” species tree. b) Two gene trees (V and VI) with different topologies and different divergence times between A and B, and hypothetical explanations of their history, VII (hybridization/lateral gene transfer) and VIII (gene duplication preceding the divergence of A, B, and C), explaining V and VI, respectively, assuming that the other gene tree (VI and V, respectively) reflects the “true” species tree. Letters in gray indicate extinctions/nonsampled gene copies.

Provided that the gene trees have a common root, the relative ages of nodes in the different trees can be assessed. This can give us the order in time of the interior nodes of the tree, making it possible to compare gene trees in the way described above. Reconciling discordant trees and identifying the underlying processes of gene duplication, gene loss, and horizontal gene transfer are currently an area of active research (e.g., ´ Sang and Zhong 2000; V’yugin et al. 2002; Gorecki 2004; Hallett et al. 2004). Ideally, a model should be able to consider these processes simultaneously while inferring the species tree from multiple gene trees. However, we are not aware of any successful implementation of this. Our aim here is to demonstrate the usefulness of the relative ages in a collection of gene trees to discriminate between duplication and horizontal events.

330

SYSTEMATIC BIOLOGY

Popp and Oxelman (2004) devised a protocol for polymerase chain reaction (PCR) amplification and sequencing of several presumably unlinked nuclear DNA regions from the gene family encoding the second largest subunit of the RNA polymerases (RNAP) and applied it to the tribe Sileneae (Caryophyllaceae), a monophyletic group supported by nrDNA and cpDNA sequences (e.g., Oxelman and Lid´en 1995; Oxelman et al. 1997). A phylogenetic analysis of the combined RNAP DNA regions together with cpDNA and nrDNA internal transcribed spacer (ITS) sequences resulted in a well-resolved and supported phylogeny of Sileneae. Erixon and Oxelman (2008) thoroughly analyzed approximately 25 kb of cpDNA sequence data for roughly the same set of taxa, which resolved most branches of the tree with high support. However, Heliosperma (Rchb.) Rchb., nom. cons. prop. (Frajman and Rabeler 2006), was not included in the RNAP study and represented by a single sequence (Heliosperma alpestre) in the cpDNA study. Heliosperma includes several diploid (2n = 24) taxa from the Southern and Central European mountains (e.g., Jalas and Suominen 1986). Monophyly of Heliosperma, clearly separated from the core of Silene, is supported by ITS and cpDNA rps16 intron sequences, but its phylogenetic position remains uncertain (Frajman and Oxelman 2007). Within Heliosperma, three major clades were resolved with strong support, corresponding to H. alpestre, H. macranthum, and the H. pusillum group. The phylogenetic relationships among these three differ between the nuclear and the plastid data sets, possibly as a result of ancient hybridization or orthology/paralogy conflation. Also within the H. pusillum group, the plastid and nuclear data revealed incongruent patterns, with a clear geographical division into a Western and an Eastern group in the plastid data. These were hypothetically explained by ancient (2–20 Ma) divergence reflected in the cpDNA and subsequent hybridization caused by secondary contacts due to geological and climatological changes and reflected by extensive additive polymorphic sites and homoplasy in the ITS data (Frajman and Oxelman 2007). In this study, we explore the potential of multiple gene phylogenies to address the origin of Heliosperma as well as the relationships between H. alpestre, H. macranthum, and the H. pusillum group. We use the relative ages of gene tree divergences to address conflicting gene phylogenies concerning the relationships between the Heliosperma lineages and their implications for the hypothesized hybridization history within the genus and the relationships to other Sileneae taxa. Specifically, we do this by using four low-copy nuclear DNA sequence regions from the second largest subunit of the RNAP gene family (RPA2, RPB2, RPD2a, and RPD2b), as well as the psbE-petG spacer and the rps16 intron from the chloroplast genome. The ITS region is not included in this study because its presumed concerted evolution preceded by hybridization within Heliosperma makes it unsuitable for dating (Frajman and Oxelman 2007).

VOL. 58

M ATERIAL AND M ETHODS Plant Material, Sequence Regions, and DNA Extraction Taxa from all major clades of Sileneae were selected on the basis of previous studies (Oxelman and Lid e´ n 1995; Oxelman et al. 1997; Popp and Oxelman 2004; Frajman and Oxelman 2007). Voucher data and EMBL/ GenBank accession numbers of specimens are listed in the online Appendix 1 (http://sysbio.oxfordjournals. org/). We sequenced four RNAP regions, three of them (RPA2, RPD2a, and RPD2b) corresponding to the regions used in the study by Popp and Oxelman (2004), whereas the RPB2 region was extended and includes the sequence corresponding to the region between the middle of exon 18 and the beginning of exon 24 in Arabidopsis (GenBank accession no. AL035527). The chloroplast regions used in our study were the psbE-petG spacer region and the rps16 intron (see e.g., Oxelman et al. 1997; Frajman and Oxelman 2007; Erixon and Oxelman 2008). For dating purposes, we used also the chloroplast matK region (online Appendix 2). The taxon sampling in different DNA regions used is more or less identical, with some exceptions (see online Appendix 1), for example, all RPD2b sequences of the taxa belonging to the Silene subg. Silene are missing as this RPD2 paralogue probably is extinct in this lineage (Popp and Oxelman 2004). Total genomic DNA was extracted from herbarium specimens or silica gel–dried material following the protocol of Oxelman et al. (1997) and purified using the QIAquick purification kit protocol (Qiagen, Alameda, CA) or GFX PCR DNA and Gel Band Purification Kit (Amersham Biosciences, Piscataway, NJ).

PCR, Cloning, Sequencing, and Contig Assembly PCRs were performed as described by Popp and Oxelman (2004). Additionally, some amplifications were performed using the Phusion High-Fidelity PCR kit. The psbE-petG region was amplified (and sequenced) using the psbE-RF and petG-R primers and the same PCR conditions as Erixon and Oxelman (2008). The rps16 intron sequences were amplified and sequenced as described by Oxelman et al. (1997) and Frajman and Oxelman (2007). For the RNAP regions, Sileneae-specific primers (Popp and Oxelman, 2004) were used, mainly applying the reaction conditions described in Popp and Oxelman (2004). Additional H. pusillum/macranthumspecific primers were designed for amplification of the RPD2a region (Table 1) because amplification with Sileneae-specific primers (that successfully amplified this region in H. alpestre) was not successful in H. pusillum and H. macranthum. The RPB2 region was amplified and sequenced between exons 18 and 24 for RPB2, using the RPB2SE 18F and r7586 primers (Popp and Oxelman 2004; Table 1). The matK region was amplified and sequenced using matK2.1F and matK-3.2R primers (Table 1) and applying the following PCR conditions:

2009

FRAJMAN ET AL.—RETICULATE EVOLUTION OF HELIOSPERMA

TABLE 1. Primers, not published elsewhere, designed for this study Region RPB2 RPB2 RPB2 RPB2 RPB2 RPB2 RPD2a RPD2a matK matK

Primer name RPB2 SE18F RPB2 SE23R7380 RPB2 HmacI23Ra RPB2 HmacI23Rb RPB2 HI22Ra RPB2 HI22Rb RPD2FSaHelPus RPD2RSaHelPus matK2.1F matK-3.2R

Sequence GGCAGCTTCCWGCAGGAATTGT GCGTCTCCTTCYTTACCCACATGAGCA AAGGGAAGCACGCCATAAGCA AAGGGATGCACCATAAGCA GATTATTGAGAGCTGATCAGG AATTATTGAGAGCTGATCAGC GGTATCCATTTAMGACTKTCYTTTG TCGACAGAATCCAGCCCTGCA CCTATCCATCTGGAAATCTTAG CTTCCTCTGTAAAGAATTC

one cycle of 94◦ C 1 min, followed by 40 cycles of 94◦ C 30 s, 53◦ C 40 s, 72◦ C 40 s, and 72◦ C 5 min. PCR products were purified using Multiscreen PCR (Millipore, Billerica, MA) and then sequenced with the PCR or nested primers (Popp and Oxelman 2004; Table 1), using cycle sequencing terminator kits provided by Amersham Pharmacia Biotech (APB, Piscataway, NJ) or Applied Biosystems (AB, Foster City, CA), and visualized on a MEGABace 1000 (APB) or on ABI 3700/3730XL (AB). In most cases, we obtained unambiguous sequences, but in some cases direct sequencing resulted in polymorphic sequences. In order to recover single RPA2 sequences from the H. pusillum group, we designed internal sequencing primers, with the 3’-end being specific for polymorphic regions/sites. To survey for the presence of sequence variants, amplification products of the RNAP introns from some Heliosperma accessions of all three main lineages were cloned with the TOPO TA Cloning kit (Invitrogen, Carlsbad, CA) according to the manufacturer’s manual, using half-volume reactions. Between 20 and 40 positive colonies from each reaction were screened by direct PCR using the T7 and M13R universal primers. Between 10 and 20 PCR products from individual colonies were then sequenced using either the T7 or the M13R primer. The quality of these sequences was usually high, therefore both strands were sequenced only if it was not possible to read the entire sequence of the cloned fragment with a single sequencing primer. On the basis of these sequences, RPB2 paralogue-specific PCR/sequencing primers were designed (Table 1). To search for paralogues/gene copies within the accessions belonging to the three main lineages of Heliosperma, all possible PCR/sequencing forward and reverse primer combinations were used in amplifications, but only the copies presented here were found. Contigs were assembled and edited using Sequencher 3.1.1 (Gene Codes Corporation). Base polymorphisms were coded using the NC-IUPAC ambiguity codes. A polymorphism was coded when more than one peak was present at the same position in the electropherogram using the same criteria as described by Frajman and Oxelman (2007). If different sequences were obtained from clones of the same accession, they were all included in preliminary phylogenetic analyses. Polymorphism consensus sequences were constructed from clones from the same plant that formed monophyletic

331

groups. In the final phylogenetic analyses, only the consensus sequences were used from these plants. The sequences were aligned manually using Se-Al (Rambaut 1996) and BioEdit (Hall 1999) following the criteria described by Popp and Oxelman (2004). Alignments are available on TreeBASE (study number SN4178). Gaps (indels) were coded as binary charac¨ ters using SeqState version 1.25 (Muller 2005) applying simple gap coding (Simmons and Ochoterena 2000). To screen for putative recombination breakpoints, Genetic Algorithm Recombination Detection (GARD; Kosakovsky Pond et al. 2006) was used online (http:// www.datamonkey.org/GARD). We used the GARD detection method using nucleotide substitution models as suggested by the model selection tool on the GARD Web page for each data set, with gamma substitution rate variation along the sequence alignment. GARD can detect recombination between ancestral sequences and identify nonrecombinant and recombinant sequences. It outperforms other methods in terms of levels of Type I and Type II error (Kosakovsky Pond et al. 2006) and can employ complex models of substitution. Phylogenetic Analyses Maximum parsimony (MP) analyses as well as maximum parsimony bootstrap (MPB) analyses of all RNAP data sets as well as of chloroplast data set (concatenated rps16 and psbE-petG sequences) were performed using PAUP* version 4.0b10 (Swofford 2002). The most parsimonious trees were searched for heuristically with 100 replicates of random sequence addition, tree bisection reconnection (TBR) swapping, and the multrees option on. In cases where these search settings resulted in large numbers of trees, we performed swapping on a maximum of 1000 trees (nchuck = 1000). All characters were equally weighted and unordered. The data were bootstrapped using full heuristics, 1000 replicates, TBR branch swapping, multrees option off, and random addition sequence with five replicates. Agrostemma githago was used as the out-group for rooting, based on previous studies (Oxelman and Lide´ n 1995; Oxelman et al. 1997). Bayesian analyses were performed on the RNAP, cpDNA, and Caryophyllaceae matK data sets using MrBayes version 3.1 (Huelsenbeck and Ronquist 2001; Ronquist and Huelsenbeck 2003). The data were partitioned into a nucleotide set and an indel set, which was treated as morphological data according to the model of Lewis (2001). Each data set was analyzed with the default prior distributions and applying a model of evolution proposed by MrAIC.pl version 1.4 (Nylander 2004) using the Akaike criterion. The Markov chain Monte Carlo (MCMC) chains were run for 1 000 000 generations. Every 100th tree was saved, resulting in 10 000 trees for each data set, of which the first 2000 were discarded as conservative characterization as the burn-in phase. The AWTY (Are We There Yet?) system (Wilgenbusch et al. 2004; Nylander et al. 2008) was used to explore topological convergence in the MCMC runs.

332

SYSTEMATIC BIOLOGY

VOL. 58

As a null hypothesis, we expect that the individual gene trees all reflect the same phylogenetic tree. Gene copies in aberrant positions were identified as strongly supported if there was mutual conflict, where MPB > 75% and posterior probability (PP) > 0.95. We regard this as a powerful method because the topological location of the conflict is directly identified. We also use these support levels to identify well-supported branches in the individual gene trees. We used DupTree version 1.48 (Wehe et al. 2008) with full queue (500) heuristics to infer a species tree that minimizes the number of gene duplications. Binary gene trees from cpDNA, RPA2, RPB2, RPD2a, and RPD2b were used as input. All optimal trees were output; PAUP* version 4.0b10 (Swofford 2002) was used to determine the number of distinct topologies with the “condense collapse = no” command. Primetv (Sennblad et al. 2007) was used to visualize the gene trees reconciled in the species tree using the Web service at http://prime.sbc.su.se/cgi-bin/primetv.cgi.

parameters, and an uncorrelated relaxed lognormal clock (Drummond et al. 2006). The prior age of the Alsinoideae/Caryophylloideae was set using a lognormal distribution with mean = 0.0, standard deviation = 1.0, and offset = 34 Myr (i.e., end of Eocene), thus setting a hard lower (younger) bound to the age of the group of 34 Myr. Two independent MCMC chains were run for 30 000 000 generations with tree and parameter values saved every 3000th generation. The dating of the Sileneae RNAP and cpDNA data sets was performed using the same methods as for dating the matK tree (see above). The trees were rooted with the out-group (A. githago), forcing the in-group to become a monophyletic sister group. The median age of the in-group crown group obtained from the matK analysis (16.5 Myr) was used as a fixed age to calibrate the gene trees in the r8s and PATHd8 analyses. For BEAST, the prior age of the root was set to the date obtained from the matK analysis (20.81 Myr), with a normally distributed standard deviation of 2.9, which corresponds well to the combined 95% highest posterior densities Dating (HPD) from the two BEAST runs (15.14–26.49 Myr). We estimated the age of the tribe Sileneae using the Tracer version 1.3 (Rambaut and Drummond 2004) was chloroplast matK gene sequence data by Fior et al. used to determine the degree of mixing, the shape of (2006), complemented by some additional Sileneae the probability density distributions, and 95% credibiltaxa (online Appendix 2), using PATHd8 (Britton et al. ity intervals for estimated divergence dates. Effective 2006, 2007), r8s version 1.7 (Sanderson 2003), and BEAST sample sizes (ESS), that is, the number of effectively version 1.4.7 (Drummond and Rambaut 2007). For the independent draws from the posterior distribution, and dating performed with r8s, we used the penalized like- mixing, that is, efficiency with which MCMC algorithm lihood (PL) method (Sanderson 2002). The node ages in samples a parameter (Drummond and Rambaut 2007), PATHd8 were estimated by their corresponding mean in all BEAST analyses were good (i.e., ESS always expath length (MPL), that is, the average of all path lengths ceeded 100 and the parameter traces showed the chain from the node to the terminals descended from that fluctuating rapidly around the equilibrium). FigTree (Rambaut 2006) was used to display the trees node (Britton et al. 2002). For calibration of the Caryophyllaceae matK tree, we (one for each data set) after combining the tree files used the inflorescence fossil (Caryophylloflora paleogenica using LogCombiner and summarizing the informaG. J. Jord. & Macphail) described by Jordan and Macphail tion using TreeAnnotator (both programs available in (2003), which they determine as belonging to Caryophyl- BEAST package). In order to check whether the PPs laceae. More specifically, they conducted a phylogenetic were actually affected by the data, we also ran analyses analysis based on morphological characters scored for in which BEAST only sampled from the prior distribuboth fossil and extant species and molecular charac- tions (Drummond et al. 2006). The resulting ages from ters for the extant species and position the fossil as these runs were in general well outside the range of ages sister to or within the subfamilies Alsinoideae and from the runs with data, indicating informativeness of Caryophylloideae. When dating the matK-tree with r8s the data. and PATHd8, we used the extremes of the age interval 34–45 Ma, which corresponds to the middle to late R ESULTS Eocene age of the fossil (Gradstein et al. 2004) as fixed ages for the Alsinoideae/Caryophylloideae node. The No evidence for recombination could be found by the majority rule tree with mean branch lengths obtained GARD analyses. Consequently, the tree model is considfrom posterior distributions of the MCMC analyses with ered appropriate for all the individual gene data sets, MrBayes was used as input to r8s and PATHd8. The and no sequences were removed from the alignments. BEAST analyses of the matK data set were performed The estimated median age of Sileneae from the BEAST with a Yule tree prior, GTR+Γ substitution model analyses of the matK data set is 20.81 Myr (Fig. 2; 95%

FIGURE 2 (next page). Bayesian consensus chronogram (obtained with BEAST) of the Caryophyllaceae matK sequences, with the fixage node marked (from the age of the inflorescence fossil Caryophylloflora paleogenica). Caryophylloideae and Alsinoideae are shaded. Numbers below branches indicate PPs obtained from Bayesian analyses. Only values above 0.70 are shown. Numbers (in bold) associated with nodes in the chronogram indicate the median crown group age in million of years of the clade above that node. The calculated age of the tribe Sileneae is indicated.

2009

FRAJMAN ET AL.—RETICULATE EVOLUTION OF HELIOSPERMA

F IGURE 2.

333

334

VOL. 58

SYSTEMATIC BIOLOGY

HPD: 15.13–26.5 Myr). The age of Sileneae estimated by PL/MPL was 23/20 or 31/26 Myr when using the calibration extremes of 34 or 45 Myr, respectively. Phylogenetic Relationships and Relative Dating Gene-specific information about the size of the alignments, parsimony metrics, the model of evolution proposed by MrAIC, as well as the coefficient of variation from BEAST analyses are presented in Table 2. The coefficient of variation gives an indication of how clock-like the data are. Values close to 0.0 indicate that the data are clock-like, whereas values much greater than 1.0 indicate substantial rate heterogeneity among lineages (Drummond et al. 2006). The PPs for the clades obtained by MrBayes and BEAST analyses were similar in most cases, with an exception in the RPA2 data (0.51/1.0 PP of the H. macranthum/H. alpestre clade obtained by BEAST/MrBayes analyses). The ages of selected clades as inferred from the three dating methods (PATHd8, r8s, and BEAST) are presented in Figure 3; the age estimation of all major groups is available in the online Appendix 3. The dates estimated with the three methods are largely correlated, but some differences can be observed (Fig. 3, online Appendix 3). However, the HPDs of the age estimations in BEAST, that is, the credible sets that contain 95% of the sampled values, are rather broad, up to almost 16.4 Myr for the in-group age estimations (see online Appendix 3). cpDNA The cpDNA tree (Fig. 4) suggests a sister group position of Heliosperma to a moderately supported (MPB 85%, 0.92 PP) clade including Eudianthe, Viscaria, and Atocion. Interspecific relationships within Heliosperma are congruent with the rps16 phylogeny in Frajman and Oxelman (2007): H. macranthum is sister to the H. pusillum group, and they constitute a sister group to H. alpestre. The age of the in-group estimated with

BEAST is c. 17.1 Myr, whereas the age of Heliosperma estimated with all three methods is c. 9.9 (BEAST) to 12.4 (MPL) Myr. The age estimation of the H. pusillum group is 3.8 (BEAST) to 4.5 (MPL), whereas the age of the Western H. pusillum group varies between 0.9 (BEAST) and 1.4 (PL) Myr and the Eastern group varies between 2.5 and 3.2 Myr. RPA2 All Heliosperma RPA2 sequences form a strongly supported monophyletic group (Fig. 5). The Heliosperma sequences are strongly supported as sister to the Viscaria and Atocion sequences, all three forming a monophyletic group with Petrocoptis and Eudianthe (MPB 84% and 0.97 PP). Heliosperma macranthum and H. alpestre form a sister clade to the sequences from the H. pusillum group, where two putative paralogous copies are found. Heliosperma macranthum possesses one extra copy, which is sister to one of the two H. pusillum group paralogues (MPB 90% and 0.97 PP). The age estimations of the Heliosperma sequences range between 6.3 (BEAST) and 7.8 (PL) Myr. RPB2 The RPB2 phylogeny identifies one group of Heliosperma sequences (Fig. 6) that share a common origin with the Viscaria and Atocion sequences but with moderate support (MPB 76% and 1 PP). These Heliosperma sequences and A. rupestre form a monophyletic group with weak support and poor resolution within. Heliosperma alpestre and H. macranthum possess a second copy of RPB2, which groups confidently with Eudianthe (MPB 100%, 1 PP) and these together form a clade that is sister to Petrocoptis with strong support (MPB 100% and 1 PP). The median age estimation of the in-group with BEAST (13.4 Myr) is considerably lower compared with the other gene trees, where this age is 15.2 (RPA2) to 17.1 Myr (cpDNA).

TABLE 2. Matrix and phylogenetic analyses statistics for the different sequence regions analyzed, and substitution models proposed by MrAIC, and posterior coefficient of variation distribution from BEAST analyses Region Number of terminals Number of included characters (nucleotides/indels) Number/percentage of parsimony informative characters Number/length of MP trees Consistency index (excluding uninformative characters)/retention index Substitution model Coefficient of variation—median (95% lower, upper HPD)

cpDNA 38

RPA2 54

RPB2 44

RPD2a 38

RPD2b 34

3180 (2881/299)

3977 (3839/138)

2120 (1897/223)

3311 (3172/139)

2164 (2048/116)

525/16.5% 12/1494

328/8.2% 97000/905

593/28.0% 49191/1519

315/9.5% 50/794

252/11.6% 162/774

0.747 (0.620)/0.849 GTR+Γ

0.785 (0.689)/0.879 GTR+Γ

0.777 (0.700)/0.828 GTR+Γ

0.822 (0.741)/0.892 GTR+Γ

0.832 (0.719)/0.851 GTR+Γ

0.486 (0.339, 0.668)

0.646 (0.468, 0.854)

0.612 (0.421, 0.834)

0.507 (0.316, 0.736)

0.663 (0.431, 0.937)

2009

FRAJMAN ET AL.—RETICULATE EVOLUTION OF HELIOSPERMA

335

(Frajman and Oxelman 2007). RPA2 is duplicated in the H. pusillum group, and there is one extra H. macranthum copy sister to one of the H. pusillum copies. In RPD2a, H. alpestre groups with Petrocoptis and does not form a monophyletic group with the other Heliosperma taxa. Monophyletic Heliosperma groups are associated with Viscaria and Atocion in RPA2 and RPB2 with strong and moderate support, respectively. A similar pattern is seen in the cpDNA data, with the difference that the relationship to Eudianthe is uncertain. The position of the root of the in-group varies among the data sets and is weakly supported, except in the cpDNA, where Petrocoptis forms a sister lineage to a clade consisting of two subclades, one including Lychnis and Silene, the other including Eudianthe, Heliosperma, Viscaria, and Atocion. FIGURE 3. Relative comparison of the ages in million years obtained from PATHd8, r8s, and BEAST for the groups discussed (see Discussion). The in-group age 16.5 Myr was set as a calibration point in the r8s and PATHd8 analyses.

RPD2a A single RPD2a sequence was amplified from each of the Heliosperma accessions. The relationships among basal branches of this gene tree are poorly resolved (Fig. 7). Heliosperma alpestre is sister to Petrocoptis (MPB 88% and 1 PP), whereas H. macranthum is sister to the H. pusillum group (MPB 100% and 1 PP). Viscaria vulgaris is sister to Atocion with strong support (MPB 100% and 1 PP). The age of the in-group estimated with BEAST is c. 15.5 Myr. RPD2b The relationships among basal branches in the RPD2b tree are poorly resolved (Fig. 8). The Heliosperma sequences form a strongly supported group, but its location in the tree is ambiguous. One RPD2b sequence per Heliosperma accession was obtained, and H. alpestre/ H. macranthum form a monophyletic group (MPB 76% and 1 PP). The age estimation of the in-group with BEAST is 15.8 Myr and for Heliosperma it varies between 6.3 (BEAST) and 6.9 (PL) Myr. Congruence Assessment The relationships among the main groups of Sileneae and Heliosperma and the median ages from the BEAST analyses are summarized in diagrams in Figure 9, where all the nodes with support lower than MPB 75% and 0.95 PP are collapsed. The Duptree analysis for these taxa resulted in 21 distinct optimal species trees that each requires 11 gene duplications to reconcile the strongly supported incongruences among the five data sets. The H. macranthum cpDNA sequences are strongly supported as sister to the H. pusillum group sequences, whereas RPA2 and RPD2b group H. macranthum and H. alpestre together as sister to the H. pusillum group, which is in agreement with nrDNA ITS sequences

D ISCUSSION Our results demonstrate intricate phylogenetic patterns from the nuclear RNAP genes. There is topological incongruence not only with the chloroplast data but also among the nuclear genes. The situation is further complicated by the fact that there is often more than one sequence type per gene region. We will discuss how relative dates can be used to interpret our results; we believe that the reasoning could be extended to other examples. The major aim of our study was to infer the phylogenetic position of Heliosperma within the tribe Sileneae and to test the hypothesis of reticulate evolution among its three major lineages put forward by Frajman and Oxelman (2007). We will therefore focus on the phylogenetic relationships among the genera that show affinities with Heliosperma in our gene trees (Figs 4–8), namely Atocion, Eudianthe, Petrocoptis, and Viscaria. The relationships within Silene and Lychnis based on these genes and a similar sampling of taxa were discussed by Popp and Oxelman (2004). Comparison of Results from Three Dating Methods We have used three dating methods to infer divergence times in the gene trees (Fig. 3, online Appendix 3). The Bayesian approach (as implemented in BEAST) takes topological and branch length uncertainties into account, as opposed to our PL and MPL analyses, where ages are estimated using a single input tree. The BEAST method also differs by the way the trees are calibrated, which was made at the root of Sileneae. The age of the in-group (Sileneae excluding Agrostemma) had to be fixed in the r8s and the PATHd8 analyses. Thus, it may be argued that the results are not strictly comparable. However, our main goal is to make relative age comparisons, and for this we need only identify relative age differences that are demonstrably larger than the coalescence time intervals. The median age estimations of Sileneae and the in-group obtained from the BEAST analyses of the matK data set were arbitrarily

336

SYSTEMATIC BIOLOGY

VOL. 58

FIGURE 4. Bayesian consensus chronogram (obtained with BEAST) of the cpDNA data set (rps16 intron and psbE-petG spacer). Numbers/letters next to the taxon names correspond to accession identifiers in Table 1. Numbers below (above) branches indicate parsimony bootstrap percentages (MPB)/PPs obtained from Bayesian analyses. Only values above 60%/0.70 are shown. Numbers (in bold) associated with nodes in the chronogram indicate the mean crown group age in million of years of the clade above that node. A black dot on a node represent strong support (MPB > 75%/PP > 0.95) for the clade above that node.

chosen for calibration purposes in the analyses of the RNAP and cpDNA data sets. The inferred absolute ages are strictly speaking estimates of the cpDNA splits and not of “species” lineage splits. In principle, we could have set the root age to some arbitrary value, but we think that our approach provides a starting point for further absolute dating of the phylogenetic history of Sileneae. Agrostemma githago was used as the out-group. It is an annual plant and there is some evidence that annuals tend to have higher substitution rates than perennials (e.g., Andreasen and Baldwin 2001), which might be due to the shorter generation times of the former (Smith and Donoghue 2008). Agrostemma is rather divergent from the in-group, and the sequences are difficult to align in some cases, especially in the case of RPA2, and to some extent also in the RPD2 regions, where the alignable part with Agrostemma is short. It is easy to imagine that this can be a source of large stochastic effects. Unfortunately, the sequences of closest relatives of Sileneae,

the Caryophylleae (e.g., Dianthus and Saponaria) are to a large extent not alignable in the intron regions of the RNAP genes used here, which is the reason why the age of the root could not been used for calibrating the trees in the PATHd8 and r8s analyses, where an additional out-group is needed to assign branch lengths from the root. A less desirable property of the MPL method is that negative branch lengths may occur. In such cases, PATHd8 assigns a zero branch length. An example of this is the Petrocoptis/Eudianthe/H. macranthum/ H. alpestre clade in the RPB2 tree (Fig. 6). The PL method penalizes rate changes between ancestral and descendant branches, which may give rise to more clock-like trees than those produced by BEAST, which does not assume autocorrelation. In our analyses, the covariance is close to zero, which suggests that there is no strong evidence of autocorrelation in the data (Drummond and Rambaut 2007), that is, rates are not “inherited” from ancestor lineages to their immediate descendants.

2009

FRAJMAN ET AL.—RETICULATE EVOLUTION OF HELIOSPERMA

337

FIGURE 5. Bayesian consensus chronogram (obtained with BEAST) of the RPA2 sequences. Numbers/letters next to the taxon names correspond to specimens in Table 1. Numbers/letters next to the taxon names correspond to accession identifiers in Table 1. Numbers below (above) branches indicate parsimony bootstrap percentages (MPB)/PPs obtained from Bayesian analyses. Only values above 60%/0.70 are shown. Numbers (in bold) associated with nodes in the chronogram indicate the mean crown group age in million of years of the clade above that node. A black dot on a node represent strong support (MPB > 75%/PP > 0.95) for the clade above that node.

For the sake of clarity, we will from here on refer only to the median ages estimated with BEAST. Our estimates of the absolute divergence ages show that the Heliosperma lineage probably originated during middle to late Miocene. The estimated absolute age for diversification of the H. pusillum group genes is 2.0–6.2 Myr. Thus, the suggestion of a pre-Pleistocene origin for this group (Frajman and Oxelman 2007) is supported.

Using Relative Age Differences to Exclude Lineage Sorting as an Explanation for Incongruences Among the Three Heliosperma Lineages Conservative coalescence times (in the sense that they favor accepting the null hypothesis of lineage sorting as an explanation for incongruence) for the bifurcations observed in our gene trees can be calculated using large populations sizes and generation times (Maddison and

338

SYSTEMATIC BIOLOGY

VOL. 58

FIGURE 6. Bayesian consensus chronogram (obtained with BEAST) of the RPB2 sequences. Numbers/letters next to the taxon names correspond to accession identifiers in Table 1. Numbers below (above) branches indicate parsimony bootstrap percentages (MPB)/PPs obtained from Bayesian analyses. Only values above 60%/0.70 are shown. Numbers (in bold) associated with nodes in the chronogram indicate the mean crown group age in million of years of the clade above that node. A black dot on a node represent strong support (MPB > 75%/PP > 0.95) for the clade above that node.

Knowles 2006). Although very little is known about effective population sizes and generation times for Heliosperma, setting effective population size (N) = 10 000 and generation time (t) = 5 years potentially allows for stringent discrimination between incomplete lineage sorting and other causes of gene tree incongruence. Although members of Heliosperma are perennial plants, they usually flower and set seed the first year. Effective population sizes can be calculated from gene trees using coalescent theory. It is not an easy task however, and several simplistic assumptions have to be made. For example, the BEST method (Liu and Pearl 2007), which ignores hybridization and gene duplications, can be used to estimate ancestral effective population sizes. Unfortunately, these estimates tend to be heavily influenced by the prior distributions (Liu

and Pearl 2007). The sizes of extant spatially restricted groups of plants could be used as crude guesses, but the relationships between extant and ancestral effective sizes are unknown. Extant population sizes, defined spatially, are probably much smaller than 10 000 individuals, in particular for H. macranthum, as estimated in the field. Hudson and Coyne (2002) showed that it takes roughly 9–12 N generations for 95% of nuclear loci to show reciprocal monophyly in sister species. In the absence of other dating errors, we regard differences in divergence times larger than 500 000 years (10Nt), to be unlikely as attributable to incomplete lineage sorting of nuclear alleles. For organelle DNA, the corresponding interval is 250 000 years, given that the population size for a haploid gene in a hermaphroditic population is 2N.

2009

FRAJMAN ET AL.—RETICULATE EVOLUTION OF HELIOSPERMA

339

FIGURE 7. Bayesian consensus chronogram (obtained with BEAST) of the RPD2a sequences. Numbers/letters next to the taxon names correspond to accession identifiers in Table 1. Numbers below (above) branches indicate parsimony bootstrap percentages (MPB)/PPs obtained from Bayesian analyses. Only values above 50%/0.70 are shown. Numbers (in bold) associated with nodes in the chronogram indicate the mean crown group age in million of years of the clade above that node. A black dot on a node represent strong support (MPB > 75%/PP > 0.95) for the clade above that node.

Frajman and Oxelman (2007) suggested that the observed incongruences between cpDNA and nrDNA ITS trees regarding the relationships among H. alpestre, H. macranthum, and the H. pusillum group could be due to either incomplete lineage sorting or ancient hybridization. The RPA2 and RPD2b trees (Figs 5 and 8) agree in part topologically with the ITS tree (Frajman and Oxelman 2007), whereas the RPB2 tree (Viscaria/Atocion/ Heliosperma clade) is poorly resolved. Heliosperma alpestre groups with Petrocoptis in the RPD2a tree. Incomplete lineage sorting can be rejected as a plausible explanation because of the large differences between the divergence times. The difference of 1.3–2.0 Myr between the divergence of Heliosperma and the divergence of H. alpestre/H. macranthum estimated from the nuclear gene trees (RPA2, RPD2b) is outside of the estimated conservative coalescence time of 250 000 years for chloroplast genes, and the difference of c. 1.3 Myr in the divergence of Heliosperma and the divergence of the H. pusillum group/H. macranthum estimated from the cpDNA is greater than 500 000 years, which is the

estimated conservative coalescence time for the nuclear genes applied here. Chloroplast Capture Is Not Responsible for Observed Incongruence Among the Three Heliosperma Lineages Chloroplast capture via hybridization, that is, recent introgression of chloroplasts from one species into another, is often a preferred explanation for incongruence between chloroplast and nuclear gene trees (e.g., Rieseberg and Soltis 1991; McKinnon et al. 2001; Tsitrone et al. 2003; Okuyama et al. 2005). It is possible that the sister–group relationship between H. alpestre and H. macranthum shown by the nuclear gene trees (RPA2, RPD2b, ITS) reflects the “species” phylogeny and that H. macranthum has captured plastids of the H. pusillum lineage (Fig. 10a). However, the relatively young age of the H. macranthum stem lineage in the nuclear genes (Fig. 10b), compared with the chloroplast genes (Fig. 10c), rejects this explanation. If we assume that the chloroplast tree represents the “species

340

SYSTEMATIC BIOLOGY

VOL. 58

FIGURE 8. Bayesian consensus chronogram (obtained with BEAST) of the RPD2b sequences. Numbers/letters next to the taxon names correspond to accession identifiers in Table 1. Numbers below (above) branches indicate parsimony bootstrap percentages (MPB)/PPs obtained from Bayesian analyses. Only values above 50%/0.70 are shown. Numbers (in bold) associated with nodes in the chronogram indicate the mean crown group age in million of years of the clade above that node. A black dot on a node represent strong support (MPB > 75%/PP > 0.95) for the clade above that node.

tree” (cf. Fig. 1), we may instead explain the incongruent pattern using the reasoning in graph VII in Figure 1 (see introduction), that is, there must have been hybridization going on between the H. alpestre and H. macranthum lineages, with subsequent extinction of the “original” H. alpestre nuclear copies. The younger age of Heliosperma seen in the nuclear chronograms is in line with this. Moreover, chloroplast gene duplications and interlineage recombinations are rare (Rice and Palmer 2006; but see Erixon and Oxelman 2008). Therefore, we favor explanations that do not require cpDNA gene duplications. From Figure 1 we can predict that the age of the H. macranthum/H. pusillum clade in the cpDNA tree should agree with the age of Heliosperma in the nuclear data. However, the H. macranthum/H. pusillum clade was estimated to be considerably older in the cpDNA tree than Heliosperma in the nuclear trees (Fig. 3). Apart from errors in the dating analysis, this discrepancy could possibly be explained by incomplete concerted evolution between the parental copies/alleles in the hybrid (Wendel and Doyle 1998). Concerted evolution will tend to homogenize different sequences, resulting in underestimation of divergence times (Teshima and

Innan 2004). This is likely to have taken place in the ITS sequences of the H. pusillum group, where extensive hybridization appears to have taken place (Frajman and Oxelman 2007). However, concerted evolution of RNAP genes remains speculative because no traces of recombination (i.e., concerted evolution; Teshima and Innan 2004) have been found, despite thorough manual and automated checking using GARD (see Results) and also various methods implemented in the software RDP2 (Martin et al. 2005; results not shown). Gene duplication of RPA2 has likely occurred in the evolutionary history of the H. pusillum group, where two RPA2 copies (Fig. 5) form two well-supported clades. A gene duplication preceding the diversification of the H. pusillum group is supported because all sampled specimens have two sequence copies, one in each of the two major clades of the H. pusillum group sequences. Heliosperma macranthum possesses an additional RPA2 sequence-type, which is sister to one of the two H. pusillum-group paralogues. These must be paralogues and not alleles because both sequence types are present in two variants per plant. The divergence time between the first H. macranthum copy and its sister copy of the H. pusillum group (3.3 Ma) is younger than the

2009

FRAJMAN ET AL.—RETICULATE EVOLUTION OF HELIOSPERMA

341

FIGURE 9. Summary diagrams presenting relationships among the main groups of Sileneae. All nodes in the trees with support lower than MPB 75% and 0.95 PP are collapsed here. The numbers associated with the nodes are the median ages in million years of the clades inferred with BEAST.

divergence time between the other H. macranthum copy and H. alpestre (5 Ma). If we disregard lineage sorting as a plausible explanation, a hybridization event between the H. pusillum and H. macranthum lineages is a more parsimonious explanation than gene duplication before the diversification of Heliosperma RPA2 sequences, as this would require a single event (hybridization) rather than an additional gene duplication and at least three gene losses. However, extinction of one of the duplicated gene copies due to selection or drift might not be

very “costly” from a biological perspective, as most duplicated genes are lost within a few million years (Lynch and Conery 2000). Moreover, as the posterior intervals of the age estimates are large, rejection of the duplication/loss hypothesis of RPA2 is not strongly supported. The RPD2a tree (Fig. 7) shows a different incongruent pattern. The sister–group relationship between the H. macranthum and the H. pusillum group sequences is congruent with the cpDNA tree. However, H. alpestre groups with Petrocoptis, a topology not seen in any

FIGURE 10. Hypothetical scenario that would support chloroplast capture within Heliosperma (a). The arrow indicates hypothetical hybridization and capture of cpDNA. (b, c) Relationships inferred by nrDNA and cpDNA in Heliosperma, respectively. Dashed line represents the age of H. macranthum stem lineage inferred by nuclear DNA. The numbers associated with the nodes are the median ages in million years of the clades inferred with BEAST. Crossed circle and gray font represent extinction (or nonsampled gene copies). A, Heliosperma alpestre; M, Heliosperma macranthum; P, Heliosperma pusillum.

342

VOL. 58

SYSTEMATIC BIOLOGY

other gene tree. The RPD2a divergence time between Petrocoptis and H. alpestre (8.2 Ma) is younger than the age of Heliosperma inferred by cpDNA (9.9 Myr; online Appendix 3). One hypothetical explanation is ancient hybridization between Heliosperma and Petrocoptis, the latter serving as pollen donor. This topological pattern is not seen in any other RNAP tree, but it might have been distorted by the more recent hybridization between the H. alpestre and H. macranthum lineages discussed above (in the case of RPA2). Phylogenetic Position of Heliosperma in Sileneae Heliosperma is strongly supported as monophyletic in cpDNA, RPA2, and RPD2b trees (Figs 4, 5, 8, respectively), as well as by ITS data (Frajman and Oxelman 2007), whereas in the RPB2 and RPD2a trees (Figs 6 and 7, respectively) the Heliosperma sequences fail to form a monophyletic group. Various RNAP regions put the Heliosperma sequences together with at least two lineages. One lineage is closer to Viscaria/Atocion (supported by RPA2, partly RPB2), and the other is closer to Eudianthe and/or Petrocoptis (supported by RPB2 and RPD2a). It is noticeable that all data sets except RPD2a either support a sister group relationship of Heliosperma to Viscaria/Atocion or do not reject it strongly. The RPA2 and RPD2b phylogenies group Viscaria and Atocion as well as Petrocoptis and Eudianthe as sister groups, although the MPB support for Petrocoptis and Eudianthe is moderate (89% and 82%, respectively). In the case of cpDNA and RPD2a, the Atocion/Viscaria clade is weakly supported as sister to Eudianthe (MPB 85% and 70%, respectively), which is not in line with some other chloroplast regions, where Heliosperma is sister to Eudianthe or Atocion/Viscaria (Erixon and Oxelman 2008). Despite a very large data set of cpDNA sequences in the latter study, it is still ambiguous whether Heliosperma or Eudianthe is sister to Atocion/Viscaria. Viscaria and Atocion did not form a monophyletic group in the RPB2 tree (Fig. 6), which is in contrast to all other evidence (see Popp and Oxelman 2004; Erixon and Oxelman 2008; Frajman et al. 2009). However, this incongruence is weakly supported and could potentially be explained as sampling error. Heliosperma alpestre and H. macranthum possess two RPB2 sequence variants (Fig. 6). As in the case of the RPA2, different copies of both RPB2 paralogues were found, which are likely alleles. One of the RPB2 paralogues groups with Eudianthe, with Petrocoptis sister to this clade, whereas the other is positioned in the clade with the H. pusillum group and Atocion/Viscaria. The cpDNA sequences confidently put Petrocoptis as, together with Agrostemma, being outside the rest of the tribe (Fig. 4; Erixon and Oxelman 2008). However, it is not found in this position in any of the RNAP trees presented here, but instead is often found as sister, or closely associated, to Eudianthe. If one assumes that the chloroplast tree reflects the “species tree” in this regard, the patterns seen in the RNAP trees are possibly the result of hybridization between Petrocoptis and some

FIGURE 11. Hypothetical explanation of the phylogenetic relationships inferred by the RPB2 sequences, assuming the framework presented in Fig. 1. The arrow indicates hypothetical gene duplication. Crossed circles and gray font represent extinction (or nonsampled gene copies).

ancestor in the Eudianthe/Heliosperma/Viscaria/Atocion lineage. In contrast, the complicated relationships seen in the RPB2 tree can be explained by a tree (Fig. 11) similar to graph VIII in Figure 1, that is, the hypothesis of an ancient duplication followed by the loss of some copies. In one paralogue, Heliosperma and Atocion/ Viscaria form a monophyletic group, but the copy in Eudianthe/Petrocoptis is lost. In the other paralogue, it is not only the copy in the Viscaria/Atocion lineage that is lost but also the copy in the H. pusillum lineage. The duplication of RPB2 must then be older than the divergence between Eudianthe and the other groups, which is consistent with the chronogram in Figure 6. The reasoning depends on the validity of the assumption that the chloroplast tree splits reflect the “species tree” relationships. There is no need for chloroplast gene duplications in the particular cases discussed above, something which supports the hybridization hypothesis, but it is admittedly fragile. Future studies might test the hypothesis using additional genes and methodological approaches to infer species trees from multiple gene tree (e.g., like those of Liu and Pearl 2007). C ONCLUSIONS In this paper, we have evaluated different hypotheses about conflicting gene trees for the same taxa. Divergence times can be useful for deciding between hybridization and gene duplication/loss as the most likely explanation for topological conflicts between gene trees, as long as the absolute time differences are large enough to exclude incomplete lineage sorting of alleles. Using

2009

FRAJMAN ET AL.—RETICULATE EVOLUTION OF HELIOSPERMA

reconciliation procedures without taking times into account may lead to improbable results. For example, all 21 most parsimonious reconciliations applied to our data set favored at least one duplication of chloroplast genes, which from an empirical perspective appears unrealistic. Instead, we conclude that the incongruences seen between the cpDNA and at least some of the nuclear regions regarding the relationships among the three major groups of Heliosperma are likely to be best explained by homoploid hybridization. The pattern regarding the origin of Heliosperma is more complicated and is likely to include several events, perhaps including both gene duplications/extinctions and hybridizations. Clearly, development of more sophisticated analytical tools (e.g., Maureira-Butler et al. 2008, who present a probabilistic method to test the null hypothesis of lineage sorting) as well as more data from nuclear loci are needed in order to improve our understanding of reticulate phylogenetic histories in groups like Heliosperma. S UPPLEMENTARY M ATERIAL Supplementary material can be found at: http:// sysbio.oxfordjournals.org/. A CKNOWLEDGMENTS The Swedish Research Council for Environment, Agricultural Sciences and Spatial Planning and the Swedish Research Council kindly provided financial support for this study. The field work and analyses were partially supported by the scholarships of Helge Ax:son Johnsons Stiftelse to B. Frajman and molecular work by EMBO Short Term Fellowship ASTF 45-2005 to B. Frajman. We are most grateful to D. Petrovi´c, M. Ronikier, S. Strgulc Krajˇsek, A. Strid, and B. Surina for collecting some of the plant samples used and to the curator of the herbarium WU for providing us with specimens. Many thanks are due to N. Heidari and M. Popp who were an indispensable help in the laboratory, to C. Anderson for help with and advice concern¨ ing dating methods, and to M. Thulin, P. Sch onswetter, G. Schneeweiss, B. Pfeil, the associate editor S. Renner, and the reviewers for the comments on draft manuscript versions. R EFERENCES Andreasen K., Baldwin B.G. 2001. Unequal evolutionary rates between annual and perennial lineages of checker mallows (Sidalcea, Malvaceae): evidence from 18S–26S rDNA internal and external transcribed spacers. Mol. Biol. Evol. 18:936–944. Avise J.C., Robinson T.J. 2008. Hemiplasy: a new term in the lexicon of phylogenetics. Syst. Biol. 57:503–507. Britton T., Anderson C.L., Jacquet D., Lundqvist S., Bremer K. 2006. PATHd8—a new method for estimating divergence times in large phylogenetic trees without a molecular clock. Available from: www.math.su.se/PATHd8. Britton T., Anderson C.L., Jacquet D., Lundqvist S., Bremer K. 2007. Estimating divergence times in large phylogenetic trees. Syst. Biol. 56:741–752.

343

Britton T., Oxelman B., Vinnersten A., Bremer K. 2002. Phylogenetic dating with confidence intervals using mean path lengths. Mol. Phylogenet. Evol. 24:58–65. Cronn R., Wendel J.F. 2003. Cryptic trysts, genomic mergers, and plant speciation. New Phytol. 161:133–142. Cronn R.C., Small R.L., Wendel J.F. 1999. Duplicated genes evolve independently after polyploid formation in cotton. Proc. Natl. Acad. Sci. USA. 96:14406–14411. Doyle J.J., Doyle J.L., Rauscher J.T., Brown A.H.D. 2003. Diploid and polyploidy reticulate evolution throughout the history of the perennial soybeans (Glycine subgenus Glycine). New Phytol. 161:121–132. Drummond A.J., Ho S.Y.W., Phillips M.J., Rambaut A. 2006. Relaxed phylogenetics and dating with confidence. PLoS Biol. 4:699–710. Drummond A.J., Rambaut A. 2007. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol. Biol. 7:214. Erixon P., Oxelman B. 2008. Reticulate or treelike chloroplast DNA evolution in Sileneae (Caryophyllaceae)? Mol. Phylogenet. Evol. 48:313–325. Ferguson D., Sang T. 2001. Speciation through homoploid hybridization between allotetraploids in peonies (Paeonia). Proc. Natl. Acad. Sci. USA. 98:3915–3919. Fior S., Karis P.O., Casazza G., Minuto L., Sala F. 2006. Molecular phylogeny of the Caryophyllaceae (Caryophyllales) inferred from chloroplast matK and nuclear rDNA ITS sequences. Am. J. Bot. 93:399–411. Frajman B., Heidari N., Oxelman B. 2009. Phylogenetic relationships of Atocion and Viscaria (Sileneae, Caryophyllaceae) inferred from chloroplast, nuclear ribosomal, and low-copy gene DNA sequences. Taxon. 58(4). Frajman B., Oxelman B. 2007. Reticulate phylogenetics and phytogeographical structure of Heliosperma (Sileneae, Caryophyllaceae) inferred from chloroplast and nuclear DNA sequences. Mol. Phylogenet. Evol. 43:140–155. Frajman B., Rabeler R.K. 2006. Proposal to conserve the name Heliosperma against Ixoca (Caryophyllaceae, Sileneae). Taxon 55:807–808. ´ Gorecki P. 2004. Reconciliation problems for duplication, loss and horizontal gene transfer. In: Bourne P.E., Gusfield D., editors. RECOMB ’04: proceedings of the eighth annual international conference on research in computational molecular biology. New York: Association for Computing Machinery. p. 316–325. Gradstein F.M., Ogg J.G., Smith A.G. , editors. 2004. A geologic time scale 2004. Cambridge: Cambridge University Press. 589 p. Gross B.L., Schwarzbach A.E., Rieseberg L.H. 2003. Origin(s) of the diploid hybrid species Helianthus deserticola (Asteraceae). Am. J. Bot. 90:1708–1719. Hall T.A. 1999. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp. Ser. 41:95. Hallett M.T., Lagergren J., Tofigh A. 2004. Simultaneous identification of duplications and lateral transfers. In: Bourne P.E., Gusfield D., editors. RECOMB ’04: proceedings of the eighth annual international conference on research in computational molecular biology. New York: Association for Computing Machinery. p. 347–356. Holder M.T., Anderson J.A., Holloway A.K. 2001. Difficulties in detecting hybridization. Syst. Biol. 50:978–982. Howarth D.G., Baum D.A. 2005. Genealogical evidence of homoploid hybrid speciation in an adaptive radiation of Scaevola (Goodeniaceae) in the Hawaiian Islands. Evolution. 59:948–961. Huber K.T., Oxelman B., Lott M., Moulton V. 2006. Reconstructing the evolutionary history of polyploids from multilabeled trees. Mol. Biol. Evol. 23:1784–1791. Hudson R.R., Coyne J.A. 2002. Mathematical consequences of the genealogical species concept. Evolution. 56:1557–1565. Huelsenbeck J.P., Ronquist F. 2001. MrBayes: Bayesian inference of phylogenetic trees. Bioinformatics. 17:754–755. Jalas J., Suominen J., editors. 1986. Atlas Florae Europaeae 7, Caryophyllaceae (Silenoideae). Helsinki (Finland): Committee for mapping the flora of Europe and Societas Biologica Fennica Vanamo. p. 85–88. James J.K., Abbott R.J. 2005. Recent, allopatric, homoploid hybrid speciation: the origin of Senecio squalidus (Asteraceae) in the British Isles from a hybrid zone on Mount Etna, Sicily. Evolution. 59: 2533–2547.

344

SYSTEMATIC BIOLOGY

Joly S., Bruneau A. 2006. Incorporating allelic variation for reconstructing the evolutionary history of organisms from multiple genes: an example from Rosa in North America. Syst. Biol. 55:623–636. Joly S., Starr J.S., Lewis W.H., Bruneau A. 2006. Polyploid and hybrid evolution in roses east of the Rocky Mountains. Am. J. Bot. 93: 412–425. Jordan G.J., Macphail M.K. 2003. A middle-late Eocene inflorescence of Caryophyllaceae from Tasmania, Australia. Am. J. Bot. 90: 761–768. Kosakovsky Pond S.L., Posada D., Gravenor M.B., Woelk C.H., Frost S.D.W. 2006. Automated phylogenetic detection of recombination using a genetic algorithm. Mol. Biol. Evol. 23:1891–1901. Lewis P.O. 2001. A likelihood approach to estimating phylogeny from discrete morphological character data. Syst. Biol. 50:913–925. Linder C.R., Rieseberg L.H. 2004. Reconstructing patterns of reticulate evolution in plants. Am. J. Bot. 91:1700–1708. Liu L., Pearl D.K. 2007. Species trees from gene trees: reconstructing Bayesian posterior distributions of a species phylogeny using estimated gene tree distributions. Syst. Biol. 56:504–514. Lynch M., Conery J.S. 2000. The evolutionary fate and consequences of duplicate genes. Science. 290:1151–1155. Maddison W.P., Knowles L.L. 2006. Inferring phylogeny despite incomplete lineage sorting. Syst. Biol. 55:21–30. Martin D.P., Williamson C., Posada D. 2005. RDP2: recombination detection and analysis from sequence alignments. Bioinformatics. 21:260–262. Mason-Gamer R.J. 2004. Reticulate evolution, introgression, and intertribal gene capture in an allohexaploid grass. Syst. Biol. 53:25–37. Maureira-Butler I.J., Pfeil B.E., Muangprom A., Osborn T.C., Doyle J.J. 2008. The reticulate history of Medicago (Fabaceae). Syst. Biol. 57:466–482. McKinnon G.E., Vaillancourt R., Jackson H.D., Potts B.M. 2001. Chloroplast sharing in the Tasmanian eucalypts. Evolution. 55: 703–711. Mir C., Toumi L., Jarne P., Sarda V., Di Giusto F., Lumaret R. 2006. Endemic North African Quercus afares Pomel originates from hybridisation between two genetically very distant oak species (Q. suber L. and Q. canariensis Willd.): evidence from nuclear and cytoplasmic markers. Heredity. 96:175–184. ¨ Muller K. 2005. SeqState—primer design and sequence statistics for phylogenetic DNA data sets. Appl. Bioinform. 4:65–69. Nylander J.A.A. 2004. MrAIC.pl. Program distributed by the author. Uppsala (Sweden): Evolutionary Biology Centre, Uppsala University. Nylander J.A.A., Wilgenbusch J.C., Warren D.L., Swofford D.L. 2008. AWTY (are we there yet?): a system for graphical exploration of MCMC convergence in Bayesian phylogenetics. Bioinformatics. 24:581–583. Okuyama Y., Fujii N., Wakabayashi M., Kawakita A., Ito M., Watanabe M., Murakami N., Makoto K. 2005. Nonuniform concerted evolution and chloroplast capture: heterogeneity of observed introgression patterns in three molecular data partition phylogenies of Asian Mitella (Saxifragaceae). Mol. Biol. Evol. 22:285–296. Oxelman B., Lid´en M. 1995. Generic boundaries in the tribe Sileneae (Caryophyllaceae) as inferred from nuclear rDNA sequences. Taxon. 44:525–542. Oxelman B., Lid´en M., Berglund D. 1997. Chloroplast rps16 intron phylogeny of the tribe Sileneae (Caryophyllaceae). Plant Syst. Evol. 206:393–410. Pamilo P., Nei M. 1988. Relationships between gene trees and species trees. Mol. Biol. Evol. 5:568–583. Poke F.S., Martin D.P., Steane D.A., Vaillancourt R.E., Reid J.B. 2006. The impact of intragenic recombination on phylogenetic reconstruction at the sectional level in Eucalyptus when using a single copy nuclear gene (cinnamoyl CoA reductase). Mol. Phylogenet. Evol. 39:160–170. Popp M., Erixon P., Eggens F., Oxelman B. 2005. Origin and evolution of a circumpolar polyploid species complex in Silene (Caryophyllaceae) inferred from low-copy nuclear RNA polymerase introns, rDNA, and chloroplast DNA. Syst. Bot. 30:302–313. Popp M., Oxelman B. 2001. Inferring the history of the polyploid Silene aegaea (Caryophyllaceae) using plastid and homoeologous nuclear DNA sequences. Mol. Phylogenet. Evol. 20:474–481.

VOL. 58

Popp M., Oxelman B. 2004. Evolution of a RNA polymerase gene family in Silene (Caryophyllaceae)—incomplete concerted evolution and topological congruence among paralogues. Syst. Biol. 53: 914–932. Rambaut A. 1996. Se-Al: Sequence Alignment Editor. Available from: http://tree.bio.ed.ac.uk/software/seal/. Rambaut A. 2006. FigTree v1.0. Available from: http://tree.bio.ed.ac. uk/software/figtree/. Rambaut A., Drummond A.J. 2004. Tracer v1.3. Available from: http:// evolve.zoo.ox.ac.uk/software.html. Rice D.W., Palmer J.D. 2006. An exceptional horizontal gene transfer in plastids: gene replacement by a distant bacterial paralog and evidence that haptophyte and cryptophyte plastids are sisters. BMC Biol. 4:31. Rieseberg L.H. 1991. Homoploid reticulate evolution in Helianthus (Asteraceae): evidence from ribosomal genes. Am. J. Bot. 78: 1218–1237. Rieseberg L.H. 1997. Hybrid origins of plant species. Annu. Rev. Ecol. Syst. 28:359–389. Rieseberg L.H., Carney S.E. 1998. Plant hybridisation. New Phytol. 140:599–624. Rieseberg L.H., Soltis D.E. 1991. Phylogenetic consequences of cytoplasmic gene flow in plants. Evol. Trends Plants. 5:65–84. Rieseberg L.H., Wendel J.F. 1993. Introgression and its consequences in plants. In: Harrison R.G., editor. Hybrid zones and the evolutionary process. New York: Oxford University Press. p. 70–109. Ronquist F., Huelsenbeck J.P. 2003. MRBAYES 3: Bayesian phylogenetic inference under mixed models. Bioinformatics. 19:1572– 1574. Rosenberg N.A., Nordborg M. 2002. Genealogical trees, coalescent theory and the analysis of genetic polymorphisms. Nat. Rev. Genet. 3:380–390. Sanderson M.J. 2002. Estimating absolute rates of molecular evolution and divergence times: a penalized likelihood approach. Mol. Biol. Evol. 19:101–109. Sanderson M.J. 2003. r8s: inferring absolute rates of molecular evolution and divergence times in the absence of a molecular clock. Bioinformatics. 19:301–302. Sang T. 2002. Utility of low-copy nuclear gene sequences in plant phylogenetics. Crit. Rev. Biochem. Mol. Biol. 37:121–147. Sang T., Zhong Y. 2000. Testing hybridization hypotheses based on incongruent gene trees. Syst. Biol. 49:422–434. Sennblad B., Schreil E., Sonnhammer A.B., Lagergren J., Arvestad L. 2007. Primetv: a viewer for reconciled trees. BMC Bioinform. 8:148. Simmons M.P., Ochoterena H. 2000. Gaps as characters in sequencebased phylogenetic analyses. Syst. Biol. 49:369–381. Slowinski J., Page R.D.M. 1999. How should species phylogenies be inferred from sequence data? Syst. Biol. 48:814–825. Smith S.A., Donoghue J. 2008. Rates of molecular evolution are linked to life history in flowering plants. Science. 322:86–89. Soltis D., Kuzoff R.K. 1995. Discordance between nuclear and chloroplast phylogenies in the Heuchera group (Saxifragaceae). Evolution. 49:727–742. Soltis D.E., Soltis P.S., Tate J.A. 2003. Advances in the study of polyploidy since Plant speciation. New Phytol. 161:173–191. Swofford D.L. 2002. PAUP*. Phylogenetic analysis using parsymony (*and other methods). Version 4.0b10. Sunderland (MA): Sinauer Associates. Teshima K.M., Innan H. 2004. The effect of gene conversion on the divergence between duplicated genes. Genetics. 166:1553– 1560. Tsitrone A., Kirkpatrick M., Levin D.A. 2003. A model for chloroplast capture. Evolution. 57:1776–1782. V’yugin V.V., Gelfand M.S., Lyubetsky V.A. 2002. Identification of horizontal gene transfer from phylogenetic gene trees. Mol. Biol. 37:571–584. Wang X.-R., Szmidt A.E., Savolainen O. 2001. Genetic composition and diploid hybrid speciation of a high mountain pine, Pinus densata, native to the Tibetan Plateau. Genetics. 159:337–346. Wehe A., Bansal M.S., Burleigh J.G., Eulenstein O. 2008. DupTree: a program for large-scale phylogenetic analyses using gene tree parsimony. Bioinformatics. 24:1540–1541.

2009

FRAJMAN ET AL.—RETICULATE EVOLUTION OF HELIOSPERMA

Welch M.E., Rieseberg L.H. 2002. Patterns of genetic variation suggest a single, ancient origin for the diploid hybrid species Helianthus paradoxus. Evolution. 56:2126–2137. Wendel J.F., Doyle J.J. 1998. Phylogenetic incongruence: window into genome history and molecular evolution. In: Soltis P., Soltis D., Doyle J., editors. Molecular systematics of plants II. Dodrecht (the Netherlands): Kluwer Academic Press. p. 265–296. Wendel J.F., Stewart J. McD., Rettig J.H. 1991. Molecular evidence for homoploid reticulate evolution among Australian species of Gossypium. Evolution. 45:694–711.

345

Wilgenbusch J.C., Warren D.L., Swofford D.L. 2004. AWTY: a system for graphical exploration of MCMC convergence in Bayesian phylogenetic inference. Available from: http://ceb.csit.fsu.edu/awty. Wolfe A.D., Xiang Q.-Y., Kephart S.R. 1998. Diploid hybrid speciation in Penstemon (Scrophulariaceae). Proc. Natl. Acad. Sci. USA. 95:5112–5115. Received 4 April 2008; reviews returned 9 June 2009; accepted 6 May 2009 Associate Editor: Susanne Renner