Brysting & al. • Polyploid phylogenetic reconstruction
TAXON 60 (2) • April 2011: 333–347
Challenges in polyploid phylogenetic reconstruction: A case story from the arctic-alpine Cerastium alpinum complex
Anne K. Brysting,1 Cecilie Mathiesen2 & Thomas Marcussen1
1 Centre for Ecological and Evolutionary Synthesis, Department of Biology, University of Oslo, P.O. Box 1066 Blindern, 0316 Oslo, Norway
2 Microbial Evolution Research Group, Department of Biology, University of Oslo, P.O. Box 1066 Blindern, 0316 Oslo, Norway
Author for correspondence: Anne K. Brysting, [email protected]
Abstract Here we illustrate and discuss the major challenges involved in reticulate phylogenetic reconstruction, with special reference to single- and low-copy nuclear data (the RNA polymerase genes) produced for the polyploid Cerastium alpinum group and close relatives. The dynamic nature of polyploid genomes paves the way for evolutionary novelty, and is obviously an important clue for the evolutionary success of polyploid plants, but at the same time it also creates problems in reconstructing the evolutionary history of polyploids. Nascent allopolyploids will hold two homoeologous copies of every gene that is initially single-copy in the parental species; however, immediately after the polyploidization event, modification of the polyploid genome starts, involving gene silencing, pseudogenization and divergence of duplicated genes. Identifying the signatures of reticulation, especially when dealing with old polyploids, may thus be a huge challenge. Sorting of ancestral/diploid variation in the polyploids and additional gene losses and duplications not associated with polyploidy may further complicate the case. Besides these general problems related to incongruent gene and organism lineage phylogenies, there are also several methodological challenges connected with retrieving sequence information from polyploids, such as polymerase errors, differential amplification of homoeologs (PCR selection), generation of chimeric sequences during PCR, and selection of shorter and more common fragments and insertion of incorrect fragments during the cloning reaction.
Keywords allopolyploidy; duplicated genes; network construction; reticulate evolution; single- and low-copy nuclear regions
INTRODUCTION
Polyploidy and reticulate evolution. — The importance of polyploidy (whole-genome duplication) in plant evolution as a process generating biodiversity and new plant species has been thoroughly demonstrated through many case studies involving both ancient whole-genome duplication events (e.g., Bowers & al., 2003; Paterson & al., 2004; Jaillon & al., 2007; Fawcett & al., 2009; Van de Peer & al., 2009a) and recent polyploid speciation (e.g., Abbott & Lowe, 2004; Soltis & al., 2004; Salmon & al., 2005; Soltis & al., 2007). With consequences such as rapid genomic rearrangements, genomic downsizing, movement of genetic elements across genomes, and movement of foreign genetic material into the polyploid genome, polyploidy is an evolutionary trigger (Doyle & al., 2008; Leitch & Leitch, 2008; Soltis & Soltis, 2009; Van de Peer & al., 2009b; Soltis & al., 2010). Successive hybridization and polyploidization events can build up species complexes of allopolyploids with complicated network-like histories, and the evolutionary history of many plant groups cannot be adequately represented by phylogenetic trees because of such reticulate events (Linder & Rieseberg, 2004; Vriesendorp & Bakker, 2005).
The fate of duplicated genes. — The immediate consequence of gene duplications, whether a result of single-gene or whole-genome duplications, is genetic redundancy. A fundamental question in polyploid evolution deals with the fate of duplicated genes (e.g., Adams & Wendel, 2005). Expression of one redundant paralog may become lost due to either epigenetic
regulation, loss-of-function mutations in the regulatory or coding parts of the sequence (pseudogenization), or gene loss. The speed at which gene silencing and gene loss can occur, often within the first generations following the polyploidization event (Comai, 2000; Osborn & al., 2003; Chen, 2007), is aptly demonstrated by the classical examples of truly “recent” allopolyploidy in Tragopogon L., Spartina Schreb., and Senecio L. (Hegarty & al., 2006; Tate & al., 2006; Hegarty & Hiscock, 2008; Ainouche & al., 2009; Buggs & al., 2009). While it is clear that by far the most likely fate of a duplicate gene is gene death (Jaillon & al., 2007; Town & al., 2006; Tuskan & al., 2006), mechanisms accounting for duplications being retained in the genome have, until recently, been less well understood (Yang & al., 2006). According to recent theory, duplicated genes may be preserved by a neutral mechanism in which each paralog accumulates loss-of-function mutations (degeneration) that are complemented by the other copy. Such mutations can happen either at the regulatory level, causing the paralogs to diverge in pattern of expression (duplication-degeneration-complementation = DDC; Force & al., 1999), or at the product level, causing the paralogs to diverge in function (subfunctionalization; Hughes, 1994). Furthermore, either mechanism can eliminate possible structural trade-offs imposed by different functions performed by a multifunctional gene (Hittinger & Carroll, 2007), by unlinking these functions. Subfunctionalization can, thus, be regarded as the first necessary step for duplicated genes to later specialize and acquire new functions (subneofunctionalization; He & Zhang, 2005).
The presence of regulatory and functional subfunctionalization within gene families is well-documented, in eukaryotes in general as well as in plants (e.g., Adams & al., 2003; Drea & al., 2006; Federico & al., 2006; Wang & al., 2006; Yang & al., 2006; Akhunov & al., 2007; Marcussen & al., 2010). Thus, the evolution of homoeologous loci in polyploids is highly dynamic, and it varies among taxonomic groups and also among types of genes whether both copies remain functional, whether one copy becomes silenced or lost while the other copy retains the original function, or whether the two copies diverge in function (Adams & Wendel, 2005; Tate & al., 2005). The dynamic nature of polyploid genomes opens the door for evolutionary novelty and is obviously important for the evolutionary success of polyploid plants, but at the same time it also creates problems when it comes to reconstruction of the evolutionary history of polyploids. Challenges in polyploid phylogenetic reconstruction. — To identify the signatures of hybridization and allopolyploidization in molecular data and to distinguish them from other cases of phylogenetic incongruence can be a challenge (Linder & Rieseberg, 2004; McBreen & Lockhart, 2006). Unravelling ancestral lineages of polyploid species groups poses certain requirements to the molecular markers used. The kinds of markers that up until now have been the two most commonly used in plant phylogeny, chloroplast DNA (cpDNA) and nuclear ribosomal DNA (nrDNA), are in fact unsuitable for reconstruction of reticulate evolutionary histories. As the chloroplast genome is predominantly uniparentally inherited in plants (primarily, but not always, maternally in angiosperms), cpDNA traces the genealogy of only one parent and is unable to provide direct evidence for reticulate speciation events (Sang, 2002; Small & al., 2004). 
Although the popular ITS (the internal transcribed spacers) marker is biparentally inherited, like all nrDNA, it is not a reliable marker for reconstructing reticulate evolution either, as distinct parental contributions tend to become rapidly homogenized by concerted evolution among the homoeologous loci of the nrDNA gene family (Wendel & al., 1995; Buckler & al., 1997; Álvarez & Wendel, 2003; Soltis & al., 2008). In the absence of completed homogenization of homoeologous sequences, ITS may in some cases provide evidence for hybridization and allopolyploidization events (e.g., Rauscher & al., 2002, 2004). However, overall, ITS sequences do not lend themselves to an easy interpretation when studying reticulate speciation events because of the structure and dynamic evolution of nrDNA (first of all the presence of both multiple copies per array and multiple arrays per genome, and the variable and unpredictable strength of concerted evolution). Even though each marker on its own is not useful for interpreting reticulate evolution, incongruent patterns of cpDNA and nrDNA phylogenies have increasingly been attributed to past hybridization and allopolyploidization events (e.g., Rieseberg & Soltis, 1991; Arnold, 1997; Soltis & al., 2003; Frajman & Oxelman, 2007). However, the combination of ITS and cpDNA will not reveal reticulate patterns in cases where ITS of a hybrid genome has been homogenized towards the maternal genome. Secondly, it cannot be used successfully in polyploids above tetraploid level where more than two genomes are involved. Finally, it is
important to be aware that incongruent phylogenies may also be caused by other processes, including independent gene duplication, random loss of multiple genes and lineage sorting, i.e., the random sorting of ancestral polymorphic alleles in the descendant taxa (Sang, 2002; Rokas & al., 2003; Linder & Rieseberg, 2004). A noteworthy and apparently common feature is phylogenetic incongruence as a result of chloroplast capture, i.e., the introgression of the chloroplast from one species into another (reviewed in Tsitrone & al., 2003), which allows adaptive cytotypes to spread across several hybridizing species in selective sweeps (Muir & Filatov, 2007). Much more promising for the reconstruction of reticulate speciation events in plants is the use of single- and low-copy nuclear genes as these markers are less susceptible to concerted evolution and provide an unlimited source of phylogenetic markers (Sang, 2002; Álvarez & Wendel, 2003; Small & al., 2004). However, there are several good reasons why single- and low-copy nuclear genes have not been widely used. For example, in most cases, primers are not available and have to be developed for each new group. Furthermore, for all nuclear markers (ITS as well as low-copy nuclear markers), it is hard to know a priori exactly how many copies are present, and whether the amplified PCR products represent one or more loci, or alternatively different alleles at the same or more loci. Allopolyploid evolution and ancestry have, nevertheless, been addressed successfully by cloning and analyzing the two or more homoeologous loci of a nuclear gene in various polyploid groups, with each homoeolog tracing its own parental lineage (e.g., Popp & Oxelman, 2001; Doyle & al., 2003; Popp & al., 2005; Smedmark & al., 2005; Brysting & al., 2007; Marcussen & al., 2010).
In an idealized situation, where no secondary losses or pseudogenization of homoeologs occur, one round of polyploidy should produce two copies (homoeologs) of a gene that is initially single-copy in the parental species; two rounds of polyploidy occurring independently should produce four copies, and so on. This may be the situation in very young allopolyploid systems, but already in the F1 hybrid and first generations of the nascent allopolyploid, modification of the new mixed genome has begun, involving gene silencing, pseudogenization and divergence of duplicated genes (Hegarty & al., 2006; Tate & al., 2006; Hegarty & Hiscock, 2008; Ainouche & al., 2009; Buggs & al., 2009), and in older polyploids it becomes increasingly difficult to identify and infer past reticulate events (Pfeil & al., 2005). Lineage sorting, additional gene losses and duplications not associated with polyploidy, and failure to sample all paralogs are common problems associated with the use of single- and low-copy nuclear genes, which of course also hamper the interpretation of reticulate relationships from a single gene tree (Linder & Rieseberg, 2004; Pfeil & al., 2005). To tackle these problems and to make reliable species trees or networks, information from several unlinked gene trees needs to be combined. As reticulate evolution cannot be adequately represented by dichotomous phylogenetic trees, there has also been a growing interest in the development of network analyses, which may enhance the possibility of identifying signatures of hybridization and allopolyploidization in molecular data, and furthermore improve our ability to distinguish these mechanisms from
other possible causes of phylogenetic incongruence (Linder & Rieseberg, 2004; Vriesendorp & Bakker, 2005; Huber & al., 2006; McBreen & Lockhart, 2006).
The Cerastium case story. — The Arctic is one of the most polyploid-rich areas, particularly of high-level and recently evolved polyploids (Brochmann & al., 2004; Brochmann & Brysting, 2008). The Cerastium alpinum L. group (within C. sect. Orthodon in Caryophyllaceae) is one of several arctic-alpine polyploid complexes where high ploidy levels (octoploids, 2n = 8x = 72; and dodecaploids, 2n = 12x = 108) dominate and no diploid progenitors are known. Previous studies of morphology, isozymes, and DNA fingerprints have identified several evolutionary lineages (Brysting & Borgen, 2000; Brysting & Elven, 2000; Hagen & al., 2001), and the low level of cpDNA variation observed suggests recent origins and recurrent episodes of range expansions and contractions during the Quaternary glaciations (Scheen & al., 2004). The circumpolar C. alpinum group consists of six high-ploid species: the amphi-Atlantic C. alpinum (8x), C. arcticum Lange (12x), and C. nigrescens (H.C. Watson) Edmonston ex H.C. Watson (12x); the amphi-Pacific C. fischerianum Ser. (8x); and the more widespread to almost circumpolar C. beeringianum Cham. & Schltdl. (8x) and C. regelii Ostenf. (8x). Three of the species (C. alpinum, C. nigrescens, C. beeringianum) extend into midlatitude mountain ranges in Europe and/or North America. Based on variation in several morphologically separable characters, the Panarctic Flora checklist recognizes these taxa as separate entities (Elven, 2007). However, as would be expected in a young system, hybridization and introgression take place both within and between ploidy levels (Brysting & Elven, 2000; Hagen & al., 2002; Brysting & al., 2007; Brysting, 2008) and in areas where the taxa overlap (e.g., C. alpinum, C. beeringianum and C. arcticum in eastern Canada), the delimitation of taxa may be difficult.
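The chromosome numbers quoted for the group imply a base number of x = 9 (72/8 = 108/12 = 9). As a rough illustration of the idealized expectation described earlier, in which every added genome contributes one homoeolog and none is lost, the following Python sketch converts 2n counts into ploidy levels and expected copy numbers. The function names and the choice of the tetraploid level as the single-copy baseline are assumptions made for illustration and are not part of the original analysis.

```python
BASE_NUMBER = 9  # x, implied by 2n = 8x = 72 and 2n = 12x = 108


def ploidy_level(two_n: int, x: int = BASE_NUMBER) -> int:
    """Return the ploidy level (e.g. 8 for an octoploid) from a 2n count."""
    if two_n <= 0 or two_n % x != 0:
        raise ValueError(f"2n = {two_n} is not a positive multiple of x = {x}")
    return two_n // x


def expected_homoeologs(ploidy: int, baseline: int = 4) -> int:
    """Idealized homoeolog count for a gene that is single-copy at `baseline`.

    Hypothetical assumption: no gene loss, silencing or extra duplication.
    """
    if ploidy <= 0 or ploidy % baseline != 0:
        raise ValueError("ploidy must be a positive multiple of the baseline")
    return ploidy // baseline


# 2n = 36, 72 and 108 correspond to 4x, 8x and 12x when x = 9.
for two_n in (36, 72, 108):
    level = ploidy_level(two_n)
    print(f"2n = {two_n}: {level}x, expected copies = {expected_homoeologs(level)}")
```

Departures from these idealized counts in real data can reflect gene loss, pseudogenization or incomplete sampling of homoeologs, so the numbers serve only as a baseline for comparison.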
More morphological and molecular investigations with thorough sampling of specimens from the whole distribution area are needed to deal with the largely unresolved variation in part of the species complex. In a phylogeny based on non-coding plastid DNA (trnL intron, trnL-trnF spacer, and psbA-trnH spacer; Scheen & al., 2004), the high-ploid species of the C. alpinum group formed a polytomy together with polyploid members of the boreal and temperate C. tomentosum L. and C. arvense L. groups. To resolve this polytomy, Brysting & al. (2007) successfully used a non-coding region of a single-copy nuclear gene (“RPB2”, see below). In addition to the arctic and boreal high-ploids from the C. alpinum and C. arvense groups, related low-ploid species (tetraploids, 2n = 4x = 36) were included as potential progenitors of the higher ploidy levels. Using a network construction algorithm (Huber & al., 2006) to transform the resulting phylogenetic tree into a network, they were able to untangle the ancestral genomes of the high-ploid species, and in several cases also to identify the low-ploid progenitor species (Brysting & al., 2007).
The RNA polymerase (Pol) genes. — The non-coding nuclear region used by Brysting & al. (2007) is intron 6 of NRPB2, a gene encoding the second largest subunit of nuclear RNA polymerase II (Pol II, or RNAP II). RNA polymerases are the
principal enzymes responsible for gene transcription and occur in five variants in angiosperms (Pol I, II, III, IV and V), unlike most eukaryotes which typically only have Pol I, II and III (Luo & Hall, 2007; Ream & al., 2009). NRPB2 is typically single-copy in fungi, animals and plants, but a gene duplication occurred early in the core eudicot evolution (Oxelman & al., 2004; Luo & al., 2007). Subsequently, multiple gene duplication and paralog sorting events have happened independently in different core eudicot lineages, and all studies to date indicate that NRPB2 is single-copy in Caryophyllales (e.g., Oxelman & al., 2004; Popp & Oxelman, 2004; Brysting & al., 2007; Popp & al., 2008; Frajman & al., 2009). Even though NRPB2 seems to track ploidy well in the C. alpinum group, making it possible to untangle most of the reticulate history (Brysting & al., 2007), a single gene tree does usually not tell the full and true story. Other unlinked nuclear DNA regions may, thus, be needed to complement the full picture (e.g., Linder & Rieseberg, 2004). For this reason, two additional nuclear regions were sequenced, intron 6 of NRPD2/E2a and intron 6 of NRPD2/E2b. All three genes behave phylogenetically independently in studies of Silene L. of the same family (Popp & Oxelman, 2004). These genes encode the second largest subunit of Pol IV (i.e., NRPD2) or Pol V (NRPE2), but as it is unknown which gene product is associated with which polymerase, we have herein given them provisional names, following the recommendation for Viola L. (Marcussen & al., 2010). NRPD2/ E2a and NRPD2/E2b have originated independently in many angiosperm families through the duplication of NRPD2. 
Two well-differentiated paralogs, for which there is evidence for subneofunctionalization with respect to Pol IV and Pol V at least in Violaceae, have hitherto been found in Caryophyllaceae, Violaceae, and Brassicaceae, but not in Vitaceae, Lamiaceae and Asteraceae (Popp & Oxelman, 2004; Jaillon & al., 2007; Luo & Hall, 2007; Vilatersana & al., 2007; Marcussen & al., 2010; Bendiksby & Brysting, unpublished data). The subunit nomenclature of nuclear RNA polymerases has varied among research groups and organisms, and is often in conflict with names for unrelated genes. In the following, we have therefore adopted the 4-letter gene names registered with The Arabidopsis Information Resource. The informal names RPB2, RPD2a and RPD2b previously used in a number of biosystematic studies (e.g., Popp & Oxelman, 2004; Brysting & al., 2007; Luo & Hall, 2007; Luo & al., 2007; Vilatersana & al., 2007; Brysting, 2008; Frajman & al., 2009) have therefore been replaced by the more correct NRPB2, NRPD2/E2a and NRPD2/E2b. In this paper, we will illustrate and discuss some of the major challenges in polyploid phylogenetic reconstruction by comparing Cerastium networks produced from NRPB2, NRPD2/E2a, and NRPD2/E2b sequences.
MATERIALS AND METHODS
Plant material. — Twenty Cerastium accessions were included in the phylogenetic analyses, representing the high-ploid species of the C. alpinum and C. arvense complexes, as well
as potential low-ploid progenitor species. Cerastium lithospermifolium Fisch. was included as a representative of section Strephodon, and C. cerastoides (L.) Britton from subgenus Dichodon was used as outgroup (Appendix).
PCR, cloning and sequencing. — Extraction of total genomic DNA from silica-dried leaves or herbarium specimens as well as cloning and sequencing of the NRPB2 intron were done as part of a previous study (Brysting & al., 2007). All NRPD2/E2a and NRPD2/E2b sequences are newly cloned and sequenced for this study. Pooled and diluted PCR products obtained from degenerate Pol-specific primers (RNAP10F, RNAP11R, RNAP10FF, and RNAP11bR; Popp & Oxelman, 2004) were used as template in a nested PCR using subunit-specific primers (D2F, D2R; Popp & Oxelman, 2004). As cloning of the resulting PCR products selected mainly for the shorter NRPD2/E2a paralog, paralog-specific primers were designed and used to amplify also the longer NRPD2/E2b paralog (CerD2-2cf: ATCGCTTGTGGGGGYACATCAAA; CerD2-2r: ATCTTGAGAATCCAGCCCTGCA). For both primer combinations, an initial 5 min denaturation at 95°C was followed by 35 cycles (30 s denaturation at 95°C, 1 min annealing at 55°C, and 2 min extension at 72°C) and a final 2 min extension at 72°C. Depending on the DNA polymerase used in the PCR reaction (AmpliTaq DNA Polymerase, Applied Biosystems; or Phusion Hot Start High-Fidelity DNA Polymerase, Finnzymes), the resulting PCR products were cloned with Invitrogen’s TOPO TA Cloning Kit for Sequencing or Zero Blunt TOPO PCR Cloning Kit, respectively, following the manufacturer’s manual (except that half reactions were used). At least 30 colonies from each accession were screened by PCR using T7 and M13R universal primers and the following conditions: initial denaturation at 95°C for 10 min followed by 35 cycles (30 s at 94°C, 1 min at 55°C, and 2 min at 72°C) and a final extension at 72°C for 10 min.
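The forward primer CerD2-2cf contains the IUPAC ambiguity code Y (C or T), so it stands for a small pool of oligonucleotides rather than a single sequence. A minimal Python sketch for expanding such a degenerate primer into its concrete sequences is given here. The function name `expand_primer` is hypothetical, and the snippet is illustrative only; it is not part of the original protocol.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_primer(primer: str) -> list[str]:
    """Return every unambiguous sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


forward = "ATCGCTTGTGGGGGYACATCAAA"  # CerD2-2cf, as given in the text
reverse = "ATCTTGAGAATCCAGCCCTGCA"   # CerD2-2r, no ambiguity codes

for name, seq in (("CerD2-2cf", forward), ("CerD2-2r", reverse)):
    variants = expand_primer(seq)
    print(name, len(variants), "variant(s)")
    for variant in variants:
        print("  ", variant)
```

Because the number of variants is the product of the choices at each position, a primer with a single Y expands to two sequences, while a primer without ambiguity codes expands to itself.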
For most accessions, six or seven products of each length variant (for each NRPD2 paralog) were sequenced. However, in some cases considerably more PCR products (up to 52) were sequenced in an attempt to detect all homoeologous sequences. In a few cases (e.g., for NRPD2/E2a in the two dodecaploid taxa), only one clone was sequenced of the longest homoeolog as it was not as easily incorporated into the vector as the shorter fragments. Sequencing in both directions was performed using ABI BigDye Terminator sequencing buffer and v.3.1 Cycle Sequencing kit. After ethanol cleaning, the sequencing products were run on an ABI 3730 Genetic Analyzer (Applied Biosystems). Forward and reverse sequences were edited and assembled with the ContigExpress module in Vector NTI Advance 10.3.0 (Invitrogen).
Alignment and phylogenetic analyses. — The resulting sequences were aligned manually using BioEdit v.5.0.9 (Hall, 1999). As sequences of the two NRPD2 paralogs could not be aligned with each other, three separate alignments were produced, NRPB2, NRPD2/E2a and NRPD2/E2b. Recombinant sequences (chimeras) with affinity to two different homoeologous sequences of the same accession may be produced through the amplification and cloning procedure (Popp & Oxelman, 2001; Qiu & al., 2001; Cline & al., 2002). Such sequences were detected during the alignment process and removed from the
subsequent analyses. Unique substitutions in single clones were ignored as they were considered the result of PCR-generated mutations due to polymerase errors, which is another potential problem with this methodological approach (Hengen, 1995; Qiu & al., 2001). The number of consensus sequences was further reduced after initial phylogenetic analyses by keeping only one consensus sequence (the one representing most identical sequences) if more sequences of the same accession formed a monophyletic group, assuming that these slightly varying sequence types represented allelic variation. GenBank accession numbers of the resulting consensus sequences are given in the Appendix. Gaps were coded by “simple indel coding” (Simmons & Ochoterena, 2000) and phylogenetic analyses (maximum parsimony and Bayesian) were performed on datasets with and without coded gaps. The maximum parsimony analyses were conducted in TNT (Goloboff & al., 2003), using heuristic search with 1000 replicates and saving 10 trees per replication. The resulting trees were swapped on with tree bisection-reconnection (TBR) saving up to 100,000 trees. Collapsing rule was set to minimum length and random seed was set to “time”. Standard measures for fit of characters, consistency index (CI) and retention index (RI), were calculated manually for the resultant trees based on tree length scores (min., max., and actual tree length). The strength of support for individual branches was estimated using the same settings and 1000 bootstrap replications (Felsenstein, 1985). For the Bayesian analyses, model comparison was performed in MrModeltest v.2.2 (Posada & Crandall, 2001; Nylander, 2004) using the freely available Bioportal computer service (http://www.bioportal.uio.no).
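The fit measures mentioned in this paragraph were calculated manually from the minimum, maximum and actual tree lengths. Assuming the standard definitions, CI = m/s and RI = (g − s)/(g − m), where m is the minimum possible length, g the maximum possible length and s the observed length, the calculation can be sketched in Python as follows. The function names are hypothetical and the numbers in the example are invented for illustration; they are not taken from the Cerastium datasets.

```python
def consistency_index(min_len: float, actual_len: float) -> float:
    """CI = m / s: minimum possible length over observed tree length."""
    if not 0 < min_len <= actual_len:
        raise ValueError("expected 0 < min_len <= actual_len")
    return min_len / actual_len


def retention_index(min_len: float, max_len: float, actual_len: float) -> float:
    """RI = (g - s) / (g - m), with g the maximum possible length."""
    if not min_len <= actual_len <= max_len or max_len == min_len:
        raise ValueError("expected min_len <= actual_len <= max_len, min_len < max_len")
    return (max_len - actual_len) / (max_len - min_len)


# Hypothetical tree length scores, for illustration only.
m, g, s = 10, 30, 12
print(f"CI = {consistency_index(m, s):.2f}")
print(f"RI = {retention_index(m, g, s):.2f}")
```

Both indices equal 1 when the observed tree requires no more steps than the minimum, and they decrease as homoplasy increases.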
For all three datasets (NRPB2, NRPD2/E2a, NRPD2/E2b), the HKY + G model was selected as the best-fitting evolutionary model by hierarchical likelihood-ratio tests (hLRTs), whereas the GTR + G model was selected by the Akaike information criterion (AIC). Two different phylogenetic analyses implementing the two models were performed for each dataset using MrBayes v.3.1.2 (Ronquist & Huelsenbeck, 2003). The coded gap characters were included as a separate binary data partition using a simple model for binary data implemented in MrBayes. The Markov chain Monte Carlo (MCMC) chains were run for 4,000,000 generations, and trees were saved each 100th generation, in all counting 40,000 trees. Burn-in was set to 4000 based on stationarity of the MCMC chains (indicated by low values of the average standard deviation of split frequencies), leaving 36,000 trees for calculation of the consensus tree and posterior probability values. Convergence of the MCMC chains was tested by repeating the Bayesian inference twice from random starting trees. The potential scale reduction factor (PSRF) was 1.0 for all parameters, and tree topologies, mean likelihood scores, and posterior-probability values from independent runs were almost identical, suggesting that the MCMC had run long enough to converge. The resulting consensus trees were so-called multi-labelled trees, in which some terminals represent different homoeologous sequences of the same accession. The multi-labelled trees were transformed into networks, using the algorithm described
in Huber & al. (2006) and the open-source PADRE software for analysing and displaying reticulate evolution (Lott & al., 2009). Based on the results from PADRE, the presented networks were redrawn and edited using Adobe Illustrator CS3.
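A multi-labelled tree can be recognised by leaf labels that occur more than once. The Python sketch below only detects such repeated labels in a simple Newick string; it does not implement the network algorithm of Huber & al. (2006) or any PADRE functionality. The toy tree, its labels and the helper names are hypothetical.

```python
import re
from collections import Counter

# A leaf label directly follows "(" or ","; internal node labels and support
# values follow ")" and are therefore skipped. Quoted labels are not handled.
LEAF_PATTERN = re.compile(r"[(,]\s*([^(),:;\s][^(),:;]*)")


def leaf_labels(newick: str) -> list[str]:
    """Return the leaf labels of a simple (unquoted) Newick string, in order."""
    return [label.strip() for label in LEAF_PATTERN.findall(newick)]


def multi_labelled(newick: str) -> dict[str, int]:
    """Return labels that occur on more than one leaf, with their counts."""
    counts = Counter(leaf_labels(newick))
    return {label: n for label, n in counts.items() if n > 1}


# Hypothetical toy tree: one octoploid accession carries two homoeologs,
# each grouping with a different tetraploid lineage.
toy_tree = "((tetraploid_A,octoploid_X),(tetraploid_B,octoploid_X),outgroup);"
print(leaf_labels(toy_tree))
print(multi_labelled(toy_tree))
```

In a network, leaves that share a label would be merged into a single terminal with more than one parent, which is the step this sketch leaves to dedicated software.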
RESULTS
NRPB2 phylogenetic analyses. — A total of 202 cloned NRPB2 sequences was grouped into 34 consensus sequences, each of which represented from three to twelve cloned sequences, except for one possible pseudogene sequence, which was obtained only once from C. eriophorum Kit. (Appendix). Including C. cerastoides as outgroup, the aligned matrix was 1372 bp long. Fifty gaps were scored as present/absent characters and added to the matrix, resulting in a total of 167 parsimony-informative characters. The maximum parsimony analyses with or without the inclusion of coded gap characters produced the same overall tree topology. As internal clade support was generally somewhat higher and the number of most parsimonious trees (MPTs) somewhat lower when coded gap characters were included, only results from these analyses are presented (Fig. 1). The maximum parsimony analysis of the NRPB2 alignment (including coded gaps) resulted in two MPTs, each with a tree length of 396 (CI = 0.88; RI = 0.93). Bayesian inference applying the HKY + G or GTR + G model resulted in identical consensus trees and very similar posterior-probability
values; the consensus tree had the same topology as the parsimony consensus tree except for one minor change in one of the terminal clades (C. arvense and C. arvense subsp. strictum Gaudin constituted a monophyletic group (not shown), whereas the two taxa were part of a polytomy in the parsimony consensus tree). Posterior probabilities (PP; from the analysis applying the GTR + G model) are indicated together with bootstrap support (BS) values on the NRPB2 network in Fig. 1. In the network, which is constructed from and contains the parsimony strict consensus tree, branches representing homoeologous sequences of the same accession have been placed on top of each other to produce parallel additive branches. In the consensus trees (as well as the network), all the high-ploid arctic taxa and their tetraploid relatives formed a well-supported clade (corresponding to C. sect. Orthodon; BS = 81, PP = 1). Within this clade, two somewhat deviating sequences of the tetraploids C. uniflorum Clairv. and C. eriophorum constituted a sister group to the remaining sequences. The exon parts of these two sequences were characterized by several substitutions and indels resulting in frame shifts, and both sequences were interpreted as representing non-functional paralogs (pseudogenes). All remaining sequences were interpreted as functional paralogs (inferred from conserved reading frames and intron sites), and there seemed to be an overall good correspondence between the number of homoeologous sequences (copies) found and the ploidy of each taxon: one functional copy was found in all tetraploids, two copies were found in all octoploids, and three copies were found in
Fig. 1. The reticulate evolutionary history of the high-ploid C. alpinum group illustrated by the NRPB2 network. The network is constructed from and contains the NRPB2 strict consensus tree from the maximum parsimony analysis. Hypothetical parental tetraploid lineages are indicated by different colours. Parsimony bootstrap values and posterior probabilities (from the Bayesian analysis) are given above and below branches, respectively. Supposedly nonfunctional homoeologous sequences are indicated by “pse” (pseudogene) after the taxon name. Ploidy levels are indicated for all taxa of C. sect. Orthodon (see Appendix for more information). Abbreviations: C. arv. str. = C. arvense subsp. strictum; C. bee. ecan = C. beeringianum from East Canada; C. bee. wcan = C. beeringianum from West Canada.
the dodecaploid C. nigrescens. However, only two copies were obtained in the dodecaploid C. arcticum. Five highly supported clades of the functional NRPB2 paralog were resolved in the consensus trees (BS = 98–100; PP = 1). Three of the clades included at least one tetraploid taxon: C. uniflorum was sister to one of the C. nigrescens homoeologous sequences (labelled yellow in Fig. 1), C. eriophorum was sister to a clade consisting of C. alpinum and C. arcticum (labelled blue), and C. arvense subsp. strictum was part of a polytomy containing C. arvense subsp. arvense and several other octoploid taxa of the C. arvense group (labelled red). The remaining two clades (labelled green and black, respectively) did not include any tetraploid taxa.

Fig. 2. The reticulate evolutionary history of the high-ploid C. alpinum group illustrated by the NRPD2/E2a network. The network is constructed from and contains the NRPD2/E2a most parsimonious tree from the maximum parsimony analysis. Hypothetical parental tetraploid lineages are indicated by different colours. Parsimony bootstrap values and posterior probabilities (from the Bayesian analysis) are given above and below branches, respectively. For abbreviations, see Fig. 1.

NRPD2/E2a phylogenetic analyses. — A total of 265 cloned NRPD2/E2a sequences was grouped into 31 consensus sequences, each of which represented from one to 52 cloned sequences (Appendix). Including C. cerastoides as outgroup, the aligned matrix was 1123 bp long. A total of 48 gaps was scored as present/absent characters and added to the matrix, resulting in a total of 186 parsimony-informative characters. The maximum parsimony analysis of the NRPD2/E2a alignment (including coded gaps) resulted in one MPT with a tree length of 320 (CI = 0.89; RI = 0.96). Bayesian inference applying the HKY + G or GTR + G model resulted in identical consensus trees and very similar posterior-probability values; the consensus trees (not shown) had the same topology as the most parsimonious tree. Posterior probabilities (from the analysis applying the GTR + G model) are indicated together with bootstrap support values on the NRPD2/E2a network in Fig. 2. A single sequence copy was found in all tetraploids, two were found in most octoploids (but only one in C. arvense and C. pusillum Ser.), and three copies were found in both
dodecaploids. In the consensus trees (and the network), all the high-ploid arctic taxa and their tetraploid relatives formed a highly supported clade (BS = 100, PP = 1), within which four well-supported clades could be recognized (BS = 95–100; PP = 0.95–1). Two of these clades corresponded to the yellow and blue clades from the NRPB2 network, whereas the green, black and red clades from the NRPB2 network were merged into two clades (labelled brown and black, respectively, in Fig. 2). Other major changes compared to the NRPB2 network were: (1) in the NRPB2 network, the blue clade was sister to the remaining sequences, whereas in the NRPD2/E2a network, the yellow clade held this position; (2) the two tetraploid taxa C. runemarkii Möschl & Rech. f. and C. biebersteinii DC. changed their position from being associated with the blue clade in the NRPB2 network to a sister position (though with low support) relative to the green/black clade in the NRPD2/E2a network. NRPD2/E2b phylogenetic analyses. — A total of 219 cloned NRPD2/E2b sequences was grouped into 37 consensus sequences, each of which represented from one to 20 cloned sequences (Appendix). Including C. cerastoides as outgroup, the aligned matrix was 2199 bp long. A total of 49 gaps was scored as present/absent characters and added to the matrix, resulting in a total of 247 parsimony-informative characters. The maximum parsimony analysis of the NRPD2/E2b alignment (including coded gaps) resulted in two MPTs, each with a tree length of 669 (CI = 0.87; RI = 0.92). Bayesian inference applying the HKY + G or GTR + G model resulted in identical consensus trees and very similar posterior-probability values; the consensus trees had the same topology as the parsimony consensus tree except for minor changes in two of the terminal clades (not shown). Posterior probabilities (from the analysis
applying the GTR + G model) are indicated together with bootstrap values on the NRPD2/E2b network in Fig. 3. For several (but not all) taxa, a short and possibly non-functional paralog (inferred from substitutions and indels in the exon part resulting in frame shifts) was found. These sequences grouped together in a sister-group position relative to the remaining sequences. Within this larger clade, one sequence copy was found in all tetraploids, two were found in most octoploids (but only one in C. arvense and C. pusillum), and three copies were found in both dodecaploids. These sequences were grouped into five highly supported clades (BS = 97–100; PP = 1), which largely corresponded to the five main clades of the NRPB2 network. The only major differences were the change of C. beeringianum from East Canada from the red clade (NRPB2) to the green clade (NRPD2/E2b), and the lack of C. arvense and C. pusillum sequences in the red and the black clade, respectively.

In Fig. 4, the number of functional copies obtained from each of the three regions is compared among taxa. For each region, the different copies are coloured according to their position within the phylogenetic network, indicating the major clade to which they belong. Overall, there is a good correspondence between the number of functional copies obtained and the ploidy level of the species. However, there are some exceptions. Only two NRPB2 copies were found in the dodecaploid C. arcticum, whereas three NRPD2/E2a and three NRPD2/E2b copies were found in this species. The octoploids C. arvense and C. pusillum had two NRPB2 copies, but lacked one NRPD2/E2a and one NRPD2/E2b copy each. For NRPD2/E2a, as many as 52 clones were checked to look for the second C. arvense copy but with no success.
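The comparison between copy number and ploidy can be sketched as a small check. The example below is only an illustration: it assumes that one functional copy is expected per tetraploid genome (ploidy divided by four), uses only the three exceptional taxa whose counts are stated in the text above, and the names `OBSERVED`, `expected_copies` and `missing_copies` are hypothetical helpers, not part of the original analysis.

```python
# Observed numbers of functional copies, as stated in the text for the three
# taxa that deviate from expectation (other taxa are omitted here).
OBSERVED = {
    "C. arcticum": {"ploidy": 12, "NRPB2": 2, "NRPD2/E2a": 3, "NRPD2/E2b": 3},
    "C. arvense": {"ploidy": 8, "NRPB2": 2, "NRPD2/E2a": 1, "NRPD2/E2b": 1},
    "C. pusillum": {"ploidy": 8, "NRPB2": 2, "NRPD2/E2a": 1, "NRPD2/E2b": 1},
}
REGIONS = ("NRPB2", "NRPD2/E2a", "NRPD2/E2b")


def expected_copies(ploidy, base_ploidy=4):
    """Assumption: one homoeologous copy per constituent tetraploid genome."""
    return ploidy // base_ploidy


def missing_copies(observed):
    """Return {(taxon, region): shortfall} where fewer copies than expected were found."""
    deficits = {}
    for taxon, record in observed.items():
        expected = expected_copies(record["ploidy"])
        for region in REGIONS:
            shortfall = expected - record[region]
            if shortfall > 0:
                deficits[(taxon, region)] = shortfall
    return deficits


if __name__ == "__main__":
    for (taxon, region), shortfall in missing_copies(OBSERVED).items():
        print(f"{taxon}: {shortfall} {region} copy not recovered")
```

A shortfall flagged in this way does not distinguish between gene loss, pseudogenization and failed amplification; it only marks where the additive pattern breaks down.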
Fig. 3. The reticulate evolutionary history of the high-ploid C. alpinum group illustrated by the NRPD2/E2b network. The network is constructed from and contains the NRPD2/E2b strict consensus tree from the maximum parsimony analysis. Hypothetical parental tetraploid lineages are indicated by different colours. Parsimony bootstrap values and posterior probabilities (from the Bayesian analysis) are given above and below branches, respectively. For abbreviations, see Fig. 1.

Fig. 4. Comparison of homoeologous sequences in octo- and dodecaploid Cerastium species as well as three tetraploid progenitor species. For each Pol region, the homoeologous sequences obtained are coloured according to which clade within the respective network they belong to: the yellow, blue, green, red or black clade within the NRPB2 and NRPD2/E2b networks; the yellow, blue, brown or black clade within the NRPD2/E2a network. For abbreviations, see Fig. 1. Panels: NRPB2, NRPD2/E2a, NRPD2/E2b. Taxa compared: C. uniflorum 4x, C. eriophorum 4x, C. arv. str. 4x, C. alpinum 8x, C. arvense 8x, C. pusillum 8x, C. velutinum 8x, C. fischerianum 8x, C. bee. ecan 8x, C. bee. wcan 8x, C. regelii 8x, C. jenisejense 8x, C. nigrescens 12x, C. arcticum 12x.

DISCUSSION

The relative age of the polyploidization event. — In the C. alpinum species complex, we are probably dealing with polyploidization events at different time scales. Scheen & al. (2004) used a simple molecular clock based on major biogeographic events such as the earliest opening of the Bering Strait and the formation of the Isthmus of Panama to date the cpDNA phylogeny of the whole genus Cerastium. Their analyses suggested an Old World origin of the genus and at least two
migration events into North America from the Old World. The first event possibly took place across the Bering land bridge during the Miocene, with a subsequent colonization of South America during the Pliocene. A more recent migration event into North America took place during the Quaternary, resulting in the current circumpolar distribution of the high-ploid species of the C. alpinum species complex. The origin of the arctic octo- and dodecaploid taxa is most likely related to recurrent episodes of range expansions and contractions during the Quaternary glaciations. This hypothesis is consistent with the very low cpDNA variation found among these species; all arctic species included in the study by Scheen & al. (2004) had identical trnL-F and psbA-trnH sequences. The number of Pol homoeologous sequences (copies) obtained here corresponds in most cases well with ploidy, demonstrating additivity of ancestral lineages, which furthermore strengthens the view that we are dealing with a group of recently formed allopolyploids. The chromosome base number of the genus Cerastium is probably x = 9; however, the diploid chromosome number (2n = 18) has been reported only once, for the Central Asian species C. lithospermifolium (Krogulevich, 1971; referred in Goldblatt, 1981). The tetraploid chromosome number (2n = 36), which is found frequently throughout the genus, is most likely the result of one or more ancient tetraploidization events involving now extinct diploid progenitors. Because of “diploidization” processes, which seem to work actively on polyploid genomes, only remnants of such palaeoploidy events are usually found at the gene level. In some polyploids gene deletion and chromosomal reorganization are so extensive that the genome behaves and is structured like a diploid rather than a polyploid (Soltis & al., 2003; Tate & al., 2005; Thomas & al., 2006). 
Extensive and rapid genomic changes of polyploid genomes may also involve fusion of chromosomes and changes of the karyotype as has been shown for Arabidopsis thaliana L. (Lysak & al., 2006). Even if the overall karyotype appears unchanged, extensive reorganisation, elimination of chromosome- or genome-specific sequences and gene silencing may still have worked hard on the polyploid genome, modifying it towards a diploidization. The Cerastium tetraploids are most likely “secondary diploids” which have been through a diploidization process. The fact that
they obviously contain only one functional copy of each of the three Pol regions suggests that the NRPB2 and NRPD2/E2b pseudogene sequences, which were obtained in some Cerastium taxa, represent remnants of an older tetraploidization event. It is indeed a major weakness of the Cerastium system that the basic unit in the more recent polyploidization events is already tetraploid, as we could in some cases in principle be comparing different copies resulting from ancient polyploidization.

Extracting species phylogenies from gene phylogenies. — The dynamic nature of polyploid genomes even in recently formed polyploids poses severe challenges and limitations when we set out to reconstruct the evolutionary history of a high-ploid species complex like the C. alpinum group. Brysting & al. (2007) concluded that the single-copy NRPB2 region is a suitable marker for disentangling genome mergings in a recently evolved allopolyploid species complex. Overall, they were able to identify the different tetraploid lineages that seem to have contributed to the high-ploid arctic species, and in some cases also to couple them with extant tetraploid progenitor species (Fig. 1). However, they were still left with several unanswered questions, for instance regarding the phylogenetic history of certain polyploids for which a lower number of NRPB2 gene copies was recovered than expected from the ploidy level. To complement the NRPB2 gene tree, we have produced gene trees from NRPD2/E2a and NRPD2/E2b (Figs. 2 and 3), which do actually provide the answer to some of the questions raised by Brysting & al. (2007), but at the same time pose new questions. Gene loss or pseudogenization, or simply lack of PCR amplification, are probable reasons why only two NRPB2 copies were found in the dodecaploid C. arcticum; the species possesses three NRPD2/E2a and three NRPD2/E2b copies, one for each of its tetraploid genomes, as expected.
A similar explanation may be given for the amplification of only one NRPD2/E2a and one NRPD2/E2b copy for the octoploids C. arvense and C. pusillum. The red NRPD2/E2b copy is missing in C. arvense, whereas the black copy is missing in C. pusillum (Fig. 4). In the networks, the arctic high-ploids and their closest relatives are grouped into five (NRPB2, NRPD2/E2b) or four (NRPD2/E2a) main clades. The relative position of these clades
is, however, unclear and changes from network to network. The yellow and blue clades, associated with different Central European tetraploids, are recognized in all three networks. Tetraploids from the C. latifolium L. group (C. latifolium and C. uniflorum) are associated with the yellow lineage, which has contributed one tetraploid genome to the dodecaploid C. nigrescens. Evidence of this relationship has previously been provided by morphology and isoenzyme data (Brysting & Borgen, 2000; Brysting & Elven, 2000). Tetraploids from the C. alpinum group (C. eriophorum and C. theophrasti Merx. & Strid) are associated with the blue lineage, which has contributed to the octoploid C. alpinum and the dodecaploid C. arcticum. The close relationship between C. eriophorum and C. alpinum has previously been suggested by morphology and isoenzyme data (Boşcaiu, 1996; Boşcaiu & al., 1997; Brysting & Borgen, 2000; Brysting & Elven, 2000). The Greek endemics C. theophrasti and C. runemarkii have also been suggested as possible tetraploid progenitor taxa within the C. alpinum group, based primarily on seed morphology characters (Merxmüller & Strid, 1977; Boşcaiu, 1996). In all three networks, C. theophrasti is closely associated with the blue clade, whereas the position of C. runemarkii changes somewhat. The tetraploid C. biebersteinii from the C. tomentosum group is associated with the blue clade in the NRPB2 and NRPD2/E2b networks, but not in the NRPD2/E2a network. The inclusion of more taxa from the C. tomentosum group would probably have influenced and improved the resolution of the networks with regard to C. biebersteinii. The three other main clades resolved in the NRPB2 network (green, black, red) are also recognized in the NRPD2/E2b network. However, these three clades are merged in the NRPD2/E2a network into two well-supported clades; one of these (labelled black in Fig.
2) corresponds to the black clade in NRPB2 and NRPD2/E2b, except for the fact that the tetraploid C. arvense subsp. strictum is included here. Brysting & al. (2007) suggested this taxon as the tetraploid progenitor species of the red clade, which in the NRPB2 network includes the octoploid species of the C. arvense group (C. arvense subsp. arvense, C. velutinum) as well as the Beringian C. fischerianum and the Central Asian C. pusillum (both of which have been considered members of the C. alpinum group on the basis of morphology; Hultén, 1956; Schischkin, 1970; Boşcaiu, 1996; Morton, 2005). The second NRPD2/E2a clade (labelled brown in Fig. 2) combines the green and the red clades from the NRPB2 and NRPD2/E2b networks. If the supposed tetraploid progenitor species of the red clade, C. arvense subsp. strictum, had been included in this clade (and not the black clade), the observed patterns could have been explained by levels of resolution too low for proper separation of the green and red clades. However, as the deviating position of C. arvense subsp. strictum cannot easily be explained, it seems more likely that lineage sorting has contributed to the observed patterns of these two clades. As duplicated genes are subject to random loss in different species, via random production of pseudogenes, they are subject to the gene tree/species tree problem in much the same manner as lineage sorting of alleles at a locus (Linder & Rieseberg, 2004). By including two more networks based on other Pol regions to supplement the NRPB2 network from Brysting & al.
(2007), we have provided further support for the identification of the various tetraploid lineages that have contributed to the octoploid C. alpinum and the two dodecaploid taxa, C. arcticum and C. nigrescens. We have answered some of the questions posed by Brysting & al. (2007), e.g., the question whether the presence of only two NRPB2 copies in the dodecaploid C. arcticum could be explained by autopolyploidy. However, as the three networks do differ in several aspects, new questions have appeared. Similar results were obtained by Popp & al. (2005), who produced several gene trees from potentially unlinked regions for a polyploid species group within the genus Silene. The gene trees all had small deviations from the general pattern explained by allopolyploidy, deviations which were better explained by gene duplication, lineage sorting events or lack of information caused by incomplete sampling (Popp & al., 2005). The fact that gene loss, pseudogenization and possible lineage sorting are working independently on different parts of the polyploid genome and, thus, may hamper the interpretation of reticulate evolution even in relatively young plant groups like the C. alpinum species complex, emphasizes the importance of examining more gene trees from unlinked nuclear regions before firm conclusions are drawn (Linder & Rieseberg, 2004; Pfeil & al., 2005; Vriesendorp & Bakker, 2005). Studies on Silene and Viola have shown that duplication of NRPD2 is often followed by pseudogenization, complete loss of one paralog, or in some cases gene conversion (sequence replacement or possible mosaic recombinants) (Popp & Oxelman, 2004; Popp & al., 2005; Marcussen & al., 2010). Pseudogenization of one of the two NRPD2 paralogs has occurred also in the whole-genome-sequenced species Arabidopsis thaliana (Luo & Hall, 2007) and Populus trichocarpa Torr. & A.Gray (Ralph & al., 2006).
Both NRPD2 paralogs are present in all Cerastium taxa that we have investigated so far, but within each of the two NRPD2 paralogs, several examples of pseudogenization or complete loss of homoeologous sequences have been found. Questions that still remain to be answered are whether both paralogs are expressed and to what extent they might have been specialized. Positive selection indicating such paralog specialization of NRPD2/E2a and NRPD2/E2b has for instance been shown in the genus Viola (Marcussen & al., 2010).

Fig. 5. Overview of marker systems and molecular approaches for reconstructing the reticulate evolutionary history of hybrids and allopolyploids, where two or more homoeologous genomes have been merged. Advantages and disadvantages of each approach are indicated.

Hybrid or (allo)polyploid taxon: two or more homoeologous genomes (or remnants thereof)

Choice of marker system

Approach 1: cpDNA + ITS. Inferring hybrid origin from incongruences between cpDNA and ITS phylogenies
Advantages:
• can be amplified with universal primers
• usually one 'sequence type', allowing for direct sequencing
Disadvantages:
• applicable only when the two markers represent each of the parental genomes, typically when ITS has been homogenized towards the paternal parent (as the chloroplast is usually maternally inherited in angiosperms)
• incongruence of cpDNA and ITS may result from factors other than reticulation
• inapplicable above the tetraploid level when more than two genomes are involved
• concerted evolution among ITS homoeologous loci

Approach 2: low-copy nuclear genes. Inferring hybrid origin by detection and identification of (all) parental homoeologs
Advantages:
• most or all parental homoeologs may be positively identified
• no limitations with respect to ploidy level
Disadvantages:
• few universal primers are available
• homoeologs need to be separated before being sequenced individually (see below)

How to separate homoeologs

2A: using general primers in PCR
Advantages:
• no a priori assumption of which homoeologs are present
Disadvantages:
• sequences 'sampled' at random; many PCR products need to be screened to ensure that all homoeologs are captured
• heteroduplex DNA can be a problem (can be solved)
• polymerase reading errors can be a problem (can be solved)

2A1: pre-PCR separation. Cloning of homoeologs by single molecule PCR (smPCR)
Advantages:
• minimises PCR recombination and wrong insert of nucleotides
• the frequency at which different homoeologs are amplified directly reflects their frequency in the genome
• cheaper than cloning with competent cells
Disadvantages:
• it can be difficult to find and reproduce the optimal dilution of template DNA

2A2: post-PCR separation. Cloning of PCR-amplified homoeologs with competent E. coli cells
Disadvantages:
• selective amplification of certain homoeologs in the PCR
• incorrect inserts resulting from non-specific amplification
• PCR recombination may be a problem if the homoeologs are many and/or similar
• competent E. coli cells are expensive

2B: using homoeolog-specific primers in PCR
Advantages:
• labour-saving if many samples are to be run
Disadvantages:
• amplifies only a priori specified homoeologs; can not be used to find all homoeologs in an unknown polyploid
• primers need to be designed based on already existing information

Methodological challenges. — Compared to the limited applicability of cpDNA and nrDNA sequences, the use of coding and non-coding sequences of single- or low-copy genes has proven a far better approach when it comes to reconstruction of reticulate evolutionary events (Fig. 5). Nevertheless, besides problems related to independent pseudogenization or loss of different genes in different lineages, discussed in the previous paragraph, there are also methodological challenges in successfully separating the different gene copies before sequencing. The most commonly used approach, and the one applied here, is in vivo cloning of the PCR products utilising competent Escherichia coli cells. Owing to preferential PCR amplification of certain homoeologs (PCR selection), failure to detect all homoeologs present in the genome will always be a potential problem (e.g., Buckler & al., 1997), especially for pseudogenes whose primer sites may not be conserved. This problem can, however, be greatly reduced by trying out additional sets of
forward and reverse primers for the target locus (e.g., Marcussen & al., 2010). Isolation of clones with incorrect inserts may also be a problem, especially if the primers are degenerate and the PCR conditions suboptimal. Incorrect inserts are usually short fragments resulting from non-specific PCR amplification and, in cases with suboptimal PCR products, more than half of the clones may have incorrect inserts, which adds considerably to the work load. Primers that are carefully designed to avoid formation of primer dimers, along with introduction of an extra step of gel purification of the PCR products before cloning, may be helpful in such cases (Sang, 2002). Introduction of artificial mutations due to polymerase errors can be a problem in PCR-based cloning approaches because of the amplification by E. coli of individual DNA molecules (Hengen, 1995; Cline & al., 1996; Qiu & al., 2001). The use of high-fidelity polymerases and optimal PCR conditions can greatly reduce the problem but probably not entirely eliminate it. Several clones of each homoeologous sequence are usually necessary to be able to sort out artificial mutations. A challenge when amplifying from mixed-template samples is the artificial generation of chimeric sequences. Formation of heteroduplex DNA, i.e., DNA where the two strands stem from different gene variants, may occur in the PCR product, especially if the homology is high. This can cause major problems during the cloning process, where the DNA repair system of E. coli may remove mismatches in the heteroduplexes and, as the repair system works patchily, the obtained cloned sequence could be a chimera of the two sequences (e.g., Thompson & al., 2002; Saitoh & Chen, 2008). PCR recombination (“template jumping”; Pääbo & al., 1990) has been attributed to formation of incompletely extended PCR products.
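To make the notion of a chimeric clone concrete, the following minimal sketch tests whether a cloned sequence can be explained as one homoeolog up to a single breakpoint followed by another homoeolog. It is an illustration under simplifying assumptions (pre-aligned sequences of equal length, a single crossover, no additional polymerase errors) and not the screening procedure used in this study; the function names `find_breakpoint` and `classify_clone` are hypothetical.

```python
from typing import Optional


def find_breakpoint(clone: str, first: str, second: str) -> Optional[int]:
    """Return the smallest k with clone == first[:k] + second[k:], or None.

    A breakpoint only counts if diagnostic sites (positions where the two
    parental sequences differ) lie on both sides of it.
    """
    if not (len(clone) == len(first) == len(second)):
        raise ValueError("sequences must be aligned to equal length")
    diagnostic = [i for i in range(len(clone)) if first[i] != second[i]]
    for k in range(1, len(clone)):
        if clone[:k] == first[:k] and clone[k:] == second[k:]:
            if any(i < k for i in diagnostic) and any(i >= k for i in diagnostic):
                return k
    return None


def classify_clone(clone: str, parent_a: str, parent_b: str) -> str:
    """Label a clone as parental, single-crossover chimera, or unresolved."""
    if clone == parent_a:
        return "parent_a"
    if clone == parent_b:
        return "parent_b"
    if find_breakpoint(clone, parent_a, parent_b) is not None:
        return "chimera a->b"
    if find_breakpoint(clone, parent_b, parent_a) is not None:
        return "chimera b->a"
    # Could be a polymerase error, a further homoeolog, or a complex recombinant.
    return "unresolved"


if __name__ == "__main__":
    a = "AAAAAAAAAA"
    b = "AACAAAAGAA"  # differs from a at two diagnostic sites
    print(classify_clone(a[:5] + b[5:], a, b))
```

In practice the parental sequences are themselves inferred from the clones, so a check of this kind is only as reliable as the number of independent clones supporting each putative parent.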
In the presence of two similar templates, prematurely terminated products can anneal to non-identical templates and be extended to completion in the next cycle (Qiu & al., 2001; Cronn & al., 2002). In theory, the amount of detectable PCR recombination is expected to increase with the number of gene copies present in the genome and the similarity of these copies. Primer mismatch, which can cause different specificity of the forward and reverse primers, may make PCR recombination more probable for certain gene copies. PCR recombination can be limited by, e.g., using a DNA polymerase with a higher fidelity, reducing the number of PCR cycles, and increasing the extension time during PCR (Qiu & al., 2001; Cronn & al., 2002). If a suitably large number of clones are examined, it may be possible to deconstruct the recombinant sequences into “parental” sequences and remove them, but this approach could involve a lot of work, especially when recombination rates are high. If recombinant sequences are not identified and removed from the phylogenetic analyses, they may severely influence and hamper the final conclusions (Cronn & al., 2002). Many of the problems associated with in vivo cloning of sequences, as outlined above, can be avoided using other approaches, alone or in combination. Homoeolog-specific PCR primers and sequencing primers have been used in numerous case studies and are a powerful and simple way to separate homoeologs (e.g., Sang, 2002; Lihová & al., 2006;
Shimizu-Inatsugi & al., 2009). Lihová & al. (2006) developed a strategy for designing homoeolog-specific CHS (chalcone synthase) primers without cloning. They sequenced the promoter regions of the mixed homoeologous products using universal primers. Homoeolog-specific insertion/deletion differences bordering unreadable parts of the sequence (with overlaid peaks) proved to be useful positions at which homoeolog-specific primers could be designed. To confirm that all the homoeologs were found, they used conserved primers to amplify the CHS gene and then directly sequenced only the second exon to check if double peaks matched the overlay of the obtained homoeologous sequences. In practice, however, this approach is less applicable for higher-ploid systems because of an increasing difficulty in finding apomorphies that are suitable for primer design. If homoeologs differ in size, PCR products may be separated on agarose gel, cut out and cleaned separately; this may be particularly useful if the homoeologs have been differentially amplified in the PCR. A promising alternative to in vivo cloning is in vitro cloning using single molecule PCR (smPCR; reviewed in Kraytsberg & Khrapko, 2005). PCR can be efficiently performed on a single DNA template by performing multiple PCRs at limiting dilution, where the DNA concentration is so low that many of the reactions (usually ~50%) by pure chance do not receive any template molecules at all and thus produce no PCR product (Jeffreys & al., 1990). Under such conditions, the positive reactions are most likely to have been initiated by a single template molecule. This simple approach is commonly used in biomedical studies, e.g., mutational studies, but has so far been used, successfully, in only a single biosystematic study on high-ploid Viola (Marcussen, Brysting & al., in prep.). The method is superior to in vivo cloning in being largely immune to PCR errors, template jumping and allelic preference (Kraytsberg & Khrapko, 2005).
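The limiting-dilution argument can be made quantitative with a short calculation. The sketch below assumes that the number of template molecules per reaction follows a Poisson distribution, which is a common modelling assumption for limiting dilution and is not stated in the text; the function names are hypothetical.

```python
import math


def mean_templates_per_reaction(fraction_negative: float) -> float:
    """Poisson mean implied by the observed fraction of reactions without product."""
    if not 0.0 < fraction_negative < 1.0:
        raise ValueError("fraction_negative must lie strictly between 0 and 1")
    # P(0 templates) = exp(-mean)  =>  mean = -ln(P(0))
    return -math.log(fraction_negative)


def single_template_share(fraction_negative: float) -> float:
    """Share of positive reactions expected to have started from exactly one template."""
    mean = mean_templates_per_reaction(fraction_negative)
    p_one = mean * math.exp(-mean)        # P(exactly one template)
    p_positive = 1.0 - math.exp(-mean)    # P(at least one template)
    return p_one / p_positive


if __name__ == "__main__":
    for negative in (0.5, 0.8):
        share = single_template_share(negative)
        print(f"{negative:.0%} negative reactions -> {share:.1%} of positives single-template")
```

Under this assumption, about 69% of the positive reactions start from a single molecule when half of the reactions are negative, and the share rises with stronger dilution, at the cost of more empty reactions.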
This makes smPCR particularly suitable for phylogenetic analyses of polyploids, whose genomes, as shown above, are complex to analyse using other methods. Therefore, it is a pity that smPCR has not yet established itself in the field of plant phylogenetics.

Depicting phylogenetic relationships from the obtained sequence data is another challenge when working with reticulate evolution. Differences between gene trees, especially cpDNA and nuclear DNA gene trees, have increasingly been used to argue for interspecific hybridization and allopolyploidization (e.g., Rieseberg & Soltis, 1991; Arnold, 1997; Soltis & al., 2003; Frajman & Oxelman, 2007). Consensus and supernetwork methods may be applied in such cases to enhance the interpretation. These methods do not explicitly model reticulate evolutionary history, but provide a visualization of the extent to which various gene trees suggest incongruent relationships (McBreen & Lockhart, 2006). The analysis of single- and low-copy nuclear genes, using any of the molecular approaches reviewed here, has the advantage that the parental homoeologous sequences can be explicitly identified and characterized at least in recently evolved allopolyploids. These sequences can be used in standard phylogenetic tree reconstruction (e.g., maximum parsimony, maximum likelihood or Bayesian analyses) producing gene trees where an
allopolyploid accession will be found at two or more terminal branches (multi-labelled gene trees; Huber & al., 2006). Using an algorithm like the one published by Huber & al. (2006), the gene trees can then be converted into networks where branches representing homoeologous sequences of the same accession are joined.
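The idea of a multi-labelled gene tree can be illustrated with a toy example. The sketch below is not the algorithm of Huber & al. (2006); it only finds the accessions that occur at more than one terminal branch and lists the sister group of each occurrence, which is the information a network construction would use to join the branches. The nested-tuple tree encoding, the taxon labels "A", "B", "P" and "OUT", and the function names are all hypothetical.

```python
from collections import Counter


def leaves(tree):
    """Flatten a tree given as nested tuples with string leaves."""
    if isinstance(tree, str):
        return [tree]
    collected = []
    for child in tree:
        collected.extend(leaves(child))
    return collected


def multilabelled_taxa(tree):
    """Return {taxon: number of terminal branches} for taxa occurring more than once."""
    counts = Counter(leaves(tree))
    return {taxon: n for taxon, n in counts.items() if n > 1}


def sister_groups(tree, taxon):
    """For each terminal branch labelled `taxon`, list the leaves of its sister subtree(s)."""
    found = []
    if isinstance(tree, str):
        return found
    for i, child in enumerate(tree):
        if child == taxon:
            sisters = [leaf for j, other in enumerate(tree) if j != i
                       for leaf in leaves(other)]
            found.append(sorted(sisters))
        else:
            found.extend(sister_groups(child, taxon))
    return found


if __name__ == "__main__":
    # Toy gene tree: polyploid "P" carries one copy sister to "A" and one sister to "B".
    gene_tree = ((("A", "P"), ("B", "P")), "OUT")
    for taxon in multilabelled_taxa(gene_tree):
        print(taxon, "copies are sister to", sister_groups(gene_tree, taxon))
```

In this toy case the two occurrences of "P" point to the "A" and "B" lineages as putative parents; joining the two "P" branches into one node would turn the tree into a network with a single reticulation.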
CONCLUDING REMARKS

Polyploidization plays an important role in plant evolution. The use of molecular techniques and sequencing of whole plant genomes have made it possible to reveal the signatures of polyploidization events even back to the early days of the angiosperms. Phylogenetic reconstruction of plant groups where reticulations are common poses, however, severe challenges. In young polyploid species groups, different patterns of lineage sorting might in principle result in incongruence between phylogenetic markers, similar to what can be encountered among recently evolved diploid species. As the origins of polyploid species are often associated with strong bottlenecks, this problem may, however, be of less importance. In old polyploids, on the other hand, incongruence between phylogenetic markers is often the result of “diploidization” (gene conversion, gene loss etc.) or possibly gene duplication not associated with polyploidy. Combined with the possible extinction of ancestors/ancestral lineages, these processes may conceal the patterns of reticulations and make it extremely difficult to extract the species phylogeny, especially when only a single or few gene trees are produced. As illustrated in Fig. 5, various marker systems and approaches may be selected when dealing with hybrid or allopolyploid evolution, where two or more homoeologous genomes have been merged. Each approach has advantages and disadvantages, which have to be taken into account. Even though the combination of cpDNA and nuclear ribosomal ITS still has utility, there is no doubt that the use of single- and low-copy nuclear genes has a much larger potential for untangling allopolyploid relationships. So far the use of single- and low-copy nuclear marker systems has been limited first of all because of the limited availability of universal primers.
In groups with model species (e.g., Brassicaceae), the numbers of available single- and low-copy nuclear genes are considerably higher than for most other groups, and, given the extreme speed at which whole genomes are being sequenced at the moment, the number of candidate genes for other plant groups will hopefully grow rapidly in the years to come. The main challenges when working with single- and low-copy nuclear genes in allopolyploid species will then be to separate homoeologous copies, and not least to be sure that all homoeologous copies have been found. Figure 5 displays alternative approaches for achieving this, by the use of either homoeolog-specific primers, or general primers where the homoeologs are cloned either pre-PCR (smPCR) or post-PCR (using competent E. coli cells). An alternative strategy, which we have not discussed here, is amplicon sequencing using ultrahigh-throughput sequencing technology; however, this method may still be suboptimal at separating very similar homoeologs.
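How many clones must be screened to be reasonably sure that all homoeologous copies have been found can be approximated with a simple sampling model. The sketch below assumes that clones are independent draws and that each homoeolog has a fixed relative amplification frequency; both are idealisations, the frequencies used are hypothetical, and the function name `prob_all_recovered` is not from the original study.

```python
from itertools import combinations


def prob_all_recovered(frequencies, n_clones):
    """Probability that every homoeolog is seen at least once among n_clones.

    `frequencies` are relative amplification frequencies (any positive scale);
    the result follows from inclusion-exclusion over the sets of missed copies.
    """
    total = sum(frequencies)
    shares = [f / total for f in frequencies]
    probability = 0.0
    for k in range(len(shares) + 1):
        for missed in combinations(shares, k):
            probability += (-1) ** k * (1.0 - sum(missed)) ** n_clones
    return probability


if __name__ == "__main__":
    # Two copies in an octoploid: equal amplification versus strong PCR selection.
    for label, freqs in (("equal", [1, 1]), ("biased 9:1", [9, 1])):
        for n in (10, 52):
            print(f"{label}, {n} clones: {prob_all_recovered(freqs, n):.4f}")
```

Under equal amplification a handful of clones suffices, whereas strong PCR selection against one copy sharply increases the number of clones needed; a copy that remains undetected after many clones, as in the 52 clones screened for C. arvense, is therefore more plausibly explained by loss or failed amplification than by chance alone.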
ACKNOWLEDGEMENTS

Thanks to Russell Orr for helpful advice on the Bayesian analyses, and to two anonymous reviewers whose comments have considerably improved the manuscript.
LITERATURE CITED Abbott, R.J. & Lowe, A.J. 2004. Origins, establishment and evolution of new polyploid species: Senecio cambrensis and S. eboracensis in the British Isles. Biol. J. Linn. Soc. 82: 467–474. Adams, K.L., Cronn, R., Percifield, R. & Wendel, J.F. 2003. Genes duplicated by polyploidy show unequal contributions to the transcriptome and organ-specific reciprocal silencing. Proc. Natl. Acad. Sci. U.S.A. 100: 4649–4654. Adams, K.L. & Wendel, J.F. 2005. Polyploidy and genome evolution in plants. Curr. Opin. Pl. Biol. 8: 135–141. Ainouche, M.L., Fortune, P.M., Salmon, A., Grandbastien, M.-A., Fukunaga, K., Ricou, M. & Misset, M.-T. 2009. Hybridization, polyploidy and invasion: Lessons from Spartina (Poaceae). Biol. Invas. 11: 1159–1173. Akhunov, E.D., Akhunova, A.R. & Dvorak, J. 2007. Mechanisms and rates of birth and death of dispersed duplicated genes during the evolution of a multigene family in diploid and tetraploid wheats. Molec. Biol. Evol. 24: 539–550. Álvarez, I. & Wendel J.F. 2003. Ribosomal ITS sequences and plant phylogenetic inference. Molec. Phylog. Evol. 29: 417–434. Arnold, M.L. 1997. Natural hybridization and evolution. Oxford: Oxford University Press. Boşcaiu, M.T. 1996. Multidisciplinary studies on some groups of perennial Cerastium species from the Carpathians and the eastern Alps. Ph.D. dissertation, University of Vienna, Vienna, Austria. Boşcaiu, M., Marhold, K. & Ehrendorfer, F. 1997. The Cerastium alpinum group (Caryophyllaceae) in the high mountains of Poland and Slovakia. Phyton (Horn) 37: 1–17. Boşcaiu, M., Vicente, O. & Ehrendorfer, F. 1999. Chromosome numbers, karyotypes and nuclear DNA contents from perennial polyploid groups of Cerastium (Caryophyllaceae). Pl. Syst. Evol. 218: 13–21. Bowers, J.E., Chapman, B.A., Rong, J. & Paterson, A.H. 2003. Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events. Nature 422: 433–438. Brochmann, C. & Brysting, A.K. 2008. 
The Arctic – an evolutionary freezer? Pl. Ecol. Diversity 1: 181–195. Brochmann, C., Brysting, A.K., Alsos, I., Borgen, L., Grundt, H.H., Scheen, A.-C. & Elven, R. 2004. Polyploidy in arctic plants. Biol. J. Linn. Soc. 82: 521–536. Brysting, A.K. 2000. Chromosome number variation in the polyploid Cerastium alpinum-C. arcticum complex (Caryophyllaceae). Nordic J. Bot. 20: 149–156. Brysting, A.K. 2008. The Arctic Mouse-ear in Scotland – and why it is not arctic. Pl. Ecol. Diversity 1: 321–328. Brysting, A.K. & Borgen, L. 2000. Isozyme analysis of the Cerastium alpinum-C. arcticum complex (Caryophyllaceae) supports a splitting of C. arcticum Lange. Pl. Syst. Evol. 220: 199–221. Brysting, A.K. & Elven, R. 2000. The Cerastium alpinum-C. arcticum complex (Caryophyllaceae): Numerical analyses of morphological variation and a taxonomic revision of C. arcticum Lange s.l. Taxon 49: 189–216. Brysting, A.K., Oxelman, B., Huber, K.T., Moulton, V. & Brochmann, C. 2007. Untangling complex histories of genome mergings in high polyploids. Syst. Biol. 56: 467–476. Buckler, E.S., Ippolito, A. & Holtsford, T.P. 1997. The evolution of ribosomal DNA: Divergent paralogues and phylogenetic implications. Genetics 145: 821–832.
Buggs, R.J.A., Doust, A.N., Tate, J.A., Koh, J., Soltis, K., Feltus, F.A., Paterson, A.H., Soltis, P.S. & Soltis, D.E. 2009. Gene loss and silencing in Tragopogon miscellus (Asteraceae): Comparison of natural and synthetic allotetraploids. Heredity 103: 73–81. Chen, Z.J. 2007. Genetic and epigenetic mechanisms for gene expression and phenotypic variation in plant polyploids. Annual Rev. Pl. Biol. 58: 377–406. Cline, J., Braman, J.C. & Hogrefe, H.H. 1996. PCR fidelity of Pfu DNA polymerase and other thermostable DNA polymerases. Nucl. Acids Res. 24: 3546–3551. Comai, L. 2000. Genetic and epigenetic interactions in allopolyploid plants. Genetics 10: 279–284. Cronn, R., Cedroni, M., Haselkorn, T., Grover, C. & Wendel, J.F. 2002. PCR-mediated recombination in amplification products derived from polyploid cotton. Theor. Appl. Genet. 104: 482–489. Doyle, J.J., Doyle, J.L., Rauscher, J.T. & Brown, A.H.D. 2003. Diploid and polyploid reticulate evolution throughout the history of the perennial soybeans (Glycine subgenus Glycine). New Phytol. 161: 121–132. Doyle, J.J., Flagel, L.E., Paterson, A.H., Soltis, D.E., Soltis, P.S. & Wendel, J.F. 2008. Evolutionary genetics of genome merger and doubling in plants. Annual Rev. Genet. 42: 443–461. Drea, S.C., Lao, N.T., Wolfe, K.H. & Kavanagh, T.A. 2006. Gene duplication, exon gain and neofunctionalization of OEP16-related genes in land plants. Pl. J. 46: 723–735. Elven, R. (ed.). 2007 onwards. Checklist of the Panarctic flora (PAF) vascular plants. Version Feb. 2008. http://www.binran.ru/infsys/paflist/index.htm. Favarger, C. 1969. De caryologia Cerastiorum specierum aliquot imprimis in Peninsula Balcanica crescentium. Acta Bot. Croat. 28: 63–74. Fawcett, J.A., Maere, S. & Van de Peer, Y. 2009. Plants with double genomes might have had a better chance to survive the Cretaceous–Tertiary extinction event. Proc. Natl. Acad. Sci. U.S.A. 106: 5737–5742. Federico, M.L., Iñiguez-Luy, F.L., Skadsen, R.W. & Kaeppler, H.F. 2006.
Spatial and temporal divergence of expression in duplicated barley germin-like protein-encoding genes. Genetics 174: 179–190. Felsenstein, J. 1985. Confidence limits on phylogenies: An approach using the bootstrap. Evolution 39: 783–791. Force, A., Lynch, M., Pickett, F.B., Amores, A., Yan, Y.-L. & Postlethwait, J. 1999. Preservation of duplicate genes by complementary, degenerative mutations. Genetics 151: 1531–1545. Frajman, B., Eggens, F. & Oxelman, B. 2009. Hybrid origins and homoploid reticulate evolution within Heliosperma (Sileneae, Caryophyllaceae)—a multigene phylogenetic approach with relative dating. Syst. Biol. 58: 328–345. Frajman, B. & Oxelman, B. 2007. Reticulate phylogenetics and phytogeographical structure of Heliosperma (Sileneae, Caryophyllaceae) inferred from chloroplast and nuclear DNA sequences. Molec. Phylog. Evol. 43: 140–155. Goldblatt, P. 1981. Index to plant chromosome numbers 1975–1978. Michigan: Braun-Brumfield Inc. Goldblatt, P. 1984. Index to plant chromosome numbers 1979–1981. St. Louis: Missouri Botanical Garden Press. Goldblatt, P. & Johnson, D.E. 1996. Index to plant chromosome numbers 1992–1993. St. Louis: Missouri Botanical Garden Press. Goloboff, P., Farris, J. & Nixon, K. 2003. TNT: Tree analysis using new technology. www.zmuc.dk/public/phylogeny. Hagen, A.R., Giese, H. & Brochmann, C. 2001. Trans-Atlantic dispersal and phylogeography of Cerastium arcticum (Caryophyllaceae) inferred from RAPD and SCAR markers. Amer. J. Bot. 88: 103–112. Hagen, A.R., Sæther, T., Borgen, L., Elven, R., Stabbetorp, O.E. & Brochmann, C. 2002. The arctic-alpine polyploids Cerastium alpinum and C. nigrescens (Caryophyllaceae) in a sympatric situation: Breakdown of species integrity? Pl. Syst. Evol. 230: 203–219. Hall, T.A. 1999. BioEdit: A user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucl. Acids Symp. Ser. 41: 95–98.
He, X. & Zhang, J. 2005. Rapid subfunctionalization accompanied by prolonged and substantial neofunctionalization in duplicate gene evolution. Genetics 169: 1157–1164. Hegarty, M., Barker, G., Wilson, I., Abbott, R.J., Edwards, K.J. & Hiscock, S.J. 2006. Transcriptome shock after interspecific hybridization in Senecio is ameliorated by genome duplication. Curr. Biol. 16: 1652–1659. Hegarty, M.J. & Hiscock, S.J. 2008. Genomic clues to the evolutionary success of polyploid plants. Curr. Biol. 18: R435–R444. Hengen, P.N. 1995. Methods and reagents—fidelity of DNA polymerases for PCR. Trends Biochem. Sci. 20: 324–325. Hittinger, C. & Carroll, S. 2007. Gene duplication and the adaptive evolution of a classic genetic switch. Nature 449: 677–681. Huber, K.T., Oxelman, B., Lott, M. & Moulton, V. 2006. Reconstructing the evolutionary history of polyploids from multi-labelled trees. Molec. Biol. Evol. 23: 1784–1791. Hughes, A.L. 1994. The evolution of functionally novel proteins after gene duplication. Proc. Roy. Soc. London, Ser. B, Biol. Sci. 256: 119–124. Hultén, E. 1956. The Cerastium alpinum complex: A case of worldwide introgressive hybridization. Svensk Bot. Tidskr. 50: 411–495. Jaillon, O., Aury, J.-M., Noel, B., Policriti, A., Clepet, C., Casagrande, A., Choisne, N., Aubourg, S., Vitulo, N., Jubin, C., Vezzi, A., Legeai, F., Hugueney, P., Dasilva, C., Horner, D., Mica, E., Jublot, D., Poulain, J., Bruyère, C., Billault, A., Segurens, B., Gouyvenoux, M., Ugarte, E., Cattonaro, F., Anthouard, V., Vico, V., Del Fabbro, C., Alaux, M., Di Gaspero, G., Dumas, V., Felice, N., Paillard, S., Juman, I., Moroldo, M., Scalabrin, S., Canaguier, A., Le Clainche, I., Malacrida, G., Durand, E., Pesole, G., Laucou, V., Chatelet, P., Merdinoglu, D., Delledonne, M., Pezzotti, M., Lecharny, A., Scarpelli, C., Artiguenave, F., Pè, M.E., Valle, G., Morgante, M., Caboche, M., Adam-Blondon, A.-F., Weissenbach, J., Quétier, F. 
& Wincker, P., for The French–Italian Public Consortium for Grapevine Genome Characterization. 2007. The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature 449: 463–467. Jeffreys, A.J., Neumann, R. & Wilson, V. 1990. Repeat unit sequence variation in minisatellites: A novel source of DNA polymorphism for studying variation and mutation by single molecule analysis. Cell 60: 473–485. Khalaf, M.K. & Stace, C.A. 2000. Breeding systems and relationships of the Cerastium tomentosum group. Preslia 72: 323–344. Kraytsberg, Y. & Khrapko, K. 2005. Single-molecule PCR: An artifact-free PCR approach for the analysis of somatic mutations. Expert Rev. Molec. Diagnostics 5: 809–815. Leitch, A.R. & Leitch, I.J. 2008. Genomic plasticity and the diversity of polyploid plants. Science 320: 481–483. Lihová, J., Shimizu, K. & Marhold, K. 2006. Allopolyploid origin of Cardamine asarifolia (Brassicaceae): Incongruence between plastid and nuclear ribosomal DNA sequences solved by a single-copy nuclear gene. Molec. Phylog. Evol. 39: 759–786. Linder, C.R. & Rieseberg, L.H. 2004. Reconstructing patterns of reticulate evolution in plants. Amer. J. Bot. 91: 1700–1708. Lott, M., Spillner, A., Huber, K.T. & Moulton, V. 2009. PADRE: A package for analyzing and displaying reticulate evolution. Bioinformatics 25: 1199–1200. Löve, Á. 1969. IOPB Chromosome number reports XXII. Taxon 18: 434. Löve, Á. & Löve, D. 1975. Cytotaxonomical atlas of the Arctic flora. Vaduz: Cramer. Luo, J. & Hall, B. 2007. A multistep process gave rise to RNA polymerase IV of land plants. J. Molec. Evol. 64: 101–112. Luo, J., Yoshikawa, N., Hodson, M. & Hall, B.D. 2007. Duplication and paralog sorting of RPB2 and RPB1 genes in core eudicots. Molec. Phylog. Evol. 44: 850–862. Lysak, M.A., Berr, A., Pecinka, A., Schmidt, R., McBreen, K. & Schubert, I. 2006. Mechanisms of chromosome number reduction
in Arabidopsis thaliana and related Brassicaceae species. Proc. Natl. Acad. Sci. U.S.A. 103: 5224–5229. Marcussen, T., Oxelman, B., Skog, A. & Jakobsen, K.S. 2010. Evolution of plant RNA polymerase IV/V genes: Evidence of subneofunctionalization of duplicated NRPD2/NRPE2-like paralogs in Viola (Violaceae). BMC Evol. Biol. 10: 45. DOI: 10.1186/1471-2148-10-45. McBreen, K. & Lockhart, P.J. 2006. Reconstructing reticulate evolutionary histories of plants. Trends Pl. Sci. 11: 398–404. Merxmüller, H. & Strid, A. 1977. A new species in the Cerastium alpinum group from Mt Olympus, Greece. Bot. Not. 130: 469–472. Morton, J.K. 2005. Cerastium L. Pp. 74–93 in: Flora of North America Editorial Committee (ed.), Flora of North America, vol. 5. Oxford: Oxford University Press. Muir, G. & Filatov, D. 2007. A selective sweep in the chloroplast DNA of dioecious Silene (section Elisanthe). Genetics 177: 1239–1247. Nylander, J.A.A. 2004. MrModeltest, version 2. Program distributed by the author. Evolutionary Biology Centre, Uppsala University. Osborn, T.C., Pires, J.C., Birchler, J.A., Auger, D.L., Chen, Z.J., Lee, H.-S., Comai, L., Madlung, A., Doerge, R.W., Colot, V. & Martienssen, R.A. 2003. Understanding mechanisms of novel gene expression in polyploids. Trends Genet. 19: 141–147. Oxelman, B., Yoshikawa, N., McConaughy, B.L., Luo, J., Denton, A.L. & Hall, B.D. 2004. RPB2 gene phylogeny in flowering plants, with particular emphasis on asterids. Molec. Phylog. Evol. 32: 462–479. Pääbo, S., Irwin, D.M. & Wilson, A.C. 1990. DNA damage promotes jumping between templates during enzymatic amplification. J. Biol. Chem. 265: 4718–4721. Paterson, A.H., Bowers, J.E. & Chapman, B.A. 2004. Ancient polyploidization predating divergence of the cereals, and its consequences for comparative genomics. Proc. Natl. Acad. Sci. U.S.A. 101: 9903–9908. Pfeil, B.E., Schlueter, J.A., Shoemaker, R.C. & Doyle, J.J. 2005.
Placing paleopolyploidy in relation to taxon divergence: A phylogenetic analysis in legumes using 39 gene families. Syst. Biol. 54: 441–454. Popp, M., Erixon, P., Eggens, F. & Oxelman, B. 2005. Origin and evolution of a circumpolar polyploid species complex in Silene (Caryophyllaceae) inferred from low copy nuclear RNA polymerase introns, rDNA, and chloroplast DNA. Syst. Bot. 30: 302–313. Popp, M., Gizaw, A., Nemomissa, S., Suda, J. & Brochmann, C. 2008. Colonization and diversification in the African ‘sky islands’ by Eurasian Lychnis L. (Caryophyllaceae). J. Biogeogr. 35: 1016–1029. Popp, M. & Oxelman, B. 2001. Inferring the history of the polyploid Silene aegaea (Caryophyllaceae) using plastid and homoeologous nuclear DNA sequences. Molec. Phylog. Evol. 20: 474–481. Popp, M. & Oxelman, B. 2004. Evolution of a RNA polymerase gene family in Silene (Caryophyllaceae)—incomplete concerted evolution and topological congruence among paralogues. Syst. Biol. 53: 914–932. Posada, D. & Crandall, K.A. 2001. Selecting the best-fit model of nucleotide substitution. Syst. Biol. 50: 580–601. Qiu, X., Wu, L., Huang, H., McDonel, P.E., Palumbo, A.V., Tiedje, J.M. & Zhou, J. 2001. Evaluation of PCR-generated chimeras, mutations, and heteroduplexes with 16S rRNA gene-based cloning. Appl. Environm. Microbiol. 67: 880–887. Ralph, S., Oddy, C., Cooper, D., Yueh, H., Jancsik, S., Kolosova, N., Philippe, R.N., Aeschliman, D., White, R., Huber, D., Ritland, C.E., Benoit, F., Rigby, T., Nantel, A., Butterfield, Y.S., Kirkpatrick, R., Chun, E., Liu, J., Palmquist, D., Wynhoven, B., Stott, J., Yang, G., Barber, S., Holt, R.A., Siddiqui, A., Jones, S.J., Marra, M.A., Ellis, B.E., Douglas, C.J., Ritland, K. & Bohlmann, J. 2006. Genomics of hybrid poplar (Populus trichocarpa × deltoides) interacting with forest tent caterpillars (Malacosoma disstria): Normalized and full-length cDNA libraries, expressed sequence tags, and a cDNA microarray for the study of insect-induced defences in poplar. Molec.
Ecol. 15: 1275–1297. Rauscher, J.T., Doyle, J.J. & Brown, A.H.D. 2002. Internal transcribed spacer repeat-specific primers and the analysis of hybridization in
the Glycine tomentella (Leguminosae) polyploid complex. Molec. Ecol. 11: 2691–2702. Rauscher, J.T., Doyle, J.J. & Brown, A.H.D. 2004. Multiple origins and nrDNA internal transcribed spacer homeologue evolution in the Glycine tomentella (Leguminosae) allopolyploid complex. Genetics 166: 987–998. Ream, T.S., Haag, J.R., Wierzbicki, A.T., Nicora, C.D., Norbeck, A., Zhu, J.K., Hagen, G., Guilfoyle, T.J., Paša-Tolić, L. & Pikaard, C.S. 2009. Subunit compositions of the RNA-silencing enzymes Pol IV and Pol V reveal their origins as specialized forms of RNA Polymerase II. Molec. Cell 33: 192–203. Rieseberg, L.H. & Soltis, D.E. 1991. Phylogenetic consequences of cytoplasmic gene flow in plants. Evol. Trends Pl. 5: 65–83. Rokas, A., Williams, B.L., King, N. & Carroll, S.B. 2003. Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature 425: 798–804. Ronquist, F. & Huelsenbeck, J.P. 2003. MRBAYES 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19: 1572–1574. Runemark, H. 1961. Studies in the Aegean flora. III. Bot. Not. 114: 453–456. Saitoh, K. & Chen, W.J. 2008. Reducing cloning artifacts for recovery of allelic sequences by T7 endonuclease I cleavage and single re-extension of PCR products—a benchmark. Gene 423: 92–95. Salmon, A., Ainouche, M.L. & Wendel, J.F. 2005. Genetic and epigenetic consequences of recent hybridization and polyploidy in Spartina (Poaceae). Molec. Ecol. 14: 1163–1175. Sang, T. 2002. Utility of low-copy nuclear gene sequences in plant phylogenetics. Crit. Rev. Biochem. Molec. Biol. 37: 121–147. Scheen, A.-C., Brochmann, C., Brysting, A.K., Elven, R., Morris, A., Soltis, D.E., Soltis, P.S. & Albert, V. 2004. Northern hemisphere biogeography of Cerastium (Caryophyllaceae): Insights from phylogenetic analysis of noncoding plastid nucleotide sequences. Amer. J. Bot. 91: 943–952. Schischkin, B.K. 1970. Cerastium L. Pp. 330–359 in: Komarov, L. & Schischkin, B.K. (eds.), Flora of the USSR, vol. 6.
Jerusalem: Israel Program for Scientific Translations. Shimizu-Inatsugi, R., Lihová, J., Iwanaga, H., Kudoh, H., Marhold, K., Savolainen, O., Watanabe, K., Yakubov, V.V. & Shimizu, K.K. 2009. The allopolyploid Arabidopsis kamchatica originated from multiple individuals of Arabidopsis lyrata and Arabidopsis halleri. Molec. Ecol. 18: 4024–4048. Simmons, M.P. & Ochoterena, H. 2000. Gaps as characters in sequence-based phylogenetic analyses. Syst. Biol. 49: 369–381. Small, R.L., Cronn, R.C. & Wendel, J.F. 2004. Use of nuclear genes for phylogeny reconstruction in plants. Austral. Syst. Bot. 17: 145–170. Smedmark, J.E.E., Eriksson, T. & Bremer, B. 2005. Allopolyploid evolution in Geinae (Colurieae: Rosaceae)—Building reticulate species trees from bifurcating gene trees. Organisms Diversity Evol. 5: 275–283. Soltis, D.E., Buggs, R.J.A., Doyle, J.J. & Soltis, P.S. 2010. What we still don’t know about polyploidy. Taxon 59: 1387–1403. Soltis, D.E., Mavrodiev, E.V., Doyle, J.J., Rauscher, J. & Soltis, P.S. 2008. ITS and ETS sequence data and phylogeny reconstruction in allopolyploids and hybrids. Syst. Bot. 33: 7–20. Soltis, D.E., Soltis, P.S., Pires, J.C., Kovarik, A., Tate, J.A. & Mavrodiev, E. 2004. Recent and recurrent polyploidy in Tragopogon (Asteraceae): Cytogenetic, genomic and genetic comparisons. Biol. J. Linn. Soc. 82: 485–501. Soltis, D.E., Soltis, P.S., Schemske, D.W., Hancock, J.F., Thompson, J.N., Husband, B.C. & Judd, W.S. 2007. Autopolyploidy in angiosperms: Have we grossly underestimated the number of species? Taxon 56: 13–30. Soltis, D.E., Soltis, P.S. & Tate, J.A. 2003. Advances in the study of polyploidy since plant speciation. New Phytol. 161: 173–191. Soltis, P.S. & Soltis, D.E. 2009. The role of hybridization in plant speciation. Annual Rev. Pl. Biol. 60: 561–588.
Tate, J.A., Ni, Z., Scheen, A.-C., Koh, J., Gilbert, C.A., Lefkowitz, D., Chen, Z.J., Soltis, P.S. & Soltis, D.E. 2006. Evolution and expression of homeologous loci in Tragopogon miscellus (Asteraceae), a recent and reciprocally formed allopolyploid. Genetics 173: 1599–1611. Tate, J.A., Soltis, D.E. & Soltis, P.S. 2005. Polyploidy in plants. Pp. 371–426 in: Gregory, T.R. (ed.), The evolution of the genome. San Diego: Elsevier Academic Press. Thomas, B.C., Pedersen, B. & Freeling, M. 2006. Following tetraploidy in an Arabidopsis ancestor, genes were removed preferentially from one homeolog leaving clusters enriched in dose-sensitive genes. Genome Res. 16: 934–946. Thompson, J.R., Marcelino, L.A. & Polz, M.F. 2002. Heteroduplexes in mixed-template amplifications: formation, consequence and elimination by ‘reconditioning PCR’. Nucl. Acids Res. 30: 2083–2088. Town, C.D., Cheung, F., Maiti, R., Crabtree, J., Haas, B.J., Wortman, J.R., Hine, E.E., Althoff, R., Arbogast, T.S., Tallon, L.J., Vigouroux, M., Trick, M. & Bancroft, I. 2006. Comparative genomics of Brassica oleracea and Arabidopsis thaliana reveal gene loss, fragmentation, and dispersal after polyploidy. Pl. Cell 18: 1348–1359. Tsitrone, A., Kirkpatrick, M. & Levin, D.A. 2003. A model for chloroplast capture. Evolution 57: 1776–1782. Tuskan, G.A., DiFazio, S., Jansson, S., Bohlmann, J., Grigoriev, I., Hellsten, U., Putnam, N., Ralph, S., Rombauts, S., Salamov, A., Schein, J., Sterck, L., Aerts, A., Bhalerao, R.R., Bhalerao, R.P., Blaudez, D., Boerjan, W., Brun, A., Brunner, A., Busov, V., Campbell, M., Carlson, J., Chalot, M., Chapman, J., Chen, G.-L.,
Cooper, D., Coutinho, P.M., Couturier, J., Covert, S., Cronk, Q., Cunningham, R., Davis, J., Degroeve, S., Déjardin, A., dePamphilis, C., Detter, J., Dirks, B., Dubchak, I., Duplessis, S., Ehlting, J., Ellis, B., Gendler, K., Goodstein, D., Gribskov, M., Grimwood, J., Groover, A., Gunter, L., Hamberger, B., Heinze, B., Helariutta, Y., Henrissat, B., Holligan, D., Holt, R., Huang, W., Islam-Faridi, N., Jones, S., Jones-Rhoades, M.,
Jorgensen, R., Joshi, C., Kangasjärvi, J., Karlsson, J., Kelleher, C., Kirkpatrick, R., Kirst, M., Kohler, A., Kalluri, U., Larimer, F., Leebens-Mack, J., Leplé, J.-C., Locascio, P., Lou, Y., Lucas, S., Martin, F., Montanini, B., Napoli, C., Nelson, D.R., Nelson, C., Nieminen, K., Nilsson, O., Pereda, V., Peter, G., Philippe, R., Pilate, G., Poliakov, A., Razumovskaya, J., Richardson, P., Rinaldi, C., Ritland, K., Rouzé, P., Ryaboy, D., Schmutz, J., Schrader, J., Segerman, B., Shin, H., Siddiqui, A., Sterky, F., Terry, A., Tsai, C.-J., Uberbacher, E., Unneberg, P., Vahala, J., Wall, K., Wessler, S., Yang, G., Yin, T., Douglas, C., Marra, M., Sandberg, G., Van de Peer, Y. & Rokhsar, D. 2006. The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science 313: 1596–1604. Van de Peer, Y., Fawcett, J.A., Proost, S., Sterck, L. & Vandepoele, K. 2009a. The flowering world: A tale of duplications. Trends Pl. Sci. 14: 680–688. Van de Peer, Y., Maere, S. & Meyer, A. 2009b. The evolutionary significance of ancient genome duplications. Nat. Rev. Genet. 10: 725–732. Vilatersana, R., Brysting, A.K. & Brochmann, C. 2007. Molecular evidence for hybrid origins of the invasive polyploids Carthamus creticus and C. turkestanicus (Cardueae, Asteraceae). Molec. Phylog. Evol. 44: 610–621. Vriesendorp, B. & Bakker, F.T. 2005. Reconstructing patterns of reticulate evolution in angiosperms: What can we do? Taxon 54: 593–604. Wang, R., Chong, K. & Wang, T. 2006. Divergence in spatial expression patterns and in response to stimuli of tandem-repeat paralogues encoding a novel class of proline-rich proteins in Oryza sativa. J. Exp. Bot. 57: 2887–2897. Wendel, J.F., Schnabel, A. & Seelanan, T. 1995. Bi-directional interlocus concerted evolution following allopolyploid speciation in cotton (Gossypium). Proc. Natl. Acad. Sci. U.S.A. 92: 280–284. Yang, X., Tuskan, G.A. & Cheng, Z.-M. 2006.
Divergence of the Dof gene families in poplar, Arabidopsis, and rice suggests multiple modes of gene evolution after duplication. Pl. Physiol. 142: 820–830.
Appendix. Cerastium accessions with collection locality, voucher information, chromosome number and GenBank accession numbers (accession numbers refer to single original sequences which were used as consensus sequences to represent several other identical or very similar sequences in the phylogenetic analyses). Detailed collection data can be found in Brysting & al. (2007).
Taxon,1 collection locality, voucher information (herbarium), chromosome number (2n),² GenBank accession number (no. of identical cloned sequences) NRPB2; NRPD2/E2a; NRPD2/E2b Cerastium subg. Eucerastium, sect. Orthodon, the C. alpinum group: C. alpinum L., Austria, Tirol, Brysting, A. AL-4-22 (O), 72 (Brysting, 2000), DQ274069 (5), DQ274070 (4); GQ415095 (28), GQ415096 (4); GQ415126 (2), GQ415127 (5). C. arcticum Lange, Svalbard, Nordenskiöld Land, Nordal, I. 4415-2 (O), 108, DQ274071 (7), DQ274072 (7); GQ415097 (1), GQ415098 (11), GQ415099 (8); GQ415128 (1), GQ415129 (5), GQ415130 (3), GQ415131 (8). C. beeringianum Cham. & Schlect., Canada, Nunavut, Aiken, S. & Brysting, A. 01-356 (O), 72, DQ274080 (5), DQ274081 (5); GQ415102 (4), GQ415103 (5); GQ415138 (6), GQ415139 (1), GQ415140 (3). C. beeringianum Cham. & Schlect., Canada, Northwest Territories, Aiken, S. & Brysting, A. 01-560 (O), 72, DQ274078 (7), DQ274079 (8); GQ415104 (5), GQ415105 (7); GQ415135 (6), GQ415136 (1), GQ415137 (9). C. eriophorum Kit., Austria, Gurktaler Alpen, Brysting, A. AL-1-9 (O), 36, DQ274086 (8), DQ274087 (1); GQ415108 (6); GQ415144 (6). C. fischerianum Ser., U.S.A., Alaska, Elven, R. & Solstad, H. (O), 72, DQ274088 (5), DQ274089 (5); GQ415109 (8), GQ415110 (8); GQ415145 (1), GQ415146 (3), GQ415147 (5). C. jenisejense Hultén, Russia, Sakha Republic, Solstad, H. & Elven, R. SUP 04-3835 (O), 72, DQ274090 (3), DQ274091 (4); GQ415111 (11), GQ415112 (8); GQ415148 (5), GQ415149 (5). C. nigrescens (H.C. Watson) Edmonston ex H.C. Watson, Norway, Sør-Trøndelag, Sæther, T. & Hagen, A. HS9070 (O), 108 (Hagen & al., 2002), DQ274097 (7), DQ274098 (3), DQ274099 (3); GQ415115 (1), GQ415116 (2), GQ415117 (7); GQ415152 (4), GQ415153 (2), GQ415154 (7). C. pusillum Ser., Russia, Altai, Grundt, H.H. & Borgen, L. (O), 72, DQ274100 (7), DQ274101 (6); GQ415118 (20); GQ415155 (7). C. regelii Ostenf., Svalbard, Nordenskiöld Land, Nordal, I. 
4414-3 (O), 72, DQ274102 (6), DQ274103 (6); GQ415119 (4), GQ415120 (6); GQ415156 (7), GQ415157 (3). C. runemarkii Möschl & Rech. f., Greece, Naxos, Böhling, N. 4721 (O), 36, DQ274104 (6); GQ415121 (7); GQ415158 (10). C. theophrasti Merx. & Strid, Greece, Macedonia, Vilatersana, R. V-534 (O), 36, GQ415094 (6); GQ415122 (7); GQ415159 (6). The C. latifolium group: C. latifolium L., Austria, Tirol, Brysting, A. AL-4-7 (O), 36, DQ274092 (12); GQ415113 (8); GQ415150 (9). C. uniflorum Clairv., Austria, Kärnten, Brysting, A. AL-2-8 (O), 36 (Brysting, 2000), DQ274106 (6), DQ274107 (7); GQ415123 (2); GQ415160 (7). The C. arvense group: C. arvense L., Norway, Oslo, Scheen, A.-C. 01/01 (O), 72, DQ274075 (5), DQ274076 (7); GQ415100 (52); GQ415132 (20). C. arvense L. subsp. strictum Gaudin, Canada, Yukon Territory, Bennett, B. & al. 03/0814 (O), 36, DQ274077 (5); GQ415101 (12); GQ415133 (8), GQ415134 (15). C. velutinum Raf., U.S.A., Maryland, Gustafson, D.J. DJG18, 72, DQ274108 (4), DQ274109 (6); GQ415124 (2), GQ415125 (7); GQ415161 (1), GQ415162 (11). The C. tomentosum group: C. biebersteinii DC., Ukraine, South Crimea, Khalaf, M.K. & Stace, C.A. CER81 (LTR), 36 (Khalaf & Stace, 2000), DQ274082 (8); GQ415106 (5); GQ415141 (4), GQ415142 (6). Sect. Strephodon: C. lithospermifolium Fisch., Russia, Altai, Tribsch, A. & Essl, F. 10490 (O), 72 (Brysting, unpub.), DQ274093 (11); GQ415114 (7); GQ415151 (4). Subg. Dichodon: C. cerastoides (L.) Britton, Norway, Nordland, Stabbetorp, O.E. (O), 38, DQ274084 (7); GQ415107 (2); GQ415143 (10). 1 Taxa from C. sect. Orthodon, subg. Eucerastium (Schischkin, 1970) are designated to groups suggested in the literature (Hultén, 1956; Merxmüller & Strid, 1977; Boşcaiu & al., 1997; Khalaf & Stace, 2000; Scheen & al., 2004). 2 Chromosome numbers from the literature: Runemark (1961), Favarger (1969), Löve (1969), Löve & Löve (1975), Merxmüller & Strid (1977), Goldblatt (1981, 1984), Goldblatt & Johnson (1996), Boşcaiu & al. 
(1999), Brysting (2000), Khalaf & Stace (2000). Chromosome vouchers are indicated by a reference following the chromosome number.