Int. J. Plant Sci. 168(2):000–000. 2007. © 2007 by The University of Chicago. All rights reserved. 1058-5893/2007/16802-00XX$15.00

A 567-TAXON DATA SET FOR ANGIOSPERMS: THE CHALLENGES POSED BY BAYESIAN ANALYSES OF LARGE DATA SETS

Douglas E. Soltis,1,* Matthew A. Gitzendanner,* and Pamela S. Soltis†

*Department of Botany, University of Florida, Gainesville, Florida 32611, U.S.A.; and †Florida Museum of Natural History, University of Florida, Gainesville, Florida 32611, U.S.A.

Bayesian analyses of a three-gene, 567-taxon (560 angiosperms, seven outgroups) data set revealed the analytical challenges posed by such large data sets. Determining stationarity in Markov chains for such large data sets is difficult. In more than 30 analyses of up to 10 million generations each (with an average run time of 45 d), log-likelihood plots showed that runs can stabilize for several million generations before making jumps in likelihood scores. Simultaneous independent runs reached apparent stationarity as early as 2 million generations and as late as 9.7 million generations, suggesting that (a) 10 million generations are insufficient for data sets of this size and (b) periods of stationarity even as long as 6 million generations should not be taken as an indication that the tree is fully optimized. Our Bayesian analyses recovered a topology highly similar to that found previously with parsimony. However, a few topological differences were found between the Bayesian and shortest parsimony trees obtained for the same data set, the most noteworthy of which is that a clade (posterior probability [pp] = 0.99) of Amborellaceae + Nymphaeaceae is sister to all other extant angiosperms (pp = 1.0) in the Bayesian tree, whereas Amborellaceae alone are sister to all other extant angiosperms with parsimony. Additionally, the Bayesian analysis indicates that the magnoliids and Chloranthaceae are sister to Ceratophyllum and eudicots rather than the monocots, as indicated by the parsimony analyses. Many clades receiving moderate to low jackknife support in parsimony analyses received pp values of 1.0.


Keywords: angiosperm phylogeny, Bayesian analyses, large data sets.

Introduction

Bayesian inference is now routinely used in phylogeny reconstruction and has been employed in analyses of diverse organisms, including plants. Although the advantages of Bayesian analyses for large data sets have been espoused (e.g., Huelsenbeck et al. 2001), many properties of Bayesian methods remain unexplored (e.g., Goloboff and Pol 2005; Randle et al. 2005; Yang and Rannala 2005). For example, surprisingly few Bayesian analyses have involved large data sets. The demonstration by Huelsenbeck et al. (2001) that MrBayes could provide model-based phylogenetic inference, with support values, for a data set of 357 angiosperms certainly contributed to the enthusiasm for the Bayesian approach. More recent Bayesian studies by Rydin et al. (2002) and Qiu et al. (2005) for ca. 100 plant taxa and by Wurdack et al. (2005) for more than 200 plant species have pushed model-based inference beyond the limits typical of most maximum likelihood programs available to date (but see Guindon and Gascuel 2003; Stamatakis et al. 2005; Zwickl 2006). Perhaps most notable are recent Bayesian analyses of fungi (Lutzoni et al. 2004), especially a two-locus data set (nucSSU and nucLSU rDNA) with 558 species representing all traditionally recognized fungal phyla.

Considerable progress has been made in recent years toward resolving phylogenetic relationships at deep levels in the angiosperms; nonetheless, numerous relationships remain unclear. We previously conducted a parsimony analysis of a three-gene data set for 560 angiosperms plus seven outgroups (Soltis et al. 1999, 2000a). At that time, model-based approaches with such large data sets were not feasible. However, with the development of MrBayes shortly after the publication of these articles and with improvements in computational speed and the establishment of computer clusters, it is now possible to conduct Bayesian analyses of data sets of these dimensions. The potential benefits of model-based approaches for alleviating long-branch attraction (e.g., Huelsenbeck 1995; Gaucher and Miyamoto 2005) make Bayesian analysis particularly attractive for reconstructing deep branches of angiosperm phylogeny. However, the potential pitfalls of Bayesian analyses of data sets of this size are not clear. Here we report the results of a Bayesian analysis of the 567-taxon, three-gene data set for angiosperms, one of the largest data sets yet analyzed, and consider its performance relative to previously published parsimony results.

Material and Methods

Data Set

We used the three-gene data set described by Soltis et al. (2000a): rbcL (1428 bp), atpB (1528 bp), and 18S rDNA (1855 bp). This data set had 4811 aligned characters per taxon (after excluding various sites, 4621 aligned characters) for 560 angiosperms and seven outgroup taxa representing three gymnosperm lineages: Ephedra, Gnetum, and Welwitschia (gnetophytes); Ginkgo; and Pinus, Podocarpus, and Taxus (conifers). Collection and voucher information are provided by Soltis et al. (2000a).

1 Author for correspondence; e-mail [email protected].

Manuscript received March 2006; revised manuscript received July 2006.

Bayesian Analyses

Selection of an appropriate model of sequence evolution (e.g., using ModelTest [Posada and Crandall 1998] or MrModelTest [Nylander 2004]) for this data set could not be accomplished via standard methods because PAUP*, version 4.0b10 (Swofford 2000), was unable to calculate the likelihood of the tree for some of the more complex models. Therefore, we used a reduced data set with the same three genes but only 193 taxa to run these model selection tests (data set from Soltis et al. 1998). Results of these tests indicated that GTR + I + G was the best model for the reduced data set, based on both hierarchical likelihood ratio tests and the Akaike Information Criterion. Therefore, the results presented here used this model. Other runs were conducted using a simpler model (K80 + I + G) in an effort to decrease computation time. However, we found that these runs were as slow as, or slower than, those with GTR + I + G (data not shown).

All analyses were conducted with MrBayes, version 3.0b4 (preliminary analyses), or MrBayes, version 3.1.2 (published results), compiled for parallel processing (MPI enabled) (Huelsenbeck and Ronquist 2001; Ronquist and Huelsenbeck 2003; Altekar et al. 2004). A 16-node Apple XServe dual-G5 computer cluster with MPICH, version 1.2.7 (Gropp et al. 1996), and Sun Grid Engine (Sun Microsystems) was used for the analyses, separating runs and/or chains across processors. In total, more than 7.1 central processing unit (CPU) years were used for the 30 runs. Runs were conducted with GTR + I + G, with or without starting trees, and with different heating regimes. All runs were planned to run for 10 million generations. For most analyses, the computer cluster took about 2 wk per million generations.
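As a rough illustration of the Akaike Information Criterion comparison mentioned above, the following Python sketch scores two models by AIC = 2k − 2 ln L and picks the lower score. It is not the ModelTest procedure itself. The function names are hypothetical, the parameter counts cover only the substitution model (branch lengths are assumed to be shared by both models and are omitted), and the log-likelihoods are placeholders, not values from this study.

```python
# Free substitution-model parameters (branch lengths are shared and omitted).
# GTR: 5 relative rates + 3 base frequencies; K80: 1 transition/transversion ratio.
# "+ I + G" adds a proportion of invariable sites and a gamma shape parameter.
FREE_PARAMETERS = {"GTR+I+G": 5 + 3 + 2, "K80+I+G": 1 + 2}


def aic(log_likelihood, free_parameters):
    """Akaike Information Criterion: AIC = 2k - 2 ln L (lower is better)."""
    return 2 * free_parameters - 2 * log_likelihood


def best_model(log_likelihoods):
    """Return the name of the model with the lowest AIC."""
    return min(
        log_likelihoods,
        key=lambda name: aic(log_likelihoods[name], FREE_PARAMETERS[name]),
    )


# Placeholder log-likelihoods for illustration only; not values from this study.
example_lnl = {"GTR+I+G": -1000.0, "K80+I+G": -1020.0}
print(best_model(example_lnl))
```

The richer model wins only when its gain in log-likelihood outweighs the penalty for its extra parameters, which is why a simpler model can be preferred when the fit is nearly identical.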
For all analyses, four chains (three hot and one cold) were run, and the cold chain was sampled every 1000 generations; other parameters were left at default values, except when a user tree was supplied or a different heating parameter was used. This user tree was similar in topology to the presented results. Only a single outgroup can be specified with MrBayes; Ginkgo biloba was set as the outgroup for all analyses, with several additional gymnosperms included, as per the original parsimony analyses of Soltis et al. (2000a).

Twelve runs, with four chains each, using the GTR + I + G model ran for at least 10 million generations. These were stopped at generation 10,160,000 to summarize the results. Log-likelihood plots over time were examined to determine how best to combine the results from multiple runs and what burn-in to use for the runs (fig. A1). How to combine trees from multiple runs has not been fully addressed, especially when runs reach stationarity at different times during the run, as is common in larger data sets. Some runs had stalled at relatively low likelihoods for as long as 8 million generations, while others showed large increases in likelihoods as recently as 300,000 generations before termination. It could not be argued, therefore, that these runs had reached a stationary point with good convergence among all runs. However, nine of the 12 runs had converged on a likelihood score above 238,100 (fig. A1). Based on the log-likelihood graphs, we determined that the appropriate set of trees to examine was those obtained after a run had reached 238,050 for the first time. Some runs had reached this point early on (2.7 million generations) while others took much longer to reach this point (9.7 million generations). So as not to bias the consensus tree by the length of time that a run had been sampling the higher likelihood trees, we chose to select the same time frame from all runs. Additionally, we felt that it was important that runs should have at least 2.5 million generations of samples at the higher likelihoods. The last run to reach the 238,050 cutoff point and have at least 2.5 million generations before termination did so after 7,372,000 generations. Seven runs met our qualification criteria; thus, the trees from generations 7,372,000–10,160,000 were combined from these seven runs to produce the consensus tree and posterior probabilities presented. The impact of different methods of combining trees from multiple runs will be more fully explored in a later article (Gitzendanner et al., in prep.).

We provide a summary tree (fig. 1) and have divided it into 11 additional figures (figs. 2–12) and present them in an order and circumscription that matches the figures published for the parsimony analysis of the same data set (Soltis et al. 2000a). In this way, the topologies can be readily compared. Soltis et al.’s (2000a) tree figures can also be obtained from the Deep Time Web site (http://www.flmnh.ufl.edu/deeptime/projectsummary.html). For ease of comparison, we have generally followed the family names used by Soltis et al. (2000a).
However, the Icacinaceae were shown to be polyphyletic (Kårehed 2001); we have labeled figure 12 accordingly, using the appropriate family names Cardiopteridaceae and Stemonuraceae. We have also labeled Plantaginaceae and Orobanchaceae (fig. 11) and Primulaceae (fig. 10) to reflect the current delimitation of these families.
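The run-selection rule described in the Bayesian Analyses section (first crossing of a likelihood cutoff, a minimum number of generations remaining after the crossing, and a single sampling window shared by all qualifying runs) can be sketched in Python as follows. This is a minimal sketch, not the scripts used for the analyses. The function names and the trace format (a list of generation and log-likelihood pairs per run) are assumptions, and the numbers are toy values rather than data from this study.

```python
from typing import Dict, List, Optional, Tuple

# One run = list of (generation, log-likelihood) samples from the cold chain.
Trace = List[Tuple[int, float]]


def first_crossing(trace: Trace, cutoff: float) -> Optional[int]:
    """Return the first sampled generation at or above the cutoff, if any."""
    for generation, log_likelihood in trace:
        if log_likelihood >= cutoff:
            return generation
    return None


def select_window(traces: Dict[str, Trace], cutoff: float,
                  last_generation: int, min_span: int
                  ) -> Tuple[Optional[int], List[str]]:
    """Keep runs that cross the cutoff with at least min_span generations left.

    The shared window starts at the latest crossing among the kept runs, so
    every kept run contributes samples from the same range of generations.
    """
    crossings = {}
    for name, trace in traces.items():
        crossed = first_crossing(trace, cutoff)
        if crossed is not None and last_generation - crossed >= min_span:
            crossings[name] = crossed
    if not crossings:
        return None, []
    return max(crossings.values()), sorted(crossings)


# Toy traces (hypothetical values): run_c crosses too late, run_d never crosses.
toy = {
    "run_a": [(0, -500.0), (1000, -95.0), (2000, -90.0), (3000, -88.0), (4000, -87.0)],
    "run_b": [(0, -500.0), (1000, -300.0), (2000, -96.0), (3000, -89.0), (4000, -88.0)],
    "run_c": [(0, -500.0), (1000, -400.0), (2000, -300.0), (3000, -200.0), (4000, -99.0)],
    "run_d": [(0, -500.0), (1000, -450.0), (2000, -440.0), (3000, -430.0), (4000, -420.0)],
}
start, kept = select_window(toy, cutoff=-100.0, last_generation=4000, min_span=2000)
print(start, kept)
```

Trees sampled at or after the returned start generation would then be pooled from the kept runs only; using one window for all runs avoids weighting the consensus toward runs that reached the higher likelihoods earlier.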

Results

General Results

The length of time that it takes to analyze a data set of this magnitude limits our ability to compare different run conditions (i.e., the effects of starting trees, heating parameters, models of evolution). The set of 12 GTR + I + G runs summarized took 2 yr of CPU time and 2 mo of real time. Altogether, 30 runs were started, with many (indeed, most) terminating prematurely and only three additional runs reaching the 10-million-generation mark. Because of these limitations, we do not discuss the results of these run conditions beyond one observation: providing a topology without branch lengths as a starting tree (in which case MrBayes assigns equal length to all branches) does little to help the analysis. Such trees have likelihoods far below the optimum, and dramatic rearrangements can easily occur early in the analysis, leading to poor topologies that must then be optimized, just as in runs without starting trees.


Phylogenetic Implications

We provide a general overview of the Bayesian topology (figs. 1–12), focusing on the overall pattern and deep-level relationships. We do not provide great detail for each clade recognized at the ordinal level.

Basal angiosperms. Our Bayesian analysis resulted in a sister group of Amborellaceae + Nymphaeaceae (sensu APG II 2003) with a posterior probability (pp) of 0.99; this clade is sister to all other extant angiosperms (pp = 1.0; figs. 1, 2). Austrobaileyales (pp = 1.0) are sister to the remaining angiosperms, the monophyly of which also received a pp of 1.0. The remaining angiosperms form two clades: (1) the monocots (pp = 1.0) and (2) a trichotomy (pp = 0.56) consisting of Chloranthaceae (pp = 1.0), the magnoliids (pp = 0.99), and the eudicots + Ceratophyllaceae (pp = 0.91). Within the magnoliids, Piperales, Magnoliales, Laurales, and Canellales each have pp = 1.0; Magnoliales and Canellales are sisters (pp = 0.96), and Laurales are sister to Magnoliales + Canellales (pp = 0.70; fig. 2).

Monocots. Acorus is sister to a clade (pp = 1.0) of all remaining monocots (figs. 1, 3, 4). Alismatales (pp = 1.0), represented here by Zosteraceae, Araceae, Hydrocharitaceae, Aponogetonaceae, and Tofieldiaceae, follow Acoraceae as sister to all remaining monocots, which form a clade with pp = 1.0. Petrosaviaceae (pp = 1.0) are subsequently sister to the remaining monocots (pp = 0.97), which consist of four clades whose relationships are unresolved: (1) Liliales (pp = 1.0), (2) Pandanales (pp = 1.0), (3) Asparagales (pp = 1.0), and (4) a large clade (pp = 0.57) of Dioscoreales (pp = 1.0) plus the commelinids (pp = 0.99). Within the commelinid clade (fig. 4), our analyses suggest a tetrachotomy of (1) a clade (pp = 0.72) of Commelinaceae (pp = 1.0) plus Zingiberales (pp = 1.0), (2) Poales (pp = 1.0), (3) Arecales (pp = 1.0), and (4) Dasypogonaceae (pp = 1.0).

Basal eudicots. Within the eudicot clade (pp = 1.0), a grade of basal eudicots is present (fig. 5). Ranunculales (pp = 1.0) are sister to all remaining eudicots (pp = 0.99). Following Ranunculales, Proteales (pp = 1.0) are sister to the remaining eudicots, which are supported as a clade with pp = 0.88. Within the latter clade, Sabiaceae are sister to a clade (pp = 1.0) of all remaining eudicots. Following Sabiaceae, Buxaceae are sister to a clade (pp = 0.50) of the remaining eudicots, within which Trochodendraceae are sister to the core eudicots (pp = 1.0).

Core eudicots. Gunnerales (pp = 1.0), consisting of Gunneraceae + Myrothamnaceae, are sister to the remaining core eudicots, the monophyly of which receives pp = 0.98 (fig. 5). The remaining core eudicots form two clades (figs. 1, 5): (1) a clade (pp = 1.0) of Saxifragales (pp = 1.0) + rosids (pp = 1.0) (fig. 6) and (2) a clade (pp = 0.90) composed of Berberidopsidales, Santalales, caryophyllids + Dilleniaceae, and asterids. Within the second clade, Berberidopsidales (pp = 1.0; composed of Berberidopsis and Aextoxicon) are sister to a clade (pp = 0.82) of asterids (pp = 1.0), Santalales (pp = 1.0), and a sister group (pp = 1.0) of Caryophyllales (pp = 1.0) + Dilleniaceae (pp = 1.0). The relationship among these three clades is unresolved.

Caryophyllales. Rhabdodendraceae are sister to the remainder of the Caryophyllales (pp = 0.99), which consist of two subclades, each with pp = 1.0: (1) Plumbaginaceae, Polygonaceae, Frankeniaceae, Tamaricaceae, Dioncophyllaceae, Ancistrocladaceae, Droseraceae, and Nepenthaceae and (2) Phytolaccaceae, Aizoaceae, Nyctaginaceae, Cactaceae, Portulacaceae, Molluginaceae, Amaranthaceae, Caryophyllaceae, Asteropeiaceae, and Simmondsiaceae (fig. 5). Within the first of the two subclades of Caryophyllales, Plumbaginaceae + Polygonaceae (pp = 1.0), Frankeniaceae + Tamaricaceae (pp = 1.0), and a carnivorous clade of Dioncophyllaceae, Ancistrocladaceae, Droseraceae, and Nepenthaceae (pp = 1.0) form a trichotomy. Within the second subclade of Caryophyllales, Simmondsiaceae are sister to the remaining members (pp = 1.0), followed by Asteropeiaceae as sister to the remaining families (pp = 1.0). Caryophyllaceae + Amaranthaceae (pp = 1.0) are then sister to the remaining families (pp = 1.0), subsequently followed by Molluginaceae, which are sister to a clade (pp = 0.77) of Portulacaceae, Cactaceae, Nyctaginaceae, Phytolaccaceae, and Aizoaceae. Of these remaining families, Portulacaceae + Cactaceae (pp = 1.0) are sister to a clade (pp = 1.0) of Nyctaginaceae and Aizoaceae + Phytolaccaceae (pp = 1.0).

Saxifragales. Within Saxifragales, Hamamelidaceae (pp = 0.99) are sister to a clade (pp = 0.56) with Daphniphyllaceae sister to the remaining families (pp = 0.88). Within the latter clade, Altingiaceae (pp = 1.0), followed by Cercidiphyllaceae, are subsequent sisters to a clade (pp = 0.59) of the remaining families. Paeoniaceae are sister to the remaining families (pp = 0.60), which fall into two clades. One clade (pp = 1.0) shows Grossulariaceae as sister to a clade (pp = 0.97) with Pterostemonaceae + Iteaceae (pp = 1.0) sister to Saxifragaceae (pp = 1.0). The other clade (pp = 1.0) consists of Crassulaceae (pp = 1.0) + Haloragaceae s.l. (pp = 1.0; Tetracarpaeaceae, Penthoraceae, Haloragaceae s.s.).

Rosids. Within the rosid clade (pp = 1.0), Vitaceae are sister to the remainder of the clade, which has a pp of 1.0 (figs. 1, 6). Myrtales (pp = 1.0) are then sister to all remaining rosids, which form a clade with pp of 0.85. The remaining rosids consist of a trichotomy with (1) Geraniales (pp = 1.0), (2) a clade (pp = 0.62) of Crossosomatales (pp = 1.0) plus malvids (pp = 1.0), and (3) the fabids (pp = 1.0). This analysis supports a broadly defined Crossosomatales that includes Ixerbaceae and Aphloiaceae in addition to Staphyleaceae, Crossosomataceae, and Stachyuraceae of Crossosomatales sensu APG II (2003). Most rosids are found within two large clades, each with pp = 1.0: the fabids (eurosid I) (figs. 1, 7, 8) and the malvids (eurosid II) (figs. 1, 9). The fabid clade consists of a trichotomy of (1) Zygophyllales (pp = 1.0), (2) the nitrogen-fixing clade (pp = 1.0) (Fabales [1.0], Fagales [1.0], Cucurbitales [1.0], Rosales [1.0]) (fig. 7), and (3) a clade (pp = 1.0) of Malpighiales (1.0), Celastrales (1.0), and Huaceae plus Oxalidales (1.0) (fig. 8). Within the nitrogen-fixing clade, Fabales are sister to a clade (pp = 0.61) of Cucurbitales, Rosales, and Fagales; Rosales and Fagales are sisters (pp = 0.51; figs. 1, 7). Within the second of these two fabid clades, Oxalidales + Huaceae (pp = 0.87) are sister to Celastrales + Malpighiales (0.66). Within the malvid clade, Tapisciaceae appear as sister to Brassicales (pp = 1.0), with pp = 0.73. This sister group is, in turn, sister to a clade (pp = 1.0) of Malvales (pp = 1.0) + Sapindales (pp = 1.0).

Fig. 1 Summary of majority-rule consensus, showing relationships among major clades, based on Bayesian analysis of the three-gene data set for angiosperms. Names of orders and informal names follow APG II (2003), with some updates (e.g., lamiids, campanulids). Numbers above branches are posterior probability values.

Fig. 2 Majority-rule consensus, focusing on relationships among basal angiosperm lineages. Numbers above branches are posterior probability values.

Fig. 3 Majority-rule consensus, focusing on relationships among basalmost monocot lineages. Numbers above branches are posterior probability values. Asterisk indicates an apparent “hybrid” sequence (see Soltis et al. 2000a).

Fig. 4 Majority-rule consensus, focusing on relationships among commelinid monocot lineages. Numbers above branches are posterior probability values.

Fig. 5 Majority-rule consensus, focusing on relationships among basal eudicot lineages. Numbers above branches are posterior probability values.

Fig. 6 Majority-rule consensus, focusing on relationships among members of Saxifragales. Numbers above branches are posterior probability values.

Fig. 7 Majority-rule consensus, focusing on relationships involving a portion of the fabid (eurosid I) clade. Numbers above branches are posterior probability values.

Fig. 8 Majority-rule consensus showing relationships involving a portion of the fabid clade, focusing on Malpighiales, Oxalidales, Crossosomatales, and Geraniales. Our circumscription of Salicaceae follows APG II (2003). Numbers above branches are posterior probability values.

Fig. 9 Majority-rule consensus, focusing on relationships among members of the malvid (eurosid II) clade. Numbers above branches are posterior probability values.

Fig. 10 Majority-rule consensus, focusing on relationships involving basal members of the asterid clade. Numbers above branches are posterior probability values.

Fig. 11 Majority-rule consensus, focusing on relationships among members of the lamiid (euasterid I) clade. Numbers above branches are posterior probability values. The terminal labeled “unknown” is based on a problematic sample and represents Phyla from Soltis et al. (2000a). Asterisk indicates an apparent “hybrid” sequence (see Soltis et al. 2000a).

Fig. 12 Majority-rule consensus, focusing on relationships among members of the campanulid (euasterid II) clade. Numbers above branches are posterior probability (pp) values.


Asterids. Cornales (pp = 1.0) are sister to the remaining asterids, which form a clade with pp = 0.86. Ericales (1.0) are the subsequent sister to all other asterids (pp = 1.0), a clade referred to as the euasterids (figs. 1, 10). The euasterids comprise the lamiid (euasterid I; fig. 11) (pp = 1.0) and campanulid (euasterid II; fig. 12) (pp = 1.0) clades. Within the lamiid clade, Oncothecaceae form a clade (pp = 0.82) with Garryales (pp = 1.0); this sister group is, in turn, sister to the remainder of the lamiid clade, which has pp = 0.99 (fig. 11). Icacinaceae are sister to the remaining lamiids (pp = 1.0); Icacinaceae are subsequently followed by Vahliaceae as sister to a large clade (pp = 0.53) comprising Solanales (pp = 1.0), Boraginaceae (pp = 1.0), Gentianales (pp = 1.0), and Lamiales (pp = 1.0). Solanales, Boraginaceae, and Gentianales form a clade (pp = 0.94) that is sister to Lamiales. Boraginaceae and Solanales form a clade with pp = 0.86 (fig. 11). Within the campanulid clade, Aquifoliales (pp = 1.0) are sister to the remaining (pp = 1.0) members of the clade (figs. 1, 12). Aquifoliales are followed subsequently by Bruniaceae as sister to the remaining campanulids (pp = 1.0), which form two subclades, with Asterales (pp = 1.0) as sister to a clade (pp = 0.85) of Escalloniaceae, Eremosynaceae, Dipsacales (pp = 1.0), and Apiales (pp = 1.0). The relationships among the Apiales, Dipsacales, and Escalloniaceae + Eremosynaceae (pp = 0.86) are unresolved (fig. 12).

Discussion

Implications for Bayesian Analyses of Large Data Sets

This is one of the largest data sets analyzed using Bayesian methods to date. Yet, data sets of this size and larger are becoming common, and the need for software and computer resources continues to grow. One motivation for this study, beyond the general interest in the phylogeny, was to explore the challenges of analyzing large data sets. The advent of commodity clustering has introduced high-performance computing to a much wider audience than existed with traditional supercomputers. In addition, more phylogenetics programs are being written with the ability to make use of multiple processors (e.g., MrBayes [Altekar et al. 2004], RAxML [Stamatakis et al. 2005], and GARLI [Zwickl 2006]). Have these computational advances solved our problems when it comes to large data sets? And what issues and challenges still need addressing?

This Bayesian analysis of a large data set demonstrates the importance of running the Markov chains for many generations. In fact, the number of generations may be as important as or more important than the model of molecular evolution employed. For example, in several long runs (10 million generations), an outgroup member (either Ephedra, Gnetum, or Welwitschia) appeared within the ingroup. Our suspicion is that as the random starting tree was optimized, just by chance, no proposals were made uniting all of the outgroup taxa. As the tree improved, the misplaced taxon became more firmly entrenched in its placement, usually on an extremely long branch. The topology proposals that MrBayes currently uses (LOCAL and extending tree-bisection-reconnection [TBR] branch swapping) make relatively minor adjustments to the topology and either are unable to improve or can improve only slowly the placement of a drastically misplaced taxon in an otherwise acceptable tree (but see Clemens et al. 2006 for newer proposals that may improve searching in general and solve this problem).

Figure A1 shows the log-likelihood plots over 10 million generations for 12 independent runs. These profiles, with long periods of stasis followed by large jumps in likelihood value, are typical of what we observed in analyzing this data set, and we suspect all large data sets will show similar patterns. This is in contrast to what is typically seen with smaller data sets, where likelihoods increase relatively constantly and then level off. With jumps in likelihood still occurring near the 10-million-generation mark, it is clear that longer runs are warranted. However, the reality is that it is often challenging to keep analyses running for many months without interruption. Computer crashes, networking interruptions, power and air conditioning failures, and maintenance activities conspire against runs lasting many months. Newer versions of the MrBayes software that allow checkpointing (the ability to save, stop, and restart analyses), as well as faster computers and better parallelization, will help the situation.

Our analyses illustrate some of the limitations encountered in Bayesian analyses of particularly large data sets. For example, despite the length of time put into the Bayesian analyses presented here (more than 7.1 CPU years), even more time is required for a thorough analysis. Goloboff and Pol (2005) noted that TBR branch swapping is the basis for both parsimony- and Markov chain Monte Carlo (MCMC)–based searches of tree space. They argued that, because performing an equivalent number of rearrangements takes much longer with MCMC than with parsimony, MCMC algorithms are not practical for thorough searches with large data sets, and they concluded that perhaps 5 billion generations would be required for a data set the size of ours.
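The pattern of long stasis followed by late jumps can be screened for with a simple check on each log-likelihood trace. The Python sketch below records every sample that beats the best value seen so far by at least a chosen margin and then reports how many generations have passed since the last such jump. It is a rough heuristic under stated assumptions, not a convergence diagnostic used in this study. The function names, the margin, and the toy trace are hypothetical.

```python
from typing import List, Optional, Tuple

# One run = list of (generation, log-likelihood) samples from the cold chain.
Trace = List[Tuple[int, float]]


def jump_generations(trace: Trace, min_gain: float) -> List[int]:
    """Generations at which the sample exceeds the running best by min_gain or more."""
    jumps: List[int] = []
    best: Optional[float] = None
    for generation, log_likelihood in trace:
        if best is None:
            best = log_likelihood  # the first sample only sets the baseline
            continue
        if log_likelihood - best >= min_gain:
            jumps.append(generation)
        if log_likelihood > best:
            best = log_likelihood
    return jumps


def generations_since_last_jump(trace: Trace, min_gain: float) -> Optional[int]:
    """Length of the final plateau; None for an empty trace."""
    if not trace:
        return None
    jumps = jump_generations(trace, min_gain)
    plateau_start = jumps[-1] if jumps else trace[0][0]
    return trace[-1][0] - plateau_start


# Toy trace (hypothetical values): early climb, long stasis, then a late jump.
toy_trace = [(0, -1000.0), (1000, -400.0), (2000, -398.0), (3000, -399.0),
             (4000, -397.5), (5000, -350.0), (6000, -349.0)]
print(jump_generations(toy_trace, min_gain=20.0))
print(generations_since_last_jump(toy_trace, min_gain=20.0))
```

A short final plateau is a warning that the run may still be climbing. As the results above show, a long plateau is not proof of stationarity either, so a check like this can flag problems but cannot certify convergence.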

Phylogenetic Implications

The topology recovered in the Bayesian analysis is highly similar to that recovered in a parsimony analysis of the same 567-taxon data set (Soltis et al. 2000a). This overall similarity is perhaps not surprising in that the MCMC search algorithms employed in Bayesian analyses incorporate TBR; hence, given enough MCMC generations, Bayesian analyses and parsimony analyses (with TBR) should reveal the same general topology, particularly those clades for which there is strong support (Goloboff and Pol 2005). Nonetheless, there are several important differences between the Bayesian analysis and the parsimony analysis of the same data sets.

One noteworthy relationship that differs between the parsimony and Bayesian analyses involves the placement of Amborella. In parsimony searches, Amborellaceae are weakly supported (jackknife = 65%) as sister to all other angiosperms. In the Bayesian analysis, a clade of Amborellaceae + Nymphaeaceae (pp = 0.99) is sister to all other angiosperms (pp = 1.0). Most phylogenetic studies to date in which both Amborellaceae and at least one member of Nymphaeaceae have been included have found Amborellaceae as sister to all other angiosperms (e.g., Mathews and Donoghue 1999; Parkinson et al. 1999; Qiu et al. 1999, 2000, 2005; Soltis et al. 1999, 2000a; Graham et al. 2000; Zanis et al. 2002, 2003; Borsch et al. 2003; Hilu et al. 2003; Kim et al. 2004). Analyses employing parsimony, maximum likelihood (ML), and Bayesian methods have all recovered this topology. However, in some studies, the relationship between Nymphaeaceae and Amborellaceae has depended on the method of analysis (see Barkman et al. 2000; Soltis et al. 2000b; Stevanovic et al. 2004; Leebens-Mack et al. 2005). For example, in a parsimony analysis of a three-gene data set for 110 taxa, Soltis et al. (2000b) recovered Amborellaceae as sister to all other angiosperms with 65% bootstrap support. An ML analysis of a data set of 39 taxa focused on basal angiosperms recovered a clade of Amborellaceae + Nymphaeaceae (58%) as sister to all other angiosperms. Analyses of a 61-gene plastid data set yielded similar results: ML analyses provided weak support (63% bootstrap) for a clade of Amborellaceae + Nymphaeaceae as sister to other angiosperms, whereas parsimony analyses of the same data set suggest that Amborellaceae alone are sister to all other angiosperms (e.g., Leebens-Mack et al. 2005). The same topological difference has been observed in other phylogenetic studies of different data sets, with parsimony typically placing Amborellaceae alone as sister to other angiosperms and ML and Bayesian approaches typically recovering Amborellaceae + Nymphaeaceae as sister to other angiosperms.

In many instances, the Bayesian analysis provided a pp value of 1.0 for a relationship that received only 50%–84% jackknife support with parsimony (Soltis et al. 2000a). We summarize below a few examples of this high Bayesian support with the caveats that bootstrap and pp values are fundamentally different measures (Cummings et al. 2003) and the relationship between pp values and bootstrap (or jackknife) values is complex (Cummings et al. 2003; Yang and Rannala 2005). In many instances, pp values are higher than bootstrap (and, by inference, jackknife) values for corresponding clades (e.g., Miller et al. 2002; Suzuki et al. 2002; Erixon et al. 2003). These higher pp values may be the result of several factors, including lack of convergence and poor mixing in the MCMC algorithm, which may cause the chain to stay in a small subset of parameter space (Goloboff and Pol 2005; Randle et al. 2005; Yang and Rannala 2005). In addition, star topologies may reveal pp values that are excessively high compared to corresponding bootstrap values (Suzuki et al. 2002; Cummings et al. 2003). Similarly, Lewis et al. (2005) showed that hard, or near-hard, polytomies may cause unpredictable results in Bayesian analyses, with the arbitrary resolution of the polytomy sometimes receiving high pp. Our discussions of Bayesian pp are therefore couched with these caveats and limitations in mind.

In the Bayesian analysis, the magnoliid clade (sensu APG II 2003 but not Soltis et al. 2000a) has pp = 0.99, whereas this clade did not appear in the strict consensus resulting from the parsimony analysis of the same data set; although this clade was recovered in some of the shortest trees, it did not receive jackknife support greater than 50%. The commelinid clade has pp = 0.99 but jackknife support of only 68% (Soltis et al. 2000a). Vitaceae are sister to other rosids with pp = 1.0, but jackknife support was only 73% for this same placement. Similarly, Ceratophyllaceae are sister to eudicots with pp = 0.91, whereas jackknife support for this relationship in the parsimony analysis was only 53%. Another important example includes Dilleniaceae + Caryophyllales (pp = 1.0 in the Bayesian analysis; jackknife = 60% in the parsimony analyses). The nitrogen-fixing clade (Fabales, Fagales, Cucurbitales, Rosales) also has pp = 1.0, although jackknife support for this clade was only 68%. The pp value for an expanded Crossosomatales that includes Aphloia and Ixerba was 1.0, but this clade was not strongly supported in parsimony analyses (56% jackknife). The circumscription of Crossosomatales requires additional study, with the inclusion of additional taxa (e.g., Strasburgeria; reviewed in Soltis et al. 2005).

All clades recognized as orders in APG II (2003) received pp values of 1.0. Many of these clades were strongly supported in parsimony searches of the same data set and have jackknife support above 95% (Soltis et al. 2000a), but there are exceptions. Asparagales have pp = 1.0 but jackknife support of only 56%; Celastrales have pp = 1.0 but jackknife support of only 62%.

Important areas of the topology that were not strongly supported in the parsimony analyses of the three-gene data set appear much better resolved and receive high pp values in the Bayesian tree. For example, a series of relationships within the monocots received pp values of 1.0, including the position of Acoraceae as sister to all other monocots. Following Acoraceae, Alismatales, Petrosaviaceae, a clade of Liliales plus Pandanales + Dioscoreales, and then Asparagales are subsequent sisters to the remaining monocots (commelinids), with most of these nodes receiving a pp value of 1.0 (the exception being the clade of Liliales plus Pandanales + Dioscoreales, which has pp = 0.91). This topology is very similar to that achieved in parsimony analyses of monocots based on seven genes (Chase et al. 2005; Soltis et al. 2005).
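To make the pp values discussed above concrete: in an MCMC sample, the posterior probability of a clade is estimated as the fraction of pooled post-burn-in trees that contain it, and a majority-rule consensus keeps the clades found in more than half of those trees. The Python sketch below illustrates only this counting step. It assumes each sampled tree has already been reduced to the set of clades it contains (each clade a frozenset of taxon names); the function names and the four-taxon example are hypothetical and are not output from MrBayes.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, Set

Clade = FrozenSet[str]


def clade_frequencies(trees: Iterable[Set[Clade]]) -> Dict[Clade, float]:
    """Fraction of sampled trees containing each clade (the estimated pp)."""
    counts: Counter = Counter()
    total = 0
    for clades in trees:
        counts.update(clades)  # a set, so each clade counts once per tree
        total += 1
    if total == 0:
        return {}
    return {clade: n / total for clade, n in counts.items()}


def majority_rule(frequencies: Dict[Clade, float]) -> Dict[Clade, float]:
    """Clades present in more than half of the sampled trees."""
    return {clade: pp for clade, pp in frequencies.items() if pp > 0.5}


# Four hypothetical sampled trees on taxa A-D: three group A with B, one groups A with C.
ab, ac, abc = frozenset("AB"), frozenset("AC"), frozenset("ABC")
sampled = [{ab, abc}, {ab, abc}, {ab, abc}, {ac, abc}]

pp = clade_frequencies(sampled)
print(pp[ab], pp[ac], pp[abc])
print(sorted(sorted(clade) for clade in majority_rule(pp)))
```

Because the estimate is a frequency within the sample, a chain that mixes poorly and stays in one region of tree space can return pp = 1.0 for a clade simply because no alternative was ever visited, which is one of the caveats raised above.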

Conclusions

Whereas increasing the number of taxa and characters facilitates the analysis of large data sets (e.g., Hillis 1996; Soltis et al. 1998) via parsimony, our results revealed some of the challenges posed by Bayesian analyses of large data sets. Determining stationarity in Markov chains for such large data sets is difficult. In more than 30 analyses of up to 10 million generations each (with an average run time of 45 d), log-likelihood plots suggested that stationarity had been reached within a few million generations; however, closer examination reveals that jumps in likelihood can occur after even 8 million generations of relative stasis and that runs reached the likelihood threshold for inclusion after as long as 9.7 million generations, suggesting the need for longer runs. These results demonstrate the importance of running a sufficient number of generations, although it is not clear what this number should be.

There are several important systematic results. Trees obtained via parsimony (Soltis et al. 1999, 2000a) and Bayesian analyses of the three-gene, 567-taxon data set for angiosperms are highly concordant. The three-gene Bayesian tree also agrees with other studies focused on particular clades but relying on more genes and often fewer taxa (e.g., basal angiosperms [Zanis et al. 2002; Qiu et al. 2000, 2005], Saxifragales [Fishbein et al. 2001], monocots [Chase et al. 2005], and asterids [Albach et al. 2001; Bremer et al. 2002]). Differences also exist, however, with the most noteworthy being the placement of Amborellaceae + Nymphaeaceae as sister to all other angiosperms with pp = 1.0 in the Bayesian analysis, compared with Amborellaceae alone as sister to all other extant angiosperms with parsimony.

The Bayesian topology appears to be better resolved and supported than the parsimony topology. Prominent examples of improved resolution with high pp values include the placements of Ceratophyllaceae, Vitaceae, and Dilleniaceae and the monophyly of magnoliids sensu APG II (2003). In the eudicots, a sister group of Saxifragales + rosids and relationships within the large rosid clade also received pp = 1.0. For example, in rosids, an expanded Crossosomatales received pp = 1.0, as did the sister relationship of Malvales + Sapindales and the clade of Oxalidales + Huaceae, Malpighiales, and Celastrales, although relationships among these remain unclear.

However, caution must be exercised in the interpretation of high pp values (e.g., Miller et al. 2002; Suzuki et al. 2002; Erixon et al. 2003; Goloboff and Pol 2005). How much emphasis should be placed on a pp value of 0.95 or 1.0? It is noteworthy, for example, that within the magnoliid clade, Canellales and Magnoliales are sisters, with pp = 0.96 (fig. 1). However, in phylogenetic analyses of data sets with additional genes for basal angiosperms, Canellales appear as sister to Piperales and Laurales as sister to Magnoliales. Support for these two sister groups has varied among analyses: Magnoliales + Laurales received bootstrap support from 71% to 100% (Qiu et al. 1999, 2005; Zanis et al. 2002), and Canellales + Piperales received 83%–100% bootstrap support (Qiu et al. 1999, 2005; Zanis et al. 2002) in studies focusing on these clades.

In many cases, however, relationships that were problematic with parsimony remain poorly resolved with low support in the Bayesian tree. These problem areas include relationships among the monocots, magnoliids, and Chloranthaceae; relationships among the basal eudicot lineages; and relationships among clades of core eudicots. Within the large asterid and rosid clades, many of the same long-standing problematic relationships remain.

Our analyses illustrate the limitations of the three-gene, 567-taxon data set. Although the data set is taxonomically rich and represents angiosperm diversity well, resolving the problematic issues noted above will require additional sequence data.

We are generally encouraged by the analytical capabilities that are becoming more readily available. Larger computer clusters with faster CPUs and more advanced intracluster communication networks, as well as improvements in programs, will continue to help. However, it is critical that researchers be aware that the real time, not just CPU time, needed to analyze large data sets is still on the order of months. We have spent years collecting these data; we must be equally patient in analyzing them.
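The difficulty described in these conclusions, an apparent plateau followed by a late jump in likelihood, can be made concrete with a small check on a log-likelihood trace. The following is a minimal sketch in Python, assuming a hypothetical trace sampled at fixed intervals. The function name `settle_index`, the tolerance `tol`, the sampling interval, and all lnL values are illustrative assumptions, not output of the analyses reported here. The check reports only when a trace last left its final level; it cannot show that the final level is itself stationary.

```python
from statistics import mean


def settle_index(lnl, tail=5, tol=20.0):
    """Return the index of the first sample after which every lnL value
    stays within `tol` of the mean of the final `tail` samples.

    A return value equal to len(lnl) means the trace never settles
    under this tolerance.
    """
    if len(lnl) < tail:
        raise ValueError("trace is shorter than the tail window")
    final_level = mean(lnl[-tail:])
    # Scan backward for the last sample that lies outside the tolerance.
    for i in range(len(lnl) - 1, -1, -1):
        if abs(lnl[i] - final_level) > tol:
            return i + 1
    return 0


def example_trace():
    """Hypothetical trace: brief burn-in, a long false plateau, a late jump."""
    return [-238900.0] * 5 + [-238400.0] * 60 + [-238100.0] * 35


if __name__ == "__main__":
    SAMPLE_EVERY = 100_000  # assumed sampling interval, in generations
    idx = settle_index(example_trace())
    print(f"trace settles after about {idx * SAMPLE_EVERY:,} generations")
```

In this hypothetical trace the chain sits on a false plateau for 60 consecutive samples before moving to its final level, so a run stopped during the plateau would have given no warning that a higher-likelihood region remained to be found.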

Acknowledgments

This research was supported by a National Science Foundation Assembling the Tree of Life grant (EF-0431266). We thank Vic Albert, Dick Olmstead, and an anonymous reviewer for helpful comments on the manuscript.

Appendix Figure

Fig. A1 Graphs of log-likelihood values (lnL) in cold chain over time (generations) in 12 Bayesian runs using the GTR + I + G model of molecular evolution. a, Standard Microsoft Excel graph demonstrates the importance of closely examining these plots in determining whether a run has reached stationarity: cursory examination suggests that all runs reached a plateau fairly quickly. b, Zooming to a narrower range of log likelihoods shows that the 12 runs have very different lnL values during the 10-million-generation run. Nine of the 12 runs converge on log likelihoods above 238,100 (horizontal line in the 12 panes).
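The contrast between panels a and b can be sketched in code: choosing the y-axis range from the later part of each run, rather than from the whole run, is what exposes differences among runs. The following is a minimal Python sketch. The names `zoom_limits` and `runs_reaching`, the `skip` fraction, the threshold, and all trace values are hypothetical and are not taken from the runs shown in the figure.

```python
def zoom_limits(runs, skip=0.25, pad=0.05):
    """Return (low, high) y-axis limits computed from the later part of
    each run, ignoring the first `skip` fraction of samples."""
    late = []
    for trace in runs.values():
        start = int(len(trace) * skip)
        late.extend(trace[start:])
    low, high = min(late), max(late)
    margin = (high - low) * pad
    return low - margin, high + margin


def runs_reaching(runs, threshold):
    """Return the names of runs whose final lnL sample exceeds `threshold`."""
    return [name for name, trace in runs.items() if trace[-1] > threshold]


# Hypothetical lnL traces for three runs (eight samples each).
RUNS = {
    "run1": [-260000.0, -240000.0, -238090.0, -238080.0,
             -238085.0, -238080.0, -238082.0, -238081.0],
    "run2": [-261000.0, -241000.0, -238400.0, -238390.0,
             -238395.0, -238380.0, -238090.0, -238085.0],
    "run3": [-259000.0, -239500.0, -238420.0, -238410.0,
             -238400.0, -238405.0, -238400.0, -238402.0],
}

if __name__ == "__main__":
    print(zoom_limits(RUNS))
    print(runs_reaching(RUNS, threshold=-238100.0))
```

On a full-range axis the early climb dominates and the three hypothetical runs look identical; on the zoomed axis, one run never reaches the threshold and another reaches it only near the end.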


Literature Cited

Albach DC, PS Soltis, DE Soltis, RG Olmstead 2001 Phylogenetic analysis of the Asteridae s.l. based on sequences of four genes. Ann Mo Bot Gard 88:163–212.
Altekar G, S Dwarkadas, JP Huelsenbeck, F Ronquist 2004 Parallel Metropolis-coupled Markov chain Monte Carlo for Bayesian phylogenetic inference. Bioinformatics 20:407–415.
APG II (Angiosperm Phylogeny Group II) 2003 An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants. Bot J Linn Soc 141:399–436.
Bremer B, K Bremer, N Heirdari, P Erixon, RG Olmstead, M Källersjö, AA Anderberg, E Barkhordarian 2002 Phylogenetics of asterids based on 3 coding and 3 non-coding chloroplast DNA markers and the utility of non-coding DNA at higher taxonomic levels. Mol Phylogenet Evol 24:274–301.
Chase MW, MF Fay, DS Devey, O Maurin, N Rønsted, J Davies, Y Pillon, et al 2005 Multi-gene analyses of monocot relationships: a summary. Pages 105–123 in JT Columbus, EA Friar, CW Hamilton, JM Porter, LM Prince, MG Simpson, eds. Monocots: comparative biology and evolution. Rancho Santa Ana Botanic Garden, Claremont, CA.
Cummings MP, SA Handley, DS Myers, DL Reed, A Rokas, K Winka 2003 Comparing bootstrap and posterior probability values in the four-taxon case. Syst Biol 52:477–487.
Erixon P, B Svennblad, T Britton, B Oxelman 2003 Reliability of Bayesian posterior probabilities and bootstrap frequencies in phylogenetics. Syst Biol 52:665–673.
Fishbein M, C Hibsch-Jetter, DE Soltis, L Hufford 2001 Phylogeny of Saxifragales (angiosperms, eudicots): analysis of a rapid, ancient radiation. Syst Biol 50:817–847.
Gaucher EA, MM Miyamoto 2005 A call for likelihood phylogenetics even when the process of sequence evolution is heterogeneous. Mol Phylogenet Evol 37:928–931.
Goloboff PA 1999 Analyzing large data sets in reasonable times: solutions for composite optima. Cladistics 15:415–428.
Goloboff PA, D Pol 2005 Parsimony and Bayesian phylogenetics. Pages 148–159 in VA Albert, ed. Parsimony and Bayesian phylogenetics. Oxford University Press, Oxford.
Gropp W, E Lusk, N Doss, A Skjellum 1996 A high-performance, portable implementation of the MPI message passing interface standard. Parallel Comput 22:789–828.
Guindon S, O Gascuel 2003 A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol 52:696–704.
Hillis DM 1996 Inferring complex phylogenies. Nature 383:130.
Huelsenbeck JP, F Ronquist 2001 MRBAYES: Bayesian inference of phylogeny. Bioinformatics 17:754–755.
Huelsenbeck JP, F Ronquist, R Nielsen, J Bollback 2001 Bayesian inference of phylogeny and its impact on evolutionary biology. Science 294:2310–2314.
Kårehed J 2001 Multiple origins of the tropical forest tree family Icacinaceae. Am J Bot 88:2259–2274.
Leebens-Mack J, LA Raubeson, L Cui, JV Kuehl, MH Fourcade, TW Chumley, JL Boore, RK Jansen, CW dePamphilis 2005 Identifying the basal angiosperm node in chloroplast genome phylogenies: sampling one’s way out of the Felsenstein zone. Mol Biol Evol 22:1948–1963.
Lewis PO, MT Holder, KE Holsinger 2005 Polytomies and Bayesian phylogenetic inference. Syst Biol 54:241–253.
Lutzoni F, F Kauff, CJ Cox, D McLaughlin, G Celio, B Dentinger, M Padamsee, et al 2004 Assembling the fungal tree of life: progress, classification, and evolution of subcellular traits. Am J Bot 91:1446–1480.
Miller RE, TR Buckley, PS Manos 2002 An examination of the monophyly of morning glory taxa using Bayesian phylogenetic inference. Syst Biol 51:740–753.
Nylander JA 2004 MrModeltest, version 2. Evolutionary Biology Centre, Uppsala University, Uppsala.
Nylander JA, F Ronquist, JP Huelsenbeck, JL Nieves-Aldrey 2004 Bayesian phylogenetic analysis of combined data. Syst Biol 53:47–67.
Posada D, KA Crandall 1993 Modeltest: testing the model of DNA substitution. Bioinformatics 14:817–818.
Pryer K, ME Schuettpelz, PG Wolf, H Schneider, AR Smith, R Cranfill 2004 Phylogeny and evolution of ferns (monilophytes) with a focus on the early leptosporangiate divergences. Am J Bot 91:1582–1598.
Qiu Y-L, O Dombrovska, J Lee, L Li, BA Whitlock, F Bernasconi-Quadroni, JS Rest, et al 2005 Phylogenetic analysis of basal angiosperms based on nine plastid, mitochondrial, and nuclear genes. Int J Plant Sci 166:815–842.
Qiu Y-L, J-Y Lee, F Bernasconi-Quadroni, DE Soltis, PS Soltis, M Zanis, E Zimmer, Z Chen, V Savolainen, MW Chase 2000 Phylogeny of basal angiosperms: analyses of five genes from three genomes. Int J Plant Sci 161(suppl):S3–S27.
Randle CP, ME Mort, DJ Crawford 2005 Bayesian inference of phylogenetics revisited: developments and concerns. Taxon 54:9–15.
Ronquist F, JP Huelsenbeck 2003 MRBAYES 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19:1572–1574.
Rydin C, M Källersjö, EM Friis 2002 Seed plant relationships and the systematic position of Gnetales based on nuclear and chloroplast DNA: conflicting data, rooting problems, and the monophyly of conifers. Int J Plant Sci 163:197–214.
Soltis DE, PS Soltis, MW Chase, ME Mort, DC Albach, M Zanis, V Savolainen, et al 2000a Angiosperm phylogeny inferred from 18S rDNA, rbcL and atpB sequences. Bot J Linn Soc 133:381–461.
Soltis DE, PS Soltis, PK Endress, MW Chase 2005 Angiosperm phylogeny and evolution. Sinauer, Sunderland, MA.
Soltis DE, PS Soltis, ME Mort, MW Chase, V Savolainen, SB Hoot, CM Morton 1998 Inferring complex phylogenies using parsimony: an empirical approach using three large DNA data sets for angiosperms. Syst Biol 47:32–42.
Soltis DE, PS Soltis, MJ Zanis 2002 Phylogeny of seed plants based on evidence from eight genes. Am J Bot 89:1670–1681.
Soltis PS, DE Soltis, MW Chase 1999 Angiosperm phylogeny inferred from multiple genes as a tool for comparative biology. Nature 402:402–404.
Soltis PS, DE Soltis, MJ Zanis, S Kim 2000b Basal lineages of angiosperms: relationships and implications for floral evolution. Int J Plant Sci 161(suppl):S97–S107.
Stamatakis A, T Ludwig, H Meier 2005 RAxML-III: a fast program for maximum likelihood-based inference of large phylogenetic trees. Bioinformatics 21:456–463.
Stevanovic S, DW Rice, JD Palmer 2004 Long branch attraction, taxon sampling, and the earliest angiosperms: Amborella or monocots? BMC Evol Biol 4:35–54.
Suzuki Y, GV Glazko, M Nei 2002 Overcredibility of molecular phylogenies obtained by Bayesian phylogenetics. Proc Natl Acad Sci USA 99:16138–16143.
Wurdack KJ, P Hoffmann, MW Chase 2005 Molecular phylogenetic analysis of uniovulate Euphorbiaceae (Euphorbiaceae sensu stricto) using plastid rbcL and trnL-F DNA sequences. Am J Bot 92:1397–1420.
Yang Z, B Rannala 2005 Branch-length prior influences Bayesian posterior probability of phylogeny. Syst Biol 54:455–470.
Zwickl DJ 2006 GARLI Genetic algorithm for rapid likelihood inference. Software available at http://www.bio.utexas.edu/grad/zwickl/web/garli.html.


