
Letter to the Editor

Comment on the Quartet Puzzling Method for Finding Maximum-Likelihood Tree Topologies

Ying Cao,*† Jun Adachi,*1 and Masami Hasegawa*

*The Institute of Statistical Mathematics, Tokyo, Japan; and †Faculty of Bioscience and Biotechnology, Tokyo Institute of Technology, Yokohama, Japan

The maximum-likelihood (ML) method for inferring molecular phylogenies (Felsenstein 1981) is used extensively throughout phylogenetics. The method has a sound statistical basis (e.g., Felsenstein 1983; Goldman 1990; Yang 1994) and has proved powerful in recovering correct tree topologies in computer simulation studies (e.g., Hasegawa, Kishino, and Saitou 1991; Hasegawa and Fujiwara 1993; Kuhner and Felsenstein 1994; Gaut and Lewis 1995; Huelsenbeck 1995; Yang 1995, 1996).

The most serious problem in applying the ML method to real biological problems is that the number of possible tree topologies increases explosively as the number of sequences increases. The most straightforward approach would be to evaluate all possible tree topologies and pick the one that gives the highest likelihood. This is not feasible even for a moderate number of sequences, since the number of possible tree topologies already exceeds 2 × 10^6 when the number of sequences is 10 (Felsenstein 1978). To overcome this difficulty, several heuristic methods for the topology search have been proposed, e.g., the star decomposition method (Saitou 1988; Adachi and Hasegawa 1992) and local rearrangement (Felsenstein 1993; Adachi and Hasegawa 1996; called nearest-neighbor interchanges in Swofford et al. 1996).

Recently, Strimmer and von Haeseler (1996) and Strimmer, Goldman, and von Haeseler (1997) considered another method of topology search for the ML tree, which they called quartet puzzling. Although the new method looks promising, the tree obtained is not always the highest likelihood tree, as they have shown with computer simulation. We now show that in their illustrative biological example, the tree obtained is not the highest likelihood tree and that it can be improved by the local rearrangement implemented in MOLPHY (Adachi and Hasegawa 1996).
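The count cited above from Felsenstein (1978) can be checked directly. For n labeled sequences, the number of unrooted, strictly bifurcating topologies is the double factorial (2n − 5)!! = 1 × 3 × 5 × ... × (2n − 5). The Python sketch below is purely illustrative: the function name is hypothetical and is not part of MOLPHY or any other program cited in this letter.

```python
def num_unrooted_topologies(n_taxa: int) -> int:
    """Count unrooted bifurcating topologies for n_taxa labeled tips.

    Uses the double factorial (2n - 5)!!; the name is a hypothetical
    helper for illustration, not taken from any cited package.
    """
    if n_taxa < 3:
        raise ValueError("at least three taxa are needed for an unrooted tree")
    count = 1
    # Multiply the odd numbers 3, 5, ..., (2n - 5); the factor 1 is implicit.
    for odd in range(3, 2 * n_taxa - 4, 2):
        count *= odd
    return count


if __name__ == "__main__":
    for n in (4, 7, 10):
        print(n, num_unrooted_topologies(n))
```

For 10 sequences this gives 2,027,025 topologies, in line with the statement that the count already exceeds 2 × 10^6. For seven tips the same formula gives 945, which is consistent with the 945 trees examined later in this letter for six lineages joined to an outgroup.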
Furthermore, we show that some of the "reliability values" of quartet puzzling in their example are misleadingly high relative to the bootstrap probabilities.

1 Present address: Department of Zoology, University of Oxford.

Key words: quartet puzzling, molecular phylogeny, maximum-likelihood tree.

Address for correspondence and reprints: Masami Hasegawa, The Institute of Statistical Mathematics, 4-6-7 Minami-Azabu, Minato-ku, Tokyo 106, Japan. E-mail: [email protected].

Mol. Biol. Evol. 15(1):87–89. 1998
© 1998 by the Society for Molecular Biology and Evolution. ISSN: 0737-4038

Strimmer and von Haeseler (1996) applied their method to the concatenated sequences of amniote mitochondrial 12S rRNA, 16S rRNA, and tRNA-Val genes in Hedges (1994) with additional sequences from opossum (Janke et al. 1994) and platypus (Janke et al. 1996). First, we applied the neighbor-joining (NJ) method (Saitou and Nei 1987) to their data, with the distance matrix estimated by pairwise ML, and obtained a tree that coincides with the quartet puzzling tree obtained by Strimmer and von Haeseler (1996, fig. 4). They used the approximate likelihood option for the ML computation (Adachi and Hasegawa 1996) in analyzing the data (known as noniterated likelihood in Waddell 1995). Use of the exact likelihood with quartet puzzling does not significantly change the result for this data set (not shown).

In their tree, lizard is an outgroup to the bird-alligator-Sphenodon clade, which has a reliability value as high as 94% by quartet puzzling (version 2.4 of Strimmer and von Haeseler [1996]; 93% by version 2.5.1 of Strimmer, Goldman, and von Haeseler [1997]) with the HKY85 model (Hasegawa, Kishino, and Yano 1985). However, the same clade has a local bootstrap probability (bootstrap proportion among the three alternative trees estimated by fixing the branching orders within the three subtrees attached to the particular branch and within the outgroup; Adachi and Hasegawa 1996) of only 47% (estimated by the RELL method; Kishino, Miyata, and Hasegawa 1990; Hasegawa and Kishino 1994). The local bootstrap probability might be misleading when the relationships within the subtrees are wrong. However, given that the within-subtree relationships are all consistent with the traditional tree and are likely to be correct, the local bootstrap probability would be close to the real bootstrap probability. Although Strimmer and von Haeseler (1996) suggested that the reliability value and the bootstrap probability are highly correlated, these two values are quite different in this particular case.

Starting with the tree topology shown on the left-hand side of figure 1 (fig.
4 of Strimmer and von Haeseler 1996), the process of local rearrangement was repeated, and the tree on the right-hand side of figure 1 was obtained (using the exact likelihood with the HKY85 model). In this tree, Sphenodon groups with the lizard, and the log-likelihood of this tree is higher than that of the tree on the left-hand side of figure 1 by 1.6 ± 7.8 (SE estimated by Kishino and Hasegawa's [1989] formula). Although the difference in log-likelihoods between these two trees is too small to decide which one represents the true evolutionary history, this analysis suggests that the reliability value of 94% obtained by quartet puzzling is highly exaggerated. Nevertheless, given that the two trees do not differ significantly in log-likelihood, the quartet puzzling method produces a good approximation of the highest likelihood tree. In Hedges (1994), the NJ analysis (as well as the parsimony analysis) groups Sphenodon with the lizard, contrary to our NJ analysis, probably due to the different methods of estimating the distance matrix and to the different species set. When insertion/deletion sites are excluded from the analyses, the 94% reliability value for the bird-alligator-Sphenodon clade obtained by the previous analysis of quartet puzzling reduces to 63% and 70% with versions 2.4 and 2.5.1, respectively, while local rearrangement gives a 67% local bootstrap probability for the Sphenodon-lizard clade. Thus, a high reliability value obtained from a single alignment may sometimes be misleading.

FIG. 1. Left-hand side: an NJ tree of mitochondrial 12S rRNA + 16S rRNA + tRNA-Val in which the tree topology was estimated by the NJdist program in MOLPHY (Adachi and Hasegawa 1996) and coincides with that of the tree in figure 4 of Strimmer and von Haeseler (1996). Right-hand side: a NucML tree obtained by repeated local rearrangements. The horizontal length of each branch is proportional to the number of nucleotide substitutions estimated by the NucML program (the HKY85 model; α/β = 2.32) in MOLPHY. Local bootstrap probabilities (%) are shown above branches, and reliability values of quartet puzzling (fig. 4 of Strimmer and von Haeseler 1996) are shown below branches.

The reliability values shown in figure 4 of Strimmer and von Haeseler (1996) seem to be misleading in several respects, not only with respect to the sister-group relationship of lizard with the bird-alligator-Sphenodon clade. For comparison with the reliability values of quartet puzzling, we estimated bootstrap probabilities by the RELL method (Kishino, Miyata, and Hasegawa 1990; Hasegawa and Kishino 1994) with extensive ML analyses of the same data set as used by Strimmer and von Haeseler (1996); i.e., we examined all 945 possible trees among alligator, bird, Sphenodon, lizard, turtle, and mammals with the outgroup and summed the bootstrap probabilities of the trees containing a clade of interest. The bootstrap probabilities were estimated to be 99% (reliability value: 100%) for the bird-alligator clade, 42% (94%) for the alligator-bird-Sphenodon clade, 68% (70%) for the alligator-bird-Sphenodon-lizard clade, and 100% (100%) for the alligator-bird-Sphenodon-lizard-turtle clade. Similarly, bootstrap probabilities within mammals were estimated by examining 945 trees among human, seal, cow/whale, mouse/rat, opossum, and platypus with bird, reptiles, frog, and lungfish as an outgroup

(within-outgroup relationships were assumed to be those of the ML tree in fig. 1). The bootstrap probabilities were estimated to be 66% (100%) for the seal-cow-whale clade, 97% (98%) for the human-seal-cow-whale clade, and 95% (100%) for the platypus-opossum clade. Thus, although the reliability value generally correlates with the bootstrap probability, it is sometimes misleadingly higher than the bootstrap probability (e.g., for the alligator-bird-Sphenodon clade, the seal-cow-whale clade, and perhaps the platypus-opossum clade).

It is not guaranteed that the tree obtained by repeated local rearrangements has the highest likelihood among all possible trees (the result may depend on the starting tree). For this reason, use of several alternative starting trees is recommended in order to avoid getting trapped in local optima, and the tree with the highest likelihood from the several runs should be chosen (e.g., Felsenstein 1993). The trees inferred by NJ, parsimony, and other methods can be used as starting trees. Since quartet puzzling does not always find the highest likelihood tree, it too might benefit from local rearrangements, and in our experience, it seems to provide a good starting tree for local rearrangements.

Acknowledgments

We thank Arndt von Haeseler, Korbinian Strimmer, Peter J. Waddell, Naruya Saitou, and two anonymous reviewers for helpful comments on earlier versions of the manuscript. This work was supported by grants from the Ministry of Education, Science, Sports, and Culture of Japan.

LITERATURE CITED

ADACHI, J., and M. HASEGAWA. 1992. MOLPHY: programs for molecular phylogenetics, I. PROTML: maximum likelihood inference of protein phylogeny. Computer Science Monographs, no. 27. Institute of Statistical Mathematics, Tokyo.
ADACHI, J., and M. HASEGAWA. 1996. MOLPHY version 2.3: programs for molecular phylogenetics based on maximum likelihood. Computer Science Monographs, no. 28. Institute of Statistical Mathematics, Tokyo.
FELSENSTEIN, J. 1978. The number of evolutionary trees. Syst. Zool. 27:27–33.
FELSENSTEIN, J. 1981. Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol. 17:368–376.
FELSENSTEIN, J. 1983. Methods for inferring phylogenies: a statistical view. Pp. 315–334 in J. FELSENSTEIN, ed. Numerical taxonomy. Springer-Verlag, Berlin.
FELSENSTEIN, J. 1993. PHYLIP: phylogeny inference package and manual. Version 3.5c. Department of Genetics, University of Washington, Seattle.
GAUT, B. S., and P. O. LEWIS. 1995. Success of maximum likelihood phylogeny inference in the four-taxon case. Mol. Biol. Evol. 12:152–162.
GOLDMAN, N. 1990. Maximum likelihood inference of phylogenetic trees, with special reference to a Poisson process model of DNA substitution and to parsimony analyses. Syst. Zool. 39:345–361.
HASEGAWA, M., and M. FUJIWARA. 1993. Relative efficiencies of the maximum likelihood, maximum parsimony, and neighbor-joining methods for estimating protein phylogeny. Mol. Phylogenet. Evol. 2:1–5.


HASEGAWA, M., and H. KISHINO. 1994. Accuracies of the simple methods for estimating the bootstrap probability of a maximum likelihood tree. Mol. Biol. Evol. 11:142–145.
HASEGAWA, M., H. KISHINO, and N. SAITOU. 1991. On the maximum likelihood method in molecular phylogenetics. J. Mol. Evol. 32:443–445.
HASEGAWA, M., H. KISHINO, and T. YANO. 1985. Dating of the human–ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22:160–174.
HEDGES, S. B. 1994. Molecular evidence for the origin of birds. Proc. Natl. Acad. Sci. USA 91:2621–2624.
HUELSENBECK, J. P. 1995. The robustness of two phylogenetic methods: four-taxon simulations reveal a slight superiority of maximum likelihood over neighbor joining. Mol. Biol. Evol. 12:843–849.
JANKE, A., G. FELDMAIER-FUCHS, W. K. THOMAS, A. VON HAESELER, and S. PÄÄBO. 1994. The marsupial mitochondrial genome and the evolution of placental mammals. Genetics 137:243–256.
JANKE, A., N. J. GEMMELL, G. FELDMAIER-FUCHS, A. VON HAESELER, and S. PÄÄBO. 1996. The complete mitochondrial genome of a monotreme, the platypus Ornithorhynchus anatinus. J. Mol. Evol. 42:153–159.
KISHINO, H., and M. HASEGAWA. 1989. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. J. Mol. Evol. 29:170–179.
KISHINO, H., T. MIYATA, and M. HASEGAWA. 1990. Maximum likelihood inference of protein phylogeny, and the origin of chloroplasts. J. Mol. Evol. 31:151–160.
KUHNER, M. K., and J. FELSENSTEIN. 1994. A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates. Mol. Biol. Evol. 11:459–468.


SAITOU, N. 1988. Property and efficiency of the maximum likelihood method for molecular phylogeny. J. Mol. Evol. 27:261–273.
SAITOU, N., and M. NEI. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406–425.
STRIMMER, K., N. GOLDMAN, and A. VON HAESELER. 1997. Bayesian probabilities and quartet puzzling. Mol. Biol. Evol. 14:210–211.
STRIMMER, K., and A. VON HAESELER. 1996. Quartet puzzling: a quartet maximum-likelihood method for reconstructing tree topologies. Mol. Biol. Evol. 13:964–969.
SWOFFORD, D. L., G. J. OLSEN, P. J. WADDELL, and D. M. HILLIS. 1996. Phylogenetic inference. Pp. 407–514 in D. M. HILLIS, C. MORITZ, and B. K. MABLE, eds. Molecular systematics. 2nd edition. Sinauer, Sunderland, Mass.
WADDELL, P. J. 1995. Statistical methods of phylogenetic analysis: including Hadamard conjugations, LogDet transforms, and maximum likelihood. Ph.D. thesis in Biology, Massey University, Palmerston North, New Zealand.
YANG, Z. 1994. Statistical properties of the maximum likelihood method of phylogenetic estimation and comparison with distance matrix methods. Syst. Biol. 43:329–342.
YANG, Z. 1995. Evaluation of several methods for estimating phylogenetic trees when substitution rates differ over nucleotide sites. J. Mol. Evol. 40:689–697.
YANG, Z. 1996. Phylogenetic analysis using parsimony and likelihood methods. J. Mol. Evol. 42:294–307.

NARUYA SAITOU, reviewing editor

Accepted October 13, 1997
