Points of View Syst. Biol. 48(4):808–813, 1999
Taxon Sampling and Reverse Successive Weighting
CHRISTOPHER A. BROCHU
Department of Geology, Field Museum of Natural History, Roosevelt Road at Lake Shore Drive, Chicago, Illinois 60605, USA
Homoplasy in some data sets appears to reflect a hierarchy of nonrandom signals. The possible presence of a nonrandom secondary signal is an important consideration if operations that assume random homoplasy, such as a posteriori successive weighting, are to be used. Trueman (1998) presented a method for detecting hierarchical structure in a phylogenetic data matrix and applied it to two different crocodylian data sets. In this note, I show that the topology and the strength of the secondary signal Trueman recovered for one of these data sets are influenced by the number and phylogenetic position of the taxa included in the analysis.

In Trueman’s method, reverse successive weighting (RSW), characters that are nonhomoplastic on the set of optimal trees for a given data set are downweighted or deleted from the matrix, and the trees supported by the resulting matrix are examined for signal. The presence of robust nodes in these trees is taken as evidence that homoplasy is not random noise but instead reflects a signal in the data matrix that, though nonrandom, is weaker than the primary signal expressed in the optimal trees. RSW can be applied iteratively: character states can be mapped on the tree recovered from the first step, nonhomoplastic characters removed, and the matrix reanalyzed, repeating until the secondary signal(s) are lost.

Trueman applied his new method to two crocodylian matrices: morphology (Brochu, 1997) and 12S rRNA sequence data (modified by Brochu from Gatesy et al., 1993). The trees supported by these data sets differ primarily in the placement of Gavialis gangeticus, the Indian gharial. Morphology places Gavialis as the sister taxon of all other living crocodylians, whereas 12S rRNA data (and other molecular data sets) favor a sister-group relationship between Gavialis and Tomistoma schlegelii, the false gharial (Gatesy and Amato, 1992; Poe, 1996; Brochu, 1997). When nonhomoplastic characters are removed from the morphology matrix, the resulting optimal tree is congruent with molecular data sets for crocodylians and supports a Gavialis–Tomistoma clade (Fig. 1). These relationships hold through two iterations of RSW. That the secondary signal in morphology might reflect the correct organismal phylogeny is supported by the fact that multiple molecular data sets agree with it. Alternatively, the signal might reflect independent derivation of character states in Gavialis and Tomistoma that are insufficient to overwhelm the primary phylogenetic signal. This latter interpretation is suggested by the fact that Tomistoma and Gavialis both have long, tubular snouts, and 8 of the 12 characters joining them in the secondary signal relate to the skull and jaw apparatus (Trueman, 1998). Trueman was cautious about interpreting this signal and suggested that either scenario was a possible explanation.

FIGURE 1. Competing hypotheses for crocodylian relationships. (a) Tree preferred by morphology. (b) Secondary tree recovered from morphology when nonhomoplastic characters are disregarded; this tree is congruent with several molecular hypotheses.

Trueman analyzed a greatly reduced version of the morphological matrix. Of the 45 living and extinct crocodylians considered by Brochu (1997), only ten were included, presumably to facilitate comparison between data sets, because these were the ten taxa for which 12S sequence data were available. But taxon sampling (both the number of ingroup taxa and their phylogenetic position) can bear critically on the primary signal recovered by a phylogenetic analysis (e.g., Gauthier et al., 1988; Donoghue et al., 1989; Kim, 1996; Graybeal, 1998; Poe, 1998; Halanych, 1998; Rannala et al., 1998). We should expect taxon sampling to influence secondary signals as well, which in turn bears on how we interpret any secondary signal recovered during RSW.

Brochu (1997) was concerned with the impact of fossils on the recovered primary signal, and fossils might also bear on other levels of the signal hierarchy. Fossils can overturn phylogenies supported by living taxa alone when lineages are being pulled together by large amounts of convergence. They can preserve morphologies with enough plesiomorphy to lack the character states responsible for misleading a “Recent-only” parsimony analysis (Gauthier et al., 1988; Donoghue et al., 1989; Huelsenbeck, 1991), effectively pruning long phylogenetic branches.
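The deletion step at the core of RSW, as summarized at the opening of this note, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Trueman’s (1998) implementation and not what PAUP does: trees are hypothetical rooted binary trees written as nested tuples, characters are unordered with one scored state per taxon, step counts use Fitch parsimony, and a character counts as nonhomoplastic when its length equals the number of observed states minus one (a consistency index of 1). The sketch deletes rather than downweights, and it drops a character only if it is nonhomoplastic on every supplied tree, which is one reading of “nonhomoplastic on the set of optimal trees.” The names `fitch_steps`, `is_nonhomoplastic`, and `rsw_delete` are hypothetical.

```python
# Minimal sketch of one RSW deletion step (assumptions stated in the lead-in).
# Trees are rooted binary trees written as nested 2-tuples of taxon names.


def fitch_steps(tree, states):
    """Fitch parsimony length of one unordered character on a binary tree."""
    steps = 0

    def visit(node):
        nonlocal steps
        if isinstance(node, str):  # leaf: its observed state
            return {states[node]}
        left, right = visit(node[0]), visit(node[1])
        if left & right:  # children share a state: no change needed here
            return left & right
        steps += 1  # disjoint state sets: one extra change
        return left | right

    visit(tree)
    return steps


def is_nonhomoplastic(tree, states):
    """True if the character needs no more steps than (observed states - 1)."""
    return fitch_steps(tree, states) == len(set(states.values())) - 1


def rsw_delete(trees, matrix):
    """Drop characters that are nonhomoplastic on every tree in `trees`.

    `matrix` maps taxon name -> string of states, one symbol per character.
    Returns the reduced matrix and the indices of the retained characters.
    """
    taxa = list(matrix)
    n_chars = len(matrix[taxa[0]])
    kept = []
    for i in range(n_chars):
        column = {taxon: matrix[taxon][i] for taxon in taxa}
        if not all(is_nonhomoplastic(tree, column) for tree in trees):
            kept.append(i)
    reduced = {taxon: "".join(matrix[taxon][i] for i in kept) for taxon in taxa}
    return reduced, kept


# Hypothetical four-taxon example: character 0 fits the tree, character 1
# needs two steps (homoplastic), character 2 changes in a single taxon.
tree = (("A", "B"), ("C", "D"))
matrix = {"A": "001", "B": "010", "C": "100", "D": "110"}
reduced, kept = rsw_delete([tree], matrix)
print(kept)     # [1]
print(reduced)  # {'A': '0', 'B': '1', 'C': '0', 'D': '1'}
```

Completing an RSW iteration would mean searching for the optimal trees of the reduced matrix and repeating the deletion; that search (done with PAUP 3.1 in this study) is not shown. Invariant characters have zero steps and are dropped too, which leaves the ranking of trees unchanged because they add nothing to any tree’s length.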
Fossils are not unique in this regard, as basal extant lineages can serve a similar function (e.g., Kim, 1996), but for very old groups, basal fossils dating close to a divergence point may have fewer autapomorphies and more closely approximate the morphological conditions near the root of a lineage.

With or without fossils, the morphological matrix used by Brochu (1997) supported the same topology with respect to living taxa. On its face, this result suggests that crocodylian morphology does not suffer from a branch-attraction problem. Alternatively, the fossils may not have been basal enough to correct for branch length and overturn the Recent-only tree. In that case they would be pruning the branches only near their tips, and both trees would be inaccurate. RSW gives us an opportunity to address this question, because fossils should preserve the same secondary signal as Recent taxa if they suffer from a similar convergence problem.

To investigate the influence of taxon sampling on RSW analyses of crocodylian morphology, I added fossils to the 164-character morphological matrix and applied multiple iterations of RSW. This matrix differed from that of Brochu (1997) only with respect to Australosuchus clarkae, which was recoded for a few characters because more material became available for examination; codings for this taxon can be obtained from the author on request. Maximum parsimony analyses were conducted using PAUP, version 3.1. Two fossil crocodyliforms, Bernissartia fagesii and an undescribed taxon from the Cretaceous Glen Rose Formation of Texas (Langston, 1974), were used as outgroups. Heuristic searches were used when all taxa were analyzed, and the branch-and-bound algorithm was applied when a subsample was considered. To assess nodal support, 100 bootstrap replicates were generated for each analysis.

When nonhomoplastic characters were removed from a matrix that included all 45 ingroup taxa, the resulting consensus tree differed only slightly from that obtained when all characters were considered. Tomistoma was still recovered as a crocodylid and a very distant relative of Gavialis. This basic relationship was stable for two iterations, after which there were no nonhomoplastic characters to remove.
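The bootstrap analyses reported here were run in PAUP 3.1. As a rough, self-contained illustration of what a bootstrap replicate involves (not a reproduction of PAUP’s procedure), the sketch below resamples character columns with replacement and finds the shortest of the three possible unrooted topologies for four hypothetical taxa A–D by exhaustive Fitch parsimony. Support for a topology is the percentage of replicates in which it is shortest; splitting a replicate’s credit equally among tied topologies is an assumption of this sketch, as are all names used.

```python
import random
from collections import Counter

TAXA = ("A", "B", "C", "D")  # hypothetical taxa

# The three unrooted four-taxon topologies, each written as a rooted binary
# tree; Fitch length does not depend on where an unrooted tree is rooted.
QUARTETS = {
    "AB|CD": (("A", "B"), ("C", "D")),
    "AC|BD": (("A", "C"), ("B", "D")),
    "AD|BC": (("A", "D"), ("B", "C")),
}


def fitch_steps(tree, states):
    """Fitch parsimony length of one unordered character on a binary tree."""
    steps = 0

    def visit(node):
        nonlocal steps
        if isinstance(node, str):
            return {states[node]}
        left, right = visit(node[0]), visit(node[1])
        if left & right:
            return left & right
        steps += 1
        return left | right

    visit(tree)
    return steps


def tree_length(tree, matrix, columns):
    """Total Fitch length of `tree` over the (possibly repeated) columns."""
    return sum(fitch_steps(tree, {t: matrix[t][i] for t in TAXA}) for i in columns)


def shortest_quartets(matrix, columns):
    """Exhaustive parsimony search: names of all minimum-length quartets."""
    lengths = {name: tree_length(tree, matrix, columns) for name, tree in QUARTETS.items()}
    best = min(lengths.values())
    return [name for name, length in lengths.items() if length == best]


def bootstrap_support(matrix, replicates=100, seed=1):
    """Percentage of bootstrap replicates in which each quartet is shortest."""
    rng = random.Random(seed)
    n_chars = len(matrix[TAXA[0]])
    support = Counter()
    for _ in range(replicates):
        # Draw as many columns as the original matrix has, with replacement.
        columns = [rng.randrange(n_chars) for _ in range(n_chars)]
        winners = shortest_quartets(matrix, columns)
        for name in winners:
            support[name] += 1 / len(winners)  # ties split the replicate's credit
    return {name: 100 * support[name] / replicates for name in QUARTETS}


# Hypothetical matrix: three characters group A+B, one groups A+C.
matrix = {"A": "0000", "B": "0001", "C": "1110", "D": "1111"}
print(bootstrap_support(matrix))
```

Exhaustive search is only feasible here because four taxa admit just three unrooted trees; with more taxa the number of trees grows so quickly that branch-and-bound or heuristic searches, as used in this study, become necessary.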
But in this case, the secondary signal may have been lost because of data set size. Levels of homoplasy are known to increase with the number of ingroup taxa considered (Sanderson and Donoghue, 1989, 1996; Hauser and Boyajian, 1997). One would therefore expect the strength of secondary signals to diminish relative to the primary signal as data set size increases: fewer characters will be nonhomoplastic on the trees expressing the primary signal, so more of the original data set will remain after nonhomoplastic characters are removed. When all crocodylian taxa were considered, 60 characters were nonhomoplastic, compared with 124 for the reduced Recent-only sample. Only eight more characters were nonhomoplastic on the consensus tree from the first iteration, and none on that from the second. A failure to recover a signal hierarchy with RSW therefore admits two explanations: either there is no signal hierarchy, or homoplasy levels are high enough to prevent us from seeing any that exists.
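To put these counts in proportion, suppose both are fractions of the same 164-character matrix (an assumption; the note does not say how many of the 164 characters vary among the ten Recent taxa). Deleting 60 nonhomoplastic characters then leaves 104 characters, about 63% of the matrix, for the full taxon sample, whereas deleting 124 leaves only 40, about 24%, for the Recent-only sample. The hypothetical helper `fraction_retained` below just performs that arithmetic.

```python
def fraction_retained(n_characters, n_nonhomoplastic):
    """Share of a matrix left after nonhomoplastic characters are deleted."""
    return (n_characters - n_nonhomoplastic) / n_characters


all_taxa = fraction_retained(164, 60)       # all 45 ingroup taxa
recent_only = fraction_retained(164, 124)   # ten Recent taxa (denominator assumed)
print(f"{all_taxa:.2f} {recent_only:.2f}")  # 0.63 0.24
```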
To diminish the effects of using a large number of taxa, I reran these analyses using reduced matrices of selected fossil taxa. Fossils were selected on the basis of their phylogenetic position in the morphological tree (Brochu, 1997; Table 1 and Fig. 2); if convergence is creating a subsignal within the data set, that subsignal should diminish as we move down the lineages in question. Two versions of this test were conducted, the first with very basal members of crocodylian lineages (Pruned analysis 1 in Table 1) and the second with taxa from higher up the respective lineages (Pruned analysis 2 in Table 1).

TABLE 1. Fossil crocodylians considered in this analysis.

Taxon                          Age                              Pruned analysis
Thoracosaurus macrorhynchus    Early Paleoceneᵃ                 1
Eogavialis africanum           Late Eocene/Early Oligocene      2
Borealosuchus sternbergii      Late Cretaceous                  1
Borealosuchus wilsoni          Early Eocene                     2
Leidyosuchus canadensis        Late Cretaceous                  1
Diplocynodon darwini           Early Eocene                     1
Diplocynodon ratelii           Miocene                          2
Brachychampsa montana          Late Cretaceous                  1
Stangerochampsa mccabei        Late Cretaceous                  1
Alligator prenasalis           Oligocene                        2
Allognathosuchus wartheni      Early Eocene                     2
Eocaiman cavernosus            Early Eocene                     2
Purussaurus neivensis          Miocene                          2
Prodiplocynodon langi          Late Cretaceous                  1
Asiatosuchus germanicus        Early Eocene                     1
“Crocodylus” affinis           Early Eocene                     2
“Crocodylus” spenceri          Early Eocene                     1
Gavialosuchus americanus       Miocene/Plioceneᵇ                2
Australosuchus clarkae         Late Oligocene/Early Mioceneᶜ    1
“Crocodylus” megarhinus        Late Eocene/Early Oligocene      2
“Crocodylus” lloidi            Pliocene                         2

ᵃ Older Thoracosaurus (Late Cretaceous) are less complete but code identically.
ᵇ Older North American Gavialosuchus (Oligocene) are less complete but code identically.
ᶜ Part of an insular Australasian radiation (Mekosuchinae) known from the Eocene through the Quaternary. Australosuchus was Mekosuchinae’s representative in Brochu (1997) and was retained here. An Eocene representative (Kambara) has subsequently been coded for this matrix; although it does not code identically with Australosuchus, replacing Australosuchus with Kambara makes no significant difference to these results. See Salisbury and Willis (1996) for further discussion of mekosuchine relationships.

FIGURE 2. Hypothesized relationships among fossil taxa considered in this study, with their stratigraphic positions. Extant taxa are in boldface. See Brochu (1997) for further details.

In the first analysis (Fig. 3a–c), very basal members of Gavialoidea, Alligatoroidea, Crocodyloidea, Tomistominae, and the crocodile lineage (Crocodylinae) were considered, along with Borealosuchus sternbergii, a member of a lineage lying outside Crocodylidae + Alligatoridae according to morphology. When all characters were considered, these fossils supported a tree congruent with that supported by living taxa (Fig. 3a), insofar as the gavialoid Thoracosaurus macrorhynchus was basal to all other crocodylians and the tomistomine “Crocodylus” spenceri was the sister taxon of the crocodyline Australosuchus clarkae. The tomistomine + crocodyline clade received very high bootstrap support (97%). Two iterations of RSW were applied to this matrix. After the first, the consensus tree reflected the primary signal with respect to Tomistominae and Gavialoidea (Fig. 3b), albeit with low bootstrap support for the tomistomine–crocodyline clade (63%). The only group preserved in the second iteration was a clade including two alligatoroids (Fig. 3c).

In the second analysis (Fig. 3d–f; Pruned analysis 2 in Table 1), fossils from higher up the lineages were selected. These taxa were close enough to the tips of their lineages potentially to share some of the putative convergences with their living relatives, so we might expect a secondary signal in which Gavialoidea and Tomistominae group together. As before, the tree recovered when all characters were included reflected the tree recovered from extant taxa (Fig. 3d), with a gavialoid (Eogavialis africanum) at the base of Crocodylia and a close relationship between a tomistomine (Gavialosuchus americanus) and two crocodylines. A crocodylid relationship for Tomistominae was very robust (93% bootstrap support). In this case, however, the first RSW iteration recovered a tree congruent with the molecular hypothesis, in which Gavialoidea (represented by Eogavialis) and Tomistominae (represented by Gavialosuchus) are part of a clade excluding most (but not all) alligatoroids (Fig. 3e). However, none of the nonalligatorid clades appeared in the consensus of bootstrap replicates, and nearly all resolution was lost in the second iteration (Fig. 3f).

One possible limitation of this study is the incompleteness of the fossils. Although none of the fossils considered here is complete, most or all of the cranial characters responsible for drawing Gavialis and Tomistoma together in the secondary signal reported by Trueman (1998) can be coded for them. Missing data increase the number of nonhomoplastic characters, and because the snouts and jaws were present and scorable in most of these fossils, one would actually expect a stronger secondary signal when less relevant regions of the skeleton, such as the postcranium, are not preserved.
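Why missing data can raise the count of nonhomoplastic characters can be seen in a small Fitch parsimony example, assuming the common convention that an unscored entry (written “?”) may take any state, so that it is resolved to whichever state minimizes tree length. In the hypothetical four-taxon case below, a binary character that needs two steps when fully scored becomes nonhomoplastic once one taxon is scored as missing. The symbol `MISSING`, the binary alphabet, the tree, and the function names are all assumptions of this sketch, and a single example is an illustration, not a proof.

```python
MISSING = "?"  # assumed symbol for an unscored (missing) entry


def fitch_steps(tree, states, alphabet):
    """Fitch length of one character; a missing entry may take any state."""
    steps = 0

    def visit(node):
        nonlocal steps
        if isinstance(node, str):
            state = states[node]
            return set(alphabet) if state == MISSING else {state}
        left, right = visit(node[0]), visit(node[1])
        if left & right:
            return left & right
        steps += 1
        return left | right

    visit(tree)
    return steps


def is_nonhomoplastic(tree, states, alphabet="01"):
    """No steps beyond the minimum implied by the scored (non-missing) states."""
    observed = {s for s in states.values() if s != MISSING}
    return fitch_steps(tree, states, alphabet) == max(len(observed) - 1, 0)


tree = (("A", "B"), ("C", "D"))  # hypothetical tree
complete = {"A": "1", "B": "0", "C": "1", "D": "0"}
with_gap = {"A": "1", "B": "0", "C": MISSING, "D": "0"}
print(fitch_steps(tree, complete, "01"), is_nonhomoplastic(tree, complete))  # 2 False
print(fitch_steps(tree, with_gap, "01"), is_nonhomoplastic(tree, with_gap))  # 1 True
```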
These results suggest that although living crocodylians do preserve a hierarchy of signals, at least one of these signals is not preserved in the most basal relatives available. Taken alone, this suggests that the fossils preserve morphologies that were present before distant relatives independently acquired derived states, and thus lack the homoplasy responsible for the secondary signal. As more crownward members of these groups are included, more of this secondary signal is encountered. In effect, the pattern suggests, though it does not demonstrate, that the secondary signal in crocodylian morphology results from convergent evolution in a suite of characters pertaining to snout morphology.

FIGURE 3. Results of RSW analyses when fossils are substituted for living crocodylians. “G” denotes a fossil gavialoid, “T” a fossil tomistomine, and “C” a fossil crocodyline. (a–c) Basal fossils considered (Pruned analysis 1 in Table 1): (a) strict consensus of two trees when all characters were considered; (b) strict consensus of three trees from the first iteration; (c) strict consensus of 25 trees from the second iteration. (d–f) More derived fossils considered (Pruned analysis 2 in Table 1): (d) strict consensus of three trees when all characters were considered; (e) single optimal tree from the first iteration; (f) strict consensus of 14 trees from the second iteration. Numbers at nodes indicate support from 100 bootstrap replicates.
The results reported here indicate that the capacity of RSW to recover a secondary signal can depend on both the number and the phylogenetic position of the taxa included in the analysis. This does not call into question the reality of the secondary signal in crocodylian morphology recovered by Trueman (1998). Indeed, the purpose of this note is neither to dispute the utility of Trueman’s method for isolating and identifying secondary signals nor to contest the importance of finding such a hierarchy before performing operations that assume homoplasy represents random noise. But these results do suggest the need to bear sampling issues in mind when searching for secondary signals. Failure to recover a secondary signal may reflect the number of taxa included rather than the true absence of a signal. The secondary signals we recover may reflect phylogeny, but, as with a primary signal, they may be biased by the taxa being considered and may reflect convergent evolution in distantly related lineages.

ACKNOWLEDGMENTS

I thank Francois Lutzoni, Peter Wagner, and members of the Field Museum Systematics Discussion Group for helpful discourse. Steven Poe and John Trueman provided valuable reviews of earlier versions of this article.
REFERENCES

BROCHU, C. A. 1997. Fossils, morphology, divergence timing, and the phylogenetic relationships of Gavialis. Syst. Biol. 46:479–522.
DONOGHUE, M. J., J. A. DOYLE, J. GAUTHIER, A. G. KLUGE, AND T. ROWE. 1989. The importance of fossils in phylogeny reconstruction. Annu. Rev. Ecol. Syst. 20:431–460.
GATESY, J., AND G. D. AMATO. 1992. Sequence similarity of 12S ribosomal segment of mitochondrial DNAs of gharial and false gharial. Copeia 1992:241–244.
GATESY, J., R. DESALLE, AND W. WHEELER. 1993. Alignment-ambiguous nucleotide sites and the exclusion of systematic data. Mol. Phylogenet. Evol. 2:152–157.
GAUTHIER, J., A. G. KLUGE, AND T. ROWE. 1988. Amniote phylogeny and the importance of fossils. Cladistics 4:105–209.
GRAYBEAL, A. 1998. Is it better to add taxa or characters to a difficult phylogenetic problem? Syst. Biol. 47:9–17.
HALANYCH, K. M. 1998. Lagomorphs misplaced by more characters and fewer taxa. Syst. Biol. 47:138–146.
HAUSER, D. L., AND G. BOYAJIAN. 1997. Proportional change and patterns of homoplasy: Sanderson and Donoghue revisited. Cladistics 13:97–100.
HILLIS, D. M. 1996. Inferring complex phylogenies. Nature 383:130–131.
HUELSENBECK, J. P. 1991. When are fossils better than extant taxa in phylogenetic analysis? Syst. Zool. 40:458–469.
KIM, J. 1996. General inconsistency conditions for maximum parsimony: Effects of branch lengths and increasing numbers of taxa. Syst. Biol. 45:363–374.
LANGSTON, W. 1973. The crocodilian skull in historical perspective. Pages 263–284 in Biology of the Reptilia, vol. 4 (C. Gans and T. Parsons, eds.). Academic Press, London.
LANGSTON, W. 1974. Nonmammalian Comanchean tetrapods. Geosci. Man 8:77–102.
POE, S. 1996. Data set incongruence and the phylogeny of crocodilians. Syst. Biol. 45:393–414.
POE, S. 1998. Sensitivity of phylogeny estimation to taxon sampling. Syst. Biol. 47:18–31.
RANNALA, B., J. P. HUELSENBECK, Z. YANG, AND R. NIELSEN. 1998. Taxon sampling and the accuracy of large phylogenies. Syst. Biol. 47:702–710.
SALISBURY, S. W., AND P. M. A. WILLIS. 1996. A new crocodylian from the Early Eocene of southeastern Queensland and a preliminary investigation of the phylogenetic relationships of crocodyloids. Alcheringa 20:179–227.
SANDERSON, M. J., AND M. J. DONOGHUE. 1989. Patterns of variation in levels of homoplasy. Evolution 43:1781–1795.
SANDERSON, M. J., AND M. J. DONOGHUE. 1996. The relationship between homoplasy and confidence in a phylogenetic tree. Pages 67–89 in Homoplasy: The Recurrence of Similarity in Evolution (M. J. Sanderson and L. Hufford, eds.). Academic Press, New York.
TRUEMAN, J. W. H. 1998. Reverse successive weighting. Syst. Biol. 47:733–737.

Received 13 January 1999; accepted 23 February 1999
Associate Editor: R. Olmstead