Syst. Biol. 47(3):367–396, 1998
Data Sets, Partitions, and Characters: Philosophies and Procedures for Analyzing Multiple Data Sets

J. WILLIAM O. BALLARD,1,2 MARGARET K. THAYER,1 ALFRED F. NEWTON, JR.,1 AND ELIZABETH R. GRISMER3,4

1 Department of Zoology, Field Museum, Roosevelt Road at Lake Shore Drive, Chicago, Illinois 60605, USA
2 Department of Zoology, Field Museum, Roosevelt Road at Lake Shore Drive, Chicago, Illinois 60605, USA; E-mail: ballard@fmnh.org
3 Biochemistry Laboratory, Field Museum, Roosevelt Road at Lake Shore Drive, Chicago, Illinois 60605, USA
4 Stratagene Cloning Systems, Protein Dept., 11011 N. Torrey Pines, La Jolla, California 92037, USA
Abstract. We compared four approaches for analyzing three data sets derived from staphylinoid beetles, a superfamily whose known species diversity is roughly comparable to that of vertebrates. One data set is derived from adult morphology and the two molecular data sets are from 12S ribosomal RNA and cytochrome b mitochondrial DNA. We found that taxonomic congruence following conditional data combination, herein called compatible evidence (CE), resolved more nodes compatible with an initial conservative hypothesis than did total evidence (TE), conditional data combination (CDC), or taxonomic congruence (TC). CE sets a base of nodes obtained by CDC analysis and then investigates what further agreement may arise in a universe where these nodes are accepted as given. We suggest that CE75-75 may be appropriate for future studies that aim to both generate a well-corroborated tree and investigate conflicts between data sets, partitions, and characters. CE75-75 is a 75% bootstrap consensus CDC tree followed by combinable-component consensus of a 75% bootstrap consensus of each homogeneous set of partitions having hierarchical structure. [Beetles; compatible evidence; conditional data combination; Staphylinidae; taxonomic congruence; total evidence.]
The growth of systematics and the availability of independent data sets raise the question of how these data sets should be analyzed. Kluge (1989) suggested that all data sets should be analyzed simultaneously according to a total evidence (TE) or simultaneous analysis (Nixon and Carpenter, 1993) procedure to maximize informativeness. This approach seeks a single best-fitting hypothesis that, in cladistics, involves maximizing character congruence. However, each data set may have different levels of hierarchical structure (Faith and Cranston, 1991), be subject to distinct biological forces (Bull et al., 1993), and be informative at different divergence levels (Friedlander et al., 1994). As a result, no single criterion or model may be applicable to all sets of evolutionary processes. Chippindale and Wiens (1994) argued that differences in character evolution can be accommodated by differential character weighting in a combined analysis of all data. They acknowledged, however, that the appropriate weighting scheme is as unknown as the true phylogeny. A step toward the resolution of this quandary is to define data partitions a priori. It may be
argued that partitions recognized by evolutionary biologists do not conform to natural sets of characters but are merely artifacts of tradition and technology. The naturalness of different partitions can be defended by relating their existence to the concept of evolutionary processes defined by Bull et al. (1993). These authors defined process partitions as subsets of characters that are evolving under demonstrably different rules. There are at least three reasons why partitions may appear to evolve differently: (1) the signal in one or more partitions may be swamped by homoplasy, (2) distinct and conflicting processes may be operating, or (3) methods for investigating whether process partitions should be combined may be inadequate. Faith and Cranston (1991) developed the combined hierarchical evidence (CHE) procedure (Faith, personal communication) to address the potential problem of signal being swamped by homoplasy. In CHE the data are partitioned, each independent partition is tested for structure using the permutation probability (PTP) test, and a tree is constructed from partitions with hierarchical structure
(Archie, 1989; Faith, 1990). Employing this procedure, Faith and Cranston (1991) showed that excluding data sets without PTP structure may resolve a higher number of nodes than including data sets both with and without PTP structure. Second, distinct and conflicting patterns may result from such phenomena as introgression or selection. Bull et al. (1993) developed the conditional data combination (CDC) procedure to address this issue. Chippindale and Wiens (1994) refer to this approach as prior agreement. In CDC the data are partitioned into process partitions, each independent partition is tested for homogeneity, and combinable partitions are pooled and a tree constructed (Bull et al., 1993; de Queiroz, 1993; Rodrigo et al., 1993). Consider the nonmonophyly of the two mitochondrial haplotypes of Drosophila mauritiana with respect to the D. simulans mitochondrial haplotypes (Satta and Takahata, 1990). Alternative explanations for the observed nonmonophyly are the retention of ancestral polymorphisms or introgression. Inspection of Satta and Takahata's (1990) sequence data shows that D. mauritiana maI differs from the D. simulans siIII haplotype by a single substitution. Moreover, the mtDNA of D. mauritiana maI and that of D. simulans siIII have no restriction site length polymorphism differences (Solignac and Monnerot, 1986). This suggests that there has been introgression of D. simulans siIII mtDNA into D. mauritiana. Laboratory crosses of D. mauritiana (males) and D. simulans (females) result in sterile males and fertile females (Robertson, 1983), supporting the feasibility of introgression. Third, methods for investigating whether process partitions should be combined may be inadequate. Sullivan (1996) showed that, in deer mice and grasshopper mice, portions of 12S ribosomal RNA and cytochrome b have conflicting signals but a combined parsimony analysis with equal weights recovered a well-corroborated hypothesis.
One explanation for this apparent contradiction is that the phylogenetic signal from the different loci with different evolutionary processes was additive when it was analyzed under a homogeneous model. An alternative method for analyzing multiple data sets, taxonomic congruence (TC), was
proposed by Mickevich (1978). This method involves partitioning data into discrete process partitions, analyzing the characters in each partition separately, and constructing a consensus tree that summarizes the topological features shared by trees from the separate analyses (Jones et al., 1993). The TC approach has many methodological advantages because each data set is treated independently, separate analyses can corroborate specific assemblages, and conflicts between process partitions can be directly investigated (Lanyon, 1985; Miyamoto and Fitch, 1995). However, consensus methods may endorse trees that contradict the tree obtained from the pooled data. Barrett et al. (1991) suggested that these situations "may not be rare" and in such circumstances the consensus trees should not be regarded as the "best" inference to make from the available data. We investigated a new approach to the analysis of multiple data sets, herein called compatible evidence (CE). This procedure can be broken down into six steps and involves a series of familiar elements of systematic analysis (Fig. 1). CE sets a base of nodes obtained by CDC analysis and then investigates what further agreement may arise in a universe where these nodes are treated as given. The process is consistent with the tenets of meta-analysis and combines the philosophy of CDC with the methodology of TC. Meta-analysis originally was defined as statistical analysis of a large collection of individual data sets for the purpose of integrating the results (Glass, 1976). However, the term is often used more restrictively (e.g., Hedges and Olkin, 1985; Dickersin and Berlin, 1992). In restrictive meta-analysis, a weighted combined analysis of all data from all studies is statistically compared to the separate analysis of each individual study.
Consistent with meta-analysis, Hillis (1987) suggested that the best estimate of a phylogeny is obtained from a combined analysis, but that congruence among data sets is evidence that the underlying phylogeny is correctly estimated. We constructed an initial conservative hypothesis and compared the TE, TC, CDC, and CE approaches by analyzing three independently derived data sets from staphylinoid beetles. One data set is derived from adult morphology and the other two data sets are from
FIGURE 1. Graphical representation of the compatible evidence (CE) method. (1) Process partitions are defined. (2) Each partition is tested for hierarchical structure. The partitions with black boxes have hierarchical structure while the partitions with white boxes do not. (3) Partitions with significant structure are tested for homogeneity. In this case, all partitions are homogeneous and are combined. If they are not homogeneous, the partition that contributes most to the partition heterogeneity should be removed. This procedure should be repeated until the remaining partitions are homogeneous. If more than one partition is removed, the excluded partitions should be tested for homogeneity. (4) A tree is constructed from the combinable partitions. In this example, one node is robustly supported. (5) The characters in each of the homogeneous partitions (5a) are analyzed separately under the constraint generated from the analysis of the combined partitions (5b). In this case, one constraint is employed. If there are multiple sets of homogeneous data partitions the constraints derived from each set of homogeneous data partitions should apply only to partitions in that set. (6) A combinable-component consensus is constructed for each set of homogeneous combinable partitions.
12S ribosomal RNA (12S rRNA) and cytochrome b mitochondrial DNA (mtDNA). In this study we compare the number of well-supported nodes generated by the four techniques and prefer the method that supports the highest number of such nodes that are congruent with the a priori hypothesis (Miyamoto et al., 1994; Wheeler, 1995). CE resolved a higher number of nodes compatible with the initial conservative hypothesis than did any other procedure. Further, CE facilitates investigation of conflicts between data sets, partitions, and characters. However, CE may accept a tree that is not the most parsimonious and may generate a biased estimator of nodal support.

MATERIALS AND METHODS

Current Taxonomy

Staphylinoid beetles comprise a large component of the suborder Polyphaga, which is the
largest and most diverse suborder within the Coleoptera. Polyphaga contain more than 90% of the known 350,000 species and 166 families of beetles. The earliest fossil definitely attributable to Polyphaga is Peltosyne triassica from the Triassic of Central Asia (Arnol'di et al., 1977). However, Crowson (1975) and Ponomarenko (1969) suggested that the Triassic archostematan family Ademosynidae may actually represent basal Polyphaga in which the prothoracic pleuron had not yet become internalized (Lawrence and Britton, 1994). The fossil record of the superfamily Staphylinoidea has been recently extended to the Triassic (unnamed Staphylinidae from eastern United States: Fraser et al., 1996) from previous estimates of the Middle or Lower Jurassic (Arnol'di et al., 1977; Ryvkin, 1985). Within the Staphylinidae, the fossil record demonstrates that significant diversification had occurred
by the Upper Jurassic and Lower Cretaceous (e.g., Tikhomirova, 1968; Ryvkin, 1990). Staphylinoidea, whose known species diversity worldwide (ca. 50,000) is comparable to that of vertebrates, owes its size primarily to several radiations within the Staphylinidae. These radiations have been accompanied by extraordinary habitat diversification, and likely include multiple origins of some morphological adaptations. For example, radiations of several distantly related subfamilies (Pselaphinae, Aleocharinae, Osoriinae, Staphylininae) in forest litter and soil habitats were probably preceded by separate feeding canalization in each of these groups (Leschen, 1993; Newton and Thayer, 1995). The many classifications of Staphylinidae and Staphylinoidea have been reviewed by Lawrence and Newton (1982, 1995) and Newton and Thayer (1992, 1995). For this study, previous morphological evidence suggests that Agyrtidae is an appropriate outgroup. However, Leiodidae may be more closely related to Agyrtidae than to Staphylinidae. This is left unresolved in all initial analyses.
Initial Hypothesis

In this study, we do not have a known phylogeny. Rather, the initial hypothesis was developed prior to the cladistic analysis we report here to facilitate comparison of the procedures investigated. Our initial hypothesis (Fig. 2) was based on both adult and immature characters (Lawrence and Newton, 1982; Ashe and Newton, 1993; Newton and Thayer, 1995; Newton, unpublished observations; Thayer, unpublished observations). However, only the subset of adult morphological data, without missing data points, was included in our methodological comparisons. The immatures of some taxa are not known, and missing data have the potential to bias our comparisons. Thus, our initial hypothesis is not completely independent of the morphological characters employed in the analysis of multiple data sets. We acknowledge this problem but suggest that the lack of independence is not a priori expected to bias one method over another. To test this prediction we specifically investigate nodes that conflict with the initial hypothesis. We conclude that the initial hypothesis cannot
FIGURE 2. Initial hypothesis of relationships within Staphylinidae for the taxa included in this study (based on Lawrence and Newton, 1982; Newton and Thayer, 1995; Thayer, 1985; Newton, unpublished observations; Thayer, unpublished observations). This is not the result of a quantitative cladistic analysis.
be rejected statistically by currently available data.

Morphological Data and Voucher Specimens

Specimens used in the morphological analyses were examined dry, in alcohol, or in cleared and disarticulated slide mounts using dissecting and compound microscopes, as appropriate. In this study we employ the set of adult external morphological characters used by Newton and Thayer (1995:254–268); see that work for a description of the characters and states. Invariant and/or uninterpretable characters are excluded (numbers 11, 18, 73, 84, 109, 111 of Newton and Thayer, 1995). Collection data, storage conditions, and identifications are described in Appendix 1. The morphological characters are partitioned into two process partitions: head and nonhead characters (Appendix 2). These partitions were chosen because transitions between predation and nonpredation are suspected to have arisen multiple times. Therefore, it was hypothesized that distinct evolutionary forces may be acting in these partitions.

Molecular Data

DNA from individual beetles was extracted using the PureGene Kit (Gentra) following the "DNA Isolation From Fixed Tissue" protocol. Specimens had been stored in 95% ethanol for 6 months to 3 years (18 taxa), in 70% ethanol for < 1 year (3 taxa), or in 70% ethanol for 3 years (1 taxon) (Appendix 1, with locality data). Fragments were PCR (polymerase chain reaction) amplified for 35 cycles. Each cycle consisted of 30 sec denaturation at 94°C, 30 sec annealing between 45 and 50°C, and 1 min extension at 72°C. The amplicon was electrophoresed on a 1% agarose gel to verify size and the remainder of the reaction was cleaned and concentrated with Microcon 100 (Amicon). Fifty nanograms of template and 25 ng of primer were added to 4.25 µl of FS Taq-DyeDeoxy Terminator premix (Applied Biosystems) prior to cycle sequencing for 25 cycles.
The cycling profile was 15 sec denaturation at 95°C, 5 sec annealing at 50°C, and 4 min extension at 65°C. Cycle sequencing reactions were NH4OAc precipitated, ethanol rinsed, and dried. The dried samples were resuspended in
4 µl of deionized formamide and 50 mM ethylenediamine tetraacetic acid (EDTA) (5:1 ratio) and 2 µl was electrophoresed on an Applied Biosystems 377 DNA sequencer. Sequences were imported into the Sequencher software program and the chromatograms were investigated.

12S rRNA. The 12S rRNA locus was amplified with primers SR-J-14233 (alias 12Sbi) and SR-N-14588 (alias 12Sai) designed by T. Kocher (Kocher et al., 1989) and listed by Simon et al. (1994). The sequences were initially aligned using ClustalW (Higgins and Sharp, 1988). The secondary structure model proposed in Hickson et al. (1996) was then employed to refine the alignment. Sixty-three positions with ambiguous alignments were excluded from all analyses. Gaps were coded as an additional character. The 12S rRNA locus is divided into four process partitions according to Hickson et al. (1996): domain II, domain III stems, domain III loops, and domain IV (Appendix 3).

Cytochrome b. The cytochrome b locus was amplified with primers CB-J-10933 (CB1) and CB-N-11367 (CB2) designed by Y. C. Crozier (Crozier and Crozier, 1992) and listed by Simon et al. (1994). Sequences were aligned using ClustalW (Higgins and Sharp, 1988) without any insertions or deletions. All nucleotides were included in the analysis. Amino acid composition was determined using MacClade 3.05 (Maddison and Maddison, 1992). Five cytochrome b process partitions were considered: first, second, and third codon positions, third-position transversions, and amino acids. These partitions are not all independent; only independent partitions are included in the final phylogenetic analyses.

Phylogenetic Analysis

The procedures for TE, TC, and CDC have been summarized (Kluge, 1989; Bull et al., 1993; Jones et al., 1993; Miyamoto and Fitch, 1995). CE can be broken down into six discrete steps (Fig. 1). To facilitate comparison between the four techniques we have made slight modifications to TE, CDC, and TC, as discussed later.
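The codon-position and third-position transversion partitions of cytochrome b described above can be illustrated with a short Python sketch. This is not the software used in the study; the function names, the `frame_offset` parameter, and the purine/pyrimidine (R/Y) recoding are assumptions made here for illustration, and the input is assumed to be an ungapped, in-frame coding sequence.

```python
# Illustrative sketch only: names and R/Y recoding are assumptions, not the authors' code.

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def codon_position_partitions(sequence, frame_offset=0):
    """Split an in-frame coding sequence into first, second, and third codon positions."""
    coding = sequence[frame_offset:]
    return {
        "first": coding[0::3],
        "second": coding[1::3],
        "third": coding[2::3],
    }


def recode_transversions(sites):
    """Recode bases as purine (R) or pyrimidine (Y) so that only transversions register as changes."""
    return "".join(
        "R" if base in PURINES else "Y" if base in PYRIMIDINES else "?"
        for base in sites.upper()
    )


if __name__ == "__main__":
    parts = codon_position_partitions("ATGGCATTA")  # codons ATG GCA TTA
    print(parts["first"], parts["second"], parts["third"])  # AGT TCT GAA
    print(recode_transversions(parts["third"]))  # RRR
```

Applied to a 465-nucleotide alignment row, each of the three position partitions holds 155 characters, which matches the partition sizes reported for cytochrome b in Table 1.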
We employed parsimony analysis using PAUP* 4.0d52 (provided by David Swofford), with one exception: We used MEGA (Kumar et al., 1993) to conduct a Jukes–Cantor neighbor-joining bootstrap analysis of cytochrome b replacement substitutions. We note that it is not essential to implement only one algorithm when constructing the TC and CE trees. On the contrary, it may behoove researchers to implement alternative methods when the branch lengths are highly uneven or there is a particular character bias (Felsenstein, 1978; Steel et al., 1993; Huelsenbeck, 1995). PTP testing (Archie, 1989; Faith and Cranston, 1991) is integrated into each method. We employed the PTP test to investigate whether the observed tree length could have been obtained "by chance alone" (Archie, 1989; Faith, 1990; Carpenter, 1992; Faith, 1992). We investigated nucleotide biases using the chi-square test of homogeneity of base frequencies across all taxa as implemented in PAUP* 4.0d52. Steel et al. (1993) found that PTP and bootstrap could provide misleading results where loci have independently acquired similar G + C compositions. We used bootstrapping (Efron, 1982; Felsenstein, 1985), decay or support indices (Bremer, 1988; Donoghue et al., 1992), and the topology-dependent permutation (T-PTP) test (Faith, 1991) to investigate the monophyly of clades. We constructed 95% (Felsenstein and Kishino, 1993), 70% (Hillis and Bull, 1993), and 50% (majority rule) bootstrap consensus trees. For the T-PTP tests we employed 99 randomizations unless otherwise stated. For CDC and CE we employed the incongruence length difference test of Farris et al. (1995) to investigate whether partitions are evolving under distinct biological processes. This random-partitioning test is an extension of a measure originally reported by Mickevich and Farris (1981) and is based on the null hypothesis of congruence.
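The arithmetic behind the PTP and incongruence length difference (ILD) tests described above can be sketched as follows. This is a minimal illustration, not the PAUP* implementation: it assumes the tree lengths for the observed and randomized data have already been obtained from a parsimony search, and the function names are hypothetical. The PTP value is taken as the proportion of data sets (the original included) whose shortest tree is as short as or shorter than the observed one; the ILD statistic is the length of the combined analysis minus the summed lengths of the separate analyses.

```python
# Illustrative sketch only: function names are hypothetical and tree lengths are
# assumed to come from a separate parsimony search.


def ptp_value(observed_length, randomized_lengths):
    """Proportion of data sets (original included) with a tree as short as or shorter than observed."""
    as_short = sum(1 for length in randomized_lengths if length <= observed_length)
    return (as_short + 1) / (len(randomized_lengths) + 1)


def incongruence_length_difference(combined_length, partition_lengths):
    """Extra steps required when partitions are analyzed together rather than separately."""
    return combined_length - sum(partition_lengths)


def ild_p_value(observed_ild, randomized_ilds):
    """Proportion of partitionings (original included) with an ILD at least as large as observed."""
    as_large = sum(1 for value in randomized_ilds if value >= observed_ild)
    return (as_large + 1) / (len(randomized_ilds) + 1)


if __name__ == "__main__":
    # With 99 randomizations, none as short as the observed tree, PTP reaches its minimum.
    print(ptp_value(100, [105] * 99))  # 0.01
    # Tree lengths from Table 1: morphology combined (378), head (145), nonhead (200).
    print(incongruence_length_difference(378, [145, 200]))  # 33
```

The first example shows why 0.01 is the smallest PTP value attainable with 99 randomizations. The second only illustrates the subtraction using tree lengths from Table 1; whether such a difference is significant depends on the random-partitioning distribution, which is not reproduced here.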
For TC and CE we summarized the topological features from separate analyses by combinable-component consensus (Bremer, 1990). Among others, Lanyon (1993) and de Queiroz (1993) previously noted that only strongly supported nodes should be included in a final phylogenetic framework.
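A combinable-component consensus retains every clade that appears in at least one input tree and is not contradicted by any input tree. The sketch below is one minimal way to express this, assuming rooted trees represented simply as collections of clades (sets of taxon names); that representation and the function names are assumptions made for illustration, not the procedure as implemented in PAUP*. The taxon names are taken from this study, but the clade sets are a simplified example.

```python
# Illustrative sketch only: trees are represented as collections of clades (sets of taxa).


def compatible(clade_a, clade_b):
    """Two clades can coexist on one tree if they are disjoint or one contains the other."""
    return clade_a.isdisjoint(clade_b) or clade_a <= clade_b or clade_b <= clade_a


def combinable_component_consensus(trees):
    """Return the clades found in at least one tree and contradicted by none."""
    clade_sets = [{frozenset(clade) for clade in tree} for tree in trees]
    all_clades = set().union(*clade_sets)
    return {
        clade
        for clade in all_clades
        if all(compatible(clade, other) for other in all_clades)
    }


if __name__ == "__main__":
    head = [
        {"Conoplectus", "Palimbolus"},
        {"Edaphus", "Conoplectus", "Palimbolus"},
    ]
    nonhead = [
        {"Conoplectus", "Palimbolus"},
        {"Dasycerus", "Conoplectus", "Palimbolus"},
    ]
    for clade in combinable_component_consensus([head, nonhead]):
        print(sorted(clade))  # ['Conoplectus', 'Palimbolus']
```

In this example the two three-taxon clades overlap without nesting, so both are dropped, while the shared two-taxon clade survives. A clade resolved in only one tree would also be kept as long as no other tree contradicts it, which is what distinguishes this consensus from a strict consensus.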
We investigated conflicts between partitions by employing decay indices (Bremer, 1988; Donoghue et al., 1992), T-PTP tests of nonmonophyly (Faith, 1991; Faith and Trueman, 1996; Swofford et al., 1996b), and compare-2 tests (Faith, 1991; Ballard, 1994; Swofford et al., 1996b; Faith and Trueman, 1996).

A Posteriori Phylogenetic Analysis

To further compare the four phylogenetic methods, we selected the lowest bootstrap percentage generating a topology consistent with the initial hypothesis. To be conservative, each bootstrap consensus value was rounded up to the nearest 5%. This (1) enables comparison of the approaches under the assumption that the initial hypothesis is correct and (2) facilitates prediction of bootstrap percentages that may be appropriate for future studies. We were interested in comparing the tree lengths of the TE trees, the most parsimonious tree for each partition, and each maximally resolved a posteriori tree. We present tree lengths and statistically investigate length differences using the compare-2 PTP test (Faith, 1991; Ballard, 1994; Swofford et al., 1996b; Faith and Trueman, 1996) and the Templeton (1983) test. The latter utilizes a Wilcoxon signed-rank test of the relative number of steps required by each character on each of the respective trees.

Molecular Analysis

Saturation plots. To further investigate the informativeness of each locus we plotted the number of substitutions in a specific partition (y-axis) against an estimate of time since divergence (x-axis). Distance estimates, using the Hasegawa et al. (1985) (HKY) model, were obtained from all alignable 12S rRNA and cytochrome b characters. We estimated gamma-distributed rates using PAUP* 4d52-3. To investigate saturation between the outgroup and the ingroup, the pairwise divergences between Necrophilus and all ingroup taxa and the pairwise divergences between all ingroup taxa were plotted with different symbols.

Substitution biases. We employed the Wilcoxon test to compare the proportion of
nucleotides that occurred in the variable positions to the proportion in all positions. The substitution bias in the 12S rRNA stem partition may be expected to maintain stem stability. Ballard and Kreitman (1994) hypothesized that changes to C or G are slightly deleterious in insect mtDNA.

RESULTS

Phylogenetic Analysis

Total evidence analysis. The 871 characters in the three independently acquired data sets were pooled. These data are presented in Appendices 2, 3, and 4. The 12S rRNA and cytochrome b sequences have also been deposited in GenBank (accession numbers AF021046-89). The 434 informative characters have significant structure as determined by PTP (Table 1). The resultant topology (Fig. 3a) substantially disagrees with our initial hypothesis (Fig. 2). In particular, there is conflict over the monophyly of the Omaliinae (Amphichroum, Dropephylla, Eusphalerum). To
investigate this apparent conflict we conducted a T-PTP test of Omaliinae nonmonophyly. This test cannot reject monophyly of the Omaliinae (T-PTP = 0.94, 1 step). Data not included in this analysis support monophyly of the Omaliinae; specifically, larvae of Omaliinae have a unique combination of derived characters including single-segmented urogomphi, five pairs of stemmata, and two pairs of laterosclerites on the first two or more abdominal segments (Newton and Thayer, 1995; Thayer, 1985). Four nodes supported by bootstrap percentages of > 70 (TE70) are supported by the initial hypothesis (Figs. 2, 3b, 4). Also, two of the three nodes supported by 50% bootstrap consensus (TE50) are compatible with the initial hypothesis. Monophyly of (Habrocerus, Cyparium) supported by 50% bootstrap (Figs. 3b, 4) conflicts with the initial hypothesis (Fig. 2). T-PTP tests of nonmonophyly suggest these data can reject the hypothesis that (Tachinus, Habrocerus) are monophyletic (T-PTP = 0.03, 25 steps) but not that (Cyparium, Oxytelus) are monophyletic (T-PTP = 0.31, 14 steps). Further, data not included in this analysis support monophyly of (Oxytelus, Cyparium) and of (Tachinus, Habrocerus). Monophyly of (Oxytelus, Cyparium) is supported by loss of a vein in the anal area of the hind wing and specialized saprophagous/mycophagous mouthparts of adults and larvae. Derived morphological characters of larvae including deciduous urogomphi, epipharynx with asymmetrical setose fields (Newton, unpubl.; Ashe and Newton, 1993) and the traditionally accepted semilimuloid body shape unite (Tachinus, Habrocerus).

Taxonomic congruence analysis. Each partition was analyzed independently and then a combinable-component consensus was constructed. For convenience we discuss the three independently acquired data sets separately.
FIGURE 3. Total evidence analysis of 871 characters (434 were parsimony-informative). To maintain the original order of Fig. 2 but avoid unnecessary line crossing, Oxytelus, Euconnus, and Dianous appear twice. The taxa out of place are in bold. (a) Strict consensus of the three equally parsimonious trees of length 2,143 steps (CI = 0.36). (b) Bootstrap percentages > 50 are shown in circles.
Morphology. The morphological data set has PTP structure (Table 1). This data set was divided into two partitions: head and nonhead characters. These partitions have significant PTP structure (Table 1). Bootstrapping the head partition supports two nodes at 95% level that are consistent with the initial
TABLE 1. Summary of parsimony analyses.

Data sets                       No. of       No. of informative   PTP(a)     Tree      No. of   Consistency
                                characters   characters                      length    trees    index
Total evidence                  871          434                  0.01       2,143     3        0.36
Morphology                      106          106                  0.01       378       6        0.35
  Head                          38           38                   0.01       145       34       0.34
  Nonhead                       68           68                   0.01       200       3        0.43
12S rRNA                        300          129                  0.01       584       20       0.44
  Domain II                     41           17                   0.01       72        44       0.54
  Domain III stems              152          70                   0.01       283       3        0.44
  Domain III loops              93           42                   0.01       175       18       0.49
  Domain IV                     14           0                    1.00       3         1        1.00
Cytochrome b                    465          199                  0.07(b)    1,084     1        0.35
  First positions               155          54                   0.06(b)    266       3        0.39
  Second positions              155          22                   0.13       96        1        0.53
  Third positions               155          123                  0.3(b)     669       3        0.34
  Third transversions           155          89                   0.01       399       10       0.24
  Amino acids                   155          48                   0.01       284       656      0.55
Conditional data combination    547          283                  0.01       1,297     1        0.42

(a) PTP from 99 randomizations except where noted.
(b) PTP values were determined from 999 randomizations.
hypothesis. Bootstrapping the nonhead morphological characters supports the same two nodes as the head partition in addition to four other nodes at > 70% that do not conflict with the initial hypothesis (Figs. 2, 4). The nonhead and head partitions conflict over the sister taxon to (Conoplectus, Palimbolus) (Fig. 4). The nonhead partition supports monophyly of (Dasycerus (Conoplectus, Palimbolus)). The head morphological partition supports (Edaphus (Conoplectus, Palimbolus)). The initial hypothesis, decay indices, and compare-2 PTP testing support the former. In the nonhead partition eight additional steps are required to decay the branch uniting Dasycerus with (Conoplectus, Palimbolus). Only two additional steps are required to decay the branch uniting Edaphus with (Conoplectus, Palimbolus) in the head partition. Two constraints were constructed for the compare-2 analysis, the first corresponding to (Dasycerus (Conoplectus, Palimbolus)) and the second to (Edaphus (Conoplectus, Palimbolus)). First, only the nonhead characters were included. The shortest parsimony tree under the constraint of (Dasycerus (Conoplectus, Palimbolus)) is 200 steps, while it is 218 steps under the constraint of (Edaphus (Conoplectus, Palimbolus)). The randomization procedure rejects the hypothesis that the topologies differ due to chance (PTP = 0.01, 18 steps). Second, only the head
characters were included. Parsimony finds a shortest tree of 145 steps under the constraint of (Edaphus (Conoplectus, Palimbolus)) and 150 under the constraint of (Dasycerus (Conoplectus, Palimbolus)). In this case, the compare-2 PTP test fails to reject the null hypothesis that the result is due to chance (PTP = 0.06, 5 steps). Again, data not included in this analysis have some bearing on interpretation of the 50% bootstrap results. As discussed by Newton and Thayer (1995), derived larval characters support monophyly of both the Omaliine Group (here Omaliinae, Proteininae, Dasycerinae, and Pselaphinae) (tentorial bridge structure) and (Dasycerinae, Pselaphinae, Micropeplinae) (unarticulated urogomphi), to the exclusion of Euaesthetinae, including Edaphus. Unfortunately, we did not have appropriate material to look at Dasycerus ovariole structure, which Newton and Thayer (1995) predicted would be of a type thus far unique to Pselaphinae + Proteininae + Micropeplinae (Welch, 1993). In conflict with the initial hypothesis, the morphological nonhead partition groups Oxytelus and ((Opresus, Euconnus) (Edaphus, Dianous)) with a > 50% bootstrap (Fig. 4). T-PTP tests of nonmonophyly suggest these data cannot reject the hypothesis that (Cyparium, Oxytelus) and (Edaphus, Dianous, Opresus, Euconnus, Oxyporus, Achenomorphus,
FIGURE 4. Comparison of the support for specific nodes. The tree representing the initial hypothesis is shown in Fig. 2. The bootstrap support or the number of independent partitions supporting the node is denoted by shading. Neopelatops is not included. H0 = initial hypothesis; TE = total evidence; TC = taxonomic congruence (derived from 95%, 70%, and 50% bootstrap consensus of each partition and constructed by combinable-component consensus); CDC = conditional data combination; CE = compatible evidence. CE95- was derived from the 95% bootstrap consensus CDC tree. This constraint was then imposed on each partition and 70% and 50% bootstrap consensus trees determined. Combinable-component consensus trees (CE95-70, CE95-50) were then constructed. Note that within a column some supported nodes may be nested within others, e.g., (Conoplectus, Palimbolus) within (Dasycerus, Conoplectus, Palimbolus) in TE.
Ontholestes, Oiceoptoma) are each monophyletic (T-PTP = 0.06, 11 steps; T-PTP = 0.77, 11 steps, respectively). Adult, larval, and pupal derived characters not included in this analysis (extraoral digestion with correlated mouthpart structures of adults and larvae; four pairs of functional pupal spiracles) support monophyly of the Staphylinine Group (here Edaphus through Oiceoptoma in Fig. 2). Monophyly of the Oxyteline Group (Oxytelus and Cyparium in this study) is also supported by data not included in this analysis including loss of a vein in the anal area of the hind wing and specialized saprophagous/mycophagous mouthparts of adults and larvae (Lawrence and
Newton, 1982; Newton and T hayer, 1995; Newton, unpublished observations). 12S rRNA. Ð There is PTP structure in the 12S rRNA data set (T able 1). T hese characters were divided into four partitions: domain II, domain III stems, domain III loops, and domain IV characters. T he former three partitions have PT P structure but the domain IV characters do not (T able 1). There are no nucleotide biases in the three partitions with PTP structure (domain II c 263 5 25.92 and 37.57; domain III stems c 263 5 25.37 and 38.62; domain III loops c 26 3 5 22.62 and 28.28, P > 0 . 05 for all and variable positions,
376
SY STEM AT IC BIO LO G Y
respectively). One node resolved by each of the three partitions at the 70% bootstrap consensus level does not conflict with the initial hypothesis (Fig. 4). Support for (Opresus, Edaphus) at the 50% bootstrap consensus level in the domain III stems partition conflicts with the initial hypothesis (Fig. 4). This support is at least partially due to compensatory changes that maintain stem stability (Fig. 5). Six synapomorphies unite (Opresus, Edaphus) in the three most parsimonious trees (Table 1) and 3 steps are required to decay the node. Two of these involve compensatory changes and, as a result, it may be argued that these positions should be down-weighted. At position 103 in stem 36 there is a substitution from C to U, while at position 218 there is a substitution from G to A (Fig. 5, Appendix 3). There are also compensatory changes in stem 48 (A → U at position 311 and U → A at position 324) (Appendix 3). T-PTP tests of nonmonophyly suggest these data cannot reject the hypothesis that (Opresus, Euconnus) and (Edaphus, Dianous) are monophyletic (T-PTP = 0.55, 4 steps; T-PTP = 0.55, 5 steps). Several derived characters (not included in this study) support the nodes (Edaphus, Dianous) and (Opresus, Euconnus) in the initial hypothesis (Newton, unpublished observations). Adults of the former have lost wing-folding patches from the abdominal terga; larvae have the mala highly reduced and finger-like and the apical maxillary palp segment with a membranous flexation point near the middle. Like other Scydmaenidae, Opresus and Euconnus have the adult elytra long, nearly or completely covering the abdomen (a secondary development), and all known scydmaenid larvae have club-shaped antennae, no ligula, three or fewer stemmata, and urogomphi fixed or absent (Newton, 1991; Opresus larvae are unknown). Several additional features that differ between these pairs of taxa are of uncertain polarity, so it is not clear which of the two pairs they support.
Cytochrome b. There is no PTP structure when all cytochrome b characters are included or when the locus is divided into first, second, and third codon positions (Table 1). The amino acid and third-position transversion partitions have PTP structure (Table 1). These partitions
V O L.
47
are not independent and the amino acid partition was selected because it includes only replacement changes, whereas the third-position transversion partition is a heterogeneous assemblage of silent and replacement changes. Consistent with the initial hypothesis, the amino acid characters support monophyly of (Conoplectus, Palimbolus) at the 70% level. The third-position transversion partition supports no nodes that are consistent with the initial hypothesis. In conflict with the initial hypothesis, the amino acid characters support monophyly of (Cyparium, Habrocerus) at the 50% bootstrap consensus level. Nine synapomorphies unite (Cyparium, Habrocerus) and two steps are required to decay the node. Four of these changes are from isoleucine and three are changes to isoleucine. In contrast, only 44% of all changes throughout the amino acid tree include isoleucine. There are 14 isoleucines in the 48 parsimony-informative amino acid positions sequenced in Habrocerus and 7 in Cyparium. The mean number of isoleucines in the parsimony-informative positions for all taxa is 8.64 ± 0.55. These data suggest that there is an increase in the proportion of isoleucines in Habrocerus, possibly causing a long-branch effect. Additional sequencing of taxa within the Habrocerinae is required. T-PTP tests of nonmonophyly suggest these data cannot reject the hypothesis that (Tachinus, Habrocerus) and (Cyparium, Oxytelus) are monophyletic (T-PTP = 0.17, 6 steps; T-PTP = 0.33, 5 steps). A neighbor-joining analysis with gamma-distributed rates (0.5, 1, 2) does not unite (Cyparium, Habrocerus) but, in conflict with the initial hypothesis, supports (Oxytelus, Oxyporus) at the 50% bootstrap. As discussed in the section on Total Evidence Analysis, additional characters not included in this analysis also support (Oxytelus, Cyparium) and (Tachinus, Habrocerus).

Combinable-component consensus. A TC tree was constructed from the bootstrap analysis of the partitions. The 95% bootstrap support tree (TC95) supports two nodes while the TC70 tree supports four (Figs. 4, 6a, 6b; Table 2). Three TC50 nodes supported by 2 or more partitions do not conflict with our initial hypothesis (Figs. 4, 6c).
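The combinable-component step can be sketched in a few lines. The sketch below assumes each partition's bootstrap consensus has already been reduced to a set of rooted clades (each a frozenset of taxon names); it keeps every clade that appears in at least one partition and is contradicted by none. The function names and the three toy partition results are assumptions made for illustration only; they are not the clade sets reported in Figs. 4 and 6, and the analyses in the paper were run in PAUP*.

```python
from itertools import chain


def compatible(a, b):
    """Two rooted clades are compatible if they are disjoint or nested."""
    return a.isdisjoint(b) or a <= b or b <= a


def combinable_component_consensus(trees):
    """Return the clades found in at least one tree and contradicted by none.

    Each tree is a set of clades; each clade is a frozenset of taxon names.
    """
    candidates = set(chain.from_iterable(trees))
    return {
        clade
        for clade in candidates
        if all(compatible(clade, other) for tree in trees for other in tree)
    }


def support_counts(trees, consensus):
    """Count how many input trees contain each consensus clade."""
    return {clade: sum(clade in tree for tree in trees) for clade in consensus}


# Hypothetical partition results, invented for this example.
partition_trees = [
    {frozenset({"Amphichroum", "Eusphalerum"})},
    {frozenset({"Conoplectus", "Palimbolus"}),
     frozenset({"Cyparium", "Habrocerus"})},
    {frozenset({"Conoplectus", "Palimbolus"}),
     frozenset({"Cyparium", "Oxytelus"})},
]
consensus = combinable_component_consensus(partition_trees)
counts = support_counts(partition_trees, consensus)
```

In this toy input the two Cyparium groupings overlap without nesting, so each contradicts the other and neither survives, while the two uncontradicted clades are retained. The per-clade counts play the role of the roman numerals in Fig. 6.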
FIGURE 5. The 12S rRNA secondary structure model following Hickson et al. (1996). Stems are shaded, while equivocally alignable bases are denoted by hatched circles. Upper case characters denote positions that are invariant. Lower case characters denote the majority rule base at that position. If two bases are equally present in the majority the IUPAC code is employed. Insertions are shown with an open triangle. The six changes on the lineage to (Opresus, Edaphus) are shown with arrows and an open diamond is placed beside compensatory base changes in the (Opresus, Edaphus) lineage.
TABLE 2. Summary of nodes supported by each method that are consistent with the initial hypothesis shown in Fig. 2.

Method                         Phylogenetic analyses   No. nodes   A posteriori phylogenetic analyses   No. nodes
Total evidence                 TE70                    4           TE55                                 5
Taxonomic congruence           TC70                    4           TC65                                 5
Conditional data combination   CDC95                   5           CDC75                                5
Compatible evidence            CE95-70                 5           CE75-75                              7
Conditional Data Combination Analysis. For CDC we partitioned the data into process partitions, tested each partition for homogeneity, pooled independent combinable partitions, and constructed a tree. The head and nonhead morphological partitions, the domain II and domain III 12S rRNA partitions, and the cytochrome b amino acid partition have PTP structure (Table 1) and are not statistically incongruent (P = 0.01, 1,159 steps). As described earlier, we did not include the cytochrome b third-position transversion partition. The CDC procedure resolves more nodes than either TE or TC at the 95%, 70%, and 50% bootstrap consensus levels (Figs. 4, 7). However, only at the 95% bootstrap consensus (CDC95) are all nodes supported by the initial hypothesis. Eight of the nine nodes supported in the CDC50 tree, including monophyly of Staphylinidae + Scydmaenidae + Silphidae, were present in, or consistent with, the original morphology-based hypothesis (Figs. 2, 4). The ninth node, (Oiceoptoma, Tachinus), was not included in the initial hypothesis. T-PTP tests of nonmonophyly suggest that we cannot reject monophyly of the initially hypothesized nodes of (Oiceoptoma, Ontholestes, Achenomorphus, Oxyporus, Opresus, Euconnus, Dianous, Edaphus) (T-PTP = 0.98, 13 steps) or (Tachinus, Habrocerus) (T-PTP = 0.17, 14 steps). Adult, larval, and pupal derived characters not included in this analysis (extraoral digestion by adults and larvae; four pairs of functional pupal spiracles) unite Oiceoptoma with the rest of the Staphylinine Group (here, Edaphus through Oiceoptoma in Fig. 2). Derived morphological characters of larvae, including deciduous urogomphi and an epipharynx with asymmetrical setose fields (Ashe and Newton, 1993; Newton, unpublished observations), and semilimuloid adult body shape unite Tachinus with the Tachyporine Group (here, Tachinus and Habrocerus).

FIGURE 6. Taxonomic congruence trees constructed by combinable-component consensus of six partitions. The roman numerals in the circles represent the number of partitions supporting that node. (a) Constructed from the 95% bootstrap consensus. (b) Constructed from the 70% bootstrap consensus. (c) Constructed from the 50% bootstrap consensus.
Compatible Evidence Analysis. For CE we divided the data into process partitions, tested each discrete partition for hierarchical structure, tested independent partitions having hierarchical structure for homogeneity, pooled combinable partitions, and constructed a tree. The CE constraint was based upon the 95% CDC topology (CE95) (Fig. 7). The 70% and 50% CDC topologies contain a node that conflicts with the initial hypothesis. We then analyzed the characters in each partition separately under the constraint generated from the combined analysis, and constructed a combinable-component consensus tree. No additional nodes were supported by a 95% bootstrap of each partition (CE95-95), and the combinable-component consensus trees that summarize the topological features shared among the CE95-70 and CE95-50 bootstrap consensus trees are presented (Fig. 8). CE95-70 includes three constrained and two unconstrained nodes that are congruent or consistent with the initial hypothesis (Table 2). The nonhead morphological and the 12S rRNA domain III loop partitions support monophyly of the ingroup excluding Neopelatops, while
FIGURE 7. Conditional data combination 50% bootstrap consensus tree (bootstrap percentages are in circles). There were 547 characters (283 informative) in six partitions: head and nonhead morphological partitions, 12S rRNA domain II and domain III stems and loops partitions, and the cytochrome b amino acid partition. The single most parsimonious tree was 1,297 steps (CI = 0.42).
the head partition supports monophyly of (Amphichroum, Eusphalerum) (Figs. 4, 8a). In CE95-50 the nonhead, domain II, domain III loop, and amino acid partitions support monophyly of the ingroup excluding Neopelatops (Figs. 4, 8b). The head partition supports monophyly of (Amphichroum, Eusphalerum). In conflict with the initial hypothesis (Fig. 2), the amino acid partition supports monophyly of (Cyparium, Habrocerus) (Figs. 4, 8b). As discussed earlier, one explanation for this result is that Habrocerus has an elevated number of isoleucines in the amino acid partition.

A Posteriori Phylogenetic Analysis

Maximally resolved trees consistent with the initial hypothesis. Following Miyamoto et al. (1994) and Wheeler (1995), we prefer the method that supports the highest number of
FIGURE 8. For CE we analyzed each partition separately under the constraint generated from the 95% CDC analysis. The constrained nodes are denoted by starred circles. We then constructed a combinable-component consensus tree. The roman numeral in each circle represents the number of partitions supporting that node. (a) Supported by 70% bootstrap consensus. (b) Supported by 50% bootstrap consensus.
nodes congruent with the initial hypothesis. CE75-75 supports the complete set of six nodes supported by the three other methods and (Dasycerus (Conoplectus, Palimbolus)) (Fig. 9, Table 2). A posteriori analysis demonstrated that the result of the constrained bootstrap analysis is linked with the prior unconstrained analysis. Three constrained and two unconstrained nodes included in the CE95-70 bootstrap consensus are consistent with our initial hypothesis. However, decreasing the unconstrained bootstrap consensus threshold by 20% (from 95% to 75%) necessitated a 5% increase in the constrained bootstrap consensus (from 70% to 75%) to prevent conflict with the initial hypothesis. CE75-75 analysis produces five constrained and two unconstrained nodes, but the (Amphichroum, Eusphalerum) node supported in CE95-70 was not among these. We suggest that CE75-75 may be appropriate for future studies.

Tree lengths. The total evidence data set contained all the morphological data and all
the alignable 12S rRNA and cytochrome b sequence data. This consensus tree was 2,143 steps in length, one step shorter than the TE55/CDC75, TC65, and CE75-75 trees (Table 3). The one-step difference occurs because (Opresus, Euconnus) does not occur in any of the 2,143-step trees; it does occur in four of the eleven 2,144-step trees and is supported by a 59% bootstrap consensus of the combined data (Fig. 3b). We tested whether the TE55/CDC75, TC65, and CE75-75 trees were significantly different in length. The compare-2 PTP test suggests that there is no statistical difference between the TE55/CDC75 topology and the TC65 and CE75-75 topologies (head morphological PTP = 0.30 and 0.36, nonhead morphological PTP = 0.82 and 0.91, domain II PTP = 0.82 and 0.91, domain III stems PTP = 0.80 and 0.75, domain III loops PTP = 0.30 and 0.24, cytochrome b amino acids PTP = 0.92 and 0.98). The Templeton test (1983) supports the conclusion that there are no significant differences among the three topologies in any partition. An alternative approach would be to calculate the tree length difference between the TE tree topology and each a posteriori tree topology for the total data set and for each of the homogeneous partitions. We present the tree length differences for these tests (Table 3), but we suggest that these data should be treated with caution. The most parsimonious tree for each partition is not constant and the TE topology has less potential for character optimization within each partition (Table 3).

Molecular Analysis

Saturation plots. To investigate further the informativeness of each locus, we plotted the number of substitutions against an HKY (Hasegawa et al., 1985) estimate of relative divergence. These plots demonstrate three main points. First, these data from 12S rRNA and cytochrome b support the hypothesis that the ingroup had an explosive radiation.
The pairwise divergence values between Necrophilus and each ingroup taxon fall within the scatter of ingroup divergences (Figs. 10, 11). If all branches were short and of equal length it
FIGURE 9. Comparison of the a posteriori support for specific nodes. The tree representing the initial hypothesis is shown in Fig. 2. The bootstrap support or the number of independent partitions supporting a node is denoted by shading. Here the lowest bootstrap percentage, rounded up to the nearest 5%, that generated a topology consistent with the initial hypothesis is presented. Neopelatops is not included. H0 = initial hypothesis; TE55 = total evidence derived from a 55% bootstrap consensus; TC65 = taxonomic congruence, derived from 65% bootstrap support and constructed by combinable-component consensus; CDC75 = conditional data combination derived from a 75% bootstrap consensus; CE75-75 = compatible evidence derived from a 75% bootstrap consensus tree followed by a combinable-component 75% bootstrap consensus of each partition. Note that (Conoplectus, Palimbolus) is nested within (Dasycerus, Conoplectus, Palimbolus) in CE75-75.
TABLE 3. Increase in the number of steps from the minimum tree length.

                       Minimum tree   TE (a)               TE55/CDC75 (b)       TC65                 CE75-75
Data set               (steps)        (additional steps)   (additional steps)   (additional steps)   (additional steps)
Total evidence         2,143          0/4                  1                    1                    1
Morphology
  Head                 145            28/25                4                    7                    12
  Nonhead              200            22/18                0                    0                    0
12S rRNA
  Domain II            72             17/16                5                    5                    6
  Domain III stems     283            33/29                14                   13                   19
  Domain III loops     175            18/19                5                    8                    11
Cytochrome b
  Amino acids          284            30/31                13                   12                   15

(a) To facilitate comparison we calculate TE using all data (before the slash) and the six congruent partitions with hierarchical structure (after the slash).
(b) These trees have the same topology.
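The "additional steps" entries in Table 3 are differences between the length of a data partition on a fixed topology and that partition's minimum length. As a minimal sketch of the step counting involved, the code below applies Fitch's algorithm to a rooted binary tree, assuming unordered, equally weighted characters with no missing data. The tree encoding, function names, taxon labels W to Z, and the three-character matrix are all hypothetical; the analyses in the paper were performed in PAUP*, whose settings may differ.

```python
def fitch_length(tree, states):
    """Count the minimum number of changes for one unordered character.

    tree   -- nested 2-tuples of taxon names (a rooted binary tree)
    states -- dict mapping taxon name to its character state
    """
    def walk(node):
        if isinstance(node, str):
            return {states[node]}, 0
        left_set, left_steps = walk(node[0])
        right_set, right_steps = walk(node[1])
        shared = left_set & right_set
        if shared:
            # Children agree on at least one state: no extra step here.
            return shared, left_steps + right_steps
        # Children disagree: take the union and count one change.
        return left_set | right_set, left_steps + right_steps + 1

    return walk(tree)[1]


def tree_length(tree, matrix):
    """Sum Fitch lengths over all characters (columns) of a matrix.

    matrix -- dict mapping taxon name to a string of states, one per character
    """
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch_length(tree, {taxon: row[i] for taxon, row in matrix.items()})
        for i in range(n_chars)
    )


# Hypothetical four-taxon matrix: two characters favor (W, X), one favors (W, Y).
matrix = {"W": "111", "X": "110", "Y": "001", "Z": "000"}
tree_a = (("W", "X"), ("Y", "Z"))
tree_b = (("W", "Y"), ("X", "Z"))

length_a = tree_length(tree_a, matrix)
length_b = tree_length(tree_b, matrix)
additional_steps = length_b - min(length_a, length_b)
```

Here tree_a needs 4 steps and tree_b needs 5, so tree_b costs one additional step, which is the kind of quantity tabulated above.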
FIGURE 10. Divergence between taxa plotted against an estimate of time since divergence for the 12S rRNA data. The 14 domain IV characters are not plotted because the three substitutions were not parsimony-informative. (a) Domain II. (b) Domain III stem transitions. (c) Domain III stem transversions. (d) Domain III loop transitions. (e) Domain III loop transversions.
FIGURE 11. Divergence between taxa plotted against an estimate of divergence time for the cytochrome b data. (a) First codon positions. (b) Second codon positions. (c) Third codon positions. (d) Third codon transitions. (e) Third codon transversions.
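Several panels in Figs. 10 and 11 separate transitions from transversions. The tally behind such panels can be sketched as follows, assuming pre-aligned sequences and skipping any site with a gap or ambiguity code. The sketch produces only the raw pairwise counts; it does not compute the HKY divergence estimate used on the other axis. The function names and the three short sequences are invented for illustration.

```python
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def count_ts_tv(seq1, seq2):
    """Count transitions and transversions between two aligned sequences.

    U is treated as T. Sites where either sequence has a gap or an
    ambiguity code are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    s1 = seq1.upper().replace("U", "T")
    s2 = seq2.upper().replace("U", "T")
    transitions = transversions = 0
    for a, b in zip(s1, s2):
        if a == b or a not in BASES or b not in BASES:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # purine <-> purine or pyrimidine <-> pyrimidine
        else:
            transversions += 1  # purine <-> pyrimidine
    return transitions, transversions


def pairwise_counts(alignment):
    """Return {(taxon_x, taxon_y): (transitions, transversions)} for all pairs."""
    return {
        (x, y): count_ts_tv(alignment[x], alignment[y])
        for x, y in combinations(sorted(alignment), 2)
    }


# Hypothetical toy alignment, not data from this study.
alignment = {
    "taxon1": "ACGTACGT",
    "taxon2": "GCGTATGA",
    "taxon3": "AC-TACGN",
}
counts = pairwise_counts(alignment)
```

Plotting each pair's count against its estimated divergence would then give one point per taxon pair, as in the figures.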
may be expected that the pairwise divergence values between the outgroup Necrophilus and all ingroup taxa would cluster in the top right of each plot. Second, divergences in all partitions show some leveling. This suggests that each partition is saturated at least to some degree (Figs. 10, 11). Third, proportionately more transversions than transitions appear to occur in the 12S rRNA domain III stems (Fig. 10c vs. 10b), loops (Fig. 10e vs. 10d) and cytochrome b third codon positions (Fig. 11e vs. 11d). In contrast to these data, Springer and Douzery (1996) noted more 12S rRNA transitions than transversions in mammals. This suggests that different evolutionary forces may be acting in insect and mammal mtDNA. A less likely explanation is that the difference may be due to our difficulty in accurately estimating divergence time. We employed the HKY model (Hasegawa et al., 1985) with gamma-distributed rates. In contrast, Springer and Douzery (1996) estimated divergence times based on the more complete mammal fossil record and on single-copy DNA hybridization molecular clocks.

Substitution biases. This substitution bias in the 12S rRNA stems retains stem stability. Variable characters in 12S rRNA stems have significantly more C’s and fewer U’s than all positions combined (Table 4). There is also the trend to have more G’s and fewer A’s in variable positions than all positions combined.
Data presented here are consistent with the hypothesis of Ballard and Kreitman (1994) that changes to C or G are slightly deleterious in insect mtDNA. In the less constrained regions there are significantly more A’s and U’s or T’s, and fewer C’s and G’s, in variable positions than in all positions combined. Variable positions in 12S rRNA domain III loops have significantly more A’s and U’s and significantly fewer C’s and G’s than in all positions (Table 4). Further, there are significantly more A’s and T’s and fewer C’s and G’s in variable first- and third-codon positions than in all the positions. Substitution heterogeneity also exists in the domain II 12S rRNA and cytochrome b second-position partitions. Additional data are required to test the generality of this result. Domain II variable positions show significantly more A’s and C’s and fewer U’s. There is no significant difference in the numbers of G’s. In the variable cytochrome b second positions there are significantly more A’s and fewer G’s, while the numbers of C’s and T’s do not differ significantly.

DISCUSSION

CE is a robust method for analyzing multiple data sets. CE sets a base of nodes obtained by CDC analysis and then investigates what further agreement may arise in a universe where these nodes are given. Thus, the CDC tree defines the minimum resolution of the CE topology. Congruence analyses are then employed to investigate the potential for additional phylogenetic resolution. Establishing
TABLE 4. Investigation of substitutional biases.

                                   Z (a)
Data set               A         C         G         T or U
12S rRNA
  Domain II            3.75*     4.10*     -1.47     -4.07*
  Domain III stems     -1.41     4.11*     1.38      -3.07*
  Domain III loops     3.62*     -4.11*    -3.91*    4.04*
Cytochrome b
  First position       4.07*     -4.04*    -4.02*    2.04*
  Second position      4.11*     -1.38     -4.11*    -0.6
  Third position       4.12*     -4.11*    -4.11*    4.12*

(a) Substitution biases were investigated by comparing the proportion of nucleotides that occurred in the variable positions compared to all positions using the Wilcoxon signed ranks test. The direction of change is represented by a positive (higher number in variable positions) or a negative value (higher number in all positions combined). *Denotes significance at 0.05.
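A Z score of the kind reported in Table 4 can be sketched with the usual large-sample normal approximation to the Wilcoxon signed ranks statistic. The sketch below assumes the paired observations are per-taxon nucleotide proportions (variable positions versus all positions); that pairing is an assumption, since the table note does not spell it out. It omits tie and continuity corrections, so it need not match any particular statistics package, and the function name and example proportions are hypothetical.

```python
import math


def wilcoxon_signed_rank_z(variable, overall):
    """Normal-approximation Z for paired samples (no tie or continuity correction).

    Positive Z means values tend to be larger in `variable` than in `overall`.
    Pairs with a zero difference are dropped.
    """
    diffs = [v - o for v, o in zip(variable, overall) if v != o]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Rank absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (w_plus - mean) / sd


# Hypothetical proportions of one nucleotide in five taxa.
variable_positions = [0.40, 0.42, 0.38, 0.45, 0.41]
all_positions = [0.30, 0.31, 0.30, 0.33, 0.32]
z = wilcoxon_signed_rank_z(variable_positions, all_positions)
```

Under this formula, 22 pairs that all differ in the same direction give |Z| of about 4.11, the value that recurs in Table 4. That is suggestive of, but does not confirm, per-taxon pairing.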
such constraints should preclude the problem raised by Barrett et al. (1991) that a consensus tree can contradict the tree inferred from pooled data. In this study CE75-75 supported more nodes that are consistent with the initial hypothesis than any other procedure. We suggest that these bootstrap proportions may be appropriate for future studies that include multiple data sets.

CDC, TC, and CE partition the data into evolutionary process partitions (Bull et al., 1993; Miyamoto and Fitch, 1995). In this study 11 process partitions were defined. We defined two morphological partitions because we expected that selection on feeding mode may influence the evolution of head characters. We defined four 12S rRNA partitions: domain II, domain III stems, domain III loops, and domain IV. It is likely that the evolutionary forces acting on stems and loops differ. However, this simple partitioning may be inadequate. Hickson et al. (1996) suggested that patterns of conservation of and variability in the paired and unpaired regions make differential weighting in terms of stems and loops unsatisfactory. We investigated a total of five partitions in the cytochrome b locus. These partitions are not all independent. It is methodologically tractable under the criterion of parsimony to partition the cytochrome b locus into first, second, and third codon positions, third-position transversions, and amino acids. However, all but the last are heterogeneous mixtures of synonymous and nonsynonymous changes. In this study we employed the cytochrome b amino acid partition in the TC, CDC, and CE analyses.

There are at least three reasons why partitions may appear to evolve differently. The signal may be swamped by homoplasy, methods for investigating whether process partitions should be combined may be inadequate, or distinct and conflicting processes may be operating.
Testing for hierarchical structure should test the first element while investigating the homogeneity of data partitions should test the second. We tested for hierarchical structure using the PTP test and for incongruence using the incongruence length difference test of Farris et al. (1995). Cunningham (1997) found that the partition homogeneity test distinguished between cases where combining data generally improved phylogenetic accuracy and cases where accuracy of the combined data suffered relative to the individual partitions. However, further investigation of the robustness of alternative tests is required (Hillis, 1991; Huelsenbeck, 1991; Källersjö et al., 1992; Huelsenbeck and Bull, 1996; Lyons-Weiler et al., 1996).

Partitioning data may also be advantageous because it (1) assists the investigation of evolutionary forces, (2) facilitates critical exploration of the data, and (3) permits critical examination of methods for presenting data. However, not all process partitions are both self-defining and methodologically tractable. As a result, the validity of some partitions may be the subject of debate. In the extreme case, defining each character as a partition is not recommended (Miyamoto and Fitch, 1995).

We investigated substitution biases in each molecular partition by comparing the proportion of nucleotides that occurred in the variable positions to the proportion in all positions. Our results are consistent with the notion that there is a substitution bias to maintain secondary structure. Support for (Opresus, Edaphus) in the domain III stems partition that conflicts with the initial hypothesis is at least partially due to compensatory changes that maintain stem stability. In the less constrained regions of the variable 12S rRNA loops and the variable cytochrome b first and third codon positions there are significantly fewer C’s and G’s and more A’s and T’s. These data are consistent with the hypothesis of Ballard and Kreitman (1994) that changes to C or G are slightly deleterious at less constrained sites in the mitochondrial genome of Drosophila.

Plots of divergence against time are employed as a heuristic guideline for the onset of saturation. These plots clearly show that the outgroup-ingroup divergences cluster within the ingroup-ingroup comparisons (Figs. 10, 11).
One explanation for this result is a massive radiation of staphylinid beetles, as has been hypothesized in previous morphological studies (Newton and Thayer, 1995). An alternative explanation is that an inappropriate outgroup was defined. Our attempts to address this latter question by sequencing three other outgroup beetle taxa (two species of Tribolium [Tenebrionidae] and one of Cicindela [Carabidae]) for both 12S rRNA and cytochrome b
(data not presented) failed to resolve the problem. The pairwise HKY distance estimates fell within the cluster of ingroup-ingroup divergences. Further, the outgroup rooted on a long ingroup branch, suggesting that the sequences were effectively randomized (Swofford et al., 1996a).

A partition is not considered to be saturated if the number of substitutions is linearly related to time since divergence. However, it is not clear how this should be calculated or when this is likely to be observed. Calculations of linear regressions and hyperbolic curves between transversions and transitions assume that the pairwise distance estimates are independent, an assumption violated because the comparisons share taxa, and thus researchers are left to infer saturation. In this study transitions appeared to be saturated by 8% divergence. In contrast to these data, Hackett (1996) showed that for tanagers in the genus Ramphocelus (Aves) third-position transitions at the ND2 locus exhibit signs of saturation by about 12% sequence divergence. Friedlander et al. (1994) suggested that below about 20–30% divergence the pairwise divergence should relate to the number of informative characters.

CE analysis of the constrained homogeneous partitions with hierarchical structure gives researchers flexibility to choose methods and/or models appropriate for each partition. In this study Habrocerus exhibited an elevated number of isoleucines. This may bias parsimony in the amino acid partition. Naylor and Brown (1997) noted that isoleucine and valine produced the poorest fit to a mitochondrial genealogy of 19 taxa whose interrelationships are widely accepted. A bias is introduced when the evolutionary process does not satisfy the assumptions crucial to a particular method. If the bias becomes sufficiently great, it may override the support for the true phylogeny and lead the researcher to an incorrect conclusion. In these cases the method is said to be positively misleading or inconsistent (Felsenstein, 1978).
Future studies of staphylinoid systematics that include cytochrome b data should include additional taxa within the Tachyporine Group (Ashe and Newton, 1993) to break the long branch to Habrocerus.

CE facilitates the identification of conflicting biological signals between partitions. In this study, partitioning the morphological data
into head and nonhead characters resulted in conflict over the phylogenetic affinities of (Conoplectus, Palimbolus) (Fig. 4). It is likely that this conflict is related to selection on predaceous mouthpart characters in the head partition. Decay indices, compare-2 tests, and constrained analyses support (Dasycerus (Conoplectus, Palimbolus)) and (Dianous (Edaphus (Opresus, Euconnus))). To the extent that they have been specified, characters used previously to justify a relationship between Pselaphinae (represented here by Conoplectus and Palimbolus) and a subset (here Edaphus, Dianous, Opresus, and Euconnus) of the Staphylinine Group have been a few mouthpart structures and vague aspects of adult body form (especially coxal structure). We deem these likely to be subject to convergent selection in response to similarities in microhabitat and feeding habits (predation on microarthropods in forest litter), particularly in light of the findings of Newton and Thayer (1995) that morphological features of diverse body parts support monophyly of both the Omaliine Group (here Dropephylla, Eusphalerum, Amphichroum, Conoplectus, and Palimbolus) and the Staphylinine Group (here Oxyporus, Ontholestes, Achenomorphus, Edaphus, Dianous, Opresus, Euconnus, and perhaps Oiceoptoma). Future analyses should include additional independent morphological partitions such as external characters of other life stages or internal characters (e.g., larvae, Thayer, 1985; ovariole structure, Welch, 1993).

Despite the advantages of CE there are at least two potential problems that must be addressed before the technique can be routinely employed. Specifically, CE analysis may accept a tree that is not the most parsimonious, and it may have a biased estimator of nodal support.

CE may influence character compatibility if the constrained nodes are not present in the most parsimonious reconstruction of the homogeneous data partitions with hierarchical structure.
This means the constrained patterns may not be congruent with those of global parsimony (Maddison et al., 1984). Under the procedure employed here, CDC generates the most parsimonious tree for combinable process partitions with hierarchical structure. CE then performs a second analysis in a universe where
the well-supported nodes are constrained. The principle of this analysis is similar to the two-step procedure of Maddison et al. (1984) and is consistent with the tenets of meta-analysis. When all the data were included in this study, the TE topology was one step shorter than the TE55, TC65, and CE75-75 topologies. When the congruent partitions with hierarchical structure were analyzed independently, the TE55 and CDC75 topology was shorter than the CE75-75 topology in five of the six partitions. However, the compare-2 PTP and Templeton tests suggest that the increase in tree length that results from constraining nodes in CE75-75 is not greater than that attributable to chance. Constraining nodes for subsequent phylogenetic analysis is not a new idea. Moritz et al. (1992) used a step matrix to discount transitions at first and third codon positions, then used the shortest topology to constrain the basal branching order in an analysis that heavily weighted transversions over transitions.

CE may have a biased estimator of nodal support. In this study we compared the initial hypothesis to each of the four methods using the bootstrap proportions of 95%, 70%, and 50%. We then calculated the minimum bootstrap percentage that remained compatible with the initial hypothesis. This procedure demonstrated that the results of the constrained CE bootstrap analysis were intimately linked with the unconstrained analysis. Three constrained and two unconstrained nodes supported by a CE95-70 bootstrap consensus support our initial hypothesis. To prevent conflict with this hypothesis, the a posteriori analysis indicated that decreasing the unconstrained bootstrap consensus by 20% to CE-75 necessitated a 5% increase in the constrained bootstrap analysis to CE75-75. Further research, possibly through simulation, is required to determine the utility of CE75-75.
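The interplay between the two thresholds can be illustrated with a small bookkeeping sketch. It assumes the tree searches have already been run elsewhere and only bootstrap tables remain: one table for the combined (CDC) analysis and one per partition, the latter assumed to come from searches constrained to the CDC nodes at the first threshold. The function names, the single-letter taxa, and every bootstrap percentage below are invented for illustration; this is not the authors' implementation.

```python
def compatible(a, b):
    """Two rooted clades are compatible if they are disjoint or nested."""
    return a.isdisjoint(b) or a <= b or b <= a


def clades_at(support, threshold):
    """Clades whose bootstrap percentage meets the threshold."""
    return {clade for clade, pct in support.items() if pct >= threshold}


def compatible_evidence(cdc_support, partition_supports, first, second):
    """Assemble a CE<first>-<second> node set from bootstrap tables.

    cdc_support        -- {clade: bootstrap %} from the combined analysis
    partition_supports -- list of {clade: bootstrap %}, one per partition
    Returns (constrained nodes, additional uncontradicted nodes).
    """
    constrained = clades_at(cdc_support, first)
    per_partition = [clades_at(s, second) for s in partition_supports]
    candidates = set().union(*per_partition) - constrained
    pool = per_partition + [constrained]
    unconstrained = {
        clade
        for clade in candidates
        if all(compatible(clade, other) for tree in pool for other in tree)
    }
    return constrained, unconstrained


# Hypothetical bootstrap tables; frozenset("AB") is the clade of taxa A and B.
cdc = {frozenset("AB"): 98, frozenset("ABC"): 80, frozenset("DE"): 60}
partitions = [
    {frozenset("AB"): 100, frozenset("ABC"): 100, frozenset("FG"): 78},
    {frozenset("AB"): 100, frozenset("ABC"): 100, frozenset("HI"): 76,
     frozenset("GH"): 60},
]
constrained_75, extra_75 = compatible_evidence(cdc, partitions, 75, 75)
_, extra_50 = compatible_evidence(cdc, partitions, 75, 50)
```

In this toy case the 75% second-stage threshold adds two uncontradicted clades, whereas lowering it to 50% admits a third clade that overlaps both and leaves none uncontradicted. This mirrors, in miniature, why the choice of one threshold constrains the other.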
However, it may be predicted that CE may perform well when there are correlated characters in one part of the tree but not in another or when one or more partitions have missing data. In this study we have shown that CE constrains a set of well-supported CDC nodes. If we are willing to accept this constrained topology and the partitions employed to generate this topology resolve further
387
nodes, we suggest that this represents strong evidence supporting these additional nodes. M ethodologically, CE appears to have advantages over the other procedures discussed. First, it sets a base of nodes obtained by CDC analysis and then investigates what further agreement may arise in a universe where these nodes are given. In this study CE75-75 reso lves more nodes than any other procedure. Second, CE investigates disagreements between partitions in a universe where these nodes are given. In this study, this facilitated investig ation of con¯ icts between partitions. Thus, CE enables researchers to both generate a well-corroborated tree and investigate character con¯ icts. A CKNO W LEDGM ENTS W e thank the Field M useum Evolution Discussio n G roup, Kirrie Ballard, Dan Faith, Anna G raybeal, Larry Hedges, David M addison, Janet V oight, Peter W agner, and three anonymous reviewers for their comments and discussion. W e thank D avid Swo ord for permissio n to use PAUP*. T his work is based upon molecular data collected in the Field M useum’s Biochemistry Laboratories, renovated with support from the Natio nal Science Foundation (g rant ST I-921446 ) and operated with support by an endowment from the Pritzker Foundation. DNA sequence data were collected on an Automated DNA sequencer purchased with funds granted to the Field M useum by the Natio nal Science Foundation (G rant BIR941973 2). Funds for the biochemical work were provided in part by the Field M useum as part of J.W .O .B’s start-up funds. Fieldwork by A.F.N. and M .K.T . was supported by the M arshall Field Fund of the Field M useum.
REFERENCES
Archie, J. W. 1989. A randomization test for phylogenetic information in systematic data. Syst. Zool. 38:219–252.
Arnol’di, L. V., V. V. Zherikhin, L. M. Nikritin, and A. G. Ponomarenko. 1977. [Mesozoic beetles.] Tr. Paleontol. Inst. Akad. Nauk SSR, 161:1–204 (in Russian). [Translation: Vandenberg, N. J. 1990. Mesozoic Coleoptera. Smithsonian Institution Libraries and National Science Foundation, Washington, D.C.].
Ashe, J. S., and A. F. Newton, Jr. 1993. Larvae of Trichophya and phylogeny of the tachyporine group of subfamilies (Coleoptera: Staphylinidae) with a review, new species and characterization of the Trichophyinae. Syst. Entomol. 18:267–286.
Ballard, J. W. O. 1994. Phylogenetic affinities of Axinia (Diptera: Axiniidae) as inferred from 12S rRNA sequences pp. 528–534. Appendix to: D. H. Colless, New family of muscoid Diptera, with 16 new species in four new genera (Diptera: Axiniidae). Invert. Zool. 8:471–534.
Ballard, J. W. O., and M. Kreitman. 1994. Unraveling selection in the mitochondrial genome of Drosophila. Genetics 138:757–772.
Barrett, M., M. J. Donoghue, and E. Sober. 1991. Against consensus. Syst. Zool. 40:486–493.
Bremer, K. 1988. The limits of amino acid sequence data in angiosperm phylogenetic reconstruction. Evolution 42:795–803.
Bremer, K. 1990. Combinable component consensus. Cladistics 6:369–372.
Bull, J. J., J. P. Huelsenbeck, C. W. Cunningham, D. L. Swofford, and P. J. Waddell. 1993. Partitioning and combining data in phylogenetic analysis. Syst. Biol. 42:384–397.
Carpenter, J. M. 1992. Random cladistics. Cladistics 8:147–153.
Chippindale, P. T., and J. J. Wiens. 1994. Weighting, partitioning, and combining characters in phylogenetic analysis. Syst. Biol. 43:278–287.
Crowson, R. A. 1975. The evolutionary history of Coleoptera, as documented by fossil and comparative evidence. Atti Congr. Naz. Ital. Entomol. 10:47–90.
Crozier, R. H. 1992. The cytochrome b and ATPase genes of honeybee mitochondrial DNA. Mol. Biol. Evol. 9:474–482.
Cunningham, C. W. 1997. Can three incongruence tests predict when data should be combined? Mol. Biol. Evol. 14:733–740.
de Queiroz, A. 1993. For consensus (sometimes). Syst. Biol. 42:368–372.
Dickersin, K., and J. A. Berlin. 1992. Meta-analysis: State of the science. Epidemiol. Rev. 14:154–176.
Donoghue, M. J., R. G. Olmstead, J. F. Smith, and J. D. Palmer. 1992. Phylogenetic relationships of Dipsacales based on rbcL sequences. Ann. Mo. Bot. Gard. 79:249–265.
Efron, B. 1982. The jackknife, the bootstrap, and other resampling plans. Conf. Board Math. Sci. Soc. Ind. Appl. Math. 38:1–92.
Faith, D. P. 1990. Chance marsupial relationships. Nature 345:393.
Faith, D. P. 1991. Cladistic permutation tests for monophyly and nonmonophyly. Syst. Zool. 40:366–375.
Faith, D. P. 1992. On corroboration: A reply to Carpenter. Cladistics 8:265–273.
Faith, D. P., and P. Cranston. 1991. Could a cladogram this short have arisen by chance alone?: On permutation tests for cladistic structure. Cladistics 7:1–28.
Faith, D. P., and J. W. H. Trueman. 1996. When the topology-dependent permutation test (T-PTP) for monophyly returns significant support for monophyly, should that be equated with (a) rejecting a null hypothesis of nonmonophyly, (b) rejecting a null hypothesis of "no structure," (c) failing to falsify a hypothesis of monophyly, or (d) none of the above? Syst. Biol. 45:580–586.
Farris, J. S., M. Källersjö, A. Kluge, and C. Bult. 1995. Testing significance of incongruence. Cladistics 10:315–319.
Felsenstein, J. 1978. Cases in which parsimony and compatibility methods will be positively misleading. Syst. Zool. 27:401–410.
Felsenstein, J. 1985. Confidence limits on phylogenies: An approach using the bootstrap. Evolution 39:783–791.
Felsenstein, J., and H. Kishino. 1993. Is there something wrong with the bootstrap on phylogenies? A reply to Hillis and Bull. Syst. Biol. 42:193–200.
Fraser, N. C., D. A. Grimaldi, P. E. Olson, and B. Axsmith. 1996. A Triassic Lagerstätte from eastern North America. Nature 380:615–619.
Friedlander, T. P., J. C. Regier, and C. Mitter. 1994. Phylogenetic information content of five nuclear gene sequences in animals: Initial assessment of character sets from concordance and divergence studies. Syst. Biol. 43:511–525.
Glass, G. V. 1976. Primary, secondary and meta-analysis of research. Educ. Res. 5:3–8.
Hackett, S. J. 1996. Molecular phylogenetics and biogeography of tanagers in the genus Ramphocelus. Mol. Phylogenet. Evol. 5:368–382.
Hasegawa, M., H. Kishino, and T. Yano. 1985. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22:160–174.
Hedges, L. V., and I. Olkin. 1985. Statistical methods for meta-analysis. Academic Press, Orlando, Florida.
Hickson, R. E., C. Simon, A. Cooper, G. S. Spicer, J. Sullivan, and D. Penny. 1996. Conserved sequence motifs, alignment, and secondary structure for the third domain of animal 12S rRNA. Mol. Biol. Evol. 13:150–169.
Higgins, D. G., and R. M. Sharp. 1988. CLUSTAL, a package for performing multiple sequence alignment on a microcomputer. Gene 73:237–244.
Hillis, D. M. 1987. Molecular versus morphological approaches to systematics. Annu. Rev. Ecol. Syst. 18:23–42.
Hillis, D. M. 1991. Discriminating between phylogenetic signal and random noise in DNA sequences. Pages 278–294 in Phylogenetic analysis of DNA sequences (M. M. Miyamoto and J. Cracraft, eds.). Oxford Univ. Press, New York.
Hillis, D. M., and J. J. Bull. 1993. An empirical test of bootstrapping as a method for assessing confidence in phylogenetic analysis. Syst. Biol. 42:182–192.
Huelsenbeck, J. P. 1991. Tree-length distribution skewness: An indicator of phylogenetic information. Syst. Zool. 40:257–270.
Huelsenbeck, J. P. 1995. Performance of phylogenetic methods in simulation. Syst. Biol. 44:17–48.
Huelsenbeck, J. P., and J. J. Bull. 1996. A likelihood ratio test to detect conflicting phylogenetic signal. Syst. Biol. 45:92–98.
Jones, T. R., A. G. Kluge, and A. J. Wolf. 1993. When theories and methodologies clash: A phylogenetic reanalysis of the North American ambystomatid salamanders (Caudata: Ambystomatidae). Syst. Biol. 42:92–102.
Källersjö, M., J. S. Farris, A. G. Kluge, and C. Bult. 1992. Skewness permutation. Cladistics 8:275–287.
Kluge, A. G. 1989. A concern for evidence and a phylogenetic hypothesis of relationships among Epicrates (Boidae, Serpentes). Syst. Zool. 38:7–25.
Kocher, T. D., W. K. Thomas, A. Meyer, S. V. Edwards, S. Pääbo, F. X. Villablanca, and A. C. Wilson. 1989. Dynamics of mitochondrial DNA evolution in animals: Amplification and sequencing with conserved primers. Proc. Natl. Acad. Sci. USA 86:6196–6200.
Kumar, S., K. Tamura, and M. Nei. 1993. MEGA: Molecular evolutionary genetics analysis, version 1.02. Pennsylvania State Univ., University Park.
Lanyon, S. M. 1985. Detecting internal inconsistencies in distance data. Syst. Zool. 34:397–403.
Lanyon, S. M. 1993. Phylogenetic frameworks: Towards a firmer foundation for the comparative approach. Biol. J. Linn. Soc. 49:45–61.
Lawrence, J. F., and E. B. Britton. 1994. Australian beetles. Melbourne Univ. Press, Carlton, Australia.
Lawrence, J. F., and A. F. Newton, Jr. 1982. Evolution and classification of beetles. Annu. Rev. Ecol. Syst. 13:261–290.
Lawrence, J. F., and A. F. Newton, Jr. 1995. Families and subfamilies of Coleoptera (with selected genera, notes, references and data on family-group names). Pages 779–1006 in Biology, phylogeny and classification of Coleoptera: Papers celebrating the 80th birthday of Roy A. Crowson (J. Pakaluk and S. A. Slipinski, eds.). Muzeum i Instytut Zoologii PAN, Warszawa.
Leschen, R. A. B. 1993. Evolutionary patterns of feeding in selected Staphylinoidea (Coleoptera): Shifts among food textures. Pages 59–104 in Functional morphology of insect feeding (C. W. Schaefer and R. A. B. Leschen, eds.). Entomological Society of America, Lanham, Maryland.
Lyons-Weiler, J., G. A. Hoelzer, and R. J. Tausch. 1996. Relative apparent synapomorphy analysis (RASA). I: The statistical measurement of phylogenetic signal. Mol. Biol. Evol. 13:749–757.
Maddison, W. P., M. J. Donoghue, and D. R. Maddison. 1984. Outgroup analysis and parsimony. Syst. Zool. 33:83–103.
Maddison, W. P., and D. R. Maddison. 1992. MacClade: Analysis of phylogeny and character evolution. Sinauer, Sunderland, Massachusetts.
Mickevich, M. F. 1978. Taxonomic congruence. Syst. Zool. 27:143–158.
Mickevich, M. F., and J. S. Farris. 1981. The implications of congruence in Menidia. Syst. Zool. 30:351–370.
Miyamoto, M. M., M. W. Allard, R. M. Adkins, L. L. Janecek, and R. L. Honeycutt. 1994. A congruence test of reliability using linked mitochondrial DNA sequences. Syst. Biol. 43:236–249.
Miyamoto, M. M., and W. M. Fitch. 1995. Testing species phylogenies and phylogenetic methods with congruence. Syst. Biol. 44:64–76.
Moritz, C., C. J. Schneider, and D. B. Wake. 1992. Evolutionary relationships within the Ensatina eschscholtzii complex confirm the ring species interpretation. Syst. Biol. 41:273–291.
Naylor, G. J. P., and W. M. Brown. 1997. Structural biology and phylogenetic estimation. Nature 388:527–528.
Newton, A. F., Jr. 1991. Coleoptera: Sphaeritidae, Synteliidae, Histeridae, Agyrtidae, Leiodidae, Leptinidae, Micropeplidae, Dasyceridae, Silphidae, Pselaphidae, Scaphidiidae, Scydmaenidae. Pages 324–341, 353–355, 359–365 in Immature insects (F. W. Stehr, ed.), Volume 2. Kendall/Hunt, Dubuque, Iowa.
Newton, A. F., Jr., and M. K. Thayer. 1992. Current classification and family-group names in Staphyliniformia (Coleoptera). Fieldiana Zool., N. S. 67:1–92.
Newton, A. F., Jr., and M. K. Thayer. 1995. Protopselaphinae new subfamily for Protopselaphus new genus from Malaysia, with a phylogenetic analysis and review of the Omaliine Group of Staphylinidae including Pselaphidae (Coleoptera). Pages 219–320 in Biology, phylogeny and classification of Coleoptera: Papers celebrating the 80th birthday of Roy A. Crowson (J. Pakaluk and S. A. Slipinski, eds.). Muzeum i Instytut Zoologii PAN, Warszawa.
Nixon, K. C., and J. M. Carpenter. 1993. On outgroups. Cladistics 9:413–426.
Ponomarenko, A. G. 1969. [Historical development of the Coleoptera-Archostemata.] Tr. Paleontol. Inst. Akad. Nauk SSR 125:1–240 (in Russian).
Robertson, H. M. 1983. Mating behavior and the evolution of Drosophila mauritiana. Evolution 37:1283–1293.
Rodrigo, A. G., M. Kelly-Borges, P. R. Bergquist, and P. L. Bergquist. 1993. A randomisation test of the null hypothesis that two cladograms are sample estimates of a parametric phylogenetic tree. N. Z. J. Bot. 31:257–268.
Ryvkin, A. B. 1985. [Beetles of the family Staphylinidae from the Jurassic of Transbaikal]. Tr. Paleontol. Inst. Akad. Nauk SSR 211:88–91 (in Russian).
Ryvkin, A. B. 1990. [Staphylinidae]. Tr. Paleontol. Inst. Akad. Nauk SSR 239:52–66 (in Russian).
Satta, Y., and N. Takahata. 1990. Evolution of Drosophila mitochondrial DNA and the history of the melanogaster subgroup. Proc. Natl. Acad. Sci. USA 87:9558–9562.
Simon, C., F. Frati, A. Beckenbach, B. Crespi, H. Liu, and P. Flook. 1994. Evolution, weighting and phylogenetic utility of mitochondrial gene sequences and a compilation of conserved polymerase chain reaction primers. Ann. Entomol. Soc. Am. 87:651–701.
Solignac, M., and M. Monnerot. 1986. Race formation and introgression within Drosophila simulans, D. mauritiana, and D. sechellia inferred from mitochondrial DNA analysis. Evolution 40:531–539.
Springer, M. S., and E. Douzery. 1996. Secondary structure and patterns of evolution among mammalian mitochondrial 12S rRNA molecules. J. Mol. Evol. 43:357–373.
Steel, M., P. J. Lockhart, and D. Penny. 1993. Confidence in evolutionary trees from biological sequence data. Nature 364:440–442.
Sullivan, J. 1996. Combining data with different distributions of among-site rate variation. Syst. Biol. 45:375–380.
Swofford, D. L., G. J. Olsen, P. J. Waddell, and D. M. Hillis. 1996a. Phylogenetic inference. Pages 407–514 in Molecular systematics, 2nd edition (D. M. Hillis, C. Moritz, and B. K. Mable, eds.). Sinauer, Sunderland, Massachusetts.
Swofford, D. L., J. L. Thorne, J. Felsenstein, and B. M. Wiegmann. 1996b. The topology-dependent permutation test for monophyly does not test for monophyly. Syst. Biol. 45:575–579.
Templeton, A. R. 1983. Phylogenetic inference from restriction endonuclease cleavage site maps with particular reference to the humans and apes. Evolution 37:221–244.
Thayer, M. K. 1985. The larva of Brathinus nitidus LeConte and the systematic position of the genus (Coleoptera: Staphylinidae). Coleopts Bull. 39:174–184.
Tikhomirova, A. L. 1968. [Jurassic rove beetles of the Karatau (Coleoptera, Staphylinidae)]. Pages 139–154 in [Jurassic insects of the Karatau] (B. B. Rohdendorf, ed.). Nauka, Moscow, Russia (in Russian).
Welch, R. C. 1993. Ovariole development in Staphylinidae (Coleoptera). Invert. Reprod. Dev. 23:225–234.
Wheeler, W. C. 1995. Sequence alignment, parameter sensitivity, and the phylogenetic analysis of molecular data. Syst. Biol. 44:321–331.
Received 18 February 1997; accepted 9 September 1997
Associate Editor: D. Cannatella
APPENDIX 1
SPECIMENS EXAMINED
All specimens from which morphological and molecular data (listed next) were gathered had been collected and identified by A.F.N. and M.K.T., except as noted. The notation (70%) indicates specimens collected into acid alcohol (70% ethanol with 5% by volume of glacial acetic acid added), then stored in 70% ethanol before DNA extraction; all others were collected and stored in 95% ethanol. A pinned or pointed specimen from each series is marked as a voucher and deposited in the Field Museum.
Necrophilus hydrophiloides Guérin-Méneville – USA: CALIFORNIA: Monterey Co., Los Padres NF, Bottchers Gap, 625 m, Quercus, etc., woodland; 19.iii.1995, in rotting and moldy gilled mushrooms.
Neopelatops n. sp. 1 – AUSTRALIA: TASMANIA: Liffey Forest Reserve, near picnic area, 560 m, site 918, Eucalyptus obliqua forest with rainforest understory; 31.i.1993, pyrethrin-fogging large old Eucalyptus log chunk w/tan slime mold about to fruit.
Dropephylla cacti (Schwarz) (70%) – USA: ARIZONA: Cochise Co., Chiricahua Mts, East Turkey Creek, 1.1 km SE of on USFS road 42, 1,950 m, Quercus-Pinus edulis-Juniperus woodland; 9.iv.1995, on white flowers Ceanothus fendleri 0930 hr.
Eusphalerum sp. – USA: CALIFORNIA: Monterey Co., Los Padres N.F., Skinner Ridge Trail nr. Bottchers Gap, 650 m, Quercus-Arbutus menziesii woodland; 19.iii.1995, on light blue Ceanothus flowers.
Amphichroum sp. – USA: CALIFORNIA: same as preceding.
Megarthrus americanus Sachse – USA: ILLINOIS: Union Co., Shawnee N.F., Little Grand Canyon, 120–200 m, site 967, 15.x.1995, mixed hardwood forest; in rotting gilled mushrooms, A. Newton, M. Thayer, O. Helmy.
Proteinus sp. – USA: ILLINOIS, same as preceding.
Dasycerus angulicollis Horn (70%) – USA: CALIFORNIA: Monterey Co., Los Padres NF, Nacimiento-Fergusson Rd. at pass (11.2 km from California Hwy. 1), 840 m, site 953, Quercus-Arbutus menziesii-Pinus coulteri woodland; 18.iii.1995, FMHD #95–47, berlese, forest leaf and log litter.
Conoplectus canaliculatus (LeConte) – USA: ILLINOIS: Union Co., Shawnee N.F., Little Grand Canyon, 120–200 m, site 967, 15.x.1995, mixed hardwood forest; berlese, log and leaf litter FMHD #95–82, M. Thayer.
Palimbolus victoriae (King) (70%) – AUSTRALIA: TASMANIA: Southwest N.P., Gordon R. Rd, 10.1 km SE Strathgordon, 325 m, site 903, open Eucalyptus forest with Acacia, Eucryphia, etc. understory; 24.i.1993, FMHD #93–52, berlese, leaf and log litter.
Edaphus sp. (americanus Puthz or nitidus Motschulsky) – USA: ILLINOIS: Union Co., Shawnee N.F., F.R. 236, SE of McGee Hill, 175 m, site 966, 14.x.1995, mixed hardwood forest in south facing ravine; FMHD #95–80, berlese, old litter in dry stream bed, A. Newton, M. Thayer, O. Helmy; and USA: ILLINOIS: FMHD #95–82, as already described.
Dianous nitidulus LeConte (70%) – USA: NEW MEXICO: Lincoln Co., Eagle Ck. at Carlton Canyon (W of Alto), 2,390 m, site 944, Pseudotsuga menziesii-Pinus ponderosa forest; 3.iii.1995, FMHD #95–2, berlese, wet debris, forest stream.
Opresus sp. (det. W. Suter) – USA: WISCONSIN: Kenosha Co., Somers, Holmes Forest; 26.ix.1995 FMHD #95–84, berlese oak stump buttress tree hole, W. Suter.
Euconnus sp. – USA: ILLINOIS: FMHD #95–82, as already described.
Oxyporus stygicus Say – USA: ILLINOIS: Union Co., Shawnee N.F., Little Grand Canyon, 120–200 m site 967, 15.x.1995, mixed hardwood forest; in yellow gilled mushrooms, A. Newton, M. Thayer, O. Helmy.
Achenomorphus corticinus (Gravenhorst) – USA: ILLINOIS: FMHD #95–82, as already described.
Ontholestes cingulatus (Gravenhorst) – USA: ILLINOIS: Cook Co., Western Springs, Bemis Woods North, site 943, oak/mixed hardwood forest; 2–16.vii.1994, FMHD #94–6, carrion trap (squid).
Oiceoptoma noveboracense (Forster) – USA: ILLINOIS same as preceding.
Tachinus luridus Erichson – USA: ILLINOIS: same as preceding.
Habrocerus capillaricornis (Gravenhorst) – USA: ILLINOIS: Cook Co., Western Springs, Bemis Woods North, site 943, oak/mixed hardwood forest; 2.vii.1994, FMHD #94–5, berl., rotting logs.
Cyparium concolor (Fabricius) – USA: ILLINOIS: FMHD #95–82, as described earlier.
Oxytelus ?convergens LeConte – USA: ILLINOIS: FMHD #94–6, as described earlier.
APPENDIX 2
Morphological characters included in this study (see Newton and Thayer [1995] for details). Characters 11–50 form the head partition; characters 1–10 and 51–112 the nonhead partition. "X" indicates characters excluded by Newton and Thayer (1995) and in this study and "A" denotes characters excluded in this study only. A "?" indicates unknown and a "-" denotes inapplicable.
APPENDIX 3
The secondary structure alignment of domains II-IV of staphylinoid 12S rRNA. The helix (stem) numbers are shown above the alignment and the bases involved in pairing are shaded for each taxon. Open boxes enclose positions that include gaps but were included in the analysis. Hatched boxes enclose positions that include gaps and have been removed from the analysis.
APPENDIX 4
The alignment of the staphylinoid cytochrome b locus.