Cladistics - American Museum of Natural History

Cladistics 21 (2005) 15–30 www.blackwell-synergy.com

Sequence length variation, indel costs, and congruence in sensitivity analysis

Lone Aagesen1,3,*, Gitte Petersen2 and Ole Seberg2

1 Division of Invertebrate Zoology, American Museum of Natural History, Central Park West at 79th St., New York, NY 10024-5192, USA; 2 Botanical Institute, University of Copenhagen, Gothersgade 140, DK-1123 Copenhagen K, Denmark; 3 Instituto de Botánica Darwinion, CC 22, 1642 San Isidro, Argentina

Accepted 29 November 2004

Abstract

The behavior of two topological and four character-based congruence measures was explored using different indel treatments in three empirical data sets, each with different alignment difficulties. The analyses were done using direct optimization within a sensitivity analysis framework in which the cost of indels was varied. Indels were treated either as a fifth character state, or strings of contiguous gaps were considered single events by using linear affine gap cost. Congruence consistently improved when indels were treated as single events, but no congruence measure appeared as the obviously preferable one. However, when combining enough data, all congruence measures clearly tended to select the same alignment cost set as the optimal one. Disagreement among congruence measures was mostly caused by a dominant fragment or a data partition that included all or most of the length variation in the data set. Dominance was easily detected, as the character-based congruence measures approached their optimal value when indel costs were incremented. Dominance of a fragment or data partition was overwhelmed when new sequence length-variable fragments or data partitions were added.

© The Willi Hennig Society 2005.

*Corresponding author. E-mail address: [email protected]

Since DNA sequence analysis has become a standard technique for phylogenetic inference at all taxonomic levels, there has been a growing demand for phylogenetic markers. Difficult-to-align sequences, such as non-coding sections of DNA and many ribosomal genes, are today commonly used in phylogenetic analyses. One disadvantage of these regions is the frequent occurrence of length mutations that may obscure the homology relationships among nucleotides. Without transforming the nucleotide observations into homology statements, phylogenetic analysis cannot proceed. Algorithms have been developed to align the sequences by placing gaps, and alignment programs like Clustal (Thompson et al., 1997) are now routinely used to create hypotheses of homology in phylogenetic analyses using length-variable DNA sequences. In addition to aligning the bases, a second concern is how to include the length information itself. Including length information in phylogenetic analyses is a complex problem that has led to a growing array of different approaches, such as ID coding (Barriel, 1994), direct optimization (Wheeler, 1996), fixed states optimization (Wheeler, 1999a,b; Lutzoni et al., 2000), simple and complex indel coding (Simmons and Ochoterena, 2000; Young and Healy, 2003), and search-based optimization (Wheeler, 2003a). Despite the interest, no consensus has been reached about which method is the most adequate. As noted by Lee (2001), major textbooks may treat the topic superficially or even ignore it, and sequence regions that are too difficult to align are still commonly left out of analyses. Establishing appropriate standard procedures and algorithms for dealing with the complex analysis of length-variable DNA sequences is extremely important for the reliability and usefulness of the fast-growing
number of published phylogenetic analyses.

This paper explores one of the above methods, direct optimization (Wheeler, 1996). The method is often used in a sensitivity analysis framework (Wheeler, 1995) in which different alignment costs are explored. Optimal costs are identified by the use of congruence measures that select as optimal the cost set that yields the highest congruence among data partitions.

The purpose of this study is to explore the use of length variation in direct optimization, both when indels are considered as separate events (equal to a fifth character state), and, primarily, when strings of contiguous gaps are considered single events using an approximate algorithm, the recently incorporated affine gap cost (Watermann et al., 1976; Gotoh, 1982). Treating adjacent gaps as single events may potentially improve congruence in the cases where indels are of different length. The main concern of this paper is therefore whether the affine gap cost implemented in direct optimization has a positive influence on congruence among data partitions.

A second, related concern is to understand the behavior of the congruence measures themselves within a sensitivity analysis framework. The behavior of different congruence measures in sensitivity analysis is not well understood and some studies cast doubt on the reliability of commonly used measures (Wheeler and Hayashi, 1998; Dowton and Austin, 2002). It follows that before any conclusion can be drawn on how different indel costs and indel-cost functions affect data congruence, different congruence measures and their behavior should be explored in addition to the indel costs themselves.

Three empirical data sets were analyzed, each composed of several partitions with different degrees of length variation. The data include at the species level the genus Hordeum (barley, of the grass family Poaceae), at the genus level the tribe Triticeae (the grass family Poaceae), and at the order level the Arthropoda.
All data sets were analyzed within a parameter space that varies indel cost. Six different congruence measures were applied to explore the effects of using different costs. Before considering the results of the empirical analyses, some justification for choosing direct optimization above other methods is given, and different congruence measures used in sensitivity analysis are presented.

Use of length information in phylogenetic analyses

When length information is included in phylogenetic analyses, the prevailing approach is a two-step procedure. First, a multiple sequence alignment is obtained, either by eye or by using an appropriate algorithm, such as Clustal (Thompson et al., 1997). Second, the phylogenetic hypothesis is obtained by analyzing the resulting matrix with some decision on how to treat the indels, e.g., as missing data, as a fifth character state, or by recoding the indels.

If standard techniques for optimizing characters are altered in such a way that indels are incorporated, the search for the shortest cladogram, the one that best fits the observed data (the unaligned sequences), can proceed without the need for a multiple alignment step. This is the idea behind direct optimization (Wheeler, 1996). The benefit is that the homology assignments of the characters are dynamic and indels are postulated in positions that are optimal for the tree being evaluated under the search, which facilitates finding the overall optimal tree, that is, the one that requires the minimal number of changes to fit the observed data.

Direct optimization as originally implemented in the computer program POY (Wheeler et al., 2003; documentation by De Laet and Wheeler, 2003) treats indels as a fifth character state, with the implication that a series of gaps is treated as independent events. Strings of gaps will therefore be given a cost proportional to the length of the string. In the worst case this procedure causes species to group because of shared long indels (also in cases where the indels simply overlap without having identical 5′ and 3′ ends; Belshaw and Quicke, 2002; Petersen et al., 2004). For this reason treating indels as a fifth character state is not appealing to many researchers who work with phylogenetic markers where long indel mutations seem to have occurred (for example Simmons and Ochoterena, 2000; Freudenstein and Chase, 2001; Hennequin et al., 2003). Of course, the presence of a long indel may be shared due to common ancestors already having the long indel. The problem is that it is very difficult to test the opposite hypothesis, that the indel was acquired in parallel, simply because the number of shared characters will equal the length, or overlapping length, of the indel.
The longer the indel or overlap, the more characters the two species have in common and the more additional data are needed to outweigh the indel if the indel is a parallelism.

Recoding the indels is an alternative to treating the gaps as a fifth character state. In this approach the gaps are coded as single events (Simmons and Ochoterena, 2000; Young and Healy, 2003). The rationale is that insertions and deletions may be of all possible lengths and should therefore receive equal costs. However, as the recoding is done on a multiple alignment and therefore relies on the quality of this alignment, the basic problems of multiple alignments remain (see Thorne and Kishino, 1992; Wheeler, 1996; Vingron and Haeseler, 1997; Notredame et al., 1998; Brocchieri, 2001).

Recently the use of linear affine gap cost (Gotoh, 1982) has been implemented in direct optimization. With this, direct optimization provides an objective and repeatable analytic procedure to analyze length-variable DNA sequences under the assumption that indels may be of different lengths. Linear affine gap costs are a rough approximation to a situation in which indels of different length have the same cost. When implemented, all indels receive the same opening penalty and, in addition, each indel receives an extension cost that is proportional to its length. A cost of zero for extending an indel would give the effect of counting all indels equally no matter their length, but using zero costs would give a non-metric cost set that leads to uninformative alignments (Wheeler, 1993). If the affine gap cost is very low compared to the cost of a base change and the gap opening penalty, the cost set approaches a situation where indels of different length have nearly the same weight. This is, however, obtained at the cost of metricity, which must be maintained if the analysis is to be logically consistent (Farris, 1981, 1985; Wheeler, 1993) and to avoid uninformative alignments (see below). The use of affine gap costs is therefore a compromise that leads to alignments where indels of some extension may be favored over both very short indels and very long ones. Additionally, the implementation of affine gap costs, like any procedure incorporated into tree searching programs, requires several heuristics (De Laet, pers. comm.). Despite these concerns, affine gap cost may still improve congruence if the sequences vary considerably in length and this variation is caused mainly by long indels.
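
The difference between the two indel treatments can be made concrete for a single pair of sequences. The following is a minimal pairwise sketch in the spirit of Gotoh (1982); it is not POY's tree-based direct optimization, which relies on heuristics. The function name `affine_alignment_cost`, its parameters, and the convention that a gap of n positions costs `gap_open + (n - 1) * gap_extend` are assumptions made for illustration. Setting `gap_extend` equal to `gap_open` charges every gap position in full, which corresponds to the fifth-state treatment.

```python
import math


def affine_alignment_cost(a, b, sub, gap_open, gap_extend):
    """Minimum pairwise alignment cost under affine gap costs (Gotoh-style DP).

    Assumed convention (illustrative): a gap of n positions costs
    gap_open + (n - 1) * gap_extend; identical bases cost 0, mismatches `sub`.
    """
    inf = math.inf
    n, m = len(a), len(b)
    # M: last column pairs a[i-1] with b[j-1]
    # X: last column pairs a[i-1] with a gap (gap in b)
    # Y: last column pairs a gap with b[j-1] (gap in a)
    M = [[inf] * (m + 1) for _ in range(n + 1)]
    X = [[inf] * (m + 1) for _ in range(n + 1)]
    Y = [[inf] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = 0 if a[i - 1] == b[j - 1] else sub
            M[i][j] = step + min(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = min(
                X[i - 1][j] + gap_extend,                  # extend a gap in b
                min(M[i - 1][j], Y[i - 1][j]) + gap_open,  # open a new gap in b
            )
            Y[i][j] = min(
                Y[i][j - 1] + gap_extend,                  # extend a gap in a
                min(M[i][j - 1], X[i][j - 1]) + gap_open,  # open a new gap in a
            )
    return min(M[n][m], X[n][m], Y[n][m])


# Fifth-state treatment: every gap position costs as much as opening a gap.
fifth_state = affine_alignment_cost("CCAAAACC", "CCTTTTCC", 4, 4, 4)
# Affine treatment: cheap extensions make two long indels the cheaper option.
affine = affine_alignment_cost("CCAAAACC", "CCTTTTCC", 4, 4, 1)
print(fifth_state, affine)
```

Under the assumed convention, the fifth-state call prefers four base changes, whereas the affine call prefers two indels of four positions, which illustrates why low extension costs can favor long indels.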

Congruence measures

Sensitivity analyses in a phylogenetic context are concerned with the problem of assigning alignment costs and the influence of costs on the alignment and optimal trees. That different alignment costs may lead to different alignments and therefore different optimal trees is well known (Wheeler, 1995; Morrison and Ellis, 1997). As there are no objective means for determining optimal costs a priori, sensitivity analysis takes the approach of repeating the phylogenetic analysis several times, each time with a different cost set. The results are then compared and the optimal cost set among those examined may optionally be identified (Wheeler, 1995). How to select the optimal cost sets is therefore an important issue. As originally described, sensitivity analysis extends the logic from conventional parsimony analyses, which maximize congruence among characters, to maximizing congruence among data in general. Congruence (or incongruence) indices have been based on both tree length and tree topology. While the topological measures compare the optimal tree topologies from the different data partitions, the character (or length) based congruence indices are generally concerned with the homoplasy that arises from combining the data. The problem is that different congruence measures may select different optimal cost sets and different cost sets may in turn identify different optimal trees.

Character-based congruence measures

When sensitivity analysis was first proposed within a phylogenetic context (Wheeler, 1995), both topology- and character-based measures of congruence were discussed. They were applied either as percentage of shared or congruent groups or as the incongruence length difference, ILD (Mickevich and Farris, 1981; Farris et al., 1995), normalized through division by the length of the combined data cladogram. The ILD index is currently the most widely used congruence measure in sensitivity analysis:

$$\mathrm{ILD} = \frac{L_{AB} - (L_A + L_B)}{L_{AB}},$$

where $L_{AB}$ is the minimum length of the combined data set and $L_A$ and $L_B$ are the minimum lengths of data sets A and B analyzed on their own. A rescaled variation of the ILD, the RILD, was proposed by Wheeler and Hayashi (1998). While the proposed ILD index scales the extra steps obtained by combining the data sets as a fraction of the length of the combined tree, the RILD scales the extra steps as a fraction of the maximum possible number of extra steps that can be obtained when data sets are combined:

$$\mathrm{RILD} = \frac{L_{AB} - (L_A + L_B)}{\max L_{AB} - (L_A + L_B)},$$

where $\max L_{AB}$ is analogous to the G of the retention index (Farris, 1989). The RILD was supposed to behave differently from the ILD in cases where ILD would show a trivial minimum (see below). The Meta Retention Index, MRI (Wheeler et al., unpublished), corresponds to the retention index applied to the fragments of the sequence data:

$$\mathrm{MRI} = \frac{\max L_{AB} - L_{AB}}{\max L_{AB} - (L_A + L_B)}.$$

When the RILD and the MRI are applied to the same fragments they are complementary values, one measuring the number of extra steps when data sets are combined, the other measuring the number of retained possible extra steps. However, when proposing the MRI Wheeler et al. also suggested measuring congruence not between data partitions, as done earlier, but between the fragments of the data partitions. If the sequences from a given data set are not split into fragments the RILD and MRI will be complementary values. If the sequences are fragmented (for example cut into fragments of introns and exons) the ILD and RILD both measure congruence among the data partitions while the MRI measures congruence among the fragments regardless of which of the partitions the fragment belongs to. Since the MRI measures congruence among the fragments of an analysis, the analog ILD index has
also been plotted here. When ILD is measured between partitions it will be called ILDW following Dowton and Austin (2002), and ILDF when it is measured between fragments. When general features applicable to both the ILDW and the ILDF are discussed it will be referred to simply as ILD.

These four character-based congruence measures quantify two different aspects of congruence: congruence between data partitions or between fragments, and congruence measured either as minimizing homoplasy, analogous to the consistency index (Kluge and Farris, 1969), or as maximizing retained possible homoplasy, analogous to the retention index. In conventional phylogenetic analysis, where static matrices are analyzed, the tree with the highest CI will also be the tree with the highest RI. This may differ in sensitivity analysis, where homology statements are dynamic.

Very little is known about the behavior of any of these congruence measures. Unlike the related ILD-test (Farris et al., 1995), which has been widely discussed (Cunningham, 1997; Dolphin et al., 2000; Yoder et al., 2001; Barker and Lutzoni, 2002; Darlu and Lecointre, 2002; Dowton and Austin, 2002), the efficiency of ILDW used as a congruence index has received very little attention in the literature. Only Dowton and Austin (2002) offer some evaluation of the index. Their paper shows that under some circumstances ILD approaches zero when the weighting schemes become increasingly disproportionate, although congruence is not improved. In their example, two data partitions were combined and analyzed under the same cost set while varying the total weight of one of the partitions. The reported behavior can be predicted by inspecting the ILD index itself:

$$\mathrm{ILD} = \frac{L_{AB} - (L_A + L_B)}{L_{AB}}.$$

If only one of the data partitions is given an increased weight, the length of the highly weighted data set approaches the length of the combined data set when extreme costs are applied.
This can happen, for example, when one of the partitions is given a heavy a priori weight (the case in Dowton and Austin, 2002) or when indels receive increasingly higher weights and indels are found in only one of the partitions. In these cases incongruence may approach zero according to the ILD, giving an incorrect impression of improved congruence between data partitions. That the ILD under some conditions can misleadingly slide towards increased congruence was already mentioned by Wheeler and Hayashi (1998) when introducing the RILD, as the problem of trivial minima when data set weights become increasingly disproportionate. The RILD and the related MRI are thought to be less affected by the differential weighting of data sets. The behavior of the RILD and MRI has not been explored in detail, but judging from the equation itself, MRI might slide towards a trivial maximum in cases similar to the ILD, if one of the partitions retains enough indels as synapomorphies, and indels are only or mostly found in this partition or fragment.

Topology-based congruence measures

Topology measures compare the optimal trees for the individual partitions of the data set and therefore all share the drawback of not considering the strength of evidence of the individual clades (Mickevich and Farris, 1981; Farris et al., 1995). However, as topology measures do not rely on tree length they are not affected by the distortion that differential weighting may have on character-based congruence measures. One is used here simply to report at which cost set the individual partitions of the data set introduce the most similar optimal signals into the simultaneous analysis, hence where the optimal topologies of the individual partitions are most alike.

Several topological measures have been proposed; here only a single measure, the TILD (Wheeler, 1999a,b), has been chosen. Other topology-based congruence measures could have been explored, but we do not wish to extend the already lengthy discussion by evaluating different topology-based congruence measures as well. We here concentrate on character-based congruence measures, as these measures are most commonly used in sensitivity analysis, and select a single topology-based measure as contrast. We use the TILD because of its simplicity. The TILD is an extension of the incongruence length difference (ILD). Like the ILD, the TILD can be applied to data sets with an unequal number of taxa. For an empirical comparison of TILD with other topology-based congruence measures see Wheeler (1999a,b).
The TILD index (Wheeler, 1999a,b) is based on the incongruence length difference (Mickevich and Farris, 1981), scaled as the ILD index (Wheeler and Hayashi, 1998):

$$\mathrm{TILD} = \frac{L_{ab} - (L_a + L_b)}{L_{ab}}.$$

The input matrices a and b for the TILD index consist of the group-inclusion characters (Farris, 1973) derived from the topologies of the optimal trees found when analyzing the original data partitions A and B. $L_a$ and $L_b$ will be equal to the number of resolved groups in the optimal trees derived from the original matrices A and B, respectively. When the group-inclusion character matrices a and b are combined and analyzed, the homoplasy, $L_{ab} - (L_a + L_b)$, is proportional to the number of species placed differently in the optimal tree from matrix A and the optimal tree from matrix B. The metrics can be extended to include any number of input data sets, each defining an optimal tree that is converted into a group-inclusion character matrix.
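
The conversion of a tree into group-inclusion characters can be sketched as follows. This is a minimal illustration and not the conversion program used in this study: the representation of trees as nested tuples of taxon names and the function names `clades` and `group_inclusion_matrix` are assumptions. Each resolved group other than the root becomes one binary character, scored 1 for the taxa it contains, so the number of columns equals the number of resolved groups.

```python
def clades(tree):
    """Return (taxa below this node, list of groups found in the subtree).

    `tree` is a nested tuple of taxon names (hypothetical representation),
    e.g. ((("A", "B"), "C"), "D").
    """
    if isinstance(tree, str):
        return frozenset([tree]), []
    taxa, found = frozenset(), []
    for child in tree:
        child_taxa, child_found = clades(child)
        taxa |= child_taxa
        found.extend(child_found)
    found.append(taxa)
    return taxa, found


def group_inclusion_matrix(tree):
    """Binary matrix: one row per taxon, one column per resolved group."""
    all_taxa, found = clades(tree)
    # The root group contains every taxon and is uninformative, so drop it.
    groups = [g for g in found if g != all_taxa]
    return {t: [1 if t in g else 0 for g in groups] for t in sorted(all_taxa)}


matrix = group_inclusion_matrix(((("A", "B"), "C"), "D"))
for taxon, row in matrix.items():
    print(taxon, row)
```

A polytomy contributes no character of its own, so a highly collapsed tree yields few columns.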

If some of the optimal trees are highly collapsed this can affect the index. If one of the data partitions under some costs generates several optimal trees and a highly collapsed consensus, while under other costs a single optimal tree or a more resolved consensus tree is found, the index may choose the first cost sets as optimal, simply because there are fewer groups to disagree upon. To avoid this situation Wheeler (1999a,b) also introduced a rescaling of the TILD, the TILDN:

$$\mathrm{TILDN} = \frac{L_{ab} - (L_a + L_b)}{\max L_{ab} - (L_a + L_b)},$$

where $\max L_{ab}$ is equal to G used in the retention index (Farris, 1989). Here both TILD and TILDN are used, always measured between data partitions, not between fragments.
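
The character-based indices defined in this section can be written as small functions. The sketch below is illustrative only: the function names and all tree lengths are hypothetical and are not values from this study. It also reproduces the behavior described for the ILD. When partition A is upweighted by a factor w, and the extra steps are assumed to stay constant because they all fall in the unweighted partition B, the ILD shrinks although congruence has not changed.

```python
def ild(l_ab, l_a, l_b):
    """ILD = (L_AB - (L_A + L_B)) / L_AB."""
    return (l_ab - (l_a + l_b)) / l_ab


def rild(l_ab, l_a, l_b, max_l_ab):
    """RILD = (L_AB - (L_A + L_B)) / (max L_AB - (L_A + L_B))."""
    return (l_ab - (l_a + l_b)) / (max_l_ab - (l_a + l_b))


def mri(l_ab, l_a, l_b, max_l_ab):
    """MRI = (max L_AB - L_AB) / (max L_AB - (L_A + L_B))."""
    return (max_l_ab - l_ab) / (max_l_ab - (l_a + l_b))


# Hypothetical tree lengths, chosen only for illustration.
L_A, L_B, L_AB, MAX_L_AB = 100, 50, 160, 250


def ild_with_weight(w, extra_steps=10):
    """ILD when partition A is upweighted by w (assumption: the extra
    steps stay constant because they all occur in partition B)."""
    return ild(w * L_A + L_B + extra_steps, w * L_A, L_B)


print(ild(L_AB, L_A, L_B))
print(rild(L_AB, L_A, L_B, MAX_L_AB), mri(L_AB, L_A, L_B, MAX_L_AB))
print([ild_with_weight(w) for w in (1, 10, 100)])
```

When both are computed on the same lengths, RILD and MRI sum to one, which is the complementarity noted above.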

Materials and methods

Hordeum L.

The data set, which includes three loci (two nuclear and one plastid) and RFLP data from the plastid genome, was published by Petersen and Seberg (2003). It includes 26 taxa of Hordeum (barley, of the grass family Poaceae) and three outgroup taxa. Three loci are included.

rbcL and part of the downstream region (plastid gene and spacer). The first locus consists of rbcL (ribulose-1,5-bisphosphate carboxylase/oxygenase or rubisco) and the 6–18 bp of the intergenic spacer between rubisco and psaI (gene encoding photosystem I polypeptide I).

DMC1 (disrupted meiotic cDNA1) and EF-G (elongation factor G). Both loci are nuclear genes. In the DMC1 and the EF-G data sets sequence lengths vary by 30 and 27 bp, respectively. The variation comes from autapomorphic indels and minor shared indels.

Triticeae

The full data set includes 30 diploid Triticeae species plus two outgroup species. The grass tribe Triticeae is well studied and several phylogenetic studies are available for the diploid species (Hsiao et al., 1995; Kellogg and Appels, 1995; Petersen and Seberg, 1997, 2000; Mason-Gamer et al., 1998; Seberg and Frederiksen, 2001). The general impression is one of extensive incongruence, with different genomes or portions of genomes pointing to different histories (Mason-Gamer et al., 1998; Petersen and Seberg, 2000). Here five loci are included.

DMC1 (disrupted meiotic cDNA1, nuclear gene). The DMC1 data set includes 30 Triticeae species and one outgroup species from Petersen and Seberg (2000). The sequences contain four exons and five introns. No length variation is found in the exons. Among the introns, most of the length variation is confined to intron 14 in which a transposable element was found in five of the ingroup species, while footprints (remains of transposable elements left after excision) were found in several other species (Petersen and Seberg, 2000). In the original analysis the transposable element was left out of the matrix. Here it is included and represents a simple case of a long indel.

EF-G (elongation factor G, nuclear gene). The sequences were produced for this study (GenBank acc. nos. AY836186–AY836217). The data set includes two exons and one intron. The first exon has three autapomorphic gaps while the second exon has an informative indel three base pairs long. The sequences of the intron vary by 239 base pairs (18 base pairs among the ingroup species). EF-G does not present cases of long indels and has been included here to add data.

GBSSII (granule-bound starch synthase or waxy, nuclear gene). The sequences were obtained from Mason-Gamer et al. (1998). Only species represented in the other data partitions were included. The data set includes four exons and five introns. Two of the exons include a single autapomorphic gap. All introns show moderate sequence length variation (20–31 bp difference), but several gaps have to be inserted and the alignment is not straightforward (Mason-Gamer et al., 1998). According to Mason-Gamer et al. (1998) the GBSSII data set seems to track a portion of the Triticeae genome with its own history unlinked to any other gene studied.

rbcL and downstream region until psaI (plastid gene and spacer). The sequences were produced for this study (GenBank acc. nos. AY836154–AY836185). The data set consists of rbcL and the non-coding region flanked by rbcL and psaI. The non-coding region contains the highest level of sequence length variation analyzed in this study, with a variation of 1099 bp (1057 bp among ingroup taxa). The intergenic spacer between rbcL and psaI is supposed to include the pseudogene rpl23′ as well as several shorter and longer indels (Ogihara et al., 1991; Golenberg et al., 1993).

Arthropoda

The data set was published by Giribet et al. (2001). The complete data set includes 48 Arthropod lineages and three outgroup species. The Arthropod data represent a moderately big data set with several loci cut into many small sequence fragments in the original analysis. Here the data set was analyzed using the entire DNA sequences except for the 18S, which was cut into fragments as defined by the primers. The data set includes a morphological matrix and eight loci.

16S. Mitochondrial ribosomal gene, with a sequence length variation of 91 bp.
18S. Nuclear ribosomal gene divided into four fragments. The individual fragments have a sequence length variation of 40 bp in the fragment with the least length variation and 927 bp in the fragment with the highest length variation.
28S. Nuclear ribosomal gene with a sequence length variation of 220 bp.
U2. Nuclear ribosomal gene with a sequence length variation of 3 bp.
Histone H3. Nuclear protein-coding gene, no sequence length variation.
Elongation factor-1a. Nuclear protein-coding gene, no sequence length variation.
Large subunit of RNA polymerase II. Nuclear protein-coding gene, no sequence length variation.
Cytochrome c oxidase 1. Mitochondrial protein-coding gene, no sequence length variation.

Handling the DNA sequences

The Hordeum data set was analyzed with rbcL and the downstream region cut into two fragments. rbcL is commonly thought of as being of more or less constant sequence length, but among the Hordeum species the stop codon seems to be placed at very variable positions, giving the rbcL of the species a 40 bp difference in length from start to stop codon. The unexpected length difference near the 3′ end of the gene calls for reconsideration of the appropriateness of analyzing sequences of equal length as prealigned, as if indels never happen here. Nonetheless, the larger part of rbcL has here been analyzed as prealigned, cut off from the downstream region at what appears to be a homologous position upstream from the stop codon. The final fragments consist of rbcL excluding the 3′ end of the gene and a 51–89 bp fragment including the 3′ end of rbcL and part of the intergenic spacer. The unfragmented DMC1 and EF-G loci were added successively to the two cpDNA sequence fragments. Finally, the RFLP data set was added, weighted either equal to indels (or equal to the opening gap penalty) or equal to base changes.

The Triticeae data set was analyzed with all loci fragmented into exons and introns, while the rbcL and spacer were divided as in the Hordeum data set. The DMC1 data were analyzed measuring congruence between exons and introns. EF-G, GBSSII, and the cpDNA sequences were then added successively and congruence was measured between data partitions.

The Arthropod data set was analyzed combining all data. The morphology data set was given a weight equal to indel cost when affine gap costs were not used or equal to base changes when affine gap costs were in use.

Setting the alignment costs

All analyses were done using direct optimization as implemented in the computer program POY version 3.0.11a (Wheeler et al., 2003). The analyses were run either on a single-processor 1.7 GHz Pentium PC or in parallel in a 20-processor subcluster of the cluster operating at the American Museum of Natural History, New York.

As the aim of the paper is to evaluate the effect of different indel treatments, the parameter space includes varying the indel cost relative to the base change cost, while transitions and transversions are always weighted equally. Three series of cost sets using affine gap costs were explored: one set the affine gap cost equal to the base change cost and varied the opening gap cost; a second set the affine gap cost at half the cost of base changes and varied the opening gap cost; the third set the opening gap cost and the base change cost as equal and lowered the affine gap costs. This latter cost series may violate the assumption of metricity of the cost set. Non-metric cost sets were nonetheless considered here because they permit the insertion of long indels at relatively low cost.

Metricity within a phylogenetic context has been discussed by Farris (1981, 1985) and Wheeler (1993). For the purpose of this paper the triangle inequality is relevant:

$$d(x, y) \le d(x, z) + d(z, y) \quad \text{for all } x, y, z.$$

It states that, considering three points, the direct distance between any two points should not exceed the distance of going from one point to the other through an intermediate third point. If this is not the case, as for example when the cost of an indel is less than 0.5 of a base change, uninformative alignments can be created simply by inserting indels and no base changes. In cases where indels depend both on a gap opening penalty and affine gap costs the situation is more complex.
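
For the simple case of a unit gap cost, the triangle inequality can be checked mechanically over the five states. The following is a minimal sketch, assuming a symmetric cost scheme with a single base change cost and a single indel cost; the names `cost` and `is_metric` are illustrative.

```python
from itertools import product

STATES = "ACGT-"  # four bases plus the gap treated as a fifth state


def cost(x, y, base_change, indel):
    """Cost of transforming state x into state y."""
    if x == y:
        return 0
    return indel if "-" in (x, y) else base_change


def is_metric(base_change, indel):
    """True if d(x, y) <= d(x, z) + d(z, y) holds for all states x, y, z."""
    return all(
        cost(x, y, base_change, indel)
        <= cost(x, z, base_change, indel) + cost(z, y, base_change, indel)
        for x, y, z in product(STATES, repeat=3)
    )


print(is_metric(2, 1))  # indel costs exactly half a base change
print(is_metric(4, 1))  # indel costs a quarter of a base change
```

The inequality fails as soon as two indels are cheaper than one base change, which is the threshold of 0.5 stated above.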
With affine gap costs, the cost of an indel is not a point defined by a unit gap cost but the composite cost of the gap opening penalty and the extension of the indel. This means that one point of the triangle, the indel cost, does not have a fixed cost: it varies with indel length. In this situation a cost set is potentially non-metric when affine gap costs are lower than half of the base change cost. Whether a non-metric situation appears depends on the length of a given indel. The critical indel length, beyond which trivial alignments appear, depends on the specific cost set. Table 1 shows situations where non-metric cost sets produce either informative or trivial alignments. Under a metric cost set (2,2/1) an uninformative alignment is always more costly than the informative one. If the affine gap cost is 0.25 of all other changes (4,4/1) the cost of two uninformative indels is equal to the cost of three base changes but less costly than four base changes. Hence, under this cost set, indels longer than three positions will create a non-metric situation. Longer indels can be inserted without creating a non-metric situation if the opening gap penalty is higher than the base change cost. Using an opening gap penalty of eight, a base change cost of four and an affine gap cost of one (8,4/1), non-informative alignments appear when indels have a length of eight positions or more (Table 1). When the affine gap costs are progressively lowered compared to other changes, metricity is violated and non-informative alignments will eventually appear. Although some of the trivial alignments in Table 1 may seem intuitively acceptable, the use of a non-metric cost set has an unknown effect on the data set, as homoplasy can be removed by inserting indels.

Table 1
Cost of different alignments under metric and non-metric cost sets. Cost sets are given as opening gap penalty, base change cost / affine gap cost. Each alignment is shown as first sequence / second sequence.

Alignment                                      2,2/1   4,4/1   8,4/1
Metric                                         yes     no      no
CCAACC / CCTTCC                                4       8       8
CC--AACC / CCTT--CC                            6       10      18
CCAAACC / CCTTTCC                              6       12      12
CC---AAACC / CCTTT---CC                        8       12      20
CCAAAACC / CCTTTTCC                            8       16      16
CC----AAAACC / CCTTTT----CC                    10      14      22
CCAAAAAAAACC / CCTTTTTTTTCC                    16      32      32
CC--------AAAAAAAACC / CCTTTTTTTT--------CC    18      22      30
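
The entries of Table 1 follow from simple arithmetic, which the sketch below reproduces. It assumes, as the table values imply, that an indel of n positions costs the opening penalty plus n - 1 extensions, and that each uninformative alignment replaces n base changes by two indels of n positions. The function names are illustrative.

```python
def base_change_alignment(n, base_change):
    """Cost of the informative alignment: n base changes."""
    return n * base_change


def indel_alignment(n, gap_open, gap_extend):
    """Cost of the uninformative alignment: two indels of n positions each."""
    return 2 * (gap_open + (n - 1) * gap_extend)


def critical_length(gap_open, base_change, gap_extend, limit=1000):
    """Smallest n at which the indel alignment becomes cheaper, or None."""
    for n in range(1, limit + 1):
        indels = indel_alignment(n, gap_open, gap_extend)
        if indels < base_change_alignment(n, base_change):
            return n
    return None


# Cost sets written as (opening gap penalty, base change cost, affine gap cost)
for gap_open, base_change, gap_extend in [(2, 2, 1), (4, 4, 1), (8, 4, 1)]:
    rows = [
        (base_change_alignment(n, base_change),
         indel_alignment(n, gap_open, gap_extend))
        for n in (2, 3, 4, 8)
    ]
    print((gap_open, base_change, gap_extend), rows,
          critical_length(gap_open, base_change, gap_extend))
```

Under these assumptions the metric cost set never reaches a critical length, while the two non-metric cost sets reach it at four and eight positions, in agreement with the text.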

When calculating TILD and TILDN Jack2hen.exe (distributed together with the program POY) was used to convert optimal trees into matrices of group inclusion characters. Winclada version 1.00.08 (Nixon, 2002) was used to combine the group inclusion character data sets. The final combined matrices were analyzed with the program Nona (Goloboff, 1998).

Search strategy

In all analyses the search strategy involved finding 100 Wagner trees, swapping each with TBR, and holding only two trees per replicate (-replicates 100 -nospr -tbr -maxtrees 2). Searches were stopped automatically if the optimal length was hit three times after a minimum of 10 replicates had been done (-stopat 3 -minstop 10). The final optimal trees were swapped until 100 optimal trees were found (-holdmaxtrees 100). All analyses were furthermore run with the settings (-seed -1 -norandomizeoutgroup) which, respectively, use the internal clock of the computer as a random seed and hold the outgroup constant. Optimal trees from the Triticeae and Arthropod data sets were submitted to 20–40 rounds of ratcheting (Nixon, 1999) (-ratchettbr 20). The ratchetseverity, which determines the weight by which the characters are upweighted during the ratchet replicates, was set to 2 or 4, while the ratchetpercent, the percentage of characters to be reweighted, was varied between 15 and 25. For the calculation of MRI and RILD the maximum length of the individual fragments was calculated using the option -pairwise and summing the values over the partitions and data sets.

Results

All results of congruence measures are shown in Figs 1–11. In the figures all graphs are scaled equally to facilitate comparisons between data sets.

Hordeum

Fig. 1. Hordeum, rbcL and downstream area. Congruence measured between partitions. Cost of transformations listed as: opening gap cost, base change cost ⁄ affine gap cost. Portion of parameter space with non-metric cost sets shown in gray.

[Plot not reproduced. Curves: ILDW, MRI, TILD and TILDN. Cost sets on the horizontal axis: 1,1; 2,1; 4,1; 8,1; 16,1; 2,1 ⁄ 1; 4,1 ⁄ 1; 8,1 ⁄ 1; 16,1 ⁄ 1; 4,2 ⁄ 1; 8,2 ⁄ 1; 16,2 ⁄ 1; 32,2 ⁄ 1; 2,2 ⁄ 1; 4,4 ⁄ 1; 8,8 ⁄ 1. Axis annotations: "out of metricity" and "use of affine gap cost".]

The partitions of the Hordeum data set were not subdivided into fragments (rbcL and the spacers are considered separate partitions, although the limit between the two areas is somewhat arbitrary). It follows that ILDW and ILDN are the same while MRI and

[Plots for Figs 2 and 4 not reproduced. Curves: ILDW, MRI, TILD and TILDN. Axis annotations: "out of metricity" and "use of affine gap cost".]

Fig. 2. Hordeum, rbcL, downstream area and DMC1. Congruence measured between partitions. Cost of transformations listed as: opening gap cost, base change cost ⁄ affine gap cost. Portion of parameter space with non-metric cost sets shown in gray.

Fig. 4. Hordeum rbcL, downstream area, DMC1, EF-G and RFLP. RFLP weighted as the gap cost or opening gap cost when affine gap costs were used. Cost of transformations listed as: opening gap cost, base change cost ⁄ affine gap cost. Congruence measured between partitions. Portion of parameter space with non-metric cost sets shown in gray.

[Plots for Figs 3 and 5 not reproduced. Curves: ILDW, MRI, TILD and TILDN. Axis annotations: "out of metricity" and "use of affine gap cost".]

Fig. 3. Hordeum, rbcL, downstream area, DMC1 and EF-G. Congruence measured between partitions. Cost of transformations listed as: opening gap cost, base change cost ⁄ affine gap cost. Portion of parameter space with non-metric cost sets shown in gray.

Fig. 5. Hordeum rbcL, downstream area, DMC1, EF-G and RFLP. RFLP weighted as base change. Congruence measured between partitions. Cost of transformations listed as: opening gap cost, base change cost ⁄ affine gap cost. Portion of parameter space with non-metric cost sets shown in gray.

RILD will be complementary values. Here ILDW, MRI, TILD, and TILDN were calculated. As expected, the ILDW slides towards a trivial minimum when the rbcL sequences are analyzed together with the downstream area (Fig. 1). Indels are not inserted in rbcL and consequently only one of the partitions receives variable costs. Predictably, the ILDW simply shows an increased congruence when indel costs are raised and extension costs are not used. The MRI also slides towards a trivial maximum, suggesting that most or all the indels are retained as synapomorphies in the optimal trees. In the same cost area the topological index, TILD, shows a decline in congruence. Note that at indel costs eight and 16 the TILD values (and TILDN) are identical. The two different cost sets find the same optimal tree and the increased congruence shown by ILD and MRI is exclusively caused by raising the indel cost. Maximum congruence according to the topological index is found when gap openings are weighted 32, the cost of base changes two, and the affine gap cost one. This preference may be caused by the monophyly of Hordeum that is supported by the rbcL data, but only recovered by the downstream area at this specific cost set. If the sliding of MRI and ILD is disregarded, MRI also picks this value as optimal while ILD picks a close cost set as optimal. The sliding of the indices will eventually overrule these optimal values at indel cost 32 (ILD 0.0183, MRI 0.9955). When adding DMC1 to the analysis the length variation in the new partition is sufficient to overcome the sliding of ILD (Fig. 2). Although ILD still slides it

[Plots for Figs 6 and 8 not reproduced. Curves: ILDW, ILDF, RILD, MRI, TILD and TILDN. Axis annotations: "out of metricity" and "use of affine gap cost".]

Fig. 6. Triticeae, DMC. Congruence measured between exons and introns ILDW, RILD, TILD and TILDN or between fragments ILDF and MRI. Cost of transformations listed as: opening gap cost, base change cost ⁄ affine gap cost. Portion of parameter space with nonmetric cost sets shown in gray.


Fig. 8. Triticeae, DMC1, EF-G and GBSSII. Congruence measured between partitions ILDW, RILD, TILD and TILDN or between fragments ILDF and MRI. Cost of transformations listed as: opening gap cost, base change cost ⁄ affine gap cost. Portion of parameter space with non-metric cost sets shown in gray.

[Plots for Figs 7 and 9 not reproduced. Curves: ILDW, ILDF, RILD, MRI, TILD and TILDN. Axis annotations: "out of metricity" and "use of affine gap cost".]

Fig. 7. Triticeae, DMC and EF-G. Congruence measured between partitions ILDW, RILD, TILD and TILDN, or between fragments ILDF and MRI. Cost of transformations listed as: opening gap cost, base change cost ⁄ affine gap cost. Portion of parameter space with nonmetric cost sets shown in gray.

Fig. 9. Triticeae, DMC1, EF-G, GBSSII and rbcL. Congruence measured between partitions ILDW, RILD, TILD and TILDN or between fragments ILDF and MRI. Cost of transformations listed as: opening gap cost, base change cost ⁄ affine gap cost. Portion of parameter space with non-metric cost sets shown in gray.

selects affine gap cost 1 ⁄ 8 of all other costs as the global optimum. The sliding of the MRI is not overcome. TILD picks equal costs and no extension gap as optimal, presumably because some of the input trees are more collapsed under this cost set. TILDN, which is less sensitive to the size of the input trees, still picks the same cost set as optimal (gap opening cost 32, base change cost two, and affine gap cost one). When EF-G (Fig. 3) is added to the data set, the ILD now has a clear minimum at affine gap costs 1 ⁄ 8 of all other costs, which is now also optimal for the MRI, although the MRI still slides towards a trivial maximum in the non-affine gap cost area. The TILD and TILDN do not change. However, excluding the chloroplast spacer, DMC1, EF-G, and rbcL combined on their own have a topological optimum at an extension gap cost 0.25 of all other costs (results not shown), which is also the next best value for TILDN in this data set.

The last partition to be added is the RFLP data set. On its own it shows a completely collapsed consensus tree, hence the topological indices cannot change by adding these data, but the data can still have an influence on the character-based indices. When binary coded data are added, the weight of the transformations has to be specified (Wheeler and Hayashi, 1998). Here the RFLP data set was weighted either as the indel cost or opening gap penalty when affine gap costs were used (Fig. 4), or the RFLP data set was weighted as base changes (Fig. 5).

[Plot for Fig. 10 not reproduced. Curves: ILDW, ILDF, RILD, MRI, TILD and TILDN. Axis annotations: "out of metricity" and "use of affine gap cost".]

Fig. 10. Triticeae, DMC1, EF-G, GBSSII, rbcL and spacer. Congruence measured between partitions ILDW, RILD, TILD and TILDN or between fragments ILDF and MRI. Cost of transformations listed as: opening gap cost, base change cost ⁄ affine gap cost. Portion of parameter space with non-metric cost sets shown in gray.

[Plot for Fig. 11 not reproduced. Curves: ILDW, ILDF, RILD, MRI, TILD and TILDN. Axis annotations: "out of metricity" and "use of affine gap cost".]

Fig. 11. Arthropoda, all data partitions combined. Congruence measured between partitions ILDW, RILD, TILD and TILDN or between fragments ILDF and MRI. Cost of transformations listed as: opening gap cost, base change cost ⁄ affine gap cost. Portion of parameter space with non-metric cost sets shown in gray.

If RFLP are given the same weights as indels or gap opening penalty, both ILD and MRI slide towards trivial optima following the weight of the RFLP data set. Maximum congruence is obtained when the RFLP data set is given the highest weight (Fig. 4). If, on the other hand, the RFLP are weighted as base changes, the sliding of both ILD and MRI disappears and both indices clearly pick an extension gap 1 ⁄ 8 of all other changes as the optimal cost set (Fig. 5).

Triticeae

DMC1, EF-G and GBSSII were all divided into fragments corresponding to exons and introns in the Triticeae data set, and ILDW, ILDF, RILD, and MRI were calculated. If the DMC1 data set is analyzed alone, the ILDW is, as expected, unsuitable for measuring congruence between exons and introns, since indels are only inserted into the introns. Both ILDW and RILD slide towards increased congruence when indel costs are raised and affine gap costs are not used (Fig. 6). In the same area the topology measures show a decrease in congruence. According to the topology measures, maximum congruence between exons and introns is found in the opposite area of the parameter cost space, when gap opening costs and base change costs are set as equal and the affine gap cost is 1 ⁄ 8 of this cost. When ILD is measured among all fragments (ILDF) of the DMC1 data, the MRI still slides towards maximal congruence, but ignoring the slide the MRI agrees with ILDF on the optimal cost set. The topological congruence measures pick other cost sets as optimal.

When extra data are added, in this case the EF-G data set (Fig. 7), and congruence between data sets is measured, ILDW, TILD, and RILD now pick the same cost set as the optimal one for the combined data set (cost of opening gap and base change equal and affine gap costs 0.25 of this cost). At fragment level, ILDF and MRI also pick this cost value as optimal.

Adding data may not always result in a clearer preference for a cost set. This is seen when the GBSSII sequence data available for 18 of the 31 Triticeae species are added to the data set (Fig. 8). Now ILDF and ILDW pick different cost sets as the optimal one. MRI and RILD agree with ILDF or ILDW on which cost set is optimal at fragment level or data set level, respectively. TILD picks a third cost set as optimal. However, TILDN and ILDF show the same costs as being optimal. Although some of the congruence measures pick the same values it is not obvious which cost is optimal for this particular data set. However, all indices agree that the use of affine gap costs improves congruence. The inconclusive congruence picture might reflect the poor agreement among published Triticeae phylogenies, with the GBSSII differing from all other published ones (Mason-Gamer et al., 1998; Petersen and Seberg, 2000).

If rbcL, which is of equal sequence length and does not therefore require the insertion of gaps, is added, ILDW, RILD, ILDF, MRI, and TILDN all pick the same cost set as the optimal one, while TILD picks the neighbor cost set as optimal (Fig. 9). In this case the added information contained in the rbcL sequences stabilizes the data set enough to select a narrow zone of optimal congruence. When the intergenic spacer downstream of the rbcL gene is added to the nearly stable data set formed by combining DMC1, EF-G, GBSSII and rbcL, the congruence measures once again scatter over the parameter cost space (Fig. 10). ILDF, TILD and TILDN pick the same values (affine gap costs 1 ⁄ 8 of other costs) but ILDW shows an optimum when affine gap costs are 1 ⁄ 40 of all other costs. MRI and RILD slide towards a trivial maximum, suggesting that the extensive length variation of the intergenic spacer is dominating the analysis.

Arthropoda

In the Arthropod data set only five different cost sets were explored because of limited computing time. Only the complete data set was analyzed and no sequential adding of the loci was done. When analyzing the complete data set the congruence measures all agreed on the same cost set as optimal (gap opening cost twice the cost of base changes and affine gap cost 0.5 the cost of base changes). The only exception is the TILD, which picks an adjacent cost set as optimal (base change and gap opening cost equal and gap extension cost 0.5 of this cost) (Fig. 11).

Discussion

The difficulty of assigning the same cost to indels of different length is that affine gap costs of zero lead to trivial alignments. Linear affine gap costs are a rough approximation to a condition where all indels have the same cost no matter their length. Nevertheless, the above results clearly show that the use of affine gap costs as implemented in POY proved efficient in accommodating length variation by improving congruence in all cases. Except for a few instances in the Hordeum data set, where one of the topological congruence measures picked equal weights without affine gap costs as optimal, all congruence measures unanimously pointed to the zone of affine gap costs as the area of maximum congruence in all data set and ⁄ or partition combinations.

Concerning the congruence measures themselves, the overall impression is that no congruence measure stands out as the obviously preferable one. The congruence measures applied in this study reflect distinct aspects of the data sets, and it is not surprising that they behave differently under different circumstances. The general trend does, however, suggest that when enough data are added or when data partitions or fragments are balanced with regard to length variation and phylogenetic information, all congruence measures tend to pick the same cost set as optimal. This is seen most clearly in the Triticeae data set when DMC1 and EF-G are combined. It is also seen in the Arthropod data set when ignoring the TILD. Different aspects of a data set can disturb the balance between fragments or partitions, and congruence measures will disagree upon which cost set is optimal for the data set. The following is a discussion of each congruence measure’s behavior under different conditions.

Congruence measured as incongruent length difference, ILD

The ILD measures congruence according to the extra steps that arise due to conflict either between data partitions or between fragments. In sensitivity analysis the optimal cost sets are selected most frequently by use of the ILD applied to data partitions (for example Wheeler and Hayashi, 1998; Frost et al., 2001; Giribet et al., 2001; Meier and Wiegmann, 2002). The reliability of the ILD has been questioned as it might show a trivial minimum under some circumstances in which partitions are given disproportional weights (Wheeler and Hayashi, 1998; Dowton and Austin, 2002). The effect of disproportional weighting is clearly seen in the Triticeae data set by analyzing the DMC1 sequences and by measuring congruence between exons and introns (Fig. 6), as well as in the Hordeum data set combining rbcL and the spacer (Fig. 1). In both cases only one partition (either the introns or the spacer) includes indels and receives a successive increase in weight. This weight increment causes the weighted fragment to dominate, which in turn causes the ILD to slide towards a trivial minimum, giving a misleading impression of improved congruence. The effect is strongest in the area where affine gap costs are not applied and the weight of the indels is directly proportional to their length. The sliding of the congruence measure is very easily detected and confirmed: every time the gap cost is raised, congruence is improved. If the problem of trivial minima occurs because a single fragment gains dominance due to a successive increase in weight, adding fragments that are also influenced by a variable weighting scheme may break the sliding of the congruence measure.
In the case of the DMC1 data set, either more data can be added or congruence can be measured among fragments as the DMC1 sequences are already split into fragments corresponding to exons and introns. All the introns include some amount of sequence length variation and this variation may balance each other. This is because the typical sliding towards a trivial minimum is a sliding towards a grouping according to sequence length, which may occur when indels are treated as a fifth character state. The topology will at one point stabilize but the character based congruence measure will continue to approach optimal values, as in the case of the Hordeum data set (Fig. 1). Including more than one fragment with sequence length variation prevents the sliding simply because it is very unlikely that different fragments will agree upon grouping according to sequence length, although they may agree upon the same phylogenetic pattern.
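A minimal sketch of how such a slide can be recognized is given here. It assumes the commonly used normalized form of the ILD, (combined length − sum of partition lengths) ⁄ combined length; the exact definition used in this paper lies outside this section, so the formula is an assumption. All tree lengths in the toy curve are invented purely for illustration and are not results from any of the data sets.

```python
def ild(combined_length, partition_lengths):
    """Assumed normalized ILD: extra steps on combining, divided by the combined length."""
    extra_steps = combined_length - sum(partition_lengths)
    return extra_steps / combined_length


def slides_monotonically(values, lower_is_better=True):
    """True if the measure improves at every single step along the cost axis."""
    if len(values) < 2:
        return False
    pairs = list(zip(values, values[1:]))
    if lower_is_better:
        return all(later < earlier for earlier, later in pairs)
    return all(later > earlier for earlier, later in pairs)


def toy_ild_curve(indel_costs, fixed_steps=100, base_steps=50,
                  gap_positions=30, extra_steps=20):
    """Hypothetical two-partition example.

    Partition A has no indels, so its length never changes.
    Partition B has `gap_positions` gap positions whose weight follows the indel cost.
    The conflict between the partitions (`extra_steps`) is held constant.
    """
    curve = []
    for cost in indel_costs:
        length_a = fixed_steps
        length_b = base_steps + gap_positions * cost
        combined = length_a + length_b + extra_steps
        curve.append(ild(combined, [length_a, length_b]))
    return curve


curve = toy_ild_curve([1, 2, 4, 8, 16])
print([round(value, 4) for value in curve])
print(slides_monotonically(curve))  # True: the ILD falls at every increase in indel cost
```

In the toy curve the number of extra steps never changes; the ILD falls only because the weighted partition inflates the combined length, which is the kind of trivial minimum discussed in this section.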


If new data are added to prevent sliding towards trivial optima, the new data should include sufficient sequence length variation to outweigh the dominating length variation in the introns of DMC1, but not so much variation as to become dominant and thereby create a new situation where the ILD slides towards a trivial minimum. In the case of the DMC1 data set both solutions are efficient, and sliding of the ILD ceases when congruence is measured among fragments (Fig. 6, ILDF) or when the EF-G data partition is added (Fig. 7, both ILDF and ILDW). A case where adding data generates a new situation of trivial minima is seen in the Hordeum data set when the RFLP data set is weighted as indels (Fig. 4). When rbcL, the downstream area, DMC1, and EF-G of the Hordeum data set are analyzed, the ILDW picks affine gap costs weighted 1 ⁄ 8 of all other costs as the optimal cost set (Fig. 3). When RFLP is added and given a weight equal to the indels, the RFLP data set dominates the congruence measure. The highest congruence according to the ILDW is simply the cost set where the RFLP is given a maximal weight of 32.

Congruence measured as retained extra steps, RILD and MRI

Both the RILD (Wheeler and Hayashi, 1998) and the MRI (Wheeler et al., unpublished) were proposed as alternatives to the ILD with the expectation that these measures would not show trivial optima in cases where the ILD did. Both measures are based on the retention index (Farris, 1989) and as such, pick as optimal the cost set that maximizes the amount of retained possible extra steps, again only considering the extra steps that may arise due to conflict among data partitions or among fragments. In the data sets analyzed here, both the RILD and the MRI seem more susceptible to the dominance of a single partition or fragment than the corresponding ILDW and ILDF. In the Hordeum data set the RILD requires more additional data than the ILDW does before the sliding towards a trivial optimum ceases.
In the Triticeae data set adding the spacer between rbcL and psaI causes both RILD and MRI to slide. This result is interesting because in the Triticeae data set none of the congruence measures slide towards trivial optima, except in the case where congruence is measured between introns and exons in the DMC1 data partition. Congruence measures may not agree, as is the case where DMC1, EF-G and GBSSII are combined, but the only case of trivial optima is the sliding of the RILD and the MRI when the spacer is added to the nearly stable data set. ILD measures are not affected. Considering the extensive sequence length differences found in the spacer (1099 bp difference between ingroup and outgroup and 1057 bp among ingroup taxa) dominance of the spacer seems

very likely and the MRI and RILD would then be the most sensitive tools for discovering fragment ⁄ partition dominance. Inspecting the trees shows that length variation may indeed be dominating the output of the analysis. Two species with very short intergenic spacer sequences Aegilops speltoides and Psathyrostachys stoloniformis, group in a clade containing all other species with short intergenic spacer sequences, leaving the genus Psathyrostachys polyphyletic. Psathyrostachys is currently believed to be monophyletic (Kellogg et al., 1996; Petersen and Seberg, 1997, 2000), and only the shared sequence length of this spacer suggests a relationship between Aegilops speltoides and Psathyrostachys stoloniformis. The intergenic spacer of P. stoloniformis is 800 bp shorter than the spacer of the other included Psathyrostachys species. Monophyly of Psathyrostachys is recovered when affine gap costs are 1 ⁄ 20 or less of the cost of other changes. At this cost set the information in the remainder of the data set overcomes the influence of the sequence length of the spacer. If the sliding of the MRI and RILD is an indicator of partition or fragment dominance, it could be used as an objective argument for excluding a specific partition or fragment from an analysis. Note that this approach very easily leads to the pruning of data sets until agreement among congruence measures has been reached. A more satisfactory and theoretically sound approach would be to keep adding information until stability has been reached once more. When the RILD and the MRI do not approach trivial optima they tend to mirror the ILDW or ILDF index, respectively. RILD and ILDW behave nearly identically when DMC1 is analyzed on its own in the Triticeae data set (Fig. 6) or combined with EF-G (Fig. 7) and combined with EF-G and GBSSII (Fig. 8). The MRI and ILDF behave identically in the same combinations of the data partitions, as well as when rbcL is added (Fig. 9). 
In the Hordeum data set the MRI slides towards trivial optima in nearly all combinations of the data set. Only when combining rbcL, the downstream area, DMC1 and EF-G, and when RFLP is added, weighted as base changes, does the sliding cease and the MRI behave identically to the ILD measure.

Congruence measured according to topology of optimal trees from the individual partitions, TILD and TILDN

The topological congruence measure TILD shows a decrease or increase in congruence proportional to the number of taxa placed differently by different data partitions. Topological congruence measures are less adequate for measuring congruence than character based measures are, because topology measures are based on the optimal topology of the individual partition and do not consider strength of evidence within the partitions (Farris et al., 1995). Nevertheless, topological measures are used here because they are not distorted by the different weights of character transformations, which may distort character based congruence measures. Topological congruence measures simply show at which cost set the individual partitions give the most similar hierarchical information as input to the combined analysis.

In the Triticeae data set the TILD behaves very similarly to the ILDW measure. When DMC1 is analyzed on its own, the TILD behaves more like ILDF because the ILDW slides towards a trivial minimum. TILD also behaves more like the ILDF when the spacer is added to the complete data set. In this case the ILDW picks as optimal a cost set far beyond the point where metricity of the data is lost (affine gap costs 1 ⁄ 40 of all other costs) while ILDF and TILD pick as optimal a less extreme cost set (affine gap costs 1 ⁄ 8 of all other costs).

In the Hordeum data set, the scaled version of the topology measure, the TILDN, rather than the TILD itself, coincides with the ILD measure. In this data set some of the data partitions generate under some costs a high number of optimal trees and a highly collapsed consensus tree. Since the TILDN is scaled according to the maximal number of possible conflicts among input trees, this index is less sensitive to the resolution of input trees. This may explain why the TILDN and not the TILD coincides with the ILD on the congruence plots. The same reasoning can be applied to the Arthropod data set where the TILDN rather than the TILD also behaves like the ILD measure. In cases where TILD and ILD agree on the congruence plots, there is a direct relationship between increased homoplasy in the combined data set and disagreement in the placement of single taxa between the input data partitions.
Picking as optimal the cost set that minimizes homoplasy and disagreement among input partitions undoubtedly seems to be a sound basis for sensitivity analysis, and using the ILD is then a reasonable approach, at least in situations where a balanced data set is analyzed.

Affine gap cost and metricity

While the optimal cost set for the Arthropod data set was a metric cost set, the areas of maximal congruence of the Triticeae and Hordeum data sets were within the zone of non-metricity. The Triticeae and the Hordeum data sets often showed balanced congruence curves where all congruence measures picked the same cost set as optimal. The dilemma is that the optimal cost set in most cases was an affine gap cost 1 ⁄ 8 of all other costs, which is a non-metric cost set. Although all congruence measures agree on this cost set as optimal, picking it for the final analysis makes the analysis internally inconsistent. Metricity has long been held as a required property of phylogenetic analysis (Farris, 1981, 1985; Wheeler, 1993), although some authors have argued that metricity is not necessarily a biological property (Fitch, 1993). Apparently the many long gaps in the Triticeae and Hordeum data sets are not accommodated efficiently by the affine gap costs under the metric cost sets. That all congruence measures, even the topological one, agree on improved congruence when the relative cost of extending the gaps is lowered, suggests that the length of the gaps causes at least part of the incongruence found between the data sets. Exactly how the Triticeae and Hordeum data sets are affected by the lack of metricity of the optimal cost set is difficult to measure.

Metricity within the phylogenetic analysis of DNA sequences is not frequently discussed. Both Clustal and POY offer a wide array of possible opening gap costs and affine gap costs without discussing the problem of metricity in the documentation. We do not question the requirement of metricity and simply observe that linear affine gap costs do improve congruence when compared to analyzing gaps as a fifth character state. However, if a data set includes many long gaps, linear affine gap costs may not be the appropriate analytic algorithm as the cost of a long gap may not be sufficiently reduced without violating the metricity of the cost sets. What makes a gap too long to be analyzed under affine gap costs probably depends on the relative amount of phylogenetic information available for the analysis and may therefore vary from data set to data set. It may be argued that the two-step analysis of multiple alignment followed by gap coding does not suffer problems of metricity (provided that a metric cost set is chosen under the multiple alignment step). Coding gaps as binary characters produces metric costs but at the expense of analyzing a single alignment when multiple equally optimal alignments could exist (Wheeler, 2001; De Laet, in press).

Conclusions

Different alignment costs may lead to different alignments, and choosing an appropriate cost set is therefore crucial for the outcome of a phylogenetic analysis. Sensitivity analysis was intended to deal with this dilemma by using a wide array of cost sets and subsequently applying an optimality criterion of data congruence to identify optimal costs. It has been argued that only by weighting all transformations equally will the final cladogram minimize homoplasy and maximize descriptive power; therefore from an epistemological point of view sensitivity analysis is futile, and only equal weights can be defended (Frost et al., 2001; Grant and Kluge, 2003). While this position is controversial in itself (for a discussion on equal weighting see De Laet, in press) it cannot be applied to indels in general, at least not with existing algorithms. There is no way of giving indels longer than one base pair the cost of a single event in currently available alignment algorithms, and this is presumably the reason why the procedure of recoding gaps on static multiple alignments is so widely adopted in present day phylogenetic analysis. This may not matter if the sequences include only short indels, or if there are few indels compared to the number of base changes. However, when analyzing sequences with many and long indels, as for example in the non-coding regions of the chloroplast genome, one may question what equal costs are actually minimizing.

Acknowledging that we have no way of giving indels of different length equal costs, one is at present faced with the use of affine gap costs and sensitivity analysis. Affine gap costs may accommodate gaps of different lengths as single events and in this paper congruence did in effect consistently improve when affine gap costs were applied. Furthermore, maximum congruence was often found when gap opening costs and base change costs were set to be equal or nearly equal with a lower affine gap cost. Such a cost set is actually an equal weighting cost scheme modified to accommodate gaps of different length.

With the choice of sensitivity analysis comes the dilemma of choosing the optimal cost set. At present we have no infallible way to do this. The close relationship between TILD and ILD supports the current procedure of measuring congruence as the number of extra steps arising when fragments are combined. However, rather than focusing on a single congruence measure for identifying optimal cost sets, observing the behavior of different congruence measures gives a more complete description of the structure of the data set.
In conventional phylogenetic analysis, where static homology statements are tested, the tree that minimizes extra steps will also be the tree that maximizes retained extra steps. From the above results this also seems to be the case within a dynamic homology scheme. When disagreement did occur between minimizing extra steps and retaining possible extra steps, the disagreement was normally caused by one of the congruence measures sliding toward a trivial optimum. It seems very tempting to conclude that if we seek to identify an optimal cost set for a specific data set by means of sensitivity analysis, the data set needs to be sufficiently balanced so that the same cost set simultaneously optimizes the amount of extra steps and the amount of retained extra steps. In the opposite case, when the congruence measures point to different cost sets as optimal, the data set apparently does not include a signal strong enough to be extracted by the current procedure, and more data have to be added.
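As a minimal sketch of how an extra-steps measure can be used to pick a cost set, the code below computes the ILD as the number of extra steps arising on combination divided by the length of the combined tree (the normalization is assumed here to follow the usual Mickevich and Farris formulation). The cost-set labels and tree lengths are invented purely for illustration and are not results from this paper.

```python
from typing import Dict, Sequence, Tuple


def ild(combined_length: float, separate_lengths: Sequence[float]) -> float:
    """Extra steps arising on combination, divided by the combined tree length."""
    if combined_length <= 0:
        raise ValueError("combined tree length must be positive")
    extra_steps = combined_length - sum(separate_lengths)
    return extra_steps / combined_length


def min_ild_cost_set(tree_lengths: Dict[str, Tuple[float, Sequence[float]]]) -> str:
    """Return the cost set whose combined analysis shows the lowest ILD."""
    return min(tree_lengths, key=lambda name: ild(*tree_lengths[name]))


# Hypothetical input: cost set -> (combined length, lengths of separate analyses).
hypothetical_lengths = {
    "gap 1, change 1": (1000, [480, 500]),
    "gap 2, change 1": (1300, [640, 650]),
}
print(min_ild_cost_set(hypothetical_lengths))
```

The same selection step could be repeated for each congruence measure. If the measures return different cost sets, that disagreement is itself the signal described above that the data set is unbalanced.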

Whether congruence should be measured between data partitions or fragments is a different issue that depends on the justification for cutting the partitions into fragments (for a discussion on dividing partitions into fragments see Giribet, 2001). In the Hordeum data set no division of the sequences was made. In the Triticeae data set the sequences were cut into fragments corresponding to exons and introns, while the sequences of the Arthropod data set were cut into fragments with the limits set by primer sets. In balanced data sets, or when enough data were added, measuring congruence between fragments or between partitions yielded the same results. But when differences did occur, the topology measure supported measuring congruence between fragments rather than between data partitions. Finally, one outcome of the above analyses is that combined analyses are preferred to separate analyses. When several data partitions are combined, the dominance of individual fragments or partitions is less likely, and misleading influence due to long indels may be strongly reduced. To obtain reliable and stable results many loci are needed. Balanced data sets composed of several partitions may even be considerably robust to changes in alignment costs, at least within the parameter space surrounding the optimal cost set, even when this surrounding space includes treating indels as a fifth character state or slightly enters the non-metric zone.

Acknowledgments
We wish to thank Ward Wheeler for providing LAa working space at the American Museum of Natural History and for helpful suggestions and comments during all stages of this work. We thank Jan De Laet for answering persistent questions concerning metricity and alignment algorithms and for fixing bugs in POY. We also thank Lauren Aaronson for bug fixing, for discussing algorithms and for improving the spelling and grammar. Ward Wheeler, Jan De Laet, Gonzalo Giribet, Norberto Giannini, Claudia Arango, and two anonymous reviewers helped to improve the manuscript. We want to thank Charlotte Hansen and Lisbeth Knudsen for invaluable technical support. Financial support from the Danish Natural Science Research Council (Grant 51-00-0271, 21-02-0181, and 21-03-0087) and the NASA fundamental space biology program (LAa) is gratefully acknowledged.

References
Barker, F.K., Lutzoni, F.M., 2002. The utility of the incongruence length difference test. Syst. Biol. 51, 625–637.
Barriel, V., 1994. Phylogénies moléculaires et insertions-délétions de nucléotides. C. R. Acad. Sci., Sci. la Vie, Paris, 317, 693–701.

Belshaw, R., Quicke, D.L.J., 2002. Robustness of Ancestral State Estimates. Evolution of Life History Strategy in Ichneumonoid Parasitoids. Syst. Biol. 51, 450–477.
Brocchieri, L., 2001. Phylogenetic inference from molecular sequences: review and critique. Theor. Populat. Biol. 59, 27–40.
Cunningham, C.W., 1997. Is congruence between data partitions a reliable predictor of phylogenetic accuracy? Empirically testing an iterative procedure for choosing among phylogenetic methods. Syst. Biol. 46, 464–478.
Darlu, P., Lecointre, G., 2002. When does the incongruence length difference test fail? Mol. Biol. Evol. 19, 432–437.
De Laet, J., in press. Parsimony and the problem of inapplicables in sequence data. In: Albert, V.A. (Ed.), Parsimony, Phylogeny, and Genomics. Oxford University Press, Oxford, UK, pp. 81–116.
De Laet, J., Wheeler, W.C., 2003. POY, Version 3.0.11. Published by Wheeler, Gladstein, and De Laet, 6 May 2003. Command line documentation.
Dolphin, K., Belshaw, R., Orme, C.D.L., Quicke, D.L.J., 2000. Noise and incongruence. Interpreting results of the incongruence length difference test. Mol. Phylogenet. Evol. 17, 401–406.
Dowton, M., Austin, A.D., 2002. Increased congruence does not necessarily indicate increased phylogenetic accuracy – the behavior of the incongruence length difference test in mixed-model analyses. Syst. Biol. 51, 19–31.
Farris, J.S., 1973. On comparing the shapes of taxonomic trees. Syst. Zool. 22, 50–54.
Farris, J.S., 1981. Distance data in phylogenetic analysis. In: Funk, V.A., Brooks, D.R. (Eds), Advances in Cladistics: Proceedings of the First Meeting of the Willi Hennig Society. New York Botanical Garden, New York, pp. 3–22.
Farris, J.S., 1985. Distance data revisited. Cladistics, 1, 67–85.
Farris, J.S., 1989. The retention index and the rescaled consistency index. Cladistics, 5, 417–419.
Farris, J.S., Källersjö, M., Kluge, A.G., Bult, C., 1995. Testing significance of incongruence. Cladistics, 10, 315–319.
Fitch, W.M., 1993. Commentary on the letter from Ward C. Wheeler. Mol. Biol. Evol. 10, 713–714.
Freudenstein, J.V., Chase, M.W., 2001. Analysis of Mitochondrial nad1b-c intron sequences in Orchidaceae: Utility and coding of length-change characters. Syst. Bot. 26, 643–657.
Frost, D.R., Rodrigues, M.T., Grant, T., Titus, T.A., 2001. Phylogenetics of the lizard genus Tropidurus (Squamata: Tropiduridae: Tropidurinae): Direct optimization, descriptive efficiency, and sensitivity analysis of congruence between molecular data and morphology. Mol. Phylogenet. Evol. 21, 352–371.
Giribet, G., 2001. Exploring the behavior of POY, a program for direct optimization of molecular data. Cladistics, 17, S60–S70.
Giribet, G., Edgecombe, G.D., Wheeler, W.C., 2001. Arthropod phylogeny based on eight molecular loci and morphology. Nature, 413, 157–161.
Golenberg, E.M., Clegg, M.T., Durbin, M.L., Doebley, J., Ma, D.P., 1993. Evolution of a noncoding region of the chloroplast genome. Mol. Phylogenet. Evol. 2, 52–64.
Goloboff, P.A., 1998. NONA, Version 1.9. Program and documentation. Willi Hennig Society. URL: http://www.cladistics.org/.
Gotoh, O., 1982. An improved algorithm for matching biological sequences. J. Mol. Biol. 162, 705–708.
Grant, T., Kluge, A.G., 2003. Data exploration in phylogenetic inference: scientific, heuristic, or neither. Cladistics, 19, 379–418.
Hennequin, S., Ebihara, A., Ito, M., Iwatsuki, K., Dubuisson, J.-Y., 2003. Molecular systematics of the fern genus Hymenophyllum s. l. (Hymenophyllaceae) based on chloroplastic coding and noncoding regions. Mol. Phylogenet. Evol. 27, 283–301.
Hsiao, C., Chatterton, N.J., Asay, K.H., Jensen, K.B., 1995. Phylogenetic relationships of the monogenomic species of the wheat tribe Triticeae (Poaceae), inferred from nuclear rDNA (internal transcribed spacer) sequences. Genome, 38, 211–223.


Kellogg, E.A., Appels, R., 1995. Intraspecific and interspecific variation in 5S RNA genes are decoupled in diploid wheat relatives. Genetics, 140, 325–343.
Kellogg, E.A., Appels, R., Mason-Gamer, R.J., 1996. When genes tell different stories: the diploid genera of Triticeae (Gramineae). Syst. Bot. 21, 321–347.
Kluge, A.G., Farris, J.S., 1969. Quantitative phyletics and the evolution of anurans. Syst. Zool. 18, 1–32.
Lee, M.S.Y., 2001. Unalignable sequences and molecular evolution. Trends Ecol. Evol. 16, 681–685.
Lutzoni, F., Wagner, P., Reeb, V., Zoller, S., 2000. Integrating ambiguously aligned regions of DNA sequences in phylogenetic analyses without violation of positional homology. Syst. Biol. 49, 628–651.
Mason-Gamer, R.J., Weil, C.F., Kellogg, E.A., 1998. Granule-bound starch synthase: structure, function, and phylogenetic utility. Mol. Biol. Evol. 15, 1658–1673.
Meier, R., Wiegmann, B., 2002. A phylogenetic analysis of Coelopidae (Diptera) based on morphological and DNA sequence data. Mol. Phylogenet. Evol. 25, 393–407.
Mickevich, M.L., Farris, J.S., 1981. The implications of congruence in Menidia. Syst. Zool. 30, 351–370.
Morrison, D.A., Ellis, J.T., 1997. Effects of nucleotide sequence alignment on phylogeny estimation: a case study of 18s rDNAs of Apicomplexa. Mol. Biol. Evol. 14, 428–441.
Nixon, K.C., 1999. The parsimony ratchet, a new method for rapid parsimony analysis. Cladistics, 15, 407–414.
Nixon, K.C., 2002. Winclada, Version 100.08. Willi Hennig Society. URL: http://www.cladistics.org/.
Notredame, C., Holm, L., Higgins, D.G., 1998. COFFEE: an objective function for multiple sequence alignment. Bioinformatics, 14, 407–422.
Ogihara, Y., Terachi, T., Sasakuma, T., 1991. Molecular analysis of the hot spot region related to length mutations in wheat chloroplast DNAs. I. Nucleotide divergence of genes and intergenic spacer regions located in the hot spot region. Genetics, 129, 873–884.
Petersen, G., Seberg, O., 1997. Phylogenetic analysis of the Triticeae (Poaceae) based on rpoA sequence data. Mol. Phylogenet. Evol. 7, 217–230.
Petersen, G., Seberg, O., 2000. Phylogenetic evidence for excision of Stowaway miniature inverted-repeat transposable elements in Triticeae (Poaceae). Mol. Biol. Evol. 17, 1589–1596.
Petersen, G., Seberg, O., 2003. Phylogenetic analyses of the diploid species of Hordeum (Poaceae) and revised classification of the genus. Syst. Bot. 2, 293–306.
Petersen, G., Seberg, O., Aagesen, L., Frederiksen, S., 2004. An empirical test of the treatment of indels during optimization alignment based on the phylogeny of the genus Secale (Poaceae). Mol. Phylogenet. Evol. 30, 733–742.
Phillips, A., Janies, D., Wheeler, W.C., 2000. Multiple sequence alignment in phylogenetic analysis. Mol. Phylogenet. Evol. 16, 317–330.
Seberg, O., Frederiksen, S., 2001. A phylogenetic analysis of the monogenomic Triticeae (Poaceae) based on morphology. Bot. J. Linn. Soc. 136, 75–97.
Simmons, M.P., Ochoterena, H., 2000. Gaps as characters in sequence-based phylogenetic analyses. Syst. Biol. 49, 369–381.
Thompson, J.D., Gibson, T.J., Plewniak, F., Jeanmougin, F., Higgins, D.G., 1997. The ClustalX windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucl. Acids Res. 24, 4876–4882.
Thorne, J.L., Kishino, H., 1992. Freeing phylogenies from artifacts of alignment. Mol. Biol. Evol. 9, 1148–1162.
Vingron, M., von Haeseler, A., 1997. Towards integration of multiple alignment and phylogenetic tree construction. J. Comput. Biol. 4, 23–34.


Watermann, M.S., Smith, T.F., Beyer, W.A., 1976. Some biological sequence metrics. Adv. Math. 20, 367–387.
Wheeler, W.C., 1993. The triangle inequality and character analysis. Mol. Biol. Evol. 10, 707–712.
Wheeler, W.C., 1995. Sequence alignment, parameter sensitivity, and the phylogenetic analysis of molecular data. Syst. Biol. 44, 321–331.
Wheeler, W.C., 1996. Optimization alignment: the end of multiple sequence alignment in phylogenetics? Cladistics, 12, 1–9.
Wheeler, W.C., 1999a. Measuring topological congruence by extending character techniques. Cladistics, 15, 131–135.
Wheeler, W.C., 1999b. Fixed character states and the optimization of molecular sequence data. Cladistics, 15, 379–385.
Wheeler, W.C., 2001. Homology and the optimization of DNA sequence data. Cladistics, 17, S3–S11.

Wheeler, W.C., 2003a. Search-based optimization. Cladistics, 19, 348–355.
Wheeler, W.C., 2003b. Implied alignment: a synapomorphy-based multiple-sequence alignment method and its use in cladogram search. Cladistics, 19, 261–268.
Wheeler, W.C., Gladstein, D.S., DeLaet, J., Aaronson, L., 2003. POY, Version 3.0.11a. Available from ftp.amnh.org/pub/molecular/poy.
Wheeler, W.C., Hayashi, C., 1998. The phylogeny of the extant Chelicerate orders. Cladistics, 14, 173–192.
Yoder, A.D., Irwin, J.A., Payseur, B.A., 2001. Failure of the ILD to determine data combinability for slow loris phylogeny. Syst. Biol. 50, 408–424.
Young, N.D., Healy, J., 2003. GapCoder automates the use of indel characters in phylogenetic analysis. BMC Bioinformatics, 4, 6.