Improving Phylogenetic Tree Interpretability by Means of Evolutionary Algorithms

Francesco Cerutti1,2, Luigi Bertolotti1,2, Tony L. Goldberg3, and Mario Giacobini1,2

1 Department of Animal Production Epidemiology and Ecology, Faculty of Veterinary Medicine, University of Torino, Italy
2 Molecular Biotechnology Center, University of Torino, Italy
3 Department of Pathobiological Sciences, School of Veterinary Medicine, University of Wisconsin-Madison, USA
{francesco.cerutti,luigi.bertolotti,mario.giacobini}@unito.it, [email protected]

Abstract. A recent research article, entitled Taxon ordering in phylogenetic trees: a workbench test, presented the application of an evolutionary algorithm to order the taxa in a phylogenetic tree according to a given distance matrix. In previous articles, the authors introduced the first approaches to studying how the algorithm parameters influence the efficacy of finding the tree with the shortest distance among taxa, based on genetic distances. In the considered work, the authors tested the algorithm on three phylogenetic trees of different viruses, using genetic distances, geographic distances, and a combination of the two. The results were interesting, especially with geographic distances, which allow a new reading direction, orthogonal to the classical root-to-taxa one.

Keywords: Evolutionary algorithm, phylogenetic tree, taxon order.

1 Short Background in Phylogenetics

Evolutionary biology often makes use of phylogenetic trees to describe and infer the relationships among living organisms. A phylogenetic tree is a mathematical structure representing the evolutionary history of sequences or individuals. It consists of nodes connected by branches; the terminal nodes, representing the “leaves” of the tree, are called taxa. Internal nodes represent ancestors, and can be connected to many branches. Evolutionary information is contained in the tree topology: in other words, the relationship between two individuals is described by the pathway linking the two tips, along the branches, through the internal nodes. For this reason, the most important feature of a tree is its topology. There are several ways to draw a phylogenetic tree: the choice depends strictly on the aim of the analysis, but scientists often depict the tree topology as a cladogram or a phylogram. Basically, the tree is drawn following a horizontal direction, where the evolution is described by the pathway from the root of the tree to its tips.

M. Giacobini, L. Vanneschi, and W.S. Bush (Eds.): EvoBIO 2012, LNCS 7246, pp. 250–253, 2012. © Springer-Verlag Berlin Heidelberg 2012


Usually, the root of the tree is placed on the left side of the figure and the tips on the right. Given the above, the vertical order in which taxa are reported is not meaningful, since the tree is read only from root to tips and vice versa, with branch length as a measure of divergence. In fact, taxa belonging to an unresolved clade (where several taxa are linked to the same internal node) are often reported in the same order as in the original input file. This representation is hazardous, because a superficial reading of the tree could lead to wrong conclusions about the closeness among taxa. A first approach to reordering the taxa according to a distance matrix was proposed by Moscato, Cotta and colleagues [1,2]. In these works, the authors both build new phylogenies and improve existing ones generated by Neighbor Joining and hypercleaning methods. They approached the problem as a minimum Hamiltonian path problem, and used memetic algorithms to find the “solution that minimizes the length of a path of distances between species” [2].
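As a reading aid, the minimum Hamiltonian path objective can be written compactly. The notation is introduced here for illustration and is not taken from [1,2]: n is the number of taxa, D is the n × n distance matrix, and π is a permutation giving the order in which the taxa are visited.

```latex
\min_{\pi \in S_n} \; \sum_{i=1}^{n-1} D_{\pi(i),\,\pi(i+1)}
```

Each term is the distance between two taxa that are adjacent in the order, so the sum is the length of the path through all taxa. When an existing tree topology has to be preserved, the search would presumably be limited to the orders compatible with that topology rather than to all of S_n.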

2 Taxon Ordering in Phylogenetic Trees: A Workbench Test

A more recent article, entitled Taxon ordering in phylogenetic trees: a workbench test, published in BMC Bioinformatics [3], described the validation of an Evolutionary Algorithm (EA) to order taxa in a phylogenetic tree given a distance matrix. The idea behind this approach is the following: each internal node in a tree can be freely rotated without modifying the topology. In order to better represent the tree, one could group taxa with similar features, such as genetic similarity, geographic location or collection date, while preserving the original topology. This approach was intended to improve the interpretability of phylogenetic trees by including more information, especially in highly unresolved trees, and to assist in reading them correctly. In a previous work [4], the authors investigated the influence of the different parameters on the dynamics of the proposed algorithm. First, a simple (1 + 1)-EA was adopted, applying genetic distances for the fitness evaluation. The fitness was computed as the sum, for each taxon on the tree, of the distances to its r vertically closest taxa. The study showed that the parameter r, called the radius, drastically influenced the algorithm’s performance, and that a value of r = 8 could be a good choice for the fitness evaluation. When the results of the EA were compared with those of a random search, the former consistently outperformed the latter. Then, the study turned to the influence of the population size on the search dynamics. The best performance, coupled with the most consistent results, was obtained when applying (1 + 5)-EAs and (5 + 5)-EAs. After this first test to determine the effectiveness of the algorithm and its parameters, the authors validated the method by applying it to three different phylogenetic trees from the literature, using genetic distances, geographic distances, and a merge of the two.
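The search described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in [3,4]: the nested-list tree encoding, the function names, the choice of mutation (shuffling the children of one random internal node) and the reading of the radius as r positions above and r positions below each taxon are all assumptions made here.

```python
import copy
import random

# Assumed encoding: a leaf is a taxon name (str); an internal node is a list
# of children, listed in top-to-bottom drawing order.

def leaves(tree):
    """Return the taxa in the vertical order in which they would be drawn."""
    if isinstance(tree, str):
        return [tree]
    order = []
    for child in tree:
        order.extend(leaves(child))
    return order

def internal_nodes(tree):
    """Collect every internal node (list) of the tree."""
    if isinstance(tree, str):
        return []
    nodes = [tree]
    for child in tree:
        nodes.extend(internal_nodes(child))
    return nodes

def fitness(tree, dist, r):
    """Sum, over all taxa, of the matrix distances to the taxa lying within
    r positions above or below in the vertical order (lower is better).
    Assumption: this is one plausible reading of the radius-based fitness."""
    order = leaves(tree)
    total = 0.0
    for i, taxon in enumerate(order):
        lo, hi = max(0, i - r), min(len(order), i + r + 1)
        for j in range(lo, hi):
            if j != i:
                total += dist[taxon][order[j]]
    return total

def mutate(tree, rng):
    """Rotate one random internal node: its children are reordered, so the
    drawing changes while the topology (the set of clades) does not."""
    child = copy.deepcopy(tree)
    node = rng.choice(internal_nodes(child))
    rng.shuffle(node)
    return child

def one_plus_one_ea(tree, dist, r=8, generations=2000, seed=0):
    """(1 + 1)-EA: one parent, one offspring per generation; the offspring
    replaces the parent when its fitness is not worse."""
    rng = random.Random(seed)
    best, best_fit = tree, fitness(tree, dist, r)
    for _ in range(generations):
        candidate = mutate(best, rng)
        candidate_fit = fitness(candidate, dist, r)
        if candidate_fit <= best_fit:
            best, best_fit = candidate, candidate_fit
    return best, best_fit
```

Here `dist` is a nested dictionary such that `dist[a][b]` is the distance between taxa `a` and `b`. Under the same assumptions, a (1 + 5)-EA would generate five offspring per generation and keep the best individual among parent and offspring.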


When reordering the taxa, the trees obtained with geographic distances lent themselves to interesting interpretations. Fig. 1 summarizes the best trees obtained by reordering the taxa according to geographic distances and to their combination with the genetic ones. The pattern of the points’ distribution along the map, representing the State of sample collection, is the same as the one along the tree. Thus, the algorithm effectively reorders the taxa on the tree with respect to the distance matrix. The color distribution on the map and on the tree strongly helps the interpretation of the tree, adding a further layer of information.

Fig. 1. (a) Map representing the study area of USA and Mexico where VSV samples were collected, and (b) the original tree, as presented by Perez et al. [5]. The best trees obtained using the geographic (c) and combined genetic-geographic (d) distances. The dashed line in (d) highlights the “C” shape acquired by the clades (the figure is taken from [3]).

When the genetic distances are used, a recurrent reordering occurs, with the long branches pushed to the extremities of the tree. Since branch length is proportional to genetic distance, it is correct that the EA reduces the global distance in the tree by moving these branches to the extremities: the samples on them are the most divergent on the tree. Other very interesting results in reordering the taxa according to both genetic and geographic distances were obtained by applying the EA to a West Nile virus tree. The samples were collected from a small area in Cook County, IL, USA, and there was no evident relationship between genetic and geographic distance, although genetic variation was larger within sites than between different collection sites. These relationships were supported by the results reported in the considered article. In fact, while a grouping of samples collected from the same site was recorded with geographic-only and combined geographic-genetic distances, this grouping does not appear when applying genetic-only distances.
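The text does not spell out how the genetic and geographic matrices are merged. One simple possibility, shown here purely as an assumption and not as the procedure of [3], is to rescale each matrix to the range [0, 1] and take a weighted sum, so that neither measurement unit dominates. The function names and the `weight` parameter are hypothetical.

```python
def normalise(dist):
    """Scale a nested-dict distance matrix so that its largest entry is 1."""
    largest = max(max(row.values()) for row in dist.values())
    if largest == 0:
        # All distances are zero: nothing to rescale.
        return {a: dict(row) for a, row in dist.items()}
    return {a: {b: d / largest for b, d in row.items()}
            for a, row in dist.items()}

def combine(genetic, geographic, weight=0.5):
    """Weighted sum of the two normalised matrices (assumed merging rule).
    weight = 1 uses genetic distances only, weight = 0 geographic only."""
    gen, geo = normalise(genetic), normalise(geographic)
    return {a: {b: weight * gen[a][b] + (1 - weight) * geo[a][b]
                for b in gen[a]}
            for a in gen}
```

Both matrices are assumed to be indexed by the same taxon names; the combined matrix can then be used wherever a single distance matrix is expected.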

3 Conclusions

The work presented in the article Taxon ordering in phylogenetic trees: a workbench test [3] showed interesting results for aiding the interpretation of phylogenetic trees, offering a new reading direction, orthogonal to the classical root-to-taxa one. The preliminary results of the study were promising, even though the genetic information is already contained within the tree topology. Adding more information to the tree by using the geographic distance could provide strong support to the interpretation of phylogenetic trees. The recent development of tools for phylogeography underlines the increasing interest in understanding the relationship between genetic diversity and spatial distribution. The algorithm presented in the article and discussed here does not claim to be one of them, but is a simpler method to merge the information from genetic and spatial data.

References

1. Moscato, P., Buriol, L., Cotta, C.: On the analysis of data derived from mitochondrial DNA distance matrices: Kolmogorov and a traveling salesman give their opinion. In: Advances in Nature Inspired Computation: The PPSN VII Workshops 2002, pp. 37–38 (2002)
2. Cotta, C., Moscato, P.: A memetic-aided approach to hierarchical clustering from distance matrices: application to gene expression clustering and phylogeny. Biosystems 72, 75–97 (2003)
3. Cerutti, F., Bertolotti, L., Goldberg, T.L., Giacobini, M.: Taxon ordering in phylogenetic trees: a workbench test. BMC Bioinformatics 12, 58 (2011)
4. Cerutti, F., Bertolotti, L., Goldberg, T.L., Giacobini, M.: Taxon ordering in phylogenetic trees by means of evolutionary algorithms. BioData Mining 4, 20 (2010)
5. Perez, A.M., Pauszek, S.J., Jimenez, D., Kelley, W.N., Whedbee, Z., Rodriguez, L.L.: Spatial and phylogenetic analysis of vesicular stomatitis virus overwintering in the United States. Preventive Veterinary Medicine 93(4), 258–264 (2010)