Computer Note
MTDIS: A Computer Program to Estimate Genetic Distances Among Populations Based on Variation in Single-Copy DNA
R. G. Danzmann
Population geneticists interested in assessing genetic distances among conspecific populations with nuclear gene frequencies generally use one of several different distance measures (e.g., Cavalli-Sforza and Edwards 1967; Goldstein et al. 1995; Nei 1978; Rogers 1972; Shriver et al. 1995). A similar repertoire does not exist for the analysis of single-copy DNA sequences. Although some of these distance measures (e.g., Nei's D) have been used to compare single-copy DNA haplotypic divergence among populations, the direct implementation of such a measure may obscure important information on the evolutionary divergence among populations, because all DNA haplotypes are given equal weighting in the measure.
The program MTDIS implements a distance measure appropriate for single-copy DNA (e.g., mitochondrial DNA or chloroplast DNA) that is based upon the formula given in Danzmann et al. (1991). This distance measure takes into account the actual sequence divergence as well as the frequency composition of the DNA haplotypes in the populations being compared. In this respect the program is similar to some of the more recently proposed nuclear DNA distance measures, which recommend the incorporation of evolutionary distance weightings in algorithms comparing population divergence (Goldstein et al. 1995; Shriver et al. 1995).
Although MTDIS was initially used to analyze restriction site data, the present version will also accept DNA sequences as data. Data are input in the form of two files.
One file contains either restriction site data in the form of a presence/absence matrix or DNA sequence data for all haplotypes detected in the populations being examined. A second file gives the actual sample counts of individuals possessing different haplotypes in the populations surveyed. The algorithm used to analyze restriction site data and sequence data is identical, and simply compares the base pair (bp) percent differentiation in the two populations being compared. For DNA sequence data this is the frequency of base pair differences between haplotypes averaged over the frequency of all haplotype differences between the pair of populations being considered. Thus the measure is similar to a simple p estimation of sequence divergence. For restriction site data, the assumption is made that only 1 bp differs between two haplotypes when a site gain difference is observed with a particular restriction enzyme. Corrections are not made for homoplasies among sites or possible transition/transversion mutational biases in the DNA sequence data. Thus MTDIS is most appropriate for the analysis of relatively recently diverged conspecific populations and would not be suitable for interspecific phylogenetic comparisons.
Two types of data analysis are accommodated by MTDIS. First, MTDIS will produce distance measures among a given set of populations from the empirical restriction site matrix or DNA sequence matrix for the haplotypes detected. An upper right-hand matrix of distance measures is produced. MTDIS does not implement an NJ or UPGMA analysis on these distances. However, output from MTDIS is easily converted into a format that may be read by MEGA (Kumar et al. 1993) for the generation of a desired tree. The output may also be read by the NEIGHBOR program in PHYLIP 3.5 (Felsenstein 1993) after the appropriate population name designations are listed at the beginning of each row. Intra- (dw) and interpopulation (db) nucleotide diversities and nucleon diversities are also given in this file according to Nei (1987).
Second, MTDIS will also accept input from multiple randomizations (e.g., bootstrapping tests) of the starting restriction site or sequence files. These randomized datasets may be generated with the SEQBOOT program in PHYLIP. MTDIS produces multiple upper right-hand distance matrices from these datasets in a format that may be directly read by the NEIGHBOR program in PHYLIP. The tree file produced by NEIGHBOR may then be read by CONSENSE in PHYLIP to obtain the final confidence estimates on the branches of the MTDIS distance tree. Drs. Felsenstein and Kumar should be contacted to obtain copies of PHYLIP and MEGA, respectively.
A second program, SEQFLTR, is also provided. It takes complete DNA sequence information on several haplotypes and outputs a file suitable for direct input into MTDIS. This file contains listings only for the homologous polymorphic nucleotide sites, plus INDEL regions if desired. Monomorphic sites across all haplotypes are given at the bottom of the file and are used in the sequence divergence estimates.
MTDIS and SEQFLTR were written and compiled in Visual Basic for DOS and will run as stand-alone executables. Input files are read as ASCII text files and may be constructed with any text editor. The programs and an accompanying README file with instructions for use are available from R. G. Danzmann, Department of Zoology, University of Guelph, Guelph, Ontario, Canada N1G 2W1. You may obtain a copy either by sending a formatted disk or by anonymous ftp to site 131.104.50.2 using the password: danzmann. Then use the command GET to retrieve the file mtdis.zip, and unpack it with PKUNZIP. If you have any questions regarding the programs please contact me by e-mail at
[email protected].
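As a rough illustration of the sequence-data calculation described in this note, the following Python sketch averages simple p distances between haplotypes, weighted by haplotype frequencies, to obtain within- and between-population nucleotide diversities. It is not the MTDIS source code (which was written in Visual Basic for DOS), and it does not claim to reproduce the exact formula of Danzmann et al. (1991). The function names, the dictionary-based data layout, and the example haplotypes and counts are all hypothetical. The n/(n - 1) sample-size correction for the within-population value follows Nei (1987); whether MTDIS applies it in the same way is an assumption.

```python
def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two haplotypes differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("haplotype sequences must be aligned to equal length")
    diffs = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return diffs / len(seq_a)


def between_population_diversity(haplotypes, counts_x, counts_y):
    """Average p distance over all pairs of haplotypes, one drawn from each
    population, weighted by the haplotype frequencies in each population."""
    n_x = sum(counts_x.values())
    n_y = sum(counts_y.values())
    total = 0.0
    for hap_x, c_x in counts_x.items():
        for hap_y, c_y in counts_y.items():
            d = p_distance(haplotypes[hap_x], haplotypes[hap_y])
            total += (c_x / n_x) * (c_y / n_y) * d
    return total


def within_population_diversity(haplotypes, counts):
    """Frequency-weighted average p distance within one population, with the
    n/(n - 1) sample-size correction (assumed here, after Nei 1987)."""
    n = sum(counts.values())
    if n < 2:
        return 0.0
    total = 0.0
    for hap_i, c_i in counts.items():
        for hap_j, c_j in counts.items():
            d = p_distance(haplotypes[hap_i], haplotypes[hap_j])
            total += (c_i / n) * (c_j / n) * d
    return total * n / (n - 1)


# Hypothetical example data: three aligned 10-bp haplotypes and the
# number of individuals carrying each haplotype in two populations.
haplotypes = {
    "A": "AAAAAAAAAA",
    "B": "AAAAAAAAAT",
    "C": "AAAAAAAATT",
}
pop1 = {"A": 8, "B": 2}
pop2 = {"B": 5, "C": 5}

d_b = between_population_diversity(haplotypes, pop1, pop2)
d_w1 = within_population_diversity(haplotypes, pop1)
d_w2 = within_population_diversity(haplotypes, pop2)

print(f"between-population diversity: {d_b:.4f}")
print(f"within-population diversity, pop1: {d_w1:.4f}")
print(f"within-population diversity, pop2: {d_w2:.4f}")
```

Because every pair of haplotypes is weighted by its sequence divergence, two populations that share no haplotypes but carry closely related ones come out as less distant than two populations carrying highly divergent haplotypes. A measure that weights all haplotypes equally cannot make this distinction.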
From the Department of Zoology, University of Guelph, Guelph, Ontario, Canada N1G 2W1, or e-mail:
[email protected]. © 1998 The American Genetic Association
References
Cavalli-Sforza LL and Edwards AWF, 1967. Phylogenetic analysis: models and estimation procedures. Am J Hum Genet 19:233–257.
Danzmann RG, Ferguson MM, Skulason S, Snorrason SS, and Noakes DLG, 1991. Mitochondrial DNA diversity among four sympatric morphs of Arctic char, Salvelinus alpinus L., from Thingvallavatn, Iceland. J Fish Biol 39:649–659.
Felsenstein J, 1993. PHYLIP (phylogeny inference package), version 3.5c. Seattle: University of Washington.
Goldstein DB, Ruiz-Linares A, Cavalli-Sforza LL, and Feldman MW, 1995. Genetic absolute dating based on microsatellites and the origin of modern humans. Proc Natl Acad Sci USA 92:6723–6727.
Kumar S, Tamura K, and Nei M, 1993. MEGA: molecular evolutionary genetics analysis, version 1.01. University Park: Pennsylvania State University.
Nei M, 1978. Estimation of average heterozygosity and genetic distance from a small number of individuals. Genetics 89:583–590.
Nei M, 1987. Molecular evolutionary genetics. New York: Columbia University Press.
Rogers JS, 1972. Measures of genetic similarity and genetic distance. In: Studies in Genetics VII. Austin: University of Texas.
Shriver MD, Jin L, Boerwinkle E, Deka R, Ferrell RE, and Chakraborty R, 1995. A novel measure of genetic distance for highly polymorphic tandem repeat loci. Mol Biol Evol 12:914–920.
The Journal of Heredity 1998:89(3)
Received January 17, 1997 Accepted September 29, 1997 Corresponding Editor: Robert Angus