HYBRIDLAB: A program for generating simulated ...

HYBRIDLAB (version? 1.1): A program for generating simulated hybrids from population samples

Einar Eg Nielsen, Lars Arve Bach and Piotr Kotlicki

The study of natural hybridisation and hybrid zones has been the focus of much attention in evolutionary biology (Endler 1977; Barton & Hewitt 1985; Harrison 1990; Burke & Arnold 2001). Likewise, studies of artificial hybridisation between domesticated animals and their wild conspecifics have been a priority in relation to maintaining the integrity of wild species and populations (e.g. Beaumont et al. 2001; Hansen et al. 2001; Olden 2004). Identification of hybrids between highly diverged taxa is relatively straightforward, relying on fixed differences at genetic markers. However, for most organisms intraspecific genetic differentiation rarely includes fixed differences. Accordingly, identification of hybrids has to be probabilistic. Several methods have been advanced for identifying hybrid individuals (see Anderson & Thompson 2002 and references therein). Recently, however, model-based Bayesian statistical techniques have gained popularity because of their superior power for identifying purebred and hybrid individuals (Pritchard et al. 2000; Anderson & Thompson 2002). A recent evaluation of the general efficiency of model-based Bayesian methods can be found in Vähä & Primmer (2006). Two methods are commonly used: STRUCTURE (Pritchard et al. 2000), which estimates the proportion of an individual's genotype originating from each of a set of potentially hybridizing taxa, and NEWHYBRIDS (Anderson & Thompson 2002), which estimates the probability of an individual belonging to a specific hybrid (or parental) class. However, an intrinsic problem with all Bayesian analyses is that the validity of the assumed prior distribution(s) cannot be assessed statistically (Gelman et al. 1995). Consequently, a simulation approach has to be implemented for each specific dataset for a case-by-case evaluation of the statistical power for correctly identifying individuals as purebred or hybrids.
A prerequisite for the application of the simulation approach is software that creates artificial parental and hybrid genotypes, and for that purpose we have developed HYBRIDLAB. The principle of the program is simple. From standard population genetic data of individual multilocus genotypes, the program first estimates allele frequencies at each locus in each of the parental populations specified in the input file. Multilocus F1 hybrid genotypes are then created by randomly drawing, according to the estimated frequency distributions, one allele at each locus from each of two user-specified hybridizing populations. Linkage equilibrium, neutrality of markers and random mating are assumed. The program file, documentation and test file can be downloaded from: http://www.difres.dk/ffi/uk/populationgenetic/hybridlab/index.asp. The program is written in blab la and has to run in a java bla bla.

When opening the program the user is prompted to open an input file (Figure 1 “the interface”). HYBRIDLAB uses the input file specifications of GENEPOP (Raymond & Rousset 1995), which ranks among the most commonly applied formats for population genetic data. Information on the data input format can be found at http://wbiomed.curtin.edu.au/genepop/. Additionally, a large number of population genetic programs can convert to or from GENEPOP files. When an input file in the correct format has been opened successfully, the user is asked to choose parameters. First, the user selects the two parental populations between which F1 hybrids are to be created. Second, it is possible to “flag” simple statistical information on the total number of occurrences, frequency of occurrence and cumulative probability of each allele at each locus in the two parental populations, which will be written to the output window. Finally, the number of required new F1 hybrids (maximum?) should be given. A few seconds after pressing START, the multilocus genotypes of the hybrids, as well as the optional statistical information, will appear in the output window in GENEPOP format. Clicking “Save a File” saves the data as a standard text file (tab, space delimited?).

If the user requires other simulated individuals, such as parentals, F2 hybrids or backcrosses, the procedures are simple. For simulated parentals, use the same population as parent one and parent two. For F2 hybrids, use the output F1 hybrids as parent one and parent two. For backcrosses, use simulated parentals as parent one and simulated F1 hybrids as parent two.

Since the program does not take sample sizes into account when estimating allele frequencies, it is advisable to avoid small and unequal sample sizes, in particular for highly variable loci such as microsatellites, in order to reduce the number of unsampled alleles within populations. However, standard population genetic sample sizes of 30-50 individuals should suffice to avoid severely biased results. In any case, the potential bias of small sample sizes can also be assessed by simulation, comparing hybrid indices for simulated and real individuals of known hybrid class.

An example of the application of HYBRIDLAB to generate simulated parental and hybrid individuals for hybridization analysis can be found in Nielsen et al. (2003).
Using nine highly variable microsatellites, they reinvestigated the classical study by Sick (1965) demonstrating the Wahlund principle for hemoglobin genotypes in Baltic cod (Gadus morhua L.). Like Sick, they found intermediate allele frequencies for cod in the transition area between the North Sea and the Baltic Sea, but they were unable to assess whether the intermediate allele frequencies were caused by mechanical mixing or hybridization because of the lack of statistical power of the HW tests. They estimated the overall proportions of genetic contribution from the North Sea and Baltic Sea cod populations to each sample within the transition area and found significant contributions from both. Individual admixture proportions estimated with STRUCTURE for the real transition area samples showed a large number of intermediate (hybrid) genotypes. Two simulated samples of equal size to the real transition area samples were created using an earlier prototype of HYBRIDLAB: one consisting of a mixture of simulated North Sea and Baltic Sea parental genotypes and one consisting of a simulated hybrid swarm (random mating). The simulations clearly showed that the hybrid swarm scenario matched the real population samples and confirmed that a hybrid zone was indeed the most parsimonious explanation of cod population structure in that area. The application of HYBRIDLAB is, naturally, not limited to evaluating Bayesian methods; it can be used for a number of purposes where individual-based methods are applied.
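The difficulty of separating mechanical mixing from hybridization with HW tests, noted in the cod example above, can be illustrated with a small calculation of the Wahlund effect. The sketch below is not part of HYBRIDLAB or of the analysis in Nielsen et al. (2003). It assumes a single biallelic locus, two source populations that are each in Hardy-Weinberg proportions, and hypothetical function names and allele frequencies chosen only for illustration.

```python
def heterozygosity_mixture(p1, p2, w):
    """Expected heterozygote frequency in a mechanical mixture.

    A fraction w of the sample comes from population 1 (allele frequency p1)
    and 1 - w from population 2 (allele frequency p2); no interbreeding.
    """
    return w * 2 * p1 * (1 - p1) + (1 - w) * 2 * p2 * (1 - p2)


def heterozygosity_random_mating(p1, p2, w):
    """Expected heterozygote frequency if the pooled sample mated at random."""
    p_bar = w * p1 + (1 - w) * p2  # pooled allele frequency
    return 2 * p_bar * (1 - p_bar)


def wahlund_deficit(p1, p2, w):
    """Heterozygote deficit of the mixture relative to random mating.

    Algebraically this equals 2 * w * (1 - w) * (p1 - p2) ** 2.
    """
    return heterozygosity_random_mating(p1, p2, w) - heterozygosity_mixture(p1, p2, w)


# Hypothetical allele frequencies: strongly vs weakly differentiated populations.
strong = wahlund_deficit(0.8, 0.2, 0.5)
weak = wahlund_deficit(0.55, 0.45, 0.5)
print(f"deficit, strong differentiation: {strong:.3f}")
print(f"deficit, weak differentiation:   {weak:.3f}")
```

Because the deficit scales with the squared difference in allele frequencies, weakly differentiated populations produce only a small departure from HW proportions in a mixed sample, which is one reason individual-based comparisons against simulated mixtures and hybrid swarms can be more informative.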


The generation of artificial hybrids between pairs of potential parental populations was done using the program HYBRIDLAB v 0.9 (www.evalife.dk/applications/hybridlab). The program creates multilocus F1 hybrid genotypes between two populations. Alleles are drawn randomly as a function of their calculated frequency distributions, assuming linkage equilibrium, neutrality of markers and random mating.
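The drawing procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the HYBRIDLAB source code: the function names, the in-memory genotype representation (a list of per-locus allele pairs rather than a GENEPOP file) and the toy populations `north` and `baltic` are all hypothetical. Loci are drawn independently, which corresponds to the assumption of linkage equilibrium.

```python
import random
from collections import Counter


def allele_frequencies(genotypes):
    """Estimate allele frequencies at each locus from diploid genotypes.

    genotypes: list of individuals, each a list of (allele, allele) tuples,
    one tuple per locus. None marks a missing allele and is skipped.
    """
    freqs = []
    for locus in range(len(genotypes[0])):
        counts = Counter(
            allele
            for individual in genotypes
            for allele in individual[locus]
            if allele is not None
        )
        total = sum(counts.values())
        freqs.append({allele: n / total for allele, n in counts.items()})
    return freqs


def draw_gamete(freqs, rng):
    """Draw one allele per locus in proportion to its estimated frequency."""
    return [
        rng.choices(list(f.keys()), weights=list(f.values()), k=1)[0]
        for f in freqs
    ]


def simulate_offspring(parent_one, parent_two, n, rng=None):
    """Create n genotypes with one allele per locus from each parental sample."""
    if rng is None:
        rng = random.Random()
    f_one = allele_frequencies(parent_one)
    f_two = allele_frequencies(parent_two)
    return [
        list(zip(draw_gamete(f_one, rng), draw_gamete(f_two, rng)))
        for _ in range(n)
    ]


# Toy data (hypothetical): two individuals per population, two loci each.
north = [[(100, 102), (200, 200)], [(100, 100), (200, 204)]]
baltic = [[(110, 110), (210, 212)], [(110, 112), (212, 212)]]

rng = random.Random(1)
f1 = simulate_offspring(north, baltic, 5, rng)          # F1 hybrids
sim_north = simulate_offspring(north, north, 5, rng)    # simulated parentals
f2 = simulate_offspring(f1, f1, 5, rng)                 # F2 hybrids
backcross = simulate_offspring(sim_north, f1, 5, rng)   # backcrosses
```

The same function covers the other classes in the way the text describes: the same population as both parents gives simulated parentals, the F1 output as both parents gives F2 hybrids, and simulated parentals crossed with F1 hybrids give backcrosses. As in the program, alleles absent from the input samples can never be drawn, which is why small samples of highly variable loci are best avoided.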
