Copyright 2001 by the Genetics Society of America
Heterosis, Marker Mutational Processes and Population Inbreeding History

Anne Tsitrone,* François Rousset† and Patrice David*

*Centre d’Ecologie Fonctionnelle et Evolutive, Centre National de la Recherche Scientifique, 34293 Montpellier Cedex 5, France and †Institut des Sciences de l’Evolution de Montpellier, Université Montpellier II, 34095 Montpellier, France

Manuscript received May 23, 2001
Accepted for publication October 1, 2001

ABSTRACT

Genotype-fitness correlations (GFC) have previously been studied using allozyme markers and have often focused on short-term processes such as recent inbreeding. Thus, models of GFC usually neglect marker mutation and only use heterozygosity as a genotypic index. Recently, GFC have also been reported (i) with DNA markers such as microsatellites, characterized by high mutation rates and specific mutational processes and (ii) using new individual genotypic indices assumed to be more precise than heterozygosity. The aim of this article is to evaluate the theoretical impact of marker mutation on GFC. We model GFC due to short-term processes generated by the current breeding system (partial selfing) and to long-term processes generated by past population history (hybridization). Various mutation rates and mutation models corresponding to different kinds of molecular markers are considered. Heterozygosity is compared to other genotypic indices designed for specific marker types. Highly mutable markers (such as microsatellites) are particularly suitable for the detection of GFC that evolve in relation to short-term processes, whereas GFC due to long-term processes are best observed with intermediate mutation rates. Irrespective of the marker type and population scenario, heterozygosity usually provides higher correlations than other genotypic indices under most biologically plausible conditions.
Corresponding author: Anne Tsitrone, CEFE-CNRS, 1919 Rte. de Mende, 34293 Montpellier Cedex 5, France. E-mail: [email protected]
Genetics 159: 1845–1859 (December 2001)

The existence of correlations between individual genotypes at marker loci and fitness-related traits has caused much debate among evolutionary biologists. Such correlations were initially used as an argument in favor of selection acting on the maintenance of allozyme polymorphisms in the controversy that has historically opposed selectionists and neutralists (David 1998). Allozymes have been used for decades to detect correlations between multilocus heterozygosity (the number of heterozygous marker loci per individual) and fitness-related traits such as growth, viability, or physiological parameters (reviewed in Mitton and Grant 1984; David 1998). Such positive heterozygosity-fitness correlations (HFC) have been reported for various organisms, including marine bivalves (Zouros et al. 1988), salmonid fishes (Leary et al. 1983), and pine trees (Ledig 1986). HFC have recently been reported using restriction fragment length polymorphism (RFLP) markers (Pogson and Fevolden 1998) and microsatellite markers (Bierne et al. 1998, 2000a). The observation of significant HFC with noncoding DNA markers makes it clear that at least some of the correlations are not due to direct effects of the marker genes on the phenotype. Associative overdominance refers to any kind of HFC not due to a direct effect of marker genes, but to genetic associations between the markers and fitness genes (David 1998). The first kind
of association is linkage disequilibrium due to genetic drift (correlation of allelic state within gametes). This has been identified as a possible cause of HFC in small populations when there is physical linkage between fitness genes and marker genes (Ohta and Kimura 1970; Pamilo and Palsson 1998). The second kind of association, identity disequilibrium due to variance in inbreeding (correlation of homozygosity between loci across the whole genome), has been identified as a major source of HFC in several theoretical models (Ohta and Cockerham 1974; Charlesworth 1991; Zouros 1993). Here we focus on this second process. Variance in inbreeding generates HFC because more inbred individuals are both more homozygous for their marker loci and less fit due to inbreeding depression. HFC may thus be a powerful tool to analyze the fitness consequences of inbreeding (Charlesworth and Charlesworth 1999; Pemberton et al. 1999). However, inbreeding itself may be caused by very different population processes. The most obvious is the mating system, which generates “short-term” inbreeding, i.e., inbreeding caused by one or a few generations of consanguineous matings. This could explain HFC in large, partially selfing populations of pine trees (Ledig 1986; Bush and Smouse 1991). “Long-term” inbreeding, on the other hand, involves both recent coalescence events and coalescence events deeper in the pedigree. For example, when two isolated populations come into contact, hybrid offspring are more “outbred” than nonhybrid offspring. In this case, inbreeding may have built up during a long history of isolation of the parental source populations. Such a scenario was invoked to explain the HFC detected in the red deer population of the Isle of Rum (Coulson et al. 1998) and in the harbor seal population breeding on Sable Island (Coltman et al. 1998; Pemberton et al. 1999).

Previous models of associative overdominance implicitly neglect marker mutation, because they focus on short-term inbreeding and on markers with low mutation rates (e.g., allozymes). Now that highly mutable markers (i.e., microsatellites) and long-term scenarios are included in HFC studies, a theoretical assessment of the importance of mutation is needed. This approach was implicitly followed by Coulson et al. (1998) who proposed the use of a new individual genotypic index, rather than heterozygosity, to account for the marker mutation properties (in their case, microsatellites). The model usually assumed for microsatellites is stepwise mutation (one repeat unit added or removed in each mutation event; Ohta and Kimura 1973; Valdès et al. 1993). This mutation model suggested the definition of an index, $d^2$, the squared difference in repeat units between the two alleles of an individual (Coulson et al. 1998), whose distribution is closely related to the distribution of coalescence times under such a mutation model (Pritchard and Feldman 1996). Empirical studies have sometimes found $d^2$ to correlate with fitness traits in samples where heterozygosity does not correlate significantly with these traits (Coltman et al. 1998; Coulson et al. 1998). It has thus been suggested that heterozygosity is suitable for detecting short-term inbreeding, whereas $d^2$ provides additional information about long-term inbreeding, due to the mixture of formerly isolated subpopulations (Pemberton et al. 1999; Coulson et al. 1999; Marshall and Spalton 2000). In this article we assess the significance of these arguments using a theoretical approach.
The influence of marker mutation on genotype-fitness correlations (GFC) due to inbreeding is investigated by comparing (i) different mutation models and mutation rates, (ii) different population inbreeding histories involving various time scales, and (iii) different genotypic indices related to the mutational processes of the marker.

THEORY
The model: The rationale of the model is as follows: Population history can generate variance in inbreeding among individuals, depending on individual pedigrees. The population can thus be partitioned into “inbreeding classes.” First, inbreeding levels are associated with genotypes at neutral markers. The latter can be obtained from identity-in-state (IIS) relationships among marker alleles for a given marker mutation model and population scenario. Second, in the presence of inbreeding depression, the inbreeding level determines the value of the fitness trait. Thus, individual phenotype and genotype correlate through individual variation in the inbreeding level. For each mutation model, genotypic
index, and population scenario assumed, the correlation coefficient $\rho(X, W)$ between $X$ (a given index) and the fitness trait $W$ is derived. As

$$\rho(W, X) = \frac{\mathrm{cov}(W, X)}{\sigma(W)\,\sigma(X)}, \tag{1}$$

three moments must be computed: the covariance of the fitness trait and $X$, $\mathrm{cov}(W, X)$, and the variances $\sigma^2(W)$ and $\sigma^2(X)$. Below, we describe how analytical expressions for these moments can be obtained. These computations rely on IIS probabilities that depend on the mutation model assumed. Mutation models for the marker genes and corresponding indices are described in the next section.

Mutation models and individual genotypic indices: In a second step we develop theoretical models that approximate mutational processes at DNA markers, such as microsatellites, RFLP markers, or neutral DNA sequences. Only the first two categories of markers have been used in GFC studies, but the third kind might also be used in the future.

Stepwise mutation model: Under a strict stepwise mutation model (SMM; Ohta and Kimura 1973) with mutation rate $u$, an allele with $i$ repeat units is assumed to mutate only to the $i - 1$ or the $i + 1$ states, each with probability $u/2$ per generation. This model is classically taken as an approximation for mutation at microsatellite marker loci, for which $u$ ranges from $10^{-6}$ to $10^{-2}$ per generation (Jarne and Lagoda 1996; Estoup and Angers 1998). Two indices are considered. The first is individual heterozygosity $H$, which takes the value 0 when the marker locus is homozygous and 1 when it is heterozygous. The second is $d^2$, the squared difference in repeat units between the two alleles of an individual (Coulson et al. 1998).

K-alleles model: Under a model first formulated by Crow and Kimura (1970), there is a finite number $K$ of possible allelic states, and each allele can mutate to any other at rate $u/(K - 1)$. A two-allele model can be taken as an approximation for mutation at an RFLP locus, as only two alleles (“cut” and “uncut”) need to be distinguished, and for single nucleotide polymorphisms (SNPs; Kuhner et al. 2000). We assume the mutation rate of these DNA markers to be fairly low, even though no direct data are available.
Estimates of the substitution rate per nucleotide site and per generation are typically of the order of $10^{-8}$ or less, depending on the organism under consideration (Li 1997; Drake et al. 1998; Nachman and Crowell 2000). For a SNP site, $u$ should thus be at most $10^{-8}$ per generation. Assuming that an RFLP polymorphism is due to nucleotide substitutions rather than to indels and that the RFLP locus is composed of few nucleotides, the mutation rate for an RFLP should be of the order of $10^{-8}$ or $10^{-7}$ per locus per generation. Although mutational processes affecting RFLP loci may be asymmetric, destroying any particular polymorphism more often than recreating it, we do not expect our model to be very sensitive to this asymmetry, given that mutation rates will be low (but this remains to be tested). The only index studied for this kind of marker locus is individual heterozygosity.

Infinite-alleles and infinite-sites models: Under the infinite-alleles model (IAM; Kimura and Crow 1964), each mutation event produces a new allele at rate $u$ per generation. The infinite-sites model (ISM; Kimura 1969) shares the same property but is more specifically designed for DNA sequences. In this model the number of nucleotide sites in the sequence is assumed to be so large that each new substitution occurs at a site that has not mutated before. The total mutation rate of the sequence is $u = l\mu$, where $l$ is the sequence length in base pairs and $\mu$ is the substitution rate per nucleotide site per generation (see above). The first index considered is sequence heterozygosity $H$, which takes the value 0 when the two alleles of the marker are strictly identical in sequence and 1 when there is at least one different nucleotide site between the two allelic states. Note that analytical derivations of heterozygosity under the ISM and the IAM with mutation rate $u$ are strictly equivalent. For neutral DNA sequences, the number of nucleotide differences between the two alleles of an individual, denoted by $p$, is also used as an alternative index.

The partial selfing model and single-locus heterozygosity-fitness correlation: We investigate a simple scenario in which HFC relates to the current mating system of the population. This scenario exemplifies short-term inbreeding. Consider a large population of size $n$ at inbreeding equilibrium, with freely recombining loci and one marker locus with a given mutation model. A proportion $S$ of offspring is produced by selfing at each generation, whereas a proportion $1 - S$ comes from outcrossing events.
Each individual is characterized by the number of generations of selfing in its pedigree, $k$, starting from the most recent outcrossing event ($0 \le k \le \infty$). The population is thus partitioned into inbreeding classes $C_k$, $0 \le k \le \infty$, each consisting of individuals having the same inbreeding level (i.e., the same $k$). Note that, under this model, there is an infinite number of inbreeding classes, although in practice only classes with low $k$ are well represented. In the absence of selection, inbreeding class $k$ has frequency $\Pr(C_k) = (1 - S)S^k$. Selection against homozygous genomes reduces the frequency of classes with high $k$. The effect of selection thus resembles a reduction in the selfing rate $S$ to a lower value $S_{sel}$ that can be computed as explained in David 1999 (appendix a). In practice, David (1999) shows that frequencies of inbreeding classes computed using $S_{sel}$ instead of $S$ provide a good approximation of the frequencies with selection. In what follows, we therefore take into account selection against homozygous genomes simply by replacing the “raw” $S$ value by the value of $S_{sel}$.

Let us define the probabilities of IIS within a population: $Q_0$ for a pair of genes drawn from the same individual and $Q_1$ for genes from different individuals. In what follows, moments are expressed as functions of $Q_0$ and $Q_1$ that are derived in appendix a under each mutation model. The inbreeding coefficient in class $k$, $f_k$, is defined by

$$f_k = \frac{Q_{|k} - Q_1}{1 - Q_1}, \tag{2}$$
where $Q_{|k}$ is the probability of identity of genes in an individual of class $C_k$. Under a general inbreeding load model assuming additive effects of fitness genes (Morton et al. 1956) the value of a fitness trait $W$ in a $C_k$ individual can be written as

$$W_{|k} = W_0 - \delta f_k + \varepsilon, \tag{3}$$

where $W_0$ is the value of the fitness trait for outbred individuals (i.e., belonging to $C_0$); $\delta$ is the inbreeding load (i.e., the fitness reduction among completely inbred individuals as compared to outbred individuals); and $\varepsilon$ is a random effect with mean 0 and variance $\sigma^2_\varepsilon$, assumed to be independent of $k$. Moments of the fitness trait $W$ are, based on Equation 3,

$$E(W) = W_0 - \delta E(f), \tag{4a}$$

$$\sigma^2(W) = \delta^2 \sigma^2(f) + \sigma^2_\varepsilon, \tag{4b}$$

where $E(f)$ and $\sigma^2(f)$ are the mean and variance of $f_k$ over all $k$-values. Moments of heterozygosity $H$ are simply expressed as functions of probabilities of IIS:

$$E(H) = 1 - Q_0, \tag{5a}$$

$$\sigma^2(H) = Q_0(1 - Q_0). \tag{5b}$$
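As a brief check on Equations 4b and 5b, both variances follow from standard identities. The derivation below is a sketch that assumes only Equation 3 (with the random effect independent of the class $k$) and the fact that $H$ takes the values 0 or 1; the conditional-variance decomposition is a general result, not something specific to this model.

```latex
\begin{align*}
\sigma^2(W) &= \operatorname{Var}_k\!\big(E(W \mid k)\big) + E_k\!\big(\operatorname{Var}(W \mid k)\big) \\
            &= \operatorname{Var}_k(W_0 - \delta f_k) + \sigma^2_\varepsilon
             = \delta^2 \sigma^2(f) + \sigma^2_\varepsilon, \\[4pt]
\sigma^2(H) &= E(H^2) - E(H)^2 = E(H) - E(H)^2 \qquad (\text{since } H^2 = H) \\
            &= (1 - Q_0) - (1 - Q_0)^2 = Q_0 (1 - Q_0).
\end{align*}
```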
Within a given inbreeding class $C_k$, heterozygosity is not correlated with the fitness trait so that, conditioning on the inbreeding class $k$,

$$\mathrm{Cov}(H, W) = \sum_k E(H|C_k)\,E(W|C_k)\,P(C_k) - E(H)E(W), \tag{6a}$$

where $E(H|C_k) = 1 - Q_{|k}$, and $E(W|C_k) = W_0 - \delta f_k$. First- and second-order moments of $f_k$ and $Q_{|k}$ are needed to compute Equations 4–6. Neglecting mutation since the most recent outcrossing event, $1 - Q_{|k} = (1 - Q_1)/2^k$, so that $f_k = 1 - (1/2)^k$. This yields

$$E(f) = F_{IS} = \frac{Q_0 - Q_1}{1 - Q_1} = \frac{S}{2 - S}$$

and

$$\sigma^2(f) = \frac{4S(1 - S)}{(4 - S)(2 - S)^2},$$

where $F_{IS}$ is the classical F-statistic (Wright 1951). Note that $F_{IS}$ can be estimated directly using the heterozygote deficiency (Weir and Cockerham 1984). Finally,
$$\mathrm{Cov}(H, W) = \delta(1 - Q_1)\,\sigma^2(f) = \delta(1 - Q_1)\,\frac{4S(1 - S)}{(4 - S)(2 - S)^2}. \tag{6b}$$

Note that $\mathrm{cov}(H, W)$ is null for $S = 0$ and $S = 1$, since there is then no variance in the inbreeding level among individuals. Equations 4b, 5b, and 6b can now be used to derive the correlation coefficient between heterozygosity and the fitness trait from (1). The maximum correlation coefficient ($\rho_{\max}$), assuming no within-inbreeding class variance for the fitness trait ($\sigma^2_\varepsilon = 0$ in Equation 3), is

$$\rho^2_{\max} = \frac{2(1 - Q_1)S}{(2Q_1(1 - S) + S)(4 - S)}. \tag{7a}$$

$\rho^2_{\max}$ does not depend on the inbreeding load $\delta$, but only on the marker genetic diversity $1 - Q_1$ (see appendix a for an analytical expression of $Q_1$ under each mutation model) and on the selfing rate $S$. In the presence of within-pedigree variance for the fitness trait ($\sigma^2_\varepsilon > 0$), $\rho^2$ is

$$\rho^2 = \frac{\rho^2_{\max}}{1 + \sigma^2_\varepsilon/(\delta^2\sigma^2(f))}. \tag{7b}$$
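The closed forms above can be checked numerically. The following is a minimal sketch, not the authors' Mathematica code: it sums over a truncated set of inbreeding classes with $\Pr(C_k) = (1 - S)S^k$ and $f_k = 1 - (1/2)^k$, rebuilds the squared correlation from the moments via Equation 1, and compares it with Equations 7a and 7b. The function names, the truncation point `kmax`, and the example values of `Q1`, `S`, `delta`, and `sigma_eps2` are illustrative assumptions; in the article $Q_1$ comes from appendix a, whereas here it is simply passed in.

```python
from math import isclose


def inbreeding_moments(S, kmax=500):
    """Mean and variance of f_k over the inbreeding classes C_k.

    Uses Pr(C_k) = (1 - S) * S**k and f_k = 1 - (1/2)**k.
    The sum is truncated at kmax (an assumption of this sketch), which is
    accurate only when S**kmax is negligible, i.e. S not too close to 1.
    """
    mean = 0.0
    second = 0.0
    for k in range(kmax + 1):
        p_k = (1.0 - S) * S**k
        f_k = 1.0 - 0.5**k
        mean += p_k * f_k
        second += p_k * f_k**2
    return mean, second - mean**2


def rho2_from_moments(Q1, S, delta, sigma_eps2=0.0):
    """Squared correlation between H and W built from Equations 1 and 4-6."""
    mean_f, var_f = inbreeding_moments(S)
    Q0 = Q1 + (1.0 - Q1) * mean_f          # from F_IS = (Q0 - Q1) / (1 - Q1)
    var_H = Q0 * (1.0 - Q0)                # Equation 5b
    var_W = delta**2 * var_f + sigma_eps2  # Equation 4b
    cov_HW = delta * (1.0 - Q1) * var_f    # Equation 6b
    return cov_HW**2 / (var_W * var_H)


def rho2_max_closed_form(Q1, S):
    """Equation 7a."""
    return 2.0 * (1.0 - Q1) * S / ((2.0 * Q1 * (1.0 - S) + S) * (4.0 - S))


def rho2_closed_form(Q1, S, delta, sigma_eps2):
    """Equation 7b, using the closed-form variance of f."""
    var_f = 4.0 * S * (1.0 - S) / ((4.0 - S) * (2.0 - S) ** 2)
    return rho2_max_closed_form(Q1, S) / (1.0 + sigma_eps2 / (delta**2 * var_f))


if __name__ == "__main__":
    Q1, S, delta = 0.5, 0.4, 1.0  # illustrative values only
    print(rho2_from_moments(Q1, S, delta), rho2_max_closed_form(Q1, S))
    print(rho2_from_moments(Q1, S, delta, 0.05),
          rho2_closed_form(Q1, S, delta, 0.05))
```

Changing `delta` while keeping `sigma_eps2` at 0 leaves the result unchanged, which mirrors the statement that the maximum correlation does not depend on the inbreeding load.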
Extension of the partial selfing model to other genotypic indices: We first focus on the squared difference in repeat units for a microsatellite marker under the stepwise mutation model. $\mathrm{Cov}(d^2, W)$ is derived as in (6a), replacing $H$ by $d^2$. Moments of $d^2$ are obtained as detailed in appendix a (Equation A7). This yields

$$\mathrm{Cov}(d^2, W) = \delta\,\frac{8(1 - S)S\,un}{(4 - S)(2 - S)} \quad \text{for } S \gg 1/n, \tag{8}$$

and thus

$$\rho^2_{\max} = \frac{4S(1 - S)n^2u}{(4 - S)\left(1 + 11u + 4n^2u(5 - 7S + 2S^2) + n(1 - S)(1 + 19u)\right)}, \tag{9}$$

and $\rho^2$ is as in (7b). We then focus on another genotypic index: the number of nucleotide differences for a neutral sequence under the infinite-sites model of mutation. The correlation coefficient between $p$ and $W$ under the ISM is computed similarly to the correlation coefficient between $d^2$ and $W$. Moments of $p$ are derived in appendix a (Equation A5) and $\mathrm{Cov}(p, W)$ is as in (8). We obtain

$$\rho^2_{\max} = \frac{4S(1 - S)n^2u}{(4 - S)\left(1 + u + 4n^2u(1 - S) + n(1 - S)(1 + u)\right)} \quad \text{for } S \gg 1/n, \tag{10}$$

and $\rho^2$ is as in (7b).

The population admixture model (hybridization): We next consider a simple situation where the HFC is related to long-term processes. We assume a large, random mating population of size $N$ at mutation-selection equilibrium, which splits into two randomly mating finite subpopulations of size $n \ll N$. Let the two subpopulations diverge for a long time $\tau$ (assumed to be equal to $N$ generations for the sake of simplicity) without any gene flow between them and then merge into a single, infinite, panmictic population. Given enough divergence time, different deleterious mutations inherited from the ancestral population will reach different frequencies in each subpopulation. We investigate the genotype-phenotype relationship in the resulting mixed population. Each individual is characterized by the probability $x$ that the two alleles it has at a given locus originate from the same subpopulation, depending on the number $g$ of panmictic generations following the admixture. The quantity $1 - x$ can be interpreted as the individual “degree of outbreeding,” i.e., the “disparity between the genome of the two parents” (Coulson et al. 1999). Just after the admixture ($g = 1$) the correlation is due to the coexistence of two inbreeding classes, $C_w$ and $C_b$, with equal probabilities 1/2. $C_w$ (w stands for “within”) is composed of individuals with both parents originating from the same subpopulation ($x = 1$) while $C_b$ individuals (b stands for “between”) have one parent in each of the two subpopulations ($x = 0$). $C_w$ individuals are expected to have lower fitness than $C_b$ individuals as they are more likely to be homozygous for their deleterious mutations. They also have more similar alleles at a neutral marker locus than $C_b$ individuals. Each generation after the contact, new inbreeding classes $C_x$ appear as a consequence of recombination. At generation $g$ after the contact, assuming free recombination for the sake of simplicity,

$$E(x) = 1/2, \tag{11a}$$

$$\sigma^2(x) = 1/4^g. \tag{11b}$$
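One way to see where the moments in Equation 11 can come from is to track ancestry proportions explicitly. The sketch below rests on assumptions of ours that the article does not spell out: each individual's genome-wide ancestry proportion `a` (the share inherited from the first subpopulation) is the average of its two parents' proportions, as would hold with very many freely recombining loci, mates are drawn at random, and $x = a_m a_f + (1 - a_m)(1 - a_f)$ for parents with proportions $a_m$ and $a_f$. The function names are hypothetical. Exact rational arithmetic is used so that the comparison with $1/4^g$ is not blurred by rounding.

```python
from fractions import Fraction
from math import comb


def ancestry_distribution(t):
    """Distribution of the ancestry proportion a after t generations of mixing.

    Assumption of this sketch: a is the mean of 2**t independent founders,
    each drawn from subpopulation 1 or 2 with probability 1/2.
    Returns a dict mapping each value of a to its probability.
    """
    m = 2**t
    return {Fraction(j, m): Fraction(comb(m, j), 2**m) for j in range(m + 1)}


def moments_of_x(g):
    """Exact mean and variance of x for individuals born at generation g >= 1."""
    parents = ancestry_distribution(g - 1)  # parents have mixed for g - 1 generations
    mean = Fraction(0)
    second = Fraction(0)
    for a_m, p_m in parents.items():
        for a_f, p_f in parents.items():
            # Probability that the two alleles come from the same subpopulation.
            x = a_m * a_f + (1 - a_m) * (1 - a_f)
            mean += p_m * p_f * x
            second += p_m * p_f * x * x
    return mean, second - mean**2


if __name__ == "__main__":
    for g in range(1, 5):
        mean, var = moments_of_x(g)
        print(g, mean, var, Fraction(1, 4**g))
```

Under these assumptions the variance of the ancestry proportion is halved each generation, and because $x - 1/2$ is proportional to the product of the two parental deviations from 1/2, the variance of $x$ is divided by four each generation.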
Let $Q_w$ and $Q_b$ be the probability of identity-in-state of genes from $C_w$ and from $C_b$, respectively. $f_x$, the inbreeding coefficient of class $x$, is defined as

$$f_x = \frac{Q_{|x} - Q_b}{1 - Q_b}. \tag{12}$$

Neglecting mutation since admixture, $Q_{|x}$ can be expressed as $xQ_w + (1 - x)Q_b$ so that

$$f_x = x\left(\frac{Q_w - Q_b}{1 - Q_b}\right) = xF_{ST}, \tag{13}$$
where $F_{ST}$ is the classical F-statistic (Wright 1951; Cockerham and Weir 1987). Note that $F_{ST}$ is defined between the two subpopulations before the admixture and cannot be evaluated in samples of the current population (after the admixture). HFC studies are usually done within a single population and our model therefore concentrates on that situation. Although $F_{IS}$ and $F_{ST}$ are used in the context of the partial selfing and the admixture models, respectively, both models use the same definition of inbreeding, i.e., the increase in the probabilities of identity. By analogy with (3) the expected value of the fitness trait of an individual belonging to class $x$ can be expressed as a function of $f_x$, or simply of $x$:

$$W_x = W_0 - \delta x + \varepsilon. \tag{14}$$

Here $W_0$ is the value of the fitness trait for $C_b$ individuals (i.e., with $x = 0$) and the inbreeding load $\delta$, the difference in fitness for $C_b$ and $C_w$ individuals, measures heterosis between the two subpopulations before the mixture. The random effect $\varepsilon$ with mean 0 and variance $\sigma^2_\varepsilon$ is assumed independent of $x$. We now derive the correlation coefficient between heterozygosity and the fitness trait. Moments of the fitness trait $W$ are derived on the basis of (11) and (14):

$$E(W) = W_0 - \delta/2, \tag{15a}$$

$$\sigma^2(W) = \delta^2\sigma^2(x) + \sigma^2_\varepsilon = \delta^2/4^g + \sigma^2_\varepsilon. \tag{15b}$$

Moments of heterozygosity $H$ are simply expressed as functions of probabilities of IIS:

$$E(H) = 1 - E_x(Q_{|x}) = (1 - Q_b)(1 - F_{ST}E(x)) = 1 - \tfrac{1}{2}(Q_w + Q_b), \tag{16a}$$

$$\sigma^2(H) = E(H)(1 - E(H)) = \left(1 - \tfrac{1}{2}(Q_w + Q_b)\right)(Q_w + Q_b)/2. \tag{16b}$$
The covariance term is calculated using the partition into inbreeding classes,

$$\mathrm{Cov}(H, W) = \int_x E(H|C_x)\,E(W|C_x)\,P(C_x) - E(H)E(W), \tag{17a}$$

which yields

$$\mathrm{Cov}(H, W) = \delta(Q_w - Q_b)\,\sigma^2(x). \tag{17b}$$
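At the first generation after admixture the partition contains only two classes, so the moments above can be verified by direct enumeration. The sketch below is an illustration under that restriction ($g = 1$, classes $C_w$ and $C_b$ with probability 1/2 each); the function names and the numerical values of `Qw`, `Qb`, `W0`, and `delta` are hypothetical, since in the article $Q_w$ and $Q_b$ are obtained from appendix b.

```python
from math import isclose


def admixture_moments_g1(Qw, Qb, W0, delta, sigma_eps2=0.0):
    """Moments of H and W one generation after admixture (g = 1).

    Two classes with probability 1/2 each: C_w (x = 1) and C_b (x = 0).
    """
    classes = [(0.5, 1.0), (0.5, 0.0)]  # (P(C_x), x)

    def exp_H(x):
        # E(H | C_x) = 1 - Q_|x, with Q_|x = x*Qw + (1 - x)*Qb
        return 1.0 - (x * Qw + (1.0 - x) * Qb)

    def exp_W(x):
        # E(W | C_x) = W0 - delta * x (Equation 14)
        return W0 - delta * x

    mean_H = sum(p * exp_H(x) for p, x in classes)
    mean_W = sum(p * exp_W(x) for p, x in classes)
    cov_HW = sum(p * exp_H(x) * exp_W(x) for p, x in classes) - mean_H * mean_W
    var_W = sum(p * exp_W(x) ** 2 for p, x in classes) - mean_W**2 + sigma_eps2
    var_H = mean_H * (1.0 - mean_H)  # H is a 0/1 variable
    return {"mean_H": mean_H, "mean_W": mean_W,
            "cov_HW": cov_HW, "var_W": var_W, "var_H": var_H}


def rho2_g1(Qw, Qb, W0, delta, sigma_eps2=0.0):
    """Squared correlation between H and W from Equation 1."""
    m = admixture_moments_g1(Qw, Qb, W0, delta, sigma_eps2)
    return m["cov_HW"] ** 2 / (m["var_W"] * m["var_H"])


if __name__ == "__main__":
    Qw, Qb, W0, delta = 0.6, 0.2, 1.0, 0.5  # illustrative values only
    m = admixture_moments_g1(Qw, Qb, W0, delta)
    print(m["cov_HW"], delta * (Qw - Qb) / 4.0)  # Equation 17b with sigma^2(x) = 1/4
    print(m["mean_H"], 1.0 - 0.5 * (Qw + Qb))    # Equation 16a
    print(rho2_g1(Qw, Qb, W0, delta))
```

The enumerated covariance matches Equation 17b with $\sigma^2(x) = 1/4$, and with `sigma_eps2` left at 0 the squared correlation no longer depends on `delta`.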
The maximum correlation coefficient is

$$\rho^2_{\max} = \frac{4(Q_w - Q_b)^2\sigma^2(x)}{(Q_w + Q_b)(2 - (Q_w + Q_b))} = \frac{(Q_w - Q_b)^2}{4^{g-1}(Q_w + Q_b)(2 - (Q_w + Q_b))}. \tag{18a}$$

At the first generation after the mixture ($g = 1$),

$$\rho^2_{\max}(g = 1) = \frac{(Q_w - Q_b)^2}{(Q_w + Q_b)(2 - (Q_w + Q_b))}. \tag{18b}$$

The coefficient $\rho^2_{\max}$ depends only on the IIS probabilities $Q_b$ and $Q_w$ (derived in appendix b) but not on the inbreeding load, just as in the partial selfing model. In the presence of within-pedigree variance for the fitness trait, $\rho^2$ is as in (7b), replacing $\sigma^2(f_k)$ by $\sigma^2(x)$. The model can be generalized to other cases, as for the partial selfing model. A summary of the analytical expressions for the squared correlation coefficients obtained under both population models is provided in Table 1.

TABLE 1
GFC for the selfing and the admixture model

$\rho^2(H, W)$^b
  Selfing: $\dfrac{2(1 - Q_1)S}{(2Q_1(1 - S) + S)(4 - S)}$
  Admixture^a: $\dfrac{(Q_w - Q_b)^2}{(Q_w + Q_b)(2 - (Q_w + Q_b))}$

$\rho^2(d^2, W)$^c
  Selfing: $\dfrac{4S(1 - S)n^2u}{(4 - S)\left(1 + 11u + 4n^2u(5 - 7S + 2S^2) + n(1 - S)(1 + 19u)\right)}$
  Admixture^a: $\dfrac{u^2\left(2(N - n)(1 - e^{-\tau/(2n)}) + \tau\right)^2}{\sigma^2(d^2)}$

$\rho^2(p, W)$^d
  Selfing: $\dfrac{4S(1 - S)n^2u}{(4 - S)\left(1 + u + 4n^2u(1 - S) + n(1 - S)(1 + u)\right)}$
  Admixture^a: $\dfrac{u^2\left(2(N - n)(1 - e^{-\tau/(2n)}) + \tau\right)^2}{\sigma^2(p)}$

Analytical expressions of squared correlation coefficients between various genotypic indices and the fitness trait are given under both population models. The variance for the fitness trait within inbreeding classes, $\sigma^2_\varepsilon$, is assumed equal to 0.
a The results are given at the first generation after the admixture.
b $\rho^2(H, W)$ is calculated under the IAM, the SMM, and the KAM. $\rho^2(H, W)$ is expressed as a function of $Q_1$ under the partial selfing model. $Q_1$ is derived using Equations A2 (under the IAM), A6 (under the SMM), and A9 (under the KAM) in appendix a. $\rho^2(H, W)$ is expressed as a function of $Q_b$ and $Q_w$ under the admixture model. $Q_b$ and $Q_w$ are derived using Equations B1, B3, and B4 in appendix b.
c $\rho^2(d^2, W)$ is calculated under the SMM. $\rho^2(d^2, W)$ is expressed as a function of $\sigma^2(d^2)$. $\sigma^2(d^2)$ is calculated using (A7) and (B2) [with $r(z)$ obtained under the SMM].
d $\rho^2(p, W)$ is calculated under the IAM. $\rho^2(p, W)$ is expressed as a function of $\sigma^2(p)$. $\sigma^2(p)$ is calculated using (A5) and (B2).

Numerical parameters for the models: Analytical results derived in the previous sections were explored numerically using Mathematica 3.0 programs (Wolfram 1996). The ranges of values explored for the various parameters of the model are presented below.

Mutation rates: To analyze the impact of mutation on heterozygosity-fitness correlations with respect to different mutation models, a large range of mutation rates is investigated, from $10^{-9}$ to $10^{-2}$ per locus and per generation. Mutation rates ranging from $10^{-6}$ to $10^{-2}$ are investigated to compare $d^2$ and $H$ under the SMM, as this is the accepted range for a microsatellite locus. Assuming a substitution rate equal to $10^{-9}$ per site per generation, total mutation rates ranging from $10^{-9}$ to $10^{-6}$ correspond to sequences of realistic length (few to several hundreds of base pairs). These mutation rates were thus considered when comparing $p$ and $H$ under the ISM.

Population parameters: Under the partial selfing model the standard situation modeled is a large population ($n = 10^3$ individuals) with intermediate selfing rate ($S = 0.4$). The effect of a change in the population size (from $10^3$ to $10^6$) and the selfing rate (from 0 to 1) is analyzed. Under the admixture model the standard assumptions are small subpopulations ($n = 10^2$) descended from a large ancestor population ($N = 10^4$) after a long divergence time ($\tau = 10^4$ generations). The effect of the subpopulation size (from $10^2$ to $10^4$) and the joint influence of the ancestor population size and the divergence time in generations (assumed to be equal for the sake of simplicity and ranging from $10^4$ to $10^6$) are investigated. The inbreeding load does not need to be estimated here as we focus on maximum correlation coefficients, which do not depend on this parameter (see Table 1).

RESULTS

We first focus on the effects of the mutation models and mutation rates on the heterozygosity-fitness correlation for both population scenarios, evaluating the effect of population parameters. We then analyze the influence of genotypic indices on the genotype-fitness correlation obtained. Finally the form of the relation between the genotype and the fitness (the expected value of the fitness trait as a function of $d^2$) is discussed under the SMM.

Figure 1. Influence of mutation on the correlation coefficient between heterozygosity at one marker locus and fitness. IAM, infinite-alleles model of mutation; SMM, stepwise mutation model; 4AM, four-alleles model; 2AM, two-alleles model. The variance for the fitness trait ($\sigma^2_\varepsilon$) was assumed to be equal to 0 within inbreeding class. (A) Partial selfing model. The population size ($n$) is $10^3$ and the selfing rate is 0.4. (B) Admixture model. Results are given at the first generation following the admixture. The subpopulation size ($n$) is $10^2$, and the ancestor population size ($N$) and the divergence time ($\tau$) in generations are $10^4$.

Impact of marker mutation on the heterozygosity-fitness correlation: Under the partial selfing model, the maximal correlation coefficient increases with the marker mutation rate for all mutation models studied (Figure 1A). This is not surprising since this correlation is an increasing function of marker gene diversity [$(1 - Q_1)$ in Equation 7a], which itself increases with the mutation rate irrespective of the mutation model (see appendix a). For low mutation rates ($< 10^{-5}$), the correlation coefficient is never higher than a few percent, in agreement with most experimental data (David 1998). Mutation models have a low impact except for high mutation rates ($> 10^{-5}$), which are realistic only for microsatellite markers. For high mutation rates the correlation coefficient increases with the value of $K$ in the K-alleles model (KAM) and is maximal for the IAM. The SMM behaves like a KAM with large $K$ ($K > 10$, data not shown). A different picture is obtained under the admixture model, as the maximum correlation coefficient exhibits a maximum for an intermediate mutation rate (Figure 1B). This “optimal mutation rate” is of the order of 1/
$\tau$ (the divergence time; Figure 1B and data not shown). The correlation coefficient obtained under the admixture model may be much higher than that obtained under the partial selfing model. Under the partial selfing model the correlation is limited by the fact that all genotypes, be they homozygotes or heterozygotes, are represented in all inbreeding classes although the frequency of heterozygotes is halved in each generation of selfing. In contrast, under the admixture model, parameter values can be found such that all homozygotes are in class $C_w$ and all heterozygotes are in class $C_b$, which allows the maximum correlation (in the absence of environmental variance in fitness) to approach unity. However, assuming freely recombining loci, this maximum correlation is halved each generation following the admixture (Equation 18). Furthermore, in natural populations, within-inbreeding class variance for the fitness trait may weaken the real correlation [Equation 7b, replacing $\sigma^2(f_k)$ by $\sigma^2(x)$]. Mutation models rank in the same order as those under the partial selfing model with regard to the correlation coefficient, except with very high mutation rates ($> 10^{-3}$) for which the SMM provides a stronger correlation than the IAM. Ultimately, for both population models studied, marker mutation influences HFC primarily through its effect on marker diversities: $(1 - Q_1)$ for the partial selfing model (Equation 7a) and $(1 - Q_w)$ and $(1 - Q_b)$ for the admixture model (Equations 18).

Figure 2. Influence of population parameters on the correlation coefficient between heterozygosity at one marker locus and fitness, under the partial selfing model and IAM ($\sigma^2_\varepsilon = 0$). The mutation rate is $10^{-6}$. The population size $n$ varies from $10^3$ to $10^6$.

TABLE 2
HFC under the admixture model

                       u = 10^-3                     u = 10^-4
                 n = 10^2   10^3   10^4        n = 10^2   10^3   10^4
N = τ = 10^6       0.96     0.74   0.33          0.97     0.97   0.93
N = τ = 10^5       0.96     0.74   0.33          0.51     0.50   0.45
N = τ = 10^4       0.93     0.71   0.27          0.17     0.15   0.04

The correlation coefficient between heterozygosity and the fitness trait under the admixture model for different mutation rates $u$, population sizes $n$, and divergence time $\tau$ in generations assumed equal to the ancestor population size ($N = \tau$), with IAM and $\sigma^2_\varepsilon = 0$.

Impact of population parameters on the heterozygosity-fitness correlation: Under the partial selfing model, the correlation coefficient increases with the population size and the selfing rate, until $S$ is very close to 1 (Figure 2), whatever the mutation rate and model (data not shown). Ohta and Cockerham (1974) obtained qualitatively similar results for the effect of the selfing rate on associative overdominance in an infinite partially selfing population. However, Charlesworth (1991) found that associative overdominance is maximal for intermediate selfing rates. This discrepancy probably results from the measure of associative overdominance chosen. Charlesworth (1991) studied the apparent selection coefficient for heterozygotes at the neutral marker, which is expressed in phenotypic units and thus depends on the inbreeding load. The maximal heterozygosity-fitness correlation coefficient used here is independent of the inbreeding load $\delta$, unlike the covariance term, which scales with $\delta$ (Equation 6b). Using the latter as a measure of associative overdominance, we obtain a maximum associative overdominance for intermediate selfing rates (data not shown), just as found by Charlesworth (1991). Taking into account within-pedigree variance for the fitness trait ($\sigma^2_\varepsilon > 0$ in Equation 4b) the correlation is weakened and becomes an increasing function of the inbreeding load (Equation 7b). In the admixture model, the correlation coefficient decreases with the subpopulation size, particularly when the mutation rate is high (Table 2). The correlation increases when the ancestral population size and the divergence time increase simultaneously, particularly for low mutation rates. The effect of the inbreeding load $\delta$ is similar to that in the partial selfing scenario (Equation 7b).

Impact of the genotypic index on the correlation with the fitness trait: The squared difference in repeat units vs. heterozygosity under the stepwise mutation model: Under the partial selfing model, for low mutation rates, $d^2$ and heterozygosity $H$ are equally correlated with fitness (Figure 3A). Increasing the mutation rate strongly increases the correlation with heterozygosity but not with $d^2$ (Figure 3A), irrespective of population size or selfing rate (data not shown). This can be explained by the distribution of the expected value of the fitness trait conditioned on the value of $d^2$.
Using (C3) and (C4) in appendix c we found that all heterozygotes (nonnull d² values) correspond numerically to the same expected fitness (e.g., W = 0.89 with W0 = 1, an inbreeding load of 1, u = 10⁻⁴, and other parameters as in Figure 3A), which is higher than that of homozygotes (d² = 0) (e.g., W = 0.72 with the same parameters). Therefore, heterozygosity is as informative as d², and it provides a stronger correlation due to its lower variance.

Under the admixture model, however, the correlation between heterozygosity and fitness is weakened for high mutation rates (Figures 1B and 3B) whereas the correlation with d² is almost insensitive to the mutation rate (Figure 3B). With a small subpopulation size (n = 100) and mutation rate (u = 10⁻⁴), the correlation between the fitness trait and d² is weaker than that with heterozygosity, as in the partial selfing model (data not shown). In contrast, when the subpopulation size and the mutation rate are large enough (so that nu ≫ 1) d² provides a better correlation than H (Figure 3B). Again, this can be explained by the distribution of the expected value of the fitness trait conditional on d², derived according to appendix c (Figure 4). Indeed in the first case (nu ≪ 1), d² has little value for predicting fitness once it is >1. However, when nu ≫ 1, the expected value of fitness increases progressively with the value of d², so that in this case d² is far more informative than H.

Figure 3. Correlation coefficients between heterozygosity or d² at one marker locus and fitness under the SMM (ε = 0). (A) Partial selfing model. The parameters used are as in Figure 1A. (B) Admixture model. The parameters used are as in Figure 1B except that n = 10³.

Figure 4. The expected value of the fitness trait as a function of d² value under the admixture model assuming the SMM (ε = 0). The parameters used are as in Figure 3B.

The number of nucleotide differences vs. heterozygosity under the infinite-sites model: Under the partial selfing model, given the low mutation rate assumed for a neutral sequence, the same correlation with fitness is obtained whether heterozygosity H or the number of nucleotide differences p is considered (Figure 5A). Only very high mutation rates (>10⁻⁴), corresponding to unreasonably long sequences, would introduce a difference, with a higher correlation with H than with p (data not shown). Under the population mixture model, identical correlation coefficients are obtained with p and H except with large subpopulation size (>10³) and large mutation rates (>10⁻⁶; data not shown). In this case p is slightly less correlated to fitness than H (Figure 5B). Again, only very high mutation rates (>10⁻⁴) would result in a higher correlation with p than with H (data not shown). Unsurprisingly, these results are qualitatively similar to those obtained in the comparison of d² and H under the SMM with equivalent mutation rates.

Figure 5. Correlation coefficients between heterozygosity or the number of substitutions and fitness under the ISM (ε = 0). (A) Partial selfing model. The parameters used are as in Figure 1A. (B) Admixture model. The parameters used are as in Figure 1B except that n = 10⁴.

TABLE 3
δ(d²) under the partial selfing model

u        n        S = 0.1    S = 0.4    S = 0.9
10⁻³     10³      0.074      0.298      0.835
10⁻³     10⁴      0.173      0.457      0.885
10⁻³     10⁵      0.319      0.560      0.990
10⁻⁴     10³      0.036      0.187      0.768
10⁻⁴     10⁴      0.074      0.297      0.830
10⁻⁴     10⁵      0.172      0.455      0.988

Estimates of the indirect measure of inbreeding depression δ(d²) (Equation 20) under the partial selfing model are given assuming a logistic model for the expected fitness conditional on d² under the SMM.

The expected value of the fitness trait as a function of d²: The expected shape of the genotype-phenotype relationship is not linear (see Figure 4), with fitness plateauing for high d²-values. This indicates that linear regression models previously used in empirical studies are not appropriate. We therefore suggest using a nonlinear regression based on the logistic equation

\[
W(j) = \frac{W(\infty)}{1 + \dfrac{W(\infty) - W(0)}{W(0)}\, e^{-rj}},
\tag{19}
\]

where j is the square root of the value taken by d², W(∞) is the maximum value of W, asymptotically reached for very high d² values, W(0) is the value associated with d² = 0, and r measures the rate at which the fitness trait value saturates as a function of d². This can be rewritten:

\[
W(j) = \frac{W(0)}{\bigl(1 - \delta(d^2)\bigr)\left(1 + \dfrac{\delta(d^2)}{1 - \delta(d^2)}\, e^{-rj}\right)}
\quad\text{with}\quad
\delta(d^2) = \frac{W(\infty) - W(0)}{W(\infty)}.
\tag{20}
\]
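The two forms of the logistic model can be checked against each other numerically. The following is a minimal Python sketch, assuming Equation 19 reads W(j) = W(∞)/(1 + ((W(∞) − W(0))/W(0)) e^(−rj)) and Equation 20 is the same curve rewritten with δ(d²) = (W(∞) − W(0))/W(∞). The function names and the numerical values (W(0) = 0.72, W(∞) = 0.89, r = 1) are illustrative assumptions, not fitted estimates.

```python
import math


def delta_from_limits(w_zero, w_inf):
    """delta(d^2): relative fitness gap between d^2 = 0 and the plateau."""
    return (w_inf - w_zero) / w_inf


def fitness_eq19(j, w_zero, w_inf, r):
    """Logistic curve written with W(0) and W(inf); j = sqrt(d^2)."""
    return w_inf / (1.0 + ((w_inf - w_zero) / w_zero) * math.exp(-r * j))


def fitness_eq20(j, w_zero, delta, r):
    """Same curve written with W(0) and delta(d^2)."""
    odds = delta / (1.0 - delta)
    return w_zero / ((1.0 - delta) * (1.0 + odds * math.exp(-r * j)))


# Illustrative values only (assumptions of this sketch).
W_ZERO, W_INF, RATE = 0.72, 0.89, 1.0
DELTA = delta_from_limits(W_ZERO, W_INF)

if __name__ == "__main__":
    for d2 in (0, 1, 4, 9, 25, 100):
        j = math.sqrt(d2)
        print(d2,
              round(fitness_eq19(j, W_ZERO, W_INF, RATE), 4),
              round(fitness_eq20(j, W_ZERO, DELTA, RATE), 4))
```

Both parameterizations start at W(0) for d² = 0 and saturate at W(∞), so a nonlinear least-squares fit can estimate either (W(0), W(∞), r) or (W(0), δ(d²), r).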
For the partial selfing model, δ(d²) can be interpreted as a microsatellite-based indirect measure of inbreeding depression. For the admixture model it is interpreted as an indirect measure of heterosis between the two subpopulations before the mixture. Numerical data obtained using appendix c were fitted to this nonlinear model, for a large range of parameters defined in Tables 3 and 4 (data not shown). This model explains 95–100% of the variance in fitness obtained for both population models studied (neglecting within-inbreeding class variance for the fitness trait). It thus provides a satisfactory description of the genotype-phenotype relationship on the basis of three parameters. Under the partial selfing model, δ(d²) increases with the marker mutation rate, the selfing rate, and the population size (Table 3). Under the mixture model, δ(d²) increases with the marker mutation rate and when the divergence time and the population size increase simultaneously. It decreases with the subpopulation size (Table 4).

TABLE 4
δ(d²) under the admixture model

u        n        N = τ = 10⁶    N = τ = 10⁵    N = τ = 10⁴
10⁻³     10²      0.992          0.976          0.926
10⁻³     10³      0.983          0.947          0.841
10⁻³     10⁴      0.949          0.843          0.451
10⁻⁴     10²      0.981          0.942          0.833
10⁻⁴     10³      0.976          0.926          0.785
10⁻⁴     10⁴      0.947          0.841          0.411

Estimates of the indirect measure of heterosis δ(d²) (Equation 20) under the admixture model are given assuming a logistic model for the expected fitness conditional on d² under the SMM.

DISCUSSION

A unified framework for studying genotype-fitness relationships: To our knowledge, mutational processes at marker loci have not previously been incorporated into analytical models of associative overdominance, although they are recognized as relevant for empirical issues (Pemberton et al. 1999). Evaluating heterozygosity-fitness correlations requires the evaluation of probabilities of identity-in-state, which depend both on the demographic scenario assumed and on mutational processes and have been extensively used to address population structure issues (reviewed in Rousset 2001). This could be done in a unified framework using the same formalism as previously used (Rousset 1996). It is equivalent to a formalism based on the computation of distributions of coalescence times but does not require an explicit computation of such distributions. Under this formalism, neglecting within-inbreeding class variance in fitness for a given fitness model, the maximal correlation coefficient between a fitness trait and a neutral genotype does not depend on the inbreeding load (the reduction in fitness associated with complete inbreeding), as already reported for other demographic scenarios (Bierne et al. 2000b). There is therefore no need for an explicit expression of the inbreeding load, which depends on the genetic architecture of the fitness trait and on the population scenario.

Rather than exploring the continuum of possible historical scenarios exhaustively, we chose to exemplify a short-term (with only very recent coalescence events) and a long-term (with coalescence events deeper in the pedigree of individuals) inbreeding process with, respectively, the partial selfing and the admixture model. The fitness model we used (Equations 3 and 14) was first proposed by Morton et al. (1956). Neglecting purging selection and disequilibrium between selected loci and assuming additive effects among fitness loci (or multiplicative effects, considering log-fitness rather than fitness), we obtained the Morton model for fitness under various genetic architectures and various population scenarios (Charlesworth and Charlesworth 1987, 1999; Bierne et al. 2000b; Whitlock et al. 2000).
Variance for the fitness trait may exist within an inbreeding class, particularly when recombination is limited, due to the segregation of chunks of chromosomes rather than independent loci. In natural populations, environmental effects may also increase this variance. If this variance is large enough compared to genetic effects, HFC would be weakened and would also depend on the inbreeding load, which should then be estimated. Empirically estimating within- and between-inbreeding class variance for the fitness trait might be difficult in natural populations, but would be particularly useful to assess the validity of our model.

Choosing appropriate marker genes: Microsatellites have recently been used as a tool to infer fitness differences due to variation in the level of inbreeding between individuals (Pemberton et al. 1999). Are microsatellites good markers to address such issues? We have shown that HFC due to partial selfing increase with the marker mutation rate (and thus with marker variability), making microsatellites good markers in such a situation. Partial selfing is an example of short-term inbreeding generated by the current mating system. One generation of outcrossing reduces individual inbreeding to zero, so that only selfing since the most recent outcrossing event (i.e., a few generations) needs to be taken into account. However, short-term inbreeding can also be due to recent demographic episodes such as bottlenecks. A sudden reduction in population size produces random inbreeding among individuals, i.e., mating between kin due to chance, as opposed to systematic inbreeding due to the mating system (Malécot 1969). Bierne et al. (2000b) analyzed theoretically the situation of a population experiencing a recent and drastic bottleneck sustained over a few generations. They found strong HFC, but these are transient because the inbreeding level is rapidly homogenized among individuals after a few generations of a sustained bottleneck. Bierne et al. (2000b) did not use an explicit mutation model for the marker genes. However, they found that highly variable markers produce larger HFC than less variable ones, consistent with our result under partial selfing.
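The dependence of HFC on the marker mutation rate under partial selfing can be illustrated with a short Python sketch. It is not the derivation behind Equations 6 and 7; it is a direct sum over inbreeding classes under assumptions stated here: class k (k generations of selfing since the last outcrossing event) has probability (1 − S)S^k and inbreeding coefficient f_k = 1 − 2^(−k); under the IAM, and neglecting mutation since the last outcrossing event, a class-k individual is heterozygous with probability (1 − Q1)/2^k; Q1 = γ/(2n(1 − γ) + γ) with γ = (1 − u)²; and fitness is linear in f with ε = 0. All function names are hypothetical.

```python
import math


def q1_iam(u, n):
    """Between-individual identity, Q1 = g / (2n(1 - g) + g), g = (1 - u)^2."""
    g = (1.0 - u) ** 2
    return g / (2.0 * n * (1.0 - g) + g)


def hfc_partial_selfing(u, n, selfing, max_k=200):
    """Correlation between single-locus heterozygosity H and a fitness trait
    that decreases linearly with the inbreeding coefficient f (eps = 0).
    The sum over classes is truncated at max_k (an assumption of this sketch)."""
    q1 = q1_iam(u, n)
    mean_h = mean_f = mean_f2 = mean_hf = 0.0
    for k in range(max_k + 1):
        p_k = (1.0 - selfing) * selfing ** k   # frequency of class k
        f_k = 1.0 - 0.5 ** k                   # inbreeding coefficient of class k
        h_k = (1.0 - q1) * 0.5 ** k            # Pr(heterozygous | class k)
        mean_h += p_k * h_k
        mean_f += p_k * f_k
        mean_f2 += p_k * f_k ** 2
        mean_hf += p_k * h_k * f_k
    cov_hf = mean_hf - mean_h * mean_f
    var_h = mean_h * (1.0 - mean_h)            # H is a 0/1 variable
    var_f = mean_f2 - mean_f ** 2
    # Fitness falls as f rises, so corr(H, W) = -corr(H, f).
    return -cov_hf / math.sqrt(var_h * var_f)


if __name__ == "__main__":
    for u in (1e-5, 1e-4, 1e-3):
        print(u, round(hfc_partial_selfing(u, n=1000, selfing=0.4), 4))
```

With these assumptions the correlation rises with u, because a higher mutation rate lowers Q1 and makes marker homozygosity a more reliable signal of recent selfing.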
Heterozygosity-fitness correlations increase with marker diversity and thus with marker mutation rates, whenever they are related to short-term inbreeding. Microsatellite loci therefore seem ideal to investigate fitness consequences of short-term inbreeding. However, when the origin of HFC is long-term inbreeding (e.g., admixture of anciently diverged small subpopulations), the correlation increases up to a certain mutation rate but then decreases (Figure 1B). Using highly mutable markers can thus lead to confusing results under a long-term inbreeding scenario. The explanation is that even highly “inbred” genotypes can have heterozygous marker loci if marker mutation rates are sufficiently high. Marker heterozygosity is then not a good index of individual fitness. There is thus an “optimal mutation rate,” which decreases as the timescale of the population inbreeding history increases. In conclusion, one should use molecular markers whose mutational process scales roughly with the assumed cause of inbreeding. However, empirical studies provide only imprecise estimates of mutation rates and of the temporal scale of inbreeding. In practice we suggest only that empirical studies should avoid using markers that mutate at an obviously inappropriate rate.

Mutation models of marker genes play little role in HFC, except when the mutation rate is high. In the infinite-alleles model there is no homoplasy, whereas the K-alleles model is characterized by increasing homoplasy with decreasing K-values. The comparison between IAM and KAM, and among various K-values, suggests that homoplasy is a limiting factor for HFC, irrespective of the population scenario. Homoplasy represents a loss of information about identity-by-descent, making marker genotypes less closely related to the inbreeding level of individuals. As the homoplasy generated by the stepwise mutation model is different from that produced by a KAM, the results obtained for these two mutation models cannot be directly compared.

Choosing an appropriate genotypic index for microsatellite data: It has been suggested recently that d² may be a more appropriate index than heterozygosity when analyzing fitness consequences of inbreeding (Coulson et al. 1999; Pemberton et al. 1999).
Heterozygosity should be adequate to distinguish inbred and outbred individuals, whereas d² may give the additional opportunity to detect “highly outbred” individuals having high d² values, i.e., large coalescence time for their two microsatellite alleles, vs. “moderately outbred” individuals (Coulson et al. 1999). In a red deer population from the Isle of Rum, Coulson et al. (1998) found a positive correlation between birth weight and mean d², although not between birth weight and heterozygosity. They argued that this could be due to the demographic history of the population, namely population divergence followed by mixing (our admixture model), which is supported by historical data. However, we have shown that, in our admixture model at least, the conditions under which d² would lead to a better correlation than H are restrictive (a very high marker mutation rate and large subpopulation sizes and divergence time) and furthermore they do not seem to apply to what is known of the red deer population of the Isle of Rum (Pemberton et al. 1999). Divergence times of ~100 generations, with large subpopulation size, parameters that appear realistic for such a population, always lead to better predictive power of H as compared to d² (not shown). Note that the fact that there are only two classes of inbreeding and fitness in our admixture scenario just after the contact does not necessarily imply that a binary variable such as H will perform better than a quantitative variable such as d². If the proportion of high fitness individuals (class C_b) within a class of d² increases progressively with d² (Figure 4, with u = 10⁻²) rather than in a stepwise way (in most cases), d² performs better than H.

One can suggest other explanations for the stronger correlation with fitness obtained using d² as compared to H. First, we cannot exclude the possibility that a more sophisticated demographic scenario may generate this result under less restrictive conditions than the admixture scenario formulated here, although we do not have a precise idea of what such a scenario should be. Alternatively, the results of Coulson et al. (1998, 1999) may be due to low homozygosity in their population, making heterozygosity an inadequate genotypic index when compared to d². Finally these results could simply be due to chance, i.e., random variation. Recently, several studies have found fitness-related traits, such as adult breeding success in the red deer (Slate et al. 2000) or survival and parasite resistance in the Soay sheep (Coltman et al. 1999), to be more strongly correlated with heterozygosity than with d². In a captive wolf population with known inbreeding levels, Hedrick et al. (2001) found that d² was less predictive of the known inbreeding coefficient than microsatellite heterozygosity. This supports the hypothesis that d² is generally not more powerful than heterozygosity to detect fitness consequences of inbreeding. It seems generally unlikely that d² will provide a higher correlation with fitness than H.
For low mutation rates, heterozygosity and d² are equivalent under both our population models, and for high mutation rates, heterozygosity always provides a higher correlation than d² with fitness under the partial selfing model. Under the admixture model only, d² is more correlated with a fitness trait than heterozygosity under restrictive conditions: large mutation rate, divergence time, and subpopulation size. These conditions are expected to be associated with a relatively low inbreeding load in the mixed population, because there is a small probability of fixation, or of sufficient change in frequency, of a deleterious mutation in a large subpopulation. The inbreeding load does not influence the maximum correlation (Equation 18a) but affects the actual correlation, taking into account within-inbreeding class variance for the fitness trait. Given that sources of variation in fitness other than inbreeding must exist in natural populations (generating ε in Equations 4b and 15b), this small inbreeding load weakens the actual correlation (see Equation 7b), thus reducing the chance of detecting it.

Finally, a conclusion of our analysis is that there seems to be little theoretical reason to use d² instead of H when analyzing correlations between microsatellite genotype and fitness, even for long-term inbreeding scenarios. When doing so, however, one should choose an appropriate regression model to analyze genotype-fitness correlations (Equations 19 and 20) rather than the linear model used in previous studies.

We thank Nicolas Bierne, Sylvain Glémin, Philippe Jarne, Sally Otto, Josephine Pemberton, John Slate, and John Thompson for stimulating discussions and helpful comments on a previous draft of the manuscript. This work was supported by funds from the Centre National de la Recherche Scientifique (CNRS; Jeune Equipe program) to P. Jarne.
LITERATURE CITED

Bierne, N., S. Launey, Y. Naciri-Graven and F. Bonhomme, 1998 Early effect of inbreeding as revealed by microsatellite analyses in Ostrea edulis larvae. Genetics 148: 1893–1906.
Bierne, N., I. Beuzart, V. Vonau, F. Bonhomme, E. Bedier et al., 2000a Microsatellite-associated heterosis in hatchery-propagated stocks of the shrimp Penaeus stylirostris. Aquaculture 184: 203–219.
Bierne, N., A. Tsitrone and P. David, 2000b An inbreeding model of associative overdominance during a population bottleneck. Genetics 155: 1981–1990.
Bush, R. M., and P. E. Smouse, 1991 The impact of electrophoretic genotype on life history traits in Pinus taeda. Evolution 45: 481–498.
Charlesworth, B., and D. Charlesworth, 1999 The genetic basis of inbreeding depression. Genet. Res. 74: 329–340.
Charlesworth, D., 1991 The apparent selection on neutral marker loci in partially inbreeding populations. Genet. Res. 57: 159–175.
Charlesworth, D., and B. Charlesworth, 1987 Inbreeding depression and its evolutionary consequences. Annu. Rev. Ecol. Syst. 18: 237–268.
Cockerham, C. C., and B. S. Weir, 1987 Correlations, descent measures: drift with migration and mutation. Proc. Natl. Acad. Sci. USA 84: 8512–8514.
Coltman, D. W., W. D. Bowen and J. M. Wright, 1998 Birth weight and neonatal survival of harbour seal pups are positively correlated with genetic variation measured by microsatellites. Proc. R. Soc. Lond. Ser. B 265: 803–809.
Coltman, D. W., J. G. Pilkington, J. A. Smith and J. M. Pemberton, 1999 Parasite mediated selection against inbred Soay sheep in a free living island population. Evolution 53: 1249–1267.
Coulson, T., S. Albon, J. Slate and J. Pemberton, 1999 Microsatellite loci reveal sex-dependent responses to inbreeding and outbreeding in red deer calves. Evolution 53: 1951–1960.
Coulson, T. N., J. M. Pemberton, S. D. Albon, M. Beaumont, T. C. Marshall et al., 1998 Microsatellites reveal heterosis in red deer. Proc. R. Soc. Lond. Ser. B 265: 489–495.
Crow, J. F., and M. Kimura, 1970 An Introduction to Population Genetics Theory. Harper & Row, New York.
David, P., 1998 Heterozygosity-fitness correlations: new perspectives on old problems. Heredity 80: 531–537.
David, P., 1999 A quantitative model of the relationship between phenotypic variance and heterozygosity at marker loci under partial selfing. Genetics 153: 1463–1474.
Drake, J., B. Charlesworth, D. Charlesworth and J. Crow, 1998 Rate of spontaneous mutation. Genetics 148: 1667–1686.
Estoup, A., and B. Angers, 1998 Microsatellites and minisatellites for molecular ecology: theoretical and empirical considerations, pp. 55–86 in Advances in Molecular Ecology, edited by G. Carvalho. IOS Press, Amsterdam.
Hedrick, P., R. Fredrickson and H. Ellegren, 2001 Evaluation of d², a microsatellite measure of inbreeding and outbreeding, in wolves with a known pedigree. Evolution 55: 1256–1260.
Jarne, P., and P. Lagoda, 1996 Microsatellites, from molecules to populations and back. Trends Ecol. Evol. 11: 424–429.
Kimura, M., 1969 The rate of molecular evolution considered from the standpoint of population genetics. Proc. Natl. Acad. Sci. USA 63: 1181–1188.
Kimura, M., and J. F. Crow, 1964 The number of alleles that can be maintained in a finite population. Genetics 49: 725–738.
Kuhner, M. K., P. Beerli, J. Yamato and J. Felsenstein, 2000 Usefulness of SNP data for estimating population parameters. Genetics 156: 439–447.
Leary, R. F., F. W. Allendorf and K. L. Knudsen, 1983 Developmental stability and enzyme heterozygosity in rainbow trout. Nature 301: 71–72.
Ledig, F. T., 1986 Heterozygosity, heterosis and fitness in outbreeding plants, pp. 77–104 in Conservation Biology (The Science of Scarcity and Diversity), edited by M. E. Soulé. Sinauer Associates, Sunderland, MA.
Li, W. H., 1997 Molecular Evolution. Sinauer Associates, Sunderland, MA.
Malécot, G., 1969 Consanguinité panmictique et consanguinité systématique. Ann. Génét. Sél. An. 1: 237–242.
Marshall, T. C., and J. A. Spalton, 2000 Simultaneous inbreeding and outbreeding depression in a reintroduced Arabian oryx. Anim. Conserv. 3: 241–248.
Mitton, J. B., and M. C. Grant, 1984 Associations among protein heterozygosity, growth rate, and developmental homeostasis. Annu. Rev. Ecol. Syst. 15: 479–499.
Morton, N. E., J. F. Crow and H. J. Muller, 1956 An estimate of the mutational damage in man from data on consanguineous marriages. Proc. Natl. Acad. Sci. USA 42: 855–863.
Nachman, M. W., and S. L. Crowell, 2000 Estimate of the mutation rate per nucleotide in humans. Genetics 156: 297–304.
Ohta, T., and C. C. Cockerham, 1974 Detrimental genes with partial selfing and effects on a neutral locus. Genet. Res. 23: 191–200.
Ohta, T., and M. Kimura, 1970 Development of associative overdominance through linkage disequilibrium in finite populations. Genet. Res. 16: 165–177.
Ohta, T., and M. Kimura, 1973 A model of mutation appropriate to estimate the number of electrophoretically detectable alleles in a finite population. Genet. Res. 22: 201–204.
Pamilo, P., and S. Palsson, 1998 Associative overdominance, heterozygosity and fitness. Heredity 81: 381–389.
Pemberton, J. M., D. W. Coltman, T. N. Coulson and J. Slate, 1999 Using microsatellites to measure the fitness consequences of inbreeding and outbreeding, pp. 151–164 in Microsatellites: Evolution and Applications, edited by B. Goldstein and C. Schlötterer. Oxford University Press, New York.
Pogson, G. H., and S. E. Fevolden, 1998 DNA heterozygosity and growth rate in the Atlantic cod Gadus morhua (L). Evolution 52: 915–920.
Pritchard, J. K., and M. W. Feldman, 1996 Statistics for microsatellite variation based on coalescence. Theor. Popul. Biol. 50: 325–344.
Rousset, F., 1996 Equilibrium values of measures of population subdivision for stepwise mutation processes. Genetics 142: 1357–1362.
Rousset, F., 2001 Inferences from spatial population genetics, pp. 239–269 in Handbook of Statistical Genetics, edited by D. Balding, M. Bishop and C. Cannings. John Wiley & Sons, New York.
Slate, J., L. E. B. Kruuk, T. C. Marshall and J. M. Pemberton, 2000 Inbreeding depression influences lifetime breeding success in a wild population of red deer (Cervus elaphus). Proc. R. Soc. Lond. Ser. B 267: 1657–1662.
Valdès, A. M., M. Slatkin and N. B. Freimer, 1993 Allele frequencies at microsatellite loci: the stepwise mutation model revisited. Genetics 133: 737–749.
Weir, B. S., and C. C. Cockerham, 1984 Estimating F-statistics for the analysis of population structure. Evolution 38: 1358–1370.
Whitlock, C., P. Ingvarsson and T. Hatfield, 2000 Local drift load and the heterosis of interconnected populations. Heredity 84: 452–457.
Wolfram, S., 1996 The Mathematica Book, Ed. 3. Wolfram Media/Cambridge University Press, Cambridge, UK.
Wright, S., 1951 The genetical structure of populations. Ann. Eugen. 15: 323–354.
Zouros, E., 1993 Associative overdominance: evaluating the effects of inbreeding and linkage disequilibrium. Genetica 89: 35–46.
Zouros, E., M. Romero-Dorey and A. L. Mallet, 1988 Heterozygosity and growth in marine bivalves: further data and possible explanations. Evolution 42: 1332–1341.
By differentiating these generating functions one obtains the various moments of p:
Communicating editor: D. Charlesworth
E(p) ⫽ (d0(z)/dz)z⫽1, APPENDIX A: IDENTITY-IN-STATE IN THE PARTIAL SELFING MODEL
We derive probabilities of identity-in-state within and between individuals under the infinite sites, the stepwise mutation, and the K-alleles mutational models (see Rousset 1996). Generating functions of allele size difference (under the SMM) or of the number of nucleotide differences (under the ISM) are also derived and used to obtain the moments of d 2 and of p. IAM and ISM: Recursions for the IIS probabilities under the IAM or the ISM are
冢
1 ⫹ Q 0,t
冢n1
1 ⫹ Q 0,t
Q 0,t⫹1 ⫽ ␥ S Q 1,t⫹1 ⫽ ␥
2 2
冣
⫹ (1 ⫺ S)Q 1,t
冢
⫹ 1⫺
冣 冣
1 Q 1,t , n
(A1)
with ␥ ⫽ (1 ⫺ u)2. Note that in the ISM or the IAM, IIS and identity-by-descent (IBD) are equivalent. Q 0 and Q 1 are the equilibrium values of the system above: Q0 ⫽
␥ , (2/S)(1 ⫺ ␥) ⫹ ␥
Q1 ⫽
␥ . 2n(1 ⫺ ␥) ⫹ ␥
(A2)
冢
1 ⫹ 0,t ⫹ (1 ⫺ S)1,t , 2
(A5)
SMM: To derive Q 0 and Q 1 under the SMM it is useful to introduce 0 (resp. 1), the generating function of the probability pj,0 (resp. pj,1), that two randomly chosen alleles within an individual (resp. between individuals) differ by j steps. Because steps can take either nonnega∞ j tive or negative values, 0(z) ⫽ Rj⫹⫽⫺ ∞pj,0z (resp. 1(z) ⫽ ⫹∞ p z j). We also define , the value of condiRj⫽⫺∞ j,1 0|k 0 tional on class k. The effect of mutation is to change the generating functions by a factor r(z) ⫽ (1 ⫺ u ⫹ 1 ⁄2(uz ⫹ u/z))2, and 0 and 1 are calculated as in (A4) using this new r(z). By definition, we have p0,0 ⫽ Q 0 and p0,1 ⫽ Q 1. Furthermore, p⫺j,0 ⫽ p⫹j,0 ⫽ 1⁄2Pr(d 2 ⫽ j 2) for all j ⬆ 0. Therefore, 0(e ix) ⫽ Q 0 ⫹
∞
兺 cos(jx)Pr(d 2 ⫽ j 2),
j⫽1 ∞
冢
冢
冣 冣
and neglecting mutation since the last outcrossing event, 0|k(z) ⫽ 1 ⫺ (1 ⫺ 1(z))/2k. We use the inverse Fourier transform in j of a given function f: L j(f ) ⫽
1
冮 f (x)cos(jx)dx. 0
Q 0 and Q 1 can computed as Q 0 ⫽ L 0(0(e ix)), Q 1 ⫽ L 1(0(e ix)).
冣
(A3)
0 and 1 are the equilibrium points of the system above, and neglecting mutation since the last outcrossing event 0|k is easily derived: r(z) , (2/S)(1 ⫺ r(z)) ⫹ r(z)
(A6)
L j(f ) are numerically calculated with Mathematica 3.0 (Wolfram 1996). By differentiating the generating functions one obtains the various moments of d 2: E(d) ⫽ 0, E(d 2) ⫽ (d 20(z)/dz 2)z⫽1, 2(d 2) ⫽ (d 40(z)/dz4)z⫽1 ⫹ 6(d 30(z)/dz3)z⫽1 ⫹ 7(d 20(z)/dz2)z⫽1 ⫺ ((d 20(z)/dz2)z⫽1)2,
r(z) 1(z) ⫽ , 2n(1 ⫺ r(z)) ⫹ r(z) 0|k(z) ⫽ 1 ⫺ (1 ⫺ 1(z))/2k.
E(p|k) ⫽ (d0|k(z)/dz)z⫽1.
j⫽1
1 1 ⫹ 0,t 1 ⫹ 1 ⫺ 1,t . 1,t⫹1(z) ⫽ r(z) n 2 n
0(z) ⫽
⫹ (d0(z)/dz)z⫽1 ⫺ ((d0(z)/dz)z⫽1)2,
1(e ix) ⫽ Q 1 ⫹ 2 兺 cos(jx)pj,1,
Under the ISM the generating function 0 (resp. 1) of the probability pj,0 (resp. pj,1) that two randomly chosen alleles within an individual (resp. between individuals) differ by j nucleotide sites is defined as 0(z) ⫽ Rj⫹⫽∞0pj,0z j (resp. 1(z) ⫽ Rj⫹⫽∞0pj,1z j). The effect of mutation is to change both generating functions (z) by a factor: r(z) ⫽ (1 ⫺ u ⫹ uz)2. We also introduce 0|k, which is the value of 0 conditional on inbreeding class k. The recursions on are then mathematically similar to those for IBD (Rousset 1996): 0,t⫹1(z) ⫽ r(z) S
2(p) ⫽ (d 20(z)/dz 2)z⫽1
(A4)
E(d 2/k) ⫽ (d 20|k(z)/dz2)z⫽1 ⫽ 2⫺k(d 21(z)/dz2)z⫽1. (A7) KAM: The recursions for IIS probabilities are
1858
A. Tsitrone, F. Rousset and P. David
Q 0,t⫹1 ⫽ S Q 1,t⫹1 ⫽
⫹ Q⬘0,t 2
⫹ Q⬘0,t 2n
account mutation through the factor r(z) ⫽ (1 ⫺ u ⫹ uz)2. Therefore, just before the admixture we obtain
⫹ (1 ⫺ S)Q⬘1,t ,
冢
⫹ 1⫺
冣
1 Q⬘1,t , n
(A8)
in which ⫽ (1 ⫺ u) ⫹ u2/(K ⫺ 1), Q⬘j,t ⫽ ␥⬘Q j,t ⫹ (1 ⫺ ␥⬘)/K, and ␥⬘ ⫽ (1 ⫺ u/2 K/(K ⫺ 1))2. The equilibrium values of the system are
冢
冢 冢
w,(z) ⫽ eq(z) ⫹ r(z) 1 ⫺
冣
1 (1 ⫺ k)␥⬘(1 ⫹ (nS ⫺ 1)␥⬘) 1⫹ , k (1 ⫺ ␥⬘)(␥⬘ ⫺ 2n ⫺ nS␥⬘) ⫺ nS␥⬘
Q1 ⫽
1 1⫺k 1⫹ .(A9) k (1 ⫺ ␥⬘)((␥⬘ ⫺ 2n)/(nS␥⬘) ⫺ 1) ⫺ 1
冢
冣
APPENDIX B: IDENTITY-IN-STATE IN THE MIXTURE MODEL
We derive probabilities of identity-in-state under the ISM, SMM, and KAM (see Rousset 1996), focusing on $C_w$, $C_b$, and $C_x$ individuals. Generating functions of allele size difference (under the SMM) or of the number of nucleotide differences (under the ISM) are also derived, as they are needed to obtain moments of $d^2$ and of $p$. In contrast to the partial selfing model, there is no need to distinguish identity within (subscript 0) and between (subscript 1) individuals, as random mating is assumed.

IAM and ISM: The probability of identity (IIS and IBD are confounded here) for two alleles randomly drawn from the ancestral random mating population of size $N$ at inbreeding equilibrium is $Q_a = \gamma/(2N(1-\gamma)+\gamma)$ with $\gamma = (1-u)^2$. Let two subpopulations from the ancestral population diverge without gene flow for $\tau$ generations. Just before the admixture, the probabilities of identity of two randomly chosen alleles are $Q_w(\tau) = Q_{eq} + (\gamma(1-1/(2n)))^{\tau}(Q_a - Q_{eq})$ within a subpopulation and $Q_b(\tau) = \gamma^{\tau} Q_a$ between subpopulations, with $Q_{eq} = \gamma/(2n(1-\gamma)+\gamma)$. After the admixture, neglecting mutations since the admixture event and considering an infinite mixed population for the sake of simplicity,

$$
\begin{aligned}
Q_w &= Q_w(\tau) = Q_{eq} + \left(\gamma\left(1-\frac{1}{2n}\right)\right)^{\tau}(Q_a - Q_{eq}),\\
Q_b &= Q_b(\tau) = \gamma^{\tau} Q_a,\\
Q_x &= xQ_w + (1-x)Q_b,\\
Q_0 &= Q_1 = \tfrac{1}{2}(Q_w + Q_b).
\end{aligned}
\tag{B1}
$$

Under the ISM we obtain analogous expressions for generating functions, as we did in appendix A. Let $\psi_w$ (resp. $\psi_b$) be the generating function of the probability $p_{j,w}$ (resp. $p_{j,b}$) that two randomly chosen alleles that originated from the same subpopulation (resp. from the two subpopulations) differ by $j$ nucleotide substitutions [$\psi_w(z) = \sum_{j=0}^{+\infty} p_{j,w} z^j$]. Recursions on $\psi(z)$ are mathematically identical to those for IBD, with $r(z)$ taking the place of $\gamma$. Just before the admixture,

$$
\psi_{w,\tau}(z) = \psi_{eq}(z) + \left(r(z)\left(1-\frac{1}{2n}\right)\right)^{\tau}(\psi_a(z) - \psi_{eq}(z))
$$

within subpopulations and $\psi_{b,\tau}(z) = (r(z))^{\tau}\psi_a(z)$ between subpopulations, with

$$
\psi_a(z) = \frac{r(z)}{2N(1-r(z)) + r(z)} \quad\text{and}\quad \psi_{eq}(z) = \frac{r(z)}{2n(1-r(z)) + r(z)}.
$$

Neglecting mutations since the admixture event and considering an infinite mixed population for the sake of simplicity, we finally obtain

$$
\begin{aligned}
\psi_w(z) &= \psi_{eq}(z) + \left(r(z)\left(1-\frac{1}{2n}\right)\right)^{\tau}(\psi_a(z) - \psi_{eq}(z)),\\
\psi_b(z) &= (r(z))^{\tau}\psi_a(z),\\
\psi_{0|x}(z) &= x\,\psi_w(z) + (1-x)\,\psi_b(z),\\
\psi_0(z) &= \psi_1(z) = \tfrac{1}{2}(\psi_w(z) + \psi_b(z)).
\end{aligned}
\tag{B2}
$$

These generating functions are used to obtain the moments of $p$ as in (A5).

SMM: Again, we obtain analogous expressions for generating functions, as we did in appendix A. The generating functions $\psi_w$, $\psi_b$, and $\psi_{0|x}$ are defined as in the previous section, but considering "steps" (varying from $-\infty$ to $+\infty$) instead of "nucleotide substitutions" (varying from 0 to $+\infty$). Recursions on $\psi(z)$ are identical to those in the previous section (Equations B2) except that $r(z) = (1 - u + \tfrac{1}{2}(uz + u/z))^2$. By definition, we have $p_{0,w} = Q_w$ and $p_{0,b} = Q_b$. Furthermore, for all $j \neq 0$, $p_{-j,w} = p_{+j,w} = \tfrac{1}{2}\Pr(d^2 = j^2 \mid C_w)$ and $p_{-j,b} = p_{+j,b} = \tfrac{1}{2}\Pr(d^2 = j^2 \mid C_b)$. Therefore $\psi_w(e^{ix}) = Q_w + \sum_{j=1}^{\infty}\cos(jx)\Pr(d^2 = j^2 \mid C_w)$ and $\psi_b(e^{ix}) = Q_b + \sum_{j=1}^{\infty}\cos(jx)\Pr(d^2 = j^2 \mid C_b)$. Using inverse Fourier transforms, $L_j(f) = (1/\pi)\int_0^{\pi} f(x)\cos(jx)\,dx$:

$$
\begin{aligned}
Q_w &= L_0(\psi_w(e^{ix})),\\
Q_b &= L_0(\psi_b(e^{ix})),\\
Q_x &= xQ_w + (1-x)Q_b,\\
Q_0 &= Q_1 = \tfrac{1}{2}(Q_w + Q_b).
\end{aligned}
\tag{B3}
$$

The $L_j(f)$'s are numerically calculated with Mathematica 3.0 (Wolfram 1996). Generating functions are used to derive moments of $d^2$ as in (A7).

KAM: In the ancestral random mating population of size $N$ at inbreeding equilibrium we have

$$
Q_a = \frac{2N(1-\gamma')/K + \gamma'}{2N(1-\gamma') + \gamma'} \quad\text{with}\quad \gamma' = \left(1 - u\,\frac{K}{K-1}\right)^{2}.
$$

Just before the admixture,

$$
Q_w(\tau) = Q_{eq} + \left(\gamma'\left(1-\frac{1}{2n}\right)\right)^{\tau}(Q_a - Q_{eq}),
$$

with $Q_{eq} = (2n(1-\gamma')/K + \gamma')/(2n(1-\gamma') + \gamma')$ within subpopulations and $Q_b(\tau) = 1/K + \gamma'^{\,\tau}(Q_a - 1/K)$ between subpopulations. Neglecting mutation after the two populations merged into an infinite population,

$$
\begin{aligned}
Q_w &= Q_{eq} + \left(\gamma'\left(1-\frac{1}{2n}\right)\right)^{\tau}(Q_a - Q_{eq}),\\
Q_b &= \frac{1}{K} + \gamma'^{\,\tau}\left(Q_a - \frac{1}{K}\right),\\
Q_x &= xQ_w + (1-x)Q_b,\\
Q_0 &= Q_1 = \tfrac{1}{2}(Q_w + Q_b).
\end{aligned}
\tag{B4}
$$

APPENDIX C: THE EXPECTED FITNESS CONDITIONAL ON $d^2$

Under the partial selfing scenario, Equation 3 yields

$$
E(W \mid d^2 = j^2) = W_0 - E(f \mid d^2 = j^2),
\tag{C1}
$$

with $E(f \mid d^2 = j^2) = \sum_k f_k \Pr(C_k \mid d^2 = j^2)$. Using Bayes' theorem, we have

$$
\Pr(C_k \mid d^2 = j^2) = \frac{\Pr(d^2 = j^2 \mid C_k)\Pr(C_k)}{\Pr(d^2 = j^2)}.
$$

$\Pr(d^2 = j^2)$ is derived from the $L_j$ transform of the generating function $\psi_0$ (appendix A, SMM). According to (A6), $Q_0 = \Pr(d^2 = 0) = L_0(\psi_0(e^{ix}))$ and similarly, for all $j \neq 0$, $\Pr(d^2 = j^2) = 2L_j(\psi_0(e^{ix}))$. $\Pr(d^2 = j^2 \mid C_k)$ is derived from the generating function $\psi_{0|k}$ in the same way, neglecting mutation since the last outcrossing event. Finally,

$$
\begin{aligned}
E(f \mid d^2 = 0) &= \frac{1}{L_0(\psi_0(e^{ix}))}\left(\frac{S(2+S)}{(4-S)(2-S)} + \frac{2S(1-S)}{(4-S)(2-S)}\,L_0(\psi_1(e^{ix}))\right),\\
\forall j \neq 0,\quad E(f \mid d^2 = j^2) &= \frac{1}{L_j(\psi_0(e^{ix}))}\left(\frac{2S(1-S)}{(4-S)(2-S)}\,L_j(\psi_1(e^{ix}))\right).
\end{aligned}
\tag{C2}
$$

The expected fitness as a function of $d^2$ is finally obtained by plugging (C2) into (C1).

Using the same method in the population admixture model we obtain

$$
E(W \mid d^2 = j^2) = W_0 - E(x \mid d^2 = j^2).
\tag{C3}
$$

At the first generation after the admixture ($g = 1$) we have

$$
E(x \mid d^2 = j^2) = \frac{L_j(\psi_w(e^{ix}))}{L_j(\psi_w(e^{ix})) + L_j(\psi_b(e^{ix}))},
\tag{C4}
$$

with $L_j$, $\psi_w$, and $\psi_b$ defined in appendix B, SMM. The expected fitness as a function of $d^2$ is finally obtained by plugging (C4) into (C3).
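
The weights in (C2), $S(2+S)/((4-S)(2-S))$ and $2S(1-S)/((4-S)(2-S))$, and the ratio in (C4) can be checked numerically. The Python sketch below is illustrative only and is not the authors' Mathematica code. It assumes the usual class structure of a partial selfing model, $\Pr(C_k) = (1-S)S^k$ with $f_k = 1 - (1/2)^k$, which this appendix does not restate. The function names and the toy distribution of $d^2$ are hypothetical.

```python
import math

def L(j, f, steps=2000):
    """L_j(f) = (1/pi) * integral over [0, pi] of f(x) cos(jx) dx (midpoint rule)."""
    h = math.pi / steps
    xs = ((i + 0.5) * h for i in range(steps))
    return sum(f(x) * math.cos(j * x) for x in xs) * h / math.pi

def c2_coefficients(S):
    """Closed-form weights of (C2) for selfing rate S."""
    denom = (4 - S) * (2 - S)
    a0 = S * (2 + S) / denom       # constant term, present only when j = 0
    a1 = 2 * S * (1 - S) / denom   # multiplies L_j(psi_1) for every j
    return a0, a1

def c2_coefficients_by_summation(S, kmax=200):
    """Same weights from a direct sum over selfing classes C_k.

    ASSUMPTION (not stated in this appendix): Pr(C_k) = (1 - S) * S**k and
    f_k = 1 - (1/2)**k, where k counts generations of selfing.
    """
    a0 = a1 = 0.0
    for k in range(kmax):
        pk = (1 - S) * S**k
        fk = 1 - 0.5**k
        a0 += pk * fk * fk         # alleles identical through selfing
        a1 += pk * fk * (1 - fk)   # alleles trace back to the last outcrossing
    return a0, a1

def expected_f(j, psi0, psi1, S):
    """E(f | d^2 = j^2) as in (C2); psi0, psi1 map x to psi(e^{ix})."""
    a0, a1 = c2_coefficients(S)
    constant = a0 if j == 0 else 0.0
    return (constant + a1 * L(j, psi1)) / L(j, psi0)

def expected_x(j, psi_w, psi_b):
    """E(x | d^2 = j^2) at g = 1 as in (C4)."""
    lw, lb = L(j, psi_w), L(j, psi_b)
    return lw / (lw + lb)

def toy_psi(probs):
    """Hypothetical psi(e^{ix}) built from Pr(d^2 = j^2), with probs[0] = Q."""
    return lambda x: probs[0] + sum(p * math.cos(j * x)
                                    for j, p in enumerate(probs) if j > 0)

# Toy distribution of d^2 (illustrative values only).
psi = toy_psi([0.4, 0.3, 0.2, 0.1])
closed = c2_coefficients(0.5)
summed = c2_coefficients_by_summation(0.5)
```

In this sketch, `L(0, psi)` recovers the identity probability and `2 * L(j, psi)` recovers $\Pr(d^2 = j^2)$ for $j \neq 0$, as stated after (A6). Under the assumed class structure, the two weights add up to $S/(2-S)$, the mean inbreeding coefficient.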