Heavy-Tailed Distributions in Quantitative Trait Genetics
Serge Sverdlov
[email protected]; Department of Statistics, Box 354322, University of Washington, Seattle, WA 98195
Abstract
The statistical genetics of quantitative traits has benefited deeply from analysis of variance methods, which carry assumptions of finite moments and normal kurtosis. Heavy-tailed distributions violate these assumptions and can therefore lead to radically different behaviors. Our assertion is that the genetics of heavy-tailed traits is not merely Gaussian genetics with outliers; rather, these traits exhibit subtle heritability patterns, especially in polygenic traits, and recover the classical distinction between heterogeneous Mendelian and blending forms of inheritance through a novel bifurcation argument. We derive closed form results using the Cauchy distribution, which we show to be a steady state in a broad family of population processes, and connect the distribution to evolvability via extreme value theory and process subordination arguments.
Key Words: QTL, heavy tailed distribution, polygenic traits, Cauchy distribution, population genetics, statistical genetics
1. Trait Distribution
The setting is a quantitative trait that is additive across a large number $n$ of autosomal loci, with no dominance or epistasis effects. A diploid individual $A$ has $2n$ alleles, with total genetic value

$$G_A = \sum_{i=1}^{n} \sum_{j=1}^{2} \alpha_{ij}^{A}$$
Note that this notation does not distinguish between different alleles except by their additive effect. We are interested in the behavior of $G_A$ for large $n$. Let the $\alpha_{ij}^{A}$, generated by some population process, be independent and identically distributed. Identical distribution is a relatively light assumption based on symmetry of labeling. Independence carries far more baggage; in particular, the assumption that the two alleles at the same locus are independent is restrictive. For large $n$, we will be tempted to say $G_A$ has an approximately normal distribution. However, this is only true if the common distribution of the allelic effects is light tailed. The sums of independent random variables, appropriately normalized, can approach any of the Pareto-Levy stable distributions. In particular, if the common distribution has $x^{-2}$ tails, that is, if the density $f(x)$ satisfies
𝑓 𝑥 1, this result applies to the identity by descent of any combination of traits. We can fix a segregation and recombination path, and retain the same distribution; since the distribution is closed under mixture, we can also retain it under a variety of breeding and selection strategies. Explicitly, so long as the processes that choose the segregation and selection paths are defined without explicit reference to the phenotype or genotype of the trait, the distribution will be invariant to the next generation for an arbitrary combination of
- inbreeding
- selfing
- selection by location of recombination event
- selection for measure of linkage disequilibrium
- changes in recombination rate or map function
- population bottlenecks
- selection based on linked traits or markers
The population process is allowed to depend on the recombination rates of a marker with the trait loci, but not, for example, on knowing which marker allele is linked with a high value of the trait. Thus we can make a very strong statement: if a trait has the Cauchy property in one generation, the property is going to remain in later generations unless we apply selection on the trait itself. The catch is that this distribution is a distribution with respect to all possible population processes, not with respect to the one followed by a finite size population. Thus variability within the finite population is not a consistent estimate of variability of the true distribution. Indeed, if we have a finite population size without mutation and allow enough generations to pass for drift to fix a single allele at every locus, the genetic values of all individuals within the population will of course be the same; only the distribution of possible trait values of that individual on repeated experiments is invariant and identical to the distribution within the founder generation.
Nor does identically distributed imply independent; conditioning on phenotype or genotype of 𝐴’s relative gives information about the distribution of 𝐴’s trait. Even with the catch, there are important implications for pedigree analysis. Rather than assuming that the founder population samples from a discrete distribution of alleles at each locus, we assume for our purposes that each allele in each founder was independently sampled from the same (Cauchy) effect distribution. With that assumption we don’t even need to know the number of loci, only the trait variability of the independent founders. Relatives within a pedigree who share a common founder ancestor are certainly not independent; but two or more individuals who do not share a common founder are independent and identically distributed, with the distribution equal to that of the founder generation, regardless of what happens within each of their pedigrees. The canonical case for such a situation is dog breeds; if each breed, started with an independent set of founders, was subject to varying amounts of selective inbreeding based on some set of traits other than a Cauchy trait being considered, then a sample of dogs where each is taken from a different breed will have iid Cauchy trait values of the same distribution as the founders, even if we know that one breed experienced a bottleneck and inbreeding and another did not.
2.3 Finite Sample Intuition
In practice, what would such a founder population look like within a finite population size? For each locus, the majority of alleles will have effect near zero. It does not matter much if these traits happen to be consolidated into a few high frequency alleles rather than many rare almost-indistinguishable moderate effect alleles. Indeed, because only the shape of the tail matters for convergence of a large number of loci to the Cauchy distribution, it should not hurt the analysis for the moderate traits to be overrepresented or underrepresented, or even missing, in some of the loci. Outside the moderate center, there will be a large number of rare alleles with extreme effect values. The tail rule defines the relation between trait frequency and effect size:

$$f_\alpha \sim \frac{1}{\alpha^2}$$
It is fine if we think of the mainstream trait as wild type, and the tail as a series of rare extreme mutants. But the extreme mutants, in this case, behave like dominant Mendelian traits.
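The claim that only the tail shape matters can be illustrated numerically. The sketch below is not part of the original analysis: it assumes a hypothetical allelic effect distribution with density $1/(2x^2)$ on $|x| \ge 1$ and no mass at all in the moderate center, sums $2n$ such effects, and divides by $2n$. Under the standard stable-law limit for a symmetric density with tail $c/x^2$, the normalized sum should be approximately Cauchy with scale $\pi c$, here $\pi/2$. Since half the mass of a Cauchy variate lies within one scale unit of the center, the median absolute value serves as a simple scale estimate. The function names and parameter values are illustrative only.

```python
import math
import random
import statistics


def sample_allele_effect(rng):
    """Draw one allelic effect with density 1/(2 x^2) on |x| >= 1 (an x^-2 tail)."""
    # 1 - U lies in (0, 1], so the magnitude is a Pareto variate with tail index 1.
    magnitude = 1.0 / (1.0 - rng.random())
    return magnitude if rng.random() < 0.5 else -magnitude


def normalized_genetic_values(n_loci, n_individuals, seed=0):
    """Simulate G_A / (2n) for independent individuals, each with 2n iid allelic effects."""
    rng = random.Random(seed)
    n_alleles = 2 * n_loci
    return [
        sum(sample_allele_effect(rng) for _ in range(n_alleles)) / n_alleles
        for _ in range(n_individuals)
    ]


def median_abs(values):
    """Median of |v|; for a centred Cauchy sample this estimates the scale."""
    return statistics.median(abs(v) for v in values)


if __name__ == "__main__":
    values = normalized_genetic_values(n_loci=100, n_individuals=2000)
    print("estimated scale:", median_abs(values), "theoretical limit:", math.pi / 2)
```

The normalization by $2n$ rather than $\sqrt{2n}$ is the point of the example: with $x^{-2}$ tails the sum grows as fast as its number of terms, so the average of the allelic effects does not concentrate as more loci are added.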
3. Quantitative Inheritance
3.1 Method of Haploid Distributions
Within the additive model, we distinguish between $f_d(x)$, the equilibrium distribution of trait values in the diploid population, and $f(x)$, the haploid distribution, or the distribution of the trait contributions in gametes. The trait value of the diploid individual is the sum of the contribution values of the two gametes received from its parents. The lighter assumption is that the haploid distributions do not differ with the sex of the parent; the stronger assumption is a dual assumption of independence:
1. Gametes assorting from unrelated parents have independent distributions.
2. Gametes assorting from the same parent have distributions that are independent of their complementary alleles.
Thus, suppose we fix the segregation choices for a single gamete (e.g. the gamete carries the maternal allele at loci 1, 2, and 4, and the paternal allele at locus 3). Then the distribution of the contribution value of this gamete, 1M 2M 3P 4M, is independent of the distribution of the complementary gamete 1P 2P 3M 4P. In the polygenic setting the second assumption resembles, but is weaker than, linkage equilibrium. Note that the sum of the contribution value of a gamete and its complementary gamete is simply the diploid value. Since a diploid trait value is thus unconditionally a sum of two independent haploid contributions, the two distributions are related by

$$f_d(x) = (f \otimes f)(x) = \int_{-\infty}^{\infty} f(y)\, f(x - y)\, dy$$
Thus if we only have the population distribution of trait values, we can try to derive the haploid distribution. The haploid distribution uniquely defines the diploid distribution, but the choice of haploid distribution from diploid can be non-unique in pathological cases having to do with the square root of the characteristic function having two complex values. We won’t encounter these cases explicitly. Likewise, existence of a haploid distribution implies the existence of a diploid distribution, but not vice versa. The conditional distribution for the contribution value of a gamete segregating from an individual, given the trait value of the individual, can be derived for a fixed segregation pattern but does not change with the segregation pattern. If 𝑋 is the contribution from a gamete under a given segregation pattern, 𝑍 is the diploid trait value, and 𝑌 = 𝑍 − 𝑋 is the contribution from the complementary gamete, we can write, as an application of Bayes’ law,
$$f(x \mid z) = \frac{f(x)\, f(z - x)}{f_d(z)}$$
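The derivations follow those in Edgar (1998), which were in turn developed to illustrate heavy tailed distribution phenomena described by Benoit Mandelbrot. For any stable distribution with coefficient $\alpha$ centered at 0, suppose (without loss of generality) that the scale of the diploid distribution is $1$, and the scale of the haploid distribution is $s$. Then the stable property implies

$$s^{\alpha} + s^{\alpha} = 1 = 2 s^{\alpha}, \qquad s = 2^{-1/\alpha}$$

Thus for the normal distribution, $\alpha = 2$ and $s = 1/\sqrt{2}$; and for the Cauchy, $\alpha = 1$ and $s = 1/2$. The two distributions are then related by a scale transformation,

$$f(x) = \frac{1}{s}\, f_d\!\left(\frac{x}{s}\right)$$

Then we can write the conditional density as

$$f(x \mid z) = \frac{f(x)\, f(z - x)}{f_d(z)} = \frac{s^{-1} f_d\!\left(\frac{x}{s}\right) s^{-1} f_d\!\left(\frac{z - x}{s}\right)}{f_d(z)} = s^{-2}\, \frac{f_d\!\left(\frac{x}{s}\right) f_d\!\left(\frac{z - x}{s}\right)}{f_d(z)}$$

As a special case, when $z = 0$ and $f_d(x) = f_d(-x)$,

$$f(x \mid z = 0) = \frac{s^{-2}}{f_d(0)}\, f_d\!\left(\frac{x}{s}\right)^{2}$$

For the Gaussian distribution,

$$f_d(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{x^2}{2\sigma^2}}, \qquad s = 1/\sqrt{2}$$

$$f(x \mid z) = s^{-2}\, \frac{f_d\!\left(\frac{x}{s}\right) f_d\!\left(\frac{z - x}{s}\right)}{f_d(z)} = \frac{2}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{2x^2}{2\sigma^2} - \frac{2(z - x)^2}{2\sigma^2} + \frac{z^2}{2\sigma^2}\right)$$

The exponent simplifies as

$$-\frac{2x^2 + 2(z - x)^2 - z^2}{2\sigma^2} = -\frac{2x^2 + 2z^2 - 4xz + 2x^2 - z^2}{2\sigma^2} = -\frac{(2x - z)^2}{2\sigma^2} = -\frac{\left(x - \frac{z}{2}\right)^2}{2\,(\sigma/2)^2}$$

Thus the conditional distribution is normal with mean $z/2$ and variance $(\sigma/2)^2 = \sigma^2/4$. The sum of contributions from two independent gametes, with parents $z_1$ and $z_2$, is then normal with mean $\mu = (z_1 + z_2)/2$ and variance $\sigma^2/2$.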
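The Gaussian result can be checked numerically without using the closed form. The following is a minimal sketch, assuming a diploid trait with standard deviation $\sigma$ and a haploid standard deviation of $\sigma/\sqrt{2}$; it evaluates the Bayes' law expression $f(x)\,f(z - x)/f_d(z)$ on a grid and computes its mass, mean, and variance by a Riemann sum. The grid width, step count, and function names are arbitrary illustrative choices.

```python
import math


def normal_pdf(x, sigma):
    """Density of a centred normal distribution with standard deviation sigma."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)


def conditional_gamete_density(x, z, sigma):
    """f(x | z) = f(x) f(z - x) / f_d(z) for a Gaussian diploid trait with s.d. sigma."""
    haploid_sigma = sigma / math.sqrt(2.0)  # haploid scale is 2**(-1/2) times the diploid scale
    return (normal_pdf(x, haploid_sigma) * normal_pdf(z - x, haploid_sigma)
            / normal_pdf(z, sigma))


def conditional_moments(z, sigma, half_width=12.0, steps=20001):
    """Riemann-sum mass, mean and variance of f(x | z) on a grid centred at z / 2."""
    lo = z / 2.0 - half_width * sigma
    dx = 2.0 * half_width * sigma / (steps - 1)
    xs = [lo + i * dx for i in range(steps)]
    ws = [conditional_gamete_density(x, z, sigma) * dx for x in xs]
    total = sum(ws)
    mean = sum(w * x for w, x in zip(ws, xs)) / total
    var = sum(w * (x - mean) ** 2 for w, x in zip(ws, xs)) / total
    return total, mean, var


if __name__ == "__main__":
    # Compare against the values the derivation predicts: mass 1, mean z/2, variance sigma^2/4.
    print(conditional_moments(z=3.0, sigma=2.0))
```

For $z = 3$ and $\sigma = 2$ the computed mean and variance should agree with $z/2 = 1.5$ and $\sigma^2/4 = 1$ up to discretization error.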
By breeding with knowledge of parental trait values, the variance of the next generation’s trait values is cut in half; and, with high probability, the mating of an individual with an extreme trait value 𝑧 with one with an ordinary trait produces a child with a medium strength trait near 𝑧/2. Thus far, nothing new; but the Cauchy (along with other heavy tailed distributions) behaves in a radically different way. Using the Cauchy density
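$$f_d(x) = \frac{\sigma}{\pi (x^2 + \sigma^2)}$$

for which the haploid scale is $1/2$ of the diploid scale, the conditional density becomes

$$f(x \mid z) = \frac{4\, f_d(2x)\, f_d(2(z - x))}{f_d(z)} = \frac{4\sigma}{\pi} \cdot \frac{z^2 + \sigma^2}{(4x^2 + \sigma^2)\,(4(z - x)^2 + \sigma^2)}$$

This density is not Cauchy; the tails are asymptotically $x^{-4}$, although the tails do return to $x^{-2}$ behavior in the limit as $z \to \pm\infty$. We plot this density for $\sigma = 1$ and each of $z = 0, 1, 3, 7$ in Figure 1.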
The phenomenon we observe is completely different from the Gaussian case. For sufficiently extreme values of 𝑧, rather than inheriting an intermediate trait, the offspring will receive, with equal probability, either a trait value near 𝑧, or a trait value near zero. Receiving a trait value near 𝑧/2 is probabilistically excluded for large 𝑧. This phenomenon has an obvious genetic analogue in dominant traits.
Figure 1: Conditional distributions of single gamete contribution for different values of $z$.
The implication is that the model implicitly assigns the responsibility for the extreme trait values to a single strong gene, that then follows dominant Mendelian inheritance. Small trait values, however, still follow something very much like Gaussian inheritance. The bimodal feature begins at $z = \sigma$, splitting trait values into two distinct classes. The effect applies whether or not the underlying genetic model is polygenic, and is in fact not dependent on the assumed number of loci; the only thing we need to know about the particular trait is the population distribution of traits! This phenomenon is tantalizingly close to being empirically testable.
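The bifurcation can be reproduced with a few lines of code. The sketch below assumes a Cauchy diploid density with scale $\sigma$ and a haploid scale of $\sigma/2$, as the stable scaling rule gives for the Cauchy, evaluates $f(x \mid z) = f(x)\,f(z - x)/f_d(z)$ on a grid, and counts strict local maxima. The grid limits and function names are illustrative choices rather than anything prescribed by the derivation.

```python
import math


def cauchy_pdf(x, scale):
    """Density of a Cauchy distribution centred at 0."""
    return scale / (math.pi * (x * x + scale * scale))


def conditional_gamete_density(x, z, sigma):
    """f(x | z) = f(x) f(z - x) / f_d(z) with Cauchy haploid scale sigma / 2."""
    half = sigma / 2.0
    return cauchy_pdf(x, half) * cauchy_pdf(z - x, half) / cauchy_pdf(z, sigma)


def count_modes(z, sigma, steps=4001):
    """Count strict local maxima of f(x | z) on a grid over [-|z| - 5 sigma, |z| + 5 sigma]."""
    lo = -abs(z) - 5.0 * sigma
    hi = abs(z) + 5.0 * sigma
    dx = (hi - lo) / (steps - 1)
    ys = [conditional_gamete_density(lo + i * dx, z, sigma) for i in range(steps)]
    return sum(1 for i in range(1, steps - 1) if ys[i - 1] < ys[i] > ys[i + 1])


if __name__ == "__main__":
    for z in (0.0, 0.5, 3.0, 7.0):
        print("z =", z, "modes:", count_modes(z, sigma=1.0))
```

With $\sigma = 1$, values of $z$ below $\sigma$ should give a single mode at $z/2$, while $z = 3$ and $z = 7$ should give two modes, one near $0$ and one near $z$, with a trough at $z/2$. The boundary case $z = \sigma$ has a flat top and is best avoided in a grid-based count.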
3.2 Toward a localization strategy
We could plug a single locus version of this model into an existing linkage mapping strategy; to do so we could specify that founders’ alleles are sampled from the Cauchy distribution, and:
1. Discretize the distribution to create a large but finite number of alleles
2. Compute a probability of observing the phenotype given a genotype using a model of error from discretization
3. Compute the probability of each genotype assignment given a segregation path and founder allele assignment
This would, however, require a large number of alleles and therefore be unusually expensive computationally. The approach more distinctively suited to this particular application would have lower resolution, but would naturally incorporate polygenic features. We assume that the loci are evenly spaced along the chromosome with respect to some distance metric such as map distance or number of base pairs. Every base pair is a locus carrying a tiny fraction of the total variability, adding up to the population variability. Then, every disjoint region corresponds to a Cauchy variate with scale proportional to its length.
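Step 1 of the single locus approach can be sketched as follows. This is one possible discretization, not one specified in the text: it assumes equal-probability bins, so that each of the $K$ alleles has frequency $1/K$, and represents each bin by its median, computed from the Cauchy quantile function $q(p) = \sigma \tan(\pi (p - 1/2))$. The function names and the choice of $K$ are hypothetical.

```python
import math


def cauchy_quantile(p, scale):
    """Inverse CDF of a Cauchy distribution centred at 0, for 0 < p < 1."""
    return scale * math.tan(math.pi * (p - 0.5))


def discretize_cauchy(n_alleles, scale):
    """Return n_alleles effect values, one per equal-probability bin (the bin median)."""
    return [cauchy_quantile((k + 0.5) / n_alleles, scale) for k in range(n_alleles)]


if __name__ == "__main__":
    effects = discretize_cauchy(n_alleles=11, scale=1.0)
    print([round(e, 3) for e in effects])
```

Because the outermost bins have unbounded width, the discretization error in the tails is large for any finite $K$, which is consistent with the remark that this route needs a large number of alleles.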
If we had perfect information about segregation, recombination, and phase in a pedigree, we could split up all of the population’s chromosomes into a set of shared subsequences. In practice unless the pedigree is simple enough to simulate these by MCMC, we would have to make a fixed phasing decision, and then reconstruct the pedigree by some heuristic treating marker sequences as strings, such as a greedy algorithm declaring that the longest contiguous string that could consistently be shared by a parent and a child must contain no recombinations. This would limit the resolution greatly, and introduce unmodeled errors, but may be feasible for complex pedigrees. To each distinct subsequence we assign an independent contribution variable, with scale proportional to length. Note that we’re not trying to model similarity of sequences; we only care about length and distinctiveness. Each individual’s trait value would then be a sum of a set of such subsequence contributions, with some subsequences potentially doubled within an inbred pedigree. Thus for each individual we have a set of linear equations of the form
$$(0 \;\; 1 \;\; 2 \;\; 0 \;\; 1)\,(g_1 \;\; g_2 \;\; g_3 \;\; g_4 \;\; g_5)' = G_A$$
The trait value for individual 𝐴 is a linear combination of the sequence contributions, each of which, before conditioning on these constraints, has a Cauchy distribution with a scale proportional to the sequence size. This setup should be sufficient to compute the conditional distribution of the subsequence contributions. As in the above case of conditioning on a parent’s trait, I believe this conditional contribution could be highly multimodal in the space of consistent allocations to the contributions, with each peak providing a hypothesis about a potential location and strength of an influential allele. In practice, even this calculation would be a computationally expensive MCMC problem, and, of course, introducing measurement error or environmental contributions would complicate things dramatically.
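The unconditional part of this setup is simple to write down. The sketch below assumes five shared subsequences with hypothetical lengths and a hypothetical scale per unit length; it draws independent Cauchy contributions with scale proportional to length and forms the linear combination for one individual. It does not attempt the conditional (MCMC) calculation. It relies on the standard fact that a linear combination of independent centered Cauchy variates is Cauchy with scale $\sum_k |c_k| s_k$.

```python
import math
import random


def sample_segment_contributions(lengths, scale_per_unit, rng):
    """Draw one independent Cauchy contribution per segment, scale proportional to length."""
    return [
        scale_per_unit * length * math.tan(math.pi * (rng.random() - 0.5))
        for length in lengths
    ]


def trait_value(copies, contributions):
    """Linear combination G_A = sum_k copies[k] * g_k for one individual."""
    return sum(c * g for c, g in zip(copies, contributions))


def trait_scale(copies, lengths, scale_per_unit):
    """Cauchy scale of G_A before conditioning: independent Cauchy scales add linearly."""
    return sum(abs(c) * scale_per_unit * length for c, length in zip(copies, lengths))


if __name__ == "__main__":
    rng = random.Random(1)
    copies = [0, 1, 2, 0, 1]          # copy counts from the example row above
    lengths = [10, 20, 5, 15, 30]     # hypothetical segment lengths
    g = sample_segment_contributions(lengths, 0.01, rng)
    print("G_A =", trait_value(copies, g), "prior scale =", trait_scale(copies, lengths, 0.01))
```

A doubled segment (copy count 2) contributes twice its scale, so inbreeding changes the prior scale of an individual's trait only through the total length carried, counting copies.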
4. Evolvability Considerations
We’ve made a case for the invariant character of the Cauchy distribution; once a trait has this set of properties, it is hard for the trait to lose them. It remains to be demonstrated that there are natural mechanisms for traits to obtain such specific features. Among the processes that give rise to the Cauchy distribution, one mechanism based on subordinated processes stands out. We assume that random mutations occur with some frequency per generation, and that the mutations alter some allele’s effect by adding a Gaussian noise term. This mutation process continues for a random amount of time determined by an operating time random variable. We will interpret the operating time process as waiting for some event that stops the mutation process. For example, that event could be the attainment by a separate mutation process of a defense mechanism against further mutations affecting both traits. The model is based on the distribution of first passage time on a Gaussian random walk, as derived for example in Feller (1971). Given two independent Gaussian random walks $X$ and $Y$, with starting time $t = 0$, $X_0 = Y_0 = 0$, $X_t \sim N(0, t)$, $Y_t \sim N(0, t)$, consider the time when $X$ first reaches some predetermined boundary, $T = \inf\{t : X_t = a\}$, where $a \neq X_0$. The first passage time distribution of $X$ is the stable distribution on positive support with coefficient 1/2. This distribution happens to be very bizarre; not only does the mean of an iid sample from it never converge, but it actually grows steadily and without bound as sample size increases. The distribution of $Y$ at a fixed time $t$ is $N(0, t)$; the distribution of $Y$ at $T$ is a mixture of such Gaussians weighted by the bizarre distribution of $T$, a mixture which happens to be a Cauchy with density

$$f(x) = \frac{a}{\pi (x^2 + a^2)}$$
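The subordination argument can be checked by simulation. The sketch below does not simulate the walk step by step; it assumes the standard representation of the first passage time of a continuous Gaussian process to level $a$ as $T = a^2 / Z^2$ with $Z \sim N(0, 1)$, then draws $Y_T$ as a $N(0, T)$ variate. If the mixture is Cauchy with scale $a$, the median of $|Y_T|$ should be close to $a$. Function names and sample sizes are illustrative.

```python
import math
import random
import statistics


def sample_first_passage_time(a, rng):
    """Draw T = inf{t : X_t = a} using the representation T = a^2 / Z^2, Z ~ N(0, 1)."""
    z = rng.gauss(0.0, 1.0)
    while z == 0.0:  # guard against division by zero
        z = rng.gauss(0.0, 1.0)
    return (a / z) ** 2


def sample_subordinated_trait(a, rng):
    """Draw Y_T: a N(0, T) variate evaluated at the random first passage time T of X."""
    t = sample_first_passage_time(a, rng)
    return rng.gauss(0.0, math.sqrt(t))


def estimate_cauchy_scale(samples):
    """Median of |y|; for a centred Cauchy sample this estimates the scale."""
    return statistics.median(abs(y) for y in samples)


if __name__ == "__main__":
    rng = random.Random(7)
    ys = [sample_subordinated_trait(2.0, rng) for _ in range(20000)]
    print("estimated scale:", estimate_cauchy_scale(ys), "boundary a:", 2.0)
```

The sample mean of the simulated first passage times is dominated by its largest few draws and keeps growing with sample size, which is the behavior of the stable law with coefficient 1/2 described above.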
To extend the biological analogy, we will interpret time as a counter of mutation events, with a unit of time representing an epoch within which a new mutation occurs and fixes within a population. Let $X$ and $Y$ be two distinct quantitative traits subject to independent mutation, with each mutation altering the trait by adding a Gaussian. When $X$ reaches the value $a$, the organism acquires critical mass for some trait which preserves the organism as is for us to observe in the current generation. The nonselected $Y$ traits dragged along in this process each receive a Cauchy-distributed shock. A better evolutionary model would be to consider the waiting time until the first of many members of a population reaches a threshold. As a property of the Poisson process, the minimum of independent exponential waiting times is distributed exponentially; but there is no analogously simple result for the first passage time distribution. A future challenge would be to generalize to the Weibull waiting time distribution, which through an extreme value theory argument can be used to model the minimum of a broader class of iid waiting times. Though there is a recent result (Sagias and Karagiannidis 2005) providing a (very complicated) analytic expression of the moment generating function for the Weibull distribution, which provides a path toward an analytic solution, this problem will remain difficult.
References
Edgar, G. Integral, Probability, and Fractal Measures. Springer-Verlag, New York, 1998.
Feller, W. An Introduction to Probability Theory and Its Applications, Vol. 2, 2nd ed. John Wiley and Sons, New York, 1971.
Sagias, N.C. and Karagiannidis, G.K. Gaussian class multivariate Weibull distributions: Theory and applications in fading channels. IEEE Transactions on Information Theory 51 (10), 3608-3619, 2005.
Weir, B.S. Genetic Data Analysis II: Methods for Discrete Population Genetic Data. Sinauer, Sunderland, 1996.