Population structure of multilocus associations.

Proc. Natl. Acad. Sci. USA

Vol. 78, No. 9, pp. 5913-5916, September 1981 Population Biology

Population structure of multilocus associations (barley/enzyme polymorphisms/gametic disequilibrium/coadaptation/subdivided populations)

A. H. D. BROWN* AND M. W. FELDMAN†‡

*Commonwealth Scientific and Industrial Research Organization, Division of Plant Industry, P.O. Box 1600, Canberra City, A.C.T., Australia; and †Department of Biological Sciences, Stanford University, Stanford, California 94305

Communicated by R. W. Allard, May 11, 1981

ABSTRACT A method of analysis is presented whereby the structure of multilocus associations among and within several populations can be partitioned into its components. The components are measured by their contributions to the variance in the number of heterozygous loci in two randomly chosen gametes. The single-locus components are the average and the variation among populations in gene diversity and the variance among populations in allele frequency. The two-locus components include the mean and variance of disequilibria, the covariance of allele frequencies over populations, and various interactions. When applied to allozyme data from populations of wild (Hordeum spontaneum) and cultivated barley (H. vulgare), the analysis highlighted the repetitive pattern of the multilocus associations in the composite crosses, whereas it emphasized the regionally localized and geographically variable pattern present in the natural populations of the wild species. The analysis is flexible and applicable to multilocus gametic data from any set of populations, without regard to the number of alleles per locus or the reproductive method of the organism.

Associations between alleles at different loci are a common feature of polymorphic populations of inbreeding plant species (see refs. 1 and 2 for review and references). Several workers (3-5) have argued that such associations are evidence of epistatic selection, whereas others have pointed out that they could arise by chance from population subdivision (6), hitchhiking (7), or founder effects (8). One approach has been to consider jointly the evidence from more than one population. Clegg et al. (4) considered two composite crosses and argued against founder effects on the grounds that "it is highly unlikely that the same sampling accident would occur in both populations."

To facilitate the study of the multilocus genetic structure of single populations, we (9) recently developed a series of indices based on the distribution [$f(K)$] of the number of heterozygous loci ($K$) when two randomly chosen gametes are compared. The most useful statistic of this distribution is the variance, which can be expressed in terms of the single-locus genic diversities (10), gene frequencies, and linkage disequilibria for all possible pairs of loci (see Formula 1 below). As a method of summarizing the degree of correlation among loci, this statistic has several biological and statistical advantages (9).

One such advantage is that it can be extended to the analysis of multilocus structure in subdivided populations. The extension parallels that developed by Nei (10) for single-locus measures of gene diversity. It affords a partition into contributions to the variance arising from single-locus effects and from two-locus associations, and further divides this latter group into associations within populations, associations among populations, and the overall consistency of associations at these different levels. Its aims are therefore similar to those of the analysis of Smouse and Neel (11). However, their method uses correlations between measures defined on zygotic frequencies and increases in complexity with increasing numbers of alleles per locus. Our method is defined on gametic frequencies in random samples of gametes and therefore avoids the complicating effects of departures from panmixia on zygotic disequilibrium (12).

Variance in number of heterozygous loci in a single population

Consider an infinite population of gametes whose genotype at each of $m$ loci is known. The random variable $K$ is the number of these loci ($K = 0, 1, \ldots, m$) that differ (are heterozygous) when two such random gametes are compared at these loci. The population distribution of this random variable, $f(K)$, depends on the degree of variability at each of the $m$ loci as well as on the correlation in allelic state among loci over gametes. The variance (9) of this distribution is

$$\sigma_K^2 = \sum_{j=1}^{m} h_j - \sum_{j=1}^{m} h_j^2 + 2 \sum_{j=1}^{m} \sum_{l>j} \sum_{i} \sum_{k} \left[ 2\,p_{ji}\,p_{lk}\,D_{ik}^{jl} + \left(D_{ik}^{jl}\right)^2 \right], \tag{1}$$

in which $p_{ji}$ is the frequency of the $i$th allele at the $j$th locus, $h_j = 1 - \sum_i p_{ji}^2$ is the single-locus gene diversity at the $j$th locus, and $D_{ik}^{jl}$ is the disequilibrium between the $i$th allele at the $j$th locus and the $k$th allele at the $l$th locus. Thus

$$D_{ik}^{jl} = g_{ik}^{jl} - p_{ji}\,p_{lk}, \tag{2}$$

in which $g_{ik}^{jl}$ is the frequency of the two-locus gamete with the $i$th allele at the $j$th locus and the $k$th allele at the $l$th locus. The variance, $\sigma_K^2$, is independent of higher-order associations (9). It consists of two parts. The first two summations over loci are the contribution of single-locus effects. The last term, cumulative over pairs of loci, measures the alteration to the variance due to pairwise associations. We will now extend the analysis to cover samples from a number, $J$, of populations.

Definitions for several populations

Without loss of generality it is only necessary to consider a pair of loci with allele frequencies $\{p_i\}$ and $\{q_k\}$, so that the $j$ and $l$ index variables are suppressed. For each genetic parameter ($g$, $p$, $q$, $D$, $h$, and $\sigma_K^2$), we define three kinds of values:

(i) the $J$ separate values within each population. With the superscript $t$ used to denote the $t$th population ($t = 1, 2, \ldots, J$), formula 2 becomes ${}^{t}g_{ik} = {}^{t}p_i\,{}^{t}q_k + {}^{t}D_{ik}$;

(ii) the average of the individual population values, $\bar{g}_{ik} = \sum_t {}^{t}g_{ik}/J$; and

(iii) the value belonging to a hypothetical equiproportional

mixture of the $J$ populations, ${}^{T}D_{ik} = {}^{T}g_{ik} - {}^{T}p_i\,{}^{T}q_k$, in which ${}^{T}g_{ik} = \sum_t {}^{t}g_{ik}/J\ (= \bar{g}_{ik})$ and ${}^{T}p_i = \sum_t {}^{t}p_i/J\ (= \bar{p}_i)$.

The covariance of allele frequencies at two loci over the populations is

$$\operatorname{cov}(p_i, q_k) = \sum_t {}^{t}p_i\,{}^{t}q_k / J - \bar{p}_i\,\bar{q}_k,$$

so that

$${}^{T}D_{ik} = \bar{D}_{ik} + \operatorname{cov}(p_i, q_k).$$

This increase in disequilibrium in a mixed population has been called the Wahlund effect at two loci (refs. 13 and 14; see ref. 13 for references to previous derivations of this result). The single-locus diversity parameters are

$${}^{t}h_1 = 1 - \sum_i {}^{t}p_i^{2}, \qquad {}^{t}h_2 = 1 - \sum_k {}^{t}q_k^{2}, \qquad \bar{h}_1 = \sum_t {}^{t}h_1 / J, \qquad {}^{T}h_1 = 1 - \sum_i {}^{T}p_i^{2},$$

with $\bar{h}_2$ and ${}^{T}h_2$ defined in the same way for the second locus. From these definitions, the equivalents of Nei's (10) $D_{ST}$ statistics or Wahlund's single-locus variances follow as

$$d_1 = {}^{T}h_1 - \bar{h}_1, \qquad d_2 = {}^{T}h_2 - \bar{h}_2.$$

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U. S. C. §1734 solely to indicate this fact.

‡ To whom reprint requests should be addressed.

Components of variance for several populations

Consider the hypothetical population consisting of an equiproportional mixture of the $J$ populations. From formula 1 above, the variance in number of heterozygous loci is

$${}^{T}\sigma_K^2 = {}^{T}h_1 + {}^{T}h_2 - {}^{T}h_1^{2} - {}^{T}h_2^{2} + 2 \sum_i \sum_k {}^{T}D_{ik} \left[ 2\,{}^{T}p_i\,{}^{T}q_k + {}^{T}D_{ik} \right]. \tag{3}$$

This expression can be split into six components, of which three are single-locus contributions and three arise from the pairwise summation term. The single-locus components are

$${}^{T}h_1 + {}^{T}h_2 - {}^{T}h_1^{2} - {}^{T}h_2^{2} = MH + VH + WH.$$

$MH$ arises from the averages of the population gene diversity,

$$MH = \bar{h}_1 + \bar{h}_2 - \sum_t \left[ {}^{t}h_1^{2} + {}^{t}h_2^{2} \right] / J.$$

$VH$ arises from the variation among populations in gene diversity,

$$VH = \operatorname{var}(h_1) + \operatorname{var}(h_2).$$

$WH$ arises from the Wahlund effect or variance of allele frequency among populations,

$$WH = d_1 (1 - 2\bar{h}_1 - d_1) + d_2 (1 - 2\bar{h}_2 - d_2).$$

The two-locus components are

$$2 \sum_i \sum_k {}^{T}D_{ik} \left[ 2\,{}^{T}p_i\,{}^{T}q_k + {}^{T}D_{ik} \right] = MD + WC + AI.$$

$MD$ arises from the average of the population disequilibria,

$$MD = 2 \sum_i \sum_k \bar{D}_{ik} \left[ 2\,{}^{T}p_i\,{}^{T}q_k + \bar{D}_{ik} \right];$$

$WC$ arises from the Wahlund effect in two loci or covariance of allele frequencies,

$$WC = 2 \sum_i \sum_k \operatorname{cov}(p_i, q_k) \left[ 2\,{}^{T}p_i\,{}^{T}q_k + \operatorname{cov}(p_i, q_k) \right];$$

and $AI$ arises from the average interaction of these effects,

$$AI = 4 \sum_i \sum_k \bar{D}_{ik} \operatorname{cov}(p_i, q_k).$$

Now consider the average of the variances belonging to each of the $J$ populations. This can also be split into five components, one being a single-locus effect and the remaining four being two-locus effects:

$$\bar{\sigma}_K^2 = \sum_t {}^{t}\sigma_K^2 / J = MH + MD + AI + VD + CI.$$

The single-locus component $MH$ and the two-locus components $MD$ and $AI$ are as defined above. The remaining two-locus components are $VD$, which arises from variation among populations in disequilibria,

$$VD = 2 \sum_i \sum_k \operatorname{var}(D_{ik}) = 2 \sum_i \sum_k \left[ \sum_t {}^{t}D_{ik}^{2} / J - \bar{D}_{ik}^{2} \right],$$

and $CI$, which arises from covariation in the interaction of disequilibria and Wahlund's covariance among populations,

$$CI = 4 \sum_i \sum_k \operatorname{cov}\left[ (p_i q_k), D_{ik} \right].$$
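The partition defined above can be checked numerically for a single pair of loci. The following is a minimal sketch, not the authors' program: the function names `sigma2_K` and `variance_components`, the array layout (population, allele at locus 1, allele at locus 2), and the demonstration frequencies are all illustrative assumptions. Variances and covariances over populations are taken with divisor J, which is what the definitions of the averages imply.

```python
import numpy as np


def sigma2_K(g):
    """Formula 1 for one population and one pair of loci.

    g[i, k] is the frequency of the gamete carrying allele i at the
    first locus and allele k at the second locus (entries sum to 1).
    """
    g = np.asarray(g, dtype=float)
    p = g.sum(axis=1)                 # allele frequencies, locus 1
    q = g.sum(axis=0)                 # allele frequencies, locus 2
    pq = np.outer(p, q)
    D = g - pq                        # formula 2
    h1 = 1.0 - (p ** 2).sum()
    h2 = 1.0 - (q ** 2).sum()
    return h1 + h2 - h1 ** 2 - h2 ** 2 + 2.0 * (D * (2.0 * pq + D)).sum()


def variance_components(g):
    """Components MH, VH, WH, MD, WC, AI, VD, CI for J populations.

    g has shape (J, I, K): g[t] is the gametic table of population t.
    """
    g = np.asarray(g, dtype=float)
    p = g.sum(axis=2)                         # shape (J, I)
    q = g.sum(axis=1)                         # shape (J, K)
    pq = p[:, :, None] * q[:, None, :]        # products tp_i * tq_k
    D = g - pq                                # within-population D_ik
    h1 = 1.0 - (p ** 2).sum(axis=1)           # gene diversity per population
    h2 = 1.0 - (q ** 2).sum(axis=1)

    # Equiproportional mixture of the J populations
    Tp, Tq = p.mean(axis=0), q.mean(axis=0)
    Tpq = np.outer(Tp, Tq)
    Th1 = 1.0 - (Tp ** 2).sum()
    Th2 = 1.0 - (Tq ** 2).sum()
    Dbar = D.mean(axis=0)
    cov_pq = pq.mean(axis=0) - Tpq            # cov(p_i, q_k) over populations
    d1, d2 = Th1 - h1.mean(), Th2 - h2.mean()

    return {
        "MH": h1.mean() + h2.mean() - (h1 ** 2 + h2 ** 2).mean(),
        "VH": h1.var() + h2.var(),            # np.var uses divisor J by default
        "WH": d1 * (1 - 2 * h1.mean() - d1) + d2 * (1 - 2 * h2.mean() - d2),
        "MD": 2.0 * (Dbar * (2.0 * Tpq + Dbar)).sum(),
        "WC": 2.0 * (cov_pq * (2.0 * Tpq + cov_pq)).sum(),
        "AI": 4.0 * (Dbar * cov_pq).sum(),
        "VD": 2.0 * D.var(axis=0).sum(),
        "CI": 4.0 * ((pq * D).mean(axis=0) - pq.mean(axis=0) * Dbar).sum(),
    }


# Hypothetical gametic frequencies for two populations, two alleles per locus
g_demo = np.array([
    [[0.4, 0.1], [0.1, 0.4]],
    [[0.3, 0.2], [0.1, 0.4]],
])
c_demo = variance_components(g_demo)
mixed = sigma2_K(g_demo.mean(axis=0))                   # variance in the mixed pool
average = np.mean([sigma2_K(gt) for gt in g_demo])      # average within-population variance
print(c_demo)
print(mixed, sum(c_demo[n] for n in ("MH", "VH", "WH", "MD", "WC", "AI")))
print(average, sum(c_demo[n] for n in ("MH", "MD", "AI", "VD", "CI")))
```

The two printed pairs should agree to rounding error: the first compares the variance in the mixed pool with MH + VH + WH + MD + WC + AI, and the second compares the average within-population variance with MH + MD + AI + VD + CI. For more than two loci, the same function would be applied to each pair of loci and the single-locus terms taken once per locus.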

When $m$ loci are scored, the single-locus components (namely, MH, VH, and WH) are evaluated for each locus and summed, whereas the two-locus components are evaluated over the $m(m - 1)/2$ possible pairs of loci and summed. The gametic phase must be known or all $D_{ik}$ must be estimated. The use of this analysis can now be illustrated with the data on four esterase loci in H. vulgare of Clegg et al. (4) and on 19 loci in H. spontaneum (15).

Complex esterase polymorphism in two barley composite crosses

Clegg et al. (4) have estimated the joint four-locus gametic frequencies for four esterase loci (EA, EB, EC, and ED, hereafter designated Est-1, Est-2, Est-4, and Est-5) in three generations of two composite crosses (CC II and CC V) of cultivated barley. The Est-2, -1, and -4 loci form a tightly linked unit, whereas Est-5 segregates independently of that unit. Striking correlations in allelic state over loci developed in these composite populations. Clegg et al. noted that the same pair of complementary four-locus gametes came into marked excess in both composite populations. The components of variance in number of heterozygous loci were computed from gametic frequencies for the two late generations (F41 in CC II and F26 in CC V) and for an equal mixture of these two generations by using the formulae above. The results are shown in Table 1, with the two-locus contributions of the linked block (Est-2/Est-1/Est-4) separated from those of associations with Est-5.

Table 1. Components of variance in the number of heterozygous esterase loci (Est-1, -2, -4, and -5) when the late generations of two composite crosses (F26 in CC V, F41 in CC II) of barley are considered as two subpopulations

| Component | Linked block (Est-2/1/4) | Unlinked pairs (Est-1/2/4 with Est-5) | Total |
|---|---|---|---|
| Single-locus effect: | | | |
| MH, mean gene diversity | 0.603 | | 0.827 |
| VH, variance in diversity | 0.048 | | 0.051 |
| WH, Wahlund's allele frequency variance | 0.043 | | 0.066 |
| Total in ${}^{T}\sigma_K^2$ | 0.694 | | 0.944 |
| Two-locus effect: | | | |
| WC, covariance of allele frequencies | 0.073 | 0.080 | 0.153 |
| MD, mean linkage disequilibrium | 0.307 | 0.088 | 0.395 |
| AI, interaction of WC x MD | 0.115 | 0.118 | 0.233 |
| VD, variance of disequilibrium | 0.015 | 0.011 | 0.025 |
| CI, covariation of interaction | -0.094 | 0.040 | -0.054 |
| $\bar{\sigma}_K^2$, average variance of K | 0.946 | | 1.427 |
| ${}^{T}\sigma_K^2$, variance of K in mixed pool | 1.135 | | 1.725 |

Of the total observed variance in the number of heterozygous loci in pairs of random gametes in the mixed pool, the single-locus components account for only 55%. There is thus almost a doubling of this variance due to associations among loci, both within and among populations. The average disequilibrium (MD) component is especially high for the linked loci and the variance of disequilibrium (VD) is low. This indicates that disequilibria are similar between the two generations. Wahlund's covariance (WC) also accounts for an appreciable fraction of the two-locus effects. The interaction (AI) is substantial and positive, indicating that the correlation of alleles between populations repeats the pattern within populations. Thus, the characteristics of multilocus associations in this analysis that are associated with systematic, repeatable selection are high MD, positive AI, low VD, and low CI. This conclusion was reached by Clegg et al. (4) on the basis of highly significant differences in multilocus gametic frequencies. The alternative hypothesis of founder effects might be invoked when VD and CI are high; that of population subdivision might be invoked when WC is high but AI is low.

Geographic patterns of variance components in H. spontaneum

The components of variance in number of heterozygous loci were determined for allozyme data from 28 natural populations of wild barley (H. spontaneum) in Israel (9, 15). This analysis included the following 19 inferred loci: Acph-1, -2, -3; Adh-1, -2; Est-1, -2, -4; Gdh; Got-1, -2; Mdh-1, -2; Nadhd-1; Pept-2; Pgi; Pgm; 6Pgd-2; To-2. Only gametes for which the full 19-locus phenotype was known were used. Table 2 gives the estimates of the components for all sites and for various contrasting partitions into two disjoint sets of 14 sites. The column labeled Est-2/1/4 gives estimates based on the linked esterase block only, for direct comparison with Table 1.

For these esterase loci, the wild populations as a group showed a pattern of two-locus effects that contrasted with the two composite crosses. The MD component was insignificant, the WC component was much greater than MD, and the VD component was higher. For the estimates based on all 19 loci, the component due to variation in disequilibrium (VD) was the most prominent two-locus effect. Thus, although the average population exhibited substantial multilocus structure $[(\bar{\sigma}_K^2 - MH)/MH]$ as reported previously (9), the most frequent gametic types differed from one population to another. Such a result would arise if founder effects were predominant or diversifying selection operated, or both. It should be noted that the sample sizes are much smaller than with the composite crosses.

To differentiate between these two forces, the full set of populations was divided into two equal groups on various bases. First, they were partitioned into those that were geographically close to all others and those that were distant from all others. The difference in the estimates of each component in the two subsets was compared with the mean absolute difference in such estimates for 10 random partitions into two equal subsets (Table 2, last column). In the first partition, the largest difference was for the VD component. The variance of disequilibrium was higher for remote populations, which is a result predicted by either the founder or the local selection hypothesis. The second partition was based on the mean genetic distance estimate of a population to all others (15). The group of genet-

Table 2. Components of variance in number of heterozygous loci, in a mixture of 28 Israel populations of H. spontaneum, for esterase and for 19 loci and for various partitions of the populations into two equal subsets

Component | Est 2/1/4, all 28 sites | All loci: All 28 sites | Geographic*: Close, Remote | Genetically†: Like, Unlike | Ecologically‡: Xeric, Mesic | Mean random§

0.43 MH 1.31 1.27 0.18 1.11 147 1.14 1.34 1.50 VH 0.13 0.41 0.04 0.35 0.40 0.36 0.38 0.41 0.34 -0.12 WH 0.39 0.17 0.76 0.26 0.59 0.48 0.25 0.02 0.44 0.16 Total 2.11 2.22 2.13 2.09 2.13 2.00 1.86 0.33 WC MD 0.01 0.01 AI VD 0.17 CI 0.09 $\bar{\sigma}_K^2$ 0.71 0.49 ${}^{T}\sigma_K^2$

0.64 0.21 0.25 0.37 0.06 0.07 0.11 0.06 0.05 0.04 0.07 0.03 0.04 0.02 0.02 0.55 0.77 0.80 0.62 0.93 0.98 0.55 0.30 0.40 0.44 0.32 0.18 2.35 2.68 2.53 2.42 2.65 2.72 2.27 2.48 2.45 1.99 3.00 2.47

* Average geographic distance from a site to all sites (Table 3 of ref. 15). Close: 3, 5, 7, 8, 9, 10, 11, 12, 21, 22, 23, 25, 26, 27.
† Mean genetic distance from a site to all other sites (Table 3 of ref. 15). Like: 3, 4, 6, 8, 15, 16, 19, 20, 21, 22, 23, 24, 25, 28.
‡ Ecologically xeric defined as mean annual rainfall