
Copyright © 1987 by the Genetics Society of America

The Allelic Correlation Structure of Gainj- and Kalam-Speaking People. II. The Genetic Distance Between Population Subdivisions

Jeffrey C. Long,* Peter E. Smouse† and James W. Wood‡

*Department of Anthropology, University of New Mexico, Albuquerque, New Mexico 87131, †Departments of Human Genetics and Biology, University of Michigan, Ann Arbor, Michigan 48109, and ‡Department of Anthropology, University of Wisconsin, Madison, Wisconsin 53706

Manuscript received June 26, 1986
Revised copy accepted June 18, 1987

ABSTRACT

The patterning of allele frequency variability among 18 local groups of Gainj and Kalam speakers of highland Papua New Guinea is investigated using new genetic distance methods. The genetic distances proposed here are obtained by decomposing Sewall Wright’s coefficient $F_{ST}$ into a set of coefficients corresponding to all pairs of population subdivisions. Two statistical methods are given to estimate these quantities. One method provides estimates weighted by sample sizes, while the other method does not use sample size weighting. Both methods correct for the within-individual and between-individual-within-groups sums of squares. Genetic distances among the Gainj and Kalam subdivisions are analyzed with respect to demographic, geographic, and linguistic variables. We find that a demographic feature, group size, has the greatest demonstrable association with the patterning of genetic distances. The pattern of geographic distances among groups displays a weak congruence with the pattern of genetic distances, and the association of genetic and linguistic diversity is very low. An effect of differences in group size on genetic distances is not surprising, from basic theoretical considerations, but genetic distances have not often been analyzed with respect to these variables in the past. The lack of correspondence between genetic distances and linguistic and geographic differences is an unusual feature that distinguishes the Gainj and Kalam from most other tribal populations.

Genetics 117: 273-283 (October, 1987)

THE Gainj and Kalam speakers are two groups of relatively recently contacted tribal swidden horticulturalists residing on the northern fringe of Papua New Guinea’s central highlands. They are linguistically distinct but otherwise very similar culturally. Both the Gainj and Kalam are subdivided into a series of local groups ranging in size from about 20 to 200 individuals. The local groups occupy non-overlapping territories, within which group members share access to garden land and forest resources. In keeping with usage in social anthropology (HOGBIN and WEDGEWOOD 1952), these local groups are referred to as parishes. In the establishment of marriages, both the Gainj and Kalam attempt to ensure that each person will have affinal ties with as many parishes as possible (JOHNSON 1982). There are prevailing rules of parish exogamy and patrilocal postmarital residence. Among other things, this system avoids marriages between very close relatives (JOHNSON 1982). While this is expected to retard the effects of genetic drift, polygyny is practiced in both groups, and this tends to reduce the effective population size, thereby enhancing the effect of drift (WOOD 1986).

Formal analysis of dispersal in these groups (WOOD, SMOUSE and LONG 1985) has revealed that 84% of all individuals were born in their father’s natal parish, while only 33% were born in the natal parish of their mother. The mean distance between the birthplaces of fathers and their offspring is only 1.3 ± 0.1 km compared to 4.2 ± 0.2 km for mothers and their offspring. In addition to geographical distance, the linguistic composition of the parishes was found to be an important determinant of pairwise migration rates. Thus, the population exhibits a remarkable degree of isolation by distance, and access to potential mates is further restricted by linguistic differences.

The mating pattern of these people is decidedly nonrandom, and allele frequency heterogeneity among parishes has been demonstrated by a significant value of WRIGHT’S (1965, 1969) coefficient $F_{ST}$ (LONG 1986). While this statistic is a useful indicator of population structure, it is a single summary measure, and consequently, it reveals very few details of the structure that it measures. In WRIGHT’S own words, “[the subdivisions] may be completely isolated, at one extreme, or merely arbitrarily bounded portions of a continuum, at the other. Their gene frequencies may be distributed at random in the total population or in an orderly cline. The full account of an actual population obviously requires a map, and detailed accounts of the various regions within it and their relations” (WRIGHT 1965). In this study, a method is developed for decomposing $F_{ST}$ into a set of distance statistics that measure divergence between all pairs of population subdivisions. The resulting pairwise view of among-subpopulation genetic variation should better reveal systematic patterns of variation than does the single coefficient.

Genetic distance analyses are not new in anthropological studies (cf. BALAKRISHNAN and SANGHVI 1968; CROW and DENNISTON 1974; EDWARDS 1971; JORDE 1980, 1985; LALOUEL 1980; NEI 1973, 1974; SMITH 1977; SMOUSE 1982; WORKMAN et al. 1973), but they have been laden with problems. The most appropriate distance measure may not be apparent because many of them cannot be related to other established genetic parameters (MORTON et al. 1971). In addition, their estimation has been complicated by: (1) unequal sample sizes, (2) the use of multiallelic loci, and (3) the estimation of allele frequencies from sample data. In the following, it is shown that the Euclidean distances between allele frequency vectors representing population subdivisions can be related to WRIGHT’S parameter $F_{ST}$. Capitalizing on this relationship, a measure of genetic distance is proposed and estimators for this genetic distance are developed as extensions of the variance components estimator of $F_{ST}$ (COCKERHAM 1969, 1973; WEIR and COCKERHAM 1984; LONG 1986). Thus, the aforementioned problems of estimation are overcome. These genetic distances are computed among 18 Gainj and Kalam parishes using blood protein data, and the resulting pattern of genetic variation among parishes is matched with patterns of demographic, geographic, and linguistic variation.

WRIGHT’S $F_{ST}$, EUCLIDEAN, AND STANDARDIZED DISTANCES

LONG (1986) defined a set of random vectors, $\mathbf{x}$, to describe variability at a locus with alleles $A_1, A_2, \ldots, A_Z$. Each vector, $\mathbf{x}$, has $Z - 1$ elements. The first vector element is assigned a one if the allele is type $A_1$, and a zero otherwise. At the second position, a one is assigned if the allele is type $A_2$, and a zero otherwise; and so forth until the position $Z - 1$ is reached. Thus, the vector corresponding to the allele $A_Z$ is composed of $Z - 1$ zeros. This approach will yield identical values for the genetic distances no matter which alleles are designated $A_1, A_2, \ldots, A_Z$. Since each person carries two alleles, two vectors are scored for each individual sampled.

We assume a total population that contains $k = 1, \ldots, K_t$ subdivisions. The total population will generally have a greater number of subdivisions than are sampled, and the number sampled, say $K$, needs to be distinguished from the total number, $K_t$. No assumptions are made about the number of individuals within subdivisions. For the present purposes, the subdivisions considered are the Gainj and Kalam parishes, but the methods developed here may be applied to villages, demes or any other kind of subdivisions.

The total variability in allele frequency vectors is denoted by the matrix $\Sigma$, and this variance-covariance matrix can be decomposed as $\Sigma = \Sigma_w + \Sigma_b + \Sigma_a$, where $\Sigma_w$ is the variance-covariance component matrix (VCCM) corresponding to alleles within individuals, $\Sigma_b$ is the VCCM that corresponds to variation among individuals within subdivisions, and $\Sigma_a$ is the among-subdivisions VCCM. SEWALL WRIGHT’S parameter $F_{ST}$ is obtained from the VCCMs according to $\mathrm{TR}[\Sigma^{-1/2} \Sigma_a \Sigma^{-1/2}]/(Z - 1)$, where TR denotes the trace of a matrix. The allelic vectors, partition of variance and covariance, and the relationship with $F_{ST}$ are presented in more detail by LONG (1986). Following WRIGHT (1965, p. 401), $F_{ST}$ is defined relative to a current population rather than an ancestral group.

The allele frequency vector for the $k$th subdivision is denoted $\mathbf{p}_k = [p_{1k}, p_{2k}, \ldots, p_{(Z-1)k}]^T$, and the average allele frequency vector in the total population is $\mathbf{p} = [p_1, p_2, \ldots, p_{Z-1}]^T$. The among-subdivision allele frequency VCCM is

$$\Sigma_a = (1/K_t) \sum_{k=1}^{K_t} [\mathbf{p}_k - \mathbf{p}][\mathbf{p}_k - \mathbf{p}]^T \tag{1}$$

and $\sum_{k=1}^{K_t} [\mathbf{p}_k - \mathbf{p}][\mathbf{p}_k - \mathbf{p}]^T$ forms the matrix of among-groups sums of squares and cross-products. It is well known (LI 1976; p. 64) that a sum of squares and cross-products can be expressed as a function of the $K_t(K_t - 1)/2$ pairwise differences between mean vectors, thus

$$\sum_{k=1}^{K_t} [\mathbf{p}_k - \mathbf{p}][\mathbf{p}_k - \mathbf{p}]^T = (1/K_t) \sum_{k<l} [\mathbf{p}_k - \mathbf{p}_l][\mathbf{p}_k - \mathbf{p}_l]^T \tag{2}$$
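As an illustrative aside that is not part of the original analysis, the allelic coding and the identity in (2) can be checked numerically. The following is a minimal sketch in Python with NumPy under several labeled assumptions: the allele data are hypothetical, the helper name `indicator_vectors` is invented for illustration, the sampled subdivisions are treated as the whole population (so that $K = K_t$), sample frequencies are used as if they were parameters (none of the corrections developed in the paper are applied), and the total VCCM is taken to be the covariance of a single indicator vector drawn from the total population, diag(p) - p p^T, since the source defers those details to LONG (1986).

```python
import numpy as np

def indicator_vectors(alleles, n_alleles):
    """Code alleles labelled 0..Z-1 as vectors of Z-1 indicators.

    Allele j < Z-1 gets a one in position j; the last allele (Z-1)
    is coded as a vector of Z-1 zeros. (Hypothetical helper.)
    """
    alleles = np.asarray(alleles)
    x = np.zeros((alleles.size, n_alleles - 1))
    rows = np.nonzero(alleles < n_alleles - 1)[0]
    x[rows, alleles[rows]] = 1.0
    return x

# Hypothetical data: allele labels at one 3-allele locus in 3 subdivisions.
Z = 3
samples = [
    [0, 0, 1, 2, 0, 1],
    [1, 1, 2, 2, 1, 0],
    [2, 2, 2, 0, 1, 2],
]

# Rows of P are the subdivision allele frequency vectors p_k (length Z-1).
P = np.array([indicator_vectors(s, Z).mean(axis=0) for s in samples])
K = P.shape[0]            # assumption: K sampled subdivisions = K_t
p_bar = P.mean(axis=0)    # unweighted average frequency vector p

# Left side of (2): among-groups sums of squares and cross-products.
dev = P - p_bar
lhs = dev.T @ dev

# Right side of (2): (1/K_t) times the sum over the K_t(K_t-1)/2 pairs.
rhs = np.zeros_like(lhs)
for k in range(K):
    for l in range(k + 1, K):
        d = P[k] - P[l]
        rhs += np.outer(d, d)
rhs /= K

# Equation (1): among-subdivision VCCM.
sigma_a = lhs / K

# Assumption: total VCCM of one indicator vector, diag(p) - p p^T.
sigma = np.diag(p_bar) - np.outer(p_bar, p_bar)

# TR[S^(-1/2) S_a S^(-1/2)] equals TR[S^(-1) S_a] because the trace is
# invariant under cyclic permutation, so a linear solve suffices.
f_st = np.trace(np.linalg.solve(sigma, sigma_a)) / (Z - 1)

print(np.allclose(lhs, rhs))
print(f_st)
```

With two alleles the last line reduces to the variance of the subdivision frequencies divided by p(1 - p), which is the familiar scalar form of $F_{ST}$; the value printed for these made-up data has no empirical meaning.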