HLA Class I Variation Controlled for Genetic Admixture in the Gila River Indian Community of Arizona: A Model for the Paleo-Indians
Robert C. Williams and Joan E. McAuley

ABSTRACT: The genetic distribution of the HLA class I loci is presented for 619 "full blooded" Pima and Tohono O'odham Native Americans (Pimans) in the Gila River Indian Community. Variation in the Pimans is highly restricted. There are only three polymorphic alleles at the HLA-A locus, *A2, *A24, and *A31, and only 10 alleles with a frequency greater than 0.01 at HLA-B where *Bw48 (0.187), *B35 (0.173), and the new epitope *BN21 (0.143) have the highest frequencies. Two and three locus disequilibria values and haplotype frequencies are presented. Ten three-locus haplotypes account for more than 50% of the class I variation, with *A24 *BN21 *Cw3 (0.085) having the highest frequency. Gm allotypes demonstrate that little admixture from non-Indian populations has entered the Community since the 17th century when Europeans first came to this area. As a consequence many alleles commonly found in Europeans and European Americans are efficient markers for Caucasian admixture, while the "private" Indian alleles, *BN21 and *Bw48, can be used to measure Native American admixture in Caucasian populations. It is suggested that this distribution in "full blooded" Pimans approximates that of the Paleo-Indian migrants who first entered the Americas between 20,000 and 40,000 years ago. Human Immunology 33, 39-46 (1992)
From the Histocompatibility Laboratory, Blood Systems, Inc. (R.C.W.; J.E.McA.), Scottsdale, and the Department of Anthropology, Arizona State University (R.C.W.), Tempe, Arizona. Address reprint requests to Robert C. Williams, Department of Anthropology, Arizona State University, Tempe, AZ 85287-2402. Received June 21, 1991; accepted October 18, 1991. © American Society for Histocompatibility and Immunogenetics, 1992

INTRODUCTION
The Pima and Tohono O'odham Indians in the Gila River Indian Community are culturally, linguistically, and genetically closely related and can be treated together as the Pimans [1-6]. Their class I HLA loci have been the object of a number of studies over the past 25 years [7-13]. The present work is motivated by the definition of a major new allele, BN21 [14], and the absence in the literature of a complete population genetics analysis of this important Native American group, one that includes a large sample, phenotype, and maximum likelihood allele frequencies, two- and three-locus haplotype frequency estimates and their disequilibria, and a control for genetic admixture. The unique position of the Pimans in epidemiological, genetic, and anthropological studies is well demonstrated [15, 16]. Since 1965 the National Institutes of Health (NIH) has been conducting a long-range study of arthritis and non-insulin-dependent diabetes mellitus (NIDDM) within the Gila River Indian Community [17], which has the highest prevalence of NIDDM in the world [18], a disease that is associated with HLA*A2 [9]. As part of the study, genetic markers from many loci have been typed and a method for determining stated-admixture has been developed [19]. Work in this population with the Gm system, a powerful marker for European admixture in Native Americans, suggests that only small amounts of Caucasian genes have entered the Community, that when a member says that he is "full blooded," he most likely is [20], and that as a result the Pimans might be a contemporary model for the Paleo-Indians who first migrated to North America between 20,000 and 40,000 years ago [19]. Therefore, characterization of their class I loci can yield insights into the early genetic distributions of the first Americans.

The class I genes of Native Americans are also of interest because they present a technical challenge and serve, along with Gm, as very efficient markers of genetic admixture. Many of the antigens are difficult to characterize, such as BN21, which types as neither B49 nor Bw50. Antigen Bw48 has a very high frequency in Pimans and must be distinguished from the highly crossreactive specificities Bw60 and Bw61, which are also relatively common. There is also a B5 that is neither B51 nor Bw52. Concerning genetic admixture, because of the remarkable restriction of class I variation in "full blooded" Pimans, there are many alleles that are common in Europeans and European Americans, but which are absent in the Pimans, and which serve as markers for Caucasian admixture. The importance of detecting European admixture in Native American tribes lies not only in basic anthropological research. It is necessary to control for admixture in disease association studies in which it can be a confounding variable [16]. Therefore, a description of the basic genetic distribution of class I genes in nonadmixed Native Americans, such as the present sample, can serve as a foundation for much further research.
MATERIALS AND METHODS
Subjects were chosen from the Gila River Indian Community in Arizona. As part of the research protocol, persons are seen at twice weekly clinics in Sacaton, Arizona, with the goal of examining each person in the Community every 2 years. When a subject is seen in clinic, information is requested about his or her ethnic background. This has led to the construction of an admixture index, stated-admixture, that is determined without reference to a person's genetic phenotype. When NIH began this long-range study in 1965, each person who was examined was asked about his or her pedigree and the amount of admixture from other tribes and non-Indian populations. This, as well as information from individuals who were familiar with the members of the tribes, established a basic admixture number for each member of the community who was interviewed. Since that time these numbers have been used with pedigree data to establish the admixture index of subsequent generations. The stated-admixture index is determined in increments of one eighth. For instance, a person with one parent who is a "full blooded" Indian and one who is a Caucasian would be classified as 4/8.

For the present study only those persons who were seen at clinic on the Gila River Indian Community, who are "full blooded" Native American, 0/8, and who are from either the Pima or Tohono O'odham (Papago) tribes were chosen. A person can be Pima, Tohono O'odham, or some combination of the two. However, no admixture from other Indian or non-Indian populations was permitted in the sample, as measured by the stated-admixture variable.
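The bookkeeping behind the stated-admixture index can be illustrated with a short sketch. It assumes, as a simplification that the text does not spell out, that a child's index is the mean of the two parental indices rounded to the nearest eighth; the names `child_admixture`, `as_eighths`, and `is_eligible` are hypothetical and are not taken from the study protocol.

```python
from fractions import Fraction

EIGHTH = Fraction(1, 8)


def child_admixture(parent_a: Fraction, parent_b: Fraction) -> Fraction:
    """Assumed rule: mean of the parental indices, rounded to the nearest eighth.

    Exact ties are resolved by Python's default round-half-to-even behaviour.
    """
    mean = (parent_a + parent_b) / 2
    return round(mean / EIGHTH) * EIGHTH


def as_eighths(index: Fraction) -> str:
    """Format an index in the n/8 notation used in the text."""
    return f"{int(index * 8)}/8"


def is_eligible(index: Fraction) -> bool:
    """Only persons classified 0/8 enter the present sample."""
    return index == 0


# Example from the text: one "full blooded" parent (0/8), one Caucasian parent (8/8).
child = child_admixture(Fraction(0), Fraction(1))
print(as_eighths(child))   # 4/8
print(is_eligible(child))  # False
```

Exact fractions are used so that repeated averaging over generations does not accumulate floating-point error.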
There are 619 persons who satisfy these criteria and have been typed for the HLA-A, HLA-B, and HLA-C loci at the Blood Systems' Histocompatibility Laboratory, in Scottsdale, Arizona. HLA typing has been performed by standard methods [21]. A full range of WHO recognized specificities were represented in the battery of antisera for each locus.

Allele frequencies were calculated using a maximum likelihood method for ABO-like systems after the method of Li (1976) [22]. The ABO-like model was chosen, rather than a codominant one, because of the segregation of blank alleles at the HLA-B and HLA-C loci. Two and three locus haplotype frequencies and their disequilibria were estimated by a pseudogene-counting method, as previously described [23-25]. For two-locus haplotypes the χ² value for genetic disequilibrium was approximated from a standard 2 × 2 table in which the values of the cells were taken from the final matrix of antigen counts after the program converged. If one of the cells had a frequency less than 5, then the χ² statistic was not computed.

Three considerations lead us to interpret the χ² statistic for disequilibrium with care. First, the entries of the cells for the 2 × 2 table are themselves estimates from the pseudogene-counting procedure. Second, the observations in each cell are probably not independent, one of the assumptions for the test statistic. Third, there is the problem of multiple comparisons; out of many tests some will be significant by chance. As a result, we do not include P values in the tables for two-locus disequilibria because their values are not known. Instead, we use the χ² statistic only as an approximation of the strength of the disequilibrium between two alleles, and not as part of a statistical test.

When collecting a large sample from a small population, a number of related persons will be included. Also, during the past 10 years many persons have been chosen for HLA typing because they were part of a clinical study. However, comparison of the present data with those derived from a subset of "unbiased" typings revealed no significant differences.
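The idea behind maximum likelihood estimation for an ABO-like locus, in which a blank allele is recessive to every detectable antigen, can be sketched with a generic gene-counting (expectation-maximization) loop. This is an illustration under stated assumptions and not necessarily the procedure of Li [22]: it assumes Hardy-Weinberg proportions, and the function name `gene_count_em`, the phenotype encoding, and the counts in the usage lines are hypothetical (the counts were constructed to match frequencies of 0.5, 0.3, and 0.2).

```python
def gene_count_em(pheno_counts, alleles, n_iter=1000, tol=1e-12):
    """Gene-counting estimate of allele frequencies with a recessive blank.

    pheno_counts maps a frozenset of detected antigens (size 0, 1, or 2)
    to the number of persons with that phenotype.
    Returns (dict of allele frequencies, blank frequency).
    """
    n_persons = sum(pheno_counts.values())
    start = 1.0 / (len(alleles) + 1)          # equal starting values
    freq = {a: start for a in alleles}
    blank = start
    for _ in range(n_iter):
        counts = {a: 0.0 for a in alleles}
        blank_count = 0.0
        for pheno, n in pheno_counts.items():
            antigens = tuple(pheno)
            if len(antigens) == 2:            # heterozygote, both alleles seen
                for a in antigens:
                    counts[a] += n
            elif len(antigens) == 1:          # homozygote or antigen/blank
                a = antigens[0]
                hom = freq[a] ** 2
                het = 2.0 * freq[a] * blank
                w_hom = hom / (hom + het)     # expected share of homozygotes
                counts[a] += n * (1.0 + w_hom)
                blank_count += n * (1.0 - w_hom)
            else:                             # no antigen detected: blank/blank
                blank_count += 2.0 * n
        new_freq = {a: c / (2.0 * n_persons) for a, c in counts.items()}
        new_blank = blank_count / (2.0 * n_persons)
        change = max([abs(new_freq[a] - freq[a]) for a in alleles]
                     + [abs(new_blank - blank)])
        freq, blank = new_freq, new_blank
        if change < tol:
            break
    return freq, blank


# Hypothetical phenotype counts for 10,000 persons (not data from this study).
example = {
    frozenset({"A"}): 4500,
    frozenset({"B"}): 2100,
    frozenset({"A", "B"}): 3000,
    frozenset(): 400,
}
freqs, blank_freq = gene_count_em(example, ["A", "B"])
print(round(freqs["A"], 3), round(freqs["B"], 3), round(blank_freq, 3))
```

A codominant model would count every single-antigen phenotype as a homozygote; the weighting step above is what lets the blank allele receive a nonzero frequency.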
RESULTS
Three alleles segregate with a polymorphic frequency (>0.01) at the HLA-A locus in the sample of 619 Pimans, HLA*A2, *A24, and *A31 (Table 1). Allele HLA*A2 has the highest frequency, 0.561, while *A24 is also very common, 0.342. Together these two antigens make up more than 90% of the allelic variation at this locus. Rare alleles at this locus were included in the category HLA*AR and include the following (allele, frequency): HLA*A1 (1), *A3 (1), *A23 (1), *A28 (6), *A29 (1), *A30 (3), and *Aw33 (1). The maximum likelihood estimate for the frequency of HLA*AX is 0.0.

TABLE 1 Gene frequencies for the "full-blooded" Pimans in the Gila River Indian Community

Allele    Phenotype frequency    Allele frequency    95% confidence interval

HLA-A locus, N = 619
*A2       0.814                  0.561               0.531, 0.591
*A24      0.580                  0.342               0.315, 0.370
*A31      0.149                  0.080               0.065, 0.095
*AR       0.034                  0.017               0.010, 0.024
*AX       -                      0.000               0.000, 0.009

HLA-B locus, N = 619
*B5       0.147                  0.075               0.060, 0.090
*BN21     0.268                  0.143               0.123, 0.163
*B27      0.194                  0.099               0.082, 0.116
*B35      0.326                  0.173               0.151, 0.195
*B39      0.208                  0.111               0.094, 0.129
*B40      0.031                  0.016               0.009, 0.023
*Bw48     0.331                  0.187               0.165, 0.210
*B51      0.109                  0.057               0.044, 0.070
*Bw60     0.068                  0.036               0.026, 0.047
*Bw61     0.094                  0.049               0.036, 0.061
*BR       0.042                  0.021               0.013, 0.029
*BX       -                      0.033               0.025, 0.041

HLA-C locus, N = 619
*Cw2      0.192                  0.102               0.085, 0.119
*Cw3      0.396                  0.224               0.199, 0.248
*Cw4      0.288                  0.154               0.134, 0.175
*Cw7      0.220                  0.115               0.097, 0.133
*Cw8      0.313                  0.167               0.145, 0.188
*CwR      0.005                  0.002               0.000, 0.005
*CwX      -                      0.236               0.222, 0.250

At the HLA-B locus in this sample, ten serologically defined antigens and alleles segregate at polymorphic frequency (Table 1). Allele HLA*Bw48 is represented most frequently at this locus (0.187), followed in descending order by *B35 (0.173), *BN21 (0.143), and *B39 (0.111), all of which have allele frequencies greater than 0.1. Together these four alleles comprise 61% of the variation at the HLA-B locus. The remaining alleles, while being polymorphic, are less common. Antigen B5 is a specificity that is neither B51 nor Bw52 and appears to share an epitope with Bw53 (data not shown). Antigen B40 in Table 1 is a category for specificities that could not be typed as Bw48, Bw60, or Bw61. Rare antigens are included in the single allele category, HLA*BR, and include the following: HLA*B7 (3), *B8 (1), *B14 (1), *B16 (1), *B44 (5), *B45 (1), *Bw52 (13), and *Bw53 (1). The blank frequency at this locus, HLA*BX, is 0.033, and its 95% confidence interval does not include 0.0.
Locus HLA-C also has a restricted allele frequency distribution in the sample of 619 Pima-Tohono O'odham Indians, having only five serological antigens with polymorphic allele frequencies: HLA*Cw3, 0.224; *Cw8, 0.167; *Cw4, 0.154; *Cw7, 0.115; and *Cw2, 0.102 (Table 1). With a frequency of 0.236, the blank allele HLA*CwX represents the highest allele frequency at this locus. Rare alleles in HLA*CwR include *Cw5 (1) and *Cw6 (1).

Genetic disequilibria estimates (D) and haplotype frequencies for the HLA-A and HLA-B loci are found in Table 2. The haplotype with the highest frequency is *A2 *Bw48 (0.1227), followed by *A2 *B35 (0.0938), *A24 *BN21 (0.0851), *A24 *B35 (0.0726), *A2 *B5 (0.0683), *A2 *B27 (0.0682), *A24 *B39 (0.0520), *A2 *BN21 (0.0507), *A2 *B51 (0.0472), and *A24 *Bw48 (0.0440). Ignoring the rare alleles at these two loci the number of haplotypes possible is 60, while the ten combinations above represent 70% of the total variation. The new HLA-B locus allele, *BN21, is in strong positive disequilibrium with HLA*A24 (D = 0.0368) and strong negative disequilibrium with HLA*A2 (D = -0.0283).

The associations of the haplotypes for HLA-B and HLA-C, because of their close linkage, are of particular strength and interest (Table 3). For instance, HLA*B27 and *Cw2 are in very strong, positive disequilibrium with D = 0.0877 and a haplotype frequency of 0.0975. This is nearly ten times the random expectation for the haplotype frequency of this pair, 0.0101. The result is that all of the other haplotypes with *Cw2, except *Bw48 *Cw2, have a frequency of 0.0. The new allele, *BN21, is in strong positive disequilibrium with *Cw3 (D = 0.0729) with a haplotype frequency of 0.1046. Allele HLA*B35 exhibits the well-known, strong positive pairing with *Cw4 (D = 0.1242). Strong positive disequilibrium is also exhibited by *B39 *Cw7 (D = 0.0531), *Bw48 *Cw8 (D = 0.0395), and *B5 *Cw8 (D = 0.0364).
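The arithmetic behind the *B27 *Cw2 example can be checked with a few lines. The sketch assumes the usual definition of pairwise disequilibrium, D = h - p q, where h is the haplotype frequency and p and q are the allele frequencies; the function name `pairwise_disequilibrium` is hypothetical. The inputs are the rounded values printed in Table 1 and in the text, so the result only approximates the tabulated D.

```python
def pairwise_disequilibrium(hap_freq, freq_a, freq_b):
    """Return (D, random expectation) for one two-locus haplotype.

    D = observed haplotype frequency minus the product of allele frequencies.
    """
    expected = freq_a * freq_b
    return hap_freq - expected, expected


# *B27 *Cw2: haplotype 0.0975; allele frequencies 0.099 (*B27) and 0.102 (*Cw2).
d, expected = pairwise_disequilibrium(0.0975, 0.099, 0.102)
ratio = 0.0975 / expected

print(f"random expectation = {expected:.4f}")  # 0.0101
print(f"D = {d:.4f}")                          # 0.0874
print(f"observed / expected = {ratio:.1f}")
```

The computed D of about 0.0874 is slightly below the reported 0.0877, a gap that presumably reflects rounding of the inputs and the separate estimation procedures used for allele and haplotype frequencies.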
For the HLA-A and HLA-C loci, once again alleles HLA*A2 and HLA*A24 combine with the HLA-C alleles to construct the most frequent allele pairs (Table 4). Haplotype *A24 *Cw3 is in strong positive disequilibrium (D = 0.0521) and has the highest haplotype frequency, 0.1280. As was seen above, HLA*A2 demonstrates the opposite pairing when compared to HLA*A24, it being in strong negative disequilibrium with *Cw3 (D = -0.0505) with a haplotype frequency of 0.0739. The frequencies of haplotypes with *A2 are, however, generally large, in the order of 0.05 or higher. The blank allele, *CwX, and *A2 form a haplotype that is particularly common, 0.1594, and that exhibits positive genetic disequilibrium (D = 0.0285).

Table 5 presents the frequencies and disequilibria for the three-locus haplotypes. The new allele *BN21 is in positive disequilibrium with *A24 and *Cw3 (D = 0.0344) and has the highest haplotype frequency, 0.0852. Nine haplotypes together represent more than 50% of the total variation for these three allele combinations. Haplotypes *A2 *B35 *Cw4, *A24 *B35 *Cw4, *A2 *Bw48 *CwX, and *A2 *B27 *Cw2 are common, all with frequencies greater than 0.07. A complete list of the three-locus haplotypes, their disequilibria, and associated statistics is available from the authors.

TABLE 2 Haplotype frequencies, disequilibria, and χ² values for the HLA-A and HLA-B loci (a)

                 *B5    *BN21     *B27     *B35     *B39     *B40
*A2   freq     .0683    .0507    .0682    .0938    .0372    .0072
      D        .0270   -.0283    .0133   -.0022   -.0243   -.0014
      χ²       51.29    31.90     9.33      .11    28.96      .35
*A24  freq     .0000    .0851    .0147    .0726    .0520    .0034
      D       -.0252    .0368   -.0189    .0141    .0144   -.0019
      χ²          NT    60.11    21.14     7.24    10.99       NT
*A31  freq     .0062    .0030    .0141    .0033    .0217    .0001
      D        .0004   -.0082    .0063   -.0102    .0130   -.0011
      χ²         .00       NT     6.67       NT    27.75       NT
*AR   freq     .0000    .0000    .0021    .0029    .0000    .0000
      D       -.0013   -.0024    .0004   -.0001   -.0019   -.0003
      χ²          NT       NT       NT       NT       NT       NT
*AX   freq     .0000    .0037    .0000    .0004    .0000    .0048
      D       -.0009    .0021   -.0011   -.0016   -.0013    .0046
      χ²          NT       NT       NT       NT       NT   125.93

               *Bw48     *B51    *Bw60    *Bw61      *BR      *BX
*A2   freq     .1227    .0472    .0028    .0265    .0109    .0189
      D        .0189    .0161   -.0166   -.0004   -.0008   -.0013
      χ²       11.26    23.28       NT      .00      .04      .11
*A24  freq     .0440    .0000    .0266    .0212    .0042    .0147
      D       -.0193   -.0190    .0147    .0048   -.0029    .0024
      χ²       13.02       NT    33.23     2.32     1.74      .64
*A31  freq     .0205    .0060    .0015    .0007    .0012    .0000
      D        .0058    .0016   -.0013   -.0031   -.0004   -.0028
      χ²        3.27      .46       NT       NT       NT       NT
*AR   freq     .0000    .0029    .0016    .0000    .0048    .0027
      D       -.0032    .0020    .0010   -.0008    .0044    .0021
      χ²          NT       NT       NT       NT       NT       NT
*AX   freq     .0000    .0000    .0027    .0001    .0000    .0000
      D       -.0022   -.0007    .0023   -.0005   -.0002   -.0004
      χ²          NT       NT       NT       NT       NT       NT

(a) In order, haplotype frequency, disequilibrium value, and χ² statistic; NT, not tested because one or more of the cells had small numbers.

TABLE 3 Haplotype frequencies, disequilibria, and χ² values for the HLA-B and HLA-C loci (a)

                 *B5    *BN21     *B27     *B35     *B39     *B40
*Cw2  freq     .0000    .0000    .0975    .0000    .0000    .0000
      D       -.0074   -.0141    .0877   -.0169   -.0109   -.0015
      χ²          NT       NT       NT       NT       NT       NT
*Cw3  freq     .0026    .1046    .0002    .0143    .0000    .0156
      D       -.0139    .0729   -.0217   -.0238   -.0245    .0121
      χ²          NT   307.76       NT    27.54       NT       NT
*Cw4  freq     .0000    .0000    .0000    .1505    .0000    .0000
      D       -.0114   -.0219   -.0151    .1242   -.0169   -.0024
      χ²          NT       NT       NT       NT       NT       NT
*Cw7  freq     .0000    .0198    .0000    .0000    .0659    .0000
      D       -.0086    .0032   -.0114   -.0198    .0531   -.0018
      χ²          NT      .79       NT       NT   341.28       NT
*Cw8  freq     .0490    .0009    .0000    .0000    .0010    .0000
      D        .0364   -.0233   -.0167   -.0290   -.0177   -.0026
      χ²      165.78       NT       NT       NT       NT       NT
*CwR  freq     .0000    .0000    .0000    .0016    .0000    .0000
      D       -.0002   -.0003   -.0002    .0012   -.0003    .0000
      χ²          NT       NT       NT       NT       NT       NT
*CwX  freq     .0232    .0183    .0016    .0059    .0442    .0000
      D        .0050   -.0166   -.0225   -.0359    .0172   -.0038
      χ²        2.07    14.29       NT    59.64    19.31       NT

               *Bw48     *B51    *Bw60    *Bw61      *BR      *BX
*Cw2  freq     .0009    .0000    .0000    .0000    .0000    .0000
      D       -.0173   -.0056   -.0036   -.0047   -.0021   -.0033
      χ²          NT       NT       NT       NT       NT       NT
*Cw3  freq     .0024    .0000    .0334    .0462    .0000    .0016
      D       -.0389   -.0126    .0254    .0356   -.0046   -.0059
      χ²          NT       NT       NT       NT       NT       NT
*Cw4  freq     .0000    .0009    .0000    .0000    .0009    .0000
      D       -.0285   -.0078   -.0056   -.0073   -.0023   -.0052
      χ²          NT       NT       NT       NT       NT       NT
*Cw7  freq     .0211    .0000    .0000    .0000    .0042    .0042
      D       -.0004   -.0066   -.0042   -.0055    .0018    .0003
      χ²         .00       NT       NT       NT     1.17      .01
*Cw8  freq     .0709    .0298    .0000    .0000    .0098    .0067
      D        .0395    .0202   -.0061   -.0081    .0063    .0010
      χ²       88.90    64.89       NT       NT    14.88      .10
*CwR  freq     .0000    .0000    .0000    .0000    .0008    .0000
      D       -.0005   -.0001   -.0001   -.0001    .0008   -.0001
      χ²          NT       NT       NT       NT       NT       NT
*CwX  freq     .0917    .0263    .0031    .0019    .0052    .0215
      D        .0463    .0124   -.0058   -.0098    .0001    .0132
      χ²       93.27    18.15       NT       NT      .03    33.73

(a) In order, haplotype frequency, disequilibrium value, and χ² statistic; NT, not tested because one or more of the cells had small numbers.

TABLE 4 Haplotype frequencies, disequilibria, and χ² values for the HLA-A and HLA-C loci (a)

                *Cw2     *Cw3     *Cw4     *Cw7     *Cw8     *CwR     *CwX
*A2   freq     .0757    .0739    .0749    .0568    .1125    .0014    .1594
      D        .0193   -.0505   -.0107   -.0069    .0202    .0000    .0285
      χ²       19.68    72.37     4.03     2.07    14.20       NT    21.88
*A24  freq     .0135    .1280    .0775    .0364    .0168    .0011    .0653
      D       -.0210    .0521    .0252   -.0025   -.0395    .0002   -.0146
      χ²       25.56    84.88    26.07      .24    60.88       NT     6.17
*A31  freq     .0124    .0062    .0019    .0189    .0319    .0000    .0071
      D        .0045   -.0114   -.0102    .0099    .0188   -.0002   -.0114
      χ²        3.11    11.95       NT    15.15    42.00       NT    11.52
*AR   freq     .0000    .0048    .0000    .0028    .0051    .0000    .0042
      D       -.0017    .0010   -.0026    .0008    .0023    .0000    .0002
      χ²          NT      .16       NT       NT     1.95       NT      .01
*AX   freq     .0000    .0114    .0000    .0000    .0000    .0000    .0000
      D       -.0011    .0088   -.0018   -.0013   -.0019    .0000   -.0027
      χ²          NT       NT       NT       NT       NT       NT       NT

(a) In order, haplotype frequency, disequilibrium value, and χ² statistic; NT, not tested because one or more of the cells had small numbers.

TABLE 5 Three locus haplotype frequencies and disequilibria values for polymorphic (>0.01) class I haplotypes in the Pimans

Haplotype              Frequency    Cumulative frequency    D
*A24 *BN21 *Cw3        0.0852       0.0852                   0.0344
*A2 *B35 *Cw4          0.0751       0.1603                   0.0061
*A24 *B35 *Cw4         0.0745       0.2348                   0.0173
*A2 *Bw48 *CwX         0.0735       0.3083                   0.0127
*A2 *B27 *Cw2          0.0705       0.3788                   0.0133
*A2 *B5 *Cw8           0.0430       0.4218                   0.0097
*A24 *Bw48 *Cw8        0.0319       0.4537                   0.0185
*A2 *Bw61 *Cw3         0.0266       0.4803                   0.0035
*A2 *Bw48 *Cw8         0.0254       0.5057                  -0.0212
*A2 *B51 *Cw8          0.0248       0.5305                   0.0044
*A24 *B39 *CwX         0.0245       0.5550                   0.0077
*A24 *B39 *Cw7         0.0245       0.5795                   0.0008
*A2 *B5 *CwX           0.0243       0.6038                   0.0029
*A2 *B51 *CwX          0.0230       0.6268                   0.0030
*A2 *B39 *Cw7          0.0228       0.6496                  -0.0101
*A24 *Bw60 *Cw3        0.0216       0.6712                   0.0052
*A2 *BN21 *Cw3         0.0208       0.6920                  -0.0236
*A24 *Bw61 *Cw3        0.0192       0.7112                   0.0001
*A2 *BN21 *Cw7         0.0189       0.7301                   0.0122
*A31 *B39 *Cw7         0.0187       0.7488                   0.0110
*A2 *B39 *CwX          0.0171       0.7659                  -0.0047
*A2 *Bw48 *Cw7         0.0149       0.7808                   0.0023
*A31 *Bw48 *Cw8        0.0147       0.7955                   0.0046
*A24 *Bw48 *CwX        0.0143       0.8098                  -0.0092
*A24 *B27 *Cw2         0.0135       0.8233                  -0.0155
*A24 *BX *CwX          0.0119       0.8352                   0.0045
*A31 *B27 *Cw2         0.0118       0.8470                   0.0032
*A2 *BN21 *CwX         0.0110       0.8580                   0.0038

DISCUSSION
Serology at the HLA-A locus in this population is relatively simple because of its well-characterized alleles. At HLA-B, however, alleles private to Native Americans and crossreactive groups make typing more difficult. Of special interest is allele HLA*BN21, defined in the 10th International Workshop [14]. It is characterized by a strong reaction with anti-HLA*B21 sera and the absence of reactivity with monospecific anti-HLA*B49 and anti-HLA*Bw50 reagents. This antigen and its frequency have also been reported in Mexican Americans [26]. Complicating the serology in Pimans is the simultaneous segregation of crossreacting alleles HLA*Bw48, *Bw60, and *Bw61, all at polymorphic frequencies. We have also found evidence for another new allele, a split of HLA*B5, not *B51 or *Bw52, that appears to share an epitope with *Bw53 (data not shown). Further characterization is in progress.
The variation in class I genes is much more restricted in Pimans than in European American and many other non-Indian populations. At the HLA-A locus nearly 90% of variation is accounted for by *A2 and *A24, while only three alleles have a polymorphic frequency above 0.01. This contrasts with Caucasian populations, which have 14 or 15 alleles segregating with frequencies greater than 0.01 [25]. In Pimans, while nine serologically well-defined HLA-B specificities segregate at polymorphic frequencies, 20 to 25 alleles are commonly found in Caucasian populations at this magnitude [25]. These restricted distributions result in higher proportions of homozygosity and less heterozygosity in Pimans when compared with Europeans, European Americans, and other non-Indian groups.

The entries for the cells for the χ² statistics in the two and three locus disequilibria calculations are themselves estimates taken from the final matrix of haplotype frequencies. Usually the cells in such tests are enumerated observations of a given class. Given that they are instead estimates, the χ² values must be viewed as approximations.
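How such an approximate χ² arises from a 2 × 2 table can be sketched as follows. The code uses the standard one-degree-of-freedom formula without continuity correction and applies the rule from Materials and Methods that no statistic is computed when a cell is smaller than 5. It is an illustration only: the function names are hypothetical, and the cells are rebuilt here from the rounded frequencies in Tables 1 and 2 on an assumed total of 2 × 619 = 1238 chromosomes, whereas the authors took them from the converged matrix of antigen counts.

```python
def chi_square_2x2(a, b, c, d, min_cell=5.0):
    """Chi-square for the 2 x 2 table [[a, b], [c, d]], or None for "NT"."""
    if min(a, b, c, d) < min_cell:
        return None  # not tested: at least one small cell
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def cells_from_frequencies(hap_freq, freq_a, freq_b, n_chromosomes):
    """Estimated cell counts for one allele pair (assumed reconstruction)."""
    a = hap_freq * n_chromosomes                          # both alleles
    b = (freq_a - hap_freq) * n_chromosomes               # first allele only
    c = (freq_b - hap_freq) * n_chromosomes               # second allele only
    d = (1.0 - freq_a - freq_b + hap_freq) * n_chromosomes  # neither allele
    return a, b, c, d


# *A2 *B5: haplotype frequency 0.0683, allele frequencies 0.561 and 0.075.
cells = cells_from_frequencies(0.0683, 0.561, 0.075, 2 * 619)
chi2_a2_b5 = chi_square_2x2(*cells)
print(round(chi2_a2_b5, 2))
```

With these rounded inputs the statistic comes out near 49.8, the same order as the 51.29 tabulated for *A2 *B5 but not identical to it, which is consistent with the cells being estimates in both cases.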
Genetic Admixture
As part of epidemiological and disease-association studies in Native Americans it is very important to be able to control for the effect of genetic admixture, which is usually from European Americans. For example, it has been shown that the haplotype Gm 3;5,13,14 is a very efficient marker for Caucasian admixture in Native Americans [19, 20]. While using this set of allotypes to measure the amount of admixture in subjects from the Gila River Indian Community, we have found that there is an inverse relation between the proportion of European alleles and the prevalence of NIDDM: the more genetic admixture, the lower the risk of diabetes [27]. This relation also led to a spurious association between the haplotype and the disease that could have been easily misinterpreted if the presence of genetic admixture had been ignored [27].

Given the prominence of the HLA loci in disease-association studies, it is therefore important to explore their utility for determining the extent of European admixture in Native American populations. The present study and other work of the authors demonstrate that the HLA-A, HLA-B, and HLA-C loci, along with Gm, provide highly polymorphic genetic systems that are sensitive probes for Caucasian alleles in Native American populations [19, 20, 28]. To measure this gene flow Bernstein's method and its extensions are usually employed [20, 26, 29-31]. Bernstein's original formula for estimating the relative proportions of admixture from two parental populations while using one marker allele is

$P_H = m P_A + (1.0 - m) P_B$

where $P_H$, $P_A$, and $P_B$ are the frequencies of an allele or haplotype in the hybrid population and two parental groups, respectively, and $m$ is the proportion of alleles in the hybrid that came from population A [29]. Two assumptions underlie the method: (1) that all of the alleles in the hybrid come from either A or B by gene flow; (2) that the allele frequencies are known without error. The efficiency of a genetic marker for measuring $m$ depends upon the difference between $P_A$ and $P_B$, and is greatest when the allele or haplotype has a high frequency in the one parental group and is low in the other. This is the case for Gm 3;5,13,14; its frequency in European Americans is approximately 0.650 while it is extremely rare in "full blooded" Native Americans. Analogously, the HLA-A and HLA-B loci possess a large number of alleles that are "private" markers for European admixture in southwestern Native American populations, genes that are not present in nonadmixed American Indians but which are commonly found in European Americans (Table 6).

TABLE 6 Distribution of alleles at the HLA-A and HLA-B loci in Pimans, Hopi, Navajo, and Caucasians

Allele    Pimans    Navajo    Hopi    Caucasian

HLA-A locus
*A1       0         0         0       1
*A2       1         1         1       1
*A3       0         0         0       1
*A11      0         0         0       1
*A23      0         0         0       1
*A24      1         1         1       1
*A25      0         0         0       1
*A26      0         0         0       1
*A28      0         0         0       1
*A29      0         0         0       1
*A30      0         1         1       1
*A31      1         1         1       1
*A32      0         1         1       1
*Aw33     0         1         0       1

HLA-B locus
*B7       0         0         0       1
*B8       0         0         0       1
*B13      0         0         0       1
*B14      0         0         0       1
*B18      0         0         0       1
*BN21     1         1         1       0
*B27      1         1         1       1
*B35      1         1         1       1
*B37      0         0         0       1
*B38      0         0         0       1
*B39      1         1         1       1
*B44      0         0         0       1
*Bw48     1         1         1       0
*B49      0         0         0       1
*Bw50     0         0         0       1
*B51      1         1         1       1
*Bw52     0         0         0       1
*Bw55     0         0         0       1
*Bw56     0         0         0       1
*Bw57     0         0         0       1
*Bw58     0         0         0       1
*Bw60     1         1         1       1
*Bw61     1         1         1       1
*Bw62     0         1         1       1

1 = frequency ≥ 0.01; 0 = frequency < 0.01.

For instance, at the HLA-A locus, alleles *A1, *A3, *A11, *A23, *A25, *A26, *A28, and *A29 are found at a frequency greater than 0.01 in Caucasians but have a very low frequency or are absent in the Pimans, Hopi, and Navajo when the typed samples are restricted to "full blooded" Native
Americans [28]. The HLA-B locus is also a rich source of "private" marker alleles for European admixture (Table 6). For tracing Native American admixture in Caucasian populations, the antigens BN21 and Bw48 are good "private" markers because they are found in these American Indian tribes but are rare or absent in European Americans.

However, "private" genetic markers need not be used exclusively to estimate m in admixed populations. Multiallelic extensions of Bernstein's method have been developed with special techniques that account for the lack of independence of the HLA loci [26, 30, 31]. One was recently used to partition the admixture in Mexican Americans [26]. These techniques can simultaneously estimate m over all marker alleles and for more than two parental groups. However, given that the variation at the HLA loci in nonadmixed Native Americans is so restricted, when compared with Caucasians, it is very useful to be able to consult lists of "private" alleles to obtain a general idea of how admixed a Native American sample might be.

Contemporary Model for the Paleo-Indians
One hypothesis about the origin of humans in the New World, supported by linguistic, dental, and genetic evidence, is that three migrations occurred over a duration of 40,000 years [19, 32]. The oldest migration, 20,000 to 40,000 years ago, has been labeled Paleo-Indian. Descendants of these first Americans migrated throughout North, Central, and South America and constitute the vast majority of contemporary Native Americans. The second migration, 7,000 to 15,000 years ago, was that of the Na-Dene, who are represented in the Southwestern United States by the Apache and Navajo tribes. Aleut-Eskimo is the most recent migration, 3,000 to 5,000 years ago. Gm haplotype distributions, linguistics, and dental evidence assign the Pimans to the first, Paleo-Indian migration [19, 32].
Since the arrival of Europeans, Africans, Asians, and other ethnic groups to the New World, alleles from these populations have entered the genomes of Native American populations. To find a contemporary tribe that might approximate the original genetic distribution of Paleo-Indians, a control for genetic admixture is necessary. The distribution of Gm allotypes in the Pimans demonstrates that the stated-admixture variable is an efficient marker of genetic admixture in this population [19, 20]. Therefore, the "full blooded" Pimans, as measured by the stated-admixture variable, approximate a contemporary model for HLA class I variation in the Paleo-Indian.

Whether the HLA variation for contemporary Native Americans will further support the three migration hypothesis is still an open question. The similar distributions of the class I alleles in Pimans, Hopi, and Navajo, when controlled for genetic admixture, suggest that the serological epitopes as presently determined are incapable of discriminating three distinct genetic distributions [28]. Whether additional population studies that include both serological and DNA polymorphisms will help resolve the problem of Native American origins remains to be tested.
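Bernstein's formula from the Genetic Admixture section, P_H = m P_A + (1.0 - m) P_B, can be solved for m once the three frequencies are in hand: m = (P_H - P_B) / (P_A - P_B). The sketch below does only that. The function name `bernstein_m` is hypothetical; the parental frequency 0.650 for Gm 3;5,13,14 in European Americans is taken from the text, the value 0.0 for "full blooded" Native Americans is an assumed stand-in for "extremely rare", and the hybrid frequency 0.065 is invented for illustration.

```python
def bernstein_m(p_hybrid, p_a, p_b):
    """Proportion of alleles in the hybrid derived from parental population A.

    Assumes, as the method does, that all hybrid alleles come from A or B
    and that the three frequencies are known without error.
    """
    if p_a == p_b:
        raise ValueError("marker is uninformative: P_A equals P_B")
    return (p_hybrid - p_b) / (p_a - p_b)


# Population A = European Americans (0.650, from the text);
# population B = nonadmixed Native Americans (assumed 0.0);
# hybrid frequency 0.065 is a made-up example value.
m = bernstein_m(p_hybrid=0.065, p_a=0.650, p_b=0.0)
print(f"estimated European admixture m = {m:.3f}")  # 0.100
```

The denominator makes the efficiency argument concrete: the larger the difference between P_A and P_B, the less a given error in the hybrid frequency shifts the estimate of m.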
ACKNOWLEDGMENTS
We thank the members of the Gila River Indian Community for their cooperation and participation in the study, Dr. Donna Kostyu for her review of an early draft, Drs. William C. Knowler and David J. Pettitt for providing blood samples and for their helpful comments on the manuscript, Drs. Clifton Bogardus and Stephen Lillioja for providing blood samples, and Robin Medis, Rosalinda Partel, Susy Mathai, Lisa Welte, Lora Mastrangelo Nelson, Evelyn Tomines, Diana Candido, and Donna Swanson for their technical assistance. We especially express our gratitude to the management of Blood Systems, Inc., for its continuing support. This project was supported in part by BRSG 2 S07 RR07112, Division of Research Resources, National Institutes of Health.
REFERENCES
1. Bahr DM: Pima and Papago social organization. In Ortiz A (ed): Handbook of North American Indians, Volume 10, Southwest. Washington, D.C., Smithsonian Institution, 1983.
2. Bahr DM: Pima and Papago medicine and philosophy. In Ortiz A (ed): Handbook of North American Indians, Volume 10, Southwest. Washington, D.C., Smithsonian Institution, 1983.
3. Ezell PH: History of the Pima. In Ortiz A (ed): Handbook of North American Indians, Volume 10, Southwest. Washington, D.C., Smithsonian Institution, 1983.
4. Fontana BL: Pima and Papago: Introduction. In Ortiz A (ed): Handbook of North American Indians, Volume 10, Southwest. Washington, D.C., Smithsonian Institution, 1983.
5. Fontana BL: History of the Papago. In Ortiz A (ed): Handbook of North American Indians, Volume 10, Southwest. Washington, D.C., Smithsonian Institution, 1983.
6. Hackenberg RA: Pima and Papago ecological adaptations. In Ortiz A (ed): Handbook of North American Indians, Volume 10, Southwest. Washington, D.C., Smithsonian Institution, 1983.
7. Perkins HA, Payne RO, Kidd KK, Huestis DW: HL-A and GM typing of Papago Indians. In Dausset J, Colombani (eds): Histocompatibility Testing 1972. Copenhagen, Munksgaard, 1973.
8. Spees EK, Kostyu DD, Elston RC, Amos DB: HL-A profiles of the Pima Indians of Arizona. In Dausset J, Colombani (eds): Histocompatibility Testing 1972. Copenhagen, Munksgaard, 1973.
9. Williams RC, Knowler WC, Butler WJ, Pettitt DJ, Lisse JR, Bennett PH, Mann DL, Johnson AH, Terasaki PI: HLA-A2 and type 2 (insulin independent) diabetes mellitus in Pima Indians: an association of allele frequency with age. Diabetologia 21:460, 1981.
10. Kostyu DD, Stewart AD, Pettitt DJ, Knowler WC, Amos DB: Unusual HLA antigens in North American Pima Indian population. In Simons MJ, Tait BD (eds): Proceedings of the Second Asia and Oceania Histocompatibility Workshop Conference. Victoria, Australia, Immunopublishing, 1983.
11. Ensroth AF, Mann DL, Johnson AH, Knowler WC, Pettitt DJ, Bennett PH: HLA and B-lymphocyte alloantigens in Gila River Indians. Tissue Antigens 21:198, 1983.
12. Amos DB, Kostyu DD, Stewart AD, Ward FE: North American Indians: Pima Indians. In Aizawa M, Natori T, Wakisaka A, Konoeda Y (eds): Proceedings of the Third Asia-Oceania Histocompatibility Workshop and Conference. Sapporo, Hokkaido University Press, 1986.
13. Ward FE, Kostyu DD, Stewart AD, Amos DB: HLA antigens in the Pima Indians. In Aizawa M, Natori T, Wakisaka A, Konoeda Y (eds): Proceedings of the Third Asia-Oceania Histocompatibility Workshop and Conference. Sapporo, Hokkaido University Press, 1986.
14. Williams RC, Chen SN, Gill DK, Lane JT, McAuley JE, Strothman R, Mittal KK: Antigen society #6 report (B21, B49, Bw50, BN21, B12, B44, B45). In Dupont B (ed): Immunobiology of HLA, Volume I, Histocompatibility Testing 1987. New York, Springer-Verlag, 1989.
15. Knowler WC, Pettitt DJ, Bennett PH, Williams RC: Diabetes mellitus in the Pima Indians: genetic and evolutionary considerations. Am J Phys Anthropol 62:107, 1983.
16. Knowler WC, Pettitt DJ, Saad MF, Bennett PH: Diabetes mellitus in the Pima Indians: incidence, risk factors and pathogenesis. Diabetes Metab Rev 6:1, 1990.
17. Bennett PH, Burch TA, Miller M: Diabetes mellitus in American (Pima) Indians. Lancet 2:125, 1971.
18. Knowler WC, Bennett PH, Hamman RF, Miller M: Diabetes incidence and prevalence in Pima Indians: a 19-fold greater incidence than in Rochester, Minn. Am J Epidemiol 108:497, 1978.
19. Williams RC, Steinberg AG, Gershowitz H, Bennett PH, Knowler WC, Pettitt DJ, Butler W, Baird R, Dowda-Rea L, Burch TA, Morse HG, Smith CG: Gm allotypes in Native Americans: evidence for three distinct migrations across the Bering land bridge. Am J Phys Anthropol 66:1, 1985.
20. Williams RC, Steinberg AG, Knowler WC, Pettitt DJ: Gm 3;5,13,14 and stated-admixture: independent estimates of admixture in American Indians. Am J Hum Genet 39:409, 1986.
21. Hopkins KA: Basic microlymphocytotoxicity test. In Zachary AA, Teresi GA (eds): ASHI Lab Manual, 2nd Edition. American Society for Histocompatibility and Immunogenetics, 1990.
22. Li CC: First Course in Population Genetics. Pacific Grove, The Boxwood Press, 1976.
23. Albert ED, Mickey MR, McNicholas AC, Terasaki PI: Seven new HL-A specificities and their distribution in three races. In Terasaki PI (ed): Histocompatibility Testing 1970. Copenhagen, Munksgaard, 1970.
24. Baur MP, Danilovs JA: Population analysis of HLA-A, B, C, DR, and other genetic markers. In Terasaki PI (ed): Histocompatibility Testing 1980. Copenhagen, Munksgaard, 1980.
25. Baur MP, Neugebauer M, Albert ED: Reference tables of two-locus haplotype frequencies for all MHC marker loci. In Albert ED, Baur MP, Mayr WR (eds): Histocompatibility Testing 1984. Berlin, Springer-Verlag, 1984.
26. Long JC, Williams RC, McAuley JE, Medis R, Partel R, Tregellas WM, South SF, Rea AE, McCormick SB, Iwaniec U: Genetic variation in Arizona Mexican Americans: estimation and interpretation of admixture proportions. Am J Phys Anthropol 84:141, 1991.
27. Knowler WC, Williams RC, Pettitt DJ, Steinberg AG: Gm 3;5,13,14 and type 2 diabetes mellitus: an association in American Indians with genetic admixture. Am J Hum Genet 43:520, 1988.
28. Williams RC, Morse HG, Bonnell MD, Rate RG, Kuberski TT: The HLA loci of the Hopi and Navajo. Am J Phys Anthropol 56:291, 1981.
29. Bernstein F: Comitato Italiano per lo Studio dei Problemi della Popolazione. Rome, Istituto Poligrafico dello Stato.
30. Long JC, Smouse PE: Intertribal gene flow between the Ye'cuana and Yanomama: genetic analysis of an admixed village. Am J Phys Anthropol 61:411, 1983.
31. Long JC: Admixture, genetic drift and the structure of hybrid populations. Genetics 127:417, 1991.
32. Greenberg JH, Turner CG, Zegura SL: The settlement of the Americas: a comparison of the linguistic, dental, and genetic evidence. Current Anthropol 27:477, 1986.