Am. J. Hum. Genet. 60:166-173, 1997

Genomewide Search for Genes Influencing Percent Body Fat in Pima Indians: Suggestive Linkage at Chromosome 11q21-q22

R. A. Norman,1 D. B. Thompson,1 T. Foroud,2 W. T. Garvey,3 P. H. Bennett,1 C. Bogardus,1 E. Ravussin,1 and other members of the Pima Diabetes Gene Group*

1Clinical Diabetes and Nutrition Section, National Institute of Diabetes, Digestive, and Kidney Diseases, National Institutes of Health, Phoenix; 2Department of Medical and Molecular Genetics, Indiana University, Indianapolis; and 3Division of Endocrinology, Diabetes, and Medical Genetics, Medical University of South Carolina, Charleston

Summary

On the basis of accumulating evidence that obesity has a substantial genetic component, a genomewide search for linkages of DNA markers to percent body fat is ongoing in Pima Indians, a population with a very high prevalence of obesity. An initial screen of the genome (>600 markers in 874 individuals) has been completed using highly polymorphic markers (mean heterozygosity = .67). Reported here are the sib-pair linkage results for percent body fat (277 siblings), the best available indicator of overall obesity. Single-marker linkages to percent body fat were evaluated by sib-pair analysis for quantitative traits. From these analyses, the best evidence of genes influencing body fat came from markers at chromosome 11q21-q22 and 3p24.2-p22 (P = .001; LOD = 2.0). Regions flanking these markers were further investigated by multipoint linkage. The evidence for linkage at 11q21-q22 increased to P = .0002 (LOD = 2.8), peaking between markers D11S2000 and D11S2366. Evidence for linkage at 3p24.2-p22 did not change. No association was detected for any marker in the region. Although several genes are known in the 11q21-q22 region, none have been implicated as candidate genes for obesity.

Received June 14, 1996; accepted for publication October 16, 1996.
Address for correspondence and reprints: Dr. R. A. Norman, Room 541, 4212 North 16th Street, CDNS-NIH, Phoenix, AZ 85016-5319. E-mail: [email protected]
*Claire Allan, Leslie Baier, Donald Bowden, Robert Hanson, William Knowler, Sayuko Kobes, David Pettitt, and Michal Prochazka.
© 1997 by The American Society of Human Genetics. All rights reserved. 0002-9297/97/6001-0023$02.00

Introduction

A growing body of evidence indicates an important role for genetic factors in the development of obesity. Such studies of obesity depend mostly on measurements of body mass index (BMI = weight/height²). Although flawed as a measure of total body fat (Garn et al. 1986), BMI is commonly used as an index of fatness or excess weight for height. In humans, estimates of broad-sense heritability (H²) are best provided by twin studies. Such studies have generally produced large and variable estimates for the heritability of BMI. For example, a study of MZ twins reared together and DZ twins reared together and apart yielded an H² of .70 for men and .67 for women, with a substantial portion of genetic variation being due to dominance for excess BMI (Stunkard et al. 1990). The H² of male twins who were inducted into the U.S. Army during World War II was .71 for maximum BMI (Fabsitz et al. 1994), with additive genetic effects accounting for 59% of the variance (Carmelli et al. 1994). H² ranged from .46 to .61 in Danish male twins (increasing with age) and from .75 to .77 in females (Herskind et al. 1996). Again, the genetic effect was mainly additive. A better indicator of overall obesity is percent total body fat. Because measuring it requires special equipment and is inconvenient for patients, far fewer data are available, and large twin studies have not been done. However, a Canadian population study of the transmission of percent body fat among relatives found 25% of the variation to be due to an additive genetic effect (Bouchard et al. 1988). In the same study, only 5% of the variation in BMI was genetic. Among Pima Indians, the estimate of percent body fat heritability based on correlations among relatives was 0.76 (H. Sakul, personal communication).

Although these studies have indicated a genetic basis for obesity, none provide information as to the number of genes or the frequencies of alleles that contribute to this phenotype. Recent complex segregation analyses of obesity phenotypes have fostered the view that alleles at major genes may be responsible for much of the genetic portion of obesity variation (Borecki et al. 1993, 1995; Rice et al. 1993; Comuzzie et al. 1995). Linkage models dependent on the outcome of such analyses, however, may be too restrictive, given the complex and poorly understood nature of the obesity phenotype. Although generally less powerful, model-independent approaches to linkage analysis offer the hope of uncovering important loci, regardless of gene action or mode of inheritance. A commonly used method is the analysis of

Norman et al.: Percent Body Fat in Pima Indians

differences between siblings, which has been adapted for two-point linkage analysis of a marker to quantitative trait variation (Haseman and Elston 1972) and recently extended to multipoint analysis (Fulker and Cardon 1994; Olson 1995; Kruglyak and Lander 1995).

Obesity and non-insulin-dependent diabetes mellitus (NIDDM) are exceedingly prevalent in the Pima Indians of the southwestern United States (Knowler et al. 1991). Data collected on a sample of Pima Indians who were followed over time for the development of NIDDM are consistent with the view that obesity, insulin resistance, and low acute insulin response all predict the development of NIDDM (Lillioja et al. 1993). Thus, genetic variants underlying these phenotypes might contribute to the liability for developing NIDDM. A genomewide search for putative genes related to these characteristics in Pima Indian families has been undertaken using highly polymorphic microsatellites. In this paper, results of sib-pair linkage analyses are examined for evidence of major genes that influence percent body fat.

Subjects, Material, and Methods

Subjects and Phenotypes

Genotypes of polymorphic markers were determined using DNA collected from 874 individuals informative for NIDDM and obesity. Subjects are members of the Gila River Indian Community who participate in a prospective study on the development of NIDDM and its complications (Bennett et al. 1971). Within this group, measures of body composition were available from 303 individuals: 277 siblings and 26 parents. For linkage analyses, this sample provided a phenotypic database of 283 sibling pairs from 88 nuclear families. Average sibship size was small: >80% of these families had just two or three siblings. Marker data were available from both parents in 25 families, from one parent in 46 families, and from neither parent in 17 families.
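The two-point Haseman-Elston test cited above regresses the squared trait difference of each sib pair on the estimated proportion of marker alleles the pair shares identical by descent (IBD). Under linkage, pairs that share more alleles IBD tend to be more alike, so the regression slope is negative. The sketch below is a minimal illustration using ordinary least squares, assuming the IBD estimates are already available; it is not the Sibpal implementation used in this study, which among other things uses reduced degrees of freedom for non-independent pairs. The function name `haseman_elston` and the toy data are hypothetical.

```python
import numpy as np


def haseman_elston(sq_diff, pi_hat):
    """Two-point Haseman-Elston regression (illustrative sketch).

    sq_diff -- squared trait difference (x1 - x2)**2 for each sib pair
    pi_hat  -- estimated proportion of marker alleles the pair shares IBD
               (0, 0.5 or 1, or an expected value when IBD is uncertain)

    Returns (slope, intercept, t) from ordinary least squares. A clearly
    negative slope suggests linkage: pairs sharing more alleles IBD at the
    marker have more similar trait values.
    """
    y = np.asarray(sq_diff, dtype=float)
    x = np.asarray(pi_hat, dtype=float)
    n = y.size
    x_c = x - x.mean()
    sxx = np.sum(x_c ** 2)
    slope = np.sum(x_c * (y - y.mean())) / sxx
    intercept = y.mean() - slope * x.mean()
    resid = y - (intercept + slope * x)
    # Plain OLS uses n - 2 df; Sibpal instead reduces the df to allow for
    # non-independent pairs drawn from the same sibship.
    s2 = np.sum(resid ** 2) / (n - 2)
    t = slope / np.sqrt(s2 / sxx)
    return slope, intercept, t


# Toy data (hypothetical): squared differences shrink as IBD sharing grows.
slope, intercept, t = haseman_elston([10, 5, 1, 8, 5, 1], [0, 0.5, 1, 0, 0.5, 1])
print(slope, intercept, t)  # slope -8.0, intercept 9.0, t about -11.3
```

In practice `sq_diff` would be computed from the adjusted %fat residuals of the two siblings in each pair, and `pi_hat` from the marker genotypes of the family.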
Body composition was determined by hydrostatic weighing with simultaneous measurement of lung residual volume, and percent body fat was calculated from body density by the Siri equation (Siri 1961). To allow comparisons among siblings differing in age and sex, percent body fat was adjusted by multiple linear regression for the significant (P < .05) effects of age and sex. The data were examined for normality and outliers before and after adjustment. Plots of residuals, studentized residuals, and Cook's D statistic (Cook 1979) were examined in various leverage plots of the data; according to these plots, the data were judged free of outliers. The residual value (%fat) was then used in linkage calculations. Although many subjects were examined several times over the course of several years, a single value of %fat was selected for linkage studies. To best represent an individual's propensity toward obesity, this %fat value, measured by underwater weighing, was taken at the maximum body weight. None of the individuals was diabetic at the time of measurement. The physical characteristics of the sibling group are given in table 1. All studies were approved by the National Institute of Diabetes, Digestive and Kidney Diseases and the Tribal Council of the Gila River Community.

Genetic Markers and Linkage Analysis

A large set of microsatellite markers was typed by the Marshfield Medical Research Foundation, Marshfield, WI (Dubovsky et al. 1995). To this set were added markers previously typed in our laboratories in genetic studies targeting candidate genes. Combined, the markers totaled 674. Although somewhat unevenly distributed, the markers cover all chromosomes, with an average separation of 8.3 cM; no two markers are separated by >20 cM. Markers were genotyped by PCR with standard techniques, using either fluorescent- or radioactive-labeled primers (Weber et al. 1993; Schwengel et al. 1994). In the absence of fully informative parental genotypes, marker allele frequencies were estimated in a subgroup of 207 Pima Indians selected such that none were first-degree relatives. In some cases, a rare allele was found segregating in one or two families but was not found in

the sample of first-degree unrelated subjects. Such alleles were assigned frequencies of 0.5%.

Table 1
Phenotypic Characteristics of Siblings Analyzed for Linkage with %Fat

                       Mean    SD      Range
FEMALES (n = 107)
  %Fat                 46      7       25-60
  BMI                  37.4    9.7     18.7-78.4
  Age (years)          29.2    6.4     19.6-50.9
MALES (n = 170)
  %Fat                 34      9       11-55
  BMI                  35.2    10.0    10.0-84.3
  Age (years)          28.8    7.2     18.1-48.9

Sib-pair linkage analysis was performed with the Sibpal program (version 2.62) (SAGE 1994). To compensate for the nonindependence of siblings within sibships, Sibpal calculates P values using reduced df's. In our data, composed solely of nuclear families, the df is the number of siblings in each family minus 1, summed over all families. For large samples, this modification has been shown by simulation to remove the excess of false positives that arises when the total number of sib pairs is used as the df (Wilson and Elston 1993).

The map positions of all markers in the vicinity of D11S900 and D3S2432 were determined from information provided by the Marshfield Medical Research Foundation (D. Vaske, personal communication) and from maps in the Genome Data Base (GDB). Additional information on chromosome 11 markers and genes was taken from the radiation hybrid map (van Heyningen and Little 1995) and from updates available at the McDermott Center Genome Center (http://mcdermott.swmed.edu), San Antonio, TX. Consensus maps around these markers were derived and used in a multipoint linkage analysis of %fat with the MAPMAKER/SIBS 1.0 computer program (Kruglyak and Lander 1995). Before linkage analysis, nuclear families were examined for non-Mendelian inheritance with Pedmanager 0.9 (Whitehead Institute for Biomedical Research/MIT Center for Genome Research). Eight families with such inconsistencies were detected, and the offending marker was set to missing for all members of the family; no nuclear family violated Mendelian segregation at more than one marker. Program parameters were set to parallel those of the two-point sib-pair linkage analysis used in the genome scan; i.e., %fat was analyzed as a quantitative trait using all available sib pairs. The default map interval of 1.0 cM was used for identity-by-descent (IBD) calculations. To reduce the computing time needed by MAPMAKER/SIBS, 5 siblings were removed so that no family had more than 6 siblings; this removed 21 of the original 283 sib pairs. The results of Haseman-Elston regression on expected IBD, regression on estimated IBD, and maximum-likelihood variance estimates were examined. P values were converted to LOD scores, and vice versa, by the relation (Lander and Kruglyak 1995)

$P = \Pr\left[\chi^2_{1} \ge 2\ln(10) \times \mathrm{LOD}\right]$,

where $\chi^2_{1}$ is a χ² variable with 1 df; equivalently, LOD = χ²/(2 ln 10) ≈ χ²/4.61.
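The P-to-LOD relation can be checked numerically with the Python standard library alone. The function names below are hypothetical, and the `one_sided` option is our assumption rather than something the text states: halving the χ² tail (a one-sided test, natural here because only a negative Haseman-Elston slope indicates linkage) gives P ≈ .0012 for LOD 2.0 and P ≈ .0002 for LOD 2.8, which matches the pairs quoted in the Summary more closely than the two-sided values (≈ .0024 and ≈ .0003).

```python
import math
from statistics import NormalDist

TWO_LN10 = 2 * math.log(10)  # chi-square (1 df) = 2 ln(10) * LOD, about 4.605 * LOD


def lod_to_p(lod, one_sided=False):
    """Pointwise P value for a non-negative LOD score, via a 1-df chi-square."""
    chi2 = TWO_LN10 * lod
    # Upper tail of chi-square with 1 df: P(Z**2 > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return p / 2 if one_sided else p


def p_to_lod(p, one_sided=False):
    """Inverse of lod_to_p: LOD score corresponding to a pointwise P value."""
    tail = p if one_sided else p / 2
    z = NormalDist().inv_cdf(1 - tail)  # a 1-df chi-square is z**2
    return z * z / TWO_LN10


for lod in (2.0, 2.2, 2.8, 3.6):
    print(lod, lod_to_p(lod), lod_to_p(lod, one_sided=True))
```

Working through the 1-df χ² (rather than calling a dedicated χ² routine) keeps the sketch dependency-free, since a 1-df χ² is the square of a standard normal deviate.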

Results

Two-Point Evidence for Linkage

Markers for which single-marker sib-pair linkage tests to %fat attained P ≤ .05 are listed in table 2. Three of these markers were significant at P ≤ .001 (LOD > 2.0). Two of them are located at chromosome 11q21-q22 (D11S900 and D11S2366), and the third, D3S2432, is at 3p24.2-p22. D11S900 maps 8-10 cM from dopamine receptor-2 (DRD2), a gene that has been associated with obesity (Comings et al. 1993). Additional markers in the 11q21-q22 region (D11S876, D11S2000, and D11S897) were typed and analyzed for linkage. Sib-pair linkage results for markers mapping within 25 cM of D11S900 are illustrated in figure 1. The best evidence of linkage occurs in a ~10-cM interval bordered by GATA35 and D11S2366, with D11S900, D11S2000, and D11S2366 yielding the smallest P values. Our results do not support DRD2 as contributing to this observed linkage: DRD2 is located distal to D11S1986 and D11S897, which are not linked to %fat. In addition, a cluster of apolipoprotein genes (APOC3, APOA1, and APOA4) is located farther distal to DRD2 and thus is not in the region of linkage to %fat. Additional markers distal to D11S897 show no evidence of linkage to %fat (data not shown). Similar results were obtained when percent body fat was adjusted for age separately in males and females (data not shown).

Evidence for linkage to obesity was also sought through analysis of the maximum recorded BMI, adjusted for sex, age, and date of birth. Excluding measurements taken on subjects diagnosed with NIDDM left 562 siblings for analysis. Despite this large sample, no two-point linkage significant at P < .01 was found for any of the 11q21-q22 markers linked to %fat (data not shown). Lesser evidence (P < .05) of linkage to BMI, however, was seen at D11S2366 and D11S2000.

Multipoint Linkage Analysis

Multipoint linkage analysis for 11q21-q22 markers is presented in figure 2. Three types of analysis obtained from MAPMAKER/SIBS are plotted: Haseman-Elston regression on the expected value of IBD (HE), regression on the actual distribution of IBD (EM), and variance-component estimation (VC). All three methods gave virtually identical results; peak LOD scores (2.8 for HE, 2.7 for EM, and 2.6 for VC) occur between markers D11S2000 and D11S2366. The improved statistical power of the Kruglyak-Lander multipoint approach over single-marker analysis was evident from an increase in LOD score of 0.8. The nonparametric multipoint method, which ranks sib pairs according to phenotypic differences, gave a similar result, with a LOD score of 2.4 (data not shown).

The sensitivity of this result to estimates of allele frequencies and map distances was investigated by altering the values and rerunning the linkage analysis. In particular, setting the minimum allele frequency to 3.0% and adjusting the other allele frequencies accordingly did not appreciably alter the results: LOD scores were generally reduced by 0.1-0.2 units. Altering the map distances by 1-2 cM also lowered the LOD scores by 0.1-0.2 units.
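The allele-frequency sensitivity check described above raises rare alleles to a minimum frequency and adjusts the rest. The text does not say how the other frequencies were adjusted; the sketch below assumes proportional rescaling of the unfloored alleles so the frequencies still sum to 1, repeating the step if rescaling pushes another allele under the floor. The function name and the example marker are hypothetical.

```python
def floor_allele_freqs(freqs, min_freq=0.03):
    """Raise every allele frequency to at least `min_freq` and rescale the
    remaining alleles proportionally so the frequencies still sum to 1.

    freqs -- dict mapping allele label -> frequency (need not sum exactly to 1)
    """
    total = sum(freqs.values())
    base = {allele: f / total for allele, f in freqs.items()}
    if min_freq * len(base) > 1:
        raise ValueError("min_freq is too large for this number of alleles")
    floored = set()
    while True:
        free = [a for a in base if a not in floored]
        budget = 1.0 - min_freq * len(floored)  # mass left for unfloored alleles
        free_mass = sum(base[a] for a in free)
        out = {a: min_freq for a in floored}
        out.update({a: base[a] * budget / free_mass for a in free})
        # Rescaling can push another allele under the floor; repeat until stable.
        newly_low = {a for a in free if out[a] < min_freq}
        if not newly_low:
            return out
        floored |= newly_low


# Hypothetical marker with a rare allele first assigned 0.5%, as in the main analysis.
print(floor_allele_freqs({"a1": 0.005, "a2": 0.495, "a3": 0.5}))
```

Proportional rescaling keeps the relative frequencies of the common alleles unchanged, so only the rare alleles' contribution to IBD estimation is altered.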


Table 2
All Markers that Attained a Single-Test Linkage Significance of ≤ .05 with %Fat

Chromosome   Marker         Pᵃ
1            D1S3470        .0225
1            D1S1646        .0499
1            D1S1677        .0148
1            D1S305         .0187
1            D1S595         .0224
2            D2S1363        .0387
2            D2S1394        .0030
2            D2S172         .0123
2            D2S427         .0302
3            D3S1280        .0259
3            D3S1306        .0385
3            D3S2432        .0015ᵇ
3            D3S3038        .0388
4            D4S2397        .0146
4            D4S3253        .0154
4            D4S403         .0463
5            D5S1716        .0241
6            TNF(IR2/4)     .0120ᶜ
8            D8S261         .0246
11           D11S2000       .0028
11           D11S2366       <.001ᵇ
11           D11S2016       .0351
11           D11S2371       .0163
11           D11S900        .0012ᵇ
11           D11S988        .0365
11           (GATA35)       .0219
13           D13S232        .0473
13           D13S889        .0344
14           D14S1280       .0131
14           D14S297        .0335
15           D15S207        .0077
16           D16S266        .0221
17           D17S579        .0230
17           D17S784        .0463
18           D18S844        .0075
18           D18S877        .0057
18           D18S70         .0089
19           D19S1036       .0469
20           D20S103        .0185
20           D20S118        .0393
21           D21S1 12       .0303

ᵃ Degrees of freedom range from 143 to 152.
ᵇ Markers with a P value ≤ .001.
ᶜ Marker at obesity candidate gene TNFα.
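The reduced degrees of freedom referred to in footnote a follow the Sibpal rule given in the Methods: siblings per family minus 1, summed over families, rather than the number of sib pairs. The sketch below contrasts the two counts and also shows how trimming large sibships (as was done for the MAPMAKER/SIBS run) removes more pairs than siblings. The function names and family sizes are hypothetical, and the sketch does not attempt to reproduce the marker-specific df values in the footnote.

```python
def sib_pair_counts(sibship_sizes):
    """Number of sib pairs and the reduced df described in the Methods
    (siblings per family minus 1, summed over families)."""
    sizes = list(sibship_sizes)
    n_pairs = sum(n * (n - 1) // 2 for n in sizes)  # every pair within a sibship
    reduced_df = sum(n - 1 for n in sizes)
    return n_pairs, reduced_df


def pairs_lost_by_capping(sibship_sizes, cap):
    """Sib pairs lost when each sibship is trimmed to at most `cap` siblings."""
    sizes = list(sibship_sizes)
    before, _ = sib_pair_counts(sizes)
    after, _ = sib_pair_counts(min(n, cap) for n in sizes)
    return before - after


# Hypothetical families of 2, 3 and 7 siblings.
print(sib_pair_counts([2, 3, 7]))           # (25, 9)
print(pairs_lost_by_capping([2, 3, 7], 6))  # 6
```

A sibship of n siblings contributes n(n - 1)/2 pairs but only n - 1 df, which is why treating every pair as independent overstates the available information when sibships are large.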

The Issue of Statistical Significance

Some authors have suggested that false-positive rates of linkage analyses may be better estimated from simulations, given the untested assumptions involved in an analytical approach (Lander and Schork 1994; Weeks and Lathrop 1995). To verify that the false-positive rate of single-marker linkage tests was not excessive, empirical P values were calculated through a permutation analysis of the %fat variable. A random-number generator was used to reassort values of %fat among siblings, and two-point sib-pair regressions were then calculated and tested for linkage at 5 of the 10 markers shown in figure 2. A total of 36,450 permutations were analyzed, 7,290 per marker. The numbers of nominally significant linkages at P values of .05, .01, and .001 are given in table 3. They agree closely with the expected numbers, e.g., 43 observed versus 36.5 expected at P = .001. In addition, the marker D11S1396, with the highest number of "false" positives, was not linked in the analysis of the true data.

Discussion

Compared with Caucasians, the Pima Indians of Arizona have a high prevalence of obesity and NIDDM. A popular speculation proposes that this difference is due to the genetic adaptation of ancestral peoples to cycles of feast and famine (Neel 1966), such as those experienced by the desert-dwelling Pimas. According to this view, the Pimas have a different phenotypic norm of reaction than do most western Europeans, and their exposure to a "western" environment, in diet and physical activity, results in an increased prevalence of obesity and diabetes. This view does not require that susceptible populations such as the Pimas harbor unique alleles; the adaptation could result from differences in allele frequencies. Compared with age-matched adults of the general U.S. population, Pimas have a higher average BMI (Knowler et al. 1991), whereas the interindividual variation in BMI among the Pima is as great as the variation among U.S. residents (Kuczmarski et al. 1994), Swedes (Kuskowska-Wolk and Bergstrom 1993a, 1993b), or Dutch (Seidell et al. 1996). This observation suggests that alleles contributing to obesity might be more frequent in Pima than in Caucasians. The two chromosome regions with the highest level of significance for linkage to %fat in Pima Indians were near D11S2366 and D3S2432.
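The permutation check described under The Issue of Statistical Significance can be sketched as follows. The expected count quoted there is simply permutations × α: 36,450 × .001 ≈ 36.5. The sketch makes several assumptions not stated in the text: trait values are shuffled across all siblings (swapping values within a pair would leave squared differences unchanged), the statistic is the plain Haseman-Elston slope rather than Sibpal's t test, and an empirical one-sided P value is reported instead of a count of nominal hits. All names and the simulated data are hypothetical.

```python
import numpy as np


def he_slope(trait, pairs, pi_hat):
    """Haseman-Elston slope of squared sib differences on IBD sharing."""
    y = (trait[pairs[:, 0]] - trait[pairs[:, 1]]) ** 2
    x = pi_hat - pi_hat.mean()
    return np.sum(x * (y - y.mean())) / np.sum(x ** 2)


def permutation_p(trait, pairs, pi_hat, n_perm=999, seed=0):
    """One-sided empirical P value: fraction of permuted data sets whose
    slope is at least as negative as the observed slope (plus-one rule)."""
    rng = np.random.default_rng(seed)
    observed = he_slope(trait, pairs, pi_hat)
    hits = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(trait)  # reassort trait values among siblings
        if he_slope(shuffled, pairs, pi_hat) <= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Simulated (hypothetical) data: 200 independent sib pairs whose trait
# difference is larger when they share fewer marker alleles IBD.
rng = np.random.default_rng(1)
n_pairs = 200
pi_hat = rng.choice([0.0, 0.5, 1.0], size=n_pairs, p=[0.25, 0.5, 0.25])
shared = rng.normal(size=n_pairs)
diff = rng.normal(scale=1 + 2 * (1 - pi_hat))
trait = np.empty(2 * n_pairs)
trait[0::2] = shared + diff / 2
trait[1::2] = shared - diff / 2
pairs = np.arange(2 * n_pairs).reshape(-1, 2)

p_emp = permutation_p(trait, pairs, pi_hat)
print(p_emp)
```

Because permutation keeps the family structure and IBD estimates fixed while breaking any trait-marker association, the permuted statistics approximate the null distribution without relying on the t-test assumptions debated in the Discussion.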
Applying the recently suggested criteria (Thomson 1994; Lander and Kruglyak 1995) for evaluating the significance of sib-pair linkage results gathered in a genome scan, the two-point LOD scores of 2.0 at 11q21-q22 and 3p24.2-p22 fell just short of the proposed "suggestive" significance level of 2.2. Given that the average spacing between markers in this genome scan is ~10 cM, the expected number of false-positive linkages at LOD = 2.0 in a genome scan is less than one (from Lander and Kruglyak 1995, fig. 1). Maximizing the available inheritance information through multipoint linkage analysis increased the significance of the linkage to %fat at chromosome 11q21-q22 to a maximum LOD of 2.8, at a peak between markers D11S2000 and D11S2366. Because we applied a denser map for multipoint analysis of this region, the expected false-positive rate is better estimated by assuming an infinitely dense map. On the assumption of such a map, a LOD score of 2.8 would have a false-positive rate of 0.3.

Figure 1 Bar-plot of single-marker sib-pair linkage results (t-values) for %fat juxtaposed on a genetic map of markers in chromosome 11q21-q22. Dashed lines are corresponding single-test P values from Sibpal of SAGE.

Multipoint sib-pair linkage analysis was also applied to the region around D3S2432 (3p24.2-p22), after appropriate genetic maps were constructed for all available flanking markers. In this case, multipoint results did not strengthen the evidence for linkage (data not shown). The maximum HE LOD score reached 1.9 (2.0 for HE and 1.8 for EM) in a 12-cM span between D3S2432 and D3S3038. (The two-point LOD score for the latter marker was only 0.5.) At present, no obvious candidate genes for obesity are found in this region. It is noteworthy, however, that a mouse quantitative trait locus affecting mainly mesenteric fat mass is located on mouse chromosome 9, in a region syntenic to human 3p21 (West et al. 1994). The authors of that study point out the presence of two candidate genes in this region of mouse chromosome 9: the α-inhibitory G-protein subunit (Gnai-2) and the growth-hormone receptor (Ghr).

Treating sib-pair differences as independent is controversial. Some have argued that, although not truly independent, sib pairs behave as independent, especially in large samples containing many nuclear families, and that the dependence has no effect on the asymptotic distribution of the test (R. Elston, personal communication). The slight bias in smaller samples is likewise eliminated in simulations by the use of more conservative df's, which are now employed in Sibpal 2.2 and later versions (Wilson and Elston 1993). Others, however, have argued that squared sib-pair phenotypic differences cannot be properly evaluated by standard t tests and that treating them as such could inflate the number of false positives (Kruglyak and Lander 1995). Simulations with our data, which consist of a large number of families with a small average sibship size, suggest that the single-test P values obtained from Sibpal are appropriate for this study.

The linkage of %fat at 11q21-q22 stands out from the other linkages and represents the best evidence for a major gene contributing to the variation in body fat among Pima Indians. The result offers suggestive evidence for the presence of a gene affecting obesity in this region. To attain a LOD score at the "significant" level of ≥3.6, however, such a gene would most likely have to have a very large effect. This level of significance is not likely to be achieved, given the sample of