Am. J. Trop. Med. Hyg., 65(6), 2001, pp. 754–758 Copyright © 2001 by The American Society of Tropical Medicine and Hygiene

GENOME SEARCH FOR ADDITIONAL HUMAN LOCI CONTROLLING INFECTION LEVELS BY SCHISTOSOMA MANSONI

ANNE ZINN-JUSTIN, SANDRINE MARQUET, DOMINIQUE HILLAIRE, ALAIN DESSEIN, AND LAURENT ABEL

Mathematical and Statistical Modeling in Biology and Medicine, Institut National de la Santé et de la Recherche Médicale (INSERM) U.436, Faculté de Médecine Pitié-Salpêtrière, Paris, France; Immunology and Genetic of Parasitic Diseases/Laboratory of Parasitology-Mycology, INSERM U.399, Faculté de Médecine, Marseille, France; Laboratory of Human Genetics of Infectious Diseases, INSERM U.550, Faculté de Médecine Necker, Paris, France

Abstract. Schistosomiasis is a major public health problem in many developing countries. Previous studies have shown that infection levels by Schistosoma mansoni in a Brazilian population are controlled by a major gene, denoted as SM1, which was mapped to chromosome 5q31-q33 by use of a model-based (logarithm of the odds [lod] score) analysis method. The present study is an autosome-wide scan searching for additional human loci implicated in the regulation of S. mansoni infection intensities. The weighted pairwise correlation model-free linkage method was used in order to consider large pedigrees and to conduct a 2-locus analysis (i.e., to search for a second locus taking into account linkage to 5q31-q33). The most significant linkage results were again obtained in the 5q31-q33 region. Two additional regions provided linkage results with significance levels around 0.001, 1p21-q23 (results independent of 5q31-q33) and 6p21-q21 (results in interaction with 5q31-q33). The investigation of these regions, which contain some candidate genes, is ongoing in other populations to confirm the role of these regions.

INTRODUCTION

Schistosomiasis affects > 200 million people and is a major public health problem in many developing countries.1 The disease is spreading with the implementation of irrigation schemes in developing countries and is caused by eggs laid by schistosome worms. The larvae of Schistosoma mansoni, the most common schistosome species, penetrate the skin of humans in contact with infested waters. Within a few weeks, the parasites migrate to the portal and mesenteric veins of their host, where they mature into either male or egg-laying female worms. Although most eggs pass into the intestine, a number of them are taken by the portal blood flow to the liver, where they are stopped in small vessels, causing inflammatory granuloma. In certain people, granulomas lead to extended periportal fibrosis and portal hypertension, which cause the severe clinical features of schistosomiasis.

In a large genetic epidemiology survey conducted in a Brazilian village, we have shown that a major locus controls the levels of infection by S. mansoni.2,3 This was done by means of a model-based analysis strategy combining 2 successive steps. In a first step, segregation analysis provided evidence for a codominant major gene, denoted as SM1, taking into account other risk factors for infection (water contact levels, age, and sex).2 Under this major gene model (SM1 model), ~3% of the population are homozygous and predisposed to very high infection levels, 68% are homozygous and resistant, and 29% are heterozygous with an intermediate level of resistance. In a second step, a genome screen was carried out by means of a classical (model-based) lod-score (the decimal logarithm of the likelihood ratio for linkage versus non-linkage) analysis that is based on the SM1 model, and SM1 was mapped to chromosome 5q31-q33.3 The 5q31-q33 region contains several candidate genes encoding immunologic molecules that had been shown to play important roles in human protection against schistosomes, and the presence of a locus influencing S. mansoni infection levels on chromosome 5q31-q33 was further replicated in a Senegalese population.4

In our Brazilian population, no other chromosomal regions provided significant evidence of linkage on the basis of lod-score values computed under the SM1 model.5 However, other loci that may influence S. mansoni infection levels in addition to SM1 are likely to have a mode of action different from that of SM1. Therefore, the identification of additional putative loci requires the use of model-free linkage methods, which do not need to specify the precise genetic mode of action. In the present study, we used the weighted pairwise correlation (WPC) model-free linkage method6 to investigate the role of other loci influencing S. mansoni infection levels in the Brazilian pedigrees we assessed. The WPC approach is especially appropriate to study these extended-family data because it considers all pairs of relatives in large pedigrees. Furthermore, we recently developed extensions of the WPC method,7,8 which allow us to perform 2-point and multipoint linkage analysis that is based on identical by descent (IBD) information and to consider the role of 2 unlinked loci (2-locus analysis) influencing the phenotype. This latter extension was especially useful to search for a second locus influencing infection levels, taking into account the known linkage of SM1 to chromosome 5q31-q33.
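The lod score used in the earlier model-based analysis is the decimal logarithm of the likelihood ratio for linkage versus non-linkage, and this article quotes asymptotic P values for such scores without spelling out the conversion. A minimal sketch of one common convention follows; it is an assumption and not necessarily the computation the authors used. Under that convention, 2 ln(10) × lod is referred to a chi-square distribution with 1 degree of freedom and the tail probability is halved because the test is one-sided. The function name `lod_to_asymptotic_p` and the lod values are hypothetical.

```python
import math


def lod_to_asymptotic_p(lod: float) -> float:
    """One-sided asymptotic P value for a lod score.

    Assumed convention (not taken from the article): 2*ln(10)*lod follows a
    50:50 mixture of a point mass at 0 and a chi-square with 1 degree of
    freedom, so P = 0.5 * P(chi2_1 >= 2*ln(10)*lod) for lod > 0.
    """
    if lod <= 0.0:
        return 1.0
    chi2 = 2.0 * math.log(10.0) * lod      # likelihood-ratio statistic
    z = math.sqrt(chi2)                    # equivalent standard normal deviate
    # Upper tail of the standard normal: Q(z) = 0.5 * erfc(z / sqrt(2)),
    # which equals half of the chi-square (1 df) tail probability at z**2.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Illustrative lod values only; they are not results from the study.
for lod in (1.0, 3.0):
    print(f"lod = {lod:.1f} -> asymptotic P ~ {lod_to_asymptotic_p(lod):.2e}")
```

The complementary error function from the standard library is used so that the sketch needs no external statistics package.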

MATERIAL AND METHODS

Study area and familial data. The study subjects lived in a small village, Caatinga do Moura, in the northeast of Brazil, where S. mansoni is endemic. The measurements of individual exposure to infection and the determination of the phenotypes measuring susceptibility to S. mansoni infection have been extensively described elsewhere.2,9 In brief, infection intensities were measured before any treatment by individual fecal egg counts, expressed in eggs per gram of feces and denoted as E1 values. These E1 values were log transformed and adjusted for risk factors of infection (water contact levels, age, and sex) by multivariate regression analysis, leading to the so-called E3 values, which were the phenotypes studied in previous linkage analyses3,5 and which we also used in the current work.
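The adjustment from E1 to E3 values described above can be sketched as a log transformation followed by a least-squares regression on the covariates, keeping the residuals as the phenotype. The sketch below is illustrative only: the exact transformation and regression model are described in the cited references, so the use of log(1 + count) (chosen here to tolerate zero egg counts), ordinary least squares, the function names, and all data values are assumptions.

```python
import math


def solve_linear(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def adjusted_residuals(egg_counts, covariates):
    """Regress log(1 + eggs per gram) on the covariates; return residuals."""
    y = [math.log1p(count) for count in egg_counts]      # assumed transform
    x = [[1.0] + list(row) for row in covariates]        # add an intercept
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(k)]
    beta = solve_linear(xtx, xty)                        # normal equations
    return [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(x, y)]


# Hypothetical subjects: (water contact level, age in years, sex coded 0/1).
eggs_per_gram = [0, 120, 48, 960, 24, 300]
risk_factors = [(1.0, 35, 0), (3.5, 12, 1), (2.0, 20, 0),
                (4.0, 15, 1), (1.5, 40, 1), (3.0, 25, 0)]
e3_like = adjusted_residuals(eggs_per_gram, risk_factors)
print([round(value, 3) for value in e3_like])
```

Because the model contains an intercept, the residuals sum to zero, so a positive value marks a subject whose infection level is higher than expected for his or her exposure, age, and sex.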


GENETICS OF S. MANSONI INFECTION

The present study was performed on the 142 subjects previously analyzed by Marquet and others.3,5 These people belong to 11 families, including 2 large pedigrees (comprising 50 and 20 people, respectively), 5 smaller pedigrees (12, 9, 14, 6, and 7 people, respectively), and 4 nuclear families (7, 5, 6, and 6 subjects, respectively). Informed consent was obtained from adults or from parents of minors, and human experimentation guidelines of the authors’ institutions were followed.

Genotyping. We performed DNA extraction, amplifications, and PCR-product analysis as previously described.3,5 The primary map for the 22 autosomes consisted of 235 markers from the Généthon panel of microsatellite markers10 with an average interval spanned by adjacent markers estimated to be 15 cM; no interval was > 35 cM, and 5 intervals were between 25 and 35 cM. Additional markers were from the Généthon panel or intragenic, as described elsewhere.3,5 Marker allelic frequencies were estimated from our data.

Linkage analysis. All linkage analyses were performed by means of extensions of the WPC method.6 The WPC approach is a model-free method of linkage analysis; that is, it makes no assumptions about the mode of inheritance of the trait, which has the advantage of being applicable to any kind of trait, and it considers all pairs of relatives in large pedigrees. Furthermore, in the present case of a quantitative trait (E3 values quantifying infection levels), the method does not need to make any assumptions on the trait distribution. The basic principle of the WPC approach is to test whether 2 relatives having close infection levels share more marker alleles than randomly expected. For a pedigree of n members, the WPC statistic S is written as a summation over all pairs of relatives:

$$S = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} W_{ij} U_i U_j,$$

where $W_{ij}$ specifies the marker similarity between 2 relatives i and j and the cross product $U_i U_j$ measures the phenotypic resemblance between i and j. The $U_i$ are the residuals of the model chosen for the phenotype, centered within each family, and correspond in the present study to the E3 values (i.e., egg counts adjusted for water contact levels, age, and sex) used in previous model-based linkage analyses. The $W_{ij}$ are based on the proportion of marker alleles shared by 2 relatives i and j determined from marker genotypes, and are computed as $W_{ij} = w_{ij} - E(w_{ij})$. As recently proposed,8 $w_{ij}$ now corresponds to the proportion of alleles shared IBD by i and j (2 alleles are IBD if they are of the same ancestral origin), and $E(w_{ij})$ is the expectation of $w_{ij}$ under the null hypothesis of no linkage, $H_0$. We extended the WPC method in order to compute multipoint IBD sharing between pairs of relatives11 by means of the efficient regression approach developed by Fulker and others12 for sibships, and extended by Almasy and Blangero13 for pedigrees. The $W_{ij}$ can also be modeled to account for the role of 2 unlinked markers (2-locus model) in the expression of the phenotype, and we used in the present analysis the multiplicative 2-locus method described in Zinn-Justin and Abel.7 This 2-locus approach can be understood as a statistical interaction effect between the 2 tested markers because 2 relatives are considered to have a marker resemblance only if they share at least one allele at each marker locus. In the present study, the 2-locus statistic allowed us to search for a locus (outside chromosome 5) taking into account the known linkage to chromosome 5q31-q33. More specifically, we used the linkage information provided by marker D5S636, which gave the highest lod-score value in previous studies, to search for the role of loci located on other autosomes.

In the WPC method, the test statistic for linkage is $T = S/(\mathrm{Var}\,S)^{1/2}$, which has asymptotically a standard normal distribution under $H_0$. The variance of S is calculated with the use of the permutation distribution, correcting for residual sib-sib correlations.14 However, previous simulation studies under $H_0$ showed an overestimation of the Type I errors for the WPC test,7,8,14 especially for low Type I errors. Therefore, a Monte Carlo (MC) procedure was used to compute empirical P values in order to obtain reliable significance levels, as previously described.7,8 For the one-locus test, denoted as $T_A$, $10^6$ replicates of familial marker data were generated under $H_0$, keeping fixed the pedigree structures and the phenotype values. The $10^6$ $T_A$ values were computed, providing the empirical distribution of the test under $H_0$, and the MC P value, denoted as $p_A$, is the proportion of replicates that yielded a $T_A$ value greater than or equal to the one observed in the actual data. The same principle was used for the 2-locus test, denoted as $T_{A|D5S636}$, for which $10^6$ replicates of marker data were generated at locus A, keeping fixed the pedigree structures, the phenotype values, and the genotypes at D5S636 in order to test the null hypothesis of no linkage to locus A given the linkage information at D5S636. The corresponding MC P value will be denoted as $p_{A|D5S636}$.
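The one-locus statistic and the Monte Carlo P value described above can be illustrated with a deliberately reduced sketch. It is not the authors' Fortran implementation: it assumes independent sib pairs only, for which the proportion of alleles shared IBD under the null hypothesis takes the values 0, 1/2, and 1 with probabilities 1/4, 1/2, and 1/4; it replaces the permutation-based variance with the analytic sib-pair variance; and it simulates IBD proportions directly rather than marker genotypes in pedigrees. All function names and data values are hypothetical.

```python
import math
import random

IBD_VALUES = (0.0, 0.5, 1.0)     # proportion of alleles shared IBD by 2 sibs
IBD_PROBS = (0.25, 0.5, 0.25)    # their probabilities under no linkage
EXPECTED_IBD = 0.5               # E(w_ij) for a sib pair under no linkage
VAR_IBD = 0.125                  # Var(w_ij) for a sib pair under no linkage


def wpc_statistic(w, u):
    """S = sum over pairs i < j of W_ij * U_i * U_j for one pedigree."""
    n = len(u)
    return sum(w[i][j] * u[i] * u[j]
               for i in range(n - 1) for j in range(i + 1, n))


def sib_pair_t(ibd, residual_pairs):
    """T = S / sqrt(Var S), with S summed over independent sib pairs."""
    s = 0.0
    var_s = 0.0
    for share, (u1, u2) in zip(ibd, residual_pairs):
        w = [[0.0, share - EXPECTED_IBD], [0.0, 0.0]]   # centered weight W_12
        s += wpc_statistic(w, [u1, u2])
        var_s += VAR_IBD * (u1 * u2) ** 2               # simplifying assumption
    return s / math.sqrt(var_s)


def monte_carlo_p(observed_ibd, residual_pairs, replicates, rng):
    """Proportion of null replicates with T >= the observed T."""
    t_observed = sib_pair_t(observed_ibd, residual_pairs)
    hits = 0
    for _ in range(replicates):
        # Phenotypes stay fixed; only the marker information is regenerated.
        simulated = rng.choices(IBD_VALUES, weights=IBD_PROBS,
                                k=len(residual_pairs))
        if sib_pair_t(simulated, residual_pairs) >= t_observed:
            hits += 1
    return hits / replicates


# Hypothetical phenotype residuals for 6 sib pairs and their IBD sharing.
pairs = [(1.2, 0.9), (-0.8, -1.1), (1.0, -1.3),
         (0.7, 1.5), (-1.4, 1.0), (-0.6, -0.9)]
sharing = [1.0, 1.0, 0.0, 1.0, 0.0, 0.5]
print(monte_carlo_p(sharing, pairs, 20000, random.Random(0)))
```

Keeping the residuals fixed while regenerating only the marker information mirrors the logic of the procedure in the text; the study itself used $10^6$ replicates, far more than this toy example, in order to resolve small P values.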
The linkage study strategy was a genome-wide search by use of 1) the WPC test $T_A$ for the 22 autosomes and 2) the WPC test $T_{A|D5S636}$ for the 21 autosomes (excluding chromosome 5) to search for a second locus taking into account the linkage to 5q31-q33. Multipoint WPC analyses were performed in the 5q31-q33 region and in other regions showing interesting 2-point results as described in Zinn-Justin and others.11 All computations were performed by our previously developed IBD extension8 of the Fortran WPC program provided by Daniel Commenges. Multipoint IBD sharing estimates for all relative pairs of the pedigree were computed by means of the SOLAR package.13

RESULTS

In 2-point WPC analysis of chromosome 5 (Figure 1), MC P values < 0.05 were observed with only the 2 markers of the primary map located in 5q31-q33, D5S393 ($p_A < 0.002$) and D5S410 ($p_A < 0.0004$), and lower $p_A$ values were obtained with additional markers D5S487 ($p_A < 0.0002$), D5S636 ($p_A < 0.0004$), and D5S436 ($p_A < 0.0003$). Multipoint analysis of the 5q31-q33 region is presented in Figure 2. Maximum multipoint $T_A$ values ($p_A < 4 \times 10^{-5}$) were observed between markers D5S393 and D5S436, not far from the cluster of candidate genes encoding interleukin (IL)-4, IL-5, IL-13, and the interferon regulatory factor 1 (IRF1). By use of the model-based analysis,3 maximum 2-point lod-score values were obtained for D5S636 corresponding to an asymptotic P value around $2 \times 10^{-6}$, and the peak lod score


ZINN-JUSTIN AND OTHERS

FIGURE 1. Monte Carlo (MC) P values (reverse log scale) obtained in the genome screen on the 22 autosomes of the primary map using the weighted pairwise correlation (WPC) one-locus test $T_A$ (black bars) and 2-locus test $T_{A|D5S636}$ (gray bars). Results are not shown for chromosomes that did not provide any MC P value < 0.10, that is, chromosomes 2 (including 20 markers), 9 (9 markers), 12 (9 markers), 14 (7 markers), 15 (10 markers), 20 (8 markers), 21 (2 markers), and 22 (5 markers).

in multipoint analysis (which was confined to 6-point analysis) was between markers CSF1R (colony-stimulating factor 1 receptor) and D5S636 (asymptotic P value $< 3 \times 10^{-7}$).

Results of the autosome-wide scan that used 2-point WPC tests ($T_A$ or $T_{A|D5S636}$) are shown in Figure 1. Outside chromosome 5, the most interesting results were observed in 3 regions, 1p21-q23 with D1S252 ($p_A < 0.004$) and D1S305 ($p_A < 0.02$), 6p21-q21 with D6S286 ($p_A = 0.02$, $p_{A|D5S636} < 0.002$) and D6S294 ($p_{A|D5S636} < 0.02$), and 7q35-q36 with D7S483 ($p_A < 0.01$, $p_{A|D5S636} < 0.008$) and D7S550 ($p_A < 0.006$, $p_{A|D5S636} < 0.03$). Other markers providing MC P values < 0.05 include D1S234 ($p_{A|D5S636} < 0.03$), D3S1300 ($p_{A|D5S636} < 0.05$), D4S414 ($p_{A|D5S636} < 0.02$), D4S424 ($p_{A|D5S636} < 0.04$), D8S257 ($p_A < 0.02$), D11S922 ($p_A < 0.03$, $p_{A|D5S636} < 0.03$), D13S170 ($p_{A|D5S636} < 0.03$), D16S407 ($p_{A|D5S636} < 0.04$), D16S421 ($p_{A|D5S636} < 0.02$), D17S799 ($p_{A|D5S636} < 0.03$), and D18S61 ($p_{A|D5S636} < 0.02$).

Multipoint WPC analysis of region 1p21-q23 (Figure 3A) confirmed that the most significant results are obtained with the one-locus $T_A$ test, with a maximum value ($p_A < 0.002$) at marker D1S252. In region 6p21-q21 (Figure 3B), the peaks of both multipoint $T_A$ and $T_{A|D5S636}$ tests are reached at marker D6S286, with lower MC P values for the 2-locus test ($p_{A|D5S636} < 0.002$) than for the one-locus test ($p_A < 0.02$). Multipoint analysis of region 7q35-q36 (data not shown) provided MC P values very close to those obtained with the 2-point analysis.

DISCUSSION

We present here the results of the autosome-wide scan searching for loci influencing infection levels by S. mansoni by means of a model-free linkage analysis method. We used the WPC linkage approach, which allows large pedigrees to be considered and was recently extended to perform both multipoint and 2-locus analysis. The most significant linkage was obtained with the 5q31-q33 region, confirming our previous mapping, which used a model-based lod-score approach.3 The multipoint WPC peak was located ~10 cM from the peak lod score, but the whole 30-cM region between IL-4 and D5S412 provided WPC MC P values < 0.001. Although these linkage results validate the role of the 5q31-q33 region, the precise location of SM1 requires linkage disequilibrium studies with polymorphisms of the candidate genes located within this region such as those coding for IL-4, IL-5, IL-12B, IL-13, IRF1, and the CSF1R.

FIGURE 2. Linkage analysis of the 5q31-q33 region using the weighted pairwise correlation (WPC) method. Monte Carlo (MC) P values (reverse log scale) obtained with the one-locus $T_A$ test are indicated by the squares (2-point analysis) and the solid line (multipoint analysis). Two-point asymptotic P values obtained by our previous lod-score analysis are also indicated by circles. Multipoint lod-score analysis considering all 5q31-q33 markers could not be performed for computational reasons.

In addition to chromosome 5q31-q33, 3 other regions provided linkage results with significance levels < 0.01 with the one-locus test $T_A$, the 2-locus test $T_{A|D5S636}$, or both. In region 1p21-q23, the most significant multipoint results were obtained at marker D1S252, with MC P values around 0.001 for the one-locus test, whereas much larger P values (~0.20) were observed via the 2-locus test. This suggests that if a locus influencing S. mansoni infection levels does exist in this region, it would act independently of SM1. Opposite results were observed in region 6p21-q21, with more significant results (MC P values ~0.001) obtained with the 2-locus test than with the one-locus test, indicating that a putative locus located in this region could interact with

FIGURE 3. Linkage analysis of the 1p21-q23 (A) and the 6p21-q21 (B) regions using the weighted pairwise correlation (WPC) method. Monte Carlo (MC) P values (reverse log scale) obtained with the one-locus $T_A$ test are indicated by the squares (2-point analysis) and the solid line (multipoint analysis). MC P values obtained with the 2-locus $T_{A|D5S636}$ test are indicated by the triangles (2-point analysis) and the broken line (multipoint analysis).


SM1 in the regulation of S. mansoni infection levels. In region 7q35-q36, both the one-locus and the 2-locus tests provided close results, with significance levels around 0.007. Among these 3 regions, only the 7q35-q36 region provided significant lod-score values in our previous model-based analysis, with asymptotic P values around 0.01.

The examination of human genome maps (http://www.ncbi.nlm.nih.gov/genemap99) shows that some candidate genes are located within these 3 regions. The chromosome 1 region between D1S248 and D1S484 contains the CSF1 gene, the receptor of which (CSF1R) is located in 5q31-q33. The 6p21-q21 region is located between 2 regions of major interest. One is the major histocompatibility complex (MHC) region, and the other is the 6q22-q23 region, which contains the gene encoding the interferon gamma receptor ligand-binding chain (IFNGR1) and which was recently linked to a locus controlling the development of severe hepatic fibrosis due to S. mansoni infection, denoted as SM2.15 However, the genetic distance between D6S286 (corresponding to the multipoint WPC peak) and the MHC genes and IFNGR1 is ~40 and ~45 cM, respectively. Therefore, the present linkage results are very unlikely to be explained by a putative locus located within the MHC region or close to SM2. Finally, the 7q35-q36 region contains the beta T cell receptor, and T cells were shown to play a central role in the development of acquired immunity against schistosomes.

In conclusion, this autosome-wide scan that uses a model-free linkage method confirmed that in this Brazilian population, there is only one major locus (SM1) located in region 5q31-q33 influencing S. mansoni infection levels. However, the present analysis also identified some candidate regions in which additional loci may play a role (although much less significant than SM1), independently of SM1 or not. Further studies in other populations are ongoing to confirm the role of these loci.
Acknowledgment: We thank Andreas Ziegler for helpful assistance with the data analysis.

Financial support: This work was supported in part by Institut National de la Santé et de la Recherche Médicale, Ministère de l’Education Nationale de la Recherche et de la Technologie (Programme de Recherche Fondamentale en Microbiologie Maladies Infectieuses et Parasitaires), European Community Science and Technology for Development Program, and United Nations Development Programme/World Bank/World Health Organization.

Authors’ addresses: Anne Zinn-Justin, INSERM U.436, Mathematical and Statistical Modeling in Biology and Medicine, Faculté de Médecine Pitié-Salpêtrière, 91 Bd de l’Hôpital 75013 Paris, France. Sandrine Marquet, Dominique Hillaire, and Alain Dessein, INSERM U.399, Immunology and Genetic of Parasitic Diseases/Laboratory of Parasitology-Mycology, Faculté de Médecine, 27 Bd Jean Moulin, 13385 Marseille Cedex 5, France. Laurent Abel, Laboratory of Human Genetics of Infectious Diseases, INSERM U.550, Faculté de Médecine Necker, 156 rue de Vaugirard, 75015 Paris, France.

Reprint requests: Laurent Abel, Laboratory of Human Genetics of Infectious Diseases, INSERM U.550, Faculté de Médecine Necker, 156 rue de Vaugirard, 75730 Paris Cedex 15. Telephone: 33-140615381; Fax: 33-1-40615688 (e-mail: [email protected]).

REFERENCES

1. WHO, 1993. Public health impact of schistosomiasis: disease and mortality. WHO Expert Committee on the Control of Schistosomiasis. Bull World Health Organ 71: 657–662.
2. Abel L, Demenais F, Prata A, Souza AE, Dessein A, 1991. Evidence for the segregation of a major gene in human susceptibility/resistance to infection by Schistosoma mansoni. Am J Hum Genet 48: 959–970.
3. Marquet S, Abel L, Hillaire D, Dessein H, Kalil J, Feingold J, Weissenbach J, Dessein AJ, 1996. Genetic localization of a locus controlling the intensity of infection by Schistosoma mansoni on chromosome 5q31-q33. Nat Genet 14: 181–184.
4. Muller-Myhsok B, Stelma FF, Guisse-Sow F, Muntau B, Thye T, Burchard GD, Gryseels B, Horstmann RD, 1997. Further evidence suggesting the presence of a locus, on human chromosome 5q31-q33, influencing the intensity of infection with Schistosoma mansoni. Am J Hum Genet 61: 452–454.
5. Marquet S, Abel L, Hillaire D, Dessein A, 1999. Full results of the genome-wide scan which localises a locus controlling the intensity of infection by Schistosoma mansoni on chromosome 5q31-q33. Eur J Hum Genet 7: 88–97.
6. Commenges D, Olson J, Wijsman E, 1994. The weighted rank pairwise correlation statistic for linkage analysis: simulation study and application to Alzheimer’s disease. Genet Epidemiol 11: 201–212.
7. Zinn-Justin A, Abel L, 1998. Two-locus developments of the weighted pairwise correlation method for linkage analysis. Genet Epidemiol 15: 491–510.
8. Zinn-Justin A, Abel L, 1999. Introduction of the IBD information into the weighted pairwise correlation method for linkage analysis. Genet Epidemiol 17: 35–50.
9. Dessein AJ, Abel L, Couissinier P, Demeure C, Rihet P, Kohlstaedt S, Carneiro-Carvalho D, Ouattara M, Goudot-Crozel V, Dessein H, Bourgois A, Carvallo EM, Prata A, 1992. Environmental, genetic and immunological factors in human resistance to Schistosoma mansoni. Immunol Invest 21: 423–453.
10. Dib C, Faure S, Fizames C, Samson D, Drouot N, Vignal A, Millasseau P, Marc S, Hazan J, Seboun E, Lathrop M, Gyapay G, Morissette J, Weissenbach J, 1996. A comprehensive genetic map of the human genome based on 5,264 microsatellites. Nature 380: 152–154.
11. Zinn-Justin A, Ziegler A, Abel L, 2001. Multipoint development of the weighted pairwise correlation (WPC) linkage method for pedigrees of arbitrary size and application to the analysis of breast cancer and alcoholism familial data. Genet Epidemiol 21: 40–52.
12. Fulker DW, Cherny SS, Cardon LR, 1995. Multipoint interval mapping of quantitative trait loci, using sib pairs. Am J Hum Genet 56: 1224–1233.
13. Almasy L, Blangero J, 1998. Multipoint quantitative-trait linkage analysis in general pedigrees. Am J Hum Genet 62: 1198–1211.
14. Commenges D, Abel L, 1996. Improving the robustness of the weighted pairwise correlation test for linkage analysis. Genet Epidemiol 13: 559–573.
15. Dessein AJ, Hillaire D, Elwali NE, Marquet S, Mohamed-Ali Q, Mirghani A, Henri S, Abdelhameed AA, Saeed OK, Magzoub MM, Abel L, 1999. Severe hepatic fibrosis in Schistosoma mansoni infection is controlled by a major locus that is closely linked to the interferon-gamma receptor gene. Am J Hum Genet 65: 709–721.