Human Molecular Genetics, 2006, Vol. 15, No. 19 doi:10.1093/hmg/ddl231 Advance Access published on August 21, 2006
2903–2910
Evidence for a colorectal cancer susceptibility locus on chromosome 3q21–q24 from a high-density SNP genome-wide linkage scan
Zoe Kemp1,{, Luis Carvajal-Carmona1,{, Sarah Spain1,{, Ella Barclay1,{, Margaret Gorman1,2,{, Lynn Martin1,{, Emma Jaeger1,{, Neil Brooks3,{, D Timothy Bishop4,{, Huw Thomas2,{, Ian Tomlinson1,2,*,{, Elli Papaemmanuil5,{, Emily Webb5,{, Gabrielle S Sellick5,{, Wendy Wood5,{, Gareth Evans6,{, Anneke Lucassen7,{, Eamonn R Maher8,{ and Richard S Houlston5,{,* and the ColoRectal tumour Gene Identification (CoRGI) Study Consortium}
1Molecular and Population Genetics Laboratory, Cancer Research UK, 44 Lincoln’s Inn Fields, London WC2A 3PX, UK, 2Colorectal Cancer Unit, Cancer Research UK, St Mark’s Hospital, Watford Road, Harrow HA1 3UJ, UK, 3Bioinformatics, Cancer Research UK, 44 Lincoln’s Inn Fields, London WC2A 3PX, UK, 4Genetic Epidemiology Laboratory, Cancer Research UK, St James’s University Hospital, Beckett Street, Leeds, LS9 7TF, UK, 5Institute of Cancer Research, 15 Cotswold Road, Sutton, Surrey, SM2 5NG, UK, 6Medical Genetics, St Mary’s Hospital, Manchester, Hathersage Road, Manchester, M13 0JH, UK, 7Wessex Clinical Genetics Service, Princess Anne Hospital, Coxford Road, Southampton, SO16 5YA, UK and 8Section of Medical and Molecular Genetics, University of Birmingham School of Medicine and West Midlands Genetics Service, Birmingham Women’s Hospital, Edgbaston, Birmingham, B15 2TG, UK
Received June 6, 2006; Revised and Accepted August 10, 2006
To identify a novel susceptibility gene for colorectal cancer (CRC), we conducted a genome-wide linkage analysis of 69 pedigrees segregating colorectal neoplasia in which involvement of known loci had been excluded, using a high-density single nucleotide polymorphism (SNP) array containing 10 204 markers. Multipoint linkage analyses were undertaken using both non-parametric (model-free) and parametric (model-based) methods. After the removal of SNPs in strong linkage disequilibrium, we obtained a maximum non-parametric linkage statistic of 3.40 (P = 0.0003) at chromosomal region 3q21–q24. The same genomic position also yielded the highest multipoint heterogeneity LOD (HLOD) score under a dominant model (HLOD = 3.10, genome-wide P = 0.038) with 62% of families linked to the locus. We provide evidence for a novel CRC susceptibility gene. Further studies are needed to confirm this localization and to evaluate the contribution of this locus to disease incidence.
INTRODUCTION
Colorectal carcinoma (CRC) is the third most common cause of cancer-related mortality in the Western world, and in the United States it represents the second most common cause of cancer death (1). A number of Mendelian syndromes with
mutations in known genes [APC, mismatch repair (MMR) genes, MUTYH/MYH, SMAD4, ALK3 and STK11/LKB1] have been characterized; carriers of mutations in these genes are at a markedly elevated risk of CRC. Collectively, these known genes account for at most 6% of all CRC cases (2). Evidence from twin studies, however, indicates that inherited
*To whom correspondence should be addressed. Tel: +44 02072692884; Fax: +44 02072693093; Email:
[email protected] or Tel: +44 208 722 4250; Email:
[email protected] { London Research Institute Group. ‡ Institute of Cancer Research Group. } List of CoRGI Consortium collaborators available on request.
© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email:
[email protected]
Table 1. Characteristics of the 69 pedigrees segregating colorectal neoplasia
Number of affected individuals per pedigree  Number of pedigrees  Patients with CRC  Patients with adenomas  Patients with hyperplastic polyps  Total
2                                            6                    9 (9)              3 (3)                   0 (0)                              12 (12)
3                                            17                   29 (27)            21 (21)                 1 (1)                              51 (49)
4                                            21                   36 (29)            43 (43)                 5 (4)                              84 (76)
5                                            10                   28 (20)            22 (21)                 0 (0)                              50 (41)
6                                            10                   29 (17)            29 (25)                 2 (2)                              60 (44)
7                                            4                    8 (7)              15 (15)                 5 (5)                              28 (27)
10                                           1                    8 (1)              1 (1)                   1 (1)                              10 (3)
Total                                        69                   147 (110)          134 (129)               14 (13)                            295 (252)
Figures in parentheses indicate the number of individuals for whom genotypes were obtained.
susceptibility plays a far greater role in the development of CRC, with an estimated heritability of 35% (3). Such an assertion is supported by other epidemiological data, which show that a significant component of the familial aggregation of CRC remains unexplained by germline mutations in known genes (L. Aaltonen, L. Johns, H. Järvinen, J. Mecklin and R. Houlston, manuscript in preparation). Direct evidence for uncharacterized high/moderate-penetrance CRC genes is provided by CRC families that show evidence against linkage to known loci and by kindreds that fulfill the clinical (Amsterdam) criteria for HNPCC but whose CRCs do not display MMR deficiency. Since familial CRC risks in relatives of colorectal adenoma (CRA) cases parallel those seen in relatives of cancer cases (4), there is strong a priori evidence that a significant proportion of inherited predisposition is mediated through susceptibility to adenoma formation; adenoma predisposition genes might be distinct from genes that make the progression of adenoma to carcinoma more likely.
The above observations strongly support the continued search for novel CRC predisposition genes through conventional genome-wide linkage searches. The feasibility of this strategy is contingent on the ascertainment and collection of a large series of colorectal tumour families, from which familial adenomatous polyposis and germline mutations in the known MMR genes have been excluded. In 1997, we formed the ColoRectal tumour Gene Identification (CoRGI) Study Consortium to ascertain and collect biological samples and data from families segregating CRC, in order to identify novel predisposition genes. Here, we report the results of a genome-wide linkage search of 69 families ascertained through the consortium.
As increased genetic marker density in whole genome scans has been shown to increase the linkage information content (IC), we conducted a genome-wide scan using the Affymetrix Mapping 10K Xba142 array, which contains 10 204 single nucleotide polymorphism (SNP) markers. Our results provide substantive evidence for the existence of a novel CRC predisposition locus on chromosome 3q21–q24, and suggestive evidence of other colorectal tumour predisposition genes.
RESULTS
Description of families analyzed
Details of the 69 families are summarized in Table 1. In total, there were 295 affected family members, of whom 147 had a
diagnosis of CRC (with or without adenoma), 134 had adenomas and 14 had hyperplastic polyps. The number of affected persons per family ranged from two to 10, and the number of affected persons per family with DNA available ranged from two to seven. Six of the 69 families contained affected persons in three generations, whereas 39 pedigrees contained affected family members in two generations. In the remaining pedigrees, affected family members were confined to a single generation. The mean age at diagnosis of CRC in the families was 55.2 years, significantly less than the mean value of 70 years for age at diagnosis observed in the general white population. The mean age at diagnosis of CRA in the families was 49.3 years. The minimum age of diagnosis of CRC within each family ranged from 29 to 77 years (median 49 years). Minimum age at diagnosis within a family is likely to be a superior indicator of the potential for existence of a susceptibility gene, since it is not influenced by older sporadic cases.
Data quality
A total of 252 Affymetrix Mapping 10K Xba142 arrays were processed, resulting in the generation of more than 2.5 million genotypes. A number of parameters were monitored throughout the study to assess data quality, and all genotypes were housed within the pedigree storage program ProgenyLab. The average SNP call rate per array was 98.17%. For DNA extracted from males, it was possible to examine the 238 markers on the X chromosome for errors because of miscalls or PCR contamination. No SNPs were heterozygous in male samples. A total of 441 SNPs were not polymorphic in our pedigree set, leaving 9763 polymorphic markers with known map locations. After error checking with the programs ProgenyLab and MERLIN, a total of 5322 and 427 genotypes, respectively, were deemed to be erroneous and were removed from further analysis.
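The marker and genotype counts quoted in this paragraph can be cross-checked with simple arithmetic. The sketch below uses only figures stated in the paper (10 204 markers per array is taken from the abstract and Methods); expressing the removed genotypes as a fraction of all attempted genotypes, rather than of called genotypes, is an assumption made here for illustration.

```python
# Figures quoted in the paper; variable names are illustrative only.
ARRAYS = 252
MARKERS_PER_ARRAY = 10_204
NON_POLYMORPHIC = 441
REMOVED_PROGENYLAB = 5_322
REMOVED_MERLIN = 427

# Genotypes attempted across all arrays (reported as "more than 2.5 million").
genotypes_attempted = ARRAYS * MARKERS_PER_ARRAY

# Polymorphic markers remaining after dropping monomorphic SNPs.
polymorphic_markers = MARKERS_PER_ARRAY - NON_POLYMORPHIC

# Assumption: removed genotypes are expressed relative to attempted genotypes.
removed_fraction = (REMOVED_PROGENYLAB + REMOVED_MERLIN) / genotypes_attempted

print(genotypes_attempted, polymorphic_markers, f"{removed_fraction:.4%}")
```

Both derived counts agree with the text: the product exceeds 2.5 million and the difference equals the 9763 polymorphic markers reported.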
Linkage analysis
After excluding 2206 SNPs owing to LD, the remaining 7557 SNPs used in the linkage analyses had a calculated median intermarker distance of 0.23 Mb. As the IC for analyses with and without the high-LD SNPs were virtually identical (genome-wide IC mean of 0.771 with high-LD SNPs compared to 0.769 without), there was no evidence that loss of these SNPs
from the total number significantly impacted on the probability of recovering appropriate vectors of inheritance. In the whole set of families, the strongest evidence of linkage was to two chromosomal regions: 18q21 [maximum nonparametric linkage (NPL) = 3.16; dominant heterogeneity LOD (HLOD) = 0.80; recessive HLOD = 1.34] and 2p22 (maximum NPL = 1.54, dominant HLOD = 0.01; recessive HLOD = 2.16). The region of linkage on chromosome 18q12–q22 (NPL > 2.0) was bounded by SNPs rs180640 and rs768360, a distance of 22.4 Mb. The region of linkage on 2p22 (recessive HLOD > 1.0) was bounded by SNPs rs728618 and rs720795, a distance of 8.24 Mb. Figure 1 and Table 2 show multipoint NPL and HLOD scores for each of these chromosomal regions, together with other regions that produced NPL and/or LOD scores nominally significant at the 1% level. NPL statistics were stable after restricting analysis to those families with median age of CRC diagnosis <55 (Table 2). We then restricted our definition of affection status to CRC only, since, as we have argued above, separate loci might be associated with adenoma predisposition and progression to CRC. Restricting the affection status in this way rendered a subset of 38 families informative for linkage. Empirical limits for genome-wide significance for this analysis were established at 3.64 for the NPL statistic and 3.01 for the LOD score, with suggestive linkage thresholds of 2.75 and 1.88 respectively. There was genome-wide significant evidence of linkage to a single chromosomal region, 3q21–q24, dominant HLOD = 3.10; recessive HLOD = 1.39; maximum NPL = 3.40 (restricted to families with median age of diagnosis of CRC <55 years, NPL = 2.44; Table 3). The dominant HLOD was maximized with 62% of families linked. The region of linkage (dominant HLOD > 1.0) was bounded by SNPs rs718612 and rs966226, a distance of 17.8 Mb.
In the full family set, this region had shown some evidence of linkage, though non-significant (maximum NPL = 1.51; dominant HLOD = 0.58; recessive HLOD = 0.27). Chromosomes 18q21 and 2p22 did not achieve the thresholds for suggestive linkage in the CRC-only dataset. Figure 2 and Table 3 show multipoint NPL and HLOD scores for the 3q21–q24 chromosomal region, together with other regions that produced NPL and/or LOD scores nominally significant at the 1% level.
Contribution of the chromosome 3q21–q24 locus to the familial risk of CRC
The best estimate of the proportion of sibling pairs affected with CRC sharing no haplotypes in the chromosome 3q21–q24 region was 0.09. Taking the sibling relative risk as 0.25 divided by this proportion (19), this translates to a sibling relative risk attributable to the identified locus of 2.8 (95% confidence interval 1.8–4.3).
Mutation analyses
Over 90 RefSeq genes map to the 17.8 Mb region of linkage on chromosome 3q21–q24 delimited by rs718612 and rs966226 (UCSC Human Genome browser, http://genome.ucsc.edu/). Two genes mapping to the linked region, MBD4 (MIM 603574) and EPHB1 (MIM 600600), represent highly plausible candidates on the basis of their biology. We screened one or two individuals affected with CRC from each family for
sequence changes in the coding regions and splice sites of both genes. A number of previously documented synonymous SNPs were detected in both genes, but no obvious or potentially pathogenic changes were identified. To examine whether K-ras and BRAF mutational status provided a means of differentiating families, we evaluated the tumours from 27 of the families. There was, however, no evidence of concordance in mutational status between independent pairs of tumours (χ² = 0.13; P = 0.71): 16 pairs of tumours harboured no mutation in either gene; in 10 pairs, only one of the tumours was mutated; and in one family, both tumours were mutated (one with a K-ras codon 12 mutation and the other, a codon 13 mutation).
DISCUSSION
Our results provide good evidence for a novel CRC susceptibility locus on chromosome 3q21–q24, with a pattern of inheritance consistent with an autosomal dominant model. Epidemiological studies have shown that the risk of CRC in young relatives of early-onset CRC cases is increased approximately 10-fold (5). On the basis of this estimate, the 3q21–q24 locus we have identified is likely to account for ~18% of the familial risk of CRC. Although we found no pathogenic changes in MBD4 and EPHB1, there are several other genes in our region of linkage on chromosome 3q21–q24 that are involved in aspects of the regulation of cellular proliferation and differentiation (for example, NCK1), or regulation of cell cycle or apoptosis (for example, WDR10 and PIK3CB). These represent additional attractive candidates. Identification of chromosome 3q21–q24 as a region harbouring a novel CRC predisposition gene was based on an analysis of families in which we restricted affection status to a diagnosis of CRC rather than using more general criteria taking into account segregation of adenomas in families. All of the patients with adenomas (or hyperplastic polyps) whom we classed as affected had unusually severe disease in terms of tumour multiplicity, age of onset or histology. Our estimates of phenocopy rates for these cases were inevitably imprecise, but were likely to be very conservative. Indeed, almost identical criteria to ours have recently been found independently to be a predictor of adenoma recurrence (6) and hence probably of disease predisposition. We therefore expect all of our affected patients who have adenomas but not CRC to be at greatly increased risk of the latter. Nevertheless, different loci may favour adenoma occurrence and progression to carcinoma, as illustrated by comparing the Mendelian conditions of familial adenomatous polyposis and hereditary non-polyposis colon cancer.
Our finding of significant linkage based on CRC affection status in a subset of families is not, therefore, unexpected. In addition to the evidence of linkage of CRC to chromosome 3q21–q24, we found suggestive evidence of linkage in the whole dataset to chromosome 18q21 and chromosome 2p22, the former based on the NPL statistic and the latter on a recessive model of inheritance. Although we have excluded the involvement of known CRC predisposition genes through rigorous examination of clinical data and molecular analyses, it is intriguing that SMAD4 maps to the region of linkage on
Figure 1. Plots of linkage statistics obtained from analysis of all families (after the removal of high-LD SNPs) for chromosomes 2p22, 2q36, 3q21–q24, 5p14, 10p12, 11p14 and 18q21 markers. The HLOD scores under the dominant model are shown in black, HLOD scores under the recessive model are shown in red, and NPL P-values transformed by −log10(P) are shown in blue.
Table 2. Location of NPL scores >2.0 or HLOD scores >1.15 (corresponding to nominal P-value <0.01) in the whole family set. α provides an estimate of the proportion of families linked at a given genomic position. Figures in parentheses show the NPL score and associated P-value for the subset of 33 pedigrees with median age at diagnosis of CRC <55 years
Chromosomal region  Max NPL (<55)  P (<55)
2p22                1.54 (0.70)    0.062 (0.242)
2q36–q37            1.86 (2.38)    0.031 (0.009)
3q13–q21            2.45 (2.46)    0.008 (0.007)
5p14                2.07 (1.89)    0.019 (0.029)
10p12               2.04 (2.42)    0.021 (0.008)
18q21               3.16 (3.22)    0.0008 (0.0006)
Table 3. Location of NPL scores >2.0 or HLOD scores >1.15 (corresponding to nominal P-value <0.01) in the CRC-only set. α provides an estimate of the proportion of families linked at a given genomic position. Figures in parentheses show the NPL score and associated P-value for the subset of 22 pedigrees with median age at diagnosis of CRC <55 years
Chromosomal region  Max NPL (<55)  P (<55)
2p22                1.48 (−0.10)   0.069 (0.540)
2q24–q31            2.12 (1.33)    0.017 (0.092)
3q13                2.21 (−0.70)   0.014 (0.758)
3q21–q24            3.40 (2.44)    0.0003 (0.007)
7q21                1.85 (1.17)    0.032 (0.121)
9p22                2.10 (1.31)    0.018 (0.095)
17q24               2.20 (1.64)    0.014 (0.051)
18q12–q21           1.91 (2.31)    0.028 (0.010)
Max HLOD and α columns (dominant model and recessive model) of Tables 2 and 3, in extraction order: 0.01, 0.01, 2.16, 0.29, 1.23, 0.36, 0.81, 0.20, 1.62, 0.48, 0.53, 0.28, 0.96, 0.17, 1.16, 0.24, 1.72, 0.43, 0.77, 0.30, 0.91, 0.21, 0.61, 0.23, 3.10, 0.62, 1.39, 0.50, 0.57, 0.17, 0.86, 0.24, 0.20, 0.19, 1.33, 0.43, 0.80, 0.20, 1.34, 0.37, 0.08, 0.19, 0.04, 1.41, 1.04, 0.34, 0.2, 0.21, 0.93, 0.38, 1.79, 0.48, 1.30, 0.39, 0.13, 0.14
chromosome 18q, together with other candidate genes such as SMAD2. Candidate genes within the linkage region on 2p22 include MAP4K3 and CDKL4. Recently, a CRC predisposition locus was reported to map to chromosome 9q22.2–q31 (7) on the basis of an analysis of 74 affected sibling pairs from 53 CRC kindreds. A subset of the families analyzed in our study showed some evidence of linkage to 9q22.2–q31 (8), but we found stronger evidence for linkage to other sites. In addition to chromosomes 3q21–q24, 18q21 and 2p22, we found modest support for linkage to chromosomes 2q, 3q13, 5p14, 7q21, 9p22, 10p12, and 17q24. It is possible that there may be epistatic interactions between these putative loci, but data from the analysis of additional families would be required to comprehensively investigate this possibility. Current research, aimed at identifying susceptibility loci, is becoming centered on association studies, which are primarily predicated on the ‘common-disease/common-variant’ hypothesis. Linkage analysis is therefore being relegated as a strategy for disease gene identification. It has, however, proven to be one of the most efficient tools for localizing and identifying disease-causing alleles, even for many complex traits. Our analysis vindicates the continued use of linkage in cases where the disease phenotype can be defined with precision, and shows that using high-density SNP arrays provides an efficient method of conducting genome-wide linkage searches. In conclusion, we provide evidence for a novel susceptibility gene on chromosome 3q21–q24 from an analysis of a series of UK CRC families. Further studies are needed to confirm this localization and robustly evaluate the contribution of this locus to familial disease.
MATERIALS AND METHODS
Ascertainment and collection of families
Families were ascertained as part of the CoRGI study. Each kindred had at least two affected individuals (confirmed by pathology reports). Genomic DNA was extracted from whole blood using the Chemagic Magnetic Separation Module 1™. No patient had clinical features of Peutz–Jeghers syndrome, juvenile polyposis, hereditary mixed polyposis or inflammatory bowel disease. Germline MMR gene mutations were excluded by microsatellite instability testing (BAT25, BAT26) in two colorectal cancers from each family; kindreds in which both cancers were unstable were excluded. Where one unstable cancer was found, or if the only available cancer was unstable, direct mutation screening of all coding regions of MSH2 and MLH1 was undertaken using denaturing high-performance liquid chromatography analysis based on constitutional DNA (assay details available on request). Additionally, for all individuals with more than five adenomas, germline APC and MYH mutations were excluded by denaturing high-performance liquid chromatography screening of the full coding region of each gene and by linkage to the APC gene (assay details available on request). In six families, one cancer was MSI+ and one or more others were MSI−; in all of these cases, the MSI+ cancer was not of notably early onset or from an individual with multiple cancers suggestive of HNPCC. Individuals were classed as affected if they fulfilled one or more of the following criteria: CRC at age ≤75 years; ‘significant’ adenoma (three or more synchronous or metachronous, and/or villous morphology, and/or severe dysplasia, and/or diameter >1 cm) at age ≤75 years; any adenoma at age ≤45 years; >10 hyperplastic polyps at age ≤75 years; and any hyperplastic polyp at age ≤35 years. All samples were obtained with informed consent and local ethical review board approval in accordance with the tenets of the Declaration of Helsinki.
Genotyping
The analysis was based on 69 families segregating CRC and/or CRA in which involvement of known loci had been excluded.
Figure 2. Plots of linkage statistics (after the removal of high-LD SNPs) for chromosome 2p22, 2q36, 3q21–q24, 7q21, 9p21, 17q24 and 18q21 markers after restricting analysis to families in which affection status was defined by a diagnosis of CRC. The HLOD scores under the dominant model are shown in black, HLOD scores under the recessive model are shown in red, and NPL P-values transformed by −log10(P) are shown in blue.
A genome-wide linkage search of these families was undertaken using the GeneChip Mapping 10K Xba 142 array containing 10 204 SNP markers (Affymetrix Inc., Santa Clara, CA). SNP genotypes were obtained by following the Affymetrix protocol (9). Briefly, for each sample, 250 ng of genomic DNA isolated from peripheral blood was digested with the restriction endonuclease XbaI for 2.5 h. Digested DNA was mixed with Xba adapters and ligated using T4 DNA ligase for 2.5 h. Ligated DNA was added to four separate PCR reactions, cycled, pooled and purified to remove unincorporated ddNTPs. The purified PCR products were then fragmented and labeled with biotin-ddATP. Biotin-labeled DNA fragments were hybridized to the arrays for 18 h in an Affymetrix 640 hybridization oven. After hybridization, arrays were washed, stained and scanned using an Affymetrix Fluidics Station FS450 with images obtained using an Affymetrix GeneChip 3000 scanner. Affymetrix GCOS software (v1.4) was used to obtain raw microarray feature intensities (RAS scores). RAS scores were processed using Affymetrix GTYPE (v4.0) software to derive SNP genotypes (Affymetrix Inc., Santa Clara, CA).
Data manipulation and error checking
Non-Mendelian error checking of genotypes and generation of linkage format files from raw Affymetrix array (.chp) files was performed using the program ProgenyLab (Progeny Inc., South Bend, IN). The map order and distances between SNP markers were based on the UCSC Human Genome browser. The program MERLIN (10) was employed to search for additional unlikely genotypes consistent with potential genotyping errors.
Investigation of linkage disequilibrium
Founder genotypes were not always available in our families. Under such conditions, the presence of linkage disequilibrium (LD) between markers has the potential to inflate multipoint linkage statistics as the vectors of inheritance have to be inferred on the basis of allele frequencies (11,12).
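The LD-removal rule used in this study (consecutive marker pairs with r² > 0.16 form a set, and only the centrally positioned SNP of each set is retained) can be sketched as follows. This is a minimal illustration, not the SNPLINK implementation: the function name, the assumption that r² values for consecutive pairs are already computed, and the choice of the lower-middle SNP for sets of even size are all hypothetical.

```python
from typing import List, Sequence


def prune_high_ld(
    snps: Sequence[str],
    r2_consecutive: Sequence[float],
    threshold: float = 0.16,
) -> List[str]:
    """Keep one SNP per run of consecutive markers in high LD.

    snps           -- marker IDs in map order
    r2_consecutive -- r2_consecutive[i] is r^2 between snps[i] and snps[i + 1]
    threshold      -- pairs with r^2 above this value are treated as high LD
    """
    if len(r2_consecutive) != max(len(snps) - 1, 0):
        raise ValueError("need exactly one r^2 value per consecutive SNP pair")

    kept: List[str] = []
    run: List[str] = []
    for i, snp in enumerate(snps):
        run.append(snp)
        is_last = i == len(snps) - 1
        # A run ends at the last marker or when the next pair is not in high LD.
        if is_last or r2_consecutive[i] <= threshold:
            # Assumption: for an even-sized run, take the lower-middle SNP.
            kept.append(run[(len(run) - 1) // 2])
            run = []
    return kept


if __name__ == "__main__":
    markers = ["s1", "s2", "s3", "s4", "s5", "s6"]  # hypothetical marker IDs
    r2 = [0.05, 0.40, 0.30, 0.10, 0.90]             # hypothetical r^2 values
    print(prune_high_ld(markers, r2))
```

Markers that are not in high LD with either neighbour form sets of size one and are always retained, so only redundant markers are dropped.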
Thresholds of 0.16 for r² have been advocated to define high-LD SNPs whose inclusion will distort linkage statistics (13). We calculated the pair-wise r² between consecutive pairs of SNP markers. LD was removed by considering each set of markers in LD (defined as sets where each consecutive marker pair in the set had r² > 0.16) and retaining one SNP from each set (the centrally positioned SNP). The impact of LD was investigated by considering linkage results before and after the exclusion of high-LD SNPs.
Linkage analysis
Our strategy for generating linkage statistics was formulated prior to analysis, not post hoc. Multipoint linkage analysis was undertaken by implementation of the program SNPLINK (14), which performs fully automated nonparametric (mode-of-inheritance free) and parametric analyses before and after LD removal by incorporation of the MERLIN (v0.10.1) (10) and ALLEGRO (v1.1) (15) programs, respectively. Parametric linkage in the presence of heterogeneity was assessed using HLOD scores. HLOD scores and their accompanying estimates of the proportion of linked families (α) were calculated using the statistical software package ALLEGRO. We derived LOD scores under both dominant and recessive models of inheritance of CRC with reduced penetrance and four liability classes dependent upon age at diagnosis (<50, 50–59, 60–69 and ≥70 years). Parameters in the models were based on a segregation analysis of 1042 CRC families in which the involvement of known Mendelian loci had been excluded (unpublished data). Allele frequencies were 0.017 under the dominant model and 0.183 under the recessive model. For the dominant model, penetrances were set at 0.044, 0.105, 0.213 and 0.420, with corresponding phenocopy rates of 0.0004, 0.002, 0.007 and 0.030. For the recessive model, penetrances were set at 0.054, 0.146, 0.331 and 0.638 with corresponding phenocopy rates of 0.00004, 0.0003, 0.0026 and 0.023. A second analysis was undertaken whereby individuals were classed as affected if they had frank CRC or advanced CRA or colorectal pathology associated with a high risk of frank malignancy, as defined above. Robust data on the age-specific prevalence or incidence rates for CRA and other colorectal phenotypes are not available. In the absence of such information, in order to incorporate these affected individuals into parametric analyses, they were considered to be equivalent to CRC 15 years later. This assumption of equivalence follows from data estimating the risk of malignant transformation (16). To provide for a higher phenocopy rate, we increased rates in the last liability class to 0.07 and 0.06 for dominant and recessive models, respectively. In all analyses, unaffected individuals were considered uninformative (that is, of unknown phenotype). HLOD scores follow a complex statistical distribution, which can be approximated by the maximum of two independently distributed variables.
To obtain significance estimates for HLODs, these were first converted to a χ² statistic, where χ² = 2 ln(10) × HLOD, and significance values (P1) were then derived using the χ² distribution with one degree of freedom. The nominal P-value for the HLOD score is then given by 0.5[1 − (1 − P1)(1 − P1)] (17). Multipoint NPL analyses were performed using the Sall statistic generated by MERLIN. Results are reported in terms of an NPL statistic and its associated one-sided P-value. Under the null hypothesis of no linkage, the NPL statistic is distributed asymptotically as a standard normal random variable. For each analysis, we also calculated empirical genome-wide significance levels for the non-parametric NPL linkage statistics and LOD score (after markers in high LD were removed) using 10 000 simulations. At each iteration, we used ALLEGRO to simulate genotype data, using the original phenotypes, allele frequencies, marker spacing and missing data patterns. Genome-wide significance thresholds represent the statistic that could be achieved in our data by chance under the null hypothesis of no linkage at a frequency of one in 20 simulations, whereas genome-wide suggestive thresholds represent the values that could be obtained by chance once in every genome-wide scan. MERLIN was used to estimate IC for each chromosome provided by the marker set by use of the entropy information measure described by Kruglyak et al. (18).
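The two nominal P-value conversions described in this section can be written out as a short sketch. It assumes that P1 is the upper-tail probability of a χ² variable with one degree of freedom, and that the NPL P-value is the upper tail of a standard normal; the function names are illustrative and are not part of MERLIN or ALLEGRO. Both tails are evaluated with the standard-library complementary error function.

```python
import math


def npl_p_value(npl: float) -> float:
    """One-sided P-value for an NPL score treated as a standard normal deviate."""
    return 0.5 * math.erfc(npl / math.sqrt(2.0))


def hlod_p_value(hlod: float) -> float:
    """Nominal P-value for an HLOD score, following the formula in the text."""
    if hlod < 0.0:
        raise ValueError("HLOD scores are non-negative")
    chi2 = 2.0 * math.log(10.0) * hlod
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p1 = math.erfc(math.sqrt(chi2 / 2.0))
    return 0.5 * (1.0 - (1.0 - p1) * (1.0 - p1))


if __name__ == "__main__":
    for npl in (3.40, 3.16, 1.54):  # maximum NPL values reported in the Results
        print(f"NPL {npl:.2f} -> P = {npl_p_value(npl):.4f}")
    print(f"HLOD 3.10 -> nominal P = {hlod_p_value(3.10):.2e}")
```

With these definitions, the maximum NPL scores of 3.40, 3.16 and 1.54 round to the P-values of 0.0003, 0.0008 and 0.062 given in the Results.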
Familial risk of CRC attributable to linked regions
The familial risk of CRC attributable to linked regions in siblings, λs, was determined from allele sharing probabilities between affected relative pairs (19). Bootstrapping was employed to derive 95% confidence intervals for λs.
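A minimal sketch of this calculation is given below, assuming the single-locus relationship from Risch (19) in which λs equals 0.25 divided by the proportion of affected sibling pairs sharing no haplotypes at the locus, and assuming a simple percentile bootstrap over sibling pairs. The per-pair sharing probabilities are hypothetical values chosen to average 0.09, the estimate reported in the Results; the paper does not state the resampling unit or the number of replicates.

```python
import random
from typing import Sequence, Tuple


def sibling_relative_risk(z0: float) -> float:
    """Locus-specific sibling relative risk from the zero-sharing proportion."""
    if z0 <= 0.0:
        raise ValueError("zero-sharing proportion must be positive")
    return 0.25 / z0


def bootstrap_lambda_s(
    p_share0: Sequence[float],
    n_boot: int = 2000,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Point estimate and 95% percentile bootstrap interval for lambda_s.

    p_share0 holds, for each affected sibling pair, the estimated probability
    of sharing no haplotypes identical by descent at the locus.
    """
    rng = random.Random(seed)
    n = len(p_share0)
    point = sibling_relative_risk(sum(p_share0) / n)

    estimates = []
    for _ in range(n_boot):
        # Resample sibling pairs with replacement and recompute lambda_s.
        resample = [p_share0[rng.randrange(n)] for _ in range(n)]
        estimates.append(sibling_relative_risk(sum(resample) / n))
    estimates.sort()

    lower = estimates[int(0.025 * n_boot)]
    upper = estimates[int(0.975 * n_boot) - 1]
    return point, lower, upper


# Hypothetical per-pair probabilities (mean 0.09), for illustration only.
EXAMPLE_P_SHARE0 = [0.02, 0.05, 0.10, 0.20, 0.03, 0.15, 0.08, 0.12, 0.01, 0.14]

if __name__ == "__main__":
    point, lower, upper = bootstrap_lambda_s(EXAMPLE_P_SHARE0)
    print(f"lambda_s = {point:.1f} (95% CI {lower:.1f} to {upper:.1f})")
```

A zero-sharing proportion of 0.09 gives a point estimate of about 2.8, matching the value reported for the 3q21–q24 locus; the interval from the hypothetical data is not comparable to the published one.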
Mutation detection
A search for germline mutations in MBD4 and EPHB1 was undertaken by screening constitutional DNA samples from at least one member of each kindred. The search for pathogenic sequence changes in exons, intron-exon boundaries and untranslated regions of each gene was undertaken using a combination of direct sequencing and fluorescent single-stranded conformation polymorphism analysis (details of assays available on request). Mutations in K-ras (codons 12 and 13) and BRAF were detected in tumour samples using direct sequencing in forward and reverse orientations as previously described (20,21).
ACKNOWLEDGEMENTS
The authors are grateful to the patients, their families and clinicians for their participation in this study. The work was supported by grants from Cancer Research UK. Z.K. was funded by the EU, G.S. by Leukemia Research, E.P. by the Institute of Cancer Research and S.S. by CORE.
Conflict of Interest statement. None declared.
REFERENCES
1. Parkin, D., Whelan, S. and Ferlay, J. (2003) Cancer incidence in five continents, IARC, Lyon.
2. Bonaiti-Pellie, C. (1999) Genetic risk factors in colorectal cancer. Eur. J. Cancer Prev., 8 (Suppl. 1), S27–S32.
3. Lichtenstein, P., Holm, N.V., Verkasalo, P.K., Iliadou, A., Kaprio, J., Koskenvuo, M., Pukkala, E., Skytthe, A. and Hemminki, K. (2000) Environmental and heritable factors in the causation of cancer–analyses of cohorts of twins from Sweden, Denmark, and Finland. N. Engl. J. Med., 343, 78–85.
4. Johns, L.E. and Houlston, R.S. (2001) A systematic review and meta-analysis of familial colorectal cancer risk. Am. J. Gastroenterol., 96, 2992–3003.
5. Hemminki, K. and Chen, B. (2005) Familial risks for colorectal cancer show evidence on recessive inheritance. Int. J. Cancer, 115, 835–838.
6. Winawer, S.J., Zauber, A.G., Fletcher, R.H., Stillman, J.S., O’Brien, M.J., Levin, B., Smith, R.A., Lieberman, D.A., Burt, R.W., Levin, T.R. et al. (2006) Guidelines for colonoscopy surveillance after polypectomy: a consensus update by the US Multi-Society Task Force on Colorectal Cancer and the American Cancer Society. Gastroenterology, 130, 1872–1885.
7. Wiesner, G.L., Daley, D., Lewis, S., Ticknor, C., Platzer, P., Lutterbaugh, J., MacMillen, M., Baliner, B., Willis, J., Elston, R.C. et al. (2003) A subset of familial colorectal neoplasia kindreds linked to chromosome 9q22.2–31.2. Proc. Natl. Acad. Sci. USA, 100, 12961–12965.
8. Kemp, Z.E., Carvajal-Carmona, L.G., Barclay, E., Gorman, M., Martin, L., Wood, W., Rowan, A., Donohue, C., Spain, S., Jaeger, E. et al. (2006) Evidence of linkage to chromosome 9q22.33 in colorectal cancer kindreds from the United Kingdom. Cancer Res., 66, 5003–5006.
9. Matsuzaki, H., Loi, H., Dong, S., Tsai, Y.Y., Fang, J., Law, J., Di, X., Liu, W.M., Yang, G., Liu, G. et al. (2004) Parallel genotyping of over 10,000 SNPs using a one-primer assay on a high-density oligonucleotide array. Genome Res., 14, 414–425.
10. Abecasis, G.R., Cherny, S.S., Cookson, W.O. and Cardon, L.R. (2002) Merlin–rapid analysis of dense genetic maps using sparse gene flow trees. Nat. Genet., 30, 97–101.
11. Huang, Q., Shete, S. and Amos, C.I. (2004) Ignoring linkage disequilibrium among tightly linked markers induces false-positive evidence of linkage for affected sib pair analysis. Am. J. Hum. Genet., 75, 1106–1112.
12. Evans, D.M. and Cardon, L.R. (2004) Guidelines for genotyping in genomewide linkage studies: single-nucleotide-polymorphism maps versus microsatellite maps. Am. J. Hum. Genet., 75, 687–692.
13. Boyles, A.L., Scott, W.K., Martin, E.R., Schmidt, S., Li, Y.J., Ashley-Koch, A., Bass, M.P., Schmidt, M., Pericak-Vance, M.A., Speer, M.C. et al. (2005) Linkage disequilibrium inflates type I error rates in multipoint linkage analysis when parental genotypes are missing. Hum. Hered., 59, 220–227.
14. Webb, E.L., Sellick, G.S. and Houlston, R.S. (2005) SNPLINK: multipoint linkage analysis of densely distributed SNP data incorporating automated linkage disequilibrium removal. Bioinformatics, 21, 3060–3061.
15. Gudbjartsson, D.F., Jonasson, K., Frigge, M.L. and Kong, A. (2000) Allegro, a new computer program for multipoint linkage analysis. Nat. Genet., 25, 12–13.
16. Chen, C.D., Yen, M.F., Wang, W.M., Wong, J.M. and Chen, T.H. (2003) A case-cohort study for the disease natural history of adenoma-carcinoma and de novo carcinoma and surveillance of colon and rectum after polypectomy: implication for efficacy of colonoscopy. Br. J. Cancer, 88, 1866–1873.
17. Faraway, J.J. (1993) Distribution of the admixture test for the detection of linkage under heterogeneity. Genet. Epidemiol., 10, 75–83.
18. Kruglyak, L., Daly, M.J., Reeve-Daly, M.P. and Lander, E.S. (1996) Parametric and nonparametric linkage analysis: a unified multipoint approach. Am. J. Hum. Genet., 58, 1347–1363.
19. Risch, N. (1990) Linkage strategies for genetically complex traits. I. Multilocus models. Am. J. Hum. Genet., 46, 222–228.
20. Johnson, V., Lipton, L.R., Cummings, C., Eftekhar Sadat, A.T., Izatt, L., Hodgson, S.V., Talbot, I.C., Thomas, H.J., Silver, A.J. and Tomlinson, I.P. (2005) Analysis of somatic molecular changes, clinicopathological features, family history, and germline mutations in colorectal cancer families: evidence for efficient diagnosis of HNPCC and for the existence of distinct groups of non-HNPCC families. J. Med. Genet., 42, 756–762.
21. Rowan, A., Halford, S., Gaasenbeek, M., Kemp, Z., Sieber, O., Volikos, E., Douglas, E., Fiegler, H., Carter, N., Talbot, I. et al. (2005) Refining molecular analysis in the pathways of colorectal carcinogenesis. Clin. Gastroenterol. Hepatol., 3, 1115–1123.