Quantitative Models for Predicting Mutations in Lynch Syndrome Genes
Sining Chen, PhD, David M. Euhus, MD, and Giovanni Parmigiani, PhD
Corresponding author
Giovanni Parmigiani, PhD
Department of Oncology, Johns Hopkins University School of Medicine and Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, 550 North Broadway, Suite 1103, Baltimore, MD 21205, USA.
E-mail: [email protected]
Current Colorectal Cancer Reports 2007, 3:206–211 Current Medicine Group LLC ISSN 1556-3790 Copyright © 2007 by Current Medicine Group LLC

As genotyping for Lynch syndrome has become widespread, more and more people are being counseled about whether to be genotyped for mutations in mismatch repair genes. Recently, a number of quantitative models have been developed to identify potential Lynch syndrome patients and serve as decision aids for patients at genetic counseling clinics. In contrast to existing clinical guidelines that give dichotomous classifications, these models provide a probability that a family or individual has Lynch syndrome. These models have been shown to be useful tools in identifying likely carriers of Lynch syndrome mutations. Correctly used, they have the potential to greatly improve the current diagnosis and management of Lynch syndrome families. To help clinicians and genetic counseling professionals understand the differences among these models and use the models wisely, we review the key features of each model and offer some guidelines on their use.

Introduction
Lynch syndrome is the most common form of hereditary colorectal cancer (CRC). Individuals affected with the syndrome have a high lifetime risk of developing cancer of the colon or rectum and increased risks for a number of other sites, including the endometrium [1]. Therefore it is critical to identify Lynch syndrome families and individuals for targeted prevention and early detection. Because of the lack of a distinguishing phenotype such as that of familial adenomatous polyposis, diagnosis of Lynch syndrome has been a long-standing challenge. In the 1990s, Lynch syndrome was linked to deleterious germline mutations in DNA mismatch repair (MMR) genes, mainly MLH1, MSH2, MSH6, and PMS2 [2–5].

Until molecular diagnosis became widely available in recent years, Lynch syndrome was diagnosed through family history. Widely used during this period were the Amsterdam criteria (AC):
• Three or more family members with a confirmed diagnosis of a cancer associated with Lynch syndrome, one of whom is a first-degree relative (parent, child, sibling) of the other two
• Two successive affected generations
• One or more cancers associated with Lynch syndrome diagnosed before age 50 years
• Familial adenomatous polyposis excluded

AC I includes CRC only [6]. AC II includes cancers of the colon and rectum, endometrium, small bowel, ureter, or renal pelvis [7]. There are variations regarding which tumors are considered to be associated with Lynch syndrome. Asian Lynch syndrome families, for example, show an excess of stomach cancers. Sometimes it is also required that all tumors are pathologically verified. However, these criteria are recognized to be too stringent (ie, insufficiently sensitive), as a significant portion of mutation carriers do not satisfy the AC [8]. Meanwhile, there are some AC-positive families for whom a deleterious mutation is not found.

It is important at this point to clarify the use of the terms “hereditary nonpolyposis colorectal cancer (HNPCC)” and “Lynch syndrome,” which have been widely perceived as interchangeable. “HNPCC” has historically referred to families with CRC that appeared to be autosomal dominant (eg, AC-positive families), whereas the definition of “Lynch syndrome” requires the carriage of a deleterious MMR mutation. As germline testing becomes widespread, the term “HNPCC” introduces much confusion, and some have suggested that it be phased out [9].

As microsatellite instability (MSI) was recognized in colorectal tumors as a signature of the defective MMR mechanism, the original and revised Bethesda guidelines (BG) were proposed to identify cancer patients whose tumors may exhibit MSI (Table 1) [10,11]. These guidelines involve a two-stage screening algorithm. The algorithm achieves high sensitivity because it is highly inclusive in the first stage. However, a significant portion of MSI-high tumors are sporadic, so the algorithm is associated with a fairly low specificity. Because MSI tests are expensive, this approach results in a high cost per mutation detected [12].

Table 1. Revised Bethesda Guidelines
Tumors should be tested for microsatellite instability when one or more of the following exist:
1. CRC diagnosed in a patient aged < 50 y
2. Presence of CRCs, synchronous (simultaneous) or metachronous (diagnosed at different times), or other tumors associated with Lynch syndrome,* regardless of age
3. CRC exhibiting MMR-associated histology† diagnosed in a patient aged < 60 y
4. CRC or other tumor associated with Lynch syndrome* diagnosed before age 50 y in at least one first-degree relative
5. CRC or other tumor associated with Lynch syndrome* diagnosed at any age in two first- or second-degree relatives
*Includes colorectal, endometrial, stomach, ovarian, pancreas, ureter and renal pelvis, biliary tract, and brain (usually glioblastoma as seen in Turcot syndrome) tumors; sebaceous gland adenomas; keratoacanthomas in Muir-Torre syndrome; and carcinoma of the small bowel.
† Presence of tumor-infiltrating lymphocytes, Crohn disease–like lymphocytic reaction, mucinous or signet-ring differentiation, or medullary growth pattern.
CRC: colorectal cancer; MMR: mismatch repair.

Recently a number of quantitative models have been developed to identify patients who may have Lynch syndrome. In contrast to the AC and BG, which are dichotomous, these models provide a probability that a family or individual has Lynch syndrome. They are intended for use in genetic counseling clinics and as decision aids for patients. Since different data sources and modeling approaches have been used to develop these models, their clinical application and the interpretation of the model outputs may differ. To help clinicians and genetic counseling professionals understand these differences and use the models wisely, we have reviewed the models and offer some guidelines for their use.
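To make the contrast with probability models concrete, a dichotomous guideline such as the revised BG (Table 1) reduces to a yes/no rule. The following is a minimal Python sketch of that rule, not a clinical tool. The names `Relative`, `Patient`, and `meets_revised_bethesda` are hypothetical, the sketch assumes the patient has CRC, and it assumes that each relative's tumor has already been judged to be Lynch-associated according to the tumor list in Table 1.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Relative:
    degree: int                      # 1 = first-degree, 2 = second-degree
    lynch_tumor_age: Optional[int]   # age at diagnosis of CRC or other Lynch-associated tumor; None if unaffected


@dataclass
class Patient:
    crc_age: int                         # age at CRC diagnosis (the sketch assumes the patient has CRC)
    multiple_lynch_tumors: bool = False  # synchronous/metachronous CRCs or other Lynch-associated tumors
    mmr_histology: bool = False          # MMR-associated histology in the CRC
    relatives: List[Relative] = field(default_factory=list)


def meets_revised_bethesda(p: Patient) -> bool:
    """Return True when at least one of the five Table 1 criteria holds."""
    affected = [r for r in p.relatives if r.lynch_tumor_age is not None]
    criteria = [
        p.crc_age < 50,                                                   # criterion 1
        p.multiple_lynch_tumors,                                          # criterion 2
        p.mmr_histology and p.crc_age < 60,                               # criterion 3
        any(r.degree == 1 and r.lynch_tumor_age < 50 for r in affected),  # criterion 4
        sum(1 for r in affected if r.degree in (1, 2)) >= 2,              # criterion 5
    ]
    return any(criteria)


# Illustrative, made-up patients
print(meets_revised_bethesda(Patient(crc_age=45)))
print(meets_revised_bethesda(Patient(crc_age=62, relatives=[Relative(1, 47)])))
```

The output is a single boolean: a patient who barely misses every criterion and one who meets none by a wide margin receive the same answer, which is the limitation that probability-based models aim to address.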
Models
To identify all relevant models, we searched PubMed using all or subsets of the following keywords: “Lynch syndrome,” “HNPCC,” “MLH1,” “MSH2,” “risk,” “probability,” and “model.” When the search yielded a relevant paper, we also searched its references and citations. All identified models were developed to predict the probability that a family or individual has Lynch syndrome based primarily on family history information. We identified seven models, which can be broadly categorized as “empirical models” and “Mendelian models.”

Empirical models use statistical learning techniques (eg, logistic regression) to summarize the relationship between a set of family history features and mutation analysis results. Therefore a genotyped training set is required. The following models in this review are empirical models: the Wijnen model [13], the Amsterdam-plus model [14•], the MMRpredict model [15•], the PREMM1,2 model [16•], and the Myriad table [17•]. Germline mutation analysis techniques for MMR genes have evolved over time, so the training sets for some models have incorrectly classified some mutation carriers as noncarriers. For example, genomic deletions and rearrangements were not yet screened for when the Wijnen model was developed, and MSH6 mutations were not assessed when the PREMM1,2 model was developed; therefore families with those mutations would not have been classified as Lynch syndrome families. This has implications for the predictive performance of these models.

In contrast, Mendelian models predict the presence of a deleterious mutation based on Mendel’s law and on the genotype-phenotype relationship. Therefore they do not require a training sample and are not affected by the nature of mutation analysis. The Mendelian models in this review are the MMRpro model [18•] and the AIFEG model [19•].

Table 2 summarizes important features of the models we identified. In the following paragraphs we also address model development (for empirical models, this includes information on the training set and mutation analysis strategy), input and output, external validation (includes validation set, the mutation analysis performed, and validation results), and URL if available. When validating prediction models, two complementary qualities are often examined: calibration, the accuracy in predicting the proportion of mutations in a population, which is often measured by the observed-versus-expected ratio (O/E ratio); and refinement or discriminatory ability, measured by the area under the receiver operating characteristic (ROC) curve (AUC). It is worth noting that a fraction of the mutations found in all studies are missense mutations whose pathogenicity is not known. These mutations were typically excluded from the training and validation datasets of these models.

Wijnen model
The Wijnen model, also referred to as the Leiden model, is a multivariate, logistic regression–based empirical model [13]. It predicts whether a family, rather than an individual, harbors an MMR mutation. The training set comprised 184 families from the Netherlands and Norway, half of which satisfied AC I. Using a denaturing gradient gel electrophoresis (DGGE)–based technique, 47 families were found to harbor a deleterious mutation in either MLH1 or MSH2. DGGE cannot detect genomic deletions or rearrangements, and MSH6 mutations were not screened for. The input variables are fulfillment of AC I, mean age at CRC diagnosis in the family, and the presence of any endometrial cancer (EC) in the family. The output is the probability of testing positive for either an MLH1 or an MSH2 mutation. No validation was performed in the original paper.
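As an illustration of how a logistic regression–based model of this kind turns family features into a probability, and of how calibration (O/E ratio) and discrimination (AUC) would be computed on a validation set, the following Python sketch may help. It is written under stated assumptions: the function names are hypothetical, the coefficients are placeholders chosen only for illustration and are not the published Wijnen estimates, and the four families are made up.

```python
import math
from typing import Sequence


def logistic_probability(intercept: float, coefs: Sequence[float],
                         features: Sequence[float]) -> float:
    """Logistic form: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    if len(coefs) != len(features):
        raise ValueError("coefs and features must have the same length")
    linear = intercept + sum(b * x for b, x in zip(coefs, features))
    if linear >= 0:
        return 1.0 / (1.0 + math.exp(-linear))
    e = math.exp(linear)  # numerically stable branch for negative values
    return e / (1.0 + e)


def oe_ratio(observed: Sequence[int], predicted: Sequence[float]) -> float:
    """Calibration: observed carriers divided by expected carriers
    (the sum of the predicted probabilities)."""
    expected = sum(predicted)
    if expected == 0:
        raise ValueError("expected number of carriers is zero")
    return sum(observed) / expected


def auc(observed: Sequence[int], predicted: Sequence[float]) -> float:
    """Discrimination: probability that a randomly chosen carrier gets a higher
    prediction than a randomly chosen noncarrier (ties count one half)."""
    pos = [p for y, p in zip(observed, predicted) if y == 1]
    neg = [p for y, p in zip(observed, predicted) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one carrier and one noncarrier")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Placeholder coefficients (ASSUMPTION, not the published Wijnen estimates):
# AC I fulfilled (0/1), mean age at CRC diagnosis (years), any EC in family (0/1)
PLACEHOLDER_INTERCEPT = 1.0
PLACEHOLDER_COEFS = (1.5, -0.05, 1.0)

# Made-up families: (AC I fulfilled, mean CRC age, any EC, observed carrier status)
toy_families = [
    (1, 42.0, 1, 1),
    (1, 48.0, 0, 1),
    (0, 55.0, 1, 0),
    (0, 63.0, 0, 0),
]
predicted = [logistic_probability(PLACEHOLDER_INTERCEPT, PLACEHOLDER_COEFS, f[:3])
             for f in toy_families]
observed = [f[3] for f in toy_families]
print("O/E ratio:", round(oe_ratio(observed, predicted), 2))
print("AUC:", auc(observed, predicted))
```

An O/E ratio near 1 indicates that the predicted number of carriers matches the observed number, while an AUC near 1 indicates that carriers are consistently ranked above noncarriers; a model can do well on one measure and poorly on the other, which is why both are reported.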
Table 2. Important features of the models [cell values not recoverable; row labels include applicability, calculation, deletions and rearrangements, inclusion of MSH6, handling of “no mutation found” or VUS results, provision of future risks, CRC age, endometrial cancer, endometrial cancer age of onset, extracolonic tumors, multiple primaries, adenomas, side of colon, gender, and MSI status]