RESEARCH ARTICLE
Lessons Learned from Whole Exome Sequencing in Multiplex Families Affected by a Complex Genetic Disorder, Intracranial Aneurysm
OPEN ACCESS
Citation: Farlow JL, Lin H, Sauerbeck L, Lai D, Koller DL, Pugh E, et al. (2015) Lessons Learned from Whole Exome Sequencing in Multiplex Families Affected by a Complex Genetic Disorder, Intracranial Aneurysm. PLoS ONE 10(3): e0121104. doi:10.1371/journal.pone.0121104
Academic Editor: Yun Li, University of North Carolina, UNITED STATES
Received: August 6, 2014
Accepted: February 10, 2015
Published: March 24, 2015
Copyright: © 2015 Farlow et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability Statement: The dataset, consisting of called variants, subject phenotypes, and pedigree information for the multiplex IA families, can be requested directly from the National Center for Biotechnology Information (NCBI) Database of Genotypes and Phenotypes (dbGaP) (http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000636.v1.p1) (accession phs000636). Mapped reads are available on the Sequence Read Archive (http://www.ncbi.nlm.nih.gov/sra) (accession SRX329208-SRX329252).
Janice L. Farlow1, Hai Lin1, Laura Sauerbeck2, Dongbing Lai1, Daniel L. Koller1, Elizabeth Pugh3, Kurt Hetrick3, Hua Ling3, Rachel Kleinloog4, Pieter van der Vlies5, Patrick Deelen5,6, Morris A. Swertz5,6, Bon H. Verweij4, Luca Regli4,7, Gabriel J. E. Rinkel4, Ynte M. Ruigrok4, Kimberly Doheny3, Yunlong Liu1, Joseph Broderick2, Tatiana Foroud1*, FIA Study Investigators¶ 1 Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, Indiana, United States of America, 2 Department of Neurology and Rehabilitation Medicine, University of Cincinnati School of Medicine, Cincinnati, Ohio, United States of America, 3 Center for Inherited Disease Research, Johns Hopkins University, Baltimore, Maryland, United States of America, 4 Department of Neurology and Neurosurgery, Brain Center Rudolf Magnus, University Medical Center Utrecht, Utrecht, the Netherlands, 5 Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands, 6 Genomics Coordination Center, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands, 7 Department of Neurosurgery, University Hospital Zurich, Zurich, Switzerland ¶ Membership of the FIA Study Investigators is provided in the Acknowledgements. *
Abstract
Genetic risk factors for intracranial aneurysm (IA) are not yet fully understood. Genome-wide association studies have been successful at identifying common variants; however, the role of rare variation in IA susceptibility has not been fully explored. In this study, we report the use of whole exome sequencing (WES) in seven densely affected families (45 individuals) recruited as part of the Familial Intracranial Aneurysm study. WES variants were prioritized by functional prediction, frequency, predicted pathogenicity, and segregation within families. Using these criteria, 68 variants in 68 genes were prioritized across the seven families. Of the prioritized genes that were expressed in IA tissue, one (TMEM132B) was differentially expressed in aneurysmal samples (n = 44) compared with control samples (n = 16) (false discovery rate adjusted p-value = 0.023). We demonstrate that sequencing densely affected families permits exploration of the role of rare variants in a relatively common disease such as IA, although there are important study design considerations when applying sequencing to complex disorders. We also explore methods of WES variant prioritization, including the incorporation of unaffected individuals, multipoint linkage analysis, biological pathway information, and transcriptome profiling. Further studies are needed to validate and characterize the variants and genes identified in this study.
PLOS ONE | DOI:10.1371/journal.pone.0121104 March 24, 2015
Lessons from Whole Exome Sequencing in Familial Intracranial Aneurysm
Funding: National Institutes of Health (www.nih.gov) R01NS39512 (JB); National Institutes of Health R03NS083468 (TF); National Institutes of Health Contract Numbers HHSN268201100011I, HHSN268200782096C, and HHSN268201200008I (KD); National Institutes of Health TL1 TR000162 (JLF); National Institutes of Health T32 GM077229 (JLF); George J. and Elizabeth Wile Neuroscience Research Endowment Fund. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing Interests: The authors have declared that no competing interests exist.
Introduction
Subarachnoid hemorrhage (SAH) is the most devastating subtype of stroke. Case fatality 21 days to one month after SAH ranges from 25–35% in high-income countries to almost 50% in low- to middle-income countries [1]. Approximately 80–90% of SAH cases are caused by rupture of intracranial aneurysms (IAs), which are present in approximately 3% of the population [2]. Smoking and hypertension are important risk factors, increasing the risk of IA rupture 3.1-fold and 2.6-fold, respectively [3]. The risk of IA and of IA rupture is also increased among individuals who have a first-degree relative with a history of IA [2, 4, 5]. The location and number of IAs in a given individual also appear to be influenced by family history [6]. Thus, several lines of evidence suggest that IA is due to both genetic and environmental risk factors. Until more is understood about these risk factors, the severe morbidity and mortality associated with this disease will continue to be a large public health burden.
Several approaches have been employed to identify genes contributing to IA. Initial studies used pedigrees with multiple affected members and detected linkage to several chromosomal regions (1p34.3–36.13 [7, 8], 4q32.2 [9], 6p23 [10], 7q11 [7], 7q36.3 [9], 8q12.1 [9], 11q24–25 [10–12], 12q21.33 [9], and 14q23–31 [12]); however, no causative gene was identified in any of these regions. More recently, genome-wide association studies (GWAS) have focused on common variants that might individually have a small effect on disease risk. These analyses have consistently detected association with single nucleotide polymorphisms in CDKN2BAS (also known as ANRIL) on chromosome 9p21.3 [13–15] and in SOX17 on chromosome 8q12.1 [13–15].
Association has also been reported with EDNRA on chromosome 4q31 [16], CNNM2 on chromosome 10q24 [14], KL/STARD13 on chromosome 13q13 [14], and RBBP8 on chromosome 18q11 [14]. Together, these genes explain only a fraction of the population attributable risk for IA. Advances in technology, especially high-throughput sequencing, now make it possible to search efficiently for rare variants with a large effect on disease risk. Such rare variants may point to novel genes and pathways that are critical for improving the molecular understanding of IA and for identifying those at greatest risk. In the present work, whole exome sequencing (WES) was applied to a unique set of families densely affected with IA to investigate the role of rare genetic variation in disease susceptibility and to demonstrate important study design considerations for WES studies of complex disease.
Materials and Methods
Families Selected for Whole Exome Sequencing
Individuals were recruited as part of the Familial Intracranial Aneurysm (FIA) Study [17]. Study approval was granted by institutional review boards at Indiana University, University of Cincinnati, Mayo Clinic, University of Medicine and Dentistry of New Jersey, Cleveland Clinic, Columbia University, University of Texas Houston, Indianapolis Neurosurgical Group, Goodman Campbell Brain and Spine, University of Western Ontario, University of Maryland, McGill University, University of Montreal, Notre Dame Hospital, Auckland UniServices, The George Institute, Royal Perth Hospital, Sir Charles Gairdner Hospital, Royal Adelaide Hospital, Royal Melbourne Hospital, Alfred Hospital, Royal North Shore Hospital, Westmead Hospital, Royal Prince Alfred Hospital, Northwestern University, University of Ottawa, University of Pittsburgh, Stanford University, University of California San Francisco, University of Southern California, University of Virginia, University of Washington, University of Manitoba, University of Alabama Birmingham, Allegheny General Hospital, Brigham and Women’s Hospital, Massachusetts General Hospital, University of Florida, Johns Hopkins University, University of Michigan, and Washington University in St. Louis. Written consent was obtained from all study participants. Families were recruited to ensure that DNA could be obtained from at least two living affected relatives and that the family would be informative for linkage analysis. Exclusion criteria included (i) a fusiform-shaped unruptured IA of a major intracranial trunk artery; (ii) an IA that is part of an arteriovenous malformation; (iii) a family or personal history of polycystic kidney disease, Ehlers-Danlos syndrome, Marfan’s syndrome, fibromuscular dysplasia, or moyamoya disease; or (iv) failure to obtain informed consent from the patient or family members. To identify unruptured IAs, magnetic resonance angiography (MRA) was offered to first-degree relatives of affected family members who were at higher risk of IA, defined as 1) age 30 years or older and 2) either a smoking history of 10 pack-years or an average blood pressure of 140 mmHg systolic or 90 mmHg diastolic. Only individuals with an IA documented by intra-arterial angiogram, operative report, or autopsy, or measuring 7 mm or greater on non-invasive imaging (MRA), were considered “definite” cases (Table 1). Two neurologists independently reviewed each record to determine whether a subject met all inclusion and exclusion criteria; in case of disagreement, a third neurologist reviewed the data. The seven families of European American descent with the highest density of affected individuals and available DNA were selected for WES [18] (Fig. 1). All affected individuals for whom sufficient DNA was available were selected for sequencing. Unaffected individuals were selected only if an MRA performed at age 45 years or older confirmed the absence of an IA and sufficient DNA was available.
One clinically unaffected individual in family E was assumed to be an obligate carrier and was sequenced with her offspring to allow confirmation of allele transmission. Within the seven families, 45 individuals were chosen for WES.
Table 1. Disease phenotypes.
Classification | Definition
Definite | Medical records document an intracranial aneurysm (IA) on angiogram, operative report, or autopsy, or a non-invasive imaging report (MRA, CTA) demonstrates an IA measuring 7 mm or greater.
Probable | A death certificate mentions probable IA without supporting documentation or autopsy; or a death certificate mentions subarachnoid hemorrhage (SAH) without mention of IA, and a phone screen is consistent with ruptured IA (severe headache or altered level of consciousness) rapidly leading to death; or an MRA documents an IA that is less than 7 mm but greater than 3 mm.
Possible | A non-invasive imaging report documents an aneurysm measuring between 2 and 3 mm; or SAH is noted on a death certificate without any supporting documentation, autopsy, or recording of headache or altered level of consciousness on phone screen; or a death certificate lists ‘aneurysm’ without specifying cerebral location or accompanying SAH.
Not a case | There is no supporting information for a possible IA.
doi:10.1371/journal.pone.0121104.t001
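The MRA screening rule, the selection rule for unaffected relatives, and the imaging-based rows of Table 1 are simple threshold rules. The Python sketch below encodes them to make the boundaries explicit. The function names are hypothetical, all thresholds are assumed to be inclusive ("30 years or older", "7 mm or greater"), and the death-certificate and phone-screen criteria of Table 1 are not modeled.

```python
from typing import Optional

# Hypothetical helpers illustrating the FIA screening rules and Table 1 size bands.
# Assumption: thresholds are inclusive ("30 years or older", "7 mm or greater").


def mra_offered(age_years: float, pack_years: float,
                systolic_mmhg: float, diastolic_mmhg: float) -> bool:
    """Return True if a first-degree relative meets the higher-risk definition."""
    smoker = pack_years >= 10
    hypertensive = systolic_mmhg >= 140 or diastolic_mmhg >= 90
    return age_years >= 30 and (smoker or hypertensive)


def unaffected_selectable(negative_mra_age: Optional[float], sufficient_dna: bool) -> bool:
    """An unaffected relative qualifies only with a negative MRA at age 45+ and enough DNA."""
    return negative_mra_age is not None and negative_mra_age >= 45 and sufficient_dna


def classify_by_imaging(size_mm: Optional[float], invasive: bool) -> str:
    """Map imaging evidence to a Table 1 category (death-certificate rules not modeled)."""
    if invasive:  # intra-arterial angiogram, operative report, or autopsy
        return "definite"
    if size_mm is None:  # no imaging measurement available
        return "not a case"
    if size_mm >= 7:
        return "definite"
    if size_mm > 3:
        return "probable"
    if size_mm >= 2:
        return "possible"
    return "not a case"


if __name__ == "__main__":
    print(mra_offered(52, 12, 128, 82))              # True: age >= 30 and >= 10 pack-years
    print(unaffected_selectable(40, True))           # False: negative MRA before age 45
    print(classify_by_imaging(5.0, invasive=False))  # probable
```

Keeping each rule as a separate predicate makes a boundary choice, such as whether exactly 3 mm counts as probable or possible (read here as possible, following the "greater than 3 mm" wording for probable), easy to inspect and change without touching the other rules.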
Whole Exome Sequencing
WES was performed at the Center for Inherited Disease Research (CIDR, Johns Hopkins University). Exonic sequences were captured using the Agilent SureSelect Human All Exon 50Mb kit, and paired-end sequencing was performed on the Illumina HiSeq 2000 system using Flowcell version 3 and TruSeq Cluster Kit version 3. All samples were also genotyped on the Illumina HumanOmniExpress-12v1_C platform for quality assurance. Two HapMap samples and two study duplicates were included to monitor library preparation batch quality.
Fig 1. Simplified pedigrees for the 7 whole exome sequencing families. Only sequenced individuals and those needed to preserve generational structure are shown, to protect the anonymity of the families. IA = intracranial aneurysm. All affected individuals have definite IA unless noted as probable IA, possible IA, or abdominal aortic aneurysm (AAA). Criteria for definite, probable, and possible IA status are outlined in Table 1. All unaffected individuals, with the exception of individual E-9, had an MRA that showed no evidence of an IA. Grey indicates an unknown phenotype. An ‘S’ above an individual denotes that the individual was selected for sequencing.
doi:10.1371/journal.pone.0121104.g001
Whole Exome Sequencing Bioinformatics
Primary analysis was performed using HiSeq Control Software and Real-Time Analysis software. The CIDRSeqSuite pipeline was used for secondary bioinformatics analysis, which consisted mainly of alignment to the human genome reference sequence (build hg19) using Burrows-Wheeler Aligner (BWA version 0.5.9) [19], followed by local realignment and base quality score recalibration with the Genome Analysis Toolkit (GATK version 1.0.4705) [20]. Duplicate molecules were flagged and mate-pair information synchronized using Picard (version 1.52, http://picard.sourceforge.net/). The GATK Unified Genotyper (GATK version 1.2–29) was used for multi-sample variant calling.
The dataset, consisting of called variants, subject phenotypes, and pedigree information for the multiplex IA families, can be requested directly from the National Center for Biotechnology Information (NCBI) Database of Genotypes and Phenotypes (dbGaP) (http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000636.v1.p1) (accession phs000636). Mapped reads are available on the Sequence Read Archive (http://www.ncbi.nlm.nih.gov/sra) (accession SRX329208-SRX329252).
GATK Variant Quality Score Recalibration (VQSR, GATK version 1.2–38) [21] was used to create a high-quality SNV call set; VQSR applies an adaptive error model that aggregates information across multiple quality metrics to estimate the likelihood that each call is true. As recommended by GATK, HapMap 3.3 and Illumina Omni 2.5M chip sites, available from GATK bundle 1.2, were used as training sets, and the annotations Quality by Depth, Haplotype Score, Mapping Quality Rank Sum, Read Position Rank Sum, Fisher Strand Bias Test, and Mapping Quality were used as quality metrics for the recalibration. The SNV filtering threshold was set so that 99% of the overlapping HapMap 3.3 sites were retained after VQSR. Insertions/deletions were removed if any of the following held: quality by depth < 2.0; ReadPosRankSum < -20.0 (Z-score from a Wilcoxon rank sum test of alternative versus reference read position bias); Fisher’s Strand Bias > 200.0 (phred-scaled p-value from Fisher’s exact test for strand bias); or a homopolymer run > 5.
ANNOVAR [22] was used to annotate variants for location, predicted effect on the protein across three gene databases (RefSeq, UCSC, and Ensembl), and corresponding gene and transcript length. Allele frequencies within European American populations in 1000 Genomes (February 2012 release, http://www.1000genomes.org) [23] and the Exome Sequencing Project (ESP) (ESP6500 release with insertion/deletions and chromosome X and Y calls, http://evs.gs.washington.edu/EVS/) [24] were recorded using custom scripts. The scripts matched variants to 1000 Genomes and ESP by chromosomal position and by reference and alternate alleles to determine allele frequencies. If a variant was not found in either 1000 Genomes or ESP, its alternate allele frequency was set to 0. If a variant was found in both 1000 Genomes and ESP, the smaller alternate allele frequency was taken as the consensus frequency. Variants were also annotated with binned minor allele frequencies from 290 samples without a known cardiovascular phenotype that had been exome sequenced at CIDR using identical capture and sequencing technology, although variants in these samples were called with SAMtools [25] rather than the GATK Unified Genotyper. Variants that were monomorphic across all samples were also flagged. In addition, variants were annotated using custom scripts with Gene Ontology (GO) (http://www.geneontology.org) [26] terms hypothesized to play a role in IA pathophysiology: GO:0001944 (vasculature development), GO:0001570 (vasculogenesis), GO:0003018 (vascular process in circulatory system), GO:0005581 (collagen), GO:0005604 (basement membrane), and GO:0051541 (elastin metabolic process).
Two programs were used to predict the pathogenicity of SNVs: SIFT [27] and PolyPhen-2 [28]. A SIFT prediction of damaging, or a PolyPhen-2 prediction of possibly or probably damaging, was accepted as evidence of pathogenicity. Two additional programs were used to predict the effect of insertions and deletions: SIFT-INDEL [29] for those that cause a frameshift and DDIG-in [30] for those that do not. Variants were also annotated with C-scores from the Combined Annotation Dependent Depletion (CADD) webserver (http://cadd.gs.washington.edu) [31]. C-scores of 10 or greater, corresponding to the 10% most deleterious substitutions in the human genome according to CADD, were considered damaging predictions.
Biological filtering retained loci if they: 1) were autosomal variants; 2) were predicted to be nonsynonymous SNVs or insertions/deletions in an exonic and/or splicing region (within 2 bp of a splicing junction, as annotated by ANNOVAR) based on RefSeq, UCSC, and Ensembl annotations; 3) had an allele frequency in European American populations