letters
© 2014 Nature America, Inc. All rights reserved.
A large-scale screen for coding variants predisposing to psoriasis

Huayang Tang1,18, Xin Jin2,3,18, Yang Li1,18, Hui Jiang2,18, Xianfa Tang1,18, Xu Yang2, Hui Cheng1, Ying Qiu4, Gang Chen1, Junpu Mei2, Fusheng Zhou1, Renhua Wu2, Xianbo Zuo1, Yong Zhang2, Xiaodong Zheng1, Qi Cai5, Xianyong Yin1, Cheng Quan1, Haojing Shao2, Yong Cui1, Fangzhen Tian4, Xia Zhao2, Hong Liu6, Fengli Xiao1, Fengping Xu2, Jianwen Han7, Dongmei Shi4, Anping Zhang1, Cheng Zhou8, Qibin Li2, Xing Fan1, Liya Lin2, Hongqing Tian6, Zaixing Wang1, Huiling Fu2, Fang Wang8, Baoqi Yang6, Shaowei Huang2, Bo Liang1, Xuefeng Xie2, Yunqing Ren1, Qingquan Gu2, Guangdong Wen8, Yulin Sun9, Xueli Wu2, Lin Dang10, Min Xia2, Junjun Shan11, Tianhang Li4, Lin Yang2, Xiuyun Zhang4, Yuzhen Li10, Chundi He12, Aie Xu11, Liping Wei13, Xiaohang Zhao9, Xinghua Gao12, Jinhua Xu14, Furen Zhang6, Jianzhong Zhang8, Yingrui Li2, Liangdan Sun1, Jianjun Liu1, Runsheng Chen15, Sen Yang1, Jun Wang2,16,17 & Xuejun Zhang1,14

To explore the contribution of functional coding variants to psoriasis, we analyzed nonsynonymous single-nucleotide variants (SNVs) across the genome by exome sequencing in 781 psoriasis cases and 676 controls and through follow-up validation in 1,326 candidate genes by targeted sequencing in 9,946 psoriasis cases and 9,906 controls from the Chinese population. We discovered two independent missense SNVs in IL23R and GJB2 of low frequency and five common missense SNVs in LCE3D, ERAP1, CARD14 and ZNF816A associated with psoriasis at genome-wide significance. Rare missense SNVs in FUT2 and TARBP1 were also observed with suggestive evidence of association. Single-variant and gene-based association analyses of nonsynonymous SNVs did not identify newly associated genes for psoriasis in the regions subjected to targeted resequencing. This suggests that coding variants in the 1,326 targeted genes contribute only a limited fraction of the overall genetic risk for psoriasis.

Genome-wide association studies (GWAS) have identified numerous risk-associated variants within 44 susceptibility loci for psoriasis1–13. However, functional coding variants in these susceptibility loci, particularly low-frequency and rare variants, have not been systematically investigated14. Coding variants associated with common autoimmune diseases have been reported, including variants in IFIH1 (ref. 15), IL23R16,17, IL2RA18, IL2RB18 and NOD2 (ref. 19), and coding variants in CARD14 have been described for psoriasis2,20, but efforts to identify causal coding variants remain very limited. Next-generation sequencing technologies, such as exome sequencing, have made the systematic investigation of coding variants possible21. To investigate the contribution of functional coding variants to genetic susceptibility for psoriasis, we carried out a large-scale sequencing analysis of functional coding variants in 2 independent samples comprising 21,309 individuals of Han Chinese ancestry (Supplementary Fig. 1). In the discovery stage, we performed exome sequencing of 781 psoriasis cases and 676 controls (Supplementary Tables 1 and 2) with sequence coverage of ~34× (Supplementary Fig. 2a), and we identified 518,308 SNVs (Supplementary Tables 3 and 4a). Of these variants, 20.62% were nonsynonymous, and 68.13% were rare (defined as having a minor allele frequency (MAF) of 0.1% to single-variant association analysis. After excluding SNVs within the major histocompatibility complex (MHC) region, we found 133 nonsynonymous SNVs with suggestive evidence of association (P < 0.05) (Supplementary Table 6). We also performed
1Department of Dermatology, First Affiliated Hospital, Anhui Medical University, Hefei, China. 2BGI-Shenzhen, Shenzhen, China. 3School of Bioscience and Bioengineering, South China University of Technology, Guangzhou, China. 4Department of Dermatology, Jining No. 1 People’s Hospital, Jining, Shandong, China. 5Department of Dermatology, Second Hospital, Chengdu, China. 6Shandong Provincial Institute of Dermatology and Venereology, Shandong Academy of Medical Science, Jinan, China. 7Department of Dermatology, Affiliated Hospital of Inner Mongolia Medical College, Huhehot, China. 8Department of Dermatology, Peking University People’s Hospital, Beijing, China. 9State Key Laboratory of Molecular Oncology, Cancer Institute & Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College and Center of Basic Medical Sciences, Navy General Hospital, Beijing, China. 10Department of Dermatology, Second Affiliated Hospital of Harbin Medical University, Harbin, China. 11Department of Dermatology, Third People’s Hospital of Hangzhou, Hangzhou, China. 12Department of Dermatology, No. 1 Hospital of China Medical University, Shenyang, China. 13School of Life Sciences, Peking University, Beijing, China. 14Department of Dermatology, Huashan Hospital of Fudan University, Shanghai, China. 15Institute of Biophysics of the Chinese Academy of Sciences, Beijing, China. 16Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark. 17Department of Biology, University of Copenhagen, Copenhagen, Denmark. 18These authors contributed equally to this work. Correspondence should be addressed to Xuejun Zhang ([email protected]) or J.W. ([email protected]).

Received 7 June; accepted 17 October; published online 10 November 2013; doi:10.1038/ng.2827
Nature Genetics VOLUME 46 | NUMBER 1 | JANUARY 2014
Table 1 Association results and predicted functionality of common and low-frequency nonsynonymous variants discovered in this study

| Chr. | Gene | Variant ID (hg18) | Functional alteration | Stage^a | Allele | MAF cases | MAF controls | P value | OR (95% CI) | Combined P value | Combined OR (95% CI) | PolyPhen-2 | SIFT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1p31.3 | IL23R | Chr. 1: 67,421,184 | c.530G>A; p.Gly149Arg | 1 | A/G | 0.0295 | 0.0578 | 1.70 × 10−4 | 0.50 (0.34–0.72) | 1.94 × 10−11 | 0.72 (0.65–0.79) | Probably damaging | Damaging |
| | | | | 2 | A/G | 0.0362 | 0.0483 | 3.32 × 10−9 | 0.74 (0.67–0.82) | | | | |
| 1q21.3 | LCE3D | rs512208 | c.185C>A; p.Gly43Val | 1 | A/C | 0.3260 | 0.3031 | 1.89 × 10−1 | 1.11 (0.95–1.30) | 2.92 × 10−23 | 1.24 (1.19–1.29) | Benign | Damaging |
| | | | | 2 | A/C | 0.3232 | 0.2763 | 1.75 × 10−23 | 1.25 (1.20–1.31) | | | | |
| 5q15 | ERAP1 | rs27044 | c.2535C>G; p.Gln730Glu | 1 | C/G | 0.4717 | 0.4768 | 7.84 × 10−1 | 0.98 (0.85–1.13) | 2.16 × 10−14 | 0.86 (0.83–0.89) | Benign | Tolerated |
| | | | | 2 | C/G | 0.4765 | 0.5165 | 3.31 × 10−15 | 0.85 (0.82–0.89) | | | | |
| 5q15 | ERAP1 | rs26653 | c.727G>C; p.Arg127Pro | 1 | G/C | 0.4106 | 0.4164 | 7.55 × 10−1 | 0.98 (0.84–1.14) | 5.27 × 10−12 | 0.87 (0.84–0.91) | Benign | Tolerated |
| | | | | 2 | G/C | 0.4199 | 0.4555 | 1.32 × 10−12 | 0.87 (0.83–0.90) | | | | |
| 13q11-12 | GJB2 | rs72474224 | c.324C>T; p.Val37Ile | 1 | T/C | 0.0795 | 0.0519 | 2.88 × 10−3 | 1.58 (1.17–2.14) | 7.46 × 10−11 | 1.34 (1.23–1.46) | Probably damaging | Damaging |
| | | | | 2 | T/C | 0.0566 | 0.0435 | 3.72 × 10−9 | 1.32 (1.20–1.45) | | | | |
| 19q13.41 | ZNF816A | rs12459008 | c.590T>A; p.Ile80Asn | 1 | T/A | 0.2665 | 0.3459 | 2.70 × 10−5 | 0.69 (0.58–0.82) | 2.25 × 10−9 | 0.88 (0.84–0.92) | Benign | Damaging |
| | | | | 2 | T/A | 0.3309 | 0.3565 | 2.26 × 10−7 | 0.89 (0.86–0.93) | | | | |
| 19q13.3 | FUT2 | rs1047781 | c.418A>T; p.Ile140Phe | 1 | T/A | 0.4533 | 0.4334 | 2.83 × 10−1 | 1.08 (0.94–1.26) | 2.78 × 10−6 | 1.10 (1.06–1.14) | Probably damaging | Tolerated |
| | | | | 2 | T/A | 0.4731 | 0.4492 | 4.22 × 10−6 | 1.10 (1.03–1.15) | | | | |

Chr., chromosome; MAF, minor allele frequency; CI, confidence interval. Alleles are shown as minor allele/major allele. ^aStage 1, exome sequencing in 1,457 individuals (781 cases and 676 controls); stage 2, targeted sequencing in 19,852 individuals (9,946 cases and 9,906 controls).
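As a rough check on how the odds ratios in Table 1 relate to the tabulated allele frequencies, the sketch below computes an allelic OR with a Woolf (log-scale) 95% confidence interval. It is not necessarily the authors' method: the function name is hypothetical, and allele counts are reconstructed as MAF × 2N on the assumption that every individual in a stage was successfully genotyped at the variant.

```python
import math


def allelic_odds_ratio(maf_cases, maf_controls, n_cases, n_controls, z=1.96):
    """Allelic odds ratio with a Woolf 95% CI from minor allele frequencies.

    Assumption: allele counts are approximated as MAF * 2N, i.e. all
    individuals were genotyped (per-variant call rates are not given).
    """
    a = maf_cases * 2 * n_cases              # minor alleles in cases
    b = (1 - maf_cases) * 2 * n_cases        # major alleles in cases
    c = maf_controls * 2 * n_controls        # minor alleles in controls
    d = (1 - maf_controls) * 2 * n_controls  # major alleles in controls
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under Woolf's approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Stage 1 values for the IL23R variant chr. 1: 67,421,184 (c.530G>A).
or_, lo, hi = allelic_odds_ratio(0.0295, 0.0578, n_cases=781, n_controls=676)
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Under these assumptions the stage 1 IL23R values agree, to two decimal places, with the OR and interval reported in Table 1, which suggests the tabulated ORs are per-allele effects.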
gene-based association analysis of 14,578 genes using nonsynonymous SNVs with MAF < 5% (Online Methods), identifying 742 genes with association P < 0.05. However, none of these genes achieved genome-wide significance after Bonferroni correction for multiple testing (P < 1.3 × 10−6 for single-variant analysis and P < 3.4 × 10−6 for gene-based analysis) (Supplementary Fig. 3 and Supplementary Table 7).

Recognizing that our sample size and, thus, the power of the discovery analysis by exome sequencing were limited, we performed 2 independent studies in a large sample of 9,946 psoriasis cases and 9,906 controls using targeted sequencing (Online Methods). First, we carried out a comprehensive sequencing analysis of 622 known immune disease–related genes (including 57 genes in 44 psoriasis susceptibility loci identified by GWAS) (Supplementary Tables 8 and 9) listed in the GWAS catalog. Second, we carried out a validation study by sequencing the 133 top nonsynonymous SNVs and 742 top genes. A total of 3.2 Mb of coding sequence surrounding the targeted regions of 1,326 genes (covering 133 SNVs, 742 top genes and 622 immune disease–related genes) was captured and sequenced with ~64× coverage (Supplementary Fig. 2b and Supplementary Table 2). In total, we identified 82,387 nonsynonymous SNVs (Supplementary Tables 3 and 4b), of which 97.07% were rare (Supplementary Table 5b). Quality control analyses for SNVs are summarized in the Online Methods and in Supplementary Table 10.

Using single-variant association analysis for nonsynonymous SNVs (MAF > 0.1%) in the 622 known immune disease–associated genes and the combined data set from 10,727 psoriasis cases and 10,582 controls, we discovered 6 common and low-frequency nonsynonymous SNVs that showed consistent association with disease in the 2 independent samples of exome and targeted sequencing (Supplementary Table 11) and achieved genome-wide significance: chr. 1: 67,421,184 (c.530G>A) at IL23R (Pcombined = 1.94 × 10−11, odds ratio (OR) = 0.72), rs72474224 (c.324C>T) at GJB2 (Pcombined = 7.46 × 10−11, OR = 1.34), rs512208 (c.185C>A) at LCE3D (Pcombined = 2.92 × 10−23, OR = 1.24), rs27044 (c.2535C>G) and rs26653 (c.727G>C) at ERAP1 (Pcombined = 2.16 × 10−14, OR = 0.86 and Pcombined = 5.27 × 10−12, OR = 0.87, respectively) and rs12459008 (c.590T>A) at ZNF816A (Pcombined = 2.25 × 10−9, OR = 0.88) (Table 1). One common missense SNV, rs1047781 (c.418A>T), in FUT2 was also identified with suggestive evidence of association (Pcombined = 2.78 × 10−6, OR = 1.10) (Table 1). Within the 622 immune disease–relevant genes, only 2 rare missense SNVs (chr. 19: 53,898,296 (c.271C>T) and chr. 19: 53,898,541 (c.516C>G)) in FUT2 were observed with suggestive evidence of association (Pcombined = 5.51 × 10−5, OR = 3.61 and Pcombined = 2.92 × 10−3, OR = 6.92, respectively) (Table 2). Conditional and haplotype analyses indicated that chr. 19: 53,898,296 (c.271C>T) showed independent risk effect (Pconditional = 8.45 × 10−5, OR = 4.03) from rs1047781 (c.418A>T) within FUT2 (Fig. 1a and Supplementary Table 12).

We also investigated six previously reported nonsynonymous SNVs in CARD14 (rs11652075)2,20, IL23R (rs11209026 and rs7530511)22, ERAP1 (rs30187)4, TYK2 (rs34536443)2 and TRAF3IP2 (rs33980500)2 conferring psoriasis risk in populations of European ancestry. We replicated the association at rs11652075 in CARD14 (P = 3.46 × 10−9, OR = 0.91) and also confirmed
Table 2 Association analysis of rare missense SNVs by Fisher’s exact test

| Chr. | Gene | SNV (hg18) | Allele | Functional alteration | Stage^a | MAF cases | MAF controls | P value | OR | Combined MAF cases | Combined MAF controls | Combined P value | Combined OR (95% CI) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | TARBP1 | Chr. 1: 232,631,799 | A/G | c.2857G>A; p.Val953Met | 1 | 0 | 0.00149 | 2.14 × 10−1 | 0.00 | 0.00028 | 0.00157 | 1.28 × 10−5 | 0.18 (0.08–0.43) |
| | | | | | 2 | 0.00031 | 0.00158 | 3.81 × 10−5 | 0.19 | | | | |
| 1 | TARBP1 | Chr. 1: 232,649,343 | A/G | c.1963G>A; p.Gly655Arg | 1 | 0 | 0.00149 | 2.15 × 10−1 | 0.00 | 0.00047 | 0.00183 | 4.19 × 10−5 | 0.26 (0.13–0.52) |
| | | | | | 2 | 0.00051 | 0.00185 | 1.16 × 10−4 | 0.28 | | | | |
| 19 | FUT2 | Chr. 19: 53,898,296 | T/C | c.271C>T; p.Arg91Trp | 1 | 0.00418 | 0 | 3.18 × 10−2 | NA | 0.00192 | 0.00053 | 5.51 × 10−5 | 3.61 (1.85–7.04) |
| | | | | | 2 | 0.00175 | 0.00057 | 6.32 × 10−4 | 3.08 | | | | |
| 19 | FUT2 | Chr. 19: 53,898,541 | G/C | c.516C>G; p.Asp172Glu | 1 | 0.00199 | 0.00074 | 3.73 × 10−1 | 2.69 | 0.00066 | 0.00010 | 2.92 × 10−3 | 6.92 (1.57–30.45) |
| | | | | | 2 | 0.00056 | 0.00005 | 3.99 × 10−3 | 10.95 | | | | |

Chr., chromosome; MAF, minor allele frequency; CI, confidence interval; NA, not available. Alleles are shown as minor allele/major allele. ^aStage 1, exome sequencing in 1,457 individuals (781 cases and 676 controls); stage 2, targeted sequencing in 19,852 individuals (9,946 cases and 9,906 controls).
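To make the rare-variant test in Table 2 concrete, the following is a minimal pure-Python sketch of a two-sided Fisher's exact test on a 2 × 2 table of allele counts. The function name is hypothetical and the implementation is illustrative rather than the software the authors used. The example counts are an assumption: they are rebuilt from the stage 1 MAFs for TARBP1 (0 minor alleles among 2 × 781 case chromosomes and about 2 among 2 × 676 control chromosomes).

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]].

    Rows are groups (cases, controls); columns are minor and major alleles.
    The P value sums the hypergeometric probabilities of all tables with the
    same margins that are no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability of x minor alleles in row 1 given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    x_min = max(0, col1 - row2)
    x_max = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(x) for x in range(x_min, x_max + 1))
        if p <= p_obs * (1 + 1e-9)
    )


# Assumed stage 1 allele counts for a TARBP1 variant:
# cases 0 minor / 1,562 major; controls 2 minor / 1,350 major.
p_value = fisher_exact_two_sided(0, 1562, 2, 1350)
print(f"P = {p_value:.3f}")
```

With these assumed counts the P value is close to the stage 1 values listed for the two TARBP1 variants, which illustrates why 1,457 exomes alone carry little power for variants this rare.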
evidence of association at rs33980500 (TRAF3IP2) (P = 5.32 × 10−3, OR = 1.69). rs11209026 (IL23R) and rs34536443 (TYK2) were found to be monomorphic in our samples, which is consistent with 1000 Genomes Project data. However, rs7530511 (IL23R) and rs30187 (ERAP1) did not show association with psoriasis in our study (P > 0.05) (Supplementary Table 13).

We performed conditional and haplotype analyses to evaluate whether these significantly associated nonsynonymous SNVs were independent of the established GWAS-identified SNPs in the Chinese population6,10, including rs4085613 (LCE3D), rs3751385 (GJB2), rs3762318 (IL23R), rs151823 (ERAP1) and rs9304742 (ZNF816A) (Table 3 and Supplementary Table 14). Conditioning on the GWAS-identified SNPs had limited impact on the associations at rs72474224 (c.324C>T) (Pconditional = 1.88 × 10−2, OR = 1.19) in GJB2 and at chr. 1: 67,421,184 (c.530G>A) (Pconditional = 2.31 × 10−3, OR = 0.70) in IL23R (Table 3). Further haplotype analysis suggested that the minor alleles at chr. 1: 67,421,184 and rs3762318 in IL23R were on different haplotypes and carried opposite directions of effect, further supporting their independence from each other (Fig. 1b). For GJB2, taking the wild-type CC haplotype as the reference, the TT haplotype (OR = 1.36) carrying the risk-associated variants of both rs3751385 and rs72474224 showed stronger effect on psoriasis risk than the TC haplotype (OR = 1.14) that only carries the risk-associated T allele at rs3751385 (Supplementary Table 15). These findings suggested that the T allele at rs72474224 confers additional risk of psoriasis beyond that conferred by the risk-associated T allele at rs3751385 in GJB2 (Fig. 1c). Furthermore, a heterogeneity test for the two haplotypes (TT versus TC) showed that the TT haplotype had significantly greater effect on psoriasis risk than the TC haplotype (Phet = 0.022, OR = 1.18) (Supplementary Table 15). Taken together, our results suggest that chr. 1: 67,421,184 (c.530G>A) in IL23R and rs72474224 (c.324C>T) in GJB2 are independent risk-associated variants for psoriasis.

We also performed validation analysis for the top 133 nonsynonymous SNVs (Supplementary Table 16) and additional nonsynonymous SNVs (MAF > 0.1%) in the top 742 genes from exome sequencing in the same combined data set of 10,727 psoriasis cases and 10,582 controls. We did not discover any nonsynonymous SNVs in new genes with genome-wide significant association by single-variant association analysis (Supplementary Fig. 4). However, two protective rare missense SNVs in TARBP1, chr. 1: 232,631,799 (c.2857G>A) and chr. 1: 232,649,343 (c.1963G>A), which were predicted to be deleterious by SIFT23 and/or PolyPhen24, showed enrichment in controls (ORs of 0.18 and 0.26, respectively) with suggestive evidence of association (Pcombined = 1.28 × 10−5 and Pcombined = 4.19 × 10−5) (Table 2). Four haplotypes were observed that included these two rare SNVs: three of these were rare haplotypes (AA, GA and AG) and had similar effects on disease risk (OR = 0.18–0.26) (Fig. 1d).

In the gene-based association analyses of rare nonsynonymous SNVs in the top 742 genes and 622 known immune disease–related genes (Online Methods), only IL23R (Pmin = 3.31 × 10−8, where Pmin is the minimum P value across the gene-based tests) and GJB2 (Pmin = 6.97 × 10−7) achieved significance in the targeted sequencing samples of 9,946 psoriasis cases and 9,906 controls (Supplementary Tables 7–9). However, after removing these low-frequency SNVs (rs72474224 at GJB2 and chr. 1: 67,421,184 at IL23R, respectively) from gene-based analysis, the two genes no longer showed significant association with psoriasis (Supplementary Table 17). The most significant P value across gene-based tests for TARBP1 was 4.26 × 10−4 (Supplementary Table 7).

In the current study, we have performed a comprehensive association analysis of the nonsynonymous SNVs in 622 immune disease–related genes using 10,727 psoriasis cases and 10,582 controls, and we identified 7 common and low-frequency nonsynonymous SNVs within known psoriasis susceptibility genes that were associated with psoriasis risk. Of these seven coding SNVs, we showed that chr. 1: 67,421,184 (c.530G>A) in IL23R and rs72474224 (c.324C>T) in GJB2 were independent of the GWAS-identified SNPs through conditional and haplotype analyses. Intriguingly, chr. 1: 67,421,184 (c.530G>A) in IL23R has been reported to be associated with inflammatory bowel disease (IBD) in European populations16,17, a finding that is consistent with the association of rs11209026 (c.1142G>A) in IL23R with both psoriasis4 and IBD25. However, the MAF of chr. 1: 67,421,184 (c.530G>A) was much lower in individuals of European ancestry (MAF in controls of ~0.004) than in individuals of Chinese ancestry (MAF in controls of ~0.05).

GJB2 encodes subunits of a hexamer that forms the connexin-26 gap junction channel26. The missense SNV rs72474224 (c.324C>T) causes an alteration at amino acid 37 (p.Val37Ile) of the GJB2 protein, affecting a residue that appears to orientate toward the pore of the channel (Supplementary Fig. 5). Although the chemical properties of valine and isoleucine are similar,
Figure 1 Haplotype analyses in FUT2, IL23R, GJB2 and TARBP1. Orange dots represent the minor alleles of low-frequency or rare variants, and blue dots represent the minor alleles of common variants. (a) In total, 9,705 cases and 9,641 controls were available for haplotype analysis on chr. 19: 53,898,296 (c.271C>T), rs1047781 (c.418A>T) and chr. 19: 53,898,541 (c.516C>G) located in exon 2 of FUT2. Six haplotypes were observed. The distribution of the TAC independent rare haplotype between cases and controls was significant, which suggests that chr. 19: 53,898,296 (c.271C>T) affects risk independently. (b) In total, 2,304 cases and 3,311 controls were available for haplotype analysis on intergenic rs3762318 (C/T) and low-frequency missense SNV chr. 1: 67,421,184 (c.530G>A) located in exon 4 of IL23R. The CG haplotype (carrying risk allele C at rs3762318) and the TA haplotype (carrying the minor, protective A allele at c.530G>A) were independent of each other and had opposite effects on psoriasis risk. (c) In total, 4,693 cases and 3,262 controls were available for haplotype analysis in GJB2. The GWAS SNP rs3751385 (T/C) is located in the 3´ UTR, and the low-frequency missense SNV rs72474224 (c.324C>T) is located in exon 1 of GJB2. The distributions of the TC haplotype (carrying the T risk variant at rs3751385) and the TT haplotype (carrying the T risk variant at both rs3751385 and rs72474224) were both significant. A heterogeneity test for the TT and TC haplotypes that calculated the OR of the TT haplotype and its statistical significance using the TC haplotype as the reference showed that the TT haplotype had significantly greater risk effect (OR = 1.36) than the TC haplotype (OR = 1.14) (Supplementary Table 15). (d) For TARBP1, 10,444 cases and 10,300 controls were available for haplotype analysis. The rare nonsynonymous SNV chr. 1: 232,649,343 (c.1963G>A) is located in exon 12, and chr. 1: 232,631,799 (c.2857G>A) is located in exon 16. Four haplotypes were observed, three of which were rare haplotypes (AA, GA and AG) and had similar effects (OR of 0.18–0.26) on psoriasis risk. The GA and AG haplotypes were independent.
[Figure 1, panels a–d: haplotype tables for (a) FUT2, (b) IL23R, (c) GJB2 and (d) TARBP1, giving for each haplotype the counts and frequencies in cases and controls, the P value and the OR (95% CI).]
isoleucine has a bulkier, asymmetric side chain (with an extra methyl group) compared to valine, and this may affect channel properties. Our current study further highlighted GJB2 as the potential causal gene in the psoriasis susceptibility locus, although more functional studies are warranted to confirm this hypothesis. The remaining common nonsynonymous SNVs in LCE3D, ERAP1 and ZNF816A did not show associations independent from the GWAS-identified SNPs. However, rs12459008 (c.590T>A) in ZNF816A and rs27044 (c.2535C>G) in ERAP1 showed stronger associations with psoriasis than the GWAS SNPs (Table 2). Furthermore, rs27044 (c.2535C>G) affects amino acid 730 (p.Gln730Glu) of an α-helix motif that is buried inside the ERAP1 protein structure27,28. The wild-type glutamine residue is neutral, whereas the mutant glutamic acid residue carries a negative charge, appearing to orientate toward the α-helix and interact with the protein backbone of ERAP1 (Supplementary Fig. 6),
which may in turn destabilize the α-helix motif. Although fine-mapping analysis is required to confirm these findings, the nonsynonymous SNVs in ERAP1 and ZNF816A represent candidates for the causal variants located in the reported GWAS-associated loci. In addition, we also observed one common and two rare nonsynonymous SNVs in FUT2 with suggestive evidence of association with psoriasis. Noncoding variants of FUT2 have been shown to affect circulating vitamin B12 levels29,30 and to be associated with susceptibility for both Crohn’s disease and psoriasis13. For TARBP1, the findings in single-variant analysis for two rare nonsynonymous SNVs indicated that this gene has suggestive evidence of association with psoriasis. However, further validation is required for these rare nonsynonymous SNVs to confirm these findings. Our results also suggested that nonsynonymous SNVs show evidence of genetic heterogeneity between European and Chinese
Table 3 Condition analysis between reported associated SNPs and newly discovered nonsynonymous risk SNVs in the Chinese population

| Chr. | Gene | Variant ID (hg18) | Functional alteration | Allele | MAF cases^a | MAF controls^a | P value^a | OR (95% CI)^a | Conditioned on | Conditional P value | Conditional OR | LD D′ | LD r2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1q21.3 | LCE3D | rs4085613 | Intergenic | A/C | 0.3774 | 0.4327 | 2.13 × 10−11 | 0.79 (0.74–0.85) | rs512208 | 1.48 × 10−6 | 0.82 | 0.98 | 0.28 |
| | | rs512208 | Missense | A/C | 0.3235 | 0.2839 | 3.59 × 10−7 | 1.21 (1.12–1.30) | rs4085613 | 2.99 × 10−2 | 1.10 | | |
| 1p31.3 | IL23R | rs3762318 | Intergenic | C/T | 0.1168 | 0.1005 | 7.49 × 10−3 | 1.18 (1.05–1.34) | Chr. 1: 67,421,184 | 2.81 × 10−2 | 1.18 | 1.00 | 0.005 |
| | | Chr. 1: 67,421,184 | Missense | A/G | 0.0304 | 0.0509 | 2.04 × 10−5 | 0.65 (0.54–0.80) | rs3762318 | 2.31 × 10−3 | 0.70 | | |
| 5q15 | ERAP1 | rs151823 | Intergenic | C/A | 0.4931 | 0.5190 | 9.98 × 10−5 | 0.90 (0.86–0.95) | rs27044 | 7.68 × 10−1 | 0.98 | 0.95 | 0.81 |
| | | rs27044 | Missense | C/G | 0.4846 | 0.5115 | 5.12 × 10−5 | 0.89 (0.85–0.95) | rs151823 | 1.96 × 10−1 | 0.91 | | |
| | | rs26653 | Missense | G/C | 0.4295 | 0.4557 | 7.18 × 10−5 | 0.90 (0.85–0.95) | rs151823 | 1.85 × 10−1 | 0.94 | | |
| 13q11-q12 | GJB2 | rs3751385 | Intronic | T/C | 0.5151 | 0.4769 | 2.04 × 10−6 | 1.16 (1.09–1.24) | rs72474224 | 4.79 × 10−5 | 1.14 | 0.96 | 0.057 |
| | | rs72474224 | Missense | T/C | 0.0627 | 0.0508 | 9.85 × 10−4 | 1.26 (1.10–1.45) | rs3751385 | 1.88 × 10−2 | 1.19 | | |
| 19q13.41 | ZNF816A | rs9304742 | Intronic | C/T | 0.3269 | 0.3529 | 9.91 × 10−6 | 0.89 (0.84–0.94) | rs12459008 | 9.82 × 10−1 | 0.99 | 0.98 | 0.956 |
| | | rs12459008 | Missense | T/A | 0.3291 | 0.3592 | 1.89 × 10−6 | 0.88 (0.83–0.92) | rs9304742 | 3.38 × 10−1 | 0.87 | | |

Chr., chromosome; MAF, minor allele frequency; CI, confidence interval; LD, linkage disequilibrium. Alleles are shown as minor allele/major allele. ^aAssociation in overlapping samples (3,980 cases and 3,152 controls for LCE3D, 2,098 cases and 3,058 controls for IL23R, 5,693 cases and 5,630 controls for ERAP1, 4,693 cases and 3,262 controls for GJB2, 5,601 cases and 5,620 controls for ZNF816A) for each locus in the current study and our previous GWAS of psoriasis.
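The LD columns of Table 3 can be illustrated with a short sketch that derives D′ and r2 from four haplotype frequencies. The function name is hypothetical. The example uses the IL23R control haplotype counts given in Figure 1b (5,610 TG, 666 CG and 336 TA) and assumes that the unlisted CA haplotype was not observed. Because those counts come from a different sample subset than Table 3, the result should only be expected to approximate the tabulated values.

```python
def linkage_disequilibrium(n_AB, n_Ab, n_aB, n_ab):
    """Return (D', r^2) for two biallelic loci from haplotype counts.

    A/a are the alleles at locus 1 and B/b the alleles at locus 2; counts
    may also be given as frequencies, since they are normalised first.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    f_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total   # frequency of allele A
    p_B = (n_AB + n_aB) / total   # frequency of allele B
    d = f_AB - p_A * p_B          # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    variance = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / variance if variance > 0 else 0.0
    return d_prime, r2


# A = minor C allele at rs3762318, B = minor A allele at c.530G>A.
# CA assumed absent (0); CG = 666, TA = 336, TG = 5,610 control haplotypes.
d_prime, r2 = linkage_disequilibrium(0, 666, 336, 5610)
print(f"D' = {d_prime:.2f}, r2 = {r2:.3f}")
```

A D′ near 1 together with a very small r2 is the pattern expected when two minor alleles never share a haplotype, which is the same observation the text uses to argue that the two IL23R variants are independent.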
populations. rs72474224 (GJB2) seemed to be specific to individuals of Chinese ancestry, whereas rs11209026 (IL23R) was specific to individuals of European ancestry, according to data from the current study and the 1000 Genomes Project. rs7530511 (IL23R), which showed significant association with psoriasis in individuals of European ancestry22, did not show any evidence of association in our study. In CARD14, although several rare missense variants were identified in individuals of European ancestry20,31, only one common missense variant, rs11652075, showed consistent association in our Chinese samples. The investigation of 622 immune disease–related genes suggested that coding variants, at least common and low-frequency nonsynonymous variants, make limited independent contribution to psoriasis risk. For most of the GWAS loci, the associations originally identified at common SNPs are unlikely to be explained by the effects of coding risk variants. This is consistent with the observation by Maurano et al.32 that ~80% of the genetic variants for common diseases identified by GWAS are within noncoding or regulatory sequence marked by DNase I–hypersensitive sites (DHSs) as well as with the findings of the Encyclopedia of DNA Elements (ENCODE) Project that the majority of the variants identified by GWAS in noncoding or regulatory sequence had important roles in gene expression or regulation and therefore were involved in the pathogenesis of common diseases33,34. Another study also indicated that the missing heritability for common autoimmune diseases might not be attributable to rare coding variants in GWAS-identified loci35. In this study, with combined samples, we have sufficient power (≥80%) to detect low-frequency SNVs (MAF > 1%) with modest effect (OR = 2) through single-variant association analysis; however, we had limited power ( 4) (Supplementary Fig. 7).
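The power statement above can be made tangible with a back-of-the-envelope calculation. The sketch below is not the authors' power analysis (which is shown only in Supplementary Fig. 7): it approximates power with a two-sided two-proportion z-test on allele counts, the function name is hypothetical, and using the single-variant threshold of 1.3 × 10−6 as the significance level for the combined sample is an assumption.

```python
import math
from statistics import NormalDist


def allelic_test_power(maf_controls, odds_ratio, n_cases, n_controls, alpha):
    """Approximate power of a two-sided allelic z-test (normal approximation).

    Assumptions: alleles are independent (2N chromosomes per group) and the
    case allele frequency follows from the control MAF and the allelic OR.
    """
    p0 = maf_controls
    odds_cases = odds_ratio * p0 / (1 - p0)
    p1 = odds_cases / (1 + odds_cases)       # expected MAF in cases
    m1, m0 = 2 * n_cases, 2 * n_controls     # chromosomes per group
    se = math.sqrt(p1 * (1 - p1) / m1 + p0 * (1 - p0) / m0)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    z_effect = abs(p1 - p0) / se
    # The negligible opposite-tail rejection probability is ignored.
    return NormalDist().cdf(z_effect - z_crit)


# Combined sample from the study: 10,727 cases and 10,582 controls.
power_low_freq = allelic_test_power(0.01, 2.0, 10_727, 10_582, alpha=1.3e-6)
power_rare = allelic_test_power(0.0005, 2.0, 10_727, 10_582, alpha=1.3e-6)
print(f"MAF 1%: power = {power_low_freq:.3f}; MAF 0.05%: power = {power_rare:.3f}")
```

Under these assumptions a variant with MAF of 1% and OR of 2 is detected with power well above 80%, whereas power drops sharply for a much rarer variant with the same effect size, in line with the qualitative statement in the text.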
Given the growing evidence that the common variants identified by GWAS only partially explain the genetic risk of common diseases, the search for low-frequency and rare coding SNVs that contribute to the missing heritability has been a major driver of the emerging interest in sequencing analyses of common diseases36,37. Taking together the results from both the comprehensive analysis of immune disease–related genes and the large-scale exome sequencing analysis, our study indicates that nonsynonymous SNVs in the 1,326 targeted genes (Supplementary Table 18) had limited contribution to the overall genetic risk of psoriasis. In summary, we have carried out a comprehensive analysis of coding SNVs in a large sample and only discovered two independent low-frequency variants with moderate effect on disease risk. We did not identify any low-frequency or rare coding variants with strong genetic effect, and it remains to be shown whether this will also hold true for other common diseases beyond psoriasis.

URLs. GWAS catalog, http://www.genome.gov/gwastudies; 1000 Genomes Project, http://www.1000genomes.org/; ENCODE Project, http://www.genome.gov/10005107; PLINK/SEQ, http://atgu.mgh.harvard.edu/plinkseq/; Research Collaboratory for Structural Bioinformatics Protein Data Bank, http://www.rcsb.org/pdb/home/home.do; DeepView/Swiss-PdbViewer, http://www.expasy.org/spdbv/.

Methods
Methods and any associated references are available in the online version of the paper.

Note: Any Supplementary Information and Source Data files are available in the online version of the paper.
Acknowledgments We thank the individuals who participated in this project and their families. We also want to thank L. Wu, D. Li, Y. Shi, J. Shen, L. Song, Y. Xue, J. Jv, Y. Sheng and J. Gao who participated in the analysis of exome sequencing data. We thank the State Key Laboratory Incubation Base of Dermatology, Ministry of National Science and Technology (Hefei, China). This study was funded by the Key Program of the National Natural Science Foundation of China (81130031), the National Science Fund for Excellent Young Scholars (81222022), the Outstanding Talents of Organization Department of the CPC (Communist Party of China) Central Committee program, the Local Universities Characteristics and Advantages of Discipline Development Program of the Ministry of Finance of China and the General Program of the National Natural Science Foundation of China (81072461, 30971644, 31171224, 31000528, 81000692, 81071285, 81172866, 81172591 and 31200939), New Century Excellent Talents in University (NCET-11-0889), and the Science and Technological Fund of Anhui Province for Outstanding Youth (1108085J10) as well as the Pre-National Basic Research Program of China (973 Plan; 2012CB722404), the National Basic Research Program of China (973 Plan; 2009CB825404), the State Key Development Program for Basic Research of China (973 Program; 2011CB809203), the Chinese High-Tech (863) Program (2012AA02A201), the Enterprise Key Laboratory, supported by Guangdong Province, and the Shenzhen Key Laboratory of Transomics Biotechnologies (CXB201108250096A), and the National High-Tech Research & Development Program (2012AA020206). AUTHOR CONTRIBUTIONS Xuejun Zhang conceived this study and obtained financial support. Xuejun Zhang, J.W., Yingrui Li and L.S. participated in study design and were responsible for project management. H.C., Y.Q., Q.C., C.Q., Y.C., F.T., H.L., F. Xiao, J.H., D.S., A.Z., C.Z., X.F., H. 
Tian, Z.W., F.W., B.Y., B.L., G.W., Y.S., L.D., J.S., T.L., Xiuyun Zhang, Yuzhen Li, C.H., A.X., L.W., Xiaohang Zhao, X.G., J.X., F. Zhang and J.Z. conducted sample selection and data management, undertook recruitment, collected phenotype data, undertook related data handling and calculations, managed recruitment and obtained biological samples. H. Tang, X.J., Yang Li, H.J., X.T., X.Y., J.M., R.W., X. Zuo, Y.Z., X. Yin, H.S., Xia Zhao, F. Xu, Q.L., L.L., H.F., S.H., X.X., Y.R., Q.G., X.W., M.X., L.Y. and R.C. designed the bioinformatics and experimental sections, coordinated the collection, maintained project procedures and performed data analysis. F. Zhou, G.C. and X. Zheng performed genotyping analysis. H. Tang, X.J., X. Zuo, X.T. and H.C. undertook data processing, statistical analysis and bioinformatics investigations. H. Tang, X.J., L.S., X.T., Yang Li and J.L. cowrote the manuscript. All authors contributed to the final version of the manuscript, with Xuejun Zhang, J.W., S.Y., L.S., Yingrui Li, H. Tang, X.J., X.T., H.J. and Yang Li having key roles. COMPETING FINANCIAL INTERESTS The authors declare no competing financial interests. Reprints and permissions information is available online at http://www.nature.com/ reprints/index.html. 1. Zhang, X. Genome-wide association study of skin complex diseases. J. Dermatol. Sci. 66, 89–97 (2012). 2. Tsoi, L.C. et al. Identification of 15 new psoriasis susceptibility loci highlights the role of innate immunity. Nat. Genet. 44, 1341–1348 (2012). 3. Liu, Y. et al. A genome-wide association study of psoriasis and psoriatic arthritis identifies new disease loci. PLoS Genet. 4, e1000041 (2008). 4. Strange, A. et al. A genome-wide association study identifies new psoriasis susceptibility loci and an interaction between HLA-C and ERAP1. Nat. Genet. 42, 985–990 (2010). 5. Nair, R.P. et al. Genome-wide scan reveals association of psoriasis with IL-23 and NF-κB pathways. Nat. Genet. 41, 199–204 (2009). 6. Zhang, X.J. et al. 
Psoriasis genome-wide association study identifies susceptibility variants within LCE gene cluster at 1q21. Nat. Genet. 41, 205–210 (2009). 7. Ellinghaus, E. et al. Genome-wide association study identifies a psoriasis susceptibility locus at TRAF3IP2. Nat. Genet. 42, 991–995 (2010).
8. Ellinghaus, E. et al. Genome-wide meta-analysis of psoriatic arthritis identifies susceptibility locus at REL. J. Invest. Dermatol. 132, 1133–1140 (2012). 9. Stuart, P.E. et al. Genome-wide association analysis identifies three psoriasis susceptibility loci. Nat. Genet. 42, 1000–1004 (2010). 10. Sun, L.D. et al. Association analyses identify six new psoriasis susceptibility loci in the Chinese population. Nat. Genet. 42, 1005–1009 (2010). 11. Hüffmeier, U. et al. Common variants at TRAF3IP2 are associated with susceptibility to psoriatic arthritis and psoriasis. Nat. Genet. 42, 996–999 (2010). 12. Capon, F. et al. Identification of ZNF313/RNF114 as a novel psoriasis susceptibility gene. Hum. Mol. Genet. 17, 1938–1945 (2008). 13. Ellinghaus, D. et al. Combined analysis of genome-wide association studies for Crohn disease and psoriasis identifies seven shared susceptibility loci. Am. J. Hum. Genet. 90, 636–647 (2012). 14. Zeggini, E. Next-generation association studies for complex traits. Nat. Genet. 43, 287–288 (2011). 15. Nejentsev, S., Walker, N., Riches, D., Egholm, M. & Todd, J.A. Rare variants of IFIH1, a gene implicated in antiviral responses, protect against type 1 diabetes. Science 324, 387–389 (2009). 16. Momozawa, Y. et al. Resequencing of positional candidates identifies low frequency IL23R coding variants protecting against inflammatory bowel disease. Nat. Genet. 43, 43–47 (2011). 17. Rivas, M.A. et al. Deep resequencing of GWAS loci identifies independent rare variants associated with inflammatory bowel disease. Nat. Genet. 43, 1066–1073 (2011). 18. Diogo, D. et al. Rare, low-frequency, and common variants in the protein-coding sequence of biological candidate genes from GWASs contribute to risk of rheumatoid arthritis. Am. J. Hum. Genet. 92, 15–27 (2013). 19. Lesage, S. et al. CARD15/NOD2 mutational analysis and genotype-phenotype correlation in 612 patients with inflammatory bowel disease. Am. J. Hum. Genet. 70, 845–857 (2002). 20. Jordan, C.T. 
et al. Rare and common variants in CARD14, encoding an epidermal regulator of NF-κB, in psoriasis. Am. J. Hum. Genet. 90, 796–808 (2012). 21. Ng, S.B. et al. Exome sequencing identifies the cause of a mendelian disorder. Nat. Genet. 42, 30–35 (2010). 22. Nair, R.P. et al. Polymorphisms of the IL12B and IL23R genes are associated with psoriasis. J. Invest. Dermatol. 128, 1653–1661 (2008). 23. Kumar, P., Henikoff, S. & Ng, P.C. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat. Protoc. 4, 1073–1081 (2009). 24. Adzhubei, I.A. et al. A method and server for predicting damaging missense mutations. Nat. Methods 7, 248–249 (2010). 25. Franke, A. et al. Genome-wide meta-analysis increases to 71 the number of confirmed Crohn’s disease susceptibility loci. Nat. Genet. 42, 1118–1125 (2010). 26. Maeda, S. et al. Structure of the connexin 26 gap junction channel at 3.5 Å resolution. Nature 458, 597–602 (2009). 27. Nguyen, T.T. et al. Structural basis for antigenic peptide precursor processing by the endoplasmic reticulum aminopeptidase ERAP1. Nat. Struct. Mol. Biol. 18, 604–613 (2011). 28. Kochan, G. et al. Crystal structures of the endoplasmic reticulum aminopeptidase-1 (ERAP1) reveal the molecular basis for N-terminal peptide trimming. Proc. Natl. Acad. Sci. USA 108, 7745–7750 (2011). 29. Tanaka, T. et al. Genome-wide association study of vitamin B6, vitamin B12, folate, and homocysteine blood concentrations. Am. J. Hum. Genet. 84, 477–482 (2009). 30. Hazra, A. et al. Common variants of FUT2 are associated with plasma vitamin B12 levels. Nat. Genet. 40, 1160–1162 (2008). 31. Jordan, C.T. et al. PSORS2 is due to mutations in CARD14. Am. J. Hum. Genet. 90, 784–795 (2012). 32. Maurano, M.T. et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195 (2012). 33. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. 
Nature 489, 57–74 (2012). 34. Gerstein, M.B. et al. Architecture of the human regulatory network derived from ENCODE data. Nature 489, 91–100 (2012). 35. Hunt, K.A. et al. Negligible impact of rare autoimmune-locus coding-region variants on missing heritability. Nature 498, 232–235 (2013). 36. Manolio, T.A. et al. Finding the missing heritability of complex diseases. Nature 461, 747–753 (2009). 37. Eichler, E.E. et al. Missing heritability and strategies for finding the underlying causes of complex disease. Nat. Rev. Genet. 11, 446–450 (2010).
VOLUME 46 | NUMBER 1 | JANUARY 2014 Nature Genetics
ONLINE METHODS
Subjects. Two independent study cohorts, totaling 21,309 individuals (after sample quality controls), were evaluated in this study. All samples were from the Han Chinese population, obtained through collaboration with multiple hospitals in China. Samples in the initial stage (exome sequencing) included 781 psoriasis cases and 676 controls (Supplementary Table 1) and were mainly selected from previous GWAS samples. In total, 9,946 psoriasis cases and 9,906 controls (Supplementary Table 1) were recruited in the second stage (targeted sequencing). Clinical and demographic information was gathered from both cases and controls through the same structured questionnaire. All controls were individuals with no psoriasis, no autoimmune or systemic disorders and no family history of psoriasis (including first-, second- and third-degree relatives). Cases and controls were matched on the basis of age and sex. All participants provided written informed consent. The study was approved by the institutional ethical committee of each hospital and was conducted according to Declaration of Helsinki principles. Investigators were not blinded to group allocation during the experiment when assessing outcome.

Exome sequencing. We sequenced 1,500 exomes in the first stage of the project, including those for 800 psoriasis cases and 700 controls, at BGI Shenzhen. Genomic DNA for each individual was hybridized with the NimbleGen 2.1M-probe sequence capture array38 to enrich exonic DNA in each library. We performed sequencing on the Illumina HiSeq 2000 platform independently for each captured library to ensure that each sample had average coverage of ~34-fold (Supplementary Fig. 2a). Raw image files were processed by the Illumina Pipeline (version 1.3.4) for base calling with default parameters, and sequences for each individual were generated as 90-bp reads.

Targeted sequencing. 
The regions selected for targeted sequencing included (i) regions that encompassed the top associated SNVs in the exome sequencing stage (100 bp per SNV on average); (ii) coding regions for the top 742 genes in gene-based tests; and (iii) coding regions for the 622 genes associated with immune diseases by GWAS. Target regions were combined and merged, and the final set of regions was optimized and used to construct a SeqCap EZ Library (Roche NimbleGen) for enrichment of the selected target regions. We then sequenced the target regions to 64× average coverage (Supplementary Fig. 2b) in 20,005 samples (10,003 psoriasis cases and 10,002 controls). Targeted sequencing was performed using the same workflow as for exome sequencing on the HiSeq 2000 platform to generate 90-bp paired-end reads.

Alignment. Reads from each sample were aligned to the NCBI human genome reference assembly (build 36.3) using BWA (Burrows-Wheeler Aligner)39. BAM files were processed using the Genome Analysis Toolkit (GATK v1.6)40 to perform realignment around known indels. All aligned read data were subjected to CountCovariates (GATK) on the basis of known SNVs (dbSNP135), and base quality was then recalibrated (TableRecalibration in GATK).

Sample quality control. Sequencing data for each individual were evaluated against quality control metrics, and we required that samples have (i) average sequencing depth of ≥10× in exome sequencing and ≥30× in targeted sequencing; (ii) 90% of the target region covered at ≥8×; (iii) GC content of 42–48%; (iv) heterozygosity of SNVs on autosomes; (v) inbreeding coefficient of −0.2 to 0.1; (vi) absence of population stratification (by principal-components analysis (PCA); Supplementary Fig. 8); (vii) pairwise identity by descent indicating relatedness more distant than the third degree; and (viii) concordance with genotyping data. 
After filtering, 43 samples from the exome sequencing stage and 153 samples from the targeted sequencing stage that did not pass quality control were removed from further analysis.

Variant calling. SOAPsnp (v1.05)41 was used to generate genotype information in targeted and flanking regions for each individual. We identified and merged candidate variants that were called as high-quality SNVs (genotype quality of ≥20, sequencing depth of ≥8× and depth of non-reference allele of ≥4×) in at least one individual. We then extracted the genotypes for these positions in all samples to build a genotype matrix as input for the following
analysis. A number of filtering criteria were applied to remove false-positive calls, and data quality and error rates were carefully evaluated. The sensitivity of SNV detection was evaluated using 1000 Genomes Project data for the CHB (Han Chinese in Beijing, China) reference on the SNVs located within our targeted regions in sequencing analysis (both exome and targeted sequencing) (Supplementary Table 19).

Variant quality control. After initial SNV calls were generated, we performed further filtering to identify high-confidence SNVs in targeted sequencing, requiring that each SNV (i) have a call rate of ≥90% (Q20 and depth of ≥8× were considered to be indicative of high quality; samples not meeting these criteria were considered to be missing); (ii) have a Hardy-Weinberg test P value of >1.0 × 10−4; (iii) be located in a region where the length of homopolymer runs was ≤6 bp; (iv) not be located in MHC homologous sequence or in a known indel (including indels in the 1000 Genomes Project and their flanking 5-bp regions); (v) pass the allele balance test (with distributions of major and minor alleles in heterozygous positions in balance; binomial test P > 1.0 × 10−7); (vi) pass the end enrichment test (with major and minor alleles not enriched on the ends of reads; Fisher’s exact test P > 1.0 × 10−7); and (vii) pass the strand bias test (with major and minor alleles not enriched on either strand; Fisher’s exact test P > 1.0 × 10−7). As part of our quality control analysis, we sequenced 24 CHB samples from the 1000 Genomes Project in our targeted sequencing analysis. For the 35,297 variants in the 24 CHB samples that were called independently by the 1000 Genomes Project and the current study, the genotype concordance was 99.59%. We also genotyped 27 SNVs by Sequenom using 975 samples, and we observed that 99.49% of individuals had genotype concordance of ≥99% (Supplementary Table 10).

Annotation of SNVs. 
ANNOVAR42 was used to separate SNVs into different functional categories according to their genic location and their expected effect on encoded gene products, based on information from the RefSeq database. In addition, we categorized the SNVs into two categories, ‘known’ and ‘newly identified’, according to whether they were present in dbSNP (version 135).

Genotyping. As quality control for sequencing, 27 SNVs from sequencing data were genotyped to evaluate fingerprint concordance of genotypes using MALDI-TOF (matrix-assisted laser desorption and ionization time of flight) mass spectrometry (Sequenom). One to two individuals per sequencing pool were chosen at random as representatives in almost all pools (~5% of the total) to compare the genotypes of 27 SNVs in sequencing and genotyping data (concordance of >95%). Genotyping of rs11652075 in CARD14 in an additional 4,480 psoriasis cases and 6,521 controls was performed using MALDI-TOF mass spectrometry. Approximately 15 ng of genomic DNA was used to genotype each sample. Locus-specific PCR and detection primers were designed using MassARRAY Assay Design 3.0 software (Sequenom).

Statistical analysis. After sample quality control, 781 psoriasis cases and 676 controls from the exome sequencing stage and 9,946 psoriasis cases and 9,906 controls from the targeted sequencing validation stage were included in statistical analysis. Single-variant association analysis for SNVs with MAF > 0.1% was performed by case-control association analysis using PLINK 1.07 (ref. 43). Fisher’s exact test was used for rare variants with a count below five in cases or controls. Combined analysis was performed using Cochran-Mantel-Haenszel (CMH) tests44 across the two collections. A Manhattan plot was generated using association P values and the locations of the SNVs across the genome (Supplementary Fig. 3). 
Linkage disequilibrium and haplotype analyses of significantly associated coding variants and GWAS-identified SNPs in each gene were conducted with Haploview 4.2 (ref. 45). We also performed a conditional analysis by adjusting for the associations of coding variants to assess the contribution of these variants to the known GWAS signals. Conditional and haplotype analyses were carried out on samples that were analyzed in both the current study and our previous GWAS6. Statistical power to observe a P value of 1.3 × 10−6 for association at nonsynonymous variants was assessed with the Genetic Power Calculator46 (Supplementary Fig. 7).
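The combined analysis across the two collections can be illustrated with a minimal sketch of a Cochran-Mantel-Haenszel test on allele counts, treating each sequencing stage as one stratum. This is not the PLINK implementation used in the study; the function name `cmh_test`, the absence of a continuity correction and the allele counts in the usage lines are assumptions made purely for illustration.

```python
import math

def cmh_test(tables):
    """Cochran-Mantel-Haenszel test (no continuity correction) over K 2x2 tables.

    Each table is a tuple (a, b, c, d) of allele counts:
      a = minor alleles in cases,    b = major alleles in cases,
      c = minor alleles in controls, d = major alleles in controls.
    Returns (chi-square statistic with 1 d.f., P value, Mantel-Haenszel odds ratio).
    """
    diff = 0.0    # sum over strata of observed minus expected count in cell a
    var = 0.0     # sum over strata of the hypergeometric variance of cell a
    or_num = 0.0  # numerator of the Mantel-Haenszel common odds ratio
    or_den = 0.0  # denominator of the Mantel-Haenszel common odds ratio
    for a, b, c, d in tables:
        n = a + b + c + d
        diff += a - (a + b) * (a + c) / n
        var += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
        or_num += a * d / n
        or_den += b * c / n
    chi2 = diff * diff / var
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value, or_num / or_den

# Hypothetical allele counts for one SNV (illustrative values, not study data).
# Row totals match 2 x the sample sizes of each stage.
exome_stage = (120, 1442, 70, 1282)           # 781 cases, 676 controls
targeted_stage = (1500, 18392, 1000, 18812)   # 9,946 cases, 9,906 controls
chi2, p_value, odds_ratio = cmh_test([exome_stage, targeted_stage])
print(f"chi2={chi2:.2f} P={p_value:.3g} OR={odds_ratio:.2f}")
```

Stratifying by stage lets the two collections contribute jointly to one odds ratio without pooling their raw counts, which is the usual motivation for a CMH test in a two-stage design.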
Protein structure analysis. We searched for published three-dimensional protein structures in the Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) and downloaded structures for ERAP1 (2YD0)28 and connexin-26/GJB2 (2ZW3)26. We used the DeepView/SwissPdbViewer to view the protein structures and to examine the side chains
of the original and mutant residues at the relevant amino acid positions. SIFT23 and PolyPhen-2 (ref. 24) were used to predict whether the associated nonsynonymous variants are damaging to protein function.

38. Albert, T.J. et al. Direct selection of human genomic loci by microarray hybridization. Nat. Methods 4, 903–905 (2007). 39. Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26, 589–595 (2010). 40. McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010). 41. Li, R. et al. SNP detection for massively parallel whole-genome resequencing. Genome Res. 19, 1124–1132 (2009). 42. Wang, K., Li, M. & Hakonarson, H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, e164 (2010). 43. Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007). 44. Wallenstein, S. & Wittes, J. The power of the Mantel-Haenszel test for grouped failure time data. Biometrics 49, 1077–1087 (1993). 45. Barrett, J.C., Fry, B., Maller, J. & Daly, M.J. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics 21, 263–265 (2005). 46. Purcell, S., Cherny, S.S. & Sham, P.C. Genetic Power Calculator: design of linkage and association genetic mapping studies of complex traits. Bioinformatics 19, 149–150 (2003). 47. Siepel, A., Pollard, K. & Haussler, D. in Research in Computational Molecular Biology, Vol. 3909 (eds. Apostolico, A., Guerra, C., Istrail, S., Pevzner, P. & Waterman, M.) 190–205 (Springer, Berlin, Heidelberg, 2006). 48. Han, F. & Pan, W. A data-adaptive sum test for disease association with multiple common or rare variants. Hum. Hered. 70, 42–54 (2010). 49. Madsen, B.E. & Browning, S.R. A groupwise association test for rare mutations using a weighted sum statistic. 
PLoS Genet. 5, e1000384 (2009). 50. Dering, C., Hemmelmann, C., Pugh, E. & Ziegler, A. Statistical analysis of rare sequence variants: an overview of collapsing methods. Genet. Epidemiol. 35 (suppl. 1), S12–S17 (2011). 51. Li, B. & Leal, S.M. Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data. Am. J. Hum. Genet. 83, 311–321 (2008). 52. Wu, M.C. et al. Rare-variant association testing for sequencing data with the sequence kernel association test. Am. J. Hum. Genet. 89, 82–93 (2011). 53. Price, A.L. et al. Pooled association tests for rare variants in exon-resequencing studies. Am. J. Hum. Genet. 86, 832–838 (2010). 54. Neale, B.M. et al. Testing for an unusual distribution of rare variants. PLoS Genet. 7, e1001322 (2011).
Gene-based analysis. In the exome sequencing stage, functional coding SNVs (missense, nonsense and splice site) with MAF < 5% were extracted and weighted by their predicted function (tests were limited to those variants predicted to be damaging by at least one of the following methods: SIFT23, PolyPhen-2 (ref. 24) or phyloP47) for gene-based analysis. A range of gene-based methods were used for preliminary screening, including (i) aSUM48, a data-adaptive modification of the sum test for both common and rare variants; (ii) WSS49, weighted-sum statistics over rare variants (MAF < 5%); (iii) Collaps50, which collapses rare variants to a single variable; (iv) CMC51, Hotelling’s T² test for rare variants (MAF < 5%); and (v) UNIQ, which counts case-unique rare alleles. Analyses for aSUM, WSS, Collaps and CMC were performed using in-house R scripts. Analysis for UNIQ was performed using PLINK/SEQ software. Minimum P values (Pmin) across the five methods were assigned as the gene-based test result for each gene. Statistical significance for gene-based analysis was defined as P < 3.5 × 10−6 after multiple-testing correction for the 14,578 genes captured by exome sequencing. In the validation stage of targeted sequencing, functional coding SNVs (missense, nonsense and splice site) with MAF < 5% and MAF < 1% were extracted for gene-based tests using the five methods described above and six additional methods, with and without weighting by prediction of functional impact (the same weighting strategy was used as for the exome sequencing stage). The six additional methods included (i) SKAT-O (SNP-set kernel association test)52; (ii) BURDEN (detecting an excess of rare alleles in cases compared to controls); (iii) VT (variable threshold test)53; (iv) FQRWGT (frequency-weighted test, in the spirit of Madsen-Browning); (v) CALPHA (C-alpha test)54; and (vi) SUMSTAT (evaluating the sums of single-site statistics). 
SKAT-O analysis was performed with the R package SKAT with parameter method = ‘optimal’. BURDEN, UNIQ, VT, FQRWGT, CALPHA and SUMSTAT analyses were performed with PLINK/SEQ software with parameter ‘--phenotype --mask mac=1-199 --test uniq vt fw calpha sumstat’, where mac was equal to 1–199 for MAF < 1% and 1–993 for MAF < 5%.
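The general idea behind the collapsing approach and the Pmin rule can be sketched as follows. This is an illustrative reading rather than the in-house R scripts or the PLINK/SEQ code used in the study: it assumes that collapsing means reducing each individual to a carrier or non-carrier of any qualifying rare allele in the gene and that carrier counts are compared with a two-sided Fisher’s exact test. All function names and the toy genotypes are hypothetical.

```python
from math import comb

GENE_LEVEL_THRESHOLD = 3.5e-6  # significance threshold stated in the text

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    p_value = sum(prob(x) for x in range(lowest, highest + 1)
                  if prob(x) <= p_observed * (1 + 1e-7))
    return min(1.0, p_value)

def count_carriers(genotypes):
    """Count individuals carrying at least one rare allele in the gene.

    genotypes: one list per individual, holding rare-allele counts (0, 1 or 2)
    at each qualifying variant of the gene.
    """
    return sum(1 for person in genotypes if any(g > 0 for g in person))

def collapsing_test(case_genotypes, control_genotypes):
    """Collapse rare variants to carrier status and compare cases with controls."""
    case_carriers = count_carriers(case_genotypes)
    control_carriers = count_carriers(control_genotypes)
    return fisher_exact_two_sided(
        case_carriers, len(case_genotypes) - case_carriers,
        control_carriers, len(control_genotypes) - control_carriers)

def gene_level_result(p_values_by_method):
    """Return (Pmin, passes_threshold) for one gene across several methods."""
    p_min = min(p_values_by_method.values())
    return p_min, p_min < GENE_LEVEL_THRESHOLD

# Toy genotypes for one gene with two qualifying variants (hypothetical data).
cases = [[0, 1], [0, 0], [1, 0], [2, 0]]
controls = [[0, 0], [0, 0], [0, 1], [0, 0]]
p_collapse = collapsing_test(cases, controls)
print(gene_level_result({"Collaps": p_collapse}))
```

Taking the minimum P value across several methods increases the chance that at least one test matches the true genetic architecture, at the cost of extra multiple testing that a single fixed threshold does not fully account for.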
doi:10.1038/ng.2827