Genes and Immunity (2014) 15, 466–476 & 2014 Macmillan Publishers Limited All rights reserved 1466-4879/14 www.nature.com/gene

ORIGINAL ARTICLE

Annotation of functional variation within non-MHC MS susceptibility loci through bioinformatics analysis

FBS Briggs, LJ Leung and LF Barcellos

There is a strong and complex genetic component to multiple sclerosis (MS). In addition to variation in the major histocompatibility complex (MHC) region on chromosome 6p21.3, 110 non-MHC susceptibility variants have been identified in Northern Europeans thus far. The majority of the MS-associated genes are immune related; however, similar to most other complex genetic diseases, the causal variants and biological processes underlying pathogenesis remain largely unknown. We created a comprehensive catalog of putative functional variants that reside within linkage disequilibrium regions of the MS-associated genic variants to guide future studies. Bioinformatics analyses were also conducted using publicly available resources to identify plausible pathological processes relevant to MS and functional hypotheses for established MS-associated variants.

Genes and Immunity (2014) 15, 466–476; doi:10.1038/gene.2014.37; published online 17 July 2014

INTRODUCTION

Multiple sclerosis (MS) is a clinically heterogeneous autoimmune disease of the central nervous system with a complex etiology, primarily characterized by demyelination and the formation of neurological lesions.1 The prevalence of MS is greatest among Northern Europeans (0.1–0.2%).1 There is a prominent genetic component illustrated by consistently higher disease concordance in monozygotic twins compared with dizygotic twins across several populations (30% and 5%, respectively), similar to other autoimmune diseases.2 The relative recurrence risk (λs) is ~6.3 among siblings.3 For several decades, variation within the major histocompatibility complex (MHC) on chromosome 6p21.3 has been known to confer the strongest genetic risk for MS. The primary susceptibility locus is a human leukocyte antigen (HLA) class II allele, HLA-DRB1*15:01.1,4,5 Recent fine-mapping of the MHC demonstrated the presence of at least 10 additional independent HLA and non-HLA risk alleles for MS.6 Full characterization of the non-MHC genetic component in MS has been challenging; however, 110 non-MHC risk variants have been identified within the last decade through both genome-wide association studies (GWASs) and follow-up candidate gene studies facilitated by international research initiatives (International Multiple Sclerosis Genetics Consortium and collaborative construction of the ImmunoChip).4,5 These variants explain 20% of λs, and with the inclusion of the four most prominent HLA risk alleles as much as 28% of λs.5 These risk variants are both genic (63 single-nucleotide polymorphisms (SNPs) in 63 genes) and intergenic, similar to most complex diseases. For most of the 63 MS-associated genes, their contribution to pathogenesis is unclear; however, they appear to be primarily involved in immune response.
In the largest MS association study, high-resolution mapping was performed for 8 risk loci; for 5 of them, TNFSF14, IL2RA, TNFRSF1A, IL12A and STAT4, 50% of the posterior probability of association was explained by an intronic variant; 3 of the 5 variants are associated with protein levels.5 Functional experiments have revealed the primary causal variants affecting protein structure through alternative splicing within IL7R7 and TNFRSF1A.8 However, the causal variants and the pathological biological processes mediated by the remaining 103 loci have not been fully explored. Here, we present a comprehensive catalog of candidate functional variants within linkage disequilibrium (LD) blocks encompassing the non-MHC MS susceptibility loci to expedite the prioritization of SNPs for future fine-mapping analyses. We also conducted several bioinformatics enrichment analyses that integrated multiple functional (-omic) data for both MS-associated genes and the individual non-MHC risk variants. These results provide thorough insight into underlying disease mechanisms in MS based on current knowledge.

RESULTS

Of the 110 non-MHC MS risk variants, 63 SNPs are within genes, 1 SNP is within a noncoding RNA (PVT1) and the remaining 46 SNPs are intergenic (Table 1). Features of the genic risk SNPs were assessed using HaploReg v2 (2013.02.14; http://www.broadinstitute.org/mammals/haploreg/haploreg.php).9 Among the genic risk SNPs, there were three missense SNPs (TYK2, SLC44A2 and IFI30), two 3′ untranslated region (UTR) SNPs (EVI5 and CD69), one 5′ UTR SNP (CD86), one synonymous SNP (TIMMDC1) and one downstream SNP (EOMES); the remaining 55 SNPs are intronic (Table 1). With the exception of the TYK2 missense variant, these variants are fairly common, with minor allele frequencies >5%. A total of 92 SNPs reside within regulatory motifs, and 27 SNPs reside within transcription factor binding sites. Interestingly, the MS risk variant rs4796791 is an intronic SNP in STAT3 (signal transducer and activator of transcription 3, a transcription factor), and three other MS risk variants exist within STAT3 binding sites within their respective genes/noncoding RNA: rs4410871 (PVT1), rs917116 (JAZF1) and rs2236262 (ZFP36L1).
Among all variants, there was a significant enrichment of strong enhancers for several cell lines, including human embryonic stem cells and leukemia

Genetic Epidemiology and Genomics Laboratory, Division of Epidemiology, School of Public Health, University of California, Berkeley, CA, USA. Correspondence: Professor LF Barcellos, Genetic Epidemiology and Genomics Laboratory, Division of Epidemiology, School of Public Health, 324 Stanley Hall, University of California, Berkeley, CA 94720, USA. E-mail: [email protected] Received 20 February 2014; revised 2 May 2014; accepted 29 May 2014; published online 17 July 2014

2525665 6530189 85746993 85915183 92975464

101240893 101407519 117080166

120258970 157770241 160711804 192541472 200874728

25017860 43361256 61095245 68587477 112665201 191974435 231115454

18785585 27757018 28078571

33013483

71530346 105558837 119222456 121543577 121770539 121796768

159691112 103551603 106173199 35879156

40399096 55440730 133446575 141506564 158759900

1 1 1

1 1 1 1 1


2 2 2 2 2 2 2

3 3 3

3

3 3 3 3 3 3

3 4 4 5

5 5 5 5 5

rs6880778 rs71624119 rs756699 NA rs2546890

rs1014486 rs7665090 rs2726518 rs6881706

rs9828629 rs2028597 rs1131265 rs1920296 rs2255214 rs9282641

rs4679081

rs11719975 rs2371108 rs1813375

rs4665719 rs2163226 rs842639 rs7595717 rs17174870 rs9967792 rs9989735

rs666930 rs2050568 rs35967351 rs1359062 rs55838263

rs7552544 rs11581062 rs6677309

rs3748817 rs3007421 rs12087340 rs11587876 rs41286801

A G C T A

T A A G

C G G A G G

T

G G G

C T G C C T G

T C A C A

T A A

T G C T C

G A T G G

C G C T

T A C C T A

C

C T T

T C A T T C C

C T T G G

C G C

C A T C T

Alt allele

1 1 1 1 1

Ref allele

Base pair location

Chr

SNP ID

Annotation for the 110 non-MHC MS risk SNPs

Table 1.

0.58 0.25 0.85 0.61 0.5

0.47 0.49 0.6 0.28

0.41 0.08 0.16 0.61 0.51 0.09

0.54

0.25 0.4 0.49

0.75 0.27 0.68 0.28 0.29 0.66 0.18

0.54 0.47 0.3 0.81 0.26

0.44 0.29 0.14

0.33 0.11 0.08 0.21 0.17

EUR freq (1.10–1.18) (1.07–1.18) (1.15–1.29) (1.07–1.17) (1.15–1.25)

(1.05–1.13) (1.07–1.15) (1.08–1.15) (1.06–1.14) (1.00–1.07) (1.07–1.15) (1.12–1.22)

(1.06–1.13) (1.05–1.12) (1.05–1.13) (1.13–1.23) (1.08–1.17)

1.10 1.12 1.12 1.07 1.06

1.11 1.08 1.09 1.12

1.08 1.04 1.19 1.14 1.11 1.12

(1.06–1.14) (1.08–1.17) (1.07–1.18) (1.04–1.11) (1.02–1.09)

(1.07–1.14) (1.05–1.12) (1.05–1.13) (1.08–1.16)

(1.05–1.12) (0.98–1.11) (1.14–1.24) (1.11–1.18) (1.08–1.15) (1.05–1.19)

1.08 (1.04–1.11)

1.09 (1.05–1.13) 1.08 (1.05–1.12) 1.15 (1.12–1.19)

1.09 1.10 1.11 1.10 1.03 1.11 1.17

1.09 1.08 1.09 1.18 1.12

1.08 (1.05–1.12) 1.05 (1.01–1.09) 1.34 (1.27–1.41)

1.14 1.12 1.22 1.12 1.20

Published odds ratio (95% CI)a

EBF1, MEF2A, PU1

NF-kB, POL2, POL24H8, PU1

EBF1

POL24H8


Ik-2, RBP-Jk

T3R Pou1f1 PPAR, RAR Barhl1, Barx1, Barx2, En-1, Gbx1, Gbx2, Hlxb9, Ik-1, Ik-2, Isl2, Lhx4, Lhx8, Msx-1, Msx2, Pax-6, Pax7, Phox2a, Pou3f2, Prrx1, Prrx2 AP-1, Elf5, HMG-IY, Maf, STAT Bbx, DMRT1, STAT PLZF, Pax-5

FAC1, GR BRCA1, Foxi1, Pou2f2, Pou3f3, TEF

Zic CCNT2, Cdx, Foxc1, Foxj1, GATA, Nkx3, Sox, TAL1 CEBPG, DMRT7, Evi-1, Foxo, HNF1, Hoxa9, Hoxc10, Hoxc9, Irf, Mef2 CEBPB, HMG-IY, Myc, Zfp187 AIRE, AP-4, LBP-1

Maf TCF12 AP-1, CAC-binding-protein, CCNT2, E2F, EWSR1-FLI1, Egr-1, MAZ, MAZR, MZF1::1-4, Myc, PU.1, Pou2f2, SP1, STAT, Sp4, TATA, TFII-I, WT1, ZNF263, Zfp281

Irf, Nanog, Spz1

Ik-2, NF-AT1, Pou2f2

FOXA1, HNF4G, RXRA, TCF4 CMYC PU1, YY1, ELF1

ATF3, CCNT2, INSM1, SP1, ZBTB33

EBF, Pou1f1

Arid5a, CEBPA, CEBPB, CTCF, Pou2f2, STAT Pax-4, Zbtb3 SEF-1, YY1 Mtf1, Rad21

GR, p300 ERa-a, Nkx2, RAR

Motifs changed

GATA2

POL24H8, AP2ALPHA, AP2GAMMA, E2F6, CMYC

P300

ZNF263

Proteins bound (transcription factor binding site)

281 kb 5’ of PTGER4 ANKRD55 3.8 kb 5’ of TCF7 NDFIP1 LOC285626

16 kb 5’ of IL12A 1 kb 3’ of MANBA TET2 2.2 kb 3’ of IL7R

FOXP1 CBLB TIMMDC1 IQCB1 3.7 kb 5’ of CD86 CD86

17 kb 3’ of CCR4

305 kb 5’ of SATB1 866 bp 3’ of EOMES 205 kb 5’ of CMC1

CENPO 88 kb 3’ of ZFP36L2 FLJ16341 4.8 kb 5’ of PLEK MERTK STAT4 SP140

PHGDH FCRL1 SLAMF7 3.4 kb 5’ of RGS1 C1orf106

36 kb 3’ of VCAM1 SLC30A7 CD58

MMEL1 PLEKHG5 4.4 kb 5’ of BCL10 DDAH1 EVI5

Gene

Intronic Intronic

Intronic

Intronic

5’-UTR

Intronic Intronic Synonymous Intronic

Downstream

Intronic Intronic Intronic

Intronic

Intronic

Intronic

Intronic Intronic Intronic

Intronic Intronic

Intronic 3’-UTR

Intronic Intronic

Gene location

A role for T-cell dysregulation in MS pathogenesis FBS Briggs et al


47702395 60793330

11 11

118724894 118755738 6440009 6503500 9905690 58182062

81048611 94481917

10 10

11 11 12 12 12 12

128815029 129158945 5893861 6099045 31415106 75658349

8 8 9 10 10 10

118566746

50325567 149289464 79575804 128192981

7 7 8 8

11

36375304 90976768 128278798 135739355 137452908 137962655 138244816 159470559 3113034 27014988 28172739 37382465

6 6 6 6 6 6 6 6 7 7 7 7

64097233

176788570 14719496

5 6

11

Base pair location

SNP ID

rs9736016 rs523604 rs1800693 rs12296430 rs11052877 rs201202118

rs533646

rs694739

rs7120737 rs34383631

rs1782645 rs7923837

rs4410871 rs759648 rs2150702 rs2104286 rs793108 rs2688608

rs201847125 rs354033 rs1021156 rs2456449

rs941816 rs72928038 rs802734 rs11154801 rs17066096 rs7769192 rs67297943 rs212405 rs1843938 rs706015 rs917116 rs60600003

rs4976646 rs17119

(Continued )

Chr

Table 1.

T A T G A A

C

A

A C

C G

T A G T C G

C G T A

G G A C A G T A G T T T

T G

Ref allele

A G C C G T

G

G

G T

T A

C C A, T C T T

T A C G

A A G A G A C T A G G G

C A

Alt allele

0.38 0.5 0.43 0.2 0.38 NA

0.33

0.4

0.15 0.39

0.37 0.39

0.7 0.31 0.48 0.22 0.48 0.55

NA 0.26 0.72 0.31

0.82 0.18 0.28 0.38 0.23 0.48 0.19 0.61 0.38 0.18 0.22 0.1

0.34 0.77

EUR freq

(1.08–1.16) (1.05–1.13) (1.10–1.22) (1.16–1.26) (1.06–1.13) (1.03–1.10)

(1.07–1.15) (1.00–1.07) (1.08–1.16) (1.06–1.14)

(1.08–1.18) (1.07–1.16) (0.99–1.06) (1.07–1.15) (1.10–1.18) (1.04–1.12) (1.07–1.16) (1.11–1.19) (1.05–1.12) (1.09–1.19) (1.07–1.16) (1.10–1.22)

1.10 1.09 1.14 1.14 1.10 1.14

(1.07–1.14) (1.05–1.13) (1.11–1.18) (1.09–1.18) (1.07–1.14) (1.10–1.18)

1.10 (1.06–1.14)

1.08 (1.04–1.11)

1.13 (1.08–1.18) 1.11 (1.07–1.15)

1.09 (1.05–1.13) 1.11 (1.07–1.14)

1.12 1.09 1.16 1.21 1.09 1.07

1.11 1.03 1.12 1.10

1.13 1.11 1.03 1.11 1.14 1.08 1.12 1.15 1.08 1.14 1.12 1.16

1.13 (1.09–1.17) 1.11 (1.06–1.15)

Published odds ratio (95% CI)a

POL2 POL2

BATF, BCL11A, EBF1, IRF4, MEF2C, P300, PAX5N19, SP1, TCF12, NF-kB, YY1

CEBPB

STAT3

STAT3, P300 CTCF, BATF, EBF1, ELF1, POL2, POL24H8, PU1, RAD21, SIN3AK20, SMC3, TAF1, TBP, YY1, NF-kB, CMYC, MAFK, MAX

CTCF

Proteins bound (transcription factor binding site)

Evi-1, FAC1, Foxa, Foxk1, Foxp1, HDAC2, Irf, Mef2, Nanog, Sox, TATA, Zfp105, p300

CCNT2, Cart1, GR, Nanog, Nkx2 HNF1, Maf ERa-a Ncx, Spz1, Zfp410

CEBPB Foxd3, Foxf1, Foxi1, Foxq1 HNF4, Mrg1::Hoxa9, PPAR, RAR, RXR::LXR, RXRA, TR4 NRSF, p300 Foxd1, Foxf2, Foxj1, Foxk1, Foxo, Hmbox1 Zbtb12 AP-4, CTCF, LBP-1, Lmo2-complex, Nanog AP-1, ERa-a, Egr-1, Myf, Pou2f2, TBX5 STAT, TCF12

HDAC2, Nanog, p300 STAT Cdx2, Hoxa10, Hoxb9, Nanog Evi-1, Foxa, Foxd3, Foxp1, HDAC2, Irf, Sox, Zfp105, p300 Hsf, NF-AT, STAT Ets

Nkx2 Irx, Myc HNF1, Irf, Pax-6 Znf143, p300 Ik-3, NF-kB Gfi1, Gfi1b, RREB-1 Evi-1, HNF1, Sox, TATA Rad21

BCL, GATA, PU.1, Pax-5 AP-2rep, Cdx2, Crx, Hoxc9, Obox3, Pitx2, Pitx3 Hdx Ets, FEV, STAT

Motifs changed

30 kb 5’ of CXCR5 CXCR5 TNFRSF1A 2.8 kb 3’ of LTBR CD69 TSFM

16 kb 5’ of TREH

7.9 kb 3’ of PRDX5

AGBL2 5.5 kb 3’ of CD6

ZMIZ1 27 kb 3’ of HHEX

PVT1 (non-coding RNA) 3.4 kb 5’ of MIR1208 MLANA IL2RA 94 kb 5’ of ZNF438 11 kb 3’ of C10orf55

19 kb 5’ of IKZF1 ZNF767 2.5 kb 5’ of FAM164A 235 kb 5’ of POU5F1B

PXT1 BACH2 11 kb 3’ of PTPRK AHI1 12 kb 3’ of IL22RA2 147 kb 5’ of OLIG3 40 kb 3’ of TNFAIP3 4.4 kb 5’ of TAGAP 30 kb 5’ of CARD11 111 kb 5’ of SKAP2 JAZF1 ELMO1

RGS14 527 kb 5’ of JARID2

Gene

3’–UTR Intronic

Intronic Intronic

Intronic

Intronic

Intronic Intronic

Intronic

Intronic Intronic

Intronic

Intronic Intronic

Intronic

Gene location


123593382 100086259 69261472

75961511

88432328

103263788 79207466 90977333 1073552 11194771 11288806 11435990 30156963

68685905 79110596 79649394 85994484 37912377

40530763 45597098 57816757 56384192 6668972 10463118

10742170 16505106 18285944 49870643 44747947

48438761 52791518 62373983

62409713 22131125 50966914

12 13 14

14

14

14 15 15 16 16 16 16 16

16 16 16 16 17

17 17 17 18 19 19

19 19 19 19 20

20 20 20

20 22 22

rs6062314 rs2283792 rs470119

rs17785991 rs2248359 rs2256814

rs2288904 rs1870071 rs11554159 rs8107548 rs4810485

rs4796791 rs4794058 rs8070345 rs7238078 rs1077667 rs34536443

rs1886700 rs12149527 rs7196953 rs35929052 rs12946510

rs12148050 rs59772922 rs8042861 rs2744148 rs12927355 rs4780346 rs6498184 rs7204270

rs74796499

rs4903324

rs7132277 rs4772201 rs2236262

SNP ID

C T T

T C G

A T G T T

T C T T C G

C C A C C

A T G A C G T C

C

C

C A A

Ref allele

T G C

A T A

G C A C G

C T C G T C

T T G T T

G C T G T A C T

A

T

T G G

Alt allele

0.92 0.51 0.6

0.34 0.4 0.18

0.77 0.31 0.27 0.23 0.75

0.62 0.53 0.55 0.21 0.24 0.03

0.16 0.45 0.69 0.11 0.47

0.65 0.19 0.46 0.16 0.33 0.28 0.85 0.56

0.05

0.21

0.2 0.18 0.5

EUR freq

(1.09–1.19) (1.08–1.16) (1.11–1.20) (1.05–1.13) (1.04–1.12)

(1.06–1.14) (1.04–1.11) (1.11–1.18) (1.02–1.10) (1.12–1.21) (1.18–1.40)

(1.06–1.16) (1.05–1.12) (1.04–1.12) (1.09–1.20) (1.04–1.11)

(1.04–1.11) (1.06–1.15) (1.05–1.12) (1.04–1.13) (1.17–1.26) (1.05–1.13) (1.10–1.21) (1.06–1.13)

1.10 (1.03–1.16) 1.08 (1.05–1.12) 1.07 (1.03–1.10)

1.09 (1.05–1.13) 1.07 (1.03–1.10) 1.11 (1.07–1.16)

1.14 1.12 1.15 1.09 1.08

1.10 1.07 1.14 1.05 1.16 1.28

1.11 1.08 1.08 1.14 1.08

1.08 1.11 1.08 1.09 1.21 1.09 1.15 1.09

1.31 (1.21–1.42)

1.10 (1.05–1.14)

1.10 (1.06–1.15) 1.12 (1.07–1.17) 1.08 (1.04–1.11)

Published odds ratio (95% CI)a

POL2

GR, TCF4

NF-kB, MEF2A, MEF2C, PU1

POL2

POL2, CMYC, MAX

NF-kB, BCL11A, EBF1, MEF2A, MEF2C, P300, PAX5C20, POL2, POL24H8, SP1, TBP, YY1, TAF1

POL2, TAF1, JUND, STAT3, TBP, CMYC, POL24H8, POL2B SRF, YY1, CFOS, USF2, USF1

Proteins bound (transcription factor binding site)

Hbp1, Hoxc9, Nrf1, Obox6, Pdx1, Vax2

Ascl2, CTCFL, CTCF, E2A, HEN1, Myf, Rad21, SMC3 GR

Foxa, Sox Glis2 Barhl1, GR COMP1, Irf, Pou5f1, Sox p53 BDP1, Glis2, HEY1, NRSF, Sin3Ak-20, ZBTB7A, Zfp161, Zic GR, NRSF, YY1 EWSR1-FLI1 NF-I CHOP::CEBPa STAT

DMRT5, HP1-site-factor, Pax-6, Pou6f1 AP-1, NF-Y, Pbx3, RFX5, SP2 Brachyury, MAZR, ZBTB7A, Zfp740 SRF SIX5 Foxo, Foxp1, HNF1, Irf, Mef2, Pax-4, Sox

AP-1, EWSR1-FLI1, Elf3, Elf5, GATA, HMG-IY, Irf, Lhx4 AP-4, Rad21 TCF12 Foxa, Foxd1 Egr-1, Myc, TATA Mef2 AP-1, COMP1, NF-E2

Gm397, Pax-6

Hic1 HNF1, NF-AT, Pax-4 GATA, Pou1f1, SIX5, STAT, Zfp105, Znf143

Motifs changed

ZBTB46 MAPK1 TYMP

SLC9A8 1 kb 5’ of CYP24A1 SLC2A4RG

SLC44A2 EPS15L1 IFI30 DKKL1 CD40

STAT3 11 kb 5’ of NPEPPS VMP1 MALT1 TNFSF14 TYK2

CDH3 WWOX 15 kb 5’ of MAF 38 kb 3’ of IRF8 8.8 kb 3’ of IKZF3

TRAF3 6.6 kb 3’ of CTSH IQGAP1 37 kb 3’ of SOX8 CLEC16A 13 kb 3’ of CLEC16A 3.3 kb 5’ of RMI2 22 kb 5’ of MAPK3

GALC

22 kb 3’ of JDP2

PITPNM2 28 kb 5’ of MIR548AN ZFP36L1

Gene

Intronic Intronic Intronic

Intronic

Intronic

Missense Intronic Missense Intronic Intronic

Intronic Intronic Intronic Missense

Intronic

Intronic Intronic

Intronic

Intronic

Intronic

Intronic

Intronic

Intronic

Gene location

Abbreviations: Chr, chromosome; CI, confidence interval; MHC, major histocompatibility complex; MS, multiple sclerosis; NA, not available; SNP, single-nucleotide polymorphism; UTR, untranslated region. a Odds ratio and 95% confidence intervals reported in 14 498 MS cases and 24 091 healthy controls.5

Base pair location

(Continued )

Chr

Table 1.


2525665 6530189 85915183 92975464 101407519 117080166 120258970 157770241 160711804 200874728 25017860 61095245 112665201 191974435 231115454 27757018 71530346 105558837 119222456 121543577 121796768 106173199 55440730 141506564 158759900 176788570 36375304 90976768 135739355 28172739 37382465 149289464 5893861 6099045 81048611 47702395 118755738 6440009 9905690 58182062 123593382 69261472 88432328 103263788 90977333 11194771 68685905 79110596 40530763 57816757 56384192 6668972 10463118 10742170 16505106 18285944 49870643 44747947 48438761 62373983 62409713 22131125 50966914

MMEL1 PLEKHG5 DDAH1 EVI5 SLC30A7 CD58 PHGDH FCRL1 SLAMF7 C1orf106 CENPO FLJ16341 MERTK STAT4 SP140 EOMES FOXP1 CBLB TIMMDC1 IQCB1 CD86 TET2 ANKRD55 NDFIP1 LOC285626 RGS14 PXT1 BACH2 AHI1 JAZF1 ELMO1 ZNF767 MLANA IL2RA ZMIZ1 AGBL2 CXCR5 TNFRSF1A CD69 TSFM PITPNM2 ZFP36L1 GALC TRAF3 IQGAP1 CLEC16A CDH3 WWOX STAT3 VMP1 MALT1 TNFSF14 TYK2 SLC44A2 EPS15L1 IFI30 DKKL1 CD40 SLC9A8 SLC2A4RG ZBTB46 MAPK1 TYMP

Gene

rs3748817 rs3007421 rs11587876 rs41286801 rs11581062 rs6677309 rs666930 rs2050568 rs35967351 rs55838263 rs4665719 rs842639 rs17174870 rs9967792 rs9989735 rs2371108 rs9828629 rs2028597 rs1131265 rs1920296 rs9282641 rs2726518 rs71624119 rs7737631 rs2546890 rs4976646 rs941816 rs72928038 rs11154801 rs917116 rs60600003 rs354033 rs2150702 rs2104286 rs1782645 rs7120737 rs523604 rs1800693 rs11052877 rs12368653 rs7132277 rs2236262 rs74796499 rs12148050 rs8042861 rs12927355 rs1886700 rs12149527 rs4796791 rs8070345 rs7238078 rs1077667 rs34536443 rs2288904 rs1870071 rs11554159 rs8107548 rs4810485 rs17785991 rs2256814 rs6062314 rs2283792 rs470119

SNP used

2524205 6529443 85915149 92972933 101391432 117079262 120255578 157736437 160708150 200871843 25015074 61093035 112660424 191974332 231110737 27756233 71526152 105554066 119203587 121494142 121786069 106151843 55435796 141495917 158759531 176788570 36359879 90951239 135736276 28171832 37372614 149287037 no block 6098949 81048280 47700540 no block 6439470 9904075 58151295 123554604 69260290 88415845 103244990 90966084 11191572 no block 79109071 40529835 57813824 56382716 6668545 10458633 10737988 16504516 18282590 49864497 44746403 48429682 62372005 62408472 22128678 50964862

Block start position

42.28 3.95 2.95 4.34 27.14 5.15 3.39 39.49 11.09 6.89 3.93 40.76 7.58 23.62 9.98 0.79 11.99 47.55 24.92 78.11 17.08 49.84 7.68 33.55 0.37 8.77 31.36 48.41 4.81 0.91 9.91 20.06 2.18 0.33 6.62 0 0.54 5.88 62.19 43.27 12.80 37.43 31.87 20.32 7.88 1.97 6.01 6.53 4.55 1.39 10.04 5.17 5.76 12.33 25.30 2.85 20.26 2.39 5.95 2.45 2.05

6101129 81048611 47707156 6440009 9909956 58213485 123597877 69273090 88453278 103276855 90986403 11199447 79111043 40535845 57820356 56387260 6669934 10468668 10743161 16510279 18294923 49889798 44749251 48449940 62374389 62414424 22131125 50966914

Block width (kb)

2566482 6533393 85918101 92977275 101418569 117084410 120258970 157775921 160719238 200878727 25019001 61133793 112668005 191997946 231120719 27757018 71538137 105601615 119228508 121572252 121803149 106201684 55443479 141529469 158759900 176797343 36391236 90999652 135741087 28172739 37382520 149307098

Block end position

12 0 0 189 0 0 0 33 0 0 0 0 0 9 0 0 0 0 14 33 0 406 0 0 0 0 0 0 0 0 0 0 0 0 0 9 0 10 15 0 0 12 93 0 0 0 0 0 0 6 0 0 33 39 0 9 5 0 0 5 0 0 0

3′ UTR variant

5 0 0 0 0 0 0 0 1 0 28 0 0 0 0 0 0 12 3 23 7 3 10 0 0 0 0 6 0 0 6 0 0 0 0 0 0 0 0 0 13 33 3 4 11 0 0 0 0 8 0 0 20 29 0 2 6 8 8 0 0 0 0

5′ UTR variant

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7 0 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 36 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9

Coding sequence variant 0 0 0 2 35 0 0 0 0 0 0 13 0 1 1 0 0 0 0 2 0 115 0 2 0 1 0 0 0 0 0 0 0 0 0 10 0 0 1 0 2 0 5 0 5 0 0 0 0 0 0 0 3 11 0 1 0 0 2 1 0 0 0

Feature elongation

1 0 0 0 12 0 0 0 0 0 0 13 0 0 0 0 0 0 0 2 0 185 0 8 0 1 0 0 0 0 0 0 0 0 0 9 0 0 3 0 2 3 3 0 2 0 0 0 0 0 0 0 1 1 0 0 1 1 1 4 0 0 1

Feature truncation

0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 265 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 3 0 0 0 0 0 0 1 0 0 0

Frameshift variant 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Inframe deletion

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Inframe insertion

92 56 0 0 0 0 0 69 14 14 11 0 0 1 22 0 0 10 20 53 0 646 4 5 0 17 3 0 0 0 2 0 0 0 0 6 0 7 19 0 0 15 84 0 23 0 0 0 0 5 2 5 59 28 7 42 42 5 6 42 0 0 29

Missense variant

4 0 0 0 0 0 0 0 0 27 0 0 0 0 0 0 0 0 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 1 0 0 0 0 0 0 2 0 0 1 2 0 0 0 0 0 0 0 0 0

NMD transcript variant 70 78 0 0 0 0 0 148 22 0 21 0 0 0 0 0 0 0 17 4 7 0 0 59 1 51 0 6 0 0 10 0 0 0 0 12 0 13 34 0 5 0 104 0 12 0 0 0 0 0 0 0 97 33 0 0 0 1 10 94 0 0 48

Noncoding exon variant

1 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 1 0 1 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 1

Splice acceptor variant 1 0 0 0 0 0 0 3 0 0 0 0 0 0 1 0 0 0 1 1 0 10 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 2 1 0 0 0 0 0 1

Splice donor variant 16 9 0 0 0 0 0 9 2 6 5 0 0 0 1 0 0 0 4 8 1 12 2 3 0 6 2 1 0 0 0 0 0 0 0 1 0 1 0 0 1 0 14 0 7 0 0 0 0 3 1 2 10 10 0 8 7 0 1 11 0 0 8

Splice region variant 4 1 0 0 0 0 0 2 0 0 0 0 0 1 0 0 0 0 1 3 0 138 0 0 0 2 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 1 0 0 0 2 0 0 0 0 0 0

Stop gained

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0

Stop lost

39 28 0 0 0 0 0 35 6 6 0 0 0 0 4 0 0 2 14 15 0 72 6 17 0 13 1 0 0 0 2 0 0 0 0 7 0 6 9 0 0 5 19 0 5 0 0 0 0 4 0 4 25 18 3 13 11 0 2 23 0 0 9

Synonymous variant

0 0 28 3 11 254 173 10 78 75 15 30 181 0 0 106

37 81 0 23 69 368 4 67 0

0 0 58

245 172 0 191 47 0 0 300 45 53 65 26 0 12 31 0 0 25 77 153 15 1868 22 94 1 91 7 13 0 0 20 0

Total

Abbreviations: Chr, chromosome; LD, linkage disequilibrium; MHC, major histocompatibility complex; MS, multiple sclerosis; NMD, nonsense-mediated mRNA decay; SNP, single-nucleotide polymorphism; UTR, untranslated region.

rs3748817 rs3007421 rs11587876 rs41286801 rs11581062 rs6677309 rs666930 rs2050568 rs35967351 rs55838263 rs4665719 rs842639 rs17174870 rs9967792 rs9989735 rs2371108 rs9828629 rs2028597 rs1131265 rs1920296 rs9282641 rs2726518 rs71624119 none rs2546890 rs4976646 rs941816 rs72928038 rs11154801 rs917116 rs60600003 rs354033 rs2150702 rs2104286 rs1782645 rs7120737 rs523604f rs1800693 rs11052877 rs201202118 rs7132277 rs2236262 rs74796499 rs12148050 rs8042861 rs12927355 rs1886700 rs12149527 rs4796791 rs8070345 rs7238078 rs1077667 rs34536443 rs2288904 rs1870071 rs11554159 rs8107548 rs4810485 rs17785991 rs2256814 rs6062314 rs2283792 rs470119

1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 3 4 5 5 5 5 6 6 6 7 7 7 9 10 10 11 11 12 12 12 12 14 14 14 15 16 16 16 17 17 18 19 19 19 19 19 19 20 20 20 20 22 22

Base position

Functional assignments within the LD block boundaries for the 63 genic non-MHC MS-risk SNPs

SNP ID

Chr

Table 2.


Table 3. Overrepresented pathways among the 63 non-MHC MS-associated genes

Pathway | ID | C | O | E | R | P-value | Genes

KEGG
Toxoplasmosis | 5145 | 132 | 5 | 0.19 | 25.93 | 2.61E-05 | CD40, STAT3, MAPK1, TYK2, TNFRSF1A
Hepatitis C | 5160 | 134 | 5 | 0.2 | 25.54 | 2.61E-05 | TRAF3, STAT3, MAPK1, TYK2, TNFRSF1A
Jak–STAT signaling | 4630 | 155 | 5 | 0.23 | 22.08 | 3.56E-05 | IL2RA, STAT3, TYK2, STAT4, CBLB
Toll-like receptor signaling | 4620 | 102 | 4 | 0.15 | 26.85 | 0.0001 | TRAF3, CD40, MAPK1, CD86
Cell adhesion molecules | 4514 | 133 | 4 | 0.19 | 20.59 | 0.0002 | CDH3, CD40, CD86, CD58
Cytokine–cytokine receptor interaction | 4060 | 265 | 5 | 0.39 | 12.92 | 0.0002 | CXCR5, IL2RA, TNFSF14, CD40, TNFRSF1A
Chemokine signaling | 4062 | 189 | 4 | 0.28 | 14.49 | 0.0009 | CXCR5, STAT3, MAPK1, ELMO1
T-cell receptor signaling | 4660 | 108 | 3 | 0.16 | 19.02 | 0.002 | MAPK1, CBLB, MALT1
Osteoclast differentiation | 4380 | 128 | 3 | 0.19 | 16.04 | 0.0032 | MAPK1, TYK2, TNFRSF1A
Pathways in cancer | 5200 | 326 | 4 | 0.48 | 8.4 | 0.0041 | TRAF3, STAT3, MAPK1, CBLB

Wikipathways pathway
Inflammatory Response | WP453 | 32 | 4 | 0.05 | 85.57 | 5.90E-06 | IL2RA, CD40, TNFRSF1A
IL-12 signaling | WP2111 | 11 | 3 | 0.02 | 186.7 | 9.96E-06 | STAT3, TYK2, STAT4
IL-3 signaling | WP286 | 54 | 4 | 0.08 | 50.71 | 1.02E-05 | STAT3, MAPK1, CD86, CD69
Type III interferon signaling | WP2113 | 14 | 3 | 0.02 | 146.69 | 1.02E-05 | STAT3, TYK2, STAT4
TSLP signaling | WP2203 | 49 | 4 | 0.07 | 55.88 | 1.02E-05 | IL2RA, STAT3, MAPK1, STAT4
IL-7 signaling | WP205 | 25 | 3 | 0.04 | 82.15 | 4.56E-05 | IL2RA, STAT3, MAPK1
IL-17 signaling | WP2112 | 38 | 3 | 0.06 | 54.04 | 0.0001 | TRAF3, STAT3, MAPK1
Toll-like receptor signaling | WP75 | 116 | 4 | 0.17 | 23.61 | 0.0001 | TRAF3, CD40, MAPK1, CD86
Interleukin-11 signaling | WP2332 | 49 | 3 | 0.07 | 41.91 | 0.0002 | STAT3, MAPK1, TYK2
IL-2 signaling | WP49 | 53 | 3 | 0.08 | 38.75 | 0.0003 | IL2RA, STAT3, MAPK1

Pathway Commons pathway
IL-12-mediated signaling events | DB_ID:1633 | 113 | 8 | 0.17 | 48.46 | 7.23E-10 | EOMES, CD86, STAT4, TNFRSF1A, MALT1, IL2RA, STAT3, TYK2
TCR signaling in CD8+ T cells | DB_ID:1521 | 129 | 7 | 0.19 | 37.15 | 5.34E-08 | IL2RA, EOMES, MAPK1, CD86, STAT4, TNFRSF1A, MALT1
Insulin pathway | DB_ID:1466 | 1288 | 13 | 1.88 | 6.91 | 1.46E-07 | EOMES, ELMO1, STAT4, IQGAP1, TNFRSF1A, CBLB, MALT1, ZFP36L1, IL2RA, ZMIZ1, STAT3, MAPK1, TYK2
CD40/CD40L signaling | DB_ID:1479 | 58 | 5 | 0.08 | 59.01 | 1.46E-07 | TRAF3, CD40, TNFRSF1A, CBLB, MALT1
Nectin adhesion | DB_ID:1472 | 1295 | 13 | 1.89 | 6.87 | 1.46E-07 | EOMES, ELMO1, STAT4, IQGAP1, TNFRSF1A, CBLB, MALT1, ZFP36L1, IL2RA, ZMIZ1, STAT3, MAPK1, TYK2
ErbB1 downstream signaling | DB_ID:1602 | 1288 | 13 | 1.88 | 6.91 | 1.46E-07 | EOMES, ELMO1, STAT4, IQGAP1, TNFRSF1A, CBLB, MALT1, ZFP36L1, IL2RA, ZMIZ1, STAT3, MAPK1, TYK2
Syndecan-1-mediated signaling events | DB_ID:1454 | 1300 | 13 | 1.9 | 6.85 | 1.46E-07 | EOMES, ELMO1, STAT4, IQGAP1, TNFRSF1A, CBLB, MALT1, ZFP36L1, IL2RA, ZMIZ1, STAT3, MAPK1, TYK2
Signaling events mediated by VEGFR1 and VEGFR2 | DB_ID:1516 | 1296 | 13 | 1.89 | 6.87 | 1.46E-07 | EOMES, ELMO1, STAT4, IQGAP1, TNFRSF1A, CBLB, MALT1, ZFP36L1, IL2RA, ZMIZ1, STAT3, MAPK1, TYK2
Class I PI3K signaling events | DB_ID:1553 | 1288 | 13 | 1.88 | 6.91 | 1.46E-07 | EOMES, ELMO1, STAT4, IQGAP1, TNFRSF1A, CBLB, MALT1, ZFP36L1, IL2RA, ZMIZ1, STAT3, MAPK1, TYK2
GMCSF-mediated signaling events | DB_ID:1461 | 1292 | 13 | 1.89 | 6.89 | 1.46E-07 | EOMES, ELMO1, STAT4, IQGAP1, TNFRSF1A, CBLB, MALT1, ZFP36L1, IL2RA, ZMIZ1, STAT3, MAPK1, TYK2

Abbreviations: C, number of reference genes in the category; E, expected number in the category; GMCSF, granulocyte macrophage colony-stimulating factor; IL, interleukin; Jak, Janus kinase; KEGG, Kyoto Encyclopedia of Genes and Genomes; MHC, major histocompatibility complex; MS, multiple sclerosis; O, number of genes in the candidate gene list and within the category; PI3K, phosphatidylinositol 3-kinase; R, ratio of enrichment; SNP, single-nucleotide polymorphism; STAT, signal transducer and activator of transcription; TCR, T-cell receptor; VEGFR, vascular endothelial growth factor receptor.

cells (H1 and K562, P < 1 × 10−8); B lymphoblastocytes (GM12878; P = 2.4 × 10−5) and epidermal keratinocytes (NHEK; P = 5.3 × 10−5) (Supplementary Table 1). These variants were also more likely to reside within DNase sites in multiple cell lines, including CD20+ B cells (P = 6.4 × 10−4), T helper cells type 1 (P = 2.5 × 10−4), CD4+ T helper cells type 0 (P = 0.0023) and several B lymphoblastocytes (Supplementary Table 1).

To explore candidate functional variants within the 63 MS-associated genes, we cataloged all functional assignments (1000 Genomes release 13 December 2012 EBI; Supplementary Table 2) within the LD block boundaries containing the associated risk variant. LD blocks were constructed using the genotypes for Northern Europeans available in the 1000 Genomes Phase 3 integrated variant set (March 2012)10 (see Materials and methods). To start, there were over 208 000 functional assignments within the 63 MS risk genes (Supplementary Table 3); when excluding intronic and noncoding transcript annotations, there were 81 311 functional assignments. Many SNPs possess multiple functional assignments because of transcript variation; for example, the TYK2 variant rs140324156 is annotated as synonymous, splice region, noncoding exon and 3′ UTR variant (Supplementary Table 4). There were no LD blocks spanning the risk variants within CDH3, CXCR5 and MLANA (Table 2). For the remaining 60 SNPs, the block sizes ranged from 332 base pairs (ZMIZ1) to 78.1 kb (IQCB1), with a median size of 7.8 kb and a mean size of 16.2 kb (s.d. = 17.6; Table 2).
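As a minimal sketch of how the block sizes quoted above relate to the block boundaries, the snippet below computes widths for a handful of LD blocks whose start and end positions are read from Table 2. The function name `block_width_kb` and the choice to count both boundary positions are illustrative assumptions, not the authors' code; the summary statistics cover only this five-block subset and so will differ from the values reported for all 60 blocks.

```python
from statistics import mean, median

# (gene, block start, block end) for a few rows as listed in Table 2
blocks = [
    ("MMEL1", 2524205, 2566482),
    ("IQCB1", 121494142, 121572252),
    ("IL2RA", 6098949, 6101129),
    ("ZMIZ1", 81048280, 81048611),
    ("TNFRSF1A", 6439470, 6440009),
]


def block_width_kb(start, end):
    """Width of an LD block in kb, counting both boundary positions (an assumption)."""
    return (end - start + 1) / 1000


# Per-gene widths rounded to two decimals, as in the Table 2 "Block width (kb)" column
widths = {gene: round(block_width_kb(start, end), 2) for gene, start, end in blocks}

# Summary statistics for this illustrative subset only
subset_median = median(widths.values())
subset_mean = mean(widths.values())

for gene, width in widths.items():
    print(f"{gene}: {width} kb")
```

With the inclusive convention, the ZMIZ1 block spans 332 positions, which agrees with the smallest block size given in the text.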

There were functional variants within the LD block boundaries encompassing 43 risk SNPs; for 17 risk SNPs there were no functional variants within the LD block boundaries (Table 2). The total number of functional assignments across these 43 genes was almost 116 000; when removing intronic and noncoding transcript annotations, there were 52 439 functional assignments (Figure 1a). However, when focusing on variation within the LD blocks alone, there were 2816 unique SNPs (Supplementary Table 4) with 5244 functional assignments (Figure 1b), which included: 932 3′ UTR SNPs, 249 5′ UTR SNPs, 59 coding sequence SNPs, 213 feature elongation variants, 254 feature truncation variants, 272 frameshift variants, 3 inframe deletions, 2 inframe insertions, 42 nonsense-mediated mRNA decay transcript variants, 957 noncoding exon variants, 13 splice acceptor variants, 24 splice donor variants, 172 splice region variants, 161 stop gained variants, 3 stop lost variants, 423 synonymous SNPs and 1465 missense SNPs (Table 2). Over 35% of all functional assignments were located within the 49.8-kb LD block encompassing TET2 rs2726518. There was a greater percentage of missense assignments among the 2816 unique SNPs (28%), compared with 21% among all assignments (not including intronic and noncoding transcript annotations). A similar increase was observed for 3′ UTR assignments (from 10% to 18%) and frameshifts (from 3% to 5%), when comparing the 52 439 assignments across the 43 genes to the 5244 assignments within the 43 LD blocks (Figure 1). A major interest in human genetics is to distinguish missense polymorphisms that are functionally neutral from those that may contribute to disease.
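The assignment counts listed above can be tallied to check the stated total and percentages. The following is a small sketch that uses only the numbers given in the text; the dictionary keys and the helper name `percent` are illustrative labels rather than terms from any annotation tool.

```python
# Functional assignment counts within the LD blocks, as listed in the text
assignments = {
    "3' UTR": 932,
    "5' UTR": 249,
    "coding sequence": 59,
    "feature elongation": 213,
    "feature truncation": 254,
    "frameshift": 272,
    "inframe deletion": 3,
    "inframe insertion": 2,
    "NMD transcript": 42,
    "noncoding exon": 957,
    "splice acceptor": 13,
    "splice donor": 24,
    "splice region": 172,
    "stop gained": 161,
    "stop lost": 3,
    "synonymous": 423,
    "missense": 1465,
}

total = sum(assignments.values())


def percent(category):
    """Share of all LD-block functional assignments in one category, in percent."""
    return 100 * assignments[category] / total


print(total)
for category in ("missense", "3' UTR", "frameshift"):
    print(category, round(percent(category)))
```

The categories sum to the 5244 assignments reported, and the missense, 3′ UTR and frameshift shares round to the 28%, 18% and 5% quoted in the text.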
Using protein prediction algorithms, 816 missense variants were predicted to be ‘deleterious’ and/or ‘probably damaging’ for at least one gene transcript by SIFT (sorting intolerant from tolerant)11 and PolyPhen (polymorphism phenotyping),12 respectively, of which 373 missense variants were both ‘deleterious’ and ‘probably damaging’ (Supplementary Table 4; see Materials and methods). For the three missense MS risk variants, the predicted impact on protein function was: TYK2 rs34536443: ‘deleterious’ and ‘probably damaging’; SLC44A2 rs2288904: ‘tolerated’ and ‘benign’; and IFI30 rs11554159: ‘deleterious’ and ‘possibly damaging’. Functional classification of the 63 MS-associated genes was conducted using the Web-based Gene Set Analysis Toolkit (WebGestalt; updated January 2013; http://bioinfo.vanderbilt.edu/webgestalt/).13 Compared with all human genes, there was an enrichment of several KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways (Table 3), including toxoplasmosis (P = 2.6 × 10⁻⁵), hepatitis C (P = 2.6 × 10⁻⁵), Janus kinase (JAK)–STAT signaling pathway (P = 3.6 × 10⁻⁵), Toll-like receptor signaling pathway (P = 1 × 10⁻⁴), cell adhesion molecules (P = 2 × 10⁻⁴) and T-cell receptor (TCR) signaling (P = 5 × 10⁻⁴). The Pathway Commons analysis determined that interleukin (IL)-12-mediated signaling (P = 7.2 × 10⁻¹⁰), TCR signaling (P = 5.3 × 10⁻⁸), insulin pathway (P = 1.5 × 10⁻⁷), CD40/CD40L signaling (P = 1.5 × 10⁻⁷) and several pathways that shared a common gene set (P = 1.5 × 10⁻⁷; nectin adhesion, ErbB1 downstream, syndecan-1-mediated, vascular endothelial growth factor receptor 1/2-mediated, class I phosphatidylinositol 3-kinase and granulocyte macrophage colony-stimulating factor-mediated signaling pathways) were enriched among the MS genes (Table 3).
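The SIFT and PolyPhen labels reported for the missense variants follow the score cut-offs given in the Materials and methods (SIFT at or below 0.05 for ‘deleterious’; PolyPhen above 0.905 for ‘probably damaging’). A minimal sketch of how the ‘and/or’ and ‘both’ counts could be tallied is shown below. It assumes that a variant is flagged when any one of its transcripts crosses the cut-off; the function names, variant labels and scores are hypothetical and are not taken from Supplementary Table 4.

```python
SIFT_DELETERIOUS_MAX = 0.05    # SIFT score <= 0.05 is called 'deleterious'
POLYPHEN_PROBABLY_MIN = 0.905  # PolyPhen score > 0.905 is called 'probably damaging'


def flag_variant(transcript_scores):
    """Flag one variant from its per-transcript (sift, polyphen) score pairs.

    Assumption: a variant is flagged if at least one transcript crosses
    the cut-off, mirroring 'for at least one gene transcript' in the text.
    """
    deleterious = any(s <= SIFT_DELETERIOUS_MAX for s, _ in transcript_scores)
    probably_damaging = any(p > POLYPHEN_PROBABLY_MIN for _, p in transcript_scores)
    return deleterious, probably_damaging


def summarize(variants):
    """Count variants flagged by either tool and by both tools."""
    either = 0
    both = 0
    for scores in variants.values():
        deleterious, probably_damaging = flag_variant(scores)
        if deleterious or probably_damaging:
            either += 1
        if deleterious and probably_damaging:
            both += 1
    return {"either": either, "both": both}


# Hypothetical variants and scores, for illustration only.
example = {
    "var_A": [(0.01, 0.99)],
    "var_B": [(0.20, 0.95), (0.30, 0.10)],
    "var_C": [(0.50, 0.20)],
    "var_D": [(0.05, 0.905)],
}
print(summarize(example))
```

Whether ‘both’ should require the two calls on the same transcript is not stated in the text; the sketch evaluates each tool at the variant level.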
Using the WikiPathways database, genes involved in inflammatory response (P = 5.9 × 10⁻⁶) and IL-2, IL-3, IL-7, IL-11, IL-12 and IL-17 signaling pathways (P < 5 × 10⁻³) were overrepresented (Table 3). There was an enrichment of microRNA targets within the 63 MS-associated genes, including targets for miR-182, miR-524, miR-122A, miR-506, miR-29A, miR-29B, miR-29C, miR-181A, miR-181B, miR-181C, miR-181D and miR-124A (P < 5 × 10⁻³; Table 4). There were several transcription factor binding targets enriched among these genes; the most common were for NFAT, PAX4, Oct-1, NF-κB, Chx10, ETS1, LEF1, Oct, ELK1 and AP-4 (P < 5 × 10⁻³; Table 4). Prednisone and immune globulins were among the drugs most strongly associated with these genes (P < 5 × 10⁻³; Table 4). Furthermore, there was an overrepresentation of several protein–protein interactions; the genes involved largely reflect the genes identified in enriched pathways (Table 4).

DISCUSSION

Characterization of the genetic component of MS has greatly advanced within the past decade, primarily because of international efforts and the completion of recent GWASs followed by replication and meta-analyses. The primary genetic risk locus for MS is within the MHC, and recent efforts have finally begun to unravel the complexities within the region.6 Thus far, 110 non-MHC disease-associated loci have been identified, although the functional pathological variants within these loci are largely unknown.4,5 We conducted a two-part bioinformatics analysis of these risk loci. First, all likely functional candidates within genes established through GWAS and follow-up studies were identified. Second, enrichment analyses were conducted to explore the underlying biology. Our results underscore the importance of cataloging likely functional candidates and assessing enrichment of various biological features as they relate to function and regulatory capacity of coding and noncoding variants. GWASs are hypothesis free and do not provide any explicit biological evidence relating associated SNPs to disease; only a few SNPs per LD region are usually interrogated.
As a result, it is unclear whether MS-associated SNPs are the causal variants; however, it is expected that the adjacent genomic regions in LD with the associated variant are likely to contain the biologically relevant polymorphisms.14 Elucidating the causal variants within these susceptibility loci requires extensive functional experiments in conjunction with comprehensive fine-mapping analyses in larger association studies; both are time intensive and expensive.15,16 The utilization of bioinformatics resources to identify and prioritize candidate SNPs that may be phenotypically relevant is an efficient and cost-effective approach. Available bioinformatics tools can also reveal putative biological processes and pathways not previously known to contribute to pathogenicity. Until recently, determining functionality was largely restricted to nonsynonymous SNPs, and whether the encoded amino acid change was a neutral or deleterious substitution in the resulting protein. However, currently available resources can be used to interrogate a role for both genic and intergenic variations in protein regulation, expressivity and function. Furthermore, enrichment analyses can identify new experimental targets, including microRNA and transcription factor binding sites. Results from bioinformatics analyses presented here emphasize that many MS risk variants are not located within the coding regions of genes, and are likely to participate in gene regulation and expressivity. A total of 84% of the 110 established MS risk variants are located within a regulatory motif, and 25% are within a transcription factor binding site. The results also suggest complex interactions among MS risk genes, including STAT3 with PVT1, JAZF1 and ZFP36L1. By focusing on variation within the LD block boundaries spanning 43 of the 63 genic MS risk SNPs, we identified 2816 SNPs as primary candidates for functional and fine-mapping investigations.
Within these genes, there were 52 439 functional assignments; however, there was a 90% reduction in assignments when restricting to the LD blocks (5244 assignments). Within this catalog, 28% of the assignments were missense annotations; there were also 3 stop lost variants, 161 stop gained variants, 209 splice variants and 277 structural insertions, deletions and frameshifts. Based on these results, fine-mapping and functional studies can be focused; for example, MALT1 contains two missense


Table 4. Overrepresented biological features, drug associations and protein–protein interactions among the 63 MS-associated genes

Biological feature | ID | C | O | E | R | P-value | Genes

Transcription target
hsa_TGGAAA_V$NFAT_Q4_01 | DB_ID:2437 | 1871 | 14 | 2.73 | 5.12 | 6.77E-05 | CXCR5, TYMP, EOMES, ELMO1, WWOX, IQCB1, CD86, TNFRSF1A, ZFP36L1, ZMIZ1, FOXP1, PHGDH, STAT3, SLC30A7
hsa_V$PAX4_02 | DB_ID:2065 | 233 | 6 | 0.34 | 17.63 | 0.0001 | PITPNM2, ZMIZ1, EOMES, ELMO1, STAT3, FOXP1
hsa_V$OCT1_Q5_01 | DB_ID:2306 | 265 | 5 | 0.39 | 12.92 | 0.0008 | TRAF3, IQCB1, FOXP1, CD86, STAT4
hsa_V$NFKB_Q6_01 | DB_ID:2259 | 231 | 5 | 0.34 | 14.82 | 0.0008 | CXCR5, IL2RA, CD40, ELMO1, CD69
hsa_TAATTA_V$CHX10_01 | DB_ID:2408 | 797 | 8 | 1.16 | 6.87 | 0.0008 | PITPNM2, BACH2, EOMES, ELMO1, CBLB, ZFP36L1, ZMIZ1, STAT3
hsa_V$ETS1_B | DB_ID:2057 | 255 | 5 | 0.37 | 13.42 | 0.0008 | RGS14, ELMO1, FOXP1, CBLB, SLC30A7
hsa_CTTTGT_V$LEF1_Q2 | DB_ID:2428 | 1939 | 12 | 2.83 | 4.24 | 0.0008 | TET2, BACH2, EOMES, EPS15L1, CDH3, C1orf106, ZFP36L1, ZMIZ1, NDFIP1, FOXP1, STAT3, SLC30A7
hsa_V$OCT_Q6 | DB_ID:2271 | 257 | 5 | 0.38 | 13.32 | 0.0008 | TRAF3, IQCB1, FOXP1, CD86, STAT4
hsa_V$ELK1_01 | DB_ID:1853 | 266 | 5 | 0.39 | 12.87 | 0.0008 | IFI30, ELMO1, TIMMDC1, STAT4, CBLB
hsa_CAGCTG_V$AP4_Q5 | DB_ID:2403 | 1502 | 10 | 2.19 | 4.56 | 0.001 | CXCR5, BACH2, EOMES, SLC44A2, CDH3, TRAF3, ZFP36L1, ZMIZ1, FOXP1, STAT3

MicroRNA target
hsa_TTGCCAA,MIR-182 | DB_ID:757 | 324 | 6 | 0.47 | 12.68 | 0.0004 | SLC44A2, ZFP36L1, BACH2, ELMO1, EVI5, JAZF1
hsa_CTTTGTA,MIR-524 | DB_ID:802 | 431 | 6 | 0.63 | 9.53 | 0.0007 | ZMIZ1, FOXP1, STAT4, JAZF1, IQGAP1, SLC30A7
hsa_ACACTCC,MIR-122A | DB_ID:719 | 69 | 3 | 0.1 | 29.76 | 0.0007 | BACH2, GALC, IQGAP1
hsa_GTGCCTT,MIR-506 | DB_ID:712 | 712 | 7 | 1.04 | 6.73 | 0.0007 | SLC44A2, ZFP36L1, BACH2, EVI5, STAT3, IQGAP1, SLC30A7
hsa_TGGTGCT,MIR-29A,MIR-29B,MIR-29C | DB_ID:671 | 515 | 6 | 0.75 | 7.98 | 0.0007 | IFI30, ZFP36L1, PITPNM2, BACH2, TNFRSF1A, ZBTB46
hsa_TGAATGT,MIR-181A,MIR-181B,MIR-181C,MIR-181D | DB_ID:669 | 479 | 6 | 0.7 | 8.57 | 0.0007 | ZFP36L1, BACH2, MAPK1, FOXP1, JAZF1, CBLB
hsa_TGCCTTA,MIR-124A | DB_ID:811 | 542 | 6 | 0.79 | 7.58 | 0.0007 | BACH2, NDFIP1, STAT3, VMP1, JAZF1, IQGAP1
hsa_ACTGCCT,MIR-34B | DB_ID:831 | 215 | 4 | 0.31 | 12.74 | 0.0019 | PLEKHG5, MAPK1, FOXP1, JAZF1
hsa_TAGCTTT,MIR-9 | DB_ID:759 | 234 | 4 | 0.34 | 11.7 | 0.0023 | ZFP36L1, PITPNM2, BACH2, JAZF1
hsa_CACTGCC,MIR-34A,MIR-34C,MIR-449 | DB_ID:673 | 277 | 4 | 0.4 | 9.89 | 0.0036 | SLC44A2, PLEKHG5, FOXP1, SLC2A4RG

Drug associations
Prednisone | DB_ID:PA451100 | 33 | 3 | 0.05 | 62.23 | 0.0002 | IL2RA, CD40, CD86
Immune globulin | DB_ID:PA164754884 | 624 | 7 | 0.91 | 7.68 | 0.0002 | SLAMF7, IL2RA, CD40, CD86, STAT4, FCRL1, TNFRSF1A
Nilotinib | DB_ID:PA165958345 | 21 | 2 | 0.03 | 65.2 | 0.0014 | AHI1, CD69
Dasatinib | DB_ID:PA162372878 | 18 | 2 | 0.03 | 76.06 | 0.0014 | CD40, STAT3
Sorafenib | DB_ID:PA7000 | 25 | 2 | 0.04 | 54.76 | 0.0016 | STAT3, MAPK1
Prednisolone | DB_ID:PA451096 | 27 | 2 | 0.04 | 50.71 | 0.0016 | IL2RA, FOXP1
Phenylephrine | DB_ID:PA450935 | 36 | 2 | 0.05 | 38.03 | 0.0026 | DDAH1, MAPK1
Alseroxylon | DB_ID:PA164746411 | 43 | 2 | 0.06 | 31.84 | 0.0031 | CD40, FOXP1
Vincristine | DB_ID:PA451879 | 45 | 2 | 0.07 | 30.42 | 0.0031 | CD40, FOXP1
Cyclophosphamide | DB_ID:PA449165 | 52 | 2 | 0.08 | 26.33 | 0.0038 | CD40, FOXP1

Protein–protein interaction
Hsapiens_Module_531 | DB_ID:531 | 52 | 4 | 0.18 | 21.95 | 0.0009 | TRAF3, TNFSF14, CD40, TNFRSF1A
Hsapiens_Module_540 | DB_ID:540 | 7 | 2 | 0.02 | 81.51 | 0.0029 | DDAH1, IQCB1
Hsapiens_Module_863 | DB_ID:863 | 7 | 2 | 0.02 | 81.51 | 0.0029 | STAT3, STAT4
Hsapiens_Module_594 | DB_ID:594 | 42 | 3 | 0.15 | 20.38 | 0.0029 | STAT3, TYK2, STAT4
Hsapiens_Module_408 | DB_ID:408 | 103 | 4 | 0.36 | 11.08 | 0.0029 | NDFIP1, MLANA, TIMMDC1, EPS15L1
Hsapiens_Module_287 | DB_ID:287 | 115 | 4 | 0.4 | 9.92 | 0.0034 | IL2RA, STAT3, TYK2, STAT4
Hsapiens_Module_110 | DB_ID:110 | 212 | 5 | 0.74 | 6.73 | 0.0037 | IL2RA, MAPK1, STAT3, TYK2, STAT4
Hsapiens_Module_203 | DB_ID:203 | 341 | 6 | 1.2 | 5.02 | 0.0043 | TRAF3, TNFSF14, CD40, JAZF1, TNFRSF1A, MALT1
Hsapiens_Module_25 | DB_ID:25 | 1871 | 15 | 6.56 | 2.29 | 0.0052 | AHI1, ELMO1, CD86, STAT4, IQGAP1, CBLB, IFI30, SLC44A2, CDH3, IL2RA, DKKL1, MAPK1, MERTK, STAT3, TYK2
Hsapiens_Module_299 | DB_ID:299 | 257 | 4 | 0.9 | 4.44 | 0.0362 | DKKL1, CD86, IQGAP1, CBLB

Abbreviations: C, number of reference genes in the category; E, expected number in the category; MS, multiple sclerosis; O, number of genes in the candidate gene list and within the category; R, ratio of enrichment.
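The columns of Table 4 follow the usual over-representation layout. As a minimal sketch, assuming that E is the count expected by chance (C × n / N, with n genes in the candidate list and N genes in the reference set), that R is O / E, and that the P-value is the upper tail of a hypergeometric distribution (the test named in the Materials and methods), the quantities could be computed as below. The function name and the toy numbers are hypothetical. The size of the reference set used by WebGestalt is not stated in the text, so no row of Table 4 is reproduced, and the multiple-testing correction is not applied here.

```python
from math import comb


def enrichment(C, O, n, N):
    """Over-representation statistics for one category.

    C: reference genes in the category
    O: candidate genes observed in the category
    n: size of the candidate gene list
    N: size of the reference gene set (assumed known)
    Returns (E, R, P): expected count, enrichment ratio and the
    uncorrected hypergeometric probability of observing O or more.
    """
    expected = C * n / N
    ratio = O / expected
    upper = min(C, n)
    # Sum the hypergeometric probabilities for k = O .. min(C, n).
    tail = sum(comb(C, k) * comb(N - C, n - k) for k in range(O, upper + 1))
    p_value = tail / comb(N, n)
    return expected, ratio, p_value


# Toy example (hypothetical numbers): 10 reference genes, 4 in the
# category, a candidate list of 3 genes, 2 of which are in the category.
E, R, P = enrichment(C=4, O=2, n=3, N=10)
print(f"E = {E:.2f}, R = {R:.2f}, P = {P:.4f}")
```

Exact integer binomial coefficients keep the sketch dependency-free; for genome-scale reference sets a statistics library would be a more practical choice.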


Figure 1. Graphical distribution of all functional assignments within 43 MS-associated genes (a; 52 439 assignments), and within the nested 43 LD block boundaries (b; 5244 assignments). Functional assignment categories shown in the legend: 3′ UTR variant, 5′ UTR variant, coding sequence variant, feature elongation, feature truncation, frameshift variant, incomplete terminal codon variant, inframe deletion, inframe insertion, initiator codon variant, missense variant, NMD transcript variant, noncoding exon variant, splice acceptor variant, splice donor variant, splice region variant, stop gained, stop lost, stop retained variant and synonymous variant.

SNPs within the LD block boundaries (Supplementary Table 4). The rs201310286 missense variant is rare (global minor allele frequency < 0.001) and is predicted to be ‘deleterious’ (SIFT) and ‘possibly damaging’ (PolyPhen). MALT1 regulates the activation of nuclear factor (NF)-κB, a transcription factor for which binding targets are enriched among the MS-associated genes (Table 4). These results suggest a hierarchical biological relationship between MALT1 and other genes with NF-κB binding sites. In Malt1−/− mice, T helper 17 cells demonstrated strong infiltration of the central nervous system.17 Unfortunately, for 20 MS-associated genes there were either no LD blocks encompassing the risk variant (for example, CDH3, CXCR5 and MLANA) or no assignments other than intronic or noncoding transcript annotations within the blocks (for example, the 332-bp block spanning ZMIZ1 rs1782645) (Table 2). These 20 genic risk SNPs require further attention. Several bioinformatics enrichment analyses were conducted in the current study. The primary biological mechanism highlighted by the 63 MS-associated genes is T-cell development, activation and signaling. We investigated three databases, each with a distinct set of pathways (number of pathways: KEGG = 390; Pathway Commons = 1651; WikiPathways = 1018) and pathway names (Supplementary Table 5); we therefore analyzed all three and reviewed the genes identified within each enriched pathway. We observed common biological themes among the enriched pathways, for example, TCR signaling in KEGG (genes: MAPK1, CBLB and MALT1) and TCR signaling in CD8+ T cells in Pathway Commons (genes: IL2RA, EOMES, MAPK1, CD86, STAT4, TNFRSF1A and MALT1). Across the three pathway databases, TCR signaling processes were enriched. Similarly, the transcription factors overrepresented among the binding targets are also involved in T-cell development (nuclear factor of activated T-cells (NFAT)18) and activation (NF-κB17).
Furthermore, there was an enrichment of microRNA targets for several microRNAs involved in T-cell clonal expansion (miR-182), activation (miR-181a) and differentiation (miR-29).19,20 These results also encourage the exploration of other candidate genes with similar transcription factor and microRNA binding targets.

Interestingly, the enrichment analyses suggest an overlap between MS-associated genes and genes contributing to susceptibility for toxoplasmosis and cancer. Toxoplasmosis is a leading cause of death attributable to food-borne illnesses in the United States, but most infected individuals are asymptomatic (www.cdc.gov). A comparison of MS patients with (N = 12) and without (N = 12) toxoplasmosis suggests that infected cases had a less severe disease progression, as determined by number of relapses and disability measures, which may be related to increased secretion of IL-10 and transforming growth factor-β and the presence of CD25+CD4+FoxP3+ T cells.21 A recent nationwide population-based cohort study in Taiwan observed an increased risk of developing cancer (85%) among MS cases, particularly breast cancer (2.2-fold increased risk).22

These results complement prior bioinformatics analyses of non-MHC MS risk variants. Gene Ontology assessment of genes near the 52 risk variants identified in 2011 and of the 110 risk variants identified in 2013 suggested that processes involved in T-cell differentiation were overrepresented.4,5 A protein-interaction-network-based pathway analysis of 137 457 SNPs that were nonsynonymous and potentially deleterious (PolyPhen), located in 5′ or 3′ UTRs, or within transcription factor or histone binding sites in two large genetic MS studies demonstrated that the encoded proteins are more likely to be connected within the protein-interaction network than expected by chance alone.23 Based on analyses of 79 nominally significant genes within these two data sets, there was an overrepresentation of genes involved in leukocyte activation, apoptosis, positive regulation of macromolecule metabolic process, JAK–STAT signaling, acute myeloid leukemia and TCR signaling using the Gene Ontology and KEGG pathway databases.23 Our bioinformatics and enrichment analyses differ in approach; however, similar immunological processes were highlighted across all enrichment analyses. Functional variants are likely to be adjacent to the GWAS signal, existing within LD blocks encompassing the risk SNPs.
However, regions of strong LD can be large and SNPs associated with a phenotype can be in perfect LD with SNPs several hundred kilobases away.14 We utilized LD block boundaries as defined by Gabriel et al.,24 based on data from 267 Northern Europeans. Haplotype blocks inferred by different algorithms are likely to differ in size and coverage given SNP density and allele frequency distributions; however, shared chromosomal regions can increase with higher SNP density and the inclusion of less common variants.25

In summary, we present the first comprehensive catalog of functional candidates for MS to expedite prioritization of SNP selection for fine-mapping and molecular studies. Application of bioinformatics and computational tools provided insight into the underlying biological pathways and regulatory processes that may contribute to MS onset. The in silico methods employed can be adapted by any investigator to a priori SNP selection or post hoc evaluation of variants established through GWASs. Investigators must keep in mind that databases for bioinformatics analysis are not necessarily complete, and the algorithms employed by various tools may differ. Thus, we recommend the use of multiple bioinformatics tools and available resources. Our results emphasize a role for T-cell dysregulation in MS pathogenesis, and complex gene–gene interactions between STAT3 and PVT1, JAZF1 and ZFP36L1, and between MALT1 and NF-κB target genes.

CONFLICT OF INTEREST

The authors declare no conflict of interest.

MATERIALS AND METHODS

Characterization of MS risk variants

The physical location and functional annotation for each of the 110 non-MHC risk variants were characterized using the Genome Reference Consortium GRCh37 (hg19) assembly available in the UCSC Genome Browser26 and HaploReg v2 (2013.02.14; http://www.broadinstitute.org/mammals/haploreg/haploreg.php).9 The regulatory capacity of all coding and noncoding variants was also assessed using HaploReg. HaploReg is a bioinformatics resource that includes an extensive library of SNPs (dbSNP 137), motif instances (including position weight matrices collected from TRANSFAC, JASPAR, protein-binding microarray and ENCODE ChIP-seq experiments),27–30 expression quantitative trait loci (GTEx eQTL browser)31 and enhancer annotation (Roadmap Epigenomics Mapping Consortium).32,33 Allele frequencies are based on 1000 Genomes Phase 1 low-coverage data.10 HaploReg also performs enhancer and DNase enrichment analyses using a binomial test, comparing the coverage of enhancers, strong enhancers and DNase sites of the query list of variants to all variants.

Identification of functional candidates within MS-associated genes

For the 63 genic risk SNPs, all variation within 50 kb upstream or downstream of each gene boundary was extracted from the 1000 Genomes Phase 1 integrated variant set (March 2012). This data set is based on 1000 Genomes Project sequence data freezes from November 2010 (low-coverage whole-genome) and May 2011 (high-coverage exome) that were revised in February and March 2012 to remove sequencing artifacts for 1092 individuals (246 Africans, 181 Americans, 286 Asians and 379 Europeans).10 This autosomal variant set (36 648 992 SNPs; 1 380 736 INDELs; 13 805 structural variants) did not include sites that were monomorphic. LD blocks spanning the 63 genic risk SNPs were determined in 267 Northern Europeans (85 Utah residents with Northern and Western European ancestry, 93 Finnish in Finland and 89 British in England and Scotland). Blocks were constructed using the approach suggested by Gabriel et al.,24 as implemented in PLINK v1.07.34 Functional annotation for all variants within the MS-associated genes was downloaded from the 1000 Genomes Browser (1000 Genomes release 13 December 2012, EBI). Information for variation was not restricted, and thus included annotation for variants for which there were no genotypic data in the Phase 1 integrated variant set. We included all functional assignments (Supplementary Table 2) for each SNP; for example, SNPs within genes with multiple transcripts may have different functions depending on the transcript. We did not consider SNPs that were only intronic or noncoding transcript variants across all gene transcripts. For missense variants, impact on protein function based on in silico tools was also extracted. Predictive values from both SIFT11 (‘tolerated’ to ‘deleterious’ assignments) and PolyPhen12 (‘benign’ to ‘probably damaging’ assignments) determined the potential pathogenicity of missense variants. SIFT values of ≤0.05 are considered ‘deleterious’, and PolyPhen values of >0.905 are ‘probably damaging’. We subsequently extracted all functional annotation within the LD block boundaries for the 63 genic MS risk SNPs.

Bioinformatics enrichment analyses of MS-associated genes

Comprehensive bioinformatics analyses for the 63 MS-associated genes were conducted using the Web-based Gene Set Analysis Toolkit (WebGestalt; updated January 2013; http://bioinfo.vanderbilt.edu/webgestalt/), which incorporates publicly available biological and functional data for enrichment analysis.13 It includes several unique features compared with other bioinformatic tools (Supplementary Table 6).35 WebGestalt includes biological pathways (KEGG,36 updated 21 March 2011; Pathway Commons,37 updated 11 November 2012; WikiPathways,38 updated 11 November 2012), regulatory modules (Molecular Signatures Database;39 updated 11 November 2012), and disease and drug association modules (Pharmacogenetics Knowledge Base,40 updated 26 January 2013; Gene List Automatically Derived For You,41 updated 26 January 2013). Enrichment analyses were conducted using hypergeometric tests. P-values were corrected for multiple testing based on the false discovery rate using the Benjamini and Hochberg procedure.13 We considered all human genes as the appropriate reference for comparison. The top 10 enriched classes for each bioinformatics analysis were reported.
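The multiple-testing step named in the enrichment methods, the Benjamini and Hochberg false discovery rate procedure, can be illustrated with a short sketch. This is a generic textbook implementation written for illustration, not WebGestalt's code; the function name and the example P-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so that each adjusted value is
    # never larger than the adjusted value of the next-ranked test.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P-values from four enrichment tests.
raw = [0.01, 0.04, 0.03, 0.005]
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(f"raw P = {p:.3f}, adjusted P = {q:.3f}")
```

Each raw P-value is scaled by the number of tests divided by its rank, and the backward pass keeps the adjusted values monotone in the rank order.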

ACKNOWLEDGEMENTS

We thank Dr Jing Wang and Gary Artim for technical assistance. This study was supported through NIH R01NS049510, NIH R01AI076544 and NIH R01ES017080.

REFERENCES

1 Oksenberg JR, Barcellos LF. Multiple sclerosis genetics: leaving no stone unturned. Genes Immun 2005; 6: 375–387.
2 Hawkes CH, Macgregor AJ. Twin studies and the heritability of MS: a conclusion. Mult Scler 2009; 15: 661–667.
3 Hemminki K, Li X, Sundquist J, Hillert J, Sundquist K. Risk for multiple sclerosis in relatives and spouses of patients diagnosed with autoimmune and related conditions. Neurogenetics 2009; 10: 5–11.
4 Sawcer S, Hellenthal G, Pirinen M, Spencer CC, Patsopoulos NA, Moutsianas L et al. Genetic risk and a primary role for cell-mediated immune mechanisms in multiple sclerosis. Nature 2011; 476: 214–219.
5 Beecham AH, Patsopoulos NA, Xifara DK, Davis MF, Kemppinen A, Cotsapas C et al. Analysis of immune-related loci identifies 48 new susceptibility variants for multiple sclerosis. Nat Genet 2013; 45: 1353–1360.
6 Patsopoulos NA, Barcellos LF, Hintzen RQ, Schaefer C, van Duijn CM, Noble JA et al. Fine-mapping the genetic association of the major histocompatibility complex in multiple sclerosis: HLA and non-HLA effects. PLoS Genet 2013; 9: e1003926.
7 Gregory SG, Schmidt S, Seth P, Oksenberg JR, Hart J, Prokop A et al. Interleukin 7 receptor alpha chain (IL7R) shows allelic and functional association with multiple sclerosis. Nat Genet 2007; 39: 1083–1091.
8 Gregory AP, Dendrou CA, Attfield KE, Haghikia A, Xifara DK, Butter F et al. TNF receptor 1 genetic risk mirrors outcome of anti-TNF therapy in multiple sclerosis. Nature 2012; 488: 508–511.
9 Ward LD, Kellis M. HaploReg: a resource for exploring chromatin states, conservation, and regulatory motif alterations within sets of genetically linked variants. Nucleic Acids Res 2011; 40(Database issue): D930–D934.
10 Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, Handsaker RE et al. An integrated map of genetic variation from 1,092 human genomes. Nature 2012; 491: 56–65.
11 Kumar P, Henikoff S, Ng PC. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat Protoc 2009; 4: 1073–1081.
12 Ramensky V, Bork P, Sunyaev S. Human non-synonymous SNPs: server and survey. Nucleic Acids Res 2002; 30: 3894–3900.
13 Zhang B, Kirov S, Snoddy J. WebGestalt: an integrated system for exploring gene sets in various biological contexts. Nucleic Acids Res 2005; 33(Web Server issue): W741–W748.
14 Schaub MA, Boyle AP, Kundaje A, Batzoglou S, Snyder M. Linking disease associations with regulatory information in the human genome. Genome Res 2012; 22: 1748–1759.
15 Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ et al. Finding the missing heritability of complex diseases. Nature 2009; 461: 747–753.
16 Cooper GM, Shendure J. Needles in stacks of needles: finding disease-causal variants in a wealth of genomic data. Nat Rev Genet 2011; 12: 628–640.
17 Brustle A, Brenner D, Knobbe CB, Lang PA, Virtanen C, Hershenfield BM et al. The NF-kappaB regulator MALT1 determines the encephalitogenic potential of Th17 cells. J Clin Invest 2012; 122: 4698–4709.
18 Macian F. NFAT proteins: key regulators of T-cell development and function. Nat Rev Immunol 2005; 5: 472–484.
19 Stittrich AB, Haftmann C, Sgouroudis E, Kuhl AA, Hegazy AN, Panse I et al. The microRNA miR-182 is induced by IL-2 and promotes clonal expansion of activated helper T lymphocytes. Nat Immunol 2010; 11: 1057–1062.
20 Baumjohann D, Ansel KM. MicroRNA-mediated regulation of T helper cell differentiation and plasticity. Nat Rev Immunol 2013; 13: 666–678.
21 Correale J, Farez M. Association between parasite infection and immune responses in multiple sclerosis. Ann Neurol 2007; 61: 97–108.
22 Sun LM, Lin CL, Chung CJ, Liang JA, Sung FC, Kao CH. Increased breast cancer risk for patients with multiple sclerosis: a nationwide population-based cohort study. Eur J Neurol 2014; 22: 238–244.
23 IMSGC. Network-based multiple sclerosis pathway analysis with GWAS data from 15,000 cases and 30,000 controls. Am J Hum Genet 2013; 92: 854–865.
24 Gabriel SB, Schaffner SF, Nguyen H, Moore JM, Roy J, Blumenstiel B et al. The structure of haplotype blocks in the human genome. Science 2002; 296: 2225–2229.
25 Indap AR, Marth GT, Struble CA, Tonellato P, Olivier M. Analysis of concordance of different haplotype block partitioning algorithms. BMC Bioinformatics 2005; 6: 303.
26 Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM et al. The human genome browser at UCSC. Genome Res 2002; 12: 996–1006.
27 Berger MF, Philippakis AA, Qureshi AM, He FS, Estep 3rd PW, Bulyk ML. Compact, universal DNA microarrays to comprehensively determine transcription-factor binding site specificities. Nat Biotechnol 2006; 24: 1429–1435.
28 Berger MF, Badis G, Gehrke AR, Talukder S, Philippakis AA, Pena-Castillo L et al. Variation in homeodomain DNA binding revealed by high-resolution analysis of sequence preferences. Cell 2008; 133: 1266–1276.
29 Badis G, Berger MF, Philippakis AA, Talukder S, Gehrke AR, Jaeger SA et al. Diversity and complexity in DNA recognition by transcription factors. Science 2009; 324: 1720–1723.
30 Kheradpour P, Kellis M. Systematic discovery and characterization of regulatory motifs in ENCODE TF binding experiments. Nucleic Acids Res 2013; 42: 2976–2987.
31 GTEx Consortium. The Genotype-Tissue Expression (GTEx) project. Nat Genet 2013; 45: 580–585.
32 Bernstein BE, Stamatoyannopoulos JA, Costello JF, Ren B, Milosavljevic A, Meissner A et al. The NIH Roadmap Epigenomics Mapping Consortium. Nat Biotechnol 2010; 28: 1045–1048.
33 Ernst J, Kheradpour P, Mikkelsen TS, Shoresh N, Ward LD, Epstein CB et al. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature 2011; 473: 43–49.
34 Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 2007; 81: 559–575.
35 Wang J, Duncan D, Shi Z, Zhang B. WEB-based GEne SeT AnaLysis Toolkit (WebGestalt): update 2013. Nucleic Acids Res 2013; 41(Web Server issue): W77–W83.
36 Kanehisa M, Goto S, Kawashima S, Okuno Y, Hattori M. The KEGG resource for deciphering the genome. Nucleic Acids Res 2004; 32(Database issue): D277–D280.
37 Cerami EG, Gross BE, Demir E, Rodchenkov I, Babur O, Anwar N et al. Pathway Commons, a web resource for biological pathway data. Nucleic Acids Res 2011; 39(Database issue): D685–D690.
38 Kelder T, van Iersel MP, Hanspers K, Kutmon M, Conklin BR, Evelo CT et al. WikiPathways: building research communities on biological pathways. Nucleic Acids Res 2011; 40(Database issue): D1301–D1307.
39 Liberzon A, Subramanian A, Pinchback R, Thorvaldsdottir H, Tamayo P, Mesirov JP. Molecular signatures database (MSigDB) 3.0. Bioinformatics 2011; 27: 1739–1740.
40 Hewett M, Oliver DE, Rubin DL, Easton KL, Stuart JM, Altman RB et al. PharmGKB: the Pharmacogenetics Knowledge Base. Nucleic Acids Res 2002; 30: 163–165.
41 Jourquin J, Duncan D, Shi Z, Zhang B. GLAD4U: deriving and prioritizing gene lists from PubMed literature. BMC Genomics 2012; 13(Suppl 8): S20.

Supplementary Information accompanies this paper on Genes and Immunity website (http://www.nature.com/gene)
