Supplemental Material for 'JAM: A scalable Bayesian

1 downloads 0 Views 2MB Size Report
q q q q q. ADCY5 sensitivity analysis 2. JAM Posterior Probabilities. P osterior Probability rs1533394 rs1969262 rs11720822 rs6805611 rs1515371 rs7643790.
Supplemental Material for ‘JAM: A scalable Bayesian framework for joint analysis of marginal SNP effects’ P. J . Newcombe1 , D. V. Conti2,* , and S. Richardson1,* 1

2

MRC Biostatistics Unit, Cambridge, UK Division of Biostatistics, Department of Preventive Medicine, Zilkha Neurogenetic Institute, University of Southern California, US * These authors contributed equally to this work.

1

1

Supplementary Methods

Constructing z from univariate effect estimates More often than not, the summary data available for each marker m will consist only of a univariate effect estimate, βˆm . However, we can construct estimates of the genotype group means, y¯mg , and counts, nmg , required to define z as follows. First, using an estimate of marker m’s minor allele frequency, pˆm , taken either from the original study or an external source such as the 1000 genomes project, the group counts, nmg , are straight forward to estimate if we assume the marker has reached Hardy Weinberg Equilibrium (HWE): n ˆ m0 = (1 − pˆm )2 n ˆ m1 = 2ˆ p(1 − pˆm ) n ˆ m2 = pˆ2m Next, define y¯P op as the population mean value of y. Then, for all m, this may be written as a weighted average of the group means, y¯mg , weighted by the true group counts; n0m y¯0m + n1m y¯1m + n2m y¯2m n0m + n1m + n2m As described earlier, for simplicity, we model the mean-centred phenotype, y, to avoid fitting an intercept term. Therefore we have: y¯P op =

0=

n0m y¯0m + n1m y¯1m + n2m y¯2m n0m + n1m + n2m

(1)

If the assumptions underlying the linear regression model from which βˆm was estimated hold perfectly, we also have; y¯1m = y¯0m + βˆm

(2)

y¯2m = y¯0m + 2βˆm

(3)

and

Since we can estimate the group counts, we have a system of three simultaneous equations (1), (2), and (3) in three unknowns, which may thus be solved to obtain estimates of the three group means. Hence, substituting in the group count estimates, n ˆ mg from above, and then substituting equations (2) and (3) into (1) we obtain the following approximation for the wildtype mean; 0≈

n ˆ m0 y¯m0 + n ˆ m1 (¯ ym0 + βˆm ) + n ˆ m2 (¯ ym0 + 2βˆm ) n ˆ m0 + n ˆ m1 + n ˆ m2

therefore:

n ˆ m1 βˆm + 2ˆ nm2 βˆm n ˆ m0 + n ˆ m1 + n ˆ m2 then follow from equations (2) and (3):

yˆ¯m0 = − Approximations for y¯m1 and y¯m2

yˆ¯m1 = yˆ¯m0 + βˆm 2

yˆ¯m2 = yˆ¯m0 + 2βˆm We may now construct an approximation for z from n ˆ mg and yˆ¯mg , using equation (2). Therefore, our framework may be used to infer joint effect estimates from univariate effect estimates, providing a full IPD genotype matrix is available from an external population (note that this genotype matrix can be used to derive MAF estimates pm , as well as the 0 plug-in estimate for X X). 0

Construction of plug-in estimate for X X We follow the method described by Yang and Visscher[2], to construct a plug-in estimate 0 for X X , the unobserved P × P genotype variance-covariance matrix. Define as W the genotype matrix from an external population (such as the 1,000 genonomes project). For consistency with our intercept free formulation, genotypes for SNP m will be centred in W such that for all individuals i, wi,m = −2pm , 1 − 2pm or 2pm where pm is the frequency of SNP m in the external population. If the external population is of identical size to 0 0 the analysis population, we could simply approximate X X with W W . However, if the external population is of a different size (as will nearly always be the case) we must scale the variance and co-variances accordingly. First define D W as the diagonal matrix 0 of W W , such that DW (m) , the mth diagonal entry corresponding to marker m, is easily P 2 0 . Similarly, define D as the diagonal matrix of X X . We obtained from W as i wi,m have therefore not observed D, however, assuming HWE, it is straight forward to show that the mth entry may approximated according to MAF pm from the external data as: Dm ≈ 2pm (1 − 2pm )n Note that this approximation accounts for the mean centred genotype coding; the expression would be different expression if this was not the case. We may now approximate 0 X X with B, where; s X Dj Dk wi,j wi,k Bj,k = DW (j) DW (k) i Alternatively, in matrix form 0

−1/2

0

−1/2

X X ≈ B = D 1/2 D W W W D W D 1/2

Exploiting block independence to extend to high dimensional problems The approximate inference approach described in the methods can be efficiently extended to large numbers of SNPs if the block independence decomposition described in the online methods is invoked (which will likely be necessary anyhow). We start by defining the approximate normalising constant corresponding to to an analysis of block b in isolation, considering models γb ∈ Γ∗b up to dimension 3: X Cˆb = p(γb )p(γb |L−1 (4) b zb) γb ∈Γ∗b

3

The normalising constant corresponding to the joint analysis of all blocks, considering all possible models up to dimension 3 within each block, may be written as: X X −1 Cˆ = ... p(γ1 , ..γB )p(γ1 , ..γB |L−1 1 z 1 , ..LB z B ) γ1 ∈Γ∗1

γB ∈Γ∗B

Under the block independence likelihood decomposition, and placing independent betabinomial (aω , bω ) priors on model dimension in each block this becomes: X X −1 Cˆ = ... p(γ1 )p(γ1 |L−1 1 z 1 )...p(γB )p(γB |LB z B ) γ1 ∈Γ∗1

=

Y

γB ∈Γ∗B

(5)

Cˆb

b=1,..B

where each Cˆb is given by (4). If we are interested in a particular SNP or combination of SNPs in block b, γb say, the approximate posterior probability marginalised over all other blocks reduces to that which would be obtained from an analysis of the block in isolation: p(γb ) ≈ Cˆ −1 p(γb )p(γb |L−1 b zb) X X X ... γ1 ∈Γ∗1

γb−1 ∈Γ∗b−1 γb+1 ∈Γ∗b+1

...

X γB ∈Γ∗B

−1 −1 −1 p(γ1 )p(γ1 |L−1 1 z 1 )...p(γb−1 )p(γb−1 |Lb−1 z b−1 )p(γb+1 )p(γb+1 |Lb+1 z b+1 )...p(γB )p(γB |LB z B ) ˆ ˆ ˆ ˆ = Cˆ −1 p(γb )p(γb |L−1 b z b )C1 ...Cb−1 Cb+1 ...CB = Cˆ −1 p(γb )p(γb |L−1 z b ) b

b

(6) Therefore, invoking the block independence assumption and using independent but calibrated beta-binomial priors, the genome-wide relative importance of block specific SNPs or models may be estimated from an analysis of each block in isolation. Computationally speaking the genome-wide marginal scan amounts to no more than the total computation of analysing each block in isolation - that is, computation increases linearly with number SNPs. Specifically, the number of calculations, or evaluations of equation (9) and (10) in the online methods: X Pb b=1,..B

where Pb is the total number of models in the truncated model space, Γ∗b , corresponding to block b.

Model fitting via Reversible Jump MCMC Here we describe how, alternatively to enumeration of models up to dimension 3, a Reversible Jump MCMC scheme and be used to sample from the posterior p(γ|z L ) described in the online methods[1]. The Reversible Jump sampling scheme starts at an initial model, which we denote γ(0). To sample the next model, γ(1), we propose moving from the 4

current model to another model γ∗ using a proposal function q(γ ∗ |γ). We then accept the proposed model as the next sample with probability equal to the Metropolis-Hastings ratio: M HR =

P (z L |γ∗)P (γ∗) q(γ|γ∗) × P (z L |γ)P (γ) q(γ ∗ |γ)

where P (z L |γ∗) is the likelihood described in equation (9) and P (γ) is the betabinomial model space prior defined in equation (10) of the online methods. Therefore the proposed model is accepted with a probability proportional to it’s likelihood and prior. If the new model is accepted, the proposed sample is accepted as γ(1); otherwise, the sample remains equal to the current model, i.e., γ(1) = γ(0). It can be shown that this produces a sequence of samples that converge to the required posterior distribution[1]. The algorithm is implemented in our software, and can optionally be used instead of the enumeration approach described in the methods.

5

2

Supplementary Figures

6

0.8 0.4 0.0

Mean Posterior Probability

A) Beta−binomial(1, 1) Posterior Probabilities

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

0

20

40

60

80

100

120

0.4

0.8

B) Beta−binomial(1, 9) Posterior Probabilities

0.0

Mean Posterior Probability

Simulated SNP

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

0

20

40

60

80

100

120

0.4

0.8

C) Beta−binomial(1, 99) Posterior Probabilities

0.0

Mean Posterior Probability

Simulated SNP

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

0

20

40

60

80

100

120

0.4

0.8

D) Beta−binomial(1, 999) Posterior Probabilities

0.0

Mean Posterior Probability

Simulated SNP

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

0

20

40

60

80

100

120

Simulated SNP

Figure S1: Performance of JAM under the null, for a range of beta-binomial prior choices. For each simulated SNP, average posterior probabilities over 200 replicates are given. Ideally, since no signal exists, all averages should be near 0. 7

20

Up to 4 SNPs Up to 3 SNPs





10

Minutes

15



5





0



● ●

0





2000

4000

6000

8000

10000

SNPs

Figure S2: JAM run times for posterior model inference by enumeration up to dimension 3 for different numbers of SNPs. SNPs and analyses were decomposed into the same LD blocks used in the simulations. Run times are plotted for an analysis of 132 SNPs over 4 blocks, 1000 SNPs over 32 blocks, 5000 SNPs over 152 blocks and 10000 SNPs over 304 blocks. Analyses were run on an Intel Xeon E5-2640 2.50GHz processor.

8

1.0 0.8 0.6 0.0

0.2

0.4

0.6 0.4 0.0

0.2

Average PPV (solid lines)

0.8

Full IPD analysis JAM enumeration JAM stochastic search GCTA (COJO) stepwise selection Single SNP APPs Marginal p−values Sub−cohort IPD analysis

Average Sensitivity (dashed lines)

1.0

132 SNPs over 4 regions with a single signal in each

5

10

15

Top ranked SNPs

Figure S3: Comparison of ranking performance by JAM against various other strategies when data were simulated under single SNP models for 15,356 individuals (the total size of the MAGIC consortium). Ranking performance is measured in terms of positive predictive value (PPV), the proportion of true signal SNPs in the selection (solid lines, left y-axis), and power/sensitivity, the proportion of all simulated signals included (dashed lines, right y-axis). For each method, the average PPV and sensitivity estimates consist of points for each SNP rank, which we have joined with lines to ease the visual comparison. 132 SNPs were simulated across 4 regions, each of which had a single effect. For LD estimation, JAM was provided with an independently simulated reference dataset of 2,674, the size of the WTCCC control sample. Estimates are averaged over 200 simulation replicates. A vertical grey line highlights the rank equal to the number of true signals, where PPV and sensitivity by definition intersect. IPD: Individual Patient Data, ABF: Wakefield’s Approximate Bayes Factor.

9

0

10

20

30

40

Average Sensitivity (dashed lines)

1.0 0.6 0.4 0.2 0.0

0

Top ranked SNPs

0.8

1.0 0.2

0.4

0.6

0.8

Full IPD analysis JAM without approx JAM with approx

0.0

Average Sensitivity (dashed lines) Average PPV (solid lines)

0.2

0.4

0.6

0.8

1.0

B) 10,000 SNPs over 304 regions with multi−SNP signals (3 effects) in 4 regions

0.0

0.2

0.4

0.6

0.8

Full IPD analysis JAM without approx JAM with approx

0.0

Average PPV (solid lines)

1.0

A) 132 SNPs over 4 regions with multi−SNP signals (3 effects) in each

10

20

30

40

Top ranked SNPs

Figure S4: Comparison of JAM performance between use of the true correlation structure and true trait group means vs correlation structure from an independent sample and trait group means reconstructed from marginal effect estimates. Ranking performance is measured in terms of positive predictive value (PPV), the proportion of true signal SNPs in the selection (solid lines, left y-axis), and power/sensitivity, the proportion of all simulated signals included (dashed lines, right y-axis). For each method, the average PPV and sensitivity estimates consist of points for each SNP rank, which we have joined with lines to ease the visual comparison. Panel A) corresponds to a fine-mapping scenario including 132 SNPs across 4 regions, and panel B) corresponds to a higher dimensional setting in which 40 effects were simulated among 10,000 SNPs. Estimates are averaged over 200 simulation replicates. A vertical grey line highlights the rank equal to the number of true signals, where PPV and sensitivity by definition intersect. IPD: Individual Patient Data; the full IPD multivariate analysis is presented to demonstrate ‘optimal’ performance.

10

C) ADCY5

B) TCF7L2 rs10749125 rs11196032 rs17585058 rs10885369 rs7093793 rs6585167 rs10885371 rs10885372 rs17130008 rs11196062 rs1886600 rs1885281 rs1408817 rs11196083 rs7096151 rs11196089 rs12412082 rs7068695 rs11196098 rs7085310 rs7898648 rs10787464 rs941827 rs4918778 rs12358731 rs12253794 rs2590 rs11196122 rs10885388 rs4451646 rs1547191 rs2475177 rs1327694 rs4918785 rs17746501 rs2180964 rs10509966 rs2148887 rs10509967 rs7094463 rs12573128 rs7917983 rs7901275 rs10128255 rs4074720 rs11196181 rs4506565 rs6585201 rs10787472 rs12243326 rs11196203 rs10885409 rs12255372 rs11196212 rs4918790 rs911769 rs12768310 rs3814572 rs11196224 rs7085532 rs7894421 rs4918796 rs3814573 rs12775879 rs290490 rs290489 rs176632 rs11594610 rs4081699 rs290481 rs7915609 rs7918819 rs11196266 rs10885427 rs1923446 rs7914029 rs4917647 rs4265536 rs10787481 rs2788175 rs2039709 rs2788174 rs482074 rs1538320 rs11592170 rs501812 rs499832 rs531027 rs4918811 rs10437494 rs1904444 rs1904443 rs1904442 rs490816 rs495753 rs567584 rs581103 rs569586 rs10885431

rs10084650 rs10934633 rs16833763 rs2120804 rs9868873 rs11922528 rs1123613 rs9842287 rs6799522 rs6809947 rs10934637 rs6781240 rs13434289 rs836861 rs895071 rs12487822 rs4677994 rs3792361 rs2717235 rs1808987 rs3804749 rs3792369 rs864476 rs9877911 rs3804755 rs920899 rs1533394 rs1969262 rs11720822 rs6805611 rs1515371 rs7643790 rs953895 rs6784930 rs10934643 rs3934729 rs9850375 rs6770805 rs9826728 rs12634663 rs4678010 rs4470442 rs4596093 rs11717195 rs2877716 rs4677887 rs17295246 rs6794936 rs9840967 rs17361324 rs13063194 rs1006669 rs16834407 rs1471818 rs1965290 rs2697520 rs6765680 rs2626028 rs820324 rs17299206 rs820360

rs10749125 rs11196032 rs17585058 rs10885369 rs7093793 rs6585167 rs10885371 rs10885372 rs17130008 rs11196062 rs1886600 rs1885281 rs1408817 rs11196083 rs7096151 rs11196089 rs12412082 rs7068695 rs11196098 rs7085310 rs7898648 rs10787464 rs941827 rs4918778 rs12358731 rs12253794 rs2590 rs11196122 rs10885388 rs4451646 rs1547191 rs2475177 rs1327694 rs4918785 rs17746501 rs2180964 rs10509966 rs2148887 rs10509967 rs7094463 rs12573128 rs7917983 rs7901275 rs10128255 rs4074720 rs11196181 rs4506565 rs6585201 rs10787472 rs12243326 rs11196203 rs10885409 rs12255372 rs11196212 rs4918790 rs911769 rs12768310 rs3814572 rs11196224 rs7085532 rs7894421 rs4918796 rs3814573 rs12775879 rs290490 rs290489 rs176632 rs11594610 rs4081699 rs290481 rs7915609 rs7918819 rs11196266 rs10885427 rs1923446 rs7914029 rs4917647 rs4265536 rs10787481 rs2788175 rs2039709 rs2788174 rs482074 rs1538320 rs11592170 rs501812 rs499832 rs531027 rs4918811 rs10437494 rs1904444 rs1904443 rs1904442 rs490816 rs495753 rs567584 rs581103 rs569586 rs10885431

rs10084650 rs10934633 rs16833763 rs2120804 rs9868873 rs11922528 rs1123613 rs9842287 rs6799522 rs6809947 rs10934637 rs6781240 rs13434289 rs836861 rs895071 rs12487822 rs4677994 rs3792361 rs2717235 rs1808987 rs3804749 rs3792369 rs864476 rs9877911 rs3804755 rs920899 rs1533394 rs1969262 rs11720822 rs6805611 rs1515371 rs7643790 rs953895 rs6784930 rs10934643 rs3934729 rs9850375 rs6770805 rs9826728 rs12634663 rs4678010 rs4470442 rs4596093 rs11717195 rs2877716 rs4677887 rs17295246 rs6794936 rs9840967 rs17361324 rs13063194 rs1006669 rs16834407 rs1471818 rs1965290 rs2697520 rs6765680 rs2626028 rs820324 rs17299206 rs820360

A) GCKR

D) VPS13C rs10519131 rs12913453 rs7183554 rs2414743 rs7168148 rs17303705 rs8043433 rs8041905 rs8039554 rs16944568 rs11857925 rs12903441 rs7178749 rs8028558 rs4359383 rs12900235 rs7171455 rs12909308 rs7342610 rs12911496 rs4293320 rs7162205 rs1981918 rs4775435 rs17238047 rs11630921 rs12903374 rs2059469 rs17271123 rs1425268 rs7179559 rs12901220 rs1107785 rs1863264 rs875513 rs8026735 rs12708470 rs7172967 rs4587915 rs8034914 rs2303405 rs933807 rs8034335 rs4143844 rs11638793 rs1561197 rs1436958 rs12898209 rs12442675 rs11632441 rs12437833 rs7163194 rs7164482 rs11856607 rs6494306 rs6494307 rs16944725 rs1436954 rs2414760 rs9806383 rs920179 rs4474658 rs4775470 rs6494311 rs10851710 rs11071655 rs11635220 rs17205561 rs11853937 rs4775474 rs11858401 rs6494314 rs17205603 rs17238391 rs2611799 rs2611792 rs7164695 rs2037276 rs4774432 rs17271591 rs2678435 rs425818 rs17304143 rs289163 rs289159 rs399119 rs10851713 rs874897 rs289138 rs394338 rs12440178 rs2939543 rs2939542 rs17271632 rs289148 rs11638127 rs1878681 rs289153 rs2414771 rs289152 rs2611813 rs17205701 rs17238495 rs2261472 rs1518173 rs289139 rs4479177

rs2119026 rs7605526 rs11691585 rs7588910 rs1275501 rs2580759 rs11126929 rs13404446 rs3739095 rs4665969 rs2280737 rs1060525 rs7583698 rs780094 rs780092 rs8179252 rs4665383 rs6718128 rs2068834 rs10173720 rs2141371 rs2178197 rs13023094 rs12994085 rs3736594

rs10519131 rs12913453 rs7183554 rs2414743 rs7168148 rs17303705 rs8043433 rs8041905 rs8039554 rs16944568 rs11857925 rs12903441 rs7178749 rs8028558 rs4359383 rs12900235 rs7171455 rs12909308 rs7342610 rs12911496 rs4293320 rs7162205 rs1981918 rs4775435 rs17238047 rs11630921 rs12903374 rs2059469 rs17271123 rs1425268 rs7179559 rs12901220 rs1107785 rs1863264 rs875513 rs8026735 rs12708470 rs7172967 rs4587915 rs8034914 rs2303405 rs933807 rs8034335 rs4143844 rs11638793 rs1561197 rs1436958 rs12898209 rs12442675 rs11632441 rs12437833 rs7163194 rs7164482 rs11856607 rs6494306 rs6494307 rs16944725 rs1436954 rs2414760 rs9806383 rs920179 rs4474658 rs4775470 rs6494311 rs10851710 rs11071655 rs11635220 rs17205561 rs11853937 rs4775474 rs11858401 rs6494314 rs17205603 rs17238391 rs2611799 rs2611792 rs7164695 rs2037276 rs4774432 rs17271591 rs2678435 rs425818 rs17304143 rs289163 rs289159 rs399119 rs10851713 rs874897 rs289138 rs394338 rs12440178 rs2939543 rs2939542 rs17271632 rs289148 rs11638127 rs1878681 rs289153 rs2414771 rs289152 rs2611813 rs17205701 rs17238495 rs2261472 rs1518173 rs289139 rs4479177

rs3792252

rs3736594

rs12994085

rs2178197

rs2141371

rs13023094

rs2068834

rs6718128

rs10173720

rs780092

rs4665383

rs780094

rs8179252

rs7583698

rs1060525

rs2280737

rs4665969

rs3739095

rs2580759

rs13404446

rs1275501

rs11126929

rs7588910

rs7605526

rs2119026

rs11691585

rs3792252

Figure S5: LD blocks analysed for each of the four genes re-analysed from the MAGIC consortium results. Pairwise pearson r2 is plotted for each combination of SNPs (black = 1, white = 0). The red boxes indicate the correlation block (around the MAGIC reported index) analysed for each gene in our case study, and the red dotted line highlights the index SNP (or our tag in the case of GCKR and VPS13C). The location of additional effects in our multi-SNP simulation studies are indicated in blue.

11

Posterior Probability

1 0.8 0.6 0.4 0.2 0































rs6784930

rs10934643

rs3934729

rs9850375

rs6770805

rs9826728

rs12634663

rs4678010

rs4470442

rs4596093











12

































rs10934643

rs3934729

rs9850375

rs6770805

rs9826728

rs12634663

rs4678010

rs4470442

rs4596093



ADCY5 sensitivity analysis 9 JAM Posterior Probabilities ●



rs953895

rs4677887

rs2877716

rs11717195

rs4596093

rs4470442

rs4678010

rs12634663

rs9826728

rs6770805

rs9850375

rs3934729

rs10934643

rs6784930

● ●



● ● ●



ADCY5 sensitivity analysis 8 JAM Posterior Probabilities

● ●

● ● ●









rs17361324

rs9840967



rs6794936



rs17295246

ADCY5 sensitivity analysis 4 JAM Posterior Probabilities

● rs17361324

rs1515371 rs7643790

rs6805611



rs17361324

rs1969262 rs11720822

rs1533394



rs17361324



rs9840967

● rs9840967



rs6794936

ADCY5 sensitivity analysis 6 JAM Posterior Probabilities

rs17295246



rs6794936

rs4677887

rs2877716

rs11717195

rs4596093

rs4470442

rs4678010

rs12634663

rs9826728

rs6770805

rs9850375

rs3934729

rs10934643

rs6784930

rs953895

rs7643790

rs1515371

Posterior Probability

rs9840967 rs17361324

rs6794936

rs4677887 rs17295246

rs2877716



rs17295246

rs4677887

rs2877716

rs11717195

rs4596093

rs4470442

rs4678010

rs12634663

rs9826728

rs6770805

rs9850375

rs3934729

rs10934643

rs6784930

rs953895

rs6805611

rs11720822

rs1969262

rs4596093



rs17361324

rs9840967



rs9840967

● rs6794936

ADCY5 sensitivity analysis 10 JAM Posterior Probabilities rs17295246



rs6794936





rs17295246





rs4677887







rs4677887







rs2877716







rs4596093









rs11717195

● ●





rs4470442

● ●





rs4678010

● ●





rs12634663

● ●





rs9826728

● ●





rs6770805

● ●





rs9850375

● ●





rs3934729

ADCY5 sensitivity analysis 7 JAM Posterior Probabilities ●



rs10934643



rs1515371

ADCY5 sensitivity analysis 5 JAM Posterior Probabilities

rs7643790



rs6805611

rs1533394





rs6784930





rs953895

rs1969262

ADCY5 sensitivity analysis 3 JAM Posterior Probabilities ●

rs2877716

● ●



rs11717195



1 0.8 0.6 0.4 0.2 0 ● ●



rs7643790

● ●

rs6784930

● ●



rs953895

● ●



rs1515371



1 0.8 0.6 0.4 0.2 0



rs11720822





rs6805611





rs11720822



1 0.8 0.6 0.4 0.2 0 rs1533394



rs1969262



rs9840967



1 0.8 0.6 0.4 0.2 0

rs1533394



rs6794936

rs4470442



Posterior Probability



rs17295246



rs17361324



rs4677887

rs2877716

rs4678010

rs11717195

rs9826728 rs12634663



Posterior Probability



rs9840967

rs4596093

rs6770805



rs17361324



rs6794936

rs4470442

rs9850375



rs7643790



rs9840967



rs17295246

rs4678010

rs3934729



Posterior Probability



rs6794936

rs12634663

rs10934643



rs17361324



rs17295246

rs9826728

rs953895



rs1515371

● ●

rs4677887



rs2877716



rs11717195



rs4677887



rs4596093



rs2877716



rs4470442



rs11717195



rs4678010



rs6770805



rs6805611



rs4596093



rs12634663



rs9850375



rs11720822



rs4470442



rs9826728



rs3934729



rs1969262



rs4678010



rs6770805



rs10934643



1 0.8 0.6 0.4 0.2 0

rs1533394



rs12634663



rs9850375



Posterior Probability



rs9826728



rs3934729



rs6784930

rs7643790



rs17361324



rs6770805



rs10934643



rs6784930

rs1515371



rs9840967



rs9850375



rs953895



rs6794936



rs3934729



rs6784930



rs7643790



rs17295246



rs10934643



rs953895

rs6805611



rs4677887



rs6784930



rs7643790



rs1515371

rs1969262 rs11720822



rs2877716



rs953895



rs6805611

rs1533394



rs11717195



rs7643790



rs1515371

rs1969262 rs11720822

rs1533394



rs11717195



rs953895

● ●

rs1515371

● ●

rs6805611



rs7643790

● ●

rs1515371

● rs11720822

● ●

rs6805611



rs11720822



rs1969262



rs6805611



rs1533394

Posterior Probability



rs11720822

1 0.8 0.6 0.4 0.2 0 rs1969262

1 0.8 0.6 0.4 0.2 0

rs1533394

Posterior Probability 1 0.8 0.6 0.4 0.2 0

rs1969262

Posterior Probability 1 0.8 0.6 0.4 0.2 0

rs1533394

Posterior Probability

ADCY5 sensitivity analysis 1 JAM Posterior Probabilities ADCY5 sensitivity analysis 2 JAM Posterior Probabilities

● ● ●



Figure S6: Sensitivity analysis for ADCY5. The reference correlation structure from WTCCC was perturbed by bootstrapping rows (with replacement) 10 times.

Figure S7: Example trace plot of selection indicators provided by JAM in an analysis of simulated summary statistics from 15,356 individuals (the total size of the MAGIC consortium) for 132 SNPs over 4 regions. Multi-SNP signals (3 effects) were simulated in each region. Predictors are ordered horizontally, and posterior samples from JAM are ordered vertically from bottom to top. For each SNP, inclusion at the particular iteration is denoted in black, and exclusion is denoted in white. This plot helps to visualise the mixing patterns of the selection indicators and if variables seem to stick, i.e. do not come in and out in a fairly regular manner. JAM was run for 2 million iterations.

13

Figure S8: Example trace plot of selection indicators provided by JAM in an analysis of simulated summary statistics from 15,356 individuals (the total size of the MAGIC consortium) for 10,000 SNPs over 304 regions. Multi-SNP signals (3 effects) were simulated in four of the regions. Predictors are ordered horizontally, and posterior samples from JAM are ordered vertically from bottom to top. For each SNP, inclusion at the particular iteration is denoted in black, and exclusion is denoted in white. This plot helps to visualise the mixing patterns of the selection indicators and if variables seem to stick, i.e. do not come in and out in a fairly regular manner. JAM was run for 20 million iterations.

14

Figure S9: Trace plot of selection indicators provided by JAM in an analysis of summary statistics published by the MAGIC consortium for 132 SNPs over four regions analysed in 15,356 individuals for association with glucose 2 hours after oral stimulation. SNPs are ordered horizontally, and posterior samples from JAM are ordered vertically from bottom to top. For each SNP, inclusion at the particular iteration is denoted in black, and exclusion is denoted in white. This plot helps to visualise the mixing patterns of the selection indicators and if variables seem to stick, i.e. do not come in and out in a fairly regular manner. JAM was run for 2 million iterations. Sticking can be seen during the first million iterations, but note that these iterations (the first half) were discarded as part of the burn-in

15

References [1] Green P J. Reversible Jump Markov Chain Monte Carlo Computation and Bayesian Model Determination. Biometrika, 82(4):711, 1995. [2] Yang J, Ferreira T, Morris A P, Medland S E, Madden P a F, et al. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nature Genetics, 44(4):369–375, 2012.

16