Statistical Methods for Detecting Differentially Methylated Regions Based on MethylCap-Seq Data
Deepak N. Ayyala (1), David E. Frankhouser (2), Javkhlan-Ochir Ganbat (2), Guido Marcucci (3), Ralf Bundschuh (3), Pearlly Yan (3), and Shili Lin (4, *)
(1) Deepak N. Ayyala is a postdoctoral researcher, Department of Statistics, The Ohio State University
(2) David E. Frankhouser and Javkhlan-Ochir Ganbat are graduate students, The Ohio State University Wexner Medical Center
(3) Guido Marcucci, Ralf Bundschuh, and Pearlly Yan are Professors, The Ohio State University Wexner Medical Center
(4) Shili Lin is Professor, Department of Statistics, The Ohio State University
(*) Address for correspondence: Shili Lin, PhD, Department of Statistics, The Ohio State University, 1958 Neil Avenue, Columbus, OH 43210-1247, USA. Tel: (614) 292-7404. Fax: (614) 292-2096. Email: [email protected]
Running Title: DMR Detection Using MethylCap-seq Data
ABSTRACT
DNA methylation is a well-established epigenetic mark whose pattern throughout the genome, especially in promoters or CpG islands, may be modified in a cell at a disease stage. Recently developed probabilistic approaches allow methylation signals from MethylCap-seq data to be distributed at nucleotide resolution. Standard statistical methods for detecting differential methylation suffer from the curse of dimensionality and sparsity in signals, resulting in high false positive rates. Strong correlation of signals between CG sites also yields spurious results. In this paper, we review the applicability of high-dimensional mean vector tests for detection of differentially methylated regions (DMRs) and compare and contrast such tests with other methods for detecting DMRs. Comprehensive simulation studies are conducted to highlight the performance of these tests under different settings. Based on our observations, we make recommendations on the optimal test to use. We illustrate the superiority of mean vector tests in detecting cancer-related canonical gene pathways which are significantly enriched for AML and ovarian cancer.
Keywords: Differentially methylated regions; MethylCap-seq; high dimensionality; mean vector test.
Key Points:
• Mean vector tests are better than currently practiced methods at controlling type I error for detecting differentially methylated regions.
• Of all mean vector tests, $T_{CQ}$ performs the best for methylation signal data and is based on minimal regularity conditions.
• Differences in sparsity profiles should be investigated to validate the accuracy of results.
1 Introduction
DNA methylation is an epigenetic modification known to regulate transcription and transcript splicing, and to play a role in a number of other cellular mechanisms. The dysregulation of methylation has been implicated in a variety of human diseases, including cancer, and is therefore extensively investigated. Whole genome bisulfite sequencing (WGBS) [1, 2] is a standard method to detect the methylation status of the genome at individual nucleotide resolution. However, due to its high experimental cost, other high-throughput techniques such as reduced representation bisulfite sequencing [3] and capture-based methods [5, 8] have been developed for large-scale studies. In particular, capture-based methylation assays such as MethylCap-seq are more resource-friendly for large-scale clinical trials, as these approaches require less patient material, a fraction of the sequencing depth, and less data analysis effort than single-base resolution whole genome bisulfite sequencing. Capture-based assays do have the drawback of limited data resolution, as methylation signals are associated with library fragment sizes [4]. Recently, we derived a novel approach (PrEMeR-CG) [5] to compute nucleotide-resolution methylation values from MethylCap-seq data. This prompts the need for more sensitive statistical models by which differences between sample groups can be assayed.
To identify differentially methylated regions (DMRs) in the genome between two groups, say control and treatment, a preliminary step is to define such regions. One way is to consider well-annotated regions such as CpG islands or promoter regions. Another way is to divide the entire genome into regions of fixed width, such as 1 kilobase pair (kbp). In such regions, methylation signals at neighboring CG sites are potentially positively correlated [6, 7].
As such, while the average or total methylation signal can be used to construct univariate testing procedures, ignoring the dependence structure can lead to spurious results. That is, representing the methylation signal of a region by a single value, such as the average or total over all sites within the region, is inadequate. There is also the risk of flattening out hyper- and hypo-methylated regions, leading to a reduction in power. Multivariate statistical methods which incorporate the correlation structure for detection of differential methylation within a region are an alternative that has not been extensively considered.
One of the approaches to obtain methylation coverage from MethylCap-seq reads is MEDIPS [8, 9], where the relative methylation signal, or reads per kilobase per million reads (rpkm) value, is calculated for bins of specified width. As described above, we have developed PrEMeR-CG, a probabilistic model to derive CpG-level methylation signals from MethylCap-seq data. In brief, the MethylCap-seq reads for each sample are probabilistically extended to their associated fragment length and distributed to each CG site contained in the extended fragment to generate a signal for each CG site in the genome. The signal for each site is then normalized to the total signal imparted by the total reads for the sample. Details can be found in [5].
To study differences in methylation signals between two groups, the statistical tests used are determined by how the signals are recorded. In WGBS studies, where methylation signals are represented by the counts of reads mapped to a particular CG site, statistical significance of differential methylation can be tested using Fisher's exact test [10–13] or the Mann-Whitney U test [11, 12, 14]. More sophisticated methods have also been developed in recent years. These include a lognormal-beta-binomial model to describe sequencing counts, whose Bayesian hierarchical structure enables information sharing across multiple CpG sites [15]. Another method is BSmooth, which uses a smoothing technique to borrow information from nearby CpG sites to derive a more accurate estimate of the methylation signal for each CpG site [16].
MethylSig, on the other hand, uses a beta-binomial model to take into account both read coverage and biological variation across samples [17]. Apart from [17], the other tests are only for detecting differential methylation for each CG site. In doing so, the dependence between methylation of neighboring CG sites is ignored. Multiple comparison corrections such as the Bonferroni correction and false discovery rate correction undermine the sensitivity of such methods because of the extremely high number of sites compared simultaneously.
Using probability-based methods such as PrEMeR-CG [5] for assigning reads to CG sites, the methylation signal for each CG site is represented on a continuous scale. Under distributional assumptions on the normalized signals, one can develop univariate tests for individual sites and apply multiple testing corrections, but this leads to the same problem as discussed above for count data. Alternatively, to study differential methylation of regions, one can use multivariate statistical tests such as Hotelling's $T^2$ to test all signals in the region simultaneously rather than testing each individual site. This results in a substantial reduction in the number of multiple comparisons needed, since one test is performed per region rather than per site.
In studies where the methylation regions under consideration are large, the number of CG sites within a region exceeds the number of subjects in the two groups. This can be attributed to the limitations of the study or the extent of CG site coverage within the regions. Standard multivariate techniques such as Hotelling's $T^2$ test are no longer valid when there are fewer subjects (sample size) than CG sites (dimension). To avoid this shortcoming, Frankhouser et al. [5] considered a generalized estimating equations approach (MethMAGE) to model the vector of signals within a region. While MethMAGE overcomes the curse of dimensionality when testing the multivariate hypothesis, this method is slow and ineffective for large studies. Additionally, the method depends on the correlation structure specified. A computationally feasible alternative is to use the family of mean vector tests, which are developed specifically for data sets with fewer samples than dimensions.
In this paper, we study the performance of three mean vector tests when applied to PrEMeR-CG derived CG site data for detection of differentially methylated regions. We further compare their performance to that of MethMAGE and the total signal t-test. In Section 2, we provide an outline of all methods being compared. We also briefly discuss the assumptions of these test statistics in the context of methylation. A simulation study comparing the efficiency of the five methods is provided in Section 3. We have analyzed two data sets, acute myeloid leukemia (AML) and ovarian cancer, using the tests being compared. Analyses of these data sets reveal significant differences between the various testing procedures, and confirm the superiority of mean vector tests for detecting biologically relevant regions. Results of the data analysis are provided in Section 4. We conclude with recommendations on the methods to use under different settings.
2 Methods
For a region with $k$ CG sites, let $X_i = (X_{i1}, X_{i2}, \ldots, X_{ik})$, $1 \le i \le n_1$, denote the observed methylation signals for the $n_1$ subjects in the first group and $Y_j = (Y_{j1}, Y_{j2}, \ldots, Y_{jk})$, $1 \le j \le n_2$, denote the observations for the second group. Let $\mu_X$ and $\mu_Y$ denote vectors of length $k$ of mean methylation signals for the two groups respectively. Defining differential methylation in terms of vectors of mean signal, the hypothesis of interest can be expressed as
\[ H_0 : \mu_X = \mu_Y \ \text{(not a DMR)} \quad \text{vs.} \quad H_A : \mu_X \neq \mu_Y \ \text{(DMR)}. \tag{1} \]
In the following subsections, we describe the testing procedures that we consider for comparison.
2.1 Univariate t-tests
Employing univariate tests to test for differential methylation can be done in two ways, total or site-wise. When using the total region signal, one looks at the total methylation signal within the region and constructs a univariate t-test. Let $x_i = \sum_{l=1}^{k} X_{il}$, $1 \le i \le n_1$, and $y_j = \sum_{l=1}^{k} Y_{jl}$, $1 \le j \le n_2$, denote the total methylation signal of the region for each subject in the two groups. One can then construct a t test statistic as
\[ t = \frac{\bar{x} - \bar{y}}{\sqrt{\dfrac{s_x^2}{n_1} + \dfrac{s_y^2}{n_2}}}, \]
where $\bar{x}$ and $\bar{y}$ are the sample means and $s_x$ and $s_y$ are the sample standard deviations respectively. Alternatively, if one is interested in differential methylation of each CG site, a corresponding t-test can be constructed. These two methods have been previously used in the literature for methylation analyses. While Yan et al. [18] used the t-test for total methylation signal to study global methylation difference, the site-wise t-test is used in the MEDIPS workflow [9]. In the current paper, we use the t-test for the total methylation signal in our investigation.
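The total-signal t statistic described above can be sketched in a few lines. This is a minimal illustration rather than the code used in the study: the function name `total_signal_t` and the toy matrices are hypothetical, and the unpooled (Welch-type) standard error simply follows the formula given in the text.

```python
import numpy as np

def total_signal_t(X, Y):
    """t statistic on per-subject region totals (hypothetical helper).

    X: (n1, k) array and Y: (n2, k) array of site-level signals,
    one row per subject, one column per CG site.
    """
    x = np.asarray(X, dtype=float).sum(axis=1)  # x_i = sum_l X_il
    y = np.asarray(Y, dtype=float).sum(axis=1)  # y_j = sum_l Y_jl
    n1, n2 = x.size, y.size
    # Unpooled standard error: sqrt(s_x^2 / n1 + s_y^2 / n2)
    se = np.sqrt(x.var(ddof=1) / n1 + y.var(ddof=1) / n2)
    return (x.mean() - y.mean()) / se

# Toy data, made up for illustration only (3 and 2 subjects, 3 CG sites).
X = np.array([[1.0, 2.0, 0.0], [0.5, 1.5, 1.0], [2.0, 0.0, 2.0]])
Y = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0]])
t_stat = total_signal_t(X, Y)
```

Because each subject is reduced to a single total, opposite shifts at different sites within the region can cancel in the numerator, which is the loss of power discussed in Section 1.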
2.2 MethMAGE
The generalized estimating equations method, called MethMAGE [5], provides a parametric test which defines a region to be differentially methylated by modeling the mean methylation signal vector. The GEE method models the mean vectors using an identity link function and a first order autoregressive (AR(1)) structure for the working correlation matrix. The AR(1) structure is chosen to reflect the decreasing correlation between CG sites with increasing distance between the sites. However, the model does not take into account the actual genomic distance and uses the indices of the sites instead. The autoregressive structure therefore models the correlation identically in regions where the sites are sparse and unequally spaced and in regions that are dense. As an illustration of this shortcoming, consider a region of length 100 bp with exactly 3 CG sites. The AR(1) model assumes the correlation of the first site with the remaining two is equal to $\alpha$ and $\alpha^2$ respectively, for an unknown parameter $\alpha$. This correlation model remains the same irrespective of the position of the sites. For example, the correlation structure remains the same whether the locations of the three sites are (1, 3, 5) or (1, 20, 80).
Another issue with the GEE method is the high computational cost. The iterative Newton-Raphson algorithm employed to estimate the parameters in the model requires an exponentially increasing number of iterations to converge, both in terms of the number of sites within the region and the number of subjects in the study. Standard statistical software such as the geepack package in R fails to converge for considerably large sample sizes. It is therefore imperative to consider other multivariate testing procedures that are faster, more accurate and do not require specifying the underlying correlation structure.
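The index-based AR(1) shortcoming in the three-site example can be made concrete with a short sketch. The helper `ar1_by_index` and the value of `alpha` are illustrative assumptions, not part of MethMAGE; the point is only that a working correlation built from site indices cannot distinguish the two position sets.

```python
import numpy as np

def ar1_by_index(k, alpha):
    """Index-based AR(1) working correlation: entry (i, j) is alpha ** |i - j|."""
    idx = np.arange(k)
    return alpha ** np.abs(idx[:, None] - idx[None, :])

# The two hypothetical site layouts from the text (positions in bp).
positions_dense = np.array([1, 3, 5])
positions_spread = np.array([1, 20, 80])
alpha = 0.5  # arbitrary illustrative value of the unknown AR(1) parameter

# Only the number of sites enters the model, never the positions themselves.
R_dense = ar1_by_index(len(positions_dense), alpha)
R_spread = ar1_by_index(len(positions_spread), alpha)
same = np.array_equal(R_dense, R_spread)
```

Both layouts yield the same matrix, with first-row entries 1, alpha and alpha squared, even though the second layout spreads the sites over a much longer stretch.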
2.3 Mean vector tests
From standard multivariate theory, the hypothesis in (1) can be tested using Hotelling's $T^2$ test statistic. However, validity of the $T^2$ test statistic requires both the $X_i$'s and $Y_j$'s to be normally distributed and the number of sites in the region to be smaller than the total number of subjects, $k < n_1 + n_2$. While the second assumption can be satisfied by increasing the sample size, verification of the normality assumption requires additional testing. Recently, several researchers have proposed test statistics which relax both of these assumptions. Amongst such test statistics, the following three are shown to outperform the rest under different conditions: Chen and Qin [19] (henceforth called $T_{CQ}$), Park and Ayyala [20] (henceforth called $T_{PA}$) and Srivastava, Katayama and Kano [21] (henceforth called $T_{SKK}$). The underlying idea for constructing these test statistics is to construct a function whose expected value is a function of $\mu_X - \mu_Y$. This function is constructed so that under the null hypothesis, i.e. $\mu_X = \mu_Y$, its expected value is equal to zero. When the null hypothesis is not true, the expected value is non-zero and is an increasing function of $\|\mu_X - \mu_Y\|$, for some norm in the $k$-dimensional space. Under certain regularity conditions on the fourth order moments and the covariance or correlation structure, all three test statistics are shown to be asymptotically normal. An appealing feature is that none of the three test statistics requires a direct distributional assumption on the data.
2.3.1 Chen-Qin test
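The Chen-Qin test [19] is based on the Euclidean norm of $\mu_X - \mu_Y$. The test statistic is given by
\[
T_{CQ} = \frac{\dfrac{\sum_{i \neq j}^{n_1} X_i' X_j}{n_1(n_1-1)} + \dfrac{\sum_{i \neq j}^{n_2} Y_i' Y_j}{n_2(n_2-1)} - 2\,\dfrac{\sum_{i=1}^{n_1}\sum_{j=1}^{n_2} X_i' Y_j}{n_1 n_2}}{\sqrt{\dfrac{2}{n_1(n_1-1)}\,\widehat{\operatorname{tr}(\Sigma_X^2)} + \dfrac{2}{n_2(n_2-1)}\,\widehat{\operatorname{tr}(\Sigma_Y^2)} + \dfrac{4}{n_1 n_2}\,\widehat{\operatorname{tr}(\Sigma_X \Sigma_Y)}}}, \tag{2}
\]
where
\[
\widehat{\operatorname{tr}(\Sigma_X^2)} = \frac{1}{n_1(n_1-1)} \sum_{i \neq j} X_j' \left(X_i - \bar{X}_{(i,j)}\right) X_i' \left(X_j - \bar{X}_{(i,j)}\right), \qquad \bar{X}_{(i,j)} = \frac{\sum_{l \neq i,j} X_l}{n_1 - 2},
\]
\[
\widehat{\operatorname{tr}(\Sigma_X \Sigma_Y)} = \frac{1}{n_1 n_2} \sum_{i=1}^{n_1}\sum_{j=1}^{n_2} X_i' \left(Y_j - \bar{Y}_{(j)}\right) Y_j' \left(X_i - \bar{X}_{(i)}\right), \qquad \bar{X}_{(i)} = \frac{\sum_{l \neq i} X_l}{n_1 - 1}, \quad \bar{Y}_{(j)} = \frac{\sum_{l \neq j} Y_l}{n_2 - 1};
\]
$\widehat{\operatorname{tr}(\Sigma_Y^2)}$ is defined similarly. This test statistic is established to be asymptotically normal, with the normal approximation holding for relatively small sample sizes. The direct distributional assumption is replaced by finiteness of fourth order moments of $X$ and $Y$. This test does not require a direct relationship between the sample size and dimension, which is replaced by conditions on traces of fourth powers of the covariance matrices, $\Sigma_X$ and $\Sigma_Y$. Also, $T_{CQ}$ is shown to outperform other orthogonal-transformation invariant tests such as the Bai-Saranadasa test [22] and the Dempster test [23].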
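As a rough illustration of how the Chen-Qin statistic can be evaluated, the sketch below follows the structure of the formula in [19] with a sum of the three variance terms in the denominator: cross-product sums over unequal indices in the numerator and leave-out trace estimators below it. The function names are hypothetical, the plain double loops favor readability over speed, and each group is assumed to contain at least three subjects so that the leave-two-out means exist. It is not the implementation used in this paper.

```python
import numpy as np

def _trace_sq(Z):
    """Leave-two-out estimator of tr(Sigma^2) for one group (rows = subjects)."""
    n = Z.shape[0]
    total = Z.sum(axis=0)
    acc = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            m = (total - Z[i] - Z[j]) / (n - 2)  # mean without subjects i and j
            acc += (Z[j] @ (Z[i] - m)) * (Z[i] @ (Z[j] - m))
    return acc / (n * (n - 1))

def _trace_cross(X, Y):
    """Leave-one-out estimator of tr(Sigma_X Sigma_Y)."""
    n1, n2 = X.shape[0], Y.shape[0]
    tx, ty = X.sum(axis=0), Y.sum(axis=0)
    acc = 0.0
    for i in range(n1):
        mx = (tx - X[i]) / (n1 - 1)  # mean of X without subject i
        for j in range(n2):
            my = (ty - Y[j]) / (n2 - 1)  # mean of Y without subject j
            acc += (X[i] @ (Y[j] - my)) * (Y[j] @ (X[i] - mx))
    return acc / (n1 * n2)

def chen_qin_statistic(X, Y):
    """Standardized Chen-Qin statistic; large positive values suggest a DMR."""
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    n1, n2 = X.shape[0], Y.shape[0]
    gx, gy, gxy = X @ X.T, Y @ Y.T, X @ Y.T
    # Numerator: within-group sums exclude the diagonal (i == j) terms.
    num = ((gx.sum() - np.trace(gx)) / (n1 * (n1 - 1))
           + (gy.sum() - np.trace(gy)) / (n2 * (n2 - 1))
           - 2.0 * gxy.sum() / (n1 * n2))
    var = (2.0 * _trace_sq(X) / (n1 * (n1 - 1))
           + 2.0 * _trace_sq(Y) / (n2 * (n2 - 1))
           + 4.0 * _trace_cross(X, Y) / (n1 * n2))
    return num / np.sqrt(var)

# Synthetic Gaussian data, purely for illustration (12 and 10 subjects, 20 sites).
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 20))
Y = rng.normal(size=(10, 20))
t_null = chen_qin_statistic(X, Y)         # both groups share the same mean
t_shift = chen_qin_statistic(X, Y + 2.0)  # every site shifted in group 2
```

Under the asymptotic normality described above, the statistic would be compared with an upper-tail standard normal quantile, since the numerator estimates the squared Euclidean distance between the mean vectors.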
2.3.2 Srivastava-Katayama-Kano test
The Srivastava-Katayama-Kano test [21] is based on a scaled-Euclidean norm, where the inner product terms are scaled by their variance. The test statistic is given by
\[
T_{SKK} = \frac{(\bar{X} - \bar{Y})' D_S^{-1} (\bar{X} - \bar{Y}) - k}{\sqrt{2\operatorname{tr}(R^2) - \dfrac{2\operatorname{tr}(D_S^{-1} S_X)^2}{n_1(n_1-1)} - \dfrac{2\operatorname{tr}(D_S^{-1} S_Y)^2}{n_2(n_2-1)}}}, \tag{3}
\]
where $\operatorname{tr}(\cdot)$ is the trace of the matrix, $\bar{X} = \left(\frac{\sum_i X_{i1}}{n_1}, \ldots, \frac{\sum_i X_{ik}}{n_1}\right)$ and $\bar{Y} = \left(\frac{\sum_j Y_{j1}}{n_2}, \ldots, \frac{\sum_j Y_{jk}}{n_2}\right)$ are vectors of site-wise average signals, $S_X$ and $S_Y$ are the sample covariance matrices, $S = \frac{S_X}{n_1} + \frac{S_Y}{n_2}$, $D_S$ is the $k \times k$ diagonal matrix whose elements come from the diagonal of $S$, and $R = D_S^{-1/2} S D_S^{-1/2}$ is the correlation matrix. Similar to $T_{CQ}$, asymptotic normality of $T_{SKK}$ holds under conditions on the correlation matrix. Finiteness of fourth order moments is also assumed, which relaxes the distribution requirement. Unlike $T_{CQ}$, the sample sizes and dimension are restricted to satisfy $\min(n_1, n_2) = O(k^{\delta})$, $\delta > \frac{1}{2}$. This direct relationship between sample size and number of CG sites affects the performance of $T_{SKK}$ in dense regions.
2.3.3 Park-Ayyala test
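The Park-Ayyala test [20] is also based on the scaled-Euclidean norm, similar to $T_{SKK}$. The test statistic is given by
\[
T_{PA} = \frac{\dfrac{\sum_{i \neq j}^{n_1} X_i' D^{-1}_{S^*_{X(i,j)}} X_j}{n_1(n_1-1)} + \dfrac{\sum_{i \neq j}^{n_2} Y_i' D^{-1}_{S^*_{Y(i,j)}} Y_j}{n_2(n_2-1)} - 2\,\dfrac{\sum_{i=1}^{n_1}\sum_{j=1}^{n_2} X_i' D^{-1}_{S^*_{XY(i,j)}} Y_j}{n_1 n_2}}{\sqrt{\dfrac{2}{n_1(n_1-1)}\,\widehat{\operatorname{tr}(R_X^2)} + \dfrac{2}{n_2(n_2-1)}\,\widehat{\operatorname{tr}(R_Y^2)} + \dfrac{4}{n_1 n_2}\,\widehat{\operatorname{tr}(R_X R_Y)}}}, \tag{4}
\]
where
\[
S^*_{X(i,j)} = \frac{(n_1-3)S_{X(i,j)} + (n_2-1)S_Y}{n_1+n_2-4}, \quad S^*_{Y(i,j)} = \frac{(n_1-1)S_X + (n_2-3)S_{Y(i,j)}}{n_1+n_2-4}, \quad \text{and} \quad S^*_{XY(i,j)} = \frac{(n_1-2)S_{X(i)} + (n_2-2)S_{Y(j)}}{n_1+n_2-4}.
\]
In the above notation, $S_{X(i,j)}$ and $S_{X(i)}$ are the covariance matrices estimated using $\{X_l\}_{l \neq i,j}$ and $\{X_l\}_{l \neq i}$, respectively; $S_{Y(i,j)}$ and $S_{Y(j)}$ are defined similarly. The estimators in the denominator are given by
\[
\widehat{\operatorname{tr}(R_X^2)} = \frac{1}{n_1(n_1-1)} \sum_{i \neq j} X_j' D^{-1}_{S^*_{X(i,j)}} \left(X_i - \bar{X}_{(i,j)}\right) X_i' D^{-1}_{S^*_{X(i,j)}} \left(X_j - \bar{X}_{(i,j)}\right),
\]
\[
\widehat{\operatorname{tr}(R_X R_Y)} = \frac{1}{n_1 n_2} \sum_{i=1}^{n_1}\sum_{j=1}^{n_2} X_i' D^{-1}_{S^*_{XY(i,j)}} \left(Y_j - \bar{Y}_{(j)}\right) Y_j' D^{-1}_{S^*_{XY(i,j)}} \left(X_i - \bar{X}_{(i)}\right),
\]
with $\widehat{\operatorname{tr}(R_Y^2)}$ defined analogously to $\widehat{\operatorname{tr}(R_X^2)}$.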
Note that, as in the definition of $T_{SKK}$, $D_S$ denotes the $k \times k$ diagonal matrix whose elements come from the diagonal of $S$. Construction of $T_{PA}$ is inspired by $T_{CQ}$ in the sense that both these tests consider only quadratic products for unequal indices in the numerator. It has been shown that this construction results in removing strong assumptions on the distribution and dependence structure. Asymptotic normality of $T_{PA}$ has been shown under conditions on the correlation matrices which are similar to those for $T_{CQ}$. The direct relationship between $k$, $n_1$ and $n_2$ required by $T_{SKK}$ is relaxed, and hence $T_{PA}$ is known to control type I error better than $T_{SKK}$. However, $T_{PA}$ is constructed on the assumption that the two groups have equal variances, that is, $S_X$ and $S_Y$ have the same diagonals. Making such an assumption without validation is not advisable. Although equality of variances is a testable hypothesis and has been previously studied for individual sites [24], studying differential variability of regions is beyond the scope of this manuscript.
The test statistics $T_{PA}$ and $T_{SKK}$ are scale-invariant due to variance scaling and hence are known to be powerful when the variates are on different scales. When methylation signals are obtained through PrEMeR-CG, the signals are reported, after adjusting for multiple reads, as the average signal per million reads. The signals depend not only on the probability assigned to each read, but also on the number of reads.
3 Simulation Study
A comprehensive simulation study comparing the performance of $T_{CQ}$, $T_{PA}$ and $T_{SKK}$ using data generated with a sparse covariance structure was performed in [20]. In contrast to the data in such studies, methylation signal data suffer from sparsity in the distributed signals across samples. To compare the performance of $T_{CQ}$, $T_{PA}$, $T_{SKK}$, the total signal t-test and MethMAGE when applied to methylation signals, we have performed a simulation study using the AML methylation data described in [5] and the ovarian cancer methylation data reported in [25], representing two disparate sets of MethylCap-seq data due to their different sample sizes and sparsity. Since the AML data and ovarian cancer data measure methylation signals over promoter regions and CpG islands respectively, the simulation studies illustrate performance of the five test statistics for both region types. A detailed description of the data sets is provided in Section 4.
3.1 Simulation models
In this section, we describe the simulation models used to generate data for both the AML and ovarian cancer data sets. From each data set, we identified 10 regions that have been determined to be non-differentially methylated and 10 that are differentially methylated. The former were used to calculate empirical type I error, while the latter were used to calculate power. Note that these 20 regions were selected based on a number of criteria to provide regions with sufficient data to evaluate the methods. These criteria and more details are provided in the discussion section.
Methylation signals obtained using PrEMeR-CG are non-negative, hence we generate random signals using the multivariate lognormal distribution. Generation of multivariate lognormal random variables is restrictive and is not feasible for an arbitrary specified covariance structure. Assuming an autoregressive correlation structure on the original signals does not guarantee positive-definiteness of the covariance matrix of the log-transformed normal variables. Hence the specified correlation structure is assumed on the log-transformed normal variables instead. This assumed autoregressive structure on the log-transformed signals preserves the desired properties of the correlation structure of the original signals.
To model the covariance structure for the two groups, we specify the correlation matrices and the diagonals of the covariance matrices. This specification allows control over the correlation structure for the signals. Let $R_X$, $R_Y$, $D_{\Sigma_X}$ and $D_{\Sigma_Y}$ denote the correlation matrices and the diagonals of the covariance matrices for the two groups respectively, so that the covariance matrices are given by $\Sigma_X = D_{\Sigma_X}^{1/2} R_X D_{\Sigma_X}^{1/2}$ and $\Sigma_Y = D_{\Sigma_Y}^{1/2} R_Y D_{\Sigma_Y}^{1/2}$. As postulated in Section 1, the correlation between sites decreases with increasing distance between them. To model this property, we use two correlation structures denoted $R^I$ and $R^G$, where the correlation is assumed to decrease with increasing difference of indices and genomic distance respectively. The correlation matrices used are
\[
R^I_{X,ij} = \rho_X^{|i-j|}, \qquad R^I_{Y,ij} = \rho_Y^{|i-j|}, \qquad 1 \le i, j \le k,
\]
\[
R^G_{X,ij} = \rho_X^{|g_i - g_j|/10}, \qquad R^G_{Y,ij} = \rho_Y^{|g_i - g_j|/10}, \qquad 1 \le i, j \le k,
\]
where $g_1, g_2, \ldots, g_k$ are the genomic positions (in base pairs) of the sites. In other words, under $R^G$, $\rho$ is the correlation between two CG sites that are separated by 10 base pairs. Models generated using $R^I$ and $R^G$ are denoted Model I and Model II respectively.
The diagonals of the covariance matrices, $D_{\Sigma_X}$ and $D_{\Sigma_Y}$, are estimated from the data separately for the two groups. As described in Section 2.3.3, $T_{PA}$ assumes that the two groups have equal covariance diagonals, $D_{\Sigma_X} = D_{\Sigma_Y}$. To study performance of the tests under different specifications of the covariance diagonals, we generated data under two settings. In the first setting, we specify $D_{\Sigma_X} \neq D_{\Sigma_Y}$, estimated separately for the two groups. In the second setting, we specify $D_{\Sigma_X} = D_{\Sigma_Y} = D_{\Sigma}$, where $D_{\Sigma}$ is estimated using observations from both groups.
For sites with signal observed in at least one sample, the multivariate lognormal distribution generates random samples that are always non-zero. However, a methylation signal is not observed at every site for all the samples. To induce sparsity across samples in the generated lognormal signals, we zero-inflate them by multiplying the generated signals by Bernoulli random variables. To generate the Bernoulli variables, we use the observed proportion of non-zero signals for each CG site as the parameter, denoted by $\pi_X = (\pi_{X1}, \ldots, \pi_{Xk})$ and $\pi_Y = (\pi_{Y1}, \ldots, \pi_{Yk})$ for the two groups respectively.
The mean signals for the two groups are specified depending on whether the region is differentially methylated or not. Let $\mu_X$ and $\mu_Y$ denote the average methylation signal observed for the two groups and $\mu$ denote the average methylation signal observed across all samples. To calculate type I error using non-differentially methylated regions, we specify $\mu_X = \mu_Y = \mu$ as the mean signal for both groups. When calculating power, we specify $\mu_X$ and $\mu_Y$ as the means for the two groups respectively. The generating model can hence be formulated in general as
\[
\begin{aligned}
X_i &= B^X_i \circ X_i^*, & Y_j &= B^Y_j \circ Y_j^*, & 1 \le i \le n_1,\ 1 \le j \le n_2, \\
B^X_i &= (B^X_{i1}, B^X_{i2}, \ldots, B^X_{ik}), & B^Y_j &= (B^Y_{j1}, B^Y_{j2}, \ldots, B^Y_{jk}), \\
X_i^* &\sim LN(\mu_X, D_{\Sigma_X}, R_X), & Y_j^* &\sim LN(\mu_Y, D_{\Sigma_Y}, R_Y), \\
B^X_{il} &\sim \mathrm{Bern}(\pi_{Xl}), & B^Y_{jl} &\sim \mathrm{Bern}(\pi_{Yl}), & 1 \le l \le k,
\end{aligned} \tag{5}
\]
where $\circ$ is the element-wise Hadamard product. The numbers of samples in the two groups are set as $n_1 = 20$ and $n_2 = 10$ respectively. The parameters $(R_X, D_{\Sigma_X}, \pi_X, R_Y, D_{\Sigma_Y}, \pi_Y)$ are specified according to the model being studied. For both Model I and Model II, we also generated data by reducing the observed proportion of non-zero signals by half, that is, using $0.5\pi_X$ and $0.5\pi_Y$ in place of $\pi_X$ and $\pi_Y$. This is to study the performance of the tests when the observed methylation at each site is more sparse across samples. The correlation parameters $\rho_X$ and $\rho_Y$ are fixed at 0.35 and 0.2 respectively. The parameters for the eight models studied are listed in Table 1. For each of the 20 regions, we generated 1000 data sets, resulting in a total of 20000 data sets. Half of the data sets were used to calculate type I error while the remaining were used to calculate power.
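The generating model (5) with the distance-based correlation $R^G$ can be sketched as follows. This is only an illustration under stated assumptions: the function names, positions, means, variances and non-zero proportions are all made up, and the step that converts a site's mean and variance into log-scale parameters by moment matching is our assumption about one reasonable parameterization, not a description of the authors' code. The correlation matrix is imposed on the log scale, as the text describes.

```python
import numpy as np

def corr_by_distance(positions, rho, unit=10.0):
    """R^G: entry (i, j) is rho ** (|g_i - g_j| / unit), unit = 10 bp in the text."""
    g = np.asarray(positions, dtype=float)
    return rho ** (np.abs(g[:, None] - g[None, :]) / unit)

def simulate_group(n, mean, var, corr, pi, rng):
    """Draw n zero-inflated multivariate lognormal signal vectors (rows)."""
    mean = np.asarray(mean, dtype=float)
    var = np.asarray(var, dtype=float)
    # Assumption: match each site's lognormal marginal to (mean, var).
    s2 = np.log1p(var / mean ** 2)   # log-scale variance
    loc = np.log(mean) - s2 / 2.0    # log-scale mean
    sd = np.sqrt(s2)
    cov = sd[:, None] * corr * sd[None, :]  # correlation applied on the log scale
    latent = np.exp(rng.multivariate_normal(loc, cov, size=n))  # X* > 0
    # Bernoulli mask: site l is kept (non-zero) with probability pi[l].
    keep = rng.random((n, mean.size)) < np.asarray(pi, dtype=float)
    return keep * latent  # element-wise (Hadamard) product B o X*

rng = np.random.default_rng(2024)
positions = [1, 20, 80, 95]                  # hypothetical CG positions in bp
R_X = corr_by_distance(positions, rho=0.35)  # rho_X = 0.35 as in the text
X = simulate_group(20, mean=[1.0, 2.0, 1.5, 0.5], var=[0.5, 1.0, 0.8, 0.2],
                   corr=R_X, pi=[0.9, 0.5, 0.7, 0.8], rng=rng)
```

Halving `pi` before the call would mimic the sparser setups, and replacing `corr_by_distance` with an index-based matrix would give the Model I analogue.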
3.2 Results for data simulated based on AML
The regions used to model parameters for data generation using the AML data sets are described in Supplementary Material Table S1. Empirical type I error and power of the five test statistics at the nominal 5% significance level are tabulated in Table 2. We see that none of the test statistics controls type I error at the nominal level. $T_{CQ}$ is moderately liberal, while all the other test statistics have extremely inflated type I error rates. Methylation signals recorded for promoter regions in the AML data sets have shown significant differences in sparsity profiles between the two groups (see Supplementary Figure S1). When the sparsity parameters are halved in setups 2 and 4, $T_{PA}$ and $T_{SKK}$ show a considerable reduction in type I error. This can be explained by the decrease in the difference between the sparsity of the two groups, and is an indication that these two tests are sensitive to differences in sparsity profiles. In a simulation study (see Supplementary Table S3) where the sparsity vectors are set to be equal, we have seen a significant reduction in type I error rates, with $T_{PA}$ even becoming conservative. Hence, one should compare sparsity profiles of the two groups before inferring the accuracy of the results.
For all models, $T_{CQ}$ achieves the highest power amongst the five tests. Although $T_{SKK}$ is also observed to achieve high power, the test is extremely liberal, so its apparent power is not a meaningful basis for comparison. To study the power for a fixed false positive rate, we plotted the ROC curves for the five tests for Model I in Figure 1. Results for Model II convey the same information; the corresponding plots are presented in Supplementary Figure S7. The curves show that $T_{CQ}$ attains the best performance, while at high specificity $T_{PA}$ and MethMAGE perform better than $T_{SKK}$ and the t-test. Fixing the false positive rate at 5%, the sensitivity of the five tests is presented in Table 3. Both $T_{CQ}$ and $T_{PA}$ are seen to achieve higher sensitivity than the remaining tests when the false positive rate is controlled for all the models. This conclusion holds irrespective of the two correlation models, indicating that $T_{CQ}$ is the method of choice under either hypothesized correlation structure.
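The sensitivity-at-fixed-false-positive-rate comparison can be illustrated with a small sketch. It assumes that a test rejects for large values of its statistic (as the mean vector tests do; for a two-sided test such as the t-test one would pass absolute values), and that statistics computed on null and non-null simulated data sets are already available. The function name and the toy numbers are hypothetical.

```python
import numpy as np

def sensitivity_at_fpr(null_stats, alt_stats, fpr=0.05):
    """Fraction of non-null statistics above a cutoff calibrated on null ones.

    The cutoff is an order statistic of the null values, chosen so that
    at most a fraction `fpr` of the null statistics exceeds it.
    """
    null_sorted = np.sort(np.asarray(null_stats, dtype=float))
    alt = np.asarray(alt_stats, dtype=float)
    n = null_sorted.size
    # Number of null statistics allowed above the cutoff (epsilon guards rounding).
    allowed = int(np.floor(fpr * n + 1e-9))
    cutoff = null_sorted[n - allowed - 1]
    return float(np.mean(alt > cutoff))

# Deterministic toy values, for illustration only.
null_stats = np.arange(100)     # statistics from 100 null data sets
alt_stats = np.arange(50, 150)  # statistics from 100 non-null data sets
sens = sensitivity_at_fpr(null_stats, alt_stats, fpr=0.05)
```

Calibrating the cutoff empirically in this way puts liberal and conservative tests on the same footing, which is why the ranking at a fixed false positive rate can differ from the ranking by nominal-level power.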
3.3 Results for data simulated based on ovarian cancer
Attributes of the regions used to model parameters for data generation using the ovarian cancer data sets are described in Supplementary Material Table S2. Empirical type I error and power calculated at the 5% nominal significance level for the eight models are tabulated in Table 4. Results from the table show that $T_{CQ}$ and $T_{PA}$ perform well in preserving type I error, with $T_{PA}$ being conservative. This is reflected in the power comparison of these two tests, with $T_{PA}$ achieving lower power than $T_{CQ}$. An interesting observation from the table is that despite its assumptions failing to hold in setups 3 and 4, $T_{PA}$ still preserves type I error and achieves reasonable power. This is an indication that, contrary to the promoter regions used in the AML data set, site-wise variability of methylation signals in the CpG islands is not significantly different for the two groups.
The observed sparsity profiles of the regions considered are very similar for both groups. This is reflected in the similarity of results for the odd and even numbered setups. Decreasing the difference in sparsity parameters for the two groups does not yield significantly different type I error, in contrast to what was observed for the AML data set. This reiterates our findings for the AML data set that the mean vector tests are sensitive to differences in sparsity levels, while MethMAGE is observed to be robust. Results of simulation studies performed by specifying equal sparsity profiles for the two groups (see Supplementary Material Table S4) are similar to those in Table 4, indicating that $T_{SKK}$, the t-test and MethMAGE all have inflated type I error rates, resulting in detection of more false positives.
ROC curves of the five test statistics for Model I are presented in Figure 2. Corresponding plots for Model II are presented in Supplementary Figure S8. The plots show that the mean vector tests perform well for methylation signals recorded in CpG islands, whereas MethMAGE and the t-test are worse than a random guess. The curves also emphasize the need to consider dependency between sites for better accuracy. Sensitivity of the tests calculated by controlling the false positive rate at 5% is presented in Table 5, which shows the mean vector tests achieving higher sensitivity than the t-test and MethMAGE. Once again, qualitatively, there is little difference in the performance of the methods under the two different correlation structures.
3.4 Run time
Another advantage that the mean vector testing approaches have over MethMAGE is run time. Since MethMAGE is based on generalized estimating equations, which is an optimization-based method, it requires a longer time for convergence, whereas the mean vector tests are all based on matrix calculations and are faster. To illustrate the run times of these methods, we recorded the user time taken when evaluating them. The calculations were performed on a 3.2 GHz Intel Core i5 processor with 12 GB RAM. We selected four regions with varying numbers of sites among the 20 to illustrate the run time of each method and how the run time is affected by the number of CG sites. The numbers of sites in these regions are 49, 83, 132 and 176 respectively. To further study the effect of sample size on run time, we varied the sample sizes as $(n_1, n_2) = (7C, 3C)$, with $1 \le C \le 7$. Average run times (in milliseconds) based on 100 randomly simulated data sets are presented in Table 6. Run time for $T_{PA}$ is not available for $C = 1$ since the test requires at least four samples in both groups. Being a univariate test, the t-test is the fastest method, while $T_{SKK}$ is the fastest and MethMAGE is the slowest amongst the multivariate methods. $T_{PA}$ is slower than the other mean vector tests since it involves $O(n^2)$ calculations of the diagonal of the covariance matrix when the sample size is $n$.
To better visualize how run time for each method is affected by the number of CG sites, we provide Supplementary Figure S9. From the figure, we can see that, regardless of the sample sizes, the run time for the mean vector tests increases linearly with the number of sites, whereas for MethMAGE, the increase in run time appears to be exponential. Further, as sample size increases, the run time for MethMAGE and $T_{PA}$ increases much more rapidly than for the other tests. In particular, the run times for the t-test and the $T_{SKK}$ test are essentially unaffected by the number of CG sites or the sample size.
4 Analysis of Actual Patient Data
We applied the aforementioned test statistics to detect differentially methylated regions in two data sets. For the first analysis, we used the data set in [5]. The data are available on 39944 regions from 10 AML patients: 7 patients with FLT3 wild-type and 3 with FLT3-ITD. We applied two of the three mean vector tests, TCQ and TSKK, and compared the results to those from MethMAGE and the t-test. The TPA test was not applied as it requires at least four samples in both groups. For the second analysis, we used the ovarian cancer samples and accompanying clinical data, which were used as classifiers to identify clinically distinct groups [25]. The 100 subjects were divided into two groups based on one of three attributes: diagnosis (malignant or benign), normal (yes or no) and recurrence (yes or no). The numbers of patients in the two groups for the three divisions are 20 benign and 74 malignant, 6 normal and 94 cancer, and 57 recurrent and 27 non-recurrent, respectively. In the human genome, there are 27,718 CpG islands, as defined by the UCSC Genome Browser. The methylation status of this genomic feature was used to evaluate methylation differences between the two groups, as CpG islands are known to impact gene transcription [24]. Our analysis is focused on detecting differentially methylated CpG islands between the two conditions for each of the three variables.
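To make concrete what a mean vector test computes on a single region, the sketch below evaluates the unstandardized two-sample statistic of Chen and Qin [19], assuming that this is the statistic behind TCQ. It omits the variance estimate and the normal approximation needed to obtain a p-value, so it is not a complete test. The function name and the toy matrices are illustrative assumptions.

```python
import numpy as np

def chen_qin_numerator(X, Y):
    """Unstandardized Chen-Qin statistic for one region.

    X: (n1, k) array of signals for group 1 (rows = samples, columns = CG sites)
    Y: (n2, k) array of signals for group 2
    Returns an unbiased estimate of the squared distance between mean vectors.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n1, n2 = X.shape[0], Y.shape[0]
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two samples")
    # Drop sites whose signal is zero in every sample of both groups.
    keep = np.any(X != 0, axis=0) | np.any(Y != 0, axis=0)
    X, Y = X[:, keep], Y[:, keep]
    gram_x, gram_y, gram_xy = X @ X.T, Y @ Y.T, X @ Y.T
    # Sums over i != j exclude the diagonal of each within-group Gram matrix.
    within_x = (gram_x.sum() - np.trace(gram_x)) / (n1 * (n1 - 1))
    within_y = (gram_y.sum() - np.trace(gram_y)) / (n2 * (n2 - 1))
    between = gram_xy.sum() / (n1 * n2)
    return within_x + within_y - 2.0 * between

# Toy region: group 1 has signal 1 at four of five sites, group 2 has none.
X_toy = np.ones((3, 5))
X_toy[:, 4] = 0.0
Y_toy = np.zeros((4, 5))
stat = chen_qin_numerator(X_toy, Y_toy)
```

Because the statistic uses only inner products between samples, it needs neither the inverse of the sample covariance matrix nor more samples than sites, which is what makes it usable with 7 and 3 samples per group.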
4.1 Results - AML Data
Figure 3 presents the Venn diagram showing the number of regions that are detected using TCQ, TSKK, the t-test and MethMAGE. From simulation studies based on the AML data set, all test statistics except TCQ were seen to have substantially inflated type I error. Hence a large number of the regions detected by TSKK, the t-test and MethMAGE are expected to be false positives. The number of regions detected by TCQ is the smallest amongst the four tests applied. Since TCQ is observed to be slightly liberal and achieves high power, we can conclude that the regions detected by TCQ still contain some false positives, but are the most reliable.
Canonical gene pathways constructed using QIAGEN's Ingenuity® Pathway Analysis (IPA) are presented in Figure 4. Since the t-test showed an extremely high type I error rate, we performed IPA only for TCQ, TSKK and MethMAGE. Using a significance level of 1%, we see that TCQ has a higher proportion of detected pathways that are AML cancer-related (33%) than TSKK and MethMAGE (25%). This is consistent with our observations from the simulation study, where TCQ has a slightly inflated type I error rate but achieves high power, whereas TSKK and MethMAGE were very liberal. The high type I error for TSKK and MethMAGE observed in the simulations indicates that most of the pathways detected by these two methods could potentially be false positives.
4.2 Results - Ovarian data
To analyze the ovarian cancer data, we considered four tests: TSKK, TCQ, TPA and the t-test. Due to the large sample size and the high number of CG sites in the CpG islands, MethMAGE failed to converge within a reasonable amount of time. Hence, we did not consider MethMAGE in our analysis. Figure 5 presents the Venn diagrams for the ovarian data set where the groups are separated by diagnosis, recurrence and normalcy status, respectively. An interesting feature is the failure of TSKK to detect differentially methylated regions when
subjects are grouped by recurrence status. In simulation studies based on the ovarian cancer data, TSKK is seen to have an inflated type I error rate. We speculate that the failure to identify more than one region as differentially methylated is possibly due to high bias in the estimation of the correlation matrix.
Biological relevance of the DMRs produced when samples are divided by their diagnosis status was explored using the canonical pathways produced by IPA. We used only diagnosis as the group status because it allows us to detect pathways which are different between cancer and normal subjects. As in the AML analysis, since the t-test shows an extremely high type I error rate and detects almost all the regions as differentially methylated, we performed IPA only for TCQ, TPA and TSKK. For each of the three methods, the CpG island DMRs were associated with the nearest downstream gene in order to generate a gene list to be input into IPA. The pathways produced from the three gene lists are presented in Figure 6. The gene list from each method produced a number of overlapping, biologically relevant, cancer-related canonical pathways. All three methods show similar proportions of ovarian-cancer related pathways detected: 60% for TCQ and 50% for TSKK and TPA. However, only the genes associated with the DMRs from TCQ resulted in an ovarian-specific pathway (marked by an asterisk) that was significantly enriched. As TCQ and TPA were seen to control type I error in the simulation studies, it is unlikely that the pathways detected by these methods are false positives.
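The DMR-to-gene association step can be sketched as follows. This is an illustration under simplifying assumptions that the text does not spell out: a single chromosome, forward-strand genes only, and "downstream" taken to mean a gene start at or beyond the end of the CpG island. The function name, coordinates and gene labels are all hypothetical.

```python
from bisect import bisect_left

def nearest_downstream_gene(island_end, gene_starts, gene_names):
    """Return the gene whose start is the first one at or after island_end.

    gene_starts must be sorted ascending, with gene_names in matching order.
    Returns None when no gene lies downstream of the island.
    """
    idx = bisect_left(gene_starts, island_end)
    if idx == len(gene_starts):
        return None
    return gene_names[idx]

# Hypothetical coordinates on one chromosome (not taken from the data sets).
gene_starts = [1_000, 5_000, 12_000]
gene_names = ["geneA", "geneB", "geneC"]
dmr_islands = [(200, 800), (4_200, 4_600), (13_000, 13_500)]  # (start, end)

gene_list = [
    nearest_downstream_gene(end, gene_starts, gene_names)
    for _, end in dmr_islands
]
```

A real annotation would also need to handle reverse-strand genes, where downstream means decreasing coordinates, and islands that overlap a gene start.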
Conclusions and discussion
In this paper, we explored the applicability of high dimensional vector mean testing procedures to detect differentially methylated regions using MethylCap-seq data. Using PrEMeR-CG, a probabilistic method for assigning methylation signals to CG sites, we have demonstrated the
effectiveness of the vector mean tests when compared to the more commonly used t-test and MethMAGE. The vector mean tests are developed for a larger family of correlation structures than MethMAGE. The t-test is shown to ignore the correlation structure when averaging or totaling the signal over sites within a region, resulting in extremely inflated type I error rates. The performance of the mean vector tests is very strongly affected by the sparsity profiles across samples for the two groups. Hence we recommend first investigating the sparsity of the signals across samples. When the observed signals are sparse, TCQ is seen to be the best test both in terms of controlling type I error and achieving power. TPA and TSKK are sensitive to differences between the sparsity profiles of the two groups but perform well when the difference is small. The similarity in performance of TPA when the diagonal of the covariance matrix is specified to be equal or different is an indication that the sitewise variability of the methylation signals is fairly consistent. For the ovarian cohort simulations, where the signals are based on CpG islands, TPA is observed to be conservative whereas TCQ is seen to have a slightly higher type I error rate. This can be attributed to empirical error, since the normal distribution of the test statistic is asymptotic. Although the assumptions made are similar, TSKK fails to control type I error. This might be because of the restriction between sample size and dimension, which is not assumed for TCQ and TPA. TCQ attains much higher power than TPA under all models, and since the difference in power between them is much greater than the loss in type I error in this trade-off, we conclude that TCQ is the best test for detecting differential methylation in CpG islands. The above conclusions are drawn based on simulated data sets with samples of 20 and 10 in the two groups.
Although these sample sizes are much smaller than the sample sizes for the ovarian data set, they are larger than those of the AML data set; therefore, we further evaluated
the performance of the tests for simulated data sets with two groups of 6 and 4 samples to match the total sample size (10) of the AML data. Note that although the AML data consist of two groups of 7 and 3 samples, we opted for the 6 and 4 split in this simulation since TPA requires a minimum of 4 samples in each group. The results, as presented in Supplementary Tables S7-S10 and Supplementary Figures S10 and S11, show that TCQ continues to outperform the other mean vector tests and MethMAGE. It is noted, though, that the t-test for the simulation based on the AML data has higher power than TCQ when the empirical type I error is controlled at the 5% level under two of the setups. More details are provided in Supplementary Material S4. In both simulations with 10 DMRs and 10 non-DMRs, the regions were selected from the AML and the ovarian data according to the following criteria: (1) The number of CG sites in each region should be greater than 50 but smaller than 200. This is to have more sites than the sample size, but not so many that MethMAGE will not run in a reasonable amount of time; (2) Each region should have at least 40% of its sites with non-zero signal for at least one individual. Since the test statistics remove sites with all zero signals, this is to ensure that a reasonable number of sites remain for the analyses; (3) For the 10 DMRs, we further require that the means are significantly different based on the real data analysis to ensure that the tests will have a good chance of detecting the difference. Nevertheless, we recognize the need to consider other regions in the genome to more comprehensively evaluate the vector mean testing methods. Thus, we carried out an additional simulation study by randomly selecting 200 DMRs and 5000 non-DMRs to further gauge the performance of the methods in a variety of, and potentially less ideal, sparsity settings.
However, due to its computational intensiveness and inferior performance, MethMAGE was not included in this set of simulations. Our findings led to similar conclusions as those based on the more carefully selected regions. That is, TCQ is the method that can best balance type I error and power,
and is recommended across all sparsity settings. Details are provided in Supplementary Material S5 and Supplementary Figures S12-S17. Testing DMR calling methods on actual biological data is hampered by the absence of a definitive test data set in which the true DMRs are known. Nevertheless, it is a promising sign that TCQ, when applied to the ovarian data set, identified DMRs whose corresponding genes are significantly enriched in an ovarian-specific pathway, while the other methods identified only more generic cancer-related pathways. This suggests that the lower type I error rate and higher power of TCQ relative to the other tests on simulated data carry over to real data, yielding DMRs that are more likely to be biologically relevant for the system under study.
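The recommendation to inspect sparsity first, together with region selection criteria (1) and (2) described in this section, can be sketched as below. This is an illustrative reading and not the authors' procedure: sparsity is taken as the per-site fraction of samples with zero signal, criterion (2) is evaluated on the two groups pooled, and the function names and toy data are hypothetical.

```python
import numpy as np

def sparsity_profile(X):
    """Per-site fraction of samples with zero signal (rows = samples)."""
    return np.mean(np.asarray(X) == 0, axis=0)

def sparsity_gap(X, Y):
    """Largest per-site difference between the two groups' sparsity profiles."""
    return float(np.max(np.abs(sparsity_profile(X) - sparsity_profile(Y))))

def passes_selection(X, Y, min_sites=50, max_sites=200, min_nonzero_frac=0.4):
    """Criteria (1) and (2): site count strictly between the bounds, and enough
    sites carrying a non-zero signal in at least one individual."""
    pooled = np.vstack([X, Y])
    k = pooled.shape[1]
    nonzero_frac = np.mean(np.any(pooled != 0, axis=0))
    return bool((min_sites < k < max_sites) and (nonzero_frac >= min_nonzero_frac))

# Toy region with 60 sites: group 1 has signal at 30 sites, group 2 at 15.
rng = np.random.default_rng(1)
X_toy = np.zeros((6, 60))
X_toy[:, :30] = rng.uniform(0.1, 1.0, size=(6, 30))
Y_toy = np.zeros((4, 60))
Y_toy[:, :15] = rng.uniform(0.1, 1.0, size=(4, 15))
gap = sparsity_gap(X_toy, Y_toy)
selected = passes_selection(X_toy, Y_toy)
```

A large gap between the two profiles is the situation in which, according to the simulations, TPA and TSKK are least reliable; what counts as large is left to the analyst here.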
Software Availability
MethylCap-Sig, the software that implements the five methods studied in this paper, is freely available for download at http://www.stat.osu.edu/~statgen/SOFTWARE/MethylCap-Sig/ or from CRAN at https://cran.r-project.org/web/packages/MethylCapSig/index.html.
Acknowledgement
This work was supported in part by grants from the National Science Foundation DMS1220772 and the National Institutes of Health 1R01GM114142-01.
References
[1] Frommer, M., McDonald, L. E., Millar, D. S. et al. (1992) A genomic sequencing protocol that yields a positive display of 5-methylcytosine residues in individual DNA strands, Proc. Natl. Acad. Sci., 89, 1827–1831.
[2] Lister, R., Pelizzola, M., Dowen, R. H. et al. (2009) Human DNA methylomes at base resolution show widespread epigenomic differences, Nature, 462, 315–322.
[3] Meissner, A., Gnirke, A., Bell, G. W. et al. (2005) Reduced representation bisulfite sequencing for comparative high-resolution DNA methylation analysis, Nucleic Acids Research, 33(18), 5868–5877.
[4] Taiwo, O., Wilson, G. A., Morris, T. et al. (2012) Methylome analysis using MeDIP-seq with low DNA concentrations, Nature Protocols, 7, 617–636.
[5] Frankhouser, D. E., Murphy, M., Blachly, J. S. et al. (2014) PrEMeR-CG: inferring nucleotide level DNA methylation values from MethylCap-seq data, Bioinformatics, 30(24), 3567–3574.
[6] Eckhardt, F., Lewin, J., Cortese, R. et al. (2006) DNA methylation profiling of human chromosomes 6, 20 and 22, Nature Genetics, 38, 1378–1385.
[7] Kuan, P. F. and Chiang, D. Y. (2012) Integrating prior knowledge in multiple testing under dependence with applications to detecting differential DNA methylation, Biometrics, 68.
[8] Weber, M., Davies, J. J., Wittig, D. et al. (2005) Chromosome-wide and promoter-specific analyses identify sites of differential DNA methylation in normal and transformed human cells, Nature Genetics, 37(8).
[9] Chavez, L., Lienhard, M. and Dietrich, J. (2013) MEDIPS: (MeD)IP-seq data analysis. R package version 1.16.0.
[10] Bock, C., Tomazou, E. M., Brinkman, A. B. et al. (2010) Genome-wide mapping of DNA methylation: a quantitative technology comparison, Nature Biotechnology, 28, 1106–1114.
[11] Carvalho, R. H., Haberle, V., Hou, J. et al. (2012) Genome-wide DNA methylation profiling of non-small cell lung carcinomas, Epigenetics & Chromatin, 5(9).
[12] Carvalho, R. H., Hou, J., Haberle, V. et al. (2013) Genomewide DNA methylation analysis identifies novel methylated genes in non-small-cell lung carcinomas, Journal of Thoracic Oncology, 8(5), 562–573.
[13] Zhao, Y., Guo, S., Sun, J. et al. (2012) Methylcap-Seq reveals novel DNA methylation markers for the diagnosis and recurrence prediction of bladder cancer in a Chinese population, PLoS ONE, 7(4).
[14] Lendvai, A., Johannes, F., Grimm, C. et al. (2012) Genome-wide methylation profiling identifies hypermethylated biomarkers in high-grade cervical intraepithelial neoplasia, Epigenetics, 7(11), 1268–1278.
[15] Feng, H., Conneely, K. N., Wu, H. (2014) A Bayesian hierarchical model to detect differentially methylated loci from single nucleotide resolution sequencing data, Nucleic Acids Research, 42, e69.
[16] Hansen, K. D., Langmead, B., Irizarry, R. A. (2012) BSmooth: from whole genome bisulfite sequencing reads to differentially methylated regions, Genome Biology, 13, R83.
[17] Park, Y., Figueroa, M. E., Rozek, L. S. et al. (2014) methylSig: a whole genome DNA methylation analysis pipeline, Bioinformatics, 30, 2414–2422.
[18] Yan, P., Frankhouser, D., Murphy, M. et al. (2012) Genome-wide methylation profiling in decitabine-treated patients with acute myeloid leukemia, Blood, 120(12).
[19] Chen, S. X. and Qin, Y. (2010) A two-sample test for high-dimensional data with applications to gene-set testing, Annals of Statistics, 38, 808–835.
[20] Park, J. and Ayyala, D. N. (2013) A test for the mean vector in large dimension and small samples, Journal of Statistical Planning and Inference, 143, 929–943.
[21] Srivastava, M. S., Katayama, S. and Kano, Y. (2013) A two sample test in high dimensional data, Journal of Multivariate Analysis, 114, 349–358.
[22] Bai, Z. and Saranadasa, H. (1996) Effect of high dimension: By an example of a two sample problem, Statistica Sinica, 6.
[23] Dempster, A. P. (1958) A high dimensional two sample significance test, Annals of Mathematical Statistics, 29(4).
[24] Jaffe, A. E., Feinberg, A. P., Irizarry, R. A. et al. (2012) Significance analysis and statistical dissection of variably methylated regions, Biostatistics, 13(1).
[25] Huang, R., Gu, F., Kirma, N. B. et al. (2013) Comprehensive methylome analysis of ovarian tumors reveals hedgehog signaling pathway regulators as prognostic DNA methylation biomarkers, Epigenetics, 8(6).
Figure Legends
Figure 1: ROC curves of the five test statistics constructed for the four setups under Model I. The data were generated with parameters modeled on the AML data set.
Figure 2: ROC curves of the five test statistics constructed for the four setups under Model I. The data were generated with parameters modeled on the ovarian data set.
Figure 3: Venn diagram showing the number of regions detected to be differentially methylated by TCQ, TSKK, t-test and MethMAGE.
Figure 4: Top-10 canonical pathways produced from the lists of genes that are detected to be significantly differentially methylated in the AML cancer data by the three methods: TCQ, TSKK and MethMAGE.
Figure 5: Venn diagrams showing the number of regions detected to be differentially methylated by TCQ, TPA, TSKK and the t-test for the ovarian cancer data. The three plots from left to right correspond to data separated based on diagnosis, recurrence, and normalcy status, respectively.
Figure 6: Top-10 canonical pathways produced from the lists of genes detected to be significantly differentially methylated in the ovarian cancer data by the three methods: TCQ, TSKK, and TPA, respectively.
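Table 1: Parameter specification for the four setups studied under Models I and II.
Model  Setup  R_X     D_ΣX    π_X      R_Y     D_ΣY    π_Y
I      1      R^I_X   D_Σ     π_X      R^I_Y   D_Σ     π_Y
I      2      R^I_X   D_Σ     0.5π_X   R^I_Y   D_Σ     0.5π_Y
I      3      R^I_X   D_ΣX    π_X      R^I_Y   D_ΣY    π_Y
I      4      R^I_X   D_ΣX    0.5π_X   R^I_Y   D_ΣY    0.5π_Y
II     1      R^G_X   D_Σ     π_X      R^G_Y   D_Σ     π_Y
II     2      R^G_X   D_Σ     0.5π_X   R^G_Y   D_Σ     0.5π_Y
II     3      R^G_X   D_ΣX    π_X      R^G_Y   D_ΣY    π_Y
II     4      R^G_X   D_ΣX    0.5π_X   R^G_Y   D_ΣY    0.5π_Y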
Table 2: Type I error and power (in percentage) calculated at the 5% nominal significance level from data generated using the AML data set. The four setups studied under the two models are as described in Table 1.

Type I error
Model  Setup  TCQ    TPA    TSKK   t-test  MethMAGE
I      1      9.03   55.16  94.6   74.23   31.18
I      2      7.09   20.31  65.83  69.09   31.05
I      3      9.05   53.47  89.13  74.26   31.15
I      4      7.15   21.46  55.5   69.22   31.11
II     1      9.11   54.84  94.4   74.5    31.1
II     2      7      20.55  64.45  68.99   31.07
II     3      9.22   53.44  88.39  74.68   31.1
II     4      7.75   21.99  55.46  69.21   31.06

Power
Model  Setup  TCQ    TPA    TSKK   t-test  MethMAGE
I      1      98.37  87.05  100    79.99   60.19
I      2      89.21  47.68  95.6   79.78   62.24
I      3      97.87  78.43  100    80      60.37
I      4      87.94  45.95  96.1   79.72   62.57
II     1      98.73  87.75  99.99  80      60.03
II     2      89.54  48.48  96     79.73   62.36
II     3      98.2   78.47  100    79.99   60.48
II     4      88.3   46.34  96.51  79.8    62.59
Table 3: Sensitivity of the tests calculated when the false positive rate is controlled at the 5% actual significance level for data simulated based on the AML data set.
Model  Setup  TCQ    TPA    TSKK   t-test  MethMAGE
I      1      97.25  49.63  7.3    12.49   21.11
I      2      87.93  33.64  24.51  18.01   20.9
I      3      96.87  46.44  10.3   12.38   20.55
I      4      86.38  33.34  34.41  17.21   20.59
II     1      97.86  51.46  7.43   12.48   20.88
II     2      88.23  33.23  26.32  17.73   20.79
II     3      97.08  46.55  10.6   12.49   20.13
II     4      86.59  34.11  34.96  17.58   20.68
Table 4: Type I error and power (in percentage) calculated at the 5% nominal significance level from data generated using the ovarian cancer data set. The four setups studied under the two models are as described in Table 1.

Type I error
Model  Setup  TCQ    TPA    TSKK   t-test  MethMAGE
I      1      6.09   3.39   24.9   73.59   50.99
I      2      6.15   3.3    18.24  72.71   46.13
I      3      6.72   3.38   16.46  73.26   50.82
I      4      5.97   3.53   13.05  72.49   45.75
II     1      5.76   3.05   23.8   73.19   50.27
II     2      6.15   3.23   18.54  72.61   46.03
II     3      6.84   3.41   14.25  73.39   51.01
II     4      6.24   3.54   13.28  72.19   45.6

Power
Model  Setup  TCQ    TPA    TSKK   t-test  MethMAGE
I      1      90.37  74.45  96.24  74.12   43.18
I      2      65.28  37.22  72.42  61.7    39.42
I      3      90.35  74.37  95.9   73.57   43.18
I      4      65.63  35.51  79.28  61.93   39.37
II     1      90.12  74.3   96.12  73.64   43.73
II     2      65.22  37.38  72.28  61.91   39.49
II     3      90.03  74.91  95.8   73.82   43.44
II     4      66.63  35.49  78.48  62.55   39.53
Table 5: Sensitivity of the tests calculated when the false positive rate is controlled at the 5% actual significance level for data simulated based on the ovarian cancer data set.
Model  Setup  TCQ    TPA    TSKK   t-test  MethMAGE
I      1      90.05  76.55  90.65  3.12    3.36
I      2      63.56  42.08  45.89  2.32    4.3
I      3      89.99  76.53  92.75  3.21    3.29
I      4      64.35  39.31  65.07  2.47    4.39
II     1      89.78  76.67  90.33  3.1     3.42
II     2      63.47  41.28  45.34  2.29    4.28
II     3      89.54  76.65  92.95  3.15    3.3
II     4      65.28  39.29  64.38  2.49    4.3
Table 6: Average run time of the five methods (in milliseconds) based on 100 randomly simulated data sets. The columns correspond to the scaling factor C, so that the sample sizes used are (n1, n2) = (7C, 3C); k is the number of CG sites in the region.
Test      k    C=1     C=2     C=3      C=4      C=5      C=6      C=7
MethMAGE  49   125.4   256.9   386.8    493      624.3    742      862
MethMAGE  83   467.5   985.7   1420.5   2000.9   2486.4   2996.5   3582
MethMAGE  132  865.4   2182.9  4633.3   6085     7309.1   7769.2   11558.7
MethMAGE  176  6085.1  7286.3  10307.3  11961.4  18721.5  21648.7  25313.3
TCQ       49   2.0     9.4     23.6     49.4     86.9     129.1    189.2
TCQ       83   2.3     11.4    30.4     64.2     112.2    172      263.3
TCQ       132  2.8     15.3    39.2     87.3     149.6    237.9    361.3
TCQ       176  3.0     17.2    48.5     101.8    182.6    301.4    460.2
TPA       49   NA*     64.8    167      331      569.6    892.2    1294.6
TPA       83   NA      105.5   295.4    602.2    1052.2   1673.2   2496
TPA       132  NA      198.5   553.5    1153.8   2053.2   3326.7   5032.7
TPA       176  NA      320.5   888.2    1870     3353.5   5449.4   8302.5
TSKK      49   0.2     0.9     0.8      0.8      1        0.9      0.8
TSKK      83   2.0     2.2     2.6      2.5      2.5      2.7      2.9
TSKK      132  2.3     3.1     3.5      3.4      3.6      3.6      3.4
TSKK      176  1.9     2.4     2.2      2.6      2.4      2.5      2.6
t-test    49   0.6     0.7     0.2      0.6      0.6      0.3      0.2
t-test    83   0.8     0.3     0.5      0.7      0.4      0.5      0.7
t-test    132  0.5     0.3     0.6      0.6      0.4      0.5      0.5
t-test    176  0.4     0.6     0.6      0.6      0.5      0.8      0.3
* Note that TPA is not applicable when either group has fewer than four samples.
[Figure 1 image: four ROC panels, (a) Setup 1, (b) Setup 2, (c) Setup 3, (d) Setup 4. Horizontal axis: 1 − Specificity; vertical axis: Sensitivity; both run from 0 to 100. Curves: TCQ, TPA, TSKK, t-test, MethMAGE.]
Figure 1: ROC curves of the five test statistics constructed for the four setups under Model I. The data were generated with parameters modeled on the AML data set.
[Figure 2 image: four ROC panels, (a) Setup 1, (b) Setup 2, (c) Setup 3, (d) Setup 4. Horizontal axis: 1 − Specificity; vertical axis: Sensitivity; both run from 0 to 100. Curves: TCQ, TPA, TSKK, t-test, MethMAGE.]
Figure 2: ROC curves of the five test statistics constructed for the four setups under Model I. The data were generated with parameters modeled on the ovarian data set.
Figure 3: Venn diagram showing the number of regions detected to be differentially methylated by TCQ, TSKK, t-test and MethMAGE.
[Figure 4 image: three bar-chart panels, (a) TCQ, (b) TSKK, (c) MethMAGE, each listing ten canonical pathways against −log10(p-values). Legend: AML cancer related, Non-AML cancer related, Significance level.]
Figure 4: Top-10 canonical pathways produced from the lists of genes that are detected to be significantly differentially methylated in the AML cancer data by the three methods: TCQ, TSKK and MethMAGE.
[Figure 5 image: three Venn diagram panels, (a) Diagnosis, (b) Recurrence, (c) Normalcy.]
Figure 5: Venn diagrams showing the number of regions detected to be differentially methylated by TCQ, TPA, TSKK and the t-test for the ovarian cancer data. The three plots from left to right correspond to data separated based on diagnosis, recurrence, and normalcy status, respectively.
[Figure 6 image: three bar-chart panels, (a) TCQ, (b) TSKK, (c) TPA, each listing ten canonical pathways against −log10(p-values). Legend: Ovarian cancer related, Non-ovarian cancer related, Significance level.]
Figure 6: Top-10 canonical pathways produced from the lists of genes detected to be significantly differentially methylated in the ovarian cancer data by the three methods: TCQ, TSKK, and TPA, respectively.