Supplementary Information In silico identification of ... - Nature

0 downloads 0 Views 1MB Size Report
Supplementary Information. In silico identification of enhancers on the basis of a combination of transcription factor binding motif occurrences. Yaping Fang. 1,2.
Supplementary Information

In silico identification of enhancers on the basis of a combination of transcription factor binding motif occurrences Yaping Fang1,2*, Yunlong Wang1,2, Qin Zhu1,2, Jia Wang1,2, Guoliang Li1,2,3*

1

3

Agricultural Bioinformatics Key Laboratory of Hubei Province, 2College of Informatics,

National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China.

*To whom correspondence should be addressed.

Email: [email protected], [email protected]

Supplementary Figures Supplementary Figures S1 Supplementary Figures S2 Supplementary Tables Supplementary Table S1 Supplementary Dataset Supplementary Dataset S1

H3K4me3.E14 M01098 M01275 M01247 M00047 M01162 M01269 M00253 M00311 M01124 M00353 M01228 M00684 M01084 M00394 M00380 M00770 M01534 M01516 M01095 M01412 M01444 M01360 M01445 M01614 M01395 M01211 M01419 M01631 M01553 M01341 M01407 M00436 M01607 M01439 M01479 M00689 M01329 M01625 M00640 M01297 M01626 M01532 M01483 M01504 M01332 M01500 M01527 M00007 M01804 M01803 M00711 M01802 M00020 M01640 M01621 M00205 M01125 M00729 M01509 M00725 total Histones M01563 M01183 H3K27ac.E14 M00672 H3K27ac.E0 H3K4me1.E14 H3K4me1.E0

Featu ures

Featu ures

M01131 M00795 M01009 M00378 M00255 M01756 M01700 M00744 M01161 M01162 M01091 M00029 M01045 M00644 M01682 M01594 M01094 M01130 M00028 M01238 M00396 M01520 M01032 M01723 M00207 M00101 M01200 M01800 M00395 M01588 M00100 M01531 M00355 M00142 M01706 M01292 M01316 M00803 M01566 M01360 M01529 M01473 M01098 M01470 M01324 M01553 M01504 M01323 M01477 M01528 M00253 M01230 M00374 M01412 M00438 M01483 M01532 M01527 M00202 M01269 M00057 M00394 M01228 M01631 M01632 M01643 M01627 M01395 M00047 M01341 M01647 M00197 M00012 M01095 M01445 M00353 M01638 M01629 M00684 M01329 M01651 M01628 M01554 M01624 M00766 M00991 M00013 M00311 M00267 M01773 M01635 M00094 M01560 M01247 M01439 M01124 M01332 M01559 M01633 M01644 M01617 M00380 M01479 M01436 M01608 M01488 M01407 M01551 M01646 M01607 M00770 M01634 M01648 M01640 M01419 M01211 M00436 M01084 M01509 M01649 M00007 M01518 M00689 M00640 M01541 M00020 M01626 M00711 M01803 M01625 M00205 M01500 M01802 M01804 M00729 M01563 M01125 M01621 M00725 M01183 M00672

A

0

0.005

0.01

0.015

0.02

0.025

0.03

0.035

Feature importance

0.04

0.045

0.05

 

B

0

0.01

0.02

0.03

0.04

0.05

0.06

0.07

0.08

Feature i mportance

Figure S1. Importance of selected features. Figure S1A represents the importance of the selected features in the model incorporating TF binding motif occurrence features in Table 3. Figure S1B represents the importance of the selected features in the model incorporating the feature group TF binding motif occurrence and the chromatin state in Table 3. The prefix M represents Position Weight Matrix (PWM).

True Positive Rate

A

gapped k−mer k−mer, brain k−mer, limb

False Positive Rate

Precision

B

gapped k−mer k−mer, brain k−mer, limb

Recall Figure S2. The receiver operator characteristic (ROC) and precision-recall curves for different datasets. Figure S2A represents the ROC curves for different datasets. Figure S2B shows the precision-recall curves for different datasets.The value of AUC is 0.8742 for the mouse embryo brain dataset (green dash line), 0.8738 for the mouse limb dataset (blue dash line) and 0.8643 for the binding sites of P300 protein (red line).