May 2, 2013 - [8] Nicola J Mulder, Rolf Apweiler, Terri K Attwood, Amos Bairoch, Alex. Bateman, David Binns, Margaret Biswas, Paul Bradley, Peer Bork, ...
Boosting Probabilistic Graphical Model Inference by Incorporating Prior Knowledge from Multiple Sources (Supplement) Paurush Praveen, Holger Fr¨ohlich May 2, 2013
1 Knowledge sources The approaches presented here primary consider the following sources of biological knowledge, however new sources can be added if available. Protein-Protein Interactions Protein-protein interaction (PPI) data can be instrumental to understand biology at a system-wide level. They present the current knowledge about pairs of proteins that interact in living system and hence can be an important source of information for our approach. PPIs have traditionally been measured using a variety of assays, such as immunoprecipitation and yeast two-hybrid (Y2-H). Such knowledge resides in various databases, like IntAct, HPRD etc. The entire set of known interaction referred as interactome can be structured as a graph. We here use interaction data from the PathwaysCommons database [1]. To compute a confidence value for each interaction between a pair of genes/proteins we can work at the level of this graph in different ways: One way is to look at the shortest path distance between the two entities. Another way is to employ diffusion kernels [6]. To calculate the shortest path distance between two nodes the function sp.between function based on Dijkstra’s algorithm is used from R-package RBGL. The edge confidence is then computed as the inverse shortest path distance. The diffusion kernel was computed using the R-package pathClass [5] for the entire PathwayCommons graph. However, we observed that the use of the diffusion kernel in place of the shortest path measure did not affect the results much (Figure 1). For all the results in the main document we thus used the inverse shortest path distance as a measure of confidence.
1
Gene Ontology The Gene Ontology (GO) has been developed to offer controlled vocabularies for aiding the annotation of molecular attributes for different organisms. Predicting the map of potential physical interactions between proteins by fully exploring the knowledge buried in GO annotations seems a promising approach, because interacting proteins often function in the same biological process. This implies that two proteins acting in the same biological process are more likely to interact than two proteins involved in different processes. Here we use this information based on GO Biological Process (BP) annotations. Comparison of individual GO terms was performed via Lin’s similarity measure [7]. Based on this gene products were compared via the default method in GOSim [3], which resembles the similarity measure by Schlicker et.al.[10]. Protein Domain Annotation Hahne et.al [4] and Fr¨ohlich et.al. [2] found that proteins in distinct KEGG pathways are enriched for certain protein domains, i.e. proteins with similar domains are more likely to act in similar biological pathways. The confidence of interaction between two proteins can thus be seen as a function of the similarity of the domain annotations of proteins. Protein domain annotation can be retrieved from the Inter-Pro database [8]. For each protein we constructed a binary vector, where each component represents one Inter-Pro domain. A “1” in a component thus indicates that the protein is annotated with the corresponding domain. Otherwise a “0” is filled in. The similarity between two binary vectors u, v (domain signatures) is presented in terms of the cosine similarity Sdomain =
hu, vi kukkvk
(1)
Domain-Domain Interactions Two proteins are more likely to interact if they contain domains, which can potentially interact. The DOMINE database collates known and predicted domain–domain interactions [9]. Calculation for edge confidence (IAB ) based on the DOMINE database is done as IAB =
H DA .DB
(2)
where H is the number of hit pairs found in the DOMINE database and DA and DB are the the number of domains in proteins A and B, respectively.
2
Figure 1: Optimal balanced accuracies achieved for reconstructing KEGG subgraphs (m=20) using graph diffusion kernel (left) and inverse of pairwise shortest path-lengths (right)
3
−5000 −6000 −7000 −9000
−8000
log likelihood
0
1000
2000
3000
4000
5000
Iterations (x100)
Figure 2: Plot showing the convergence of the MCMC sampler in terms of log likelihoods.
4
●
A2B3 ●● A2B3 A2B3 ●● A2B3
0.6 0.6
A2B4 A2B4 A3B2 A3B2
A2B4 A2B4 0.6 A3B2 0.6 A3B2
A3B3 A3B3
A3B3 A3B3 A3B4 A3B4
A3B4 A3B4 A4B2 A4B2
0.3 0.3
0.3 0.3
0.0 0.0
0.0 ● ●● ● ●●● ●●● ●●● ●●● ● ●● ●●● ●●● 0.0●● ●● ●● ●●● ●
A4B4 A4B4
0.9
A4B2 A4B2 0.3 A4B4 0.3 A4B4
●
0.0
0.0 0.0
0.0 0.0
0.00
0.50 0.25
0.75 0.50
Probability Probability
1.00 0.75
0.00 0.00
1.00
A2B2
A2B2
● A2B3
● A2B3
A2B4
0.6
0.3
A3B2
A2B4 0.6 A3B2
A3B3
A3B3
A3B4
A3B4
A4B2
A4B2 0.3 A4B4
A4B4
0.25 0.50 0.25 0.50
0.50 0.75 0.50 0.75
Probability Probability Probability Probability
Parameter
●
A2B2
A2B2
● A2B3
● A2B3
A2B4
0.6
A3B2 A3B3
0.3
A2B4 0.6 0.6 A3B2 A3B3
A3B4
A3B4
A4B2
0.3 A4B2 0.3 A4B4
A4B4
Probability Probability
NOM
1.00 0.75
0.00
1.00
0.0 0.0
0.0 0.25 0.00
0.50 0.25
0.75 0.50
Probability Probability
NOM
NOM
1.00 0.75
0.3
●
●
0.6 0.6
●
●
●
● ●
0.6
0.3
A3B2
A2B4 0.6 A3B2
A3B3
A3B3
A3B4
A3B4
A4B2
A4B2 0.3 A4B4
A4B4
● Parameter ●● A2B3
●
● A2B3
A2B4
0.6
A3B2 A3B3
●
●
0.3
●
0.0
● A2B3
0.50 0.25
0.75 0.50
Probability Probability
1.00 0.75
0.0
●
0.25 0.00
IP.RNK IP.RNK
● A2B3 A2B4
0.6
●
● ●
0.3
●
0.3
●
0.25 0.00
0.50 0.25
0.75 0.50
Probability Probability
1.00 0.75
A2B4
●●
A3B4
A3B4
A3B4
A3B4
A4B2
A4B2 0.3 A4B4
A4B2
A4B2 0.3 0.50 A4B4
0.6
A3B2 A3B3
0.3
A4B4
●●
0.6
A3B2
● ● 0.0
A3B3
A3B3
A3B4
A3B4
0.25 0.00
A4B4
0.50 0.25
0.75 0.50
Probability Probability
1.00 0.75
Parameter Parameter A2B2
A3B2
A3B2
A3B3
A3B3
0.75 1.00
A2B2
● A2B3 ● A2B3 A2B4
A3B2
A3B2
A3B3
A3B3
A3B4
A3B4
A3B4
A4B2
A4B2
A4B4
A4B4
0.6
0.6
0.3
0.3
● 0.0
0.0
1.00
0.00 0.25
0.00
0.25 0.50
0.50 0.75
0.75 1.00
ProbabilityProbability
NOM.RNK NOM.RNK
NOM.RNK NOM.RNK
1.00
A2B2
0.6
0.3
A3B3
● Parameter Parameter A2B2
A2B2
A2B4
A2B4
A3B2
A3B2
A3B3
A3B3
A3B4
A3B4
A4B2
A4B2
A4B4
A4B4
A2B2
● A2B3 ● A2B3 0.6
0.6
A2B4
A2B4
A3B2
A3B2
Method A3B3
A3B3
A3B4IP
A3B4
IP.RNK
0.3
A4B2
A4B2
●A4B4MP
A4B4
LFM
0.3
●
●
NOM
●● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ●
●
1.00 0.75
1.00
0.0 0.25 0.00
0.50 0.25
0.75 0.50
0.0
NOM.RNK
0.0
0.00
0.25 0.00
ProbabilityProbability
0.50 0.25
0.75 0.50
1.00 0.75
1.00
ProbabilityProbability
LFM
LFM
LFM
● ●● ●● ●● ●● ●● ●●● ● ● ●●● ●● ● ●● ● 0.00
0.9 ● 0.9 Parameter
A2B2
A3B2
0.9
●
● A2B3
0.6
● 0.9
● A2B3 ● A2B3
A2B4 0.6 A3B2 A3B3
●
A3B4
● ● A2B2
Parameter Parameter A2B3
●
A2B4
0.9
0.9
A2B2 A2B2 A3B3 A3B2 Size of network ● A2B3 ● A2B3
Parameter Parameter A3B4
A4B2
A4B4
0.6
●
A2B4
● ●
● ●
●
A2B4
0.6
A2B2
A2B2
● A2B3 ● A2B3
Parameter sets
0.3
0.0
0.0
0.00
0.25 0.00
0.50 0.25
0.75 0.50
Probability Probability
1.00 0.75
0.6
A2B4
A2B4
A3B2
A3B2
A3B3
A3B3
A3B2
A3B2
A3B3
A3B3
A3B4
A3B4
A3B4
0.3
● 0.50 0.25
0.75 0.50
Probability Probability
LFM
LFM
1.00 0.75
Parameter
A2B2
A2B2
● A2B3
● A2B3
A2B4
A2B4
A3B2
A3B2
A3B3
A3B3
A3B2
A2B4 0.6 A3B2
A3B3
A3B3
A3B4
A3B4
A3B4
A3B4
A4B2
A4B2 0.3 A4B4
A4B2
A4B2
A4B4 ●
A4B4
A4B4
0.6
0.3
●
●● ● ●●●● ●● ●● ●● ●● ● ● ●
● 1.00
0.0 0.00
0.3
●
A3B4
A4B2
A4B2
A4B2
●
●
●
●
●
0.0 0.25 0.00
0.50 0.25
0.75 0.50
Probability Probability
LFM
LFM
1.00 0.75
1.00
●
A4B2
A4B4
A4B4
A4B4
●●
●
A4B4
5
0.25 0.00
0.50 0.25
0.75 0.50
ProbabilityProbability
1.00 0.75
0.3
0.3
0.0
0.0
●●
0.0
0.00
● Parameter
● A2B3 Specificity
0.6
0.9
Specificity
A2B4
0.25 0.00
1.00
●
0.9 Parameter Parameter A2B2 A2B2
0.0
A4B2 0.3 A4B4
●
●
0.0
1.00
A4B4
A3B4
NOM.RNKNOM.RNK
0.9
Sensitivity
Sensitivity
● ● ●●● ● ● ●● ●● ●● ●● ●● ●●● ●
0.9
MP.RNK
A2B2 A2B4
●
0.00
A3B3
● A2B3
A4B2
A4B2 0.3 A4B4
● ●● ●● ●● ●● ●●●● ● ● ●●●
● A2B3
0.0
0.50 0.75
LFM
●● ● ●●●● ●● ●● ●● ●● ●●●● ● ●● 0.9
0.3
●
0.9
1.00 1.00
A2B4
Parameter Parameter
0.25
Parameter
NOM.RNKNOM.RNK
0.6
A2B4 0.6 A3B2
0.00
1.00
● ● ● ●● ●● ●● ●● ●●●● ● ● ●●● ●
●
0.00
Probability Probability
1.00 0.75
● A2B3 A2B4 0.6 A3B2
0.75 1.00
Figure 3: Sensitivity and specificity plots for different set of α and β parameters for artificially generated information source data. The plot shows the behavior of different kinds of priors (top) and the corresponding optimal balanced accuracies (bottom). The number of nodes for all the networks here were 20 (m=20), and 6 information sources were used. 0.3
0.0
0.75 0.50
0.9
Specificity
Sensitivity
Sensitivity
A2B4
A4B2
0.3
●
0.9 Parameter Parameter A2B2 A2B2
● A2B3 0.6
0.25 0.50
MP.RNK MP.RNK
0.9
0.50 0.75
ProbabilityProbability
0.9
0.0
0.0
Specificity
0.9
A2B4
● ● ●●● ● ●● ●● ●● ●● ●● ●●●
0.00 0.25
A2B2 0.75 ● A2B3
A3B3
MP.RNK MP.RNK
0.25 0.50
●● ● ●● ● ●● ●● ●● ●● ●● ●● ● ● ●●
A3B3
1.00
0.3
0.9 Parameter
A3B2
0.50 0.25
0.6
A2B4
A4B4
Parameter
0.25 0.00
1.00 1.00
A2B2
A4B4
A2B2
0.00
Parameter Parameter
A4B2
● A2B3
0.0
●●
0.0●
A4B4 A4B4
1.00
● A2B3
● ● ● ●●● ● ● ●●●
0.0 0.0 0.00 0.25 0.00 0.25
MP.RNKMP.RNK
A4B2
●
0.0
0.00
0.3 0.3
MP.RNKMP.RNK
A2B2
0.00
A4B2 A4B2
A3B4 A3B4
Parameter Parameter Parameter Parameter A2B2 A2B2 A2B2 A2B2 A2B3 ● A2B3 ● ● A2B3 A2B3 ● A2B4 A2B4 A2B4 A2B4 A3B2 A3B2 A3B2 A3B2 A3B3 A3B3 A3B3 A3B3 A3B4 A3B4 A3B4 A3B4 A4B2 A4B2 A4B2 A4B2 A4B4 A4B4 A4B4 A4B4
0.25 0.50 0.75 0.50 Probability 0.75 1.00 Probability ProbabilityProbability
A4B2 0.3 A4B4
A2B4 0.6 A3B2
A4B4
●
0.0
Probability Probability
0.75 1.00 0.75 1.00
ProbabilityProbability
A3B4
1.00
0.50 0.75 0.50 0.75
0.6 0.6
●
1.00 0.75
0.25 0.50 0.25 0.50
Probability Probability Probability Probability
0.9 0.9
0.00 0.00
A4B2
Sensitivity
●
0.00 0.25 0.00 0.25
● A2B3 ● A2B3
0.0
0.75 0.50
A3B3 A3B3
A4B2 A4B2 A4B4 ● A4B4
0.0 0.0
0.0 0.0
1.00 1.00
●
0.9
Specificity
Sensitivity
Sensitivity
0.9 Parameter Parameter A2B2 A2B2
●
● 0.6
A3B3
● ● ●●● ● ● ● ●● ●● ●● ●● ●● ●● ●●●●
● ● 0.9
A2B4 0.6 A3B2
IP.RNK IP.RNK
Specificity
● 0.9
0.50 0.25
● ● ● ●●● ● ● ●●●
A3B4
0.0
0.00
1.00
● ● ● ● ● ● ● ● ● ● ●● ●● ●● ● ●● ●● ●● ●● ●● ●●● ● ●●
A3B3 A3B3 A3B4 A3B4 A4B2 A4B2 0.3 A4B4 0.3 A4B4
A3B4
Sensitivity
0.25 0.00
●
0.25 0.50 0.75 0.50 0.75 1.00 0.25 0.50 0.75 0.50 0.75 1.00 Probability Probability
A2B2
● ● ●● ● ●●●● ●● ● ● ● 0.00
0.3 0.3
0.00 0.00
● ● ● ● ● ● ● ● ● ●● ●● ●
0.9
0.9 Parameter
A4B4
●
0.0
0.3 0.3 A4B4 A4B4
A3B2 A3B2
A3B3 A3B3 A3B4 A3B4
●
●
0.0 0.0
A3B3 A3B3 A3B4 A3B4 A4B2 A4B2 A4B4 A4B4
●
●
0.0 0.0 0.00 0.25 0.00 0.25
●●
●
●
A2B2
Specificity
A2B4
0.9
Specificity
Sensitivity
Sensitivity
● A2B3 0.6
1.00
●
0.9 Parameter Parameter A2B2 A2B2
0.9
A3B4 A3B4
0.9 Parameter 0.9 Parameter Parameter Parameter A2B2 A2B2 A2B2 A2B2 A2B3 ● A2B3 ● A2B3 ● A2B3 ● A2B4 A2B4 A2B4 A2B4 0.6 A3B2 0.6 A3B2 A3B2 A3B2
● ●
NOM
●● ● ●●●● ●● ●● ●● ●● ●● ●● ● ● ● 0.9
0.00 0.00
Sensitivity
0.75 0.50
A4B2 A4B2
A2B4 A2B4
A3B2 A3B2
●● ●● ●●● ●● ● ●● ● ●● ● ●● ●●● ●●● ●● ● ●● ●●● ●●● ●●● ● ●● ● ●
0.3 0.3
Sensitivity
0.50 0.25
A3B3 A3B3
A2B4
0.6 0.6
0.6 0.6 A3B2 A3B2
LFM LFM IP.RNK IP.RNK
●
● ●
Sensitivity
0.25 0.00
0.0
Sensitivity
0.00
● ●● ●● ●● ●● ●●●● ● ● ●●●
Optimal Balanced Accuracy
● ● 0.0
●● 1.00 1.00
●
●●
0.9 0.9 0.9 ●●0.9 Parameter
●
0.0
●● 0.75 1.00 0.75 1.00
A2B4 A2B4
LFM LFM IP.RNK IP.RNK
0.9
●
0.00 0.25 0.00 0.25
MP
● ● ● ●● ●● ●● ●● ●● ●●●● ● ● ●●●
Specificity
Sensitivity
0.3
0.25 0.00
Specificity Specificity
0.0
A4B4
●●
A2B3 ●● A2B3 ●●A2B3
●
Specificity Specificity
0.3 0.3
A4B2 A4B2 A4B4 A4B4
MP
●
Sensitivity Sensitivity
0.3
A4B2 0.3 0.3 A4B4
0.9 Parameter Parameter
●
Sensitivity Sensitivity
Specificity
A3B4
A4B2
MP
0.9
A3B2 A3B2
A3B4
●
0.9
A2B4 A2B4
A4B2 0.3 A4B4
A3B3
1.00 1.00
● Parameter Parameter Parameter A2B2 A2B2 A2B2
●
A2B3 ●● A2B3 A2B3 ●● A2B3
0.6
A3B4
1.00
0.75 1.00 1.00 0.75
Specificity Specificity
MP
A2B4 0.6 0.6 A3B2
A4B2
A3B3
0.50 0.75 0.75 0.50
0.9 0.9
0.9 0.9
Specificity
Probability Probability
1.00 0.75
A3B2
A3B4
● ● ● ●● ●● ●● ●●●● ● ● ●●● 0.75 0.50
0.6
0.25 0.50 0.50 0.25
Probability Probability Probability Probability
●
Parameter Parameter Parameter Parameter A2B2 A2B2 A2B2 A2B2
A3B3 A3B3 A3B4 A3B4
A4B4
0.50 0.25
A2B4
0.9
Specificity
0.25 0.00
A3B3
0.00 0.25 0.25 0.00
Specificity
●
Specificity
A2B2
● A2B3
A3B3
●
NOM NOM.RNK NOMNOM.RNK
Specificity Specificity
0.0
●
●
Sensitivity
A2B2
● A2B3
A4B2 A4B2 A4B4 A4B4
0.0 0.0
0.00 0.00
Specificity
0.0
0.6
A2B2
● A2B3
A3B2
1.00 1.00
Specificity
● 0.3
0.00
A2B2
A2B4 0.6 A3B2
0.75 1.00 1.00 0.75
A3B3 A3B3 A3B4 A3B4
A4B4
● ●● ● ●●● ●●● ●●● ●●● ● ●● ● ●● ●● ●● ●● ●● ●● ● ●● ●● 0.9 0.9 Parameter
Sensitivity Sensitivity
0.3
●
Parameter
● A2B3 A2B4
0.6
0.9
Specificity
●
Sensitivity
Sensitivity
0.6
0.9 Parameter Parameter
0.9
0.50 0.75 0.75 0.50
A2B4 A2B4 A3B2 A3B2
A3B3 A3B3 A4B2
NOM NOM.RNK NOMNOM.RNK
IP
●● ● ●●●● ●● ●● ●● ●● ●●●● ● ● ●●● 0.9
0.25 0.50 0.50 0.25
Probability Probability Probability Probability
A2B4 A2B4 A3B2 A3B2 A3B4
0.3 0.3
0.0 0.0
Specificity
IP
IP
Sensitivity Sensitivity
IP
0.25 0.25 0.00
A2B3 ● A2B3 ●●A2B3 ● A2B3
0.6 0.6
●
0.00 0.00
Parameter Parameter Parameter Parameter A2B2 A2B2 A2B2 A2B2
●
●
Specificity Specificity
Sensitivity Sensitivity
● Sensitivity Sensitivity
0.9 Parameter 0.9 Parameter Parameter Parameter A2B2 A2B2 A2B2 A2B2
Specificity Specificity
0.9 0.9
0.6 0.6
●●● ● ●● ●●● ●●● ● ●● ● ●● ● ●● ● ●● ●●● ●●● ●● ●● ●●● ●0.9 ●
● ● ●
●● ● 0.9 0.9
1.00
0.00
0.25 0.00
0.50 0.25
0.75 0.50
ProbabilityProbability
1.00 0.75
1.00
A=2,B=2 1.00
A=2,B=3 ● ●
● ● ● ● ● ● ●
A=2,B=4 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
A=3,B=2 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
A=3,B=3
A=3,B=4 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
A=4,B=2 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
A=4,B=4 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ●
Values
0.75
0.50
0.25
0.00 S1 S2 S3 S4 S5 S6
S1 S2 S3 S4 S5 S6
S1 S2 S3 S4 S5 S6
S1 S2 S3 S4 S5 S6
S1 S2 S3 S4 S5 S6
S1 S2 S3 S4 S5 S6
S1 S2 S3 S4 S5 S6
S1 S2 S3 S4 S5 S6
Variables with different shape parameters
Figure 4: Box-plot showing the distribution of confidence values in 6 artificially generated information sources (S1 - S6) with different shape parameters α (A) and β (B)
6
0.7 0.6
●
●
0.5
● ● ● ● ●
●
0.4
Correlation coefficient
● ● ●
0.2
0.3
●
A2B2
A2B3
A2B4
A3B2
A3B3
A3B4
A4B2
A4B4
Data sets with different shape parameters
Figure 5: Box-plot showing the distribution of pairwise Spearman rank correlations across 6 artificially generated information sources for different shape parameters α (B) and β (B). Rank correlations were computed for every pair of artificially generated matrices X (i) and X (j) .
7
● ●
●
0.9
●
Source
Sensitivity
0.6
●
0.6
4
0.3
0.3
0.0
0.0
● ●
1
● 2
● 2
3
3
4
4
5
5
● ●
6
0.25 0.50
4
● 0.3
0.0
0.0
●
●●
0.50 0.75
0.75 1.00
1.00
0.00
0.00 0.25
● 0.9
●
1
● 2
3
3
4
4
●
0.9
0.6
●
0.25 0.50
0.50 0.75
0.75 1.00
0.6
3 4
6
●0.3
0.25 0.50
0.25 0.50
0.50 0.75
1
● 2
3
3
4
4
● 5 6
5
● Specificity
●
1
● 2
0.6
0.6
●
6
●
0.3
0.75 1.00
0.0
1.00
● Source
1
● 2
●
● ●
3
3
4
4
5
5
6
6
0.0 0.50 0.75
●
0.75 1.00
●
0.6
●
● 0.3
4
5
5
6
6 0.3
4
0.75 1.00
1.00
6
● ● Source
●
0.6
● 0.3
●
0.0
1.00
1
● 2
3
3
4
4
5
5
6
6
●
●
●
●
Source
1
● ● 2
● ●
● ● ● ●●
0.0
0.00
0.25 0.00
0.50 0.25
0.75 0.50
1.00 0.75
1.00
Probability Probability
LFM
LFM
0.6
0.9
LFM
0.9
Source
1
1
● 2
● 2
Source
0.6 3
3 4
4
5
5
6
6 0.3
0.3
●
0.6
● ●
●
● ●
0.3
Source
●● ●● ● ● ● ●1 ● ● 2 ● ● 3
●
● ●
●
1
● 2 3
4
4
5
5
6
6
● ●
Source
●
0.50 0.25
0.75 0.50
1.00 0.75
1.00
0.0
0.00
0.25 0.00
0.50 0.25
0.75 0.50
1.00 0.75
1.00
Probability Probability
Source 0.75
0.25 0.50
0.50 0.75
Probability Probability
MP.RNK MP.RNK
0.75 1.00
1
● 2
3
3
5
●
0.0
0.25 0.00
Probability Probability
● 2
6
0.00 0.25
0.0
0.00
1.00
0.0
0.9
●
1.00 0.75
5 0.3
1.00
● ●●● ● ●●● ● ● ● ● ● ●
MP.RNK MP.RNK
●
●
Source
0.0
0.50 0.75
●
Method
4
IP
5
IP.RNK
6
LFM
0.50
MP MP.RNK NOM NOM.RNK
1.00 0.25
● ● ● ● ● ● ● ●● ● ● ●●● ●
0.9 0.00
1
● 2
● 2
3
3
4
4
5
5
6
6
Specificity
●
●
Source
1
0.6
●
Source
●
Source
1
1
● 2
● 2
3
3
4
4
5
5
6
6
0.6
●
1
2
3
4
5
6
Size of network
# Sources
Figure 6: Sensitivity and specificity for different number of artificially generated information sources (α = 2, β = 2). The plot shows the behavior of different priors (top) and the corresponding optimal balanced accuracies (bottom). The number of nodes for all the networks here were 20 (m=20). 0.3
0.3
0.3
●
●
0.0
0.00
0.00 0.25
● ● ● ● ● ● ● ●● ● ● ●●●
0.25 0.50
0.50 0.75
0.75 1.00
0.0
1.00
●
0.0
0.00
0.00 0.25
0.25 0.50
0.50 0.75
0.75 1.00
Probability Probability
Probability Probability
NOM.RNK NOM.RNK
NOM.RNK NOM.RNK
● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
Source
Sensitivity
● 0.6
0.6
0.3
0.3
0.0
0.0
1
● 2
● 2
3
3
4
4
5
5
6
6
0.6
0.6
● ● 0.3
0.00 0.25
0.25 0.50
0.50 0.75
0.75 1.00
Probability Probability
LFM
LFM
● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●
1.00
0.0 0.00
● ●
●● 0.00
●
● ●
●
Source
● 1
●
●
0.9
0.3
●
1.00
● 0.9
●
Specificity
0.9
Specificity
0.9
Sensitivity
0.6 3
●
●
●
0.0
6
4
0.00
Specificity
Sensitivity
Sensitivity
0.3
3
5
Probability Probability
●
3
●
●
0.0
1.00
Source
0.6
● ●
0.25 0.50
● 0.6
0.3
●
1
6
●
● ●
●
●
●● ● ●● ●
0.6
1
5
6
1.00 0.75
1
● ● ● ●● ●● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● 0.9
● 2 0.6● 2
● ●
●
0.75 0.50
0.9
● 2
0.3
Source
1
● ●
0.9
●
●
●
Source
1
● 2
●
0.25 0.50
0.50 0.25
Source
1●
4
0.75 0.50
4
5
●
3
0.50 0.25
3
4
●
0.9
Specificity
●
●
0.9
0.25 0.00
0.9
● 2
0.0
1
● 2
3
●
0.00
●
Source
1
● 2
●
NOM.RNKNOM.RNK
0.6
1.00
0.0
Probability Probability
0.9
●● 0.00 0.25
0.9
0.3
●
NOM.RNKNOM.RNK
●
0.25 0.00
6
0.3
0.0
1.00
Source
0.00
●●
0.0
0.00
Specificity
Sensitivity
Sensitivity
●
●
4
●
●
●
0.00 0.25
6 0.3
1.00 0.75
● Source
IP.RNKIP.RNK
0.6
●●
Source
IP.RNKIP.RNK
●
0.00
0.6
Probability Probability
LFM
● Source
1.00
0.9
Probability Probability
●
0.75 1.00
● 0.9
●
Probability Probability
●
0.0
0.50 0.75
NOM NOM
●
0.3
6
●●
0.00 0.25
5
6
Source
Probability Probability
●
0.0
●●
0.6
0.3
●
0.0
0.00
Specificity
Sensitivity
Sensitivity
1.00
●
●●
0.75 0.50
●
●
5
●
0.0
0.3
●
4 5
NOM NOM
0.0
0.9
3
6
●
●●
1
5
0.3
0.9
4
5
● ● ● ●● ●● ●●● ● ● ● ●
Source
4
5
● ● ● ●● ●● ●●● ● ● ● ● ●
●
0.6 3
3
0.9
6
Probability Probability
●
0.00 0.25
1
Probability Probability
0.6
0.00
0.9
3
4
1.00 0.75
● ● ● ●● ●● ●● ●● ●● ● ● ● ● ●
● ● ● ● ●● ● ● ●●● ● ● ● ● ● ●
●
0.75 0.50
●
●
0.0
1
● 2
●
●
0.3
1
● 2 ●
0.50 0.25
1
● 2
3
0.9
Source
4
MP
5
● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● 0.9
0.6
●
0.25 0.00
● 2 0.6● 2
●
0.9
0.00
6
● ● ● ● ● ● ● ●● ● ● ●●●
0.00 0.25
1.00
Source
0.3
0.0
0.00
0.75 1.00
5 0.3
0.0
0.50 0.75
Source
1
● 2
0.6
●
0.25 0.50
MP
Specificity
Sensitivity
Sensitivity
0.3
● 0.9
●
0.0
0.50 0.25
MP.RNK MP.RNK
0.3
0.0
MP
Source
0.6
0.25 0.00
●● ●
0.6
Source
1
● 2
0.0
0.00
MP.RNK MP.RNK
Source
6
0.0
Probability Probability
●
Probability Probability
0.9
●
●
1.00
●
Specificity
●● 0.9
0.3
●
●
● ● ●●● ●●●
Probability Probability
MP
●
1.00 0.75
5
6 0.3
4
5
●
Probability Probability
Specificity
0.00 0.25
●
3
●
0.6
0.3
Specificity
● 0.0
0.00
● 2 0.6● 2 3
0.75 0.50
●
Source 1
6 0.3
● ●
0.0
●
0.6
6
0.3
●
●
0.9
6
Specificity
0.3
Source 1
0.6
0.9
●
5
Specificity
●
●
Sensitivity
●
Source
1
●● ●
● ●●● ●● ● ● ● ● ● ● ● ● ● ● ●
Sensitivity
0.6
●
Sensitivity
0.6
Source
●
●
0.9
Sensitivity
Sensitivity
Sensitivity
●
Specificity
●
Sensitivity
0.9
●
Sensitivity
●
0.9
IP
Optimal Balanced Accuracy
●
●
IP
Specificity
● 0.9
IP
0.50 0.25
4
Source
● ●
● ●
0.25 0.00
●
5
●
●
●
0.6 3
3
●
IP
1
● 2
●
●
0.00
Source
1
● 2
●
Specificity
Sensitivity
●
● ● ●●● ● ● ● ● ● ● ● ● ●
●
0.9
Specificity
●●
0.9 ●
Specificity
●
Specificity
●● 0.9
●Source Source ●
● ●
1
1
● 2
● 2
3
3
4
4
5
5
6
6
●
8
● ● ● ● ● ●●
0.0 0.00 0.25
0.25 0.50
0.50 0.75
Probability Probability
LFM
LFM
0.75 1.00
1.00
● ●● ●● ●● ●● ●● ● ●● ●● ● ●
●● 0.9 0.9
Size
●
●
40
40
60
60
0.3 0.3
●
● ●
0.0 0.0
0.3
60
10 0.6
● 20 40
● 20 40
0.3
0.0
0.0
0.00
1.00
10
40 60
60
60
0.0
0.6
0.3
40 60
40
0.3
● ●● ●● ●● ●● ●● ●●● ●●
0.25 0.00
0.50 0.25
0.75 0.50
NOM
NOM
ProbabilityProbability
1.00 0.75
● ● ● ●●● ●● ●● ●● ●● ●● ●●● ● 0.9
NOM
NOM
1.00 0.75
40 60
40
●
0.0
0.00
0.25 0.00
0.50 0.25
0.75 0.50
Probability Probability
1.00 0.75
0.0
●
●●20
0.25 0.00
0.50 0.25
0.75 0.50
ProbabilityProbability
1.00 0.75
1.00
●
●
Size
Size 10
40
40
60
60
●
40 60
● ● ● ● ● ● ● ●● ●● ● ● ●
●
Size
0.50 0.25
0.75 1.00
0.6
0.6
0.0
●
● ● ● ●● ● ● ● ● ●● ●● ●● ●
10
● 20
40
40
60
60
0.9
●
Size
● ●
40
10
● 20
● ●
0.3
40
● ●
60
● ●
● ●
● ●
0.0
0.75 0.50
●●●
1.00
0.00
0.6
40
0.25 0.00
0.25
0.50
0.50 0.25
0.75
0.50 Probability Probability
0.75
1.00
0.75 0.50
ProbabilityProbability
1.00 0.75
1.00
LFM
●●●
Size
10
● 20
0.6
40 60
60 0.3
● 0.0
0.25
●
0.9
0.3
●● ●
●
Size
10
● 20
0.0
0.00
1.00 0.75
Size
●
● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
●
●
●
●
0.0
0.00
1.00
Size
10
● 20
0.3
LFM
●
Probability Probability
IP.RNK IP.RNK
0.50 0.50 0.75
Probability Probability
60 0.3
0.0 0.25 0.00
0.25 0.25
● ●
0.9 ●
0.6 10 0.6 ● 20
0.3
0.00
1.00
60
0.9
0.3
●
10
●
0.6
● 0.0
●
●
0.3
40
60
●
LFMLFM
0.9
60
0.3
40
0.00 0.00
1.00
Size
10 0.6 ● 20
10
● 20 ● 20
60
0.9
Specificity
Sensitivity
0.3
0.75 0.50
Size
10
● 20
Size
Size 10 0.6 0.6 ● 20
0.0 0.0
0.50 0.25
ProbabilityProbability
●
Size
0.6
10
● 20
40
●
0.25 0.00
0.9
0.9
0.3 0.3
●●
0.9
0.6
60
0.0
0.00
1.00
40
0.3
0.0
●●●
● 20
●
Specificity
0.0●
0.00
●
60
10
Size
10
● 20
NOM.RNK NOM.RNK
0.9 0.9
0.6
●●●
0.0
0.00
1.00
Specificity
● 20
0.75 1.00
0.0
● ●● ● ● ● ● ●● ● ● ● ●● ●● ●● ●● ●●
●●●
Size
10 0.6 ● 20
0.50 0.50 0.75
Probability Probability
1.00
0.6
NOM.RNK NOM.RNK
Size
10
0.25 0.25
MP
60
0.3
Specificity
Size
0.00 0.00
1.00
● ● ●●●●● ● ● ● ●● ●● ●● ●● ●●
Specificity
● 0.6
0.3
0.0
1.00 0.75
0.9
Specificity
Sensitivity
Sensitivity
0.3
0.75 0.50
●●● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● 0.9
●
0.50 0.25
ProbabilityProbability
MP
0.9
0.6
0.25 0.00
MP
●
0.9
Sensitivity
0.00
1.00
40
60
Size
10
Specificity
●
●●●
Sensitivity
MP
1.00 0.75
Sensitivity
0.75 0.50
Sensitivity
0.50 0.25
ProbabilityProbability
10
● 20
40
●
Size
● 20 ● 20 40
0.0 0.0
Sensitivity
0.25 0.00
0.0
Specificity
0.0
0.00
● ● ● ●● ●● ●● ●● ●●● ●●
10
● 20
1.00 0.75
● ● ● ● ●● ● ●● ●● ●● ●● ●● ● 0.9
0.9
●
● 0.0
0.75 0.50
MP.RNKMP.RNK
40
0.3 0.3
0.50 0.25
MP.RNK MP.RNK
Size
0.3
0.25 0.00
ProbabilityProbability
10 0.6 0.6 ● 20
60
60
●
0.75 1.00
Size
10
0.6
Sensitivity
40
● 0.3
Size
Specificity
● 20
●
0.50 0.50 0.75
Probability Probability
0.9 0.9
Size
10
0.6
0.3
● ● ● ●
●●●
0.9
Specificity
●
Sensitivity
Sensitivity
Size 0.6
● ● ●● ●● ●● ●●●●●
Specificity
0.9
0.9
0.25 0.25
IP
●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● 0.9
0.3
Specificity
IP
IP
Sensitivity
IP
0.6
● ●
0.00 0.00
Size
Size
10
0.6 ● 20 ● 20
Specificity
Sensitivity
Sensitivity
10
Specificity
Size
● ● 0.6 0.6
●●●
0.9
0.9
● ●
1.00
0.00
0.0 0.25 0.00
0.50
0.75
0.25 0.50 Probability Probability
1.00 0.75
1.00
IP.RNK IP.RNK 1.00
●● 0.9
Sensitivity
Sensitivity
0.6
0.3
Size
●
10
● 20
0.6
40
●
0.3
●
60
●
100.6 ● 20 40
0.0 0.25 0.00
40 60
60 0.3
0.50
● ● ●● ● ● ● ● ● ● ● ●●● 0.75
1.00
0.25 0.50 0.75 Probability Probability
0.0 0.00
1.00
●
0.9
●
0.25 0.00
0.50
0.9
IP.RNK LFM
0.50
MP MP.RNK NOM NOM.RNK
1.00
0.00
●●●
10
●
20
40
60
Size of network
Size
Size
Specificity
40 60
Specificity
10
● 20
0.6
100.6
● 20 40
10
● 20
0.6
40
Size
60
10
● 20 40 60
0.3
0.3
●
● ● ● ● ● ● ● ●●● 0.0 ● ● ● ● ● ● ●
0.00
0.25 0.00
0.50
0.75
●●●
0.0
1.00
0.25 0.50 0.75 Probability Probability
●
0.0
0.00
0.25 0.00
1.00
NOM.RNK NOM.RNK
0.50
0.75
1.00
0.25 0.50 0.75 Probability Probability
●● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ●
●
●●
● 0.9
0.9
0.9 Size
0.6
Size
Specificity
10
● 20
Specificity
Sensitivity
Size 0.6
1.00
NOM.RNK NOM.RNK
0.9
Sensitivity
IP
60
MP.RNK
● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● 0.9
0.3
0.0
Method
40
0.25
1.00
60
0.3
10
● 20
Figure 7: Sensitivity and specificity for different network sizes (m = 20, 40 and 60 nodes) for artificially generated information sources (α = 2, β = 2). The plot shows the behavior of different priors (top) and the corresponding optimal balanced accuracies (bottom). The number of sources for all the networks here were 6. Sensitivity
Sensitivity
●
0.9
Size 0.6
0.75
0.25 0.50 0.75 Probability Probability
MP.RNK
●
Size
0.0
MP.RNK MP.RNK ●
0.75
10
● 20
0.6
● ●
0.00
Size
Size
0.3
● 0.0
0.9
●
●
●●●
Optimal Balanced Accuracy
0.9
Specificity
●
Specificity
0.9
● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ●●
100.6
40
● 20
60
40
10
● 20
0.6
40 60
60 0.3
0.3
0.3
●
● ● 0.0 0.00
0.25 0.00
0.50 0.75 Probability 0.25 0.50
Probability
1.00 0.75
0.0
● 1.00
10
9
● 20 40 60
0.3
0.0
Size
●● ● ● ● ● ● ● ● ●● ● ● ● ● ●
●
● ●
●
0.0 0.00
0.25 0.00
0.50 Probability 0.25
0.75 0.50
Probability
1.00 0.75
1.00
10
● 20 40 60
IP.RNK IP.RNK
IP.RNK IP.RNK
0.9
0.9
0.9●
0.9
●
10
● 20
●
●
40
0.3
●0.3
0.0
0.0
0.9
0.9
0.3
Size
10
● 20
● 20
40
40
60
60
0.3
0.6
0.9
0.9
60
●
● ● ● ● ● ● ● ● ● ● ●●●
0.25 0.50
0.50 0.75
Probability Probability
●●●
0.75 1.00
1.00
● 20
0.6
0.3
● 20
40
40
60
60
0.0
0.0 0.00 0.25
●
10
● 20
0.3
0.3
0.3
0.0
0.0
0.0
0.0 ●
40
40
60
60
0.250.50
0.500.75
Probability Probability
MP
0.751.00
0.00
1.00
0.000.25
0.250.50
0.500.75
Probability Probability
MP
MP
0.751.00
0.9
0.9
0.3
●
Size
10
10
● 20
● 20
40
40
60
60
0.6
0.3
0.3
0.50 0.75
●●●
0.75 1.00
0.9
●
40
40
60
60
0.00 0.25
10
0.6
● 20
0.00
0.000.25
0.250.50
0.500.75
Probability Probability
0.751.00
0.00
1.00
0.000.25
0.250.50
0.500.75
Probability Probability
NOM NOM
0.25 0.50
0.751.00
0.00 0.25
0.00
1.00
0.25 0.50
0.50 0.75
0.75 1.00
40
40
60
60
1.00
0.00 0.25
0.25 0.50
LFM
● 0.3
●● 0.0
0.0
0.00
0.000.25
0.250.50
0.500.75
Probability Probability
0.751.00
● ● ● ● ● ●●
0.000.25
●
0.9
0.9
0.9
10
● 20
40
40
60
60
●
0.250.50
0.0
0.00
0.000.25
●
● ●● ●● ● ● ● ● ● ● ● ● ● ●●● 0.250.50
0.500.75
Probability Probability
0.751.00
0.0
0.0
10
● 20
40
40
60
60
0.6
0.3
● ●● ●● ●● ●● ●● ●● ●●● ● ● ● 0.250.50
0.500.75
Probability Probability
0.751.00
40
60
60
● ●
0.0
0.00
0.25 0.00
0.50 0.25
0.75 0.50
ProbabilityProbability
1.00 0.75
1.00
STRINGSTRING
0.9
●
●● ●● ●● ●● ●● ●● ●● ●●
●
●
0.9
40
60
60
0.6
Size
Size
10
10
● 20
● 20
40
40
60
60
0.3
0.6
0.6
0.3
0.3
0.0
0.0
Size
10
10
● 20
● 20
40
40
60
60
● 0.0 0.250.50
0.500.75
Probability Probability
0.751.00
1.00
●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
●
●
0.0
0.00
0.25 0.00
0.50 0.25
0.75 0.50
Probability Probability
1.00 0.75
1.00
0.00
0.25 0.00
0.50 0.25
0.75 0.50
ProbabilityProbability
1.00 0.75
1.00
● ●● ●● ●● ●● ●● ●● ●●● ● ● ●
0.6
●
10
● 20
40
40
60
60
●
0.3
0.0
● 0.0
0.00
1.00
Size
10
● 20
0.000.25
0.250.50
0.500.75
Probability Probability
0.751.00
1.00
NOM.RNK NOM.RNK
●
●
●
0.9
●
0.3
0.0
0.0
Size
Size
Specificity
0.6
10
10
● 20
● 20
40
40
60
60
0.6
0.3
0.00
●
0.000.25
0.250.50
0.500.75
Probability Probability
0.751.00
0.6
●
● ● ● ●● ●● ●● ● ● ● ●
0.9
0.000.25
0.250.50
Size
●
●
0.00
10
10
● 20
● 20
40
40
60
●
0.6
60
0.250.50
● ●● ●● ●●● ● ● ● 0.500.75
Probability Probability
40
60
60
● ●
0.751.00
1.00
0.751.00
1.00
●
● ●
● ●● ●● ●●● ● ● ● ●
● Size
●
●
10
● 20
0.00
40 60
●
0.0 0.000.25
0.250.50
0.500.75
Probability Probability
0.751.00
10
● 20
60
0.3
0.0
10
Size
40
●
●
●
●
0.0
●
0.6
● 0.3
0.3
0.000.25
Size
Specificity
●
●
0.0
●
●
●
0.500.75
Probability Probability
●
0.9
Specificity
Sensitivity
0.3
0.6
10
● 20
40
LFM LFM
●
●
Size
10
● 20 ●
0.0
0.00
1.00
●
0.3
0.0
●
●
LFM LFM
● 0.9 ●
●
0.9
Specificity
●● ●● ●● ●● ●● ●● ●● ●● ●
0.9
Sensitivity
Sensitivity
10
● 20
40
0.9
●●
Sensitivity
0.0
1.00
Size
10
● 20
Figure 8: Sensitivity and specificity for different network priors and STRING for KEGG sub-graphsm = 20, 40 and 60 nodes ( top to bottom). Corresponding oBAC are available in the main document.
0.3
0.6
●●●
1.00 0.75
Size
10
40
Size
Specificity
Size
10
● 20
Size
0.9
0.75 0.50
Probability Probability
Size
0.3
● 0.9
0.000.25
●●
0.6
0.50 0.25
● ● ● ●● ● ●
Size
0.3
●
●
●●●
● ●
●
●
10
NOM.RNK NOM.RNK
0.9
60
0.9
● 20 0.6 ● 20
0.000.25
●
0.00
0.25 0.00
0.9
0.0
●
●
60
●
●
● ●● ●●● ●● ●
0.6
STRINGSTRING
●
0.00
Specificity
Sensitivity
Sensitivity
0.3
40
0.6
0.3
●
LFM
●
MP.RNK MP.RNK
0.6
0.3
40
●
0.0
0.00
0.3
0.0
1.00
Size 0.6
1.00
0.6
● 0.9 ●
0.751.00
10
● 20
● ● ●● ●● ●● ●● ●● ●●● ● ● ● ●
MP.RNK MP.RNK
0.9
0.500.75
Probability Probability
Size
10
● 20
0.3
●
1.00
●
●
0.0
Size
● ●
0.75 1.00
●
●
0.6
0.3
0.3
●
●
●
Sensitivity
10
● 20 ●
● 0.3
60
0.3
●
●
0.9
●
●
● ●
Size
Specificity
0.6
Size
Specificity
Sensitivity
Sensitivity
Size
●
60
0.6
IP.RNKIP.RNK
●
0.6
●
40
0.0
IP.RNKIP.RNK
0.9
●
40
0.0
0.00
1.00
●
●
0.9
●
Specificity
0.3
10
0.50 0.75
LFM
● ●
Size
● 20 0.6 ● 20
60
●
ProbabilityProbability
Specificity
●
60
0.3
0.0
●
40
60
0.0
0.00
Specificity
60
40
● Size 10
0.6
●
10
● 20
40
●
● ● ● ● ●●● ●● ●● ● ●
Size
10
● ● 20
●
● 0.3
0.0
Specificity
40
0.6
Sensitivity
10
● 20
● 0.9 ●
0.9
●
● Size
10
● ● ● ●● 20 ●
●
●
Sensitivity
0.3
Size
●
●
0.6
Specificity
Sensitivity
Sensitivity
● 0.6
●
0.9
Sensitivity
0.9
●
Specificity
0.9
●
●
Probability Probability
LFM
1.00
0.6
●●
●● ●● ● ● ●● ● ● ● ● 0.9 ●
0.75 1.00
Size
10
● 20 0.6
●
●
NOM NOM
0.50 0.75
ProbabilityProbability
Size
0.3
0.0
0.0
0.0
60
0.9
0.9
●● 0.0
40
60
●
●
●
●
● ● ●● ●● ●● ●● ●● ●● ●●● ● ● ● 0.0 ● ●
0.0
10
● 20
40
●
0.00
0.3
0.3
0.3
Size
10
● 20
●
●
Size
10
●●●
0.0
0.0
1.00
Size
10
● 20 0.6 ● 20
0.6
●● ●● ●● ●● ●● ●●● ●●
NOM.RNK NOM.RNK
● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●
●● 0.9
Sensitivity
Size
Specificity
0.6
● ●● ●● ●● ●● ●● ●● ●● ●●● ● ● ● ● ● 0.9
Specificity
Sensitivity
Sensitivity
Size 0.6
0.25 0.50
Probability Probability
1.00
0.6
NOM.RNK NOM.RNK
MP
●
0.9
0.00 0.25
0.00
1.00
0.75 1.00
0.3
Specificity
0.000.25
● ● ● ● ● ● ● ● ● ● ●● ● ● ●
●
● 0.3
●
0.50 0.75
Size
10
● 20 0.6
Specificity
0.00
Sensitivity
●● ● ●● ●● ●● ●● ●● ●● ●●● ● ● ● 0.0 ●● ●
0.25 0.50
ProbabilityProbability
Size
● 0.0
60
0.9
0.9
0.6
0.3
40
MP.RNKMP.RNK
Size
10
10
● 20
60 0.3
0.00
●
●
Size
10 0.6
Size
10
● 20 40
●
●
●● ●● ●● ●● ●● ●● ●● ●● ●●● ● ● ●
Sensitivity
Size
Specificity
10 0.6
Size
● 0.6
MP.RNKMP.RNK
IP
0.9
Specificity
Sensitivity
Sensitivity
Size 0.6
40
Specificity
●● 0.9
●
Specificity
IP
IP
●
0.00 0.25
Sensitivity
IP
10
● 20 0.6
●
●
0.00
60
●
●●●
Size
Specificity
Size
0.6
Specificity
Sensitivity
Sensitivity
●
0.6
● ● ● ● ● ●● ●● ● ● ● ● ● ●
●
1.00
356
7043
7040
355
7046
7048
5495
1616
5536
6885
7042
208
6416
4217
5606
408
5293
5295
8503
5296
5294
5170
208
7043
7040
355
7046
7048
5495
1616
5536
6885
1852
2316
3479
356
5601
6416
408
1852
2316
5601
5563
6195
27330
6197
5594
9479
4296
5595
11072
3091
5595
23542
3630
2475
10000
208
4217
5606
5595
7042
5595
54541
5778
1849
5594
1432
4137
7249
1843
80824
6300
4149
5600
3303
5601
5609
5602
4208
●●●●●● 8651
4093
658
4091
4090
4086
8454
728622
9765
4087
4088
9655
23624
122809
30837
867
90
●●●●●●●●●●● 3459
163702
3581
3561
3572
2057
4352
3575
3977
3587
3598
4089
3399
3397
3398
1875
2033
6667
5308
1387
●●● 3717
7297
3716
145957
2549
5294
5296
5293
2064
5295
2065
2066
23533
25759
9542
145957
2065
2066
2069
1839
7039
1950
1956
6464
53358
5336
5335 8503
208
5579
1956
23533
5296
5293
25759
399694
6464
867
5578 10000
2885
1026 6655
3458
814
5530
810
805
5257
817
163688
815
801
64750
808
130399
5137
115
4638
107
113
114
5568
7046
657
4092
4086
4090
4093
3398
3399
196883
1875
5613
9372
4091
4087
4088
1874
5308
2033
5567 4609
Figure 9: The 10 graphs with # nodes=20 sampled from KEGG via random walks. The node IDs represent Entrez IDs.
11
2549 1857
5293
208
3714
51107
5663
4851
4854
4855
4853
207
1027
2475
5335
5336
5532
5578
572
23291
100137049
1855
23533
10000
1026
55851
2619
64399
4036
6469
3549
50846
5582
56848
2736
2737
3845
5894
5727 5605
5336
3358
2911
623
3357
146
3356
5579
5894
56848
5605
5604
5594
5595
5319
123745
2776
2767
9630
5330
784
3791
5566
5567
5568
5613
25780
6714
5906
5908
998
6237
5293
8503
5291
5295
673
5881
1456
207
5336
5335
5533
5582
5578
56848
8877
1453
5321
2737
5879
2735
2736
3845
7479
656
8643
652
7478 5894
Figure 10: 10 directed acyclic graphs (DAGs) with # nodes=10 selected via random walks on KEGG pathways. The node IDs represent Entrez IDs.
12
Reconstruction
Reconstruction
0.9
0.9
●
●
●
●
●
●
●
● METHOD
METHOD
IP
IP
● IP.RNK
Specificity
Sensitivity
● IP.RNK 0.6
LFM 0.6 MP
LFM
MP.RNK
MP.RNK
NOM
NOM
NOM.RNK
NOM.RNK
NP
NP
MP
0.3
0.3
●
●
●
●
●
●
●
●
0.0
0.0
5
10 2
20
50 4
100
500 6
1000
5000 8
5
10 2
20
450
100
500 6
1000
5000 8
# Samples
# Samples
Figure 11: Sensitivity and specificity of Bayesian Network reconstruction for simulated data for a network of 10 nodes (m = 10) with different kinds of prior. The corresponding balanced accuracies are shown in the main document
Figure 12: MetacoreTM network for 37 selected genes from vant’t Veer breast cancer data set. The network was retrieved using the shortest path algorithm (path length of 2). The entire set of interaction with literature evidences are in ‘Supplement 2’
13
References ˜ [1] Ethan G Cerami, Benjamin E Gross, Emek Demir, Igor Rodchenkov, Azagin Babur, Nadia Anwar, Nikolaus Schultz, Gary D Bader, and Chris Sander. Pathway commons, a web resource for biological pathway data. Nucleic Acids Research, 39:D685–D690, 2011. [2] Holger Fr¨ohlich, Mark Fellman, Holger S¨ ultman, and Tim Beißbarth. Predicting pathway membership via domain sinatures. Bioinformatics, 24:2137–2142, 2008. [3] Holger Fr¨ohlich, Mark Fellman, Holger S¨ ultman, Annemarie Poustka, and Tim Beißbarth. Large scale statistical inference of singnaling pathways from rnai and microarray data. BMC Bioinformatics, 8(386), October 2007. [4] Florian Hahne, Alexander Mehrle, Dorit Arlt, Annemarie Poustka, Stefan Wiemann, and Tim Beißbarth. Extending pathways based on gene lists using interpro domain signatures. BMC Bioinformatics, 9(1):3, 2008. [5] Marc Johannes, Holger Fr¨ohlich, Holger S¨ ultmann, and Tim Beißbarth. pathclass: an r-package for integration of pathway knowledge into support vector machines for biomarker discovery. Bioinformatics, 27(10):1442–1443, 2011. [6] R I Kondor and J Lafferty. Diffusion kernels on graphs and other discrete structures. In Proceedings of the ICML, 2002. [7] Dekang Lin. An information-theoretic definition of similarity. In In Proceedings of the 15th International Conference on Machine Learning, pages 296–304. Morgan Kaufmann, 1998. [8] Nicola J Mulder, Rolf Apweiler, Terri K Attwood, Amos Bairoch, Alex Bateman, David Binns, Margaret Biswas, Paul Bradley, Peer Bork, Phillip Bucher, Richard Copley, Emmanuel Courcelle, Richard Durbin, Laurent Falquet, Wolfgang Fleischmann, Jerome Gouzy, Sam Griffith-Jones, Daniel Haft, Henning Hermjakob, Nicolas Hulo, Daniel Kahn, Alexander Kanapin, Maria Krestyaninova, Rodrigo Lopez, Ivica Letunic, Sandra Orchard, Marco Pagni, David Peyruc, Chris P Ponting, Florence Servant, Christian J A Sigrist, and InterPro Consortium. Interpro: an integrated documentation resource for protein families, domains and functional sites. Brief Bioinform, 3(3):225– 235, Sep 2002. [9] B. Raghavachari, A. Tasneem, T. M. Przytycka, and R. Jothi. Domine: a database of protein domain interactions. Nucleic Acids Res, 36(Database issue):D656–61, 2008.
14
[10] Andreas Schlicker, Francisco S Domingues, J¨org Rahnenf¨ uhrer, and Thomas Lengauer. A new measure for functional similarity of gene products based on gene ontology. BMC Bioinformatics, 7:302, 2006.
15
Table 1: Pairwise Wilcoxon test for model performance comparison (false discovery rates) for KEGG sub-graphs with m = 10 nodes Methods IP IP.RNK LFM MP IP.RNK LFM 0.0041 0.0041 MP 0.0203 0.0203 0.0352 MP.RNK 0.0423 0.0423 0.0203 0.0304 NOM 0.0041 0.0041 0.2974 0.0041 NOM.RNK 0.0041 0.0041 1.0000 0.0099 STRING 0.0070 0.0070 0.0041 0.0945
MP.RNK 0.0041 0.0041 0.5111
NOM NOM.RNK 0.0041 0.0041 0.0041
Table 2: Pairwise Wilcoxon test for model performance comparison (false discovery rates) for KEGG sub-graphs with m = 20 nodes Methods IP IP.RNK LFM MP IP.RNK 0.0036 LFM 0.0036 0.2712 MP 0.0144 0.0064 0.0260 MP.RNK 0.0036 0.6250 0.4200 0.0348 NOM 0.0036 0.0036 0.5773 0.0036 NOM.RNK 0.0036 0.0036 0.4648 0.0036 STRING 0.0036 0.0036 0.0036 0.0260
MP.RNK 0.0091 0.0064 0.0036
NOM NOM.RNK 0.1022 0.0036 0.0036
Table 3: Pairwise Wilcoxon test for model performance comparison (false discovery rates) for KEGG sub-graphs with m = 40 nodes Methods IP IP.RNK LFM MP IP.RNK 0.0068 LFM 0.0039 0.0980 MP 0.0594 0.2166 0.0091 MP.RNK 0.0039 0.4038 0.0594 0.0201 NOM 0.0039 0.0495 0.6481 0.0039 NOM.RNK 0.0039 0.0039 0.9219 0.0039 STRING 0.0039 0.0068 0.0039 0.0383
16
MP.RNK 0.0039 0.0039 0.0039
NOM NOM.RNK 0.0091 0.0039 0.0039
Table 4: Area Under Curve (Standard deviations in brackets) values for KEGG pathway reconstruction with different number of nodes (in brackets are the standard deviations) Method 10 IP 0.500 (0.00) IP.RNK 0.500 (0.00) MP 0.828 (0.04) MP.RNK 0.833 (0.09) NOM 0.879 (0.05) NOM.RNK 0.801 (0.02) LFM 0.869 (0.05)
20 0.500 (0.00) 0.741 (0.03) 0.795 (0.03) 0.807 (0.08) 0.875 (0.04) 0.881 (0.02) 0.907 (0.01)
17
40 0.500 (0.00) 0.771 (0.04) 0.803 (0.05) 0.855 (0.10) 0.831 (0.04) 0.864 (0.01) 0.918 (0.02)
60 0.500 (0.00) 0.779 (0.04) 0.781 (0.05) 0.882 (0.11) 0.887 (0.06) 0.856 (0.02) 0.923 (0.03)
Table 5: Pairwise Wilcoxon test for Bayesian Network reconstruction for different sample size (5 to 5000, see main article for details) sample size
5
10
20
50
100
500
Methods IP.RNK LFM MP MP.RNK NOM NOM.RNK NP IP.RNK LFM MP MP.RNK NOM NOM.RNK NP IP.RNK LFM MP MP.RNK NOM NOM.RNK NP IP.RNK LFM MP MP.RNK NOM NOM.RNK NP IP.RNK LFM MP MP.RNK NOM NOM.RNK NP IP.RNK LFM MP MP.RNK NOM NOM.RNK NP
IP IP.RNK LFM MP MP.RNK 0.85 0.11 0.11 1.00 0.85 0.11 0.17 0.17 1.00 0.17 0.11 0.11 1.00 0.11 0.85 0.17 0.19 0.83 0.17 0.85 1.00 1.00 0.11 1.00 0.19 0.703 0.032 0.062 1.00 0.703 0.032 0.437 0.683 0.032 0.437 0.272 0.309 0.683 0.272 0.613 0.227 0.125 0.418 0.227 0.483 0.026 0.105 0.259 0.026 0.683 0.422 0.041 0.041 1.00 0.422 0.041 0.049 0.102 0.060 0.049 0.041 0.041 0.906 0.041 0.240 0.049 0.041 0.466 0.049 0.422 0.304 0.722 0.041 0.304 0.049 0.922 0.016 0.016 1.00 0.922 0.016 0.016 0.016 0.154 0.016 0.018 0.016 0.864 0.018 0.120 0.046 0.020 0.197 0.046 0.703 0.197 0.653 0.016 0.197 0.016 0.625 0.029 0.018 1.00 0.625 0.029 0.102 0.126 0.029 0.102 0.031 0.018 0.240 0.031 0.177 0.029 0.029 0.177 0.029 0.206 0.102 0.625 0.018 0.102 0.237 0.675 0.016 0.016 1.00 0.675 0.016 0.092 0.092 0.016 0.092 0.016 0.016 1.000 0.016 0.022 0.038 0.022 0.063 0.038 0.378 0.016 0.016 0.197 0.016 0.177 18
NOM 0.37 0.13 0.957 0.959 0.304 0.041 0.059 0.016 0.625 0.102 0.049 0.197
NOM.RNK 0.27 0.905 0.049 0.026 0.031 0.799
Table 6: continuation of table 6 sample size
1000
5000
Methods IP.RNK LFM MP MP.RNK NOM NOM.RNK NP IP.RNK LFM MP MP.RNK NOM NOM.RNK NP
IP IP.RNK LFM MP MP.RNK 0.625 0.022 0.022 1.00 0.625 0.022 0.048 0.048 0.025 0.048 0.022 0.022 1.00 0.022 0.025 0.034 0.025 0.048 0.034 0.533 0.031 0.022 0.424 0.031 0.034 0.675 0.014 0.011 1.00 0.675 0.014 0.037 0.025 0.019 0.037 0.014 0.011 1.000 0.014 0.019 0.031 0.019 0.075 0.031 0.957 0.014 0.011 0.079 0.014 0.011
19
NOM 0.048 0.424 0.037 0.188
NOM.RNK 0.198 0.011