
Boosting Probabilistic Graphical Model Inference by Incorporating Prior Knowledge from Multiple Sources (Supplement)

Paurush Praveen, Holger Fröhlich

May 2, 2013

1 Knowledge sources

The approaches presented here primarily consider the following sources of biological knowledge; new sources can be added as they become available.

Protein-Protein Interactions

Protein-protein interaction (PPI) data can be instrumental for understanding biology at a system-wide level. They represent the current knowledge about pairs of proteins that interact in a living system and hence are an important source of information for our approach. PPIs have traditionally been measured using a variety of assays, such as immunoprecipitation and yeast two-hybrid (Y2H). Such knowledge resides in various databases, such as IntAct and HPRD. The entire set of known interactions, referred to as the interactome, can be structured as a graph. Here we use interaction data from the Pathway Commons database [1].

To compute a confidence value for each interaction between a pair of genes/proteins, we can work at the level of this graph in different ways. One way is to look at the shortest path distance between the two entities. Another is to employ diffusion kernels [6]. To calculate the shortest path distance between two nodes, we used the function sp.between from the R package RBGL, which is based on Dijkstra's algorithm. The edge confidence is then computed as the inverse of the shortest path distance. The diffusion kernel was computed for the entire Pathway Commons graph using the R package pathClass [5]. However, we observed that using the diffusion kernel in place of the shortest path measure did not affect the results much (Figure 1). For all results in the main document we therefore used the inverse shortest path distance as the measure of confidence.
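The inverse shortest path confidence can be illustrated with a minimal Python sketch. This is not the RBGL-based code used for the supplement: the toy graph, the protein names, the unit edge weights and the choice of a confidence of 0 for disconnected pairs are all assumptions made for illustration.

```python
import heapq


def shortest_path_length(graph, source, target):
    """Dijkstra's algorithm on a dict-of-dicts graph.

    Returns float('inf') if the target cannot be reached.
    """
    dist = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if node == target:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for neighbor, weight in graph.get(node, {}).items():
            nd = d + weight
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(queue, (nd, neighbor))
    return float("inf")


def edge_confidence(graph, a, b):
    """Inverse shortest path distance between two distinct nodes.

    Assumption: disconnected pairs receive a confidence of 0.0.
    """
    if a == b:
        raise ValueError("confidence is defined for pairs of distinct nodes")
    d = shortest_path_length(graph, a, b)
    return 0.0 if d == float("inf") else 1.0 / d


# Hypothetical undirected interactome with unit edge weights.
toy_graph = {
    "P1": {"P2": 1.0},
    "P2": {"P1": 1.0, "P3": 1.0},
    "P3": {"P2": 1.0},
    "P4": {},
}
```

In this toy graph, directly interacting proteins (P1, P2) get the highest confidence, proteins two steps apart (P1, P3) get half of that, and the isolated protein P4 gets 0 with every other node.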


Gene Ontology

The Gene Ontology (GO) has been developed to offer controlled vocabularies for annotating the molecular attributes of gene products across organisms. Predicting potential physical interactions between proteins from GO annotations is a promising approach, because interacting proteins often function in the same biological process. That is, two proteins acting in the same biological process are more likely to interact than two proteins involved in different processes. Here we use this information based on GO Biological Process (BP) annotations. Individual GO terms were compared via Lin's similarity measure [7]. Based on this, gene products were compared via the default method in GOSim [3], which resembles the similarity measure of Schlicker et al. [10].

Protein Domain Annotation

Hahne et al. [4] and Fröhlich et al. [2] found that proteins in distinct KEGG pathways are enriched for certain protein domains, i.e. proteins with similar domains are more likely to act in similar biological pathways. The confidence of an interaction between two proteins can thus be seen as a function of the similarity of their domain annotations. Protein domain annotations can be retrieved from the InterPro database [8]. For each protein we constructed a binary vector in which each component represents one InterPro domain: a "1" indicates that the protein is annotated with the corresponding domain, and a "0" that it is not. The similarity between two binary vectors u, v (domain signatures) is expressed as the cosine similarity

\[ S_{\mathrm{domain}} = \frac{\langle u, v \rangle}{\lVert u \rVert \, \lVert v \rVert} \qquad (1) \]
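As a small worked example of Equation (1), the following Python sketch builds binary domain signatures and computes their cosine similarity. The domain identifiers D1 to D4 are hypothetical placeholders rather than real InterPro accessions, and returning 0 for a protein without any annotated domain is an assumption.

```python
import math


def domain_signature(protein_domains, all_domains):
    """Binary vector: 1 if the protein is annotated with the domain, else 0."""
    return [1 if d in protein_domains else 0 for d in all_domains]


def cosine_similarity(u, v):
    """Cosine similarity <u, v> / (||u|| ||v||) of two equal-length vectors.

    Assumption: a protein without any annotated domain gets similarity 0.0.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical domain universe and annotations.
all_domains = ["D1", "D2", "D3", "D4"]
u = domain_signature({"D1", "D2"}, all_domains)  # [1, 1, 0, 0]
v = domain_signature({"D2", "D3"}, all_domains)  # [0, 1, 1, 0]
s_domain = cosine_similarity(u, v)
```

Here the two proteins share one of their two domains each, so the similarity is 1 / (sqrt(2) * sqrt(2)), i.e. approximately 0.5.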

Domain-Domain Interactions

Two proteins are more likely to interact if they contain domains that can potentially interact. The DOMINE database collates known and predicted domain-domain interactions [9]. The edge confidence $I_{AB}$ based on the DOMINE database is calculated as

\[ I_{AB} = \frac{H}{D_A \cdot D_B} \qquad (2) \]

where $H$ is the number of hit pairs found in the DOMINE database and $D_A$ and $D_B$ are the numbers of domains in proteins A and B, respectively.
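Equation (2) can be sketched in Python as follows. The domain names and the in-memory set of interacting pairs are hypothetical stand-ins for a DOMINE lookup. Counting each cross pair of distinct domains at most once and returning 0 for proteins without domains are assumptions.

```python
def domine_confidence(domains_a, domains_b, interacting_pairs):
    """Edge confidence I_AB = H / (D_A * D_B).

    domains_a, domains_b: sets of domain identifiers of proteins A and B.
    interacting_pairs: set of frozensets, each holding one interacting
    domain pair (a hypothetical stand-in for the DOMINE database).
    Assumption: proteins without annotated domains get confidence 0.0.
    """
    if not domains_a or not domains_b:
        return 0.0
    hits = sum(
        1
        for da in domains_a
        for db in domains_b
        if frozenset((da, db)) in interacting_pairs
    )
    return hits / (len(domains_a) * len(domains_b))


# Hypothetical example: protein A has two domains, protein B has one.
known_pairs = {frozenset(("D1", "D3"))}
i_ab = domine_confidence({"D1", "D2"}, {"D3"}, known_pairs)
```

With one hit among the two possible domain pairs, the confidence in this example is 1 / (2 * 1) = 0.5.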


Figure 1: Optimal balanced accuracies achieved for reconstructing KEGG subgraphs (m = 20) using the graph diffusion kernel (left) and the inverse of pairwise shortest path lengths (right).

Figure 2: Plot showing the convergence of the MCMC sampler in terms of log likelihoods. [Plot not reproduced; x-axis: iterations (x100), 0 to 5000; y-axis: log likelihood, −9000 to −5000.]

Figure 3: Sensitivity and specificity plots for different sets of α and β parameters for artificially generated information source data. The plot shows the behavior of different kinds of priors (top) and the corresponding optimal balanced accuracies (bottom). All networks here had 20 nodes (m = 20), and 6 information sources were used. [Plot not reproduced. Top panels, one per method (IP, IP.RNK, LFM, MP, MP.RNK, NOM, NOM.RNK): sensitivity and specificity against probability, one curve per parameter set (A2B2, A2B3, A2B4, A3B2, A3B3, A3B4, A4B2, A4B4). Bottom panel: optimal balanced accuracy per parameter set and method.]
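For readers unfamiliar with the quantities plotted in Figure 3, the following Python sketch shows one common way to compute sensitivity, specificity and the optimal balanced accuracy from edge probabilities. The exact procedure is described in the main document and is not reproduced here. The thresholding rule (an edge is predicted when its probability is at least the cutoff), the grid of cutoffs and the toy data are assumptions.

```python
def sensitivity_specificity(true_edges, edge_probs, cutoff):
    """Sensitivity and specificity when predicting edges with prob >= cutoff."""
    tp = fp = tn = fn = 0
    for pair, prob in edge_probs.items():
        predicted = prob >= cutoff
        actual = pair in true_edges
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec


def optimal_balanced_accuracy(true_edges, edge_probs, cutoffs):
    """Maximum over cutoffs of (sensitivity + specificity) / 2."""
    return max(
        sum(sensitivity_specificity(true_edges, edge_probs, c)) / 2.0
        for c in cutoffs
    )


# Hypothetical posterior edge probabilities and true network.
edge_probs = {("A", "B"): 0.9, ("A", "C"): 0.2, ("B", "C"): 0.6, ("C", "D"): 0.1}
true_edges = {("A", "B"), ("B", "C")}
cutoffs = [0.0, 0.25, 0.5, 0.75, 1.0]
obac = optimal_balanced_accuracy(true_edges, edge_probs, cutoffs)
```

In this toy example a cutoff of 0.5 separates true from false edges perfectly, so the optimal balanced accuracy is 1.0, whereas a cutoff of 0.75 misses one true edge and yields a sensitivity of 0.5 at a specificity of 1.0.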


Figure 4: Box plot showing the distribution of confidence values in 6 artificially generated information sources (S1 to S6) with different shape parameters α (A) and β (B). [Plot not reproduced; one panel per parameter combination (A=2,B=2; A=2,B=3; A=2,B=4; A=3,B=2; A=3,B=3; A=3,B=4; A=4,B=2; A=4,B=4); x-axis: variables with different shape parameters (S1 to S6); y-axis: values, 0 to 1.]
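A minimal Python sketch of how such artificial confidence values could be generated is given below. It assumes that the shape parameters α and β refer to a Beta distribution and that every source is sampled independently. The actual simulation procedure is described in the main document, and the function name, the number of values per source and the seed are hypothetical.

```python
import random
import statistics


def simulate_sources(n_values, alpha, beta, n_sources=6, seed=0):
    """Draw n_sources lists of confidence values from Beta(alpha, beta).

    Assumption: values are independent Beta draws; this is an illustration,
    not the simulation scheme of the main document.
    """
    rng = random.Random(seed)
    return [
        [rng.betavariate(alpha, beta) for _ in range(n_values)]
        for _ in range(n_sources)
    ]


# Six hypothetical sources (S1 to S6) for the parameter set A=2, B=2.
sources = simulate_sources(n_values=1000, alpha=2, beta=2)
source_medians = [statistics.median(s) for s in sources]
```

All values lie between 0 and 1. For α = β the distribution is symmetric around 0.5, while α > β shifts the mass towards higher confidence values and α < β towards lower ones.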


Figure 5: Box plot showing the distribution of pairwise Spearman rank correlations across 6 artificially generated information sources for different shape parameters α (A) and β (B). Rank correlations were computed for every pair of artificially generated matrices X^(i) and X^(j). [Plot not reproduced; x-axis: data sets with different shape parameters (A2B2, A2B3, A2B4, A3B2, A3B3, A3B4, A4B2, A4B4); y-axis: correlation coefficient.]
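The pairwise rank correlations summarised in Figure 5 can be sketched in plain Python as below. The sketch assumes that each matrix X^(i) has been flattened into a list of values in a common order. Ties receive average ranks, and a constant input would cause a division by zero, which is not handled here.

```python
import math
from itertools import combinations


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def pairwise_spearman(sources):
    """Spearman correlation for every pair (i, j), i < j, of flattened sources."""
    return {
        (i, j): spearman(sources[i], sources[j])
        for i, j in combinations(range(len(sources)), 2)
    }


# Hypothetical flattened sources.
toy_sources = [[0.1, 0.4, 0.3, 0.9], [0.2, 0.5, 0.4, 0.8], [0.9, 0.3, 0.4, 0.1]]
correlations = pairwise_spearman(toy_sources)
```

For 6 sources this yields 15 pairwise coefficients per parameter set, which is the kind of sample a box plot such as Figure 5 would summarise.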

Figure 6: Sensitivity and specificity for different numbers of artificially generated information sources (α = 2, β = 2). The plot shows the behavior of different priors (top) and the corresponding optimal balanced accuracies (bottom). All networks here had 20 nodes (m = 20). [Plot not reproduced. Top panels, one per method (IP, IP.RNK, LFM, MP, MP.RNK, NOM, NOM.RNK): sensitivity and specificity against probability, one curve per number of sources (1 to 6). Bottom panel: optimal balanced accuracy against the number of sources, per method.]


Figure 7: Sensitivity and specificity for different network sizes (m = 20, 40 and 60 nodes) for artificially generated information sources (α = 2, β = 2). The plot shows the behavior of different priors (top) and the corresponding optimal balanced accuracies (bottom). The number of sources for all networks here was 6. [Plot not reproduced. Top panels, one per method (IP, IP.RNK, LFM, MP, MP.RNK, NOM, NOM.RNK): sensitivity and specificity against probability, one curve per network size. Bottom panel: optimal balanced accuracy against size of network, per method.]


Figure 8: Sensitivity and specificity for different network priors and STRING for KEGG subgraphs with m = 20, 40 and 60 nodes (top to bottom). The corresponding optimal balanced accuracies (oBAC) are available in the main document. [Plot not reproduced; one panel per method (IP, IP.RNK, LFM, MP, MP.RNK, NOM, NOM.RNK, STRING): sensitivity and specificity against probability.]



Figure 9: The 10 graphs with 20 nodes each, sampled from KEGG via random walks. The node IDs are Entrez IDs.
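Sampling a sub-graph by a random walk can be sketched as follows. This is only an illustration under stated assumptions: the walk ignores edge direction while moving, stops once m distinct nodes have been visited, and returns the directed edges induced by the visited nodes. The exact sampling procedure used for the figures is described in the main document and may differ. In particular, this sketch does not check that the result is acyclic. The function name and the toy edge list are hypothetical.

```python
import random


def random_walk_subgraph(edges, m, seed=None, max_steps=100_000):
    """Visit nodes by a random walk until m distinct nodes are collected.

    edges: list of directed (source, target) pairs.
    Returns the visited node set and the induced directed edges.
    """
    rng = random.Random(seed)
    # Undirected neighbourhoods, used only for moving through the graph.
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    current = rng.choice(sorted(neighbours))
    visited = {current}
    for _ in range(max_steps):
        if len(visited) >= m:
            break
        current = rng.choice(sorted(neighbours[current]))
        visited.add(current)

    # Keep only the original directed edges between visited nodes.
    induced = [(u, v) for u, v in edges if u in visited and v in visited]
    return visited, induced


# Toy pathway graph (made-up node names, for illustration only).
toy_edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
sub_nodes, sub_edges = random_walk_subgraph(toy_edges, m=3, seed=1)
print(sorted(sub_nodes), sub_edges)
```

Because the walk only moves along existing edges, the sampled node set is connected when edge direction is ignored.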


Figure 10: The 10 directed acyclic graphs (DAGs) with 10 nodes each, selected via random walks on KEGG pathways. The node IDs are Entrez IDs.


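[Plot not reproduced. Panel title: Reconstruction. Axis labels: Sensitivity, Specificity, # Samples (5 to 5000). Legend (METHOD): IP, IP.RNK, LFM, MP, MP.RNK, NOM, NOM.RNK, NP.]

Figure 11: Sensitivity and specificity of Bayesian network reconstruction from simulated data for a network of 10 nodes (m = 10) with different kinds of priors. The corresponding balanced accuracies are shown in the main document.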
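For reference, sensitivity, specificity and balanced accuracy follow the standard definitions below. The symbols TP, FP, TN and FN (counts of true/false positive/negative edges) and the abbreviation BAC are introduced here for illustration and are not defined in this supplement.

```latex
\mathrm{Sensitivity} = \frac{TP}{TP + FN}, \qquad
\mathrm{Specificity} = \frac{TN}{TN + FP}, \qquad
\mathrm{BAC} = \frac{\mathrm{Sensitivity} + \mathrm{Specificity}}{2}
```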

Figure 12: MetaCore™ network for 37 selected genes from the van't Veer breast cancer data set. The network was retrieved using the shortest path algorithm (path length of 2). The entire set of interactions with literature evidence is given in 'Supplement 2'.


References

[1] Ethan G Cerami, Benjamin E Gross, Emek Demir, Igor Rodchenkov, Özgün Babur, Nadia Anwar, Nikolaus Schultz, Gary D Bader, and Chris Sander. Pathway Commons, a web resource for biological pathway data. Nucleic Acids Research, 39:D685–D690, 2011.
[2] Holger Fröhlich, Mark Fellmann, Holger Sültmann, and Tim Beißbarth. Predicting pathway membership via domain signatures. Bioinformatics, 24:2137–2142, 2008.
[3] Holger Fröhlich, Mark Fellmann, Holger Sültmann, Annemarie Poustka, and Tim Beißbarth. Large scale statistical inference of signaling pathways from RNAi and microarray data. BMC Bioinformatics, 8(386), October 2007.
[4] Florian Hahne, Alexander Mehrle, Dorit Arlt, Annemarie Poustka, Stefan Wiemann, and Tim Beißbarth. Extending pathways based on gene lists using InterPro domain signatures. BMC Bioinformatics, 9(1):3, 2008.
[5] Marc Johannes, Holger Fröhlich, Holger Sültmann, and Tim Beißbarth. pathClass: an R-package for integration of pathway knowledge into support vector machines for biomarker discovery. Bioinformatics, 27(10):1442–1443, 2011.
[6] R I Kondor and J Lafferty. Diffusion kernels on graphs and other discrete structures. In Proceedings of the ICML, 2002.
[7] Dekang Lin. An information-theoretic definition of similarity. In Proceedings of the 15th International Conference on Machine Learning, pages 296–304. Morgan Kaufmann, 1998.
[8] Nicola J Mulder, Rolf Apweiler, Terri K Attwood, Amos Bairoch, Alex Bateman, David Binns, Margaret Biswas, Paul Bradley, Peer Bork, Phillip Bucher, Richard Copley, Emmanuel Courcelle, Richard Durbin, Laurent Falquet, Wolfgang Fleischmann, Jerome Gouzy, Sam Griffith-Jones, Daniel Haft, Henning Hermjakob, Nicolas Hulo, Daniel Kahn, Alexander Kanapin, Maria Krestyaninova, Rodrigo Lopez, Ivica Letunic, Sandra Orchard, Marco Pagni, David Peyruc, Chris P Ponting, Florence Servant, Christian J A Sigrist, and InterPro Consortium. InterPro: an integrated documentation resource for protein families, domains and functional sites. Brief Bioinform, 3(3):225–235, Sep 2002.
[9] B. Raghavachari, A. Tasneem, T. M. Przytycka, and R. Jothi. DOMINE: a database of protein domain interactions. Nucleic Acids Res, 36(Database issue):D656–61, 2008.
[10] Andreas Schlicker, Francisco S Domingues, Jörg Rahnenführer, and Thomas Lengauer. A new measure for functional similarity of gene products based on gene ontology. BMC Bioinformatics, 7:302, 2006.

Table 1: Pairwise Wilcoxon test for model performance comparison (false discovery rates) for KEGG sub-graphs with m = 10 nodes

Methods   IP      IP.RNK  LFM     MP      MP.RNK  NOM     NOM.RNK
IP.RNK
LFM       0.0041  0.0041
MP        0.0203  0.0203  0.0352
MP.RNK    0.0423  0.0423  0.0203  0.0304
NOM       0.0041  0.0041  0.2974  0.0041  0.0041
NOM.RNK   0.0041  0.0041  1.0000  0.0099  0.0041  0.0041
STRING    0.0070  0.0070  0.0041  0.0945  0.5111  0.0041  0.0041

Table 2: Pairwise Wilcoxon test for model performance comparison (false discovery rates) for KEGG sub-graphs with m = 20 nodes

Methods   IP      IP.RNK  LFM     MP      MP.RNK  NOM     NOM.RNK
IP.RNK    0.0036
LFM       0.0036  0.2712
MP        0.0144  0.0064  0.0260
MP.RNK    0.0036  0.6250  0.4200  0.0348
NOM       0.0036  0.0036  0.5773  0.0036  0.0091
NOM.RNK   0.0036  0.0036  0.4648  0.0036  0.0064  0.1022
STRING    0.0036  0.0036  0.0036  0.0260  0.0036  0.0036  0.0036

Table 3: Pairwise Wilcoxon test for model performance comparison (false discovery rates) for KEGG sub-graphs with m = 40 nodes

Methods   IP      IP.RNK  LFM     MP      MP.RNK  NOM     NOM.RNK
IP.RNK    0.0068
LFM       0.0039  0.0980
MP        0.0594  0.2166  0.0091
MP.RNK    0.0039  0.4038  0.0594  0.0201
NOM       0.0039  0.0495  0.6481  0.0039  0.0039
NOM.RNK   0.0039  0.0039  0.9219  0.0039  0.0039  0.0091
STRING    0.0039  0.0068  0.0039  0.0383  0.0039  0.0039  0.0039
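The tables above report false discovery rates for the pairwise tests. This section does not state which multiple-testing correction was used. The sketch below assumes the Benjamini-Hochberg procedure, a common choice, and shows how raw p-values are turned into adjusted values. The raw p-values in the example are invented for illustration and are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Go from the largest p-value to the smallest so that the adjusted
    # values stay monotone in the rank of the raw p-values.
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


# Made-up raw p-values from four hypothetical pairwise comparisons.
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
print(adj)
```

With many comparisons sharing the same smallest raw p-value, this adjustment yields many identical adjusted values.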

Table 4: Area under the curve (AUC) values for KEGG pathway reconstruction with different numbers of nodes (standard deviations in brackets)

Method    10            20            40            60
IP        0.500 (0.00)  0.500 (0.00)  0.500 (0.00)  0.500 (0.00)
IP.RNK    0.500 (0.00)  0.741 (0.03)  0.771 (0.04)  0.779 (0.04)
MP        0.828 (0.04)  0.795 (0.03)  0.803 (0.05)  0.781 (0.05)
MP.RNK    0.833 (0.09)  0.807 (0.08)  0.855 (0.10)  0.882 (0.11)
NOM       0.879 (0.05)  0.875 (0.04)  0.831 (0.04)  0.887 (0.06)
NOM.RNK   0.801 (0.02)  0.881 (0.02)  0.864 (0.01)  0.856 (0.02)
LFM       0.869 (0.05)  0.907 (0.01)  0.918 (0.02)  0.923 (0.03)
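An area under the ROC curve for edge confidences can be computed as the fraction of (true edge, non-edge) pairs in which the true edge receives the higher score, with ties counted as one half. The sketch below illustrates this definition. It is not the authors' evaluation code, and the scores and labels are invented.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    scores: confidence value per candidate edge.
    labels: 1 for a true edge, 0 for a non-edge.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one true edge and one non-edge")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Made-up edge confidences and ground-truth labels.
edge_scores = [0.9, 0.8, 0.4, 0.3, 0.1]
edge_labels = [1, 0, 1, 0, 0]
value = auc(edge_scores, edge_labels)
print(value)
```

Under this definition, a prior that assigns the same score to every candidate edge yields a value of exactly 0.5.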

Table 5: Pairwise Wilcoxon test for Bayesian network reconstruction for different sample sizes (5 to 5000, see main article for details)

Sample size  Methods   IP     IP.RNK  LFM    MP     MP.RNK  NOM    NOM.RNK
5            IP.RNK    0.85
             LFM       0.11   0.11
             MP        1.00   0.85    0.11
             MP.RNK    0.17   0.17    1.00   0.17
             NOM       0.11   0.11    1.00   0.11   0.85
             NOM.RNK   0.17   0.19    0.83   0.17   0.85    0.37
             NP        1.00   1.00    0.11   1.00   0.19    0.13   0.27

10           IP.RNK    0.703
             LFM       0.032  0.062
             MP        1.00   0.703   0.032
             MP.RNK    0.437  0.683   0.032  0.437
             NOM       0.272  0.309   0.683  0.272  0.613
             NOM.RNK   0.227  0.125   0.418  0.227  0.483   0.957
             NP        0.026  0.105   0.259  0.026  0.683   0.959  0.905

20           IP.RNK    0.422
             LFM       0.041  0.041
             MP        1.00   0.422   0.041
             MP.RNK    0.049  0.102   0.060  0.049
             NOM       0.041  0.041   0.906  0.041  0.240
             NOM.RNK   0.049  0.041   0.466  0.049  0.422   0.304
             NP        0.304  0.722   0.041  0.304  0.049   0.041  0.049

50           IP.RNK    0.922
             LFM       0.016  0.016
             MP        1.00   0.922   0.016
             MP.RNK    0.016  0.016   0.154  0.016
             NOM       0.018  0.016   0.864  0.018  0.120
             NOM.RNK   0.046  0.020   0.197  0.046  0.703   0.059
             NP        0.197  0.653   0.016  0.197  0.016   0.016  0.026

100          IP.RNK    0.625
             LFM       0.029  0.018
             MP        1.00   0.625   0.029
             MP.RNK    0.102  0.126   0.029  0.102
             NOM       0.031  0.018   0.240  0.031  0.177
             NOM.RNK   0.029  0.029   0.177  0.029  0.206   0.625
             NP        0.102  0.625   0.018  0.102  0.237   0.102  0.031

500          IP.RNK    0.675
             LFM       0.016  0.016
             MP        1.00   0.675   0.016
             MP.RNK    0.092  0.092   0.016  0.092
             NOM       0.016  0.016   1.000  0.016  0.022
             NOM.RNK   0.038  0.022   0.063  0.038  0.378   0.049
             NP        0.016  0.016   0.197  0.016  0.177   0.197  0.799

Table 6: Continuation of Table 5

Sample size  Methods   IP     IP.RNK  LFM    MP     MP.RNK  NOM    NOM.RNK
1000         IP.RNK    0.625
             LFM       0.022  0.022
             MP        1.00   0.625   0.022
             MP.RNK    0.048  0.048   0.025  0.048
             NOM       0.022  0.022   1.00   0.022  0.025
             NOM.RNK   0.034  0.025   0.048  0.034  0.533   0.048
             NP        0.031  0.022   0.424  0.031  0.034   0.424  0.198

5000         IP.RNK    0.675
             LFM       0.014  0.011
             MP        1.00   0.675   0.014
             MP.RNK    0.037  0.025   0.019  0.037
             NOM       0.014  0.011   1.000  0.014  0.019
             NOM.RNK   0.031  0.019   0.075  0.031  0.957   0.037
             NP        0.014  0.011   0.079  0.014  0.011   0.188  0.011