Boosting Probabilistic Graphical Model Inference by ... - ScienceOpen

Boosting Probabilistic Graphical Model Inference by Incorporating Prior Knowledge from Multiple Sources (Supplement) Paurush Praveen, Holger Fröhlich May 2, 2013

1 Knowledge sources The approaches presented here primary consider the following sources of biological knowledge, however new sources can be added if available. Protein-Protein Interactions Protein-protein interaction (PPI) data can be instrumental to understand biology at a system-wide level. They present the current knowledge about pairs of proteins that interact in living system and hence can be an important source of information for our approach. PPIs have traditionally been measured using a variety of assays, such as immunoprecipitation and yeast two-hybrid (Y2-H). Such knowledge resides in various databases, like IntAct, HPRD etc. The entire set of known interaction referred as interactome can be structured as a graph. We here use interaction data from the PathwaysCommons database [1]. To compute a confidence value for each interaction between a pair of genes/proteins we can work at the level of this graph in different ways: One way is to look at the shortest path distance between the two entities. Another way is to employ diffusion kernels [6]. To calculate the shortest path distance between two nodes the function sp.between function based on Dijkstra’s algorithm is used from R-package RBGL. The edge confidence is then computed as the inverse shortest path distance. The diffusion kernel was computed using the R-package pathClass [5] for the entire PathwayCommons graph. However, we observed that the use of the diffusion kernel in place of the shortest path measure did not affect the results much (Figure 1). For all the results in the main document we thus used the inverse shortest path distance as a measure of confidence.

1

Gene Ontology The Gene Ontology (GO) has been developed to offer controlled vocabularies for aiding the annotation of molecular attributes for different organisms. Predicting the map of potential physical interactions between proteins by fully exploring the knowledge buried in GO annotations seems a promising approach, because interacting proteins often function in the same biological process. This implies that two proteins acting in the same biological process are more likely to interact than two proteins involved in different processes. Here we use this information based on GO Biological Process (BP) annotations. Comparison of individual GO terms was performed via Lin’s similarity measure [7]. Based on this gene products were compared via the default method in GOSim [3], which resembles the similarity measure by Schlicker et.al.[10]. Protein Domain Annotation Hahne et.al [4] and Fröhlich et.al. [2] found that proteins in distinct KEGG pathways are enriched for certain protein domains, i.e. proteins with similar domains are more likely to act in similar biological pathways. The confidence of interaction between two proteins can thus be seen as a function of the similarity of the domain annotations of proteins. Protein domain annotation can be retrieved from the Inter-Pro database [8]. For each protein we constructed a binary vector, where each component represents one Inter-Pro domain. A “1” in a component thus indicates that the protein is annotated with the corresponding domain. Otherwise a “0” is filled in. The similarity between two binary vectors u, v (domain signatures) is presented in terms of the cosine similarity Sdomain =

hu, vi kukkvk

(1)

Domain-Domain Interactions Two proteins are more likely to interact if they contain domains, which can potentially interact. The DOMINE database collates known and predicted domain–domain interactions [9]. Calculation for edge confidence (IAB ) based on the DOMINE database is done as IAB =

H DA .DB

(2)

where H is the number of hit pairs found in the DOMINE database and DA and DB are the the number of domains in proteins A and B, respectively.

2

Figure 1: Optimal balanced accuracies achieved for reconstructing KEGG subgraphs (m=20) using graph diffusion kernel (left) and inverse of pairwise shortest path-lengths (right)

3

−5000 −6000 −7000 −9000

−8000

log likelihood

0

1000

2000

3000

4000

5000

Iterations (x100)

Figure 2: Plot showing the convergence of the MCMC sampler in terms of log likelihoods.

4

●

A2B3 ●● A2B3 A2B3 ●● A2B3

0.6 0.6

A2B4 A2B4 A3B2 A3B2

A2B4 A2B4 0.6 A3B2 0.6 A3B2

A3B3 A3B3

A3B3 A3B3 A3B4 A3B4

A3B4 A3B4 A4B2 A4B2

0.3 0.3

0.3 0.3

0.0 0.0

0.0 ● ●● ● ●●● ●●● ●●● ●●● ● ●● ●●● ●●● 0.0●● ●● ●● ●●● ●

A4B4 A4B4

0.9

A4B2 A4B2 0.3 A4B4 0.3 A4B4

●

0.0

0.0 0.0

0.0 0.0

0.00

0.50 0.25

0.75 0.50

Probability Probability

1.00 0.75

0.00 0.00

1.00

A2B2

A2B2

● A2B3

● A2B3

A2B4

0.6

0.3

A3B2

A2B4 0.6 A3B2

A3B3

A3B3

A3B4

A3B4

A4B2

A4B2 0.3 A4B4

A4B4

0.25 0.50 0.25 0.50

0.50 0.75 0.50 0.75

Probability Probability Probability Probability

Parameter

●

A2B2

A2B2

● A2B3

● A2B3

A2B4

0.6

A3B2 A3B3

0.3

A2B4 0.6 0.6 A3B2 A3B3

A3B4

A3B4

A4B2

0.3 A4B2 0.3 A4B4

A4B4


NOM

1.00 0.75

0.00

1.00

0.0 0.0

0.0 0.25 0.00

0.50 0.25

0.75 0.50


NOM

NOM

1.00 0.75

0.3

●

●

0.6 0.6

●

●

●

● ●

0.6

0.3

A3B2

A2B4 0.6 A3B2

A3B3

A3B3

A3B4

A3B4

A4B2

A4B2 0.3 A4B4

A4B4

● Parameter ●● A2B3

●

● A2B3

A2B4

0.6

A3B2 A3B3

●

●

0.3

●

0.0

● A2B3

0.50 0.25

0.75 0.50


1.00 0.75

0.0

●

0.25 0.00

IP.RNK IP.RNK

● A2B3 A2B4

0.6

●

● ●

0.3

●

0.3

●

0.25 0.00

0.50 0.25

0.75 0.50


1.00 0.75

A2B4

●●

A3B4

A3B4

A3B4

A3B4

A4B2

A4B2 0.3 A4B4

A4B2

A4B2 0.3 0.50 A4B4

0.6

A3B2 A3B3

0.3

A4B4

●●

0.6

A3B2

● ● 0.0

A3B3

A3B3

A3B4

A3B4

0.25 0.00

A4B4

0.50 0.25

0.75 0.50


1.00 0.75

Parameter Parameter A2B2

A3B2

A3B2

A3B3

A3B3

0.75 1.00

A2B2

● A2B3 ● A2B3 A2B4

A3B2

A3B2

A3B3

A3B3

A3B4

A3B4

A3B4

A4B2

A4B2

A4B4

A4B4

0.6

0.6

0.3

0.3

● 0.0

0.0

1.00

0.00 0.25

0.00

0.25 0.50

0.50 0.75

0.75 1.00

ProbabilityProbability

NOM.RNK NOM.RNK

NOM.RNK NOM.RNK

1.00

A2B2

0.6

0.3

A3B3

● Parameter Parameter A2B2

A2B2

A2B4

A2B4

A3B2

A3B2

A3B3

A3B3

A3B4

A3B4

A4B2

A4B2

A4B4

A4B4

A2B2

● A2B3 ● A2B3 0.6

0.6

A2B4

A2B4

A3B2

A3B2

Method A3B3

A3B3

A3B4IP

A3B4

IP.RNK

0.3

A4B2

A4B2

●A4B4MP

A4B4

LFM

0.3

●

●

NOM

●● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ●

●

1.00 0.75

1.00

0.0 0.25 0.00

0.50 0.25

0.75 0.50

0.0

NOM.RNK

0.0

0.00

0.25 0.00


0.50 0.25

0.75 0.50

1.00 0.75

1.00


LFM

LFM

LFM

● ●● ●● ●● ●● ●● ●●● ● ● ●●● ●● ● ●● ● 0.00

0.9 ● 0.9 Parameter

A2B2

A3B2

0.9

●

● A2B3

0.6

● 0.9

● A2B3 ● A2B3

A2B4 0.6 A3B2 A3B3

●

A3B4

● ● A2B2


●

A2B4

0.9

0.9

A2B2 A2B2 A3B3 A3B2 Size of network ● A2B3 ● A2B3


A4B2

A4B4

0.6

●

A2B4

● ●

● ●

●

A2B4

0.6

A2B2

A2B2

● A2B3 ● A2B3

Parameter sets

0.3

0.0

0.0

0.00

0.25 0.00

0.50 0.25

0.75 0.50


1.00 0.75

0.6

A2B4

A2B4

A3B2

A3B2

A3B3

A3B3

A3B2

A3B2

A3B3

A3B3

A3B4

A3B4

A3B4

0.3

● 0.50 0.25

0.75 0.50


LFM

LFM

1.00 0.75

Parameter

A2B2

A2B2

● A2B3

● A2B3

A2B4

A2B4

A3B2

A3B2

A3B3

A3B3

A3B2

A2B4 0.6 A3B2

A3B3

A3B3

A3B4

A3B4

A3B4

A3B4

A4B2

A4B2 0.3 A4B4

A4B2

A4B2

A4B4 ●

A4B4

A4B4

0.6

0.3

●

●● ● ●●●● ●● ●● ●● ●● ● ● ●

● 1.00

0.0 0.00

0.3

●

A3B4

A4B2

A4B2

A4B2

●

●

●

●

●

0.0 0.25 0.00

0.50 0.25

0.75 0.50


LFM

LFM

1.00 0.75

1.00

●

A4B2

A4B4

A4B4

A4B4

●●

●

A4B4

5

0.25 0.00

0.50 0.25

0.75 0.50


1.00 0.75

0.3

0.3

0.0

0.0

●●

0.0

0.00

● Parameter

● A2B3 Specificity

0.6

0.9

Specificity

A2B4

0.25 0.00

1.00

●

0.9 Parameter Parameter A2B2 A2B2

0.0

A4B2 0.3 A4B4

●

●

0.0

1.00

A4B4

A3B4

NOM.RNKNOM.RNK

0.9

Sensitivity

Sensitivity

● ● ●●● ● ● ●● ●● ●● ●● ●● ●●● ●

0.9

MP.RNK

A2B2 A2B4

●

0.00

A3B3

● A2B3

A4B2

A4B2 0.3 A4B4

● ●● ●● ●● ●● ●●●● ● ● ●●●

● A2B3

0.0

0.50 0.75

LFM

●● ● ●●●● ●● ●● ●● ●● ●●●● ● ●● 0.9

0.3

●

0.9

1.00 1.00

A2B4

Parameter Parameter

0.25

Parameter

NOM.RNKNOM.RNK

0.6

A2B4 0.6 A3B2

0.00

1.00

● ● ● ●● ●● ●● ●● ●●●● ● ● ●●● ●

●

0.00


1.00 0.75

● A2B3 A2B4 0.6 A3B2

0.75 1.00

Figure 3: Sensitivity and specificity plots for different set of α and β parameters for artificially generated information source data. The plot shows the behavior of different kinds of priors (top) and the corresponding optimal balanced accuracies (bottom). The number of nodes for all the networks here were 20 (m=20), and 6 information sources were used. 0.3

0.0

0.75 0.50

0.9

Specificity

Sensitivity

Sensitivity

A2B4

A4B2

0.3

●


● A2B3 0.6

0.25 0.50

MP.RNK MP.RNK

0.9

0.50 0.75


0.9

0.0

0.0

Specificity

0.9

A2B4

● ● ●●● ● ●● ●● ●● ●● ●● ●●●

0.00 0.25

A2B2 0.75 ● A2B3

A3B3

MP.RNK MP.RNK

0.25 0.50

●● ● ●● ● ●● ●● ●● ●● ●● ●● ● ● ●●

A3B3

1.00

0.3

0.9 Parameter

A3B2

0.50 0.25

0.6

A2B4

A4B4

Parameter

0.25 0.00

1.00 1.00

A2B2

A4B4

A2B2

0.00

Parameter Parameter

A4B2

● A2B3

0.0

●●

0.0●

A4B4 A4B4

1.00

● A2B3

● ● ● ●●● ● ● ●●●

0.0 0.0 0.00 0.25 0.00 0.25

MP.RNKMP.RNK

A4B2

●

0.0

0.00

0.3 0.3

MP.RNKMP.RNK

A2B2

0.00

A4B2 A4B2

A3B4 A3B4

Parameter Parameter Parameter Parameter A2B2 A2B2 A2B2 A2B2 A2B3 ● A2B3 ● ● A2B3 A2B3 ● A2B4 A2B4 A2B4 A2B4 A3B2 A3B2 A3B2 A3B2 A3B3 A3B3 A3B3 A3B3 A3B4 A3B4 A3B4 A3B4 A4B2 A4B2 A4B2 A4B2 A4B4 A4B4 A4B4 A4B4

0.25 0.50 0.75 0.50 Probability 0.75 1.00 Probability ProbabilityProbability

A4B2 0.3 A4B4

A2B4 0.6 A3B2

A4B4

●

0.0


0.75 1.00 0.75 1.00


A3B4

1.00

0.50 0.75 0.50 0.75

0.6 0.6

●

1.00 0.75

0.25 0.50 0.25 0.50


0.9 0.9

0.00 0.00

A4B2

Sensitivity

●

0.00 0.25 0.00 0.25

● A2B3 ● A2B3

0.0

0.75 0.50

A3B3 A3B3

A4B2 A4B2 A4B4 ● A4B4

0.0 0.0

0.0 0.0

1.00 1.00

●

0.9

Specificity

Sensitivity

Sensitivity


●

● 0.6

A3B3

● ● ●●● ● ● ● ●● ●● ●● ●● ●● ●● ●●●●

● ● 0.9

A2B4 0.6 A3B2

IP.RNK IP.RNK

Specificity

● 0.9

0.50 0.25

● ● ● ●●● ● ● ●●●

A3B4

0.0

0.00

1.00

● ● ● ● ● ● ● ● ● ● ●● ●● ●● ● ●● ●● ●● ●● ●● ●●● ● ●●

A3B3 A3B3 A3B4 A3B4 A4B2 A4B2 0.3 A4B4 0.3 A4B4

A3B4

Sensitivity

0.25 0.00

●

0.25 0.50 0.75 0.50 0.75 1.00 0.25 0.50 0.75 0.50 0.75 1.00 Probability Probability

A2B2

● ● ●● ● ●●●● ●● ● ● ● 0.00

0.3 0.3

0.00 0.00

● ● ● ● ● ● ● ● ● ●● ●● ●

0.9

0.9 Parameter

A4B4

●

0.0

0.3 0.3 A4B4 A4B4

A3B2 A3B2

A3B3 A3B3 A3B4 A3B4

●

●

0.0 0.0

A3B3 A3B3 A3B4 A3B4 A4B2 A4B2 A4B4 A4B4

●

●

0.0 0.0 0.00 0.25 0.00 0.25

●●

●

●

A2B2

Specificity

A2B4

0.9

Specificity

Sensitivity

Sensitivity

● A2B3 0.6

1.00

●


0.9

A3B4 A3B4

0.9 Parameter 0.9 Parameter Parameter Parameter A2B2 A2B2 A2B2 A2B2 A2B3 ● A2B3 ● A2B3 ● A2B3 ● A2B4 A2B4 A2B4 A2B4 0.6 A3B2 0.6 A3B2 A3B2 A3B2

● ●

NOM

●● ● ●●●● ●● ●● ●● ●● ●● ●● ● ● ● 0.9

0.00 0.00

Sensitivity

0.75 0.50

A4B2 A4B2

A2B4 A2B4

A3B2 A3B2

●● ●● ●●● ●● ● ●● ● ●● ● ●● ●●● ●●● ●● ● ●● ●●● ●●● ●●● ● ●● ● ●

0.3 0.3

Sensitivity

0.50 0.25

A3B3 A3B3

A2B4

0.6 0.6

0.6 0.6 A3B2 A3B2

LFM LFM IP.RNK IP.RNK

●

● ●

Sensitivity

0.25 0.00

0.0

Sensitivity

0.00

● ●● ●● ●● ●● ●●●● ● ● ●●●

Optimal Balanced Accuracy

● ● 0.0

●● 1.00 1.00

●

●●

0.9 0.9 0.9 ●●0.9 Parameter

●

0.0

●● 0.75 1.00 0.75 1.00

A2B4 A2B4

LFM LFM IP.RNK IP.RNK

0.9

●

0.00 0.25 0.00 0.25

MP

● ● ● ●● ●● ●● ●● ●● ●●●● ● ● ●●●

Specificity

Sensitivity

0.3

0.25 0.00

Specificity Specificity

0.0

A4B4

●●

A2B3 ●● A2B3 ●●A2B3

●


0.3 0.3

A4B2 A4B2 A4B4 A4B4

MP

●

Sensitivity Sensitivity

0.3

A4B2 0.3 0.3 A4B4

0.9 Parameter Parameter

●


Specificity

A3B4

A4B2

MP

0.9

A3B2 A3B2

A3B4

●

0.9

A2B4 A2B4

A4B2 0.3 A4B4

A3B3

1.00 1.00

● Parameter Parameter Parameter A2B2 A2B2 A2B2

●

A2B3 ●● A2B3 A2B3 ●● A2B3

0.6

A3B4

1.00

0.75 1.00 1.00 0.75


MP

A2B4 0.6 0.6 A3B2

A4B2

A3B3

0.50 0.75 0.75 0.50

0.9 0.9

0.9 0.9

Specificity


1.00 0.75

A3B2

A3B4

● ● ● ●● ●● ●● ●●●● ● ● ●●● 0.75 0.50

0.6

0.25 0.50 0.50 0.25


●

Parameter Parameter Parameter Parameter A2B2 A2B2 A2B2 A2B2

A3B3 A3B3 A3B4 A3B4

A4B4

0.50 0.25

A2B4

0.9

Specificity

0.25 0.00

A3B3

0.00 0.25 0.25 0.00

Specificity

●

Specificity

A2B2

● A2B3

A3B3

●

NOM NOM.RNK NOMNOM.RNK


0.0

●

●

Sensitivity

A2B2

● A2B3

A4B2 A4B2 A4B4 A4B4

0.0 0.0

0.00 0.00

Specificity

0.0

0.6

A2B2

● A2B3

A3B2

1.00 1.00

Specificity

● 0.3

0.00

A2B2

A2B4 0.6 A3B2

0.75 1.00 1.00 0.75

A3B3 A3B3 A3B4 A3B4

A4B4

● ●● ● ●●● ●●● ●●● ●●● ● ●● ● ●● ●● ●● ●● ●● ●● ● ●● ●● 0.9 0.9 Parameter


0.3

●

Parameter

● A2B3 A2B4

0.6

0.9

Specificity

●

Sensitivity

Sensitivity

0.6

0.9 Parameter Parameter

0.9

0.50 0.75 0.75 0.50

A2B4 A2B4 A3B2 A3B2

A3B3 A3B3 A4B2

NOM NOM.RNK NOMNOM.RNK

IP

●● ● ●●●● ●● ●● ●● ●● ●●●● ● ● ●●● 0.9

0.25 0.50 0.50 0.25


A2B4 A2B4 A3B2 A3B2 A3B4

0.3 0.3

0.0 0.0

Specificity

IP

IP


IP

0.25 0.25 0.00

A2B3 ● A2B3 ●●A2B3 ● A2B3

0.6 0.6

●

0.00 0.00

Parameter Parameter Parameter Parameter A2B2 A2B2 A2B2 A2B2

●

●



● Sensitivity Sensitivity

0.9 Parameter 0.9 Parameter Parameter Parameter A2B2 A2B2 A2B2 A2B2


0.9 0.9

0.6 0.6

●●● ● ●● ●●● ●●● ● ●● ● ●● ● ●● ● ●● ●●● ●●● ●● ●● ●●● ●0.9 ●

● ● ●

●● ● 0.9 0.9

1.00

0.00

0.25 0.00

0.50 0.25

0.75 0.50


1.00 0.75

1.00

A=2,B=2 1.00

A=2,B=3 ● ●

● ● ● ● ● ● ●

A=2,B=4 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

A=3,B=2 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

A=3,B=3

A=3,B=4 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

A=4,B=2 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

A=4,B=4 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ●

Values

0.75

0.50

0.25

0.00 S1 S2 S3 S4 S5 S6

S1 S2 S3 S4 S5 S6

S1 S2 S3 S4 S5 S6

S1 S2 S3 S4 S5 S6

S1 S2 S3 S4 S5 S6

S1 S2 S3 S4 S5 S6

S1 S2 S3 S4 S5 S6

S1 S2 S3 S4 S5 S6

Variables with different shape parameters

Figure 4: Box-plot showing the distribution of confidence values in 6 artificially generated information sources (S1 - S6) with different shape parameters α (A) and β (B)

6

0.7 0.6

●

●

0.5

● ● ● ● ●

●

0.4

Correlation coefficient

● ● ●

0.2

0.3

●

A2B2

A2B3

A2B4

A3B2

A3B3

A3B4

A4B2

A4B4

Data sets with different shape parameters

Figure 5: Box-plot showing the distribution of pairwise Spearman rank correlations across 6 artificially generated information sources for different shape parameters α (B) and β (B). Rank correlations were computed for every pair of artificially generated matrices X (i) and X (j) .

7

● ●

●

0.9

●

Source

Sensitivity

0.6

●

0.6

4

0.3

0.3

0.0

0.0

● ●

1

● 2

● 2

3

3

4

4

5

5

● ●

6

0.25 0.50

4

● 0.3

0.0

0.0

●

●●

0.50 0.75

0.75 1.00

1.00

0.00

0.00 0.25

● 0.9

●

1

● 2

3

3

4

4

●

0.9

0.6

●

0.25 0.50

0.50 0.75

0.75 1.00

0.6

3 4

6

●0.3

0.25 0.50

0.25 0.50

0.50 0.75

1

● 2

3

3

4

4

● 5 6

5

● Specificity

●

1

● 2

0.6

0.6

●

6

●

0.3

0.75 1.00

0.0

1.00

● Source

1

● 2

●

● ●

3

3

4

4

5

5

6

6

0.0 0.50 0.75

●

0.75 1.00

●

0.6

●

● 0.3

4

5

5

6

6 0.3

4

0.75 1.00

1.00

6

● ● Source

●

0.6

● 0.3

●

0.0

1.00

1

● 2

3

3

4

4

5

5

6

6

●

●

●

●

Source

1

● ● 2

● ●

● ● ● ●●

0.0

0.00

0.25 0.00

0.50 0.25

0.75 0.50

1.00 0.75

1.00


LFM

LFM

0.6

0.9

LFM

0.9

Source

1

1

● 2

● 2

Source

0.6 3

3 4

4

5

5

6

6 0.3

0.3

●

0.6

● ●

●

● ●

0.3

Source

●● ●● ● ● ● ●1 ● ● 2 ● ● 3

●

● ●

●

1

● 2 3

4

4

5

5

6

6

● ●

Source

●

0.50 0.25

0.75 0.50

1.00 0.75

1.00

0.0

0.00

0.25 0.00

0.50 0.25

0.75 0.50

1.00 0.75

1.00


Source 0.75

0.25 0.50

0.50 0.75


MP.RNK MP.RNK

0.75 1.00

1

● 2

3

3

5

●

0.0

0.25 0.00


● 2

6

0.00 0.25

0.0

0.00

1.00

0.0

0.9

●

1.00 0.75

5 0.3

1.00

● ●●● ● ●●● ● ● ● ● ● ●

MP.RNK MP.RNK

●

●

Source

0.0

0.50 0.75

●

Method

4

IP

5

IP.RNK

6

LFM

0.50

MP MP.RNK NOM NOM.RNK

1.00 0.25

● ● ● ● ● ● ● ●● ● ● ●●● ●

0.9 0.00

1

● 2

● 2

3

3

4

4

5

5

6

6

Specificity

●

●

Source

1

0.6

●

Source

●

Source

1

1

● 2

● 2

3

3

4

4

5

5

6

6

0.6

●

1

2

3

4

5

6

Size of network

# Sources

Figure 6: Sensitivity and specificity for different number of artificially generated information sources (α = 2, β = 2). The plot shows the behavior of different priors (top) and the corresponding optimal balanced accuracies (bottom). The number of nodes for all the networks here were 20 (m=20). 0.3

0.3

0.3

●

●

0.0

0.00

0.00 0.25

● ● ● ● ● ● ● ●● ● ● ●●●

0.25 0.50

0.50 0.75

0.75 1.00

0.0

1.00

●

0.0

0.00

0.00 0.25

0.25 0.50

0.50 0.75

0.75 1.00



NOM.RNK NOM.RNK

NOM.RNK NOM.RNK

● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

Source

Sensitivity

● 0.6

0.6

0.3

0.3

0.0

0.0

1

● 2

● 2

3

3

4

4

5

5

6

6

0.6

0.6

● ● 0.3

0.00 0.25

0.25 0.50

0.50 0.75

0.75 1.00


LFM

LFM

● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●

1.00

0.0 0.00

● ●

●● 0.00

●

● ●

●

Source

● 1

●

●

0.9

0.3

●

1.00

● 0.9

●

Specificity

0.9

Specificity

0.9

Sensitivity

0.6 3

●

●

●

0.0

6

4

0.00

Specificity

Sensitivity

Sensitivity

0.3

3

5


●

3

●

●

0.0

1.00

Source

0.6

● ●

0.25 0.50

● 0.6

0.3

●

1

6

●

● ●

●

●

●● ● ●● ●

0.6

1

5

6

1.00 0.75

1

● ● ● ●● ●● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● 0.9

● 2 0.6● 2

● ●

●

0.75 0.50

0.9

● 2

0.3

Source

1

● ●

0.9

●

●

●

Source

1

● 2

●

0.25 0.50

0.50 0.25

Source

1●

4

0.75 0.50

4

5

●

3

0.50 0.25

3

4

●

0.9

Specificity

●

●

0.9

0.25 0.00

0.9

● 2

0.0

1

● 2

3

●

0.00

●

Source

1

● 2

●

NOM.RNKNOM.RNK

0.6

1.00

0.0


0.9

●● 0.00 0.25

0.9

0.3

●

NOM.RNKNOM.RNK

●

0.25 0.00

6

0.3

0.0

1.00

Source

0.00

●●

0.0

0.00

Specificity

Sensitivity

Sensitivity

●

●

4

●

●

●

0.00 0.25

6 0.3

1.00 0.75

● Source

IP.RNKIP.RNK

0.6

●●

Source

IP.RNKIP.RNK

●

0.00

0.6


LFM

● Source

1.00

0.9


●

0.75 1.00

● 0.9

●


●

0.0

0.50 0.75

NOM NOM

●

0.3

6

●●

0.00 0.25

5

6

Source


●

0.0

●●

0.6

0.3

●

0.0

0.00

Specificity

Sensitivity

Sensitivity

1.00

●

●●

0.75 0.50

●

●

5

●

0.0

0.3

●

4 5

NOM NOM

0.0

0.9

3

6

●

●●

1

5

0.3

0.9

4

5

● ● ● ●● ●● ●●● ● ● ● ●

Source

4

5

● ● ● ●● ●● ●●● ● ● ● ● ●

●

0.6 3

3

0.9

6


●

0.00 0.25

1


0.6

0.00

0.9

3

4

1.00 0.75

● ● ● ●● ●● ●● ●● ●● ● ● ● ● ●

● ● ● ● ●● ● ● ●●● ● ● ● ● ● ●

●

0.75 0.50

●

●

0.0

1

● 2

●

●

0.3

1

● 2 ●

0.50 0.25

1

● 2

3

0.9

Source

4

MP

5

● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● 0.9

0.6

●

0.25 0.00

● 2 0.6● 2

●

0.9

0.00

6

● ● ● ● ● ● ● ●● ● ● ●●●

0.00 0.25

1.00

Source

0.3

0.0

0.00

0.75 1.00

5 0.3

0.0

0.50 0.75

Source

1

● 2

0.6

●

0.25 0.50

MP

Specificity

Sensitivity

Sensitivity

0.3

● 0.9

●

0.0

0.50 0.25

MP.RNK MP.RNK

0.3

0.0

MP

Source

0.6

0.25 0.00

●● ●

0.6

Source

1

● 2

0.0

0.00

MP.RNK MP.RNK

Source

6

0.0


●


0.9

●

●

1.00

●

Specificity

●● 0.9

0.3

●

●

● ● ●●● ●●●


MP

●

1.00 0.75

5

6 0.3

4

5

●


Specificity

0.00 0.25

●

3

●

0.6

0.3

Specificity

● 0.0

0.00

● 2 0.6● 2 3

0.75 0.50

●

Source 1

6 0.3

● ●

0.0

●

0.6

6

0.3

●

●

0.9

6

Specificity

0.3

Source 1

0.6

0.9

●

5

Specificity

●

●

Sensitivity

●

Source

1

●● ●

● ●●● ●● ● ● ● ● ● ● ● ● ● ● ●

Sensitivity

0.6

●

Sensitivity

0.6

Source

●

●

0.9

Sensitivity

Sensitivity

Sensitivity

●

Specificity

●

Sensitivity

0.9

●

Sensitivity

●

0.9

IP


●

●

IP

Specificity

● 0.9

IP

0.50 0.25

4

Source

● ●

● ●

0.25 0.00

●

5

●

●

●

0.6 3

3

●

IP

1

● 2

●

●

0.00

Source

1

● 2

●

Specificity

Sensitivity

●

● ● ●●● ● ● ● ● ● ● ● ● ●

●

0.9

Specificity

●●

0.9 ●

Specificity

●

Specificity

●● 0.9

●Source Source ●

● ●

1

1

● 2

● 2

3

3

4

4

5

5

6

6

●

8

● ● ● ● ● ●●

0.0 0.00 0.25

0.25 0.50

0.50 0.75


LFM

LFM

0.75 1.00

1.00

● ●● ●● ●● ●● ●● ● ●● ●● ● ●

●● 0.9 0.9

Size

●

●

40

40

60

60

0.3 0.3

●

● ●

0.0 0.0

0.3

60

10 0.6

● 20 40

● 20 40

0.3

0.0

0.0

0.00

1.00

10

40 60

60

60

0.0

0.6

0.3

40 60

40

0.3

● ●● ●● ●● ●● ●● ●●● ●●

0.25 0.00

0.50 0.25

0.75 0.50

NOM

NOM


1.00 0.75

● ● ● ●●● ●● ●● ●● ●● ●● ●●● ● 0.9

NOM

NOM

1.00 0.75

40 60

40

●

0.0

0.00

0.25 0.00

0.50 0.25

0.75 0.50


1.00 0.75

0.0

●

●●20

0.25 0.00

0.50 0.25

0.75 0.50


1.00 0.75

1.00

●

●

Size

Size 10

40

40

60

60

●

40 60

● ● ● ● ● ● ● ●● ●● ● ● ●

●

Size

0.50 0.25

0.75 1.00

0.6

0.6

0.0

●

● ● ● ●● ● ● ● ● ●● ●● ●● ●

10

● 20

40

40

60

60

0.9

●

Size

● ●

40

10

● 20

● ●

0.3

40

● ●

60

● ●

● ●

● ●

0.0

0.75 0.50

●●●

1.00

0.00

0.6

40

0.25 0.00

0.25

0.50

0.50 0.25

0.75

0.50 Probability Probability

0.75

1.00

0.75 0.50


1.00 0.75

1.00

LFM

●●●

Size

10

● 20

0.6

40 60

60 0.3

● 0.0

0.25

●

0.9

0.3

●● ●

●

Size

10

● 20

0.0

0.00

1.00 0.75

Size

●

● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

●

●

●

●

0.0

0.00

1.00

Size

10

● 20

0.3

LFM

●


IP.RNK IP.RNK

0.50 0.50 0.75


60 0.3

0.0 0.25 0.00

0.25 0.25

● ●

0.9 ●

0.6 10 0.6 ● 20

0.3

0.00

1.00

60

0.9

0.3

●

10

●

0.6

● 0.0

●

●

0.3

40

60

●

LFMLFM

0.9

60

0.3

40

0.00 0.00

1.00

Size

10 0.6 ● 20

10

● 20 ● 20

60

0.9

Specificity

Sensitivity

0.3

0.75 0.50

Size

10

● 20

Size

Size 10 0.6 0.6 ● 20

0.0 0.0

0.50 0.25


●

Size

0.6

10

● 20

40

●

0.25 0.00

0.9

0.9

0.3 0.3

●●

0.9

0.6

60

0.0

0.00

1.00

40

0.3

0.0

●●●

● 20

●

Specificity

0.0●

0.00

●

60

10

Size

10

● 20

NOM.RNK NOM.RNK

0.9 0.9

0.6

●●●

0.0

0.00

1.00

Specificity

● 20

0.75 1.00

0.0

● ●● ● ● ● ● ●● ● ● ● ●● ●● ●● ●● ●●

●●●

Size

10 0.6 ● 20

0.50 0.50 0.75


1.00

0.6

NOM.RNK NOM.RNK

Size

10

0.25 0.25

MP

60

0.3

Specificity

Size

0.00 0.00

1.00

● ● ●●●●● ● ● ● ●● ●● ●● ●● ●●

Specificity

● 0.6

0.3

0.0

1.00 0.75

0.9

Specificity

Sensitivity

Sensitivity

0.3

0.75 0.50

●●● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ● 0.9

●

0.50 0.25


MP

0.9

0.6

0.25 0.00

MP

●

0.9

Sensitivity

0.00

1.00

40

60

Size

10

Specificity

●

●●●

Sensitivity

MP

1.00 0.75

Sensitivity

0.75 0.50

Sensitivity

0.50 0.25


10

● 20

40

●

Size

● 20 ● 20 40

0.0 0.0

Sensitivity

0.25 0.00

0.0

Specificity

0.0

0.00

● ● ● ●● ●● ●● ●● ●●● ●●

10

● 20

1.00 0.75

● ● ● ● ●● ● ●● ●● ●● ●● ●● ● 0.9

0.9

●

● 0.0

0.75 0.50

MP.RNKMP.RNK

40

0.3 0.3

0.50 0.25

MP.RNK MP.RNK

Size

0.3

0.25 0.00


10 0.6 0.6 ● 20

60

60

●

0.75 1.00

Size

10

0.6

Sensitivity

40

● 0.3

Size

Specificity

● 20

●

0.50 0.50 0.75


0.9 0.9

Size

10

0.6

0.3

● ● ● ●

●●●

0.9

Specificity

●

Sensitivity

Sensitivity

Size 0.6

● ● ●● ●● ●● ●●●●●

Specificity

0.9

0.9

0.25 0.25

IP

●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● 0.9

0.3

Specificity

IP

IP

Sensitivity

IP

0.6

● ●

0.00 0.00

Size

Size

10

0.6 ● 20 ● 20

Specificity

Sensitivity

Sensitivity

10

Specificity

Size

● ● 0.6 0.6

●●●

0.9

0.9

● ●

1.00

0.00

0.0 0.25 0.00

0.50

0.75

0.25 0.50 Probability Probability

1.00 0.75

1.00

IP.RNK IP.RNK 1.00

●● 0.9

Sensitivity

Sensitivity

0.6

0.3

Size

●

10

● 20

0.6

40

●

0.3

●

60

●

100.6 ● 20 40

0.0 0.25 0.00

40 60

60 0.3

0.50

● ● ●● ● ● ● ● ● ● ● ●●● 0.75

1.00

0.25 0.50 0.75 Probability Probability

0.0 0.00

1.00

●

0.9

●

0.25 0.00

0.50

0.9

IP.RNK LFM

0.50

MP MP.RNK NOM NOM.RNK

1.00

0.00

●●●

10

●

20

40

60

Size of network

Size

Size

Specificity

40 60

Specificity

10

● 20

0.6

100.6

● 20 40

10

● 20

0.6

40

Size

60

10

● 20 40 60

0.3

0.3

●

● ● ● ● ● ● ● ●●● 0.0 ● ● ● ● ● ● ●

0.00

0.25 0.00

0.50

0.75

●●●

0.0

1.00


●

0.0

0.00

0.25 0.00

1.00

NOM.RNK NOM.RNK

0.50

0.75

1.00


●● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ● ●

●

●●

● 0.9

0.9

0.9 Size

0.6

Size

Specificity

10

● 20

Specificity

Sensitivity

Size 0.6

1.00

NOM.RNK NOM.RNK

0.9

Sensitivity

IP

60

MP.RNK

● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● 0.9

0.3

0.0

Method

40

0.25

1.00

60

0.3

10

● 20

Figure 7: Sensitivity and specificity for different network sizes (m = 20, 40 and 60 nodes) for artificially generated information sources (α = 2, β = 2). The plot shows the behavior of different priors (top) and the corresponding optimal balanced accuracies (bottom). The number of sources for all the networks here were 6. Sensitivity

Sensitivity

●

0.9

Size 0.6

0.75


MP.RNK

●

Size

0.0

MP.RNK MP.RNK ●

0.75

10

● 20

0.6

● ●

0.00

Size

Size

0.3

● 0.0

0.9

●

●

●●●


0.9

Specificity

●

Specificity

0.9

● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ●●

100.6

40

● 20

60

40

10

● 20

0.6

40 60

60 0.3

0.3

0.3

●

● ● 0.0 0.00

0.25 0.00

0.50 0.75 Probability 0.25 0.50

Probability

1.00 0.75

0.0

● 1.00

10

9

● 20 40 60

0.3

0.0

Size

●● ● ● ● ● ● ● ● ●● ● ● ● ● ●

●

● ●

●

0.0 0.00

0.25 0.00

0.50 Probability 0.25

0.75 0.50

Probability

1.00 0.75

1.00

10

● 20 40 60

IP.RNK IP.RNK

IP.RNK IP.RNK

0.9

0.9

0.9●

0.9

●

10

● 20

●

●

40

0.3

●0.3

0.0

0.0

0.9

0.9

0.3

Size

10

● 20

● 20

40

40

60

60

0.3

0.6

0.9

0.9

60

●

● ● ● ● ● ● ● ● ● ● ●●●

0.25 0.50

0.50 0.75


●●●

0.75 1.00

1.00

● 20

0.6

0.3

● 20

40

40

60

60

0.0

0.0 0.00 0.25

●

10

● 20

0.3

0.3

0.3

0.0

0.0

0.0

0.0 ●

40

40

60

60

0.250.50

0.500.75


MP

0.751.00

0.00

1.00

0.000.25

0.250.50

0.500.75


MP

MP

0.751.00

0.9

0.9

0.3

●

Size

10

10

● 20

● 20

40

40

60

60

0.6

0.3

0.3

0.50 0.75

●●●

0.75 1.00

0.9

●

40

40

60

60

0.00 0.25

10

0.6

● 20

0.00

0.000.25

0.250.50

0.500.75


0.751.00

0.00

1.00

0.000.25

0.250.50

0.500.75


NOM NOM

0.25 0.50

0.751.00

0.00 0.25

0.00

1.00

0.25 0.50

0.50 0.75

0.75 1.00

40

40

60

60

1.00

0.00 0.25

0.25 0.50

LFM

● 0.3

●● 0.0

0.0

0.00

0.000.25

0.250.50

0.500.75


0.751.00

● ● ● ● ● ●●

0.000.25

●

0.9

0.9

0.9

10

● 20

40

40

60

60

●

0.250.50

0.0

0.00

0.000.25

●

● ●● ●● ● ● ● ● ● ● ● ● ● ●●● 0.250.50

0.500.75


0.751.00

0.0

0.0

10

● 20

40

40

60

60

0.6

0.3

● ●● ●● ●● ●● ●● ●● ●●● ● ● ● 0.250.50

0.500.75


0.751.00

40

60

60

● ●

0.0

0.00

0.25 0.00

0.50 0.25

0.75 0.50


1.00 0.75

1.00

STRINGSTRING

0.9

●

●● ●● ●● ●● ●● ●● ●● ●●

●

●

0.9

40

60

60

0.6

Size

Size

10

10

● 20

● 20

40

40

60

60

0.3

0.6

0.6

0.3

0.3

0.0

0.0

Size

10

10

● 20

● 20

40

40

60

60

● 0.0 0.250.50

0.500.75


0.751.00

1.00

●

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

●

●

0.0

0.00

0.25 0.00

0.50 0.25

0.75 0.50


1.00 0.75

1.00

0.00

0.25 0.00

0.50 0.25

0.75 0.50


1.00 0.75

1.00

● ●● ●● ●● ●● ●● ●● ●●● ● ● ●

0.6

●

10

● 20

40

40

60

60

●

0.3

0.0

● 0.0

0.00

1.00

Size

10

● 20

0.000.25

0.250.50

0.500.75


0.751.00

1.00

NOM.RNK NOM.RNK

●

●

●

0.9

●

0.3

0.0

0.0

Size

Size

Specificity

0.6

10

10

● 20

● 20

40

40

60

60

0.6

0.3

0.00

●

0.000.25

0.250.50

0.500.75


0.751.00

0.6

●

● ● ● ●● ●● ●● ● ● ● ●

0.9

0.000.25

0.250.50

Size

●

●

0.00

10

10

● 20

● 20

40

40

60

●

0.6

60

0.250.50

● ●● ●● ●●● ● ● ● 0.500.75


40

60

60

● ●

0.751.00

1.00

0.751.00

1.00

●

● ●

● ●● ●● ●●● ● ● ● ●

● Size

●

●

10

● 20

0.00

40 60

●

0.0 0.000.25

0.250.50

0.500.75


0.751.00

10

● 20

60

0.3

0.0

10

Size

40

●

●

●

●

0.0

●

0.6

● 0.3

0.3

0.000.25

Size

Specificity

●

●

0.0

●

●

●

0.500.75


●

0.9

Specificity

Sensitivity

0.3

0.6

10

● 20

40

LFM LFM

●

●

Size

10

● 20 ●

0.0

0.00

1.00

●

0.3

0.0

●

●

LFM LFM

● 0.9 ●

●

0.9

Specificity

●● ●● ●● ●● ●● ●● ●● ●● ●

0.9

Sensitivity

Sensitivity

10

● 20

40

0.9

●●

Sensitivity

0.0

1.00

Size

10

● 20

Figure 8: Sensitivity and specificity for different network priors and STRING for KEGG sub-graphsm = 20, 40 and 60 nodes ( top to bottom). Corresponding oBAC are available in the main document.

0.3

0.6

●●●

1.00 0.75

Size

10

40

Size

Specificity

Size

10

● 20

Size

0.9

0.75 0.50


Size

0.3

● 0.9

0.000.25

●●

0.6

0.50 0.25

● ● ● ●● ● ●

Size

0.3

●

●

●●●

● ●

●

●

10

NOM.RNK NOM.RNK

0.9

60

0.9

● 20 0.6 ● 20

0.000.25

●

0.00

0.25 0.00

0.9

0.0

●

●

60

●

●

● ●● ●●● ●● ●

0.6

STRINGSTRING

●

0.00

Specificity

Sensitivity

Sensitivity

0.3

40

0.6

0.3

●

LFM

●

MP.RNK MP.RNK

0.6

0.3

40

●

0.0

0.00

0.3

0.0

1.00

Size 0.6

1.00

0.6

● 0.9 ●

0.751.00

10

● 20

● ● ●● ●● ●● ●● ●● ●●● ● ● ● ●

MP.RNK MP.RNK

0.9

0.500.75


Size

10

● 20

0.3

●

1.00

●

●

0.0

Size

● ●

0.75 1.00

●

●

0.6

0.3

0.3

●

●

●

Sensitivity

10

● 20 ●

● 0.3

60

0.3

●

●

0.9

●

●

● ●

Size

Specificity

0.6

Size

Specificity

Sensitivity

Sensitivity

Size

●

60

0.6

IP.RNKIP.RNK

●

0.6

●

40

0.0

IP.RNKIP.RNK

0.9

●

40

0.0

0.00

1.00

●

●

0.9

●

Specificity

0.3

10

0.50 0.75

LFM

● ●

Size

● 20 0.6 ● 20

60

●


Specificity

●

60

0.3

0.0

●

40

60

0.0

0.00

Specificity

60

40

● Size 10

0.6

●

10

● 20

40

●

● ● ● ● ●●● ●● ●● ● ●

Size

10

● ● 20

●

● 0.3

0.0

Specificity

40

0.6

Sensitivity

10

● 20

● 0.9 ●

0.9

●

● Size

10

● ● ● ●● 20 ●

●

●

Sensitivity

0.3

Size

●

●

0.6

Specificity

Sensitivity

Sensitivity

● 0.6

●

0.9

Sensitivity

0.9

●

Specificity

0.9

●

●


LFM

1.00

0.6

●●

●● ●● ● ● ●● ● ● ● ● 0.9 ●

0.75 1.00

Size

10

● 20 0.6

●

●

NOM NOM

0.50 0.75


Size

0.3

0.0

0.0

0.0

60

0.9

0.9

●● 0.0

40

60

●

●

●

●

● ● ●● ●● ●● ●● ●● ●● ●●● ● ● ● 0.0 ● ●

0.0

10

● 20

40

●

0.00

0.3

0.3

0.3

Size

10

● 20

●

●

Size

10

●●●

0.0

0.0

1.00

Size

10

● 20 0.6 ● 20

0.6

●● ●● ●● ●● ●● ●●● ●●

NOM.RNK NOM.RNK

● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●

●● 0.9

Sensitivity

Size

Specificity

0.6

● ●● ●● ●● ●● ●● ●● ●● ●●● ● ● ● ● ● 0.9

Specificity

Sensitivity

Sensitivity

Size 0.6

0.25 0.50


1.00

0.6

NOM.RNK NOM.RNK

MP

●

0.9

0.00 0.25

0.00

1.00

0.75 1.00

0.3

Specificity

0.000.25

● ● ● ● ● ● ● ● ● ● ●● ● ● ●

●

● 0.3

●

0.50 0.75

Size

10

● 20 0.6

Specificity

0.00

Sensitivity

●● ● ●● ●● ●● ●● ●● ●● ●●● ● ● ● 0.0 ●● ●

0.25 0.50


Size

● 0.0

60

0.9

0.9

0.6

0.3

40

MP.RNKMP.RNK

Size

10

10

● 20

60 0.3

0.00

●

●

Size

10 0.6

Size

10

● 20 40

●

●

●● ●● ●● ●● ●● ●● ●● ●● ●●● ● ● ●

Sensitivity

Size

Specificity

10 0.6

Size

● 0.6

MP.RNKMP.RNK

IP

0.9

Specificity

Sensitivity

Sensitivity

Size 0.6

40

Specificity

●● 0.9

●

Specificity

IP

IP

●

0.00 0.25

Sensitivity

IP

10

● 20 0.6

●

●

0.00

60

●

●●●

Size

Specificity

Size

0.6

Specificity

Sensitivity

Sensitivity

●

0.6

● ● ● ● ● ●● ●● ● ● ● ● ● ●

●

1.00

356

7043

7040

355

7046

7048

5495

1616

5536

6885

7042

208

6416

4217

5606

408

5293

5295

8503

5296

5294

5170

208

7043

7040

355

7046

7048

5495

1616

5536

6885

1852

2316

3479

356

5601

6416

408

1852

2316

5601

5563

6195

27330

6197

5594

9479

4296

5595

11072

3091

5595

23542

3630

2475

10000

208

4217

5606

5595

7042

5595

54541

5778

1849

5594

1432

4137

7249

1843

80824

6300

4149

5600

3303

5601

5609

5602

4208

●●●●●● 8651

4093

658

4091

4090

4086

8454

728622

9765

4087

4088

9655

23624

122809

30837

867

90

●●●●●●●●●●● 3459

163702

3581

3561

3572

2057

4352

3575

3977

3587

3598

4089

3399

3397

3398

1875

2033

6667

5308

1387

●●● 3717

7297

3716

145957

2549

5294

5296

5293

2064

5295

2065

2066

23533

25759

9542

145957

2065

2066

2069

1839

7039

1950

1956

6464

53358

5336

5335 8503

208

5579

1956

23533

5296

5293

25759

399694

6464

867

5578 10000

2885

1026 6655

3458

814

5530

810

805

5257

817

163688

815

801

64750

808

130399

5137

115

4638

107

113

114

5568

7046

657

4092

4086

4090

4093

3398

3399

196883

1875

5613

9372

4091

4087

4088

1874

5308

2033

5567 4609

Figure 9: The 10 graphs with # nodes=20 sampled from KEGG via random walks. The node IDs represent Entrez IDs.

11

2549 1857

5293

208

3714

51107

5663

4851

4854

4855

4853

207

1027

2475

5335

5336

5532

5578

572

23291

100137049

1855

23533

10000

1026

55851

2619

64399

4036

6469

3549

50846

5582

56848

2736

2737

3845

5894

5727 5605

5336

3358

2911

623

3357

146

3356

5579

5894

56848

5605

5604

5594

5595

5319

123745

2776

2767

9630

5330

784

3791

5566

5567

5568

5613

25780

6714

5906

5908

998

6237

5293

8503

5291

5295

673

5881

1456

207

5336

5335

5533

5582

5578

56848

8877

1453

5321

2737

5879

2735

2736

3845

7479

656

8643

652

7478 5894

Figure 10: 10 directed acyclic graphs (DAGs) with # nodes=10 selected via random walks on KEGG pathways. The node IDs represent Entrez IDs.

12

Reconstruction

Reconstruction

0.9

0.9

●

●

●

●

●

●

●

● METHOD

METHOD

IP

IP

● IP.RNK

Specificity

Sensitivity

● IP.RNK 0.6

LFM 0.6 MP

LFM

MP.RNK

MP.RNK

NOM

NOM

NOM.RNK

NOM.RNK

NP

NP

MP

0.3

0.3

●

●

●

●

●

●

●

●

0.0

0.0

5

10 2

20

50 4

100

500 6

1000

5000 8

5

10 2

20

450

100

500 6

1000

5000 8

# Samples

# Samples

Figure 11: Sensitivity and specificity of Bayesian Network reconstruction for simulated data for a network of 10 nodes (m = 10) with different kinds of prior. The corresponding balanced accuracies are shown in the main document

Figure 12: MetacoreTM network for 37 selected genes from vant’t Veer breast cancer data set. The network was retrieved using the shortest path algorithm (path length of 2). The entire set of interaction with literature evidences are in ‘Supplement 2’

13

References ˜ [1] Ethan G Cerami, Benjamin E Gross, Emek Demir, Igor Rodchenkov, Azagin Babur, Nadia Anwar, Nikolaus Schultz, Gary D Bader, and Chris Sander. Pathway commons, a web resource for biological pathway data. Nucleic Acids Research, 39:D685–D690, 2011. [2] Holger Fröhlich, Mark Fellman, Holger S¨ ultman, and Tim Beißbarth. Predicting pathway membership via domain sinatures. Bioinformatics, 24:2137–2142, 2008. [3] Holger Fröhlich, Mark Fellman, Holger S¨ ultman, Annemarie Poustka, and Tim Beißbarth. Large scale statistical inference of singnaling pathways from rnai and microarray data. BMC Bioinformatics, 8(386), October 2007. [4] Florian Hahne, Alexander Mehrle, Dorit Arlt, Annemarie Poustka, Stefan Wiemann, and Tim Beißbarth. Extending pathways based on gene lists using interpro domain signatures. BMC Bioinformatics, 9(1):3, 2008. [5] Marc Johannes, Holger Fröhlich, Holger S¨ ultmann, and Tim Beißbarth. pathclass: an r-package for integration of pathway knowledge into support vector machines for biomarker discovery. Bioinformatics, 27(10):1442–1443, 2011. [6] R I Kondor and J Lafferty. Diffusion kernels on graphs and other discrete structures. In Proceedings of the ICML, 2002. [7] Dekang Lin. An information-theoretic definition of similarity. In In Proceedings of the 15th International Conference on Machine Learning, pages 296–304. Morgan Kaufmann, 1998. [8] Nicola J Mulder, Rolf Apweiler, Terri K Attwood, Amos Bairoch, Alex Bateman, David Binns, Margaret Biswas, Paul Bradley, Peer Bork, Phillip Bucher, Richard Copley, Emmanuel Courcelle, Richard Durbin, Laurent Falquet, Wolfgang Fleischmann, Jerome Gouzy, Sam Griffith-Jones, Daniel Haft, Henning Hermjakob, Nicolas Hulo, Daniel Kahn, Alexander Kanapin, Maria Krestyaninova, Rodrigo Lopez, Ivica Letunic, Sandra Orchard, Marco Pagni, David Peyruc, Chris P Ponting, Florence Servant, Christian J A Sigrist, and InterPro Consortium. Interpro: an integrated documentation resource for protein families, domains and functional sites. Brief Bioinform, 3(3):225– 235, Sep 2002. [9] B. Raghavachari, A. Tasneem, T. M. Przytycka, and R. Jothi. Domine: a database of protein domain interactions. Nucleic Acids Res, 36(Database issue):D656–61, 2008.

14

[10] Andreas Schlicker, Francisco S Domingues, Jörg Rahnenf¨ uhrer, and Thomas Lengauer. A new measure for functional similarity of gene products based on gene ontology. BMC Bioinformatics, 7:302, 2006.

15

Table 1: Pairwise Wilcoxon test for model performance comparison (false discovery rates) for KEGG sub-graphs with m = 10 nodes Methods IP IP.RNK LFM MP IP.RNK LFM 0.0041 0.0041 MP 0.0203 0.0203 0.0352 MP.RNK 0.0423 0.0423 0.0203 0.0304 NOM 0.0041 0.0041 0.2974 0.0041 NOM.RNK 0.0041 0.0041 1.0000 0.0099 STRING 0.0070 0.0070 0.0041 0.0945

MP.RNK 0.0041 0.0041 0.5111

NOM NOM.RNK 0.0041 0.0041 0.0041

Table 2: Pairwise Wilcoxon test for model performance comparison (false discovery rates) for KEGG sub-graphs with m = 20 nodes Methods IP IP.RNK LFM MP IP.RNK 0.0036 LFM 0.0036 0.2712 MP 0.0144 0.0064 0.0260 MP.RNK 0.0036 0.6250 0.4200 0.0348 NOM 0.0036 0.0036 0.5773 0.0036 NOM.RNK 0.0036 0.0036 0.4648 0.0036 STRING 0.0036 0.0036 0.0036 0.0260

MP.RNK 0.0091 0.0064 0.0036

NOM NOM.RNK 0.1022 0.0036 0.0036

Table 3: Pairwise Wilcoxon test for model performance comparison (false discovery rates) for KEGG sub-graphs with m = 40 nodes Methods IP IP.RNK LFM MP IP.RNK 0.0068 LFM 0.0039 0.0980 MP 0.0594 0.2166 0.0091 MP.RNK 0.0039 0.4038 0.0594 0.0201 NOM 0.0039 0.0495 0.6481 0.0039 NOM.RNK 0.0039 0.0039 0.9219 0.0039 STRING 0.0039 0.0068 0.0039 0.0383

16

MP.RNK 0.0039 0.0039 0.0039

NOM NOM.RNK 0.0091 0.0039 0.0039

Table 4: Area Under Curve (Standard deviations in brackets) values for KEGG pathway reconstruction with different number of nodes (in brackets are the standard deviations) Method 10 IP 0.500 (0.00) IP.RNK 0.500 (0.00) MP 0.828 (0.04) MP.RNK 0.833 (0.09) NOM 0.879 (0.05) NOM.RNK 0.801 (0.02) LFM 0.869 (0.05)

20 0.500 (0.00) 0.741 (0.03) 0.795 (0.03) 0.807 (0.08) 0.875 (0.04) 0.881 (0.02) 0.907 (0.01)

17

40 0.500 (0.00) 0.771 (0.04) 0.803 (0.05) 0.855 (0.10) 0.831 (0.04) 0.864 (0.01) 0.918 (0.02)

60 0.500 (0.00) 0.779 (0.04) 0.781 (0.05) 0.882 (0.11) 0.887 (0.06) 0.856 (0.02) 0.923 (0.03)

Table 5: Pairwise Wilcoxon test for Bayesian Network reconstruction for different sample size (5 to 5000, see main article for details) sample size

5

10

20

50

100

500

Methods IP.RNK LFM MP MP.RNK NOM NOM.RNK NP IP.RNK LFM MP MP.RNK NOM NOM.RNK NP IP.RNK LFM MP MP.RNK NOM NOM.RNK NP IP.RNK LFM MP MP.RNK NOM NOM.RNK NP IP.RNK LFM MP MP.RNK NOM NOM.RNK NP IP.RNK LFM MP MP.RNK NOM NOM.RNK NP

IP IP.RNK LFM MP MP.RNK 0.85 0.11 0.11 1.00 0.85 0.11 0.17 0.17 1.00 0.17 0.11 0.11 1.00 0.11 0.85 0.17 0.19 0.83 0.17 0.85 1.00 1.00 0.11 1.00 0.19 0.703 0.032 0.062 1.00 0.703 0.032 0.437 0.683 0.032 0.437 0.272 0.309 0.683 0.272 0.613 0.227 0.125 0.418 0.227 0.483 0.026 0.105 0.259 0.026 0.683 0.422 0.041 0.041 1.00 0.422 0.041 0.049 0.102 0.060 0.049 0.041 0.041 0.906 0.041 0.240 0.049 0.041 0.466 0.049 0.422 0.304 0.722 0.041 0.304 0.049 0.922 0.016 0.016 1.00 0.922 0.016 0.016 0.016 0.154 0.016 0.018 0.016 0.864 0.018 0.120 0.046 0.020 0.197 0.046 0.703 0.197 0.653 0.016 0.197 0.016 0.625 0.029 0.018 1.00 0.625 0.029 0.102 0.126 0.029 0.102 0.031 0.018 0.240 0.031 0.177 0.029 0.029 0.177 0.029 0.206 0.102 0.625 0.018 0.102 0.237 0.675 0.016 0.016 1.00 0.675 0.016 0.092 0.092 0.016 0.092 0.016 0.016 1.000 0.016 0.022 0.038 0.022 0.063 0.038 0.378 0.016 0.016 0.197 0.016 0.177 18

NOM 0.37 0.13 0.957 0.959 0.304 0.041 0.059 0.016 0.625 0.102 0.049 0.197

NOM.RNK 0.27 0.905 0.049 0.026 0.031 0.799

Table 6: continuation of table 6 sample size

1000

5000

Methods IP.RNK LFM MP MP.RNK NOM NOM.RNK NP IP.RNK LFM MP MP.RNK NOM NOM.RNK NP

IP IP.RNK LFM MP MP.RNK 0.625 0.022 0.022 1.00 0.625 0.022 0.048 0.048 0.025 0.048 0.022 0.022 1.00 0.022 0.025 0.034 0.025 0.048 0.034 0.533 0.031 0.022 0.424 0.031 0.034 0.675 0.014 0.011 1.00 0.675 0.014 0.037 0.025 0.019 0.037 0.014 0.011 1.000 0.014 0.019 0.031 0.019 0.075 0.031 0.957 0.014 0.011 0.079 0.014 0.011

19

NOM 0.048 0.424 0.037 0.188

NOM.RNK 0.198 0.011

Boosting Probabilistic Graphical Model Inference by ... - ScienceOpen

Boosting Probabilistic Graphical Model Inference by ... - ScienceOpen

Suggest Documents

Probabilistic Inference in General Graphical

Abductive Inference with Probabilistic Graphical Models

Graphical Model Inference using GPUs

Bayesian Inference of Graphical Model Structures ...

Probabilistic Graphical Model Representation in ...

the TO-DAG probabilistic graphical model

Investigation of Probabilistic Graphical Model ... - Semantic Scholar

Probabilistic Graphical Model Structure Learning ...

A Probabilistic Graphical Model for Word-Level

Goal-Based Imitation as Probabilistic Inference over Graphical Models

Probabilistic Inference in General Graphical Models ... - IGI, TU-Graz

Probabilistic Inference in BN2T Models by Weighted Model Counting

Assessing Probabilistic Inference by Comparing the ... - MDPI

Boosting Probabilistic Choice Operators - Inria

Nesting Probabilistic Inference

Optimal control as a graphical model inference problem - Springer Link

Graphical Model Inference in Optimal Control of Stochastic Multi-Agent ...

Incorporating networks in a probabilistic graphical model to find ...

Training a Probabilistic Graphical Model with Resistive ... - arXiv

SNPest: a probabilistic graphical model for ... - Semantic Scholar

Training a Probabilistic Graphical Model with Resistive ... - arXiv

Probabilistic graphical model based approach for water ... - arXiv

A Probabilistic Graphical Model for Mallows Preferences - MPref 2016

Mixture of Trees Probabilistic Graphical Model for Video Segmentation