Studying the Conditions for Learning Dynamic Bayesian Networks to Discover Genetic Regulatory Networks

R. J. P. van Berlo, E. P. van Someren, and M. J. T. Reinders
Information and Communication Theory Group, Department of Mediamatics, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, P.O. Box 5031, 2600 GA Delft, The Netherlands
[email protected] Learning regulatory interactions between genes from microarray measurements presents one of the major challenges in functional genomics. This article studies the suitability of learning dynamic Bayesian networks under realistic experimental settings. Through extensive artificial-data experiments, it is investigated how the performance of discovering the true interactions depends on varying data conditions. These experiments show that the performance most strongly deteriorates when the connectivity of the original network increases, and more than a proportional increase in the number of samples is needed to compensate for this. Furthermore, it was found that a lower performance is achieved when the original network size becomes larger, but this decrease can be greatly reduced with increased computational effort. Finally, it is shown that the performance of the search algorithm benefits more from a larger number of restarts rather than from the use of more sophisticated search strategies. Keywords: Bayesian networks, genetic network modeling, structure search
1. Introduction

Current advances in microarray technology have made it possible to simultaneously measure the gene expression levels of most of the genes in an organism. Microarray measurements taken over time allow us to monitor the changes in the state of expression, revealing causal relationships between genes. Discovering such relationships will definitely contribute to a better understanding of biological systems. Due to the high costs of microarrays, the number of time-course measurements is generally small with respect to the number of genes (thousands). This so-called dimensionality problem, as well as the fact that the data contain measurement noise, makes it particularly difficult to reliably discover genetic interactions from expression data. The discovery of these regulatory interactions is known as genetic network modeling. By incorporating prior knowledge about genetic networks (e.g., robustness, sparsity)
into the modeling process [1–3] and by employing heuristics such as the use of a clustering algorithm to group sets of genes that have similar expression profiles [4, 5], the dimensionality problem can be partly alleviated. Thus far, a number of different modeling techniques have been proposed with varying success [6–11]. A review of genetic network models can be found in van Someren et al. [12] and de Jong [13]. In the past decade, research on learning Bayesian networks (BN) from data has received much attention [14]. The use of probabilistic models for modeling genetic interactions seems a natural choice [15], considering the inherent stochastic aspects of gene expression [16] and the presence of measurement noise. Indeed, Bayesian networks have been shown to be most successful when learned on “static” microarray data [17]—that is, data consisting of measurements after genetic perturbations or environmental changes [12]. It has been argued that Bayesian networks are unsuited for modeling genetic networks that are learned from time-course microarray data because of their need for a substantial number of samples for training and the fact that cyclic interactions are not allowed in BNs [18]. Cycles and
feedback loops are, however, important aspects of many biological networks. Most of the dynamic genetic network models that have been proposed thus far are primarily extensions of linear models [4, 6, 19]. Experimental results indicate that the linear approach performs well under realistic data conditions (i.e., relatively few measurements with respect to the number of genes) in cases in which the relations are indeed linear and the noise level is low [20]. However, genetic interactions are known to possess several nonlinear aspects, such as AND functionality and switching behavior. Less attention has been paid to dynamic BNs [15, 21]. A principal choice in modeling BNs is between a discrete and a continuous representation of the gene expression levels. Although discrete dynamic BNs (DDBN) are presumably more data-demanding, it would be interesting to adopt DDBNs because they possess the potential of probabilistic models while allowing for nonlinear relationships1 between genes and the natural addition of prior information in the form of a soft constraint. In this article, we will investigate the suitability of discrete dynamic Bayesian networks for modeling genetic networks from time-course microarray data. First, we have simulated a large number of gene expression profiles aimed at analyzing the performance of DDBNs under various parameter settings, such as the size of the data set, the connectivity, and so forth. Second, the modeling approach was carried out on a subset of the Saccharomyces cerevisiae expression data set of Spellman [22]. From a systems biology perspective, the modeling approaches presented in this article contribute to the first step of gaining a systems-level understanding of any biological system—that is, the identification of the structure (and parameters) of the system [23] (see Fig. 1a). Knowledge about the structure is also necessary for the second step, in which the behavior of the system is analyzed by performing simulations and sensitivity analysis. A better understanding of biological systems at a systems level is a prerequisite for developing methods to control, design, and modify systems for desired properties (e.g., to improve product formation or design more effective medicine). Throughout all stages, advancements are made by going through iterations of the experimental cycle—that is, (1) generating measurements by performing experiments, (2) applying genetic network models to come up with improved interaction diagrams, and (3) validating hypothesized interactions and generating experiment suggestions (see Fig. 1b). First, in section 2, an introduction to dynamic BNs and aspects related to modeling genetic networks are given. Then, in section 3, a description of the performed artificial-data experiments and their results is presented. The application of the Bayesian approach to real biological data is exemplified in section 4. In the last section, some directions for future work are discussed.

1. In practice, when using a continuous representation, the relationships are often assumed to be linear.
2. Dynamic Bayesian Networks

A dynamic Bayesian network is a representation of a joint probability distribution over a set of random vectors X[1] ∪ X[2] ∪ . . . ∪ X[T]. Each random vector X[t] = [X_1[t], . . . , X_N[t]] consists of the set of N random variables, X_i[t], that change during the process over time t ∈ {1, . . . , T}. In this article, the random variables represent the individual gene expression levels. Consequently, x_i[t] denotes the actual measured gene expression value of gene i at time t (i.e., X_i[t] = x_i[t]), and x[t] is the complete assignment of all gene expression levels at time t (i.e., X[t] = x[t]) and corresponds to the results from a single microarray measurement.2 A gene expression level is a measure of that gene’s activity and relates to the number of mRNA molecules of that particular gene that are present in a biological sample. In the cell, mRNA molecules are translated into proteins that may participate in many biochemical reactions. Because proteins can also bind to the DNA and induce or repress the expression of genes, the cell can be seen as a self-regulating system where the gene expression levels represent (part of) its state.

The domain of modeling genetic interactions requires several assumptions to be made. First, a general notion is that the complete set of gene expression values fully captures the state of the cell.3 A natural extension is to assume that the (first-order) Markov assumption holds—that is, the current state is only dependent on the preceding state: P(X[t+1] | X[1], X[2], . . . , X[t]) = P(X[t+1] | X[t]). Second, because the sequences of the genes encoded in the DNA determine the actual genetic interactions that we attempt to model and the DNA does not change, it is assumed that the process is stationary—that is, the transition probability P(X[t+1] | X[t]) is independent of t.

With these assumptions in mind, we can specify our DDBN as a transition network that can be represented by (1) a directed acyclic graph (DAG) G, whose vertices correspond to two separate sets of random variables X_1[t], X_2[t], . . . , X_N[t] and X_1[t+1], X_2[t+1], . . . , X_N[t+1] and whose arcs indicate the conditional dependencies between the random variables but are only directed from variables in X[t] to variables in X[t+1], and (2) a parameterization (θ_1, . . . , θ_i, . . . , θ_N), which describes the conditional distributions for each variable X_i[t+1] given its immediate parents in G, indicated by Pa_i. Figure 2a presents an example of the DAG of a DDBN with three genes. Note that when considering the genes as single nodes independent of time, this would require a directed cyclic graph (see Fig. 2b), which is not allowed in BNs. Thus, the nodes in the DAG represent the genes, while the connections (arcs) in the graph represent putative reg-

2. Note that capital letters indicate the random variable and the lowercase letters the realization of this variable, while vectors or matrices are indicated in bold.
3. This assumption neglects, among others, the influence of protein levels on the state.
Figure 1. The four steps in systems biology and the experimental cycle
Figure 2. An example of a discrete dynamic Bayesian network with three genes. (a) The directed acyclic graph. (b) The directed cyclic graph representation of the network in (a) when the genes are independent of time. (c) The conditional probability table of node 2. (d) The parent set for each node.
ulatory interactions between the genes. These connections represent the combined effect of direct as well as indirect biochemical and genetic interactions that result in detectable changes in gene expression. As such, these connections capture the influences that are exerted by, for example,
transcription factors and/or the result of signal transduction pathways. In the case of discrete data, the conditional probability distribution for each variable given its parents, θi , is transformed to a conditional probability table (CPT) (see
Fig. 2c). This table consists of q_i · r_i parameters, where r_i is the total number of possible states of X_i, and q_i is the number of possible states of its parents Pa_i. Learning DDBNs can now be described as the process of estimating the unknown underlying graph G and the corresponding CPTs that most likely have produced the data set D = {x[1], . . . , x[T]}.4 Generally, a statistically motivated scoring function [8, 14] is used to evaluate a candidate graph with respect to the training data, together with a search strategy that hypothesizes new candidate graphs.

2.1 Scoring Function for Complete Data

The best graph can be found by maximizing a score, S, that is proportional to the probability of the graph given the data, that is, S ∼ P(G | D). One well-known score that is generally applied is known as the BDe metric (Bayesian metric with Dirichlet priors and equivalence). Although other scoring functions can be proposed, such as the likelihood score P(D | G, θ), we favor the BDe metric since it deals with the uncertainty in θ and makes it possible to add prior information in the form of soft constraints. Using Bayes' rule, P(G | D) can be written as P(G)P(D | G)/P(D). It is not necessary to compute P(D) since P(D) is independent of graph G. Also, the products may be transformed into summations by taking the logarithm. The resulting scoring function is defined as

$$S(G \mid D) = \log P(G) + \log P(D \mid G). \qquad (1)$$

For complete data (i.e., when no measurements are missing5), the scoring function is decomposable [14]—that is, S(G | D) = Σ_{i=1}^{N} S_i—leading to the following equation:

$$S_i = \log P(\mathrm{Pa}_i = G_i) + \log \int \prod_{t=2}^{T} P(X_i[t] \mid G_i[t-1], \theta_i)\, f(\theta_i \mid G_i)\, d\theta_i. \qquad (2)$$

The first term in equation (2) is the prior probability assigned to the choice of the possible parent sets Pa_i, with G_i denoting the parents of gene i from graph G. If there is no preexisting knowledge about possible parents, a noninformative uniform prior can be used. However, if information concerning possible parent sets exists, it may be incorporated through the use of this prior. Real genetic networks are known to have limited connectivity [24], which varies strongly on a per gene basis. To use this property, we may employ a prior on the parents that favors less connected graphs. Jeong et al. [25] show, through studies of known protein networks and based on arguments of scalability, that the connectivity of these networks follows a power-law distribution.

In this article, we have investigated the use of a power-law distribution as a prior on the set of parents. This distribution is defined as P(Pa_i = G_i) = a·|G_i|^(-b), where a and b are constants and |G_i| is the number of parents of gene X_i. We have assumed b = 2 and derived a such that Σ_{V=1}^{N-2} a·V^(-b) = 1, based on experimental results described in Jeong et al. [26].

The second term in equation (2) is called the marginal likelihood and measures the local probability of the data given the set of parents and their expression values. The integral term over all parameters cannot generally be solved in closed form, except for some specific parameter distributions (e.g., distributions of the exponential family [27]). A natural assumption [17, 27, 28] for discrete data is that the prior distribution over parameters follows a Dirichlet distribution,

$$f(\theta_i \mid \mathrm{Pa}_i) = \prod_{j=1}^{q_i} \frac{\Gamma(\alpha_{ij})}{\prod_{k=1}^{r_i} \Gamma(\alpha_{ijk})} \prod_{k=1}^{r_i} \theta_{ijk}^{\alpha_{ijk}-1},$$

where the hyperparameters α_ijk > 0 for every i, j, k and where α_ij = Σ_{k=1}^{r_i} α_ijk. Filling in this Dirichlet distribution in the scoring function results in the following [29]:

$$S_i = \log P(\mathrm{Pa}_i = G_i) + \log \prod_{j=1}^{q_i} \frac{\Gamma(\alpha_{ij})}{\Gamma(\alpha_{ij} + N_{ij})} \prod_{k=1}^{r_i} \frac{\Gamma(\alpha_{ijk} + N_{ijk})}{\Gamma(\alpha_{ijk})}. \qquad (3)$$

Here, N_ijk is equal to the number of times we observe in the data that G_i[t] = j and X_i[t+1] = k, and N_ij is the number of times we observe G_i[t] = j. When no prior knowledge about specific parameter values θ_i is available, this can be expressed by applying a special case of the Dirichlet distribution (i.e., α_ijk = 1 for all i, j, k), which corresponds to a uniform distribution.6

4. The learning process reduces to learning the underlying graph because the CPTs are derived from the data and the graph.
5. In this article, we assume that no measurements are missing. If so, the data are made complete with a preprocessing step.
6. Generally, when modeling genetic networks, no such prior information is available, and (as is also done in this article) a uniform distribution is employed.
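To make equation (3) concrete, the local score of one gene given a candidate parent set can be computed directly from the counts N_ijk. The following Python sketch assumes the uniform Dirichlet prior (α_ijk = 1) used in this article and optionally adds the power-law structure prior; it is a minimal illustration under our own naming and design choices, not the authors' implementation.

```python
import math
from collections import Counter

def local_bde_score(data, i, parents, n_states=3, alpha=1.0, a=None, b=2):
    """Local score S_i of equation (3) for gene i and a candidate parent set.

    `data` is a list of time series; each series is a sequence of tuples of
    discrete expression levels in {0, ..., n_states - 1}. A uniform Dirichlet
    prior (alpha_ijk = alpha = 1) is assumed; when `a` is given, the power-law
    structure prior P(Pa_i = G_i) = a * |G_i| ** (-b) is added, otherwise the
    structure prior is treated as a constant and omitted.
    """
    # Count N_ijk (parent configuration j at time t, child state k at t + 1)
    # and N_ij. Unobserved configurations contribute a factor of 1 to the
    # product in equation (3), so only observed configurations are visited.
    n_ijk, n_ij = Counter(), Counter()
    for series in data:
        for x_t, x_next in zip(series, series[1:]):
            j = tuple(x_t[p] for p in parents)
            n_ijk[(j, x_next[i])] += 1
            n_ij[j] += 1

    alpha_ij = n_states * alpha                  # alpha_ij = sum_k alpha_ijk
    score = 0.0
    for j, nij in n_ij.items():
        score += math.lgamma(alpha_ij) - math.lgamma(alpha_ij + nij)
        for k in range(n_states):
            score += math.lgamma(alpha + n_ijk[(j, k)]) - math.lgamma(alpha)

    if a is not None and len(parents) > 0:       # power-law prior term
        score += math.log(a) - b * math.log(len(parents))
    return score
```

Because parent configurations that never occur in the data contribute a factor of 1 to the product in equation (3), only the observed configurations need to be counted, which keeps the computation tractable even for larger parent sets.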
2.2 Searching Structure from Complete Data

Given that the score is decomposable, the best graph given the data can be found by optimizing the score for each gene i independently. Because we employ discrete dynamic Bayesian networks, a separation is possible between the set of nodes from which connections may start and the set of nodes in which they may end. This ensures that the graph is not cyclic, which greatly simplifies the search. Otherwise, we would have to search with complete graphs or incorporate tests to prevent cycles. Now it is sufficient to find for each gene the right combination of parents that can explain its behavior.7 Still, there are $\binom{N}{C}$ possible ways of picking C parents from a set of N distinct genes without replacement. Thus, the number of possible sets of parents, N_p, still increases exponentially with the number of genes N:

$$N_p = \sum_{C=1}^{N} \binom{N}{C}.$$

Because it is too computationally expensive to test all possible sets of parents, it is necessary to employ a heuristic search algorithm. What most heuristic search algorithms have in common is that they make successive mutations to the set of parents of a variable and compute the change in the local score for each mutation. Generally employed mutations are the addition or removal of a variable from the set of parents. Well-known search algorithms that make only these single mutations to the existing graph are greedy hill climbing with multiple restarts, simulated annealing, and best-first search. Chickering and Heckerman [30] have shown that, for a fixed amount of computation time, greedy search with multiple restarts outperforms simulated annealing and best-first search. Consequently, we consider greedy hill climbing with multiple restarts as our basic search algorithm.

7. Thus, instead of searching for complete graphs, we only need to search for N different subgraphs. However, for readability, we will drop the "sub" prefix and refer in the remainder of this article to graphs or structures while we actually mean subgraphs or substructures.

Greedy Hill Climbing with Restarts (GR). For each run, this search algorithm starts with a random structure and iterates between checking the allowed mutations and applying the mutation that increases the score the most, until it reaches a local maximum, upon which the method starts a new run. The method has a single parameter—that is, the number of restarts (RES), which has a proportional effect on the computation time. After RES restarts, the structure with the highest observed score is returned.

Greedy Hill Climbing with Restarts Based on Prior (GRP). A variation on this basic scheme is to let the restarts not be fully random but rather to generate restart structures sampled from the expected prior on parents. This method may be more efficient since it already starts with more probable structures.

Greedy Hill Climbing with Restarts and Tabu List (GRT). A second variation augments greedy hill climbing with multiple restarts with Tabu search. This variation was considered since Tabu search is designed to escape the local maxima from which the scoring function suffers due to a limited amount of data. In a Tabu search method, a Tabu list is used to escape from local maxima through a path that was not taken before [31]. To achieve this, the previously visited solutions are stored in the list. As long as a solution is contained within the list, it cannot be revisited, and the next-best mutation is applied instead. The algorithm will stop if M moves in succession have not increased the score, upon which the method starts a new run. After RES restarts, the structure with the highest observed score is returned.

Greedy Hill Climbing (No Restarts) and Tabu List (GT). A third variation combines a greedy search with only a Tabu list. This method is similar to GRT, but no restarts are made. Instead, it performs a single run, starting with an empty graph. This method was added to study whether it is better to increase the number of restarts (RES) or the size of the list (M).

Beam Search (BEAM). The four greedy hill-climbing procedures are compared to an alternative search algorithm, called "beamsearch," that was found to be the best-performing search algorithm when learning linear genetic network models [1]. This search algorithm starts by selecting, from all possible structures with a single parent (C = 1), the K graphs with the highest score. Only for these K graphs are all the combinations with a second parent (C = 2) evaluated. Then the best K graphs of connectivity two (C = 2) are selected, upon which the procedure is repeated. A memory is implemented to keep track of the best overall evaluated graph. The algorithm stops when a prespecified maximum number of connections (MCON) is reached.
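To make the basic GR procedure concrete, the following sketch searches the parent set of a single gene by greedy addition and removal moves with random restarts. It relies on a generic `score` callable (for example, the local BDe score sketched in section 2.1); the function and parameter names are ours, and the code is illustrative rather than the implementation used in the experiments.

```python
import random

def greedy_restarts(n_genes, i, score, n_restarts=500, max_init_parents=3, seed=0):
    """Greedy hill climbing with restarts (GR) for the parent set of gene i.

    `score(parents)` returns the local score of a candidate parent set.
    Each restart begins from a random parent set; single additions/removals
    are applied greedily, always taking the best-scoring mutation, until a
    local maximum is reached.
    """
    rng = random.Random(seed)
    # Candidate parents are all genes at time t; in a DDBN a connection
    # X_i[t] -> X_i[t+1] is permitted, so gene i itself is not excluded here.
    candidates = list(range(n_genes))
    best_parents, best_score = frozenset(), score(frozenset())

    for _ in range(n_restarts):
        parents = frozenset(rng.sample(candidates, rng.randint(0, max_init_parents)))
        current = score(parents)
        while True:
            moves = [parents | {g} for g in candidates if g not in parents]
            moves += [parents - {g} for g in parents]
            move_score, move = max(((score(m), m) for m in moves),
                                   key=lambda t: t[0])
            if move_score <= current:
                break                        # local maximum reached
            parents, current = move, move_score
        if current > best_score:
            best_parents, best_score = parents, current
    return best_parents, best_score
```

For gene i, one would call, for example, greedy_restarts(N, i, lambda pa: local_bde_score(data, i, sorted(pa)), n_restarts=500); the GRP, GRT, GT, and BEAM variants described above would modify how restarts are drawn or how local maxima are escaped.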
3. Artificial-Data Experiments

We did different types of artificial-data experiments to reveal how well DDBNs are capable of discovering the underlying structure from a data set sampled from a known structure. In all following experiments, we generated discrete data sets by sampling from randomly generated (reference) networks. The generated data sets were used to learn DDBNs, which were then compared to the reference networks from which the data were originally sampled (see Fig. 3).

Figure 3. Framework of the experimental setup of the artificial-data experiments: generate a random network (reference); sample data on the basis of the generated network; learn a network on the basis of the sampled data set; compare the discovered network properties with the reference network properties.

To generate representative expression profiles, the data generation procedure was designed to take into account some general properties believed to be valid for genetic networks and for typical gene expression data sets. Although the actual complexity of genetic networks is unknown, it is believed that genes are regulated by a limited number of other genes [24]. We therefore generated networks with limited connectivity according to the power-law distribution described in section 2.1. In typical gene expression data sets, the number of genes by far exceeds the number of arrays, mainly due to cost (and tissue availability) arguments and because long time-series experiments are sometimes (bio)technically difficult to produce in the lab. Therefore, the generated data sets consisted of several series with a small number of arrays rather than one single, long time series. In fact, small time series have also been shown to be favorable for the learning process [5].

The employed procedure can be described as follows: for a given number of genes, N, and a connectivity, C, a reference DAG G was generated with random selections of C parents per gene. The corresponding CPTs were generated randomly with a parameter value distribution that is close to being "deterministic"8—that is, the entries of the
CPTs consisted of 0.9s and 0.05s (see the appendix). To arrive at a data set D with a given number of samples, T, data were sampled from this reference DDBN in 10 series of T/10 samples, each starting with a random initial sample. In our experiments, we continuously employed three discrete states representing the commonly used concepts of underexpressed, baseline, and overexpressed. Each time, the input to our modeling approach was thus a discrete data set consisting of N genes, T time points, and three discretization levels. To average out stochastic effects, this experimental procedure was employed to generate 20 data sets for each of the studied experimental conditions (i.e., for a given N, C, T), and all indicated performance measures are averages over these 20 repetitions.

8. Although the model is fully stochastic, its behavior can range from completely unpredictable (the CPT contains only values of 1/r_i) to completely deterministic (the CPT contains only 1s and 0s).

We have set up experiments such that they reveal the strengths and weaknesses of the modeling approach. Apart from the score, S, two measures were defined to represent the performance in refinding the wiring diagram:

1. Sensitivity: the number of discovered connections that were also in the reference graph—that is, the true connections (TC)—divided by the total number of connections in the reference graph (TC + GC):

Se = TC / (TC + GC).

2. Specificity: the number of discovered connections that were also in the reference graph (TC), divided by the total number of connections in the discovered graph (TC + FC):

Sp = TC / (TC + FC).
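The data-generation procedure and the two performance measures just described can be summarized in the following Python sketch, mirroring the framework of Figure 3 under the stated settings (three discrete states, C random parents per gene, CPT rows built from one 0.9 and two 0.05 entries, and several short series). All names are ours and the code is only an illustration, not the authors' implementation.

```python
import random

N_STATES = 3          # underexpressed, baseline, overexpressed

def random_reference_network(n_genes, connectivity, rng):
    """Reference DDBN: `connectivity` random parents per gene, empty CPT store."""
    parents = {i: rng.sample(range(n_genes), connectivity) for i in range(n_genes)}
    cpts = {i: {} for i in range(n_genes)}        # parent configuration -> row
    return parents, cpts

def cpt_row(cpts, i, config, rng, p_max=0.9):
    """Near-deterministic CPT row, created lazily per parent configuration.

    With p_max = 0.9 each row is a permutation of [0.9, 0.05, 0.05], matching
    the CPT entries used in the artificial-data experiments.
    """
    if config not in cpts[i]:
        row = [(1.0 - p_max) / (N_STATES - 1)] * N_STATES
        row[rng.randrange(N_STATES)] = p_max
        cpts[i][config] = row
    return cpts[i][config]

def sample_series(parents, cpts, n_genes, length, rng, p_max=0.9):
    """One time series: random initial sample, then CPT-driven transitions."""
    x = tuple(rng.randrange(N_STATES) for _ in range(n_genes))
    series = [x]
    for _ in range(length - 1):
        x = tuple(rng.choices(range(N_STATES),
                              weights=cpt_row(cpts, i,
                                              tuple(x[p] for p in parents[i]),
                                              rng, p_max))[0]
                  for i in range(n_genes))
        series.append(x)
    return series

def sensitivity_specificity(reference, discovered):
    """Se = TC/(TC + GC) and Sp = TC/(TC + FC) over all parent sets."""
    tc = sum(len(set(reference[i]) & set(discovered.get(i, ()))) for i in reference)
    n_ref = sum(len(v) for v in reference.values())           # TC + GC
    n_disc = sum(len(v) for v in discovered.values())         # TC + FC
    return (tc / n_ref if n_ref else 1.0, tc / n_disc if n_disc else 1.0)
```

A data set of T samples then consists of ten short series, for example data = [sample_series(parents, cpts, N, T // 10, rng) for _ in range(10)].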
Only when the discovered graph is identical to the reference graph do both measures equal 1. Apart from influences from the number of true connections that affects both
measures, sensitivity will become smaller than 1 when the discovered graph contains fewer connections than the reference graph. Otherwise, when the discovered graph contains many more connections than the reference graph, the specificity drops below 1. Given the conditions of genetic network modeling, we prefer to have a high specificity rather than a high sensitivity. To understand this, recall that the interactions found serve as starting points for a new hypothesis that will be tested in subsequent experiments (see Fig. 1b). It is thus not desirable to produce a lot of false positives. 3.1 Method We did several experiments to examine the properties of our modeling approach. • Experiment 1. This experiment is directed toward finding the best search method. We examined the performances of the different search methods for a given amount of computation time and examined how the performance measures are affected by the different parameter settings (see Table 2 for the actual settings of RES, M, K, and MCON). The data conditions were kept fixed: employing reference graphs of 200 genes (N = 200) with a connectivity of four (C = 4) and data sets consisting of 100 samples (T = 100). • Experiment 2. This experiment studies the performance of the model under different data conditions. The performance is examined for increasing network connectivity (C = 3, 4, 5, and 6 connections) and for increasing the number of samples (10 < T < 600). The effect on performance caused by the use of the informative prior (powerlaw distribution) instead of the noninformative prior (uniform distribution) (i.e., the first term of equation (2)) is also examined. In this experiment, the number of genes in the reference graph was kept fixed at 50 genes (N = 50), and greedy hill climbing with 500 restarts (GR) was used as a search technique (RES = 500). • Experiment 3. This experiment looks into the scalability of the model. The effect on performance for increasing
network sizes is investigated (50 < N < 600). Here, the connectivity was fixed at four (C = 4), and data sets with 100 samples were generated (T = 100). Greedy hill climbing with restarts based on prior (GRP) was used as a search technique, but the number of restarts was investigated at two different settings (RES = 100 or 500).
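For illustration, one experimental condition (N, C, T) could be evaluated by combining the hypothetical helpers sketched above: generating a reference network, sampling ten short series, learning a parent set per gene, and averaging sensitivity and specificity over 20 repetitions. This is only a schematic driver under our own naming; the parameter values follow the settings listed for the experiments.

```python
import random

def run_condition(n_genes, connectivity, n_samples, n_repeats=20, n_restarts=500):
    """Average Se/Sp for one experimental condition (N, C, T) over repetitions."""
    se_sum = sp_sum = 0.0
    for rep in range(n_repeats):
        rng = random.Random(rep)
        parents, cpts = random_reference_network(n_genes, connectivity, rng)
        # Ten short series rather than one long series, as in the experiments.
        data = [sample_series(parents, cpts, n_genes, n_samples // 10, rng)
                for _ in range(10)]
        discovered = {}
        for i in range(n_genes):
            score = lambda pa, i=i: local_bde_score(data, i, sorted(pa))
            discovered[i], _ = greedy_restarts(n_genes, i, score, n_restarts)
        se, sp = sensitivity_specificity(parents, discovered)
        se_sum, sp_sum = se_sum + se, sp_sum + sp
    return se_sum / n_repeats, sp_sum / n_repeats
```

Sweeping such a driver over C, T, or N (and over the different search methods) yields the kinds of curves reported in Figures 4 through 8.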
When the number of discovered connections is equal to the number of original connections, both performance measures give the same result. As this was found to happen regularly, the plots depicting specificity are omitted where appropriate.

3.2 Experiment 1: Search Methods

The results of the first experiment are shown in Figure 4a. We can distinguish two different behaviors. First, when the computation time is short (40-500 sec), beamsearch outperforms all the other methods. An explanation may be that beamsearch starts searching for the parents that are most strongly correlated with the child node. Apparently, at least a subset of parents is, indeed, more strongly correlated when compared with random genes, allowing the beamsearch to be more efficient in the beginning of the search with respect to the other methods, which rely primarily on starting sufficiently close to the original structure. Second, at computation times of 500 seconds and longer, almost all search methods perform equally well. An exception is formed by the Tabu search that starts with the empty graph (GT), which shows an overall lower performance. Nevertheless, it shows that there is a positive relation between the score and the length of the Tabu list. Relying solely on the Tabu list suffers from the effect that just one part of the search space is well examined, whereas restarts provide a more even sampling of the whole search space. The results also indicate that, with respect to the standard greedy hill climbing with restarts (GR), the use of Tabu lists (GRT) or prior information on the restarts (GRP) provides only a marginal improvement. Apparently, using (longer) Tabu lists, applying smart restarts, and performing (more) restarts all have a positive effect on the score of the final structure. However, the largest gain in performance is obtained by simply increasing the number of restarts. When the score is compared to the corresponding performance measures (see Fig. 4b), we notice that slight differences in the score may lead to a substantial improvement in performance for both Se and Sp. At the same time, it becomes apparent that there exists a strong correlation between the scoring function and the other performance measures. Although this is strongly desired, this property is not to be readily expected when dealing with a dimensionality problem (having fewer samples than variables), where the effect of overfitting may easily decouple data fit from performance. Apparently, even under limited data conditions, the score is able to faithfully distinguish correct graphs from incorrect ones.
3.3 Experiment 2: Data Conditions

Part of the results of experiment 2 is depicted in Figure 5. From this figure, we can observe that for a given number of samples, both performance measures strongly decrease when the connectivity increases. Part of this effect may be explained by the fact that the number of possible parent sets increases exponentially; similarly, the number of parameters to be estimated at each node increases with the connectivity according to 3^C · 2. The drop in performance may partly be compensated if a larger number of samples can be used. Unfortunately, the necessary number of samples to obtain a similar performance increases almost exponentially with increased connectivity, thus limiting the effect that more measurements may have on the number of connections that can be reliably discovered. Nevertheless, networks of 50 genes with a maximal connectivity of four may already be learned with an acceptable performance when the data constitute 60 samples, while we observe a perfect performance when there are more than 200 samples. This may be a realistic situation if we are considering a single pathway in which only a subset of genes is involved.

Three particular data conditions in Figure 5 are remarkable. These situations are further illustrated in Figure 6, showing the breakdown of the number of correctly discovered connections for original networks of a connectivity of four. In the first situation (i.e., between 30 and 50 samples), the results are poor. The number of discovered connections is fewer than four, and the connections that are found are mostly incorrect. In the second situation (i.e., between 60 and 80 samples), most discovered connections are true connections, but the connectivity is still below four. In these cases, a subset of the true connections starts to become more correlated than a random set of parents. In the last situation (i.e., between 90 and 400 samples), an almost perfect result is obtained. Remarkably, the number of connections is never overestimated.

Figure 5 also shows that an increase in the number of samples does not lead to a decrease in specificity. It is therefore to be expected that our informative prior, which favors even less connected structures, will not exert a positive effect on the performance. The results shown in Figure 7 confirm this expectation; that is, on both measures, the scoring function with the standard (noninformative) uniform prior gives a performance that is better than or equal to that obtained with the power-law prior.
Figure 4. (a) The average score and (b) the average sensitivity (TC/(TC+GC)) and specificity (TC/(TC+FC)) for the different search methods (GR, GRP, GRT, GT, and BEAM) as a function of the computation time.
Figure 5. The sensitivity (top, TC/(TC+GC)) and the specificity (bottom, TC/(TC+FC)) as a function of the connectivity (C = 3, 4, 5, 6) and the number of samples (50 < T < 600). Other settings are N = 50, using GR with RES = 500.
Table 1. The different parameter settings for each search method

          GR      GRP     GRT            GT      BEAM
          RES     RES     RES     M      M       K      MCON
          25      25      5       10     40      10     6
          100     100     15      20     170     23     6
          300     300     15      40     250     40     6
          500     500     25      40     350     55     6
          1000    1000    25      80     500     75     6

Note. Settings are ordered in increasing computation time from top to bottom. GR = greedy hill climbing with restarts; GRP = greedy hill climbing with restarts based on prior; GRT = greedy hill climbing with restarts and Tabu list; GT = greedy hill climbing (no restarts) and Tabu list; BEAM = beamsearch; RES = number of restarts; M = size of the Tabu list; K = number of graphs retained by beamsearch; MCON = maximum number of connections.
Figure 6. The frequency of discovering a specific number of true connections (bars), as a function of the total number of discovered connections (grouped as denoted on the x-axis). Different plots correspond to different sample sizes (T = 30, 50, 60, 80, 90, and 200). Other settings are N = 50, C = 4, using GR with RES = 500.
there are more signals to choose from. The possibility that a randomly chosen graph has a higher score than the reference graph therefore grows. This effect cannot completely explain the results since in that case, we would expect a slight decrease for small networks and a larger one for larger networks. A better explanation results from the fact that the search space expands as the number of variables increases, and it will take more time for the search method to find higher scoring graphs. In that case, the performance is limited by the search procedure and not by the scoring function.
Indeed, an increase in the number of restarts leads to a better performance (see Fig. 8a). To get a better insight, the score of the reference graph and the score of the discovered graph were computed, and the number of times the discovered score was higher, equal, or lower than the original score is depicted by shaded bars in Figure 8b. If the score of the reference graph is lower than the score of the discovered graph, we know that the first explanation holds (a bad scoring function). On the other hand, if the score of the reference graph is higher than the score of the discovered graph, better results may
Figure 7. The sensitivity and the specificity for a noninformative (uniform) and an informative (power-law distribution) prior as a function of the number of samples. Other settings are N = 50, C = 4, using GR with RES = 500.
Figure 8. (a) The sensitivity (TC/(TC+GC)) as a function of the number of genes, for 100 and 500 restarts, and (b) the relative frequencies of obtaining a lower, equal, or higher score than the score of the original graph, indicated by a black, gray, or white bar, respectively (labeled in the legend as search mistake, true structure discovered, and score mistake). The bottom half corresponds to results obtained with 100 restarts, whereas the top half was obtained in the case of 500 restarts. From left to right, the network size increases from 50 to 600 genes.
Figure 9. The proposed modeling approach applied to biological data
be obtained by further increasing the number of restarts. As we can see in Figure 8b, more than 50% of the mistakes for large networks can be attributed to the search procedure and, therefore, the performance may be improved by further enhancing the search method.

4. Experiment on a Real Biological System

To illustrate its use on real data, the proposed method has also been carried out on a subset of the publicly available S. cerevisiae gene expression data set of Spellman [22]. The Alpha subset was chosen because it consisted of relatively many time points (T = 18). For network identification purposes, however, this number of arrays is still extremely small, forcing us to maintain minimal expectations. The experiment consisted of three steps: (1) preprocessing of the gene expression data, (2) the application of the DDBN approach, and (3) an initial validation of the predicted interactions (see Fig. 9).

4.1 Preprocessing of Gene Expression Data

Before the DDBN can be applied, the data need to be properly preprocessed (i.e., they need to be cleaned of unwanted disturbances, and the continuous gene expression levels need to be converted into discrete values). Since the topic of preprocessing is beyond the scope of this study, we adopted only simple preprocessing steps. Experimental observations indicate that gene expression values that remain below an absolute value of 2 are not significantly expressed or repressed. To avoid the discovery of erroneous interactions, we have removed all genes with expression profiles not exceeding a twofold expression in any measurement. From the original 2467 genes in the Alpha subset, only 364 genes were shown to be significantly active based on this simple selection scheme.

Typical gene expression data sets are not complete and contain a substantial number of missing values. Missing values occur due to different reasons related to technical discrepancies in the quality of the spots on the microarray (e.g., dirt, lack of deposition, etc.). The derived scoring function, as presented in equation (3), can only be computed when the data set is complete. Therefore, linear interpolation (between expression levels adjacent in time) was employed to impute the missing values in the data set. Each continuous gene expression profile was discretized into a profile that consisted solely of the three commonly used expression levels: underexpressed, baseline, and overexpressed. A discretization procedure adaptive to each gene expression profile was obtained by application of the K-means clustering (K = 3) algorithm on the set of expression levels of each gene expression profile individually.

4.2 Application of Dynamic Bayesian Network Approach

Given the discrete data set, the Bayesian method was applied using greedy hill climbing with 1000 random restarts. The method returns the most promising genetic regulatory network, which consisted of 364 hypotheses in the form "regulators A, B, and . . . regulate Z." The complete list of hypotheses can be found on our Web site (http://www.genlab.tudelft.nl/publications/simulation03/). In total, the hypothesized network consisted of 957 connections between the 364 genes. The number of parents per gene was found to range between 1 and 5 connections, with an average of 2.6. Such properties are typical for networks of limited connectivity, which is a characteristic that is also expected of real biological genetic networks of similar size.

Consider now each connection as a separate hypothesis in the form "regulator A regulates Z." (Thus, if a gene has C parents, C of these hypotheses are created.) When a gene G_i is an important regulator of gene G_j, then it is to be expected that the score will deteriorate significantly when we exclude this regulator from the set of regulators of gene G_j. Therefore, the deterioration of the score due to the deletion of a regulator may be used as a confidence measure for these hypotheses. This allows all hypotheses to be ranked by confidence, which facilitates further validation
of the found hypotheses. More elaborate approaches could be envisioned based on the application of bootstrapping (with feature extraction) or noise injections.

4.3 Validation of Predicted Interactions

To validate whether the discovered connections are sensible,9 it is necessary to verify them against existing biological knowledge. Because neither existing knowledge nor the found hypotheses may be expected to be complete, it is not possible to determine a quantitative measure of performance.10 Instead, one can only gain confidence in the method if some of its hypotheses match with known interactions. To facilitate such a manual comparative analysis (which is quite labor intensive), we first chose to determine in an automatic fashion which pairs of connected genes (1) were assigned to similar functional classes, (2) are known to interact transcriptionally in TransFac,11 (3) are mentioned together in the publications listed in Pubmed, and (4) score together a relatively high number of hits in Google when compared to their individual hit rates.

9. That is, at a stage before follow-up validation experiments are taken into consideration.
10. For example, if a hypothesis is not supported by existing knowledge, it may still be true.
11. A hypothesized connection A → B matches if A is a transcription factor and gene B contains a binding site for A in its upstream region.

Despite the extremely limited number of data points, it was found that the list of hypothesized (indirect) connections contained several pairs of genes that were involved in the same pathway/process, known to be transcriptionally regulated, and/or known to physically interact with each other. One specific example is given by the hypothesized regulation of the ARS-binding factor ABF1 (YKL112w) by the silencing regulatory protein SIR1 (YKR101w). Both genes are located on chromosome XI and are classified into the functional group of transcriptional control (consisting of 334 entries). Examination of the number of references in Pubmed showed that SIR1 and ABF1 are individually mentioned in 63 and 136 references, respectively, but that they share 2 references in common. Similar examination of hit rates in Google resulted in 2640 and 3650 hits for SIR1 and ABF1, respectively, and 156 hits for both of them. According to Gardner et al. [32] (among others), both genes are involved in silencing the mating-type loci HMR and HML of S. cerevisiae. Yeast silencers contain neighboring binding sites for three DNA-binding proteins—that is, the origin recognition complex (ORC), Rap1p, and/or Abf1p. These silencer-binding proteins are thought to function in silencing by helping to recruit the four Sir proteins to HML and HMR, of which Sir1p is likely to be the key participant. Another example, this time of two connections in cascade, is given by the hypothesized regulation of ADA2 by PHO4 together with the regulation of PHO5 by ADA2 (i.e., PHO4 → ADA2 → PHO5). All three genes are mentioned in the same publication by Barbaric et al. [33], listed
in Pubmed. Strong biological support for the two hypotheses can be found in this publication, which mentions, for example, that “under conditions of full induction, Ada2 was bound specifically to the PHO5 promotor, but only in the presence of Pho4.” In addition, analysis of the TransFac database [34] found two protein-binding sites for Pho4p in the upstream region of PHO5. For other examples of hypothesized interactions consistent with biological knowledge, additional information and all other supplementary materials (such as the used data sets, hypothesized connections, a list of the number of Pubmed references, a list of the number of Google hits, and pairs of transcription factors versus binding sites) can be found on our Web site at http://www.genlab.tudelft.nl/publications/simulation03/. 5. Discussion and Future Work In this article, we examined the suitability of DDBNs for discovering the interactions between genes from gene expression data. The effects on the performance of refinding the original structure were experimentally investigated with respect to the connectivity and the number of genes in the original graph, the number of samples in the data set, and with respect to several search methods under varying parameter settings. Furthermore, the impact of the use of a power-law distribution as prior on parent sets was investigated. The analysis in this study showed that an exponential increase in the number of microarrays is necessary to obtain a reliable hypothesis for increasingly more connected networks. Under realistic conditions (20-50 microarrays), DDBNs can only extract promising hypotheses about the true connections for the genes in a genetic network, which are influenced by one, two, or three genes. In difficult cases with high connectivity and/or a severely limited number of microarrays, a large probability exists that not even a subset of the true connections will be discovered. On the other hand, it was demonstrated that the number of connections discovered never exceeded the true number of connections. Therefore, using a soft prior that favors less complex structures (based on prior knowledge about genetic networks) did not enhance the scoring function. It was also discovered that the model is not very sensitive to the number of genes if the search procedure is likewise enhanced with more restarts. Similarly, the method was not found to be particularly sensitive to noise. For experimental design, it is useful to keep in mind that it was found that performance increases when using several short time series instead of using one long series of similar length (results not shown). Application of the method on real data illustrates that several preprocessing steps need to be taken before the DDBN approach can be applied, which can all influence the final results. One of the essential steps necessary when using real microarray measurements is that the measured continuous gene expression levels need to be discretized prior
Figure 10. The obtained sensitivity (TC/(TC+GC)) as a function of pmax for different conditional probability tables (CPTs) and different sample sizes (T = 55, 66, 77, 88, 99, and 110). A CPT with pmax = .9 means that for each state of the parent, the probability distribution for the child is [.9 .05 .05].
to learning DDBNs from the data. Although results may strongly depend on the employed discretization method, investigations toward the best discretization method were beyond the scope of this study. Pe’er et al. [17] introduced a simple discretization method that was designed specifically for microarray data. Given the severe limitations of the Alpha subset of the Spellman data set, we found the DDBN approach to perform surprisingly well. The resulting network was found to be typically sparse, and several hypothesized connections were found to be supported by biological knowledge. Before a stage is reached when real follow-up experiments need to be considered to validate the found results, an in silico validation procedure is needed to gain confidence in the hypothesized connections. Although we have presented and implemented some of our initial ideas to obtain an automatic validation procedure, we feel that there is still much room for improvement. Further improvements in the structure-learning algorithm may result from combining information acquired from the different search methods, as they may have investigated different areas of the search space. Another direction for further exploration consists of the gains that can be made by including a soft prior reflecting the prior knowledge about the probability of a specific connection. Such information may be gleaned from literature and/or preexist in public databases. Promising research may be directed to get insight into the dependency of the score on different data points. It can be expected that the true graph is less sensitive to one specific data point than other high-scoring graphs. To use this notion, noise injection or bootstrapping may boost the
performance, making the models more robust against data disturbances.

6. Appendix

6.1 Capability of DDBNs to Handle Noisy Data

Real microarray measurements contain noise arising from the measurement technology, the experimental setup, and the underlying stochastic biological processes [35]. Therefore, to realistically model the data, we need to model the noise also. DDBNs allow us to model the uncertainty resulting from the influence of the noise. Namely, when the CPTs contain only 0s and 1s, the behavior of the DDBN becomes deterministic. Relaxing these values increases the capacity to model the presence of "noise"12 in the data. Obviously, when these values are relaxed to the extreme (uniform distribution), the data generated by the DDBN contain the least information. In such a situation, it becomes more difficult (if not impossible) to discover the graph on the basis of the data. To get insight into the sensitivity of the modeling technique with respect to the amount of noise, we performed an experiment in which the distribution of the parameters in the CPTs was varied. When comparing the most likely graph (given the data) with respect to the reference graph (see Fig. 10), one can observe that the sensitivity increases roughly linearly with pmax; that is, it degrades gradually with increasing levels of stochasticity (i.e., for CPTs with decreasing pmax).
12. With noise, we also mean the stochasticity of the underlying biological processes.
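This appendix experiment can be mimicked with the hypothetical helpers sketched in section 3 by sweeping pmax, since random_reference_network and cpt_row already parameterize the dominant CPT entry (pmax = 0.9 reproduces the [.9 .05 .05] rows of Figure 10). The sketch below is again illustrative only; the chosen sample size and pmax grid are assumptions matching the figure.

```python
import random

def sweep_pmax(n_genes=50, connectivity=4, n_samples=110,
               pmax_values=(0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0)):
    """Sensitivity of the recovered structure as the CPTs become more stochastic."""
    results = {}
    for p_max in pmax_values:
        rng = random.Random(0)
        parents, cpts = random_reference_network(n_genes, connectivity, rng)
        data = [sample_series(parents, cpts, n_genes, n_samples // 10, rng, p_max)
                for _ in range(10)]
        discovered = {
            i: greedy_restarts(n_genes, i,
                               lambda pa, i=i: local_bde_score(data, i, sorted(pa)))[0]
            for i in range(n_genes)
        }
        results[p_max] = sensitivity_specificity(parents, discovered)[0]  # Se only
    return results
```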
7. References

[1] van Someren, E. P., L. F. A. Wessels, M. J. T. Reinders, and E. Backer. 2001. Searching for limited connectivity in genetic network models. In Proceedings of the Second International Conference on Systems Biology, November, Pasadena, CA, pp. 222-30.
[2] van Someren, E. P., L. F. A. Wessels, M. J. T. Reinders, and E. Backer. 2002. Computational and statistical approaches to genomics. Dordrecht, the Netherlands: Kluwer.
[3] Friedman, N., I. Nachman, and D. Pe'er. 1999. Learning Bayesian network structure from massive datasets: The "sparse candidate" algorithm. In Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, pp. 206-15.
[4] van Someren, E. P., L. F. A. Wessels, and M. J. T. Reinders. 2000. Linear modeling of genetic networks from experimental data. In Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology, edited by R. Altman, T. L. Bailey, P. Bourne, M. Gribskov, T. Lengauer, I. N. Shindyalov, L. F. Ten Eyck, and H. Weissig, 355-66. La Jolla, CA: AAAI.
[5] Wahde, M., and J. Hertz. 2000. Coarse-grained reverse engineering of genetic regulatory networks. Biosystems 55 (1-3): 129-36.
[6] D'Haeseleer, P., X. Wen, S. Fuhrman, and R. Somogyi. 1999. Linear modeling of mRNA expression levels during CNS development and injury. In Pacific Symposium on Biocomputing '99, vol. 4, 41-52. New York: World Scientific Publishing.
[7] Liang, S., S. Fuhrman, and R. Somogyi. 1998. REVEAL, a general reverse engineering algorithm for inference of genetic network architectures. In Pacific Symposium on Biocomputing '98, vol. 3, 18-29. New York: World Scientific Publishing.
[8] Hartemink, A. J., D. K. Gifford, T. S. Jaakkola, and R. A. Young. 2001. Using graphical models and genomic expression data to statistically validate models of genetic regulatory networks. In Pacific Symposium on Biocomputing 2001, vol. 6, 422-33. New York: World Scientific Publishing.
[9] Imoto, S., T. Goto, and S. Miyano. 2002. Estimation of genetic networks and functional structures between genes by using Bayesian networks and nonparametric regression. In Pacific Symposium on Biocomputing 2002, vol. 7, 175-86. New York: World Scientific Publishing.
[10] Shmulevich, I., E. R. Dougherty, and W. Zhang. 2002. Gene perturbation and intervention in probabilistic Boolean networks. Bioinformatics 18 (10): 1319-31.
[11] Gardner, T. S., D. di Bernardo, D. Lorenz, and J. J. Collins. 2003. Inferring genetic networks and identifying compound mode of action via expression profiling. Science 301:102-5.
[12] van Someren, E. P., L. F. A. Wessels, E. Backer, and M. J. T. Reinders. 2002. Genetic network modeling. Pharmacogenomics 3 (4): 507-25.
[13] de Jong, H. 2002. Modeling and simulation of genetic regulatory systems: A literature review. Journal of Computational Biology 9 (1): 67-103.
[14] Heckerman, D. 1998. Learning in graphical models. Dordrecht, the Netherlands: Kluwer.
[15] Murphy, K., and S. Mian. 1999. Modelling gene expression data using dynamic Bayesian networks. Technical report, Computer Science Division, University of California, Berkeley.
[16] McAdams, H. H., and A. Arkin. 1997. Stochastic mechanisms in gene expression. Proceedings of the National Academy of Sciences of the USA 94:814-9.
[17] Pe'er, D., A. Regev, G. Elidan, and N. Friedman. 2001. Inferring subnetworks from perturbed expression profiles. Bioinformatics 17 (Suppl. 1): S215-24.
[18] Spirtes, P., C. Glymour, and R. Scheines. 2000. Constructing Bayesian network models of gene expression networks from microarray data. In Proceedings of the Atlantic Symposium on Computational Biology, Genome Information Systems and Technology.
[19] Weaver, D. C., C. T. Workman, and G. D. Stormo. 1999. Modeling regulatory networks with weight matrices. In Pacific Symposium on Biocomputing '99, vol. 4, 112-23. New York: World Scientific Publishing.
[20] van Someren, E. P. 2003. Data-driven discovery of genetic network models. Ph.D. diss., Delft University of Technology, Delft, the Netherlands.
[21] Friedman, N., K. Murphy, and S. Russell. 1998. Learning the structure of dynamic probabilistic networks. In Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, Madison, Wisconsin.
[22] Spellman, P., G. Sherlock, M. Zhang, V. Iyer, K. Anders, M. Eisen, P. Brown, D. Botstein, and B. Futcher. 1998. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Molecular Biology of the Cell 9:3273-97.
[23] Kitano, H. 2001. Foundations of systems biology. Cambridge, MA: MIT Press.
[24] Arnone, A., and B. Davidson. 1997. The hardwiring of development: Organization and function of genomic regulatory systems. Development 124:1851-64.
[25] Jeong, H., S. Mason, A.-L. Barabasi, and Z. N. Oltvai. 2001. Centrality and lethality of protein networks. Nature 411:41-2.
[26] Jeong, H., B. Tombor, R. Albert, Z. N. Oltvai, and A.-L. Barabasi. 2000. The large-scale organization of metabolic networks. Nature 407:651-4.
[27] Heckerman, D., and D. Geiger. 1995. Likelihoods and parameter priors for Bayesian networks. Tech. Report MSR-TR-95-54, Microsoft Research, Redmond, WA.
[28] Friedman, N., M. Linial, I. Nachman, and D. Pe'er. 2000. Using Bayesian networks to analyze expression data. Journal of Computational Biology 7 (3-4): 601-20.
[29] Cooper, G. F., and E. Herskovits. 1992. A Bayesian method for the induction of probabilistic networks from data. Machine Learning 9:309-47.
[30] Chickering, D. M., and D. Heckerman. 1996. Efficient approximation for the marginal likelihood for incomplete data given a Bayesian network. Tech. Report MSR-TR-96-08, Microsoft Research, Redmond, WA.
[31] Andersson, J. 2000. A survey of multiobjective optimization in engineering design. Tech. Rep. LiTH-IKP-R-1097, Department of Mechanical Engineering, Linköping University, Linköping, Sweden.
[32] Gardner, K. A., J. Rine, and C. A. Fox. 1999. A region of the Sir1 protein dedicated to recognition of a silencer and required for interaction with the Orc1 protein in Saccharomyces cerevisiae. Genetics 151:31-44.
[33] Barbaric, S., H. Reinke, and W. Horz. 2003. Multiple mechanistically distinct functions of SAGA at the PHO5 promoter. Molecular and Cellular Biology 23 (10): 3468-76.
[34] Wingender, E., X. Chen, R. Hehl, H. Karas, I. Liebich, V. Matys, T. Meinhardt, M. Prüss, I. Reuter, and F. Schacherer. 2000. TRANSFAC: An integrated system for gene expression regulation. Nucleic Acids Research 28:316-9.
[35] Friedman, N., M. Goldszmidt, and A. Wyner. 1999. Data analysis with Bayesian networks: A bootstrap approach. In Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence.
R. J. P. van Berlo is an M.Sc. student in the Information and Communication Theory Group, Department of Mediamatics, Faculty of Information Technology and Systems, Delft University of Technology, Delft, the Netherlands.

E. P. van Someren is a post-doctoral student in the Information and Communication Theory Group, Department of Mediamatics, Faculty of Information Technology and Systems, Delft University of Technology, Delft, the Netherlands.

M. J. T. Reinders is an associate professor in the Information and Communication Theory Group, Department of Mediamatics, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Delft, the Netherlands.