An Experimental Design to Reduce Dynamics Uncertainty in Genomic Networks
Daniel N. Mohsenizadeh, Roozbeh Dehghannasiri, Edward R. Dougherty
Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA
ABSTRACT
In systems biology, network models are often used to study interactions among cellular components (e.g., genes or proteins). However, these models are typically complex and biological data are very limited, which leads to model uncertainty. Network dynamics describe how entities evolve over time, and they are central to developing cancer drugs, whose aim is to change the dynamical behavior of the network so that cancerous phenotypes are avoided. In the presence of uncertainty, the network dynamics can be updated in different ways, giving multiple dynamic trajectories. In this paper, we propose an experimental design method that effectively reduces the dynamics uncertainty and improves the performance of the interventions. We use the concept of mean objective cost of uncertainty (MOCU) to quantify dynamics uncertainty. We also incorporate the error of the potential experiments in such a way that the objective of the experimental design is taken into account. As a byproduct of the proposed objective-based experimental design method, we also develop a mathematical framework for applying interventions to interaction-based genomic networks. Furthermore, the proposed approach is well suited to laboratory tests because its results have biological correspondence. A software package implementing the proposed experimental design method has also been developed.
Keywords
Experimental design, mean objective cost of uncertainty, stochastic dynamics, network intervention, interaction-based networks.
Categories and Subject Descriptors
G.3 [Mathematics of Computing]: Probability and Statistics; G.2.2 [Mathematics of Computing]: Graph Theory; D.0 [Software]: General
General Terms
Theory
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s). Copyright is held by the author/owner(s). BCB’15, September 9–12, 2015, Atlanta, GA, USA. ACM 978-1-4503-3853-0/15/09. http://dx.doi.org/10.1145/2808719.2814828.
ACM-BCB 2015
1. INTRODUCTION
The available knowledge in genomics is mainly in the form of biological pathways that describe interactions among entities. A biological database containing pathway knowledge can be modeled as an interaction-based network model M = (N, E), which consists of a set of nodes N, representing the biological entities in the database under study, and a set of edges E, representing the biological interactions/processes in that database. Each node in the network has a value, where the abstract term “value” is interpreted according to the characteristics of the biological entity that the node represents. We assume that network nodes take non-negative integer values. The state of a network is represented by a vector x, where each element of x is the value of a network node. The network dynamics are obtained by updating the network state vector x. Each network edge may also carry a set of labels, called edge priority labels, which characterize the dynamics of the corresponding process. These labels are assigned to the edge based on prior knowledge or an earlier measurement of the process, and they play an important role in the study of network dynamics. For instance, consider two network edges e_1, e_2 ∈ E that can both occur. A priority label e_1^prio > e_2^prio implies that only e_1 should be processed in the dynamics study of the network if both processes can happen at the same time. Given a network model M and an initial condition vector x(0), the network state vector x can be updated at each time step based on the algorithm proposed in [3, 4]. In general, complete knowledge of all network parameters affecting the network dynamics is not available. This translates to missing priority labels for some edges in E, which we refer to as dynamics uncertainty. Therefore, for a given network model M and an initial condition x(0), different sequences of processes can be computed; these are referred to as dynamic trajectories, and the set of them is denoted by T^M_{x(0)}.
Each trajectory t ∈ T^M_{x(0)} indicates one possible sequence of processes that can take place. To reduce dynamics uncertainty in a dynamical model, additional biological experiments need to be performed. Biological experiments are expensive and time-consuming, and they must be carried out on living organisms. Therefore, it is prudent
to come up with a strategy that ranks experiments based on the information each one can provide. Under such a strategy, experiments with high priority are conducted first. This technique is called experimental design. Consider a set of candidate experiments, where each experiment is defined over an appropriate pair of network edges for which conducting a laboratory test is possible. Given a network model (uncertainty class) M, a distribution governing the initial condition, a class of interventions, and a set of candidate experiments, the goal of objective-based experimental design is to rank the experiments so that the highest-ranked one yields the largest reduction in the uncertainty that affects the operational cost, which here is the performance of the interventions. To find such an experiment, we first quantify the uncertainty of the network model using the mean objective cost of uncertainty (MOCU) [5], which measures the increase in cost induced by the uncertainty. Using MOCU, the optimal experiment, which is guaranteed to have the best performance on average, is the one that results in the minimum expected remaining MOCU [1, 2].
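For reference, the quantities named above can be written compactly. The notation below follows the general form used in [5] and [1, 2] and is an assumption as far as this abstract is concerned: Θ is the uncertainty class, C_θ(ψ) is the cost of intervention ψ under model θ, ψ_θ is the optimal intervention for model θ, ψ_IBR^Θ is the robust intervention for the class, and Θ | ξ = o is the class reduced by outcome o of experiment ξ.

```latex
\begin{align}
\psi_{\mathrm{IBR}}^{\Theta} &= \arg\min_{\psi \in \Psi} E_{\theta}\!\left[ C_{\theta}(\psi) \right], \\
\mathrm{MOCU}(\Theta) &= E_{\theta}\!\left[ C_{\theta}\!\left(\psi_{\mathrm{IBR}}^{\Theta}\right) - C_{\theta}(\psi_{\theta}) \right], \\
\xi^{*} &= \arg\min_{\xi \in \Xi} E_{o}\!\left[ \mathrm{MOCU}(\Theta \mid \xi = o) \right].
\end{align}
```

Because ψ_θ is optimal for its own model, every term inside the MOCU expectation is non-negative, so MOCU is zero only when the robust intervention is optimal for every model that has positive probability.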
2. METHODOLOGY
Assuming that M = (N, E) contains dynamics uncertainty, we consider M as an uncertainty class of network models M_θ, θ ∈ Θ, where each M_θ has no further model uncertainty. The vector θ = (θ_1, θ_2, ..., θ_U) represents one set of possible priority labels, defined for U pairs of edges, which, if applied to M, results in a model M_θ that behaves deterministically for a given initial condition x(0). A probability distribution P_M can be associated with the uncertainty class M, reflecting the level of confidence in each possible model inside M. We assume that there is a true network model, M*, corresponding to the biological system under study, which embodies complete knowledge of all edge priority labels affecting the network dynamic behavior. In general, the initial condition vector x(0) may vary; thus, we represent the initial condition by a random vector X(0). Given M and X(0), the set of all dynamic trajectories for all initial condition vectors, denoted by T^M, can be computed, and a corresponding probability distribution, P_{T^M}, can be obtained. The network dynamic performance is evaluated based on the value of the final network state vector (or steady-state distribution). An appropriate error function can be defined for each trajectory t ∈ T^M. The expected dynamic performance error, E_M, over the uncertainty class (network model) M and all initial conditions, is defined as the expected value of this trajectory error over the set T^M. Based on the calculated dynamic trajectories T^M, we define a class of interventions Ψ = {ψ_j, j ∈ J}, where each intervention ψ_j is the act of blocking the process/edge e_j ∈ E from happening, and J is the set of indices of the network edges to which such a blocking intervention can be applied. Applying any intervention ψ_j to the uncertainty class M therefore results in a new set of dynamic trajectories T^M(ψ_j), for which a corresponding expected dynamic performance error E_M(ψ_j) can be calculated.
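The construction above can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration: two uncertain edge pairs with binary priority outcomes, a uniform distribution P_M, three hypothetical blocking interventions, and a made-up error table that stands in for the trajectory errors already averaged over X(0). It does not reproduce the update algorithm of [3, 4].

```python
from itertools import product

# Hypothetical toy setup: U = 2 uncertain edge pairs. theta_u = 0 means the
# first edge of pair u has priority, theta_u = 1 means the second edge does.
U = 2
THETAS = list(product((0, 1), repeat=U))                 # uncertainty class Theta
PRIOR = {theta: 1.0 / len(THETAS) for theta in THETAS}   # assumed uniform P_M

# Hypothetical blocking interventions psi_j, one per blockable edge e_j.
INTERVENTIONS = ("block_e1", "block_e2", "block_e3")

# Made-up errors, assumed to be already averaged over the initial condition:
# ERROR[theta][psi] is the performance error of model M_theta under psi.
ERROR = {
    (0, 0): {"block_e1": 0.10, "block_e2": 0.40, "block_e3": 0.30},
    (0, 1): {"block_e1": 0.20, "block_e2": 0.10, "block_e3": 0.30},
    (1, 0): {"block_e1": 0.50, "block_e2": 0.20, "block_e3": 0.30},
    (1, 1): {"block_e1": 0.60, "block_e2": 0.20, "block_e3": 0.50},
}


def expected_error(prior, error, psi):
    """E_M(psi): error of intervention psi averaged over the uncertainty class."""
    return sum(p * error[theta][psi] for theta, p in prior.items())


# Print the expected dynamic performance error of each candidate intervention.
for psi in INTERVENTIONS:
    print(psi, round(expected_error(PRIOR, ERROR, psi), 3))
```

With these illustrative numbers, blocking e2 has the smallest expected error (about 0.225, against about 0.35 for the other two), even though it is not the best choice for every individual model in the class.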
An intrinsically Bayesian robust (IBR) intervention to the uncertainty class M, denoted by ψ̄(M), is then an intervention for which the expected dynamic performance error is minimum. Let us define a set of experiments Ξ = {ξ_u, u = 1, 2, ..., U}, where ξ_u is defined over the pair of edges e_{1,u}, e_{2,u} ∈ E, its objective being to determine the corresponding priority labels. To develop the experimental design framework, we first need to quantify uncertainty. Since we aim to reduce the uncertainty that affects the intervention performance, we use the mean objective cost of uncertainty (MOCU). MOCU measures the expected increase in cost due to applying the robust intervention (in the presence of uncertainty) instead of the model-specific optimal intervention (in the absence of uncertainty). Although we do not know the actual outcome of each experiment in advance, we do know the range of its possible outcomes. For each outcome of a given experiment, we can define a reduced uncertainty class, which is the smaller collection of networks inside the uncertainty class M that are compatible with that outcome. For this reduced uncertainty class, we can again calculate the MOCU, which we call the remaining MOCU. For each experiment, taking the expectation of the remaining MOCUs over the probability distribution governing the possible outcomes of the experiment gives the expected remaining MOCU for that experiment. Under the proposed experimental design method, the experiment with the minimum expected remaining MOCU is the one suggested to be conducted first. Suppose now that experiments are not error-free, meaning that the actual outcome of an experiment might not be the same as the true value of the uncertain parameter in the underlying biological system. In this case, we incorporate the experimental error associated with each experiment into our experimental design method by quantifying it in an objective-based manner. To do so, we calculate the difference between the expected dynamic performance errors of the robust interventions that would be obtained after conducting the experiment with and without assuming the experimental error.
Then we add this quantity to the expected remaining MOCU that we previously found without assuming experimental error. The optimal experiment is now the one with the lowest value of this updated metric.
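The ranking procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' software: the error table is made up, the prior is assumed uniform, each experiment u is assumed to reveal the binary label θ_u exactly, and the names `mocu`, `expected_remaining_mocu` and `rank_experiments` are hypothetical. The penalty for experimental error described above is left out.

```python
# ERROR[theta][psi]: made-up performance error of model M_theta under
# intervention psi; theta = (theta_1, theta_2) holds two uncertain labels.
ERROR = {
    (0, 0): {"block_e1": 0.10, "block_e2": 0.40, "block_e3": 0.30},
    (0, 1): {"block_e1": 0.20, "block_e2": 0.10, "block_e3": 0.30},
    (1, 0): {"block_e1": 0.50, "block_e2": 0.20, "block_e3": 0.30},
    (1, 1): {"block_e1": 0.60, "block_e2": 0.20, "block_e3": 0.50},
}
PRIOR = {theta: 0.25 for theta in ERROR}  # assumed uniform prior over Theta


def mocu(prior, error):
    """Expected extra error of the robust intervention over per-model optima."""
    interventions = next(iter(error.values())).keys()
    # Expected error of the robust (IBR) intervention for this class.
    robust = min(
        sum(p * error[theta][psi] for theta, p in prior.items())
        for psi in interventions
    )
    # Expected error if the model-specific optimum could be used every time.
    optimal = sum(p * min(error[theta].values()) for theta, p in prior.items())
    return robust - optimal


def expected_remaining_mocu(prior, error, u):
    """Average the remaining MOCU over the possible outcomes of experiment u."""
    total = 0.0
    for outcome in {theta[u] for theta in prior}:
        mass = sum(p for theta, p in prior.items() if theta[u] == outcome)
        if mass == 0:
            continue  # outcome has zero probability under the prior
        # Reduced uncertainty class: models compatible with this outcome.
        reduced = {t: p / mass for t, p in prior.items() if t[u] == outcome}
        total += mass * mocu(reduced, error)
    return total


def rank_experiments(prior, error, num_experiments):
    """Order experiment indices by increasing expected remaining MOCU."""
    return sorted(
        range(num_experiments),
        key=lambda u: expected_remaining_mocu(prior, error, u),
    )


print("MOCU before any experiment:", round(mocu(PRIOR, ERROR), 3))
print("Suggested order of experiments:", rank_experiments(PRIOR, ERROR, 2))
```

With these illustrative numbers, the first experiment leaves an expected remaining MOCU of about 0.025, while the second leaves about 0.075, which equals the MOCU before any experiment. The second experiment is therefore expected to bring no improvement in intervention performance, and the first is ranked ahead of it.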
3. REFERENCES
[1] R. Dehghannasiri, B. Yoon, and E. Dougherty. Efficient experimental design for uncertainty reduction in gene regulatory networks. BMC Bioinformatics, 16(Suppl 13):S2, 2015.
[2] R. Dehghannasiri, B. Yoon, and E. Dougherty. Optimal experimental design for gene regulatory networks in the presence of uncertainty. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 12(4):938–950, July 2015.
[3] D. N. Mohsenizadeh. DMUN: Dynamical Modeling of Uncertain Networks (version 2.0) [software]. Available at: http://gsp.tamu.edu/Publications/supplementary/mohsenizadeh15a. 2014.
[4] D. N. Mohsenizadeh, J. Hua, M. Bittner, and E. R. Dougherty. Dynamical modeling of uncertain interaction-based genomic networks. BMC Bioinformatics, 16(Suppl 13):S3, 2015.
[5] B. J. Yoon, X. Qian, and E. Dougherty. Quantifying the objective cost of uncertainty in complex dynamical systems. IEEE Transactions on Signal Processing, 61(9):2256–2266, 2013.