© The Authors Journal compilation © 2008 Biochemical Society
Modelling the dynamics of signalling pathways
Sree N. Sreenath*1, Kwang-Hyun Cho† and Peter Wellstead‡
*Case Complex Systems Biology Center, Electrical Engineering and Computer Science Department, Case Western Reserve University, Cleveland, OH 44139, U.S.A., †Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), 335 Gwahangno, Yuseong-gu, Daejeon 305-701, Republic of Korea, and ‡The Hamilton Institute, National University of Ireland, Maynooth, Co. Kildare, Ireland
Essays in Biochemistry volume 45 2008
Abstract
In the present chapter we discuss methodologies for the modelling, calibration and validation of cellular signalling pathway dynamics. The discussion begins with the typical range of modelling techniques that might be employed to go from the chemical kinetics to a mathematical model of biochemical pathways. In particular, we consider the decision-making processes involved in selecting the right mechanism and level of detail for representing the biochemical interactions. These include the choice between (i) deterministic and stochastic chemical kinetics representations, (ii) discrete and continuous time models and (iii) continuous and discrete state representations. We then discuss the task of calibrating the models using information available in web-based databases. For situations in which the data are not available from existing sources we discuss model calibration based upon measured data and system identification methods. Such methods, together with mathematical modelling databases and computational tools, are often available in standard packages. We therefore make explicit mention of a range of popular and useful sites. As an example of the whole modelling and calibration process, we discuss a study of the cross-talk between the IL-1 (interleukin-1)-stimulated NF-κB (nuclear factor κB) pathway and the TGF-β (transforming growth factor β)-stimulated Smad2 pathway.
1To whom correspondence should be addressed (email [email protected]).
Introduction
Speaking in general terms, systems biology brings the quantitative disciplines associated with the physical sciences to the world of biological sciences. Viewed in this broad light, the term systems biology can take on a wide variety of interpretations (see [1–4]), all of which have a certain validity. However, in all attempts at definition of systems biology a common theme emerges: the use of quantitative methods of mathematical analysis and modelling in a way that enables the investigation of dynamical performance. Along with mathematical methods for representing, analysing and controlling objects, systems biology also brings with it the philosophy of a systems approach: an approach which attempts to view the behaviour of a process as the integrated sum of its parts. The idea of a systems approach offers a helpful counterbalance to the qualitative and reductionist methods that are often necessary within the environment of the experimental biologist. In particular, systems biology offers a unifying framework for integration and quantification of information from the disparate parts of a biological system. With respect to the quantification, important recent systems biology efforts are focused upon accurately quantifying measurements [5] and the associated instrumentation. The integration of biological knowledge, on the other hand, finds its most useful expression in mathematical modelling [6,7]. As discussed in the present chapter, a mathematical model provides a framework for integration of experimental results from numerous disparate sources. As part of this process, static and descriptive cell signalling diagrams are turned into mathematical representations where the dynamics of the signalling elements can be studied within a single model based upon quantitative measurements. Our aim in the present chapter is to outline the steps and methods that are commonly used to obtain a mathematical model of the dynamics of a cellular signalling pathway.
The first step is to choose the appropriate modelling framework for the signalling pathway under consideration: this process is discussed in the section on ‘Modelling frameworks’ below. As the construction of a mathematical model depends on knowledge of the biochemical components and the associated reactions involved, we also show how to access the electronic resources for obtaining this information in the section on ‘Collecting part lists’ below. Finally, the mathematical model must be calibrated and validated against experimental observations and data. The methods and challenges in this step are discussed in the section on ‘Model calibration and validation’ below.
S.N. Sreenath, K.-H. Cho & P. Wellstead
Modelling frameworks
The construction of a mathematical model that is representative and yet manageable is an art that has been extensively developed in the physical sciences. Thus biologists wishing to obtain maximum value from mathematical modelling are strongly advised to review the systematic methods that have been developed for physical system modelling [8]. The modelling procedures followed in the biological sciences have many similarities with modelling the physical world, but there are specific differences which will be outlined here. The first choice presented in biological system modelling is which form of model is appropriate for the signalling pathway under consideration. The various model choices that are possible are indicated in Figure 1. Starting with the left-hand column (deterministic or stochastic), there is a distinction between deterministic reactions that are assumed to take place in uniform biochemical environments, such as might be found in a biochemical reactor, and those where there is a paucity of molecules to participate in the reactions. In this latter case, an element of randomness occurs and a stochastic model form is required. However, deterministic models are simpler and faster in computation, so they are usually the starting point for model construction, until experimental results indicate otherwise.
Considering the second column from the left (continuous or discrete time), most signalling pathway models use continuous time models based upon ODE (Ordinary Differential Equation) sets. Discrete time equations are only infrequently appropriate and for current purposes can be ignored. Models based upon PDE (Partial Differential Equation) sets are relevant for the case in which the various biochemical reactions are not strictly segregated. For example, Amonlirdviman et al. [9] employed PDEs to describe a pathway (planar cell polarity signalling) in which a protein in one cell interacts with another in the neighbouring cells. Both continuous and discrete time forms can be either deterministic [10,11] or stochastic [12,13].
Considering the third column from the left (continuous state or discrete state), since reactions usually occur in a regular and repeatable sequence, continuous state models are the form normally used. However, in certain cases, there are biochemical reactions that occur only under specific logical conditions, such as expression of a gene or at a threshold protein concentration. Such reaction sequences would involve discrete state models. However, such discrete events usually represent only a part of a mainly continuous reaction sequence within a signalling pathway. Such a system can generally be represented as a hybrid model (see the far right-hand column of Figure 1) that combines the continuous dynamics of the pathway, as described by the ODE, with automata or finite state machines (as in [14,15]) for the discrete states. Figure 1 is only indicative of the modelling subdivisions and alternatives. Other helpful classifications are given elsewhere [16,17].
Figure 1. Modelling architecture. Columns, from the left: deterministic or stochastic (Monte-Carlo, Gillespie); continuous time (ODE, PDE) or discrete time (difference equation); continuous state (ODE, PDE, difference equation) or discrete state (finite state); hybrid.
Deterministic ODE and PDE models
In the present chapter we will focus on the most commonly encountered model forms. These are the deterministic and stochastic continuous time model frameworks. Of these, the ODE approach is the most frequently used and the most straightforward to understand. We will therefore begin the discussion with an illustrative example of this most commonly encountered model in its deterministic form. The discussion will then pass to a more formal presentation of the general steps in ODE modelling of pathway dynamics in both the deterministic and stochastic forms.
An illustrative example
The basis of deterministic modelling is the application of the mass action law, and various approximations [18,19], to the sequence of biochemical reactions that occur in a signalling pathway. The mass action law was introduced in the 19th century by Guldberg and Waage [20] and states that the rate of a chemical reaction is proportional to the probability that the reacting species (molecules) collide. This collision probability is in turn proportional to the concentration of the reactants.
To illustrate the use of the mass action law for the mathematical representation of biochemical reactions, we will consider five biochemical species (A, B, C, D and E) whose reactions are represented schematically in the prototype signalling pathway shown in Figure 2. These reactions can be written as follows (eqns 1 and 2):
$$A + B \underset{k_2}{\overset{k_1}{\rightleftharpoons}} C \xrightarrow{k_3} A + D, \tag{1}$$
$$2D \xrightarrow{k_4} E, \tag{2}$$
where $k_1$, $k_2$, $k_3$ and $k_4$ are the rate constants of the individual reactions. Here, biochemical species A and B combine at a rate of $k_1$ to form a new species C. In turn, C breaks down into species A and B at rate $k_2$, and also into A and a new species D at rate $k_3$. Two molecules of D then combine at rate $k_4$ to form a further new species E. The rate constants ($k_1$, $k_2$, $k_3$, $k_4$) determine how much of each biochemical species is formed over time.
Figure 2. Typical mechanism of a signalling pathway (schematic with species labelled A to G and steps labelled Transcription and Translation)
For each reaction in eqns (1) and (2), the biochemical species on the left-hand side are called reactants, and those on the right-hand side are products. Based on the mass action law, the rate of each reaction is directly proportional to the product of the reactant concentrations (eqns 3–6) [18]:
$$\upsilon_1 = k_1[A][B], \tag{3}$$
$$\upsilon_2 = k_2[C], \tag{4}$$
$$\upsilon_3 = k_3[C], \tag{5}$$
$$\upsilon_4 = k_4[D]^2, \tag{6}$$
where the brackets indicate the concentration of the species, typically expressed in nM. The rate of change of the concentration of each biochemical species is determined by the net rate of the production and consumption reactions and the stoichiometry. A central concept in mathematical modelling of a dynamical system is the state of a system [8]. The system states are important because they give the minimal set of information required to completely determine the behaviour of the system over the course of time. In signalling pathway models, the normal choice for the set of states is the set of concentrations. Thus in our illustrative example, the state x denotes the concentration vector of the biochemical species, $x = ([A],[B],[C],[D],[E])^T$, such that (eqn 7):
$$\begin{bmatrix}\dot{x}_1\\ \dot{x}_2\\ \dot{x}_3\\ \dot{x}_4\\ \dot{x}_5\end{bmatrix} = \begin{bmatrix}-\upsilon_1 + \upsilon_2 + \upsilon_3\\ -\upsilon_1 + \upsilon_2\\ \upsilon_1 - \upsilon_2 - \upsilon_3\\ \upsilon_3 - 2\upsilon_4\\ \upsilon_4\end{bmatrix}. \tag{7}$$
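As a minimal sketch of how eqns (3)–(7) can be turned into a simulation, the following Python fragment integrates the mass action model with a classical fourth-order Runge–Kutta step. The rate constants, initial concentrations, step size and function names are illustrative assumptions and are not taken from the chapter.

```python
def rates(x, k):
    """Reaction rates of eqns (3)-(6) for the state x = ([A], [B], [C], [D], [E])."""
    A, B, C, D, E = x
    k1, k2, k3, k4 = k
    return (k1 * A * B, k2 * C, k3 * C, k4 * D ** 2)


def dxdt(x, k):
    """Right-hand side of eqn (7)."""
    v1, v2, v3, v4 = rates(x, k)
    return [
        -v1 + v2 + v3,  # d[A]/dt
        -v1 + v2,       # d[B]/dt
        v1 - v2 - v3,   # d[C]/dt
        v3 - 2 * v4,    # d[D]/dt
        v4,             # d[E]/dt
    ]


def rk4_step(x, k, h):
    """One classical Runge-Kutta step of size h."""
    def shifted(base, slope, scale):
        return [b + scale * s for b, s in zip(base, slope)]

    s1 = dxdt(x, k)
    s2 = dxdt(shifted(x, s1, h / 2), k)
    s3 = dxdt(shifted(x, s2, h / 2), k)
    s4 = dxdt(shifted(x, s3, h), k)
    return [xi + h / 6 * (a + 2 * b + 2 * c + d)
            for xi, a, b, c, d in zip(x, s1, s2, s3, s4)]


def simulate(x0, k, t_end, h=0.001):
    """Integrate eqn (7) from t = 0 to t = t_end and return the final state."""
    x = list(x0)
    for _ in range(int(round(t_end / h))):
        x = rk4_step(x, k, h)
    return x


# Assumed, purely illustrative values (not from the chapter).
K_EXAMPLE = (1.0, 0.5, 0.3, 0.2)          # k1, k2, k3, k4
X0_EXAMPLE = (1.0, 2.0, 0.0, 0.0, 0.0)    # [A], [B], [C], [D], [E] at t = 0
x_final = simulate(X0_EXAMPLE, K_EXAMPLE, t_end=10.0)
print(x_final)
```

Two linear combinations of the states, [A] + [C] and [B] + [C] + [D] + 2[E], have zero time derivative in eqn (7), so checking that they stay constant is a convenient sanity test for any implementation.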
If species A is an enzyme, we can approximate eqn (7) in a simpler form using a Michaelis–Menten approximation [19]. This is possible if we can assume that the concentration of the complex C formed by A and B does not change significantly with respect to the other species. This is called a quasi-steady state since no apparent change in the complex is observed. In other words, at the quasi-steady state, we have (eqn 8):
$$\dot{x}_3 = 0. \tag{8}$$
So, from eqns (7) and (8), we have (eqn 9):
$$k_1[A][B] - (k_2 + k_3)[C] = 0. \tag{9}$$
Moreover, since the total concentration of enzyme A should not change, we have (eqn 10):
$$[A] + [C] = A_0 \quad \text{(constant)}. \tag{10}$$
From eqns (9) and (10) we derive eqn (11):
$$[C] = \frac{A_0[B]}{K_m + [B]}, \tag{11}$$
where $K_m = (k_2 + k_3)/k_1$ is called the Michaelis–Menten constant.
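The intermediate algebra between eqns (9), (10) and (11) can be written out as follows. This is a worked sketch that uses only the symbols already defined above.

$$
\begin{aligned}
[A] &= A_0 - [C] && \text{from eqn (10)},\\
k_1\,(A_0 - [C])\,[B] - (k_2 + k_3)\,[C] &= 0 && \text{substituting into eqn (9)},\\
k_1 A_0 [B] &= \bigl(k_1 [B] + k_2 + k_3\bigr)\,[C],\\
[C] &= \frac{k_1 A_0 [B]}{k_1 [B] + k_2 + k_3} = \frac{A_0 [B]}{\dfrac{k_2 + k_3}{k_1} + [B]}.
\end{aligned}
$$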
If we substitute eqn (11) into eqn (7), then we have (eqn 12):
$$\begin{bmatrix}\dot{x}_1\\ \dot{x}_2\\ \dot{x}_3\\ \dot{x}_4\\ \dot{x}_5\end{bmatrix} = \begin{bmatrix}0\\ -\dfrac{k_3 A_0 [B]}{K_m + [B]}\\ 0\\ \dfrac{k_3 A_0 [B]}{K_m + [B]} - 2k_4[D]^2\\ k_4[D]^2\end{bmatrix}. \tag{12}$$
Eqn (12) can be further reduced to the following dynamical equation (eqn 13) having three state variables:
$$\begin{bmatrix}\dot{x}_2\\ \dot{x}_4\\ \dot{x}_5\end{bmatrix} = \begin{bmatrix}-\dfrac{k_3 A_0 [B]}{K_m + [B]}\\ \dfrac{k_3 A_0 [B]}{K_m + [B]} - 2k_4[D]^2\\ k_4[D]^2\end{bmatrix}. \tag{13}$$
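The reduced model of eqn (13) can be sketched in a few lines of Python. The explicit Euler integrator, the parameter values and all function names below are assumptions chosen for illustration. The total enzyme concentration is deliberately set well below the initial substrate concentration, which is the regime in which a quasi-steady-state assumption is usually considered reasonable.

```python
def reduced_rhs(state, k3, k4, A0, Km):
    """Right-hand side of eqn (13) for the reduced state ([B], [D], [E])."""
    B, D, E = state
    r = k3 * A0 * B / (Km + B)  # Michaelis-Menten rate at which B is converted to D
    return (-r, r - 2.0 * k4 * D ** 2, k4 * D ** 2)


def integrate_euler(state0, t_end, h, k3, k4, A0, Km):
    """Explicit Euler integration (a simple stand-in for a production ODE solver)."""
    state = list(state0)
    for _ in range(int(round(t_end / h))):
        slope = reduced_rhs(state, k3, k4, A0, Km)
        state = [s + h * ds for s, ds in zip(state, slope)]
    return state


# Assumed, purely illustrative values (not from the chapter).
k1, k2, k3, k4 = 1.0, 0.5, 0.3, 0.2
Km = (k2 + k3) / k1   # Michaelis-Menten constant defined below eqn (11)
A0 = 0.1              # total enzyme concentration, [A] + [C]
B_end, D_end, E_end = integrate_euler((2.0, 0.0, 0.0), t_end=10.0, h=0.001,
                                      k3=k3, k4=k4, A0=A0, Km=Km)
print(B_end, D_end, E_end)
```

In eqn (13) the combination [B] + [D] + 2[E] has zero time derivative, so it should remain at its initial value throughout the run.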
Thus we have two possible representations of the reaction sequence illustrated in Figure 2: one based upon mass action kinetics, eqn (7), and the Michaelis–Menten form, eqn (13), which is valid under certain quasi-steady-state conditions. In general, biochemical reaction networks are large and complex, so they are usually represented by differential equations with a large number of state variables. It is thus important to recognize which equations may be approximated by simpler forms by the use of methods such as the Michaelis–Menten approach.
A general ODE formulation
The example given in the previous section illustrates the key steps in constructing a mathematical model of the signalling pathway dynamics. In general a signalling pathway will be more complex than this example, and to this end we will describe the general steps that would occur in constructing an ODE model. The reaction sequences in the illustrative example can be generalized for any number of reactions by noting that any elementary reaction j can be written as (eqn 14):
$$R_j:\; s^{in}_{1,j}x_1 + s^{in}_{2,j}x_2 + \dots + s^{in}_{N,j}x_N \xrightarrow{k_j} s^{out}_{1,j}x_1 + s^{out}_{2,j}x_2 + \dots + s^{out}_{N,j}x_N, \tag{14}$$
where the $x_i$, for $i = 1, \dots, N$, are the concentrations of the biochemical species and, in reaction j, $s^{in}_{i,j}$ is the stoichiometric coefficient of the i-th species acting as a reactant and $s^{out}_{i,j}$ is the stoichiometric coefficient of the i-th species acting as a product, with $i = 1, \dots, N$ and $j = 1, \dots, P$. The stoichiometric coefficient of each species is equal to the number of molecules of species i involved in reaction j. Since the reactions in the signalling pathway are mathematically transformed not only using the mass action law, but also using the assumption of Michaelis–Menten kinetics [18,19] as discussed in the illustrative example, we separate the reactions according to their kinetic assumption by adding subscripts to their reaction (R) and rate constant (k): ma for mass-action based kinetics and MM for Michaelis–Menten kinetics. Similarly, we will also distinguish reactions involving the ligand input as $R_{ip}$ and the rate constant as $k_{ip}$. The pathway being studied can now be represented as a set of non-linear differential equations (also referred to as the dynamical state equations), in which u denotes the ligand input and y the model output, thus (eqns 15a, 15b and 16):
$$\dot{x} = f_1(x, k_{MM}) + f_2(x, k_{ma}) + g(x, k_{ip})u, \tag{15a}$$
$$y = h(x, u), \tag{15b}$$
$$s_{i,j} = -s^{in}_{i,j} + s^{out}_{i,j}, \tag{16}$$
in which $s_{i,j}$ is an element of the stoichiometry matrix S that takes integer values ($S \in \mathbb{Z}^{N \times P}$, with one row per species and one column per reaction). Then (eqns 17a and 17b):
$$f_1(x, k_{MM}) = S_{MM}\, v_{MM}(x, k_{MM}), \tag{17a}$$
$$f_2(x, k_{ma}) = S_{ma}\, v_{ma}(x, k_{ma}). \tag{17b}$$
Here the subscripts MM and ma indicate Michaelis–Menten and mass action reactions respectively. The vector function $v_{MM}$ is described in [18,19], whereas the vector function $v_{ma}$ is derived in [21] as (eqn 18):
$$v_{ma} = \mathrm{diag}(k_{ma}) \exp\left(S_{in}^{T} \log(x)\right), \tag{18}$$
where the matrix $S_{in}$ has $s^{in}_{i,j}$ as its elements. Here, exp(.) and log(.) are element-wise operations. Elements of $v_{ma}$ are typically formulated as (eqn 19):
$$\upsilon_{ma,j} = k_j \prod_i \left(x^{in}_{i,j}\right)^{s^{in}_{i,j}}, \quad \forall j \in [1, \dots, P]. \tag{19}$$
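To make eqns (16)–(19) concrete, the sketch below evaluates the mass action rate vector and the product of the stoichiometry matrix with it for the illustrative reactions of eqns (1) and (2). The matrices are written with one row per species and one column per reaction. The function names, the numerical values and the handling of zero concentrations are assumptions made for this example.

```python
import math


def mass_action_rates(k, S_in, x):
    """Eqn (18) element by element: v_j = k_j * exp(sum_i S_in[i][j] * log(x_i))."""
    v = []
    for j, k_j in enumerate(k):
        log_sum = 0.0
        reactant_absent = False
        for i, x_i in enumerate(x):
            s = S_in[i][j]
            if s == 0:
                continue               # species i is not a reactant of reaction j
            if x_i <= 0.0:
                reactant_absent = True  # log is undefined; the rate is taken as zero
                break
            log_sum += s * math.log(x_i)
        v.append(0.0 if reactant_absent else k_j * math.exp(log_sum))
    return v


def net_rate_of_change(S_in, S_out, v):
    """Product S v, with S = S_out - S_in as in eqn (16)."""
    return [sum((S_out[i][j] - S_in[i][j]) * v[j] for j in range(len(v)))
            for i in range(len(S_in))]


# Rows: A, B, C, D, E. Columns: the four reactions with rates v1..v4 of eqns (3)-(6).
S_IN = [[1, 0, 0, 0],
        [1, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 2],
        [0, 0, 0, 0]]
S_OUT = [[0, 1, 1, 0],
         [0, 1, 0, 0],
         [1, 0, 0, 0],
         [0, 0, 1, 0],
         [0, 0, 0, 1]]

k_ma = [1.0, 0.5, 0.3, 0.2]        # assumed rate constants
x = [1.0, 2.0, 0.5, 0.4, 0.0]      # assumed concentrations of A, B, C, D, E
v_ma = mass_action_rates(k_ma, S_IN, x)
f2 = net_rate_of_change(S_IN, S_OUT, v_ma)
print(v_ma, f2)
```

For these matrices the product S v reproduces the right-hand side of eqn (7) row by row, which gives a quick check that the stoichiometry has been entered correctly.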
Here $x^{in}_{i,j}$ is species i acting as a reactant in reaction j. The function $g(x, k_{ip})$ is defined as follows (eqn 20):
$$g(x, k_{ip}) = \mathrm{diag}(k_{ip}) \exp\left(S_{ip}^{T} \log(x)\right) B, \tag{20}$$
where B is a stoichiometry matrix for the input. In eqn (20) both exp(.) and log(.) are again element-wise operations. Note that the system in eqns (15a) and (15b) assumes that the ligand–receptor binding has 1:1 binding stoichiometry.
The ODE set, eqn (15), is our deterministic continuous time mathematical model of the signalling pathway. It describes the change of each biochemical species in the pathway due to production, consumption and degradation of the species over time. The ODE approach can also be used to describe the transfer of biochemical species from one compartment to another, such as from cytoplasm to nucleus [22], in the following circumstances: (i) the compartments are well-stirred and (ii) the rates of transport between compartments are observable. A well-stirred compartment means the biochemical species are evenly distributed in space. When these conditions are not satisfied, a PDE becomes necessary to account for the spatial distribution of the biochemical species. A full description of the PDE formulation is beyond the scope of the present chapter; however, in general terms we note that it is based on a diffusion–reaction representation. In this form, and accounting for the spatial distribution of the biochemical species in the pathway, the system in eqn (15) would be rewritten as (eqn 21):
$$\frac{\partial x}{\partial t} = f_1(x, k_{MM}) + f_2(x, k_{ma}) + g(x, k_{ip})u + \mu^{T} \nabla^{2} x, \tag{21}$$
where the elements of the vector μ denote the diffusion rate constants. For further information on PDE methods and spatial modelling based on diffusion–reaction equations see [23].
Stochastic models of signalling pathways
The deterministic ordinary differential equation approaches described in the previous section give a mathematical model in which, for any starting condition, the way in which each of the states (the concentrations of the various biochemical species) varies can be exactly predicted. The underlying assumption of the deterministic model is that the concentrations of the various chemical species are high and that at any reaction point they are well mixed. However, this assumption is not always correct in cellular reactions. Specifically, as the numbers of molecules (or concentrations) become lower, the variability of the molecular population in each stage of a signalling pathway increases [12], and with it come random variations in the reaction processes. The interpretation of what constitutes a ‘low’ number of molecules is a subject of debate. However, Klipp et al. [24] have suggested that when the number of molecules is of the order of dozens or hundreds, the random element in reactions becomes significant, and a stochastic differential equation method should be employed.
Stochastic framework
A key difference between deterministic modelling and stochastic modelling is that instead of accounting for the change in concentration of each species over time (as in the deterministic case), in stochastic modelling we track changes in the number of molecules of each of the biochemical species. With this key difference in mind, we denote the number of molecules of each species at time t as the random variable:
$$Z(t) = \left(\#Z_1(t), \#Z_2(t), \dots, \#Z_i(t), \dots, \#Z_n(t)\right).$$
Thus, for example, in the stochastic version of the illustrative example above, the state vector will become $Z(t) = (\#A, \#B, \#C, \#D)^T$ with the initial state of the system $Z(t_0) = z_0$. When the first reaction occurs, the system will move from state $z_0$ to a new state $z^*$ shown below (eqn 22):
$$z_0 = \begin{bmatrix}\#A_0\\ \#B_0\\ \#C_0\\ \#D_0\end{bmatrix}, \qquad z^* = \begin{bmatrix}\#A_0 - 1\\ \#B_0 - 1\\ \#C_0 + 1\\ \#D_0\end{bmatrix}. \tag{22}$$
Here the change in the number of molecules for each species from state $z_0$ to $z^*$ is equal to its corresponding stoichiometry. This state transition depends on the probability that the changes due to the first reaction occur, as described by an object termed the propensity function. For any reaction j as in eqn (14), the propensity function is (eqn 23) [25]:
$$a_j(z)\,dt \triangleq \text{the probability, given } Z(t) = z, \text{ that one } R_j \text{ reaction will occur in the time interval } [t, t + dt]. \tag{23}$$
This state transition results in a change in the number of molecules of each species. As previously shown in eqn (22), the change in the number of molecules for the state variables, due to any reaction j in eqn (14), is equivalent to the net stoichiometry $\nu_j$, whose i-th element is defined, consistently with eqn (16), as (eqn 24):
$$(\nu_j)_i = -s^{in}_{i,j} + s^{out}_{i,j}. \tag{24}$$
We can now describe the transition to state z from another state, due to the j-th reaction, as (eqn 25):
$$z - \nu_j \xrightarrow{a_j(z - \nu_j)} z, \tag{25}$$
whereas the transition away from state z, caused by the j-th reaction, is (eqn 26):
$$z \xrightarrow{a_j(z)} z + \nu_j. \tag{26}$$
Let us define the following as the probability that each species exists in z number of molecules at any time t (eqn 27):
$$P(z, t \mid z_0, t_0) \triangleq \mathrm{Prob}\{Z(t) = z, \text{ given } Z(t_0) = z_0\}. \tag{27}$$
The Chemical Master Equation then describes the time evolution of the above probability by taking into account the probability that any reaction j occurs (the propensity function) (eqn 28):
$$\frac{\partial P(z, t \mid z_0, t_0)}{\partial t} = \sum_j \left[a_j(z - \nu_j)\,P(z - \nu_j, t \mid z_0, t_0) - a_j(z)\,P(z, t \mid z_0, t_0)\right]. \tag{28}$$
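As a small worked instance of eqn (28), consider a hypothetical single-species degradation reaction $X \rightarrow \varnothing$. It is not part of the pathway in this chapter and is introduced only for illustration, with an assumed propensity $a(z) = c\,z$ and net stoichiometry $\nu = -1$. Substituting into eqn (28) gives

$$
\frac{\partial P(z, t \mid z_0, t_0)}{\partial t} = c\,(z + 1)\,P(z + 1, t \mid z_0, t_0) - c\,z\,P(z, t \mid z_0, t_0), \qquad z = 0, 1, \dots, z_0 .
$$

Even in this simplest case there is one coupled differential equation for every admissible molecule count, that is, $z_0 + 1$ equations. With N species that can each take values between 0 and $z_0$, the number of probability variables can be as large as $(z_0 + 1)^N$.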
This equation tracks the change in the probability that the system variable Z exists in some number of molecules as a result of all of the reactions involved in the pathway. Unfortunately, the size of the Chemical Master Equation grows exponentially as the number of species increases, because it essentially creates a probability variable for each possible state (each possible number of molecules for each species). To overcome this problem Gillespie [26] developed an algorithm to efficiently simulate the Chemical Master Equation.
Stochastic simulation
The key idea in the Gillespie algorithm is to calculate individual trajectories of the species, instead of solving for the individual state transition probabilities. In practical terms this means that at each iteration, the algorithm determines the propensity function, $a_j$, given as (eqn 29):
$$a_j = c_j \prod_i \binom{z_i}{s^{in}_{i,j}}, \quad \forall j = 1, \dots, P, \tag{29}$$
where $c_j$ is the stochastic rate constant (eqn 30):
$$c_j = \frac{k_j}{(N_A V)^{K_j - 1}} \prod_i s^{in}_{i,j}!, \quad \forall j = 1, \dots, P. \tag{30}$$
Here, $N_A$ is Avogadro's number, V is the reaction volume and $K_j = \sum_i s^{in}_{i,j}$. As we can see in eqn (30), the stochastic rate constant is related to the deterministic rate constant, since both the Chemical Master Equation and the mass action law were derived using molecular physical properties.
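Eqns (29) and (30) translate directly into code. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the deterministic rate constant is assumed to be expressed in molar units and the volume in litres, and the numerical values are arbitrary.

```python
from math import comb, factorial

AVOGADRO = 6.02214076e23  # molecules per mole


def stochastic_rate_constant(k_j, s_in_j, volume_litres):
    """Eqn (30): c_j = k_j / (N_A V)^(K_j - 1) * prod_i (s_in[i]!)."""
    K_j = sum(s_in_j)  # total number of reactant molecules in reaction j
    factorial_product = 1
    for s in s_in_j:
        factorial_product *= factorial(s)
    return k_j / (AVOGADRO * volume_litres) ** (K_j - 1) * factorial_product


def propensity(c_j, s_in_j, z):
    """Eqn (29): a_j = c_j * prod_i C(z_i, s_in[i]), with C the binomial coefficient."""
    a_j = c_j
    for z_i, s in zip(z, s_in_j):
        a_j *= comb(z_i, s)  # comb returns 0 when fewer than s molecules are present
    return a_j


# Reaction 2D -> E of eqn (2); species order A, B, C, D, E.
s_in_dimer = [0, 0, 0, 2, 0]
a_dimer = propensity(2.0, s_in_dimer, [0, 0, 0, 5, 0])   # 2.0 * C(5, 2)
c_first_order = stochastic_rate_constant(0.3, [0, 0, 1, 0, 0], 1e-15)
print(a_dimer, c_first_order)
```

For a first-order reaction $K_j = 1$, so the volume term drops out and $c_j = k_j$. For the dimerization, the factorial term in eqn (30) cancels the factor of one half that appears in the binomial coefficient of eqn (29).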
With the propensity function defined in eqn (29), it is possible to delineate the Gillespie algorithm (O. Wolkenhauer, unpublished work):
1. Initialization: load rate constants, $c_1, c_2, \dots, c_P$, and use random number generator to assign initial numbers of molecules for each species.
2. Compute $a_j$ for each reaction $R_j$.
3. Calculate the combined propensity, $a^* = \sum_j a_j$.
4. Generate uniform random numbers, $r_1$, $r_2$.
5. Compute $\tau = -\left(\frac{1}{a^*}\right) \ln r_1$.
6. Determine μ such that $\sum_{j=1}^{\mu - 1} a_j \le r_2\, a^* \le \sum_{j=1}^{\mu} a_j$.
7. Update the number of molecules and set $t = t + \tau$.
8. Return to Step 2, unless $t \ge t_{max}$ or $a^* = 0$.
For more details of the algorithm and proofs of eqns (29) and (30) see [12, 26, 28]. The computer implementation of the Gillespie algorithm is a non-trivial task. Thus it is useful to note that several biochemical simulation software suites have built-in implementations of the Gillespie algorithm; these suites include Copasi [29], Dizzy [30] and STOCKS [31].
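The eight steps above can be sketched in plain Python as follows. This is a minimal illustration and not the implementation used by any of the packages mentioned: the function name, the stochastic rate constants and the initial molecule numbers are assumptions, and the initial state is passed in explicitly rather than drawn at random.

```python
import math
import random
from math import comb


def gillespie(z0, c, s_in, nu, t_max, rng):
    """Direct-method simulation; returns a list of (time, state) pairs.

    s_in[j] holds the reactant coefficients of reaction j and nu[j] its net
    stoichiometry, both listed per species.
    """
    t = 0.0
    z = list(z0)
    trajectory = [(t, tuple(z))]
    while t < t_max:                                   # Step 8: stop once t >= t_max
        # Step 2: propensities a_j from eqn (29).
        a = []
        for c_j, coeffs in zip(c, s_in):
            a_j = c_j
            for z_i, s in zip(z, coeffs):
                a_j *= comb(z_i, s)
            a.append(a_j)
        a_total = sum(a)                               # Step 3
        if a_total == 0:                               # Step 8: no reaction can fire
            break
        r1 = 1.0 - rng.random()                        # Step 4; r1 in (0, 1] avoids log(0)
        r2 = rng.random()
        tau = -math.log(r1) / a_total                  # Step 5
        # Step 6: first reaction whose cumulative propensity exceeds r2 * a_total.
        threshold = r2 * a_total
        cumulative = 0.0
        mu = None
        for j, a_j in enumerate(a):
            if a_j == 0:
                continue
            mu = j
            cumulative += a_j
            if threshold < cumulative:
                break
        # Step 7: apply the net stoichiometry and advance time.
        z = [z_i + d for z_i, d in zip(z, nu[mu])]
        t += tau
        trajectory.append((t, tuple(z)))
    return trajectory


# Reactions of eqns (1) and (2); species order A, B, C, D, E.
S_IN = [[1, 1, 0, 0, 0], [0, 0, 1, 0, 0], [0, 0, 1, 0, 0], [0, 0, 0, 2, 0]]
NU = [[-1, -1, 1, 0, 0], [1, 1, -1, 0, 0], [1, 0, -1, 1, 0], [0, 0, 0, -2, 1]]
C_EXAMPLE = (0.01, 0.5, 0.3, 0.02)   # assumed stochastic rate constants
Z0 = (20, 50, 0, 0, 0)               # assumed initial molecule numbers
traj = gillespie(Z0, C_EXAMPLE, S_IN, NU, t_max=50.0, rng=random.Random(0))
print(len(traj), traj[-1])
```

Each call with a different random seed yields a different trajectory; averaging many such runs is one common way to compare the stochastic model with its deterministic counterpart.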
Collecting ‘part’ lists
Writing the equations for a mathematical model, as explained in the previous section, is only the first part of the modelling task; values must be assigned to the various coefficients and parameters in the mathematical model. This calibration stage of modelling requires knowledge of biochemical interactions, stoichiometry of the components in the reactions and the rate constants. It is rare for a single laboratory to have the resources to determine all the parameters required in a model. It is therefore customary to collect parameter information from appropriate biological in vitro and in vivo experiments as they have been reported in the literature. This information is often available in web-based databases which have consolidated published information on interactions between proteins, DNA and DNA–protein. Mathematical models of cell signalling dynamics require reaction rate information and unfortunately the number of reaction rate constants reported or collected in the databases is low. This is mainly due to experimental limitations in biology and to the dependence of rate constants on various environmental factors and variations in experimental technique. As a result, most of the rate constants in the dynamical model are obtained through the calibration process discussed in the section on ‘Model calibration and validation’ below. As a precursor to this, we describe some popular and easily accessible databases.
Protein interaction and cellular pathway databases
A growing number of databases are available which contain information useful in signalling pathway modelling. Such databases include the list of interactions, the rate constants (estimated and obtained elsewhere), and, in some cases, the equations. As a guide to what is available, Table 1 is a list of some useful databases of biochemical interactions, the rate constants and mathematical models. This list is presented with the caveat that the internet changes rapidly, so that the location and content of the resources in Table 1 may change from time to time.

Table 1. Databases for protein interactions, mathematical models and parameters
| Name | Information content |
| BioCarta | Map of signalling pathway with a brief description and references on the pathway |
| BioModels | Database of annotated published mathematical models, downloadable in SBML |
| BOND | Biomolecular interaction database |
| CellML Model Repository | Curated signalling pathway model, downloadable in CellML |
| DIP | Database of interacting proteins with few associated kinetic parameters |
| DOQCS | Includes list of interactions and the kinetic parameters. The models are downloadable in Genesis/kinetikit, SBML and Matlab |
| JWS | Online simulator for biochemical models that includes curated mathematical models (downloadable in SBML and PySCeS) |
| KEGG | Curated map of signalling and metabolic pathways with links to references in the literature |
| PANTHER | Includes signalling pathway components along with the interactions. Downloadable in SBML and compatible with CellDesigner |
| PathGuide | Lists resources for signalling pathways (i.e. list of databases of protein interactions) |
| SPAD | Map of signalling pathways sorted based on the extracellular ligand |
| The Cancer Cell Map | Contains selected cancer-related signalling pathways |

The availability of easily accessible databases is of great value, but the parameters taken from these databases should be used with care. Specifically, it is important to remember that the biochemical reactions involved in a pathway can vary between organisms. In addition, the reliability of the reported biochemical reactions needs to be carefully taken into account, as biological systems have high variability. The issue of model complexity is again relevant here. One should include in the pathway model only that level of biochemical detail that is absolutely necessary. Once an initial model has been formulated, Occam’s Razor should be applied again to ensure that the model is no more complex than necessary. The simpler the model, the more straightforward it will be to calibrate and implement. Even if the model is simpler than thought biologically necessary, it still might be more appropriate than a complex model in which many of the parameters are unknown or of poor quality. In general, the choice of model complexity is based on a balance between the biological question under study and the feasibility of building and parametrising the model.
Modelling representation tools
With the increasing popularity of mathematical modelling as a tool to analyse biological systems, efforts have been made to enable sharing of signalling pathway models in a standardized format. Representative of this standardization is the development of XML-based markup languages such as SBML (Systems Biology Markup Language) [32] and CellML [33]. The standardization that SBML and CellML provide has led to an increasing exchange of models.
In the section on modelling frameworks above we discussed the various methodologies for deriving a mathematical model of cell signalling dynamics. However, there are a number of computational tools available which assist in model building. These tools are primarily based on graphical representations of biochemical (cell signalling and metabolic) and gene networks with a range of built-in facilities. These include the automatic generation of mathematical models (discrete, ODE, PDE, hybrid, etc.), the simulation and graphical presentation of their dynamical behaviour and the numerical analysis of the simulation results. A summary of some popular software systems is given in Table 2. These tools are compatible to varying degrees with popular scientific and mathematical software such as MATLAB, Maple and Mathematica, and interface (import/export) with SBML. In the event that the researcher wishes to work in MATLAB, but needs some degree of standardization for biology, the freeware Systems Biology Toolbox is recommended [34].
Example
In order to illustrate the procedures described thus far, we now review the mathematical modelling of an example pathway. Specifically, consider the task of modelling the cross-talk between two pathways: the IL-1 (interleukin-1)-induced NF-κB (nuclear factor κB) and TGF-β (transforming growth factor β)-induced Smad pathways.
IL-1 acts as a ligand in the pathway, and it belongs to the cytokine family, a group of proteins similar to hormones that are involved in the innate immune system. Currently known functions of IL-1 cytokines include raising body temperature, controlling lymphocytes and increasing the number of bone marrow cells. IL-1 can be secreted by many types of cells, but primarily by macrophages. Downstream of IL-1 is NF-κB, a protein complex (composed of two or more proteins) that is involved in regulating immune response to infection. Aberrant activation of NF-κB has been linked to cancer, inflammatory and autoimmune diseases, septic shock and viral infection, amongst other conditions. NF-κB functions as a transcription factor so, in the absence of ligand stimulus, it is kept inactive in the cytoplasm by binding to an inhibitory protein called IκB (inhibitor of NF-κB). NF-κB will translocate to the nucleus if the upstream signalling components degrade the IκB protein. There are various ligands, including IL-1, that can set NF-κB free to enter the nucleus and to initiate gene transcription. The exact mechanisms of NF-κB
0045-0002 Sreenath.indd Sec1:13
6/24/08 1:37:01 AM
14
Essays in Biochemistry volume 45 2008
Table 2. Examples of modelling and simulation tools Name CellDesigner
Description and Major feaures
Website
A diagram editor for drawing gene-
www.celldesigner.org
regulatory and biochemical networks Modelling of biochemical networks and gene networks Import (export) to SBML, Matlab formats Uses BioModel database ODE simulations JDesigner
A visual design tool for building signalling
http://www.jdesigner.org
networks Modelling of signalling pathways, and metabolic and gene networks Import (export) to SBML formats ODE simulations PathwayLab
In silico analysis of signalling pathways
http://innetics.com
Modelling of biochemical networks Import (export) to Mathematica, Matlab and SBML formats ODE simulations Netbuilder
A graphical representation tool for drawing/
http://strc.herts.ac.uk/bio/
simulation of gene regulatory networks
maria/Net Builder/index.
Modelling of genetic regulatory networks
html
Import (export) to SBML format Boolean simulations Cellerator
A biological modelling tool based on
www.cellerator.info
automated equation generations Modelling of biochemical networks Import (export) to Mathematica, Matlab, SBML formats ODE simulations COPASI
A tool for simulation and analysis of
www.copasi.org
biochemical networks Modelling of biochemical networks Import (export) to SBML formats ODE simulations GEPASI
A tool for simulating the kinetics of
www.gepasi.org
biochemical reactions Modelling of biochemical networks Import (export) to SBML format
© The Authors Journal compilation © 2008 Biochemical Society
0045-0002 Sreenath.indd Sec1:14
6/24/08 1:37:01 AM
S.N. Sreenath, K.-H. Cho & P. Wellstead
15
Table 2. Continued Name
Description and Major feaures
Website
E-Cell
A tool for large-scale (whole-cell level)
www.e-cell.org
simulations. It supports multi-time scale and multi-algorithmic simulations Modelling of biochemical networks Import (export) to SBML format ODE, DAE (Differential Algebraic Equations) simulations CADLIVE
A tool for constructing large-scale biological
www.cadlive.jp
networks Modelling of metabolic and gene regulatory networks Import (export) to SBML format DAE simulations PyBioS
A tool for modelling and simulation of
http://pybiol.molgen.mpg.de
generic cellular processes Modelling of biochemical networks Import (export) to SBML format ODE simulation Systems Biology
A platform connecting heterogenous
Workbench (SBW)
software application
http://sys-bio.org
Modelling of biochemical networks Import (export) to SBML format Virtual Cell
A tool for associating biochemical
www.nrcam.uchc.edu
electrophysiological data with experimental microscopic image data Modelling of biochemical networks Import (export) to XML, Matlab, SBML formats ODE, PDE simulations
activation remain a subject of debate, however, one current understanding of the NF-κB mechanism is illustrated in Figure 3. TGF-β is a cytokine which has dual roles: tumour suppressor and promoter [36]. It was first found to elicit signals that transform the growth of fibroblasts in culture into cancer cells. On the other hand, subsequent studies showed that TGF-β can also inhibit cell growth. Downstream of TGF-β is the Smad protein which, upon TGF-β receptor activation, is phosphorylated and consequently dimerized. The SMAD dimer then translocates to the nucleus and initiates gene transcription. Lu et al. [37] found that there is cross-talk between the TGF-β/Smad pathway and the IL-1/NF-κB pathway in a dose-dependent manner. The crosstalk mechanism is as follows: the two cytokines will activate their canonical © The Authors Journal compilation © 2008 Biochemical Society
0045-0002 Sreenath.indd Sec1:15
6/24/08 1:37:02 AM
16
Essays in Biochemistry volume 45 2008
IL-1-R TLRs TCR TNF-R
LTβ-R BR3 CD40
Cytoplasm IKKγ IKKβ IKKα
? IKKα
IκB phosphorylation
p100 phosphorylation
IκB RelA
RelB
p50
IκB degradation RelA
Nucleus
p100
P100 processing p50
RelB
Target genes
p52
Target genes
Figure 3. The NF-κB signalling pathway (Source: Ruland et al. [35]) The NF-κB complex can either exist as a dimer of RelA and p50 proteins, or RelB and p100 proteins. They are bound to IκB which keeps them inactive in the cytoplasm. The activation of the receptors activates a ligand-specific series of adaptor proteins (represented by arrows), which in turn phosphorylates the IKK complex. In return, the phosphorylated IKK complex phosphorylates IκB. This phosphorylation process essentially tags IκB for degradation. As a result, NF-κB translocates to the nucleus to initiate gene transcription.
pathways when they are present in low doses. However, when the cytokines are present in high doses, IL-1 will activate both NF-κB and Smad. Likewise, TGF-β will activate Smad and NF-κB. Although it is not fully elucidated, the cross-talk seems to occur in the receptors. That is, IL-1 receptors can only bind to the TGF-β receptors when the respective cytokines are present in high doses. In this example, the role of the mathematical model will help to understand the dose-dependent cross-talk between the two pathways. The detailed biochemical reactions involved are shown in Figure 4. Lu et al. [37] discovered that the cross-talk between TGF-β/Smad and NF-κB pathways required activation of four adaptor proteins: IRAK (IL-1-receptor-associated kinase), MyD88, TRAF-6 (tumour-necrosis-factor-receptor-associated factor 6) and TAK1 (TGF-β-activated kinase 1). These adaptor proteins are included in the system and modelled using mass-action kinetics. Although evidence for the sequential interactions between these adaptors is still not consistent, evidence from [38,39] is used to support the model sketched here. In the model, MyD88 is recruited to the activated IL-1 receptor (ILR*). The resulting complex then recruits MyD88, IRAK and subsequently TRAF-6. Upon TRAF-6 recruitment, the IRAK–TRAF-6 complex dissociates and recruits TAB2 (TAK1-binding subunit 2) and TAK1. Some studies suggest that IRAK is ubiquitinated upon TAB2 recruitment to the complex. However, since the ubiquitination mechanism is not yet elucidated, this process can be omitted and IRAK be assumed to degrade at this point. Once TAK1 © The Authors Journal compilation © 2008 Biochemical Society
0045-0002 Sreenath.indd Sec1:16
6/24/08 1:37:02 AM
S.N. Sreenath, K.-H. Cho & P. Wellstead
17
TGFβ2
IL1 IL1R
TβRI
ILR
TGβR
IL1R-TβRI TβRII IL1R-TβRR
ILR* TGβRR-ILR TRAF6 ILR*-MyD88-IRAK-TRAF6
IRAK
ILR*-MyD88-IRAK
MyD88
TGβRR
IL1R-TβRR
ILR*-MyD88
TβRR*
Smad2
IRAK.TRAF6
TβRR*-Smad2
TAB2
IκBa
IRAK.TRAF6.TAB2
IκBa-NFκB
Smad2*
NFκB
Smad4
TRAF6.TAB2 IKK-IκBa-NFκB
TAK1
Smad2*- Smad4
TRAF6.TAB2.TAK1 TAK1*-IKKi
TAK1*
IKK
IKKi
IκBan-NFκBn
IκBan NFκBn IκBat
(Smad2*- Smad4)n Smad2n
Smad2*n Smad4n
Figure 4. IL-1/NF-κB and TGF-β/Smad pathways The shaded circle indicates species degradation, the dashed blue line indicates translocation, and the dotted-dashed black line indicates transcription if it is in the nucleus and translation if it is in the cytoplasm.
is phosphorylated, it can activate IKK (IκB kinase), which then frees the NFκB dimer from its inhibitor, IκB. A common starting point in any modelling exercise is to review the literature and adapt existing models. In this case we take the NF-κB model (starting with IKK activation onwards) from Hoffmann et al. [40]. The Hoffmann model takes into account three isoforms of IκB: IκBα, IκBβ and IκBγ. In our model, we only consider IκBα since it is the most important player in the pathway that we are studying. In the TGF-β/Smad pathway, we partly adapt a model from Clarke et al. [41]. In the adapted model, TGF-β binds to TβRI (TGF-β receptor I) and subsequently recruits the second receptor (TβRII). These receptor complexes phosphorylate each other and create a docking site for Smad2. As a result, Smad2 is recruited and phosphorylated by the complex. Once it is activated, Smad2 dissociates from the receptor complex and forms a dimer with Smad4. Together, the dimer complex translocates to the nucleus. As described by Clarke et al. [41], Smad2 can shuttle between the nucleus and cytoplasm with or without the presence of its ligand. The biochemical reactions, along with the rate constants from the literature are listed in Table 3.
Table 3. Biochemical reactions and parameters for IL-1/NF-κB and TGFβ/Smad pathways
The units for rate constants are as follows: for the second-order reaction constants, µM−1·s−1 and for first-order reaction constants, s−1. Reactions listed without rate constants are those whose constants need to be estimated. The reaction constants for the NF-κB part of the pathway are taken from the Hoffmann model [40]. They came from various sources [42–48], and the experimental models include: Escherichia coli recombinant cells, monkey arterial smooth muscle cells, HeLa S3 cells, human umbilical vein endothelial cells and Sf9 cells. The reaction constants for the TGFβ/Smad part of the pathway are estimated by Clarke et al. [41], using HaCat cells. ∅ indicates self-referencing and is an output.

Number | Biochemical reaction | kf | kb
1 | [IL1]+[IL1R]↔[ILR] | |
2 | [ILR]+[IL1R]↔[ILR2] | |
3 | [ILR2]↔[ILR2*] | |
4 | [ILR2*]+[MyD88]↔[ILR2*−MyD88] | |
5 | [ILR2*−MyD88]+[IRAK]↔[ILR2*−MyD88−IRAK] | |
6 | [ILR2*−MyD88−IRAK]↔[ILR2*−MyD88−IRAK*] | |
7 | [ILR2*−MyD88−IRAK*]+[TRAF6]↔[ILR2*−MyD88−IRAK*−TRAF6] | |
8 | [ILR2*−MyD88−IRAK*−TRAF6]→[ILR2*−MyD88]+[IRAK*−TRAF6] | |
9 | [IRAK*−TRAF6]+[TAB2]↔[IRAK*−TRAF6−TAB2] | |
10 | [IRAK*−TRAF6−TAB2]→[TRAF6−TAB2] | |
11 | [TRAF6−TAB2]+[TAK1]↔[TRAF6−TAB2−TAK1] | |
12 | [TRAF6−TAB2−TAK1]→[TAK1*] | |
13 | [TAK1*]+[IKKi]↔[TAK1*−IKKi] | |
14 | [TAK1*−IKKi]→[TAK1*]+[IKK] | |
15 | [IκBα]+[NFκB]↔[IκBα−NFκB] | 0.5 | 0.0005
16 | [IKK]+[IκBα−NFκB]↔[IKK−IκBα−NFκB] | 0.185 | 0.00125
17 | [IKK−IκBα−NFκB]→[IKK]+[NFκB] | 0.0204 |
18 | [IKK]+[IκBα]↔[IKK−IκBα] | 0.0225 | 0.00125
19 | [IKK−IκBα]+[NFκB]↔[IKK−IκBα−NFκB] | 0.5 | 0.0005
20 | [IKK−IκBα]→[IKK] | 4.07 × 10^−3 |
21 | [IKK]→∅ | 1.2 × 10^−4 |
22 | [IκBα−NFκB]→[NFκB] | 2.25 × 10^−5 |
23 | [IκBα]↔[IκBαn] | 0.0003 | 0.0002
24 | [NFκB]↔[NFκBn] | 0.09 | 0.00008
25 | [NFκBn]+[NFκBn]→[IκBαt]+[NFκBn]+[NFκBn] | 0.0165 |
26 | [IκBαn]+[NFκBn]↔[IκBαn−NFκBn] | 0.5 | 0.0005
27 | [IκBαt] → → [IκBα] | 0.00408 |
28 | ∅→[IκBαt] | 1.54 × 10^−6 |
29 | [IκBαn−NFκBn]→[IκBα−NFκB] | 0.0138 |
30 | [IκBαt]→∅ | 0.00028 |
31 | [IκBα]→∅ | 0.000113 |
32 | [TGFβ]+[TbRII]↔[TGbRII] | |
33 | [TGbRII]+[TbRI]↔[TbRR] | |
34 | [TbRR]↔[TbRR*] | |
35 | [TbRR*]+[Smad2]↔[TbRR*−Smad2] | |
36 | [TbRR*−Smad2]→[TbRR*]+[Smad2*] | |
37 | [Smad2*]+[Smad4]↔[Smad2*−Smad4] | 6.5e−5/(1.66e−6*60) | 3.99e−2/60
38 | [Smad2*−Smad4]→[Smad2*−Smad4n] | 16.6/60 |
39 | [Smad2*−Smad4n]↔[Smad2*n]+[Smad4n] | 4.92e−2/60 | 1.44e−4/(1.66e−6*60)
40 | [Smad2*n]+[PPN]↔[Smad2*n−PPN] | |
41 | [Smad2*n−PPN]→[Smad2n]+[PPN] | |
42 | [Smad2n]↔[Smad2] | 5.63/60 | 0.1*5.63/60
43 | [Smad4n]↔[Smad4] | 7.83e−1/60 | 4.97e−3/60
44 | [TbRR*]+[Smad7]↔[TbRR*−Smad7] | |
45 | [TbRR*−Smad7]→[Smad7] | |
46 | [ILR]+[TbRII]↔[ILR−TbRII] | |
47 | [ILR−TbRII]+[TbRI]↔[ILR−TbRR] | |
48 | [ILR−TbRR]→[TbRR*] | |
49 | [IL1R]+[TbRR]↔[IL1R−TbRR] | |
50 | [IL1R−TbRR]→[ILR*] | |
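
As a rough illustration of how one row of Table 3 becomes a differential equation under mass-action kinetics, the sketch below simulates reaction 15 alone, [IκBα]+[NFκB]↔[IκBα−NFκB], using the tabulated kf and kb. The initial concentrations (0.1 µM of each free species), the fixed-step Runge-Kutta integrator and all function names are assumptions made for this example; they are not taken from the model described in the chapter.

```python
# Mass-action sketch for reaction 15 of Table 3 only:
#   [IkBa] + [NFkB] <-> [IkBa-NFkB]
KF = 0.5      # forward rate constant, uM^-1 s^-1 (Table 3)
KB = 0.0005   # backward rate constant, s^-1 (Table 3)


def rhs(state):
    """Time derivatives of (IkBa, NFkB, complex) under mass-action kinetics."""
    ikba, nfkb, cplx = state
    flux = KF * ikba * nfkb - KB * cplx  # net rate of complex formation
    return (-flux, -flux, flux)


def rk4_step(state, dt):
    """One classical fourth-order Runge-Kutta step of size dt."""
    def shifted(base, slope, h):
        return tuple(b + h * s for b, s in zip(base, slope))

    k1 = rhs(state)
    k2 = rhs(shifted(state, k1, dt / 2))
    k3 = rhs(shifted(state, k2, dt / 2))
    k4 = rhs(shifted(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(state, t_end, dt):
    """Integrate from t = 0 to t_end and return the final state."""
    for _ in range(int(round(t_end / dt))):
        state = rk4_step(state, dt)
    return state


# Hypothetical initial condition: 0.1 uM free IkBa and NFkB, no complex.
final_state = simulate((0.1, 0.1, 0.0), t_end=2000.0, dt=0.5)
print("IkBa, NFkB, complex (uM):", final_state)
```

In a full model each species collects one such flux term for every reaction in which it takes part, which is the step that the tools in Table 2 automate.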
Model calibration and validation

As mentioned previously, where possible the model coefficients and parameters are taken from web-based databases. However, rate constants that are not available from the literature or databases need to be estimated using biological data. In this process of model calibration (known as system identification in the control and systems literature [49]), we use measured time course and related data for the biological process to estimate unknown parameters. The parameters to be estimated often include not only the rate constants, but also the initial conditions [x0 = x(t0)]. The mathematical procedures involved in calibration can be quite involved [50]; thus, for the purposes of the present chapter, we only sketch the steps involved. If we collect all of the rate constants and initial conditions to be estimated into a vector, θ, the estimation problem is typically formulated as a weighted least squares optimization (eqn 31):

$$\min_{\theta} \sum_{d} \left( y(t_d, \theta, u) - y^{Exp}(t_d) \right)^{T} Q \left( y(t_d, \theta, u) - y^{Exp}(t_d) \right), \tag{31}$$
where $y^{Exp}(t_d)$ is the experimental data vector at time $t_d$ (d = 1, . . ., D), $y(t_d, \theta, u)$ is the predicted output from the model at time $t_d$ and Q is a diagonal matrix that weights/scales the measurement errors. Owing to the non-linear nature of the system described by eqn (15), the minimization problem in eqn (31) is often multimodal or non-convex [51]. As a consequence, the global optimum may not be found using standard optimization algorithms, such as the Levenberg–Marquardt or Gauss–Newton methods. Stochastic methods such as simulated annealing, and genetic and evolutionary algorithms have been reported to perform well in finding the global optimum [51,52], albeit at greater computational cost. However, even the most sophisticated identification methods can still become trapped in local optima, and it is wise to use a range of methods. This can be done using one of the many software packages which contain calibration and identification methods, such as COPASI [29] or SBMLPet [53].

Alternatives to global optimization methods are: (i) the multiple shooting approach [54–56], where the state variables of the dynamics are discretized, (ii) comprehensive sampling of the whole parameter space to find sets of solutions that give qualitatively similar output behaviour [41,57], or (iii) fitting the parameters ‘manually’, starting from a reasonable parameter set [7] and sequentially inspecting different parameter sets for suitability. This manual fitting is extremely time consuming and is only suggested when the number of unknown parameters to be estimated is small.

The challenge in parameter estimation of the kind mentioned in the preceding paragraphs is not only to find the global solution, but also to obtain appropriate data. Specifically, data are not plentiful in biological systems, where the most common experimental technique, quantitative Western blot analysis [58], provides only relative concentration time profiles in terms of intensity values [59].
This means that the resulting intensity values can be compared only with values from the same band or blot, which typically consists of measurements of the same protein from one experimental run at different time points. Figure 5 shows a typical Western blot and the corresponding quantification. The availability of data, here as with most techniques, depends entirely on the availability of corresponding antibodies to probe any given protein. In the Western blot technique, the measured proteins are obtained from a population of cells, and thus the data describe the average behaviour of the cells. Other methods, such as flow cytometry, are capable of probing the protein one cell at a time, yielding the relative protein amount per cell.

[Figure 5: top panel, Western blot of pSmad2 after 0, 5, 15, 30, 60 and 120 min of IL-1β treatment; bottom panel, quantified pSmad2 intensity (arbitrary units) against time (min).]
Figure 5. Results from Western blot analysis (top panel) and the corresponding quantification (bottom panel). These data were obtained from the Stark laboratory [37]. IL-1β is the input ligand which is applied to the cell for the indicated duration. pSmad2 (phosphorylated Smad2) is the probed protein.

Despite intense development and rapid progress in bio-instrumentation, currently available measurement techniques do not allow direct measurement of system variables. This causes a range of problems, most notably an inability on the part of the calibration tools to find a unique set of parameter values. This problem, termed lack of identifiability, is common in systems where there are few data points available compared with the number of unknown coefficients. In addition, where there is feedback within the biochemical reaction sequences, some coefficients become effectively hidden (or unidentifiable) from external measurements [60]. A result of these phenomena is that measurements typically correspond to linear combinations of two or more variables in terms of their intensity. For the system described in eqn set (15), the output can then be expressed as:

$$y = \mathrm{int}(Cx), \tag{32}$$
where int(·) denotes a function of intensity, and $C \in \mathbb{R}^{m \times n}$ is a matrix that selects the output as a linear combination of the variables. The variable x is a function of time (t), rate constants (k) and input (u), but these arguments are dropped to simplify the notation. In order to estimate the parameters, the model output, y, needs to be formulated appropriately according to the observed data. This formulation depends on the nature of the data, which is discussed below.

Case 1: data is in terms of absolute protein amount

The absolute amount of protein can be obtained if the purified form is available [5]. In this case, one would generate a ‘standard curve’, a known titration of the purified protein, alongside the protein measurements (from the cell lysates) that are obtained under experimental conditions. Ideally when using Western blot analysis, the standard curve should be generated for every protein and every blot. If the intensity values of the protein from the lysates fall within the linear range of the standard curve, then there is a linear relationship between the intensity value and the total protein amount or concentration. That is (eqn 33):

$$\mathrm{int}(Cx) = \alpha Cx, \tag{33}$$
where α is a vector estimated from the standard curve for each measured protein. The output of the system in eqn set (15) can then simply be written as (eqn 34):

$$y = \alpha Cx \tag{34}$$
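
One simple way to obtain α for a single protein from a standard curve is a least-squares line through the origin. The sketch below assumes that form of fit and uses made-up titration values; the function names and numbers are illustrative only and are not the procedure of ref. [5].

```python
def fit_alpha(amounts, intensities):
    """Least-squares slope through the origin: intensity ~ alpha * amount."""
    numerator = sum(a * i for a, i in zip(amounts, intensities))
    denominator = sum(a * a for a in amounts)
    return numerator / denominator


def intensity_to_amount(intensity, alpha):
    """Invert the linear relationship of eqn (33) for one protein."""
    return intensity / alpha


# Hypothetical standard curve: known amounts of purified protein (arbitrary
# units) and the band intensities measured for them on the same blot.
standard_amounts = [1.0, 2.0, 4.0, 8.0]
standard_intensities = [52.0, 98.0, 205.0, 396.0]

alpha = fit_alpha(standard_amounts, standard_intensities)
print("estimated alpha:", alpha)
print("amount for intensity 150:", intensity_to_amount(150.0, alpha))
```

The conversion is only meaningful for lysate intensities that lie inside the range spanned by the standard curve, which is the linear-range condition stated above.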
Case 2: data is in terms of intensity value that falls within the linear range

Since generating the standard curve for every experimental run and every measured protein may not be possible, one can then just generate one standard curve. If the resulting intensity values are still within the linear range, the output is then formulated as in eqn (34). However, when estimating the parameters, α needs to be estimated as well. The objective function then becomes (eqn 35):

$$\min_{\theta, \alpha} \sum_{d} \left( y(t_d, \theta, \alpha, u) - y^{Exp}(t_d) \right)^{T} Q \left( y(t_d, \theta, \alpha, u) - y^{Exp}(t_d) \right), \tag{35}$$
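
The objective in eqns (31) and (35) can be evaluated in a few lines once a model prediction is available. The sketch below assumes a diagonal Q passed as a list of weights, and uses a deliberately simple, hypothetical one-output model y(t) = α(1 − exp(−kt)) in place of the full pathway model; the time points and parameter values are invented for illustration.

```python
import math


def weighted_least_squares(predict, times, y_exp, q_diag):
    """Sum over time points of (y - y_exp)^T Q (y - y_exp), Q diagonal."""
    total = 0.0
    for t_d, observed in zip(times, y_exp):
        predicted = predict(t_d)
        total += sum(
            q * (p - o) ** 2 for q, p, o in zip(q_diag, predicted, observed)
        )
    return total


def make_model(k, alpha):
    """Hypothetical stand-in model with one output: alpha * (1 - exp(-k t))."""
    def predict(t):
        return [alpha * (1.0 - math.exp(-k * t))]
    return predict


# Synthetic 'data' generated from known parameters (illustration only).
times = [0.0, 5.0, 15.0, 30.0, 60.0, 120.0]
true_model = make_model(k=0.05, alpha=700.0)
y_exp = [true_model(t) for t in times]
q_diag = [1.0]

cost_at_truth = weighted_least_squares(true_model, times, y_exp, q_diag)
cost_off = weighted_least_squares(make_model(0.02, 700.0), times, y_exp, q_diag)
print("cost at true parameters:", cost_at_truth)
print("cost at perturbed k:", cost_off)
```

An optimizer would then search over θ (here k) and, in Case 2, over α as well, calling this function at each candidate parameter set.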
Case 3: data is in terms of relative amount

In the case where purified protein is unavailable, the measured intensity is often normalized to a reference time point. This reference point is typically the maximum intensity within the band or the first value of the band. Let $t_r$ be the reference point chosen for each measured protein and $C_j$ be the j-th row of matrix C (j = 1 . . . m). The system output is then (eqn 36):

$$y = \mathrm{diag}\left( \frac{1}{\mathrm{int}(C_1 x(t_r))}, \frac{1}{\mathrm{int}(C_2 x(t_r))}, \ldots, \frac{1}{\mathrm{int}(C_m x(t_r))} \right) \mathrm{int}(Cx) \tag{36}$$
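
For a single measured protein, the normalization in eqn (36) amounts to dividing its intensity time series by the value at the reference point. The short sketch below shows both common choices of reference (first value or maximum); the function names and the sample numbers are assumptions for illustration.

```python
def normalise_to_reference(intensities, ref_index):
    """Divide one protein's intensity series by its value at the reference point."""
    reference = intensities[ref_index]
    if reference == 0:
        raise ValueError("reference intensity must be non-zero")
    return [value / reference for value in intensities]


def normalise_to_max(intensities):
    """Use the maximum intensity within the band as the reference point."""
    return normalise_to_reference(intensities, intensities.index(max(intensities)))


# Hypothetical band intensities for one protein at successive time points.
band = [2.0, 4.0, 8.0]
print(normalise_to_reference(band, 0))
print(normalise_to_max(band))

# Multiplying every intensity by the same unknown factor leaves the
# normalised series unchanged, up to floating-point rounding.
scaled = normalise_to_max([3.7 * value for value in band])
print(scaled)
```

Because the unknown scale factor cancels, the same normalization has to be applied to the model output before it is compared with the normalized data.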
Example: calibration of IL-1/NF-κB and TGF-β/Smad pathways

We have previously discussed the biochemical reactions in the dose-dependent cross-talk pathways of IL-1/NF-κB and TGF-β/Smad. Since not all parameters are available in the literature, the unknown parameters need to be estimated using experimental data. This can be done using time course data (in this case from Lu et al. [37]), as shown in Figure 6. The biological experiments associated with Figure 6 were conducted using HEK (human embryonic kidney)-293 cells, in which the gene for either MyD88 or IRAK, protein adaptors upstream of NF-κB, is knocked down (eliminated). This means that any biochemical reactions downstream of these adaptor proteins will not take place, allowing us to focus on one-half of the pathway (IL-1/TGF-β/Smad). The cells were treated with a high dose of IL-1 for the indicated time duration (Figure 6), and pSmad2 (phosphorylated Smad2) abundance was measured using Western blot analysis.
[Figure 6: Western blots of pSmad2 in MyD88−/− and IRAK−/− cells after 0, 0.5, 1, 2, 4, 8, 16 and 24 h of IL-1 treatment.]
Figure 6. IL-1-induced Smad activation data from Lu et al. [37]. −/− indicates knockdown of the protein. HEK-293 cells were treated with a high dose of IL-1.
With the knockdown condition, we have 17 rate constants and seven initial conditions to be estimated using two sets of time course data (each with eight data points). This paucity of data compared with the number of parameters to be determined is a typical example of the poor identifiability mentioned previously. A number of methods exist for tackling this problem [61]. An appropriate method in this case is to reduce the parameter space by placing realistic constraints on the upper and lower values of the parameters. For the association constant, the lower and upper bounds are set to 10^−4 and 10^−1 nM−1·s−1 respectively. These constraints are from [62], where it is indicated that the association of protein molecules to dimers or larger complexes typically occurs within this range. For the dissociation constant and the initial conditions, the lower bound is zero. With most signalling pathway processes occurring in seconds [63], the dissociation rate should be lower than minutes. The estimation of the unknown parameters is then performed using the techniques discussed previously. Thus, for example, if a typical gradient-based method is used to estimate the parameters, we iteratively estimate the parameters using different initial or seed values in order to avoid convergence to a local optimum. These seed values are generated randomly from the specified range as discussed above. The response for the best objective function value is shown in Figure 7. Here, the model output follows the trend of the data. The gradient-based method is mentioned here as an example, and other algorithms, such as those mentioned earlier, could also be used.

[Figure 7: total pSmad2 concentration against time (h), experimental data and model response.]
Figure 7. Estimated model of IL-1-induced Smad activation. The blue line indicates the experimental data and the smooth black line shows the model response.
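
The multi-start strategy described above can be sketched as follows: seed values are drawn at random inside the parameter bounds, a local gradient-based search is run from each seed, and the best result is kept. Everything in this sketch is an assumption made for illustration: the toy one-parameter objective with two local minima stands in for eqn (31), the bounds are arbitrary, and the projected gradient descent with finite-difference gradients is a generic local method, not the algorithm used to produce Figure 7.

```python
import random


def numerical_gradient(f, x, h=1e-6):
    """Central finite-difference gradient of f at x."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


def clip(x, bounds):
    """Project x back into the box defined by (lower, upper) bounds."""
    return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]


def local_descent(f, x0, bounds, step=0.01, iters=2000):
    """Projected gradient descent from one seed value."""
    x = clip(x0, bounds)
    for _ in range(iters):
        grad = numerical_gradient(f, x)
        x = clip([v - step * g for v, g in zip(x, grad)], bounds)
    return x


def multistart(f, bounds, n_starts, seed=0):
    """Run the local search from random seeds and keep the best optimum."""
    rng = random.Random(seed)
    best_x, best_f = None, float("inf")
    for _ in range(n_starts):
        x0 = [rng.uniform(lo, hi) for lo, hi in bounds]
        x = local_descent(f, x0, bounds)
        fx = f(x)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f


def toy_objective(x):
    """Toy objective with a local minimum near +1 and the global one near -1."""
    return (x[0] ** 2 - 1.0) ** 2 + 0.3 * x[0]


best_x, best_f = multistart(toy_objective, bounds=[(-2.0, 2.0)], n_starts=20)
print("best parameter:", best_x, "objective:", best_f)
```

A single run started to the right of the barrier between the two minima would stop at the poorer optimum; repeating the search from many seeds is what makes finding the better one likely, at the cost of proportionally more model evaluations.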
Conclusions

In the present chapter we have presented, albeit in review form, the different frameworks for modelling signalling pathway dynamics. We have noted that models based upon deterministic ODEs are the most popular and most tractable approach. We have also demonstrated the typical techniques to develop mathematical models from the biochemical representation of the pathways using chemical kinetics. The main challenge in this process is to determine the right mechanism for the biochemical interactions (the chemical kinetics) as well as to obtain the rate constants. There is also the issue of how much detail should be included, since the activation of any protein can involve a number of proteins and reactions, some of which may only be valid in specific cell lines. Incorporating more details leads to an increasing number of parameters to be estimated. However, if an important biochemical reaction is not included, the overall system behaviour can change dramatically. Thus model complexity needs special attention, guided by Occam’s Razor: one should not increase, beyond what is necessary, the number of entities required to explain anything. We have also discussed the calibration and validation of the mathematical models. As the current state of the art in instrumentation does not allow direct measurement of individual protein concentrations, identification becomes necessary, with the attendant problems of identifiability always a major issue. The signalling pathway model example we have given is typical of the complexity to be anticipated, whereby models contain a large number of unknown parameters. We have outlined the calibration process and noted the interplay between the ease of parameter estimation, the amount of available data and the number of unknown parameters. Here we have re-emphasized that it is often better to start with a model which oversimplifies the biology.
Even though the model is too simple, it will at least be mathematically and computationally tractable and can form a starting point for investigations. As demonstrated by Swameye et al. [55], a simple model that is properly calibrated is better than none, and can form the first step in understanding the complex behaviour of signalling pathways.

We are grateful to Dr Radina Soebiyanto for help in assembling the material for this work and working through the simulations. We thank Professor George Stark and the Stark Laboratory for the data on the IL-1/NF-κB and TGF-β/Smad2 pathways. The authors of the present chapter were funded by NIH (National Institutes of Health; grant numbers K25 CA 113133 and U56 CA 112963) and the Case Provost Opportunity Fund Award for the Systems Biology Center of Excellence Initiative (to S.N.S.), the Korea Science and Engineering Foundation (KOSEF) grant funded by the Korea government (MOST) (M10503010001-07N030100112), Korea Ministry of Science and Technology through the Nuclear Research Grant (M20708000001-07B0800-00110) and the 21C Frontier Microbial Genomics and Application Center Program (Grant MG05-0204-3-0) (to K.-H.C.), and the Science Foundation Ireland (grant number RP 03/RP1/I383) (to P.W.).
Summary

• The different frameworks for modelling signalling pathway dynamics are presented.
• Models based upon deterministic ODEs are the most popular and most tractable approach.
• The typical techniques to develop mathematical models from the biochemical representation of the pathways using chemical kinetics are demonstrated.
• The calibration and validation of the mathematical models is discussed.
• We have outlined the calibration process and noted the interplay between the ease of parameter estimation, the amount of available data and the number of unknown parameters.
• It is often better to start with a model which oversimplifies the biology.
References 1. 2. 3. 4. 5.
6.
7. 8. 9.
10.
Albeck, J., MacBeath, G., White, F., Sorger, P., Lauffenburger, D. & Gaudet, S. (2006) Collecting and organizing systematic sets of protein data. Nat. Rev. Mol. Cell Biol. 7, 802–812 Weston, A.D. & Hood, L. (2004) Systems biology, proteomics, and the future of health care: toward predictive, preventative, and personalized medicine. J. Proteome Res. 3, 179–196 Kitano, H. (2001) Foundations of Systems Biology. The MIT Press, Cambridge, MA Kirschner, M.W. (2005) The meaning of systems biology. Cell 121, 503–504 Schilling, M., Maiwald, T., Bohl, S., Kollmann, M., Kreutz, C., Timmer, J. & Klingmüller, U. (2005) Computational processing and error reduction strategies for standardized quantitative data in biological networks. FEBS J. 272, 6400–6411 Schoeberl, B., Eichler-Jonsson, C., Gilles, E.D. & Müller, G. (2002) Computational modeling of the dynamics of the MAP kinase cascade activated by surface and internalized EGF receptors. Nat. Biotechnol. 20, 370–375 Lipniacki, T., Paszek, P., Brasier, A.R., Luxon, B. & Kimmel, M. (2004) Mathematical model of NF B regulatory module. J. Theor. Biol. 228, 195–215 Wellstead, P. (1979) Introduction to Physical System Modelling. Academic Press, London Amonlirdviman, K., Khare. N.A., Tree, D.R.P. , Chen, W.-S., Axelrod, J.D. & Tomlin, C.J. (2005) Mathematical modeling of planar cell polarity to understand domineering nonautonomy. Science 307, 423–426 Bhalla, U.S. & Iyengar, R. (1999) Emergent properties of networks of biological signaling pathways. Science 283, 381–387
© The Authors Journal compilation © 2008 Biochemical Society
0045-0002 Sreenath.indd Sec1:25
6/24/08 1:37:05 AM
26
Essays in Biochemistry volume 45 2008
11.
Sreenath, S.N., Soebiyanto, R.P., Mesarovic, M.D. & Wolkenhauer, O. (2007) Coordination of crosstalk between MAPK-PKC pathways: an exploratory study. IET Syst. Biol. 1, 33–40 Wolkenhauer, O., Ullah, M., Kolch, W. & Cho, K.-H. (2004) Modeling and simulation of intracellular dynamics: choosing an appropriate frame-work. IEEE Trans. Nanobioscience 3, 200–207 El Samad, H., Kammash, M. & Gillespie, D. (2002) Stochastic modeling of gene regulatory networks. Int. J. Robust and Nonl. Contr. 15, 691–711 Cho, K.-H., Johansson, K. & Wolkenhauer, O. (2004) A hybrid systems framework for cellular processes. Biosystems 80, 273–282 Ghosh, R. & Tomlin, C. (2004) Symbolic reachable set computation of piece-wise affine hybrid automata and its application to biological modelling: Delta-Notch protein signalling. Syst. Biol. 1, 170–183 Vera, J., Balsa-Canto, E., Wellstead, P., Banga, J.R. & Wolkenhauer, O. (2007) Power-law models of signal transduction pathways. Cell. Signalling 19, 1531–1541 Ullah, O.M. (2007) Family tree of markov models in systems biology. IET Syst. Biol. 1, 247–254 Savageau, M.A. (1976) Biochemical Systems Analysis: A Study of Function and Design in Molecular Biology. Addison-Wesley, Reading, MA Fell, D. (1997) Understanding the Control of Metabolism. Portland Press, London Guldberg, C.M. & Waage, P. (1879) Über die chemische Affinitat. Prakt. Chem. 19, 69 Farina, M., Findeisen, R., Bullinger, E., Bittanti, S., Allgöwer, F. & Wellstead, P. (2006) Results towards identifiability properties of biochemical reactions networks. Proceedings of the Conference on Decision and Control, 2104–2109 Aldridge, B.B., Burke, J.M., Lauffenburger, D.A. & Sorger, P.K. (2006) Physicochemical modelling of cell signalling pathways. Nat. Cell Biol. 8, 1195–1203 Keener, J.P. (2002) Spatial modeling. In Computational Cell Biology (Fall, C.P., Marland, E.S., Wagner, J.M. & Tyson, J.J., eds), Springer–Verlag Klipp, E., Herwig, R., Kowald, A., Wierling, C. & Lehrach, H. 
(2005) Systems Biology in Practice. Wiley-VCH Gillespie, D.T. (2007) Stochastic simulation of chemical kinetics. Annu. Rev. Phys. Chem. 58, 35–55 Gillespie, D.T. (1977) Exact stochastic simulation of coupled chemical reacitons. J. Phys. Chem. 81, 2340–2361 Reference deleted Wilkinson, D.J. (2006) Stochastic Modelling for Systems Biology. Chapman and Hall Hoops, S., Sahle, S., Gauges, R., Lee, C., Pahle, J., Simus, N., Singhal, M., Xu, L., Mendez, P. & Kummer, U. (2006) COPASI - a COmplex PAthway SImulator. Bioinformatics 22, 3067–3074 Ramsey, S., Orrell, D. & Bolouri, H. (2005) Dizzy: stochastic simulation of large-scale genetic regulatory networks. J. Bioinform. Comput. Biol. 3, 415–436 Kierzek, A. (2002) STOCKS: STOChastic Kinetic Simulation of biochemical systems with Gillespie algorithm. Bioinformatics 18, 470–481 Hucka, M., Finney, A., Sauro, H.M., Bolouri, H., Doyle, J.C., Kitano, H., Arkin, A.P., Bornstein, B.J., Bray, D., Cornish-Bowden, A. et al. (2003) The Systems Biology Markup Language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics 19, 524–531 Cuellar, A.A., Lloyd, C.M., Nielsen, P.F., Bullivant, D.P., Nickerson, D.P. & Hunter, P.J. (2003) An overview of CellML 1.1, a biological model description language. SIMULATION: Trans. Soc. Model. Sim. Int. 29, 740–747 Schmidt, H. & Jirstrand, M. (2006) Systems biology toolbox for MATLAB: a computational platform for research in systems biology. Bioinformatics 22, 514–515 Ruland, J. & Mak, T.W. (2003) Transducing signals from antigen receptors to nuclear factor κB. Immunol. Rev. 193, 93–100 Bachman, K.E. & Ben-Ho, P. (2005) Duel nature of of TGF-β. Curr. Opin. Oncol. 17, 49–54 Lu, T., Tian, L., Han, Y., Vogelbaum, M. & Stark, G.R. (2007) Dose-dependent cross-talk between the transforming growth factor-β and interleukin-1 signaling pathways. Proc. Natl. Acad. Sci. U.S.A. 104, 4365–4370