1
Supplementary Material for A framework for scalable parameter estimation of gene circuit models using structural information Hiroyuki Kuwahara1a , Ming Fan1a , Suojin Wang2 , and Xin Gao1,∗ contributions. ∗ All correspondence should be addressed to
[email protected]. 1 Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia 2 Department of Statistics, Texas A&M University, College Station, Texas, 77840, U.S.A. a Equal
2
Contents S1 Detailed information of PEDI S1.1 Objective function . . . . . . . . . . . . . . . . . . . . . . . . . . S1.2 Initial estimation of the mean protein level . . . . . . . . . . . . S1.3 Initial estimation of the mean mRNA level . . . . . . . . . . . . . S1.4 Computation of protein levels at the refinement step . . . . . . . S1.5 Computation of the objective function at the refinement step . . S1.6 Termination condition . . . . . . . . . . . . . . . . . . . . . . . . S1.7 Probability of accepting the current parameters in each iteration S2 Configuration of SA S2.1 Objective function . . . . . . . . . . . . . . . . . . S2.2 Algorithmic parameters of SA . . . . . . . . . . . . S2.3 Boundaries for parameter search in the two models S2.4 Initial guess for the parameters in the two models .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . . . . .
. . . .
. . . . . . .
. . . .
. . . . . . .
. . . .
. . . . . . .
. . . .
. . . . . . .
. . . .
. . . . . . .
. . . .
. . . . . . .
. . . .
. . . . . . .
. . . .
. . . . . . .
3 3 3 4 4 4 5 5
. . . .
6 6 6 6 7
S3 Comparison between PEDI and SA
7
S4 Configuration of parameter estimation methods S4.1 Configuration of SRES . . . . . . . . . . . . . . . . . . . . S4.2 Configuration of TDDM . . . . . . . . . . . . . . . . . . . S4.3 Configuration of HEKF+MM . . . . . . . . . . . . . . . . S4.3.1 Configuration of HEKF . . . . . . . . . . . . . . . S4.3.2 Initial guess for the parameters in the two models
8 8 9 9 9 9
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
S5 Comparison using a variant of M3
10
S6 Supporting Tables
11
S7 Supporting Figures
15
3
S1 S1.1
Detailed information of PEDI Objective function
We utilize a weighted sum of squared residuals as the objective function of which an optimization method within the PEDI framework attempts to minimize the value. Since PEDI decomposes a gene circuit model into individual rate equations, it optimizes θi separately and has an objective function for each mRNA mi . The form of objective function for mi is Ji =
M X
wij (m ˆ i (tj ) − m ¯ ij )2 ,
j=1
where wij is the weight for the squared error for mRNA mi . Thus, instead of having the contribution of the squared error at each time point equally, the importance of each error is specified by its weight. It is often the case that the mean of a random variable representing the population of a molecular species positively correlates with its variance. Thus, we often want to decrease the contributions of the error terms as their sample means increase. In this study, we set wij to be the inverse square root of m ¯ ij + 1. This increases the importance of the terms with higher mean values relative to weights proportional to the sample means, and this choice appears to work well in this study by balancing the terms with higher and lower values of sample means. Note that this objective function was also used in simulated annealing.
S1.2
Initial estimation of the mean protein level
We first approximate the time integral of the rate equation of each protein pi using the linear interpolation as follows: Z tj (αi m ˆ i (t) − βi pˆi (t)) dt pˆi (tj ) = t0 ! Z = e−βi ∆t
tj
pˆi (tj−1 ) +
αi m ˆ i (t)eβi (t−tj−1 ) dt
(S1)
tj−1 −βi ∆t
≈e
αi ∆t βi ∆t pˆi (tj−1 ) + m ¯ ij e +m ¯ ij−1 , 2
where ∆t ≡ tj − tj−1 . Here, since m ˆ i is also a time-dependent variable whose rate equation on transcription factors, we utilize the trapezoidal rule to approximate R tj often depends βi (t−tj−1 ) dt and substitute m α m ˆ (t)e ¯ ij for m ˆ i (tj ) at each discrete time point. By iterai i tj−1 tively evaluating Equation (S1) from j = 1 to M , we are able to obtain the initial estimate of pˆi (tj ).
4
S1.3
Initial estimation of the mean mRNA level
To estimate m ˆ i (tj ) initially, we approximate the time integral of the rate equation of each mRNA mi using the linear interpolation as follows: Z tj (hi (ˆ p; θi ) − θi1 m ˆ i ) dt m ˆ i (tj ) = t0 ! Z tj
= e−θi1 ∆t
m ˆ i (tj−1 ) +
hi (ˆ p(t); θi )eθi1 (t−tj−1 ) dt
(S2)
tj−1
≈e
−θi1 ∆t
∆t θi1 ∆t m ˆ i (tj−1 ) + hi (ˆ p(tj ); θi )e + hi (ˆ p(tj−1 ); θi ) . 2
By using the approximate integration shown in Equation (S2), the objective function that we minimize for the initial parameter estimation is a wieghted sum of squared residuals as follows: M X j=1
S1.4
2 ∆t θi1 ∆t −θi1 ∆t hi (ˆ p(tj ); θi )e + hi (ˆ p(tj−1 ); θi ) − m ¯ ij . wij e m ˆ i (tj−1 ) + 2
(S3)
Computation of protein levels at the refinement step
In the refinement step of PEDI, we compute equally-spaced Lp + 1 protein data points using equally-spaced Lm + 1 mRNA data points of m ˇ i . The level of each protein pˇi at time point k∆t/dp for all k ∈ [1, Lp ] can be expressed by integrating the rate equation of protein pi as follows: Z −βi ∆t d
pˇi (tU ) = e
p
tU
pˇi (tL ) +
αi m ˇ i (t)eβi (t−tL ) dt ,
(S4)
tL
where tU = k∆t/dp and tL = (k − 1)∆t/dp . Since Lm is much greater than Lp , we use dq subintervals of mRNA data where dq = Lm /Lp to better interpolate the time evolution of mi Rt ˇ i (t)eβi (t−tL ) dt between the two time points tL and tU and to more accurately integrate tLU αi m using various numerical integration methods. In this study, unless otherwise specified, we set Lm = 3, 000 and Lp = 300 and implemented the composite Simpson’s rule as follows: Z tU dq /2−1 X 2s ∆t 2s βi (t−tL ) αi m ˇ i (t)e dt ≈ αi m ˇ i (tL ) + 2 αi m ˇ i (tL + ∆t)eβi dm ∆t 3dq dm tL s=1 (S5) dq /2 X ∆t 2s−1 2s − 1 β +4 αi m ˇ i (tL + ∆t)eβi dm ∆t + αi m ˇ i (tU )e i dp . dm s=1
S1.5
Computation of the objective function at the refinement step
To compute m ˆ i (tj ) at the refinement step, we approximate the time integral of the rate equation of each mRNA mi for each ∆t interval by using dp subintervals of protein levels. The expression
5 of m ˆ i (tj ) is as follows: m ˆ i (tj ) = e−θi1 ∆t
Z
tj
m ˆ i (tj−1 ) +
! hi (ˆ p(t); θi )eθi1 (t−tj−1 ) dt .
(S6)
tj−1
R tj ˆ i (ˇ Let H p; θi , j) be an approximate numerical solution of tj−1 hi (ˆ p(t); θi )eθi1 (t−tj−1 ) dt, where pˇ— as defined in the main text—is a time-dependent variable representing the Lp + 1 data points of pi . Then, we perform an optimization of each θi which attempts to minimize the following objective function: M X
h i2 ˆ i (ˇ wij e−θi1 ∆t m ˆ i (tj−1 ) + H p; θi , j) − m ¯ ij .
(S7)
j=1
ˆ i (ˇ In this study, we implemented the composite Simpson’s rule to compute H p; θi , j) as follows: dp /2−1 X ∆t 2k θ 2k ∆t ˆ Hi (ˇ p; θi , j) = hi (ˇ p(tj−1 ); θi ) + 2 hi (ˇ p(tj−1 + ∆t); θi )e i1 dp 3dp dp k=1 (S8) dp /2 X 2k−1 2k − 1 θ ∆t +4 hi (ˇ p(tj−1 + ∆t); θi )e i1 dp + hi (ˇ p(tj ); θi )eθi1 ∆t . dp k=1
S1.6
Termination condition
In this study, we set the termination condition of PEDI so that the relative change of each Ji from the previous round to the current round is less than five percent for the last five iterations. Thus, let Ji (k) to be the Ji in the k-th round of the refinement step. Then, we compute ( 1 if k ≤ 1, i (k) = |Ji (k)−Ji (k−1)| (S9) otherwise, Ji (k−1) for all i. For a given k, if, for all i ∈ [1, N ], we have, for all s ∈ [k − 4, k], i (s) < 0.05 but i (k − 5) ≥ 0.05, then the termination condition is satisfied at the k-th round. In other words, the termination condition is satisfied if the relative change is less than 5% for five consecutive refinement rounds. Thus, we perform at least 6 rounds of the PEDI refinement step. To guarantee the termination of the refinement step, we set the maximum number of the iterations to be 100. In this study, however, we did not encounter any instances in which the refinement step exceeded this maximum iteration count.
S1.7
Probability of accepting the current parameters in each iteration
As described in the main text, we select the current θˆ as the seed for the next round at the refinement step when the current sum of Ji is smaller than the previous one. Otherwise, we
6 probabilistically decide which one to select between the current one and the previous one. The probability of accepting the current θˆ is pmax exp (1 − c /o ).
(S10)
In this study, we set pmax to be 0.1. With this setting, when the current sum of Ji is greater than the previous one, we most likely select the previous estimate of θ over the current one.
S2 S2.1
Configuration of SA Objective function
When we use SA as a standalone parameter estimation method, we set the objective function J to be the sum of Ji . Similar to the objective function for PEDI, we set wij to be the inverse square root of m ¯ ij + 1 in this study.
S2.2
Algorithmic parameters of SA
We implemented PEDI in MATLAB. For the SA algorithm, we used simulannealbnd provided in MATLAB. To run the function simulannealbnd for our models, we set several options to better fit our purpose. These options are described below: InitialTemperature The default InitialTemperature in Matlab is 100. The higher the value of InitialTemperature is, the longer the algorithm search for a solution in the search space. We set the value of InitialTemperature to be 500. ReannealInterval ReannealInterval specifies the interval for re-annealing (i.e., the number of steps before re-heating occurs). The default value of ReannealInterval is 100. We set this parameter to be 200. MaxFunEvals MaxFunEvals specifies the maximum number of objective function evaluations that are allowed. The default value of MaxFunEvals is 3,000 × the number of variables. We set the value of MaxFunEvals to be 10,000 × the number of variables.
S2.3
Boundaries for parameter search in the two models
We constrained the SA based optimization so that search space has the lower and upper bounds for each unknown parameter. These boundaries allow us not only to perform search in realistic parameter ranges but also to restrict our search within the biologically relevant range. For model M1, the lower bound of each parameter is set to 10−10 while the upper bound is set to 103 . For model M2, the boundaries are set differently depending on the types of parameters. The lower and upper bounds of the parameters representing the maximum transcription rates are set to 0.01 and 1, respectively. The lower and upper bounds of the parameters representing the transcription factor-cis-regulatory element binding affinity are set to 10−10 and 103 , respectively. Finally, the lower and upper bounds of the parameters representing the Hill coefficients are set to 1 and 8, respectively.
7
S2.4
Initial guess for the parameters in the two models
The initial guess of the value of each parameter is set to be the lower bound of that parameter. That is, in model M1, for example, the initial guess is 10−10 for all unknown parameters.
S3
Comparison between PEDI and SA
In this comparison, we generated four synthetic mRNA datasets, each consisting of the same 31 data points, based on the ODE simulation of each model. To generate each data point, we sampled a value from a Gaussian random variable with mean and variance being the simulation output of the given time point. Throughout this comparison, we set the number of intermediate points to be such that dm = 100 and dp = 10. The results from model M1 showed that PEDI substantially increased the quality of the estimated parameters and the reconstructed models; while SA was often trapped in suboptimal local minima and produced widely different sets of dynamics and estimated parameters, PEDI kept the prediction error small and produced more accurate parameter combinations more consistently (Figures S1 and S2). Next, we compared the results from the best parameter combinations from PEDI and SA in terms of the prediction error. We found that, whereas PEDI and SA both produced a similar level of quality of the reconstructed m3 regulation, PEDI produced more than 1.7 times smaller prediction error than SA for the regulation of m1 and m2 (Figure S3). This makes sense because while m3 is constitutively expressed in this model and the reconstruction of this regulation is simpler, the regulation of the other two mRNAs are more involved and complex. PEDI also generated the estimated parameter sets of m1 and m2 which were much closer to the true parameter sets (Figure S3). Since the linearization involved in the initial estimate of PEDI lowers the quality of numerical integration compared with the full simulation utilized in SA, an increase in the accuracy comes from the refinement process. Indeed, while the estimated parameters from the initial step resulted in a higher prediction error than that of SA, the prediction error of the m1 and m2 regulation from the first iteration of the refinement decreased sharply (Figure S4). Next, we estimated the 12 unknown parameters of model M2 using PEDI and SA. As with the analysis of model M1, we ran both PEDI and SA for 10 times. In this analysis, we compared the accuracy of reconstructed dynamics based on estimates of the binding affinity, hill coefficient, and the ratio between the maximum transcription rate and mRNA degradation rate constant. Our results show that the estimated parameters from SA produced much higher prediction error and that the variance of the estimates for each parameter is large (Figures S5 and S6). This may be explained by the fact that oscillations require a stringent stability condition at fixed points and that the optimization path to reach parameter combinations satisfying such conditions with an appropriate amplitude, phase, and frequency in a high-dimensional search space based on the gradient of the objective function is not straightforward. Indeed, none of the parameter combinations estimated by SA exhibited oscillations. The results from PEDI, on the other hand, show that the estimated parameters consistently resulted in small prediction error and each of the parameter values is relatively close to the true one (Figures S5 and S6). Furthermore, all of the 10 parameter combinations generated from PEDI exhibit damped oscillations. We next compared the estimates which resulted in the smallest prediction error from the two approaches. We found
8 that, while SA could not exhibit an oscillation, PEDI led to a damped oscillation which closely resembles the true trajectory and there was a substantial difference between the two in terms of the prediction error; in particular, PEDI resulted in more than 19 times smaller prediction error for the regulation of m3 compared with SA (Figure S7). Furthermore, the parameter set from PEDI estimated the true parameter set more accurately. By looking at the trace of how this best parameter combination was generated, we found that initially the estimated parameters did not lead to oscillatory dynamics and its prediction error is on par with SA (Figure S8). By the second round of refinement, however, the model started to exhibit an oscillation, and as the parameters were refined over iterations, the amplitude and the frequency of the oscillation were changed to better capture the time-series data (Figure S8). Both SA and PEDI with SA were implemented in MATLAB (Mathworks Inc, Natick, MA). In terms of efficiency, the average runtime from 10 parameter estimation runs of model M1 was 4.1 minutes with SA while it was just 2 minutes with PEDI, making PEDI more than two times faster than SA. As for the parameter estimation of model M2, the average runtime from 10 runs was 28.8 minutes with SA while it was 15 minutes with PEDI, making PEDI 1.92 times faster than SA. The slower runtime of SA in model M2 could be mainly a result of high dimensional parameter search space. Despite the fact that PEDI decomposes a high dimensional parameter estimation problem into subproblems with much smaller search space, the runtime of PEDI for model M2 also significantly slowed down compared with model M1. Such an increase in runtime could be explained by much slower convergence of estimated parameters in model M2 (Figures S4 and S8). Nevertheless, our results indicate that PEDI improves the runtime efficiency of parameter estimation while it also improves the accuracy. Overall, our results show that PEDI is able to improve the consistency and the accuracy of parameter estimation while increasing the runtime efficiency. These improvements make the parameter estimation process more efficient and effective by permitting PEDI to produce high quality estimates in much less computing time.
S4
Configuration of parameter estimation methods
S4.1
Configuration of SRES
We used a Matlab implementation of SRES [1], which we downloaded from https://notendur.hi.is/tpr/index.php?page=software/sres/sres. To configure SRES, we set the following options: lambda 200; mu 30; pf 0.45; varphi 1; The boundaries were set to be consistent with those of SA.
9
S4.2
Configuration of TDDM
We implemented TDDM in Matlab. Since the parameters for the regulation of the proteins were all known in our experiments, we were able to skip one of the steps in TDDM. The only options for TDDM that we set are those for the scatter search method (SSM) [2]. We downloaded SSM from http://www.iim.csic.es/ gingproc/ssmGO.html. To configure the SSM, we set the following options: maxeval 104 ; local.solver dhc; local.finish fmincon; The boundaries were set to be consistent with those of SA.
S4.3
Configuration of HEKF+MM
S4.3.1
Configuration of HEKF
Detailed information about HEKF for this hybrid method can be found in [3]. There are a couple of algorithmic parameter that we set for applying HEKF to the two gene circuit models: Covariance matrix R R is an N × N matrix that represents the measurement noise. Each element Ri,j is the covariance of mi and mj , which is assumed to be known and fixed over time. Since the measurement noise is assumed to be uncorrelated for different mRNAs, off-diagonal elements are set to 0. Ri,i is set to be the mean of µmij over all j as in [3]. That is, ( P M 1 µmij if i = j, Ri,j = M j=0 0 if i 6= j. Linear constraint term Following [3], we specified the constraints for the states and the parameters using the following linear form: D×x ˆ ≤ d, where D ≡ diag(−1, −1, · · · , −1) is a negative identity matrix with suitable size, x ˆ contains the state variables and the unknown parameters, d is a vector specifying the constraint that HEKF must satisfy for each iteration. To have a biologically relevant solution, d must be at least zero. An important things to note here is that depending on the value of d, the results from HEKF can change, which can in turn change the results from HEKF+MM. To obtain different results from HEKF+MM for comparison, we had d ∈ {0, 10−13 , 10−12 , 10−11 , 10−10 , 10−9 , 10−8 , 10−7 , 10−6 , 10−5 }. This allowed us to have 10 different sets of results from HEKF+MM. S4.3.2
Initial guess for the parameters in the two models
The initial guess that is fed into HEKF is set to be the same as PEDI and SA.
10
S5
Comparison using a variant of M3
To test if our overall conclusions about the performance of PEDI still hold with a variant of M3, we have modified the 7-gene repressilator model to add a repression connection from gene g3 to gene g6 . This modification changes only the ODE for the regulation of m6 from M3 as follows: kp6 dm6 = − km6 m6 . n dt 1 + (ke61 p5 ) 61 + (ke62 p3 )n62 As with the analysis of M3, we set km6 to be known. In total, this makes the number of unknown parameters in this modified repressilator model 23. The parameters in the ODE for the regulation of m6 were valuated as: kp6 = 16; ke61 = 1/20; n61 = 4; ke62 = 1/40; n63 = 4; and km6 = 0.2. The results from the comparison of the four methods using this modified 7-gene repressilator model are summarized in Table S5. Similar to the results from M3, PEDI was the second fastest method, closely trailing TDDM. In terms of the quality of reconstructed dynamics, PEDI outperformed the other three methods on the criteria of the average, the best, and the worst prediction errors. Remarkably, the average prediction error from PEDI was lower than the best prediction error from any of the other three methods. In addition, the estimated parameters from PEDI were at least an order-of-magnitude closer to the true parameter values. Furthermore, PEDI was also able to extrapolate the mRNA levels at next few time points substantially more accurately than the other three. These results show that PEDI was able to produce much higher quality parameter solutions very efficiently.
References 1. Runarsson TP, Yao X (2000) Stochastic ranking for constrained evolutionary optimization. IEEE Transactions on Evolutionary Computation 4: 284–294. 2. Rodriguez-Fernandez M, Egea JA, Banga JR (2006) Novel metaheuristic for parameter estimation in nonlinear dynamic biological systems. BMC Bioinformatics 7: 483. 3. Lillacci G, Khammash M (2010) Parameter estimation and model selection in computational biology. PLoS Comput Biol 6: e1000696.
11
S6
Supporting Tables Table S1. The parameter setting for model M1. Parameter k1 k2 k3 k4 k5 k6 γ1 γ2 γ3 n1 n2 n3 α β
a Setup
Value 0.02 0.05 6.25 0.01 2 1 0.01 0.01 0.01 2 2 2 0.01 0.01
Setupa U U U U U U K K K K K K K K
indicates which parameters are treated as unknown and are to be estimated in each model.
Each ‘U’ represents an unknown parameter while ‘K’ represents a known parameter.
12
Table S2. The parameter setting for model M2. Parameter ke1 kp1 km1 ke2 kp2 km2 ke3 kp3 km3 n1 n2 n3 α β a Setup
Value 0.0025 6 0.1 1/300 10 0.1 0.002 8 0.1 4 4 4 0.1 0.01
Setupa U U U U U U U U U U U U K K
indicates which parameters are treated as unknown and are to be estimated in each model.
Each ‘U’ represents an unknown parameter while ‘K’ represents a known parameter.
13
Table S3. The parameter setting for model M3. Parameter kp1 ke1 n1 km1 kp2 ke2 n2 km2 kp3 ke3 n3 km3 kp4 ke4 n4 km4 kp5 ke5 n5 km5 kp6 ke6 n6 km6 kp7 ke7 n7 km7 α β a Setup
Value 20 0.025 4 0.2 14 1/60 4 0.2 16 1/60 4 0.2 16 0.02 4 0.2 15 0.02 4 0.2 18 1/70 4 0.2 18 0.025 4 0.2 0.2 0.2
Setupa U U U K U U U K U U U K U U U K U U U K U U U K U U U K K K
indicates which parameters are treated as unknown and are to be estimated in each model.
Each ‘U’ represents an unknown parameter while ‘K’ represents a known parameter.
14
Table S4. The best parameter solutions for the feedforward loop model. Method PEDI SRES TDDM
k1 1.06 0.2635 0.0419
k2 1.3983 0.1 0.01
kXY 0.4423 1.4998 16.1359
kXZ 0.5226 413.8736 23.8544
kY Z 0.0549 799.1846 0.1079
n1 3.6083 1.6520 1.0
n2 9.9276 2.7076 1.000
n3 9.927 5.7072 8.000
MSE 0.0213 0.20403 0.3754
Table S5. Comparison of the results from the variant of M3. The comparison criteria are: the average runtime; the average, best, and worst prediction errors; the average relative error of the best parameter set; and the quality of data extrapolation.
Runtime Average PE Best PE Worst PE Best paramc Pred(1)d Pred(3)d Pred(5)d a With
SRESa 312.0min 5336.1 3766.8 5955.2 212 268.0 316.6 422.0
HEKF+MM 78.2min N/Ab N/Ab N/Ab N/Ab N/Ab N/Ab N/Ab
2,000 generations.
b Because c The
PEDI 33.2min 2651.3 578.6 4116.3 0.02 31.2 33.8 40.8
all the 10 runs resulted in negative parameter values.
average relative error of the best parameter solution.
d Pred(k)
indicates the mean squared distance of the next k time points.
TDDM 27.6min 4821.03 4820.7 4821.3 0.26184 315.0 473.7 680.6
15
S7
Supporting Figures k1 −60
60
1.5
−35
k6
A
k2
0.5
35
64
0
k5
k3 12.5
−60 30
−30
k4 k1 −60
60
1.5
−35
k6
B
k2
0.5
35
64
0
k5
k3 −60
12.5 30
−30
k4
Figure S1. Comparison between PEDI and SA for the estimated parameter combinations of model M1. The hexagon with the red broken line indicates the true parameter combination. The other hexagons, each with different a color, represent 10 different estimated parameter combinations. (A) The results from PEDI. (B) The results from SA.
16
PEDI
SA
A
10
10
9
9
8
8
7
7
6
B
6
5
5
4
4
3
3
2
2
1
1
0
0 k1
k2
k3
k4
k5
k6
k1
k2
k3
k4
k5
k6
Figure S2. Comparison between PEDI and SA for variance of estimates for model M1. (A) Comparison of prediction error for each estimated parameter combination. Each bar with different color indicates the parameter combination indicated by that color in Figure S1. The X-axis is the ID of each parameter estimation run. The Y -axis is the prediction error. (B) A boxplot showing statistics of relative differences for each unknown parameter. The X-axis indicates each of the unknown parameters. The Y -axis indicates the relative difference of parameter values with respect to their sample means.
17 500
450 400
m1
350
250 200
15
5.5
True curve PEDI, PE=118.06 SA, PE=203.47 Measurements
300
k3
k2
150 100 0
-15
7
50 0
500
1000
5
-5
1500
k1
450 400 350
True curve PEDI, PE=37.84 SA, PE=66.12 Measurements
m2
300 250 200
0.005
k4
150 100
0.015
50 0
0
500
1000
1500
1
3 k5
120 100
m3
80 60
True curve PEDI, PE=36.42 SA, PE=44.56 Measurements
40 20 0
0
500
1000
0.9
1.1 k6
1500
Time
Figure S3. Comparison of PEDI and SA based on the estimated parameter set with the smallest prediction error from model M1. From the top row, we have the results of m1 , m2 , and m3 . The left-hand-side panels show the comparison of the reconstructed mRNA trajectories between PEDI and SA, while the right-hand-side panels show the comparison of the parameter combination generating the estimated trajectory for the two approaches. Here, PE indicates the prediction error of each mRNA species, and the red point in the middle of each side for the parameter comparison indicates the true value of each parameter.
18
m1
m2
600
500
PEDI, PE=891.29 Measurements
400
120
PEDI, PE=2263.42 Measurements
100
400
80
300
m3
m2
300
m1
60
200 200
40 100
100 0 0 600
500
1000
1500
500
0 0 500 400
20
500
1000
1500
PEDI, PE=116.27 Measurements
0 0 120
PEDI, PE=36.42 Measurements 500
Time
1000
1500
100
400
80
PEDI, PE=34.19 Measurements
300
m2
m1
300
m3
Initial Iteration
500
Iteration 1
m3
200
40 100
100 0 0 600
60
200
500
1000
1500
500
0 0 500 400
20
500
1000
1500
PEDI, PE=118.06 Measurements
0 0 120
PEDI, PE=36.42 Measurements 500
Time
1000
1500
100
400
80
m3
PEDI, PE=37.84 Measurements
300
m2
m1
Iteration 4
300
200
40 100
100 0 0
60
200
500
1000 Time
1500
0 0
20
500
Time
1000
1500
0 0
PEDI, PE=36.42 Measurements 500
Time
1000
Figure S4. Refinement of parameters for model M1, showing how estimated curves were refined at various refinement steps for the best estimate of the parameter combination for model M1. Here, PE stands for prediction error.
1500
19
PEDI
SA
4500
0
8
−4380
−100
kp1/km1
n1
kp1/km1
n1
4500
0
-4380
8 -100
100
0
5000
0
8
−4800
−80
5000
n2
kp2/km2
n2
kp2/km2
8
−4800
−80
80
80 ke2
ke2 170
0
−10
8 −50
50 ke3
170
0
kp3/m3
n3
100
ke1
ke1
kp3/m3
n3
8
−10 −50
50 ke3
Figure S5. Comparison between PEDI and SA for the estimated parameter combinations of model M2. The left column shows the results from PEDI, while the right one shows the results from SA. The top row shows the parameter set for the regulation of m1 . The middle row shows the parameter set for the regulation of m2 . The bottom row shows the parameter set for the regulation of m3 . In each parameter set, the triangle with the red broken line indicates the true parameter combination. Each of the 10 estimated parameter combinations is color-coded differently.
20
SA
PEDI
A
B
7
7
6
6
5
5
4
4
3
3
2
2
1
1
0
0 ke1 kp1/km1 n1
ke2 kp2/km2 n2
ke3 kp3/km3 n3
ke1 kp1/km1 n1
ke2 kp2/km2 n2
ke3 kp3/km3 n3
Figure S6. Comparison between PEDI and SA for variance of estimates for model M2. (A) Comparison of prediction error for each estimated parameter combination. Each bar with different color indicates the parameter combination indicated by that color in Figure S7. The X-axis is the ID of each parameter estimation run. The Y -axis is the prediction error. (B) A boxplot showing statistics of relative differences for each unknown parameter. The X-axis indicates each of the unknown parameters. The Y -axis indicates the relative difference of parameter values with respect to their sample means.
21 90 True curve PEDI, PE=88.38 SA, PE=537.06 Measurements
80 70
2
120
m1
60 50
n1
kp1/km1
40 30 20
6
10
−120
0
0
500
1000
0 ke1
120
1500
140
2
True curve
120 100
m2
180
PEDI, PE=49.41 SA, PE=784.28 Measurements
n2
80
kp2/km2
60
6
40
20
0
20 0 0
500
1000
120
ke2
1500
True curve PEDI, PE=105.12 SA, PE=1944.41 Measurements
100
0.0066
1
120
80
m3
60
n3
kp3/km3
40
7
20 0
0.001 0
500
1000
1500
40 ke3
0.003
Time
Figure S7. Comparison of PEDI and SA based on the estimated parameter set with the smallest prediction error from model M2. From the top row, we have the results of m1 , m2 , and m3 . The left-hand-side panels show the comparison of the reconstructed mRNA trajectories between PEDI and SA, while the right-hand-side panels show the comparison of the parameter combination generating the estimated trajectory for the two approaches. Here, PE indicates the prediction error of each mRNA species, and the red point in the middle of each side for the parameter comparison indicates the true value of each parameter.
22
m2
m1
120
PEDI, PE=1037.55 Measurements
PEDI, PE=1612.85 Measurements
100
100
m3
60 40
50
20 500
1000
0 0 150
1500
500
PEDI, PE=1443.51 Measurements
1000
0 0 120
1500
PEDI, PE=978.64 Measurements
100
100
40
m3
60
50
500
1000
1500
0 0 150
PEDI, PE=774.58 Measurements
500
1000
1500
PEDI, PE=628.33 Measurements
500
Time
1000
1500
PEDI, PE=687.43 Measurements
m3
60 40
50
20
20
500
1000
0 0
1500
120
500
1000
1500
150
PEDI, PE=88.38 Measurements
100
40
500
1000
1500
PEDI, PE=105.12 Measurements
80
m3
m2
100
60
0 0 120
PEDI, PE=49.41 Measurements
80
50
60 40
20 0 0
0 0 120
80
m2
40
100
PEDI, PE=1627.06 Measurements
60
100
100
60
0 0
1500
20
80
m1
Iteration 15
100
1000
40
20 0 0 120
500
80
m2
m1
80
m1
60 40
20
100
Iteration 22
PEDI, PE=1943.16 Measurements
80
m2
80
0 0 120
Iteration 2
m3
150
100
m1
Initial estimaiton
120
20
500
Time
1000
1500
0 0
500
Time
1000
1500
0 0
500
Time
1000
1500
Figure S8. Refinement of parameters for model M2, showing how estimated curves were refined at various refinement steps for the best estimate of the parameter combination for model M2. Here, PE stands for prediction error.
23
700 600
5.25
500 PEDI (59.21) HEKF+MM (85.86) SRES (1423.02) TDDM (857.57)
m1
400 300
800
k2
k3
200
7.25
100
−800
−450 0 0 450
500
1000
450 k1
1500
Time PEDI (39.6) HEKF+MM (81.24) SRES (786.38) TDDM (1141.47)
400 350
−1.5
300
m2
250
k4
200 150 100
1.5
50 0
−31 0
500
1000
1500
k5
35
Time 120 100
m3
80 PEDI (28.69) HEKF+MM (39.87) SRES (63.83) TDDM (145.83)
60
0.8
k6
1.2
40 20 0
0
500
1000
1500
Time
Figure S9. Comparison of the four methods based on the estimated parameter set with the smallest prediction error from model M1. From the top row, we have the results of m1 , m2 , and m3 . The left-hand-side panels show the comparison of the reconstructed mRNA trajectories. Here, each value within parentheses next to each method indicates the prediction error for a given mRNA, the red square points indicate the observed data points, and the dotted red lines indicate the true average trajectories. The right-hand-side panels show the comparison of the best parameter combination from each of the methods. The red point in the middle of each side for the parameter comparison indicates the true value of each parameter.
24
5
Relative Error
10
PEDI HEKF+MM SRES TDDM 0
10 Relative Error
1 0.8 0.6 0.4 0.2 0 k1
k2
k3
k4
k5
k6
Parameter
Figure S10. Comparison of the average relative error of each of the unkown parameter in model M1.
25
90
PEDI (64.92) HEKF+MM (429.01) SRES (735.61) TDDM (807.46)
80 70
0
30800
m1
60 50
kp1/km1
n1
40 30 20
8
10 0
−30680 −35
0
500
1000
ke1
1500
PEDI (58.43) HEKF+MM (534.04) SRES (500.9) TDDM (776.75)
150
35
0
160
100
kp2/km2
m2
n2 50
8
40 0
0.0066 ke2
0
0
500
120
1000
1500
PEDI (103.61) HEKF+MM (740.95) SRES (617.17) TDDM (537.23)
100
1
120
m3
80 60
kp3/km3
n3
40
8
20 0
40 −0.002
0.006 ke3
0
500
1000
1500
Time
Figure S11. Comparison of the four methods based on the estimated parameter set with the smallest prediction error from model M2. From the top row, we have the results of m1 , m2 , and m3 . The left-hand-side panels show the comparison of the reconstructed mRNA trajectories. Here, each value within parentheses next to each method indicates the prediction error for a given mRNA, the red square points indicate the observed data points, and the dotted red lines indicate the true average trajectories. The right-hand-side panels show the comparison of the best parameter combination from each of the methods. The red point in the middle of each side for the parameter comparison indicates the true value of each parameter.
26
6
Relative Error
10
PEDI HEKF+MM SRES TDDM
4
10
2
10
0
Relative Error
10 1
0.8 0.6 0.4 0.2 0 ke1
kp1/km1
n1
ke2
kp2/km2 Parameter
n2
ke3
kp3/km3
n3
Figure S12. Comparison of the average relative error of each of the unkown parameter in model M2.
27
120
PEDI (84.37) HEKF+MM (294.7) SRES (395.78) TDDM (420.94)
140 120
2
PEDI (61.8) HEKF+MM (180.28) SRES (228.72) TDDM (454.64)
100
0.075
100
80 m2
m1
80
ke1
n1
1
0.2
60
ke2
n2
60 40 40
6 20 0
−0.025 −5
20
0
50
100
150
200
250
7
45 0
300
100
0
90
70
0.0334
50
100
150
200
250
170 kp2
300
PEDI (165.22) HEKF+MM (218.5) SRES (402) TDDM (403.08)
80 80
2
−0.17 −142
kp1
1
0.05
60 m4
m3
60
40
ke3
n3
50 40
ke4
n4
30 PEDI (79.57) HEKF+MM (154.98) SRES (319.12) TDDM (188.9)
20
0
0
50
100
120
150
20
6
0 12
200
250
0
kp3
0
150
200
250
26
300
kp4
2
0.0286
80
ke5
n5 40
m6
m5
100
PEDI (75.36) HEKF+MM (158.79) SRES (230.64) TDDM (197.79)
0.05
60
60
ke6
n6 40
20
8
−0.01 10
0
50
100
80
−0.01 6
0
120
PEDI (106.78) HEKF+MM (215.95) SRES (310.93) TDDM (328.84)
100
7
10
20
300
0
70
50
100
150
200
250
60 50
6
1
0 12
0
kp5
300
PEDI (75.05) HEKF+MM (252.76) SRES (287.38) TDDM (350.65)
20
20 0
50
100
150 Time
200
250
300
24 kp6
0.08
m7
40
ke7
n7
30 20
7
10
−0.03 8
0
0
50
100
150 Time
200
250
300
28 kp7
Figure S13. Comparison of the four methods based on the estimated parameter set with the smallest prediction error from model M3. Here, we have the results of the seven mRNAs. The left-hand-side panels of each column show the comparison of the reconstructed mRNA trajectories. Here, each value within parentheses next to each method indicates the prediction error for a given mRNA, the red square points indicate the observed data points, and the dotted red lines indicate the true average trajectories. The right-hand-side panels show the comparison of the best parameter combination from each of the methods. The red point in the middle of each side for the parameter comparison indicates the true value of each parameter.
28
250
Relative Error
200 150
PEDI HEKF+MM SRES TDDM
100 50 1
Relative Error
1 0.8 0.6 0.4 0.2 0
kp1 ke2 n1 kp2 ke2 n2 kp3 ke3 n3 kp4 ke4 n4 kp5 ke5 n5 kp6 ke6 n6 kp7 ke7 n7 Parameter
Figure S14. Comparison of the average relative error of each of the unkown parameter in model M3.