Supplementary material for the paper
Inferring Drosophila gap gene regulatory network: pattern analysis of simulated gene expression profiles and stability analysis
Yves Fomekong-Nanfack 1, Marten Postma 1, Jaap Kaandorp 1,a
1 Section Computational Science, Faculty of Science, University of Amsterdam, Science Park 107, 1078 XJ Amsterdam, The Netherlands.
a Corresponding author. E-mail: [email protected]
1 Methods
1.1 Inference of the Gap Gene model
The gap gene circuits analyzed in this paper were presented by Jaeger et al. [1] and Fomekong-Nanfack et al. [2]. In both cases, the inference was performed using the same quantitative data and the same model, but different parameter estimation methods. The quantitative data used are available online in the FlyEx database (http://urchin.spbcas.ru/flyex or http://flyex.ams.sunysb.edu/flyex). The database presents a collection of quantitative data obtained from fluorescently stained wild-type embryos for Eve protein and two other genes [3]. Data were obtained by applying different image processing strategies [4–6]. The embryos cover different times, ranging from cycle 7 to cycle 14A. In the simulation, data obtained at cycle 12 were used as initial conditions. For the genes Kr, gt, kni and Tll these are very close to zero and are set to 0 in the simulations.
The mathematical model of the gap genes considers the region from 35% to 92% of the A-P axis of an embryo. It is reduced to a one-dimensional discrete model in which nuclei are aligned horizontally. The model focuses on the development between cycle 13 and cycle 14A8, before gastrulation (71.1 min). Three rules describe the mechanism during that phase: interphase, mitosis and division [7]. Interphase and mitosis are continuous stages describing the dynamics of the protein concentration of a gene within a nucleus. Division is a discrete process describing the division of a nucleus in two. Mitosis, which occurs before division, differs from interphase by the absence of protein synthesis. The resulting model is a system of 180 equations before division and 348 equations after, with a total of 66 unknown parameters, written as:
\frac{d g_i^a(t)}{dt} = R_a \, \Phi\left( \sum_{b=1}^{N_g} W_a^b g_i^b + m_a g_i^{bcd} + h_a \right) - \lambda_a g_i^a + D_a \left( g_{i-1}^a - 2 g_i^a + g_{i+1}^a \right)    (1)
where N_g denotes the number of genes or gene products involved and Φ is a sigmoid function with range (0,1). g_i^a(t) represents the concentration level at time t of gene a in nucleus i, with 1 ≤ i ≤ N and N the number of nuclei during a cleavage cycle. The concentration g_i^{bcd} of the maternal gene bicoid is taken from experimental observations and is kept constant in time during the simulation. The parameters are: the regulatory weight matrix W_a^b, describing the influence of gene b on gene a, the production rate R_a, the activation threshold h_a for Φ, the decay rate λ_a, the diffusion coefficient D_a, and the regulatory influence of Bcd, bcd_a (written m_a in Equation (1)).
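The right-hand side of Equation (1) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the exact form of Φ, the no-flux treatment of the first and last nucleus, and all function and argument names are assumptions made for illustration. The `mitosis` flag switches off synthesis, as described for the mitosis rule, and `count_parameters` reproduces the total of 66 parameters under the assumption of six genes.

```python
import math


def sigmoid(u):
    # Assumed form of Phi with range (0, 1); the text only states that
    # Phi is a sigmoid, not which one.
    return 0.5 * (u / math.sqrt(u * u + 1.0) + 1.0)


def circuit_rhs(g, g_bcd, W, m, h, R, lam, D, mitosis=False):
    """Evaluate dg/dt of Equation (1) for every nucleus i and gene a.

    g[i][a]  : concentration of gene a in nucleus i
    g_bcd[i] : fixed Bcd concentration in nucleus i
    W[a][b]  : influence of gene b on gene a
    m, h, R, lam, D : per-gene Bcd weight, threshold, production rate,
                      decay parameter and diffusion coefficient
    """
    n_nuc, n_genes = len(g), len(g[0])
    dg = [[0.0] * n_genes for _ in range(n_nuc)]
    for i in range(n_nuc):
        # Assumption: at the two ends the missing neighbour mirrors nucleus i.
        left = g[i - 1] if i > 0 else g[i]
        right = g[i + 1] if i < n_nuc - 1 else g[i]
        for a in range(n_genes):
            u = sum(W[a][b] * g[i][b] for b in range(n_genes))
            u += m[a] * g_bcd[i] + h[a]
            # No protein synthesis during mitosis.
            synthesis = 0.0 if mitosis else R[a] * sigmoid(u)
            decay = lam[a] * g[i][a]
            diffusion = D[a] * (left[a] - 2.0 * g[i][a] + right[a])
            dg[i][a] = synthesis - decay + diffusion
    return dg


def count_parameters(n_genes):
    # W contributes n_genes**2 entries; R, h, lambda, D and m contribute
    # n_genes entries each.
    return n_genes * n_genes + 5 * n_genes
```

With six genes, the 180 and 348 equations quoted above correspond to 30 and 58 nuclei respectively.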
Parameter estimation was performed with two different strategies. Jaeger et al. [1] used a parallel simulated annealing (PLSA) algorithm as described in [8], originally proposed by Lam [9,10]. Because of the expensive computational time required by PLSA, only 10 gap gene circuits with good solution quality and pattern behaviour were obtained. Later on, Fomekong-Nanfack et al. [2] proposed 101 gap gene circuits obtained using a hybrid method composed of a stochastic ranking evolution strategy [11, 12] followed by direct search [13–15]. This large number of solutions could be obtained because of the reasonably low computational time of their method (8 h on a single processor) compared to PLSA (1 to 5 days on 10 parallel CPUs), while leading to the same solution quality. In both cases, the chosen cost function is the sum of squared differences between the simulated and the observed data:
E(\theta) = \sum_{i,t} \left( g_i^a(t, \theta)_{model} - g_i^a(t)_{data} \right)^2,    (2)
with θ the parameter vector, to which a constraint or penalty function is added. An explicit search-space constraint is given for the parameters R_a, λ_a and D_a. For the parameters W_a^b, bcd_a and h_a a collective penalty function is used [16] to restrict the function value of Φ to the domain (Λ, 1 − Λ), with Λ a small parameter (in this study taken to be 0.001). The root mean square (RMS) described by Reinitz et al. [16] is used as a measure of the quality of a model solution for a given set of parameters:
RMS = \sqrt{ \frac{E(\theta)}{N_d} }    (3)
where E(θ) is given by Equation (2) and N_d is the number of data points.
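Equations (2) and (3) can be sketched in a few lines of Python. This is only an illustration: the flat lists `model` and `data` are an assumed representation in which each position holds the simulated and the observed concentration for the same gene, nucleus and time point, and the penalty term is left out.

```python
import math


def least_squares_cost(model, data):
    """E(theta) of Equation (2): sum of squared model/data differences."""
    if len(model) != len(data):
        raise ValueError("model and data must have the same number of points")
    return sum((gm - gd) ** 2 for gm, gd in zip(model, data))


def rms(model, data):
    """Equation (3): square root of E(theta) divided by N_d."""
    return math.sqrt(least_squares_cost(model, data) / len(data))
```

Dividing by N_d makes the score comparable between fits that use different numbers of data points.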
1.2 Statistical analysis
A correlation matrix shows the degree of association between two parameters. The parameter values are centered on the mean and the correlation is computed using the Pearson correlation. Clustering algorithms are often used as one of the first steps of gene expression analysis [17]. In the current context, clustering is applied to the simulated gene expression obtained from the inferred circuits. The goal is to subdivide the profiles at gastrulation for all the simulated genes into groups, such that dissimilar profiles fall in different clusters. For each gene, 101 profiles at gastrulation time are available. A cluster analysis highlights all circuits whose profiles have a similar pattern. The clustering used here is based on agglomerative hierarchical clustering [18]. Prior to linkage, the profiles are median centered and normalized. Then, a dendrogram relating similar circuits in the same tree is hierarchically constructed based on average linkage and the absolute correlation coefficient. Similarity between circuits of different clusters on the basis of the parameters is assessed with a t-test.
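The clustering steps described above (median centering, normalization, absolute-correlation dissimilarity, average linkage) can be sketched as follows. This is a naive pure-Python illustration and not the software used for the paper: the normalization to unit Euclidean norm, the dissimilarity 1 − |r| and the stopping rule by number of clusters are assumptions, and constant profiles are not handled.

```python
import math
from statistics import median


def preprocess(profile):
    """Median-centre a profile and scale it to unit Euclidean norm."""
    med = median(profile)
    centred = [x - med for x in profile]
    norm = math.sqrt(sum(x * x for x in centred))
    return [x / norm for x in centred] if norm > 0 else centred


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def abs_corr_distance(x, y):
    """Dissimilarity 1 - |r|: close to 0 for (anti)correlated profiles."""
    return 1.0 - abs(pearson(x, y))


def average_linkage(profiles, n_clusters):
    """Agglomerate profiles until n_clusters remain; returns index lists."""
    data = [preprocess(p) for p in profiles]
    n = len(data)
    dist = [[abs_corr_distance(data[i], data[j]) for j in range(n)]
            for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                # Average linkage: mean pairwise distance between members.
                pairs = [dist[i][j] for i in clusters[p] for j in clusters[q]]
                d = sum(pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters
```

Because the dissimilarity uses the absolute correlation, a profile and its mirror image around the mean fall in the same cluster.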
1.3 Stability
We assume that at gastrulation time the system has reached its steady state, corresponding to the end of cycle 14A8 (the simulation starts at cleavage cycle 13 and the total simulation corresponds to approximately 71.1 minutes of real-time development). We therefore simulated the model up to 1000 or 2000 minutes and classified the resulting spatio-temporal patterns qualitatively in terms of the different observed behaviours (stable, close to an attractor or not, and oscillating). Using a t-test, we compared the parameters between the different groups to find parameters that are significantly different.
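The classification above was done qualitatively. As a rough illustration only, a long simulated time series could be labelled automatically along the following lines. The function name, the tail fraction, the tolerance and the "drifting" label are all hypothetical choices and are not taken from the paper.

```python
def classify_tail(series, tail_fraction=0.25, tol=1e-3):
    """Label the end of a long simulation as stable, oscillating or drifting.

    series is one concentration sampled at regular times; only the last
    tail_fraction of the samples is inspected.
    """
    start = int(len(series) * (1.0 - tail_fraction))
    tail = series[start:]
    # A negligible peak-to-peak amplitude is treated as a steady state.
    if max(tail) - min(tail) <= tol:
        return "stable"
    # Count turning points: sign changes between successive increments.
    diffs = [b - a for a, b in zip(tail, tail[1:])]
    turning_points = sum(1 for d0, d1 in zip(diffs, diffs[1:]) if d0 * d1 < 0)
    return "oscillating" if turning_points >= 2 else "drifting"
```

Such a rule would have to be applied per gene and per nucleus, and borderline cases would still need visual inspection.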
2 Description of the simulated profiles
Caudal The simulated profiles show a lower expression level than the real data, suggesting that the decay coefficient might be too small. The profiles from time point 14A1 to 14A3 show a good fit, in contrast to those from 14A4 to 14A8. Cad expression at later times is rather variable. The data also show that caudal collapses slightly over time, which is not well represented in the model. Late Cad profile variation might be caused by missing data at the last two time points (cleavage cycle 14A7 and 14A8). This gives freedom for the fit and allows for repression of caudal by other genes.
Tailless The profiles show an overall good fit, despite the fact that some early time points were missing (cleavage cycle 13 to 14A3). However, in some of the circuits there is a small shoulder present at the posterior hunchback peak and sometimes a very small bump at the Kr peak.

Anterior hunchback The simulated profiles are higher than the observations at cleavage cycle 13. From cleavage cycle 14A1 to 14A6 the profiles are well fitted to the real data, especially the boundary. At times 14A7 and 14A8, in some cases, a dip is formed in the profile.

Posterior hunchback The observed profiles at early times are well fitted; however, later on, from 14A4 to 14A8, the model has difficulty representing the retraction of the posterior hunchback peak.

Krüppel The observed profiles are well fitted for all time points, with the exception of cleavage cycle 13, for which the expression level is much higher than the observations. Very small variations appear in the posterior domain and slightly enlarge the Krüppel domain at time 14A8.

Anterior giant The simulated profile shows an overall good fit without any defect.

Posterior giant The simulated profile has an overall good fit; only at later times does it have minor difficulties to retract. There is some variability in the posterior giant peak.

Knirps The observed profile is very well fitted, although minor variations appear in the shape of the peak for some gap gene circuits.
[Figure 1: set diagram with labels gt3 (5), Kr (6), Tll (34), gt2 (36), hb (26), gt1 (19).]
Figure 1: Hypothetically possible logical relations between sets of circuits. Each color corresponds to a gene. Only the solutions with a defect are selected. Overlapping sets describe solutions in which the profiles of multiple genes show that defect.
3 Comparison of different groups obtained from the long term dynamics

[Network diagrams, two panels: "Network with stable pattern" and "Network with stable pattern with expanded Hb"; nodes gt, tll, kni, bcd, hb, cad, Kr and threshold.]

Network differences
θ          m1          m2          dm           t
W_hb^hb    0.023993    0.020258    0.00373504   0.0016962
bcd_kni    0.0448947   0.0123351   0.0572298    2.85189e-006

Table 1: Comparison of an average network with stable pattern formation (group I) against a network with a stable pattern and with an expanded Hb domain (group II). Interactions that are not significantly different between the two groups are shown in light gray. The interactions that are significantly different are shown in colour. The table summarizes the list of parameters that are significantly different (means m_i, difference between the means dm and the p-value t from the t-test). The parameter differences found between groups I and II are the strength of hb autoactivation and the activation/repression of kni by Bcd.
Network with oscillating pattern of: Cad, Hb, Kr, Gt and Kni
Network with stable pattern with expanded Hb
[Network diagrams of the two groups compared in Table 2; nodes gt, tll, kni, bcd, hb, cad, Kr and threshold.]

Network differences
θ          m1            m2            dm           t
W_hb^hb    0.0202833     0.023993      0.00370971   0.000132337
W_hb^kni   0.148545      0.0960403     0.0525049    0.000899982
W_gt^hb    0.00730634    0.000129385   0.00743572   0.00103439
bcd_kni    0.000129675   0.0448947     0.0450244    0.000589936

Table 2: Comparison of an average network of the stable pattern group (group II) against the oscillatory pattern group (group III). Interactions that are not significantly different between the two groups are shown in light gray. The interactions that are significantly different are shown in colour. The table summarizes the list of parameters that are significantly different. Group II is stabilized by the overproduction of hb (activated by Gt).
[Network diagrams, two panels: "Network with oscillating pattern of: Cad, Hb, Kr, Gt and Kni" and "Network with oscillating pattern of: Cad, Hb, Gt and Tll"; nodes gt, tll, kni, bcd, hb, cad, Kr and threshold.]

Network differences
θ           m1            m2           dm           t
W_cad^hb    0.0479759     0.0239867    0.0239891    0.000701299
W_cad^Tll   0.0197665     0.0261618    0.00639534   0.00334701
W_hb^hb     0.0202833     0.0133955    0.00688781   1.11532e-005
W_hb^gt     0.0131477     0.00553095   0.0186786    4.33042e-011
W_hb^kni    0.148545      0.0728052    0.0757399    7.19654e-005
W_gt^hb     0.00730634    0.00505889   0.0123652    3.60571e-006
W_gt^Kr     0.103984      0.0585162    0.0454676    0.000300942
W_gt^Tll    0.0107778     0.0464788    0.035701     6.88338e-014
W_Tll^gt    0.036005      0.00193247   0.0340725    0.000156841
bcd_cad     0.014402      0.0389897    0.0245877    7.47514e-005
bcd_Kr      0.0576209     0.0287058    0.0289151    0.000306426
bcd_gt      0.0957429     0.0223168    0.0734261    6.08573e-005
bcd_kni     0.000129675   0.0630306    0.0631603    0.000139936
Table 3: Comparison of an average network of the two groups with oscillatory pattern (group III vs. group IV). Interactions that are not significantly different between the two groups are shown in light gray. The interactions that are significantly different are shown in colour. The table summarizes the list of parameters that are significantly different.
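The quantities reported in Tables 1 to 3 (group means m1 and m2, their difference dm, and a t-test) can be illustrated for a single parameter with the sketch below. Several points are assumptions: the text does not say which t-test variant was used, so Welch's unequal-variance statistic is shown; dm is taken as the absolute difference of the means; and the function returns the t statistic rather than the p-value listed in the tables, since converting it requires the Student t distribution, which is outside the standard library.

```python
import math
from statistics import mean, variance


def compare_groups(group1, group2):
    """Return (m1, m2, dm, t) for one parameter sampled in two groups.

    t is Welch's t statistic (assumed variant), not a p-value.
    """
    m1, m2 = mean(group1), mean(group2)
    # Standard error of the difference of means with unequal variances.
    se = math.sqrt(variance(group1) / len(group1)
                   + variance(group2) / len(group2))
    return m1, m2, abs(m1 - m2), (m1 - m2) / se
```

Each group needs at least two circuits, otherwise the sample variance is undefined.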
4 Correlation analysis
One simple approach to explore parameter determinability is to use the cross-correlation between parameters [19]. A correlation matrix shows the degree of association between two parameters. The parameter values are centered on the mean and the normalised cross-correlation, between −1 and 1, is computed using the Pearson correlation. From the inverse modelling paradigm, the correlations describe compensation that may arise from an incomplete or inaccurate data set, i.e. the data set does not contain enough information to cover all parameters. Compensation may, however, also arise from an incomplete model, i.e. the model does not sufficiently represent the underlying biological mechanism. Typically, compensation can occur if the time derivative, or gene change rate, remains the same while different parameters change. Examples are the promoter rates R and the decay rates λ, which both scale the expression profile, but in different directions, and in general show strong correlation patterns. Furthermore, the input weights on a single gene can also compensate each other. If a positive input on a gene becomes stronger, increasing negative weights or decreasing positive weights can adjust for the increased total input, such that the total input on that gene is not altered much. However, these correlation patterns are quite variable, difficult to predict, and strongly dependent on the precise spatial pattern. The correlation matrix obtained from the 101 gap gene circuits, shown in Figure ??, reveals the intricacy of the correlation patterns. Considering only strong correlations (|r| ≥ 0.6), cross-correlations are classified as follows:
1. direct correlations (involving a gene's regulators):
(a) negative correlations between a gene's activators when their contributions are partially on the same anterior-posterior (A/P) domain.
(b) negative correlations between a gene's repressors when their contributions are partially on the same A/P domain.
(c) positive correlations between a gene's activators and its repressors when their contributions are partially on the same A/P domain.
(d) co-correlation caused by the domain geometry (mainly boundary control). Usually, these are regulatory interactions of two different parameters on a gene that have the same function (activation or repression) but act on non-overlapping A/P domains.
2. indirect correlations caused by the profile variation.

Production rate and decay Systematically, for all genes but tll, a strong negative correlation is observed between all pairs of production rate and decay coefficients (|r(R_a/λ_a)| ≥ 0.65). The strong linear correlation represents the scaling of the expression profile. If one increases the production rate of a gene a and wants to keep the system at its normal expression level, one has to decrease the decay parameter related to the protein half-life of the product of gene a. Figure 2 illustrates the negative pairwise production/decay correlation for the genes hb and kni.
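The selection of strongly correlated parameter pairs can be sketched as follows. The dictionary layout (one list of values per parameter, one value per circuit), the parameter names in the usage example and the helper names are assumptions made for illustration; only the Pearson coefficient and the |r| ≥ 0.6 threshold come from the text.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def strong_pairs(params, threshold=0.6):
    """List (name1, name2, r) for every parameter pair with |r| >= threshold.

    params maps a parameter name to its estimated values across circuits.
    """
    names = sorted(params)
    selected = []
    for i, p in enumerate(names):
        for q in names[i + 1:]:
            r = pearson(params[p], params[q])
            if abs(r) >= threshold:
                selected.append((p, q, r))
    return selected
```

For example, with hypothetical values `{"R_hb": [10, 12, 14, 16], "lambda_hb": [12, 11, 10, 9], "D_hb": [1, -1, -1, 1]}` only the pair (R_hb, lambda_hb) is retained, with a negative coefficient.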
Gene regulatory parameters In classical micro-array data analysis, a direct correlation exists in regulator-regulatee relationships. This association is observed when a set of genes (regulatees) increases or decreases its protein level with the increase or decrease of the expression of another group of genes (regulators). In the current context, we see the same behaviour at the parametric level. It is necessary to discriminate between interactions that are a consequence of over-fitting and parameters that might suggest a real interaction. First, we describe the different regulatory mechanisms obtained from the 101 gap gene circuits presented in the main document, obtained from [1, 2]. Based on the correlation matrix, we identify the interactions that have a very large cross-correlation with all (or most) other parameters, implying that it is not possible to trust their significance.

Caudal regulation consists mainly of:
auto-repression.
repression by Bcd and Tll.
negative regulation by Hb, Kr, Gt and kni.

[Figure 2 panels: (a) R_Hb/λ_Hb, scatter plot "R−hb vs lambda−hb" (r = −0.977978); (b) R_kni/λ_kni, scatter plot "R−kni vs lambda−kni" (r = −0.882546).]
Figure 2: Scatter plots of production and decay with regression lines and correlation coefficients. At the top and right the estimated parameter distribution is shown, which was calculated using ksdensity.
From the dendrogram shown on the right panel of Figure ??, we see two main clusters acting on cad. The first is composed of W_cad^cad, the cad promoter threshold h_cad, the production rate R_cad, and the decay λ_cad. In this cluster, a strong negative correlation between cad auto-repression and its production rate indicates the compensation effect needed to scale the profile. The second cluster contains the negative regulators of cad (Hb, Gt, Kr and Kni) and the maternal influence of Bcd and Cad. In this group, two types of correlations are present:
1. negative correlation between repressors acting on the same domain (W_cad^hb vs. bcd_cad),
2. positive co-correlation of W_cad^hb vs. W_cad^kni, W_cad^gt vs. W_cad^Kr and W_cad^hb vs. W_cad^Kr. These correlations express geometry maintenance by symmetric action on cad to keep the gene expression level proportional in all domains.
Plots in Figure 3a-d illustrate the strong correlation of parameters acting on cad.

[Figure 3 panels: (a) W_cad^cad / R_cad, "cad−>cad vs R−cad" (r = −0.602344); (b) W_cad^hb / bcd_cad, "hb−>cad vs m−>cad" (r = −0.680663); (c) W_cad^hb / W_cad^kni, "hb−>cad vs kni−>cad" (r = 0.655411); (d) W_cad^gt / W_cad^Kr, "Kr−>cad vs gt−>cad" (r = 0.608334).]
Figure 3: Scatter plots of parameters that regulate Cad gene expression. Only the scatter plots with pairwise correlations higher than 0.6 are shown.

hunchback regulation obtained from the reverse engineering is mainly controlled by the following:
activation by Bcd and Cad, confirming that they are both the primary activators of the gap domain, acting respectively on the anterior and the posterior.
auto-repression.
repression by Kr (weak), Tll (weak), Gt and Kni (strong).
activation by Kr (weak), gt and Tll (weak).

The typical correlations, shown in Figure 4, of the parameters regulating Hb are:
1. negative correlation between opposite regulators (W_hb^kni vs. W_hb^bcd) and (positive W_hb^gt vs. W_hb^kni)
2. positive correlation between regulators with opposite functionality on the same domain of a gene (W_hb^gt vs. bcd_hb and W_hb^Kr vs. bcd_hb)
3. co-correlation caused by the domain geometry (W_hb^cad vs. W_hb^hb)
Perkins et al. [20] suggested that the posterior domain of Hb is activated by Tll, while Jaeger et al. [1] found that posterior Hb is activated by Cad. We found that Gt and Tll both have positive and negative regulatory parameters on hb. Assuming that posterior Hb is also activated by Tll, we expected to see a negative correlation between Cad and Tll regulation of hb. Surprisingly, this was not the case, and hb regulation by Tll did not show any particular correlation with any other parameter. In fact, it shows very weak correlation with most of the other parameters, implying that this parameter is well determined.

Krüppel From the parameter estimates, the different regulatory mechanisms that control Kr gene expression dynamics are:
maternal activation by Bcd and Cad.
auto-activation.
repression by Hb, Gt, Kni and Tll.
activation by Hb and kni.
The correlations shown in Figure 5 with a meaningful value are the following:
1. negative correlations between Kr's repressors when their contributions are mostly on overlapping domains: W_Kr^hb vs. W_Kr^gt at the anterior domain and W_Kr^gt vs. W_Kr^kni at the posterior domain.
2. negative correlation between W_Kr^Kr vs. W_Kr^hb and W_Kr^kni (decreased repression weight if auto-activation is weaker)
3. positive correlation between activators and repressors when their contributions are mostly on the same domain: W_Kr^gt vs. R_Kr, W_Kr^hb vs. R_Kr, W_Kr^kni vs. W_Kr^Kr and W_Kr^hb vs. W_Kr^kni (NB: Kr auto-activation and production contribute in the entire domain).
4. positive co-correlation caused by the domain geometry: W_Kr^hb vs. W_Kr^kni

Jaeger et al. [1] suggested a stronger influence of Bcd than of Cad and found bcd_Kr ≥ W_Kr^cad. We find equivalent weights for W_Kr^cad and bcd_Kr. However, we did not estimate the total contribution of the gene's parameter and the gene's product. It has been suggested that Hb activates anterior Kr. The resulting gap gene circuits show both roles, activation (very weak) and repression. The strong correlation between W_Kr^hb and W_Kr^kni suggests that if one repression increases, the other one also has to increase in order to maintain symmetry and to avoid domain expansion on one side. This result confirms the hypothesis of Jaeger et al. [1] that Hb and Kni contribute to the establishment of the Kr borders. Hb represses the anterior border while Kni represses the posterior border.
giant From the 101 circuits obtained, the mechanism controlling gt is as follows:
maternal activation by Bcd and Cad.
auto-activation.
repression by Hb, Kr, kni and Tll.
activation by Hb and kni (very weak).
[Figure 4 panels: (a) W_hb^kni / bcd_hb, "kni−>hb vs m−>hb" (r = −0.634808); (b) W_hb^gt / W_hb^kni, "gt−>hb vs kni−>hb" (r = −0.794431); (c) W_hb^gt / bcd_hb, "gt−>hb vs m−>hb" (r = 0.624369); (d) W_hb^Kr / bcd_hb, "Kr−>hb vs m−>hb" (r = 0.633589); (e) W_hb^gt / W_hb^Kr, "Kr−>hb vs gt−>hb" (r = 0.799647); (f) W_hb^cad / W_hb^hb, "cad−>hb vs hb−>hb" (r = 0.682183).]
Figure 4: Scatter plots of parameters that regulate Hb.

[Figure 5 panels: (a) W_Kr^gt / W_Kr^kni, "gt−>Kr vs kni−>Kr" (r = −0.691532); (b) W_Kr^gt / W_Kr^hb, "hb−>Kr vs gt−>Kr" (r = −0.633867); (c) W_Kr^hb / W_Kr^Kr, "hb−>Kr vs Kr−>Kr" (r = −0.828069); (d) W_Kr^Kr / W_Kr^kni, "Kr−>Kr vs kni−>Kr" (r = −0.922454).]
Figure 5: Scatter plots of parameters that regulate Kr.
Bcd and Cad contribute respectively to the expression of anterior and posterior gt. Only two significant correlations (shown in Figure 6) were found: negative correlations between W_gt^Kr and bcd_gt and between W_gt^hb and bcd_gt. The central domain of gt is mainly repressed by Kr, and the negative correlation reflects the balance between decreasing repression and decreasing activation. Although the role of Hb on gt seems weak (|W_gt^hb| ≤ 0.005), the correlation with bcd_gt shows that Hb represses anterior giant, as suggested by Jaeger et al. [1]. When Hb positively regulates gt, Hb mainly contributes to the expression of posterior gt. This observation is confirmed by the negative correlation between repression of gt by Tll and the regulation of gt by Hb for the case where W_gt^hb ≥ 0.
[Figure 6 panels: (a) W_gt^Kr / bcd_gt, "Kr−>gt vs m−>gt" (r = −0.686957); (b) W_gt^hb / bcd_gt, "hb−>gt vs m−>gt" (r = −0.85693).]
Figure 6: Scatter plots of parameters that regulate gt.
knirps regulation obtained from the gap gene circuits is described as follows:
maternal activation by Cad and bcd.
maternal repression by Bcd.
auto-activation.
activation by kr and gt.
repression by Hb, Kr, gt and Tll.

The main correlations with a significant Pearson value are the following:
1. negative correlations between W_kni^cad vs. W_kni^gt, and W_kni^gt vs. positive W_kni^kni.
2. positive correlation between W_kni^hb vs. W_kni^kni, W_kni^gt vs. R_kni and W_kni^Kr vs. R_kni.
3. co-correlation caused by the domain geometry between W_kni^gt vs. W_kni^Kr.
Items 1 and 2 are direct correlations related to compensation phenomena that maintain the expression level. Jaeger et al. [1] proposed that the anterior border of kni is set by repression by Hb and Kr, and that the posterior border is controlled by Gt and Tll. They also pointed out that Kr might not be necessary in the regulation of kni. We found that in 100% of the gap gene circuits kni is repressed by Hb and Tll, but it is not systematically repressed by Gt and Kr. W_kni^gt and W_kni^Kr have a similar distribution and seem to have the same role on kni. The very strong positive correlation between W_kni^gt and W_kni^Kr confirms this hypothesis and indicates the role of both parameters in maintaining the domain symmetry of kni to avoid domain expansion.
[Figure 7 panels: (a) W_kni^gt / W_kni^cad, "cad−>kni vs gt−>kni" (r = −0.61907); (b) W_kni^gt / W_kni^kni, "gt−>kni vs kni−>kni" (r = −0.703405); (c) W_kni^hb / W_kni^kni, "hb−>kni vs kni−>kni" (r = 0.683493); (d) W_kni^gt / W_kni^Kr, "Kr−>kni vs gt−>kni" (r = 0.889783).]
Figure 7: Scatter plots of parameters that regulate kni.
Tailless regulation obtained from the gap gene circuits is as follows:
maternal activation by Cad and Bcd.
maternal repression by cad and Bcd.
auto-activation.
activation by Hb, kr, Gt.
repression by Hb, Kr, Gt and kni.
The disparities obtained for the Tll regulators show that the missing data and probably the missing gene (huckebein) lead to different sets of parameters in the search space, making the results difficult to interpret. As expected, Cad maternally regulates Tll. The Cad weights are weak, which is explained by the high level of Cad in the posterior domain of the embryo. The total transcription factor contribution of Cad on Tll is very large. In contrast to the gap genes, in most cases Bcd represses Tll.
Diffusion Jaeger et al. [1] showed that diffusion does not consistently contribute to the shift of the expression domains. We did not find a systematic strong correlation between diffusion and any other parameters, except for Kr and Gt. Their auto-activation parameters are positively correlated with the respective diffusion coefficients (r(W_gt^gt/D_gt) = 0.605 and r(W_Kr^Kr/D_Kr) = 0.64). If the gene concentration is increased by means of auto-regulation, the amount of protein diffusing should also increase. Although these correlations are plausible, we cannot explain why a similar feature is not present for hb, kni, and tll. In contrast, the other diffusion parameters have very weak correlations with any other parameters, signifying that the diffusion coefficient can be determined from the current model with the available data.
Geometry based co-correlations Clustering the parameters reveals a group composed of Cad activation on hb, Kr, gt and kni. All these parameters are strongly positively correlated (W_hb^cad vs. W_kni^cad, W_gt^cad vs. W_kni^cad, W_hb^cad vs. W_Kr^cad, W_Kr^cad vs. W_gt^cad, W_hb^cad vs. W_gt^cad and W_Kr^cad vs. W_kni^cad). These correlations express the maintenance of gap gene profiles proportional to each other under the action of Cad.
Indirect correlations. A few indirect correlations of type W_a^b vs. W_c^d, mainly caused by profile variation, are present. These correlations are mainly related to Tll regulation, such as: W_kni^Tll vs. bcd_Tll, W_Kr^Tll vs. bcd_Tll, W_Kr^gt vs. W_kni^Tll, W_Kr^Tll vs. W_kni^Tll and W_Kr^cad vs. W_Tll^Tll. There is also a positive correlation between bcd_Kr and bcd_gt. This indirect correlation is caused by the mutual repression between Kr and gt. The change in the repressive parameter is balanced by Bcd. If one repressor increases or decreases, the mutual repressor acts in a similar manner. Consequently, the maternal influence is adjusted to keep the gene expression at the desired level.

Influence of the promoter threshold We also observe a strong negative correlation between W_cad^cad and its promoter threshold, suggesting that the level of auto-activation or auto-repression is clearly linked to the threshold. Another interesting type of correlation is the one between the Tll promoter threshold and some of its regulators (W_Tll^hb vs. h_Tll and W_Tll^cad vs. h_Tll). Therefore, one cannot conclude that an action is strong or weak just by focusing on the weight of the parameter, given that the level of production depends on the threshold [21].
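The dependence of production on the threshold can be made concrete with a small numerical sketch. The form of Φ below is an assumption (the text only specifies a sigmoid with range (0,1)), and the weight, regulator level and threshold values are hypothetical numbers chosen for illustration, not estimates from the circuits.

```python
import math


def sigmoid(u):
    # Assumed sigmoid with range (0, 1).
    return 0.5 * (u / math.sqrt(u * u + 1.0) + 1.0)


def production(weight, regulator_level, threshold, rate=1.0):
    """Synthesis term R * Phi(W * g + h) for a single regulator."""
    return rate * sigmoid(weight * regulator_level + threshold)


# The same weight and regulator level with two different thresholds.
low_h = production(weight=0.02, regulator_level=100.0, threshold=-6.0)
high_h = production(weight=0.02, regulator_level=100.0, threshold=-1.0)
```

With the strongly negative threshold the production stays close to zero, while with the milder threshold it approaches the maximal rate, although the regulatory weight is identical in both cases.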
[Figure 8 panels: (a) $W_{cad}^{cad}$ / $h_{cad}$: cad->cad vs. h-cad (r = −0.78458); (b) $W_{Tll}^{hb}$ / $h_{Tll}$: hb->tll vs. h-tll (r = −0.83792); (c) $W_{Tll}^{cad}$ / $h_{Tll}$: cad->tll vs. h-tll (r = −0.954034). In each panel the weight is on the horizontal axis and the promoter threshold on the vertical axis.]
Figure 8: Scatter plots of parameters that regulate caudal. Only the scatter plots with an absolute pairwise correlation higher than 0.6 are shown.
[Figure 9 panels: (A) gt -> hb vs. hb -> gt (horizontal axis: gt -> hb; vertical axis: hb -> gt); (B) hb -> hb vs. tll -> gt (horizontal axis: hb -> hb; vertical axis: tll -> gt).]
Figure 9: Scatter plots of different parameters that are significantly different between different stability groups. The colors indicate the different stability groups: fixed pattern in yellow (Group I), fixed pattern with large hb domain in red (Group II), oscillatory group in blue (Group III), oscillatory group in green (Group IV) and other types that could not be classified in the above groups in black.
References
[1] Jaeger J, Surkova S, Blagov M, Janssens H, Kosman D, Kozlov KN, Myasnikova E, Vanario-Alonso CE, Samsonova M, Sharp DH, Reinitz J: Dynamic control of positional information in the early Drosophila embryo. Nature 2004, 430(6997):368–371.
[2] Fomekong-Nanfack Y, Kaandorp JA, Blom J: Efficient parameter estimation for spatio-temporal models of pattern formation: case study of Drosophila melanogaster. Bioinformatics 2007, 23(24):3356–3363.
[3] Kosman D, Small S, Reinitz J: Rapid preparation of a panel of polyclonal antibodies to Drosophila segmentation proteins. Dev. Genes Evol. 1998, 208:290–294.
[4] Myasnikova E, Kosman D, Reinitz J, Samsonova M: Spatio-Temporal Registration of the Expression Patterns of Drosophila Segmentation Genes. In Proceedings of the Seventh International Conference on Intelligent Systems for Molecular Biology, AAAI Press 1999:195–201.
[5] Myasnikova EM, Samsonova AA, Samsonova MG, Reinitz J: Spatial registration of in situ gene expression data. Molecular Biology 2001, 35(6):955–960.
[6] Myasnikova E, Samsonova A, Kozlov K, Samsonova M, Reinitz J: Registration of the expression patterns of Drosophila segmentation genes by two independent methods. Bioinformatics 2001, 17:3–12.
[7] Foe VE, Alberts BM: Studies of nuclear and cytoplasmic behaviour during the five mitotic cycles that precede gastrulation in Drosophila embryogenesis. J. Cell. Sci. 1983, 61:31–70.
[8] Chu KW, Deng Y, Reinitz J: Parallel simulated annealing by mixing of states. J. Comput. Phys. 1999, 148(2):646–662.
[9] Lam J, Delosme JM: An Efficient Simulated Annealing Schedule: Derivation. Tech. Rep. 8816, Electrical Engineering Department, Yale, New Haven, CT 1988.
[10] Lam J, Delosme JM: An Efficient Simulated Annealing Schedule: Implementation and Evaluation. Tech. Rep. 8817, Electrical Engineering Department, New Haven, CT 1988.
[11] Runarsson TP, Yao X: Stochastic Ranking for Constrained Evolutionary Optimization. IEEE Transactions on Evolutionary Computation 2000, 4(3):284–294.
[12] Runarsson TP, Yao X: Search biases in constrained evolutionary optimization. IEEE Transactions on Systems, Man, and Cybernetics, Part C 2005, 35(2):233–243.
[13] Hooke R, Jeeves TA: Direct search solution of numerical and statistical problems. J. Assoc. Comput. Mach. 1961, 8:212–229.
[14] Nelder J, Mead R: A simplex method for function minimization. Computer Journal 1965, 7:308–313.
[15] Lewis RM, Shepherd A, Torczon V: Implementing generating set search methods for linearly constrained minimization. Tech. Rep. WM–CS–2005–01, Department of Computer Science, College of William & Mary 2005. [Revised July 2006].
[16] Reinitz J, Sharp DH: Mechanism of eve stripe formation. Mech. Dev. 1995, 49(1-2):133–158.
[17] D’haeseleer P: How does gene expression clustering work? Nature Biotechnology 2005, 23:1499–1501.
[18] Eisen MB, Spellman PT, Brown PO, Botstein D: Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A 1998 Dec 8, 95(25):14863–14868.
[19] Jaqaman K, Danuser G: Linking data to models: data regression. Nat Rev Mol Cell Biol 2006 Nov, 7(11):813–819.
[20] Perkins TJ, Jaeger J, Reinitz J, Glass L: Reverse engineering the gap gene network of Drosophila melanogaster. PLoS Comput. Biol. 2006, 2(5):e51.
[21] Ashyraliyev M, Jaeger J, Blom JG: Parameter estimation and determinability analysis applied to Drosophila gap gene circuits. BMC Systems Biology 2008, 2(83).