Evolutionary design of oscillatory genetic networks

Eur. Phys. J. B 76, 167–178 (2010)

DOI: 10.1140/epjb/e2010-00200-9

Evolutionary design of oscillatory genetic networks Y. Kobayashi, T. Shibata, Y. Kuramoto and A.S. Mikhailov


THE EUROPEAN PHYSICAL JOURNAL B

Regular Article

Y. Kobayashi (1), T. Shibata (2,3), Y. Kuramoto (4), and A.S. Mikhailov (1)

1 Department of Physical Chemistry, Fritz Haber Institute of the Max Planck Society, Faradayweg 4-6, 14195 Berlin, Germany
2 Department of Mathematics and Life Sciences, Hiroshima University, 1-3-1 Kagamiyama, Higashi-Hiroshima, 739-8526, Japan
3 PRESTO, Japan Science and Technology Agency (JST), 4-1-8 Honcho Kawaguchi, Saitama, Japan
4 Research Institute for Mathematical Science, Kyoto University, Kyoto 606-8502, Japan

Received 1st March 2010 / Received in final form 10 May 2010
Published online 29 June 2010
© EDP Sciences, Società Italiana di Fisica, Springer-Verlag 2010

Abstract. The present study is devoted to the design and statistical investigation of dynamical gene expression networks. In our model problem, we aim to design genetic networks which exhibit stable periodic oscillations with a prescribed temporal period. While no rational solution of this problem is available, we show that it can be effectively solved by running a computer evolution of the network models. In this process, structural rewiring mutations are applied to networks with inhibitory interactions between genes, and the evolving networks are selected depending on whether, after a mutation, they approach the targeted dynamics more closely. We show that, by using this method, networks with required oscillation periods, varying by up to three orders of magnitude, can be constructed by changing the architecture of regulatory connections between the genes. Statistical properties of the designed networks, including motif distributions and Laplacian spectra, are considered.

1 Introduction

Networks of interacting molecular machines are responsible for the principal functions of a biological cell, and thus an understanding of their design, operation and control is essential in systems biology. Once such knowledge is gained, it can be applied to engineer artificial living cells with prescribed properties or to intentionally modify already existing microorganisms, in the framework of an approach known as synthetic biology [1]. The first synthetic oscillatory genetic networks have already been designed and experimentally implemented [2–5]. While rational design methods work well for networks with a few genes, they become less efficient when larger genetic networks with a broader range of required dynamics must be constructed.

An alternative to rational design is provided by evolutionary optimization methods. All real biological networks have acquired their structure not through logical construction, but in the process of natural biological evolution. One can therefore try to run a computer evolution to design artificial networks with prescribed functions.

Essentially, the design of a network with prescribed performance is a complex optimization problem: a network architecture generating dynamics as close as possible to the target behavior must be found. A characteristic feature of complex optimization problems, such as the traveling salesman problem, is that the number of operations needed to find a rational solution explodes as the system size is increased, making rational solutions infeasible [6]. To deal with such situations, empirical optimization methods, e.g. simulated annealing, have been developed [7]. While they do not yield an exact solution, they are efficient in identifying an approximate solution which may be quite close to the target. It is remarkable that the simulated annealing method can also be viewed as relying on a stochastic evolution process: mutations are repeatedly generated and accepted or discarded depending on the resulting changes in the network performance.

Evolutionary optimization algorithms, representing variants of simulated annealing, have recently been applied successfully to design models of signal transduction networks with a prescribed set of responses [8–10]. Similar methods have been used to design genetic networks yielding required spatial Turing-like stripe patterns [11]. It should also be mentioned that computer evolution methods are being applied to study the evolution of RNA viruses [12]. Remarkably, this approach is so efficient that it yields more than single solutions of the optimization problems. Large ensembles of networks with the same prescribed functional dynamics can be constructed, and their characteristic statistical properties, depending on the chosen function and on the robustness requirements, can be analyzed [10].

Here, analogous evolutionary optimization methods are applied to the design of genetic networks with prescribed dynamics. Our aim is to construct oscillatory genetic networks with required oscillation periods which may differ by several orders of magnitude. The optimization search is performed under several significant constraints. We require that the total number of genes in a network and the number of regulatory connections are fixed. Moreover, we also fix the parameter values of the model, so that the optimal solution can only be obtained through an appropriate rewiring of a genetic network. For simplicity, we allow only inhibitory interactions to be present. Thus, our designed networks can be viewed as extended versions of the three-gene repressilator circuit [2].

By systematically using computer optimization methods, tens of thousands of different genetic networks with various oscillation periods are constructed. With this large data set, statistical properties of genetic networks can be investigated as a function of their oscillation period. In addition to traditional statistical properties, such as the clustering coefficient or characteristic path lengths, we also determine motif distributions (as defined in [13]) and Laplacian spectra of the constructed genetic networks.

The paper is organized as follows. In Section 2, we introduce the model of genetic networks and explain the optimization method. In Section 3, simulation results for the evolution of networks are presented. In Section 4, we demonstrate typical kinds of dynamics observed in our networks. In Section 5, statistical structural properties of the designed networks are considered. In the last section, we discuss the results and formulate the conclusions.

2 Formulation of the problem

2.1 The model

For our study, we have chosen a simple model of a genetic network with only inhibitory interactions between genes. It represents a generalization of the repressilator model [2] to larger genetic networks. The expression levels of the genes $i = 1, 2, \ldots, N$ are described by variables $u_i$ which obey the following set of ordinary differential equations:

$$\frac{du_i}{dt} = \frac{1}{1 + \sum_{j=1}^{N} A_{ij}\,\phi_{ij}\,u_j^{n}} - u_i. \qquad (1)$$
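As an illustration, equation (1) can be integrated numerically once a network is specified. The following is a minimal sketch rather than the authors' code. It assumes a forward Euler scheme (the integration scheme is not stated in the text), equal regulation strengths, and the convention that `A[i, j] = 1` when gene `j` inhibits gene `i`. The names `rhs` and `simulate` are hypothetical. The default parameter values and the time step are those quoted in Section 3.1.

```python
import numpy as np

def rhs(u, A, phi=117.65, n=3):
    """Right-hand side of equation (1) with equal strengths phi_ij = phi.

    A[i, j] = 1 means that gene j inhibits gene i.
    """
    inhibition = A @ (phi * u**n)      # sum_j A_ij * phi * u_j^n
    return 1.0 / (1.0 + inhibition) - u

def simulate(A, u0, t_end, dt=0.01, phi=117.65, n=3):
    """Forward Euler integration (an assumed scheme); returns times and states."""
    steps = int(round(t_end / dt))
    u = np.array(u0, dtype=float)
    traj = np.empty((steps + 1, u.size))
    traj[0] = u
    for k in range(steps):
        u = u + dt * rhs(u, A, phi, n)
        traj[k + 1] = u
    return np.arange(steps + 1) * dt, traj

# Three-gene ring: gene 0 is inhibited by gene 2, gene 1 by gene 0, gene 2 by gene 1.
A_ring = np.array([[0, 0, 1],
                   [1, 0, 0],
                   [0, 1, 0]])
t, traj = simulate(A_ring, u0=[0.1, 0.2, 0.3], t_end=50.0)
```

Because the production term lies between 0 and 1, expression levels that start in the interval [0, 1] remain there under this scheme for any step `dt` below 1.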

The adjacency matrix $A_{ij}$ determines the network structure of the system. It is defined in such a way that $A_{ij} = 1$ if gene $j$ directly inhibits expression of gene $i$, and $A_{ij} = 0$ otherwise. The equations are written in a dimensionless form, with only the parameters $n$ and $\phi_{ij}$ remaining. We assume that, if the activity of a gene is regulated by several other genes in a network, their inhibitory action is additive. The Hill coefficient $n$ specifies the degree of cooperativity in the network, and the parameters $\phi_{ij}$ are regulation strengths. Below, except for Section 4.2, we assume that all regulation strengths are equal, $\phi_{ij} = \phi$. A detailed explanation of the model is provided in the Appendix.

The dynamics of gene expression can, generally, be different for different genes in the same network. We have assumed that the output signal, used to characterize the network dynamics, is always generated by the first gene ($i = 1$) and represents its time-dependent expression amplitude $u_1(t)$.

Fig. 1. (Color online) Determination of the time series $t_i$ from the output signal $u_1$ and the threshold $h$.

2.2 The evolutionary optimization method

Our aim is to find networks which can generate oscillations with arbitrary prescribed periods. The parameters of genetic interactions should be the same for all such networks and only their architectures may differ. The search for a network architecture able to generate persistent oscillations with a given period can be viewed as an optimization process. Through repeated rewiring, a network yielding regular oscillations with a period sufficiently close to the target should be identified. We do not know any rational solution of this optimization problem and, instead, an evolutionary optimization method will be used. Essentially, it represents a variant of stochastic Monte Carlo optimization with simulated annealing.

The optimization process consists of a sequence of iteration steps. At each of them a structural mutation is applied to a network, the change in its performance (i.e., in the cost function) is determined, and the decision whether to accept or reject the mutation is made.

Suppose that a network is given which generates some time-dependent output signal $u_1(t)$, not necessarily periodic. We can associate with this signal a certain sequence of events and determine the time intervals between these events. An “event” occurs when the output signal crosses a certain threshold $h$ from below (see Fig. 1). To determine the threshold, we monitor the output signal for a sufficiently long time and find the maximum and minimum bounds $u_{\max}$ and $u_{\min}$ of the signal. Then, the threshold is chosen as $h = (u_{\max} + u_{\min})/2$. We determine the time moments $t_i$ ($i = 1, \ldots, K$) at which $u_1(t) = h$ and $du_1/dt > 0$. Using the time intervals $\Delta_i = t_i - t_{i-1}$ between the events, we compute the average period $T = \langle \Delta_i \rangle = (1/K) \sum_{i=1}^{K} \Delta_i$ and the variance $\sigma^2 = \langle \Delta_i^2 \rangle - \langle \Delta_i \rangle^2$. If a signal is to be periodic, its variance $\sigma^2$ should vanish.
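The event-based period estimate can be sketched as follows. This is an illustrative sketch, not the authors' implementation. The function name `period_stats` is hypothetical, crossing times are located by linear interpolation between samples (an assumption), and the threshold is taken from the whole sampled signal rather than from a separate preliminary run.

```python
import numpy as np

def period_stats(t, u1):
    """Mean period T and variance sigma^2 from upward crossings of the threshold h."""
    t = np.asarray(t, dtype=float)
    u1 = np.asarray(u1, dtype=float)
    h = 0.5 * (u1.max() + u1.min())           # h = (u_max + u_min) / 2
    below = u1[:-1] < h
    above = u1[1:] >= h
    idx = np.nonzero(below & above)[0]        # sample just before each upward crossing
    # Linear interpolation of the crossing time inside [t[idx], t[idx + 1]].
    frac = (h - u1[idx]) / (u1[idx + 1] - u1[idx])
    events = t[idx] + frac * (t[idx + 1] - t[idx])
    if events.size < 2:
        return None                           # too few events to define an interval
    intervals = np.diff(events)               # Delta_i = t_i - t_{i-1}
    T = intervals.mean()
    var = (intervals**2).mean() - T**2        # <Delta^2> - <Delta>^2
    return T, var

# Usage on a synthetic periodic signal with period 10.
t = np.arange(0.0, 100.0, 0.01)
T, var = period_stats(t, np.sin(2 * np.pi * t / 10.0))
```

For a strictly periodic signal all intervals are equal, so the returned variance is zero up to rounding and interpolation error.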
Moreover, we want the period of the output signal to coincide with some prescribed period $T_0$. This means that, in our optimization problem, the following cost function can be employed:

$$\epsilon = \frac{(T - T_0)^2}{T_0^2} + \frac{\sigma^2}{T^2}. \qquad (2)$$
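Equation (2) translates directly into code. The sketch below uses a hypothetical function name `cost` and takes the mean period, the variance and the target period as plain numbers.

```python
def cost(T, var, T0):
    """Cost function of equation (2): relative period error plus relative variance."""
    return (T - T0) ** 2 / T0 ** 2 + var / T ** 2

# A perfectly periodic signal at the target period has zero cost.
exact = cost(T=100.0, var=0.0, T0=100.0)
# A 10% period mismatch without jitter contributes (0.1)^2 to the cost.
mismatch = cost(T=110.0, var=0.0, T0=100.0)
```

Both terms are dimensionless, so the same tolerance can be applied to short and long target periods.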

The cost function $\epsilon$ acquires its minimal possible value of zero when we have a periodic signal with the period $T_0$.

Our optimization algorithm involves structural “mutations” of networks, which consist of random rewiring of the links. We choose at random a link in a given network and delete it. Then, we choose, again at random, which two network nodes should be connected by this link after the mutation. A relocated link can only connect nodes which were not linked before. Note that every mutation conserves the total number of links.

To perform the optimization, we start with a random network and check its dynamics. If this network turns out to be in a stationary state, we drop it and choose at random another initial network, until a network with some dynamics (not necessarily regular) is found. Then a mutation (i.e., a link rewiring) is applied. By running dynamical simulations of the network before and after the mutation, we determine the average periods and the variances and thus compute the values $\epsilon$ and $\epsilon'$ of the cost function before and after the mutation. We always accept the mutation if it reduces the cost function, that is, if $\Delta\epsilon = \epsilon' - \epsilon < 0$. If $\Delta\epsilon \geq 0$, we accept it with the probability $p = \exp(-\Delta\epsilon/(\mu\epsilon))$. The iterations are repeated until the variance $\sigma^2$ vanishes and, additionally, the difference $|T - T_0|$ becomes smaller than some given tolerance threshold $\Delta T$.

Hence, our evolutionary optimization process uses the Metropolis algorithm, as in stochastic Monte Carlo simulations. Note however that the effective temperature in our simulations is $\theta = \mu\epsilon$, where $\mu$ is a constant coefficient. Therefore, it changes during the evolution, gradually decreasing to zero as the optimization target ($\epsilon = 0$) is approached. This means that the optimization represents a variant of the simulated annealing method, which is broadly used in complex optimization problems [7].

3 Simulations

3.1 Numerical details

In our study, we consider relatively small genetic networks with $N = 10$ or $N = 20$ genes. The activity of a gene in a cell cannot be regulated by many other genes and, therefore, genetic networks generally cannot be highly connected. We consider networks with the mean degree (number of connections per node) $\langle k \rangle = 4$. Thus, a network with 10 nodes has $M = 20$ links and a network with 20 nodes has $M = 40$ links. The parameters of the model were chosen as $n = 3$ and $\phi = 117.65$. Note that the ring network with $N = 3$ represents the repressilator. For the chosen parameter values it generates limit-cycle oscillations with the period $T = 10.52$.

For each evolution, we prepared an initial network by assigning $M$ directed links randomly to the $N^2$ possible pairs of nodes. Note that auto-regulatory loops were not excluded in the considered networks. If a network had a disconnected component, it was discarded and a new network was constructed. Moreover, we checked the dynamics of the initial network for a stable stationary state.
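The random initial network, the rewiring mutation and the acceptance rule can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation. The helper names `random_network`, `rewire` and `accept` are hypothetical, the network is stored as a 0/1 adjacency matrix in which self-links are allowed, and the temperature is taken as $\mu$ times the cost before the mutation. The checks for disconnected components and stationary states are omitted.

```python
import math
import random
import numpy as np

def random_network(N, M, rng):
    """Place M directed links on the N*N possible ordered pairs (self-links allowed)."""
    A = np.zeros((N, N), dtype=int)
    for flat in rng.sample(range(N * N), M):
        A[flat // N, flat % N] = 1
    return A

def rewire(A, rng):
    """Move one randomly chosen link to a randomly chosen empty position."""
    B = A.copy()
    links = [tuple(p) for p in np.argwhere(B == 1)]
    empty = [tuple(p) for p in np.argwhere(B == 0)]
    old, new = rng.choice(links), rng.choice(empty)
    B[old] = 0
    B[new] = 1
    return B

def accept(cost_old, cost_new, mu, rng):
    """Metropolis rule with temperature mu * cost_old (an assumed reading of the text)."""
    delta = cost_new - cost_old
    if delta < 0:
        return True
    if cost_old == 0:
        return False          # zero temperature: no uphill moves
    return rng.random() < math.exp(-delta / (mu * cost_old))

rng = random.Random(0)
A = random_network(10, 20, rng)
B = rewire(A, rng)
```

Since the new position is drawn from the entries that were empty before the move, the mutated matrix always differs from the original in exactly two entries and keeps the same number of links.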

An initial network with a stable stationary state was also discarded and another initial network was randomly generated. The same checks were repeated at each subsequent iteration step, and networks with disconnected components or exhibiting a stationary state were always rejected.

We numerically integrated equations (1) with the constant time increment $\Delta t = 0.01$. At each iteration step, we ran a dynamical simulation and first recorded the maximum and the minimum of the output signal $u_1(t)$ for the duration $t = 5T_0$. With these data, the threshold value $h$ was calculated as $h = (u_{\max} + u_{\min})/2$. Then we continued the simulation until we obtained 10 threshold-crossing events. Using the times $t_i$ ($i = 1, \ldots, 10$), the average period and the variance were computed and the cost function was determined. If the simulation time $t$ exceeded $10T_0$ before we obtained 5 events, we discarded the network and repeated the mutation. The parameter $\mu$ was always chosen to be $\mu = 0.01$.

For a given target period $T_0$, we continued the network evolution process until we achieved $\sigma = 0$ and $|T - T_0| < \Delta T$, with the error tolerance $\Delta T = 1$. We stopped the evolution if we could not satisfy these conditions within a given maximum number of iterations (either $10^3$ or $10^4$, depending on the simulation). If the conditions were reached within the prescribed number of iterations, the evolution was counted as successful; otherwise it was counted as failed.

3.2 Success rates of evolution

First, we have performed a statistical analysis of the dynamics in networks that were randomly generated, as described above. The networks had $N = 10$ genes and $M = 20$ connections. An ensemble of 10 000 such networks has been considered. Only 581 networks in this ensemble, i.e. about 6%, had persistent oscillatory (limit-cycle) dynamics. The distribution of their periods is shown in Figure 2.

Fig. 2. (Color online) Distribution of periods for 581 random networks exhibiting periodic oscillations, among $10^4$ randomly generated networks with $N = 10$ and $M = 20$ (horizontal axis: target period; vertical axis: frequency). The peak resides near $T = 10.5$, corresponding to the period of the three-gene repressilator.


As seen in Figure 2, the periods of the randomly generated oscillatory networks have a narrow maximum near $T = 10$. We have already noted that the simplest three-gene repressilator has the oscillation period $T = 10.52$ for the chosen parameter values.

Next, we have designed oscillatory networks with given periods, as described above. For each target period $T_0$, we ran evolutions with 1000 different initial networks with $N = 10$ and $M = 20$. We counted how many of them were successful, i.e., achieved periodic oscillations with the prescribed period within 10 000 iterations, and thus determined the success rate for every chosen target period $T_0$. The resulting distribution is shown on a double-logarithmic scale in Figure 3a. We see that, through the evolution, networks with very large periods could be designed, in sharp contrast to the distribution for the randomly generated networks (Fig. 2). We could not find any networks with the period $T_0 < 2$. The success rates were almost 100% up to $T_0 = 40$. Then the success rates decreased with the target period and, for periods of several hundred, only a few percent of the evolutions were successful. The tail of the distribution of success rates approximately showed power-law behavior, with an exponent near $-2$. Networks with periods up to $T_0 = 1000$ could be designed, with 3 evolutions out of 1000 being successful for the largest period. It is remarkable that, through the evolution, the networks could acquire oscillation periods varying from 2 to 1000 for the same parameter values. Note that, in the ensemble of networks with the same target period, the obtained networks were all different from each other.

Next we compared the success rates for networks of two different sizes, but with the same average number of links per node ($N = 10$, $M = 20$ and $N = 20$, $M = 40$), under a smaller number of iterations (maximum 1000 iterations). Figure 3b shows the distributions of success rates for these two cases. For the larger networks, longer periods could be generated more easily.

It is remarkable that we could find a desired network within a relatively small number of iterations, considering the huge number of possible networks available. Indeed, for $N = 10$ and $M = 20$, the number of possible networks is roughly $N^2!/[(N^2 - M)!\,M!] \simeq 5.35 \times 10^{20}$. Thus, there is no chance to find a desired network by exhaustive search.

Fig. 3. (Color online) (a) Evolution success rates (in %) for $N = 10$ and $M = 20$ as a function of the target period, on a double-logarithmic scale; the plot carries the annotation slope = −2. Evolutions starting from 1000 random initial networks have been performed for each target period. (b) Evolution success rates for different sizes of networks, as a function of the target period. Red: $N = 10$, $M = 20$; green: $N = 20$, $M = 40$. For each target period, 100 evolutions starting from random initial networks have been performed.

4 Dynamical properties of designed networks

4.1 Oscillation profiles

Below we show examples of designed networks and their typical dynamics. As we have mentioned, for each target period we have generated many networks with different architectures, which correspond to different dynamical systems. Although we cannot survey all of them, we have picked several characteristic examples and studied their dynamical properties. Figure 4 shows three designed networks with different oscillation periods and their dynamical signals. In each panel, the time-dependent expression levels $u_i(t)$ of all 10 genes and the corresponding network are shown; the output signal $u_1$ is always indicated in red.

Figure 4a displays the dynamics which is typical for networks with relatively small target periods ($T_0 = 10$). There are significant oscillations of the expression levels of six genes, including the output signal. Two signals coincide, and the remaining four genes are expressed very weakly. The genes get activated in an alternating way, one after another. This oscillation pattern is similar to that of the three-gene repressilator.

Oscillation profiles for networks with large periods are different (Figs. 4b and 4c). Usually, as shown in Figure 4b for $T_0 = 100$, we observe almost synchronous oscillations in a group of genes; they represent narrow peaks separated by long intervals of low activity. Other genes in the network are only weakly activated.

Fig. 4. (Color online) Time series of gene expression levels $u_i$ for three examples of designed networks with different prescribed periods $T_0$ (horizontal axis: time; vertical axis: $u_i$). The corresponding network structures are also shown. Nodes and links represent genes and interactions, respectively. The red curves display output signals used for evaluation of the cost function. (a) Dynamics of $u_i$ ($i = 1, \ldots, 10$) for $T = 10.29$. For $i = 1, 2, 4, 5, 7$, the amplitudes of $u_i$ are of the order of unity. All other amplitudes are only weakly oscillating. For $i = 2$ and $i = 7$, the dynamics coincide. (b) Dynamics of $u_i$ ($i = 1, \ldots, 10$) for $T = 99.05$. For $i = 1, 4, 5, 6, 7$, and 10, the amplitudes $u_i$ are of the order of unity; all other amplitudes are small. (c) Multi-peak oscillations with $T = 100.95$. Only the amplitudes for four genes ($u_1, \ldots, u_4$) are displayed here, although all amplitudes $u_i$ show oscillations. The amplitude $u_9$ is about $10^{-2}$ and all other amplitudes are of the order of unity.

This behavior is however observed only when a network has auto-regulatory loops, as shown in Figure 4b. Networks without auto-regulatory loops can also generate oscillations with long periods, but their profiles are qualitatively different. Figure 4c displays an example of long-period oscillations for a network without auto-regulatory loops. In contrast to Figure 4b, the signal shows multiple peaks during each period. The peaks of the signals are not synchronous, in contrast to the case of networks with auto-regulatory loops. Note that, when we excluded auto-regulatory loops during the evolution process, only dynamics with multi-peak oscillations could be found.

4.2 Regulation strength dependence of oscillation periods

It is remarkable that, for the same parameter values, our model system can exhibit periodic oscillations with a broad range of periods, by only rewiring the links. In other words, the model is able to change the timescale of its dynamics by changing only its network structure. In particular, the networks can have much longer periods than the three-gene repressilator. It is interesting to understand the origins of such behavior.

In our model, we have assumed so far that all regulatory links have the same strength $\phi_{ij} = \phi$. To see how the dynamics is affected by the link strength, for the network shown in Figure 4b we have chosen a link, allowed its relative strength $\phi_{ij}/\phi$ to vary and studied the resulting changes in the dynamics. This has been repeated for eight links in the network. Figure 5 shows the dependence of the oscillation periods on the relative strengths of the links. Each curve corresponds to the choice of a different link.

Fig. 5. (Color online) Dependence of oscillation periods on relative strengths of regulatory interactions $\phi_{ij}/\phi$ for the network shown in Figure 4b (horizontal axis: relative strength; vertical axis: period). Different colors correspond to different links. Data are shown here for 8 links out of 20.

Fig. 6. (Color online) Peak values of the node 1 as a function of the relative strength $\phi_{ij}/\phi$ of a link. The network has been taken from Figure 4c, and the link connecting node 7 and node 8 has been chosen.

As Figure 5 shows, the oscillation periods are sensitive to the strengths. Divergence of the periods is observed when the strengths are increased or decreased. This behavior has been found when varying any of the 20 links present. It is known in the theory of dynamical systems that a limit cycle may disappear through a saddle-node bifurcation and that this is accompanied by the divergence of the oscillation period (see [14]). Consider, for example, the following phase oscillator with a constant applied force:

$$\dot{\varphi} = a + \cos\varphi. \qquad (3)$$

When $a > 1$, the phase $\varphi$ undergoes rotations and, thus, oscillations are present. The rotation slows down at $\varphi = \pi$, where the velocity takes its minimum value $a - 1$, and the oscillator spends more time there. If the control parameter $a$ approaches unity, the velocity of motion at $\varphi = \pi$ decreases and, for $a = 1$, the oscillations stop. It can be easily checked that in this transition the oscillation period $T$ behaves as $T \sim (a - 1)^{-1/2}$ near the critical point $a = 1$. This example is a special case of a saddle-node bifurcation on a limit cycle. We have checked that the observed divergences in Figure 5 all have the exponent $-1/2$, indicating the presence of such a bifurcation.

Note that in our evolutionary optimization procedure a dynamical network approaches a bifurcation point not by changing a bifurcation parameter (all parameters are fixed during the evolution), but through appropriate rewiring of its connection architecture. After each rewiring, we get a different dynamical system. If a long target period is chosen, the evolution proceeds in such a way that networks which are close to a bifurcation point are generated.

The situation is different for networks without auto-regulatory loops that exhibit long-period oscillations, like the network in Figure 4c. When we applied the same analysis as above, we did not observe the divergence of periods. Instead, a transition from periodic oscillations to chaotic behavior was found. This transition is clearly seen if we look at all local maxima (peaks) of the output signal $u_1$ within a time interval much longer than the target period $T_0 = 100$. In Figure 6 we have plotted the local maxima of $u_1$ as a function of the strength of one selected link. At the strength equal to unity, there are seven peaks. Note that a logarithmic scale is used here for the vertical axis and one of the peaks has a very small height. Therefore, only six peaks are actually visible in Figure 4c. Such seven-peak oscillations are maintained within an interval of the strength variation in Figure 6. As the strength is further increased or decreased, we immediately see the divergence of the number of peaks, apparently indicating that the system becomes chaotic. Thus the designed network is located inside a periodicity window in a chaotic region. We have observed similar behavior when the strengths of other links were varied. It is interesting that, for the entire range of the strength, the average time interval between these peaks was not much different from the period of the repressilator.
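The scaling $T \sim (a - 1)^{-1/2}$ quoted for the phase oscillator of equation (3) can be checked numerically. The sketch below is an illustration added here, with hypothetical function names. It evaluates the period as the time needed for one rotation, $T = \int_0^{2\pi} d\varphi/(a + \cos\varphi)$, and compares it with the closed form $2\pi/\sqrt{a^2 - 1}$, a standard integral that is not stated in the text.

```python
import numpy as np

def period_numeric(a, n=200_000):
    """Period T = integral of dphi / (a + cos(phi)) over one rotation (a > 1)."""
    phi = 2.0 * np.pi * np.arange(n) / n      # uniform grid on [0, 2*pi)
    return 2.0 * np.pi * np.mean(1.0 / (a + np.cos(phi)))

def period_exact(a):
    """Closed form 2*pi / sqrt(a^2 - 1) of the same integral."""
    return 2.0 * np.pi / np.sqrt(a * a - 1.0)

for a in (1.1, 1.01, 1.001):
    T = period_numeric(a)
    # T * sqrt(a - 1) approaches pi * sqrt(2) as a -> 1, i.e. T ~ (a - 1)^(-1/2).
    print(a, T, T * np.sqrt(a - 1.0))
```

Since $a^2 - 1 = (a - 1)(a + 1)$, the closed form gives $T \approx 2\pi/\sqrt{2(a - 1)}$ close to the critical point, which is the exponent $-1/2$ used in the text.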

5 Statistical analysis of network structures

So far, we have shown that designed genetic networks exhibit various dynamical behaviors depending on the network structure. Below, we perform a statistical analysis of the structures of the designed networks, for large ensembles of networks sharing the same target period.

5.1 Network motif distributions

Network motifs are subgraphs that appear in a network more frequently than in its randomized version. The analysis of network motifs has made it possible to elucidate basic building blocks in different biological, ecological and technological networks [13]. Moreover, it has been found that many natural and social networks can be classified into several superfamilies, according to the significance profiles of the subgraphs [15]. For a given network, the Z-score $z_i$ of a subgraph $i$ is given by

$$z_i = \frac{N_i^{\mathrm{real}} - \langle N_i^{\mathrm{rand}} \rangle}{\mathrm{std}(N_i^{\mathrm{rand}})}, \qquad (4)$$

where $N_i^{\mathrm{real}}$ and $N_i^{\mathrm{rand}}$ are the frequencies of subgraph $i$ in a given network and in its randomized version, obtained by keeping the same degree distribution, $\langle \cdots \rangle$ denotes the average taken over the ensemble of randomized networks, and $\mathrm{std}(\cdots)$ stands for the standard deviation. Here, we have considered three-node subgraphs. Auto-regulatory loops have been disregarded and, therefore, we had only 13 connected subgraphs, which are all shown in Figure 7a. The software MFinder [16] has been used to calculate the Z-scores of subgraphs. For each network, we have calculated the Z-scores of all three-node subgraphs. For an ensemble of networks with the same target period $T_0$, we have taken the average of the Z-scores, yielding a 13-component vector. We have normalized this vector to unity, so that its components become the normalized Z-scores.
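The Z-score of equation (4) and the subsequent normalization can be sketched as follows. This sketch assumes that the subgraph counts are already available as arrays and does not reproduce the MFinder workflow. The function names and the example counts are hypothetical, the population standard deviation is used, and "normalized to unity" is read as unit Euclidean length; both choices are assumptions.

```python
import numpy as np

def z_scores(n_real, n_rand):
    """Equation (4): n_real has shape (13,), n_rand has shape (R, 13)."""
    n_real = np.asarray(n_real, dtype=float)
    n_rand = np.asarray(n_rand, dtype=float)
    return (n_real - n_rand.mean(axis=0)) / n_rand.std(axis=0)

def normalized_profile(z_per_network):
    """Average Z-scores over an ensemble and scale the result to unit length."""
    z_mean = np.asarray(z_per_network, dtype=float).mean(axis=0)
    return z_mean / np.linalg.norm(z_mean)

# Hypothetical counts, used only to exercise the functions.
rng = np.random.default_rng(0)
n_rand = rng.poisson(lam=20.0, size=(100, 13))   # counts in 100 randomized networks
n_real = rng.poisson(lam=20.0, size=13)          # counts in the network under study
z = z_scores(n_real, n_rand)
profile = normalized_profile(np.stack([z, 0.5 * z]))
```

In this toy ensemble the two networks have proportional Z-scores, so the normalized profile coincides with the normalized Z-score vector of either one.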


Fig. 7. (Color online) (a) Three-node subgraphs, numbered 1 to 13. (b) Normalized Z-scores for different oscillation periods and their statistical dispersions. The panels correspond to $T_0 = 4$, 10, 20, 30, 40, and 100 (horizontal axis: subgraph index 1 to 13; vertical axis: normalized Z-score).

Figure 7b shows distributions of normalized Z-scores of three-node subgraphs, each averaged over 1000 designed networks, for several target periods T0 . The distributions are clearly different for different target periods, although the statistical dispersion is quite large. Note that, none of them belong to those of the previously identified superfamilies of networks [15]. For T0 = 4 and T0 = 10, the third subgraph, which corresponds to an inhibition chain, is strongly suppressed, as compared to the randomized networks. On the other hand, the eighth subgraph is largely enhanced for these target periods. Remarkably, the eighth subgraph is nothing but the repressilator and it can itself generate oscillations with the period about 10. Thus, the designed networks for T0 = 4 and T0 = 10 tend to include the repressilator as their characteristic motif. Besides of i = 3 and i = 8, all other subgraphs have low scores. As the target period increases, the distribution drastically changes. The third subgraph is enhanced and the eighth subgraph is suppressed. At the same time, other

0

20

40

60

80

6 7 8 9 10

100

11 12 13

120

140

160

Target period

Fig. 8. (Color online) Normalized Z-scores of three-node subgraphs for different oscillation periods, averaged over the networks with the same target periods.

subgraphs (except for i = 7, 12, and 13) become significant. In particular, the subgraphs i = 4 and i = 5 get enhanced and those with a larger number of links tend to be suppressed. In Figure 8, the normalized Z-scores of the designed networks are shown as a function of T0 . Each subgraph exhibits its own dependence on the target period. Specifically, the score of the eighth subgraph is high for low T0 , and, as T0 increases, it becomes negative, reaches a minimum near T0 = 30, and then tends to zero. It is also noticeable that the score of the third subgraph is strongly anti-correlated with that of the eighth subgraph. As T0 increases, many Z-scores seem to get stabilized at certain values. As seen in Figure 7, normalized Z-scores have relatively large statistical dispersions. Therefore, it can be interesting to see what are the distributions of scores of individual subgraphs in a network ensemble. In Figure 9, histograms of Z-scores of the eighth subgraph for individual networks in the ensembles are shown for different target periods. Here we have normalized the scores of each network individually. For small periods (T0 = 4, 10), there is a maximum for positive scores, but, as the period increases (T0 = 20, 30, 40), this maximum becomes smaller. However, at the same time another maximum appears in the region of negative scores. For the large period T0 = 100, this maximum becomes smaller and the distribution gets more flat. Our analysis of motif distributions reveals that qualitatively different network architectures are needed to generate oscillations with short or large periods. For low target periods, networks use the repressilator as the principal structural motif. For long periods, networks however tend to avoid it. On the other hand, inhibitory chains (the third subgraph) are avoided at short periods, but their presence is characteristic for the networks which can generate oscillations with long periods. 
The bimodality of the histograms in Figure 9 reveals furthermore that, at intermediate periods, network solutions with different kinds of architectures can coexist.

Fig. 9. (Color online) Histograms of normalized Z-scores of the eighth subgraph with different target periods T0. [Horizontal axis: normalized Z-score; vertical axis: frequency; curves for T0 = 4, 10, 20, 30, 40, 100.]
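The per-network normalization of Z-scores mentioned above can be sketched as follows. This is a minimal sketch, assuming the significance-profile normalization of ref. [15], in which each Z-score is divided by the Euclidean norm of the network's Z-score vector. The function name, the input layout, and the convention Z = 0 for a subgraph whose count does not vary in the randomized ensemble are illustrative assumptions, and the counts in the example are toy numbers, not data from the study.

```python
import math
import statistics


def normalized_z_scores(counts, randomized_counts):
    """Z-score of each subgraph count against a randomized ensemble,
    rescaled so that the vector of scores has unit Euclidean length."""
    z = []
    for i, observed in enumerate(counts):
        sample = [row[i] for row in randomized_counts]
        mean = statistics.mean(sample)
        std = statistics.stdev(sample)
        # Assumption: a subgraph whose count never varies gets Z = 0.
        z.append((observed - mean) / std if std > 0 else 0.0)
    norm = math.sqrt(sum(v * v for v in z))
    return [v / norm for v in z] if norm > 0 else z


# Toy counts for three hypothetical subgraphs (not data from the study).
toy_counts = [5, 2, 0]
toy_randomized = [[3, 2, 1], [4, 2, 0], [2, 2, 2]]
toy_scores = normalized_z_scores(toy_counts, toy_randomized)
```

With this normalization every network contributes a score vector of the same length, so histograms such as those in Figure 9 compare the relative weight of a subgraph rather than its raw count.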

5.2 Network Laplacian spectra

In this section, we apply a different method to characterize statistical properties of designed networks. This method consists in determining Laplacian spectra of the networks. Recently, Laplacian spectra of various biological networks have been considered and it has been shown that, by examining such spectra, significant differences in network architectures can be observed [17–19]. Following [20], we define the network Laplacian as a matrix L with the elements

$$L_{ij} = \delta_{ij} - \frac{1}{d_i} A_{ij}, \tag{5}$$

where $d_i = \sum_{j=1}^{N} A_{ij}$ is the total number of incoming links, or the in-degree, of node i. Since the considered networks are directed, the adjacency matrix A and thus the Laplacian matrix L are generally not symmetric. The matrix L possesses eigenvalues $\lambda_\alpha$ ($\alpha = 1, \ldots, N$), the set of which constitutes the Laplacian spectrum of the chosen network. Banerjee and Jost [17–19] considered undirected networks and thus the Laplacian had real eigenvalues in their study. In contrast to this, our networks are directed and the Laplacian eigenvalues can be complex. Note that the Laplacian L is not defined if $d_i = 0$ for some node i, i.e. if a node has no incoming links. We have therefore excluded from our analysis below the networks which had such nodes.

We have applied this analysis to ensembles of 1000 designed networks of N = 10 and M = 20 for each target period. Figure 10 shows distributions of Laplacian spectra for the ensembles of networks with several different target periods. To display them, density plots of the eigenvalues are used. Every network yields 10 complex eigenvalues and, thus, we have 10 000 eigenvalues in each distribution. To construct such spectral plots, we have divided the complex plane into a square grid of cells of linear size 0.04 and counted the number of eigenvalues in each cell. In this way, a density plot has been obtained. There are a few cells with very high counts. To display the spectra more clearly, in Figure 10 we have set a cutoff at the density of 100. For comparison, the distribution of spectra for the ensemble of 1000 random networks is additionally shown in Figure 10a. They have been constructed by the same procedure as that used when we prepared initial networks for the evolutions.

Fig. 10. (Color online) Distributions of Laplacian eigenvalues for the ensembles of networks with different oscillation periods: (a) initial random networks, (b) T0 = 2, (c) T0 = 10, (d) T0 = 20, (e) T0 = 30, (f) T0 = 100.

In each spectrum, there are real eigenvalues and pairs of complex eigenvalues. One trivial eigenvalue λ = 0 is present in every network. Therefore, in each plot there is a high peak at λ = 0. Moreover, each distribution also has a high peak at the real eigenvalue λ = 1. For random networks, complex eigenvalues are almost uniformly distributed in a ring around λ = 1. In contrast to this, one can observe clear clusters of complex eigenvalues in the spectra of designed networks. For the short period T0 = 2, we see clusters near λ = 1 ± 0.7i. For T0 = 10, these clusters remain and additionally two peaks appear (indicated by arrows). These two peaks correspond, actually, to the eigenvalues of the three-node repressilator, which are equal to $\lambda = \frac{3}{2} \pm \frac{\sqrt{3}}{2} i$. For T0 = 20, these two peaks disappear, while the clusters move to the right, closer to the real axis. Additionally, a new pair of clusters appears in the left part of the ring. For T0 = 30, these new clusters spread, while the previously existing clusters continue to move to the right and get still closer to the real axis. For very large periods (T0 = 100), the distribution looks similar to that of the random networks, but the fraction of complex eigenvalues is higher. There are still two narrow clusters on the right side of the ring.

In Figure 11 we show distributions of real eigenvalues for the ensembles of networks with different oscillation periods and for the random networks. Here, we have used the same network ensembles as in Figure 10. Examining the distributions, we notice that there is no significant dependence on the oscillation periods and, furthermore, these distributions are similar to that of the random networks in Figure 11a. Therefore, the difference between the architectures of the networks with various oscillation periods is better expressed in the distributions of complex eigenvalues, whereas the distributions of real eigenvalues are not very sensitive in the considered case.

Fig. 11. (Color online) Distributions of real eigenvalues for the same ensembles of networks as in Figure 10: (a) initial random networks, (b) T0 = 2, (c) T0 = 10, (d) T0 = 20, (e) T0 = 30, (f) T0 = 100.

As we have seen in Section 4.2, networks with long oscillation periods without auto-regulation loops have special dynamical properties. It is possible that their structural properties are also different from those of the networks including auto-regulatory loops. Essentially, the data used in the statistical analysis of motif distributions and Laplacian spectra may represent, for long oscillation periods, a superposition of the data for two such ensembles. It should be noted, however, that, because we never excluded networks with auto-regulatory loops during the evolution process, our designed networks usually had such loops and the networks without the loops were rather rare. Therefore, the motif distributions and Laplacian spectra which we have studied are actually characteristic of the networks which had auto-regulatory loops. To check this, we have sometimes excluded networks without the loops from the considered ensemble and could see that the distributions were not strongly modified.
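The construction of the Laplacian in equation (5) and the binning used for the density plots can be illustrated with a short sketch. It is not the code used in the study: the function names are hypothetical, and only the repressilator matrix of Appendix A is checked, for which the complex eigenvalues and their eigenvectors can be written in closed form. For a general ten-node network the eigenvalues would be computed with a numerical eigenvalue routine, which is omitted here to keep the sketch free of dependencies.

```python
import cmath
import math
from collections import Counter


def laplacian(A):
    """L_ij = delta_ij - A_ij / d_i, with d_i the in-degree (row sum) of node i."""
    size = len(A)
    L = []
    for i in range(size):
        d_i = sum(A[i])
        if d_i == 0:
            # The Laplacian is undefined for a node without incoming links.
            raise ValueError(f"node {i} has no incoming links")
        L.append([(1.0 if i == j else 0.0) - A[i][j] / d_i for j in range(size)])
    return L


def matvec(M, v):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def bin_eigenvalues(eigenvalues, cell=0.04):
    """Count eigenvalues in square cells of the complex plane (density plot)."""
    counts = Counter()
    for z in eigenvalues:
        counts[(math.floor(z.real / cell), math.floor(z.imag / cell))] += 1
    return counts


# Repressilator adjacency matrix (A_ij = 1 if gene j represses gene i).
REPRESSILATOR = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
L_rep = laplacian(REPRESSILATOR)

# Check L v = lambda v for v = (1, w, w^2), where w is a complex cube root of unity.
eigenpairs = []
for w in (cmath.exp(2j * math.pi / 3), cmath.exp(-2j * math.pi / 3)):
    v = [1, w, w * w]
    lam = 1 - w * w
    residual = max(abs(a - lam * b) for a, b in zip(matvec(L_rep, v), v))
    eigenpairs.append((lam, residual))
```

The two eigenvalues obtained in this way have real part 3/2 and imaginary parts ±√3/2, in agreement with the positions of the repressilator peaks discussed above.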

5.3 Other statistical properties

In addition to motif distributions and Laplacian spectra, we have also determined other statistical properties of the designed networks. They are the clustering coefficient, the characteristic path length, and the maximum and minimum in- and out-degrees.

The clustering coefficient of a node i with k neighbours, $C_i$, is defined as

$$C_i = \frac{E_i}{k(k-1)}, \tag{6}$$

where $E_i$ is the number of connections, both incoming and outgoing, among the k neighbours. The clustering coefficient of a network is the average of $C_i$ over all nodes.

The characteristic path length of a network is defined as follows. For two nodes i and j, the shortest directed path $l_{ij}$, connecting node i to node j, is determined. If the path from i to j does not exist, we set $l_{ij}^{-1} = 0$. The inverse characteristic path length is obtained by taking the average of $l_{ij}^{-1}$ over all pairs of the nodes i and j.

The in-degree and the out-degree of a node are the numbers of incoming and outgoing links of this node. For each network, we determine its maximal and minimal in- and out-degrees.

Figure 12 shows ensemble averages over 1000 networks at each target period for the clustering coefficients, the characteristic path lengths, and the in- and out-degrees (maximum and minimum). Additionally, standard statistical deviations of these properties are displayed. Generally, there is no strong dependence of these statistical properties on the oscillation periods. It can only be noticed that the clustering coefficient increases for shorter periods.
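The two definitions above can be sketched in a few lines. This is a minimal sketch under stated assumptions that go beyond the text: neighbours of a node are taken to be all nodes linked to it in either direction, self-loops are ignored, nodes with fewer than two neighbours contribute $C_i = 0$, and the average of $l_{ij}^{-1}$ runs over all ordered pairs with i ≠ j. The function names are hypothetical.

```python
from collections import deque


def clustering_coefficient(A):
    """Network average of C_i = E_i / (k (k - 1)), equation (6)."""
    size = len(A)
    total = 0.0
    for i in range(size):
        # Assumption: neighbours are nodes linked to i in either direction.
        nbrs = [j for j in range(size) if j != i and (A[i][j] or A[j][i])]
        k = len(nbrs)
        if k < 2:
            continue  # assumption: such nodes contribute C_i = 0
        e_i = sum(1 for a in nbrs for b in nbrs if a != b and A[a][b])
        total += e_i / (k * (k - 1))
    return total / size


def inverse_path_length(A):
    """Average of 1 / l_ij over ordered pairs i != j (0 if j is unreachable)."""
    size = len(A)
    # A[i][j] = 1 encodes a link from j to i, so the successors of j are such i.
    successors = [[i for i in range(size) if i != j and A[i][j]] for j in range(size)]
    total = 0.0
    for source in range(size):
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search along directed links
            u = queue.popleft()
            for v in successors[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(1.0 / d for node, d in dist.items() if node != source)
    return total / (size * (size - 1))


REPRESSILATOR = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
c_rep = clustering_coefficient(REPRESSILATOR)
inv_l_rep = inverse_path_length(REPRESSILATOR)
```

For the three-node cycle, each node has two neighbours joined by a single directed link, and every node reaches the other two in one and two steps.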

Fig. 12. (Color online) Statistical properties of the designed networks with different prescribed periods. (a) Clustering coefficient, (b) characteristic path length, (c) maximum and minimum in-degrees, (d) maximum and minimum out-degrees, all averaged over 1000 networks for each target period. Additionally, statistical dispersion of data is indicated.

6 Conclusions

By using evolutionary optimization algorithms, many thousands of genetic networks, all with the same numbers of genes and regulatory connections but with different prescribed oscillation periods, could be constructed. To obtain each network starting from a random network architecture, only up to ten thousand iteration steps were needed. In this way, genetic networks with oscillation periods differing by up to three orders of magnitude for the same parameter values could be found.

The abundance of solutions and the fast convergence of the optimization process are remarkable. In fact, our study reveals the great plasticity of genetic network dynamics, which may play an important role in natural biological evolution. Even relatively small networks with just ten genes are able, by mere rewiring of their connections, to generate limit-cycle oscillations in a wide range of timescales. Moreover, there are thousands of different networks which generate oscillations with (almost) the same period.

Our study also reveals the high efficiency of the optimization methods in network design. The total number of different networks with 10 nodes and 20 directed links is of the order of $10^{20}$. Therefore, a solution cannot be found by exhaustive search. There are also no rational methods which would have allowed us to construct networks with a prescribed oscillation period. Nonetheless, the evolutionary optimization method, which is a variant of simulated annealing, is able to rapidly yield many good solutions.

With the abundant data accumulated, statistical properties of the networks could be systematically investigated. Our analysis has shown that traditional characterization methods, based on degree distributions or mean path lengths, are not particularly sensitive in distinguishing networks with different oscillation periods. However, such networks were found to have distinctly different motif distributions and Laplacian spectra.

In our networks, only inhibitory regulations were allowed. Therefore, all of them can be seen as extensions of the elementary three-gene repressilator circuit [2]. Indeed, such cyclic three-node circuits could be frequently identified in the designed networks of ten genes when the prescribed oscillation period was relatively short and close to that of the repressilator with the same parameter values. The networks generating long-period oscillations had qualitatively different architectures, as revealed by the analysis of both their motifs and spectra. Often, they were found to be in the vicinity of a saddle-node on the limit cycle bifurcation or a bifurcation leading to chaotic dynamics.

The evolutionary optimization methods developed in our study can be applied to design genetic networks with other prescribed dynamical properties, including both inhibitory and activating regulations. Although in the present study we have considered only inhibitory regulations for the sake of simplicity, our optimization algorithm can be straightforwardly extended to include activating regulations. It may be interesting, furthermore, to design networks which, while generating oscillations with a prescribed period, are also functionally robust against local damage or random parameter variations. This will be done at the next stage of our research.

The aim of the present model study was not to design actual genetic networks which may be experimentally implemented, but rather to systematically explore the power of evolutionary design methods and to gain a better understanding of the dynamical properties and capacities of regulatory genetic networks. Therefore, some simplifications have been made in the construction of the model and, moreover, the parameter choice was relatively arbitrary. The results of our study suggest, however, that similar evolutionary optimization methods should also be applicable in real synthetic biology, as part of future projects aimed at engineering real genetic networks with prescribed dynamics.
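The order-of-magnitude estimate of the search-space size quoted in the Conclusions can be checked with a short calculation. The sketch below assumes that a network is a choice of 20 distinct directed links among the possible ordered pairs of 10 labelled nodes; the variable names are illustrative. Both conventions are shown, with auto-regulatory loops allowed (100 candidate links) and forbidden (90 candidate links).

```python
from math import comb

N_GENES = 10
M_LINKS = 20

# Auto-regulatory loops allowed: N * N candidate links.
with_loops = comb(N_GENES * N_GENES, M_LINKS)

# Auto-regulatory loops forbidden: N * (N - 1) candidate links.
without_loops = comb(N_GENES * (N_GENES - 1), M_LINKS)
```

With loops allowed the count falls between $10^{20}$ and $10^{21}$, which matches the figure quoted in the text. This is far more than the up to ten thousand iteration steps that the evolutionary procedure needed for each network.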

Financial support from the Volkswagen Foundation (Germany) in the framework of the program “Networks as a phenomenon across the disciplines” is gratefully acknowledged.

Appendix A

For our study we have chosen a simple model of a transcription regulatory network with only repressors. We consider the dynamics of the concentrations $U_i$ of gene products, proteins $U_i$ (see [21,22]). Gene expression involves first the formation of messenger RNA and, generally, RNA concentrations must be also included into the description, as has been done, e.g., by Elowitz and Leibler [2] in their modeling of the repressilator circuit. However, characteristic times of RNA are often short as compared with the characteristic time scales for the protein products. When this is the case, as assumed in our model, both transcription and translation can be described as a single process of protein production, without explicitly considering the dynamics of the intermediate product (RNA).

The rate of production of protein $U_i$ is high when the transcription factor (protein $U_j$) is not bound to the promoter region of gene i, whereas the production rate is low when it is bound. Therefore, the production rate represents some decreasing function of the concentration $U_j$ of the protein $U_j$. In our model, we use the Hill function, characteristic for many real genetic systems. Thus, the production rate of protein $U_i$ is given by

$$\frac{V_i}{1 + (U_j/K_i)^n}, \tag{A.1}$$

where $V_i$ is the maximum rate, n is the Hill coefficient, and $K_i$ specifies the characteristic concentration of protein $U_j$ at which the rate reaches its half-maximum level.

When there are several transcription factors which jointly regulate gene i, its expression is determined by a cis-regulatory input function that integrates regulation by these factors. In our model, we assume that multiple factors act competitively on the regulated gene and, therefore, concentrations of these factors (i.e. of regulatory proteins) are additively affecting gene expression. Under this assumption, the production rate can be written as

$$\frac{V_i}{1 + \left(\sum_j U_j/K_{ij}\right)^n}, \tag{A.2}$$

where $K_{ij}$ specifies the strength of the effect of the regulatory factor $U_j$ on the production rate of protein $U_i$. The structure of a regulatory network consisting of N genes is determined by the adjacency matrix $A_{ij}$, which is defined in such a way that $A_{ij} = 1$, if a repressive interaction from the gene j to i exists, and $A_{ij} = 0$ otherwise. For example, the repressilator is described by the matrix

$$A = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}. \tag{A.3}$$

If protein $U_i$ undergoes degradation and dilution with rate constant $\gamma_i$, this process should be additionally taken into account. As a result, evolution equations for protein concentrations $U_i$ take the form

$$\frac{dU_i}{dt} = \frac{V_i}{1 + \left(\sum_{j=1}^{N} A_{ij} U_j/K_{ij}\right)^n} - \gamma_i U_i. \tag{A.4}$$

We assume that the decay rates $\gamma_i$ are the same for all proteins, $\gamma_i = \gamma$. Introducing dimensionless concentrations $u_i = \gamma U_i/V_i$ and parameters $\phi_{ij} = V_j/\gamma K_{ij}$ and measuring time in the units of $\gamma^{-1}$, the activity of the regulatory network becomes described by the following set of ordinary differential equations:

$$\frac{du_i}{dt} = \frac{1}{1 + \left(\sum_{j=1}^{N} A_{ij} \phi_{ij} u_j\right)^n} - u_i \qquad (i = 1, 2, \ldots, N). \tag{A.5}$$

Thus, the dimensionless model (1) is derived.
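To make equation (A.5) concrete, the sketch below evaluates its right-hand side for the repressilator matrix (A.3) and integrates it with a classical fourth-order Runge-Kutta step. It is only an illustration under stated assumptions: all couplings are set to one common value $\phi_{ij} = \phi$, and the values φ = 10 and n = 3, the initial condition, the time step, and all function names are hypothetical choices, not the parameters of the study.

```python
def rhs(u, A, phi, n):
    """Right-hand side of equation (A.5), assuming a uniform phi_ij = phi."""
    size = len(u)
    return [
        1.0 / (1.0 + sum(A[i][j] * phi * u[j] for j in range(size)) ** n) - u[i]
        for i in range(size)
    ]


def rk4_step(u, dt, A, phi, n):
    """One classical Runge-Kutta step of size dt."""
    k1 = rhs(u, A, phi, n)
    k2 = rhs([x + 0.5 * dt * k for x, k in zip(u, k1)], A, phi, n)
    k3 = rhs([x + 0.5 * dt * k for x, k in zip(u, k2)], A, phi, n)
    k4 = rhs([x + dt * k for x, k in zip(u, k3)], A, phi, n)
    return [
        x + dt * (a + 2 * b + 2 * c + d) / 6.0
        for x, a, b, c, d in zip(u, k1, k2, k3, k4)
    ]


def symmetric_fixed_point(phi, n, iterations=200):
    """Solve u (1 + (phi u)^n) = 1 by bisection; valid when every gene has
    exactly one repressor, as in the repressilator."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mid * (1.0 + (phi * mid) ** n) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


REPRESSILATOR = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]  # matrix (A.3)
PHI, HILL = 10.0, 3  # hypothetical parameter values

u_star = symmetric_fixed_point(PHI, HILL)
trajectory = [[0.1, 0.2, 0.3]]  # arbitrary initial concentrations
for _ in range(2000):
    trajectory.append(rk4_step(trajectory[-1], 0.01, REPRESSILATOR, PHI, HILL))
```

Because the production term in (A.5) never exceeds one, the dimensionless concentrations remain between 0 and 1, which gives a quick sanity check on any integration of the model.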

References

1. R. Kwok, Nature 463, 288 (2010)
2. M.B. Elowitz, S. Leibler, Nature 403, 335 (2000)
3. J. Stricker, S. Cookson, M.R. Bennett, W.H. Mather, L.S. Tsimring, J. Hasty, Nature 456, 516 (2008)
4. M. Tigges, T.T. Marquez-Lago, J. Stelling, M. Fussenegger, Nature 457, 309 (2009)
5. T. Danino, O. Mondragón-Palomino, L. Tsimring, J. Hasty, Nature 463, 326 (2010)
6. M.R. Garey, D.S. Johnson, Computers and Intractability (Freeman, San Francisco, 1979)
7. S. Kirkpatrick, C.D. Gelatt, M.P. Vecchi, Science 220, 671 (1983)
8. P. Kaluza, A.S. Mikhailov, Europhys. Lett. 79, 48001 (2007)
9. P. Kaluza, M. Ipsen, M. Vingron, A.S. Mikhailov, Phys. Rev. E 75, 015101 (2007)
10. P. Kaluza, M. Vingron, A.S. Mikhailov, Chaos 18, 026113 (2008)
11. K. Fujimoto, S. Ishihara, K. Kaneko, PLoS ONE 3, e2772 (2007)
12. M. Stich, C. Briones, S.C. Manrubia, BMC Evol. Biol. 7, 110 (2007)
13. R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, U. Alon, Science 298, 824 (2002)
14. S.H. Strogatz, Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering (Westview Press, 2001)
15. R. Milo, S. Itzkovitz, N. Kashtan, R. Levitt, S. Shen-Orr, I. Ayzenshtat, M. Sheffer, U. Alon, Science 303, 1538 (2004)
16. Software package MFinder is available at www.weizmann.ac.il/mcb/UriAlon
17. A. Banerjee, J. Jost, Theory Biosci. 126, 15 (2007)
18. A. Banerjee, J. Jost, Linear Algebra Appl. 428, 3015 (2008)
19. A. Banerjee, J. Jost, Discrete Appl. Math. 157, 2425 (2009)
20. F.R.K. Chung, Spectral Graph Theory (American Mathematical Society, 1997)
21. S. Ishihara, K. Fujimoto, T. Shibata, Genes Cells 10, 1025 (2005)
22. U. Alon, An Introduction to Systems Biology (Taylor & Francis Group, LLC, 2007)