The Effect of Transcription and Translation Initiation Frequencies on ...

51 downloads 0 Views 665KB Size Report
Jul 14, 2000 - Dissociation rate is set to reproduce proper translation initiation frequency. RBS clearance. 0.5 s 1. According to experimental estimations (19).
THE JOURNAL OF BIOLOGICAL CHEMISTRY © 2001 by The American Society for Biochemistry and Molecular Biology, Inc.

Vol. 276, No. 11, Issue of March 16, pp. 8165–8172, 2001 Printed in U.S.A.

The Effect of Transcription and Translation Initiation Frequencies on the Stochastic Fluctuations in Prokaryotic Gene Expression* Received for publication, July 14, 2000, and in revised form, October 27, 2000 Published, JBC Papers in Press, November 2, 2000, DOI 10.1074/jbc.M006264200

Andrzej M. Kierzek‡§, Jolanta Zaim‡, and Piotr Zielenkiewicz‡¶ From the ‡Institute of Biochemistry and Biophysics, Polish Academy of Sciences, Pawinskiego 5a, 02-106 Warsaw, Poland, and ¶Institute of Experimental Plant Biology, Warsaw University, Pawinskiego 5a, 02-106 Warsaw, Poland

The kinetics of prokaryotic gene expression has been modelled by the Monte Carlo computer simulation algorithm of Gillespie, which allowed the study of random fluctuations in the number of protein molecules during gene expression. The model, when applied to the simulation of LacZ gene expression, is in good agreement with experimental data. The influence of the frequencies of transcription and translation initiation on random fluctuations in gene expression has been studied in a number of simulations in which promoter and ribosome binding site effectiveness has been changed in the range of values reported for various prokaryotic genes. We show that the genes expressed from strong promoters produce the protein evenly, with a rate that does not vary significantly among cells. The genes with very weak promoters express the protein in “bursts” occurring at random time intervals. Therefore, if the low level of gene expression results from the low frequency of transcription initiation, huge fluctuations arise. In contrast, the protein can be produced with a low and uniform rate if the gene has a strong promoter and a slow rate of ribosome binding (a weak ribosome binding site). The implications of these findings for the expression of regulatory proteins are discussed.

Stochastic fluctuations in gene expression have drawn the attention of several research groups. It has been postulated (1, 2) and observed (3) that proteins are produced in short “bursts” occurring at random time intervals rather than in a continuos manner. This expression pattern arises from the fact that elementary processes such as polymerase binding and open complex formation involve small number of molecules and therefore show a wide distribution of reaction times. The fluctuations in a number of protein molecules resulting from stochastic gene expression are of special importance if the concentration of the protein constitutes the message that regulates expression of another gene. In this case, one may expect that the activation pattern of the regulated gene is also random, to some extent. The stochastic nature of gene expression and other biochemical processes involving small numbers of molecules may explain the variations observed in isogenic populations of bacterial cells. Examples are individual chemotactic responses (4), the distribution of generation times observed in Escherichia coli cells (5), and the individual cell responses observed during induction of lactose (6) and arabinose (7) oper* This work was supported by Internal PW5 Grant from Institute of Biochemistry and Biophysics, Polish Academy of Sciences. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact. § To whom correspondence should be addressed. Tel.: 48-22-658-4703; Fax: 48-39-12-16-23; E-mail: [email protected]. This paper is available on line at http://www.jbc.org

ons at subsaturating inducer concentrations. Stochastic effects have also been observed in engineered genetic networks in E. coli (8). Taking into account the stochastic phenomena described above, it is clear that the cell must contain various mechanisms that ensure deterministic regulation of cellular processes. The check points in eukaryotic cell cycle regulation (9) are examples of such a mechanism. Computer simulations implementing various Monte Carlo algorithms proved to be useful in studying stochastic processes in biochemical reaction networks. This is especially true for several prokaryotic systems for which, due to their relative simplicity, a large number of quantitative measurements are available. Examples are the kinetic analysis of developmental pathway bifurcation in phage ␭ (2), computer simulation of the chemotactic signaling pathway (4), and mechanistic modeling of the all or none effect in lactose operon regulation (10). The goal of this article is to study the kinetic mechanisms that are responsible for the stochastic phenomena of gene expression. Because both RNA and protein elongation rates are approximately equal for all genes, different levels of gene expression are the result of varying frequencies of transcription and translation initiations. Therefore, we examined the influence of transcription and translation initiation frequencies on the pattern of gene expression. The model of a prokaryotic gene was built using kinetic parameters of LacZ determined by Kennell and Riezman (11). The model is consistent with the experimental results presented in their work. We have systematically varied the frequencies of transcription and translation initiation reactions, keeping other parameters of the model constant. Our results indicate that the genes expressed from strong promoters (high frequency of transcription initiation) produce proteins with constant velocity and low variation in the number of molecules. A decrease in the frequency of transcription initiation causes a noisy pattern of gene expression, i.e. the protein is produced in bursts at random time intervals. In contrast, a decrease in the frequency of translation initiation lowers the speed of protein synthesis but does not lead to noisy expression patterns. EXPERIMENTAL PROCEDURES

Formulation of the Model—We present the kinetic model of single prokaryotic gene expression. The transcription and translation processes are represented by the collection of kinetic equations that are numerically integrated with the use of the Monte Carlo approach of Gillespie (12, 13). We assume that the cells in which the model gene is expressed are in the exponential growth phase. All of the experimental data cited in our work concern the cells in this growth phase. Moreover, under these conditions, the concentrations of free RNA polymerase and ribosomes available for the single gene are most probably kept constant by the delicate balance between many coupled reactions (14 –16) such as synthesis of RNA polymerase and ribosomes, binding of RNA polymerase and ribosomes to other expressing genes, nonspecific binding of RNA polymerase to DNA, and its idling at pause sites. A detailed model of all

8165

8166

Stochastic Effects in Prokaryotic Gene Expression TABLE I Summary of the kinetic model and its parameters for the LacZ gene Reaction/process

Rate constants

Comments

The pool of available RNA polymerase The pool of available ribosomes

Gaussian distribution with ␮ ⫽ 35 and ␦ ⫽ 3.5 molecules Gaussian distribution with ␮ ⫽ 350 and ␦ ⫽ 35 molecules

Order of magnitude of the mean is set according to experimental estimations (14, 17). Sensitivity of the results to the value of ␦ has been tested.

RNA polymerase binding and dissociation

Association rate: 108 M⫺1 s⫺1 Dissociation rate: 10 s⫺1

Association rate is set according to experimental estimations (17). Dissociation rate is set to reproduce transcription initiation frequency measured by Kennell & Riezman (11).

Closed complex isomerization

1 s⫺1

According to experimental estimations (17)

Promoter clearance and RBS synthesis

1 s⫺1

Estimated using the rate of polymerase movement and the lengths of RBS and promoter sequences

Ribosome binding and dissociation

Association rate: 108 M⫺1 s⫺1 Dissociation rate: 2.25 s⫺1

Association rate is set to the order of magnitude of diffusion limited aggregation. Dissociation rate is set to reproduce proper translation initiation frequency.

RBS clearance

0.5 s⫺1

According to experimental estimations (19)

mRNA degradation

0.3 s⫺1

Set close to transcription initiation frequency. Values at least as high as transcription initiation frequency are necessary to assure that the mRNA level is lowered by the decrease in protection of RBS by bound ribosomes (as observed by Yarchuk et al. (18)).

Elongation of protein chain

0.015 s⫺1

According to commonly accepted value of ribosome movement rate (15 amino acids/s) and the length of ␤-galactosidase.

Protein degradation

6.42 ⫻ 10⫺5

According to ␤-galactosidase half-life of 3 h measured by Bergquist and Truman (22).

the processes that buffer concentrations of free RNA polymerase and ribosomes would require taking into account the biosynthesis of all the components of gene expression machinery and their binding within all the transcription units in the cell. Most of the parameters of such a complex model would be unknown and impossible to estimate. Instead, we included in the model randomly changing pools of the free RNA polymerase and ribosomes. Because these pools result from many small contributions of other processes, their distributions should be Gaussian. The mean values of these distributions have been set close to the estimations presented by other authors. The fluctuations in the RNA polymerase and ribosome concentrations described by the standard deviations of the distributions are unknown. Therefore, we checked the sensitivity of our results to these parameters. Although many research groups are developing detailed kinetic models of various elements of the gene expression machinery (e.g. transcription initiation (17)), the in vivo values of the rate constants of many elementary reactions remain unknown. In contrast, many experimental works describe the kinetics of prokaryotic gene expression (mainly of the LacZ gene in E. coli) in terms of simple effective parameters such as the frequencies of transcription and translation initiation, the rate of RNA polymerase and ribosome movement, mRNA half-life, etc. (11, 18). In our model, we adjust unknown values of elementary reactions to reproduce measured values of transcription and translation inititiation frequencies, mRNA level, and the protein synthesis rate in vivo. We assume that effective parameters obtained in this way implicitly include reactions involving nucleotides, translation initiation factors, bivalent ions, and other compounds necessary for proper function of the gene expression machinery. The summary of the kinetic model and its parameters for the LacZ gene is given in Table I. Details of the model and simulation algorithm are described below. Model of Transcription and Its Parameters—Transcription initiation is represented in our model as a two-step process:

polymerase binding; the reaction in Equation 3 represents isomerization of closed binary complex to open binary complex. The second order rate constant of RNAP1 binding has been set to 108 M⫺1 s⫺1, which is the order of magnitude of RNAP DNA binding determined in in vitro assays (17). The values of closed complex to open complex isomerization rates reported in the literature are of the order of 1 s⫺1. The rate of reaction (Equation 3) was set to this value. If the rate constants of reactions in Equations 1 and 3 are set to the values listed above, the rate of RNAP dissociation must be of the order of 10 s⫺1 to reproduce a transcription initiation frequency close to the value of 0.3 s⫺1 reported by Kennell and Riezman (11). Moreover, the binding constant resulting from the rates of reactions in Equations 1 and 2 is 107 M⫺1, which is in reasonable agreement with values estimated according to in vitro measurements (17). To clear the promoter region, active RNA polymerase must move 30 to 60 nucleotides (17). Taking into account that the rate of polymerase movement is about 40 nucleotides/s, this step takes about 1 s (17). The length of the mRNA chain that is synthesized during this time corresponds roughly to the length of the leader region containing the ribosome binding site (RBS). Therefore, the synthesis of the RBS and promoter clearance occur at approximately the same rate of 1 s⫺1. It is irrelevant for the conclusions of our model whether the RBS appears slightly earlier or later than promoter clearance. Therefore, we modeled these two processes by the single first order reaction with a rate constant of 1 s⫺1: TrRNAP 3 RBS ⫹ P ⫹ ElRNAP

(Eq. 4)

P ⫹ RNAP 3 P_RNAP

(Eq. 1)

P_RNAP 3 P ⫹ RNAP

(Eq. 2)

P_RNAP 3 TrRNAP

(Eq. 3)

where ElRNAP denotes polymerase elongating a given mRNA molecule. The elongation of the mRNA is not modeled. We assume that the RNA polymerase completes synthesis of the mRNA molecule with a rate sufficient to allow for a ribosome movement rate of 15 amino acids/s. Translation, mRNA Decay, and Protein Degradation—In prokaryote transcription, translation and mRNA degradation are tightly coupled. Following the experimental results of Yarchuk et al. (18), we have assumed the following interdependence between translation and mRNA decay: (i) RNase E and ribosomes compete for the RBS; (ii) if RNase E binds to the RBS faster than the ribosome, it degrades mRNA

where P represents the promoter region of the gene, RNAP represents RNA polymerase, and TrRNAP represents the transcribing RNA polymerase. Reactions in Equations 1 and 2 describe reversible RNA

1 The abbreviations used are: RNAP, RNA polymerase; RBS, ribosome binding site; TIF, transcription initiation frequency.

Stochastic Effects in Prokaryotic Gene Expression in the 5⬘ to 3⬘ direction and does not interfere with the movement of the ribosomes that have been already bound; and (iii) every ribosome that successfully binds to the RBS completes translation of the protein. The above assumptions have been modeled by the following reactions: Ribosome ⫹ RBS 3 RibRBS

(Eq. 5)

RibRBS 3 RBS ⫹ Ribosome

(Eq. 6)

RibRBS 3 ElRib ⫹ RBS

(Eq. 7)

RBS 3 decay

(Eq. 8)

where Ribosome represents the pool of free ribosomes, RibRBS represents the ribosome binding site protected by the bound ribosome, and ElRib denotes the ribosome elongating protein chain. The reaction in Equation 8 models decay of unprotected RBS. Parameters of the reactions in Equations 5– 8 are more difficult to estimate than those concerning transcription initiation. The second order rate constant for ribosome binding was set to 108 M⫺1 s⫺1, which is of the order of diffusion limited macromolecule binding. The rate of the RBS clearance step in translation initiation (the reaction in Equation 7) was set to 0.5 s⫺1, which is close to experimental estimates (19). Parameters of the reactions in Equations 6 and 8 have been adjusted to reproduce the experimentally known rate of protein synthesis and ribosome spacing of the LacZ gene. Another important constraint is provided by experiments of Yarchuk et al. (18). The authors introduced mutations in the RBS region of the LacZ gene that resulted in a 200-fold variation in the expression level. A decrease in the protein synthesis rate has been followed by a nearly equivalent change in the stationary mRNA level. This means that for the less effective ribosome binding sites, the rate of mRNA decay exceeded or at least was no lower than the transcription initiation frequency. Otherwise, the significant mRNA level would be observed even in the case of RBS practically unprotected by ribosomes. We have set the rate of the reaction in Equation 8 to 0.3 s⫺1, which is equal to the transcription initiation frequency. Therefore, the rate of ribosome dissociation has to be set to 2.25 s⫺1 to reproduce the rate of protein synthesis measured by Kenell and Riezman. The elongation of the protein chain has been modeled as a single first order reaction: RibRBS 3 protein

(Eq. 9)

We have used the commonly accepted rate of ribosome movement of 15 amino acids/s. Therefore, the rate of the reaction in Equation 9 has been set to 15/L s⫺1, where L denotes the length of protein chain. For the 1024-amino acid-long product of the LacZ gene, the value of this parameter is 0.015. This treatment of the elongation reaction is simplified with respect to other Monte Carlo models of prokaryotic gene expression (1, 20). In these publications, the addition of a single amino acid by a single ribosome has been simulated as a separate reaction, and ribosome overlap has been prevented by excluded volume constraints. It will be shown in the following sections that despite neglecting fine details of ribosome movement and using the average rate of protein chain synthesis instead, the model is able to reproduce the experimental data concerning expression of the LacZ gene. Simplification of the elongation reaction made the simulation computationally fast, thus we could collect good statistics from long timescale trajectories for a large number of parameter combinations. Protein degradation has been simulated by the following reaction: Protein 3 decay

(Eq. 10)

Although protein degradation rates are variable, most of the native proteins have half-lives of the order of hours (21). In our simulations, we have applied the value of 3 h that has been measured for wild type ␤-galactosidase in E. coli cells (22). RNA Polymerase and Ribosome Pools—The numbers of available RNAP and ribosome molecules were generated from the Gaussian distribution. In the case of RNAP, the mean of the distribution was set to 35 molecules. This value is close to estimations made by other authors according to experimental data (14, 15, 17). Because the number of ribosome molecules present in the cell is approximately an order of magnitude larger than the number of RNAP molecules, the mean of the ribosome pool distribution has been set to 350. We are not able to estimate the S.D. of the above-mentioned pools. We have repeated all the experiments with the standard deviations equal to 10% and 50% of the means to study the influence of the fluctuations in RNAP and ribosome pools on the gene expression pattern. In the following sections, results will be presented for the standard

8167

deviations of the pools equal to 10% of the means, unless it is stated explicitly that standard deviations are set to 50% of the means. Simulation Protocol—To compute the time evolution of the system described above, we have used the Monte Carlo algorithm of Gillespie (12). This method has a sound physical basis and has already been proven useful in the simulation of prokaryotic gene expression (1, 2). For the convenience of the reader, a brief description of the Gillespie algorithm is given below. The rigorous derivation of this computational protocol can be found elsewhere (12, 13). Gillespie’s algorithm is formulated to numerically study systems composed of a large number of chemical species whose interactions are described by rate equations. At each step of simulation, two random numbers are generated. The first one is used to determine which reaction in the system will happen; the second one determines the waiting time for this event. The probability of a given reaction to be chosen is proportional to the product of its stochastic rate constant and numbers of substrate molecules. The distribution of the waiting times is given by the following equation: P共␶, ␮兲 ⫽ a␮exp(⫺a0␶) where a␮ ⫽ h␮c␮; a0 ⫽



a␮

(Eq. 11)

(Eq. 12)

P(␶, ␮) is the probability that the waiting time for the reaction is ␶, provided that reaction ␮ was chosen to happen; h␮ is the product of the numbers of substrate molecules in reaction ␮; and c␮ is the stochastic rate constant of this reaction. After the reaction and its waiting time are determined, one elementary reaction is executed (e.g. if the reaction is A ⫹ B3 AB, one A molecule and one B molecule will be removed from the system, and one AB molecule will be added). The time of the simulation is then increased by ␶, and the next simulation step is executed. The stochastic rate constant of the reaction is defined as the probability that the elementary reaction will happen in the infinitesimally short time interval. It can be calculated from the rate constant using simple relations. For the first order reactions, both constants are equal. For the second order reaction, the stochastic rate constant is equal to the rate constant divided by the volume of the system. It has been rigorously proven that for an infinitely large number of molecules, the Markov Chain generated by the Monte Carlo scheme given above converges to the same trajectory of the system as the deterministic integration of differential equations (12, 13). The advantage of the Gillespie algorithm over the differential equation approach is that it also remains exact for arbitrary low numbers of molecules in the system. This is extremely important in the simulation of biochemical systems, where some processes involve single molecules (e.g. one molecule of promoter in transcription initiation). Randomly fluctuating pools of available ribosomes and RNA polymerase have been modeled in the following way. Before executing every step of the Gillespie algorithm, the number of RNAP molecules and ribosomes has been generated from Gaussian distributions with the means and standard deviations defined in the previous section. Then, the values of a␮ (see Equation 12) for every reaction were calculated, and the step of the Gillespie algorithm has been executed. Generation of the number of substrate molecules from the appropriate probability distribution has already been applied by Arkin et al. (2) in the Gillespie algorithm simulation to model rapid equilibrium phenomena in regulatory protein binding. We have applied the Gillespie algorithm to the model described in the previous sections. At the beginning of the simulation, only a single promoter (P) was present in the system. The numbers of all other molecules, except random pools of RNAP and ribosomes, were set to 0. Simulation was pursued until it reached the generation time of E. coli, and then cell division was simulated, i.e. simulation was continued with the system containing one promoter molecule and all other numbers of molecules divided by 2. The generation time was set to 35 min. We have neglected the fact that during DNA replication, two copies of the bacterial gene can be expressed. Linear volume change of the growing cell has been assumed. Before every step of the Gillespie algorithm, stochastic rate constants of second order reactions have been recomputed by dividing the appropriate rate constants by the current volume of the cell. Simulation of several generations is important in the simulation of constitutive gene expression. As shown below, after starting our simulation with 0 protein molecules, the level of gene expression equilibrates after a few generations. We have found that 10 generations are suffi-

8168

Stochastic Effects in Prokaryotic Gene Expression

FIG. 1. Simulation of LacZ gene expression. A, number of protein molecules as a function of time. The results for 100 trajectories recorded over 10 bacterial generations are shown (generation time, 2100 s). After every generation, the number of protein and mRNA molecules was divided by 2. B, the number of protein molecules recorded in the first generation. In every 10-s time interval, the number of protein molecules was averaged over 100 trajectories, and the S.D. was computed. Data on the plot are mean (average trajectory) and ⫾ 3 ␴ values for each time interval. Linear function is fitted to all data points on the average trajectory recorded for times exceeding 500 s. C, average and ⫾ 3 ␴ trajectories for the number of mRNA molecules in the first generation. D, average and ⫾ 3 ␴ trajectories for the number of ribosomes moving on LacZ mRNA.

cient to equilibrate the system. For each simulation, we computed 100 independent trajectories of the system covering 10 generations each. We have used our own implementation of the Gillespie algorithm. The software has been tested against examples given in the original work (12). RESULTS

Validation of the Model—To check the validity of the model, we performed the simulation with the parameters describing the LacZ gene. Fig. 1A presents the gene expression pattern spanning 10 bacterial generations. Standard deviation of RNAP and ribosome pools has been set to 3.5 and 35, respectively (10% of the mean values). To compare these data with the results of Kennell and Riezman (11), we have analyzed the changes in the number of protein molecules in the first generation because the experimental data covered the time scale of about 30 min after induction. The transcription initiation frequency obtained in this simulation was equal to 0.258 s⫺1, close to that of 0.3 s⫺1 reported by Kennell and Riezman for the LacZ gene. As seen in Fig. 1B, after about 500 s, the number of protein molecules begins to grow linearly with time. The rate of protein synthesis, which was determined as the slope of this part of the curve, is 26 s⫺1, which is in good agreement with the value of 20 s⫺1 reported by Kennell and Riezman (11). The average number of mRNA molecules present in the cell was determined from the data shown in Fig. 1C. After about 1000 s, the number of mRNA molecules reached a stationary level. The mean number of molecules in this part of the curve is 56 ⫾ 4. This result is in good agreement with the value of 62 LacZ mRNA molecules/cell measured by Kennell and Riezman (11). Fig. 1D shows the number of ribosomes translating the LacZ mRNA. The average number of molecules present after 1000 s is 1838 ⫾ 73. Therefore, according to our simulation, 1 molecule of LacZ mRNA is occupied, on average, by 32 ribosomes. This yields a ribosome spacing of 96 nucleotides (the length of Z message is 3074 bp according to the GenBankTM entry). This number is in agreement with the experimental value of 110 nucleotides reported by Kennell and Riezman (11). As will be shown in the next sections, when the rate of ribosome dissociation is increased in our simulation, the model predicts that the mRNA level will decrease. This behavior is consistent with the experimental data of Yarchuk et al. (18). We have repeated the simulation above with the S.D. of

RNAP and ribosome pools set to 50% of the mean values. There are only minor differences with respect to the results reported above. The rate of protein synthesis is 24 s⫺1; the mRNA level is 54 ⫾ 4 molecules; the number of translating ribosomes is 1756 ⫾ 43; and there is a ribosome spacing of 95 nucleotides. We conclude that the results of our simulations are consistent with the experimental data concerning expression of the LacZ gene. In the following sections, we will show predictions of the model for the genes with frequencies of transcription and translation initiations different from those of the LacZ gene. Effect of Varying Transcription Initiation Frequency—Promoter efficiency can be decreased by two factors: a low binding constant of RNA polymerase within the promoter region and/or a low rate of closed complex isomerization. We have performed two sets of simulations in which transcription initiation frequency has been lowered in both ways. Low RNAP specificity was simulated by increasing the RNAP dissociation rate. We assume that RNAP binding is limited by three-dimensional diffusion in solution or by one-dimensional diffusion along DNA molecule, so the promoter sequence influences only the dissociation rate and not the binding rate. The simulations have been performed for the following RNAP dissociation rates: 10, 100, 1000, and 10000 s⫺1. This resulted in transcription initiation frequencies of 0.258, 0.052, 0.0054, and 0.0005 s⫺1. These values cover the entire range of transcription initiation frequencies (1 s⫺1 to 10⫺4 s⫺1) observed thus far in prokaryotic systems. All other parameters of the model and simulation protocol have been kept constant and set to the values used for the simulation of the LacZ gene. Fig. 2 shows the results of the calculations. The plots display the values recorded in the 10th generation (time range, 18,900 –21,000 s) when the expression pattern of the gene is equilibrated. In the case of strong promoters (Fig. 2, A and B), all 100 trajectories shown are similar. In both cases, the number of protein molecules grows linearly, with smaller differences between trajectories observed in the case of stronger promoter. The expression pattern in the case of the smallest frequency of transcription initiation analyzed (Fig. 2D) is qualitatively different. The protein is produced in pulses rather than evenly, as in the cases described above. In some trajectories, the number of protein molecules does not grow at all in the 10th generation, and the plots indicate only slow decay due to the protein degradation. In other cases, the gene is expressed

Stochastic Effects in Prokaryotic Gene Expression

8169

FIG. 2. Expression patterns for various transcription initiation frequencies. On each plot, 100 trajectories recorded in the 10th generation are plotted with red lines. Average and ⫾ 3 ␴ trajectories are plotted with black lines. In D, the ⫺3 ␴ trajectory is not shown because it would have negative numbers of molecules. A–D present results for transcription initiation frequencies of 0.258, 0.052, 0.0054, and 0.0005 s⫺1, respectively.

for some time, resulting in a sudden “burst” in the amount of protein. The time intervals between these events are random. The gene expression pattern for the transcription initiation frequency 0.0054 s⫺1 (Fig. 2C) is the intermediate case with respect to the two extremes already described. To express the above observations quantitatively, the following analysis was performed. For each 10-s time interval in the whole trajectory of the simulation (10 generations; time range, 0 –21,000 s), the mean and S.D. of the number of protein molecules were calculated. The variation coefficient for each time interval was then calculated as the ratio of the S.D. and the corresponding mean value. The values plotted as a function of time are presented on Fig. 3A. The plot shows that the relative fluctuations of gene expression are much larger in the 1st generation of the cells than in subsequent ones. After ⬃2.5 generations, the curves converge to the values 0.03, 0.08, 0.26, and 0.84 for transcription initiation frequencies 0.258, 0.052, 0.0054, and 0.0005 s⫺1, respectively. Therefore, in the case of weakly expressed genes, the fluctuations in the number of protein molecules converge to about 84% of the mean value, whereas for very strongly expressed genes, these fluctuations do not exceed a few percent. In the next set of simulations, promoter efficiency has been lowered by setting closed complex isomerization rates (the reaction in Equation 3) to 1, 0.1, 0.01, and 0.001 s⫺1. The resulting transcription initiation frequencies were 0.258, 0.035, 0.0035, and 0.0001 s⫺1. Gene expression patterns obtained in these simulations do not differ qualitatively from those presented in Fig. 2. Fig. 3B shows the plots of variation coefficient versus time. To examine the influence of the fluctuations in the pools of available RNAP and ribosomes, we repeated all of the above experiments with the standard deviations of these pools set to 50% of mean values. Results do not differ qualitatively from the previous ones. Fig. 3, C and D, shows the plots of variation coefficients versus time for these calculations. Effects of Varying Translation Initiation Frequency—We performed simulations in which translation initiation frequency was lowered by increasing ribosome dissociation. As in the case of RNA polymerase, we have assumed that ribosome binding is diffusion limited and that changes in ribosome binding site affect the dissociation rate rather than the association rate. Simulations have been executed for the following values of ribosome dissociation rate: 2.25, 22.5, 225, and 2250 s⫺1. The

protein levels obtained in these calculations covered the 200fold range observed by Yarchuk et al. (18) for RBS sites engineered to have various translation initiation rates. mRNA levels recorded in the simulations are shown on Fig. 4. The number of mRNA molecules is lowered by the increase of the ribosome dissociation rate. These results are in agreement with the experimental observations of Yarchuk et al. (18). Therefore, the results shown in Fig. 4 strongly support our treatment of the translation initiation kinetics. The gene expression patterns obtained in the simulations described above are shown in Fig. 5. The protein levels cover a range similar to that in the simulations with varying frequency of transcription initiation. The striking difference is that even for the lowest rate of protein synthesis, the protein molecules are still produced evenly, rather than in pulses. Variation coefficients calculated as described in the previous section are shown in Fig. 6A. As one can see, the shape of the curves is similar to those calculated for various transcription initiation frequencies, but the final levels are significantly lower. Fluctuations in the number of protein molecules are higher for the lower rates of ribosome binding, but they do not exceed 15%. We have repeated the above calculations with the standard deviations of the number of available RNAP and ribosome molecules set to 50% of the means. Results do not differ qualitatively. Variation coefficients of the number of protein molecules are plotted on Fig. 6B. The decrease of RBS effectiveness followed by the decrease of mRNA level, as has been observed by Yarchuk et al. (18), implies that the strength of the ribosome binding site is governed by ribosome binding/dissociation rather than by following the RBS clearance reaction (the reaction in Equation 7). If the ribosomes were stalled at RBS, they would protect mRNA from RNase E binding, and the low protein level would not be followed by the decrease in the mRNA level. Despite this fact, we have performed simulations in which frequency of translation initiation has been lowered by decreasing the rate of the reaction in Equation 7 (data not shown). Results do not differ significantly from the ones shown in Figs. 4 – 6. DISCUSSION

The kinetic model of gene expression presented above was tested against the experimental data concerning the LacZ gene in E. coli. The rate of protein synthesis, mRNA level, and ribosome spacing were in reasonable agreement with the ex-

8170

Stochastic Effects in Prokaryotic Gene Expression

FIG. 3. Variation coefficients of the number of protein molecules calculated for different transcription initiation frequencies. For every simulation, the average number of protein molecules and its S.D. in 100 trajectories were computed in each 10-s time interval. Data on plots are S.D./mean ratios for every time interval. Plots span the time scale of 10 generations. A, transcription initiation frequency (TIF) lowered by increasing RNAP dissociation. Curves I, II, III, and IV correspond to a TIF of 0.258, 0.052, 0.0054, and 0.0005 s⫺1, respectively. B, TIF lowered by increasing closed complex isomerization rate. Curves I, II, III, and IV correspond to a TIF of 0.258, 0.035, 0.0035, and 0.0001 s⫺1. C and D show results for the simulations in which the S.D. of RNAP and ribosome pools has been changed from 10% to 50% of the mean values. C, TIF changed by RNAP dissociation rate. Curves I, II, III, and IV correspond to TIF values of 0.259, 0.051, 0.0051, and 0.0003 s⫺1. D, TIF changed by closed complex isomerization rate. Curves I, II, III, and IV correspond to TIF values of 0.259, 0.034, 0.0034, and 0.0001 s⫺1. Numbers in parentheses represent the values to which the curves converge.

FIG. 4. Number of mRNA molecules calculated for different ribosome dissociation rates. A: I, ribosome dissociation rate of 2.25 s⫺1; II, ribosome dissociation rate of 22.5 s⫺1. B, ribosome dissociation rate of 2250 s⫺1. The results for a ribosome dissociation rate of 225 s⫺1 are not shown because the curve would overlap with the curve shown in B.

perimental observations. Moreover, the changes in mRNA level due to the variation in ribosome binding rate were correctly reproduced. The model, when applied to study the influence of transcription and translation initiation frequencies on the gene expression pattern, shows a significant, qualitative difference between these two cases. When the level of gene expression is lowered by decreasing promoter strength, the decrease in the number of protein molecules is accompanied by large fluctuations reaching 100% of the mean value. In contrast, low expression of the gene, without large fluctuations in the number of protein molecules, can be maintained by decreasing the efficiency of the ribosome binding site, i.e. lowering the rate of ribosome binding. For protein levels comparable to those corresponding to the smallest rate of transcription initiation frequency analyzed, the fluctuations do not exceed 15%. According to our simulations, the model is not sensitive to the magnitude of the fluctuations of the pools of available RNA polymerase and ribosomes. The simulations repeated with var-

iation coefficients of these pools equal to 10% and 50% resulted in good agreement with experimental data. The main conclusions of our work concerning the fluctuations in gene expression patterns are also unaffected by the variation coefficients of RNAP and ribosome pools. The influence of the variation coefficient of the pools is limited to a minor change in the exact values of standard deviations of the number of protein molecules. According to our model, the qualitative difference between the expression patterns of genes with low frequency of transcription initiation and low frequency of translation initiation is caused by the following mechanism. The transcription initiation event is always controlled by a single promoter molecule, whereas in the case of translation initiation, many mRNA molecules recruiting ribosomes or RNases are involved. Therefore, in this case, fluctuations are lowered by averaging over the number of ribosome binding sites taking part in the reaction.

Stochastic Effects in Prokaryotic Gene Expression

8171

FIG. 5. Expression patterns for various ribosome dissociation rates. On each plot, 100 trajectories recorded in the 10th generation are plotted with red lines. Average ⫾ 3 ␴ trajectories are plotted with black lines. A: I, ribosome dissociation rate of 2.25 s⫺1; II, ribosome dissociation rate of 22.5 s⫺1. B: I, ribosome dissociation rate of 225 s⫺1; II, ribosome dissociation rate of 2250 s⫺1.

FIG. 6. Variation coefficients of the number of protein molecules calculated for various ribosome dissociation rates. For every simulation, the average number of protein molecules and its S.D. in 100 trajectories were computed for each 10-s time interval. Data on the plot are S.D./mean ratios for every time interval. Plots span the time scale of 10 generations. Curves I, II, III, and IV show the results for ribosome dissociation rates of 2.25, 22.5, 225, and 2250 s⫺1. Numbers in parentheses denote the values to which the curves converge. A, results for the experiments in which the standard deviations of RNAP and ribosome pools were set to 10% of the mean value. B, results for the experiments in which the standard deviations of RNAP and ribosome pools were set to 50% of mean value.

Although variable frequencies of the translation initiation have been observed in bacterial cells, gene expression is more commonly regulated on the promoter level. This seems to contradict the results of our model, which suggest that large fluctuations in the number of protein molecules result from maintenance of low expression levels by low frequency of transcription initiation. These fluctuations are especially important in the context of the synthesis of regulatory proteins. Regulatory proteins are expressed in low amounts, and random changes in their concentration could affect regulated genes in an unpredictable way. This raises the question of why the low frequency of transcription initiation is the more frequent mechanism of maintaining low expression of these proteins. To answer this question, one needs to consider, at least semiquantitatively, the kinetics of protein-DNA binding. Kinetic studies of Lac repressor binding to DNA revealed that this protein finds its target operator site with a very high rate of about 7 ⫻ 109 ⫺1 ⫺1 M s (23). To appreciate the consequences of the high value of this second order rate constant, one can express it as the rate with which a single repressor molecule binds its operator site in the volume of the cell (10⫺15 L). The result, 12 s⫺1, is an order of magnitude higher than the largest transcription initiation rates observed. This means that binding of even a single repressor molecule to its operator site would be still faster than the rate of RNA polymerase recruitment and activation. Thus, should the fluctuations in the number of protein molecules be comparable to those presented in Fig. 2D, the number of repressor molecules present in most cells would be sufficient for rapid binding of the target gene, and the fluctuations would not influence the gene regulation process. The situation can be more complex if the gene is influenced by several regulatory proteins, as happens in the complex life cycles of bacteriophages. In this case, the random changes in relative amounts of

these proteins can trigger bifurcations in developmental pathways as shown by Arkin et al. (2). We conclude that the low level of proteins regulating bacterial operons can be maintained by low transcription initiation frequency because regulatory proteins bind to DNA sites with very high efficiency, exceeding the diffusion limit. In contrast, keeping the low expression level by inefficient ribosome binding is not optimal from the energetic point of view because mRNA molecules have to be unnecessarily synthesized. Therefore, expression of most regulatory proteins from weak promoters is an evolutionarily preferred strategy despite the large random fluctuations in the number of protein molecules. It is also worth commenting on the difference between constitutive and induced gene expression as implied by our model. The curves in Figs. 3 and 6 indicate that the relative magnitude of random variation is significantly higher in the bacterial generation in which the gene has been induced than in the subsequent generations. Therefore, according to the model, constitutively expressed genes show a lower level of fluctuations than induced genes. On the other hand, many induced genes that code for enzymes are expressed to very high levels, so that the fluctuations in the number of protein molecules quickly fall to very low values (see Fig. 3A). We have presented a kinetic model of gene expression that is able to reproduce a variety of experimental observations concerning the model prokaryotic gene, LacZ. We believe that the results obtained by the application of the model provide insights into the origins of the stochastic patterns of prokaryotic gene expression. These effects may be of interest in both efforts aimed at understanding the molecular details of gene regulation and biotechnology, where optimization of protein overexpression in bacteria is one of the primary goals.

8172

Stochastic Effects in Prokaryotic Gene Expression

Acknowledgment—We are grateful to Prof. A. L. Haenni for critical comments on the manuscript. REFERENCES 1. McAdams, H. H., and Arkin, A. (1997) Proc. Natl. Acad. Sci. U. S. A. 94, 814 – 819 2. Arkin, A., Ross, J., and McAdams, H. H. (1998) Genetics 149, 1633–1648 3. Newlands, S., Levitt, L. K., Robinson, C. S., Karpf, C. A. B., Hodgson, V. R. M., Wade, R. P., and Hardeman, E. C. (1998) Genes Dev. 12, 2748 –2758 4. Levin, M. D., Morton-Firth, C. J., Abouhamad, W. N., Bourret, R. B., and Bray, D. (1998) Biophys. J. 74, 175–181 5. Tyson, J. J., and Hannsgen, K. B. (1985) J. Theor. Biol. 113, 29 – 62 6. Maloney, P. C., and Rotman, B. (1973) J. Mol. Biol. 73, 77–91 7. Siegele, D. A., and Hu, J. C. (1997) Proc. Natl. Acad. Sci. U. S. A. 94, 8168 – 8172 8. Elowitz, M. B., and Leibler, S. (2000) Nature 403, 335–338 9. Alberts, B., Bray, D., Lewis, J., Raft, M., Roberts, K., and Watson J. D. (1994) Molecular Biology of the Cell 3rd Ed., Garland Publishing, London and New York

10. 11. 12. 13. 14. 15. 16. 17.

18. 19. 20. 21. 22. 23.

Carrier, T. A., and Keasling, J. D. (1999) J. Theor. Biol. 201, 25–36 Kennell, D., and Riezman, H. (1977) J. Mol. Biol. 114, 1–21 Gillespie, D. T. (1977) J. Phys. Chem. 81, 2340 –2361 Gillespie, D. T. (1992) Phys. A 188, 404 – 425 Bremer, H., and Dennis, P. P. (1996) Escherichia coli and Salmonella 2nd Ed., pp. 1553–1569, ASM Press, Washington, D. C. McClure, W. R. (1985) Annu. Rev. Biochem. 1985, 54, 171–204 Neidhardt, F. C. (1999) J. Bacteriol. 181, 7405–7408 Record, T. M., Reznikoff, W. S., Craig, M. L., McQuade, K. L., and Schlax, P. J. (1996) Escherichia coli and Salmonella 2nd Ed., pp. 792– 821, ASM Press, Washington, D. C. Yarchuk, O., Jacques, N., Guillerez, J., and Dreyfus, M. (1992) J. Mol. Biol. 226, 581–596 Draper, D. E. (1996) Escherichia coli and Salmonella 2nd Ed., pp. 902–908, ASM Press, Washington, D. C. Carrier, A. T., and Keasling, J. D. (1997) J. Theor. Biol. 189, 195–209 Goldberg, A. L., and Dice, J. F. (1974) Annu. Rev. Biochem. 43, 835– 869 Berquist, P. L., and Truman, P. (1978) Mol. Gen. Genet. 164, 105–108 Lin, S., and Riggs, A. D. (1975) Cell 4, 107–111