Optimization of fed-batch process for recombinant protein production in Escherichia coli using genetic algorithm
S. Geethalakshmi, S. Narendran, S. Ramalingam and N. Pappa
Abstract—The optimization of process conditions leading to higher yield of recombinant protein is an enduring bottleneck in the bioprocess industries. In this work an unstructured kinetic model for the fed-batch cultivation of Escherichia coli expressing recombinant streptokinase has been developed. An optimization procedure based on genetic algorithm approach was developed to determine the optimal substrate feed profile for maximizing the production of recombinant protein. Regardless of the complexity of recombinant protein production, the simple model developed could describe the process satisfactorily and the model based optimal feed trajectory resulted in higher volumetric productivity.
I. INTRODUCTION
Fed-batch cultivation is extensively employed to maximize the volumetric productivity of recombinant proteins. The method of nutrient feeding is critical to the success of high cell density culture, as it affects not only the maximum attainable cell concentration but also cell productivity [1], [2]. Exponential feeding of substrate is widely used so that the cells can be grown at a desired specific growth rate, preventing the formation of toxic byproducts such as acetate. However, in the post induction phase, shifts in gene expression and metabolic fluxes result in significant changes in macroscopic parameters such as specific growth rate, yield and maintenance coefficients. Therefore, it is difficult to maintain a constant specific growth rate using the same feed profile as in the pre-induction phase. Different feed strategies have been employed in the post induction phase for recombinant cultures [3]-[5]. However, identifying an appropriate feed strategy for a particular process relies on varying selected parameters through repeated rounds of trial and error. There is no rational basis for predicting the substrate feed that maximizes volumetric productivity. Determination of optimal
Manuscript received April 4, 2011. S. Geethalakshmi is with the Department of Instrumentation Engineering, Madras Institute of Technology, Anna University, Chennai 600044, India. (phone: +91 44 22516036; fax: +91 44 22232403; e-mail:
[email protected]). S. Narendran is with Centre for Biotechnology, A. C. College of Technology, Anna University, Chennai 600025, India. (e-mail:
[email protected]). S. Ramalingam is with Centre for Biotechnology, A. C. College of Technology, Anna University, Chennai 600025, India. (e-mail:
[email protected]). N. Pappa is with the Department of Instrumentation Engineering, Madras Institute of Technology, Anna University, Chennai 600044, India. (e-mail:
[email protected]).
substrate feed trajectories is highly challenging due to the highly non-linear and complex process dynamics. The performance of the process can be understood and optimized for maximum product formation if an accurate generic model for recombinant protein production is available [6], [7]. Such a process model can be used to optimize the substrate feed trajectories for maximizing the production of recombinant proteins in the post induction phase. Model-based optimization and control of bioprocesses has gained importance in recent years owing to its reliability and to developments in computing power [8], [9]. Genetic Algorithm (GA) has proven to be well suited to the optimization of highly non-linear problems and also to parameter estimation when the function includes complexities and/or discontinuities [10], [11].

In this work an unstructured kinetic model for recombinant streptokinase production in E. coli for the post induction phase has been proposed. The model parameters were estimated using GA. The optimal substrate feed trajectory has been computed for a fixed terminal time and validated by the fed-batch experiment. Streptokinase is a therapeutically important protein widely used as a thrombolytic agent for treatment of heart attack and stroke. Developing an efficient model and optimizing process conditions for maximizing production of such an off-patent drug is essential for its cost-efficient production in industry.

II. PROCESS DESCRIPTION
A. Bacterial strain
The bacterial strain Escherichia coli BL21 (DE3) was used in this work for production of recombinant streptokinase. The plasmid used in this study was pET containing an ampicillin resistance (Ampr) marker. E. coli cells were routinely grown and maintained in LB medium supplemented with 100 μg/ml of ampicillin. For long-term storage, 50% glycerol stock of the culture was stored at −80 °C. All the components were obtained from SISCO Research Laboratories Pvt. Ltd. (Mumbai, India) and Hi Media Laboratories Pvt. Ltd. (Mumbai, India).

B. Culture media
Medium used for batch fermentation contained the following components: glucose 13.2 g L-1, (NH4)2SO4 0.59 g L-1, K2HPO4 5.64 g L-1, KH2PO4 5.64 g L-1, 1 M MgSO4·7H2O 2 mL L-1, thiamine 0.09 g L-1, ampicillin 100 μg mL-1. The concentrated feed solution for fed-batch
978-1-61284-764-1/11/$26.00 ©2011 IEEE
fermentation contained the following components: glucose 66 g L-1, (NH4)2SO4 2.95 g L-1, K2HPO4 28.2 g L-1, KH2PO4 28.2 g L-1, 1 M MgSO4·7H2O 10 mL L-1, thiamine 0.45 g L-1. Antibiotic solutions were filter sterilized through a 0.2 μm filter (Sartorius Ltd., India).

C. Fed-batch Cultivation
The fed-batch cultivation of recombinant E. coli was carried out in a 2 L Bioengineering KLF 2000 stirred vessel fermentor with control modules for pH, temperature, dissolved oxygen (DO) and agitation. Temperature (37 °C) and pH (7.0) were monitored and controlled at their optimum values. Agitation (600-1200 rpm) was varied to maintain DO above 20% saturation throughout the run. Polypropylene glycol was added as antifoam to prevent excessive foaming. The fed-batch experiment was started as a batch culture of 1.0 L, and feeding of the concentrated medium was started in the late logarithmic growth phase of the batch culture. Exponential feeding was applied in the pre-induction phase according to the substrate balance equation (1) and changed to constant feeding at 0.045 L h-1 (CF45) after induction with 1 mM IPTG.
$$F = \frac{\mu X_{0} V_{0} e^{\mu t}}{y_{x/s} S_{F}} \tag{1}$$
where F is the substrate feed rate (L h−1), μ the specific growth rate (h−1), X0 the cell concentration at the time of starting the fed-batch (g L−1), V0 the initial (batch culture) volume (L), t the cultivation time after initiation of the fed-batch culture (h), SF the substrate concentration in the feed solution (g L−1) and yx/s the biomass yield coefficient with respect to substrate (g g−1). During the pre-induction phase, the specific growth rate was maintained at a constant level by increasing the feed rate exponentially.

D. Analytical procedure
The biomass concentration (X) was determined from the optical density (600 nm) using a dry cell weight correlation. The harvested cells were centrifuged at 7500 g for 10 min in a micro-centrifuge. The supernatant was used for residual glucose (S) estimation using a GOD-POD assay kit (Liquichem Glucose Kit). The pellet was resuspended in phosphate buffered saline, and the cells were disrupted by high-frequency ultrasonic waves using a sonicator (LABSONICS, B. Braun Biotech International Ltd.) and used for the streptokinase assay. The streptokinase concentration (P) was determined as given in [12]. The analytical procedure for measuring plasmid stability is similar to that used by Yazdani et al. [13].

III. PROCESS MODEL
A segregated, unstructured model has been developed for the analysis and design of post induction dynamics of the process. This model assumes that glucose is the limiting nutrient. The model considers the following dynamic mass balances for the concentrations of plasmid bearing cells (X+), plasmid free cells (X−), glucose (S), recombinant protein (P) and process volume (V) [14]. The segregation of the total biomass into plasmid bearing cells and plasmid free cells is shown in (2) and (3).
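$$\frac{dX^{+}}{dt} = \left(\mu^{+} - \theta\mu^{+} - \alpha\right)X^{+} - \frac{F}{V}X^{+} \tag{2}$$

$$\frac{dX^{-}}{dt} = \theta\mu^{+}X^{+} + \left(\mu^{-} - \beta\right)X^{-} - \frac{F}{V}X^{-} \tag{3}$$

The utilization of the substrate is assumed to be caused by the growth of plasmid bearing and plasmid free cells and by the maintenance requirement of the microbes, as shown in (4). The model describing the recombinant protein expression and the rate of change of volume is shown in (5) and (6).

$$\frac{dS}{dt} = -\frac{\mu^{+}X^{+}}{y_{s}^{+}} - \frac{\mu^{-}X^{-}}{y_{s}^{-}} - m_{s}\left(X^{+} + X^{-}\right) + \frac{F}{V}\left(S_{F} - S\right) \tag{4}$$

$$\frac{dP}{dt} = (1-\alpha)\,\mu^{+} y_{p} X^{+} - k_{p}P - \frac{F}{V}P \tag{5}$$

$$\frac{dV}{dt} = F \tag{6}$$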
The kinetic models for the specific growth rate of the plasmid bearing cells (µ+) and that of the plasmid free cells (µ−) are shown in (7) and (8). The growth rate model for the plasmid bearing cells is assumed to exhibit both substrate and product inhibition, whereas the growth rate model for the plasmid free cells is given by the simple Monod equation.

$$\mu^{+} = \frac{\mu_{m}^{+}\left(1 - \dfrac{P}{p_{m}}\right)S}{k_{s} + S + \dfrac{S^{2}}{k_{i}}} \tag{7}$$

$$\mu^{-} = \frac{\mu_{m}^{-}S}{k_{s} + S} \tag{8}$$
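The mass balances (2)-(6) together with the growth rate expressions (7) and (8) can be integrated numerically. The following is a minimal sketch rather than the authors' implementation: it uses a hand-written fourth-order Runge-Kutta step in place of the MATLAB solver mentioned in the paper, takes the parameter values from Table I and the feed glucose concentration from Section II, and assumes a purely hypothetical state at induction (`start`), since the paper does not list initial conditions. All function and variable names are illustrative.

```python
# Parameter values copied from Table I of the paper (fit to the CF45 run).
PARAMS = {
    "mu_m_plus": 0.4890, "mu_m_minus": 0.5002,   # h^-1
    "ys_plus": 0.4000, "ys_minus": 0.5488,       # g g^-1
    "ms": 0.0902, "ks": 0.0095,                  # h^-1, g L^-1
    "yp": 0.0564, "kp": 0.9456,                  # g g^-1, h^-1
    "pm": 0.3854, "ki": 17.559,                  # g L^-1
    "alpha": 0.3563, "beta": 0.0100, "theta": 0.5704,
}
S_FEED = 66.0  # g L^-1 glucose in the concentrated feed (Section II.B)


def growth_rates(S, P, k):
    """Specific growth rates of X+ and X- from (7) and (8)."""
    S = max(S, 0.0)  # numerical guard only, not part of the model
    mu_plus = (k["mu_m_plus"] * (1.0 - P / k["pm"]) * S
               / (k["ks"] + S + S * S / k["ki"]))
    mu_minus = k["mu_m_minus"] * S / (k["ks"] + S)
    return mu_plus, mu_minus


def derivatives(state, F, k=PARAMS, s_feed=S_FEED):
    """Right-hand sides of the mass balances (2)-(6)."""
    Xp, Xm, S, P, V = state
    mu_p, mu_m = growth_rates(S, P, k)
    D = F / V  # dilution rate, h^-1
    dXp = (mu_p - k["theta"] * mu_p - k["alpha"]) * Xp - D * Xp
    dXm = k["theta"] * mu_p * Xp + (mu_m - k["beta"]) * Xm - D * Xm
    dS = (-mu_p * Xp / k["ys_plus"] - mu_m * Xm / k["ys_minus"]
          - k["ms"] * (Xp + Xm) + D * (s_feed - S))
    dP = (1.0 - k["alpha"]) * mu_p * k["yp"] * Xp - k["kp"] * P - D * P
    return (dXp, dXm, dS, dP, F)


def rk4_step(state, F, h):
    """One classical fourth-order Runge-Kutta step of size h (hours)."""
    def shifted(slope, c):
        return tuple(s + c * ds for s, ds in zip(state, slope))
    k1 = derivatives(state, F)
    k2 = derivatives(shifted(k1, 0.5 * h), F)
    k3 = derivatives(shifted(k2, 0.5 * h), F)
    k4 = derivatives(shifted(k3, h), F)
    return tuple(s + h / 6.0 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def simulate(state0, feed_rate, t_end=4.5, h=0.001):
    """Integrate from induction to t_end with a constant feed rate (L/h)."""
    state = tuple(state0)
    for _ in range(round(t_end / h)):
        state = rk4_step(state, feed_rate, h)
    return state


if __name__ == "__main__":
    # Hypothetical state at induction: X+, X-, S, P (g/L) and V (L).
    start = (1.5, 0.1, 0.5, 0.02, 1.2)
    print(simulate(start, feed_rate=0.045))  # CF45 constant feed
```

The small step size is a deliberate choice: because ks is very small, the substrate balance becomes fast whenever glucose is nearly exhausted, and an explicit integrator needs a short step to remain stable in that regime.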
The variables X+, X−, S and P are the state variables of the process. µm+, µm−, ys+, ys−, ms, ks, yp, kp, pm, ki, α, β and θ are the process parameters, which have to be estimated to obtain a complete model that predicts the profiles of all state variables of the fed-batch process.

IV. PARAMETER ESTIMATION
Parameters of the unstructured model have been estimated using GA, which is an effective stochastic global search algorithm. Experimental data of the fed-batch cultivation have been interpolated for accurate parameter estimation. The parameters were estimated in MATLAB 7.7 by minimizing the objective function (9), which is the normalized squared error between the estimated and measured state variables.
The objective function is defined as

$$\text{Objective function} = \sum_{j=1}^{m}\sum_{i=1}^{n}\frac{\left(X_{eij} - X_{ij}\right)^{2}}{\left(X_{ij\max}\right)^{2}} \tag{9}$$

where n is the number of observations (interpolated data), m is the number of state variables, Xeij and Xij are the estimated and measured state variables respectively, and Xijmax is the maximum value of the state variable. The estimated parameters were used to compute the entire profile of the state variables by solving the non-linear differential equations (2)-(6) using a differential equation solver in MATLAB.
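The normalized squared error of (9) can be illustrated with a short function. This is a sketch under stated assumptions: each state variable is scaled by the maximum of its own measured values, which is how the definition of Xijmax is read here, and the dictionary layout and the numbers in the usage example are made up purely to exercise the function.

```python
def normalized_squared_error(estimated, measured):
    """Objective (9): sum over state variables j and observations i.

    Both arguments map a state-variable name to a list of n values taken
    at the same time points (an assumed layout, not the paper's).
    """
    total = 0.0
    for name, observed in measured.items():
        scale = max(observed)  # X_ij max for this state variable
        for x_est, x_obs in zip(estimated[name], observed):
            total += ((x_est - x_obs) / scale) ** 2
    return total


if __name__ == "__main__":
    # Made-up numbers, not data from the paper.
    measured = {"X": [1.0, 2.0], "S": [0.5, 0.25]}
    estimated = {"X": [1.0, 1.0], "S": [0.5, 0.5]}
    print(normalized_squared_error(estimated, measured))  # 0.5
```

Scaling each variable by its own maximum keeps a low-magnitude variable such as P from being swamped by larger ones such as S when the errors are summed.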
V. OPTIMAL SUBSTRATE FEED TRAJECTORY
The unstructured kinetic model has been used to compute the optimal substrate feed trajectory for maximizing the volumetric productivity of streptokinase. The feed trajectory F(t) was computed over a fixed terminal time to maximize the objective function (10),

$$\max_{F(t)} \; PI = P(t_{f})\,V(t_{f}) \tag{10}$$

subject to constraints on volume, substrate feed rate and post induction duration,

$$F_{\min} \le F(t) \le F_{\max}, \quad V(t) < V_{\max} \quad \text{and} \quad 0 \le t \le t_{f} \tag{11}$$
Fig. 1. Schematic of optimization of substrate feed rate.

The post induction phase has been divided into intervals of 0.5 h duration. The optimization algorithm based on GA has been developed to estimate the constant feed rates F1 to F9 which maximize the objective function, as shown in Fig. 1. The optimization algorithm determines the correct switching structure as well as the constant feed rate in the singular interval for the entire post induction phase. The optimal substrate feed rate was simulated using the developed model to generate the state variable profiles, which were compared with the experimental values.

VI. RESULTS AND DISCUSSIONS
This work is intended to optimize the fed-batch process for maximizing the recombinant streptokinase production in E. coli. The fed-batch cultivation was carried out in the laboratory scale bioreactor as described in Section II. A segregated, unstructured model was developed for the above fed-batch experiment and the parameters were estimated using GA as shown in Table I.

TABLE I
OPTIMAL ESTIMATED MODEL PARAMETERS FOR CF45
Parameter   Description                                              Unit     Optimal Value
µm+         Maximum specific growth rate of X+                       h-1      0.4890
µm−         Maximum specific growth rate of X−                       h-1      0.5002
ys+         Biomass yield coefficient of X+                          g g-1    0.4000
ys−         Biomass yield coefficient of X−                          g g-1    0.5488
ms          Maintenance coefficient                                  h-1      0.0902
ks          Substrate saturation constant                            g L-1    0.0095
yp          Yield coefficient for protein                            g g-1    0.0564
kp          Coefficient for proteolytic degradation of the protein   h-1      0.9456
pm          Product saturation constant                              g L-1    0.3854
ki          Substrate inhibition constant                            g L-1    17.559
α           Lysis rate of X+                                         h-1      0.3563
β           Lysis rate of X−                                         h-1      0.0100
θ           Plasmid loss rate                                        -        0.5704
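The search over nine piecewise-constant feed rates described in Section V can be sketched with a small generic genetic algorithm. This is an illustration only: the paper does not state its GA operators, population size, feed bounds, maximum volume or the state at induction, so the tournament selection, one-point crossover, Gaussian mutation, `F_MIN`, `F_MAX`, `V_MAX` and `STATE0` below are all hypothetical choices. The model equations (2)-(8) and the Table I values are taken from the paper, and a hand-written Runge-Kutta step stands in for the MATLAB solver.

```python
import random

# Parameter values reported in Table I (fit to the CF45 run).
K = dict(mu_mp=0.4890, mu_mm=0.5002, ysp=0.4000, ysm=0.5488, ms=0.0902,
         ks=0.0095, yp=0.0564, kp=0.9456, pm=0.3854, ki=17.559,
         alpha=0.3563, beta=0.0100, theta=0.5704)
S_FEED = 66.0                      # g/L glucose in the feed (Section II.B)
N_INTERVALS, INTERVAL_H = 9, 0.5   # F1..F9, each held for 0.5 h
# Hypothetical bounds and induction-time state (not given in the paper).
F_MIN, F_MAX, V_MAX = 0.02, 0.065, 2.0   # L/h, L/h, L
STATE0 = (1.5, 0.1, 0.5, 0.02, 1.2)      # X+, X-, S, P (g/L), V (L)


def deriv(y, F):
    """Right-hand sides of the model equations (2)-(8)."""
    xp, xm, s, p, v = y
    s = max(s, 0.0)  # numerical guard only
    mu_p = K["mu_mp"] * (1 - p / K["pm"]) * s / (K["ks"] + s + s * s / K["ki"])
    mu_m = K["mu_mm"] * s / (K["ks"] + s)
    d = F / v
    return (
        (mu_p - K["theta"] * mu_p - K["alpha"]) * xp - d * xp,
        K["theta"] * mu_p * xp + (mu_m - K["beta"]) * xm - d * xm,
        -mu_p * xp / K["ysp"] - mu_m * xm / K["ysm"]
        - K["ms"] * (xp + xm) + d * (S_FEED - s),
        (1 - K["alpha"]) * mu_p * K["yp"] * xp - K["kp"] * p - d * p,
        F,
    )


def rk4(y, F, h):
    """One classical fourth-order Runge-Kutta step."""
    def nudge(k, c):
        return tuple(a + c * b for a, b in zip(y, k))
    k1 = deriv(y, F)
    k2 = deriv(nudge(k1, h / 2), F)
    k3 = deriv(nudge(k2, h / 2), F)
    k4 = deriv(nudge(k3, h), F)
    return tuple(a + h / 6 * (b + 2 * c + 2 * d + e)
                 for a, b, c, d, e in zip(y, k1, k2, k3, k4))


def total_product(feeds, h=0.005):
    """Performance index (10), P(tf)*V(tf) in g, for piecewise-constant feeds."""
    y = STATE0
    for F in feeds:
        for _ in range(round(INTERVAL_H / h)):
            y = rk4(y, F, h)
        if y[4] >= V_MAX:  # volume constraint of (11) violated
            return 0.0
    return y[3] * y[4]


def optimise_feed(pop_size=8, generations=6, seed=0):
    rng = random.Random(seed)
    # Seed with the CF45 baseline so that elitism can never do worse than it.
    pop = [[0.045] * N_INTERVALS]
    pop += [[rng.uniform(F_MIN, F_MAX) for _ in range(N_INTERVALS)]
            for _ in range(pop_size - 1)]
    for _ in range(generations):
        ranked = sorted(pop, key=total_product, reverse=True)
        children = [ranked[0][:]]  # elitism: keep the best profile unchanged
        while len(children) < pop_size:
            # Tournament of two: the lower rank index is the fitter parent.
            pa = ranked[min(rng.randrange(pop_size), rng.randrange(pop_size))]
            pb = ranked[min(rng.randrange(pop_size), rng.randrange(pop_size))]
            cut = rng.randrange(1, N_INTERVALS)  # one-point crossover
            child = pa[:cut] + pb[cut:]
            child = [min(F_MAX, max(F_MIN, f + rng.gauss(0.0, 0.003)))
                     if rng.random() < 0.2 else f for f in child]
            children.append(child)
        pop = children
    best = max(pop, key=total_product)
    return best, total_product(best)


if __name__ == "__main__":
    feeds, pi = optimise_feed()
    print([round(f, 4) for f in feeds], pi)
```

Clipping every mutated gene to the feed bounds enforces the feed-rate constraint of (11) by construction, while the volume constraint is handled by assigning zero fitness to any profile that overfills the vessel. Because the initial conditions are assumed, the numbers this sketch prints are not comparable with the results reported in the paper.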
The unstructured model with the estimated parameters was simulated to generate the state variable profiles, which were compared with the experimental data of the fed-batch experiment as shown in Fig. 2. The simulated model describes the process satisfactorily and is in close agreement with the experimental values. To maximize the volumetric productivity of the recombinant protein, an optimal substrate feed trajectory was computed using GA as shown in Fig. 3. The total streptokinase produced during the post induction phase with the optimal substrate feed rate was compared with that of the fed-batch experiment (CF45) as shown in Fig. 4. The total streptokinase produced with the optimal feed rate is higher than that produced with the constant feed rate. The average volumetric productivity of streptokinase was calculated from the simulated profile obtained with the optimal substrate feed rate and from the experimental data; it was 8% higher for the optimal feed trajectory.
[Fig. 2 panels: X+ (g L-1), X− (g L-1), P (g L-1) and S (g L-1) plotted against time after induction (h).]
Fig. 2. Comparison of simulated model profile obtained using estimated parameters and the experimental data. The open circles (o) represent the experimental data, the dashed lines (--) represent the interpolated experimental data and the solid lines (-) represent the simulated model output.

[Fig. 3 axes: optimal substrate feed rate (L h-1) against time after induction (h).]
Fig. 3. Optimal substrate feed trajectory computed by GA based optimization algorithm.

[Fig. 4 axes: total streptokinase (mg) against time after induction (h); series: CF45 and optimal feed.]
Fig. 4. Comparison of total streptokinase produced by the constant feed of 0.045 L h-1 (CF45) and that by the optimal feed trajectory developed.

VII. CONCLUSIONS
A simple, unstructured kinetic model has been developed to describe the fed-batch process of E. coli for recombinant streptokinase production. An optimal substrate feed trajectory for maximizing the streptokinase production has been designed using the model-based optimization algorithm. The developed model simulated with the optimal substrate feed rate enhanced the recombinant streptokinase productivity. Further refinement of the model and the optimization algorithm can be carried out for scaling up of recombinant protein production in industries.

APPENDIX
List of symbols
F        Substrate feed rate (L h-1)
Fmax     Maximum substrate feed rate (L h-1)
Fmin     Minimum substrate feed rate (L h-1)
P        Recombinant streptokinase concentration (g L-1)
PI       Performance index (-)
S        Residual substrate concentration (g L-1)
SF       Substrate concentration in feed solution (g L-1)
t        Time (h)
tf       Final time (h)
V        Process volume (L)
Vmax     Maximum process volume (L)
V0       Initial batch culture volume (L)
X+       Plasmid bearing cells concentration (g L-1)
X−       Plasmid free cells concentration (g L-1)
Xeij     Estimated value of state variables (g L-1)
Xij      Measured value of state variables (g L-1)
Xijmax   Maximum value of measured state variables (g L-1)
ki       Substrate inhibition constant (g L-1)
kp       Coefficient for proteolytic degradation of the protein (h-1)
ks       Substrate saturation constant (g L-1)
ms       Maintenance coefficient (h-1)
pm       Product saturation constant (g L-1)
qs+      Glucose consumption rate function of plasmid bearing cells (h-1)
qs−      Glucose consumption rate function of plasmid free cells (h-1)
yp       Yield coefficient for protein (g g-1)
ys+      Biomass yield coefficient of X+ (g g-1)
ys−      Biomass yield coefficient of X− (g g-1)
yx/s     Biomass yield coefficient (g g-1)
α        Lysis rate of X+ (h-1)
β        Lysis rate of X− (h-1)
θ        Plasmid loss rate (-)
µ+       Specific growth rate of X+ (h-1)
µ−       Specific growth rate of X− (h-1)
µm+      Maximum specific growth rate of X+ (h-1)
µm−      Maximum specific growth rate of X− (h-1)
µp       Protein production rate function (h-1)

REFERENCES
[1] J. H. Choi, K. C. Keum, S. Y. Lee, "Production of recombinant proteins by high cell density culture of Escherichia coli," Chem. Eng. Sci., vol. 61, no. 3, pp. 876-885, 2006.
[2] J. Shiloach, R. Fass, "Growing E. coli to high cell density - A historical perspective on method development," Biotechnol. Adv., vol. 23, pp. 345-357, 2005.
[3] H. H. Wong, Y. C. Kim, S. Y. Lee, H. N. Chang, "Effect of post-induction nutrient feeding strategies on the production of bioadhesive protein in Escherichia coli," Biotechnol. Bioeng., vol. 60, pp. 271-276, 1998.
[4] B. S. Kim, S. C. Lee, S. Y. Lee, Y. K. Chang, H. N. Chang, "High cell density fed-batch cultivation of Escherichia coli using exponential feeding combined with pH-stat," Bioprocess Biosyst. Eng., vol. 26, pp. 147-150, 2004.
[5] D. J. Seo, B. H. Chung, Y. B. Hwang, Y. H. Park, "Glucose-limited fed-batch culture of Escherichia coli for production of recombinant human interleukin-2 with the DO-stat method," J. Ferment. Bioeng., vol. 74, pp. 196-198, 1992.
[6] H. R. Baheri, W. J. Rosler, G. A. Hill, "Modeling of recombinant bacteria fermentation for enhanced productivity," Biotechnol. Tech., vol. 11, pp. 47-50, 2005.
[7] M. Nadri, I. Trezzani, H. Hammouri, P. Dhurjati, R. Longi, J. Lieto, "Modeling and observer design for recombinant Escherichia coli strain," Bioprocess Biosyst. Eng., vol. 28, pp. 217-225, 2006.
[8] C. Sommer, N. Volk, M. Pietzsch, "Model based optimization of the fed-batch production of a highly active transglutaminase variant in Escherichia coli," Protein Expression Purif., vol. 77, pp. 9-19, 2011.
[9] L. Yijian, F. Yanjun, "Optimization Design of PID Controller Parameters Based on Improved E. Coli Foraging optimization Algorithm," in Proc. IEEE Int. Conf. Automation and Logistics, Qingdao, China, September 2008, pp. 227-231.
[10] D. Sarkar, J. M. Modak, "Optimization and control of fed-batch bioreactors using genetic algorithm: multiple control variables," Comp. Chem. Eng., vol. 28, pp. 789-798, 2004.
[11] V. K. Garlapati, P. R. Vundavilli, R. Banerjee, "Evaluation of lipase production by genetic algorithm and particle swarm optimization and their comparative study," Appl. Biochem. Biotechnol., vol. 162, no. 5, pp. 1350-1361, 2010.
[12] S. Ramalingam, P. Gautam, K. J. Mukherjee, G. Jayaraman, "Effects of post-induction feed strategies on secretory production of recombinant streptokinase in Escherichia coli," Biochem. Eng., vol. 33, pp. 34-41, 2007.
[13] S. S. Yazdani, K. J. Mukherjee, "Continuous-culture studies on the stability and expression of recombinant streptokinase in Escherichia coli," Bioprocess Biosyst. Eng., vol. 24, pp. 341-346, 2002.
[14] Z.-Y. Zheng, S. J. Yao, D. Q. Lin, "Using a kinetic model that considers cell segregation to optimize hEFG expression in fed-batch cultures of recombinant Escherichia coli," Bioprocess Biosyst. Eng., vol. 27, pp. 143-152, 2005.