Published in IET Systems Biology. Received on 28th July 2009. Revised on 17th December 2009 doi: 10.1049/iet-syb.2009.0045. ISSN 1751-8849.

Multi-objective mixed integer strategy for the optimisation of biological networks

J.O.H. Sendín¹, O. Exler², J.R. Banga¹

¹ Process Engineering Group, IIM-CSIC (Spanish National Research Council), C/Eduardo Cabello, 6, 36208 Vigo, Spain
² Department of Computer Science, University of Bayreuth, 95440 Bayreuth, Germany
E-mail: [email protected]

Abstract: In this contribution, the authors consider multi-criteria optimisation problems arising from the field of systems biology when both continuous and integer decision variables are involved. Mathematically, they are formulated as mixed-integer non-linear programming problems. The authors present a novel solution strategy based on a global optimisation approach for dealing with this class of problems. Its usefulness and capabilities are illustrated with two metabolic engineering case studies. For these problems, the authors show how the set of optimal solutions (the so-called Pareto front) is successfully and efficiently obtained, providing further insight into the systems under consideration regarding their optimal manipulation.

1 Introduction

Mathematical optimisation theory can explain current adaptations of biological systems, and can also be used to predict new designs that may yet evolve [1, 2]. The use of optimisation in fields such as computational biology, bioinformatics and systems biology has recently been reviewed [3–5]. Model-based optimisation can be used as a systematic framework for the manipulation and re-design of biological systems. For example, it can play a key role in metabolic engineering, ensuring the optimal manipulation of the genetic and metabolic composition of organisms in order to achieve a certain goal (e.g. the development of strains with enhanced production of metabolites of biotechnological interest). The traditional approach consists of a series of random mutations and subsequent selection of improved organisms, but current techniques allow targeted modification of the genetic information of the microorganism, making it possible to change specifically either the protein content or the enzymatic properties so that, for instance, the production of a metabolite of industrial and/or medical interest can actually be enhanced [6, 7].

During the last decades, several authors have considered the constrained single-objective programming problem, that is, the optimisation of a single performance index subject to a number of constraints. We assume that a mathematical model of the biochemical system is available in the form of differential algebraic equations (DAEs), and that additional requirements for the system can be formulated as sets of equality and/or inequality constraints. Owing to the intrinsic non-linearity of the kinetics involved in the metabolic network (traditionally represented by means of Michaelis–Menten rate laws), many researchers have made use of alternative representations (such as the power-law formalism), which yield a linear system at steady state after a logarithmic transformation. Thus, linear programming methods can be applied very efficiently to determine the optimal steady state of the pathway [8–11].

When dealing with metabolic transformations by cellular systems, many criteria must be satisfied. From a practical and industrial perspective, the most interesting objective is the maximisation of metabolic reaction rates and steady-state fluxes [12]. In the context of evolutionary optimisation, other criteria have been considered, such as the minimisation of metabolite concentrations or of transient times [13]. In all cases, a number of constraints must be taken into account. In general, metabolic variables (enzyme and metabolite concentrations, fluxes and so on) should remain within a certain physiological range in order to avoid undesirable effects and maintain cell viability. It has been argued that the rest of metabolism should remain unperturbed, without changes in intermediate concentrations [14].

A more realistic approach is to consider the simultaneous optimisation of several objectives, that is, one must find the best alternatives for several (often conflicting) performance criteria of the biochemical pathway. When an optimisation problem involves more than one objective function, the task of finding one or more optimal solutions is known as multi-objective (or multi-criteria) optimisation. In general, it is not possible to obtain a feasible solution that is simultaneously optimal for all the objectives; instead, a family of points known as the Pareto-optimal set must be found. All solutions in this set are optimal in the sense that it is not possible to improve one of the objectives without degrading one or more of the others. In this regard, simultaneous minimisation of intermediate concentrations has been suggested as an optimality principle for metabolic networks [15].

There exist many applications in science and engineering involving the optimisation of multiple criteria. In comparison, very little has been done up to now regarding biochemical networks. Multi-objective optimisation in systems biology has recently been reviewed in, e.g., [16]. Despite the significant benefits of multi-objective optimisation, only a few metabolic engineering applications can be found in the literature [17]. In this work, we consider multi-objective optimisation problems arising from the domain of metabolic pathway engineering. In this regard, an important question that is often formulated in the metabolic engineering field is: 'what is the minimum set of enzymes in a metabolic system that should be modified, and by how much, in order to obtain a viable strain producing the maximum flux and/or yield of a desired final product?' This question is usually answered by solving a constrained optimisation problem or a sequence of them [18, 19], which is in fact a common transformation of an inherently multi-objective optimisation problem. Furthermore, because of the existence of discrete decision variables (e.g. to determine which enzymes should be modified), the associated optimisation problems are formulated as mixed-integer non-linear programming (MINLP) problems.

Mixed-integer programming is not a novel concept in either systems biology or metabolic engineering. Most optimisation studies, however, make use of stoichiometric models or kinetic laws which, at steady state, permit the formulation of the resulting optimisation problem as a mixed-integer linear programming (MILP) one. Recent examples of applications with binary variables (including both MILP and MINLP problems) include the analysis and redesign of metabolic reaction networks [18] and their regulatory structure [8, 20], and the search for optimal intervention strategies (with identification of gene knockouts) for designing strains with enhanced capabilities [21–24].

Computing the Pareto-optimal set can be a very challenging task because of the highly constrained and non-linear nature of biological systems. In this regard, it is important to keep in mind that most existing implementations ultimately rely on local optimisation routines, whereas suitable global optimisation (GO) methods are needed [25]. We present a GO solver for multi-criteria MINLPs based on the combination of a novel multi-objective optimisation approach with a recent hybrid algorithm for solving MINLP problems. The work focuses on generating a Pareto-optimal set that captures the complete trade-off among the objectives. The usefulness of multi-objective optimisation and the performance of our method are illustrated with case studies taken from the literature.

2 Theoretical background

2.1 Basic concepts

Mathematical optimisation aims at finding the best solution among the set of all possible alternatives according to a certain performance index or objective function. However, when dealing with two or more objectives or criteria that are in conflict with each other, the solution to the optimisation problem will not be unique; instead, there will be a family of solutions known as the Pareto-optimal set, and no point in this set can be said to be better than another. In mathematical terms, the general mixed-integer non-linear multi-objective optimisation problem (MOP) is defined as finding the vector of $n_r$ continuous variables $x$ and the vector of $n_i$ integer variables $y$ that minimise a vector $J$ of $m$ objective functions

$$\min_{x,y} J = \big(J_1(\dot z, z, x, y, p),\ J_2(\dot z, z, x, y, p),\ \dots,\ J_m(\dot z, z, x, y, p)\big) \tag{1}$$

subject to

• System dynamics in the form of DAEs, with state variables $z$ and additional parameters $p$

$$f(\dot z, z, p, x, y) = 0, \qquad z(t_0) = z_0 \tag{2}$$

• Additional requirements in the form of equality and/or inequality constraints

$$h(z, p, x, y) = 0 \tag{3}$$

$$g(z, p, x, y) \le 0 \tag{4}$$

• Upper and lower bounds (superscripts U and L, respectively) on the decision variables

$$x^L \le x \le x^U \tag{5}$$

$$y^L \le y \le y^U \tag{6}$$

This set of constraints defines the feasible space $S$, while the feasible objective space $\mathcal{O}$ is the set $\{J(x, y) \mid (x, y) \in S\}$. For the sake of clarity, hereafter we will denote by $x$ the vector of decision variables including both continuous and integer variables. As mentioned before, the components of the objective vector $J$ are very often in conflict with each other. In this case, the solution minimising one of the criteria does not minimise the others, that is, it is not possible to find a unique solution that is simultaneously optimal for all the objectives (clearly, if such a point existed, it would provide a solution to the MOP). Unlike single-objective optimisation, an MOP will in general have multiple points that are optimal in the sense that an improvement in one objective function can only be achieved by worsening one or more of the others.

For two or more objectives, the concept of domination is introduced to determine which solutions are better than others. Given two points $x_1$ and $x_2$, the vector $J(x_1)$ is said to dominate $J(x_2)$ if $J_i(x_1) \le J_i(x_2)$ for all $i = 1, \dots, m$, with at least one strict inequality. A feasible solution $x^*$ is said to be locally Pareto-optimal if there is no other feasible solution $x$ in the neighbourhood of $x^*$ such that $J(x)$ dominates $J(x^*)$. A feasible solution $x^*$ is said to be globally Pareto-optimal (or efficient) if there is no other feasible solution $x$ in the entire feasible space such that $J(x)$ dominates $J(x^*)$. A related concept is that of weak Pareto optimality: a solution $x^*$ is weakly Pareto-optimal if there is no other solution $x$ such that $J_i(x) < J_i(x^*)$ for all $i = 1, \dots, m$. The set of all Pareto-optimal solutions (more precisely, its image in objective space) is usually referred to as the Pareto front. In the absence of any further information about the problem, all Pareto-optimal solutions (potentially infinitely many for continuous problems) are mathematically equivalent.

Thus, multi-objective optimisation implies a decision-making process involving a large number of optimal alternatives. From a practical point of view, the user is only interested in one final solution. The decision maker (DM) is responsible for selecting this solution, assigning preferences to the objectives on the basis of additional information that is quite often subjective, unknown a priori and/or difficult to express in mathematical terms. When solving an MOP, two other points of interest are the utopia vector and the nadir (or pseudo-nadir) vector:

• The utopia vector $J^*$ is the vector of objective functions containing the individual global minima of the objectives.

• The nadir vector $J^{Nadir}$ is the vector of objective functions containing the upper bounds of the objectives in the Pareto-optimal set. The $i$th component of the nadir vector can be estimated as

$$J_i^{Nadir} = \max_{j=1,\dots,m} \{J_i(x_j^*)\}$$

where $x_j^*$, $j = 1, \dots, m$, is the global minimiser of the $j$th objective.
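The dominance test and the pay-off-table estimates of the utopia and nadir vectors can be sketched in a few lines of Python. This is a minimal illustration, assuming that all objectives are minimised (a maximised objective such as a selectivity would be negated first) and that the global minimisers $x_j^*$ are already known; the function names are hypothetical and are not part of the authors' software.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dominates(a: Vector, b: Vector) -> bool:
    """True if a dominates b under minimisation: a_i <= b_i for every i,
    with at least one strict inequality."""
    return all(ai <= bi for ai, bi in zip(a, b)) and any(ai < bi for ai, bi in zip(a, b))


def pareto_filter(points: List[Vector]) -> List[Vector]:
    """Keep the points that no other point dominates (a simple 'Pareto filter')."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def utopia_and_nadir(payoff: List[Vector]) -> Tuple[List[float], List[float]]:
    """payoff[j] is the objective vector J(x_j*) at the global minimiser of J_j.

    Utopia:       J_i*      = J_i(x_i*)
    Pseudo-nadir: J_i^Nadir = max_j J_i(x_j*)
    """
    m = len(payoff)
    utopia = [payoff[i][i] for i in range(m)]
    nadir = [max(payoff[j][i] for j in range(m)) for i in range(m)]
    return utopia, nadir


points = [(0.0, 1.0), (0.2, 0.5), (0.3, 0.6), (0.5, 0.2), (1.0, 0.0)]
print(pareto_filter(points))  # drops (0.3, 0.6), which (0.2, 0.5) dominates
print(utopia_and_nadir([(0.0, 1.0), (1.0, 0.0)]))  # ([0.0, 0.0], [1.0, 1.0])
```

Note that a point never dominates itself, because the strict-inequality condition fails, so the filter can safely compare every point against the whole list.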

2.2 Multi-objective optimisation methods

A wide range of approaches has been proposed in the last decades to solve MOPs. Reviews of these methods can be found in the books by Miettinen [26] and Deb [27], and in the references cited therein. However, most mathematical programming techniques have been developed for continuous problems and do not consider integer variables. Traditional approaches deal with MOPs by means of scalarisation techniques, which transform the original MOP into a single-objective (mixed-integer) non-linear programming (NLP/MINLP) problem using some characteristic parameter, for example a vector of weights. Such a parameter can either represent the relative importance of the objectives or be a mere mathematical device that is varied systematically to (hopefully) obtain different solutions. Scalarisation techniques require repeatedly solving single-objective NLPs/MINLPs, but it is very often not obvious how to change the method parameters in order to obtain a satisfactory solution or a good distribution of points.

A different approach to solving MOPs is based on the use of evolutionary algorithms [27]. These methods mimic the mechanisms of natural selection and genetics by using a family of possible solutions (the so-called population) in each iteration. Thus, it is possible to obtain multiple Pareto-optimal solutions in a single optimisation run. This fact, together with their ability to deal with non-convex Pareto fronts, makes them attractive for solving highly non-linear MOPs. As a drawback, these algorithms usually require a large population size, which translates into a large number of non-dominated solutions. In addition to the associated increase in computational effort, such a large set can be very difficult for the DM to handle, especially as the number of objectives increases. In the following paragraphs, we briefly describe some of the most popular scalarisation techniques, with special emphasis on those applied in this work.

2.2.1 Weighted sum approach: The simplest and most widely used method for dealing with an MOP is to minimise a composite function defined as the weighted sum of the objectives. The weights represent the relative importance of each objective and must be varied in order to generate different solutions. However, points in non-convex regions of the Pareto front cannot be found with this approach.
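The limitation of the weighted sum can be seen on a toy discrete problem. In the sketch below, which uses a hypothetical three-point objective space rather than a biological model, the point (0.6, 0.6) is Pareto-optimal but lies above the segment joining the two anchor points, so no weight vector ever selects it. Each scalarised problem is solved by plain enumeration, standing in for an NLP/MINLP solver.

```python
from typing import Sequence, Set, Tuple

Point = Tuple[float, float]


def weighted_sum_pick(points: Sequence[Point], w: Sequence[float]) -> Point:
    """Return the candidate minimising sum_i w_i * J_i (ties go to the first listed)."""
    return min(points, key=lambda p: sum(wi * pi for wi, pi in zip(w, p)))


def sweep_weighted_sum(points: Sequence[Point], n_weights: int = 11) -> Set[Point]:
    """Collect the points selected over equally spaced weight vectors (w1, 1 - w1)."""
    found: Set[Point] = set()
    for k in range(n_weights):
        w1 = k / (n_weights - 1)
        found.add(weighted_sum_pick(points, (w1, 1.0 - w1)))
    return found


# Hypothetical objective space: (0.6, 0.6) is Pareto-optimal but lies in a
# non-convex region, above the segment joining (0, 1) and (1, 0).
candidates = [(0.0, 1.0), (0.6, 0.6), (1.0, 0.0)]
print(sorted(sweep_weighted_sum(candidates)))  # [(0.0, 1.0), (1.0, 0.0)]
```

For any weights, the two anchors score $w_2$ and $w_1$, and the smaller of these never exceeds 0.5, while the middle point always scores 0.6; a linear scalarisation can only reach points on the convex hull of the objective space.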


2.2.2 ε-Constraint: In this traditional technique, the original MOP is reduced to a single NLP/MINLP by minimising one of the objectives and converting the remaining criteria into inequality constraints. Different solutions can be obtained by changing the upper bounds (the ε-constraints) on the objectives that are not minimised. This method has been widely used because it can generate points in non-convex parts of the Pareto front.

2.2.3 Weighted Tchebycheff method: This formulation, an extension of the classical min–max approach, can be stated as follows

$$\min_x U = \max_{i=1,\dots,m} \{ w_i \, |J_i(x) - J_i^0| \} \tag{7}$$

where $J^0$ is a reference point or goal predefined by the DM. If the utopia vector is taken as the reference point, the complete Pareto front can in principle be obtained by minimising U for different weights. Moreover, this formulation guarantees that the solution is at least weakly Pareto optimal [28]. The difficulty of this approach, as with the previous ones, lies in determining the method parameters that ensure a good representation of the optimal front while solving as few NLPs/MINLPs as possible.
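On a hypothetical three-point objective space in which (0.6, 0.6) is Pareto-optimal but lies in a non-convex region, both the ε-constraint method and the weighted Tchebycheff method (with the utopia vector as reference point) can reach that point. The sketch below solves each scalarised problem by plain enumeration over a finite candidate set, which stands in for the NLP/MINLP solver; it is an illustration, not an efficient implementation, and the function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def tchebycheff_pick(points: Sequence[Point], w: Sequence[float], ref: Sequence[float]) -> Point:
    """Eq. (7) by enumeration: minimise U = max_i w_i * |J_i - J_i^0|."""
    return min(points, key=lambda p: max(wi * abs(pi - ri) for wi, pi, ri in zip(w, p, ref)))


def eps_constraint_pick(points: Sequence[Point], eps: float) -> Optional[Point]:
    """Minimise J_1 subject to J_2 <= eps; returns None if no candidate is feasible."""
    feasible = [p for p in points if p[1] <= eps]
    return min(feasible, key=lambda p: p[0]) if feasible else None


candidates = [(0.0, 1.0), (0.6, 0.6), (1.0, 0.0)]
utopia = (0.0, 0.0)
print(tchebycheff_pick(candidates, (0.5, 0.5), utopia))  # (0.6, 0.6)
print(eps_constraint_pick(candidates, 0.7))  # (0.6, 0.6)
```

With equal weights, the Tchebycheff values are 0.5, 0.3 and 0.5, so the middle point wins. The ε-constraint version returns `None` when the bound on $J_2$ excludes every candidate, which mirrors an infeasible subproblem.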

2.2.4 Normal boundary intersection (NBI): The original NBI mathematical programming technique, developed by Das and Dennis [29], essentially works by solving a set of NLPs of the form

$$\max_{x,\lambda} U = \lambda \tag{8}$$

subject to

$$\Phi w + \lambda \hat n = J(x) - J^* \tag{9}$$

$\Phi$ is an $m \times m$ pay-off matrix whose $i$th column is $J(x_i^*) - J^*$ (i.e. the objective functions are shifted to the origin); $w$ is a vector of weights such that $\sum_{i=1}^{m} w_i = 1$ and $w_i \ge 0$; and $\hat n = -\Phi e$, where $e$ is a vector of ones, is the quasi-normal vector. The product $\Phi w$ defines a point in the so-called convex hull of individual minima (CHIM). The intersection between the normal to the CHIM starting from this point and the boundary of the objective space closest to the origin is expected to be Pareto optimal (Fig. 1a). The above NLP (called the NBI subproblem) is solved for various weight vectors $w$ in such a way that an equally distributed set of them produces an even spread of solutions on the boundary of the objective space. It should be noted that the equality constraints introduced by NBI ensure that the solution actually lies on the normal to the CHIM, but when dealing with integer variables such a feasible solution may not exist. Our strategy, presented in the next section, is intended to overcome this drawback.

Figure 1 Schematic of NBI and NBIWT methods
a How NBI works
b Reference point for the hybrid NBIWT method

3 Methodology

3.1 Hybrid method: NBI-based weighted Tchebycheff approach (NBIWT)

In the following paragraphs, we present a novel solution strategy, NBIWT, which combines the NBI and weighted Tchebycheff methods and is coupled with a mixed-integer global optimisation (MIGO) solver for the associated MINLPs, in order to overcome their respective drawbacks, more specifically those concerning the NBI programming technique:

1. Generation of locally optimal solutions for problems with non-convex Pareto fronts, which makes the use of suitable GO solvers necessary.
2. The NBI method can also yield non-Pareto-optimal points if the normal to the CHIM intersects the boundary of the objective space in a non-convex region. Although non-optimal solutions can be eliminated after computing the Pareto front by means of, e.g., a 'Pareto filter', solving an NLP/MINLP whose solution does not belong to the Pareto-optimal set is clearly a waste of computation.

3. The NBI method introduces an additional variable and additional equality constraints (as many as objective functions), which can increase the computational cost of computing the Pareto front. If integer variables are present and the objective space is discrete, there may not exist a solution satisfying these constraints.

An interesting recent attempt to improve NBI can be found in, e.g., [30]. It provides a way to reduce the number of NLPs/MINLPs to be solved, which could also be implemented in the technique presented here. Our strategy makes use of the NBI programming technique to generate a series of reference points on the hypercube defined by the global minimisers of the objective functions. These points are used to solve a weighted min–max problem, using weights determined from the quasi-normal direction to the CHIM (Fig. 1b). Thus, for a given weight vector $w_k$, which defines the point $P_k = \Phi w_k$ on the CHIM, the $k$th NLP/MINLP to solve is formulated as

$$\min_x U = \max_{i=1,\dots,m} \{ \bar w_{k,i} \, (J_i(x) - J^0_{k,i}) \} \tag{10}$$

where, omitting the mathematical derivation,

$$J^0_k = J^* + (\Phi w_k + \bar\lambda \hat n) \tag{11}$$

$$\bar w_{k,i} = \frac{1/\hat n_i}{\sum_{j=1}^{m} 1/\hat n_j} \tag{12}$$

and $\bar\lambda$ is determined by solving a linear programming problem defined as finding $\bar\lambda$ and the 'dummy' vector $J^d$ to

$$\max_{\bar\lambda,\, J^d} U' = \bar\lambda \quad \text{s.t.} \quad \Phi w_k + \bar\lambda \hat n = J^d - J^*, \qquad J^* \le J^d \le J^{Nadir} \tag{13}$$
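A minimal sketch of how the quantities in eqs. (11)–(13) might be computed for one weight vector, followed by the min–max subproblem (10). It rests on several assumptions: the quasi-normal is taken as $\hat n = -\Phi e$ (pointing towards the utopia point, as in the original NBI); the nadir vector is estimated from the pay-off table; and because $J^d$ is fully determined by $\bar\lambda$ through the equality in (13), the LP reduces to a one-dimensional ratio test on the box $0 \le \Phi w_k + \bar\lambda \hat n \le J^{Nadir} - J^*$. Subproblem (10) is solved here by enumeration over a small hypothetical candidate set, standing in for a MIGO solver; the function names are illustrative only.

```python
from typing import List, Sequence, Tuple


def nbiwt_subproblem_data(anchors: Sequence[Sequence[float]], w: Sequence[float]
                          ) -> Tuple[float, List[float], List[float]]:
    """Return (lambda_bar, J0_k, w_bar_k) for one weight vector w_k, eqs. (11)-(13).

    anchors[j] is J(x_j*), the objective vector at the global minimiser of J_j.
    Assumes a genuine trade-off, so every component of n = -Phi e is negative.
    """
    m = len(w)
    utopia = [anchors[i][i] for i in range(m)]
    nadir = [max(anchors[j][i] for j in range(m)) for i in range(m)]
    # Pay-off matrix: column j holds J(x_j*) - J*.
    phi = [[anchors[j][i] - utopia[i] for j in range(m)] for i in range(m)]
    pw = [sum(phi[i][j] * w[j] for j in range(m)) for i in range(m)]  # point on the CHIM
    n = [-sum(phi[i]) for i in range(m)]  # quasi-normal n = -Phi e
    width = [nadir[i] - utopia[i] for i in range(m)]
    # LP (13): lambda = 0 is feasible, so the optimum is the tightest upper bound.
    lam = float("inf")
    for a, ni, u in zip(pw, n, width):
        if ni < 0:
            lam = min(lam, -a / ni)  # keeps a + lam * ni >= 0
        elif ni > 0:
            lam = min(lam, (u - a) / ni)  # keeps a + lam * ni <= u
    j0 = [utopia[i] + pw[i] + lam * n[i] for i in range(m)]  # eq. (11)
    inv = [1.0 / ni for ni in n]
    w_bar = [v / sum(inv) for v in inv]  # eq. (12)
    return lam, j0, w_bar


def nbiwt_pick(points, j0, w_bar):
    """Eq. (10) by enumeration: minimise max_i w_bar_i * (J_i - J0_i)."""
    return min(points, key=lambda p: max(wb * (pi - ri) for wb, pi, ri in zip(w_bar, p, j0)))


anchors = [(0.0, 1.0), (1.0, 0.0)]  # hypothetical J(x_1*), J(x_2*)
candidates = [(0.0, 1.0), (0.6, 0.6), (1.0, 0.0)]  # hypothetical objective space
lam, j0, w_bar = nbiwt_subproblem_data(anchors, (0.5, 0.5))
print(lam, j0, w_bar)  # 0.5 [0.0, 0.0] [0.5, 0.5]
print(nbiwt_pick(candidates, j0, w_bar))  # (0.6, 0.6)
```

With $w_k = (0.5, 0.5)$ the reference point coincides with the utopia vector and the subproblem reaches the non-convex point (0.6, 0.6), which the weighted sum cannot find. With $w_k = (0.25, 0.75)$ the same function gives $\bar\lambda = 0.25$ and a reference point of (0.5, 0), on the boundary of the box spanned by the utopia and nadir vectors.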

3.2 Multi-objective optimisation process

The proposed multi-objective optimisation process is composed of the following steps:

1. Search for the global individual optima of the objectives, which will provide a first insight into the trade-off among the criteria.
2. Generate a set of K equally spaced weight vectors.
3. For each weight vector $w_k$, $k = 1, \dots, K$:
(a) Find the reference point $J^0_k$ (11) and the weight vector $\bar w_k$ (12).
(b) Solve the associated NLP/MINLP (10) by means of a suitable single-objective optimisation method.
4. Analysis of solutions and selection.

The strategy described above ultimately requires the solution of a set of single-objective NLPs/MINLPs, so the use of robust and efficient MIGO methods becomes a critical issue.

3.3 Mixed-integer Tabu search (MITS)

MITS is a recently developed GO algorithm based on an extension of the hybrid metaheuristic Tabu search, which uses a special strategy for deciding where to start a local search. It incorporates the local solver MISQP [31], an adaptation of a sequential quadratic programming method to the mixed-integer case. Further information about this solver can be found in [32].

4 Case studies

In order to illustrate the proposed MO solution strategy, we consider two case studies taken from the literature. Both of them are formulated below.

4.1 Aromatic amino acid biosynthesis

This example deals with the modification of the existing regulatory and activity structure of a metabolic pathway. A similar problem was studied in [8] using a linearised model derived from the non-linear one developed in [33]. Only the aromatic amino acid biosynthesis reactions are considered, as an isolated subsystem of the overall metabolism (Fig. 2). The pathway consists of six pathway reactions ($v_1$ to $v_6$) and three reactions for the incorporation of the amino acids into biomass ($v_7$ to $v_9$). The model also considers six additional rates ($v_{10}$ to $v_{15}$, one for each metabolite), which account for the dilution brought about by increases in the biomass.

Figure 2 Aromatic amino acid pathway (G6P: glucose-6-phosphate; PEP: phosphoenolpyruvate; DAHP: 3-deoxy-D-arabino-heptulosonate; CHR: chorismate; PHP: prephenate; PHE: phenylalanine; TRP: tryptophan; TYR: tyrosine)

The network has eight feedback inhibitory loops ($y_1$ to $y_8$). The aim of the problem is to determine which of the regulatory loops should be retained, and what the changes in the enzyme expression levels of steps $v_1$ to $v_6$ should be, in order to optimise a certain objective function. As a starting point we consider the same problem as in [8], the maximisation of phenylalanine selectivity, $S_{Phe}$:

$$\max_{v_m,\, y} J_1 = S_{Phe} = \frac{v_4}{v_4 + v_5 + v_6} \tag{14}$$

subject to

• Mass balance equations (steady state)

$$\begin{aligned}
\text{DAHP:}\quad & \frac{dS_1}{dt} = v_1 - v_2 - v_{10} = 0\\
\text{CHR:}\quad & \frac{dS_2}{dt} = v_2 - v_3 - v_6 - v_{11} = 0\\
\text{PHP:}\quad & \frac{dS_3}{dt} = v_3 - v_4 - v_5 - v_{12} = 0\\
\text{PHE:}\quad & \frac{dS_4}{dt} = v_4 - v_7 - v_{13} = 0\\
\text{TYR:}\quad & \frac{dS_5}{dt} = v_5 - v_8 - v_{14} = 0\\
\text{TRP:}\quad & \frac{dS_6}{dt} = v_6 - v_9 - v_{15} = 0
\end{aligned} \tag{15}$$

• Limits on metabolite concentrations: we allow a variation between 10 and 650% of the reference steady state, that is

$$a_i^L \le \frac{S_i}{S_{i,ref}} \le a_i^U, \qquad i = 1, \dots, 6 \tag{16}$$

with $a_i^L = 0.1$ and $a_i^U = 7.5$.

• Bounds on the six enzyme activities ($V_m$): an enzyme overexpression of up to twice the level of the reference state is allowed.

• Physiological constraints: the specific growth rate is constrained to its reference value, so that other metabolic activities of the cell remain undisturbed.

Clearly, such a wide variation in the metabolite levels can lead to unpredicted responses from the cell and undesirable effects. It would thus be attractive to seek an optimum satisfying the homeostatic constraint, but the problem arises when trying to define a reasonable value (if known) that could be tolerated by the cell. We therefore consider the principle of homeostasis as a second objective function, that is, we simultaneously minimise

$$\min_{v_m,\, y} J_2 = V = \frac{1}{N_{Met}} \sum_{i=1}^{N_{Met}} \frac{|S_i - S_{i,ref}|}{S_{i,ref}} \tag{17}$$

where $N_{Met}$ is the number of metabolites. Rate expressions for the pathway are taken from Schlosser and Bailey [33]. Given a general kinetic rate expression, the binary variables associated with the inhibitory interactions are introduced as follows

$$v = v_m \frac{S}{(K_m + S)\,(1 + y\, I/K_I)} \tag{18}$$

with $y = 1$ if the inhibitory loop is maintained.

The reference steady state, similar to that reported in [8], is presented in Table 1. These steady-state values are similar to those observed in bacterial cells for the considered concentrations of G6P, PEP, ATP, ADP and AMP, which remain constant during the optimisation.

Table 1 Reference steady state for the amino acid biosynthesis pathway

Maximum Vm (mM/min): Vm,1 = 710.0; Vm,2 = 22.0; Vm,3 = 474.0; Vm,4 = 64.0; Vm,5 = 10.5; Vm,6 = 28.0
Metabolite concentrations (mM): S1,ref (DAHP) = 3.4129; S2,ref (CHR) = 31.0066; S3,ref (PHP) = 0.6095; S4,ref (PHE) = 263.7936; S5,ref (TYR) = 321.8269; S6,ref (TRP) = 81.9005
Fluxes (mM/min): flux to Phe = 3.8619; flux to Tyr = 3.9764; flux to Trp = 1.1934
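The objectives and two of the constraints of this case study, eqs. (14), (16), (17) and (18), can be written as plain functions. This sketch only evaluates the expressions for given fluxes and concentrations; it does not reproduce the Schlosser and Bailey kinetic model or solve for the steady state. The metabolite ordering and reference concentrations follow Table 1, the function names are illustrative, and the kinetic numbers in the usage lines are arbitrary values rather than model parameters.

```python
from typing import Sequence

# Reference concentrations S_i,ref (mM) from Table 1: DAHP, CHR, PHP, PHE, TYR, TRP.
S_REF = [3.4129, 31.0066, 0.6095, 263.7936, 321.8269, 81.9005]


def phe_selectivity(v4: float, v5: float, v6: float) -> float:
    """Eq. (14): S_Phe = v4 / (v4 + v5 + v6)."""
    return v4 / (v4 + v5 + v6)


def within_limits(s: Sequence[float], s_ref: Sequence[float] = S_REF,
                  a_low: float = 0.1, a_up: float = 7.5) -> bool:
    """Eq. (16): a^L <= S_i / S_i,ref <= a^U for every metabolite."""
    return all(a_low <= si / ri <= a_up for si, ri in zip(s, s_ref))


def homeostasis_deviation(s: Sequence[float], s_ref: Sequence[float] = S_REF) -> float:
    """Eq. (17): mean relative deviation V (multiply by 100 for a percentage)."""
    return sum(abs(si - ri) / ri for si, ri in zip(s, s_ref)) / len(s_ref)


def inhibited_rate(vm: float, km: float, s: float, inhibitor: float, ki: float, y: int) -> float:
    """Eq. (18): v = vm * S / ((Km + S) * (1 + y * I / KI)); y = 1 keeps the loop."""
    return vm * s / ((km + s) * (1.0 + y * inhibitor / ki))


doubled = [2.0 * r for r in S_REF]
print(homeostasis_deviation(doubled))  # 1.0, i.e. V = 100%
print(within_limits(doubled))  # True: every ratio is 2, inside [0.1, 7.5]
print(inhibited_rate(10.0, 1.0, 1.0, 1.0, 1.0, y=0))  # 5.0 (loop removed)
print(inhibited_rate(10.0, 1.0, 1.0, 1.0, 1.0, y=1))  # 2.5 (loop retained)
```

Setting $y = 0$ in eq. (18) recovers a plain Michaelis–Menten rate, which is how deleting a regulatory loop enters the MINLP as a binary decision.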

4.2 Metabolic engineering of the tricarboxylic acid cycle in Dictyostelium discoideum

This case study was considered in [18] using the ε-constraint approach. The aim of the multi-objective optimisation is to minimise the rate of carbon dioxide evolution with the minimum number of enzyme modifications, which is equivalent to maximising the efficiency of carbon utilisation (Fig. 3). The mathematical model consists of 13 differential equations (one for each metabolite, $S_1$ to $S_{13}$) with 32 real variables $X_i$ (one for each enzymatic activity amenable to manipulation). There are 32 binary variables $y_j$, one associated with each continuous variable, which decide whether the corresponding enzyme is modified or not. Taking into account that the rate of carbon dioxide evolution is the sum of the fluxes of the reactions converting isocitrate to α-ketoglutarate and α-ketoglutarate to succinate, the MOP is mathematically formulated as

$$\min_{X,\, y} J = \begin{pmatrix} J_1 = V_{410} + V_{1011} \\[4pt] J_2 = \sum_{j=1}^{32} y_j \end{pmatrix} \tag{19}$$

where $V_{ij}$ is the flux from metabolite $i$ to metabolite $j$. The above objective vector is subject to the following sets of constraints:

• Carbon balance (total system)

$$C_{in} - C_{out} = 0 \tag{20}$$

$$\begin{aligned}
C_{in} &= 4V_{Pr7} + 2V_{Pr3} + 4V_{Pr11} + 4V_{Pr12} + 3V_{Pr8} + 5V_{Pr6} + V_{116}\\
C_{out} &= 4V_{7Pr} + 2V_{3Pr} + 4V_{11Pr} + 4V_{12Pr} + 3V_{8Pr} + 5V_{6Pr} + V_{116} + V_{410} + V_{53} + V_{1011} + V_{135}
\end{aligned} \tag{21}$$

Figure 3 Tricarboxylic acid cycle in D. discoideum (OAA: oxaloacetate; ACoA: acetyl coenzyme A; CIT: citrate; ICIT: isocitrate; aKG: α-ketoglutarate; SUCC: succinate; FUM: fumarate; MAL: malate; PYR: pyruvate; ALA: alanine; ASP: aspartate; GLU: glutamate; PROT: protein)

• Mass balances (pool metabolites)

$$\begin{aligned}
\text{OAA}_1:\quad & \frac{dS_1}{dt} = V_{71} + V_{106} + V_{131} - V_{12} - V_{71} = 0\\
\text{OAA}_2:\quad & \frac{dS_2}{dt} = V_{12} + V_{72} - V_{27} - V_{29} = 0\\
\text{ACoA}:\quad & \frac{dS_3}{dt} = V_{53} + V_{Pr3} - V_{29} - V_{3Pr} = 0\\
\text{ICIT}:\quad & \frac{dS_4}{dt} = V_{94} - V_{410} = 0\\
\text{PYR}:\quad & \frac{dS_5}{dt} = V_{85} + V_{135} - V_{53} - V_{58} = 0\\
\text{GLU}:\quad & \frac{dS_6}{dt} = V_{101} + V_{116} + V_{Pr6} - V_{610} - V_{611} - V_{58} - V_{6Pr} = 0\\
\text{ASP}:\quad & \frac{dS_7}{dt} = V_{17} + V_{27} + V_{Pr7} - V_{71} - V_{72} - V_{106} - V_{7Pr} = 0\\
\text{ALA}:\quad & \frac{dS_8}{dt} = V_{58} + V_{Pr8} - V_{85} - V_{8Pr} = 0\\
\text{CIT}:\quad & \frac{dS_9}{dt} = V_{29} - V_{94} = 0\\
\text{aKG}:\quad & \frac{dS_{10}}{dt} = V_{410} + V_{610} + V_{58} - V_{106} - V_{1011} = 0\\
\text{SUCC}:\quad & \frac{dS_{11}}{dt} = V_{611} + V_{1011} + V_{Pr11} - V_{116} - V_{1112} - V_{11Pr} = 0\\
\text{FUM}:\quad & \frac{dS_{12}}{dt} = V_{1112} + V_{Pr12} - V_{1213} - V_{12Pr} = 0\\
\text{MAL}:\quad & \frac{dS_{13}}{dt} = V_{1213} - V_{131} - V_{135} = 0
\end{aligned} \tag{22}$$

• Physiological constraints on metabolite concentrations, which are allowed to vary by 10% with respect to their original steady-state reference values.

• Bounds on enzyme activities: enzymes selected for modification ($y_i = 1$) are allowed to vary between 0 and 10 times their original level

$$(1 - y_i)\, X_{i,ref} \le X_i \le (1 + 9 y_i)\, X_{i,ref}, \qquad i = 1, \dots, 32 \tag{23}$$

The mathematical model and kinetic values for the reference steady state can be found in [34].

Optimisation settings for the MIGO method applied to each of the MINLPs are:

• Maximum number of function evaluations: between 10 000 and 20 000 multiplied by the number of binary variables.
• Tolerance for objective functions and constraints: $10^{-6}$.

5 Results and discussion

In the following sections, we present the results obtained with the NBIWT method coupled with the MITS solver.

5.1 Biosynthesis of aromatic amino acids

5.1.1 Maximisation of phenylalanine selectivity: Following the optimisation process described above, we first maximise the phenylalanine selectivity. We found a maximum $S_{Phe}$ of 84.51% with an associated deviation in metabolite concentrations with respect to the reference steady state of V = 191.3%. MITS also reported more than 200 local maxima found during the search, many of which present similar values of the phenylalanine selectivity but different regulatory structures, enzymatic levels and metabolite concentrations (Fig. 4). This is in agreement with the results found by Hatzimanikatis et al. [8]. In an attempt to reduce the deviation of metabolite concentrations, we apply a lexicographic multi-objective approach: we minimise the deviation of metabolite concentrations subject to the constraint that the phenylalanine selectivity must be greater than or equal to the maximum value previously found. This solution is the extreme point of the Pareto front. The resulting regulatory structure, corresponding to a maximum $S_{Phe}$ of 84.51%, has a V value of 149.9%. The difference in regulation between the two maxima can be seen in Figs. 5a and b.

These solutions are somewhat similar to those obtained in [8] using a linearised model, but there is a noteworthy difference. Those authors found eight equivalent solutions, in which phenylalanine always inhibits the first step, $v_1$. In our case, using the non-linear kinetic model, the inhibition of $v_1$ by DAHP is the regulatory loop that is always present in the solutions maximising the phenylalanine selectivity. From inspection of Figs. 5a and b, the advantage of the second solution is clear. The deviation of metabolite concentrations is still very high, but from a practical point of view this solution only requires the deletion of three inhibitory loops. Furthermore, only three enzymes are overexpressed (those corresponding to $v_2$, $v_3$ and $v_4$). However, in both cases the concentration of phenylalanine is at its upper bound (7.5 times the original value).
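The lexicographic procedure described above, which first maximises $S_{Phe}$ and then minimises V among the maximisers, can be sketched on a finite list of $(S_{Phe}, V)$ pairs. The candidates below are hypothetical, not results from the model, and the tolerance used to decide that two selectivities are equal is an assumption.

```python
from typing import Sequence, Tuple

Pair = Tuple[float, float]  # (S_Phe, V): selectivity to maximise, deviation to minimise


def lexicographic(points: Sequence[Pair], tol: float = 1e-9) -> Pair:
    """Stage 1: find the maximum selectivity. Stage 2: among candidates whose
    selectivity reaches that maximum (within tol), minimise the deviation V."""
    s_max = max(p[0] for p in points)
    best = [p for p in points if p[0] >= s_max - tol]
    return min(best, key=lambda p: p[1])


candidates = [(0.90, 2.0), (0.90, 1.5), (0.70, 0.4)]  # hypothetical (S_Phe, V) pairs
print(lexicographic(candidates))  # (0.9, 1.5)
```

The second stage is what turns an arbitrary maximiser of $S_{Phe}$ into the extreme point of the Pareto front: among equally selective designs it keeps the one with the smallest deviation.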

Figure 4 Maximisation of phenylalanine selectivity: a Histogram of solutions for the maximisation of phenylalanine selectivity; b Correlation with the corresponding homeostasis of metabolites

Figure 5 Regulatory structures for maximisation of PHE selectivity: a Regulatory structure for the maximum PHE selectivity with V = 191.3%; b Regulatory structure for the maximum PHE selectivity with V = 149.9%

5.1.2 Generation of the Pareto front: Once the individual optima of the objectives (maximum $S_{\mathrm{Phe}}$ and minimum deviation from the original steady state) are known, we generate an approximation of the Pareto-optimal set by solving a set of 19 MINLPs derived from the NBIWT method described above. The outcome of the algorithm is depicted in Fig. 6, where it is also compared with the results obtained with the original NBI and the traditional weighted sum approach (WSA).

The best (non-dominated) set of solutions is obtained using our NBIWT approach. NBI fails to generate globally Pareto-optimal points, and the classical WSA misses a large portion of the Pareto front. Fig. 6 shows six significant points, labelled A to F in descending order of $V$. Two clear discontinuities can be observed in the Pareto front (A-B and C-D). These ‘jumps’ in the efficient front cause the original NBI and the WSA to fail. It should be noted, however, that solution D is only weakly Pareto optimal (i.e. there exists another solution, E, with the same phenylalanine selectivity and a lower value of the homeostasis of metabolites). Nevertheless, the mathematical programming technique described in this paper presents clear advantages over the original NBI.

Figure 6 Pareto fronts obtained using NBIWT, original NBI and WSA

5.1.3 Analysis of solutions: Fig. 7 shows four of the six selected regulatory structures mentioned above (in solutions D and F no regulatory loop is eliminated). Figs. 8a and b show the corresponding enzymatic overexpression ratios and metabolite levels. From inspection of Fig. 8b, it is clear that higher values of the phenylalanine selectivity are achieved with large increases in the concentration of phenylalanine. Thus, it seems evident that the inhibitory feedback loop of phenylalanine over reaction $v_4$ (i.e. the flux producing phenylalanine) should be eliminated in order to reach such elevated concentrations. This apparent relationship could also explain the saturation-type shape of the Pareto curve. Although this analysis might be considered obvious, these results provide further insight into the system, especially where the homeostasis of metabolites takes lower values. Design E, for instance, only requires overexpressing the enzyme catalysing the phenylalanine production step and eliminating the inhibition of tryptophan over the first step of the pathway.

Figure 7 Regulatory structures for the aromatic amino acid biosynthesis pathway selected from the Pareto-optimal set

Figure 8 Enzymatic overexpression ratios and metabolite levels for selected Pareto-optimal designs: a Enzymatic overexpression ratios; b Metabolite levels
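The distinction drawn in Section 5.1.2 between Pareto-optimal and weakly Pareto-optimal points (solution D versus E) can be made concrete with a small dominance check for this bi-objective problem (maximise $S_{\mathrm{Phe}}$, minimise $V$). The sketch below is illustrative only: `dominates`, `strictly_dominates` and `classify` are hypothetical helper names, and the coordinates are invented to reproduce the D/E situation rather than read from Fig. 6.

```python
from typing import Dict, Tuple

# Each point: (selectivity S_Phe, to maximise; deviation V, to minimise).
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in both objectives and strictly better in one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    better = a[0] > b[0] or a[1] < b[1]
    return no_worse and better


def strictly_dominates(a: Point, b: Point) -> bool:
    """True if a is strictly better than b in every objective."""
    return a[0] > b[0] and a[1] < b[1]


def classify(points: Dict[str, Point]) -> Dict[str, str]:
    """Label each point 'pareto', 'weak' (only weakly Pareto optimal) or 'dominated'."""
    labels = {}
    for name, p in points.items():
        others = [q for other, q in points.items() if other != name]
        if not any(dominates(q, p) for q in others):
            labels[name] = "pareto"
        elif not any(strictly_dominates(q, p) for q in others):
            labels[name] = "weak"
        else:
            labels[name] = "dominated"
    return labels


# Hypothetical coordinates (not data from Fig. 6): E has the same selectivity
# as D but a lower deviation, so D is only weakly Pareto optimal.
front = {"C": (70.0, 90.0), "D": (60.0, 40.0), "E": (60.0, 30.0), "F": (50.0, 10.0)}
print(classify(front))
# prints: {'C': 'pareto', 'D': 'weak', 'E': 'pareto', 'F': 'pareto'}
```

Applying a filter of this kind after generating the front is one simple way to flag points such as D before the trade-off is presented to a decision maker.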

5.2 Metabolic engineering of the TAC in D. discoideum

5.2.1 Generation of the Pareto front: In this case, we first apply the ε-constraint approach, maximising the carbon utilisation (i.e. minimising the sum of fluxes $V_{410} + V_{1011}$) while the number of allowed enzymatic modifications is varied from 1 to 32. As a result, we found that a minimum of 12 enzymes need to be modified in order to reach the minimum rate of carbon dioxide evolution, which is practically zero. It should be noted that this solution could also be obtained following the same process as in the previous case (i.e. minimising the sum of fluxes producing CO2 with no restriction on the number of allowed modifications, and then applying a lexicographic method). We then compare these results with the Pareto-optimal solutions generated with the proposed NBIWT. The best solutions from three optimisation runs are shown in Fig. 9. As in the previous case, NBIWT requires solving fewer MINLPs to define the most important points of the Pareto front. Moreover, the mean CPU time to solve each MINLP associated with the ε-constraint approach was about 15 h, whereas the NBIWT subproblems required a mean CPU time of 6 h per MINLP with the same optimisation settings, showing another clear advantage over other classical mathematical programming techniques.

Figure 9 Pareto-optimal solutions for the metabolic engineering of D. discoideum

Table 2 Selected solutions from the Pareto front for the metabolic engineering of D. discoideum

| Solution | No. of modifications | Rate of CO2 (mM/min) | Enzymes |
|---|---|---|---|
| A | 2 | 3.86 | $X_3$, $X_{21}$ (or $X_{27}$) |
| B | 4 | 3.42 | $X_3$, $X_9$, $X_{14}$, $X_{15}$ |
| C | 6 | 2.20 | $X_2$, $X_9$, $X_{11}$, $X_{14}$, $X_{15}$, $X_{17}$ |
| D | 8 | 0.77 | $X_2$, $X_3$, $X_5$, $X_{11}$, $X_{12}$, $X_{13}$, $X_{17}$, $X_{28}$ |

5.2.2 Analysis of solutions: From inspection of Fig. 9, the first conclusion that can be drawn is that at least two enzymes must be modified in order to improve the carbon utilisation (solution marked ‘A’). The rate of CO2 is decreased from 4.42 mM/min (reference steady state) to 3.86 mM/min. It is worth mentioning that this improvement can be achieved either by decreasing the production of ASP from protein or by increasing the inverse reaction. Three other significant solutions are points B, C and D; the enzymes that need to be modified for these points are collected in Table 2. Fig. 9 shows that modifying eight enzymes simultaneously reduces the rate of CO2 much more than modulating only six, but further modifications lead to only a small enhancement in the carbon utilisation.

Another important conclusion is that the two key enzymes for reducing the rate of CO2 are $X_2$ and $X_3$ (corresponding to the αKG to SUCC reaction and the FUM to SUCC step, respectively); either one or both of these enzymes are present in all the solutions. However, the lowest values of the CO2 rate are achieved when $X_{11}$, $X_{12}$ and $X_{13}$ (i.e. the enzymes corresponding to the steps ACoA to CIT to ICIT) are also modified.

6


Conclusions

In this work, a novel multi-criteria optimisation method was successfully applied to metabolic engineering problems involving both continuous and integer variables. The aim of these problems is to design strains with enhanced capabilities by simultaneously optimising several performance functions of the biological system under consideration. The new strategy, based on a combination of two well-known mathematical programming techniques, has proved to be efficient when applied to these metabolic engineering problems. Once the complete set of Pareto-optimal solutions has been generated for a case study, it can readily be used to choose a suitable compromise between the objectives. Thus, this new technique can be used as the computational engine of a powerful decision support system for metabolic engineering. Since the new technique can be coupled with any single-objective stochastic (MI)GO solver, it can adequately deal with larger and more complex biological reaction networks. Furthermore, the new approach has proved to be superior to the original NBI method, which fails to obtain globally Pareto-optimal solutions. It also presents clear advantages over other well-known programming techniques when it comes to generating a fairly good and easy-to-use representation of the Pareto front that captures the complete trade-off among the objectives.

7

Acknowledgments

The authors acknowledge financial support from Spanish MICINN project MultiSysBio DPI2008-06880-C03-02 and EU project BaSysBio LSHG-CT-2006-037469.

8

References

[1] SUTHERLAND W.J.: ‘The best solution’, Nature, 2005, 435, (7042), p. 569
[2] ALEXANDER R.M.: ‘Optima for animals’ (E. Arnold, London, 1982)
[3] GREENBERG H.J., HART W.E., LANCIA G.: ‘Opportunities for combinatorial optimization in computational biology’, INFORMS J. Comput., 2004, 16, (3), pp. 211–231
[4] LARRAÑAGA P., CALVO B., SANTANA R., ET AL.: ‘Machine learning in bioinformatics’, Briefings Bioinf., 2006, 7, (1), pp. 86–112
[5] BANGA J.R.: ‘Optimization in computational systems biology’, BMC Syst. Biol., 2008, 2:47
[6] BAILEY J.E.: ‘Toward a science of metabolic engineering’, Science, 1991, 252, (5013), pp. 1668–1675
[7] STEPHANOPOULOS G., STAFFORD D.E.: ‘Metabolic engineering: a new frontier of chemical reaction engineering’, Chem. Eng. Sci., 2002, 57, (14), pp. 2595–2602
[8] HATZIMANIKATIS V., FLOUDAS C.A., BAILEY J.E.: ‘Analysis and design of metabolic reaction networks via mixed-integer linear optimization’, AIChE J., 1996, 42, (5), pp. 1277–1290
[9] REGAN L., BOGLE I.D.L., DUNNILL P.: ‘Simulation and optimization of metabolic pathways’, Comput. Chem. Eng., 1993, 17, (5–6), pp. 627–637
[10] TORRES N.V., VOIT E.O., GONZÁLEZ-ALCÓN C.: ‘Optimization of nonlinear biotechnological processes with linear programming: application to citric acid production by Aspergillus niger’, Biotechnol. Bioeng., 1996, 49, (3), pp. 247–258
[11] VOIT E.O.: ‘Optimization in integrated biochemical systems’, Biotechnol. Bioeng., 1992, 40, (5), pp. 572–582
[12] MENDES P., KELL D.B.: ‘Non-linear optimization of biochemical pathways: applications to metabolic engineering and parameter estimation’, Bioinformatics, 1998, 14, (10), pp. 869–883
[13] HEINRICH R., SCHUSTER S.: ‘The regulation of cellular systems’ (Chapman & Hall, New York, 2005)
[14] KACSER H., ACERENZA L.: ‘A universal method for achieving increases in metabolite production’, Eur. J. Biochem., 1993, 216, (2), pp. 361–367
[15] SCHUSTER S., SCHUSTER R., HEINRICH R.: ‘Minimization of intermediate concentrations as a suggested optimality principle for biochemical networks. II. Time hierarchy, enzymatic rate laws, and erythrocyte metabolism’, J. Math. Biol., 1991, 29, (5), pp. 443–455
[16] HANDL J., KELL D.B., KNOWLES J.: ‘Multiobjective optimization in bioinformatics and computational biology’, IEEE/ACM Trans. Comput. Biol. Bioinf., 2007, 4, (2), pp. 279–291
[17] SENDÍN J.O.H., VERA J., TORRES N.V., BANGA J.R.: ‘Model-based optimization of biochemical systems using multiple objectives: a comparison of several solution strategies’, Math. Comput. Model. Dyn. Syst., 2006, 12, (5), pp. 469–487
[18] DEAN J.P., DERVAKOS G.A.: ‘Design of process-compatible biological agents’, Comput. Chem. Eng., 1996, 20, (S1), pp. S67–S72
[19] RODRÍGUEZ-ACOSTA F., REGALADO C.M., TORRES N.V.: ‘Non-linear optimization of biotechnological processes by stochastic algorithms. Applications to the maximization of the production rate of ethanol, glycerol and carbohydrates by Saccharomyces cerevisiae’, J. Biotechnol., 1999, 68, (1), pp. 15–28
[20] HATZIMANIKATIS V., FLOUDAS C.A., BAILEY J.E.: ‘Optimization of regulatory architectures in metabolic reaction networks’, Biotechnol. Bioeng., 1996, 52, (4), pp. 485–500
[21] BURGARD A.P., PHARKYA P., MARANAS C.D.: ‘OptKnock: a bilevel programming framework for identifying gene knockout strategies for microbial strain optimization’, Biotechnol. Bioeng., 2003, 84, (6), pp. 647–657
[22] PATIL K.R., ROCHA I., FÖRSTER J., NIELSEN J.: ‘Evolutionary programming as a platform for in silico metabolic engineering’, BMC Bioinf., 2005, 6:308
[23] PHARKYA P., MARANAS C.D.: ‘An optimization framework for identifying reaction activation/inhibition or elimination candidates for overproduction in microbial systems’, Metabolic Eng., 2006, 8, (1), pp. 1–13
[24] VITAL-LOPEZ F.G., ARMAOU A., NIKOLAEV E.V., MARANAS C.D.: ‘A computational procedure for optimal engineering intervention using kinetic models of metabolism’, Biotechnol. Prog., 2006, 22, (6), pp. 1507–1517
[25] SENDÍN J.O.H., OTERO-MURAS I., ALONSO A.A., BANGA J.R.: ‘Improved optimization methods for the multiobjective design of bioprocesses’, Ind. Eng. Chem. Res., 2006, 45, (25), pp. 8594–8603
[26] MIETTINEN K.M.: ‘Nonlinear multiobjective optimization’ (Kluwer Academic Publishers, Boston, 1999)
[27] DEB K.: ‘Multi-objective optimization using evolutionary algorithms’ (Wiley, Chichester, 2001)
[28] KOSKI J., SILVENNOINEN R.: ‘Norm methods and partial weighting in multicriterion optimization of structures’, Int. J. Numer. Methods Eng., 1987, 24, (6), pp. 1101–1121
[29] DAS I., DENNIS J.E.: ‘Normal-boundary intersection: a new method for generating the Pareto surface in nonlinear multicriteria optimization problems’, SIAM J. Optim., 1998, 8, (3), pp. 631–657
[30] SHUKLA P.K.: ‘On the normal boundary intersection method for generation of efficient front’. Proc. Seventh Int. Conf. on Computational Science, Part I: ICCS 2007, Beijing, China, pp. 310–317
[31] EXLER O., SCHITTKOWSKI K.: ‘A trust region SQP algorithm for mixed-integer nonlinear programming’, Optim. Lett., 2007, 1, (3), pp. 269–280
[32] EXLER O., ANTELO L.T., EGEA J.A., ALONSO A.A., BANGA J.R.: ‘A Tabu search-based algorithm for mixed-integer nonlinear problems and its application to integrated process and control system design’, Comput. Chem. Eng., 2008, 32, (8), pp. 1877–1891
[33] SCHLOSSER P.M., BAILEY J.E.: ‘An integrated modeling-experimental strategy for the analysis of metabolic pathways’, Math. Biosci., 1990, 100, (1), pp. 87–114
[34] SHIRAISHI F., SAVAGEAU M.A.: ‘The tricarboxylic acid cycle in Dictyostelium discoideum. Systemic effects of including protein turnover in the current model’, J. Biol. Chem., 1993, 268, (23), pp. 16917–16928
