IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART B: CYBERNETICS, VOL. 28, NO. 2, APRIL 1998


A Hybrid Approach to Modeling Metabolic Systems Using a Genetic Algorithm and Simplex Method John Yen, Senior Member, IEEE, James C. Liao, Bogju Lee, and David Randolph

Abstract—One of the main obstacles in applying genetic algorithms (GA’s) to complex problems has been the high computational cost due to their slow convergence rate. We encountered such a difficulty in our attempt to use the classical GA for estimating parameters of a metabolic model. To alleviate this difficulty, we developed a hybrid approach that combines a GA with a stochastic variant of the simplex method in function optimization. Our motivation for developing the stochastic simplex method is to introduce a cost-effective exploration component into the conventional simplex method. In an attempt to make effective use of the simplex operation in a hybrid GA framework, we used an elite-based hybrid architecture that applies one simplex step to a top portion of the ranked population. We compared our approach with five alternative optimization techniques, including a simplex–GA hybrid independently developed by Renders–Bersini (R–B) and adaptive simulated annealing (ASA). Our empirical evaluations showed that our hybrid approach outperformed all the other techniques on the metabolic modeling problem in terms of accuracy and convergence rate. We used two additional function optimization problems to compare our approach with the five alternative methods. For a sine function maximization problem, our hybrid approach yields the fastest convergence rate without sacrificing the accuracy of the solution found. For De Jong’s F5 function minimization problem, our hybrid approach is the second best (next to ASA). Overall, these tests showed that our hybrid approach is an effective and robust optimization technique. We further conducted an empirical study to identify the major factors that affect the performance of the hybrid approach. The study indicated that 1) our elite-based hybrid GA architecture contributes significantly to the performance improvement and 2) the probabilistic simplex is more cost-effective for our hybrid architecture than the conventional simplex.
By analyzing the performance of the hybrid approach for the metabolic modeling problem, we hypothesized that the hybrid approach is particularly suitable for solving complex optimization problems whose variables vary widely in their sensitivity to the objective function.

Manuscript received October 7, 1995; revised July 26, 1996 and February 5, 1997. This work was supported by the NSF under Grant BES 9511737. Part of the research described in the paper was conducted with the support of NSF Young Investigator Awards IRI 92-57293 and BCS-9257351. J. Yen, B. Lee, and D. Randolph are with the Center for Fuzzy Logic, Robotics, and Intelligent Systems Research, Department of Computer Science, Texas A&M University, College Station, TX 77843-3112 USA (e-mail: [email protected]). J. C. Liao is with the University of California, Los Angeles, CA 90095-1592 USA. Publisher Item Identifier S 1083-4419(98)02188-8.

I. INTRODUCTION

Genetic algorithms (GA’s) have been demonstrated to be a promising search and optimization technique [1]. GA’s have been successfully applied to system identification [2]–[5] and to a wide range of applications including design [6], scheduling [7], routing [8], control [9], [10], and others [11]–[13]. One of the main obstacles in applying GA’s to complex problems has often been the high computational cost due to their slow convergence rate. The convergence rate of a GA is typically slower than that of local search techniques (e.g., steepest descent) because it does not use much local information to determine the most promising search direction. Consequently, a GA explores a wider frontier of the search space in a less directional fashion.

The problem of modeling metabolic systems involves the construction of a model of cellular metabolism and effective estimation of model parameters from a limited amount of data. In using a GA to estimate parameters of a metabolic model, we found that its slow convergence rate makes the approach much less attractive. This problem is particularly severe for modeling metabolic systems because the evaluation of a candidate model involves simulating the model, which is computationally costly. The GA relies on such evaluations (typically called fitness evaluations) to provide a measure of the fitness of each guess. The result of this evaluation guides the GA’s search process.

A common strategy in the literature for dealing with the GA’s slow convergence problem is to combine a GA with a complementary local search technique [14]–[17]. The rationale of such a strategy is that a hybrid approach can combine the merits of the GA with those of a local search technique. Because of the GA, a hybrid approach is less likely to be trapped in a local optimum than a pure local search technique is. Due to its local search, a hybrid approach often converges faster than the pure GA does. Generally speaking, a hybrid approach usually can explore a better tradeoff between computational cost and the global optimality of the solution found.
Toward the objective of improving the convergence rate of a GA, we developed a hybrid approach that combines a real-coded GA with a stochastic variant of the Nelder–Mead (N–M) simplex method [18], [19]. Our motivation for developing the stochastic simplex method is to introduce a cost-effective exploration component into the conventional simplex method. We compared our hybrid approach extensively with five alternative optimization methods: 1) a simplex–GA hybrid independently developed by Renders–Bersini (R–B); 2) the G-bit improvement on the GA; 3) adaptive simulated annealing (ASA); 4) the real-coded GA; and 5) a parallel version of the simplex method.

1083–4419/98$10.00  1998 IEEE


TABLE I KINETIC RATE LAWS OF GLUCOSE METABOLIC MODEL

Fig. 1. Pathway of glucose metabolic model.

Section II describes the modeling problem that motivated our research: the parameter estimation of a system model for the central metabolism of an Escherichia coli cell. Section III briefly reviews the basics of GA’s, simplex methods, and simulated annealing, which serve as the background for our discussion. A classification of existing hybrid GA approaches is then presented in Section IV to set up the context for introducing two simplex–GA hybrid approaches: the R–B approach and our approach. The performance of applying our hybrid approach to the metabolic modeling problem is compared with those of five alternative methods in Section V. To show that our hybrid approach is also useful for solving other optimization problems, Sections VI and VII empirically compare our hybrid approach with these techniques for a function maximization problem and for De Jong’s foxhole function (F5), respectively. In Section VIII, we discuss the results of additional experiments designed to gain some insights about the major factors contributing to the effectiveness of the proposed simplex–GA hybrid approach. Finally, we summarize the major results of the paper and outline the issues we plan to address in our future research.

II. THE MODELING OF METABOLIC SYSTEMS

Modeling metabolic pathways has long been a desired goal for biochemists, biochemical engineers, and biotechnologists. The system under consideration here is the central metabolism in Escherichia coli, which includes glycolysis and the tricarboxylic acid (TCA) cycle. Fig. 1 shows a simplified version of the pathway considered in the model.¹ Each reaction is shown in the pathway as an arrow, which is labeled by the variable denoting the rate of the reaction.

¹We simplified the pathway for clarity. Several metabolites are not shown in the figure because the pathway would otherwise be too complicated to comprehend.

The dynamic mass balance of each metabolite (e.g., GLU, G6P, PEP) is described by an ordinary differential equation with rates of input and output represented by enzyme kinetic rate laws,

    dx_i/dt = V_in,i(x) − V_out,i(x)    (1)

where x is the vector of metabolite concentrations and V_in,i and V_out,i are the kinetic rate laws of the enzymes that produce and consume metabolite i, respectively. For instance, the balance for G6P, the concentration of glucose 6-phosphate, equates the rate of the enzymes that produce G6P with the rate of the enzymes that consume it, each rate being a function of the concentrations of the metabolites involved. The kinetic rate laws are typically nonlinear functions of the concentrations, and they are listed in Table I. Several metabolites appearing in Table I are not shown in the simplified pathway in Fig. 1 for clarity. The set of ODE’s is given in Table II.

In the problem given, some key parameters in the rate laws listed in Table I are unknown and must be estimated from new experiments involving the whole system with the aid of parameter estimation methods. These parameters must be identified by comparing the concentration profiles determined experimentally with those predicted by simulating the ODE’s in Table II with proper initial conditions. The evaluation of a model is usually based on two criteria: 1)


TABLE II GLUCOSE METABOLIC MODEL

Fig. 2. Landscape at the global optimum: (a) the V2–V6 plane and (b) the V4–V2 plane.

the sum of relative absolute errors between the model output and the target model behavior obtained experimentally, and 2) the convergence of the model output. The second criterion is important because the true model should converge to an equilibrium state. For example, if two metabolic models have the same accumulated errors and one converges while the other does not, the former model is preferred. To express this preference, a penalty should be given to a model that does not converge to an equilibrium state. When we use a GA to identify the unknown parameters, these two criteria become the basis for designing the GA’s fitness evaluation function, as we shall see later in Section V.

Fig. 2 plots part of the landscape of the optimization surface of the biomodeling problem at the global optimum. These figures clearly show that many local optima are scattered in the irregular landscape. Because of the existence of numerous local optima in the model, parameter optimization through gradient techniques fell short of our goal. We instead turned our attention to the GA, which can avoid entrapment by local optima. However, applying the pure GA to this problem is difficult due to its high computational cost. The problem is

attributed to 1) the slow convergence rate of the GA and 2) the inherent computational cost of the evaluation function, which involves simulating metabolic models using sets of parameter guesses. Since a single simulation could take up to 3 s, evaluating a generation of 150 sets of parameter values could take more than 12 min. One run of a genetic algorithm for optimizing six parameters would take about 7 h before it converges! In system-level metabolic modeling, the number of parameters that may need to be adjusted can be very large. Consequently, it is difficult to apply GA’s to large-scale biomodeling problems. This scalability problem in applying the pure GA to metabolic modeling motivated us to investigate a hybrid approach that integrates a GA with other search techniques to improve the GA’s convergence rate. Before describing our hybrid GA approach, we briefly review existing hybrid GA techniques in Section III.

III. BACKGROUND

Optimization techniques can be classified into two categories: local search methods and global search methods. A local search method uses local information about the current set of data (state) to determine a promising direction for moving some of the data set, which is used to form the next set of data. The advantage of local search techniques is that they are simple and computationally efficient, but they are easily entrapped in a local optimum. In contrast, a global search method explores the global search space without using local information about promising search directions. Consequently, global methods are less likely to be trapped in local optima, but their computational cost is higher. The distinction between local search methods and global search methods is referred to


as the “exploitation–exploration” tradeoff in the recent work of Renders [20]. While global search methods often focus on “exploration,” local search methods focus on “exploitation.” Like most global search methods, GA’s are not easily entrapped in local minima. On the other hand, they typically converge slowly. Many researchers have reported in the literature that combining a GA and a local search technique into a hybrid approach often produces certain benefits [14]–[16], [21]. This is because a hybrid approach can combine the merits of the GA with those of a local search technique. Because of the GA, a hybrid approach is less likely to be trapped in a local optimum than a local search technique is. Due to its local search, a hybrid approach often converges faster than the GA does. Generally speaking, a hybrid approach usually can explore a better tradeoff between computational cost and the optimality of the solution found. To provide the background necessary for introducing hybrid approaches, this section briefly reviews the basics of genetic algorithms and a local search technique called the simplex method. We give an overview of various existing hybrid architectures in the next section. Because we will compare our approach with ASA at a later point, we also review the basics of simulated annealing.
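The two evaluation criteria described in Section II (the sum of relative absolute errors and a convergence penalty) can be combined into a single fitness score to be minimized. The following is a minimal sketch; the function name, the penalty magnitude, and the flat-list interface are illustrative assumptions, not the implementation used in the paper.

```python
def fitness(predicted, observed, converged, penalty=1e3):
    """Score a candidate metabolic model (lower is better).

    predicted/observed: concentration values from simulation and from
    experiment; converged: whether the simulated trajectories settled
    to an equilibrium state.
    """
    # Criterion 1: sum of relative absolute errors between the model
    # output and the experimentally observed behavior.
    error = sum(abs(p - o) / max(abs(o), 1e-12)
                for p, o in zip(predicted, observed))
    # Criterion 2: penalize a model that fails to reach equilibrium,
    # so that of two equally accurate models the convergent one wins.
    if not converged:
        error += penalty
    return error
```

A GA minimizing this score then prefers the convergent model whenever accumulated errors tie, which is exactly the preference the second criterion expresses.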

A. Genetic Algorithms

Genetic algorithms are global search and optimization techniques modeled on natural genetics that explore the search space by maintaining a set of candidate solutions in parallel [22]. A GA maintains a population of candidate solutions, where each solution is usually coded as a binary string called a chromosome. A chromosome—also referred to as a genotype—encodes a parameter set (i.e., a candidate solution) for the set of variables being optimized. Each encoded parameter in a chromosome is called a gene. A decoded parameter set is called a phenotype. A set of chromosomes forms a population, which is evaluated and ranked by a fitness evaluation function. The initial population is usually generated at random. The evolution from one generation to the next involves three steps. First, the current population is evaluated using the fitness evaluation function and then ranked based on the fitness values. Second, the GA stochastically selects “parents” from the current population with a bias such that better chromosomes are more likely to be selected. This is accomplished using a selection probability determined by the fitness value or the ranking of a chromosome. Third, the GA reproduces “children” from the selected “parents” using two genetic operations: crossover and mutation. This cycle of evaluation, selection, and reproduction terminates when an acceptable solution is found, when a convergence criterion is met, or when a predetermined limit on the number of iterations is reached.

The crossover operation exchanges information between two chromosomes. First, a randomly selected bit position is used to cut each parent chromosome (i.e., bit string) into two substrings. The parents then exchange their right substrings to produce two new strings (i.e., their children) of the same length. The mutation operation replaces a randomly chosen bit

of a chromosome with its complement (i.e., changing a “1” to a “0” or a “0” to a “1”). The chances that these two operations apply to a chromosome are controlled by two probabilities: the crossover probability and the mutation probability. Typically, the mutation operation has a low probability, to reduce its potential interference with a legitimately progressing search.

The GA differs from conventional optimization techniques in that it is inherently parallel. All individuals in a population evolve simultaneously without central coordination. GA’s operate on a set of solutions rather than on one solution; hence, multiple frontiers are searched simultaneously. The only feedback used by the genetic algorithm is the fitness evaluation. The GA has been shown to be an effective search technique on a wide range of difficult optimization problems [1], [2], [22], [23].

1) Real-coded GA: A real-coded GA is a genetic algorithm that uses a floating-point representation [24]. A chromosome in a real-coded GA becomes a vector of floating-point numbers. With some modifications of the genetic operators, real-coded GA’s have achieved better performance than binary-coded GA’s for certain problems [24]. We briefly describe below some modified genetic operators recommended for real-coded GA’s. The crossover operator of a real-coded GA is analogous to that of a binary-coded GA except that its crossover points fall between genes (i.e., encoded parameters). Two mutation operators have been proposed for real-coded GA’s. A random mutation replaces a gene with a random number drawn from the gene’s domain. A dynamic mutation stochastically changes a gene within an interval that narrows over time. For the two problems we tested (i.e., the biomodeling problem described in Section II and a function maximization problem to be described in Section VI), real-coded GA’s outperformed binary implementations of GA’s. Therefore, all GA experiments reported in this paper used real-coded GA’s.
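The real-coded operators described above can be sketched as follows. This is an illustrative sketch only: the bounds representation and the linear narrowing schedule used for dynamic mutation are assumptions, since the text does not fix them here.

```python
import random

def crossover(p1, p2):
    """Real-coded crossover: the cut point falls between genes."""
    cut = random.randint(1, len(p1) - 1)
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def random_mutation(chrom, bounds):
    """Replace one randomly chosen gene with a random value
    drawn from that gene's domain."""
    i = random.randrange(len(chrom))
    lo, hi = bounds[i]
    child = list(chrom)
    child[i] = random.uniform(lo, hi)
    return child

def dynamic_mutation(chrom, bounds, t, t_max):
    """Perturb one gene within an interval that narrows over time.

    The linear shrink factor (1 - t/t_max) is one simple choice of
    narrowing schedule, assumed here for illustration.
    """
    i = random.randrange(len(chrom))
    lo, hi = bounds[i]
    shrink = 1.0 - t / t_max          # interval narrows as t -> t_max
    half = 0.5 * (hi - lo) * shrink
    child = list(chrom)
    child[i] = min(hi, max(lo, chrom[i] + random.uniform(-half, half)))
    return child
```

Note that the crossover never splits a gene: children inherit whole parameter values, which is the key difference from bit-level crossover.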
To fairly compare hybrid approaches with GA’s, we also used real-coded GA’s to implement the two hybrid approaches that we will describe in Section IV-B.

B. Simplex Method

The simplex method is a local search technique that uses the evaluation of the current data set to determine a promising search direction. In this section, we first review the basic simplex method. We then describe two modifications to the basic simplex method.

Before we elaborate on the simplex method, we will describe another kind of local search method—the gradient-based method—which uses the gradient of the function being optimized as the promising search direction [25]. Examples of common gradient methods include steepest descent [26], Newton strategies [27], Powell’s version of conjugate directions [28], and Hooke and Jeeves’ pattern search [27], [29]. Even though these techniques have been widely used for many function optimization problems, it is difficult to apply them to the metabolic modeling problem because the mathematical relationships between the modeling parameters and the modeling objectives (i.e., close fitness between the

model prediction and the experimental data) are too complex to be formulated.

1) Basic Simplex Method: The basic simplex method was first introduced by Spendley et al. [30]. A simplex is defined by a number of points equal to one more than the number of dimensions of the search space. For an optimization problem involving n variables, the simplex method searches for an optimal solution by evaluating a set of n + 1 points forming a simplex, denoted X_1, X_2, ..., X_{n+1}. The method continually forms new simplices by replacing the worst point in the simplex, denoted X_w, with a new point X_r generated by reflecting X_w over the centroid X_c of the remaining points:

    X_r = X_c + (X_c − X_w).    (2)

The new simplex is then defined by the remaining points together with X_r.² This cycle of evaluation and reflection iterates until the step size (i.e., the distance between X_c and X_w) becomes less than a predetermined value or the simplex circles around an optimum. Fig. 3 illustrates how this applies to an optimization problem involving two variables: the worst of the three points of the original simplex is reflected across the centroid of the other two, and the reflected point, together with those two points, forms the new simplex.

²If X_r has the worst evaluation in the new simplex, replace the second worst point in the next cycle instead.

Fig. 3. An example of a two-dimensional simplex.

2) N–M Simplex Method: Nelder and Mead developed a modification to the basic simplex method that allows the procedure to adjust its search step according to the evaluation result of the new point generated [19]. This is achieved in three ways. First, if the reflected point is very promising (i.e., better than the best point in the current simplex), a new point further along the reflection direction is generated using the equation

    X_e = X_c + γ(X_c − X_w)    (3)

where γ > 1 is called the expansion coefficient because the resulting simplex is expanded. Second, if the reflected point is worse than the worst point in the original simplex (i.e., worse than X_w), a new point close to the centroid on the same side as X_w is generated using the equation

    X_s = X_c − β(X_c − X_w)    (4)

where 0 < β < 1 is called the contraction coefficient because the resulting simplex is contracted. Third, if the reflected point is not worse than X_w but is worse than the second worst point in the original simplex, a new point close to the centroid on the opposite side of X_w is generated using the contraction coefficient:

    X_o = X_c + β(X_c − X_w).    (5)

3) Probabilistic Simplex Method: In order to introduce a cost-effective exploration component into the simplex method, we develop a stochastic variant of the simplex method, which we call the probabilistic simplex method. We modify the basic simplex method to allow the distance between the centroid and the reflected point to be determined probabilistically. This is achieved by combining (3) and (5) into the following equation:

    X_new = X_c + r(X_c − X_w)    (6)

where r is a random variable taking its value from the interval [0, 2] based on a predetermined probability distribution. The probability distribution used in our application is a triangular probability density function that peaks at 1 and reaches zero probability at 0 and 2, respectively. If the reflected point is worse than the worst point, a probabilistic contraction operation is applied in a similar way:

    X_new = X_c − r′(X_c − X_w)    (7)

where r′ is a random variable taking its value from the interval [0, 1] with a triangular probability density function that peaks at 0.5. By introducing a stochastic component into the reflection and contraction operations, the newly generated point can lie anywhere on the line segment these operations span, rather than being constrained to a few fixed points on the line. This flexibility allows the probabilistic simplex to explore the search space with more freedom. It may also facilitate the fine-tuning of solutions around an optimum.

C. Simulated Annealing

The basic idea of the simulated annealing technique is that it tries to avoid being trapped in local minima by occasionally making an “uphill” move (for a minimization problem) [31], [32]. The probability of moving in an “uphill” direction is determined by a function of the form

    p = exp(−ΔE / T)

where ΔE is the amount of objective-value increase caused by the uphill move and T is a parameter (referred to as the “annealing temperature”) that decreases over time according to a schedule. During the initial stage of the simulated annealing search, the system is more likely to accept uphill moves because T is high. As the search proceeds, the probability of accepting uphill moves gradually decreases because T becomes lower.

1) Adaptive Simulated Annealing: ASA [33] is a variant of the simulated annealing technique in which the annealing schedule for the temperature decreases exponentially in annealing time. ASA also introduces re-annealing, which permits adaptation to changing sensitivities in the multidimensional parameter space. ASA attempts to “stretch out” the range


over the insensitive parameters. It has been shown that ASA outperformed the GA on De Jong’s test functions [34].

IV. INTEGRATING GENETIC ALGORITHMS AND THE SIMPLEX METHOD

While GA’s have been shown to be effective for solving a wide range of optimization problems [1], their convergence speed is typically much slower than that of local optimization techniques. A GA can only recombine good guesses in the hope that a recombination will have a better fitness than both of its parents.³ Because of this limitation, many researchers have combined GA’s with other optimization techniques to develop hybrid genetic algorithms [14], [15], [17], [20], [35]–[39]. The purpose of such hybrid systems is to speed up the rate of convergence while retaining the ability to avoid being easily entrapped at a local optimum. Although local optimization in a hybrid often results in faster convergence, it has been shown that too much local optimization can interfere with the search for a global optimum by drawing the genetic algorithm’s attention to local optima too quickly, leading to premature convergence [16]. Thus, while local optimization might improve the speed of the analysis, it may also reduce the quality of the final solution found. Designing a hybrid approach for an application therefore involves a careful analysis of these tradeoffs. To put our discussion in a larger context, we briefly review the types of hybrid GA architectures before we discuss two specific hybrid approaches that combine a GA with a simplex method.

A. Types of Hybrid Architecture

Hybrid genetic algorithms can be classified into four categories: 1) pipelining hybrids; 2) asynchronous hybrids; 3) hierarchical hybrids; and 4) hybrids with additional operators.⁴ We briefly review each category below.
1) Pipelining Hybrids: Probably the simplest and most commonly used hybrids are the pipelining hybrids, in which the genetic algorithm and some other optimization technique are applied sequentially; one generates data (i.e., points in the search space) used by the other. Typically, pipelining hybrids use the first search algorithm to prune or bias the initial search space so that the second algorithm will converge either more quickly or more accurately. There are three basic types of pipelining hybrid GA, as shown in Fig. 4: 1) the GA is applied first, serving as a preprocessor; 2) the GA is applied last, serving as the primary search routine; and 3) the GA interleaves with another optimization technique, which has often been referred to as a staged hybrid in the literature [16].

³The underlying foundation of this search strategy is Holland’s Schema Theory [22].
⁴These four categories are not mutually exclusive, because a specific hybrid approach can belong to multiple categories. For instance, a pipelining hybrid could introduce additional operators in the GA phase.

G-bit Improvement: G-bit improvement is one of the pipelining hybrids; it uses a simple local optimization method

Fig. 4. Three pipelining hybrid GA architectures: (a) preprocessor, (b) primary, and (c) staged pipelining.

by searching the neighbors of the best chromosomes in each generation [2]. The original idea of G-bit improvement for a binary GA is as follows: 1) select one or more of the best strings from the current population; 2) sweep bit by bit, performing successive one-bit changes to the subject string or strings, retaining the better of the last two alternatives; and 3) at the end of the sweep, insert the best structure (or the n best structures) into the population and continue the normal genetic search. For a real-coded version of the G-bit improvement algorithm, the process of flipping a bit is replaced by mutating a gene (e.g., through dynamic mutation). In later sections, we will compare our hybrid approach with G-bit improvement on real-coded GA’s for the metabolic modeling problem and two other optimization problems.

2) Asynchronous Hybrids: An asynchronous hybrid architecture uses a shared population to allow a GA and other optimization processes to proceed and cooperate asynchronously. One process might work on the problem by itself for several iterations before accessing the shared population again. If its findings are better than those in the shared population, it updates the shared population. However, if the process does not make any significant improvement after some time, it returns to the shared population to see if any other processes have posted any progress. Two approaches to building asynchronous hybrids are 1) to combine a process that converges slowly with another that converges swiftly or 2) to combine multiple processes that are each suitable for performing search in a subregion of the entire search space. The asynchronous teams (A-teams) methodology describes the first kind of combination by mating the GA with Newton’s method [14].

3) Hierarchical Hybrids: A hierarchical hybrid GA uses a GA and another optimization technique at two different levels of an optimization problem.
An example of a hierarchical hybrid is the combination of the genetic algorithm and multivariate adaptive regression splines (MARS) to create the G/SPLINES algorithm [15]. In this hierarchical hybrid, the GA searches for the best structure of a spline model at a high level, whereas the parameters of the spline model are computed using regression.


4) Additional Operators: A genetic algorithm can sometimes be improved by introducing additional reproduction operators that perform a one-step (or multistep) local search. Almost all local optimization techniques can be incorporated into GA’s this way, since each can be viewed as an operator that generates a “child” from one or more parents. Among them, the simplex method is particularly suited to this type of hybrid, since the entire population is already ranked by the GA; the computational overhead introduced by a new simplex operator is thus very low.

Partition-based versus Elite-based Hybrid Architectures: There are at least two hybrid GA architectures for introducing a new operator into a GA: 1) the partition-based hybrid GA architecture and 2) the elite-based hybrid GA architecture. In a partition-based hybrid GA architecture, the entire population is partitioned into disjoint subgroups. A fixed number of children is produced by each subgroup in a generation to replace a fixed number of the worst chromosomes in the subgroup. Each child can be produced by a conventional GA reproduction scheme or by a new operator (i.e., a local search step). Hence, the new operator is associated with a probability that indicates the likelihood that the operator is selected in generating a child during reproduction. In an elite-based hybrid architecture, the new operator is applied to top-ranking chromosomes to generate a portion of the new population, while the remaining population is generated by a conventional GA reproduction scheme. In both architectures, a fraction of the new generation is created using the new operator, while the remainder is produced by the conventional GA operators. In the first architecture, the new operator is applied to the entire selected population with a fixed probability. In the second architecture, the new operator is applied to top-ranking chromosomes with probability 1 and to lower-ranking chromosomes with probability 0.
Examples of elite-based hybrid GA’s include G-bit improvement on the GA and our simplex–GA hybrid. The partition-based hybrid GA was introduced in R–B’s work, even though they did not give this architecture a name [35]. As we shall see in Section VIII, the choice between these two architectures can have a significant impact on the performance of a hybrid GA system.
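As a concrete illustration of the elite-based architecture combined with a simplex-style operator, the sketch below applies one probabilistic reflection (a random factor drawn from a triangular distribution on [0, 2] peaking at 1, as in Section III) to the elite of a ranked population. The population layout, elite count, and the `reproduce` stub are illustrative assumptions, not the paper’s exact procedure.

```python
import random

def probabilistic_reflection(worst, centroid):
    """Probabilistic simplex step: reflect the worst point across the
    centroid by a random factor r ~ triangular on [0, 2], peak at 1."""
    r = random.triangular(0.0, 2.0, 1.0)
    return [c + r * (c - w) for c, w in zip(centroid, worst)]

def elite_hybrid_generation(pop, fitness, n_elite, reproduce):
    """One generation of an elite-based hybrid GA: the simplex-style
    operator is applied to the top n_elite chromosomes (probability 1);
    ordinary GA reproduction fills out the rest of the population."""
    ranked = sorted(pop, key=fitness)            # minimization: best first
    elites = ranked[:n_elite]
    worst = ranked[-1]
    dim = len(worst)
    centroid = [sum(e[i] for e in elites) / n_elite for i in range(dim)]
    child = probabilistic_reflection(worst, centroid)
    # Elites are copied to the next generation; conventional GA
    # reproduction (selection, crossover, mutation) produces the rest.
    rest = [reproduce(ranked) for _ in range(len(pop) - n_elite - 1)]
    return elites + [child] + rest
```

Note the contrast with the partition-based scheme: here the simplex step always fires on the elite, rather than being chosen with a fixed probability inside each subgroup.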

B. Simplex–GA Hybrid Approaches

In this section, we describe two approaches for incorporating the simplex method into the GA as an additional operator. We first describe a simplex–GA hybrid developed by R–B (the R–B hybrid). Then, we describe an alternative simplex–GA hybrid that we developed independently. Finally, we discuss the differences between the two hybrid approaches. The application of these two hybrid approaches to the metabolic modeling problem will be reported in the next section.

1) R–B’s Partition-based Simplex–GA Hybrid: R–B recently proposed a simplex–GA approach that partitions the entire population into groups of n + 1 chromosomes, where n is the number of variables to be optimized. The underlying hybrid GA architecture is therefore partition-based, as we


mentioned in the previous section. One of the following three operators can be applied to each group. 1) Discrete Crossover: For each gene in a child chromosome, the operator randomly chooses a parent in the group and copies its corresponding gene to the new chromosome. This child chromosome replaces the lowest ranking parent in the group. 2) Average: Replace the worst parent in the group with the average of all N + 1 parents. 3) Simplex: This is the N–M simplex method described in Section III with an expansion coefficient of 2 and a contraction coefficient of 0.5, with one modification: if the contracted point is even worse than the worst point, an offspring is generated by contracting the worst point toward the best point in the simplex. We will refer to this operation as "contraction toward the best" in this paper. In the rest of this paper, we will use this simplex method (referred to as the N–M simplex or the conventional simplex) for comparison with our probabilistic simplex. Exactly one child is produced by a group in a generation. The chance that a particular operator is applied to a group is determined by a probability associated with the operator. For convenience, we will refer to these probabilities as the crossover probability (denoted p_c), the average probability (denoted p_a), and the simplex probability (denoted p_s), respectively. 2) Our Elite-based Simplex–GA Hybrid: We developed an alternative simplex–GA hybrid independently by applying a concurrent version of the probabilistic simplex operator to top-ranking chromosomes [18]. Hence, our approach is based on the elite-based hybrid GA architecture. Concurrent Simplex: A concurrent simplex is very much like the classical simplex methods, with one minor difference. Instead of starting with N + 1 points in the simplex (where N is the number of variables to be optimized), the variant begins with M points, where M > N + 1. Like a classical simplex, the best N + 1 points are selected and their centroid is calculated.
However, instead of reflecting only one point across the centroid, the concurrent simplex reflects the remaining M − (N + 1) points across it to produce M − (N + 1) new points. All new points are reevaluated, and contraction operations are applied if needed. This process of ranking, selection, reflection, evaluation, contraction, and elimination iterates like the sequential simplex method. An example of the concurrent simplex in two-dimensional (2-D) space is shown in Fig. 5. The benefit of the concurrent simplex is that it can explore a wider search frontier. The main disadvantage is the overhead of evaluating and reflecting M − (N + 1) more points every iteration. Note also that the concurrent version can incorporate any one of the three simplex methods described in Section III. In our simplex–GA hybrid, the concurrent simplex is applied to the top M chromosomes in the population to produce M − (N + 1) children. The top N + 1 chromosomes are copied to the next generation. The remaining S − M chromosomes (where S is the total population size) are generated using the GA's reproduction scheme (i.e., selection, crossover, and


IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART B: CYBERNETICS, VOL. 28, NO. 2, APRIL 1998

TABLE III A COMPARISON OF R–B HYBRID AND OUR SIMPLEX–GA HYBRID
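For reference, one step of the modified N–M simplex used by the R–B hybrid (expansion coefficient 2, contraction coefficient 0.5, and the "contraction toward the best" fallback) might look like the following minimization sketch; the exact fallback formula is our reading of the text, not the paper's code:

```python
def simplex_step(points, f):
    """One step of the modified Nelder-Mead simplex (minimization).
    points: list of N+1 vertices. Returns the offspring that would
    replace the worst vertex."""
    pts = sorted(points, key=f)
    best, worst = pts[0], pts[-1]
    # Centroid of all vertices except the worst.
    c = [sum(p[i] for p in pts[:-1]) / (len(pts) - 1) for i in range(len(best))]
    refl = [2 * ci - wi for ci, wi in zip(c, worst)]            # reflection
    if f(refl) < f(best):
        exp = [ci + 2 * (ri - ci) for ci, ri in zip(c, refl)]   # expansion (coeff 2)
        return exp if f(exp) < f(refl) else refl
    if f(refl) < f(worst):
        return refl
    contr = [ci + 0.5 * (wi - ci) for ci, wi in zip(c, worst)]  # contraction (coeff 0.5)
    if f(contr) < f(worst):
        return contr
    # "Contraction toward the best": fall back to a point pulled
    # from the worst vertex toward the best vertex.
    return [(bi + wi) / 2.0 for bi, wi in zip(best, worst)]
```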

Fig. 5. An example of two-dimensional concurrent simplex.
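The concurrent reflection illustrated in Fig. 5 can be sketched as follows (a minimal minimization sketch; the deterministic coefficients are a simplification, since the probabilistic simplex would randomize this step, and the contraction fallback is illustrative):

```python
def concurrent_simplex_step(points, f, n_vars):
    """Reflect all but the best n_vars+1 of the M points across the
    centroid of the best n_vars+1 (minimization)."""
    pts = sorted(points, key=f)
    best = pts[:n_vars + 1]
    c = [sum(p[i] for p in best) / len(best) for i in range(n_vars)]
    children = []
    for w in pts[n_vars + 1:]:
        r = [2 * ci - wi for ci, wi in zip(c, w)]               # reflection
        if f(r) >= f(w):                                        # fall back to contraction
            r = [ci + 0.5 * (wi - ci) for ci, wi in zip(c, w)]
        children.append(r)
    return children
```

With M points, one call evaluates M − (N + 1) reflected candidates at once, which is the wider search frontier (and the extra evaluation cost) the text describes.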

Fig. 6. Reproduction in our elite-based simplex–GA hybrid.

in two major ways: in the underlying architecture and in the reproduction operators used. First, the R–B hybrid, which is partition-based, applies simplex reproduction to multiple disjoint subgroups of the population; our hybrid, which is elite-based, applies simplex reproduction to the top portion of the sorted population. Hence, the rationale of the R–B approach is to explore multiple search frontiers simultaneously using both genetic search and simplex local search. The R–B approach does not apply the simplex reproduction systematically; traditional crossover and mutation are also applied with some probability. In contrast, the rationale of our approach is to apply simplex local search to the more promising points, hoping to speed up the convergence rate when they are in the vicinity of an optimum point. In our hybrid, the GA search is still applied to the entire population to generate the remaining children. Therefore, the more promising points in our architecture participate in both the simplex reproduction and the GA reproduction. In the R–B hybrid, however, a subgroup can perform only one kind of reproduction in an iteration. The second difference between the two hybrid approaches lies in the reproduction operators they chose. The R–B approach uses the N–M simplex, while our approach uses the probabilistic simplex. Furthermore, the R–B approach uses multiparent discrete crossover and an average operator, while our approach uses two-parent crossover. We summarize the major differences between these two hybrid simplex–GA approaches in Table III.

Fig. 7. Our simplex–GA hybrid algorithm.
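The loop summarized in Fig. 7 can be paraphrased as the sketch below (our own simplification: a plain centroid reflection stands in for the concurrent probabilistic simplex, and the crossover/mutation settings are arbitrary placeholders, not the paper's):

```python
import random

def one_generation(pop, f, n_vars, simplex_frac):
    """One generation of an elite-based simplex-GA hybrid (minimization)."""
    pop = sorted(pop, key=f)
    m = max(n_vars + 2, int(simplex_frac * len(pop)))
    # Concurrent simplex on the top m chromosomes: reflect each of them
    # (beyond the best n_vars+1) across the centroid of the best n_vars+1.
    c = [sum(p[i] for p in pop[:n_vars + 1]) / (n_vars + 1.0)
         for i in range(n_vars)]
    children = [[2 * ci - wi for ci, wi in zip(c, w)] for w in pop[n_vars + 1:m]]
    new_pop = pop[:n_vars + 1] + children          # elites copied unchanged
    while len(new_pop) < len(pop):                 # rest by GA reproduction
        a, b = random.sample(pop, 2)
        child = [x if random.random() < 0.5 else y for x, y in zip(a, b)]
        if random.random() < 0.1:                  # mutation
            j = random.randrange(n_vars)
            child[j] += random.gauss(0.0, 0.1)
        new_pop.append(child)
    return new_pop[:len(pop)]
```

Iterating `one_generation` until a convergence criterion or a maximal trial number is reached gives the overall algorithm; a 0% simplex fraction degenerates to the pure GA, as the text notes.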

mutation). Fig. 6 depicts the reproduction stage of this hybrid approach. The algorithm of this simplex–GA hybrid approach is summarized in Fig. 7. The algorithm terminates when it satisfies a convergence criterion or reaches a predetermined maximal number of fitness evaluations (i.e., a maximal trial number). We will refer to a specific version of our architecture by the percentage of the population to which the concurrent simplex operator is applied. For example, a 50% simplex–GA applies the concurrent simplex to the top half of the population. A 100% simplex–GA means that no GA reproduction is performed. Obviously, a 0% simplex–GA is equivalent to the pure GA. 3) Comparison: Even though both the R–B hybrid and our hybrid combine the GA and simplex methods, they differ

V. APPLICATION TO METABOLIC MODELING

We have successfully applied our simplex–GA hybrid optimization approach to the metabolic modeling problem described in Section II. We implemented the approach by modifying the code of GENESIS [40]. In this section, we describe the design of this application and the empirical results obtained, which are compared with those of the original real-coded GA, the concurrent simplex, the R–B hybrid approach, ASA, and the G-bit improvement on the real-coded GA. To gain some insight into the benefits of our approach, we discuss the relationship between the sensitivity of the model parameters and the performance of all these approaches in searching for optimal parameter values. A. Design We describe three major design issues in implementing our hybrid simplex–GA for the metabolic modeling problem:

YEN et al.: A HYBRID APPROACH TO MODELING METABOLIC SYSTEMS


TABLE IV AVERAGE BEST FITNESS FOR TEN RUNS

Fig. 8. Architecture of the fitness evaluation.
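The fitness evaluation whose architecture Fig. 8 depicts can be sketched as follows; this is our reading of the two-component fitness, and the normalization and the t/T weighting over the last k steps are illustrative assumptions, not the paper's exact formula:

```python
def model_fitness(model_out, target, k=5):
    """Two-component fitness: normalized error plus convergence penalty.
    model_out, target: dicts mapping an observable variable to its T values."""
    error = 0.0
    for v, y in model_out.items():
        yhat = target[v]
        # First term: normalized error against the target behavior.
        error += sum(abs(a - b) / (abs(b) + 1e-9) for a, b in zip(y, yhat))
    T = len(next(iter(model_out.values())))
    penalty = 0.0
    for y in model_out.values():
        final = y[-1]
        # Second term: weighted distance from the final value over the last
        # k steps; later deviations get the higher weight t/T.
        penalty += sum((t / T) * abs(y[t] - final) for t in range(T - k, T))
    return error + penalty      # lower is better
```

A trajectory that both matches the target and settles to an equilibrium scores near zero; a drifting trajectory is penalized even if its pointwise error is small.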

the sum of two components as mentioned in Section II. The first term is the sum of normalized errors between the model output and the target model behavior on the three observable state variables. The second term measures how well the model output has converged, since the true model should converge to an equilibrium state. Because the convergence of a metabolic model's output is a necessary condition of a good model, the second term in the fitness function gives an additional penalty to a model that does not converge to an equilibrium state. The fitness function is as follows:

Fig. 9. Average best fitness of ten runs for each approach.

1) the encoding scheme, 2) the fitness evaluation, and 3) the choice of various GA parameters. 1) The Encoding Scheme: The overall objective of the metabolic modeling problem is to construct a model whose prediction is consistent with the experimental data. The specific objective of our glucose metabolic modeling problem is to find appropriate values for six parameters in the model described in Section II. These parameters are chosen because scientists are not very certain about their values. Consequently, a chromosome encodes six genes, each representing a parameter. Based on the knowledge about reasonable values for these parameters, we chose the actual ranges of the six parameters to be [0, 800], [0, 16 000], [0, 800], [0, 800], [0, 800], and [0, 800]. 2) The Fitness Evaluation: To evaluate the fitness of a chromosome, we assign the parameter values in the chromosome to their corresponding parameters. We then evaluate the metabolic model with these parameter assignments by simulating the model using the double-precision differential algebraic sensitivity analysis code (DDASAC) [41] with 30 time steps. Even though the metabolic model consists of 44 state variables, only three of them can actually be observed during experiments. Consequently, our system evaluates a parameter set by comparing the difference between the predictions and the target values of these three observable state variables. We obtained the target observations by simulating a target model. In our future research, the target observations will be obtained from actual experiments. The fitness function is

(8) where v denotes an output variable of the model, t denotes a time step, and y_v(t) and ŷ_v(t) denote the model output value and the target output value for variable v at time t, respectively. The maximum time step is denoted by T. The convergence criterion (i.e., the second component) is measured by a weighted sum of the distances between the model outputs during the last time steps and their final values. Within these time steps, higher weights are given to deviations that occur during later time steps. Hence, the lower the fitness value, the better the model. The architecture of the fitness evaluation is illustrated in Fig. 8. 3) Choice of GA Parameters: We chose GA parameters that are convenient for designing a fair comparison with the R–B approach. We used a population size of 147, which can be partitioned into 21 subgroups of seven chromosomes for the R–B approach. The probabilities of crossover and mutation are 0.25 and 0.061, respectively. The selection probability is determined by the ranking of the chromosomes. To evaluate the effectiveness of our simplex–GA hybrid approach for metabolic modeling, we compare the following approaches: 1) the pure real-coded GA; 2) a 45% simplex–GA hybrid; 3) a 100% concurrent probabilistic simplex;

TABLE V BEST OVERALL GUESS AFTER 12 000 TRIALS

4) the R–B hybrid; 5) the G-bit improvement on the real-coded GA; 6) ASA. The parameters of the R–B hybrid are those reported in their paper for a function maximization problem: 0.5 simplex probability, 0.2 crossover probability, and 0.2 average probability [35]. We chose the percentage of simplex reproductions in our hybrid approach (45%) such that the portion of simplex reproduction in the total reproduction is about the same as that of R–B's approach:

45% hybrid: (9)

R–B hybrid: (10)

where S is the population size, N is the number of parameters being optimized, k is the percentage of simplex in our hybrid, and p_s, p_c, and p_a are the probabilities in the R–B hybrid for applying the simplex operator, the crossover operator, and the average operator, respectively. Each approach was executed ten times, each time starting with a different initial population generated randomly. To ensure fair comparisons, the ten initial populations used by the six approaches are identical. B. Empirical Results Our empirical evaluations showed that our simplex–GA approach is highly effective for identifying the parameters of the metabolic modeling problem. Fig. 9 summarizes the performance of all six approaches by plotting the average of the best fitness (averaged over ten runs) versus the number of trials (i.e., the number of fitness evaluations). Table IV lists the average best fitness initially, after 5000 trials, and after 12 000 trials for each of the six approaches. The figure and the table indicate that the average final best fitness score found by our hybrid approach is the best among the six approaches, followed by ASA. (The temperature ratio scale of ASA was 10 and the temperature anneal scale was 100.) In fact, both of these top two approaches were able

to find good solutions by 5000 trials. Considering the variance of the final best fitnesses over the ten runs, our hybrid approach also outperformed all other approaches, which shows the robustness of our approach. The R–B simplex–GA hybrid ranked third in best fitness and convergence rate, followed by the G-bit improvement. The performance of the GA is inferior to all other approaches except the 100% concurrent simplex. Table V shows the overall best parameter values found by each approach after 12 000 trials. An interesting phenomenon revealed by this table is that identifying the more sensitive parameters is easier than identifying the less sensitive ones. Very good guesses of the more sensitive parameters can be found by all six approaches. However, most approaches had difficulty in finding optimal values of the less sensitive parameters. In particular, the values of the least sensitive parameters in the best final overall guesses found by almost all approaches except our 45% simplex–GA are far from the optimal values, even though their fitness scores seem reasonably good. This observation motivated us to explore the relationship between the sensitivity of parameters and the effort required by different approaches to find their optimal values. By doing this, we hope to gain further insight into the benefits of our hybrid approach so that we can generate a working hypothesis about the characteristics of the problems that are most suitable for our simplex–GA hybrid approach. C. Sensitivity Analysis The sensitivities of the six parameters in the metabolic model vary widely. Using DDASAC, we calculated the sensitivity of these parameters at the optimum. Table VI shows the sensitivities and the sensitivity rankings of the parameters. Interestingly, most of them are about ten times more sensitive than the next parameter in the ranking. Even for the parameter pairs whose sensitivities are closer, they still differ by about a factor of five.
Consequently, the least sensitive parameter (V6) has a sensitivity that is four orders of magnitude lower than that of the most sensitive parameter (V3). To see how the different methods behave in searching for the optimum of a parameter set whose sensitivities vary widely, we recorded the average of each parameter over the entire population for each run. These average values are

TABLE VI SENSITIVITY OF PARAMETERS
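The sensitivities in Table VI were computed with DDASAC; the same kind of ranking can be illustrated with a central finite difference of a fitness function around the optimum (a sketch; the toy two-parameter fitness, whose sensitivities differ by four orders of magnitude, is ours):

```python
def sensitivities(f, p_opt, rel_step=1e-3):
    """Scaled sensitivity |p_i * df/dp_i| of f at the point p_opt,
    estimated by a central finite difference."""
    result = []
    for i, p in enumerate(p_opt):
        h = rel_step * (abs(p) if p else 1.0)
        up = list(p_opt)
        up[i] = p + h
        dn = list(p_opt)
        dn[i] = p - h
        result.append(abs(p) * abs(f(up) - f(dn)) / (2 * h))
    return result
```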

Fig. 10. Correctness of parameter V3.

Fig. 11. Correctness of parameter V1.

Fig. 12. Correctness of parameter V5.

converted to a normalized error using the following equation:

e_i(t) = | p̄_i(t) − p_i* | / p_i* (11)

where p̄_i(t) denotes the average value of parameter i at trial t, and p_i* denotes the parameter's optimal value, which is shown in Table V. Finally, these errors are averaged over the ten runs. The results are summarized in Figs. 10–15, ordered by the sensitivity of the parameters. Because ASA does not use a population, it is difficult to establish a fair and meaningful comparison regarding a parameter's average error; consequently, ASA does not appear in these figures. Several important observations can be made from these results. 1) Even though the GA has less of a problem in finding close-to-optimal values for two of the more sensitive parameters (V3 and V1), it has great difficulty in converging to the optimal values of the less sensitive parameters (e.g., V2 and V6). In fact, the average correctness of the three least sensitive parameters made almost no improvement within 12 000 trials. This is because the GA's search is guided only by the ranking of fitness evaluations, which is dominated by the more sensitive parameters that have not yet converged. In particular, Figs. 10–12 show that the average search behavior of the GA during the first 3000 trials was dominated by the most sensitive parameter (V3). After 3000 trials, the GA began to improve the next two sensitive parameters (V1 and V5). However, the rate of convergence was slower for these two parameters; by the end of 12 000 trials, their average error scores were still significant. As a result, V1 and V5 dominated the later half of the GA search. The three least sensitive parameters (V4, V2, and V6) never had a chance to improve themselves, since they could not have much impact on the overall fitness evaluation due to their low sensitivity. 2) Compared to the GA, our hybrid approach explored parameters with a wide range of sensitivities much more effectively.
This can be explained by the search direction provided by the simplex method, which not only increased the rate of convergence toward optimal parameter values, but also enabled optimization of multiple parameters with varying sensitivities. In our experiment, our hybrid approach identified optimal values for the three most sensitive parameters (V3, V1, and V5) within

the first 4000 trials. Between trials 4000 and 7000, the hybrid approach significantly improved the next two less sensitive parameters (V4 and V2). Hence, even the least sensitive parameter (V6) was improved after 4000 trials. 3) The R–B simplex–GA hybrid, on average, converged faster than the GA for two of the three most sensitive parameters. However, some of its searches seemed to be trapped in local optima, which introduced an additional difficulty in finding optimum values for the three insensitive parameters. 4) The G-bit improvement, on average, converged to values for two parameters that are closer to their global optima than what the GA and the R–B simplex–GA found, and it was able to improve the correctness of one of them after 8000 trials. Surprisingly, its average normalized error on another parameter was the worst. A likely explanation of this phenomenon is

in [23]:

f(x) = Σ_{i=1}^{n} sin(x_i) · sin^{2m}(i · x_i^2 / π) (12)
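Equation (12) belongs to Michalewicz's function family; assuming the standard form, a sketch:

```python
import math

def michalewicz(x, m=10):
    """f(x) = sum_i sin(x_i) * sin(i * x_i^2 / pi)^(2m), maximized on
    [0, pi]^n; m controls the steepness of the valleys."""
    return sum(math.sin(xi) * math.sin((i + 1) * xi * xi / math.pi) ** (2 * m)
               for i, xi in enumerate(x))
```

Each term lies in [0, 1], so for n = 10 the value is bounded by 10; the reported maxima (about 9.66 for m = 10) sit close to that bound.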

Fig. 13. Correctness of parameter V4.

where 0 ≤ x_i ≤ π. We chose this function maximization problem because there are many (exactly n!) local optima in this function. As m grows, finding the maxima of this function becomes more and more difficult. R–B used two instances of this function family to test their hybrid approach by setting n to 10, and m to 10 and 100, respectively [35]. These problems are ten-dimensional optimization problems. The theoretical maxima of these two functions are 9.660 (for m = 10) and 9.655 (for m = 100). We used these two problems to test our hybrid approach (57% simplex–GA) and to compare its performance with those of the GA, the 100% concurrent simplex, the R–B hybrid, the G-bit improvement, and ASA. We chose to test the 57% simplex–GA because its percentage of simplex reproductions among all reproductions is close to that of the R–B hybrid:

57% hybrid: (13)

R–B hybrid: (14)

Fig. 14. Correctness of parameter V2.

Fig. 15. Correctness of parameter V6.

that the search of the G-bit improvement was still dominated by the more sensitive parameters by the termination time. 5) The concurrent probabilistic simplex had the most difficulty, failing to converge even to the optimum of the most sensitive parameter, partially because the probabilistic simplex was designed to complement the GA. Without the GA, it is easily trapped by a local optimum. VI. APPLICATION TO A FUNCTION MAXIMIZATION PROBLEM Because of the success of applying our simplex–GA hybrid to the metabolic modeling problem, we decided to further test our hybrid approach using two different problems. In this section, we test it using a function maximization problem; we test it using De Jong's F5 function minimization problem in the next section. The testbed used in this section is a function maximization problem used by R–B [35], which is an instance of a function family introduced by Michalewicz

This is similar to the reason we chose the 45% simplex–GA for the biomodeling problem. For each approach, the population size was 44 and the maximum trial number was 1 000 000. As in the biomodeling application, we compared the average performance of ten runs for each approach using a set of ten randomly generated initial populations. For the R–B hybrid, we used the parameters that gave the best performance in their experiments (i.e., 0.2, 0.2, and 0.5 for the crossover probability, the average probability, and the simplex probability, respectively) [35]. Fig. 16 plots the average best fitness versus the number of trials for the two function maximization problems. Because the performance of the 100% concurrent simplex is much worse than that of the other approaches, it is not included in the figures. We have also chosen the range of fitness in the figures so that the performances of the remaining approaches can be clearly distinguished. Since the approaches differ mainly in convergence rate, Table VII compares the average number of trials each approach took to find the optimum. The following observations are made from the figures and the table. 1) The real-coded GA, our 57% simplex hybrid, and the G-bit improvement found the theoretical maximum for both problems in every run. The R–B simplex hybrid found the optimum in eight runs for each problem. 2) Our hybrid outperformed all other approaches in terms of the convergence rate. The G-bit improvement had the second best performance. For the simpler problem (m = 10)

TABLE VII AVERAGE TRIALS DIFFERENT APPROACHES TOOK TO FIND THE OPTIMUM

TABLE VIII AVERAGE BEST FITNESS VALUES OBTAINED IN R–B’S EXPERIMENTS AND OUR EXPERIMENTS

Fig. 16. Performance for a function maximization problem: (a) m = 10 and (b) m = 100.

, our hybrid converged about eight times faster than the real-coded GA and about 2.5 times faster than the G-bit improvement in finding the optimum. For the difficult problem (m = 100), our hybrid converged about four times faster than the real-coded GA and about twice as fast as the G-bit improvement on the GA. 3) The performance of the real-coded GA was better than that of the R–B hybrid. 4) ASA was not robust in that it found the optimum in only three out of ten runs for the difficult problem (m = 100). We need to clarify a few differences between the results of our experiments and those reported by R–B [35]. The GA outperformed the R–B hybrid for both problems in our experiment, but was outperformed by the R–B hybrid in their experiment. One of the differences between the two experiments is the maximum number of trials allowed. To take this factor into consideration, we compared the best fitness values (averaged over ten runs) obtained by R–B's experiment and ours after

about the same number of trials. This comparison is shown in Table VIII, which indicates that our implementation of the R–B hybrid gave a performance similar to that of their own implementation. However, our implementation of the real-coded GA yielded a performance that is much better than that in R–B's experiment, especially for the problem where m is set to 100. This may be caused by other differences in the two GA implementations (e.g., selection probability, crossover probability, mutation probability, etc.). Since we could not find these details of R–B's GA implementation in their paper, we cannot analyze these differences further. The empirical results of applying our hybrid approach to the sin maximization problem suggest that our hybrid approach can be an effective method not only for solving the biomodeling problem, but also for solving complex function optimization problems. Our experiments indicated that our simplex–GA hybrid is superior to the alternative methods for the sin maximization problem. VII. APPLICATION TO DE JONG'S F5 FUNCTION MINIMIZATION PROBLEM We further tested our simplex–GA hybrid using De Jong's F5 function minimization problem [1]. The function is formulated as (15)

f5(x1, x2) = [ 1/K + Σ_{j=1}^{25} 1 / ( c_j + Σ_{i=1}^{2} (x_i − a_{ij})^6 ) ]^(−1)

with K = 500 and c_j = j, and with the columns (a_{1j}, a_{2j}) ranging over the 25 points of the grid {−32, −16, 0, 16, 32} × {−32, −16, 0, 16, 32}
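Assuming the standard De Jong F5 (Shekel's foxholes) definition, a sketch:

```python
def dejong_f5(x1, x2):
    """Standard De Jong F5 (Shekel's foxholes); globally minimized
    near (-32, -32) with value about 0.998."""
    grid = [-32.0, -16.0, 0.0, 16.0, 32.0]
    s = 0.002                      # 1/K with K = 500
    for j in range(25):
        a1, a2 = grid[j % 5], grid[j // 5]
        s += 1.0 / (j + 1 + (x1 - a1) ** 6 + (x2 - a2) ** 6)
    return 1.0 / s
```

Away from the 25 "foxholes" the sixth-power terms dominate and the function plateaus near 500, which is what makes the 25 narrow local optima hard to find.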

TABLE IX AVERAGE TRIAL NUMBERS OF DIFFERENT APPROACHES REACHING THE MINIMUM IN DE JONG'S F5 FUNCTION

for j = 1, ..., 25. The range of each x_i is [−65.536, 65.536]. This function is multimodal and globally minimized at (−32, −32) with the value of 0.998 004. We designed the GA parameters in both architectures so that the ratio of the update probability of a chromosome by the simplex to the update probability of a chromosome by the GA is equal in both hybrids. In our architecture, the simplex is applied to 28% of the population, the crossover probability was 0.25, and the mutation probability was 0.389. In R–B's architecture, the mutation probability was 0.59, p_s was 0.8, and p_c and p_a were 0.1. Table IX summarizes the results of solving the problem using the five approaches. All algorithms converged to the global optimum, but their convergence rates were different. Among the five algorithms, ASA was the best, followed by our hybrid algorithm. One hypothesis regarding ASA's performance on different problems is that it is very sensitive to the number of local optima in the problem. Even though ASA gives a superior performance on De Jong's F5 problem, its robustness is poor on the sin maximization problem (as discussed in the previous section). Because De Jong's F5 function has far fewer local optima than the sin maximization functions do (25 versus n!), we hypothesize that ASA is not as suitable for optimization problems that have a very large number of local optima. However, further research and experiments are needed to confirm or refine this hypothesis. VIII. DISCUSSIONS After comparing our hybrid approach with R–B's hybrid approach on these three problems, we wanted to gain some empirical insight into the factors that contributed to the differences in their performance. More specifically, we wish to address two questions. How does elitism affect the performance of our simplex–GA hybrid? How do the choice of the simplex operator and the choice of the hybrid architecture affect the performance of the two simplex–GA hybrid approaches? To answer these questions, we designed and performed additional experiments.
The rest of this section summarizes the results of these experiments and describes some plausible conclusions we drew from them. A. The Effect of Elitism To study the effect of elitism in our simplex–GA hybrid, we applied a real-coded GA with N elites (where N is the number of variables to be optimized) to the biomodeling problem and to the sin function maximization problems. Table X shows the average final best fitness for the biomodeling problem.

TABLE X COMPARING GA WITH N ELITES WITH OTHER APPROACHES FOR THE BIOMODELING PROBLEM

For ease of comparison, we also included the results of three other approaches: the real-coded GA, our hybrid, and the R–B hybrid. It should be pointed out that the real-coded GA used for comparison in Sections V–VII kept the chromosome ranked first in every generation; therefore, it is in fact a GA with one elite. Table X shows that the real-coded GA with N elites improved, on average, the final best fitness of the real-coded GA with one elite by 50% for the biomodeling problem. This is about half of the overall performance improvement of our 45% hybrid. Fig. 17 plots the average best fitness versus trials for the two sin maximization problems. The figures clearly show that the GA with N elites did improve the convergence rate of the original GA (with one elite). To compare their convergence rates, we summarize the average trials required by each approach to find the global optimum in Table XI. The GA with N elites reduced the average trials to find the global optimum by 34% for the problem with m = 10, but by only 5% for the problem with m = 100. It is worth noting that the GA with N elites converged much faster than the GA did during the first 100 000 trials for both problems, as shown in Fig. 17. However, the GA with N elites had more difficulty in finding precisely the global optimum for the problem with m = 100; in fact, it did not find the optimum precisely in two out of ten runs. These empirical analyses suggested that elitism is a major contributing factor to the performance improvement of our hybrid approach, even though it does not account for the entire performance improvement. B. The Effect of the Choice of Simplex Operator As we pointed out in Section IV-B, one of the main differences between our hybrid and the R–B hybrid is the choice of the simplex operator. We used the probabilistic simplex, while the R–B approach used the N–M simplex. It is thus important to understand the effect of the simplex method on the performance of the two hybrid architectures.
Toward this goal, we implemented both operators for both architectures, and applied them to the biomodeling problem, the sin function maximization problems, and De Jong’s F5 function. The results are shown in Table XII, Fig. 18, and Table XIII. We made the following observations from these empirical comparisons. First, the probabilistic simplex is more suitable for our elite-based hybrid architecture. For the biomodeling application of our 45% simplex–GA, the average final best fitness using the N–M simplex was 200 times worse than that using the probabilistic simplex. For the sin maximization

TABLE XI COMPARING THE CONVERGENCE RATE OF THE N-ELITES GA WITH OTHER APPROACHES FOR THE sin FUNCTION MAXIMIZATION PROBLEMS

Fig. 17. Comparing the N-elites GA with other approaches for the sin function maximization problems: (a) m = 10 and (b) m = 100.

Fig. 18. The effect of the two simplex operators on the two hybrids for solving the function maximization problems: (a) m = 10 and (b) m = 100.

TABLE XII THE EFFECT OF THE TWO SIMPLEX OPERATORS ON THE TWO SIMPLEX–GA HYBRIDS FOR SOLVING THE BIOMODELING PROBLEM

TABLE XIII THE EFFECT OF THE TWO SIMPLEX OPERATORS ON THE TWO SIMPLEX–GA HYBRIDS FOR SOLVING DE JONG'S F5 PROBLEM

applications of our 57% hybrid, the average best fitnesses using the N–M simplex were slightly worse than those using the probabilistic simplex. For the application of our hybrid approach to De Jong's F5 function, the probabilistic simplex reduced the trials needed to converge, relative to the N–M simplex, by about 40%. Second, we are less conclusive about the suitability of the probabilistic simplex operator for R–B's partition-based hybrid architecture. The probabilistic simplex improved (i.e., reduced) the final fitness of R–B's approach by a factor of three for the biomodeling problem. It also improved the convergence rate in solving De Jong's F5 function using R–B's architecture by 20%. Unfortunately, it did not improve the convergence rate for the sin maximization problem.

Third, the probabilistic simplex seemed more suitable for solving the biomodeling problem, because it significantly enhanced the performance of all three approaches (i.e., our hybrid, the R–B hybrid, and the 100% simplex), as shown in Table XII. This may be explained by first recognizing that a major challenge of the biomodeling problem lies in identifying parameters whose sensitivities vary widely. The main cost of the N–M simplex is the potential extra evaluations required by the expanded point or the contracted point, in addition to evaluating the reflected point. For a search space whose variables' sensitivities are of about the same order of magnitude (e.g., the sin maximization problem), the benefit


Fig. 19. Distribution of operators generating the best chromosomes in the real-coded GA.

of such additional evaluations may be comparable to the cost. Nevertheless, for a problem whose variables' sensitivities differ by several orders of magnitude (e.g., the biomodeling problem), the cost of these evaluations may outweigh their benefits. This working hypothesis will be examined in our future research. C. The Probabilistic Simplex versus the Conventional Simplex We also designed two studies to develop a better understanding of the cost-effectiveness tradeoff between the stochastic simplex and the conventional simplex. 1) Study A: Distribution of Reproduction Operators Generating the Best Chromosomes: In the first study, we analyzed the distribution of reproduction operators that generated the best chromosome in each generation for the biomodeling problem. To do this, we recorded the total number of best chromosomes (i.e., the chromosomes ranked first in a generation) that a particular reproduction operator (e.g., crossover, simplex, or mutation) had generated by a particular generation number. This information enables us to analyze how often a reproduction scheme directly contributes to the generation of a best chromosome.5 We summarize this information by plotting the cumulative number of best chromosomes generated by each reproduction scheme during the entire search process. Such a figure not only shows how frequently a reproduction scheme generates the best chromosome but also how the percentage changes over time. Figs. 19–21 show these plots for five optimization approaches to the biomodeling problem: 1) the real-coded GA; 2) our elite-based hybrid GA using the probabilistic simplex; 3) our elite-based hybrid GA using the N–M simplex; 4) the R–B partition-based hybrid GA using the probabilistic simplex; 5) the R–B partition-based hybrid GA using the N–M simplex. We made the following observations from these figures.
1) In our elite-based hybrid architecture, the simplex operator was the main source of best chromosomes, regardless of which simplex method was used. In contrast, the simplex operator rarely generated a best chromosome in R–B's partition-based hybrid GA, regardless of which simplex method we chose.

2) The probabilistic simplex generated the best chromosomes more frequently than the N–M simplex did for the elite-based hybrid GA, especially during the early stage of the search. By the midpoint of the search (i.e., 6000 trials), the probabilistic simplex accounted for about 70% of the best chromosomes generated, while the N–M simplex accounted for only 45%.

These observations suggested that our elite-based hybrid architecture makes more effective use of the simplex operators (both the conventional simplex and the stochastic simplex) than the partition-based hybrid architecture does. The study also indicated that, in our elite-based architecture, the stochastic simplex contributes more to generating the best chromosome than the conventional simplex does.

5 In reality, the best chromosome in a generation is the result of a large collection of reproductions occurring over many generations. However, it is difficult to study the indirect contributors to the production of best chromosomes.

Fig. 20. Distribution of operators generating the best chromosomes in our elite-based hybrid GA using (a) the probabilistic simplex and (b) the N–M simplex.

Fig. 21. Distribution of operators generating the best chromosomes in R–B's partition-based hybrid GA using (a) the probabilistic simplex and (b) the N–M simplex.

2) Study B: Cost-Effectiveness of the Probabilistic Simplex versus the Conventional Simplex: In the second study, we analyzed the percentage of the different operations (e.g., reflection, contraction, and expansion) actually used by the two simplex methods to gain insight into their cost-effectiveness. These operations have different costs in terms of the number of function evaluations required. The reflection operation has the lowest cost (one evaluation). The expansion and contraction operations cost two evaluations each. The cost of "contraction to the best" is the highest (three evaluations). The probabilistic simplex uses only reflection and contraction, while the R–B simplex we compare with, a minor variant of the N–M simplex, uses all four operations mentioned above. Figs. 22 and 23 show the accumulated counts of these operations in solving the biomodeling problem using the four hybrid approaches from the previous study. The following observations can be made from these figures.

1) The reflection operation, which has the lowest cost, occurred much more frequently in the elite-based hybrid than in the partition-based hybrid. This suggests that the elite-based hybrid GA architecture enables the simplex operator to be more cost-effective.

2) The probabilistic simplex used a higher percentage of reflection operations than the N–M simplex did in the elite-based hybrid GA. This finding suggests that the probabilistic simplex is more cost-effective than the N–M simplex in the elite-based GA architecture.

3) For the partition-based hybrid, the percentage of reflection operations was about the same for the two simplex methods. However, the significant portion of "contraction to the best" operations in the conventional simplex introduced extra cost into the search process.

This study suggested that the elite-based hybrid GA architecture uses both simplex methods more cost-effectively than the partition-based hybrid GA architecture does. It also suggested that the probabilistic simplex is more cost-effective than the conventional simplex for the elite-based architecture.
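The evaluation-cost accounting above can be made concrete with a sketch of a simplex step restricted to reflection and contraction; the randomly weighted centroid is an illustrative assumption about the stochastic element, not necessarily our exact distribution:

```python
import random

def stochastic_simplex_step(simplex, values, f, costs):
    """One step of a reflection/contraction-only stochastic simplex,
    tallying evaluation costs as in Study B: a successful reflection
    costs one evaluation; falling back to contraction costs two in
    total (the failed reflection plus the contracted point).

    `simplex` is a list of points sorted from best to worst, `values`
    their cached objective values (minimized), `f` the objective, and
    `costs` a dict accumulating evaluations per operation.
    """
    n = len(simplex[0])
    worst = simplex[-1]
    # Randomly weighted centroid of all points except the worst
    # (illustrative assumption about the probabilistic element).
    w = [random.random() + 1e-12 for _ in simplex[:-1]]
    s = sum(w)
    centroid = [sum(wi * p[i] for wi, p in zip(w, simplex[:-1])) / s
                for i in range(n)]
    reflected = [2 * c - x for c, x in zip(centroid, worst)]
    fr = f(reflected)                      # first evaluation
    if fr < values[-2]:                    # reflection accepted
        costs["reflection"] = costs.get("reflection", 0) + 1
        simplex[-1], values[-1] = reflected, fr
    else:                                  # contract toward the centroid
        contracted = [0.5 * (c + x) for c, x in zip(centroid, worst)]
        fc = f(contracted)                 # second evaluation
        costs["contraction"] = costs.get("contraction", 0) + 2
        if fc < values[-1]:
            simplex[-1], values[-1] = contracted, fc
    return simplex, values
```

Because the worst point is replaced after at most two evaluations, every step is no more expensive than a conventional N–M contraction, which is the cost advantage Study B measures.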

Fig. 22. Simplex operators applied in the elite-based hybrid using (a) the probabilistic simplex and (b) a conventional simplex.

Fig. 23. Simplex operators applied in the partition-based hybrid using (a) the probabilistic simplex and (b) a conventional simplex.

IX. SUMMARY

In this paper, we have introduced an elite-based hybrid GA approach that uses a probabilistic simplex method as an additional operator. The motivation for developing the probabilistic simplex is to introduce a cost-effective exploration component into the conventional simplex method. We have successfully applied a real-coded implementation of our simplex–GA hybrid to a metabolic modeling problem. We have compared our approach with 1) an alternative simplex–GA hybrid independently developed by R–B; 2) a pure real-coded GA; 3) a G-bit improvement on the real-coded GA; and 4) ASA.
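The elite-based architecture summarized above — rank the population, apply one simplex step to the top portion, and reproduce the rest with the usual genetic operators — can be sketched as follows (the elite fraction and the operator call signatures are illustrative assumptions):

```python
import random

def elite_based_generation(population, fitness, simplex_step,
                           crossover, mutate, elite_frac=0.2):
    """One generation of an elite-based hybrid GA (a sketch).

    The population (a list of real-coded chromosomes) is ranked by
    fitness (minimized); `simplex_step` is applied once to the elite
    portion, and the remainder of the next generation is produced by
    crossover and mutation.
    """
    ranked = sorted(population, key=fitness)
    # A simplex over n variables needs at least n + 1 points.
    n_elite = max(len(ranked[0]) + 1, int(elite_frac * len(ranked)))
    elite = simplex_step(ranked[:n_elite], fitness)
    offspring = []
    while n_elite + len(offspring) < len(ranked):
        a, b = random.sample(ranked, 2)
        offspring.append(mutate(crossover(a, b)))
    return elite + offspring
```

Because the simplex step operates only on the ranked elite, it refines the current best region while crossover and mutation continue to explore, which is the division of labor our empirical studies examined.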

In addition to the metabolic modeling problem, two other testbeds were used: a sin function maximization problem and De Jong's F5 function minimization problem. Our hybrid approach outperformed all other approaches on the biomodeling problem and the sin function maximization problem, and it gave the second-best performance, next to ASA, on De Jong's F5 function minimization problem. Further analyses indicated that the performance improvement of our hybrid approach is partially attributable to the use of the elite-based hybrid GA architecture and to the cost-effectiveness of the probabilistic simplex within that architecture. Based on an observation about the search behavior of our hybrid approach on the biomodeling problem, we conjectured that the proposed elite-based hybrid of the probabilistic simplex and GA is particularly suitable for complex optimization problems whose variables' sensitivities vary widely.

There are several issues remaining to be addressed in our future research. First, our working hypothesis regarding suitable applications of the proposed simplex–GA hybrid needs to be investigated through theoretical analysis or further empirical evaluations. Second, we plan to study the impact of different probability distributions for the probabilistic simplex on the performance of the hybrid system. Third, we need to fully investigate the relationship between the percentage of simplex reproduction and the performance of our hybrid. Finally, we plan to apply the hybrid approach to the identification of metabolic models using real experimental data.

ACKNOWLEDGMENT

The authors would like to thank the reviewers for their comments on an earlier draft of the paper. The software package for model simulation, DDASAC, originated from M. Caracotsios and W. E. Stewart [41]. The GENESIS implementation of the GA was developed by J. J. Grefenstette [42].

REFERENCES

[1] K. A. De Jong, "Analysis of the behavior of a class of genetic adaptive systems," Ph.D. dissertation, Dept. Comput. Commun. Sci., Univ. Michigan, Ann Arbor, 1975.
[2] D. E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning. Reading, MA: Addison-Wesley, 1989.
[3] H. Kargupta and R. E. Smith, "System identification with evolving polynomial networks," in Proc. 4th Int. Conf. Genetic Algorithms, San Diego, CA, July 1991.
[4] K. Kristinnson and G. A. Dumont, "System identification and control using genetic algorithms," IEEE Trans. Syst., Man, Cybern., vol. 22, no. 5, pp. 1033–1046, 1992.
[5] H. Iba, T. Kurita, H. de Garis, and T. Sato, "System identification using structured genetic algorithms," in Proc. 5th Int. Conf. Genetic Algorithms, Urbana-Champaign, IL, July 1993.
[6] D. M. Etter, M. J. Hicks, and K. H. Cho, "Recursive adaptive filter design using an adaptive genetic algorithm," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP '82), Paris, France, May 1982, vol. 2, pp. 635–638.
[7] S. Uckun, S. Bagchi, and K. Kawamura, "Managing genetic search in job shop scheduling," IEEE Expert, vol. 8, no. 5, pp. 15–24, 1993.
[8] A. B. Conru, "A genetic approach to the cable harness routing problem," in Proc. 1st IEEE Conf. Evolutionary Computation, Orlando, FL, June 1994.
[9] D. P. Kwok and F. Sheng, "Genetic algorithm and simulated annealing for optimal robot arm PID control," in Proc. 1st IEEE Conf. Evolutionary Computation, Orlando, FL, June 1994.
[10] D. Park and A. Kandel, "Genetic-based new fuzzy reasoning models with application to fuzzy control," IEEE Trans. Syst., Man, Cybern., vol. 24, pp. 39–47, Jan. 1994.

[11] T. Smith and K. A. De Jong, "Genetic algorithms applied to the calibration of information driven models of U.S. migration patterns," in Proc. 12th Annu. Pittsburgh Conf. Modeling Simulation, Pittsburgh, PA, 1981, pp. 955–959.
[12] D. J. Janson and J. F. Frenzel, "Training product unit neural networks with genetic algorithms," IEEE Expert, vol. 8, no. 5, pp. 26–33, 1993.
[13] M. F. Bramlette, "Finding maximum flow with random and genetic search," in Proc. 1st IEEE Conf. Evolutionary Computation, Orlando, FL, June 1994.
[14] P. S. de Souza and S. N. Talukdar, "Genetic algorithm in asynchronous teams," in Proc. 4th Int. Conf. Genetic Algorithms, San Diego, CA, July 1991, pp. 392–397.
[15] D. Rogers, "G/SPLINES: A hybrid of Friedman's multivariate adaptive regression splines (MARS) algorithm with Holland's genetic algorithm," in Proc. 4th Int. Conf. Genetic Algorithms, San Diego, CA, July 1991, pp. 384–391.
[16] K. E. Mathias, L. D. Whitley, C. Stork, and T. Kusuma, "Staged hybrid genetic search for seismic data imaging," in Proc. 1st IEEE Conf. Evolutionary Computation, Orlando, FL, June 1994, pp. 356–361.
[17] H. Ishibuchi, N. Yamamoto, T. Murata, and H. Tanaka, "Genetic algorithms and neighborhood search algorithms for fuzzy flowshop scheduling problems," Fuzzy Sets Syst., vol. 67, no. 1, pp. 81–100, 1994.
[18] J. Yen, J. C. Liao, D. Randolph, and B. Lee, "A hybrid approach to modeling metabolic systems using genetic algorithm and simplex method," in Proc. 11th IEEE Conf. Artificial Intelligence Applications (CAIA '95), Los Angeles, CA, Feb. 1995, pp. 277–283.
[19] J. A. Nelder and R. Mead, "A simplex method for function minimization," Comput. J., vol. 7, pp. 308–313, 1965.
[20] J. Renders and S. Flasse, "Hybrid methods using genetic algorithms for global optimization," IEEE Trans. Syst., Man, Cybern. B, vol. 26, pp. 243–258, Apr. 1996.
[21] D. B. McGarrah and R. S. Judson, "Analysis of the genetic algorithm method of molecular conformation determination," J. Comput. Chem., vol. 14, no. 11, pp. 1385–1395, 1993.
[22] J. H. Holland, Adaptation in Natural and Artificial Systems. Ann Arbor, MI: Univ. Michigan Press, 1975.
[23] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs. New York: Springer-Verlag, 1992.
[24] C. Z. Janikow and Z. Michalewicz, "An experimental comparison of binary and floating point representation in genetic algorithms," in Proc. 4th Int. Conf. Genetic Algorithms, San Diego, CA, July 1991, pp. 31–36.
[25] J. Kowalik and M. R. Osborne, Methods for Unconstrained Optimization Problems. New York: Elsevier, 1968.
[26] R. W. Daniels, An Introduction to Numerical Methods and Optimization Techniques. New York: North-Holland, 1978.
[27] H. P. Schwefel, Numerical Optimization of Computer Models. New York: Wiley, 1981.
[28] M. J. D. Powell, "An efficient method for finding the minimum of a function of several variables without calculating derivatives," Comput. J., vol. 7, pp. 155–162, 1964.
[29] R. Hooke and T. A. Jeeves, "Direct search solution of numerical and statistical problems," J. ACM, vol. 8, pp. 212–229, 1961.
[30] W. Spendley, G. R. Hext, and F. R. Himsworth, "Sequential application of simplex designs in optimization and evolutionary operation," Technometrics, vol. 4, pp. 441–461, 1962.
[31] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, "Optimization by simulated annealing," Science, vol. 220, pp. 671–680, 1983.
[32] P. J. M. van Laarhoven and E. H. L. Aarts, Simulated Annealing: Theory and Applications. Boston, MA: Kluwer, 1987.
[33] L. Ingber, "Simulated annealing: Practice versus theory," Math. Comput. Model., vol. 18, no. 11, pp. 29–57, 1993.
[34] L. Ingber and B. Rosen, "Genetic algorithms and very fast simulated annealing: A comparison," Math. Comput. Model., vol. 16, no. 11, pp. 87–100, 1992.
[35] J. Renders and H. Bersini, "Hybridizing genetic algorithms with hill-climbing methods for global optimization: Two possible ways," in Proc. 1st IEEE Conf. Evolutionary Computation, Orlando, FL, June 1994, pp. 312–317.
[36] D. H. Ackley, "Stochastic iterated genetic hillclimbing," Ph.D. dissertation, Carnegie Mellon Univ., Pittsburgh, PA, 1987.
[37] G. Dozier, J. Bowen, and D. Bahler, "Solving small and large scale constraint satisfaction problems using a heuristic-based microgenetic algorithm," in Proc. 1st IEEE Conf. Evolutionary Computation, Orlando, FL, June 1994.
[38] T. Murata and H. Ishibuchi, "Performance evaluation of genetic algorithms for flowshop scheduling problems," in Proc. 1st IEEE Conf. Evolutionary Computation, Orlando, FL, June 1994.
[39] J. Renders, Algorithmes Génétiques et Réseaux de Neurones: Applications à la Commande de Processus. Paris, France: Hermès, 1994.
[40] J. J. Grefenstette, User's Guide to GENESIS Version 5.0, 1990.
[41] M. Caracotsios and W. E. Stewart, "Sensitivity analysis of initial value problems with mixed ODE and algebraic equations," Comput. Chem. Eng., vol. 9, pp. 359–365, 1985.
[42] J. Grefenstette, "GENESIS: A system for using genetic search procedures," in Proc. 1984 Conf. Intelligent Systems Machines, pp. 161–165.

John Yen (SM’91) received the B.S. degree in electrical engineering from National Taiwan University, Taipei, Taiwan, R.O.C., in 1980 and the Ph.D. degree in computer science from the University of California, Berkeley, in 1986. Since 1989, he has been an Associate Professor with the Department of Computer Science and the Director of the Center for Fuzzy Logic, Robotics, and Intelligent Systems, Texas A&M University, College Station. Previously, he conducted artificial intelligence research as a Research Scientist at the Information Sciences Institute, University of Southern California, Los Angeles. His research interests include artificial intelligence, fuzzy logic, software engineering, and evolutionary computing. He is an Associate Editor of the IEEE TRANSACTIONS ON FUZZY SYSTEMS. Dr. Yen is a member of the board of directors of the North American Fuzzy Information Processing Society (NAFIPS), the Secretary of the International Fuzzy Systems Association (IFSA), and the Newsletter Editor of the IEEE Neural Network Council. He received the National Science Foundation Young Investigator Award in 1992 and the K. S. Fu Award from NAFIPS in 1994.

James C. Liao received the B.S. degree in chemical engineering from National Taiwan University, Taipei, Taiwan, R.O.C., in 1980 and the Ph.D. degree in chemical engineering from the University of Wisconsin, Madison, in 1987. He is currently a Professor of Chemical Engineering at the University of California, Los Angeles. Previously, he was on the faculty of Texas A&M University, College Station. Prior to that, he was a Research Scientist at Eastman Kodak Company, Rochester, NY. His current research includes metabolic engineering, metabolic control and flux analysis, and regulation in microcirculation. Dr. Liao received the National Science Foundation Young Investigator Award in 1992, and is a member of the American Institute of Chemical Engineers, the American Chemical Society, and the American Society of Microbiology.


Bogju Lee received the B.S. degree in computer engineering from Seoul National University, Seoul, Korea, in 1986, the M.S. degree in computer science from the University of South Carolina, Columbia, in 1992, and the Ph.D. degree in computer science from Texas A&M University, College Station, in 1996. He is currently a Member of Technical Staff, AT&T Laboratories, Middletown, NJ. His current research includes fuzzy logic, hybrid genetic algorithms, and complex system modeling and identification using artificial intelligence. Dr. Lee is a member of the American Association for Artificial Intelligence.

David Randolph received the M.S. degree in computer science from Texas A&M University, College Station, in 1995, and the B.A. degree in computer science from Stetson University, De Land, FL, in 1990. He is currently an Engineer at Raytheon, Garland, TX. His area of interest is the optimization of distributed systems.