Engineering Optimization Vol. 37, No. 7, October 2005, 685–703
A framework for design optimization using surrogates

KOK SUNG WON*† and TAPABRATA RAY‡

†Temasek Laboratories, 5 Sports Drive 2, National University of Singapore, Singapore 117508
‡School of Aerospace, Civil and Mechanical Engineering, University of New South Wales, Australian Defence Force Academy

(Received 12 May 2004; revised 7 December 2004; in final form 1 June 2005)

Design optimization is a computationally expensive process as it requires the assessment of numerous designs, and each such assessment may be based on expensive analyses (e.g. computational fluid dynamics or finite element based methods). One way to contain the computational time within affordable limits is to use computationally cheaper approximations (surrogates) in lieu of the actual analyses during the course of optimization. This article introduces a framework for design optimization using surrogates. The framework is built upon a stochastic, zero-order, population-based optimization algorithm, which is embedded with a modified elitism scheme to ensure convergence in the actual function space. The accuracy of the surrogate model is maintained via periodic retraining, and the number of data points required to create the surrogate model is identified by a k-means clustering algorithm. A comparison is provided between different surrogate models (Kriging, radial basis functions (Exact and Fixed) and Cokriging) using a number of mathematical test functions and engineering design optimization problems. The results clearly indicate that, for a given fixed number of actual function evaluations, the surrogate-assisted optimization model consistently performs better than a pure optimization model using actual function evaluations.

Keywords: Kriging; Cokriging; Radial basis function
1. Introduction
Modeling and managing surrogates within an optimization framework is a fairly recent and active area of research. Although most real-life optimization problems involve complex and computationally expensive analyses, the computational burden can be contained within affordable limits by using inexpensive approximations (surrogates) in lieu of the actual analyses (Wang et al. 2000). The literature in the area of surrogate-assisted optimization can be broadly classified into two major groups. The first group focuses on a better choice of experiment planning and on the development of accurate approximation models, whereas the second group concentrates more on efficient management of the approximation model (Jin et al. 2002) to ensure that it attains the optimum of the actual problem.

*Corresponding author. Email: [email protected]
Optimization algorithms can be broadly classified into two categories: gradient-based methods and stochastic methods. Gradient-based methods include conventional techniques such as steepest-descent and conjugate-gradient algorithms, whereas stochastic methods are fairly recent, zero-order innovations such as simulated annealing (SA) and evolutionary algorithms. Unfortunately, methods from both categories require the evaluation of numerous candidate solutions, and such evaluations are computationally expensive as they often involve computational fluid dynamics (CFD) or finite element method computations. A natural question that arises is: 'Is it possible to reduce the number of such function evaluations and, if so, how accurate are the results?' This work aims to provide some answers to this broad question and to introduce the surrogate-assisted design optimization framework. The authors, in an earlier article (Won et al. 2003), introduced a primitive form of this framework and reported the performance of radial basis function (RBF)-based surrogates on a set of five mathematical test functions. This article builds further on the framework to incorporate ideas from both groups, i.e. it explores better choices of surrogates and designs efficient surrogate management schemes that can handle constraints. The issues addressed in this article are summarized as follows:

(a) The RBF-based surrogate model performed reasonably well on the test functions in the earlier study (Won et al. 2003). That RBF-Fixed model was created using m/2 solutions, where m was the population size. For this study, an additional RBF model is introduced, called the RBF-Exact model, which uses all the data points as training input. This study investigates whether the use of the entire population for training leads to degradation of the RBF model's predictions due to over-fitting and the introduction of false optima, as pointed out by Haykin (1999) and Jin (2003).

(b) To provide the designer with a greater choice of surrogate models, Kriging is introduced as an alternative to RBF-based surrogates. Kriging is particularly attractive as the confidence interval of the estimation can be obtained without additional computational cost.

(c) There is also an important class of problems where the primary performance function is computationally expensive to evaluate but a secondary performance function exists that is cheap to compute and at the same time known to be highly correlated with the primary function. A typical example would be adjoint-based formulations, where gradient information is readily available in addition to the performance function. A surrogate model based on a Cokriging formulation is introduced, which is able to make use of such secondary performance information to enhance the efficiency of the search.

The remainder of this article is structured as follows: section 2 provides a description of the framework with all the necessary details, whereas section 3 delineates the mathematical formulations of the initialization scheme, k-means clustering, and the RBF, Kriging and Cokriging models. In section 4, the experiment setup and the empirical results are presented and discussed. Section 5 summarizes the capabilities of the framework and lists some ongoing efforts to further improve its usability and efficiency.
2. Outline of framework
The optimization algorithm supporting the framework is a population-based, stochastic, zero-order, elite-preserving scheme that makes use of approximate function evaluations in lieu of actual function evaluations. A distinction has to be made here between an actual and an approximate function call. An approximate function call refers to the value predicted by the surrogate model, whereas an actual function call refers to the value derived from the
computationally intensive procedures, e.g. CFD. The focus is on reducing the number of actual function calls. The pseudo-code of the framework is shown in figure 1a, and figure 1b shows additional details of the retraining phase of the framework, in which the surrogate model is trained on actual-function-evaluated individuals in the population.
Figure 1. (a) Pseudo-code of the proposed optimization framework. (b) Details of the training/retraining phase of the proposed optimization framework.
The accuracy and relevance of the surrogate model are maintained using periodic retraining. During the retraining phase, all the individuals of the population are evaluated using actual function evaluations, and the best solution, referred to as the 'actual elite', is carried over to the population of the next generation, and so on until it is time for retraining again. The set of elites identified from the approximation, referred to as 'approximated elites', is also carried over to the population of the next generation. The presence of the actual elite in the population ensures a monotonic convergence in the actual function space.

In figure 1a, λ and γ are integer values that denote the predetermined number of generations and the predetermined retraining frequency, respectively. A large value of γ generally results in an inferior surrogate model that might cause incongruence between the approximate function space and the actual function space, whereas a small value of γ retains the validity and accuracy of the surrogate model at the expense of computational cost. Therefore, γ may be regarded as a user-defined specification of the desired resolution of the surrogate model, which can be adjusted on the basis of the complexity of the problem and the availability of computational resources. Experience shows that a γ value of 2–10 yields a robust surrogate model. As the entire optimization framework is a population-based genetic algorithm, the number of function evaluations tends to be large; for this framework, λ is usually set to 8000–15,000. The number of actual function evaluations incurred is nevertheless far smaller than that required by other algorithms (Deb et al. 2002, Yao et al. 1999), which need orders of magnitude more, because the framework is driven by approximate rather than actual function evaluations; hence the immense savings in computational cost.

In a typical GA, elitism refers to maintaining the best solution across generations. In this framework, it is necessary to modify the standard elitism scheme to carry over promising solutions derived from the approximation (approximated elites) as well as the best solution obtained from actual computation (actual elite). As mentioned earlier, maintaining the actual elite ensures convergence in the actual function space. A brief discussion is now presented of the ranking and elite identification scheme employed within the algorithm. A single-objective constrained optimization problem can be stated as follows:
Minimize
    f(x)                                                    (1)
subject to
    g_i(x) ≥ a_i,    i = 1, 2, . . . , r_1                  (2)
    h_j(x) = b_j,    j = 1, 2, . . . , r_2                  (3)
where there are r_1 inequality and r_2 equality constraints and x = [x_1 x_2 · · · x_n] is the vector of n design variables. To handle equality constraints, each equality constraint is replaced by a pair of inequalities of the form h_j(x) ≥ b_j − δ and h_j(x) ≤ b_j + δ. Thus the r_2 equality constraints lead to 2r_2 inequalities, and the total number of inequalities for the problem is denoted by s = r_1 + 2r_2. Given a population of m solutions, the feasible and infeasible solutions are separated into two sets (set F for feasible solutions and set IF for infeasible solutions). Assume that there are p solutions in set F and q solutions in set IF. The feasible solutions are sorted on the basis of their objective function value such that the best solution has rank 1 and the worst solution has rank R, where R ≥ 1. The constraint matrix for the set of q infeasible
solutions takes the following form:

    [ c_11  c_12  · · ·  c_1s ]
    [ c_21  c_22  · · ·  c_2s ]
    [  ...   ...  · · ·   ... ]                              (4)
    [ c_q1  c_q2  · · ·  c_qs ]
For each infeasible solution, c denotes the constraint satisfaction vector c = [c_1 c_2 . . . c_s], where

    c_i = 0                   if the ith constraint is satisfied, i = 1, 2, . . . , s
    c_i = a_i − g_i(x)        if the ith constraint is violated, i = 1, 2, . . . , r_1
    c_i = b_i − δ − h_i(x)    if the ith constraint is violated, i = r_1 + 1, . . . , r_1 + r_2
    c_i = −b_i − δ + h_i(x)   if the ith constraint is violated, i = r_1 + r_2 + 1, . . . , s    (5)

For the c_i's in equation (5), c_i = 0 indicates that the ith constraint is satisfied, whereas c_i > 0 indicates violation of the constraint. The ranks of the infeasible solutions are derived using non-dominated sorting based on the constraint matrix (equation (4)), and are then incremented by a constant value equal to the rank of the worst feasible solution. The rank of every solution in the population is then converted to fitness as follows:

    Fitness(i) = Max.Rank + 1 − Rank(i)                      (6)
where Max.Rank is the maximum rank of an individual in the population. Solutions with rank less than or equal to the average rank of the individuals in the population are referred to as elites. Ray et al. (2000) first proposed the use of non-dominated ranks to deal with constraints, and in this study the ranking of the solutions is based on the non-dominated sorting genetic algorithm (Srinivas and Deb 1995).
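To make the interplay of actual calls, approximate calls and the two kinds of elites concrete, the following is a minimal Python sketch of the main loop of figure 1a. The helper names (init_pop, evolve, train_surrogate) are illustrative assumptions, and the rank-based elite rule described above is simplified here to a median cut over the current fitness values.

```python
import numpy as np

def surrogate_assisted_ga(f_actual, init_pop, evolve, train_surrogate,
                          n_gen=10000, gamma=10):
    """Sketch of the framework loop: a GA driven by approximate evaluations,
    with actual evaluations and surrogate retraining every gamma generations.
    The three callables are hypothetical stand-ins for the operators of
    sections 2-4."""
    pop = init_pop()                                   # population, shape (m, n)
    y = np.array([f_actual(x) for x in pop])           # actual function calls
    surrogate = train_surrogate(pop, y)
    actual_elite, actual_elite_y = pop[np.argmin(y)].copy(), y.min()

    for gen in range(1, n_gen + 1):
        if gen % gamma == 0:                           # retraining phase
            y = np.array([f_actual(x) for x in pop])   # actual function calls
            surrogate = train_surrogate(pop, y)
            if y.min() < actual_elite_y:               # update the 'actual elite'
                actual_elite, actual_elite_y = pop[np.argmin(y)].copy(), y.min()
        else:
            y = np.array([surrogate(x) for x in pop])  # approximate calls only

        elites = pop[y <= np.median(y)]                # 'approximated elites'
        pop = evolve(elites, pop)                      # crossover/selection step
        pop[0] = actual_elite   # actual elite present => monotonic convergence
    return actual_elite, actual_elite_y
```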
3. Initialization scheme and surrogate models
This section provides the mathematical details of the initialization scheme and the surrogate models that have been used in this study.
3.1 Random initialization
Random initialization is the most commonly used technique to initialize a population of solutions (Yao et al. 1999). The solutions are created using equation (7):

    x̃ = x_low + δ(x_upp − x_low)                             (7)
where x̃ denotes the initialized variable, x_low and x_upp denote the lower and upper bounds of the variable, and δ is a uniform random number lying between 0 and 1.
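In vectorized form, equation (7) is essentially a one-liner; the sketch below (with our own function name) initializes a whole population at once.

```python
import numpy as np

def random_init(m, x_low, x_upp, rng=None):
    """Create m solutions via equation (7): x = x_low + delta*(x_upp - x_low)."""
    rng = rng or np.random.default_rng()
    delta = rng.random((m, len(x_low)))           # uniform delta in [0, 1)
    return x_low + delta * (x_upp - x_low)        # broadcasts over the population

# e.g. a population of 50 solutions for the 10-variable test problems on [-5, 5]
pop = random_init(50, np.full(10, -5.0), np.full(10, 5.0))
```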
3.2 k-Means clustering algorithm
The k-means clustering algorithm is used to identify the k data points that are used to create the surrogate model. Consider a set of m data points {x_1, x_2, . . . , x_m} in n-dimensional space, from which it is desired to obtain k centers C = {c_1, . . . , c_k} using the k-means algorithm. The steps involved can be summarized as follows.
1. Assign the first k data points as the k centers, i.e. C = {x_1, . . . , x_k} = {c_1, . . . , c_k}.

2. For each data point x_i, compute its membership function ψ and weight w:

    ψ(c_l | x_i) = 1 if l = arg min_j ||x_i − c_j||², and 0 otherwise            (8)
    w(x_i) = 1,    ∀ i = 1, . . . , m                                            (9)

It can be seen from the definition of the membership function that k-means uses a hard membership and a constant weight function that gives all data points equal importance.

3. For each center c_j, recompute its location from all data points x_i according to their memberships and weights:

    c_j = [ Σ_{i=1}^m ψ(c_j | x_i) w(x_i) x_i ] / [ Σ_{i=1}^m ψ(c_j | x_i) w(x_i) ]    (10)

Steps 2 and 3 are repeated until convergence, which is usually detected by the membership function remaining unchanged for all data points between iterations. k-Means clustering is a popular choice as it is easy to understand and implement.
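A compact implementation of steps 1–3 follows; since the weights in equation (9) are all unity, the weighted update of equation (10) reduces to the plain mean of each cluster's members.

```python
import numpy as np

def k_means(X, k, max_iter=100):
    """k-means per equations (8)-(10): hard memberships, unit weights."""
    centers = X[:k].copy()                      # step 1: first k points as centers
    labels = None
    for _ in range(max_iter):
        # step 2: membership = index of the nearest center (equation (8))
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break                               # memberships unchanged: converged
        labels = new_labels
        # step 3: recompute each center from its members (equation (10))
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers, labels
```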
3.3 Radial basis functions
RBFs belong to the class of artificial neural networks and are a popular choice for approximating non-linear functions. This section provides, for completeness, the details necessary to implement an RBF network. An RBF φ is one whose output is symmetric around an associated centre µ, that is, φ(x) = φ(||x − µ||), where the argument of φ is a vector norm (Sundararajan and Saratchandran 1999). Usually the Euclidean norm is adopted (Jin 2003), and selecting φ(r) = e^(−r²/σ²) yields the Gaussian function as an RBF, where σ is the width or scale parameter. A set of RBFs can serve as a basis for representing a wide class of functions that are expressible as linear combinations of the chosen RBFs:

    y(x) = Σ_{j=1}^m w_j φ(||x − x_j||)                      (11)
However, equation (11) is usually very expensive to implement if the number of data points is large. Thus a generalized RBF network is usually adopted (Haykin 1999, Jin 2003):

    y(x) = Σ_{j=1}^k w_j φ(||x − µ_j||)                      (12)
Here k is typically smaller than m, and the w_j are the unknown parameters that are to be 'learned'. The k data points (the centers µ_j) are determined by the k-means clustering described previously. The training is usually achieved via the least-squares solution:

    w = A⁺d                                                  (13)

where A⁺ is the pseudo-inverse of the activation matrix A and d is the target output vector. The pseudo-inverse is used because A is typically a rectangular matrix and thus no inverse exists. However, the computation of the pseudo-inverse requires a matrix inversion, which is computationally expensive for large problems; thus recursive least-squares estimation (Astrom and Wittenmark 1984, Strobach 1990) is often used.
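The training and prediction steps of equations (12) and (13) fit in a few lines with numpy's pseudo-inverse. This is a sketch under our own naming; the Gaussian activation matrix A, the width σ, and the direct pinv call (rather than recursive least squares) are the simplifying assumptions.

```python
import numpy as np

def rbf_train(X, d, centers, sigma=1.0):
    """Solve w = A+ d (equation (13)), where A_ij = phi(||x_i - mu_j||)
    and phi(r) = exp(-r^2 / sigma^2) is the Gaussian basis."""
    r2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    A = np.exp(-r2 / sigma ** 2)                # m x k activation matrix
    return np.linalg.pinv(A) @ d                # least-squares weights

def rbf_predict(x, centers, w, sigma=1.0):
    """Evaluate y(x) = sum_j w_j phi(||x - mu_j||) (equation (12))."""
    r2 = ((x - centers) ** 2).sum(axis=1)
    return np.exp(-r2 / sigma ** 2) @ w
```

For RBF-Exact the centers are the m training points themselves, whereas for RBF-Fixed they are the m/2 centers returned by the k-means step above.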
3.4 Kriging model
Kriging is an approximation scheme that can be used in lieu of RBFs. It relies on two component models that can be expressed mathematically as

    y(x) = f(x) + Z(x)                                       (14)
where f(x) represents a global model and Z(x) is the realization of a stationary Gaussian random function with zero mean and non-zero covariance that creates a localized deviation from the global model (Koehler and Owen 1996). If f(x) is taken to be an underlying constant (Simpson et al. 1998), equation (14) becomes:

    y(x) = β + Z(x)                                          (15)
The estimated model prediction of equation (15) is given by

    ŷ = β̂ + rᵀ(x) R⁻¹ (y − eβ̂)                               (16)
where y is the column vector of response data of size m, e is a unit vector of length m, and r is the correlation vector of length m between the given input x and the data samples {x_1, x_2, . . . , x_m}. R is the symmetric m × m correlation matrix with values of unity along the diagonal. The correlation function is specified by the user, and this study has used a Gaussian exponential correlation function of the form provided by Giunta and Watson (1998):

    R(x^i, x^j) = exp( −Σ_{k=1}^n θ_k (x_k^i − x_k^j)² )     (17)
where x_k^i denotes the kth variable of the ith data sample. The correlation vector r is expressed as

    r(x) = [R(x, x^1), R(x, x^2), . . . , R(x, x^m)]ᵀ        (18)
The value of β̂ is estimated using the generalized least-squares method as

    β̂ = (eᵀ R⁻¹ e)⁻¹ eᵀ R⁻¹ y                                (19)
As R, r and β̂ are functions of the unknown variable θ, equation (16) can be computed once θ is obtained. The value of θ is obtained by maximizing the following function over the interval θ ≥ 0:

    −[ m ln(σ̂²) + ln |R| ] / 2                               (20)

where

    σ̂² = (y − eβ̂)ᵀ R⁻¹ (y − eβ̂) / m                          (21)
Maximizing equation (20) corresponds to maximum-likelihood estimation of θ; in this study, an implementation of the SA algorithm by Goffe et al. (1994) has been used to solve for θ. One advantage of using Kriging models is that a confidence interval of the estimation can be obtained without additional computational cost: in other words, the error is quantifiable.
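The chain of equations (16)–(21) is short enough to sketch directly. Here, for brevity, a crude random search stands in for the SA maximization of equation (20) used in the article; that substitution, the small diagonal jitter, and the function names are our assumptions.

```python
import numpy as np

def corr_matrix(X1, X2, theta):
    """Gaussian correlation of equation (17)."""
    d2 = (X1[:, None, :] - X2[None, :, :]) ** 2
    return np.exp(-(d2 * theta).sum(axis=2))

def neg_log_likelihood(theta, X, y):
    """Negative of equation (20), via equations (19) and (21)."""
    m = len(y)
    R = corr_matrix(X, X, theta) + 1e-10 * np.eye(m)   # jitter for stability
    Ri = np.linalg.inv(R)
    e = np.ones(m)
    beta = (e @ Ri @ y) / (e @ Ri @ e)                 # equation (19)
    res = y - e * beta
    sigma2 = (res @ Ri @ res) / m                      # equation (21)
    return m * np.log(sigma2) + np.linalg.slogdet(R)[1]

def kriging_predict(X, y, x_new, n_trials=200, rng=None):
    """Pick theta by random search (a stand-in for SA), predict via eq. (16)."""
    rng = rng or np.random.default_rng()
    thetas = 10.0 ** rng.uniform(-2, 2, size=(n_trials, X.shape[1]))
    theta = min(thetas, key=lambda t: neg_log_likelihood(t, X, y))
    R = corr_matrix(X, X, theta) + 1e-10 * np.eye(len(y))
    Ri, e = np.linalg.inv(R), np.ones(len(y))
    beta = (e @ Ri @ y) / (e @ Ri @ e)
    r = corr_matrix(x_new[None, :], X, theta)[0]       # equation (18)
    return beta + r @ Ri @ (y - e * beta)              # equation (16)
```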
3.5 Cokriging model

Both the RBF and the Kriging models make use of function information only to create the surrogate models. There is also an important class of problems where the primary performance function is computationally expensive to evaluate but secondary performance functions exist that are cheap to compute and at the same time known to be highly correlated with the primary function. A typical example would be adjoint-based formulations (Martin et al. 2002), where gradient information is available in addition to the performance function. The gradient information is usually well cross-correlated with the function values and thus contains useful additional information (Chung and Alonso 2002). The Cokriging method can approximate the unknown primary function of interest more effectively by using these secondary function values (Isaaks and Srivastava 1989). This section briefly describes the theory behind Cokriging approximations.

For the original Kriging method, the covariance matrix of Z(x) is defined as:

    Cov[y(x^i), y(x^j)] = σ² R(x^i, x^j)                     (22)
where R is the correlation matrix and R(x^i, x^j) is the correlation function given by equation (17). As the correlation matrix R_c and the correlation vector r_c for the Cokriging method are evaluated using both function values and gradients, the covariance is modified as follows:

    Cov[y(x^i), y(x^j)] = σ² R(x^i, x^j)                                  (23)
    Cov[y(x^i), ∂y(x^j)/∂x_k] = σ² ∂R(x^i, x^j)/∂x_k                      (24)
    Cov[∂y(x^i)/∂x_k, y(x^j)] = −σ² ∂R(x^i, x^j)/∂x_k                     (25)
    Cov[∂y(x^i)/∂x_k, ∂y(x^j)/∂x_l] = −σ² ∂²R(x^i, x^j)/∂x_k ∂x_l         (26)
Accordingly, the prediction by the Cokriging model can be obtained by modifying equation (16) to yield

    ŷ_c = β̂_c + r_cᵀ(x) R_c⁻¹ (y_c − e_c β̂_c)                (27)

where

    β̂_c = (e_cᵀ R_c⁻¹ e_c)⁻¹ e_cᵀ R_c⁻¹ y_c                  (28)
    y_c = [y(x^1), . . . , y(x^m), ∂y(x^1)/∂x_1, . . . , ∂y(x^1)/∂x_n, . . . , ∂y(x^m)/∂x_n]ᵀ    (29)
    e_c = [1, 1, . . . , 1, 0, 0, . . . , 0]ᵀ                (30)
Here, ec contains m ones and m × n zeros. The reader is referred to Koehler and Owen (1996) for a more detailed treatment of both Kriging and Cokriging methods. A caveat for employing Cokriging is that the inference from auxiliary data becomes extremely demanding as the dimensionality increases. This is so because Cokriging requires not only the correlations between variables but also the correlation and cross-correlation between variables and their partial-derivatives.
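To see where the cost comes from, the sketch below assembles the augmented correlation matrix R_c for the Gaussian correlation of equation (17), whose first and second derivatives are available in closed form. The block layout and helper name are our assumptions; the minus signs of equations (25) and (26) correspond to differentiating with respect to the first rather than the second argument, which is what makes the assembled matrix symmetric.

```python
import numpy as np

def cokriging_corr(X, theta):
    """Assemble the (m + m*n) x (m + m*n) matrix R_c of function-function,
    function-gradient and gradient-gradient correlations (equations (23)-(26))
    for the Gaussian correlation of equation (17)."""
    m, n = X.shape
    D = X[:, None, :] - X[None, :, :]                  # D[i, j] = x_i - x_j
    R = np.exp(-(theta * D ** 2).sum(axis=2))          # equation (17)

    G = np.zeros((m, m * n))                           # function-gradient block
    H = np.zeros((m * n, m * n))                       # gradient-gradient block
    for i in range(m):
        for j in range(m):
            G[i, j * n:(j + 1) * n] = 2.0 * theta * D[i, j] * R[i, j]
            H[i * n:(i + 1) * n, j * n:(j + 1) * n] = (
                2.0 * np.diag(theta)
                - 4.0 * np.outer(theta * D[i, j], theta * D[i, j])
            ) * R[i, j]
    return np.block([[R, G], [G.T, H]])
```

The quadratic growth of R_c in both m and n is precisely the tractability caveat noted above.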
4. Numerical examples
The following section provides the details of the experiment setup and a discussion of the results obtained using RBF-, Kriging- and Cokriging-based surrogates within the optimization framework.

4.1 Experiment setup
For this study, five unconstrained single-objective test functions (spherical, Rosenbrock, Rastrigin, Schwefel and ellipsoidal; see table 1) and two well-known single-objective constrained engineering problems (see Appendix A) were chosen to assess the performance of the surrogates within the optimization framework. The problems are scalable and are commonly used to assess the performance of optimization algorithms (Deb et al. 2002, Ono et al. 1999). For all the test functions except Rosenbrock, the global minimum is f(x) = 0 at x_i = 0 for all i; Rosenbrock has a global minimum of f(x) = 0 at x_i = 1 for all i. The test functions are meant to serve as benchmarks for computational accuracy. The best known solution for the welded-beam design is f(x) = 2.38096 and for the speed-reducer problem it is f(x) = 2996.232. All numerical simulations are carried out using the optimization framework under the following conditions.

1. For the five unconstrained test functions, n = 10 was used. The speed-reducer problem involves seven variables, whereas the welded-beam design problem involves four variables. Both the welded-beam design and the speed-reducer problems are constrained optimization problems and the details of their formulation are presented in Appendix A.

2. For all the test cases, 20 independent runs were conducted. For all seven test problems, the termination criterion was set to a maximum of 10,000 actual function evaluations. A search space of S ∈ [−5, 5] was chosen as the variable bounds for the five 10-dimensional mathematical test problems. The bounds for the engineering design problems are described in detail in Appendix A.

3. The behavior is reported of RBF-Fixed, which uses a constant m/2 samples (identified by k-means clustering) for surrogate creation, and of RBF-Exact, which uses all m samples. The Kriging model also used m/2 samples similarly identified by k-means, and the Cokriging model used n samples (here m is the population size and n is the dimensionality of the problem).
Table 1. List of test functions.

Test function    Formula
Spherical        f(x) = Σ_{i=1}^n x_i²
Ellipsoidal      f(x) = Σ_{i=1}^n i x_i²
Schwefel         f(x) = Σ_{i=1}^n ( Σ_{j=1}^i x_j )²
Rosenbrock       f(x) = Σ_{i=1}^{n−1} [ 100 (x_i² − x_{i+1})² + (x_i − 1)² ]
Rastrigin        f(x) = 10n + Σ_{i=1}^n [ x_i² − 10 cos(2πx_i) ]
4. The crossover operator for offspring production is the parent-centric crossover (PCX), following closely that suggested by Deb et al. (2002), with the variance parameters set to 0.1. Three parents were used to create a child (a simplified sketch is given after this list).

5. Retraining of the surrogate model was performed after every 10 generations. Evidently, this approach is an online learning strategy, which has been reported by Willmes et al. (2003) to be more successful and reliable than offline, prebuilt models.

6. The entire simulation process was implemented in C and executed on a Linux machine with a 1.0 GHz CPU and 2.0 GB of SDRAM.
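For reference, a much-simplified sketch of a three-parent parent-centric crossover in the spirit of Deb et al. (2002) follows; the exact orthogonalization and scaling details differ between implementations, so this version is only an illustrative assumption.

```python
import numpy as np

def pcx_child(parents, sigma=0.1, rng=None):
    """Simplified PCX: perturb a randomly chosen index parent along the
    centroid-to-parent direction and in the orthogonal subspace."""
    rng = rng or np.random.default_rng()
    g = parents.mean(axis=0)                     # centroid of the three parents
    p = rng.integers(len(parents))               # pick the index parent
    d = parents[p] - g                           # principal search direction
    others = np.delete(parents, p, axis=0) - g
    norm = np.linalg.norm(d)
    if norm > 1e-12:
        unit = d / norm
        perp = others - np.outer(others @ unit, unit)
        d_bar = np.linalg.norm(perp, axis=1).mean()  # mean orthogonal distance
    else:
        unit, d_bar = np.zeros_like(d), np.linalg.norm(others, axis=1).mean()
    eta = rng.normal(0.0, sigma, size=d.shape)   # orthogonal perturbation
    eta -= (eta @ unit) * unit                   # remove component along d
    return parents[p] + rng.normal(0.0, sigma) * d + d_bar * eta
```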
4.2 Results and discussion

The performance of the Kriging model is compared with an implementation that uses only actual function evaluations (called the Actual model) and with the two variants of the RBF model (RBF-Exact and RBF-Fixed). Tables 2–5 show the statistics of the various simulation runs: the best, worst, median and mean fitness and the standard deviation over 20 independent runs are reported. As presented and validated in a previous article (Won et al. 2003), the concept of controlled elitism within the framework ensures a monotonic convergence in the actual function space (see figures 2–8). Focusing on the best fitness results, Kriging performs the best among the four models. In addition, comparing the performance in terms of standard deviations (an indicator of the consistency with which good solutions are obtained), the Kriging model once again gave the best results. It should be noted that both forms of RBF also gave very good results when compared with the Actual model; in fact, from figure 2 it can be seen that the final solutions obtained by the surrogate models are two orders of magnitude smaller than that obtained by the Actual model. It may seem counter-intuitive at first that an optimization using actual evaluations performs worse than the approximation-assisted one; however, the termination criterion was set at 10,000 actual function evaluations, and in addition to these the approximation-assisted optimization had the benefit of approximate evaluations to further guide it (see table 6). It should therefore not be surprising that it outperforms the optimization using actual evaluations only.
Table 2. Statistics for simulations using actual function evaluations.

Test function   Best fitness   Worst fitness   Mean fitness   Median fitness   Standard deviation
Spherical       0.0317866      0.179708        0.102095       0.102835         0.0363848
Ellipsoidal     0.2252000      0.879624        0.498346       0.411519         0.247806
Schwefel        0.5858090      2.617250        1.717890       2.145330         0.720978
Rosenbrock      20.38040       110.2320        47.3883        40.7613          26.7566
Rastrigin       8.14926        26.6938         14.9765        14.4088          4.65802
Speed-reducer   2995.63        2999.03         2996.87        2996.72          0.942666
Welded-beam     2.40377        5.51449         2.90925        2.58158          0.759173
Table 3. Statistics for simulations using RBF-Exact.

Test function   Best fitness   Worst fitness   Mean fitness   Median fitness   Standard deviation
Spherical       0.000548036    0.00769814      0.0032353      0.00286642       0.00217047
Ellipsoidal     0.00290367     0.0588015       0.0211985      0.0147544        0.0160679
Schwefel        0.403638       7.27972         2.11748        1.74323          1.84554
Rosenbrock      9.33512        44.2767         18.1896        16.5516          7.96005
Rastrigin       2.01628        13.0291         4.95862        4.73919          2.56844
Speed-reducer   3001.29        3030.11         3016.25        3017.59          7.69543
Welded-beam     2.41776        2.7681          2.57647        2.60404          0.0919161
Table 4. Statistics for simulations using RBF-Fixed.

Test function   Best fitness   Worst fitness   Mean fitness   Median fitness   Standard deviation
Spherical       0.00100696     0.00531813      0.00202531     0.0014617        0.00125824
Ellipsoidal     0.00475411     0.0299909       0.014889       0.0130941        0.0067076
Schwefel        2.96688        18.1701         10.4434        9.95524          4.4814
Rosenbrock      8.87244        57.4211         24.4998        21.2185          11.8556
Rastrigin       1.75728        10.1031         4.12777        3.84061          2.14827
Speed-reducer   3001.09        3024.78         3009.78        3007.81          6.3743
Welded-beam     2.40072        2.75093         2.53729        2.51145          0.0917337
It is also important to note that for the two engineering design problems, the solutions obtained from both RBF and Kriging were either equal to the best known solutions or within 1.0% of them. These observations show that a meta-model-driven optimization framework is highly feasible.

Table 6 summarizes the number of actual and approximate function evaluations required for each surrogate model. Evidently, the number of approximate function evaluations is several times higher than that of the actual function evaluations. This is to be expected because the search process of the GA is now approximation-driven. The fact that the surrogate models have indeed performed better than the Actual model shows that benefits were derived from the approximate function calls in the course of the optimization. For CPU times, a fair comparison must first account for the number of approximate function evaluations required. From table 6, it is evident that for the five mathematical test functions the number of approximate function evaluations is about the same for Kriging and RBF-Exact; hence the mean CPU time taken by the GA can be compared for these two surrogate models. Typically, for the spherical test function, the RBF-Exact model required 74.91 s whereas the Kriging model took 14,400 s; for the speed-reducer problem, the times were 211.39 s and 31,100.5 s, respectively. The engineering problems generally take longer because of the non-dominated ranking and the modeling of the constraints. As for the vast difference in computational time between the RBF and Kriging models, a likely reason is that the Kriging formulation requires the solution of an optimization problem for the θ values, and the SA algorithm used to solve it took up the bulk of the computational time. In contrast, RBF only requires a matrix inversion for training and subsequently a 'cheap' feed-forward pass for each prediction.
Table 5. Statistics for simulations using Kriging with fixed m/2 k-means identification of training data.

Test function   Best fitness   Worst fitness   Mean fitness   Median fitness   Standard deviation
Spherical       0.000274099    0.00393148      0.00185054     0.00168806       0.000919109
Ellipsoidal     0.00194719     0.0436206       0.0134435      0.0120086        0.0104925
Schwefel        0.370233       15.3964         3.65186        2.0404           3.86965
Rosenbrock      3.50452        87.9599         12.4525        8.57314          17.9126
Rastrigin       1.22508        12.2575         5.18184        4.54008          2.42692
Speed-reducer   2995.23        2996.85         2995.88        2995.55          0.857982
Welded-beam     2.41936        2.7124          2.54607        2.54786          0.0992381
Figure 2. Convergence plots for spherical test function.

Figure 3. Convergence plots for ellipsoidal test function.
Figure 4. Convergence plots for Schwefel test function.

Figure 5. Convergence plots for Rosenbrock test function.
Figure 6. Convergence plots for Rastrigin test function.

Figure 7. Convergence plots for speed-reducer problem.
Figure 8. Convergence plots for welded-beam problem.
Comparing the convergence plots in greater detail, it can be seen that for all five mathematical test functions (except Schwefel) the three surrogate models are very competitive and gave better results than the Actual model. In fact, for RBF-Exact no performance degradation was observed, and on the Schwefel and Rosenbrock test functions it actually performed better than RBF-Fixed. Over-fitting by the surrogate model therefore does not appear to be a serious concern. As for the two engineering design problems, once again the solution obtained by the Actual model was the worst, whereas those obtained by the surrogate models are closely comparable.

Often, secondary information such as gradient values may be available as a by-product of the analysis procedure. Theoretically, the Cokriging method can approximate the unknown primary function of interest more effectively by using these secondary function values (Isaaks and Srivastava 1989). In this article, we restrict ourselves to the two engineering design problems and present the results of using Cokriging as a surrogate model. Because the mathematical expressions for the functions are known, their partial derivatives can be computed and used as a secondary performance measure in the Cokriging model.
Table 6. Number of function evaluations (actual/approximate) required for Actual, RBF-Exact, RBF-Fixed and Kriging.

Test function   Actual         RBF-Exact        RBF-Fixed        Kriging
Spherical       10,000 / 0     10,000 / 65,090  10,000 / 64,542  10,000 / 68,678
Ellipsoidal     10,000 / 0     10,000 / 62,590  10,000 / 62,093  10,000 / 69,669
Schwefel        10,000 / 0     10,000 / 58,964  10,000 / 65,485  10,000 / 67,679
Rosenbrock      10,000 / 0     10,000 / 56,198  10,000 / 56,644  10,000 / 68,723
Rastrigin       10,000 / 0     10,000 / 77,453  10,000 / 68,332  10,000 / 80,538
Speed-reducer   10,000 / 0     10,000 / 62,557  10,000 / 62,046  10,000 / 89,838
Welded-beam     10,000 / 0     10,000 / 72,290  10,000 / 73,471  10,000 / 85,907
Table 7. Statistics for engineering design problems using Cokriging with random initialization.

Test function   Best fitness   Worst fitness   Mean fitness   Median fitness   Standard deviation
Speed-reducer   2996.37        3018.10         3008.20        3010.26          7.91994
Welded-beam     2.39903        2.67716         2.50422        2.48181          0.090955
Table 8. Average number of function evaluations required for Cokriging.

Test function   Actual   Approximate
Speed-reducer   10,000   70,131
Welded-beam     10,000   68,665
However, the Cokriging formulation becomes highly intractable in high dimensions and with an increasing number of sample points. In this article, the number of sample data points is therefore set to the dimensionality of the problem. For example, the speed-reducer design problem has seven variables, so seven primary data points and 49 partial-derivative values were used to create the Cokriging model; for the welded-beam design problem involving four variables, four primary data points and 16 partial-derivative values were employed. The statistics of the simulations are presented in tables 7 and 8. Table 7 shows that although Cokriging utilizes only seven and four primary data sample points for the speed-reducer and welded-beam design problems, respectively, the surrogate model performed admirably. As for the CPU time, a Cokriging simulation typically takes ∼750.2 s for the speed-reducer problem, which lies between the times taken by the RBF and Kriging models.
5. Summary and conclusions

The contributions and findings of this study can be summarized as follows.

1. It has proposed a scheme, in the context of a genetic algorithm, that unifies the modeling and management of surrogates within the optimization framework.
2. The feasibility and potential of RBF and Kriging as surrogate models have been demonstrated empirically through a suite of mathematical test functions and real-life engineering design problems.
3. It has also introduced a Cokriging model that is able to use secondary performance information effectively; this study used gradient information in addition to the performance functions.

To better understand the behavior of the surrogates within the framework, we pose a set of questions to be addressed in future work.

1. The current implementation of Kriging uses a fixed parameter of half the individuals (m/2) for model creation. What would its performance be if an automated identification of sample points were used?
2. Would an orthogonal array initialization offer any advantage for the surrogate models?
3. What could constitute an automated identification of sample points for Cokriging, such that the onus of providing an a priori sample number is lifted from the user?
4. Although k-means clustering is widely used, a major shortcoming is that the search is prone to local minima (Pelleg and Moore 2000). We suggest the use of the fuzzy subtractive clustering method (SCM), due to Chiu (1994), which could better classify data sets into clusters. Fuzzy SCM is an extension of the mountain clustering method proposed by Yager and Filev (1994) and is superior to k-means because it incorporates functional information, in addition to parametric information, into its clustering procedure. We believe this would give a more accurate assessment of the data set.

This study has answered some of our questions and has opened a few more, which we are currently working on.
References

Astrom, K. and Wittenmark, B., Computer Controlled Systems: Theory and Design, 1984 (Prentice Hall: Englewood Cliffs, NJ).
Chiu, L.S., Fuzzy model identification based on cluster estimation. J. Intell. Fuzzy Syst., 1994, 2, 267–278.
Chung, H.S. and Alonso, J.J., Design of a low-boom supersonic business jet using cokriging approximation models, in 9th AIAA/ISSMO Symposium on Multidisciplinary Analysis and Optimization, Atlanta, GA, 2002.
Deb, K., Anand, A. and Joshi, D., A computationally efficient evolutionary algorithm for real-parameter optimization. KanGAL Report Number 2002003, 2002.
Giunta, A.A. and Watson, L.T., A comparison of approximation modeling techniques: polynomial versus interpolating models, in 7th AIAA/USAF/NASA/ISSMO Symposium on Multidisciplinary Analysis and Optimization, St Louis, MO, AIAA, 1998.
Goffe, W.L., Ferrier, G.D. and Rogers, J., Global optimization of statistical functions with simulated annealing. J. Econometrics, 1994, 60(1/2), 65–100.
Haykin, S., Neural Networks: A Comprehensive Foundation (2nd edn), 1999 (Prentice-Hall: New Jersey).
Isaaks, E.H. and Srivastava, R.M., An Introduction to Applied Geostatistics, 1989 (Oxford University Press: Oxford).
Jin, Y., A comprehensive survey of fitness approximation in evolutionary computation. In Soft Computing, pp. 3–12, 2003 (Springer-Verlag: Berlin).
Jin, Y., Olhofer, M. and Sendhoff, B., A framework for evolutionary optimization with approximate fitness functions. IEEE Trans. Evol. Comput., 2002, 6(5), 481–494.
Koehler, J.R. and Owen, A.B., Computer experiments. In Handbook of Statistics, Volume 13, edited by S. Ghosh and C.R. Rao, pp. 261–308, 1996 (Elsevier Science BV: New York).
Martin, J., Alonso, J.J. and Reuther, J., High-fidelity aero-structural design optimization of a supersonic business jet, in 43rd AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics and Materials Conference, Denver, CO, 2002.
Ono, I., Kita, H. and Kobayashi, S., A robust real-coded genetic algorithm using unimodal distribution crossover augmented by uniform crossover: effects of self-adaptation of crossover probabilities, in Proceedings of the International Conference on Genetic Algorithms, Orlando, FL, 1999, pp. 496–503.
Pelleg, D. and Moore, A., X-means: extending k-means with efficient estimation of the number of clusters, in Proceedings of the International Conference on Machine Learning, Stanford, CA, 29 June–2 July 2000, pp. 727–734.
Ray, T., Tai, K. and Seow, K.C., An evolutionary algorithm for constrained optimization, in Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2000), Las Vegas, 2000.
Simpson, T.W., Mauery, T.M., Korte, J.J. and Mistree, F., Comparison of response surface and kriging models in the multidisciplinary design of an aerospike nozzle, in 7th AIAA/USAF/NASA/ISSMO Symposium on Multidisciplinary Analysis and Optimization, St Louis, MO, AIAA, 1998.
Srinivas, N. and Deb, K., Multiobjective function optimization using non-dominated sorting genetic algorithms. Evol. Comput. J., 1995, 2(3), 221–248.
Strobach, P., Linear Prediction Theory: A Mathematical Basis for Adaptive Systems, 1990 (Springer-Verlag: New York).
Sundararajan, N. and Saratchandran, P., Radial Basis Function Neural Networks with Sequential Learning, 1999 (World Scientific Publishing Co Ltd: London).
Wang, G.G., Dong, Z. and Aitchison, P., Adaptive response surface method – a global optimization scheme for approximation-based design problems. Eng. Opt., 2000, 33, 707–733.
Willmes, L., Bäck, T., Jin, Y. and Sendhoff, B., Comparing neural networks and kriging for fitness approximation in evolutionary optimization. IEEE Congr. Evol. Comput., 2003, 1, 663–670.
Won, K.S., Ray, T. and Tai, K., A framework for optimization using approximate functions. IEEE Congr. Evol. Comput., 2003, 3, 1520–1527.
Yager, R.R. and Filev, D.P., Generation of fuzzy rules by mountain clustering. J. Intell. Fuzzy Syst., 1994, 2, 209–219.
Yao, X., Liu, Y. and Lin, G., Evolutionary programming made faster. IEEE Trans. Evol. Comput., 1999, 3(2), 82–102.
Appendix A

Welded-beam design

Minimize

    f(x) = 1.10471 x_1² x_2 + 0.04811 x_3 x_4 (14.0 + x_2)   (A1)
subject to

    τ(x) − τ_max ≤ 0        (A2)
    σ(x) − σ_max ≤ 0        (A3)
    x_1 − x_4 ≤ 0           (A4)
    δ(x) − δ_max ≤ 0        (A5)
    P − P_C(x) ≤ 0          (A6)

The other parameters are defined as follows:
    τ(x) = sqrt( (τ′)² + 2τ′τ″ x_2/(2R) + (τ″)² )
    τ′ = P / (√2 x_1 x_2),    τ″ = MR/J,    M = P(L + x_2/2)
    R = sqrt( x_2²/4 + ((x_1 + x_3)/2)² )
    J = 2 { √2 x_1 x_2 [ x_2²/12 + ((x_1 + x_3)/2)² ] }
    σ(x) = 6PL / (x_4 x_3²),    δ(x) = 4PL³ / (E x_4 x_3³)
    P_C(x) = [ 4.013 sqrt(E G x_3² x_4⁶ / 36) / L² ] [ 1 − (x_3/(2L)) sqrt(E/(4G)) ]
where P = 6000 lb, L = 14 in., δ_max = 0.25 in., E = 30 × 10⁶ psi, G = 12 × 10⁶ psi, τ_max = 13,600 psi, σ_max = 30,000 psi, 0.125 ≤ x_1 ≤ 10.0, 0.1 ≤ x_2 ≤ 10.0, 0.1 ≤ x_3 ≤ 10.0 and 0.1 ≤ x_4 ≤ 10.0.

Speed-reducer design

Minimize

    f(x) = 0.7854 x_1 x_2² (3.3333 x_3² + 14.9334 x_3 − 43.0934) − 1.508 x_1 (x_6² + x_7²)
           + 7.4777 (x_6³ + x_7³) + 0.7854 (x_4 x_6² + x_5 x_7²)    (A7)
subject to
    27/(x_1 x_2² x_3) − 1 ≤ 0                                         (A8)
    397.5/(x_1 x_2² x_3²) − 1 ≤ 0                                     (A9)
    1.93 x_4³/(x_2 x_3 x_6⁴) − 1 ≤ 0                                  (A10)
    1.93 x_5³/(x_2 x_3 x_7⁴) − 1 ≤ 0                                  (A11)
    [ (745 x_4/(x_2 x_3))² + 16.9 × 10⁶ ]^(1/2) / (110.0 x_6³) − 1 ≤ 0    (A12)
    [ (745 x_5/(x_2 x_3))² + 157.5 × 10⁶ ]^(1/2) / (85.0 x_7³) − 1 ≤ 0    (A13)
    x_2 x_3/40 − 1 ≤ 0                                                (A14)
    5 x_2/x_1 − 1 ≤ 0                                                 (A15)
    x_1/(12 x_2) − 1 ≤ 0                                              (A16)
    (1.5 x_6 + 1.9)/x_4 − 1 ≤ 0                                       (A17)
    (1.1 x_7 + 1.9)/x_5 − 1 ≤ 0                                       (A18)
where 2.6 ≤ x_1 ≤ 3.6, 0.7 ≤ x_2 ≤ 0.8, 17 ≤ x_3 ≤ 28, 7.3 ≤ x_4 ≤ 8.3, 7.3 ≤ x_5 ≤ 8.3, 2.9 ≤ x_6 ≤ 3.9 and 5.0 ≤ x_7 ≤ 5.5.
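As an illustration of how the appendix problems plug into the framework, a direct transcription of the welded-beam objective (A1) and constraints (A2)–(A6) follows, written in the g(x) ≤ 0 convention; the function name and return layout are our own choices.

```python
import numpy as np

def welded_beam(x):
    """Objective (A1) and constraint values (A2)-(A6), each feasible if <= 0."""
    x1, x2, x3, x4 = x
    P, L, E, G = 6000.0, 14.0, 30e6, 12e6
    tau_max, sigma_max, delta_max = 13600.0, 30000.0, 0.25

    f = 1.10471 * x1**2 * x2 + 0.04811 * x3 * x4 * (14.0 + x2)

    tau1 = P / (np.sqrt(2.0) * x1 * x2)                       # primary shear
    M = P * (L + x2 / 2.0)
    R = np.sqrt(x2**2 / 4.0 + ((x1 + x3) / 2.0) ** 2)
    J = 2.0 * np.sqrt(2.0) * x1 * x2 * (x2**2 / 12.0 + ((x1 + x3) / 2.0) ** 2)
    tau2 = M * R / J                                          # torsional shear
    tau = np.sqrt(tau1**2 + tau1 * tau2 * x2 / R + tau2**2)   # combined shear
    sigma = 6.0 * P * L / (x4 * x3**2)                        # bending stress
    delta = 4.0 * P * L**3 / (E * x4 * x3**3)                 # tip deflection
    Pc = (4.013 * np.sqrt(E * G * x3**2 * x4**6 / 36.0) / L**2
          * (1.0 - (x3 / (2.0 * L)) * np.sqrt(E / (4.0 * G))))  # buckling load

    g = np.array([tau - tau_max, sigma - sigma_max, x1 - x4,
                  delta - delta_max, P - Pc])
    return f, g
```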