Stochastic Radial Basis Function Algorithms for Large-Scale Optimization Involving Expensive Black-Box Objective and Constraint Functions

Rommel G. Regis
Mathematics Department, Saint Joseph’s University, Philadelphia, PA 19131, USA, [email protected]

August 23, 2010

Abstract. This paper presents a new algorithm for derivative-free optimization of expensive black-box objective functions subject to expensive black-box inequality constraints. The proposed algorithm, called ConstrLMSRBF, uses radial basis function (RBF) surrogate models and extends the Local Metric Stochastic RBF (LMSRBF) algorithm by Regis and Shoemaker (2007a) to handle black-box inequality constraints. Previous algorithms for the optimization of expensive functions using surrogate models have mostly dealt with bound constrained problems where only the objective function is expensive, and so, the surrogate models are used to approximate the objective function only. In contrast, ConstrLMSRBF builds RBF surrogate models for the objective function and also for all the constraint functions in each iteration, and uses these RBF models to guide the selection of the next point where the objective and constraint functions will be evaluated. Computational results indicate that ConstrLMSRBF is better than alternative methods on 9 out of 14 test problems and on the MOPTA08 problem from the automotive industry (Jones 2008). The MOPTA08 problem has 124 decision variables and 68 inequality constraints and is considered a large-scale problem in the area of expensive black-box optimization. The alternative methods include a Mesh Adaptive Direct Search (MADS) algorithm (Abramson and Audet 2006, Audet and Dennis 2006) that uses a kriging-based surrogate model, the Multistart LMSRBF algorithm by Regis and Shoemaker (2007a) modified to handle black-box constraints via a penalty approach, a genetic algorithm, a pattern search algorithm, a sequential quadratic programming algorithm, and COBYLA (Powell 1994), which is a derivative-free trust-region algorithm.
Based on the results of this study, the results in Jones (2008), and other approaches presented at the ISMP 2009 conference, ConstrLMSRBF appears to be among the best, if not the best, known algorithm for the MOPTA08 problem in the sense of providing the most improvement from an initial feasible solution within a very limited number of objective and constraint function evaluations.

Key words: Constrained optimization; derivative-free optimization; large-scale optimization; radial basis function; surrogate model; expensive function; stochastic algorithm

1 Introduction

1.1 Motivation and Problem Statement

In many engineering optimization problems, the objective and constraint functions are black-box functions that are outcomes of computationally expensive computer simulations, and the derivatives of these functions are usually not available. This paper presents a new method for derivative-free optimization of expensive black-box objective functions subject to expensive black-box inequality constraints. The proposed method uses multiple radial basis function (RBF) surrogate models to approximate the expensive objective and constraint functions and uses these models to identify a promising point for function evaluation in each iteration. The method can be used for constrained optimization problems that are considered large-scale (in terms of number of decision variables and constraints) in the general area of surrogate model-based expensive black-box optimization and it is designed to obtain good solutions after only a relatively small number of objective and constraint function evaluations. Computational results demonstrate the effectiveness of this method on a large-scale optimization problem from the automotive industry involving 124 decision variables and 68 inequality constraints, and on a collection of 14 constrained optimization test problems, four of which are engineering design problems. Our focus is to solve an optimization problem of the following form:

    min  f(x)
    s.t. a ≤ x ≤ b, x ∈ R^d,                    (1)
         gi(x) ≤ 0,  i = 1, 2, . . . , m,
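To make the problem format concrete, here is a minimal Python sketch of what a black-box simulator of form (1) might look like from the optimizer's point of view; the objective, constraints, and bounds below are hypothetical stand-ins, not taken from the paper:

```python
import numpy as np

# Hypothetical stand-in for an expensive simulator: in the applications
# targeted here, one call to f and g may take minutes or hours of CPU time.
def simulate(x):
    """Return (f(x), g(x)) for a toy instance of problem (1) with d = 2, m = 2."""
    x = np.asarray(x, dtype=float)
    f = (x[0] - 1.0) ** 2 + (x[1] - 2.5) ** 2      # black-box objective
    g = np.array([
        x[0] + x[1] - 3.0,                         # g1(x) <= 0
        -x[0] * x[1] + 0.5,                        # g2(x) <= 0
    ])
    return f, g

a, b = np.array([0.0, 0.0]), np.array([5.0, 5.0])  # bound constraints a <= x <= b
x = np.array([1.0, 2.0])
fx, gx = simulate(x)
feasible = bool(np.all(gx <= 0) and np.all((a <= x) & (x <= b)))
```

The optimizer can only query `simulate` pointwise; no derivatives or algebraic structure of f and the gi's are available to it.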

where f, g1, . . . , gm are deterministic black-box functions that are computationally expensive and a, b ∈ R^d. Future work will address the case where there is noise in the objective and constraint functions and also when there are explicit linear inequality or equality constraints. We assume that the derivatives of f, g1, . . . , gm are unavailable, which is the case in many practical applications. Define the vector-valued function g(x) = (g1(x), . . . , gm(x)) and let D := {x ∈ R^d : g(x) ≤ 0, a ≤ x ≤ b} be the search space of the above optimization problem. Furthermore, we assume that f, g1, . . . , gm are all continuous on [a, b] so that D is a compact subset of R^d and f is guaranteed to have a global minimum point over D. We also assume that the values of f and g = (g1, . . . , gm) for a given x ∈ [a, b] can be obtained from computer simulations and that the simulator will not crash for any input x ∈ [a, b]. Future work will also address the case when the simulator crashes for some x ∈ [a, b].

Ideally, we would like to obtain a global minimum point for f over D using only a relatively small number of objective and constraint function evaluations. However, even on low-dimensional bound constrained problems, it usually takes a large number of function evaluations to guarantee that the solution obtained is even approximately optimal. For high-dimensional problems, finding the global minimum within a reasonable number of function evaluations is practically impossible for general black-box problems with black-box constraints. Hence, most practitioners are typically concerned with obtaining a reasonably good feasible solution given a severe computational budget on the number of function evaluations. Although real-world optimization problems typically involve multiple local minima, the proposed algorithm focuses on finding good local solutions from a given feasible starting point. Future work will consider infeasible starting points and more global approaches, including a multistart approach for expensive nonlinearly constrained problems that can be effectively combined with this local search method. However, in theory, the proposed method can find the global minimum of the above optimization problem if it is allowed to run indefinitely, using a convergence argument similar to that used in Regis and Shoemaker (2007a). Moreover, previous experience with the LMSRBF algorithm (Regis and Shoemaker 2007a) indicates that the proposed method can deal with rugged landscapes similar to those found in groundwater bioremediation problems.

1.2 Related Work

When the objective function f (x) and the constraint functions g1 (x), . . . , gm (x) are smooth and f (x) is not riddled with local minima, then the traditional optimization approach is to use a gradient-based local minimization algorithm. In addition, if a global minimum is desired, then this local minimization algorithm can be used in conjunction with a multistart approach for constrained optimization such as OQNLP (Ugray et al. 2007) or the Tabu Tunneling or Tabu Cutting Method (Lasdon et al. 2010). However, in many practical applications, the derivatives of the objective and constraint functions are not explicitly available so they would have to be obtained by automatic differentiation or finite-differencing. Unfortunately, automatic differentiation does not always produce accurate derivatives and it cannot be used when the complete source codes for the objective and constraint functions are not available. Moreover, finite-differencing may be unreliable when the objective function or the constraint functions are nonsmooth. Hence, many practitioners rely on derivative-free optimization methods (or direct search methods) (Kolda et al. 2003, Conn et al. 2009) such as pattern search (Torczon 1997), Mesh Adaptive Direct Search (MADS) (Abramson and Audet 2006, Audet and Dennis 2006) and derivative-free trust-region methods (Conn et al. 1997, 2009, Powell 2002, 2006, Wild et al. 2008). Furthermore, derivative-free heuristic methods such as simulated annealing, evolutionary algorithms (e.g., genetic algorithms, evolution strategies and evolutionary programming), differential evolution (Storn and Price 1997, Sarimveis and Nikolakopoulos 2005), and scatter search (Glover 1998, Laguna and Marti 2003, Rodriguez-Fernandez et al. 2006, Egea et al. 2007) are also used to solve constrained optimization problems. 
When the objective and constraint functions are computationally expensive black-box functions, a suitable optimization approach is to use response surface models (also known as surrogate models or metamodels) for these expensive functions. Here, the term response surface model is used in a broad sense to mean any function approximation model such as polynomials, which are used in traditional response surface methodology (Myers and Montgomery 1995), radial basis functions (RBF) (Buhmann 2003, Powell 1992), kriging (Sacks et al. 1989, Cressie 1993), regression splines, neural networks and support vector machines. Note that the RBF model described in Powell (1992) is equivalent to a form of kriging called dual kriging (see Cressie (1993)).

The use of response surface models for expensive black-box optimization has become widespread within the last decade. For example, polynomial and kriging response surface models have been used to solve aerospace design problems (Giunta et al. 1997, Simpson et al. 2001). Kriging interpolation was used by Jones et al. (1998) to develop the EGO method, which is a global optimization method where the next iterate is obtained by maximizing an expected improvement function. A variant of the EGO method was used by Aleman et al. (2009) to optimize beam orientation in intensity modulated radiation therapy (IMRT) treatment planning. Villemonteix et al. (2009) also used kriging to develop the IAGO method, which uses minimizer entropy as a criterion for determining new evaluation points. RBF interpolation was used by Gutmann (2001) to develop a global optimization method where the next iterate is obtained by minimizing a bumpiness function. Variants of this RBF method were developed by Björkman and Holmström (2000) and by Regis and Shoemaker (2007b). Kriging was used in conjunction with pattern search to solve a helicopter rotor blade design problem (Booker et al. 1999) and an aeroacoustic shape design problem (Marsden et al. 2004). Egea et al. (2009) also used kriging to improve the performance of scatter search on computationally expensive problems. Finally, derivative-free trust-region methods for unconstrained optimization (e.g., Conn et al. 1997, Powell 1994, 2002, 2006, Wild et al. 2008) use local interpolation models of the objective function using a subset of previously evaluated points.

Most of the surrogate model-based optimization methods mentioned above can only be used for bound constrained problems where only the objective function is expensive. Relatively few surrogate model-based approaches have been developed for optimization problems involving nonlinear constraints. For example, the CORS method by Regis and Shoemaker (2005) can be used for problems involving inexpensive and explicitly defined nonlinear constraints. The Adaptive Radial Basis Algorithm (ARBF) by Holmström et al. (2008) can handle nonlinear constraints that are either inexpensive or are incorporated into the objective function via penalty terms. ASAGA (Adaptive Surrogate-Assisted Genetic Algorithm) (Shi and Rasheed 2008) also handles constraints via a penalty and uses a surrogate model to approximate the fitness function for a genetic algorithm. For optimization problems involving an expensive objective function and expensive black-box inequality constraints, there are even fewer surrogate model-based methods that do not use penalty terms to handle the black-box constraints. For example, the NOMADm software by Abramson (2007) implements the MADS algorithm (Abramson and Audet 2006, Audet and Dennis 2006) for constrained optimization and it has the option of using a kriging surrogate model to improve the performance of MADS on computationally expensive problems. COBYLA (Powell 1994) is a derivative-free trust-region method for constrained optimization that uses linear interpolation models of the objective and constraint functions. Kleijnen et al. (2010) recently developed a method for constrained nonlinear stochastic optimization that uses kriging models of the stochastic black-box objective and constraint functions but the decision variables are required to be nonnegative integers. The proposed method also handles the expensive black-box inequality constraints by using RBF models of the constraint functions.


1.3 Main Contribution

The main contribution of this paper is a new RBF algorithm called Constrained Local Metric Stochastic RBF (ConstrLMSRBF), which is an extension of the Local Metric Stochastic RBF (LMSRBF) algorithm by Regis and Shoemaker (2007a). The original LMSRBF algorithm was designed for bound constrained optimization problems with an expensive black-box objective function. ConstrLMSRBF is designed to handle black-box inequality constraints by using multiple RBF models to approximate the objective and constraint functions and identify promising evaluation points for subsequent iterations. Previous papers on expensive black-box optimization using response surface models have mostly dealt with bound constrained problems where only the objective function is black-box and expensive (e.g., Jones et al. 1998, Björkman and Holmström 2000, Gutmann 2001, Regis and Shoemaker 2004, 2005, 2007a, 2007b, Egea et al. 2009). To the best of my knowledge, this is one of the few surrogate model-based optimization methods at present that treats each inequality constraint individually instead of lumping all of them into one penalty function.

This study involves a comparison of seven very different methods for constrained optimization (including the proposed ConstrLMSRBF algorithm) on 14 test problems (four of which are engineering design problems) where the number of decision variables ranges from 2 to 20 and the number of inequality constraints ranges from 1 to 11. ConstrLMSRBF is compared to six alternative methods: (1) NOMADm-DACE, which is a Mesh Adaptive Direct Search (MADS) algorithm (Abramson and Audet 2006, Audet and Dennis 2006) that uses a DACE surrogate model (Lophaven et al. 2002); (2) MLMSRBF-Penalty, which is the Multistart LMSRBF algorithm by Regis and Shoemaker (2007a) that has been modified to handle black-box constraints via a penalty approach; (3) a sequential quadratic programming (SQP) algorithm where the derivatives are obtained by finite differencing; (4) a pattern search algorithm (Torczon 1997); (5) a genetic algorithm; and (6) the derivative-free trust-region algorithm COBYLA (Powell 1994). The computational results indicate that ConstrLMSRBF is generally better than these alternative methods on 9 of the 14 test problems, including three of the engineering design problems.

In addition, the computational results show that ConstrLMSRBF is also a very promising method for problems that are relatively high dimensional in the area of expensive black-box optimization. Previous papers on expensive black-box optimization using response surface models have also mostly dealt with low dimensional problems, usually less than 15 dimensions (e.g., Jones et al. 1998, Björkman and Holmström 2000, Gutmann 2001, Regis and Shoemaker 2004, 2005, 2007a, 2007b, Aleman et al. 2009, Egea et al. 2009, Villemonteix et al. 2009). For high dimensional problems (involving more than 100 decision variables), the computational overhead involved in determining a promising function evaluation point in each iteration for some surrogate model-based global optimization algorithms, such as the RBF method by Gutmann (2001) and the CORS-RBF method by Regis and Shoemaker (2005), could rival the cost of an objective or constraint function evaluation, so it is not easy to develop a reasonably good algorithm for such problems.

This paper shows very promising results for ConstrLMSRBF on the MOPTA08 benchmark problem from the automotive industry developed by Jones (2008). The MOPTA08 problem is an optimization problem that has 124 decision variables and 68 inequality constraints and is considered large-scale in the area of expensive black-box optimization. This problem is important because it has vastly more decision variables and constraints than other test problems used in surrogate model-based optimization. To the best of my knowledge, and based on other approaches mentioned by Jones (2008) and the two other approaches presented at the ISMP 2009 conference (i.e., Forrester and Jones (2009), Quttineh and Holmström (2009)), ConstrLMSRBF is among the best, if not the best, known method for the MOPTA08 problem in the sense of providing the most improvement from an initial feasible solution within a very limited number of objective and constraint function evaluations (Regis 2009). In this study, ConstrLMSRBF and its variants outperformed NOMADm-DACE, MLMSRBF-Penalty, a Fortran 90 implementation of COBYLA, and three optimization solvers in Matlab, namely, Fmincon, Pattern Search and Genetic Algorithm, on the MOPTA08 problem. Based on results for the MOPTA08 problem presented by Jones (2008), ConstrLMSRBF is also better than Generalized Reduced Gradient (using iSIGHT-FD), SQP (Harwell routine VF13), an evolution strategy, a local search algorithm with search directions from local surface approximations (LS-OPT), and also the original Fortran 77 implementation of COBYLA due to Powell (1994).

In summary, the proposed ConstrLMSRBF algorithm represents substantial progress in the area of expensive black-box optimization. Computational results show that this is a very promising method compared to alternative methods for expensive black-box optimization.
It deals with some important issues that were not previously addressed by similar algorithms, namely, how to deal with nonlinear black-box inequality constraints and how to deal with problems involving a large number of decision variables and constraints.

2 A Stochastic Response Surface Method for Nonlinearly Constrained Black-Box Optimization

2.1 Algorithmic Framework

Below is a detailed description of a new derivative-free optimization method called Constrained Local Metric Stochastic Response Surface Method (ConstrLMSRS) that is suitable for problems involving computationally expensive black-box objective and black-box constraint functions. In addition, the method can be used for problems that are generally considered to be large-scale in the area of black-box optimization, typically problems involving more than 30 decision variables. The new method is an extension of the Local Metric Stochastic Response Surface (LMSRS) method by Regis and Shoemaker (2007a) that can handle nonlinear inequality constraints by using multiple response surface models to approximate the black-box objective and


constraint functions. The method by Kleijnen et al. (2010) for nonlinear constrained stochastic optimization also uses multiple kriging models to approximate the stochastic constraint functions. However, their method requires the decision variables to be nonnegative integers whereas the proposed method requires the decision variables to be continuous variables. From Section 1.1, the search space for our optimization problem is bounded and is denoted by D := {x ∈ Rd : g(x) ≤ 0, a ≤ x ≤ b}. Previous surrogate model-based optimization methods for bound constrained problems typically begin by evaluating the expensive objective function at an initial set of points from a space-filling experimental design (e.g., see Koehler and Owen 1996). However, when constraints are present, there is no guarantee that there will be a feasible point among a given set of space-filling design points in the region [a, b] ⊆ Rd defined by the bound constraints in (1). Hence, for simplicity, ConstrLMSRS requires a feasible starting point and it always attempts to maintain feasibility throughout the entire run of the algorithm. Future work will consider an extension of this method that can handle infeasible starting points. The ConstrLMSRS algorithm begins by evaluating the expensive objective function f (x) and the expensive constraint function g(x) = (g1 (x), . . . , gm (x)) at the feasible starting point and at an initial set of points, some of which might turn out to be infeasible. These initial points could be in the neighborhood of the feasible starting point (so that there is a good chance of obtaining other feasible points) or they could be spread throughout the region [a, b] defined by the bound constraints (so that we could get a more global picture of the objective and constraint functions) or even a combination of local and global points. 
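The options for the initial design just described (points near the feasible start, points spread over the box, or a mix) can be sketched as follows. This is an illustrative sketch only; the helper name, its parameters, and the particular local/global mix are assumptions, not the paper's exact procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

def initial_points(x1, a, b, n_local=4, n_global=4, step=0.05):
    """Mix of points near the feasible start x1 and points spread over [a, b].

    Illustrative only: the default design described later in the paper instead
    takes one coordinate step from x1 along each positive coordinate direction.
    """
    d = len(x1)
    # local points: small Gaussian perturbations of x1, clipped to the box
    local = x1 + step * (b - a) * rng.standard_normal((n_local, d))
    local = np.clip(local, a, b)
    # global points: uniform random samples over the box [a, b]
    global_pts = a + (b - a) * rng.random((n_global, d))
    return np.vstack([x1, local, global_pts])

a, b = np.zeros(3), np.ones(3)
pts = initial_points(np.full(3, 0.5), a, b)   # 1 + 4 + 4 = 9 points
```

The local points make it likely that some initial points besides x1 are feasible, while the global points give the surrogates a coarser picture of f and g over the whole box.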
Then the algorithm fits multiple response surface models, one for the objective function f(x) and one for each of the constraint functions g1(x), . . . , gm(x), at the beginning of each iteration using all available data points (i.e., points where the objective and constraint function values are known) and uses these models to select the next point where the objective and constraint functions will be evaluated. After obtaining the objective and constraint function values at the selected point, the process iterates.

In the LMSRS and ConstrLMSRS methods, the function evaluation point is obtained from a collection of randomly generated points, which are referred to as candidate points in Regis and Shoemaker (2007a). In the LMSRS algorithm, the candidate points are generated by adding random perturbations to the current best solution that are normally distributed with zero mean and scalar covariance matrix. On the other hand, in ConstrLMSRS, we allow for the possibility that a random candidate point is generated by applying normal random perturbations on only a subset of the coordinates of the current best feasible solution. This idea comes from the heuristic DDS algorithm by Tolson and Shoemaker (2007). The particular coordinates that are perturbed in the current best feasible solution are randomly selected as was done in the DDS algorithm.

In both LMSRS and ConstrLMSRS, the evaluation point is selected from a set of random candidate points as the one with the best weighted score from two criteria. However, since ConstrLMSRS tries to maintain feasibility, it first uses the response surface models of the constraint functions g1(x), . . . , gm(x) to identify the

candidate points that are predicted to be feasible or those with the minimum number of predicted constraint violations. From this subset of candidate points, ConstrLMSRS selects the evaluation point according to the two criteria used in the original LMSRS method, namely, the estimated objective function value obtained from the response surface model of the objective function (response surface criterion) and the minimum distance from previously evaluated points (distance criterion). For each criterion, we assign each candidate point a score between 0 and 1 where more desirable points are given scores closer to 0. We seek a candidate point that has a low predicted objective function value and that is far from previously evaluated points. The latter requirement can improve the current response surface models for the objective and constraint functions and promote global search. Because a candidate point with low predicted objective function value tends to be close to the current best feasible solution, we consider a weighted score of these two criteria and choose the candidate point with the lowest weighted score among all candidate points. Some algorithms for constrained optimization, such as the pattern search implementation in Matlab, do not perform the same number of objective function evaluations and constraint function evaluations in each iteration. However, ConstrLMSRS performs exactly one evaluation of the objective function f (x) and exactly one evaluation of the constraint function g(x) = (g1 (x), . . . , gm (x)) in each iteration. Moreover, for ConstrLMSRS, we assume that the objective and constraint function values at a given point x ∈ Rd can be obtained in one call to a simulator so that each iteration requires one computer simulation. In the description below, an evaluated point is a point where the objective and constraint functions have been evaluated. 
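The DDS-style candidate generation described above can be sketched as follows; clipping candidates to the box [a, b] is an assumption of this sketch (the handling of bound violations is not spelled out at this point in the text), and the function name and parameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def candidate_points(x_best, a, b, sigma, p_select, t=100):
    """Generate t candidates by perturbing a randomly chosen subset of the
    coordinates of x_best with N(0, sigma^2) noise (DDS-style selection).

    Clipping to [a, b] is an assumption of this sketch; the algorithm
    description does not spell out how bound violations are handled.
    """
    d = len(x_best)
    cands = np.tile(np.asarray(x_best, dtype=float), (t, 1))
    for j in range(t):
        mask = rng.random(d) < p_select        # coordinates to perturb
        if not mask.any():                     # always perturb at least one
            mask[rng.integers(d)] = True
        cands[j, mask] += sigma * rng.standard_normal(mask.sum())
    return np.clip(cands, a, b)

cands = candidate_points(np.full(10, 0.5), np.zeros(10), np.ones(10),
                         sigma=0.2, p_select=0.3)
```

With a small p_select, each candidate differs from the current best feasible solution in only a few coordinates, which is what makes the scheme usable in high dimensions.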
In our notation, n0 is the number of initial evaluation points (including the feasible starting point), n is the number of previously evaluated points, An = {x1, . . . , xn} is the set of n previously evaluated points, and s_n^0(x), s_n^1(x), . . . , s_n^m(x) are the response surface models for the objective function f(x) and the constraint functions g1(x), . . . , gm(x), respectively, after n simulations. These response surface models are constructed using the points in An and their known objective and constraint function values. We specify a set of nonnegative weights {(w_n^R, w_n^D) : n = n0, n0 + 1, . . .}, with w_n^R + w_n^D = 1 for all n ≥ n0, for the response surface and the distance criteria. We only need the weights {w_n^R}_{n≥n0} for the response surface criterion since the weights {w_n^D}_{n≥n0} for the distance criterion can be determined from {w_n^R}_{n≥n0}. More precisely, we consider an ordered set of weights for the response surface criterion: Υ = ⟨υ1, . . . , υκ⟩, where 0 ≤ υ1 ≤ . . . ≤ υκ ≤ 1. Υ is called the weight pattern and is a parameter of the algorithm. The weights for the response surface criterion are then obtained by cycling through this weight pattern.

Constrained Local Metric Stochastic Response Surface (ConstrLMSRS) Method

Inputs:

(1) Real-valued deterministic black-box functions f, g1, . . . , gm such that f is defined on D := {x ∈ R^d : a ≤ x ≤ b, gi(x) ≤ 0, i = 1, 2, . . . , m}, where a, b ∈ R^d. We assume that the values of f and the gi's are

obtained in one computer simulation and that the simulator will not crash for any input x ∈ [a, b].

(2) A feasible starting point x1 ∈ D.

(3) The maximum number of computer simulations allowed, denoted by Nmax.

(4) A particular response surface model, e.g., a cubic RBF model with a linear polynomial tail.

(5) A set of initial evaluation points I = {x1, x2, . . . , xn0} ⊆ [a, b], including the feasible starting point x1. These points do not all have to be feasible.

(6) The number of candidate points in each iteration, denoted by t.

(7) The weight pattern for the response surface criterion as described above: Υ = ⟨υ1, . . . , υκ⟩, where 0 ≤ υ1 ≤ . . . ≤ υκ ≤ 1.

(8) The probability pselect of perturbing a coordinate of the current best feasible solution when generating candidate points.

(9) The initial step size σinit and the minimum step size σmin.

(10) The tolerance for the number of consecutive failed iterations, Tfail.

(11) The threshold for the number of consecutive successful iterations, Tsuccess.

Output: The best feasible point encountered by the algorithm.

Step 1. (Evaluate Initial Points) For each i = 1, . . . , n0, compute f(xi) and g(xi) = (g1(xi), . . . , gm(xi)). Let xbest be the best feasible point found so far and let fbest := f(xbest). Set n := n0 and An := I = {x1, x2, . . . , xn}.

Step 2. (Initialize Step Size and Counters for Consecutive Failures and Successes) Set σn := σinit. Also, set Cfail := 0 and Csuccess := 0.

Step 3. While the termination condition is not satisfied (e.g., while n < Nmax) do

  Step 3.1 (Fit/Update Response Surface Models) Using the data points Bn = {(x, f(x), g(x)) : x ∈ An} = {(xi, f(xi), g(xi)) : i = 1, . . . , n}, fit or update response surface models s_n^0(x), s_n^1(x), . . . , s_n^m(x) for the objective and m constraint functions, respectively. (Note that both the feasible and infeasible points will be used here.)
  Step 3.2 (Generate Multiple Random Candidate Points) Randomly generate t candidate points Ωn = {yn,1, . . . , yn,t} as follows: For j = 1, . . . , t,

    Step 3.2(a) (Select Coordinates to Perturb in Current Best Solution) Generate d uniform random numbers ω1, . . . , ωd in [0, 1]. Let Iperturb := {i : ωi < pselect}. If Iperturb = ∅, then select an index i* uniformly at random from the set {1, . . . , d} and set Iperturb := {i*}.


    Step 3.2(b) (Generate Candidate Point) Generate the jth candidate point yn,j ∈ {x ∈ R^d : x^(i) = xbest^(i) for all i ∉ Iperturb} by yn,j = xbest + zn,j, where z_{n,j}^(i) = 0 for all i ∉ Iperturb and z_{n,j}^(i) is a normal random variable with mean 0 and standard deviation σn for all i ∈ Iperturb. Here, x^(i) denotes the ith coordinate of the point x ∈ R^d.

  End.

  Step 3.3 (Determine Which Candidate Points Are Predicted to be Feasible and Collect the Valid Candidate Points) Use the response surface models for the constraint functions (i.e., s_n^1(x), . . . , s_n^m(x)) to determine which of the random candidate points in Ωn are predicted to be feasible.

    Step 3.3(a) (Collect Candidate Points Predicted to be Feasible) If some of the candidate points are predicted to be feasible, then let Ω_n^valid be the collection of such points.

    Step 3.3(b) (Collect Candidate Points With the Least Number of Constraint Violations) If none of the candidate points are predicted to be feasible, then collect the ones with the minimum number of predicted constraint violations and let Ω_n^valid be the collection of such points.

  Step 3.4 (Select Function Evaluation Point) Using the information from the response surface model s_n^0(x) for the objective function and from the data points Bn, select the evaluation point xn+1 from the candidate points in Ω_n^valid as follows:

    Step 3.4(a) (Estimate the Function Value of Candidate Points) For each x ∈ Ω_n^valid, compute s_n^0(x). Also, compute s_n^max = max{s_n^0(x) : x ∈ Ω_n^valid} and s_n^min = min{s_n^0(x) : x ∈ Ω_n^valid}.

    Step 3.4(b) (Compute the Score Between 0 and 1 for the Response Surface Criterion) For each x ∈ Ω_n^valid, compute V_n^R(x) = (s_n^0(x) − s_n^min)/(s_n^max − s_n^min) if s_n^max ≠ s_n^min, and V_n^R(x) = 1 otherwise.

    Step 3.4(c) (Determine the Minimum Distance from Previously Evaluated Points) For each x ∈ Ω_n^valid, compute ∆n(x) = min_{1≤i≤n} ∥x − xi∥. Also, compute ∆_n^max = max{∆n(x) : x ∈ Ω_n^valid} and ∆_n^min = min{∆n(x) : x ∈ Ω_n^valid}. (Here, ∥ · ∥ is the Euclidean norm on R^d.)

    Step 3.4(d) (Compute the Score Between 0 and 1 for the Distance Criterion) For each x ∈ Ω_n^valid, compute V_n^D(x) = (∆_n^max − ∆n(x))/(∆_n^max − ∆_n^min) if ∆_n^max ≠ ∆_n^min, and V_n^D(x) = 1 otherwise.

    Step 3.4(e) (Determine the Weights for the Two Criteria) Set

        w_n^R = υ_{mod(n−n0, κ)}   if mod(n − n0, κ) ≠ 0,
        w_n^R = υ_κ                otherwise,

    and w_n^D = 1 − w_n^R.

    Step 3.4(f) (Compute the Weighted Score) For each x ∈ Ω_n^valid, compute Wn(x) = w_n^R V_n^R(x) + w_n^D V_n^D(x).

    Step 3.4(g) (Select Next Evaluation Point) Let xn+1 be the point in Ω_n^valid that minimizes Wn.

  End.

  Step 3.5 (Perform Objective and Constraint Function Evaluations) Compute the constraint function values g1(xn+1), . . . , gm(xn+1) and the objective function value f(xn+1).

  Step 3.6 (Update Counters for Consecutive Failures and Successes) If xn+1 is feasible and f(xn+1) < fbest, then reset Csuccess := Csuccess + 1 and Cfail := 0. Otherwise, reset Cfail := Cfail + 1 and Csuccess := 0.

  Step 3.7 (Adjust Step Size) If Csuccess ≥ Tsuccess, then reset σn := 2σn and Csuccess := 0. If Cfail ≥ Tfail, then reset σn := max(σn/2, σmin) and Cfail := 0.

  Step 3.8 (Update Best Feasible Objective Function Value) If xn+1 is feasible and f(xn+1) < fbest, then set xbest := xn+1 and fbest := f(xn+1).

  Step 3.9 (Update Collection of Previously Evaluated Points) Set An+1 := An ∪ {xn+1} and reset n := n + 1. (Note that An+1 includes both feasible and infeasible points.)

End.

Step 4. (Return the Best Feasible Solution Found) Return xbest.
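The candidate scoring in Steps 3.4(a)-(g), together with the weight cycling of Step 3.4(e), can be sketched as follows; the function names and the toy inputs are illustrative, and the surrogate values are assumed to be supplied by whatever response surface model is in use:

```python
import numpy as np

def weight_RS(n, n0, pattern):
    """Step 3.4(e): cycle through the weight pattern <v_1, ..., v_k>."""
    r = (n - n0) % len(pattern)
    return pattern[r - 1] if r != 0 else pattern[-1]

def select_next_point(cands, surrogate_vals, evaluated_pts, w_R):
    """Steps 3.4(a)-(g): pick the candidate minimizing the weighted score of
    the response surface criterion and the distance criterion."""
    s = np.asarray(surrogate_vals, dtype=float)
    # response surface criterion: score 0 for the lowest predicted value
    V_R = (np.full(len(s), 1.0) if s.max() == s.min()
           else (s - s.min()) / (s.max() - s.min()))
    # distance criterion: score 0 for the point farthest from evaluated points
    dists = np.linalg.norm(cands[:, None, :] - evaluated_pts[None, :, :],
                           axis=2).min(axis=1)
    V_D = (np.full(len(dists), 1.0) if dists.max() == dists.min()
           else (dists.max() - dists) / (dists.max() - dists.min()))
    W = w_R * V_R + (1.0 - w_R) * V_D
    return cands[np.argmin(W)]

# toy usage: three candidates, one previously evaluated point
cands = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
x_next = select_next_point(cands, [0.0, 1.0, 2.0],
                           np.array([[0.0, 0.0]]), w_R=0.8)
```

A weight w_R near 1 favors exploitation of the surrogate's predicted minimum, while a weight near 0 favors exploration far from previously evaluated points; cycling through the pattern alternates between the two.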

In Step 1 of ConstrLMSRS given above, the objective and constraint functions f, g1, ..., gm are evaluated at the n0 initial points, one of which is the feasible starting point. In Step 2, the current step size σn is set to the initial step size σinit and the counters for the number of consecutive failed iterations and the number of consecutive successful iterations are both set to zero. In Step 3, the algorithm goes through the iterations that involve fitting or updating the response surface models for the objective and constraint functions, generating random candidate points, selecting the best point among the candidate points with the minimum number of predicted constraint violations, and evaluating the objective and constraint functions at the selected point. Finally, in Step 4, the best feasible solution found is returned by the algorithm. In Step 1, the initial evaluation points could be in the neighborhood of the feasible starting point x1 or they could be spread throughout the region [a, b] defined by the bound constraints in (1). The default is to choose d points obtained by moving the feasible starting point by some small step along each of the positive coordinate directions, and so, n0 = d + 1. More precisely, we evaluate the objective and constraint functions at the d + 1 points {x1, x1 + δ1 e1, x1 + δ2 e2, ..., x1 + δd ed}. Here, {e1, e2, ..., ed} is the natural basis of Rd

and δi = 0.05(b^(i) − a^(i)), where a = (a^(1), ..., a^(d)) and b = (b^(1), ..., b^(d)) define the bound constraints in (1). If the ith coordinate of x1 + δi ei exceeds b^(i), then we replace the initial evaluation point x1 + δi ei by the point x1 − δi ei, which is guaranteed to be in the box [a, b] because of the choice of δi. Note that this default procedure for choosing the initial evaluation points yields d + 1 affinely independent points, which is required for fitting the initial RBF models that will be used later. The response surface model used in Step 3.1 can be any type of function approximation model such as RBFs (Powell 1992, Buhmann 2003), kriging (Cressie 1993, Sacks et al. 1989), or neural networks. The default is to use a cubic RBF model with a linear polynomial tail (see Section 2.2 below). In Step 3.2, the set Ωn of candidate points for objective and constraint function evaluation is generated by adding random perturbations to some (or all) of the coordinates of the current best feasible solution xbest. These random perturbations are all normally distributed with mean 0 and standard deviation σn. The parameter σn is referred to as the step size. Moreover, the parameter pselect is the probability that each coordinate of the current best solution xbest will be perturbed when generating candidate points. In Step 3.2(a), Iperturb represents the set of coordinates that are perturbed in the current best solution when generating a particular candidate point. Note that the number of coordinates perturbed follows a binomial probability distribution B(d, pselect). When the problem has a small number of decision variables (say d ≤ 20), the default value is pselect = 1, and so, Iperturb = {1, ..., d}. On the other hand, when there is a large number of decision variables, pselect should be chosen so that the average number of coordinates of xbest that are perturbed, given by d · pselect, is relatively small.
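The default initialization described above can be sketched as follows; the function name `initial_design` and its interface are ours, not from the paper.

```python
import numpy as np

def initial_design(x1, a, b, step=0.05):
    """Default initial evaluation points: x1 plus d coordinate steps.

    x1: feasible starting point; a, b: lower/upper bound vectors.
    Returns d + 1 affinely independent points, stepping along each
    positive coordinate direction by 0.05(b^(i) - a^(i)) and reflecting
    any step that would leave the box [a, b].
    """
    x1, a, b = map(np.asarray, (x1, a, b))
    d = len(x1)
    points = [x1.copy()]
    for i in range(d):
        delta = step * (b[i] - a[i])
        y = x1.copy()
        # Step along the positive coordinate direction, or the negative
        # direction if that would violate the upper bound
        y[i] = x1[i] + delta if x1[i] + delta <= b[i] else x1[i] - delta
        points.append(y)
    return np.array(points)
```

The reflection keeps every design point inside [a, b] while preserving affine independence, which is what the initial RBF fit requires.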
When pselect < 1, note that Iperturb varies within one iteration to allow for more diversity in candidate points. In this case, the algorithm performs some form of random block coordinate search in the sense that the evaluation point is obtained from xbest by moving in the direction determined by the group of selected coordinates. Hence, when pselect < 1, we shall refer to the method as ConstrLMSRS-BCS, where BCS stands for block coordinate search. The BCS strategy is expected to work well for problems with many inequality constraints since by perturbing only a small fraction of the coordinates, we increase the chances of generating candidate points that will turn out to be feasible and better than the current best feasible solution. Moreover, this strategy could also work for problems with many decision variables even if there are only a few black-box inequality constraints or there are only bound constraints. Again, by perturbing only a small fraction of the coordinates, we also increase the chances of generating candidate points that improve the current best solution because the resulting candidate points are closer to the current best feasible point. That is, for a fixed standard deviation of the normal random perturbations, candidate points that are obtained by perturbing all coordinates of the current best feasible solution will tend to be farther from the current best feasible solution than candidate points obtained by perturbing only a fraction of the coordinates. Tolson and Shoemaker (2007) provide some


numerical evidence that a BCS-like strategy is effective for relatively high dimensional bound constrained problems. Finally, it is important to mention that although only a fraction of the coordinates are perturbed in generating candidate points, the BCS strategy does not limit the search directions that are considered at any given iteration. Each candidate point is generated by perturbing the current best feasible solution at a few coordinates. However, the set of coordinates considered is different for different candidate points. Thus, this strategy actually helps in considering a very diverse set of search directions. In Step 3.3, we use the response surface models for the constraint functions to determine which of the random candidate points generated in Step 3.2 are predicted to be feasible and let Ωn^valid be the collection of such points. If none of the candidate points are predicted to be feasible, then Ωn^valid is the collection of points with the minimum number of predicted constraint violations. Then, in Step 3.4, the function evaluation point is selected to be the point in Ωn^valid with the best weighted score from two criteria: estimated objective function value obtained from the response surface model, and minimum distance from previously evaluated points. We could also use other selection criteria as noted in Regis and Shoemaker (2007a). In Step 3.5, the objective and constraint functions are evaluated at the selected candidate point. When f is computationally expensive, the total running time for Steps 3.1 and 3.5 dominates the running time of the algorithm. In Steps 3.6 and 3.7, we monitor the progress of the algorithm by recording the number of consecutive failed iterations (i.e., iterations that did not improve the best feasible objective function value encountered so far), which we denote by Cfail.
Whenever Cfail reaches some pre-specified tolerance parameter Tfail , we reduce the current step size σn by half, reset Cfail to zero, and continue running the algorithm. We shrink the step size to facilitate convergence of the algorithm. We set a minimum step size σmin to ensure that the points will not be too close to the current best feasible solution. Similarly, we also record the number of consecutive successful iterations, which we denote by Csuccess . Whenever Csuccess reaches some pre-specified threshold parameter Tsuccess , we double the current step size σn , reset Csuccess to zero, and continue running the algorithm.
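The candidate generation of Step 3.2 and the step-size update of Step 3.7 can be sketched as follows. This is an illustrative reconstruction: the function names, the clipping of candidates to the box [a, b], and the safeguard that forces at least one perturbed coordinate per candidate are our assumptions, not details taken from the paper.

```python
import numpy as np

def generate_candidates(x_best, sigma, a, b, t=100, p_select=1.0, rng=None):
    """Generate t candidate points around x_best as in Step 3.2.

    Each coordinate is perturbed with probability p_select by a
    N(0, sigma^2) increment; results are clipped to the box [a, b].
    """
    rng = rng or np.random.default_rng()
    x_best, a, b = map(np.asarray, (x_best, a, b))
    d = len(x_best)
    # I_perturb differs per candidate: a Bernoulli(p_select) mask per coordinate
    mask = rng.random((t, d)) < p_select
    # Safeguard (our assumption): perturb at least one coordinate per candidate
    none_sel = ~mask.any(axis=1)
    mask[none_sel, rng.integers(0, d, none_sel.sum())] = True
    z = np.where(mask, rng.normal(0.0, sigma, (t, d)), 0.0)
    return np.clip(x_best + z, a, b)

def adjust_step(sigma, c_success, c_fail, t_success, t_fail, sigma_min):
    """Step-size update of Step 3.7: double on sustained success, halve
    (down to sigma_min) on sustained failure. Returns the new
    (sigma, c_success, c_fail) triple."""
    if c_success >= t_success:
        return 2.0 * sigma, 0, c_fail
    if c_fail >= t_fail:
        return max(sigma / 2.0, sigma_min), c_success, 0
    return sigma, c_success, c_fail
```

With p_select < 1 this reproduces the BCS behavior: each candidate moves xbest along a different randomly chosen block of coordinates.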

2.2 Radial Basis Function Interpolation

In the numerical experiments, we use a radial basis function (RBF) interpolation model in our implementation of ConstrLMSRS and the resulting algorithm is called ConstrLMSRBF. This RBF model is described in Powell (1992) and is equivalent to a form of kriging interpolation called dual kriging (see Cressie (1993)). Given n distinct points x1, ..., xn ∈ Rd where the objective function values f(x1), ..., f(xn) are known, we use an interpolant of the form sn(x) = Σ_{i=1}^{n} λi ϕ(‖x − xi‖) + p(x), x ∈ Rd, where ‖·‖ is the Euclidean norm, λi ∈ R for i = 1, ..., n, p(x) is a linear polynomial in d variables, and ϕ has the cubic form: ϕ(r) = r^3. Other choices for ϕ are possible (e.g., see Powell (1992)). In the original implementation of the LMSRBF algorithm by Regis and Shoemaker (2007a), a thin plate spline RBF model (ϕ(r) = r^2 log r) with a linear

polynomial tail was used. We use a cubic RBF model here since it is simpler and there is some evidence that it works better than the thin plate spline RBF model for some surrogate model-based optimization methods (e.g., see Björkman and Holmström (2000)). Define the matrix Φ ∈ Rn×n by Φij := ϕ(‖xi − xj‖), i, j = 1, ..., n. Also, define the matrix P ∈ Rn×(d+1) so that its ith row is [1, xi^T]. Now, the cubic RBF model that interpolates the points (x1, f(x1)), ..., (xn, f(xn)) is obtained by solving the system

    [ Φ     P              ] [ λ ]   [ F      ]
    [ P^T   0_(d+1)×(d+1)  ] [ c ] = [ 0_d+1  ]        (2)

where 0(d+1)×(d+1) ∈ R(d+1)×(d+1) is a matrix of zeros, F = (f (x1 ), . . . , f (xn ))T , 0d+1 ∈ Rd+1 is a column vector of zeros, λ = (λ1 , . . . , λn )T ∈ Rn and c = (c1 , . . . , cd+1 )T ∈ Rd+1 consists of the coefficients for the linear polynomial p(x). The coefficient matrix in Equation (2) is invertible if and only if rank(P ) = d + 1 (Powell 1992). We use this RBF model to approximate both the objective function f (x) and each of the constraint functions g1 (x), . . . , gm (x). When m is large, this may sound like a daunting task. However, for a given set of data points where the objective and constraint function values are known, note that we use the same interpolation matrix, and so, fitting multiple RBF models can be done relatively efficiently.
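Fitting the model by solving the linear system (2) amounts to a single dense solve; the sketch below illustrates this for the cubic kernel. The function name `fit_cubic_rbf` is ours, and this direct solve ignores the incremental matrix updates a practical implementation would use to fit many models efficiently.

```python
import numpy as np

def fit_cubic_rbf(X, F):
    """Fit a cubic RBF interpolant with a linear polynomial tail.

    X: (n, d) array of distinct evaluated points; F: (n,) function values.
    Returns a callable s(x) interpolating (x_i, F_i) via the system (2).
    Requires rank(P) = d + 1, i.e., affinely independent points.
    """
    n, d = X.shape
    # Pairwise distance matrix and cubic kernel: Phi_ij = ||x_i - x_j||^3
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    Phi = dists ** 3
    # Polynomial tail matrix P: i-th row is [1, x_i^T]
    P = np.hstack([np.ones((n, 1)), X])
    # Assemble and solve the saddle-point system (2)
    A = np.block([[Phi, P], [P.T, np.zeros((d + 1, d + 1))]])
    rhs = np.concatenate([F, np.zeros(d + 1)])
    coef = np.linalg.solve(A, rhs)
    lam, c = coef[:n], coef[n:]

    def s(x):
        r = np.linalg.norm(X - x, axis=1)
        return float(r ** 3 @ lam + c[0] + c[1:] @ x)

    return s
```

Because the coefficient matrix depends only on the evaluated points, the same factorization can be reused with different right-hand sides F to fit the objective model and all m constraint models, which is why fitting multiple RBF models is relatively cheap.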

3 Computational Experiments

3.1 Alternative Optimization Methods

We compare the proposed ConstrLMSRBF algorithm with alternative methods for constrained optimization. The choice of alternative methods is limited by publicly available software. One alternative is the NOMADm software developed by Abramson (2007). NOMADm is a Matlab implementation of the Mesh Adaptive Direct Search (MADS) algorithm (Abramson and Audet 2006, Audet and Dennis 2006) available at http://www.gerad.ca/NOMAD/Abramson/nomadm.html. The MADS algorithm is an extension of pattern search for constrained optimization that handles nonlinear constraints without using a penalty function. There are multiple options for running NOMADm and, in this study, we choose the option that uses the DACE surrogate model (Lophaven et al. 2002) to make NOMADm suitable for computationally expensive functions. We refer to this particular algorithm as NOMADm-DACE. Here, DACE stands for “Design and Analysis of Computer Experiments,” which is a methodology that uses kriging interpolation to model the outcome of a computer experiment. Another alternative is the Multistart LMSRBF (or MLMSRBF) algorithm by Regis and Shoemaker (2007a) that has been modified to handle the black-box inequality constraints via a penalty approach. That is, the original MLMSRBF algorithm is applied to an equivalent bound constrained problem where the objective function is the original objective function plus a penalty function. We refer to this algorithm 14

as MLMSRBF-Penalty. A Matlab implementation of MLMSRBF by the author is publicly available at http://www.sju.edu/~rregis/pages/software.html. The alternatives also include the Fmincon routine of the Matlab Optimization Toolbox (The Mathworks 2009), which implements Sequential Quadratic Programming (SQP). Since SQP requires derivatives of the objective and constraint functions, these are obtained by finite differencing. Two other methods are the Genetic Algorithm (GA) and the Pattern Search algorithm in the Matlab Genetic Algorithm and Direct Search Toolbox (The Mathworks 2009). Finally, we also compare ConstrLMSRBF with COBYLA (Powell 1994), which is a derivative-free trust region method that uses linear surrogate models of the objective and constraint functions.

3.2 Test Problems

We compare the performance of the optimization algorithms on 14 test problems, including four engineering design problems, and on the MOPTA08 benchmark problem from the automotive industry developed by Jones (2008). The MOPTA08 problem involves 124 decision variables and 68 inequality constraints and is considered large-scale in the area of expensive black-box optimization because previous papers in this area have mostly dealt with problems involving fewer than 15 decision variables (e.g., Jones et al. 1998, Björkman and Holmström 2000, Gutmann 2001, Regis and Shoemaker 2004, 2005, 2007a, 2007b, Aleman et al. 2009, Egea et al. 2009, Villemonteix et al. 2009). The goal in the MOPTA08 problem is to determine the values of the decision variables (e.g., shape variables) that will minimize the mass of the vehicle subject to performance constraints (e.g., crashworthiness, durability). For more details about this problem, see Jones (2008). Four of the test problems are engineering design problems and these are the WB4 (Welded Beam Design Problem) (Coello Coello and Montes 2002, Hedar 2004), PVD4 (Pressure Vessel Design Problem) (Coello Coello and Montes 2002, Hedar 2004), GTCD4 (Gas Transmission Compressor Design Problem) (Beightler and Phillips 1976), and SR7 (Speed Reducer Design for small aircraft engine) (Floudas and Pardalos 1990). Nine of the test problems were taken from a collection of well-known constrained optimization test problems mentioned in (Michalewicz and Fogel 2000) and used by Hedar (2004) and Egea (2008). These are labeled G2, G3MOD, G4, G5MOD, G6, G7, G8, G9, and G10. The G3MOD and G5MOD problems were obtained from the original G3 and G5 problems by replacing all equality constraints with ≤ inequality constraints. Some of these problems were taken from Floudas and Pardalos (1990). The remaining test problem was due to Hesse (1973).
For some of the problems, we modified the objective function or constraint functions by applying a strictly increasing transformation to avoid extremely large values. Note that applying such a transformation does not change the location of the local and global optima of the problem. Table 1 summarizes the characteristics of the 14 test problems and the MOPTA08 problem. The objective and constraint functions for the above optimization problems are not really expensive

Table 1: Constrained optimization test problems and the MOPTA08 problem.

Test Problem                                 | Decision Variables | Inequality Constraints | Region Defined by Bound Constraints
G6                                           | 2   | 2  | [13, 100] × [0, 100]
G8                                           | 2   | 2  | [0, 10]^2
WB4 (Welded Beam Design)                     | 4   | 6  | [0.125, 10] × [0.1, 10]^3
GTCD4 (Gas Transmission Compressor Design)   | 4   | 1  | [20, 50] × [1, 10] × [20, 50] × [0.1, 60]
PVD4 (Pressure Vessel Design)                | 4   | 3  | [0, 1]^2 × [0, 50] × [0, 240]
G5MOD                                        | 4   | 5  | [0, 1200]^2 × [−0.55, 0.55]^2
G4                                           | 5   | 6  | [78, 102] × [33, 45] × [27, 45]^3
Hesse                                        | 6   | 6  | [0, 5] × [0, 4] × [1, 5] × [0, 6] × [1, 5] × [0, 10]
SR7 (Speed Reducer Design)                   | 7   | 11 | [2.6, 3.6] × [0.7, 0.8] × [17, 28] × [7.3, 8.3]^2 × [2.9, 3.9] × [5.0, 5.5]
G9                                           | 7   | 4  | [−10, 10]^7
G10                                          | 8   | 6  | [10^2, 10^4] × [10^3, 10^4]^2 × [10, 10^3]^5
G2                                           | 10  | 2  | [0, 10]^10
G7                                           | 10  | 8  | [−10, 10]^10
G3MOD                                        | 20  | 1  | [0, 1]^20
MOPTA08                                      | 124 | 68 | [0, 1]^124
to evaluate. For example, the MOPTA08 problem is a relatively inexpensive model of an actual design problem in the automotive industry. Each simulation of the MOPTA08 problem takes about 0.53 sec on a 2.40 GHz Windows desktop machine while each simulation of the real version could take 1–3 days (Jones 2008). However, as was done in Regis and Shoemaker (2007a), we can still conduct meaningful comparisons of performance of the different algorithms for constrained black-box optimization by pretending that the objective and constraint functions are computationally expensive. This can be done by keeping track of the best feasible objective function values obtained by the different algorithms as the number of objective and constraint function evaluations increases. The relative performance of algorithms on these test problems is expected to be similar to the relative performance of these algorithms on truly expensive problems whose objective and constraint functions have the same general shape as the objective and constraint functions of our test problems.

3.3 Experimental Setup

We perform all numerical computations in Matlab on a Dual Processor 2.40 GHz Windows machine with 1.98 GB of RAM. For all test problems except the MOPTA08 problem, we compare eight algorithms: ConstrLMSRBF, ConstrLMSRBF-SLHD, NOMADm-DACE, MLMSRBF-Penalty, COBYLA, Fmincon (SQP), Pattern Search, and GA. Here, SLHD stands for symmetric Latin hypercube design (Ye et al. 2000), which is a particular type of space-filling experimental design. For the MOPTA08 problem, we also run ConstrLMSRBF-BCS and ConstrLMSRBF-SLHD-BCS in addition to the above algorithms since the BCS strategy is meant for high dimensional or highly constrained problems. For the BCS strategy, we set the parameter pselect = 0.10. Before any algorithm is applied to any of the test problems, we apply a suitable


transformation to the search space D of each problem so that the region defined by the bound constraints becomes the unit hypercube [0, 1]d . To test the robustness of the different algorithms, we perform 30 runs of each algorithm on each of the 14 test problems where each run corresponds to a different feasible starting point. To ensure a fair comparison, the same feasible starting point is used by the different algorithms in a given run. Each run of each algorithm is given a computational budget of about 200 objective and constraint function evaluations. For the MOPTA08 problem, Jones (2008) provided only one feasible starting point and finding other feasible starting points that are far from the given one is a very difficult problem. However, some of the algorithms are stochastic so the optimization trajectories produced would still be different even if the same starting point is used. Hence, we still perform multiple runs for these algorithms. However, MLMSRBF-Penalty and ConstrLMSRBF and its variants have very long running times on the MOPTA08 problem as will be seen below so we perform only four runs of these algorithms. To be consistent, we also perform four runs of GA from the given starting point. The remaining algorithms (NOMADm-DACE, COBYLA, Fmincon and Pattern Search) are deterministic and since we are given only one feasible starting point, we perform only one run of each of these algorithms on the MOPTA08 problem. The computational budget for this problem is 4000 objective and constraint function evaluations. However, for reasons that will be explained below, some of the solvers terminated in less than 4000 function evaluations. Moreover, other solvers cannot really stop at exactly 4000 function evaluations but we only display the results up to 4000 function evaluations. 
ConstrLMSRBF and ConstrLMSRBF-BCS are both initialized by evaluating the feasible starting point and some points in the vicinity of the feasible starting point as described in Section 2.1. MLMSRBF-Penalty uses the same initial points as the ones used by ConstrLMSRBF and ConstrLMSRBF-BCS. On the other hand, ConstrLMSRBF-SLHD and ConstrLMSRBF-SLHD-BCS are initialized by a symmetric Latin hypercube design (SLHD) (Ye et al. 2000) that has been rescaled over the region defined by the bound constraints in (1). The size of the SLHD is equal to 2(d + 1) as was done in Regis and Shoemaker (2007a). We choose the SLHD to be approximately optimal in the sense that the distances between design points are as large as possible. As mentioned in Section 2.2, we use the cubic RBF model with a linear polynomial tail in the implementation of all RBF-based algorithms, including MLMSRBF-Penalty. The RBF model is used to approximate the objective and constraint functions in every iteration of ConstrLMSRBF and its variants. Note that we cannot guarantee that any of the initial points obtained by the two procedures mentioned above are feasible. However, even if many of these points turn out to be infeasible, we still obtain information that is useful for creating RBF models of the constraint functions. Table 2 summarizes the values of the parameters for ConstrLMSRBF and its variants that are used in the numerical experiments. In this table, ℓ([a, b]) is the length of the smallest side of [a, b] ⊆ Rd defined by

Table 2: Parameter values for ConstrLMSRBF and its variants. Below, ℓ([a, b]) is the length of the smallest side of the region [a, b] ⊆ Rd defined by the bound constraints.

Parameter                                                                  | Value
t = |Ωn| (number of candidate points for each iteration)                   | min(1000d, 20000)
Υ (weight pattern)                                                         | ⟨0.95⟩ (κ = 1)
σinit (initial step size)                                                  | 0.05ℓ([a, b])
σmin (minimum step size)                                                   | (0.2)(1/2)^6 ℓ([a, b])
Tsuccess (threshold parameter for deciding when to increase the step size) | 3
Tfail (tolerance parameter for deciding when to reduce the step size)      | min(max(⌊d · pselect⌋, 5), 30)

the bound constraints in (1). Please see Section 2 for an explanation of these parameters. The parameters in Table 2 may be adjusted depending on whatever prior information is known about the problem and these settings are probably not optimal for each problem that was used in the study. However, we fix these parameters of ConstrLMSRBF and its variants for all the problems to ensure fair comparison among the different methods. For the NOMADm software by Abramson (2007), we use the option that involves a kriging-based surrogate model implemented in the DACE toolbox by Lophaven et al. (2002) and refer to the resulting algorithm as NOMADm-DACE. We use a kriging model with a Gaussian correlation function where the parameters are obtained by maximum likelihood estimation. The initial mesh size is set to 0.05ℓ([a, b]), the mesh refinement factor is set to 0.5 and the mesh coarsening factor is set to 2. For the other parameters of NOMADm-DACE, we use the default values. For the MLMSRBF-Penalty method, the penalty weights are set by using some information from the initial feasible solutions for the different trials. In particular, the penalty weights are set so that a violation of 10^−6 in any of the inequality constraints results in a penalty function value (that will be added to the original objective function) equal to the maximum objective function value of any of the initial feasible solutions. The parameters used for the MLMSRBF algorithm were the same ones used in Regis and Shoemaker (2007a) except that a cubic RBF model was used instead of the thin plate spline RBF model. A cubic RBF model was used since this was the same type of RBF model used in ConstrLMSRBF and its variants. For the Matlab routine Fmincon, we use a minimum step size of 10^−8 ℓ([a, b]) for calculating finite difference derivatives. For the pattern search implementation in Matlab, we set the cache option to 'on' and the cache tolerance to 10^−3 ℓ([a, b]).
With the cache option enabled, the algorithm keeps a history of the mesh points that it polls and it does not poll points that are within the cache tolerance distance to a point in the cache. For the genetic algorithm implementation in Matlab, we set the population size to max(2(d + 1), 20) and the mutation function to ‘@mutationadaptfeasible’ in order to handle constraints. Moreover, for each


trial of GA, the initial population is the same set of initial evaluation points used in the corresponding trial of ConstrLMSRBF-SLHD (i.e., an SLHD of size 2(d + 1) plus the given initial feasible solution). For all other parameters of Fmincon, the genetic algorithm and the pattern search algorithm in Matlab, we use the default values. Finally, for COBYLA, we use an initial trust region radius of 0.05ℓ([a, b]) and a final trust region radius of 10^−6 ℓ([a, b]). We will measure the performance of an optimization algorithm on a particular problem by keeping track of the best feasible objective function value obtained by the algorithm after every objective or constraint function evaluation. Some algorithms for constrained optimization, including the proposed ConstrLMSRBF method, use one objective function evaluation and one constraint function evaluation in each iteration. However, some algorithms like Matlab's Pattern Search tend to perform multiple constraint function evaluations for every objective function evaluation. Hence, we will count the objective function evaluations and constraint function evaluations separately so that one function evaluation means either an objective function evaluation (one call to f(x)) or a constraint function evaluation (one call to g(x) = (g1(x), ..., gm(x))). One may argue that one call to g(x) could be more costly than one call to f(x), especially when multiple inequality constraints are involved. However, for simplicity, we simply count one call to g(x) as one function evaluation. Since there are multiple trials for each problem, we get the average best feasible objective function value for the algorithm after every function evaluation and summarize the results in a plot called an average progress curve.
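The evaluation-counting convention above can be sketched as follows; the data layout (one triple per evaluated point) is an illustrative assumption, not from the paper.

```python
import math

def progress_curve(evals):
    """Best feasible objective value after each function evaluation.

    evals: sequence of (n_calls, feasible, f_value) triples, one per
    evaluated point, where n_calls is the number of function evaluations
    the point consumed (2 for ConstrLMSRBF: one call to f and one to g).
    Returns a list with one entry per function evaluation; stretches
    before the first feasible point carry +inf.
    """
    best = math.inf
    curve = []
    for n_calls, feasible, f_value in evals:
        if feasible:
            best = min(best, f_value)
        curve.extend([best] * n_calls)
    return curve
```

Averaging such curves across trials, evaluation by evaluation, yields the average progress curves plotted in the figures.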

4 Results and Discussion

Figures 1-2 show graphs of the mean of the best feasible objective function value versus the number of function evaluations (i.e., the average progress curves) when the optimization algorithms are applied to the MOPTA08 problem and to the 14 test problems. Here, one function evaluation means either one evaluation of the objective function f (x) or one evaluation of the constraint function g(x) = (g1 (x), . . . , gm (x)). To get an idea of the variability in the results, we also include error bars that represent 95% t confidence intervals for the mean. For the 14 test problems, each side of the error bar has length equal to 2.045 times the standard deviation of the best feasible objective function value divided by the square root of the number of trials. Here, 2.045 is the critical value corresponding to a 95% confidence level for a t distribution with 30 − 1 = 29 degrees of freedom. The average progress curves track the average progress of the different algorithms as a function of the number of function evaluations. When both the objective and constraint function evaluations are expensive, there would be a relatively limited computational budget for any optimization algorithm so it is helpful to know the average performance of an algorithm after some fixed number of function evaluations. These


graphs provide some information on the quality of solutions obtained by the different algorithms for a range of computational budgets. For example, the relative performance of the different algorithms after 50, 100 and 200 function evaluations is indicated by the different average progress curves after 50, 100 and 200 function evaluations, respectively. We report the running times of the different algorithms on the MOPTA08 problem in Table 3 to provide some idea of the computational effort involved in running these algorithms. We did not report the running times on the test problems because for truly expensive functions the total running time of an algorithm is completely dominated by the total time spent on objective and constraint function evaluations. If the test problems are really expensive (say each objective or constraint function evaluation takes about τ units of time), then we can use the average progress curves to obtain a good estimate of how long each algorithm would take (in terms of τ) to get to a certain feasible objective function value. We say that an algorithm A is better than algorithm B for a particular minimization problem if the average best function value obtained by algorithm A is consistently less than the average best function value obtained by algorithm B for a wide range of computational budgets. That is, algorithm A is better than algorithm B for a particular problem if the average progress curve for algorithm A is generally below the average progress curve for algorithm B for that problem.
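The error-bar computation described earlier (95% t confidence intervals for the mean over independent trials) amounts to the following; the function name is ours.

```python
import math

def t_ci_halfwidth(values, t_crit=2.045):
    """Half-length of a t confidence interval for the mean.

    values: best feasible objective values from the independent trials
    at a fixed evaluation count; t_crit is the critical value (2.045
    for 30 trials, i.e., 29 degrees of freedom at the 95% level,
    as used in the figures).
    """
    n = len(values)
    mean = sum(values) / n
    # Sample standard deviation (n - 1 in the denominator)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return t_crit * sd / math.sqrt(n)
```

Each side of an error bar then has this length, centered at the sample mean of the trials.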

4.1 Results on the Smaller Test Problems

The results in Figure 1 show that the ConstrLMSRBF methods (with or without SLHD) are generally the best on most of the 14 smaller test problems. In particular, ConstrLMSRBF (without SLHD) is generally the best on 9 of the 14 test problems (all except PVD4, G2, Hesse, G6 and G8). Moreover, for three of the remaining test problems (PVD4, Hesse and G6), the average progress curve of ConstrLMSRBF is not far from the best algorithm for these problems. In addition, ConstrLMSRBF-SLHD is generally better than NOMADm-DACE on 8 of the 14 test problems (all except SR7, WB4, G3MOD, G7, G9 and G8) and it is competitive with NOMADm-DACE on three of the remaining test problems (SR7, G3MOD and WB4). It is also generally better than MLMSRBF-Penalty on 12 of the 14 test problems (all except G9 and G8). Finally, ConstrLMSRBF-SLHD is generally better than the remaining alternative methods (COBYLA, Fmincon, Pattern Search and GA) on 8 of the 14 test problems (SR7, GTCD4, G3MOD, G10, Hesse, G4, G5MOD and G8) and it is competitive with the best of these remaining alternatives on four other test problems (WB4, PVD4, G7 and G6). Recall that ConstrLMSRBF-SLHD is almost the same as ConstrLMSRBF except that it is initialized by an SLHD with 2(d + 1) points over the region defined by the bound constraints in the given problem. For some of the test problems (namely PVD4, G2 and Hesse), the global picture provided by the SLHD


Figure 1: Average objective function values of best feasible solutions (over 30 trials) found by constrained optimization methods on test problems. Error bars represent 95% confidence intervals about the mean. [Plots omitted. Each panel shows the mean of the best feasible objective function value in 30 trials versus the number of objective and constraint function evaluations for ConstrLMSRBF, ConstrLMSRBF-SLHD, NOMADm-DACE, COBYLA, MLMSRBF-Penalty, Fmincon (Matlab), Pattern Search (Matlab), and GA (Matlab). Panels: Welded Beam (d = 4, m = 6), Speed Reducer (d = 7, m = 11), GTCD (d = 4, m = 1), and Pressure Vessel Design (d = 4, m = 3).]

appears to improve the performance of the ConstrLMSRBF strategy. However, for most of the test problems (10 out of 14), ConstrLMSRBF is generally better than ConstrLMSRBF-SLHD. One possible explanation for this result is that the initialization of ConstrLMSRBF-SLHD requires the evaluation of the objective and constraint functions at 2(d + 1) points whereas the initialization of ConstrLMSRBF involves only d + 1 points. For some problems, the extra d + 1 initial points in ConstrLMSRBF-SLHD could have been used to improve the best feasible objective function value. For example, if the black-box objective function and many of the black-box constraint functions can be approximated relatively well by linear functions, then we only need d + 1 affinely independent points to get a good enough picture of these functions. In situations where these black-box functions are far from linear, note that as more points get evaluated beyond the initial d + 1 points, ConstrLMSRBF constantly improves the initially crude RBF approximations of the objective and constraint functions so it is not necessarily at a disadvantage compared to ConstrLMSRBF-SLHD for


[Figure 1 (continued) plots: panels for G7 (d = 10, m = 8), G3MOD (d = 20, m = 1), G2 (d = 10, m = 2), and G10 (d = 8, m = 6); each shows the mean best feasible objective function value (over 30 trials) versus the number of objective and constraint function evaluations for the eight compared methods.]
Figure 1: (Continued) Average objective function values of best feasible solutions (over 30 trials) found by constrained optimization methods on test problems. Error bars represent 95% confidence intervals about the mean.

the same number of function evaluations. Another possible reason why ConstrLMSRBF seems to be better than ConstrLMSRBF-SLHD is that at the initial stages, there is more information near the initial feasible solution for ConstrLMSRBF, allowing the algorithm to make good progress right away. In contrast, although ConstrLMSRBF-SLHD uses more initial evaluated points, these points are farther away from the initial feasible solution, so the algorithm takes longer to make progress.

It is of interest to note that ConstrLMSRBF is generally much better than NOMADm-DACE, which is among the state-of-the-art direct search methods for the optimization of computationally expensive functions. In particular, ConstrLMSRBF is better than NOMADm-DACE on 11 of the 14 test problems (all except G2, Hesse and G8). Moreover, as noted earlier, ConstrLMSRBF-SLHD is better than NOMADm-DACE on 8 of the 14 test problems and it is competitive with NOMADm-DACE on three of the remaining test problems. NOMADm-DACE is a good algorithm for the optimization of expensive functions. It is better


[Figure 1 (continued) plots: panels for Hesse (d = 6, m = 6), G9 (d = 7, m = 4), G4 (d = 5, m = 6), and G5MOD (d = 4, m = 5); each shows the mean best feasible objective function value (over 30 trials) versus the number of objective and constraint function evaluations for the eight compared methods.]
Figure 1: (Continued) Average objective function values of best feasible solutions (over 30 trials) found by constrained optimization methods on test problems. Error bars represent 95% confidence intervals about the mean.

than MLMSRBF-Penalty on 11 of the test problems (all except PVD4, G10 and G8) and it is much better than Matlab’s Pattern Search and GA on 10 of the test problems (all except PVD4, G2, G10 and G9). NOMADm-DACE is also better than COBYLA on 11 of the test problems (all except G2, G10 and G9). Finally, it is better than Matlab’s Fmincon on 7 of the test problems (SR7, G3MOD, G7, G2, G9, G4 and G8) and it is competitive with Fmincon on two of the remaining test problems (WB4 and Hesse). ConstrLMSRBF and ConstrLMSRBF-SLHD are both better than MLMSRBF-Penalty on 12 of the 14 test problems (all except G9 and G8). Although the original MLMSRBF algorithm uses a surrogate model and has been shown to perform well relative to alternatives on a variety of bound constrained optimization test problems (Regis and Shoemaker 2007a), it is not expected to work effectively in conjunction with a penalty approach on problems with black-box inequality constraints. In fact, Powell (1994) noted that any method that lumps all the constraints into one penalty function is not expected to work as well as one that treats


[Figure 1 (continued) plots: panels for G8 (d = 2, m = 2) and G6 (d = 2, m = 2); each shows the mean best feasible objective function value (over 30 trials) versus the number of objective and constraint function evaluations for the eight compared methods.]
Figure 1: (Continued) Average objective function values of best feasible solutions (over 30 trials) found by constrained optimization methods on test problems. Error bars represent 95% confidence intervals about the mean.

each constraint individually. This is probably the main reason why MLMSRBF-Penalty performs poorly in comparison with ConstrLMSRBF, ConstrLMSRBF-SLHD, NOMADm-DACE, and Fmincon (which implements SQP). However, MLMSRBF-Penalty is not a bad algorithm. Although GA, Pattern Search and COBYLA also treat each constraint individually, MLMSRBF-Penalty is generally better than these alternatives on the test problems. In particular, it is better than GA on 12 of the test problems (all except G2 and G10). It is also better than COBYLA on 11 of the test problems (all except G2, G10 and G9). Finally, MLMSRBF-Penalty is better than Pattern Search on 7 of the test problems (SR7, GTCD4, G3MOD, Hesse, G4, G8 and G6) and it is competitive with Pattern Search on WB4.

ConstrLMSRBF is much better than Fmincon on 11 of the test problems (all except PVD4, Hesse and G6) and it is at least comparable with Fmincon on the Hesse problem. Moreover, ConstrLMSRBF-SLHD is better than Fmincon on 9 test problems (all except WB4, PVD4, G7, G9 and G6) and it is competitive with Fmincon on WB4, PVD4, G7 and G6. Fmincon implements Sequential Quadratic Programming, which is a derivative-based method. These results provide some evidence that an RBF algorithm can compete with and outperform a more traditional derivative-based optimization method even on smooth, relatively low-dimensional constrained problems. One possible explanation for these results is that the finite-differencing procedure used by Fmincon is relatively expensive, since derivatives must be estimated for both the objective function and the constraint functions. For the same number of objective and constraint function evaluations, the RBF algorithms usually made more progress. Moreover, Fmincon does not maintain feasibility at every iteration, and the algorithms are being compared based on the best feasible objective function value obtained after every objective and constraint function evaluation.
In the computationally expensive setting, it makes more sense for algorithms to maintain feasibility as much as possible,


since if the optimization is prematurely terminated because of a limited computational budget, then at least the current iterate is feasible or close to feasible.

From the results, it is clear that ConstrLMSRBF is generally much better than Matlab's pattern search algorithm on 12 of the test problems (all except PVD4 and G2). Moreover, ConstrLMSRBF-SLHD is also generally much better than Matlab's pattern search on 11 of the test problems (all except PVD4, G7 and G9). Pattern Search is among the most popular derivative-free optimization methods that can handle nonlinear black-box constraints. However, because it is designed for nonsmooth or noisy problems, it does not really take advantage of the smoothness of the objective and constraint functions in the test problems. Also, Matlab's pattern search spends a substantial amount of time on constraint function evaluations, and this appears to be a contributing factor to the relatively slow progress of the algorithm. Fmincon and the three RBF algorithms tend to do one objective function evaluation for every constraint function evaluation, while Matlab's pattern search tends to perform multiple constraint function evaluations for every objective function evaluation.

Kriging surrogate models have been used with pattern search to solve a linearly constrained 31-variable optimization problem (Booker et al. 1999), and it should be possible to improve the performance of pattern search on the test problems (which are mostly nonlinearly constrained) by using surrogate models for the objective and constraint functions. Recall that the MADS approach is an extension of pattern search that handles nonlinear constraints without using a penalty function, and so NOMADm-DACE is essentially a pattern search algorithm for nonlinearly constrained optimization that uses a kriging surrogate model. Hence, it is not a surprise that NOMADm-DACE performs better than pattern search on most of the test problems, as mentioned above.
However, as noted earlier, the ConstrLMSRBF methods are still generally better than NOMADm-DACE.

ConstrLMSRBF and ConstrLMSRBF-SLHD are much better than the GA implementation in Matlab on 13 of the test problems (all except G2), as shown in Figure 1. Evolutionary algorithms like GA also do not take advantage of the smoothness of the objective and constraint functions in the test problems, and their performance could be improved by using surrogate models for the objective and constraint functions. Surrogate models have been used to enhance the performance of evolutionary algorithms on computationally expensive bound constrained problems (e.g., Regis and Shoemaker (2004), Rasheed et al. (2005)) and also on constrained problems by using a penalty function to handle the constraints (e.g., Shi and Rasheed (2008)), but at present there are no publicly available implementations that can be used for comparison purposes.

From Figure 1, COBYLA is much better than the other algorithms (including ConstrLMSRBF and ConstrLMSRBF-SLHD) on the G2 problem and it is among the best on the G9 problem, but its performance on the other test problems is relatively poor. The COBYLA code used in this study was the Fortran 90 version by Alan Miller of the original Fortran 77 code by Powell (1994). Both of these Fortran codes were

tried and both of them exhibited problems. For many of the test problems, the codes would freeze on some of the trials. Between the two codes, the Fortran 90 version seems to have fewer problems than the original version, so the results presented are from this version. Closer inspection of the results reveals that for some of the problems (e.g., SR7, WB4, GTCD4, PVD4), COBYLA is able to obtain very good objective function values, but the corresponding iterates are infeasible. This suggests that COBYLA was not good at resolving constraints quickly for these problems. Recall that COBYLA is a derivative-free algorithm that uses linear interpolation models of the objective and constraint functions. These results also suggest that linear models have limited ability to capture the complexity of both the objective and constraint functions in the test problems used. Powell (2002, 2006) developed more recent derivative-free algorithms that use quadratic models (e.g., UOBYQA, NEWUOA and BOBYQA). However, these algorithms cannot handle nonlinear constraints.
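To illustrate the interpolation step COBYLA relies on: given d + 1 affinely independent points, a linear model c + g·x is determined exactly by solving a (d + 1) × (d + 1) linear system. The sketch below is illustrative only (the function name and setup are not Powell's implementation); it shows both why d + 1 points suffice for linear behavior and why such models miss curvature.

```python
import numpy as np

def fit_linear_model(points, values):
    """Fit an interpolating linear model f(x) ~ c + g @ x through
    d + 1 affinely independent points, as COBYLA does for the
    objective and each constraint function."""
    points = np.asarray(points, dtype=float)   # shape (d + 1, d)
    values = np.asarray(values, dtype=float)   # shape (d + 1,)
    # Augment with a column of ones so the intercept c is solved jointly.
    A = np.hstack([np.ones((points.shape[0], 1)), points])
    coef = np.linalg.solve(A, values)
    return coef[0], coef[1:]   # intercept c, gradient g

# A linear model reproduces a linear function exactly from d + 1 points:
# here f(x) = 2 + 3*x1 - x2 sampled at three affinely independent points...
pts = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
c, g = fit_linear_model(pts, [2.0, 5.0, 1.0])
# ...but no choice of c and g can capture curvature, which limits COBYLA
# on strongly nonlinear objective and constraint functions.
```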

4.2 Results on the MOPTA08 Benchmark

For the MOPTA08 benchmark, which involves 124 decision variables and 68 nonlinear inequality constraints, the best algorithms are all based on RBFs and they all performed much better than the other alternatives, as can be seen from Figure 2. In particular, ConstrLMSRBF and its variants (ConstrLMSRBF-SLHD, ConstrLMSRBF-BCS and ConstrLMSRBF-SLHD-BCS) are all much better than NOMADm-DACE, MLMSRBF-Penalty, Fmincon, Pattern Search, GA and COBYLA. The overall best algorithm for this problem is ConstrLMSRBF-BCS, which is an RBF algorithm that uses the Block Coordinate Search (BCS) strategy. Moreover, the top two algorithms both involve the BCS strategy, where the candidate points in each iteration are obtained by perturbing only a subset of the coordinates of the current best feasible solution. In our implementation of the BCS strategy for this problem, each of the 124 coordinates of the current best feasible solution is perturbed with probability pselect = 0.10, and the perturbations are done independently. Hence, in a given iteration the average number of coordinates perturbed is d · pselect = (124)(0.1) = 12.4.

As noted in Section 2.1, the BCS strategy is potentially helpful for problems with many decision variables or many constraints. By perturbing only a small fraction of the coordinates, the BCS strategy increases the chances of generating candidate points that turn out to be feasible and better than the current best feasible solution. This is especially important when there are 124 decision variables and 68 black-box inequality constraints that need to be satisfied. Perturbing only a small number of coordinates of the current best feasible solution may mean fewer constraints are violated, especially if each decision variable appears in only a few of the constraints.
Moreover, this might also increase the chances of generating a candidate point that will improve the current best feasible solution as explained in Section 2.1.
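The BCS candidate-generation step described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's exact implementation: the function name, the Gaussian perturbation, and the step-size parameter sigma are assumptions; only the per-coordinate selection probability p_select = 0.10 comes from the text.

```python
import numpy as np

def perturb_bcs(x_best, lower, upper, n_candidates=100,
                p_select=0.10, sigma=0.02, rng=None):
    """Generate candidate points by perturbing a random subset of
    coordinates of the current best feasible solution (BCS strategy)."""
    rng = np.random.default_rng(rng)
    d = len(x_best)
    candidates = np.tile(np.asarray(x_best, dtype=float), (n_candidates, 1))
    for i in range(n_candidates):
        # Each coordinate is perturbed independently with probability
        # p_select; ensure at least one coordinate is perturbed.
        mask = rng.random(d) < p_select
        if not mask.any():
            mask[rng.integers(d)] = True
        step = sigma * (upper - lower) * rng.standard_normal(d)
        candidates[i, mask] += step[mask]
    # Keep candidates inside the bound constraints.
    return np.clip(candidates, lower, upper)

# With d = 124 and p_select = 0.10, about 12.4 coordinates are
# perturbed on average in each candidate point.
x = np.full(124, 0.5)
cands = perturb_bcs(x, np.zeros(124), np.ones(124), rng=0)
```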


Figure 2: Objective function values of best feasible solutions found by constrained optimization methods on the MOPTA08 problem with 124 decision variables and 68 inequality constraints. Error bars represent 95% confidence intervals about the mean.

From Figure 2, we can also see that using a global SLHD of size 2(d + 1) = 250 appears to degrade the performance of ConstrLMSRBF with or without the BCS strategy. As mentioned earlier, one reason could be that the data points provided by this global SLHD are too far from the initial solution and subsequent best feasible solutions. Hence, these SLHD points do not really help in the construction of accurate RBF models of the objective and constraint functions in the vicinity of the current best feasible solution. Such models are essential for the identification of promising candidate points for the objective and constraint function evaluations. Another reason could be that the variants of ConstrLMSRBF that use a global SLHD of size 250 are simply spending many more function evaluations on global search compared to the other variants, which use only d + 1 = 125 points for the initial function evaluations.

Although NOMADm-DACE performed relatively well on the test problems, this performance did not translate to the MOPTA08 problem. For one thing, the code terminated after 1506 objective and constraint function evaluations (see Figure 2) because it ran out of memory. The implementation of NOMADm-DACE appears to be memory intensive, and the algorithm does not seem to be designed to handle relatively high-dimensional problems with many black-box constraints. This might be partly due to the fact that kriging interpolation can be memory intensive. In contrast, the implementations of ConstrLMSRBF and

[Figure 3 plots: objective function value versus number of objective and constraint function evaluations on MOPTA08, with points marked by number of constraint violations (0 violations, 1–5 violations, >5 violations); panels on this page: Constrained LMSRBF, Constrained LMSRBF with BCS, NOMADm-DACE, and COBYLA.]
Figure 3: Objective function values and number of constraint violations of the solutions in the trajectory of the different algorithms on the MOPTA08 problem with 124 decision variables and 68 inequality constraints. The horizontal line represents an objective function value of 228.

its variants, and also MLMSRBF-Penalty, are able to manage memory well even though they also use surrogate models. In addition, even if we set aside issues with memory management, the results also show that ConstrLMSRBF and its variants performed much better than NOMADm-DACE within the first 1506 objective and constraint function evaluations.

After 4000 function evaluations, COBYLA and Fmincon did not really improve the best feasible value, while GA obtained only a small improvement, as shown in Figure 2. The plots in Figure 3 explain what is going on with these algorithms. These plots are all drawn to the same scale to facilitate comparison among the trajectories of the different algorithms. Here, the horizontal line represents an objective function value of 228, which is considered to be a good feasible objective function value to aim for (Jones 2008). In Figure 3, there is a steady decline in the objective function values obtained by COBYLA, but the corresponding


[Figure 3 (continued) plots: panels for MLMSRBF with penalty, Fmincon (Matlab) with finite differences, Pattern Search (Matlab), and a Genetic Algorithm (Matlab); each shows objective function value versus number of objective and constraint function evaluations on MOPTA08, with points marked by number of constraint violations (0 violations, 1–5 violations, >5 violations).]
Figure 3: (Continued) Objective function values and number of constraint violations of the solutions in the trajectory of the different algorithms on the MOPTA08 problem with 124 decision variables and 68 inequality constraints. The horizontal line represents an objective function value of 228.

solutions are all infeasible. In fact, almost all solutions in the trajectory of COBYLA on this problem have more than 5 constraint violations. This indicates that COBYLA does not do a good job of handling constraints and maintaining feasibility. There are also solutions in Fmincon's trajectory that have very good objective function values, but they are also infeasible. After fewer than 500 objective and constraint function evaluations, Fmincon obtained a relatively low objective value (around 205), and then the objective function values started increasing as Fmincon attempted to resolve the constraints. These results show a weakness of COBYLA and Fmincon when dealing with computationally expensive functions: they do not maintain feasibility through the iterations and they take a long time to resolve constraints. As pointed out earlier, it makes more sense to develop algorithms that maintain feasibility because of very limited computational budgets. For GA, there are also many points in the trajectory with objective function


values that are worse than the initial feasible objective function value of 251.0706, and many of these points also have more than 5 constraint violations. This suggests that GA spent many function evaluations doing global search and did not make much progress in local search. For the MOPTA08 problem, it appears that a good local strategy that is capable of producing feasible iterates is necessary for quickly obtaining improved solutions. For a highly constrained problem such as MOPTA08, performing too much global search is somewhat ineffective since a point that is far from the current best feasible point is likely to violate one of the 68 inequality constraints in the problem.

Compared to Fmincon, COBYLA and NOMADm-DACE, the Pattern Search algorithm made substantial, steady progress within 3000 objective and constraint function evaluations. We only show results up to about 3000 function evaluations because the algorithm terminated prematurely. From Figure 3, we see that the trajectory of Pattern Search includes a mixture of feasible solutions, infeasible solutions with 1–5 constraint violations, and infeasible solutions with more than 5 constraint violations. This suggests that Pattern Search handles constraints relatively well on the MOPTA08 problem. However, compared with ConstrLMSRBF and its variants (which all spend one objective function evaluation for every constraint function evaluation), Pattern Search performs many more constraint function evaluations per objective function evaluation in trying to resolve the constraints.

MLMSRBF-Penalty also made some progress within about 2000 objective and constraint function evaluations, though not as much as Pattern Search. Recall that MLMSRBF-Penalty performed relatively poorly compared with NOMADm-DACE on the test problems. On the MOPTA08 problem, MLMSRBF-Penalty did a better job of generating feasible solutions with improved objective function values compared to NOMADm-DACE.
However, compared with ConstrLMSRBF and its variants, MLMSRBF-Penalty was not able to generate improved feasible solutions quickly. Again, this supports the idea that lumping all the constraints into one penalty function is not as effective as treating each constraint individually.

At the ISMP 2009 conference, ConstrLMSRBF-BCS was shown to be the best black-box optimization algorithm for the MOPTA08 benchmark. Jones (2008) tried multiple algorithms on the MOPTA08 problem, including the Generalized Reduced Gradient method (using iSIGHT-FD), SQP (Harwell routine VF13), an evolution strategy, a local search algorithm with search directions from local surface approximations (LS-OPT), and the original Fortran 77 implementation of COBYLA by Powell (1994). The two other groups that presented results for the MOPTA08 problem at ISMP 2009 (i.e., Forrester and Jones (2009), Quttineh and Holmström (2009)) developed extensions of the EGO method by Jones et al. (1998). None of these alternative methods came close to the solution obtained by ConstrLMSRBF-BCS within 1000 simulations (a total of 2000 objective and constraint function evaluations) for the MOPTA08 problem. To the best of my knowledge, this is the first optimization algorithm that can successfully handle a large-scale, computationally expensive black-box objective function with many nonlinear black-box inequality constraints. This is a remarkable achievement

Table 3: Average running times of the different algorithms on the MOPTA08 problem (excluding time spent on function evaluations).

Algorithm                    Average running time                   Average running time
                             after 2000 function evaluations       after 4000 function evaluations
ConstrLMSRBF                 31505.91 sec (8.75 hrs)               151026.14 sec (41.95 hrs)
ConstrLMSRBF-SLHD            32911.62 sec (9.14 hrs)               160313.93 sec (44.53 hrs)
ConstrLMSRBF-BCS             32844.16 sec (9.12 hrs)               155439.66 sec (43.18 hrs)
ConstrLMSRBF-SLHD-BCS        30666.40 sec (8.52 hrs)               153347.90 sec (42.60 hrs)
NOMADm-DACE                  2199.76 sec (after 1506 evaluations)  NA
COBYLA                       17.95 sec                             38.80 sec
MLMSRBF-Penalty              34329.63 sec (9.54 hrs)               160082.28 sec (44.47 hrs)
Fmincon (Matlab)             8.46 sec                              16.55 sec
Pattern Search (Matlab)      3.90 sec                              NA
Genetic Algorithm (Matlab)   29.33 sec                             57.32 sec

in the area of expensive black-box optimization because most papers published so far on surrogate model-based approaches deal with problems of relatively low dimensionality and do not handle expensive black-box inequality constraints except through a penalty approach.
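The penalty approach mentioned above, used here in the MLMSRBF-Penalty comparison method, can be sketched as follows. This is an illustrative sketch: the quadratic penalty form and the weight mu are assumptions, not the paper's exact formulation.

```python
def penalized_objective(f, constraints, mu=1e6):
    """Lump an objective f and inequality constraints g_i(x) <= 0 into a
    single penalized function. Feasible points are unchanged; each
    violated constraint adds a quadratic penalty to the objective."""
    def wrapped(x):
        violation = sum(max(0.0, g(x)) ** 2 for g in constraints)
        return f(x) + mu * violation
    return wrapped

# Toy example: minimize x^2 subject to x >= 1, i.e., g(x) = 1 - x <= 0.
obj = penalized_objective(lambda x: x ** 2, [lambda x: 1.0 - x])
# A feasible point incurs no penalty; an infeasible one is heavily penalized.
```

As Powell (1994) observed, collapsing all constraints into one function like this discards per-constraint information, which is consistent with MLMSRBF-Penalty trailing the methods that model each constraint separately.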

4.3 Comparison of Running Times on the MOPTA08 Problem

Table 3 shows the running times (excluding total time spent on function evaluations) of the different algorithms on the MOPTA08 problem after 2000 and 4000 function evaluations. From Table 3, it is clear that the running times of ConstrLMSRBF and its variants on the MOPTA08 problem are much longer than those of the alternative methods except MLMSRBF-Penalty. However, in the context of truly expensive functions, these running times are still much smaller than the total time spent on function evaluations. For example, if each objective or constraint function evaluation of the MOPTA08 problem takes 1 hour, then the average total running time of ConstrLMSRBF-BCS for 2000 function evaluations would be about 2009.12 hours, while the total running time of Pattern Search for 2000 function evaluations would be 2000.0011 hours. However, from Figure 2, the mean of the best feasible objective function value obtained by ConstrLMSRBF-BCS is much better than that obtained by Pattern Search after 2000 function evaluations.

In general, the methods that use surrogate models on the MOPTA08 problem (all RBF methods and NOMADm-DACE) run much longer than methods that do not use a surrogate model for a fixed number of function evaluations. The reasons for this high overhead include the time spent building the surrogate model and the time spent selecting the next point where the objective and constraint functions will be evaluated. Both of these computation times typically depend on the problem size (the number of decision variables and the number of constraints) and also on the number of previously evaluated points. In general, more decision


variables and constraints translate into longer running times for any optimization method, and this is why the MOPTA08 problem (with 124 decision variables and 68 inequality constraints) is somewhat challenging for surrogate model-based optimization. In addition, the time to do one iteration of a surrogate model-based method also tends to increase as the number of function evaluations increases. This is because more function evaluations bring more information that can be incorporated into the surrogate model and into the process of selecting subsequent points for function evaluation. For example, note that ConstrLMSRBF requires an overhead of about 8.75 hours (on average) to do 2000 objective and constraint function evaluations, but it requires an overhead of 41.95 hours (more than a four-fold increase) to do 4000 objective and constraint function evaluations. However, if each objective or constraint function evaluation takes an hour, then a 40-hour overhead is still reasonable compared to 4000 hours spent on function evaluations.

Because of the high overhead of surrogate model-based optimization methods, there is usually a right balance between the cost of objective and constraint function evaluations, the problem dimension, and the computational budget (i.e., the number of planned function evaluations) that makes it reasonable to use a surrogate model-based method instead of other methods. For example, for the MOPTA08 problem, if each function evaluation takes hours, then it is clear that the ConstrLMSRBF methods should be preferred over the other alternatives. However, if each function evaluation takes only a few seconds, then it might actually be better to use one of the methods that do not use surrogate models, since for the same wall-clock time such a method would probably make more progress than any of the ConstrLMSRBF methods.
Finding the right balance between these factors that determine when it makes sense to use surrogate model-based methods is beyond the scope of this paper and will be the subject of future work. For now, we assume that we are considering only truly expensive problems that require hours to do one function evaluation.
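The trade-off discussed above can be made concrete with a small wall-clock calculation using the Table 3 overheads. The helper function below is an illustrative sketch (its name is an assumption), not part of any of the compared algorithms.

```python
def total_wall_clock_hours(n_evals, eval_cost_hours, overhead_hours):
    """Total wall-clock time = function-evaluation time + algorithm overhead."""
    return n_evals * eval_cost_hours + overhead_hours

# With 1-hour evaluations, the ~9.12-hour overhead of ConstrLMSRBF-BCS
# (Table 3, 2000 evaluations) is negligible next to 2000 hours of evaluations:
expensive_rbf = total_wall_clock_hours(2000, 1.0, 9.12)           # ~2009.12 hrs
expensive_ps = total_wall_clock_hours(2000, 1.0, 3.90 / 3600)     # ~2000.0011 hrs

# With 1-second evaluations the picture reverses and overhead dominates:
cheap_rbf = total_wall_clock_hours(2000, 1.0 / 3600, 9.12)        # ~9.68 hrs
cheap_ps = total_wall_clock_hours(2000, 1.0 / 3600, 3.90 / 3600)  # ~0.56 hrs
```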

5 Summary and Conclusion

This paper introduced the ConstrLMSRBF method, which is a new stochastic RBF method for the optimization of expensive black-box objective functions with expensive black-box inequality constraints. Previous work on the use of surrogate models for the optimization of expensive black-box functions has mostly dealt with bound constrained problems where only the objective function is expensive. The new algorithm is a significant extension of the LMSRBF algorithm by Regis and Shoemaker (2007a) that can handle nonlinear black-box inequality constraints. Like LMSRBF, the new method uses an RBF model and a distance criterion to select a promising point for objective and constraint function evaluation in each iteration. However, because there are nonlinear black-box inequality constraints, we use multiple RBF models, one for each constraint function and one for the objective function. The RBF models for the constraints are used to identify candidate points that


are predicted to be feasible, or at least those with the minimum number of predicted constraint violations. From these candidate points, the evaluation point is selected to be the best candidate according to the predicted objective function value and the minimum distance from previously evaluated points, as was done in the LMSRBF algorithm.

Four variants of the ConstrLMSRBF method were implemented, namely ConstrLMSRBF, ConstrLMSRBF-SLHD, ConstrLMSRBF-BCS and ConstrLMSRBF-SLHD-BCS. Two of these variants (ConstrLMSRBF and ConstrLMSRBF-SLHD) were compared with alternative methods on 14 well-known constrained optimization test problems involving 2 to 20 decision variables and 1 to 11 inequality constraints. Four of these test problems are engineering design applications. All four variants of ConstrLMSRBF were applied to the MOPTA08 benchmark problem from the automotive industry. The MOPTA08 problem involves 124 decision variables and 68 inequality constraints, making it a large-scale problem in the area of expensive black-box optimization because previous work in this area has mostly dealt with problems involving fewer than 15 decision variables. The BCS variants of ConstrLMSRBF were not applied to the smaller test problems since the BCS strategy is meant for high-dimensional problems or problems with many constraints.

The alternative optimization methods were: (1) NOMADm-DACE, which is a MADS algorithm that uses a kriging surrogate model; (2) MLMSRBF-Penalty, which is the Multistart LMSRBF algorithm modified to handle black-box inequality constraints via a penalty approach; (3) a sequential quadratic programming (SQP) algorithm that uses finite difference derivatives; (4) a pattern search algorithm; (5) a genetic algorithm; and (6) the COBYLA algorithm, which is a derivative-free trust-region method that uses linear approximations to the objective and constraint functions.
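The candidate-selection step summarized at the start of this section can be sketched as follows. This is a simplified illustration: rbf_obj and rbf_cons stand for fitted RBF surrogates of the objective and constraints, and the fixed weight w is an assumption (the actual algorithm cycles the weights between the objective and distance criteria).

```python
import numpy as np

def select_evaluation_point(candidates, rbf_obj, rbf_cons, evaluated, w=0.5):
    """Pick the next point to evaluate: keep candidates with the fewest
    predicted constraint violations, then trade off predicted objective
    value against distance from previously evaluated points."""
    candidates = np.asarray(candidates, dtype=float)
    # Number of predicted violations g_i(x) > 0 for each candidate.
    n_viol = np.array([sum(g(x) > 0 for g in rbf_cons) for x in candidates])
    keep = candidates[n_viol == n_viol.min()]  # predicted-feasible first
    # Score predicted objective (lower is better) and minimum distance to
    # evaluated points (larger is better), each scaled to [0, 1].
    obj = np.array([rbf_obj(x) for x in keep])
    dist = np.array([min(np.linalg.norm(x - p) for p in evaluated) for x in keep])
    obj_s = (obj - obj.min()) / (np.ptp(obj) or 1.0)
    dist_s = (dist - dist.min()) / (np.ptp(dist) or 1.0)
    score = w * obj_s + (1.0 - w) * (1.0 - dist_s)
    return keep[np.argmin(score)]

# Toy usage: one surrogate constraint g(x) = x[0] - 0.6 <= 0 filters out
# the predicted-infeasible candidate [0.9, 0.0].
cands = [[0.1, 0.0], [0.9, 0.0], [0.5, 0.5]]
next_pt = select_evaluation_point(cands, lambda x: x[0],
                                  [lambda x: x[0] - 0.6], [np.zeros(2)], w=0.7)
```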
The computational results indicate that the ConstrLMSRBF methods are clearly the best algorithms on the MOPTA08 problem. In particular, they are much better than NOMADm-DACE, MLMSRBF-Penalty, a Fortran 90 implementation of COBYLA, and the commercial-grade optimization solvers from Matlab, namely Fmincon (SQP), Pattern Search, and Genetic Algorithm. At the ISMP 2009 conference, one of the variants of ConstrLMSRBF, namely ConstrLMSRBF-BCS, was the best black-box optimization algorithm for the MOPTA08 benchmark. The competitor algorithms (other than the ones used for comparison in this study) did not come close to the best solution obtained by this algorithm after relatively few simulations. Moreover, based on the results presented by Jones (2008) on the MOPTA08 problem, ConstrLMSRBF also outperformed Generalized Reduced Gradient (using iSIGHT-FD), SQP (Harwell routine VF13), an evolution strategy, a local search algorithm with search directions from local surface approximations (LS-OPT), and the original Fortran 77 implementation of COBYLA due to Powell (1994). On the 14 smaller test problems involving at most 20 decision variables, the ConstrLMSRBF algorithm (without the SLHD) is generally the best algorithm on 9 of these problems (including three of the engineering design applications). ConstrLMSRBF-SLHD is better than MLMSRBF-Penalty on 12 of the 14 test problems

and it is better than NOMADm-DACE on 8 of the 14 test problems. Moreover, ConstrLMSRBF-SLHD is generally better than the remaining alternative methods (COBYLA, Fmincon, Pattern Search and GA) on 8 of the 14 test problems, and it is competitive with the best of these remaining alternatives on four other test problems. Finally, ConstrLMSRBF is generally better than ConstrLMSRBF-SLHD on 10 of the 14 test problems, suggesting that a global SLHD can potentially degrade the performance of ConstrLMSRBF when only a very limited number of function evaluations can be performed. NOMADm-DACE is among the state-of-the-art direct search methods for computationally expensive constrained optimization, and MLMSRBF is among the best algorithms for computationally expensive bound constrained problems. In addition, the other alternatives (Fmincon, Pattern Search and GA) are well-known, commercial-grade optimization software from Matlab. Hence, these numerical results clearly suggest that the ConstrLMSRBF approach is very promising for the optimization of expensive black-box objective functions with expensive black-box inequality constraints.

Acknowledgements Special thanks to Dr. Don Jones for providing a Fortran simulation program for the MOPTA08 Benchmark Problem from the automotive industry and for inviting me to give a talk in his session at the 20th International Symposium on Mathematical Programming (ISMP 2009) conference in Chicago. I am grateful to Dr. Mark A. Abramson for making the NOMADm software publicly available. I would also like to thank Prof. Mike Powell and Alan Miller for their Fortran codes that implement COBYLA. Finally, I would like to thank Saint Joseph’s University for providing me with a conference travel grant to attend ISMP 2009.

References 1. Abramson, M.A. 2007. NOMADm version 4.6 User’s Guide. Unpublished manuscript. 2. Abramson, M.A., C. Audet. 2006. Convergence of mesh adaptive direct search to second-order stationary points. SIAM Journal on Optimization 17(2) 606–619. 3. Aleman, D.M., H.E. Romeijn, J.F. Dempsey. 2009. A response surface approach to beam orientation optimization in intensity modulated radiation therapy treatment planning. INFORMS Journal on Computing 21 62–76. 4. Audet, C., J.E. Dennis, Jr. 2006. Mesh adaptive direct search algorithms for constrained optimization. SIAM Journal on Optimization 17(2) 188–217. 5. Beightler, C.S., D.T. Phillips. 1976. Applied Geometric Programming. Wiley, New York.

6. Björkman, M., K. Holmström. 2000. Global optimization of costly nonconvex functions using radial basis functions. Optimization and Engineering 1 373–397. 7. Booker, A.J., J.E. Dennis, P.D. Frank, D.B. Serafini, V. Torczon, M.W. Trosset. 1999. A rigorous framework for optimization of expensive functions by surrogates. Structural Optimization 17(1) 1–13. 8. Buhmann, M.D. 2003. Radial Basis Functions. Cambridge University Press, Cambridge, U.K. 9. Coello Coello, C.A., E.M. Montes. 2002. Constraint-handling in genetic algorithms through the use of dominance-based tournament selection. Advanced Engineering Informatics 16 193–203. 10. Conn, A.R., K. Scheinberg, Ph.L. Toint. 1997. Recent progress in unconstrained nonlinear optimization without derivatives. Mathematical Programming 79(3) 397–414. 11. Conn, A.R., K. Scheinberg, L.N. Vicente. 2009. Introduction to Derivative-Free Optimization. SIAM, Philadelphia, PA. 12. Cressie, N. 1993. Statistics for Spatial Data. Wiley, New York. 13. Egea, J.A., M. Rodriguez-Fernandez, J.R. Banga, R. Marti. 2007. Scatter search for chemical and bioprocess optimization. Journal of Global Optimization 37(3) 481–503. 14. Egea, J.A. 2008. New Heuristics for Global Optimization of Complex Bioprocesses. Ph.D. thesis. Universidade de Vigo, Spain. 15. Egea, J.A., E. Vazquez, J.R. Banga, R. Marti. 2009. Improved scatter search for the global optimization of computationally expensive dynamic models. Journal of Global Optimization 43(2-3) 175–190. 16. Forrester, A., D.R. Jones. 2009. Enhancements to the expected improvement criterion. Presented at the 20th International Symposium on Mathematical Programming (ISMP). Chicago, IL. 17. Floudas, C.A., P.M. Pardalos. 1990. A Collection of Test Problems for Constrained Global Optimization Algorithms. Springer-Verlag, Berlin. 18. Giunta, A.A., V. Balabanov, D. Haim, B. Grossman, W.H. Mason, L.T. Watson, R.T. Haftka. 1997.
Aircraft multidisciplinary design optimisation using design of experiments theory and response surface modelling. Aeronautical Journal 101(1008) 347–356. 19. Glover, F. 1998. A template for scatter search and path relinking. J.-K. Hao, E. Lutton, E. Ronald, M. Schoenauer, D. Snyers, eds. Artificial Evolution, Lecture Notes in Computer Science 1363. Springer Verlag, Berlin, Germany. 13–54.


20. Gutmann, H.-M. 2001. A radial basis function method for global optimization. Journal of Global Optimization 19(3) 201–227. 21. Hedar, A. 2004. Studies on Metaheuristics for Continuous Global Optimization Problems. Ph.D. thesis. Kyoto University, Kyoto, Japan. 22. Hesse, R. 1973. A heuristic search procedure for estimating a global solution of nonconvex programming problems. Operations Research 21 1267–1280. 23. Holmström, K., N.-H. Quttineh, M.M. Edvall. 2008. An adaptive radial basis algorithm (ARBF) for expensive black-box mixed-integer constrained global optimization. Optimization and Engineering 9(4) 311–339. 24. Jones, D.R. 2008. Large-scale multi-disciplinary mass optimization in the auto industry. Presented at the Modeling and Optimization: Theory and Applications (MOPTA) 2008 Conference. Ontario, Canada. 25. Jones, D.R., M. Schonlau, W.J. Welch. 1998. Efficient global optimization of expensive black-box functions. Journal of Global Optimization 13(4) 455–492. 26. Kleijnen, J.P.C., W. van Beers, I. van Nieuwenhuyse. 2010. Constrained optimization in expensive simulation: novel approach. European Journal of Operational Research 202(1) 164–174. 27. Koehler, J.R., A.B. Owen. 1996. Computer experiments. S. Ghosh, C.R. Rao, eds. Handbook of Statistics, 13: Design and Analysis of Computer Experiments. North-Holland, Amsterdam, The Netherlands. 261–308. 28. Kolda, T.G., R.M. Lewis, V. Torczon. 2003. Optimization by direct search: new perspectives on some classical and modern methods. SIAM Review 45(3) 385–482. 29. Laguna, M., R. Marti. 2003. Scatter Search: Methodology and Implementations in C. Kluwer, Boston, MA. 30. Lasdon, L., A. Duarte, F. Glover, M. Laguna, R. Marti. 2010. Adaptive memory programming for constrained global optimization. Computers & Operations Research 37(8) 1500–1509. 31. Lophaven, S.N., H.B. Nielsen, J. Søndergaard. 2002. DACE: A Matlab Kriging Toolbox, Version 2.0.
Technical Report IMM-TR-2002-12, Informatics and Mathematical Modelling, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark.


32. Marsden, A.L., M. Wang, J.E. Dennis, Jr., P. Moin. 2004. Optimal aeroacoustic shape design using the surrogate management framework. Optimization and Engineering 5(2) 235–262. 33. Michalewicz, Z., D.B. Fogel. 2000. How to Solve It: Modern Heuristics. Springer-Verlag, Berlin. 34. Myers, R.H., D.C. Montgomery. 1995. Response Surface Methodology: Process and Product Optimization Using Designed Experiments. Wiley, New York. 35. Powell, M.J.D. 1992. The theory of radial basis function approximation in 1990. W. Light, ed. Advances in Numerical Analysis, Volume 2: Wavelets, Subdivision Algorithms and Radial Basis Functions. Oxford University Press, Oxford, U.K. 105–210. 36. Powell, M.J.D. 1994. A direct search optimization method that models the objective and constraint functions by linear interpolation. S. Gomez and J.-P. Hennart, eds. Advances in Optimization and Numerical Analysis. Kluwer, Dordrecht. 51–67. 37. Powell, M.J.D. 2002. UOBYQA: Unconstrained optimization by quadratic approximation. Mathematical Programming 92 555–582. 38. Powell, M.J.D. 2006. The NEWUOA software for unconstrained optimization without derivatives. G. Di Pillo and M. Roma, eds. Large-Scale Nonlinear Optimization. Springer, US. 255–297. 39. Quttineh, N., K. Holmström. 2009. Implementation of a one-stage EGO algorithm. Presented at the 20th International Symposium on Mathematical Programming (ISMP). Chicago, IL. 40. Rasheed, K., X. Ni, S. Vattam. 2005. Comparison of methods for developing dynamic reduced models for design optimization. The Soft Computing Journal 9(1) 29–37. 41. Regis, R.G. 2009. Radial basis function algorithms for large-scale nonlinearly constrained black-box optimization. Presented at the 20th International Symposium on Mathematical Programming (ISMP). Chicago, IL. 42. Regis, R.G., C.A. Shoemaker. 2004. Local function approximation in evolutionary algorithms for costly black box optimization. IEEE Transactions on Evolutionary Computation 8(5) 490–505. 43.
Regis, R.G., C.A. Shoemaker. 2005. Constrained global optimization using radial basis functions. Journal of Global Optimization 31 153–171. 44. Regis, R.G., C.A. Shoemaker. 2007a. A stochastic radial basis function method for the global optimization of expensive functions. INFORMS Journal on Computing 19(4) 497–509.


45. Regis, R.G., C.A. Shoemaker. 2007b. Improved strategies for radial basis function methods for global optimization. Journal of Global Optimization 37(1) 113–135. 46. Rodriguez-Fernandez, M., J.A. Egea, J.R. Banga. 2006. Novel metaheuristic for parameter estimation in nonlinear dynamic biological systems. BMC Bioinformatics 7:483. 47. Sacks, J., W.J. Welch, T.J. Mitchell, H.P. Wynn. 1989. Design and analysis of computer experiments. Statistical Science 4 409–435. 48. Sarimveis, H., A. Nikolakopoulos. 2005. A line up evolutionary algorithm for solving nonlinear constrained optimization problems. Computers & Operations Research 32(6) 1499–1514. 49. Shi, L., K. Rasheed. 2008. ASAGA: an adaptive surrogate-assisted genetic algorithm. Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2008). 1049–1056. 50. Simpson, T.W., T.M. Mauery, J.J. Korte, F. Mistree. 2001. Kriging metamodels for global approximation in simulation-based multidisciplinary design optimization. AIAA Journal 39(12) 2233–2241. 51. Storn, R., K. Price. 1997. Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization 11(4) 341–359. 52. Tolson, B.A., C.A. Shoemaker. 2007. Dynamically dimensioned search algorithm for computationally efficient watershed model calibration. Water Resources Research 43 W01413, doi:10.1029/2005WR004723. 53. Torczon, V. 1997. On the convergence of pattern search algorithms. SIAM Journal on Optimization 7 1–25. 54. Ugray, Z., L. Lasdon, J. Plummer, F. Glover, J. Kelley, R. Marti. 2005. Scatter search and local NLP solvers: a multistart framework for global optimization. INFORMS Journal on Computing 19(3) 328–340. 55. Villemonteix, J., E. Vazquez, E. Walter. 2009. An informational approach to the global optimization of expensive-to-evaluate functions. Journal of Global Optimization 44(4) 509–534. 56. Wild, S., R.G. Regis, C.A. Shoemaker. 2008.
ORBIT: Optimization by radial basis function interpolation in trust-regions. SIAM Journal on Scientific Computing 30(6) 3197–3219. 57. Ye, K.Q., W. Li, A. Sudjianto. 2000. Algorithmic construction of optimal symmetric latin hypercube designs. Journal of Statistical Planning and Inference 90 145–159. 58. The Mathworks, Inc. 2009. Matlab Genetic Algorithm and Direct Search Toolbox: User’s Guide, Version 2. Natick, MA. 59. The Mathworks, Inc. 2009. Matlab Optimization Toolbox: User’s Guide, Version 4. Natick, MA.
