MULTI-OBJECTIVE EVOLUTIONARY OPTIMIZATION OF SUBSONIC AIRFOILS BY META-MODELING AND EVOLUTION CONTROL

Salvatore D’Angelo, M.S. - Edmondo Minisci, Ph.D.
Department of Aeronautical and Space Engineering, Politecnico di Torino, C.so Duca degli Abruzzi, 24 – 10129, Turin, Italy

ABSTRACT
This work concerns the application of multi-objective evolutionary optimization by approximation functions to 2D aerodynamic design. The general concept of evolution control is used to enrich on-line the database of exact solutions on which the approximators are trained. Essentially, starting from a very poor initial approximation, i.e. a small database, the database, and consequently the models, is enriched by evaluating with the true model part of the individuals produced during the optimization. The technique proved very efficient for the considered problem, requiring only a few hundred true evaluations for a problem of dimensionality 5. The obtained results are used to show how the adopted approximation technique can influence the whole optimization process. In this work artificial neural network and kriging approximators are used.

KEYWORDS
Multi-objective evolutionary optimization, meta-modeling, evolution control.

INTRODUCTION
Evolutionary Algorithms (EAs) have proven robust and effective for over a decade. Multi-Objective Evolutionary Algorithms (MOEAs) in particular can now be considered fully mature, but a wide variety of industrial problems cannot be solved by evolutionary methods because of the too high computational cost of objective function evaluations. To face this challenge, several alternative approaches can be considered (see Ref. [1]). When the number of objective function calls that can be afforded at an industrially practical computational cost is smaller than that required for convergence of available MOEAs, different routes are possible.
In this work we use function approximation methods to approximate the objective functions and constraints. The main idea is to consider, throughout the optimization process, two different levels of objective functions: the true functions, which are to be evaluated as few times as possible because of their high computational cost, and the interpolation of the true objective functions by means of some interpolation technique, which can be evaluated as many times as needed because it is considered inexpensive (this is the hypothesis). The problem then becomes how to schedule the use of the approximate model and of the real one. The simplest idea is a) running the optimization on the approximation until convergence is achieved, b) upgrading the approximate model by evaluating the individuals of the last generation with the true model, and c) repeating points a) and b) until no new individuals are found. If the first set of data contains the global optima, this approach can efficiently lead to the solution(s). This is guaranteed only if the dataset spans the whole input space of interest, ensuring that any predicted value (i.e. output of the model) is the result of an interpolation process and not of a risky extrapolation; but here we have to face the curse of dimensionality. Because the number of training samples must be limited, if the number of dimensions of the search space grows beyond a threshold, whose value mainly depends on the cost of the true model, it is very difficult to construct an approximate model that is globally correct. The approximation is then likely to drive the optimization algorithm to false optima, i.e. optima of the approximate model that are not optima of the true functions. In order to avoid finding false optima, or losing some of the optima, in this work we use the concept of model management, or evolution control, borrowed from Ref. [2], where it is applied to single-objective optimization.
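The simple schedule a)–c) above can be sketched as follows. This is an illustrative Python sketch, not the paper's implementation: `fit_surrogate` and `optimize` are placeholders for the approximator and the evolutionary loop, and the 1-D initial design is hypothetical.

```python
import random

def surrogate_optimization(true_f, fit_surrogate, optimize, max_outer=20):
    """Two-level scheme: optimize on the surrogate, upgrade it with true
    evaluations of the new individuals, repeat until nothing new appears."""
    # Hypothetical initial design: a handful of true evaluations (1-D).
    data = [(x, true_f(x)) for x in [random.random() for _ in range(10)]]
    for _ in range(max_outer):
        model = fit_surrogate(data)            # cheap approximate model
        candidates = optimize(model)           # a) optimize on the surrogate
        new = [x for x in candidates
               if all(abs(x - xd) > 1e-6 for xd, _ in data)]
        if not new:                            # c) stop: no new individuals
            break
        data += [(x, true_f(x)) for x in new]  # b) upgrade the database
    return data
```

The loop terminates either when the surrogate optimum stops producing unseen points or after a fixed budget of outer iterations.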
The authors have already published some results concerning 2D aerodynamic optimization by evolution control and kriging approximation in Ref. [3]. Here the same optimization problem is approached using Artificial Neural Network (ANN) approximators. The remainder of the paper is structured as follows. In the next section the adopted evolutionary algorithm is described: after a short introduction to the evolution control technique, the modifications made to the original MOPED (Multi-Objective Parzen based Estimation of Distribution) algorithm (see Ref. [4]) in order to take the evolution control approach into account are detailed. The third section presents some elements concerning the approximation techniques, the simple aerodynamic model and the optimization structure. In the fourth section the most meaningful results obtained by ANN approximation are shown and discussed. The work is summarized in the last section.

MOPED WITH APPROXIMATION
The code used here is a modification (development) of the MOPED algorithm, a multi-objective optimization algorithm for continuous problems that uses the Parzen method to build a probabilistic representation of the Pareto solutions, with multivariate dependencies among the variables. Similarly to what was done in Ref. [5] for the multi-objective Bayesian Optimization Algorithm (BOA), the techniques of NSGA-II (see Ref. [6]) are used to classify promising solutions in the objective space, while new individuals are obtained by sampling from the Parzen model. The Parzen method, described by Fukunaga in [7], pursues a non-parametric approach to kernel density estimation and gives rise to an estimator that converges everywhere to the true Probability Density Function (PDF) in the mean square sense. Should the true PDF be uniformly continuous, the Parzen estimator can also be made uniformly consistent. In short, the method allocates exactly n identical kernels, each one centered on a different element of the sample. The original MOPED proved effective and efficient when applied to both general test cases and real-world problems. Here, in order to improve its efficiency, we explore a possible way to hybridize it with approximation techniques. Before describing the algorithm, a few words are spent to introduce the evolution control techniques.

EVOLUTION CONTROL
If the problem at hand is simple, i.e. it has low dimensionality and low multi-modality, both random sampling and regularized learning can be very helpful to prevent convergence to false minima. However, if this is not the case (and in general it is not), it is very difficult to achieve correct convergence using these methods.
To overcome this obstacle, we use the concept of evolution control as suggested in [2], where evolution control means employing the original fitness function to prevent the evolutionary algorithm from converging to false minima. In [2], Jin et al. propose two methods: a) individual-based control and b) generation-based control. In the first approach, part of the individuals (nv) of the current population are chosen and evaluated with the true model; if the controlled individuals are chosen randomly we call it a random strategy, while if the best nv individuals are chosen as the controlled individuals we call it a best strategy. In the latter approach, on the other hand, the whole population of nvgen generations is evaluated with the real model every ncyc generations, where nvgen < ncyc. In the modified MOPED, the controlled individuals are evaluated with the true model and, if their distance from the points already stored in the dataset is greater than distmin, they are added to the dataset and the approximated model is updated (when fewer than nv individuals are added, the worst half of the population is also explored). Under the hypothesis that every operation except the use of the true model is inexpensive, the current population is then re-evaluated with the updated approximate model and re-classified, in order to use the new information as soon as possible. Finally, the best nind individuals are selected to form the next generation, and the algorithm stops when the convergence criteria are met.

FUNCTION APPROXIMATION
When facing the problem of function approximation, a choice has to be made among different approaches. The best known are polynomial models (also known as response surface methodology), the kriging model, feedforward neural networks and support vector machines. In this work we mainly use artificial neural networks and, in order to gather additional information, the kriging model.
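Before moving to the approximators, the individual-based control step described above (best strategy plus the distmin filter on the dataset) can be sketched as follows. This is a 1-D illustrative sketch with invented names (`fitness_hat` is the surrogate fitness, `control_and_update` is not from the paper):

```python
def control_and_update(population, fitness_hat, true_f, dataset, nv, distmin):
    """Best strategy: evaluate the nv apparently-best individuals with the
    true model; add to the dataset only those farther than distmin from
    every stored point, so near-duplicates do not bloat the database."""
    ranked = sorted(population, key=fitness_hat)   # ranking via the surrogate
    added = []
    for x in ranked[:nv]:                          # the nv controlled individuals
        y = true_f(x)                              # expensive true evaluation
        if all(abs(x - xd) > distmin for xd, _ in dataset):
            dataset.append((x, y))                 # enrich the database
            added.append(x)
    return added
```

After this step the surrogate would be refitted on the enlarged dataset and the population re-classified, as described above.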
ARTIFICIAL NEURAL NETWORKS
As is widely known in the scientific and practitioner communities, artificial neural networks, both in the form of Multi-Layer Perceptrons (MLPs) and Radial-Basis Function Networks (RBFNs), can approximate any continuous function to a predefined accuracy level. The accuracy mainly depends on the network structure itself (number of layers and neurons), but also on the adopted learning technique. In this work, only the MLP type has been used. Given a set of p evaluation points X = [x1, …, xp]T, xi ∈ ℜn, and answers Y = [y1, …, yp]T, yi ∈ ℜq, we look for a model able to provide the approximated output ŷ for a generic input x.
Figure 2 – Generic MLP network
The generic structure of an MLP network, shown in Figure 2, has an initial layer, which takes the input x, a final layer, which gives the output ŷ, and k hidden layers. Each layer takes as input the elaboration of the previous layer's output; that is, for the l-th layer, each element j has the input

zj = ∑_{i=1..nn_{l−1}} wj,i · yi + θj    (1)

and the output

yj = F(zj)    (2)
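Equations (1)–(2) amount to the following forward pass. This is an illustrative sketch: tanh is chosen here as an example activation F, since the paper does not fix a specific one.

```python
import math

def mlp_forward(x, layers):
    """Propagate input x through an MLP. `layers` is a list of (W, theta)
    pairs; for each layer, z_j = sum_i W[j][i]*y_i + theta_j (Eq. 1) and
    y_j = F(z_j) (Eq. 2), with F = tanh as an example activation."""
    y = x
    for W, theta in layers:
        y = [math.tanh(sum(w * yi for w, yi in zip(row, y)) + t)
             for row, t in zip(W, theta)]
    return y
```

With `layers` empty the input is returned unchanged, matching the convention that y0 is the input and yk+1 the output.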
where F is the activation function. Note that y0 and yk+1 are the input x and the output ŷ, respectively. Once the structure and the activation functions have been chosen, we have to determine the values of w and θ so that the model gives output Y for input X.

LEARNING TECHNIQUES AND GENERALIZATION
Supervised learning techniques are used to identify the values of the weights, w, and biases, θ, which allow the network to approximate the functions from which the databases X and Y are extracted. In general, supervised learning works as follows: a) the network processes the input data and the obtained outputs are compared with the expected ones; b) the errors (differences between obtained and expected outputs) are used to modify the weights and biases. The most popular learning rule is backpropagation, which is essentially a way to efficiently and exactly calculate derivatives in a system composed of many different sub-systems. On the basis of backpropagation, many different algorithms have been developed with two main aims: 1) speeding up the learning process and 2) improving generalization. Since the aim of this work is the integration of approximation techniques in the optimization process in order to reduce expensive calls to the true models, the speed of the learning phase is a secondary aspect of the procedure: the computational requirements of the learning phase can be considered negligible compared to the calls to the true models. Generalization, i.e. the capacity to produce good approximations of unknown functions on the basis of a limited set of data, is on the other hand a fundamental aspect. Underfitting and overfitting can introduce major noise into the optimization process, driving it towards locally optimal points. The easiest way to overcome these problems would be to correctly dimension the network on the basis of the database.
However, this is not a practical approach, especially considering that the approximation technique has to be integrated within an optimization process. Other options are a) regularization and b) early stopping. Early stopping has two big advantages, being fast and general, but one big disadvantage: it is statistically inefficient, because neither learning nor validation uses the whole set of data. Regularization, instead, is based on a modification of the objective function of the learning process so as to smooth the approximate model. It introduces some new parameters that are not easy to tune, but there is a technique that sets the regularization parameters automatically. This method is called Bayesian learning and appears to be the best choice when smooth functions are wanted from a reduced amount of data. In what follows we use MLP networks trained by means of both classical and Bayesian techniques. The obtained results will help clarify how the generalization error can influence optimization processes when MLPs are used as function approximators.

KRIGING APPROXIMATION
As commonly accepted (see Refs. [9] and [10]), this method can build accurate approximations of design spaces for both linear and non-linear functions. The approach is based on the hypothesis that the real models are deterministic, i.e. the responses of the models are error/noise free. Here we used the kriging tool proposed in Ref. [11], where the reader can find a detailed description.

OPTIMIZATION PROCESS

SYSTEM MODEL
The aim is to optimize the performance of a subsonic airfoil for assigned flight conditions. The geometry of the airfoil is given by the superposition of a mean line and a thickness distribution, each of them parameterized by Bézier curves as in Figure 3. The performance of the airfoil during the optimization process is computed by a well-known 2D panel code named XFOIL (Ref [12]).
The use of XFOIL allows verifying the validity of the optimization approach without requiring extensive computational capabilities.
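The Bézier parameterization of the mean line and thickness distribution can be sketched with de Casteljau's algorithm, shown below. This is an assumption for illustration: the paper does not state which evaluation scheme is used.

```python
def bezier(control_points, t):
    """Evaluate a Bézier curve at parameter t in [0, 1] by repeated linear
    interpolation of the control polygon (de Casteljau's algorithm).
    Each control point is a coordinate list, e.g. [x, y]."""
    pts = [p[:] for p in control_points]          # work on a copy
    while len(pts) > 1:                           # collapse the polygon
        pts = [[(1 - t) * a + t * b for a, b in zip(p, q)]
               for p, q in zip(pts, pts[1:])]
    return pts[0]
```

An airfoil point would then combine the mean-line curve with the thickness curve, each driven by its own control points (the fixed ones listed in the optimization statement plus the free coordinates).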
Figure 3 – Airfoil parametrization
OPTIMIZATION STATEMENT
Since we fix the coordinates of the extreme control points and the abscissa of the second control point of the thickness distribution (xml,1 = 0; yml,1 = 0; xml,3 = 1; yml,3 = 0; xt,1 = 0; yt,1 = 0; xt,4 = 1; yt,4 = 0; xt,2 = 0), the optimization process is aimed at finding the five free coordinates of the Bézier curves that minimize the functions

F1 = cd    (3)

F2 = 4 − cl    (4)

subject to

cl ≥ 0.9 (F2 ≤ 3.1)    (5)

cd = F1 ≤ 0.02    (6)

where cd and cl are the drag and lift coefficients, respectively, computed for a Reynolds number of 3·10^6 without compressibility effects (low subsonic) at a fixed incidence of 8 deg. That is, given the parameterization, we look for the airfoils that minimize drag and maximize lift. The search space has been delimited as in Table 1, where the subscripts ml and t refer to the mean line and the thickness, and the MOPED parameters have been set as in Table 2, where "generations" is the maximum number of generations allowed; the algorithm can stop earlier if every individual of the current population satisfies the constraints and is non-dominated. The database has been initialized by means of a central composite design on base 2 (43 points should result, but for 5 of them XFOIL was not able to converge).

Table 1 – Search space
0.1 ≤ xml,2 ≤ 0.9
0 ≤ yml,2 ≤ 0.2
0.05 ≤ yt,2 ≤ 0.2
0.1 ≤ xt,3 ≤ 0.9
0.05 ≤ yt,3 ≤ 0.2

Table 2 – MOPED parameters
Population: 100
Generations: 150
Fitness param. (α): 0.9
Generator index (τ): 1
Constraint classes: 10
nv: 15
distmin: 0.15
Apparently trivial, this test case presents difficulties typical of more complex problems: the true Pareto front, even if mathematically unknown, appears discontinuous, and the convergence of the aerodynamic code is not guaranteed at 100% (which introduces irregularities in the objective functions).

MLP RESULTS
The validity of the proposed hybridization was first assessed on a series of test cases using the kriging approximator. The aim of those tests was to understand whether the proposed strategy is promising and to identify correct values of the nv and distmin parameters (detailed information on those tests can be found in [3]). On the basis of the results achieved with kriging approximation, we started integrating the ANN approximator into the optimization algorithm. Since the hypothesis is that a single-hidden-layer network is able to approximate any arbitrary function, the degrees of freedom of the integration are the number of neurons and the learning technique, which influence the generalization capacity of the ANN.
Figure 4 – True model evaluations: a) Bayesian approach; b) LM learning
As said before, the only learning technique allowing both good training and good generalization properties appears to be the Bayesian one. This approach requires many more computational resources but, under the hypothesis that the learning time is negligible compared to the time required to evaluate the true model, the Bayesian approach remains the best choice. The influence of the generalization properties on the Pareto front search can be seen in Figures 5 and 6, which show the evolutionary path and the final result, in terms of Pareto front approximation, when the Bayesian and the Levenberg-Marquardt (LM, which appears to be the fastest learning technique) approaches are used, respectively (one of 10 trials). Once the database had been initialized by means of the DOE, with the Bayesian approach (and a predefined error level) the optimization process starts from an initial random population with small variance, Figure 5, and the general structure of the Pareto front is detected after only 5 generations. The rest of the evolution brings the individuals towards a very good approximation of the solution after 141 evaluations of the true model (mean value over 10 trials), Figure 4.a. Things go very differently when LM is used (with the same error level). The learning process, executed at each generation, is much faster, but the resulting models have awful generalization properties, making the optimization process converge towards only one part of the front. As can be seen, the initial model is affected by noise and the initial population is extremely scattered (Figure 6). As a consequence, the optimization process is not able to identify the left part of the front, even though the number of evaluated individuals is comparable with that of the previous case (148, mean value over 10 trials), Figure 4.b.
Figure 5 – Evolution by Bayesian learning: a) initial population; b) after 5 generations; c) after 25 generations; d) last front compared with true results
Figure 6 – Evolution by LM learning: a) initial population; b) after 5 generations; c) after 25 generations; d) last front compared with true results
While we now understand that the convergence properties depend on the generalization capabilities of the approximation model, the reason why the hybrid algorithm using the MLP approximator is more efficient than the one using kriging remains unknown. Indeed, comparing the kriging results published in [3] with those presented here, the hybrid MLP algorithm requires approximately 50% fewer function evaluations than the hybrid kriging one. At the moment we have no elements to support a valid hypothesis explaining this.

CONCLUSIONS
Since many practical/industrial problems cannot be managed by evolutionary techniques because of their computational requirements, this work is a contribution towards making evolutionary algorithms more and more efficient. We propose an approach to integrate a function approximation technique within a multi-objective, constrained optimization process. To verify the validity of the approach, the hybrid algorithm was applied to a simplified aerodynamic problem. The obtained results are encouraging and demonstrate that this technique can reduce the computational cost (number of calls to the true models) by up to 90%, with both kriging and ANNs. However, even if the presented results are good, we cannot yet say that the proposed method is able to manage true, complex industrial cases. This is mainly due to the curse of dimensionality: at the moment we have no evidence from which to deduce the behavior of the hybrid algorithm when the dimensionality of the problem is much higher than that of the discussed test case. It is widely known that ANNs suffer when the dimensionality of the problem is high, so it is logical to suppose that the integration of ANNs within an optimization process can introduce some applicability limits. On the basis of these considerations, future work will be aimed at understanding the applicability limits of the proposed technique and at searching for methods to overcome those limits.

REFERENCES
[1] Farina, M.: A Neural Network Based Generalized Response Surface Multiobjective Evolutionary Algorithm. In: Proceedings of the 2002 Congress on Evolutionary Computation (CEC2002) (Editors: Fogel, D.B., El-Sharkawi, M.A., Yao, X., Greenwood, G., Iba, H., Marrow, P. and Shackleton, M.), (1). Piscataway, New Jersey, 2002, pp. 956-961.
[2] Jin, Y., Olhofer, M. and Sendhoff, B.: A Framework for Evolutionary Optimization with Approximate Fitness Functions. IEEE Transactions on Evolutionary Computation, 6(5), 2002, pp. 481-494.
[3] D'Angelo, S. and Minisci, E.: Multi-Objective Evolutionary Optimization of Subsonic Airfoils by Kriging Approximation and Evolution Control. IEEE Congress on Evolutionary Computation (CEC 2005), Edinburgh, Scotland, 2-5 September, 2005, pp. 1262-1267.
[4] Costa, M. and Minisci, E.: MOPED: a Multi-Objective Parzen-based Estimation of Distribution Algorithm. In: Proceedings of the Second International Conference on Evolutionary Multi-Criterion Optimization (EMO 2003) (Editors: Fonseca, C., Fleming, P., Zitzler, E., Deb, K. and Thiele, L.). Faro, Portugal: Springer, 2003, pp. 282-294.
[5] Khan, N., Goldberg, D.E. and Pelikan, M.: Multi-objective Bayesian Optimization Algorithm. Technical Report IlliGAL 2002009. Urbana-Champaign: University of Illinois – IlliGAL, 2002.
[6] Deb, K., Pratap, A., Agarwal, S. and Meyarivan, T.: A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2), 2002, pp. 182-197.
[7] Fukunaga, K.: Introduction to Statistical Pattern Recognition. Academic Press, 1972.
[8] Avanzini, G., Biamonti, D. and Minisci, E.A.: Minimum-Fuel/Minimum-Time Maneuvers of Formation Flying Satellites. In: Advances in the Astronautical Sciences, vol. 116, pp. 2403-2422.
[9] Jones, D.R., Schonlau, M. and Welch, W.J.: Efficient Global Optimization of Expensive Black-Box Functions. Journal of Global Optimization, 13(4), 1998, pp. 455-492.
[10] Simpson, T.W., Mauery, T.M., Korte, J.J. and Mistree, F.: Kriging Models for Global Approximation in Simulation-Based Multidisciplinary Design Optimization. AIAA Journal, 39(12), 2001, pp. 2233-2241.
[11] Lophaven, S.N., Nielsen, H.B. and Sondergaard, J.: DACE – A Matlab Kriging Toolbox. Technical University of Denmark, 2002.
[12] Drela, M.: XFOIL – Subsonic Airfoil Development System. ACDL research group webpage, MIT, 2000. URL: http://raphael.mit.edu/xfoil/.