Engineering with Computers DOI 10.1007/s00366-012-0263-0

ORIGINAL ARTICLE

Asynchronous evolutionary shape optimization based on high-quality surrogates: application to an air-conditioning duct

Balaji Raghavan · Piotr Breitkopf

Received: 21 December 2010 / Accepted: 22 March 2012
© Springer-Verlag London Limited 2012

Abstract Multi-processor HPC tools have become commonplace in industry and research. Evolutionary algorithms may be elegantly parallelized by broadcasting a whole population of designs to an array of processors in a computing cluster or grid. However, synchronization barriers cause problems: each iteration has to wait for the successful execution of all jobs of the previous generation. When other users load a cluster or a grid, individual tasks may be delayed and some may never complete, slowing down and eventually blocking the optimization process. In this paper, we extend the recent “Futures” concept to let the algorithm circumvent such situations. The idea is to use, as default cost function values, those calculated with a high-quality surrogate model, and to improve them progressively as “exact” numerical results are received. While waiting for an exact result, the algorithm continues with the approximation; when the data finally arrive, the surrogate model is updated. At convergence, the final result is not only an optimized set of designs but also a surrogate model that is precise within the neighborhood of the optimal solutions. We illustrate this approach with the cluster optimization of an A/C duct of a passenger car, using a refined legacy CFD model along with an adaptive meta-model based on Proper Orthogonal Decomposition (POD) and diffuse approximation.

Keywords Parallel computing · Genetic algorithms · Ask&Tell · Futures · Fluid mechanics

1 Introduction

Stochastic search Genetic Algorithms (GAs) [1, 2] attempt to find the optimal solution by manipulating a population of candidate solutions, evaluating the fitness of the population, and selecting the best solutions to reproduce and form the next generation. As the generations proceed, the individuals with the highest fitness dominate the population, and the quality of the population tends to increase. These algorithms are attractive for multi-modal and multi-objective optimization problems [3, 4] since, rather than a single optimum, they report a set of solutions corresponding to local minima or to Pareto points. Since GAs do not use gradient information, they are easily implemented when a non-intrusive optimization strategy is desired. As the overall cost of an optimization may be approximated by the cost of a function evaluation multiplied by the number of function calls, GAs have two major shortcomings when applied to structural optimization problems:

• the cost of evaluating the fitness of an individual with a high-fidelity numerical model with several thousand degrees of freedom (finite elements, finite volumes, etc.);
• the number of function evaluations, which results both from the size of the population and from the number of generations required to converge.

B. Raghavan · P. Breitkopf (✉)
Laboratoire Roberval, UTC-CNRS (UMR 7337), Labex MS2T, Université de Technologie de Compiègne, Compiègne, France
e-mail: [email protected]

B. Raghavan
e-mail: [email protected]

In the present work, the first issue is addressed by building a dedicated, low-cost and adaptive counterpart (surrogate) of the numerical model. The second issue is addressed by proposing an asynchronous parallel version of the GA that uses high-fidelity simulation results when they are available and otherwise continues with the surrogate model. As the optimization advances, the high-fidelity solutions accumulate and are used to progressively refine the surrogate model. At convergence, the final result is an optimized set of designs and a finely tuned surrogate model that is precise within the neighborhood of the optimal solutions.

Surrogate functions and reduced-order models are well established; they have been used mostly in control systems, to reduce the order of the overall transfer function [5]. They can approximate the objective functions in a fraction of the computational time of the high-fidelity model [6]. Lim et al. [7] provided a generalization of surrogate-assisted evolutionary frameworks for computationally expensive problems. Surrogate-assisted evolutionary optimization typically proceeds in cycles consisting of: collection and analysis of a number of designs, fitting of a surrogate to these designs, optimization based on the surrogate, and exact analysis of the final solution. More recently, Queipo et al. [8] addressed Gaussian process surrogate-based optimization and looked for a statistically rigorous procedure that would allow the user to determine the number of surrogate-based optimization (SBO) cycles needed for a given problem, while Viana et al. [9] investigated the cross-validation error of a set of surrogates in order to select the most accurate one. The ParEGO [10] extension of the Efficient Global Optimization (EGO) approach [11] uses an initialization procedure inspired by design of experiments and learns a Gaussian process model of the search landscape, which is updated after every function evaluation. We are interested here in high-quality surrogates obtained with Proper Orthogonal Decomposition (POD) [12]. Filomeno Coelho et al. [13] used a model reduction approach combining kriging and POD to optimize the intake port of a car engine.
Filomeno Coelho et al. [14] then proposed a bi-level model reduction strategy to reduce the interdependence between the fluid and structural models, and hence the computation time, for a 2D wing demonstrator, and later extended the approach to the optimization of a 3D flexible wing, while Xiao et al. [15] proposed a constrained POD variant for the optimization of a car engine intake port. All these approaches used a variant of a GA operating directly on a priori constructed global surrogate functions, without calling the high-fidelity numerical model. While this approach is extremely fast, it can be risky, since the engineer relies entirely on the surrogate function for the final result(s).

A distributed/parallel GA implementation is possible since the fitness of an individual can be evaluated independently of the other individuals; the computation and the search strategy are thus independent of the computing sequence within a single generation. In the master–slave scheme, communication is necessary only when the slaves receive their individuals and when they return the function values. Bethke [16] performed the first study of parallel/distributed implementations of GAs and compared their parallel performance with that of gradient-based optimizers. He also identified the bottlenecks that limit the efficiency of a gradient-based approach. Grefenstette [17] proposed master–slave paradigms and the multiple-population approach with a migration scheme. The next subclass of parallel GAs is multi-deme GAs with multiple populations [18]. Most parallel GA implementations take one or more serial GAs, run each on a separate node, exchange individuals at predetermined times, and perform exact (numerical model-based) evaluations. This, however, can become a bottleneck on a busy cluster, since delayed function evaluations may lead to deadlocks. The need for asynchronous parallelism arises particularly when the high-fidelity model runs on a remote computing cluster shared by hundreds of other projects, as in a large research laboratory or technical organization, where scheduled jobs are handled according to their priority, the load on the cluster, and the available resources. On a computing grid, node failures and delayed or dropped jobs are possible and must therefore be factored in when planning projects. Tsutsui and Wu et al. [19, 20] investigated asynchronous evolutionary algorithms on a multi-core cluster. Clearly, a “mixed” approach that allows asynchronous computation using both numerical and surrogate calls gives the design engineer much more control and flexibility in trading precision against computational time and power, the cluster load being a factor outside the user’s control.
The literature surveyed shows little research on asynchronous surrogate-assisted evolutionary algorithms. Regis et al. [21] used a parallel stochastic radial basis function algorithm for the surrogate and an asynchronous global pattern search on up to eight processors, and Asouti [22] used Artificial Neural Network (ANN)-based meta-models to improve the quality of the surrogate compared to existing low-precision meta-models. In this paper, we propose an “Ask&Future Tell” parallel paradigm for the competitive use of high-fidelity and reduced models, specifically designed for cluster/grid environments, in which asynchrony is managed by a time-out scheme. We give a “non-intrusive” algorithm for massively parallel, asynchronous, surrogate-based evolutionary optimization that tolerates an existing cluster load and limited control over the recovery of the numerical results computed on the slave nodes. We illustrate the proposed approach on the multi-objective optimization of a vehicle air-conditioning duct.


The paper is organized into four sections. In Sect. 2 we describe the proposed algorithm. In Sect. 3 we treat the test case, i.e., the air-conditioning duct geometry and CFD model, together with the CPOD-based meta-model used as a surrogate, and discuss the optimization results. Finally, in Sect. 4, we discuss the perspectives and limitations of the proposed approach.

Fig. 2 Sequential simulator block

2 Paradigms for parallel evolutionary optimization

Consider a simple GA (Fig. 1). Each iteration involves two distinct phases: evaluation of the population fitness values and recombination of individuals to produce the new population. Figure 2 shows the population evaluation in serial mode, where the optimization algorithm and all function evaluations are performed iteratively on a single node. The simulator block is an embarrassingly parallel task and may be performed on a remote cluster, with a master node dedicated to the optimization (GA operations), as shown in Fig. 3, typically using a Single Program, Multiple Data (SPMD) block. This process takes advantage of the natural scalability of a GA, and its result is strictly identical to that of the sequential version because the function evaluations are independent of their order. However, the maximal theoretical speedup is limited by the size of the population: when more processors are available than individuals to be evaluated, some processors stay idle. Moreover, when the remote cluster is heavily used by competing users working on multiple projects, individual jobs may be delayed (or even dropped); since the GA needs the evaluations of the whole population before it can continue, the overall execution time is dictated by the most heavily loaded, and thus slowest, node.
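The synchronous SPMD evaluation of Fig. 3 can be sketched with Python's standard `concurrent.futures` module. This is a minimal illustration, not the authors' implementation: a local thread pool stands in for the remote cluster, and `evaluate_population_sync` and the toy `fitness` are hypothetical names. Because `Executor.map` returns results in submission order, the outcome is identical to a sequential loop, and the call returns only after the slowest evaluation has finished, which is exactly the synchronization barrier.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def evaluate_population_sync(
    population: Sequence[T],
    fitness: Callable[[T], float],
    max_workers: int = 4,
) -> List[float]:
    """Evaluate all individuals in parallel and block until every job is done.

    The thread pool stands in for the remote cluster (an assumption for
    illustration). Executor.map yields results in submission order, so the
    returned list matches a sequential evaluation; list() blocks until the
    slowest evaluation completes (the synchronization barrier of Fig. 3).
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fitness, population))


if __name__ == "__main__":
    # Hypothetical toy cost function in place of the CFD model.
    print(evaluate_population_sync([1.0, 2.0, 3.0], lambda x: (x - 2.0) ** 2))  # [1.0, 0.0, 1.0]
```

Whatever the number of workers, one delayed job holds back the whole generation, which motivates the time-out scheme below.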


Fig. 3 Parallel SPMD implementation of simulator block

This phenomenon is represented by the synchronization barrier shown as a horizontal dotted line in Fig. 3.

2.1 “Ask&Tell” synchronous paradigm

Communication with the optimizer may be formally described with the Ask&Tell programming pattern [23] applied to the optimizer/numerical model interface. In simplest terms, “Ask&Tell” is a communication concept that replaces the function call construct in an object-oriented parallel environment. Since the numerical analysis is performed (or queued) on a remote cluster, the master node that needs the function evaluation issues an “Ask” message to probe the result of the analysis, and the remote cluster uses a “Tell” message to inform the master of the result when it is available (Fig. 4). By itself, the Ask&Tell paradigm based on simple synchronous messages is not sufficient to overcome the synchronization barrier. We also need a paradigm for time-out messages, which permits the algorithm to continue by replacing missing function values with reasonable estimates.

Fig. 1 Basic GA simulator and optimizer blocks (initial population → simulator block → optimizer block: selection, recombination, mutation, dominance → updated population)

Fig. 4 The Ask&Tell paradigm

2.2 “Ask&Future Tell” asynchronous paradigm

The concept of “Futures”, or wait-by-necessity [24], permits an algorithm to advance with missing data “marked for future updating”. We propose here to extend the original idea by giving a physical meaning to the “Futures” paradigm. The extended concept may be stated as “use the approximate response and mark it for future updating”. The surrogate model is evaluated by default on all individuals before they are sent to the exact model for “precise” evaluation. At each new generation of N individuals, two cases arise:

• k ≤ N individuals complete within the specified time-out: the exact function values {y_1, …, y_k} are “told” to the optimizer, and the errors of the initial estimates are used to update the surrogate model;
• N − k ≥ 0 computations time out: the “Futures” tag is set for the processes continuing in the background, while the surrogate values {y_{k+1}, …, y_N} are “told” to the optimizer.
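The two cases above can be sketched with `concurrent.futures.wait` and its `timeout` argument. This is a minimal sketch under stated assumptions, not the authors' code: a local `ThreadPoolExecutor` stands in for the cluster scheduler, and `ask_future_tell`, `collect_futures`, `exact_model` and `surrogate` are hypothetical names. Every individual first receives a surrogate estimate; exact jobs that finish within the time-out overwrite their estimate, while the others keep running in the background and are tagged as Futures, to be collected later and used to update the surrogate.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor, wait
from typing import Callable, Dict, List, Sequence, Tuple


def ask_future_tell(
    pool: ThreadPoolExecutor,
    population: Sequence[float],
    exact_model: Callable[[float], float],
    surrogate: Callable[[float], float],
    timeout: float,
    pending: Dict[Future, float],
) -> Tuple[List[float], List[int]]:
    """One generation of the time-out scheme (sketch).

    Returns the values "told" to the optimizer and the indices of the
    individuals that received exact values. Timed-out jobs are not cancelled:
    they are stored in `pending` (the "Futures" tag) and keep running.
    """
    values = [surrogate(x) for x in population]  # default: surrogate estimate
    jobs = {pool.submit(exact_model, x): i for i, x in enumerate(population)}
    done, not_done = wait(jobs, timeout=timeout)
    for fut in done:  # k <= N exact results arrived in time
        values[jobs[fut]] = fut.result()
    for fut in not_done:  # N - k jobs continue in the background
        pending[fut] = population[jobs[fut]]
    return values, sorted(jobs[f] for f in done)


def collect_futures(pending: Dict[Future, float], block: bool = False) -> List[Tuple[float, float]]:
    """Pop finished background jobs as (individual, exact value) pairs.

    In the paper's scheme these late exact values update the surrogate
    model; that update is left to the caller in this sketch.
    """
    if block:
        wait(list(pending))
    finished = [f for f in pending if f.done()]
    return [(pending.pop(f), f.result()) for f in finished]


if __name__ == "__main__":
    gate = threading.Event()  # holds back one "slow" job deterministically

    def exact_model(x: float) -> float:
        if x == 2.0:
            gate.wait()
        return 10.0 * x

    futures_tag: Dict[Future, float] = {}
    with ThreadPoolExecutor(max_workers=3) as pool:
        told, exact_idx = ask_future_tell(pool, [1.0, 2.0, 3.0], exact_model, lambda x: -1.0, 0.5, futures_tag)
        print(told, exact_idx)  # [10.0, -1.0, 30.0] [0, 2]
        gate.set()
        print(collect_futures(futures_tag, block=True))  # [(2.0, 20.0)]
```

The pool is kept alive across generations on purpose, so that a timed-out job is never lost: it is simply harvested by a later call to `collect_futures`.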

The synchronization barrier in Fig. 3 is thus replaced by a user-specified result time-out (Fig. 5). Once the numerical result is sent back from the remote cluster, the surrogate is updated based on the error criterion. Figure 5 shows the “Ask&Future Tell” concept in an asynchronous surrogate-assisted GA. The key parameter is the time-out value, which has to be finely tuned during the process. For the initial generation, the time-out is typically set to around 1.1 times the ideal time taken by a single evaluation, i.e., on a dedicated node. In the ideal (and unrealistic) case of a dedicated, “zero-load” remote cluster, all evaluations complete before the time-out, and the optimization proceeds with exact values instead of surrogate values. The time-out value decreases progressively during the optimization as the surrogate quality improves with the accumulated “exact” values.
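The text fixes only the starting point of the time-out (about 1.1 times the dedicated-node evaluation time) and states that it decreases as the surrogate improves; the decrease law itself is not specified. The sketch below assumes, purely for illustration, a geometric decay with a hypothetical factor `decay` and a lower bound `floor_fraction` of the ideal time. With the 90 s dedicated-node CFD time reported in Sect. 3.6, the first time-out is about 99 s.

```python
from typing import List


def timeout_schedule(t_ideal: float, generations: int,
                     decay: float = 0.9, floor_fraction: float = 0.1) -> List[float]:
    """Time-out per generation in seconds, under a hypothetical decreasing law.

    Generation 0 uses 1.1 * t_ideal, as suggested in the text; later values
    shrink geometrically by `decay` (an assumption) but never fall below
    floor_fraction * t_ideal (also an assumption), so that some exact
    evaluations can still complete and feed the surrogate.
    """
    t0 = 1.1 * t_ideal
    floor = floor_fraction * t_ideal
    return [max(t0 * decay ** g, floor) for g in range(generations)]


if __name__ == "__main__":
    print([round(t, 1) for t in timeout_schedule(90.0, 5)])
```

A schedule driven by the measured surrogate error, rather than by the generation count, would follow the text's reasoning more closely; the geometric form is only the simplest monotone choice.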

Fig. 5 ‘‘Ask&Future Tell’’


Fig. 6 A/C duct shape with inlet and outlet sections

3 Performance optimization of an air-conditioning duct

We demonstrate the proposed approach by optimizing the geometry of an A/C duct (Fig. 6) of a passenger car [25] in order to maximize its performance, characterized by the permeability (related to the head loss over the duct length), the uniformity of the flow at the exit, and the total duct volume. We set up a CFD grid with 39,000 grid points and 17,250 hexahedral cells (Fig. 7). The physical domain is split into 23 blocks for meshing. Blocks 13–22 (Fig. 7), in the curved portion, have a higher mesh density near the sharper portions of the curves. The cell locations are constant in the fixed blocks 1 and 23, while the cell locations in blocks 2–22 change with each choice of design parameters, although the number of cells, their arrangement, connectivity and mesh density stay the same.

Fig. 7 CFD meshing blocks and cells

Since the Reynolds number is typically low in this situation, we assume incompressible 2D laminar airflow. We use the OpenFOAM [26] CFD model to solve the Navier–Stokes equations in 2D. The boundary conditions prescribe zero flow velocity along the walls and atmospheric pressure at the duct outlet. The CFD analysis is run for 500 iterations to ensure convergence, yielding pressure and velocity fields at the midpoint of each hexahedral cell. The permeability and the exit flow uniformity are evaluated directly from the pressure/velocity fields, while the duct volume is easily calculated from the cell coordinates.
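The post-processing step just described can be sketched in NumPy for the three objectives F1–F3 defined in Eq. (1) of Sect. 3.1. Assumptions not stated in the text: P_A and P_B are taken as plain arithmetic means of the cell pressures in the inlet and outlet sections, ‖V_mean‖ is the norm of the mean outlet velocity vector, and the duct "volume" of the 2D model is the sum of the cell areas; `objectives` and its argument names are hypothetical.

```python
import numpy as np


def objectives(p_inlet, p_outlet, v_outlet, cell_volumes):
    """Return (F1, F2, F3) of Eq. (1); all three are to be maximised.

    p_inlet, p_outlet: cell pressures in sections A and B (1D arrays).
    v_outlet: (n_B, 2) array of outlet cell velocities (Ux, Uy).
    cell_volumes: volumes (areas in 2D) of all cells of the duct.
    """
    f1 = 1.0 / (np.mean(p_inlet) - np.mean(p_outlet))    # permeability
    speed = np.linalg.norm(v_outlet, axis=1)              # |V_i| per outlet cell
    v_mean = np.linalg.norm(np.mean(v_outlet, axis=0))    # |V_mean| (assumed definition)
    f2 = 1.0 / np.mean((speed - v_mean) ** 2)             # 1 / (sum(...)^2 / n_B)
    f3 = 1.0 / np.sum(cell_volumes)                       # 1 / duct volume
    return f1, f2, f3
```

For a perfectly uniform exit flow the denominator of F2 vanishes and NumPy returns `inf` with a warning; a practical implementation would probably add a small regularizing constant.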

3.1 Optimization problem statement

The shape optimization problem with three objective functions, namely the permeability F1 (the inverse of the pressure drop), the output flow uniformity F2, expressed by the standard deviation of velocities, and the duct volume F3, may be stated as follows:

$$
\text{Find } X_{\text{opt}} = \arg\max_{X} \left\{ F_1(X),\; F_2(X),\; F_3(X) \right\}, \quad X = \{X_1, \dots, X_{13}\}, \quad LB \le X \le UB, \tag{1}
$$

with

$$
F_1 = \frac{1}{P_A - P_B}, \qquad
F_2 = \frac{1}{\sum_{i \in B} \left( \lVert V_i \rVert - \lVert V_{\text{mean}} \rVert \right)^2 / n_B}, \qquad
F_3 = \frac{1}{\text{duct volume}},
$$

where A is the inlet section, B the outlet section, n_B the number of outlet cells, and UB and LB are the geometric bounds on the design variables X. The pressure and velocity fields P and V are calculated using CFD.

3.2 Shape parameterization

The inlet and outlet portions of the air-conditioning duct have fixed geometries, while the middle portion allows the shape, and thus the performance, of the duct to be modified. The 2D section (Fig. 8) is completely described by the relative positions of points P1–P11. Positions of P1–P4 and P9–P11 are assumed fixed, and P5–P8 are obtained by geometric constructions. Parameters X1–X5 locate points P5–P8, while the additional parameters a1–a4 and b1–b4 define Bezier curves passing through these points, which trace out the curved portion of the duct. The 13 design variables are thus X1, X2, X3, X4, X5 and X6 = a1, X7 = b1, X8 = a2, X9 = b2, X10 = a3, X11 = b3, X12 = a4, X13 = b4. Upper and lower bounds on these 13 parameters are set in order to retain the laminar flow hypothesis and guarantee a realistic shape.

3.3 Bi-level surrogate model

Physics-based meta-models take advantage of the information contained in the pressure and velocity fields of the high-fidelity model. The model reduction is a two-stage POD with customized coefficients that conserve previously chosen cumulative variables after truncation.

3.3.1 First level: constrained POD

Proper orthogonal decomposition works on a vector basis obtained from a rectangular correlation matrix built from a series of snapshots; it captures the physics with a limited set of scalar coefficients and is several orders of magnitude faster than a CFD simulation. The initial set of snapshots is composed of the velocity/pressure fields of the first generation. As the GA progresses, we add successive high-fidelity velocity/pressure fields to the snapshot matrix S. We obtain three snapshot matrices, for P, Ux and Uy, each composed of M snapshot vectors of length N equal to the number of sampling points (shown for P; the other two field variables are treated similarly):

$$
S = \begin{bmatrix}
P^1_1 - \bar P_1 & P^2_1 - \bar P_1 & \cdots & P^M_1 - \bar P_1 \\
\vdots & \vdots & & \vdots \\
P^1_N - \bar P_N & P^2_N - \bar P_N & \cdots & P^M_N - \bar P_N
\end{bmatrix}. \tag{2}
$$

It must be noted that the region of space occupied by the structural geometry changes with the design variables. We therefore interpolate the field values at the grid points onto a fixed reference grid, using finite elements [27] or Delaunay triangulation [28]. We first calculate the covariance matrix C_P for each field variable (only P is shown here):

$$
C_P = S S^T, \tag{3}
$$

which allows us to express the three field vectors in terms of the eigenvectors Φ^P = [φ^P_1, φ^P_2, …, φ^P_M] of C_P:

$$
P^i = \bar P + \sum_{j=1}^{M} a_{ij}\, \phi^P_j. \tag{4}
$$

Fig. 8 2D section and geometric construction of curved portion of the duct

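We next truncate the basis to the m most active modes,

$$
\tilde P = \bar P + \sum_{i=1}^{m} a_i\, \phi^P_i \quad (m \ll M),
$$

with the relative projection error

$$
\varepsilon(m) = 1 - \frac{\sum_{i=1}^{m} \lambda_i}{\sum_{i=1}^{M} \lambda_i}, \tag{5}
$$

where λ_1 ≥ λ_2 ≥ … ≥ λ_M are the eigenvalues of C_P.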
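Eqs. (2)–(5) can be sketched in NumPy. Rather than forming C_P = S S^T explicitly (an N × N matrix), the sketch takes the thin singular value decomposition of S: its left singular vectors are the eigenvectors φ_i of C_P, and the squared singular values are the eigenvalues λ_i, already sorted in decreasing order. Choosing m as the smallest number of modes with ε(m) below a tolerance `eps` is an assumed truncation rule, and `pod_basis` is a hypothetical name.

```python
import numpy as np


def pod_basis(snapshots, eps=1e-3):
    """Truncated POD basis of one field variable (sketch).

    snapshots: (N, M) array, one snapshot field (e.g. P) per column.
    Returns the mean field (N,), the basis Phi_m (N, m) and all eigenvalues.
    """
    mean = snapshots.mean(axis=1, keepdims=True)
    S = snapshots - mean                                  # centred matrix, Eq. (2)
    U, sigma, _ = np.linalg.svd(S, full_matrices=False)
    lam = sigma ** 2                                      # eigenvalues of C_P = S S^T, Eq. (3)
    err = 1.0 - np.cumsum(lam) / lam.sum()                # epsilon(m), Eq. (5), m = 1..M
    m = int(np.argmax(err <= eps)) + 1                    # smallest m with epsilon(m) <= eps
    return mean[:, 0], U[:, :m], lam
```

The coefficients of Eq. (4) for a snapshot P are then the projections a = Φ_m^T (P − P̄), which is the unconstrained counterpart of Eq. (7).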

However, additional constraints are needed for the truncated POD approximation to conserve cumulative quantities (such as the total fluid flow) over the approximated field P̃_i:

$$
W^T \tilde P_i = c^P_i, \tag{6}
$$

where W^T stores the interpolation and integration coefficients over the reference grid and c^P_i is the value integrated over the original snapshot. We have shown [15] that, for a given basis truncated to m vectors Φ_m, the coefficients a (b and c for the other two variables) are obtained from

$$
\begin{bmatrix} \Phi_m^T M \Phi_m & \Phi_m^T W \\ W^T \Phi_m & 0 \end{bmatrix}
\begin{Bmatrix} a^{(k)} \\ \lambda \end{Bmatrix}
=
\begin{Bmatrix} \Phi_m^T M \left( P^{(k)} - \bar P \right) \\ c^{(k)} - W^T \bar P \end{Bmatrix}, \tag{7}
$$

where λ holds the Lagrange multipliers of the constraints. This permits us to reconstruct the snapshots while preserving global quantities of interest (objective functions). In order to obtain fields for arbitrary values of the design variables, we use an approximation scheme.

3.3.2 Second level: diffuse approximation of POD coefficients

The next step is to express the coefficients a, b and c as functions of the design variables X = {X_1, …, X_13}, using the diffuse approximation method [29], chosen for its adaptivity and its ability to capture local effects:

$$
a(X) \approx p^T(X)\, \hat a(\bar X), \tag{8}
$$

$$
p^T(X) = \begin{bmatrix} 1 & X_1 & X_2 & \cdots & X_1^2 & X_1 X_2 & \cdots \end{bmatrix}. \tag{9}
$$
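The two levels can be sketched in NumPy: `constrained_coefficients` solves the saddle-point system of Eq. (7) for one snapshot, and `diffuse_approximation` evaluates Eqs. (8)–(11) for one POD coefficient at a new design by a weighted least-squares fit, using the quadratic basis of Eq. (9) and the Gaussian weight of Eq. (11). This is an illustrative sketch rather than the authors' implementation: the function names are hypothetical, the matrix M of Eq. (7) defaults to the identity (an assumption), and `numpy.linalg.lstsq` is used so that the fit still returns a least-norm solution when there are fewer effectively weighted snapshots than basis terms.

```python
import numpy as np


def constrained_coefficients(P, P_mean, phi, W, c, M=None):
    """Solve Eq. (7): POD coefficients a such that W^T (P_mean + phi a) = c.

    P, P_mean: (N,) snapshot and mean field; phi: (N, m) truncated basis;
    W: (N, q) integration weights; c: (q,) integrated values of the snapshot;
    M: (N, N) weighting matrix, identity if omitted (an assumption).
    """
    N, m = phi.shape
    q = W.shape[1]
    M = np.eye(N) if M is None else M
    K = np.block([[phi.T @ M @ phi, phi.T @ W],
                  [W.T @ phi, np.zeros((q, q))]])
    rhs = np.concatenate([phi.T @ M @ (P - P_mean), c - W.T @ P_mean])
    sol = np.linalg.solve(K, rhs)
    return sol[:m]  # sol[m:] holds the Lagrange multipliers


def quadratic_basis(x):
    """p(X) of Eq. (9): [1, X_1..X_d, X_i X_j for i <= j]."""
    x = np.asarray(x, dtype=float)
    cross = [x[i] * x[j] for i in range(x.size) for j in range(i, x.size)]
    return np.concatenate(([1.0], x, cross))


def diffuse_approximation(X_data, a_data, x_eval):
    """Approximate one POD coefficient at x_eval from (X_i, a_i) pairs."""
    X_data = np.asarray(X_data, dtype=float)
    x_eval = np.asarray(x_eval, dtype=float)
    w = np.exp(-0.25 * np.sum((X_data - x_eval) ** 2, axis=1))  # Eq. (11)
    B = np.array([quadratic_basis(x) for x in X_data])          # rows p^T(X_i)
    sw = np.sqrt(w)
    # Minimising the weighted functional of Eq. (10) is a weighted least-squares problem.
    a_hat, *_ = np.linalg.lstsq(sw[:, None] * B, sw * np.asarray(a_data, dtype=float), rcond=None)
    return quadratic_basis(x_eval) @ a_hat                      # Eq. (8) at X = x_eval
```

With 13 design variables the full quadratic basis has 105 terms, so a linear basis may be preferable until enough snapshots are available; the weight width 0.25 is taken from Eq. (11).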

The coefficients â(X̄) are chosen to minimize the functional

$$
J_{\bar X}(\hat a) = \frac{1}{2} \sum_{i=1}^{M} w_i(X_i, \bar X) \left( p^T(X_i)\, \hat a - a_i \right)^2, \tag{10}
$$

where w_i(X_i, X̄) is the weighting function of the moving least-squares (MLS) approximation, e.g., the radial (Gaussian) function

$$
w_i(X_i, \bar X) = \exp\left( -0.25 \lVert X_i - \bar X \rVert^2 \right). \tag{11}
$$

The two steps, POD and diffuse approximation, constitute a bi-level surrogate model for the airflow within the duct, which lets us approximate the air-duct pressure and velocity fields by

$$
\tilde P(\bar X) = \bar P + \sum_{i=1}^{m} a_i(\bar X)\, \phi^P_i. \tag{12}
$$

3.3.3 Adaptive learning strategy

The learning strategy to refine and improve the surrogate generates new snapshots around the Pareto (dominant) solutions and removes duplicates as well as weak points far from the Pareto front, in order to increase the approximation precision in the local neighborhood of the dominant solutions (the first k snapshots in the snapshot matrix are replaced by more dominant points closer to the Pareto front):

$$
S = \begin{bmatrix} S_1 & S_2 & \cdots & S_M \end{bmatrix} \;\rightarrow\; \begin{bmatrix} S_k & \cdots & S_N & S_{N+1} & \cdots & S_{M+k-1} \end{bmatrix}.
$$

If we observe distinct groups indicating a split of the population (two different local optima), then the design space can also be split, giving different bases φ^P_i, φ^{Ux}_i, φ^{Uy}_i for the different local optima, based on the adaptive POD of Ryckelynck [30].

3.4 Master–slave implementation on a cluster

Our implementation of the proposed model queues the CFD function calls on a 160-processor remote cluster with close to 40 competing users working on unrelated projects. Since there are three objectives, we used a MOGA [4] approach with single-point crossover, mutation and niching to locate dominant solutions using the asynchronous population evaluation (Sect. 2). We implement niching by removing “clones” of the most dominant individuals and replacing them with freshly sampled points in the neighborhood of the dominant solutions. Figure 9 shows the asynchronous implementation in practice, with the CFD results obtained using OpenFOAM on the loaded cluster. The processors are split into two groups: MATLAB/Scilab runs on the group A processors (master nodes), while the group B processors (remote slaves) receive the CFD analysis requests, to be run with OpenFOAM.

The master nodes launch the CFD jobs at each generation; create snapshots, calculate surrogates and refine the database; have “Ask” access to the CFD values; write surrogate values to a separate database; and perform the GA operations. The slave nodes receive CFD requests for individuals, post-process the CFD results to evaluate the objective functions, and have “Tell” access to the CFD values database but no access to the surrogate database.

Thus the master nodes submit jobs to the slaves through the scheduler, so that the slaves can eventually “write” their output to the database. The masters continue the optimization with the surrogate values without waiting for the CFD results. Convergence is achieved when we have obtained a satisfactory number of non-dominated and validated solution designs.

Fig. 9 Cluster implementation of asynchronous GA

3.5 Numerical results and discussion

While testing the presented approach on a cluster, the important features were:

• the non-intrusive nature of the protocol, which allows the optimization to continue in a stable manner independently of the CAD/CFD, despite system crashes due to CAD failures, server load and outages;
• the quality of the optimal solution(s) obtained, as verified by exact CFD calculations;
• the observed utilization of the allotted computing resources, with no idle time for any of the processors.

The mixed approach ensured zero idle time for the group A processors, beyond waiting for the database to reach a sufficient initial size to use the meta-model, thus ensuring a completely asynchronous execution. The accuracy of the CPOD surrogate improved quickly with the database size. The population diversity had to be increased (using niching and additional sampling) during group B server outages/overloads in order to prevent premature convergence on a less accurate estimate. With this, the optimization process could continue, could be halted at any time depending on need, and performed well in the face of failures, because the databases are written as external files rather than internal variables. Figure 10 shows the evolution of the design points plotted in the 3D objective function space (permeability vs. flow uniformity vs. duct volume) as the multi-objective genetic algorithm proceeds from the first to later generations. Figure 11 shows the individual objectives plotted against each other. It bears mentioning that the first two objectives (permeability and flow uniformity at the duct exit) are not mutually opposing. As expected, the population gradually improves with respect to the objective functions, even though the graph may change as the “Futures” protocol replaces surrogate values with CFD-calculated values.

Fig. 10 Evolution of the population on objective functions space

Fig. 11 Generation 20: F1 and F2 non-competing; F3 competes with F1 and F2; the point A on the Pareto set is chosen for further inspection (Fig. 12)

Figure 12 shows the pressure distribution and the flow field (streamlines) in one of the air-conditioning duct geometries obtained after 40 generations (point A in Fig. 11). As can be seen, the pressure tapers off over the length of the duct, eventually reaching atmospheric pressure (zero gauge pressure); the pressure drop is considerably sharp and directly relates to performance. The velocity distribution shows the reason for the choice of the second objective function, i.e., uniformity of flow velocity at the exit of the duct, which is related to noise control. We note that circulation can be seen next to the curved surfaces of the duct; since these individuals still involve a trade-off between duct volume and permeability/flow uniformity, the velocity fields show some circulation in two distinct regions.

Fig. 12 Pressure distribution and flow streamlines for a Pareto point A

3.6 Performance and parallel efficiency
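Both the learning strategy of Sect. 3.3.3 (keep snapshots near the Pareto front, drop duplicates) and the selection of Pareto points such as A rest on a dominance test. A minimal NumPy sketch for the three maximized objectives of Eq. (1) follows; `non_dominated` and `pareto_front` are hypothetical helper names, and the pairwise comparison, quadratic in the population size, is chosen for clarity rather than speed.

```python
import numpy as np


def non_dominated(F):
    """Boolean mask of rows of F not dominated by any other row (maximisation)."""
    F = np.asarray(F, dtype=float)
    mask = np.ones(len(F), dtype=bool)
    for i in range(len(F)):
        # Row j dominates row i if it is >= in every objective and > in at least one.
        dominates_i = np.all(F >= F[i], axis=1) & np.any(F > F[i], axis=1)
        mask[i] = not dominates_i.any()
    return mask


def pareto_front(F):
    """Indices of non-dominated designs, keeping one copy of duplicate rows."""
    F = np.asarray(F, dtype=float)
    _, first = np.unique(F, axis=0, return_index=True)  # equal rows never dominate each other
    keep = np.sort(first)
    return keep[non_dominated(F[keep])]
```

Duplicates are removed before the dominance test because identical designs ("clones") never dominate one another and would otherwise all survive as Pareto points.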


In the general approach covered by this article, it is impossible to obtain a quantitative estimate of parallel efficiency, since the actual computation time depends on the existing load on the remote cluster. In our particular example, the full CFD model required 90 s on a dedicated remote node, while the CPOD-based surrogate functions required less than 5 s on a master node (identical to the remote nodes). That said, the actual course of the optimization depends strongly on the exact state/load of the remote cluster, which evolves constantly with the number of users submitting jobs; a general statement about the actual efficiency, as made in previous related works, is therefore not possible. We have made some comparisons over one hour of MOGA elapsed time for three different remote server states: “zero-load” (dedicated server), “normal-load” (∼30 competing users) and “full-load” (remote server unable to process CFD requests). The program was run with 100 % CFD (no surrogate assistance) and with a mixed approach, i.e., CFD and surrogate with time-out, and the results are compared in Fig. 13. The reference (green) curve is obtained for zero load and a constant 100-s time-out value. Here, the proposed approach reduces to the synchronous “Ask&Tell” GA implementation for a dedicated remote cluster, since all exact evaluations complete within roughly 90 s. The red curve shows the degradation of performance of the synchronous “Ask&Tell” GA implementation (without surrogates) on a loaded cluster. Only two generations are performed within the first hour, as a single individual taking 40 min is enough to block the progression. The blue curve illustrates the proposed asynchronous “Ask&Future Tell” approach. We notice that the iterations are initially performed within the same 100-s time-out, which then decreases, giving more frequent updates than the reference curve. Due to the poor quality of the initial surrogate, convergence is slower at the beginning than that of the synchronous counterpart, and it improves over time. The comparison is hindered by the fact that the asynchronous version is not deterministic, as the order of operations changes between generations. It is, however, easily observed that within one hour of elapsed time the proposed algorithm gives objective function values close to the reference and largely outperforms the synchronous implementation in this “real life” case.

Fig. 13 Elapsed-time convergence of synchronous SPMD GA on a dedicated cluster compared with “real life” and with the proposed “Ask&Future Tell” approach performance on a loaded cluster
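The qualitative difference between the red and blue curves can be illustrated with a deliberately crude elapsed-time model: a synchronous generation lasts as long as its slowest job, whereas a generation with a time-out lasts at most the time-out plus the surrogate cost. The job times used here are hypothetical (nineteen 90-s jobs and one 40-min straggler), a constant job-time profile per generation is assumed, and the model ignores that timed-out jobs keep occupying slave nodes; it illustrates the mechanism and does not reproduce Fig. 13. The function names are hypothetical.

```python
from typing import Optional, Sequence


def generation_time(job_times: Sequence[float], timeout: Optional[float] = None,
                    overhead: float = 0.0) -> float:
    """Wall-clock seconds for one generation whose jobs all start together."""
    slowest = max(job_times)
    wait_time = slowest if timeout is None else min(slowest, timeout)
    return wait_time + overhead


def generations_within(budget: float, job_times: Sequence[float],
                       timeout: Optional[float] = None, overhead: float = 0.0) -> int:
    """Complete generations that fit in `budget` seconds (same profile each time)."""
    return int(budget // generation_time(job_times, timeout, overhead))


if __name__ == "__main__":
    jobs = [90.0] * 19 + [2400.0]  # hypothetical: one 40-min straggler
    print(generations_within(3600.0, jobs))                               # 1 (synchronous)
    print(generations_within(3600.0, jobs, timeout=100.0, overhead=5.0))  # 34 (with time-out)
```

The 5 s overhead stands for the surrogate evaluation time quoted above, taken as an upper bound.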

4 Conclusions

In this paper, we presented and tested a unified, non-intrusive approach to solving multi-objective optimization problems on a busy grid/cluster, avoiding deadlocks and node failures. This is made possible by a new “Ask&Future Tell” parallel paradigm combining an asynchronous approach with high-fidelity simulations and a custom-built adaptive bi-level surrogate meta-model. It allows the high-fidelity simulations to proceed at their own pace on a remote cluster and, once their results are available, to enrich the database of stored results, ensuring and improving surrogate quality while the optimization algorithm proceeds with approximate values. The asynchronous approach presented should work with any surrogate meta-model that produces quality estimates, and on any type of cluster/grid. For efficient use, several parameters need closer investigation, such as the time-out evolution, which conditions the quality of the learning process. In the current work, jobs complete largely in FIFO (First In, First Out) order. A second-level time-out may be necessary for deleting heavily delayed jobs corresponding to design points far from the current Pareto set, the ideal learning strategy clearly being LIFO (Last In, First Out). Further work is needed to extend the “Ask&Future Tell” approach to gradient-based optimization, where several issues must be addressed, such as scalability, approximation of gradients and local optima.

Acknowledgments This work has been supported by the French National Research Agency (ANR) through the COSINUS program (project OMD2 no. ANR-08-COSI-007). The authors acknowledge the Projet Pluri-Formations PILCAM2 at the Université de Technologie de Compiègne for providing HPC resources that have contributed to the research results reported in this paper (URL: http://pilcam2.wikispaces.com), as well as Maryan Sidorkiewicsz, Direction de la Recherche, Renault, France, and Mr. V. Picheny, Ecole des Mines, France, for contributing the CFD model used in this work.

