UNIVERSITY OF SOUTHAMPTON

Enhancements to Global Design Optimization Techniques

by András Sóbester

A thesis submitted in partial fulfillment for the degree of Doctor of Philosophy in the School of Engineering Sciences, Mechanical Engineering

2003

UNIVERSITY OF SOUTHAMPTON
ABSTRACT
SCHOOL OF ENGINEERING SCIENCES
MECHANICAL ENGINEERING
Doctor of Philosophy
by András Sóbester

Modern engineering design optimization relies to a large extent on computer simulations of physical phenomena. The computational cost of such high-fidelity physics-based analyses typically places a strict limit on the number of candidate designs that can be evaluated during the optimization process. The more global the scope of the search, the greater are the demands placed by this limited budget on the efficiency of the optimization algorithm. This thesis proposes a number of enhancements to two popular classes of global optimizers. First, we put forward a generic algorithm template that combines population-based stochastic global search techniques with local hillclimbers in a Lamarckian learning framework. We then test a specific implementation of this template on a simple aerodynamic design problem, where we also investigate the feasibility of using an adjoint flow-solver in this type of global optimization. In the second part of this work we look at optimizers based on low-cost global surrogate models of the objective function. We propose a heuristic that enables efficient parallelisation of such strategies (based on the expected improvement infill selection criterion). We then look at how the scope of surrogate-based optimizers can be controlled and how they can be set up for high efficiency.

Publications

The present thesis is based on the following publications (or parts thereof):

• A. Sóbester, A. J. Keane. Empirical Comparison of Gradient-Based Methods on an Engine-Inlet Shape Optimization Problem. 9th AIAA/ISSMO Symposium on Multidisciplinary Analysis and Optimization, Atlanta, Georgia, 4-6 September 2002.

• A. Sóbester, S. J. Leary, A. J. Keane. A Parallel Updating Scheme for Approximating and Optimizing High Fidelity Computer Simulations. Structural and Multidisciplinary Optimization – accepted for publication (presented at the 3rd ISSMO/AIAA Internet Conference on Approximations in Optimization, 2002).

• A. Sóbester, S. J. Leary, A. J. Keane. On the Design of Optimization Strategies Based on Response Surface Approximation Models. Journal of Global Optimization – provisionally accepted for publication.

The future work section of the conclusions chapter also draws from:

• A. Sóbester, P. B. Nair, A. J. Keane. Genetic Programming Approaches for Solving Elliptic Partial Differential Equations – under review.


Contents

Publications
Nomenclature
Acknowledgements

1 Introduction
  1.1 Global Optimization in Engineering Design – Motivation and Challenges
  1.2 About this thesis

2 Towards Affordable Global Optimization – Two Possible Avenues
  2.1 Local Optimizers versus Global Explorers
  2.2 Global Model Building as an Optimization Tool
  2.3 Objective Function Gradients – Beyond Local Search
      2.3.1 Sensitivities – Cost and Quality
      2.3.2 Gradient-Enhanced Hybrids
      2.3.3 Response Surface Approximations and Gradients

3 Exploration / Exploitation Hybrids
  Lamarckian and Baldwinian Learning
  3.1 A Taxonomy of Exploration/Exploitation Hybrids
      3.1.1 Search time division tuning
      3.1.2 Search time division control
            3.1.2.1 Deterministic control
            3.1.2.2 Adaptive control
            3.1.2.3 Self-adaptive control
  3.2 Designing a hybrid
      3.2.1 The Search Time Division Scheme
      3.2.2 The Components of the Hybrid
            3.2.2.1 Operator Selection Based on Off-line Performance Analysis
                    Graphical Off-line Tools
            3.2.2.2 Other criteria
  3.3 GLOSSY - The General Framework and its Embodiments
  3.4 An Application: Shape Optimization of a Jet-Engine Inlet
      3.4.1 The Empirical Comparisons
      3.4.2 Conclusions

4 Radial Basis Function Models in Global Optimization
  4.1 Simple RBF models of Black-Box Functions
  4.2 Approximation Error Estimates
  4.3 Gradient-Enhanced RBF Models
  4.4 Global Optimization
      4.4.1 Design of Experiments Techniques for Computer Experiments – a Brief Overview
      4.4.2 Exploitation and Exploration of RBF Models
      4.4.3 Expected Improvement

5 Parallel Updates in EIF-based Global Optimization
  5.1 The Update Scheme
  5.2 A Demonstrative Example
  5.3 Two Five-Dimensional Test Cases
  5.4 A Structural Testcase
  5.5 Conclusions

6 On the Design of Optimization Strategies Based on RBF Models
  6.1 Two Strategy Planning Issues
      6.1.1 The Size of the Initial Sample
      6.1.2 Controlling the Scope of an RBF-based Global Search
            A Drawback of the Expected Improvement Criterion
            Weighted Expected Improvement
            A Demonstrative Example
  6.2 Empirical Investigation
      6.2.1 Artificial Test Functions
      6.2.2 A "Real-life" Unimodal Problem: Geometric Optimization of a Spoked Structure
      6.2.3 A Case of Higher Modality: Vibration Optimization of a Two-Dimensional Structure

7 Selecting the Right Optimizer
  7.1 Key Factors in Optimizer Selection
      7.1.1 The Computational Cost of the Objective
      7.1.2 The Dimensionality of the Objective Landscape
      7.1.3 Modality
      7.1.4 Noise
      7.1.5 Prior Experience
      7.1.6 Availability of Parallel Machines
      7.1.7 Availability, Cost and Accuracy of Gradients
  7.2 The Satellite Beam Problem Revisited

8 Conclusions
  8.1 What Has Been Accomplished
  8.2 ...And What Remains to be Discovered

A The Engine Inlet Testcase
B Artificial Test Functions
  B.1 Rastrigin's Function
  B.2 Branin's Function
  B.3 Modified Rosenbrock Function
  B.4 Ackley's Path Function
C The Tail Bearing Housing Model
D Passive Vibration Control in a Satellite Beam
E Detailed Empirical Results on the Satellite Beam Testcase
  E.1 Sample data

Bibliography

List of Figures

1.1 How to make your own Nimrod from an old Comet...
3.1 Aggregate benefit of applying 1-bit, 5-bit and 10-bit mutation to the 10-variable Rastrigin function.
3.2 Scatter plot of offspring fitness against parent fitness (1-bit mutation, 10-variable Rastrigin function).
3.3 "Improvement sides" of scatter plots of offspring fitness against parent fitness (1, 5 and 10-bit mutation, 10-variable Rastrigin function).
3.4 Three pairs of histograms of the objective values of offspring generated by the two operators (1-bit mutation shown in a darker colour), representing the situation after 1250, 2500 and 3750 evaluations respectively (10-variable Rastrigin function).
3.5 Flowchart of the GLOSSY hybrid template.
3.6 GLOSSY Mk1: at each reshuffle step the individuals from the smaller population are moved to the "front" of the larger population and the same number of individuals are moved from the "back" of the bigger population into the smaller one.
3.7 GLOSSY Mk2.
3.8 Half-section through the inlet. The baseline shape (dotted line) is visible inside the contour resulting from the addition of two Hicks-Henne bumps (randomly generated in the case shown here).
3.9 Variation of the global population size when using the adaptive hybrid (GLOSSY Mk2), starting with equally split (4 – 4) populations. The plot is averaged over 50 runs.
3.10 Mean ultimate objective values (µ1 ... µ7) achieved by GLOSSY Mk1 with different numbers of BFGS iterations/cycle. The error bars indicate the ranges (µi + σi, µi − σi), i = 1 ... 7, where σ1 ... σ7 are the corresponding standard deviations (the values represented are those from Table 3.1 and Table 3.2, rows 5-11).
3.11 GLOSSY Mk1, BFGS and GA optimization histories on the inlet problem.
3.12 Histogram of ultimate objective values (after 200 equivalent evaluations) obtained by GLOSSY, BFGS and the GA.
4.1 Three examples of how the value of σ influences the width of a Gaussian basis function centered around x = 0.5.
4.2 Four-point experimental designs with two variables, each variable having four possible levels: maximin (left), one possible latin hypercube (centre) and maximin latin hypercube (right).
4.3 Two-variable latin hypercube designs with 16 experiments: random (left) and optimal in the Morris sense (right).
4.4 One variable example of why searching the RBF predictor may lead to premature convergence. The circles indicate points where we have sampled the objective function. The heavy black line is the RBF prediction based on these points, whereas the lighter line is the actual objective function.
4.5 One variable example function with its RBF approximation based on six points (top) and the corresponding EIF (bottom).
5.1 Flowchart of the parallel RBF optimization scheme.
5.2 Initial approximation of the Branin function based on 8 points.
5.3 The EIF surface corresponding to the Gaussian RBF model of the Branin function based on eight evaluations. The black squares indicate its four highest maxima – this is where the function will be sampled next.
5.4 The RBF predictor of the Branin function after six sets of parallel runs (24 points).
5.5 Contour plot of the true Branin function.
5.6 Comparison of optimization strategies based on RBF and GERBF models for the modified Rosenbrock function, using one processor.
5.7 Comparison of optimization strategies based on RBF and GERBF models for the modified Rosenbrock function, using four processors.
5.8 Comparison of optimization strategies based on RBF and GERBF models for the modified Rosenbrock function, using eight processors.
5.9 Comparison of optimization strategies based on RBF and GERBF models for the Ackley function, using one processor.
5.10 Comparison of optimization strategies based on RBF and GERBF models for the Ackley function, using four processors.
5.11 Comparison of optimization strategies based on RBF and GERBF models for the Ackley function, using eight processors.
5.12 Comparison of optimization techniques (GA, multi-start BFGS and GERBF) on the modified Rosenbrock function with free gradients.
5.13 Comparison of optimization techniques (GA, multi-start BFGS and GERBF) on the Ackley function with free gradients.
5.14 Comparison of optimization techniques on the structural example assuming 1 processor.
5.15 Comparison of optimization techniques on the structural example assuming 4 processors. One wall time unit is the cost of one FE evaluation.
6.1 One variable example demonstrating the effect of changing the weighting factor in the WEIF criterion.
6.2 Optimization algorithm based on the WEIF infill sample selection criterion.
6.3 Log-scale density map of objective function values reached by the optimizer after 11 evaluations of the 5-variable Sphere function, using various WEIF weightings (horizontal axis) and initial samples of various sizes (vertical axis). The darker areas correspond to better objective values.
6.4 Log-scale density map of objective function values reached by the optimizer after 35 evaluations of the 5-variable modified Rosenbrock function, using various WEIF weightings (horizontal axis) and initial samples of various sizes (vertical axis). The darker areas correspond to better objective values.
6.5 Log-scale density map of objective function values reached by the optimizer after 50 evaluations of the 5-variable Ackley function, using various WEIF weightings (horizontal axis) and initial samples of various sizes (vertical axis). The darker areas correspond to better objective values.
6.6 Log-scale density map of objective function values reached by the optimizer after 60 evaluations of the 10-variable Ackley function, using various WEIF weightings (horizontal axis) and initial samples of various sizes (vertical axis). The darker areas correspond to better objective values.
6.7 Two-dimensional section through the objective function of the structural testcase. The feasible region is delimited by the linear mass constraint.
6.8 Log-scale density map of objective function values reached by the optimizer after 40 evaluations of the stress function (spoked structure).
6.9 Two-variable slice through the objective function: x coordinate of joint 12 vs. y coordinate of joint 12.
6.10 Log-scale density map of objective function values reached by the optimizer after 30 evaluations of the four-variable satellite-beam objective function, using various WEIF weightings (horizontal axis) and initial samples of various sizes (vertical axis). The darker areas correspond to better objective values.
6.11 Log-scale density map of objective function values reached by the optimizer after 40 evaluations of the six-variable satellite-beam objective function, using various WEIF weightings (horizontal axis) and initial samples of various sizes (vertical axis). The darker areas correspond to better objective values.
6.12 Log-scale density map of objective function values reached by the optimizer after 60 evaluations of the eight-variable satellite-beam objective function, using various WEIF weightings (horizontal axis) and initial samples of various sizes (vertical axis). The darker areas correspond to better objective values.
6.13 Log-scale density map of objective function values reached by the optimizer after 65 evaluations of the ten-variable satellite-beam objective function, using various WEIF weightings (horizontal axis) and initial samples of various sizes (vertical axis). The darker areas correspond to better objective values.
6.14 Log-scale density map of objective function values reached by the optimizer after 100 evaluations of the 12-variable satellite-beam objective function, using various WEIF weightings (horizontal axis) and initial samples of various sizes (vertical axis). The darker areas correspond to better objective values.
7.1 Optimization histories on the four-dimensional satellite beam testcase.
7.2 Optimization histories on the 12-dimensional satellite beam testcase.
A.1 Examples of geometries from the design space of the engine inlet test case.
A.2 Contour plot of the objective function of the engine inlet testcase (square of peak surface velocity).
B.1 Two-dimensional slice through the k dimensional Rastrigin function.
B.2 Contour plot of Branin's function (normalised to [0,1]).
B.3 Two-dimensional slice through the k dimensional Modified Rosenbrock function (normalised to [0,1]).
B.4 Two-dimensional slice through the k dimensional Ackley's Path function (normalised to [0,1]).
C.1 Sketch of the tail bearing structure.
D.1 The four-variable testcase. The x and y coordinates of the two mid-span joints (9 and 10) are allowed to vary within the limits indicated by the chevrons (±0.25m).
D.2 The six-variable testcase. The x and y coordinates of joints 9, 10 and 11 are allowed to vary within the limits indicated by the chevrons (±0.25m).
D.3 The eight-variable testcase. The x and y coordinates of joints 8, 9, 10 and 11 are allowed to vary within the limits indicated by the chevrons (±0.25m).
D.4 The ten-variable testcase. The x and y coordinates of joints 8, 9, 10, 11 and 12 are allowed to vary within the limits indicated by the chevrons (±0.25m).
D.5 The 12-variable testcase. The x and y coordinates of joints 7, 8, 9, 10, 11 and 12 are allowed to vary within the limits indicated by the chevrons (±0.25m).
E.1 Two-variable slice through the objective function: x coordinate of joint 7 vs. y coordinate of joint 7.
E.2 Two-variable slice through the objective function: x coordinate of joint 8 vs. y coordinate of joint 8.
E.3 Two-variable slice through the objective function: x coordinate of joint 9 vs. y coordinate of joint 9.
E.4 Two-variable slice through the objective function: x coordinate of joint 10 vs. y coordinate of joint 10.
E.5 Two-variable slice through the objective function: x coordinate of joint 11 vs. y coordinate of joint 11.
E.6 Two-variable slice through the objective function: x coordinate of joint 12 vs. y coordinate of joint 12.
E.7 Two-variable slice through the objective function: x coordinate of joint 7 vs. y coordinate of joint 12.
E.8 Two-variable slice through the objective function: y coordinate of joint 9 vs. x coordinate of joint 10.
E.9 The four-variable testcase (as shown in Figure D.1). Objective function density maps after various numbers of evaluations. The horizontal lines indicate the number of evaluations after which the density map was plotted (clearly, it does not make sense to use any DoE size above the line).
E.10 The six-variable testcase (as shown in Figure D.2). Objective function density maps after various numbers of evaluations.
E.11 The eight-variable testcase (as shown in Figure D.3). Objective function density maps after various numbers of evaluations.
E.12 The ten-variable testcase (as shown in Figure D.4). Objective function density maps after various numbers of evaluations.
E.13 The 12-variable testcase (as shown in Figure D.5). Objective function density maps after various numbers of evaluations.

List of Tables

3.1 Means of the objective function value samples collected over 50 runs of GLOSSY Mk1, the adaptive GLOSSY Mk2, the pure BFGS and the pure GA.
3.2 Standard deviations of objective function value samples collected over 50 runs of GLOSSY Mk1, the adaptive GLOSSY Mk2, the pure BFGS and the pure GA.
3.3 Statistical comparison of the final objective value samples collected from 200-evaluation runs of the BFGS, GLOSSY and GA. µ and σ denote the mean and the standard deviation of the samples, MIN and MAX their smallest and largest values. p0 is the probability of the directional null hypothesis that the mean of GLOSSY is worse than that of its two components (two-tailed t-test).
5.1 Accuracy of the approximation without using gradients.
5.2 Accuracy of the approximation using gradients.
5.3 Comparison of number of evaluations (i.e., wall time ×Np) needed to reach convergence for the modified Rosenbrock function.
5.4 Comparison of number of evaluations (i.e., wall time ×Np) needed to reach convergence for the Ackley function.
A.1 Ranges of the six design variables describing the engine inlet geometry used as a testcase in 3.4.

Nomenclature

Chapters 4, 5 and 6

x – vector of design variables
xi – ith element of x
x(j) – jth design vector
k – number of design variables
f(x) – objective function
D – design space
N – number of evaluated designs
y – N-vector of responses
y* – (N+1)-vector of responses
Y – fictitious random variable
y(i) – objective value of design x(i) (realisation of Y)
φ(.) – basis function
ŷ – predicted objective value
wi – coefficient of the ith basis function
w – vector of basis function coefficients
σ – hyperparameter of the Gaussian RBF
r – distance of current point from the RBF centre
C – covariance function
Z, ZN, ZN+1 – normalising constants
αi, βi,j – coefficients of the Hermitian interpolant
σŷ(N+1) – standard deviation of Y in point x(N+1)
Φ – matrix of basis function values / correlation matrix
Np – number of available processors
di – inter-point distances in an experimental design
Ji – nr. of pairs in an exp. design with distance di
EIF – expected improvement function
WEIF – weighted expected improvement function
w – expected improvement weighting
ymin – best objective value so far

Acknowledgements

I owe an immense debt to my supervisor, Prof. Andy J. Keane, for providing me with advice, encouragement and a great working environment throughout my time in his research group. His clarity of thought and no-nonsense approach to science have influenced me intellectually to a very large extent.

Much of this thesis deals with the use of approximation techniques in design optimization. I have had the privilege to be introduced to this area by one of its leading experts: Dr. Stephen Leary. In completing the work presented on the following pages I have also benefited a great deal from Dr. Prasanth Nair's encyclopedic knowledge of computational engineering in general and optimization in particular.

I have already mentioned the great working environment of the research group – many thanks to (in alphabetical order) Neil Bressloff, Atul Bhaskar, Tom Etheridge, Alex Forrester, Juri Pápay and Tony Scurr for being amongst the principal culprits in making it what it has been for me. The main benefit of working in a multidisciplinary research group is that there is always an expert to turn to, whenever one needs help with some more obscure aspect of one's research. I'm grateful to Mammadou Bah, Sourish Banerjee, Keith Bearpark, Andy Chipperfield, Arindam Choudhoury, Hans Fangohr, Alvin Ong, Ed Rayner, Wenbin Song and David Stinchcombe for sharing their expertise on various things with me.

Arguably, a sine qua non of any research work is its financial support. The work reported in this thesis was made possible by sponsorship from BAE Systems and Rolls-Royce in the framework of the University Technology Partnership for Design. Beyond the financial aspects, I have had many useful discussions with several people from the two companies: Alan Gould, Steve Wiseall, David Knott, Janet Reese, Carren Holden and Sharokh Shahpar, to name but a few.


Chapter 1

Introduction

1.1 Global Optimization in Engineering Design – Motivation and Challenges

Sir Geoffrey de Havilland’s Comet Mk1 first flew in January 1951. It was the world’s first jet airliner, faster and more comfortable than any of its propeller-driven rivals. Unfortunately, it had a number of fatal structural weaknesses, which grounded it in 1954, only two years after it had entered commercial service. The flaws that led to the first Comet’s untimely demise are now probably amongst the best documented design issues of all time and are taught as salutary tales to every novice structural and aeronautical engineer. A later variant of the aircraft, the Mk4, was launched in 1958. With a redesigned airframe and a different alloy used for parts of the fuselage, the new Comet was safe and reliable. However, by then competition from across the Atlantic had become very intense and most airlines opted for the larger and more fuel-efficient American models (the Boeing 707 and the Douglas DC-8). The Comet 4 soon went out of production. Sir Geoffrey died in 1965, but not before he had witnessed the rebirth of the Comet. RAF Strike Command needed a new maritime reconnaissance aircraft to replace their aging fleet of Shackletons. Instead of a new design, the government opted for a modified version of the Comet 4, designed and built in 1964 by Hawker Siddeley (which de Havilland had merged into) under the name Nimrod. Exceeding the original


expectations by far, the 1950s Comet design is likely to live on well into the 21st century in the shape of the latest variant of the "Mighty Hunter", the Nimrod MR4, which is currently being tested by BAE Systems. The similarity between the Nimrod and Comet 4 is obvious at first glance. RAF Waddington even have a light-hearted "DIY-guide" on their web-site urging visitors to make their own Nimrod from an old Comet. The accompanying illustration is reproduced in Figure 1.1.

Figure 1.1: How to make your own Nimrod from an old Comet...

Although not the most technically rigorous of illustrations, this "touched up" picture underlines that aspect of the Nimrod design that makes this brief historical account relevant to the present work. The external shape of the aircraft is the result of a series of what design engineers call local improvement procedures. The long bulge stretching underneath the nose, designed to accommodate the weapons bay, the slightly reshaped air intakes designed for the new Rolls-Royce Spey engines and other similar features bear testimony to this design philosophy. Instead of giving designers free rein to develop a radically new shape, they were only permitted to make comparatively small changes to the variables describing an existing fuselage design to meet the new constraints imposed by the military role of the aircraft.

One of the reasons for being constrained in this way is that global design optimization is still very expensive. Physics-based simulations, such as Computational Fluid Dynamics (CFD) analyses of new shapes, require many CPU-hours of computation and thus any global optimization procedure based on them is likely to take a very long time. No doubt, recent times have seen rapid advances in computing technology, which have brought about changes in many facets of computational engineering. One aspect that has not changed, however, is the relative computational expense of simulations used in design optimization. As the demand for ever-higher fidelity (and thus more complex) models closely shadows increases in processing power, this is unlikely to change in the near future. Therefore, it seems inevitable that the efficiency of the optimizers themselves needs to be improved if global design optimization is to become more feasible.

The present thesis does not offer any ultimate panacea to the global optimization computational cost problem. Instead, it takes a look at some of the current approaches and suggests possible ways of combining or extending them to increase their efficiency and to make them more suitable for industrial use (global optimization is gradually finding its place in everyday industrial design practice, as reflected in the introduction of optimization capabilities in well-established simulation and analysis tools such as MSC.Nastran, http://www.mscsoftware.com/). The novel (or improved) methods are viewed in comparison with their more established counterparts. At the same time we also attempt to gain an insight (from an empirical perspective) into how the latter can be put to a better use through a more informed choice of their runtime parameters.

1.2 About this thesis

This work pursues two broad lines of argument towards the general goal of making global design optimization a more feasible proposition than it is at present. First, we endeavour to make the case for blending local search techniques into global optimization methods, thereby providing a tool that is more efficient and more versatile than either of its two components used separately. Secondly, we look into another promising area of global optimization research, that of techniques based on response surface approximation methods. We propose and test a number of enhancements as well as guidelines for setting up such search strategies. Along the way we also touch on a question that is intertwined with both of these two main strands in some sense: how to put advances in sensitivity analysis to good use in global optimization. More specifically, we look at how hybrids and response surface approximation-based global optimization techniques can benefit from the availability of low computational cost objective function gradients.

This two-pronged reasoning (or three-pronged, if one includes the issue of gradient-enhancement) provides the blueprint for the thesis as well as for the overview offered in Chapter 2. Thus, on the following pages we review some of the recent advances in the areas outlined above. This is not intended to be a full survey of the state of the art – it is merely meant to highlight those ideas that subsequent chapters attempt to take a step further. The first part of Chapter 3 has a similar flavour, although the scope of the review is narrower here: we focus on the technical details of building hybrids between global exploration algorithms and classic exploitation (local search) techniques. A taxonomy of hybridisation frameworks is proposed, in the light of which we then briefly look at some of the algorithm design issues involved and chart the reasoning that led us to develop GLOSSY, a generic hybrid optimizer. This template, along with some of its specific embodiments and an aerodynamic design application, forms the second part of the chapter (Sections 3.3 and 3.4).

Moving on to the second global optimization approach investigated in this thesis, Chapter 4 deals with global objective landscape approximations. Their general overview presented in Chapter 2 is expanded here in a specific area: we discuss implementation issues related to global Radial Basis Function (RBF) models and their uses in optimization. This provides the background for the two chapters that follow. Chapter 5 investigates the applicability of these methods on parallel computational architectures, while Chapter 6 looks at an improved version of a modern RBF-based optimizer and at the wider issue of setting up such heuristics for optimum performance. Chapter 7 analyses the relative merits of the approaches discussed previously and identifies their place in the design engineers' toolbox. We conclude by looking back at what has been achieved during this work and we point out a number of areas where, in our view, practitioners of global design optimization could benefit from further studies (Chapter 8).

Wherever new approaches are proposed in this work, artificial and/or real-life testcases are used to measure their performance. The former have the advantage of being very cheap to evaluate and therefore allow us a greater number of computational experiments, whereas the latter provide results that are perhaps more directly relevant to our ultimate goal of bringing global optimization closer to the needs of industry.

5

addition to these test problems, in each case we have included small, demonstrative toy problems to illustrate the descriptions of the methods under scrutiny. Finally, a linguistic note. Throughout this thesis we adhere to the spelling rules of British English. However, there is one word rather frequently used here that some may consider to be an exception to this: “optimize” (along with its siblings “minimize” and “maximize”). After some editions of English dictionaries have insisted on “optimise” for a long time, the Cambridge Advanced Learner’s Dictionary has now softened its stance to “UK – usually optimise” and the Oxford English Dictionary (OED) only contains “optimize”. Following what seems to be the general trend in the literature on both sides of the Atlantic, we use the OED spelling.

Chapter 2

Towards Affordable Global Optimization – Two Possible Avenues 2.1

Local Optimizers versus Global Explorers

In the optimization community in general and in the field of aerodynamic shape optimization in particular there has been a long-running debate about the use of local improvement procedures (run with multiple starts when needed) versus (usually stochastic) global exploration methods. Both categories have influential advocates and have seen significant developments over past decades. Of course, there are splits within these camps as well. Most notably, proponents of the (multi-start) local search philosophy fall into two main groups: those preferring gradient-based local optimizers and those working with zeroth-order techniques. As far as the history of the “gradient-based” group is concerned, it is perhaps edifying to look at the aerodynamic design communities’ favoured tools throughout the last decades. While in the early days of numerical aerodynamic shape optimization (late 70’s – see, e.g., Hicks and Henne [1978]) the concept of gradient-based search was almost equivalent to that of the steepest-descent algorithm, by the late 90’s a wide range of sophisticated optimizers of this class has entered everyday design practice.

6

Chapter 2 Towards Affordable Global Optimization – Two Possible Avenues

7

Steepest-descent is still used occasionally, but it has been gradually superseded by modern algorithms based on the Newton method (with line-search or trust-region-type implementations), quasi-Newton methods (BFGS, DFP) and conjugate gradient optimizers (Fletcher-Reeves, Polak-Ribi`ere). A good survey of these algorithms and their implementations is offered by More and Wright [1993], while the reader interested in some of their aerodynamic design applications may wish to consult Newman et al. [1999]. Hajela [1999] provides a snapshot of the “non-gradient” community. Their favoured tools include the Simplex method [Nelder and Mead, 1965], the complex method of Box [1965] and the pattern search of Hooke and Jeeves [1961]. On the other side of the multi-start local vs. global argument, stochastic global exploration methods have also evolved from tentative simulations of physical and biological phenomena into a set of powerful search tools, the most popular being Simulated Annealing (SA) and Genetic Algorithms (GA). Again, Hajela’s survey provides a good overview of the state of the art in this area from a Multidisciplinary Optimization (MDO) perspective. As the body of experience on these two main categories of search techniques grew, so did the design community’s awareness of their respective limitations. Local searchers (exploiters), while very efficient on many smooth, unimodal objective function landscapes, often provide less then satisfactory results when the problem exhibits long valleys and/or multiple local optima. Once trapped in a valley or at a local optimum the search needs to be re-launched from a new (commonly random) starting point. This operation usually involves wasteful, lengthy exploration of unpromising regions of the search space, such as those with very poor objective values or virtually flat regions (visited before the neighbourhood of a local optimum is reached) and one can only hope that the new starting point is in the basin of attraction of a thus far unexploited local (or perhaps the global) optimum. Conversely, global explorers, such as GAs, are good at leaving poor objective value regions behind quickly, while simultaneously exploring several basins of attraction. What they lack is high convergence speed and precision in the exploitation of individual local optima. To summarize: global explorers are good at locating basins of attraction – local

Chapter 2 Towards Affordable Global Optimization – Two Possible Avenues

8

searches are good at descending into them. Additionally, like most zeroth-order methods, global explorers are less likely to be disrupted by small discontinuities in the objective landscape (these occur sometimes when the objective is the result of some iterative computational process, such as the solution of a discretized flow field). A fairly obvious solution is to attempt to get the best of both worlds by combining a local improvement procedure with a global explorer to form a hybrid method that allocates the available computational resources between the two in an efficient manner. As the search engine that we are proposing in this thesis is based on the global/local hybridisation principle, we delve more deeply into this issue in Chapter 3. For now, we proceed to a brief overview of the class of global search techniques that form the second main strand of this work.

2.2

Global Model Building as an Optimization Tool

One of the more popular approaches for the economical use of expensive computer simulations is the broadening category of optimization algorithms based on cheap global approximation models (often called surrogate models or spatial prediction models) of the high-fidelity computational simulation. These models involve running the physics-based analysis code (treating it, essentially, as a black-box function) for a number of designs and using this training data to build a surrogate model, which is cheap to evaluate. These models have their roots in a variety of fields: response surface approximation methodology (low-order polynomials), nature-inspired computing (artificial neural networks), spatial statistics, stochastic process theory, mathematical geology (kriging, Radial Basis Function models), etc. Polynomial response surfaces constitute the simplest global modeling technique (see, e.g., Box and Draper [1987]). Their main drawback is that the approximation can only be relied on for optimization purposes if the underlying response is of low modality. Artificial neural networks [White et al., 1992], Radial Basis Functions [Powell, 1987] and kriging [Matheron, 1963, Sacks et al., 1989] are more versatile from this point of view, as they can capture more complex landscape features. Many of these tools have been around for quite a long time, but their engineering design applications were rare. This is particularly the case with methods based on the

Chapter 2 Towards Affordable Global Optimization – Two Possible Avenues

9

statistics of stochastic processes, which were viewed primarily as physical experimental analysis techniques. The past decade saw a sharp increase in their usage within the design community, possibly encouraged by a number of recent important contributions that deal with applications of these methods aimed at modeling the output of deterministic computer simulations (by viewing the deterministic output as a realisation of a stochastic process – we will discuss this in more detail in Chapter 4 – or by adopting a Bayesian approach). Under headings such as “stochastic process-based optimization”, “Bayesian global optimization”, “random function approach”, a number of papers have dealt with employing stochastic process based approximation methods for model building and exploitation as well as for more sophisticated forms of global search (e.g., Sacks et al. [1989], Currin et al. [1991], Schonlau [1997], Williams et al. [2000], Audet et al. [2000], Jones [2001]). Possibly the most influential contribution of recent years in this domain has been the EGO algorithm [Jones et al., 1998], which uses a DACE (Design and Analysis of Computer Experiments, kriging for deterministic computer experiments, introduced by Sacks et al. [1989]) models, combined with a model update strategy based on an expected objective function value improvement measure (again, we defer a more detailed discussion to Chapter 4). Through most of this thesis we will be arguing the merits of these two approaches and our suggested improvements in separate chapters. However, as we set out on our initial roadmap, there is a topic common to both of these themes – we discuss this next.

2.3 2.3.1

Objective Function Gradients – Beyond Local Search Sensitivities – Cost and Quality

The crucial factor that affects the performance of any gradient-based search is the computational cost and quality of the gradient. Finite differencing, the most straightforward gradient calculation method, is problematic on both counts. As the evaluation of each component of the gradient in a given point of the search space requires an evaluation of the objective function, its computational burden in high-dimensional spaces can be immense. However, depending on the application, there are often cheaper means of obtaining

Chapter 2 Towards Affordable Global Optimization – Two Possible Avenues

10

objective function gradients. For example, in some cases it is feasible to apply an Automatic Differentiation (AD) procedure to the computer code used to perform the physics-based analysis (see, e.g., Bischof et al. [1992]). Structural analysts have their own specific tools for obtaining sensitivities with respect to certain design variables at a fraction of the cost of the analysis itself [Haftka, 1993, Perezzan and Hern´andez, 2003]. In the experimental section of the next chapter we use a Computational Fluid Dynamics (CFD) application, which employs yet another sensitivity analysis technique to obtain the objective function gradients: the adjoint method. For many years advocates of gradient-based aerodynamic shape optimization have had to face the massive costs of finite difference gradients. Using cheaper, partially converged objective function values is not a solution, as the numerical errors on the gradient thus obtained by finite differencing are usually too high for the sensitivity information to be of any value for an optimization algorithm [Reuther, 1996, Reuther et al., 1999]. A paper published by Jameson [1988] signaled a breakthrough for the aerospace community in the gradient evaluation problem. In this seminal contribution he introduced the adjoint method for aeronautical computational fluid dynamics (CFD), a technique whereby a new set of equations can be constructed (based on the solution of the flow equations), which, at a computational cost similar to that of the solution of the flow equations, yields all components of the objective function gradient. Jameson, Reuther and other co-workers developed the method for potential flow, the Euler equations and the Navier-Stokes equations, at the same time proving its benefits from the optimization point of view with applications ranging from 2D aerofoil design [Kim et al., 2000] to the optimization of high-lift systems [Kim et al., 2002] and full aircraft configurations [Reuther et al., 1999, Jameson, 1999, Jameson and Vassberg, 2001]. A number of other research groups have also performed aerodynamic design optimization using adjoint flow solvers: Monge and Tobio [1988], Eliott and Peraire [1998], Giles and Pierce [2000], Anderson and Venkatakrishnan [1999], Kim et al. [2001a], Kim et al. [2001b], Iollo et al. [1993], Arian and Vatsa [1998], Nemec and Zingg [2001], etc. (for a comprehensive survey see Newman et al. [1999]).

Chapter 2 Towards Affordable Global Optimization – Two Possible Avenues

2.3.2

11

Gradient-Enhanced Hybrids

To date, as far as we were able to ascertain, all reported aerodynamic design applications of the adjoint method have relied solely on deterministic local gradient-based optimization. Amongst these, quasi-Newton methods (BFGS, in particular) and the smoothed gradient method [Jameson and Vassberg, 2001] (based on the principle that aerodynamic shapes are predominantly smooth) appear to have the best performance. A detailed comparative study of these approaches has been conducted by Jameson and Vassberg [2000] on the brachistochrone, a classic calculus of variations problem (with a known analytical solution), considering every mesh point on the curve (the shape of which is to be optimized) as a design variable. However, very few comparative studies exist in the literature that assess the relative merits of gradient-based deterministic optimizers and stochastic search methods (fairly inconclusive comparisons between single runs of SA, gradient-descent and GA have been described by Obayashi and Tsukahara [1997] and Sasaki et al. [2001]). As indicated earlier, pure local search methods may be the best choice on smooth and unimodal landscapes, but when several local optima and/or discontinuities caused by incomplete convergence are present, their superiority is less obvious [Alonso et al., 2002]. Additionally, as Ta’asan [1997] points out, high dimensional landscapes resulting from the discretisation of partial differential equations (such as the conservation equations of a flow field) often have badly conditioned Hessians (i.e., the level curves around optima are long ellipses, rather than circles), a feature that reduces the efficiency of pure gradientbased searches. In Chapter 3, where we discuss global/local hybrids, we will argue that hybrids can be a feasible means for using adjoint gradients in global optimization. For now, we turn our attention to another possible use of cheap gradients in global optimization, linking the idea of affordable sensitivity analysis to the second main strand of this thesis.

2.3.3

Response Surface Approximations and Gradients

The use of objective function sensitivities to enhance the accuracy of global predictors goes back at least to Morris et al. [1993]. They predict unseen data using a Gaussian stochastic process model, which incorporates the response and its gradients.

Chapter 2 Towards Affordable Global Optimization – Two Possible Avenues

12

The accuracy of several types of global predictors can be enhanced using gradient information. van Keulen et al. [2000] use a weighted least squares formulation to build gradient-enhanced polynomial models. A similar technique is used in Rijpkema et al. [2001]. Vervenne and van Keulen [2002] view the problem of including derivative information as a multi-objective optimization problem, where they are searching for a compromise between the surface minimizing an objective-value related error function and a gradient-related error function. The idea is further investigated in van Keulen and Vervenne [2002]. On the engineering applications front Chung and Alonso [2001] build a gradient-enhanced global polynomial model for an objective related to an aerodynamic design problem. There is a growing body of literature on gradient-enhanced kriging models as well. Chung and Alonso [2002a] underline the importance of a terminological distinction between two possible ways of building osculatory interpolants with kriging. In the approach they call direct cokriging, the sensitivities in the sample points are viewed as secondary variables and are built into the analytical form of the correlation matrix. An alternative method, indirect cokriging, is to estimate additional objective values in the neighborhood of the sampled points, by simply inserting the sensitivities in the corresponding term of a Taylor expansion. These additional points are included in the database of known objective values, upon which a standard kriging model is built. Liu and Batill [2002] also use this latter approach – they term it Database Augmentation. A further example, the design of a low-boom supersonic business jet using gradientenhanced kriging, is described in Chung and Alonso [2002b]. The authors note that the “proper” way of obtaining the gradients needed for the cokriging model is, of course, to compute the solutions of the adjoint system – however, in the cokriging work reported so far they have used finite differences to obtain them. A gradient-enhanced krigingbased EGO algorithm is assessed by Leary et al. [2003] – an analytical function and a structural optimization problem are used as testcases. In Chapter 5 we show the results of experiments with gradient-enhanced Radial Basis Function (RBF) models. We compare them with results obtained with the same method without using gradients, looking at both their accuracy in predicting unseen data, as well as their ability to aid global optimization. With the scene thus set in general terms, we now begin the more in-depth study

Chapter 2 Towards Affordable Global Optimization – Two Possible Avenues

13

of the ideas outlined in this chapter. We start by examining the ways in which global explorers can be merged with local exploitation techniques into hybrid global optimizers.

Chapter 3

Exploration / Exploitation Hybrids The solution of complex optimization problems ideally requires a thorough global exploration of the search space as well as a precise exploitation of the regions where optima may lie. In other words, all basins of attraction should be located and their corresponding optima approached to within the smallest possible error. Unfortunately most traditional optimizers, such as gradient-based local improvement procedures and global evolutionary searches, cannot accomplish both of these tasks in a truly time-efficient manner. A possible solution is thus to devise hybrids (a term that goes back in this context at least to Davis [1991]) that combine global explorers with local search (exploitation) procedures and indeed, this idea and some of its implementations are as old as the problem itself. Multi-Start Hillclimbers (MSHC) constitute the first such approach from both the chronological and the complexity point of view. An MSHC may be thought of as a hybrid between a hillclimber (e.g., a steepest-descent method) and the simplest of all global explorers: the random search. It essentially involves restarting the local improvement procedure from a (usually) randomly selected point whenever it gets stuck in a local optimum. If the number of restarts is relatively large, we can aim to generate a cloud of

14

Chapter 3 Exploration / Exploitation Hybrids

15

starting points that fills the design space as uniformly as possible. This can be achieved using some Design of Experiments (DoE) technique – Sobol sequences [Sobol, 1979] are particularly suitable as they do not require a priori knowledge of the size of the experimental design (given a computational budget, we can usually not be certain how many restarts we will be able to do). DoE techniques will be discussed in more detail later in the context of global approximation models – the interested reader may wish to leap forward to read Section 4.4.1 and then return here. Another way of generating starting points for the local optimizer is to take the best solutions obtained with a relatively short run of a global optimizer – often an evolutionary algorithm (EA). In spite of (or perhaps partly owing to) their simplicity, these approaches are still widely used today for many industrial applications and deservedly so: they are robust and difficult to outperform on landscapes with low to moderate multimodality and modest numbers of variables. Another simple way of combining a population-based EA with a local search is to occasionally interrupt the former with relatively brief spells of the latter, applied to some or all members of the population (by brief we mean here that the local procedure is not necessarily pursued to full convergence). As EAs in general are viewed as computational equivalents of the Darwinian theory of the “survival of the fittest”, it is not surprising that the concept of chromosomes undergoing bouts of local improvement within an EA framework has found its own evolution theory metaphors. There is more to such extended interdisciplinary terminology than its mere intellectual elegance: it often paves the way towards exploiting scientific knowledge so far confined to other disciplines. Therefore, we now briefly review two such metaphors linked to the concept of local improvement, together with some of the research they have spawned.

Lamarckian and Baldwinian Learning

Some of the first attempts to hybridise

GAs with local searches were based on the implementation of a non-Darwinian (or even anti-Darwinian) idea: the Lamarckian evolution theory. At the core of Jean-Baptiste Lamarck’s teachings lay the belief that acquired characteristics could be inherited. In optimization jargon this means that the results of a local improvement procedure modify the variables of an individual taking part in the evolutionary simulation.


Many researchers discard such genotype-altering learning schemes on the grounds that they contradict traditional GA schema theory (they disrupt the schema processing of the GA) and that they are unnatural (biologically infeasible). An alternative solution, first proposed by Hinton and Nowlan [1987], is based on a much later theory: Baldwinian learning. Employing the principles put forward by the nineteenth-century biologist J. M. Baldwin [Baldwin, 1896], this method does not alter the variables as a result of the local search. Instead, the new, improved individual is only used to change the local objective landscape. In other words, the individual will keep its original genotype, but its fitness will improve, thus increasing its probability of being selected. Hence, its offspring will inherit the ability to acquire a particular feature, not the feature itself. A collateral (and according to some authors undesirable) effect of this approach is that it leads to a number of genotypes mapping to the same phenotype. For a more detailed analysis of the theoretical aspects of Baldwinian learning the interested reader is referred to a series of articles (e.g., Mayley [1996]) published in a special themed issue of the journal Evolutionary Computation.

Comparative empirical studies of these approaches are relatively hard to find. In one rare systematic study Houck et al. [1996] ran a GA/local search hybrid on a set of test functions with varying probabilities of an individual being forced to match the solution resulting from the local search. After performing searches on a testbed consisting of seven problems, they concluded: "By forcing the genotype to reflect the phenotype, the GA converges more quickly and to better solutions than by leaving the chromosome unchanged after evaluating it. This may seem counterintuitive since forcing the genotype to be equal to the phenotype might have forced the GA to converge prematurely to one of those local optima. However, for this class of problems and procedures, Lamarckian learning outperformed Baldwinian learning." Julstrom [1999] found similar results on the 4-cycle problem: his most aggressive Lamarckian search outperformed those implementing the Baldwinian technique.

Intuitively, one would not employ Baldwinian optimization on problems where, as in the cases discussed in this thesis, the cost of the objective function is very high (safeguarding the schema processing of the GA probably does not warrant "throwing away" the result of a Lamarckian local improvement procedure if its computational cost is measured in CPU hours). In addition, the reports cited above indicate that the effects of Baldwinian local improvement on GAs and the tradeoffs involved are not sufficiently


well understood. Nevertheless, the distinction between Lamarckian and Baldwinian local improvement widens one’s perspective on the mechanics of hybridisation; furthermore, future advances in learning theory may make revisiting the issue a worthwhile exercise.
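In sketch form (Python, illustrative only; the brief Nelder-Mead spell stands in for whatever local improvement procedure is used), the operational difference between the two learning schemes amounts to what is written back to the individual:

```python
import numpy as np
from scipy.optimize import minimize


def evaluate_with_learning(x, objective, mode="lamarckian"):
    """Apply a brief local improvement spell to an individual x and return the
    (genotype, fitness) pair to be stored in the population.

    Lamarckian: the improved genotype and its fitness are both written back.
    Baldwinian: only the improved fitness is kept; the genotype is unchanged."""
    res = minimize(objective, x, method="Nelder-Mead",
                   options={"maxiter": 20})        # short, non-converged spell
    if mode == "lamarckian":
        return res.x, res.fun      # acquired characteristics are inherited
    return np.asarray(x), res.fun  # fitness improves, genotype stays the same
```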

3.1 A Taxonomy of Exploration/Exploitation Hybrids

Throughout recent decades, an impressive variety of global and local search methods have been interwoven into successful hybrids, using an equally impressive number of strategies. In this section we take a look at what is, in our view, one of the most important aspects of exploration/exploitation hybridisation: the methods whereby the algorithms divide the search time between the exploration and exploitation components of the hybrid (more generally, balancing exploration and exploitation is one of the most important questions of global optimization research). We propose a classification of these methods, based on the taxonomy of parameter control methods presented in the excellent survey by Eiben et al. [1999]. We note here that, although there are similarities between the philosophies behind operator adaptation and search-time division schemes (indeed, this is the reason why the structure of our taxonomy is similar to that of Eiben et al. [1999]), a clear distinction should be made between the two and the present classification only covers the latter.

The schemes that allocate available search-time between exploration and exploitation processes fall into two main categories: tuning and control. Tuning involves deciding upon the resource division without any input from the search itself, i.e., the strategy is not a function of time (number of evaluations) or the state of the search (e.g., statistical measures of the population). Conversely, control methods are strongly linked to the progress of the search. Depending on the way in which they accomplish variations in search-time allocation, control methods can be classified under three headings: deterministic (variation according to a pre-determined, time-dependent schedule), adaptive (variation control takes feedback from the state of the search) and self-adaptive (the allocation method evolves together with the solution of the optimization problem). We next look at each of these categories in more detail.


3.1.1 Search time division tuning

In algorithms with tuned search time division the way in which the global and local components share the resources is determined before the run and is independent of the search itself. Obviously, the major question is how to decide upon a search-time division that will lead to good results on a landscape, using a given combination of search methods. Although significant theoretical work has been aimed at answering this question (see, e.g., Whitley [1995], Goldberg and Voessner [1999], Sinha and Goldberg [2001]), the results are difficult to apply in practice, mostly because the proposed mathematical models require an in-depth knowledge of the optimizers and the landscape under scrutiny, which is rarely available. Thus, one is left to rely on experience and experimentation on problems from the same class as that in question.

Some EA scholars argue that pure steady-state GAs can be thought of as such tuned hybrids, which combine crossover (exploration) with mutation (exploitation).[1] Regardless of whether we accept this or not, it is definitely safe to include in this category algorithms that incorporate local search methods alongside the usual operators of an EA. The most popular variant of this technique is to add Simplex-type elements to crossover as in Foster and Dulikravich [1997], Yen et al. [1995] or in the Simplex-based Crossover (SPX) / Probabilistic Simplex (PS) / SA tri-hybrid reported in Mendoza et al. [2001].

Also worth mentioning here are the so-called memetic algorithms [Radcliffe and Surry, 1994]. Some researchers use this term to refer to any genetic search in which some local improvement procedure takes an important role. However, in most instances it refers to genetic algorithms in which every individual takes part in complete local improvement after every generation. In other words, the genetic algorithm searches the space of local optima, rather than the entire search space. We note here that this use of the term is also closer to the initial definition of the concept of the meme [Dawkins, 1976, Blackmore, 2000], as a unit of information that reproduces itself as people exchange ideas. As these ideas are adapted (locally improved, in our optimization terminology) by each person before being passed on, it makes sense to refer to those algorithms as "memetic" in which each individual is improved locally in each generation.

[1] We note here that whether we consider mutation to be an exploration or an exploitation operator ultimately depends on the probability of application. Here we are referring to the standard low-probability case. Also, there is a school of thought in the evolutionary computation community that considers selection to be the exploitation mechanism and the exploration of the search space to be achieved by the mutation operator.


3.1.2 Search time division control

3.1.2.1 Deterministic control

Most of the global/local hybrids in use today fall into this category. This class of algorithms varies the proportion of search-time allocated to each component of the hybrid, or makes decisions with regard to the allocation of individuals to one method or another, according to a pre-determined schedule, which is usually a function of the elapsed number of generations.

One of the most popular approaches is to run a global search (such as a GA) and interrupt it after each block of (a pre-set number of) generations to perform local improvement. A typical example is the wing design work reported by Vicini and Quagliarella [1999]. They perform gradient-based hillclimbing (the gradients are obtained by finite differencing), operating only on a subset of variables of individuals resulting from spells of a GA run, noting that carrying out converged optimization in each learning period would be detrimental to the hybrid's overall performance. After 2-3 iterations of the local search, the individual is fed back into the GA population where it continues to evolve for another block of generations (Lamarckian learning).

Other authors prefer to interrupt the search after each generation. Here the boundary between tuning and deterministic control becomes slightly blurred. Nevertheless, we include global searches interrupted after each generation by a round of local improvement in the deterministic control category, since we consider them a special case of the class of global searches interrupted after every n generations (these clearly fall into the deterministic control category). A typical example is that of the Engineered Conditioning / GA hybrid described by Miller et al. [1993]. Simpler implementations, where the local optimizers are used only to improve the final GA solutions, are also quite common (see, e.g., the Gradient-Descent / GA hybrid of Finckenor [1997] or the Simulated Annealing (SA) / GA combination reported by Brown et al. [1989]).

The opposite approach is to run several local searches in parallel and, after each block of a pre-set number of iterations, use a global operator (e.g., crossover) to exchange information between the local optimizers. Representative of this method is the PRSA (Parallel Recombinative Simulated Annealing) algorithm [Mahfoud and Goldberg, 1995], which can be considered as several SAs running in parallel, regularly reconciling solutions with


crossover (two randomly selected parents are recombined and one or two Boltzmann trials are held between parents and offspring to decide who will survive into the next generation).
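The first of these patterns – a GA interrupted every few generations by a brief, non-converged Lamarckian local search – might be sketched as follows. This is a toy illustration only: the GA operators, the parameter values and the use of SciPy's BFGS routine are assumptions, not the algorithms used in the references above.

```python
import numpy as np
from scipy.optimize import minimize


def deterministic_hybrid(objective, dim, pop_size=20, n_gen=60, block=10,
                         bounds=(-2.0, 2.0), seed=0):
    """Toy real-coded GA interrupted every `block` generations by a brief
    (non-converged) local search on the current best individual, whose result
    is written back into the population (Lamarckian learning)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    pop = rng.uniform(lo, hi, size=(pop_size, dim))
    fit = np.apply_along_axis(objective, 1, pop)
    for gen in range(1, n_gen + 1):
        children = np.empty_like(pop)
        for i in range(pop_size):
            # binary tournament selection of two parents
            a, b = rng.integers(pop_size, size=2)
            p1 = pop[a] if fit[a] < fit[b] else pop[b]
            a, b = rng.integers(pop_size, size=2)
            p2 = pop[a] if fit[a] < fit[b] else pop[b]
            # blend crossover followed by a small Gaussian mutation
            w = rng.uniform(size=dim)
            children[i] = np.clip(w * p1 + (1.0 - w) * p2
                                  + rng.normal(0.0, 0.05, dim), lo, hi)
        pop = children
        fit = np.apply_along_axis(objective, 1, pop)
        if gen % block == 0:                        # pre-determined schedule
            best = int(np.argmin(fit))
            res = minimize(objective, pop[best], method="BFGS",
                           options={"maxiter": 3})  # 2-3 local iterations only
            pop[best] = np.clip(res.x, lo, hi)
            fit[best] = objective(pop[best])        # Lamarckian write-back
    best = int(np.argmin(fit))
    return pop[best], fit[best]
```

The point of the sketch is that the schedule (`block`) is fixed before the run and takes no feedback from the state of the search, which is what places such hybrids in the deterministic control category.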

3.1.2.2 Adaptive control

Hybrids with adaptive control schemes take feedback from the search, but, as opposed to deterministic control strategies, this is not limited to elapsed time (number of evaluations, generations). Typical measures used to control search time allocation are (best) individual fitness, population average fitness and individual or average improvement (reward). One such approach for controlling a hybrid’s search time division is based on the 2-armed bandit stochastic automaton (see, e.g., Lobo and Goldberg [1997], Igel and Kreutz [1999], Magyar et al. [2000]), a statistical decision theory model, the use of which goes back at least to Holland [1975]. The algorithm uses a simple weighted relaxation formula, whereby it attempts to predict at each step, whether the local or the global method is likely to lead to better improvement in the next stage of the search. Another major class of such techniques was spawned by the natural metaphor of species competing for the same resources. A fairly straightforward way of implementing this idea is to allow subpopulations with different searches to evolve for short sequences of reproductive isolation and then to resize them depending on their average improvement achieved (i.e., the more successful species grow at the expense of the less successful). The exchange of individuals is usually carried out by means of a migration scheme (for some recent contributions implementing this idea see, e.g., Herdy [1992], Autere [1994], Schlierkamp-Voosen and Muhlenbein [1994], Eiben et al. [1998]).
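A minimal version of the bandit-style controller mentioned above might track a smoothed reward for each search method and allocate the next stage of the search to whichever is predicted to produce the larger improvement. The relaxation constant and the occasional random choice in the sketch below are illustrative assumptions.

```python
import numpy as np


class TwoArmedBanditController:
    """Adaptive search-time allocation in sketch form: keep a smoothed estimate
    of the improvement ('reward') produced by the global and the local search,
    and hand the next stage to whichever is currently predicted to do better."""

    def __init__(self, alpha=0.3, epsilon=0.1, seed=0):
        self.reward = {"global": 0.0, "local": 0.0}
        self.alpha = alpha          # weighted-relaxation (smoothing) constant
        self.epsilon = epsilon      # probability of a random, exploratory choice
        self.rng = np.random.default_rng(seed)

    def choose(self):
        if self.rng.random() < self.epsilon:
            return str(self.rng.choice(["global", "local"]))
        return max(self.reward, key=self.reward.get)

    def update(self, arm, improvement):
        # relax the stored reward towards the improvement just observed
        self.reward[arm] += self.alpha * (improvement - self.reward[arm])
```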

3.1.2.3 Self-adaptive control

Some authors use the terms “adaptation” and “self-adaptation” interchangeably. We strongly believe that a clear terminological difference should be made between the two approaches, as also suggested in the parameter control taxonomy of Eiben et al. [1999] or, e.g., in the survey by Angeline [1995]. Self-adaptive evolutionary hybrid algorithms integrate the parameters that control search time division (usually indicators of whether an individual should undergo local or global improvement) into the genotypes of the individuals. Thus, the search time


division evolves simultaneously with the solution of the optimization problem (in other words, the problem space and the operator space are being searched at the same time). This approach seems to be less popular in the global/local hybrid literature – it is more commonly used in pure EAs to decide between different types of crossover (e.g., Spears [1995]) or mutation (e.g., Saravanan and Fogel [1994]) operators.
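As a sketch of the encoding involved (illustrative only; the representation details are assumptions), the genotype can simply be extended with one extra gene that is inherited and mutated like any other, and that decides whether the individual undergoes local improvement:

```python
import numpy as np


def make_individual(rng, dim, lo=-2.0, hi=2.0):
    """Self-adaptive encoding in sketch form: the design variables are followed
    by one extra gene -- the probability that this individual undergoes local
    improvement -- which is inherited and mutated along with the solution."""
    return np.append(rng.uniform(lo, hi, dim), rng.uniform(0.0, 1.0))


def mutate(individual, rng, sigma_x=0.05, sigma_p=0.05):
    child = individual.copy()
    child[:-1] += rng.normal(0.0, sigma_x, individual.size - 1)              # design variables
    child[-1] = np.clip(child[-1] + rng.normal(0.0, sigma_p), 0.0, 1.0)      # control gene
    return child


def undergoes_local_search(individual, rng):
    return rng.random() < individual[-1]   # the evolved control parameter decides
```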

3.2 Designing a hybrid

3.2.1 The Search Time Division Scheme

With the above overview in mind, we can now proceed to propose a hybrid optimization algorithm (or a family of such algorithms) suitable for our purposes. First of all we need to decide which of the previously discussed classes we want our algorithm to belong to. It is currently most widely accepted that an efficient hybrid between an explorer and an exploiter should control the search-time division based on the progress of the search. This places us into the division control (3.1.2) category. As self-adaptive control is not particularly suitable for this type of application (we have included it in our taxonomy only for the sake of completeness), we are left with a choice between deterministic and adaptive control. It is clear that a correctly set up deterministic strategy cannot have a worse performance than an adaptive one. Indeed, it usually performs better, as most adaptive schemes take some time to recognize the presence of a pressure towards increased exploration/exploitation intensity. The problem is, of course, the holy grail of stochastic optimization theory: how do we know a priori what the correct setup is? How do we know from the outset which individuals should undergo exploration / local improvement and when? The simple answer is that we usually do not and this leaves us with two possibilities. We can either gain some experience in choosing the correct setup by performing optimization runs with a deterministic scheme with different setups on a variety of functions and then apply that knowledge when confronted with a new problem, or we can directly run an adaptive hybrid on our problem. We can be reasonably confident that the latter approach gives fairly good performance on the class of (engineering design)


problems that we are interested in here; however, if there are similarities between the problem in hand and another problem on which we have already gained some experience, the former is likely to give better results. Therefore, in section 3.3 we will construct a general hybrid template, which allows the implementation of both approaches. Our intention is to combine a local search with a population-based global technique. We will base this general framework on the concept of multiple subpopulations, since we consider this to be very versatile and well-suited for combining both population-based and non-population-based search algorithms.

3.2.2 The Components of the Hybrid

The second essential aspect of designing a hybrid search heuristic is, of course, the choice of its components. In the case of population-based techniques this involves choosing the appropriate operator(s) used to spawn the members of new generations. In the following we look at three possible criteria for making this decision.

3.2.2.1 Operator Selection Based on Off-line Performance Analysis

Off-line methods can be particularly useful if we have some preliminary knowledge of the objective function (dimensionality, multimodality, noise, etc.) and thus we can construct a test-function that will emulate the real-world objective. This type of analysis can also be applied when the algorithm will be used to solve a large number of similar problems and thus the computational cost of the off-line experiments is likely to be a worthwhile long-term investment. In the following we suggest some simple techniques for evaluating the performance of a unary operator α.

The reasoning behind the operator performance assessment technique described below is based on the observation that the efficiency of many operators depends on the region of the search space where they are applied. By "region" here we mean areas between objective function level curves. In other words, some operators may be efficient when the individual that they are applied to has a poor objective function value, while others may be preferable when we are approaching the bottom of a basin of attraction.

Let $P = \left\{ \mathbf{x}_1^{(P)}, \mathbf{x}_2^{(P)}, \ldots, \mathbf{x}_{n_p}^{(P)} \right\}$ denote the set of $n_p$ parents collected over a number of runs of the optimizer and $O_\alpha = \left\{ \mathbf{x}_1^{(O)}, \mathbf{x}_2^{(O)}, \ldots, \mathbf{x}_{n_p}^{(O)} \right\}$ the set of corresponding offspring obtained during the same number of runs by applying unary operator α to them (the same parent can appear several times in the set if the operator is applied to it more than once). We determine the best and the worst objective value corresponding to the individuals in the parents data set:

\[
F_{\mathrm{MIN}} = \min\left\{ F\!\left(\mathbf{x}_1^{(P)}\right), F\!\left(\mathbf{x}_2^{(P)}\right), \ldots, F\!\left(\mathbf{x}_{n_p}^{(P)}\right) \right\} \tag{3.1}
\]
\[
F_{\mathrm{MAX}} = \max\left\{ F\!\left(\mathbf{x}_1^{(P)}\right), F\!\left(\mathbf{x}_2^{(P)}\right), \ldots, F\!\left(\mathbf{x}_{n_p}^{(P)}\right) \right\} \tag{3.2}
\]

where F is the objective function. Next, we discretize the interval $[F_{\mathrm{MIN}}, F_{\mathrm{MAX}}]$ into $n_b$ sub-intervals of equal length with the boundaries $l_0, l_1, \ldots, l_{n_b}$, so that $l_0 = F_{\mathrm{MIN}}$ and $l_{n_b} = F_{\mathrm{MAX}}$. Thus, we can define the normalized aggregate benefit of applying the unary operator α to the set of parents P, as the array:

\[
B(\alpha)_i = \frac{1}{m_i} \sum_{l_{i-1} < F\left(\mathbf{x}_j^{(P)}\right) \le l_i} \left[ F\!\left(\mathbf{x}_j^{(P)}\right) - F\!\left(\mathbf{x}_j^{(O)}\right) \right], \qquad i = 1, \ldots, n_b,
\]

where $m_i$ is the number of parents whose objective values fall into the $i$-th sub-interval.

Chapter 4

Radial Basis Function Models in Global Optimization

\[
E(I) = \mathrm{EIF}(\mathbf{x}) =
\begin{cases}
\left(y_{\min} - \hat{y}\right)\Psi\!\left(\dfrac{y_{\min} - \hat{y}}{s}\right) + s\,\psi\!\left(\dfrac{y_{\min} - \hat{y}}{s}\right) & \text{if } s > 0 \\[4pt]
0 & \text{if } s = 0
\end{cases}
\tag{4.28}
\]

where Ψ(·) is the standard normal distribution function and ψ(·) is the standard normal density function. The first term of equation (4.28) is the predicted difference between the current minimum and the prediction ŷ at x, penalized by the probability of improvement. Hence it is large where ŷ is small (or is likely to be smaller than ymin). The second term is large when the error s is large, i.e., when there is much uncertainty about whether y will be better than ymin. Thus, as Schonlau [1997] points out, the expected improvement will tend to be large at a point with a predicted value smaller than ymin and/or much uncertainty associated with the prediction.


We thus have a criterion that favours promising basins of attraction, at the same time allowing the exploration of less well known areas of the search space. Returning to our earlier one-variable example, Figure 4.5 illustrates this point. The EIF criterion has two peaks in this case, one at the currently predicted best basin of attraction (on the left) and another where we have less confidence in the accuracy of the predictor (due to the long distance to the nearest sampled point).


Figure 4.5: One variable example function with its RBF approximation based on six points (top) and the corresponding EIF (bottom).

The generic EIF-based two-stage optimization method used in this work can be summarised as follows. We start with a set of points usually arranged in a space-filling pattern using some DoE technique (e.g., latin hypercube sampling) and we fit an RBF model to this set. We then pick the next design to be evaluated by maximizing the corresponding EIF. We run the analysis at that point, add the objective value to the database of known designs, re-fit the RBF and repeat these last three steps until we run out of time or some other stopping criterion is met. In the next chapter we investigate the feasibility of tailoring this algorithm to run on a parallel computational architecture.
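The loop just described can be sketched compactly as follows. This is an illustrative Python implementation, not the code used in the thesis: the Gaussian-kernel interpolator uses a fixed kernel width and a crude kriging-style error estimate in place of the trained RBF models discussed earlier, and the EIF is maximized by a simple random multi-start rather than a dedicated global search.

```python
import numpy as np
from scipy.stats import norm, qmc
from scipy.optimize import minimize


class GaussianRBF:
    """Interpolating Gaussian-kernel model with a crude kriging-style error
    estimate -- a simplified stand-in for the RBF models discussed above."""

    def __init__(self, X, y, sigma=0.2):
        self.X = np.asarray(X, float)
        self.y = np.asarray(y, float)
        self.sigma = sigma
        self.Phi = self._kernel(self.X, self.X) + 1e-10 * np.eye(len(self.X))
        self.w = np.linalg.solve(self.Phi, self.y)

    def _kernel(self, A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * self.sigma ** 2))

    def predict(self, x):
        phi = self._kernel(np.atleast_2d(x), self.X)[0]
        y_hat = float(phi @ self.w)
        # error estimate: zero at sampled points, grows with distance from the data
        s2 = max(1.0 - float(phi @ np.linalg.solve(self.Phi, phi)), 0.0)
        return y_hat, np.sqrt(s2) * float(self.y.std())


def expected_improvement(model, x, y_min):
    """Equation (4.28): expectation of improvement over the best known value."""
    y_hat, s = model.predict(x)
    if s <= 0.0:
        return 0.0
    u = (y_min - y_hat) / s
    return (y_min - y_hat) * norm.cdf(u) + s * norm.pdf(u)


def eif_optimize(objective, bounds, n_initial=8, n_infill=20, seed=0):
    """Two-stage search: space-filling initial sample, then EIF-selected infills."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, float).T
    X = lo + qmc.LatinHypercube(d=len(bounds), seed=seed).random(n_initial) * (hi - lo)
    y = np.array([objective(x) for x in X])
    for _ in range(n_infill):
        model = GaussianRBF(X, y)
        y_min = y.min()
        best_x, best_ei = None, -1.0
        # maximise the EIF by multi-start local search (a stand-in for a more
        # careful global search of the highly multimodal EIF surface)
        for x0 in lo + rng.random((30, len(bounds))) * (hi - lo):
            res = minimize(lambda z: -expected_improvement(model, z, y_min),
                           x0, method="L-BFGS-B", bounds=bounds)
            if -res.fun > best_ei:
                best_x, best_ei = res.x, -res.fun
        X = np.vstack([X, best_x])
        y = np.append(y, objective(best_x))
    return X[np.argmin(y)], float(y.min())
```

The `expected_improvement` function is a direct transcription of equation (4.28); everything around it is scaffolding whose details would differ in a production implementation.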

Chapter 5

Parallel Updates in EIF-based Global Optimization

In today's industrial setting it is commonplace for parallel computing architectures to be available to the design engineer. Here we assess the feasibility of the use of parallel computing throughout the design process based on the methods described in the previous chapter. We will assume Np processors are available; first these will be used to calculate the objective function for the initial Design of Experiments points in parallel. This information will then be used to construct a Radial Basis Function surrogate model, which, in turn, provides the starting point for the optimization process. Both standard and gradient-enhanced models are considered. We compare the approaches assuming gradients are free and alternatively that they are available at the cost of one function evaluation (which we sometimes term the "adjoint cost") as both these situations can arise in practice. After a discussion of the algorithm and a demonstrative example designed to illustrate its implementation, we present an empirical feasibility study, first on two artificial test functions, then on a structural design case.

5.1 The Update Scheme

One of the advantages of optimization based on approximation models (surrogates) is that the evaluation of the initial experimental design can be done in parallel. The only condition is, of course, that the size of this design must be a multiple of Np if the parallelisation speedup is to be linear. We devote much of Chapter 6 to looking at other factors that may influence our choice of the size of the initial sample – for now, we focus on the issue of parallelising the updates to the model as well.

The basic idea is straightforward: we search the expected improvement function (EIF) calculated for the model and highlight the Np best maxima. This is possible as the EIF is usually highly multimodal. Nevertheless, should Np unique maxima not be available, we could add some extra points using either of the other two simple criteria discussed in the previous chapter. That is, the points corresponding to maxima of the EIF can be supplemented with optima of the predictor or maxima of the prediction error landscape. However, here we consider the expected improvement approach alone as in all the examples tested we found more local EIF maxima than available processors. We only mention the others for completeness.

Figure 5.1 shows the flowchart of the algorithm. The loop at the top optimizes some space-filling measure of the DoE set (in all experiments presented in this chapter we have used the Morris-Mitchell criterion – see the discussion in Section 4.4.1). This is followed by the parallel evaluation of the initial set of points. Since, as we mentioned earlier, the number of DoE points is usually chosen to be a multiple of the number of available processors, each of the Np boxes (only three are shown) can correspond to several sequential evaluations. The next loop trains the model – in the case of the Gaussian RBF models used here this amounts to finding the optimum value of the kernel width parameter σ. After locating the top Np maxima of the expected improvement ("Maximization of EIF" box), the corresponding designs are evaluated in parallel, the newly computed objective values are added to the database of evaluated points and the process is repeated until some stopping criteria are met.

In the experiments described here we used two stopping criteria. We ran the optimization process until either the global optimum of the function was reached (in the case of test functions with known optima) or the

pre-established computational budget had been exhausted, whichever came first.

Figure 5.1: Flowchart of the parallel RBF optimization scheme.

We note here that there are several ways of locating the Np best maxima. For example, one could employ a GA with clustering or any multi-start local optimizer. In the applications described here we have used 500 random restarts of the BFGS algorithm written by Nocedal et al. [1997] and in each case we highlighted the top Np distinct local maxima from the list of 500 points generated by the BFGS.

In the remainder of this chapter we present an empirical examination of this EIF-based parallel heuristic. The experiments were conducted on two five-dimensional artificial test functions and a real-life engineering problem: the minimization of the maximum stress in a component, subject to a mass constraint. Before proceeding with the discussion of these experiments, we illustrate the method on a simple two-variable test problem.
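Before that example, the maxima-extraction step just described (many random restarts of a local optimizer on the EIF, from which the Np best distinct maxima are kept) can be put in sketch form. This is illustrative only: SciPy's L-BFGS-B routine stands in for the BFGS code cited above, and the distance threshold used to declare two maxima distinct is an assumption.

```python
import numpy as np
from scipy.optimize import minimize


def top_np_distinct_maxima(eif, bounds, n_p, n_restarts=500, min_dist=0.05, seed=0):
    """Pick the n_p best, mutually distinct local maxima of an expected
    improvement surface found by a multi-start local search."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, float).T
    candidates = []
    for x0 in lo + rng.random((n_restarts, len(bounds))) * (hi - lo):
        res = minimize(lambda x: -eif(x), x0, method="L-BFGS-B", bounds=bounds)
        candidates.append((res.fun, res.x))      # res.fun is -EIF at the maximum
    candidates.sort(key=lambda c: c[0])          # most negative first = largest EIF
    selected = []
    for _, x in candidates:
        if all(np.linalg.norm(x - s) > min_dist for s in selected):
            selected.append(x)
        if len(selected) == n_p:
            break
    return selected                              # up to n_p points to evaluate in parallel
```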

5.2 A Demonstrative Example

For the purposes of demonstrating the approach outlined above, we chose the two-variable Branin function (for details of the function see Section B.2). The inputs x1 and x2 are normalised to the range [0,1]. Using our technique based on radial basis function approximation, assuming Np = 4 parallel processors, we start with a DoE set consisting of eight computer experiments (two parallel sets of runs). These points, arranged in a latin hypercube experimental design, are shown in Figure 5.2, which also contains a contour plot of the initial RBF approximation based on the eight sampled points. We then locate the four best maxima of the expected improvement surface, as shown in Figure 5.3, evaluate the objective at these points and update the model. We continue until we have 24 points (six parallel runs) – see Figure 5.4. The RBF predictor here has already succeeded in outlining the three basins of attraction of the Branin function (the contour plot of the "true" function is shown in Figure 5.5).

This experiment serves merely to illustrate the optimization procedure – in real-life, high-dimensional objectives it is unlikely that such a precise reconstruction of the objective would be possible with a reasonable number of evaluations. Of course, this is


not a pre-requisite of finding the optimum anyway – in most cases at the stage when we have located the global optimum, our picture of the function overall is still relatively sketchy. Indeed, the advantage of using an update criterion that balances exploration and exploitation (such as expected improvement), as opposed to striving for a gradual, uniform improvement of the accuracy of the model, is that when we have sufficient confidence that we have found a good basin of attraction, the search will begin to concentrate on updating the model predominantly in the area of interest. When this is adequately mapped the EI function usually switches to another basin.

Figure 5.2: Initial approximation of the Branin function based on 8 points.

Figure 5.3: The EIF surface corresponding to the Gaussian RBF model of the Branin function based on eight evaluations. The black squares indicate its four highest maxima – this is where the function will be sampled next.


Figure 5.4: The RBF predictor of the Branin function after six sets of parallel runs (24 points).

Figure 5.5: Contour plot of the true Branin function.

5.3 Two Five-Dimensional Test Cases

We have picked two test functions of different modality for this study: the modified Rosenbrock function (denoted here by f2; moderate modality, see Section B.3) and Ackley's Path (denoted by f3; high modality, details in Section B.4). While the main target of our investigation is the performance of the optimizers based on the RBF models, we considered it enlightening to look at the global prediction accuracy of the RBF predictors first. For the comparisons in this brief diversion we arbitrarily chose the modified Rosenbrock function.

Table 5.1 shows the accuracy of the standard radial basis function approximation model (i.e., without gradient information included). The table shows the average model error, the correlation between the model and the true function and the optimum σ². This is shown for optimal latin hypercube


designs of several different sizes N. The errors and correlations are evaluated at a set of testing data created by evaluating the modified Rosenbrock function at a 500-point random latin hypercube design. Table 5.2 contains the same information using the gradient-enhanced radial basis function (GERBF) approximation.

N     Average error    Correlation    σ²
16    0.7184           0.1412         0.10
32    0.6844           0.2594         0.05
48    0.5925           0.4870         0.05
64    0.5057           0.5540         0.10
80    0.4653           0.6173         0.10

Table 5.1: Accuracy of the approximation without using gradients.

N     Average error    Correlation    σ²
8     0.6514           0.3625         0.10
16    0.6010           0.4781         0.05
32    0.5536           0.5708         0.05
48    0.3797           0.7355         0.10
64    0.3336           0.7825         0.10
80    0.2004           0.9140         0.10

Table 5.2: Accuracy of the approximation using gradients.

From tables 5.1 and 5.2 it can be seen that as N increases our model generally becomes more accurate (the error reduces and the correlation increases), as expected. It is more interesting to compare the accuracy of the GERBF model with that of the standard RBF model. We note that when all the gradients are available at a cost of one function evaluation, utilizing gradient information increases the global accuracy of the model (compare table 5.1 with N = 16, 32 with table 5.2 using N = 8, 16). We have observed this to be true for other choices of N, k and other objective functions. When gradients are available virtually for free, the accuracy of the GERBF model is far superior. This advantage increases with the number of dimensions, as more gradient components are available at the same (low) computational cost. Let us now examine how the various approaches perform in terms of optimization.


First, we compare the RBF, together with the GERBF, using Np = 1, 4 and 8 processors on both five-variable objective functions. We look at the performance of the optimization strategy based on these models, assuming gradients are both free and available at a cost of one function evaluation. We choose the size of the DoE so that its computational cost will be the same in each case: we start from a 32-point latin hypercube design in the case of the standard RBF model and the GERBF model when we assume free gradients, and a 16-point latin hypercube design when the gradients are assumed to have "adjoint" cost. All DoE sets used here have been optimized using the minimax criterion. Figures 5.6, 5.7 and 5.8 show the results of optimizing the modified Rosenbrock function, using Np = 1, 4 and 8 processors respectively. Figures 5.9, 5.10 and 5.11 show the same information when using the Ackley function.

The first feature one might observe on these plots is the sharp drop in the objective value as soon as the infill selection criterion (EI in this case) takes over from the initial space-filling experimental design (after the evaluation of the 32-point latin hypercubes).[1] As we saw at the beginning of this section, if we assume the sensitivity information to be free, the accuracy of the gradient-enhanced models is considerably higher than that of the standard RBF approximation having the same computational cost. As these optimization histories reveal, the good initial model translates into good optimization performance within the framework based on expected improvement updates: the "free gradient" runs converge fastest in all six test scenarios. Clearly, the more training data we have (i.e., gradient information in addition to objective values in this case), the higher the reliability of the improvement predictor (EIF) will be and in many cases, as seen in these plots, this will ensure convergence within a few updates.

The "adjoint cost" case is less clear-cut. Although, as discussed earlier, in terms of global model accuracy the inclusion of gradients is still worthwhile, the increase in the reliability of the EIF predictor often does not make a positive impact on the convergence speed of the updates (in fact, the performance of the optimizer is often worse than without gradients).

[1] Incidentally, this shows the superiority of the EI-based approach to, for example, simply using up all our expensive evaluations in a space-filling DoE – we will analyse this issue more thoroughly in the next chapter.

Figure 5.6: Comparison of optimization strategies based on RBF and GERBF models for the modified Rosenbrock function, using one processor.

Figure 5.7: Comparison of optimization strategies based on RBF and GERBF models for the modified Rosenbrock function, using four processors.

One of the most important aspects of such optimization schemes is the efficiency of the parallelization. As shown in tables 5.3 and 5.4, this varies from case to case. If, for example, one looks at the free gradient case of the Rosenbrock function optimization, the total number of evaluations needed to reach convergence is very similar for the three values of Np, i.e., the speedup is nearly linear. In fact, this conclusion can be drawn in almost all cases for the one- and four-processor runs. In some of the experiments moving to eight-processor parallel runs reduces the efficiency of the search. For example, while optimizing the Ackley function takes 64 evaluations on a four-processor architecture (considering the cost of the gradients being equal to that of an additional evaluation, i.e., "adjoint cost"), if the parallel updates are chosen in batches of eight the required number of evaluations reaches 112.

Figure 5.8: Comparison of optimization strategies based on RBF and GERBF models for the modified Rosenbrock function, using eight processors.

Figure 5.9: Comparison of optimization strategies based on RBF and GERBF models for the Ackley function, using one processor.

To conclude this section, we consider how these surrogate-based optimization strategies perform against some frequently used optimization procedures. Here we assume the gradients are free and consider a GA, a gradient-based method (BFGS) and our optimization scheme utilizing GERBF predictions. Results for both the modified Rosenbrock and Ackley functions are shown in figures 5.12 and 5.13 respectively.


Figure 5.10: Comparison of optimization strategies based on RBF and GERBF models for the Ackley function, using four processors.

Figure 5.11: Comparison of optimization strategies based on RBF and GERBF models for the Ackley function, using eight processors.

Np    Without grad.    "Adjoint" grad.    Free grad.
1     47               40                 39
4     48               56                 48
8     64               80                 40

Table 5.3: Comparison of number of evaluations (i.e., wall time × Np) needed to reach convergence for the modified Rosenbrock function.


Np    Without grad.    "Adjoint" grad.    Free grad.
1     66               60                 39
4     64               64                 52
8     80               112                48

Table 5.4: Comparison of number of evaluations (i.e., wall time × Np) needed to reach convergence for the Ackley function.

We arbitrarily use Np = 4. The GA and BFGS convergence histories are averaged over 50 optimization runs. When we consider the GA, new members of the population are spread over the Np processors to make use of the parallel computing architecture. When considering BFGS we perform one optimization run on each processor, therefore we are using four random restarts. Multiple restarts are generally required when using gradient-based optimizers, as they become trapped in local minima. The convergence histories show the best objective function value found so far after each parallel run. Finally, we compare these approaches to the GERBF-based optimization approach (again, using four processors).

As the optimization histories show, in these cases the GA performs relatively poorly; it is at an obvious disadvantage, as it makes no use of the free gradient information. The BFGS method performs much better: it is almost competitive with our GERBF-based optimization approach on the modified Rosenbrock function; however, it does often become trapped in one of the local minima, hence it usually converges to a higher value. On the more strongly multimodal Ackley function the BFGS method performs much worse, often becoming trapped in local minima and also converging to these minima at a slower rate. Our expected improvement approach using GERBF-based optimization consistently finds the global minimum very quickly.

Figure 5.12: Comparison of optimization techniques (GA, multi-start BFGS and GERBF) on the modified Rosenbrock function with free gradients.

Figure 5.13: Comparison of optimization techniques (GA, multi-start BFGS and GERBF) on the Ackley function with free gradients.

5.4 A Structural Testcase

In this final example we consider the optimization of a spoked structure, a six-dimensional, constrained problem (for a more detailed description see Appendix C). We use this real-life testcase to compare the search based on gradient-enhanced approximation models to the standard variant of the technique that does not make use of sensitivity information.

This is a case of constrained optimization, where the objective function is relatively expensive, but the evaluation of the constraint is virtually free. When dealing with constraints we take ymin to be the minimum of the feasible sampled responses (in equation (4.28)). We further penalize the expected improvement in infeasible areas so that only feasible updates are performed. We did not, however, bias the DoE in any way in this case.
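In sketch form, the constraint handling just described amounts to the following (illustrative only; the convention g(x) ≤ 0 for feasibility and the hard zeroing of the criterion are assumptions about one possible penalty, not a statement of the exact form used here):

```python
def constrained_infill_criterion(x, eif, constraint, y_feasible_min):
    """Expected improvement with the constraint handling described above:
    y_min is the best *feasible* sampled response, and the criterion is
    suppressed at infeasible points so that only feasible updates are chosen."""
    if constraint(x) > 0.0:            # assumed convention: g(x) <= 0 is feasible
        return 0.0
    return eif(x, y_feasible_min)
```

Since the constraint is virtually free to evaluate, calling it inside the infill criterion adds negligible cost to the search.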


Figure 5.14: Comparison of optimization techniques on the structural example assuming 1 processor.

We have constructed our original approximation using 16 finite element evaluations, the points being distributed evenly throughout the design space using a Morris-Mitchell optimal latin hypercube design. The accuracy of the model was checked against a database of 1000 previous runs (also uniformly distributed throughout the design space).[2] Without incorporating gradients there was an average error of 7279.4 and a correlation of 0.3868. Utilizing cheap gradient information led to a model with an average error of 4309.1 and an average correlation of 0.7903 – clearly a much more accurate model.

We then performed surrogate-based optimization using these two approximation models employing the strategy shown in figure 5.1. Figure 5.14 shows the results assuming one processor is available, whereas figure 5.15 shows a similar result when four processors are available. As can be seen, utilizing cheap gradient information allows very good solutions to be found in significantly less wall time (i.e., actual time elapsed while the parallel runs are being performed) than the variant of the method that takes no account of this information.

[2] Usually, such a study would of course not be possible (or indeed, necessary, for the purposes of optimization). Here we have only calculated these figures to gain an insight into the effect of gradient-enhancement on the global accuracy of the model.


Figure 5.15: Comparison of optimization techniques on the structural example assuming 4 processors. One wall time unit is the cost of one FE evaluation.

5.5 Conclusions

In this chapter we have discussed a parallel, surrogate-modeling-based approach to optimization. The parallel updates here come from locating several maxima of the expected improvement measure. Surrogate models with and without gradient information have been considered. When the gradients are included, their cost has been assumed first to be free and then to be equal to one objective function evaluation. In terms of the model's accuracy it appears to make sense to incorporate this gradient information, even if the derivatives are available at adjoint cost. From the point of view of optimization the situation is less obvious. However, if gradients are available virtually for free, as is sometimes the case in Finite Element models, then their use unquestionably makes sense.

We note here that the sensitivity information used for these experiments has, in reality, been obtained via finite differencing. This was feasible for the purposes of experimentation, as the computational cost of the objective was fairly low. A further assumption that we have made when comparing the various techniques discussed here was that the computational cost of the optimization algorithms themselves (e.g., the cost of the algebra associated with building the approximation models and searching the expected improvement surface) was negligible when compared to the cost of the objective. In practice this is not always the case and we will return to this issue in


Chapter 7. We also mention here that in an industrial setting there is often another constraint on performing parallel analyses: the cost of the extra licenses required to run the simulation software. However, this issue has not been considered in the efficiency comparisons presented here.

Chapter 6

On the Design of Optimization Strategies Based on RBF Models

In Chapter 4 we have outlined the general template upon which two-stage approximation-based strategies are built. The first stage comprises the selection of an initial set of designs (commonly arranged in a space-filling manner) and the application of the (usually expensive) physics-based analysis to them. During the second stage we select the points where the objective will be evaluated next – this time based on information we already have from building the approximation model on the existing (evaluated) designs. In the previous chapter we examined the feasibility of running such an optimization process on parallel architectures, selecting the infill sample points by maximizing the expected improvement function. Focusing now on the standard, sequential implementation of the method, this chapter looks at two aspects of the planning of such runs. The aim is to formulate a set of guidelines that may help engineers in making these planning decisions.

6.1 Two Strategy Planning Issues

6.1.1 The Size of the Initial Sample

Our main concern in this study is the optimum size of the initial space-filling sample (stage one). Although some researchers use "rules of thumb", such as that the number


of points should be roughly ten times the number of dimensions [Jones et al., 1998], to date there is no clear understanding of how this figure should be chosen and what influence the choice has on the performance of the optimizer. Intuitively, one might keep the size of the initial sample to a minimum and target the majority of the available shots more intelligently, i.e., using some infill sample selection criterion based on an approximation of the objective function. However, caution needs to be exercised here. The main question is: could the approximation based on a very small sample be so inaccurate – and thus misleading – that we would be better off starting with a set of points whose selection is simply based on a space-filling criterion? Conversely, if we start with a large number of data points, are we not wasting precious evaluations by selecting them without regard to the previously found objective values? Also, how does the optimum size of the initial sample depend on the choice of the type of infill criterion for further points? While a definitive answer may be some way away, here we offer an empirical investigation of the issue, sufficiently conclusive, we hope, to offer some set-up guidance to users of such algorithms.

6.1.2 Controlling the Scope of an RBF-based Global Search

The second object of our study, which we examine in conjunction with the problem of the initial sample size, concerns stage two of the approximation-based search. We look at how the scope of the infill criterion can be controlled, i.e., how it can be biased towards local exploitation of promising basins of attraction or towards global exploration of the search space and, most importantly, what effect the bias has on the performance of the optimizer.

A Drawback of the Expected Improvement Criterion

In global optimizers it is important to achieve some balance between exploration and exploitation – approximation-based techniques are no exception. As we saw in Chapter 4, an infill selection criterion designed to search both the prediction and its uncertainty is the maximization of the expectation of the amount by which the next potential evaluated point will improve on the best objective value known so far. The expected improvement function (EIF, Section 4.4.3) provides us with a means of fusing exploration and exploitation into a single criterion. However, if the problem in hand is likely to yield a simple, unimodal surface,


searching the predictor will probably work better. Conversely, if the objective landscape is extremely multimodal, biasing the search towards sampling in thus far unexplored areas could lead to faster improvement than the expected improvement criterion. In other words, expected improvement still does not allow us to control the balance between local and global exploration. Furthermore, the scope of the expected improvement criterion may not be broad enough if the objective is poorly estimated by the approximation (and consequently the accuracy of the expectation of the improvement is also questionable). To alleviate these shortcomings, Schonlau [1997] proposed the generalized expected improvement criterion. This is controlled by a parameter g = 0, 1, 2, . . . He shows that for g = 0 the criterion yields the probability of improvement (this value, for a particular point, is the probability of the current best objective value being improved on if we sample in that point). For higher values of g the emphasis shifts more and more towards global search. Sasena et al. [2002] suggest a heuristic reminiscent of Simulated Annealing, which is based on generalized expected improvement. They start with a high value of g, which is then decreased as the search progresses, based on a discrete, approximately exponential cooling schedule. The generalized expected improvement measure thus allows the user to control the scope of the search to some extent, but since it has no upper bound, its values are extremely difficult to select for a particular application (it is hard to tell how much impact a change from, say, g = 5 to g = 10 will make on the global bias of the search). Furthermore, it does not cover the search scope range between extremely localized exploitation (g = 0, i.e., probability of improvement) and expected improvement (g = 1). Here we propose a weighted expected improvement criterion, which is designed to allow a more flexible and more “user friendly” means of biasing the search towards exploration or exploitation. In the following we describe this measure in more detail, taking the standard expected improvement criterion as a natural starting point. This is followed by a demonstrative example that highlights the most important features of the criterion. We then adopt an empirical approach to examine the effect of the weighting (the control parameter of the criterion) on the performance of an approximation-based optimizer, in conjunction with the other major parameter that we propose to investigate: the size of the initial sample.

Weighted Expected Improvement

As a reminder, we reproduce here the equation used in Section 4.4.3 to calculate the expected improvement function at a point x:

\[
E(I) = \mathrm{EIF}(\mathbf{x}) =
\begin{cases}
\left(y_{\min} - \hat{y}\right)\Psi\!\left(\dfrac{y_{\min} - \hat{y}}{s}\right) + s\,\psi\!\left(\dfrac{y_{\min} - \hat{y}}{s}\right) & \text{if } s > 0 \\[4pt]
0 & \text{if } s = 0
\end{cases}
\tag{6.1}
\]

Let us now examine it more closely. The first term is the predicted difference between the current minimum and the prediction ŷ at x, penalized by the probability of improvement. Hence it is large where ŷ is small (or is likely to be smaller than ymin). The second term is large when the error s is large, i.e., when there is much uncertainty about whether y will be better than ymin. Thus, as we saw earlier, the expected improvement will tend to be large at a point with a predicted value smaller than ymin and/or much uncertainty associated with the prediction.

Since we are interested in controlling the precise balance of exploitation (i.e., optimization of the predictor) and exploration (i.e., seeking areas of maximum uncertainty), it makes sense to introduce a weighted infill sample criterion, which is a linear combination of the two terms of the expected improvement measure:

\[
\mathrm{WEIF}(\mathbf{x}) =
\begin{cases}
w\left(y_{\min} - \hat{y}\right)\Psi\!\left(\dfrac{y_{\min} - \hat{y}}{s}\right) + (1 - w)\,s\,\psi\!\left(\dfrac{y_{\min} - \hat{y}}{s}\right) & \text{if } s > 0 \\[4pt]
0 & \text{if } s = 0
\end{cases}
\tag{6.2}
\]

where the weighting factor w ∈ [0, 1]. Clearly, w = 0 will yield the global extreme of the search scope range, while selecting the next infill sample point using w = 1 will concentrate the search on the current best basin of attraction. Thus, the larger the values of w, the more restricted (local) the scope of the search will be and the weighting offers the possibility of fully covering the continuum between exploration and exploitation. A notable value of w is 0.5, which will, of course, yield 0.5EIF(x). We now examine the impact of varying the weighting w on the Weighted Expected Improvement Function (WEIF) landscape through a simple one-variable toy problem.
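In computational terms the criterion is a one-line modification of the EIF; the sketch below (illustrative, assuming the predictor ŷ and the error estimate s at the point of interest are available from the surrogate) follows equation (6.2) directly:

```python
from scipy.stats import norm


def weif(y_hat, s, y_min, w):
    """Weighted expected improvement, equation (6.2).  w = 1 exploits the
    predictor, w = 0 explores the prediction error; w = 0.5 gives 0.5 * EIF."""
    if s <= 0.0:
        return 0.0
    u = (y_min - y_hat) / s
    return w * (y_min - y_hat) * norm.cdf(u) + (1.0 - w) * s * norm.pdf(u)
```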

A Demonstrative Example

Let us consider the one-variable function shown at the top of Figure 6.1. We assume that it has been sampled in the six points indicated on

the plot as circles and an RBF approximation has been constructed. The predictor is also shown in the top section of the figure.


Figure 6.1: One variable example demonstrating the effect of changing the weighting factor in the WEIF criterion.

The uneven distribution of these points may seem slightly unrealistic in a DoE context. Nevertheless, such situations regularly occur in higher dimensions and/or after a few infill points have been added to the database. The second section of the figure, placed below this plot, shows the weighted expected improvement with weighting w = 0. Based on the above discussion, we would expect the WEIF measure to give the search a fully global bias. This is indeed the case, as the sole maximum of the criterion is in the sparsely sampled region, where the uncertainty about whether the function value is better than the current best point is high. Note that the “goodness” of the prediction, i.e., the value of the predictor has no way of influencing the optimization based on the WEIF with w = 0 – the weighted criterion guides us towards the middle of the unsampled region, in spite of the predicted objective being rather poor here. As we increase w, i.e., we give the search a more local flavour, another peak starts to emerge in the WEIF on the left-hand side where the predictor indicates good function values (although the uncertainty is very low here). When we reach w = 0.35 (see the


third section of Figure 6.1) the importance of exploration and exploitation becomes approximately equal (the two peaks are of equal height). The bottom section of Figure 6.1 shows the WEIF landscape for w = 1. Clearly, maximizing this surface will yield a single optimum, which will lie at the point with the lowest predicted objective function value.

6.2 Empirical Investigation

6.2.1 Artificial Test Functions

For this part of our study we have selected three test functions. In order of increasing complexity they are the Sphere, a modified version of Rosenbrock's "banana" function and the highly multimodal Ackley function (details of these functions can be found in Appendix B). We have looked at the performance of the WEIF-based optimizer on the 5-variable versions and in one case (Ackley's) on the 10-dimensional landscape as well.

The optimization algorithm we have used to perform this empirical study is shown in Figure 6.2. We start by generating an initial database of points and we evaluate their objective function values. A random Latin Hypercube experimental design is used to generate these initial designs. In order to alleviate the effects of any bias due to some of these initial points falling, by sheer luck, close to the global optima of the studied functions, each result is averaged over 30 runs (except where otherwise stated). Next, an RBF model is fitted using Gaussian basis functions (equation (4.5)) and the corresponding WEIF surface is optimized using a BFGS search [Nocedal et al., 1997] with 1000 random restarts (as we have indicated before, the expected improvement surface can be highly multimodal and thus difficult to optimize reliably – hence the large number of restarts). The objective function is evaluated at the optimum point and it is added to the database. A new RBF model is fitted and the process is repeated, usually until we run out of time.

There are two exceptions to this stopping criterion, which can halt the process earlier. First, in the case of test functions with known optima, if the global optimum is reached, the process stops. The second supplementary stopping criterion is employed when the WEIF weighting is very high and the search is becoming so localized that successive points are very close together and there is no

point in continuing.

Figure 6.2: Optimization algorithm based on the WEIF infill sample selection criterion.

For the visualization of the results we have opted for a greyscale snapshot map format. The density of a particular region of the plot represents the (averaged) objective function value reached by an optimizer started from an initial Latin Hypercube sample of the size shown on the vertical axis, using the WEIF weighting indicated by the horizontal axis. Looking back at the discussion of the relationship between the scope of the search and the WEIF weighting, the closer we are to the left edge of the plot the more global the search is – conversely, moving to the right gradually reduces the scope of the WEIF criterion.

Let us use Figure 6.3 to clarify how the plots are structured (we will analyze its actual significance later). As an example, the density of the point marked by the x (we chose this point arbitrarily) indicates the average objective value obtained when optimizing the function under scrutiny, after the evaluation of 11 points in total (as indicated in the caption), out of which five were in the initial Latin Hypercube DoE set (this value can be read off the vertical axis) and the remaining six have been selected using the WEIF criterion with a weighting of 0.4 (the abscissa of the point). This, referring again to the analysis in section 6.1.2, relates to an optimization with a slightly broader scope (more global) than that of the conventional expected improvement criterion (w = 0.5).


Figure 6.3: Log-scale density map of objective function values reached by the optimizer after 11 evaluations of the 5-variable Sphere function, using various WEIF weightings (horizontal axis) and initial samples of various sizes (vertical axis). The darker areas correspond to better objective values.

Due to the high computational expense of generating these plots, in most cases it was impractical to run the tests for every possible initial DoE size – the density maps are therefore interpolations between the actual data points. For each initial DoE size the optimizer has been run with 11 different weightings, covering the range between 0 and 1 in increments of 0.1. In order to remove some of the noise and thus make the underlying trends clearer, a 3 × 3 weighted moving average linear filter has been applied to each image.
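The smoothing step is a straightforward piece of image post-processing. A minimal sketch is given below; the exact weights of the 3 × 3 filter are not stated in the text, so a uniform kernel is assumed here purely for illustration.

import numpy as np
from scipy.signal import convolve2d

def smooth_density_map(Z, kernel=None):
    # Z is the raw grid of averaged objective values (initial DoE size x weighting).
    # A 3 x 3 moving-average filter is applied; the uniform weights are an assumption.
    if kernel is None:
        kernel = np.ones((3, 3)) / 9.0
    return convolve2d(Z, kernel, mode="same", boundary="symm")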



Figure 6.4: Log-scale density map of objective function values reached by the optimizer after 35 evaluations of the 5-variable modified Rosenbrock function, using various WEIF weightings (horizontal axis) and initial samples of various sizes (vertical axis). The darker areas correspond to better objective values.


Figure 6.5: Log-scale density map of objective function values reached by the optimizer after 50 evaluations of the 5-variable Ackley function, using various WEIF weightings (horizontal axis) and initial samples of various sizes (vertical axis). The darker areas correspond to better objective values.

Generally the optimizer "weeds out" the very poor regions of the search space fairly rapidly. In other words, during the initial stages of the search very substantial progress is made towards the optimum, leaving comparatively little room for variation in the crucial "fine-tuning" phase (with the exception of runs with very bad parameter choices, which make very slow progress throughout the search process).



Figure 6.6: Log-scale density map of objective function values reached by the optimizer after 60 evaluations of the 10-variable Ackley function, using various WEIF weightings (horizontal axis) and initial samples of various sizes (vertical axis). The darker areas correspond to better objective values.

Therefore, in order to distinguish between the various areas belonging to "relatively good" initial DoE sizes and weightings, the density map is based on a logarithmic scale (as shown by the density bar adjacent to each figure).

Let us now examine Figures 6.3 through 6.6 in more detail, first from the perspective of the WEIF weighting. The Sphere function is the easiest of our test set. As Figure 6.3 illustrates, for a fairly wide range of weightings and initial DoE sizes the problem is solved after 11 evaluations, as indicated by the large black area in the lower right-hand corner of the plot. As expected, a fairly localized search (with weightings ranging from 0.7 to 1) gives the best results, whereas the light-gray left-hand side of the plot indicates that a search with a broad scope is not recommended here.

A different landscape, the considerably multimodal Modified Rosenbrock function, yields a dramatically different plot (see Figure 6.4). On this occasion the black area emerging after 35 evaluations of the objective function is centered around a weighting of 0.3 for small initial sample sizes and leans slightly towards w = 0.4 as we move up into the zone of 15...20 point initial DoEs.


The dark region indicating good performance is even further to the left on the plot showing the objective values after 50 evaluations of the 5-variable version of the highly multimodal Ackley function (Figure 6.5). Clearly, the emphasis needs to shift towards exploration when the number of potentially misleading local optima is as high as in this case (see Figure B.4 in Appendix B). The contrast between the centre of the plot and the dark area on the left (weightings ranging from 0.2 to 0.4) shows that using the normal expected improvement criterion would lead to relatively poor performance here.

Further increasing the number of local optima can push the optimum weighting all the way down to zero. Evidence of this can be found in Figure 6.6, a snapshot of objective values after 60 evaluations of the same function (Ackley), this time in 10 dimensions. We note here that this last plot is only averaged over 10 runs, due to its high computational expense. The optimizer was run for 8 different initial DoE sizes (10...45), in each case for 11 different weightings (0...1, in increments of 0.1). With an initial DoE size of 10 the RBF model needs to be retrained and rebuilt 50 times (to reach the total objective function evaluation count of 60), starting from 15 points requires 45 constructions of the model, etc. This amounts to 50 + 45 + ... + 15 = 260 runs of the training procedure for each WEIF weighting factor, that is 260 × 11 = 2,860 across the entire range. Averaging over 10 runs thus gives a total figure of 28,600 models that need to be trained. With one such procedure taking, on average, around 60 seconds on a PIII processor for a 10-variable problem, the required CPU time works out as 429 hours for this plot alone.

As we mentioned in the introduction, the other aspect of the planning of the algorithm that we have looked at is the optimum size of the initial Latin Hypercube sample. A common conclusion that can be gleaned from all of the plots we have examined so far is that the algorithm becomes very inefficient if the size of the initial sample exceeds about 60% of the total computational budget (the deterioration is slightly less significant on the Sphere model, which is so simple that as long as the search is localised, i.e., w > 0.5, the results will be relatively good regardless of the initial DoE size). This confirms our intuition, as outlined earlier: if the size of the initial sample is too large, we are likely to waste points by placing them simply in a space-filling manner, instead of using information gained from the objective values of previous points (via some approximation model-based criterion).

The opposite problem only becomes evident when looking at either of the two Ackley plots (Figures 6.5 and 6.6).


A very small initial DoE sample (10...15 points) often renders any approximation-based criterion almost entirely meaningless – as the contrast between this region and the much darker one above it (20...30 initial points) indicates, more can be gained by at least ensuring that these points fill the space uniformly, without using the objective values of the other points to decide on their location. This phenomenon, however, only manifests itself on landscapes of high complexity. In the majority of the cases studied here, although the small DoEs still do not contain sufficient information to allow the construction of an accurate model, the prediction based on them can offer some guidance on the choice of points, at least as valuable as choosing the points with a space-filling criterion.

In summary it can be said that, based on this initial set of artificial test functions, a safe choice for the initial sample size is around 35% of the available computational budget (one obvious exception to this rule is where such a choice would lead to a number of calculations that did not efficiently fill the available computing facilities – in such cases using more points would probably be a more sensible way to start the optimization process).

Looking at the weighting factor in conjunction with the initial sample size, we note that the decisions on their values can, in general, be made independently. With few exceptions the boundaries of the dark islands on the plots are roughly horizontal or vertical; consequently, for practical purposes we do not need to worry about correlations between the two factors that influence the runs.

The choice of which weighting to use for the selection of the infill sample ultimately comes down to the judgement and experience of the analyst. Relatively few real-life problems generate landscapes as highly multimodal as the Ackley function. Nevertheless, if this appears to be the case (based on previous experience with similar problems), one is well advised to keep the scope of the search reasonably global (w = 0...0.3). For problems where one can be reasonably confident of the accuracy of the initial prediction, values between 0.2 and 0.5 are recommended, depending on the modality of the function. Finally, w = 0.5 should only be exceeded when one is confident that the landscape is of low modality. Many real-life engineering problems exhibit simple, unimodal behaviour – in these cases running the WEIF-based optimizer with a weighting of around 0.9 can be expected to give good results. In the next section we discuss an application of this nature, followed in Section 6.2.3 by a case with much higher modality.

6.2.2 A "Real-life" Unimodal Problem: Geometric Optimization of a Spoked Structure

In this case study we consider again the optimization of the tail-bearing housing model (a more detailed description of this spoked structure can be found in Appendix C). Six design parameters define the geometry – these are the variables of our test case. The goal is to minimize the maximum von Mises stress, subject to a weight constraint. Calculating the stress involves solving the finite element problem at a cost of about 100 seconds of CPU time; the weight is simply the result of evaluating an exact linear model (at negligible cost).

This is a case of constrained optimization. When dealing with constraints we take ymin (see equation (6.2)) to be the minimum of the feasible sampled responses. We further impose a large penalty on the expected improvement in infeasible areas so that only feasible updates are performed [1]. We do not, however, bias the DoE in any way.

To check our conjecture that this problem is likely to generate a simple, unimodal objective we have computed a two-variable slice through the six-dimensional landscape – this is shown in Figure 6.7. As expected, the contour plot shows a single minimum on the constraint boundary. Of course, in general we do not have the luxury of generating such plots (if we had, we would not need an optimizer) – here we need this insight to underpin our subsequent conclusions with respect to the choice of the WEIF weighting.

Figure 6.8 shows the optimizer performance map – a snapshot of average best objective function (stress) values after 40 evaluations of the stress and weight functions. The dark region is on the right-hand edge of the plot, indicating the need for a very localized search (w > 0.8). With regard to our previous conclusion about the choice of the initial sample size, we note that 35% of the total budget (14 points) is again inside the high optimizer efficiency region – thus it is a safe choice on this problem as well.

[1] We note here that finding an optimum in the close proximity of the boundary may sometimes be undesirable, as robustness issues may arise (i.e., manufacturing errors can push the design into the infeasible region). The expected improvement bias may be modified to take this into account if necessary.
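A minimal sketch of the constraint handling just described is given below. The interfaces are assumptions made purely for illustration: weif_value stands for the weighted expected improvement of the objective and feasible for the cheap (here: linear, weight-based) constraint check; neither name comes from the thesis.

import numpy as np

def constrained_infill_value(x, weif_value, feasible, X, y, penalty=1.0e6):
    # weif_value(x, y_min) -> weighted expected improvement at x, given the best
    #                         feasible objective value observed so far
    # feasible(x)          -> True/False from the inexpensive constraint model
    # ymin is taken over the feasible sampled designs only; a large penalty
    # suppresses infeasible candidates so that only feasible updates are selected.
    mask = np.array([feasible(xi) for xi in X])
    if not mask.any():
        return -penalty                     # no feasible point sampled yet
    value = weif_value(x, y[mask].min())
    return value - penalty if not feasible(x) else value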


Figure 6.7: Two-dimensional section through the objective function of the structural testcase. The feasible region is delimited by the linear mass constraint.


Figure 6.8: Log-scale density map of objective function values reached by the optimizer after 40 evaluations of the stress function (spoked structure).

6.2.3 A Case of Higher Modality: Vibration Optimization of a Two-Dimensional Structure

There are certain situations in engineering design practice when the manufacturing constraints on a design are so stringent that only one sensible parameterisation can be conceived. For example, when optimizing the twist of a wing with given planform and aerofoil cross sections, the designer is sometimes limited by manufacturing constraints to a linear twist variation along the wing, and therefore to a single-variable parameterisation. Alternatively, one is sometimes faced with the task of choosing between several levels of parameterisation of the same design. A small number of parameters means a lower dimensionality in the design space (and thus, usually, an easier optimization problem), but having more parameters may enable us to find better designs (at a higher computational expense, of course). It is beyond the scope of this work to investigate this often convoluted and highly problem-specific tradeoff with the purpose of choosing the optimum level of parameterisation. Instead, the following investigation is concerned with the choice of the expected improvement weighting and initial sample size in this context. In other words, we are attempting to answer the question: how does the choice of the level of parameterisation affect that of the expected improvement weighting and initial sample size?

The design case we have chosen for looking at this issue is the optimization of the frequency response of a two-dimensional truss. The objective is the minimization of the frequency averaged response of the structure in a given range. The lowest level of parameterisation that we have considered is when the locations of two of the joints are allowed to change during the optimization process (thus creating a four-dimensional design problem). The highest dimensionality search space (12 variables) was obtained by allowing 6 of the joints to move. A detailed description of this application can be found in Appendix D. As in the previous sections, to make the correlation between the optimum expected improvement weightings and the complexity of the landscapes clearer, we have produced a number of two-variable slices through the design space. Figure 6.9 shows the contour plot of such a slice – several more can be found in Appendix E, Figures E.1 through E.8.

As mentioned earlier, at the bottom of the parameterisation level range we have a four-dimensional problem – this results from allowing the locations of the two midspan joints to vary. Figure 6.10 shows the density map corresponding to this case – a snapshot of objective values after 30 evaluations.



Figure 6.9: Two-variable slice through the objective function: x coordinate of joint 12 vs. y coordinate of joint 12.

It can be seen from here that reasonable results can be obtained when starting from an initial space-filling set of up to 10 points (practically regardless of the weighting), but, as the logarithmic density variation highlights, the best ultimate objectives can be found for a weighting of around 0.7, when starting from a rather small initial sample. Similar weightings lead to the best results in the six-dimensional case (Figure 6.11) and in the eight-variable case (Figure 6.12) as well, after 40 and 60 evaluations respectively. The density maps also show that if the weightings are chosen reasonably well, good results can be obtained when starting with an initial sample of less than 35% of the total budget. The last two plots, presenting the results for the ten- and 12-variable cases (Figures 6.13 and 6.14), are only averaged over 5 and 4 runs respectively (due to the very high computational expense of generating them), therefore precise conclusions can no longer be drawn from them. Nevertheless, one can make out general trends that are similar to those seen on the previous plots: a small number of initial points and a high expected improvement weighting are the best choices.

We began this chapter by saying that here we are considering sequential updates to the model and the infill selection criterion (WEIF), as opposed to the experiments described in the previous chapter, where we looked at parallel implementations. This means that the model and the WEIF landscape are updated after the addition of every new point. Nevertheless, as we have seen in Chapter 5, the expected improvement updates can work well on a parallel architecture, where we can only update the model once every Np new points. This is likely to be the case with WEIF as well, an additional possibility being the choice of the Np points at each iteration using a different weighting for each (this may be useful when we are uncertain about the complexity of the surface and thus of the correct value of the weighting w).
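A minimal sketch of that last idea follows. It is illustrative only: acquisition(x, w) stands for the weighted expected improvement built on the current model (a hypothetical callable, not a routine from the thesis), and the Np weightings are simply spread evenly over [0, 1].

import numpy as np
from scipy.optimize import minimize

def multi_weight_infill(acquisition, lb, ub, n_points, n_restarts=50):
    # Select n_points infill designs in one batch, each by maximizing the weighted
    # expected improvement with a different weighting; the whole batch can then be
    # evaluated in parallel before the surrogate is rebuilt. lb and ub are arrays
    # of lower/upper variable bounds.
    k = len(lb)
    infill = []
    for w in np.linspace(0.0, 1.0, n_points):
        best_x, best_val = None, np.inf
        for _ in range(n_restarts):          # multi-start local search, as before
            x0 = lb + (ub - lb) * np.random.rand(k)
            res = minimize(lambda x: -acquisition(x, w), x0,
                           method="L-BFGS-B", bounds=list(zip(lb, ub)))
            if res.fun < best_val:
                best_x, best_val = res.x, res.fun
        infill.append(best_x)
    return np.array(infill)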


Figure 6.10: Log-scale density map of objective function values reached by the optimizer after 30 evaluations of the four-variable satellite-beam objective function, using various WEIF weightings (horizontal axis) and initial samples of various sizes (vertical axis). The darker areas correspond to better objective values.


Figure 6.11: Log-scale density map of objective function values reached by the optimizer after 40 evaluations of the six-variable satellite-beam objective function, using various WEIF weightings (horizontal axis) and initial samples of various sizes (vertical axis). The darker areas correspond to better objective values.




Figure 6.12: Log-scale density map of objective function values reached by the optimizer after 60 evaluations of the eight-variable satellite-beam objective function, using various WEIF weightings (horizontal axis) and initial samples of various sizes (vertical axis). The darker areas correspond to better objective values.


Figure 6.13: Log-scale density map of objective function values reached by the optimizer after 65 evaluations of the ten-variable satellite-beam objective function, using various WEIF weightings (horizontal axis) and initial samples of various sizes (vertical axis). The darker areas correspond to better objective values.


Figure 6.14: Log-scale density map of objective function values reached by the optimizer after 100 evaluations of the 12-variable satellite-beam objective function, using various WEIF weightings (horizontal axis) and initial samples of various sizes (vertical axis). The darker areas correspond to better objective values.


Chapter 7

Selecting the Right Optimizer

The global optimizers discussed in this thesis fall into two main categories. We first dealt with techniques that are guided solely by discrete points on the objective landscape. These points are obtained by running the relevant, usually high-cost, simulation code. We shall refer to these as direct optimizers [1]. The exploration/exploitation hybrid introduced in Chapter 3 is a typical direct optimizer and so are its components, the Genetic Algorithm and the local search. The rest of the techniques discussed in the preceding chapters belong to the large family of surrogate-based methods. In addition to objective function values obtained by running the simulation directly, these methods also use a cheap surrogate of that expensive model (built using observational data obtained from the expensive runs) to generate the search trajectory. In fact, this simple taxonomy follows the general roadmap we set out in Chapter 1 for the work presented here. To cover the final part of that roadmap we also need to mention that the convergence properties of both of these families of techniques can sometimes be enhanced by the use of objective function gradients (as demonstrated in Chapters 3 and 5).

The design optimization practitioner's dilemma is, of course, how to choose the right optimizer for the application in hand. Also, as we have seen throughout this thesis, most of these techniques come in several flavours, which means that after selecting the algorithm itself, further choices need to be made and further tradeoffs need to be considered. The next section endeavours to provide some guidelines that may be useful in this decision process, focusing on the techniques put forward here. We then conclude this chapter by taking a fresh look at these techniques through a testcase that we have already encountered in Chapter 6: the vibrational energy level minimization of a truss.

[1] We note here that the adjective "direct" is sometimes also used to refer to optimizers that use only objective function values, as opposed to those also guided by objective function gradients.

7.1 Key Factors in Optimizer Selection

7.1.1 The Computational Cost of the Objective

Whether a computer simulation is considered expensive or not depends on more than just CPU time or wall-clock time. For example, an evaluation taking 10^-2 seconds may be considered expensive in a real-time system, such as a computerised auto-pilot. On the other hand, a CFD simulation taking 10 hours may be considered inexpensive if the optimization process is of major importance and the timescale of the project is of the order of years. It is perhaps more edifying to consider the number of evaluations that fit into the total available optimization time. In these terms, the choice of the threshold between cheap and expensive is still fairly arbitrary, but the figures are more meaningful. The techniques presented in this thesis have been conceived with expensive simulations in mind, where we use the term to refer to computational budgets of less than about 150 evaluations (or, perhaps more pertinently, 20-30 evaluations per variable).

Although both the GA-based hybrid framework and the surrogate-based techniques are suitable for such high-cost cases, the computational overheads of the algorithms themselves (i.e., ignoring the cost of the objective function evaluations) are very different. While the time required to manipulate the GA population is negligible in most cases, spatial prediction models that use combinations of basis functions to approximate the objective become awkward as the number of training points N increases [Koehler and Owen, 1996]. As An and Owen [2001] point out, the cost of building the model is O(N^3), while making a prediction using the model takes O(N) time. Thus, the time required to do the algebra within the optimizer also needs to be considered in some cases when selecting the optimizer for a particular application, especially if the cost of the objective function is comparatively low.

7.1.2 The Dimensionality of the Objective Landscape

In surrogate-based optimization, as the number of dimensions of the objective function increases, so does the number of training points required to maintain the accuracy of the approximation model. In fact, the number of unit k-cubes required to fill out a k-cube of a given size grows exponentially with k – a phenomenon sometimes referred to as the curse of dimensionality. This takes us back to the discussion in Section 7.1.1 – if the number of points is large, there is the risk of the cost of the ancillary algebra becoming comparable to the cost of the objective. Again, in such cases one may consider using the GA/local search hybrid, even though in terms of evaluation count the surrogate-based technique may work better.

If one is considering applying, say, the weighted expected improvement-based sequential update optimization strategy proposed in Chapter 6, the cost of searching the weighted expected improvement landscape also needs to be considered. Here dimensionality has a more direct impact on the computational overheads of the search algorithm than if one only considers the construction of the approximation models. The reason is again the curse of dimensionality, this time referring to the exponentially increasing difficulty of reliably maximizing the WEIF as its dimensionality increases. This effect is emphasized by the commonly complex, multimodal nature of the surface. If, for example, we use a multi-start hillclimber to search the expected improvement surface (as in Chapters 5 and 6), this means that the number of necessary restarts increases rapidly with the increase in dimensionality (although, in the author's experience, at a lower than exponential rate – the most likely reason for this is that locating the optimum of a basin of attraction is a cheaper operation than a thorough mapping of the region).

We note here that since the purpose of the studies presented in this thesis involving the WEIF-based optimizer was the investigation of sometimes subtle variations in the behaviour of the algorithm when run with different settings, the reliability of the search (i.e., that we did not miss any good weighted expected improvement optima) was very important and the relative computational cost less so. Therefore we used a higher number of BFGS restarts than one would normally use in a design situation. As to what is a reasonable number of restarts to use in the latter case, no general rules can be established, as this is highly dependent on the application in hand, on its modality, on the efficiency of the search technique, on the nature of the implementation, etc.

7.1.3 Modality

Under certain (mild) assumptions it can be proved that Gaussian Radial Basis Function models are universal approximators, i.e., any continuous function can be approximated by them to an arbitrary accuracy. In fact, Chen and Chen [1995] have shown that the necessary and sufficient condition for a function of one variable to qualify as a universally approximating RBF kernel is that the function is not an even polynomial. Therefore, functions of arbitrary modality can, in theory, be optimized using the surrogate techniques discussed in this thesis. However, we are mostly concerned here with the practical aspects of approximation and from this point of view caution is recommended if the objective landscape is very highly multimodal. In such cases it is likely that the number of training points required to build a reasonable model of the function is so high that one is well advised to choose the class of Genetic Algorithm-based hybrid methods instead [2].

[2] In certain highly multimodal cases a regression model could also indicate the likely location of the global optimum.

7.1.4 Noise

The surrogate-based techniques put forward in this work are based on interpolating models. Therefore, they are usually not suitable for the optimization of noisy landscapes. The difficulties go far beyond the requirement of fitting regression models to the data instead of the interpolating RBF. Serious conceptual hurdles need to be tackled before an expected improvement-type measure can be developed for various types of noise, inaccuracies in the physics-based simulation resulting in surface offsets, etc. Encouraging advances in a closely related area have been made by Williams et al. [2000], who calculate the posterior improvement distribution at untested sites for design cases where some of the design variables are known exactly, while others are only known via their distribution (i.e., the noise is in the variables).



Genetic Algorithms are usually much less affected by the presence of noise. As far as the GLOSSY framework is concerned, it can be used for noisy problems to combine a GA with a local hillclimber that is also comparatively efficient on such landscapes. A typical technique, which appears to work well in the author's experience, is Evolutionary Operation (EVOP), a very robust algorithm suggested by Box [1957], which can be applied, as Bäck [1996] recommends, in an Evolution Strategy-type framework. This can be particularly useful if the GA is bit-coded and thus continuous-discrete conversion issues are avoided when the improved individual is fed back into the population at the end of the Lamarckian learning sequence.

7.1.5 Prior Experience

Most of the techniques discussed in this thesis are reasonably robust with respect to the choice of their parameter settings. Nevertheless, as illustrated in our earlier study of expected improvement weightings and initial sample sizes, being "in the right ballpark" can make a considerable difference to the ultimate objective value found by the optimizer. As we discussed in Chapter 6, it is often possible to judge what the right settings may be by considering various features of the problem (e.g., the likely degree of multimodality). However, if the problem at hand is known from previous experimentation to be efficiently solvable with a certain setting of one of the optimizers, the advantage offered by this knowledge is likely to mean that that method should be favoured.

7.1.6 Availability of Parallel Machines

Both of the main approaches discussed in this work can be run efficiently on parallel clusters (this issue, in the context of surrogate-based methods, has been analysed in detail in Chapter 5). As discussed earlier, in the case of surrogate methods the computational overheads can sometimes be quite significant, in some extreme high-dimensional cases even comparable to the cost of the objective function computations. These can also be reduced by parallelisation, for example by farming out various subregions of the domain to different processors when performing the leave-one-out cross-validation. Partitioned inverse formulae (see, e.g., Theil [1971]) can also be used to speed up operations with large correlation matrices by parallelising the computation of the basis function weights.
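The following identity is standard linear algebra rather than a result from this thesis, but it illustrates the kind of partitioned update referred to above. When a new training point adds one row and column to a symmetric correlation matrix with unit diagonal, the inverse of the bordered matrix can be assembled from the existing inverse using only matrix-vector products, which parallelise readily:

\[
\begin{bmatrix} \Psi & \psi \\ \psi^{T} & 1 \end{bmatrix}^{-1}
=
\begin{bmatrix}
\Psi^{-1} + \frac{1}{s}\,\Psi^{-1}\psi\psi^{T}\Psi^{-1} & -\frac{1}{s}\,\Psi^{-1}\psi \\
-\frac{1}{s}\,\psi^{T}\Psi^{-1} & \frac{1}{s}
\end{bmatrix},
\qquad s = 1 - \psi^{T}\Psi^{-1}\psi,
\]

where \Psi is the existing N x N correlation matrix and \psi is the vector of correlations between the new point and the N existing ones (the symbols here are ours and do not necessarily match the notation of Chapter 4).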

7.1.7 Availability, Cost and Accuracy of Gradients

A recurring theme throughout this thesis has been the possibility of improving the efficiency of the global search process by the use of gradients. As we have seen in the case of both the GLOSSY hybrid technique and the expected improvement search method on the gradient-enhanced RBF model, the critical issue is the computational cost of the gradient. This, of course, ranges from zero, i.e., sensitivities obtained at no extra cost after the computation of the objective, to k times the cost of the objective in the case of sensitivities obtained by finite differencing (where k is the number of design variables). The computation of gradients obtained by the adjoint method typically adds about 100% extra cost to that of the objective (although this can vary depending on the application – for example it was around 150% in the testcase discussed in Chapter 3, but it can be as low as 50%, as in the climate model reported by Hall et al. [1982]). In Chapter 2 we briefly mentioned another computational sensitivity analysis technique: Automatic Differentiation (see, e.g., Oblow et al. [1986], Griewank [1989], Bischof et al. [1992]). This method is based on a post-processing of the source code of the physics-based analysis module, during which additional operations are inserted that, in parallel with the original calculations, compute the sensitivities as well. It is not uncommon for the extra cost of the gradients here to be as high as 10 times that of the objective function [Morris et al., 1993], although the technique is in its relatively early days and this will probably decrease. Nevertheless, at the moment AD is only viable for problems of very high dimensionality (the cost of sensitivities, as in the case of the adjoint method, usually does not depend on the number of variables).

As far as the techniques that use gradient-enhanced hillclimbers (such as the hybrid discussed here) are concerned, it is hard to establish the relative computational cost threshold (with respect to the cost of the objective) above which objective function sensitivities are not worth using. Indeed, to take the aerodynamic optimization literature as an example, it is littered with papers reporting local hillclimbing work based on finite difference sensitivities. Starting with Hicks and Henne [1978], arguably the pioneers of computer simulation-based aerodynamic design optimization, and continuing in recent years with the likes of Vicini and Quagliarella [1999], many researchers have resorted to this crudest and most expensive of ways of obtaining the gradients required by the optimizers – apparently with better results than by using simple, direct (non-gradient-based) techniques.

In the case of surrogate model-based techniques the gradient cost tradeoff appears to be more clear-cut. First, it does not make any sense to include finite difference gradients in the model – the required evaluations are likely to be put to much better use if spread out in a space-filling manner than if the evaluated points are in small, tight clusters (as required by finite differencing). We note in passing that the inverse approach, advocated by Liu and Batill [2002], may in some circumstances make sense: if some or all components of the gradient can be obtained for a particular design by some method other than finite differencing, the model can be augmented by training points artificially created in the neighbourhood of that design, using the first-order term of the relevant Taylor expansion (a sketch of this idea is given at the end of this section). Our experiments (Chapter 5) have indicated that obtaining the gradients at an additional computational cost of around 100% may already be a borderline case. As our, admittedly limited, battery of testcases used in Chapter 5 demonstrates, using objective sensitivities at this cost clearly makes sense from a global model accuracy perspective. However, their usefulness in an expected improvement-based optimization procedure is already questionable in such circumstances.

An important factor here, again, is the dimensionality of the design space. The higher the number of design variables, the more valuable the gradients are, if obtained at a cost similar to that of the objective. While this makes search methods such as GLOSSY attractive propositions for highly parameterized design optimization problems, the problem of the cost of ancillary manipulations becomes considerably more dramatic for surrogate-based techniques – much more so than in the non-gradient-enhanced case. The reason is, of course, that the matrices that need to be repeatedly inverted are now of size N(k+1) x N(k+1) (see Section 4.3), as opposed to just N x N in the standard case.


In the case of standard RBF models our reasoning on the effect of dimensionality on the feasibility of the method was based on high-dimensional problems usually requiring more training points (hence larger N) – in the gradient-enhanced case the effect is more direct (the size of the matrix is explicitly a function of k).
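The sketch below illustrates the Taylor-expansion augmentation idea mentioned above (after Liu and Batill). It is an illustration under our own assumptions: the step size and the axis-aligned perturbation directions are arbitrary choices, not prescriptions from the cited work.

import numpy as np

def taylor_augment(x0, y0, grad, step=1.0e-2):
    # Create 2k artificial training points around the design x0, one +/- step along
    # each axis, with objective values estimated from the first-order Taylor term
    #   y(x0 + dx) ~ y0 + grad . dx
    # The resulting (x, y) pairs can be appended to the RBF training data.
    k = len(x0)
    X_new, y_new = [], []
    for i in range(k):
        for sign in (1.0, -1.0):
            dx = np.zeros(k)
            dx[i] = sign * step
            X_new.append(np.asarray(x0, dtype=float) + dx)
            y_new.append(y0 + np.dot(grad, dx))
    return np.array(X_new), np.array(y_new)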

7.2 The Satellite Beam Problem Revisited

In order to gain a more practical insight into some of the factors discussed above and how they affect optimizer performance, we now return to a "real-life" example discussed earlier: the optimization of a two-dimensional truss. From amongst the variants of this problem described in Appendix D we have picked two, one from each end of the complexity scale. First we discuss the behaviour of the techniques dealt with in this thesis on the four-dimensional test problem, i.e., on the shape optimization of the truss whose two midspan joints may be moved. We then look at the more difficult 12-variable case, where we allow six of the joints of the satellite structure to change their locations during the optimization process.

In the comparative studies described below the results of every experiment are averages over 25 runs, except where otherwise stated. The individual runs have been started from randomly generated initial populations (GA, GLOSSY), random starting (and re-starting) points (BFGS) and random Latin Hypercubes (surrogate-based techniques) respectively. Neither the GLOSSY hybrid nor the GA have been tuned in any way – the same settings have been used as in the experiments described in Chapter 3. Based on the results obtained in the previous chapters, the sizes of the initial Latin Hypercubes have been set to five in all cases.

Figure 7.1 shows the optimization histories of various techniques on the four-variable testcase. As expected on such a moderately multimodal, low-dimensional landscape, the approximation-based technique (WEIF search) performs very well. Three such histories are shown: the simple expected improvement update search (i.e., w = 0.5), the WEIF search with w = 0.7 (as the reader may recall, we found this to be the optimum weighting for this testcase in Section 6.2.3) and the expected improvement update search of the gradient-enhanced RBF model (assuming free gradients). It is clear that the properly tuned WEIF search has the edge over the simple expected improvement-based algorithm. As far as the gradient-enhancement of the procedure is concerned, our conclusions from Chapter 5 are reinforced here: if the sensitivity information is considered to be free, convergence time savings can be made, but clearly, if the gradients can only be computed at adjoint cost, the performance of the method actually deteriorates (we have not included this history in the comparative graph). As far as the other techniques are concerned, as in the aerodynamic design example discussed in Chapter 3, GLOSSY performs better than its components (the multi-start BFGS and the GA).


Figure 7.1: Optimization histories on the four-dimensional satellite beam testcase (series shown: WEIF search with w = 0.7 and w = 0.5, gradient-enhanced WEIF search with w = 0.5, GLOSSY and BFGS with free and adjoint-cost gradients, and the GA).


BFGS, GLOSSY and the GA come into their own on the 12-dimensional testcase (Figure 7.2). Clearly, the surrogate-based WEIF optimizer finds building a reliable model fairly difficult, and the accuracy of the weighted expected improvement landscape associated with the surrogate is also fairly poor, as the slow convergence illustrates. The other significant pitfall of high-dimensional surrogate-based optimization also appears in this case: the algebraic manipulations required by the optimizer itself begin to make the search infeasible for all but the most expensive objective functions. For this reason, the surrogate-based results are only averaged over three runs here and the gradient-enhanced version has not been included. Again, the optimal expected improvement weighting (w = 0.8) found in the previous chapter leads to a better ultimate objective value than the default (w = 0.5). The other noteworthy feature of this comparison is that BFGS on this occasion performs somewhat better than the hybrid technique.


Figure 7.2: Optimization histories on the 12-dimensional satellite beam testcase (series shown: WEIF search with w = 0.8 and w = 0.5, GLOSSY and BFGS with free and adjoint-cost gradients, and the GA).


Chapter 8

Conclusions

Global design optimization is one of the most active areas of research in computational engineering, as demonstrated by the sheer number and popularity of journals dealing with the subject. Amongst a plethora of titles one can find well-established publications (such as Kluwer's Journal of Global Optimization or Taylor & Francis' Engineering Optimization) as well as journals with volume numbers still in the single digits (Optimization and Engineering, Journal of Evolutionary Optimization, etc.). High-impact, general interest engineering publications (such as the family of AIAA journals or the IEEE's Transactions) also frequently deal with work related to global design optimization. The studies carried by these journals deal with research into the efficiency, robustness, theoretical underpinning and practical implementation of global search techniques, conducted on many fronts.

From this range of ideas and approaches we have focused our attention here on two main lines of attack, which seemed promising in the light of our own experience: combining traditional global techniques with local hillclimbers into reliable hybrids and using global approximators to aid the optimization process. We have intertwined these in certain parts with a third idea: the use of objective function sensitivities in global optimization. In the following we briefly review the milestones of the work presented here, looking back at the achievements and the shortcomings of our research efforts. In the second part of this chapter we attempt to look into the future – we speculate on what may be the paths of future development in the field of global optimization in general and in the areas discussed in the present thesis in particular.

8.1 What Has Been Accomplished

We began our study of possible enhancements to global search techniques by looking at exploration/exploitation hybrid heuristics. One of the crucial aspects of such algorithms is the scheme whereby they allocate computational resources to the two components. Therefore, before designing a hybrid of our own, we constructed a taxonomy of existing techniques from this perspective. This taxonomy of search time division mechanisms is laid out in Chapter 3 and followed by the description of our proposed method: a generic multi-population algorithm template (GLOSSY). As an application we chose an area where significant progress has been achieved in recent years on low-cost sensitivity analysis: aerodynamic shape optimization. The use of low-cost objective function sensitivities has so far been restricted to local hillclimbing – here we put forward GLOSSY as a possible vehicle for exploiting them in a global search context as well. Our testcase, the shape optimization of an engine inlet, showed that the approach is viable and more reliable than its main competitors.

The following chapters dealt with the other main theme of the work: global approximations and their uses in optimization. In Chapter 4 we outlined the main concepts relating to the construction of Radial Basis Function models, their statistical background and an extension of their theory, which is particularly important from the optimization standpoint: the expected improvement (EI) measure. This forms the basis of the algorithm proposed in Chapter 5. There we investigated the feasibility of running an expected improvement-based optimizer on a parallel computational architecture. We concluded that in the majority of the cases examined here the speedup was close to linear, i.e., doing the model updates in parallel, at least for the cluster sizes examined here, is worthwhile. We have also looked at the issue of using gradient-enhanced models in optimization (all optimization work based on global gradient-enhanced models seen, to date, by the author was based on simply searching the predictor). Our experiments suggest that if the sensitivities can be obtained at little extra cost (i.e., less than the cost of obtaining the objective values) they are worth using – otherwise, although the improvement in overall model accuracy is worthwhile, the increase in optimization performance does not warrant the computation of the gradients.

One of the features that can turn a mathematically viable algorithm into a useful industrial tool is the ease of setting its runtime parameters. In an ideal world the engineer would not be required to select any non-trivial parameters of the optimizer – this is, to some extent, true as far as the approximation-based techniques discussed here are concerned. Nevertheless, if one is dealing with high-cost objectives, some degree of tuning may be required to increase the efficiency of the algorithm. The correct choice of the parameters was the subject of Chapter 6, where we also introduced a weighting scheme to control the global/local bias of the expected improvement criterion. We have shown that in most cases the efficiency of the weighted expected improvement-based search is highest if the initial space-filling experimental design is small and the greatest part of the available computational budget is spent on evaluating points selected by the WEIF criterion. As far as the expected improvement weighting factor w is concerned, we have shown that there is a relationship between the modality of the landscape and the optimum value of w, i.e., the WEIF criterion is an effective means of controlling the scope of the search.

To summarise, the main contributions of this thesis are:

• a taxonomy of hybrid search time division methods,

• GLOSSY, a generic Lamarckian learning framework for combining a global optimizer with a local exploitation algorithm (application: first use of an adjoint flow solver in global optimization),

• parallel optimization based on simple and gradient-enhanced RBF models using the expected improvement criterion,

• an expected improvement weighting scheme designed to control the global/local bias of the search.

8.2 ...And What Remains to be Discovered

We have borrowed the title of this section, perhaps a little over-confidently, from Sir John Maddox's seminal book [Maddox, 1999] about his take on the scientific agenda for the 21st century. Our aim here is far less ambitious than that of Nature's distinguished editors: we merely attempt to chart a number of possible paths along which global design optimization may progress in the near future. In particular, we discuss those trends that have something in common with, or may draw on, some of the topics explored in this thesis.

A common weakness of all of the methods discussed here, and indeed of most of the industrial global optimization tools of the moment, is that they rely purely on observational data. In most cases the process of optimizing the objective f is guided solely by (x, f(x)) data pairs, as if no knowledge were available on the underlying mathematics of the computation of f. In other words, there is no organic coupling between the global search engine and the mathematical mechanism that simulates the physical phenomena and which determines the shape of the landscape being optimized. The majority of today's off-the-shelf analysis packages (FEA codes, flow solvers, etc.) isolate the governing equations and the process whereby their field variables are computed in a black box; that is, all the optimizer "sees" is an automaton that turns the input vector of design variables x into the figure of merit f(x) that indicates how "good" the design is.

In order to highlight some of the potential weaknesses of this approach, it is perhaps useful to view shape optimization problems, prevalent in modern engineering design practice, in the following mathematical formalism, often encountered in the calculus of variations literature. A parametric curve or surface (i.e., a design) is sought, which optimizes a given functional (f, the figure of merit, or objective), subject to a set of constraints represented by the governing equations of the physical phenomena involved. From the design optimization point of view the essential aspect here is that at least some of the boundary conditions of the governing equations are defined on the curve or surface whose shape is being optimized. Therefore, the basis for solving the optimization problem can be formulated as the search for a generic solution in terms of the boundary parameters (the design variables x). Such a solution, if obtained in a cheap-to-evaluate form, would allow almost instantaneous optimization of the shape. From this standpoint the classic practice of generating a sequence of different boundaries and computing separate, snapshot-like numerical solutions to the governing equations for each may appear wasteful in many cases. It is, of course, a very ambitious goal to obtain a parametric solution to the governing equations, particularly when the boundary is complex and the number of design variables is high.


To date only relatively simple problems have been tackled using this approach [Nair, 2002]. Given the daunting nature of the task, the current established practice of optimization based on the black-box principle is unlikely to be unseated in the near future. Nevertheless, work is underway in this direction on several lines of attack and, perhaps slightly paradoxically, one of the most promising contenders draws heavily on a concept traditionally employed in an observational data-based optimization setting (in this thesis, for example): the remarkably fertile idea of radial basis functions. Much of the discussion concerning RBF models in Chapter 4 remains valid if, instead of building a model of the objective function over the design space, we use the RBF approach to approximate the variation of the field variables over the domain on which the governing equations are defined. In the technique pioneered by Kansa [1990], a set of RBF kernels, uniformly distributed over the domain and its boundary, are used to support the model, which is trained by optimizing a suitable measure of how well it satisfies the governing equations and their boundary conditions – typically a collocation-type measure or a Galerkin integral.

While, as we pointed out earlier, the approach outlined above still has a long way to go towards becoming an industrially feasible design tool, compromise solutions may be considered. For example, instead of using our knowledge of the governing equations that we need to solve to compute the objective, it is often possible to take advantage of certain features of the numerical scheme employed to solve them. Forrester et al. [2003] monitored the convergence of a set of iterative solutions of the Euler equations (for a number of flap-track fairing geometries in inviscid flow, the designs being arranged in a two-dimensional DoE) and concluded that the shape of the response surface approximation of the objective (the lift-to-drag ratio in this case) stabilises long before final convergence of the iterative scheme. In other words, the actual objective values resulting from these partially converged solutions may be a long way from the final, fully converged ones, but if we solve several designs concurrently and fit approximation models to these intermediate values, we may be able to locate the optima, or at least isolate the most important basins of attraction, after having run the analysis code for only a fraction of the CPU time of a full solve. Therefore, another possible area of further global optimization research is to apply this principle to design processes based on high-fidelity, high-cost analyses, drawing on our discussion of parallel update schemes (Chapter 5) and expected improvement-type infill selection criteria (Chapter 6).
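To make the collocation idea concrete, the sketch below applies a Kansa-type Gaussian RBF collocation scheme to a one-dimensional Poisson problem with homogeneous boundary conditions. It is the author's illustration of the general technique, not code from the thesis or from Kansa [1990]; the shape parameter, the number of kernels and the test problem are arbitrary choices.

import numpy as np

# Solve u''(x) = f(x) on [0, 1] with u(0) = u(1) = 0 by Gaussian RBF collocation.
eps = 3.0                                    # shape parameter (arbitrary choice)
centres = np.linspace(0.0, 1.0, 15)          # kernel centres, also used as collocation points

def phi(x, c):
    return np.exp(-eps**2 * (x - c)**2)

def d2phi(x, c):
    # Second derivative of the Gaussian kernel with respect to x.
    return (4.0 * eps**4 * (x - c)**2 - 2.0 * eps**2) * phi(x, c)

f = lambda x: -np.pi**2 * np.sin(np.pi * x)  # exact solution: u(x) = sin(pi x)

interior, boundary = centres[1:-1], np.array([0.0, 1.0])

# Collocation: enforce the governing equation at the interior points and the boundary
# conditions at the two end points, then solve for the kernel weights.
A = np.vstack([
    np.array([[d2phi(x, c) for c in centres] for x in interior]),
    np.array([[phi(x, c) for c in centres] for x in boundary]),
])
b = np.concatenate([f(interior), np.zeros(2)])
lam = np.linalg.lstsq(A, b, rcond=None)[0]

u = lambda x: sum(l * phi(x, c) for l, c in zip(lam, centres))
print(abs(u(0.5) - 1.0))                     # error against the exact value sin(pi/2) = 1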

Appendix A

The Engine Inlet Testcase

The geometry of the cross section of the engine inlet discussed in Section 3.4 has been parameterised using Hicks-Henne bump functions (equation 3.4). Table A.1 shows the ranges in which the six parameters are allowed to vary. The subscript l denotes the variables defining the bump function added to the lower (inside) skin of the inlet cross section, while the subscript u refers to the variables of the upper (outside) bump. The ranges have been chosen to avoid interference between the outside and inside surfaces and to prevent the occurrence of very "unphysical" geometries. Figure A.1 shows a number of example geometries to illustrate the effect of the addition of bump functions with various parameters. Figure A.2 shows a contour plot of the objective function (square of peak surface velocity).

               Al       xpl     tl      Au       xpu      tu
Lower limit   -0.15     0.4     2       0        0.5      2
Upper limit    0.05     0.8     10      0.15     0.85     5

Table A.1: Ranges of the six design variables describing the engine inlet geometry used as a testcase in Section 3.4.
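The bump function itself is given as equation 3.4 of the thesis and is not reproduced in this appendix. For orientation, the sketch below assumes the commonly used Hicks-Henne form b(x) = A [sin(pi * x^(log 0.5 / log xp))]^t, in which A is the amplitude, xp the chordwise location of the bump peak and t controls its width; this assumed form should be checked against equation 3.4 before reuse.

import numpy as np

def hicks_henne_bump(x, A, xp, t):
    # Assumed standard Hicks-Henne bump (the thesis's exact form is its equation 3.4):
    # amplitude A, peak location xp in (0, 1), width exponent t.
    return A * np.sin(np.pi * x**(np.log(0.5) / np.log(xp)))**t

x = np.linspace(0.0, 1.0, 101)
lower = hicks_henne_bump(x, -0.15, 0.4, 2.0)   # a lower-skin bump at its range limits
upper = hicks_henne_bump(x, 0.15, 0.85, 5.0)   # an upper-skin bump at its range limits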



Figure A.1: Examples of geometries from the design space of the engine inlet test case.


Figure A.2: Contour plot of the objective function of the engine inlet testcase (square of peak surface velocity), plotted against xpl and tl.

Appendix B

Artificial Test Functions

This appendix contains details of the artificial test functions used to test some of the optimizers discussed in this thesis.

B.1 Rastrigin's Function

f_{RA}(\mathbf{x}) = 10k + \sum_{i=1}^{k} \left[ x_i^2 - 10\cos(2\pi x_i) \right], \qquad x_i \in [-5.12, 5.12].


Figure B.1: Two-dimensional slice through the k dimensional Rastrigin function.

B.2 Branin's Function

f_B(x_1, x_2) = a\left(x_2 - bx_1^2 + cx_1 - d\right)^2 + e(1 - f)\cos(x_1) + e,

where a = 1, b = \frac{5.1}{4\pi^2}, c = \frac{5}{\pi}, d = 6, e = 10, f = \frac{1}{8\pi} and x_1 \in [-5, 10], x_2 \in [0, 15].


Figure B.2: Contour plot of Branin’s function (normalised to [0,1]).

B.3 Modified Rosenbrock Function

f_{MR}(\mathbf{x}) = \frac{1}{a}\left[ \sum_{i=1}^{k-1} \left( 100\left(x_{i+1} - x_i^2\right)^2 + (1 - x_i)^2 \right) + \sum_{i=1}^{k} 75\sin\left(5(1 - x_i)\right) - b \right],

where a = 206, b = 300 and \mathbf{x} \in [-2.048, 2.048]^k.

1

0.8

x2

0.6

0.4

0.2

0 0

0.1

0.2

0.3

0.4

0.5 x1

0.6

0.7

0.8

0.9

1

Figure B.3: Two-dimensional slice through the k dimensional Modified Rosenbrock function (normalised to [0,1]).

B.4 Ackley's Path Function

f_A(\mathbf{x}) = \frac{1}{f}\left\{ -a\exp\left(-b\sqrt{\frac{1}{k}\sum_{i=1}^{k} x_i^2}\right) - \exp\left(\frac{1}{k}\sum_{i=1}^{k}\cos(cx_i)\right) + a + \exp(1) + d \right\},

where a = 20, b = 0.2, c = 2\pi, d = 5.7, f = 0.8 and \mathbf{x} \in [-2.048, 2.048]^k.
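For reference, the four test functions can be written down directly from the definitions above; the following sketch is the author's transcription of those formulas (using the parameter values listed in this appendix), not code from the thesis.

import numpy as np

def rastrigin(x):
    x = np.asarray(x, dtype=float)
    return 10.0 * len(x) + np.sum(x**2 - 10.0 * np.cos(2.0 * np.pi * x))

def branin(x1, x2):
    a, b, c, d, e, f = 1.0, 5.1 / (4.0 * np.pi**2), 5.0 / np.pi, 6.0, 10.0, 1.0 / (8.0 * np.pi)
    return a * (x2 - b * x1**2 + c * x1 - d)**2 + e * (1.0 - f) * np.cos(x1) + e

def modified_rosenbrock(x, a=206.0, b=300.0):
    x = np.asarray(x, dtype=float)
    rosen = np.sum(100.0 * (x[1:] - x[:-1]**2)**2 + (1.0 - x[:-1])**2)
    return (rosen + np.sum(75.0 * np.sin(5.0 * (1.0 - x))) - b) / a

def ackley(x, a=20.0, b=0.2, c=2.0 * np.pi, d=5.7, f=0.8):
    x = np.asarray(x, dtype=float)
    k = len(x)
    return (-a * np.exp(-b * np.sqrt(np.sum(x**2) / k))
            - np.exp(np.sum(np.cos(c * x)) / k) + a + np.e + d) / f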


Figure B.4: Two-dimensional slice through the k dimensional Ackley’s Path function (normalised to [0,1]).

Appendix C

The Tail Bearing Housing Model

This appendix contains details of the tail bearing housing test problem used in some of the experiments discussed in this thesis. The model, shown in Figure C.1, is made up entirely of beam elements whose thickness can be altered in a variety of ways. Six design parameters define the geometry, five of which describe the ring cross section whilst the sixth relates to the spoke sections. The rest of the model simply enforces suitable boundary conditions. Our industrial collaborators provided realistic loading conditions on the structure. The goal is to minimize the maximum von Mises stress within the structure, such that the weight does not exceed a predefined value – here taken as 16.0 kg. The evaluation of the stress involves solving the finite element problem at a relatively low cost. The weight is a straightforward function and can be evaluated using a simple analytic expression. The finite element model has been constructed using the ProMecanica™ package.


Figure C.1: Sketch of the tail bearing structure.


Appendix D

Passive Vibration Control in a Satellite Beam

This testcase is concerned with the optimization of the frequency response of a two-dimensional structure of the sort that may be found in girder bridges, tower cranes, satellite booms, etc. [Renton, 1999]. It consists of 40 individual Euler-Bernoulli beams connected at 20 joints. Each of the 40 beams has the same properties per unit length. Initially the boom was designed and analyzed for a regular geometry, where each beam was either 1 m or 1.414 m in length (see, for example, Figure D.1). The joints at points (0,0) and (0,1) are fixed, i.e., they are fully restrained in all degrees of freedom; all other joints are free to move. The structure is excited by a point transverse force applied halfway between points (0,0) and (1,0) (as indicated by the 'x' on the figures). The vibrational energy level was found for the right-hand end vertical beam [Keane, 1995]. The results of the analysis have been validated experimentally [Keane and Bright, 1996]. The objective was the minimization of the frequency averaged response of the beam in the range 150-250 Hz.

We have constructed a number of testcases with different dimensionalities. Starting from a four-variable optimization problem, where the x and y coordinates of the two mid-span joints (9 and 10) were allowed to vary by ±0.25 m, we have created the higher-dimensional test problems by allowing the same range of movement to further joints,


one by one, on alternating sides of the initial two. This is illustrated in the following figures.

Figure D.1: The four-variable testcase. The x and y coordinates of the two midspan joints (9 and 10) are allowed to vary within the limits indicated by the chevrons (±0.25m).

Figure D.2: The six-variable testcase. The x and y coordinates of joints 9, 10 and 11 are allowed to vary within the limits indicated by the chevrons (±0.25m).

Figure D.3: The eight-variable testcase. The x and y coordinates of joints 8, 9, 10 and 11 are allowed to vary within the limits indicated by the chevrons (±0.25m).

Figure D.4: The ten-variable testcase. The x and y coordinates of joints 8, 9, 10, 11 and 12 are allowed to vary within the limits indicated by the chevrons (±0.25m).


Figure D.5: The 12-variable testcase. The x and y coordinates of joints 7, 8, 9, 10, 11 and 12 are allowed to vary within the limits indicated by the chevrons (±0.25m).
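To make the construction of these testcases of increasing dimensionality explicit, the following sketch builds the design vector bounds for a given number of variables. The helper name `beam_testcase_bounds` and the `nominal` mapping (joint number to its (x, y) coordinates in the regular geometry) are hypothetical conveniences of ours, not part of the original code.

```python
def beam_testcase_bounds(nominal, n_vars, delta=0.25):
    """Bounds for the n_vars-variable satellite beam testcase (n_vars = 4, 6, ..., 12).

    Joints are released in the order used in Figures D.1-D.5: the two mid-span
    joints first, then further joints one by one on alternating sides of them.
    """
    release_order = [9, 10, 11, 8, 12, 7]
    joints = sorted(release_order[: n_vars // 2])
    lower, upper = [], []
    for j in joints:
        for coord in nominal[j]:        # x, then y, of each released joint
            lower.append(coord - delta)
            upper.append(coord + delta)
    return joints, lower, upper
```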

Appendix E

Detailed Empirical Results on the Satellite Beam Testcase

In this section we present the more detailed results of the empirical investigations aimed at finding the optimum setup of the WEIF-based optimization strategy on the satellite beam testcase (see Appendix D). Figures E.1 through E.8 show two-variable sections (colourmaps) through the design space (the variables, the x and y coordinates of the joints, are normalised to the [0,1] range). Figures E.9 through E.13 show the objective function density maps (initial DoE size vs. expected improvement weighting) after various numbers of evaluations.
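For reference, the expected improvement weighting that forms one axis of the density maps enters the infill criterion roughly as sketched below. This is the commonly used weighted form of the criterion; the exact definition employed in these experiments is the one given in the main body of the thesis, so treat the code as indicative only.

```python
import numpy as np
from scipy.stats import norm

def weighted_expected_improvement(y_hat, s, y_min, w):
    """Weighted EI at a point with surrogate prediction y_hat and estimated error s.

    Larger w biases the search towards exploitation of the predicted improvement,
    smaller w towards exploration of regions of high predicted error; w = 0.5
    recovers the balance of the usual expected improvement (up to a constant).
    """
    s = np.maximum(s, 1e-12)                 # avoid division by zero at sampled points
    u = (y_min - y_hat) / s
    return w * (y_min - y_hat) * norm.cdf(u) + (1.0 - w) * s * norm.pdf(u)
```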

Figure E.1: Two-variable slice through the objective function: x coordinate of joint 7 vs. y coordinate of joint 7.


Figure E.2: Two-variable slice through the objective function: x coordinate of joint 8 vs. y coordinate of joint 8.

Figure E.3: Two-variable slice through the objective function: x coordinate of joint 9 vs. y coordinate of joint 9.

Figure E.4: Two-variable slice through the objective function: x coordinate of joint 10 vs. y coordinate of joint 10.


Figure E.5: Two-variable slice through the objective function: x coordinate of joint 11 vs. y coordinate of joint 11.

Figure E.6: Two-variable slice through the objective function: x coordinate of joint 12 vs. y coordinate of joint 12.

Figure E.7: Two-variable slice through the objective function: x coordinate of joint 7 vs. y coordinate of joint 12.


Figure E.8: Two-variable slice through the objective function: y coordinate of joint 9 vs. x coordinate of joint 10.


Figure E.9: The four-variable testcase (as shown in Figure D.1). Objective function density maps (initial DoE size vs. expected improvement weighting) after 15, 20, 22, 25, 27 and 30 evaluations. The horizontal lines indicate the number of evaluations after which the density map was plotted (clearly, it does not make sense to use any DoE size above the line).

Figure E.10: The six-variable testcase (as shown in Figure D.2). Objective function density maps (initial DoE size vs. expected improvement weighting) after 25, 28, 31, 34, 37 and 40 evaluations.


Figure E.11: The eight-variable testcase (as shown in Figure D.3). Objective function density maps (initial DoE size vs. expected improvement weighting) after 30, 40, 45, 50, 55 and 60 evaluations.


Figure E.12: The ten-variable testcase (as shown in Figure D.4). Objective function density maps (initial DoE size vs. expected improvement weighting) after 30, 35, 40, 55, 60 and 65 evaluations.


Figure E.13: The 12-variable testcase (as shown in Figure D.5). Objective function density maps (initial DoE size vs. expected improvement weighting) after 80, 84, 88, 92, 96 and 100 evaluations.


E.1 Sample data

In this section we present a sample dataset relating to this testcase, which enables the reader to construct her own model and compare it with ours. Let us consider a two-variable slice through the search space, where x1 is the y-coordinate of node 5 and x2 is the x-coordinate of node 6. We have built a standard Radial Basis Function model of the function (the same objective as above) based on a 16-point Morris-Mitchell-optimal Latin hypercube design, denoted here by OLH (each row of the matrix is an (x1, x2) pair). The column vector T contains the responses in these points. We then compared the values predicted by the model against the true function values calculated over a 21 × 21 uniform lattice (i.e., with a spacing of 0.05 from 0 to 1) – this data is contained in the matrix denoted by M (x1 varies along each row of the matrix from 0 to 1 (left to right), while moving down a column means moving up the x2 axis from 0 to 1). The mean absolute error (i.e., the mean of the 21 × 21 absolute differences between the model and the exact function values) was 0.0579. A sketch of how such a comparison can be reproduced from this data is given after the listings below.

OLH =
0.0000 0.2000
0.0667 0.4667
0.1333 0.7333
0.2000 1.0000
0.2667 0.1333
0.3333 0.4000
0.4000 0.6667
0.4667 0.9333
0.5333 0.0667
0.6000 0.3333
0.6667 0.6000
0.7333 0.8667
0.8000 0.0000
0.8667 0.2667
0.9333 0.5333
1.0000 0.8000

T = 0.3974 0.3933 0.3475 0.2940 0.7361 0.3618 0.2706 0.2149 0.3166 0.2438 0.2674 0.2176 0.2846 0.3524 0.2757 0.2655

M = Columns 1 through 7 0.5029 0.4637 0.4398 0.4277 0.4280 0.4476 0.5062 0.4639 0.4378 0.4268 0.4307 0.4557 0.5221 0.6721 0.4358 0.4195 0.4193 0.4392 0.4926 0.6143 0.8116 0.4176 0.4068 0.4161 0.4515 0.5330 0.6911 0.8079 0.3974 0.3976 0.4160 0.4648 0.5651 0.7090 0.7026 0.3847 0.3908 0.4175 0.4751 0.5761 0.6648 0.5922 0.3772 0.3873 0.4183 0.4782 0.5615 0.5933 0.5080 0.3728 0.3861 0.4186 0.4727 0.5289 0.5237 0.4466 0.3710 0.3860 0.4167 0.4594 0.4896 0.4672 0.4038 0.3712 0.3862 0.4120 0.4406 0.4510 0.4230 0.3728 0.3729 0.3861 0.4047 0.4197 0.4170 0.3888 0.3494 0.3760 0.3854 0.3956 0.3989 0.3885 0.3620 0.3310


0.3800 0.3841 0.3856 0.3799 0.3649 0.3406 0.3160 0.3854 0.3828 0.3764 0.3650 0.3450 0.3232 0.3032 0.3934 0.3846 0.3704 0.3494 0.3285 0.3094 0.2926 0.4078 0.3846 0.3595 0.3365 0.3163 0.2995 0.2850 0.4187 0.3831 0.3532 0.3287 0.3099 0.2960 0.2815 0.4434 0.3885 0.3508 0.3267 0.3119 0.2974 0.2746 0.4383 0.4005 0.3502 0.3296 0.3183 0.2928 0.2639 0.3719 0.3543 0.3334 0.3267 0.3157 0.2809 0.2529 0.3692 0.3543 0.3505 0.3287 0.2940 0.2643 0.2421

Columns 8 through 14 0.6292 0.6475 0.4810 0.3825 0.3363 0.3100 0.2919 0.8118 0.6138 0.4317 0.3521 0.3139 0.2914 0.2762 0.7614 0.5141 0.3836 0.3251 0.2947 0.2761 0.2639 0.6257 0.4387 0.3482 0.3043 0.2798 0.2643 0.2553 0.5192 0.3892 0.3240 0.2898 0.2697 0.2570 0.2510 0.4495 0.3581 0.3088 0.2804 0.2626 0.2517 0.2491 0.4028 0.3340 0.2936 0.2695 0.2544 0.2460 0.2480 0.3677 0.3155 0.2832 0.2626 0.2495 0.2430 0.2496 0.3438 0.3020 0.2750 0.2579 0.2465 0.2419 0.2546 0.3267 0.2931 0.2702 0.2545 0.2447 0.2426 0.2636 0.3137 0.2865 0.2670 0.2531 0.2441 0.2456 0.2754 0.3032 0.2813 0.2647 0.2524 0.2448 0.2515 0.2793 0.2942 0.2766 0.2628 0.2519 0.2461 0.2568 0.2674 0.2861 0.2721 0.2606 0.2509 0.2476 0.2526 0.2550 0.2788 0.2676 0.2574 0.2492 0.2463 0.2465 0.2476 0.2732 0.2617 0.2514 0.2446 0.2414 0.2404 0.2409 0.2659 0.2522 0.2427 0.2367 0.2331 0.2316 0.2315 0.2553 0.2420 0.2329 0.2267 0.2225 0.2202 0.2195 0.2449 0.2322 0.2232 0.2165 0.2117 0.2086 0.2072 0.2356 0.2237 0.2148 0.2080 0.2028 0.1992 0.1972 0.2274 0.2168 0.2085 0.2018 0.1965 0.1926 0.1903


Columns 15 through 21 0.2791 0.2750 0.2846 0.3060 0.3284 0.3447 0.3564 0.2674 0.2701 0.2883 0.3153 0.3381 0.3540 0.3666 0.2596 0.2699 0.2969 0.3268 0.3478 0.3621 0.3749 0.2562 0.2754 0.3098 0.3387 0.3556 0.3671 0.3779 0.2581 0.2872 0.3249 0.3476 0.3581 0.3657 0.3750 0.2646 0.3040 0.3387 0.3515 0.3555 0.3600 0.3681 0.2731 0.3182 0.3414 0.3439 0.3440 0.3474 0.3554 0.2846 0.3254 0.3321 0.3269 0.3246 0.3274 0.3352 0.2977 0.3225 0.3161 0.3083 0.3058 0.3084 0.3159 0.3055 0.3091 0.2981 0.2914 0.2899 0.2928 0.3000 0.2994 0.2910 0.2823 0.2784 0.2785 0.2821 0.2893 0.2820 0.2747 0.2704 0.2694 0.2712 0.2757 0.2834 0.2657 0.2633 0.2627 0.2639 0.2671 0.2726 0.2808 0.2555 0.2562 0.2578 0.2606 0.2649 0.2711 0.2798 0.2490 0.2510 0.2537 0.2575 0.2626 0.2694 0.2783 0.2424 0.2446 0.2478 0.2523 0.2580 0.2651 0.2740 0.2328 0.2349 0.2378 0.2423 0.2482 0.2560 0.2655 0.2203 0.2219 0.2241 0.2281 0.2336 0.2410 0.2507 0.2075 0.2085 0.2100 0.2136 0.2187 0.2258 0.2351 0.1970 0.1980 0.1989 0.2021 0.2069 0.2137 0.2229 0.1898 0.1919 0.1915 0.1944 0.1990 0.2057 0.2148
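As noted above, the comparison can be reproduced from these listings along the following lines. This is a sketch only: the basis function (Gaussian) and its width `eps` are our assumptions, since the text merely states that a "standard" RBF model was used, so the resulting error will not match 0.0579 exactly. `OLH` (16 × 2), `T` (length 16) and `M` (21 × 21) are assumed to be NumPy arrays holding the data listed above.

```python
import numpy as np

def rbf_predict(centres, values, x_new, eps=2.0):
    """Fit a Gaussian RBF interpolant to (centres, values) and evaluate it at x_new."""
    dist = np.linalg.norm(centres[:, None, :] - centres[None, :, :], axis=-1)
    weights = np.linalg.solve(np.exp(-(eps * dist) ** 2), values)
    dist_new = np.linalg.norm(x_new[:, None, :] - centres[None, :, :], axis=-1)
    return np.exp(-(eps * dist_new) ** 2) @ weights

grid = np.linspace(0.0, 1.0, 21)
lattice = np.array([[x1, x2] for x2 in grid for x1 in grid])   # x1 varies fastest, as in M
predictions = rbf_predict(OLH, T, lattice)
mean_abs_error = np.mean(np.abs(predictions - M.ravel()))      # compare with 0.0579
```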


Bibliography

J. J. Alonso, I. M. Kroo, and A. Jameson. Advanced algorithms for design and optimization of quiet supersonic platforms. 40th Aerospace Sciences Meeting and Exhibit, Reno, NV, 2002. J. An and A. Owen. Quasi-regression. Journal of Complexity, 17(4), 2001. K. W. Anderson and V. Venkatakrishnan. Aerodynamic design optimization on unstructured grids using a continuous adjoint formulation. Computers and Fluids, 28(4-5):443–480, 1999. P. J. Angeline. Adaptive and self-adaptive evolutionary computations. In Marimuthu Palaniswami and Yianni Attikiouzel, editors, Computational Intelligence: A Dynamic Systems Perspective, pages 152–163. IEEE Press, 1995. E. Arian and V. N. Vatsa. A preconditioning method for shape optimization governed by the Euler equations. Report 98-14, ICASE, 1998. C. Audet, J. E. Dennis, D. W. Moore, A. Booker, and P. D. Frank. A surrogate-model-based method for constrained optimization. 8th AIAA/NASA/USAF/ISSMO Symposium on Multidisciplinary Analysis and Optimization, Long Beach, CA, 2000. P. Audze and V. Eglais. New approach to planning out of experiments. Problems of Dynamics and Strength (in Russian), 35(104-107), 1977. A. Autere. Employing gradient information in a genetic algorithm. Second European Congress on Intelligent Techniques and Soft Computing EUFIT 94, Aachen, 1994. T. Bäck. Evolutionary Algorithms in Theory and Practice. Oxford University Press, 1996. J. M. Baldwin. A new factor in evolution. American Naturalist, 30:441–451, 1896.


S. J. Bates, J. Sienz, and D. S. Langley. Formulation of the Audze-Eglais uniform latin hypercube design of experiments. Advances in Engineering Software, 34:493–506, 2003. C.H. Bischof, G.F. Corliss, L. Green, A. Griewank, K. Haigler, and P. Newman. Automatic differentiation of advanced cfd codes for multidisciplinary design. Journal on Computing Systems in Engineering, 3(6):625–638, 1992. S. Blackmore. The Meme Machine. Oxford University Press, 2000. G. E. P. Box. Evolutionary operation: a method for increasing industrial productivity. Journal of Applied Statistics, 6(2):81–101, 1957. G. E. P. Box and N. R. Draper. Empirical model building and response surfaces. Wiley Series in Probability and Mathematical Statistics. John Wiley & Sons, 1987. M. J. Box. A new method of constrained optimization and comparison with other methods. Computer Journal, 8(1):42–52, 1965. D.S. Broomhead and D. Loewe. Multivariate functional interpolation and adaptive networks. Complex Systems, 2:321–355, 1988. D. E. Brown, C. L. Huntley, and A. R. Spillane. A parallel genetic heuristic for the quadratic assignment problem. Proceedings of the Third International Conference on Genetic Algorithms and Their Applications, Fairfax, VA, pages 406–415, 1989. C. G. Broyden. The convergence of a class of double-rank minimization algorithms 2. the new algorithm. J. Inst. Math. Appl., 6:222–231, 1970. T. Chen and H. Chen. Approximation capability of functions of several variables, nonlinear functionals and operators by radial basis function neural networks. IEEE Transactions on Neural Networks, 6(4):904–910, 1995. H. S. Chung and J. J. Alonso. Using gradients to construct response surface models for high-dimensional design optimization problems. 39th AIAA Aerospace Sciences Meeting & Exhibit, Reno, NV, 2001. H.-S. Chung and J. J. Alonso. Design of a low-boom supersonic business jet using cokriging approximation models. 9th AIAA/ISSMO Symposium on Multidisciplinary Analysis and Optimization, Atlanta, GA, 2002a.


H.-S. Chung and J. J. Alonso. Using gradients to construct cokriging approximation models for high-dimensional design optimization problems. 40th AIAA Aerospace Sciences Meeting and Exhibit, Reno, NV, 2002b. C. Currin, T. Mitchell, M. Morris, and D. Ylvisaker. Bayesian prediction of deterministic functions, with applications to the design and analysis of computer experiments. Journal of the American Statistical Association, 86:953–963, 1991. L. Davis. Handbook of Genetic Algorithms. Van Nostrand Reinhold, New York, 1991. R. Dawkins. The Selfish Gene. Oxford University Press, 1976. A. E. Eiben, R. Hinterding, and Z. Michalewicz. Parameter control in evolutionary algorithms. IEEE Transactions on Evolutionary Computation, 3(2):124–141, 1999. A. E. Eiben, I. G. Sprinkhuizen-Kuyper, and B. A. Thijssen. Competing crossovers in an adaptive GA framework. Proceedings of the 5th IEEE Conference on Evolutionary Computation, IEEE Press, pages 787–792, 1998. J. Eliott and J. Peraire. Practical 3D aerodynamic design and optimization using unstructured meshes. AIAA Journal, 35(9):1479–1485, 1998. G. E. Fasshauer. Solving Partial Differential Equations by Collocation with Radial Basis Functions. Appeared in Surface Fitting and Multiresolution Methods, A. Le Mehaute, C. Rabut, and L. L. Schumaker (eds.), Vanderbilt University Press, 1997. J. Finckenor. Genetic algorithms with inheritance versus gradient optimizers and GA/gradient hybrids. Proceedings of the 5th International Conference on Computer Aided Optimum Design of Structures, pages 257–264, 1997. R. A. Fisher. The Genetical Theory of Natural Selection – A Complete Variorum Edition. Oxford University Press, 1999. A. I. J. Forrester, N. W. Bressloff, and A. J. Keane. Response surface model evolution. 16th AIAA Computational Fluid Dynamics Conference, Orlando, Florida, June 2003. N. Foster and G. Dulikravich. Three-dimensional aerodynamic shape optimization using genetic and gradient search algorithms. Journal of Spacecraft and Rockets, 34:36–42, 1997.


M. N. Gibbs. Bayesian Gaussian Processes for Regression and Classification. PhD thesis, University of Cambridge, 1997. M. B. Giles and N. A. Pierce. An introduction to the adjoint approach to design. Flow, Turbulence and Combustion, 65(3-4):393–415, 2000. D. E. Goldberg and S. Voessner. Optimizing global-local search hybrids. GECCO-99 Proceedings of the Genetic and Evolutionary Computation Conference, pages 220–228, 1999. J. J. Grefenstette. Predictive models using fitness distributions of genetic operators. Foundations of Genetic Algorithms (D. Whitley, editor), 3, 1995. A. Griewank. On Automatic Differentiation. In Mathematical Programming: Recent Developments and Applications, eds. M. Ira and K. Tanabe, Boston, Kluwer, 1989. R. Haftka. Semi-analytical static non-linear structural sensitivity analysis. AIAA Journal, 31:1307–1312, 1993. P. Hajela. Nongradient methods in multidisciplinary design optimization – status and potential. Journal of Aircraft, 36(1), 1999. M. C. G. Hall, D. Cacuci, and M. E. Schlesinger. Sensitivity analysis of a radiative-convective model by the adjoint method. Journal of Atmospheric Sciences, 39:2038–2050, 1982. E. Hart and P. Ross. Gavel – a new tool for genetic algorithm visualisation. IEEE Transactions on Evolutionary Computation, 5(4), 2001. M. Herdy. Reproductive isolation as strategy parameter in hierarchically organized evolution strategies. Parallel Problem Solving from Nature II, pages 207–217, 1992. R. Hicks and P. Henne. Wing design by numerical optimization. Journal of Aircraft, 15(7):407–412, 1978. G. E. Hinton and S. J. Nowlan. How learning can guide evolution. Complex Systems, 1:495–502, 1987. J. H. Holland. Adaptation in natural and artificial systems. The University of Michigan Press, Ann Arbor, 1975.


R. Hooke and T. A. Jeeves. Direct search solution of numerical and statistical problems. Journal of the Association of Computing Machinery, 8:212–229, 1961. J. N. Hooker. Testing heuristics: We have it all wrong. Journal of Heuristics, 1(1): 33–42, 1996. C. R. Houck, J. A. Joines, and M. G. Kay. Utilizing lamarckian evolution and the baldwin effect in hybrid genetic algorithms. Technical Report 96-01, NCSU-IE, 1996. C. Igel and M. Kreutz. Using fitness distributions to improve the evolution of learning structures. Congress on Evolutionary Computation, 1999. R. L. Iman and J. C. Helton. An investigation of uncertainty and sensitivity analysis techniques for computer models. Risk Analysis, 8:71–90, 1988. A. Iollo, M. Salas, and S. Ta’asan. Shape optimization governed by the euler equations using an adjoint method. Institute for Computer Applications in Science and Engineering, Report 93-78, 1993. A. Jameson. Aerodynamic design via control theory. Journal of Scientific Computing, 3(3):223–260, 1988. A. Jameson. Re-engineering the design process through computation. Journal of Aircraft, 36(1):36–50, 1999. A. Jameson. A perspective on computational algorithms for aerodynamic analysis and design. Progress in Aerospace Sciences, 37(2):197–243, 2001. A. Jameson and J. C. Vassberg. Studies of alternative numerical optimization methods applied to the brachistochrone problem. Computational Fluid Dynamics Journal, 9 (3):281–296, 2000. A. Jameson and J. C. Vassberg. Computational fluid dynamics for aerodynamic design – its current and future impact. 39th AIAA Aerospace Sciences Meeting and Exhibit, Reno, NV, 2001. M. Johnson, L. Moore, and D. Ylvisaker. Minimax and maxmin distance designs. Journal of Statistical Planning and Inference, 26:131–148, 1990. D. R. Jones. A taxonomy of global optimization methods based on response surfaces. Journal of Global Optimization, 21:345–383, 2001.


D. R. Jones, M. Schonlau, and W. J. Welch. Efficient global optimization of expensive black-box functions. Journal of Global Optimization, 13:455–492, 1998. B. A. Julstrom. Comparing darwinian, baldwinian, and lamarckian search in a genetic algorithm for the 4-cycle problem. Late Breaking Papers at the Genetic and Evolutionary Computation Conference, Orlando, FL, 1999. E. J. Kansa. Multiquadrics – a scattered data approximation scheme with applications to computational fluid dynamics: II. Solutions to parabolic, hyperbolic, and elliptic partial differential equations. Computers and Mathematics with Applications, 19(6-8): 147–161, 1990. A. J. Keane. Passive vibration control via unusual geometries: the application of genetic algorithm optimization to structural design. Journal of Sound and Vibrations, 185 (3):441–453, 1995. A. J. Keane and A. P. Bright. Passive vibration control via unusual geometries: experiments on model aerospace structures. Journal of Sound and Vibrations, 190(4): 713–719, 1996. H-J. Kim, S. Obayashi, and K. Nakahashi. Flap-deflection optimization for transonic cruise performance improvement of supersonic transport wing. Journal of Aircraft, 38 (4):709–717, 2001a. H-J. Kim, D. Sasaki, S. Obayashi, and K. Nakahashi. Aerodynamic optimization of supersonic transport wing using unstructured adjoint method. AIAA Journal, 39(6): 1011–1020, 2001b. S. Kim, J. Alonso, and A. Jameson. Two-dimensional high-lift aerodynamic optimization using the continuous adjoint method. 8th AIAA/USAF/NASA/ISSMO Symposium on Multidisciplinary Analysis and Optimization,Long Beach, CA, 2000. S. Kim, J. Alonso, and A. Jameson. Design optimization of high-lift configurations using a viscous continuous adjoint method. 40th Aerospace Sciences Meeting and Exhibit, Reno, NV, 2002. J. Koehler and A. Owen. Design and analysis of experiments. In S. Ghosh and C. R. Rao, editors, Handbook of Statistics, pages 261–308. North Holland, 1996.


S. J. Leary, A. Bhaskar, and A. J. Keane. A derivative based surrogate model for approximating and optimizing the output of an expensive computer simulation. Journal of Global Optimization – in press, 2003. W. Liu and S. M. Batill. Gradient-enhanced response surface approximations using kriging models. 9th AIAA/ISSMO Symposium on Multidisciplinary Analysis and Optimization, Atlanta, GA, 2002. F. G. Lobo and D. E. Goldberg. Decision making in a hybrid genetic algorithm. Proceedings of the IEEE International Conference on Evolutionary Computation, 1997. J. Maddox. What Remains to be Discovered. Papermac, September 1999. G. Magyar, M. Johnsson, and O. Nevalainen. An adaptive hybrid genetic algorithm for the three-matching problem. IEEE Transactions on Evolutionary Computation, 4(2), 2000. S. W. Mahfoud and D. E. Goldberg. Parallel recombinative simulated annealing: a genetic algorithm. Parallel Computing, 21:1–28, 1995. J. D. Martin and T. W. Simpson. Use of adaptive metamodeling for design optimization. 9th AIAA/ISSMO Symposium on Multidisciplinary Analysis and Optimization, 2002. G. Matheron. Principles of geostatistics. Economic Geology, 58:1246–1266, 1963. G. Mayley. Landscapes, learning costs and genetic assimilation. Evolutionary Computation, 4(3), 1996. M. D. McKay, R. J. Beckman, and W. J. Conover. A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics, 21:239–245, 1979. N. E. Mendoza, Y. W. Chen, Z. Nakao, T. Adachi, and Y. Masuda. A real multi-parent tri-hybrid evolutionary optimization method and its application in wind velocity estimation from wind profiler data. Applied Soft Computing, 1:225–235, 2001. J. A. Miller, W. D. Potter, R. V. Gandham, and C. N. Lapena. An evaluation of local improvement operators for genetic algorithms. IEEE Transactions on Systems, Man and Cybernetics, 23(5):1340–1351, 1993.


F. Monge and B. Tobio. Aerodynamic design and optimization by means of control theory. Computational Mechanics – New Trends and Applications, Barcelona, 1988. J. J. Moré and S. J. Wright. Optimization software guide. SIAM – Frontiers in Applied Mathematics, 14, 1993. M. D. Morris and T. J. Mitchell. Exploratory designs for computer experiments. Journal of Statistical Planning and Inference, 43:381–402, 1995. M. D. Morris, T. J. Mitchell, and D. Ylvisaker. Bayesian design and analysis of computer experiments: Use of derivatives in surface prediction. Technometrics, 35:243–255, 1993. R. H. Myers and D. C. Montgomery. Response surface methodology: Process and product optimization using designed experiments. John Wiley & Sons, New York, 1995. P. B. Nair. Physics-based surrogate modeling of parameterized PDEs for optimization and uncertainty analysis. Proceedings of the 43rd AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference, Denver, CO, 2002. J. A. Nelder and R. Mead. A simplex method for function minimization. Computer Journal, 8(1):308–313, 1965. M. Nemec and D. W. Zingg. Towards efficient aerodynamic shape optimization based on the Navier-Stokes equations. 15th AIAA Computational Fluid Dynamics Conference, Anaheim, CA, 2001. J. C. Newman, A. C. Taylor, R. W. Newman, and P. A. Hou. Overview of sensitivity analysis and shape optimization for complex aerodynamic configurations. Journal of Aircraft, 36(1):87–96, 1999. J. Nocedal, C. Zhu, R. Byrd, and P. Lu. Algorithm 778: L-BFGS-B, Fortran routines for large scale bound constrained optimization. ACM Transactions on Mathematical Software, 23(4):550–560, 1997. S. Obayashi and T. Tsukahara. Comparison of optimization algorithms for aerodynamic shape design. AIAA Journal, 35(8):1413–1415, 1997. E. M. Oblow, F. G. Pin, and R. Q. Wright. Sensitivity analysis using computer calculus: a nuclear waste application. Nuclear Science and Engineering, 94:4665, 1986.


J. C. Perezzan and S. Hernández. Analytical expressions of sensitivities for shape variables in linear bending systems. Advances in Engineering Software, 34:271–278, 2003. M. J. D. Powell. Radial basis functions for multivariable interpolation: a review. In Algorithms for Approximation, Oxford, Clarendon Press, pages 143–167, 1987. N. J. Radcliffe and P. D. Surry. Formal memetic algorithms. Evolutionary Computing, AISB Workshop, 1994. J. D. Renton. Elastic Beams and Frames. Camford Books, 1999. J. J. Reuther. Aerodynamic shape optimization using control theory. PhD thesis, University of California, Davis, May 1996. J. J. Reuther, A. Jameson, and M. J. Rimlinger. Constrained multipoint aerodynamic shape optimization using an adjoint formulation and parallel computers, part 1. Journal of Aircraft, 36(1):51–60, 1999. J. J. M. Rijpkema, L. P. F. Etman, and A. J. G. Schoofs. Use of design sensitivity information in response surface and kriging metamodels. Optimization and Engineering, 2(4):469–484, 2001. J. Sacks, W. J. Welch, T. J. Mitchell, and H. P. Wynn. Design and analysis of computer experiments (with discussion). Statistical Science, 4:409–435, 1989. J. Samareh. Survey of shape parameterization techniques for high-fidelity multidisciplinary shape optimization. AIAA Journal, 39(5), 2001. N. Saravanan and D. B. Fogel. Learning strategy parameters in evolutionary programming: An empirical study. Proceedings of the 3rd Annual Conference on Evolutionary Programming, World Scientific Publishing, pages 269–280, 1994. D. Sasaki, S. Obayashi, and H-J. Kim. Evolutionary algorithm vs. adjoint method applied to SST shape optimization. The Annual Conference of the CFD Society of Canada, 2001. M. J. Sasena, P. Papalambros, and P. Goovaerts. Exploration of metamodeling sampling criteria for constrained global optimization. Engineering Optimization, 34:263–278, 2002.


D. Schlierkamp-Voosen and H. Mühlenbein. Strategy adaptation by competing subpopulations. Proceedings of Parallel Problem Solving From Nature III, Jerusalem, pages 199–208, 1994. M. Schonlau. Computer Experiments and Global Optimization. PhD thesis, University of Waterloo, Canada, 1997. A. Sinha and D. E. Goldberg. Verification and extension of the theory of global-local hybrids. Technical Report 2001010, IlliGAL, 2001. I. M. Sobol. On the systematic search in a hypercube. SIAM Journal of Numerical Analysis, 16:790–793, 1979. W. M. Spears. Adapting crossover in evolutionary algorithms. Proceedings of 4th Annual Conference on Evolutionary Programming, pages 367–384, 1995. S. Ta'asan. Introduction to shape design and control. VKI Lecture Series on Inverse Design and Optimisation Methods, 1997. B. Tang. Orthogonal array-based Latin hypercubes. Journal of the American Statistical Association, 88(424):1392–1397, 1993. H. Theil. Principles of Econometrics. John Wiley, New York, 1971. M. W. Trosset and V. Torczon. Numerical optimization using computer experiments. Technical Report TR-97-38, ICASE, NASA Langley Research Center, Hampton, Virginia, 1997. F. van Keulen, B. Liu, and R. T. Haftka. Noise and discontinuity issues in response surfaces based on functions and derivatives. 41st AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics and Materials Conference, Atlanta, GA, 2000. F. van Keulen and K. Vervenne. Gradient-enhanced response surface building. 9th AIAA/ISSMO Symposium on Multidisciplinary Analysis and Optimization, Atlanta, GA, 2002. K. Vervenne and F. van Keulen. An alternative approach to response surface building using gradient information. 43rd Structures, Structural Dynamics and Materials Conference, Denver, CO, 2002.


A. Vicini and D. Quagliarella. Airfoil and wing design through hybrid optimization strategies. AIAA Journal, 37(5):634–641, 1999. A. G. Watson and R. J. Barnes. Infill sampling criteria to locate extremes. Mathematical Geology, 27(5):589–608, 1995. H. White, A. R. Gallant, K. Hornik, M. Stinchcombe, and J. Wooldridge. Artificial neural networks: Approximation and learning theory. Blackwell Publishers, 1992. D. Whitley. Modeling hybrid genetic algorithms. Genetic Algorithms in Engineering and Computer Science (edited by G. Winter and P. Cuesta), pages 191–201, 1995. John Wiley. B. J. Williams, T. J. Santner, and W. Notz. Sequential design of computer experiments to minimize integrated response functions. Statistica Sinica, 10:1133–1152, 2000. J. Yen, D. Randolph, B. Lee, and J. C. Liao. Hybrid genetic algorithm for the identification of metabolic models. Proceedings of the IEEE 7th International Conference on Tools with Artificial Intelligence, Herndon, VA, pages 4–7, 1995.