Active Machine Learning for Computational Design and Analysis under Uncertainties
Item type: text; Electronic Dissertation
Author: Lacaze, Sylvain
Publisher: The University of Arizona
Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
A Dissertation Submitted to the Faculty of the DEPARTMENT OF AEROSPACE AND MECHANICAL ENGINEERING In Partial Fulfillment of the Requirements For the Degree of DOCTOR OF PHILOSOPHY WITH A MAJOR IN MECHANICAL ENGINEERING In the Graduate College THE UNIVERSITY OF ARIZONA
THE UNIVERSITY OF ARIZONA GRADUATE COLLEGE
As members of the Dissertation Committee, we certify that we have read the dissertation prepared by Sylvain Lacaze, titled Active Machine Learning for Computational Design and Analysis under Uncertainties and recommend that it be accepted as fulfilling the dissertation requirement for the Degree of Doctor of Philosophy.
Samy Missoum, Date: 04/17/2015
Barry D. Ganapol, Date: 04/17/2015
Mohammad Poursina, Date: 04/17/2015
Ning Hao, Date: 04/17/2015
Final approval and acceptance of this dissertation is contingent upon the candidate’s submission of the final copies of the dissertation to the Graduate College. I hereby certify that I have read this dissertation prepared under my direction and recommend that it be accepted as fulfilling the dissertation requirement.
Dissertation Director: Samy Missoum, Date: 04/17/2015
STATEMENT BY AUTHOR This dissertation has been submitted in partial fulfillment of the requirements for an advanced degree at the University of Arizona and is deposited in the University Library to be made available to borrowers under rules of the Library. Brief quotations from this dissertation are allowable without special permission, provided that an accurate acknowledgement of the source is made. Requests for permission for extended quotation from or reproduction of this manuscript in whole or in part may be granted by the head of the major department or the Dean of the Graduate College when in his or her judgment the proposed use of the material is in the interests of scholarship. In all other instances, however, permission must be obtained from the author.
SIGNED: Sylvain Lacaze
Acknowledgement
First and foremost, I would like to thank my dissertation advisor, Prof. Samy Missoum. Under his supervision, I have developed my critical thinking, original research, and my presentation and writing skills. I would not be the researcher that I am today without his guidance. I also would like to thank Prof. Barry D. Ganapol, Prof. Mohammad Poursina, and Prof. Ning Hao for agreeing to be members of my Ph.D. committee. I would like to express my sincere gratitude to Prof. Barry D. Ganapol for showing me a side of mathematics that now defines my vision of research. I would like to thank Prof. Jean-Marc Bourinet, for showing me the wonder of research. Without his guidance and the opportunities he offered me, I would not be where I am now. I would like to thank my colleagues from the Computational Optimal Design of Engineering Systems (CODES) laboratory: Ethan Boroson, Peng Jiang, Mathieu Carmassi and Emmanuel Cottanceau. Specifically, I would like to acknowledge Ethan Boroson for never losing patience. I also would like to thank Loïc Brévault and Dr. Mathieu Balesdent for the very stimulating, albeit brief, collaboration and for the opportunity to do an internship at the Office National d’Études et de Recherches Aérospatiales (ONERA). I would like to thank Jini Kandyil for her continuous assistance in navigating the mystifying ways of paperwork and bureaucracy. I would like to thank my family for their everlasting support, especially my parents and my grand-parents. Last but not least, I would like to thank five very special people: Jean-Alexandre Bousquet, Erwan Harrouch, Romain Touzé, Vincent Diviné and Mathieu Fernandez. Their contribution might have not been technical, yet it was one of the most important.
Dedication
To my mother, she knows why. Pour ma mère, elle sait pourquoi.
Contents
Acknowledgement
Dedication
List of Figures
List of Tables
Abstract
Chapter 1 Introduction
  1.1 Scope
Chapter 2 Tools for Stochastic Analysis
  2.1 Elements of Probability Theory
    2.1.1 Random Variable
    2.1.2 Distribution
    2.1.3 Moments
    2.1.4 Relevant Distributions
    2.1.5 Distribution Transformation
    2.1.6 Multivariate
    2.1.7 Dependence and Conditioning
    2.1.8 Moments for Multivariate
    2.1.9 Dependent Distributions and Copulas
  2.2 Elements of Statistical Inference
    2.2.1 Random Samples
    2.2.2 Statistics
    2.2.3 Estimators
    2.2.4 Maximum Likelihood Estimate
    2.2.5 Bayes Estimate
    2.2.6 Monte Carlo Estimate
    2.2.7 Bootstrap
  2.3 Sensitivity Analysis
    2.3.1 Correlation Analysis
    2.3.2 Differential Analysis
    2.3.3 ANalysis Of VAriance
  2.4 Reliability Assessment
    2.4.1 Crude Monte Carlo
    2.4.2 First Order Reliability Method
    2.4.3 Second Order Reliability Method
    2.4.4 Importance Sampling
    2.4.5 Subset Simulations
  2.5 Reliability-based Design Optimization
    2.5.1 Stochastic Constraint Transformations
    2.5.2 Double Loop Approaches
    2.5.3 Single Loop Approaches
    2.5.4 Sequential Approaches
Chapter 3 Supervised Learning for Computational Design
  3.1 Fundamentals of Supervised-Learning
    3.1.1 Problem Definition
    3.1.2 Regression vs Classification
    3.1.3 Loss and Risk
    3.1.4 Model Selection
  3.2 Common Meta-Models
    3.2.1 Linear Regressions
    3.2.2 Radial Basis Functions Networks
    3.2.3 Polynomial Chaos Expansions
    3.2.4 Gaussian Processes
    3.2.5 Support Vector Machines
  3.3 Design of Computer Experiments
    3.3.1 Random Sampling
    3.3.2 Full and Fractional Factorial
    3.3.3 Latin Hypercube Sampling
    3.3.4 Centroidal Voronoi Tessellation
  3.4 Active Learning for Reliability Assessment
    3.4.1 One Step Look Ahead
    3.4.2 Expected Improvement
    3.4.3 AK-MCS
    3.4.4 K-Means Clustering Strategy
    3.4.5 Explicit Design Space Decomposition
Chapter 4 A Framework for Active Learning under Uncertainty
  4.1 Generalized “Max-Min”
    4.1.1 Generalization to Gaussian Distribution
    4.1.2 Numerical Implementation
  4.2 Fidelity Maps for Model Update
    4.2.1 Problem Definition
    4.2.2 Fidelity Maps
    4.2.3 Likelihood Approximation
    4.2.4 Post-processing
    4.2.5 Usage
  4.3 Design under Uncertainties
    4.3.1 Active Learning for Reliability Assessment
    4.3.2 System Reliability
    4.3.3 Convergence Criteria
    4.3.4 Reliability-based Design Optimization
Chapter 5 Demonstrative Examples and Applications
  5.1 Supervised Learning for Computational Design
    5.1.1 DOE Comparison
    5.1.2 DOE vs Adaptive Sampling
    5.1.3 Regression vs Classification
  5.2 Active Supervised Learning
    5.2.1 Four Branch Problem
    5.2.2 10D Limit State
    5.2.3 Parallel Update
  5.3 Model Update
  5.4 Design under Uncertainties
    5.4.1 Reliability Assessment: Analytical Examples
    5.4.2 Reliability Assessment: Cantilever Beam
    5.4.3 RBDO: Academic Examples
    5.4.4 RBDO: Crash-worthiness Analysis of a Car Side Impact
Chapter 6 Conclusion
Chapter A MATLAB Toolbox
Notations
Acronyms
Bibliography
List of Figures
2.1 An example of distribution functions for a normal random variable with µ = 0 and σ = 1.
2.2 Graphical illustration of random vectors and random processes.
2.3 Comparison of two joint and marginal PDFs for a Gaussian independent and dependent case. The inner circles (yellow) indicate the high probability density regions whereas outer circles (blue) indicate low probability density regions.
2.4 Influence of the Gaussian copula hyper-parameter on the joint PDF.
2.5 Maximum likelihood estimate illustration using four different random samples of sizes n = 10, n = 100, and n = 1000.
2.6 Plots of the likelihood, and prior and posterior distribution for Example 2.12.
2.7 Overlay of bootstrapped sample mean histogram and its sampling distribution, for a random sample of size n = 100.
2.8 Illustration of sensitivity examples used for comparison.
2.9 Graphical representation of relevant elements for reliability assessment.
2.10 Iso-contours (1%, 10%, and 100%) of the $P_f$ CMC estimate coefficient of variation with respect to n and $P_f$.
2.11 An example of non-linear limit state function g(U) that has a linear hyper-surface g(U) = 0.
2.12 Illustration of the orientation change. The calculation of the $P_f$ reduces to an univariate problem.
2.13 An example of optimal auxiliary distribution $h_{opt}$.
2.14 An example of a subset simulation run on a complex 2D limit state.
2.15 Illustration of reliability examples used for comparison.
2.16 Graphical illustration of PMA constraint conversion. If $G_i(z, \theta) \geq 0$ then $P_{f_i}(z, \theta) \leq P_{T_i}$.
2.17 Graphical illustration of SLA.
2.18 Iteration 1, 2 and 20 of SORA for Example 2.21.
3.1 Graphical illustration of over-fitting for Example 3.1.
3.2 Comparison of regular and normalized RBF on a one dimensional example.
3.3 An illustration of the SVM general idea and basic elements.
3.4 A LHS with poor coverage of the space.
3.5 Examples of design of computer experiments in 2D for a uniform and a bivariate correlated Gaussian distribution (Part 1).
3.6 Examples of design of computer experiments in 2D for a uniform and a bivariate correlated Gaussian distribution (Part 2).
3.7 Graphical illustration of the improvement function $I_q$.
3.8 Graphical illustration of the expected improvement (for contour estimation) function $I_q$.
3.9 Illustration of k-means clustering strategy following Example 3.4. Blue bins show the MCMC sample and green squares show the k-means cluster centroids.
4.1 Sequential addition of 41 max-min examples in a 2D unconstrained setup.
4.2 Sequential addition of 41 modified max-min (4.3) instances in a 2D unconstrained setup.
4.3 Marginal empirical CDFs $F_{X_i}$ along with the standard normal CDF Φ for Exercise 4.1 using (4.3). The red dashed-dotted line is the reference standard normal CDF Φ while every solid colored line is the marginal empirical CDFs $F_{X_i}$.
4.4 Sequential addition of 41 generalized max-min (4.4) instances in a 2D unconstrained setup.
4.5 Marginal empirical CDFs $F_{X_i}$ along with the standard normal CDF Φ for Exercise 4.1 using (4.4). The red dashed-dotted line is the reference standard normal CDF Φ while every solid colored line is the marginal empirical CDFs $F_{X_i}$.
4.6 Graphical illustration of classic and generalized max-min for Example 4.1.
4.7 Graphical illustration of Example 4.2.
4.8 Main elements of the FM approach.
4.9 Normalized bias (Bias), standard error (Std) and root mean square error (RMSE) for 4 level of probability of failure.
5.1 Comparison of four adaptive sampling schemes for the four branch problem (Section 5.2.1) over 10 repetitions. Plots show the relative error ε in the estimated probability of failure $\hat{P}_f$.
5.2 Comparison of four adaptive sampling schemes for the 10D limit state problem (Section 5.2.2) over 10 repetitions. Plots show the relative error ε in the estimated probability of failure $\hat{P}_f$.
5.3 Comparison of estimated probability of failure convergences using different adaptive sampling schemes (Section 5.2.3).
5.4 Schematic and finite element representation of a simple plate. One side is simply supported while the others are connected to the ground through springs, to model uncertainties on the boundary conditions.
5.5 Graphical results of Section 5.3, showing the fidelity maps and the estimated likelihoods, for $E_{act} = 185 \times 10^9$ Pa and $K_{act} = 3 \times 10^5$ N.m$^{-1}$ (a and b), $K_{act} = 6 \times 10^5$ N.m$^{-1}$ (c and d), $K_{act} = 9 \times 10^5$ N.m$^{-1}$ (e and f).
5.6 Graphical results of Section 5.3, showing the fidelity maps and the estimated likelihoods, for $E_{act} = 235 \times 10^9$ Pa and $K_{act} = 3 \times 10^5$ N.m$^{-1}$ (a and b), $K_{act} = 6 \times 10^5$ N.m$^{-1}$ (c and d), $K_{act} = 9 \times 10^5$ N.m$^{-1}$ (e and f).
5.7 Bayesian update applied to the first case ($E_{act} = 185 \times 10^9$ Pa and $K_{act} = 3 \times 10^5$ N.m$^{-1}$) of the plate example.
5.8 First natural frequency $\lambda_1$ distributions for first case ($E_{act} = 185 \times 10^9$ Pa and $K_{act} = 3 \times 10^5$ N.m$^{-1}$) of the plate example. Uncertainty propagated using the prior, the posterior, and the ideal (unknown) distributions.
5.9 Contours of the probability density function and the limit state hyper-surface for the analytical examples.
5.10 Convergence of the estimated probability of failure and the corresponding 95% confidence intervals using the classic (red) or the generalized (blue) max-min adaptive sampling schemes for the analytical examples. The reference value $\hat{P}_f^{N_{MC}}$ is based on the actual limit state function.
5.11 Description of the cantilever beam for Section 5.4.2.
5.12 Scatter plots of the joint distribution for Section 5.4.2.
5.13 Convergence of the estimated probability of failure and the corresponding 95% confidence intervals using the classic (red) or the generalized (blue) max-min adaptive sampling schemes for the cantilever beam example. The reference value $\hat{P}_f^{N_{MC}}$ is based on the actual limit state function.
5.14 Overview of RBDO academic examples.
5.15 Evolution of $z_{best}$ for the RBDO analytical examples.
5.16 Evolution of $\rho_H$ for the RBDO analytical examples.
5.17 Evolution of $\rho_K$ for the RBDO analytical examples.
5.18 Evolution of $z_{best}$ for the RBDO crash example.
5.19 Evolution of $\rho_H$ for the RBDO crash example.
5.20 Evolution of $\rho_K$ for the RBDO crash example.
A.1 Diagram of the current implementation of the CODES toolbox.
List of Tables
2.1 Maximum likelihood estimate values for the Example 2.11.
2.2 Correlation coefficients of the demonstrative examples (n = 10^4).
2.3 Differential analysis of the Example SA.3 (n = 10^4).
2.4 ANOVA of the Example SA.3 (n = 10^4).
2.5 Comparison of 4 reliability assessment techniques on Example RA.1 (2.262).
2.6 Comparison of 4 reliability assessment techniques on Example RA.2 (2.263).
2.7 Comparison of 4 reliability assessment techniques on Example RA.3 (2.264).
4.1 Comparison of numerical optimization for Remark 4.3 (d = 2).
4.2 Comparison of numerical optimization for Remark 4.3 (d = 50).
4.3 Normalized bias, standard error and root mean square error at α = α_opt and α = 0.5. Gaussian approximation.
5.1 Comparison of SVM risks using different DOE types and sizes for (5.1).
5.2 Comparison of SVM risks trained on either DOE only and DOE followed by adaptive sampling (same final sample size) for (5.1).
5.3 Comparison of SVM and GP risks using different CVT DOE sizes for (5.1).
5.4 Parameters used in Section 5.3 (S.I. units).
5.5 Summary of the 6 experimental combinations and corresponding figures (S.I. units) for Section 5.3. Part 1.
5.6 Summary of the 6 experimental combinations and corresponding figures (S.I. units) for Section 5.3. Part 2.
5.7 Estimated probability of failure and its 95% CI at iteration 30, 60 and 100 using the classic max-min scheme for Section [...].
5.8 Estimated probability of failure and its 95% CI at iteration 30, 60 and 100 using the generalized max-min scheme for Section [...].
5.9 Estimated probability of failure and its 95% CI at iteration 30, 60 and 100 using the classic max-min scheme for Section [...].
5.10 Estimated probability of failure and its 95% CI at iteration 30, 60 and 100 using the generalized max-min scheme for Section [...].
5.11 Marginal distributions of the parameters involved in Section 5.4.2.
5.12 Estimated probability of failure and its 95% CI at iteration 30, 60 and 100 using the classic max-min scheme for Section 5.4.2.
5.13 Estimated probability of failure and its 95% CI at iteration 30, 60 and 100 using the generalized max-min scheme for Section 5.4.2.
Abstract
Computational design has become a predominant element of various engineering tasks. However, the ever-increasing complexity of numerical models creates the need for efficient methodologies. Specifically, computational design under uncertainties remains sparsely used in engineering settings due to its computational cost. This dissertation proposes a coherent framework for various branches of computational design under uncertainties, including model update, reliability assessment and reliability-based design optimization. Through the use of machine learning techniques, computationally inexpensive approximations of the constraints, limit states, and objective functions are constructed. Specifically, a novel adaptive sampling strategy, referred to as generalized max-min, has been developed; it allows any approximation to be refined only in relevant regions. This technique presents various computational advantages such as ease of parallelization and applicability to any meta-model. Three approaches tailored for computational design under uncertainties are derived from this approximation technique. An algorithm for reliability assessment is proposed and its efficiency is demonstrated for different probabilistic settings, including dependent variables modeled using copulas. Additionally, the notion of fidelity map is introduced for model update settings with a large number of dependent responses to be matched. Finally, a new reliability-based design optimization method with local refinement has been developed. A derivation of sampling-based probability of failure derivatives is also provided along with a discussion on numerical estimates. This derivation brings additional flexibility to the field of computational design. The knowledge acquired and techniques developed during this Ph.D. have been synthesized in an object-oriented MATLAB toolbox. The help and ergonomics of the toolbox have been designed so as to be accessible to a large audience.
Chapter 1 Introduction
Historically, design has been carried out by means of prototyping and trial-and-error testing. While some form of experimental testing will always be required for validation, a systematic, physical, testing-oriented design approach can prove impractical for large-scale projects due to time and cost. For these reasons, engineers increasingly rely on predictions from computer simulations. These predictions save time and money by drastically reducing the need for experimental testing. With the rise in computational power, powerful numerical tools (e.g., finite elements) are now ubiquitous. This has led to the rise of computational design.
The discipline of computational design loosely refers to the use of computational and/or numerical tools for engineering analysis. The easiest example is reliability assessment, which aims at estimating the probability that some event (e.g., failure, instability) occurs. In experimental testing, such an estimation would require a large number of prototypes, leading to unreasonable costs. In computational design, it simply requires repeated runs of a numerical code.
Setting aside for now the need to validate such numerical tools, computational design is not free from limitations. Arguably the most limiting factor is computational time. If anyone ever believed that the exponential growth in computational power would lead to shorter simulation times, the past decades have proven them wrong. The increase in computational power has instead led to increases in model complexity and, if anything, longer simulation times. In computational fluid dynamics, for example, simulations with run times on the order of months have been explored. Reliability assessment of such models would lead to computational times on the order of a human life span. This issue is further magnified in the case of optimization under uncertainties, referred to as reliability-based design optimization (RBDO), where constraints can be reliability assessment setups. In order to address this issue, a better understanding of the reliability assessment process, and of the underlying notion of uncertainty, is required.
The presence of uncertainties in the various stages of the design and manufacturing processes has long been acknowledged. Safety factors are arguably the most widely used approach to account for uncertainties. Once a desirable structural design is achieved, its strength (e.g., stiffness, elastic limit) is amplified to withstand any variations in the loading conditions, material properties, etc. Most of the time, however, such designs are over-conservative, leading to unnecessary costs, and sometimes they simply fail to account for all sources of uncertainty. A different approach aims at identifying all possible sources of uncertainty, quantifying them, and accounting for them in the design process. Various approaches exist to quantify uncertainties; this work focuses on the use of probability theory.
The most straightforward approach for structural reliability assessment, later referred to simply as reliability assessment, is the crude Monte Carlo (CMC) estimation. CMC estimation requires the computational model to be evaluated many times, at inputs sampled from their distribution. The number of model calls, or function calls, is typically extremely large (> 10^5), making this approach intractable beyond the scope of academic examples. For this reason, knowledge of design under uncertainties was sparse to non-existent before the 1980s. The discipline of design under uncertainty has the potential to yield designs that are optimal with respect to any (properly quantified) variations in their internal and external environment. However, as mentioned above, it was virtually never considered before the 1980s due to exceedingly large computational times.
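To make the cost of the crude Monte Carlo estimate described above concrete, the following is a minimal Python sketch. It is not taken from the dissertation or its toolbox: the limit state `limit_state`, the helper names `crude_monte_carlo` and `required_samples`, and the choice of two independent standard normal inputs are all assumptions made for illustration. The sketch counts the fraction of samples falling in the failure domain and uses the standard coefficient of variation of that estimator, sqrt((1 - Pf) / (n Pf)), to show why small failure probabilities call for a very large number of model evaluations.

```python
import math
import random


def limit_state(x1, x2):
    """Hypothetical limit state (illustrative only): failure when g <= 0."""
    return 3.0 - (x1 + x2) / math.sqrt(2.0)


def crude_monte_carlo(g, n, seed=0):
    """Estimate Pf = P[g(X) <= 0] for two independent standard normal inputs."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        x2 = rng.gauss(0.0, 1.0)
        if g(x1, x2) <= 0.0:  # one model call per sample
            failures += 1
    pf = failures / n
    # Coefficient of variation of the CMC estimator: sqrt((1 - pf) / (n * pf)).
    cov = math.sqrt((1.0 - pf) / (n * pf)) if failures > 0 else float("inf")
    return pf, cov


def required_samples(pf, target_cov):
    """Model calls needed to reach a target coefficient of variation."""
    return math.ceil((1.0 - pf) / (pf * target_cov ** 2))


if __name__ == "__main__":
    pf_hat, cov_hat = crude_monte_carlo(limit_state, n=200_000)
    print(f"estimated Pf = {pf_hat:.2e}, coefficient of variation = {cov_hat:.2f}")
    print("calls for Pf = 1e-3 at 10% c.o.v.:", required_samples(1e-3, 0.10))
```

For this particular limit state the exact probability of failure is Φ(−3) ≈ 1.35 × 10^-3, so the estimate can be checked against a known value. The `required_samples` helper shows that a failure probability of 10^-3 estimated with a 10% coefficient of variation already needs on the order of 10^5 model calls, which is affordable here only because the stand-in model is a one-line formula.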
The field of reliability assessment really began in the 1970s with the pioneering work of Hasofer and Lind (1974) and Rackwitz and Fiessler (1978) on first-order concepts, which led to the first-order reliability method (FORM) and later the second-order reliability method (SORM). However, these approaches rely heavily on a set of rather restrictive assumptions (e.g., a linearly bounded failure domain in the underlying Gaussian space for FORM). Although satisfactory for a wide range of engineering applications, these approaches can lead to large inaccuracies when the assumptions are violated (e.g., strongly non-linear behavior). For this reason, significant effort has been put toward developing sampling-based methods with lower computational cost than CMC. Such approaches are typically referred to as variance reduction techniques (Rubinstein and Kroese, 2011); examples include importance sampling (IS) and subset simulations (SubSim). However, although less computationally intensive (i.e., in time or in number of function calls), such approaches remain intractable for computationally expensive models.
Due to the cost of experimental design, the field of computational design arose. Likewise, due to the cost of computational design (i.e., computational time), the field of surrogate modeling arose. Surrogate modeling aims at replacing the computational model by a meta-model (i.e., an approximation) that is inexpensive to evaluate. This field really began in the late 1980s with the work of Sacks et al. (1989). Given an accurate and computationally inexpensive meta-model of the original computational model, the use of, for example, SubSim becomes tractable. Most of the effort is therefore transferred toward training such an accurate meta-model. Such a need is also found in the field of machine learning, which provides essential tools for this research.
The origin of machine learning can be traced back to the late 1940s with works such as the one from Turing (1950). The field of machine learning is extremely large and is made of many subcategories. The one of particular interest for this work is referred to as supervised learning. In an engineering context, it can loosely be described as the task of predicting the performance of a new design, given a set of designs with known performances. However, there is a major distinction between supervised learning and surrogate modeling. In supervised learning, the task is to learn as well as possible, given the training data. In surrogate modeling, the task is to learn the model with the fewest function calls. Therefore, the choice of the training set is part of the problem at hand and constitutes the field of active learning.
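The surrogate modeling idea described above can be sketched in a few lines of Python. This is an illustrative assumption rather than the dissertation's method: a one-dimensional piecewise-linear interpolant is swapped in for the meta-models discussed in this work (e.g., SVMs or Gaussian processes), the training points are fixed in advance rather than chosen adaptively, and `CountingModel`, `build_surrogate`, and `failure_probability` are hypothetical names. The point is only that, once the meta-model is trained, the sampling-based estimate no longer consumes calls to the expensive model.

```python
import bisect
import math
import random


class CountingModel:
    """Stand-in for an expensive simulation; counts how often it is called."""

    def __init__(self):
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return math.exp(0.5 * x) - 2.0  # hypothetical response, failure if > 0


def build_surrogate(model, lo, hi, n_train):
    """Piecewise-linear interpolant trained on n_train evenly spaced model calls."""
    xs = [lo + (hi - lo) * i / (n_train - 1) for i in range(n_train)]
    ys = [model(x) for x in xs]  # the only calls to the expensive model

    def surrogate(x):
        x = min(max(x, lo), hi)  # clamp to the training range
        j = min(max(bisect.bisect_right(xs, x), 1), n_train - 1)
        t = (x - xs[j - 1]) / (xs[j] - xs[j - 1])
        return ys[j - 1] + t * (ys[j] - ys[j - 1])

    return surrogate


def failure_probability(g, n, seed=0):
    """Crude Monte Carlo estimate of P[g(X) > 0] for a standard normal input."""
    rng = random.Random(seed)
    return sum(g(rng.gauss(0.0, 1.0)) > 0.0 for _ in range(n)) / n


if __name__ == "__main__":
    model = CountingModel()
    surrogate = build_surrogate(model, lo=-5.0, hi=5.0, n_train=21)
    pf_hat = failure_probability(surrogate, n=100_000)
    print(f"Pf estimate on the surrogate: {pf_hat:.4f}")
    print(f"expensive model calls used:   {model.calls}")
```

In this toy setting the true failure boundary sits at x = 2 ln 2 ≈ 1.386, while the interpolant places it slightly lower, so the surrogate-based estimate carries a small bias. Reducing that bias with as few additional model calls as possible, by placing new training points near the boundary, is what motivates the choice of training set discussed in the text.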
However, in the current definition of active learning, the set of possible new data (the pool) is finite and an oracle (e.g., a human operator) can provide new information. In the case of surrogate modeling, the set of possible new data (the design space) is infinite. This leads to a new branch of surrogate modeling, often referred to as adaptive sampling. Over the years, various adaptive sampling schemes have been proposed, such as AK-MCS (Echard et al., 2011), efficient global reliability assessment (EGRA, Bichon et al., 2008), and explicit design space decomposition (EDSD, Basudhar and Missoum, 2010). Specifically, the EDSD approach (Basudhar, 2011) is used as a starting point for this work. However, EDSD was developed with deterministic design spaces in mind. This work will extend the EDSD work to stochastic spaces and introduce new techniques for reliability assessment and RBDO.
1.1 Scope
The main purpose of this work is to propose a methodology to perform analysis and design under uncertainties of a computationally expensive model M with d inputs X and $n_y$ outputs Y such that:
Y = M(X)
In this work, this is achieved through meta-modeling and a general-purpose, meta-model-independent, distribution-tailored active learning strategy (adaptive sampling scheme) that extends the findings of Basudhar (2011). Basudhar concluded in his dissertation that the following improvements of the EDSD method are needed:
• improve the adaptive sampling scheme;
• provide better error bounds on support vector machines (SVM);
• account for dependence in the input variables.
In fact, it will be shown that not only the dependence structure but also the inputs’ distribution $f_X$ should be taken into account. This work addresses these issues and also extends the previous work by developing an adaptive sampling scheme that can be used with any meta-model and that is tailored for stochastic spaces. This new adaptive sampling scheme is used to develop three algorithms for computational design:
1. A general purpose scheme for the approximation of the failure domain, used for reliability assessment. This approach can be used with any meta-model, including SVMs and Gaussian processes (GPs).
2. Fidelity maps for model update. The fidelity map approach is specifically tailored to solve inverse problems with large numbers of responses, possibly correlated. Its main advantage stems from its ability to encompass all responses and their dependence structure into a single quantity. The dependence structure does not need to be characterized.
3. A local update RBDO. Many existing surrogate-based RBDO techniques rely on a global training of meta-models over some extended space followed by a double loop RBDO approach. In this work, the meta-models are sequentially locally refined using the generalized max-min and the current optimum.
This dissertation is at the crossroads between mechanical engineering and machine learning. As such, some level of formalism and mathematical rigor is dropped for the sake of clarity. On the other hand, this work also provides strong mathematical foundations whenever possible. This dissertation is organized as follows:
• Chapter 2 begins by introducing the elements of probability theory (Section 2.1) and statistical inference (Section 2.2) used in this work. Conventional methods for sensitivity analysis (Section 2.3), reliability assessment (Section 2.4) and RBDO (Section 2.5) are then introduced.
• Chapter 3 presents basic notions of supervised learning (Section 3.1) followed by typical meta-models from the surrogate modeling literature (Section 3.2). Sections 3.3 and 3.4 present existing design of computer experiments techniques and adaptive sampling schemes.
• Chapter 4 details the different elements of this work’s contributions, starting with the theory of the generalized max-min (Section 4.1). The notion of fidelity map is then developed (Section 4.2), followed by two algorithms for reliability assessment and RBDO (Section 4.3).
• Chapter 5 presents examples, demonstrations and applications of the methods developed in Chapter 4.
• Chapter 6 concludes this dissertation.
For ease of reading, a table of notations along with a page reference to their first appearance is provided on Page 181. Similarly, a table of acronyms used in this dissertation along with page references of their appearances is provided on Page 183.
Chapter 2 Tools for Stochastic Analysis
As mentioned in Chapter 1, this dissertation is at the nexus between mathematics and practical methods for engineering. To serve this purpose, this chapter starts by introducing probability and statistical theories in Sections 2.1 and 2.2. These theoretical bases are then used as foundations for sensitivity analysis (Section 2.3), reliability assessment (Section 2.4) and Reliability-based Design Optimization (RBDO, Section 2.5).
Elements of Probability Theory
When working in the engineering field, one is quickly exposed to the notion of uncertainty. Historically, when engineers had to deal with uncertainties (e.g., in a variable x), safety factors were used so that the design could withstand the worst case scenario. It is widely admitted that such approaches are not cost efficient. A different route aims at quantifying said uncertainties and accounting for them. Various theories have been introduced over the years, such as:
• Fuzzy logic (Zadeh, 1988)
• Possibility theory (Zadeh, 1978)
• Evidence theory (Shafer, 1976)
• Interval formalism (Du et al., 2005)
• Probability theory (Kolmogorov, 1956)
Probability theory is arguably the most widely used framework for uncertainty quantification and is used as a basis for this work. The following sections introduce only practical elements of probabilities that relate to the scope of this work. For an exhaustive introduction to probability theory, the interested reader is referred to Ross (2009).
Random Variable
The foundation of probability theory relies on the notion of random variables. A random variable X can take various realizations based on its distribution. When there is a finite set of realizations (i.e., a discrete random variable), each has a given probability of occurring. When there is an infinite set of realizations (i.e., a continuous random variable), each has a given probability density of occurring. In engineering settings, continuous variables are more common. Therefore, they are the focus of the rest of this background section.
A random variable X is fully characterized by its distribution, which is usually parametrized by hyper-parameters θ (e.g., mean, standard deviation, min, max, etc.). The distribution defines how probability densities are distributed over the domain of the random variable, through its probability density function (PDF) $f_X$. A random variable is said to follow a distribution such that:
$$X \sim f_X \tag{2.1}$$
The cumulative distribution function (CDF) $F_X$ is defined as the integral of the PDF up to q:
$$F_X(q) = \int_{-\infty}^{q} f_X(x)\,dx = P[X \le q] \tag{2.2}$$
where $P[X \le q]$ stands for the probability that the random variable X takes a realization lower than or equal to q. A quantile $q_p$ is defined as the realization such that there is a probability p that $X \le q_p$:
$$F_X(q_p) = p$$
Any PDF must satisfy two properties:
• $f_X(x) \ge 0 \quad \forall x$
• $\int_{-\infty}^{+\infty} f_X(x)\,dx = 1$
Figure 2.1: An example of distribution functions for a normal random variable with µ = 0 and σ = 1. Panels: (a) PDF; (b) CDF; (c) IDF.
Due to these properties, the CDF is a monotonically increasing function. The inverse distribution function (IDF) $F_X^{-1}$ is defined over regions where the CDF is one-to-one and returns the quantile $q_p$ associated with a given probability p:
$$F_X^{-1}(p) = q_p$$
Example 2.1 (Gaussian distribution functions) To illustrate the distribution functions, let X follow a normal distribution (Section ) with mean µ = 0 and standard deviation σ = 1:
$$X \sim N(0, 1^2)$$
Figure 2.1 shows the three distribution functions introduced for this example. The PDF provides information about how likely a realization of the random variable is to occur (e.g., zero is the most likely realization). The CDF provides information about the probabilistic content accumulated up to a realization (e.g., there is a one half probability to get a realization lower than or equal to zero).
Remark 2.1 (IDF Calculation) For most distributions, there is no closed form of the inverse distribution function. Its calculation usually involves inverse search techniques or optimization approaches to find $q_p$ such that:
$$F_X(q_p) = \int_{-\infty}^{q_p} f_X(x)\,dx = p \tag{2.6}$$
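As a minimal sketch of the inverse search mentioned in Remark 2.1, the following Python snippet finds a quantile of a standard normal variable by bisection on its CDF. The function names `normal_cdf` and `inverse_cdf`, the bracketing interval and the tolerance are illustrative assumptions and are not part of the original work.

```python
import math


def normal_cdf(q, mu=0.0, sigma=1.0):
    """CDF of a Gaussian variable, written with the error function."""
    return 0.5 * (1.0 + math.erf((q - mu) / (sigma * math.sqrt(2.0))))


def inverse_cdf(cdf, p, lower=-10.0, upper=10.0, tol=1e-10):
    """Find q such that cdf(q) = p by bisection.

    Assumes cdf is non-decreasing and that [lower, upper] brackets the quantile.
    """
    while upper - lower > tol:
        middle = 0.5 * (lower + upper)
        if cdf(middle) < p:
            lower = middle  # the quantile lies to the right of middle
        else:
            upper = middle  # the quantile lies at or to the left of middle
    return 0.5 * (lower + upper)


median = inverse_cdf(normal_cdf, 0.5)   # quantile q_0.5
q975 = inverse_cdf(normal_cdf, 0.975)   # quantile q_0.975
```

Bisection only requires CDF evaluations, which is why it is a convenient default when no closed-form IDF is available; faster root-finding schemes could be substituted.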
Although random variables are fully defined through their distribution hyper-parameters θ, they are better characterized by their moments. The nth moment of a random variable X is defined by:
$$\mu'_n = \int_{-\infty}^{+\infty} x^n f_X(x)\,dx \tag{2.7}$$
The nth central moment is:
$$\mu_n = \int_{-\infty}^{+\infty} \left(x - \mu'_1\right)^n f_X(x)\,dx \tag{2.8}$$
where $\mu'_1 = \mu_X = E[X]$ is the expectation, or expected value, of the random variable X. The expected value (also referred to as mean) of X is the average of its possible realizations:
$$E[X] = \int_{-\infty}^{+\infty} x f_X(x)\,dx \tag{2.9}$$
The expectation is not to be confused with the most likely realization. The variance of a random variable X is the 2nd central moment $\mu_2$ and depicts the spread of the possible realizations of X around its mean µ:
\begin{align}
V[X] &= \int_{-\infty}^{+\infty} (x - \mu)^2 f_X(x)\,dx \tag{2.10} \\
&= E\left[(X - \mu)^2\right] \tag{2.11} \\
&= E\left[X^2\right] - \mu^2 \tag{2.12}
\end{align}
The square root of the variance, which has the same units as X, is called the standard deviation and is defined as:
$$\sigma_X = \sqrt{V[X]} \tag{2.13}$$
Another standardized measure of the spread of a random variable is the coefficient of variation:
$$cv_X = \frac{\sigma_X}{\mu_X}$$
Higher moments, such as skewness and kurtosis, can be useful but are outside the scope of this work.
Remark 2.2 (Standard Random Variables) For any random variable X, the associated standard random variable U is defined as:
$$U = \frac{X - \mu_X}{\sigma_X}$$
Remark 2.3 (Expectation and Variance of a Constant) Given a constant c, E [c] = c and V [c] = 0.
Relevant Distributions
Various distribution models have been developed over the years. This section only describes the distributions used in this work. For a more complete catalog, the interested reader is referred to Melchers (1999, Appendix A) and Casella and Berger (1990, Chapter 3). For these distributions, the generic notation (2.1) is traditionally replaced by a dedicated one, given below for each distribution.
A uniform distribution is used when little information is available besides a range [a; b]. Any realization between a and b has an equal probability density.
• Hyper-parameters: lower and upper bounds, a and b respectively.
• Notation:
$$X \sim U(a, b)$$
• PDF:
$$f_U(x) = \begin{cases} 0 & \text{if } x < a \\ \dfrac{1}{b-a} & \text{if } a \le x \le b \\ 0 & \text{if } x > b \end{cases}$$
• CDF:
$$F_U(q) = \begin{cases} 0 & \text{if } q < a \\ \dfrac{q-a}{b-a} & \text{if } a \le q \le b \\ 1 & \text{if } q > b \end{cases}$$
• Moments:
$$E[X] = \frac{a+b}{2} \quad ; \quad V[X] = \frac{(b-a)^2}{12}$$
Remark 2.4 (Standard Uniform) Although contradicting Remark 2.2, a standard uniform random variable U is traditionally defined as: U ∼ U (0, 1)
The Gaussian (also referred to as normal) distribution is arguably the most used across scientific fields. It is applicable to various natural phenomena and is the limiting distribution of a random variable made of additive uncertainties (Central Limit Theorem; Rice, 2006).
• Hyper-parameters: mean and standard deviation, $\mu_X$ and $\sigma_X$ respectively.
• Notation: traditionally, the Gaussian distribution is noted using its variance:
$$X \sim N\left(\mu_X, \sigma_X^2\right) \tag{2.21}$$
• PDF:
$$f_N(x) = \frac{1}{\sigma_X \sqrt{2\pi}}\, e^{-\frac{(x-\mu_X)^2}{2\sigma_X^2}}$$
• CDF: there is no analytical closed form:
\begin{align}
F_N(q) &= \int_{-\infty}^{q} \frac{1}{\sigma_X \sqrt{2\pi}}\, e^{-\frac{(x-\mu_X)^2}{2\sigma_X^2}}\,dx \tag{2.23} \\
&= \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{q - \mu_X}{\sigma_X \sqrt{2}}\right)\right] \tag{2.24}
\end{align}
where erf is the so-called error function.
• Moments:
$$E[X] = \mu_X \quad ; \quad V[X] = \sigma_X^2$$
Remark 2.5 (Sum of Gaussians) The sum of jointly Gaussian (e.g., independent Gaussian) variables is a Gaussian variable.
Remark 2.6 (Standard Normal) The PDF, CDF and IDF of a standard normal random variable (i.e., $\mu_X = 0$ and $\sigma_X = 1$) are noted $f_N(x) = \phi(x)$, $F_N(a) = \Phi(a)$ and $F_N^{-1}(p) = \Phi^{-1}(p)$ respectively.
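Given k independent standard normal random variables $Z_i \sim N(0, 1)$, $X = \sum_{i=1}^{k} Z_i^2$ is a chi-squared random variable with k degrees of freedom.
• Hyper-parameters: degrees of freedom, k.
• Notation:
$$X = \sum_{i=1}^{k} Z_i^2 \sim \chi^2_k$$
• PDF:
$$f_{\chi^2_k}(x) = \begin{cases} 0 & \text{if } x < 0 \\ \dfrac{1}{2^{\frac{k}{2}}\,\Gamma\left(\frac{k}{2}\right)}\, x^{\frac{k}{2}-1} e^{-\frac{x}{2}} & \text{if } x \ge 0 \end{cases}$$
where Γ is the Gamma function:
$$\Gamma(x) = \int_{0}^{+\infty} e^{-t}\, t^{x-1}\,dt$$
• CDF:
$$F_{\chi^2_k}(q) = \begin{cases} 0 & \text{if } q < 0 \\ \dfrac{1}{\Gamma\left(\frac{k}{2}\right)}\, \gamma\left(\frac{k}{2}, \frac{q}{2}\right) & \text{if } q \ge 0 \end{cases}$$
where γ is the lower incomplete Gamma function:
$$\gamma(s, x) = \int_{0}^{x} t^{s-1} e^{-t}\,dt$$
• Moments:
$$E[X] = k \quad ; \quad V[X] = 2k$$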
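Given a Gaussian random variable Y:
$$Y \sim N\left(\mu_Y, \sigma_Y^2\right)$$
The random variable $X = e^Y$ follows a log-normal distribution. This distribution is used to represent a random variable made of multiplicative uncertainties. In addition, the log-normal is defined on $\mathbb{R}^+$, which makes it ideal for strictly positive quantities.
• Hyper-parameters: mean and standard deviation of the underlying Gaussian variable, $\mu_Y$ and $\sigma_Y$ respectively.
• Notation:
$$X \sim LN\left(\mu_Y, \sigma_Y^2\right)$$
• PDF:
$$f_{LN}(x) = \begin{cases} \dfrac{1}{x \sigma_Y \sqrt{2\pi}}\, e^{-\frac{(\ln x - \mu_Y)^2}{2\sigma_Y^2}} & \text{if } x > 0 \\ 0 & \text{if } x \le 0 \end{cases}$$
• CDF: there is no analytical closed form:
$$F_{LN}(q) = \begin{cases} \displaystyle\int_{0}^{q} \frac{1}{x \sigma_Y \sqrt{2\pi}}\, e^{-\frac{(\ln x - \mu_Y)^2}{2\sigma_Y^2}}\,dx & \text{if } q \ge 0 \\ 0 & \text{if } q < 0 \end{cases}$$
• Moments:
$$E[X] = e^{\mu_Y + \frac{\sigma_Y^2}{2}} \quad ; \quad V[X] = \left(e^{\sigma_Y^2} - 1\right) e^{2\mu_Y + \sigma_Y^2}$$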
Remark 2.7 (Product of Log-normals) The product of independent log-normal variables is a log-normal variable.
Remark 2.8 (Underlying Gaussian Variable Moment) Given a log-normal random variable X with mean $\mu_X$ and standard deviation $\sigma_X$, the hyper-parameters of the underlying Gaussian random variable Y are given by:
$$\mu_Y = \ln\left(\frac{\mu_X^2}{\sqrt{\sigma_X^2 + \mu_X^2}}\right) \quad ; \quad \sigma_Y = \sqrt{\ln\left(\frac{\sigma_X^2}{\mu_X^2} + 1\right)} \tag{2.37}$$
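The conversion of Remark 2.8 can be checked numerically with a short Python sketch: the hyper-parameters of the underlying Gaussian are computed from a target mean and standard deviation, and the log-normal moment formulas are then used to recover them. The function names and the numerical values (mean 8, standard deviation 2) are illustrative assumptions.

```python
import math


def underlying_gaussian(mean_x, std_x):
    """Return (mu_Y, sigma_Y) of Y = ln X from the mean and standard deviation of X."""
    sigma_y = math.sqrt(math.log((std_x / mean_x) ** 2 + 1.0))
    mu_y = math.log(mean_x ** 2 / math.sqrt(std_x ** 2 + mean_x ** 2))
    return mu_y, sigma_y


def lognormal_moments(mu_y, sigma_y):
    """Return (E[X], V[X]) of X = exp(Y) with Y ~ N(mu_y, sigma_y^2)."""
    mean_x = math.exp(mu_y + 0.5 * sigma_y ** 2)
    var_x = (math.exp(sigma_y ** 2) - 1.0) * math.exp(2.0 * mu_y + sigma_y ** 2)
    return mean_x, var_x


# Round trip: moments of X -> hyper-parameters of Y -> moments of X
mu_y, sigma_y = underlying_gaussian(mean_x=8.0, std_x=2.0)
mean_back, var_back = lognormal_moments(mu_y, sigma_y)
```

The round trip returns the initial mean and the square of the initial standard deviation, which confirms that (2.37) is the inverse of the log-normal moment expressions.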
An exponential distribution is also defined for strictly positive realizations, for which the PDF is monotonically decreasing. Exponential distributions are widely used in the field of time reliability.
• Hyper-parameters: mean $\mu_X$.
• Notation:
$$X \sim E(\mu_X)$$
• PDF:
$$f_E(x) = \begin{cases} \dfrac{1}{\mu_X}\, e^{-\frac{x}{\mu_X}} & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases}$$
• CDF:
$$F_E(q) = \begin{cases} 1 - e^{-\frac{q}{\mu_X}} & \text{if } q \ge 0 \\ 0 & \text{if } q < 0 \end{cases}$$
• Moments:
$$E[X] = \mu_X \quad ; \quad V[X] = \mu_X^2$$
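A Weibull distribution is also defined for strictly positive realizations. It offers different shapes than the log-normal distribution that can be suitable for engineering purposes.
• Hyper-parameters: scale and shape parameters A and B respectively.
• Notation:
$$X \sim W(A, B)$$
• PDF:
$$f_W(x) = \begin{cases} \dfrac{B}{A}\left(\dfrac{x}{A}\right)^{B-1} e^{-\left(\frac{x}{A}\right)^B} & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases}$$
• CDF:
$$F_W(q) = \begin{cases} 1 - e^{-\left(\frac{q}{A}\right)^B} & \text{if } q \ge 0 \\ 0 & \text{if } q < 0 \end{cases}$$
• Moments:
\begin{align}
E[X] &= A\,\Gamma\left(1 + \frac{1}{B}\right) \tag{2.45} \\
V[X] &= A^2\left[\Gamma\left(1 + \frac{2}{B}\right) - \Gamma\left(1 + \frac{1}{B}\right)^2\right] \tag{2.46}
\end{align}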
Distribution Transformation
Given a continuous random variable X, a fundamental result of probability theory is that the random variable $P = F_X(X)$ follows a standard uniform distribution:
$$P = F_X(X) \sim U(0, 1) \tag{2.47}$$
This result is used across multiple domains to transform one random variable into another. Let X and Y be two random variables. One can show that:
\begin{align}
Y &= F_Y^{-1}\left(F_X(X)\right) \tag{2.48} \\
&= T(X) \sim f_Y \tag{2.49}
\end{align}
The distribution of a function of a random variable can be derived under certain hypotheses. However, such hypotheses are so restrictive for engineering applications that such tools fall outside the scope of this work. Details can be found in Casella and Berger (1990, Chapter 2).
In many applications, more than one random variable is involved. A random vector X is a vector of d random variables: X = [X1 , . . . , Xi , . . . , Xd ]
where d is referred to as the dimension. In the multivariate case, the distribution of any given component $X_i$ of a random vector is referred to as its marginal distribution (e.g., marginal PDF $f_{X_i}$). The distribution of the random vector X is referred to as its joint distribution (e.g., joint PDF $f_X$). The joint CDF of a random vector X is defined as:
$$F_X(a) = \int_{-\infty}^{a_1} \dots \int_{-\infty}^{a_d} f_X(x)\,dx_1 \dots dx_d \tag{2.51}$$
A random process R is essentially a random vector with an infinite number of dimensions, just like a function can be seen as a vector of infinite size. In other words, for any index t, R(t) is a random variable. When the indices t are spatial coordinates $x = [x_1, x_2, x_3]$, R(x) is referred to as a spatial random field. For a formal introduction to random processes, the interested reader is referred to Ross (2007).
Example 2.2 (Random Vector Marginal PDFs) An example of a random vector and its marginal distributions for d = 4 is shown in Figure 2.2(a).
Example 2.3 (Gaussian Random Process) A random field is referred to as Gaussian if:
$$R(t) \sim N\left(\mu_R(t), \sigma_R^2(t)\right)$$
A typical representation of a random process is through its quantiles, which are themselves functions of the index t, as shown in Figure 2.2(b) for a Gaussian random field.
Dependence and Conditioning
The notion of conditioning is defined as the ability of one outcome to condition (or influence) a second one. Consider two random variables $X_1$ and $X_2$. If the realization of $X_1$ does not affect the realization of $X_2$, the random variables are said to be independent. If the realization of $X_1$ does affect the realization of $X_2$, the random variables are said to be dependent. The notion of conditional PDF is defined as:
$$f_{X_2|X_1}(x_2|x_1) = \frac{f_X(x)}{f_{X_1}(x_1)} \tag{2.53}$$
Figure 2.2: Graphical illustration of random vectors and random processes. Panels: (a) Marginal distributions of a d = 4 random vector; (b) Various quantiles ($q_{0.5}$; $q_{0.25}$, $q_{0.75}$; $q_{0.05}$, $q_{0.95}$) of a random process R(t).
An important fact derives from this definition. If $X_1$ and $X_2$ are independent, the conditional PDFs should be equal to the marginal ones:
$$f_{X_2|X_1}(x_2|x_1) = f_{X_2}(x_2) = \frac{f_X(x)}{f_{X_1}(x_1)}$$
Therefore, for independent random variables, the joint PDF is equal to the product of the marginals:
$$f_X(x) = \prod_{i=1}^{d} f_{X_i}(x_i)$$
Note that conditioning extends to various probability elements such as expectation and variance.
Example 2.4 (Independent and Dependent Bivariate Gaussian) Figure 2.3 shows a comparison of two joint and marginal PDFs for a Gaussian independent and dependent case. Figure 2.3(a) shows that regardless of the realization of $X_2$, $X_1$ has the same conditional PDF (e.g., 0 is always the most likely realization). Figure 2.3(b) shows that based on the realization of $X_2$, the conditional PDF of $X_1$ varies (e.g., for a realization $x_2 = -1$, the most likely realization for $X_1$ is −1).
Figure 2.3: Comparison of two joint and marginal PDFs for a Gaussian independent and dependent case. The inner circles (yellow) indicate the high probability density regions whereas outer circles (blue) indicate low probability density regions. Panels: (a) Independent case; (b) Dependent case. Legend: $f_X$, marginal, conditional.
Remark 2.9 (Bayes Formula) From (2.53), it follows that:
$$f_X(x) = f_{X_2|X_1}(x_2|x_1)\, f_{X_1}(x_1)$$
which leads to a fundamental result in conditional probabilities, the Bayes identity:
$$f_{X_1|X_2}(x_1|x_2) = \frac{f_{X_2|X_1}(x_2|x_1)\, f_{X_1}(x_1)}{f_{X_2}(x_2)}$$
This formula allows one to update the knowledge on $X_1$ based on observations of $X_2$ (Example 2.4).
Moments for Multivariate
Probability theory is tightly linked to integral calculus (e.g., definition of expectation or CDF). Throughout this work, integral or probabilistic notation will be used interchangeably for the sake of clarity and simplicity. Consider two random variables X1 and X2 , and two constants a and b. The expectation function is linear in the sense that: E [aX1 + bX2 ] = aE [X1 ] + bE [X2 ]
However, the variance function has a different behavior: V [aX1 + bX2 ] = a2 V [X1 ] + b2 V [X2 ] + 2abCOV [X1 , X2 ]
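where $COV[X_1, X_2]$ is the covariance between $X_1$ and $X_2$. The covariance is defined as:
\begin{align}
COV[X_1, X_2] &= E\left[(X_1 - E[X_1])(X_2 - E[X_2])\right] \tag{2.60} \\
&= E[X_1 X_2] - E[X_1]\,E[X_2] \tag{2.61}
\end{align}
and characterizes the linear dependence between two random variables. If $X_1$ and $X_2$ are independent, then $COV[X_1, X_2] = 0$, and:
$$E[X_1 X_2] = E[X_1]\,E[X_2]$$
The converse does not hold in general: a zero covariance only means that the variables are uncorrelated (for jointly Gaussian variables, uncorrelated and independent are equivalent).
The (Pearson) correlation coefficient is a normalized covariance and is defined as:
$$\rho_{X_1 X_2} = \frac{COV[X_1, X_2]}{\sigma_{X_1} \sigma_{X_2}} \tag{2.63}$$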
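The variance of a linear combination and the correlation coefficient can be illustrated with sample counterparts of the quantities above. The sketch below uses made-up data and hypothetical helper names (`covariance`, `correlation`); because the sample covariance is bilinear, the identity for $V[aX_1 + bX_2]$ holds exactly for the sample statistics, up to round-off.

```python
def mean(values):
    return sum(values) / len(values)


def covariance(x, y):
    """Sample covariance with the 1/(n - 1) normalization."""
    mx, my = mean(x), mean(y)
    return sum((u - mx) * (v - my) for u, v in zip(x, y)) / (len(x) - 1)


def correlation(x, y):
    """Pearson correlation: covariance normalized by both standard deviations."""
    return covariance(x, y) / (covariance(x, x) ** 0.5 * covariance(y, y) ** 0.5)


# Illustrative data (made up for this sketch)
x1 = [1.0, 2.0, 4.0, 7.0, 9.0]
x2 = [2.0, 1.0, 5.0, 6.0, 11.0]
a, b = 2.0, -3.0

combo = [a * u + b * v for u, v in zip(x1, x2)]
lhs = covariance(combo, combo)  # V[a X1 + b X2], since COV[X, X] = V[X]
rhs = (a ** 2 * covariance(x1, x1) + b ** 2 * covariance(x2, x2)
       + 2.0 * a * b * covariance(x1, x2))
rho = correlation(x1, x2)
```

Here `lhs` and `rhs` agree, and `rho` lies between −1 and 1 as expected for a normalized covariance.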
Dependent Distributions and Copulas
While Sections 2.1.2 and 2.1.4 discussed the well-known topic of univariate distributions, Section 2.1.6 discussed the concept of multivariate distribution. In the case of independent variables, the expression of the joint PDF is:
$$f_X(x) = \prod_{i=1}^{d} f_{X_i}(x_i) \tag{2.64}$$
However, in the case of dependent variables (Section 2.1.7), very few joint distributions exist. The most notable one is the multivariate Gaussian distribution.
Multivariate Gaussian
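Consider a random vector X of size d.
• Hyper-parameters: means $\mu_X$ and covariance matrix $\Sigma_X$ defined as:
\begin{align}
\mu_X &= \left[\mu_{X_1}, \dots, \mu_{X_d}\right] \tag{2.65} \\
\Sigma_X &= \left[COV[X_i, X_j]\right] \quad i = [1, \dots, d],\; j = [1, \dots, d] \tag{2.66}
\end{align}
• Notation:
$$X \sim N_d\left(\mu_X, \Sigma_X\right)$$
• Joint PDF:
$$f_X(x) = \frac{1}{\sqrt{(2\pi)^d\, |\Sigma_X|}}\, e^{-\frac{1}{2}(x - \mu_X)\, \Sigma_X^{-1}\, (x - \mu_X)^T}$$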
However, most analytical joint PDFs only offer a linear dependence structure. For this reason, copulas have been an extremely active research topic in the past decade.
Remark 2.10 (Standard Multivariate Gaussian) Like univariate standard distributions, multivariate standard distributions have zero means and unit standard deviations. However, correlation coefficients can take any admissible values. This is expressed as:
$$\mu_X = 0 \quad ; \quad \sigma_{X_i} = 1 \quad ; \quad \Sigma_X = R = \begin{bmatrix} 1 & \cdots & \rho_{1d} \\ \vdots & \ddots & \vdots \\ \rho_{d1} & \cdots & 1 \end{bmatrix} \tag{2.69}$$
where R is referred to as the correlation matrix. For a standard multivariate Gaussian, the joint PDF and CDF are noted $f_X(x) = \phi_R(x)$ and $F_X(x) = \Phi_R(x)$ respectively. In the special case $R = I_d$, where $I_d$ is the d × d identity matrix, the notations $f_X(x) = \phi(x)$ and $F_X(x) = \Phi(x)$ are used. Plots of joint PDF isovalues for an independent and dependent (ρ = 0.7) standard bivariate Gaussian distribution can be seen in Figure 2.3.
Copulas (Nelsen, 2006) have recently gained a lot of popularity in several fields, such as economics (Frees and Valdez, 1998), biostatistics (Li, 2000), hydrology (Dupuis, 2007), as well as engineering design (Noh et al., 2009). The theoretical foundation comes from Sklar's theorem (Sklar, 1959). It essentially states that for any joint CDF $F_X$, there exists a copula C (unique when the marginals are continuous) such that:
\begin{align}
F_X(x) &= C\left(F_{X_1}(x_1), \dots, F_{X_d}(x_d)\right) \tag{2.70} \\
&= C(\nu_1, \dots, \nu_d) \tag{2.71} \\
&= C(\nu) \tag{2.72}
\end{align}
where $F_{X_i}$ is the marginal CDF of the ith random variable and ν refers to the standard uniform space (Remark 2.4) in which the copula lives. The corresponding joint PDF is defined as:
\begin{align}
f_X(x) &= \left.\frac{\partial^d F_X}{\partial x_1 \dots \partial x_d}\right|_{x} \tag{2.73} \\
&= \left.\frac{\partial^d C}{\partial \nu_1 \dots \partial \nu_d}\right|_{\nu} \prod_{i=1}^{d} f_{X_i}(x_i)
\end{align}
Arguably the most widely used families of copulas are the elliptical and Archimedean ones. Once the copula is constructed, the corresponding joint PDF can be obtained. In the case of elliptical copulas (e.g., Student's t or Gaussian), the derivation of the joint PDF can be carried out analytically. A copula typically relies on a known correlated multivariate distribution and the distribution transformation discussed in Section 2.1.5.
Example 2.5 (Gaussian Copula Joint PDF) The Gaussian copula is parametrized by a correlation matrix R and is defined as:
\begin{align}
C(\nu) &= \Phi_R\left(\Phi^{-1}(\nu_1), \dots, \Phi^{-1}(\nu_d)\right) \tag{2.74} \\
&= \Phi_R(u_1, \dots, u_d) \tag{2.75} \\
&= \Phi_R(u) \tag{2.76}
\end{align}
where $\Phi^{-1}$ is the standard normal IDF (Remark 2.6) and u refers to the standard Gaussian space. By carrying out the derivation, one can show that the joint PDF is:
$$f_X(x) = \phi_R(u) \prod_{i=1}^{d} \frac{f_{X_i}(x_i)}{\phi(u_i)}$$
Figure 2.4: Influence of the Gaussian copula hyper-parameter on the joint PDF. Panels: (a) ρ = 0; (b) ρ = 0.9.
Example 2.6 (Influence of Copula Hyper-Parameters) Consider a bivariate example with marginals defined as:
\begin{align}
X_1 &\sim LN\left(2.05, 0.25^2\right) \tag{2.78} \\
X_2 &\sim W(53, 0.33) \tag{2.79}
\end{align}
Figure 2.4 shows isovalues of the joint PDF fX using a Gaussian copula for ρ = 0 and ρ = 0.9. Figure 2.4(b) shows a strong non-linear correlation between X1 and X2 whereas Figure 2.4(a) shows, as expected (ρ = 0), independence.
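As a hedged illustration of how a Gaussian copula combines a correlated Gaussian vector with the distribution transformation of Section 2.1.5, the following Python sketch draws dependent pairs with the marginals of Example 2.6. The sampling procedure, the function names and the seed are assumptions made for this sketch; the original example only shows the resulting joint PDF, not how it was computed.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist(0.0, 1.0)


def lognormal_idf(p, mu_y, sigma_y):
    """IDF of LN(mu_y, sigma_y^2): exponential of the underlying Gaussian quantile."""
    return math.exp(mu_y + sigma_y * STD_NORMAL.inv_cdf(p))


def weibull_idf(p, scale, shape):
    """IDF of W(scale, shape), obtained by inverting 1 - exp(-(x / scale)^shape)."""
    return scale * (-math.log(1.0 - p)) ** (1.0 / shape)


def sample_gaussian_copula(n, rho, rng):
    """Draw n pairs (x1, x2) with the marginals of Example 2.6 and a Gaussian copula."""
    pairs = []
    for _ in range(n):
        # Standard Gaussian space: pair (u1, u2) with correlation rho
        u1 = rng.gauss(0.0, 1.0)
        u2 = rho * u1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        # Standard uniform space: nu_i = Phi(u_i)
        nu1, nu2 = STD_NORMAL.cdf(u1), STD_NORMAL.cdf(u2)
        # Physical space: x_i = F_Xi^{-1}(nu_i)
        pairs.append((lognormal_idf(nu1, 2.05, 0.25), weibull_idf(nu2, 53.0, 0.33)))
    return pairs


sample = sample_gaussian_copula(n=2000, rho=0.9, rng=random.Random(0))
```

Since each marginal transformation is monotonically increasing, the ordering of the Gaussian pair is preserved, which is how a single parameter ρ controls the strength of the dependence between two non-Gaussian marginals.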
Elements of Statistical Inference
Probability theory lays down a basis to characterize mathematically the notion of uncertainty. The other side of the coin is called the theory of statistics or statistical inference. Statistics is concerned with making use of data (real or numerical) to understand, estimate, or infer probabilistic features of a phenomenon.
Random Samples
Just as probability theory is built around random variables, statistical inference is built around random samples. For a random variable X, a random sample X of its population is a collection of n realizations $x^{(i)}$ of X:
$$X = \left[x^{(1)}, \dots, x^{(i)}, \dots, x^{(n)}\right]^T$$
Equivalently, for a random vector X of size d, a random sample is defined as:
$$X = \begin{bmatrix} x^{(1)} \\ \vdots \\ x^{(n)} \end{bmatrix} = \begin{bmatrix} x_1^{(1)} & \cdots & x_d^{(1)} \\ \vdots & \ddots & \vdots \\ x_1^{(n)} & \cdots & x_d^{(n)} \end{bmatrix} \tag{2.81}$$
A fundamental idea in statistics is that each realization $x^{(i)}$ of a random variable can be seen as a random variable itself. In other words, although a realization is typically a deterministic value, a random sample can be repeated. In that sense, as this random sample is repeated, $x^{(i)}$ takes different values according to the parent distribution of X, which is the definition of a random variable. As such, the elements of a random sample are referred to as independent and identically distributed (i.i.d.).
In the context of numerical experiments, random samples need to be efficiently generated in large quantities. Such engines are called pseudo-random sample generators. A uniform generator between zero and one can be found on almost any operating system or programming language. In that sense, assume one can easily generate a random sample P from:
$$P \sim U(0, 1) \tag{2.82}$$
Using the result discussed in Section 2.1.5, a random sample X can be obtained as:
$$X = \left[F_X^{-1}\left(u^{(1)}\right), \dots, F_X^{-1}\left(u^{(n)}\right)\right]^T \tag{2.83}$$
where $u^{(i)}$ are the realizations of the uniform sample. Various other methods have been developed over the years; however, they fall outside the scope of this work. The interested reader is referred to Rizzo (2008, Chapter 3).
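The generation of a random sample through the IDF, as in (2.83), can be sketched in a few lines of Python. The exponential distribution is used here because its IDF has a closed form; the function names, the mean value of 2 and the seed are illustrative assumptions.

```python
import math
import random


def exponential_idf(p, mean):
    """IDF of E(mean), obtained by inverting F_E(q) = 1 - exp(-q / mean)."""
    return -mean * math.log(1.0 - p)


def inverse_transform_sample(idf, n, rng):
    """Map n standard uniform realizations through the IDF of the target distribution."""
    return [idf(rng.random()) for _ in range(n)]


rng = random.Random(42)  # fixed seed so that the sketch is repeatable
sample = inverse_transform_sample(lambda p: exponential_idf(p, mean=2.0), 50_000, rng)
sample_mean = sum(sample) / len(sample)
```

With a sample of this size, `sample_mean` should fall close to the prescribed mean of 2, which gives a quick sanity check of the transformation.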
A statistic T(X) is some summary of a random sample. As S = T(X) is some mathematical combination of random variables, it is a random variable itself. The distribution of S is called the sampling distribution. Although outside the scope of this work, notions of sufficient, ancillary and complete statistics are of paramount importance for theoretical statistics. Details can be found in Casella and Berger (1990, Chapter 6).
Remark 2.11 (Common Statistics)
• Sample mean:
$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} x^{(i)} \tag{2.84}$$
• Sample standard deviation:
$$s_X = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} \left(x^{(i)} - \bar{X}\right)^2}$$
• Sample coefficient of variation (unbiased for Gaussian distribution):
$$\widehat{cv}_X^{\star} = \left(1 + \frac{1}{4n}\right) \frac{s_X}{\bar{X}} \tag{2.86}$$
• Order statistic: for a random sample X of size n, the jth order statistic $X_{(j)}$ returns the jth smallest element, such that:
$$X_{(1)} = \min_i x^{(i)} \quad ; \quad X_{(n)} = \max_i x^{(i)}$$
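The statistics of Remark 2.11 translate directly into code. The following Python sketch is a plain implementation with hypothetical function names and made-up data; it is meant as an illustration rather than as the implementation used in this work.

```python
import math


def sample_mean(x):
    return sum(x) / len(x)


def sample_std(x):
    """Sample standard deviation with the n - 1 denominator."""
    m = sample_mean(x)
    return math.sqrt(sum((v - m) ** 2 for v in x) / (len(x) - 1))


def sample_cv(x):
    """Sample coefficient of variation with the (1 + 1/(4n)) correction factor."""
    return (1.0 + 1.0 / (4.0 * len(x))) * sample_std(x) / sample_mean(x)


def order_statistic(x, j):
    """j-th order statistic (1-based): the j-th smallest element of the sample."""
    return sorted(x)[j - 1]


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # illustrative values
x_bar = sample_mean(data)
s_x = sample_std(data)
cv = sample_cv(data)
smallest, largest = order_statistic(data, 1), order_statistic(data, len(data))
```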
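Example 2.7 (Sample Mean Sampling Distribution) Consider a Gaussian random variable:
$$X \sim N\left(\mu_X, \sigma_X^2\right)$$
Since the sum of independent Gaussian variables is also Gaussian (Remark 2.5), the sample mean is Gaussian. Its hyper-parameters can be calculated as:
\begin{align}
E\left[\bar{X}\right] &= E\left[\frac{1}{n} \sum_{i=1}^{n} x^{(i)}\right] \tag{2.89} \\
&= \frac{1}{n} \sum_{i=1}^{n} E\left[x^{(i)}\right] \tag{2.90} \\
&= \frac{1}{n}\, n\, E[X] = \mu_X \tag{2.91} \\
V\left[\bar{X}\right] &= V\left[\frac{1}{n} \sum_{i=1}^{n} x^{(i)}\right] \tag{2.92} \\
&= \frac{1}{n^2} \sum_{i=1}^{n} V\left[x^{(i)}\right] \tag{2.93} \\
&= \frac{1}{n^2}\, n\, V[X] = \frac{\sigma_X^2}{n} \tag{2.94}
\end{align}
where (2.93) uses the independence of the elements of the random sample. Therefore the sampling distribution of the sample mean is:
$$\bar{X} \sim N\left(\mu_X, \frac{\sigma_X^2}{n}\right) \tag{2.95}$$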
Example 2.8 (Sample Variance Sampling Distribution) Consider a Gaussian random variable:
$$X \sim N\left(\mu_X, \sigma_X^2\right) \tag{2.96}$$
The sample variance is defined as:
$$s_X^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left(x^{(i)} - \bar{X}\right)^2$$
For a Gaussian sample, one can show that:
$$\sum_{i=1}^{n} \left(x^{(i)} - \bar{X}\right)^2 \sim \sigma_X^2\, \chi^2_{n-1}$$
where $\chi^2_{n-1}$ is a chi-squared distribution with n − 1 degrees of freedom. Therefore:
$$(n-1)\, \frac{s_X^2}{\sigma_X^2} \sim \chi^2_{n-1} \tag{2.99}$$
A point estimator (later referred to as an estimator or an estimate) $\hat{\theta}$ is any statistic T(X) that aims at estimating some quantity θ. An estimate is called unbiased if:
$$E\left[\hat{\theta}\right] = \theta \tag{2.100}$$
The standard error of an estimator $\hat{\theta}$ is the standard deviation of its sampling distribution:
$$SE_{\hat{\theta}} = \sqrt{V\left[\hat{\theta}\right]} \tag{2.101}$$
The (1 − α) confidence interval (CI) [l; u] of an estimator $\hat{\theta}$ is defined as:
$$P[l \le \theta \le u] = 1 - \alpha \tag{2.102}$$
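where α is referred to as the significance level. For example, a significance level α = 0.05 corresponds to a 95% confidence interval [l; u]: if the random sample is repeated, 95% of the intervals constructed in this way contain θ. The notions of hypothesis testing offer attractive perspectives but fall outside the scope of this work. The interested reader is referred to Casella and Berger (1990, Chapter 8).
Example 2.9 (Sample Mean Confidence Interval) Example 2.7 continued. Recall:
$$\bar{X} \sim N\left(\mu_X, \frac{\sigma_X^2}{n}\right)$$
From Remark 2.2, let us define Z as:
$$Z = \frac{\bar{X} - E\left[\bar{X}\right]}{\sqrt{V\left[\bar{X}\right]}} = \sqrt{n}\, \frac{\bar{X} - \mu_X}{\sigma_X} \sim N\left(0, 1^2\right)$$
and the normal quantile $z_{1-\alpha}$ as:
$$z_{1-\alpha} = \Phi^{-1}(1 - \alpha)$$
Consider the following interval:
\begin{align}
P\left[z_{\frac{\alpha}{2}} \le Z \le z_{1-\frac{\alpha}{2}}\right] &= P\left[Z \le z_{1-\frac{\alpha}{2}}\right] - P\left[Z \le z_{\frac{\alpha}{2}}\right] \tag{2.107} \\
&= 1 - \alpha \tag{2.108}
\end{align}
which leads to:
\begin{align}
1 - \alpha &= P\left[z_{\frac{\alpha}{2}} \le Z \le z_{1-\frac{\alpha}{2}}\right] \tag{2.109} \\
&= P\left[z_{\frac{\alpha}{2}} \le \sqrt{n}\, \frac{\bar{X} - \mu_X}{\sigma_X} \le z_{1-\frac{\alpha}{2}}\right] \tag{2.110} \\
&= P\left[\bar{X} - \frac{\sigma_X}{\sqrt{n}}\, z_{1-\frac{\alpha}{2}} \le \mu_X \le \bar{X} - \frac{\sigma_X}{\sqrt{n}}\, z_{\frac{\alpha}{2}}\right] \tag{2.111}
\end{align}
Therefore, a (1 − α) CI of $\mu_X$ is given by:
$$\left[\bar{X} - \frac{\sigma_X}{\sqrt{n}}\, z_{1-\frac{\alpha}{2}} \;;\; \bar{X} - \frac{\sigma_X}{\sqrt{n}}\, z_{\frac{\alpha}{2}}\right]$$
Example 2.10 (Sample Variance Confidence Interval) Example 2.8 continued. Recall:
$$(n-1)\, \frac{s_X^2}{\sigma_X^2} \sim \chi^2_{n-1}$$
Let us define the chi-squared quantile $c^{n-1}_{1-\alpha}$ as:
$$c^{n-1}_{1-\alpha} = F^{-1}_{\chi^2_{n-1}}(1 - \alpha)$$
where $F^{-1}_{\chi^2_{n-1}}$ is the IDF of the chi-squared distribution with n − 1 degrees of freedom. Therefore, it follows that:
\begin{align}
1 - \alpha &= P\left[c^{n-1}_{\frac{\alpha}{2}} \le (n-1)\, \frac{s_X^2}{\sigma_X^2} \le c^{n-1}_{1-\frac{\alpha}{2}}\right] \tag{2.115} \\
&= P\left[\frac{(n-1)\, s_X^2}{c^{n-1}_{1-\frac{\alpha}{2}}} \le \sigma_X^2 \le \frac{(n-1)\, s_X^2}{c^{n-1}_{\frac{\alpha}{2}}}\right] \tag{2.116}
\end{align}
A (1 − α) CI of $\sigma_X$ therefore reads:
$$\left[\sqrt{\frac{(n-1)\, s_X^2}{c^{n-1}_{1-\frac{\alpha}{2}}}} \;;\; \sqrt{\frac{(n-1)\, s_X^2}{c^{n-1}_{\frac{\alpha}{2}}}}\right]$$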
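The confidence interval of the mean derived in Example 2.9 can be sketched in Python with the standard library, where `statistics.NormalDist().inv_cdf` plays the role of $\Phi^{-1}$. The function name, the data and the assumption of a known standard deviation of 1 are illustrative. The chi-squared interval of Example 2.10 is not sketched because the standard library provides no chi-squared IDF.

```python
import math
from statistics import NormalDist


def mean_confidence_interval(x, sigma, alpha=0.05):
    """(1 - alpha) CI of the mean of a Gaussian sample with known standard deviation."""
    n = len(x)
    x_bar = sum(x) / n
    z_low = NormalDist().inv_cdf(alpha / 2.0)         # z_{alpha/2}, negative
    z_high = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{1-alpha/2}, positive
    scale = sigma / math.sqrt(n)
    return x_bar - scale * z_high, x_bar - scale * z_low


data = [1.2, 2.9, 2.1, 1.7, 2.4, 2.0, 1.5, 2.6, 2.2, 1.4]  # illustrative values
lower, upper = mean_confidence_interval(data, sigma=1.0)
```

Because the standard normal distribution is symmetric, $z_{\alpha/2} = -z_{1-\alpha/2}$ and the interval is centered on the sample mean.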
Maximum Likelihood Estimate
The likelihood function gives information about how likely a hyper-parameter θ is to produce an observed random sample X. In other words, let:
$$X \sim f_X\left(x|\theta^{\star}\right) \tag{2.118}$$
where $\theta^{\star}$ is unknown. In addition, consider an observed random sample X. The likelihood is defined as:
$$L(\theta|X) = \prod_{i=1}^{n} f_X\left(x^{(i)}|\theta\right)$$
The θ value that maximizes the likelihood function is therefore the most likely to be $\theta^{\star}$. This is the definition of the maximum likelihood estimate (MLE):
$$\hat{\theta}(X) = \arg\max_{\theta} L(\theta|X) \tag{2.120}$$
Example 2.11 (MLE of a Gaussian Mean) Consider a random variable X such that:
$$X \sim N\left(2, 1^2\right)$$
For the purpose of this demonstration, let us assume the standard deviation $\sigma_X = 1$ is known. On the other hand, the mean $\mu^{\star} = 2$ is unknown. Given a random sample of size n, the likelihood is defined as:
$$L(\mu|X) = \prod_{i=1}^{n} \phi\left(x^{(i)} - \mu\right)$$
Figure 2.5 shows plots of the likelihood for four different samples of size n = 10, n = 100 and n = 1000. Table 2.1 groups the MLE for the 12 sub-cases. As n increases, the variability in the estimate decreases. This clearly highlights a rather intuitive idea: the more data is available, the more accurate the estimate.
Figure 2.5: Maximum likelihood estimate illustration using four different random samples of sizes n = 10, n = 100, and n = 1000. Panels: (a) n = 10; (b) n = 100; (c) n = 1000. Vertical axis: ln L(µ|X). Legend: Sample 1, Sample 2, Sample 3, Sample 4.
Table 2.1: Maximum likelihood estimate values for Example 2.11.
            Sample 1   Sample 2   Sample 3   Sample 4
n = 10        2.63       2.71       2.34       1.58
n = 100       1.98       1.85       1.90       1.98
n = 1000      2.03       2.01       2.04       2.02
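The setting of Example 2.11 can be reproduced with a brute-force maximization of the log-likelihood, as sketched below in Python. The grid bounds, the grid resolution and the seed are assumptions of this sketch; the sample is therefore not the one used in Table 2.1. The logarithm of the likelihood is maximized instead of the likelihood itself to avoid numerical underflow of the product for large n.

```python
import math
import random


def log_likelihood(mu, x):
    """ln L(mu | x) for a Gaussian sample with known unit standard deviation."""
    return sum(-0.5 * math.log(2.0 * math.pi) - 0.5 * (v - mu) ** 2 for v in x)


def grid_mle(x, lower=0.0, upper=4.0, steps=4001):
    """Maximize the log-likelihood over a regular grid of candidate means."""
    grid = [lower + (upper - lower) * k / (steps - 1) for k in range(steps)]
    return max(grid, key=lambda mu: log_likelihood(mu, x))


rng = random.Random(1)
sample = [rng.gauss(2.0, 1.0) for _ in range(100)]
mu_hat = grid_mle(sample)            # numerical MLE
x_bar = sum(sample) / len(sample)    # sample mean, for comparison
```

For this particular model the log-likelihood is quadratic in µ, so the numerical MLE coincides with the sample mean up to the grid resolution.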
Bayes Estimate
In some situations, relying solely on observation (in the form of a random sample X) can be sub-optimal. Such situations typically occur when expert knowledge or prior belief is available. Treating θ as a random variable, the Bayes formula (Remark 2.9) allows one to account for both prior knowledge and observed data:
$$f_{\theta|X}(\theta|X) = \frac{L(\theta|X)\, f_{\theta}(\theta)}{\int_{-\infty}^{+\infty} L(\theta|X)\, f_{\theta}(\theta)\,d\theta} \tag{2.123}$$
where:
1. $f_{\theta|X}(\theta|X)$ is referred to as the posterior distribution, and represents the updated PDF of θ knowing X (in the form of a random sample X);
2. $L(\theta|X)$ is the likelihood as described in Section 2.2.4;
3. $f_{\theta}(\theta)$ is the prior distribution, and represents the prior knowledge;
4. $\int_{-\infty}^{+\infty} L(\theta|X)\, f_{\theta}(\theta)\,d\theta$ is a normalizing constant which represents the overall probability density to observe X.
The Bayes estimator is defined as the expectation of the posterior distribution:
$$\hat{\theta}(X) = E[\theta|X] = \int_{-\infty}^{+\infty} \theta\, f_{\theta|X}(\theta|X)\,d\theta \tag{2.124}$$
Remark 2.12 (Posterior distribution expression) In most applications of the Bayes concept, the following result is easier to use than (2.123):
$$f_{\theta|X}(\theta|X) \propto L(\theta|X)\, f_{\theta}(\theta)$$
This is typically useful when the posterior distribution is sampled using Markov-chain Monte Carlo (MCMC) techniques (Andrieu et al., 2003).
Example 2.12 (Bayes Estimate of a Gaussian Mean) Consider the setup from Example 2.11. Based on some previous experience, assume a prior distribution such that:
$$\theta \sim N\left(3, 0.5^2\right) \tag{2.126}$$
Figure 2.6 shows plots of the likelihood (size n = 10 sample), the prior distribution and the posterior distribution. In this situation, the MLE is 1.74 whereas the Bayes estimator is 2.09. Note that this example does not show superiority of the Bayes estimator over the MLE, but merely that the Bayes estimator balances out observed data (likelihood) with prior knowledge (prior distribution).
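For the specific setting of Example 2.12 (Gaussian prior, Gaussian likelihood with known standard deviation), the posterior is itself Gaussian and its mean is a precision-weighted average of the prior mean and the sample mean. This closed form is a standard conjugacy result that is not stated in the original text; the Python sketch below applies it to the values quoted in the example, under the assumption that the quoted MLE is the sample mean.

```python
def gaussian_posterior(prior_mean, prior_std, sigma, n, x_bar):
    """Posterior mean and standard deviation of a Gaussian mean.

    Assumes a Gaussian prior on the mean and a Gaussian likelihood with known
    standard deviation sigma, in which case the posterior is also Gaussian.
    """
    prior_precision = 1.0 / prior_std ** 2
    data_precision = n / sigma ** 2
    post_precision = prior_precision + data_precision
    post_mean = (prior_precision * prior_mean + data_precision * x_bar) / post_precision
    return post_mean, post_precision ** -0.5


# Values quoted in Example 2.12: prior N(3, 0.5^2), n = 10, MLE (sample mean) 1.74
post_mean, post_std = gaussian_posterior(3.0, 0.5, 1.0, 10, 1.74)
```

The resulting posterior mean is about 2.10, close to the Bayes estimate of 2.09 reported in the example, and it lies between the MLE (1.74) and the prior mean (3), closer to the MLE because the ten observations carry more weight than the prior.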
Monte Carlo Estimate
Figure 2.6: Plots of the likelihood, and prior and posterior distribution for Example 2.12. Panels: (a) Likelihood; (b) Prior; (c) Posterior.
A large part of this work consists in estimating high-dimensional integrals. Various quadrature approaches (Davis and Rabinowitz, 2007) have been developed (e.g., Simpson's rule). However, in high dimension, Monte Carlo techniques are often the most accurate. For the sake of simplicity and without loss of generality, consider the univariate case:
\begin{align}
I &= \int_{-\infty}^{+\infty} g(x)\, f_X(x)\,dx \tag{2.127} \\
&= E[g(X)]
\end{align}
The sample mean defined in (2.84) is an unbiased estimator of the expectation. Therefore, given a random sample X, the crude Monte Carlo (CMC) estimate of I is defined as:
$$\hat{I} = \frac{1}{n} \sum_{i=1}^{n} g\left(x^{(i)}\right)$$
The standard error of this estimator is:
\begin{align}
SE_{\hat{I}} &= \sqrt{V\left[\hat{I}\right]} \tag{2.130} \\
&= \frac{\sigma_{g(X)}}{\sqrt{n}} \approx \frac{s_{g(X)}}{\sqrt{n}} \tag{2.131}
\end{align}
The Central Limit Theorem (Rice, 2006) also states that as n → +∞, the sampling distribution of $\hat{I}$ tends to a Gaussian:
$$\hat{I} \sim N\left(I, \frac{\sigma_{g(X)}^2}{n}\right) \tag{2.132}$$
Using Example 2.9, a (1 − α) CI of I is given by:
$$\left[\hat{I} - \frac{s_{g(X)}}{\sqrt{n}}\, z_{1-\frac{\alpha}{2}} \;;\; \hat{I} - \frac{s_{g(X)}}{\sqrt{n}}\, z_{\frac{\alpha}{2}}\right]$$
Remark 2.13 (Deterministic Integral) This section introduced the CMC estimate for integrals of the form:
$$\int_{-\infty}^{+\infty} g(x)\, f_X(x)\, dx$$
However, any regular integral can be expressed in this form. Consider the following integral:
$$I = \int_{a}^{b} g(x)\, dx \tag{2.135}$$
It can always be rewritten as:
$$I = (b-a)\int_{a}^{b} g(x)\, \frac{1}{b-a}\, dx$$
where $\frac{1}{b-a}$ is the PDF of a uniform random variable $U$ such that:
$$U \sim \mathcal{U}(a, b)$$
Given a uniform random sample $U$, the CMC estimator reads:
$$\hat{I} = \frac{b-a}{n}\sum_{i=1}^{n} g\left(u^{(i)}\right)$$
and its standard error is:
$$SE_{\hat{I}} = \frac{(b-a)\,\sigma_{g(U)}}{\sqrt{n}}$$
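The CMC estimate of a deterministic integral, its standard error and the associated CI can be sketched in a few lines. This is only an illustrative sketch using the Python standard library: the function name `cmc_integral`, its arguments and the fixed seed are assumptions made for this example and are not part of the original text.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def cmc_integral(g, a, b, n, alpha=0.05, seed=0):
    """Crude Monte Carlo estimate of the integral of g over [a, b].

    Returns the estimate, its standard error and a (1 - alpha) CI.
    All names here are hypothetical and chosen for illustration.
    """
    rng = random.Random(seed)
    # Realizations g(u^(i)) with u^(i) drawn from U(a, b).
    values = [g(rng.uniform(a, b)) for _ in range(n)]
    estimate = (b - a) * fmean(values)
    # Standard error (b - a) * s_g / sqrt(n), s_g being the sample
    # standard deviation used in place of the unknown sigma_g.
    std_err = (b - a) * stdev(values) / math.sqrt(n)
    # z_{1 - alpha/2}; by symmetry z_{alpha/2} = -z_{1 - alpha/2}.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return estimate, std_err, (estimate - z * std_err, estimate + z * std_err)


if __name__ == "__main__":
    # Integral of sin over [0, pi], whose exact value is 2.
    est, se, ci = cmc_integral(math.sin, 0.0, math.pi, n=20_000)
    print(f"estimate={est:.4f}  SE={se:.4f}  CI=[{ci[0]:.4f}; {ci[1]:.4f}]")
```

The standard error decreases as $1/\sqrt{n}$ regardless of the dimension, which is the property that makes the approach attractive for high-dimensional integrals.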
For complex estimators, the sampling distribution is usually unknown. However, Efron (1979) introduced the bootstrap approach to approximate the sampling distribution of any estimator. The general idea is that, given a random sample $X$ representative of the population (i.e., the distribution of $X$), resamples of $X$ are also representative of the population. Resamples $X^\star$ are obtained by sampling $X$ with replacement, that is, $X^\star$ has the same size as $X$ and every element of $X^\star$ is randomly selected from the elements of $X$. Note that the elements of $X$ can be repeated. Using this idea, given an estimator $\hat{\theta} = T(X)$, as $X$ is resampled, multiple realizations $\hat{\theta}^{(i)} = T(X^\star)$ of the estimator can be obtained. Doing so, one obtains a bootstrapped random sample $\hat{\boldsymbol{\theta}}$ of the estimator:
$$\hat{\boldsymbol{\theta}} = \left[\hat{\theta}^{(1)}, \ldots, \hat{\theta}^{(b)}\right]^T$$
where $b$ is the bootstrap size. At the end of the bootstrap process, any conventional statistics can be applied (Section 2.2.2) to obtain the standard error and an empirical CI (bias-corrected and accelerated, Efron, 1987). More details about the bootstrap procedure and associated CI can be found in Efron and Tibshirani (1993).
Example 2.13 (Resamples Examples) Given a random sample $X$:
$$X = [1, 2, 3, 4]^T$$
Possible resamples $X^\star$ are:
$$[2, 4, 1, 4]^T, \qquad [3, 4, 4, 2]^T, \qquad [3, 2, 2, 2]^T$$
Example 2.14 (Bootstrap of the Sample Mean) Recall the sampling distribution from Example 2.7. Figure 2.7 shows an overlay of the bootstrapped sample mean histogram with its analytical sampling distribution, for a random sample of size $n = 100$. There is a clear bias between the two (due to the lack of information); however, the standard error (i.e., the standard deviation) is very close.
Figure 2.7: Overlay of bootstrapped sample mean histogram and its sampling distribution, for a random sample of size $n = 100$.
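The resampling procedure can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names `bootstrap` and `percentile_ci` are hypothetical, and the interval returned is the plain percentile interval, a simpler stand-in for the bias-corrected and accelerated CI mentioned in the text.

```python
import random
from statistics import fmean, stdev


def bootstrap(sample, estimator, b=2000, seed=0):
    """Return b bootstrap realizations of estimator(sample)."""
    rng = random.Random(seed)
    n = len(sample)
    # Each resample has the same size n as the original sample and is
    # drawn with replacement, so elements may be repeated.
    return [estimator(rng.choices(sample, k=n)) for _ in range(b)]


def percentile_ci(values, alpha=0.05):
    """Empirical percentile CI (not the BCa interval of Efron, 1987)."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[int(alpha / 2.0 * last)], ordered[int((1.0 - alpha / 2.0) * last)]


if __name__ == "__main__":
    rng = random.Random(42)
    x = [rng.gauss(0.0, 1.0) for _ in range(100)]
    theta = bootstrap(x, fmean)
    print("bootstrap SE of the mean:", stdev(theta))
    print("analytical SE estimate  :", stdev(x) / len(x) ** 0.5)
    print("95% percentile CI       :", percentile_ci(theta))
```

For the sample mean, the bootstrap standard error should land close to $s_X/\sqrt{n}$, which mirrors the observation made in Example 2.14.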
Sensitivity Analysis
In the two previous sections of this chapter, the basic elements of probability theory and statistical inference needed for this work have been introduced. The three remaining sections of this chapter are concerned with using these theoretical tools to develop design and decision making approaches. As discussed in Section 1.1, this work is concerned with the analysis of a computationally expensive numerical model $\mathcal{M}$ defined as:
$$Y = \mathcal{M}(X)$$
where $X$ and $Y$ are referred to as input and output variables respectively. The size of $X$, $d$, is referred to as the dimension of the problem. Intuitively, the lower the dimension, the easier the problem (e.g., graphical representation). For example, for a unit hyper-cube $I^d$, its longest diagonal is equal to $\sqrt{d}$. Therefore, as the dimension increases, the amount of information needed for an equivalent understanding increases too. This is referred to as the curse of dimensionality (Bellman, 1961). The curse of dimensionality has been shown to arise in engineering setups such as reliability assessment (Section 2.4) and RBDO (Section 2.5). The main purpose of sensitivity analysis is to identify the input variables that are deemed non-influential and remove them from the problem. This is referred to as dimensionality reduction. Therefore, sensitivity analysis consists of characterizing the influence of the input variables $X$ on the output variables $Y$. For the sake of simplicity and without loss of generality, the case of a univariate output $Y$ is considered. While the definition of sensitivity analysis is widely agreed upon, its mathematical translation is not. Three major interpretations of the sensitivities arise:
Correlation Analysis: The influence of an input variable $X_i$ on an output $Y$ is defined as the strength of their correlation (Section 2.3.1);
Differential Analysis: The influence of an input variable $X_i$ on an output $Y$ is defined through the partial derivative $\frac{\partial y}{\partial x_i}$, such as in design optimization (Section 2.3.2);
ANalysis Of VAriance: The influence of an input variable $X_i$ on an output $Y$ is defined through the contribution of $X_i$ to the global variance of $Y$ (Section 2.3.3).
Remark 2.14 (Deterministic to Stochastic) In many engineering settings, the problem definition consists of deterministic variables over a given range. However, a deterministic variable $x$ such that:
$$x \in [l; u]$$
can be seen as a uniform random variable $X$ such that:
$$X \sim \mathcal{U}(l, u)$$
Correlation Analysis
Unlike the notion of expectation or variance, the concept of correlation is not unique. While the Pearson correlation coefficients were introduced as early as the 19th century, new measures of dependence are still introduced nowadays, such as the copula-based Randomized Dependence Coefficient (Lopez-Paz et al., 2013). For the purpose of this work, only the three most common correlation measures are introduced.
Pearson correlation coefficient
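The (Pearson) correlation coefficient as defined in (2.63) is arguably the most used measure of correlation:
$$\rho_{X_i Y} = \frac{E\left[\left(X_i - E[X_i]\right)\left(Y - E[Y]\right)\right]}{\sqrt{E\left[\left(X_i - E[X_i]\right)^2\right] E\left[\left(Y - E[Y]\right)^2\right]}}$$
Using the CMC estimate (2.129) of the expectation, a CMC estimate of $\rho_{X_i Y}$ using $W = [X_i, Y]$ is:
$$\hat{\rho}_{X_i Y} = \frac{\sum_{j=1}^{n}\left(x_i^{(j)} - \bar{X}_i\right)\left(y^{(j)} - \bar{Y}\right)}{\sqrt{\sum_{j=1}^{n}\left(x_i^{(j)} - \bar{X}_i\right)^2 \sum_{j=1}^{n}\left(y^{(j)} - \bar{Y}\right)^2}} \tag{2.149}$$
which can be simplified as:
$$\hat{\rho}_{X_i Y} = \frac{1}{n-1}\sum_{j=1}^{n}\left(\frac{x_i^{(j)} - \bar{X}_i}{s_{X_i}}\right)\left(\frac{y^{(j)} - \bar{Y}}{s_Y}\right)$$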
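where $\bar{X}$ and $s_X$ are the sample mean (2.84) and standard deviation (2.85) based on $X$ respectively. Deriving a general sampling distribution of this estimate is not straightforward. However, Fisher (1915) introduced the so-called Fisher transformation:
$$F(\rho) = \operatorname{arctanh}(\rho)$$
and showed that:
$$Z = F\left(\hat{\rho}_{X_i Y}\right) \sim \mathcal{N}\left(F\left(\rho_{X_i Y}\right), \frac{1}{n-3}\right)$$
Using Example 2.9, a $(1-\alpha)$ CI of $Z$ can be derived:
$$\left[F\left(\hat{\rho}_{X_i Y}\right) + \frac{z_{\frac{\alpha}{2}}}{\sqrt{n-3}}\ ;\ F\left(\hat{\rho}_{X_i Y}\right) + \frac{z_{1-\frac{\alpha}{2}}}{\sqrt{n-3}}\right]$$
and a $(1-\alpha)$ CI of $\rho_{X_i Y}$ follows as:
$$\left[\tanh\left(F\left(\hat{\rho}_{X_i Y}\right) + \frac{z_{\frac{\alpha}{2}}}{\sqrt{n-3}}\right)\ ;\ \tanh\left(F\left(\hat{\rho}_{X_i Y}\right) + \frac{z_{1-\frac{\alpha}{2}}}{\sqrt{n-3}}\right)\right] \tag{2.154}$$
The Pearson correlation coefficients are also referred to as linear, as illustrated in Example 2.16.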
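The Pearson estimate and its Fisher-transformed CI can be sketched as below. The function name `pearson_with_ci` is an assumption made for this illustration, and the guard for $|\hat{\rho}| = 1$ (where arctanh diverges) is an implementation choice rather than something prescribed by the text.

```python
import math
from statistics import NormalDist, fmean, stdev


def pearson_with_ci(x, y, alpha=0.05):
    """Pearson correlation estimate and its (1 - alpha) Fisher CI.

    Hypothetical helper; requires n > 3 because of the 1/(n - 3) variance.
    """
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    s_x, s_y = stdev(x), stdev(y)
    # Standardized form: mean product of the standardized deviations.
    rho = sum((xi - x_bar) / s_x * (yi - y_bar) / s_y
              for xi, yi in zip(x, y)) / (n - 1)
    # Round-off can push |rho| marginally above 1.
    rho = max(-1.0, min(1.0, rho))
    if abs(rho) == 1.0:
        return rho, (rho, rho)  # arctanh is infinite here
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z / math.sqrt(n - 3)
    f = math.atanh(rho)  # Fisher transformation
    return rho, (math.tanh(f - half_width), math.tanh(f + half_width))


if __name__ == "__main__":
    rho, ci = pearson_with_ci([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
    print(rho, ci)
```

Because the interval is built in the transformed space and mapped back with tanh, it always stays within $[-1, 1]$ and is generally asymmetric around $\hat{\rho}$.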
Spearman’s Rank Correlation Coefficients
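The Spearman's rank correlation coefficients (also referred to as Spearman's rho, Spearman, 1904) are an extension of the Pearson correlation coefficients to non-linear monotonic relationships. The random samples $X$ and $Y$ are converted to ranks $R_X$ and $R_Y$. The Spearman's rho is defined as:
$$\hat{\rho}_{X_i Y} = 1 - \frac{6}{n\left(n^2 - 1\right)}\sum_{j=1}^{n}\left(r_{X_i}^{(j)} - r_{Y}^{(j)}\right)^2 \tag{2.155}$$
One can extend the Fisher transformation to Spearman's rho (Fieller et al., 1957):
$$Z = F\left(\hat{\rho}_{X_i Y}\right) \sim \mathcal{N}\left(F\left(\rho_{X_i Y}\right), \frac{1.06}{n-3}\right) \tag{2.156}$$
which leads to a $(1-\alpha)$ CI of $\rho_{X_i Y}$ defined as:
$$\left[\tanh\left(F\left(\hat{\rho}_{X_i Y}\right) + z_{\frac{\alpha}{2}}\, SE_Z\right)\ ;\ \tanh\left(F\left(\hat{\rho}_{X_i Y}\right) + z_{1-\frac{\alpha}{2}}\, SE_Z\right)\right] \tag{2.157}$$
where:
$$SE_Z = \sqrt{\frac{1.06}{n-3}}$$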
Note however that the CI (2.157) is only exact for $\rho_{X_i Y} = 0$ and $n \to \infty$.
Example 2.15 (Rank Definition) Consider a random sample $X$ such that:
$$X = [5, 8, 1, 6, 2]^T$$
Its associated ranks $R_X$ are defined as:
$$R_X = [3, 5, 1, 4, 2]^T$$
In case of tied ranks, the averaged rank is used instead. Consider a random sample $X$ such that:
$$X = [3, 8, 1, 3, 8, 8, 9]^T$$
The realization 3 would have both ranks 2 and 3, while the realization 8 would have ranks 4, 5, and 6. Therefore the ranks $R_X$ are defined as:
$$R_X = \left[\frac{2+3}{2}, \frac{4+5+6}{3}, 1, \frac{2+3}{2}, \frac{4+5+6}{3}, \frac{4+5+6}{3}, 7\right]^T \tag{2.162}$$
$$R_X = [2.5, 5, 1, 2.5, 5, 5, 7]^T \tag{2.163}$$
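The rank conversion with averaged ties, and the Spearman's rho of (2.155) built on it, can be sketched as follows. The helper names `average_ranks` and `spearman_rho` are hypothetical; note that the closed form (2.155) is only exact in the absence of ties.

```python
def average_ranks(sample):
    """1-based ranks of sample, tied values receiving their averaged rank."""
    order = sorted(range(len(sample)), key=lambda i: sample[i])
    ranks = [0.0] * len(sample)
    start = 0
    while start < len(order):
        # Extend [start, end] over the group of equal values.
        end = start
        while end + 1 < len(order) and sample[order[end + 1]] == sample[order[start]]:
            end += 1
        averaged = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = averaged
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho from the rank differences, as in (2.155)."""
    n = len(x)
    r_x, r_y = average_ranks(x), average_ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(r_x, r_y))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


if __name__ == "__main__":
    print(average_ranks([5, 8, 1, 6, 2]))
    print(average_ranks([3, 8, 1, 3, 8, 8, 9]))
    print(spearman_rho([1, 2, 3, 4, 5], [1, 8, 27, 64, 125]))
```

The two calls to `average_ranks` reproduce the two samples of Example 2.15, and the last call illustrates that a monotonic but non-linear relationship still yields a perfect rank correlation.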
Kendall Rank Correlation Coefficients
The Kendall rank correlation coefficients (also referred to as Kendall tau, Kendall, 1938) are another non-linear correlation measure, based on concordant and discordant pairs. Several variations of the Kendall tau exist. Only the so-called $\tau_B$ is presented (Agresti, 2010). For the ranks $R_X$ and $R_Y$ presented in the previous section, any pair:
$$\left(r_{X_i}^{(j)}; r_{Y}^{(j)}\right) \text{ and } \left(r_{X_i}^{(k)}; r_{Y}^{(k)}\right), \quad \text{where } j \neq k$$
is said to be:
• concordant if: $r_{X_i}^{(j)} > r_{X_i}^{(k)}$ and $r_{Y}^{(j)} > r_{Y}^{(k)}$, or, $r_{X_i}^{(j)} < r_{X_i}^{(k)}$ and $r_{Y}^{(j)} < r_{Y}^{(k)}$
• discordant if: $r_{X_i}^{(j)} < r_{X_i}^{(k)}$ and $r_{Y}^{(j)} > r_{Y}^{(k)}$, or, $r_{X_i}^{(j)} > r_{X_i}^{(k)}$ and $r_{Y}^{(j)} < r_{Y}^{(k)}$
• tied if: $r_{X_i}^{(j)} = r_{X_i}^{(k)}$ or $r_{Y}^{(j)} = r_{Y}^{(k)}$
The $\tau_B$ is defined as:
$$\hat{\tau}_{X_i Y} = \frac{n_c - n_d}{\sqrt{\left(n_0 - n_1\right)\left(n_0 - n_2\right)}}$$
where $n_c$ and $n_d$ are the number of concordant and discordant pairs respectively and:
$$n_0 = \frac{n(n-1)}{2}, \qquad n_1 = \sum_{j}\frac{t_j\left(t_j - 1\right)}{2}, \qquad n_2 = \sum_{k}\frac{u_k\left(u_k - 1\right)}{2}$$
where $t_j$ and $u_k$ are the number of tied values in the $j$th and $k$th group of ties of $R_{X_i}$ and $R_Y$ respectively. Due to its definition, the Kendall tau calculation is more computationally involved than its Pearson and Spearman counterparts. The sampling distribution of the Kendall tau is not straightforward. However, for $\tau_{X_i Y} = 0$, it is common (Prokhorov, 2001) to approximate it by a Gaussian distribution with zero mean and standard deviation:
$$\sigma_{\tau_n} = \sqrt{\frac{2(2n+5)}{9n(n-1)}} \tag{2.173}$$
This leads to a $(1-\alpha)$ CI defined as:
$$\left[\hat{\tau}_{X_i Y} + z_{\frac{\alpha}{2}}\,\sigma_{\tau_n}\ ;\ \hat{\tau}_{X_i Y} + z_{1-\frac{\alpha}{2}}\,\sigma_{\tau_n}\right]$$
Although theoretically restricted to $\tau_{X_i Y} = 0$, this CI can offer an approximation for the purpose of sensitivity analysis. A bootstrapped CI, as discussed in Section 2.2.7, is another alternative.
Example 2.16 (Correlation Coefficients Demonstration) In order to illustrate the differences between the correlation coefficients, the three problems depicted in Figure 2.8 and defined below are considered:
Example SA.1: $Y = X$
Example SA.2: $Y =$ [right-hand side missing in the source] (2.176)
Example SA.3: $Y = 0.1 \sin\left(100 X_1\right) + 2 \cos\left(2\pi X_2\right)$ (2.177)
with $X, X_1, X_2 \sim \mathcal{U}(0, 1)$.
Figure 2.8: Illustration of sensitivity examples used for comparison. Panels: (a) Example SA.1. (b) Example SA.2. (c) Example SA.3.
Table 2.2: Correlation coefficients of the demonstrative examples ($n = 10^4$).
Example  Quantity              Pearson           Spearman          Kendall
SA.1     $\hat{\rho}_{XY}$     1.00              1.00              1
         95% CI                [1.00; 1.00]      [1.00; 1.00]      [0.99; 1.01]
SA.2     $\hat{\rho}_{XY}$     −0.92             −1.00             −1
         95% CI                [−0.93; −0.92]    [−1.00; −1.00]    [−1.01; −0.99]
SA.3     $\hat{\rho}_{X_1 Y}$  0.01              0.01              −0.01
         95% CI                [−0.01; 0.03]     [−0.01; 0.03]     [−0.01; 0.02]
         $\hat{\rho}_{X_2 Y}$  0.01              0.01              0.01
         95% CI                [−0.01; 0.03]     [−0.01; 0.03]     [−0.00; 0.02]
Table 2.2 shows the correlation coefficients and the 95% CI, using random samples of size $n = 10^4$. It clearly highlights why the Pearson coefficients are referred to as linear: they miss the perfect correlation in the non-linear Example SA.2 (2.176). In addition, all three coefficients are impaired by non-monotonic behavior. On Example SA.3 (2.177), all coefficients would point to the conclusion that $X_1$ and $X_2$ are non-influential on $Y$.
Remark 2.15 (Correlation Coefficient Computational Cost) For a random sample $X$ of size $n$, the computational cost of all three correlation coefficients is $n$ function calls.
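While all three coefficients require the same $n$ function calls, the post-processing differs. The Kendall $\tau_B$ of the previous subsection can be sketched by direct enumeration of all pairs, which costs $O(n^2)$ comparisons. The function names below are hypothetical, and the brute-force pair loop is an illustrative choice, not an efficient implementation.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall tau-b by enumeration of the n(n-1)/2 pairs.

    Comparing raw values or their ranks gives the same pair
    classification, so no explicit rank conversion is needed.
    """
    n = len(x)
    n_c = n_d = tied_x = tied_y = 0
    for j, k in combinations(range(n), 2):
        dx, dy = x[j] - x[k], y[j] - y[k]
        if dx == 0:
            tied_x += 1
        if dy == 0:
            tied_y += 1
        if dx * dy > 0:
            n_c += 1  # concordant pair
        elif dx * dy < 0:
            n_d += 1  # discordant pair
    n_0 = n * (n - 1) // 2
    # tied_x equals n_1: a group of t tied values yields t(t-1)/2 tied
    # pairs; likewise tied_y equals n_2. Fails if a sample is constant.
    return (n_c - n_d) / math.sqrt((n_0 - tied_x) * (n_0 - tied_y))


def kendall_null_std(n):
    """Standard deviation (2.173) of the estimate when tau = 0."""
    return math.sqrt(2.0 * (2.0 * n + 5.0) / (9.0 * n * (n - 1.0)))


if __name__ == "__main__":
    print(kendall_tau_b([1, 2, 2, 3], [1, 2, 3, 4]))
    print(kendall_null_std(10))
```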
Differential Analysis
Another way to characterize the influence of the inputs $X$ on the output $Y$ is to analyze the behavior of the partial derivatives. This is done, for example, in design optimization, where the gradient of the objective is often referred to as the sensitivities.
Elementary Effects
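When introduced by Morris (1991), the elementary effects (EE) $\Delta_i$ were defined as:
$$\Delta_i(x) = \frac{\mathcal{M}\left(x_1, \ldots, x_{i-1}, x_i + \delta, x_{i+1}, \ldots, x_d\right) - \mathcal{M}(x)}{\delta} \tag{2.179}$$
where $\delta$ is some perturbation. As $\delta \to 0$, the EE are the forward finite difference of $\mathcal{M}$:
$$\Delta_i \underset{\delta \to 0}{\approx} \frac{\partial y}{\partial x_i} \tag{2.180}$$
A convenient way to express the two outputs of the EE method in the probabilistic framework is:
$$\Delta_i = \frac{\partial Y}{\partial X_i} \tag{2.181}$$
$$\mu_{\Delta_i} = E\left[\Delta_i\right] \tag{2.182}$$
$$\sigma_{\Delta_i} = \sqrt{V\left[\Delta_i\right]} \tag{2.183}$$
In words, $\mu_{\Delta_i}$ represents the average partial derivative of $Y$ with respect to $X_i$. Therefore, it is typically used as a measure of the influence of $X_i$ over $Y$. On the other hand, $\sigma_{\Delta_i}$ represents the variation of the partial derivatives over the domain. This can only be due to non-linearities or interactions between input variables $X$. For a variable $X_i$ to be classified as non-influential, both $\mu_{\Delta_i}$ and $\sigma_{\Delta_i}$ must be low. Given a random sample $X$, $\partial X$ is defined as:
$$\partial X = \begin{bmatrix} \left.\frac{\partial Y}{\partial X_1}\right|_{x^{(1)}} & \cdots & \left.\frac{\partial Y}{\partial X_d}\right|_{x^{(1)}} \\ \vdots & \ddots & \vdots \\ \left.\frac{\partial Y}{\partial X_1}\right|_{x^{(n)}} & \cdots & \left.\frac{\partial Y}{\partial X_d}\right|_{x^{(n)}} \end{bmatrix} \tag{2.184}$$
and is obtained either from a model providing the gradients or through finite differences of $X$. The measures (2.182) and (2.183) are estimated using the sample mean (2.84) and sample standard deviation (2.85):
$$\bar{\Delta}_i = \frac{1}{n}\sum_{j=1}^{n}\left.\frac{\partial Y}{\partial X_i}\right|_{x^{(j)}}$$
$$s_{\Delta_i} = \sqrt{\frac{1}{n-1}\sum_{j=1}^{n}\left(\left.\frac{\partial Y}{\partial X_i}\right|_{x^{(j)}} - \bar{\Delta}_i\right)^2}$$
Using Example 2.9, a $(1-\alpha)$ CI of $\mu_{\Delta_i}$ reads:
$$\left[\bar{\Delta}_i - \frac{s_{\Delta_i}}{\sqrt{n}}\, z_{1-\frac{\alpha}{2}}\ ;\ \bar{\Delta}_i - \frac{s_{\Delta_i}}{\sqrt{n}}\, z_{\frac{\alpha}{2}}\right]$$
Using Example 2.10, a $(1-\alpha)$ CI of $\sigma_{\Delta_i}$ reads:
$$\left[\sqrt{\frac{(n-1)\, s^2_{\Delta_i}}{c^{n-1}_{1-\frac{\alpha}{2}}}}\ ;\ \sqrt{\frac{(n-1)\, s^2_{\Delta_i}}{c^{n-1}_{\frac{\alpha}{2}}}}\right]$$
Note that this CI is exact only if:
$$\Delta_i = \frac{\partial Y}{\partial X_i} \sim \mathcal{N}\left(\mu_{\Delta_i}, \sigma^2_{\Delta_i}\right)$$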
Such an assumption is usually not satisfied and bootstrapped CI, as discussed in Section 2.2.7, might be preferred. When working with non-monotonic functions, such as periodic ones, the EE method averages out partial derivative realizations of opposite signs, artificially lowering their perceived influence.
Derivative-based Global Sensitivity Measures
Recently, extensions of the EE method, often referred to as derivative-based global sensitivity measures (DGSM), have been proposed by Campolongo et al. (2007):
$$\Delta_i^1 = \left|\frac{\partial Y}{\partial X_i}\right| \tag{2.190}$$
$$\mu_{\Delta_i^1} = E\left[\Delta_i^1\right] \tag{2.191}$$
where $\Delta_i^0 = \Delta_i$, and by Sobol' and Kucherenko (2009):
$$\Delta_i^2 = \left(\frac{\partial Y}{\partial X_i}\right)^2 \tag{2.192}$$
$$\mu_{\Delta_i^2} = E\left[\Delta_i^2\right] \tag{2.193}$$
Estimators and $(1-\alpha)$ CI can be derived for these two cases as for the elementary effects above. For more details, the interested reader is referred to Iooss et al. (2012); Lamboni et al. (2013).
Table 2.3: Differential analysis of the Example SA.3 ($n = 10^4$).
$j$                      0        1       2
$\bar{\Delta}^j_1$       −0.02    6.23    49.07
$\bar{\Delta}^j_2$       −0.17    7.98    78.51
$s_{\Delta^j_1}$         7.01
$s_{\Delta^j_2}$         8.86
Example 2.17 (Differential Analysis Demonstration) In order to illustrate the differences between the EE method and DGSM approaches, Table 2.3 shows the sensitivity measures, using a random sample of size $n = 10^4$, for Example SA.3 (2.177). For the EE method ($\Delta_i$), both $\bar{\Delta}_i$ are relatively low, due to periodicity. However, both $s_{\Delta_i}$ are rather high, highlighting either non-linearities or interactions (in this case, non-linearities). In addition, for both DGSM approaches, the inputs $X_1$ and $X_2$ have roughly the same influence over the output $Y$.
Remark 2.16 (Differential Analysis Computational Cost) For a random sample $X$ of dimension $d$ and size $n$, the computational cost of all three differential approaches is $n(d+1)$ function calls using finite differences.
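The three differential measures can be estimated together from a single set of forward finite differences, as sketched below for Example SA.3. The function names, the perturbation `delta=1e-6` and the returned dictionary layout are assumptions made for this illustration; the call counter is included only to make the $n(d+1)$ cost of Remark 2.16 visible.

```python
import math
import random
from statistics import fmean, stdev


def model_sa3(x):
    """Example SA.3 (2.177)."""
    return 0.1 * math.sin(100.0 * x[0]) + 2.0 * math.cos(2.0 * math.pi * x[1])


def differential_measures(model, d, n, delta=1e-6, seed=0):
    """EE and DGSM estimates for inputs uniform on [0, 1]^d.

    Returns one dictionary per input and the number of model calls.
    """
    rng = random.Random(seed)
    calls = 0
    derivatives = [[] for _ in range(d)]
    for _ in range(n):
        x = [rng.random() for _ in range(d)]
        y0 = model(x)
        calls += 1
        for i in range(d):
            x_pert = list(x)
            x_pert[i] += delta
            # Forward finite difference, i.e. the elementary effect.
            derivatives[i].append((model(x_pert) - y0) / delta)
            calls += 1
    measures = []
    for column in derivatives:
        measures.append({
            "mean": fmean(column),                         # EE, j = 0
            "std": stdev(column),                          # EE spread
            "mean_abs": fmean([abs(v) for v in column]),   # DGSM, j = 1
            "mean_sq": fmean([v * v for v in column]),     # DGSM, j = 2
        })
    return measures, calls


if __name__ == "__main__":
    measures, calls = differential_measures(model_sa3, d=2, n=10_000)
    for i, m in enumerate(measures, start=1):
        print(f"X{i}:", {k: round(v, 2) for k, v in m.items()})
    print("function calls:", calls)
```

Since a different random sample is used, the printed values are only expected to be of the same order as those reported in Table 2.3.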
ANalysis Of VAriance
The analysis of variance (ANOVA) refers to a wide range of tools. Of interest in this work is the ANOVA decomposition (Archer et al., 1997). The ANOVA decomposition expresses a function $\mathcal{M}$ in a form that highlights the influence of every combination of variables such that:
$$\mathcal{M}(X) = \mathcal{M}_0 + \sum_{i=1}^{d}\mathcal{M}_i\left(X_i\right) + \sum_{i=1}^{d}\sum_{j=i+1}^{d}\mathcal{M}_{ij}\left(X_i, X_j\right) + \ldots + \mathcal{M}_{1,\ldots,d}(X)$$
which can be expressed using set notations as:
$$\mathcal{M}(X) = \mathcal{M}_0 + \sum_{\kappa \in I_d}\mathcal{M}_\kappa\left(X_\kappa\right) \tag{2.195}$$
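where the inputs $X$ are assumed independent and $I_d$ represents the set of all unique possible combinations made of $1, \ldots, d$ (e.g., $\{1\}$, $\{2\}$, $\{2, 5\}$, etc.). The decomposition (2.195) is called ANOVA decomposition if and only if:
$$\int_{-\infty}^{\infty}\mathcal{M}_\kappa\left(x_\kappa\right) f_X(x)\, dx_i = 0, \quad \forall \kappa \in I_d, \text{ and } \forall i \in \kappa \tag{2.196}$$
which ensures orthogonality of the individual terms of (2.195):
$$\int_{-\infty}^{\infty}\mathcal{M}_{\kappa_1}\left(x_{\kappa_1}\right)\mathcal{M}_{\kappa_2}\left(x_{\kappa_2}\right) f_X(x)\, dx = 0, \quad \forall \kappa_1, \kappa_2 \in I_d,\ \kappa_1 \neq \kappa_2 \tag{2.197}$$
The terms in (2.195) can then be expressed through:
$$\int_{-\infty}^{+\infty}\mathcal{M}(x) f_X(x)\, dx = E\left[\mathcal{M}(X)\right] = \mathcal{M}_0$$
$$\int_{-\infty}^{+\infty}\mathcal{M}(x) f_X(x)\prod_{k \neq i} dx_k = E\left[\mathcal{M}(X)\,|\,x_i\right] = \mathcal{M}_0 + \mathcal{M}_i\left(x_i\right)$$
$$\int_{-\infty}^{+\infty}\mathcal{M}(x) f_X(x)\prod_{k \neq i,j} dx_k = E\left[\mathcal{M}(X)\,|\,x_i, x_j\right] = \mathcal{M}_0 + \mathcal{M}_i\left(x_i\right) + \mathcal{M}_j\left(x_j\right) + \mathcal{M}_{ij}\left(x_i, x_j\right)$$
$$\vdots$$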
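One can therefore show (Sobol', 2001):
$$\int_{-\infty}^{+\infty}\mathcal{M}(x)^2 f_X(x)\, dx - \mathcal{M}_0^2 = V\left[\mathcal{M}(X)\right] = \sum_{\kappa \in I_d}\int_{-\infty}^{+\infty}\mathcal{M}_\kappa\left(x_\kappa\right)^2 f_X(x)\, dx_\kappa = \sum_{\kappa \in I_d} V_\kappa \tag{2.201}$$
where $V_\kappa$ represents the contribution of the set $\kappa$ to the variance of the function $\mathcal{M}$:
$$V_i = V\left[E\left[\mathcal{M}(X)\,|\,X_i\right]\right], \quad i = [1, \ldots, d] \tag{2.202}$$
$$V_{i,j} = V\left[E\left[\mathcal{M}(X)\,|\,X_i, X_j\right]\right] - V_i - V_j, \quad i, j = [1, \ldots, d],\ i < j \tag{2.203}$$
$$\vdots$$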
Using this equality, the so-called Sobol' indices (Sobol', 1990; Saltelli et al., 1993) are defined as:
$$S_\kappa = \frac{V_\kappa}{V\left[\mathcal{M}(X)\right]}$$
Sobol' indices are classified through their order, defined by the size of the set $\kappa$. The first order Sobol' indices are defined as:
$$S_i = \frac{V_i}{V\left[\mathcal{M}(X)\right]}, \quad i = [1, \ldots, d]$$
and represent the direct influence of the input $X_i$ on the variance of the model. The second order Sobol' indices are defined as:
$$S_{ij} = \frac{V_{ij}}{V\left[\mathcal{M}(X)\right]}, \quad i, j = [1, \ldots, d], \text{ such that } i < j$$
and represent the influence of the interaction between $X_i$ and $X_j$ on the variance of the model. Higher orders are defined in the same way. Finally, the $i$th total Sobol' index is defined as the sum of the Sobol' indices of all orders in which $i$ appears:
$$S_i^T = \sum_{\kappa \in I_d^i}\frac{V_\kappa}{V\left[\mathcal{M}(X)\right]}$$
where $I_d^i$ is made of all the sets in $I_d$ where $i$ appears. Various approaches have been proposed to compute Sobol' indices (Saltelli et al., 2000; Sobol', 2001). Saltelli (2002) introduced the most widely used numerical approach (Remark 2.17). Saltelli's method returns two estimates of the first and second order, and total Sobol' indices. Note that only bootstrapped CI are available for these estimates. For the case of dependent variables, the interested reader is referred to Caniou (2012, e.g., ANCOVA).
Table 2.4: ANOVA of the Example SA.3 ($n = 10^4$). Two estimates are reported for each index.
               $\hat{S}_i$       $\hat{S}_i^T$     $\hat{S}_{ij}$
$X_1$          0.00   0.00       0.00   0.00
$X_2$          1.00   1.00       1.00   1.00
$X_1, X_2$                                         0.00   0.00
Example 2.18 (ANOVA Demonstration) In order to illustrate the differences between the differential analysis and ANOVA, Table 2.4 shows the Sobol' indices, using random samples of size $n = 10^4$, for Example SA.3 (2.177). For ANOVA, the input $X_1$ is non-influential, whereas for DGSM (Table 2.3), inputs $X_1$ and $X_2$ had approximately the same influence. At this point, there is no clear consensus as to which conclusion is correct. It depends on how one defines the notion of sensitivity.
Remark 2.17 (ANOVA Computational Cost) For two random samples $X$ and $X'$ of dimension $d$ and size $n$, the computational cost of Sobol' indices calculation using Saltelli's method is $n(2d+2)$ function calls.
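A pick-and-freeze estimation of the first order and total Sobol' indices can be sketched as follows for Example SA.3. This is not Saltelli's (2002) scheme from Remark 2.17: it is a simplified variant that skips the second order indices, uses a commonly used first order estimator together with the Jansen estimator for the total index, and therefore costs $n(d+2)$ function calls. All function names are hypothetical.

```python
import math
import random
from statistics import fmean, pvariance


def model_sa3(x):
    """Example SA.3 (2.177)."""
    return 0.1 * math.sin(100.0 * x[0]) + 2.0 * math.cos(2.0 * math.pi * x[1])


def sobol_indices(model, d, n, seed=0):
    """First order and total Sobol' indices for inputs uniform on [0, 1]^d."""
    rng = random.Random(seed)
    # Two independent random samples, playing the role of X and X'.
    a = [[rng.random() for _ in range(d)] for _ in range(n)]
    b = [[rng.random() for _ in range(d)] for _ in range(n)]
    y_a = [model(row) for row in a]
    y_b = [model(row) for row in b]
    var_y = pvariance(y_a + y_b)
    first, total = [], []
    for i in range(d):
        # Sample A with its i-th column replaced by the one of B.
        y_ab = [model(ra[:i] + [rb[i]] + ra[i + 1:]) for ra, rb in zip(a, b)]
        # y_b and y_ab only share x_i: their covariance estimates V_i.
        v_i = fmean([yb * (yab - ya) for ya, yb, yab in zip(y_a, y_b, y_ab)])
        # y_a and y_ab share everything but x_i (Jansen estimator).
        v_t = 0.5 * fmean([(ya - yab) ** 2 for ya, yab in zip(y_a, y_ab)])
        first.append(v_i / var_y)
        total.append(v_t / var_y)
    return first, total


if __name__ == "__main__":
    s_first, s_total = sobol_indices(model_sa3, d=2, n=10_000)
    print("first order:", [round(s, 2) for s in s_first])
    print("total      :", [round(s, 2) for s in s_total])
```

For this example the variance of $Y$ is dominated by the cosine term, so the estimates for $X_2$ should be close to one and those for $X_1$ close to zero, in line with Table 2.4.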
Reliability Assessment
During an engineering design process, the decision maker must ensure the reliability of the project. In the traditional deterministic way, this would be achieved using safety factors (e.g., amplifying the loads during the design process). At the end of this process, a supposedly over-conservative design is obtained, which the decision maker hopes will withstand any variations in the manufacturing process or loading conditions. With this approach, the actual reliability of the project cannot be quantified. For this reason, this section details various approaches for reliability assessment. In reliability assessment, one is concerned with estimating a probability of failure $P_f$:
$$P_f(z, \theta) = \int_{\Omega_f(z)} f_X(x\,|\,\theta)\, dx$$
where $\Omega_f$ is referred to as the failure domain, $z$ are some design variables that control the failure domain, and $\theta$ are the distribution hyper-parameters of $X$. Until Section 2.5, the dependence notation on $z$ and $\theta$ is omitted:
$$P_f = \int_{\Omega_f} f_X(x)\, dx \tag{2.209}$$
The failure domain $\Omega_f$ can be defined either through:
• a limit state $g$ such that:
$$\Omega_f = \left\{x\,|\,g(x) \leq 0\right\}$$
This is referred to as component reliability.
• a combination of limit states $g_i$. This is referred to as system reliability. Common systems are series:
$$\Omega_f = \left\{x\,\Big|\,\bigvee_i g_i(x) \leq 0\right\} \tag{2.211}$$
and parallel:
$$\Omega_f = \left\{x\,\Big|\,\bigwedge_i g_i(x) \leq 0\right\}$$
where $\wedge$ and $\vee$ are the logical "and" and "or" operators respectively. For a parallel system, all limit states must fail for the system to fail. For a series system, if any limit state fails, the system fails.
Limit states are usually simple functions of the outputs $Y$ of the computationally expensive model $\mathcal{M}$ (Section 1.1). For example, failure might be defined as $Y$ being lower than $T$:
$$g(x) = Y - T, \qquad Y = \mathcal{M}(X)$$
Figure 2.9: Graphical representation of relevant elements for reliability assessment (limit state $g(X) = 0$; failure domain $\Omega_f$, where $g(X) \leq 0$; safe domain $\Omega_f^c$, where $g(X) > 0$; distribution $f_X$).
For component reliability, another expression of (2.209) reads:
$$P_f = \int_{-\infty}^{+\infty} I\left[g(x) \leq 0\right] f_X(x)\, dx \tag{2.214}$$
where $I$ is the indicator function defined as:
$$I\left[g(x) \leq 0\right] = \begin{cases} 1 & \text{if } g(x) \leq 0 \\ 0 & \text{else} \end{cases}$$
Figure 2.9 illustrates the different elements involved in a reliability assessment problem. Note that “limit state” refers either to the limit state function g or the limit state hyper-surface g (x) = 0.
Crude Monte Carlo
As $\mathcal{M}$ is a black-box model, the integral in (2.214) cannot be evaluated analytically. For a random sample $X$, a CMC estimate of (2.214), as defined in Section 2.2.6, reads:
$$\hat{P}_f = \frac{1}{n}\sum_{i=1}^{n} I\left[g\left(x^{(i)}\right) \leq 0\right]$$
Let us define the random variable $B$ as:
$$B = I\left[g(X) \leq 0\right]$$
$B$ takes value 0 and 1 when $g(X) > 0$ and $g(X) \leq 0$ respectively. By definition, the probabilities of $g(X) \leq 0$ and $g(X) > 0$ are $P_f$ and $1 - P_f$. Therefore, $B$ is a Bernoulli random variable:
$$B \sim \mathcal{B}\left(P_f\right), \qquad E[B] = P_f, \qquad V[B] = P_f\left(1 - P_f\right) \tag{2.218}$$
Figure 2.10: Iso-contours (1%, 10%, and 100%) of the $P_f$ CMC estimate coefficient of variation with respect to $n$ and $P_f$.
Using the Central Limit Theorem (Rice, 2006), the Bernoulli distribution moments (2.218), and the results from Section 2.2.6, it follows that:
$$\hat{P}_f \sim \mathcal{N}\left(P_f, \frac{P_f\left(1 - P_f\right)}{n}\right) \tag{2.219}$$
The error of the CMC estimate $\hat{P}_f$ is defined as its coefficient of variation:
$$cv_{\hat{P}_f} = \frac{\sigma_{\hat{P}_f}}{\mu_{\hat{P}_f}} = \sqrt{\frac{1 - P_f}{P_f\, n}} \tag{2.220}$$
Figure 2.10 shows a plot of this coefficient of variation, which illustrates a widely used rule of thumb in the engineering world. In order to correctly assess a probability of failure $P_f$, a random sample of size at least:
$$n = 10^{2 - \log_{10}\left(P_f\right)} \tag{2.221}$$
should be used (i.e., it ensures an error of about 10%). Using the same derivation as in Section 2.2.6, a $(1-\alpha)$ CI of $P_f$ reads:
$$\left[\hat{P}_f - \frac{\sqrt{P_f\left(1 - P_f\right)}}{\sqrt{n}}\, z_{1-\frac{\alpha}{2}}\ ;\ \hat{P}_f - \frac{\sqrt{P_f\left(1 - P_f\right)}}{\sqrt{n}}\, z_{\frac{\alpha}{2}}\right] \tag{2.222}$$
Provided a large enough sample size, the CMC estimate is always accurate. However, recall that each realization of the random sample is used to call the computationally expensive model $\mathcal{M}$. Hence, the numerical cost of a CMC estimate of $P_f$ is typically prohibitive for $P_f \leq 10^{-3}$.
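The CMC estimate of $P_f$, its coefficient of variation and the rule of thumb (2.221) can be sketched as below. The function names are hypothetical, the inputs are assumed to be independent standard Gaussian variables, and the unknown $P_f$ is replaced by $\hat{P}_f$ in the standard error, which is an assumption of this sketch.

```python
import math
import random
from statistics import NormalDist


def cmc_failure_probability(limit_state, d, n, alpha=0.05, seed=0):
    """CMC estimate of P_f for d independent standard Gaussian inputs."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(n):
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        if limit_state(u) <= 0.0:  # realization of the Bernoulli variable B
            failures += 1
    p_hat = failures / n
    # Plug-in standard error: P_f is replaced by its estimate.
    std_err = math.sqrt(p_hat * (1.0 - p_hat) / n)
    cov = std_err / p_hat if p_hat > 0.0 else math.inf
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return p_hat, cov, (p_hat - z * std_err, p_hat + z * std_err)


def rule_of_thumb_sample_size(p_f):
    """Sample size (2.221) giving a coefficient of variation of about 10%."""
    return math.ceil(10.0 ** (2.0 - math.log10(p_f)))


if __name__ == "__main__":
    # Linear limit state g(u) = 2 - u_1, for which P_f = Phi(-2).
    p_hat, cov, ci = cmc_failure_probability(lambda u: 2.0 - u[0], d=1, n=50_000)
    print(f"P_f estimate={p_hat:.5f}  exact={NormalDist().cdf(-2.0):.5f}  cv={cov:.3f}")
    print("n suggested for P_f = 1e-3:", rule_of_thumb_sample_size(1e-3))
```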
First Order Reliability Method
Although probability theory and Monte Carlo integration have been around for decades, CMC reliability assessment has not been applied to real-life engineering cases due to its prohibitive computational cost. It is only in the 1970s, with the seminal work of Hasofer and Lind (1974) and Rackwitz and Fiessler (1978), that reliability assessment started to be considered as a viable alternative to conventional deterministic design approaches.
First, consider the case of linear component reliability (a single linear limit state function $g$) with independent standard Gaussian variables $U = X$:
$$U_i \sim \mathcal{N}\left(0, 1^2\right), \quad i = [1, \ldots, d] \tag{2.223}$$
Under these assumptions, the mean value first order second moment (MVFOSM) approach computes the exact $P_f$ at a cost of $d + 1$ function calls. A linear function is defined as:
$$Y = g(U) = b + \sum_{i=1}^{d} a_i U_i$$
Recall that a sum of Gaussian random variables is Gaussian:
$$Y \sim \mathcal{N}\left(b, \sum_{i=1}^{d} a_i^2\right) \tag{2.225}$$
It then follows that:
$$P_f = P\left[Y \leq 0\right] = \Phi\left(\frac{-b}{\sqrt{\sum_{i=1}^{d} a_i^2}}\right)$$
where the coefficients $a_i$ and $b$ are typically obtained such that:
$$a_i = \left.\frac{\partial g}{\partial u_i}\right|_{u = 0}, \qquad b = g(0)$$
Figure 2.11: An example of non-linear limit state function $g(U)$ that has a linear hyper-surface $g(U) = 0$. Panels: (a) A non-linear limit state function. (b) A linear limit state hyper-surface.
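The MVFOSM calculation can be sketched in a few lines, using forward finite differences at the origin to obtain the coefficients. The function name `mvfosm` and the perturbation size are assumptions made for this illustration; the loop makes the $d + 1$ function calls explicit.

```python
import math
from statistics import NormalDist


def mvfosm(limit_state, d, delta=1e-6):
    """MVFOSM probability of failure in the standard Gaussian space.

    Uses d + 1 calls: one at the origin and one per finite difference.
    """
    origin = [0.0] * d
    b = limit_state(origin)
    a = []
    for i in range(d):
        u = list(origin)
        u[i] = delta
        a.append((limit_state(u) - b) / delta)
    norm_a = math.sqrt(sum(a_i * a_i for a_i in a))
    return NormalDist().cdf(-b / norm_a)


if __name__ == "__main__":
    # Linear limit state g(u) = 3 - u_1 - u_2, for which the result is exact.
    print(mvfosm(lambda u: 3.0 - u[0] - u[1], d=2))
    print(NormalDist().cdf(-3.0 / math.sqrt(2.0)))
```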
However, when the limit state function is not linear, MVFOSM can lead to large errors. A more robust approach is the so-called first order reliability method (FORM). Compared to MVFOSM, FORM relaxes the assumption of a linear limit state function $g(U)$ to only a linear limit state hyper-surface $g(U) = 0$. Figure 2.11 illustrates a situation where the FORM assumption holds whereas the MVFOSM one does not. FORM relies on the notion of most probable failure point (MPP). The MPP is defined as the point in the failure domain $\Omega_f$ that maximizes the joint PDF $\phi$ of $U$:
$$u_{MPP} = \arg\max_{u}\ \phi(u) \quad \text{s.t. } g(u) \leq 0 \tag{2.228}$$
As $\phi$ features a central symmetry, as shown in Figure 2.3(a), (2.228) reduces to:
$$u_{MPP} = \arg\min_{u}\ \frac{1}{2} u^T u \quad \text{s.t. } g(u) \leq 0 \tag{2.229}$$
The reliability index $\beta$ is defined as the algebraic distance between the origin and the MPP:
$$\beta = \operatorname{sgn}\left[g(0)\right]\left\|u_{MPP}\right\|$$
where $\|\cdot\|$ is the Euclidean norm. However, the focus of this work is to assess low probabilities of failure. It can therefore be assumed that the mean value design is safe (i.e., $g(0) > 0$). Hence, for the remainder of this text:
$$\beta = \left\|u_{MPP}\right\| \tag{2.231}$$
The negative direction cosines $\alpha$ are defined as:
$$\alpha = \frac{u_{MPP}}{\beta} = -\frac{\nabla_u g|_{u_{MPP}}}{\left\|\nabla_u g|_{u_{MPP}}\right\|}$$
where:
$$\nabla_u g|_{u_{MPP}} = \left[\left.\frac{\partial g}{\partial u_1}\right|_{u_{MPP}}, \ldots, \left.\frac{\partial g}{\partial u_d}\right|_{u_{MPP}}\right]^T$$
Figure 2.12(a) illustrates the different elements of FORM. Given the MPP, and taking advantage of the joint PDF symmetry, the problem can be reoriented (as illustrated in Figure 2.12) so that $P_f$ can be evaluated analytically as:
$$P_f = 1 - \Phi(\beta) = \Phi(-\beta) \tag{2.234}$$
In the FORM setting, the reliability assessment problem is converted into the optimization setup (2.229). Therefore, the computational cost is dictated by the solver used. Various approaches have been proposed (Hasofer and Lind, 1974; Rackwitz and Fiessler, 1978; Liu and Der Kiureghian, 1991; Zhang and Der Kiureghian, 1995; Jiang et al., 2014), arguably the most famous one being the Hasofer and Lind and Rackwitz and Fiessler (HL-RF) method. New numerical solvers are still introduced nowadays, such as in Jiang et al. (2014).
Figure 2.12: Illustration of the orientation change. The calculation of the $P_f$ reduces to a univariate problem. Panels: (a) Original problem. (b) Rotated problem.
When the random variables $X$ are not independent Gaussians, various transformations $T$ such that:
$$X \sim f_X(x), \qquad U = T(X) \sim \mathcal{N}_d\left(0, I_d\right), \qquad X = T^{-1}(U)$$
can be used to connect to a standard Gaussian space $U$, such as the one discussed in Section 2.1.5, Nataf (Nataf, 1962), Rosenblatt (Rosenblatt, 1952) or copulas (Section 2.1.9). In that case, (2.229) becomes:
$$u_{MPP} = \arg\min_{u}\ \frac{1}{2} u^T u \quad \text{s.t. } g\left(T^{-1}(u)\right) \leq 0$$
The probability defined in (2.234) is only exact when the limit state hyper-surface is linear in the standard Gaussian space. However, it remains a reasonable approximation for mildly non-linear limit state hyper-surfaces. For a discussion of the range of application of FORM, the interested reader is referred to Zhao and Ono (1999). FORM has been extended to the case of system reliability in the work of Hohenbichler and Rackwitz (1982). As the FORM approximation is not a statistic, CI cannot be derived.
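The MPP search can be sketched with the basic HL-RF iteration, in which the current point is projected onto the linearized limit state at each step. This is only a minimal sketch in the standard Gaussian space: the function names, the finite-difference gradient, the starting point at the origin and the stopping rule are all assumptions of this illustration, and the basic iteration is not guaranteed to converge for strongly non-linear limit states.

```python
import math
from statistics import NormalDist


def fd_gradient(g, u, delta=1e-6):
    """Value and forward finite-difference gradient of g at u."""
    g0 = g(u)
    grad = []
    for i in range(len(u)):
        u_pert = list(u)
        u_pert[i] += delta
        grad.append((g(u_pert) - g0) / delta)
    return g0, grad


def form_hlrf(g, d, tol=1e-6, max_iter=100):
    """Basic HL-RF iteration; returns the MPP, beta and P_f = Phi(-beta).

    Assumes g(0) > 0 (safe mean design) and a non-zero gradient.
    """
    u = [0.0] * d
    for _ in range(max_iter):
        g0, grad = fd_gradient(g, u)
        norm2 = sum(gi * gi for gi in grad)
        # Closest point to the origin on the linearization of g at u.
        scale = (sum(gi * ui for gi, ui in zip(grad, u)) - g0) / norm2
        u_new = [scale * gi for gi in grad]
        step = math.sqrt(sum((p - q) ** 2 for p, q in zip(u_new, u)))
        u = u_new
        if step < tol:
            break
    beta = math.sqrt(sum(ui * ui for ui in u))
    return u, beta, NormalDist().cdf(-beta)


if __name__ == "__main__":
    u_mpp, beta, p_f = form_hlrf(lambda u: 3.0 - (u[0] + u[1]) / math.sqrt(2.0), d=2)
    print(u_mpp, beta, p_f)
```

For a linear limit state the iteration reaches the MPP in a single step, and the returned $P_f$ coincides with the exact value $\Phi(-\beta)$.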
Second Order Reliability Method
When the limit state hyper-surface is strongly non-linear, FORM can lead to large errors in the estimated probability of failure. In these situations, the second order reliability method (SORM) can be used. The general idea is to correct the FORM approximation to account for the main curvatures $\gamma_i$ around the MPP (Der Kiureghian et al., 1987). Various corrections have been introduced for component reliability (Breitung, 1984; Tvedt, 1990; Cai and Elishakoff, 1994; Köylüoğlu and Nielsen, 1994; Zhao and Ono, 1999; Der Kiureghian, 2000); however, the most widely used was introduced by Breitung:
$$P_f = \Phi(-\beta)\, \frac{1}{\sqrt{\prod_{i=1}^{d-1}\left(1 + \beta\gamma_i\right)}} \tag{2.240}$$
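The main curvatures can be obtained in different ways. The simplest, however, uses a rotation transformation. Assume that $U_i$ are independent standard Gaussian random variables, $u_{MPP}$ is the MPP, and $\nabla_u g|_{u_{MPP}}$ and $H$ are the gradient and the Hessian of $g$ at $u_{MPP}$ respectively:
$$H = \begin{bmatrix} \frac{d^2 g}{du_1 du_1} & \cdots & \frac{d^2 g}{du_1 du_d} \\ \vdots & \ddots & \vdots \\ \frac{d^2 g}{du_d du_1} & \cdots & \frac{d^2 g}{du_d du_d} \end{bmatrix}$$
Let the normalized gradient and Hessian, $\alpha'$ and $B$ respectively, be defined as:
$$\alpha' = \frac{\nabla_u g|_{u_{MPP}}}{\left\|\nabla_u g|_{u_{MPP}}\right\|}, \qquad B = \frac{H}{\left\|\nabla_u g|_{u_{MPP}}\right\|} \tag{2.242}$$
and $R$ be the $d \times d$ rotation matrix such that:
$$R = \begin{bmatrix} 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \ddots & \vdots & \vdots \\ \vdots & \ddots & \ddots & 0 & \vdots \\ 0 & \cdots & 0 & 1 & 0 \\ \alpha'_1 & \alpha'_2 & \cdots & \alpha'_{d-1} & \alpha'_d \end{bmatrix}$$
For $A$ and $A_r$ the regular and reduced rotated normalized Hessian respectively:
$$A = \begin{bmatrix} a_{11} & \cdots & a_{1d} \\ \vdots & \ddots & \vdots \\ a_{d1} & \cdots & a_{dd} \end{bmatrix} = R B R^T \tag{2.244}$$
$$A_r = \begin{bmatrix} a_{11} & \cdots & a_{1,d-1} \\ \vdots & \ddots & \vdots \\ a_{d-1,1} & \cdots & a_{d-1,d-1} \end{bmatrix} \tag{2.245}$$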
the main curvatures are defined as the eigenvalues of $A_r$. If the random variables $X$ are not independent standard normal, transformations $U = T(X)$ can be used as discussed in Section 2.4.2. The computational cost of SORM is usually that of FORM plus the cost of evaluating the main curvatures (i.e., the Hessian matrix). The brute force approach consists of evaluating the Hessian matrix at the MPP. However, more subtle methods have been introduced, such as Der Kiureghian and De Stefano (1991), where the Hessian is estimated at no extra cost from the optimization steps of FORM. Zhao and Ono (1999) discussed the range of application in which SORM is a reasonable approximation. Finally, SORM for system reliability is discussed by Hohenbichler et al. (1987).
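Once $\beta$ and the main curvatures are available, the Breitung correction (2.240) is a one-line formula, sketched below. The function name `sorm_breitung` is hypothetical, the curvatures are assumed to be supplied by the caller (for instance as the eigenvalues of $A_r$), and the check on $1 + \beta\gamma_i$ is an implementation choice of this sketch.

```python
import math
from statistics import NormalDist


def sorm_breitung(beta, curvatures):
    """Breitung's second order correction of the FORM estimate.

    curvatures holds the d - 1 main curvatures at the MPP.
    """
    correction = 1.0
    for gamma in curvatures:
        term = 1.0 + beta * gamma
        if term <= 0.0:
            # The formula is undefined when 1 + beta * gamma <= 0.
            raise ValueError("1 + beta * gamma must be positive")
        correction /= math.sqrt(term)
    return NormalDist().cdf(-beta) * correction


if __name__ == "__main__":
    print(sorm_breitung(3.0, [0.0, 0.0]))   # reduces to FORM
    print(sorm_breitung(3.0, [0.1, 0.05]))  # corrected estimate
```

With all curvatures equal to zero the correction factor is one and the FORM result $\Phi(-\beta)$ is recovered.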
Importance Sampling
FORM and SORM reduced the computational burden of reliability assessment by transforming a statistical inference problem into a numerical optimization one. In other words, rather than estimating $P_f$, FORM and SORM approximate it. In doing so, both approaches rely on a specific set of hypotheses. Specific behaviors can impair these approaches, such as:
• Highly non-linear limit state hyper-surfaces
• Non-differentiable or discontinuous responses
In such cases, sampling (also referred to as Monte Carlo) methods are viable alternatives. Traditionally, sampling methods can handle a large array of problems but have a larger computational cost. The most common sampling approach is the CMC estimate discussed in Section 2.4.1. The performance of these approaches is typically measured through the variance of their estimate. For this reason, such approaches besides CMC are often referred to as variance reduction techniques (Rubinstein and Kroese, 2011). Among these techniques, the simplest conceptually is referred to as importance sampling (IS). IS relies on a simple "trick":
$$P_f = \int_{-\infty}^{+\infty} I\left[x \in \Omega_f\right] f_X(x)\, dx \tag{2.246}$$
$$P_f = \int_{-\infty}^{+\infty} I\left[x \in \Omega_f\right] \frac{f_X(x)}{h(x)}\, h(x)\, dx \tag{2.247}$$
$$\hat{P}_f = \frac{1}{n}\sum_{i=1}^{n} I\left[x^{(i)} \in \Omega_f\right] \frac{f_X\left(x^{(i)}\right)}{h\left(x^{(i)}\right)} \tag{2.248}$$
where $x^{(i)}$ is the $i$th realization of a random sample $X$ that follows the auxiliary joint PDF $h$, and not $f_X$ anymore. The variance of the IS estimate reads:
$$V\left[\hat{P}_f\right] = SE^2_{\hat{P}_f} = \frac{1}{n^2}\sum_{i=1}^{n} V\left[I\left[x^{(i)} \in \Omega_f\right] \frac{f_X\left(x^{(i)}\right)}{h\left(x^{(i)}\right)}\right] \tag{2.249}$$
and a $(1-\alpha)$ CI of $P_f$ (Section 2.2.6) reads:
$$\left[\hat{P}_f - SE_{\hat{P}_f}\, z_{1-\frac{\alpha}{2}}\ ;\ \hat{P}_f - SE_{\hat{P}_f}\, z_{\frac{\alpha}{2}}\right]$$
The general idea of IS is to find an auxiliary joint PDF $h$ that is more suitable than $f_X$. Intuitively, when assessing a small probability of failure, a large number of realizations drawn from $f_X$ are required to obtain a single failed one (as failure is a rare event). Therefore, a suitable auxiliary distribution should generate a large number of samples in $\Omega_f$. Formally, the optimal auxiliary joint PDF is the one that minimizes the variance of the estimate (2.249). Let us consider an auxiliary distribution $h$ defined as the original joint PDF $f_X$ but restricted to $\Omega_f$:
$$h(x) = \frac{I\left[x \in \Omega_f\right] f_X(x)}{P_f}$$
Then (2.249) reduces to:
$$V\left[\hat{P}_f\right] = \frac{1}{n^2}\sum_{i=1}^{n} V\left[P_f\right] = 0$$
Figure 2.13: An example of optimal auxiliary distribution $h_{opt}$. Panels: (a) Original joint PDF and limit state. (b) Optimal auxiliary joint PDF.
As the variance is a non-negative quantity, $h$ is the optimal auxiliary distribution $h_{opt}$ (Bucklew, 2004), as it minimizes the variance of the IS estimate. Figure 2.13 shows an example of optimal auxiliary distribution. Note that the domain of $h$ must contain $\Omega_f$ entirely. Obviously, $h_{opt}$ cannot be obtained in practical cases as it is conditional on the knowledge of $P_f$ and $\Omega_f$. Most IS variations aim at defining an auxiliary distribution that approximates $h_{opt}$. A common auxiliary distribution in the independent standard normal space is a Gaussian with:
$$\mu_X = x_{MPP}, \qquad \Sigma_X = I_d \tag{2.253}$$
where $I_d$ stands for the identity matrix of size $d \times d$. Two other popular IS variations are:
• Adaptive IS: such as Cross-entropy (Rubinstein and Kroese, 2004);
• Non-parametric Adaptive IS (NAIS): such as in the work of Zhang (1996) and Morio (2012).
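The Gaussian auxiliary distribution centered at the MPP (2.253) can be sketched as follows in the standard Gaussian space. The function name and the test problem are assumptions made for this illustration; the MPP is supposed to be known beforehand (e.g., from a FORM analysis). For a unit-covariance Gaussian shifted to $u_{MPP}$, the weight $\phi(u)/h(u)$ simplifies to $\exp\left(\frac{1}{2}\|u_{MPP}\|^2 - u^T u_{MPP}\right)$.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def importance_sampling(limit_state, u_mpp, n, seed=0):
    """IS estimate of P_f with h = N(u_mpp, I) in standard Gaussian space."""
    rng = random.Random(seed)
    half_norm2 = 0.5 * sum(c * c for c in u_mpp)
    terms = []
    for _ in range(n):
        # Realization drawn from the auxiliary PDF h, not from phi.
        u = [rng.gauss(c, 1.0) for c in u_mpp]
        if limit_state(u) <= 0.0:
            dot = sum(ui * ci for ui, ci in zip(u, u_mpp))
            terms.append(math.exp(half_norm2 - dot))  # phi(u) / h(u)
        else:
            terms.append(0.0)
    p_hat = fmean(terms)
    std_err = stdev(terms) / math.sqrt(n)
    return p_hat, std_err


if __name__ == "__main__":
    # Linear limit state g(u) = 4 - u_1: MPP at (4, 0), P_f = Phi(-4).
    p_hat, se = importance_sampling(lambda u: 4.0 - u[0], [4.0, 0.0], n=20_000)
    print(f"IS estimate={p_hat:.3e}  SE={se:.1e}  exact={NormalDist().cdf(-4.0):.3e}")
```

About half of the realizations fall in the failure domain here, which is why a probability of the order of $10^{-5}$ can be estimated with a few thousand samples, whereas the rule of thumb (2.221) would call for millions of CMC realizations.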
Subset Simulations
Another variance reduction technique that has gained a lot of traction in the past decade is called subset simulation (SubSim). Introduced by Au and Beck (2001), SubSim exploits the properties of conditional probabilities to alleviate the computational burden of the CMC estimate. For example, consider the case where one wishes to estimate $P_f = 10^{-4}$. According to (2.220), to achieve a 10% coefficient of variation, a random sample of size $n = 10^6$ would be required. However, one can express $P_f$ as:
\[ P_f = 10^{-4} = 10^{-1} \times 10^{-1} \times 10^{-1} \times 10^{-1} \]
In order to estimate a probability of $10^{-1}$ with a coefficient of variation of approximately 3%, a random sample of size $n = 10^4$ is required. Therefore, $P_f$ could be estimated accurately at a cost of $n = 4 \times 10^4$ (a CMC estimate of the same cost would have a coefficient of variation of approximately 50%). Specifically, given $\Omega$ and $\Omega_f$ the domain of $f_X$ and the failure domain respectively, let $\Omega_f^{(0)} \equiv \Omega \supset \Omega_f^{(1)} \supset \cdots \supset \Omega_f^{(s)} \equiv \Omega_f$ be a decreasing sequence of $s + 1$ domains where:
\[ \Omega_f^{(i)} = \left\{ x \mid g^{(i)}(x) = g(x) - t^{(i)} \le 0 \right\}, \quad \forall i = \{1, \ldots, s\} \tag{2.255} \]
(2.214) can be expressed as:
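\[ P_f = \prod_{i=1}^{s} P_f^{(i)} \]
where:
\[ P_f^{(i)} = \int_{-\infty}^{+\infty} I[g^{(i)}(x) \le 0] \, q_{i-1}(x \mid \Omega_f^{(i-1)}) \, dx \]
with $q_{i-1}(x \mid \Omega_f^{(i-1)})$ the conditional auxiliary PDF associated with the intermediate domain $\Omega_f^{(i-1)}$, defined as in Song et al. (2009):
\[ q_{i-1}(x \mid \Omega_f^{(i-1)}) = \begin{cases} f_X(x) & \text{if } i = 1 \\ \dfrac{I[g^{(i-1)}(x) \le 0]}{\prod_{j=1}^{i-1} P_f^{(j)}} \, f_X(x) & \text{if } i = [2, \ldots, s] \end{cases} \]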
The sequence of intermediate domains is constructed as the process goes on. Let us define $X^{(i)}$ as a size $n$ random sample of $q_{i-1}(x \mid \Omega_f^{(i-1)})$. The $j$th realization of $X^{(i)}$ is noted $x^{(i,j)}$. The intermediate domains $\Omega_f^{(i)}$ are defined such that:
\[ P_f^{(i)} = P_s, \quad i = [1, s - 1] \]
where $P_s$ is a user-defined step probability. Numerical experiments have shown that $P_s = 10^{-1}$ should be used (Au and Beck, 2001). At step $i$, the next intermediate threshold $t^{(i)}$ is defined as the $P_s$ empirical quantile of $g^{(i-1)}(X^{(i)})$. In other words, $t^{(i)}$ is chosen such that only $P_s \times n$ values of $\left[ g^{(i-1)}(X^{(i)}) - t^{(i)} \right]$ are negative. If $t^{(i)}$ is lower than 0, then $t^{(i)} = 0$, $s = i$, $P_f^{(s)} > P_s$ and the algorithm stops.
The next random sample $X^{(i+1)}$ is typically obtained using MCMC techniques (Andrieu et al., 2003). Arguably the most famous one is the Metropolis-Hastings algorithm (Metropolis et al., 1953; Hastings, 1970). In their work, Au and Beck (2001) use a component-wise Metropolis-Hastings algorithm that makes SubSim more robust in high dimensions. The SubSim estimate then reads:
\[ \hat{P}_f = \prod_{i=1}^{s} \hat{P}_f^{(i)}, \qquad \hat{P}_f^{(i)} = \frac{1}{n} \sum_{j=1}^{n} I[g^{(i)}(x^{(i,j)}) \le 0] \]
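The SubSim steps described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation of Au and Beck (2001) nor the one used in this work: inputs are independent standard normal, the component-wise Metropolis proposal is a uniform perturbation of half-width 1, thresholds are applied directly to $g$, and `subset_simulation` and `g_linear` are hypothetical names.

```python
import math
import numpy as np

def g_linear(x):
    # Hypothetical limit state (assumption): failure when g(x) <= 0.
    return 3.0 - x.sum(axis=1) / math.sqrt(2.0)

def subset_simulation(g, d, n=2000, p_s=0.1, max_levels=20, rng=None):
    """Subset simulation for independent standard normal inputs (sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    n_seeds = int(p_s * n)                   # samples kept at each level
    x = rng.standard_normal((n, d))          # first level: plain CMC sample of f_X
    gx = g(x)
    pf = 1.0
    for _ in range(max_levels):
        order = np.argsort(gx)
        x, gx = x[order], gx[order]
        t = gx[n_seeds - 1]                  # P_s empirical quantile of g
        if t <= 0.0:                         # failure domain reached: last level
            break
        pf *= p_s
        cur_x, cur_g = x[:n_seeds].copy(), gx[:n_seeds].copy()   # chain seeds
        chain_x, chain_g = [cur_x], [cur_g]
        for _ in range(n // n_seeds - 1):
            # component-wise Metropolis step targeting the standard normal PDF
            cand = cur_x + rng.uniform(-1.0, 1.0, size=cur_x.shape)
            ratio = np.exp(0.5 * (cur_x**2 - cand**2))
            cand = np.where(rng.uniform(size=cur_x.shape) < ratio, cand, cur_x)
            # keep the candidate only if it stays in the intermediate domain
            g_cand = g(cand)
            inside = g_cand <= t
            cur_x = np.where(inside[:, None], cand, cur_x)
            cur_g = np.where(inside, g_cand, cur_g)
            chain_x.append(cur_x)
            chain_g.append(cur_g)
        x, gx = np.concatenate(chain_x), np.concatenate(chain_g)
    return pf * np.mean(gx <= 0.0)

pf_subsim = subset_simulation(g_linear, d=2, n=2000, rng=np.random.default_rng(1))
print(f"SubSim estimate: {pf_subsim:.2e}")
```

For this limit state the exact probability is $\Phi(-3) \approx 1.35 \times 10^{-3}$. A single run at this sample size is noisy, so repeated runs are needed to judge the coefficient of variation.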
Figure 2.14 illustrates the SubSim process on a complex limit state. The coefficient of variation of the SubSim estimate cannot be derived due to the inherent correlation in the MCMC sample, although Au and Beck derived an upper bound. Although the derivations presented in this section are for component reliability, SubSim can be extended to system cases very easily, as long as realizations can be ranked in terms of “level of failure”.

Figure 2.14: An example of a subset simulation run on a complex 2D limit state.

Example 2.19 (Comparison of Reliability Techniques) Consider the three following test cases:
Example RA.1:
\[ g(X) = 3 - \frac{X_1 + X_2}{\sqrt{2}}, \quad X_1, X_2 \sim N(0, 1^2), \quad U = X \tag{2.262} \]
Example RA.2:
\[ g(X) = 3 - \frac{X_1 + X_2}{\sqrt{2}}, \quad X_1, X_2 \sim E(0.5) \; \Rightarrow \; U_i = T_i(X_i) = \Phi^{-1}(F_E(X_i \mid 0.5)) \sim N(0, 1^2) \tag{2.263} \]
Example RA.3:
\[ g(X) = 20 - (X_1 + X_2)^2, \quad X_1, X_2 \sim N(0, 1^2), \quad U = X \tag{2.264} \]
Figure 2.15 illustrates the three limit states and associated distributions. Table 2.5 summarizes the estimated probability of failure $\hat{P}_f$ and reliability index $\hat{\beta}$ using FORM, SORM, SubSim and CMC for Example RA.1. In addition, SubSim and CMC are repeated three hundred times to compute the sample coefficient of variation of the estimates. For a linear limit state hyper-surface, FORM is exact. Table 2.6 shows the results of the same comparison for Example RA.2. Note that, although the limit state hyper-surface is linear in the real space (X), it is highly non-linear in the standard normal space (U). FORM assumes that the limit state hyper-surface is linear in the standard normal space, which explains the large inaccuracy in its estimate. However, SORM is able to correct the estimate and account for the curvature. Table 2.7 shows the results of the same comparison for Example RA.3. In that case, only SubSim provides an estimate in agreement with the CMC reference.

Figure 2.15: Illustration of reliability examples used for comparison. (a) Example RA.1. (b) Example RA.2. (c) Example RA.3. Each panel shows the limit state $g(x) = 0$ and the MPP.
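The reference values in this comparison can be cross-checked in closed form, since the three limit states only involve the sum $X_1 + X_2$. The sketch below is an added check, not part of the original study. It assumes that $E(0.5)$ denotes an exponential distribution with mean 0.5 (rate 2), so that $X_1 + X_2$ follows an Erlang distribution with survival function $e^{-\lambda c}(1 + \lambda c)$; this parametrization is an assumption made here.

```python
import math

def std_normal_cdf(z):
    # Standard normal CDF expressed with the complementary error function.
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

# RA.1: (X1 + X2) / sqrt(2) is standard normal, failure when it exceeds 3.
pf_ra1 = std_normal_cdf(-3.0)

# RA.2: failure when X1 + X2 >= 3 * sqrt(2), with X1 + X2 ~ Erlang(2, rate).
rate = 1.0 / 0.5                       # assumption: E(0.5) has mean 0.5
c = 3.0 * math.sqrt(2.0)
pf_ra2 = math.exp(-rate * c) * (1.0 + rate * c)

# RA.3: failure when |X1 + X2| >= sqrt(20), with X1 + X2 ~ N(0, 2).
beta_ra3 = math.sqrt(20.0) / math.sqrt(2.0)
pf_ra3 = 2.0 * std_normal_cdf(-beta_ra3)      # two symmetric failure regions
pf_ra3_form = std_normal_cdf(-beta_ra3)       # FORM only sees one of them

print(pf_ra1, pf_ra2, pf_ra3, pf_ra3_form)
```

Under these assumptions the three exact values are close to the CMC columns of the tables, and the FORM value for Example RA.3 is half the exact one because the limit state has two symmetric MPPs.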
Table 2.5: Comparison of 4 reliability assessment techniques on Example RA.1 (2.262).
                                                          FORM    SORM    SubSim    CMC
# func. calls                                             6       19      250998    $10^6$
$\hat{\beta}$ / $\bar{\hat{\beta}}^{(a)}$                  3       3       3.00      3.00
$\hat{P}_f$ / $\bar{\hat{P}}_f^{(a)}$ ($\times 10^{-3}$)   1.35    1.35    1.35      1.35
$cv_{\hat{\beta}}^{(a)}$ (%)                               N/a     N/a     0.29      0.28
$cv_{\hat{P}_f}^{(a)}$ (%)                                 N/a     N/a     2.90      2.74
(a) Computed over 300 repetitions

Table 2.6: Comparison of 4 reliability assessment techniques on Example RA.2 (2.263).
                                                          FORM    SORM    SubSim    CMC
# func. calls                                             15      28      233326    $10^6$
$\hat{\beta}$ / $\bar{\hat{\beta}}^{(a)}$                  3.09    2.86    2.88      2.88
$\hat{P}_f$ / $\bar{\hat{P}}_f^{(a)}$ ($\times 10^{-3}$)   0.99    2.15    1.96      1.96
$cv_{\hat{\beta}}^{(a)}$ (%)                               N/a     N/a     0.32      0.24
$cv_{\hat{P}_f}^{(a)}$ (%)                                 N/a     N/a     2.91      2.20
(a) Computed over 300 repetitions

Table 2.7: Comparison of 4 reliability assessment techniques on Example RA.3 (2.264).
                                                          FORM    SORM    SubSim    CMC
# func. calls                                             45      58      250093    $10^6$
$\hat{\beta}$ / $\bar{\hat{\beta}}^{(a)}$                  3.16    3.06    2.95      2.96
$\hat{P}_f$ / $\bar{\hat{P}}_f^{(a)}$ ($\times 10^{-3}$)   0.78    1.11    1.57      1.56
$cv_{\hat{\beta}}^{(a)}$ (%)                               N/a     N/a     0.27      0.26
$cv_{\hat{P}_f}^{(a)}$ (%)                                 N/a     N/a     2.56      2.47
(a) Computed over 300 repetitions

Reliability-based Design Optimization

The previous section introduced different approaches for structural reliability assessment. However, reliability assessment only provides binary information: whether the design is safe or not. On the other hand, design optimization aims at finding an optimal design. A typical problem statement reads:
\[ \begin{aligned} \min_{x} \quad & C(x) \\ \text{s.t.} \quad & h_i(x) \le 0, & i &= [1, \ldots, n_{ineq}] \\ & l_j(x) = 0, & j &= [1, \ldots, n_{eq}] \\ & l_k \le x_k \le u_k, & k &= [1, \ldots, n] \end{aligned} \tag{2.265} \]
where $C$ is some objective function (e.g., weight, cost), and $h_i$ and $l_j$ are inequality and equality constraints respectively (typically representative of design requirements). Design optimization, also referred to as deterministic optimization, does not account for uncertainties. The combination of reliability assessment and design optimization leads to reliability-based design optimization (RBDO). A typical RBDO problem statement reads:
\[ \begin{aligned} \min_{z, \theta} \quad & C(z, \theta) \\ \text{s.t.} \quad & P[X \in \Omega_{f_i}(z)] \le P_{T_i}, & i &= [1, \ldots, n_f] \\ & h_j(z, \theta) \le 0, & j &= [1, \ldots, n_{ineq}] \\ & l_k(z, \theta) = 0, & k &= [1, \ldots, n_{eq}] \\ & l_l^z \le z_l \le u_l^z, & l &= [1, \ldots, n_z] \\ & l_m^{\theta} \le \theta_m \le u_m^{\theta}, & m &= [1, \ldots, n_{\theta}] \\ & X \sim f_X(x \mid \theta) \end{aligned} \tag{2.266} \]
where $z$ are $n_z$ deterministic design variables and $\theta$ are $n_{\theta}$ hyper-parameters of the joint distribution of the $d$ random variables $X$. The probability $P[X \in \Omega_{f_i}(z)]$ defines the $i$th stochastic constraint, and $h_j$ and $l_k$ are as defined in (2.265). The failure domains and constraints are typically simple functions of the outputs $Y$ of a computationally expensive model $\mathcal{M}$ (Section 1.1). Note that the $i$th failure domain $\Omega_{f_i}$ can be made of several limit state functions $g_{ij}$ (i.e., system reliability as discussed in Section 2.4). The probabilities defined in the stochastic constraints are expressed as:
\[ P_{f_i}(z, \theta) = P[X \in \Omega_{f_i}(z)] = \int_{-\infty}^{+\infty} I[x \in \Omega_{f_i}(z)] \, f_X(x \mid \theta) \, dx \]
The remaining sections of the current chapter introduce the reader to a non-exhaustive review of existing RBDO techniques. For a detailed review of RBDO techniques along with benchmark problems, the interested reader is referred to Aoues and Chateauneuf (2010) and Valdebenito and Schuëller (2010).
Stochastic Constraint Transformations
In many ways, the field of RBDO evolved similarly to the domain of reliability assessment. As such, various works have been dedicated to developing FORM-like approaches. In these works, the stochastic constraints:
\[ P_{f_i}(z, \theta) \le P_{T_i} \tag{2.268} \]
are rewritten in alternative forms.
Reliability Index Approach
In the reliability index approach (RIA), (2.268) is expressed as (Enevoldsen and Sørensen, 1994):
\[ \beta_i(z, \theta) \ge \beta_{T_i} \]
where $\beta_{T_i} = -\Phi^{-1}(P_{T_i})$ and $\beta_i(z, \theta)$ is the $i$th reliability index calculated from FORM (Section 2.4.2):
\[ \beta_i(z, \theta) = \left\| u^{MPP}(z, \theta) \right\| \tag{2.270} \]
such that:
\[ u^{MPP}(z, \theta) = \arg\min_{u} \frac{1}{2} u u^{\top} \quad \text{s.t.} \quad g_i\left(T^{-1}(u, \theta), z\right) \le 0 \tag{2.271} \]
However, RIA has shown some instability and excessive computational cost in some situations (Tu et al., 1999, 2001; Royset et al., 2001).
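The RIA constraint can be illustrated with a short MPP search. The sketch below uses the Hasofer-Lind / Rackwitz-Fiessler (HL-RF) iteration with finite-difference gradients, which is one common way to solve (2.271) but not necessarily the solver used in this work. The limit state `g` is a hypothetical example already expressed in the standard normal space, and the target probability is an arbitrary illustrative value.

```python
import math
from statistics import NormalDist
import numpy as np

def g(u):
    # Hypothetical limit state in the standard normal space (assumption).
    return 3.0 - (u[0] + u[1]) / math.sqrt(2.0)

def num_grad(func, u, h=1e-6):
    # Central finite-difference gradient of func at u.
    grad = np.zeros_like(u)
    for k in range(u.size):
        e = np.zeros_like(u)
        e[k] = h
        grad[k] = (func(u + e) - func(u - e)) / (2.0 * h)
    return grad

def form_hlrf(func, d, tol=1e-8, max_iter=100):
    """HL-RF search of the MPP; returns (u_mpp, beta = ||u_mpp||)."""
    u = np.zeros(d)
    for _ in range(max_iter):
        grad = num_grad(func, u)
        u_new = (grad @ u - func(u)) / (grad @ grad) * grad
        converged = np.linalg.norm(u_new - u) < tol
        u = u_new
        if converged:
            break
    return u, float(np.linalg.norm(u))

u_mpp, beta = form_hlrf(g, d=2)
beta_target = -NormalDist().inv_cdf(1e-2)     # beta_T = -Phi^{-1}(P_T), P_T assumed
print(f"beta = {beta:.4f}, beta_T = {beta_target:.4f}, RIA satisfied: {beta >= beta_target}")
```

For a linear limit state the iteration converges in one step; the instabilities mentioned above typically appear for strongly non-linear limit states or non-normal inputs.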
Performance Measure Approach
A different route consists in substituting the inverse FORM setup (Der Kiureghian et al., 1994) in place of FORM. Recall the RIA expression (2.271). For a target reliability index $\beta_{T_i}$, one can express the inverse FORM problem, in which $g_i$ is minimized along a hyper-spherical constraint:
\[ u^{MPTP}(z, \theta) = \arg\min_{u} g_i\left(T^{-1}(u, \theta), z\right) \quad \text{s.t.} \quad \frac{1}{2} u u^{\top} = \frac{1}{2} \beta_{T_i}^2 \tag{2.272} \]
where $u^{MPTP}$ is referred to as the minimum performance target point (MPTP). The probabilistic performance measure $G_i$ refers to the limit state function value at the MPTP:
\[ G_i(z, \theta) = g_i\left(x^{MPTP}(z, \theta), z\right) \tag{2.273} \]
where:
\[ x^{MPTP}(z, \theta) = T^{-1}\left(u^{MPTP}(z, \theta), \theta\right) \]

Figure 2.16: Graphical illustration of PMA constraint conversion. If $G_i(z, \theta) \ge 0$ then $P_{f_i}(z, \theta) \le P_{T_i}$.
In the performance measure approach (PMA), the $i$th stochastic constraint (2.268) is expressed as:
\[ G_i(z, \theta) \ge 0 \]
since, if the probabilistic performance measure is greater than zero, the reliability index $\beta_i(z, \theta)$ must be greater than $\beta_{T_i}$ (as illustrated in Figure 2.16). As for FORM, specific algorithms have been developed to solve (2.272), such as the advanced mean value (Wu, 1984). Improvements of this algorithm led to the hybrid mean value approach (Youn et al., 2003; Youn and Choi, 2004), which increased the robustness of the MPTP search. Arc-length-based algorithms have also been explored (Du et al., 2003; Youn et al., 2004a, 2005).
Note that if the computationally expensive model $\mathcal{M}$ is replaced by a meta-model $\widetilde{\mathcal{M}}$ (as discussed in Chapter 3), the probability $P_{f_i}(z, \theta)$ can be approximated using any reliability method at no extra cost (e.g., SubSim).
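A minimal sketch of the PMA check follows. It uses the textbook advanced mean value update $u \leftarrow -\beta_T \nabla_u g / \|\nabla_u g\|$, which keeps every iterate on the hyper-sphere of radius $\beta_T$; this normalized form is an assumption about the algorithm, and the limit state, target index and function names are hypothetical.

```python
import math
import numpy as np

def g(u):
    # Hypothetical limit state in the standard normal space (assumption).
    return 3.0 - (u[0] + u[1]) / math.sqrt(2.0)

def num_grad(func, u, h=1e-6):
    # Central finite-difference gradient of func at u.
    grad = np.zeros_like(u)
    for k in range(u.size):
        e = np.zeros_like(u)
        e[k] = h
        grad[k] = (func(u + e) - func(u - e)) / (2.0 * h)
    return grad

def amv_mptp(func, d, beta_t, tol=1e-8, max_iter=100):
    """Advanced mean value search of the MPTP on the sphere ||u|| = beta_t."""
    u = np.zeros(d)
    for _ in range(max_iter):
        grad = num_grad(func, u)
        u_new = -beta_t * grad / np.linalg.norm(grad)   # steepest descent direction
        converged = np.linalg.norm(u_new - u) < tol
        u = u_new
        if converged:
            break
    return u

beta_t = 2.0                                   # illustrative target reliability index
u_mptp = amv_mptp(g, d=2, beta_t=beta_t)
G = g(u_mptp)                                  # probabilistic performance measure
print(f"G = {G:.4f}, PMA constraint satisfied: {G >= 0.0}")
```

Here $G = 3 - \beta_T = 1 \ge 0$, which agrees with the reliability index of this limit state ($\beta = 3$) exceeding the target.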
Stochastic Constraint Sensitivities
Regarding the numerical optimization of (2.266), gradient-based algorithms would be preferred for their robustness and superior efficiency in high-dimensional spaces. However, the numerical cost of finite differences on the probability $P_{f_i}(z, \theta)$ would be prohibitive. Even in the case where one could afford finite differences, the probability $P_{f_i}(z, \theta)$ can be noisy (e.g., when CMC or SubSim are used), forbidding the use of finite differences.

Reliability index approach

When the RIA approximation is used, the sensitivities of $\beta_i(z, \theta)$ can be derived (Bjerager and Krenk, 1987, 1989; Nikolaidis et al., 2004). For a random vector $X$ and a transformation $T$ (Section 2.4.2), sensitivities are expressed as:
\[ \frac{d\beta_i}{dz_j}\Big|_{z,\theta} = \frac{1}{\left\| \nabla_u g_i^{MPP} \right\|} \, \frac{\partial g_i}{\partial z_j}\Big|_{u^{MPP},z,\theta} \tag{2.276} \]
\[ \frac{d\beta_i}{d\theta_j}\Big|_{z,\theta} = \frac{u^{MPP}}{\beta_i(z, \theta)} \, \frac{\partial T}{\partial \theta_j}\Big|_{x^{MPP},\theta}^{\top} \tag{2.277} \]
where:
\[ \nabla_u g_i^{MPP} = \left[ \frac{\partial g_i}{\partial u_1}\Big|_{u^{MPP},z,\theta} \; \cdots \; \frac{\partial g_i}{\partial u_d}\Big|_{u^{MPP},z,\theta} \right] \tag{2.278} \]
\[ g_i \equiv g_i\left(T^{-1}(u, \theta), z\right) \tag{2.279} \]
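Performance Measure Approach

When PMA is used, one can show that the sensitivities of the probabilistic performance measure $G_i(z, \theta)$ are:
\[ \frac{dG_i}{dz_j}\Big|_{z,\theta} = \frac{\partial g_i}{\partial z_j}\Big|_{u^{MPTP},z,\theta} \tag{2.280} \]
\[ \frac{dG_i}{d\theta_j}\Big|_{z,\theta} = \frac{\partial g_i}{\partial \theta_j}\Big|_{u^{MPTP},z,\theta} \tag{2.281} \]
\[ = \nabla_x g_i^{MPTP} \, \frac{\partial T^{-1}}{\partial \theta_j}\Big|_{u^{MPTP},z,\theta} \tag{2.282} \]
\[ = \nabla_u g_i^{MPTP} \left( \frac{\partial T^{-1}}{\partial \theta_j}\Big|_{u^{MPTP},z,\theta} \, \frac{\partial T}{\partial x}\Big|_{x^{MPTP},z,\theta} \right)^{\top} \tag{2.283} \]
where:
\[ \nabla_x g_i^{MPTP} = \left[ \frac{\partial g_i}{\partial x_1}\Big|_{x^{MPTP},z,\theta} \; \cdots \; \frac{\partial g_i}{\partial x_d}\Big|_{x^{MPTP},z,\theta} \right] \tag{2.284} \]
\[ g_i \equiv g_i(x, z) \tag{2.285} \]
\[ \nabla_u g_i^{MPTP} = \left[ \frac{\partial g_i}{\partial u_1}\Big|_{u^{MPTP},z,\theta} \; \cdots \; \frac{\partial g_i}{\partial u_d}\Big|_{u^{MPTP},z,\theta} \right] \tag{2.286} \]
\[ g_i \equiv g_i\left(T^{-1}(u, \theta), z\right) \tag{2.287} \]
and $\frac{\partial T}{\partial x}$ is the Jacobian matrix of the transformation $T$.

CMC and SubSim

When the probabilities $P_{f_i}(z, \theta)$ in the stochastic constraints (2.268) are estimated using CMC, sensitivities with respect to the distribution hyper-parameters $\theta$ can be easily derived (Lebrun and Dutfoy, 2009; Lee et al., 2011) as:
\[ \frac{dP_{f_i}}{d\theta_j}\Big|_{z,\theta} = \frac{d}{d\theta_j} \int_{-\infty}^{+\infty} I[x \in \Omega_{f_i}(z)] \, f_X(x \mid \theta) \, dx \]
\[ = \int_{-\infty}^{+\infty} I[x \in \Omega_{f_i}(z)] \, \frac{df_X}{d\theta_j}\Big|_{x,\theta} \, dx \]
\[ = \int_{-\infty}^{+\infty} I[x \in \Omega_{f_i}(z)] \, \frac{d \ln f_X}{d\theta_j}\Big|_{x,\theta} \, f_X(x \mid \theta) \, dx \tag{2.288} \]
which can be estimated as a by-product of the CMC estimate $\hat{P}_{f_i}$ at no extra cost:
\[ \frac{d\hat{P}_{f_i}}{d\theta_j}\Big|_{z,\theta} = \frac{1}{n} \sum_{k=1}^{n} I[x^{(k)} \in \Omega_{f_i}(z)] \, \frac{d \ln f_X}{d\theta_j}\Big|_{x^{(k)},\theta} \tag{2.289} \]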
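The estimate (2.289) can be sketched for a case where the derivative is known exactly. The example below assumes independent unit-variance Gaussian inputs whose means are the hyper-parameters $\theta$, so that $d \ln f_X / d\theta_j = x_j - \theta_j$, together with a hypothetical linear limit state; all names are illustrative.

```python
import math
import numpy as np

def g_linear(x):
    # Hypothetical limit state (assumption): failure when g(x) <= 0.
    return 3.0 - x.sum(axis=1) / math.sqrt(2.0)

def cmc_pf_and_sensitivity(g, theta, n, rng):
    """CMC estimate of Pf and of dPf/dtheta for X ~ N(theta, I_d)."""
    x = theta + rng.standard_normal((n, theta.size))
    fail = g(x) <= 0.0                       # indicator of the failure domain
    score = x - theta                        # d ln f_X / d theta_j for this PDF
    pf = fail.mean()
    dpf = (fail[:, None] * score).mean(axis=0)   # same sample, no extra model call
    return pf, dpf

theta = np.zeros(2)
pf_cmc, dpf_cmc = cmc_pf_and_sensitivity(g_linear, theta, 1_000_000,
                                         np.random.default_rng(2))

# Exact values for this example: Pf = Phi(-3), dPf/dtheta_j = phi(3) / sqrt(2).
dpf_exact = math.exp(-4.5) / math.sqrt(2.0 * math.pi) / math.sqrt(2.0)
print(pf_cmc, dpf_cmc, dpf_exact)
```

The same failure indicators are reused for the probability and for its sensitivities, which is why no additional call to the model is needed.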
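When SubSim is used to estimate $P_{f_i}(z, \theta)$, Song et al. (2009) derived the sensitivities with respect to the hyper-parameters $\theta$ and their estimates as a by-product of $\hat{P}_{f_i}(z, \theta)$. Using the notation of Section 2.4.5, the sensitivity of $P_{f_i}(z, \theta)$ is simply:
\[ \frac{dP_{f_i}}{d\theta_j}\Big|_{z,\theta} = \sum_{k=1}^{s} \frac{P_{f_i}(z, \theta)}{P_{f_i}^{(k)}(z, \theta)} \, \frac{dP_{f_i}^{(k)}}{d\theta_j}\Big|_{z,\theta} \tag{2.290} \]
The sensitivities of the intermediate probabilities are obtained as for CMC:
\[ \frac{dP_{f_i}^{(1)}}{d\theta_j}\Big|_{z,\theta} = \int_{-\infty}^{+\infty} I\left[g_i^{(1)}(x, z) \le 0\right] \frac{df_X}{d\theta_j}\Big|_{x,\theta} \, dx \tag{2.291} \]
\[ \frac{dP_{f_i}^{(k)}}{d\theta_j}\Big|_{z,\theta} = \int_{-\infty}^{+\infty} I\left[g_i^{(k)}(x, z) \le 0\right] \frac{dq_{k-1}}{d\theta_j}\Big|_{x,z,\theta} \, dx \tag{2.292} \]
For $k = [2, \ldots, s]$, the auxiliary distribution $q_{k-1}\left(x \mid \theta, \Omega_{f_i}^{(k-1)}(z)\right)$ is defined as:
\[ q_{k-1}\left(x \mid \theta, \Omega_{f_i}^{(k-1)}(z)\right) = \frac{I\left[g^{(k-1)}(x, z) \le 0\right]}{\prod_{l=1}^{k-1} P_{f_i}^{(l)}(z, \theta)} \, f_X(x \mid \theta) \tag{2.293} \]
For the sake of clarity, the following notations and dependence omissions are used:
\[ q_{k-1}(x) = \frac{I_{k-1}(x)}{P_{k-1}} \, f_X(x \mid \theta) \tag{2.294} \]
The derivative of $q_{k-1}$ then reads:
\[ \frac{dq_{k-1}}{d\theta_j}\Big|_{x,z,\theta} = \frac{I_{k-1}(x)}{P_{k-1}^2} \left[ P_{k-1} \, \frac{df_X}{d\theta_j}\Big|_{x,\theta} - \frac{dP_{k-1}}{d\theta_j}\Big|_{z,\theta} \, f_X(x \mid \theta) \right] \tag{2.295} \]
where:
\[ \frac{dP_{k-1}}{d\theta_j}\Big|_{z,\theta} = \sum_{l=1}^{k-1} \frac{P_{k-1}}{P_{f_i}^{(l)}(z, \theta)} \, \frac{dP_{f_i}^{(l)}}{d\theta_j}\Big|_{z,\theta} \tag{2.296} \]
Estimates of the intermediate sensitivities are obtained as:
\[ \frac{d\hat{P}_{f_i}^{(1)}}{d\theta_j}\Big|_{z,\theta} = \frac{1}{n} \sum_{l=1}^{n} I_1\left(x^{(1,l)}\right) \frac{d \ln f_X}{d\theta_j}\Big|_{x^{(1,l)},\theta} \tag{2.297} \]
\[ \frac{d\hat{P}_{f_i}^{(k)}}{d\theta_j}\Big|_{z,\theta} = \frac{1}{n} \sum_{l=1}^{n} I_k\left(x^{(k,l)}\right) \frac{d \ln q_{k-1}}{d\theta_j}\Big|_{x^{(k,l)},\theta} \]
where:
\[ \frac{d \ln q_{k-1}}{d\theta_j}\Big|_{x,\theta} = \frac{1}{q_{k-1}} \, \frac{dq_{k-1}}{d\theta_j}\Big|_{x,\theta} \tag{2.299} \]
\[ = \frac{P_{k-1}}{I_{k-1} \, f_X} \, \frac{dq_{k-1}}{d\theta_j}\Big|_{x,\theta} \tag{2.300} \]
\[ = \frac{d \ln f_X}{d\theta_j}\Big|_{x,\theta} - \sum_{l=1}^{k-1} \frac{1}{P_{f_i}^{(l)}(z, \theta)} \, \frac{dP_{f_i}^{(l)}}{d\theta_j}\Big|_{z,\theta} \tag{2.301} \]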
Although sensitivities of probabilities of failure with respect to hyperparameters θ for CMC and SubSim are well known and have been widely used, sensitivities with respect to deterministic design variables z are lacking in the literature. Such sensitivities are derived in Section
Double Loop Approaches
The most straightforward approach to solve the RBDO problem is referred to as double loop. For any optimization solver (e.g., SQP, pattern search), the stochastic constraints (2.268) (either calculated as discussed in Section 2.4 or rewritten as discussed in Section 2.5.1) are computed every time the optimization solver requires it. This setup is also referred to as a nested approach, as the reliability technique is called repeatedly in the optimization process. Because the reliability technique is nested within the optimization loop, double loop approaches are usually computationally intensive. For example, if the optimization loop requires $N_o$ (e.g., 100) calls to the reliability technique, which itself requires $N_r$ (e.g., $10^5$ for CMC) calls to the computationally expensive model $\mathcal{M}$, a total of $N_o \times N_r$ (e.g., $10^7$) calls to $\mathcal{M}$ are required. Alternative approaches have been explored where the RBDO problem (2.266) is rewritten in some way.
Single Loop Approaches
Single loop approaches attempt to remove the reliability loop from the RBDO formulation. In doing so, it is hoped that the computational cost will be decreased. The idea of a single loop approach was first proposed by Chen et al. (1997) but gained traction following the works of Liang et al. (2004, 2007). The main idea of the single loop approach (SLA) is to use the first iteration of the advanced mean value for PMA as an approximation of the MPTP. As presented in Liang et al. (2004), SLA is restricted to the mean $\theta = \mu_X$ of the random variables $X$ and is based on PMA. Hence, the probabilistic constraints (2.268) become:
\[ g_i\left(x^{MPTP}(z, \mu_X), z\right) \ge 0 \tag{2.302} \]
However, instead of carrying out the complete inverse FORM at each iteration, the MPTP is approximated as the result of the first iteration of the advanced mean value:
\[ \tilde{x}^{MPTP}(z, \mu_X) = T^{-1}\left( -\beta_{T_i} \, \nabla_u g_i|_{\mu_X}, \mu_X \right) \tag{2.303} \]
\[ \nabla_u g_i|_{\mu_X} = \left[ \frac{\partial g_i}{\partial u_1}\Big|_{u=\mu_X,z,\mu_X} \; \cdots \; \frac{\partial g_i}{\partial u_d}\Big|_{u=\mu_X,z,\mu_X} \right] \tag{2.304} \]
\[ g_i \equiv g_i\left(T^{-1}(u, \mu_X), z\right) \tag{2.305} \]
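Example 2.20 (An illustration of SLA) Consider the following RBDO problem:
\[ \begin{aligned} \min_{\mu_X} \quad & -\mu_{X_1} \mu_{X_2} \\ \text{s.t.} \quad & P\left[ 3 - \frac{X_1 + X_2}{\sqrt{2}} \le 0 \right] \le 10^{-1} \\ & 0 \le \mu_{X_i} \le 5 \\ & X_i \sim E(\mu_{X_i}) \end{aligned} \]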
Figure 2.17 shows the difference between the true and approximated MPTP in SLA.
Sequential Approaches
Another alternative to double loop approaches is referred to as sequential or decoupled approaches. Such techniques solve the RBDO problem through a succession of deterministic optimizations and reliability analyses.
Figure 2.17: Graphical illustration of SLA.
Sequential optimization and reliability assessment
The sequential optimization and reliability assessment (SORA) was introduced by Du and Chen (2004). In Du and Chen (2004), SORA is also restricted to the mean $\theta = \mu_X$ of the random variables $X$. The general idea of SORA is to use the MPTP from the previous iteration to shift the PMA constraints. At iteration $k$, the PMA constraints:
\[ g_i\left(x^{MPTP}(z, \mu_X), z\right) \ge 0 \]
are replaced by:
\[ g_i\left(\mu_X + s^k, z\right) \ge 0, \qquad s^k = \begin{cases} 0 & \text{if } k = 1 \\ x^{MPTP}\left(z, \mu_X^{k-1}\right) - \mu_X^{k-1} & \text{if } k > 1 \end{cases} \tag{2.308} \]
and $\mu_X^{k-1}$ is the solution of the optimization at iteration $k - 1$. Note how the constraint is always exact when the optimization restarts at iteration $k$ from the previous optimum $\mu_X^{k-1}$:
\[ g_i\left(\mu_X^{k-1} + s^k, z\right) = g_i\left(x^{MPTP}\left(z, \mu_X^{k-1}\right), z\right) \]
Figure 2.18: Iteration 1, 2 and 20 of SORA for Example 2.21. (a) Iteration 1. (b) Iteration 2. (c) Iteration 20.

Therefore, SORA solves the RBDO problem by a sequence of deterministic optimizations (based on a fixed shift, i.e., MPTP) and inverse FORM analyses (to update the MPTP, i.e., the shift).
Example 2.21 (An illustration of SORA) Recall Example 2.20. Figure 2.18 shows the first, second and twentieth iterations of SORA. At the first iteration, there is no shift and the optimum is on the limit state hyper-surface. The true stochastic constraint is largely violated, as shown by $x^{MPTP}(\mu_X^1)$ (the MPTP must be in the safe domain or on the limit state hyper-surface for the stochastic constraints (2.268) to be satisfied). At the second iteration, the optimum accounts for the shift but is too conservative. Figure 2.18(c) illustrates the twentieth iteration, when SORA has converged.
Taylor Expansion
Another decoupled approach was introduced simultaneously by Zou and Mahadevan (2006) and Cheng et al. (2006), although restricted to the mean $\theta = \mu_X$ of the random variables $X$ and no deterministic design variables $z$. In their contributions, Zou and Mahadevan and Cheng et al. solve the RBDO problem (2.266) through a sequence of reliability and optimization loops. At iteration $k$, the $i$th probability of failure $P_{f_i}(\mu_X)$ in (2.268) is evaluated at $\mu_X^k$ (using any technique in Section 2.4) and, using Section, a first-order Taylor expansion is built:
\[ \tilde{P}_{f_i}(\mu_X) = P_{f_i}\left(\mu_X^k\right) + \sum_{j=1}^{n_\theta} \frac{\partial P_{f_i}}{\partial \mu_{X_j}}\Big|_{\mu_X^k} \left( \mu_{X_j} - \mu_{X_j}^k \right) \]
This approximation is then used to carry out an optimization loop. Note that no extra call to the computationally expensive model $\mathcal{M}$ is made during this process. The solution of the optimization becomes $\mu_X^{k+1}$ and the process is repeated until convergence. Note that RIA and PMA could be used in place of (2.268).
In this last approach, Zou and Mahadevan and Cheng et al. had the idea to sequentially approximate the stochastic constraints with linear Taylor expansions. In FORM (and therefore RIA and PMA) and SORM, the limit state is approximated by a linear and a quadratic form respectively. This raises the question: are there better approximations (beyond linear and quadratic) that could be used? The focus of the next chapter of this dissertation is to answer this question.
Chapter 3 Supervised Learning for Computational Design

In the previous chapter, methods and approaches were discussed for computational design and decision making. It was also mentioned that some of these methods (Section 2.4.2, 2.4.3 and relied on approximations of the true limit state function (e.g., first-order Taylor expansion). However, such approximations can quickly lead to inaccurate results when the true limit state function is highly non-linear. The aim of the current chapter is to explore more advanced approximations.
Machine learning is a vast discipline in which some algorithm (machine) tries to learn an underlying behavior or dependence from a data set. Of specific interest in this work is a sub-category of machine learning referred to as supervised learning. For the purpose of computational design, the task of supervised learning is to approximate a computationally expensive model $\mathcal{M}$ using a limited data set. A meta-model of $\mathcal{M}$ is noted $\widetilde{\mathcal{M}}$. Depending on the discipline or scientific culture, $\widetilde{\mathcal{M}}$ is also referred to as a learner, predictor, surrogate, or approximation. Such meta-models $\widetilde{\mathcal{M}}$ are usually computationally inexpensive and can subsequently be used in place of the true model $\mathcal{M}$ in computationally intensive procedures such as CMC or SubSim estimates of probabilities of failure.
Fundamentals of Supervised-Learning
This work takes some liberties with the formal definitions of supervised learning for the sake of clarity. For a complete introduction to the field of machine learning, the interested reader is referred to Hastie et al. (2009) and Bishop (2006).
Problem Definition
Consider a computationally expensive model $\mathcal{M}$ with inputs $X$ and an output $Y$ such that (Section 1.1):
\[ Y = \mathcal{M}(X) \]
and two size $n$ samples $X_{tr}$ (referred to as training sample) and $Y_{tr}$ (referred to as training value sample) that form the training set $Tr = [X_{tr}, Y_{tr}]$ such that:
\[ y_{tr}^{(i)} = \mathcal{M}\left(x_{tr}^{(i)}\right), \quad i = [1, \ldots, n] \tag{3.2} \]
A meta-model is a function that, given the information from $Tr$, gives a prediction $\tilde{Y}$ of $Y$ at $X$ such that:
\[ \widetilde{\mathcal{M}}(X \mid Tr) = \tilde{Y} \approx Y = \mathcal{M}(X) \]
In addition, some meta-models also provide an estimate of the prediction standard error $\widetilde{\mathcal{M}}_{SE}(X \mid Tr)$. The case of multiple outputs $Y$ is treated either by considering each component independently, or through more advanced techniques that fall outside the scope of this work (e.g., co-Kriging).
Note that the notation for a sample $X$ (e.g., training sample) is the same as for a random sample, as the two notions are closely related. In addition, as a random sample $X$ is made of realizations $x^{(i)}$, a sample $X$ is made of instances $x^{(i)}$. Note that in the field of surrogate modeling, sample and samples are interchangeably used to refer to both the samples and instances of the machine learning literature. For the sake of clarity, the machine learning terminology is used hereafter.
Regression vs Classification
Based on how the output $y$ of the model $\mathcal{M}$ is treated by the meta-model $\widetilde{\mathcal{M}}$, the learning problem is referred to as:
• regression if $y$ is treated as continuous in $\mathbb{R}$;
• classification if $y$ is treated as binary (e.g., ±1).
The range of applications of classification techniques is wider, as it allows for the treatment of binary, discontinuous or non-differentiable problems. However, regression settings should be preferred when possible. Because more information is gathered (continuous vs binary), regression techniques typically show better performance. The performance of a meta-model $\widetilde{\mathcal{M}}$ is defined through a loss function.
Loss and Risk
For a prediction $\tilde{y}$ of $y$, a loss function $L$ is defined such that:
\[ L(y, \tilde{y}) \begin{cases} = 0 & \text{if } y = \tilde{y} \\ > 0 & \text{if } y \ne \tilde{y} \end{cases} \]
Two of the most common loss functions are:
• quadratic loss (regression):
\[ L(y, \tilde{y}) = (y - \tilde{y})^2 \]
• 0-1 loss (classification):
\[ L(y, \tilde{y}) = I[y \ne \tilde{y}] = \begin{cases} 0 & \text{if } y = \tilde{y} \\ 1 & \text{if } y \ne \tilde{y} \end{cases} \]
The risk of a meta-model $\widetilde{\mathcal{M}}$ is defined as the expectation of the loss $L$:
\[ R\left[\widetilde{\mathcal{M}}\right] = E\left[ L\left(Y, \tilde{Y}\right) \right] = \int_{-\infty}^{+\infty} L\left( \mathcal{M}(x), \widetilde{\mathcal{M}}(x \mid Tr) \right) f_X(x) \, dx \tag{3.7} \]
The risks associated with the previously mentioned loss functions are referred to as:
• mean square error (MSE):
\[ MSE\left[\widetilde{\mathcal{M}}\right] = E\left[ \left(Y - \tilde{Y}\right)^2 \right] \]
• misclassification error (ME):
\[ ME\left[\widetilde{\mathcal{M}}\right] = E\left[ I\left[Y \ne \tilde{Y}\right] \right] \]
The lower the risk of a meta-model $\widetilde{\mathcal{M}}$, the higher its predictive ability. Note that in engineering settings, the risk has a very different meaning and should not be confused with the machine learning risk.
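The two loss functions and the corresponding sample estimates of the risk can be written in a few lines. The sketch below is illustrative only: the function names and the small arrays of true and predicted values are made up for the example.

```python
import numpy as np

def quadratic_loss(y, y_pred):
    # Quadratic loss used in regression settings.
    return (np.asarray(y) - np.asarray(y_pred)) ** 2

def zero_one_loss(y, y_pred):
    # 0-1 loss used in classification settings: 1 for each wrong label.
    return (np.asarray(y) != np.asarray(y_pred)).astype(float)

def risk_estimate(loss, y, y_pred):
    # Sample average of the loss, i.e. a Monte Carlo estimate of the risk.
    return float(np.mean(loss(y, y_pred)))

# Made-up values, for illustration only.
mse = risk_estimate(quadratic_loss, [1.0, 2.0, 3.0], [1.0, 2.5, 2.0])
me = risk_estimate(zero_one_loss, [1, -1, 1, 1], [1, 1, 1, -1])
print(mse, me)
```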
Model Selection
Although omitted in the notations, the meta-model $\widetilde{\mathcal{M}}$ is usually also a function of hyper-parameters $\psi$ (different from the distribution hyper-parameters $\theta$). These hyper-parameters $\psi$ typically control the form or the complexity of the meta-model. Such hyper-parameters should always be chosen so as to minimize the risk (maximize the predictive ability). In general, the risk cannot be computed analytically and must be estimated.
Risk estimator
The easiest way to estimate the risk (3.7) is to use a CMC estimate based on the training set $Tr$:
\[ R_{emp}\left[\widetilde{\mathcal{M}}\right] = \frac{1}{n} \sum_{i=1}^{n} L\left( y_{tr}^{(i)}, \widetilde{\mathcal{M}}\left(x_{tr}^{(i)} \mid Tr\right) \right) \]
This estimate is referred to as empirical risk or training error. However, when the training error is used to estimate the risk, it is well known that over-fitting is likely to occur. The learner is so “eager” to correctly predict the training values that it degrades its predictive ability. This concept is a consequence of the bias-variance trade-off (Hastie et al., 2009). To address this issue, Vapnik and Chervonenkis (1974) introduced the notion of structural risk. The structural risk accounts for the meta-model complexity to update the empirical risk such that:
\[ R_{str}\left[\widetilde{\mathcal{M}}\right] = R_{emp}\left[\widetilde{\mathcal{M}}\right] + \lambda \, C\left[\widetilde{\mathcal{M}}\right] \tag{3.11} \]
where $\lambda$ is a penalization parameter (that can be seen as a new hyper-parameter) and $C$ is a penalizing function that increases with the meta-model complexity. This expression leads to a trade-off between fitting the training data (bias) and limiting the complexity of the meta-model (variance).
Another way to avoid over-fitting is to use a CMC estimate of the risk (3.7) based on a separate test set $Te$ such that:
\[ R_{gen}\left[\widetilde{\mathcal{M}}\right] = \frac{1}{n} \sum_{i=1}^{n} L\left( y_{te}^{(i)}, \widetilde{\mathcal{M}}\left(x_{te}^{(i)} \mid Tr\right) \right) \]
This estimate is referred to as generalization error. When the training error is used, all of the available data are used to train the meta-model and then used to assess its risk. On the other hand, when the generalization error is used, the data are split into two mutually exclusive sets $Tr$ and $Te$ and only $Tr$ is used to train the meta-model.
Example 3.1 (Graphical Illustration of Over-fitting) Consider the following function:
\[ f(x) = \ln(x) + \varepsilon, \quad \varepsilon \sim N(0, 1) \]
Figure 3.1 shows a graphical illustration of the over-fitting concept.
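The over-fitting mechanism can be reproduced numerically on a function of this form. The sketch below is not the experiment behind the figure: the sampling interval, the sample sizes and the polynomial degrees are arbitrary assumptions, and polynomial least squares stands in for a generic meta-model.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def sample(n):
    # Noisy observations of ln(x); the interval [0.5, 10] is an assumption.
    x = rng.uniform(0.5, 10.0, n)
    return x, np.log(x) + rng.normal(0.0, 1.0, n)

def mse(model, x, y):
    return float(np.mean((y - model(x)) ** 2))

x_tr, y_tr = sample(15)        # small training set
x_te, y_te = sample(1000)      # separate test set

errors = {}
for degree in (1, 10):
    model = Polynomial.fit(x_tr, y_tr, degree)   # least-squares polynomial fit
    errors[degree] = (mse(model, x_tr, y_tr), mse(model, x_te, y_te))
    print(f"degree {degree:2d}: training error {errors[degree][0]:.3f}, "
          f"generalization error {errors[degree][1]:.3f}")
```

The training error can only decrease when the degree increases, since the low-degree model is nested in the high-degree one, whereas the generalization error is free to grow; this gap is the signature of over-fitting.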
Large Data Set
When a large amount of data is available (such as in bioengineering settings), data can be split three ways into a training set Tr, a calibration set Tc and a validation set Te. The meta-model is trained on Tr, the hyper-parameters are selected so as to minimize the generalization error on Tc and the meta-model predictive ability is estimated as the generalization error on Te. However, when data are sparse (e.g., because M is a computationally expensive model), this process is suboptimal. One would like to use all of the available data for the training without relying on the empirical risk.
Leave-One-Out and Cross Validation
Under the assumption that the training sample $X_{tr}$ is representative of the population of $X$, one can show (Luntz and Brailovsky, 1969) that the leave-one-out (LOO) error is an almost unbiased estimator of the risk. The LOO error is defined as:
\[ R_{LOO}\left[\widetilde{\mathcal{M}}\right] = \frac{1}{n} \sum_{i=1}^{n} L\left( y_{tr}^{(i)}, \widetilde{\mathcal{M}}\left(x_{tr}^{(i)} \mid Tr^{(-i)}\right) \right) \]
where $Tr^{(-i)}$ is the training set $Tr$ from which the $i$th example has been discarded. The LOO procedure is actually a special case of a larger set of techniques referred to as cross-validation (CV). Various CV techniques (e.g., k-fold) or bootstrap (Section 2.2.7) approaches can be used to estimate the risk of a learner (Kohavi, 1995).

Figure 3.1: Graphical illustration of over-fitting for Example 3.1.
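The LOO error defined above can be computed with a generic loop over the training examples. The sketch below is illustrative: `loo_error` is a hypothetical helper, ordinary least squares on a made-up data set stands in for the meta-model, and the closed-form leave-one-out residuals of linear least squares are used only as a cross-check.

```python
import numpy as np

def loo_error(fit, predict, X, y, loss):
    """Leave-one-out estimate of the risk for a generic fit/predict pair."""
    n = len(y)
    losses = []
    for i in range(n):
        keep = np.arange(n) != i                 # discard the i-th example
        model = fit(X[keep], y[keep])
        losses.append(loss(y[i], predict(model, X[i:i + 1])[0]))
    return float(np.mean(losses))

# Ordinary least squares as the learner (illustrative choice).
fit_ols = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
predict_ols = lambda psi, X: X @ psi
quadratic = lambda y, y_pred: (y - y_pred) ** 2

rng = np.random.default_rng(3)
x = rng.uniform(0.0, 1.0, 30)
X = np.column_stack([np.ones_like(x), x])        # basis functions [1, x]
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.1, x.size)

r_loo = loo_error(fit_ols, predict_ols, X, y, quadratic)

# Cross-check: for linear least squares, the LOO residual is e_i / (1 - h_ii).
H = X @ np.linalg.inv(X.T @ X) @ X.T
residuals = y - H @ y
r_loo_closed_form = float(np.mean((residuals / (1.0 - np.diag(H))) ** 2))
print(r_loo, r_loo_closed_form)
```

The loop requires $n$ trainings of the meta-model, which is affordable here but motivates k-fold CV when training is expensive.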
Common Meta-Models
The following sections present a non-exhaustive list of meta-models. Important omissions include Bayesian linear regressions and relevance vector machines. Regarding these subjects, the interested reader is referred to Tipping (2001); Tipping and Faul (2003); Bishop (2006).
Linear Regressions
Arguably the most widely used approach for data-fitting is the linear regression model. This approach relies on a set of $m$ basis functions $f$ (monomials for polynomial regressions):
\[ f(x) = [f_1(x), \ldots, f_m(x)] \]
such that:
\[ \widetilde{\mathcal{M}}(x) = f(x) \, \psi^{\top} \]
where $\psi$ is a size $m$ vector of hyper-parameters. Note that linear regression refers to the fact that the meta-model is a linear combination of the basis functions $f$ rather than of the inputs $x$. As discussed in Section, the hyper-parameters $\psi$ should be chosen so as to minimize the risk. The $\psi$ values that minimize the training error under the quadratic loss (ordinary least-squares) can be obtained analytically as:
\[ \psi_{opt}^{\top} = \left( \Phi^{\top} \Phi \right)^{-1} \Phi^{\top} Y_{tr} \]
where $\Phi$ is the training matrix defined as:
\[ \Phi = \begin{bmatrix} f_1\left(x_{tr}^{(1)}\right) & \cdots & f_m\left(x_{tr}^{(1)}\right) \\ \vdots & \ddots & \vdots \\ f_1\left(x_{tr}^{(n)}\right) & \cdots & f_m\left(x_{tr}^{(n)}\right) \end{bmatrix} \]
The prediction and prediction standard error (under the hypothesis of Gaussian additive noise) are defined as (Goel et al., 2009; Forrester and Keane, 2009):
\[ \widetilde{\mathcal{M}}(x \mid Tr) = f(x) \, \psi_{opt}^{\top} \tag{3.19} \]
\[ \widetilde{\mathcal{M}}_{SE}(x \mid Tr) = \sqrt{ \sigma^2 \, f(x) \left( \Phi^{\top} \Phi \right)^{-1} f(x)^{\top} } \tag{3.20} \]
where:
\[ \sigma^2 = \frac{ \sum_{i=1}^{n} \left( y_{tr}^{(i)} - \widetilde{\mathcal{M}}\left(x_{tr}^{(i)} \mid Tr\right) \right)^2 }{ n - m } \]
According to the discussion in Section, this meta-model could over-fit, as the $\psi$ values are chosen based on the empirical risk. Approaches such as the one discussed in Sections and can be applied. Structural risk minimization (also referred to as shrinkage) approaches have also been investigated (Tibshirani, 1996):
\[ C\left[\widetilde{\mathcal{M}}\right] = \sum_{i=1}^{m} |\psi_i|^q \]
where $q = 2$ and $q = 1$ are referred to as Tikhonov (or ridge) regression and least absolute shrinkage and selection operator (lasso) respectively. Efficient algorithms to compute the entire lasso path (analytical expression of the optimal $\psi$ as a function of the regularization parameter $\lambda$) have been proposed, such as least angle regression (LARS, Efron et al., 2004). $\lambda$ can then be selected as discussed in Sections and
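The closed-form expressions of this section translate directly into matrix operations. The sketch below is illustrative only: it uses a made-up noise-free data set with monomial basis functions $[1, x]$, hypothetical function names, and a ridge variant in which the penalty weight `lam` is applied to the sum of squared residuals (so the $1/n$ factor of the empirical risk is absorbed into `lam`).

```python
import numpy as np

def fit_ols(Phi, y):
    """Ordinary least squares: optimal coefficients and noise variance estimate."""
    gram_inv = np.linalg.inv(Phi.T @ Phi)
    psi = gram_inv @ Phi.T @ y
    n, m = Phi.shape
    sigma2 = float(np.sum((y - Phi @ psi) ** 2) / (n - m))
    return psi, gram_inv, sigma2

def predict_ols(f_x, psi, gram_inv, sigma2):
    """Prediction and prediction standard error at the rows of f_x."""
    mean = f_x @ psi
    # quadratic form f(x) (Phi^T Phi)^{-1} f(x)^T evaluated row by row
    se = np.sqrt(sigma2 * np.einsum("ij,jk,ik->i", f_x, gram_inv, f_x))
    return mean, se

def fit_ridge(Phi, y, lam):
    # Minimizes ||y - Phi psi||^2 + lam * ||psi||^2 (q = 2 penalty).
    m = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(m), Phi.T @ y)

x = np.linspace(0.0, 1.0, 11)
Phi = np.column_stack([np.ones_like(x), x])      # basis functions [1, x]
y = 1.0 + 2.0 * x                                # made-up, noise-free data

psi_ols, gram_inv, sigma2 = fit_ols(Phi, y)
mean, se = predict_ols(np.array([[1.0, 0.5]]), psi_ols, gram_inv, sigma2)
psi_ridge = fit_ridge(Phi, y, lam=1.0)
print(psi_ols, mean, se, psi_ridge)
```

With `lam = 0` the ridge solution coincides with ordinary least squares, and increasing `lam` shrinks the coefficients, which is the behavior the structural risk is meant to enforce.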
Radial Basis Functions Networks
Radial basis functions (RBFs) networks (Broomhead and Lowe, 1988) are a special case of linear regression. A basis function $f_r$ is said to be radial if it is a function of the distance between $x$ and its center $c$. This distance is usually the Euclidean one but does not have to be. The Gaussian RBF is arguably the most widely used and is defined as:
\[ f_r(x, c) = e^{ -\frac{\|x - c\|^2}{2\sigma^2} } \]
For a size $n$ training set $Tr$, an RBFs network is a linear regression problem where the basis functions are obtained by centering an RBF on each training example such that:
\[ f(x) = \left[ f_r\left(x, x_{tr}^{(1)}\right), \ldots, f_r\left(x, x_{tr}^{(n)}\right) \right] \tag{3.24} \]
The RBFs are sometimes replaced by their normalized counterpart (Bishop, 2006):
\[ f_r^s\left(x, x_{tr}^{(i)}\right) = \frac{ f_r\left(x, x_{tr}^{(i)}\right) }{ \sum_{j=1}^{n} f_r\left(x, x_{tr}^{(j)}\right) } \tag{3.25} \]
which mostly changes the behavior away from the training examples. Most RBFs have a scale parameter $\sigma$ that needs to be tuned using approaches such as the ones discussed in Sections and
Example 3.2 (Normalized RBF) Figure 3.2 illustrates the differences between regular and normalized Gaussian RBF ($\sigma = 1$) for a training sample $X$ defined as:
\[ X = [3, 5, 8]^{\top} \tag{3.26} \]

Figure 3.2: Comparison of regular and normalized RBF on a one dimensional example. (a) Regular RBF. (b) Normalized RBF.
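The difference between the two families of basis functions can be evaluated numerically for the same centers. The following sketch is a one dimensional illustration with hypothetical function names; the evaluation points are arbitrary.

```python
import numpy as np

def gaussian_rbf(x, centers, sigma=1.0):
    # One column per center: exp(-(x - c)^2 / (2 sigma^2)).
    x = np.atleast_1d(np.asarray(x, dtype=float))[:, None]
    return np.exp(-((x - centers[None, :]) ** 2) / (2.0 * sigma ** 2))

def normalized_rbf(x, centers, sigma=1.0):
    # Each row is rescaled so that the basis functions sum to one.
    phi = gaussian_rbf(x, centers, sigma)
    return phi / phi.sum(axis=1, keepdims=True)

centers = np.array([3.0, 5.0, 8.0])        # training sample of the example
x_eval = np.array([3.0, 6.5, 20.0])        # arbitrary evaluation points

regular = gaussian_rbf(x_eval, centers)
normalized = normalized_rbf(x_eval, centers)
print(np.round(regular, 3))
print(np.round(normalized, 3))
```

Far from the training sample (here at $x = 20$), all regular RBFs vanish, whereas the normalized RBF attached to the closest center tends to one, which is the change in behavior away from the training examples mentioned above.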
Polynomial Chaos Expansions
Made popular by Soize and Ghanem (2004), polynomial chaos expansions (PCEs) can be seen as another special case of linear regression:
\[ \widetilde{\mathcal{M}}(x) = f(x) \, \psi^{\top} \]
In this framework, the basis functions $f_i$ are chosen depending on the joint distribution of the inputs $X$. For the sake of simplicity, let us first consider the case of a univariate $X$ with PDF $f_X$. In the PCE framework, the basis functions must belong to an orthonormal polynomial basis $\pi_{(\cdot)}$ such that:
\[ \int_{-\infty}^{+\infty} \pi_i(x) \, \pi_j(x) \, f_X(x) \, dx = \delta_{ij} \tag{3.28} \]
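where $\delta_{ij}$ is the Kronecker symbol and $\pi_i$ (i.e., $f_i$ in this case) is the ith order orthonormal polynomial of X. For example, if $X \sim N(0, 1^2)$, the Hermite form should be used (Soize and Ghanem, 2004):
$$ \pi_i(x) = \frac{He_i(x)}{\sqrt{i!}}, \qquad He_i(x) = \begin{cases} 1 & \text{if } i = 0 \\ x & \text{if } i = 1 \\ x\,He_{i-1}(x) - (i-1)\,He_{i-2}(x) & \text{if } i \geq 2 \end{cases} $$
where the recurrence is written on the unnormalized Hermite polynomials $He_i$, whose squared norm with respect to $N(0, 1^2)$ is $i!$.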
Various families of polynomials have been matched with given distributions (e.g., Legendre polynomials for uniform distributions). More details can be found in Soize and Ghanem (2004). For an independent multivariate X, the basis functions are defined as the product of the univariate polynomials. For example, in a bivariate case $X_1$ and $X_2$, where $\pi^{(1)}_{(\cdot)}$ and $\pi^{(2)}_{(\cdot)}$ are the orthonormal bases associated with $X_1$ and $X_2$ respectively, the ith basis function $f_i$ of degree k and l with respect to $X_1$ and $X_2$ respectively is defined as $f_i(x) = \pi^{(1)}_k(x_1) \times \pi^{(2)}_l(x_2)$.
Historically, there are two forms of implementation of PCE, referred to as intrusive and non-intrusive. In the intrusive approach, the computationally expensive model is not used as a black-box. Rather, one goes back to the governing equations behind the model and introduces a PCE to re-express the outputs. This approach requires significant effort, but has led to very efficient approaches in the field of stochastic finite elements. The non-intrusive approach refers to the conventional data-fitting setup and is far less involved to deploy. In-depth readings on PCE include:
• Berveiller (2005, in French) for a discussion on intrusive and non-intrusive approaches;
• Blatman (2009) and Blatman and Sudret (2010, 2011) for application of structural risk minimization approaches, such as stepwise regression and LARS-lasso.
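The orthonormality condition (3.28) can be checked numerically for the standard Gaussian case. The sketch below is only illustrative: it assumes NumPy, builds the unnormalized probabilists' Hermite polynomials by their three-term recurrence, scales each one by $1/\sqrt{i!}$, and evaluates the integral with a Gauss-Hermite quadrature rule. The function name `orthonormal_hermite` and the chosen maximum degree are assumptions made for this example.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def orthonormal_hermite(x, max_degree):
    """Return an array whose row i holds pi_i(x) for i = 0..max_degree.

    The recurrence is applied to the unnormalized Hermite polynomials He_i,
    and each polynomial is then divided by sqrt(i!).
    """
    x = np.asarray(x, dtype=float)
    he = [np.ones_like(x), x.copy()]
    for i in range(2, max_degree + 1):
        he.append(x * he[i - 1] - (i - 1) * he[i - 2])
    # cumprod([1, 1, 2, ..., max_degree]) gives 0!, 1!, ..., max_degree!
    norms = np.sqrt(np.cumprod([1.0] + list(range(1, max_degree + 1))))
    return np.array(he[: max_degree + 1]) / norms[:, None]

# Quadrature rule for the weight exp(-x^2 / 2); dividing the weights by
# sqrt(2 pi) turns the rule into an expectation under N(0, 1).
nodes, weights = hermegauss(20)
weights = weights / np.sqrt(2.0 * np.pi)

basis = orthonormal_hermite(nodes, max_degree=5)
gram = (basis * weights) @ basis.T   # gram[i, j] approximates E[pi_i(X) pi_j(X)]
```

Under these assumptions, `gram` should be numerically close to the identity matrix, which is the discrete counterpart of (3.28).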
Gaussian Processes
Gaussian processes (GPs), or Kriging, have been growing in popularity within the fields of surrogate modeling and machine learning. Various introductions to GPs have been proposed, such as Krige (1951); Matheron (1963); Sacks et al. (1989); Cressie (1990, 1993); Forrester et al. (2008). In this work, the setting of Rasmussen (2004) is used. Recall the definition of a Gaussian random process as discussed in Example 2.3. The idea of GPs for machine learning is to see the output $y = M(x)$ as a realization of a Gaussian random process. The following setting hence aims at learning the underlying Gaussian random process.
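A random process R is said to be Gaussian:
$$ R \sim GP(m, k) $$
if:
$$ Y(x) \sim N\left(m(x), k(x, x)\right) $$
where m is a mean function and k a covariance function. Therefore, the size n training value sample can be seen as one realization of the random vector $Y_{tr}$ such that:
$$ Y_{tr} \sim N_n\left(\mu_{tr}, \Sigma_{tr}\right) $$
$$ \mu_{tr} = \left[ m\left(x_{tr}^{(1)}\right), \ldots, m\left(x_{tr}^{(n)}\right) \right] $$
$$ \Sigma_{tr} = \begin{bmatrix} k\left(x_{tr}^{(1)}, x_{tr}^{(1)}\right) & \cdots & k\left(x_{tr}^{(1)}, x_{tr}^{(n)}\right) \\ \vdots & \ddots & \vdots \\ k\left(x_{tr}^{(n)}, x_{tr}^{(1)}\right) & \cdots & k\left(x_{tr}^{(n)}, x_{tr}^{(n)}\right) \end{bmatrix} $$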
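For a size $n_t$ test set Te, a similar joint distribution can be expressed:
$$ \left[Y_{tr}, Y_{te}\right] \sim N_{n+n_t}\left( \left[\mu_{tr}, \mu_{te}\right], \begin{bmatrix} \Sigma_{tr} & \Sigma^T \\ \Sigma & \Sigma_{te} \end{bmatrix} \right) \tag{3.33} $$
where $\mu_{te}$ and $\Sigma_{te}$ are defined like $\mu_{tr}$ and $\Sigma_{tr}$ respectively but for Te. Σ is the covariance matrix between the training and test instances:
$$ \Sigma = \begin{bmatrix} k\left(x_{te}^{(1)}, x_{tr}^{(1)}\right) & \cdots & k\left(x_{te}^{(1)}, x_{tr}^{(n)}\right) \\ \vdots & \ddots & \vdots \\ k\left(x_{te}^{(n_t)}, x_{tr}^{(1)}\right) & \cdots & k\left(x_{te}^{(n_t)}, x_{tr}^{(n)}\right) \end{bmatrix} \tag{3.34} $$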
The random vector $Y_{te}$ is the one we wish to predict. In other words, we would like to know the distribution of $Y_{te}$ knowing the observed $\mathbf{Y}_{tr}$. Neal (1999) derived the expression of a conditional Gaussian distribution such that
$$ Y_{te} | Y_{tr} \sim N_{n_t}\left( \mu_{te} + \left(\mathbf{Y}_{tr}^T - \mu_{tr}\right) \Sigma_{tr}^{-1} \Sigma^T,\; \Sigma_{te} - \Sigma \Sigma_{tr}^{-1} \Sigma^T \right) \tag{3.35} $$
The prediction and prediction standard error are defined as:
$$ \widetilde{M}(x|Tr) = m(x) + \left(\mathbf{Y}_{tr}^T - \mu_{tr}\right) \Sigma_{tr}^{-1} \Sigma_x^T \tag{3.36} $$
$$ \widetilde{M}_{SE}(x|Tr) = \sqrt{k(x, x) - \Sigma_x \Sigma_{tr}^{-1} \Sigma_x^T} \tag{3.37} $$
where:
$$ \Sigma_x = \left[ k\left(x, x_{tr}^{(1)}\right), \ldots, k\left(x, x_{tr}^{(n)}\right) \right] \tag{3.38} $$
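The prediction (3.36) and the prediction standard error (3.37) translate almost directly into linear algebra. The sketch below is a simplified illustration, not the implementation used in this work: it assumes NumPy, a constant mean function, an isotropic Gaussian kernel, no noise term, and a small hypothetical `jitter` added to the diagonal for numerical stability. All function and variable names are illustrative.

```python
import numpy as np

def gaussian_kernel(a, b, sigma):
    """Isotropic Gaussian kernel matrix between the rows of a and b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def gp_predict(x_new, x_tr, y_tr, mean=0.0, sigma_y2=1.0, sigma=1.0, jitter=1e-10):
    """Prediction (3.36) and standard error (3.37) for a constant mean function."""
    cov_tr = sigma_y2 * gaussian_kernel(x_tr, x_tr, sigma) + jitter * np.eye(len(x_tr))
    cov_x = sigma_y2 * gaussian_kernel(x_new, x_tr, sigma)   # one row Sigma_x per new point
    weights = np.linalg.solve(cov_tr, cov_x.T)               # Sigma_tr^{-1} Sigma_x^T
    pred = mean + (y_tr - mean) @ weights
    # k(x, x) = sigma_y2 for this kernel when there is no noise term
    var = sigma_y2 - np.einsum("ij,ji->i", cov_x, weights)
    return pred, np.sqrt(np.clip(var, 0.0, None))

x_tr = np.array([[0.0], [1.0], [2.5], [4.0]])    # illustrative training inputs
y_tr = np.sin(x_tr[:, 0])                        # illustrative training outputs
pred_tr, se_tr = gp_predict(x_tr, x_tr, y_tr)
pred_far, se_far = gp_predict(np.array([[50.0]]), x_tr, y_tr)
```

In this setting the predictor interpolates the training values with a standard error close to zero, while far from the training set it falls back to the mean function with a standard error close to $\sigma_Y$.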
Regarding the choice of the mean function m, any meta-model can be used. Notably, the recent contribution of Kersaudy et al. (2015) used a PCE as a mean function m of a GP. Traditionally, linear regression models are used, the most common being a simple constant term (i.e., linear regression with a single basis function $f(x) = 1$). The mean function hyper-parameters are noted $\psi^m$. Regarding the choice of the covariance function k, a widely used approach is to use a form such that:
$$ k(x, x') = \sigma_Y^2 K(x, x') + \sigma_n^2 I[x = x'] $$
where $\sigma_Y^2$ is seen as the overall process variance, $\sigma_n^2$ is the inherent (noise) variance in the underlying function and K is a kernel function. $\sigma_Y^2$ could be specified but is usually left to be tuned. When there is no noise in the data and interpolation is desired, $\sigma_n^2$ is set to 0. Otherwise, $\sigma_n^2$ can either be specified based on previous knowledge, or left to be tuned. The kernel K is typically parametrized by a set of hyper-parameters $\psi^K$ and is usually defined such that:
$$ K(x, x) = 1 \tag{3.40} $$
$$ \lim_{d \to \infty} K(x, x') = 0, \qquad d = ||x - x'|| \tag{3.41} $$
In other words, the closer x and x', the higher their correlation. A widely used kernel function is the isotropic Gaussian one:
$$ K(x, x') = e^{-\frac{||x - x'||^2}{2\sigma^2}} $$
where σ is the “radius of influence” of the kernel (and the only kernel hyper-parameter $\psi^K$). This kernel is called isotropic as a single σ is used for all dimensions. An anisotropic Gaussian kernel reads:
$$ K(x, x') = \exp\left[ -\sum_{i=1}^{d} \frac{(x_i - x'_i)^2}{2\sigma_i^2} \right] \tag{3.43} $$
The set of hyper-parameters ψ of the GP approach is made of $\psi^m$, $\sigma_Y^2$, $\sigma_n^2$ and $\psi^K$.
Hyper-parameters selection
These hyper-parameters can be selected using approaches discussed in Sections and However, a more elegant approach relies on the notion of maximum likelihood (Section 2.2.4). The likelihood is defined as:
$$ L(\psi | X_{tr}) = f_{Y_{tr}}\left(\mathbf{Y}_{tr}^T | \psi\right) \tag{3.44} $$
where $f_{Y_{tr}}$ is defined in (3.32). The log-likelihood reads:
$$ l(\psi | X_{tr}) = \ln L(\psi | X_{tr}) \propto -\frac{1}{2} \ln |\Sigma_{tr}| - \frac{1}{2} \left(\mathbf{Y}_{tr}^T - \mu_{tr}\right) \Sigma_{tr}^{-1} \left(\mathbf{Y}_{tr}^T - \mu_{tr}\right)^T \tag{3.45} $$
where $|\Sigma_{tr}|$ is the determinant of $\Sigma_{tr}$. Note that one can derive analytical expressions of the log-likelihood derivatives (Rasmussen, 2004). This expression of the log-likelihood highlights the idea of structural risk. The second term serves as an equivalent to the empirical risk, while the determinant of $\Sigma_{tr}$ acts as a complexity penalty term (Rasmussen, 2004). The hyper-parameters ψ are chosen so as to maximize the log-likelihood. Under the assumption that the mean function m fits the linear regression setting introduced in Section 3.2.1, that is:
$$ m(x) = \sum_{i=1}^{n_m} f_i(x)\,\psi_i^m $$
the MLE problem can be simplified. If the kernel is re-expressed as:
$$ k(x, x') = \sigma_Y^2 \left( K(x, x') + \delta_n I[x = x'] \right) $$
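where $\delta_n = \sigma_n^2 / \sigma_Y^2$ is tuned instead of $\sigma_n^2$, one can show (Welch et al., 1992; Dubourg, 2011) that the optimal values of $\psi^m$ and $\sigma_Y^2$ are the generalized least squares estimates:
$$ \psi^{mT}\left(\psi^K, \delta_n\right) = \left(F^T R^{-1} F\right)^{-1} F^T R^{-1} \mathbf{Y}_{tr} \tag{3.48} $$
$$ \sigma_Y^2\left(\psi^K, \delta_n\right) = \frac{1}{n} \left(\mathbf{Y}_{tr} - F\psi^{mT}\right)^T R^{-1} \left(\mathbf{Y}_{tr} - F\psi^{mT}\right) \tag{3.49} $$
where F and R are the matrix of basis functions f and the matrix of correlations, respectively, evaluated on the training set Tr such that:
$$ F = \begin{bmatrix} f_1\left(x_{tr}^{(1)}\right) & \cdots & f_{n_m}\left(x_{tr}^{(1)}\right) \\ \vdots & \ddots & \vdots \\ f_1\left(x_{tr}^{(n)}\right) & \cdots & f_{n_m}\left(x_{tr}^{(n)}\right) \end{bmatrix} $$
$$ R = \begin{bmatrix} K\left(x_{tr}^{(1)}, x_{tr}^{(1)}\right) + \delta_n & \cdots & K\left(x_{tr}^{(1)}, x_{tr}^{(n)}\right) \\ \vdots & \ddots & \vdots \\ K\left(x_{tr}^{(n)}, x_{tr}^{(1)}\right) & \cdots & K\left(x_{tr}^{(n)}, x_{tr}^{(n)}\right) + \delta_n \end{bmatrix} $$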
Plugging (3.48) and (3.49) into the log-likelihood (3.45) leads to the reduced log-likelihood rl:
$$ rl\left(\psi^K, \delta_n | X_{tr}\right) = -\frac{n}{2} \ln \sigma_Y^2 - \frac{1}{2} \ln |R| $$
or the reduced likelihood rL:
$$ rL\left(\psi^K, \delta_n | X_{tr}\right) = -\sigma_Y^2\, |R|^{\frac{1}{n}} $$
Maximizing the reduced log-likelihood is significantly computationally less involved. Additionally, GP can be used for classification (Rasmussen and Williams, 2006; Bishop, 2006).
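To make the reduction concrete, the following sketch evaluates the generalized least squares estimates (3.48) and (3.49) and the resulting reduced log-likelihood for a given kernel scale. It is a minimal illustration under several assumptions that are not part of the derivation above: NumPy is used, the mean is a single constant basis function, the kernel is the isotropic Gaussian one, $\delta_n$ is fixed to a small value, and σ is selected by a crude grid search rather than a dedicated optimizer.

```python
import numpy as np

def reduced_log_likelihood(sigma, x_tr, y_tr, delta_n=1e-8):
    """Reduced log-likelihood for a constant mean (F is a column of ones)."""
    n = len(y_tr)
    d2 = ((x_tr[:, None, :] - x_tr[None, :, :]) ** 2).sum(axis=2)
    R = np.exp(-d2 / (2.0 * sigma ** 2)) + delta_n * np.eye(n)
    F = np.ones((n, 1))
    Ri_F = np.linalg.solve(R, F)
    Ri_y = np.linalg.solve(R, y_tr)
    psi_m = np.linalg.solve(F.T @ Ri_F, F.T @ Ri_y)       # (3.48)
    resid = y_tr - F @ psi_m
    sigma_y2 = resid @ np.linalg.solve(R, resid) / n      # (3.49)
    _, logdet_R = np.linalg.slogdet(R)
    rl = -0.5 * n * np.log(sigma_y2) - 0.5 * logdet_R
    return rl, psi_m, sigma_y2

x_tr = np.linspace(0.0, 5.0, 8)[:, None]      # illustrative training inputs
y_tr = np.sin(x_tr[:, 0])                     # illustrative training outputs
sigma_grid = np.linspace(0.2, 1.5, 14)        # hypothetical search grid for sigma
values = [reduced_log_likelihood(s, x_tr, y_tr)[0] for s in sigma_grid]
best_sigma = sigma_grid[int(np.argmax(values))]
```

Only the kernel hyper-parameter is searched here; $\psi^m$ and $\sigma_Y^2$ follow in closed form at each candidate σ, which is what makes the reduced problem cheaper than the full one.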
Support Vector Machines
As described in Vapnik and Chervonenkis (1974), support vector machines (SVMs) are built around the notion of structural risk. In binary classification, the goal is to predict if an instance belongs to one class or the other. The initial idea of SVM is to seek a hyperplane that best separates the two classes (e.g., ±1) over a size n training sample $X_{tr}$. Consider Figure 3.3(a). The blue and red points represent the different classes. The dotted green line and solid purple line both properly separate the data, yet, intuitively, the purple one seems more appropriate.
Figure 3.3: An illustration of the SVM general idea and basic elements. (a) Two separating hyper-planes. (b) Graphical illustration of SVM basic elements.
Linear SVM
In a d dimensional space, a linear SVM is of the form:
$$ \widetilde{M}(x|Tr) = w x^T + b \tag{3.54} $$
where w is the normal vector to the separating hyperplane and b is referred to as the bias. The predicted label at x is defined as:
$$ \widetilde{l}(x|Tr) = \begin{cases} +1 & \text{if } \widetilde{M}(x|Tr) > 0 \\ -1 & \text{if } \widetilde{M}(x|Tr) \leq 0 \end{cases} \tag{3.55} $$
w and b form the set of hyper-parameters ψ of a linear SVM. Strictly speaking, one could use the methods discussed in Sections and to tune them, using a 0-1 loss function $L(l, \widetilde{l})$. However, Vapnik and Chervonenkis (1974) proposed a more elegant approach. The so-called SVM boundary (Figure 3.3(b), solid purple line) is defined as:
$$ \widetilde{M}(x|Tr) = 0 \tag{3.56} $$
while the margin refers to the region bounded by the two isovalues (Figure 3.3(b), dashed green lines):
$$ \widetilde{M}(x|Tr) = -1 \quad \text{and} \quad \widetilde{M}(x|Tr) = +1 $$
One can show that the size of the margin (Figure 3.3(b), double-arrowed line) is equal to $2 / ||w||$. The idea of a maximum-margin hyperplane, which is at the core of SVM, is to search for the hyperplane that:
• classifies the training instances properly;
• has the largest margin;
• has no training instances within its margin.
Primal and Dual Form
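These conditions can be expressed as an optimization problem, referred to as primal form, defined as:
$$ \begin{aligned} \min_{w, b} \quad & \frac{1}{2} ||w|| \\ \text{s.t.} \quad & l_{tr}^{(i)}\, \widetilde{M}\left(x_{tr}^{(i)} | Tr\right) \geq 1 \qquad i = [1, \ldots, n] \end{aligned} \tag{3.58} $$
where $l_{tr}^{(i)}$ is the ith training label defined as:
$$ l_{tr}^{(i)} = \begin{cases} +1 & \text{if } y_{tr}^{(i)} > 0 \\ -1 & \text{if } y_{tr}^{(i)} \leq 0 \end{cases} $$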
The optimization problem (3.58) has elements from the structural risk concept. The constraints ensure proper fitting of the training instances (consistent with the idea of empirical risk) while the objective function maximizes the margin. The higher the margin, the lower the complexity of the learner (Vapnik and Chervonenkis, 1974). Therefore, it acts as the equivalent of a penalization term. Let α be the size n vector of Lagrange multipliers associated with the constraints in (3.58). One can show that:
$$ w = \sum_{i=1}^{n} \alpha_i\, l_{tr}^{(i)}\, x_{tr}^{(i)} \tag{3.60} $$
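Substituting (3.60) into (3.58) leads to a new optimization problem, referred to as dual form, defined as:
$$ \begin{aligned} \max_{\alpha} \quad & \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j\, l_{tr}^{(i)} l_{tr}^{(j)}\, x_{tr}^{(i)} x_{tr}^{(j)T} \\ \text{s.t.} \quad & 0 \leq \alpha_i \qquad i = [1, \ldots, n] \\ & \sum_{i=1}^{n} \alpha_i\, l_{tr}^{(i)} = 0 \end{aligned} \tag{3.61} $$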
From the definition of Lagrange multipliers, only the $\alpha_i$ associated with examples on the margin hyperplanes will have non-zero values (active constraints). These training instances are referred to as the $n_{SV}$ support vectors $x_{SV}$ (Figure 3.3(b), circles, as opposed to squares, which are regular training instances). Using the solution from (3.61), w and b can be reconstructed from (3.60) and from the fact that $l_{SV}^{(i)} \left(w x_{SV}^{(i)T} + b\right) = 1$ for every support vector:
$$ b = \frac{1}{n_{SV}} \sum_{i=1}^{n_{SV}} \left( l_{SV}^{(i)} - w x_{SV}^{(i)T} \right) $$
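The reconstruction of w and b from a dual solution can be illustrated on a toy case small enough to be solved by hand. The sketch below assumes NumPy and a hypothetical training set with one instance per class; with two instances, the equality constraint of (3.61) forces both multipliers to share a common value a, and the dual objective reduces to $2a - \frac{1}{2} a^2 ||x^{(1)} - x^{(2)}||^2$, which is maximized at $a = 2 / ||x^{(1)} - x^{(2)}||^2$. The bias is taken as the average of $l - w x^T$ over the support vectors, which follows from $l (w x^T + b) = 1$ on the margin. The helper name `reconstruct_hyperplane` is illustrative.

```python
import numpy as np

def reconstruct_hyperplane(alpha, x_tr, l_tr, tol=1e-12):
    """Recover w from (3.60) and b by averaging l - w x^T over the support vectors."""
    w = (alpha * l_tr) @ x_tr
    sv = alpha > tol                      # support vectors have non-zero multipliers
    b = np.mean(l_tr[sv] - x_tr[sv] @ w)
    return w, b

# Hypothetical separable data: one instance per class, both are support vectors.
x_tr = np.array([[1.0, 1.0], [-1.0, -1.0]])
l_tr = np.array([1.0, -1.0])

# Closed-form dual solution for the two-instance case described above.
a = 2.0 / np.sum((x_tr[0] - x_tr[1]) ** 2)
alpha = np.array([a, a])

w, b = reconstruct_hyperplane(alpha, x_tr, l_tr)
margin = 2.0 / np.linalg.norm(w)          # size of the margin, 2 / ||w||
```

For this data set the margin equals the distance between the two instances, and both instances satisfy the constraint of (3.58) with equality, as expected for support vectors.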
Non-linear SVM
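An interesting consequence arises when (3.60) is substituted into (3.54), leading to:
$$ \begin{aligned} \widetilde{M}(x|Tr) &= \sum_{i=1}^{n} \alpha_i\, l_{tr}^{(i)}\, x_{tr}^{(i)} x^T + b \\ &= \sum_{i=1}^{n} \alpha_i\, l_{tr}^{(i)}\, K\left(x_{tr}^{(i)}, x\right) + b \end{aligned} \tag{3.63} $$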
where K is any kernel function with hyper-parameters $\psi^K$. This is the so-called kernel trick for non-linear SVM. A linear SVM simply uses the classic dot product as a kernel. The use of kernels can be seen as a mapping to a feature space (of finite dimension, e.g., polynomials, or infinite dimension, e.g., Gaussian kernel). The Gaussian kernel is arguably the most popular one:
$$ K(x, x') = e^{-\frac{||x - x'||^2}{2\sigma^2}} $$
where σ is the only kernel hyper-parameter $\psi^K$. Using the same idea, (3.61) becomes:
$$ \begin{aligned} \max_{\alpha} \quad & \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j\, l_{tr}^{(i)} l_{tr}^{(j)}\, K\left(x_{tr}^{(i)}, x_{tr}^{(j)}\right) \\ \text{s.t.} \quad & 0 \leq \alpha_i \qquad i = [1, \ldots, n] \\ & \sum_{i=1}^{n} \alpha_i\, l_{tr}^{(i)} = 0 \end{aligned} $$
Soft Margin
A traditional introduction to soft margin SVMs comes from the idea of non separable data and has been covered numerous times in reference textbooks (Gunn, 1998; Vapnik, 2000; Christianini and Taylor, 2000; Schölkopf and Smola, 2002). In this work, we approach the soft margin differently. Recall the primal form (3.58) and the structural risk definition (3.11). In the primal form, the constraints have to be satisfied, hence the empirical risk is either zero or the primal form has no solution. Therefore, SVM minimizes the complexity such that the empirical risk is zero, which is only a sub-case of structural risk minimization. In order for SVM to offer a proper trade-off between data fitting and complexity, slack variables $\xi_i$, which are equivalent to a loss function evaluated at the training instances $x_{tr}^{(i)}$, are introduced such that:
$$ \begin{aligned} \min_{w, b} \quad & \sum_{i=1}^{n} \xi_i^q + \lambda\, ||w|| \\ \text{s.t.} \quad & l_{tr}^{(i)}\, \widetilde{M}\left(x_{tr}^{(i)} | Tr\right) \geq 1 - \xi_i \qquad i = [1, \ldots, n] \\ & \xi_i \geq 0 \end{aligned} \tag{3.66} $$
In this form, (3.66) is an exact structural risk minimization problem, where $\xi_i^q$ is the loss function evaluated at the ith training example and $||w||$ is the complexity function C. For q = 1 and q = 2, the meta-model is referred to as L1-SVM and L2-SVM respectively. In the remainder of this work, L1-SVMs are used. For convenience, the primal form (3.66) is rewritten as:
$$ \begin{aligned} \min_{w, b} \quad & \frac{1}{2} ||w|| + C \sum_{i=1}^{n} \xi_i \\ \text{s.t.} \quad & l_{tr}^{(i)}\, \widetilde{M}\left(x_{tr}^{(i)} | Tr\right) \geq 1 - \xi_i \qquad i = [1, \ldots, n] \\ & \xi_i \geq 0 \end{aligned} \tag{3.67} $$
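where C is referred to as the cost parameter. The general purpose dual form simply reads:
$$ \begin{aligned} \max_{\alpha} \quad & \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j\, l_{tr}^{(i)} l_{tr}^{(j)}\, K\left(x_{tr}^{(i)}, x_{tr}^{(j)}\right) \\ \text{s.t.} \quad & 0 \leq \alpha_i \leq C \qquad i = [1, \ldots, n] \\ & \sum_{i=1}^{n} \alpha_i\, l_{tr}^{(i)} = 0 \end{aligned} \tag{3.68} $$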
This is a quadratic optimization problem in terms of α. General purpose quadratic programming algorithms can be used to solve (3.68); however, they are usually not the most efficient choice. Specific algorithms have been proposed, such as the highly efficient LIBSVM (Hsu et al., 2003; Chang and Lin, 2011). It should also be noted that the entire regularization path (α values as a function of C) can be obtained based on the fact that the $\alpha_i$ are piecewise linear functions of C (Hastie et al., 2004). Finally, Chapelle (2007) explored the possibility of training non-linear soft-margin SVMs in the primal.
Model Selection
At this point, the kernel hyper-parameters $\psi^K$ and the cost parameter C still need to be tuned. They could be selected using methods described in Sections and However, more elegant approaches have been proposed over the years. A very good overview of hyper-parameter selection was proposed by Chapelle et al. (2002). Specifically, upper bounds of the LOO error as a by-product of the dual form (3.68) (although strictly speaking, these bounds can be derived as a by-product of the primal form (3.66) too) have been proposed. Joachims (2000) proposed the ξα estimate as a bound of the LOO error (0-1 loss). The ξα estimate is defined as:
$$ R_{LOO}^{\xi\alpha}\left(\widetilde{M}\right) = \frac{1}{n} \sum_{i=1}^{n} I\left[ \rho\, \alpha_i R^2 + \xi_i \geq 1 \right] \tag{3.69} $$
where R is an upper bound of the kernel K used (1 for Gaussian kernel). Joachims showed that ρ = 2 is required to prove that $R_{LOO}^{\xi\alpha}$ is an upper bound of $R_{LOO}$; however, heuristically, ρ = 1 provides better results. Vapnik and Chapelle (2000) almost simultaneously proposed an LOO bound based on the span of the support vectors. Finally, Jaakkola et al. (1999) proposed a simple heuristic for selecting σ alone for the Gaussian kernel. In this heuristic, σ is chosen as the mean value of all the pairwise distances between +1 and −1 training realizations.
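The heuristic for σ, as stated above, only requires the training instances and their labels. A minimal sketch follows, assuming NumPy and Euclidean distances; the function name `jaakkola_sigma` and the data are illustrative.

```python
import numpy as np

def jaakkola_sigma(x_tr, l_tr):
    """Mean Euclidean distance over all (+1, -1) pairs of training instances."""
    pos = x_tr[l_tr > 0]
    neg = x_tr[l_tr <= 0]
    dists = np.linalg.norm(pos[:, None, :] - neg[None, :, :], axis=2)
    return dists.mean()

# Hypothetical training set: one +1 instance and two -1 instances.
x_tr = np.array([[0.0, 0.0], [3.0, 4.0], [0.0, 2.0]])
l_tr = np.array([1, -1, -1])
sigma = jaakkola_sigma(x_tr, l_tr)   # distances are 5 and 2, hence a mean of 3.5
```

Since no training or cross-validation is involved, the resulting value is typically used as a starting point for σ rather than as a tuned optimum.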
Prediction Standard Error
Arguably the most important drawback of using an SVM is that it does not provide an estimate of its variance. Based on the prediction value (3.63), a hard classification to either class is obtained. A classic approach to alleviate this issue is to default back to a sigmoid link (Platt, 1999):
$$ \widetilde{P}(x|Tr) = \frac{1}{1 + e^{-\widetilde{M}(x|Tr)}} \tag{3.70} $$
Such a link is typically used due to the following properties:
$$ \lim_{\widetilde{M}(x|Tr) \to -\infty} \widetilde{P}(x|Tr) = 0 \tag{3.71} $$
$$ \lim_{\widetilde{M}(x|Tr) \to +\infty} \widetilde{P}(x|Tr) = 1 \tag{3.72} $$
$$ \widetilde{M}(x|Tr) = 0 \;\Rightarrow\; \widetilde{P}(x|Tr) = 0.5 $$
Therefore, $\widetilde{P}(x|Tr)$ is interpreted as the probability of x to be classified as +1 by the meta-model $\widetilde{M}(\cdot|Tr)$. The predicted labels then become:
$$ \widetilde{l}(x|Tr) = \begin{cases} +1 & \text{if } \widetilde{P}(x|Tr) > 0.5 \\ -1 & \text{if } \widetilde{P}(x|Tr) \leq 0.5 \end{cases} \tag{3.74} $$
Based on the variance of a Bernoulli random variable (2.218), a prediction standard error therefore reads:
$$ \widetilde{M}_{SE}(x|Tr) = \sqrt{\widetilde{P}(x|Tr) \left(1 - \widetilde{P}(x|Tr)\right)} \tag{3.75} $$
In practice, as the SVM values can span any scale, (3.70) is typically rewritten as:
$$ \widetilde{P}(x|Tr) = \frac{1}{1 + e^{-A \widetilde{M}(x|Tr) + B}} \tag{3.76} $$
A and B can be found in different ways. Arguably the most popular approach is to use a regularized maximum likelihood approach as described in Lin et al. (2007). Unfortunately, the prediction standard error based on a sigmoid link has proven to be rather impractical and alternatives might be explored (Basudhar et al., 2012). Additionally, SVM can be adapted for regression, which is referred to as support vector regression (SVR, Smola and Schölkopf, 2004; Bishop, 2006).
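The sigmoid link and the associated standard error are simple enough to be written out directly. The sketch below uses only the Python standard library; the function names are illustrative, and the default values A = 1 and B = 0 (which recover (3.70)) are placeholders rather than fitted parameters.

```python
import math

def sigmoid_probability(m_value, A=1.0, B=0.0):
    """Probability (3.76) of the +1 class; A = 1 and B = 0 recover (3.70).

    Note: math.exp overflows for very large positive exponents, so this
    naive form is only meant for moderate SVM values.
    """
    return 1.0 / (1.0 + math.exp(-A * m_value + B))

def predicted_label(m_value, A=1.0, B=0.0):
    """Label (3.74): +1 when the probability exceeds 0.5, -1 otherwise."""
    return 1 if sigmoid_probability(m_value, A, B) > 0.5 else -1

def bernoulli_standard_error(m_value, A=1.0, B=0.0):
    """Standard error (3.75) based on the variance of a Bernoulli variable."""
    p = sigmoid_probability(m_value, A, B)
    return math.sqrt(p * (1.0 - p))

on_boundary = bernoulli_standard_error(0.0)   # largest value, reached on the SVM boundary
far_away = bernoulli_standard_error(5.0)      # decreases with the distance to the boundary
```

The standard error depends on x only through the SVM value, so it is maximal on the boundary and carries no information about the local density of training instances.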
Design of Computer Experiments
The previous section introduced various meta-models that, given a training set Tr, would predict some output at a new x. One might wonder, however, how to obtain such a training set Tr. In the conventional machine learning literature, this is a moot point as the task at hand is usually, given some data (i.e., Tr), to construct the best meta-model possible. For the scope of this work (Section 1.1), the task at hand is, given a computationally expensive model M, to construct the best meta-model possible. Therefore, the training set Tr is left to be selected. Ideally, for a size n training set Tr and a given meta-model $\widetilde{M}(\cdot, Tr)$, one seeks the training set Tr that minimizes the generalization error. In this section, some techniques for the design of computer experiments (Sacks et al., 1989; Johnson et al., 1990) are introduced. Example 3.3 at the end of the section shows graphical representations of these methods. In practice, a design of computer experiments is often referred to as a design of experiments (DOE), although the two fields are rather different. Simply put, a DOE is a size n sample X that is supposed to be representative of the population X, while maximizing the coverage of the domain.
Random Sampling
According to the previous statement, a random sample would be an appropriate candidate for X, by definition. The simplicity of this technique makes it applicable for deterministic design (uniform distributions, Figure 3.5(a)) as well as general probabilistic design (Figure 3.5(b)). Intuitively, an optimal DOE is one that has uniform gaps between its instances (maximum coverage), and hence has low discrepancy. Random samples do not typically fit such a definition. One might wonder then if low discrepancy sequences (e.g., Sobol’, Halton) would be appropriate. Unfortunately, such sequences are typically built to maintain low discrepancy as instances are added, whereas the property of interest here is to have the lowest discrepancy possible for a given size n training sample. For this reason, low-discrepancy sequences have seldom been used.
Full and Fractional Factorial
A full factorial DOE is the typical “grid” in the deterministic space (Figure 3.5(c)). Probabilistic transformation can be used to convert such DOE for probabilistic spaces with independent random variables. Iman and Conover (1982) further proposed an approach (related to the Nataf transformation) to induce a target correlation in a sample (not restricted to full factorial, Figure 3.5(d)). However, full factorial DOEs have a few drawbacks:
• the sample size n cannot be controlled; rather, for each dimension ($i = [1, \ldots, d]$), a number of levels $n_{l_i}$ is selected and a DOE of size $\prod_{i=1}^{d} n_{l_i}$ is obtained;
• this approach heavily suffers from the curse of dimensionality (i.e., is not scalable);
• the discrepancy of full factorial design increases with the number of dimensions.
One way to overcome the scalability issue is referred to as fractional factorial DOE (Box et al., 1978). Fractional factorial designs are essentially subsets of an underlying full factorial design chosen in some way. However, such techniques usually rely on some prior knowledge that is not always easy to obtain.
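The lack of control over the sample size can be illustrated with a few lines of code. The following sketch uses only the Python standard library and assumes a design space rescaled to the unit hypercube; the function name `full_factorial` and the chosen numbers of levels are illustrative.

```python
import itertools
import math

def full_factorial(levels_per_dimension):
    """Grid on [0, 1]^d with levels_per_dimension[i] equally spaced levels along dimension i."""
    axes = [
        [k / (n_l - 1) for k in range(n_l)] if n_l > 1 else [0.5]
        for n_l in levels_per_dimension
    ]
    return list(itertools.product(*axes))

doe = full_factorial([5, 5])   # the 5 x 5 grid used for n = 25 in two dimensions

# Size of a five-level full factorial DOE as the number of dimensions grows.
sizes = {d: math.prod([5] * d) for d in (2, 5, 10)}
```

With five levels per dimension, the DOE size grows from 25 instances in two dimensions to almost ten million in ten dimensions, which makes the scalability issue explicit.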
Latin Hypercube Sampling
Latin hypercube samples (LHSs) are arguably the most widely used approach for DOE. LHSs (McKay et al., 1979) can be seen as a form of fractional factorial design and are most simply described following the idea of sudoku in 2D. For an n × n grid, there can only be one realization on each row and column (Figure 3.5(e)). Based on this definition, it follows that there are $(n!)^{d-1}$ possible LHS designs. LHSs are highly scalable and possess a very attractive property. For an LHS in the design space (e.g., $X_1$, $X_2$, $X_3$), a projection of that DOE in a sub space (e.g., $X_1$, $X_3$) is also an LHS. LHS can also be extended to probabilistic cases (Figure 3.5(f) and Iman and Conover, 1982). However, being chosen at random, an LHS can sometimes have poor coverage of the space (Figure 3.4). In order to overcome this issue, orthogonal LHS (Ye, 1998) and optimal Latin hypercube sample (OLHS, Park, 1994) have been investigated. Essentially, OLHS searches among all possible LHSs for one that is optimal in some sense (e.g., maximizes the minimum distance between samples). OLHS (Figure 3.6(a) and 3.6(b)) usually leads to consistent DOE.
Figure 3.4: A LHS with poor coverage of the space.
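A random LHS can be generated from one random permutation of the bin indices per dimension. The sketch below assumes NumPy and a unit hypercube. The selection of the best design among a finite number of random LHSs according to the minimum pairwise distance is only a crude stand-in for the OLHS algorithms cited above, not a reproduction of them; the function names and the number of candidates are illustrative.

```python
import numpy as np

def latin_hypercube(n, d, rng):
    """Random LHS on [0, 1]^d: each of the n equal bins of every axis holds one point."""
    # Independent random permutation of the bin indices for each dimension.
    bins = np.column_stack([rng.permutation(n) for _ in range(d)])
    # Place each point uniformly at random inside its bin.
    return (bins + rng.random((n, d))) / n

def min_pairwise_distance(sample):
    """Smallest Euclidean distance between two distinct points of the sample."""
    diff = sample[:, None, :] - sample[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    return dist[np.triu_indices(len(sample), k=1)].min()

rng = np.random.default_rng(0)
candidates = [latin_hypercube(25, 2, rng) for _ in range(200)]
# Crude maximin selection: keep the candidate with the largest minimum distance.
best = max(candidates, key=min_pairwise_distance)
```

Every candidate satisfies the one-point-per-row-and-column property, so the selection step only affects the coverage of the space.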
Centroidal Voronoi Tessellation
Another approach for DOE is referred to as centroidal Voronoi tessellation (CVT, Du et al., 1999). Various algorithms exist to generate CVT samples, but the simplest (although possibly not the most efficient) is to apply n-means clustering (Lloyd, 1982) on a large enough random sample X (usually of size $10^5$, Figure 3.6(c)). One of the main advantages of CVTs is their straightforward extension to probabilistic spaces (i.e., simply get a random sample X representative of $f_X$ and apply n-means clustering, Figure 3.6(d)). Other extensions based on weight functions and rejection regions have been proposed (Ju et al., 2002). This can be of paramount importance in the case of reliability assessment. When rare events are tracked, DOEs need to cover larger relevant regions. In addition, the behavior of interest (the limit state hyper-surface, Section 2.4) is expected to be in regions with low, but relevant, probabilistic content. It can be advantageous to consider rejection regions defined such that the joint PDF of X is below a suitable threshold (Figure 3.6(e)).
Example 3.3 (Comparison of Design of Computer Experiments) Figures 3.5 and 3.6 show various examples of design of computer experiments (n = 25, 5 × 5 for full factorial) for two bivariate cases:
• Uniform: $X_1, X_2 \sim U(0, 1)$
• Correlated Gaussian: $X \sim N_2(\mu_X, \Sigma_X)$ with
$$ \mu_X = \begin{bmatrix} 0 & 0 \end{bmatrix}, \qquad \Sigma_X = \begin{bmatrix} 1 & 0.7 \\ 0.7 & 1 \end{bmatrix} $$
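The n-means route to a CVT sample can be sketched for the two bivariate cases of Example 3.3. The code below assumes NumPy and implements a plain Lloyd iteration; the function name `cvt_sample`, the number of iterations and the reduced size of the underlying random sample (5000 instead of $10^5$, to keep the example fast) are choices made for illustration only.

```python
import numpy as np

def cvt_sample(points, n, rng, iterations=50):
    """Approximate CVT: n-means clustering (Lloyd iterations) of a large random sample."""
    centroids = points[rng.choice(len(points), size=n, replace=False)]
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        nearest = d2.argmin(axis=1)
        # Move each centroid to the mean of its cluster (empty clusters are left untouched).
        for k in range(n):
            members = points[nearest == k]
            if len(members) > 0:
                centroids[k] = members.mean(axis=0)
    return centroids

rng = np.random.default_rng(0)

# Uniform case: X1, X2 ~ U(0, 1).
uniform_points = rng.random((5000, 2))
cvt_uniform = cvt_sample(uniform_points, 25, rng)

# Correlated Gaussian case of Example 3.3.
cov = np.array([[1.0, 0.7], [0.7, 1.0]])
gaussian_points = rng.multivariate_normal([0.0, 0.0], cov, size=5000)
cvt_gaussian = cvt_sample(gaussian_points, 25, rng)
```

The only difference between the two cases is the random sample fed to the clustering, which is the straightforward extension to probabilistic spaces mentioned above.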
Active Learning for Reliability Assessment
In the previous section, some DOE techniques were introduced. However, in many ways, spending the entire computational budget on a single DOE is widely recognized to be suboptimal. A more efficient, and widely used, approach is to spend only a small fraction of the computational budget on a DOE to build an initial meta-model and subsequently refine it in an iterative manner.
Figure 3.5: Examples of design of computer experiments in 2D for a uniform and a bivariate correlated Gaussian distribution (Part 1). (a) Random (uniform case). (b) Random (Gaussian case). (c) Full factorial (uniform case). (d) Full factorial (Gaussian case). (e) LHS (uniform case). (f) LHS (Gaussian case).
Figure 3.6: Examples of design of computer experiments in 2D for a uniform and a bivariate correlated Gaussian distribution (Part 2). (a) OLHS (uniform case). (b) OLHS (Gaussian case). (c) CVT (uniform case). (d) CVT (Gaussian case). (e) Alternative CVT (Gaussian case).
The overall idea is the following. The initial DOE is used to capture the general features of the computationally expensive model M. The first meta-model $\widetilde{M}$ then provides a rough approximation over the entire domain. Using the information of the DOE and $\widetilde{M}$, an active learning strategy searches a new instance in the space that brings (or is expected to bring) the most improvement to the meta-model for a given purpose (e.g., optimization, reliability assessment). This is usually achieved by defining a function that somehow quantifies the information gained by a given instance and subsequently maximizing said function. It is of paramount importance to understand that active learning strategies are not multi-purpose but designed toward a specific goal. For example, a strategy for optimization (e.g., EGO, Jones et al. (1998)) has a completely different purpose (and expression) than a strategy for reliability assessment (as discussed below).
One Step Look Ahead
One step look ahead approaches attempt to quantify the actual improvement in the meta-model due to x, based on an error function Imp that measures the imperfection $Imp[\widetilde{M}]$ of a meta-model $\widetilde{M}$. For every x, the meta-model is retrained using the training set Tr augmented with x, using some form of assumption or approximation on the missing $y = \widetilde{M}(x|Tr)$. For any x, the one step ahead imperfection $Imp[\widetilde{M}(\cdot|Tr, x)]$ can therefore be measured. Hence, solving:
$$ \min_x \; Imp\left[\widetilde{M}(\cdot|Tr, x)\right] \tag{3.79} $$
is equivalent to searching for the x that maximizes the one step ahead improvement. Picheny et al. (2010) proposed the target integrated mean square error (tIMSE) for Imp. tIMSE is essentially an integral measure of the prediction variance in the vicinity of the approximated limit state hyper-surface (i.e., $\widetilde{M}(x|Tr) = 0$):
$$ tIMSE\left(\widetilde{M}\right) = \int \widetilde{M}_{SE}(x|Tr)^2\, W(x)\, f_X(x)\, dx \tag{3.80} $$
where W is a weight function that puts emphasis in the vicinity of the limit state hyper-surface approximation. An example of W is a Gaussian density:
$$ W(x) = \phi\left(\widetilde{M}(x|Tr)\right) $$
An attractive feature of tIMSE stems from the fact that the missing information $y = \widetilde{M}(x|Tr)$ can be disregarded (Picheny et al., 2010). Bect et al. (2012) proposed the stepwise uncertainty reduction (SUR). In their work, Bect et al. proposed four candidates for the imperfection function Imp. In SUR, the missing information $y = \widetilde{M}(x|Tr)$ is integrated out over all the possible values. The main drawback of these approaches is the computational cost. Multiple reconstructions of the meta-model and multiple integrations (even double-loop integration for SUR) are nested within the optimization loop (3.79). In order to partially overcome the computational burden, the meta-model can be partially retrained instead (e.g., hyper-parameters are maintained across the optimization loop).
Expected Improvement
Another way to reduce the computational burden is to make a decision using the current meta-model $\widetilde{M}(\cdot|Tr)$ rather than the one step ahead one $\widetilde{M}(\cdot|Tr, x)$. Let us introduce the improvement function (q > 0):
$$ I_q(x) = \varepsilon(x)^q - \min\left\{ |Y(x)|^q ;\; \varepsilon(x)^q \right\} $$
where Y is defined, for GP, as:
$$ Y(x) \sim N\left( \widetilde{M}(x|Tr),\; \widetilde{M}_{SE}(x|Tr)^2 \right) $$
ε(x) is a bandwidth parameter (Figure 3.7(a)). So far, this approach has been restricted to GPs. The expected improvement measures the likelihood of Y to be equal to 0. This metric is used for contour estimation and should not be confused with the expected improvement for optimization (Jones et al., 1998). The expected improvement is defined as:
$$ E\left[I_q(x)\right] = \int_{-\varepsilon(x)}^{\varepsilon(x)} \left( \varepsilon(x)^q - |y|^q \right) f_{Y(x)}(y)\, dy \tag{3.84} $$
Simultaneously, Bichon et al. (2008) and Ranjan et al. (2008) explored the case q = 1 and q = 2 respectively with:
$$ \varepsilon(x) = 2\, \widetilde{M}_{SE}(x|Tr) $$
Figure 3.7: Graphical illustration of the improvement function $I_q$. (a) Illustration of the bandwidth ε. (b) Illustration of the improvement function $I_q$ for q = 1 and q = 2.
The case q = 1 is commonly referred to as efficient global reliability analysis (EGRA). Bect et al. (2012) proposed a closed form expression for (3.84) with q = 1 and q = 3. Figure 3.7(b) shows the influence of q on the improvement function $I_q$. Therefore, the instance x that should be considered given the current meta-model $\widetilde{M}(\cdot|Tr)$ is the one that maximizes the expected improvement:
$$ \max_x \; E\left[I_q(x)\right] $$
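Although closed form expressions exist, the integral (3.84) can also be evaluated numerically, which makes its behavior easy to inspect. The sketch below is an illustration only: it assumes NumPy, a Gaussian Y(x) described by a prediction `mean` and a standard error `se` (hypothetical inputs that would come from a GP), the bandwidth ε = 2 × se, and a composite trapezoidal rule on a regular grid.

```python
import numpy as np

def expected_improvement_contour(mean, se, q=1, n_points=2001):
    """Numerical evaluation of (3.84) with the bandwidth eps = 2 * se."""
    eps = 2.0 * se
    y = np.linspace(-eps, eps, n_points)
    # Gaussian PDF of Y(x) with the given mean and standard error.
    density = np.exp(-0.5 * ((y - mean) / se) ** 2) / (se * np.sqrt(2.0 * np.pi))
    integrand = (eps ** q - np.abs(y) ** q) * density
    dy = y[1] - y[0]
    # Composite trapezoidal rule written out explicitly.
    return dy * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))

on_contour = expected_improvement_contour(0.0, 1.0, q=1)     # prediction on the limit state
off_contour = expected_improvement_contour(3.0, 1.0, q=1)    # prediction three SE away from it
```

For a fixed standard error, the value is largest when the prediction lies on the approximated limit state and decays as the prediction moves away from zero, which is the behavior exploited for contour estimation.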
One of the main features of the expected improvement as studied by Ranjan et al. (2008) and Bichon et al. (2008) is that it ignores the distribution of X. This can be seen as an advantage (the learner can be used for several distributions) or a disadvantage (function calls wasted in low probabilistic content regions, Section 4.1).
Example 3.4 (Expected Improvement Graphical Illustration) Consider the following function:
$$ g(x) = 10 - \ln(x) \tag{3.87} $$
$$ x = [0; 9.7] \tag{3.88} $$
A GP is built using five training instances. Figure 3.8(a) shows a plot of the GP. Figures 3.8(b) and 3.8(c) show the expected improvement (for contour estimation) function for q = 1 (Bichon et al., 2008) and q = 2 (Ranjan et al., 2008) respectively.
Figure 3.8: Graphical illustration of the expected improvement (for contour estimation) function $I_q$. (a) Illustration of a Gaussian process for Example 3.4 (g, GP, GP 95% CI and Tr). (b) Illustration of $I_1$ for Example 3.4. (c) Illustration of $I_2$ for Example 3.4.
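Echard et al. (2011) explored a different improvement function. An accurate estimation of the probability of failure is highly dependent on the ability of the meta-model $\widetilde{M}(\cdot|Tr)$ to correctly predict the label (or class) of x. The so-called probability of misclassification is defined as:
$$ P_{misc}(x) = \begin{cases} P\left[\widetilde{M}(x|Tr) > 0\right] & \text{if } M(x) \leq 0 \\ P\left[\widetilde{M}(x|Tr) \leq 0\right] & \text{if } M(x) > 0 \end{cases} \tag{3.89} $$
and is often approximated by:
$$ \widetilde{P}_{misc}(x) = \begin{cases} P\left[\widetilde{M}(x|Tr) > 0\right] & \text{if } \widetilde{M}(x|Tr) \leq 0 \\ P\left[\widetilde{M}(x|Tr) \leq 0\right] & \text{if } \widetilde{M}(x|Tr) > 0 \end{cases} $$
For GPs, it becomes:
$$ \widetilde{P}_{misc}(x) = \begin{cases} \Phi(z) & \text{if } \widetilde{M}(x|Tr) \leq 0 \\ \Phi(-z) & \text{if } \widetilde{M}(x|Tr) > 0 \end{cases} \tag{3.91} $$
$$ \widetilde{P}_{misc}(x) = \Phi(-|z|) \tag{3.92} $$
where:
$$ z = \frac{\widetilde{M}(x|Tr)}{\widetilde{M}_{SE}(x|Tr)} $$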
As the standard normal CDF is monotonically increasing, the instance x that maximizes the probability of misclassification is the solution of:
$$ \min_x \; \frac{\left|\widetilde{M}(x|Tr)\right|}{\widetilde{M}_{SE}(x|Tr)} \tag{3.94} $$
Note that this function is non-differentiable and therefore the following optimization problem should be preferred:
$$ \min_x \; \frac{\widetilde{M}(x|Tr)^2}{\widetilde{M}_{SE}(x|Tr)^2} $$
Similar to maximizing the expected improvement (Section 3.4.2), this criterion ignores the distribution of X. In their work, Echard et al. addressed this issue as follows. Given a random sample X of size n (n large), the next training instance x is selected as the realization $x^{(i)}$ that has the highest probability of misclassification:
$$ \min_i \; \frac{\left|\widetilde{M}\left(x^{(i)}|Tr\right)\right|}{\widetilde{M}_{SE}\left(x^{(i)}|Tr\right)} \qquad i = [1, \ldots, n] \tag{3.96} $$
This technique both frees the approach from the need of an optimization algorithm and accounts for the distribution of X (through X).
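The selection rule (3.96) amounts to a search over a finite list of candidates. The sketch below uses only the Python standard library; the candidate instances and the associated GP predictions and standard errors are hypothetical numbers chosen for illustration, and the function names are not taken from the cited work.

```python
import math

def misclassification_probability(mean, se):
    """Approximate probability of misclassification Phi(-|z|) with z = mean / se."""
    z = mean / se
    return 0.5 * (1.0 + math.erf(-abs(z) / math.sqrt(2.0)))

def select_next_instance(candidates, means, ses):
    """Return the candidate minimizing |mean| / se, as in (3.96)."""
    ratios = [abs(m) / s for m, s in zip(means, ses)]
    best = min(range(len(candidates)), key=ratios.__getitem__)
    return candidates[best]

# Hypothetical candidates with their GP predictions and standard errors.
candidates = [0.1, 0.5, 0.9]
means = [2.0, -0.2, 1.0]
ses = [0.5, 0.4, 0.1]

next_x = select_next_instance(candidates, means, ses)
p_misc = [misclassification_probability(m, s) for m, s in zip(means, ses)]
```

In this example the second candidate is selected: its prediction is the closest to zero relative to its standard error, which corresponds to the highest probability of misclassification among the three.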
K-Means Clustering Strategy
Dubourg (2011) introduced an innovative approach for active learning. Given any improvement function I (opposite of the imperfection function in Section 3.4.1), Dubourg et al. (2011) discussed an alternative to global optimization, which is common to all previously discussed methods. In his work, Dubourg samples the improvement function I using MCMC and then reduces the obtained sample X using k-means clustering. One of the main advantages of this approach is that it allows for parallel update. In situations where the computationally expensive model M can be run in parallel (e.g., using clusters or super-computers), it can be beneficial to add several additional training instances instead of a single one. This is referred to as parallel update.
Example 3.5 (Graphical Illustration of Clustering Strategy) Example 3.4 continued
[Figure 3.9, panels (a) 1-means clustering and (b) 4-means clustering; vertical axis E[I2].]
Figure 3.9: Illustration of k-means clustering strategy following Example 3.4. Blue bins show the MCMC sample and green squares show the k-means cluster centroids.

Recall Figure 3.8(b). Figure 3.9 shows the histogram of an MCMC sample (blue bins) with its associated k-means cluster centroids (green squares, k = 1 and k = 4).
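The reduction of a sample to k candidate instances can be sketched as follows. This is a minimal sketch, not Dubourg's implementation: the sample is a hypothetical bimodal stand-in for an MCMC sample of the improvement function, and the k-means routine is a plain Lloyd iteration written here for self-containment.

```python
import random

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns k centroids (lists of coordinates)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its closest centroid (squared Euclidean distance)
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[j].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

# Hypothetical 1D sample with two modes, standing in for an MCMC sample of I
rng = random.Random(1)
sample = [[rng.gauss(-3.0, 0.3)] for _ in range(200)]
sample += [[rng.gauss(3.0, 0.3)] for _ in range(200)]

# k = 2 centroids would be evaluated in parallel as two new training instances
centroids = sorted(kmeans(sample, k=2))
```

With k = 1 the single centroid is the sample mean, which for a multimodal sample can fall between the modes; larger k places one candidate per region of high improvement.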
Explicit Design Space Decomposition
The improvement functions (or the imperfection ones) discussed so far have systematically been developed for GPs (although one could argue they could be extended to any meta-model that provides an estimate of its local error). This is representative of how dominant GPs have been in the field of active learning for engineering design. In this section, a different, meta-model-free approach is discussed. Explicit design space decomposition (EDSD) is a strategy that was introduced to construct accurate approximations of domain boundaries (limit state hyper-surfaces in the terminology of this work) within the design space. EDSD (Basudhar and Missoum, 2008) was developed with deterministic design spaces in mind. It relies on a simple paradigm: sparsity in the design space is correlated with additional information. In other words, domains in the X space with a low number of instances (i.e., where we have little to no information) are the most likely to yield new information. Given a meta-model $\widetilde{M}(\cdot, \mathrm{Tr})$, this translates into the so-called “max-min” sample (Basudhar and Missoum, 2010):
\[
\begin{aligned}
\max_{\mathbf{x}} \; & \min_{i} \left\| \mathbf{x} - \mathbf{x}_{tr}^{(i)} \right\| \\
\text{s.t.} \; & \widetilde{M}(\mathbf{x}, \mathrm{Tr}) = 0 \\
& \mathbf{l} \le \mathbf{x} \le \mathbf{u}
\end{aligned}
\tag{3.97}
\]
For consistency with the terminology used so far, the max-min sample is referred to as max-min instance hereafter. The solution to this optimization problem yields the instance x belonging to the limit state hyper-surface approximation that maximizes the distance to the training instances. The main aspects of this refinement scheme are as follows:
• it can be applied with any meta-model;
• the objective function is non-differentiable;
• it does not take into account the joint PDF fX.
EDSD also relies on a secondary sample referred to as “anti-locking” sample (Basudhar et al., 2012). However, its purpose and definition fall outside the scope of this work.
Chapter 4
A Framework for Active Learning under Uncertainty

In the previous chapter, the fundamentals of active supervised machine learning for computational design have been introduced. Given these elements, two main avenues for improvement arise. On one hand, one would seek new or smarter meta-models to make the best use of a given training set Tr. Some examples of such works involve blind Kriging (Joseph et al., 2008), dynamic Kriging (Zhao et al., 2011) and PC-Kriging (Schöbi and Sudret, 2014). On the other hand, new active learning techniques (or adaptive sampling schemes) could lead to more efficient approximations. The first section of this chapter develops a new tool for adaptive sampling and then describes its use for different applications.
Generalized “Max-Min”
The previous chapter concluded with EDSD and its main component, the max-min instance. The core of this work is to expand on this idea. As mentioned, one of its main features is that it is meta-model independent, making it very attractive in a research area where it is still unclear which surrogate(s) is (are) optimal. In fact, the no free lunch theorem (Wolpert, 1996) states that there is not a single optimal meta-model, but rather, any meta-model could be optimal, given the problem at hand. However, as stated in Section 3.4.5, the max-min instance suffers from two major problems that are addressed in this section.
Generalization to Gaussian Distribution
Given a size n training set Tr, possibly empty, let us consider first an unconstrained version of the max-min instance:
\[
\max_{\mathbf{x}} \; \min_{i} \left\| \mathbf{x} - \mathbf{x}_{tr}^{(i)} \right\|
\tag{4.1}
\]
Exercise 4.1 (Sequential Generation) Let us define an exercise that will be repeated throughout this section. Given an initial training sample Xtr of dimension d, containing only the origin 0:
1. find a new instance x using the current max-min expression, initially (4.1), later replaced by (4.3) and eventually (4.4);
2. add the new instance x to the training sample Xtr;
3. loop back to 1.
Repeat until the training sample Xtr has reached a size n.

Let us now consider Exercise 4.1 with d = 2, the expression (4.1), n = 41 and:
\[
\mathbf{l} = [-5, -5] \qquad \mathbf{u} = [5, 5]
\tag{4.2}
\]
Figure 4.1 shows a plot of the final training sample Xtr. Two conclusions arise:
1. The max-min instance requires bounds in its definition or the problem would be ill-defined (instances would be chosen at infinity);
2. Although the sample provides a good coverage of the space, its instances are not optimal for non-uniform distributions.
Indeed, when observing Figure 4.1, the max-min instances are clearly not optimal with respect to the standard normal PDF φ. Additionally, each distribution has its own support and that information is usually carried through its PDF.
Figure 4.1: Sequential addition of 41 max-min examples in a 2D unconstrained setup.
A natural idea is therefore to weight the minimum distance with the PDF value:
\[
\max_{\mathbf{x}} \; \phi(\mathbf{x}) \min_{i} \left\| \mathbf{x} - \mathbf{x}_{tr}^{(i)} \right\|
\tag{4.3}
\]
Repeating Exercise 4.1 with d = 2, the expression (4.3) and n = 41 leads to Figure 4.2. However, are these instances (or in other words, this sampling scheme) optimal? Statistically speaking, this scheme is optimal if it behaves as a pseudo-random generator that maximizes the space coverage. For the sample Xtr to be optimal with respect to φ is equivalent to stating that the random sample Xtr is representative of the population φ. In order to check for this property, Exercise 4.1 is repeated using (4.3) for n = 300 and d = 2, d = 10, and d = 30. Figure 4.3 shows an overlay of the standard normal CDF Φ along with the marginal empirical CDFs $F_{X_i}$. It is clearly apparent that they do not match.

Figure 4.2: Sequential addition of 41 modified max-min (4.3) instances in a 2D unconstrained setup.

Figure 4.3: Marginal empirical CDFs $F_{X_i}$ along with the standard normal CDF Φ for Exercise 4.1 using (4.3), for (a) d = 2, (b) d = 10 and (c) d = 30. The red dashed-dotted line is the reference standard normal CDF Φ while every solid colored line is one of the marginal empirical CDFs $F_{X_i}$.

Figure 4.4: Sequential addition of 41 generalized max-min (4.4) instances in a 2D unconstrained setup.

At this point, an infinity of ways to correct the PDF value in the max-min expression could be explored. The generalized max-min problem (Lacaze and Missoum, 2014a) is defined as:
\[
\begin{aligned}
\max_{\mathbf{x}} \; & f_{\mathbf{X}}(\mathbf{x})^{\frac{1}{d}} \min_{i} \left\| \mathbf{x} - \mathbf{x}_{tr}^{(i)} \right\| \\
\text{s.t.} \; & \widetilde{M}(\mathbf{x}, \mathrm{Tr}) = 0
\end{aligned}
\tag{4.4}
\]
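Exercise 4.1 with the unconstrained expressions (4.1) and (4.4) can be sketched as below. This is an illustrative sketch only: the maximization is replaced by a crude random candidate search (an assumption made here for brevity, not the optimizer used in this work), the distribution is the standard normal, and the function names and the number of candidates are hypothetical. The logarithm of each objective is maximized, which leaves the maximizer unchanged.

```python
import math
import random

def log_std_normal_pdf(x):
    """Log of the standard normal joint PDF at x."""
    return -0.5 * sum(v * v for v in x) - 0.5 * len(x) * math.log(2.0 * math.pi)

def sequential_max_min(n, d, lower, upper, generalized, n_candidates=300, seed=0):
    """Exercise 4.1: start from the origin and add instances until size n."""
    rng = random.Random(seed)
    training = [[0.0] * d]
    while len(training) < n:
        best_x, best_value = None, -math.inf
        for _ in range(n_candidates):
            x = [rng.uniform(lower, upper) for _ in range(d)]
            d_min = min(math.dist(x, t) for t in training)
            if d_min == 0.0:
                continue  # candidate coincides with a training instance
            value = math.log(d_min)  # log of the classic objective (4.1)
            if generalized:
                value += log_std_normal_pdf(x) / d  # log of (4.4), unconstrained
            if value > best_value:
                best_x, best_value = x, value
        training.append(best_x)
    return training

def mean_norm(sample):
    """Average distance of the instances to the origin."""
    return sum(math.sqrt(sum(v * v for v in x)) for x in sample) / len(sample)

classic = sequential_max_min(41, 2, -5.0, 5.0, generalized=False)
general = sequential_max_min(41, 2, -5.0, 5.0, generalized=True)
```

In this sketch the classic instances spread over the whole box, including its corners, whereas the PDF-weighted instances stay in the region where the standard normal density is non-negligible, so `mean_norm(general)` is smaller than `mean_norm(classic)`.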
Figure 4.4 is obtained after repeating Exercise 4.1 with d = 2, the expression (4.4) (unconstrained) and n = 41. It can be seen that the instances are more spread out within the domain. In order to check for the pseudo-random generator property, Exercise 4.1 is repeated using (4.4) (unconstrained) for n = 300 and d = 2, d = 10, and d = 30. Figure 4.5 shows an overlay of the standard normal CDF Φ along with the marginal empirical CDFs $F_{X_i}$. It can be seen that the marginal CDFs are consistent with the reference Φ across the three test cases. In fact, the exponent $\frac{1}{d}$ in (4.4) can be replaced by $\frac{q}{d}$, where q is a positive tuning parameter. As q tends to ∞, the minimum distance part loses influence and instances are drawn closer and closer to each other (influenced by the PDF). As q tends to zero, (4.4) defaults to (4.1), as the PDF value part loses influence. Although no analytical proof of the pseudo-random generator property exists for this case, the author expects this feature to extend to any d and any arbitrary distribution fX. It was shown that the generalized max-min can easily be parallelized (Lacaze and Missoum, 2014a) and used with dependent distributions (Lacaze and Missoum, 2014b).

Figure 4.5: Marginal empirical CDFs $F_{X_i}$ along with the standard normal CDF Φ for Exercise 4.1 using (4.4), for (a) d = 2, (b) d = 10 and (c) d = 30. The red dashed-dotted line is the reference standard normal CDF Φ while every solid colored line is one of the marginal empirical CDFs $F_{X_i}$.

Example 4.1 (Illustration of Generalized Max-Min) In order to illustrate the interest of the generalized max-min approach, consider the following limit state function:
\[
g(\mathbf{x}) = \frac{6\,\mathrm{sgn}(x_1)}{x_1} - x_2\,\mathrm{sgn}(x_2) \le 0
\]
\[
\mathbf{X} \sim N_2(\boldsymbol{\mu}_X, \boldsymbol{\Sigma}_X) \qquad \boldsymbol{\mu}_X = [0, 0] \qquad \boldsymbol{\Sigma}_X = \begin{bmatrix} 1 & 0.7 \\ 0.7 & 1 \end{bmatrix}
\]
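Figure 4.6 shows the results of Exercise 4.1 with:
• Figure 4.6(a): d = 2, n = 50, and the expression:
\[
\begin{aligned}
\max_{\mathbf{x}} \; & \min_{i} \left\| \mathbf{x} - \mathbf{x}_{tr}^{(i)} \right\| \\
\text{s.t.} \; & g(\mathbf{x}) = 0
\end{aligned}
\]
• Figure 4.6(b): d = 2, n = 50, and the expression:
\[
\begin{aligned}
\max_{\mathbf{x}} \; & \phi_{\Sigma}(\mathbf{x})^{\frac{1}{2}} \min_{i} \left\| \mathbf{x} - \mathbf{x}_{tr}^{(i)} \right\| \\
\text{s.t.} \; & g(\mathbf{x}) = 0
\end{aligned}
\tag{4.7}
\]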
Figure 4.6: Graphical illustration of classic and generalized max-min for Example 4.1. (a) Classic. (b) Generalized.

Remark 4.1 (Generalized and Classic Max-Min) In the case where the inputs X are independent uniform random variables, the generalized max-min is equivalent to the classic max-min.

Remark 4.2 (Generalized Max-Min and MPP) An important feature of the generalized max-min instance, due to its definition, is that when the training set Tr is empty or largely sparse, the search for a generalized max-min is equivalent to the search for the MPP.
Numerical Implementation
Another point raised in Section 3.4.5 is that the max-min objective, and therefore (4.4), is non-differentiable. In addition, this optimization problem is highly multi-modal. The global optimization of highly multi-modal functions is still an open research subject. However, two dominant approaches arise:
• heuristic global techniques: genetic algorithm, DIRECT (Jones et al., 1993), CMA-ES (Hansen, 2006), etc.
• multi-start local approaches: SQP (Bonnans et al., 2006), etc.
In the generalized max-min setting, it is likely that multiple local optima have very close performance. Therefore, obtaining the exact global optimum is not necessary. In addition, if gradients are available (as opposed to obtained through finite differences, for example), gradient-based optimization can be significantly more efficient. For these reasons, a differentiable approximation of (4.4) is derived. For completeness, in the previous work by Basudhar and Missoum (2010), the non-differentiability of the max-min problem was addressed by rewriting (3.97) as:
\[
\begin{aligned}
\max_{\mathbf{x}, z} \; & z \\
\text{s.t.} \; & z - \left\| \mathbf{x} - \mathbf{x}_{tr}^{(i)} \right\| \le 0 \qquad i = [1, \dots, n] \\
& \widetilde{M}(\mathbf{x}, \mathrm{Tr}) = 0 \\
& \mathbf{l} \le \mathbf{x} \le \mathbf{u}
\end{aligned}
\tag{4.8}
\]
which transforms (for a size n training sample Xtr of dimension d) a d dimensional optimization problem with one constraint (3.97) into a d + 1 dimensional optimization problem with n + 1 constraints (4.8). Although this formulation is differentiable, the additional constraints significantly increase the numerical complexity. Therefore, a different road is explored. The core of the proposed implementation stems from the Chebychev distance:
\[
\|\mathbf{v}\|_{\infty} = \max_{i} (v_i)
\tag{4.9}
\]
where v is a vector of n positive components $v_i$. The Chebychev distance (“norm infinity”) is asymptotically linked to the Minkowski metric (“p-norm”) such that:
\[
\lim_{p \to \infty} \|\mathbf{v}\|_{p} = \|\mathbf{v}\|_{\infty}
\]
Therefore:
\[
\|\mathbf{v}\|_{\infty} = \max_{i} (v_i) \approx \|\mathbf{v}\|_{p} = \left( \sum_{i=1}^{n} v_i^{p} \right)^{\frac{1}{p}}
\]
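Based on the author’s experience, p = 40 leads to satisfactory results. Noting that:
\[
\min_{i} (v_i) = \left( \max_{i} \, v_i^{-1} \right)^{-1} \approx \left( \sum_{i=1}^{n} v_i^{-p} \right)^{-\frac{1}{p}}
\]
and defining $v_i = \left\| \mathbf{x} - \mathbf{x}_{tr}^{(i)} \right\|$, it follows that:
\[
\min_{i} \left\| \mathbf{x} - \mathbf{x}_{tr}^{(i)} \right\| \approx \left( \sum_{i=1}^{n} \left\| \mathbf{x} - \mathbf{x}_{tr}^{(i)} \right\|^{-p} \right)^{-\frac{1}{p}}
\]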
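The quality of this smooth approximation of the minimum is easy to check numerically. The sketch below is illustrative: the distances are hypothetical, and evaluating the sum through a log-sum-exp is an implementation choice made here to avoid overflow of $v_i^{-p}$ for small distances, not something prescribed by the text.

```python
import math

def soft_min(values, p=40):
    """Smooth approximation (sum_i v_i^-p)^(-1/p) of min_i v_i for positive v_i."""
    logs = [-p * math.log(v) for v in values]  # log of each term v_i^-p
    m = max(logs)
    # log-sum-exp: log(sum_i exp(logs_i)) computed without overflow
    log_sum = m + math.log(sum(math.exp(t - m) for t in logs))
    return math.exp(-log_sum / p)

# Hypothetical distances from a candidate x to four training instances
distances = [1.3, 0.7, 2.4, 0.72]
approx = soft_min(distances)
exact = min(distances)
```

The approximation never exceeds the true minimum and is at worst a factor $n^{-1/p}$ below it, so with p = 40 and a few hundred training instances the relative error stays within roughly 15%, and it is far smaller when one distance clearly dominates.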
Finally, the objective function of the generalized formulation (4.4) can be written as:
\[
\max_{\mathbf{x}} \; \frac{1}{d} \log f_{\mathbf{X}}(\mathbf{x}) - \frac{1}{p} \log \sum_{i=1}^{n} \left\| \mathbf{x} - \mathbf{x}_{tr}^{(i)} \right\|^{-p}
\]
which, in addition to being differentiable, does not add any constraints. In fact, analytical sensitivities can be easily derived and the optimization problem can be solved efficiently using a gradient-based method. Note that logarithms have been introduced because, as the dimensionality increases, the numerical value of the joint PDF fX can drop to 0. Consider the maximum value of the multidimensional standard normal joint PDF φ, which is obtained at the origin. For d = 2, φ(0) ≈ 0.16 whereas for d = 50, φ(0) ≈ 1.11 × 10^{-20}.

Remark 4.3 (Numerical Efficiency of the Chebychev Approximation) In order to motivate the use of the Chebychev approximation (4.1.2), a numerical comparison of the different approximations is performed. Two cases are considered. In the unconstrained case, a unique instance is found for d = 2 and d = 50, based on an initial training set Xtr of size n = 20 and n = 500, using 3 formulations:
\[
\begin{aligned}
\max_{\mathbf{x}} \; & \min_{i} \left\| \mathbf{x} - \mathbf{x}_{tr}^{(i)} \right\| \\
\text{s.t.} \; & -1 \le x_j \le 1 \qquad j = [1, \dots, d]
\end{aligned}
\tag{4.13}
\]
\[
\begin{aligned}
\max_{\mathbf{x}, z} \; & z \\
\text{s.t.} \; & z - \left\| \mathbf{x} - \mathbf{x}_{tr}^{(i)} \right\| \le 0 \qquad i = [1, \dots, n] \\
& -1 \le x_j \le 1 \qquad j = [1, \dots, d]
\end{aligned}
\tag{4.14}
\]
\[
\begin{aligned}
\max_{\mathbf{x}} \; & \left[ \sum_{i=1}^{n} \left\| \mathbf{x} - \mathbf{x}_{tr}^{(i)} \right\|^{-40} \right]^{-\frac{1}{40}} \\
\text{s.t.} \; & -1 \le x_j \le 1 \qquad j = [1, \dots, d]
\end{aligned}
\tag{4.15}
\]
In the constrained case, the exact same setting is repeated with the addition of the following constraint (representing a limit state hyper-surface approximation):
\[
x_d - \frac{1}{d-1} \sum_{j=1}^{d-1} x_j = 0
\]
Three metrics $s_t$, $s_p$, and $s_T$ are used to measure the performance of each configuration:
• $s_t$ is the normalized time, defined as the lowest computational time across all configurations divided by the configuration time;
• $s_p$ is the normalized performance, defined as the minimum distance between the training instances and the instance found using each configuration, divided by the maximum of these values;
• $s_T$ is the normalized total score, defined as $s_T = \frac{1}{2}(s_t + s_p)$.
For each metric, the best and worst possible scores are 1 and 0 respectively. Tables 4.1 and 4.2 show the results, which clearly show the advantages of using the Chebychev approximation (3c).
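As a small worked example of these three scores, the sketch below computes them for three configurations. The timings and distances are hypothetical values chosen for illustration, not the measurements behind Tables 4.1 and 4.2.

```python
def normalized_scores(times, min_distances):
    """Return {configuration: (s_t, s_p, s_T)} following the definitions of Remark 4.3."""
    best_time = min(times.values())               # lowest computational time
    best_distance = max(min_distances.values())   # largest achieved minimum distance
    scores = {}
    for name in times:
        s_t = best_time / times[name]
        s_p = min_distances[name] / best_distance
        scores[name] = (s_t, s_p, 0.5 * (s_t + s_p))
    return scores

# Hypothetical timings (seconds) and achieved minimum distances
times = {"1a": 50.0, "2c": 10.0, "3c": 2.0}
min_distances = {"1a": 0.46, "2c": 0.50, "3c": 0.49}
scores = normalized_scores(times, min_distances)
```

Here configuration 3c is the fastest (s_t = 1) and nearly the best performer (s_p = 0.98), so it obtains the highest total score, while 1a is penalized by its computational time.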
Table 4.1: Comparison of numerical optimization for Remark 4.3 (d = 2).

              Unconstrained               Constrained
  n   Cases   1a    1b    2c    3c        1a    1b    2c    3c
  20  st      .04   1     .11   .13       .02   .68   .72   1
      sp      .92   .94   1     1.0       .74   .49   1     1.0
      sT      .48   .97   .56   .56       .38   .58   .86   1.0
 500  st      .04   1     .12   .11       .02   .73   .01   1
      sp      .86   .88   .95   1         .60   .47   .78   1
      sT      .45   .94   .53   .55       .31   .60   .39   1

1a: Formulation (4.13) using GA (built-in Matlab function using default settings)
1b: Formulation (4.13) using PS (built-in Matlab function using default settings)
2c: Formulation (4.14) using multi-start SQP with 50 random starting points (built-in Matlab function using default settings)
3c: Formulation (4.15) using multi-start SQP with 50 random starting points (built-in Matlab function using default settings)
Table 4.2: Comparison of numerical optimization for Remark 4.3 (d = 50).

              Unconstrained               Constrained
  n   Cases   1a    1b    2c    3c        1a    1b    2c    3c
  20  st      .32   .59   .75   1         .04   .13   .85   1
      sp      .82   .99   1     .97       .85   .99   1     .97
      sT      .57   .79   .87   .99       .44   .56   .92   .99
 500  st      .40   .63   .41   1         .04   .13   .33   1
      sp      .82   .99   1     .98       .88   .99   1     .98
      sT      .61   .81   .71   .99       .46   .56   .66   .99

1a: Formulation (4.13) using GA (built-in Matlab function using default settings)
1b: Formulation (4.13) using PS (built-in Matlab function using default settings)
2c: Formulation (4.14) using multi-start SQP with 50 random starting points (built-in Matlab function using default settings)
3c: Formulation (4.15) using multi-start SQP with 50 random starting points (built-in Matlab function using default settings)
4.2 Fidelity Maps for Model Update
4.2.1 Problem Definition
Consider a computationally expensive model M of a physical phenomenon, with two types of input, epistemic uncertainties X and aleatory uncertainties A, and a large number $n_y$ of outputs Y such that:
\[
\mathbf{Y} = M(\mathbf{X}, \mathbf{A})
\tag{4.17}
\]
The problem definition is as follows. Given a set of m observations (measurements) $\mathbf{Y}_{exp}$ of Y, made of $\mathbf{y}_{exp}^{(1)}$ to $\mathbf{y}_{exp}^{(m)}$, how can one reduce the uncertainty on X knowing $\mathbf{Y}_{exp}$ while accounting for A?
Example 4.2 (A Typical Model Update Setting) Consider an upward cantilever beam (representing a tower). A finite element model M returns $n_y = 3$ deflections $\boldsymbol{\delta} = [\delta_1, \delta_2, \delta_3]$ given the Young’s modulus E and the wind loading W (Figure 4.7) such that:
\[
\boldsymbol{\delta} = M(E, W)
\]
The decision maker has an idea of the Young’s modulus E (e.g., expert knowledge, manufacturing specs); however, the exact value is unknown. Given more information, E could be better characterized (epistemic uncertainty). On the other hand, the wind loading W has a known distribution $f_W$ that is irreducible (aleatory uncertainty). At different times, a set of m = 6 measurements $\boldsymbol{\delta}_{exp}$ (e.g., strain gauge) of the deflections were observed:
\[
\boldsymbol{\delta}_{exp} =
\begin{bmatrix}
\delta_{1,exp}^{(1)} & \cdots & \delta_{3,exp}^{(1)} \\
\vdots & \ddots & \vdots \\
\delta_{1,exp}^{(6)} & \cdots & \delta_{3,exp}^{(6)}
\end{bmatrix}
\tag{4.19}
\]
Hence, under the uncertainty from W and given the information $\boldsymbol{\delta}_{exp}$, how can one better characterize the Young’s modulus?
A natural approach would be to adapt the expression of the likelihood (2.119):
\[
L\left(\mathbf{x}|\mathbf{Y}_{exp}\right) = \prod_{i=1}^{m} f_{\mathbf{Y}}\left(\mathbf{y}_{exp}^{(i)}|\mathbf{x}\right)
\tag{4.20}
\]
Figure 4.7: Graphical illustration of Example 4.2 (deflections δ1, δ2, δ3 and Young’s modulus E).

Note the change from X to x: as in the MLE setting, x is seen as deterministic. Knowing the joint distribution $f_{\mathbf{Y}}$ is the purpose of uncertainty propagation and is usually not trivial. Existing approaches (Kennedy and O’Hagan, 2001; Xiong et al., 2009) would require extensive work to fit each response $Y_i$ separately in addition to quantifying the correlation among them. Residual-like approaches or products of marginal likelihoods (ignoring the dependencies between the $Y_i$) are numerically more efficient but can lead to large errors (Lacaze and Missoum, 2014c).
Fidelity Maps
In order to address the aforementioned issues, the notion of fidelity map (FM) is introduced (Lacaze and Missoum, 2012, 2013b, 2014c). A FM $\Omega_{FM}^{(j)}$ is defined as the region of the (X, A) space corresponding to responses Y within a user-defined interval of the observation $\mathbf{y}_{exp}^{(j)}$:
\[
\Omega_{FM}^{(j)} = \left\{ (\mathbf{x}, \mathbf{a}) \;|\; r_{ij} \le \varepsilon_i, \; i = 1, \dots, n_y \right\} \qquad j = 1, \dots, m
\]
where:
\[
r_{ij} = \left| \frac{y_i - y_{i,exp}^{(j)}}{y_{i,exp}^{(j)}} \right|
\]
Although complete, this definition can seem rather convoluted. The notion of FM is better remembered as the region of the input space that generates outputs within the vicinity of the observations. By definition, m observations $\mathbf{y}_{exp}^{(j)}$ define m FMs $\Omega_{FM}^{(j)}$.
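Likelihood Approximation

This section shows how to relate a FM $\Omega_{FM}^{(j)}$ to the likelihood. Without loss of generality and for the sake of simplicity, consider the scalar case x, $y_{exp}$ (one response, one measurement) and $\Omega_{FM}$. Recalling (2.2):
\[
f_Y(y_{exp}|x) = \left.\frac{dF_Y}{dy}\right|_{y_{exp}}
\tag{4.22}
\]
\[
F_Y(y_{exp}|x) = P\left[Y \le y_{exp}\right]
\]
and using the central finite difference expression:
\[
\left.\frac{dF_Y}{dy}\right|_{y_{exp}} = \lim_{\delta \to 0} \frac{F_Y(y_{exp} + \delta|x) - F_Y(y_{exp} - \delta|x)}{2\delta}
\]
leads to:
\[
f_Y(y_{exp}|x) = \lim_{\delta \to 0} \frac{P\left[y_{exp} - \delta \le Y \le y_{exp} + \delta \,|\, x\right]}{2\delta} = \lim_{\varepsilon \to 0} \frac{P\left[r \le \varepsilon\right]}{2 \varepsilon y_{exp}}
\]
where:
\[
r = \left| \frac{Y - y_{exp}}{y_{exp}} \right| \qquad \varepsilon = \frac{\delta}{y_{exp}} \qquad Y = M(x, A)
\]
The probability $P[r \le \varepsilon]$, by definition, is the probability of belonging to the FM $\Omega_{FM}$. Therefore:
\[
f_Y(y_{exp}|x) \overset{\propto}{\sim} P\left[(x, A) \in \Omega_{FM}\right]
\]
which extends to:
\[
f_{\mathbf{Y}}\left(\mathbf{y}_{exp}^{(j)}|\mathbf{x}\right) \overset{\propto}{\sim} P\left[(\mathbf{x}, \mathbf{A}) \in \Omega_{FM}^{(j)}\right]
\]
and finally:
\[
L\left(\mathbf{x}|\mathbf{Y}_{exp}\right) \overset{\propto}{\sim} \prod_{j=1}^{m} P\left[(\mathbf{x}, \mathbf{A}) \in \Omega_{FM}^{(j)}\right]
\tag{4.29}
\]
where $\overset{\propto}{\sim}$ stands for “approximately proportional to”. Figure 4.8 summarizes the different elements and links in the FM approach.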
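The FM-based likelihood approximation can be illustrated on a toy scalar problem. The sketch below rests on several assumptions made only for illustration: the inexpensive `model` is a hypothetical stand-in for M, the aleatory variable A is standard normal, the probabilities are estimated by crude Monte Carlo on the model itself (rather than on meta-models of the FM boundaries), and the observations are invented.

```python
import random

def model(x, a):
    """Hypothetical inexpensive stand-in for the model M(x, a)."""
    return x + a

def fm_probability(x, y_exp, eps, a_sample):
    """Monte Carlo estimate of P[(x, A) in Omega_FM] for one scalar observation."""
    hits = sum(1 for a in a_sample
               if abs(model(x, a) - y_exp) <= eps * abs(y_exp))  # r <= eps
    return hits / len(a_sample)

def fm_likelihood(x, observations, eps, a_sample):
    """Product over the observations of the FM probabilities."""
    value = 1.0
    for y_exp in observations:
        value *= fm_probability(x, y_exp, eps, a_sample)
    return value

rng = random.Random(0)
a_sample = [rng.gauss(0.0, 1.0) for _ in range(20000)]  # realizations of A
observations = [4.8, 5.1, 5.3]                          # invented measurements
candidates = [3.0, 4.0, 5.0, 6.0, 7.0]
likelihoods = {x: fm_likelihood(x, observations, eps=0.1, a_sample=a_sample)
               for x in candidates}
x_best = max(likelihoods, key=likelihoods.get)
```

For this toy model the exact likelihood is a product of normal densities centered at x, which peaks near the mean of the observations (about 5.07); the FM-based approximation ranks the candidate x = 5 highest accordingly.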
Figure 4.8: Main elements of the FM approach.
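Given the elements of the two previous sections, there are two ways to update the knowledge on X. In the frequentist viewpoint, X is updated into $\mathbf{x}_{MLE}$, the MLE obtained by maximizing (4.29). In the Bayesian viewpoint, a prior knowledge (PDF) $f_{\mathbf{X}}$ on X is required. Given this prior and the Bayes formula (2.123), the updated distribution of X (posterior PDF) reads:
\[
f_{\mathbf{X}}\left(\mathbf{x}|\mathbf{Y}_{exp}\right) \propto L\left(\mathbf{x}|\mathbf{Y}_{exp}\right) f_{\mathbf{X}}(\mathbf{x}) \overset{\propto}{\sim} f_{\mathbf{X}}(\mathbf{x}) \prod_{j=1}^{m} P\left[(\mathbf{x}, \mathbf{A}) \in \Omega_{FM}^{(j)}\right]
\tag{4.30}
\]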
Based on this expression, a random sample $X_{\mathbf{Y}_{exp}}$ of the posterior distribution can be drawn using MCMC techniques. Note that the prior allows one to weight the information from the observations (evidence) with some prior knowledge (belief, e.g., expert knowledge).

Example 4.3 (Prior Interpretation) Example 4.2 continued
Assume the observations $\boldsymbol{\delta}_{exp}$ were corrupted in some way (e.g., human error), and point toward a Young’s modulus E = 100 GPa. The decision maker knows that steel was used, such that a prior distribution could read (in GPa):
\[
E \sim N\left(210, 21^2\right)
\tag{4.31}
\]
The frequentist and Bayesian viewpoints would lead to very different results as the observations (evidence) and the prior distribution (belief) have a large discrepancy. A similar observation was made in Example 2.12.
As defined in the previous sections, in order to know if an instance (x, a) belongs to the FMs, the computationally expensive model M needs to be called. In order for this approach to remain numerically tractable, the FMs $\Omega_{FM}^{(j)}$ are approximated using meta-models as discussed in Sections 3.2 and 4.1, and summarized in Section 4.3.1. For the generalized max-min, if no prior $f_{\mathbf{X}}$ is available, uninformative priors (e.g., independent uniform distributions with reasonable bounds) are used and X and A are assumed independent. The strength of the FM approach stems from its ability to implicitly capture the dependence structure of a large number of responses Y. However, each observation $\mathbf{y}_{exp}^{(j)}$ generates a FM that needs to be approximated. In conclusion, the FM methodology is a rather niche approach that shines when:
• there is a large number $n_y$ of (possibly dependent) responses Y;
• there is a limited number m of observations $\mathbf{y}_{exp}^{(j)}$;
• there are discontinuous, non-differentiable or binary behaviors (through classification techniques).
Design under Uncertainties
In this section, an algorithm for reliability assessment is introduced and extended to a sequential surrogate-based local approach for RBDO. The reliability algorithm is a very conventional adaptive sampling scheme with the novelty of the generalized max-min and a bootstrap (Section 2.2.7) based convergence criterion for SVM.
Active Learning for Reliability Assessment
Using the problem statement from Section 2.4, the boundary $\partial\Omega_f$ or $\partial\Omega_{FM}$ of a failure domain $\Omega_f$ or a FM $\Omega_{FM}$, respectively, is to be estimated. For the case of component reliability, it is equivalent to approximate the limit state hyper-surface defined as:
\[
g(\mathbf{x}) = 0
\]
Note that in the FM setting, the “limit state” associated to $\Omega_{FM}^{(j)}$ is defined as:
\[
g^{(j)}(\mathbf{x}) = \max_{i} \left( r_{ij} - \varepsilon_i \right)
\]
As discussed in Chapter 3, the learner $\widetilde{M}$ used to approximate the limit state hyper-surface should be chosen based on known or expected features (Section 3.1.2). The active learning algorithm is developed in Algorithm 4.1.

Algorithm 4.1 Adaptive sampling scheme for contour estimation.
1: Define an initial size n training set $\mathrm{Tr} = [X_{tr}, Y_{tr}]$ based on a DOE as discussed in Section 3.3 such that:
\[
y_{tr}^{(i)} = g\left(\mathbf{x}_{tr}^{(i)}\right) \qquad i = [1, \dots, n]
\]
2: Train the initial meta-model $\widetilde{M}(\cdot|\mathrm{Tr})$ (Section 3.2)
3: while $\widetilde{M}(\cdot|\mathrm{Tr})$ not converged (Section 4.3.3) do
4:   Find a new instance $\mathbf{x}_{gmm}$ (Section 4.1) such that:
\[
\begin{aligned}
\mathbf{x}_{gmm} = \arg\max_{\mathbf{x}} \; & \frac{1}{d} \log f_{\mathbf{X}}(\mathbf{x}) - \frac{1}{p} \log \sum_{i=1}^{n} \left\| \mathbf{x} - \mathbf{x}_{tr}^{(i)} \right\|^{-p} \\
\text{s.t.} \; & \widetilde{M}(\mathbf{x}|\mathrm{Tr}) = 0
\end{aligned}
\]
5:   Compute $y_{gmm} = g(\mathbf{x}_{gmm})$
6:   Augment Tr with $[\mathbf{x}_{gmm}, y_{gmm}]$
7:   Update $\widetilde{M}(\cdot|\mathrm{Tr})$
8: end while
System Reliability
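In the case of system reliability (Section 2.4), each of the $n_g$ limit state functions $g_i$ is approximated by $\widetilde{M}_i(\cdot, \mathrm{Tr}_i)$. However, instead of searching a generalized max-min instance for each meta-model, a unique instance can be found using the following updated optimization problems:
• Series:
\[
\begin{aligned}
\max_{\mathbf{x}} \; & \frac{1}{d} \log f_{\mathbf{X}}(\mathbf{x}) - \frac{1}{p} \log \sum_{i=1}^{n} \left\| \mathbf{x} - \mathbf{x}_{tr}^{(i)} \right\|^{-p} \\
\text{s.t.} \; & \prod_{i=1}^{n_g} \widetilde{M}_i(\mathbf{x}, \mathrm{Tr}_i) = 0 \\
& \widetilde{M}_i(\mathbf{x}, \mathrm{Tr}_i) \ge 0 \qquad i = [1, \dots, n_g]
\end{aligned}
\]
• Parallel:
\[
\begin{aligned}
\max_{\mathbf{x}} \; & \frac{1}{d} \log f_{\mathbf{X}}(\mathbf{x}) - \frac{1}{p} \log \sum_{i=1}^{n} \left\| \mathbf{x} - \mathbf{x}_{tr}^{(i)} \right\|^{-p} \\
\text{s.t.} \; & \prod_{i=1}^{n_g} \widetilde{M}_i(\mathbf{x}, \mathrm{Tr}_i) = 0 \\
& \widetilde{M}_i(\mathbf{x}, \mathrm{Tr}_i) \le 0 \qquad i = [1, \dots, n_g]
\end{aligned}
\]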
Convergence Criteria
A very simple and reliable convergence criterion was proposed by Dubourg et al. (2011). Assuming the chosen approximation $\widetilde{M}(\cdot, \mathrm{Tr})$ provides an estimate of its standard error, one can compute a (1 − α) CI on the probability of failure such that:
\[
P_f^{\frac{\alpha}{2}} = P\left[\widetilde{P}(\mathbf{X}|\mathrm{Tr}) \le \frac{\alpha}{2}\right]
\tag{4.36}
\]
\[
P_f = P\left[\widetilde{M}(\mathbf{X}|\mathrm{Tr}) \le 0\right]
\tag{4.37}
\]
\[
P_f^{1-\frac{\alpha}{2}} = P\left[\widetilde{P}(\mathbf{X}|\mathrm{Tr}) \le 1 - \frac{\alpha}{2}\right]
\tag{4.38}
\]
where $\widetilde{P}(\mathbf{x}|\mathrm{Tr})$ is the probability of x to be predicted as +1 (Section ). For GP regression, it reads:
\[
\widetilde{P}(\mathbf{x}|\mathrm{Tr}) = \Phi\left( \frac{\widetilde{M}(\mathbf{x}|\mathrm{Tr})}{\widetilde{M}_{SE}(\mathbf{x}|\mathrm{Tr})} \right)
\tag{4.39}
\]
However, for SVMs, as discussed in Section, the sigmoid-based $\widetilde{P}(\mathbf{x}|\mathrm{Tr})$ can be rather unreliable. In this work, we propose to take advantage of the relatively low computational cost of training and evaluating an SVM (for fixed hyper-parameters ψ). When the SVM $\widetilde{M}(\cdot, \mathrm{Tr})$ is trained, $n_b$ bootstrapped (Section 2.2.7) SVMs $\widetilde{M}(\cdot, \mathrm{Tr}^{(i)})$ are also trained. These bootstrapped SVMs are obtained by bootstrapping the training set Tr and retaining the optimal hyper-parameters ψ from $\widetilde{M}(\cdot, \mathrm{Tr})$. The probability $\widetilde{P}(\mathbf{x}|\mathrm{Tr})$ is then defined as:
\[
\widetilde{P}(\mathbf{x}|\mathrm{Tr}) = \frac{1}{n_b} \sum_{i=1}^{n_b} \mathrm{sgn}\left( \widetilde{M}\left(\mathbf{x}, \mathrm{Tr}^{(i)}\right) \right)
\]
The estimated risk of the approximation $\widetilde{M}$ is measured as the order of magnitude between the two probability of failure bounds:
\[
c_v = \log \left[ \frac{P_f^{1-\frac{\alpha}{2}}}{P_f^{\frac{\alpha}{2}}} \right]
\tag{4.41}
\]
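For the GP case, the bounds (4.36) to (4.38) and the criterion (4.41) can be sketched from a Monte Carlo sample of predictions. The sketch is illustrative only: the predicted means and standard errors are hypothetical, and the base-10 logarithm is an assumption made here because the text speaks of an order of magnitude.

```python
import math

def normal_cdf(t):
    """Standard normal CDF Phi(t)."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def failure_probability_bounds(means, std_errs, alpha=0.05):
    """Monte Carlo estimates of the lower bound, P_f, and the upper bound."""
    n = len(means)
    # Probability of each realization being predicted as +1, as in (4.39)
    p_plus = [normal_cdf(m / s) for m, s in zip(means, std_errs)]
    pf_lower = sum(1 for p in p_plus if p <= alpha / 2.0) / n
    pf = sum(1 for m in means if m <= 0.0) / n
    pf_upper = sum(1 for p in p_plus if p <= 1.0 - alpha / 2.0) / n
    return pf_lower, pf, pf_upper

def convergence_metric(pf_lower, pf_upper):
    """Order of magnitude between the bounds (base 10 assumed; needs pf_lower > 0)."""
    return math.log10(pf_upper / pf_lower)

# Hypothetical GP predictions at ten realizations of X
means = [-3.0, -0.5, 0.2, 1.0, 4.0, -2.5, 0.1, 3.0, 2.0, -0.1]
std_errs = [1.0] * len(means)
pf_lower, pf, pf_upper = failure_probability_bounds(means, std_errs)
cv = convergence_metric(pf_lower, pf_upper)
```

As the meta-model is refined near the limit state, fewer realizations have an ambiguous label, the two bounds tighten around P_f, and the metric decreases toward zero.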
Reliability-based Design Optimization
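This section extends the previously discussed elements to the RBDO setting (Lacaze and Missoum, 2013a; Lacaze et al., 2015b). For the sake of clarity, and without loss of generality, a reduced RBDO setup (2.266) is considered:
\[
\begin{aligned}
\min \; & C(\mathbf{z}) \\
\text{s.t.} \; & P\left[g_i(\mathbf{X}, \mathbf{z}) \le 0\right] \le P_{T_i} \qquad i = [1, \dots, n_f] \\
& \mathbf{X} \sim f_{\mathbf{X}}(\mathbf{x}|\boldsymbol{\theta}) \qquad \mathbf{X} = [X_1, \dots, X_d] \\
& \mathbf{l}_z \le \mathbf{z} \le \mathbf{u}_z
\end{aligned}
\tag{4.42}
\]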
Various surrogate-based RBDO approaches relied on two independent steps (Bichon et al., 2009; Dubourg et al., 2011):
• Construct global meta-models $\widetilde{g}_i$ of the failure domains;
• Perform double-loop RBDO using $\widetilde{g}_i$ (Section 2.5.2).
In this work, we propose a sequential approach where the failure domain approximations $\widetilde{g}_i$ are sequentially refined around the current optimum. Algorithm 4.2 summarizes the approach while Sections and provide details about the approach. Recent work explored the possibility of relaxing the constraint $\mathbf{z} = \mathbf{z}^\star$ at Line 5 of Algorithm 4.2 (Brévault et al., 2015).
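Algorithm 4.2 RBDO algorithm.
1: Define an initial size n training set $\mathrm{Tr} = [X_{tr}, Z_{tr}, Y_{tr}]$ based on a DOE as discussed in Section 3.3 such that:
\[
\mathbf{y}_{tr}^{(j)} = \left[ g_1\left(\mathbf{x}_{tr}^{(j)}, \mathbf{z}_{tr}^{(j)}\right), \dots, g_{n_f}\left(\mathbf{x}_{tr}^{(j)}, \mathbf{z}_{tr}^{(j)}\right) \right] \qquad j = [1, \dots, n]
\]
2: Train the initial approximations $\widetilde{g}_i(\cdot|\mathrm{Tr}_i)$ (Section 3.2) where $\mathrm{Tr}_i$ is the ith training set defined as:
\[
\mathrm{Tr}_i = \left[ X_{tr}, Z_{tr}, Y_{i\,tr} \right]
\tag{4.43}
\]
3: while not converged (Section ) do
4:   Get current optimum $[\mathbf{z}^\star, \boldsymbol{\theta}^\star]$ (Section )
5:   Find $n_f$ new instances $[\mathbf{x}_{i,gmm}, \mathbf{z}_{i,gmm}]$ (Section 4.1) such that:
\[
\begin{aligned}
\max_{\mathbf{x}, \mathbf{z}} \; & \frac{1}{d} \log f_{\mathbf{X}}(\mathbf{x}|\boldsymbol{\theta}^\star) - \frac{1}{p} \log \sum_{i=1}^{n} \left\| [\mathbf{x}, \mathbf{z}] - \left[\mathbf{x}_{tr}^{(i)}, \mathbf{z}_{tr}^{(i)}\right] \right\|^{-p} \\
\text{s.t.} \; & \widetilde{g}_i(\mathbf{x}, \mathbf{z}|\mathrm{Tr}_i) = 0 \\
& \mathbf{z} = \mathbf{z}^\star
\end{aligned}
\]
6:   Compute all $y_{i,gmm} = g(\mathbf{x}_{i,gmm}, \mathbf{z}_{i,gmm})$
7:   Augment all $\mathrm{Tr}_i$ with $[\mathbf{x}_{i,gmm}, \mathbf{z}_{i,gmm}, y_{i,gmm}]$
8:   Update all $\widetilde{g}_i(\cdot|\mathrm{Tr}_i)$
9: end while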
RBDO Sub-Problem
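At every iteration, the first step of this approach is to solve the following optimization sub-problem:
\[
\begin{aligned}
\min \; & C(\mathbf{z}) \\
\text{s.t.} \; & P\left[\widetilde{g}_i(\mathbf{X}, \mathbf{z}|\mathrm{Tr}) \le 0\right] \le P_{T_i} \qquad i = [1, \dots, n_f] \\
& \mathbf{X} \sim f_{\mathbf{X}}(\mathbf{x}|\boldsymbol{\theta}) \\
& \mathbf{l}_z \le \mathbf{z} \le \mathbf{u}_z
\end{aligned}
\tag{4.44}
\]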
The sub-problem (4.44) is exactly the optimization problem (4.42), where the limit state functions $g_i$ have been replaced by their meta-models $\widetilde{g}_i$. Based on Section 2.4, the probabilities in (4.44) are evaluated using SubSim. However, as discussed in Section, sensitivities (i.e., gradients) of the probability of failure with respect to deterministic variables are missing from the literature for sampling-based methods (i.e., CMC, SubSim). Such sensitivities can be derived as follows (Lacaze et al., 2015a). Recall the expression of a general probability of failure:
\[
P_f(\mathbf{z}) = \int_{-\infty}^{+\infty} I\left[g(\mathbf{x}, \mathbf{z}) \le 0\right] f_{\mathbf{X}}(\mathbf{x}) \, d\mathbf{x}
\tag{4.45}
\]
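According to the differentiation rules under the integral symbol using the theory of distributions (Schwartz, 1957; Jones, 1987), the sensitivity of $P_f$ with respect to the variable $z_k$ reads:
\[
\left.\frac{\partial P_f}{\partial z_k}\right|_{\mathbf{z}} = \frac{\partial}{\partial z_k} \int I\left[g(\mathbf{x}, \mathbf{z}) \le 0\right] f_{\mathbf{X}}(\mathbf{x}) \, d\mathbf{x} = \int \frac{\partial}{\partial z_k} I\left[g(\mathbf{x}, \mathbf{z}) \le 0\right] f_{\mathbf{X}}(\mathbf{x}) \, d\mathbf{x}
\tag{4.46}
\]
From the theory of distributions, the derivative of the indicator function is:
\[
\frac{dI[y \ge 0]}{dy} = -\frac{dI[y \le 0]}{dy} = \delta[y] =
\begin{cases}
+\infty & \text{if } y = 0 \\
0 & \text{otherwise}
\end{cases}
\tag{4.47}
\]
where δ is the Dirac distribution (“impulse”). Hence, (4.46) becomes:
\[
\left.\frac{\partial P_f}{\partial z_k}\right|_{\mathbf{z}} = -\int \left.\frac{\partial g}{\partial z_k}\right|_{\mathbf{x}, \mathbf{z}} \delta\left[g(\mathbf{x}, \mathbf{z})\right] f_{\mathbf{X}}(\mathbf{x}) \, d\mathbf{x}
\tag{4.48}
\]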
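Note that (4.48) involves the derivative of g. Such derivatives are always available if g is replaced by a meta-model $\widetilde{g}$ (Section 3.2). For a size n random sample X, a CMC estimate of this sensitivity reads:
\[
\left.\frac{\partial \widehat{P}_f}{\partial z_k}\right|_{\mathbf{z}} = -\frac{1}{n} \sum_{i=1}^{n} \left.\frac{\partial g}{\partial z_k}\right|_{\mathbf{x}^{(i)}, \mathbf{z}} \delta\left[g\left(\mathbf{x}^{(i)}, \mathbf{z}\right)\right]
\tag{4.49}
\]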
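Using the notations from Section 2.4.5, sensitivity estimates for SubSim are now derived (Lacaze et al., 2015a). Based on (2.256), the sensitivity of $P_f$ is:
\[
\left.\frac{\partial P_f}{\partial z_k}\right|_{\mathbf{z}} = P_f(\mathbf{z}) \sum_{i=1}^{s} \frac{1}{P_f^{(i)}(\mathbf{z})} \left.\frac{\partial P_f^{(i)}}{\partial z_k}\right|_{\mathbf{z}}
\tag{4.50}
\]
For the first SubSim sub-domain, we have:
\[
\left.\frac{\partial P_f^{(1)}}{\partial z_k}\right|_{\mathbf{z}} = -\int \left.\frac{\partial g^{(1)}}{\partial z_k}\right|_{\mathbf{x}, \mathbf{z}} \delta\left[g^{(1)}(\mathbf{x}, \mathbf{z})\right] f_{\mathbf{X}}(\mathbf{x}) \, d\mathbf{x}
\tag{4.51}
\]
and for any subsequent step i > 1:
\[
\begin{aligned}
\left.\frac{\partial P_f^{(i)}}{\partial z_k}\right|_{\mathbf{z}} &= \frac{\partial}{\partial z_k} \left[ \frac{\int I\left[g^{(i)}(\mathbf{x}, \mathbf{z}) \le 0\right] f_{\mathbf{X}}(\mathbf{x}) \, d\mathbf{x}}{\prod_{j=1}^{i-1} P_f^{(j)}(\mathbf{z})} \right] \\
&= -\int \left.\frac{\partial g^{(i)}}{\partial z_k}\right|_{\mathbf{x}, \mathbf{z}} \delta\left[g^{(i)}(\mathbf{x}, \mathbf{z})\right] \frac{f_{\mathbf{X}}(\mathbf{x})}{\prod_{j=1}^{i-1} P_f^{(j)}(\mathbf{z})} \, d\mathbf{x} \\
&\quad - \frac{\frac{\partial}{\partial z_k}\left[\prod_{j=1}^{i-1} P_f^{(j)}(\mathbf{z})\right]}{\left(\prod_{j=1}^{i-1} P_f^{(j)}(\mathbf{z})\right)^2} \int I\left[g^{(i)}(\mathbf{x}, \mathbf{z}) \le 0\right] f_{\mathbf{X}}(\mathbf{x}) \, d\mathbf{x}
\end{aligned}
\tag{4.52}
\]
Noting the three following relations:
\[
\int I\left[g^{(i)}(\mathbf{x}, \mathbf{z}) \le 0\right] f_{\mathbf{X}}(\mathbf{x}) \, d\mathbf{x} = \prod_{j=1}^{i} P_f^{(j)}(\mathbf{z})
\]
\[
\frac{\partial}{\partial z_k}\left[\prod_{j=1}^{i-1} P_f^{(j)}(\mathbf{z})\right] = \prod_{j=1}^{i-1} P_f^{(j)}(\mathbf{z}) \sum_{j=1}^{i-1} \frac{1}{P_f^{(j)}(\mathbf{z})} \left.\frac{\partial P_f^{(j)}}{\partial z_k}\right|_{\mathbf{z}}
\tag{4.54}
\]
\[
\frac{f_{\mathbf{X}}(\mathbf{x})}{\prod_{j=1}^{i-1} P_f^{(j)}(\mathbf{z})} \, I\left[g^{(i-1)}(\mathbf{x}, \mathbf{z}) \le 0\right] = q_{i-1}\left(\mathbf{x} \,|\, \Omega_f^{(i-1)}(\mathbf{z})\right)
\]
and that the support of $q_{i-1}\left(\cdot \,|\, \Omega_f^{(i-1)}(\mathbf{z})\right)$ is $\Omega_f^{(i-1)}(\mathbf{z})$, the ith intermediate sensitivity is:
\[
\left.\frac{\partial P_f^{(i)}}{\partial z_k}\right|_{\mathbf{z}} = -\int \left.\frac{\partial g^{(i)}}{\partial z_k}\right|_{\mathbf{x}, \mathbf{z}} \delta\left[g^{(i)}(\mathbf{x}, \mathbf{z})\right] q_{i-1}\left(\mathbf{x} \,|\, \Omega_f^{(i-1)}(\mathbf{z})\right) d\mathbf{x} - P_f^{(i)}(\mathbf{z}) \sum_{j=1}^{i-1} \frac{1}{P_f^{(j)}(\mathbf{z})} \left.\frac{\partial P_f^{(j)}}{\partial z_k}\right|_{\mathbf{z}}
\tag{4.56}
\]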
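SubSim estimators of the sensitivities (4.51) and (4.56) are:
\[
\left.\frac{\partial \widehat{P}_f^{(1)}}{\partial z_k}\right|_{\mathbf{z}} = -\frac{1}{n_1} \sum_{l=1}^{n_1} \left.\frac{\partial g^{(1)}}{\partial z_k}\right|_{\mathbf{x}^{(1,l)}, \mathbf{z}} \delta\left[g^{(1)}\left(\mathbf{x}^{(1,l)}, \mathbf{z}\right)\right]
\tag{4.57}
\]
\[
\left.\frac{\partial \widehat{P}_f^{(i)}}{\partial z_k}\right|_{\mathbf{z}} = -\frac{1}{n_i} \sum_{l=1}^{n_i} \left.\frac{\partial g^{(i)}}{\partial z_k}\right|_{\mathbf{x}^{(i,l)}, \mathbf{z}} \delta\left[g^{(i)}\left(\mathbf{x}^{(i,l)}, \mathbf{z}\right)\right] - \widehat{P}_f^{(i)}(\mathbf{z}) \sum_{j=1}^{i-1} \frac{1}{\widehat{P}_f^{(j)}(\mathbf{z})} \left.\frac{\partial \widehat{P}_f^{(j)}}{\partial z_k}\right|_{\mathbf{z}}
\tag{4.58}
\]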
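where $n_i$ is the size of the random sample $X^{(i)}$. The estimates (4.49), (4.57) and (4.58) involve the Dirac distribution, making their calculation intractable (there is a zero probability to get an example $\mathbf{x}^{(i)}$ such that $g(\mathbf{x}^{(i)}, \mathbf{z})$ is exactly 0). To overcome this hurdle, the Dirac distribution is approximated using a smooth function $\hat{\delta}_\sigma$ such that $\lim_{\sigma \to 0} \hat{\delta}_\sigma[y] = \delta[y]$. Five candidates are considered in this work:
• Gaussian:
\[
\hat{\delta}_\sigma[y] = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{y^2}{2\sigma^2}\right) = \frac{1}{\sigma} \phi\left(\frac{y}{\sigma}\right)
\]
• Truncated Gaussian:
\[
\hat{\delta}_\sigma[y] = \frac{\frac{1}{\sigma} \phi\left(\frac{y}{\sigma}\right)}{\Phi(1) - \Phi(-1)} \, I_{-\sigma \le y \le \sigma}
\]
• Sinc:
\[
\hat{\delta}_\sigma[y] = \frac{\sin\left(\frac{y}{\sigma}\right)}{y\pi}
\]
• Bump:
\[
\hat{\delta}_\sigma[y] = \frac{1}{A\sigma} \exp\left(-\frac{1}{1 - \left(\frac{y}{\sigma}\right)^2}\right) I_{-\sigma \le y \le \sigma} \qquad A = \int_{-1}^{1} \exp\left(-\frac{1}{1 - y^2}\right) dy
\]
• Poisson:
\[
\hat{\delta}_\sigma[y] = \frac{\sigma}{\pi\left(\sigma^2 + y^2\right)}
\]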
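The five candidate approximations can be written down and sanity-checked numerically, as in the sketch below. It is illustrative only: the function names, the midpoint integration rule and the value σ = 0.05 used in the check are choices made here, and the bump kernel is taken with the negative exponent that makes the constant A a proper normalization.

```python
import math

def phi(t):
    """Standard normal PDF."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def Phi(t):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def gaussian(y, sigma):
    return phi(y / sigma) / sigma

def truncated_gaussian(y, sigma):
    if abs(y) > sigma:
        return 0.0
    return phi(y / sigma) / (sigma * (Phi(1.0) - Phi(-1.0)))

def sinc(y, sigma):
    if y == 0.0:
        return 1.0 / (math.pi * sigma)  # limit of sin(y/sigma)/(pi*y) as y -> 0
    return math.sin(y / sigma) / (math.pi * y)

def _bump_kernel(u):
    denom = 1.0 - u * u
    return math.exp(-1.0 / denom) if denom > 0.0 else 0.0

def midpoint(f, a, b, n=20000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

BUMP_A = midpoint(_bump_kernel, -1.0, 1.0)  # normalization constant A

def bump(y, sigma):
    return _bump_kernel(y / sigma) / (BUMP_A * sigma)

def poisson(y, sigma):
    return sigma / (math.pi * (sigma * sigma + y * y))

# Each compactly supported or fast-decaying kernel should integrate to about 1
sigma = 0.05
integrals = {f.__name__: midpoint(lambda y, f=f: f(y, sigma), -1.0, 1.0)
             for f in (gaussian, truncated_gaussian, bump, poisson)}
```

The Gaussian, truncated Gaussian and bump kernels integrate to 1 over this interval, while the heavy-tailed Poisson kernel still has about 3% of its mass outside [−1, 1] for this σ, which hints at the different bias behavior of the candidates.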
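All these functions include a scalar parameter σ which dictates the “width” of the Dirac approximation. The choice of the approximation as well as σ is of prime importance. Ideally, one would like σ to tend to zero. However, because we are using sampling-based methods, only a finite amount of information is available. For this reason, an “optimal” value of σ needs to be chosen. The optimal value of σ and the choice of Dirac approximation might be problem dependent. Statistically, the optimal choice is the one that minimizes the error between the actual and the estimated sensitivity. Knowing the true sensitivity, traditional performance metrics of an estimator can be computed, such as normalized bias (Bias), standard deviation (Std) and root mean square error (RMSE):
\[
\mathrm{Bias}\,(\%) = 100 \times \frac{E\left[\hat{\psi} - \psi\right]}{\psi}
\tag{4.59}
\]
\[
\mathrm{Std}\,(\%) = 100 \times \frac{\sqrt{E\left[\hat{\psi}^2\right] - E\left[\hat{\psi}\right]^2}}{\psi}
\tag{4.60}
\]
\[
\mathrm{RMSE}\,(\%) = 100 \times \frac{\sqrt{E\left[\left(\hat{\psi} - \psi\right)^2\right]}}{\psi}
\]
where ψ is the actual sensitivity value $\frac{\partial P_f}{\partial z_k}$ and $\hat{\psi}$ is an estimator of ψ, as defined in (4.49). At this point, it is important to recall that the estimator of the sensitivity encompasses two levels of approximation:
\[
\left.\frac{\partial P_f}{\partial z_k}\right|_{\mathbf{z}} = -\int \left.\frac{\partial g}{\partial z_k}\right|_{\mathbf{x}, \mathbf{z}} \delta\left[g(\mathbf{x}, \mathbf{z})\right] f_{\mathbf{X}}(\mathbf{x}) \, d\mathbf{x}
\]
\[
\approx -\int \left.\frac{\partial g}{\partial z_k}\right|_{\mathbf{x}, \mathbf{z}} \hat{\delta}\left[g(\mathbf{x}, \mathbf{z})\right] f_{\mathbf{X}}(\mathbf{x}) \, d\mathbf{x}
\tag{4.62}
\]
\[
\approx -\frac{1}{n} \sum_{i=1}^{n} \left.\frac{\partial g}{\partial z_k}\right|_{\mathbf{x}^{(i)}, \mathbf{z}} \hat{\delta}\left[g\left(\mathbf{x}^{(i)}, \mathbf{z}\right)\right]
\tag{4.63}
\]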
Because CMC estimators are unbiased, (4.63) only introduces variance in the estimator. On the other hand, (4.62) is an analytical approximation and only introduces bias in the estimator. Although the variance could be estimated using the standard error, the bias is not, strictly speaking, statistical. Therefore it cannot be quantified statistically, for instance with leave-one-out approaches.

Exercise 4.2 (Empirical Selection of σ) The "optimal" $\sigma$ can be obtained through experiments. Although the resulting value is not optimal for every problem, this educated guess leads to better results than an arbitrary one. Consider the following linear analytical limit state, for which analytical sensitivities can be derived:

$$g(x, z) = -x - z + d \le 0$$

where $X \sim N(0, 1^2)$. Because the limit state function is linear, the probability of failure and its derivative can be obtained exactly:

$$P_f(z) = 1 - \Phi(d - z) \tag{4.65}$$

$$\left.\frac{dP_f}{dz}\right|_{z} = \phi(d - z) \tag{4.66}$$

The size of the CMC sample ($n$) is defined to ensure a 5% coefficient of variation on the probability of failure:

$$n(z) = \left(\frac{\sqrt{1 - P_f(z)}}{\sqrt{P_f(z)} \times 0.05}\right)^2 \tag{4.67}$$
where $P_f$ is defined by (4.65). $\sigma$ is a function of the number of points (i.e., the amount of information) available, which is in turn influenced by the value of $P_f$. For this reason, a parameter $\alpha$ is introduced to define a relevant number $n_r$ of the available samples such that $n_r = \left\lceil \hat{P}_f \times N \times \alpha \right\rceil$. Because the optimal value of $\sigma$ also depends on the order of magnitude of $g$, the following quantities are defined. Let $Y$ be the random sample such that $y^{(i)} = g\left(x^{(i)}, z\right)$, $|Y|$ the random sample of absolute values of $Y$, and recall the order statistic (2.87). $\sigma$ is therefore defined as $|Y|_{(n_r)}$ so that only the $n_r$ points closest to the limit state hyper-surface (approximation) have function values within $\pm\sigma$. These points are the most relevant to the calculation of the sensitivity of $P_f$ because they will potentially lead to a variation of $I\left[g(x, z) \le 0\right]$.

The experiment is reproduced for four values of $d$ such that $P_f$ equals $10^{-1}$, $10^{-2}$, $10^{-3}$, and $10^{-4}$. Figure 4.9 shows the plots of normalized bias (4.59), standard error (4.60) and root mean square error (4.61). Expectations in (4.59)-(4.61) are calculated over 300 repetitions. Two immediate conclusions arise:

• The Poisson approximation performs poorly compared to the other approximations;
• The Sinc approximation provides inconsistent results.

Out of the three remaining approximations, the Gaussian one has the lowest variance across the experiments. Note that this is a very favorable feature for optimization: in gradient-based optimization, the variance in the sensitivities impairs the convergence properties more than the bias. For these reasons, the Gaussian approximation is recommended. From the results in Figure 4.9, in the case of a Gaussian approximation, a graphical inspection shows that a value of $\alpha = 0.5$ is a satisfactory choice for the minimization of the RMSE. This value can be compared to the solution of the following optimization problem:

$$\alpha_{opt} = \arg\min_{\alpha} \mathrm{RMSE}(\alpha) \tag{4.68}$$

Table 4.3: Normalized bias, standard error and root mean square error at α = αopt and α = 0.5. Gaussian approximation.

Pf       αopt | α     Bias(αopt) | Bias(α)   Std(αopt) | Std(α)   RMSE(αopt) | RMSE(α)
10^-1    0.94 | 0.50  2.52 | 1.54            3.40 | 5.15          4.23 | 5.36
10^-2    0.59 | 0.50  2.62 | 1.94            4.74 | 5.18          5.41 | 5.52
10^-3    0.53 | 0.50  2.30 | 2.03            4.89 | 5.04          5.40 | 5.43
10^-4    0.50 | 0.50  2.55 | 2.55            5.22 | 5.22          5.80 | 5.80
Table 4.3 shows normalized bias (4.59), standard error (4.60) and root mean square error (4.61) for α = αopt and α = 0.5. Except for the case Pf = 10^-1, α = 0.5 yields results similar to α = αopt. For Pf = 10^-1 it yields an increase in RMSE of about 1%.

[Figure 4.9: Normalized bias (Bias), standard error (Std) and root mean square error (RMSE) as functions of α for four levels of probability of failure (Pf = 10^-1, 10^-2, 10^-3, 10^-4). Each panel compares the Gaussian, truncated Gaussian, Sinc, Bump and Poisson approximations.]
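The experiment of Exercise 4.2 can be sketched for the Gaussian approximation as follows. This is a minimal illustration under stated assumptions, not the code used to produce Table 4.3: the function names are introduced here, $N$ in the definition of $n_r$ is taken to be the CMC sample size $n$, $z = 0$ is chosen arbitrarily, and only 30 repetitions are used by default to keep the run short.

```python
import math
import random
from statistics import NormalDist

def phi(u):
    """Standard normal PDF."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def sample_size(pf, cov=0.05):
    """CMC sample size for a given coefficient of variation on Pf, as in (4.67)."""
    return math.ceil((1.0 - pf) / (pf * cov ** 2))

def sensitivity_estimate(d, z, n, alpha, rng):
    """One estimate of dPf/dz for g(x, z) = -x - z + d (Gaussian Dirac approximation)."""
    y = [-rng.gauss(0.0, 1.0) - z + d for _ in range(n)]
    pf_hat = sum(v <= 0.0 for v in y) / n
    n_r = max(1, math.ceil(pf_hat * n * alpha))   # number of relevant samples (N = n assumed)
    sigma = sorted(abs(v) for v in y)[n_r - 1]    # order statistic |Y|_(n_r)
    dg_dz = -1.0                                  # dg/dz of the linear limit state
    return -sum(dg_dz * phi(v / sigma) / sigma for v in y) / n

def metrics(pf, alpha=0.5, repetitions=30, seed=0):
    """Normalized bias, Std and RMSE in %, following (4.59)-(4.61)."""
    rng = random.Random(seed)
    z = 0.0                                       # arbitrary choice for this sketch
    d = z + NormalDist().inv_cdf(1.0 - pf)        # so that 1 - Phi(d - z) = pf
    psi = phi(d - z)                              # exact sensitivity (4.66)
    n = sample_size(pf)
    est = [sensitivity_estimate(d, z, n, alpha, rng) for _ in range(repetitions)]
    mean = sum(est) / repetitions
    var = sum(e * e for e in est) / repetitions - mean ** 2
    mse = sum((e - psi) ** 2 for e in est) / repetitions
    return (100.0 * (mean - psi) / psi,
            100.0 * math.sqrt(max(var, 0.0)) / psi,
            100.0 * math.sqrt(mse) / psi)

bias, std, rmse = metrics(pf=1e-1)
print(f"Bias = {bias:.2f}%  Std = {std:.2f}%  RMSE = {rmse:.2f}%")
```

Sweeping `alpha` over a grid and keeping the value with the lowest RMSE gives an empirical counterpart of $\alpha_{opt}$; smaller target probabilities require larger samples through (4.67) and therefore longer runs.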
The convergence of the RBDO algorithm combines conventional optimization metrics:
• Soft: relative change in the current optimum $[\mathbf{z}^\star, \boldsymbol{\theta}^\star]$;
• Hard: relative change in the current optimal objective function value $C(\mathbf{z}^\star, \boldsymbol{\theta}^\star)$;
and meta-model convergence criteria, as discussed in Section 4.3.3.
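The soft and hard metrics above amount to relative changes between two consecutive iterations. A minimal sketch follows; the tolerances, the function names and the choice of requiring both metrics to fall below their tolerance are assumptions made for illustration, and the meta-model criteria of Section 4.3.3 are not included.

```python
import math

def relative_change(new, old):
    """||new - old|| / ||old||, falling back to the absolute change if ||old|| = 0."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(new, old)))
    den = math.sqrt(sum(b ** 2 for b in old))
    return num / den if den > 0.0 else num

def optimization_converged(x_new, x_old, c_new, c_old, tol_soft=1e-3, tol_hard=1e-3):
    """Return (converged, soft, hard) for two consecutive optima and objective values."""
    soft = relative_change(x_new, x_old)          # change in the optimum [z*, theta*]
    hard = relative_change([c_new], [c_old])      # change in the objective value C
    return soft <= tol_soft and hard <= tol_hard, soft, hard
```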
Chapter 5 Demonstrative Examples and Applications

This chapter presents numerical examples of some points discussed in this dissertation, academic examples of the proposed methodology and some real-life applications.
Supervised Learning for Computational Design
In this section, simple examples are used to highlight some of the facts that are discussed in Chapter 3.
DOE Comparison
In Section 3.3, some DOE techniques have been introduced. In order to give some insight into their relative efficiency, consider the following limit state function:

$$g(\mathbf{X}) = X_2 - \frac{\sin(10 X_1)}{4} - 0.5 \le 0, \qquad X_1, X_2 \sim U(0, 1) \tag{5.1}$$
Exercise 5.1 (Sequential Generation) An initial size n training sample Xtr is obtained using the DOE techniques discussed in Section 3.3. Using the corresponding training set Tr, an SVM is built and its risk (0-1 loss) is estimated (generalization error) based on a size 10^6 testing set Te.

Exercise 5.1 is repeated 200 times for random, LHS, OLHS, and CVT DOEs of size n = 10, n = 50, and n = 100. For this comparison, OLHS refers to the best LHS out of one thousand candidates (in the maximum minimum distance sense). Table 5.1 reports the average risks (in %) over 200 repetitions and the coefficients of variation (in parentheses, in %). It highlights how CVT tends to lead to lower risk (and less variation) at fixed n (for this example). Although this is only one example, these results are rather intuitive and, in the author's experience, generalize well.

Table 5.1: Comparison of SVM risks using different DOE types and sizes for (5.1).

n         10             50             100
Random    23.06 (35.39)  15.10 (30.42)  5.48 (29.93)
LHS       20.19 (30.35)  12.11 (27.95)  5.26 (25.51)
OLHS      18.88 (24.25)  11.30 (27.85)  5.09 (25.20)
CVT       15.94 (9.38)    9.04 (21.37)  3.76 (18.63)

Remark 5.1 (Asymptotic Behavior of DOE) Table 5.1 also highlights a less intuitive result. As the dimensionality d and the DOE size n increase, the relative advantages of all DOE techniques over the baseline random DOE diminish. Consequently, asymptotically, as d and/or n tends to infinity, all DOE techniques lead to the same performance, which is equal to that of a random DOE.
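To make the LHS and OLHS designs of Exercise 5.1 concrete, the sketch below generates a Latin hypercube on the unit square, selects the best of several candidates in the maximum minimum distance sense, and labels the points with the limit state $g(\mathbf{X}) = X_2 - \sin(10 X_1)/4 - 0.5$. It is an illustration only: the function names are assumptions, and neither the CVT generation nor the SVM training is reproduced.

```python
import math
import random

def g(x1, x2):
    """Limit state (5.1): failure when g <= 0."""
    return x2 - math.sin(10.0 * x1) / 4.0 - 0.5

def lhs(n, dim, rng):
    """Latin hypercube on [0, 1]^dim: exactly one point per stratum in each dimension."""
    columns = []
    for _ in range(dim):
        strata = list(range(n))
        rng.shuffle(strata)
        columns.append([(s + rng.random()) / n for s in strata])
    return [tuple(col[i] for col in columns) for i in range(n)]

def min_distance(points):
    """Smallest pairwise Euclidean distance of a design."""
    return min(math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:])

def best_lhs(n, dim, rng, candidates=1000):
    """Best LHS out of `candidates` draws, in the maximum minimum distance sense."""
    return max((lhs(n, dim, rng) for _ in range(candidates)), key=min_distance)

rng = random.Random(0)
design = best_lhs(10, 2, rng, candidates=200)
labels = [g(x1, x2) <= 0.0 for x1, x2 in design]   # class labels for a classifier
```

The labelled design plays the role of the training set Tr; any classifier trained on it can then be scored with the 0-1 loss on a large independent test sample.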
DOE vs Adaptive Sampling
Using the same limit state function (5.1), a comparison is performed between DOE and adaptive sampling.

Exercise 5.2 (Active Learning Generation) Based on an initial size 10 CVT DOE training sample Xtr, an initial SVM is built. An additional na adaptive instances (generalized max-min) are added sequentially, as discussed in Algorithm 4.1. The SVM risk (0-1 loss) is estimated (generalization error) based on a size 10^6 testing set Te.

Results from Exercise 5.1 with n equal to 30, 40, and 50 are compared to results from Exercise 5.2 with na equal to 20, 30, and 40 (so that the size of the final training set Tr is the same). Table 5.2 reports the average risk (in %) over 200 repetitions and the coefficient of variation (in parentheses, in %). These results highlight the advantage of using active learning. Although these results are for a single example, the point of this section is merely to illustrate a rather intuitive fact.

Table 5.2: Comparison of SVM risks trained on either DOE only or DOE followed by adaptive sampling (same final sample size) for (5.1).

n / 10 + na    30            40            50
Exercise 5.1   9.74 (18.94)  7.29 (21.24)  6.08 (20.19)
Exercise 5.2   8.28 (9.06)   6.69 (9.18)   5.38 (16.44)

Regression vs Classification

In Section 3.1.2, the importance of selecting regression or classification was discussed. Using the limit state (5.1), a comparison between regression (using GP) and classification (using SVM) is performed. Exercise 5.1 is repeated 200 times for n equal to 10, 50, and 100 using either SVM or GP. Table 5.3 reports the average risk (in %) over 200 repetitions and the coefficient of variation (in parentheses, in %). These results highlight the advantage of using regression over classification techniques (when available). Note that this comparison is about regression vs classification and not GP vs SVM. Equivalent results would be observed for SVM vs SVR or regression GP vs classification GP.

Table 5.3: Comparison of SVM and GP risks using different CVT DOE sizes for (5.1).

n      10             50            100
SVM    16.42 (15.78)  6.08 (20.86)  3.82 (18.31)
GP     13.83 (17.60)  0.08 (42.38)  0.00 (247.83)
Active Supervised Learning
In Section 3.4, some active learning strategies have been discussed. Previous works have compared such schemes, for instance AK-MCS–EGRA (Echard et al., 2011) and SUR–AK-MCS–EGRA (Bect et al., 2012). In order to assess the overall behavior of the generalized max-min, two academic setups are used to compare the proposed approach to existing techniques. A third example is used to highlight the advantage of parallel update.
Four Branch Problem
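This first problem is defined as:

$$g(\mathbf{x}) = \min_i g_i(\mathbf{x}) \le 0$$

$$g_1(\mathbf{x}) = 3 + 0.1\,(x_1 - x_2)^2 - \frac{x_1 + x_2}{\sqrt{2}}$$

$$g_2(\mathbf{x}) = 3 + 0.1\,(x_1 - x_2)^2 + \frac{x_1 + x_2}{\sqrt{2}}$$

$$g_3(\mathbf{x}) = (x_1 - x_2) + \frac{k}{\sqrt{2}}$$

$$g_4(\mathbf{x}) = (x_2 - x_1) + \frac{k}{\sqrt{2}}$$

$$X_1, X_2 \sim N(0, 1^2)$$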
Although this problem could be seen as a series system reliability setup, it is treated in this case as a component reliability setup. Four adaptive sampling schemes are compared (classic and generalized max-min, EGRA, and AK-MCS) over a hundred iterations (one instance per iteration). The experiment is repeated ten times, due to computational restrictions. An initial size n = 10 CVT DOE (different for each repetition) and GPs are used. Figure 5.1 shows plots of the relative error of each scheme on the estimated probability of failure. These plots show rather similar convergence properties for each scheme.

[Figure 5.1: Comparison of four adaptive sampling schemes for the four branch problem (Section 5.2.1) over 10 repetitions. Plots show the relative error ε (%) in the estimated probability of failure $\hat{P}_f$. Panels: (a) Classic max-min; (b) EGRA; (c) Generalized max-min; (d) AK-MCS.]
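For reference, the four branch limit state and a crude Monte Carlo estimate of its probability of failure can be sketched as follows. The constant $k$ is not given a value in this section, so `k=6.0` below is only a placeholder assumption; the function names and the sample size are likewise choices made for this illustration.

```python
import math
import random

def four_branch(x1, x2, k):
    """g(x) = min_i g_i(x) for the four branch problem; failure when g <= 0."""
    s = (x1 + x2) / math.sqrt(2.0)
    q = 3.0 + 0.1 * (x1 - x2) ** 2
    return min(q - s,                              # g1
               q + s,                              # g2
               (x1 - x2) + k / math.sqrt(2.0),     # g3
               (x2 - x1) + k / math.sqrt(2.0))     # g4

def cmc_pf(k, n, rng):
    """Crude Monte Carlo estimate of P[g <= 0] and its coefficient of variation."""
    fails = sum(four_branch(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), k) <= 0.0
                for _ in range(n))
    pf = fails / n
    cov = math.sqrt((1.0 - pf) / (n * pf)) if fails else float("inf")
    return pf, cov

pf, cov = cmc_pf(k=6.0, n=200_000, rng=random.Random(0))   # k = 6 is an assumed value
print(f"Pf ~ {pf:.2e} (c.o.v. {100 * cov:.1f}%)")
```

Such a direct estimate on the actual limit state is what the relative error ε of the adaptive schemes is measured against.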
10D Limit State
This second problem is defined as (Engelund and Rackwitz, 1993):

$$g(\mathbf{x}) = \sum_{i=1}^{10} \ln \Phi(x_i) + 10, \qquad X_i \sim N(0, 1^2), \quad i = 1, \dots, 10 \tag{5.3}$$
This problem features a highly non-linear limit state function in 10D. Four adaptive sampling schemes are compared (classic and generalized max-min, EGRA, and AK-MCS) over a hundred iterations. The experiment is repeated ten times, due to computational restrictions. An initial size n = 50 CVT DOE (different for each repetition) and GPs are used. Figure 5.2 shows plots of the relative error of each scheme on the estimated probability of failure. These plots clearly highlight the advantage of accounting for the distribution of the inputs X. Both generalized max-min and AK-MCS show superior convergence over classic max-min and EGRA. The latter schemes would eventually converge, given enough time. It should be noted that the convergence of the classic max-min is not representative of EDSD (anti-locking sample missing). As mentioned in Section 3.4, this feature should be considered when one wishes to train a meta-model specifically tailored to a distribution fX.
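A reference value for this type of limit state can be obtained without sampling. Assuming the limit state reads $g(\mathbf{x}) = \sum_{i=1}^{10} \ln \Phi(x_i) + c$ with independent standard normal inputs and failure defined as $g \le 0$ (the convention used throughout), each $-\ln \Phi(X_i)$ is a unit exponential variable, so the sum follows a Gamma distribution with integer shape 10. The sketch below compares this closed form with a crude Monte Carlo estimate; the function names are assumptions introduced here.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist().cdf   # standard normal CDF

def g10(x, c=10.0):
    """Limit state sum_i ln(Phi(x_i)) + c; failure when g <= 0."""
    return sum(math.log(PHI(xi)) for xi in x) + c

def exact_pf(c, dim=10):
    """P[g <= 0] = P[Gamma(dim, 1) >= c] = exp(-c) * sum_{k < dim} c^k / k!."""
    return math.exp(-c) * sum(c ** k / math.factorial(k) for k in range(dim))

def cmc_pf(c, n, rng, dim=10):
    """Crude Monte Carlo estimate of P[g <= 0]."""
    fails = sum(g10([rng.gauss(0.0, 1.0) for _ in range(dim)], c) <= 0.0
                for _ in range(n))
    return fails / n

rng = random.Random(0)
print(exact_pf(10.0), cmc_pf(10.0, 20_000, rng))
```

The same helper applies to any other value of the additive constant by changing `c`.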
Parallel Update
In Section 4.1, it was stated that the generalized max-min can be easily parallelized. This is demonstrated using the following example (Lacaze et al., 2015a):
$$g(\mathbf{x}) = \sum_{i=1}^{10} \ln \Phi(x_i) + 12, \qquad X_i \sim N(0, 1^2), \quad i = 1, \dots, 10 \tag{5.4}$$
In this example, SVMs are used as meta-models. Three adaptive sampling schemes are compared, following Algorithm 4.1, with a variation in Line 4:

• Use one max-min instance with a spherical constraint:

$$\max_{\mathbf{x}} \min_i \left\| \mathbf{x} - \mathbf{x}_{tr}^{(i)} \right\| \quad \text{s.t.} \quad M_f(\mathbf{x} \mid \mathrm{Tr}) = 0, \quad \|\mathbf{x}\| \le 8$$

• Use one generalized max-min instance (original Line 4);
• Use ten generalized max-min instances per iteration.

Figure 5.3 shows a comparison of the convergence of the three adaptive sampling schemes. It highlights the advantage of using the generalized max-min over the classic max-min. In addition, the parallel scheme (which adds ten instances in parallel) exhibits a faster convergence. Note however that the classic max-min convergence is not representative of the EDSD convergence, as the latter would involve anti-locking instances.

[Figure 5.2: Comparison of four adaptive sampling schemes for the 10D limit state problem (Section 5.2.2) over 10 repetitions. Plots show the relative error ε (%) in the estimated probability of failure $\hat{P}_f$. Panels: (a) Classic max-min; (b) EGRA; (c) Generalized max-min; (d) AK-MCS.]

[Figure 5.3: Comparison of estimated probability of failure convergences using different adaptive sampling schemes (Section 5.2.3). Curves: classic max-min; generalized max-min (series); generalized max-min (parallel); true Pf.]
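The classic max-min instance with a spherical constraint looks for a point on the meta-model boundary, inside the ball of radius 8, that is as far as possible from its nearest training point. The sketch below illustrates this formulation with a plain random search in 2D instead of a constrained optimizer, which is a simplification made here; the linear `meta_model`, the tolerance on the boundary and the function names are all hypothetical stand-ins, not the SVM or the algorithm used in this work.

```python
import math
import random

def meta_model(x):
    """Hypothetical stand-in for the meta-model value; its zero level set is the boundary."""
    return x[0] + x[1] - 3.0

def classic_max_min(model, training, dim, rng, radius=8.0, n_candidates=20_000, tol=0.05):
    """Approximate max_x min_i ||x - x_tr(i)|| s.t. model(x) = 0 and ||x|| <= radius."""
    best, best_dist = None, -1.0
    for _ in range(n_candidates):
        x = [rng.uniform(-radius, radius) for _ in range(dim)]
        if math.hypot(*x) > radius or abs(model(x)) > tol:
            continue                               # outside the ball or off the boundary
        dist = min(math.dist(x, t) for t in training)
        if dist > best_dist:
            best, best_dist = x, dist
    return best, best_dist

training = [(0.0, 0.0), (1.0, 2.0), (-2.0, 3.0)]
x_new, d_new = classic_max_min(meta_model, training, dim=2, rng=random.Random(0))
```

Uniform rejection sampling of this kind does not scale to the 10D example; it only serves to show which quantity is maximized and under which constraints.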
Model Update
This section presents an application of the fidelity map (FM) approach (Section 4.2) for material properties identification using modal data (Lacaze and Missoum, 2014c). Other FM applications are discussed in Lacaze and Missoum (2012) for a piano soundboard material properties identification and in Lacaze et al. (2014) for a general approach to couple FM and random fields. Traditional quantities used in model update using modal properties are:

• Differences in natural frequency values λi (e.g., Euclidean norm of the difference): this quantity is traditionally minimized in the form of a residual through optimization;
• Differences between the mode shapes: this is typically measured using the modal assurance criterion (MAC) matrix (Allemang, 2002; Marwala, 2010);
• Differences between the frequency response functions (FRF), measured using the frequency response assurance criterion (FRAC);
• Mode orthogonality.

The MAC criterion (5.6) is by far the most widely used:

$$M_{ij} = \frac{\left(\Phi_i^{*T} A\, \Phi_{exp,j}\right)^2}{\left(\Phi_i^{*T} A\, \Phi_i\right)\left(\Phi_{exp,j}^{*T} A\, \Phi_{exp,j}\right)} \tag{5.6}$$

where $\Phi_i$ is the $i$th computational mode shape (not to be confused with the standard normal CDF Φ) and $\Phi_i^{*T}$ is the conjugate transpose of the mode shape. A is often the identity matrix or the mass matrix. The MAC value is equal to unity for a perfect match of modes. It should be as close to zero as possible for cross terms (mode orthogonality).

Consider a simply supported rectangular plate modeled using finite elements. In order to model uncertainty in the displacement boundary conditions, one-dimensional springs of stiffness K in the out-of-plane direction are used on three sides of the plate (Figure 5.4). The finite element model of the plate is constructed with 80 shell elements. The Young's modulus E of the plate is to be identified based on the first nm = 4 modes, for a total of ny = 14 responses (4 natural frequencies and 10 MAC matrix terms). The parameters are summarized in Table 5.4. In the absence of an actual experimental setup, "virtual experiments" are used by running the FE model with E = Eact and K = Kact. The methodology is repeated for 6 combinations of Eact and Kact (see Tables 5.5 and 5.6). All the configurations are run with ε = 1% for all the responses.
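The MAC matrix can be computed directly from two sets of mode shapes. The sketch below is illustrative only: mode shapes are plain Python lists, `A=None` stands for the identity matrix, and the numerator is taken as the squared modulus of the weighted inner product (identical to the plain square for real mode shapes), which is an assumption made here to cover complex modes.

```python
def weighted_inner(u, v, A=None):
    """u^{*T} A v for vectors given as lists (A = None means the identity matrix)."""
    if A is not None:
        v = [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in A]
    return sum(a.conjugate() * b for a, b in zip(u, v))

def mac(phi, phi_exp, A=None):
    """Modal assurance criterion between one computed and one experimental mode shape."""
    num = abs(weighted_inner(phi, phi_exp, A)) ** 2
    den = weighted_inner(phi, phi, A) * weighted_inner(phi_exp, phi_exp, A)
    return num / den.real   # real denominator for Hermitian positive definite A

def mac_matrix(modes, modes_exp, A=None):
    """M[i][j] = MAC between computed mode i and experimental mode j."""
    return [[mac(p, q, A) for q in modes_exp] for p in modes]

modes = [[1.0, 0.0, -1.0], [1.0, 2.0, 1.0]]          # made-up example shapes
modes_exp = [[-2.0, 0.0, 2.0], [1.0, 2.0, 1.0]]      # first mode rescaled and flipped
M = mac_matrix(modes, modes_exp)
```

Because the criterion is normalized, rescaling or flipping the sign of a mode shape leaves the corresponding MAC value unchanged.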
[Figure 5.4: Schematic and finite element representation of a simple plate. One side is simply supported while the others are connected to the ground through springs, to model uncertainties on the boundary conditions. Panels: (a) Schematic representation; (b) FEM representation.]

Table 5.4: Parameters used in Section 5.3 (S.I. units).

Type            Param.   Val./Dist.
Deterministic   a        1
Deterministic   b        1.5
Deterministic   ν        0.33
Deterministic   ρ        7800
Deterministic   t        0.01
Estimate        E        N/A
Aleatory        K        U(2 × 10^5, 10^6)

The FMs are constructed in the (E, K) space, using a unique SVM per FM, with an initial size n = 15 CVT DOE. Each FM boundary is refined (Section 4.3.1) with na = 50 adaptive instances. As discussed in Section 4.2.3, the likelihood value $L(E \mid Y_{exp})$ is approximated as the probability $P[(E, K) \in \Omega_{FM}]$, which is estimated using a size n = 10^5 CMC estimate (referred to as LHmcs) according to the distribution of K (Table 5.4). The proposed approach is compared to the results using the likelihood of the residual (LHres) and the product of the individual likelihoods for the different responses LHprod (Lacaze and Missoum, 2014c). These likelihoods are constructed using Kernel Smoothing (Bowman and Azzalini, 2004) and GP meta-models (Sacks et al., 1989; Forrester and Keane, 2009; Basudhar et al., 2012) trained on a size n = 65 CVT DOE for each of the ny = 14 responses. LHres uses a residual defined as follows:

$$R = \sum_{i=1}^{n_m} \left[ \frac{\left(\lambda_i - \lambda_{i,exp}\right)^2}{\lambda_{i,exp}} + \left(M_{ii} - 1\right)^2 + \sum_{j=1, j \ne i}^{n_m} M_{ij} \right]$$

The graphical results for the 6 configurations are depicted in Figures 5.5 and 5.6. Graphical inspection of the likelihoods shows that LHmcs exhibits a higher robustness than the two other methods for this example. The failure of LHprod is natural: the different natural frequencies λi are strongly correlated, so the assumption of independence leads to incorrect results. On the other hand, the inaccuracy of LHres is not straightforward to explain. A loose explanation stems from the gathering, within one quantity, of several responses that are correlated with different spreads (similar to conclusions drawn in Gogu et al. (2010)). In the case where MLE is chosen for estimation, the results for the six cases are summarized in Tables 5.5 and 5.6. As can be seen, the methodology is robust for this example.

Table 5.5: Summary of the 6 experimental combinations and corresponding figures (S.I. units) for Section 5.3. Part 1.

Eact          185 × 10^9       185 × 10^9       185 × 10^9
Kact          3 × 10^5         6 × 10^5         9 × 10^5
Figures       5.5(a) & 5.5(b)  5.5(c) & 5.5(d)  5.5(e) & 5.5(f)
EMLE (Pa)     184.6 × 10^9     182.3 × 10^9     185.8 × 10^9
Error (%)     0.22             1.46             0.49
EBayes (Pa)   184.3 × 10^9     185.02 × 10^9    186.7 × 10^9
Error (%)     0.37             0.008            0.87

Table 5.6: Summary of the 6 experimental combinations and corresponding figures (S.I. units) for Section 5.3. Part 2.

Eact              235 × 10^9       235 × 10^9       235 × 10^9
Kact              3 × 10^5         6 × 10^5         9 × 10^5
Figures           5.6(a) & 5.6(b)  5.6(c) & 5.6(d)  5.6(e) & 5.6(f)
Eest MLE (Pa)     232.4 × 10^9     237.5 × 10^9     235.4 × 10^9
Error (%)         1.11             1.06             0.17
Eest Bayes (Pa)   233.9 × 10^9     233.7 × 10^9     235.5 × 10^9
Error (%)         0.44             0.54             0.19

In the case of Bayesian update, a wide prior distribution, reflecting a significant lack of knowledge, was chosen. The prior distribution is set with a mean value of 210 GPa and a standard deviation of 21 GPa. Figure 5.7(a) depicts the likelihood function, the prior distribution, and the actual value for the first case (Eact = 185 × 10^9 Pa and Kact = 3 × 10^5 N·m⁻¹). Figure 5.7(b) shows the corresponding posterior distribution (Remark 2.9 and Section 4.2.4). The Bayes estimators (expectation of the posterior PDF) for the 6 cases are provided
in Tables 5.5 and 5.6. In order to gauge the benefits of the Bayesian update, the posterior distribution was propagated to the first natural frequency (Figure 5.8(c)). For comparison, the prior distribution was also propagated (Figure 5.8(a)). In addition, the ideal response distribution was computed (Figure 5.8(b)) using the actual value of the Young's modulus (Eact), unknown in practice, along with the propagation of the aleatory variables (i.e., K). The figure clearly shows that the Bayesian update leads to better prediction.

[Figure 5.5: Graphical results of Section 5.3, showing the fidelity maps (SVM boundary, positive and negative samples in the (E (Pa), K (N/m)) space) and the estimated likelihoods (LHmcs, LHres, LHprod and Eact versus E (Pa)), for Eact = 185 × 10^9 Pa and Kact = 3 × 10^5 N·m⁻¹ (a and b), Kact = 6 × 10^5 N·m⁻¹ (c and d), Kact = 9 × 10^5 N·m⁻¹ (e and f).]

[Figure 5.6: Graphical results of Section 5.3, showing the fidelity maps and the estimated likelihoods, for Eact = 235 × 10^9 Pa and Kact = 3 × 10^5 N·m⁻¹ (a and b), Kact = 6 × 10^5 N·m⁻¹ (c and d), Kact = 9 × 10^5 N·m⁻¹ (e and f).]

[Figure 5.7: Bayesian update applied to the first case (Eact = 185 × 10^9 Pa and Kact = 3 × 10^5 N·m⁻¹) of the plate example. Panels: (a) Likelihood, prior knowledge, and actual parameter value Eact; (b) Posterior through MCMC.]
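The chain from fidelity map to Bayes estimator can be sketched in a few lines. Everything specific below is a labeled assumption: `in_fidelity_map` is a made-up indicator standing in for the trained SVM, the posterior is normalized on a grid of E values instead of being sampled by MCMC, and the CMC sample size is reduced to keep the run short. Only the uniform distribution of K and the Gaussian prior (mean 210 GPa, standard deviation 21 GPa) come from the text.

```python
import math
import random

def in_fidelity_map(E, K):
    """Hypothetical indicator of (E, K) in Omega_FM (stand-in for the SVM fidelity map)."""
    return abs(E - 185e9 - 5e3 * (K - 6e5)) <= 4e9

def likelihood(E, rng, n=2000):
    """CMC estimate of P[(E, K) in Omega_FM] with K ~ U(2e5, 1e6)."""
    return sum(in_fidelity_map(E, rng.uniform(2e5, 1e6)) for _ in range(n)) / n

def posterior_on_grid(grid, rng, prior_mean=210e9, prior_std=21e9):
    """Normalized posterior weights, proportional to likelihood times Gaussian prior."""
    weights = [likelihood(E, rng) * math.exp(-0.5 * ((E - prior_mean) / prior_std) ** 2)
               for E in grid]
    total = sum(weights)
    return [w / total for w in weights]

rng = random.Random(0)
grid = [120e9 + 1e9 * i for i in range(181)]         # 120 GPa to 300 GPa
post = posterior_on_grid(grid, rng)
e_bayes = sum(E * w for E, w in zip(grid, post))     # Bayes estimator: posterior mean
e_mle = max(grid, key=lambda E: likelihood(E, rng))  # grid-based maximum likelihood
```

With a likelihood concentrated far below the prior mean, the posterior mean is pulled only slightly toward the prior, which is the behavior expected from a wide prior.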
Design under Uncertainties
This section presents applications of the algorithms discussed in Section 4.3. First, reliability assessment examples are introduced for correlated cases. These demonstrate the ability of the generalized max-min to handle different dependence structures, including copulas (Section). An RBDO application is then presented to showcase the derived probability of failure sensitivity estimates with respect to deterministic variables (Section), followed by a real-life application of a car side impact crash-worthiness analysis.
[Figure 5.8: First natural frequency λ1 (Hz) distributions for the first case (Eact = 185 × 10^9 Pa and Kact = 3 × 10^5 N·m⁻¹) of the plate example. Uncertainty propagated using the prior, the posterior, and the ideal (unknown) distributions. Panels: (a) Using prior; (b) Ideal distribution; (c) Using posterior.]
Reliability Assessment: Analytical Examples
The following two sections present academic analytical examples featuring complex discontinuous responses. In order to address the discontinuous behavior, SVMs are used.
Reliability Assessment: 2D Correlated Gaussian
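The first example features a complex 2D discontinuous limit state function defined as:

$$g(\mathbf{x}) = \frac{6\,\mathrm{sgn}(x_1)}{x_1} - x_2\,\mathrm{sgn}(x_2) \le 0$$

where:

$$\mathbf{X} \sim N_2\left(\boldsymbol{\mu}_X, \Sigma_X\right) \tag{5.8}$$

$$\boldsymbol{\mu}_X = [0 \;\; 0] \tag{5.9}$$

$$\Sigma_X = \begin{bmatrix} 1 & 0.7 \\ 0.7 & 1 \end{bmatrix} \tag{5.10}$$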
Figure 5.9(a) provides an overview of the stochastic space along with the actual limit state hyper-surface g (x) = 0. Two adaptive sampling schemes are compared, following Algorithm 4.1, with either a classic or generalized max-min update. As discussed in Section 4.3.3, bootstrapped SVMs are built to compute CIs and convergence metrics (bootstrapped SVMs prediction coefficient of variation). Figure 5.10(a) shows the convergence of the estimated probability of failure for the two different schemes. In addition, Tables 5.7 and 5.8 report values of the estimated probabilities of failure, 95% CIs and coefficients of variation at iterations 30, 60 and 100. As a reference, the CI and the coefficient of variation of the CMC estimate on the actual limit state are provided. It can clearly be seen that using the generalized "max-min" substantially improves the convergence of the algorithm as well as the coefficient of variation. A coefficient of variation below 10% is first achieved at iteration 28 and below 5% at iteration 45. The relative errors with respect to the estimated probability of failure at these iterations are 1.1% and 0.9% respectively.
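The reference CMC estimate on the actual limit state can be sketched as follows, assuming the 2D limit state reads $g(\mathbf{x}) = 6\,\mathrm{sgn}(x_1)/x_1 - x_2\,\mathrm{sgn}(x_2)$ (a reading of the printed formula, stated here as an assumption) with a bivariate normal input of correlation 0.7. Correlated samples are obtained from independent standard normals through the Cholesky factor of the covariance matrix; the function names and sample size are choices made for this illustration.

```python
import math
import random

def sgn(v):
    """Sign function returning -1, 0 or 1."""
    return (v > 0) - (v < 0)

def g2d(x1, x2):
    """Assumed 2D limit state 6 sgn(x1) / x1 - x2 sgn(x2); failure when g <= 0."""
    if x1 == 0.0:
        return math.inf                       # 6 / |x1| grows without bound
    return 6.0 * sgn(x1) / x1 - x2 * sgn(x2)

def sample_correlated(rho, rng):
    """Bivariate standard normal with correlation rho (Cholesky of [[1, rho], [rho, 1]])."""
    u1, u2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return u1, rho * u1 + math.sqrt(1.0 - rho ** 2) * u2

def cmc_pf(n, rng, rho=0.7):
    """Crude Monte Carlo estimate of P[g <= 0]."""
    return sum(g2d(*sample_correlated(rho, rng)) <= 0.0 for _ in range(n)) / n

pf = cmc_pf(200_000, random.Random(0))
print(f"Pf ~ {pf:.2e}")
```

Under this reading, failure corresponds to $|x_1 x_2| \ge 6$, so the positive correlation between the inputs directly inflates the probability of failure compared to the independent case.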
Reliability Assessment: 3D Correlated Gaussian
The second example features a complex 3D limit state function defined as: 3sgn(x2 ) 3(x3 −sgn(x1 )) + x2 x1 ≤0 (5.11) g (x) = − sgn(x3 ) where: X ∼ N3 (µX , ΣX ) µX = [0 0 0] 1 0.7 0.7 ΣX = 0.7 1 0.7 0.7 0.7 1
(5.12) (5.13) (5.14)
Figure 5.9(b) provides an overview of the stochastic space along with the actual limit state hyper-surface. Figure 5.10(b) shows the convergence of the estimated probability of failure using the two different sampling schemes discussed in Section, as well as the corresponding 95% CIs (Section). In addition, Tables 5.9 and 5.10 report values of the estimated probabilities of failure, 95% CIs, and coefficients of variation at iterations 30, 60 and 100. The reference values of Pf (CMC estimate on the actual limit state) as well as the corresponding CI are also provided. Similar conclusions can be drawn from Figure 5.10(b) and Tables 5.9 and 5.10. When the generalized max-min scheme is used, a coefficient of variation below 10% is first achieved at iteration 35 and below 5% at iteration 52. The relative errors with respect to the estimated probability of failure at these iterations are 1.9% and 0.9%.
Reliability Assessment: Cantilever Beam
In this example, a cantilever beam (Figure 5.11) is used. The Young's modulus E is equal to 210 GPa and the length L is 200 mm. The random variables of the problem are the dimensions b and h of the rectangular cross section as well as the tip load P. The limit state function is based on the tip deflection such that:

$$g(b, h, P) = 1.5 - \frac{P L^3}{3 E I} \le 0$$

where I is the second moment of area, $I = \frac{b h^3}{12}$. The marginal distributions of b, h and P are listed in Table 5.11. The dependence structure used for this problem is a Gaussian copula (Section) with:

$$R = \begin{bmatrix} 1 & 0.7 & 0.7 \\ 0.7 & 1 & 0.7 \\ 0.7 & 0.7 & 1 \end{bmatrix}$$

Scatter plots of the projected joint PDF are provided in Figure 5.12. Figure 5.13 shows the convergence of the estimated probability of failure for the two different schemes (Section). In addition, Tables 5.12 and 5.13 report values of the estimated probability of failure, CIs and coefficient of variation at iterations 30, 60 and 100. Once again, a faster convergence is observed when the generalized max-min sample is used. A coefficient of variation below 10% is first achieved at iteration 34 and below 5% at iteration 39. The relative errors on the estimated probability of failure at these iterations are 4.1% and 5.3% respectively.
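The tip-deflection limit state is straightforward to evaluate once units are made consistent. The sketch below assumes millimetres and newtons throughout (so E = 210 GPa becomes 210 000 N/mm²), interprets the threshold 1.5 as a deflection in mm, and uses the standard second moment of area of a rectangular section, I = b h³ / 12. These unit choices, the function name and the sample values passed in the usage line are assumptions, and the copula-based sampling of (b, h, P) is not reproduced.

```python
E_MODULUS = 210_000.0   # Young's modulus in N/mm^2 (210 GPa)
LENGTH = 200.0          # beam length in mm

def cantilever_limit_state(b, h, P, threshold=1.5):
    """g(b, h, P) = threshold - P L^3 / (3 E I); failure when g <= 0 (mm, N)."""
    inertia = b * h ** 3 / 12.0                       # second moment of area, mm^4
    deflection = P * LENGTH ** 3 / (3.0 * E_MODULUS * inertia)
    return threshold - deflection

# Illustrative evaluation at made-up values of b (mm), h (mm) and P (N).
g_value = cantilever_limit_state(b=7.76, h=17.5, P=100.0)
```

Because the deflection scales with 1/h³, the limit state is far more sensitive to the section height than to its width or to the load.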
[Figure 5.9: Contours of the probability density function and the limit state hyper-surface for the analytical examples. Panels: (a) 2D correlated Gaussian example; (b) 3D correlated Gaussian example.]

[Figure 5.10: Convergence of the estimated probability of failure and the corresponding 95% confidence intervals using the classic (red) or the generalized (blue) max-min adaptive sampling schemes for the analytical examples, plotted against the number of iterations. The reference value $\hat{P}_f^{N_{MC}}$ is based on the actual limit state function. Panels: (a) 2D correlated Gaussian example; (b) 3D correlated Gaussian example.]
Table 5.7: Estimated probability of failure and its 95% CI at iterations 30, 60 and 100 using the classic max-min scheme for the 2D correlated Gaussian example.

                   Classic                                   CMC*
Iteration          30            60           100            n = 10^6
Pf (×10^-3)        6.2           6.8          7.1            7.2
95% CI (×10^-3)    [2.9, 11.2]   [4.4, 9.2]   [5.5, 8.8]     [7.0, 7.4]
cv BS (%)          31.9          16.8         11.5           1.2
ε (%)              14.5          5.07         2.04

Table 5.8: Estimated probability of failure and its 95% CI at iterations 30, 60 and 100 using the generalized max-min scheme for the 2D correlated Gaussian example.

                   Generalized                               CMC*
Iteration          30            60           100            n = 10^6
Pf (×10^-3)        7.2           7.2          7.2            7.2
95% CI (×10^-3)    [6.4, 8.7]    [7.0, 7.9]   [7.0, 7.6]     [7.0, 7.4]
cv BS (%)          8.5           3.4          1.8            1.2
ε (%)              0.20          0.27         0.52

Table 5.9: Estimated probability of failure and its 95% CI at iterations 30, 60 and 100 using the classic max-min scheme for the 3D correlated Gaussian example.

                   Classic                                   CMC*
Iteration          30            60           100            n = 10^6
Pf (×10^-3)        0.4           1.0          8.7            5.3
95% CI (×10^-3)    [0.2, 1.4]    [0.4, 7.4]   [1.9, 11.56]   [5.3, 5.6]
cv BS (%)          103.5         107.1        349.3          1.3
ε (%)              92.5          81.4         55.3

Table 5.10: Estimated probability of failure and its 95% CI at iterations 30, 60 and 100 using the generalized max-min scheme for the 3D correlated Gaussian example.

                   Generalized                               CMC*
Iteration          30            60           100            n = 10^6
Pf (×10^-3)        5.9           5.6          5.2            5.3
95% CI (×10^-3)    [4.5, 7.1]    [5.1, 6.3]   [5.2, 5.7]     [5.3, 5.6]
cv BS (%)          10.5          5.4          2.7            1.3
ε (%)              7.02          2.55         0.84

* Obtained using 500 repetitions.
[Figure 5.11: Description of the cantilever beam for Section 5.4.2 (length L, tip load P).]

Table 5.11: Marginal distributions of the parameters involved in Section 5.4.2.

Parameter   Distribution
b (mm)      LN (2.0491, 0.2462)
h (mm)      Γ (53, 0.33)
P (N)       W (100, 10)

[Figure 5.12: Scatter plots of the joint distribution for Section 5.4.2.]

[Figure 5.13: Convergence of the estimated probability of failure and the corresponding 95% confidence intervals using the classic (red) or the generalized (blue) max-min adaptive sampling schemes for the cantilever beam example. The reference value $\hat{P}_f^{N_{MC}}$ is based on the actual limit state function.]

Table 5.12: Estimated probability of failure and its 95% CI at iterations 30, 60 and 100 using the classic max-min scheme for Section 5.4.2.

                   Classic                                   CMC*
Iteration          30            60           100            n = 10^6
Pf (×10^-3)        1.8           3.6          3.2            3.1
95% CI (×10^-3)    [0.1, 26.6]   [1.8, 8.6]   [2.7, 3.7]     [3.0, 3.2]
cv BS (%)          428.9         44.9         11.6           1.8
ε (%)              41.8          16.1         3.50

Table 5.13: Estimated probability of failure and its 95% CI at iterations 30, 60 and 100 using the generalized max-min scheme for Section 5.4.2.

                   Generalized                               CMC*
Iteration          30            60           100            n = 10^6
Pf (×10^-3)        2.9           3.0          3.1            3.1
95% CI (×10^-3)    [2.5, 4.5]    [2.9, 3.1]   [3.0, 3.1]     [3.0, 3.2]
cv BS (%)          21.7          1.9          1.1            1.8
ε (%)              4.47          1.59         0.19

* Obtained using 500 repetitions.
[Figure 5.14: Overview of RBDO academic examples, showing the objective function C, the constraints and the optimum. Panels: (a) non-linear limit state example; (b) three probabilistic constraints example.]
RBDO: Academic Examples
RBDO: Non Linear Limit State
This test case is taken from Aoues and Chateauneuf (2010). The problem, as shown in Figure 5.14(a), is defined as (Lacaze et al., 2015b): min z12 + z22
s.t. P [z1 z2 X2 − ln(X1 ) ≤ 0] ≤ 10−2 0 ≤ z ≤ 15 X1 ∼ N (5, 0.32 ) X2 ∼ N (3, 0.32 ) The actual optimum for this problem is found at z? = [1.36 1.36]. An initial size n = 40 CVT DOE is used. z0 = [12 12] is used as a starting point to be consistent. GPs are used. Figures 5.15(a), 5.16(a), and 5.17(a) show the evolution of the current optimum along with the convergence metric presented in Section (ρH hard convergence; ρK order of magnitude between estimated probability of failure bounds). Convergence is achieved in 16 iterations (56 function calls). For reference, Aoues and Chateauneuf (2010) reported 39 function calls using SLA (Section 2.5.3). 172
RBDO: Three Probabilistic Constraints
The second test case is taken from Aoues and Chateauneuf (2010) for β = 4. The problem, as shown in Figure 5.14(b), is defined as (Lacaze et al., 2015b):

$$
\begin{aligned}
\min_{\theta} \quad & \theta_1 + \theta_2 \\
\text{s.t.} \quad & P\left[g_i(X) \le 0\right] \le 3.17 \times 10^{-5}, \quad i = [1, 2, 3] \\
& 0 \le \theta \le 10 \\
& X_1 \sim N(\theta_1, 0.3^2), \quad X_2 \sim N(\theta_2, 0.3^2)
\end{aligned}
$$

with:

$$
\begin{aligned}
g_1(X) &= \frac{X_1^2 X_2}{20} - 1 \\
g_2(X) &= \frac{(X_1 - X_2 - 5)^2}{30} + \frac{(X_1 - X_2 - 12)^2}{120} - 1 \\
g_3(X) &= \frac{80}{X_1^2 + 8 X_2 + 5} - 1
\end{aligned}
$$

The actual optimum for this problem is found at $\theta^\star = [3.62\ 3.65]$. An initial CVT DOE of size n = 10 is used. For consistency, $\theta^0 = [5\ 5]$ is used as a starting point. GPs are used. Figures 5.15(b), 5.16(b), and 5.17(b) show the evolution of the current optimum along with the convergence metric presented in Section ($\rho_H$: hard convergence; $\rho_K$: order of magnitude between estimated probability of failure bounds). Convergence is achieved in 4 iterations (42 function calls). For reference, Aoues and Chateauneuf (2010) reported 81 function calls using SLA (Section 2.5.3).
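The target probability 3.17 × 10⁻⁵ used in this example matches β = 4 through the usual relation $P_f = \Phi(-\beta)$ between a reliability index and a probability of failure. The short sketch below checks that conversion with the standard library only; the function name `pf_from_beta` is hypothetical.

```python
import math

def pf_from_beta(beta):
    """Return Phi(-beta), with Phi the standard normal CDF.

    Uses the identity Phi(-b) = 0.5 * erfc(b / sqrt(2)).
    """
    return 0.5 * math.erfc(beta / math.sqrt(2.0))

pf_target = pf_from_beta(4.0)
print(f"Target probability of failure for beta = 4: {pf_target:.3g}")
```

Writing the constraint as a probability rather than as a reliability index keeps the formulation independent of any FORM-type approximation, which suits a sampling-based estimate.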
RBDO: Crash-worthiness Analysis of a Car Side Impact
The final example is a car side impact crash-worthiness analysis. It was initially introduced in Gu et al. (2001). The formulation used in this work comes from Youn et al. (2004b):

$$
\begin{aligned}
\min_{\theta} \quad & W(\theta) \\
\text{s.t.} \quad & P\left[g_i(X) \le 0\right] \le 10^{-1}, && i = [1, \ldots, 10] \\
& 0.5 \le \theta \le 1.5 \\
& X_j \sim N(\theta_j, 0.03), && j = [1, \ldots, 7] \\
& X_k \sim N(0.345, 0.006^2), && k = [8, 9] \\
& X_l \sim N(0, 10^2), && l = [10, 11]
\end{aligned}
$$

Figure 5.15: Evolution of $z_{best}$ for the RBDO analytical examples. (a) Non-linear limit state example ($z_1$, $z_2$). (b) Three probabilistic constraints example ($\theta_1$, $\theta_2$).

Figure 5.16: Evolution of $\rho_H$ (in %) for the RBDO analytical examples. (a) Non-linear limit state example. (b) Three probabilistic constraints example.

Figure 5.17: Evolution of $\rho_K$ for the RBDO analytical examples. (a) Non-linear limit state example. (b) Three probabilistic constraints example.
An initial CVT DOE of size n = 70 is used. GPs are used (Lacaze et al., 2015b). Figures 5.18(a), 5.19(a), and 5.20(a) show the evolution of the current optimum along with the convergence metric presented in Section ($\rho_H$: hard convergence; $\rho_K$: order of magnitude between estimated probability of failure bounds). Convergence is achieved in 22 iterations (920 function calls). For reference, with Zou and Mahadevan's approach (Section), 5,256 and 31,550 function calls were reported using SORM and IS, respectively.

This example is repeated using SVMs (Lacaze and Missoum, 2013a). An initial CVT DOE of size n = 60 is used. Figures 5.18(b), 5.19(b), and 5.20(b) show the evolution of the current optimum along with the convergence metric presented in Section ($\rho_H$: hard convergence; $\rho_K$: order of magnitude between estimated probability of failure bounds). Convergence is achieved in 100 iterations (1600 function calls).
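The function-call totals reported for the RBDO examples are consistent with a simple accounting rule: each initial DOE point and each iteration costs one evaluation of every limit state. This rule is inferred from the reported numbers and is an assumption, not a statement made in the text; the helper `total_calls` is hypothetical.

```python
def total_calls(initial_doe, iterations, n_limit_states=1):
    """Assumed accounting: one evaluation of each limit state per DOE point
    and per adaptive iteration."""
    return (initial_doe + iterations) * n_limit_states

# (initial DOE size, iterations, number of probabilistic constraints)
cases = {
    "non-linear limit state (GP)": total_calls(40, 16, 1),
    "three probabilistic constraints (GP)": total_calls(10, 4, 3),
    "car side impact (GP)": total_calls(70, 22, 10),
    "car side impact (SVM)": total_calls(60, 100, 10),
}

for name, calls in cases.items():
    print(f"{name}: {calls} function calls")
```

Under this assumption the four cases give 56, 42, 920 and 1600 calls, which are the totals quoted in the text.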
Figure 5.18: Evolution of $z_{best}$ for the RBDO crash example ($\theta_1$ to $\theta_7$). (a) Section 5.4.4 (GP). (b) Section 5.4.4 (SVM).

Figure 5.19: Evolution of $\rho_H$ (in %) for the RBDO crash example; legend: $\rho_S$, $\rho_H$. (a) Section 5.4.4 (GP). (b) Section 5.4.4 (SVM).

Figure 5.20: Evolution of $\rho_K$ for the RBDO crash example, plotted against iterations; legend: max $\rho_{\Omega_{F_i}}$, min $\rho_{\Omega_{F_i}}$. (a) Section 5.4.4 (GP). (b) Section 5.4.4 (SVM).
Chapter 6 Conclusion

Chapter 4 introduced a novel adaptive sampling approach: the generalized max-min. This strategy builds on explicit design space decomposition (EDSD) and its central element, the max-min approach. This work generalized the max-min strategy to account in an optimal way for the input distribution $f_X$. A novel numerical implementation was proposed to improve both the accuracy and the efficiency of the max-min search. In addition, a bootstrap-based risk estimate associated with support vector machines (SVMs) was proposed.

Chapter 5 presented a series of examples and engineering applications of the proposed methodologies. It was shown that the generalized max-min strategy can perform on par with existing approaches, with the added advantage of being meta-model independent (specifically, it can be used with SVMs, as opposed to most existing adaptive sampling schemes). The advantage of parallel updates using the generalized max-min strategy was also demonstrated. An application of the fidelity map (FM) highlighted some of the advantages of the approach when a large number of responses with an unknown dependence structure are involved.

The proposed methodology was developed for any distribution $f_X$, including correlated ones. This feature was demonstrated using two academic examples with correlated Gaussians. Additionally, a beam analysis with non-Gaussian marginals linked through a copula structure showed the versatility of the proposed scheme. The proposed bootstrap-based confidence interval for SVMs was used on the aforementioned cases and proved usable as a convergence metric. This approach provides a robust estimate of the SVM risk for general purposes, albeit with a moderate increase in computational cost.

The proposed sensitivity estimates of a probability of failure with respect to deterministic variables were validated on a test case. This contribution not only offers new perspectives for RBDO but also extends most of the previously proposed gradient-based approaches, which were limited to distribution hyper-parameters as design variables. Also, the proposed RBDO approach was applied to a car side impact crash-worthiness problem, with eleven random variables and ten probabilistic constraints, to demonstrate the efficiency of the approach in moderate dimensionality.

The proposed methodologies showed promising results for computational design and decision making and addressed some of the issues of adaptive sampling strategies. Finally, most of the notions discussed in this work, such as sensitivity analysis, reliability assessment, RBDO, surrogate modeling, and adaptive sampling, were implemented in a MATLAB toolbox which will be made freely available to the public. Details about its implementation can be found in Appendix A.

Throughout most of this work, the terms "active learning" and "adaptive sampling" were used interchangeably. However, so far, the fields of active learning and adaptive sampling have evolved independently. Moving forward, a thorough exploration of active learning techniques, their link with existing adaptive sampling schemes, and possible applications to engineering setups could lead to significant progress. Additionally, several machine learning techniques that could have an impact on the engineering community have not yet been investigated. For example, in Section 3.2, Bayesian linear regression models, sparse Bayesian regression, and relevance vector machines (RVMs) were left out. A thorough comparison of SVMs, SVRs, regression and classification GPs, and regression and classification RVMs could lead to important insights in the field of surrogate modeling.
In addition, further insight into the proposed algorithms is required. For example, the FM approach still needs to be validated in high-dimensional spaces, where specific numerical arrangements are expected to be required. Also, the max-min optimization search, although substantially improved in this work, is still expected to cause problems in high dimensions when the training set Tr is already large (> 1000 samples). This point will need to be addressed for the approach to be truly applicable to full-scale engineering settings.
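To illustrate why the size of the training set matters for the max-min search, the sketch below implements the classic max-min distance criterion over a finite pool of candidate points. It is a deliberately simplified stand-in and not the implementation used in this work: it ignores the restriction of the search to the approximated boundary, and it does not include the weighting by $f_X$ that defines the generalized version. The function name and the sample points are hypothetical.

```python
import math

def max_min_candidate(training, candidates):
    """Return the candidate whose nearest training sample is farthest away.

    Each candidate requires len(training) distance evaluations, so the cost of
    one pass grows with the training set size and with the dimension.
    """
    def nearest_distance(x):
        return min(math.dist(x, t) for t in training)
    return max(candidates, key=nearest_distance)

# Four training samples on the corners of the unit square (illustrative only).
training = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
candidates = [(0.5, 0.5), (0.1, 0.1), (0.9, 0.2)]

best = max_min_candidate(training, candidates)
print("Selected sample:", best)
```

In this toy setting the center of the square is selected because it lies farthest from all existing samples. With a training set of more than 1000 points, every objective evaluation of a continuous optimizer would repeat that inner minimum, which is one plausible reading of the cost concern raised above.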
Appendix A MATLAB Toolbox

Most of the work done in this dissertation, along with classical tools for engineering design and decision making, was implemented in an object-oriented MATLAB toolbox. This toolbox can be downloaded from:

This toolbox, which comes with thorough documentation, was created, structured, and implemented to make state-of-the-art techniques easy to use in engineering applications. Most functions are highly customizable: they use default settings that were heuristically shown to be robust and efficient, while still offering complete control to advanced users. User feedback can be submitted at the above website. The features in the current implementation are summarized in Figure A.1.
Figure A.1: Diagram of the current implementation of the CODES toolbox. Entries recoverable from the diagram: meta-models (SVM); sensitivity analysis (correlation coefficient, DGSM, Sobol' index); reliability assessment (CMC, FORM, SORM, SubSim); double loop; inverse FORM; CVT; generalized max-min.
Notations

$x$: A variable.
$\mathbf{x}$: A (row) vector such that $\mathbf{x} = [x_1, \ldots, x_d]$.
$X$: A random variable.
$\theta$: Distribution hyper-parameters.
$f_X$: Probability Density Function (PDF) of $X$.
$F_X$: Cumulative Distribution Function (CDF) of $X$.
$F_X^{-1}$: Inverse Distribution Function (IDF) of $X$.
$E[X]$: Expected value of $X$.
$V[X]$: Variance of $X$.
$COV[X_1, X_2]$: Covariance between $X_1$ and $X_2$.
$\mathbf{X}$: A random vector.
$d$: Problem dimension.
$f_{X_i}$: Marginal PDF of $X_i$.
$f_{\mathbf{X}}$: Joint PDF of $\mathbf{X}$.
$\phi$: Standard normal PDF.
$\Phi$: Standard normal CDF.
$\mathcal{X}$: A random sample of $X$ such that $\mathcal{X} = [x^{(1)}, \ldots, x^{(n)}]^T$.
$x^{(i)}$: ith realization of the random sample $\mathcal{X}$.
$n$: Random sample size.
$\hat{\theta}$: An estimate of $\theta$.
$\mathcal{X}^\star$: A resample of $\mathcal{X}$.
$\hat{\theta}^{(i)}$: ith bootstrapped value of an estimate $\hat{\theta}$.
$\mathcal{M}$: Computationally expensive numerical model.
$P_f$: Probability of failure.
$\Omega_f$: Failure domain.
$g$: Limit state.
$I$: Indicator function.
$\beta$: Reliability index.
$\widetilde{\mathcal{M}}$: An approximation, surrogate or meta-model of $\mathcal{M}$.
$\psi$: Hyper-parameters of the meta-model $\widetilde{\mathcal{M}}$.
Tr: Training set.
Acronyms

i.i.d.: Independent and Identically Distributed.
ANOVA: ANalysis Of VAriance.
CDF: Cumulative Distribution Function.
CI: Confidence Interval.
CMC: Crude Monte Carlo.
CV: Cross-Validation.
CVT: Centroidal Voronoi Tessellation.
DGSM: Derivative-based Global Sensitivity Measure.
DOE: Design Of (Computer) Experiments.
EDSD: Explicit Design Space Decomposition.
EE: Elementary Effects.
EGRA: Efficient Global Reliability Assessment.
FM: Fidelity Map.
FORM: First Order Reliability Method.
FRAC: Frequency Response Assurance Criterion.
FRF: Frequency Response Function.
GP: Gaussian Process.
IDF: Inverse Distribution Function.
IS: Importance Sampling.
LARS: Least Angle RegreSsion.
lasso: Least Absolute Shrinkage and Selection Operator.
LHS: Latin Hypercube Sample.
LOO: Leave-One-Out.
MAC: Modal Assurance Criterion.
MCMC: Markov-Chain Monte Carlo.
ME: Misclassification Error.
MLE: Maximum Likelihood Estimate.
MPP: Most Probable failure Point.
MPTP: Minimum Performance Target Point.
MSE: Mean Square Error.
MVFOSM: Mean Value First Order Second Moment.
OLHS: Optimal Latin Hypercube Sample.
PCE: Polynomial Chaos Expansion.
PDF: Probability Density Function.
PMA: Performance Measure Approach.
RBDO: Reliability-based Design Optimization.
RBF: Radial Basis Function.
RIA: Reliability Index Approach.
SLA: Single Loop Approach.
SORA: Sequential Optimization and Reliability Assessment.
SORM: Second Order Reliability Method.
SubSim: Subset Simulation.
SUR: Stepwise Uncertainty Reduction.
SVM: Support Vector Machine.
SVR: Support Vector Regression.
tIMSE: Target Integrated Mean Square Error.
Bibliography

A. Agresti. Analysis of ordinal categorical data. John Wiley & Sons, 2010.
R. J. Allemang. The modal assurance criterion: twenty years of use and abuse. Sound and Vibration, 37(8):14–23, 2002.
C. Andrieu, N. De Freitas, A. Doucet, and M. I. Jordan. An introduction to MCMC for machine learning. Machine Learning, 50(1-2):5–43, 2003. doi:10.1023/A:1020281327116.
Y. Aoues and A. Chateauneuf. Benchmark study of numerical methods for reliability-based design optimization. Structural and Multidisciplinary Optimization, 41(2):277–294, 2010. doi:10.1007/s00158-009-0412-2.
G. E. B. Archer, A. Saltelli, and I. M. Sobol'. Sensitivity measures, anova-like techniques and the use of bootstrap. Journal of Statistical Computation and Simulation, 58(2):99–120, 1997. doi:10.1080/00949659708811825.
S.-K. Au and J. L. Beck. Estimation of small failure probabilities in high dimensions by subset simulation. Probabilistic Engineering Mechanics, 16(4):263–277, 2001. doi:10.1016/S0266-8920(01)00019-4.
A. Basudhar. Computational optimal design and uncertainty quantification of complex systems using explicit decision boundaries. PhD thesis, University of Arizona, 2011.
A. Basudhar and S. Missoum. Adaptive explicit decision functions for probabilistic design and optimization using support vector machines. Computers & Structures, 86(19-20):1904–1917, 2008. doi:10.1016/j.compstruc.2008.02.008.
A. Basudhar and S. Missoum. An improved adaptive sampling scheme for the construction of explicit boundaries. Structural and Multidisciplinary Optimization, 42(4):517–529, 2010. doi:10.1007/s00158-010-0511-0.
A. Basudhar, C. Dribusch, S. Lacaze, and S. Missoum. Constrained efficient global optimization with support vector machines. Structural and Multidisciplinary Optimization, 46(2):201–221, 2012. doi:10.1007/s00158-011-0745-5.
J. Bect, D. Ginsbourger, L. Li, V. Picheny, and E. Vazquez. Sequential design of computer experiments for the estimation of a probability of failure. Statistics and Computing, 22(3):773–793, 2012. doi:10.1007/s11222-011-9241-4.
R. E. Bellman. Adaptive control processes: a guided tour. Rand Corporation. Research studies. Princeton University Press, 1961.
M. Berveiller. Éléments finis stochastiques : approches intrusive et non intrusive pour des analyses de fiabilité. PhD thesis, Université Blaise Pascal, Clermont-Ferrand, France, 2005.
B. Bichon, S. Mahadevan, and M. S. Eldred. Reliability-based design optimization using efficient global reliability analysis. In Proceedings of the 50th AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference, Palm Springs, CA, 2009. doi:10.2514/6.2009-2261.
B. J. Bichon, M. S. Eldred, L. P. Swiler, S. Mahadevan, and J. M. McFarland. Efficient global reliability analysis for nonlinear implicit performance functions. AIAA Journal, 46(10):2459–2468, 2008. doi:10.2514/1.34321.
C. M. Bishop. Pattern recognition and machine learning. Springer, 2006.
P. Bjerager and S. Krenk. Sensitivity measures in structural reliability analysis. In Proceedings of the first IFIP WG 7.5 Conference on Reliability and Optimization of Structural Systems, pages 459–470, Aalborg, Denmark, 1987.
P. Bjerager and S. Krenk. Parametric sensitivity in first order reliability theory. Journal of Engineering Mechanics, 115(7):1577–1582, 1989. doi:10.1061/(ASCE)0733-9399(1989)115:7(1577).
G. Blatman. Adaptive sparse polynomial chaos expansions for uncertainty propagation and sensitivity analysis. PhD thesis, Université Blaise Pascal, 2009.
G. Blatman and B. Sudret. An adaptive algorithm to build up sparse polynomial chaos expansions for stochastic finite element analysis. Probabilistic Engineering Mechanics, 25:183–197, 2010. doi:10.1016/j.probengmech.2009.10.003.
G. Blatman and B. Sudret. Adaptive sparse polynomial chaos expansion based on least angle regression. Journal of Computational Physics, 230(6):2345–2367, 2011. doi:10.1016/
J. F. Bonnans, J. C. Gilbert, C. Lemaréchal, and C. A. Sagastizábal. Numerical optimization: theoretical and practical aspects. Springer Berlin Heidelberg, 2006. doi:10.1007/978-3-540-35447-5.
A. W. Bowman and A. Azzalini. Applied smoothing techniques for data analysis: the kernel approach with S-Plus illustrations. Oxford University Press, 2004.
G. E. P. Box, W. G. Hunter, and J. S. Hunter. Statistics for experimenters. Wiley-Interscience, 1978.
K. Breitung. Asymptotic approximations for multinormal integrals. Journal of Engineering Mechanics, 110(3):357–366, 1984.
L. Brévault, S. Lacaze, M. Balesdent, and S. Missoum. Kriging-based sequential reliability analysis in the presence of mixed aleatory and epistemic uncertainties. Submitted to Structural Safety, 2015.
D. S. Broomhead and D. Lowe. Multivariable functional interpolation and adaptive networks. Complex Systems, 2(3):321–355, 1988.
J. Bucklew. Introduction to rare event simulation. Springer Science & Business Media, New York, USA, 2004.
G. Cai and I. Elishakoff. Refined second-order reliability analysis. Structural Safety, 14(4):267–276, 1994. doi:10.1016/0167-4730(94)90015-9.
F. Campolongo, J. Cariboni, and A. Saltelli. An effective screening design for sensitivity analysis of large models. Environmental Modelling & Software, 22(10):1509–1518, 2007.
Y. Caniou. Global sensitivity analysis for nested and multiscale modelling. PhD thesis, Université Blaise Pascal-Clermont-Ferrand II, 2012.
G. Casella and R. L. Berger. Statistical inference, volume 70. Duxbury Press Belmont, CA, 1990.
C.-C. Chang and C.-J. Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2(3):1–27, 2011. doi:10.1145/1961189.1961199.
O. Chapelle. Training a support vector machine in the primal. Neural Computation, 19(5):1155–1178, 2007. doi:10.1162/neco.2007.19.5.1155.
O. Chapelle, V. Vapnik, O. Bousquet, and S. Mukherjee. Choosing multiple parameters for support vector machines. Machine Learning, 46(1-3):131–159, 2002. doi:10.1023/A:1012450327387.
X. Chen, T. K. Hasselman, and D. J. Neill. Reliability based structural design optimization for practical applications. In Proceedings of the 38th Structures, Structural Dynamics, and Materials Conference, Kissimmee, FL, 1997. doi:10.2514/6.1997-1403.
G. Cheng, L. Xu, and L. Jiang. A sequential approximate programming strategy for reliability-based structural optimization. Computers & Structures, 84(21):1353–1367, 2006. doi:10.1016/j.compstruc.2006.03.006.
N. Christianini and S. J. Taylor. An introduction to support vector machines (and other kernel-based learning methods). Cambridge University Press, 2000.
N. Cressie. The origins of kriging. Mathematical Geology, 22(3):239–252, 1990.
N. Cressie. Statistics for spatial data. Wiley-Interscience, 1993.
P. J. Davis and P. Rabinowitz. Methods of numerical integration. Courier Corporation, 2007.
A. Der Kiureghian. The geometry of random vibrations and solutions by FORM and SORM. Probabilistic Engineering Mechanics, 15(1):81–90, 2000. doi:10.1016/S0266-8920(99)00011-9.
A. Der Kiureghian and M. De Stefano. Efficient algorithm for second-order reliability analysis. Journal of Engineering Mechanics, 117(12):2904–2923, 1991.
A. Der Kiureghian, H.-Z. Lin, and S.-J. Hwang. Second-order reliability approximations. Journal of Engineering Mechanics, 113(8):1208–1225, 1987. doi:10.1061/(ASCE)0733-9399(1987)113:8(1208).
A. Der Kiureghian, Y. Zhang, and C. Li. Inverse reliability problem. Journal of Engineering Mechanics, 120(5):1154–1159, 1994. doi:10.1061/(ASCE)0733-9399(1994)120:5(1154).
Q. Du, V. Faber, and M. Gunzburger. Centroidal Voronoi tessellations: applications and algorithms. SIAM Review, 41(4):637–676, 1999. doi:10.1137/S0036144599352836.
X. Du and W. Chen. Sequential optimization and reliability assessment method for efficient probabilistic design. Journal of Mechanical Design, 126(2):225, 2004. doi:10.1115/1.1649968.
X. Du, A. Sudjianto, and W. Chen. An integrated framework for optimization under uncertainty using inverse reliability strategy. In Proceedings of the ASME International Design Engineering Technical Conferences and the Computers and Information in Engineering Conference, Chicago, IL, 2003.
X. Du, A. Sudjianto, and B. Huang. Reliability-based design with the mixture of random and interval variables. Journal of Mechanical Design, 127(6):1068, 2005. doi:10.1115/1.1992510.
V. Dubourg. Adaptive surrogate models for reliability analysis and reliability-based design optimization. PhD thesis, Université Blaise Pascal, 2011.
V. Dubourg, B. Sudret, and J.-M. Bourinet. Reliability-based design optimization using kriging surrogates and subset simulation. Structural and Multidisciplinary Optimization, 44(5):673–690, 2011. doi:10.1007/s00158-011-0653-8.
D. J. Dupuis. Using copulas in hydrology: benefits, cautions, and issues. Journal of Hydrologic Engineering, 12(4):381–393, 2007. doi:10.1061/(ASCE)1084-0699(2007)12:4(381).
B. Echard, N. Gayton, and M. Lemaire. AK-MCS: An active learning reliability method combining Kriging and Monte Carlo Simulation. Structural Safety, 33(2):145–154, 2011. doi:10.1016/j.strusafe.2011.01.002.
B. Efron. Bootstrap methods: another look at the Jackknife. The Annals of Statistics, 7:1–26, 1979. doi:10.1214/aos/1176344552.
B. Efron. Better bootstrap confidence intervals. Journal of the American Statistical Association, 82(397):171–185, 1987.
B. Efron and R. Tibshirani. An introduction to the bootstrap. CRC Press, 1993.
B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Least angle regression. Annals of Statistics, 32(2):407–499, 2004. doi:10.1214/009053604000000067.
I. Enevoldsen and J. D. Sørensen. Reliability-based optimization in structural engineering. Structural Safety, 15(3):169–196, 1994. doi:10.1016/0167-4730(94)90039-6.
S. Engelund and R. Rackwitz. A benchmark study on importance sampling techniques in structural reliability. Structural Safety, 12(4):255–276, 1993. doi:10.1016/0167-4730(93)90056-7.
E. C. Fieller, H. O. Hartley, and E. S. Pearson. Tests for rank correlation coefficients. I. Biometrika, 44(3):470–481, 1957.
R. A. Fisher. Frequency distribution of the values of the correlation coefficient in samples from an indefinitely large population. Biometrika, pages 507–521, 1915.
A. I. J. Forrester and A. J. Keane. Recent advances in surrogate-based optimization. Progress in Aerospace Sciences, 45(1-3):50–79, 2009. doi:10.1016/j.paerosci.2008.11.001.
A. I. J. Forrester, A. Sobester, and A. J. Keane. Engineering design via surrogate modelling: a practical guide. John Wiley & Sons, 2008.
E. W. Frees and E. A. Valdez. Understanding relationships using copulas. North American Actuarial Journal, 2(1):1–25, 1998.
T. Goel, R. T. Haftka, and W. Shyy. Comparing error estimation measures for polynomial and kriging approximation of noise-free functions. Structural and Multidisciplinary Optimization, 38(5):429–442, 2009. doi:10.1007/s00158-008-0290-z.
C. Gogu, R. T. Haftka, R. Le Riche, J. Molimard, and A. Vautrin. Introduction to the Bayesian approach applied to elastic constants identification. AIAA Journal, 48(5):893–903, 2010. doi:10.2514/1.40922.
L. Gu, R. J. Yang, C. H. Tho, M. Makowski, O. Faruque, and Y. Li. Optimisation and robustness for crashworthiness of side impact. International Journal of Vehicle Design, 26(4):348–360, 2001.
S. R. Gunn. Support vector machines for classification and regression. ISIS technical report, 14, 1998.
N. Hansen. The CMA evolution strategy: a comparing review. In Towards a New Evolutionary Computation, volume 192, pages 75–102. Springer Berlin Heidelberg, 2006. doi:10.1007/3-540-32494-1.
A. M. Hasofer and N. C. Lind. An exact and invariant first-order reliability format. Journal of Engineering Mechanics, 100(1):111–121, 1974.
T. Hastie, S. Rosset, R. Tibshirani, and J. Zhu. The entire regularization path for the support vector machine. The Journal of Machine Learning Research, 5:1391–1415, 2004.
T. Hastie, R. Tibshirani, and J. Friedman. The elements of statistical learning, volume 1. Springer, 2009. doi:10.1007/b94608.
W. K. Hastings. Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1):97–109, 1970. doi:10.1093/biomet/57.1.97.
M. Hohenbichler and R. Rackwitz. First-order concepts in system reliability. Structural Safety, 1(3):177–188, 1982. doi:10.1016/0167-4730(82)90024-8.
M. Hohenbichler, S. Gollwitzer, W. Kruse, and R. Rackwitz. New light on first- and second-order reliability methods. Structural Safety, 4(4):267–284, 1987. doi:10.1016/0167-4730(87)90002-6.
C.-W. Hsu, C.-C. Chang, and C.-J. Lin. A practical guide to support vector classification. Technical report, Department of Computer Science and Information Engineering, National Taiwan University, Taipei, 2003.
R. L. Iman and W. J. Conover. A distribution-free approach to inducing rank correlation among input variables. Communications in Statistics - Simulation and Computation, 11(3):311–334, 1982. doi:10.1080/03610918208812265.
B. Iooss, A.-L. Popelin, G. Blatman, C. Ciric, F. Gamboa, S. Lacaze, and M. Lamboni. Some new insights in derivative-based global sensitivity measures. Proceedings of PSAM, 11:1094–1104, 2012.
T. Jaakkola, M. Diekhans, and D. Haussler. Using the Fisher kernel method to detect remote protein homologies. In Proceedings of the International Conference on Intelligent Systems for Molecular Biology, pages 149–158, Heidelberg, Germany, 1999.
C. Jiang, S. Han, M. Ji, and X. Han. A new method to solve the structural reliability index based on homotopy analysis. Acta Mechanica, pages 1–17, 2014. doi:10.1007/s00707-014-1226-x.
T. Joachims. Estimating the generalization performance of a SVM efficiently. In Proceedings of the International Conference on Machine Learning, San Francisco, CA, 2000.
M. E. Johnson, L. M. Moore, and D. Ylvisaker. Minimax and maximin distance designs. Journal of Statistical Planning and Inference, 26(2):131–148, 1990. doi:10.1016/0378-3758(90)90122-B.
D. R. Jones, C. D. Perttunen, and B. E. Stuckman. Lipschitzian optimization without the Lipschitz constant. Journal of Optimization Theory and Applications, 79(1):157–181, 1993. doi:10.1007/BF00941892.
D. R. Jones, M. Schonlau, and W. J. Welch. Efficient global optimization of expensive black-box functions. Journal of Global Optimization, 13(4):455–492, 1998. doi:10.1023/A:1008306431147.
D. S. Jones. The theory of generalised functions. Cambridge University Press, 1987.
V. R. Joseph, Y. Hung, and A. Sudjianto. Blind kriging: a new method for developing metamodels. Journal of Mechanical Design, 130(3):1–8, 2008. doi:10.1115/1.2829873.
L. Ju, Q. Du, and M. Gunzburger. Probabilistic methods for centroidal Voronoi tessellations and their parallel implementations. Parallel Computing, 28(10):1477–1500, 2002. doi:10.1016/S0167-8191(02)00151-5.
M. G. Kendall. A new measure of rank correlation. Biometrika, 30(1):81–93, 1938.
M. C. Kennedy and A. O'Hagan. Bayesian calibration of mathematical models. Journal of the Royal Statistical Society, 63(3):425–464, 2001. doi:10.1111/1467-9868.00294.
P. Kersaudy, B. Sudret, N. Varsier, O. Picon, and J. Wiart. A new surrogate modeling technique combining Kriging and polynomial chaos expansions – Application to uncertainty analysis in computational dosimetry. Journal of Computational Physics, 286:103–117, 2015. doi:10.1016/
R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. In Proceedings of the International Joint Conference on Artificial Intelligence, volume 14, pages 1137–1145, Montreal, Canada, 1995.
A. N. Kolmogorov. Foundations of the theory of probability. Chelsea Publishing Company, New York, 1956.
H. U. Köylüoğlu and S. R. Nielsen. New approximations for SORM integrals. Structural Safety, 13(4):235–246, 1994. doi:10.1016/0167-4730(94)90031-0.
D. G. Krige. A statistical approach to some mine valuations and allied problems at the Witwatersrand. Master's thesis, University of Witwatersrand, 1951.
S. Lacaze and S. Missoum. Fidelity maps for model update under uncertainty: application to a piano soundboard. In Proceedings of the 53rd AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics and Materials Conference, Honolulu, HI, 2012. American Institute of Aeronautics and Astronautics. doi:10.2514/6.2012-1432.
S. Lacaze and S. Missoum. Reliability-Based Design Optimization using Kriging and Support Vector Machines. In Proceedings of the 11th International Conference on Structural Safety & Reliability, New York, NY, 2013a. IASSAR. doi:10.13140/2.1.2451.8089.
S. Lacaze and S. Missoum. Bayesian calibration using fidelity maps. In Proceedings of the 11th International Conference on Structural Safety & Reliability, New York, NY, 2013b. IASSAR. doi:10.13140/2.1.2124.1287.
S. Lacaze and S. Missoum. A generalized "max-min" sample for surrogate update. Structural and Multidisciplinary Optimization, 49(4):683–687, 2014a. doi:10.1007/s00158-013-1011-9.
S. Lacaze and S. Missoum. A generalized "max-min" sample for reliability assessment with dependent variables. In Proceedings of the 34th Computers and Information in Engineering Conference, Buffalo, NY, 2014b. doi:10.1115/DETC2014-34051.
S. Lacaze and S. Missoum. Parameter estimation with correlated outputs using fidelity maps. Probabilistic Engineering Mechanics, 38:13–21, 2014c. doi:10.1016/j.probengmech.2014.08.002.
S. Lacaze, S. Missoum, F. Alijani, and M. Amabili. Identification under uncertainty of material properties of composite sandwich panels. In Proceedings of the American Society for Composites 29th Conference and 16th US-Japan Conference on Composite Materials and ASTM D30 Meeting, San Diego, CA, 2014.
S. Lacaze, L. Brévault, S. Missoum, and M. Balesdent. Probability of failure sensitivity with respect to decision variables. Accepted in Structural and Multidisciplinary Optimization, 2015a. doi:10.1007/s00158-015-1232-1.
S. Lacaze, L. Brévault, S. Missoum, and M. Balesdent. A sampling-based RBDO algorithm with local refinement and efficient gradient estimation. In Proceedings of the 12th International Conference on Applications of Statistics and Probability in Civil Engineering, Vancouver, Canada, 2015b.
M. Lamboni, B. Iooss, A.-L. Popelin, and F. Gamboa. Derivative-based global sensitivity measures: general links with Sobol' indices and numerical tests. Mathematics and Computers in Simulation, 87:45–54, 2013. doi:10.1016/j.matcom.2013.02.002.
R. Lebrun and A. Dutfoy. An innovating analysis of the Nataf transformation from the copula viewpoint. Probabilistic Engineering Mechanics, 24(3):312–320, 2009. doi:10.1016/j.probengmech.2008.08.001.
I. Lee, K. K. Choi, and L. Zhao. Sampling-based RBDO using the stochastic sensitivity analysis and Dynamic Kriging method. Structural and Multidisciplinary Optimization, 44(3):299–317, 2011. doi:10.1007/s00158-011-0659-2.
D. X. Li. On default correlation: a copula function approach. Journal of Fixed Income, 9(4):43–54, 2000.
J. Liang, Z. P. Mourelatos, and J. Tu. A single-loop method for reliability-based design optimization. In Proceedings of the ASME Design Engineering Technical Conferences, Salt Lake City, UT, 2004.
J. Liang, Z. P. Mourelatos, and E. Nikolaidis. A single-loop approach for system reliability-based design optimization. Journal of Mechanical Design, 129(12):1215, 2007. doi:10.1115/1.2779884.
H. T. Lin, C. J. Lin, and R. C. Weng. A note on Platt's probabilistic outputs for support vector machines. Machine Learning, 68(3):267–276, 2007. doi:10.1007/s10994-007-5018-6.
P.-L. Liu and A. Der Kiureghian. Optimization algorithms for structural reliability. Structural Safety, 9(3):161–177, 1991. doi:10.1016/0167-4730(91)90041-7.
S. Lloyd. Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2):129–137, 1982. doi:10.1109/TIT.1982.1056489.
D. Lopez-Paz, P. Hennig, and B. Schölkopf. The randomized dependence coefficient. Advances in Neural Information Processing Systems, pages 1–9, 2013.
A. Luntz and V. Brailovsky. On estimation of characters obtained in statistical procedure of recognition. Technicheskaya Kibernetica, 3(6):6–12, 1969.
T. Marwala. Finite Element Model Updating Using Computational Intelligence Techniques: Applications to Structural Dynamics. Springer, 2010.
G. Matheron. Principles of geostatistics. Economic Geology, 58(8):1246–1266, 1963.
M. D. McKay, R. J. Beckman, and W. J. Conover. Comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics, 21(2):239–245, 1979. doi:10.2307/1271432.
R. E. Melchers. Structural reliability analysis and prediction. John Wiley New York, 1999.
N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, and A. H. Teller. Equation of state calculations by fast computing machines. The Journal of Chemical Physics, 21(6):1087–1092, 1953. doi:10.1063/1.1699114.
J. Morio. Extreme quantile estimation with nonparametric adaptive importance sampling. Simulation Modelling Practice and Theory, 27:76–89, 2012. doi:10.1016/j.simpat.2012.05.008.
M. D. Morris. Factorial sampling plans for preliminary computational experiments. Technometrics, 33(2):161–174, 1991.
A. Nataf. Détermination des distributions de probabilités dont les marges sont données. Comptes rendus de l’Académie des Sciences, 225:42–43, 1962.
R. M. Neal. Regression and classification using gaussian process priors. In J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith, editors, Bayesian Statistics 6, pages 69–95. Oxford University Press, 1999.
R. B. Nelsen. An introduction to copulas. Springer, 2006.
E. Nikolaidis, D. M. Ghiocel, and S. Suren. Engineering design reliability handbook. CRC Press, 2004.
Y. Noh, K. K. Choi, and L. Du. Reliability-based design optimization of problems with correlated input variables using a Gaussian copula. Structural and Multidisciplinary Optimization, 38(1):1–16, 2009.
J.-S. Park. Optimal latin-hypercube designs for computer experiments. Journal of Statistical Planning and Inference, 39(1):95–111, 1994. doi:10.1016/0378-3758(94)90115-5.
V. Picheny, D. Ginsbourger, O. Roustant, R. T. Haftka, and N.-H. Kim. Adaptive designs of experiments for accurate approximation of a target region. Journal of Mechanical Design, 132(7):071008, 2010. doi:10.1115/1.4001873.
J. C. Platt. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In Advances in Large Margin Classifiers. MIT Press, 1999.
A. V. Prokhorov. Kendall coefficient of rank correlation. Online Encyclopedia of Mathematics, 2001.
R. Rackwitz and B. Fiessler. Structural reliability under combined random load sequences. Computers & Structures, 9(5):489–494, 1978. doi:10.1016/0045-7949(78)90046-9.
P. Ranjan, D. Bingham, and G. Michailidis. Sequential experiment design for contour estimation from complex computer codes. Technometrics, 50(4):527–541, 2008. doi:10.1198/004017008000000541.
C. E. Rasmussen. Gaussian processes in machine learning, volume 3176 of Lecture Notes in Computer Science, pages 63–71. Springer-Verlag, Heidelberg, 2004.
C. E. Rasmussen and C. K. I. Williams. Gaussian processes for machine learning. The MIT Press, 2006.
J. Rice. Mathematical statistics and data analysis. Cengage Learning, 2006.
M. L. Rizzo. Statistical computing with R. CRC Press, 2008.
M. Rosenblatt. Remarks on a multivariate transformation. Annals of Mathematical Statistics, 23:470–472, 1952.
S. M. Ross. An introduction to probability models. Academic Press, 2007.
S. M. Ross. A first course in probability. Pearson, 2009.
J. O. Royset, A. Der Kiureghian, and E. Polak. Reliability-based optimal structural design by the decoupling approach. Reliability Engineering & System Safety, 73(3):213–221, 2001. doi:10.1016/S0951-8320(01)00048-5.
R. Y. Rubinstein and D. P. Kroese. The cross-entropy method: a unified approach to combinatorial optimization, Monte-Carlo simulation and machine learning. Information science and statistics. Springer Science & Business Media, Secaucus, USA, 2004.
R. Y. Rubinstein and D. P. Kroese. Simulation and the Monte Carlo method. John Wiley & Sons, 2011.
J. Sacks, W. J. Welch, T. J. Mitchell, and H. P. Wynn. Design and analysis of computer experiments. Statistical Science, 4(4):409–423, 1989.
A. Saltelli. Making best use of model evaluations to compute sensitivity indices. Computer Physics Communications, 145(2):280–297, 2002. doi:10.1016/S0010-4655(02)00280-1.
A. Saltelli, T. H. Andres, and T. Homma. Sensitivity analysis of model output: an investigation of new techniques, 1993.
A. Saltelli, K. Chan, and E. M. Scott. Sensitivity analysis. John Wiley & Sons, 2000.
R. Schöbi and B. Sudret. PC-Kriging: a new metamodelling method combining polynomial chaos expansions and kriging. In Proceedings of the 2nd International Symposium on Uncertainty Quantification and Stochastic Modeling, Rouen, France, 2014.
B. Schölkopf and A. J. Smola. Learning with kernels: Support vector machines, regularization, optimization, and beyond. The MIT Press, 2002.
L. Schwartz. Théorie des Distributions: Vol. 1. Hermann & Cie., 1957.
G. Shafer. A mathematical theory of evidence. Princeton University Press, 1976.
A. Sklar. Fonctions de répartition à n dimensions et leurs marges. Publications of the Institute of Statistics, University of Paris, 8:229–231, 1959.
A. J. Smola and B. Schölkopf. A tutorial on support vector regression. Statistics and Computing, 14(3):199–222, 2004. doi:10.1023/B:STCO.0000035301.49549.88.
I. M. Sobol’. On sensitivity estimation for nonlinear mathematical models. Matematicheskoe Modelirovanie, 2(1):112–118, 1990.
I. M. Sobol’. Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates. Mathematics and Computers in Simulation, 55(1-3):271–280, 2001. doi:10.1016/S0378-4754(00)00270-6.
I. M. Sobol’ and S. Kucherenko. Derivative based global sensitivity measures and their links with global sensitivity indices. Mathematics and Computers in Simulation, 79(10):3009–3017, 2009.
C. Soize and R. Ghanem. Physical systems with random uncertainties: chaos representations with arbitrary probability measure. SIAM Journal on Scientific Computing, 26(2):395–410, 2004. doi:10.1137/S1064827503424505.
S. Song, Z. Lu, and H. Qiao. Subset simulation for structural reliability sensitivity analysis. Reliability Engineering & System Safety, 94(2):658–665, 2009. doi:10.1016/j.ress.2008.07.006.
C. Spearman. The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101, 1904.
R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, 58(1):267–288, 1996.
M. E. Tipping. Sparse bayesian learning and the relevance vector machine. The Journal of Machine Learning Research, 1:211–244, 2001. doi:10.1162/15324430152748236.
M. E. Tipping and A. C. Faul. Fast marginal likelihood maximisation for sparse Bayesian models. In Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, Key West, FL, 2003.
J. Tu, K. K. Choi, and Y. H. Park. A new study on reliability-based design optimization. Journal of Mechanical Design, 121(4):557, 1999. doi:10.1115/1.2829499.
J. Tu, K. K. Choi, and Y. H. Park. Design potential method for robust system parameter design. AIAA Journal, 39(4):667–677, 2001.
A. M. Turing. Computing machinery and intelligence. Mind, 59(236):433–460, 1950.
L. Tvedt. Distribution of quadratic forms in normal space - application to structural reliability. Journal of Engineering Mechanics, 116(6):1183–1197, 1990. doi:10.1061/(ASCE)0733-9399(1990)116:6(1183).
M. A. Valdebenito and G. I. Schuëller. A survey on approaches for reliability-based optimization. Structural and Multidisciplinary Optimization, 42(5):645–663, 2010. doi:10.1007/s00158-010-0518-6.
V. N. Vapnik. The nature of statistical learning theory. Springer Verlag, 2000.
V. N. Vapnik and O. Chapelle. Bounds on error expectation for support vector machines. Neural Computation, 12(9):2013–2036, 2000. doi:10.1162/089976600300015042.
V. N. Vapnik and A. Y. Chervonenkis. The theory of pattern recognition. Nauka, Moscow, 1974.
W. J. Welch, R. J. Buck, J. Sacks, H. P. Wynn, T. J. Mitchell, and M. D. Morris. Screening, predicting, and computer experiments. Technometrics, 34(1):15–25, 1992. doi:10.2307/1269548.
D. H. Wolpert. The lack of a priori distinctions between learning algorithms. Neural Computation, 8(7):1341–1390, 1996. doi:10.1162/neco.1996.8.7.1341.
Y.-T. Wu. Efficient methods for mechanical and structural reliability analysis and design (safety-index, fatigue, failure). PhD thesis, The University of Arizona, 1984.
Y. Xiong, W. Chen, K.-L. Tsui, and D. W. Apley. A better understanding of model updating strategies in validating engineering models. Computer Methods in Applied Mechanics and Engineering, 198(15-16):1327–1337, 2009. doi:10.1016/j.cma.2008.11.023.
K. Q. Ye. Orthogonal column latin hypercubes and their application in computer experiments. Journal of the American Statistical Association, 93(444):1430–1439, 1998. doi:10.1080/01621459.1998.10473803.
B. D. Youn and K. K. Choi. An investigation of nonlinearity of reliability-based design optimization approaches. Journal of Mechanical Design, 126(3):403, 2004. doi:10.1115/1.1701880.
B. D. Youn, K. K. Choi, and Y. H. Park. Hybrid analysis method for reliability-based design optimization. Journal of Mechanical Design, 125(2):221, 2003. doi:10.1115/1.1561042.
B. D. Youn, K. K. Choi, and L. Du. Adaptive probability analysis using an enhanced hybrid mean value method. Structural and Multidisciplinary Optimization, 29(2):134–148, 2004a. doi:10.1007/s00158-004-0452-6.
B. D. Youn, K. K. Choi, R.-J. Yang, and L. Gu. Reliability-based design optimization for crashworthiness of vehicle side impact. Structural and Multidisciplinary Optimization, 26(3-4):272–283, 2004b. doi:10.1007/s00158-003-0345-0.
B. D. Youn, K. K. Choi, and L. Du. Enriched performance measure approach for reliability-based design optimization. AIAA Journal, 43(4):874–884, 2005. doi:10.2514/1.6648.
L. A. Zadeh. Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets and Systems, 1:3–28, 1978. doi:10.1016/S0165-0114(99)80004-9.
L. A. Zadeh. Fuzzy logic. doi:10.1109/2.53.
P. Zhang. Nonparametric importance sampling. Journal of the American Statistical Association, 91(435):1245–1253, 1996.
Y. Zhang and A. Der Kiureghian. Two improved algorithms for reliability analysis. In Proceedings of the sixth IFIP WG7.5 working conference on reliability and optimization of structural systems, pages 297–304, Assisi, Italy, 1995. doi:10.1007/978-0-387-34866-7_32.
L. Zhao, K. K. Choi, and I. Lee. Metamodeling method using dynamic kriging for design optimization. AIAA Journal, 49(9):2034–2046, 2011. doi:10.2514/1.J051017.
Y. G. Zhao and T. Ono. A general procedure for first/second-order reliability method (FORM/SORM). Structural Safety, 21(2):95–112, 1999. doi:10.1016/S0167-4730(99)00008-9.
T. Zou and S. Mahadevan. A direct decoupling approach for efficient reliability-based design optimization. Structural and Multidisciplinary Optimization, 31(3):190–200, 2006. doi:10.1007/s00158-005-0572-7.