Gaussian Process Modeling Using the Principle of Superposition

Matthias H.Y. Tan* and Guilin Li
Department of Systems Engineering and Engineering Management, City University of Hong Kong
*Corresponding author, Email: [email protected]

Abstract

Partial differential equation (PDE) models of physical systems with initial and boundary conditions are often solved numerically via a computer code called the simulator. To study the dependence of the solution on a functional input, the input is expressed as a linear combination of a finite number of basis functions, and the coefficients of the bases are varied. In such studies, Gaussian process (GP) emulators can be constructed to reduce the amount of simulations required from time-consuming simulators. For linear initial-boundary value problems (IBVPs) with functional inputs as additive terms in the PDE, initial conditions or boundary conditions, the IBVP solution is theoretically a linear function of the coefficients conditional on all other inputs, which is a result called the principle of superposition. Since numerical errors cause deviation from linearity and nonlinear IBVPs are widely solved in practice, we generalize the result to account for nonlinearity. Based on this generalized result, we propose mean and covariance functions for building GP emulators that capture the approximate conditional linear effect of the coefficients. Numerical simulations demonstrate the substantial improvements in prediction performance achieved with the proposed emulator. Matlab codes for reproducing the results in this paper are available in the online supplement.

Keywords: Computer Experiments, Functional Inputs, Linear Differential Operator
1 Introduction
Time-consuming simulators of physical systems are often based on PDEs. These models are widely employed in engineering applications, which can be seen in the huge and interesting selection of examples on the Comsol (2017) and ANSYS (2017) webpages. A PDE usually has an infinite number of solutions. Thus, initial conditions and boundary conditions need to be imposed to obtain a unique solution. A PDE together with initial and boundary conditions gives an IBVP, which is usually solved with numerical methods such as the finite element (FE) and finite difference methods (Solín, 2006; Li & Chen, 2009). This paper is concerned with construction of GP emulators to predict the solution or a linear map/transformation/function of the solution of an IBVP as a function of its inputs. Existing GP modeling methods mostly treat the simulator that solves the IBVP
as a black box function (Sacks et al., 1989; Currin et al., 1991). A common approach is to use a stationary GP with a product Gaussian or product Matérn covariance function as a prior for the functional relationship (Santner et al., 2003). In our approach, we incorporate prior information about the structure of the IBVP given by the principle of superposition. Specifically, for a linear IBVP with functional inputs as additive terms in the PDE, initial conditions or boundary conditions, the IBVP solution is theoretically a linear function of the basis coefficients conditional on other inputs. For many nonlinear IBVPs and solutions of linear IBVPs contaminated by numerical error, these conditional effects of the coefficients are approximately linear. We propose a GP emulator with prior mean and covariance functions chosen based on this prior knowledge, i.e., the GP is the sum of a GP with conditional linear effects and an independent residual GP. As an example, consider the linear heat equation (a linear parabolic PDE), which predicts the temperature u(s, t| x, z) of a solid body (such as the gripper pivot, a robot component, in Figure 1) represented by a set of points S as a function of the spatial coordinate s ∈ S and time t ∈ [0, T] given a vector of inputs (x, z) to the IBVP. Thus, the IBVP solution is the temperature over the spatial domain S and time interval [0, T]. The heat equation is

$$x_1 \frac{\partial}{\partial t} u(s,t\,|\,x,z) - \nabla_s \cdot \left[ x_2 \nabla_s u(s,t\,|\,x,z) \right] = e_0(s,t) + \sum_{i=1}^{k} z_{e_i} e_i(s,t) \quad \forall s \in S,\ t \in [0,T], \qquad (1)$$
where x_1, x_2 are the volumetric specific heat and thermal conductivity respectively, x = (x_1, x_2), e_i(s,t) is a basis function with domain S × [0,T] and coefficient z_{e_i} (if i > 0), and each z_{e_i} is a component of the vector z. Note that ∇_s u(s,t| x,z) is the gradient of u with respect to s, and ∇_s · is the divergence with respect to s. The initial condition (initial temperature) is

$$u(s,0\,|\,x,z) = h_{10}(s) + \sum_{i=1}^{m_1} z_{h_{1i}} h_{1i}(s) \quad \forall s \in S, \qquad (2)$$
and the boundary condition (boundary temperature) is

$$u(s,t\,|\,x,z) = h_{20}(s,t) + \sum_{i=1}^{m_2} z_{h_{2i}} h_{2i}(s,t) \quad \forall s \in \partial S,\ t \in (0,T], \qquad (3)$$
where the h_{1i}(s)'s and h_{2i}(s,t)'s are basis functions with domains S and ∂S × (0,T] respectively, ∂S is the boundary of S, the z_{h_{ji}}'s are coefficients of the bases, and each z_{h_{ji}} is a component of the vector z. A numerical method such as the FE method can be used to solve (1)-(3) for fixed (x, z), and a computer experiment would involve varying (x, z) according to an experimental design and obtaining the numerical solution u(s,t| x,z) at each design point. Let us denote the right-hand sides of (1)-(3), which we have modeled as elements of affine function spaces, as e(s,t| z), h_1(s| z), and h_2(s,t| z), i.e.,

$$e(s,t\,|\,z) = e_0(s,t) + \sum_{i=1}^{k} z_{e_i} e_i(s,t), \quad h_1(s\,|\,z) = h_{10}(s) + \sum_{i=1}^{m_1} z_{h_{1i}} h_{1i}(s), \quad h_2(s,t\,|\,z) = h_{20}(s,t) + \sum_{i=1}^{m_2} z_{h_{2i}} h_{2i}(s,t). \qquad (4)$$
The term e(s,t| z) appearing on the right-hand side of the PDE is called the source term, while h_1(s| z) is called an initial condition and h_2(s,t| z) is called a boundary condition. In (1)-(3), the source term gives the internal heat generated (e.g., due to resistive heating), the initial condition gives the initial temperature, while the boundary condition gives the boundary temperature. These quantities are functional inputs to the IBVP. Define the differential operators $L_x = x_1 \frac{\partial}{\partial t} - \nabla_s \cdot (x_2 \nabla_s)$ and $Q_x^1 = Q_x^2 = I$, where I is the identity operator. Then, we can write (1)-(3) compactly as

$$L_x u(s,t\,|\,x,z) = e(s,t\,|\,z) \quad \forall s \in S,\ t \in [0,T],$$
$$Q_x^1 u(s,t\,|\,x,z) = h_1(s\,|\,z) \quad \forall s \in S,\ t = 0, \qquad (5)$$
$$Q_x^2 u(s,t\,|\,x,z) = h_2(s,t\,|\,z) \quad \forall s \in \partial S,\ t \in (0,T].$$
Figure 1: 3D plot of a gripper pivot, which is a component of a commercial robot

Note that L_x, Q_x^1, and Q_x^2 are linear operators, i.e., L_x[a u_1(s,t) + u_2(s,t)] = a L_x u_1(s,t) + L_x u_2(s,t) for a scalar a and sufficiently differentiable functions u_1(s,t) and u_2(s,t) (similarly for Q_x^1 and Q_x^2). This property can be utilized to show that

$$u(s,t\,|\,x,z) = u_0(s,t\,|\,x) + \sum_{i=1}^{k} z_{e_i} u_{e_i}(s,t\,|\,x) + \sum_{j=1}^{2} \sum_{i=1}^{m_j} z_{h_{ji}} u_{h_{ji}}(s,t\,|\,x), \qquad (6)$$
where u_0(s,t| x), u_{e_i}(s,t| x), u_{h_{1i}}(s,t| x), and u_{h_{2i}}(s,t| x) are solutions of modified versions of (5) with e(s,t| z), h_1(s| z), and h_2(s,t| z) replaced with either 0 or a basis function in (4), as given in Table 1. Note that (6) is a statement of the so-called principle of superposition, which is widely employed in classical PDE theory to find a series solution to linear IBVPs (Chapter 7 of Myint-U and Debnath (2007) and Part IV of Pivato (2010)). The principle of superposition (6) implies that conditional on x, u(s,t| x,z) is a linear function of z at each (s,t) point. However, when the linear IBVP (5) is solved numerically, numerical errors can cause small deviations from linearity, which require us to add a residual term to (6). The residual term can similarly be used to account for nonlinearity in nonlinear IBVPs. In many cases, nonlinear IBVPs are the result of adding more elaborate elements to standard linear IBVPs for a physical problem (e.g., a thermal conductivity that depends on the temperature u(s,t| x,z) in (1) instead of a scalar x_2, which makes L_x a nonlinear operator).
Table 1: Quantities replacing the right-hand sides in (5) to obtain the IBVP solutions appearing in (6)

                            replace e(s,t|z) with   replace h1(s|z) with   replace h2(s,t|z) with
To obtain u0(s,t|x):        e0(s,t)                 h10(s)                 h20(s,t)
To obtain uei(s,t|x):       ei(s,t)                 0                      0
To obtain uh1i(s,t|x):      0                       h1i(s)                 0
To obtain uh2i(s,t|x):      0                       0                      h2i(s,t)
In such cases, we can expect the original linear IBVP to provide an approximate solution to the nonlinear IBVP. In other words, we can expect the added residual term to be small. This paper utilizes the modified form of (6) (with residual term) to construct prior mean and covariance functions for a GP emulator of the IBVP solution or a linear map of it so that the GP better captures our prior knowledge that the effect of z is approximately linear conditional on x.

A computer program that solves (5) with the FE method will first discretize it and solve the resulting system of algebraic equations (Solín, 2006; Li & Chen, 2009). The timing studies of various FE schemes for solving IBVPs in Cai et al. (2003) indicate that a main factor that causes solving an IBVP to be time-consuming is the size of the system of algebraic equations. For time independent PDEs, the system of equations tends to be larger if a system of PDEs instead of a scalar PDE is solved, the spatial domain S is three-dimensional, and S is of a complex geometrical shape. If the PDE is time-dependent, a system of equations needs to be solved at each time point in a grid used to discretize the time interval (Chapter 5 of Solín (2006)), which increases the computational cost. Discretizing nonlinear IBVPs generally yields systems of nonlinear algebraic equations (Hyman, 1982), which are more time consuming to solve than the linear algebraic equations obtained by discretizing linear IBVPs. Three examples of IBVP models that are time-consuming to solve are given in Section 5, and they are used to demonstrate the large improvements achieved with the proposed GP emulator.

Recently, there have been some attempts at using knowledge about the physical models solved by simulators to improve construction of the GP or other types of emulators. For example, Golchi, Bingham, Chipman, and Campbell (2015), Wang and Berger (2016), and Tan (2016, 2017b) propose emulators for computer experiments in which the response is known to be a monotonic function of the inputs. Wang and Berger (2016) also consider other kinds of constraints on the derivatives. Wheeler, Dunson, Pandalai, Baker, and Herring (2014) propose a GP model that incorporates information from a linear ordinary differential equation model of data from a physical experiment. The GP model is designed to approximately satisfy the ordinary differential equation. Tan (2017a) proposes an approach to model numerical solutions of IBVPs in such a way that Dirichlet boundary and initial conditions are satisfied. No restriction is placed on the PDE, but the restriction to the specific type of boundary and initial conditions limits the method's applicability. Álvarez, Luengo, and Lawrence (2013) propose generating a GP prior for multi-response physical data based on the solution of a low-fidelity linear IBVP model of the system with source term or initial condition given by a stationary GP. They use the fact that for a linear parabolic IBVP of the form (5) with homogeneous boundary condition h_2(s,t| z) = 0, the solution can be written as
$$u(s,t\,|\,x,z) = \int_0^t \int_S G(s,t,s',t'\,|\,x)\, e(s',t'\,|\,z)\, ds'\, dt' + \int_S G(s,t,s',0\,|\,x)\, h_1(s'\,|\,z)\, ds', \qquad (7)$$

where G(s,t,s',t'| x) is the Green's function for the IBVP. Álvarez, Luengo, and Lawrence let either the e_i's or the h_{1i}'s be independent stationary GPs (and set e_0 = h_{10} = 0). This makes u(s,t| x,z) a GP, which is taken as the prior for the unknown function with space-time domain to be inferred from the physical data, where (x, z) is now a vector of hyperparameters to be specified by the modeler or estimated from data. All responses are modeled in this fashion, and they share the same e_i's or h_{1i}'s but have different (x, z)'s. The approach by Álvarez, Luengo, and Lawrence is feasible only when the Green's function is known. Unfortunately, the Green's function is often known only when S represents a simple geometrical shape (rectangle or circle) or is the entire real coordinate space (for an infinitely large object) since it is defined as the solution of an IBVP obtained by modifying the right-hand side terms of (5) (Duffy, 2015).

The rest of the paper is organized as follows: Section 2 introduces stationary and separable GP models for responses that are functions and multivariate responses. Section 3 discusses functional inputs to IBVPs, states a modified principle of superposition that accounts for nonlinearity, and presents the proposed prior mean and covariance functions. In Section 4, three specific IBVPs are presented and discussed. Section 5 shows the improvements that can be attained with the proposed emulator for predicting outputs of time-consuming simulators built to solve the IBVPs in Section 4. Concluding remarks are given in Section 6.

2 Gaussian Process Modeling of Responses that are Functions and Multivariate Responses with Separable Covariance Functions
In this section, we present a GP model with a separable covariance function for responses that are real-valued functions defined on some domain C and observed on a fixed set of points M. For this problem, a separable covariance function is desirable when the number of points in M is large since it enables quick numerical computation of the inverse and determinant of the prior covariance matrix (Rougier, 2008; Hung et al., 2015). We also present a GP model with a separable covariance structure for multivariate responses, which was introduced by Conti and O'Hagan (2010). These models are standard statistical emulators for maps of the solution u of an IBVP, which are the responses in computer experiments. For example, for the IBVP model (1)-(3) of the temperature u in the gripper pivot in Figure 1, a map of u of interest to engineers is the average of u over the spatial domain viewed as a function of time, i.e., $y(t,x,z) = \int_S u(s,t\,|\,x,z)\,ds \big/ \int_S 1\,ds$. Another is the value of u along the vertical line in the cylindrical inner surface of the gripper pivot shown in Figure 1 at some time, say t = 100, which can be written as y(s_3, x, z), where s_3 is the vertical coordinate. Note that this response is obtained by restricting the domain of u(·| x,z). Finally, if the IBVP solution is vector valued, e.g., u = (u_1, u_2, u_3)^T, then a map of interest might be y(v, x, z) = u_v(s,t| x,z) for fixed space-time coordinate (s,t), where v is an integer index. A popular GP emulator for a response y(·, w) that is a real-valued function defined
on a connected and compact domain C ⊂ R^c, where R^c is the real coordinate space of c dimensions, given the input w, is based on the assumption of covariance stationarity and separability. Let Y(v, w) denote the GP emulator of y(v, w), where v ∈ C, w ∈ X, and X ⊂ R^d is the connected and compact experiment region. Note that we use the notation w = (x, z) for brevity and the quantities v and C depend on the context. For emulating the solution u(s,t| x,z) of the IBVP (5), v = (s,t), C = S × [0,T] and y(v, w) = u(s,t| x,z), while for emulating $y(t,x,z) = \int_S u(s,t\,|\,x,z)\,ds \big/ \int_S 1\,ds$, v = t and C = [0,T]. The covariance function, which is separable with respect to v and w, is

$$\mathrm{cov}[Y(v,w), Y(v',w')] = \sigma_Y^2\, \psi_1(v,v'\,|\,\gamma_1)\, \psi_2(w,w'\,|\,\gamma_2), \qquad (8)$$
where ψ_1 and ψ_2 are stationary correlation functions with parameters γ_1 and γ_2 respectively, and σ_Y^2 is the variance. One choice for ψ_1 and ψ_2, which we adopt in this paper, is the product Matérn correlation function with smoothness parameter 3/2 (Santner et al. (2003), page 42), which gives once differentiable sample paths, i.e.,

$$\psi_2(w,w'\,|\,\gamma_2) = \prod_{i=1}^{d} \exp\!\left(-|w_i - w_i'|/\gamma_{2i}\right) \left(|w_i - w_i'|/\gamma_{2i} + 1\right), \qquad (9)$$
where w = (w_1, ..., w_d) and γ_2 = (γ_{21}, ..., γ_{2d}) (similarly for ψ_1). The correlation function (9) is a rather common choice in constructing emulators for computer experiments (Stein, 1989; Santner et al., 2003; Tan, 2017b). In addition, following common practice (Currin et al., 1991), we assume the prior mean function is a constant, i.e., E[Y(v,w)] = α. In this paper, we shall restrict attention to cases where the function y(·, w) is observed at a fixed set of points M = {v^1, ..., v^N} for any w ∈ X. Note that we index the points by superscripts to avoid confusion with the scalar components of v. Let the experimental design be D = {w^1, ..., w^n} and let the experimental data be arranged in a vector

$$Y(D) = \left( Y(v^1, w^1), \ldots, Y(v^N, w^1), \ldots, Y(v^N, w^n) \right)^T. \qquad (10)$$

The prior correlation matrix for Y(D) is R = R_2 ⊗ R_1, where ⊗ is the matrix Kronecker product, (R_1)_{ij} = ψ_1(v^i, v^j| γ_1) (R_1 is an N × N matrix with ψ_1(v^i, v^j| γ_1) in the ith row and jth column) and (R_2)_{ij} = ψ_2(w^i, w^j| γ_2). Denote the response at w by Y(w) = (Y(v^1, w), ..., Y(v^N, w))^T. Conditional on Y(D) and all GP parameters, the mean of Y(w) (posterior mean conditional on all GP parameters) is

$$E[Y(w)\,|\,Y(D)] = \alpha 1_N + \left( Y^* - \alpha 1_N 1_n^T \right) R_2^{-1} r(w), \qquad (11)$$
where $r(w) = \left( \psi_2(w, w^1\,|\,\gamma_2), \ldots, \psi_2(w, w^n\,|\,\gamma_2) \right)^T$, Y^* = (Y(w^1), ..., Y(w^n)), and 1_N is an N × 1 vector of 1's. The covariance matrix of Y(w) conditional on Y(D) and all GP parameters (posterior covariance matrix conditional on all GP parameters) is

$$\mathrm{cov}[Y(w)\,|\,Y(D)] = \sigma_Y^2 R_1 \left[ \psi_2(w,w\,|\,\gamma_2) - r(w)^T R_2^{-1} r(w) \right]. \qquad (12)$$

The value of ψ_2(w,w| γ_2) is one for the correlation function (9), but we write ψ_2(w,w| γ_2) instead of one so that the formula (12) is valid if ψ_2 is replaced with any covariance function. Thus, this formula will be applicable to our proposed covariance function in Section 3.4 as well. To estimate α, σ_Y^2, γ_1, and γ_2, we can use the maximum likelihood
(ML) method. Efficient formulas for computing the ML estimates are given in Equation (A1) of Appendix A. The main reason for adopting the covariance separability assumption in (8) is that it reduces the amount of computation in fitting the GP emulator and obtaining its predictions (Rougier, 2008). The main computational burden is the computation of R^{-1} and |R|. They are obtained with less computational effort via the formulas R^{-1} = R_2^{-1} ⊗ R_1^{-1} and |R| = |R_2|^N |R_1|^n. Note that we also do not need to store the large matrix R^{-1} for computations since (11)-(12) and (A1) are written in terms of R_1^{-1} and R_2^{-1} but not R^{-1}.

When M is simply a discrete index set that indexes the N different outputs of a simulator, we cannot use a covariance function of the form (9) to model the covariance between the outputs. We adopt the approach of Conti and O'Hagan (2010). This approach assumes that the covariance matrix of Y(w) is a positive definite matrix Φ that does not depend on w, which is a special case of the more general class of covariance functions for multiple outputs given in Equation (8) in Álvarez and Lawrence (2011). Since the components of Y(w) can be measured in different physical units or different scales, we assume that the prior mean of Y(w) is E[Y(w)] = A = (α_1, ..., α_N)^T, i.e., each component has a different mean. In this case, the mean and covariance matrix of Y(w) conditional on data Y(D) are as given by (11)-(12) except that α1_N is replaced with A and σ_Y^2 R_1 is replaced with Φ. Efficient formulas for computing the ML estimates of A, Φ, and γ_2 are given in Equation (A2) of Appendix A.
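To make the Kronecker-product computations above concrete, the following Matlab sketch evaluates the posterior mean (11) and covariance (12) using only R_1 and R_2, without ever forming R = R_2 ⊗ R_1 or its inverse. The data, dimensions, and parameter values are illustrative placeholders rather than quantities from the paper, and the implicit expansion used here requires Matlab R2016b or later.

```matlab
% Product Matern 3/2 correlation (9) between the rows of A (na x p) and B (nb x p)
absdiff  = @(A, B, g) abs(permute(A, [1 3 2]) - permute(B, [3 1 2])) ./ permute(g(:)', [1 3 2]);
matern32 = @(A, B, g) prod(exp(-absdiff(A, B, g)) .* (absdiff(A, B, g) + 1), 3);

% Illustrative sizes: N points in M, n design points, d-dimensional input w = (x, z)
N = 50; n = 20; d = 3;
V = linspace(0, 1, N)';               % points v^1, ..., v^N in M (one-dimensional here)
D = rand(n, d);                       % design w^1, ..., w^n (placeholder)
Y = rand(N, n);                       % outputs; column j holds Y(w^j) (placeholder)
gamma1 = 0.2; gamma2 = 0.5*ones(1, d);
alpha  = mean(Y(:)); sigma2 = var(Y(:));   % placeholder GP parameter values

R1 = matern32(V, V, gamma1);          % N x N correlation matrix over M
R2 = matern32(D, D, gamma2);          % n x n correlation matrix over the design

% Prediction at a new input w without forming R = kron(R2, R1) or R^{-1}
w  = rand(1, d);
r  = matern32(D, w, gamma2);          % n x 1 cross-correlation vector r(w)
postMean = alpha + (Y - alpha) * (R2 \ r);                         % equation (11)
postCov  = sigma2 * R1 * (matern32(w, w, gamma2) - r' * (R2 \ r)); % equation (12)
```

Only the n × n system R_2\r is solved at prediction time, which is the computational saving the separability assumption is meant to deliver.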
3 The Principle of Superposition and Proposed GP Emulator
In this section, we first present an IBVP that is more general than the IBVP (5) and includes as special cases linear IBVPs for modeling heat, wave, diffusion, and other phenomena (Coleman, 2013). In Section 3.1, we discuss functional inputs to PDEs. In Section 3.2, a statement of the principle of superposition is given. Section 3.3 modifies the statement to account for nonlinearities. Section 3.4 gives the proposed mean and covariance functions and a procedure for constructing the proposed GP emulator. The general IBVP is

$$L_x u(s,t\,|\,x,z) = e(s,t\,|\,z) \quad \forall (s,t) \in A, \qquad Q_x^j u(s,t\,|\,x,z) = h_j(s,t\,|\,z) \quad \forall (s,t) \in B_j,\ j = 1, \ldots, q,$$
$$e(s,t\,|\,z) = e_0(s,t) + \sum_{i=1}^{k} z_{e_i} e_i(s,t), \qquad h_j(s,t\,|\,z) = h_{j0}(s,t) + \sum_{i=1}^{m_j} z_{h_{ji}} h_{ji}(s,t), \qquad (13)$$
where L_x, Q_x^1, ..., Q_x^q are differential operators that operate on (s,t), s is the spatial coordinate, t is time, and A = S × [0,T], which is the space-time domain. For time independent PDEs, we simply replace (s,t) with s and A with S. The B_j's are subsets of the space-time domain boundary for time dependent PDEs and are subsets of the spatial boundary for time independent PDEs. Notice that unlike (5), the notation in (13) does not distinguish between initial and boundary conditions. Our proposed methodology does not require such a distinction. The IBVP notation (13) accommodates cases where
the spatial boundary is split into multiple pieces with a different boundary condition applied to each piece and cases where more than one initial condition (e.g., the value of the solution and its derivative with respect to time at time 0) is imposed. It includes as special cases all IBVPs with a bounded spatial domain that are associated with second order elliptic, parabolic, and hyperbolic PDEs, as described in Coleman (2013). Specific examples are given in Section 4.
3.1 Functional Inputs to IBVPs
In the PDE (13), the source term e(s,t| z) and initial/boundary condition h_j(s,t| z) are functional inputs. A standard approach to parameterizing these inputs is to write them as truncated series expansions based on some set of basis functions, where variation of the functional inputs is associated with variation in the coefficients. A common approach is to use piecewise constant or piecewise linear bases (e.g., Jia et al. (2004); Collar, Oliveros, Castillo, and Mones (2015)). Indeed, examples in textbooks on PDE modeling often assume that the boundary conditions on edges of a two-dimensional polygon or faces of a three-dimensional polyhedron are constants (Coleman, 2013; MathWorks, 2017), while Dirac delta sources, which can be approximated by piecewise constant sources, are common in applications (Hamdi, 2007). There is a vast number of bases, e.g., B-splines (De Boor, 1978), wavelets (Daubechies, 1992), radial basis functions (Buhmann, 2003) to name a few, that have been developed for function approximation, and these bases can be employed in (13) to model the source term, initial conditions, or boundary conditions. For instance, Muehlenstaedt, Fruth, and Roustant (2017) use B-spline bases to model a controllable functional input, i.e., blankholder force (which is a boundary condition in the PDE model of metal forming given in Jung-Ho and Noboru (1985)), in a simulator of sheet metal forming. Functional inputs modeled as random fields are commonly approximated as a truncated series in uncertainty quantification. A popular truncated series expansion for random fields is the Karhunen-Loève (KL) expansion (Le Maître & Knio, 2010; Gunzburger et al., 2014; Lord et al., 2014). In this expansion, the coefficients of the series are random variables. If the source term is modeled as a square-integrable random field, its truncated KL expansion is exactly of the form $e(s,t\,|\,z) = e_0(s,t) + \sum_{i=1}^{k} z_{e_i} e_i(s,t)$, as given in (13), where e_0 is the mean function, e_i is a normalized eigenfunction of the covariance function of the random field, and the z_{e_i}'s are uncorrelated random variables. The variance of z_{e_i} is λ_i, which is the eigenvalue of the covariance function corresponding to e_i, and λ_1 ≥ λ_2 ≥ ··· ≥ 0. Note that although the z_{e_i}'s are random variables, their values can be chosen and fixed in a computer simulation. Typically, few KL expansion terms are needed to accurately approximate smooth random field inputs (Huang, Quek, and Phoon (2001); Le Maître and Knio (2010), page 25; Gunzburger et al. (2014), page 527).
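As a small illustration of the KL construction (not code from the paper's supplement), the Matlab sketch below forms a truncated KL expansion of a one-dimensional random field on [0, 1] by eigendecomposing its covariance matrix on a grid; the squared-exponential covariance, grid size, and truncation level k are arbitrary choices.

```matlab
ns = 200;                                  % grid on the one-dimensional spatial domain [0, 1]
s  = linspace(0, 1, ns)';
e0 = zeros(ns, 1);                         % mean function e_0(s) (taken to be zero here)
C  = exp(-(s - s').^2 / (2*0.2^2));        % covariance function of the random field (illustrative)

[Phi, Lam] = eig(C / ns);                  % discretized eigenproblem with quadrature weight 1/ns
[lam, idx] = sort(diag(Lam), 'descend');   % eigenvalues lambda_1 >= lambda_2 >= ...
Phi = Phi(:, idx) * sqrt(ns);              % eigenfunctions e_i(s), normalized so the integral of e_i^2 is 1

k = 5;                                     % truncation level
z = randn(k, 1) .* sqrt(lam(1:k));         % coefficients z_ei with variance lambda_i
e = e0 + Phi(:, 1:k) * z;                  % truncated KL realization e(s | z)
plot(s, e);
```

In a computer experiment, the coefficients z would be fixed at design values rather than drawn at random, exactly as noted above.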
3.2 Decomposition of IBVP Solution Based on the Principle of Superposition
In this section, we state the principle of superposition. We shall show that the solution u of (13) can be decomposed as a linear combination of solutions of IBVPs of the same form as (13) with the z_{e_i}'s and z_{h_{ji}}'s as coefficients if (13) is a linear IBVP. We assume for now that (13) is a linear IBVP, i.e., L_x, Q_x^1, ..., Q_x^q are linear differential operators, which means that Q_x^j[a u_1(s,t) + u_2(s,t)] = a Q_x^j u_1(s,t) + Q_x^j u_2(s,t) for a
scalar a and any sufficiently differentiable functions u_1 and u_2 (similarly for L_x). Linear differential operators can be written in the following form (see Page 255 of Han (2011))

$$L_x = \sum_{(k_1,\ldots,k_{p+1}) \in K} \Lambda_{(k_1,\ldots,k_{p+1})}(s,t\,|\,x)\, \frac{\partial^{\,k_1 + \cdots + k_p + k_{p+1}}}{\partial s_1^{k_1} \cdots \partial s_p^{k_p}\, \partial t^{k_{p+1}}}, \qquad (14)$$
where s = (s_1, ..., s_p), k_1, ..., k_{p+1} are nonnegative integers, K is a set of (p+1)-tuples of nonnegative integers, Λ_{(k_1,...,k_{p+1})}(s,t| x) is a matrix function of (s,t,x), and the partial derivative $\partial^{\,k_1+\cdots+k_p+k_{p+1}} \big/ \left( \partial s_1^{k_1} \cdots \partial s_p^{k_p}\, \partial t^{k_{p+1}} \right)$ is applied componentwise to the argument vector-valued function. To decompose the solution of (13) based on the principle of superposition, we shall need the following assumption, which holds for most IBVPs.

Assumption 1: The IBVP (13) has a unique solution for all real values of z_{e_i} and z_{h_{ji}}.

Remark 1: A key result often used to prove uniqueness is that if e and the h_j's are all zero, u = 0 is the unique solution. Uniqueness of the solution has been verified for IBVPs for common PDEs like the Poisson, heat, and wave equations (see Chapter 5 of Pivato (2010)), which are second order elliptic, parabolic, and hyperbolic PDEs respectively. It has also been verified for the equations of linear elasticity (see Pages 233-234 of Slaughter (2002)).

We now decompose the solution u of the linear IBVP (13) based on the principle of superposition. Let u_0(s,t| x) be defined as the unique solution of the linear IBVP:

$$L_x u_0(s,t\,|\,x) = e_0(s,t) \quad \forall (s,t) \in A, \qquad Q_x^j u_0(s,t\,|\,x) = h_{j0}(s,t) \quad \forall (s,t) \in B_j,\ j = 1, \ldots, q. \qquad (15)$$
Let u_{e_i} be defined as the unique solution of the linear IBVP

$$L_x u_{e_i}(s,t\,|\,x) = e_i(s,t) \quad \forall (s,t) \in A, \qquad Q_x^j u_{e_i}(s,t\,|\,x) = 0 \quad \forall (s,t) \in B_j,\ j = 1, \ldots, q. \qquad (16)$$
Let u_{h_{li}} be defined as the unique solution of the linear IBVP

$$L_x u_{h_{li}}(s,t\,|\,x) = 0 \quad \forall (s,t) \in A, \qquad Q_x^j u_{h_{li}}(s,t\,|\,x) = \begin{cases} h_{li}(s,t), & j = l \\ 0, & j \neq l \end{cases} \quad \forall (s,t) \in B_j,\ j = 1, \ldots, q. \qquad (17)$$
Then, by linearity of the IBVP, we have the following proposition, which is a statement of the principle of superposition.

Proposition 1: Suppose Assumption 1 holds. Then, the solution to the linear IBVP (13) is

$$u(s,t\,|\,x,z) = u_0(s,t\,|\,x) + \sum_{i=1}^{k} z_{e_i} u_{e_i}(s,t\,|\,x) + \sum_{j=1}^{q} \sum_{i=1}^{m_j} z_{h_{ji}} u_{h_{ji}}(s,t\,|\,x). \qquad (18)$$
Corollary 1: Suppose y(v, x, z) = η_v(u(·| x,z)), where η_v is a linear map indexed by v, i.e., η_v[a u_1(·) + u_2(·)] = a η_v u_1(·) + η_v u_2(·) for any two functions u_1(·) and u_2(·) with domain A and scalar a. Then,

$$y(v,x,z) = y_0(v,x) + \sum_{i=1}^{k} z_{e_i} y_{e_i}(v,x) + \sum_{j=1}^{q} \sum_{i=1}^{m_j} z_{h_{ji}} y_{h_{ji}}(v,x), \qquad (19)$$
where y_0(v,x) = η_v(u_0(·| x)), y_{e_i}(v,x) = η_v(u_{e_i}(·| x)), and y_{h_{ji}}(v,x) = η_v(u_{h_{ji}}(·| x)).

Remark 2: Depending on the context, v can represent either a space-time (s,t), space s, or time t coordinate, or an arbitrary index of a component of a vector.

Remark 3: Examples of η_v are the average of u(·| x,z) over the spatial domain for time point t = v, and the difference u_v(s_1,t_1| x,z) − u_v(s_2,t_2| x,z), where u_v is a component of u, and (s_1,t_1) and (s_2,t_2) are two fixed space-time points. All the maps mentioned in the first paragraph of Section 2 are linear maps.

Before we end this subsection, we point out that while the principle of superposition is well-known, it has mainly been used together with the method of separation of variables to solve some PDEs (Page 19 of Myint-U and Debnath (2007)) and to prove uniqueness of the solutions to some PDEs (e.g., Page 234 of Slaughter (2002)). It has also been used in some engineering applications (Lindholm et al., 1979). The use of the decompositions (18)-(19) for the purpose of constructing an emulator does not seem to have been considered in the literature.
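The decomposition (18) is easy to check numerically on a toy problem. The Matlab sketch below (illustrative only, not one of the paper's simulators) solves a one-dimensional heat equation u_t = x2 u_ss + e(s|z) with zero initial and boundary conditions by explicit finite differences and verifies that the solution is linear in the source coefficients z.

```matlab
function demo_superposition
  % 1D heat equation u_t = x2*u_ss + e(s|z) on [0,1] with u = 0 on the boundary and u(s,0) = 0
  x2 = 0.1;
  e1 = @(s) sin(pi*s); e2 = @(s) sin(2*pi*s);           % two source basis functions

  z  = [0.7, -0.3];
  uz = solve_heat(x2, @(s) z(1)*e1(s) + z(2)*e2(s));    % solution for the combined source e(s|z)
  u1 = solve_heat(x2, e1);                              % solution u_{e1}
  u2 = solve_heat(x2, e2);                              % solution u_{e2}

  % Principle of superposition: u(.|z) = z1*u_{e1} + z2*u_{e2} (exact up to round-off here)
  fprintf('max deviation from linearity: %.2e\n', max(abs(uz - (z(1)*u1 + z(2)*u2))));
end

function u = solve_heat(x2, efun)
  ns = 51; dt = 1e-4; nt = 2000;                        % grid and explicit Euler time stepping
  s  = linspace(0, 1, ns)'; ds = s(2) - s(1);
  u  = zeros(ns, 1); e = efun(s);
  for it = 1:nt
    lap = [0; (u(1:end-2) - 2*u(2:end-1) + u(3:end)) / ds^2; 0];
    u   = u + dt * (x2 * lap + e);
    u([1 end]) = 0;                                     % homogeneous Dirichlet boundary condition
  end
end
```

Because this finite-difference scheme is itself linear in the source term, the reported deviation is at the level of round-off; an adaptive solver or a nonlinear PDE would instead produce the small but nonzero residual introduced in the next subsection.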
3.3 Accounting for Nonlinearity
In this section, we introduce a modified version of the decomposition (19) for y (which includes the special case of the solution u itself by taking η_v(u(·| x)) = u(v| x) and (s,t) = v) that accounts for nonlinearity due to numerical errors and the inherent nonlinearity in solutions of nonlinear IBVPs. In the next section, we propose a GP emulator with mean and covariance functions developed based on the modified decomposition, which can be applied to emulate linear maps of linear and nonlinear IBVPs solved by time consuming simulators. In theory, (19) is exact if (13) is a linear IBVP. However, we have found that the effect of z may not be exactly linear for fixed v and x, i.e., (19) may not hold, if y, y_0, y_{e_i}, and y_{h_{ji}} are replaced with numerical outputs from an FE simulator. The deviation from linearity can be noticeable if the IBVP is not solved to high accuracy. It can be attributed to the use of different spatial meshes and/or time grids to solve the IBVP at different values of the inputs (x,z) due to the use of an adaptive numerical procedure. Thus, in practice, we would observe that

$$y(v,x,z) = y_0(v,x) + \sum_{i=1}^{k} z_{e_i} y_{e_i}(v,x) + \sum_{j=1}^{q} \sum_{i=1}^{m_j} z_{h_{ji}} y_{h_{ji}}(v,x) + \varepsilon(v,x,z), \qquad (20)$$
where ε(v,x,z) is a residual term that is usually small in magnitude. This modified decomposition also holds for nonlinear IBVPs of the form (13), where either L_x or some Q_x^j is a nonlinear operator. For these nonlinear IBVPs, y_0, y_{e_i}, and y_{h_{ji}} in (20) are similarly defined by (15)-(17) and Corollary 1, but the operators L_x and Q_x^j may now be nonlinear. For many practical problems, a standard linear IBVP model (13) for the problem (e.g., linear heat or wave equation) is made nonlinear by adding more realistic modeling elements. A common example is to replace the constant thermal conductivity in the heat equation (1) with a thermal conductivity that depends on temperature u. For instance, Park and Cho (1996) assume the thermal conductivity is a quadratic function of u. Other examples can be found in Examples 1 and 2 of Section 4. In these cases, ε(v,x,z) accounts for the combined effect of nonlinearity due to the mathematical form of the IBVP and nonlinearity due to numerical error.
Clearly, the model (20) carries more information about the behavior of y as z changes when the residual term is smaller. Numerical errors are often quite small for time consuming simulators as the simulator is time consuming precisely because it attempts to solve the IBVP to high accuracy. Thus, for a simulator of a nonlinear IBVP, ε (v, x, z) is likely dominated by nonlinear effects due to the mathematical form of the IBVP. However, these nonlinear effects are often not as large as the linear effects of the components of z when the nonlinear IBVP is obtained by adding more realistic modeling elements to a widely used low-fidelity linear IBVP model, which is the reason that the low-fidelity linear IBVP model is widely used in the first place. In these cases, the solution of the nonlinear IBVP changes approximately linearly with z and the residual term in (20) has small magnitude compared to the linear effects of z.
3.4 Proposed GP Emulator
Based on the model (20), we propose prior mean and covariance functions to construct an effective GP emulator for a linear map of the solution of the IBVP (13) denoted generically by y(v,x,z). We assume that the design region X for (x,z) is a hyperrectangle and the components of z are standardized so that each component has a range of [−1,1]. We let x_0 denote the center of the design region for x. Referring to (20), we assign y_0(v,x) + ε(v,x,z), each y_{e_i}(v,x), and each y_{h_{ji}}(v,x) an independent GP prior denoted by Y_{0+ε}(v,x,z), Y_{e_i}(v,x), and Y_{h_{ji}}(v,x) respectively. We let the prior mean functions of these GPs be estimates ŷ_0(v,x_0), ŷ_{e_i}(v,x_0), and ŷ_{h_{ji}}(v,x_0) of y_0(v,x_0), y_{e_i}(v,x_0), and y_{h_{ji}}(v,x_0) respectively. Note that we let ε(v,x,z) have a prior mean of zero. Thus, the GP prior Y(v,x,z) for y(v,x,z), obtained by substituting y_0 + ε, y_{e_i}, and y_{h_{ji}} in (20) with their GP priors, has mean function:

$$E[Y(v,x,z)] = \hat{y}_0(v,x_0) + \sum_{i=1}^{k} z_{e_i} \hat{y}_{e_i}(v,x_0) + \sum_{j=1}^{q} \sum_{i=1}^{m_j} z_{h_{ji}} \hat{y}_{h_{ji}}(v,x_0). \qquad (21)$$
We can estimate y_0(v,x_0), y_{e_i}(v,x_0), y_{h_{ji}}(v,x_0) by using the simulator output data at specially chosen design points in the design region X. By definition (see (15)), y_0(v,x_0) is the response at the point (x,z) = (x_0, 0) in the experiment region, which we take as ŷ_0(v,x_0), i.e., ŷ_0(v,x_0) = y_0(v,x_0). In addition, (20) implies that y_0(v,x_0) + y_{e_l}(v,x_0) is approximately equal (exactly equal if ε = 0) to the response at the point given by x = x_0, z_{e_l} = 1, and all other z_{e_i}'s and z_{h_{ji}}'s equal to 0, i.e., (x,z) = (x_0, 0, ..., 0, z_{e_l} = 1, 0, ..., 0). Thus, our estimate ŷ_{e_l}(v,x_0) of y_{e_l}(v,x_0) is obtained by subtracting the output at (x,z) = (x_0, 0) from the output at (x,z) = (x_0, 0, ..., 0, z_{e_l} = 1, 0, ..., 0). We estimate y_{h_{ji}}(v,x_0) in a similar fashion. Thus, estimation of the mean function (21) requires 1 + k + m_1 + ··· + m_q runs of the simulator. The design runs for this purpose are given by the rows of the design matrix

$$D_1 = \begin{pmatrix} x_0^T \otimes 1_{1+d_z} & \begin{pmatrix} 0_{d_z}^T \\ I_{d_z} \end{pmatrix} \end{pmatrix}, \qquad (22)$$

where d_z = k + m_1 + ··· + m_q, 1_{1+d_z} is a (1 + d_z) × 1 vector of 1's, 0_{d_z} is a d_z × 1 vector of 0's, and I_{d_z} is a d_z × d_z identity matrix. Note that D_1 is a one-factor-at-a-time design for z given fixed x = x_0. We append to D_1 a maximin Latin hypercube design (LHD) D_2, which is a commonly used design for constructing GP emulators. Typically, the size (number of rows) of D_2 far exceeds the size of D_1. In the examples in Section 5, we choose the size of D_2 so that the size of the combined design is 7d, where d is d_z plus the dimension of x. Note that D_1 is of size 1 + d_z ≤ d, if we assume x has at least one component.
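A minimal Matlab sketch of how D_1 in (22) and an appended maximin LHD D_2 might be generated is given below; lhsdesign requires the Statistics and Machine Learning Toolbox, and the dimensions and input ranges are placeholders.

```matlab
dx = 2; dz = 4;                           % dimensions of x and z (placeholders)
d  = dx + dz;                             % total input dimension
xlo = [400, 0.6]; xhi = [600, 2.1];       % experiment ranges for x (placeholders)
x0  = (xlo + xhi) / 2;                    % center of the design region for x

% One-factor-at-a-time design D1 for z with x fixed at x0, as in (22)
D1 = [repmat(x0, 1 + dz, 1), [zeros(1, dz); eye(dz)]];

% Maximin LHD D2 chosen so that the combined design has 7*d runs
n2 = 7*d - (1 + dz);
L  = lhsdesign(n2, d, 'criterion', 'maximin');                % points in [0, 1]^d
D2 = [xlo + L(:, 1:dx) .* (xhi - xlo), 2*L(:, dx+1:end) - 1]; % x to its ranges, z to [-1, 1]

D = [D1; D2];                             % combined design; each row is one run (x, z)
```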
Our choice of covariance function for Y(v,x,z) is motivated by the goal of achieving a parsimoniously parameterized and separable (with respect to v and (x,z)) covariance function that generates sample paths that are approximately linear in z conditional on (v,x) and allows for weighing of the importance of the linear effect of each component of z. We use covariance functions that generate continuous sample paths since we assume that y(v,x,z) is continuous in (x,z), while y_0(v,x), y_{e_i}(v,x), and y_{h_{ji}}(v,x) are continuous in x, and when v is a continuous variable, y, y_0, y_{e_i}, y_{h_{ji}} are continuous in v. These assumptions and (20) imply that ε(v,x,z) is continuous also. For given η_v, continuity of these functions is easy to establish if it is a fact that u(s,t| x,z), u_0(s,t| x), u_{e_i}(s,t| x), and u_{h_{ji}}(s,t| x) are continuous functions (an assumption which holds for most IBVPs). For instance, if y(v,x,z) = η_v(u(·| x,z)) is the average of u(·| x,z) over the spatial domain for time point v, then we may show that y(v,x,z) is a continuous function of (v,x,z) by applying the dominated convergence theorem.

Let us consider the case where v is a continuous variable. The use of the same correlation function over the (v,x) space for Y_{0+ε}(v,x,z), Y_{e_i}(v,x), and Y_{h_{ji}}(v,x) seems justifiable. This is because y_0(v,x) (which is usually more dominant than ε(v,x,z)), y_{e_i}(v,x) and y_{h_{ji}}(v,x) are each obtained by applying the same linear map η_v to the solution of an IBVP with the same operator L_x appearing in the PDE and the same operators Q_x^j in the initial and boundary conditions (see (15)-(17)). This intuitive argument is given a more concrete meaning in Appendix B where we justify the use of the same covariance function up to a constant multiple to model each of the solutions u_{e_i}(s,t| x), i = 1, ..., k corresponding to the IBVP (1)-(3) by assuming that each e_i/σ_{e_i} is an independent draw from the same GP, where σ_{e_i} is a positive constant. Similar comments apply to the u_{h_{1i}}(s,t| x)'s and the u_{h_{2i}}(s,t| x)'s. However, it seems difficult to use this approach to justify the use of the same covariance function up to a constant multiple for all u_0, u_{e_i}, u_{h_{1i}}, and u_{h_{2i}}. In light of the above discussion and the preference for separable correlation functions to increase computational efficiency, we specify a product form correlation function φ_1(v,v'| θ_1) φ_2(x,x'| θ_2) for the GP priors Y_{e_i}(v,x) and Y_{h_{ji}}(v,x), where θ_1 and θ_2 are parameters of the correlation functions. We assume the GP prior Y_{0+ε}(v,x,z) for y_0(v,x) + ε(v,x,z) has correlation function φ_1(v,v'| θ_1) φ_2(x,x'| θ_2) φ_3(z,z'| θ_3), where the correlations over the (v,x) space are given by the same correlation function as for Y_{e_i}(v,x) and Y_{h_{ji}}(v,x). Finally, we let the variances of Y_{0+ε}(v,x,z), Y_{e_i}(v,x), and Y_{h_{ji}}(v,x) be constants denoted by σ_0^2, σ_{e_i}^2 and σ_{h_{ji}}^2. The assumptions given in this paragraph together with the assumption of independence of Y_{0+ε}(v,x,z), Y_{e_i}(v,x), and Y_{h_{ji}}(v,x) give the following covariance function for the GP prior Y(v,x,z) of y(v,x,z):

$$\mathrm{cov}[Y(v,x,z), Y(v',x',z')] = \mathrm{cov}[Y_{0+\varepsilon}(v,x,z), Y_{0+\varepsilon}(v',x',z')] + \sum_{i=1}^{k} z_{e_i} z'_{e_i}\, \mathrm{cov}[Y_{e_i}(v,x), Y_{e_i}(v',x')] + \sum_{j=1}^{q} \sum_{i=1}^{m_j} z_{h_{ji}} z'_{h_{ji}}\, \mathrm{cov}[Y_{h_{ji}}(v,x), Y_{h_{ji}}(v',x')]$$
$$= \phi_1(v,v'\,|\,\theta_1)\, \phi_2(x,x'\,|\,\theta_2) \left[ \sigma_0^2 \phi_3(z,z'\,|\,\theta_3) + \sum_{i=1}^{k} z_{e_i} z'_{e_i} \sigma_{e_i}^2 + \sum_{j=1}^{q} \sum_{i=1}^{m_j} z_{h_{ji}} z'_{h_{ji}} \sigma_{h_{ji}}^2 \right]$$
$$= \sigma_0^2\, \phi_1(v,v'\,|\,\theta_1)\, \phi_2(x,x'\,|\,\theta_2) \left[ \phi_3(z,z'\,|\,\theta_3) + \sum_{i=1}^{k} z_{e_i} z'_{e_i} \omega_{e_i}^2 + \sum_{j=1}^{q} \sum_{i=1}^{m_j} z_{h_{ji}} z'_{h_{ji}} \omega_{h_{ji}}^2 \right], \qquad (23)$$
where ω_{e_i}^2 = σ_{e_i}^2/σ_0^2 and ω_{h_{ji}}^2 = σ_{h_{ji}}^2/σ_0^2. Note that the parameters of this covariance function are θ_1, θ_2, θ_3, σ_0^2, ω_{e_i}^2, i = 1, ..., k, and ω_{h_{ji}}^2, i = 1, ..., m_j, j = 1, ..., q, which shall be estimated from data. By setting ω_{e_i}^2 = 0, we can remove the linear trend for z_{e_i}, and by setting ω_{h_{ji}}^2 = 0, we can remove the linear trend for z_{h_{ji}} if the effects of these variables are negligible. In this paper, we let both φ_1 and φ_2 be product Matérn correlation functions with smoothness parameter 3/2, as given in (9), since this often gives a far better conditioned prior covariance matrix than the popular Gaussian correlation function. For the case where v is simply an index for multiple outputs, we replace φ_1(v,v'| θ_1) in (23) with an element of a positive definite covariance matrix Φ, as in the Conti and O'Hagan (2010) approach discussed in Section 2. This assumes that cov[Y_{0+ε}(v,x,z), Y_{0+ε}(v',x,z)] does not depend on (x,z) (similarly for Y_{e_i}(v,x) and Y_{h_{ji}}(v,x)). To keep the number of parameters controlling the nonlinear effects of z low, we set φ_3 equal to the isotropic Matérn correlation function

$$\phi_3(z,z'\,|\,\theta_3) = \exp\!\left(-\|z - z'\|_2/\theta_3\right)\left(\|z - z'\|_2/\theta_3 + 1\right). \qquad (24)$$
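For concreteness, here is a Matlab sketch of evaluating the linear-Matérn covariance (23) between two inputs, with Matérn 3/2 correlations for φ1 and φ2 (v taken to be scalar) and the isotropic form (24) for φ3; all parameter values below are illustrative, not estimates from the paper.

```matlab
% Matern 3/2 building blocks
matern1d   = @(h, th) exp(-abs(h)./th) .* (abs(h)./th + 1);  % one-dimensional correlation
maternprod = @(a, b, th) prod(matern1d(a - b, th));          % product form, as in (9)
materniso  = @(a, b, th) matern1d(norm(a - b), th);          % isotropic form (24)

% Linear-Matern covariance (23) between (v, x, z) and (vp, xp, zp)
lincov = @(v, x, z, vp, xp, zp, th1, th2, th3, s0sq, omsq) ...
    s0sq * matern1d(v - vp, th1) * maternprod(x, xp, th2) * ...
    (materniso(z, zp, th3) + (z .* zp) * omsq(:));

% Example evaluation (all values illustrative)
th1 = 0.1; th2 = [50, 0.5]; th3 = 1.5; s0sq = 1;
omsq = [2, 1.5, 0.5, 0.8];                                   % weights omega^2 for the 4 z components
c = lincov(0.3, [450, 1.0], [ 0.2, -0.4, 0.1, 0.6], ...
           0.5, [500, 1.5], [ 0.1,  0.3, -0.2, 0.4], th1, th2, th3, s0sq, omsq);
```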
As in Section 2, y(v,x,z), v ∈ M, is the simulator output. Suppose v is a continuous variable. Then, given the data (10), the posterior mean and covariance functions conditioned on all GP parameters are given by (11) and (12) respectively, and the ML estimates of the covariance parameters are given by the formulas in Appendix A with the following changes:

1. Change (8) to (23): replace ψ_1(v,v'| γ_1) with φ_1(v,v'| θ_1), replace ψ_2(w,w'| γ_2) with $\phi_2(x,x'\,|\,\theta_2)\left[\phi_3(z,z'\,|\,\theta_3) + \sum_{i=1}^{k} z_{e_i} z'_{e_i}\omega_{e_i}^2 + \sum_{j=1}^{q}\sum_{i=1}^{m_j} z_{h_{ji}} z'_{h_{ji}}\omega_{h_{ji}}^2\right]$, and change σ_Y^2 to σ_0^2.

2. Change the mean α or α̂ to (21): change α1_N to U(1, z)^T and α1_N 1_n^T to $U \begin{pmatrix} 1, z^1 \\ \vdots \\ 1, z^n \end{pmatrix}^{T}$ respectively, where U is a matrix with rows (ŷ_0(v,x_0), ŷ_{e_1}(v,x_0), ..., ŷ_{h_{q m_q}}(v,x_0)), v ∈ M, and z^1, ..., z^n are the values of z at the design points.

The case where v is an arbitrary indexing variable is handled similarly. Below, we summarize the procedure for building the proposed GP emulator.

Algorithm for Constructing Proposed Emulator

1. Define the experiment ranges for x. Rewrite the functional inputs e and h_j such that z ∈ [−1,1]^{d_z}. Note that the e_i's in (13) will be modified when the z_{e_i}'s are recentered and rescaled. Similar comments apply to the centering and scaling of the z_{h_{ji}}'s.

2. Run the simulator at the points in design D_1 given in (22), where x_0 is the midpoint of the range of x. Compute ŷ_0(v,x_0), the ŷ_{e_i}(v,x_0)'s, and the ŷ_{h_{ji}}(v,x_0)'s based on the simulator outputs. Note that ŷ_0 is obtained from the output at (x,z) = (x_0, 0). The sum ŷ_0(v,x_0) + ŷ_{e_i}(v,x_0) is obtained from the output at x = x_0, z_{e_i} = 1 and all other components of z equal to zero (similarly for ŷ_0(v,x_0) + ŷ_{h_{ji}}(v,x_0)). Then, get all ŷ_{e_i}(v,x_0)'s and all ŷ_{h_{ji}}(v,x_0)'s by postprocessing the data (a short sketch of this postprocessing follows the algorithm). This gives the mean function (21).

3. Generate a maximin LHD D_2 of size n − d_z − 1. Estimate the parameters of the covariance function (23) with the combined data obtained from the experimental runs specified by D_1 and D_2.
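Step 2 of the algorithm amounts to simple differencing of the D_1 outputs. A Matlab sketch is given below; Y1 is a placeholder for the simulator outputs at the rows of D_1, one column per run and ordered as in (22).

```matlab
N = 301; dz = 4;                        % illustrative sizes
Y1 = rand(N, 1 + dz);                   % placeholder: outputs at the 1 + dz rows of D1

yhat0 = Y1(:, 1);                       % estimate of y_0(v, x0), v in M (run at (x0, 0))
Yhat  = Y1(:, 2:end) - yhat0;           % column l estimates the linear effect of the l-th component of z
U     = [yhat0, Yhat];                  % matrix U used to express the prior mean (21)

priorMean = @(z) U * [1; z(:)];         % prior mean (21) at any coefficient vector z (dz x 1)
```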
Before we end this section, we remark that it is not prudent to treat ŷ_0(v,x_0), ŷ_{e_i}(v,x_0), ŷ_{h_{ji}}(v,x_0), v ∈ M in (21) as parameters and estimate them using the ML method. Since the prior mean function is not separable with respect to v and (x,z), we cannot utilize Kronecker product matrix algebra formulas to obtain computationally efficient formulas for the ML estimates of the mean parameters, unlike the models in Section 2. Moreover, we may need to account for uncertainty in the mean parameters since there can be a large number of them, and it seems that doing so would require us to use GP priors for ŷ_0(v,x_0), ŷ_{e_i}(v,x_0), and ŷ_{h_{ji}}(v,x_0) since they are continuous functions of v when v is a continuous variable.
4 Three Examples of IBVPs
This section gives three examples of IBVPs of the form (13) that often need to be solved numerically in engineering applications. Numerical simulation results for all three IBVP examples shall be given in Section 5. In all three examples, a Cartesian coordinate system is employed as the spatial coordinate system. Many other IBVPs with widespread applications can be found in the literature on PDE models (for example, see Jeffrey (2003)).
4.1 Example 1: The Telegraph Equation
The first example is the telegraph equation (Page 36 of Pivato (2010)), which can be used as a model of damped vibration of a membrane (Chapter 6 and Chapter 13 of Rao (2007)). Specifically, in a damped vibration of a membrane with plane geometry S = [0, L]^2 clamped at the edges, the transverse displacement u is a function of the Cartesian coordinate (s_1, s_2) of the point in S and time t. It is given by the solution of the telegraph equation

$$\left[ \chi \frac{\partial^2}{\partial t^2} - \sum_{i=1}^{2} \frac{\partial}{\partial s_i}\left( \chi_i \frac{\partial}{\partial s_i} \right) + x_2 \frac{\partial}{\partial t} \right] u = e \quad \forall s \in S,\ t \in [0,T], \qquad (25)$$

where χ is the constant areal density of the membrane, i.e., mass per unit area, and x_2 ∂u/∂t represents the viscous damping force. Note that for simplicity of notation, we have omitted the arguments of the functions u and e. In classical models, χ_i represents the constant membrane tension per unit length in direction s_i. To correct for the increase in tension due to stretching during vibration, Mianroodi, Niaki, Naghdabadi, and Asghari (2011) show that

$$\chi_i = x_1 + HE\left[ \int_0^L \sqrt{1 + \left( \frac{\partial u}{\partial s_i} \right)^2}\, ds_i - L \right] \Big/ L,$$

where x_1 is the initial tension per unit length, E is the Young's modulus of the membrane, and H is the membrane thickness (Equation (12) of Mianroodi et al. (2011) follows from (25) using the fact that ∂χ_i/∂s_i = 0). This makes (25) a nonlinear PDE. The initial conditions are

$$u = h_1, \quad \frac{\partial}{\partial t} u = h_2 \quad \forall s \in S,\ t = 0. \qquad (26)$$
Since the membrane is clamped at the edges, the boundary condition is u = 0 ∀s ∈ ∂S, t ∈ (0,T]. The functional inputs are e, h_1, and h_2, where e is the external transverse force per unit area, h_1 represents the initial transverse displacement, and h_2 represents the
initial transverse velocity. It is conceivable that in some applications, there is uncertainty in x_1 and x_2, and one wants to study the effect of changing the initial conditions (by changing z) on the resulting membrane vibration. In these cases, it is of interest to quantify changes in u due to changes in (x,z). Note that (25)-(26) and the boundary condition are of the form (13), where the expression for L_x should be clear from (25), and Q_x^1 = I, Q_x^2 = ∂/∂t, and Q_x^3 = I.
4.2 Example 2: Steady-State Heat Equation for Three-Dimensional Solid Body
A PDE model for the steady state temperature of a three-dimensional solid body is

$$-\left\{ \sum_{i=1}^{3} \frac{\partial}{\partial s_i} \left[ (x_1 + \chi) \frac{\partial}{\partial s_i} \right] \right\} u = e \quad \forall s \in S, \qquad (27)$$

where (s_1, s_2, s_3) is the spatial coordinate, x_1 + χ is the thermal conductivity of the solid body, and χ is a function dependent on u. The functional form of the thermal conductivity is motivated by consideration of the thermal conductivity of AZ31 and AZ61 magnesium alloys reported in Lee, Ham, Kwon, Kim, and Suh (2013), where the measured thermal conductivity versus temperature curves of the two alloys plot like parallel lines. Assuming that S is a polyhedron, three types of boundary conditions can be imposed on the boundary faces B_1, ..., B_q of the body (each face is a polygon). First, the Dirichlet boundary condition that gives the face temperature:

$$u = h_j \quad \forall s \in B_j,\ j = 1, \ldots, q_1. \qquad (28)$$
Second, the Neumann boundary condition representing heat flux through a face:

$$(x_1 + \chi) \sum_{i=1}^{3} N_{ij} \frac{\partial}{\partial s_i} u = h_j \quad \forall s \in B_j,\ j = q_1+1, \ldots, q_2, \qquad (29)$$
where (N_{1j}, N_{2j}, N_{3j}) is the outward unit normal vector at face j. Third, the Robin boundary condition giving the amount of heat flow per unit area via convection and radiation through a face:

$$(x_1 + \chi) \sum_{i=1}^{3} N_{ij} \frac{\partial}{\partial s_i} u + \chi_2 (u - u_\infty) + \chi_3 \left( u^4 - u_\infty^4 \right) = 0 \quad \forall s \in B_j,\ j = q_2+1, \ldots, q, \qquad (30)$$
where χ_2 is the convective heat transfer coefficient, u_∞ is the ambient temperature, and χ_3 is the Stefan–Boltzmann constant (5.6704 × 10^{-8} W·m^{-2}·K^{-4}) if we assume that the solid body is a black body. The quantity χ_2(u − u_∞) is the amount of heat transferred by convection to the surrounding atmosphere, and χ_3(u^4 − u_∞^4) is the amount of heat transferred by radiation to the other solid surfaces in the surroundings. While (27)-(30) is a nonlinear IBVP given a nonconstant χ or a nonzero χ_3, it is often linearized so that it is more easily solved (e.g., see Chapter 4 of Cengel and Ghajar (2014)). Linearization involves setting χ = 0 and x_1 to be a typical value of the thermal conductivity, and neglecting radiation heat transfer by setting χ_3 = 0. This makes (30) a linear boundary condition with a fixed right-hand side χ_2 u_∞.
The functional input e is the internal heat generated. The functional inputs h_j, j = 1, ..., q_1 each represents the temperature at a face, and the inputs h_j, j = q_1+1, ..., q_2 each represents the heat flux through a face. In some applications, due to uncertainty in the boundary conditions (Norris, 1998), it is of interest to evaluate the temperature in the solid body over a range of boundary conditions. The nonlinear IBVP (27)-(30) can be time consuming to solve for a three-dimensional body of complex geometry, and the transient version (a generalization of (1)-(3)) is many times more time consuming to solve.
4.3 Example 3: Lamé-Navier Equations of Linear Elasticity
The Lamé-Navier equations are three scalar PDEs that can be solved to obtain the three scalar displacement fields in the theory of linear elasticity (Chapter 6 of Slaughter (2002)). These equations are used to model small elastic deformations of solid structures. The equations together with displacement and traction boundary conditions form a linear IBVP of the form (13), where the operators L_x, Q_x^j are given in Appendix C. For this IBVP, x = (x_1, x_2) is a vector of material properties, where x_1 denotes the Young's modulus and x_2 denotes the Poisson's ratio, A = S defines the solid body's geometry before deformation, and u is the 3 × 1 vector of displacements in the directions of the three Cartesian axes, which are functions of the spatial coordinates s = (s_1, s_2, s_3). Once the solution to the Lamé-Navier equations has been obtained, the stress and strain fields within the solid body can be easily obtained via other equations (Chapter 6 of Slaughter (2002)). These equations of linear elasticity are widely used in engineering design of structures in practice. The source term e in (13) represents the body force per unit volume, which can be due to gravity or electromagnetic forces; h_j is either the displacement or the traction (force per unit area applied to the surface) at the boundary face B_j, depending on whether Q_x^j is given by (C2) or (C3). In many cases in the design of building structures, there is uncertainty in the traction and displacement boundary conditions since the loading of the structure and the rigidity of its supports are subject to uncertainty. Moreover, gravity body force represented by the source term e is always present and this force can be uncertain if the density of the material is not precisely known. There is also uncertainty in the Young's modulus x_1 and the Poisson's ratio x_2 since these material properties are subject to manufacturing variation. When the solid body geometry is complex, the number of nodes employed in the FE method to discretize the solid body to solve the linear IBVP given by the Lamé-Navier equations and boundary conditions needs to be large. Due to this and the fact that the Lamé-Navier equations represent a system of three scalar PDEs, the use of the FE method to solve the linear IBVP can yield a system of linear algebraic equations that is very large. Thus, the Lamé-Navier equations can be very time consuming and memory intensive to solve.
5 Numerical Examples
This section gives three numerical examples to demonstrate the improvements that can be achieved with the proposed GP emulator. The three numerical examples correspond to the three IBVPs in Section 4. In all cases, Matlab PDE Toolbox is employed to solve the IBVP with the FE method. This requires generation of a mesh that partitions the
spatial domain S with triangles/tetrahedrons. The mesh also consists of selected points in each triangle/tetrahedron called nodes. We choose the maximum mesh edge length parameter (maximum length of the edges of the triangles or tetrahedrons in the mesh) used by Matlab to generate a mesh by decreasing it until changes in the response at a few input sites are negligible. All simulations are performed on an HP Z640 workstation with 64GB RAM and Intel Xeon E5-2650 v4 @ 2.2GHz processor with 12 cores. Two GP models are compared in each example. The first is the stationary and separable GP model in Section 2, which we call the standard GP model. For functional responses, it has constant mean and product Matérn correlation function of the form (9), and for multiple nonfunctionally related scalar responses, it has a distinct constant mean for each response and a constant covariance matrix for the responses over the input space. The second is a GP model with mean and covariance functions (21) and (23), with φ_2 (and φ_1 if v is continuous) of product Matérn form (9) and φ_3 of isotropic Matérn form (24), which we call the conditional linear GP model. We call the covariance function (23) the linear-Matérn covariance function. In searching for the ML estimates in the examples, we use the patternsearch solver in the Global Optimization Toolbox of Matlab.
5.1 Example 1: Vibration of a Membrane Simulated with the Telegraph Equation
In this example, we study the performance of the proposed GP model for predicting the transverse vibration of the point (1, 1) on a square membrane of side length 2, i.e., S = [0, 2]^2, over the time interval [0, T] = [0, 1.25]. The model is the nonlinear IBVP given in (25)-(26) in Section 4.1 with χ_i as given by Mianroodi et al. (2011). Note that throughout this example, SI units are employed for all physical quantities. The membrane thickness is 0.001. The membrane material is rubber, which has areal density χ = 1.1 and Young's modulus E = 10^7. The initial tension x_1 and the coefficient of viscous damping force x_2 are varied in the simulations. The transverse force is e = 0. The initial velocity is h_2 = 0 and the initial displacement h_1 is expressed in terms of tensor Bernstein bases of degree three in each spatial coordinate, i.e., (s_1/2)^{i_1} (1 − s_1/2)^{3−i_1} (s_2/2)^{i_2} (1 − s_2/2)^{3−i_2}, where i_1, i_2 ∈ {0, 1, 2, 3}. This is a natural choice of bases because we can then fix the displacement at the boundary ∂S to be identically zero, which is a condition that must be satisfied by any valid initial displacement as the membrane is clamped at the edges, by simply setting the coefficients of 12 of the 16 bases to zero. A linear combination of these Bernstein bases is zero on ∂S if and only if the coefficient of any of the bases with i_1 ∈ {0, 3} or i_2 ∈ {0, 3} is zero. Thus, the initial displacement is

$$h_1(s\,|\,z) = 0.1 z_{h_{11}} (s_1/2)(1 - s_1/2)^2 (s_2/2)(1 - s_2/2)^2 + 0.1 z_{h_{12}} (s_1/2)^2 (1 - s_1/2)(s_2/2)(1 - s_2/2)^2 + 0.1 z_{h_{13}} (s_1/2)(1 - s_1/2)^2 (s_2/2)^2 (1 - s_2/2) + 0.1 z_{h_{14}} (s_1/2)^2 (1 - s_1/2)(s_2/2)^2 (1 - s_2/2).$$

The range over which each of the six components of the input (x, z) is varied in the experiment is given in Table 2. The response for given (x, z) is y(v,x,z) = u(1, 1, v| x,z), v ∈ {0, 1.25/300, ..., 1.25}, i.e., the displacement at the spatial coordinate (1, 1) at 301 equally spaced time points in the interval [0, 1.25]. The IBVP is solved using an FE mesh that
consists of 1250 congruent right triangles. The absolute and relative tolerances of the ordinary differential equation solver used by the Matlab hyperbolic function are set to 10^{-9} and 10^{-6} respectively.

Table 2: Ranges of inputs varied in computer experiment, Example 1

Variable Input   Minimum   Maximum
x1               400       600
x2               0.6       2.1
zh11             -1        1
zh12             -1        1
zh13             -1        1
zh14             -1        1
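As a small check of the basis construction above, the initial displacement h1(s|z) can be evaluated and plotted as follows (a Matlab sketch with arbitrary coefficient values, not part of the paper's experiment):

```matlab
B  = @(s, i) (s/2).^i .* (1 - s/2).^(3 - i);            % Bernstein basis of degree 3 on [0, 2]
h1 = @(s1, s2, z) 0.1 * ( z(1)*B(s1,1).*B(s2,1) + z(2)*B(s1,2).*B(s2,1) + ...
                          z(3)*B(s1,1).*B(s2,2) + z(4)*B(s1,2).*B(s2,2) );

[s1, s2] = meshgrid(linspace(0, 2, 101));
z = [0.5, -1, 0.3, 0.8];                                % arbitrary coefficients in [-1, 1]
surf(s1, s2, h1(s1, s2, z), 'EdgeColor', 'none');       % h1 vanishes on the boundary of S
```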
The design D_1 in (22) consists of five points. A maximin LHD of size 37 points is employed as the design D_2. Thus, the combined design is of size 42. We use a maximin LHD of size 60 points as the test set. The average time taken to solve the IBVP (25)-(26) at the points in the test set on our (powerful) workstation is calculated to be 716 seconds, i.e., about 12 minutes per simulation run. Thus, the use of an emulator is highly recommended for problems that would otherwise require the IBVP to be solved many times, especially on a less powerful workstation. The responses for the last three design points are plotted in Figure 2. It is seen that there is significant variation in the vibration pattern as the inputs vary.
Figure 2: Plot of responses for last three design points, Example 1

The standard and proposed GP models are fitted with the combined design and compared. The proposed linear-Matérn covariance function (23) is given explicitly as

$$\sigma_0^2 \left[ \exp(-|v - v'|/\theta_1)(|v - v'|/\theta_1 + 1) \right] \prod_{i=1}^{2} \exp(-|x_i - x_i'|/\theta_{2i})(|x_i - x_i'|/\theta_{2i} + 1) \left[ \exp(-\|z - z'\|_2/\theta_3)(\|z - z'\|_2/\theta_3 + 1) + \sum_{i=1}^{4} z_{h_{1i}} z'_{h_{1i}} \omega_{h_{1i}}^2 \right].$$
The ML estimates of the parameters are σ̂_0^2 = 7.938 × 10^{-19}, θ̂_1 = 0.1063, θ̂_{21} = 0.03535, θ̂_{22} = 3.171, θ̂_3 = 1.604 × 10^{-27}, ω̂_{h_{11}}^2 = ω̂_{h_{13}}^2 = exp(30), ω̂_{h_{12}}^2 = 0.8132 exp(30), and ω̂_{h_{14}}^2 = 0.7810 exp(30). Note that σ̂_0^2 and θ̂_3 are very small while ω̂_{h_{11}}^2 and ω̂_{h_{13}}^2 are equal to the upper bound exp(30) we set in the numerical optimization of the log likelihood function. This indicates that cov[Y_{0+ε}(v,x,z), Y_{0+ε}(v',x',z')] ≈ 0 (refer to (23)). Since we know y_0(v,x) = 0 in (20) (which is easy to show) and Y_{0+ε}(v,x,z) is the GP prior for y_0(v,x) + ε(v,x,z), the ML estimates suggest that the residual ε(v,x,z) in (20), which is supposed to capture deviation from linearity, is very small.

Table 3: MAE, MSE, APIL, and coverage (of nominal 98% prediction intervals) for the two GP models, Example 1

GP Model             MAE               MSE               APIL              Coverage
Conditional Linear   9.739 × 10^{-5}   2.086 × 10^{-8}   3.342 × 10^{-3}   1
Standard             27.274 × 10^{-5}  18.648 × 10^{-8}  9.099 × 10^{-3}   1
Figure 3: Plot of point predictions and nominal 98% prediction intervals for the conditional linear (a), (c) and standard (b), (d) GP models together with the true response at two points in the test set; plots (a) and (b) are for the same point and similarly for plots (c) and (d).

In Table 3, we give the mean absolute error (MAE), mean squared error (MSE), average prediction interval length (APIL) of nominal 98% prediction intervals, and coverage of these intervals. The proposed conditional linear GP model significantly outperforms the standard GP model in terms of MAE, MSE, and APIL. The coverages of the prediction intervals for both models are one, which is close to the nominal 0.98. In Figure 3, we plot the point and interval predictions of the conditional linear and standard GP models together with the actual simulator output for two points in the test set. We see that the prediction intervals of the standard GP model are so wide that they are not useful, and the point predictions of the standard GP model can be poor in some cases, such as
the case illustrated in Figure 3(d). In contrast, the conditional linear GP model does not suffer from these deficiencies. Finally, fitting the conditional linear GP model and computing its point and interval predictions at all points in the test set only take a total of 32 seconds, which is negligible compared to the 12 minutes required by the simulator to solve an instance of the IBVP on average.
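The accuracy summaries in Table 3 (and in the corresponding tables for the later examples) can be computed from the test-set predictions as in the following sketch; the variable names are assumptions made here for illustration and are not taken from the online supplement.

```matlab
% Hedged sketch of the summaries reported in Table 3: yhat and ytrue are vectors of
% point predictions and simulator outputs over the test set, and lo, up are the
% endpoints of the nominal 98% prediction intervals.
mae      = mean(abs(yhat - ytrue));           % mean absolute error (MAE)
mse      = mean((yhat - ytrue).^2);           % mean squared error (MSE)
apil     = mean(up - lo);                     % average prediction interval length (APIL)
coverage = mean(ytrue >= lo & ytrue <= up);   % empirical coverage of the intervals
```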
5.2 Example 2: Temperature in a Robotics Component Simulated with the Heat Equation
In this example, we study the performance of the proposed GP model for predicting the temperature in a robotics component made from magnesium alloy called the gripper pivot, which is shown in Figure 1. The problem is used by MathWorks-Physics-Team (2017) to demonstrate the steps involved in utilizing Matlab to solve IBVPs. We solve the nonlinear IBVP given in (27)-(30) in Section 4.2, whereas MathWorks-Physics-Team (2017) solves a linear IBVP obtained from (27)-(30) by using a constant thermal conductivity and neglecting radiation effects. Throughout this example, we use units of measurement derived from SI units for all base quantities except for length, which is measured in inches (e.g., the unit for thermal conductivity is kg · inch · s^{-3} · K^{-1}).

In (27), χ is obtained from a piecewise cubic Hermite interpolating polynomial that interpolates the thermal conductivity versus temperature data for the AZ61 magnesium alloy from Lee et al. (2013). The range of x1 is 0 to 798.82, where setting x1 = 798.82 gives a curve that closely approximates the thermal conductivity versus temperature data for the AZ31 magnesium alloy given in Lee et al. (2013). The internal heat source term is e = 0. For the face labelled B1 in Figure 1, we impose a Dirichlet boundary condition (28) with h1 = 303.15 + 15zh11 (a uniform temperature over the face is assumed). For the faces labelled B2 and B3, Neumann boundary conditions (29) with h2 = 100 + 50zh21 and h3 = 100 + 50zh31, respectively, are imposed. For all the other faces, we impose the Robin boundary condition (30) with convective heat transfer coefficient χ2 = 25 and ambient temperature u∞ = 288.15. The range over which each of the four components of the input (x, z) is varied in the computer experiment is given in Table 4. The response for given (x, z) is $y(v, x, z) = u(-0.4638, 1.5, v \,|\, x, z)$, $v \in M$, where M is the set of s3 coordinates of the 101 equally spaced points along the vertical line in the cylindrical inner surface of the gripper pivot shown in Figure 1.

The IBVP is solved using the FE method with a tetrahedral mesh generated by Matlab with a maximum edge length of 0.09. The quadratic FE method with 10 nodes per tetrahedron is used. The design D1 in (22) consists of 4 points. A maximin LHD of size 24 points is employed as the design D2. Thus, the combined design is of size 28. We use a maximin LHD of size 40 points as the test set. The average time taken to solve the IBVP (27)-(30) at the points in the test set on our workstation is calculated to be 2646 seconds, i.e., about 44 minutes per simulation run. Thus, the use of an emulator in place of the simulator is recommended for problems that require many simulations. The responses for the last three design points are plotted in Figure 4. It is seen that the range of the functional output for each simulation is small compared to the differences between the average functional output values for distinct simulation runs.

The standard and proposed GP models are fitted with the combined design and compared.
Table 4: Ranges of inputs varied in computer experiment, Example 2

  Input   Minimum   Maximum
  x1      0         798.82
  zh11    -1        1
  zh21    -1        1
  zh31    -1        1
Figure 4: Plot of responses for last three design points, Example 2.

The proposed linear-Matérn covariance function (23) is given explicitly as
$$
\sigma_0^2\left[\exp\!\left(-\,|v-v'|/\theta_1\right)\left(|v-v'|/\theta_1+1\right)\right]\left[\exp\!\left(-\,|x_1-x_1'|/\theta_2\right)\left(|x_1-x_1'|/\theta_2+1\right)\right]\left[\exp\!\left(-\,\|z-z'\|_2/\theta_3\right)\left(\|z-z'\|_2/\theta_3+1\right)+\sum_{j=1}^{3} z_{h_j1}\,z_{h_j1}'\,\omega_{h_j1}^2\right].
$$
We obtain the following ML estimates of the parameters: $\hat{\sigma}_0^2 = 23.6321$, $\hat{\theta}_1 = 37.2173$, $\hat{\theta}_2 = 5.5745$, $\hat{\theta}_3 = 43.1817$, $\hat{\omega}_{h_11}^2 = 0.001108$, $\hat{\omega}_{h_21}^2 = 0.02423$, $\hat{\omega}_{h_31}^2 = 1.4204$. In Table 5, we give the MAE, MSE, APIL of the 98% prediction intervals, and coverage of these intervals. The proposed conditional linear GP model significantly outperforms the standard GP model in terms of MAE, MSE, and APIL. The coverages of the prediction intervals for both models are one, which is close to the nominal 0.98. In Figure 5, we plot the point and interval predictions of the conditional linear and standard GP models together with the actual computer output for two points in the test set. We see that the prediction intervals given by the standard GP model are much wider than the prediction intervals given by the conditional linear GP model. Finally, fitting the conditional linear GP model and computing its point and interval predictions at all points in the test set only take a total of 3.7 seconds, which is negligible compared to the 44 minutes required on average to solve an instance of the IBVP.
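As in Example 1, the reported intervals are nominal 98% prediction intervals. A minimal sketch of one plausible construction, assuming a Gaussian predictive distribution with plug-in mean yhat and standard deviation shat at each test point (this is an assumption about the construction, not a statement of the authors' exact implementation), is:

```matlab
% Hedged sketch: two-sided nominal 98% prediction interval from a Gaussian
% predictive mean yhat and standard deviation shat at each test point.
zq = 2.3263;            % standard normal 0.99 quantile, so the interval is two-sided 98%
lo = yhat - zq * shat;  % lower endpoints
up = yhat + zq * shat;  % upper endpoints
```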
5.3 Example 3: Deformation of a Bracket Simulated with the Lamé-Navier Equations
In this example, we apply the Lamé-Navier equations in Section 4.3 to predict the deformation in the bracket shown in Figure 6. SI units are employed for all physical quantities in this example.
Table 5: MAE, MSE, APIL, and coverage (of nominal 98% prediction intervals) for the two GP models, Example 2

  GP Model             MAE              MSE              APIL             Coverage
  Conditional Linear   2.047 × 10^{-4}  1.413 × 10^{-7}  5.156 × 10^{-2}  1
  Standard             8.069 × 10^{-3}  1.156 × 10^{-4}  4.245 × 10^{-1}  1
The linear IBVP is solved using the FE method with a tetrahedral mesh that is generated by Matlab with a maximum edge length of 0.0016. The linear FE method with four nodes per tetrahedron is used to solve the IBVP. The bracket density is 7850 and the gravitational acceleration is 9.8. Thus, the source term, which represents gravitational force per unit volume, is $e = (0, 0, -7850 \times 9.8)^T$. The displacement is assumed to be zero on the boundary face/plane labelled B1. This gives a Dirichlet boundary condition with $h_1 = (0, 0, 0)^T$. For the boundary faces labelled B2 and B3, we impose the traction boundary condition. On B2, the traction is
$$
h_2 = z_{h_21} h_{21} = z_{h_21}\left(0, 0, -1\times 10^6\right)^T,
$$
which is due to a distributed load acting in the s3 direction. On B3, the traction is
$$
h_3 = \sum_{i=1}^{3} z_{h_3i} h_{3i} = \left[z_{h_31}(1, 0, 0)^T + z_{h_32}(0, 1, 0)^T + z_{h_33}(0, 0, -1)^T\right] \times 1.8 \times 10^6,
$$
which is due to a shaft that is welded to the surface B3 of the bracket.
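A small illustrative sketch of assembling these traction vectors from the coefficients, with variable names chosen here for illustration only, is:

```matlab
% Hedged sketch of the tractions on B2 and B3 as linear combinations of the
% fixed basis directions, with coefficients zh21 and (zh31, zh32, zh33).
h2 = zh21 * [0; 0; -1e6];                                      % traction on face B2
h3 = (zh31*[1;0;0] + zh32*[0;1;0] + zh33*[0;0;-1]) * 1.8e6;    % traction on face B3
```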
The traction is specified to be zero on all the other boundary faces. The range over which each of the six components of the input (x, z) is varied in the computer experiment is given in Table 6. Note that the Young's modulus x1 and the Poisson's ratio x2 are varied over ranges that are somewhat wider than is typical of steel alloys. The response y of interest is the vector of displacements in the s1 and s3 directions at the point (s1, s2, s3) = (0.215, 0.1, 0.1055) marked as a triangle in Figure 6, i.e., $y(v, x, z) = u_v(0.215, 0.1, 0.1055 \,|\, x, z)$, $v \in \{1, 3\}$, where $u_v$ denotes the vth component of u. We ignore the displacement in the s2 direction since it appears to be uncorrelated with the other two displacements (the sample correlation coefficients are small), which suggests that it is more appropriately predicted with an independent GP emulator.

The design D1 in (22) consists of 5 points. A maximin LHD of size 37 points is employed as the design D2. Thus, the combined design is of size 42. We use a maximin LHD of size 60 points as the test set. The average time taken to solve the IBVP at the points in the test set on our workstation is calculated to be 550 seconds, i.e., about 9.2 minutes per simulation run. In addition, the memory requirement for the simulation is so high that only a few simulations can be run in parallel. Thus, the use of an emulator is recommended for various uncertainty quantification exercises involving the simulator.

The standard and proposed GP models are fitted with the combined design and compared. The proposed linear-Matérn covariance function (23) is given explicitly as
$$
\Phi \prod_{i=1}^{2}\left[\exp\!\left(-\,|x_i-x_i'|/\theta_{2i}\right)\left(|x_i-x_i'|/\theta_{2i}+1\right)\right]\left[\exp\!\left(-\,\|z-z'\|_2/\theta_3\right)\left(\|z-z'\|_2/\theta_3+1\right)+z_{h_21}z_{h_21}'\,\omega_{h_21}^2+\sum_{i=1}^{3} z_{h_3i}\,z_{h_3i}'\,\omega_{h_3i}^2\right].
$$
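As an illustration, a minimal Matlab sketch (hypothetical names, not the supplement code) of the 2 × 2 cross-covariance block between the bivariate responses at two inputs, with z ordered as (zh21, zh31, zh32, zh33), is:

```matlab
% Hedged sketch of the Example 3 covariance: the 2x2 matrix Phi scaled by the
% scalar correlation displayed above. x, xp are 1x2; z, zp are 1x4 ordered as
% (zh21, zh31, zh32, zh33); th2 is 1x2; w21sq and the 1x3 w3sq hold the omega^2's.
mat = @(d, th) exp(-d ./ th) .* (d ./ th + 1);
r = prod(mat(abs(x - xp), th2)) ...
    * (mat(norm(z - zp), th3) + z(1)*zp(1)*w21sq + sum(z(2:4) .* zp(2:4) .* w3sq));
C = Phi * r;   % cross-covariance block between (y1, y3) at the two input points
```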
Figure 5: Plot of point predictions and nominal 98% prediction intervals for the conditional linear (a), (c) and standard (b), (d) GP models together with the true response at two points in the test set; plots (a) and (b) are for the same point and similarly for plots (c) and (d).

The ML estimates are $\hat{\theta}_{21} = 1.4949$, $\hat{\theta}_{22} = 6.3884$, $\hat{\theta}_3 = 8.886 \times 10^7$, $\hat{\omega}_{h_21}^2 = \exp(30)$, $\hat{\omega}_{h_31}^2 = 0.429\exp(30)$, $\hat{\omega}_{h_32}^2 = 4.360 \times 10^{-28}$, $\hat{\omega}_{h_33}^2 = 0.222\exp(30)$, and
$$
\hat{\Phi} = \begin{pmatrix} 0.127 & -4.812 \\ -4.812 & 184.241 \end{pmatrix} \times 10^{-19}.
$$
This suggests that the residual $\varepsilon(v, x, z)$ in (20), which represents the nonlinear effect due to numerical errors, is very small. The proposed mean function is
$$
\begin{pmatrix} 3.527 \times 10^{-7} \\ -1.263 \times 10^{-5} \end{pmatrix} + z_{h_21}\begin{pmatrix} 4.580 \times 10^{-4} \\ -1.638 \times 10^{-2} \end{pmatrix} + z_{h_31}\begin{pmatrix} 3.381 \times 10^{-5} \\ -1.818 \times 10^{-7} \end{pmatrix} + z_{h_32}\begin{pmatrix} -1.196 \times 10^{-9} \\ 6.724 \times 10^{-9} \end{pmatrix} + z_{h_33}\begin{pmatrix} 1.716 \times 10^{-4} \\ -6.645 \times 10^{-3} \end{pmatrix}.
$$
In Table 7, we give the MAE, MSE, APIL of the 98% prediction intervals, and coverage of these intervals. Note that y(v, x, z) is denoted by yv in the table for simplicity. The proposed conditional linear GP model significantly outperforms the standard GP model in terms of MAE, MSE, and APIL.
Figure 6: Bracket with labels of some boundary faces.

Table 6: Ranges of inputs varied in computer experiment, Example 3

  Input   Minimum      Maximum
  x1      150 × 10^9   250 × 10^9
  x2      0.15         0.45
  zh21    -1           1
  zh31    -1           1
  zh32    -1           1
  zh33    -1           1
In Figure 7, we plot the point and interval predictions of the conditional linear and standard GP models versus the true output at all test points for each of the two components of the response y(·, x, z) separately. The dashed lines in the figure are 45-degree lines through the origin. Thus, we see that the point predictions given by the conditional linear GP model are almost identical to the truth and its prediction intervals are so narrow that they cannot be seen. The prediction intervals given by the standard GP model are narrow but still visible. Finally, fitting the conditional linear GP model and computing its point and interval predictions at all points in the test set only take a total of 2.7 seconds, which is negligible compared to the 9.2 minutes required on average to solve an instance of the IBVP.

Table 7: MAE, MSE, APIL, and coverage (of nominal 98% prediction intervals) for the two GP models, Example 3

  Response  GP Model             MAE              MSE               APIL             Coverage
  y1        Conditional Linear   3.391 × 10^{-7}  2.645 × 10^{-13}  2.442 × 10^{-6}  1
  y1        Standard             6.176 × 10^{-6}  7.668 × 10^{-11}  8.321 × 10^{-5}  1
  y3        Conditional Linear   1.317 × 10^{-5}  3.797 × 10^{-10}  9.313 × 10^{-5}  1
  y3        Standard             2.354 × 10^{-4}  1.084 × 10^{-7}   3.142 × 10^{-3}  1
Figure 7: Plot of point predictions and nominal 98% prediction intervals for yv versus true yv , v = 1, 3, for conditional linear (top row) and standard (bottom row) GP models.
6 Conclusions
This paper proposes a GP model for responses that are linear maps of the solution of an IBVP of the general form (13) with the source term, boundary conditions, and initial conditions expressed in terms of a finite number of bases. The proposed mean and covariance functions of the GP model are based on a decomposition of the solution of a linear IBVP, called the principle of superposition. While the exact statement of the principle of superposition applies theoretically to solutions of linear IBVPs, we propose a generalization of the result, obtained by adding a residual term, that applies to solutions of linear IBVPs affected by numerical errors and to nonlinear IBVPs. The proposed mean and covariance functions give a GP model that reflects the manner in which the IBVP solution varies with the inputs. Specifically, the GP model is approximately a linear function of the coefficients of the functional inputs conditional on all other inputs (which are parameters of the differential operators). The GP model is especially useful for nonlinear IBVPs obtained by adding more elaborate modeling elements to standard linear IBVPs.

Examples of IBVPs of the form (13) that are time-consuming to solve are given. Numerical simulation results show that the proposed GP model, which we call the conditional linear GP model, outperforms commonly used stationary GP models with separable covariance functions. We point out that we tried to use the product Gaussian correlation function in the examples in Section 5, but it often results in very ill-conditioned prior covariance matrices. Thus, we recommend the use of the Matérn correlation function (9). To check the reproducibility of our reported comparisons, we replicated the simulation in Section 5.2. For this replicate, there were multiple local optima of the likelihood for the conditional linear GP model. One local optimum gave excellent prediction performance but it was not the global optimum. Based on this observation, we recommend choosing the local optimum of the likelihood that gives the best leave-one-out cross-validation performance when applying the conditional linear GP model in practice (a sketch of this selection rule appears at the end of this section).

In this work, we do not treat the simulator as a purely black-box function, as in most existing works on computer experiments. Instead, we exploit the structure of the IBVP
solved by the simulator to construct the mean and covariance functions for the proposed GP emulator. Thus, the improved prediction performance achieved by the proposed emulator is due to the use of additional prior information on the functional relationship between the response and the inputs. Note that whether an IBVP is linear, or is nonlinear but obtained by adding more detailed modeling elements to an originally linear IBVP, is an easy-to-check condition that does not require much domain-specific knowledge. Hence, the developed methodology should be broadly useful to practicing statisticians involved in computer experiments. The assumption that y is a linear map of u is not restrictive since one can always take the response to be the entire function u, which gives all the information needed to compute any map of u of interest.

In some cases, accurate modeling of the variation of a functional input requires many terms in a truncated series expansion. For example, if a functional input is modeled as a random field, accurate approximation of the random field via a truncated KL expansion might require many terms of the expansion to be retained. This is not a serious limitation for the conditional linear GP model. It can yield good prediction performance even when z is high dimensional, provided the response is approximately a linear function of z conditional on x. This is because the conditional linear GP model is specified based on this prior information, and the curse of dimensionality does not arise in estimating a function f(x, z) that is linear in z conditional on x when z becomes high dimensional. None of our examples in Section 5 is high dimensional because we intend to compare the proposed conditional linear GP model and the stationary GP model with separable covariance function under conditions that are not obviously favorable to the conditional linear GP model. The stationary GP model with product Matérn correlation function often does not give good prediction performance when the input space is high dimensional and the design size is small.

It is possible to solve linear IBVPs numerically with GPs (Graepel, 2003). However, aside from the crucial limitation of linearity, this approach seems to be too time-consuming and numerically ill-conditioned for problems that involve discretization of the space-time domain with more than a few thousand points, since the inverse of a covariance matrix with size equal to the number of points needs to be computed to solve a scalar PDE. In contrast, the FE method, which has received far more attention from researchers, yields a sparse linear system, and it is almost universally employed for applications with a complex spatial geometry. Graepel's approach does have some usefulness. Raissi, Perdikaris, and Karniadakis (2017) employ a GP model with a covariance function derived from a linear PDE as in Graepel (2003) to predict the solution and source term of the PDE given noisy observations of the solution and source term.
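As noted above, among local optima of the likelihood we recommend the one with the best leave-one-out cross-validation performance. A rough Matlab sketch of this selection rule is given below; it is not taken from the online supplement, K is an assumed function returning the prior covariance matrix of the mean-corrected training outputs y under a candidate parameter vector, and the leave-one-out identity used is the standard one for a GP with known mean.

```matlab
% Hedged sketch: pick the candidate parameter estimate (one per local optimum
% of the likelihood) with the smallest leave-one-out cross-validation error.
bestErr = Inf;
for k = 1:numel(candidates)
    Kk    = K(candidates{k});       % prior covariance matrix under candidate k
    alpha = Kk \ y;                 % K^{-1} y
    dinv  = diag(inv(Kk));          % diagonal entries of K^{-1}
    eloo  = alpha ./ dinv;          % leave-one-out residuals (standard kriging identity)
    if mean(eloo.^2) < bestErr
        bestErr = mean(eloo.^2);
        best    = candidates{k};    % selected parameter estimate
    end
end
```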
Appendix A: Efficient formulas for computing ML estimates of GP model

The ML estimates of the parameters of the GP model for a functional response in Section 2 are given by
$$
\begin{aligned}
\hat{\alpha} &= \mathbf{1}_N^T R_1^{-1} Y^* R_2^{-1} \mathbf{1}_n \big/ \left(\mathbf{1}_N^T R_1^{-1} \mathbf{1}_N \, \mathbf{1}_n^T R_2^{-1} \mathbf{1}_n\right),\\
\hat{\sigma}_Y^2 &= \mathrm{trace}\!\left[R_1^{-1}\!\left(Y^* - \hat{\alpha}\,\mathbf{1}_N\mathbf{1}_n^T\right) R_2^{-1}\!\left(Y^* - \hat{\alpha}\,\mathbf{1}_N\mathbf{1}_n^T\right)^T\right] \big/ (Nn), \ \text{and}\\
(\hat{\gamma}_1, \hat{\gamma}_2) &= \arg\min\left\{ Nn \log \hat{\sigma}_Y^2 + N \log |R_2| + n \log |R_1| \right\}.
\end{aligned} \tag{A1}
$$
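A minimal Matlab sketch of the closed-form estimates in (A1), with assumed variable names (Ystar is the N × n output matrix, R1 and R2 are the N × N and n × n correlation matrices for fixed correlation parameters), is:

```matlab
% Hedged sketch of (A1); Ystar, R1, R2, N, n are assumed to be in the workspace.
oneN = ones(N, 1);  onen = ones(n, 1);
alphaHat = (oneN' * (R1 \ Ystar) * (R2 \ onen)) ...
           / ((oneN' * (R1 \ oneN)) * (onen' * (R2 \ onen)));   % estimate of alpha
Resid    = Ystar - alphaHat * (oneN * onen');                   % mean-corrected outputs
sigY2Hat = trace((R1 \ Resid) * (R2 \ Resid')) / (N * n);       % estimate of sigma_Y^2
```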
The ML estimates of the parameters of the GP model for multiple scalar responses in Section 2 are given by
$$
\begin{aligned}
\hat{A} &= Y^* R_2^{-1} \mathbf{1}_n \left(\mathbf{1}_n^T R_2^{-1} \mathbf{1}_n\right)^{-1},\\
\hat{\Phi} &= \left(Y^* - \hat{A}\mathbf{1}_n^T\right) R_2^{-1} \left(Y^* - \hat{A}\mathbf{1}_n^T\right)^T \big/\, n,\\
\hat{\gamma}_2 &= \arg\min\left\{ n \log |\hat{\Phi}| + N \log |R_2| \right\}.
\end{aligned} \tag{A2}
$$
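A corresponding Matlab sketch of (A2) for the multiple-scalar-response case, again with assumed names (here Ystar is the N × n matrix of responses and R2 the n × n design correlation matrix), is:

```matlab
% Hedged sketch of (A2); Ystar, R2, n are assumed to be in the workspace.
onen   = ones(n, 1);
Ahat   = (Ystar * (R2 \ onen)) / (onen' * (R2 \ onen));   % estimate of A (N x 1)
Res    = Ystar - Ahat * onen';                            % mean-corrected responses
PhiHat = Res * (R2 \ Res') / n;                           % estimate of Phi (N x N)
```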
Appendix B: Justification for using the same covariance function up to a constant multiple to model each $u_{e_i}(s,t\,|\,x)$, $u_{h_{1i}}(s,t\,|\,x)$, $u_{h_{2i}}(s,t\,|\,x)$ for the IBVP (1)-(3)

We consider the IBVP in (1)-(3) with $h_{20} = h_{21} = \cdots = h_{2m_2} = 0$. Because Green's function does not depend on the functional inputs in (1)-(3), each of the solutions $u_0$, $u_{e_i}$ and $u_{h_{1i}}$ is given by (7) with appropriate substitutions for $e$ and $h_1$, i.e.,
$$
u_{e_i}(s, t\,|\,x) = \int_0^t \int_S G(s, t, s', t'\,|\,x)\, e_i(s', t')\, ds'\, dt',
$$
$$
u_{h_{1i}}(s, t\,|\,x) = \int_S G(s, t, s', 0\,|\,x)\, h_{1i}(s')\, ds',
$$
$$
u_0(s, t\,|\,x) = \int_0^t \int_S G(s, t, s', t'\,|\,x)\, e_0(s', t')\, ds'\, dt' + \int_S G(s, t, s', 0\,|\,x)\, h_{10}(s')\, ds'.
$$
If each $e_i/\sigma_{e_i}$ is assumed to be an independent draw from a common GP E with variance one, where $\sigma_{e_i}$ is a positive constant, then each $u_{e_i}(s, t\,|\,x)/\sigma_{e_i}$ is an independent GP with the same covariance function. Similarly, if each $h_{1i}/\sigma_{h_{1i}}$ is assumed to be an independent draw from a common GP H with variance one, where $\sigma_{h_{1i}}$ is a positive constant, then each $u_{h_{1i}}(s, t\,|\,x)/\sigma_{h_{1i}}$ is a GP with the same covariance function. Note that we can consider the more general case where $h_2 \neq 0$ using Equation 5.0.11 in Duffy (2015), which leads to a similar conclusion. Finally, if $e_0/\sigma_{e_0}$ and $h_{10}/\sigma_{h_{10}}$ are drawn independently from E and H respectively, then $u_0(s, t\,|\,x)$ is a GP with a covariance function that is a weighted sum of the covariance function of $u_{e_i}(s, t\,|\,x)/\sigma_{e_i}$ and the covariance function of $u_{h_{1i}}(s, t\,|\,x)/\sigma_{h_{1i}}$. If we assume that the covariance functions for $u_{e_i}(s, t\,|\,x)/\sigma_{e_i}$ and $u_{h_{1i}}(s, t\,|\,x)/\sigma_{h_{1i}}$ are the same, then $u_0(s, t\,|\,x)$ will also have the same covariance function, up to a constant multiple.
Appendix C: Differential operators for Lamé-Navier equations of linear elasticity

The differential operator for the Lamé-Navier PDE is
$$
L_x = -\frac{x_1/2}{(1+x_2)(1-2x_2)}
\begin{pmatrix}
\frac{\partial^2}{\partial s_1^2} & \frac{\partial^2}{\partial s_1 \partial s_2} & \frac{\partial^2}{\partial s_1 \partial s_3}\\[2pt]
\frac{\partial^2}{\partial s_2 \partial s_1} & \frac{\partial^2}{\partial s_2^2} & \frac{\partial^2}{\partial s_2 \partial s_3}\\[2pt]
\frac{\partial^2}{\partial s_3 \partial s_1} & \frac{\partial^2}{\partial s_3 \partial s_2} & \frac{\partial^2}{\partial s_3^2}
\end{pmatrix}
-\frac{x_1/2}{1+x_2}
\begin{pmatrix}
\sum_{i=1}^3 \frac{\partial^2}{\partial s_i^2} & 0 & 0\\[2pt]
0 & \sum_{i=1}^3 \frac{\partial^2}{\partial s_i^2} & 0\\[2pt]
0 & 0 & \sum_{i=1}^3 \frac{\partial^2}{\partial s_i^2}
\end{pmatrix}.
\tag{C1}
$$
Two types of boundary conditions are used. The first is obtained from the displacement boundary operator
$$
Q_x^j = I, \tag{C2}
$$
and the second is obtained from the traction (force per unit surface area) boundary operator
$$
Q_x^j = \frac{x_1 x_2}{(1+x_2)(1-2x_2)}
\begin{pmatrix} N_1^j \\ N_2^j \\ N_3^j \end{pmatrix}
\left(\frac{\partial}{\partial s_1}, \frac{\partial}{\partial s_2}, \frac{\partial}{\partial s_3}\right)
+\frac{x_1/2}{1+x_2}\left[\left(N_1^j \frac{\partial}{\partial s_1} + N_2^j \frac{\partial}{\partial s_2} + N_3^j \frac{\partial}{\partial s_3}\right) I_3
+\left(N_1^j, N_2^j, N_3^j\right)\otimes
\begin{pmatrix} \partial/\partial s_1 \\ \partial/\partial s_2 \\ \partial/\partial s_3 \end{pmatrix}\right], \tag{C3}
$$
where $I_3$ is the 3 × 3 identity matrix, and $(N_1^j, N_2^j, N_3^j)$ is the unit outward normal vector. It can clearly be seen that $L_x$ and the $Q_x^j$'s are linear differential operators. Note that the operators $\partial/\partial s_i$, $\partial^2/\partial s_i \partial s_j$ can simply be treated as scalars when interpreting the matrix operations in the definitions of $L_x$ and $Q_x^j$.

Acknowledgements: We thank the editor, an associate editor, and two referees for comments that helped improve this paper significantly. Matthias Tan was supported by General Research Fund project 11201117 funded by the Research Grants Council of Hong Kong.

Supplementary Materials: Matlab Codes: This file contains Matlab codes for reproducing the results in Section 5. Data for all examples are provided.
References

Álvarez, M. A., & Lawrence, N. D. (2011). Computationally efficient convolved multiple output Gaussian processes. Journal of Machine Learning Research, 12(May), 1459–1500.

Álvarez, M. A., Luengo, D., & Lawrence, N. D. (2013). Linear latent force models using Gaussian processes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(11), 2693–2705.
ANSYS. (2017). Solutions by industry. (http://www.ansys.com/Solutions/solutions-by-industry)

Buhmann, M. D. (2003). Radial basis functions: Theory and implementations (Vol. 12). Cambridge: Cambridge University Press.

Cai, X., Bruaset, A., Langtangen, H., Lines, G., Samuelsson, K., Shen, W., . . . Zumbusch, G. (2003). Performance modeling of PDE solvers. In Advanced topics in computational partial differential equations (pp. 361–399). Springer Berlin Heidelberg.

Cengel, Y., & Ghajar, A. (2014). Heat and mass transfer: Fundamentals and applications (4th edition). New York: McGraw-Hill.

Coleman, M. P. (2013). An introduction to partial differential equations with Matlab. Boca Raton: CRC Press.

Collar, A. F., Oliveros, J. J. O., Castillo, M. M. M., & Mones, J. J. C. (2015). Identification of piecewise constant sources in non-homogeneous media based on boundary measurements. Applied Mathematical Modelling, 39(23), 7697–7717.

Comsol. (2017). Application gallery. (https://www.comsol.com/models)

Conti, S., & O'Hagan, A. (2010). Bayesian emulation of complex multi-output and dynamic computer models. Journal of Statistical Planning and Inference, 140(3), 640–651.

Currin, C., Mitchell, T., Morris, M., & Ylvisaker, D. (1991). Bayesian prediction of deterministic functions, with applications to the design and analysis of computer experiments. Journal of the American Statistical Association, 86(416), 953–963.

Daubechies, I. (1992). Ten lectures on wavelets (Vol. 61). Philadelphia: SIAM.

De Boor, C. (1978). A practical guide to splines (Applied Mathematical Sciences Vol. 27). New York: Springer.

Duffy, D. G. (2015). Green's functions with applications (2nd edition). Boca Raton: CRC Press.

Golchi, S., Bingham, D. R., Chipman, H., & Campbell, D. A. (2015). Monotone emulation of computer experiments. SIAM/ASA Journal on Uncertainty Quantification, 3(1), 370–392.

Graepel, T. (2003). Solving noisy linear operator equations by Gaussian processes: Application to ordinary and partial differential equations. In ICML (pp. 234–241).

Gunzburger, M. D., Webster, C. G., & Zhang, G. (2014). Stochastic finite element methods for partial differential equations with random input data. Acta Numerica, 23, 521–650.

Hamdi, A. (2007). Identification of point sources in two-dimensional advection-diffusion-reaction equation: Application to pollution sources in a river. Stationary case. Inverse Problems in Science and Engineering, 15(8), 855–870.

Han, Q. (2011). A basic course in partial differential equations (Vol. 120). American Mathematical Society.

Huang, S., Quek, S., & Phoon, K. (2001). Convergence study of the truncated Karhunen–Loève expansion for simulation of stochastic processes. International Journal for Numerical Methods in Engineering, 52(9), 1029–1043.

Hung, Y., Joseph, V. R., & Melkote, S. N. (2015). Analysis of computer experiments with functional response. Technometrics, 57(1), 35–44.

Hyman, J. M. (1982). Numerical methods for nonlinear differential equations. In Nonlinear problems: Present and future (pp. 91–107). Amsterdam: North Holland.
Jeffrey, A. (2003). Applied partial differential equations: An introduction. Amsterdam: Academic Press.

Jia, J., Song, G., Atrens, A., St John, D., Baynham, J., & Chandler, G. (2004). Evaluation of the BEASY program using linear and piecewise linear approaches for the boundary conditions. Materials and Corrosion, 55(11), 845–852.

Jung-Ho, C., & Noboru, K. (1985). An analysis of metal forming processes using large deformation elastic-plastic formulations. Computer Methods in Applied Mechanics and Engineering, 49(1), 71–108.

Lee, S., Ham, H. J., Kwon, S. Y., Kim, S. W., & Suh, C. M. (2013). Thermal conductivity of magnesium alloys in the temperature range from −125 °C to 400 °C. International Journal of Thermophysics, 34(12), 2343–2350.

Le Maître, O., & Knio, O. M. (2010). Spectral methods for uncertainty quantification: With applications to computational fluid dynamics. New York: Springer Science & Business Media.

Li, J., & Chen, Y.-T. (2009). Computational partial differential equations using Matlab. Boca Raton: CRC Press.

Lindholm, F. A., Fossum, J. G., & Burgess, E. L. (1979). Application of the superposition principle to solar-cell analysis. IEEE Transactions on Electron Devices, 26(3), 165–171.

Lord, G. J., Powell, C. E., & Shardlow, T. (2014). An introduction to computational stochastic PDEs (No. 50). Cambridge University Press.

MathWorks. (2017). Partial differential equation toolbox user's guide. (http://www.mathworks.com/help/pdf_doc/pde/pde.pdf)

MathWorks-Physics-Team. (2017). 3D thermal analysis of a robotics component. (https://www.mathworks.com/matlabcentral/fileexchange/62058-3d-thermal-analysis-of-a-robotics-component?focused=7494206&tab=example)

Mianroodi, J. R., Niaki, S. A., Naghdabadi, R., & Asghari, M. (2011). Nonlinear membrane model for large amplitude vibration of single layer graphene sheets. Nanotechnology, 22(30), 305703.

Muehlenstaedt, T., Fruth, J., & Roustant, O. (2017). Computer experiments with functional inputs and scalar outputs by a norm-based approach. Statistics and Computing, 27(4), 1083–1097.

Myint-U, T., & Debnath, L. (2007). Linear partial differential equations for scientists and engineers (4th edition). Boston: Birkhäuser.

Norris, P. M. (1998). Application of experimental design methods to assess the effect of uncertain boundary conditions in inverse heat transfer problems. International Journal of Heat and Mass Transfer, 41(2), 313–322.

Park, H., & Cho, D. (1996). The use of the Karhunen-Loève decomposition for the modeling of distributed parameter systems. Chemical Engineering Science, 51(1), 81–98.

Pivato, M. (2010). Linear partial differential equations and Fourier theory. Cambridge: Cambridge University Press.

Raissi, M., Perdikaris, P., & Karniadakis, G. E. (2017). Machine learning of linear differential equations using Gaussian processes. Journal of Computational Physics, 348, 683–693.

Rao, S. S. (2007). Vibration of continuous systems. New Jersey: John Wiley & Sons.
Rougier, J. (2008). Efficient emulators for multivariate deterministic functions. Journal of Computational and Graphical Statistics, 17(4), 827–843.

Sacks, J., Welch, W. J., Mitchell, T. J., & Wynn, H. P. (1989). Design and analysis of computer experiments. Statistical Science, 4(4), 409–423.

Santner, T., Williams, B., & Notz, W. (2003). The design and analysis of computer experiments. New York: Springer.

Slaughter, W. S. (2002). The linearized theory of elasticity. New York: Springer Science & Business Media.

Šolín, P. (2006). Partial differential equations and the finite element method (Vol. 73). New York: John Wiley & Sons.

Stein, M. (1989). Comment on "Design and analysis of computer experiments," by Sacks et al. Statistical Science, 4(4), 432–433.

Tan, M. H. (2016). Monotonic quantile regression with Bernstein polynomials for stochastic simulation. Technometrics, 58(2), 180–190.

Tan, M. H. (2017a). Gaussian process modeling of a functional output with information from boundary and initial conditions and analytical approximations. Technometrics (just accepted).

Tan, M. H. (2017b). Monotonic metamodels for deterministic computer experiments. Technometrics, 59(1), 1–10.

Wang, X., & Berger, J. O. (2016). Estimating shape constrained functions using Gaussian processes. SIAM/ASA Journal on Uncertainty Quantification, 4(1), 1–25.

Wheeler, M. W., Dunson, D. B., Pandalai, S. P., Baker, B. A., & Herring, A. H. (2014). Mechanistic hierarchical Gaussian processes. Journal of the American Statistical Association, 109(507), 894–904.