Probabilistic Solvers for Partial Differential Equations
Ilias Bilionis∗
School of Mechanical Engineering, Purdue University
585 Purdue Mall, West Lafayette, IN 47907
[email protected]
Abstract

This work is concerned with the quantification of the epistemic uncertainties induced by the discretization of partial differential equations. Following the paradigm of probabilistic numerics, we quantify this uncertainty probabilistically. Namely, we develop a probabilistic solver suitable for linear partial differential equations (PDEs) with mixed (Dirichlet and Neumann) boundary conditions defined on arbitrary geometries. The idea is to assign a probability measure on the space of solutions of the PDE and then condition this measure by enforcing that the PDE and the boundary conditions are satisfied at a finite set of spatial locations. The resulting posterior probability measure quantifies our state of knowledge about the solution of the problem given this finite discretization.
1 Introduction
We propose a novel discretization technique for partial differential equations (PDEs) that is inherently probabilistic and thus capable of quantifying the epistemic uncertainties induced by the discretization. Inspired by Diaconis [1], we 1) assign a prior probability measure on the space of solutions; 2) observe the PDE on a discretized geometry; and 3) use Bayes' rule to derive the probability measure on the space of solutions that is compatible with both our prior beliefs and the observations. The uncertainty in this posterior probability measure is exactly the discretization uncertainty we are looking for.

Our work fits within the program of probabilistic numerics (PN) [2], a term referring to the probabilistic interpretation of numerical algorithms. The ideas of PN can be found in various fields, e.g., the quantification of epistemic uncertainty in quadrature rules [3, 4, 5], uncertainty propagation [6, 7], sensitivity indices [8, 9], linear algebra [10], ordinary differential equations [11, 12, 13, 14, 15, 16, 17, 18], and optimization [19].

The outline of the paper is as follows. In Sec. 2 we discuss how our work relates to existing methods for solving PDEs, focusing mostly on PN-derived ones. In Sec. 3 we present our solution methodology for arbitrary linear PDEs with mixed linear boundary conditions on arbitrary geometries. In Sec. 4 we employ our method to solve 1D and 2D boundary value problems with quantified discretization errors. Finally, in Sec. 5 we present our conclusions.
∗www.predictivesciencelab.org
2 Related Work
There have been a few attempts to quantify the numerical errors in PDEs, although they either ignore spatial discretization errors [14, 18] or only work on square domains [12]. The seminal work of [20] uses probabilistic arguments to derive a fast multigrid algorithm for the solution of elliptic PDEs, but it does not attempt to quantify the epistemic uncertainties induced by discretization errors. Similarly to [12], the method we propose bears some resemblance to classical collocation methods [21], but contrary to them it does not require the analytic specification of a set of basis functions.
3 Methodology
Consider the time-independent partial differential equation (PDE):
$$L[u](x) = f(x), \quad x \in \Omega, \qquad (1)$$
with boundary conditions:
$$B[u](x) = g(x), \quad x \in \partial\Omega, \qquad (2)$$
where $\Omega \subset \mathbb{R}^d$ is a physical object in $d$ dimensions ($d = 1, 2$, or $3$) with a sufficiently smooth boundary $\partial\Omega$, and $f$ and $g$ are the source and boundary terms of the PDE. $L$ and $B$ are linear differential operators acting on functions $u$, with $B$ being of order at least one less than $L$. We say that Eq. (1) and Eq. (2) constitute a boundary value problem (BVP).
Multi-index notation for defining the differential operators. To make these concepts mathematically precise without an overwhelming notation, we use multi-index notation. A multi-index is a $d$-tuple
$$\alpha = (\alpha_1, \dots, \alpha_d) \qquad (3)$$
of non-negative integers. We define the absolute value of a multi-index $\alpha$ to be
$$|\alpha| = \alpha_1 + \cdots + \alpha_d, \qquad (4)$$
and the $\alpha$-partial derivative operator
$$\partial^\alpha = \partial_1^{\alpha_1} \cdots \partial_d^{\alpha_d}, \qquad (5)$$
where
$$\partial_r^{\alpha_r} = \frac{\partial^{\alpha_r}}{\partial x_r^{\alpha_r}}, \qquad (6)$$
with $\partial_i^0$ being the identity operator, $\partial_i^0 u = u$. Our assumption is that $L$ is a $k$-th order linear differential operator, i.e.,
$$L[u] = \sum_{|\alpha| \le k} L_\alpha \partial^\alpha u, \qquad (7)$$
where the coefficients $L_\alpha$ are functions of $x$. (For example, the Laplacian $L = -\nabla^2$ in $d = 2$ corresponds to $k = 2$, $L_{(2,0)} = L_{(0,2)} = -1$, and all other coefficients zero.) Similarly, $B$ is a $(k-1)$-th order linear differential operator,
$$B[u] = \sum_{|\alpha| \le k-1} B_\alpha \partial^\alpha u, \qquad (8)$$
where the $B_\alpha$ are also functions of $x$.

Prior probability measure on the space of solutions. Prior to making any observations, we express our uncertainty about the solution $u$ of the BVP by assigning a Gaussian random field to it. Namely, we assume that $u$ is a zero-mean Gaussian process (GP) with covariance function $c$, i.e.,
$$u \sim \mathrm{GP}(u \mid 0, c). \qquad (9)$$
Intuitively, we are assuming that, a priori, any sample from Eq. (9) could be the solution to the BVP. The selection of $c$ is a statement about the assumed smoothness of the solution, its lengthscale of variation, etc.
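To make the prior concrete, the following minimal sketch (our own illustration, not code from the paper) draws samples from Eq. (9) on a 1D grid using the squared-exponential covariance of Eq. (22); the values of $s$ and $\ell$ are arbitrary choices.

```python
import numpy as np

# Draw prior samples u ~ GP(0, c) on a 1D grid, using the squared-exponential
# covariance c(x, x') = s^2 exp(-(x - x')^2 / (2 ell^2)) of Eq. (22).
s, ell = 1.0, 0.2                                    # assumed hyper-parameters
x = np.linspace(0.0, 1.0, 200)
C = s**2 * np.exp(-0.5 * (x[:, None] - x[None, :])**2 / ell**2)
L = np.linalg.cholesky(C + 1e-10 * np.eye(len(x)))   # jitter for stability
samples = L @ np.random.randn(len(x), 5)             # five prior draws
```

Each column of `samples` is one function that, a priori, is considered a candidate solution of the BVP.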
We can now state and prove the following important result:

Theorem 1. Let $L'$ and $B'$ be identical to $L$ and $B$, respectively, but acting on $x'$ instead of $x$. If $LL'[c](x, x')$ and $BB'[c](x, x')$ exist for all $(x, x') \in \mathbb{R}^{2d}$, then both $L[u](x)$ and $B[u](x)$ exist in the mean square sense for all $x \in \mathbb{R}^d$. Furthermore, the vector random process $(u, L[u], B[u])$ is Gaussian with zero mean and covariance:
$$\begin{aligned}
\mathrm{Cov}[u(x), u(x')] &= c(x, x'),\\
\mathrm{Cov}[L[u](x), u(x')] &= L[c](x, x'),\\
\mathrm{Cov}[B[u](x), u(x')] &= B[c](x, x'),\\
\mathrm{Cov}[L[u](x), L[u](x')] &= LL'[c](x, x'),\\
\mathrm{Cov}[L[u](x), B[u](x')] &= LB'[c](x, x'),\\
\mathrm{Cov}[B[u](x), B[u](x')] &= BB'[c](x, x').
\end{aligned} \qquad (10)$$
Proof. This is a straightforward generalization of Theorem 2.2.2 of [22]. It can be proved as follows. First, notice that for any multi-indices $\alpha$ and $\beta$, the vector process $(u, \partial^\alpha u, \partial^\beta u)$ is Gaussian with mean zero and covariance:
$$\mathrm{Cov}[\partial^\alpha u(x), \partial^\beta u(x')] = \partial^\alpha (\partial')^\beta c(x, x'). \qquad (11)$$
This can be proved by letting the expectation over the probability measure of $u$ get inside the limit in the definition of the partial derivatives. Combining this result with the fact that linear combinations of Gaussian random variables are Gaussian, one can show that any finite-dimensional probability density function of the vector process is Gaussian. The result follows from a simple application of Kolmogorov's extension theorem. The covariances between the various quantities are straightforward to derive. For example:
$$\begin{aligned}
\mathrm{Cov}[L[u](x), B[u](x')] &= \mathbb{E}[L[u](x)\, B[u](x')]\\
&= \mathbb{E}\Big[\sum_{|\alpha| \le k} L_\alpha(x) \partial^\alpha u(x) \sum_{|\beta| \le k-1} B_\beta(x') (\partial')^\beta u(x')\Big]\\
&= \sum_{|\alpha| \le k} \sum_{|\beta| \le k-1} L_\alpha(x) B_\beta(x')\, \mathbb{E}\big[\partial^\alpha u(x) (\partial')^\beta u(x')\big]\\
&= \sum_{|\alpha| \le k} \sum_{|\beta| \le k-1} L_\alpha(x) B_\beta(x')\, \partial^\alpha (\partial')^\beta c(x, x')\\
&= LB'[c](x, x').
\end{aligned}$$
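In practice, the operator-applied covariances of Theorem 1 can be computed symbolically (the paper uses Theano for this; the sketch below substitutes sympy, purely as an illustrative assumption). For $L = -\mathrm{d}^2/\mathrm{d}x^2$ in 1D and the squared-exponential covariance of Eq. (22):

```python
import sympy as sp

# Symbolically apply L = -d^2/dx^2 (and its primed copy L', acting on x')
# to the squared-exponential covariance of Eq. (22), restricted to 1D.
x, xp, s, ell = sp.symbols("x xp s ell", positive=True)
c = s**2 * sp.exp(-(x - xp)**2 / (2 * ell**2))

Lc = -sp.diff(c, x, 2)          # L[c](x, x')
LLc = sp.diff(c, x, 2, xp, 2)   # LL'[c](x, x'): the two minus signs cancel
print(sp.simplify(Lc))
print(sp.simplify(LLc))
```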
Constraining the prior measure using the BVP. Following Diaconis' recipe, we use Bayes' rule to constrain Eq. (9) to functions that satisfy the BVP. Of course, it is impossible to condition on an uncountable infinity of points. Instead, consider a set of $n_i$ points in the interior of $\Omega$,
$$X_i = \{x_1^i, \dots, x_{n_i}^i\} \subset \Omega, \qquad (12)$$
and a set of $n_b$ points on the boundary $\partial\Omega$,
$$X_b = \{x_1^b, \dots, x_{n_b}^b\} \subset \partial\Omega. \qquad (13)$$
We wish to derive the posterior probability measure of $u$ conditioned on
$$L[u](x_j^i) = f(x_j^i), \quad j = 1, \dots, n_i, \qquad (14)$$
and
$$B[u](x_j^b) = g(x_j^b), \quad j = 1, \dots, n_b. \qquad (15)$$

Theorem 2. The posterior of the GP $u$ conditioned on Eq. (14) and Eq. (15) is a GP with posterior mean:
$$\tilde{m}(x) = c(x)^T C^{-1} y, \qquad (16)$$
and posterior covariance:
$$\tilde{c}(x, x') = c(x, x') - c(x)^T C^{-1} c(x'), \qquad (17)$$
with
$$C = \begin{pmatrix} LL'c(X_i, X_i) & LB'c(X_i, X_b) \\ LB'c(X_i, X_b)^T & BB'c(X_b, X_b) \end{pmatrix}, \qquad (18)$$
$$c(x) = \begin{pmatrix} Lc(x, X_i) \\ Bc(x, X_b) \end{pmatrix}, \qquad (19)$$
$$y = \begin{pmatrix} f(X_i) \\ g(X_b) \end{pmatrix}, \qquad (20)$$
where for any function $h: \mathbb{R}^d \to \mathbb{R}$ and sets of $n$ and $n'$ points $X = \{x_1, \dots, x_n\}$ and $X' = \{x'_1, \dots, x'_{n'}\}$, respectively, we define $h(X, X')$ to be the matrix with elements $(h(x_j, x'_r))$.

Proof. According to Theorem 1, the vector random process $(u, L[u], B[u])$ is Gaussian. Thus, for any set of $n_t$ test points $X_t = \{x_1^t, \dots, x_{n_t}^t\}$, the joint distribution of the random vector $(u(X_t), L[u](X_i), B[u](X_b))$ is Gaussian. The result follows by conditioning this joint distribution on the observed values of $L[u](X_i)$ and $B[u](X_b)$, and then making use of the definition of a GP.

Probabilistic solution of BVPs and quantification of the discretization error. The conditioned GP of Theorem 2 captures our state of knowledge after enforcing the PDE and the boundary conditions on a finite discretization of the spatial domain. We call the posterior mean $\tilde{m}$ of Eq. (16) the probabilistic solution to the BVP. The pointwise uncertainty of this solution, which corresponds to the discretization error, is neatly captured by the pointwise posterior variance:
$$\sigma^2(x) = \tilde{c}(x, x), \qquad (21)$$
where $\tilde{c}$ is as in Eq. (17).
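As an illustration of Theorem 2 (our own sketch, not code from the paper), the following snippet assembles Eqs. (16)-(21) for the 1D problem $-u'' = f$ on $[0, 1]$ with $u(0) = u(1) = 0$, using the squared-exponential covariance of Eq. (22) with hand-derived kernel derivatives; all names, the source term, and the parameter values are illustrative assumptions.

```python
import numpy as np

s, ell = 1.0, 0.2                       # assumed hyper-parameters
a = 1.0 / ell**2

def k(x, y):                            # c(x, x'), Eq. (22) in 1D
    r = x[:, None] - y[None, :]
    return s**2 * np.exp(-0.5 * a * r**2)

def Lk(x, y):                           # L[c] for L = -d^2/dx^2 (= L'[c] by symmetry)
    r = x[:, None] - y[None, :]
    return s**2 * a * (1.0 - a * r**2) * np.exp(-0.5 * a * r**2)

def LLk(x, y):                          # LL'[c] = d^4 c / dx^2 dx'^2
    r = x[:, None] - y[None, :]
    return s**2 * a**2 * (3.0 - 6.0 * a * r**2 + a**2 * r**4) * np.exp(-0.5 * a * r**2)

f = lambda x: np.sin(np.pi * x)         # source; exact solution is sin(pi x) / pi^2
Xi = np.linspace(0.05, 0.95, 19)        # interior points, Eq. (12)
Xb = np.array([0.0, 1.0])               # boundary points, Eq. (13); B = identity

C = np.block([[LLk(Xi, Xi), Lk(Xi, Xb)],      # Eq. (18); B[c] = c for Dirichlet B
              [Lk(Xb, Xi),  k(Xb, Xb)]])
y = np.concatenate([f(Xi), np.zeros(2)])      # Eq. (20)
Cj = C + 1e-8 * np.eye(len(y))                # jitter for numerical stability
alpha = np.linalg.solve(Cj, y)

def posterior(x):
    cx = np.hstack([Lk(x, Xi), k(x, Xb)])     # rows of c(x)^T, Eq. (19)
    mean = cx @ alpha                         # Eq. (16)
    var = np.diag(k(x, x)) - np.sum(cx * np.linalg.solve(Cj, cx.T).T, axis=1)
    return mean, var                          # var is sigma^2(x), Eq. (21)

xs = np.linspace(0.0, 1.0, 101)
m, v = posterior(xs)                          # m should track sin(pi x) / pi^2
```

Note that exactly the same code works for any scattered set of interior and boundary points; no mesh or basis functions are required.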
Time-dependent PDEs. Generalization of our methodology to time-dependent PDEs is straightforward. All one needs to do is augment $x$ with $t$, replace $\Omega$ with $\Omega \times [0, +\infty)$, and make the initial condition part of the boundary condition. However, this generalization introduces computational challenges that are beyond the scope of this work. Thus, we do not consider time-dependent PDEs in this paper.

Connections to the finite element method. A natural question that arises is: "What combination of covariance function and observed data gives rise to established numerical methods for the solution of PDEs, such as finite difference, finite volume, and finite element methods [21]?" Establishing such a connection would automatically provide probabilistic error bars for existing numerical codes. Even though a complete survey of this topic is beyond the scope of the paper, we feel obliged to report an important theoretical observation that reveals a connection between the finite element method (FEM) and PN. The observation we have made is that the mean of the posterior GP becomes exactly the same as the FEM solution of the PDE if 1) the covariance function $c$ is picked to be the Green's function of the operator $L$ [23]; and 2) the observations upon which we condition the prior measure are the integrals
$$\int_\Omega L[u](x) \phi_i(x)\,dx = \int_\Omega f(x) \phi_i(x)\,dx, \quad i = 1, \dots, n_e,$$
where $\{\phi_i, i = 1, \dots, n_e\}$ is the finite element basis upon which the solution is expanded. Unfortunately, in this case the posterior variance of the GP turns out to be infinite and, consequently, not very useful for quantifying the epistemic uncertainties due to the discretization.
4 Numerical Results
We use the ideas discussed above to solve 1D and 2D boundary value problems. We show how our ideas can naturally be applied to arbitrary geometries without the use of a mesh. Throughout this study we use a stationary covariance function for $u$:
$$c(x, x') = s^2 \exp\left\{ -\frac{1}{2} \sum_{r=1}^d \frac{(x_r - x'_r)^2}{\ell_r^2} \right\}, \qquad (22)$$
where $s$ and $\ell_r$, $r = 1, \dots, d$, are parameters that can be interpreted as the signal strength and the lengthscale of spatial dimension $r$, respectively. The choice of this covariance function corresponds to a priori knowledge that $u$ is infinitely differentiable. The code has been implemented in Python using Theano [24] for the symbolic computation of all the required derivatives.

Case Study 1: 1D Steady State Heat Equation with Neumann Boundary Condition
Consider the boundary value problem:
$$-\frac{d}{dx}\left(a(x) \frac{du}{dx}\right) - \frac{1}{2} u = e^{-(x-2)^2}, \qquad (23)$$
with
$$a(x) = \frac{1}{2}\arctan(20(x-1)) + 1, \qquad (24)$$
and boundary conditions:
$$\frac{du(0)}{dx} = 0, \quad \text{and} \quad u(3) = 0. \qquad (25)$$
In this case, we assume that $s = 2$ in Eq. (22). However, instead of assuming a value for $\ell$, we estimate it by maximizing the likelihood of the data Eq. (14) and Eq. (15). Despite the fact that this can be done with a fast algorithm, here we just used an exhaustive search because of the simplicity of the problem, as sketched below. In Figure 1 (a), we compare the probabilistic solution using $n_i = 20$ (green dashed line), $n_i = 40$ (red dotted line), and $n_i = 80$ (magenta dash-dotted line) equidistant internal discretization points to the exact solution (obtained by the finite difference method using $n = 10{,}000$ discretization points). The shaded areas correspond to 95% predictive intervals that capture the discretization error. In Figure 1 (b), we show the normalized likelihood (the likelihood divided by its maximum value) for each one of the cases considered.
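The following sketch shows what such an exhaustive search could look like (our own illustration; `build_C` is an assumed helper that assembles the matrix of Eq. (18) for a given lengthscale, and the grid of $\ell$ values is arbitrary):

```python
import numpy as np

def log_marginal_likelihood(C, y):
    """Log density of the observations y (Eqs. (14)-(15)) under N(0, C)."""
    L = np.linalg.cholesky(C + 1e-10 * np.eye(len(y)))   # jitter for stability
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # C^{-1} y
    return (-0.5 * y @ alpha - np.log(np.diag(L)).sum()
            - 0.5 * len(y) * np.log(2.0 * np.pi))

# Exhaustive search over a grid of lengthscales (hypothetical range).
ells = np.linspace(0.01, 0.30, 100)
scores = [log_marginal_likelihood(build_C(ell), y) for ell in ells]  # build_C assumed
ell_best = ells[int(np.argmax(scores))]
```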
Figure 1: Case study 1 (Eq. (23) subject to Eq. (25)). Subfigure (a) compares three probabilistic solutions to the finite difference solution. Subfigure (b) depicts the normalized likelihood as a function of the lengthscale $\ell$ of Eq. (22).

Case Study 2: Elliptic Equation on a Disk

Consider the elliptic partial differential equation on the unit disk, $\Omega = \{(x_1, x_2) : x_1^2 + x_2^2 \le 1\}$, (26)
$$-\nabla^2 u = 1, \quad \text{for } (x_1, x_2) \in \Omega, \qquad (27)$$
with boundary conditions:
$$u(x_1, x_2) = 0, \quad \text{for } (x_1, x_2) \in \partial\Omega. \qquad (28)$$
The exact solution is:
$$u(x_1, x_2) = \frac{1 - x_1^2 - x_2^2}{4}. \qquad (29)$$
We use $n_i = 16$ internal and $n_b = 5$ boundary discretization points, scattered as sketched below. We assume that $s = 0.1$ and we choose $\ell$ so that the likelihood of Eq. (14) and Eq. (15) is maximized ($\ell = 3.5$ for the chosen discretization). Figure 2 (a) and (b) depict contours of the exact and the probabilistic solution, respectively. In subfigure (b), the internal points are indicated by blue crosses and the boundary points by red disks. Subfigure (c) depicts the exact absolute value of the discretization error, while (d) plots the contours of the predicted discretization error (two times the square root of the posterior variance Eq. (21)).
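Since the method is meshfree, the discretization points can be scattered arbitrarily in the domain. A minimal sketch of one way to pick them on the unit disk (our own assumption; the paper does not specify how its points were chosen):

```python
import numpy as np

rng = np.random.default_rng(0)

def disk_points(n_i, n_b):
    """Scatter n_i interior and n_b boundary points on the unit disk."""
    # Interior: uniform over the disk via the sqrt trick in polar coordinates.
    r = np.sqrt(rng.uniform(size=n_i))
    th = rng.uniform(0.0, 2.0 * np.pi, size=n_i)
    Xi = np.column_stack([r * np.cos(th), r * np.sin(th)])
    # Boundary: equispaced angles on the unit circle.
    phi = np.linspace(0.0, 2.0 * np.pi, n_b, endpoint=False)
    Xb = np.column_stack([np.cos(phi), np.sin(phi)])
    return Xi, Xb

Xi, Xb = disk_points(16, 5)    # the n_i = 16, n_b = 5 configuration above
```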
Figure 2: Case study 2 (Eq. (27) with Eq. (28)). (a) Exact solution. (b) Probabilistic solution. (c) Exact absolute error. (d) Predicted discretization error.
Case Study 3: Elliptic Equation on a Disk with Unknown Solution

Consider the elliptic partial differential equation on the unit disk ($\Omega$ as in Eq. (26)),
$$-\nabla^2 u = 4 \exp\left\{ -\frac{1}{2}\left(\frac{x_1 - x_{01}}{\sigma}\right)^2 - \frac{1}{2}\left(\frac{x_2 - x_{02}}{\sigma}\right)^2 \right\}, \quad \text{for } (x_1, x_2) \in \Omega, \qquad (30)$$
with $R = 0.3$, $\sigma = 0.025$, $x_{01} = 0.6R\cos(0.2)$, and $x_{02} = 0.6R\sin(0.2)$, and with boundary conditions:
$$u(x_1, x_2) = 0, \quad \text{for } (x_1, x_2) \in \partial\Omega. \qquad (31)$$
We use $n_i = 50$ internal and $n_b = 20$ boundary discretization points. We assume that $s = 0.01$ and we choose $\ell$ so that the likelihood of Eq. (14) and Eq. (15) is maximized ($\ell = 0.26$ for the chosen discretization). Since the exact solution is unknown, Figure 3 (a) depicts contours of the probabilistic solution and (b) the predicted discretization error. The internal points are indicated by blue crosses and the boundary points by red disks.
Figure 3: Case study 3 (Eq. (30) with Eq. (31)). (a) Probabilistic solution. (b) Predicted discretization error.

Case Study 4: Elliptic Equation on an Arbitrary Shape

We consider an elliptic partial differential equation on an arbitrary domain $\Omega$ (see Figure 4). The equation is as in Eq. (30), but with $R = 0.8$, $\sigma = 0.025$, $x_{01} = R\cos(\pi/4)$, and $x_{02} = R\sin(\pi/4)$. We fix $n_b = 20$ and $s = 0.2$ and experiment with $n_i = 34$, $43$, and $61$. In each case we fit $\ell$ by maximizing the likelihood of Eq. (14) and Eq. (15).
5 Conclusions
We presented a probabilistic method for solving linear PDEs with mixed linear boundary conditions on arbitrary geometries. The idea is to assign a prior probability measure on the space of solutions, observe some discretized version of the PDE, and then derive the posterior probability measure. Different choices of covariance functions and observed data result in different solution schemes. The value of the method lies in the fact that it is capable of quantifying the epistemic uncertainties induced by the discretization scheme. Such probabilistic schemes can find application in adaptive mesh refinement, in parallelized PDE solvers that are able to recover gracefully in case of sub-task failure, in the solution of multi-scale/multi-physics PDEs whose right-hand side depends on expensive simulations that can only be performed a finite number of times, and more. The main drawback of the proposed methodology is that it requires the factorization of a dense covariance matrix. This task can be prohibitively expensive when solving PDEs on realistic domains. Further research is required to discover covariance functions that lead to sparse matrices that can be factorized efficiently.
Figure 4: Case study 4. (a) Probabilistic solution ($n_i = 34$, $\ell = 0.18$). (b) Predicted discretization error ($n_i = 34$, $\ell = 0.18$). (c) Probabilistic solution ($n_i = 43$, $\ell = 0.14$). (d) Predicted discretization error ($n_i = 43$, $\ell = 0.14$).
References

[1] Persi Diaconis. Bayesian numerical analysis. In Statistical Decision Theory and Related Topics IV, pages 163–175. Springer, New York, NY, 1988.
[2] Philipp Hennig, Michael A. Osborne, and Mark Girolami. Probabilistic numerics and uncertainty in computations. Proceedings of the Royal Society A, 471(2179):20150142, 2015.
[3] A. O'Hagan. Bayes–Hermite quadrature. Journal of Statistical Planning and Inference, 29(3):245–260, 1991.
[4] M. Kennedy. Bayesian quadrature with non-normal approximating functions. Statistics and Computing, 8(4):365–375, 1998.
[5] T. P. Minka. Deriving quadrature rules from Gaussian processes. Technical report, Department of Statistics, Carnegie Mellon University, 2000.
[6] R. Haylock and A. O'Hagan. On inference for outputs of computationally expensive algorithms with uncertainty on the inputs, pages 629–637. Oxford University Press, Oxford, 1996.
[7] J. Oakley and A. O'Hagan. Bayesian inference for the uncertainty distribution of computer model outputs. Biometrika, 89(4):769–784, 2002.
[8] W. Becker, J. E. Oakley, C. Surace, P. Gili, J. Rowson, and K. Worden. Bayesian sensitivity analysis of a nonlinear finite element model. Mechanical Systems and Signal Processing, 32:18–31, 2012.
[9] A. Daneshkhah and T. Bedford. Probabilistic sensitivity analysis of system availability using Gaussian processes. Reliability Engineering & System Safety, 112:82–93, 2013.
[10] P. Hennig. Probabilistic interpretation of linear solvers. SIAM Journal on Optimization, 25(1), 2015.
[11] J. Skilling. Bayesian solution of ordinary differential equations. In Maximum Entropy and Bayesian Methods, volume 50, pages 23–37, 1992.
[12] T. Graepel. Solving noisy linear operator equations by Gaussian processes: Application to ordinary and partial differential equations. In ICML, pages 234–241, 2003.
[13] B. Calderhead, M. Girolami, and N. D. Lawrence. Accelerating Bayesian inference over nonlinear differential equations with Gaussian processes. In Advances in Neural Information Processing Systems, 2009.
[14] O. Chkrebtii, D. A. Campbell, M. A. Girolami, and B. Calderhead. Bayesian uncertainty quantification for differential equations. arXiv preprint 1306.2365, 2013.
[15] D. Barber. On solving ordinary differential equations using Gaussian processes. arXiv preprint 1408.3807, 2014.
[16] P. Hennig and S. Hauberg. Probabilistic solutions to differential equations and their application to Riemannian statistics. In 17th International Conference on Artificial Intelligence and Statistics, 2014.
[17] M. Schober, D. K. Duvenaud, and P. Hennig. Probabilistic ODE solvers with Runge–Kutta means. In Advances in Neural Information Processing Systems, 2014.
[18] P. R. Conrad, M. Girolami, S. Särkkä, A. Stuart, and K. Zygalakis. Probability measures for numerical solutions of differential equations. arXiv preprint 1506.04592, 2015.
[19] P. Hennig. Fast probabilistic optimization from noisy gradients. In International Conference on Machine Learning, 2013.
[20] Houman Owhadi. Multigrid with rough coefficients and multiresolution operator decomposition from hierarchical information games. arXiv preprint, 2015.
[21] F. W. Ames, R. Rheinboldt, and A. Jeffrey. Numerical Methods for Partial Differential Equations. 1977.
[22] R. Adler. The Geometry of Random Fields. Society for Industrial and Applied Mathematics, 2010.
[23] S. S. Bayin. Mathematical Methods in Science and Engineering. John Wiley & Sons, Inc., 2006.
[24] Theano Development Team. Theano: A Python framework for fast computation of mathematical expressions. arXiv e-prints, abs/1605.02688, May 2016.