A COMPUTATIONAL MEASURE THEORETIC APPROACH TO INVERSE SENSITIVITY PROBLEMS II: A POSTERIORI ERROR ANALYSIS

T. BUTLER*, D. ESTEP†, AND J. SANDELIN‡

Abstract. In part one of this paper [4], we develop and analyze a numerical method to solve a probabilistic inverse sensitivity analysis problem for a given map, assuming that the map in question can be evaluated exactly. In this paper, we treat the situation in which the output of the map is determined implicitly and is difficult and/or expensive to evaluate, e.g., requiring the solution of a differential equation, and hence the output of the map is approximated numerically. The main goal is an a posteriori error estimate that can be used to evaluate the accuracy of the computed distribution solving the inverse problem, taking into account all sources of statistical and deterministic numerical error. We present a general analysis for the method and then apply the analysis to the case of a map determined by the solution of an initial value problem.

Key words. a posteriori error analysis, adjoint problem, density estimation, inverse sensitivity analysis, nonparametric density estimation, sensitivity analysis, set-valued inverse

1. Introduction. In part one of this paper [4], we develop and analyze a numerical method to solve a probabilistic inverse sensitivity analysis problem: given a specified variation and/or uncertainty in the output of a smooth map, determine variations in the input data and/or parameters that produce the observed uncertainty. We formulate this inverse problem using the Law of Total Probability after assuming that the inputs and outputs are random variables. In [4], we analyze the new method assuming that the map in question can be evaluated exactly. However, we are particularly interested in situations in which the output of the map is determined implicitly and is difficult and/or expensive to evaluate, e.g., requiring the solution of a differential equation. In this case, the output of the map is approximated numerically. In this paper, we carry out an analysis of the effects of numerical error on the computed inverse distribution. As a consequence of the analysis, we prove that the numerical error of the approximate parameter density computed from the algorithm for solving the inverse problem converges to zero as the discretization errors (both statistical and deterministic) converge to zero. But our main goal is the derivation of an a posteriori error estimate that can be used to evaluate the accuracy of the computed distribution solving the inverse problem, taking into account all sources of statistical and deterministic numerical error.

* Institute for Computational Engineering and Sciences, University of Texas at Austin, Austin, TX 78712 ([email protected]). T. Butler's work is supported in part by the Department of Energy (DE-FG02-05ER25699) and the National Science Foundation (DGE-0221595003, MSPA-CSE-0434354).
† Department of Mathematics and Department of Statistics, Colorado State University, Fort Collins, CO 80523 ([email protected]). D. Estep's work is supported in part by the Defense Threat Reduction Agency (HDTRA1-09-1-0036), Department of Energy (DE-FG02-04ER25620, DE-FG02-05ER25699, DE-FC02-07ER54909, DE-SC0001724), Lawrence Livermore National Laboratory (B573139, B584647), the National Aeronautics and Space Administration (NNG04GH63G), the National Science Foundation (DMS-0107832, DMS-0715135, DGE-0221595003, MSPA-CSE-0434354, ECCS-0700559), Idaho National Laboratory (00069249), and the Sandia Corporation (PO299784).
‡ Department of Mathematics, Colorado State University, Fort Collins, CO 80523 ([email protected]). J. Sandelin's work is supported in part by the Department of Energy (DE-FG02-04ER25620, DE-FG02-05ER25699) and the National Science Foundation (DGE-0221595003, MSPA-CSE-0434354).
1.1. The inverse problem. The problem we study in [4] is a direct inversion of the forward stochastic sensitivity problem for a deterministic model. We consider an operator q(λ) that maps values in a parameter (and data) space Λ to an output space D. We assume there is a parameter distance measure µ_Λ on Λ that is used to measure the distance between points in Λ. We assume that µ_Λ is absolutely continuous with respect to the Lebesgue measure, the volume V of Λ is finite, and µ_Λ is normalized to be a probability measure on Λ. The deterministic model can be expressed in terms of a likelihood function L(q | λ) of the output values q given the input parameter values λ, where L(q | λ) = δ(q − q(λ)) is the unit mass distribution at q = q(λ). If we specify a density σ_Λ(λ) on the parameter space Λ, then the Law of Total Probability implies

    P(q(A)) = ∫_A L(q | λ) σ_Λ(λ) dµ_Λ(λ).    (1.1)
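To make the forward relation (1.1) concrete, the following minimal Python sketch approximates the output probability of a set B by Monte Carlo, pushing samples of a parameter density through the map; the particular q and σ_Λ are illustrative assumptions, not part of the method of [4].

```python
import numpy as np

# A sketch of the forward relation (1.1): with the delta likelihood
# L(q | lam) = delta(q - q(lam)), the output probability of a set B equals
# the sigma_Lambda-measure of q^{-1}(B).  The map q and the density
# sigma_Lambda below are hypothetical stand-ins.
def q(lam):
    return lam[:, 0] ** 2 + lam[:, 1]            # illustrative smooth map on Lambda in R^2

rng = np.random.default_rng(0)
lam = rng.uniform(0.0, 1.0, size=(100_000, 2))   # samples of sigma_Lambda (uniform here)
outputs = q(lam)

# P(q in B) for B = [0.5, 1.0], approximated by the fraction of pushed-forward samples
B_lo, B_hi = 0.5, 1.0
print(np.mean((outputs >= B_lo) & (outputs <= B_hi)))
```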
The stochastic inverse sensitivity analysis problem that we study is the inversion of the integral equation (1.1). Namely, we assume that an observed probability density ρ_D(q(λ)) is given on the output value q(λ), and we seek to compute the corresponding parameter density σ_Λ(λ) that yields ρ_D(q(λ)) via (1.1).

1.2. Sources of error. In [4], we present a computational measure theoretic algorithm to represent the solution of the inverse problem (1.1) as a simple function σ_{Λ,M′}(λ) with respect to a partition {b_i}_{i=1}^{M′} of Λ. In [4], we analyze the convergence of the approximate representation to the true representation σ_{Λ,M′}(λ) on a given partition {b_i}_{i=1}^{M′}. In practice, there are two types of error that arise in the approximation of the representation σ_{Λ,M′}(λ). Statistical error is introduced if finite sampling of the observed probability density ρ_D(q(λ)) is used in the main algorithm. This type of error shows up on the left-hand side of (1.1). Finite sampling of the distribution of the random variable q(λ) is used in order to "pass" the distribution through the piecewise-linear response surface q̂(λ) constructed as part of solving the inverse problem. Given an analytic, easy to evaluate distribution function for q(λ), we need not perform any sampling. The second type of error arises when we use numerical approximations in the evaluation of the map q, e.g., as happens if q involves solving a differential equation. This means that we use approximate values of q and its gradient to form the approximate representation q̃(λ) ≈ q̂(λ). This source of error shows up in the evaluation of the likelihood function in (1.1).

2. General error analysis for the approximation algorithm. Recall the main result from [4]:

Theorem 2.1. Given a measurable set A ⊂ Λ, we can approximate P(A) using a simple function representation of σ_Λ(λ) that only requires calculations of volumes in Λ.

The constructive proof yields the computational Algorithm 1 in [4] that generates a probability P(b_i) for each cell b_i in a partition {b_i}_{i=1}^{M′} of Λ. This gives a representation of the parameter density by the simple function

    σ_{Λ,M′}(λ) = Σ_{i=1}^{M′} [P(b_i) / µ_Λ(b_i)] 1_{b_i}(λ).    (2.1)
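As an illustration of (2.1), the following sketch assembles the simple-function density from cell probabilities on a one-dimensional partition; the cells and probabilities are placeholders for the output of Algorithm 1 of [4].

```python
import numpy as np

# A sketch of the simple-function representation (2.1): the density equals
# the constant P(b_i)/mu_Lambda(b_i) on each cell b_i.  The cells form a
# uniform 1D grid and mu_Lambda is Lebesgue measure, both simplifying
# assumptions for the sketch.
def simple_function_density(edges, P):
    """Return a piecewise-constant density from cell probabilities P(b_i)."""
    widths = np.diff(edges)                      # mu_Lambda(b_i) for Lebesgue measure
    values = P / widths                          # P(b_i) / mu_Lambda(b_i)

    def sigma(lam):
        i = np.clip(np.searchsorted(edges, lam, side="right") - 1, 0, len(P) - 1)
        return values[i]

    return sigma

edges = np.linspace(0.0, 1.0, 11)                # partition {b_i} of Lambda = [0, 1]
P = np.full(10, 0.1)                             # cell probabilities (placeholders)
sigma = simple_function_density(edges, P)
print(sigma(np.array([0.05, 0.55])))             # -> [1. 1.]
```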
To simplify notation, we drop the hat on the piecewise-linear representation q̂(λ) of q(λ) produced by the algorithm.

2.1. Analysis of statistical error. We begin by examining the effect of finite sampling in the evaluation of ρ_D(q). We conduct an a posteriori analysis similar to that used for nonparametric density estimation for elliptic problems with randomly perturbed diffusion coefficients in [14]. Let F_q(t) denote the probability distribution function of q(λ). To simplify the presentation, we assume b_i is contained in a region of contours A_i induced by the simple function approximation to ρ_D(q). If no sampling is used to evaluate ρ_D(q) or F_q(t), then the algorithm yields

    P(b_i) = F_q(q(b_i)) [∫_{b_i} dµ_Λ(λ) / ∫_{A_i} dµ_Λ(λ)],  1 ≤ i ≤ M′,    (2.2)

where q(b_i) = {q(λ), λ ∈ b_i}. (If b_i ⊄ A_i, then we alter (2.2) to sum over the regions of induced contours A_j such that b_i ∩ A_j ≠ ∅.) We let F(t) denote the probability distribution on Λ that represents (2.1), where t ∈ R^d, and

    F(t) = P({λ | λ ≤ t}) = P(λ ≤ t).    (2.3)

Here the inequality λ ≤ t is considered component-wise. Using (2.2) in (2.3) gives

    F(t) = Σ_{i=1}^{M′} F_q(q(b_i)) [∫_{b_i ∩ {λ≤t}} dµ_Λ(λ) / ∫_{A_i} dµ_Λ(λ)].    (2.4)

We use a sample distribution function for F_q(t) computed from a finite collection of error free sample values {Q_1, ..., Q_N},

    F_{q,N}(t) = (1/N) Σ_{n=1}^{N} 1(Q_n ≤ t).

This leads to an approximation F_N(t) ≈ F(t) defined by

    F_N(t) = Σ_{i=1}^{M′} F_{q,N}(q(b_i)) [∫_{b_i ∩ {λ≤t}} dµ_Λ(λ) / ∫_{A_i} dµ_Λ(λ)].

The error in the distribution is bounded by

    |F(t) − F_N(t)| ≤ Σ_{i=1}^{M′} |F_q(q(b_i)) − F_{q,N}(q(b_i))| [∫_{b_i ∩ {λ≤t}} dµ_Λ(λ) / ∫_{A_i} dµ_Λ(λ)].    (2.5)

Using standard statistical arguments [14], there is a constant C > 0 such that for any ε > 0,

    sup_{t∈R} |F_q(t) − F_{q,N}(t)| ≤ C (log(ε^{−1}) / (2N))^{1/2}    (2.6)

with probability greater than 1 − ε. It is possible to prove other forms of this bound [14]. Using (2.6) in (2.5) yields, for any ε > 0,

    |F(t) − F_N(t)| ≤ C (log(ε^{−1}) / (2N))^{1/2} Σ_{i=1}^{M′} [∫_{b_i ∩ {λ≤t}} dµ_Λ(λ) / ∫_{A_i} dµ_Λ(λ)] ≤ C (log(ε^{−1}) / (2N))^{1/2}    (2.7)

with probability greater than 1 − ε.
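The two statistical ingredients above are easy to compute in practice; the sketch below forms the sample distribution function F_{q,N} and evaluates the right-hand side of (2.6) for a given failure probability, taking C = 1 as an assumption since the analysis only guarantees some C > 0.

```python
import numpy as np

# A sketch of the statistical quantities of Sec. 2.1: the sample distribution
# function built from output samples Q_1,...,Q_N, and the right-hand side of
# the bound (2.6).  The samples below are hypothetical.
def F_qN(Q, t):
    return np.mean(Q[:, None] <= t[None, :], axis=0)   # (1/N) sum 1(Q_n <= t)

def bound_2_6(N, eps, C=1.0):
    # C = 1 is an assumption; the analysis only guarantees some C > 0.
    return C * np.sqrt(np.log(1.0 / eps) / (2.0 * N))

rng = np.random.default_rng(1)
Q = rng.normal(size=2000)                        # stand-in samples of q(lambda)
t = np.linspace(-3.0, 3.0, 7)
print(F_qN(Q, t))                                # sample distribution function values
print(bound_2_6(N=2000, eps=0.05))               # holds with probability > 0.95
```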
2.2. Analysis of deterministic error. We now consider the use of an approximation q̃(λ) ≈ q(λ), and we estimate the effect on the computed value of σ_{Λ,M′}(λ). We assume that there is an a priori error bound or a posteriori error estimate for the error in q̃(λ). More precisely, the piecewise linear function q is defined on the partition {B_i} of Λ, where q(λ) = q(µ_i) + ∇q(µ_i)(λ − µ_i) on B_i and µ_i is a chosen value in B_i, and q̃(λ) = q̃(µ_i) + ∇q̃(µ_i)(λ − µ_i) on B_i. Hence the error has the form

    q(λ) − q̃(λ) = q(µ_i) − q̃(µ_i) + (∇q(µ_i) − ∇q̃(µ_i))(λ − µ_i) on B_i.    (2.8)

Hence, we require estimates or bounds for the errors in both q̃(µ_i) and ∇q̃(µ_i), respectively. The derivation of the a priori bound or a posteriori error estimate is specific to a particular map q. Below, we derive the necessary estimates for nonlinear ordinary differential equations. Similar results hold for elliptic problems [3].

The use of an approximation for q leads to an error in the computation of F(t) given in (2.4). We define the approximate sample distribution function F̃_N(t) as

    F̃_N(t) = Σ_{i=1}^{M′} F_{q,N}(q̃(b_i)) [∫_{b_i ∩ {λ≤t}} dµ_Λ(λ) / ∫_{A_i} dµ_Λ(λ)].    (2.9)

We calculate probabilities using (2.9) and seek to determine the error F(t) − F̃_N(t). We decompose the error to get

    |F(t) − F̃_N(t)| ≤ |F(t) − F_N(t)| + |F_N(t) − F̃_N(t)| = I + II,    (2.10)

where I is bounded in (2.7). For II, we assume a bound or estimate E_i for the error in q̃(b_i) on each cell b_i. For convenience, we choose the fine partition {b_i} so that for each 1 ≤ i ≤ M′, b_i ⊂ B_j for some 1 ≤ j ≤ M. Thus, for all cells b_i ⊂ B_j for a fixed j, there is the same deterministic error term associated with q̃(b_i). We let E_j, 1 ≤ j ≤ M, denote the deterministic error associated with each q̃(b_i) for all b_i ⊂ B_j. Using an argument analogous to that in [14], and writing r_i(t) = ∫_{b_i ∩ {λ≤t}} dµ_Λ(λ) / ∫_{A_i} dµ_Λ(λ),

    II ≤ Σ_{i=1}^{M′} |F_{q,N}(M_i + E) − F_{q,N}(m_i − E)| r_i(t)
       ≤ Σ_{i=1}^{M′} |F_q(M_i + E) − F_{q,N}(M_i + E)| r_i(t)
       + Σ_{i=1}^{M′} |F_q(m_i − E) − F_{q,N}(m_i − E)| r_i(t)
       + Σ_{i=1}^{M′} |F_q(M_i + E) − F_q(m_i − E)| r_i(t),

where E = max_j |E_j|, M_i = max q(b_i), and m_i = min q(b_i). Using (2.6) for the first two terms on the right-hand side of the inequality, we have that for any ε > 0,

    II ≤ C (log(ε^{−1})/(2N))^{1/2} + Σ_{i=1}^{M′} |F_q(M_i + E) − F_q(m_i − E)| r_i(t)

with probability greater than 1 − ε. Assuming Lipschitz continuity of the distribution F_q with constant L, for any ε > 0,

    II ≤ C (log(ε^{−1})/(2N))^{1/2} + L max_{1≤i≤M′}(M_i − m_i) × E

with probability greater than 1 − ε. This gives

Theorem 2.2. There exists a constant C such that for any ε > 0,

    |F(t) − F̃_N(t)| ≤ C (log(ε^{−1})/(2N))^{1/2} + L max_{1≤i≤M′}(max q(b_i) − min q(b_i)) × E    (2.11)

with probability greater than 1 − ε. If no sampling is used to evaluate ρ_D(q) or F_q(t), then

    |F(t) − F̃(t)| ≤ L max_{1≤i≤M′}(max q(b_i) − min q(b_i)) × E.    (2.12)

Note that F̃(t) is the distribution calculated using exact values of ρ_D(q), but approximate values of q.
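The right-hand side of (2.11) is directly computable once the cell extremes of q and the deterministic error bound E are available; a minimal sketch with placeholder inputs:

```python
import numpy as np

# A sketch evaluating the bound (2.11): the statistical term from (2.6) plus
# the deterministic term L * max_i (M_i - m_i) * E.  All inputs, including
# C = 1, are placeholders the user must supply from their own problem.
def theorem_2_2_bound(N, eps, L, Mi, mi, E, C=1.0):
    stat = C * np.sqrt(np.log(1.0 / eps) / (2.0 * N))   # finite-sampling term
    det = L * np.max(Mi - mi) * E                       # deterministic term
    return stat + det

Mi = np.array([1.2, 2.0, 2.9])    # max q(b_i) over each cell (illustrative)
mi = np.array([1.0, 1.7, 2.5])    # min q(b_i) over each cell (illustrative)
print(theorem_2_2_bound(N=10_000, eps=0.01, L=0.5, Mi=Mi, mi=mi, E=1e-3))
```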
3. Application to nonlinear ordinary differential equations. We apply the general error analysis to a finite dimensional map q defined implicitly by the solution of a differential equation that depends on a finite number of parameters in the model. We consider the initial value problem

    ẏ = f(y; λ_1), t > 0,    y(0) = λ_0,    (3.1)

where y ∈ R^n, f : R^{n+p} → R^n is smooth, and λ = (λ_1^⊤, λ_0^⊤)^⊤ ∈ Λ ⊂ R^d (d = p + n) are the parameters. We solve (3.1) to calculate a linear functional of the solution, or a quantity of interest,

    q(y) = ∫_0^T ⟨y, ψ⟩ dt.    (3.2)

We assume that the solution y of (3.1) depends (implicitly) on the parameters λ in a smooth way, and we denote solutions of (3.1) by y_λ and the quantity of interest by q(λ) to emphasize the implicit dependence of the quantity of interest on the parameters. The smooth dependence of solutions of (3.1) on the parameters λ implies that the dependence of the quantity of interest on λ is also smooth.

3.1. Construction of the piecewise-linear representation. Computing the gradient information is problematic for a differential equation. We use an adjoint equation and variational analysis to do this implicitly. We solve the initial value problem at a reference parameter value µ = (µ_1^⊤, µ_0^⊤)^⊤,

    ẏ_µ = f(y_µ; µ_1), t > 0,    y_µ(0) = µ_0,    (3.3)
where (y_µ, µ) is a reference point. We define the exact adjoint problem,

    −φ̇ − D_y f(y_µ; µ_1)^⊤ φ = ψ, T > t ≥ 0,    φ(T) = 0.    (3.4)

The following theorem [26] relates the value of q(λ) to q(µ) for λ near µ.

Theorem 3.1. If f(y; λ) is twice continuously differentiable with respect to both y and λ and Lipschitz continuous in both y and λ, then the quantity of interest is Fréchet differentiable at (y_µ, µ) with derivative ∇q(µ) : R^d → R given by

    ∇q(µ)[λ] = ⟨(λ_0 − µ_0), φ(0)⟩ + ∫_0^T ⟨D_{λ_1} f(y_µ; µ_1)(λ_1 − µ_1), φ⟩ dt.    (3.5)

Additionally,

    q(λ) ≈ q(µ) + ∇q(µ)[λ].    (3.6)

In the absence of numerical error,

    q(λ) ≈ ∫_0^T ⟨y_µ, ψ⟩ dt + ⟨(λ_0 − µ_0), φ(0)⟩ + ∫_0^T ⟨D_{λ_1} f(y_µ; µ_1)(λ_1 − µ_1), φ⟩ dt    (3.7)

for λ close to µ.

The global piecewise-linear approximation on the partition {B_i}_{i=1}^{M} of Λ is constructed by using the local linearization on each cell B_i to obtain

    q̂(λ) := Σ_{i=1}^{M} (q(µ_i) + ⟨∇q(µ_i), (λ − µ_i)⟩) 1_{B_i}(λ),    (3.8)

where µ_i is the reference parameter value chosen in cell B_i.
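The construction (3.8) is straightforward to implement once the values q(µ_i) and gradients ∇q(µ_i) are available (for the ODE map, from the forward solve (3.3) and the adjoint formula (3.5)); a sketch on a one-dimensional partition, with placeholder values standing in for computed ones:

```python
import numpy as np

# A sketch of the global piecewise-linear representation (3.8) on a 1D grid
# of cells B_i: on each cell q is replaced by its linearization at the
# reference point mu_i.  The values and gradients below are stand-ins.
def q_hat_factory(edges, mu, q_mu, grad_q_mu):
    def q_hat(lam):
        i = np.clip(np.searchsorted(edges, lam, side="right") - 1, 0, len(mu) - 1)
        return q_mu[i] + grad_q_mu[i] * (lam - mu[i])   # q(mu_i) + <grad q(mu_i), lam - mu_i>
    return q_hat

edges = np.linspace(0.0, 2.0, 5)                 # cells B_1,...,B_4
mu = 0.5 * (edges[:-1] + edges[1:])              # reference point per cell (midpoints)
q_mu = np.sin(mu)                                # stand-ins for computed values q(mu_i)
grad_q_mu = np.cos(mu)                           # stand-ins for adjoint-computed gradients
q_hat = q_hat_factory(edges, mu, q_mu, grad_q_mu)
print(q_hat(np.array([0.3, 1.7])))
```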
3.2. Discretization. The a posteriori error estimate uses a variational analysis after introducing an adjoint problem. The variational analysis makes it natural to write the discretization method in the finite element framework. This is not restrictive, as most common finite difference schemes can be written as a finite element method with a particular choice of quadrature for evaluating integrals. A finite element method is based on the variational formulation of the differential equation. For the differential equation,

    ẋ = g(x, t), 0 < t ≤ T,    x(0) = x_0,    (3.9)

the problem is to find x ∈ C^1([0, T]) such that

    ∫_0^t (ẋ, v) ds = ∫_0^t (g(x, s), v) ds, t > 0,    x(0) = x_0,    (3.10)

for all v ∈ C^1([0, T]). (We use g instead of f because there are several problems that have to be solved below.) We compute a solution on the interval [0, T], and we discretize the interval as 0 = t_0 < t_1 < ··· < t_N = T, with time intervals I_j = (t_{j−1}, t_j) and time steps k_j = t_j − t_{j−1}.
The finite element method produces a piecewise polynomial approximation. We use P^q(I_j) to denote the space of polynomials of degree q and less on the time interval I_j and define the space of piecewise polynomials

    V^{(q)} = {U : U|_{I_j} ∈ P^q(I_j), j = 1, 2, 3, ...}.

We consider the discontinuous Galerkin (dGq) finite element method that produces an approximate solution X ∈ V^{(q)} [9]. Since X may be discontinuous at time nodes, we define X^+_{j−1} = lim_{t↓t_{j−1}} X(t), X^−_{j−1} = lim_{t↑t_{j−1}} X(t), and [X]_{j−1} = X^+_{j−1} − X^−_{j−1}. The approximation is computed interval by interval. We set X_0 = x_0. Then we compute X ∈ P^q(t_{j−1}, t_j) successively for j = 1, 2, ..., N satisfying

    ∫_{t_{j−1}}^{t_j} (Ẋ, v) ds + ([X]_{j−1}, v^+_{j−1}) = ∫_{t_{j−1}}^{t_j} (g(X, s), v) ds, for all v ∈ P^q(t_{j−1}, t_j).    (3.11)
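As noted below, for piecewise constant trial functions (q = 0) and autonomous g, the scheme (3.11) reduces to the backward Euler method at the time nodes; a minimal sketch under that assumption, using fixed-point iteration as a stand-in for the nonlinear solver:

```python
import numpy as np

# A sketch of the dG0 case of (3.11) for a scalar autonomous problem: with
# piecewise-constant trial functions the method reduces to backward Euler,
# X_j = X_{j-1} + k_j * g(X_j), solved here by simple fixed-point iteration
# (Newton's method would be typical in practice).
def dg0(g, x0, t_nodes, iters=50):
    X = np.empty(len(t_nodes))
    X[0] = x0
    for j in range(1, len(t_nodes)):
        k = t_nodes[j] - t_nodes[j - 1]
        Xj = X[j - 1]
        for _ in range(iters):
            Xj = X[j - 1] + k * g(Xj)            # fixed-point iteration for the implicit step
        X[j] = Xj
    return X

t = np.linspace(0.0, 1.0, 101)
X = dg0(lambda x: -x, 1.0, t)                    # x' = -x, x(0) = 1
print(X[-1], np.exp(-1.0))                       # ~0.3679 vs the exact value
```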
If g(x, t) ≡ g(x) and q = 0, the dG0 approximation matches the backward Euler approximation at the time nodes. In general, we may obtain various difference schemes, e.g., the subdiagonal Padé schemes, by employing quadrature to evaluate the integrals in (3.11) [19, 28]. There is also a continuous Galerkin (cG) approximation that produces yet other classes of approximations [10]. We carry out the analysis below for the dG scheme assuming the integrals in (3.11) are computed exactly. The extension to handle quadrature and the cG method is straightforward [13].

3.3. The effect of using an approximate solution on the piecewise-linear representation. The main interest is in treating the effects of using a numerical approximation Y_µ in the linearization of the forward problem used to construct an adjoint. We define the approximate adjoint using (3.4) with "perturbed" operator D_y f(Y_µ; µ_1),

    −Φ̇ − D_y f(Y_µ; µ_1)^⊤ Φ = ψ, T > t ≥ 0,    Φ(T) = 0.    (3.12)

We assume f(y; λ) is twice continuously differentiable with respect to both y and λ, so that standard convergence results for Y_µ imply that over some (short) time interval [0, T],

    ‖D_y f(y_µ; µ_1) − D_y f(Y_µ; µ_1)‖_V ≤ K ‖y_µ − Y_µ‖_U,    (3.13)

where ‖·‖_V and ‖·‖_U are the L²([0, T]) norms of appropriate matrix and vector norms of the arguments, respectively. Let q̃(λ) denote the approximate quantity of interest calculated using (3.7) with Y_µ and Φ in place of y_µ and φ,

    q̃(λ) = ∫_0^T ⟨Y_µ, ψ⟩ dt + ⟨(λ_0 − µ_0), Φ(0)⟩ + ∫_0^T ⟨D_{λ_1} f(Y_µ; µ_1)(λ_1 − µ_1), Φ⟩ dt,    (3.14)

with error q(λ) − q̃(λ). Taking the difference of (3.7) and (3.14) gives

    q(λ) − q̃(λ) = ∫_0^T ⟨(y_µ − Y_µ), ψ⟩ dt + ⟨(λ_0 − µ_0), (φ(0) − Φ(0))⟩
                 + ∫_0^T (⟨D_{λ_1} f(y_µ; µ_1)(λ_1 − µ_1), φ⟩ − ⟨D_{λ_1} f(Y_µ; µ_1)(λ_1 − µ_1), Φ⟩) dt
                 = I + II + III.    (3.15)
Term I is a linear functional of the error y_µ − Y_µ, and it can be estimated using standard a posteriori analysis techniques as described below. Term II measures the effect of using Y_µ and Φ on the sensitivity of q(λ) to changes in the initial conditions. Term III measures the effect of using Y_µ and Φ on the sensitivity of q(λ) to changes in model parameters. The terms II and III depend linearly on the vector λ − µ. The analysis below produces estimates that also depend on this vector linearly, so that the error estimates for these terms are also linear functions of this vector. Thus, following the analysis described below for p linearly independent vectors λ − µ, we obtain a set of error estimates such that the error defined by II and III for any vector λ − µ can be written as a linear combination from this set of error estimates.

3.4. Convergence and order of accuracy. We can use straightforward a priori error analysis on (3.15) to show that |q(λ) − q̃(λ)| converges at the same order as the dGq method over a short time period under the assumption that f is twice continuously differentiable.

3.5. Estimate of the error in a quantity of interest. We compute an a posteriori error estimate using variational analysis and adjoint problems [10, 9, 7, 8, 13, 28, 6]. We begin by recalling the a posteriori estimate of the error in a quantity of interest. Let X denote the dGq approximation to (3.9) and let e = x − X, where x solves (3.9) exactly. We linearize around X in the sense of perturbing the operator to arrive at the adjoint problem

    −ϑ̇ = g′(x, X)^⊤ ϑ + ψ_1(t), T > t ≥ 0,    ϑ(T) = ψ_2,    (3.16)

where g′(x, X) = ∫_0^1 g′(sx + (1 − s)X, t) ds. For simplicity, we use g′ for g′(x, X) below. If ψ_1(t) ≡ 0, then the quantity of interest is (e(T), ψ_2). If ψ_2 = 0, then the quantity of interest is ∫_0^T (e(t), ψ_1(t)) dt.

Assume ψ_1(t) ≡ 0 in (3.16). Take the inner product of the adjoint problem with e and integrate from 0 to T to obtain

    −∫_0^T (ϑ̇, e) dt − ∫_0^T ((g′)^⊤ ϑ, e) dt = 0.    (3.17)

We decompose (3.17) into a sum of integral equations over each time interval in the discretization and integrate by parts over each interval to get

    −Σ_{n=1}^{N} (e, ϑ)|_{t_{n−1}}^{t_n} + Σ_{n=1}^{N} ∫_{t_{n−1}}^{t_n} (ϑ, ė) dt − Σ_{n=1}^{N} ∫_{t_{n−1}}^{t_n} (ϑ, g′e) dt = 0.    (3.18)

Since e = x − X might be discontinuous at the boundaries of each interval, we expand the first term in (3.18) to

    −Σ_{n=1}^{N} (e, ϑ)|_{t_{n−1}}^{t_n} = (e(0), ϑ(0)) + Σ_{n=2}^{N} ([X]_{n−1}, ϑ_{n−1}) − (e(T), ϑ(T)),    (3.19)

with ϑ_{n−1} = ϑ(t_{n−1}). Substitution of (3.19) into (3.18) and rearranging the terms yields

    (e(T), ϑ(T)) = (e(0), ϑ(0)) + Σ_{n=2}^{N} ([X]_{n−1}, ϑ_{n−1}) + Σ_{n=1}^{N} ∫_{t_{n−1}}^{t_n} (ϑ, ė) dt − Σ_{n=1}^{N} ∫_{t_{n−1}}^{t_n} (ϑ, g′e) dt.
Substituting e = x − X and using ẋ − g(x) = 0 gives

    (e(T), ϑ(T)) = (e(0), ϑ(0)) + Σ_{n=2}^{N} ([X]_{n−1}, ϑ_{n−1}) − Σ_{n=1}^{N} ∫_{t_{n−1}}^{t_n} (Ẋ − g(X), ϑ) dt.    (3.20)

Similarly, if ψ_2 = 0 and ψ_1(t) is nonzero for some t ∈ (0, T), we obtain

    ∫_0^T (e, ψ_1) dt = (e(0), ϑ(0)) + Σ_{n=2}^{N} ([X]_{n−1}, ϑ_{n−1}) − Σ_{n=1}^{N} ∫_{t_{n−1}}^{t_n} (Ẋ − g(X), ϑ) dt.    (3.21)

We summarize this as the following theorem.

Theorem 3.2. A computable estimate of the error in a quantity of interest of (3.9) is obtained by solving (3.16) and computing either (3.20) or (3.21).
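To illustrate Theorem 3.2, the sketch below evaluates the error representation for the dG0 solution of a scalar linear test problem, written so that the solution jump at t_0 is collected into the jump sum (for dG0 the term Ẋ vanishes on each interval); with the exact adjoint and exact integration the representation reproduces e(T) exactly. The test problem is an assumption made for the sketch.

```python
import numpy as np

# A sketch of the error representation of Theorem 3.2 for the dG0 (backward
# Euler) solution of x' = -x, x(0) = 1, with quantity of interest e(T).  The
# exact adjoint here is theta(t) = exp(t - T).  All solution jumps, including
# the one at t_0, are collected into the sum; with this convention and exact
# integration of the adjoint, the estimate equals e(T) up to roundoff.
T, N, x0 = 1.0, 20, 1.0
t = np.linspace(0.0, T, N + 1)
k = t[1] - t[0]

X = np.empty(N + 1)                  # X[n] = dG0 value on interval I_n; X[0] = x0
X[0] = x0
for n in range(1, N + 1):
    X[n] = X[n - 1] / (1.0 + k)      # backward Euler for g(x) = -x

theta = lambda s: np.exp(s - T)      # exact adjoint solution
est = 0.0
for n in range(1, N + 1):
    est -= (X[n] - X[n - 1]) * theta(t[n - 1])      # jump terms ([X]_{n-1}, theta_{n-1})
    est -= X[n] * (theta(t[n]) - theta(t[n - 1]))   # residual terms; Xdot = 0 on I_n

print(est, np.exp(-T) - X[N])        # estimate vs true error e(T)
```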
Details and implementation. Often Galerkin orthogonality is used to introduce a projection of the adjoint solution into the approximation space, which has the effect of "localizing" the error contributions from each time step. Also, the analysis may be altered [28] to account for the use of quadrature formulas in the computation of the approximation. This provides a way to apply the analysis to many common finite difference schemes [28]. Note that (3.20) is a computable estimate of the error provided that we can compute the adjoint solution ϑ. The first issue is that we cannot use g′(x, X) in practice, since this requires the unknown solution x. Typically, we use g′(x, X) ≈ g′(X, X) = g′(X). The effect of this approximation on the computation of ϑ can be analyzed, e.g., [11]. The error depends on the accuracy of X, so typical short time error bounds can be proved. The second issue is that in practice we solve the adjoint problem using a numerical method, typically a higher order method than that used for the forward solution. The consequence is that in practice we compute a numerical adjoint solution. We could alter the analysis below to take into account the error in the numerical adjoint solution, but this significantly complicates the presentation of the results while generally being a relatively small issue, so we do not present this.
3.6. Estimating term II. We first observe that this term is a linear functional of the error arising from replacing the exact adjoint by the approximate adjoint. We adapt the a posteriori analysis to estimate the error of this approximation. We define the adjoint to the approximate adjoint as

    ẇ = D_y f(Y_µ; µ_1)w, 0 < t ≤ T,    w(0) = (λ_0 − µ_0).
Since ẇ − D_y f(Y_µ; µ_1)w = 0, we have

    0 = ∫_0^T ⟨ẇ − D_y f(Y_µ; µ_1)w, (φ − Φ)⟩ dt
      = ⟨w(T), (φ(T) − Φ(T))⟩ − ⟨w(0), (φ(0) − Φ(0))⟩ − ∫_0^T ⟨w, (φ̇ − Φ̇)⟩ dt − ∫_0^T ⟨w, D_y f(Y_µ; µ_1)^⊤ (φ − Φ)⟩ dt
      = −⟨(λ_0 − µ_0), (φ(0) − Φ(0))⟩ + ∫_0^T ⟨w, [−φ̇ − D_y f(Y_µ; µ_1)^⊤ φ + Φ̇ + D_y f(Y_µ; µ_1)^⊤ Φ]⟩ dt.

This gives

    II = ∫_0^T ⟨w, [−φ̇ − D_y f(Y_µ; µ_1)^⊤ φ + Φ̇ + D_y f(Y_µ; µ_1)^⊤ Φ]⟩ dt.    (3.22)

By adding and subtracting D_y f(Y_µ; µ_1)^⊤ φ in the differential equation in (3.4) for the exact adjoint, we have

    −φ̇ − D_y f(Y_µ; µ_1)^⊤ φ = [D_y f(y_µ; µ_1) − D_y f(Y_µ; µ_1)]^⊤ φ + ψ.    (3.23)

Substituting (3.23) into (3.22) and using (3.12), we have

    II = ∫_0^T ⟨w, [D_y f(y_µ; µ_1) − D_y f(Y_µ; µ_1)]^⊤ Φ⟩ dt    (3.24)
       + ∫_0^T ⟨w, [D_y f(y_µ; µ_1) − D_y f(Y_µ; µ_1)]^⊤ (φ − Φ)⟩ dt.    (3.25)

We show that the second term on the right-hand side of the last equation is higher-order and estimate the first term on the right-hand side. If f(y; λ) is three-times continuously differentiable, then we use Taylor's theorem to get

    ∫_0^T ⟨w, [D_y f(y_µ; µ_1) − D_y f(Y_µ; µ_1)]^⊤ Φ⟩ dt
      ≈ ∫_0^T ⟨w, (Φ^⊤ ⊗ J) [D_y vec(D_y f(Y_µ; µ_1)^⊤)] (y_µ − Y_µ)⟩ dt
      = ∫_0^T ⟨[D_y vec(D_y f(Y_µ; µ_1)^⊤)]^⊤ (Φ^⊤ ⊗ J)^⊤ w, (y_µ − Y_µ)⟩ dt,

where Φ^⊤ ⊗ J is n × n², D_y vec(D_y f(Y_µ; µ_1)^⊤) is n² × n, J denotes the n × n identity matrix, and the vector operator vec : R^{l×m} → R^{lm} is defined by stacking the columns (in order) of a matrix to form a column vector. We let

    ψ_II = [D_y vec(D_y f(Y_µ; µ_1)^⊤)]^⊤ (Φ^⊤ ⊗ J)^⊤ w.

The first term on the right-hand side is then a linear functional of the error y_µ − Y_µ and can be estimated by Theorem 3.2.
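The rearrangement above rests on the Kronecker/vec identity vec(AXB) = (B^⊤ ⊗ A) vec(X); the following sketch checks the special case used here, M^⊤Φ = (Φ^⊤ ⊗ J) vec(M^⊤), with random stand-ins for the Jacobian difference and the adjoint value.

```python
import numpy as np

# A numerical check of the identity behind the rearrangement of term II:
# for an n x n matrix M and an n-vector Phi, M^T Phi = (Phi^T kron I) vec(M^T),
# where vec stacks columns.  This is vec(A X B) = (B^T kron A) vec(X) with
# A = I, X = M^T, B = Phi.
rng = np.random.default_rng(2)
n = 4
M = rng.standard_normal((n, n))      # stand-in for D_y f(y_mu) - D_y f(Y_mu)
Phi = rng.standard_normal(n)         # stand-in for the approximate adjoint value

vec = lambda A: A.flatten(order="F")             # column stacking
lhs = M.T @ Phi
rhs = np.kron(Phi, np.eye(n)) @ vec(M.T)         # (Phi^T kron J) vec(M^T)
print(np.allclose(lhs, rhs))                     # True
```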
We now show that the second term is higher-order. Let η = φ − Φ; then

    −η̇ = D_y f(y_µ; µ_1)^⊤ φ − D_y f(Y_µ; µ_1)^⊤ Φ, T > t ≥ 0,    η(T) = 0.    (3.26)

If Y_µ is sufficiently close to y_µ over [0, T], then

    D_y f(Y_µ; µ_1)^⊤ = D_y f(y_µ; µ_1)^⊤ + ε(t), t ∈ [0, T],    (3.27)

where ε(t) is a perturbation matrix satisfying ‖ε(t)‖ ≤ C ‖y_µ − Y_µ‖ for some C > 0 and all t ∈ [0, T]. Substituting (3.27) into (3.26) gives

    −η̇ = D_y f(y_µ; µ_1)^⊤ η + ε(t)Φ(t), T > t ≥ 0,    η(T) = 0.    (3.28)

Let Σ(t) denote the fundamental matrix of (3.28); then

    η(t) = −Σ(t) ∫_T^t [Σ(s)]^{−1} ε(s)Φ(s) ds.

This implies that

    ‖η(t)‖ ≤ ‖Σ(t)‖ ∫_0^T ‖Σ(s)^{−1}‖ ‖ε(s)‖ ‖Φ(s)‖ ds ≤ C ‖y_µ − Y_µ‖_U.    (3.29)
Here, ‖·‖_U is interpreted as before to mean the L²([0, T]) norm of a given vector norm of the argument, and C > 0 is some constant that bounds the product of sup_{t∈[0,T]} ‖Σ(t)‖, sup_{t∈[0,T]} ‖Σ(t)^{−1}‖, and sup_{t∈[0,T]} ‖Φ(t)‖. Thus, by Lipschitz continuity of the first derivatives of f(y; λ) and (3.29),

    |∫_0^T ⟨w, [D_y f(y_µ; µ_1) − D_y f(Y_µ; µ_1)]^⊤ (φ − Φ)⟩ dt| ≤ C ‖y_µ − Y_µ‖_U².

3.7. Estimating term III. Add and subtract ⟨D_{λ_1} f(Y_µ; µ_1)(λ_1 − µ_1), φ⟩ and write III = IIIa + IIIb, where

    IIIa = ∫_0^T ⟨(D_{λ_1} f(y_µ; µ_1) − D_{λ_1} f(Y_µ; µ_1))(λ_1 − µ_1), φ⟩ dt,
    IIIb = ∫_0^T ⟨D_{λ_1} f(Y_µ; µ_1)(λ_1 − µ_1), (φ − Φ)⟩ dt,

and estimate IIIa and IIIb.

Estimating term IIIa. Add and subtract ⟨(D_{λ_1} f(y_µ; µ_1) − D_{λ_1} f(Y_µ; µ_1))(λ_1 − µ_1), Φ⟩ and write IIIa = IIIaa + IIIab, where

    IIIaa = ∫_0^T ⟨(D_{λ_1} f(y_µ; µ_1) − D_{λ_1} f(Y_µ; µ_1))(λ_1 − µ_1), (φ − Φ)⟩ dt,
    IIIab = ∫_0^T ⟨(D_{λ_1} f(y_µ; µ_1) − D_{λ_1} f(Y_µ; µ_1))(λ_1 − µ_1), Φ⟩ dt.
We show that IIIaa is higher-order. We know that ‖φ − Φ‖ ≤ C ‖y_µ − Y_µ‖_U for some constant C > 0; therefore,

    |IIIaa| ≤ C ‖y_µ − Y_µ‖_U²

for some constant C > 0. Again assuming that f(y; λ) is three-times continuously differentiable,

    (D_{λ_1} f(y_µ; µ_1) − D_{λ_1} f(Y_µ; µ_1))(λ_1 − µ_1) ≈ ((λ_1 − µ_1)^⊤ ⊗ J) [D_y(vec(D_{λ_1} f(Y_µ; µ_1)))] (y_µ − Y_µ).

We substitute this estimate into IIIab so that

    IIIab ≈ ∫_0^T ⟨((λ_1 − µ_1)^⊤ ⊗ J) [D_y(vec(D_{λ_1} f(Y_µ; µ_1)))] (y_µ − Y_µ), Φ⟩ dt
          = ∫_0^T ⟨(y_µ − Y_µ), [D_y(vec(D_{λ_1} f(Y_µ; µ_1)))]^⊤ ((λ_1 − µ_1)^⊤ ⊗ J)^⊤ Φ⟩ dt.

We let

    ψ_IIIab = [D_y(vec(D_{λ_1} f(Y_µ; µ_1)))]^⊤ ((λ_1 − µ_1)^⊤ ⊗ J)^⊤ Φ.

Thus, we have represented IIIab as a linear functional of the error y_µ − Y_µ, which can be estimated by Theorem 3.2.

Estimating term IIIb. We let ψ_IIIb = D_{λ_1} f(Y_µ; µ_1)(λ_1 − µ_1), so that

    IIIb = ∫_0^T ⟨ψ_IIIb, (φ − Φ)⟩ dt.
Thus, IIIb is a linear functional of the error in the adjoint solutions, φ − Φ, and we apply Theorem 3.2. We again define an adjoint to the approximate adjoint as

    ż − D_y f(Y_µ; µ_1)z = ψ_IIIb, 0 < t ≤ T,    z(0) = 0.

We perform a standard variational argument to obtain

    IIIb = ⟨z(T), (φ(T) − Φ(T))⟩ − ⟨z(0), (φ(0) − Φ(0))⟩ − ∫_0^T ⟨z, (φ̇ − Φ̇)⟩ dt − ∫_0^T ⟨z, D_y f(Y_µ; µ_1)^⊤ (φ − Φ)⟩ dt
         = ∫_0^T ⟨z, [−φ̇ − D_y f(Y_µ; µ_1)^⊤ φ + Φ̇ + D_y f(Y_µ; µ_1)^⊤ Φ]⟩ dt,

since φ(T) − Φ(T) = 0 and z(0) = 0. Using (3.26)–(3.28) in the right-hand side above, we have

    IIIb = ∫_0^T ⟨z, [D_y f(y_µ; µ_1) − D_y f(Y_µ; µ_1)]^⊤ Φ⟩ dt + ∫_0^T ⟨z, [D_y f(y_µ; µ_1) − D_y f(Y_µ; µ_1)]^⊤ (φ − Φ)⟩ dt.
The two terms on the right-hand side are analogous to (3.24) and (3.25). The second term on the right-hand side has already been shown to be higher-order, and is therefore neglected in the estimate. The first term is estimated similarly to (3.24). We define

    ψ̃_IIIb = [D_y vec(D_y f(Y_µ; µ_1)^⊤)]^⊤ (Φ^⊤ ⊗ J)^⊤ z,

and the first term is approximated by

    ∫_0^T ⟨ψ̃_IIIb, (y_µ − Y_µ)⟩ dt,
which is a linear functional of the error y_µ − Y_µ, and is estimable by Theorem 3.2. This completes the proof of Theorem 3.3.

3.8. Summarizing the effect of deterministic error. We summarize the estimates on q(λ) − q̃(λ) in a theorem that presents the steps required to compute the a posteriori estimate in the form of an algorithm.

Theorem 3.3. Let Y_µ and Φ denote the numerical solutions to the initial value problem (3.3) and the approximate adjoint problem (3.12), respectively. Apply Theorem 3.2 to estimate term I, yielding e_0. Let p_m be the number of model parameters and p_i the number of initial conditions (p_m + p_i = p).

for i = 1, ..., p do
  if i ≤ p_m then
    Let z denote the solution to the adjoint to the approximate adjoint problem
        ż − D_y f(Y_µ; µ_1)z = D_{λ_1} f(Y_µ; µ_1)δ_i, 0 < t ≤ T,    z(0) = 0,
    where δ_i denotes the ith standard basis vector in R^{p_m}.
    Set
        ψ_IIIab = [D_y(vec(D_{λ_1} f(Y_µ; µ_1)))]^⊤ (δ_i^⊤ ⊗ J)^⊤ Φ,
        ψ̃_IIIb = [D_y vec(D_y f(Y_µ; µ_1)^⊤)]^⊤ (Φ^⊤ ⊗ J)^⊤ z.
    Solve (3.12) with data given by the above vectors and use Theorem 3.2 to compute the standard error representations given by
        e_{i1} := ∫_0^T ⟨(y_µ − Y_µ), ψ_IIIab⟩ dt,    e_{i2} := ∫_0^T ⟨ψ̃_IIIb, (y_µ − Y_µ)⟩ dt.
  else
    Let w denote the solution to the adjoint to the approximate adjoint problem
        ẇ − D_y f(Y_µ; µ_1)w = 0, 0 < t ≤ T,    w(0) = δ_i,
    where δ_i denotes the ith standard basis vector in R^{p_i}.
    Set
        ψ_II = [D_y vec(D_y f(Y_µ; µ_1)^⊤)]^⊤ (Φ^⊤ ⊗ J)^⊤ w.
    Solve (3.12) with data given by the above vector and use Theorem 3.2 to compute the standard error representation given by
        e_{i3} := ∫_0^T ⟨ψ_II, (y_µ − Y_µ)⟩ dt.
  end if
end for

Fix λ and set u := λ − µ, where u = (ν_1, ..., ν_{p_m}, ρ_1, ..., ρ_{p_i})^⊤. The error q(λ) − q̃(λ) is given by q(λ) − q̃(λ) = e_u + h.o.t., where e_u is the computable error estimate given by

    e_u = e_0 + Σ_{i=1}^{p_m} ν_i Σ_{j=1}^{2} e_{ij} + Σ_{i=1}^{p_i} ρ_i e_{i3}.

This theorem provides a means of computing the E_i required in Theorem 2.2: set E_i = max_{λ∈B_i} e_u. For convex polygonal cells B_i, the computation of E_i is straightforward, since e_u is a linear function of λ, so the maximum occurs on the boundary.
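Since e_u is linear in λ, computing E_i over a convex polygonal cell only requires evaluating e_u at the cell's vertices; a sketch with placeholder coefficients (taking the maximum of |e_u|, on the assumption that the magnitude of the estimate is the quantity of interest):

```python
import numpy as np

# A sketch of computing E_i = max over B_i of the linear error estimate e_u
# of Theorem 3.3: a linear function on a convex polygon attains its extremes
# at the vertices, so it suffices to evaluate there.  e0 and coeffs are
# illustrative placeholders for the computed error representations.
def cell_error_bound(vertices, mu, e0, coeffs):
    """vertices: (m, d) corners of the cell; e_u(lam) = e0 + coeffs . (lam - mu)."""
    vals = e0 + (vertices - mu) @ coeffs
    return np.max(np.abs(vals))

verts = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])  # cell B_i
mu = np.array([0.5, 0.5])                        # reference point mu_i
print(cell_error_bound(verts, mu, e0=1e-3, coeffs=np.array([2e-3, -1e-3])))
```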
4. Examples. We present numerical examples that highlight certain aspects of the analysis. The numerical solutions and error estimates are computed using GAASP [16]. We use a first-order discontinuous Galerkin (dG1) method for the forward solve and a second-order continuous Galerkin (cG2) method for all of the adjoint solves. We use the adaptive time step capability in GAASP and terminate time step refinement once the error estimate corrects the estimated gradient by less than 10%.

4.1. Example 1. The first example is a coupled linear system with four parameters,

    ẋ_1 = x_1/(λ_1(1+t)) − λ_2 t x_2, 0 < t ≤ T,
    ẋ_2 = x_2/(λ_3(1+t)) + λ_4 t x_1, 0 < t ≤ T,    (4.1)
    x_1(0) = x_{1,0},    x_2(0) = x_{2,0}.

The adjoint problem is

    −φ̇_1 − φ_1/(λ_1(1+t)) + λ_4 t φ_2 = ψ_1, T > t ≥ 0,
    −φ̇_2 + λ_2 t φ_1 − φ_2/(λ_3(1+t)) = ψ_2, T > t ≥ 0,    (4.2)
    φ_1(T) = φ_{1,T},    φ_2(T) = φ_{2,T}.
Computing the true errors requires knowledge of the exact φ. To this end, we choose ψ(t) = (ψ_1, ψ_2)^⊤ and φ(T) = (φ_{1,T}, φ_{2,T})^⊤ so that φ(t) = (t, 1)^⊤. In this linear example, II = 0, and we report the error estimates for term III. We take µ = (2, 2, 2, 2)^⊤, so y_µ = (√(1+t) cos(t²), √(1+t) sin(t²))^⊤. We consider both T = 3 and T = 10.

We plot the forward solutions y_µ and Y_µ for T = 3 with two different time steps in Fig. 4.1. Table 4.1 shows the error estimate results for T = 3. Since the computed error estimates tend to be accurate, we can often compute a corrected gradient by adding the error estimate to the computed (estimated) gradient. We see the improvement in the corrected gradient by comparing the fourth and last columns of Table 4.1. At T = 3, Y_µ is a good approximation of y_µ at the coarse time step of 0.2, as seen in Fig. 4.1, so the second derivative calculations involving Y_µ used in the error estimates produce accurate error estimates beginning at this time step.

We plot the forward solutions y_µ and Y_µ for T = 10 with four different time steps in Figs. 4.2–4.3. The oscillations of y_µ increase in magnitude and frequency as time increases. As seen in Table 4.2, when the error estimate is of the same order of magnitude as the estimated gradient, the estimate cannot be used to correct the gradient.
Fig. 4.1. Solutions to (4.1) for T = 3. Left two plots: Yµ,1 and Yµ,2 with a time step of 0.2. Right two plots: Yµ,1 and Yµ,2 with a time step of 0.1. (dot = true, dash = approximation)
Fig. 4.2. Solutions to (4.1) for T = 10. Left two plots: Yµ,1 and Yµ,2 with a time step of 0.2. Right two plots: Yµ,1 and Yµ,2 with a time step of 0.1. (dot = true, dash = approximation)
Fig. 4.3. Solutions to (4.1) for T = 10. Left two plots: Yµ,1 and Yµ,2 with a time step of 0.05. Right two plots: Yµ,1 and Yµ,2 with a time step of 0.025. (dot = true, dash = approximation)
Table 4.1: Example 1 results for the four partial derivatives of the quantity of interest at T = 3.

∂1 q (true value 1.71E−01):
  ∆t     est. ∂1 q   Error       IIIab       IIIb        Estimate    (est.+Est.)/true   Est./Error
  0.2    1.62E−01    8.97E−03    9.43E−03    −7.53E−07   9.43E−03    1.00               1.052
  0.1    1.69E−01    1.58E−03    1.47E−03    −7.32E−07   1.47E−03    0.999              0.931
  0.05   1.70E−01    2.21E−04    1.97E−04    2.58E−06    1.99E−04    1.00               0.904

∂2 q (true value −3.17E+00):
  0.2    −2.95E+00   −2.27E−01   −1.76E−01   1.07E−06    −1.76E−01   0.984              0.776
  0.1    −3.14E+00   −2.91E−02   −2.29E−02   −7.29E−07   −2.29E−02   0.998              0.788
  0.05   −3.17E+00   −3.54E−03   −2.81E−03   −9.58E−06   −2.82E−03   1.00               0.796

∂3 q (true value 5.36E−01):
  0.2    5.30E−01    5.82E−03    4.40E−03    −4.19E−07   4.40E−03    0.997              0.757
  0.1    5.35E−01    7.24E−04    5.59E−04    −6.41E−07   5.58E−04    1.00               0.771
  0.05   5.36E−01    8.69E−05    6.77E−05    9.55E−08    6.78E−05    1.00               0.780

∂4 q (true value 2.78E−01):
  0.2    2.45E−01    3.27E−02    3.53E−02    7.92E−07    3.53E−02    1.01               1.08
  0.1    2.72E−01    5.92E−03    5.58E−03    −1.09E−06   5.57E−03    0.999              0.941
  0.05   2.77E−01    8.36E−04    7.50E−04    −8.04E−06   7.41E−04    1.00               0.887
Table 4.2: Example 1 results for the four partial derivatives of the quantity of interest at T = 10.

∂1 q (true value −1.35E−02):
  ∆t      est. ∂1 q   Error       IIIab       IIIb        Estimate    (est.+Est.)/true   Est./Error
  0.2     6.30E−02    −7.66E−02   9.59E−03    2.20E−05    9.61E−03    −5.36              −0.126
  0.1     5.75E−02    −7.10E−02   −1.07E−01   −5.60E−06   −1.07E−01   3.66               1.51
  0.05    9.58E−03    −2.31E−02   −2.36E−02   3.40E−06    −2.36E−02   1.04               1.02
  0.025   −9.20E−03   −4.39E−03   −3.91E−03   −9.80E−06   −3.92E−03   0.965              0.893

∂2 q (true value 1.40E+01):
  0.2     −3.39E−01   1.44E+01    1.26E+01    2.07E−04    1.26E+01    0.873              0.876
  0.1     −6.37E−01   1.47E+01    5.82E+00    −3.24E−04   5.82E+00    0.370              0.397
  0.05    7.68E+00    6.34E+00    4.85E+00    −2.87E−04   4.85E+00    0.894              0.765
  0.025   1.30E+01    9.91E−01    7.88E−01    −2.53E−04   7.87E−01    0.985              0.794

∂3 q (true value 4.51E−01):
  0.2     4.63E−01    −1.29E−02   −1.14E−02   2.18E−05    −1.14E−02   1.02               0.882
  0.1     4.64E−01    −1.32E−02   −5.13E−03   −9.29E−06   −5.14E−03   1.02               0.388
  0.05    4.56E−01    −5.73E−03   −4.37E−03   −2.20E−06   −4.37E−03   1.00               0.763
  0.025   4.51E−01    −8.98E−04   −7.10E−04   −8.74E−06   −7.19E−04   1.00               0.801

∂4 q (true value −9.52E−01):
  0.2     −1.16E−01   −8.37E−01   1.11E−01    2.18E−04    1.11E−01    1.36               −0.133
  0.1     −1.78E−01   −7.75E−01   −1.17E+00   −3.28E−04   −1.18E+00   0.458              1.52
  0.05    −7.01E−01   −2.51E−01   −2.58E−01   −2.83E−04   −2.58E−01   1.01               1.03
  0.025   −9.04E−01   −4.81E−02   −4.27E−02   −2.52E−04   −4.29E−02   0.995              0.892
4.2. Example 2. The second example is a nonlinear problem with two parameters,

    ẋ = −(λ_1 + sin(λ_2 t))x², 0 < t ≤ T,    x(0) = x_0.    (4.3)

We set µ = (−0.1, 20)^⊤, so y_µ(t) = 20/(20 + 1 + (−0.1)(20)t − cos(20t)). The quantity of interest is y(T). The adjoint problem is

    −φ̇ + 2y_µ(−0.1 + sin(20t))φ = 0, T > t ≥ 0,    φ(T) = 1.    (4.4)

The solution to (4.4) is φ(t) = C(20 + 1 + 20(−0.1)t − cos(20t))², where C is chosen so that φ(T) = 1. Since (4.3) is nonlinear, we report the error estimates for both terms II and III. We show results for both T = 3.9 and T = 10. We plot the forward solutions y_µ and Y_µ and the adjoint solutions φ and Φ for T = 3.9 and T = 10 with three different time steps in Figures 4.4 and 4.5, respectively. Tables 4.3 and 4.4 show the error estimate results for T = 3.9 and T = 10, respectively.
Fig. 4.4. Solutions to (4.3) for T = 3.9 with a time step of 0.3 (left), 0.15 (middle), and 0.075 (right). Top plots: Yµ and yµ . Bottom plots: Φ and φ. (dot = true, dash = approximation)
4.3. Example 3. Consider the same problem as in Sec. 4.2. Assume λ_1 and the initial condition are fixed, restrict λ_2 to the interval [19, 21], and use the linear approximation for the quantity of interest constructed in the calculations of Example 4.2 for T = 10 with a step size of 0.02. This implies that E = 6.94, and using just one reference point we have that max q(b_i) − min q(b_i) is approximately 1357.91. Assume the output is uniformly distributed, so that L = 0.0007364, which is found by considering the output distribution on [min q(b_i), max q(b_i)]. This gives the bound on |F̃(t) − F(t)| as 6.94, which is exactly the numerical error. This should be the case for this particular example, and it checks the consistency of the above analysis. A uniformly distributed output has a uniformly distributed input, since this is a one-to-one linear map. The only error comes from the numerical error in this case, and this is reflected in the result.
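The arithmetic behind the quoted bound is immediate, since L here equals 1/(max q(b_i) − min q(b_i)) for a uniform output density on that range:

```python
# The bound (2.12) with the values quoted above: L * (max q(b_i) - min q(b_i)) * E.
# Because L = 1/1357.91 here, the bound collapses to E itself.
L, spread, E = 0.0007364, 1357.91, 6.94
print(L * spread * E)     # ~6.94, matching the numerical error
```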
Table 4.3: Example 2 results for the two partial derivatives of the solution (upper) and the adjoint solution (lower) at T = 3.9.

∂1 q (true value −7.89E+00):
  ∆t      est. ∂1 q   Error       IIIab       IIIb        Estimate    (est.+Est.)/true   Est./Error
  0.3     −4.42E+00   −3.47E+00   −4.06E+00   −1.49E+00   −5.55E+00   1.26               1.60
  0.15    −9.32E+00   1.43E+00    5.88E−01    8.18E−01    1.41E+00    1.00               0.986
  0.075   −8.11E+00   2.12E−01    8.05E−02    1.01E−01    1.82E−01    1.00               0.858

∂2 q (true values −1.94E−01, −1.93E−01, −1.93E−01):
  0.3     2.26E−01    −4.20E−01   9.44E−01    1.09E+00    2.03E+00    −11.7              −4.83
  0.15    5.62E−02    −2.50E−01   −1.91E−01   −5.68E−02   −2.48E−01   0.990              0.992
  0.075   −1.78E−01   −1.55E−02   −4.55E−03   −1.44E−02   −1.90E−02   1.02               1.22

Adjoint solution (true value φ(0) = 2.02E+00):
  ∆t      Φ(0)        Error       Estimate (II)   (Φ(0)+Est.)/φ(0)   Est./Error
  0.3     2.49E+00    −4.64E−01   1.19E+00        1.82               −2.57
  0.15    2.35E+00    −3.31E−01   −3.54E−01       0.988              1.07
  0.075   2.07E+00    −5.03E−02   −4.93E−02       1.00               0.981
Table 4.4: Example 2 results for the two partial derivatives of the solution (upper) and the adjoint solution (lower) at T = 10.

∂1 q (true value 1.52E+03):
  ∆t     est. ∂1 q   est./true   True Error   Error Estimate   (est.+Est.)/true   Error Ratio
  0.04   1.90E+03    1.25043     −3.81E+02    −7.63E+02        0.74909            2.00188
  0.02   1.56E+03    1.02465     −3.75E+01    −3.11E+01        1.00419            0.83007
  0.01   1.53E+03    1.00366     −5.57E+00    −4.23E+00        1.00088            0.75982

∂2 q (true value −1.52E+04):
  ∆t     est. ∂2 q   est./true   True Error   Term IIIab   Term IIIb   Error Estimate   (est.+Est.)/true   Error Ratio
  0.04   −1.91E+04   1.25290     3.85E+03     5.24E+02     7.08E+03    7.60E+03         0.75317            1.97601
  0.02   −1.56E+04   1.02488     3.78E+02     4.93E+01     2.60E+02    3.10E+02         1.00452            0.81829
  0.01   −1.53E+04   1.00369     5.61E+01     5.98E+00     3.61E+01    4.21E+01         1.00092            0.75009

Adjoint solution (true value φ(0) = 6.66E+02):
  ∆t     Φ(0)       Φ(0)/φ(0)   True Error   Term IIIab   Term IIIb    Error Estimate   (Φ(0)+Est.)/φ(0)   Error Ratio
  0.04   7.03E+02   1.05505     −3.67E+01    1.17E+03     −1.12E+03    4.73E+01         1.12609            −1.29028
  0.02   6.79E+02   1.01928     −1.28E+01    8.71E+01     −9.40E+01    −6.94E+00        1.00886            0.54057
  0.01   6.68E+02   1.00328     −2.19E+00    1.02E+01     −1.17E+01    −1.48E+00        1.00106            0.67836
Fig. 4.5. Solutions to (4.3) for T = 10 with a time step of 0.04 (left), 0.02 (middle), and 0.01 (right). Top plots: Yµ and yµ for time interval [9, 10]. Bottom plots: Φ and φ. (dot = true, dash = approximation)
5. An error indicator for the accuracy of the generalized linear contours. We show in [4] that the generalized linear contours defining the approximate set-valued inverse to (1.1) converge pointwise to the generalized contours as the "size" of the largest cell B_i decreases to zero. It is useful to have an "indicator" to guide the choice of cells {B_i}. Let B_i and B_j denote cells used in (3.8) such that i ≠ j and the cells share a boundary. Consider a generalized contour, denoted q^{−1}(q̄), that is connected across the boundary of these cells. The generalized linear contour, q̂^{−1}(q̄), is typically discontinuous across such a boundary, i.e., on B_i and B_j,

    D_ij := [q(µ_i) + ⟨∇q(µ_i), (λ − µ_i)⟩] − [q(µ_j) + ⟨∇q(µ_j), (λ − µ_j)⟩] ≠ 0    (5.1)

for almost every λ ∈ ∂B_i ∩ ∂B_j. The generalized linear contours are simply the level sets of q̂(λ), and (5.1) gives the discontinuity of q̂(λ) at each λ ∈ ∂B_i ∩ ∂B_j, which is a measure of the smoothness of the piecewise-linear representation q̂(λ). We define

    D = max_{i,j} {D_ij : ∂B_i ∩ ∂B_j ≠ ∅},

which is computable and cheap to obtain, since (5.1) is a linear function, and if the B_i are polygonal geometric objects, then only a finite number of points need be calculated in order to determine the extreme values of (5.1) for λ ∈ ∂B_i ∩ ∂B_j. The result in [4] shows that the generalized linear contours converge pointwise to the generalized contours, hence D → 0. We claim that D is a reasonable indicator of the possible size of the error q^{−1}(q̄) − q̂^{−1}(q̄) for sufficiently small cells {B_i}.
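Computing D is a purely local, linear-algebra operation; the sketch below evaluates the jump (5.1) along the shared face of two cells in R², with illustrative values and gradients standing in for computed ones.

```python
import numpy as np

# A sketch of the jump indicator (5.1) for two neighboring cells in 2D: D_ij
# is the difference of the two local linearizations on the shared face.
# Being affine, its extreme values occur at the face's endpoints.
def face_jump(q_i, grad_i, mu_i, q_j, grad_j, mu_j, face_endpoints):
    lam = np.asarray(face_endpoints)             # endpoints of the shared face
    plane_i = q_i + (lam - mu_i) @ grad_i
    plane_j = q_j + (lam - mu_j) @ grad_j
    return np.max(np.abs(plane_i - plane_j))     # max |D_ij| over the face

# Illustrative data for two unit cells sharing the face {1} x [0, 1]:
mu_i, mu_j = np.array([0.5, 0.5]), np.array([1.5, 0.5])
q_i, q_j = 1.0, 1.1                              # stand-ins for q(mu_i), q(mu_j)
g_i, g_j = np.array([0.2, 0.0]), np.array([0.25, 0.05])
print(face_jump(q_i, g_i, mu_i, q_j, g_j, mu_j, [[1.0, 0.0], [1.0, 1.0]]))
# D is the maximum of such face jumps over all pairs of neighboring cells.
```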
For nonlinear ordinary differential equations, we have

    |q^{−1}(q̄) − q̂^{−1}(q̄)| = C ∫_0^T ⟨R_2(y_λ, y_{µ_i}; λ_1, µ_{i,1}), φ⟩ dt,

where C is a constant depending on ∇q(µ_i). Replacing i with j in the above equation gives an expression for the error in cell B_j. Therefore,

    |q^{−1}(q̄) − q̂^{−1}(q̄)| ≤ C(‖R_2(y_λ, y_{µ_i}; λ_1, µ_{i,1})‖ + ‖R_2(y_λ, y_{µ_j}; λ_1, µ_{j,1})‖).

Since D has the same bound, we consider it a measure of the error in the approximation of the generalized contours. The calculation of D is an a priori measure, computed before solving the inverse problem as described in [4]. This measure allows us to determine if (3.8) is sufficient for use in solving the inverse problem.

REFERENCES

[1] J.M. Bernardo, Reference posterior distributions for Bayesian inference, J. R. Statist. Soc., 41 (1979), pp. 113–147.
[2] P. Billingsley, Probability and Measure, John Wiley & Sons, Inc., 1995.
[3] T. Butler, Computational Measure Theoretic Approach to Inverse Sensitivity Analysis: Methods and Analysis, PhD thesis, Department of Mathematics, Colorado State University, Fort Collins, CO 80523, 2009.
[4] T. Butler and D. Estep, A computational measure theoretic approach to inverse sensitivity problems I: Basic method and analysis, SIAM J. Numer. Anal., (2010). Submitted.
[5] T. Butler and D. Estep, A measure-theoretic computational method for inverse sensitivity problems III: Multiple output quantities of interest, in preparation, 2010.
[6] V. Carey, D. Estep, A. Johansson, M. Larson, and S. Tavener, Blockwise adaptivity for time dependent problems based on coarse scale adjoint solutions, in revision, 2009.
[7] K. Eriksson, D. Estep, P. Hansbo, and C. Johnson, Introduction to adaptive methods for differential equations, in Acta Numerica, 1995, Cambridge Univ. Press, Cambridge, 1995, pp. 105–158.
[8] K. Eriksson, D. Estep, P. Hansbo, and C. Johnson, Computational Differential Equations, Cambridge University Press, Cambridge, 1996.
[9] D. Estep, A posteriori error bounds and global error control for approximation of ordinary differential equations, SIAM J. Numer. Anal., 32 (1995), pp. 1–48.
[10] D. Estep and D. French, Global error control for the continuous Galerkin finite element method for ordinary differential equations, RAIRO Modél. Math. Anal. Numér., 28 (1994), pp. 815–852.
[11] D. Estep, V. Ginting, J. Shadid, and S. Tavener, An a posteriori-a priori analysis of multiscale operator splitting, SIAM J. Numer. Anal., 46 (2008), pp. 1116–1146.
[12] D. Estep, M. J. Holst, and A. Målqvist, Nonparametric density estimation for randomly perturbed elliptic problems III: Convergence and a priori analysis, in preparation, 2008.
[13] D. Estep, M. G. Larson, and R. D. Williams, Estimating the error of numerical solutions of systems of reaction-diffusion equations, Mem. Amer. Math. Soc., 146 (2000), pp. viii+109.
[14] D. Estep, A. Målqvist, and S. Tavener, Nonparametric density estimation for randomly perturbed elliptic problems I: Computational methods, a posteriori analysis, and adaptive error control, in preparation, 2008.
[15] D. Estep, A. Målqvist, and S. Tavener, Nonparametric density estimation for randomly perturbed elliptic problems II: Applications and adaptive modeling, in preparation, 2008.
[16] D. Estep, B. McKeown, D. Neckels, and J. Sandelin, GAASP: Globally Accurate Adaptive Sensitivity Package, 2006. Write to [email protected] for information.
[17] D. Estep and D. Neckels, Fast and reliable methods for determining the evolution of uncertain parameters in differential equations, J. Comput. Phys., 213 (2006), pp. 530–556.
[18] D. Estep and D. Neckels, Fast methods for determining the evolution of uncertain parameters in reaction-diffusion equations, Comput. Methods Appl. Mech. Engrg., 196 (2007), pp. 3967–3979.
[19] D. Estep and A. Stewart, The dynamical behavior of the discontinuous Galerkin method and related difference schemes, Math. Comp., 71 (2002), pp. 1075–1103.
[20] J.P. Huelsenbeck et al., Potential applications and pitfalls of Bayesian inference of phylogeny, Syst. Biol., 51 (2002), pp. 673–688.
[21] G. Folland, Real Analysis, John Wiley & Sons, Inc., 1999.
[22] J.E. Gentle, Random Number Generation and Monte Carlo Methods, Springer, 2003.
[23] W.R. Gilks, S. Richardson, and D.J. Spiegelhalter, Markov Chain Monte Carlo in Practice, CRC Press, 1995.
[24] J. Kaipio and E. Somersalo, Statistical and Computational Inverse Problems, Springer, 2005.
[25] D.C. Knill and W. Richards, Perception as Bayesian Inference, Cambridge University Press, 1996.
[26] D. Neckels, Variational Methods for Uncertainty Quantification, PhD thesis, Department of Mathematics, Colorado State University, Fort Collins, CO 80523, 2005.
[27] C.P. Robert and G. Casella, Monte Carlo Statistical Methods, Springer, 2004.
[28] J. Sandelin, Global Estimate and Control of Model, Numerical, and Parameter Error, PhD thesis, Department of Mathematics, Colorado State University, Fort Collins, CO 80523, 2006.
[29] R.J. Serfling, Approximation Theorems of Mathematical Statistics, John Wiley & Sons, Inc., 1980.
[30] A. Tarantola, Inverse Problem Theory and Methods for Model Parameter Estimation, SIAM, 2005.