51st IEEE Conference on Decision and Control December 10-13, 2012. Maui, Hawaii, USA
Optimization-Based Estimation of Random Distributed Parameters in Elliptic Partial Differential Equations

Jeff Borggaard and Hans-Werner van Wyk

J. Borggaard is with the Interdisciplinary Center for Applied Mathematics, Virginia Tech, Blacksburg, VA 24061, USA ([email protected]). H.-W. van Wyk is with the Department of Scientific Computing, The Florida State University, Tallahassee, FL 32306 ([email protected]). This research was partially supported by the National Science Foundation under contracts DMS-0513542 and EAR-0943415 and the Air Force Office of Scientific Research under contract FA9550-12-1-0173.

Abstract— As simulation continues to replace experimentation in the design cycle, the need to quantify uncertainty in model outputs due to uncertainties in the model parameters becomes critical. For distributed parameter models, current approaches assume the mean and variance of parameters are known, then use recently developed efficient numerical methods for approximating stochastic partial differential equations. However, the statistical descriptions of the model parameters are rarely known. A number of recent works have investigated adapting existing variational methods for parameter estimation to account for parametric uncertainty. In this paper, we formulate the parameter identification problem as an infinite dimensional constrained optimization problem for which we establish existence of minimizers and the first order necessary conditions. A spectral approximation of the uncertain observations (via a truncated Karhunen-Loève expansion) allows an approximation of the infinite dimensional problem by a smooth, albeit high dimensional, deterministic optimization problem, the so-called 'finite noise' problem, in the space of functions with bounded mixed derivatives. We prove convergence of 'finite noise' minimizers to the corresponding infinite dimensional solutions, and devise a gradient based strategy for locating these numerically. Lastly, we illustrate our method with a numerical example.
I. INTRODUCTION

We discuss a variational approach to estimating the parameter $q$ in the elliptic system
$$-\nabla\cdot\big(q\nabla u\big) = f \quad \text{on } D \subset \mathbb{R}^d, \qquad u = 0 \quad \text{on } \partial D, \tag{1}$$
based on noisy measurements $\hat u$ of $u$, when $q$ is modeled as a spatially varying random field. Variational formulations, where the identification problem is posed as a constrained optimization, have been studied extensively for the case when $q$ is deterministic, cf. [1], [2], [3], [4], [5]. The aleatoric uncertainty in these problems, arising from imprecise, noisy measurements, variability in operating conditions, or unresolved scales, is traditionally modeled as perturbations and addressed by means of regularization techniques. These techniques approximate the original inverse problem by one in which the parameter depends continuously on the data $\hat u$, thus ensuring an estimation error commensurate with the noise level. However, when a statistical model for uncertainty in the system is available, it is desirable to incorporate this information more directly into the estimation framework to
obtain an approximation not only of $q$ itself but also of its probability distribution.

Bayesian methods provide a sampling-based approach to statistical parameter identification problems with random observations $\hat u$. By relating the observation noise in $\hat u$ to the uncertainty associated with the estimated parameter via Bayes' Theorem [6], [7], these methods allow us to sample directly from the joint distribution of $q$ at a given set of spatial points, through repeated evaluation of the deterministic forward model. The computational efficiency of numerical implementations of Bayesian methods, most notably Markov chain Monte Carlo schemes, depends predominantly on the statistical complexity of the input $q$ and the measured output $\hat u$, as well as the cost of evaluating the forward model.

There has also been continued interest in adapting variational methods to estimate parameter uncertainty, e.g. [8], [9], [10], [11]. Benefits include a well-established infrastructure of existing theory and algorithms, the possibility of incorporating multiple statistical influences arising, for instance, from uncertainty in boundary conditions or source terms, and clearly defined convergence criteria. The present work follows this approach.

Thus, let $(\Omega, \mathcal{F}_{\hat u}, d\omega)$ be a complete probability space and suppose we have a statistical model of the measured data $\hat u$ in the form of a random field $\hat u = \hat u(x,\omega)$ contained in the tensor product space $\tilde H_0^1(D) := H_0^1(D) \otimes L^2(\Omega)$. A least squares formulation of the parameter identification problem in (1), when $q(x,\omega)$ is a random field, may take the form
$$\min_{(q,u)\in \tilde H \times \tilde H_0^1} J(q,u) := \frac{1}{2}\|u - \hat u\|^2_{\tilde H_0^1} + \frac{\beta}{2}\|q\|^2_{\tilde H} \quad \text{s.t. } q \in Q_{ad},\ e(q,u) = 0, \tag{P}$$
where the regularization term with $\beta > 0$ is added to ensure continuous dependence of the minimizer on the data $\hat u$. Here $\tilde H := H(D)\otimes L^2(\Omega)$, where $H(D)$ is any Hilbert space that imbeds continuously in $L^\infty(D)$, which may be taken to be the Sobolev space $H^1(D)$ when $d = 1$ or $H^2(D)$ when $d = 2, 3$ (see [2]). The feasible set $Q_{ad}$ is given by
$$Q_{ad} = \big\{ q \in \tilde H : 0 < q_{min} \le q(x,\omega) \text{ a.s. on } D\times\Omega,\ \|q(\cdot,\omega)\|_H \le q_{max} \text{ a.s. on } \Omega \big\},$$
while the stochastic equality constraint $e(q,u) = 0$ can be written in its weak form as
$$\int_\Omega \int_D q(x,\omega)\,\nabla u(x,\omega)\cdot\nabla \phi(x,\omega)\, dx\, d\omega = \int_\Omega \int_D f(x)\,\phi(x,\omega)\, dx\, d\omega \quad \forall \phi \in \tilde H_0^1, \tag{2}$$
where $u \in \tilde H_0^1(D)$ [12]. We assume for the sake of simplicity that $f \in L^2(D)$ is deterministic.
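To make the constraint $e(q,u) = 0$ concrete, the following minimal sketch (not from the paper) solves a single deterministic realization of (1) on $D = (0,1)$ with piecewise-linear finite elements; the grid, the sample conductivity, and the constant source are illustrative assumptions.

```python
import numpy as np

def solve_forward(q_mid, f_nodes, x):
    """Piecewise-linear FEM solve of -(q u')' = f on (0, 1) with u(0) = u(1) = 0.

    q_mid   : conductivity at element midpoints, shape (n_el,)
    f_nodes : source at the nodes, shape (n_el + 1,)
    x       : node coordinates, shape (n_el + 1,)
    """
    n = len(x) - 1
    h = np.diff(x)
    A = np.zeros((n + 1, n + 1))
    b = np.zeros(n + 1)
    for e in range(n):
        # element stiffness: (q_e / h_e) * [[1, -1], [-1, 1]]
        A[e:e + 2, e:e + 2] += (q_mid[e] / h[e]) * np.array([[1.0, -1.0], [-1.0, 1.0]])
        # trapezoidal load vector on the element
        b[e] += 0.5 * h[e] * f_nodes[e]
        b[e + 1] += 0.5 * h[e] * f_nodes[e + 1]
    u = np.zeros(n + 1)
    u[1:-1] = np.linalg.solve(A[1:-1, 1:-1], b[1:-1])   # homogeneous Dirichlet BCs
    return u

if __name__ == "__main__":
    x = np.linspace(0.0, 1.0, 65)
    xm = 0.5 * (x[:-1] + x[1:])
    q_sample = 3.0 + xm**2          # one realization of a random conductivity (assumed)
    f = np.ones_like(x)             # illustrative deterministic source
    u = solve_forward(q_sample, f, x)
    print("max u =", u.max())
```

In the stochastic setting, one such deterministic solve is performed per realization (or per collocation point) of $q(\cdot,\omega)$.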
This formulation poses a number of theoretical as well as computational challenges. The lack of smoothness of the random field $q = q(x,\omega)$ in its stochastic component $\omega$ limits the regularity of the equality constraint as a function of $q$, making it difficult to reuse the theory from the deterministic case to establish first order necessary optimality conditions, as will be shown in Section II. The most significant computational hurdle is the need to approximate the stochastic integrals required when evaluating the cost functional $J$ and when imposing the equality constraint (2). Monte Carlo type schemes are inefficient, especially when compared with Bayesian methods. However, the recent success of stochastic Galerkin methods [12], [13] and stochastic collocation approaches [12], [14] in efficiently estimating the integrals appearing in stochastic (forward) problems has motivated investigations into their potential use for inverse and design problems.

In forward simulations, collocation methods make use of spectral expansions, such as the Karhunen-Loève (KL) series, to approximate the known input random field $q$ by a smooth function of finitely many random variables, a so-called 'finite noise' approximation. Standard PDE regularity theory [12] ensures that the corresponding model output $u$ depends smoothly (even analytically) on these random variables. This facilitates the use of high-dimensional quadrature techniques based on sparse grid interpolation by high order global polynomials. Inverse problems, on the other hand, are generally ill-posed, and consequently any smoothness of a 'finite noise' approximation of the given measured data $\hat u$ does not necessarily carry over to the unknown parameter $q$. In variational formulations, explicit assumptions should therefore be made on the smoothness of 'finite noise' approximations of $q$ to facilitate efficient implementation, while also accurately estimating problem (P).

We approximate (P) in the space of functions with bounded mixed derivatives. Posing the 'finite noise' minimization problem (P^n) in this space not only guarantees that the equality constraint $e(q,u)$ is twice Fréchet differentiable in $q$ (see Section IV), but also allows for the use of numerical discretization schemes based on hierarchical finite elements, approximations known for their effectiveness in mitigating the curse of dimensionality [15]. In [10], Zabaras et al. demonstrate the use of piecewise linear hierarchical finite elements to approximate 'finite noise' design parameters in least squares formulations of heat flux control problems subject to system uncertainty with gradient-based methods.

This paper aims to provide a rigorous framework within which to analyze and numerically approximate problems of the form (P) and is structured as follows. In Section II, we establish existence of minimizers and first order necessary optimality conditions for the infinite dimensional problem (P). Section III makes use of standard regularization theory to analytically justify the approximation of (P) by the 'finite noise' problem (P^n), while Section IV discusses existence and first order necessary optimality conditions for (P^n) and outlines a descent method for finding the solution. Finally, a numerical example in Section V illustrates the implementation of the method.

II. THE INFINITE DIMENSIONAL PROBLEM

In order to accommodate the lack of smoothness of $q$ as a function of $\omega$ in our analysis, we impose the inequality constraints uniformly in the random space. Any function $q$ in the feasible set $Q_{ad}$ satisfies the norm bound $\|q(\cdot,\omega)\|_H \le q_{max}$ uniformly on $\Omega$, which, by the continuous imbedding of $H(D)$ into $L^\infty(D)$, implies $0 < q_{min} \le q(x,\omega) \le q_{max}$ for all $(x,\omega) \in D\times\Omega$. This assumption, while ruling out unbounded processes, nevertheless reflects actual physical constraints. The uniform coercivity condition $0 < q_{min} \le q(x,\omega)$ guarantees that for each $q \in Q_{ad}$ there exists a unique solution $u = u(q) \in \tilde H_0^1(D)$ to the weak form (2) of the equality constraint $e(q,u) = 0$ [16], satisfying the bound
$$\|u\|_{\tilde H_0^1} \le \frac{C_D}{q_{min}}\,\|f\|_{L^2}. \tag{3}$$
Hence all $q \in Q_{ad}$ and their respective model outputs $u(q)$ have statistical moments of all orders.

A. Existence of Minimizers

An explicit stability estimate of $u(q)$ in terms of the $L^p(D\times\Omega)$ norm of $q$ was given in [12], [16] for $2 < p \le \infty$. These norms, besides lacking a Hilbert space structure, give rise to topologies that are too weak for our purposes. The following lemmas establish the weak compactness of the feasible set, the continuity of the solution mapping $q \mapsto u(q)$ restricted to $Q_{ad}$, as well as the weak closedness of its graph in the stronger $\tilde H$ norm, and will be used to prove the existence of solutions to (P).

Lemma II.1  The set $Q_{ad}$ is closed, convex, and hence weakly compact in $\tilde H$.

Proof: The convexity of $Q_{ad}$ is easily verified. To show that $Q_{ad}$ is closed, let $\{q^n\} \subset Q_{ad}$ and $q \in \tilde H$ be such that
$$\|q^n - q\|^2_{\tilde H} = \int_\Omega \|q^n(\cdot,\omega) - q(\cdot,\omega)\|^2_H \, d\omega \to 0 \quad \text{as } n \to \infty,$$
and since convergence in $L^2(\Omega, d\omega)$ implies pointwise almost sure (a.s.) convergence of a subsequence on $\Omega$, we have
$$\|q^{n_k}(\cdot,\omega) - q(\cdot,\omega)\|_H \to 0 \quad \text{a.s. on } \Omega$$
for some subsequence $\{q^{n_k}\} \subset Q_{ad}$. Additionally, $\|q^{n_k}(\cdot,\omega)\|_H \le q_{max}$ a.s. on $\Omega$ for $k \in \mathbb{N}$ and therefore $q$ also satisfies this constraint. Finally, $H(D)$ imbeds continuously in $L^\infty(D)$, from which it follows that the subsequence $\{q^{n_k}\}$ converges to $q$ pointwise a.s. on $D\times\Omega$, ensuring that $q$ also satisfies the pointwise constraint $q(x,\omega) \ge q_{min}$ a.s. on $D\times\Omega$. ∎

Lemma II.2  The mapping $u : q \in Q_{ad} \mapsto u(q) \in \tilde H_0^1$ is continuous.

Proof: Suppose $q^n \to q$ in $Q_{ad}$. As in the proof of the previous lemma, there exists a subsequence $q^{n_k} \to q$ pointwise a.s. on $D\times\Omega$. The upper bound on the function $u$ established in [12, p. 1261] ensures that
$$\|u(q^{n_k}) - u(q)\|_{\tilde H_0^1} \le \frac{C_D\,\|f\|_{L^2}}{q_{min}^2}\, \|q^{n_k} - q\|_{L^\infty(\Omega; L^\infty(D))},$$
where $C_D$ is the constant appearing in the Poincaré inequality on $D$. Furthermore, since any subsequence of $u(q^n)$ has a subsequence converging to $u(q)$, we conclude $u(q^n) \to u(q)$. ∎

Lemma II.3  The graph $\{(q,u) \in \tilde H \times \tilde H_0^1 : q \in Q_{ad},\ u = u(q)\}$ of $u$ is weakly closed.

Proof: Due to space limitations, a detailed proof is omitted. The proof is based on showing that a weakly convergent sequence $(q^n, u(q^n)) \rightharpoonup (q,u)$ has $q \in Q_{ad}$ by Lemma II.1 and satisfies $e(q,u) = 0$. The latter condition uses the fact that $e(q^n, u(q^n)) = 0$, along with the Cauchy-Schwarz inequality, the Dominated Convergence Theorem, and the fact that any $q \in Q_{ad}$ defines a norm $\|\cdot\|_q : u \mapsto \|u\|_q := \big(\int_\Omega\int_D q\,|\nabla u|^2\, dx\, d\omega\big)^{1/2}$ that is equivalent to $\|\cdot\|_{\tilde H_0^1}$, by virtue of the fact that $0 < q_{min} \le q(x,\omega) \le q_{max} < \infty$, to show that
$$\int_\Omega\int_D q\,\nabla u\cdot\nabla v\, dx\, d\omega = \lim_{n\to\infty} \int_\Omega\int_D q^n\,\nabla u^n\cdot\nabla v\, dx\, d\omega = \int_\Omega\int_D f\, v\, dx\, d\omega$$
for all $v \in \tilde H_0^1$. ∎
By combining these lemmas, we can now show that a solution $q^*$ of the infinite dimensional minimization problem (P) exists for any $\beta \ge 0$.

Theorem II.4 (Existence of Minimizers, van Wyk [19])  For each $\beta \ge 0$, the problem (P) has a minimizer.

Proof: Let $(q^n, u^n)$ be a minimizing sequence for the cost functional $J$ over $Q_{ad}\times\tilde H_0^1$, i.e.
$$\inf_{(q,u)\in Q_{ad}\times\tilde H_0^1} J(q,u) = \lim_{n\to\infty} J(q^n,u^n) = \lim_{n\to\infty}\Big[\frac{1}{2}\|u^n - \hat u\|^2_{\tilde H_0^1} + \frac{\beta}{2}\|q^n\|^2_{\tilde H}\Big].$$
Since $u^n$ satisfies the equality constraint $e(q^n,u^n) = 0$, and consequently $\|u^n\|_{\tilde H_0^1} \le \frac{1}{q_{min}}\|f\|_{L^2}$ for all $n \ge 1$ (Lax-Milgram), the Banach-Alaoglu theorem guarantees the existence of a weakly convergent subsequence $u^{n_k} \rightharpoonup u^* \in \tilde H_0^1(D)$. Moreover, the weak compactness of $Q_{ad}$ established in Lemma II.1 also yields a subsequence $q^{n_k} \rightharpoonup q^*$ as $k\to\infty$, so that $q^* \in Q_{ad}$. The fact that the infimum of $J$ is attained at the point $(q^*,u^*)$ follows directly from the weak lower semicontinuity of norms [17]. Indeed,
$$J(q^*,u^*) \le \liminf_{k\to\infty}\frac{1}{2}\|u^{n_k} - \hat u\|^2_{\tilde H_0^1} + \liminf_{k\to\infty}\frac{\beta}{2}\|q^{n_k}\|^2_{\tilde H} \le \liminf_{k\to\infty}\Big[\frac{1}{2}\|u^{n_k}-\hat u\|^2_{\tilde H_0^1} + \frac{\beta}{2}\|q^{n_k}\|^2_{\tilde H}\Big] = \inf_{(q,u)\in Q_{ad}\times\tilde H_0^1} J(q,u). \tag{4}$$
Finally, it follows from Lemma II.3 that $u^* = u(q^*)$, hence $u^*$ satisfies the equality constraint $e(q^*,u^*) = 0$. ∎

B. A Saddle Point Condition

Although solutions to (P) exist, the inherent lack of smoothness of $q$ in the stochastic variable $\omega$ complicates the establishment of traditional necessary optimality conditions. A short calculation reveals that the equality constraint $e(q,u)$ is not Fréchet differentiable as a function of $q$ in $\tilde H$. Additionally, the set of constraints has an empty interior in the $\tilde H$-norm. Instead, we follow [18] in deriving a saddle point condition for the optimizer $(q^*,u^*)$ of (P) with the help of a Hahn-Banach separation argument.

Let $\langle\cdot,\cdot\rangle$ denote the $L^2(D\times\Omega)$ inner product. For any triple $(q,u,\lambda) \in \tilde H\times\tilde H_0^1\times\tilde H_0^1$, we define the Lagrangian functional by
$$L(q,u,\lambda) = J(q,u) + \langle e(q,u),\lambda\rangle_{\tilde H^{-1},\tilde H_0^1} = \frac{1}{2}\|u-\hat u\|^2_{\tilde H_0^1} + \frac{\beta}{2}\|q\|^2_{\tilde H} + \langle q\nabla u,\nabla\lambda\rangle - \langle f,\lambda\rangle.$$
The main theorem of this subsection is the following [19].

Theorem II.5 (Saddle Point Condition)  Let $(q^*,u^*) \in Q_{ad}\times\tilde H_0^1$ solve problem (P). Then there exists a Lagrange multiplier $\lambda^* \in \tilde H_0^1$ so that the saddle point condition
$$L(q^*,u^*,\lambda) \le L(q^*,u^*,\lambda^*) \le L(q,u,\lambda^*)$$
holds for all $(q,u,\lambda) \in Q_{ad}\times\tilde H_0^1\times\tilde H_0^1$.

Proof: Note that the second inequality simply reflects the optimality of $(q^*,u^*)$. To obtain the first inequality, we rely on a Hahn-Banach separation argument. Let
$$S = \big\{\big(J(q,u) - J(q^*,u^*) + s,\ e(q,u)\big) \in \mathbb{R}\times\tilde H^{-1} : (q,u)\in Q_{ad}\times\tilde H_0^1,\ s \ge 0\big\}$$
and
$$T = \big\{(-t, 0) \in \mathbb{R}\times\tilde H^{-1} : t > 0\big\}.$$
The Hahn-Banach Theorem (justified by Lemmas II.6-II.8 below) gives rise to a separating hyperplane, i.e. a pair $(\alpha_0,\lambda_0) \ne (0,0)$ in $\mathbb{R}\times\tilde H_0^1$, such that
$$\alpha_0\big(J(q,u) - J(q^*,u^*) + s\big) + \langle e(q,u),\lambda_0\rangle_{\tilde H^{-1},\tilde H_0^1} \ge -t\,\alpha_0 \tag{5}$$
for all $t > 0$, $s \ge 0$, and $(q,u)\in Q_{ad}\times\tilde H_0^1$. Letting $s = t = 1$ and $(q,u) = (q^*,u^*)$ readily yields $\alpha_0 \ge 0$. In fact $\alpha_0 > 0$. Suppose to the contrary that $\alpha_0 = 0$. Then by (5)
$$\langle e(q,u),\lambda_0\rangle_{\tilde H^{-1},\tilde H_0^1} = \langle q\nabla u,\nabla\lambda_0\rangle - \langle f,\lambda_0\rangle \ge 0$$
for all $(q,u)\in Q_{ad}\times\tilde H_0^1$. In particular, for $q = q^*$ and $u\in\tilde H_0^1$ satisfying $\langle q^*\nabla u,\nabla\phi\rangle - \langle f - \lambda_0,\phi\rangle = 0$ for all $\phi\in\tilde H_0^1$, we have
$$\langle q^*\nabla u,\nabla\lambda_0\rangle - \langle f,\lambda_0\rangle = -\langle\lambda_0,\lambda_0\rangle \ge 0,$$
which implies that $\lambda_0 = 0$. This contradicts the fact that $(\alpha_0,\lambda_0)\ne(0,0)$. Dividing (5) by $\alpha_0$ and letting $\lambda^* = \lambda_0/\alpha_0$ yields $J(q^*,u^*) \le J(q,u) + \langle e(q,u),\lambda^*\rangle_{\tilde H^{-1},\tilde H_0^1}$ for all $(q,u)\in Q_{ad}\times\tilde H_0^1$, and hence
$$L(q^*,u^*,\lambda^*) = J(q^*,u^*) + \langle e(q^*,u^*),\lambda^*\rangle_{\tilde H^{-1},\tilde H_0^1} \le J(q,u) + \langle e(q,u),\lambda^*\rangle_{\tilde H^{-1},\tilde H_0^1} = L(q,u,\lambda^*)$$
for all $(q,u,\lambda)\in Q_{ad}\times\tilde H_0^1\times\tilde H_0^1$. ∎

Lemma II.6  The sets $S$ and $T$ are convex.

Lemma II.7  The sets $S$ and $T$ are disjoint.

Proof: This follows directly from the fact that $J(q,u) \ge J(q^*,u^*)$ for all points $(q,u)$ in $Q_{ad}\times\tilde H_0^1$. ∎

Lemma II.8  The set $S$ has a non-empty interior.

In the following section, we will show that if the observed data $\hat u$ is expressed as a Karhunen-Loève series [20], [21], we may approximate problem (P) by a 'finite noise' optimization problem (P^n), where $q$ is a smooth, albeit high-dimensional, function of $x$ and intermediary random variables $\{Y_i\}_{i=1}^n$. The convergence framework not only informs the choice of numerical discretization, but also suggests the use of a dimension-adaptive scheme to exploit the progressive 'smoothing' of the problem.

III. APPROXIMATION BY THE FINITE NOISE PROBLEM

According to [20], the random field $\hat u$ may be written as the Karhunen-Loève (KL) series
$$\hat u(x,\omega) = u_0(x) + \sum_{n=1}^\infty \sqrt{\lambda_n}\,\phi_n(x)\,Y_n(\omega), \tag{6}$$
where $\{Y_n\}_{n=1}^\infty$ is an uncorrelated orthonormal sequence of random variables with zero mean and unit variance and $(\lambda_n,\phi_n)$ is the eigenpair sequence of $\hat u$'s compact covariance operator $C : H_0^1(D)\to H_0^1(D)$ [21]. Moreover, the truncated series
$$\hat u^n(x,\omega) = u_0(x) + \sum_{i=1}^n \sqrt{\lambda_i}\,\phi_i(x)\,Y_i(\omega)$$
converges to $\hat u$ in $\tilde H_0^1$, i.e. $\|\hat u - \hat u^n\|_{\tilde H_0^1} \to 0$ as $n\to\infty$. Assume w.l.o.g. that $\{Y_i\}_{i=1}^\infty$ forms a complete orthonormal basis for $L_0^2(\Omega)$, the set of functions in $L^2(\Omega)$ with zero mean. If this is not the case, we can restrict ourselves to $L_0^2(\Omega)\cap\overline{\mathrm{span}}\{Y_i\}$. The following additional assumption relates to the form of the random vector.

Assumption III.1  Assume the random variables $\{Y_n\}$ are bounded uniformly in $n$, i.e. for some $y_{min}, y_{max}\in\mathbb{R}$,
$$y_{min} \le Y_n(\omega) \le y_{max} \quad \text{a.s. on } \Omega \text{ for all } n\in\mathbb{N}.$$
Furthermore, assume that for any $n$ the probability measure of the random vector $Y = (Y_1,\ldots,Y_n)$ is absolutely continuous with respect to the Lebesgue measure and hence $Y$ has joint density $\rho_n : \Gamma^n \to [0,\infty)$, where the hypercube $\Gamma^n := \prod_{i=1}^n \Gamma_i \subseteq [y_{min},y_{max}]^n$ denotes the range of $Y$.

Since $\hat u^n$ depends on $\omega$ only through the intermediary variables $\{Y_i\}_{i=1}^n$, it seems reasonable to also estimate the unknown parameter $q^n$ as a function of these, i.e. $q^n(x,\omega) = q^n(x, Y_1(\omega),\ldots,Y_n(\omega)) \in \tilde H(\Gamma^n) := H(D)\otimes L_\rho^2(\Gamma^n)$. The corresponding model output $u^n = u(q^n)$, depending continuously on the parameter $q^n$, would then also be a function of $Y = (Y_1,\ldots,Y_n)$, according to the Doob-Dynkin lemma. The 'finite noise' equality constraint $e^n(q^n,u^n) = 0$ now takes the form of the parameterized partial differential equation [16]
$$-\nabla\cdot\big(q^n(x,y)\,\nabla u^n(x,y)\big) - f(x) = 0 \ \text{ a.s. on } D\times\Gamma^n, \qquad u^n(x,y) = 0 \ \text{ a.s. on } \partial D\times\Gamma^n.$$

This leaves us free to specify the regularity of $q^n$ as a function of the vector $y = (y_1,\ldots,y_n)\in\Gamma^n\subseteq\mathbb{R}^n$. At the very least, $q^n$ should be square integrable in $y$. Moreover, seeing that $\{Y_n\}_{n=1}^\infty$ forms a basis for $L_0^2(\Omega)$, the minimizer $q^*$ of the original infinite dimensional problem (P) takes the form
$$q^*(x,\omega) = q_0^*(x) + \sum_{n=1}^\infty q_n^*(x)\,Y_n(\omega),$$
which is linear in each of the random variables $Y_n$. Any minimizer $q_n$ of (P^n) that approximates $q^*$ (even in the weak sense) is therefore expected to depend relatively smoothly on $y$ when $n$ is large. At low orders of approximation, on the other hand, the parameter $q$ that gives rise to the model output $u(q)$ most closely resembling the partial data $\hat u^n$ may not exhibit the same degree of smoothness in the variable $y = (y_1,\ldots,y_n)$. Since the accuracy of function approximation in high dimensions benefits greatly from a high degree of smoothness [22], this suggests the use of a dimension adaptive strategy in which the smoothness requirement on the parameter is gradually strengthened as the stochastic dimension $n$ increases.

For the sake of our analysis, we seek 'finite noise' minimizers $q_n$ in the space $\tilde H^s_{mix} := H(D)\otimes H^s_{mix}(\Gamma^n)$, where $H^s_{mix}(\Gamma^n)$ is the space of functions with bounded mixed derivatives, $s \ge 1$ [15]. For any function $v \in \tilde H^s_{mix} \subseteq L^2(D\times\Gamma^n)$, the norm
$$\|v\|^2_{\tilde H^s_{mix}} := \sum_{|\gamma|_\infty \le s}\ \sum_{|\alpha|_1 \le t_d} \int_{\Gamma^n}\int_D \big|D_y^\gamma D_x^\alpha v(x,y)\big|^2\,\rho_n(y)\,dx\,dy \tag{7}$$
is finite, where $\gamma = (\gamma_1,\ldots,\gamma_n)\in\mathbb{N}^n$ and $\alpha = (\alpha_1,\ldots,\alpha_d)\in\mathbb{N}^d$ are multi-indices, with $|\gamma|_\infty = \max\{\gamma_1,\ldots,\gamma_n\}$, $|\alpha|_1 = \alpha_1+\cdots+\alpha_d$, and $t_d = 1$ when $d = 1$ or $t_d = 2$ when $d = 2,3$.

We can now formulate a 'finite noise' least squares parameter estimation problem for the perturbed, finite noise data $\hat u^n$:
$$\min_{(q,u)\in\tilde H^s_{mix}\times\tilde H_0^1} J(q,u) := \frac{1}{2}\|u - \hat u^n\|^2_{\tilde H_0^1} + \frac{\beta_n}{2}\|q\|^2_{\tilde H^s_{mix}} \quad \text{s.t. } q\in Q^n_{ad},\ e^n(q,u) = 0, \tag{P^n}$$
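As an illustration of the truncated KL expansion (6) behind the 'finite noise' data $\hat u^n$, the following sketch discretizes a covariance kernel on a 1-D grid and assembles samples from the leading eigenpairs. The exponential kernel, its parameters, the Nyström-type discretization, and the uniform $Y_i$ are assumptions made purely for illustration; the paper works with the covariance operator of $\hat u$ on $H_0^1(D)$.

```python
import numpy as np

def truncated_kl(x, cov, n_terms):
    """Truncated Karhunen-Loeve expansion of a random field on a 1-D grid.

    x       : grid points, shape (m,)
    cov     : covariance kernel, cov(x1, x2) -> array (vectorized)
    n_terms : number of KL modes to retain
    Returns (lam, phi): eigenvalues (n_terms,) and modes (m, n_terms).
    """
    w = np.gradient(x)                        # trapezoid-like quadrature weights
    C = cov(x[:, None], x[None, :])           # covariance matrix C_ij = c(x_i, x_j)
    wh = np.sqrt(w)
    lam, V = np.linalg.eigh(wh[:, None] * C * wh[None, :])   # symmetric eigenproblem
    idx = np.argsort(lam)[::-1][:n_terms]     # keep the largest eigenvalues
    return lam[idx], V[:, idx] / wh[:, None]  # undo the weight scaling of the modes

if __name__ == "__main__":
    x = np.linspace(0.0, 1.0, 129)
    u0 = x * (1.0 - x)                                        # illustrative mean field
    cov = lambda s, t: 0.01 * np.exp(-np.abs(s - t) / 0.2)    # assumed kernel
    lam, phi = truncated_kl(x, cov, n_terms=4)

    # 'finite noise' samples: u_hat^n(x, Y) = u0(x) + sum_i sqrt(lam_i) phi_i(x) Y_i
    rng = np.random.default_rng(0)
    Y = rng.uniform(-1.0, 1.0, size=(10, len(lam)))           # bounded Y_i (Assumption III.1)
    samples = u0 + (np.sqrt(lam) * Y) @ phi.T
    print("retained eigenvalues:", lam)
```

The rapid decay of the retained eigenvalues is what makes a small stochastic dimension $n$ adequate in practice.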
where
$$Q^n_{ad} := \Big\{ q\in\tilde H(\Gamma^n) : 0 < q_{min} - \tfrac{1}{k_n} \le q(x,y) \text{ a.s. on } D\times\Gamma^n,\ \|q(\cdot,y)\|_H \le q_{max} + \tfrac{1}{k_n} \text{ a.s. on } \Gamma^n \Big\}$$
and $k_n\to\infty$ is a monotonically increasing approximation parameter to be specified later.

We will justify the use of this approximation scheme by demonstrating that, as $n\to\infty$ and $\beta_n\to 0$, the sequence of minimizers $q_n$ of problem (P^n) has a weakly convergent subsequence and that the limits of all convergent subsequences minimize the infinite dimensional problem (P). Tikhonov regularization theory for non-linear least squares problems [23] provides the theoretical framework underlying the arguments in this section. In order to mediate between the minimizer $q_n$ of the finite noise problem (P^n), formulated in the $\tilde H^s_{mix}$ norm, and that of the infinite dimensional problem, whose minimizer $q^*$ is measured in the $\tilde H$ norm, we make use of the projection of $q^*$ onto the first $n$ basis vectors:
$$P^n q^* = q_0^*(x) + \sum_{i=1}^n q_i^*(x)\,Y_i(\omega).$$
Evidently, $P^n q^* \to q^*$ as $n\to\infty$ in $\tilde H$. Moreover, seeing that $P^n q^*$ is linear in $y$, its norm in $\tilde H^s_{mix}$ can be bounded in terms of its norm in $\tilde H$, as stated in the following lemma.

Lemma III.2
$$\|P^n q^*\|_{\tilde H^s_{mix}} \le \sqrt{2}\,\|P^n q^*\|_{\tilde H}. \tag{8}$$

The next lemma addresses the feasibility of $P^n q^*$. Although $P^n q^*$ does not necessarily lie in the feasible region $Q^n_{ad}$, the set on which $P^n q^* \notin Q^n_{ad}$ can be made arbitrarily small as $n\to\infty$. Let $A_n$ be the event that $P^n q^*$ lies inside the approximate feasible region $Q^n_{ad}$, i.e.
$$A_n := \Big\{\omega\in\Omega : 0 < q_{min} - \tfrac{1}{k_n} \le P^n q^*(x,\omega) \ \text{a.s. on } D,\ \ \|P^n q^*(\cdot,\omega)\|_H \le q_{max} + \tfrac{1}{k_n}\Big\}.$$

Lemma III.3  There is a monotonically increasing sequence $k_n\to\infty$ so that $P(\Omega\setminus A_n) \le \frac{1}{k_n}$ for all $n\in\mathbb{N}$.

In order to ensure strict adherence to the inequality constraints of (P^n) for every $n$, we modify $P^n q^*(\cdot,\omega)$ on $\Omega\setminus A_n$.

Definition III.4  For all $n\in\mathbb{N}$, let $\hat q^n$ be defined as follows:
$$\hat q^n := \begin{cases} P^n q^*, & \omega\in A_n,\\ q_n, & \omega\notin A_n, \end{cases}$$
where $q_n$ denotes the minimizer of (P^n). Then we have

Lemma III.5  $\hat q^n \to q^*$ in $\tilde H$ as $n\to\infty$.

Proof:
$$\|\hat q^n - q^*\|^2_{\tilde H} = \int_{A_n}\|P^n q^*(\cdot,\omega) - q^*(\cdot,\omega)\|^2_H\,d\omega + \int_{\Omega\setminus A_n}\|q_n(\cdot,\omega) - q^*(\cdot,\omega)\|^2_H\,d\omega$$
$$\le \|P^n q^* - q^*\|^2_{\tilde H} + P(\Omega\setminus A_n)\,\sup_{\omega\in\Omega}\|q_n(\cdot,\omega) - q^*(\cdot,\omega)\|^2_H \le \|P^n q^* - q^*\|^2_{\tilde H} + \frac{1}{k_n}\,4\Big(q_{max} + \frac{1}{k_1}\Big)^2 \to 0. \qquad\blacksquare$$

We are now in a position to prove the main theorem of this section. For its proof we will make use of the fact that, due to the lower semicontinuity of norms,
$$x_n \rightharpoonup x \ \text{ and } \ \limsup_{n\to\infty}\|x_n\| \le \|x\| \quad\Longrightarrow\quad x_n \to x \tag{9}$$
for any sequence $x_n$ in a Hilbert space.

Theorem III.6 (van Wyk [19])  Let $\|\hat u - \hat u^n\|_{\tilde H_0^1}\to 0$ and $\beta_n\to 0$ as $n\to\infty$. Then the sequence of minimizers $q_n$ of (P^n) has a subsequence converging weakly to a minimizer of the infinite dimensional problem (P), and the limit of every weakly convergent subsequence is a minimizer of (P). The corresponding model outputs converge strongly to the infinite dimensional minimizer's model output.

Proof: Since $q_n$ is optimal for (P^n), we have
$$\|u(q_n) - \hat u^n\|^2_{\tilde H_0^1} + \beta_n\|q_n\|^2_{\tilde H^s_{mix}} \le \|u(\hat q^n) - \hat u^n\|^2_{\tilde H_0^1} + \beta_n\|\hat q^n\|^2_{\tilde H^s_{mix}}.$$
Moreover, by definition $\hat q^n(\cdot,Y(\omega)) = q_n(\cdot,Y(\omega))$ for all $Y\in Y(\Omega\setminus A_n)$ and hence
$$\|\hat q^n\|^2_{\tilde H^s_{mix}} - \|q_n\|^2_{\tilde H^s_{mix}} = \sum_{|\gamma|_\infty\le s}\sum_{|\alpha|_1\le t_d}\Big(\int_{Y(A_n)}\int_D \big|D_y^\gamma D_x^\alpha P^n q^*\big|^2\rho_n\,dx\,dy - \int_{Y(A_n)}\int_D \big|D_y^\gamma D_x^\alpha q_n\big|^2\rho_n\,dx\,dy\Big)$$
$$\le \sum_{|\gamma|_\infty\le 1}\sum_{|\alpha|_1\le t_d}\int_{Y(A_n)}\int_D \big|D_y^\gamma D_x^\alpha P^n q^*\big|^2\rho_n\,dx\,dy \le \|P^n q^*\|^2_{\tilde H^s_{mix}} \le 2\|P^n q^*\|^2_{\tilde H}, \tag{10}$$
from which it follows that
$$\|u(q_n) - \hat u^n\|^2_{\tilde H_0^1} \le \|u(\hat q^n) - \hat u^n\|^2_{\tilde H_0^1} + 2\beta_n\|P^n q^*\|^2_{\tilde H}.$$
By Lemmas III.5 and II.2,
$$\limsup_{n\to\infty}\|u(q_n) - \hat u^n\|^2_{\tilde H_0^1} \le \lim_{n\to\infty}\Big(\|u(\hat q^n) - \hat u^n\|^2_{\tilde H_0^1} + 2\beta_n\|P^n q^*\|^2_{\tilde H}\Big) = \|u(q^*) - \hat u\|^2_{\tilde H_0^1},$$
which, together with the Banach-Alaoglu theorem, guarantees the existence of a subsequence $u(q_{n_j})$ converging weakly to some $u_0\in\tilde H_0^1$. Since the feasible sets $\{Q^n_{ad}\}_{n=1}^\infty$ form a nested sequence, all functions $q_n\in Q^n_{ad}\subseteq Q^1_{ad}$, which is weakly compact (Lemma II.1). The sequence $q_n$ therefore has a subsequence $q_{n_j}\rightharpoonup q_0\in Q^1_{ad}$ in $\tilde H$. Additionally, since $Q^n_{ad}$ is nested and the graph of $u$ is weakly closed (Lemma II.3), we have $q_0\in\bigcap_{n=1}^\infty Q^n_{ad} = Q_{ad}$ and $u_0 = u(q_0)$. Therefore
$$\|u(q_0) - \hat u\|^2_{\tilde H_0^1} = \lim_{j\to\infty}\big\langle u(q_{n_j}) - \hat u,\ u(q_0) - \hat u\big\rangle_{\tilde H_0^1} \le \liminf_{j\to\infty}\|u(q_{n_j}) - \hat u^{n_j}\|_{\tilde H_0^1}\,\|u(q_0) - \hat u\|_{\tilde H_0^1} \tag{11}$$
and
$$\limsup_{j\to\infty}\|u(q_{n_j}) - \hat u^{n_j}\|_{\tilde H_0^1} \le \|u(q^*) - \hat u\|_{\tilde H_0^1}, \tag{12}$$
which means $\|u(q_0) - \hat u\|_{\tilde H_0^1} \le \|u(q^*) - \hat u\|_{\tilde H_0^1}$ and hence $q_0\in Q_{ad}$ is a minimizer for (P). Inequalities (11) and (12) further imply
$$\lim_{j\to\infty}\|u(q_{n_j}) - \hat u^{n_j}\|_{\tilde H_0^1} = \|u(q_0) - \hat u\|_{\tilde H_0^1},$$
which, together with the weak convergence $u(q_{n_j}) - \hat u^{n_j} \rightharpoonup u(q_0) - \hat u$, implies $u(q_{n_j}) - \hat u^{n_j} \to u(q_0) - \hat u$ due to (9). In addition, the fact that $\hat u^{n_j}\to\hat u$ implies that $u(q_{n_j})\to u(q_0)$. Finally, this argument holds for any convergent subsequence of $\{q_n\}$ and hence the theorem is proved. ∎

IV. THE FINITE NOISE PROBLEM

The immediate benefit of using $\tilde H^s_{mix}$ as an approximate search space is that it imbeds continuously in $L^\infty(D\times\Gamma^n)$, regardless of the size of the stochastic dimension $n$. By virtue of the tensor product structure of $H^s_{mix}(\Gamma^n)$ we may consider Sobolev regularity component-wise, which, in conjunction with the compact imbedding of $H^1(\Gamma_i)$ in $L^\infty(\Gamma_i)$, gives rise to this property, summarized in the following lemma.

Lemma IV.1  The space $\tilde H^s_{mix}$ imbeds continuously in $L^\infty(D\times\Gamma^n)$ for all $n\in\mathbb{N}$.

A. Differentiability and Existence of Lagrange Multipliers

Consider again the left hand side $e^n(\cdot,\cdot) : \tilde H^s_{mix}\times\tilde H_0^1 \to \tilde H^{-1}$ of the equality constraint, given in variational form, i.e.
$$\langle e^n(q,u),\phi\rangle_{\tilde H^{-1},\tilde H_0^1} := \int_D\int_{\Gamma^n} q(x,y)\,\nabla u(x,y)\cdot\nabla\phi(x,y)\,\rho_n(y)\,dy\,dx - \int_D\int_{\Gamma^n} f(x)\,\phi(x,y)\,\rho_n(y)\,dy\,dx$$
for all $\phi\in\tilde H_0^1$. Since $e^n(q,u)$ is affine linear in both arguments, its Fréchet differentiability with respect to either $q$ or $u$ follows directly from its boundedness in these variables. Continuity in $u$ is straightforward. For $u,\tilde u\in\tilde H_0^1$,
$$\big|\langle e^n(q, u - \tilde u),\phi\rangle\big| = \Big|\int_D\int_{\Gamma^n} q\,\nabla(u-\tilde u)\cdot\nabla\phi\,\rho_n\,dy\,dx\Big| \le q_{max}\,|u - \tilde u|_{\tilde H_0^1}\,|\phi|_{\tilde H_0^1},$$
and hence $\|e^n(q, u - \tilde u)\|_{\tilde H^{-1}} \le q_{max}\,|u - \tilde u|_{\tilde H_0^1}$. In order to establish the continuity of $e^n(q,u)$ with respect to $q\in\tilde H^s_{mix}$, we make use of the imbedding from Lemma IV.1. Let $q,\tilde q\in\tilde H^s_{mix}$ and let $\phi\in\tilde H_0^1$ be an arbitrary test function. Then
$$\Big|\int_D\int_{\Gamma^n} (q - \tilde q)\,\nabla u\cdot\nabla\phi\,\rho_n\,dy\,dx\Big| \le \|q - \tilde q\|_{L^\infty(D\times\Gamma^n)}\,|u|_{\tilde H_0^1}\,|\phi|_{\tilde H_0^1} \le C_n\,\|q - \tilde q\|_{\tilde H^s_{mix}}\,|u|_{\tilde H_0^1}\,|\phi|_{\tilde H_0^1},$$
from which it readily follows that $\|e^n(q - \tilde q, u)\|_{\tilde H^{-1}} \le C_n\,|u|_{\tilde H_0^1}\,\|q - \tilde q\|_{\tilde H^s_{mix}}$, and therefore $e^n$ is continuously Fréchet differentiable. A simple calculation then reveals that the first derivative of $e^n$ in the direction $(h,v)\in\tilde H^s_{mix}\times\tilde H_0^1$ is given by
$$(e^n)'(q,u)(h,v) = e^n_q(q,u)h + e^n_u(q,u)v \in \tilde H^{-1},$$
where
$$\langle e^n_q(q,u)h,\phi\rangle_{\tilde H^{-1},\tilde H_0^1} = \int_D\int_{\Gamma^n} h\,\nabla u\cdot\nabla\phi\,\rho_n\,dy\,dx \quad\text{and}\quad \langle e^n_u(q,u)v,\phi\rangle_{\tilde H^{-1},\tilde H_0^1} = \int_D\int_{\Gamma^n} q\,\nabla v\cdot\nabla\phi\,\rho_n\,dy\,dx$$
for all $\phi\in\tilde H_0^1$. We can now derive more traditional first order necessary optimality conditions [19].

Theorem IV.2 (Existence of Lagrange Multipliers)  Let $(q^*,u^*)$ be a minimizer for problem (P^n). Then there exists a unique Lagrange multiplier $\lambda^*\in\tilde H_0^1$ for which the Lagrange functional
$$L(q,u;\lambda) = J(q,u) + \langle\lambda,\, e^n(q,u)\rangle_{\tilde H_0^1,\tilde H^{-1}}$$
satisfies
$$L'(q^*,u^*;\lambda^*)(h,v) \ge 0 \quad\text{for all } (h,v)\in C(q^*)\times\tilde H_0^1,$$
where $C(q^*) = \{l(c - q^*) : c\in Q^n_{ad},\ l\ge 0\}$. In particular, the adjoint equation and the complementarity condition hold:
$$\langle e^n_u(q^*,u^*)v,\,\lambda^*\rangle_{\tilde H^{-1},\tilde H_0^1} = -\langle u^* - \hat u^n,\, v\rangle_{\tilde H_0^1} \quad \forall v\in\tilde H_0^1, \tag{13}$$
$$\beta\,\langle q^*,\, q - q^*\rangle_{\tilde H^s_{mix}} + \langle\nabla\lambda^*,\,(q - q^*)\nabla u^*\rangle_{L^2(D\times\Gamma^n)} \ge 0 \tag{14}$$
for all $q\in Q^n_{ad}$.
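Before turning to the descent algorithm of Section IV-B, here is a minimal sketch of the building block it relies on: at a single collocation point $y$, one forward solve and one adjoint solve yield the value and gradient of a discrete analogue of the objective. The $L^2$ misfit, the simple $\ell^2$ regularization, and the elementwise parameterization of $q$ are simplifications for illustration only and stand in for the $\tilde H_0^1$ and $\tilde H^s_{mix}$ norms of the paper; summing weighted contributions over the sparse-grid collocation points would give the finite-noise objective.

```python
import numpy as np

def fem_matrices(q_mid, x):
    """P1 stiffness matrix A(q) and lumped mass vector M on (0, 1)."""
    n = len(x) - 1
    h = np.diff(x)
    A = np.zeros((n + 1, n + 1))
    M = np.zeros(n + 1)
    for e in range(n):
        A[e:e + 2, e:e + 2] += (q_mid[e] / h[e]) * np.array([[1.0, -1.0], [-1.0, 1.0]])
        M[e:e + 2] += 0.5 * h[e]
    return A, M

def objective_and_gradient(q_mid, x, f_nodes, u_hat, beta):
    """J(q) = 0.5*||u(q)-u_hat||_M^2 + 0.5*beta*||q||^2 and dJ/dq via one adjoint solve."""
    n = len(x) - 1
    h = np.diff(x)
    A, M = fem_matrices(q_mid, x)
    inner = slice(1, n)                           # interior nodes (Dirichlet BCs)
    u = np.zeros(n + 1)
    u[inner] = np.linalg.solve(A[inner, inner], (M * f_nodes)[inner])
    r = u - u_hat                                 # misfit (u_hat assumed zero on the boundary)
    J = 0.5 * np.sum(M * r**2) + 0.5 * beta * np.sum(q_mid**2)
    p = np.zeros(n + 1)
    p[inner] = np.linalg.solve(A[inner, inner], -(M * r)[inner])   # adjoint solve
    # per-element gradient: int_e grad(p).grad(u) dx + beta * q_e
    grad = np.diff(p) * np.diff(u) / h + beta * q_mid
    return J, grad

if __name__ == "__main__":
    x = np.linspace(0.0, 1.0, 33)
    xm = 0.5 * (x[:-1] + x[1:])
    f = np.ones_like(x)
    y1, y2 = 0.3, 0.7                             # one collocation point in (Y1, Y2)
    q_mid = 3.0 + xm**2 + y1 * np.sin(np.pi * xm) + y2 * np.sin(2 * np.pi * xm)
    u_hat = np.zeros_like(x)                      # stand-in for measured data at this point
    J, grad = objective_and_gradient(q_mid, x, f, u_hat, beta=1e-4)
    print("J =", J, " ||dJ/dq|| =", np.linalg.norm(grad))
```

In the method of the paper, $q$ is parameterized jointly in $x$ and $y$, so per-point gradients of this kind would additionally be mapped through that parameterization (and through the $\tilde H^s_{mix}$ inner product) before being used in a projected descent step.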
B. A Simple Descent Algorithm

In this section we outline a simple gradient descent algorithm for problem (P^n), rewritten in simpler notation as
$$\min_{q\in Q^n_{ad}\cap\tilde H^s_{mix}} J(q,u(q)) = \frac{1}{2}\|u(q) - \hat u^n\|^2_{\tilde H_0^1} + \frac{\beta}{2}\|q\|^2_{\tilde H^s_{mix}}, \quad\text{s.t. } e(q,u(q)) = 0 \text{ in } \tilde H^{-1}, \tag{15}$$
where $e : \tilde H^s_{mix}\times\tilde H_0^1\to\tilde H^{-1}$ is a bounded bilinear form and $J : \tilde H^s_{mix}\times\tilde H_0^1\to\mathbb{R}$. Evidently,
$$J'(q,u(q))h = J_u(q,u(q))u'(q)h + J_q(q,u(q))h = \langle u(q) - \hat u^n,\, u'(q)h\rangle_{\tilde H_0^1} + \langle q, h\rangle_{\tilde H^s_{mix}} = u'(q)^* J_u(q,u(q))h + J_q(q,u(q))h,$$
and differentiating the constraint $e(q,u(q)) = 0$ in $\tilde H^{-1}$ leads to
$$e_q(q,u(q))h + e_u(q,u(q))u'(q)h = 0 \quad\text{in } \tilde H^{-1}, \text{ for all } h\in\tilde H^s_{mix},$$
so that $u'(q)^* J_u(q,u(q)) = e_q(q,u(q))^* p^*$, where $p^*\in\tilde H_0^1$ solves the adjoint equation
$$e_u(q,u(q))^* p^* = -J_u(q,u(q)).$$
Since $e_u(q,u) : \tilde H_0^1\to\tilde H^{-1} : v\mapsto\langle q\nabla v,\nabla\cdot\rangle$, we can rewrite this adjoint equation as
$$\langle q\nabla p^*,\nabla\phi\rangle = -\langle\nabla(u - \hat u^n),\nabla\phi\rangle \quad \forall\phi\in\tilde H_0^1. \tag{16}$$
Here $e_q(q,u) : \tilde H^s_{mix}\to\tilde H^{-1} : h\mapsto\langle h\nabla u,\nabla\cdot\rangle$, and therefore $e_q(q,u)^* : \tilde H_0^1\to(\tilde H^s_{mix})^*$ maps $p\mapsto\langle\nabla p\cdot\nabla u,\,\cdot\,\rangle$. Hence
$$J'(q,u(q))h = \langle h,\,\nabla p^*\cdot\nabla u\rangle_{L^2} + \langle q, h\rangle_{\tilde H^s_{mix}}.$$
It is also evident that the constraint set $Q^n_{ad}\cap\tilde H^s_{mix}$ is closed and convex in $\tilde H^s_{mix}$. We can therefore make use of algorithms such as the projected Armijo rule [24] to find the optimal $q$. This method has linear convergence. Moreover, the calculation of the cost functional requires one forward solve, i.e. $N_{collocation}$ deterministic solves, which can, however, be done in parallel. The computation of the gradient additionally requires an adjoint solve at each stochastic collocation point.

V. NUMERICAL EXAMPLE

To test this formulation on a stochastic parameter estimation problem, we generated model data $\hat u$ using a known stochastic conductivity $q(x,\omega)$ and source term $f$ given by
$$q(x,\omega) = 3 + x^2 + Y_1(\omega)\sin(\pi x) + Y_2(\omega)\sin(2\pi x), \qquad f(x) = 6 + 18x - 8x^2 + 12x^3,$$
where $(Y_1,Y_2)\sim\mathrm{uniform}\,[0,1]^2$. The model data $\hat u$ was computed from the data generated by $q$ and $f$ and expanded in the same random basis $(Y_1,Y_2)$. We then implemented an augmented Lagrangian algorithm to identify $q$. From an initial guess of $q^0(x,Y_1,Y_2) = 3$, the algorithm converges, albeit slowly, to the correct function. We used 16 quadratic finite elements for the deterministic portion of $q$ (33 degrees of freedom) and a 3rd level interpolation in stochastic space, yielding 29 sparse grid points. We therefore had to estimate the coefficients of a 957 dimensional vector. In Fig. 1, we compare the values of $\{q(\cdot,Y_1^i,Y_2^i)\}_{i=1}^{29}$ and the estimated $\{q^*(\cdot,Y_1^i,Y_2^i)\}_{i=1}^{29}$ at all 29 stochastic collocation points. As we can see, there is very good agreement between the interpolated values of the parameter and the estimated parameter. The largest discrepancy occurs near $x = 0$, where the slopes of the deterministic solutions are near zero and hinder the identifiability of the parameter.

[Fig. 1. Comparison of $q$ and $q^*$ at the stochastic collocation points. Left panel: interpolant of the exact parameter; right panel: interpolant of the estimated parameter.]

Access to the 'finite noise' stochastic representation of $q$ allows one to address fundamental questions through integration over the stochastic variables. Using this example and the moment formulas
$$q_0 = E[q] \quad\text{and}\quad \mu_k = E\big[(q - q_0)^k\big], \tag{17}$$
we compute the first 9 central moments of the stochastic variable $q$ and compare them with the exact moments in Fig. 2. There is very good agreement between the moments of the true and estimated parameters. For instance, we see that the standard deviation $\sqrt{\mu_2}$ changes significantly across the spatial domain $[0,1]$ and that, aside from the first moment, the odd moments are computed to be relatively insignificant.
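The central moments in (17) can be approximated directly from the finite-noise representation by integrating over $(Y_1, Y_2)$. The sketch below uses plain Monte Carlo sampling as an illustrative stand-in for the sparse-grid quadrature employed in the paper, applied to the exact conductivity of this example.

```python
import numpy as np

def central_moments(q_samples, k_max=9):
    """Empirical central moments mu_k(x) = E[(q - E[q])^k] from samples of q(x, Y).

    q_samples : array of shape (n_samples, n_x), one row per realization of Y
    Returns (q0, mu) with q0 the mean and mu[k-1] the k-th central moment.
    """
    q0 = q_samples.mean(axis=0)
    centered = q_samples - q0
    mu = np.array([np.mean(centered**k, axis=0) for k in range(1, k_max + 1)])
    return q0, mu

if __name__ == "__main__":
    x = np.linspace(0.0, 1.0, 101)
    rng = np.random.default_rng(1)
    Y = rng.uniform(0.0, 1.0, size=(20000, 2))          # (Y1, Y2) ~ uniform [0,1]^2
    q = 3.0 + x**2 + np.outer(Y[:, 0], np.sin(np.pi * x)) + np.outer(Y[:, 1], np.sin(2 * np.pi * x))
    q0, mu = central_moments(q)
    print("max std dev over x:", np.sqrt(mu[1]).max())  # mu[1] is the second central moment
```

With the estimated finite-noise parameter in hand, the same post-processing applied to its collocation/interpolation values produces the estimated moments shown in Fig. 2.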
[Fig. 2. The first 9 central moments of $q$ for the exact (solid) and estimated (dotted) parameters.]

VI. CONCLUSIONS AND FUTURE WORKS

A. Conclusions

Estimation of random distributed parameters is a natural complement to stochastic simulations and uncertainty quantification. We have developed a mathematical framework for posing the parameter estimation problem where we attempt to approximate the entire random field across the distributed parameter. While other formulations, such as attempting to identify moments, are possible, the present formulation allows one to both answer interesting probabilistic questions
by computing integrals in the probability space and to have access to a spectral expansion of the parameters that is suitable for applying modern stochastic partial differential equation approximation methods. We pose the random parameter estimation problem as an optimization problem using the 'finite noise' assumption. We justify the finite noise approximation and the least-squares optimization framework when the covariance function is smooth enough. A distinguishing feature of this work is the setup of the problem in $\tilde H^s_{mix}$, the space of functions with bounded mixed derivatives. This space is used in the analysis of the hierarchical collocation rules underlying sparse grid methods. Thus, it is a natural space for approximating the first-order necessary conditions as well as for implementing regularization.

B. Future Works

In future work, we plan to address more efficient optimization algorithms, such as simultaneously solving the state, adjoint, and gradient equations using sparse grid collocation. In addition, we will investigate other strategies for post-processing the distributed 'finite noise' parametric variables, such as computing the likelihood that the parameter lies between given ranges.

VII. ACKNOWLEDGMENTS

This material is based upon work supported by the Greater Philadelphia Innovation Cluster (GPIC) for Energy Efficient Buildings, an energy innovation HUB sponsored by the Department of Energy under Award Number DE-EE0004261.
REFERENCES

[1] H. T. Banks and K. Kunisch, Estimation Techniques for Distributed Parameter Systems, ser. Systems & Control: Foundations & Applications. Birkhäuser, 1989, vol. SC1.
[2] K. Ito and K. Kunisch, "The augmented Lagrangian method for parameter estimation in elliptic systems," SIAM Journal on Control and Optimization, vol. 28, no. 1, pp. 113–136, 1990.
[3] K. Ito, M. Kroller, and K. Kunisch, "A numerical study of an augmented Lagrangian method for the estimation of parameters in elliptic systems," SIAM Journal on Scientific and Statistical Computing, vol. 12, no. 4, pp. 884–910, 1991.
[4] T. F. Chan and X.-C. Tai, "Identification of discontinuous coefficients in elliptic problems using total variation regularization," SIAM Journal on Scientific Computing, vol. 25, no. 3, pp. 881–904, 2003.
[5] ——, "Level set and total variation regularization for elliptic inverse problems with discontinuous coefficients," Journal of Computational Physics, vol. 193, no. 1, pp. 40–66, 2004.
[6] A. Tarantola, Inverse Problem Theory and Methods for Model Parameter Estimation. Philadelphia, PA: Society for Industrial and Applied Mathematics (SIAM), 2005.
[7] A. M. Stuart, "Inverse problems: a Bayesian perspective," Acta Numerica, vol. 19, pp. 451–559, 2010.
[8] H. T. Banks and K. L. Bihari, "Modelling and estimating uncertainty in parameter estimation," Inverse Problems, vol. 17, no. 1, pp. 95–111, 2001.
[9] R. T. Rockafellar, "Optimization under uncertainty," Lecture Notes, 2001.
[10] N. Zabaras and B. Ganapathysubramanian, "A scalable framework for the solution of stochastic inverse problems using a sparse grid collocation approach," Journal of Computational Physics, vol. 227, no. 9, pp. 4697–4735, 2008.
[11] A. Sandu, Solution of Inverse Problems using Discrete ODE Adjoints. John Wiley and Sons, Ltd, 2010, pp. 345–365.
[12] I. Babuška, R. Tempone, and G. E. Zouraris, "Solving elliptic boundary value problems with uncertain coefficients by the finite element method: the stochastic formulation," Computer Methods in Applied Mechanics and Engineering, vol. 194, pp. 1251–1294, 2005.
[13] D. Xiu and G. E. Karniadakis, "The Wiener–Askey polynomial chaos for stochastic differential equations," SIAM Journal on Scientific Computing, vol. 24, no. 2, pp. 619–644, 2002.
[14] F. Nobile, R. Tempone, and C. G. Webster, "A sparse grid stochastic collocation method for partial differential equations with random input data," SIAM Journal on Numerical Analysis, vol. 46, no. 5, pp. 2309–2345, 2008.
[15] H.-J. Bungartz and M. Griebel, "Sparse grids," Acta Numerica, vol. 13, pp. 147–269, 2004.
[16] I. Babuška, R. Tempone, and G. E. Zouraris, "Galerkin finite element approximations of stochastic elliptic partial differential equations," SIAM Journal on Numerical Analysis, vol. 42, no. 2, pp. 800–825, 2004.
[17] M. Reed and B. Simon, Methods of Modern Mathematical Physics. I. Functional Analysis. New York: Academic Press, 1972.
[18] Z. Chen and J. Zou, "An augmented Lagrangian method for identifying discontinuous parameters in elliptic systems," SIAM Journal on Control and Optimization, vol. 37, no. 3, pp. 892–910, 1999.
[19] H.-W. van Wyk, "A variational approach to estimating uncertain parameters in elliptic systems," Ph.D. dissertation, Virginia Tech, Blacksburg, VA, May 2012.
[20] M. M. Loève, Probability Theory. Princeton, NJ: Van Nostrand, 1955.
[21] C. Schwab and R. A. Todor, "Karhunen–Loève approximation of random fields by generalized fast multipole methods," Journal of Computational Physics, vol. 217, no. 1, pp. 100–122, 2006.
[22] F. Nobile, R. Tempone, and C. G. Webster, "An anisotropic sparse grid stochastic collocation method for partial differential equations with random input data," SIAM Journal on Numerical Analysis, vol. 46, no. 5, pp. 2411–2442, 2008.
[23] A. Binder, H. W. Engl, C. W. Groetsch, A. Neubauer, and O. Scherzer, "Weakly closed nonlinear operators and parameter identification in parabolic equations by Tikhonov regularization," Applicable Analysis, vol. 55, no. 3-4, pp. 215–234, 1994.
[24] M. Hinze, R. Pinnau, M. Ulbrich, and S. Ulbrich, Optimization with PDE Constraints, ser. Mathematical Modelling: Theory and Applications. New York: Springer, 2009, vol. 23.