Machine Learning Algorithms


Machine Learning Algorithms Ding-Xuan Zhou Department of Mathematics, City University of Hong Kong, Hong Kong, China

Machine learning algorithms are motivated by the mission of extracting and processing information from massive data, which challenges scientists and engineers in various fields such as biological computation, computer vision, data mining, image processing, speech recognition, and statistical analysis. Tasks of machine learning include regression, classification, dimension reduction, clustering, ranking, and feature selection. Learning algorithms aim at learning, from sample data, structures or function relations (responses or labels) attached to events by means of efficient computing tools. The main difficulty of learning problems lies in the huge sample size or the large number of variables. Solving these problems relies on suitable statistical modelling and powerful computational methods to tackle the involved large-scale optimization problems. There are essentially three categories of learning problems: supervised learning, unsupervised learning, and semi-supervised learning. The input space X for a learning problem contains its possible events x. A typical case is when X is a subset of a Euclidean space IR^n, with an element x = (x^1, ..., x^n) ∈ X corresponding to n numerical measurements of a practical event. For a supervised learning problem, the output space Y contains all possible responses or labels y; it might be a set of real numbers Y ⊂ IR for regression or a finite set of labels for classification.

A supervised learning algorithm produces a function f_z : X → Y based on a given set of examples z = {(x_i, y_i)}_{i=1}^m ⊂ X × Y. It predicts a response f_z(x) ∈ Y for each future event x ∈ X. The prediction accuracy or learning ability of an output function f : X → Y may be measured quantitatively by means of a loss function V : Y × Y → IR_+ as V(f(x), y) for an input-output pair (x, y) ∈ X × Y. For regression, the least squares loss V(f(x), y) = (f(x) − y)^2 is often used; it gives the least squares error when the output function value f(x) (the predicted value) approximates the true output value y. The first family of supervised learning algorithms can be stated as empirical risk minimization (ERM). Such an algorithm [1, 16] is implemented by minimizing the empirical risk or empirical error E_z(f) := (1/m) Σ_{i=1}^m V(f(x_i), y_i) over a set H of functions from X to Y (called a hypothesis space):

f_z = arg min_{f ∈ H} E_z(f).   (1)
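As a toy illustration of ERM (1) with the least squares loss, the following sketch minimizes the empirical error over a hypothetical finite hypothesis space of linear functions f(x) = a·x; the data and the grid of candidate slopes are invented for the example.

```python
import numpy as np

def erm(xs, ys, hypotheses):
    """Return the slope a minimizing the empirical error
    E_z(f) = (1/m) * sum_i (a*x_i - y_i)^2 over the finite set H."""
    risks = [np.mean((a * xs - ys) ** 2) for a in hypotheses]
    return hypotheses[int(np.argmin(risks))]

xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = 2.0 * xs + np.array([0.1, -0.1, 0.05, 0.0])   # roughly y = 2x
best = erm(xs, ys, np.linspace(0.0, 4.0, 401))     # grid of candidate slopes
```

Here the minimizer of the empirical risk lands on the slope closest to the least squares solution; with richer hypothesis spaces the same principle applies, but the minimization becomes a nontrivial optimization problem.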

Its convergence can be analyzed by the theory of uniform convergence or the uniform law of large numbers [1, 4, 6, 16]. The second family of supervised learning algorithms can be stated as Tikhonov regularization schemes in (H_K, ‖·‖_K), a reproducing kernel Hilbert space (RKHS) associated with a reproducing kernel K : X × X → IR, a symmetric and positive semidefinite function. Such a regularization scheme is a kernel method [3, 7, 9, 12, 16, 17] defined as

© Springer-Verlag Berlin Heidelberg 2015 B. Engquist (ed.), Encyclopedia of Applied and Computational Mathematics, DOI 10.1007/978-3-540-70529-1

f_z = arg min_{f ∈ H_K} { E_z(f) + λ ‖f‖_K^2 },   (2)



where λ > 0 is a regularization parameter (which might be determined by cross-validation). The Tikhonov regularization scheme (with a general function space) has a long history in various areas [14]. It is powerful for learning due to an important property of the RKHS: f(x) = ⟨f, K(·, x)⟩_K. Taking the orthogonal projection onto the subspace {Σ_{i=1}^m c_i K(·, x_i) : c ∈ IR^m} does not change the empirical error of f ∈ H_K. Hence, the minimizer f_z of (2) must lie in this subspace (representer theorem [17]), and its coefficients can be computed by minimizing the induced function over IR^m. This reduces the computational complexity, while the strong approximation ability provided by the possibly infinite-dimensional function space H_K may be maintained [5, 11]. Moreover, when V is convex with respect to the first variable, the algorithm is implemented by a convex optimization problem in IR^m. A well-known setting for (2) is support vector machine (SVM) algorithms [15, 16, 18], for which the optimization problem is a convex quadratic programming one: for binary classification with Y = {1, −1}, V is the hinge loss given by V(f(x), y) = max{1 − y f(x), 0}; for SVM regression, V(f(x), y) = ψ_ε(y − f(x)) is induced by the insensitive loss ψ_ε(t) = max{|t| − ε, 0} with ε > 0. When V is the least squares loss, the coefficient vector of f_z satisfies a linear system of equations. The third family of supervised learning algorithms are coefficient-based regularization schemes. With a general (not necessarily positive semidefinite or symmetric) kernel K, the scheme takes the form

f_z = Σ_{i=1}^m c_i^z K(·, x_i),  where  (c_i^z)_{i=1}^m = arg min_{c ∈ IR^m} { E_z( Σ_{i=1}^m c_i K(·, x_i) ) + Ω(c) },   (3)

where Ω : IR^m → IR_+ is a regularizer or penalty. Scheme (3) has the advantage of possibly producing sparse representations [16]. One well-known algorithm is the Lasso [13], which takes the least squares loss with the linear kernel K(x, u) = x · u and the ℓ^1-regularizer Ω(c) = λ‖c‖_{ℓ^1}. In addition to sparsity, the non-smooth optimization problem (3) can then be tackled by an efficient least angle regression algorithm. The ℓ^1-regularizer also plays a crucial role in compressive sensing. For the sparsity and approximation ability of scheme (3) with a general or data-dependent kernel and a general regularizer, see the discussion in [10]. There are many other supervised learning algorithms, such as k-nearest neighbor methods, Bayesian methods, maximum likelihood methods, the expectation-maximization algorithm, boosting methods, tree-based methods, and other non-kernel-based methods [6, 8, 9]. Modelling of supervised learning algorithms is usually stated under the assumption that the sample z is randomly drawn according to an (unknown) probability measure ρ on X × Y, with both X and Y being metric spaces. Mathematical analysis of a supervised learning algorithm studies the convergence of the generalization error or expected risk defined by E(f) = ∫_{X×Y} V(f(x), y) dρ, in the sense that E(f_z) converges with confidence or in probability to the infimum of E(f) when f runs over a certain function set. The error analysis and learning rates of supervised learning algorithms involve the uniform law of large numbers, various probability inequalities or concentration analysis, and the capacity of function sets [1, 3, 5, 12, 16, 19]. Unsupervised learning aims at understanding properties of the distribution of the events in X from a sample x = {x_i}_{i=1}^m ∈ X^m. In the case X ⊂ IR^n, an essential difference from supervised learning is that the number n of variables is usually much larger. When n is small, unsupervised learning tasks may be completed by finding from the sample x good approximations of the density function of the underlying probability measure ρ_X on X. The curse of dimensionality makes this approach difficult when n is large. Principal component analysis (PCA) can be regarded as an unsupervised learning algorithm. It attempts to find some information about covariances of variables and to reduce the dimension for representing the data in X efficiently. Kernel PCA is an unsupervised learning algorithm generalizing this idea [9].
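The ℓ^1 scheme (3) with the linear kernel (the Lasso discussed above) can be sketched with plain coordinate descent, a simpler alternative to the least angle regression algorithm mentioned; the soft-thresholding update below is the standard closed-form coordinate minimizer, and the data are hypothetical.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/m)*||Xc - y||^2 + lam*||c||_1.

    Each coordinate update is a closed-form soft-thresholding step."""
    m, n = X.shape
    c = np.zeros(n)
    for _ in range(n_iter):
        for j in range(n):
            r = y - X @ c + X[:, j] * c[j]          # residual without feature j
            rho = X[:, j] @ r / m
            z = X[:, j] @ X[:, j] / m
            c[j] = np.sign(rho) * max(abs(rho) - lam / 2, 0.0) / z
    return c

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=100)  # only feature 0 is active
c = lasso_cd(X, y, lam=0.5)                          # c[1:] shrink toward 0
```

The ℓ^1 penalty drives the coefficients of the inactive features to (near) zero while keeping a large coefficient on the active one, illustrating the sparsity property discussed above.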
It is based on a kernel K which assigns a value K(x, u) measuring dissimilarity or association between the events x and u. The sample x yields a matrix [K] = (K(x_i, x_j))_{i,j=1}^m, which can be used to analyze the feature map taking data points x ∈ X to K(·, x) ∈ H_K. Thus kernel PCA overcomes some limitations of linear PCA. The graph Laplacian is another unsupervised learning algorithm. With a sample-dependent matrix [K], the Laplacian matrix L is defined as L = D − [K], where D is a diagonal matrix with diagonal entries D_ii = Σ_{j=1}^m K(x_i, x_j). Let 0 = λ_1 ≤ λ_2 ≤ ... ≤ λ_m be the eigenvalues of the generalized eigenproblem


Lf = λDf, and let f_1, ..., f_m be the associated normalized eigenvectors in IR^m. Then, choosing a low dimension s < m, the graph Laplacian eigenmap [2] embeds the data point x_i into IR^s as the vector ((f_2)_i, ..., (f_{s+1})_i), which reduces the dimension and possibly keeps some data structures. The graph Laplacian can also be used for clustering. One way is to cluster the data x into the two sets {i : (f_2)_i ≥ 0} and {i : (f_2)_i < 0}. Other unsupervised learning algorithms include local linear embedding, isomap, and diffusion maps. In many practical applications, getting labels would be expensive and time consuming, while large unlabelled data might be available easily. Making use of unlabelled data to improve the learning ability of supervised learning algorithms is the motivation of semi-supervised learning. It is based on the expectation that the unlabelled data reflect the geometry of the underlying input space X, such as manifold structures. Let us state a typical semi-supervised learning algorithm associated with a Mercer kernel K, labelled data z = {(x_i, y_i)}_{i=1}^m, and unlabelled data u = {x_i}_{i=m+1}^{m+u}. With a similarity matrix (ω_ij)_{i,j=1}^{m+u}, such as truncated Gaussian weights, the semi-supervised learning algorithm takes the form

f_{z,u,λ,μ} = arg min_{f ∈ H_K} { E_z(f) + λ ‖f‖_K^2 + (μ/(m+u)^2) Σ_{i,j=1}^{m+u} ω_ij (f(x_i) − f(x_j))^2 },   (4)

where λ, μ > 0 are regularization parameters.
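A minimal sketch of the graph Laplacian constructions above (the eigenmap embedding and sign-based clustering), assuming Gaussian similarity weights and a hypothetical two-cluster sample; the generalized eigenproblem Lf = λDf is solved through its symmetric normalization D^{-1/2} L D^{-1/2}.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical sample: two clusters of 5 points each on the real line.
X = np.concatenate([rng.normal(0.0, 0.1, 5), rng.normal(2.0, 0.1, 5)])[:, None]

d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-d2 / 2.0)                      # Gaussian similarity matrix [K]
D = K.sum(axis=1)                          # degrees D_ii = sum_j K(x_i, x_j)
L = np.diag(D) - K                         # graph Laplacian L = D - [K]

# Generalized eigenproblem L f = lam * D f, via the symmetric form
# D^{-1/2} L D^{-1/2} g = lam * g with f = D^{-1/2} g.
Dm = np.diag(D ** -0.5)
lams, G = np.linalg.eigh(Dm @ L @ Dm)      # ascending: 0 = lam_1 <= lam_2 <= ...
F = Dm @ G                                 # columns f_1, f_2, ... (eigenvectors)

emb = F[:, 1:3]                            # eigenmap into IR^2 (skip constant f_1)
labels = (F[:, 1] >= 0).astype(int)        # cluster by the sign of f_2
```

On this weakly connected two-cluster graph, f_2 (the Fiedler-type vector) is nearly constant on each cluster with opposite signs, so the sign split recovers the clusters.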


It is unknown whether rigorous error analysis can be done to show that algorithm (4) has better performance than algorithm (2) when u >> m. In general, mathematical analysis for semi-supervised learning is not well understood compared to that of supervised or unsupervised learning algorithms.

References

1. Anthony, M., Bartlett, P.L.: Neural Network Learning: Theoretical Foundations. Cambridge University Press, Cambridge/New York (1999)
2. Belkin, M., Niyogi, P.: Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 15, 1373–1396 (2003)
3. Cristianini, N., Shawe-Taylor, J.: An Introduction to Support Vector Machines. Cambridge University Press, Cambridge/New York (2000)
4. Cucker, F., Smale, S.: On the mathematical foundations of learning. Bull. Am. Math. Soc. 39, 1–49 (2001)
5. Cucker, F., Zhou, D.X.: Learning Theory: An Approximation Theory Viewpoint. Cambridge University Press, Cambridge/New York (2007)
6. Devroye, L., Györfi, L., Lugosi, G.: A Probabilistic Theory of Pattern Recognition. Springer, New York (1997)
7. Evgeniou, T., Pontil, M., Poggio, T.: Regularization networks and support vector machines. Adv. Comput. Math. 13, 1–50 (2000)
8. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning, 2nd edn. Springer, New York (2009)
9. Schölkopf, B., Smola, A.J.: Learning with Kernels. MIT, Cambridge (2002)
10. Shi, L., Feng, Y.L., Zhou, D.X.: Concentration estimates for learning with ℓ^1-regularizer and data dependent hypothesis spaces. Appl. Comput. Harmon. Anal. 31, 286–302 (2011)
11. Smale, S., Zhou, D.X.: Estimating the approximation error in learning theory. Anal. Appl. 1, 17–41 (2003)
12. Steinwart, I., Christmann, A.: Support Vector Machines. Springer, New York (2008)
13. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 58, 267–288 (1996)
14. Tikhonov, A., Arsenin, V.: Solutions of Ill-Posed Problems. W. H. Winston, Washington, DC (1977)
15. Tsybakov, A.B.: Optimal aggregation of classifiers in statistical learning. Ann. Stat. 32, 135–166 (2004)
16. Vapnik, V.: Statistical Learning Theory. Wiley, New York (1998)
17. Wahba, G.: Spline Models for Observational Data. SIAM, Philadelphia (1990)
18. Zhang, T.: Statistical behavior and consistency of classification methods based on convex risk minimization. Ann. Stat. 32, 56–85 (2004)
19. Zhou, D.X.: The covering number in learning theory. J. Complex. 18, 739–767 (2002)

Markov Random Fields in Computer Vision: MAP Inference and Learning

Nikos Komodakis (1,4), M. Pawan Kumar (2,3), and Nikos Paragios (1,2,3)
1 École des Ponts ParisTech, Université Paris-Est, Champs-sur-Marne, France
2 École Centrale Paris, Châtenay-Malabry, France
3 Équipe GALEN, INRIA Saclay, Île-de-France, France
4 UMR Laboratoire d'informatique Gaspard-Monge, CNRS, Champs-sur-Marne, France

Introduction

A Markov random field (MRF) can be visualized as a graph G = (V, E). Associated with each of its n vertices V_a (where a ∈ {1, ..., n}) is a discrete




random variable X_a, which can take a value from a finite, discrete label set X_a. We will refer to a particular assignment of values to the random variables from the corresponding label sets as a labeling. In other words, a labeling x ∈ X_1 × X_2 × ... × X_n implies that the random variable X_a is assigned the value x_a. The probability of a labeling is specified by potential functions. For simplicity, we will assume a pairwise MRF parameterized by w, whose potential functions are either unary, denoted by θ_a(x_a; w), or pairwise, denoted by θ_ab(x_a, x_b; w). For a discussion of high-order MRFs, we refer the interested reader to the following relevant books [2, 9, 20]. The joint probability of all the random variables can be expressed in terms of the potential functions as follows:

Pr(x; w) ∝ exp(−E(x; w)),  E(x; w) = Σ_{V_a ∈ V} θ_a(x_a; w) + Σ_{(V_a, V_b) ∈ E} θ_ab(x_a, x_b; w).   (1)

MAP Inference Maximum a posteriori (MAP) inference refers to the estimation of the most probable labeling. Formally, it is defined as

MAP_G(θ) ≡ min_x Σ_{V_a ∈ V} θ_a(x_a) + Σ_{(V_a, V_b) ∈ E} θ_ab(x_a, x_b),   (2)

where we have dropped the parameters w from the notation of the potential functions to avoid clutter. The above problem is known to be NP-hard in general. However, given its importance, several approximate algorithms have been proposed in the literature, which we review below.
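For intuition, the MAP problem (2) can be solved exactly by exhaustive enumeration on a tiny hypothetical MRF (a 3-node binary chain), which is feasible only because the label space has 2^3 = 8 elements; the potentials are invented for the example.

```python
import itertools
import numpy as np

# A tiny hypothetical pairwise MRF: a 3-node binary chain.
theta = {0: np.array([0.0, 2.0]),          # unary potentials theta_a(x_a)
         1: np.array([1.0, 0.0]),
         2: np.array([0.0, 1.0])}
edges = [(0, 1), (1, 2)]
pair = np.array([[0.0, 1.0], [1.0, 0.0]])  # Potts-like theta_ab(x_a, x_b)

def energy(x):
    """Gibbs energy of a labeling x, as in (2)."""
    return (sum(theta[a][x[a]] for a in theta)
            + sum(pair[x[a], x[b]] for a, b in edges))

x_map = min(itertools.product([0, 1], repeat=3), key=energy)  # exhaustive search
```

The minimizer here is the labeling (0, 0, 0) with energy 1; the approximate algorithms reviewed below aim to recover such minimizers without enumerating the exponentially large label space.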

The function E(x; w) is called the Gibbs energy (or simply the energy) of x.

Belief Propagation

Belief propagation (BP) [21] is an iterative message passing algorithm, where the messages at iteration t are given by

m^t_{ab}(j) = min_{i ∈ X_a} { θ_a(i) + θ_ab(i, j) + Σ_{c ≠ b, (V_a, V_c) ∈ E} m^{t−1}_{ca}(i) },  ∀ (a, b) ∈ E, j ∈ X_b.   (3)
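A sketch of the message updates (3) on a hypothetical 3-node binary chain, where min-sum BP is exact; the beliefs formed from the converged messages are min-marginals on a tree, and their per-node minimizers give the MAP labeling.

```python
import numpy as np

theta = [np.array([0.0, 2.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])]
pair = np.array([[0.0, 1.0], [1.0, 0.0]])   # theta_ab(i, j), Potts-like
edges = [(0, 1), (1, 2)]
neighbors = {0: [1], 1: [0, 2], 2: [1]}

msg = {}
for a, b in edges:
    msg[(a, b)] = np.zeros(2)
    msg[(b, a)] = np.zeros(2)

for _ in range(5):                          # plenty for a 3-node chain
    new = {}
    for (a, b) in msg:
        inc = sum((msg[(c, a)] for c in neighbors[a] if c != b), np.zeros(2))
        # m_ab(j) = min_i [theta_a(i) + theta_ab(i, j) + sum_{c != b} m_ca(i)]
        new[(a, b)] = (theta[a][:, None] + pair + inc[:, None]).min(axis=0)
    msg = new

# beliefs = min-marginals on a tree; their argmins give the MAP labeling
beliefs = [theta[a] + sum(msg[(c, a)] for c in neighbors[a]) for a in range(3)]
x_hat = [int(np.argmin(b)) for b in beliefs]
```

On loopy graphs the same updates are only approximate, which is the regime the text discusses next.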

At convergence (when the change in messages is below a tolerance), the approximate MAP labeling is estimated as [...]

[...] a minimum st-cut on a graph [10, 23] if the energy function is submodular, that is, it satisfies [...]

Mathematical Theory for Quantum Crystals
> 0 and location R_k ∈ Y for every 1 ≤ k ≤ K, we denote by

m = Σ_{k=1}^K z_k δ(· − R_k)

the pattern, where δ is the Dirac mass and Z = Σ_{k=1}^K z_k the total nuclear charge per unit cell. This distribution of charge creates an attractive Coulomb-type potential. The sum Σ_{γ ∈ Γ} Σ_{k=1}^K z_k / |R_k + γ| is infinite due to the long range of the Coulomb potential in R^3, but it can be renormalized. Actually, the potential due to the nuclei is the periodic potential G_m = Σ_{k=1}^K z_k G_Γ(· − R_k), where G_Γ is the Γ-periodic Coulomb kernel; that is, the unique square-integrable function on Y solving the Poisson equation

−Δ G_Γ = 4π ( Σ_{γ ∈ Γ} δ_γ − 1/|Y| ).

[...] In this latter case, the thermodynamic limit for the ground-state density, subject to the charge constraint ∫_Y ρ dx = Z, is based on a careful analysis of the Euler–Lagrange equation resulting in the limit. This equation belongs to the class of nonlocal and nonlinear elliptic PDEs without boundary conditions, and the task is to show the existence of a unique and periodic nonnegative solution [8].

Hartree–Fock Type Models

We now define the periodic Hartree–Fock functional (and its reduced version) as introduced in Catto et al. [9]; see also [18]. This is the analog, in the periodic setting, of the standard Hartree–Fock model for molecules when expressed in terms of the one-particle density matrix; see I. Catto's entry on Hartree–Fock Type Methods in this encyclopedia. The main object of interest is the one-particle density matrix of the electrons γ, that is, a self-adjoint operator on L^2(R^3) satisfying γ^2 ≤ γ, or equivalently 0 ≤ γ ≤ 1, where 1 is the identity operator on L^2(R^3), and that commutes with the translations that preserve the underlying lattice Γ. As in the independent-electron case, the Bloch theorem allows one to decompose the density matrix according to the Bloch-wave decomposition L^2(R^3) = (1/|Y^*|) ∫^⊕_{Y^*} dξ L^2_ξ(Y). This leads to:

γ = (1/|Y^*|) ∫^⊕_{Y^*} γ_ξ dξ,

where, for almost every ξ ∈ Y^*, γ_ξ is a self-adjoint operator on L^2_ξ(Y) that is trace class and such that γ_ξ^2 ≤ γ_ξ, or equivalently 0 ≤ γ_ξ ≤ 1, where 1 is the identity operator on L^2_ξ(Y). Using the same notation for the Hilbert–Schmidt kernel of γ_ξ, we have:

γ_ξ(x, y) = Σ_{n ≥ 1} λ_n(ξ) e^{iξ·(x−y)} u_n(ξ, x) \bar{u}_n(ξ, y),

where {e^{iξ·x} u_n(ξ, x)}_{n ≥ 1} is a complete set of eigenfunctions of γ_ξ on L^2_ξ(Y) corresponding to eigenvalues λ_n(ξ) ∈ [0, 1], and where \bar{z} denotes the complex conjugate of the complex number z. For almost every ξ ∈ Y^*, the function x ↦ γ_ξ(x, x) is nonnegative, Γ-periodic, and locally integrable on Y, and

Tr_{L^2_ξ(Y)} γ_ξ = ∫_Y γ_ξ(x, x) dx = Σ_{n ≥ 1} λ_n(ξ).

The full electronic density is the Γ-periodic function ρ_γ defined by


ρ_γ(x) = (1/|Y^*|) ∫_{Y^*} dξ γ_ξ(x, x).   (2)

In particular, ∫_Y ρ_γ(y) dy is the number of electrons per unit cell. An extra condition on the γ_ξ's is necessary in order to give sense to the kinetic energy term in the periodic setting. This condition reads:

∫_{Y^*} dξ Tr_{L^2_ξ(Y)} [ −Δ_ξ γ_ξ ] = ∫_{Y^*} dξ Σ_{n ≥ 1} λ_n(ξ) ∫_Y |∇_x u_n(ξ, x)|^2 dx < +∞,

where −Δ = (1/|Y^*|) ∫^⊕_{Y^*} dξ [−Δ_ξ]; that is, −Δ_ξ is the Laplace operator on Y with quasi-periodic boundary conditions with quasi-momentum ξ. The Hartree–Fock model for crystals (HF, in short) reads:

I^{HF}_{per} = inf { E^{HF}_{per}(γ) : γ ∈ D_{per}, ∫_Y ρ_γ dx = Z },

with

E^{HF}_{per}(γ) = (1/|Y^*|) ∫_{Y^*} Tr_{L^2_ξ(Y)} [ −(1/2) Δ_ξ γ_ξ ] dξ − ∫_Y G_m(y) ρ_γ(y) dy
    + (1/2) ∬_{Y×Y} ρ_γ(x) G_Γ(x − y) ρ_γ(y) dx dy
    − (1/2) ∬_{Y×R^3} |γ(x, y)|^2 / |x − y| dx dy,   (3)

and where D_{per} is the set of admissible periodic density matrices:

D_{per} = { γ : L^2(R^3) → L^2(R^3) : γ = γ^*, 0 ≤ γ ≤ 1, γ = (1/|Y^*|) ∫^⊕_{Y^*} γ_ξ dξ,
    ∫_{Y^*} Tr_{L^2_ξ(Y)} [ (1 − Δ_ξ)^{1/2} γ_ξ (1 − Δ_ξ)^{1/2} ] dξ < +∞ }.

The last term in the energy functional, namely (3), is the exchange term; it can also be rephrased as:

(1/2) ∬_{Y×R^3} |γ(x, y)|^2 / |x − y| dx dy
    = (1/(2|Y^*|^2)) ∫_{Y^*} ∫_{Y^*} dξ dξ' ∬_{Y×Y} dx dy γ_ξ(x, y) W(ξ − ξ', x − y) \bar{γ}_{ξ'}(x, y),

thereby shedding light on its nonlocal nature, where

W(λ, z) = Σ_{γ ∈ Γ} e^{iλ·γ} / |z + γ|,  z ∈ R^3;

see [9]. The function e^{iλ·x} W(λ, x) is Γ-periodic with respect to x when λ is fixed. The Fourier series expansion of W writes as follows:

W(λ, x) = 4π e^{−iλ·x} Σ_{k ∈ Γ^*} e^{ik·x} / |λ − k|^2.

The reduced Hartree–Fock model for crystals (rHF, in short) is obtained from the Hartree–Fock model by getting rid of the exchange term in the energy functional; that is,

E^{rHF}_{per}(γ) = (1/|Y^*|) ∫_{Y^*} Tr_{L^2_ξ(Y)} [ −(1/2) Δ_ξ γ_ξ ] dξ − ∫_Y G_m(y) ρ_γ(y) dy
    + (1/2) ∬_{Y×Y} ρ_γ(x) G_Γ(x − y) ρ_γ(y) dx dy.

From a mathematical point of view, this latter model has nicer properties, the energy functional being strictly convex with respect to the density. Existence of a minimizer for E^{rHF}_{per} and E^{HF}_{per} on the set of density matrices γ ∈ D_{per} such that ∫_Y ρ_γ(x) dx = Z was proved by Catto, Le Bris, and Lions in [9]. Uniqueness



in the rHF case has been proven later by Cancès et al. [6]. The periodic mean-field Hartree–Fock Hamiltonian H^{HF} corresponding to a minimizer γ is decomposed according to the Bloch-wave decomposition as H^{HF} = (1/|Y^*|) ∫^⊕_{Y^*} dξ H^{HF}_ξ, with

H^{HF}_ξ = −(1/2) Δ_ξ − G_m + ρ_γ ∗_Y G_Γ − (1/|Y^*|) ∫_{Y^*} W(ξ' − ξ, x − y) γ_{ξ'}(x, y) dξ'.

The self-consistent equation satisfied by a minimizer γ is then:

γ_ξ = 1_{(−∞, ε_F)}(H^{HF}_ξ) + s 1_{{ε_F}}(H^{HF}_ξ),   (4)

where 1_I is the spectral projection onto the interval I, the real ε_F is a Lagrange multiplier, identified with a Fermi level as in the linear case, and the real parameter s is 0 or 1, as proved by Ghimenti and Lewin in [11]. In particular, they have also proved that every minimizer of the periodic Hartree–Fock functional is necessarily a projector, a fact that was so far known only for the Hartree–Fock model for molecules; see [2, 15] and the entry Hartree–Fock Type Methods in this encyclopedia. Therefore, for almost every ξ ∈ Y^*, the eigenvalues {λ_n(ξ)}_{n ≥ 1} that appear in the decomposition of γ_ξ are either 0 or 1 for a minimizer. The Euler–Lagrange equation (4) may be rewritten in terms of the eigenfunctions {u_n(ξ, ·)}_{n ≥ 1} of the operators γ_ξ. We obtain a system of infinitely many nonlinear, nonlocal coupled partial differential equations of Schrödinger type: for every n ≥ 1 and for almost every ξ ∈ Y^*:

H^{HF}_ξ u_n(ξ, x) = ε_n(ξ) u_n(ξ, x) on Y,  with ε_n(ξ) ≤ ε_F.

Extensions

Crystals with Defects

Real crystals feature defects or irregularities in the ideal arrangements described above, and it is these defects that explain many of the electrical and mechanical properties of real materials [14]. The first mathematical result in this direction is due to Cancès, Deleurence, and Lewin, who introduced and studied in [6, 7] an rHF-type model for crystals with a defect, that is, a vacancy, an interstitial atom, or an impurity, with possible rearrangement of the neighboring atoms.

Crystal Problem

It is an unsolved problem in the study of matter to understand why matter is in a crystalline state at low temperature. A few mathematical results have contributed to partially answering this fundamental issue, known as the crystal problem. The pioneering work is due to Radin, for electrons considered as classical particles and in one space dimension [19, 20]. In two dimensions, the crystallization phenomenon in classical mechanics has been solved by Theil [23]. For quantum electrons, the first mathematical result goes back to Blanc and Le Bris, within the framework of a one-dimensional TFW model [3].

Analogous results for the reduced Hartree–Fock model were proved by Cancès, Deleurence, and Lewin in [6]. In that case, the minimizer γ is unique, and denoting by

H^{rHF} = −(1/2) Δ − G_m + ρ_γ ∗_Y G_Γ

the corresponding periodic mean-field Hamiltonian, it solves the nonlinear equation

γ_ξ = 1_{(−∞, ε_F]}(H^{rHF}_ξ),

where, here again, the real ε_F is a Lagrange multiplier, identified with a Fermi level. In particular, the minimizer is also a projector in that case. These properties are crucial for the proper construction of a reduced Hartree–Fock model for a crystal with a defect; see [7].

References

1. Ashcroft, N., Mermin, N.: Solid State Physics. Saunders, Philadelphia (1976)
2. Bach, V.: Error bound for the Hartree-Fock energy of atoms and molecules. Commun. Math. Phys. 147(3), 527–548 (1992)
3. Blanc, X., Le Bris, C.: Periodicity of the infinite-volume ground state of a one-dimensional quantum model. Nonlinear Anal. 48(6, Ser. A: Theory Methods), 791–803 (2002)
4. Cancès, É., Ehrlacher, V.: Local defects are always neutral in the Thomas–Fermi–von Weiszäcker model for crystals. Arch. Ration. Mech. Anal. 202(3), 933–973 (2011)
5. Cancès, É., Defranceschi, M., Kutzelnigg, W., Le Bris, C., Maday, Y.: Computational quantum chemistry: a primer. In: Ciarlet, P.G. (ed.) Handbook of Numerical Analysis, vol. X, pp. 3–270. North-Holland, Amsterdam (2003)
6. Cancès, É., Deleurence, A., Lewin, M.: A new approach to the modelling of local defects in crystals: the reduced Hartree-Fock case. Commun. Math. Phys. 281(1), 129–177 (2008)
7. Cancès, É., Deleurence, A., Lewin, M.: Non-perturbative embedding of local defects in crystalline materials. J. Phys. 20, 294213 (2008)
8. Catto, I., Le Bris, C., Lions, P.L.: The Mathematical Theory of Thermodynamic Limits: Thomas-Fermi Type Models. Oxford Mathematical Monographs. Clarendon/Oxford University Press, New York (1998)
9. Catto, I., Le Bris, C., Lions, P.L.: On the thermodynamic limit for Hartree-Fock type models. Ann. Inst. H. Poincaré Anal. Non Linéaire 18(6), 687–760 (2001)
10. Fefferman, C.: The thermodynamic limit for a crystal. Commun. Math. Phys. 98(3), 289–311 (1985)
11. Ghimenti, M., Lewin, M.: Properties of periodic Hartree-Fock minimizers. Calc. Var. Partial Differ. Equ. 35(1), 39–56 (2009)
12. Hainzl, C., Lewin, M., Solovej, J.P.: The thermodynamic limit of quantum Coulomb systems. Part I. General theory. Adv. Math. 221, 454–487 (2009)
13. Hainzl, C., Lewin, M., Solovej, J.P.: The thermodynamic limit of quantum Coulomb systems. Part II. Applications. Adv. Math. 221, 488–546 (2009)
14. Kittel, C.: Introduction to Solid State Physics, 8th edn. Wiley, New York (2004)
15. Lieb, E.H.: Variational principle for many-fermion systems. Phys. Rev. Lett. 46, 457–459 (1981)
16. Lieb, E.H., Lebowitz, J.L.: The constitution of matter: existence of thermodynamics for systems composed of electrons and nuclei. Adv. Math. 9, 316–398 (1972)
17. Lieb, E.H., Simon, B.: The Thomas-Fermi theory of atoms, molecules and solids. Adv. Math. 23(1), 22–116 (1977)
18. Pisani, C.: Quantum-mechanical ab-initio calculation of the properties of crystalline materials. Lecture Notes in Chemistry, vol. 67. Springer, Berlin (1996)
19. Radin, C.: Classical ground states in one dimension. J. Stat. Phys. 35(1–2), 109–117 (1984)
20. Radin, C., Schulman, L.S.: Periodicity of classical ground states. Phys. Rev. Lett. 51(8), 621–622 (1983)
21. Reed, M., Simon, B.: Methods of Modern Mathematical Physics. IV. Analysis of Operators. Academic, New York (1978)
22. Ruelle, D.: Statistical Mechanics. Rigorous Results. World Scientific, Singapore/Imperial College Press, London (1999)
23. Theil, F.: A proof of crystallization in two dimensions. Commun. Math. Phys. 262(1), 209–236 (2006)


Matrix Functions: Computation Nicholas J. Higham School of Mathematics, The University of Manchester, Manchester, UK

Synonyms

Function of a matrix

Definition

A matrix function is a map from the set of complex n × n matrices to itself, defined in terms of a given scalar function in one of various equivalent ways. For example, if the scalar function has a power series expansion f(x) = Σ_{i=0}^∞ a_i x^i, then f(A) = Σ_{i=0}^∞ a_i A^i for any n × n matrix A whose eigenvalues lie within the radius of convergence of the power series. Other definitions apply more generally, without restrictions on the spectrum [6].
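The definition can be checked numerically: for a diagonalizable A = X D X^{-1}, every definition gives f(A) = X diag(f(d_i)) X^{-1} (the transformation approach described below). A minimal numpy sketch on a hypothetical symmetric matrix:

```python
import numpy as np

def funm_diag(A, f):
    """f(A) = X diag(f(d_i)) X^{-1} for diagonalizable A; reliable only
    when the eigenvector matrix X is well conditioned (e.g., normal A)."""
    d, X = np.linalg.eig(A)
    return X @ np.diag(f(d)) @ np.linalg.inv(X)

A = np.array([[2.0, 1.0], [1.0, 2.0]])   # symmetric, eigenvalues 1 and 3
expA = funm_diag(A, np.exp)
```

Since A is symmetric, expA is symmetric and its trace equals e^1 + e^3, the sum of the exponentials of the eigenvalues.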

Description

Transformation Methods

Let A be an n × n matrix. A basic property of matrix functions is that f(X^{-1} A X) = X^{-1} f(A) X for any nonsingular matrix X. Hence, if A is diagonalizable, so that A = X D X^{-1} for some diagonal matrix D = diag(d_i) and nonsingular X, then f(A) = X f(D) X^{-1} = X diag(f(d_i)) X^{-1}. The task of computing f(A) is therefore trivial when A has a complete set of eigenvectors and the eigendecomposition is known. However, in general the diagonalizing matrix X can be arbitrarily ill conditioned, and the evaluation in floating point arithmetic can therefore be inaccurate, so this approach is recommended only for matrices for which X can be assured to be well conditioned. For Hermitian, symmetric, or more generally normal matrices (those satisfying AA^* = A^*A), X can be taken unitary, and evaluation by diagonalization is an excellent approach. For general matrices, it is natural to restrict to unitary similarity transformations, in which case the Schur decomposition A = QTQ^* can be exploited, where




Q is unitary and T is upper triangular. Now f(A) = Q f(T) Q^*, and the problem reduces to computing a function of a triangular matrix. In the 2 × 2 case there is an explicit formula:

f\!\left( \begin{bmatrix} \lambda_1 & t_{12} \\ 0 & \lambda_2 \end{bmatrix} \right) = \begin{bmatrix} f(\lambda_1) & t_{12}\, f[\lambda_2, \lambda_1] \\ 0 & f(\lambda_2) \end{bmatrix},   (1)

where f[λ_2, λ_1] is a first-order divided difference and the notation reflects that λ_i = t_ii is an eigenvalue of A. More generally, when the eigenvalues are distinct, f(T) can be computed by an elegant recurrence due to Parlett [10]. This recurrence breaks down for repeated eigenvalues and can be inaccurate when two eigenvalues are close. These problems can be avoided by employing a block form of the recurrence, in which T = (T_ij) is partitioned into a block m × m matrix with square diagonal blocks T_ii. The Schur–Parlett algorithm of Davies and Higham [4] uses a unitary similarity to reorder the blocks of T so that no two distinct diagonal blocks have close eigenvalues, while within every diagonal block the eigenvalues are close; it then applies a block form of Parlett's recurrence. Some other method must be used to compute the diagonal blocks f(T_ii), such as a Taylor series taken about the mean of the eigenvalues of the block. The Schur–Parlett algorithm is the best general-purpose algorithm for evaluating matrix functions and is implemented in the MATLAB function funm. For the square root function, f(T) can be computed by a different approach: the equation U^2 = T can be solved for the upper triangular matrix U by a recurrence of Björck and Hammarling [3] that runs to completion even if A has repeated eigenvalues. A generalization of this recurrence can be used to compute pth roots [11].

Approximation Methods

Another class of methods is based on approximations to the underlying scalar function. Suppose that, for some rational function r, r(A) approximates f(A) well for A within some ball. Then we can consider transforming a general A to a matrix B lying in the ball, approximating f(B) ≈ r(B), and then recovering an approximation to f(A) from r(B). The most important example of this approach is the scaling and squaring method for the matrix exponential, which approximates e^A ≈ r_m(A/2^s)^{2^s}, where m and s are nonnegative integers and r_m is the [m/m] Padé approximant to e^x. Backward error analysis can be used to determine a choice of the parameters s and m that achieves a given backward error (in exact arithmetic) at minimal computational cost [1, 7]. The analogue for the matrix logarithm is the inverse scaling and squaring method, which uses the approximation log(A) ≈ 2^s r_m(A^{1/2^s} − I), where r_m(x) is the [m/m] Padé approximant to log(1 + x). Here, amongst the many logarithms of a matrix, log denotes the principal logarithm: the one whose eigenvalues have imaginary parts lying in (−π, π); there is a unique such logarithm for any A having no eigenvalues on the closed negative real axis. Again, backward error analysis can be used to determine an optimal choice of the parameters s and m [2]. The derivation of (inverse) scaling and squaring algorithms requires attention to many details, such as how to evaluate a Padé approximant at a matrix argument, how to obtain the sharpest possible error bounds while using only norms, and how to avoid unnecessary loss of accuracy due to rounding errors. Approximation methods can be effectively used in conjunction with a Schur decomposition, in which case the triangularity can be exploited [1, 2, 8].

Matrix Iterations

For functions that satisfy an algebraic equation, matrix iterations can be set up that, under appropriate conditions, converge to the matrix function. Many different derivations are possible, one of which is to apply Newton's method to the relevant equation. For example, for the equation X^2 = A, Newton's method can be put in the form

X_{k+1} = (1/2) (X_k + X_k^{-1} A),   (2)

under the assumption that X_0 commutes with A. This iteration does not always converge. But if A has no eigenvalues on the closed negative real axis and we take X_0 = A, then X_k converges quadratically to A^{1/2}, the unique square root of A whose spectrum lies in the open right half-plane. Matrix iterations potentially suffer from two problems: they may be slow to converge initially, before the asymptotic fast convergence (in practice of quadratic or higher rate) sets in, and they may be unstable in finite precision arithmetic. Iteration (2) suffers from both these



problems. However, (2) is mathematically equivalent to the coupled iteration

X_{k+1} = (1/2)(X_k + Y_k^{-1}),   X_0 = A,
Y_{k+1} = (1/2)(Y_k + X_k^{-1}),   Y_0 = I,     (3)

of Denman and Beavers [5]: the X_k from (3) are identical to those from (2) with X_0 = A, and Y_k ≡ A^{-1} X_k. This iteration is numerically stable. Various other equivalent and practically useful forms of (2) are available [6, Chap. 6].

The convergence of matrix iterations in the early stages can be accelerated by including scaling parameters. Consider the Newton iteration

X_{k+1} = (1/2)(X_k + X_k^{-1}),   X_0 = A.     (4)
Assuming that A has no pure imaginary eigenvalues, X_k converges quadratically to sign(A), which is the matrix function corresponding to the scalar sign function that maps points in the open right half-plane to 1 and points in the open left half-plane to −1. Although the iteration converges at a quadratic rate, convergence can be extremely slow initially. To accelerate the iteration we can introduce a positive scaling parameter μ_k:

X_{k+1} = (1/2)(μ_k X_k + μ_k^{-1} X_k^{-1}),   X_0 = A.

Various choices of μ_k are available, with differing motivations. One is the determinantal scaling μ_k = |det(X_k)|^{-1/n}, which tries to bring the eigenvalues of X_k close to the unit circle. The number of iterations required for convergence to double precision accuracy (unit roundoff about 10^{-16}) varies with the iteration (and function) and the scaling but in some cases can be strictly bounded. For certain scaled iterations for computing the unitary polar factor of a matrix, it can be proved that fewer than ten iterations are needed for matrices with condition number less than 10^{16} (e.g., [9]). Moreover, for these iterations only one or two iterations might be needed if the starting matrix is nearly unitary.

References
1. Al-Mohy, A.H., Higham, N.J.: A new scaling and squaring algorithm for the matrix exponential. SIAM J. Matrix Anal. Appl. 31(3), 970–989 (2009)
2. Al-Mohy, A.H., Higham, N.J.: Improved inverse scaling and squaring algorithms for the matrix logarithm. SIAM J. Sci. Comput. 34(4), C152–C169 (2012)
3. Björck, Å., Hammarling, S.: A Schur method for the square root of a matrix. Linear Algebra Appl. 52/53, 127–140 (1983)
4. Davies, P.I., Higham, N.J.: A Schur–Parlett algorithm for computing matrix functions. SIAM J. Matrix Anal. Appl. 25(2), 464–485 (2003)
5. Denman, E.D., Beavers, A.N., Jr.: The matrix sign function and computations in systems. Appl. Math. Comput. 2, 63–94 (1976)
6. Higham, N.J.: Functions of Matrices: Theory and Computation. Society for Industrial and Applied Mathematics, Philadelphia (2008)
7. Higham, N.J.: The scaling and squaring method for the matrix exponential revisited. SIAM Rev. 51(4), 747–764 (2009)
8. Higham, N.J., Lin, L.: A Schur–Padé algorithm for fractional powers of a matrix. SIAM J. Matrix Anal. Appl. 32(3), 1056–1078 (2011)
9. Nakatsukasa, Y., Bai, Z., Gygi, F.: Optimizing Halley's iteration for computing the matrix polar decomposition. SIAM J. Matrix Anal. Appl. 31(5), 2700–2720 (2010)
10. Parlett, B.N.: A recurrence among the elements of functions of triangular matrices. Linear Algebra Appl. 14, 117–121 (1976)
11. Smith, M.I.: A Schur algorithm for computing matrix pth roots. SIAM J. Matrix Anal. Appl. 24(4), 971–989 (2003)

Mechanical Systems

Bernd Simeon
Department of Mathematics, Felix-Klein-Zentrum, TU Kaiserslautern, Kaiserslautern, Germany

Synonyms
Constrained mechanical system; Differential-algebraic equation (DAE); Euler–Lagrange equations; Multibody system (MBS); Time integration methods


Overview
A mechanical multibody system (MBS) is defined as a set of rigid bodies and massless interconnection elements such as joints that constrain the motion and springs and dampers that act as compliant elements. Variational principles dating back to Euler and Lagrange characterize the dynamics of a multibody system and are the basis of advanced simulation software, so-called multibody formalisms. The corresponding specialized time integrators adopt techniques from differential-algebraic equations (DAEs) and are extensively used in various application fields ranging from vehicle dynamics to robotics and biomechanics. This contribution briefly introduces the underlying mathematical models, discusses alternative formulations of the arising DAEs, and then gives, without claiming to be comprehensive, a survey of the most successful integration schemes.

Mathematical Modeling
In Fig. 1, a multibody system with typical components is depicted (Mechanical Systems, Fig. 1: sketch of a multibody system with rigid bodies and typical interconnections). The motion of the bodies is described by the vector q(t) ∈ R^{n_q}, which comprises the coordinates for position and orientation of each body depending on time t. We leave the specifics of the chosen coordinates open at this point but will come back to this issue below. Differentiation with respect to time is expressed by a dot, and thus, we write q̇(t) and q̈(t) for the corresponding velocity and acceleration vectors.

Revolute, translational, universal, and spherical joints are examples for bonds in a multibody system. They may constrain the motion q and hence determine its kinematics. If constraints are present, we express the resulting conditions on q in terms of n_λ constraint equations

0 = g(q).     (1)

Obviously, a meaningful model requires n_λ < n_q. The equations (1) that restrict the motion q are called holonomic constraints, and the matrix

G(q) := ∂g(q)/∂q ∈ R^{n_λ × n_q}

is called the constraint Jacobian. We remark that there exist constraints, e.g., driving constraints, that may explicitly depend on time t and that are written as 0 = g(q, t). For notational simplicity, however, we omit this dependence in (1). A standard assumption on the constraint Jacobian is the full rank condition

rank G(q) = n_λ,     (2)
which means that the constraint equations are linearly independent. In this case, the difference n_y := n_q − n_λ is the number of degrees of freedom (DOF) in the system.
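As a toy illustration of these quantities (a hypothetical planar pendulum, added here and not part of the original text): with q = (x, y) and one holonomic constraint we have n_q = 2, n_λ = 1, and hence n_y = 1 degree of freedom.

```python
import numpy as np

ell = 2.0  # pendulum length (hypothetical example value)

def g(q):
    # One holonomic constraint (1): the point mass stays on a circle of radius ell.
    x, y = q
    return np.array([0.5 * (x * x + y * y - ell * ell)])

def G(q):
    # Constraint Jacobian G(q) = dg/dq, here a 1 x 2 matrix.
    x, y = q
    return np.array([[x, y]])

q = np.array([ell * np.sin(0.3), -ell * np.cos(0.3)])  # a consistent position
n_lambda, n_q = G(q).shape
# Full rank condition (2) holds away from q = 0, so n_y = n_q - n_lambda = 1.
assert np.linalg.matrix_rank(G(q)) == n_lambda
```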


Two routes branch off at this point. The first modeling approach expresses the governing equations in terms of the redundant variables q and uses additional Lagrange multipliers λ(t) ∈ R^{n_λ} to take the constraint equations into account. Alternatively, the second approach introduces minimal coordinates y(t) ∈ R^{n_y} such that the redundant variables q can be written as a function q(y) and that the constraints are satisfied for all choices of y:

g(q(y)) ≡ 0.     (3)

As a consequence, by differentiation of the identity (3) with respect to y, we get the orthogonality relation

G(q(y)) N(y) = 0     (4)

with the null space matrix N(y) := ∂q(y)/∂y ∈ R^{n_q × n_y}.

Lagrange Equations of Type One and Type Two
Using both the redundant position variables q and additional Lagrange multipliers λ to describe the dynamics leads to the equations of constrained mechanical motion, also called the Lagrange equations of type one:

M(q) q̈ = f(q, q̇, t) − G(q)^T λ,     (5a)
0 = g(q),     (5b)

where M(q) ∈ R^{n_q × n_q} stands for the mass matrix and f(q, q̇, t) ∈ R^{n_q} for the vector of applied and internal forces. In case of a conservative multibody system where the applied forces can be written as the gradient of a potential U, the equations of motion (5) follow from Hamilton's principle of least action:

∫_{t_0}^{t_1} (T − U − g(q)^T λ) dt → stationary!     (6)

Here, the kinetic energy possesses a representation as quadratic form T = (1/2) q̇^T M(q) q̇, and the Lagrange multiplier technique is applied for coupling the dynamics with the constraints (1). In the nonconservative case, the Lagrange equations of type one read [23, 25]

d/dt (∂T/∂q̇) − ∂T/∂q = f_a(q, q̇, t) − G(q)^T λ,
0 = g(q)     (7)

with applied forces f_a. Carrying out the differentiations and defining the force vector f as sum of f_a and Coriolis and centrifugal forces result in the equations of motion (5). Note that f_a = −∇U in the conservative case. It should be remarked that for ease of presentation, we omit the treatment of generalized velocities resulting from 3-dimensional rotation matrices. For that case, an additional kinematic equation q̇ = S(q) v with transformation matrix S and velocity vector v needs to be taken into account [9].

The Lagrange equations of type one are a system of second-order differential equations with additional constraints (5), which is a special form of a DAE. Applying minimal coordinates y, on the other hand, eliminates the constraints and allows generating a system of ordinary differential equations. If we insert the coordinate transformation q = q(y(t)) into the principle (6) or apply it directly to (5), the constraints and Lagrange multipliers cancel due to the property (3). The resulting Lagrange equations of type two then take the form

C(y) ÿ = h(y, ẏ, t).     (8)

This system of second-order ordinary differential equations bears also the name state space form. For a closer look at the structure of (8), we recall the null space matrix N from (4) and derive the relations

d/dt q(y) = N(y) ẏ,
d²/dt² q(y) = N(y) ÿ + (∂N(y)/∂y)(ẏ, ẏ)

for the velocity and acceleration vectors. Inserting these relations into the dynamic equations (5a) and premultiplying by N^T lead directly to the state space form (8).

The analytical complexity of the constraint equations (1) makes it often impossible to obtain a set of minimal coordinates y that is valid for all


configurations of the system. Moreover, although we know from the implicit function theorem that such a set exists in a neighborhood of the current configuration, it might lose its validity when the configuration changes. This holds in particular for multibody systems with so-called closed kinematic loops.
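Where minimal coordinates are available globally, the construction above can be checked directly. The following sketch uses the hypothetical planar pendulum (an illustration added here), with the joint angle as minimal coordinate y, and verifies the identity (3) and the orthogonality relation (4):

```python
import numpy as np

ell = 2.0  # pendulum length (hypothetical example value)

def q_of_y(y):
    # Minimal coordinate y (the angle): g(q(y)) = 0 holds for every y.
    return np.array([ell * np.sin(y), -ell * np.cos(y)])

def N(y):
    # Null space matrix N(y) = dq/dy, here an n_q x 1 column.
    return np.array([[ell * np.cos(y)], [ell * np.sin(y)]])

def G(q):
    # Constraint Jacobian of g(q) = (x^2 + y^2 - ell^2)/2.
    return q.reshape(1, 2)

for y in np.linspace(-3.0, 3.0, 13):
    q = q_of_y(y)
    # Relation (3): g(q(y)) = 0 identically ...
    assert abs(0.5 * (q @ q - ell * ell)) < 1e-12
    # ... and hence relation (4): G(q(y)) N(y) = 0.
    assert abs((G(q) @ N(y))[0, 0]) < 1e-12
# Premultiplying (5a) by N^T reduces the mass matrix to the scalar
# C(y) = N^T M N = m*ell^2 (m = 1 here), i.e., the pendulum state space form (8).
```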

Remarks
1. The differential-algebraic model (5) does not cover all aspects of multibody dynamics. In particular, features such as control laws, non-holonomic constraints, and substructures require a more general formulation. Corresponding extensions are discussed in [9, 26]. A detailed treatment of systems with non-holonomic constraints is given by Rabier and Rheinboldt [22].
2. In the conservative case, which is of minor relevance in engineering applications, the Lagrange equations of type one and type two can be reformulated by means of Hamilton's canonical equations.
3. Different methodologies for the derivation of the governing equations are commonly applied in multibody dynamics. Examples are the principle of virtual work, the principle of Jourdain, and the Newton–Euler equations in combination with the principle of D'Alembert. These approaches are, in general, equivalent and lead to the same mathematical model. In practice, the crucial point lies in the choice of coordinates and in the corresponding computer implementation.
4. With respect to the choice of coordinates, one distinguishes between absolute and relative coordinates. Absolute or Cartesian coordinates describe the motion of each body with respect to an inertial reference frame, while relative or joint coordinates are based on relative motions between interacting bodies. Using absolute coordinates results in a large number of equations which have a clear and sparse structure and are inexpensive to compute. Furthermore, constraints always imply a differential-algebraic model [16]. Relative coordinates, on the other hand, lead to a reduced number of equations and, in case of systems with a tree structure, allow the elimination of all kinematic bonds, thus leading to a global state space form. In general, the system matrices are full and require more complicated computations than for absolute coordinates.


Index Reduction and Stabilization
The state space form (8) represents a system of second-order ordinary differential equations. The equations of constrained mechanical motion (5), on the other hand, constitute a differential-algebraic system of index 3, as we will see in the following. For this purpose, it is convenient to rewrite the equations as a system of first order:

q̇ = v,     (9a)
M(q) v̇ = f(q, v, t) − G(q)^T λ,     (9b)
0 = g(q)     (9c)

with additional velocity variables v(t) ∈ R^{n_q}. By differentiating the constraints (9c) with respect to time, we obtain the constraints at velocity level:

0 = (d/dt) g(q) = G(q) q̇ = G(q) v.     (10)

A second differentiation step yields the constraints at acceleration level:

0 = (d²/dt²) g(q) = G(q) v̇ + κ(q, v),   κ(q, v) := (∂G(q)/∂q)(v, v),     (11)

where the two-form κ comprises additional derivative terms. Combining the dynamic equation (9b) and the acceleration constraints (11), we finally arrive at the linear system

[ M(q)  G(q)^T ] [ v̇ ]   [ f(q, v, t) ]
[ G(q)    0    ] [ λ  ] = [ −κ(q, v)  ].     (12)

The matrix on the left-hand side has a saddle point structure. We presuppose that

[ M(q)  G(q)^T ]
[ G(q)    0    ]   is invertible     (13)

in a neighborhood of the solution. A necessary but not sufficient condition for (13) is the full rank of the constraint Jacobian G as stated in (2). If in addition the mass matrix M is symmetric positive definite, (13) obviously holds. (We remark that there are applications


where the mass matrix is singular, but the prerequisite (13) nevertheless is satisfied.) Assuming (13) and a symmetric positive definite mass matrix, we can solve the linear system (12) for the acceleration v̇ and the Lagrange multiplier λ by block Gaussian elimination. This leads to an ordinary differential equation for the velocity variables v and an explicit expression for the Lagrange multiplier λ = λ(q, v, t). Since two differentiation steps result in the linear system (12) and a final third differentiation step yields an ordinary differential equation for λ, the differentiation index of the equations of constrained mechanical motion is 3.

Note that the above differentiation process involves a loss of integration constants. However, if the initial values (q_0, v_0) are consistent, i.e., if they satisfy the original constraints and the velocity constraints,

0 = g(q_0),   0 = G(q_0) v_0,     (14)

the solution of (9a) and (12) also fulfills the original system (9).

Higher index DAEs such as (9) suffer from several drawbacks. For one, in a numerical time integration scheme, a differentiation step is replaced by a difference quotient, i.e., by a division by the stepsize. Therefore, the approximation properties of the numerical scheme deteriorate and we observe phenomena like order reduction, ill-conditioning, or even loss of convergence. Most severely affected are typically the Lagrange multipliers. Also, an amplification of perturbations may occur (cf. the concept of the perturbation index [15]). For these reasons, it is mostly not advisable to tackle DAEs of index 3 directly. Instead, it has become standard in multibody dynamics to lower the index first by introducing alternative formulations.

Formulations of Index 1 and Index 2
The differentiation process for determining the index revealed the hidden constraints at velocity and at acceleration level. It is a straightforward idea to replace now the original position constraint (9c) by one of the hidden constraints. Selecting the acceleration equation (11) for this purpose, one obtains

q̇ = v,
M(q) v̇ = f(q, v, t) − G(q)^T λ,     (15)
0 = G(q) v̇ + κ(q, v).

This system is obviously of index 1, and at first sight, one could expect much less difficulties here. But a closer view shows that (15) lacks the information of the original position and velocity constraints, which have become invariants of the system. In general, these invariants are not preserved under discretization, and the numerical solution may thus turn unstable, which is called the drift-off phenomenon.

Instead of the acceleration constraints, one can also use the velocity constraints (10) to replace (9c). This leads to

q̇ = v,
M(q) v̇ = f(q, v, t) − G(q)^T λ,     (16)
0 = G(q) v.

Now the index is 2, but similar to the index 1 case, the information of the position constraint is lost. The resulting drift off is noticeable but stays linear, which means a significant improvement compared to (15) where the drift off grows quadratically in time (see (19) below). Nevertheless, additional measures such as stabilization by projection are often applied when discretizing (16).

GGL and Overdetermined Formulation
On the one hand, we have seen that it is desirable for the governing equations to have an index as small as possible. On the other hand, though simple differentiation lowers the index, it may lead to drift off. The formulation of Gear, Gupta, and Leimkuhler [12] combines the kinematic and dynamic equations (9a, b) with the constraints at velocity level (10). The position constraints (5b) are interpreted as invariants and appended by means of extra Lagrange multipliers, which results in

q̇ = v − G(q)^T μ,
M(q) v̇ = f(q, v, t) − G(q)^T λ,     (17)
0 = G(q) v,
0 = g(q)

with μ(t) ∈ R^{n_λ}. A straightforward calculation shows μ = 0 if G(q) has full rank. With the additional multipliers μ vanishing, (17) and the original equations


of motion (5) coincide along any solution. Yet, the index of (17) is 2 instead of 3. Some authors refer to (17) also as stabilized index 2 system.

From an analytical point of view, one could drop the extra multiplier μ in (17) and consider instead the overdetermined system

q̇ = v,
M(q) v̇ = f(q, v, t) − G(q)^T λ,     (18)
0 = G(q) v,
0 = g(q).

Though there are more equations than unknowns in (18), the solution is unique and, given consistent initial values, coincides with the solution of the original system (9). Even more, one could add the acceleration constraint (11) to (18) so that all hidden constraints are explicitly stated. After discretization, however, a generalized inverse is required to define a meaningful method.

Local State Space Form
If one views the equations of constrained mechanical motion as differential equations on a manifold, it becomes clear that it is always possible to find at least a local set of minimal coordinates to set up the state space form (8) and compute a corresponding null space matrix. We mention in this context the coordinate partitioning method [28] and the tangent space parametrization [21], which both allow the application of ODE integration schemes in the local coordinates. The class of null space methods [5] is related to these approaches.

Time Integration Methods
We discuss in the following a selection of time integration methods that are typically used for solving the constrained mechanical system (9). For this purpose, we assume consistent initial values (14) and denote by t_0 < t_1 < ... < t_n the time grid, with h_i = t_{i+1} − t_i being the stepsize.

Projection Methods
By solving the linear system (12), the formulation (15) of index 1 can be reduced to an ODE for the position and velocity variables. This means that any standard ODE integrator can be easily applied in this way to solve the equations of constrained mechanical motion. However, as the position and velocity constraints are in general not preserved under discretization, the arising drift off requires additional measures. More specifically, if the integration method has order k, the numerical solution q_n and v_n after n time steps satisfies the bound [13]

‖g(q_n)‖ ≤ h_max^k (A t_n + B t_n²),   ‖G(q_n) v_n‖ ≤ h_max^k C t_n     (19)

with constants A, B, C. The drift off from the position constraints thus grows quadratically with the length of the integration interval but depends also on the order of the method. If the constraints are linear, however, there is no drift off since the corresponding invariants are preserved by linear integration methods.

A very common cure for the drift off is a two-stage projection method where after each integration step, the numerical solution is projected onto the manifold of position and velocity constraints. Let q_{n+1} and v_{n+1} denote the numerical solution of (15). Then, the projection consists of the following steps:

0 = M(q̃_{n+1})(q̃_{n+1} − q_{n+1}) + G(q̃_{n+1})^T ζ,
0 = g(q̃_{n+1}),     solve (20a) for q̃_{n+1}, ζ;     (20a)

0 = M(q̃_{n+1})(ṽ_{n+1} − v_{n+1}) + G(q̃_{n+1})^T η,
0 = G(q̃_{n+1}) ṽ_{n+1},     solve (20b) for ṽ_{n+1}, η.     (20b)

A simplified Newton method can be used to solve the nonlinear system (20a), and since the corresponding iteration matrix is just (13) evaluated at q_{n+1} and already available in decomposed form due to the previous integration step, this projection is inexpensive to compute. Furthermore, (20b) represents a linear system for ṽ_{n+1} and η with similar structure where the corresponding matrix decomposition can be reused for solving (12) in the next integration step [24]. As the projection (20) reflects a metric that is induced by the mass matrix M [18], the projected value q̃_{n+1} is the point on the constraint manifold that has minimum distance to q_{n+1} in this metric. An analysis of the required number of Newton iterations and of the


relation to alternative stabilization techniques including the classical Baumgarte method [4] is provided by Ascher et al. [3]. Projection methods are particularly attractive in combination with explicit ODE integrators. The combination with implicit methods, on the other hand, is also possible but not as efficient as the direct discretization by DAE integrators discussed below.

Half-Explicit Methods
Half-explicit methods for DAEs discretize the differential equations explicitly, while the constraint equations are enforced in an implicit fashion. Due to the linearity of the velocity constraint (10), the formulation (16) of index 2 is a good candidate for this method class. Several one-step and extrapolation methods have been tailored to the needs and peculiarities of mechanical systems. The half-explicit Euler method as generic algorithm for the method class reads

q_{n+1} = q_n + h v_n,
M(q_n) v_{n+1} = M(q_n) v_n + h f(q_n, v_n, t_n) − h G(q_n)^T λ_n,     (21)
0 = G(q_{n+1}) v_{n+1}.

Similar to the index 1 case above, only a linear system of the form

[ M(q_n)      G(q_n)^T ] [ v_{n+1} ]   [ M(q_n) v_n + h f(q_n, v_n, t_n) ]
[ G(q_{n+1})      0    ] [ h λ_n   ] = [ 0                               ]

arises in each step. The scheme (21) forms the basis for a class of half-explicit Runge–Kutta methods [2, 15] and extrapolation methods [18]. These methods have in common that only information of the velocity constraints is required. As remedy for the drift off, which grows only linearly here but might still be noticeable, the projection (20) can be applied.

Implicit DAE Integrators
For the application of general DAE methods, it is convenient to write the different formulations from above as linear implicit system

A ẋ = φ(x, t)     (22)

with singular matrix A, right-hand side φ, and with the vector x(t) collecting the position and velocity coordinates as well as the Lagrange multipliers. A state-dependent mass matrix M(q) can be (formally) inverted and moved to the right-hand side or, alternatively, treated by introducing additional acceleration variables a(t) := v̇(t) and writing the dynamic equations as 0 = M(q) a − f(q, v, t) + G(q)^T λ.

BDF Methods
The backward differentiation methods (BDFs) are successfully used as a multistep discretization of stiff and differential-algebraic equations [11]. For the linear implicit system (22), the BDF discretization with fixed stepsize h simply replaces ẋ(t_{n+k}) by the difference scheme

A (1/h) Σ_{i=0}^{k} α_i x_{n+i} = φ(x_{n+k}, t_{n+k})     (23)

with coefficients α_i, i = 0, ..., k. Since the difference operator on the left evaluates the derivative of a polynomial passing through the points x_n, x_{n+1}, ..., x_{n+k}, this discretization can be interpreted as a collocation method and extends easily to variable stepsizes. The new solution x_{n+k} is given by solving the nonlinear system

(α_k/h) A x_{n+k} − φ(x_{n+k}, t_{n+k}) + (1/h) A Σ_{i=0}^{k−1} α_i x_{n+i} = 0     (24)

for x_{n+k}, where α_k is the leading coefficient of the method. The convergence properties of the BDFs when applied to the equations of constrained mechanical motion depend on the index of the underlying formulation [6]. In case of the original system (9) of index 3, convergence is only guaranteed for fixed stepsize, and additional numerical difficulties arise that are typical for higher index DAEs. The formulations (15) and (16) behave better under discretization but suffer from drift off, and for this reason, the GGL formulation (17) is, in general, preferred when using the BDFs for constrained mechanical systems. However, since (17) is still of index 2, the local error in the different


components is, assuming μ_n = 0, exact history data, and constant stepsize:

y(t_k) − y_k = O(h^{k+1}),   z(t_k) − z_k = O(h^k),     (25)

where y = (q, v) collects the differential components and z = (λ, μ) the algebraic components. To cope with this order reduction phenomenon in the algebraic components and related effects, a scaled norm ‖y‖ + h‖z‖ is required both for local error estimation and for convergence control of Newton's method. Global convergence of order k for the k-step BDF method when applied to (17) can nevertheless be shown both for the differential and the algebraic components. As analyzed in [10], the BDF discretization of (17) is equivalent to solving the corresponding discretized overdetermined system (18) in a certain least squares sense where the least squares objective function inherits certain properties of the state space form (8).

Implicit Runge–Kutta Methods
Like the BDFs, implicit Runge–Kutta schemes constitute an effective method class for stiff and differential-algebraic equations. Assuming a stiffly accurate method where the weights are simply given by the last row of the coefficient matrix, such a method with s stages for the linear implicit system (22) reads

A X_{n,i} = A x_n + h Σ_{j=1}^{s} a_{ij} φ(X_{n,j}, t_n + c_j h),   i = 1, ..., s.     (26)

Here, X_{n,i} denotes the intermediate stage vectors and c_i for i = 1, ..., s the nodes of the underlying quadrature formula, while (a_{ij})_{i,j=1,...,s} is the coefficient matrix. The numerical solution after one step is given by the last stage vector, x_{n+1} := X_{n,s}. Efficient methods for solving the nonlinear system (26) by means of simplified Newton iterations and a transformation of the coefficient matrix are discussed in [13]. In the DAE context, collocation methods of type Radau IIa [7] have proved to possess particularly favorable properties. For ODEs and DAEs of index 1, the convergence order of these methods is 2s − 1, while for higher index DAEs, an order reduction occurs [15]. In case of the equations of motion in the GGL formulation (17), a scaled norm like in the BDF case is required in the simplified Newton iteration and in the error estimation.

In practice, the 5th-order Radau IIa method with s = 3 stages has stood the test as versatile and robust integration scheme for constrained mechanical systems (see also the hints on resources for numerical software below). An extension to a variable order method with s = 3, 5, 7 stages and corresponding order 5, 9, 13 is presented in [14].

Both the BDFs and the implicit Runge–Kutta methods require a formulation of the equations of motion as first-order system, which obviously increases the size of the linear systems within Newton's method. For an efficient implementation, it is crucial to apply block Gaussian elimination to reduce the dimension in such a way that only a system similar to (12) with two extra Jacobian blocks of the right-hand side vector f has to be solved. When comparing these implicit DAE integrators with explicit integration schemes for the formulation of index 1, stabilized by the projection scheme (20), and with half-explicit methods, the performance depends on several parameters such as problem size, smoothness, and, most of all, stiffness.

The adjective "stiff" typically characterizes an ODE whose eigenvalues have strongly negative real parts. However, numerical stiffness may also arise in case of second-order systems with large eigenvalues on or close to the imaginary axis. If such high frequencies are viewed as a parasitic effect which perturbs a slowly varying smooth solution, implicit time integration methods with adequate numerical dissipation are an option and usually superior to explicit methods. For a mechanical system, this form of numerical stiffness is directly associated with large stiffness forces, and thus the notion of a stiff mechanical system has a twofold meaning. If the high-frequency information carries physical significance and needs to be resolved, even implicit methods are compelled to take tiny stepsizes. Most often, however, it suffices to track a smooth motion where the high-frequency part represents a singular perturbation [27]. In case of a stiff mechanical system with high frequencies, the order of a BDF code should be restricted to k = 2 due to the loss of A-stability for higher order. L-stable methods such as the Radau IIa schemes, however, are successfully applied in such situations (see [19] for an elaborate theory).
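To make the drift-off discussion and the projection cure (20) concrete, the following sketch (a hypothetical planar pendulum with M = I, added for illustration) integrates the index-1 formulation (15) with the explicit Euler method, once plainly and once with the projection applied after every step:

```python
import numpy as np

ell, grav, h, steps = 1.0, 9.81, 1e-3, 2000

def g_res(q):
    # Position invariant g(q) = (q.q - ell^2)/2, with Jacobian G(q) = q^T.
    return 0.5 * (q @ q - ell * ell)

def euler_step(q, v):
    # Index-1 formulation (15): solve the saddle point system (12) for vdot.
    # Here M = I, f = (0, -grav), and kappa(q, v) = |v|^2 for this constraint.
    K = np.array([[1.0, 0.0, q[0]],
                  [0.0, 1.0, q[1]],
                  [q[0], q[1], 0.0]])
    vdot = np.linalg.solve(K, np.array([0.0, -grav, -(v @ v)]))[:2]
    return q + h * v, v + h * vdot  # explicit Euler

q0 = np.array([ell, 0.0])  # consistent initial values (14):
v0 = np.array([0.0, 0.0])  # g(q0) = 0 and G(q0) v0 = 0

# Plain integration: the invariant g drifts away from zero, cf. (19).
q, v = q0.copy(), v0.copy()
for _ in range(steps):
    q, v = euler_step(q, v)
drift = abs(g_res(q))

# With the projection (20) after each step (M = I: the position projection
# is radial, the velocity projection removes the normal component of v).
qp, vp = q0.copy(), v0.copy()
for _ in range(steps):
    qp, vp = euler_step(qp, vp)
    qp *= ell / np.linalg.norm(qp)      # (20a): closest point with g = 0
    vp -= qp * (qp @ vp) / (qp @ qp)    # (20b): enforce G(q) v = 0
residual = abs(g_res(qp))
```

In this sketch the projected run keeps |g| at rounding level while the unprojected run drifts visibly, which is why projection is typically paired with explicit integrators.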


Generalized α-Method
The generalized α-method [8] represents the method of choice for time-dependent solid mechanics applications with deformable bodies discretized by the finite element method (FEM). Since this field of structural dynamics and the field of multibody systems are more and more growing together, extensions of the α-method for constrained mechanical systems have recently been introduced. While the algorithms of [17, 20] are based on the GGL formulation (17), the method by Arnold and Brüls [1] discretizes the equations of motion (5) directly in the second-order formulation. In brief, the latter algorithm uses discrete values q_{n+1}, q̇_{n+1}, q̈_{n+1}, λ_{n+1} that satisfy the dynamic equations (5) and auxiliary variables a_n for the accelerations:

(1 − α_m) a_{n+1} + α_m a_n = (1 − α_f) q̈_{n+1} + α_f q̈_n.     (27)

These are then integrated via

q_{n+1} = q_n + h q̇_n + h² (1/2 − β) a_n + h² β a_{n+1},
q̇_{n+1} = q̇_n + h (1 − γ) a_n + h γ a_{n+1}.     (28)

The free coefficients α_f, α_m, β, γ determine the method.

Of particular interest is the behavior of this scheme for a stiff mechanical system where the high frequencies need not be resolved. An attractive feature in this context is controllable numerical dissipation, which is mostly expressed in terms of the spectral radius ρ_∞ at infinity. More specifically, it holds ρ_∞ ∈ [0, 1], where ρ_∞ = 0 represents asymptotic annihilation of the high-frequency response, i.e., the equivalent of L-stability. On the other hand, ρ_∞ = 1 stands for the case of no algorithmic dissipation. A-stability, also called unconditional stability, is achieved for the parameters

α_f = ρ_∞/(ρ_∞ + 1),   α_m = (2ρ_∞ − 1)/(ρ_∞ + 1),   β = (1/4)(1 − α_m + α_f)².     (29)

The choice γ = 1/2 − α_m + α_f guarantees second-order convergence.

Resources for Numerical Software
A good starting point for exploring the available numerical software is the initial value problem (IVP) test set [1] (see Table 1), which contains several examples of constrained and unconstrained mechanical systems along with various results and comparisons for a wide selection of integration codes. The BDF code DASSL by L. Petzold can be obtained from the IVP site but also from netlib [2]. The more recent version DASPK with extensions for large-scale systems is available at [3]. The implicit Runge–Kutta codes RADAU5 and RADAU by E. Hairer and G. Wanner can be downloaded from [4], where also the half-explicit Runge–Kutta code PHEM56 by A. Murua is provided. The extrapolation code MEXAX by Ch. Lubich and coworkers is in the library [5]. Finally, the half-explicit Runge–Kutta code HEDOP5 by M. Arnold and a projection method by the author of this contribution are contained in the library MBSPACK [26].

Mechanical Systems, Table 1: Internet resources for downloading numerical software
1: pitagora.dm.uniba.it/testset/
2: www.netlib.org/ode/
3: www.cs.ucsb.edu/cse/software.html
4: www.unige.ch/hairer/software.html
5: www.zib.eu/Numerik/numsoft/CodeLib/codes/mexax/

References
1. Arnold, M., Brüls, O.: Convergence of the generalized-α scheme for constrained mechanical systems. Multibody Syst. Dyn. 18, 185–202 (2007)
2. Arnold, M., Murua, A.: Non-stiff integrators for differential-algebraic systems of index 2. Numer. Algorithms 19, 25–41 (1998)
3. Ascher, U., Chin, H., Petzold, L., Reich, S.: Stabilization of constrained mechanical systems with DAEs and invariant manifolds. J. Mech. Struct. Mach. 23, 135–158
4. Baumgarte, J.: Stabilization of constraints and integrals of motion in dynamical systems. Comput. Methods Appl. Mech. 1, 1–16 (1972)
5. Betsch, P., Leyendecker, S.: The discrete null space method for the energy consistent integration of constrained mechanical systems. Int. J. Numer. Methods Eng. 67, 499–552 (2006)
6. Brenan, K.E., Campbell, S.L., Petzold, L.R.: The Numerical Solution of Initial Value Problems in Ordinary Differential-Algebraic Equations. SIAM, Philadelphia (1996)


7. Butcher, J.C.: Integration processes based on Radau quadrature formulas. Math. Comput. 18, 233–244 (1964)
8. Chung, J., Hulbert, G.: A time integration algorithm for structural dynamics with improved numerical dissipation. ASME J. Appl. Mech. 60, 371–375 (1993)
9. Eich-Soellner, E., Führer, C.: Numerical Methods in Multibody Dynamics. Teubner, Stuttgart (1998)
10. Führer, C., Leimkuhler, B.: Numerical solution of differential-algebraic equations for constrained mechanical motion. Numer. Math. 59, 55–69 (1991)
11. Gear, C.: Numerical Initial Value Problems in Ordinary Differential Equations. Prentice-Hall, Englewood Cliffs (1971)
12. Gear, C., Gupta, G., Leimkuhler, B.: Automatic integration of the Euler–Lagrange equations with constraints. J. Comput. Appl. Math. 12 & 13, 77–90 (1985)
13. Hairer, E., Wanner, G.: Solving Ordinary Differential Equations II: Stiff and Differential-Algebraic Problems, 2nd edn. Springer, Berlin (1996)
14. Hairer, E., Wanner, G.: Stiff differential equations solved by Radau methods. J. Comput. Appl. Math. 111, 93–111 (1999)
15. Hairer, E., Lubich, C., Roche, M.: The Numerical Solution of Differential-Algebraic Equations by Runge–Kutta Methods. Lecture Notes in Mathematics, vol. 1409. Springer, Heidelberg (1989)
16. Haug, E.: Computer-Aided Kinematics and Dynamics of Mechanical Systems, vol. I. Allyn and Bacon, Boston (1989)
17. Jay, L., Negrut, D.: Extensions of the HHT-α method to differential-algebraic equations in mechanics. ETNA 26, 190–208 (2007)
18. Lubich, C.: h² extrapolation methods for differential-algebraic equations of index 2. Impact Comput. Sci. Eng. 1, 260–268 (1989)
19. Lubich, C.: Integration of stiff mechanical systems by Runge–Kutta methods. ZAMP 44, 1022–1053 (1993)
20. Lunk, C., Simeon, B.: Solving constrained mechanical systems by the family of Newmark and α-methods. ZAMM 86, 772–784 (2007)
21. Potra, F., Rheinboldt, W.: On the numerical integration of Euler–Lagrange equations via tangent space parametrization. Mech. Struct. Mach. 19(1), 1–18 (1991)
22. Rabier, P., Rheinboldt, W.: Nonholonomic Motion of Rigid Mechanical Systems from a DAE Point of View. SIAM, Philadelphia (2000)
23. Roberson, R., Schwertassek, R.: Dynamics of Multibody Systems. Springer, Heidelberg (1988)
24. Schwerin, R.: Multibody System Simulation. Springer, Berlin (1999)
25. Shabana, A.: Dynamics of Multibody Systems. Cambridge University Press, Cambridge/New York (1998)
26. Simeon, B.: MBSPACK – numerical integration software for constrained mechanical motion. Surv. Math. Ind. 5, 169–202 (1995)
27. Simeon, B.: Order reduction of stiff solvers at elastic multibody systems. Appl. Numer. Math. 28, 459–475 (1998)
28. Wehage, R.A., Haug, E.J.: Generalized coordinate partitioning for dimension reduction in analysis of constrained dynamic systems. J. Mech. Des. 104, 247–255 (1982)
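For readers who want to experiment, the generalized-α update of Eqs. (27)–(29) above can be sketched in a few lines of Python. The scalar test equation q̈ = −ω²q and all numerical values below are illustrative assumptions, not part of this entry; for this linear problem, Eq. (27) with q̈ₙ₊₁ = −ω²qₙ₊₁ can be solved directly for the new auxiliary acceleration.

```python
import math

def generalized_alpha_oscillator(omega, q0, v0, h, n_steps, rho_inf=0.9):
    """Integrate the scalar test problem qdd = -omega**2 * q with the
    generalized-alpha scheme of Eqs. (27)-(29); illustrative sketch only."""
    # A-stable parameter family (29) plus the second-order choice of gamma
    alpha_f = rho_inf / (1.0 + rho_inf)
    alpha_m = (2.0 * rho_inf - 1.0) / (1.0 + rho_inf)
    beta = 0.25 * (1.0 - alpha_m + alpha_f) ** 2
    gamma = 0.5 - alpha_m + alpha_f
    q, v = q0, v0
    qdd = -omega**2 * q   # acceleration from the equation of motion
    a = qdd               # auxiliary acceleration variable of (27), a_0 = qdd_0
    for _ in range(n_steps):
        # position update (28) with the a_{n+1} contribution separated out
        q_pred = q + h * v + h * h * (0.5 - beta) * a
        # substitute qdd_{n+1} = -omega^2 q_{n+1} into (27), solve for a_{n+1}
        denom = (1.0 - alpha_m) + (1.0 - alpha_f) * beta * (omega * h) ** 2
        a_new = (-(1.0 - alpha_f) * omega**2 * q_pred
                 + alpha_f * qdd - alpha_m * a) / denom
        q = q_pred + h * h * beta * a_new
        v = v + h * (1.0 - gamma) * a + h * gamma * a_new
        qdd = -omega**2 * q
        a = a_new
    return q, v
```

With ω = 2π, h = 10⁻³, and 1000 steps, the scheme recovers one full period of the oscillation closely, consistent with the second-order convergence claimed for this choice of γ, while ρ∞ < 1 introduces only mild dissipation at this resolved frequency.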

Medical Applications in Bone Remodeling, Wound Healing, Tumor Growth, and Cardiovascular Systems

Yusheng Feng and Rakesh Ranjan
NSF/CREST Center for Simulation, Visualization and Real-Time Prediction, The University of Texas at San Antonio, San Antonio, TX, USA

Synonyms
Bone remodeling; Cardiovascular flow; Computational bioengineering; Continuum model; Mixture theory; Predictive medicine; Tumor growth; Wound healing

Description
Predictive medicine is emerging both as a research field and as a potential medical tool for designing optimal treatment options. It can also advance understanding of biological and biomedical processes and provide patient-specific prognosis. In order to characterize macro-level tissue behavior, mixture theory can be introduced for modeling both hard and soft tissues. In continuum mixture theory, an arbitrary point in a continuous medium can be occupied simultaneously by many different constituents, differentiated only through their volume fractions. This mathematical representation of tissues permits direct reconstruction of patient-specific geometry from medical imaging, inclusion of species from different scales as long as they can be characterized by either density or volume fraction functions, and explicit consideration of interactions among the species included in the mixture. Furthermore, the mathematical models based on the notion of mixture can be derived from first principles (conservation laws and the second law of thermodynamics). The applications considered here include bone remodeling, wound healing, and tumor growth. The cardiovascular system can also be included if soft tissues such as the heart and vessels are treated as separate species distinct from the fluid (blood).


Bone remodeling is a natural biological process during the course of maturity or after injuries, which can be characterized by a reconfiguration of the density of the bone tissue due to mechanical forces or other biological stimuli. Wound healing (or cicatrization), on the other hand, mainly involves skin or other soft organ tissues that repair themselves after the protective layer and/or underlying tissues are broken and damaged. In particular, wound healing in fasciated muscle occurs due to the presence of traction forces that accelerate the healing process. Both bone remodeling and wound healing can be investigated under the general framework of continuum mixture theory at the tissue level. Another important application is tumor growth modeling, which is crucial in cancer biology, treatment planning, and outcome prediction. The mixture theory framework provides a convenient vehicle for simulating growth (or shrinkage) phenomena under various biological conditions.

Continuum Mixture Theory
There are several versions of mixture theories [1]. In general, mixture theory is a comprehensive framework that allows multiple species to be included under an abstract notion of continuum. In this framework, the biological tissue can be considered as a multiphasic system with different species including solid tissue, body fluids, cells, extracellular matrix (ECM), nutrients, etc. Each of the species (or constituents) is denoted by $\alpha$ $(\alpha = 1, 2, \ldots, \nu)$, where $\nu$ is the number of species in the mixture. The nominal densities of each constituent are denoted by $\rho^\alpha$, and the true densities are denoted by $\rho^{\alpha R}$. To introduce the volume fraction concept, a domain occupying the control space $B_S$ is defined with the boundary $\partial B_S$, in which all the constituents $\alpha$ occupy the volume fractions $n^\alpha$, which satisfy the constraint

$$\sum_{\alpha=1}^{\nu} n^\alpha(\mathbf{x},t) = \sum_{\alpha=1}^{\nu} \frac{\rho^\alpha}{\rho^{\alpha R}} = 1, \qquad (1)$$

where $\mathbf{x}$ is the position vector of the actual placement and $t$ denotes the time.

As noted in the introductory entry, two frames of reference are used to describe the governing principles of continuum mechanics. The Lagrangian frame of reference is often used in solid mechanics, while the Eulerian frame of reference is used in fluid mechanics. The Lagrangian description is usually suitable for establishing mathematical models of stress-induced growth such as bone remodeling and wound healing (e.g., [2]), while the Eulerian description is used for developing mass transfer-driven tumor growth models [3–6], with a few exceptions when tumors undergo large deformations [7]. To develop mathematical models for each application, the governing equations are provided by the conservation laws, and the constitutive relations are usually developed through empirical relationships subject to constraints such as the invariance condition, consistency with thermodynamics, etc. Specifically, the governing equations can be obtained from conservation of mass, momentum, and energy for each species as well as for the mixture. Moreover, the conservation of energy is often omitted from the governing equations under the isothermal assumption, unless bioheat transfer is of interest (e.g., in thermotherapies). When the free energy of the system is given as a function of dependent field variables such as strain, temperature, etc., the second law of thermodynamics (the Clausius-Duhem inequality) provides a means for determining the forms of some constitutive equations via the well-known method of Coleman and Noll [8].

Bone Remodeling and Wound Healing
Considering the conservation of mass for each species $\alpha$ in a control volume, the mass production and the fluxes across the boundary of the control volume are required to be equal:

$$\frac{\partial \rho^\alpha}{\partial t} + \nabla\cdot\left(\rho^\alpha \mathbf{v}_\alpha\right) = \hat\rho^\alpha. \qquad (2)$$

In Eq. (2), the velocity of the constituent is denoted by $\mathbf{v}_\alpha$, and the mass supplies between the phases are denoted by $\hat\rho^\alpha$. From a mechanical point of view, the processes of bone remodeling and wound healing are mainly induced by traction forces.
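As an illustrative numerical aside (not part of the original formulation), the balance law (2) can be exercised in one space dimension with a first-order upwind scheme. The constant velocity, constant mass supply, and grid parameters below are hypothetical; the point is that for a spatially uniform density the advective flux cancels and the density grows exactly by the integrated mass supply, a simple consistency check on the discretization.

```python
import numpy as np

def advect_with_supply(rho0, v, rho_hat, dx, dt, n_steps):
    """First-order upwind discretization of Eq. (2) in 1-D with periodic
    boundaries: d(rho)/dt + d(rho*v)/dx = rho_hat, for constant v > 0 and
    constant mass supply rho_hat (illustrative sketch only)."""
    rho = np.asarray(rho0, dtype=float).copy()
    for _ in range(n_steps):
        flux = rho * v                                   # rho^alpha * v_alpha
        rho = rho - dt / dx * (flux - np.roll(flux, 1)) + dt * rho_hat
    return rho
```

For a uniform initial state the result stays uniform and equals rho0 + rho_hat·t exactly, while nonuniform profiles are transported with the usual upwind damping (stable for v·dt/dx ≤ 1).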


To develop the mass conservation equation, we may include all the necessary species of interest. For simplicity, however, we choose a triphasic system comprised of solid, liquid, and nutrients to illustrate the modeling process [2]. The mass exchange terms are subject to the constraint

$$\sum_{\alpha=1}^{\nu} \hat\rho^\alpha = 0 \quad\text{or}\quad \hat\rho^S + \hat\rho^N + \hat\rho^L = 0. \qquad (3)$$

Moreover, if the liquid phase is not involved in the mass transition, then

$$\hat\rho^S = -\hat\rho^N \quad\text{and}\quad \hat\rho^L = 0. \qquad (4)$$

Next, the momentum of the constituent $\alpha$ is defined by

$$\mathbf{m}_\alpha = \int_{B_\alpha} \rho^\alpha \mathbf{v}_\alpha \, dv. \qquad (5)$$

By denoting the total change of linear momentum in $B_\alpha$ by $\mathbf{m}_\alpha$ and the interaction of the constituents $\alpha$ by $\hat{\mathbf{p}}^\alpha$, the standard momentum equation (Cauchy equation of motion) for each constituent becomes

$$\nabla\cdot\mathbf{T}^\alpha + \rho^\alpha\left(\mathbf{b} - \mathbf{a}_\alpha\right) + \hat{\mathbf{p}}^\alpha - \hat\rho^\alpha \mathbf{v}_\alpha = 0, \qquad (6)$$

where the expression $\hat\rho^\alpha \mathbf{v}_\alpha$ represents the exchange of linear momentum through the density supply $\hat\rho^\alpha$. The term $\mathbf{T}^\alpha$ denotes the partial Cauchy stress tensor, and $\mathbf{b}$ specifies the volume force. In addition, the terms $\hat{\mathbf{p}}^\alpha$, where $\alpha = S, L, N$, are required to satisfy the constraint condition

$$\hat{\mathbf{p}}^S + \hat{\mathbf{p}}^L + \hat{\mathbf{p}}^N = 0. \qquad (7)$$

In the case of either bone remodeling or wound healing, the velocity field is nearly steady state. Thus, the acceleration can be neglected by setting $\mathbf{a}_\alpha = 0$. The resulting system of equations can then be written as

$$\nabla\cdot\mathbf{T}^\alpha + \rho^\alpha \mathbf{b} + \hat{\mathbf{p}}^\alpha = \hat\rho^\alpha \mathbf{v}_\alpha. \qquad (8)$$

The second law of thermodynamics (entropy inequality) provides expressions for the stresses in the solid and fluid phases that are dependent on the displacements and the seepage velocity, respectively. The seepage velocity is a relative velocity between the liquid and solid phases, which is often obtained from explicit Darcy-type expressions for flow through a porous medium (solid phase). Various types of material behavior can be described in terms of the principal invariants of the structural tensor $\mathbf{M}$ and the right Cauchy-Green tensor $\mathbf{C}_S$, where

$$\mathbf{M} = \mathbf{A}\otimes\mathbf{A} \quad\text{and}\quad \mathbf{C}_S = \mathbf{F}_S^{T}\mathbf{F}_S, \qquad (9)$$

and $\mathbf{A}$ is the preferred direction inside the material and $\mathbf{F}_S$ is the deformation gradient for a solid undergoing large deformations. The expressions for the stress in the solid are dependent on the deformation gradient and consequently the displacements of the solid. Summation of the momentum conservation equations provides the equation for the solid displacements. The mass conservation equations, with incorporation of the saturation condition, provide the equation for the interstitial pressure. In addition, the mass conservation equations for each species provide the equations for the evolution of the volume fractions.

Assuming the fluid phase ($F$) is comprised of the liquid ($L$) and nutrient ($N$) phases ($F = L + N$), we obtain

$$\nabla\cdot\sum_{\alpha=S,L,N}\mathbf{T}^\alpha + \mathbf{b}\sum_{\alpha=S,L,N}\rho^\alpha + \sum_{\alpha=S,L,N}\hat{\mathbf{p}}^\alpha - \hat\rho^S\mathbf{v}_S - \hat\rho^F\mathbf{v}_F = 0. \qquad (10)$$

Since $\hat\rho^F = -\hat\rho^S$ and $\hat{\mathbf{p}}^S + \hat{\mathbf{p}}^L + \hat{\mathbf{p}}^N = 0$, we obtain

$$\nabla\cdot\sum_{\alpha=S,L,N}\mathbf{T}^\alpha + \mathbf{b}\sum_{\alpha=S,L,N}\rho^\alpha + \hat\rho^S\left(\mathbf{v}_F - \mathbf{v}_S\right) = 0. \qquad (11)$$

The definition of the seepage velocity $\mathbf{w}_{FS}$ provides the following equation:

$$\nabla\cdot\sum_{\alpha=S,F}\mathbf{T}^\alpha + \mathbf{b}\sum_{\alpha=S,F}\rho^\alpha + \hat\rho^S\,\mathbf{w}_{FS} = 0. \qquad (12)$$

The strong form of the pressure equation can be written as follows:

$$\nabla\cdot\left(n^F \mathbf{w}_{FS}\right) + \mathbf{I} : \mathbf{D}_S - \hat\rho^S\left(\frac{1}{\rho^{SR}} - \frac{1}{\rho^{NR}}\right) = 0. \qquad (13)$$

The strong form of the mass conservation equation for the solid phase is

$$\frac{D^S\!\left(n^S\right)}{Dt} + n^S\,\mathbf{I} : \mathbf{D}_S = \frac{\hat\rho^S}{\rho^{SR}}. \qquad (14)$$
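Read pointwise, Eq. (14) is a linear scalar ODE for the solid volume fraction once the dilatation rate $\mathbf{I} : \mathbf{D}_S = \operatorname{tr}\mathbf{D}_S$ and the normalized mass supply $\hat\rho^S/\rho^{SR}$ are prescribed. The following closed-form sketch uses hypothetical constant values chosen only to illustrate the qualitative behavior, not values from this entry.

```python
import math

def solid_fraction(n0, tr_D, supply, t):
    """Closed-form solution of Eq. (14) read pointwise as the scalar ODE
    dn/dt + n * tr_D = supply, with tr_D (dilatation rate) and
    supply (= rho_hat^S / rho^{SR}) held constant; illustrative only."""
    if tr_D == 0.0:
        return n0 + supply * t               # pure mass supply, no dilatation
    n_inf = supply / tr_D                    # steady-state volume fraction
    return n_inf + (n0 - n_inf) * math.exp(-tr_D * t)
```

With no dilatation the solid fraction grows linearly with the supply; with a positive dilatation rate it relaxes exponentially toward the steady state supply/tr_D, a toy analogue of density redistribution during remodeling.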


Finally, the balance of mass for the nutrient phase can be described as

$$\frac{D^S\!\left(n^N\right)}{Dt} - \frac{\hat\rho^N}{\rho^{NR}} + n^N\,\mathbf{I} : \mathbf{D}_S + \nabla\cdot\left(n^N \mathbf{w}_{FS}\right) = 0. \qquad (15)$$

In the above, $\mathbf{w}_{FS}$ is the seepage velocity, $\mathbf{D}_S$ denotes the symmetric part of the spatial velocity gradient, and $\frac{D^S(\cdot)}{Dt}$ denotes the total derivative of quantities with respect to the solid phase. The seepage velocity is obtained from

$$\mathbf{w}_{FS} = \frac{1}{S_F}\left[\,n^F\,\nabla\lambda - \hat{\mathbf{p}}^F\,\right]. \qquad (16)$$

Here, $S_F$ is the permeability tensor, $\lambda$ denotes the pressure, and $n^F$ is the volume fraction of the fluid. Equations (8)–(15) are required to be solved for the bone remodeling problem with the mixture theory. The primary variables to be solved for are $\{u_S, \lambda, n^S, n^N\}$: the solid displacements, the interstitial pressure, and the solid and nutrient volume fractions. One example of bone remodeling is the femur under traction loadings, which drive the process so that the bone density is redistributed. Based on the stress distribution, the bone usually becomes stiffer in the areas of higher stresses.

Importantly, the same set of equations can also be used to study the process of wound healing. It is obvious, however, that the initial and boundary conditions are specified differently. It is worth noting that traction forces inside the wound can facilitate the closure of the wound. From the computational point of view, the specification of solid and liquid volume fractions as well as pressure is required on all interior and exterior boundaries of the computational domain. The interior boundary (inner face) of the wound can be assumed to possess a sufficiently large quantity of the solid and liquid volume fractions, which biologically corresponds to sufficient nutrient supplies. On the other hand, the opening of the wound can be prescribed by natural boundary conditions with seepage velocity.

Modeling Tumor Growth
Attempts at developing computational mechanics models of tumor growth date back over half a century (see, e.g., [9]). Various models have been proposed based on ordinary differential equations (ODEs), e.g., [10–13], extensions of ODEs to partial differential equations [4, 14], or continuum mechanics-based descriptions that study both vascular and avascular tumor growth. Continuum mechanics-based formulations consider either a Lagrangian [2] or an Eulerian description of the medium [4]. Various considerations have been included, such as modifications of the ODEs to include the effects of therapies [12], studies of cell concentrations in capillaries during vascularization with and without inhibitors, multiscale modeling [15–19], and cell transport equations in the extracellular matrix (ECM) [5].

Modeling tumor growth can also be formulated under the framework of mixture theory with a multi-constituent description of the medium. It is convenient to use an Eulerian frame of reference. Other descriptions have considered the tumor phase with a diffuse interface [6]. Consider the volume fractions of cells, denoted by $\phi$, extracellular liquid ($l$), and extracellular matrix ($m$) [4]. The governing equations are derived from the conservation laws for each constituent of the individual phases. The cells can be further classified as tumor cells, epithelial cells, fibroblasts, etc., denoted by subscript $\alpha$. Similarly, we can distinguish different components of the extracellular matrix (ECM), namely collagen, elastin, fibronectin, vitronectin, etc. [20], denoted by subscript $\beta$. The ECM component velocities are assumed to be the same, based on the constrained sub-mixture assumption [5]. In the extracellular liquid, the concentrations of chemicals within the liquid are of interest. The above assumptions provide the mass conservation equations for the constituents ($\phi$, $m$, and $l$):

$$\frac{\partial \phi_\alpha}{\partial t} + \nabla\cdot\left(\phi_\alpha \mathbf{v}_{\phi_\alpha}\right) = \Gamma_{\phi_\alpha},$$
$$\frac{\partial m_\beta}{\partial t} + \nabla\cdot\left(m_\beta \mathbf{v}_m\right) = \Gamma_{m_\beta}. \qquad (17)$$

In the equations above, $\mathbf{v}_{\phi_\alpha}$ and $\mathbf{v}_m$ denote the velocities of the respective phases. Note that there is no subscript $\beta$ on $\mathbf{v}_m$ (constrained sub-mixture assumption). The mass balance for the chemical concentrations in the liquid phase is expressed as

$$\frac{\partial c}{\partial t} = \nabla\cdot\left(\mathbf{D}\nabla c\right) + G. \qquad (18)$$
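The reaction-diffusion balance (18) admits a very compact numerical sketch in one dimension. The constant diffusivity, constant source, and zero-flux boundaries below are illustrative assumptions rather than choices made in this entry.

```python
import numpy as np

def diffuse_react(c0, D, G, dx, dt, n_steps):
    """Explicit finite-difference update of Eq. (18) in 1-D:
    dc/dt = D * d2c/dx2 + G, with zero-flux (Neumann) boundaries and
    constant D and G; stable for D*dt/dx**2 <= 1/2 (illustrative only)."""
    c = np.asarray(c0, dtype=float).copy()
    for _ in range(n_steps):
        cp = np.pad(c, 1, mode="edge")       # ghost cells enforce zero flux
        lap = (cp[2:] - 2.0 * c + cp[:-2]) / dx**2
        c = c + dt * (D * lap + G)
    return c
```

A uniform initial concentration has zero Laplacian, so the solution simply accumulates the source, c(t) = c0 + G·t; nonuniform initial data are smoothed diffusively toward that ramp.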


Here, $\mathbf{D}$ denotes the effective diffusivity tensor in the mixture and $G$ contains the production/source terms and degradation/uptake terms relative to the entire mixture. The system of equations requires the velocities of each component to obtain the closure. The motion of the volume fraction of the cells is governed by the momentum equations

$$\phi\left(\frac{\partial \mathbf{v}_\phi}{\partial t} + \mathbf{v}_\phi\cdot\nabla\mathbf{v}_\phi\right) = \nabla\cdot\tilde{\mathbf{T}}_\phi + \phi\,\mathbf{b}_\phi + \tilde{\mathbf{m}}_\phi. \qquad (19)$$

Similar expressions hold for the extracellular matrix and the liquid phases. The presence of the saturation constraint requires one to introduce a Lagrange multiplier into the Clausius-Duhem inequality and provides expressions for the excess stress $\tilde{\mathbf{T}}$ and the excess interaction force $\tilde{\mathbf{m}}$. The Lagrange multiplier is classically identified with the interstitial pressure $P$. Body forces, $\mathbf{b}$, are ignored for the equations for the ECM, and the excess stress tensor in the extracellular liquid is assumed to be negligible in accordance with the low viscous forces in porous media flow studies. With these assumptions, we obtain the following equations:

$$-\phi_\alpha\nabla P + \nabla\cdot\left(\phi_\alpha\tilde{\mathbf{T}}_{\phi_\alpha}\right) + \tilde{\mathbf{m}}_{\phi_\alpha} + \phi_\alpha\mathbf{b}_{\phi_\alpha} = 0,$$
$$-m\,\nabla P + \nabla\cdot\left(m\,\tilde{\mathbf{T}}_m\right) + \tilde{\mathbf{m}}_m = 0,$$
$$-l\,\nabla P + \tilde{\mathbf{m}}_l = 0. \qquad (20)$$

The set of equations above provides the governing differential equations required to solve tumor growth problems. The primary variables to be solved for are $\{\phi_\alpha, m_\beta, P\}$. The governing equations can be solved with suitable boundary conditions of specified volume fractions of the cells, extracellular liquid, and pressures. Fluxes of these variables across the boundaries also need to be specified for a complete description of the problem.

Other approaches to modeling tumor growth involve tracking the moving interface of the growing tumor. Among them is the phase field approach. The derivation of the basic governing equations is given in Wise [21]. From the continuum advection-reaction-diffusion equations, the volume fractions of the tissue components obey

$$\frac{\partial\phi}{\partial t} + \nabla\cdot\left(\mathbf{u}\phi\right) = -\nabla\cdot\mathbf{J} + S. \qquad (21)$$

Here, $\phi$ denotes the volume fraction, $\mathbf{J}$ denotes the fluxes that account for the mechanical interactions among the different species, and the source term $S$ accounts for the inter-component mass exchange as well as gains due to proliferation and losses due to cell death. The above Eq. (21) is interpreted as the evolution equation for $\phi$, which characterizes the phase of the system. This approach modifies the equation for the interface to provide both convection of the interface and an appropriate diminishing of the total energy of the system. The free energy of a system of two immiscible fluids consists of mixing, bulk distortion, and anchoring energy. For simple two-phase flows, only mixing energy is retained, which results in a rather simple expression for the free energy:

$$F(\phi,\nabla\phi,T) = \int\left(\frac{\lambda}{2}\,|\nabla\phi|^2 + f(\phi,T)\right)dV = \int f_{tot}\,dV. \qquad (22)$$

Thus the total energy is minimized with the definition of the chemical potential, which implements an energy cost proportional to the interface width $\epsilon$. The following equation describes the evolution of the phase field parameter:

$$\frac{\partial\phi}{\partial t} + \mathbf{u}\cdot\nabla\phi = \nabla\cdot\gamma\,\nabla\!\left(\frac{\partial f_{tot}}{\partial\phi} - \nabla\cdot\frac{\partial f_{tot}}{\partial\nabla\phi}\right), \qquad (23)$$

where $f_{tot}$ is the total free energy of the system. The above Eq. (23) seeks to minimize the total free energy of the system with a relaxation time controlled by the mobility $\gamma$. With some further approximations, the partial differential equation governing the phase field variable is obtained as the Cahn-Hilliard equation:

$$\frac{\partial\phi}{\partial t} + \mathbf{u}\cdot\nabla\phi = \nabla\cdot\gamma\,\nabla G, \qquad (24)$$

where $G$ is the chemical potential and $\gamma$ is the mobility. The mobility determines the time scale of the Cahn-Hilliard diffusion and must be large enough to retain a constant interfacial thickness but small enough so that the convective terms are not overly damped. The mobility is defined as a function of the interface thickness as $\gamma = \chi\epsilon^2$. The chemical potential is provided by

$$G = \lambda\left[-\nabla^2\phi + \frac{\phi\left(\phi^2-1\right)}{\epsilon^2}\right]. \qquad (25)$$

The Cahn-Hilliard equation forces $\phi$ to take values of $-1$ or $+1$ except in a very thin region on the fluid-fluid interface. The introduction of the phase field interface allows the above equation to be written as a set of two second-order PDEs:

$$\frac{\partial\phi}{\partial t} + \mathbf{u}\cdot\nabla\phi = \nabla\cdot\frac{\gamma\lambda}{\epsilon^2}\,\nabla\psi, \qquad (26)$$
$$\psi = -\nabla\cdot\left(\epsilon^2\nabla\phi\right) + \left(\phi^2-1\right)\phi. \qquad (27)$$

The above equation is the simplest phase field model and is known as model A in the terminology of phase field transitions [3, 6, 22]. Phase field approaches have been applied for solving tumor growth, and multiphase descriptions of an evolving tumor have been obtained with each phase having its own interface and a characteristic front of the moving interface obtained with suitable approximations. When specific applications of the phase field approach to tumor growth are considered, the proliferative and nonproliferative cells are described by the phase field parameter $\phi$. The relevant equations in the context of tumor growth are provided by the following [6, 23]:

$$\frac{\partial\phi}{\partial t} = M\,\nabla^2\!\left(-\phi + \phi^3 - \epsilon^2\nabla^2\phi\right) + \alpha_p(T)\,\phi\,\Theta(\phi). \qquad (28)$$

Here, $M$ denotes the mobility coefficient, $T$ stands for the concentration of hypoxic cell-produced angiogenic factor, and $\Theta(\cdot)$ denotes the Heaviside function, which takes a value of 1 when its argument is positive. The proliferation rate is denoted by $\alpha_p(T)$, and as usual $\epsilon$ denotes the width of the capillary wall. The equation above is solved with the governing equation for the angiogenic factor $T$. The angiogenic factor diffuses randomly from the hypoxic tumor area where it is produced and obeys the following equation:

$$\frac{\partial T}{\partial t} = \nabla\cdot\left(D\nabla T\right) - \alpha_T\,T\,\Theta(\phi). \qquad (29)$$

In the equation above, $D$ denotes the diffusion coefficient of the factor in the tissue and $\alpha_T$ denotes the rate of consumption by the endothelial cells.

Modeling Cardiovascular Fluid Flow
Cardiovascular system modeling is another important field in predictive medicine. Computational modeling of blood flow requires solving, in the general sense, three-dimensional transient flow equations in deforming blood vessels [24]. The appropriate framework for problems of this type is the arbitrary Lagrangian-Eulerian (ALE) description of the continuous media, in which the fluid and solid domains are allowed to move to follow the distensible vessels and deforming fluid domain. Under the assumption of zero wall motion, the problem reduces to the Eulerian description of the fixed spatial domain. The strong form of the problem governing incompressible Newtonian fluid flow in a fixed domain consists of the Navier-Stokes equations and suitable initial and boundary conditions. Direct analytical solutions of these equations are not available for complex domains, and numerical methods must be used. The finite element method has been the most widely used numerical method for solving the equations governing blood flow [24]. In the Eulerian frame of reference, the conservation of mass is expressed as the continuity equation, and the conservation of momentum closes the system of equations with expressions of the stress tensor for the Newtonian fluid derived from the second law of thermodynamics. The flow of blood inside the arteries and the heart comprises some of the examples in biological systems. The governing equations for laminar fluid flow in cardiovascular structures are provided by the incompressible Navier-Stokes equations when the fluid flow is in the laminar regime with the assumption of constant viscosity [25]. We provide here the basic conservation laws for fluid flow expressed as the Navier-Stokes equations. Consider the flow of a nonsteady Newtonian fluid with density $\rho$ and viscosity $\mu$. Let $\Omega \subset \mathbb{R}^n$ and $t \in [0,T]$ be the spatial and temporal domains, respectively, where $n$ is the number of space dimensions. Let $\Gamma$ denote the boundary of $\Omega$. We consider the following velocity-pressure formulation of the Navier-Stokes equations governing unsteady incompressible flows.



$$\rho\left(\frac{\partial \mathbf{u}}{\partial t} + \mathbf{u}\cdot\nabla\mathbf{u} - \mathbf{f}\right) - \nabla\cdot\boldsymbol{\sigma} = 0 \quad \text{on } \Omega \;\; \forall t \in [0,T], \qquad (30)$$
$$\nabla\cdot\mathbf{u} = 0 \quad \text{on } \Omega \;\; \forall t \in [0,T], \qquad (31)$$

where $\rho$, $\mathbf{u}$, $\mathbf{f}$, and $\boldsymbol{\sigma}$ are the density, velocity, body force, and stress tensor, respectively. The stress tensor is written as a sum of its isotropic and deviatoric parts:

$$\boldsymbol{\sigma} = -p\mathbf{I} + \mathbf{T} = -p\mathbf{I} + 2\mu\,\boldsymbol{\varepsilon}(\mathbf{u}), \qquad (32)$$
$$\boldsymbol{\varepsilon}(\mathbf{u}) = \tfrac{1}{2}\left(\nabla\mathbf{u} + \nabla\mathbf{u}^{T}\right). \qquad (33)$$

Here, $\mathbf{I}$ is the identity tensor, $\mu = \rho\nu$ is the dynamic viscosity ($\nu$ being the kinematic viscosity), $p$ is the pressure, and $\mathbf{u}$ is the fluid velocity. The part of the boundary at which the velocity is assumed to be specified is denoted by $\Gamma_g$:

$$\mathbf{u} = \mathbf{g} \quad \text{on } \Gamma_g \;\; \forall t \in [0,T]. \qquad (34)$$

The natural boundary conditions associated with Eq. (30) are the conditions on the stress components, and these are the conditions assumed to be imposed on the remaining part of the boundary:

$$\mathbf{n}\cdot\boldsymbol{\sigma} = \mathbf{h} \quad \text{on } \Gamma_h \;\; \forall t \in [0,T], \qquad (35)$$

where $\Gamma_g$ and $\Gamma_h$ are complementary subsets of the boundary $\Gamma$, i.e., $\Gamma = \Gamma_g \cup \Gamma_h$. As the initial condition, a divergence-free velocity field, $\mathbf{u}_0(\mathbf{x})$, is imposed. To simulate realistic flow conditions, one needs to consider a pulsatile flow as the boundary condition at the inlet. The governing equations along with the boundary conditions characterize the flow through a cardiovascular system, which can be solved to obtain descriptions of the velocity profiles and pressure inside the domain.

In general, stabilized finite element methods have been used for solving incompressible flow inside both arteries and the heart [24]. Realistic simulations of blood flow have required three-dimensional patient-specific solid models of the pulmonary tree obtained by integrating combined magnetic resonance imaging (MRI) and computational fluid dynamics. An extension of MRI is magnetic resonance angiography (MRA), which has also been used for reconstructing the three-dimensional coarse mesh from MRA data. Three-dimensional subject-specific solid models of the pulmonary tree have been created from these MRA images as well [25]. The finite element mesh discretization of the problem is effective in capturing multiple levels of blood vessel branches. Both resting and exercise conditions of the patient have been simulated. Blood is usually assumed to behave as a Newtonian fluid with a viscosity of 0.04 poise (or 0.004 kg/(m s)). Three-dimensional transient simulations require meshing the domain with a significant number of degrees of freedom and considerable computing time [25].

References
1. Atkin, R.J., Craine, R.E.: Continuum theory of mixtures: basic theory and historical development. Q. J. Mech. Appl. Math. 29, 209–244 (1976)
2. Ricken, T., Schwarz, A., Bluhm, J.: A triphasic model of

transversely isotropic biological tissue with applications to stress and biologically induced growth. Comput. Mater. Sci. 39, 124–136 (2007)
3. Wise, S.M., Lowengrub, J.S., Frieboes, H.B., Cristini, V.: Three-dimensional diffuse-interface simulation of multispecies tumor growth – I: model and numerical method. J. Theor. Biol. 253, 523–543 (2008)
4. Ambrosi, D., Preziosi, L.: On the closure of mass balance models for tumor growth. Math. Models Methods Appl. Sci. 12, 737–754 (2002)
5. Preziosi, L.: Cancer Modeling and Simulation. Chapman and Hall/CRC Mathematical Biology and Medicine Series. Chapman and Hall/CRC, London (2003)
6. Oden, J.T., Hawkins, A., Prudhomme, S.: General diffuse-interface theories and an approach to predictive tumor growth modeling. Math. Models Methods Appl. Sci. 20(3), 477–517 (2010)
7. Ambrosi, D., Mollica, F.: On the mechanics of a growing tumor. Int. J. Eng. Sci. 40, 1297–1316 (2002)
8. Coleman, B.D., Noll, W.: The thermodynamics of elastic materials with heat conduction and viscosity. Arch. Ration. Mech. Anal. 13, 167–178 (1963)
9. Armitage, P., Doll, R.: The age distribution of cancer and a multi-stage theory of carcinogenesis. Br. J. Cancer 8, 1 (1954)
10. Ward, J.P., King, J.R.: Mathematical modelling of avascular tumor growth. J. Math. Appl. Med. Biol. 14, 39–69 (1997)
11. Grecu, D., Carstea, A.S., Grecu, A.T., Visinescu, A.: Mathematical modelling of tumor growth. Rom. Rep. Phys. 59, 447–455 (2007)
12. Tian, J.T., Stone, K., Wallin, T.J.: A simplified mathematical model of solid tumor regrowth with therapies. Discret. Contin. Dyn. Syst. 771–779 (2009)
13. Lloyd, B.A., Szczerba, D., Szekely, G.: A coupled finite element model of tumor growth and vascularization. Med. Image Comput. Comput. Assist. Interv. 2, 874–881 (2007)
14. Roose, T., Chapman, S.J., Maini, P.K.: Mathematical models of avascular tumor growth. SIAM Rev. 49(2), 179–208 (2007)
15. Macklin, P., McDougall, S., Anderson, A.R.A., Chaplain, M.A.J., Cristini, V., Lowengrub, J.: Multiscale modelling and nonlinear simulation of vascular tumor growth. J. Math. Biol. 58(4–5), 765–798 (2009)
16. Cristini, V., Lowengrub, J.: Multiscale Modeling of Cancer: An Integrated Experimental and Mathematical Modeling Approach. Cambridge University Press, Cambridge (2010)
17. Tan, Y., Hanin, L.: Handbook of Cancer Models with Applications. Series in Mathematical Biology and Medicine. World Scientific, Singapore (2008)
18. Deisboeck, T.S., Stamatakos, G.S.: Multiscale Cancer Modeling. Mathematical and Computational Biology Series. CRC/Taylor and Francis Group, Boca Raton (2011)
19. Wodarz, D., Komarova, N.L.: Computational Biology of Cancer: Lecture Notes and Mathematical Modeling. World Scientific, Hackensack (2005)
20. Astanin, S., Preziosi, L.: Multiphase models of tumor growth. In: Selected Topics in Cancer Modeling. Modeling and Simulation in Science, Engineering, and Technology, pp. 1–31. Birkhäuser, Boston (2008)
21. Wise, S.M., Lowengrub, J.S., Frieboes, H.B., Cristini, V.: Three-dimensional multispecies nonlinear tumor growth – I: model and numerical method. J. Theor. Biol. 253, 524–543 (2008)
22. Travasso, R.D.M., Castro, M., Oliveira, J.C.R.E.: The phase-field model in tumor growth. Philos. Mag. 91(1), 183–206 (2011)
23. Travasso, R.D.M., Poiré, E.C., Castro, M., Rodríguez, J.C.M.: Tumor angiogenesis and vascular patterning: a mathematical model. PLoS One 6(5), 1–9 (2011)
24. Taylor, C.A., Hughes, T.J.R., Zarins, C.K.: Finite element modeling of blood flow in arteries. Comput. Methods Appl. Mech. Eng. 158, 155–196 (1998)
25. Tang, B.T., Fonte, T.A., Chan, F.P., Tsao, P.S., Feinstein, J.A., Taylor, C.A.: Three-dimensional hemodynamics in the human pulmonary arteries under resting and exercise conditions. Ann. Biomed. Eng. 39(1), 347–358 (2011)

Medical Imaging

Charles L. Epstein
Departments of Mathematics and Radiology, University of Pennsylvania, Philadelphia, PA, USA

Synonyms
Magnetic resonance imaging; Radiological imaging; Ultrasound; X-ray computed tomography

Description
Medical imaging is a collection of technologies for noninvasively investigating the internal anatomy and physiology of living creatures. The prehistory of modern imaging includes various techniques for physical examination, which employ palpation and other external observations. Though the observations are indirect


and require considerable interpretation to relate to the internal state of being, each of these methods is based on the principle that some observable feature differs between healthy and sick subjects. While new technologies have vastly expanded the collection of available measurements, this basic principle remains the central tenet of medical imaging.

Modern medical imaging is divided into different modalities according to the physical principles underlying the measurement process. These differences in underlying physics lead to contrasts in the images that reflect different aspects of anatomy or physiology. The utility of a modality is largely governed by three interconnected considerations: contrast, resolution, and noise. Contrast refers to the physical or chemical distinctions that produce the image itself, and the magnitude of these differences in the reconstructed image. Resolution is usually thought of as the size of the smallest objects discernible in the image. Finally, noise is an inevitable consequence of real physical measurements. The ratio between the size of the signal and the size of the noise which contaminates it, called SNR, limits both the contrast and resolution attainable in any reconstructed image.

Technological advances in the nineteenth and twentieth centuries led to a proliferation of methods for medical imaging. The first such advances were the development of photographic imaging and the discovery of x-rays. These were the precursors of projection x-rays, which led, after the development of far more sensitive solid-state detectors, to x-ray tomography. Sonar, which was used by the military to detect submarines, was adapted, along with ideas from radar, to ultrasound imaging. In this modality, high-frequency acoustic energy is used as a probe of internal anatomy. Taking advantage of the Doppler effect, ultrasound can also be used to visualize blood flow; see [7]. Nuclear magnetic resonance, which depends on the subtle quantum mechanical phenomenon of spin, was originally developed as a spectroscopic technique in physical chemistry. With the advent of powerful, large, high-quality superconducting magnets, it became feasible to use this phenomenon to study both internal anatomy and physiology. In its simplest form, the contrast in MRI comes from the distribution of water molecules within the body. The richness of the spin-resonance phenomenon allows the use of other experimental protocols to modulate the contrast, probing
Nuclear magnetic resonance, which depends on the subtle quantum mechanical phenomenon of spin, was originally developed as a spectroscopic technique in physical chemistry. With the advent of powerful, large, high-quality superconducting magnets, it became feasible to use this phenomenon to study both internal anatomy and physiology. In its simplest form the contrast in MRI comes from the distribution of water molecules within the body. The richness of the spin-resonance phenomenon allows the use of other experimental protocols to modulate the contrast, probing many aspects of the chemical and physical environment.

The four imaging modalities in common clinical use are (1) x-ray computed tomography (x-ray CT), (2) ultrasound (US), (3) magnetic resonance imaging (MRI), and (4) emission tomography (PET and SPECT). In this article we only consider the details of x-ray CT and MRI. Good general references for the physical principles underlying these modalities are [4, 7]. There are also several experimental techniques, such as diffuse optical tomography (DOT) and electrical impedance tomography (EIT), which, largely due to intrinsic mathematical difficulties, have yet to produce useful diagnostic tools. A very promising recent development involves hybrid modalities, which combine a high-contrast (low-resolution) modality with a high-resolution (low-contrast) modality. For example, photo-acoustic imaging uses infrared light for excitation of acoustic vibrations and ultrasound for detection, see [1].

Each measurement process is described by a mathematical model, which in turn is used to "invert" the measurements and build an image of some aspect of the internal state of the organism. The success of an imaging modality relies upon having a stable and accurate inverse algorithm, usually based on an exact inversion formula, as well as the availability of sufficiently many measurements with an adequate signal-to-noise ratio. The quality of the reconstructed image is determined by complicated interactions among the size and quality of the data set, the available contrast, and the inversion method.

Medical Imaging

Medical Imaging, Fig. 1 A projection x-ray image (Image courtesy: Dr. Ari D. Goldberg)

X-Ray Computed Tomography

The first "modern" imaging method was the projection x-ray, introduced in the late 1800s by Roentgen. X-rays are a high-energy form of electromagnetic radiation, which pass relatively easily through the materials commonly found in living organisms. The interaction of x-rays with an object B is modeled by a function $\mu_B(x)$, called the attenuation coefficient. Here x is a location within B. If we imagine that an x-ray beam travels along a straight line, $\ell$, then Beer's law predicts that $I(s)$, the intensity of the beam, satisfies the differential equation:

$$\frac{dI}{ds} = -\mu_B(x(s))\, I(s). \qquad (1)$$

Here $x(s)$ is the point along the line, $\ell$, and $s$ is an arclength parametrization. If the intersection $\ell \cap B$ lies between $s_{\min}$ and $s_{\max}$, then Beer's law predicts that:

$$\log\frac{I_{\mathrm{out}}}{I_{\mathrm{in}}}(\ell) = -\int_{s_{\min}}^{s_{\max}} \mu_B(x(s))\, ds. \qquad (2)$$

Early x-ray images recorded the differential attenuation of the x-ray beams by different parts of the body, as differing densities on a photographic plate. In the photograph highly attenuating regions appear light, and less dense regions appear dark. An example is shown in Fig. 1. X-ray images display a good contrast between bone and soft tissues, though there is little contrast between different types of soft tissues. While the mathematical model embodied in Beer’s law is not needed to interpret projection x-ray images, it is an essential step to go from this simple modality to x-ray computed tomography.
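As a numerical sketch (not taken from the article), Beer's law (1) can be checked against its integrated form (2) for a synthetic, illustrative attenuation profile along a single ray:

```python
import numpy as np

# Synthetic attenuation profile mu(s) along one ray (values are illustrative,
# not calibrated to any tissue): a denser inclusion occupies 2 <= s <= 3.
def mu(s):
    return 0.2 + 0.8 * ((s >= 2.0) & (s <= 3.0))

s = np.linspace(0.0, 5.0, 5001)   # arclength samples along the line
ds = s[1] - s[0]

# Integrate dI/ds = -mu(s) I(s) with forward Euler, starting from I_in = 1.
I = np.empty_like(s)
I[0] = 1.0
for k in range(len(s) - 1):
    I[k + 1] = I[k] - mu(s[k]) * I[k] * ds

# Beer's law (2): log(I_out / I_in) equals minus the line integral of mu.
m = mu(s)
lhs = np.log(I[-1] / I[0])
rhs = -np.sum(0.5 * (m[1:] + m[:-1]) * ds)   # trapezoid rule for the integral
print(lhs, rhs)
```

Up to discretization error the two sides agree, which is exactly the statement that the measured log-attenuation is a line integral of the attenuation coefficient.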


X-ray CT was first developed by Alan Cormack in the early 1960s, though the lack of powerful computers made the idea impractical. It was rediscovered by Godfrey Hounsfield in the early 1970s. Both received the Nobel prize for this work in 1979, see [6]. Hounsfield was inspired by the recent development of solid-state x-ray detectors, which were more sensitive and had a much larger dynamic range than photographic film. This is essential for medical applications of x-ray CT, as the attenuation coefficients of different soft tissues in the human body differ by less than 3 %. By 1971, solid-state detectors and improved computers made x-ray tomography a practical possibility.

The mathematical model embodied in Beer's law leads to a simple description of the measurements available in an x-ray CT-machine. Assuming that we have a monochromatic source of x-rays, the measurement described in (2) is the Radon (in two dimensions), or x-ray transform (in three dimensions), of the attenuation coefficient, $\mu_B(x)$. For simplicity we consider the two-dimensional case. The collection, L, of oriented lines in $\mathbb{R}^2$ is conveniently parameterized by $S^1 \times \mathbb{R}^1$, with $(t, \theta)$ corresponding to the oriented line:

$$\ell_{t,\theta} = \{t(\cos\theta, \sin\theta) + s(-\sin\theta, \cos\theta) : s \in \mathbb{R}^1\}. \qquad (3)$$

The Radon transform can then be defined by:

$$R\mu_B(t,\theta) = \int_{\ell_{t,\theta}} \mu_B\bigl(t(\cos\theta,\sin\theta) + s(-\sin\theta,\cos\theta)\bigr)\, ds. \qquad (4)$$

The measurements made by an x-ray CT-machine are modeled as samples of $R\mu_B(t,\theta)$. The actual physical design of the machine determines exactly which samples are collected. The raw data collected by an x-ray CT-machine can be represented as a sinogram, as shown in Fig. 2. The reconstructed image is shown in Fig. 3.

Medical Imaging, Fig. 2 Radon transform data, shown as a sinogram, for the Shepp–Logan phantom. The horizontal axis is θ and the vertical axis t

Medical Imaging, Fig. 3 Filtered back-projection reconstruction of the Shepp–Logan phantom from the data in Fig. 2

The inversion formula for the Radon transform is called the filtered back-projection formula. It is derived by using the Central Slice theorem:

Theorem 1 (Central Slice Theorem) The Radon transform of $\mu$, $R\mu$, is related to its two-dimensional Fourier transform, $\mathcal{F}\mu$, by the one-dimensional Fourier transform of $R\mu$ in $t$:

$$\widehat{R\mu}(r,\theta) = \int_{-\infty}^{\infty} R\mu(t,\theta)\, e^{-itr}\, dt = \mathcal{F}\mu\bigl(r(\cos\theta,\sin\theta)\bigr). \qquad (5)$$

This theorem and the inversion formula for the two-dimensional Fourier transform show that we can reconstruct $\mu_B$ by first filtering the Radon transform:

$$\mathcal{G}R\mu_B(t,\theta) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \widehat{R\mu_B}(r,\theta)\, e^{irt}\, |r|\, dr, \qquad (6)$$


and then back-projecting, which is $R^*$, the adjoint of the Radon transform itself:

$$\mu_B(x,y) = \frac{1}{2\pi}\int_0^{\pi} \mathcal{G}R\mu_B\bigl(\langle(\cos\theta,\sin\theta),(x,y)\rangle, \theta\bigr)\, d\theta. \qquad (7)$$

The filtration step $R\mu_B \to \mathcal{G}R\mu_B$ is implemented using a fast Fourier transform. The multiplication by $|r|$ in the frequency domain makes it mildly ill-conditioned; nonetheless the high quality of the data available in a modern CT-scanner allows for stable reconstructions with a resolution of less than a millimeter. As a map from a function $g(t,\theta)$ on L to functions on $\mathbb{R}^2$, back-projection can be understood as half the average of $g$ on the set of lines that pass through $(x,y)$. A detailed discussion of x-ray CT can be found in [2].

Magnetic Resonance Imaging

Magnetic resonance imaging takes advantage of the fact that the protons in water molecules have both an intrinsic magnetic moment $\mu$ and an intrinsic angular momentum $J$, known as spin. As both of these quantum mechanical observables transform by the standard representation of SO(3) on $\mathbb{R}^3$, the Wigner–Eckart Theorem implies that there is a constant $\gamma$, called the gyromagnetic ratio, so that $\mu = \gamma J$. For a water proton $\gamma/2\pi \approx 42.5$ MHz/T. If an ensemble of water protons is placed in a static magnetic field $B_0$, then, after a short time, the protons become polarized, producing a bulk magnetization $M_0$. If $\rho(x)$ now represents the density of water, as a function of position, then thermodynamic considerations show that there is a constant $C$ for which:

$$M_0(x) \approx \frac{C\,\rho(x)\, B_0(x)}{T}. \qquad (8)$$

At room temperature ($T \approx 300$ K) this field is quite small and is, for all intents and purposes, not directly detectable.

A clinical MRI scanner consists of a large solenoidal magnet, which produces a strong, homogeneous background field, $B_0$, along with coaxial electromagnets, which produce gradient fields $G(t)\cdot x$, used for spatial encoding, and finally a radio-frequency (RF) coil, which produces an excitation field, $B_1(t)$, and is also used for signal detection. The total magnetic field is therefore: $B(x,t) = B_0(x) + G(t)\cdot x + B_1(t)$. The response of the bulk nuclear magnetization, $M$, to such a field is governed by Bloch's phenomenological equation:

$$\frac{dM}{dt}(x,t) = \gamma\, M(x,t)\times B(x,t) - \frac{1}{T_1(x)}\bigl(M_\parallel(x,t) - M_0(x)\bigr) - \frac{1}{T_2(x)}\, M_\perp(x,t). \qquad (9)$$

Here $M_\parallel$ is the component of $M$ parallel to $B_0$ and $M_\perp$ is the orthogonal component. The terms with coefficients $T_1$ and $T_2$ describe relaxation processes which tend to relax $M$ toward the equilibrium state $M_0$. The components $M_\parallel$ and $M_\perp$ relax at different rates, $T_1 > T_2$. In most medical applications their values lie in the range of 50 ms–2 s. The spatial dependence of $T_1$ and $T_2$ provides several possibilities for contrast in MR-images, sometimes called $T_1$- or $T_2$-weighted images. Note that (9) is a system of ordinary differential equations in time, $t$, and that the spatial position, $x$, appears as a pure parameter.

Ignoring the relaxation terms for the moment and assuming that $B$ is independent of time, we see that (9) predicts that the magnetization $M(x)$ will precess around $B_0(x)$ with angular velocity $\omega = \gamma \|B_0(x)\|$. This is the resonance phenomenon alluded to in the name "nuclear magnetic resonance." Faraday's Law predicts that such a precessing magnetization will produce an E.M.F. in a coil C with

$$\mathrm{EMF} \propto \frac{d}{dt}\int_{\Sigma} M(x,t)\cdot n(x)\, dS, \qquad (10)$$

for $\Sigma$ a surface spanning C. A simple calculation shows that the strength of the signal is proportional to $\omega^2$, which explains the utility of using a very strong background field. The noise magnitude in MR-measurements is proportional to $\omega$; hence the SNR is proportional to $\omega$ as well.

For the remainder of this discussion we assume that $B_0$ is a homogeneous field of the form $B_0 = (0, 0, b_0)$. The frequency $\omega_0 = \gamma b_0$ is called the Larmor frequency. The main magnet of a clinical scanner typically has a field strength between 1.5 and 7 T, which translates to Larmor frequencies between 64 and 300 MHz.
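Bloch's equation (9) is straightforward to integrate numerically at a single spatial position. The sketch below is illustrative only: it uses nondimensional parameters (not the clinical values quoted above) and a classical fourth-order Runge–Kutta step, and exhibits both the precession at ω = γ‖B0‖ and the T1/T2 relaxation:

```python
import numpy as np

# Illustrative, nondimensional parameters chosen for a readable demo.
gamma = 1.0
B = np.array([0.0, 0.0, 2 * np.pi])   # static field along z, so omega = 2*pi
T1, T2 = 2.0, 0.5                     # relaxation times, with T1 > T2
M0 = np.array([0.0, 0.0, 1.0])        # equilibrium magnetization

def bloch_rhs(M):
    # dM/dt = gamma M x B - (M_par - M0)/T1 - M_perp/T2, with B along z.
    M_par = np.array([0.0, 0.0, M[2]])
    M_perp = np.array([M[0], M[1], 0.0])
    return gamma * np.cross(M, B) - (M_par - M0) / T1 - M_perp / T2

# Tip the magnetization into the transverse plane and integrate with RK4.
M = np.array([1.0, 0.0, 0.0])
dt, steps = 1e-3, 5000                # integrate t in [0, 5]
for _ in range(steps):
    k1 = bloch_rhs(M)
    k2 = bloch_rhs(M + 0.5 * dt * k1)
    k3 = bloch_rhs(M + 0.5 * dt * k2)
    k4 = bloch_rhs(M + dt * k3)
    M = M + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6

# After t = 5 = 10*T2 the transverse component has decayed (precessing all the
# while), and Mz has relaxed most of the way toward its equilibrium value 1.
print(np.hypot(M[0], M[1]), M[2])
```

Because the z-component of M × B vanishes for a field along z, the longitudinal relaxation decouples and follows $M_z(t) = 1 - e^{-t/T_1}$ exactly, which gives a convenient check on the integrator.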


The RF-component of the field $B_1(t)$ is assumed to take the form $(a(t)\cos\omega_0 t,\; a(t)\sin\omega_0 t,\; 0)$, with $a(t)$ nonzero for a short period of time. As implied by the notation, the gradient fields are designed to have a linear spatial dependence, and therefore take the form:

$$G(t)\cdot x = \bigl(g_1(t)x_3 - g_3(t)x_1,\; g_2(t)x_3,\; g_1(t)x_1 + g_2(t)x_2 + g_3(t)x_3\bigr). \qquad (11)$$

Here $g(t) = (g_1(t), g_2(t), g_3(t))$ is a spatially independent vector describing the time course of the gradient field.

An application of meshfree methods leads to an overdetermined system in Eq. (22), which can be solved by the least-squares method, the QR decomposition, or the singular value decomposition (SVD). A few commonly used radial basis functions are

$$\text{Multiquadrics (MQ):}\quad g_I(x) = \bigl(r_I^2 + c^2\bigr)^{(2n-3)/2}, \qquad \text{Gaussian:}\quad g_I(x) = \exp\Bigl(-\frac{r_I^2}{c^2}\Bigr), \qquad (24)$$

where $r_I = \|x - x_I\|$ and $c$ is called the shape parameter that controls the localization of the function. For the MQ RBF in Eq. (24), the function is called reciprocal MQ RBF if n = 1, linear MQ RBF if n = 2, cubic MQ RBF if n = 3, and so forth.

Meshfree methods are also applied to problems with higher-order differentiation, such as the Kirchhoff–Love plate and shell problems [35], where meshfree approximation functions with higher-order continuities can be employed. Meshfree methods are shown to be effective for large deformation and fragment-impact problems (Fig. 3) [16, 25], where mesh entanglement in the finite element method can be greatly alleviated. Another popular application of meshfree methods is the modeling of evolving discontinuities, such as crack propagation simulations. The extended finite element method [49, 61], which combines the FEM approximation and the crack-tip enrichment functions [9, 11] under the partition of unity framework, has been the recent focal point in fracture modeling.
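To illustrate the radial basis functions of Eq. (24), the following sketch fits scattered 1-D data with the linear MQ RBF (n = 2), using more collocation points than centers so that the resulting overdetermined system is solved by least squares, as described above. The target function, point counts, and shape parameter are illustrative choices, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Source (center) points x_I and collocation points; taking more collocation
# points than centers yields an overdetermined linear system.
centers = np.linspace(0.0, 1.0, 12)
xc = np.sort(rng.uniform(0.0, 1.0, 60))
c = 0.15                                  # shape parameter (illustrative)

def mq(x, xI, c, n=2):
    # Multiquadric RBF of Eq. (24); n = 2 is the linear MQ, (r^2 + c^2)^(1/2).
    r = np.abs(x[:, None] - xI[None, :])
    return (r**2 + c**2) ** ((2 * n - 3) / 2)

f = lambda x: np.sin(2 * np.pi * x)       # toy target function

A = mq(xc, centers, c)                    # 60 x 12 collocation matrix
coef, *_ = np.linalg.lstsq(A, f(xc), rcond=None)   # least-squares solve

# Evaluate the RBF expansion on a fresh grid and measure the fit error.
xe = np.linspace(0.0, 1.0, 201)
err = np.max(np.abs(mq(xe, centers, c) @ coef - f(xe)))
print(err)
```

For a collocation solve of a boundary value problem the matrix rows would instead contain the differential operator applied to each RBF, but the least-squares structure of the overdetermined system is the same.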

Meshless and Meshfree Methods


Meshless and Meshfree Methods, Fig. 3 Experimental and numerical damage patterns on the exit face of a concrete plate penetrated by a bullet

References

1. Aluru, N.R.: A point collocation method based on reproducing kernel approximations. Int. J. Numer. Methods Eng. 47, 1083–1121 (2000)
2. Arroyo, M., Ortiz, M.: Local maximum-entropy approximation schemes: a seamless bridge between finite elements and meshfree methods. Int. J. Numer. Methods Eng. 65, 2167–2202 (2006)
3. Babuška, I.: The finite element method with Lagrangian multipliers. Numer. Math. 20, 179–192 (1973)
4. Babuška, I., Melenk, J.M.: The partition of unity method. Int. J. Numer. Methods Eng. 40, 727–758 (1997)
5. Beissel, S., Belytschko, T.: Nodal integration of the element-free Galerkin method. Comput. Methods Appl. Mech. Eng. 139, 49–74 (1996)
6. Belikov, V.V., Ivanov, V.D., Kontorovich, V.K., Korytnik, S.A., Semenov, A.Yu.: The non-Sibsonian interpolation: a new method of interpolation of the value of a function on an arbitrary set of points. Comput. Math. Math. Phys. 37, 9–15 (1997)
7. Belytschko, T., Krongauz, Y., Organ, D., Fleming, M., Krysl, P.: Meshless methods: an overview and recent developments. Comput. Methods Appl. Mech. Eng. 139, 3–47 (1996)
8. Belytschko, T., Lu, Y.Y., Gu, L.: Element-free Galerkin methods. Int. J. Numer. Methods Eng. 37, 229–256 (1994)
9. Belytschko, T., Lu, Y.Y., Gu, L.: Crack propagation by element-free Galerkin methods. Eng. Fract. Mech. 51, 295–315 (1995)
10. Belytschko, T., Organ, D., Krongauz, Y.: A coupled finite element-element-free Galerkin method. Comput. Mech. 17, 186–195 (1995)
11. Belytschko, T., Tabbara, M.: Dynamic fracture using element-free Galerkin methods. Int. J. Numer. Methods Eng. 39, 923–938 (1996)
12. Bonet, J., Kulasegaram, S.: Correction and stabilization of smooth particle hydrodynamics methods with applications in metal forming simulations. Int. J. Numer. Methods Eng. 47, 1189–1214 (2000)
13. Braun, J., Sambridge, M.: A numerical method for solving partial differential equations on highly irregular evolving grids. Nature 376, 655–660 (1995)

14. Brezzi, F.: On the existence, uniqueness and approximation of saddle-point problems arising from Lagrangian multipliers. Rev. Française Autom. Inf. Recherche Opérationnelle Sér. Rouge 8, 129–151 (1974)
15. Chen, J.S., Han, W., You, Y., Meng, X.: A reproducing kernel method with nodal interpolation property. Int. J. Numer. Methods Eng. 56, 935–960 (2003)
16. Chen, J.S., Pan, C., Wu, C.T., Liu, W.K.: Reproducing kernel particle methods for large deformation analysis of non-linear structures. Comput. Methods Appl. Mech. Eng. 139, 195–227 (1996)
17. Chen, J.S., Wang, D.: A constrained reproducing kernel particle formulation for shear deformable shell in Cartesian coordinates. Int. J. Numer. Methods Eng. 68, 151–172 (2006)
18. Chen, J.S., Wang, H.P.: New boundary condition treatments in meshfree computation of contact problems. Comput. Methods Appl. Mech. Eng. 187, 441–468 (2000)
19. Chen, J.S., Wu, C.T., Yoon, S., You, Y.: A stabilized conforming nodal integration for Galerkin meshfree methods. Int. J. Numer. Methods Eng. 50, 435–466 (2001)
20. Chen, J.S., Yoon, S., Wu, C.T.: Non-linear version of stabilized conforming nodal integration for Galerkin meshfree methods. Int. J. Numer. Methods Eng. 53, 2587–2615 (2002)
21. Duarte, C.A., Babuška, I., Oden, J.T.: Generalized finite element methods for three-dimensional structural mechanics problems. Comput. Struct. 77, 215–232 (2000)
22. Duarte, C.A., Oden, J.T.: An h-p adaptive method using clouds. Comput. Methods Appl. Mech. Eng. 139, 237–262 (1996)
23. Fernández-Méndez, S., Huerta, A.: Imposing essential boundary conditions in mesh-free methods. Comput. Methods Appl. Mech. Eng. 193, 1257–1275 (2004)
24. Gingold, R.A., Monaghan, J.J.: Smoothed particle hydrodynamics: theory and application to non-spherical stars. Mon. Not. R. Astron. Soc. 181, 375–389 (1977)
25. Guan, P.C., Chi, S.W., Chen, J.S., Slawson, T.R., Roth, M.J.: Semi-Lagrangian reproducing kernel particle method for fragment-impact problems. Int. J. Impact Eng. 38, 1033–1047 (2011)


26. Han, W., Meng, X.: Error analysis of reproducing kernel particle method. Comput. Methods Appl. Mech. Eng. 190, 6157–6181 (2001)
27. Hardy, R.L.: Multiquadric equations of topography and other irregular surfaces. J. Geophys. Res. 76, 1905–1915 (1971)
28. Hu, H.Y., Chen, J.S., Hu, W.: Weighted radial basis collocation method for boundary value problems. Int. J. Numer. Methods Eng. 69, 2736–2757 (2007)
29. Hu, H.Y., Chen, J.S., Hu, W.: Error analysis of collocation method based on reproducing kernel approximation. Numer. Methods Partial Differ. Equ. 27, 554–580 (2011)
30. Huerta, A., Fernández-Méndez, S.: Enrichment and coupling of the finite element and meshless methods. Int. J. Numer. Methods Eng. 48, 1615–1636 (2000)
31. Jaynes, E.T.: Information theory and statistical mechanics. Phys. Rev. 106, 620–630 (1957)
32. Kaljević, I., Saigal, S.: An improved element free Galerkin formulation. Int. J. Numer. Methods Eng. 40, 2953–2974 (1997)
33. Kansa, E.J.: Multiquadrics—a scattered data approximation scheme with applications to computational fluid-dynamics—I: surface approximations and partial derivative estimates. Comput. Math. Appl. 19, 127–145 (1990)
34. Kansa, E.J.: Multiquadrics—a scattered data approximation scheme with applications to computational fluid-dynamics—II: solutions to parabolic, hyperbolic and elliptic partial differential equations. Comput. Math. Appl. 19, 147–161 (1990)
35. Krysl, P., Belytschko, T.: Analysis of thin plates by the element-free Galerkin method. Comput. Mech. 17, 26–35 (1995)
36. Lancaster, P., Salkauskas, K.: Surfaces generated by moving least squares methods. Math. Comput. 37, 141–158 (1981)
37. Li, S., Liu, W.K.: Meshfree Particle Methods, 2nd edn. Springer, New York (2007)
38. Liszka, T., Orkisz, J.: The finite difference method at arbitrary irregular grids and its application in applied mechanics. Comput. Struct. 11, 83–95 (1980)
39. Liu, G.R.: Meshfree Methods: Moving Beyond the Finite Element Method, 2nd edn. CRC, Boca Raton (2010)
40. Liu, W.K., Chen, Y.: Wavelet and multiple scale reproducing kernel methods. Int. J. Numer. Methods Fluids 21, 901–931 (1995)
41. Liu, W.K., Chen, Y., Jun, S., Chen, J.S., Belytschko, T., Pan, C., Uras, R.A., Chang, C.T.: Overview and applications of the reproducing kernel particle methods. Arch. Comput. Methods Eng. 3, 3–80 (1996)
42. Liu, W.K., Han, W., Lu, H., Li, S., Cao, J.: Reproducing kernel element method. Part I: theoretical formulation. Comput. Methods Appl. Mech. Eng. 193, 933–951 (2004)
43. Liu, W.K., Jun, S., Li, S., Adee, J., Belytschko, T.: Reproducing kernel particle methods for structural dynamics. Int. J. Numer. Methods Eng. 38, 1655–1679 (1995)
44. Liu, W.K., Jun, S., Zhang, Y.F.: Reproducing kernel particle methods. Int. J. Numer. Methods Fluids 20, 1081–1106 (1995)
45. Liu, W.K., Li, S., Belytschko, T.: Moving least-square reproducing kernel methods (I): methodology and convergence. Comput. Methods Appl. Mech. Eng. 143, 113–154 (1997)

46. Lu, Y.Y., Belytschko, T., Gu, L.: A new implementation of the element free Galerkin method. Comput. Methods Appl. Mech. Eng. 113, 397–414 (1994)
47. Madych, W.R., Nelson, S.A.: Bounds on multivariate polynomials and exponential error estimates for multiquadric interpolation. J. Approx. Theory 70, 94–114 (1992)
48. Melenk, J.M., Babuška, I.: The partition of unity finite element method: basic theory and applications. Comput. Methods Appl. Mech. Eng. 139, 289–314 (1996)
49. Moës, N., Dolbow, J., Belytschko, T.: A finite element method for crack growth without remeshing. Int. J. Numer. Methods Eng. 46, 131–150 (1999)
50. Nayroles, B., Touzot, G., Villon, P.: Generalizing the finite element method: diffuse approximation and diffuse elements. Comput. Mech. 10, 307–318 (1992)
51. Nitsche, J.: Über ein Variationsprinzip zur Lösung von Dirichlet-Problemen bei Verwendung von Teilräumen, die keinen Randbedingungen unterworfen sind. Abh. Math. Semin. Univ. Hambg. 36, 9–15 (1971)
52. Oden, J.T., Duarte, C.A., Zienkiewicz, O.C.: A new cloud-based hp finite element method. Comput. Methods Appl. Mech. Eng. 153, 117–126 (1998)
53. Oñate, E., Idelsohn, S., Zienkiewicz, O.C., Taylor, R.L.: A finite point method in computational mechanics. Application to convective transport and fluid flow. Int. J. Numer. Methods Eng. 39, 3839–3866 (1996)
54. Perrone, N., Kao, R.: A general finite difference method for arbitrary meshes. Comput. Struct. 5, 45–57 (1975)
55. Puso, M.A., Chen, J.S., Zywicz, E., Elmer, W.: Meshfree and finite element nodal integration methods. Int. J. Numer. Methods Eng. 74, 416–446 (2008)
56. Randles, P.W., Libersky, L.D.: Smoothed particle hydrodynamics: some recent improvements and applications. Comput. Methods Appl. Mech. Eng. 139, 375–408 (1996)
57. Schaback, R., Wendland, H.: Using compactly supported radial basis functions to solve partial differential equations. Bound. Elem. Technol. XIII, 311–324 (1999)
58. Sibson, R.: A vector identity for the Dirichlet tessellation. Math. Proc. Camb. Phil. Soc. 87, 151–155 (1980)
59. Strouboulis, T., Copps, K., Babuška, I.: The generalized finite element method. Comput. Methods Appl. Mech. Eng. 190, 4081–4193 (2001)
60. Sukumar, N.: Construction of polygonal interpolants: a maximum entropy approach. Int. J. Numer. Methods Eng. 61, 2159–2181 (2004)
61. Sukumar, N., Moës, N., Moran, B., Belytschko, T.: Extended finite element method for three-dimensional crack modelling. Int. J. Numer. Methods Eng. 48, 1549–1570 (2000)
62. Sukumar, N., Moran, B., Belytschko, T.: The natural element method in solid mechanics. Int. J. Numer. Methods Eng. 43, 839–887 (1998)
63. Wang, D., Chen, J.S.: Locking-free stabilized conforming nodal integration for meshfree Mindlin-Reissner plate formulation. Comput. Methods Appl. Mech. Eng. 193, 1065–1083 (2004)
64. You, Y., Chen, J.S., Lu, H.: Filters, reproducing kernel, and adaptive meshfree method. Comput. Mech. 31, 316–326 (2003)


Metabolic Networks, Modeling


Michael C. Reed¹, Thomas Kurtz², and H. Frederik Nijhout³
¹Department of Mathematics, Duke University, Durham, NC, USA
²University of Wisconsin, Madison, WI, USA
³Duke University, Durham, NC, USA

Mathematics Subject Classification

34; 37; 60; 92

Synonyms

Biochemical network

Definition

Suppose that a system has m different chemicals, A₁, …, Aₘ, and define a complex to be an m-vector of nonnegative integers. A metabolic network is a directed graph, not necessarily connected, whose vertices are complexes. There is an edge from complex C to complex D if there exists a chemical reaction in which the chemicals in C with nonzero components are changed into the chemicals in D with nonzero components. The nonzero integer components represent how many molecules of each chemical are used or produced in the reaction. Metabolic networks are also called biochemical networks.

Description

Chemicals inside of cells are normally called substrates, and the quantity of interest is the concentration of the substrate, which could be measured as mass per unit volume or, more typically, number of molecules per unit volume. In Fig. 3, the substrates are indicated by rectangular boxes that contain their acronyms. A chemical reaction changes one or more substrates into other substrates, and the function that describes how the rate of this process depends on substrate concentrations and other variables is said to give the kinetics of the reaction. The simplest kind of kinetics is mass-action kinetics, in which a unimolecular reaction (one substrate), A → B with rate constant k, proceeds at a rate proportional to the concentration of the substrate, that is, k[A], and a bimolecular reaction, A + B → C with rate constant k, proceeds at a rate proportional to the product of the concentrations of the substrates, k[A][B], and so forth. Given a chemical reaction diagram, such as Fig. 1, the differential equations for the concentrations of the substrates simply state that the rate of change of each substrate concentration is equal to the sum of the rates of the reactions that make it minus the rates of the reactions that use it. A simple reaction diagram and corresponding differential equations are shown in Fig. 1.

Figure 2 shows the simplest reaction diagram for an enzymatic reaction in which an enzyme, E, binds to a substrate, S, to form a complex, C. The complex then dissociates into the product, P, and the enzyme, which can be used again. One can write down the four differential equations for the variables S, E, C, P, but they cannot be solved in closed form. It is very useful to have a closed-form formula for the overall rate of the reaction S → P because that formula can be compared to experiments and the constants can be determined. Such an approximate formula was derived by Leonor Michaelis and Maud Menten (see Fig. 2).

For the reaction diagram of Fig. 1, A ⇌ B + C (forward rate constant k₁, backward k₂) and C ⇌ D (forward k₃, backward k₄), mass-action kinetics gives:

$$\frac{d[A]}{dt} = k_2[B][C] - k_1[A]$$
$$\frac{d[B]}{dt} = -k_2[B][C] + k_1[A]$$
$$\frac{d[C]}{dt} = -k_2[B][C] + k_1[A] - k_3[C] + k_4[D]$$
$$\frac{d[D]}{dt} = k_3[C] - k_4[D]$$

Metabolic Networks, Modeling, Fig. 1 On the right are the differential equations corresponding to the reaction diagram if one assumes mass-action kinetics

For the enzymatic reaction of Fig. 2, S + E ⇌ C (rate constants k₁, k₋₁) followed by C → E + P (rate constant k₂), the Michaelis–Menten formula for the overall rate is:

$$V = \frac{k_2 E_{\mathrm{tot}} [S]}{K_m + [S]}$$

Metabolic Networks, Modeling, Fig. 2 A simple enzymatic reaction and the Michaelis–Menten formula
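The mass-action system of Fig. 1 can be integrated directly; the sketch below uses illustrative rate constants and initial concentrations (not values from the text) and checks the two conservation laws, [A] + [B] and [A] + [C] + [D], implied by the network structure:

```python
import numpy as np

# Illustrative rate constants and initial concentrations for the Fig. 1
# network: A <-> B + C (k1 forward, k2 back) and C <-> D (k3 forward, k4 back).
k1, k2, k3, k4 = 1.0, 0.5, 0.8, 0.3
y = np.array([2.0, 0.0, 0.0, 0.0])             # [A], [B], [C], [D]

def rhs(y):
    A, B, C, D = y
    return np.array([
        k2 * B * C - k1 * A,                    # d[A]/dt
        -k2 * B * C + k1 * A,                   # d[B]/dt
        -k2 * B * C + k1 * A - k3 * C + k4 * D, # d[C]/dt
        k3 * C - k4 * D,                        # d[D]/dt
    ])

dt, steps = 1e-3, 20000                         # integrate t in [0, 20], RK4
for _ in range(steps):
    s1 = rhs(y)
    s2 = rhs(y + 0.5 * dt * s1)
    s3 = rhs(y + 0.5 * dt * s2)
    s4 = rhs(y + dt * s3)
    y = y + dt * (s1 + 2 * s2 + 2 * s3 + s4) / 6

# The stoichiometry conserves [A] + [B] and [A] + [C] + [D] exactly.
print(y, y[0] + y[1], y[0] + y[2] + y[3])
```

Summing the right-hand sides confirms the conservation laws analytically: the first two rates cancel pairwise, as do the terms in the second sum.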


Metabolic Networks, Modeling, Fig. 3 Folate and methionine metabolism. The rectangular boxes represent substrates whose acronyms are in the boxes. All the pink boxes are different forms of folate. Each arrow represents a biochemical reaction. The acronyms for the enzymes that catalyze the reactions are in the blue ellipses. The TS and AICART reactions are important steps in pyrimidine and purine synthesis, respectively. The DNMT reaction methylates cytosines in DNA and is important for gene regulation

Here Etot is the total enzyme concentration, k₂ is indicated in Fig. 2, and Kₘ is the so-called Michaelis–Menten constant. The quantity k₂Etot is called the Vmax of the reaction because that is the maximum rate obtained as [S] → ∞. There is a substantial mathematical literature about when this approximation is a good one [33]. For further discussion of kinetics and references, see [24].

The biological goal is to understand how large biochemical systems that accomplish particular tasks work, that is, how the behavior of the whole system depends on the components and on small and large changes in inputs. So, for example, the folate cycle in Fig. 3 is central to cell division since it is involved in the production of purines and pyrimidines necessary for copying DNA. Methotrexate, a chemotherapeutic agent, binds to the enzyme DHFR and slows down cell division. Why? And how much methotrexate do you need to cut the rate of cell division in half? The enzyme DNMT catalyzes the methylation of DNA. How does the rate of the DNMT reaction depend on the folate status of the individual, that is, the total concentration of the six folate substrates?
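The saturating behavior of the Michaelis–Menten rate from Fig. 2 is easy to verify numerically; the parameter values below are illustrative, not taken from any database:

```python
# Michaelis-Menten rate from Fig. 2: V = k2 * Etot * [S] / (Km + [S]).
# Parameter values are illustrative only.
def mm_rate(S, k2=10.0, Etot=0.5, Km=2.0):
    return k2 * Etot * S / (Km + S)

Vmax = 10.0 * 0.5            # k2 * Etot, the limiting rate as [S] -> infinity
half = mm_rate(2.0)          # at [S] = Km the rate is exactly Vmax / 2
big = mm_rate(1e6)           # saturation: approaches but never exceeds Vmax
print(half, big, Vmax)
```

Evaluating at [S] = Km gives exactly half of Vmax, which is the standard operational definition of the Michaelis–Menten constant.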

Difficulties

It would seem from the description so far that the task of an applied mathematician studying metabolism should be quite straightforward. A biologist sets the questions to be answered. The mathematician writes down the differential equations for the appropriate chemical reaction network. Using databases or original literature, the constants for each reaction, like Kₘ and Vmax, are determined. Then the equations are solved by machine computation and one has the answer. For many different reasons the actual situation is much more difficult and much more interesting.

What is the network? The metabolism of cells is an exceptionally large biochemical network, and it is not so easy to decide on the "correct" relatively small network that corresponds to some particular cellular task. Typically, the substrates in any small network will also be produced and used up by other small networks, and thus the behavior in those other networks affects the one under study. How should one draw the boundaries of a relatively small network so that everything that is important for the effect one is studying is included?

Enzyme properties. The rates of reactions depend on the properties of the enzymes that catalyze them. Biochemists often purify these enzymes and study their properties when they are combined with substrates in a test tube. These experiments are typically highly reproducible. However, enzymes may behave very differently in the context of real cells. They are affected by pH and by the presence or absence of many other molecules that activate them or inhibit them. Thus their Kₘ and Vmax may depend on the context in which they are put. Many metabolic pathways are very ancient, for example, the folate cycle occurs in bacteria, and many different species have the "same" enzymes. But, in reality, the enzymes may have different properties because of differences in the genes that code for them.

Gene expression levels. Enzymes are proteins that are coded for by genes. The Vmax is roughly proportional to the total enzyme concentration, which is itself dependent on gene expression level and the rate of degradation of the enzyme. The expression level of the gene that codes for the enzyme will depend on the cell type (liver cell or epithelial cell) and on the context in which the cell finds itself. This expression level will vary between different cells in the same individual, between individuals of the same species, and between different species that have the same gene. Furthermore, the expression level may depend on what other genes are turned on or the time of day. Even more daunting is the fact that identical cells (same DNA) in exactly the same environment often show a 30 % variation in gene expression levels [36]. Thus, it is not surprising that the Kₘ and Vmax values (that we thought the biochemists would determine for us) vary sometimes by two or three orders of magnitude in public enzyme databases.

Is the mean field approximation valid? When we write down the differential equations for the concentrations of substrates using mass-action, Michaelis–Menten, or other kinetics, we are assuming that the cell can be treated as a well-mixed bag of chemicals. There are two natural circumstances where this is not true. First, the number of molecules of a given substrate may be very small; this is particularly true in biochemical networks related to gene expression. In this case stochastic fluctuations play an important role. Stochastic methods are discussed below. Second, some biochemical reactions occur only in special locations, for example, the cell membrane or the endoplasmic reticulum. In this case, there will clearly be gradients, the well-mixed assumption is not valid, and partial differential equations will be required.

Are these systems at steady state? It is difficult to choose the right network and determine enzyme constants. However, once that is done, surely the traditional approach in applied mathematics to large nonlinear systems of ODEs should work. First one determines the steady states and then one linearizes around the steady states to determine which ones are asymptotically stable. Unfortunately, many cellular systems are not at or even near steady state. For example, amino acid concentrations in the blood for the hours shortly after meals increase by a factor of 2–6. This means that cells are subject to enormous fluctuations in the inputs of amino acids. The traditional approach has value, of course, but new tools, both technical and conceptual, are needed for studying these systems of ODEs.

Long-range interactions. Many biochemical reaction diagrams do not include the fact that some substrates influence distant enzymes in the network. These are called long-range interactions, and several are indicated by red arrows in Fig. 3. The substrate SAM activates the enzyme CBS and inhibits the enzymes MTHFR and BHMT. The substrate 5mTHF inhibits the enzyme GNMT. We note that "long range" does not indicate distance in the cell; we are assuming the cell is well mixed. "Long-range" refers to distance in the network. It used to be thought that it was easy to understand the behavior of chemical networks by walking through the diagrams step by step. But if there are long-range interactions this is no longer possible; one must do serious mathematics and/or extensive machine experimentation to determine the system properties of the network.

But what do these long-range interactions do in the cases indicated in Fig. 3? After meals the methionine input goes way up and the SAM concentration rises dramatically. This activates CBS and inhibits BHMT, which means that more mass is sent away from the methionine cycle via the CBS reaction and less mass is recycled within the cycle via the BHMT reaction. So these two long-range interactions roughly conserve mass in the methionine cycle. The other two long-range interactions keep the DNMT reaction running at an almost constant rate despite large fluctuations in methionine input. Here is a verbal description of how this works.


If SAM starts to go up, the enzyme MTHFR is more inhibited, so there will be less of the substrate 5mTHF. Since there is less 5mTHF, the inhibition of GNMT is partly relieved, and the extra SAMs that are being produced are taken down the GNMT pathway, leaving the rate of the DNMT reaction about constant [28]. We see that in both cases the long-range interactions have specific regulatory roles and probably evolved for just those reasons. The existence of such long-range interactions makes the study of chemical reaction networks much more difficult.
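The role of Vmax discussed in Difficulties can be made concrete with a Michaelis–Menten rate law (a minimal sketch; the parameter values below are made up and are not taken from this entry):

```python
# Michaelis-Menten rate law: v(S) = Vmax * S / (Km + S).
# Vmax is roughly proportional to total enzyme concentration, so a 2-fold
# change in gene expression rescales the whole curve; Km is unchanged.
def mm_rate(S, Vmax, Km):
    return Vmax * S / (Km + S)

Km = 50.0                                # hypothetical half-saturation constant
for Vmax in (1.0, 2.0):                  # 2-fold change in enzyme expression
    print(f"Vmax={Vmax}: v(10)={mm_rate(10.0, Vmax, Km):.3f}, "
          f"v(1000)={mm_rate(1000.0, Vmax, Km):.3f}")
```

Near saturation (S much larger than Km) the flux tracks Vmax directly, which is why order-of-magnitude uncertainty in expression levels translates into order-of-magnitude uncertainty in fluxes.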

Metabolic Networks, Modeling

Theoretical Approaches to Complex Metabolic Systems

Cell metabolism is an extremely complex system, and the large number of modeling studies on particular parts of the system cannot be summarized in this short entry. However, we can discuss several different theoretical approaches.

Metabolic Control Analysis (MCA). This theory, which goes back to the original papers of Kacser and Burns [21, 22], enables one to calculate "control coefficients" that give some information about the system properties of metabolic networks. Let x = (x1, x2, ...) denote the substrate concentrations in a large metabolic network, and suppose that the network is at a steady state x^s(c), where c denotes a vector of constants that the steady state depends on. These constants may be kinetic constants like Km or Vmax values, initial conditions, input rates, enzyme concentrations, etc. If we assume that the constants are not at critical values where behavior changes, then the mapping c → x^s(c) will be smooth and we can compute its partial derivatives. Since the kinetic formulas tell us how the fluxes along each pathway depend on the substrate concentrations, we can also compute the rates of change of the fluxes as the parameters c are varied. These are called the "flux control coefficients." In practice, this can be done by hand only for very simple networks, and so it is normally done by machine computation. MCA gives information about system behavior very close to a steady state. One of the major contributions of MCA was to emphasize that local behavior, for example, a flux, is a system property in that it depends on all or many of the constants in c. So, for example, there is no single rate-limiting step for the rate of production of a particular metabolite; instead, control is distributed throughout the system.

Biochemical Systems Theory (BST). This theory, which goes back to Savageau [32], replaces the diverse nonlinear kinetic formulas for different enzymes with a common power-law formulation. So, the differential equation for each substrate concentration looks like

\[ x'(t) = \sum_i \alpha_i \prod_j x_j^{\beta_{ij}} \; - \; \sum_i \gamma_i \prod_j x_j^{\delta_{ij}} . \]

In the first term, the sum over $i$ represents all the different reactions that produce $x$, and the product over $j$ gives the variables that influence each of those reactions. Similarly, the second sum contains the reactions that use $x$. The powers $\beta_{ij}$ and $\delta_{ij}$, which can be fractional or negative, are to be obtained by fitting the model to experimental data. The idea is that one needs to know the network and the influences, but not the detailed kinetics. A representation of the detailed kinetics will emerge from determining the powers by fitting data. Note that the influences would naturally include the long-range interactions mentioned above. From a mathematical point of view, there certainly will be such a representation near a (noncritical) steady state if the variables represent deviations from that steady state. One of the drawbacks of this method is that biological data is highly variable (for the reasons discussed above), and therefore the right choice of data set for fitting may not be clear. BST has also been used to simulate gene networks and intracellular signaling networks [31, 34].

Metabolomics. With the advent of high-throughput studies in molecular biology, there has been much interest in applying concepts and techniques from bioinformatics to understanding metabolic systems. The idea is that one measures the concentrations of many metabolites at different times, in different tissues, or in different cells. Statistical analysis reveals which variables seem to be correlated, and one uses this information to draw a network of influences.
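The BST power-law right-hand side above is easy to sketch in code. The two-substrate network below is written in Savageau's canonical S-system notation (rate constants alpha, beta and kinetic-order exponents g, h), and every coefficient is made up for illustration:

```python
import numpy as np

# Canonical S-system form of BST: for each substrate k,
#   x_k'(t) = alpha_k * prod_j x_j**g[k, j] - beta_k * prod_j x_j**h[k, j],
# with exponents that may be fractional or negative, fitted to data.
def s_system_rhs(x, alpha, g, beta, h):
    x = np.asarray(x, dtype=float)
    production = alpha * np.prod(x ** g, axis=1)
    consumption = beta * np.prod(x ** h, axis=1)
    return production - consumption

# Hypothetical two-substrate network; all numbers below are illustrative only.
alpha = np.array([2.0, 1.0])
beta = np.array([1.0, 1.0])
g = np.array([[0.0, -0.5],   # x2 inhibits production of x1 (negative power)
              [1.0,  0.0]])  # x1 drives production of x2
h = np.array([[0.5,  0.0],
              [0.0,  1.0]])
print(s_system_rhs([1.0, 1.0], alpha, g, beta, h))   # [1. 0.]
```

A negative exponent encodes an inhibitory influence, such as the long-range interactions discussed above, without committing to a detailed kinetic mechanism.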
Clusters of substrates that vary together could be expected to be part of the same "function." The resulting networks can be compared, between cells or species, in an effort to understand how function arises from network properties; see, for example, [29].

Graph theory. A related approach has been to study the directed graphs that correspond to known metabolic (or gene) networks, with the substrates (genes) as nodes and the directed edges representing biochemical reactions (or influences). One is interested in large-scale properties of the networks, such as the mean degree of nodes and the existence of large, almost separated, clusters. One is also interested in local properties, such as a particular small connection pattern that is repeated often in the whole graph. It has been proposed by Alon [1] that such repeated "motifs" have specific biological functions. From the biological point of view, the graph-theoretic approaches have a number of pitfalls. It is very natural to assume that graph properties must have biological function or significance, for example, to assume that a node with many edges must be "important," or that clusters of highly connected nodes are all contributing to a single "function." Nevertheless, it is interesting to study the structure of the graphs independent of the dynamics and to ask what influence or significance the graph structure has.

Deficiency zero systems. The study of graphs suggests a natural question about the differential equations that represent metabolic systems: when are the qualitative properties of the system independent of the local details? As discussed in Difficulties, the details will vary considerably from species to species, from tissue to tissue, from cell to cell, and even from time to time in the same cell. Yet large parts of cell metabolism keep functioning in the same way. Thus, the biology tells us that many important system properties are independent of the details of the dynamics. This must be reflected in the mathematics. But how? A major step toward answering this question was made by Marty Feinberg and colleagues [14]. Let m be the number of substrates.
For each reaction in the network, we denote by $\nu$ the m-component vector of integers that indicates how many molecules of the different substrates are used in the reaction; $\nu'$ indicates how many are produced by the reaction. Each $\nu$ is called a complex, and we denote the number of complexes by $c$. The span of the set of vectors of the form $\nu' - \nu$ is called the stoichiometric subspace, and it is invariant under the dynamics. We denote its dimension by $s$, and we let $\ell$ denote the number of connected components of the graph. The deficiency of the network is defined as

\[ \delta = c - s - \ell . \]

The network is weakly reversible if, whenever a sequence of reactions allows us to go from complex $\nu_1$ to complex $\nu_2$, there exists a sequence of reactions from $\nu_2$ to complex $\nu_1$. Feinberg formulated the deficiency zero theorem, which says that a weakly reversible deficiency zero network with mass-action kinetics has a unique globally stable equilibrium in the interior of each stoichiometric compatibility class. This is true independent of the choice of rate constants. Feinberg gave a proof in the case that there are no boundary equilibria on the faces of the positive orthant. Since then, the proof has been extended to many cases that allow boundary equilibria [2, 9, 35].

Stochastic Models

There are many sources of stochasticity in cellular networks. For example, the initial conditions for a cell will be random due to the random assignment of resources at cellular division, and the environment of the cell is random due to fluctuations in such things as temperature and the chemical environment of the cell. If these were the only sources of randomness, then one would only need to modify the coefficients and initial conditions of the differential equation models to obtain reasonable models taking these stochastic effects into account. But many cellular processes involve substrates and enzymes present in the system in very small numbers, and small (random) fluctuations in these numbers may have significant effects on the behavior of the system. Consequently, it is the discreteness of the system as much as its inherent stochasticity that demands a modeling approach different from the classical differential equations.

Markov chain models. The idea of modeling a chemical reaction network as a discrete stochastic process at the molecular level dates back at least to [12], with a rapid development beginning in the 1950s and 1960s; see, for example, [7, 8, 27]. The simplest and most frequently used class of models are continuous-time Markov chains. The state $X(t)$ of the model at time $t$ is a vector of nonnegative integers giving the numbers of molecules of each species in the system at that time.
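Before turning to the dynamics of these Markov chain models: the quantities $c$, $s$, $\ell$, and $\delta$ defined above are easy to compute for a toy network. The reversible chain A ⇌ B ⇌ C below is a hypothetical example, not one taken from this entry:

```python
import numpy as np

# Toy network A <-> B <-> C (m = 3 substrates), written via its complexes.
A, B, C = np.eye(3)                      # complexes as m-component vectors
c = len({tuple(A), tuple(B), tuple(C)})  # number of distinct complexes

# Reaction vectors nu' - nu for A->B, B->A, B->C, C->B.
reactions = np.array([B - A, A - B, C - B, B - C])
s = np.linalg.matrix_rank(reactions)     # dimension of stoichiometric subspace

ell = 1                                  # one connected component: {A, B, C}
delta = c - s - ell
print(c, s, ell, delta)                  # 3 2 1 0: a deficiency zero network
```

Since this toy network is also weakly reversible, the deficiency zero theorem guarantees a unique globally stable equilibrium in each stoichiometric compatibility class, for any choice of mass-action rate constants.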
Continuous-time Markov chain models of this kind are specified by giving transition intensities (or propensities in much of the reaction network literature) $\lambda_l(x)$ that determine the infinitesimal probabilities of seeing a particular change or transition $X(t) \to X(t + \Delta t) = X(t) + \zeta_l$ in the next small interval of time $(t, t + \Delta t]$, that is,

\[ P\{ X(t + \Delta t) = X(t) + \zeta_l \mid X(t) \} \approx \lambda_l(X(t))\, \Delta t . \]


In the chemical network setting, each type of transition corresponds to a reaction in the network, and $\zeta_l = \nu_l' - \nu_l$, where $\nu_l$ is a vector giving the number of molecules of each chemical species consumed in the $l$th reaction and $\nu_l'$ is a vector giving the number of molecules of each species produced in the reaction. The intuitive notion of a transition intensity can be translated into a rigorous specification of a model in a number of different ways. The most popular approach in the chemical networks literature is through the master (or Kolmogorov forward) equation

\[ \dot p_y(t) = \sum_l \lambda_l(y - \zeta_l)\, p_{y - \zeta_l}(t) \; - \; \sum_l \lambda_l(y)\, p_y(t), \tag{1} \]

where $p_y(t) = P\{X(t) = y\}$, and the sum is over the different reactions in the network. Another useful approach is through a system of stochastic equations

\[ X(t) = X(0) + \sum_l \zeta_l\, Y_l\Big( \int_0^t \lambda_l(X(s))\, ds \Big), \tag{2} \]

where the $Y_l$ are independent unit Poisson processes. Note that $R_l(t) = Y_l\big( \int_0^t \lambda_l(X(s))\, ds \big)$ simply counts the number of times that the transition taking the state $x$ to the state $x + \zeta_l$ occurs by time $t$, that is, the number of times the $l$th reaction occurs. The master equation and the stochastic equation determine the same models in the sense that if $X$ is a solution of the stochastic equation, then $p_y(t) = P\{X(t) = y\}$ is a solution of the master equation, and any solution of the master equation can be obtained in this way. See [4] for a survey of these models and additional references.

The stochastic law of mass action. The basic assumption of the simplest Markov chain model is the same as that of the classical law of mass action: the system is thoroughly mixed at all times. That assumption suggests that the intensity for a binary reaction

\[ A + B \to C \tag{3} \]

should be proportional to the number of pairs consisting of one molecule of A and one molecule of B, that is, $\lambda(X(t)) = k X_A(t) X_B(t)$. The same intuition applied to the binary reaction

\[ 2A \to C \tag{4} \]

would give an intensity

\[ \lambda(X(t)) = \kappa \binom{X_A(t)}{2} = \frac{\kappa}{2}\, X_A(t)\,(X_A(t) - 1) = k\, X_A(t)\,(X_A(t) - 1), \]

where we replace $\kappa/2$ by $k$. For unary reactions, for example, $A \to C$, the assumption is that the molecules behave independently, and the intensity becomes $\lambda(X(t)) = k X_A(t)$.

Relationship to deterministic models. The larger the volume of the system, the less likely a particular pair of molecules is to come close enough together to react, so it is natural to assume that intensities for binary reactions should vary inversely with respect to some measure of the volume. If we take that measure, $N$, to be Avogadro's number times the volume in liters, then the intensity for (3) becomes

\[ \lambda(X(t)) = \frac{k}{N}\, X_A(t)\, X_B(t) = N k\, [A]_t\, [B]_t, \]

where $[A]_t = X_A(t)/N$ is the concentration of A measured in moles per liter. The intensity for (4) becomes $\lambda(X(t)) = N k\, [A]_t ([A]_t - N^{-1}) \approx N k\, [A]_t^2$, assuming, as is likely, that $N$ is large and that $X_A(t)$ is of the same order of magnitude as $N$ (which may not be the case for cellular reactions). If we assume that our system consists of the single reaction (3), the stochastic equation for species A, written in terms of the concentrations, becomes

\[ [A]_t = [A]_0 - \frac{1}{N}\, Y\Big( N \int_0^t k\, [A]_s\, [B]_s\, ds \Big) \approx [A]_0 - \int_0^t k\, [A]_s\, [B]_s\, ds, \]

where, again assuming that $N$ is large, the validity of the approximation follows from the fact that the law of large numbers for the Poisson process implies $N^{-1} Y(Nu) \approx u$. Analysis along these lines gives a derivation of the classical law of mass action starting from the stochastic model; see, for example, Kurtz [25, 26], or Ethier and Kurtz [13], Chap. 10.
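The law-of-large-numbers statement $N^{-1} Y(Nu) \approx u$ used above is easy to check numerically (a sketch; the values of $u$ and $N$ are arbitrary, and $Y(Nu)$ is sampled as a Poisson random variable with mean $Nu$):

```python
import numpy as np

rng = np.random.default_rng(0)

# Y is a unit Poisson process, so Y(N*u) is Poisson distributed with mean N*u;
# the law of large numbers gives N**-1 * Y(N*u) -> u as N grows.
u = 0.7                                   # arbitrary "integrated intensity"
for N in (10, 1_000, 100_000):            # arbitrary volume measures
    scaled = rng.poisson(N * u) / N
    print(f"N={N:>6}: N^-1 Y(Nu) = {scaled:.4f}   (u = {u})")
```

The relative fluctuation shrinks like $N^{-1/2}$, which is exactly the scale on which the deterministic law of mass action emerges from the stochastic model.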


Simulation. Among the basic properties of a continuous-time Markov chain (with intensities that do not depend on time) is that the holding time in a state $x$ is exponentially distributed and is independent of the value of the next state occupied by the chain. To be specific, the parameter of the holding time is

\[ \bar\lambda(x) = \sum_l \lambda_l(x), \]

and the probability that the next state is $x + \zeta_l$ is $\lambda_l(x)/\bar\lambda(x)$. This observation immediately suggests a simulation algorithm known in the chemical literature as Gillespie's direct method or the stochastic simulation algorithm (SSA) [16, 17]. Specifically, given two independent uniform $[0,1]$ random variables $U_1$ and $U_2$, and noting that $-\log U_1$ is exponentially distributed with mean 1, the length of time the process remains in state $x$ is simulated by $\Delta = \bar\lambda(x)^{-1} (-\log U_1)$. Assuming that there are $m$ reactions indexed by $1 \le l \le m$, and defining $q_0(x) = 0$ and $q_l(x) = \bar\lambda(x)^{-1} \sum_{k=1}^{l} \lambda_k(x)$, the new state is given by

\[ x + \sum_l \zeta_l\, 1_{(q_{l-1}(x),\, q_l(x)]}(U_2), \]

that is, the new state is $x + \zeta_l$ if $q_{l-1}(x) < U_2 \le q_l(x)$. If one simulates the process by simulating the Poisson processes $Y_l$ and solving the stochastic equation (2), one obtains the next reaction (next jump) method as defined by Gibson and Bruck [15]. If we define an Euler-type approximation for (2), that is, for $0 = \tau_0 < \tau_1 < \cdots$, recursively defining

\[ \hat X(\tau_n) = X(0) + \sum_l \zeta_l\, Y_l\Big( \sum_{k=0}^{n-1} \lambda_l(\hat X(\tau_k))\, (\tau_{k+1} - \tau_k) \Big), \]

we obtain Gillespie's $\tau$-leap method, which provides a useful approximation to the stochastic model in situations where $\bar\lambda(x)$ is large for the values of the state $x$ of interest [18]. See [3, 5] for additional analysis and discussion.

Hybrid and multiscale models. A discrete model is essential if the chemical network consists of species present in small numbers, but a typical biochemical network may include some species present in small numbers that need to be modeled as discrete variables and other species present in much larger numbers that would be natural to model as continuous variables. This observation leads to hybrid or piecewise deterministic models (in the sense of Davis [11]) as considered in the chemical literature by Crudu et al. [10], Haseltine and Rawlings [19], Hensel et al. [20], and Zeiser et al. [39]. We can obtain these models as solutions of systems of equations of the form

\[ X_k(t) = X_k(0) + \sum_{l \in R_d} \zeta_{lk}\, Y_l\Big( \int_0^t \lambda_l(X(s))\, ds \Big), \qquad k \in S_d, \]

\[ X_k(t) = X_k(0) + \sum_{l \in R_c} \zeta_{lk} \int_0^t \lambda_l(X(s))\, ds = X_k(0) + \int_0^t F_k(X(s))\, ds, \qquad k \in S_c, \]

where $R_d$ and $S_d$ are the indices of the reactions and the species that are modeled discretely, $R_c$ and $S_c$ are the indices for the reactions and species modeled continuously, and $F_k(x) = \sum_{l \in R_c} \zeta_{lk}\, \lambda_l(x)$.

Hybrid models of this form are in a sense "multiscale," since the numbers of molecules in the system for the species modeled continuously are typically many orders of magnitude larger than the numbers of molecules for the species modeled discretely. Many of the stochastic models that have been considered in the biochemical literature are multiscale for another reason, in that the rate constants vary over several orders of magnitude as well (see, e.g., [37, 38]). The multiscale nature of the species numbers and rate constants can be exploited to identify subnetworks that function naturally on different timescales and to obtain reduced models for each of the timescales. Motivated in part by Rao and Arkin [30] and Haseltine and Rawlings [19], a systematic approach to identifying the separated timescales and reduced models is developed in [6] and [23].

Acknowledgements The authors gratefully acknowledge the support of the National Science Foundation (USA) through grants DMS 08-05793 (TK), DMS-061670 (HFN, MR), and EF-1038593 (HFN, MR).


References

1. Alon, U.: An Introduction to Systems Biology: Design Principles of Biological Circuits. CRC Press, Boca Raton (2006)
2. Anderson, D.F.: Global asymptotic stability for a class of nonlinear chemical equations. SIAM J. Appl. Math. 68(5), 1464–1476 (2008)
3. Anderson, D.F.: Incorporating postleap checks in tau-leaping. J. Chem. Phys. 128(5), 054103 (2008). doi:10.1063/1.2819665
4. Anderson, D.F., Kurtz, T.G.: Continuous time Markov chain models for chemical reaction networks. In: Koeppl, H., Setti, G., di Bernardo, M., Densmore, D. (eds.) Design and Analysis of Biomolecular Circuits. Springer, New York (2010)
5. Anderson, D.F., Ganguly, A., Kurtz, T.G.: Error analysis of tau-leap simulation methods. Ann. Appl. Probab., to appear (2010)
6. Ball, K., Kurtz, T.G., Popovic, L., Rempala, G.: Asymptotic analysis of multiscale approximations to reaction networks. Ann. Appl. Probab. 16(4), 1925–1961 (2006)
7. Bartholomay, A.F.: Stochastic models for chemical reactions. I. Theory of the unimolecular reaction process. Bull. Math. Biophys. 20, 175–190 (1958)
8. Bartholomay, A.F.: Stochastic models for chemical reactions. II. The unimolecular rate constant. Bull. Math. Biophys. 21, 363–373 (1959)
9. Chavez, M.: Observer design for a class of nonlinear systems, with applications to biochemical networks. Ph.D. thesis, Rutgers (2003)
10. Crudu, A., Debussche, A., Radulescu, O.: Hybrid stochastic simplifications for multiscale gene networks. BMC Syst. Biol. 3, 89 (2009). doi:10.1186/1752-0509-3-89
11. Davis, M.H.A.: Markov Models and Optimization. Monographs on Statistics and Applied Probability, vol. 49. Chapman & Hall, London (1993)
12. Delbrück, M.: Statistical fluctuations in autocatalytic reactions. J. Chem. Phys. 8(1), 120–124 (1940). doi:10.1063/1.1750549
13. Ethier, S.N., Kurtz, T.G.: Markov Processes: Characterization and Convergence. Wiley Series in Probability and Mathematical Statistics. Wiley, New York (1986)
14. Feinberg, M.: Chemical reaction network structure and the stability of complex isothermal reactors—I. The deficiency zero and deficiency one theorems. Chem. Eng. Sci. 42, 2229–2268 (1987)
15. Gibson, M.A., Bruck, J.: Efficient exact stochastic simulation of chemical systems with many species and many channels. J. Phys. Chem. A 104(9), 1876–1889 (2000)
16. Gillespie, D.T.: A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. J. Comput. Phys. 22(4), 403–434 (1976)
17. Gillespie, D.T.: Exact stochastic simulation of coupled chemical reactions. J. Phys. Chem. 81, 2340–2361 (1977)
18. Gillespie, D.T.: Approximate accelerated stochastic simulation of chemically reacting systems. J. Chem. Phys. 115(4), 1716–1733 (2001). doi:10.1063/1.1378322
19. Haseltine, E.L., Rawlings, J.B.: Approximate simulation of coupled fast and slow reactions for stochastic chemical kinetics. J. Chem. Phys. 117(15), 6959–6969 (2002)
20. Hensel, S.C., Rawlings, J.B., Yin, J.: Stochastic kinetic modeling of vesicular stomatitis virus intracellular growth. Bull. Math. Biol. 71(7), 1671–1692 (2009). doi:10.1007/s11538-009-9419-5
21. Kacser, H., Burns, J.A.: The control of flux. Symp. Soc. Exp. Biol. 27, 65–104 (1973)
22. Kacser, H., Burns, J.A.: The control of flux. Biochem. Soc. Trans. 23, 341–366 (1995)
23. Kang, H.W., Kurtz, T.G.: Separation of time-scales and model reduction for stochastic reaction networks. Ann. Appl. Probab., to appear (2010)
24. Keener, J., Sneyd, J.: Mathematical Physiology. Springer, New York (2009)
25. Kurtz, T.G.: The relationship between stochastic and deterministic models for chemical reactions. J. Chem. Phys. 57(7), 2976–2978 (1972)
26. Kurtz, T.G.: Approximation of Population Processes. CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 36. SIAM, Philadelphia (1981)
27. McQuarrie, D.A.: Stochastic approach to chemical kinetics. J. Appl. Probab. 4, 413–478 (1967)
28. Nijhout, H.F., Reed, M., Anderson, D., Mattingly, J., James, S., Ulrich, C.: Long-range allosteric interactions between the folate and methionine cycles stabilize DNA methylation. Epigenetics 1, 81–87 (2006)
29. Papin, J.A., Price, N.D., Wiback, S.J., Fell, D.A., Palsson, B.O.: Metabolic pathways in the post-genome era. Trends Biochem. Sci. 28, 250–258 (2003)
30. Rao, C.V., Arkin, A.P.: Stochastic chemical kinetics and the quasi-steady-state assumption: application to the Gillespie algorithm. J. Chem. Phys. 118(11), 4999–5010 (2003)
31. Reinitz, J., Sharp, D.H.: Mechanism of eve stripe formation. Mech. Dev. 49, 133–158 (1995)
32. Savageau, M.A.: Biochemical systems analysis: I. Some mathematical properties of the rate law for the component enzymatic reactions. J. Theor. Biol. 25(3), 365–369 (1969)
33. Segel, L.A.: On the validity of the steady state assumption of enzyme kinetics. Bull. Math. Biol. 50, 579–593 (1988)
34. Sharp, D.H., Reinitz, J.: Prediction of mutant expression patterns using gene circuits. Biosystems 47, 79–90 (1998)
35. Shiu, A., Sturmfels, B.: Siphons in chemical reaction networks. Bull. Math. Biol. 72(6), 1448–1463 (2010)
36. Sigal, A., Milo, R., Cohen, A., Geva-Zatorsky, N., Klein, Y., Liron, Y., Rosenfeld, N., Danon, T., Perzov, N., Alon, U.: Variability and memory of protein levels in human cells. Nature 444, 643–646 (2006)
37. Srivastava, R., Peterson, M.S., Bentley, W.E.: Stochastic kinetic analysis of the Escherichia coli stress circuit using sigma(32)-targeted antisense. Biotechnol. Bioeng. 75, 120–129 (2001)
38. Srivastava, R., You, L., Summers, J., Yin, J.: Stochastic vs. deterministic modeling of intracellular viral kinetics. J. Theor. Biol. 218(3), 309–321 (2002)
39. Zeiser, S., Franz, U., Liebscher, V.: Autocatalytic genetic networks modeled by piecewise-deterministic Markov processes. J. Math. Biol. 60(2), 207–246 (2010). doi:10.1007/s00285-009-0264-9

Methods for High-Dimensional Parametric and Stochastic Elliptic PDEs

Christoph Schwab
Seminar for Applied Mathematics (SAM), ETH Zürich, ETH Zentrum, Zürich, Switzerland

Synonyms

Generalized Polynomial Chaos; Partial Differential Equations with Random Input Data; Sparse Finite Element Methods; Sparsity; Uncertainty Quantification

Motivation and Outline

A large number of stationary phenomena in the sciences and in engineering are modeled by elliptic partial differential equations (elliptic PDEs), and during the past decades, numerical methods for their solution, such as the finite element method (FEM), the finite difference method (FDM), and the finite volume method (FVM), have matured. Well-posed mathematical problem formulations, as well as the mathematical analysis of the numerical methods for their approximate solution, were based on the paradigm (going back to Hadamard's notion of well-posedness) that all input data of interest are known exactly and that numerical methods should be convergent, i.e., able to approximate the unique solution to any required tolerance (neglecting effects of finite precision arithmetic). In recent years, due to the apparent limited predictive capability of high-precision numerical simulations, and in part due to the limited measurement precision of PDE input data in applications, the numerical solution of PDEs with random inputs has emerged as a key area of applied and computational mathematics, with the aim to quantify uncertainty in predictive engineering computer simulations. At the same time, in many application areas the data deluge has become reality, due to rapid advances in digital data acquisition such as digital imaging. The question of how best to computationally propagate uncertainty, e.g., from digital data, from multiple observations, and from statistical information on measurement errors, through engineering simulations mandates, once more, the (re)formulation of elliptic PDEs of engineering interest as stochastic elliptic PDEs, for which all input data can be random functions in suitable function spaces. Often (but not always), the function spaces will be the spaces in which the deterministic counterparts of the PDEs of interest admit unique solutions.

In the formulation of stochastic elliptic (and other) PDEs, one distinguishes two broad classes of random inputs (or "noises"): first, so-called colored noise, where the random inputs have realizations in classical function spaces and the statistical moments are bounded and exhibit, as functions of the spatial variable, sufficient smoothness to allow for the classical differential calculus (in the sense of distributions); and second, so-called white, or, more generally, rough noises. One characteristic of white noise being the absence of spatial or temporal correlation, solutions of PDEs with white noise inputs can be seen, in a sense, as stochastic analogues of fundamental solutions (in the sense of distributions) of deterministic PDEs: as in the deterministic setting, they are characterized by rather low regularity and by low integrability. As in the deterministic setting, classical differential calculus does not apply any more. Extensions such as Itô calculus (e.g., [25, 73]), white noise calculus (e.g., [57, 64]), or rough path calculus (e.g., [35]) are required in the mathematical formulation of PDEs for such inputs. In the present notes, we concentrate on the efficient numerical treatment of colored noise models. For stochastic PDEs (by which we mean partial differential equations with colored random inputs, which we term "SPDEs" in what follows), well-posedness can be established when inputs and outputs of SPDEs are viewed as random fields.

Research supported by the European Research Council (ERC) under Grant No. AdG 247277.

Random Fields in Stochastic PDEs

The formulation of elliptic SPDEs with stochastic input data (such as stochastic coefficient functions, stochastic source or volume terms, or stochastic domains) necessitates random variables which take values in function spaces. Some basic notions are as follows (see, e.g., [25, Chap. 1] or [2] for more details). We consider "stochastic" PDEs within the probabilistic framework due to Kolmogoroff (other approaches toward randomness are fuzzy sets, belief functions, etc.): let $(\Omega, \mathcal{A}, \mathbb{P})$ denote a probability space, and let $E$ be a metric space, endowed with a sigma algebra $\mathcal{E}$. A strongly measurable mapping $X : \Omega \to E$ is a mapping such that for every $A \in \mathcal{E}$, the set $\{\omega \in \Omega : X(\omega) \in A\} \in \mathcal{A}$. A random field (or random function, RF for short) is an $E$-valued random variable (RV for short), i.e., a measurable mapping from $(\Omega, \mathcal{A}, \mathbb{P})$ to $(E, \mathcal{E})$. We denote by $\mathcal{L}(X)$ the image measure of the probability measure $\mathbb{P}$ under the mapping $X$ on the measurable space $(E, \mathcal{E})$, defined for any $A \in \mathcal{E}$ by $\mathcal{L}(X)(A) = \mathbb{P}(\{\omega \in \Omega : X(\omega) \in A\})$. The measure $\mu = \mathcal{L}(X)$ is the distribution or law of the RV $X$. If $E$ is a separable Banach space and $X$ is an RV on $(\Omega, \mathcal{A})$ taking values in $E$, then the real-valued function $\|X(\cdot)\|_E$ is measurable (i.e., a random number) (e.g., [25, Lemma 1.5]). For $1 \le p \le \infty$, and for a separable Banach space $E$, denote by $L^p(\Omega, \mathcal{A}, \mathbb{P}; E)$ the Bochner space of all RFs $X : \Omega \to E$ which are $p$-integrable, i.e., for which the norms

\[
\| X \|_{L^p(\Omega, \mathcal{A}, \mathbb{P}; E)} :=
\begin{cases}
\Big( \displaystyle\int_{\omega \in \Omega} \| X(\omega) \|_E^p \, d\mathbb{P}(\omega) \Big)^{1/p}, & 1 \le p < \infty, \\[2mm]
\operatorname{ess\,sup}_{\omega \in \Omega} \| X(\omega) \|_E, & p = \infty,
\end{cases}
\]

are finite.

We also write $L^p(\Omega; E)$ if the probability space is clear from the context. If $X \in L^1(\Omega; E)$ is an RF, the mathematical expectation $\mathbb{E}[X]$ (also referred to as the "mean field" or "ensemble average" of $X$) is well defined as an element of $E$ by

\[ \mathbb{E}[X] := \int_\Omega X(\omega)\, d\mathbb{P}(\omega) \in E. \tag{2} \]

For a scale of smoothness spaces $V_s \subset V$ and a family of finite-dimensional subspaces $S_\ell \subset V$ with $N_\ell = \dim S_\ell$, one assumes that for each $s > 0$ there exists a constant $C_s > 0$ and a convergence rate $t(s) > 0$ such that for all $\ell \in \mathbb{N}$ the error estimate

\[ \forall u \in V_s : \quad \inf_{v_\ell \in S_\ell} \| u - v_\ell \|_V \le C_s N_\ell^{-t} \| u \|_{V_s} \tag{5} \]

holds.
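An estimate of the form (5) can be observed numerically. The sketch below measures the $L^2$ error of piecewise-linear interpolation of a smooth function on $(0,1)$, used as a stand-in for the best approximation; the observed rate $N^{-2}$ corresponds to the $L^2$ norm, while in an $H^1$-type norm $V$ the rate would be $N^{-1}$:

```python
import numpy as np

# Rate check for an estimate of the form (5): the best piecewise-linear
# approximation of a smooth u on (0,1) decays like N^-2 in the L2 norm.
u = lambda x: np.sin(np.pi * x)

for N in (8, 16, 32, 64):
    xg = np.linspace(0.0, 1.0, N + 1)        # mesh with N cells
    xf = np.linspace(0.0, 1.0, 20 * N + 1)   # fine evaluation points
    err = u(xf) - np.interp(xf, xg, u(xg))   # interpolant stands in for the inf
    l2 = np.sqrt(np.sum(err**2) * (xf[1] - xf[0]))
    print(f"N={N:3d}  L2 error = {l2:.3e}")
# successive errors drop by a factor of about 4 per doubling of N
```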

In the context of the Dirichlet problem for the Poisson equation with random source term (cf., e.g., [80]) in a bounded Lipschitz domain $D \subset \mathbb{R}^d$, for example, we think of $V = H^1_0(D)$ and of $V_s = (H^{1+s} \cap H^1_0)(D)$. Then, for $S_\ell$ denoting a space of continuous, piecewise polynomial functions of degree $k \ge 1$, the Monte Carlo (MC) estimate of $\mathbb{E}[u]$ from $M$ independent draws $u_\ell(\omega_i)$ of the discretized solution reads

\[ E_M[u_\ell] := \frac{1}{M} \sum_{i=1}^{M} u_\ell(\omega_i). \tag{6} \]

There are two contributions to the error $\mathbb{E}[u] - E_M[u_\ell]$: a sampling error and a discretization error (see [83]): assuming that $u \in L^2(\Omega; V_s)$, for $M = 1, 2, \ldots$ and $\ell = 1, 2, \ldots$ the error bound

\[
\| \mathbb{E}[u] - E_M[u_\ell] \|_{L^2(\Omega;V)}
\le \| \mathbb{E}[u] - E_M[u] \|_{L^2(\Omega;V)} + \| E_M[u] - E_M[u_\ell] \|_{L^2(\Omega;V)}
\le \frac{1}{\sqrt{M}} \| u \|_{L^2(\Omega;V)} + C_s N_\ell^{-t} \| u \|_{L^2(\Omega;V_s)} \tag{7}
\]

holds. Relation (7) gives an indication on the selection of the number of degrees of freedom $N_\ell$ in the discretization scheme versus the sample size $M$ in order to balance both errors: to reach error $O(\varepsilon)$ in $L^2(\Omega;V)$, work of order $O(M N_\ell) = O(\varepsilon^{-2-1/t}) = O(\varepsilon^{-2-d/s})$ is required (assuming that the work for the realization of one sample $u_\ell \in S_\ell$ is proportional to $\dim S_\ell$, as is typically the case when multilevel solvers are used for the approximate solution of the discretized equations). We note that, even for smooth solutions (where $s$ is large), the convergence rate of error versus work never exceeds $1/2$. Methods which allow one to achieve higher rates of convergence are the so-called quasi-Monte Carlo methods (QMC methods for short) (see, e.g., [61, 62] for recent results). The naive use of MC methods for the numerical solution of SPDEs is, therefore, limited either to small model problems or requires the use of massive computational resources. The so-called multilevel Monte Carlo method (MLMC for short), proposed by M. Giles in [39] (after earlier work of Heinrich on quadrature) to accelerate path simulation of Itô SDEs, can dramatically improve the situation. These methods are also effective in particular for RF solutions $u$ with only low regularity, i.e., $u \in V_s$ for small $s > 0$, whenever hierarchic discretizations of

stochastic PDEs are available. To derive it, we assume that for each draw $u(\omega_i)$ of the RF $u$, a sequence $u_\ell(\omega_i)$ of approximate realizations is available (e.g., this is naturally the case for multigrid methods). By the linearity of the mathematical expectation, with the convention that $u_{-1} := 0$, we may write

\[ \mathbb{E}[u - u_L] = \mathbb{E}\Big[ u - \sum_{\ell=0}^{L} (u_\ell - u_{\ell-1}) \Big] = \mathbb{E}[u] - \sum_{\ell=0}^{L} \mathbb{E}[u_\ell - u_{\ell-1}]. \tag{8} \]

Rather than applying the MC estimator to $u_L$, we estimate separately the expectation $\mathbb{E}[u_\ell - u_{\ell-1}]$ of each discretization increment, amounting to the numerical solution of the same realization of the SPDE on two successive mesh levels, as is naturally available in multilevel solvers for elliptic PDEs. This results in the MLMC estimator

\[ E^L[u] := \sum_{\ell=0}^{L} E_{M_\ell}[u_\ell - u_{\ell-1}]. \tag{9} \]

Efficiency gains in MLMC stem from the possibility to use, at each level of discretization, a level-dependent number $M_\ell$ of MC samples: combining (4) and (5),

\[
\| E_{M_\ell}[u_\ell - u_{\ell-1}] \|_{L^2(\Omega;V)}
\le \| E[u] - E_{M_\ell}[u_\ell] \|_{L^2(\Omega;V)} + \| E[u] - E_{M_\ell}[u_{\ell-1}] \|_{L^2(\Omega;V)}
\lesssim M_\ell^{-1/2}\, N_\ell^{-t(s)}\, \|u\|_{L^2(\Omega;V_s)} .
\]

This error bound may now be used to optimize the sample numbers $M_\ell$ with respect to the discretization errors at each mesh level. In this way, computable estimates for the ensemble average $E[u]$ of the RF $u$ can be obtained with work of the order of one multilevel solve of one single instance of the deterministic problem at the finest mesh level (see, e.g., [12] for a complete analysis for a scalar, second-order elliptic problem with random diffusion coefficients, [10, 11, 65] for other types of SPDEs, and [41] for subsurface flow models with lognormal permeability). For quasi-MC methods, similar constructions are possible; this is ongoing research (see, e.g., [47, 61, 62]).
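As a purely illustrative sketch of the MLMC estimator (9) with level-dependent sample numbers $M_\ell$, the following Python snippet uses a toy scalar quantity $u(Y) = \exp(Y)$, $Y \sim N(0,1)$, and a hypothetical "level-$\ell$ discretization" $u_\ell(y) = \exp(y(1 - 2^{-\ell-1}))$ carrying an $O(2^{-\ell})$ error; this stands in for the SPDE setting of the text, it is not an SPDE solver.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy model (illustrative): exact random quantity u(Y) = exp(Y), Y ~ N(0,1);
# the hypothetical level-l approximation carries an O(2^{-l}) error.
def u_level(y, l):
    return np.exp(y * (1.0 - 2.0 ** (-l - 1)))

def mlmc_estimate(L, M):
    # Estimator (9): MC averages of the increments u_l - u_{l-1}, with the
    # convention u_{-1} := 0 and level-dependent sample numbers M[l]; the
    # same draws are used on the fine and coarse level of each increment.
    total = 0.0
    for l in range(L + 1):
        y = rng.standard_normal(M[l])
        fine = u_level(y, l)
        coarse = u_level(y, l - 1) if l > 0 else 0.0
        total += np.mean(fine - coarse)
    return total

L = 6
# Geometrically decreasing sample numbers: most samples on the cheap levels.
M = [40000 // 2 ** l + 1 for l in range(L + 1)]
est = mlmc_estimate(L, M)
exact = np.exp(0.5)          # E[exp(Y)] for Y ~ N(0,1)
print(est, abs(est - exact))
```

Because the increment variances decay with $\ell$, only few samples are needed on the expensive fine levels, which is the source of the efficiency gains described above.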

Moment Approximation by Sparse Tensor Galerkin Finite Element Methods

For a boundedly invertible, deterministic operator $A \in \mathcal{L}(V, V^*)$ on some separable Hilbert space $V$ over $\mathbb{R}$, consider the operator equation with random loading

\[
A u = f(\omega), \qquad f \in L^2(\Omega; V^*) .
\tag{10}
\]

This equation admits a unique solution, a RF $u \in L^2(\Omega; V)$. Since the operator equation (10) is linear, application of the operator $A$ and tensorization commute, and there holds the deterministic equation for the covariance function $\mathcal{M}^{(2)}[u] := E[u \otimes u] \in V \otimes V$:

\[
(A \otimes A)\, \mathcal{M}^{(2)}[u] = \mathcal{M}^{(2)}[f] .
\tag{11}
\]

Equation (11) is exact, i.e., no statistical moment closures, e.g., in randomly forced turbulence, are necessary. For the Poisson equation, this approach was first proposed in [5]. If $A$ is $V$-coercive, the operator $A \otimes A$ is not elliptic in the classical sense but boundedly invertible in scales of tensorized spaces and naturally admits regularity shifts in the tensorized smoothness scale $V_s \otimes V_s$. This allows for deterministic Galerkin approximation of 2- and of $k$-point correlation functions in log-linear complexity with respect to the number of degrees of freedom used for the solution of one realization of the mean-field problem (e.g., [51, 76, 78, 80, 83]).
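A minimal numerical illustration of the tensorized covariance equation (11): for a discretized operator $A$, the matrix form of $(A \otimes A)\mathcal{M}^{(2)}[u] = \mathcal{M}^{(2)}[f]$ reads $A\, C_u\, A^{\mathsf T} = C_f$, i.e., $C_u = A^{-1} C_f A^{-\mathsf T}$. The sketch below checks this against Monte Carlo sampling; the 1D finite-difference Dirichlet Laplacian, the exponential correlation kernel, and the grid are all hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20

# 1D Dirichlet Laplacian on a uniform grid (a simple stand-in for A).
h = 1.0 / (n + 1)
A = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h ** 2

# Mean-zero random loading f with prescribed two-point correlation M2[f].
x = np.linspace(h, 1.0 - h, n)
cov_f = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.3)   # hypothetical kernel

# Deterministic tensor equation (11) in matrix form:
# (A (x) A) M2[u] = M2[f]   <=>   A M2[u] A^T = M2[f].
Ainv = np.linalg.inv(A)
cov_u = Ainv @ cov_f @ Ainv.T

# Monte Carlo check: draw Gaussian loads with covariance cov_f, solve, average.
Lchol = np.linalg.cholesky(cov_f + 1e-12 * np.eye(n))
samples = Ainv @ (Lchol @ rng.standard_normal((n, 20000)))
cov_mc = samples @ samples.T / samples.shape[1]
rel_err = np.linalg.norm(cov_mc - cov_u) / np.linalg.norm(cov_u)
print(rel_err)  # small; the MC error decreases like M^{-1/2}
```

Note that one deterministic solve of the tensorized equation replaces the entire sampling loop; the sparse tensor Galerkin methods cited above achieve this in log-linear rather than quadratic complexity.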


The non-elliptic nature of $A \otimes A$ implies, however, a possible enlargement of the singular support of the RF's $k$-point correlation functions; see, e.g., [71, 72] for a simple example and an $hp$-error analysis with exponential convergence estimates, and [31] and the references there for more general regularity results for elliptic pseudodifferential equations with random input data, covering in particular also strongly elliptic boundary integral equations. For nonlinear problems with random inputs, deterministic tensor equations such as (11) for $k$-point correlation functions of random solutions do not hold, unless some closure hypothesis is imposed. In the case of an a priori assumption on $\mathbb{P}$-a.s. smallness of solution fluctuations about the mean, deterministic tensor equations like (11) for the second moments of the random solution can be derived from a first-order moment closure; we refer to [28] for an early development of this approach in the context of subsurface flow, to [52] for an application to elliptic problems in random domains, where the perturbation analysis is based on the shape derivative of the solution at the nominal domain, and to [18] for a general formulation and for error estimates of both discretization and closure error. For an analysis of the linearization approach to random elliptic PDEs in uncertainty quantification, we refer to [7]. We emphasize that the Galerkin solution of the tensorized perturbation equations entails cost $O(N_L^k)$, where $N_L$ denotes the number of degrees of freedom necessary for the discretization of the nominal problem and $k \ge 1$ denotes the order of the statistical moment of interest. Using sparse tensor Galerkin discretizations as in [76, 80], this can be reduced to $O(N_L (\log N_L)^k)$ work and memory, rendering this approach widely applicable.

Generalized Polynomial Chaos Representations

Generalized polynomial chaos ("gpc" for short) representations aim at a parametric, deterministic representation of the law $\mathcal{L}(u)$ of a random solution $u$ of a SPDE. For PDEs with RF inputs, they are usually infinite-dimensional, deterministic parametrizations, in the sense that a countable number of variables is required. Representations of this type go back to the spectral representation of random fields introduced by N. Wiener [84]. A general representation theorem for
RFs in $L^2(\Omega; H)$ was obtained in [16], for Gaussian RFs over separable Hilbert spaces $H$. This result shows that the classical Wiener-Hermite polynomial chaos is, in a sense, universal. Representations in terms of chaos expansions built from polynomials which are orthogonal with respect to non-Gaussian probability measures were proposed in [86]; these so-called generalized polynomial chaos expansions often allow finitely truncated approximations with smaller errors for strongly non-Gaussian RFs with finite second moments. Special cases of polynomial chaos expansions are the so-called Karhunen-Loève expansions. These can be considered as a particular instance of so-called principal component approximations. Karhunen-Loève expansions allow to parametrize a RF $a \in L^2(\Omega; H)$ taking values in a separable Hilbert space $H$ in terms of countably many eigenfunctions $\{\varphi_i\}_{i \ge 1}$ of its covariance operator $\mathcal{C}_a : H \to H$: the unique compact, self-adjoint, nuclear, and trace-class operator whose kernel is the two-point correlation of the RF $a$, i.e., $\mathcal{C}_a = E[a \otimes a]$; see, e.g., [82] or, for the Gaussian case, [2, Thm. 3.3.3]. Importantly, the enumeration of eigenpairs $(\lambda_k, \varphi_k)_{k \ge 1}$ of $\mathcal{C}_a$ is such that $\lambda_1 \ge \lambda_2 \ge \ldots \to 0$, so that the Karhunen-Loève eigenfunctions constitute principal components of the RF $a$, ordered according to decreasing importance (measured in terms of their contribution to the variance of the RF $a$):

\[
a(\cdot, \omega) = \bar{a}(\cdot) + \sum_{k \ge 1} \sqrt{\lambda_k}\, Y_k(\omega)\, \varphi_k(\cdot) .
\tag{12}
\]

In (12), the normalization of the RVs $Y_k$ and of the $\varphi_k$ still needs to be specified: assuming that the $\varphi_k$ are $H$-orthonormal, i.e., $(\varphi_i, \varphi_j) = \delta_{ij}$, the RVs $Y_k \in L^2(\Omega; \mathbb{R})$ defined in (12) are given by ($(\cdot,\cdot)$ denoting the $H$ inner product)

\[
Y_k(\omega) = \lambda_k^{-1/2} \big( a(\cdot, \omega) - \bar{a}(\cdot),\, \varphi_k \big), \qquad k = 1, 2, \ldots
\tag{13}
\]

The sequence $\{Y_k\}_{k \ge 1}$ constitutes a sequence of pairwise uncorrelated RVs which, in case $a$ is a Gaussian RF over $H$, are independent. Recently, for scalar, elliptic problems in divergence form with lognormal Gaussian RF permeability typically appearing in subsurface flow models (see, e.g., [66, 67] and the references there), several rigorous mathematical formulations were given in [17, 36, 43]. It was shown that the stochastic diffusion problem admits
a unique RF solution which belongs to $L^p(\Omega, \gamma; V)$, where $\gamma$ is a Gaussian measure on $V$ (see, e.g., [15]). In particular, in [17, 34, 61], dimension truncation error analyses were performed. Here, two broad classes of discretization approaches are distinguished: stochastic collocation (SC) and stochastic Galerkin (SG). SC is algorithmically similar to MC sampling in that only instances of the deterministic PDEs need to be solved. We refer to [6, 8, 69, 70]. Recently, adaptive stochastic collocation algorithms for the full, infinite-dimensional problem were developed in [19]. For solutions with low regularity with respect to the stochastic variable, quasi-Monte Carlo ("QMC" for short) methods are also effective; we refer to [62] for a general introduction to the mathematical foundation of QMC quadrature as applied to infinite-dimensional parametric operator equations and to [29, 30, 47, 48, 63] for recent applications of QMC to elliptic SPDEs. Numerical optimal control of stochastic elliptic (and parabolic) PDEs with countably parametric operators has been investigated in [42, 60]. Regularity and efficient numerical methods for stochastic elliptic multiscale problems were addressed in the papers [1, 54]; there, multilevel Monte Carlo and generalized polynomial chaos approximations were proposed, and convergence rates independent of the scale parameters were established under the (natural, for this class of problems) assumption of a multiscale discretization in physical space. Bayesian inverse problems for stochastic, elliptic PDEs have also been addressed from the point of view of sparsity of forward maps. We refer to [79] and the references there for the formulation and the basic sparsity result for parametric diffusion problems. The result and approach were generalized to large classes of countably parametric operator equations which cover random elliptic and parabolic PDEs in [4, 37, 74, 75, 77].
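Returning to the Karhunen-Loève expansion (12)-(13), the construction can be sketched numerically by eigendecomposition of a discretized covariance operator. All choices below (exponential correlation kernel, correlation length, grid) are hypothetical illustrations, not tied to a particular model from the text.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
x = np.linspace(0.0, 1.0, n)

# Two-point correlation of a mean-zero RF a: exponential kernel (hypothetical).
C = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.2)

# Discrete analogue of the covariance operator's eigenpairs (lambda_k, phi_k),
# ordered by decreasing eigenvalue as in (12); 1/n plays the quadrature weight.
lam, phi = np.linalg.eigh(C / n)
idx = np.argsort(lam)[::-1]
lam, phi = lam[idx], phi[:, idx] * np.sqrt(n)   # L2(0,1)-orthonormal modes

# Truncated KL synthesis (12): a(x,omega) ~ sum_k sqrt(lambda_k) Y_k phi_k(x),
# with i.i.d. standard normal Y_k in the Gaussian case.
K = 30
Y = rng.standard_normal((K, 5000))
samples = phi[:, :K] @ (np.sqrt(lam[:K, None]) * Y)

# Empirical covariance of the synthesized field reproduces C up to
# truncation and sampling error.
C_emp = samples @ samples.T / samples.shape[1]
rel_err = np.linalg.norm(C_emp - C) / np.linalg.norm(C)
print(lam[:3], rel_err)
```

The decreasing eigenvalues $\lambda_1 \ge \lambda_2 \ge \ldots$ quantify the "principal component" ordering of the modes, and the decay rate governs how many terms a truncated parametrization needs.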
Parametric Algebraic Equations

The Karhunen-Loève expansion (12) can be viewed as a parametrization of the RF $a(\cdot, \omega) \in L^2(\Omega; H)$ in terms of the sequence $Y = (Y_k(\omega))_{k \ge 1}$ of $\mathbb{R}$-valued RVs $Y_k(\omega)$. Assuming that the RVs $Y_k$ are independent, (12) can be interpreted as a parametric, deterministic function of infinitely many parameters $y = (y_k)_{k \ge 1}$,

\[
a(\cdot, y) = \bar{a}(\cdot) + \sum_{k \ge 1} \sqrt{\lambda_k}\, y_k\, \varphi_k(\cdot)
\tag{14}
\]

which is evaluated at $Y = (Y_k(\omega))_{k \ge 1}$. We illustrate this in the most simple setting: given real numbers $f, \psi \in \mathbb{R}$ and a parametric function $a(y) = 1 + y\psi$, where $y \in U := [-1, 1]$, we wish to find the function $U \ni y \mapsto u(y)$ such that

\[
a(y)\, u(y) = f, \qquad \text{for } y \in U .
\tag{15}
\]

Evidently, $u(y) = f / a(y)$, provided that $a(y) \ne 0$ for all $y \in U$, which is easily seen to be ensured by a smallness condition on $\psi$: if $|\psi| \le \kappa < 1$, then $a(y) \ge 1 - \kappa > 0$ for every $y \in U$, and (15) admits the unique solution $u(y) = f / (1 + y\psi)$, which depends analytically on the parameter $y \in U$.

Consider next the case where the coefficient $a(y)$ depends on a sequence $y = (y_1, y_2, \ldots) = (y_j)_{j \ge 1}$ of parameters $y_j$, for which we assume once more that $|y_j| \le 1$ for all $j \in \mathbb{N}$ or, symbolically, that $y \in U = [-1, 1]^{\mathbb{N}}$. Then $a(y) = 1 + \sum_{j \ge 1} y_j \psi_j$, and a minimal condition to render $a(y)$ well defined is that the infinite series converges, which is ensured by the summability condition $\psi = (\psi_j)_{j \ge 1} \in \ell^1(\mathbb{N})$. Note that this condition is always satisfied if there are only finitely many parameters $y_1, \ldots, y_J$ for some $J < \infty$, which corresponds to the case that $\psi_j = 0$ for all $j > J$. Once again, in order to solve (15) for the function $u(y)$ (which now depends on infinitely many variables $y_1, y_2, \ldots$), a smallness condition is required:

\[
\kappa := \|\psi\|_{\ell^1(\mathbb{N})} = \sum_{j \ge 1} |\psi_j| < 1 ,
\tag{16}
\]

which ensures $a(y) \ge 1 - \kappa > 0$, and for every $y \in U$, (18) admits a unique solution $u(\cdot, y) \in L^2(D)$, which is given by $u(\cdot, y) = (a(\cdot, y))^{-1} f$. The element $b_j = \|\psi_j\|_{L^\infty(D)}$ quantifies the sensitivity of the "input" $a(\cdot, y)$ with respect to coordinate $y_j$: there holds

\[
\sup_{y \in U} \| \partial_y^\nu u(\cdot, y) \|_{L^2(D)} \le \frac{|\nu|!\, b^\nu}{(1 - \kappa)^{|\nu| + 1}} \, \|f\|_{L^2(D)}, \qquad \text{where } b^\nu := \prod_{j \ge 1} b_j^{\nu_j} = b_1^{\nu_1} b_2^{\nu_2} \cdots, \quad \nu \in \mathcal{F} .
\tag{20}
\]

Here $\nu = (\nu_1, \nu_2, \ldots) \in \mathcal{F} \subset \mathbb{N}_0^{\mathbb{N}}$, the set of sequences of nonnegative integers which are "finitely supported," i.e., which have only finitely many nonzero terms $\nu_j$; due to $b_j^0 = 1$, the infinite product in (20) is meaningful for $\nu \in \mathcal{F}$.

Parametric Elliptic PDEs

Efficient methods are required for parametric, elliptic PDEs with (infinite-dimensional) parametric coefficients of the form (14), (17), such as the scalar model elliptic equation (21). SC methods, which require only the solution of instances of the deterministic problem, are also called nonintrusive. An alternative approach to SC is stochastic Galerkin (SG for short) discretizations, which are based on mean square projection (with respect to $\mathbb{P}$) of the parametric solution $u(x, y)$ of (21) onto finite spans of tensorized generalized polynomial chaos (gpc) expansions on $U$. Recent references for mathematical formulations and convergence analysis of SG methods are [6, 27, 34, 40, 44, 45, 68]. Efficient implementations, including a posteriori error estimates and multi-adaptive AFEM, are addressed in [32]. SG-based methods feature the significant advantage of Galerkin orthogonality in $L^2(\Omega; V)$ of the gpc approximation, which implies the perspective of adaptive discretization of random fields. Due to the infinite dimension of the parameter space $U$, these methods differ in essential respects from the more widely known adaptive FEM: an essential point is sparsity in the gpc expansion of the parametric solution $u(x, y)$. In [22, 23], a fundamental sparsity observation has been made for equations like (21): sparsity in the random inputs' parametric expansion implies the same sparsity in the gpc representation of the parametric solution $u(x, y)$.
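The geometric decay underlying such sparsity can already be seen in the single-parameter model (15): $u(y) = f/(1 + \psi y)$ is analytic on $U = [-1,1]$ when $|\psi| \le \kappa < 1$, and its Taylor coefficients decay like $\kappa^k$. The sketch below (with illustrative values $f = 1$, $\psi = 0.5$) truncates this expansion and observes sup-norm error $\sim \kappa^N$.

```python
import numpy as np

# Single-parameter model (15): a(y) u(y) = f with a(y) = 1 + psi*y, y in [-1,1].
f, psi = 1.0, 0.5                      # |psi| <= kappa = 0.5 < 1
u = lambda y: f / (1.0 + psi * y)

# Taylor coefficients of u about y = 0: u(y) = f * sum_k (-psi)^k y^k,
# a gpc-type expansion converging geometrically at rate kappa = |psi|.
def taylor_truncation(y, N):
    k = np.arange(N + 1)
    return f * np.sum((-psi) ** k[:, None] * y[None, :] ** k[:, None], axis=0)

y = np.linspace(-1.0, 1.0, 1001)
errs = [np.max(np.abs(u(y) - taylor_truncation(y, N))) for N in (5, 10, 20)]
print(errs)  # sup-norm error equals kappa^N, attained at y = -1
```

In the infinite-dimensional case, the derivative bounds (20) play the role of these coefficient estimates and drive the best $N$-term convergence rates discussed next.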


The results in [22, 23] are not limited to (21) but hold for rather large classes of elliptic (as well as parabolic) SPDEs (see [20, 24, 49, 55, 56]), implying, together with the results in [19, 45, 46], quasi-best $N$-term, dimension-independent convergence rates of SC and SG algorithms. Dimension-independent approximation rates for large, nonlinear systems of random initial value ODEs were proved in [49] and computationally investigated in [50]. For implementational and mathematical aspects of adaptive stochastic Galerkin FEM with computable, guaranteed upper error bounds and applications to engineering problems, we refer to [32, 33].
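A concrete one-dimensional instance of a Wiener-Hermite gpc expansion, whose truncation error decays super-algebraically, is $u(Y) = \exp(Y)$ for $Y \sim N(0,1)$: its chaos coefficients are known in closed form, $\exp(y) = e^{1/2}\sum_{k \ge 0} \mathrm{He}_k(y)/k!$ with the probabilists' Hermite polynomials $\mathrm{He}_k$. The sketch below verifies the convergence numerically; it is a scalar illustration, not the PDE setting of the text.

```python
import numpy as np
from math import factorial
from numpy.polynomial.hermite_e import hermeval, hermegauss

# Wiener-Hermite polynomial chaos of u(Y) = exp(Y), Y ~ N(0,1):
# exp(y) = e^{1/2} * sum_{k>=0} He_k(y)/k!, where He_k are the probabilists'
# Hermite polynomials, orthogonal w.r.t. the standard Gaussian measure.
def pc_truncation(y, K):
    coeffs = np.exp(0.5) / np.array([factorial(k) for k in range(K + 1)], float)
    return hermeval(y, coeffs)

# L^2 error w.r.t. the Gaussian measure, computed by Gauss-Hermite quadrature
# (hermegauss uses the weight exp(-x^2/2), hence the 1/sqrt(2*pi) rescaling).
nodes, weights = hermegauss(60)
weights = weights / np.sqrt(2.0 * np.pi)

def l2_error(K):
    diff = np.exp(nodes) - pc_truncation(nodes, K)
    return np.sqrt(np.sum(weights * diff ** 2))

errs = [l2_error(K) for K in (2, 4, 8)]
print(errs)  # rapid decay of the gpc truncation error
```

The same mechanism, with tensorized polynomials in countably many variables, underlies the dimension-independent $N$-term rates cited above.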

Further Results and New Directions

For further indications, in particular on the efficient algorithmic realization of collocation approaches for the parametric, deterministic equation, we refer to [38, 85]. Numerical solution of SPDEs based on sparse, infinite-dimensional, parametric representation of the random solutions also allows the efficient numerical treatment of Bayesian inverse problems in the non-Gaussian setting. We refer to [74, 79] and the references there. For the use of various classes of random elliptic PDEs in computational uncertainty quantification, we refer to [53]. The fully discretized, parametric SPDEs (21) can be viewed as high-dimensional, multilinear algebra problems; here, efficient discretizations which directly compress matrices arising in the solution process of SGFEM are currently emerging (we refer to [59, 76] and the references there for further details). For an SC approach to eigenvalue problems for (21) (and more general problems), we refer to [3].

References

1. Abdulle, A., Barth, A., Schwab, C.: Multilevel Monte Carlo methods for stochastic elliptic multiscale PDEs. SIAM J. Multiscale Model. Simul. 11(4), 1033–1070 (2013)
2. Adler, R.J.: The Geometry of Random Fields. Wiley Series in Probability and Mathematical Statistics. Wiley, Chichester (1981)
3. Andreev, R., Schwab, C.: Sparse Tensor Approximation of Parametric Eigenvalue Problems. Lecture Notes in Computational Science and Engineering, vol. 83, pp. 203–241. Springer, Berlin (2012)
4. Arnst, M., Ghanem, R., Soize, C.: Identification of Bayesian posteriors for coefficients of chaos expansions. J. Comput. Phys. 229(9), 3134–3154 (2010)
5. Babuška, I.: On randomised solutions of Laplace's equation. Časopis Pěst. Mat. 86, 269–276 (1961)
6. Babuška, I., Tempone, R., Zouraris, G.E.: Galerkin finite element approximations of stochastic elliptic partial differential equations. SIAM J. Numer. Anal. 42(2), 800–825 (2004)
7. Babuška, I., Nobile, F., Tempone, R.: Worst case scenario analysis for elliptic problems with uncertainty. Numer. Math. 101(2), 185–219 (2005)
8. Babuška, I., Nobile, F., Tempone, R.: A stochastic collocation method for elliptic partial differential equations with random input data. SIAM J. Numer. Anal. 45(3), 1005–1034 (2007)
9. Bacuta, C., Nistor, V., Zikatanov, L.: Improving the rate of convergence of high-order finite elements on polyhedra. I. A priori estimates. Numer. Funct. Anal. Optim. 26(6), 613–639 (2005)
10. Barth, A., Lang, A.: Simulation of stochastic partial differential equations using finite element methods. Stochastics 84(2–3), 217–231 (2012)
11. Barth, A., Lang, A., Schwab, C.: Multi-level Monte Carlo finite element method for parabolic stochastic partial differential equations. Technical report 2011/30, Seminar for Applied Mathematics, ETH Zürich (2011)
12. Barth, A., Schwab, C., Zollinger, N.: Multi-level Monte Carlo finite element method for elliptic PDEs with stochastic coefficients. Numer. Math. 119(1), 123–161 (2011)
13. Bieri, M.: A sparse composite collocation finite element method for elliptic SPDEs. SIAM J. Numer. Anal. 49(6), 2277–2301 (2011)
14. Bieri, M., Schwab, C.: Sparse high order FEM for elliptic sPDEs. Comput. Methods Appl. Mech. Eng. 198(37–40), 1149–1170 (2009)
15. Bogachev, V.I.: Gaussian Measures. Mathematical Surveys and Monographs, vol. 62. American Mathematical Society, Providence (1998)
16. Cameron, R.H., Martin, W.T.: The orthogonal development of non-linear functionals in series of Fourier-Hermite functionals. Ann. Math. (2) 48, 385–392 (1947)
17. Charrier, J.: Strong and weak error estimates for elliptic partial differential equations with random coefficients. SIAM J. Numer. Anal. 50(1), 216–246 (2012)
18. Chernov, A., Schwab, C.: First order k-th moment finite element analysis of nonlinear operator equations with stochastic data. Math. Comput. 82, 1859–1888 (2013)
19. Chkifa, A., Cohen, A., DeVore, R., Schwab, C.: Adaptive algorithms for sparse polynomial approximation of parametric and stochastic elliptic PDEs. M2AN Math. Model. Numer. Anal. 47(1), 253–280 (2013)
20. Chkifa, A., Cohen, A., Schwab, C.: High-dimensional adaptive sparse polynomial interpolation and applications to parametric PDEs. Found. Comput. Math. 14(4), 601–633 (2013)
21. Ciarlet, P.: The Finite Element Method for Elliptic Problems. Studies in Mathematics and Its Applications, vol. 4. North-Holland, Amsterdam/New York (1978)
22. Cohen, A., DeVore, R., Schwab, C.: Convergence rates of best n-term Galerkin approximations for a class of elliptic sPDEs. Found. Comput. Math. 10(6), 615–646 (2010)
23. Cohen, A., DeVore, R., Schwab, C.: Analytic regularity and polynomial approximation of parametric and stochastic elliptic PDE's. Anal. Appl. (Singap.) 9(1), 11–47 (2011)
24. Cohen, A., Chkifa, A., Schwab, C.: Breaking the curse of dimensionality in sparse polynomial approximation of parametric PDEs. J. Math. Pures Appl. (2014)
25. Da Prato, G., Zabczyk, J.: Stochastic Equations in Infinite Dimensions. Encyclopedia of Mathematics and Its Applications, vol. 44. Cambridge University Press, Cambridge (1992)
26. Davis, P.J.: Interpolation and Approximation. Dover Publications, New York (1975) (republication, with minor corrections, of the 1963 original)
27. Deb, M.K., Babuška, I.M., Oden, J.T.: Solution of stochastic partial differential equations using Galerkin finite element techniques. Comput. Methods Appl. Mech. Eng. 190(48), 6359–6372 (2001)
28. Dettinger, M., Wilson, J.L.: First order analysis of uncertainty in numerical models of groundwater flow part 1. Mathematical development. Water Resour. Res. 17(1), 149–161 (1981)
29. Dick, J., Kuo, F.Y., Le Gia, Q.T., Nuyens, D., Schwab, C.: Higher order QMC Petrov-Galerkin discretization for affine parametric operator equations with random field inputs. SIAM J. Numer. Anal. 52(6), 2676–2702 (2014)
30. Dick, J., Kuo, F.Y., Le Gia, Q.T., Schwab, C.: Multi-level higher order QMC Galerkin discretization for affine parametric operator equations. Technical report 2014-14, Seminar for Applied Mathematics, ETH Zürich (2014)
31. Dölz, J., Harbrecht, H., Schwab, C.: Covariance regularity and H-matrix approximation for rough random fields. Technical report 2014-19, Seminar for Applied Mathematics, ETH Zürich (2014)
32. Eigel, M., Gittelson, C., Schwab, C., Zander, E.: Adaptive stochastic Galerkin FEM. Comput. Methods Appl. Mech. Eng. 270, 247–269 (2014)
33. Eigel, M., Gittelson, C., Schwab, C., Zander, E.: A convergent adaptive stochastic Galerkin finite element method with quasi-optimal spatial meshes. Technical report 2014-01, Seminar for Applied Mathematics, ETH Zürich (2014)
34. Frauenfelder, P., Schwab, C., Todor, R.A.: Finite elements for elliptic problems with stochastic coefficients. Comput. Methods Appl. Mech. Eng. 194(2–5), 205–228 (2005)
35. Friz, P.K., Victoir, N.B.: Multidimensional Stochastic Processes as Rough Paths: Theory and Applications. Cambridge Studies in Advanced Mathematics, vol. 120. Cambridge University Press, Cambridge (2010)
36. Galvis, J., Sarkis, M.: Approximating infinity-dimensional stochastic Darcy's equations without uniform ellipticity. SIAM J. Numer. Anal. 47(5), 3624–3651 (2009)
37. Gantner, R.N., Schillings, C., Schwab, C.: Binned multilevel Monte Carlo for Bayesian inverse problems with large data. In: Proc. 22nd Int. Conf. on Domain Decomposition (2015, to appear)
38. Ghanem, R.G., Spanos, P.D.: Stochastic Finite Elements: A Spectral Approach. Springer, New York (1991)
39. Giles, M.B.: Multilevel Monte Carlo path simulation. Oper. Res. 56(3), 607–617 (2008)
40. Gittelson, C.: An adaptive stochastic Galerkin method for random elliptic operators. Math. Comput. 82(283), 1515–1541 (2013)
41. Gittelson, C., Könnö, J., Schwab, C., Stenberg, R.: The multi-level Monte Carlo finite element method for a stochastic Brinkman problem. Numer. Math. 125(2), 347–386 (2013)
42. Gittelson, C., Andreev, R., Schwab, C.: Optimality of adaptive Galerkin methods for random parabolic partial differential equations. J. Comput. Appl. Math. 263, 189–201 (2014)
43. Gittelson, C.J.: Stochastic Galerkin discretization of the log-normal isotropic diffusion problem. Math. Models Methods Appl. Sci. 20(2), 237–263 (2010)
44. Gittelson, C.J.: Adaptive Galerkin methods for parametric and stochastic operator equations. PhD thesis, ETH Zürich (2011)
45. Gittelson, C.J.: Adaptive stochastic Galerkin methods: beyond the elliptic case. Technical report 2011/12, Seminar for Applied Mathematics, ETH Zürich (2011)
46. Gittelson, C.J.: Stochastic Galerkin approximation of operator equations with infinite dimensional noise. Technical report 2011/10, Seminar for Applied Mathematics, ETH Zürich (2011)
47. Graham, I.G., Kuo, F.Y., Nuyens, D., Scheichl, R., Sloan, I.H.: Quasi-Monte Carlo methods for elliptic PDEs with random coefficients and applications. J. Comput. Phys. 230(10), 3668–3694 (2011)
48. Graham, I.G., Kuo, F.Y., Nichols, J.A., Scheichl, R., Schwab, C., Sloan, I.H.: Quasi-Monte Carlo finite element methods for elliptic PDEs with lognormal random coefficients. Numer. Math. 128(4) (2014)
49. Hansen, M., Schwab, C.: Sparse adaptive approximation of high dimensional parametric initial value problems. Vietnam J. Math. 41(2), 181–215 (2013)
50. Hansen, M., Schillings, C., Schwab, C.: Sparse approximation algorithms for high dimensional parametric initial value problems. In: Proceedings of the Fifth International Conference on High Performance Scientific Computing 2012, Hanoi (2014)
51. Harbrecht, H.: A finite element method for elliptic problems with stochastic input data. Appl. Numer. Math. 60(3), 227–244 (2010)
52. Harbrecht, H., Schneider, R., Schwab, C.: Sparse second moment analysis for elliptic problems in stochastic domains. Numer. Math. 109(3), 385–414 (2008)
53. Hlaváček, I., Chleboun, J., Babuška, I.: Uncertain Input Data Problems and the Worst Scenario Method. North-Holland Series in Applied Mathematics and Mechanics, vol. 46. Elsevier Science B.V., Amsterdam (2004)
54. Hoang, V.H., Schwab, C.: High-dimensional finite elements for elliptic problems with multiple scales. Multiscale Model. Simul. 3(1), 168–194 (2004/2005)
55. Hoang, V.H., Schwab, C.: Analytic regularity and polynomial approximation of stochastic, parametric elliptic multiscale PDEs. Anal. Appl. (Singap.) 11(1), 1350001 (2013)
56. Hoang, V.H., Schwab, C.: N-term Wiener chaos approximation rates for elliptic PDEs with lognormal Gaussian random inputs. Math. Models Methods Appl. Sci. 24(4), 797–826 (2014)
57. Holden, H., Øksendal, B., Ubøe, J., Zhang, T.: Stochastic Partial Differential Equations: A Modeling, White Noise Functional Approach. Probability and Its Applications. Birkhäuser, Boston (1996)
58. Karniadakis, G.E., Sherwin, S.J.: Spectral/hp Element Methods for CFD. Numerical Mathematics and Scientific Computation. Oxford University Press, New York (1999)
59. Khoromskij, B.N., Schwab, C.: Tensor-structured Galerkin approximation of parametric and stochastic elliptic PDEs. SIAM J. Sci. Comput. 33(1) (2011)
60. Kunoth, A., Schwab, C.: Analytic regularity and GPC approximation for control problems constrained by linear parametric elliptic and parabolic PDEs. SIAM J. Control Optim. 51(3), 2442–2471 (2013)
61. Kuo, F., Schwab, C., Sloan, I.H.: Quasi-Monte Carlo finite element methods for a class of elliptic partial differential equations with random coefficients. SIAM J. Numer. Anal. 62, 3351–3374 (2012)
62. Kuo, F.Y., Schwab, C., Sloan, I.H.: Quasi-Monte Carlo methods for high dimensional integration – the standard (weighted Hilbert space) setting and beyond. ANZIAM J. 53(1), 1–37 (2011)
63. Kuo, F.Y., Schwab, C., Sloan, I.H.: Quasi-Monte Carlo finite element methods for a class of elliptic partial differential equations with random coefficients. SIAM J. Numer. Anal. 50(6), 3351–3374 (2012)
64. Lototsky, S., Rozovskii, B.: Stochastic differential equations: a Wiener chaos approach. In: From Stochastic Calculus to Mathematical Finance, pp. 433–506. Springer, Berlin (2006)
65. Mishra, S., Schwab, C.: Sparse tensor multi-level Monte Carlo finite volume methods for hyperbolic conservation laws with random initial data. Math. Comput. 81(280), 1979–2018 (2012)
66. Naff, R.L., Haley, D.F., Sudicky, E.: High-resolution Monte Carlo simulation of flow and conservative transport in heterogeneous porous media 1. Methodology and flow results. Water Resour. Res. 34(4), 663–677 (1998)
67. Naff, R.L., Haley, D.F., Sudicky, E.: High-resolution Monte Carlo simulation of flow and conservative transport in heterogeneous porous media 2. Transport results. Water Resour. Res. 34(4), 679–697 (1998)
68. Nistor, V., Schwab, C.: High-order Galerkin approximations for parametric second-order elliptic partial differential equations. Math. Models Methods Appl. Sci. 23(9), 1729–1760 (2013)
69. Nobile, F., Tempone, R., Webster, C.G.: An anisotropic sparse grid stochastic collocation method for partial differential equations with random input data. SIAM J. Numer. Anal. 46(5), 2411–2442 (2008)
70. Nobile, F., Tempone, R., Webster, C.G.: A sparse grid stochastic collocation method for partial differential equations with random input data. SIAM J. Numer. Anal. 46(5), 2309–2345 (2008)
71. Pentenrieder, B., Schwab, C.: hp-FEM for second moments of elliptic PDEs with stochastic data part 1: analytic regularity. Numer. Methods Partial Differ. Equ. (2012)
72. Pentenrieder, B., Schwab, C.: hp-FEM for second moments of elliptic PDEs with stochastic data part 2: exponential convergence. Numer. Methods Partial Differ. Equ. (2012)
73. Protter, P.E.: Stochastic Integration and Differential Equations. Stochastic Modelling and Applied Probability, vol. 21, 2nd edn. Springer, Berlin (2005)
74. Schillings, C., Schwab, C.: Sparse, adaptive Smolyak quadratures for Bayesian inverse problems. Inverse Probl. 29(6), 1–28 (2013)
75. Schillings, C., Schwab, C.: Sparsity in Bayesian inversion of parametric operator equations. Inverse Probl. 30(6) (2014)
76. Schwab, C., Gittelson, C.J.: Sparse tensor discretizations of high-dimensional parametric and stochastic PDEs. Acta Numer. 20, 291–467 (2011)
77. Schwab, C., Schillings, C.: Sparse quadrature approach to Bayesian inverse problems. Oberwolfach Rep. 10(3), 2237–2237 (2013)
78. Schwab, C., Stevenson, R.: Adaptive wavelet algorithms for elliptic PDE's on product domains. Math. Comput. 77(261), 71–92 (2008)
79. Schwab, C., Stuart, A.M.: Sparse deterministic approximation of Bayesian inverse problems. Inverse Probl. 28(4), 045003 (2012)
80. Schwab, C., Todor, R.A.: Sparse finite elements for elliptic problems with stochastic loading. Numer. Math. 95(4), 707–734 (2003)
81. Schwab, C., Todor, R.A.: Sparse finite elements for stochastic elliptic problems—higher order moments. Computing 71(1), 43–63 (2003)
82. Schwab, C., Todor, R.A.: Karhunen-Loève approximation of random fields by generalized fast multipole methods. J. Comput. Phys. 217(1), 100–122 (2006)
83. von Petersdorff, T., Schwab, C.: Sparse finite element methods for operator equations with stochastic data. Appl. Math. 51(2), 145–180 (2006)
84. Wiener, N.: The homogeneous chaos. Am. J. Math. 60(4), 897–936 (1938)
85. Xiu, D.: Fast numerical methods for stochastic computations: a review. Commun. Comput. Phys. 5(2–4), 242–272 (2009)
86. Xiu, D., Karniadakis, G.E.: The Wiener-Askey polynomial chaos for stochastic differential equations. SIAM J. Sci. Comput. 24(2), 619–644 (2002)

Metropolis Algorithms


Martin A. Tanner
Department of Statistics, Northwestern University, Evanston, IL, USA

Mathematics Subject Classification

62F15; 65C40

Synonyms

Markov chain Monte Carlo (MCMC)

Short Definition

A Metropolis algorithm is an MCMC computational method for simulating from a probability distribution.

Description

The origin of the Metropolis algorithm can be traced to the early 1950s, when physicists were faced with the need to numerically study the properties of many-particle systems. The state of the system is represented by a vector $x = (x_1, x_2, \ldots, x_n)$, where $x_i$ is the coordinate of the $i$th particle in the system, and the goal is to study properties such as pressure and kinetic energy, which can be obtained from computation of the averaged values of suitably defined functions of the state vector. The averaging is weighted with respect to the canonical weight $\exp(-E(x)/kT)$, where the constants $k$ and $T$ denote the Boltzmann constant and the temperature, respectively. The physics of the system is encoded in the form of the energy function. For example, in a simple liquid model, one has the energy $E(x) = (1/2) \sum_{i \ne j} V(|x_i - x_j|)$, where $V(\cdot)$ is a potential function giving the dependence of pair-wise interaction energy on the distance between two particles. Metropolis et al. [4] introduced the first Markov chain Monte Carlo method in this context by making sequential moves of the state vector, changing one particle at a time. In each move, a random change of a particle is proposed, say, by changing to a position chosen within a fixed distance from its current position, and the proposed change is either accepted or rejected according to a randomized decision that depends on how much the energy of the system is changed by such a move. Metropolis et al. justified the method via the concepts of ergodicity and detailed balance as in kinetic theory. Although they did not explicitly mention "Markov chain," it is easy to translate their formulation to the terminology of modern Markov chain theory. In subsequent development, this method was applied to a variety of physical systems such as magnetic spins, polymers, molecular fluids, and various condensed matter systems (reviewed in [1]). All these applications share the characteristics that $n$ is large and the $n$ components are homogeneous in the sense that each takes value in the same space (say, 6-dimensional phase space, or up/down spin space, etc.) and interacts in an identical manner with other components according to the same physical law as specified by the energy function. An important generalization of the Metropolis algorithm, due to [3], is given as follows. Starting with $\theta^{(0)}$ (of dimension $d$), iterate for $t = 1, 2, \ldots$:




1. Draw θ from a proposal distribution q(·|θ^(t-1)).
2. Compute

   α(θ | θ^(t-1)) = min{ 1, [π(θ) q(θ^(t-1) | θ)] / [π(θ^(t-1)) q(θ | θ^(t-1))] }.   (1)

3. With probability α(θ | θ^(t-1)), set θ^(t) = θ; otherwise set θ^(t) = θ^(t-1).

It can be shown that π(·) is the stationary distribution of the Markov chain (θ^(0), θ^(1), ...). Moreover, if the proposal distribution q(·|·) is symmetric, so that q(θ|φ) = q(φ|θ), then the algorithm reduces to the classic Metropolis algorithm. Note that neither algorithm requires knowledge of the normalizing constant for π. Tierney [6] discusses convergence theory for the algorithm, as well as choices for q(·|·). See also [5].

References
1. Binder, K.: Monte Carlo Methods in Statistical Physics. Springer, New York (1978)
2. Hammersley, J.M., Handscomb, D.C.: Monte Carlo Methods, 2nd edn. Chapman and Hall, London (1964)
3. Hastings, W.K.: Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57, 97–109 (1970)
4. Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., Teller, E.: Equation of state calculations by fast computing machines. J. Chem. Phys. 21, 1087–1091 (1953)
5. Tanner, M.A., Wong, W.H.: From EM to data augmentation: the emergence of MCMC Bayesian computation in the 1980s. Stat. Sci. 25, 506–516 (2010)
6. Tierney, L.: Markov chains for exploring posterior distributions. Ann. Stat. 22, 1701–1762 (1994)
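As a concrete illustration, the iteration in steps 1-3 and the acceptance probability (1) can be sketched in a few lines of Python. The function names and the Gaussian random-walk target below are our illustrative choices, not part of the entry itself; with a symmetric proposal the q-ratio cancels and the sketch reduces to the classic Metropolis algorithm.

```python
import numpy as np

def metropolis_hastings(log_pi, proposal_sample, proposal_logpdf, theta0, n_iter, rng):
    """Steps 1-3 above: log_pi is the log of the (possibly unnormalized) target,
    proposal_sample(theta, rng) draws from q(.|theta), and proposal_logpdf(a, b)
    returns log q(a|b); only ratios enter the acceptance probability (1)."""
    theta = float(theta0)
    chain = [theta]
    for _ in range(n_iter):
        prop = proposal_sample(theta, rng)
        # Acceptance probability (1); the normalizing constant of pi cancels.
        log_alpha = (log_pi(prop) + proposal_logpdf(theta, prop)
                     - log_pi(theta) - proposal_logpdf(prop, theta))
        if np.log(rng.uniform()) < min(0.0, log_alpha):
            theta = prop                  # accept the proposal
        chain.append(theta)               # otherwise keep theta^(t-1)
    return np.array(chain)

# Symmetric Gaussian random walk targeting a standard normal: the q-ratio
# cancels, so this is the classic Metropolis algorithm.
rng = np.random.default_rng(0)
chain = metropolis_hastings(
    log_pi=lambda t: -0.5 * t * t,
    proposal_sample=lambda t, r: t + r.normal(),
    proposal_logpdf=lambda a, b: 0.0,     # symmetric: constant up to a common term
    theta0=0.0, n_iter=20000, rng=rng)
burned = chain[2000:]
print(burned.mean(), burned.std())        # near 0 and 1, respectively
```

Working on the log scale, as done here, avoids overflow when π is evaluated far in the tails.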

Microlocal Analysis Methods

Plamen Stefanov
Department of Mathematics, Purdue University, West Lafayette, IN, USA

One of the fundamental ideas of classical analysis is a thorough study of functions near a point, i.e., locally. Microlocal analysis, loosely speaking, is analysis near points and directions, i.e., in the "phase space." We review here briefly the theory of pseudodifferential operators and geometrical optics.

Wave Front Sets

The phase space in R^n is the cotangent bundle T*R^n, which can be identified with R^n × R^n. Given a distribution f ∈ D′(R^n), a fundamental object to study is the wave front set WF(f) ⊂ T*R^n \ 0, viewed as the singularities of f, that we define below. Here, 0 stands for the zero section (x, 0); in other words, we do not allow ξ = 0.

Definition

The basic idea goes back to the properties of the Fourier transform. If f is an integrable compactly supported function, one can tell whether f is smooth by looking at the behavior of f̂(ξ) = ∫ e^{-ix·ξ} f(x) dx (which is smooth, even analytic) as |ξ| → ∞. It is known that f is smooth if and only if for any N, |f̂(ξ)| ≤ C_N |ξ|^{-N} for some C_N. If we localize this requirement to a conic neighborhood V of some ξ_0 ≠ 0 (V is conic if ξ ∈ V ⇒ tξ ∈ V, ∀t > 0), then we can think of this as smoothness in the cone V. To localize in the base x variable, however, we first have to cut smoothly near a fixed x_0.

We say that (x_0, ξ_0) ∈ R^n × (R^n \ 0) is not in the wave front set WF(f) of f ∈ D′(R^n) if there exists χ ∈ C_0^∞(R^n) with χ(x_0) ≠ 0 so that for any N there exists C_N so that

   |(χf)^(ξ)| ≤ C_N |ξ|^{-N}

for ξ in some conic neighborhood of ξ_0. This definition is independent of the choice of χ. If f ∈ D′(Ω) with some open Ω ⊂ R^n, to define WF(f) ⊂ Ω × (R^n \ 0), we need to choose χ ∈ C_0^∞(Ω). Clearly, the wave front set is a closed conic subset of R^n × (R^n \ 0). Next, multiplication by a smooth function cannot enlarge the wave front set. The transformation law under coordinate changes is that of covectors, making it natural to think of WF(f) as a subset of T*R^n \ 0, or T*Ω \ 0, respectively.

The wave front set WF(f) generalizes the notion singsupp(f), the complement of the largest open set where f is smooth. The points (x, ξ) in WF(f) are referred to as singularities of f. Its projection onto the base is singsupp(f), i.e.,

   singsupp(f) = {x : ∃ξ, (x, ξ) ∈ WF(f)}.
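The Fourier-decay characterization above is easy to observe numerically: the FFT of a compactly supported C^∞ bump decays rapidly, while the FFT of a function with a jump decays only slowly. The grid size and the two test functions below are our illustrative choices.

```python
import numpy as np

n = 4096
x = np.linspace(-np.pi, np.pi, n, endpoint=False)

def bump(x):
    # C-infinity function supported in |x| < 1: exp(-1/(1 - x^2)).
    out = np.zeros_like(x)
    inside = np.abs(x) < 1
    out[inside] = np.exp(-1.0 / (1.0 - x[inside] ** 2))
    return out

step = (np.abs(x) < 1).astype(float)     # indicator function: jumps at |x| = 1

tail = slice(n // 4, n // 2)             # high-frequency part of the spectrum
fb = np.abs(np.fft.fft(bump(x)))
fs = np.abs(np.fft.fft(step))
print(fb[tail].max(), fs[tail].max())    # rapid decay vs. slow (roughly 1/|xi|) decay
```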


Example 1
(a) WF(δ) = {(0, ξ) : ξ ≠ 0}. In other words, the Dirac delta function is singular at x = 0 and in all directions there.
(b) Let x = (x′, x″), where x′ = (x_1, ..., x_k) and x″ = (x_{k+1}, ..., x_n) with some k. Then WF(δ(x′)) = {(0, x″, ξ′, 0) : ξ′ ≠ 0}, where δ(x′) is the Dirac delta function on the plane x′ = 0, defined by ⟨δ(x′), φ⟩ = ∫ φ(0, x″) dx″. In other words, WF(δ(x′)) consists of all (co)vectors ξ ≠ 0 with a base point on that plane, perpendicular to it.
(c) Let f be a piecewise smooth function that has a nonzero jump across some smooth surface S. Then WF(f) consists of all nonzero (co)vectors at points of S, normal to it. This follows from a change of variables that flattens S locally and reduces the problem to that for the Heaviside function multiplied by a smooth function.
(d) Let f = pv(1/x) - iπδ(x) in R, where pv(1/x) is the regularized 1/x in the principal value sense. Then WF(f) = {(0, ξ) : ξ > 0}.


In example (d) we see a distribution with a wave front set that is not even in the ξ variable, i.e., not symmetric under the change ξ ↦ -ξ. In fact, wave front sets do not have a special structure except for the requirement to be closed conic sets; given any such set, there is a distribution whose wave front set is exactly that set. On the other hand, if f is real valued, then |f̂| is an even function; therefore WF(f) is even in ξ as well.

Two distributions cannot be multiplied in general. However, if WF(f) and WF′(g) do not intersect, there is a "natural way" to define a product. Here, WF′(g) = {(x, -ξ) : (x, ξ) ∈ WF(g)}.

Pseudodifferential Operators

Definition
We first define the symbol class S^m(Ω), m ∈ R, as the set of all smooth functions p(x, ξ), (x, ξ) ∈ Ω × R^n, called symbols, satisfying the following symbol estimates: for any compact set K ⊂ Ω and any multi-indices α, β, there is a constant C_{K,α,β} > 0 so that

   |∂_ξ^α ∂_x^β p(x, ξ)| ≤ C_{K,α,β} (1 + |ξ|)^{m-|α|}, ∀(x, ξ) ∈ K × R^n.   (1)

More generally, one can define the class S^m_{ρ,δ}(Ω) with 0 ≤ ρ, δ ≤ 1 by replacing m - |α| there by m - ρ|α| + δ|β|. Then S^m(Ω) = S^m_{1,0}(Ω). Often, we omit Ω and simply write S^m. There are other classes in the literature; for example, Ω = R^n, and (1) is required to hold for all x ∈ R^n. The estimates (1) do not provide any control of p when x approaches boundary points of Ω, or ∞.

Given p ∈ S^m(Ω), we define the pseudodifferential operator (ΨDO) with symbol p, denoted by p(x, D), by

   p(x, D)f = (2π)^{-n} ∫ e^{ix·ξ} p(x, ξ) f̂(ξ) dξ,  f ∈ C_0^∞(Ω).   (2)

The definition is inspired by the following. If P = Σ_{|α|≤m} a_α(x) D^α is a differential operator, where D = -i∂, then using the Fourier inversion formula we can write P as in (2) with a symbol p = Σ_{|α|≤m} a_α(x) ξ^α that is a polynomial in ξ with x-dependent coefficients. The symbol class S^m allows for more general functions. The class of the pseudodifferential operators with symbols in S^m is usually denoted by Ψ^m. The operator P is called a ΨDO if it belongs to Ψ^m for some m. By definition, S^{-∞} = ∩_m S^m, and Ψ^{-∞} = ∩_m Ψ^m.

An important subclass is the set of the classical symbols that have an asymptotic expansion of the form

   p(x, ξ) ∼ Σ_{j=0}^∞ p_{m-j}(x, ξ),   (3)

where m ∈ R, and p_{m-j} are smooth and positively homogeneous in ξ of order m - j for |ξ| > 1, i.e., p_{m-j}(x, λξ) = λ^{m-j} p_{m-j}(x, ξ) for |ξ| > 1, λ > 1; and the sign ∼ means that

   p(x, ξ) - Σ_{j=0}^N p_{m-j}(x, ξ) ∈ S^{m-N-1}, ∀N ≥ 0.   (4)

Any ΨDO p(x, D) is continuous from C_0^∞(Ω) to C^∞(Ω) and can be extended by duality as a continuous map from E′(Ω) to D′(Ω).
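The quantization (2) can be discretized with the FFT on a periodic grid, which gives a quick way to experiment with ΨDOs in one dimension. The sketch below ignores the compact-support and duality issues discussed in the text, and the helper name apply_psido is ours; it simply evaluates the discrete analog of (2) at every grid point.

```python
import numpy as np

def apply_psido(symbol, f, L=2 * np.pi):
    # Discrete analog of (2): for each grid point x_j, sum
    # e^{i x_j xi} p(x_j, xi) fhat(xi) over the discrete frequencies xi.
    n = f.size
    x = np.arange(n) * (L / n)
    xi = 2 * np.pi * np.fft.fftfreq(n, d=L / n)
    fhat = np.fft.fft(f)
    phases = np.exp(1j * np.outer(x, xi))          # e^{i x.xi} for every (x_j, xi_k)
    return (phases * symbol(x[:, None], xi[None, :]) * fhat).sum(axis=1).real / n

n = 128
x = np.arange(n) * (2 * np.pi / n)
f = np.sin(x)

# p(x, xi) = i*xi is the symbol of d/dx, and p(x, xi) = xi^2 that of -d^2/dx^2,
# in agreement with the differential-operator example following (2).
df = apply_psido(lambda X, XI: 1j * XI, f)
d2f = apply_psido(lambda X, XI: XI ** 2, f)
print(np.abs(df - np.cos(x)).max(), np.abs(d2f - np.sin(x)).max())
```

An x-dependent symbol such as c(x)·iξ works just as well, which is what makes this quantization more flexible than a Fourier multiplier.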

Principal Symbol
The principal symbol of a ΨDO in Ψ^m(Ω) given by (2) is the equivalence class S^m(Ω)/S^{m-1}(Ω), and any representative of it is called a principal symbol as well. In the case of classical ΨDOs, the convention is to choose the principal symbol to be the first term p_m, which in particular is positively homogeneous in ξ.

Smoothing Operators
Those are operators that map E′(Ω) continuously into C^∞(Ω). They coincide with the operators with smooth Schwartz kernels in Ω × Ω. They can always be written as ΨDOs with symbols in S^{-∞}, and vice versa: all operators in Ψ^{-∞} are smoothing. Smoothing operators are viewed in this calculus as negligible, and ΨDOs are typically defined modulo smoothing operators, i.e., A ≡ B if and only if A - B is smoothing. Smoothing operators are not "small."

The Pseudolocal Property
For any ΨDO P and any f ∈ E′(Ω),

   singsupp(Pf) ⊂ singsupp f.   (5)

In other words, a ΨDO cannot increase the singular support. This property is preserved if we replace singsupp by WF; see (13).

Symbols Defined by an Asymptotic Expansion
In many applications, a symbol is defined by consecutively constructing symbols p_j ∈ S^{m_j}, j = 0, 1, ..., where m_j ↘ -∞, and setting

   p(x, ξ) ∼ Σ_j p_j(x, ξ).   (6)

The series on the right may not converge, but we can make it convergent by using our freedom to modify each p_j for ξ in expanding compact sets without changing the large-ξ behavior of each term. This extends the Borel idea of constructing a smooth function with prescribed derivatives at a fixed point. The asymptotic (6) is then understood in a sense similar to (4). This shows that there exists a symbol p ∈ S^{m_0} satisfying (6). That symbol is not unique, but the difference of two such symbols is always in S^{-∞}.

Amplitudes
A seemingly larger class of ΨDOs is defined by

   Af = (2π)^{-n} ∬ e^{i(x-y)·ξ} a(x, y, ξ) f(y) dy dξ,  f ∈ C_0^∞(Ω),   (7)

where the amplitude a satisfies

   |∂_ξ^α ∂_x^β ∂_y^γ a(x, y, ξ)| ≤ C_{K,α,β,γ} (1 + |ξ|)^{m-|α|}, ∀(x, y, ξ) ∈ K × R^n,   (8)

for any compact set K ⊂ Ω × Ω and for any α, β, γ. In fact, any such A is a ΨDO with a symbol p(x, ξ) (independent of y) with the formal asymptotic expansion

   p(x, ξ) ∼ Σ_{α≥0} (1/α!) ∂_ξ^α D_y^α a(x, y, ξ)|_{y=x}.

In particular, the principal symbol of that operator can be taken to be a(x, x, ξ).

Transpose and Adjoint Operators to a ΨDO
The mapping properties of any ΨDO A indicate that it has a well-defined transpose A′ and a complex adjoint A* with the same mapping properties. They satisfy

   ⟨Au, v⟩ = ⟨u, A′v⟩,  ⟨Au, v̄⟩ = ⟨u, A*v⟩,  ∀u, v ∈ C_0^∞,

where ⟨·, ·⟩ is the pairing in the distribution sense, in this particular case just the integral of uv. In particular, A*u = conj(A′ū), and if A maps L² to L² in a bounded way, then A* is the adjoint of A in the L² sense.

The transpose and the adjoint are ΨDOs in the same class with amplitudes a(y, x, -ξ) and ā(y, x, ξ), respectively, and symbols

   Σ_{α≥0} (1/α!) (-1)^{|α|} (∂_ξ^α D_x^α p)(x, -ξ)  and  Σ_{α≥0} (1/α!) ∂_ξ^α D_x^α p̄(x, ξ),

if a(x, y, ξ) and p(x, ξ) are the amplitude and/or the symbol of that ΨDO. In particular, the principal symbols are p_0(x, -ξ) and p̄_0(x, ξ), respectively, where p_0 is (any representative of) the principal symbol.


Composition of ΨDOs and ΨDOs with Properly Supported Kernels
Given two ΨDOs A and B, their composition may not be defined even if they are smoothing ones, because each one maps C_0^∞ to C^∞ but may not preserve the compactness of the support. For example, if A(x, y) and B(x, y) are their Schwartz kernels, the candidate for the kernel of AB given by ∫ A(x, z) B(z, y) dz may be a divergent integral. On the other hand, for any ΨDO A, one can find a smoothing correction R so that A + R has a properly supported kernel, i.e., the kernel of A + R has a compact intersection with K × Ω and Ω × K for any compact K ⊂ Ω. The proof of this uses the fact that the Schwartz kernel of a ΨDO is smooth away from the diagonal {x = y}, and one can always cut there in a smooth way to make the kernel properly supported at the price of a smoothing error. ΨDOs with properly supported kernels preserve C_0^∞(Ω), and also E′(Ω), and therefore can be composed in either of those spaces. Moreover, they map C^∞(Ω) to itself and can be extended from D′(Ω) to itself. The property of the kernel to be properly supported is often assumed, and it is justified by considering each ΨDO as an equivalence class.

If A ∈ Ψ^m(Ω) and B ∈ Ψ^k(Ω) are properly supported ΨDOs with symbols a and b, respectively, then AB is again a ΨDO in Ψ^{m+k}(Ω), and its symbol is given by

   Σ_{α≥0} (1/α!) ∂_ξ^α a(x, ξ) D_x^α b(x, ξ).

In particular, the principal symbol can be taken to be ab.

Change of Variables and ΨDOs on Manifolds
Let Ω̃ be another domain, and let φ : Ω → Ω̃ be a diffeomorphism. For any P ∈ Ψ^m(Ω), P̃f := (P(f ∘ φ)) ∘ φ^{-1} maps C_0^∞(Ω̃) into C^∞(Ω̃). It is a ΨDO in Ψ^m(Ω̃) with principal symbol

   p(φ^{-1}(y), (dφ)′ η),   (9)

where p is the symbol of P, dφ is the Jacobi matrix {∂φ_i/∂x_j} evaluated at x = φ^{-1}(y), and (dφ)′ stands for the transpose of that matrix. We can also write (dφ)′ = ((dφ^{-1})^{-1})′. An asymptotic expansion for the whole symbol can be written down as well.

Relation (9) shows that the transformation law under coordinate changes is that of a covector. Therefore, the principal symbol is a correctly defined function on the cotangent bundle T*Ω. The full symbol is not invariantly defined there in general.

Let M be a smooth manifold and A : C_0^∞(M) → C^∞(M) be a linear operator. We say that A ∈ Ψ^m(M) if its kernel is smooth away from the diagonal in M × M and if in any coordinate chart (U, φ), where φ : U → Ω ⊂ R^n, we have (A(u ∘ φ)) ∘ φ^{-1} ∈ Ψ^m(Ω). As before, the principal symbol of A, defined in any local chart, is an invariantly defined function on T*M.

Mapping Properties in Sobolev Spaces
In R^n, the Sobolev spaces H^s(R^n), s ∈ R, are defined as the completion of S′(R^n) in the norm

   ‖f‖²_{H^s(R^n)} = ∫ (1 + |ξ|²)^s |f̂(ξ)|² dξ.

When s is a nonnegative integer, an equivalent norm is the square root of Σ_{|α|≤s} ∫ |∂^α f(x)|² dx. For such s, and a bounded domain Ω, one defines H^s(Ω) as the completion of C^∞(Ω̄) using the latter norm with the integral taken in Ω. Sobolev spaces in Ω for other real values of s are defined by different means, including duality or complex interpolation. Sobolev spaces are also Hilbert spaces.

Any P ∈ Ψ^m(Ω) is a continuous map from H^s_comp(Ω) to H^{s-m}_loc(Ω). If the symbol estimates (1) are satisfied in the whole R^n × R^n, then P : H^s(R^n) → H^{s-m}(R^n).

Elliptic ΨDOs and Their Parametrices
The operator P ∈ Ψ^m(Ω) with symbol p is called elliptic of order m if for any compact K ⊂ Ω there exist constants C > 0 and R > 0 so that

   C|ξ|^m ≤ |p(x, ξ)| for x ∈ K and |ξ| > R.   (10)

Then the symbol p is also called elliptic of order m. It is enough to require the principal symbol only to be elliptic (of order m). For classical ΨDOs, see (3); the requirement can be written as p_m(x, ξ) ≠ 0 for ξ ≠ 0. A fundamental property of elliptic operators is that they have parametrices. In other words, given an elliptic ΨDO P of order m, there exists Q ∈ Ψ^{-m}(Ω) so that




   QP - Id ∈ Ψ^{-∞},  PQ - Id ∈ Ψ^{-∞}.   (11)

The proof of this is to construct a left parametrix first by choosing a symbol q_0 = 1/p, cut off near the possible zeros of p, which form a compact set any time x is restricted to a compact set as well. The corresponding ΨDO Q_0 will then satisfy Q_0 P = Id + R, R ∈ Ψ^{-1}. Then we take a ΨDO E with asymptotic expansion E ∼ Id - R + R² - R³ + ..., which would be the formal Neumann series expansion of (Id + R)^{-1}, if the latter existed. Then EQ_0 is a left parametrix that is also a right parametrix.

An important consequence is the following elliptic regularity statement. If P is elliptic (and properly supported), then

   singsupp(Pf) = singsupp(f),  ∀f ∈ D′(Ω),   (12)

compared to (5). In particular, Pf ∈ C^∞ implies f ∈ C^∞.

It is important to emphasize that elliptic ΨDOs are not necessarily invertible or even injective. For example, the Laplace–Beltrami operator -Δ_{S^{n-1}} on the sphere is elliptic, and then so is -Δ_{S^{n-1}} - z for every number z. The latter, however, is not injective for z an eigenvalue. On the other hand, on a compact manifold M, an elliptic P ∈ Ψ^m(M) is "invertible" up to a compact error, because then QP - Id = K_1 and PQ - Id = K_2, see (11), with K_{1,2} compact operators. As a consequence, such an operator is Fredholm and in particular has a finite dimensional kernel and cokernel.

ΨDOs and Wave Front Sets

The microlocal version of the pseudolocal property is given by the following:

   WF(Pf) ⊂ WF(f)   (13)

for any (properly supported) ΨDO P and f ∈ D′(Ω). In other words, a ΨDO cannot increase the wave front set. If P is elliptic for some m, it follows from the existence of a parametrix that there is equality above, i.e., WF(Pf) = WF(f), which is a refinement of (12).

We say that the ΨDO P is of order -∞ in the open conic set U ⊂ T*Ω \ 0 if for any closed conic set K ⊂ U with a compact projection on the base "x-space," (1) is fulfilled for any m. The essential support ES(P), sometimes also called the microsupport of P, is defined as the smallest closed conic set on the complement of which the symbol p is of order -∞. Then

   WF(Pf) ⊂ WF(f) ∩ ES(P).

Let P have a homogeneous principal symbol p_m. The characteristic set Char P is defined by

   Char P = {(x, ξ) ∈ T*Ω \ 0 : p_m(x, ξ) = 0}.

Char P can be defined also for general ΨDOs that may not have homogeneous principal symbols. For any ΨDO P, we have

   WF(f) ⊂ WF(Pf) ∪ Char P,  ∀f ∈ E′(Ω).   (14)

P is called microlocally elliptic in the open conic set U if (10) is satisfied in all compact subsets, similarly to the definition of ES(P) above. If it has a homogeneous principal symbol p_m, ellipticity is equivalent to p_m ≠ 0 in U. If P is elliptic in U, then Pf and f have the same wave front set restricted to U, as follows from (14) and (13).

The Hamilton Flow and Propagation of Singularities

Let P ∈ Ψ^m(M) be properly supported, where M is a smooth manifold, and suppose that P has a real homogeneous principal symbol p_m. The Hamiltonian vector field of p_m on T*M \ 0 is defined by

   H_{p_m} = Σ_{j=1}^n ( (∂p_m/∂ξ_j) ∂/∂x_j - (∂p_m/∂x_j) ∂/∂ξ_j ).

The integral curves of H_{p_m} are called bicharacteristics of P. Clearly, H_{p_m} p_m = 0; thus p_m is constant along each bicharacteristic. The bicharacteristics along which p_m = 0 are called zero bicharacteristics.

Hörmander's theorem about propagation of singularities is one of the fundamental results in the theory. It states that if P is an operator as above and Pu = f with u ∈ D′(M), then

   WF(u) \ WF(f) ⊂ Char P

and is invariant under the flow of H_{p_m}.



An important special case is the wave operator P = ∂_t² - Δ_g, where Δ_g is the Laplace–Beltrami operator associated with a Riemannian metric g. We may add lower-order terms without changing the bicharacteristics. Let (τ, ξ) be the dual variables to (t, x). The principal symbol is p_2 = -τ² + |ξ|²_g, where |ξ|²_g := Σ g^{ij}(x) ξ_i ξ_j and (g^{ij}) = (g_{ij})^{-1}. The bicharacteristic equations then are

   τ̇ = 0,  ṫ = -2τ,  ẋ^j = 2 Σ_i g^{ij} ξ_i,  ξ̇_j = -∂_{x_j} Σ_{i,k} g^{ik}(x) ξ_i ξ_k,

and they are null ones if τ² = |ξ|²_g. Here, ẋ = dx/ds, etc. The last two equations are the Hamiltonian curves of H̃ := Σ g^{ij}(x) ξ_i ξ_j, and they are known to coincide with the geodesics (γ, γ̇) on TM when identifying vectors and covectors by the metric. They lie on the energy surface H̃ = const. The first two equations imply that τ is a constant, positive or negative, and up to rescaling one can choose the parameter along the geodesics to be t. That rescaling forces the speed along the geodesic to be 1. The null condition τ² = |ξ|²_g defines two smooth surfaces away from (τ, ξ) = (0, 0): τ = ±|ξ|_g. This corresponds to geodesics starting from x in direction either ξ or -ξ.

To summarize, for the homogeneous equation Pu = 0, we get that each singularity (x, ξ) of the initial conditions at t = 0 starts to propagate from x in direction either ξ or -ξ, or both (depending on the initial conditions), along the unit speed geodesic. In fact, we get this first for the singularities in T*(R_t × R^n_x), but since they lie in Char P, one can see that they project to T*R^n_x as singularities again.

Geometrical Optics

Geometrical optics describes asymptotically the solutions of hyperbolic equations at large frequencies. It also provides a parametrix (a solution up to smooth terms) of the initial value problem for hyperbolic equations. The resulting operators are not ΨDOs anymore; they are actually examples of Fourier integral operators. Geometrical optics also studies the large frequency behavior of solutions that reflect from a smooth surface (obstacle scattering) including diffraction, reflect from an edge or a corner, and reflect and refract from a surface where the speed jumps (transmission problems).

As an example, consider the acoustic equation

   (∂_t² - c²(x)Δ)u = 0,  (t, x) ∈ R × R^n,   (15)

with initial conditions u(0, x) = f_1(x) and u_t(0, x) = f_2(x). It is enough to assume first that f_1 and f_2 are in C_0^∞ and extend the resulting solution operator to larger spaces later. We are looking for a solution of the form

   u(t, x) = (2π)^{-n} Σ_{σ=±} ∫ e^{iφ_σ(t,x,ξ)} ( a_{1,σ}(x, ξ, t) f̂_1(ξ) + |ξ|^{-1} a_{2,σ}(x, ξ, t) f̂_2(ξ) ) dξ,   (16)

modulo terms involving smoothing operators of f_1 and f_2. The reason to expect two terms is already clear from the propagation of singularities theorem, and it is also justified by the eikonal equation below. Here the phase functions φ_± are positively homogeneous of order 1 in ξ. Next, we seek the amplitudes in the form

   a_{j,σ} ∼ Σ_{k=0}^∞ a_{j,σ}^{(k)},  σ = ±,  j = 1, 2,   (17)

where a_{j,σ}^{(k)} is homogeneous in ξ of degree -k for large |ξ|. To construct such a solution, we plug (16) into (15) and try to kill all terms in the expansion in homogeneous (in ξ) terms. Equating the terms of order 2 yields the eikonal equation

   (∂_t φ)² - c²(x)|∇_x φ|² = 0.   (18)

Write f_j = (2π)^{-n} ∫ e^{ix·ξ} f̂_j(ξ) dξ, j = 1, 2, to get the following initial conditions for φ_±:

   φ_± |_{t=0} = x · ξ.   (19)

The eikonal equation can be solved by the method of characteristics. First, we determine ∂_t φ and ∇_x φ for t = 0. We get ∂_t φ_± |_{t=0} = ∓c(x)|ξ| and ∇_x φ|_{t=0} = ξ. This implies the existence of two solutions φ_±. If c = 1, we easily get φ_± = ∓|ξ|t + x·ξ. Let, for any (z, ξ), γ_{z,ξ}(s) be the unit speed geodesic through (z, ξ). Then φ_+ is constant along the curve (t, γ_{z,ξ}(t)), which implies that φ_+ = z(x, ξ)·ξ in any domain in which (t, z) can be chosen to be coordinates. Similarly, φ_- is constant along the curve (t, γ_{z,-ξ}(t)). In general, we cannot solve the eikonal equation globally, for all (t, x). Two geodesics γ_{z,ξ} and γ_{w,ζ} may intersect, for example, giving a nonunique value for φ_±. We always have a solution, however, in a neighborhood of t = 0.
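The geodesic/bicharacteristic picture behind the eikonal equation can be explored numerically by integrating the Hamiltonian system ẋ = ∂H/∂ξ, ξ̇ = -∂H/∂x for H(x, ξ) = c(x)|ξ|, the spatial part of the null bicharacteristics of the acoustic operator (15). The RK4 sketch below is illustrative only; the function names, step sizes, and the constant-speed test are our choices.

```python
import numpy as np

def ray_trace(c, grad_c, x0, xi0, dt, steps):
    # Integrate xdot = dH/dxi, xidot = -dH/dx for H(x, xi) = c(x)|xi|
    # with classical RK4; the state is (x, xi) in R^2 x R^2.
    def rhs(s):
        x, xi = s[:2], s[2:]
        nrm = np.linalg.norm(xi)
        return np.concatenate([c(x) * xi / nrm, -nrm * grad_c(x)])
    s = np.concatenate([x0, xi0]).astype(float)
    for _ in range(steps):
        k1 = rhs(s)
        k2 = rhs(s + 0.5 * dt * k1)
        k3 = rhs(s + 0.5 * dt * k2)
        k4 = rhs(s + dt * k3)
        s = s + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return s[:2], s[2:]

# Constant speed c = 1: rays are straight unit-speed lines, so a singularity
# starting at the origin in direction (1, 0) should sit near (1, 0) at t = 1.
xe, xie = ray_trace(lambda x: 1.0, lambda x: np.zeros(2),
                    np.array([0.0, 0.0]), np.array([1.0, 0.0]),
                    dt=0.01, steps=100)
print(xe, xie)
```

Replacing c by a variable speed (with its gradient) bends the rays, which is exactly the mechanism by which the phases φ_± stop being globally defined when geodesics cross.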




Equate now the order 1 terms in the expansion of (∂_t² - c²Δ)u to get that the principal terms of the amplitudes must solve the transport equation

   ( (∂_t φ_±)∂_t - c² ∇_x φ_± · ∇_x + C_± ) a_{j,±}^{(0)} = 0,   (20)

with 2C_± = (∂_t² - c²Δ)φ_±. This is an ODE along the vector field (∂_t φ_±, -c² ∇_x φ_±), and its integral curves coincide with the curves (t, γ_{z,±ξ}(t)). Given an initial condition at t = 0, it has a unique solution along the integral curves as long as φ is well defined.

Equating terms homogeneous in ξ of lower order, we get transport equations for a_{j,σ}^{(k)}, k = 1, 2, ..., with the same left-hand side as in (20), with a right-hand side determined by a_σ^{(k-1)}.

Taking into account the initial conditions, we get

   a_{1,+} + a_{1,-} = 1,  a_{2,+} + a_{2,-} = 0  for t = 0.

This is true in particular for the leading terms a_{1,±}^{(0)} and a_{2,±}^{(0)}. Since ∂_t φ_± = ∓c(x)|ξ| for t = 0, and u_t = f_2 for t = 0, from the leading order term in the expansion of u_t we get

   a_{1,+}^{(0)} = a_{1,-}^{(0)},  i c(x) (a_{2,-}^{(0)} - a_{2,+}^{(0)}) = 1  for t = 0.

Therefore,

   a_{1,+}^{(0)} = a_{1,-}^{(0)} = 1/2,  a_{2,+}^{(0)} = -a_{2,-}^{(0)} = i/(2c(x))  for t = 0.   (21)

Note that if c = 1, then φ_± = x·ξ ∓ t|ξ|, and a_{1,+} = a_{1,-} = 1/2, a_{2,+} = -a_{2,-} = i/2. Using those initial conditions, we solve the transport equations for a_{1,±}^{(0)} and a_{2,±}^{(0)}. Similarly, we derive initial conditions for the lower-order terms in (17) and solve the corresponding transport equations. Then we define a_{j,σ} by (17) as a symbol. The so-constructed u in (16) is a solution only up to smoothing operators applied to (f_1, f_2). Using standard hyperbolic estimates, we show that by adding such terms to u, we get an exact solution to (15). As mentioned above, this construction may fail for t too large, depending on the speed. On the other hand, the solution operator (f_1, f_2) ↦ u makes sense as a global Fourier integral operator for which this construction is just one of its local representations.

Acknowledgements Author partly supported by NSF FRG Grant DMS-1301646. This article first appeared as an appendix (pp. 310–320) to "Multi-wave methods via ultrasound" by Plamen Stefanov and Gunther Uhlmann, Inverse Problems and Applications: Inside Out II, edited by Gunther Uhlmann, Mathematical Sciences Research Institute Publications v. 60, Cambridge University Press, New York, 2012, pp. 271–324.

References
1. Hörmander, L.: The Analysis of Linear Partial Differential Operators III: Pseudodifferential Operators. Volume 274 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer, Berlin (1985)
2. Melrose, R.: Introduction to Microlocal Analysis (2003). http://www-math.mit.edu/rbm/iml90.pdf
3. Taylor, M.E.: Pseudodifferential Operators. Volume 34 of Princeton Mathematical Series. Princeton University Press, Princeton (1981)
4. Trèves, F.: Introduction to Pseudodifferential and Fourier Integral Operators, Vol. 1: Pseudodifferential Operators. The University Series in Mathematics. Plenum Press, New York (1980)

for t D 0:

Minimal Surface Equation

Einar M. Rønquist and Øystein Tråsdahl
Department of Mathematical Sciences, Norwegian University of Science and Technology, Trondheim, Norway

Minimal Surfaces

Minimal surfaces arise in many places in natural and man-made objects, e.g., in physics, chemistry, and architecture. Minimal surfaces have fascinated many of our greatest mathematicians and scientists for centuries. In its simplest form, the problem can be stated as follows: find the surface S of least area spanning a given closed curve C in R³. In the particular case when C lies in a two-dimensional plane, the minimal surface


is simply the planar region bounded by C. However, the general problem is very difficult to solve [2, 6].

The easiest way to physically study minimal surfaces is to look at soap films. In the late nineteenth century, the Belgian physicist Joseph Plateau [7] conducted a number of soap film experiments. He observed that, regardless of the shape of a closed and curved wire, the wire could always bound at least one soap film. Since capillary forces attach a potential energy proportional to the surface area, a soap film in stable equilibrium position corresponds to a surface of minimal area [4]. The mathematical boundary value problem for minimal surfaces is therefore also called the Plateau problem.

Mathematical Formulation

Assume for simplicity that the surface S can be represented as a function z = f(x, y); see Fig. 1. Note that this may not always be possible. Using subscripts x and y to denote differentiation with respect to x and y, respectively, the area A[f] of a surface f(x, y) can be expressed as the integral

   A[f] = ∫_Ω √(1 + f_x² + f_y²) dx dy.   (1)

The surface of minimum area is then given directly by the Euler–Lagrange equation for the area functional A[f]:

   (1 + f_y²) f_xx + (1 + f_x²) f_yy - 2 f_x f_y f_xy = 0.   (2)

Equation (2) is called the minimal surface equation. Hence, to determine S mathematically, we need to solve a nonlinear, second-order partial differential equation with specified boundary conditions (determined by the given curve C). Despite the difficulty of finding closed form solutions, the minimal surface problem has created an intense mathematical activity over the past couple of centuries and spurred advances in many fields like the calculus of variations, differential geometry, integration and measure theory, and complex analysis. With the advent of the computer, a range of computational algorithms has also been proposed to construct approximate solutions.
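A quick numerical sanity check of (2) is to plug in a classical explicit solution; here we use Scherk's surface z = log(cos y / cos x) (our choice of test surface; it is not discussed in the text) together with its closed-form partial derivatives.

```python
import numpy as np

def msq_residual(fx, fy, fxx, fyy, fxy):
    # Left-hand side of the minimal surface equation (2).
    return (1 + fy**2) * fxx + (1 + fx**2) * fyy - 2 * fx * fy * fxy

# Scherk's surface z = log(cos(y)/cos(x)):
#   f_x = tan(x), f_y = -tan(y), f_xx = sec^2(x), f_yy = -sec^2(y), f_xy = 0.
pts = np.linspace(-1.0, 1.0, 11)
X, Y = np.meshgrid(pts, pts)
res = msq_residual(np.tan(X), -np.tan(Y),
                   1 / np.cos(X)**2, -1 / np.cos(Y)**2, 0.0)
print(np.abs(res).max())   # zero up to rounding: Scherk's surface satisfies (2)
```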

Characterizations and Generalizations

A point on the minimal surface S is given by the coordinates (x, y, z) = (x, y, f(x, y)). This represents an example of a particular parametrization P of S: for any point (x, y) ∈ Ω, there is a corresponding point P(x, y) = (x, y, z) = (x, y, f(x, y)) on S ⊂ R³. Two tangent vectors t₁ and t₂ spanning the tangent plane at P(x, y) are then given as

   t₁ = P_x(x, y) = (1, 0, f_x),   (3)
   t₂ = P_y(x, y) = (0, 1, f_y).   (4)

The normal vector at this point is then simply the cross product of t₁ and t₂:

   n = (t₁ × t₂)/|t₁ × t₂| = (-f_x, -f_y, 1)/√(1 + f_x² + f_y²),   (5)

where n is normalized to be of unit length. It can be shown that the divergence of n is equal to twice the mean curvature, H, at the point P(x, y). Using (2), it follows that

   2H = ∇ · n = 0.   (6)

Minimal Surface Equation, Fig. 1 The minimal surface S is the surface of least area bounded by the given blue curve, C. The projection of S onto the xy-plane is the planar region Ω bounded by the red curve, ∂Ω. The minimal surface z = f(x, y) for the particular choice of C shown here is called the Enneper surface.



Hence, a minimal surface is characterized by the fact that the mean curvature is zero everywhere. Note that this is in sharp contrast to a soap bubble, where the mean curvature is nonzero. This is because the pressure inside the soap bubble is higher than on the outside, and the pressure difference is given as the product of the mean curvature and the surface tension, which are both nonzero. For a soap film, however, the pressure is the same on either side, consistent with the fact that the mean curvature is zero.

In many cases, we cannot describe a surface as a simple function z = f(x, y). However, instead of using the simple parametrization P(x, y) = (x, y, f(x, y)), we can generalize the parametrization in the following way. For a point (u, v) ∈ Ω ⊂ R², there is a corresponding point P(u, v) = (x(u, v), y(u, v), z(u, v)) on S ⊂ R³, i.e., each individual coordinate x, y, and z is a function of the new coordinates u and v. Being able to choose different parametrizations for a surface is of great importance, both for the pure mathematical analysis and for numerical computations.

Examples of Minimal Surfaces

Enneper's surface (see also Fig. 1) can be parametrized as [2]

   x(u, v) = u(1 - u²/3 + v²),
   y(u, v) = -v(1 - v²/3 + u²),   (7)
   z(u, v) = u² - v²,

where u and v are coordinates on a circular domain of radius R. For R ≤ 1 the surface is stable and is a global area minimizer; see Fig. 2b, which depicts the computed minimal surface for the case R = 0.8 using the surface in Fig. 2a as an initial condition [8].
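One can check numerically that the parametrization (7) indeed has zero mean curvature, by assembling the first and second fundamental forms from the exact partial derivatives of (7); the helper name below is ours.

```python
import numpy as np

def enneper_mean_curvature(u, v):
    # Exact partial derivatives of the parametrization (7).
    Pu = np.array([1 - u**2 + v**2, -2*u*v, 2*u])
    Pv = np.array([2*u*v, -(1 - v**2 + u**2), -2*v])
    Puu = np.array([-2*u, -2*v, 2.0])
    Puv = np.array([2*v, -2*u, 0.0])
    Pvv = np.array([2*u, 2*v, -2.0])
    nvec = np.cross(Pu, Pv)
    nvec = nvec / np.linalg.norm(nvec)
    E, F, G = Pu @ Pu, Pu @ Pv, Pv @ Pv             # first fundamental form
    e, f, g = Puu @ nvec, Puv @ nvec, Pvv @ nvec    # second fundamental form
    return (e*G - 2*f*F + g*E) / (2*(E*G - F**2))

H = [enneper_mean_curvature(u, v) for u in (-0.7, 0.1, 0.5) for v in (-0.3, 0.6)]
print(max(abs(h) for h in H))   # zero up to rounding, as (6) requires
```

For Enneper's surface one finds F = 0, E = G, and P_uu + P_vv = 0, which is why the mean curvature vanishes identically.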

Minimal Surface Equation, Fig. 2 Enneper's surface is the minimal surface corresponding to the given blue boundary curve. (a) Initial surface. (b) Minimal surface.

Minimal Surface Equation, Fig. 3 The three minimal surfaces in the Enneper case for R = 1.2. The unstable solution (a) is known analytically (see (7)) and is found by interpolating this known parametrization. The two other solutions are stable and global area minimizers. The surfaces (b) and (c) are here obtained by starting from slightly perturbed versions of (a) (by adding random perturbations on the order of 10⁻¹⁰).

For 1 < R < √3 the given parametrization gives an unstable minimal surface. However, there also exist two (symmetrically similar) stable minimal surfaces which are global area minimizers; see Fig. 3. This illustrates a case where the minimal surface problem has more than one solution. For R ≥ √3 the boundary curve intersects itself.

Model Reduction

Jan S. Hesthaven
Division of Applied Mathematics, Brown University, Providence, RI, USA

While advances in high-performance computing and mathematical algorithms have enabled the accurate modeling of problems in science and engineering of very considerable complexity, problems with strict time restrictions remain a challenge. Such situations are often found in control and design problems, in situ deployed systems, and others characterized by a need for a rapid online evaluation of a system response under the variation of one or several parameters, including time. To ensure this ability to perform many evaluations under parameter variation of a particular system, it is often acceptable that substantial work be done once, in an offline stage, to obtain a model of reduced complexity that can be evaluated at little cost while maintaining accuracy.

Let us consider a generic dynamical system

∂u(x, t, μ)/∂t + F(u, x, t, μ) = f(x, t, μ),   y(x, t, μ) = G^T u,

subject to appropriate initial and boundary values. Here u is an N-dimensional vector field, possibly depending on space x and time t, and μ lies in a q-dimensional parameter space; y represents an output of interest. If N is very large, e.g., when originating from the discretization of a partial differential equation, the evaluation of this output is potentially expensive. This situation has led to the development of a myriad of methods to develop reduced models, with the majority focusing on representing the solution, u, as a linear combination of N-vectors,

u ≈ û = Va,

where V is an N × m orthonormal matrix, representing a linear space, and a is an m-vector. Inserting this into the model yields the general reduced model

∂a/∂t + V^T F(Va) = V^T f,   y = (G^T V) a,

where we have left out the explicit dependence on the parameters for simplicity. In the special case where F(u) = Lu is linear, the problem further reduces to

∂a/∂t + V^T L V a = V^T f,   y = (G^T V) a,

which can be evaluated with a complexity independent of N. Hence, if N ≫ m, the potential for savings is substantial, reflecting the widespread interest in and use of reduced models. For certain classes of problems, lack of linearity can be overcome using nonlinear interpolation techniques, known as empirical interpolation methods [2], to recover models with evaluation costs independent of N. Considering the overall accuracy of the reduced model leads to the identification of different methods, with the key differences being in how V is formed and how the overall accuracy of the reduced model is estimated.
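In the linear case, the projected operators V^T L V and V^T f can be formed once in the offline stage, after which the online march involves only m-dimensional quantities. The following numpy sketch illustrates this; the toy diffusion operator, the eigenmode basis, and all names are illustrative choices for this example, not from any particular reduced-order-modeling library.

```python
import numpy as np

# Toy full-order model: du/dt = f - L u,  y = G^T u,
# with L a 1D diffusion (tridiagonal) operator of size N.
N, m = 200, 8
h = 1.0 / (N + 1)
L = (np.diag(2.0 * np.ones(N))
     - np.diag(np.ones(N - 1), 1)
     - np.diag(np.ones(N - 1), -1)) / h**2
f = np.ones(N)
G = np.zeros(N); G[N // 2] = 1.0          # output: solution value at the midpoint

# Illustrative basis V: the m lowest eigenmodes of L (an N x m orthonormal matrix).
# In practice V would come from POD snapshots, Krylov vectors, or a greedy search.
lam, Q = np.linalg.eigh(L)
V = Q[:, :m]

# Offline stage: project the operators once.
Lr, fr, Gr = V.T @ L @ V, V.T @ f, G @ V

# Online stage: march the m-dimensional system; each step costs O(m^2), not O(N^2).
dt, nsteps = 1e-5, 2000
a = np.zeros(m)
for _ in range(nsteps):
    a += dt * (fr - Lr @ a)
y_reduced = Gr @ a

# Reference: the full N-dimensional model with the same scheme.
u = np.zeros(N)
for _ in range(nsteps):
    u += dt * (f - L @ u)
y_full = G @ u
print(abs(y_full - y_reduced) / abs(y_full))   # small relative error
```

Because L is symmetric and the chosen V spans an invariant subspace, the reduced trajectory here is exactly the projection of the full one; the output error comes only from the neglected modes.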


Proper Orthogonal Decompositions

In the proper orthogonal decomposition (POD) [4], closely related to principal component analysis (PCA), Karhunen-Loeve (KL) transforms, and the Hotelling transform, the construction of the linear space to approximate u is obtained through processing of a selection of solution snapshots. Assume that a sequence of solutions, u_n, is obtained at regular intervals in the parameters or, as is often done, at regular intervals in time. Collecting these in an N × n matrix X = [u_1, ..., u_n], the singular value decomposition (SVD) yields X = U Σ W*, where Σ is an n × n diagonal matrix containing the singular values, U is N × n, and W is n × n, both with orthonormal columns. The POD basis is then formed by truncating Σ at a tolerance ε such that σ_m ≥ ε ≥ σ_{m+1}. The linear space used to represent u over parameter or time variation is spanned by the first m columns of U. An estimate of the accuracy of the linear space in approximating the solution is recovered from the magnitude of the largest eliminated singular value.

The success of the POD has led to numerous extensions [4, 9, 12], and this approach has been utilized for the modeling of large and complex systems [3, 13]. To develop effective and accurate POD models for nonlinear problems, the discrete empirical interpolation method (DEIM) [5] has been introduced.

A central disadvantage of the POD approach is the need to compute n snapshots, often in an offline stage, perform the SVD on this possibly large solution matrix, and then eliminate a substantial fraction of it through the truncation process. This results in a potentially large computational overhead. Furthermore, the relationship between the eliminated vectors associated with truncated singular values and the accuracy of the reduced model is generally not clear [8], and the stability of the reduced model, in particular for nonlinear problems, is known to be problematic. Some of these issues, in particular those related to preservation of the asymptotic state, are discussed in [16].
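As a concrete illustration, a minimal POD computation on synthetic snapshots (the data, tolerance, and all names are invented for this example) follows the recipe above:

```python
import numpy as np

# POD of a snapshot matrix via the SVD, truncated at a singular-value tolerance.
# Synthetic snapshot data: two smooth spatial modes with varying amplitudes,
# plus tiny noise.
rng = np.random.default_rng(0)
N, n = 400, 60
x = np.linspace(0.0, 1.0, N)
times = np.linspace(0.0, 1.0, n)
X = np.array([np.sin(np.pi * x) * np.exp(-t) +
              0.5 * np.sin(3 * np.pi * x) * t for t in times]).T
X += 1e-6 * rng.standard_normal((N, n))           # X is N x n

U, s, Wt = np.linalg.svd(X, full_matrices=False)  # X = U diag(s) W*, s descending

eps = 1e-4
m = int(np.sum(s > eps))      # keep sigma_1 >= ... >= sigma_m > eps
V = U[:, :m]                  # POD basis: the first m left singular vectors

# The largest discarded singular value bounds the spectral-norm projection error.
err = np.linalg.norm(X - V @ (V.T @ X), 2)
print(m, err, s[m])
```

For this data the clean part of X has rank two, so the truncation keeps two modes and the projection error matches the first discarded singular value.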

Krylov-Based Methods

The majority of Krylov-based methods [1] consider the simplified linear problem with F = Lu and f = Bg representing the input. In the Laplace domain, one obtains a transfer function, H(s) = G^T (sI + L)^{-1} B, between the input g and the output y, with s being the Laplace parameter. Introducing the matrix A = (L + s_0 I)^{-1} and the vector r = (L + s_0 I)^{-1} B, the transfer function becomes H(s) = G^T (I − (s − s_0) A)^{-1} r. Hence, for perturbations of s around s_0, we recover

H(s) = Σ_{i=0}^{∞} m_i (s − s_0)^i,   m_i = G^T A^i r,

where one recognizes that m_i is obtained as a product of vectors in the Krylov subspaces spanned by A^j r and (A^T)^j G. These are recognized as the left and the right Krylov subspace vectors and can be computed in a stable manner using a Lanczos process. The union of the first m/2 left and right Krylov vectors spans the solution of the dynamical problem and, as expected, larger intervals around s_0 require longer sequences of Krylov vectors. While the computational efficiency of the Krylov techniques is appealing, a thorough error analysis is lacking [1]. However, there are several extensions to more complex problems and nonlinear systems [15], as well as closely related techniques aimed at time-dependent problems [15].

Certified Reduced Basis

Certified reduced basis methods (RBM) [11, 14] are fundamentally different from the previous two techniques in how the linear space is constructed. They were originally proposed as an accurate and efficient way to construct reduced models for parametrized steady or harmonic partial differential equations. In this approach, based in the theory of variational problems and Galerkin approximations, one expresses the problem as a bilinear form, a(u, v; μ) = f(μ; v), and seeks an approximation to the solution, u(μ), over variations of μ. In contrast to the POD, in the RBM the basis is constructed through a greedy approach based on maximizing the residual a(û, v) − f in some appropriate norm, and an offline testing across the parameter space using a carefully designed residual-based error estimator.

This yields a reduced method with a couple of distinct advantages, over POD in particular. On one hand, the greedy approximation enables a minimal computational effort, since only the snapshots required to build the basis in a max-norm optimal manner are computed. Furthermore, the error estimator enables one to rigorously certify the quality of the reduced model and the output of interest. This is a unique quality of the certified reduced basis methods and has been demonstrated for a large class of linear problems, including applications originating in solid mechanics, heat conduction, acoustics, and electromagnetics [6, 11, 14], for geometric variations [6, 11], and for applications formulated as integral equations [7]. This more rigorous approach is difficult to extend to nonlinear problems and general time-dependent problems, although there are recent results in this direction [10, 17], combining POD and RBM to enable reduced models for time-dependent parametric problems.
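The greedy loop can be sketched minimally as follows. Everything here is an illustrative stand-in: `solve` is a cheap analytic "truth" solve rather than a PDE solver, and the exact projection error over the training set plays the role of the inexpensive residual-based error estimator used in actual RBM implementations (where the expensive solve is performed only for the selected parameters).

```python
import numpy as np

# Greedy reduced-basis construction over a training set of parameters.
def solve(mu, x):
    # toy parametrized "solution" u(mu); in RBM this is the expensive truth solve
    return np.exp(-mu * x) + mu * np.sin(np.pi * x)

x = np.linspace(0.0, 1.0, 300)
train = np.linspace(0.1, 5.0, 50)                # training set in parameter space
snapshots = {mu: solve(mu, x) for mu in train}

V = np.zeros((x.size, 0))                        # start from an empty basis
tol, history = 1e-8, []
while True:
    # "estimator": worst projection error onto span(V) over the training set
    err = {mu: np.linalg.norm(u - V @ (V.T @ u)) for mu, u in snapshots.items()}
    mu_star = max(err, key=err.get)
    history.append(err[mu_star])
    if err[mu_star] < tol:
        break
    # add the worst-approximated snapshot and re-orthonormalize the basis
    V, _ = np.linalg.qr(np.hstack([V, snapshots[mu_star][:, None]]))
print(V.shape[1], history[0], history[-1])
```

The loop terminates because each iteration adds one snapshot to the basis, and the worst-case error over the finite training set decreases to zero once all snapshots are spanned.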

References

1. Bai, Z.: Krylov subspace techniques for reduced-order modeling of large-scale dynamical systems. Appl. Numer. Math. 43, 9–44 (2002)
2. Barrault, M., Maday, Y., Nguyen, N.C., Patera, A.T.: An 'empirical interpolation' method: application to efficient reduced-basis discretization of partial differential equations. C. R. Math. 339(9), 667–672 (2004)
3. Berkooz, G., Holmes, P., Lumley, J.L.: The proper orthogonal decomposition in the analysis of turbulent flows. Ann. Rev. Fluid Mech. 25, 539–575 (1993)
4. Chatterjee, A.: An introduction to the proper orthogonal decomposition. Curr. Sci. 78, 808–817 (2000)
5. Chaturantabut, S., Sorensen, D.C.: Nonlinear model reduction via discrete empirical interpolation. SIAM J. Sci. Comput. 32, 2737–2764 (2010)
6. Chen, Y., Hesthaven, J.S., Maday, Y., Rodriguez, J., Zhu, X.: Certified reduced methods for electromagnetic scattering and radar cross section prediction. Comput. Methods Appl. Mech. Eng. 233, 92–108 (2012)
7. Hesthaven, J.S., Stamm, B., Zhang, S.: Certified reduced basis methods for the electric field integral equation. SIAM J. Sci. Comput. 34(3), A1777–A1799 (2012)
8. Homescu, C., Petzold, L.R., Serban, R.: Error estimation for reduced-order models of dynamical systems. SIAM Rev. 49, 277–299 (2007)
9. Ilak, M., Rowley, C.W.: Modeling of transitional channel flow using balanced proper orthogonal decomposition. Phys. Fluids 20, 034103 (2008)
10. Nguyen, N.C., Rozza, G., Patera, A.T.: Reduced basis approximation and a posteriori error estimation for the time-dependent viscous Burgers' equation. Calcolo 46(3), 157–185 (2009)
11. Quarteroni, A., Rozza, G., Manzoni, A.: Certified reduced basis approximation for parametrized partial differential equations and applications. J. Math. Ind. 1, 3 (2011)
12. Rathinam, M., Petzold, L.: A new look at proper orthogonal decomposition. SIAM J. Numer. Anal. 41, 1893–1925 (2004)
13. Rowley, C.: Model reduction for fluids using balanced proper orthogonal decomposition. Int. J. Bifurc. Chaos 15, 997–1013 (2005)
14. Rozza, G., Huynh, D.B.P., Patera, A.T.: Reduced basis approximation and a posteriori error estimation for affinely parametrized elliptic coercive partial differential equations: application to transport and continuum mechanics. Arch. Comput. Methods Eng. 15(3), 229–275 (2008)
15. Schmid, P.J.: Dynamic mode decomposition of numerical and experimental data. J. Fluid Mech. 656, 5–28 (2010)
16. Sirisup, S., Karniadakis, G.E.: A spectral viscosity method for correcting the long-term behavior of POD-models. J. Comput. Phys. 194, 92–116 (2004)
17. Veroy, K., Prud'homme, C., Patera, A.T.: Reduced-basis approximation of the viscous Burgers equation: rigorous a posteriori error bounds. C. R. Math. 337(9), 619–624 (2003)

Modeling of Blood Clotting

Aaron L. Fogelson Departments of Mathematics and Bioengineering, University of Utah, Salt Lake City, UT, USA

Mathematics Subject Classification

92C05; 92C10; 92C35 (76Z05); 92C42; 92C45; 92C55

Synonyms

Modeling of thrombosis; Modeling of platelet aggregation and coagulation; Modeling of platelet deposition and coagulation

Description

Blood circulates under pressure through the human vasculature. The pressure difference across the vascular wall means that a hole in the vessel wall can lead to rapid and extensive loss of blood. The hemostatic (blood clotting) system has developed to seal a vascular


injury quickly and minimize hemorrhage. The components of this system, so important to its necessarily rapid and robust response to overt injury, are implicated in the pathological processes of arterial and venous thrombosis that cause extensive death and morbidity. Intensive laboratory research has revealed much about the players involved in the clotting process and about their interactions. Yet much remains unknown about how the system as a whole functions. This is because the nature of the clotting response – complex, multifaceted, dynamic, spatially distributed, and multiscale – makes it very difficult to study using traditional experimentation. For this reason, mathematical models and computational simulations are essential to develop our understanding of clotting and an ability to make predictions about how it will progress under different conditions.

Blood vessels are lined with a monolayer of endothelial cells. If this layer is disrupted, then exposure of the subendothelial matrix initiates the intertwined processes of platelet deposition and coagulation. Platelets are tiny cells which circulate with the blood in an unactivated state. When a platelet contacts the exposed subendothelium, it may adhere by means of bonds formed between receptors on the platelet's surface and molecules in the subendothelial matrix (see Fig. 1). These bonds also trigger a suite


of responses known as platelet activation which include change of shape, the mobilization of an additional family of binding receptors on the platelet’s surface, and the release of chemical agonists into the blood plasma (the most important of these being ADP from cytoplasmic storage granules and the coagulation enzyme thrombin synthesized on the surface of activated platelets). These agonists can induce activation of other platelets that do not directly contact the injured vascular tissue. By means of molecular bonds that bridge the gap between the newly mobilized binding receptors on two platelets’ surfaces, platelets can cohere to one another. As a result of these processes, platelets deposit on the injured tissue and form a platelet plug. Exposure of the subendothelium also triggers coagulation which itself can be viewed as consisting of two subprocesses. The first involves a network of tightly-regulated enzymatic reactions that begins with reactions on the damaged vessel wall and continues with important reactions on the surfaces of activated platelets. The end product of this reaction network is the enzyme thrombin which activates additional platelets and creates monomeric fibrin which polymerizes into a fibrous protein gel that mechanically stabilizes the clot. This polymerization process is the second subprocess of coagulation. Both platelet aggregation and the two parts of coagulation occur in the presence of moving blood, and are strongly affected by the fluid dynamics in ways that are as yet poorly understood. One indication of the effect of different flow regimes is that clots that form in the veins, where blood flow is relatively slow, are comprised mainly of fibrin gel (and trapped red blood cells), while clots that form under the rapid flow conditions in arteries are made up largely of platelets. 
Understanding why there is this fundamental difference between venous and arterial clotting should give important insights into the dynamics of the clotting process.


Modeling of Blood Clotting, Fig. 1 Schematic of platelet adhesion and cohesion. Von Willebrand Factor (vWF) adsorbed on the subendothelial collagen binds to platelet GPIb (red Y) or platelet αIIbβ3 (blue Y) receptors. Soluble vWF and fibrinogen (Fbg) bind to platelet αIIbβ3 receptors to bridge platelet surfaces. Platelet GPIb receptors are constitutively active, while αIIbβ3 receptors must be mobilized when the platelet is activated. Platelet GPVI and α2β1 receptors for collagen itself are not shown

Models

Flow carries platelets and clotting chemicals to and from the vicinity of the vessel injury. It also exerts stress on the developing thrombi which must be withstood by the platelet adhesive and cohesive bonds in


order for a thrombus to grow and remain intact. To look at clot development beyond initial adhesion to the vascular wall, the disturbance to the flow engendered by the growth of the thrombus must be considered. Hence, models of thrombus growth involve a coupled problem of fluid dynamics, transport of cells and chemicals, and perturbation of the flow by the growing platelet mass. Most of these models have looked at events at a scale for which it is feasible to track the motion and behavior of a collection of individual platelets. Because there are approximately 250,000 platelets/μL of blood, this is possible only for small vessels of, say, 50 μm in diameter, such as arterioles or venules, or for the parallel-plate flow chambers often used for in vitro investigations of platelet deposition under flow. To look at platelet deposition in larger vessels of the size, say, of the coronary arteries (diameter 1–2 mm), a different approach is needed. In the next two sections, both the microscale and macroscale approaches to modeling platelet deposition are described.

Microscale Platelet Deposition Models

Microscale platelet deposition modeling was begun by Fogelson, who combined a continuum description of the fluid dynamics (using the Stokes equations) with a representation of unactivated and activated platelets using Peskin's Immersed Boundary (IB) method [11]. This line of research continues as described shortly. Others have modeled this process using the Stokes or Navier–Stokes equations for the fluid dynamics and the Cellular-Potts model [14], Force-Coupling Method [12], or Boundary-Integral Method [10] to represent the platelets. Another approach is to use particle methods to represent both the fluid and the platelets [1, 7]. Fogelson and Guy [3] describe the current state of IB-based models of platelet deposition on a vascular wall.
These models track the motion and behavior of a collection of individual platelets as they interact mechanically with the suspending fluid, one another, and the vessel walls. More specifically, the models track the fluid motion, the forces the fluid exerts on the growing thrombus, and the adhesive and cohesive bond forces which resist these. An Eulerian description of the fluid dynamics by means of the Navier–Stokes equations is combined with Lagrangian descriptions of each of the platelets and vessel walls. In computations, the fluid variables are determined on a regular Cartesian grid, and each platelet is represented by a discrete set of


elastically-linked Lagrangian IB points arrayed along a closed curve (in 2D) or surface (in 3D). Forces generated because of deformation of a platelet or by stretching of its bonds with other platelets or the vessel wall are transmitted to the fluid grid in the vicinity of each IB point. The resulting highly-localized fluid force density is how the fluid “sees” the platelets. Each IB point moves at a velocity that is a local average of the newly computed fluid velocity. In the models, nonactivated platelets are activated by contact with reactive sites on the injured wall, or through exposure to a sufficiently high concentration of a soluble chemical activator. Activation enables a platelet to cohere with other activated platelets, and to secrete additional activator. The concentration of each fluid-phase chemical activator satisfies an advection– diffusion equation with a source term corresponding to the chemical’s release from the activated platelets. To model adhesion of a platelet to the injured wall or the cohesion of activated platelets to one another, new elastic links are created dynamically between IB points on the platelet and the other surface. The multiple links, which in the models can form between a pair of activated platelets or between a platelet and the injured wall, collectively represent the ensemble of molecular bridges binding real platelets to one another or to the damaged vessel. Figure 2 shows snapshots of a portion of the computational domain during a simulation using the two-dimensional IB model. In the simulation, part of the bottom vessel wall is designated as injured and platelets that contact it, adhere to it and become activated. Two small thrombi form early in the simulation. The more upstream one grows more quickly and partially shields the downstream portion of the injured wall, slowing growth of the other thrombus. 
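The transfer between the Lagrangian IB points and the Eulerian fluid grid is mediated by a regularized delta function. A minimal sketch of this grid–particle communication, using Peskin's four-point cosine delta function, is given below; the grid size, names, and point force are illustrative, and a real IB code would of course also include the fluid solver and the bond mechanics.

```python
import numpy as np

# Spread a Lagrangian point force to a 2D Eulerian grid, and interpolate a
# grid field back at the point, using Peskin's 4-point cosine delta function.
def phi(r):
    # 1D kernel: phi(r) = (1 + cos(pi r / 2)) / 4 for |r| < 2 (r in grid units)
    return np.where(np.abs(r) < 2.0, 0.25 * (1.0 + np.cos(0.5 * np.pi * r)), 0.0)

n = 64
h = 1.0 / n
grid = (np.arange(n) + 0.5) * h                  # cell-centered grid coordinates

def spread(Xp, Fp):
    """Force density on the grid from a point force Fp at Xp (interior point)."""
    wx, wy = phi((grid - Xp[0]) / h), phi((grid - Xp[1]) / h)
    return Fp * np.outer(wx, wy) / h**2

def interp(field, Xp):
    """Local average of a grid field at the Lagrangian point Xp."""
    wx, wy = phi((grid - Xp[0]) / h), phi((grid - Xp[1]) / h)
    return float(np.sum(field * np.outer(wx, wy)))

Xp = np.array([0.37, 0.52])
fgrid = spread(Xp, 1.0)
print(fgrid.sum() * h**2)    # total force on the grid matches the point force
```

The cosine kernel satisfies a discrete partition-of-unity property, so spreading conserves the total force exactly and interpolation reproduces constant fields exactly, which is what makes the fluid "see" the platelet forces consistently.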
Together these thrombi disturb the flow sufficiently that few platelets contact and adhere to the downstream portion of the injured wall. Linear chains of platelets bend in response to the fluid forces and bring platelets of the two aggregates into close proximity and lead to consolidation of the adherent platelets into one larger thrombus. When a thrombus projects substantially into the vessel lumen there is a substantial strain on its most upstream attachments to the vessel wall. These bonds can break allowing the aggregate to roll downstream. (See [3] for examples of results from 3D simulations.) For the simulation in Fig. 2, simple rules were used for platelet activation and the formation and breaking



Modeling of Blood Clotting, Fig. 2 Snapshots from a simulation using the Immersed Boundary–based model of platelet deposition. Rings depict platelets, arrows show the velocity field. Unactivated platelets are elliptical. Upon activation, a platelet becomes both less rigid and more circular. Initially there are two small thrombi which, due to growth and remodeling by fluid forces, merge into one larger thrombus. Plots show only a portion of the computational domain

of adhesive and cohesive bonds. In recent years, much new information has become available about how a platelet detects and responds to stimuli that can induce its activation and other behaviors. The detection of stimuli, be it from a soluble chemical activator or a molecule embedded in the subendothelium or on another platelet, is mediated by surface receptors of many types. These include tens of thousands of receptors on each platelet involved in adhesion and cohesion (see Fig. 1), as well as many other receptors for soluble platelet agonists including ADP and thrombin, and hundreds to thousands of binding sites for the different enzymes and protein cofactors involved in the coagulation reactions on the platelet’s surface that are described below. Including such surface reactions as well as more sophisticated treatment of the dynamics of platelet adhesive and cohesive bonds will be essential components of extended versions of the models described here. For work in this direction see [9, 10].

Large Vessel Platelet Thrombosis Models

Because of the vast number of platelets involved, studying platelet thrombosis in millimeter-diameter vessels, like the coronary arteries, requires a different modeling approach. Fogelson and Guy's macroscale continuum model of platelet thrombosis [2, 3] uses density functions to describe different populations of platelets. It is derived from a multiscale model in which both the millimeter vessel scale and the micron platelet scale were explicitly treated. That model tracked continuous distributions of interplatelet and platelet–wall bonds as the bonds formed and broke, and were reoriented and stretched by flow. However, only the stresses generated by these bonds affected the rest of the model's dynamics, and these stresses were computed by doing a weighted average over the microscale spatial variables for each macroscale location and time. By performing this average on each term of the PDE for the bond distribution function and devising an appropriate closure approximation, an evolution equation for the bond stress tensor, involving only the macroscale spatial variables, was derived. Comparison with the multiscale model showed that this equation still captured essential features of the multiscale behavior, in particular the sensitivity of the bond breaking rate to the strain on the bond. The PDE for the stress tensor is closely related to the Oldroyd-B equation for viscoelastic flows, but has "elastic modulus" and "relaxation time" coefficients that evolve in space and time. The divergence of this stress tensor, as well as that of a similar one from platelet–wall bonds, appears as a force density in the Navier–Stokes equations. The model also includes transport equations for the nonactivated and activated platelet concentrations, the activating chemical concentration, and the platelet–platelet and platelet–wall bond concentrations.

The model has been used to explore platelet thrombosis in response to rupture of an atherosclerotic plaque. The plaque itself constricts the vessel, producing a complex flow with areas of high and low shear stress. The rupture triggers platelet deposition, the outcome of which depends on the location of the rupture in the plaque and features of the flow, in addition to biological parameters. The thrombus can grow to occlude the vessel and thus stop flow, or it can be torn apart by shear stresses, leading to one or more thrombus fragments that are carried downstream. Model results make clear the fact that flow matters, as



Modeling of Blood Clotting, Fig. 3 Schematic of coagulation reactions. Magenta arrows show cellular or chemical activation processes, blue ones indicate chemical transport in the fluid or on a surface. Double-headed green arrows depict binding and unbinding from a surface. Rectangles indicate surface-bound species. Solid black lines with open arrows show enzyme action in a forward direction, while dashed black lines with open arrows show feedback action of enzymes. Red disks indicate chemical inhibitors

two simulations that differ only in whether the rupture occurred in high shear or low shear regions had very different outcomes [3].

Coagulation Modeling

Coagulation Enzyme Reactions

In addition to triggering platelet deposition, exposure of the subendothelium brings the passing blood into contact with Tissue Factor (TF) molecules embedded in the matrix and initiates the coagulation process (see Fig. 3). The first coagulation enzymes are produced on the subendothelial matrix and released into the plasma. If they make their way through the fluid to the surface of an activated platelet, they can participate in the formation of enzyme complexes on the platelet surface that continue and accelerate the pathway to thrombin production. Thrombin released from the platelet surface feeds back on the enzyme network to accelerate its own production, activates additional platelets, and converts soluble fibrinogen molecules in the plasma into insoluble fibrin monomers. Once formed, the fibrin monomers spontaneously bind together into thin strands, these strands join side to side into thicker fibers, and a branching network of these fibers grows between and around the platelets in a wall-bound platelet aggregate.

In vitro coagulation experiments are often performed under static conditions and without platelets. A large concentration of phospholipid vesicles is used in order to provide surfaces on which the surface-phase coagulation reactions can occur. Most models of the coagulation enzyme system have aimed to describe this type of experiment. These models assume that chemical species are well mixed and that there is an excess of appropriate surfaces on which the surface-phase reactions take place. The models do not explicitly treat binding reactions between coagulation proteins and these surfaces. The Hockin–Mann model [6] is a prime example; it has been fit to experimental data from Mann's lab and used to infer, for example, the effect of different concentrations of TF on the timing and extent of thrombin production, and to


characterize the influence of chemical inhibitors on the response of the system.

More recently, models that account for interactions between platelet events and coagulation biochemistry, and which include treatment of flow, have been introduced. The Kuharsky–Fogelson (KF) model [8] was the first such model. It looks at coagulation and platelet deposition in a thin reaction zone above a small injury and treats the concentration of each species in this zone as well mixed. Reactions are distinguished by whether they occur on the subendothelium, in the fluid, or on the surface of activated platelets. Transport is described by a mass transfer coefficient for each fluid-phase species. Reactions on the subendothelial and platelet surfaces are limited by the availability of binding sites for the coagulation factors on these surfaces. The model consists of approximately sixty ODEs for the concentrations of coagulation proteins and platelets. The availability of subendothelial TF is a control parameter, while that of platelet binding sites depends on the number of activated platelets in the reaction zone, which in turn depends in part on the extent of thrombin production.
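The KF system itself is too large to reproduce here, but the threshold dependence of thrombin production on TF exposure noted below can be illustrated with a deliberately crude one-variable caricature, in which autocatalytic production competes with flow-mediated washout. All rate constants here are hypothetical, and this is emphatically not the actual KF model.

```python
# A crude caricature (NOT the KF equations; hypothetical rates): thrombin-like
# species T with a TF-proportional source a, saturable autocatalytic feedback,
# and flow-mediated washout w*T.
def steady_state(a, b=1.0, K=0.5, w=0.8, dt=0.01, tmax=400.0):
    T = 0.0
    for _ in range(int(tmax / dt)):
        T += dt * (a + b * T**2 / (K**2 + T**2) - w * T)
    return T

low = steady_state(a=0.01)    # sub-threshold source: production stays negligible
high = steady_state(a=0.10)   # supra-threshold source: self-sustaining production
print(low, high)
```

A small change in the source term a carries the system across a threshold between a low steady state and a self-sustained high one, mimicking the "switch" behavior described in the studies below.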
Studies with this model and its extensions showed (1) that thrombin production depends in a threshold manner on the exposure of TF, thus providing a “switch” for turning the system on only when needed, (2) that platelets covering the subendothelium play an inhibiting role by covering subendothelial enzymes at the same time as they provide the surfaces on which other coagulation reactions occur, (3) that the flow speed and the coverage of the subendothelium by platelets have big roles in establishing the TF-threshold, (4) that the bleeding tendencies seen in hemophilias A and B and thrombocytopenia have kinetic explanations, and (5) that flow-mediated dilution may be the most important regulator of thrombin production (rather than chemical inhibitors of coagulation reactions) at least for responses to small injuries. Several of these predictions have been subsequently confirmed experimentally. The KF model was recently extended by Leiderman and Fogelson [9] to account for spatial variations and to give a much more comprehensive treatment of fluid dynamics and fluid–platelet interactions. Although studies of this model are ongoing, it has already confirmed predictions of the simpler KF model, and has given new information and insights about the spatial organization of the coagulation reactions in a growing thrombus including strong indications that transport within the


growing thrombus is important to its eventual structure. For another spatial-temporal model that builds on the KF treatment of platelet–coagulation interactions, see [15].

Fibrin Polymerization

Several modeling studies have looked at different aspects of fibrin polymerization. Weisel and Nagaswami [13] built kinetic models of fibrin strand initiation, elongation, and thickening, and drew conclusions about the relative rates at which these happen. Guy et al. [5] coupled a simple model of thrombin production to formulas derived from a kinetic gelation model to examine what limits the growth of a fibrin gel at different flow shear rates. This study gave the first (partial) explanation of the reduced fibrin deposition seen at high shear rates. Fogelson and Keener [4] developed a kinetic gelation model that allowed them to examine a possible mechanism for fibrin branch formation. They showed that branching by this mechanism results in gel structures that are sensitive to the rate at which fibrin monomer is supplied. This is in accord with observations of fibrin gels formed in vitro, in which the density of branch points, pore sizes, and fiber thicknesses varied with the concentration of exogenous thrombin used.
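As a flavor of the bookkeeping such kinetic models perform, here is a minimal constant-kernel Smoluchowski coagulation system tracking cluster-size concentrations. It is a drastic simplification of the branching gelation models cited above: there is no branching, gel transition, or flow, and the kernel and parameters are illustrative.

```python
import numpy as np

# Constant-kernel Smoluchowski coagulation: c_k = concentration of size-k clusters,
#   dc_k/dt = (1/2) * sum_{i+j=k} c_i c_j  -  c_k * sum_j c_j.
kmax, dt, nsteps = 64, 1e-3, 1000
c = np.zeros(kmax + 1)         # indices 1..kmax are used
c[1] = 1.0                     # start from monomers only

for _ in range(nsteps):
    total = c.sum()
    gain = np.zeros_like(c)
    for k in range(2, kmax + 1):
        gain[k] = 0.5 * np.dot(c[1:k], c[k - 1:0:-1])   # pairwise mergers into size k
    c = c + dt * (gain - c * total)

number = c.sum()                          # total cluster number; 1/(1 + t/2) analytically
mass = np.dot(np.arange(kmax + 1), c)     # first moment: conserved (up to truncation)
print(number, mass)
```

For the constant kernel the total cluster number decays as 1/(1 + t/2) from a monomeric start, while total mass is conserved, which provides a convenient check on the integration.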

Conclusion

Mathematical models and computer simulations based on these models have contributed significant insights into the blood clotting process. These modeling efforts are just a beginning, and much remains to be done to understand how the dynamic interplay of biochemistry and physics dictates the behavior of this system. In addition to the processes described in this entry, other aspects of clotting, including the regulation of platelet responses by intraplatelet signaling pathways, the dissolution of fibrin clots by the fibrinolytic system, and the interactions between the clotting and immune systems, are interesting and challenging subjects for modeling.

References

1. Filipovic, N., Kojic, M., Tsuda, A.: Modeling thrombosis using dissipative particle dynamics method. Philos. Trans. Ser. A, Math. Phys. Eng. Sci. 366, 3265–3279 (2008)
2. Fogelson, A.L., Guy, R.D.: Platelet-wall interactions in continuum models of platelet aggregation: formulation and numerical solution. Math. Med. Biol. 21, 293–334 (2004)
3. Fogelson, A.L., Guy, R.D.: Immersed-boundary-type models of intravascular platelet aggregation. Comput. Methods Appl. Mech. Eng. 197, 2087–2104 (2008)
4. Fogelson, A.L., Keener, J.P.: Toward an understanding of fibrin branching structure. Phys. Rev. E 81, 051922 (2010)
5. Guy, R.D., Fogelson, A.L., Keener, J.P.: Modeling fibrin gel formation in a shear flow. Math. Med. Biol. 24, 111–130 (2007)
6. Hockin, M.F., Jones, K.C., Everse, S.J., Mann, K.G.: A model for the stoichiometric regulation of blood coagulation. J. Biol. Chem. 277, 18322–18333 (2002)
7. Kamada, H., Tsubota, K., Nakamura, M., Wada, S., Ishikawa, T., Yamaguchi, T.: A three-dimensional particle simulation of the formation and collapse of a primary thrombus. Int. J. Numer. Methods Biomed. Eng. 26, 488–500 (2010)
8. Kuharsky, A.L., Fogelson, A.L.: Surface-mediated control of blood coagulation: the role of binding site densities and platelet deposition. Biophys. J. 80, 1050–1074 (2001)
9. Leiderman, K.M., Fogelson, A.L.: Grow with the flow: a spatial-temporal model of platelet deposition and blood coagulation under flow. Math. Med. Biol. 28, 47–84 (2011)
10. Mody, N.A., King, M.R.: Platelet adhesive dynamics. Part I: characterization of platelet hydrodynamic collisions and wall effects. Biophys. J. 95, 2539–2555 (2008)
11. Peskin, C.S.: The immersed boundary method. Acta Numer. 11, 479–517 (2002)
12. Pivkin, I.V., Richardson, P.D., Karniadakis, G.: Blood flow velocity effects and role of activation delay time on growth and form of platelet thrombi. Proc. Natl. Acad. Sci. 103, 17164–17169 (2006)
13. Weisel, J.W., Nagaswami, C.: Computer modeling of fibrin polymerization kinetics correlated with electron microscope and turbidity observations: clot structure and assembly are kinetically controlled. Biophys. J. 63, 111–128 (1992)
14. Xu, Z., Chen, N., Kamocka, M.M., Rosen, E.D., Alber, M.: A multiscale model of thrombus development. J. R. Soc. Interface 5, 705–722 (2008)
15. Xu, Z., Lioi, J., Mu, J., Kamocka, M.M., Liu, X., Chen, D.Z., Rosen, E.D., Alber, M.: A multiscale model of venous thrombus formation with surface-mediated control of blood coagulation cascade. Biophys. J. 98, 1723–1732 (2010)

Molecular Dynamics Benedict Leimkuhler Edinburgh University School of Mathematics, Edinburgh, Scotland, UK


The term molecular dynamics is used to refer to a broad collection of models of systems of atoms in motion. In its most fundamental formulation, molecular dynamics is modeled by quantum mechanics, for example, using the Schrödinger equation for the nuclei and the electrons of all the atoms (a partial differential equation). Because of computational difficulties inherent in treating the quantum mechanical system, it is often replaced by a classical model. The Born-Oppenheimer approximation is obtained by assuming that the nuclear degrees of freedom, being much heavier than the electrons, move substantially more slowly. Averaging over the electronic wave function then results in a classical Newtonian description of the motion of $N$ nuclei, a system of point particles with positions $q_1, q_2, \ldots, q_N \in \mathbb{R}^3$. In practice, the Born-Oppenheimer potential energy is replaced by a semiempirical function $U$ which is constructed by solving small quantum systems or by reference to experimental data. Denoting the coordinates of the $i$th atom by $q_{i,x}, q_{i,y}, q_{i,z}$, and its mass by $m_i$, the equations of motion for the $i$th atom are then

$$m_i \frac{d^2 q_{i,x}}{dt^2} = -\frac{\partial U}{\partial q_{i,x}}, \qquad m_i \frac{d^2 q_{i,y}}{dt^2} = -\frac{\partial U}{\partial q_{i,y}}, \qquad m_i \frac{d^2 q_{i,z}}{dt^2} = -\frac{\partial U}{\partial q_{i,z}}.$$

The equations need to be extended for effective treatment of boundary and environmental conditions, sometimes modeled by stochastic perturbations. Molecular dynamics is a widely used tool which in some sense interpolates between theory and experiment. It is one of the most effective general tools for understanding processes at the atomic level. The focus in this article is on classical molecular dynamics models based on semiempirical potential energy functions. For details of quantum mechanical models and related issues, see ▸Schrödinger Equation for Chemistry, ▸Fast Methods for Large Eigenvalues Problems for Chemistry, ▸Born–Oppenheimer Approximation, Adiabatic Limit, and Related Math. Issues, and ▸Density Functional Theory. Molecular simulation (including molecular dynamics) is treated in detail in [2, 7, 10, 18].

Background, Scope, and Application

Molecular dynamics in its current form stems from work on hard-sphere fluid models of Alder and Wainwright [1] dating to 1957. An article of Rahman


[16] described the use of smooth potentials. In 1967, Verlet [23] gave a detailed description of a general procedure, including the popular Verlet integrator and a procedure for reducing the calculation of forces by the use of neighbor lists. This article became the template for molecular dynamics studies. The use of molecular dynamics rapidly expanded in the 1970s, with the first simulations of large biological molecules, and exploded toward the end of the 1980s. As the algorithms matured, they were increasingly implemented in general-purpose software packages; many of these packages are of a high standard and are available in the public domain [6].

Molecular dynamics simulations range from just a few atoms to extremely large systems. At the time of this writing, the largest atomistically detailed molecular dynamics simulation involved 320 billion atoms and was performed using the Blue Gene/L computer at Lawrence Livermore National Laboratory, using more than 131,000 processors. The vast majority of molecular dynamics simulations are much smaller than this. In biological applications, a common size would be between $10^4$ and $10^5$ atoms, which allows the modeling of a protein together with a sizeable bath of water molecules. For discussion of the treatment of large-scale models, refer to ▸Large-Scale Computing for Molecular Dynamics Simulation. Perspectives on applications of molecular modeling and simulation, in particular molecular dynamics, are discussed in many review articles; see, for example, [19] for a discussion of its use in biology.

The Potential Energy Function

[Molecular Dynamics, Fig. 1: Example potential energy contributions — graphs of the length-bond, Lennard-Jones, and Coulomb terms as functions of the interatomic distance r.]

The complexity of molecular dynamics stems from the variety of nonlinear functional terms incorporated (additively) into $U$ and the potentially large number of atoms needed to achieve adequate model realism. Of particular note are potential energy contributions depending on the pairwise interaction of the atoms, including Lennard-Jones, Coulombic, and covalent length-bond contributions with respective definitions as follows, for a particular pair of atoms labeled $i, j$:

$$\varphi_{ij}^{LJ}(r) = 4\varepsilon_{ij}\left[(r/\sigma_{ij})^{-12} - (r/\sigma_{ij})^{-6}\right],$$

$$\varphi_{ij}^{C}(r) = C_{ij}/r,$$

$$\varphi_{ij}^{l.b.}(r) = A_{ij}(r - \bar{r}_{ij})^2,$$

where $r = r_{ij} = \sqrt{(q_{i,x}-q_{j,x})^2 + (q_{i,y}-q_{j,y})^2 + (q_{i,z}-q_{j,z})^2}$ is the distance between the two atoms. Representative graphs of the three potential energy contributions mentioned above are shown in Fig. 1. The various coefficients appearing in these formulas are determined by painstaking analysis of experimental and/or quantum mechanical simulation data; they depend not only on the types of atoms involved but also, often, on their function or specific location within the molecule and the state, i.e., the conditions of temperature, pressure, etc.

In addition to two-atom potentials, there may be three- or four-atom terms present. For example, in a carbohydrate chain, the carbon and hydrogen atoms appear in sequence, e.g., CH$_3$CH$_2$CH$_2\ldots$. Besides the bonds and other pair potentials, proximate triples also induce an angle-bond modeled by an energy function of the form

$$\varphi_{ijk}^{a.b.}(q_i, q_j, q_k) = B_{ijk}\left(\angle(q_i, q_j, q_k) - \bar{\theta}_{ijk}\right)^2,$$

where

$$\angle(q_i, q_j, q_k) = \cos^{-1}\frac{(q_j - q_i)\cdot(q_j - q_k)}{\|q_j - q_i\|\,\|q_j - q_k\|},$$

while a torsional dihedral-bond potential on the angle between planes formed by successive triples is also incorporated. Higher-body contributions (5-, 6-, etc.) are only occasionally present. In materials science, complex multibody potentials are often used, such as


bond-order potentials which include a model for the local electron density [20].

One of the main limitations in current practice is the quality of the potential energy surface. While it is common practice to assume a fitted functional form for $U$ for reasons of simplicity and efficiency, there are some popular methods which attempt to determine this "on the fly": e.g., the Car-Parrinello method models changes in the electronic structure during simulation, and other schemes may further blur the boundaries between quantum and classical approaches, such as the use of the ReaxFF force field [22]. In addition, quantum statistical mechanics methods such as Feynman path integrals introduce a classical model that closely resembles the molecular model described above.

Molecular dynamics-like models also arise in the so-called mesoscale modeling regime, wherein multiatom groups are replaced by point particles or rigid bodies through a process known as coarse-graining. For example, the dissipative particle dynamics method [12] involves a conservative component that resembles molecular dynamics, together with additional stochastic and dissipative terms which are designed to conserve net momentum (and hence hydrodynamics). The molecular model may be subject to external driving perturbation forces which are time dependent and which do not have any of the functional forms mentioned above.
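The elementary pair contributions defined in this section are straightforward to evaluate numerically. The following sketch implements the three pair formulas; the function names are illustrative, and real force fields tabulate the coefficients $\varepsilon_{ij}$, $\sigma_{ij}$, $C_{ij}$, $A_{ij}$, and $\bar{r}_{ij}$ per atom pair:

```python
def lj(r, eps, sigma):
    """Lennard-Jones pair energy 4*eps*((r/sigma)**-12 - (r/sigma)**-6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)

def coulomb(r, c):
    """Coulombic pair energy C_ij / r."""
    return c / r

def length_bond(r, a, r_bar):
    """Harmonic length-bond energy A_ij * (r - r_bar)**2."""
    return a * (r - r_bar) ** 2
```

With eps = sigma = 1, the Lennard-Jones term vanishes at r = 1 and reaches its minimum value -eps at r = 2^(1/6), the features sketched in Fig. 1.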


Constraints

The basic molecular dynamics model often appears in modified forms that are motivated by modeling considerations. Types of systems that arise in practice include constrained systems (or systems with rigid bodies), which are introduced to coarse-grain the system or simply to remove some of the most rapid vibrational modes. An important aspect of the constraints used in molecular modeling is that they are, in most cases, holonomic. Specifically, they are usually functions of the positions only and of the form $g(q) = 0$, where $g$ is a smooth mapping from $\mathbb{R}^{3N}$ to $\mathbb{R}$. The constraints may be incorporated into the equations of motion using Lagrange multipliers. If there are $m$ constraints $g_j(q) = 0$, $j = 1, \ldots, m$, then we may write the differential equations as (for $i = 1, \ldots, N$)

$$m_i \frac{d^2 q_{i,x}}{dt^2} = -\frac{\partial U}{\partial q_{i,x}} - \sum_{j=1}^{m} \lambda_j \frac{\partial g_j}{\partial q_{i,x}}, \qquad (1)$$

$$m_i \frac{d^2 q_{i,y}}{dt^2} = -\frac{\partial U}{\partial q_{i,y}} - \sum_{j=1}^{m} \lambda_j \frac{\partial g_j}{\partial q_{i,y}}, \qquad (2)$$

$$m_i \frac{d^2 q_{i,z}}{dt^2} = -\frac{\partial U}{\partial q_{i,z}} - \sum_{j=1}^{m} \lambda_j \frac{\partial g_j}{\partial q_{i,z}}. \qquad (3)$$

The $\lambda_j$ may be determined analytically by differentiating the constraint relations $g_j(q(t)) = 0$ twice with respect to time and making use of the second derivatives from the equations of motion. In practice, for numerical simulation, this approach is not found to be as effective as treating the constrained equations as a combined differential-algebraic system of special type (see "Construction of Numerical Methods," below). In many cases, molecular dynamics is reduced to a system of rigid bodies interacting in a force field; then there are various options regarding the form of the equations, which may be based on particle models, Euler parameters or Euler angles, quaternions, or rotation matrices. For details, refer, for example, to the book on classical mechanics of Goldstein [8].

Particle Density Controlled by Periodic Boundary Conditions

In most simulations, it is necessary to prescribe the volume $V$ of simulation or to control the fluctuations of volume (in the case of constant pressure simulation). Since the number of atoms treated in simulation is normally held fixed ($N$), the control of volume also provides control of the particle density ($N/V$). Although other mechanisms are occasionally suggested, by far the most common method of controlling the volume of simulation is the use of periodic boundary conditions. Let us suppose we wish to confine our simulation to a simulation cell consisting of a cubic box with side length $L$ and volume $V = L^3$. We begin by surrounding the system with a collection of 26 periodic replicas of our basic cell. In each cell copy, we assume the atoms have identical relative positions as in the basic cell, and we augment the total potential energy with interaction potentials for pairs consisting of an atom of the basic cell and one in each of the neighboring cells.
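The positional bookkeeping required by periodic boundary conditions — wrapping coordinates back into the cell, and measuring pair separations to the nearest periodic image — reduces to a few lines. A minimal sketch for a cubic box of side $L$ (function names are illustrative):

```python
def wrap(x, L):
    """Shift a coordinate back into the box [0, L): an atom leaving one
    face of the cubic cell reenters through the opposite face."""
    return x % L

def min_image(dx, L):
    """Replace a displacement component dx by the equivalent displacement
    to the nearest periodic image (cubic box of side L)."""
    return dx - L * round(dx / L)
```

For example, with L = 10, atoms at x = 0.5 and x = 9.5 are treated as separated by min_image(0.5 - 9.5, 10.0) = 1.0 rather than 9.0.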


If $\varphi_{ij}$ is the total potential energy function for interactions between atoms $i$ and $j$, then periodic boundary conditions with short-ranged interactions involve an extended potential energy of the form

$$U^{pbc}(q) = \sum_{klm} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \varphi_{ij}\left(q_i,\; q_j + k\mu_1 + l\mu_2 + m\mu_3\right),$$

where $k, l, m$ run over $-1, 0, 1$, and $\mu_1 = Le_1$, $\mu_2 = Le_2$, $\mu_3 = Le_3$, where $e_i$ is the $i$th Euclidean basis vector in $\mathbb{R}^3$. During simulation, atoms leaving the box are assumed to reenter on the opposite face; thus the coordinates must be checked and possibly shifted after each positional step.

For systems with only short-ranged potentials, the cell size is chosen large enough so that atoms do not "feel" their own image; the potentials are subject to a cutoff which reduces their influence to the box and the nearest replicas. For systems with Coulombic potentials, this is not possible, and indeed it is typically necessary to calculate the forces of interaction not just for the adjacent cells but also for the distant periodic replicas; fortunately, the latter calculation can be greatly simplified using a technique known as Ewald summation (see below).

Molecular Structure

The molecular potential energy function will have vast numbers of local minima corresponding to specific organizations of the atoms of the system relative to one another. The design of the energy function is typically performed in such a way as to stabilize the most likely structures (perhaps identified from experiment) when the system is at mechanical equilibrium. It is impossible, using current algorithms, to identify the global minimum of even a modest molecular energy landscape from an arbitrary starting point. Therefore the specification of appropriate initial data may be of importance.

In the case of a system in the solid state, simulations typically begin from the vicinity of a textbook crystal structure, often a regular lattice in the case of a homogeneous system, which describes the close-packed configurations of a collection of spheres. Certain lattices are found to be most appropriate for given chemical constituents at given environmental conditions. These may include the body-centered cubic (BCC), face-centered cubic (FCC), and hexagonal close-packed (HCP) structures.

In the case of biological molecules, the initial positions are most often found by experimental techniques (e.g., nuclear magnetic resonance imaging or x-ray crystallography). Because these methods impose artificial assumptions (isolation of the molecule in vacuum or frozen conditions), molecular dynamics often plays a crucial role in refining such structural information so that the structures reported are more relevant for the liquid state conditions in which the molecule is found in the laboratory (in vitro) or in a living organism (in vivo).

Properties of the Model

For compactness, we let $q$ be a vector of all $3N$ position coordinates of the atoms of the system, and we take $v$ to be the corresponding vector of velocities. The set of all allowed positions and velocities is called the phase space of the system. $U = U(q)$ is the potential energy function (assumed time independent for this discussion), $F(q) = -\nabla U(q)$ is the force (the negative gradient of the potential energy), and $M = \mathrm{diag}(m_1, m_1, m_1, m_2, \ldots, m_N, m_N, m_N)$ is the mass matrix. The molecular dynamics equations of motion may be written compactly as a first order system of dimension $6N$:

$$\dot{q} = v, \qquad M\dot{v} = F(q).$$

(The notation $\dot{x}$ refers to the time derivative of the quantity $x$.) The motion of the system beginning from prescribed initial conditions ($q(0) = \xi$, $v(0) = \eta$, for given vectors $\xi, \eta \in \mathbb{R}^{3N}$) is a trajectory $(q(t), v(t))$. The state of the system at any given time is completely characterized by the state at any previous time; thus there is a well-defined flow map $\Phi_\tau$ of the phase space, defined for any $\tau$, such that $(q(t), v(t)) = \Phi_\tau(q(t-\tau), v(t-\tau))$. Because of the form of the force laws, involving as they typically do an overwhelming repulsive component at short range (due to avoidance of overlap of the electron clouds and normally modeled as part of a Lennard-Jones potential), the separation distance between pairs of atoms at constant energy is uniformly bounded away from zero. This


is an important distinction from gravitational $N$-body dynamics, where the close approaches of atoms may dominate the evolution of the system.

Equilibria and Normal Modes

The equilibrium points of the molecular model satisfy $\dot{q} = \dot{v} = 0$; hence

$$v = 0, \qquad F(q) = -\nabla_q U(q) = 0.$$

Thus the equilibria $q^*$ are the critical points of the potential energy function. It is possible to linearize the system at such an equilibrium point by computing the Hessian matrix $W^*$ whose $ij$ entry is the mixed second partial derivative of $U$ with respect to $q_i$ and $q_j$ evaluated at the equilibrium point. The equations of motion describing the motion of a material point near to such an equilibrium point are of the form

$$\frac{d\,\delta q}{dt} = \delta v, \qquad \frac{d\,\delta v}{dt} = -W^* \delta q,$$

where $\delta q \approx q - q^*$, $\delta v \approx v - v^*$. The motion of this linear system may be understood completely in terms of its eigenvalues and eigenvectors. When the equilibrium point corresponds to an isolated local minimum of the potential energy, the eigenvalues of $W^*$ are positive, and their square roots $\omega_1, \omega_2, \ldots, \omega_{3N}$ are proportional to the frequencies of the normal modes of oscillation of the molecule; the normal modes themselves are the corresponding eigenvectors of $W^*$. Depending on the symmetries of the system, some of the characteristic frequencies vanish, and the number of normal modes is correspondingly reduced. The normal modes provide a useful perspective on the local dynamics of the system near an equilibrium point. As an illustration, a linear triatomic molecule consists of three atoms subject to pairwise and angle-bond potentials. The energetically favored configuration is an arrangement of the atoms in a straight line. The normal modes may be viewed as directions in which the atomic configuration is deformed from the linear configuration. The triatomic molecule has six symmetries and a total of $3 \times 3 - 6 = 3$ normal modes, including symmetrical and asymmetrical stretches and a bending mode. More complicated systems have a wide range of normal modes, which may be obtained numerically using eigenvector solvers; see ▸Eigenvalues and Eigenvectors: Computation.

Flow Map

The energy of the system is $E = E(q, v) = v^T M v/2 + U(q)$. It is easy to see that $E$ is a conserved quantity (first integral) of the system, since

$$\frac{dE}{dt} = \nabla_q E \cdot \dot{q} + \nabla_v E \cdot \dot{v} = v \cdot \nabla U + (Mv) \cdot (-M^{-1}\nabla U) = 0,$$

implying that it is a constant function of time as it is evaluated along a trajectory. Energy conservation has important consequences for the motion of the system and is often used as a simple check on the implementation of numerical methods.

The flow map may possess additional invariants that depend on the model under study. For example, if the $N$ atoms of a system interact only with each other through a pairwise potential, it is easy to see that the sum of all forces will be zero, and the total momentum is therefore a conserved quantity. Likewise, for such a closed system of particles, the total angular momentum is a conserved quantity. When the positions of all the atoms are shifted uniformly by a fixed vector offset, the central forces, based only on the relative positions, are clearly invariant. We say that a closed molecular system is invariant under translation. Such a system is also invariant under rotation of all atoms about the center of mass. The symmetries and invariants are linked, as a consequence of Noether's theorem. When periodic boundary conditions are used, the angular momentum conservation is easily seen to be destroyed, but the total momentum $\sum_i p_i$ remains a conserved quantity, and the system is still invariant under translation.

Reflecting the fact that the equations of motion $\dot{q} = M^{-1}p$, $\dot{p} = -\nabla U$ are invariant under the simultaneous change of sign of time and momenta, we say that the system is time reversible. The equations of motion of a Hamiltonian system are also divergence free, so the volume in phase space is also preserved. The latter property can be related to a more fundamental geometric principle: Hamiltonian systems have flow maps which are symplectic, meaning that they conserve the canonical differential two-form defined by

$$\omega = dq_{1,x} \wedge dp_{1,x} + dq_{1,y} \wedge dp_{1,y} + \cdots + dq_{N,z} \wedge dp_{N,z},$$


i.e., $\Phi_t^*\omega = \omega$, where $\Phi_t^*$ represents the pullback of the differential form $\omega$. Another way to state this property is that the Jacobian matrix $\partial\Phi_t/\partial z$ of the flow map satisfies

$$\left(\frac{\partial\Phi_t}{\partial z}\right)^{T} \begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix} \left(\frac{\partial\Phi_t}{\partial z}\right) = \begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix}.$$

The various properties mentioned above have important ramifications for numerical method development.

Invariant Distribution

A crucial aspect of nearly all molecular dynamics models is that they are chaotic systems. One consequence of this is that the solution depends sensitively on the initial data (small perturbations in the initial data will grow exponentially rapidly in time). The chaotic nature of the model means that the results obtained from long simulations are typically independent of their precise starting point (although they may depend on the energy or momentum).

Denote by $L_H u$ the Lie-derivative operator for the differential equations, defined for any scalar function $u = u(q, p)$ by

$$L_H u = (M^{-1}p) \cdot \nabla_q u - (\nabla_q U) \cdot \nabla_p u,$$

which represents the time derivative of $u$ along a solution of the Hamiltonian system. Thus $e^{tL_H} q_i$ can be represented using a formal Maclaurin series expansion:

$$e^{tL_H} q_i = q_i + tL_H q_i + \frac{t^2}{2} L_H^2 q_i + \ldots \qquad (4)$$

$$= q_i + t\dot{q}_i + \frac{t^2}{2} \ddot{q}_i + \ldots, \qquad (5)$$

and hence, viewing this expression as evaluated at the initial point, we may identify it directly with $q_i(t)$. Thus $e^{tL_H}$ is a representation for the flow map. As a shorthand, we will contract this slightly and use the notation $\Phi_t = e^{tH}$ to denote the flow map corresponding to Hamiltonian $H$.

Given a distribution $\rho_0$ on phase space, the density associated to the distribution will evolve under the action of the exponential of the Liouvillian operator $L_H^\dagger = -L_H$, i.e.,

$$\rho(t) = e^{-tL_H} \rho_0.$$

This follows from the Liouville equation $\partial\rho/\partial t = -L_H\rho$. Invariant distributions (i.e., those $\rho$ such that $L_H\rho = 0$) of the equations of motion are associated to the long-term evolution of the system. Due to the chaotic nature of molecular dynamics, these invariant distributions may have a complicated structure (e.g., their support is often a fractal set). In some cases, the invariant distribution may appear to be dense in the phase space (or on a large region in phase space), although rigorous results are not available for complicated models, unless stochastic perturbations are introduced.

Constraints

In the case of a constrained system, the evolution is restricted to the manifold $\{(q, p) \mid g(q) = 0,\ g'(q)M^{-1}p = 0\}$. Note that the hidden constraint $g'(q)M^{-1}p = 0$ arises from time differentiation of the configurational constraints, which must be satisfied for all points on a given trajectory. The Hamiltonian structures, invariant properties, and the concept of invariant distribution all have natural analogues for the system with holonomic constraints. In the compact notation of this section, the constrained system may be written

$$\dot{q} = M^{-1}p, \qquad \dot{p} = -\nabla U(q) - g'(q)^T\lambda, \qquad g(q) = 0, \qquad (6)$$

where $\lambda$ is now a vector of $m$ Lagrange multipliers, $g: \mathbb{R}^{3N} \to \mathbb{R}^m$, and $g'$ is the $m \times 3N$-dimensional Jacobian matrix of $g$.

Construction of Numerical Methods

Numerical methods are used in molecular simulation for identifying stable structures, sampling the potential energy landscape (or computing averages of functions of the positions and momenta), and calculating dynamical information; molecular dynamics may be involved in all three of these tasks. The term "molecular dynamics" often refers to the generation of trajectories by use of timestepping, i.e., the discretized form of the dynamical system based on a suitable numerical method for ordinary differential equations. In this section, we discuss some of the standard tools of molecular dynamics timestepping.

The basic idea of molecular dynamics timestepping is to define a mapping of phase space $\Psi_h$ which


approximates the flow map $\Phi_h$ on a time interval of length $h$. A numerical trajectory is a sequence defined by iterative composition of the approximate flow map applied to some initial point $(q^0, p^0)$, i.e., $\{(q^n, p^n) = \Psi_h^n(q^0, p^0) \mid n = 0, 1, 2, \ldots\}$. Note the use of a superscript to indicate timesteps, to make the distinction with the components of a vector, which are enumerated using subscripts.

Very long trajectories are typically needed in molecular simulation. Even a slow drift of energy (or in some other, less easily monitored physical property) will eventually destroy the usefulness of the technique. For this reason, it is crucial that algorithms be implemented following mathematical principles associated to the classical mechanics of the model and not simply based on traditional convergence analysis (or local error estimates). When very long time trajectories are involved, the standard error bounds do not justify the use of traditional numerical methods. For example, although they are all formally applicable to the problem, traditional favorites like the popular fourth-order Runge-Kutta method are in most cases entirely inappropriate for the purpose of molecular dynamics timestepping. Instead, molecular dynamics relies on the use of geometric integrators which mimic qualitative features of the underlying dynamical system; see ▸Symplectic Methods.

Störmer-Verlet Method

By far the most popular method for evolving the constant energy (Hamiltonian) form of molecular dynamics is one of a family of methods investigated by Carl Störmer in the early 1900s for particle dynamics (it is often referred to as Störmer's rule, although the method was probably in use much earlier) and which was adapted by Verlet in his seminal 1967 paper on molecular dynamics. The method proceeds by the sequence of steps:

$$p^{n+1/2} = p^n - \frac{h}{2}\nabla U(q^n),$$

$$q^{n+1} = q^n + hM^{-1}p^{n+1/2},$$

$$p^{n+1} = p^{n+1/2} - \frac{h}{2}\nabla U(q^{n+1}).$$

In common practice, the first and last stages are amalgamated to produce the alternative (equivalent) form

$$p^{n+1/2} = p^{n-1/2} - h\nabla U(q^n), \qquad q^{n+1} = q^n + hM^{-1}p^{n+1/2}.$$

This method is explicit, requiring a single force evaluation per timestep, and second-order accurate, meaning that on a fixed time interval the error may be bounded by a constant times $h^2$, for sufficiently small $h$. When applied to the harmonic oscillator $\dot{q} = p$, $\dot{p} = -\omega^2 q$, it is found to be numerically stable for $h\omega \le 2$. More generally, the maximum usable stepsize is found to be inversely dependent on the frequency of fastest oscillation. Besides this stepsize restriction, by its nature, the method is directly applicable only to systems with a separable Hamiltonian (of the form $H(q, p) = T(p) + U(q)$); this means that it must be modified for use in conjunction with thermostats and other devices; it is also only suited to deterministic systems.

Composition Methods

The splitting framework of geometric integration is useful for constructing molecular dynamics methods. In fact, the Verlet method can be seen as a splitting method in which the simplified problems

$$\frac{d}{dt}\begin{pmatrix} q \\ p \end{pmatrix} = v_1 := \begin{pmatrix} 0 \\ -\nabla U \end{pmatrix}, \qquad \frac{d}{dt}\begin{pmatrix} q \\ p \end{pmatrix} = v_2 := \begin{pmatrix} M^{-1}p \\ 0 \end{pmatrix}$$

are solved sequentially, the first and last for half a timestep and the middle one for the full timestep. Note that in solving the first system, $q$ is seen to be constant; hence the solution evolved from some given point $(q, p)$ is $(q, p - (h/2)\nabla U(q))$; this can be seen as an impulse or "kick" applied to the system. The other vector field can be viewed as inducing a "drift" (linear motion along the direction $M^{-1}p$). Thus Störmer-Verlet can be viewed as "kick-drift-kick." Using the notation introduced in the first section of this article, we may write, for the Störmer-Verlet method, $\Psi_h^{SV} = \exp(\frac{h}{2}U)\exp(hK)\exp(\frac{h}{2}U)$, where $K = p^T M^{-1} p/2$ is the kinetic energy.

Higher-order composition methods may be constructed by using Yoshida's method [24]. In practice, the benefit of this higher order of accuracy is only seen when sufficiently small timesteps are used (i.e., in a


relatively high-accuracy regime), and this is normally not required in molecular simulation. Besides one-step methods, one also occasionally encounters the use of multistep methods such as Beeman's method [7]. As a rule, molecular dynamicists favor explicit integration schemes whenever these are available. Multistep methods should not be confused with multiple timestepping [21]. The latter is a scheme (or rather, a family of schemes) whereby parts of the system are resolved using a smaller timestep than others. This method is very widely used since it can lead to dramatic gains in computational efficiency; however, the method may introduce resonances and instability and so should be used with caution [15].

Numerical Treatment of Constraints

An effective scheme for simulating constrained molecular dynamics is the SHAKE method [17], which is a natural generalization of the Verlet method. This method is usually written in a staggered form. In [3], this method was rewritten in the "self-starting" RATTLE formulation that is more natural as a basis for mathematical study. (SHAKE and RATTLE are conjugate methods, meaning that one can be related to the other via a simple reflexive coordinate transformation; see [14].) The RATTLE method for the constrained system (6) is

$$q^{n+1} = q^n + hM^{-1}p^{n+1/2}, \qquad (7)$$

$$p^{n+1/2} = p^n - \frac{h}{2}\nabla U(q^n) - \frac{h}{2}g'(q^n)^T\lambda^n, \qquad (8)$$

$$0 = g(q^{n+1}), \qquad (9)$$

$$p^{n+1} = p^{n+1/2} - \frac{h}{2}\nabla U(q^{n+1}) - \frac{h}{2}g'(q^{n+1})^T\mu^{n+1}, \qquad (10)$$

$$0 = g'(q^{n+1})M^{-1}p^{n+1}, \qquad (11)$$

where additional multipliers $\mu^n \in \mathbb{R}^m$ have been introduced to satisfy the "hidden constraint" on the momentum. This method is implemented in two stages. The first two equations may be inserted into the constraint equation (9) and the multipliers $\lambda$ rescaled ($\Lambda = (h^2/2)\lambda^n$) to obtain

$$g(\bar{Q}^n - G^n\Lambda) = 0.$$

In this expression, $\bar{Q}^n := q^n + hM^{-1}p^n - (h^2/2)M^{-1}\nabla U(q^n)$ and $G^n = M^{-1}g'(q^n)^T$ are both known at the start of the step; thus we have $m$ equations for the $m$ variables $\Lambda$. This system may be solved by a Newton iteration [4] or by a Gauss-Seidel-Newton iteration [17]. Once $\Lambda$ is known, $p^{n+1/2}$ and $q^{n+1}$ are easily found. Equations (10) and (11) are then seen to represent a linear system for $\mu^{n+1}$. Once this is determined, $p^{n+1}$ can be found, and the step is complete. Note that a crucial feature of the RATTLE method is that, while implicit (it involves the solution of nonlinear equations at each step), the method only requires a single unconstrained force evaluation ($F(q) = -\nabla U(q)$) at each timestep.

Properties of Numerical Methods

As the typical numerical methods used in (microcanonical) molecular dynamics may be viewed as mappings that approximate the flow map, it becomes possible to discuss them formally using the same language as one would use to discuss the flow map of the dynamical system. The global, structural, or geometric properties of the flow map approximation have important consequences in molecular simulation. The general study of numerical methods preserving geometric structures is referred to as geometric integration or, sometimes, mimetic discretization.

Almost all numerical methods, including all those mentioned above, preserve linear symmetries such as the translation symmetry (or linear invariants like the total momentum). Some numerical methods (e.g., Verlet) preserve angular momentum. The time-reversal symmetry mentioned previously may be expressed in terms of the flow map by the relation

$$\Phi_t \circ R = R \circ \Phi_t^{-1},$$

where $R$ is the involution satisfying $R(q, p) = (q, -p)$. A time-reversible one-step method is one that shares this property of the flow map. For example, the implicit midpoint method and the Verlet method are both time reversible. Thus stepping forward a timestep, then changing the sign of $p$, then stepping forward a timestep and changing again the sign of $p$ returns us to our starting point. The time-reversal symmetry is often heralded as an important feature of methods, although it is unclear what role it plays


in simulations of large-scale chaotic systems. It is often used as a check for correct implementation of complicated numerical methods in software (along with energy conservation). The Symplectic Property and Its Implications Some numerical methods share the symplectic property of the flow map. Specifically those derived by Hamiltonian splitting are always symplectic, since the symplectic maps form a group under composition. The Verlet method is a symplectic method since it is constructed by composing the flow maps associated to the Hamiltonians H1 D p T M 1 p=2 and H2 D U.q/. A symplectic numerical method applied to solve a system with Hamiltonian H can be shown to closely approximate the flow map of a modified system with Hamiltonian energy HQ h D H C hr H .r/ C hrC1 H .rC1/ C : : : ; where r is the order of the method. More precisely, the truncated expansion may be used to approximate the dynamics on a bounded set (the global properties of the truncation are not known). As the number of terms is increased, the error in the approximation, on a finite domain, initially drops but eventually may be expected to increase (as more terms are taken). Thus there is an optimal order of truncation. Estimates of this optimal order have been obtained, suggesting that the approximation error can be viewed as exponentially small in h, i.e., of the form O.e 1= h /, as h ! 0. The existence of a perturbed Hamiltonian system whose dynamics mimic those of the original system is regarded as significant in geometric integration theory. One consequence of the perturbative expansion is that the energy error will remain of order O.hp / on a time interval that is long compared to the stepsize. The implication of these formal estimates for molecular dynamics has been examined in some detail [5]; their existence is a strong reason to favor symplectic methods for molecular dynamics simulation. 
In the case of constraints, it is possible to show that the RATTLE method (7)–(11) defines a symplectic map on the cotangent bundle of the configuration manifold, while also being time reversible [14]. Theoretical and practical issues in geometric integration, including methods for constructing symplectic integrators and methods for constraints and rigid bodies, are addressed in [11, 13].


The Force Calculation
In almost all molecular simulations, the dominant computational cost is the force calculation that must be performed at each timestep. In theory, this calculation (for the interactions of atoms within the simulation cell) requires computation of O(N²) square roots, where N is the number of atoms, so if N is more than a few thousand, the time spent in computing forces will dwarf all other costs. (The square roots are the most costly element of the force calculation.) If only Lennard-Jones potentials are involved, then the cost can be easily and dramatically reduced by use of a cutoff, i.e., by smoothly truncating φ_LJ at a prescribed distance, typically 2σ or greater. When long-ranged Coulombic forces are involved, the situation is much different, and it is necessary to evaluate (or approximate) these for both the simulation cell and neighbor cells and even for more distant cell replicas. One of the most popular schemes for evaluating the Coulombic potentials and forces is the Particle-Mesh Ewald (PME) method, which relies on the decomposition U_Coulomb = U_s.r. + U_l.r., where U_s.r. and U_l.r. represent short-ranged and long-ranged components, respectively; such a decomposition may be obtained by splitting the pair potentials. The long-ranged part is assumed to involve the particles in distant periodic replicas of the simulation cell. The short-ranged part is then evaluated by direct summation, while the long-ranged part is calculated in the Fourier domain (based on Parseval's relation) as Σ_k Ũ(k)|ρ̃(k)|², where Ũ(k) is the Fourier transform of the potential and ρ̃ is the Fourier transform of the charge density in the central simulation cell, the latter calculated by approximation on a regular discrete lattice and use of the fast Fourier transform (FFT). The exact placement of the cutoffs (which determines what part of the computation is done in physical space and what part in reciprocal space) has a strong bearing on efficiency.
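The short-/long-range splitting of the pair potential can be sketched with the standard complementary-error-function decomposition of 1/r. The parameter beta below is a hypothetical splitting width, and the full PME mesh/FFT stage is omitted:

```python
import math

def coulomb_split(r, beta=3.0):
    """Split the Coulomb pair potential 1/r into a rapidly decaying
    short-ranged piece (summed directly) and a smooth long-ranged piece
    (handled in Fourier space in PME). beta is a hypothetical splitting
    parameter controlling where the crossover occurs."""
    short = math.erfc(beta * r) / r    # negligible beyond a modest cutoff
    smooth = math.erf(beta * r) / r    # slowly varying, Fourier-friendly
    return short, smooth

# the two pieces recombine exactly to 1/r at any separation
s, l = coulomb_split(1.0)
assert abs(s + l - 1.0) < 1e-12
```

Because erfc decays faster than exponentially, the short-ranged piece is effectively zero past a few multiples of 1/beta, which is what makes a real-space cutoff safe for that part of the sum.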
Alternative approaches to handling the long-ranged forces in molecular modeling include multigrid methods and the fast multipole method (FMM) of Greengard and Rokhlin [9].

Temperature and Pressure Controls
Molecular models formulated as conservative (Hamiltonian) systems usually need modification to allow specification of a particular temperature or pressure.



Thermostats may be viewed as substituting a simplified model for an extended system in such a way as to correctly reflect energetic exchanges between a modeled system and the unresolved components. Likewise, barostats are the means by which a system is reduced while maintaining the correct exchange of momentum. The typical approach is to incorporate auxiliary variables and possibly stochastic perturbations into the equations of motion in order that the canonical ensemble, for example (in the case of a thermostat), rather than the microcanonical ensemble is preserved. For details of these methods, refer to the entry Sampling Techniques for Computational Statistical Physics.
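As a minimal illustration of a stochastic thermostat (a generic Langevin "BAOAB"-type sketch on a harmonic oscillator, not any particular package's scheme), the Ornstein-Uhlenbeck momentum refresh drives the long-run average kinetic energy toward kT/2 per degree of freedom:

```python
import math, random

def baoab_step(q, p, h, gamma, kT, rng):
    """One BAOAB-style Langevin step for H = p^2/2 + q^2/2 (unit mass):
    half kick, half drift, exact Ornstein-Uhlenbeck momentum refresh,
    half drift, half kick. A sketch, not any package's thermostat."""
    p += -0.5 * h * q
    q += 0.5 * h * p
    c = math.exp(-gamma * h)
    p = c * p + math.sqrt(kT * (1.0 - c * c)) * rng.gauss(0.0, 1.0)  # friction + thermal noise
    q += 0.5 * h * p
    p += -0.5 * h * q
    return q, p

rng = random.Random(1)
q, p, ke, n = 1.0, 0.0, 0.0, 100000
for _ in range(n):
    q, p = baoab_step(q, p, 0.1, 1.0, 1.0, rng)
    ke += 0.5 * p * p
avg_ke = ke / n   # approaches kT/2 = 0.5 for long runs
```

With gamma set to zero the noise and friction vanish and the step reduces to plain Verlet, so the same code interpolates between microcanonical and canonical sampling.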

References
1. Alder, B.J., Wainwright, T.E.: Phase transition for a hard sphere system. J. Chem. Phys. 27, 1208–1209 (1957)
2. Allen, M.P., Tildesley, D.J.: Computer Simulation of Liquids. Oxford University Press, Oxford, UK (1988)
3. Andersen, H.: RATTLE: a "velocity" version of the SHAKE algorithm for molecular dynamics calculations. J. Comput. Phys. 52, 24–34 (1983)
4. Barth, E., Kuczera, K., Leimkuhler, B., Skeel, R.: Algorithms for constrained molecular dynamics. J. Comput. Chem. 16, 1192–1209 (1995)
5. Engle, R.D., Skeel, R.D., Drees, M.: Monitoring energy drift with shadow Hamiltonians. J. Comput. Phys. 206, 432–452 (2005)
6. Examples of popular molecular dynamics software packages include AMBER (http://en.wikipedia.org/wiki/AMBER), CHARMM (http://en.wikipedia.org/wiki/CHARMM), GROMACS (http://en.wikipedia.org/wiki/Gromacs), and NAMD (http://en.wikipedia.org/wiki/NAMD)
7. Frenkel, D., Smit, B.: Understanding Molecular Simulation: From Algorithms to Applications, 2nd edn. Academic Press, San Diego (2001)
8. Goldstein, H.: Classical Mechanics, 3rd edn. Addison-Wesley, Lebanon (2001)
9. Greengard, L., Rokhlin, V.: A fast algorithm for particle simulations. J. Comput. Phys. 73, 325–348 (1987)
10. Haile, J.M.: Molecular Dynamics Simulation: Elementary Methods. Wiley, Chichester (1997)
11. Hairer, E., Lubich, C., Wanner, G.: Geometric Numerical Integration: Structure-Preserving Algorithms for Ordinary Differential Equations. Springer, Berlin/New York (2006)
12. Hoogerbrugge, P.J., Koelman, J.M.V.A.: Simulating microscopic hydrodynamic phenomena with dissipative particle dynamics. Europhys. Lett. 19, 155–160 (1992)
13. Leimkuhler, B., Reich, S.: Simulating Hamiltonian Dynamics. Cambridge University Press, Cambridge, UK/New York (2004)

14. Leimkuhler, B., Skeel, R.: Symplectic numerical integrators in constrained Hamiltonian systems. J. Comput. Phys. 112, 117–125 (1994)
15. Ma, Q., Izaguirre, J., Skeel, R.D.: Verlet-I/r-RESPA is limited by nonlinear instability. SIAM J. Sci. Comput. 24, 1951–1973 (2003)
16. Rahman, A.: Correlations in the motion of atoms in liquid argon. Phys. Rev. 136, A405–A411 (1964)
17. Ryckaert, J.-P., Ciccotti, G., Berendsen, H.: Numerical integration of the Cartesian equations of motion of a system with constraints: molecular dynamics of n-alkanes. J. Comput. Phys. 23, 327–341 (1977)
18. Schlick, T.: Molecular Modeling and Simulation. Springer, New York (2002)
19. Schlick, T., Collepardo-Guevara, R., Halvorsen, L.A., Jung, S., Xiao, X.: Biomolecular modeling and simulation: a field coming of age. Q. Rev. Biophys. 44, 191–228 (2011)
20. Tersoff, J.: New empirical approach for the structure and energy of covalent systems. Phys. Rev. B 37, 6991–7000 (1988)
21. Tuckerman, M., Berne, B.J., Martyna, G.J.: Reversible multiple time scale molecular dynamics. J. Chem. Phys. 97, 1990–2001 (1992)
22. van Duin, A.C.T., Dasgupta, S., Lorant, F., Goddard, W.A., III: J. Phys. Chem. A 105, 9396–9409 (2001)
23. Verlet, L.: Computer "experiments" on classical fluids. I. Thermodynamical properties of Lennard-Jones molecules. Phys. Rev. 159, 98–103 (1967)
24. Yoshida, H.: Construction of higher order symplectic integrators. Phys. Lett. A 150, 262–268 (1990)

Molecular Dynamics Simulations

Tamar Schlick
Department of Chemistry, New York University, New York, NY, USA

Overview
Despite inherent limitations and approximations, molecular dynamics (MD) is considered today the gold standard computational technique by which to explore molecular motion on the atomic level. Essentially, MD can be considered statistical mechanics by numbers, or Laplace's vision [1] of Newtonian physics on modern supercomputers [2]. The impressive progress in the development of biomolecular force fields, coupled to spectacular computer technology advances, has now made it possible to transform this vision into a reality, by overcoming the difficulty noted by Dirac of solving the equations of motion for multi-body systems [3].


MD's esteemed stature stems from many factors. Fundamentally, MD is well grounded in theory, namely, Newtonian physics: the classical equations of motion are solved repeatedly and numerically at small time increments. Moreover, MD simulations can, in theory, sample molecular systems on both spatial and temporal domains and thus address equilibrium, kinetic, and thermodynamic questions that span problems from radial distribution functions of water, to protein-folding pathways, to ion transport mechanisms across membranes. (Natural extensions to bond breaking/forming events using quantum/classical-mechanics hybrid formulations are possible.) With steady improvements in molecular force fields, careful treatment of numerical integration issues, adequate statistical analyses of the trajectories, and increasing computer speed, MD simulations are likely to improve in both quality and scope and be applicable to important molecular processes that hold many practical applications to medicine, technology, and engineering. Since the first successful applications to protein dynamics were reported in the 1970s, MD has become a popular and universal tool, "as if it were the differential calculus" [4]. In fact, Fig. 1 shows that, among the modeling and simulation literature citations, MD leads as a technique. Moreover, open-source MD programs have made its usage more attractive (Fig. 1b). MD is in fact one of the few tools available, by both experiment and theory, to probe molecular motion on the atomic scale. By following the equations of motion as dictated by a classical molecular mechanics force field, complex relationships among biomolecular structure, flexibility, and function can be investigated, as illustrated in the examples of Fig. 2.
Today’s sophisticated dynamics programs, like NAMD or GROMACS, adapted to parallel and massively parallel computer architectures, as well as specialized hardware, have made simulations of biomolecular systems in the microsecond range routinely feasible in several weeks of computing. Special hardware/software codesign is pushing the envelope to long time frames (see separate discussion below). Though the well-recognized limitations of sampling in atomistic dynamics, as well as in the governing force fields, have led to many innovative sampling alternatives to enhance coverage of the thermally accessible conformational space, many approaches still rely on MD for local sampling.



Molecular Dynamics Simulations, Fig. 1 Metrics for the rise in popularity of molecular dynamics. The number of molecular modeling and simulation papers is shown, grouped by simulation technique in (a) and by reference to an MD package in (b). Numbers are obtained from a search in the ISI Web of Science using the query words molecular dynamics, biomolecular simulation, molecular modeling, molecular simulation, and/or biomolecular modeling

Overall, MD simulations and related modeling techniques have been used by experimental and computational scientists alike for numerous applications: to refine experimental data, shed further insight on structural and dynamical phenomena, and



Molecular Dynamics Simulations, Fig. 2 MD application examples. The illustrations show the ranges of motion or structural information that can be captured by dynamics simulations: (a) protein (DNA polymerase) motions [110], (b) active site details gleaned from a polymerase system complexed to misaligned DNA [111], (c) differing protein/DNA flexibility for eight single-variant mutants of DNA polymerase [112], and (d) DNA simulations. In all cases, solvent and salt are included in the simulation but not shown in the graphics for clarity.

help resolve experimental ambiguities. See [5, 6] for recent assessment studies. Specifically, applications extend to refinement of X-ray diffraction and NMR structures; interpretation of single-molecule force-extension curves (e.g., [5]) or NMR spin-relaxation in proteins (e.g., [7–9]); improvement of structure-based function predictions, for example, by predicting calcium binding sites [10]; linking of static experimental structures to implied pathways (e.g., [11, 12]); estimating the importance of quantum effects in lowering free-energy barriers of biomolecular reactions [13]; presenting structural predictions; deducing reaction mechanisms; proposing free energy pathways and associated mechanisms (e.g., [14–16]); resolving or shedding light on experimental ambiguities, for example, involving chromatin fiber structure (zigzag or solenoid) [17] or G-quadruplex architecture (parallel or antiparallel backbone arrangements) [18]; and designing new folds and

compounds, including drugs and enzymes (e.g., [19– 22]). Challenging applications to complex systems like membranes, to probe associated structures, motions, and interactions (e.g., [23–25]), further demonstrate the utility of MD for large and highly charged systems.

Historical Perspective
Several selected simulations that exemplify the field's growth are illustrated in Fig. 3 (see full details in [26]). The first MD simulation of a biological process was for the small protein BPTI (Bovine Pancreatic Trypsin Inhibitor) in vacuum [27], which revealed substantial atomic fluctuations on the picosecond timescale. DNA simulations of 12- and 24-base-pair (bp) systems in 1983 [28], in vacuum without electrostatics (of length about 90 ps), and of a DNA pentamer in 1985,

Molecular Dynamics Simulations

with 830 water molecules and 8 sodium ions and full electrostatics (of length 500 ps) [29], revealed stability problems for nucleic acids and the importance of considering long-range electrostatic interactions; in the olden days, DNA strands untwisted and separated in some cases [28]. Stability became possible with the introduction of scaled phosphate charges in other pioneering nucleic-acid simulations [30–32] and the presentation a decade later of more advanced treatments for solvation and long-range electrostatics [33]. The field developed at a dazzling pace in the 1980s with the advent of supercomputers. For example, a 300 ps dynamics simulation of the protein myoglobin in 1985 [34] was considered three times longer than the longest previous MD simulation of a protein; the work indicated a slow convergence of many thermodynamic properties. System complexity also increased, as demonstrated by the ambitious, large-scale phospholipid aggregate simulation in 1989 of length 100 ps [35]. In the late 1990s, long-range electrostatics and parallel processing for speedup were widely exploited [36]. For example, a 100 ps simulation in 1997 of an estrogen/DNA system [37] sought to explain the mechanism underlying DNA sequence recognition by the protein; it used the multipole electrostatic treatment and parallel computer architecture. The dramatic effect of fast electrostatics on stability was further demonstrated by the Beveridge group [38], whose 1998 DNA simulation employing the alternative Particle Mesh Ewald (PME) treatment uncovered interesting properties of A-tract sequences. The protein community soon jumped on the MD bandwagon with the exciting realization that proteins might be folded using the MD technique. In 1998, simulations captured reversible, temperature-dependent folding of peptides within 200 ns [39], and a landmark simulation by the late Peter Kollman made headlines by approaching the folded structure of a villin headpiece within 1 μs [40].
This villin simulation was considered longer by three orders of magnitude than prior simulations and required 4 months of dedicated supercomputing. MD triumphs for systems that challenged practitioners due to large system sizes and stability issues soon followed, for example, the bc1 protein embedded in a phospholipid bilayer [41] for over 1 ns, and an aquaporin membrane channel protein in a lipid membrane for 5 ns [42]; both suggested mechanisms and pathways for transport.


The usage of many short trajectories to simulate the microsecond timescale on a new distributed computing paradigm, instead of one long simulation, was alternatively applied to protein folding using Folding@home a few years later (e.g., protein G hairpin for 38 μs aggregate dynamics) [43, 44]. Soon after, many long folding simulations were reported, with specialized programs that exploit high-speed multiple-processor systems and/or specialized computing resources, such as a 1.2 μs simulation of a DNA dodecamer with a MareNostrum supercomputer [45], a 1.2 μs simulation for ubiquitin with the program NAMD [46], a 20 μs simulation for the β2AR protein with the Desmond program [47], and small proteins like villin and a WW domain for over 1 μs [48]. Simulations of large systems, such as viruses containing one million atoms, are also noteworthy [49]. Indeed, the well-recognized timestep problem in MD integration – the requirement for small timesteps to ensure numerical stability – has limited the biological time frames that can be sampled and thus has motivated computer scientists, engineers, and biophysical scientists alike to design special-purpose hardware for MD. Examples include a transputer computer by Schulten and colleagues in the early 1990s [50], a Japanese MD product engine [51], IBM's petaflop Blue Gene supercomputer for protein folding [52, 53], and D. E. Shaw Research's Anton machine [54]. A milestone of 1 ms simulations was reached with Anton in 2010 for two small proteins studied previously (BPTI and the WW domain of Fip35) [55]. An extrapolation of the trends in Fig. 3 suggests that we will attain the milestone of 1-s simulations in 2015! At the same time as these longer timescales and more complex molecular systems are being simulated by atomistic MD, coarse-grained models and combinations of enhanced configurational sampling methods are emerging in tandem as viable approaches for simulating large macromolecular assemblies [56–59].
This is because computer power alone is not likely to solve the sampling problem in general, and noted force field imperfections [60] argue for consideration of alternative states besides lowest energy forms. Long simulations also invite more careful examination of long-time trajectory stability and other numerical issues which have thus far not been possible to study. Indeed, even with quality integrators, energy drifts, resonance artifacts, and chaotic events are expected over millions and more integration steps.




Molecular Dynamics Simulations, Fig. 3 MD evolution examples. The field's evolution in simulation scope (system and simulation length) is illustrated through representative systems discussed in the text. See [26] for more details. Extrapolation of the trends suggests that we will reach a second-length simulation in 2015.


Early on, it was recognized that MD has potential application to drug design, through the identification of motions that can be suppressed to affect function, and through estimates of binding free energies. More generally, modeling molecular structures and dynamics can help define molecular specificity and clarify functional aspects that are important for drug development [61]. Already in 1984, the hormone-producing linear decapeptide GnRH (gonadotropin-releasing hormone) was simulated for 150 ps to explore its pharmaceutical potential [62]. Soon after the HIV protease was solved experimentally, 100 ps MD simulations suggested that a fully open conformation of the protease "flaps" may be favorable for drug access to the active site [63–67]. Recent simulations have also led to design proposals [68] and other insights into HIV protease/drug interactions [21]. MD simulations of the HIV integrase have further suggested that inhibitors could bind in more than one orientation [69, 70], that binding modes can be selected to exploit stronger interactions in specific regions and orientations [69, 71, 72], and that different divalent-ion arrangements are associated with these binding sites and fluctuations [70]. Molecular modeling and simulation continue to play a role in structure-based drug discovery, though modern challenges in the development of new drug entities argue for a broader systems-biology multidisciplinary approach [73, 74].

Algorithmic Issues: Integration, Resonance, Fast Electrostatics, and Enhanced Sampling
When solving the equations of motion numerically, the discretization timesteps must be sufficiently small to ensure reasonable accuracy as well as stability [26]. The errors (in energies and other ensemble averages) grow rapidly with timestep size, and the stability is limited by the inherent periods of the motion components, which range from 10 fs for light-atom bond stretching to milliseconds and longer for slower collective motions [75]. Moreover, the crowded frequency spectrum that spans this large range of six or more orders of magnitude is intricately coupled (see Fig. 4). For example, bond vibrations lead to angular displacements which in turn trigger side-chain motions and collective motions. Thus, integrators that have worked in other applications that utilize timescale separation, mode filtering, or mode neglect are not possible for


biomolecules in general. For this reason, analysis of MD integrators has focused on establishing reliable integrators that are simple to implement and as low as possible in computational requirements (i.e., dominated by one force evaluation per timestep). When mathematicians began analysis of MD integrators in the late 1980s, it was a pleasant surprise to discover that the Verlet method, long used for MD [76], was symplectic, that is, preserving of phase-space volume and related Hamiltonian invariants [77]. Further rigorous studies of symplectic integrators, including Verlet variants such as leapfrog and velocity Verlet and constrained dynamics formulations (e.g., [26, 77]), have provided guidelines for researchers to correctly generate MD trajectories and analyze the stability of a simulation in terms of energy conservation and the robustness of the simulation with respect to the timestep size. For example, the Verlet stability limit for characteristic motions is shown in Fig. 4. Resonance artifacts in MD simulations were also later described as more general numerical stability issues that occur when timesteps are related to the natural frequency of the system as certain fractions (e.g., one third the period; see Fig. 4) [78, 79]. Highlighting resonance artifacts in MD simulations, predicting resonant timesteps, and establishing stochastic solutions to resonances [80–82] have all led to an improved understanding and quality of MD simulations, including effective multiple-timestep methods [26, 83]. Note that because the value of the inner timestep in multiple-timestep methods is limited by stability and resonance limits, even these methods do not produce dramatic computational advantages.
The advent of the efficient particle mesh Ewald (PME) [84] and related methods [85–87] for evaluation of the long-range electrostatic interactions, which constitute the most time-consuming part of a biomolecular simulation, has made possible more realistic MD simulations without nonbonded cutoffs, as discussed above. A problem that in part remains unsolved involves the optimal integration of PME methods with multiple-timestep methods and the parallelization of PME implementations. The presence of fast terms in the reciprocal Ewald component limits the outer timestep and hence the speedup [83, 88–91]. Moreover, memory requirements create a bottleneck in typical PME implementations in MD simulations longer than a microsecond. This is because the contribution of the long-range electrostatic forces imposes a global data dependency on all the system charges; in practice, this implies



Molecular Dynamics Simulations, Fig. 4 Biomolecular motion ranges. Representative motions with associated periods are shown, along with associated timestep limits for third-order resonance and linear stability for high-frequency modes. (Time periods for the highest frequency motions are derived from frequency data in [113]. Values for adenine butterfly motion are obtained from [114]. Timescales for DNA twisting, bending, and site juxtaposition are taken from [26]. Pol β's estimated 1 ns open-to-closed hinge motion is taken from [115]. The diffusion time for ubiquitin is from [116]. The transition-path time for local conformational changes in BPTI is obtained from [55]. The folding timescales are taken as follows: β-heptapeptide [39], C-terminal β-hairpin of protein G [117], villin headpiece subdomain [118], Fip35 [55], N-terminal domain of ribosomal protein L9 [NTL9(1–39)] [119], and mercury binding protein (MerP) [120]. The approximate time for a single ion to traverse a channel is from [121]. The reverse translocation time of tRNA within the ribosome is from [122].)


communication problems [92]. Thus, much work goes into optimizing associated mesh sizes, precision, and sizes of the real and inverse spaces to delay the communication bottleneck as long as possible (e.g., [54]), but overall errors in long simulations are far from trivial [83, 93]. In addition to MD integration and electrostatic calculations, sampling the vast configurational space has also triggered many innovative approaches to capture "rare events." The many enhanced sampling methods are either independent of MD or based on MD. In the former class, as recently surveyed [58, 94, 95], are various Monte Carlo approaches, harmonic approximations, and coarse-grained models. These can yield valuable conformational insights into biomolecular structure and flexibility, despite altered kinetics. Although Monte Carlo methods are not always satisfactory for large systems in their own right, they form essential components of more sophisticated methods [59] like transition path sampling [96] and Markov chain Monte Carlo sampling [97]. More generally, MD-based methods for enhanced sampling of biomolecules can involve modification of the potential (like accelerated MD [98]), the simulation protocol (like replica-exchange MD or REMD [99]), or the algorithm. However, global formulations such as transition path sampling [96, 100], forward flux simulation [101], and Markov state models [102] are needed more generally not only to generate more configurations or to suggest mechanistic pathways but also to compute free energy profiles for the reaction and describe detailed kinetics profiles including reaction rates. There are many successful reports of using tailored enhanced sampling methods (e.g., [11, 103–109]), but applications at large to biomolecules, especially in the presence of incomplete experimental endpoints, remain a challenge.
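For instance, the swap move at the heart of replica-exchange MD reduces to a one-line Metropolis test on the inverse temperatures and instantaneous energies of two replicas (a generic sketch, not tied to any MD package):

```python
import math, random

def remd_swap(beta_i, beta_j, e_i, e_j, rng=random):
    """Metropolis criterion for exchanging the configurations of two replicas
    at inverse temperatures beta_i, beta_j with instantaneous potential
    energies e_i, e_j: accept with probability min(1, exp(delta))."""
    delta = (beta_i - beta_j) * (e_i - e_j)
    return delta >= 0.0 or rng.random() < math.exp(delta)
```

A swap that moves the lower-energy configuration to the colder replica is always accepted, while unfavorable swaps are accepted with exponentially small probability, so neighboring temperature levels must overlap in energy for exchanges to occur at a useful rate.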

Conclusion
When executed with vigilance in terms of problem formulation, implementational details, and force field choice, atomic-level MD simulations present an attractive technique to visualize molecular motion and estimate many properties of interest in the thermally accessible conformational space, from equilibrium distributions to configurational transitions and pathways. The application scope can be as creative as the scientist


performing the simulation: from structure prediction to drug design to new mechanistic hypotheses about a variety of biological processes, for a single molecule or a biomolecular complex. Extensions of MD to enhanced sampling protocols and coarse-graining simulations are further enriching the tool kit that modelers possess, and dramatic advances in computer speed, including specialized computer architecture, are driving the field through exciting milestones. Perhaps more than any other modeling technique, proper technical details in simulation implementation and analysis are crucial for the reliability of the biological interpretations obtained from MD trajectories. Thus, both expertise and intuition are needed to dissect correct from nonsensical behavior within the voluminous data that can be generated quickly. In the best cases, MD can help sift through conflicting experimental information and provide new biological interpretations, which can in turn be subjected to further experimentation. Still, MD should never be confused with reality! As larger biomolecular systems and longer simulation times become possible, new interesting questions also arise and need to be explored. These concern the adequacy of force fields as well as long-time stability and error propagation of the simulation algorithms. For example, a 10 μs simulation of the β-protein Fip35 [46] did not provide the anticipated folded conformation nor the folding trajectory from the extended state, as expected from experimental measurements; it was determined subsequently that force field inaccuracies for β-protein interactions affect the results, and not incorrect sampling [60]. In addition, the effects of shortcuts often taken (e.g., relatively large, 2.5 fs, timesteps, which imply corruption by third-order resonances as shown in Fig. 4, and rescaling of velocities to retain ensemble averages) will have to be examined in detail over very long trajectories.
The rarity of large-scale conformational transitions and the stochastic and chaotic nature of MD simulations also raise the question as to whether long simulations of one biomolecular system rather than many shorter simulations provide more cost-effective, statistically sound, and scientifically relevant information. Given the many barriers we have already crossed in addressing the fundamental sampling problem in MD, it is likely that new innovative approaches will be invented by scientists in allied fields to render MD



simulations better and faster for an ever-growing level of biological system sophistication.

Acknowledgements I thank Rosana Collepardo, Meredith Foley, and Shereef Elmetwaly for assistance with the figures.

References
1. de Laplace, P.S.: Oeuvres Complètes de Laplace. Théorie Analytique des Probabilités, vol. VII, 3rd edn. Gauthier-Villars, Paris (1820)
2. Schlick, T.: Pursuing Laplace's vision on modern computers. In: Mesirov, J.P., Schulten, K., Sumners, D.W. (eds.) Mathematical Applications to Biomolecular Structure and Dynamics. IMA Volumes in Mathematics and Its Applications, vol. 82, pp. 219–247. Springer, New York (1996)
3. Dirac, P.A.M.: Quantum mechanics of many-electron systems. Proc. R. Soc. Lond. A 123, 714–733 (1929)
4. Maddox, J.: Statistical mechanics by numbers. Nature 334, 561 (1989)
5. Lee, E.H., Hsin, J., Sotomayor, M., Comellas, G., Schulten, K.: Discovery through the computational microscope. Structure 17, 1295–1306 (2009)
6. Schlick, T., Collepardo-Guevara, R., Halvorsen, L.A., Jung, S., Xiao, X.: Biomolecular modeling and simulation: a field coming of age. Q. Rev. Biophys. 44, 191–228 (2011)
7. Tsui, V., Radhakrishnan, I., Wright, P.E., Case, D.A.: NMR and molecular dynamics studies of the hydration of a zinc finger-DNA complex. J. Mol. Biol. 302, 1101–1117 (2000)
8. Case, D.A.: Molecular dynamics and NMR spin relaxation in proteins. Acc. Chem. Res. 35, 325–331 (2002)
9. Henzler-Wildman, K.A., Thai, V., Lei, M., Ott, M., Wolf-Watz, M., Fenn, T., Pozharski, E., Wilson, M.A., Petsko, G.A., Karplus, M.: Intrinsic motions along an enzymatic reaction trajectory. Nature 450, 838–844 (2007)
10. Altman, R., Radmer, R., Glazer, D.: Improving structure-based function prediction using molecular dynamics. Structure 17, 919–929 (2009)
11. Radhakrishnan, R., Schlick, T.: Orchestration of cooperative events in DNA synthesis and repair mechanism unraveled by transition path sampling of DNA polymerase β's closing. Proc. Natl. Acad. Sci. U.S.A. 101, 5970–5975 (2004)
12. Golosov, A.A., Warren, J.J., Beese, L.S., Karplus, M.: The mechanism of the translocation step in DNA replication by DNA polymerase I: a computer simulation. Structure 18, 83–93 (2010)
13. Hu, H., Elstner, M., Hermans, J.: Comparison of a QM/MM force field and molecular mechanics force fields in simulations of alanine and glycine "dipeptides" (Ace-Ala-Nme and Ace-Gly-Nme) in water in relation to the problem of modeling the unfolded peptide backbone in solution. Proteins Struct. Funct. Genet. 50, 451–463 (2003)
14. Radhakrishnan, R., Schlick, T.: Fidelity discrimination in DNA polymerase β: differing closing profiles for a mismatched G:A versus matched G:C base pair. J. Am. Chem. Soc. 127, 13245–13252 (2005)
15. Karplus, M., Kuriyan, J.: Molecular dynamics and protein function. Proc. Natl. Acad. Sci. U.S.A. 102, 6679–6685 (2005)

Molecular Dynamics Simulations 16. Faraldo-Gomez, J., Roux, B.: On the importance of a funneled energy landscape for the assembly and regulation of multidomain Src tyrosine kinases. Proc. Natl. Acad. Sci. U.S.A. 104, 13643–13648 (2007) 17. Grigoryev, S.A., Arya, G., Correll, S., Woodcock, C.L., Schlick, T.: Evidence for heteromorphic chromatin fibers from analysis of nucleosome interactions. Proc. Natl. Acad. Sci. U.S.A. 106, 13317–13322 (2009) 18. Campbell, H., Parkinson, G.N., Reszka, A.P., Neidle, S.: Structural basis of DNA quadruplex recognition by an acridine drug. J. Am. Chem. Soc. 130, 6722–6724 (2008) 19. Neidle, S., Read, M., Harrison, J., Romagnoli, B., Tanious, F., Gowan, S., Reszka, A., Wilson, D., Kelland, L.: Structure-based design of selective and potent G quadruplex-mediated telomerase inhibitors. Proc. Natl. Acad. Sci. USA 98, 4844–4849 (2001) 20. Baker, D., Kuhlman, B., Dantas, G., Ireton, G., Varani, G., Stoddard, B.: Design of a novel globular protein fold with atomic-level accuracy. Science 302, 1364–1368 (2003) 21. Hornak, V., Simmerling, C.: Targeting structural flexibility in HIV-1 protease inhibitor binding. Drug Discov. Today 12, 132–138 (2007) 22. Jiang, L., Althoff, E.A., Clemente, F.R., Doyle, L., R¨othlisberger, D., Zanghellini, A., Gallaher, J.L., Betker, J.L., Tanaka, F., Barbas, C.F., III, Hilvert, D., Houk, K.N., Stoddard, B.L., Baker, D.: De novo computational design of retro-aldol enzymes. Science 319, 1387–1391 (2008) 23. Grossfield, A., Pitman, M.C., Feller, S.E., Soubias, O., Gawrisch, K.: Internal hydration increases during activation of the G-protein-coupled receptor rhodopsin. J. Mol. Biol. 381, 478–486 (2008) 24. Khelashvili, G., Grossfield, A., Feller, S.E., Pitman, M.C., Weinstein, H.: Structural and dynamic effects of cholesterol at preferred sites of interaction with rhodopsin identified from microsecond length molecular dynamics simulations. Proteins 76, 403–417 (2009) 25. 
Vasquez, V., Sotomayor, M., Cordero-Morales, J., Schulten, K., Perozo, E.: A structural mechanism for MscS gating in lipid bilayers. Science 321, 1210–1214 (2008) 26. Schlick, T.: Molecular Modeling: An Interdisciplinary Guide, second edn. Springer, New York (2010) 27. McCammon, J.A., Gelin, B.R., Karplus, M.: Dynamics of folded proteins. Nature 267, 585–590 (1977) 28. Levitt, M.: Computer simulation of DNA double-helix dynamics. Cold Spring Harb. Symp. Quant. Biol. 47, 251– 275 (1983) 29. Seibel, G.L., Singh, U.C., Kollman, P.A.: A molecular dynamics simulation of double-helical B-DNA including counterions and water. Proc. Natl. Acad. Sci. U.S.A. 82, 6537–6540 (1985) 30. Prabhakaran, M., Harvey, S.C., Mao, B., McCammon, J.A.: Molecular dynamics of phenylanlanine transfer RNA. J. Biomol. Struct. Dyn. 1, 357–369 (1983) 31. Harvey, S.C., Prabhakaran, M., Mao, B., McCammon, J.A.: Phenylanine transfer RNA: molecular dynamics simulation. Science 223, 1189–1191 (1984) 32. Tidor, B., Irikura, K.K., Brooks, B.R., Karplus, M.: Dynamics of DNA oligomers. J. Biomol. Struct. Dyn. 1, 231–252 (1983) 33. Cheatham, T.E., III, Miller, J.L., Fox, T., Darden, T.A., Kollman, P.A.: Molecular dynamics simulations

Molecular Dynamics Simulations

34.

35.

36.

37.

38.

39.

40.

41.

42.

43.

44.

45.

46.

47.

48.

of solvated biomolecular systems: the particle mesh Ewald method leads to stable trajectories of DNA, RNA, and proteins. J. Am. Chem. Soc. 117, 4193–4194 (1995) Levy, R.M., Sheridan, R.P., Keepers, J.W., Dubey, G.S., Swaminathan, S., Karplus, M.: Molecular dynamics of myoglobin at 298K. Results from a 300-ps computer simulation. Biophys. J. 48, 509–518 (1985) Wendoloski, J.J., Kimatian, S.J., Schutt, C.E., Salemme, F.R.: Molecular dynamics simulation of a phospholipid micelle. Science 243, 636–638 (1989) Schlick, T., Skeel, R.D., Br¨unger, A.T., Kal´e, L.V., Board, J.A., Jr., Hermans, J., Schulten, K.: Algorithmic challenges in computational molecular biophysics. J. Comput. Phys. 151, 9–48 (1999) (Special Volume on Computational Biophysics) Kosztin, D., Bishop, T.C., Schulten, K.: Binding of the estrogen receptor to DNA: the role of waters. Biophys. J. 73, 557–570 (1997) Young, M.A., Beveridge, D.L.: Molecular dynamics simulations of an oligonucleotide duplex with adenine tracts phased by a full helix turn. J. Mol. Biol. 281, 675–687 (1998) Daura, X., Jaun, B., Seebach, D., Van Gunsteren, W.F., Mark, A.: Reversible peptide folding in solution by molecular dynamics simulation. J. Mol. Biol. 280, 925– 932 (1998) Duan, Y., Kollman, P.A.: Pathways to a protein folding intermediate observed in a 1-microsecond simulation in aqueous solution. Science 282, 740–744 (1998) Izrailev, S., Crofts, A.R., Berry, E.A., Schulten, K.: Steered molecular dynamics simulation of the Rieske subunit motion in the cytochrome bc1 complex. Biophys. J. 77, 1753– 1768 (1999) Tajkhorshid, E., Nollert, P., Ø Jensen, M., Miercke, L.J.W., O’Connell, J., Stroud, R.M., Schulten, K.: Control of the selectivity of the aquaporin water channel family by global orientational tuning. Science 296, 525–530 (2002) Snow, C.D., Nguyen, H., Pande, V.S., Gruebele, M.: Absolute comparison of simulated and experimental protein folding dynamics. 
Nature 420, 102–106 (2002) Ensign, D.L., Kasson, P.M., Pande, V.S.: Heterogeneity even at the speed limit of folding: large-scale molecular dynamics study of a fast-folding variant of the villin headpiece. J. Mol. Biol. 374, 806–816 (2007) P´erez, A., Luque, J., Orozco, M.: Dynamics of B-DNA on the microsecond time scale. J. Am. Chem. Soc. 129, 14739–14745 (2007) Freddolino, P.L., Liu, F., Gruebele, M., Schulten, K.: Tenmicrosecond molecular dynamics simulation of a fastfolding WW domain. Biophys. J. 94, L75–L77 (2008) Dror, R.O., Arlow, D.H., Borhani, D.W., Ø Jensen, M., Piana, S., Shaw, D.E.: Identification of two distinct inactive conformations of the 2-adrenergic receptor reconciles structural and biochemical observations. Proc. Natl. Acad. Sci. U.S.A. 106, 4689–4694 (2009) Mittal, J., Best, R.B.: Tackling force-field bias in protein folding simulations: folding of villin HP35 and Pin WW domains in explicit water. Biophys. J. 99, L26–L28 (2010)

949 49. Freddolino, P.L., Arkhipov, A.S., Larson, S.B., McPherson, A., Schulten, K.: Molecular dynamics simulations of the complete satellite tobacco mosaic virus. Structure 14, 437–449 (2006) 50. Heller, H., Schulten, K.: Parallel distributed computing for molecular dynamics: simulation of large heterogeneous systems on a systolic ring of transputers. Chem. Des. Autom. News 7, 11–22 (1992) 51. Toyoda, S., Miyagawa, H., Kitamura, K., Amisaki, T., Hashimoto, E., Ikeda, H., Kusumi, A., Miyakawa, N.: Development of MD engine: high-speed accelerator with parallel processor design for molecular dynamics simulations. J. Comput. Chem. 20, 185–199 (1999) 52. Butler, D.: IBM promises scientists 500-fold leap in supercomputing power. . . . . . and a chance to tackle protein structure. Nature 402, 705–706 (1999) 53. Zhou, R., Eleftheriou, M., Hon, C.-C., Germain, R.S., Royyuru, A.K., Berne, B.J.: Massively parallel molecular dynamics simulations of lysozyme unfolding. IBM J. Res. Dev. 52, 19–30 (2008) 54. Shaw, D.E., Dror, R.O., Salmon, J.K., Grossman, J.P., Mackenzie, K.M., Bank, J.A., Young, C., Deneroff, M.M., Batson, B., Bowers, K.J., Chow, E., Eastwood, M.P., Ierardi, D.J., Klepeis, J.L., Kuskin, J.S., Larson, R.H., Lindorff-Larsen, K., Maragakis, P., Moraes, M.A., Piana, S., Shan, Y., Towles, B.: Millisecond-scale molecular dynamics simulations on Anton. In: SC ’09: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, San Diego, pp. 1–11. ACM (2009) 55. Shaw, D.E., Maragakis, P., Lindorff-Larsen, K., Piana, S., Dror, R.O., Eastwood, M.P., Bank, J.A., Jumper, J.M., Salmon, J.K., Shan, Y., Wriggers, W.: Atomic-level characterization of the structural dynamics of proteins. Science 330, 341–346 (2010) 56. Lei, H., Duan, Y.: Improved sampling methods for molecular simulation. Curr. Opin. Struct. Biol. 17, 187–191 (2007) 57. Klein, M.L., Shinoda, W.: Large-scale molecular dynamics simulations of self-assembling systems. 
Science 321, 798–800 (2008) 58. Schlick, T.: Monte Carlo, harmonic approximation, and coarse-graining approaches for enhanced sampling of biomolecular structure. F1000 Biol. Rep. 1, 48 (2009) 59. Schlick, T.: Molecular-dynamics based approaches for enhanced sampling of long-time, large-scale conformational changes in biomolecules. F1000 Biol. Rep. 1, 51 (2009) 60. Freddolino, P.L., Park, S., Roux, B., Schulten, K.: Force field bias in protein folding simulations. Biophys. J. 96, 3772–3780 (2009) 61. Schwede, T., Sali, A., Honig, B., Levitt, M., Berman, H.M., Jones, D., Brenner, S.E., Burley, S.K., Das, R., Dokholyan, N.V., Dunbrack, R.L., Jr., Fidelis, K., Fiser, A., Godzik, A., Huang, Y.J., Humblet, C., Jacobson, M.P., Joachimiak, A., Krystek, S.R., Jr., Kortemme, T., Kryshtafovych, A., Montelione, G.T., Moult, J., Murray, D., Sanchez, R., Sosnick, T.R., Standley, D.M., Stouch, T., Vajda, S., Vasquez, M., Westbrook, J.D., Wilson, I.A.: Outcome of a workshop on applications of protein models in biomedical research. Structure 17, 151–159 (2009)

M

950 62. Struthers, R.S., Rivier, J., Hagler, A.T.: Theoretical simulation of conformation, energetics, and dynamics in the design of GnRH analogs. Trans. Am. Crystallogr. Assoc. 20, 83–96 (1984). Proceedings of the Symposium on Molecules in Motion, University of Kentucky, Lexington, Kentucky, May 20–21, (1984) 63. Harte, W.E., Jr., Swaminathan, S., Beveridge, D.L.: Molecular dynamics of HIV-1 protease. Proteins Struct. Funct. Genet. 13, 175–194 (1992) 64. Collins, J.R., Burt, S.K., Erickson, J.W.: Flap opening in HIV-1 protease simulated by activated’ molecular dynamics. Nat. Struct. Mol. Biol. 2, 334–338 (1995) 65. Hamelberg, D., McCammon, J.A.: Fast peptidyl cistrans isomerization within the flexible Gly-rich flaps of HIV-1 protease. J. Am. Chem. Soc. 127, 13778–13779 (2005) 66. Tozzini, V., McCammon, J.A.: A coarse grained model for the dynamics of flap opening in HIV-1 protease. Chem. Phys. Lett. 413, 123–128 (2005) 67. Hornak, V., Okur, A., Rizzo, R.C., Simmerling, C.: HIV-1 protease flaps spontaneously open and reclose in molecular dynamics simulations. Proc. Natl. Acad. Sci. U.S.A. 103, 915–920 (2006) 68. Scott, W.R., Schiffer, C.A.: Curling of flap tips in HIV-1 protease as a mechanism for substrate entry and tolerance of drug resistance. Structure 8, 1259–1265 (2000) 69. Schames, J.R., Henchman, R.H., Siegel, J.S., Sotriffer, C.A., Ni, H., McCammon, J.A.: Discovery of a novel binding trench in HIV integrase. J. Med. Chem. 47, 1879– 1881 (2004) 70. Perryman, A.L., Forli, S., Morris, G.M., Burt, C., Cheng, Y., Palmer, M.J., Whitby, K., McCammon, J.A., Phillips, C., Olson, A.J.: A dynamic model of HIV integrase inhibition and drug resistance. J. Mol. Biol. 397, 600–615 (2010) 71. Lin, J.H., Perryman, A.L., Schames, J.R., McCammon, J.A.: Computational drug design accommodating receptor flexibility: the relaxed complex scheme. J. Am. Chem. Soc. 124, 5632–5633 (2002) 72. 
Hazuda, D.J., Anthony, N.J., Gomez, R.P., Jolly, S.M., Wai, J.S., Zhuang, L., Fisher, T.E., Embrey, M., Guare, J.P., Jr., Egbertson, M.S., Vacca, J.P., Huff, J.R., Felock, P.J., Witmer, M.V., Stillmock, K.A., Danovich, R., Grobler, J., Miller, M.D., Espeseth, A.S., Jin, L., Chen, I.W., Lin, J.H., Kassahun, K., Ellis, J.D., Wong, B.K., Xu, W., Pearson, P.G., Schleif, W.A., Cortese, R., Emini, E., Summa, V., Holloway, M.K., Young, S.D.: A naphthyridine carboxamide provides evidence for discordant resistance between mechanistically identical inhibitors of HIV-1 integrase. Proc. Natl. Acad. Sci. U.S.A. 101, 11233–11238 (2004) 73. Kitano, H.: A robustness-based approach to systemsoriented drug design. Nat. Rev. Drug Discov. 6, 202–210 (2007) 74. Munos, B.: Lessons from 60 years of pharmaceutical innovation. Nat. Rev. 8, 959–968 (2009) 75. Schlick, T.: Some failures and successes of long-timestep approaches for biomolecular simulations. In: Deuflhard, P., Hermans, J., Leimkuhler, B., Mark, A.E., Reich, S., Skeel, R.D. (eds.) Computational Molecular Dynamics: Challenges, Methods, Ideas – Proceedings of the 2nd

Molecular Dynamics Simulations

76.

77.

78.

79.

80.

81.

82.

83.

84.

85.

86.

87. 88.

89.

90.

91.

92.

International Symposium on Algorithms for Macromolecular Modelling, Berlin, May 21–24, 1997. Lecture Notes in Computational Science and Engineering (Series Eds. Griebel, M., Keyes, D.E., Nieminen, R.M., Roose, D., Schlick, T.), vol. 4, pp. 227–262. Springer, Berlin (1999) Verlet, L.: Computer ‘experiments’ on classical fluids: I. Thermodynamical properties of Lennard-Jones molecules. Phys. Rev. 159(1), 98–103 (1967) Leimkuhler, B., Reich, S.: Simulating Hamiltonian Dynamics. Cambridge Monographs on Applied and Computational Mathematics. Cambridge University Press, Cambridge (2004) Mandziuk, M., Schlick, T.: Resonance in the dynamics of chemical systems simulated by the implicit-midpoint scheme. Chem. Phys. Lett. 237, 525–535 (1995) Schlick, T., Mandziuk, M., Skeel, R.D., Srinivas, K.: Nonlinear resonance artifacts in molecular dynamics simulations. J. Comput. Phys. 139, 1–29 (1998) Schlick, T., Barth, E., Mandziuk, M.: Biomolecular dynamics at long timesteps: bridging the timescale gap between simulation and experimentation. Annu. Rev. Biophys. Biomol. Struct. 26, 179–220 (1997) Barth, E., Schlick, T.: Overcoming stability limitations in biomolecular dynamics: I. combining force splitting via extrapolation with Langevin dynamics in LN . J. Chem. Phys. 109, 1617–1632 (1998) Sweet, C.R., Petrine, P., Pande, V.S., Izaguirre, J.A.: Normal mode partitioning of Langevin dynamics for biomolecules. J. Chem. Phys. 128, 145101 (2008) Morrone, J.A., Zhou, R., Berne, B.J.: Molecular dynamics with multiple time scales: how to avoid pitfalls. J. Chem. Theory Comput. 6, 1798–1804 (2010) Essmann, U., Perera, L., Berkowitz, M.L., Darden, T., Lee, H., Pedersen, L.G.: A smooth particle mesh Ewald method. J. Chem. Phys. 103, 8577–8593 (1995) Greengard, L., Rokhlin, V.: A new version of the fast multipole method for the Laplace equation in three dimensions. Acta Numer. 6, 229–269 (1997) Skeel, R.D., Tezcan, I., Hardy, D.J.: Multiple grid methods for classical molecular dynamics. J. 
Comput. Chem. 23, 673–684 (2002) Duan, Z.-H., Krasny, R.: An Ewald summation based multipole method. J. Chem. Phys. 113, 3492–3495 (2000) Stuart, S.J., Zhou, R., Berne, B.J.: Molecular dynamics with multiple time scales: the selection of efficient reference system propagators. J. Chem. Phys. 105, 1426–1436 (1996) Procacci, P., Marchi, M., Martyna, G.J.: Electrostatic calculations and multiple time scales in molecular dynamics simulation of flexible molecular systems. J. Chem. Phys. 108, 8799–8803 (1998) Zhou, R., Harder, E., Xu, H., Berne, B.J.: Efficient multiple time step method for use with Ewald and particle mesh Ewald for large biomolecular systems. J. Chem. Phys. 115, 2348–2358 (2001) Qian, X., Schlick, T.: Efficient multiple-timestep integrators with distance-based force splitting for particle-meshEwald molecular dynamics simulations. J. Chem. Phys. 116, 5971–5983 (2002) Fitch, B.G., Rayshubskiy, A., Eleftheriou, M., Ward, T.J.C., Giampapa, M., Pitman, M.C., Germain, R.S.: Blue

Molecular Geometry Optimization, Models

93. 94. 95.

96.

97.

98.

99.

100.

101.

102.

103.

104.

105.

106.

107.

108.

109.

matter: approaching the limits of concurrency for classical molecular dynamics. In: Supercomputing, 2006. SC’06. Proceedings of the ACM/IEEE SC 2006 Conference, pp 44. ACM (2006) Snir, M.: A note on N-body computations with cutoffs. Theory Comput. Syst. 37, 295–318 (2004) Earl, D.J., Deem, M.W.: Monte Carlo simulations. Methods Mol. Biol. 443, 25–36 (2008) Liwo, A., Czaplewski, C., Oldziej, S., Scheraga, H.A.: Computational techniques for efficient conformational sampling of proteins. Curr. Opin. Struct. Biol. 18, 134– 139 (2008) Dellago, C., Bolhuis, P.G.: Transition path sampling simulations of biological systems. Top. Curr. Chem. 268, 291–317 (2007) Pan, A.C., Roux, B.: Building Markov state models along pathways to determine free energies and rates of transitions. J. Chem. Phys. 129, 064107 (2008) Grant, B.J., Gorfe, A.A., McCammon, J.A.: Large conformational changes in proteins: signaling and other functions. Curr. Opin. Struct. Biol. 20, 142–147 (2010) Sugita, Y., Okamoto, Y.: Replica-exchange molecular dynamics methods for protein folding. Chem. Phys. Lett. 314, 141–151 (1999) Bolhuis, P.G., Chandler, D., Dellago, C., Geissler, P.L.: Transition path sampling: throwing ropes over rough mountain passes, in the dark. Annu. Rev. Phys. Chem. 53, 291–318 (2002) Borrero, E.E., Escobedo, F.A.: Optimizing the sampling and staging for simulations of rare events via forward flux sampling schemes. J. Chem. Phys. 129, 024115 (2008) No´e, F., Fischer, S.: Transition networks for modeling the kinetics of conformational change in macromolecules. Curr. Opin. Struct. Biol. 8, 154–162 (2008) No´e, F., Horenko, I., Sch¨utte, C., Smith, J.C.: Hierarchical analysis of conformational dynamics in biomolecules: transition networks of metastable states. J. Chem. Phys. 126, 155102 (2007) Ozkan, S.B., Wu, G.A., Chodera, J.D., Dill, K.A.: Protein folding by zipping and assembly. Proc. Natl. Acad. Sci. U.S.A. 
104, 11987–11992 (2007) No´e, F., Schutte, C., Vanden-Eijnden, E., Reich, L., Weikl, T.R.: Constructing the equilibrium ensemble of folding pathways from short off-equilibrium simulations. Proc. Natl. Acad. Sci. U.S.A. 106, 19011–19016 (2009) Berezhkovskii, A., Hummer, G., Szabo, A.: Reactive flux and folding pathways in network models of coarse-grained protein dynamics. J. Chem. Phys. 130, 205102 (2009) Chennamsetty, N., Voynov, V., Kayser, V., Helk, B., Trout, B.L.: Design of therapeutic proteins with enhanced stability. Proc. Natl. Acad. Sci. U.S.A. 106, 11937–11942 (2009) Abrams, C.F., Vanden-Eijnden, E.: Large-scale conformational sampling of proteins using temperature-accelerated molecular dynamics. Proc. Natl. Acad. Sci. U.S.A. 107, 4961–4966 (2010) Voelz, V.A., Bowman, G.R., Beauchamp, K., Pande, V.S.: Molecular simulation of ab initio protein folding for a millisecond folder NTL9(1–39). J. Am. Chem. Soc. 132, 1526–1528 (2010)

951 110. Li, Y., Schlick, T.: Modeling DNA polymerase  motions: subtle transitions before chemistry. Biophys. J. 99, 3463– 3472 (2010) 111. Foley, M.C., Padow, V., Schlick, T.: The extraordinary ability of DNA pol  to stabilize misaligned DNA. J. Am. Chem. Soc. 132, 13403–13416 (2010) 112. Foley, M.C., Schlick, T.: Simulations of DNA pol  R517 mutants indicate 517’s crucial role in ternary complex stability and suggest DNA slippage origin. J. Am. Chem. Soc. 130, 3967–3977 (2008) 113. Colthup, N.B., Daly, L.H., Wiberley, S.E.: Introduction to Infrared and Raman Spectroscopy. Academic Press, Boston (1990) 114. Weiner, S.J., Kollman, P.A., Nguyen, D.T., Case, D.A.: An all atom force field for simulations of proteins and nucleic acids. J. Comput. Chem. 7, 230–252 (1986) 115. Kim, S.V.J., Beard, W.A., Harvey, J., Shock, D.D., Knutson, J.R., Wilson, S.H.: Rapid segmental and subdomain motions of DNA polymerase ˇ. J. Biol. Chem. 278, 5072– 5081 (2003) 116. Nederveen, A.J., Bonvin, A.M.J.J.: NMR relaxation and internal dynamics of ubiquitin from a 0.2 &s MD simulation. J. Chem. Theory Comput. 1, 363–374 (2005) 117. Zagrovic, B., Sorin, E.J., Pande V.: ˇ-hairpin folding simulations in atomistic detail using an implicit solvent model. J. Mol. Biol. 313, 151–169 (2001) 118. Kubelka, J., Eaton, W.A., Hofrichter, J.: Experimental tests of villin subdomain folding simulations. J. Mol. Biol. 329, 625–630 (2003) 119. Horng, J.V.C., Moroz, V., Raleigh, D.P.: Rapid cooperative two-state folding of a miniature ˛–ˇ protein and design of a thermostable variant. J. Mol. Biol. 326, 1261–1270 (2003) 120. Aronsson, G., Brorsson, A.V.C., Sahlman, L., Jonsson, B.V.H.: Remarkably slow folding of a small protein. FEBS Lett. 411, 359–364 (1997) 121. Daiguji, H.: Ion transport in nanofluidic channels. Chem. Soc. Rev. 39, 901–911 (2010) 122. Fischer, N., Konevega, A.L., Wintermeyer, W., Rodnina, M.V., Stark, H.: Ribosome dynamics and tRNA movement by time-resolved electron cryomicroscopy. 
Nature 466, 329–333 (2010)

Molecular Geometry Optimization, Models

Gero Friesecke¹ and Florian Theil²
¹TU München, Zentrum Mathematik, Garching, Munich, Germany
²Mathematics Institute, University of Warwick, Coventry, UK

Mathematics Subject Classification

81V55, 70Cxx, 92C40



Short Definition

Geometry optimization is a method to predict the three-dimensional arrangement of the atoms in a molecule by means of minimization of a model energy. The phenomenon of binding, that is to say the tendency of atoms and molecules to conglomerate into stable larger structures, as well as the emergence of specific structures depending on the constituting elements, can be explained, at least in principle, as a result of geometry optimization.

Phenomena

Two atoms are said to be linked together by a bond if there is an opposing force against pulling them apart. Associated with a bond is a binding energy, which is the total energy required to separate the atoms. Except at very high temperature, atoms form bonds between each other and conglomerate into molecules and larger aggregates such as atomic or molecular chains, clusters, and crystals. The ensuing molecular geometries, that is to say the 3D arrangements of the atoms, and the binding energies of the different bonds, crucially influence physical and chemical behavior. Therefore, theoretically predicting them forms a large and important part of contemporary research in chemistry, materials science, and molecular biology. A major difficulty is that binding energies, preferred partners, and local geometries are highly chemically specific, that is to say they depend on the elements involved. For instance, the experimental binding energies of the diatomic molecules Li2, Be2, and N2 (i.e., the dimers of elements number 3, 4, and 7 in the periodic table) are roughly in the ratio 10:1:100. And CH2 is bent, whereas CO2 is straight. When atoms form bonds, their electronic structure, that is to say the probability cloud of electrons around their atomic nucleus, rearranges. Chemists distinguish phenomenologically between different types of bonds, depending on this type of rearrangement: covalent, ionic, and metallic bonds, as well as weak bonds such as hydrogen or van der Waals bonds. A covalent bond corresponds to a substantial rearrangement of the electron cloud into the space between the atoms


while each atom maintains a net charge neutrality, as in the C–C bond. In an ionic bond, one electron migrates almost fully to the other atom, as in the dimer Na–Cl. The metallic bond between atoms in a solid metal is pictured as the formation of a “sea” of free electrons, no longer associated to any particular atom, surrounding a lattice of ionic cores. The above distinctions, albeit a helpful guide, should not be taken too literally and are often not so clear-cut in practice. A unifying theoretical viewpoint of the 3D molecular structures resulting from interatomic bonding, regardless of the type of bonds, is to view them as geometry optimizers, i.e., as locally or globally optimal spatial arrangements of the atoms which minimize overall energy. For a mathematical formulation, see section “Geometry Optimization and Binding Energy Prediction.” If the number of atoms or molecules is large (≳100), then the system will start behaving in a thermodynamic way. At sufficiently low temperature, identical atoms or molecules typically arrange themselves into a crystal, that is to say the positions of the atomic nuclei are given approximately by a subset of a crystal lattice. A crystal lattice $L$ is a finite union of discrete subsets of $\mathbb{R}^3$ of the form $\{i e + j f + k g \mid i, j, k \in \mathbb{Z}\}$, where $e, f, g$ are linearly independent vectors in $\mathbb{R}^3$. Near the boundaries of crystals, the underlying lattice is often distorted. Closely related effects are the emergence of defects such as vacancies, interstitial atoms, dislocations, and continuum deformations. Vacancies and interstitial atoms are missing and additional atoms, respectively. Dislocations are topological crystallographic defects which can sometimes be visualized as being caused by the termination of a plane of atoms in the middle of a crystal. Continuum deformations are small long-wavelength distortions of the underlying lattice arising from external loads, as in an elastically bent macroscopic piece of metal.
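The lattice definition just given is easy to make concrete. A minimal Python sketch (the basis vectors and the index range are arbitrary illustrative choices, not prescriptions from the text):

```python
import itertools

def lattice_points(e, f, g, rng=range(-1, 2)):
    """Points i*e + j*f + k*g of the lattice spanned by the
    linearly independent vectors e, f, g in R^3."""
    return [
        tuple(i * e[m] + j * f[m] + k * g[m] for m in range(3))
        for i, j, k in itertools.product(rng, rng, rng)
    ]

# Simple cubic lattice generated by the standard basis vectors of R^3.
points = lattice_points((1, 0, 0), (0, 1, 0), (0, 0, 1))
print(len(points))  # 27 points for i, j, k in {-1, 0, 1}
```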
A unifying interpretation of the above structures arises by extending the term “geometry optimization,” which is typically used in connection with single molecules, to large scale systems as well. The spatial arrangements of the atoms can again be understood, at least locally and subject to holding the atomic positions in an outer region fixed, as geometry optimizers, i.e., minimizers of energy.
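In this extended sense, geometry optimization amounts to picking a model energy and descending to a local minimizer. The sketch below does this for a small cluster; the Lennard-Jones-type pair potential, the gradient-descent settings, and the starting configuration are illustrative assumptions for the example, not a method from the text:

```python
import math

def pair_V(r):
    # Illustrative Lennard-Jones-type pair potential with minimum V(1) = -1 at r = 1
    return r ** -12 - 2 * r ** -6

def energy(X):
    # Model energy Phi(X): sum of pair interactions over all atom pairs
    return sum(
        pair_V(math.dist(X[i], X[j]))
        for i in range(len(X))
        for j in range(i + 1, len(X))
    )

def optimize(X, steps=2000, lr=1e-3, h=1e-6):
    # Descend to a local minimizer using a central-difference numerical gradient
    X = [list(p) for p in X]
    for _ in range(steps):
        for i in range(len(X)):
            for c in range(3):
                X[i][c] += h
                Ep = energy(X)
                X[i][c] -= 2 * h
                Em = energy(X)
                X[i][c] += h  # restore the coordinate
                X[i][c] -= lr * (Ep - Em) / (2 * h)
    return X

# Three atoms in a slightly strained triangle; they relax toward an
# equilateral triangle with side length 1 and total energy -3.
X0 = [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0), (0.5, 0.9, 0.0)]
Xmin = optimize(X0)
print(energy(X0), energy(Xmin))
```

With this particular pair potential, the equilateral triangle of side 1 puts every pair exactly at its pairwise minimum, so the relaxed energy approaches −3.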


Geometry Optimization and Binding Energy Prediction

Geometry optimization, in its basic all-atom form, makes a prediction for the 3D spatial arrangement of the atoms in a molecule by a two-step procedure. Suppose the system consists of $M$ atoms, with atomic numbers $Z_1, \ldots, Z_M$.

Step A: Specify a model energy or potential energy surface (PES), that is to say a function $\Phi : \mathbb{R}^{3M} \to \mathbb{R} \cup \{+\infty\}$ which gives the system's potential energy as a function of the vector $X = (X_1, \ldots, X_M) \in \mathbb{R}^{3M}$ of the atomic positions $X_j \in \mathbb{R}^3$.

Step B: Compute (local or global) minimizers $(X_1, \ldots, X_M)$ of $\Phi$.

Basic physical quantities of the molecule correspond to mathematical quantities of the energy surface as follows:


Physical quantity of the molecule    Quantity of the energy surface
Stable configuration                 Local minimizer
Binding energy                       Difference between minimum energy and sum of energies of subsystems
Transition state                     Saddle point
Bond length/angle                    Parameter in minimizing configuration

Model Energies

A wide range of model energies are in use, depending on the type of system and the desired level of understanding. To obtain quantitatively accurate and chemically specific predictions, one uses ab initio energy surfaces, that is to say surfaces obtained from a quantum mechanical model for the system's electronic structure which requires as input only atomic numbers. For large systems, one often uses classical potentials. The latter are particularly useful for predicting the 3D structure of systems composed from many identical copies of just a few basic units, such as crystalline clusters, carbon nanotubes, or nucleic acids.

Born-Oppenheimer Potential Energy Surface

The gold standard model energy of a system of $M$ atoms, which in principle contains the whole range of phenomena described in section “Phenomena,” is the ground state Born-Oppenheimer PES of nonrelativistic quantum mechanics. With $X = (X_1, \ldots, X_M) \in \mathbb{R}^{3M}$ denoting the vector of nuclear positions, it has the general mathematical form

$$\Phi^{\mathrm{BO}}(X) = \min_{\psi \in A_N} E(X, \psi), \qquad (1)$$

where $E$ is an energy functional depending on an infinite-dimensional field $\psi$, the electronic wave function. For a molecule with $N$ electrons, the latter is a function on the configuration space $(\mathbb{R}^3 \times \mathbb{Z}_2)^N$ of the electron positions and spins. More precisely, $A_N = \{\psi : (\mathbb{R}^3 \times \mathbb{Z}_2)^N \to \mathbb{C} \mid \psi \in L^2, \ \|\psi\|_{L^2} = 1, \ \nabla\psi \in L^2, \ \psi \ \text{antisymmetric}\}$, where antisymmetric means, with $x_i, s_i$ denoting the position and spin of the $i$th electron, $\psi(\ldots, x_i, s_i, \ldots, x_j, s_j, \ldots) = -\psi(\ldots, x_j, s_j, \ldots, x_i, s_i, \ldots)$ for all $i < j$. The functional $E$ is given, in atomic units, by $E(X, \psi) = \int_{(\mathbb{R}^3 \times \mathbb{Z}_2)^N} \psi^* H \psi$, where

$$H = \sum_{j=1}^{N} \left( v(x_j) - \tfrac{1}{2} \nabla^2_{x_j} \right) + \sum_{1 \le i < j \le N} W_{ee}(x_i - x_j) + W_{nn}(X).$$

The binding energy, that is to say the difference between the minimum energy and the sum of the energies of the subsystems $\{X_1, \ldots, X_K\}$ and $\{X_{K+1}, \ldots, X_M\}$, is

$$E = \min \Phi(X) - \lim_{R \to \infty} \min\{\, \Phi(X) : \operatorname{dist}(\{X_1, \ldots, X_K\}, \{X_{K+1}, \ldots, X_M\}) \ge R \,\}.$$

Potential energy surfaces have the general property of Galilean invariance, that is to say $\Phi(X_1, \ldots, X_M) = \Phi(RX_1 + a, \ldots, RX_M + a)$, for any translation vector $a \in \mathbb{R}^3$ and any rotation matrix $R \in SO(3)$. Thus, a one-atom surface $\Phi(X_1)$ is independent of $X_1$.

…where $a > 0$, $b > 0$ are empirical parameters. For a variety of monatomic systems, more sophisticated classical potentials containing three-body and higher interactions are used,

$$\Phi^{\mathrm{classical}}(X_1, \ldots, X_M) = \sum_{i < j} V_2(X_i, X_j) + \cdots$$

…the exponent of $n^{-1}$ is larger than $1/2$. Furthermore, for a fixed $d$ and $r$ tending to infinity, the exponent of $n^{-1}$ tends to infinity, whereas for a fixed $r$ and $d$ tending to infinity, it goes to $1/2$. This shows how the smoothness of integrands helps. For many practical applications, $d$ is large and $r$ is small. Then the exponent of $n^{-1}$ is close to $1/2$. This means that in this case the standard Monte Carlo algorithm enjoys almost optimal speed of convergence.

$$A_{n,d}(f, \omega) = \phi_{n(\omega)}\bigl( f(x_{1,\omega}), f(x_{2,\omega}), \ldots, f(x_{n(\omega),\omega}) \bigr) \qquad \forall f \in F_d,$$

where $\omega$ is a random element from some probability space and $x_{1,\omega}, \ldots, x_{n(\omega),\omega}$ are independent identically distributed points according to the probability measure of $\omega$. Furthermore, $n(\omega)$ is a random integer which tells us how many function values are computed for $\omega$, with the expected value of $n(\omega)$ being at most $n$,

Monte Carlo Integration


i.e., $E_\omega\, n(\omega) \le n$. Finally, $\phi_{n(\omega)}$ is an arbitrary mapping (not necessarily linear) of the $n(\omega)$ function values $f(x_{j,\omega})$ to $\mathbb{R}$. The randomized error of $A_{n,d}$ is defined as

$$e^{\mathrm{ran}}(A_{n,d}) = \sup_{f \in F_d} \left( \frac{E_\omega \bigl( I_d(f) - A_{n,d}(f, \omega) \bigr)^2}{\|f\|_{F_d}^2} \right)^{1/2}.$$

Let

$$e^{\mathrm{ran}}(n, F_d) = \inf_{A_{n,d}} e^{\mathrm{ran}}(A_{n,d})$$

denote the minimal randomized error among all possible randomized algorithms $A_{n,d}$. This means that we want to find the best distribution of random elements $\omega$, the best choice of $n(\omega)$, $x_{j,\omega}$, and the mapping $\phi_{n(\omega)}$ such that they approximate $I_d(f)$ with the smallest possible randomized error. Note that for $n = 0$ we do not use function values and we approximate $I_d(f)$ by a random constant. It is easy to see that the best random constant is zero and then $e^{\mathrm{ran}}(0, F_d) = \|I_d\| = \|I_d\|_{F_d \to \mathbb{R}}$ is the norm of $I_d$ in $F_d$ and is called the initial error. The randomized information complexity is then

$$n^{\mathrm{ran}}(\varepsilon, F_d) = \min \{\, n \mid e^{\mathrm{ran}}(n, F_d) \le \varepsilon\, \|I_d\| \,\}.$$

That is, it is the minimal number of function values needed to improve the initial error by a factor $\varepsilon \in (0, 1)$. Finally, the (total) randomized complexity is the minimal cost which is needed to compute an $\varepsilon$-approximation. The notion of cost is defined by assuming that all arithmetic operations can be done at cost one and the cost of computing each function value is, say, $c(F_d)$. Usually, $c(F_d) \gg 1$. Surprisingly enough, for many spaces $F_d$ the total complexity is roughly equal to $c(F_d)$ times the information complexity. The reason is that usually the algorithm that minimizes the number of function values is easy to implement. Indeed, if we can prove that the randomized information complexity is achieved or nearly achieved by the standard Monte Carlo algorithm, then its total cost is $(c(F_d) + 1)\, n^{\mathrm{ran}}(\varepsilon, F_d) \approx c(F_d)\, n^{\mathrm{ran}}(\varepsilon, F_d)$. We would like to know how the information complexity $n^{\mathrm{ran}}(\varepsilon, F_d)$ depends on $d$ and $\varepsilon^{-1}$. In particular, we would like to know for which spaces $F_d$ this dependence is polynomial in both $d$ and $\varepsilon^{-1}$ or

at least not exponential in $d$ and $\varepsilon^{-1}$. This is the subject of tractability which nowadays is a popular research area. The reader may find more on tractability in [15–17]. The randomized complexity of multivariate integration is known for many spaces of functions. We refer the reader to the works we already cited.
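The $n^{-1/2}$ behavior of the standard Monte Carlo algorithm discussed above is easy to observe empirically. A minimal sketch (the integrand, dimension, and sample sizes are illustrative choices):

```python
import math
import random

def mc_integrate(f, d, n, rng):
    # Standard Monte Carlo: average f over n i.i.d. uniform points in [0,1]^d
    return sum(f([rng.random() for _ in range(d)]) for _ in range(n)) / n

# Integrand f(x) = x_1 * ... * x_d over [0,1]^d; the exact integral is 2^(-d).
d = 5
f = math.prod
exact = 2.0 ** -d

rng = random.Random(123)
for n in (10 ** 2, 10 ** 3, 10 ** 4, 10 ** 5):
    # errors decrease on average like n**-0.5
    print(n, abs(mc_integrate(f, d, n, rng) - exact))
```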

Importance Sampling
Suppose that $\omega$ is a probability density function on $[0,1]^d$. Then we choose sample points $x_j$, for $j = 1, 2, \dots, n$, independent and identically distributed according to the probability measure with density $\omega$. The importance sampling algorithm is then
$$A_{n,d}(f, \omega) = \frac{1}{n} \sum_{j=1}^{n} \frac{f(x_j)}{\omega(x_j)} \qquad \forall\, f \in F_d.$$
Note that for the uniform distribution we have $\omega \equiv 1$ and $A_{n,d}$ coincides with standard Monte Carlo. It is easy to check that
$$e^{\mathrm{ran}}(A_{n,d}) \le \frac{1}{\sqrt{n}}\, \sup_{f \in F_d} \left( \frac{\int_{[0,1]^d} \omega^{-1}(x)\, f^2(x)\, dx}{\|f\|^2_{F_d}} \right)^{1/2}.$$
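As a concrete illustration, the estimator $A_{n,d}(f,\omega)$ is a few lines of code. The integrand, the density, and its sampler below are illustrative choices for $d = 1$ (not taken from this entry), selected so that the exact integral is known:

```python
import numpy as np

rng = np.random.default_rng(0)

def importance_sampling(f, omega_pdf, omega_sampler, n):
    """Estimate I(f) = int_{[0,1]^d} f(x) dx by (1/n) * sum_j f(x_j)/omega(x_j),
    with the x_j drawn i.i.d. from the density omega."""
    x = omega_sampler(n)                       # shape (n, d)
    return np.mean(f(x) / omega_pdf(x))

# Illustrative d = 1 example: f(x) = 3x^2, whose exact integral is 1,
# sampled under the density omega(x) = 2x via inverse-transform sampling.
f = lambda x: 3.0 * x[:, 0] ** 2
omega_pdf = lambda x: 2.0 * x[:, 0]
omega_sampler = lambda n: np.sqrt(rng.random((n, 1)))  # CDF x^2 => inverse is sqrt

est = importance_sampling(f, omega_pdf, omega_sampler, 100_000)
```

With $\omega \equiv 1$ and uniform sampling the same routine reduces to standard Monte Carlo; the density above is chosen to reduce the variance of the ratio $f/\omega$.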

The main point is to choose $\omega$ such that the supremum above is as small as possible for the class $F_d$. We now report a recent result of Hinrichs [10]. He proved that for $F_d = H(K_d)$, an arbitrary reproducing kernel Hilbert space whose kernel is pointwise nonnegative, there exists a density function $\omega$ such that
$$n^{\mathrm{ran}}(\varepsilon, F_d) \le \left\lceil \frac{\varepsilon^{-2} + 1}{2} \right\rceil \qquad \forall\, \varepsilon \in (0,1),\ d \in \mathbb{N}.$$

Furthermore, the assumption on the reproducing kernel, as well as the estimate on the information complexity in the theorem of Hinrichs, are in general sharp; see [18]. We stress that the bound on the information complexity is independent of $d$ and is of order $\varepsilon^{-2}$. Unfortunately, the result of Hinrichs is not constructive: in general we only know the existence of a good $\omega$, but we do not know how to find it. Nevertheless, for the Sobolev space with the reproducing kernel,


$$K_d(x, y) = \prod_{j=1}^{d} \bigl(1 + \min(x_j, y_j)\bigr) \qquad \forall\, x = [x_1, \dots, x_d],\ y = [y_1, \dots, y_d] \in [0,1]^d, \tag{1}$$
Hinrichs proved that one can take
$$\omega(x) = \prod_{j=1}^{d} \frac{3}{4}\left(1 + x_j - \tfrac{1}{2} x_j^2\right) \qquad \forall\, x \in [0,1]^d.$$
This Sobolev space is related to $L_2$-discrepancy and is often used for the study of QMC (quasi-Monte Carlo) algorithms in the worst-case setting; see [20]. It is also known that for this space the dependence on $d$ in the randomized error of the standard Monte Carlo algorithm is exponential; see [21]. Hence, importance sampling is exponentially better than the standard Monte Carlo algorithm for this Sobolev space. We add that even an apparently small change of the Sobolev space may lead to a different result, and the randomized error of the standard Monte Carlo algorithm may be independent of $d$; see again [21].

References
1. Bakhvalov, N.S.: On approximate computation of integrals. Vestnik MGU, Ser. Math. Mech. Astron. Phys. Chem. 4, 3–18 (1959), in Russian
2. Diaconis, P.: The Markov chain Monte Carlo revolution. Bull. Amer. Math. Soc. 46, 179–205 (2009)
3. Hammersley, J.M., Handscomb, D.C.: Monte Carlo Methods. Methuen, London (1964)
4. Heinrich, S.: Random approximation in numerical analysis. In: Bierstedt, K.D., et al. (eds.) Functional Analysis, pp. 123–171. Dekker, New York (1994)
5. Heinrich, S.: Complexity of Monte Carlo algorithms. In: The Mathematics of Numerical Analysis. Lect. Appl. Math. 32, 405–419, AMS-SIAM Summer Seminar, Park City. Am. Math. Soc., Providence (1996)
6. Heinrich, S.: The randomized information complexity of elliptic PDE. J. Complex. 22, 220–249 (2006)
7. Heinrich, S.: Randomized approximation of Sobolev embeddings. In: Keller, A., Heinrich, S., Niederreiter, H. (eds.) Monte Carlo and Quasi-Monte Carlo Methods, pp. 445–459. Springer, Berlin (2006)
8. Heinrich, S.: Randomized approximation of Sobolev embeddings II. J. Complex. 25, 455–472 (2009)
9. Heinrich, S.: Randomized approximation of Sobolev embeddings III. J. Complex. 25, 473–507 (2009)
10. Hinrichs, A.: Optimal importance sampling for the approximation of integrals. J. Complex. 26, 125–134 (2010)
11. Mathé, P.: Random approximation of Sobolev embeddings. J. Complex. 7, 261–281 (1991)
12. Metropolis, N., Ulam, S.: The Monte Carlo method. J. Amer. Stat. Assoc. 44, 335–341 (1949)
13. Novak, E.: Deterministic and Stochastic Error Bounds in Numerical Analysis. LNiM, vol. 1349. Springer, Berlin (1988)
14. Novak, E.: Optimal linear randomized methods for linear operators in Hilbert spaces. J. Complex. 8, 22–36 (1992)
15. Novak, E., Woźniakowski, H.: Tractability of Multivariate Problems, Volume I: Linear Information. European Mathematical Society Publishing House, Zürich (2008)
16. Novak, E., Woźniakowski, H.: Tractability of Multivariate Problems, Volume II: Standard Information for Functionals. European Mathematical Society Publishing House, Zürich (2010)
17. Novak, E., Woźniakowski, H.: Lower bounds on the complexity for linear functionals in the randomized setting. J. Complex. 27, 1–22 (2011)
18. Novak, E., Woźniakowski, H.: Tractability of Multivariate Problems, Volume III: Standard Information for Operators. European Mathematical Society Publishing House, Zürich (2012)
19. Richey, M.: The evolution of Markov chain Monte Carlo methods. Am. Math. Mon. 117, 383–413 (2010)
20. Sloan, I.H.: Quasi-Monte Carlo methods, this encyclopedia
21. Sloan, I.H., Woźniakowski, H.: When does Monte Carlo depend polynomially on the number of variables? In: Monte Carlo and Quasi-Monte Carlo Methods 2002, pp. 407–437. Springer, Berlin (2004)
22. Traub, J.F., Wasilkowski, G.W., Woźniakowski, H.: Information-Based Complexity. Academic Press, New York (1988)
23. Traub, J.F., Woźniakowski, H.: The Monte Carlo algorithm with a pseudo-random generator. Math. Comput. 58, 323–339 (1992)
24. Wasilkowski, G.W.: Randomization for continuous problems. J. Complex. 5, 195–218 (1989)

Monte Carlo Simulation

Russel Caflisch
UCLA – Department of Mathematics, Institute for Pure and Applied Mathematics, Los Angeles, CA, USA

Mathematics Subject Classification

65C05; 11K45

Definition

Monte Carlo simulation is a method for numerical computation in which degrees of freedom that are complicated or unknown are represented through random numbers. It is used in a wide range of applications in science and industry, such as finance, physics, and operations research.

Description

Overview
Monte Carlo simulation is a powerful method for numerical description of a system, in which degrees of freedom that are complicated or unknown are represented through random numbers. The power of Monte Carlo simulation derives from its generality, simplicity, and robustness: general in that it works on almost anything, simple in that it often directly mimics the properties of the system that is being simulated, and robust in that it seldom fails catastrophically and almost always gives answers that are reasonable. The price for these benefits is that Monte Carlo can be slow and inaccurate. Following the Central Limit Theorem, the accuracy $\varepsilon$ of Monte Carlo is typically $\varepsilon = O(N^{-1/2})$ for $N$ random samples; or equivalently it is slow, because the computational effort is of size $N = O(\varepsilon^{-2})$ to get accuracy of size $\varepsilon$. Much of the research on Monte Carlo simulation is aimed at development of more efficient simulation, in the context of a particular application.

Monte Carlo Quadrature
The simplest use of Monte Carlo is for numerical quadrature [2]. Consider the integral $I$ of a function $f(x)$ defined for $x$ in the $d$-dimensional unit cube $I^d$, i.e.,
$$I = \int_{I^d} f(x)\, dx, \tag{1}$$
and the $N$th Monte Carlo approximation $I_N$,
$$I_N = \frac{1}{N} \sum_{1 \le k \le N} f(x_k), \tag{2}$$
in which the $x_k$ are independent samples of a random variable uniformly distributed on $I^d$. Since $E[f(x)] = I$, the Central Limit Theorem says that the error $\varepsilon_N = I - I_N$ satisfies
$$\varepsilon_N \approx \sigma N^{-1/2} \nu, \tag{3}$$
in which $\sigma^2 = \int_{I^d} (f(x) - I)^2\, dx$ is the variance of $f(x)$ and $\nu$ is a standard $N(0,1)$ random variable. There are two main ways in which to improve the accuracy of the quadrature formula Eq. (2). The first is variance reduction (such as antithetic variables and control variates), in which the function $f$ is changed to a function $\tilde f$ with the same average $I$ but a smaller variance $\tilde\sigma^2$. The second is to change the points $x_k$ so that the error in Eq. (3) is reduced. For example, if the points $x_k$ come from a quasi-random sequence, then Eq. (3) is replaced by an inequality like $|\varepsilon_N| \le c N^{-1} (\log N)^d$ [2, 9].

Simulation of Stochastic Differential Equations
Stochastic differential equations (SDEs) are differential equations that involve a stochastic process. The most commonly used SDEs have the form
$$dx = \mu\, dt + \sigma\, dW, \tag{4}$$
in which the unknown random function is $x = x(t)$, the coefficients are $\mu = \mu(x, t)$ and $\sigma = \sigma(x, t)$, and the white noise term $dW = dW(t)$ is defined through Ito calculus [10], in which $W = W(t)$ is Brownian motion.
Monte Carlo simulation of the SDE Eq. (4) is performed [7] by discretization in time with increment $\Delta t = T/n$, in which $T$ is fixed and $n$ is varied, so that $t$, $x$, and $dW$ are replaced by $t_k = k \Delta t$, $x_k \approx x(t_k)$, and $\Delta W_k = W(t_{k+1}) - W(t_k) = \sqrt{\Delta t}\, \nu_k$, in which the $\nu_k$ are independent standard normal random variables. The Euler method for approximate solution of the SDE in Eq. (4) is
$$x_{k+1} = x_k + \mu_k \Delta t + \sigma_k \Delta W_k, \tag{5}$$
in which $\mu_k = \mu(x_k, t_k)$ and $\sigma_k = \sigma(x_k, t_k)$. The Milstein method is
$$x_{k+1} = x_k + \mu_k \Delta t + \sigma_k \Delta W_k + \tfrac{1}{2} \sigma_k \sigma'_k \bigl((\Delta W_k)^2 - \Delta t\bigr), \tag{6}$$
in which $\sigma'_k = \partial_x \sigma(x_k, t_k)$. The right-hand side of Eq. (6) is formulated for a scalar SDE (i.e., for $x$ a scalar); for vector SDEs, the Milstein correction terms are more complicated and involve Levy area terms that cannot be directly evaluated [7].
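A minimal sketch of the two schemes, using geometric Brownian motion $dx = \mu x\, dt + \sigma x\, dW$ as the test SDE (for which $\sigma(x) = \sigma x$, so the Milstein correction term involves $\sigma^2 x$, and an exact solution is available for comparison); the parameter values below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def euler_milstein_paths(x0, mu, sigma, T, n, n_paths):
    """Simulate dx = mu*x dt + sigma*x dW with the Euler scheme (5) and the
    Milstein scheme (6); here sigma(x) = sigma*x, so sigma_k * sigma_k'
    equals sigma**2 * x. All paths share the same Brownian increments."""
    dt = T / n
    xe = np.full(n_paths, x0)   # Euler iterates
    xm = np.full(n_paths, x0)   # Milstein iterates
    w = np.zeros(n_paths)       # accumulated Brownian path W(T)
    for _ in range(n):
        dw = np.sqrt(dt) * rng.standard_normal(n_paths)
        xe = xe + mu * xe * dt + sigma * xe * dw
        xm = (xm + mu * xm * dt + sigma * xm * dw
              + 0.5 * sigma**2 * xm * (dw**2 - dt))
        w += dw
    # exact solution driven by the same Brownian path, for strong-error comparison
    exact = x0 * np.exp((mu - 0.5 * sigma**2) * T + sigma * w)
    return xe, xm, exact

xe, xm, exact = euler_milstein_paths(1.0, 0.05, 0.2, 1.0, 200, 50_000)
strong_err_euler = np.mean(np.abs(exact - xe))
strong_err_milstein = np.mean(np.abs(exact - xm))
```

Measured in the strong sense, the Milstein error should be markedly smaller than the Euler error at the same step size, consistent with the $O(\Delta t)$ versus $O(\sqrt{\Delta t})$ rates discussed below.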




Convergence of the discretized solution $x_k$ to the SDE solution $x$ is usually measured with respect to a payout function $f(x)$ evaluated at a time $T$. The weak measure of convergence is $|E[f(x(T)) - f(x_n)]|$, which measures the average deviation of $x_n$ from $x(T)$. The strong measure of convergence is $E[|f(x(T)) - f(x_n)|]$, which measures the deviation of $x_n$ from $x(T)$ for each sample (or path). Denote the solutions of the Euler system Eq. (5) and the Milstein system Eq. (6) as $x^E$ and $x^M$, respectively. For weak convergence, Milstein is no better than Euler for SDEs, since $|E[f(x(T)) - f(x_n^E)]|$ and $|E[f(x(T)) - f(x_n^M)]|$ are both of size $O(\Delta t)$. On the other hand, for strong convergence, Milstein offers significant improvement over Euler, since $E[|f(x(T)) - f(x_n^E)|] = O(\sqrt{\Delta t})$ is much larger than $E[|f(x(T)) - f(x_n^M)|] = O(\Delta t)$.
Multilevel Monte Carlo (MLMC) is a method developed by Mike Giles [5, 6] for reducing the error in Monte Carlo simulation of SDEs. MLMC uses a sequence of numerical solutions $x^\ell$ with time step $\Delta t_\ell = 2^{-\ell} T$ for $0 \le \ell \le L$. Denote the corresponding payout as $F_\ell = f(x^\ell_{n_\ell})$. At each level $\ell$, one uses the simulation with time step $\Delta t_{\ell-1}$ as a control variate for the (finer) time step $\Delta t_\ell$, as expressed in the sum

$$E[F_L] = E[F_0] + \sum_{\ell=1}^{L} E[F_\ell - F_{\ell-1}]. \tag{7}$$

In the $\ell$th term of this sum, the expectation is estimated using $N_\ell$ paths, and the increments $\Delta W$ for the paths $x^\ell$ and $x^{\ell-1}$ are required to be consistent. For the Euler or Milstein scheme, the error in the weak measure is of size $N^{-1/2} + h$ and the computational effort is of size $N h^{-1}$, for $N$ simulation paths and time step $h$. In order to obtain accuracy with error size $O(\varepsilon)$, one must take $N = O(\varepsilon^{-2})$ and $h = O(\varepsilon)$, so that the computational work is $O(\varepsilon^{-3})$. For MLMC using the Euler scheme for the $x^\ell$ simulations, the resulting work is reduced to $O(\varepsilon^{-2} (\log \varepsilon)^2)$. For MLMC using the Milstein scheme, the work is reduced to $O(\varepsilon^{-2})$. Moreover, the character of the MLMC method is different for the two schemes: for Euler the work is approximately the same at each level, while for Milstein most of the work is done at the coarsest discretization level.
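The telescoping sum Eq. (7) can be sketched as follows. The Euler scheme, geometric Brownian motion, the call-type payout, and the equal number of paths at every level are all illustrative assumptions (a production MLMC code would choose $N_\ell$ level by level, as in [5]); the essential point is that the coarse path at each level reuses the fine path's Brownian increments:

```python
import numpy as np

rng = np.random.default_rng(2)

def euler_level_pair(l, n_paths, x0=1.0, mu=0.05, sigma=0.2, T=1.0):
    """Return payout samples (F_l, F_{l-1}) computed from the SAME Brownian
    increments: the fine path uses step T/2^l, the coarse one T/2^(l-1)."""
    nf = 2 ** l
    dtf = T / nf
    dwf = np.sqrt(dtf) * rng.standard_normal((n_paths, nf))
    xf = np.full(n_paths, x0)
    for k in range(nf):                       # fine Euler path
        xf = xf + mu * xf * dtf + sigma * xf * dwf[:, k]
    payoff = lambda x: np.maximum(x - 1.0, 0.0)   # illustrative payout f
    if l == 0:
        return payoff(xf), None
    dwc = dwf[:, 0::2] + dwf[:, 1::2]         # consistent coarse increments
    dtc = 2 * dtf
    xc = np.full(n_paths, x0)
    for k in range(nf // 2):                  # coarse Euler path
        xc = xc + mu * xc * dtc + sigma * xc * dwc[:, k]
    return payoff(xf), payoff(xc)

def mlmc_estimate(L, n_paths):
    """E[F_L] = E[F_0] + sum_l E[F_l - F_{l-1}], the telescoping sum Eq. (7)."""
    f0, _ = euler_level_pair(0, n_paths)
    est = np.mean(f0)
    for l in range(1, L + 1):
        fl, flm1 = euler_level_pair(l, n_paths)
        est += np.mean(fl - flm1)
    return est

price = mlmc_estimate(L=6, n_paths=20_000)
```

Because the coarse and fine increments are coupled, the variance of each correction term $F_\ell - F_{\ell-1}$ decays with $\ell$, which is the source of the cost reduction.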

Simulation for Finance
Monte Carlo simulation for pricing financial securities starts from the risk-neutral pricing method from Black-Scholes theory [12]. The Black-Scholes model for an equity price $S(t)$ with average growth rate $\mu$ and volatility $\sigma$ is the SDE
$$dS = \mu S\, dt + \sigma S\, dW, \tag{8}$$
for which the solution is $S(t) = S_0 \exp(\sigma W(t) + (\mu - \sigma^2/2) t)$. The pricing formula for an option $V(t) = V(t, S(t))$ with payout $P(T, S(T))$ at the expiration time $T$ (i.e., the option can only be exercised at $t = T$) is
$$V(0, S_0) = e^{-rT}\, \tilde E[P(T, S(T))], \tag{9}$$
in which the risk-neutral expectation $\tilde E$ is effected by replacing $\mu$ in Eq. (8) by the risk-free interest rate $r$. Alternatively, for an American option that can be exercised at any time $t$ with $0 \le t \le T$, the price is
$$V(0, S_0) = \max_\tau\, e^{-r\tau}\, \tilde E[P(\tau, S(\tau))], \tag{10}$$

in which the maximum is taken over choices of the stopping time $\tau$ satisfying $0 \le \tau \le T$. Determination of the optimal exercise time $\tau$ involves comparison, at each time $t$, of the payout value and the expected value of future payout. Evaluation of the expected value of future payout depends on the future optimal exercise times, so it must be determined in a self-consistent manner. The Least-Squares Method (LSM) [8] was developed to overcome this difficulty. Discretize time so that exercise of the option $V$ can occur at any time $t = t_k = k \Delta t$ with $\Delta t = T/n$. Construct $N$ independent paths $S_i(t)$ for the stock price, with $1 \le i \le N$. At $t_n = T$, the option price $V_i(t_n) = P(S_i(t_n))$ is just the payout value. Continue backwards in $k$ (by induction). If the price $V_i(t_k)$ is known for $k \ge m+1$, then at $t = t_m$ the payout from early exercise is $V_i(t_m) = P(S_i(t_m))$. Estimation of the expected value of future exercise, which occurs at some time $\tau_i = t_{\ell_i}$, is performed by the following regression procedure: On each path, consider the discounted value of the future payout for that path, $Y_i = e^{-r(\tau_i - t_m)} P(S_i(\tau_i))$, and also set $X_i = S_i(t_m)$. Now approximate $Y$ as a function of $X$ by linear regression using a relatively small number of basis functions in $X$. This gives a value $\tilde Y_i = \tilde Y(S_i)$, which is an approximation to



the value of future exercise on path $i$. Comparison of the value $Y_i$ of payout and the estimated value $\tilde Y_i$ of future payout determines whether it is better to exercise early (if $Y_i > \tilde Y_i$) or to defer early exercise (if $Y_i < \tilde Y_i$). LSM has been used with considerable success on a wide range of American options [8]. It has been generalized to also compute Greeks [13].

Simulation of Particle Dynamics
For rarefied gas flow, the Knudsen number Kn is the ratio of the mean free path (i.e., the distance between intermolecular collisions) to the characteristic length scale of the flow. This dimensionless number governs the significance of particle collisions in the flow. For very large Kn, collisions are very infrequent and are not significant to the flow. For very small Kn, collisions are so frequent that the system is rapidly driven into (local) equilibrium, so that their effect can be described by thermodynamics and fluid mechanics. For Kn of moderate size, however, individual collisions are significant for the dynamics of the gas. In this regime, the particles that comprise the gas are represented by a velocity distribution function $f(x, v, t)$ that satisfies the Boltzmann equation
$$\partial_t f + v \cdot \nabla_x f = \mathrm{Kn}^{-1} Q(f, f),$$
in which the collision operator $Q(f, f)$ accounts for binary collisions of gas particles [4]. The equilibrium distributions $f$ that satisfy $Q(f, f) = 0$ are the Maxwellian distributions of the form
$$M(v) = \rho\, (4 \pi T)^{-3/2} \exp\{-|v - u|^2 / 2T\}, \tag{11}$$
in which $\rho$, $u$, and $T$ are the macroscopic density, velocity, and temperature.
Monte Carlo simulation of collisions in a rarefied gas is performed using Direct Simulation Monte Carlo (DSMC) [1]. For DSMC, the velocity distribution function is a sum of delta functions, i.e.,
$$f(x, v, t) = \sum_k \delta(v - v_k(t))\, \delta(x - x_k(t)). \tag{12}$$
Particles that are near each other are randomly selected for a collision, the outcome of which is constrained to satisfy conservation of mass, momentum, and energy. Two free parameters remain, however, and these collision parameters are randomly chosen. The randomness in the collision parameters allows a numerical set of, for example, $10^4$–$10^8$ particles to accurately represent the behavior of a gas consisting of $10^{20}$ particles.
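The collision step described above can be sketched as follows for equal-mass particles with isotropic (hard-sphere-type) scattering, an illustrative assumption, since DSMC uses the collision model of the gas at hand [1]. Conservation of momentum and energy fixes everything except the two scattering angles, which are the randomly chosen free parameters:

```python
import numpy as np

rng = np.random.default_rng(3)

def collide_pair(v1, v2):
    """Binary elastic collision of two equal-mass particles: keep the
    center-of-mass velocity and the relative speed (conserving momentum and
    energy), and draw the two free parameters, the scattering angles,
    uniformly (isotropic scattering)."""
    vcm = 0.5 * (v1 + v2)
    g = np.linalg.norm(v1 - v2)            # relative speed, conserved
    cos_t = 2.0 * rng.random() - 1.0       # random direction on the sphere:
    sin_t = np.sqrt(1.0 - cos_t**2)        # the two free collision parameters
    phi = 2.0 * np.pi * rng.random()
    gdir = np.array([sin_t * np.cos(phi), sin_t * np.sin(phi), cos_t])
    return vcm + 0.5 * g * gdir, vcm - 0.5 * g * gdir

v1 = np.array([1.0, 0.0, 0.0])
v2 = np.array([0.0, 1.0, 0.0])
w1, w2 = collide_pair(v1, v2)
```

One can check directly that the outgoing velocities conserve both total momentum and total kinetic energy, while their direction is fully randomized.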

This method can become computationally intractable for Kn that is small, so that the collision rate is large, but not so small that the fluid equations are accurate. Several approaches have been developed to handle this difficult regime. Many of these methods involve a combination of a Maxwellian distribution $M$ as in Eq. (11) and a particle distribution function $g$ as in Eq. (12). For example, in [11], the distribution function is written as $f = M + g$. The macroscopic variables $\rho$, $u$, and $T$ in $M$ evolve according to a procedure that is consistent with the fluid equations, collisions between $g$ and itself are performed through the DSMC method, and collisions between $g$ and $M$ are performed by sampling a particle from $M$ and colliding it with a particle from $g$ using DSMC. A similar method has been developed in [3] for Coulomb collisions.
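Sampling a particle from $M$ for the $g$-$M$ collision step is straightforward, since each velocity component of the Maxwellian in Eq. (11) is Gaussian with mean $u$ and variance $T$ (temperature in velocity-squared units, an assumed normalization consistent with the exponent in Eq. (11)):

```python
import numpy as np

rng = np.random.default_rng(4)

def sample_maxwellian(u, T, n):
    """Draw n velocities from the Maxwellian of Eq. (11): mean velocity u,
    each Cartesian component Gaussian with variance T."""
    return np.asarray(u) + np.sqrt(T) * rng.standard_normal((n, 3))

v = sample_maxwellian(u=[0.0, 0.0, 0.0], T=1.5, n=100_000)
mean_speed_sq = np.mean(np.sum(v**2, axis=1))   # expectation is 3T
```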

Conclusions
The examples presented in this survey of Monte Carlo simulation demonstrate the wide range of applications in which it is used. They also show the open opportunities for developing new ways of accelerating Monte Carlo.

References
1. Bird, G.A.: Molecular Gas Dynamics and the Direct Simulation of Gas Flows. Oxford University Press, Oxford (1994)
2. Caflisch, R.E.: Monte Carlo and quasi-Monte Carlo methods. Acta Numer. 7, 1–49 (1998)
3. Caflisch, R.E., Wang, C., Dimarco, G., Cohen, B., Dimits, A.: A hybrid method for accelerated simulation of Coulomb collisions in a plasma. Multiscale Model. Simul. 7, 865–887 (2008)
4. Cercignani, C.: The Boltzmann Equation and Its Applications. Springer, Berlin (1988)
5. Giles, M.B.: Multi-level Monte Carlo path simulation. Oper. Res. 56, 607–617 (2008)
6. Giles, M.B.: Improved multilevel Monte Carlo convergence using the Milstein scheme. In: Keller, A., Heinrich, S., Niederreiter, H. (eds.) Monte Carlo and Quasi-Monte Carlo Methods 2006. Springer, Berlin (2008)
7. Kloeden, P.E., Platen, E.: Numerical Solution of Stochastic Differential Equations. Springer, Berlin (1999)
8. Longstaff, F.A., Schwartz, E.S.: Valuing American options by simulation: a simple least-squares approach. Rev. Fin. Stud. 14, 113–147 (2001)
9. Niederreiter, H.: Random Number Generation and Quasi-Monte Carlo Methods. CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 63. Society for Industrial and Applied Mathematics, Philadelphia (2003)


10. Øksendal, B.: Stochastic Differential Equations: An Introduction with Applications. Springer, Berlin (2003)
11. Pareschi, L., Russo, G.: Asymptotic preserving Monte Carlo methods for the Boltzmann equation. Transp. Theor. Stat. Phys. 29, 415–430 (2000)
12. Shreve, S.E.: Stochastic Calculus for Finance II: Continuous-Time Models. Springer, Berlin (2004)
13. Wang, Y., Caflisch, R.E.: Pricing and hedging American-style options: a simple simulation-based approach. J. Comp. Financ. 13, 95–125 (2010)

Moving Boundary Problems and Cancer

Avner Friedman1 and Bei Hu2
1 Department of Mathematics, Ohio State University, Columbus, OH, USA
2 Department of Applied and Computational Mathematics and Statistics, University of Notre Dame, Notre Dame, IN, USA


Mathematics Subject Classification

35R35; 35K55; 35Q80; 35C20; 92C37

Synonyms

Cancer growth; Tumor growth

Short Definition

A tumor grows over time, and its boundary changes over time in a way that is unknown in advance; one refers to the tumor boundary as a "free boundary." This entry summarizes several interesting mathematical models for tumor growth.

Description

Mathematical models of tumor growth which are based on densities of cells and concentrations of nutrients and signaling molecules are typically modeled by dynamical systems. Because of spatial effects due to cell proliferation, it is natural to model the evolution of tumors in terms of partial differential equations (PDEs). Early such models were considered in Greenspan [45, 46] and McElwain and Morris [51]; see also [1, 2, 6, 7, 10, 12–14, 43, 48, 49] and the reviews [1, 5, 9, 15, 33, 34]. The tumor and its boundary change over time in a way that is unknown in advance; one refers to the tumor boundary as a "free boundary." Some of the PDE models do not explicitly include the free boundary; they assume that the tumor cells are proliferating in a fixed domain [2, 6, 31]. Other models explicitly include the free boundary as one of the unknowns (probably the most important unknown) of the model [3, 4, 10, 12, 13, 27, 31, 32, 35, 36, 43–46]. This entry is concerned with free boundary problems in tumor models, and it focuses on mathematical analysis of such problems. More specifically, this entry is based primarily on a series of papers [3, 4, 27, 32, 35–44] that deal with bifurcation analysis and multi-scale models for tumors with free boundary.

Tumor Models

In this section, we describe several tumor models.

Proliferating Tumor
Let $\Omega(t)$ denote the tumor domain at time $t$. The nutrient concentration $\sigma$ is consumed only in the tumor region and satisfies the diffusion equation
$$\beta \sigma_t - \Delta \sigma = -\lambda \sigma \quad \text{in } \Omega(t), \tag{1}$$
where $\lambda > 0$ is the nutrient consumption rate. In order to make the model simple, a sequence of simplifying assumptions is made. It is assumed that the density of the cells is constant and that their proliferation rate $S$ depends linearly on the nutrient concentration,
$$S = \mu(\sigma - \sigma_e) \qquad (\sigma_e > 0), \tag{2}$$
where $\mu \sigma$ is the growth rate and $\mu \sigma_e$ is the death rate. Since the density of the tumor cells is constant, proliferation and death cause continuous movement among the cells, with associated velocity $\vec v$. We assume that the movement of cells in the tumor tissue is similar to that of fluid in a porous medium. Hence, by Darcy's law,
$$\vec v = -\nabla p, \tag{3}$$
where $p$ is the internal pressure.
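For the radially symmetric stationary state of this model, the radius can be computed numerically. Under simplifying assumptions (quasi-steady nutrient, $\beta = 0$, consumption rate normalized to $\lambda = 1$, and the nutrient held at a constant level $\bar\sigma$ on the boundary; these are illustrative normalizations, not prescribed by the entry), the solution of $\Delta \sigma = \sigma$ in a ball of radius $R$ is $\sigma(r) = \bar\sigma\, R \sinh r / (r \sinh R)$, and for a stationary tumor the net proliferation $\int_\Omega S\, dx = \mu \int_\Omega (\sigma - \sigma_e)\, dx$ must vanish, i.e., the ball average of $\sigma$ must equal $\sigma_e$. A sketch of solving this condition for $R$ by bisection:

```python
import numpy as np

def mean_nutrient(R):
    """Ball average of sigma/sigma_bar for the radial solution
    sigma(r) = sigma_bar * R * sinh(r) / (r * sinh(R)) of Delta sigma = sigma:
    (3/R^3) * int_0^R (sigma/sigma_bar) * r^2 dr = 3*(R/tanh(R) - 1)/R^2."""
    return 3.0 * (R / np.tanh(R) - 1.0) / R**2

def stationary_radius(ratio, lo=1e-6, hi=50.0, tol=1e-10):
    """Bisect for the radius R with mean_nutrient(R) = ratio = sigma_e/sigma_bar.
    The average decreases from 1 (as R -> 0) to 0 (as R -> infinity), so a
    unique root exists for 0 < ratio < 1: the stationary radius depends only
    on the ratio sigma_e/sigma_bar, not on mu or gamma."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_nutrient(mid) > ratio:
            lo = mid      # average nutrient still above sigma_e: R must grow
        else:
            hi = mid
    return 0.5 * (lo + hi)

R = stationary_radius(0.5)    # radius when sigma_e / sigma_bar = 1/2
```

As the death threshold $\sigma_e$ approaches the boundary value $\bar\sigma$, the stationary radius shrinks to zero, and as $\sigma_e \to 0$ it grows without bound.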



Since, by conservation of mass, $\operatorname{div} \vec v = S$, the pressure $p$ satisfies the equation
$$-\Delta p = \mu(\sigma - \sigma_e) \quad \text{in } \Omega(t). \tag{4}$$
As in the papers cited above, $\sigma$ and $p$ satisfy the boundary conditions
$$\sigma = \bar\sigma \quad \text{on } \partial\Omega(t) \quad (\bar\sigma > \sigma_e), \tag{5}$$
$$p = \gamma \kappa \quad \text{on } \partial\Omega(t), \tag{6}$$
where $\kappa$ is the mean curvature ($\kappa > 0$ if $\Omega(t)$ is a ball) and $\gamma$ represents the cell-to-cell adhesiveness, as discussed in Byrne [7], Byrne and Chaplain [12], and Greenspan [46]. Furthermore, by continuity, the free boundary moves with the same velocity as the fluid velocity $\vec v$, that is,
$$v_n = -\frac{\partial p}{\partial n} \quad \text{on } \partial\Omega(t), \tag{7}$$
where $n$ is the outward normal and $v_n$ is the velocity of the free boundary $\partial\Omega(t)$ in the direction $n$.
Note that the special case $\mu = 0$ reduces to the Hele–Shaw problem with surface tension. For the Hele–Shaw problem the following results are well known: (1) For any smooth initial data, there exists a unique solution with smooth boundary for a small time interval; global existence is in general not expected. (2) The only stationary solutions are spheres. (3) Spheres are asymptotically stable solutions; that is, for any smooth initial data "close" to that of a sphere, there exists a global smooth solution and it converges to a sphere as $t \to \infty$.
The above three results have been extended to the model (1)–(7). Local existence and uniqueness was proved in Bazaliy and Friedman [3, 4]; see also [16]. In Friedman and Reitich [43], it was proved that for any $0 < \sigma_e < \bar\sigma$, there exists a unique radially symmetric stationary solution, and its radius depends on $\sigma_e/\bar\sigma$, but not on $\mu, \gamma$. In Friedman and Reitich [44], it was proved in the 2-dimensional case that there exists a sequence of symmetry-breaking branches of stationary solutions of (1)–(7) bifurcating from $\mu/\gamma = \mu_n/n$ $(n = 2, 3, 4, \dots)$. A general simplified proof, which works also for the 3-dimensional case, was given in Fontelos and Friedman [27]. The asymptotic stability of the spherical solution for $\mu/\gamma < \mu_2/2$ and of the first bifurcation branch was studied extensively in Friedman and Hu [37–39]; earlier results for small $\mu/\gamma$ were established in Bazaliy and Friedman [4]. Some extensions, e.g., replacing the right-hand sides of (4) and (1) by more general functions, or replacing the spherical solution by an infinite strip, were considered in Cui and Escher [19–21], Escher and Matioc [26], and Zhou et al. [58, 59].
A model with inhibitor was studied in Cui and Friedman [22], and models with necrotic core were considered in Byrne and Chaplain [10], Cui [18], and Cui and Friedman [23].
Although Darcy's law was used in most tumor models, there are tumors for which the tissue is more naturally modeled as fluid. For example, in the early stages of breast cancer, the tumor is confined to the duct of a mammary gland, which consists of epithelial cells, a meshwork of proteins, and mostly extracellular fluid. Several papers on ductal carcinoma in the breast use the Stokes equation in their mathematical models [28–30]. The mathematical studies for tumor growth in Stokes fluids, similar to those of (1)–(7) but technically quite different, were carried out in Friedman and Hu [35, 40, 41].

Tumor with Several Types of Cells
The model introduced in the last section was extended in Pettet et al. [55], Sherratt and Chaplain [56], and Ward and King [57] by the introduction of three types of cells: proliferating cells $P$, quiescent cells $Q$, and dead cells $D$. For simplicity, we use the letters $P$, $Q$, and $D$ to also denote their respective densities. It is assumed that cells can move from one state to another, depending on the concentration of nutrients, $\sigma$:
$P \to Q$ at rate $K_Q(\sigma)$,
$Q \to P$ at rate $K_P(\sigma)$,
$P \to D$ at rate $K_A(\sigma)$ (apoptosis),
$Q \to D$ at rate $K_D(\sigma)$;
furthermore, we denote the proliferation rate of $P$ cells by $K_B(\sigma)$ and the removal rate of dead cells by $K_R$. The total density of all cells within the tumor is assumed to be constant:




$$P + Q + D \equiv \text{const.} \equiv \theta. \tag{8}$$
We also assume that all cells are subject to the same velocity $\vec v$. Then, by conservation of mass,
$$\frac{\partial P}{\partial t} + \operatorname{div}(P \vec v) = [K_B(\sigma) - K_Q(\sigma) - K_A(\sigma)]\, P + K_P(\sigma)\, Q, \tag{9}$$
$$\frac{\partial Q}{\partial t} + \operatorname{div}(Q \vec v) = K_Q(\sigma)\, P - [K_P(\sigma) + K_D(\sigma)]\, Q, \tag{10}$$
$$\frac{\partial D}{\partial t} + \operatorname{div}(D \vec v) = K_A(\sigma)\, P + K_D(\sigma)\, Q - K_R D, \tag{11}$$
where $\sigma$ satisfies (1). Adding (9)–(11) and using (8), we get
$$\theta \operatorname{div} \vec v = K_B(\sigma)\, P - K_R D,$$
and one may replace (11) by (8) with $D = \theta - P - Q$. If the velocity field is again assumed to satisfy Darcy's law (3), then we obtain the system (1), (9), (10), and
$$-\Delta p = K_B(\sigma)\, P - K_R(\theta - P - Q) \quad \text{in } \Omega(t), \tag{12}$$
$$p = \gamma \kappa \quad \text{on } \partial\Omega(t), \tag{13}$$
$$v_n = -\frac{\partial p}{\partial n} \quad \text{on } \partial\Omega(t). \tag{14}$$
The existence of local smooth solutions for the system (1), (9)–(10), (12)–(14) with any smooth initial data was established in Chen and Friedman [16]. The existence of a unique radially symmetric stationary solution and its linear asymptotic stability was proved in Cui and Friedman [24], and in Chen et al. [17] in the case when there are only two types of cells. Results on existence and on asymptotic estimates in the case of radially symmetric solutions were proved in Cui and Friedman [25]. Local existence and uniqueness was established in Friedman [34] for the system (1), (8)–(11) supplemented by the Stokes equation instead of Darcy's law.
Some experiments ([47] and [52]) suggest that cells of different types move with different velocities. A model studied in McElwain and Pettet [50] assumes that the velocities of proliferating cells, $\vec v_P$, and of quiescent cells, $\vec v_Q$, are related by
$$\vec v_Q = \vec v_P + \chi \nabla \sigma, \tag{15}$$
where $\chi$ is a non-negative chemotactic coefficient.
The theory for the system with three, or even two, types of cells is far less complete than the theory for one type of cells, and many challenging questions are open.

Multi-Scale Model
The multi-scale model takes into account the cell cycle in its different phases (see [32]). The cell cycle is divided into phases $G_1$, $S$, $G_2$, and $M$. During the $S$ phase, the DNA is synthesized; during the mitosis phase $M$, sister chromosomes are segregated and the cell divides into two daughter cells; $G_1$ and $G_2$ are "gap" phases, during which the cell grows and prepares for the next phase ($S$ for $G_1$, and $M$ for $G_2$). At a "check point" $R_1$ located near the end of the $G_1$ phase, the cell decides either to proceed directly to the $S$ phase or to go into the quiescent state $G_0$, depending on the environment; the cell may also decide to go into apoptosis (i.e., to commit suicide) in case it detects serious damage. At another check point, $R_2$, near the end of the $S$ phase, the cell again has to make a decision: whether to proceed to the $G_2$ phase or to go into apoptosis, in case of irreparable damage. A cell remains in the quiescent state $G_0$ for a certain amount of time and then proceeds to the $S$ phase. We introduce the following notation:
$p_1(x, t, s_1)$ = density of cells in phase $G_1$, $s_1 \in K_1 \equiv [0, A_1]$;
$p_2(x, t, s_2)$ = density of cells in phase $S$, $s_2 \in K_2 \equiv [0, A_2]$;
$p_0(x, t, s_0)$ = density of cells in state $G_0$, $s_0 \in K_0 \equiv [0, A_0]$;
$p_3(x, t, s_3)$ = density of cells in phases $G_2$ and $M$, $s_3 \in K_3 \equiv [0, A_3]$;
$p_4(x, t)$ = density of necrotic cells.
We denote by $w(x, t)$ the oxygen concentration and by $Q(x, t)$ the density of live cells, i.e.,
$$Q(x, t) = \sum_{i=0}^{3} Q_i(x, t),$$

where
$$Q_i(x, t) = \int_0^{A_i} p_i(x, t, s_i)\, ds_i.$$

The oxygen concentration satisfies the diffusion equation
$$\beta w_t - \Delta w = -\lambda Q w, \tag{16}$$
where $\lambda Q$ is the rate of oxygen consumption by the live cells. Just as in the previous models, we assume that the total density of cells is constant:
$$\sum_{i=0}^{4} Q_i(x, t) = \text{const} \equiv \theta, \quad \text{where } Q_4(x, t) = p_4(x, t). \tag{17}$$
Due to cell proliferation and death, there is a velocity field $\vec v(x, t)$, which is assumed to be common to all the cells. Then, by conservation of mass,
$$\frac{\partial p_i}{\partial t} + \frac{\partial p_i}{\partial s_i} + \operatorname{div}(p_i \vec v) = -\lambda_i(w)\, p_i \tag{18}$$
for $0 < s_i < A_i$ $(i = 0, 1, 2, 3)$, and
$$\frac{\partial p_4}{\partial t} + \operatorname{div}(p_4 \vec v) = \mu_1 p_1(x, t, A_1) + \mu_2 p_2(x, t, A_2) - \mu_4 p_4. \tag{19}$$

We also have:
$$p_1(x, t, 0) = p_3(x, t, A_3),$$
$$p_2(x, t, 0) = p_1(x, t, A_1)\,[1 - K(w(x, t)) - L(Q(x, t)) - \mu_1] + p_0(x, t, A_0),$$
$$p_3(x, t, 0) = (1 - \mu_2)\, p_2(x, t, A_2),$$
$$p_0(x, t, 0) = p_1(x, t, 0)\,[K(w(x, t)) + L(Q(x, t))]. \tag{20}$$
The second equation in (20) expresses the assumption that at the end of the $G_1$ phase, a fraction $K(w) + L(Q)$ of the cells go into quiescence ($K(w)$ increases if $w$ decreases, thereby creating an unfavorable environment for cell proliferation; similarly, $L(Q)$ increases if $Q$ increases, indicating that there are already too many cells), and a fraction $\mu_1$ goes into apoptosis. It is assumed that
$$K(w) > 0, \quad L(Q) > 0, \quad K(w) + L(Q) + \mu_1 < 1.$$
The APC gene detects a signal of overpopulation, and it inhibits proliferation if $Q$ is large by sending the cell into the $G_0$ state. Another gene, SMAD, is activated if $w$ is too small, and it then inhibits proliferation by again sending the cells into the $G_0$ state. The functions of these two genes are represented in the functions $K$ and $L$. If Darcy's law is assumed, then the equation for the velocity can be derived as before, and this completes the model.
It is possible to include in the model different types of cells, e.g., healthy cells and tumor cells. The different nature of the cells is described by different functions $K$, $L$ and $\mu_1$, $\mu_2$. For example, for a cell with a damaged APC gene, the function $L$ is less sensitive to overpopulation (i.e., to larger $Q$). In the case of more than one type of cells, (17) is replaced by requiring the density of all the cells to be constant.
The model (16)–(20) with Darcy's law was developed in Friedman [32], where also local existence and uniqueness for general initial data, and global existence for radially symmetric solutions, were established. The behavior of the solution in case of mutations of APC or SMAD was studied in Friedman et al. [42]. The same system with the Stokes equation instead of Darcy's law was considered in Friedman [36], where local existence and uniqueness was proved.

Mathematical Challenges

In the model introduced in the section on proliferating tumors, a natural question is what is the maximal domain of attraction for the spherical solution. Another question is how far the first bifurcation branch can be continued. For the model described in the section on tumors with several types of cells, already for just two types of cells an explicit expression for the radially symmetric stationary solution is not known. If one could find such an explicit formula, this would open a new line of challenges with regard to symmetry-breaking bifurcations. The asymptotic stability theory for this model is also only very partially developed. All these open questions arise also for the multi-scale model.
The models introduced in this entry are quite minimal. They do not include, in particular, the PDE system which describes angiogenesis [52, 53], whereby the blood vascular system evolves toward the tumor


by signaling molecules produced by the tumor cells. Including angiogenesis will introduce a new level of complexity and mathematical challenges.

References

1. Adam, J.A.: General aspects of modeling tumor growth and immune response. In: Adam, J.A., Bellomo, N. (eds.) A Survey of Models for Tumor-Immune System Dynamics, pp. 14–87. Birkhäuser, Boston (1996)
2. Adam, J.A., Maggelakis, S.A.: Diffusion regulated growth characteristics of a spherical prevascular carcinoma. Bull. Math. Biol. 52, 549–582 (1990)
3. Bazaliy, B., Friedman, A.: A free boundary problem for an elliptic-parabolic system: application to a model of tumor growth. Commun. Partial Diff. Eq. 28, 517–560 (2003a)
4. Bazaliy, B., Friedman, A.: Global existence and asymptotic stability for an elliptic-parabolic free boundary problem: an application to a model of tumor growth. Indiana Univ. Math. J. 52, 1265–1304 (2003b)
5. Bellomo, N., Preziosi, L.: Modelling and mathematical problems related to tumor evolution and its interaction with the immune system. Math. Comput. Model. 32, 413–452 (2000)
6. Britton, N., Chaplain, M.A.J.: A qualitative analysis of some models of tissue growth. Math. Biosci. 113, 77–89 (1993)
7. Byrne, H.M.: The importance of intercellular adhesion in the development of carcinomas. IMA J. Math. Appl. Med. Biol. 14, 305–323 (1997)
8. Byrne, H.M.: A weakly nonlinear analysis of a model of avascular solid tumor growth. J. Math. Biol. 39, 59–89 (1999)
9. Byrne, H.M.: Mathematical modelling of solid tumour growth: from avascular to vascular, via angiogenesis. In: Mathematical Biology. IAS/Park City Math. Ser., vol. 14, pp. 219–287. Amer. Math. Soc., Providence (2009)
10. Byrne, H.M., Chaplain, M.A.J.: Growth of nonnecrotic tumors in the presence and absence of inhibitors. Math. Biosci. 130, 151–181 (1995)
11. Byrne, H.M., Chaplain, M.A.J.: Modelling the role of cell-cell adhesion in the growth and development of carcinomas. Math. Comput. Model. 12, 1–17 (1996a)
12. Byrne, H.M., Chaplain, M.A.J.: Growth of nonnecrotic tumors in the presence and absence of inhibitors. Math. Biosci. 135, 187–216 (1996b)
13. Byrne, H.M., Chaplain, M.A.J.: Free boundary value problems associated with growth and development of multicellular spheroids. Eur. J. Appl. Math. 8, 639–658 (1997)
14. Chaplain, M.A.J.: The development of a spatial pattern in a model for cancer growth. In: Othmer, H.G., Maini, P.K., Murray, J.D. (eds.) Experimental and Theoretical Advances in Biological Pattern Formation, pp. 45–60. Plenum, New York (1993)
15. Chaplain, M.A.J.: Modelling aspects of cancer growth: insight from mathematical and numerical analysis and computational simulation. In: Banasiak, J., et al. (eds.) Multiscale Problems in the Life Sciences. Lecture Notes in Math., vol. 1940, pp. 147–200. Springer, Berlin (2008)
16. Chen, X., Friedman, A.: A free boundary problem for an elliptic-hyperbolic system: an application to tumor growth. SIAM J. Math. Anal. 35, 974–986 (2003)
17. Chen, X., Cui, S., Friedman, A.: A hyperbolic free boundary problem modeling tumor growth: asymptotic behavior. Trans. AMS 357, 4771–4804 (2005)
18. Cui, S.: Analysis of a mathematical model of the growth of necrotic tumors. J. Math. Anal. Appl. 255, 636–677 (2001)
19. Cui, S., Escher, J.: Bifurcation analysis of an elliptic free boundary problem modelling the growth of avascular tumors. SIAM J. Math. Anal. 39, 210–235 (2007)
20. Cui, S., Escher, J.: Asymptotic behaviour of solutions of a multidimensional moving boundary problem modeling tumor growth. Commun. Partial Diff. Eq. 33, 636–655 (2008)
21. Cui, S., Escher, J.: Well-posedness and stability of a multidimensional tumor growth model. Arch. Ration. Mech. Anal. 191, 173–193 (2009)
22. Cui, S., Friedman, A.: Analysis of a mathematical model of the effect of inhibitors on the growth of tumors. Math. Biosci. 164, 103–137 (2000)
23. Cui, S., Friedman, A.: Analysis of a mathematical model of the growth of necrotic tumors. J. Math. Anal. Appl. 255, 636–677 (2001)
24. Cui, S., Friedman, A.: A free boundary problem for a singular system of differential equations: an application to a model of tumor growth. Trans. AMS 355, 3537–3590 (2003)
25. Cui, S., Friedman, A.: A hyperbolic free boundary problem modeling tumor growth. Interfaces Free Bound. 5, 159–182 (2003)
26. Escher, J., Matioc, A.-V.: Radially symmetric growth of nonnecrotic tumors. Nonlinear Diff. Eq. Appl. 17, 1–20 (2010)
27. Fontelos, M., Friedman, A.: Symmetry-breaking bifurcations of free boundary problems in three dimensions. Asymptot. Anal. 35, 187–206 (2003)
28. Franks, S.J.H., Byrne, H.M., Underwood, J.C.E., Lewis, C.E.: Biological inferences from a mathematical model of comedo ductal carcinoma in situ of the breast. J. Theor. Biol. 232, 523–543 (2005)
29. Franks, S.J.H., Byrne, H.M., King, J.P., Underwood, J.C.E., Lewis, C.E.: Modelling the early growth of ductal carcinoma in situ of the breast. J. Math. Biol. 47, 424–452 (2003a)
30. Franks, S.J.H., Byrne, H.M., King, J.P., Underwood, J.C.E., Lewis, C.E.: Modelling the growth of ductal carcinoma in situ. Math. Med. Biol. 20, 277–308 (2003b)
31. Franks, S.J.H., King, J.R.: Interaction between a uniformly proliferating tumor and its surroundings: uniform material properties. Math. Med. Biol. 20, 47–89 (2003)
32. Friedman, A.: A multiscale tumor model. Interfaces Free Bound. 10, 245–262 (2008)
33. Friedman, A.: A hierarchy of cancer models and their mathematical challenges. Mathematical models in cancer. Discrete Contin. Dyn. Syst. Ser. B 4, 147–159 (2004)
34. Friedman, A.: Mathematical analysis and challenges arising from models of tumor growth. Math. Models Methods Appl. Sci. 17, 1751–1772 (2007)
35. Friedman, A.: A free boundary problem for a coupled system of elliptic, parabolic and Stokes equations modeling tumor growth. Interfaces Free Bound. 8, 247–261 (2006)
36. Friedman, A.: Free boundary problems associated with multiscale tumor models. Math. Model. Nat. Phenom. 4, 134–155 (2009)
37. Friedman, A., Hu, B.: Bifurcation from stability to instability for a free boundary problem arising in a tumor model. Arch. Ration. Mech. Anal. 180, 292–330 (2006a)
38. Friedman, A., Hu, B.: Asymptotic stability for a free boundary problem arising in a tumor model. J. Diff. Eq. 227(2), 598–639 (2006b)
39. Friedman, A., Hu, B.: Stability and instability of Liapounov-Schmidt and Hopf bifurcation for a free boundary problem arising in a tumor model. Trans. Am. Math. Soc. 360, 5291–5342 (2008)
40. Friedman, A., Hu, B.: Bifurcation for a free boundary problem modeling tumor growth by Stokes equation. SIAM J. Math. Anal. 39, 174–194 (2007a)
41. Friedman, A., Hu, B.: Bifurcation from stability to instability for a free boundary problem modeling tumor growth by Stokes equation. J. Math. Anal. Appl. 327, 643–664 (2007b)
42. Friedman, A., Kao, C.-Y., Hu, B.: Cell cycle control at the first restriction point and its effect on tissue growth. J. Math. Biol. 60, 881–907 (2010)
43. Friedman, A., Reitich, F.: Analysis of a mathematical model for growth of tumor. J. Math. Biol. 38, 262–284 (1999)
44. Friedman, A., Reitich, F.: Symmetry-breaking bifurcation of analytic solutions to free boundary problems: an application to a model of tumor growth. Trans. Am. Math. Soc. 353, 1587–1634 (2000)
45. Greenspan, H.P.: Models for the growth of a solid tumor by diffusion. Stud. Appl. Math. 52, 317–340 (1972)
46. Greenspan, H.P.: On the growth of cell culture and solid tumors. J. Theor. Biol. 56, 229–242 (1976)
47. Hughes, F., McCulloch, C.: Quantification of chemotactic response of quiescent and proliferating fibroblasts in Boyden chambers by computer-assisted image analysis. J. Histochem. Cytochem. 39, 243–246 (1991)
48. Lejeune, O., Chaplain, M.A.J., El Akili, I.: Oscillations and bistability in the dynamics of cytotoxic reactions mediated by the response of immune cells to solid tumours. Math. Comput. Model. 47, 649–662 (2008)
49. Maggelakis, S.A., Adam, J.A.: Mathematical model for prevascular growth of a spherical carcinoma. Math. Comp. Model. 13, 23–38 (1990)
50. McElwain, D., Pettet, G.: Cell migration in multicell spheroids: swimming against the tides. Bull. Math. Biol. 55, 655–674 (1993)
51. McElwain, D.L.S., Morris, L.E.: Apoptosis as a volume loss mechanism in mathematical models of solid tumor growth. Math. Biosci. 39, 147–157 (1978)
52. Palka, J., Adelman-Griff, B., Franz, P., Bayreuter, K.: Differentiation stage and cell cycle position determine the chemotactic response of fibroblasts. Folia Histochem. Cytobiol. 34, 121–127 (1996)
53. Macklin, P., McDougall, S., Anderson, A., Chaplain, M.A.J., Cristini, V., Lowengrub, J.: Multiscale modelling and nonlinear simulation of vascular tumour growth. J. Math. Biol. 58, 765–798 (2009)
54. Owen, M.R., Alarcón, T., Maini, P., Byrne, H.M.: Angiogenesis and vascular remodelling in normal and cancerous tissues. J. Math. Biol. 58, 689–721 (2009)
55. Pettet, G., Please, C.P., Tindall, M.J., McElwain, D.: The migration of cells in multicell tumor spheroids. Bull. Math. Biol. 63, 231–257 (2001)
56. Sherratt, J., Chaplain, M.A.J.: A new mathematical model for avascular tumor growth. J. Math. Biol. 43, 291–312 (2001)
57. Ward, J.P., King, J.R.: Mathematical modelling of avascular tumor growth II: modelling growth saturation. IMA J. Math. Appl. Med. Biol. 15, 1–42 (1998)
58. Zhou, F., Escher, J., Cui, S.: Well-posedness and stability of a free boundary problem modeling the growth of multi-layer tumors. J. Diff. Eq. 244, 2909–2933 (2008a)
59. Zhou, F., Escher, J., Cui, S.: Bifurcation for a free boundary problem with surface tension modeling the growth of multi-layer tumors. J. Math. Anal. Appl. 337, 443–457 (2008b)

Multigrid Methods: Algebraic

Luke Olson
Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA

Synonyms

Algebraic multigrid; AMG

Definition

Algebraic multigrid (AMG) methods are used to approximate solutions to (sparse) linear systems of equations using the multilevel strategy of relaxation and coarse-grid correction that is used in geometric multigrid (GMG) methods. While partial differential equations (PDEs) are often the source of these linear systems, the goal in AMG is to generalize the multilevel process to target problems where the correct coarse problem is not apparent – e.g., unstructured meshes, graph problems, or structured problems where uniform refinement is not effective. In GMG, a multilevel hierarchy is determined from structured coarsening of the problem, followed by defining relaxation and interpolation operators. In contrast, in an AMG method the relaxation method is selected – e.g., Gauss-Seidel – and the coarse problems and interpolation are automatically constructed.

Overview

Early work in multigrid methods relied on geometric structure to construct coarse problems. This was generalized in [11] by McCormick, where multigrid


is analyzed in terms of the matrix properties. This algebraic approach to the theory was further extended by Mandel in [10], and together with [3] by Brandt, these works form the basis for much of the early development that led to the so-called Ruge-Stüben or classical algebraic multigrid (CAMG) method in [13]. One distinguishing aspect of CAMG is that the coarse problem is defined on a subset of the degrees of freedom of the problem, thus resulting in both coarse and fine points, leading to the term CF-based AMG. A different style of algebraic multigrid emerged in [14] as smoothed aggregation-based AMG (SA), where collections of degrees of freedom (aggregates) define the coarse degrees of freedom. Together, the frameworks of CF- and SA-based AMG have led to a number of developments in extending AMG to a wider class of problems and architectures.

There are a number of software libraries that implement different forms of AMG for different uses. The original CAMG algorithm and variants are available as amg1r5 and amg1r6 [13]. The Hypre library supports a parallel implementation of CF-based AMG in the BoomerAMG package [8]. The Trilinos package includes ML [7] as a parallel, SA-based AMG solver. Finally, PyAMG [2] includes a number of AMG variants for testing, and Cusp [1] ships with a standard SA implementation for use on a graphics processing unit (GPU).

Terminology

The goal of the AMG solver is to approximate the solution to

    Ax = b,    (1)

where A ∈ R^{n×n} is sparse, symmetric, and positive definite. The fine problem (1) is defined on the fine index set Ω_0 = {0, …, n−1}. An AMG method is generally determined in two phases: the setup phase and the solve phase. The setup phase is responsible for constructing the coarse operator A_k for each level k of the hierarchy, along with the interpolation operator P_k. A basic hierarchy, for example, consists of a series of operators {A_0, A_1, …, A_m} and {P_0, P_1, …, P_{m−1}}. Given such a hierarchy, the solve phase then executes in the same manner as that of geometric multigrid, as in Algorithm 1 for a two-level method; an m-level method extends similarly. Here, the operator G(·) denotes a relaxation method such as weighted Jacobi or Gauss-Seidel.

Algorithm 1: AMG solve phase
x ← G(A_0, x, b)        {Pre-relaxation on Ω_0}
r_1 ← P_0^T r           {Restrict the residual r = b − A_0 x to Ω_1}
e_1 ← A_1^{−1} r_1      {Coarse-grid solution on Ω_1}
ê ← P_0 e_1             {Interpolate coarse-grid error}
x ← x + ê               {Correct fine-grid solution}
x ← G(A_0, x, b)        {Post-relaxation on Ω_0}

Theoretical Observations

The two-grid process defined in Algorithm 1 can be viewed as an error propagation operator. First, let G represent the error operator for relaxation – e.g., G = I − ωD^{−1}A for weighted Jacobi. In addition, coarse operators A_k are typically defined through a Galerkin product: A_{k+1} = P_k^T A_k P_k. Thus, for an initial guess x and exact solution x* to (1), the error e = x* − x for a two-grid method with one pass of pre-relaxation is defined through

    e ← (I − P_0 (P_0^T A_0 P_0)^{−1} P_0^T A_0) G e,    (2)

where, from right to left, the factors represent relaxation, forming the residual, restriction, the coarse solve, interpolation, and correction.

A key observation follows from (2) in defining AMG methods: if the error remaining after relaxation is contained in the range of interpolation, denoted R(P), then the solver is exact. That is, if Ge ∈ R(P), then the coarse-grid correction defined by T = I − P(P^T A P)^{−1} P^T A annihilates the error. One important property of T is that it is an A-orthogonal projection, which highlights the close relationship with other subspace projection methods.
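The observation above can be checked numerically. The sketch below builds T for a small 1D Poisson matrix with linear interpolation (both standard illustrative choices, not tied to any particular AMG code) and verifies that an error in R(P) is annihilated:

```python
import numpy as np

# Check: if the error lies in the range of interpolation R(P), the
# coarse-grid correction T = I - P (P^T A P)^{-1} P^T A annihilates it,
# and T is an A-orthogonal projection.

n = 9
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1D Poisson matrix

nc = (n - 1) // 2                        # 4 coarse points
P = np.zeros((n, nc))                    # linear interpolation, coarse -> fine
for j in range(nc):
    i = 2 * j + 1                        # fine-grid location of coarse point j
    P[i - 1, j], P[i, j], P[i + 1, j] = 0.5, 1.0, 0.5

T = np.eye(n) - P @ np.linalg.solve(P.T @ A @ P, P.T @ A)

e = P @ np.arange(1.0, nc + 1.0)         # an error already in R(P)
print(np.linalg.norm(T @ e))             # ~0: the correction annihilates it
```

The same T satisfies T² = T and A T = (A T)^T, i.e., it is an A-orthogonal projection, as stated above.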

Methods

The setup phase of AMG defines the method. However, there are several common features:
1. Determining the strength of connection between degrees of freedom
2. Identifying coarse degrees of freedom
3. Constructing interpolation, P
4. Forming the coarse operator through the Galerkin product P^T A P
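The four steps above can be sketched as a generic setup loop. The strength, coarsening, and interpolation routines are passed in as callables because they are what distinguish one AMG method from another; the trivial choices used in the demo (strength equals the matrix, coarse points at every other index, injection interpolation) are placeholders purely to exercise the loop, not a real AMG method:

```python
import numpy as np

# Skeleton of the AMG setup phase. The strength/coarsen/interpolation
# callables are stand-ins (assumptions) for real AMG components.

def amg_setup(A, strength, coarsen, interpolation, max_levels=10, min_size=10):
    """Return operators [A_0, ..., A_m] and interpolations [P_0, ..., P_{m-1}]."""
    operators, interpolators = [A], []
    while len(operators) < max_levels and operators[-1].shape[0] > min_size:
        Ak = operators[-1]
        S = strength(Ak)                  # 1. strength of connection
        coarse = coarsen(S)               # 2. coarse degrees of freedom
        P = interpolation(Ak, coarse)     # 3. interpolation operator
        operators.append(P.T @ Ak @ P)    # 4. Galerkin product P^T A P
        interpolators.append(P)
    return operators, interpolators

def injection(Ak, coarse):
    """Trivial interpolation: identity at the selected coarse points."""
    P = np.zeros((Ak.shape[0], len(coarse)))
    for j, i in enumerate(coarse):
        P[i, j] = 1.0
    return P

n = 32
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
ops, Ps = amg_setup(A,
                    strength=lambda A: A,
                    coarsen=lambda S: range(0, S.shape[0], 2),
                    interpolation=injection)
print([Ak.shape[0] for Ak in ops])   # [32, 16, 8]
```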


Algebraic methods determine coarse grids and the resulting interpolation operators to complement the limitations of relaxation. That is, interpolation should capture the error components that relaxation – e.g., weighted Jacobi – does not adequately reduce. The error not reduced by relaxation is termed algebraically smooth error. To identify smooth error, an edge in the graph of the matrix A is deemed strong if the error is perceived to vary slowly along that edge. This allows automatic coarsening to match the behavior of relaxation. As an example, consider the case of an anisotropic diffusion operator −u_xx − εu_yy rotated by 45° relative to the coordinate axes and discretized by Q1 finite elements on a uniform mesh. As the anisotropic behavior increases (ε → 0), uniform coarsening with geometric multigrid results in degraded performance. In an algebraic method, coarsening is along the direction of smooth error, which follows the line of anisotropy as shown in Fig. 1. Here, coarsening is (automatically) performed only in the direction of smooth error and results in rapid convergence.

CF-Based AMG

CF-based AMG begins with A_k, the k-level matrix, and determines strong edges according to

    |A_ij| ≥ θ max_{k≠i} |A_ik|,    (3)

where θ is some threshold. This process yields a strength matrix S (see Algorithm 2), which identifies edges where the error is smooth after relaxation. In turn, S is used to split the index set into C-points and F-points (see Fig. 1b), requiring that each F-point is strongly connected to at least one C-point (for interpolation). With the C/F-points identified, weights W are determined to form an interpolation operator of the form

    P = \begin{bmatrix} W \\ I \end{bmatrix}.

Finally, a coarse operator is constructed through the Galerkin product P^T A P, which is the dominant cost for most AMG methods.

Algorithm 2: CF-based AMG
Input: A: n × n fine-level matrix
Return: A_0, …, A_m, P_0, …, P_{m−1}
for k = 0, …, m − 1 do
  S ← strength(A_k, θ)       {Compute strength of connection}
  C, F ← split(S)            {Determine C-points and F-points}
  P_k ← interp(A_k, C, F)    {Construct interpolation from C to F}
  A_{k+1} = P_k^T A_k P_k    {Construct coarse operator}
end

SA-Based AMG

SA-based AMG methods have an important distinction: they require a priori knowledge of the slow-to-converge or smooth error, denoted B. A common choice for these vectors, in the absence of more knowledge about the problem, is B ≡ 1, the constant vector. The SA algorithm (see Algorithm 3) first constructs a strength-of-connection matrix, similar to CF-based AMG, but using the symmetric threshold

    |A_ij| ≥ θ √(|A_ii A_jj|).    (4)

From this, aggregates – collections of nodes – are formed (see Fig. 2) and represent coarse degrees of freedom. Next, B is restricted locally to each aggregate to form a tentative interpolation operator T so that B ∈ R(T). Then, to improve the accuracy of interpolation, T is smoothed (for example, with weighted Jacobi) to yield the interpolation operator P. This is shown in Fig. 2b, where piecewise constant functions form the basis for the range of T, while the basis for the range of P resembles piecewise linear functions. Finally, the coarse operator is computed through the Galerkin product.

Algorithm 3: SA-based AMG
Input: A: n × n fine-level matrix; B: n × c vectors representing c smooth error components
Return: A_0, …, A_m, P_0, …, P_{m−1}
for k = 0, …, m − 1 do
  S ← strength(A_k, θ)       {Compute strength of connection}
  Agg ← aggregate(S)         {Aggregate nodes in the strength graph}
  T_k ← tentative(B, Agg)    {Construct tentative interpolation operator}
  P_k ← smooth(A_k, T_k)     {Improve interpolation operator}
  A_{k+1} = P_k^T A_k P_k    {Construct coarse operator}
end
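The SA construction can be sketched concretely on a small 1D problem. The fixed-size aggregates below are an assumption for illustration (a real code forms them from the strength matrix (4)); B = 1 is injected into each aggregate to form the tentative operator T, and one weighted Jacobi sweep smooths T into P:

```python
import numpy as np

# SA sketch: aggregate 1D nodes, inject B = 1 to build the tentative
# operator T, smooth T with one weighted Jacobi step, and form the
# Galerkin coarse operator. Aggregate sizes are illustrative assumptions.

n = 12
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

aggregates = [range(3 * j, 3 * j + 3) for j in range(n // 3)]  # 4 aggregates

# Tentative interpolation: column j is B = 1 restricted to aggregate j,
# so the constant vector lies exactly in R(T).
T = np.zeros((n, len(aggregates)))
for j, agg in enumerate(aggregates):
    T[list(agg), j] = 1.0

# Smooth T (one weighted Jacobi step applied to its columns) to obtain P.
omega = 2.0 / 3.0
D_inv = np.diag(1.0 / np.diag(A))
P = (np.eye(n) - omega * D_inv @ A) @ T

A_coarse = P.T @ A @ P        # Galerkin coarse operator
print(A_coarse.shape)         # (4, 4)
```

The columns of T are piecewise constant over the aggregates, and smoothing widens their support, mirroring the piecewise constant vs. piecewise linear picture in Fig. 2b.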


Multigrid Methods: Algebraic, Fig. 1 CF-based AMG for a rotated anisotropic diffusion problem. (a) Error after relaxation for a random guess. (b) Coarse points (•) and fine points (○)
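The coarsening behavior behind Fig. 1 comes from the strength measure. The sketch below uses an axis-aligned anisotropic operator for simplicity (the rotated case in Fig. 1 behaves analogously) and shows that, with the classical measure (3), only the neighbors in the strong direction survive the threshold:

```python
import numpy as np

# Strength of connection for -u_xx - eps * u_yy on an n x n grid
# (axis-aligned illustration; eps and theta values are assumptions).

def poisson2d_aniso(n, eps):
    """5-point finite-difference matrix for -u_xx - eps*u_yy."""
    I = np.eye(n)
    T1 = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    return np.kron(I, T1) + eps * np.kron(T1, I)

def strong_edges(A, i, theta=0.25):
    """Indices j with |A_ij| >= theta * max_{k != i} |A_ik|, as in (3)."""
    row = np.abs(A[i].copy())
    row[i] = 0.0                     # exclude the diagonal
    return np.nonzero(row >= theta * row.max())[0]

n = 5
A = poisson2d_aniso(n, eps=0.01)
i = 12                               # center node of the 5 x 5 grid
print(strong_edges(A, i))            # only the x-direction neighbors (11, 13)
```

Coarsening then proceeds only along these strong edges, i.e., along the direction of smooth error.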


Multigrid Methods: Algebraic, Fig. 2 SA-based AMG in 2D and in 1D. (a) Aggregation of nodes on a mesh. (b) Column of T and P on an aggregate

Practical Considerations

Algebraic multigrid methods are commonly used as preconditioners – for example, for restarted GMRES or conjugate gradient Krylov methods – leading to a reduction in the number of iterations. However, the total cost of the preconditioned iteration requires an assessment of both the convergence factor ρ and the work in each multigrid cycle. To measure the work in a V-cycle, the so-called operator complexity of the hierarchy is used:

    c_op = Σ_{k=0}^{m} nnz(A_k) / nnz(A_0).

With this, the total work per digit of accuracy is estimated as c_op / (−log_10 ρ). This relates the cost of an AMG cycle to the cost of a standard sparse matrix-vector multiplication. This also
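As a small worked example of this cost model, the nonzero counts and convergence factor below are made-up illustrative numbers, not measurements from any solver:

```python
import math

# Operator complexity and work per digit of accuracy for a hypothetical
# hierarchy; nnz counts and rho are illustrative assumptions.

nnz_per_level = [10000, 4200, 1500, 480]      # nnz(A_0), ..., nnz(A_m)
rho = 0.1                                     # assumed convergence factor

c_op = sum(nnz_per_level) / nnz_per_level[0]  # operator complexity
work_per_digit = c_op / (-math.log10(rho))    # in fine-grid matvec units

print(c_op, work_per_digit)
```

Here one cycle costs about 1.62 fine-grid matrix-vector products' worth of work, and since ρ = 0.1 gains one digit per cycle, the work per digit equals c_op.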

exposes the cost versus accuracy relationship in AMG, yet this may be “hidden” if the cost of the setup phase is not included. In both CF-based AMG and SA-based AMG, the interpolation operator plays a large role in both the effectiveness and the complexity of the algorithm. In each case, interpolation can be enriched – for example, by extending the interpolation pattern or by growing B in the case of SA – leading to faster convergence but more costly iterations. There are a number of ways in which AMG has been extended or redesigned in order to increase the robustness for a wider range of problems or to improve efficiency. For example, the adaptive methods of [4, 5]

attempt to construct an improved hierarchy by modifying the setup phase based on its performance on Ax = 0. Other works focus on individual components, such as generalizing the strength of connection [12] or coarsening, such as the work on compatible relaxation [9], where coarse grids are selected directly through relaxation. And new methods continue to emerge as the theory supporting AMG becomes more developed and generalized [6].

Cross-References

Classical Iterative Methods
Domain Decomposition
Multigrid Methods: Geometric
Preconditioning

References

1. Bell, N., Garland, M.: Cusp: generic parallel algorithms for sparse matrix and graph computations. http://cusp-library.googlecode.com, version 0.3.0 (2012)
2. Bell, W.N., Olson, L.N., Schroder, J.B.: PyAMG: algebraic multigrid solvers in Python v2.0. http://www.pyamg.org, release 2.0 (2011)
3. Brandt, A.: Algebraic multigrid theory: the symmetric case. Appl. Math. Comput. 19, 23–56 (1986)
4. Brezina, M., Falgout, R., MacLachlan, S., Manteuffel, T., McCormick, S., Ruge, J.: Adaptive smoothed aggregation (αSA). SIAM J. Sci. Comput. 25(6), 1896–1920 (2004)
5. Brezina, M., Falgout, R., MacLachlan, S., Manteuffel, T., McCormick, S., Ruge, J.: Adaptive algebraic multigrid. SIAM J. Sci. Comput. 27(4), 1261–1286 (2006)
6. Falgout, R., Vassilevski, P.: On generalizing the algebraic multigrid framework. SIAM J. Numer. Anal. 42(4), 1669–1693 (2004)
7. Gee, M.W., Siefert, C.M., Hu, J.J., Tuminaro, R.S., Sala, M.G.: ML 5.0 smoothed aggregation user's guide. http://trilinos.org/packages/ml/ (2007)
8. Henson, V.E., Yang, U.M.: BoomerAMG: a parallel algebraic multigrid solver and preconditioner. Appl. Numer. Math. 41(1), 155–177 (2002)
9. Livne, O.E.: Coarsening by compatible relaxation. Numer. Linear Algebra Appl. 11(2–3), 205–227 (2004)
10. Mandel, J.: Algebraic study of multigrid methods for symmetric, definite problems. Appl. Math. Comput. 25(1, part I), 39–56 (1988)
11. McCormick, S.F.: Multigrid methods for variational problems: general theory for the V-cycle. SIAM J. Numer. Anal. 22(4), 634–643 (1985)
12. Olson, L.N., Schroder, J., Tuminaro, R.S.: A new perspective on strength measures in algebraic multigrid. Numer. Linear Algebra Appl. 17(4), 713–733 (2010)
13. Ruge, J.W., Stüben, K.: Algebraic multigrid. In: Multigrid Methods. Frontiers in Applied Mathematics, vol. 3, pp. 73–130. SIAM, Philadelphia (1987)
14. Vaněk, P., Mandel, J., Brezina, M.: Algebraic multigrid by smoothed aggregation for second and fourth order elliptic problems. Computing 56(3), 179–196 (1996)

Multigrid Methods: Geometric

Luke Olson
Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA

Synonyms

Geometric multigrid; GMG; MG

Definition

Multigrid (MG) methods are used to approximate solutions to elliptic partial differential equations (PDEs) by iteratively improving the solution through a sequence of coarser discretizations or grids. The methodology has been developed and extended since the 1970s to also target more general PDEs and systems of algebraic equations. A typical approach consists of a series of refinements or grids, where an approximate solution is iteratively improved through a combination of relaxation – e.g., Gauss-Seidel – and defect corrections, e.g., using projections to coarser, smaller grids.

Overview

Multigrid methods were formalized by the late 1970s in the works of Brandt [3, 4] and Hackbusch [11] but were also studied earlier by Fedorenko [9, 10]. Over the next decade, multigrid development focused on, among other directions, the design and analysis of different relaxation techniques, the construction of coarse discretizations, and the theory supporting a more robust geometric multigrid framework – e.g., see McCormick [12]. Through this early development, operator-based strategies and an algebraic approach to multigrid emerged, which culminated in the work



Multigrid Methods: Geometric, Fig. 1 Hierarchy of grids with spacing h, 2h, and 4h

of Ruge and Stüben [13]. More recently, multigrid methods have grown in popularity and in robustness, being used in a vast number of areas of science and on a variety of computing architectures. Several texts on the subject give a more complete historical overview and description [5, 15]. Since there are many ways to set up a multigrid approach, each with a number of setup decisions and tunable parameters, multigrid is best viewed as a framework rather than a specific method. Here, we present a representative approach in the context of a matrix problem resulting from a discretization of an elliptic PDE. An alternative approach to presenting a geometric multigrid method is to formulate the problem in a weak context at each grid level – e.g., a finite element formulation. Likewise, an entirely algebraic approach may be taken wherein only the matrix A is considered – e.g., recent versions of algebraic multigrid.

Terminology

The goal is to solve a matrix problem

    A^h u^h = f^h    (1)

associated with a grid Ω^h. In the following, we construct a sequence of symmetric, positive-definite (matrix) problems A^h u^h = f^h associated with grids Ω^h (see Fig. 1). We assume that the grids Ω^h, with h = h_0 < h_1 < ⋯ < h_m, are nested – i.e., Ω^{h_{k+1}} ⊂ Ω^{h_k}. The grid with spacing h = h_0 is referred to as the fine grid, while the coarsest grid is represented by h = h_m. In addition, when considering only two grids, h and H are used to simplify notation for the fine and coarse grids. Central to the multigrid process is the ability to adequately represent certain grid functions u^h ∈ Ω^h on a coarser grid, Ω^H. We denote the restriction operator by R_h^H : Ω^h → Ω^H and the prolongation or interpolation operator by P_H^h : Ω^H → Ω^h, both of which are assumed to be full rank. In the following, the standard Euclidean and energy norms are denoted ‖·‖ and ‖·‖_A, with respective inner products ⟨·,·⟩ and ⟨·,·⟩_A. For an initial guess u_0^h, the objective is to construct a multilevel iterative process that reduces the energy norm of the error. This is accomplished by exposing the error in u_0^h as oscillatory error on different grid levels. A useful observation is that the error e_0^h = u^h − u_0^h satisfies the error equation A^h e_0^h = r_0^h, where r_0^h is the residual, r_0^h = f^h − A^h u_0^h.

Basic Methodology

Consider the elliptic partial differential equation

    −u_xx = f(x),    (2)

with zero boundary conditions on the unit interval. Using second-order finite differences on Ω^h = {x_i^h} with nodes x_i^h = ih, where h = 1/(n + 1) and i = 0, …, n + 1, results in the matrix problem

    \frac{1}{h^2}
    \begin{bmatrix}
     2 & -1 &        &        &    \\
    -1 &  2 & -1     &        &    \\
       & \ddots & \ddots & \ddots & \\
       &    & -1 &  2 & -1 \\
       &    &    & -1 &  2
    \end{bmatrix}
    \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_{n-1} \\ u_n \end{bmatrix}
    =
    \begin{bmatrix} f_1 \\ f_2 \\ \vdots \\ f_{n-1} \\ f_n \end{bmatrix},    (3)

i.e., A^h u^h = f^h. Given an initial guess u_0^h to the solution u^h, a stationary iterative method computes an update of the form

    u_1^h = u_0^h + M^{−1}(f^h − A^h u_0^h) = u_0^h + M^{−1} r_0^h.    (4)

Notice that if M = A, then the iteration is exact. The error in the stationary iteration (4) satisfies

    e_1^h = (I − M^{−1} A^h) e_0^h = G e_0^h,    (5)

which implies that a sufficient condition for convergence is that the error propagation matrix G for this problem satisfies ρ(G) < 1. For a symmetric, positive-definite M-matrix – i.e., weakly diagonally dominant with positive diagonal and negative off-diagonal entries – a common stationary method is weighted Jacobi, with M = (1/ω)D for some weight ω ∈ (0, 1) and with D the diagonal of A. As an example, consider (3) with ω = 2/3 and h = 0.01, a random initial guess, and f^h ≡ 0. As shown in Fig. 2a, weighted Jacobi is very effective at reducing the error for the first few iterations but quickly stagnates.

The ability of a relaxation or smoothing method, such as weighted Jacobi, to rapidly reduce the error in the first few iterations is central to a multigrid method. To see this, we note that the eigenvectors of A^h are Fourier modes, and the eigenvalue-eigenvector pairs (λ, v) are

    λ_k = 4 sin²(kπ/(2n)),   v_{k,i} = sin(ikπ/n),   for k = 1, …, n.    (6)

Correspondingly, the eigenvalue-eigenvector pairs (λ_{ωJ}, v_{ωJ}) of the weighted Jacobi iteration matrix G in (5) become

    λ_{ωJ} = 1 − (ω/2) λ,   v_{ωJ} = v.    (7)

Thus, the eigenvalues of the weighted Jacobi iteration matrix that approach 1.0 (thus leading to stagnation) correspond to low wavenumbers k and are associated with Fourier modes that are smooth. Consequently, weighted Jacobi is effective for highly oscillatory error – i.e., error with large energy norm – and is ineffective for smooth error, i.e., error that corresponds to low Fourier modes. This is depicted in Fig. 2b, where the weighted Jacobi convergence factor is shown for each Fourier wavenumber.

Prior to relaxation, the error e_0^h is likely to have a representation in both low- and high-frequency Fourier modes. After relaxation, the high-frequency modes no longer dominate, and the remaining error is largely comprised of low-frequency Fourier modes. To eliminate these smooth errors, a multigrid method constructs a coarse-grid correction step as part of the iteration. That is, consider k steps of a weighted Jacobi relaxation method:

    u_k^h = u_{k−1}^h + ωD^{−1} r_{k−1}^h = G(u^h, f^h, k).    (8)

Since e_k^h is expected to be smooth, it can be represented by a coarser vector e_k^H and reconstructed through low-order (linear) interpolation. For example, halving the fine-grid problem results in a coarse grid Ω^H with H = 2h and n_c = (n − 1)/2 coarse points. Then, we define an interpolation operator P_H^h : R^{n_c} → R^n using linear interpolation,

    P_H^h = \frac{1}{2}
    \begin{bmatrix}
    1 & 2 & 1 &   &   &        &   &   \\
      &   & 1 & 2 & 1 &        &   &   \\
      &   &   &   &   & \ddots &   &   \\
      &   &   &   &   & 1 & 2 & 1
    \end{bmatrix}^T,    (9)

and restriction given by R_h^H = (P_H^h)^T. Then the two-level multigrid algorithm is given in Algorithm 1.

A multilevel algorithm follows by observing the effect of restricting a low Fourier mode to a coarser grid. For example, consider the case of a fine grid with n = 15, which results in a coarse grid with n = 7. A low Fourier mode with wavenumber k = 5 (see (6))



Multigrid Methods: Geometric, Fig. 2 Energy norm of the error and convergence factors in a weighted Jacobi iteration. (a) Error history. (b) Asymptotic convergence factors
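The behavior in Fig. 2 can be reproduced in a few lines of NumPy. The experiment below runs weighted Jacobi on the model problem (3) with f = 0, so the iterate is itself the error; the Fourier modes used as probes are the exact eigenvectors sin(ikπ/(n + 1)) of the matrix, and the grid size is illustrative:

```python
import numpy as np

# Weighted Jacobi (omega = 2/3, as in the text) damps oscillatory Fourier
# modes quickly but barely reduces smooth ones.

n = 99
h = 1.0 / (n + 1)
A = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

omega = 2.0 / 3.0
Minv = omega * h**2 / 2.0            # inverse of M = (1/omega) D, D = (2/h^2) I

rng = np.random.default_rng(0)
e = rng.random(n)                    # with f = 0 the iterate equals the error
for _ in range(10):
    e = e - Minv * (A @ e)           # error propagation G = I - omega D^{-1} A

def mode(k):
    i = np.arange(1, n + 1)
    return np.sin(i * k * np.pi / (n + 1))

smooth = abs(mode(1) @ e) / np.linalg.norm(mode(1))
rough = abs(mode(n) @ e) / np.linalg.norm(mode(n))
print(smooth, rough)                 # the oscillatory component is wiped out
```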

Algorithm 1: Two-level multigrid
u^h ← G(u_0^h, f^h, k_pre)       {relax k_pre times on the fine grid Ω^h}
r^h = f^h − A^h u^h              {form residual}
r^H = R_h^H r^h                  {restrict residual to the coarse grid Ω^H}
e^H = (A^H)^{−1} r^H             {solve the coarse-grid error problem}
e^h = P_H^h e^H                  {interpolate the coarse error approximation}
ū^h = u^h + e^h                  {correct the (relaxed) solution}
u_1^h ← G(ū^h, f^h, k_post)      {relax k_post times on the fine grid Ω^h}
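Algorithm 1 transcribes almost line for line into code. In the sketch below, the restriction uses the scaling R = 0.5 P^T (full weighting, one common choice of the constant c discussed later in this entry), and the coarse operator is formed as the Galerkin product R A P; n, ω, and the sweep counts are illustrative:

```python
import numpy as np

# Two-level multigrid (Algorithm 1) for the 1D model problem (3), with
# linear interpolation (9) and R = 0.5 * P^T. Parameters are illustrative.

def poisson1d(n):
    h = 1.0 / (n + 1)
    return (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

def relax(A, u, f, k, omega=2.0/3.0):
    Dinv = 1.0 / np.diag(A)
    for _ in range(k):
        u = u + omega * Dinv * (f - A @ u)   # weighted Jacobi sweep
    return u

def interpolation(n):                        # linear interpolation, (9)
    nc = (n - 1) // 2
    P = np.zeros((n, nc))
    for j in range(nc):
        i = 2 * j + 1
        P[i - 1, j], P[i, j], P[i + 1, j] = 0.5, 1.0, 0.5
    return P

def two_level(A, u, f, P, k_pre=2, k_post=2):
    u = relax(A, u, f, k_pre)                # pre-relaxation
    r = f - A @ u                            # residual
    AH = 0.5 * P.T @ A @ P                   # Galerkin coarse operator, R A P
    eH = np.linalg.solve(AH, 0.5 * P.T @ r)  # coarse-grid solve of A^H e^H = R r
    u = u + P @ eH                           # correct
    return relax(A, u, f, k_post)            # post-relaxation

n = 63
A, f = poisson1d(n), np.ones(n)
u, u_exact = np.zeros(n), None
u_exact = np.linalg.solve(A, f)
for cycle in range(10):
    u = two_level(A, u, f, interpolation(n))
print(np.linalg.norm(u - u_exact))           # error drops rapidly per cycle
```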

results in a high Fourier mode on the coarse grid when sampled at every other point. That is, a mode that is slow to converge with relaxation on the fine grid is more effectively reduced when restricted to a coarse grid. This is illustrated in Fig. 3. In this particular example, the convergence factor of the mode on the fine grid is 0.8, while the convergence factor of the same mode on the coarse grid is 0.3. With this observation, we arrive at a multilevel variant of Algorithm 1, where the coarse-level solve is replaced with relaxation, thereby postponing the inversion of a coarse matrix to the coarsest grid level. The process is shown in Fig. 4.

Higher Dimensions

The mechanics of the algorithm extend directly to higher dimensions. In particular, if the matrix problems A^h u^h = f^h are defined on a sequence of grids where even-indexed grid points become coarse-grid points in each coordinate direction – for example, as shown in Fig. 1 – then the 1D definition of linear interpolation

extends through tensor definitions. That is, the 2D form for bilinear and the 3D form for trilinear interpolation are defined as

P_H^h = P ⊗ P   and   P_H^h = P ⊗ P ⊗ P,   (10)

respectively, where P is 1D linear interpolation as defined by (9).

Theoretical Observations and Extensions

The multigrid process defined by Algorithm 1 immediately yields several theoretical conclusions. In turn, these theoretical observations lead to extensions to the basic form of geometric multigrid and ultimately to a more algebraic form of the method, where the rigid assumptions on grid structure and interpolation definitions are relieved and made more general. To this end, we consider the operator form of the error propagation in the multigrid cycle. Following Algorithm 1 for an initial guess u_h^0, we arrive at the following operation on the error (using G as pre/post relaxation as in Algorithm 1):

E = G (I − P_H^h (A_H)^{-1} R_h^H A_h) G = G T G,   (11)

where T is called the two-grid correction matrix. From right to left, we see that relaxation, forming the residual (with A_h e_h^0), restriction, the coarse solve, interpolation, correction, and additional relaxation are all represented in the operator. If R_h^H = c (P_H^h)^T, for some constant c, then T simplifies to


Multigrid Methods: Geometric, Fig. 3 Mode k = 5 on a fine grid (n = 15) and coarse grid (n = 7). (a) Fine grid. (b) Coarse grid


Multigrid Methods: Geometric, Fig. 4 V and W multigrid cycling. The down and up arrows represent restriction of the residual and interpolation of the error between grids. A circle represents relaxation
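The V-cycle of Fig. 4 replaces the exact coarse solve of Algorithm 1 with a recursive call, descending until a trivially solvable grid is reached. Below is a minimal stencil-based sketch for illustration only: it assumes the 1D Poisson stencil (-1, 2, -1)/h² on n = 2^m intervals of the unit interval with homogeneous Dirichlet conditions.

```python
import numpy as np

def relax(u, f, h, sweeps, omega=2.0/3.0):
    # weighted Jacobi for the (-1, 2, -1)/h^2 stencil, zero Dirichlet ends
    for _ in range(sweeps):
        nbr = np.zeros_like(u)
        nbr[1:] += u[:-1]
        nbr[:-1] += u[1:]
        u = (1.0 - omega) * u + omega * 0.5 * (h**2 * f + nbr)
    return u

def residual(u, f, h):
    Au = 2.0 * u
    Au[1:] -= u[:-1]
    Au[:-1] -= u[1:]
    return f - Au / h**2

def restrict(r):
    # full weighting: r_H[j] = (r[2j] + 2 r[2j+1] + r[2j+2]) / 4
    return 0.25 * (r[0:-2:2] + 2.0 * r[1:-1:2] + r[2::2])

def interpolate(eH):
    # linear interpolation back to the fine grid (zero boundary values)
    e = np.zeros(2 * eH.size + 1)
    e[1::2] = eH
    e[2:-1:2] = 0.5 * (eH[:-1] + eH[1:])
    e[0], e[-1] = 0.5 * eH[0], 0.5 * eH[-1]
    return e

def v_cycle(u, f, h, k_pre=2, k_post=2):
    if u.size == 1:                         # coarsest grid: exact solve
        return np.array([f[0] * h**2 / 2.0])
    u = relax(u, f, h, k_pre)               # pre-relaxation
    rH = restrict(residual(u, f, h))        # restrict the residual
    eH = v_cycle(np.zeros(rH.size), rH, 2.0 * h)  # recurse on coarse grid
    u = u + interpolate(eH)                 # interpolate and correct
    return relax(u, f, h, k_post)           # post-relaxation
```

A W-cycle differs only in visiting each coarse level twice (two recursive calls) before returning to the fine grid.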

T = I − cP (A_H)^{-1} R A_h,   (12)

where we have dropped the sub- and superscripts on P. Notice that if

A_H = R A_h P,   (13)

then T is an A-orthogonal projection and, importantly, I − T is the A-orthogonal projection onto the range of P, interpolation. This form of the coarse-grid operator A_H is the Galerkin form, which also follows from a variational formulation of multigrid. It suggests that the coarse-grid operator can be constructed solely from A_h using P. Moreover, the form of T yields an important theoretical property: if G e_h^0 ∈ R(P), the range of P, then the V-cycle is exact. This highlights the complementary nature of relaxation and coarse-grid correction in the multigrid process and has been used as the basis for the design of new methods and the development of new multigrid theory over the last several decades. Indeed, if an efficient relaxation process can be defined and a sparse interpolation operator can be constructed so that error not eliminated by relaxation is accurately represented through interpolation, then the multigrid cycle will be highly accurate and efficient.

Beyond Basic Multigrid

If error components not reduced by relaxation are not geometrically smooth, as motivated in the previous sections with Fourier modes, then coarse-grid correction based on uniform coarsening and linear interpolation may not adequately complement the relaxation


process. As an example, consider the 2D model problem on a unit square with anisotropy:

−u_xx − ε u_yy = f(x, y).   (14)

Figure 5 depicts an oscillatory error before and after 100 weighted Jacobi iterations for the case of ε = 0.001. Notice that contrary to the isotropic example, where the error after relaxation is well represented by the lowest Fourier mode and is smooth in every direction, in this example the error is not geometrically smooth in the y-direction. Since the error is geometrically smooth in the x-direction, one approach is to coarsen only in the x-direction, which is called semi-coarsening. Likewise, relaxation could be modified to perform block relaxation sweeps using y-slices in the domain, while still using uniform coarsening. Both methods work well for anisotropy aligned in the coordinate direction, yet the effect is limited for more complicated scenarios—e.g., rotated anisotropy.

As an alternative, the practitioner could develop an improved interpolation operator to directly target error components not reduced by relaxation. This approach is called operator-induced interpolation and points toward a more algebraic approach to constructing more robust multigrid methods. In algebraic multigrid, the relaxation method is fixed, while coarse grids and interpolation operators are automatically constructed in order to define a complementary coarse-grid correction.

Advantages and Limitations of Geometric Multigrid

While traditional forms of geometric-based multigrid are limited to problems with structure and problems that have a strong geometric association, there are a number of notable advantages of this methodology in contrast to more general, robust multigrid methods. For one, structured problems often admit a stencil-based approach in defining operators such as A_h and P_H^h. This often results in lower storage, less communication in a parallel setting, and increased locality. Furthermore, setup costs for geometric multigrid, particularly if the stencils are known a priori, can be much less than in algebraic methods. As a result, if a problem is inherently structured, then geometric multigrid is a clear advantage if appropriate relaxation methods can be formed.

There are several packages that implement geometric multigrid methods at scale. The parallel semicoarsening multigrid solvers SMG [6, 14] and PFMG [1] are both implemented in the hypre package [8]. Both offer stencil-based multigrid solvers for semi-structured problems, with SMG leaning toward robustness and PFMG toward efficiency [7]. Other methods such as hierarchical hybrid grids (HHG) [2] explicitly build structure into the problem in order to take advantage of the efficiencies in geometric multigrid.

The limitation of a purely geometric approach to multigrid is squarely in the direction of robustness. Graph and data problems, as well as unstructured mesh problems, do not have a natural structure for which to

Multigrid Methods: Geometric, Fig. 5 The effect of relaxation for anisotropic problems. (a) Initial error. (b) Error after 100 iterations


build a hierarchy of grids. Even more, for many complex physics applications that are structured, the design of an effective relaxation process with a grid hierarchy is often elusive. On the other hand, the push toward more algebraic theory and design is also contributing to the development of more robust geometric approaches in order to take advantage of its efficiencies.

Cross-References

Classical Iterative Methods
Domain Decomposition
Multigrid Methods: Algebraic
Preconditioning

References

1. Ashby, S.F., Falgout, R.D.: A parallel multigrid preconditioned conjugate gradient algorithm for groundwater flow simulations. Nucl. Sci. Eng. 124, 145–159 (1996)
2. Bergen, B., Gradl, T., Rude, U., Hulsemann, F.: A massively parallel multigrid method for finite elements. Comput. Sci. Eng. 8(6), 56–62 (2006)
3. Brandt, A.: Multi-level adaptive technique (MLAT) for fast numerical solution to boundary value problems. In: Cabannes, H., Temam, R. (eds.) Proceedings of the Third International Conference on Numerical Methods in Fluid Mechanics. Lecture Notes in Physics, vol. 18, pp. 82–89. Springer, Berlin/Heidelberg (1973). doi:10.1007/BFb0118663
4. Brandt, A.: Multi-level adaptive solutions to boundary-value problems. Math. Comput. 31(138), 333–390 (1977)
5. Briggs, W.L., McCormick, S.F., et al.: A Multigrid Tutorial, vol. 72. SIAM, Philadelphia (2000)
6. Brown, P.N., Falgout, R.D., Jones, J.E.: Semicoarsening multigrid on distributed memory machines. SIAM J. Sci. Comput. 21(5), 1823–1834 (2000)
7. Falgout, R.D., Jones, J.E.: Multigrid on massively parallel architectures. In: Dick, E., Riemslagh, K., Vierendeels, J. (eds.) Multigrid Methods VI. Lecture Notes in Computational Science and Engineering, vol. 14, pp. 101–107. Springer, Berlin/Heidelberg (2000)
8. Falgout, R.D., Yang, U.M.: hypre: a library of high performance preconditioners. In: Computational Science—ICCS 2002, Amsterdam, pp. 632–641. Springer (2002)
9. Fedorenko, R.: A relaxation method for solving elliptic difference equations. USSR Comput. Math. Math. Phys. 1(4), 1092–1096 (1962). doi:10.1016/0041-5553(62)90031-9
10. Fedorenko, R.: The speed of convergence of one iterative process. USSR Comput. Math. Math. Phys. 4(3), 227–235 (1964)
11. Hackbusch, W.: Ein iteratives Verfahren zur schnellen Auflösung elliptischer Randwertprobleme. Technical Report 76-12, Institute for Applied Mathematics, University of Cologne (1976)
12. McCormick, S.: Multigrid methods for variational problems: general theory for the V-cycle. SIAM J. Numer. Anal. 22(4), 634–643 (1985)
13. Ruge, J.W., Stüben, K.: Algebraic multigrid. In: Multigrid Methods. Frontiers in Applied Mathematics, vol. 3, pp. 73–130. SIAM, Philadelphia (1987)
14. Schaffer, S.: A semicoarsening multigrid method for elliptic partial differential equations with highly discontinuous and anisotropic coefficients. SIAM J. Sci. Comput. 20(1), 228–242 (1998)
15. Trottenberg, U., Oosterlee, C.W., Schuller, A.: Multigrid. Academic, San Diego (2000)

Multiphase Flow: Computation

Andrea Prosperetti
Department of Mechanical Engineering, Johns Hopkins University, Baltimore, MD, USA
Department of Applied Sciences, University of Twente, Enschede, The Netherlands

Definition and Scope

The denomination “multiphase flow” refers to situations in which different phases – gas, solids, and/or liquids – are simultaneously present in the flow domain. This broad definition includes both situations in which the various phases are described individually on a first-principle basis (e.g., by solving the Navier-Stokes equations in each phase subject to the appropriate boundary conditions on the phase-phase interfaces) and in which large-scale systems are modeled in some way and principally by means of averaged equations, e.g., in the case of fluidized beds, boiling flows, and gas-liquid flows in pipelines. This entry deals with problems of the latter type; for problems of the former, the reader is referred to other articles and in particular those on Boundary Element Methods, Computational Fluid Dynamics, Immersed Interface/Boundary Method, Lattice Boltzmann Methods, Level Set Methods, Navier-Stokes Equations: Computation, Shallow Water Equations: Computation, and Smooth Particle Hydrodynamics. A general reference for both types of problems is the monograph edited by Prosperetti and Tryggvason [4]; a specific reference for liquid-gas flows is Tryggvason et al. [5].


Eulerian-Lagrangian Methods

Methods of the Eulerian-Lagrangian type are suitable for the description of flows with suspended inhomogeneities such as particles, drops, or bubbles. These methods were originally developed for dilute flows and “point particles,” i.e., inhomogeneities with a size much smaller than the relevant flow scales [1]. This condition is fairly limiting as it includes, in particular, the Kolmogorov scale in the case of turbulent flows. The size restriction has been relaxed in more recent developments usually referred to as discrete element models (DEM). We start from the now classic “point-particle” model, on which the more recent developments, such as DEM, are based. Upon taking advantage of the assumed small volume fraction occupied by the particles, the equation of continuity is written in the same form as for a pure fluid, which in the vast majority of applications is assumed to be incompressible. The effect of the particles on the fluid is represented by point forces located at the positions x_α(t), with α = 1, 2, ..., N, instantaneously occupied by each one of the N particles:

ρ Du/Dt = ∇·σ + ρ g − Σ_α [ f_α − v_α ρ (Du/Dt − g) ] δ(x − x_α).   (1)

Here ρ is the fluid density, Du/Dt the convective derivative of the fluid velocity u, σ the stress tensor, g the body force per unit mass, f_α the hydrodynamic force exerted by the fluid on the α-th particle (opposite to the force exerted by the particle on the fluid), and v_α the particle volume; δ is the delta function. The second term in the brackets corrects the inertia and body forces for the fact that not all the available volume is occupied by the fluid. In the case of a gas, the factor ρ multiplying this term makes it small and it is very often neglected. The fields u and σ in (1) are regarded as averaged over length scales much larger than the particle size. In numerical implementations of the finite-volume type, the momentum equation (1) is integrated over each elementary volume and the summation over the particles reduces to a summation over the particles in each volume. The particle positions follow by integration of their equation of motion written as


m_p^α dw_α/dt = f_α + (m_p^α − m_f^α) g,   (2)

in which m_p^α and m_f^α are the masses of the particle and of the displaced fluid and w_α is the velocity of the particle. For solid particles the force is most often expressed in the form of a Stokes drag, possibly corrected by means of an empirical factor φ(Re) for finite-Reynolds-number effects:

f_α = 6π μ a φ(Re_α) [ u(x_α, t) − w_α ].   (3)

Here u(x_α, t) is the velocity of the fluid at the location x_α occupied by the particle, obtained by interpolation from the computed neighboring nodal velocities. The conceptual model on which this specification rests is that the flow is approximately uniform over the particle scale so that the velocity u(x_α, t) represents with an acceptable accuracy the flow environment seen by the particle. In some cases this force expression is augmented by additional terms representing, e.g., added mass, memory effects, and others [2, 3]. The equation of motion for the particles can be integrated by various methods such as the second-order Adams-Bashforth or Runge-Kutta scheme. In some implementations it is assumed that each tracked particle is representative of an entire group of particles. In this way it is possible to simulate flows with a significant mass loading (defined as the ratio of the particle mass to the total mass of particles and fluid), reducing the computational cost. In Discrete Element Methods, the particles are tracked by solving an equation of motion similar to (2). These methods differ in that the finite volume of the particles is accounted for in the fluid equations. For example, for an incompressible fluid, the continuity equation is written as

∂α/∂t + ∇·(α u) = 0,   (4)

where α is the volume fraction occupied by the particles, found essentially by summing over all the particles contained in each computational cell and dividing by the cell volume. Corresponding modifications are introduced in the momentum equation.
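The time integration of the particle equation of motion (2)-(3) mentioned above can be sketched for a single spherical particle settling in quiescent fluid. This is an illustrative sketch, not code from the entry: it uses the second-order Adams-Bashforth scheme with a forward-Euler start-up step, takes the finite-Reynolds-number correction φ = 1, and checks against the Stokes terminal velocity (m_p − m_f)g/(6πμa).

```python
import numpy as np

def stokes_drag(u_fluid, w, mu, a, phi=1.0):
    # hydrodynamic force (3) on a sphere of radius a in fluid of viscosity mu;
    # phi = 1 drops the empirical finite-Reynolds-number correction
    return 6.0 * np.pi * mu * a * phi * (u_fluid - w)

def settle(mp, mf, mu, a, g, dt, nsteps):
    """Integrate dw/dt from (2) for a particle falling from rest in still
    fluid (u = 0), using the second-order Adams-Bashforth scheme."""
    rhs = lambda w: (stokes_drag(0.0, w, mu, a) + (mp - mf) * g) / mp
    w = 0.0
    r_prev = rhs(w)
    w = w + dt * r_prev                        # forward-Euler start-up step
    for _ in range(nsteps - 1):
        r = rhs(w)
        w = w + dt * (1.5 * r - 0.5 * r_prev)  # AB2 step
        r_prev = r
    return w
```

With g taken positive downward, w relaxes to the terminal velocity on the particle response time scale m_p/(6πμa), which also sets the stability restriction on the explicit time step.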


Eulerian-Eulerian Methods

Eulerian-Eulerian methods are based on an averaged description of the phases envisaged as interpenetrating continua. The early versions of these models had an essentially heuristic basis and were intended to describe the behavior of chemical plants or nuclear reactors under various accident scenarios. Much work has been devoted in subsequent decades to derive more realistic and physics-based formulations, but the success of these efforts has overall been somewhat limited. Nevertheless, the Eulerian-Eulerian description has found various applications beyond nuclear safety, notably to fluidized beds and gas-oil transport in pipelines. For simplicity we limit ourselves to time-dependent models in one space dimension x; we consider two phases, which we distinguish by subscripts G for gas (or vapor) and L for liquid, although the considerations that follow are applicable to other two-phase systems and are easily extended to higher-dimensional problems. Conservation of mass is usually expressed in the form

∂(α_J ρ_J)/∂t + ∂(α_J ρ_J u_J)/∂x = Γ_J.   (5)

Here ρ_J and u_J denote the average (microscopic) density and velocity of the phase J = G or J = L, and Γ_J is the average rate at which the phase is consumed due to evaporation or, possibly, chemical reaction. As before, α_J denotes the volume fraction occupied by the phase J. With only two phases G and L, conservation of volume requires that α_G + α_L = 1. A fairly general form of the momentum equation for the J-phase adopted in Eulerian-Eulerian models is

∂(α_J ρ_J u_J)/∂t + ∂(α_J ρ_J u_J²)/∂x = −α_J ∂p/∂x + F_J,   (6)

in which p is the pressure and F_J the total force acting on the phase. Some models use different pressures for the different phases, but it is often possible to recast them in the form shown by defining p as the average of the two pressures and expressing their difference by a constitutive relation that affects the force F_J. An important feature of (6) is that, due to the appearance of α_J in front of the pressure gradient, it is not in conservation form. Most models also include energy equations for the phases, which we do not show for brevity.

The most basic form for F_J includes the body force g and an inter-phase drag

F_J = α_J ρ_J g + H_JK (u_K − u_J),   (7)

in which the index K denotes the other phase and H_JK is a coefficient in general dependent on volume fractions, densities, and velocities. A very significant shortcoming of the model (5)–(7) is that the system of equations is not hyperbolic as written unless the two phases have equal velocities. As a consequence, the initial-value problem is ill-posed (see the articles Initial Value Problems and Hyperbolic Conservation Laws: Analytical Properties). Although, in principle, ill-posedness and stability are distinct properties, in the particular case of (5) and (6), with the force F_J expressed by a much more general relation than given in (7) (and, in particular, including differential terms), it can be shown that failure of the model to be hyperbolic results in the linear instability of all wavelengths. On the other hand, models with force relations that make them hyperbolic may or may not be linearly stable depending on the specific values of the variables and on the wavelength of the perturbation. In practice, the instability due to lack of hyperbolicity has been overcome by relying on the nonlinearity of the inter-phase drag terms and on a heavy dose of numerical dissipation. The discretization of the convective terms in (5) and (6) encounters the usual problems of excessive dissipation if carried out with low-order accuracy (e.g., by donor-cell differencing or upwinding) or nonmonotonic behavior if attempted at higher order. These issues are described in the articles on Hyperbolic Conservation Laws: Computation and Stokes or Navier-Stokes Flows, and the same strategies described there (e.g., flux limiters) prove effective. Spurious oscillations can be a particularly serious problem in multiphase flow computation as they may cause the volume fractions to get out of the range 0 ≤ α_J ≤ 1. Methods of the segregated type borrow ideas from single-phase Navier-Stokes computations, e.g., the classic SIMPLE approach.
The first step is to add the discrete form of the two mass conservation equations (5) with the velocities evaluated at the advanced time. The momentum equations are then discretized and solved analytically to express the advanced-time velocities in terms of the (still unknown) advanced-time pressures. The resulting


expressions are then substituted into the combined mass conservation equation to produce an equation for the advanced-time pressure. Each step can be executed according to many different variants depending, among other things, on the degree of implicitness adopted. Furthermore, in view of the cell-to-cell couplings and various nonlinearities (including that introduced by the pressure-density-internal energy equation of state), this sequence of operations needs to be carried out iteratively to convergence, which is reached more efficiently if the equations are cast in terms of pressure and velocity increments, rather than actual advanced-time pressures and velocities. A variant of this method relies on enforcing the volume-conservation constraint α_G + α_L = 1 rather than conservation of mass.

The segregated algorithm strategy of solving the various equations in succession using, at each step, the currently available estimates of the variables proves too inefficient in the case of processes characterized by short time scales and stronger coupling between the phases. For problems of this type, coupled algorithms, which solve all the equations simultaneously or nearly so at each step, are preferable. In the basic versions of these methods, the discretized momentum equations are solved analytically as before to express the advanced-time velocities in terms of the advanced-time pressures. The results are substituted into the discretized mass and energy conservation equations, and the resulting nonlinear system is solved iteratively. The analytic solution of the momentum equation requires the explicit discretization of the convective terms, which results in a strong limitation on the time step. Various variants which avoid this shortcoming by what essentially amounts to a predictor-corrector strategy have been developed. More recently, the adoption of fully implicit discretizations has become possible, at least for problems with one or, possibly, two space dimensions.
All of the methods described are essentially first-order accurate in space and time. Several efforts to develop higher-order methods are under way, but they are hampered by some peculiar difficulties offered by multiphase flow models. Since most higher-order methods rely on the characteristics of the mathematical model, lack of hyperbolicity is a serious concern. Hyperbolicity is not difficult to achieve – in fact many hyperbolic models exist. The problem is that it is not clear which are preferable on physical and mathematical


grounds. A second difficulty is the fact that model equations are not in conservation form as already noted in connection with (6). For additional information on these issues, see Ref. [4].
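The loss of hyperbolicity discussed above can be exhibited concretely for the classical incompressible, equal-pressure two-fluid special case of (5)-(6). This well-known characteristic analysis is sketched here for illustration; it is not developed in the entry itself. Eliminating the pressure leaves a quadratic characteristic polynomial whose discriminant is −4(ρ_G ρ_L/(α_G α_L))(u_G − u_L)², so the characteristic speeds are complex whenever the phase velocities differ.

```python
import numpy as np

def characteristic_speeds(alpha_G, rho_G, rho_L, u_G, u_L):
    # roots of A*(lam - u_G)^2 + B*(lam - u_L)^2 = 0,
    # with A = rho_G/alpha_G and B = rho_L/alpha_L (alpha_L = 1 - alpha_G)
    A = rho_G / alpha_G
    B = rho_L / (1.0 - alpha_G)
    return np.roots([A + B,
                     -2.0 * (A * u_G + B * u_L),
                     A * u_G**2 + B * u_L**2])
```

Unequal velocities give a complex-conjugate pair of speeds, consistent with the ill-posed initial-value problem noted in the text; equal velocities give a real double root at the common velocity.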

References

1. Balachandar, S., Eaton, J.K.: Turbulent dispersed multiphase flow. Annu. Rev. Fluid Mech. 43, 111–133 (2010)
2. Ferrante, A., Elghobashi, S.: On the physical mechanisms of two-way coupling in particle-laden isotropic turbulence. Phys. Fluids 15, 315–329 (2003)
3. Mazzitelli, I., Lohse, D., Toschi, F.: On the relevance of the lift force in bubbly turbulence. J. Fluid Mech. 488, 283–313 (2003)
4. Prosperetti, A., Tryggvason, G. (eds.): Computational Methods in Multiphase Flow, Paperback edn. Cambridge University Press, Cambridge (2009)
5. Tryggvason, G., Scardovelli, R., Zaleski, S.: Direct Numerical Simulations of Gas-Liquid Multiphase Flows. Cambridge University Press, Cambridge (2011)

Multiresolution Methods

Angela Kunoth
Institut für Mathematik, Universität Paderborn, Paderborn, Germany

Short Definition

Multiresolution (or multiscale) methods decompose an object additively into terms on different scales or resolution levels. The object can be given explicitly, e.g., as time series or image data, or implicitly, e.g., as the solution of a partial differential equation.

Description

Many physical problems exhibit characteristic features at multiple temporal and/or spatial scales. The goal of multiresolution methods is to decompose the object of interest into objects resolving these scales, for the purpose of analysis, approximation, compression, processing, etc. Typical examples are measurement signals or time series, described as univariate given functions f living on a finite interval [0, T] ⊂ R. The goal is to find a decomposition


Multiresolution Methods, Fig. 1 Synthetic function f (top), additively composed from a sine wave g1 (middle), and two piecewise linear continuous functions of different resolutions, one of them g0 (bottom)


f(t) = Σ_{j=0}^∞ g_j(t),   t ∈ [0, T],   (1)

where the index j ∈ N stands for the scale or resolution and indicates for growing j finer scales. For a time series, f is represented by point values on a discrete grid (which may be viewed as a single-scale representation of the data), and the series in (1) is finite. A synthetic function consisting of three components g_0, g_1, g_2 is shown in Fig. 1.

Classical decompositions (1) assume that the multiscale components g_j are of a particular form and all of the same shape: in Fourier analysis, these are the Fourier components g_j(t) = a_j exp(i ω_j t) with prescribed frequencies ω_j and constant amplitudes a_j to be computed from f by, for example, the Fast Fourier Transform. In the example in Fig. 1, the component g_1 is of this form. Other examples are hierarchical decompositions where the g_j's are assumed to be of the form

g_j(t) = Σ_{k∈K} d_{j,k} ψ_{j,k}(t).   (2)

Here ψ_{j,k} are prescribed functions, typically generated from a single translated and dilated function of local support; the additional index k represents the location. Standard cases for ψ_{j,k} are piecewise polynomials, B-splines, or finite elements. These would be appropriate to represent the components g_0 and g_2 in Fig. 1. In these cases, one can compute the expansion coefficients d_{j,k} by interpolation or projection from the given data f, and (1) together with (2) results in a hierarchical or multiscale data representation. If the collection of all functions ψ_{j,k} for all levels j and all locations k satisfies additional conditions (like constituting a Riesz basis for the underlying function space, often the Lebesgue space L_2(0, T)), one calls this a wavelet decomposition. The construction of wavelets themselves is typically based on the concept of multiresolution analysis of a separable Hilbert space [9]. For given uniformly distributed data f, the expansion coefficients d_{j,k} can be determined by the Fast Wavelet Transform [4, 9]. Thus, the computation of these types of multiresolution decompositions relies on applying linear transformations. In case of nonuniformly spaced data, the application of these transforms often resorts to the uniform grid case. For data in more than one dimension like images, one typically applies these transforms for each coordinate direction. The resulting multiscale or hierarchical decompositions are then used for image analysis and compression or the fast processing of surfaces.

For given data exhibiting nonlinear and nonstationary features on possibly nonuniform grids, a more recent method is based on a data-adaptive iterative process, leading to the so-called empirical mode decomposition [7].
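The Fast Wavelet Transform mentioned above is, level by level, a linear transformation. Below is a minimal sketch using the orthonormal Haar wavelet, chosen here purely for illustration (the entry does not single out a particular wavelet family); it computes a hierarchical representation in the spirit of (1)-(2) for uniformly sampled data of length 2^J.

```python
import numpy as np

def haar_step(s):
    # one level of the orthonormal Haar transform:
    # coarse averages c and detail (wavelet) coefficients d
    c = (s[0::2] + s[1::2]) / np.sqrt(2.0)
    d = (s[0::2] - s[1::2]) / np.sqrt(2.0)
    return c, d

def haar_decompose(f, levels):
    """Multiscale decomposition of f into a coarsest part and one array of
    detail coefficients d_{j,k} per level."""
    s, details = np.asarray(f, dtype=float), []
    for _ in range(levels):
        s, d = haar_step(s)
        details.append(d)
    return s, details[::-1]      # coarsest-to-finest ordering

def haar_reconstruct(s, details):
    # invert the transform level by level (exact reconstruction)
    for d in details:
        f = np.empty(2 * s.size)
        f[0::2] = (s + d) / np.sqrt(2.0)
        f[1::2] = (s - d) / np.sqrt(2.0)
        s = f
    return s
```

Because the Haar system is orthonormal, the transform preserves the discrete l2 norm of the data, a finite-dimensional analogue of the Riesz-basis property mentioned above.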


If the object in question is to be determined as the solution u of an operator equation F(u) = g, e.g., a partial differential or integral equation on infinite-dimensional Banach spaces, the principle of finding a decomposition (1) is the same, though complicated to a large extent by the difficulty of solving the equation. The type of equation dominates the discretization and solution approach. One uses the terminology “multiresolution method” to describe the following methodologies:
(i) Homogenization and multiscale modeling to resolve multiple scales the solution exhibits
(ii) Multigrid methods (preconditioning, i.e., using multiple scales for computational speedup, developing fast solvers for linear systems of equations stemming from discretization of, e.g., elliptic partial differential equations (PDEs))
(iii) Compression of integral operators and computation of high-dimensional integrals (appearing, e.g., in quantum chemistry)
(iv) A posteriori adaptive methods to compute the solution u, starting from a coarse approximation to progressively include finer scales resolving singularities in data and/or domain during the computations
Extensive sources of discussion of the points (ii)–(iv) are [2, 3] and the invited surveys collected in [5]; wavelet preconditioning in the context of (ii) is treated in [8]; (iii) based on wavelets in [6] and by exponential sums in [1]; (iv) for hyperbolic conservation laws discretized by finite volume schemes in [10]; for elliptic PDEs discretized by finite elements in [11] (see also adaptive mesh refinement) and by wavelets in [12]; and the development of multilevel schemes for systems of PDEs in [13].

References

1. Braess, D., Hackbusch, W.: On the efficient computation of high-dimensional integrals and the approximation by exponential sums. In: Multiscale, Nonlinear and Adaptive Approximation, pp. 39–74. Springer, Heidelberg/New York (2009)
2. Cohen, A.: Numerical Analysis of Wavelet Methods. Elsevier, Amsterdam (2003)
3. Dahmen, W.: Wavelet and multiscale methods for operator equations. Acta Numer. 6, 55–228 (1997)
4. Daubechies, I.: Ten Lectures on Wavelets. SIAM, Philadelphia (1992)
5. DeVore, R., Kunoth, A. (eds.): Multiscale, Nonlinear and Adaptive Approximation. Springer, Berlin/Heidelberg (2009)
6. Harbrecht, H., Schneider, R.: Rapid solution of boundary integral equations by wavelet Galerkin schemes. In: Multiscale, Nonlinear and Adaptive Approximation, pp. 249–294. Springer, Berlin/Heidelberg (2009)
7. Huang, N.E., Shen, S.S.P.: Hilbert-Huang Transform and its Applications. World Scientific Publishing, Singapore (2005)
8. Kunoth, A.: Optimized wavelet preconditioning. In: Multiscale, Nonlinear and Adaptive Approximation, pp. 325–378. Springer, Berlin/Heidelberg (2009)
9. Mallat, S.G.: A Wavelet Tour of Signal Processing, 2nd edn. Academic Press, San Diego (1999)
10. Müller, S.: Multiresolution schemes for conservation laws. In: Multiscale, Nonlinear and Adaptive Approximation, pp. 379–408. Springer, Berlin/Heidelberg (2009)
11. Nochetto, R.H., Siebert, K.S., Veeser, A.: Theory of adaptive finite element methods: an introduction. In: Multiscale, Nonlinear and Adaptive Approximation, pp. 409–542. Springer, Berlin/Heidelberg (2009)
12. Stevenson, R.: Adaptive wavelet methods for solving operator equations: an overview. In: Multiscale, Nonlinear and Adaptive Approximation, pp. 543–598. Springer, Berlin/Heidelberg (2009)
13. Xu, J., Chen, L., Nochetto, R.H.: Optimal multilevel methods for H(grad), H(curl), and H(div) systems on graded and unstructured grids. In: Multiscale, Nonlinear and Adaptive Approximation, pp. 599–659. Springer, Berlin/Heidelberg (2009)

Multiscale Multi-cloud Modeling and the Tropics

Samuel N. Stechmann
Department of Mathematics, University of Wisconsin–Madison, Madison, WI, USA

Synonyms

Clouds, convection; Easterly, westward; Westerly, eastward

Glossary/Definition Terms

MCS: Mesoscale convective system
CCW: Convectively coupled wave
MJO: Madden-Julian oscillation
CMT: Convective momentum transport

Introduction

In the tropical atmosphere, clouds and convection play a central role in weather and climate processes.


Furthermore, clouds present a formidable modeling challenge in large part due to phase changes of water and the accompanying latent heat release, which interactively drives atmospheric circulations. Two of the most interesting and important aspects are (i) multiple cloud types and their different roles and (ii) multiscale organization of clouds and convection. For many cloud systems, the two most important cloud types are deep convective and stratiform. Figure 1 illustrates these different cloud types. Deep convective clouds are so named because they extend vertically through a deep atmospheric layer, from the top of the boundary layer to the tropopause, and these clouds are associated with the most vigorous updrafts. On the other hand, stratiform clouds are present in the upper half of the troposphere, where they originate as an outgrowth of deep convection or as a later stage in a deep convective cloud’s life cycle. The partitioning of precipitation into deep convective and stratiform components has long been investigated [2]. The importance of this partitioning is multifaceted; as one example, these components have different profiles of vertical heating. Figure 1 shows the deep heating profile (labeled P ) of a deep convective cloud and the “dipole” heating/cooling structure (labeled Hs ) of a stratiform cloud. In the stratiform case, latent heating occurs in the upper troposphere, and cooling occurs in the lower troposphere due to evaporation of rain as it falls through the undersaturated air below the cloud. Also illustrated in Fig. 1 is the shallow heating profile (labeled Hc ) of a congestus cloud, which is present in the lower half of the troposphere. Due to their different heating profiles, these cloud types have different important roles in tropical atmospheric dynamics [5, 8, 9, 11, 18, 19, 23, 27]. Coherent cloud patterns can organize on many different scales in the tropics, and the largest scales can be loosely partitioned into three groups. 
Individual cloud systems appear on scales of roughly 200 km and 0.5 days, and they are commonly called “mesoscale convective systems” (MCSs) [6]. Several MCSs, in turn, can sometimes be organized within a larger-scale wave envelope with scales of roughly 2,000 km and 5 days; these propagating envelopes are called “convectively coupled waves” (CCWs) [12]. Moreover, several CCWs can sometimes be organized within an even larger-scale wave envelope with scales of roughly 20,000 km and 50 days; the most prominent example of this is the Madden–Julian Oscillation (MJO) [14, 30].


Each of these phenomena has an organized cloud structure that includes a progression through the cloud types shown in Fig. 1, from congestus to deep convection to stratiform. Modeling these organized cloud systems remains a difficult challenge. At the heart of the challenge are multiple cloud types and multiscale interactions. In their simplest form, the multiscale interactions are convection–environment interactions. Cloud systems are influenced by environmental wind shear and by the environmental thermodynamic state, and, in turn, cloud systems can alter the environmental state. In what follows, these multiscale interactions are illustrated using idealized models, beginning with models for different cloud types and their role in multiscale interactions.

Multicloud Modeling

To illustrate the different cloud types and their roles in organized convective systems, two models for CCWs are presented in this section: an exactly solvable model and a nonlinear multicloud model.

Exactly Solvable Model

An exactly solvable model for a CCW structure is

$$w'(x,z,t) = S_0(x,z,t), \qquad \partial_x u' + \partial_z w' = 0. \qquad (1)$$

In this model, called the weak-temperature-gradient approximation, the wave's vertical velocity $w'$ is exactly in balance with the heating rate $S_0$, which we must specify. The wave's horizontal velocity $u'$ is then determined from the incompressibility constraint in (1) [1, 17]. Given this exact solution for $u'$ and $w'$ of the CCW, its effect on the mean flow is determined by

$$\partial_t \bar{u} = -\partial_z \overline{w'u'}, \qquad (2)$$

where (2) is the horizontal spatial average of the horizontal momentum equation, $\partial_t u + \partial_x(u^2) + \partial_z(wu) + \partial_x p = 0$, and where bar and prime notation is used to denote a horizontal spatial average and fluctuation, respectively:

$$\bar{f}(z,t) = \frac{1}{L}\int_0^L f(x,z,t)\,dx, \qquad f'(x,z,t) = f - \bar{f}, \qquad (3)$$

where periodic horizontal boundary conditions are assumed for simplicity.

Multiscale Multi-cloud Modeling and the Tropics, Fig. 1 Top: Schematic illustration of cloud types in the tropics (From Khouider and Majda [10]). Bottom: Vertical heating profiles associated with the deep convective, stratiform, and congestus cloud types and vertical structures of the first baroclinic mode wind, $u_1$, and the second baroclinic mode wind, $u_2$ (From Khouider and Majda [8])

From (2), it is seen that a CCW will alter the mean flow if and only if $\partial_z \overline{w'u'} \neq 0$. In the context of convective motions, this effect on the mean flow is called convective momentum transport (CMT). To illustrate CMT in some specific cases, consider a heat source with two phase-lagged vertical modes, $\sin(z)$ and $\sin(2z)$, which represent deep convective heating and congestus/stratiform heating, respectively:

$$S_0 = a\left\{\cos[kx - \omega t]\,\sqrt{2}\sin(z) + \alpha\,\cos[k(x + x_0) - \omega t]\,\sqrt{2}\sin(2z)\right\}, \qquad (4)$$

where $k$ is the horizontal wavenumber and $a$ is the amplitude of the heating. Two key parameters here are $\alpha$, the relative strength of the second baroclinic heating, and $x_0$, the lag between the heating in the two vertical modes. Figure 2 shows three cases for the lag $x_0$: 0

Multiscale Multi-cloud Modeling and the Tropics, Fig. 2 Solutions to the exactly solvable model (1) for CCW structure and CMT in three cases: upright updraft (top), vertically tilted updraft of "eastward-propagating" CCW (middle), and vertically tilted updraft of "westward-propagating" CCW (bottom). Left: Vector plot of $(u', w')$ and shaded convective heating $S_0(x,z)$. For vectors, the maximum $u'$ is 6.0 m/s for the top and 4.0 m/s for the middle and bottom, and the maximum $w'$ is 2.8 cm/s for the top and 2.2 cm/s for the middle and bottom. Dark shading denotes heating, and light shading denotes cooling, with a contour drawn at one-fourth the max and min values. Middle: Vertical profile of the mean momentum flux, $\overline{w'u'}$ (m$^2$/s$^2$). Right: Negative vertical derivative of the mean momentum flux, $-\partial_z \overline{w'u'}$ [(m/s)/day] (From Stechmann et al. [24])

(top), +500 km (middle), and −500 km (bottom) for a wave with wavelength 3,000 km, heating amplitude $a = 4$ K/day, and relative stratiform heating of $\alpha = 1/4$. The lag determines the vertical tilt of the heating profile. Given this heating rate, the velocity can be found exactly from (1):

$$u'(x,z,t) = -\frac{a}{k}\left\{\sin[kx - \omega t]\,\sqrt{2}\cos(z) + 2\alpha\,\sin[k(x + x_0) - \omega t]\,\sqrt{2}\cos(2z)\right\},$$
$$w'(x,z,t) = a\left\{\cos[kx - \omega t]\,\sqrt{2}\sin(z) + \alpha\,\cos[k(x + x_0) - \omega t]\,\sqrt{2}\sin(2z)\right\}. \qquad (5)$$

With this form of $u'$ and $w'$, the eddy flux divergence is

$$\partial_z \overline{w'u'} = \frac{3\sin(kx_0)}{2}\,\frac{a^2}{k}\,\alpha\,[\cos(z) - \cos(3z)]. \qquad (6)$$

Notice that a wave with first and second baroclinic components generates CMT that affects the first and third baroclinic modes [1, 17]. Also notice that (6) is nonzero as long as $\alpha \neq 0$ (i.e., there are both first and second baroclinic mode contributions) and $x_0 \neq 0$ (i.e., there is a phase lag between the first and second baroclinic modes). These are typical aspects of the structure of observed CCWs [12].

For illustrations of the above exact solutions, consider the three cases shown in Fig. 2: upright updraft (top), "eastward-propagating" CCW (middle), and "westward-propagating" CCW (bottom). Although there is no inherent definitive propagation in the exactly solvable model (1), propagation direction labels are assigned to the vertical tilt directions according to the structures of observed CCWs [12, 20]: heating is vertically tilted with leading low-level heating and trailing upper-level heating with respect to the CCW propagation direction. Specifically, this corresponds to the observed structures of convectively coupled Kelvin waves [25], which propagate eastward, and westward-propagating inertio-gravity waves (also called "two-day waves") [26]. Also shown in Fig. 2 are the average vertical flux of horizontal momentum, $\overline{w'u'}$, and its vertical derivative, $\partial_z \overline{w'u'}$. These exact solutions show that upright updrafts have zero CMT, and tilted updrafts have nonzero CMT with a sign that is related to the CCW's propagation direction. Note that the vertically averaged momentum would not be affected by CMT in this model, since $\overline{w'u'}$ is necessarily zero at the upper and lower rigid boundaries. This simple model illustrates CMT features that are similar to Moncrieff's archetypal models for MCSs [22], due to the "self-similarity" of MCS and CCW structures [16, 20].

Nonlinear Multicloud Model

While the exactly solvable model illustrates CCW structure in a simple way, it does not include any CCW dynamics. To investigate CCW dynamics, we use the multicloud model of Khouider and Majda [8, 10], which is a spatially variable PDE model for CCWs that captures many important features such as their propagation speeds and tilted vertical structures. The mathematical form of the model is

$$\partial_t \mathbf{u} + A(\mathbf{u})\,\partial_x \mathbf{u} = S(\mathbf{u}), \qquad (7)$$

where $\mathbf{u}(x,t)$ is a vector of model variables, $\mathbf{u} = (u_1, \theta_1, u_2, \theta_2, \theta_{eb}, q, H_s)^T$. The model variables are $u_j$, the zonal velocity in the $j$th baroclinic mode; $\theta_j$, the potential temperature in the $j$th baroclinic mode; $\theta_{eb}$, the equivalent potential temperature of the boundary layer; $q$, the vertically integrated water vapor; and $H_s$, the stratiform heating rate. The matrix $A(\mathbf{u})$ includes the effects of nonlinear advection and pressure gradients, and $S(\mathbf{u})$ is a nonlinear interactive source term with combinations of polynomial nonlinearities and nonlinear switches. See Majda and Stechmann [18] and Stechmann et al. [24] for the detailed form of these equations.

Using the velocity modes $u_j(x,t)$, the two-dimensional zonal velocity $u(x,z,t)$ is recovered as a sum of the contributions from all of the vertical modes:

$$u(x,z,t) = u_0(x,t) + \sum_{j=1}^{\infty} u_j(x,t)\,\sqrt{2}\cos(jz), \qquad (8)$$

where the troposphere extends from $z = 0$ to $\pi$ in the nondimensional units shown in (8), which corresponds to $z = 0$ to 16 km in dimensional units. The vertically uniform mode $j = 0$ is the barotropic mode, and the other modes are the baroclinic modes. Plots of the vertical structure associated with some of the vertical baroclinic modes are shown in Fig. 1. To strike a balance between simplicity and important physical effects, the original multicloud model includes only $u_1$ and $u_2$ as dynamical variables. The effect of $u_3$ will also be considered here as either a constant background shear $\bar{U}_3$ or as a slowly evolving mean shear $\bar{U}_3(T)$, where $T = \epsilon^2 t$ is a slow time scale.

Figure 3 shows the behavior of the multicloud model (7) in the presence of three different mean shears $\bar{U}(z)$. These are nonlinear simulations on a 6,000 km wide domain with periodic boundary conditions in the horizontal. The first column shows the case of zero mean shear. In this case, there are linear instabilities over a finite band of wavenumbers, the unstable waves propagate both eastward and westward, and there is perfect east–west symmetry. In the nonlinear simulation, a westward-propagating traveling wave arises as the stationary solution (if viewed from a translating reference frame), which grows from a small initial random perturbation. Due to the perfect east–west symmetry of this case, the initial conditions randomly select whether the eastward- or westward-propagating wave will eventually become the stationary solution. The second column shows a case with a lower tropospheric westerly jet and an upper tropospheric easterly jet. In this case, the east–west symmetry is broken, the westward-propagating wave has the largest linear theory growth rates, and it is the eventual stationary solution in the nonlinear simulation. The third column shows another case with a nontrivial vertical shear. In this case, the linear theory growth rates are nearly east–west symmetric, and the nonlinear simulation appears to favor a standing wave solution rather than a traveling wave solution. In fact, at later times (not shown), there is an oscillation between the standing and traveling wave states in this case, so the preference for the standing wave is tenuous.
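The mean shears just described are built from the vertical modes in (8). The sketch below reconstructs a wind profile from mode amplitudes (the values loosely echo the shears of Fig. 3 but are otherwise invented) and verifies that the basis functions $\sqrt{2}\cos(jz)$ are orthonormal on $0 \le z \le \pi$ with weight $1/\pi$, so that each amplitude $u_j$ is recovered by projection.

```python
import numpy as np

# Sketch of the vertical-mode expansion (8): build u(z) from barotropic
# and baroclinic amplitudes (values loosely echoing Fig. 3, otherwise
# invented), and verify that the sqrt(2)*cos(j z) basis is orthonormal
# on 0 <= z <= pi with weight 1/pi, so amplitudes are projections.
z = np.linspace(0, np.pi, 2001)

def mode(j, z):
    return np.ones_like(z) if j == 0 else np.sqrt(2) * np.cos(j * z)

amps = {0: 0.0, 1: 1.3, 2: 7.0, 3: -1.3}    # m/s, illustrative
U = sum(a * mode(j, z) for j, a in amps.items())

# Gram matrix of the first four modes: should be close to the identity.
G = np.array([[np.trapz(mode(i, z) * mode(j, z), z) / np.pi
               for j in range(4)] for i in range(4)])
print(np.max(np.abs(G - np.eye(4))))        # near zero

# Recover the second-baroclinic amplitude by projection:
u2 = np.trapz(U * mode(2, z), z) / np.pi
print(u2)                                    # close to 7.0
```

The same projection, applied with the tropospheric weight in the full model, is what defines the dynamical variables $u_1$ and $u_2$ of the multicloud system (7).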
Nevertheless, these cases demonstrate two effects of the background shear on the CCWs: it can break the east–west symmetry to favor either the eastward- or westward-propagating wave, and it can determine, to an extent, whether a traveling wave or standing wave state is favored.

The vertical structure of the CCW is illustrated in Fig. 4. Shown here are the velocity fluctuations $u'$ and $w'$ taken from the first case from Fig. 3 at time $t = 30$ days. Similar to the exactly solvable model in Fig. 2, the CCW here has a vertically tilted updraft due to a heating structure from a combination of deep convection and stratiform heating. There is a positive momentum flux $\overline{w'u'}$ in the middle troposphere, which corresponds to a $-\partial_z \overline{w'u'}$ structure that would accelerate easterlies in the lower troposphere and westerlies in the upper troposphere, if this CMT were not balanced by other momentum sources. (In the next section, the mean wind will be allowed to evolve in response to this type of CMT.) Also note that the middle case from Fig. 3 also has a CCW structure as in Fig. 4, which, in that case, would decelerate the mean flow at all levels if the CMT were not balanced by other momentum sources. Together, these two cases illustrate that the energy transfer can be either upscale or downscale, depending on the particular mean flow and the propagation direction of the CCW.
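The tilted-updraft CMT can be checked against the closed forms above. The script below is an illustrative sketch (grid sizes and the nondimensional amplitude are invented; the wavelength and $\alpha$ follow the text): it evaluates the exact solution (5), verifies incompressibility, and confirms the flux-divergence formula (6).

```python
import numpy as np

# Numerical check of the exactly solvable model: velocities (5),
# incompressibility (1), and eddy flux divergence (6).  Grid resolution
# is an illustrative choice; wavelength and alpha follow the text, with
# a nondimensional amplitude a = 1 and a frozen phase (t = 0).
L = 3000e3; k = 2*np.pi/L
a, alpha, x0 = 1.0, 0.25, 5e5
x = np.linspace(0, L, 2048, endpoint=False)
z = np.linspace(0, np.pi, 2001)          # 0..pi ~ 0..16 km
X, Z = np.meshgrid(x, z, indexing="ij")
s2 = np.sqrt(2)

w = a*(np.cos(k*X)*s2*np.sin(Z) + alpha*np.cos(k*(X+x0))*s2*np.sin(2*Z))
u = -(a/k)*(np.sin(k*X)*s2*np.cos(Z) + 2*alpha*np.sin(k*(X+x0))*s2*np.cos(2*Z))

# Incompressibility: du/dx + dw/dz ~ 0 (spectral in x, finite diff in z).
kx = 2*np.pi*np.fft.fftfreq(x.size, d=L/x.size)
dudx = np.real(np.fft.ifft(1j*kx[:, None]*np.fft.fft(u, axis=0), axis=0))
dwdz = np.gradient(w, z, axis=1)
div_max = np.max(np.abs(dudx + dwdz))

# Eddy flux divergence versus the closed form (6).
flux = (w*u).mean(axis=0)                # horizontal average of w'u'
dflux_dz = np.gradient(flux, z)
exact = 1.5*np.sin(k*x0)*(a**2/k)*alpha*(np.cos(z) - np.cos(3*z))
rel_err = np.max(np.abs(dflux_dz - exact))/np.max(np.abs(exact))

print(div_max, rel_err)                  # both small
```

The flux also vanishes at the rigid boundaries $z = 0$ and $z = \pi$, consistent with the statement that the vertically averaged momentum is unchanged by CMT in this model.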

Multiscale Multicloud Modeling

Now the one-way effects of the previous section will be combined to allow two-way CCW–mean flow interactions. The mean wind can influence which CCW is favored (eastward or westward propagating), and the CCW can alter the mean wind through its CMT. A multiscale asymptotic model for CCW–environment interactions can be derived from the atmospheric primitive equations, as described by Majda and Stechmann [18]. The derivation is outlined here for the zonal velocity $u$ only, although the full set of atmospheric variables is used by Majda and Stechmann [18]. The starting point is the two-dimensional equation

$$\partial_t u + \partial_x(u^2) + \partial_z(wu) + \partial_x p = S_u. \qquad (9)$$

It is assumed that the velocity depends on two time scales: a fast time scale $t$ on equatorial synoptic scales and a slow time scale $T = \epsilon^2 t$ on intraseasonal time scales. The asymptotic expansion of $u$ takes the form

$$u = \bar{U}(z,T) + u'(x,z,t,T) + \epsilon^2 u_2 + O(\epsilon^3), \qquad (10)$$

with similar expansions for other variables, and where $\bar{U}(z,T)$ is the slowly varying mean wind and $u'(x,z,t,T)$ is the fluctuating wind. After inserting the ansatz (10) into the primitive equation (9) and applying the procedure of systematic multiscale asymptotics, the result is

$$\partial_T \bar{U} = -\partial_z \langle w'u' \rangle, \qquad \partial_T \bar{\Theta} = -\partial_z \langle w'\theta' \rangle + \langle S_{\theta,2} \rangle, \qquad \partial_z \bar{P} = \bar{\Theta}, \qquad (11)$$

and a set of equations for the fluctuations

$$\partial_t u' + \bar{U}\,\partial_x u' + w'\,\partial_z \bar{U} + \partial_x p' = S'_{u,1},$$
$$\partial_t \theta' + \bar{U}\,\partial_x \theta' + w'\,\partial_z \bar{\Theta} + w' = S'_{\theta,1},$$
$$\partial_z p' = \theta', \qquad \partial_x u' + \partial_z w' = 0, \qquad (12)$$

where the full derivation by Majda and Stechmann [18] includes the full set of atmospheric variables. The multiscale equations (11)–(12) demonstrate the two main mechanisms of CCW–mean flow interactions: CMT from the CCW drives changes in the mean wind on the slow time scale $T = \epsilon^2 t$, and the mean flow affects the CCW through the advection terms.

Multiscale Multi-cloud Modeling and the Tropics, Fig. 3 Nonlinear simulations of the multicloud model for three cases of fixed background shear. Row 1: Three different mean flows $\bar{U}(z)$ used for the three cases, with mode amplitudes $(\bar{U}_1, \bar{U}_2, \bar{U}_3) = (0, 0, 0)$, $(1.3, 0, -1.3)$, and $(1.3, 7, -1.3)$ m/s. Row 2: Space–time plots of deep convective heating $H_d(x,t)$ (K/day) from the nonlinear simulations (From Stechmann et al. [24])
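In toy form, the mean-flow half of (11) can be integrated with a prescribed eddy flux. The flux shape, amplitude, and time step below are invented for illustration; in the actual multiscale model the flux $\langle w'u' \rangle$ is computed interactively from the fluctuation system (12).

```python
import numpy as np

# Minimal sketch of the mean-flow half of (11): dU/dT = -d<w'u'>/dz,
# with a *prescribed*, time-frozen eddy flux shaped like the exactly
# solvable result, <w'u'> ~ 3 sin z - sin 3z.  Amplitude and time step
# are illustrative; in the full model the flux comes from (12).
z = np.linspace(0, np.pi, 101)
flux = 0.01 * (3*np.sin(z) - np.sin(3*z))   # <w'u'> (vanishes at z = 0, pi)
dflux_dz = np.gradient(flux, z)

U = np.zeros_like(z)                        # mean wind, initially at rest
dT = 0.1
for _ in range(100):                        # forward Euler on the slow time T
    U = U - dT * dflux_dz

# The induced wind is proportional to -d<w'u'>/dz: easterly tendency in
# the lower troposphere, westerly aloft, with the vertical mean left
# unchanged because the flux vanishes at the rigid boundaries.
print(U[25], U[75])    # opposite signs at z = pi/4 and 3*pi/4
```

This is the upscale transfer in its simplest form; reversing the sign of the flux (a CCW of opposite tilt or propagation direction) reverses the mean-wind tendency.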

Multiscale Multi-cloud Modeling and the Tropics, Fig. 4 Structure and CMT of the westward-propagating CCW from the left case of Fig. 3 at time $t = 30$ days. Left: Vector plot of $(u, w)$ and shaded convective heating. Maximum $u$ and $w$ are 5.2 m/s and 7.3 cm/s, respectively, and dark and light shading show convective heating greater than +2 K/day and less than −2 K/day, respectively. Middle: Vertical profile of the mean momentum flux, $\overline{w'u'}$ (m$^2$/s$^2$). Right: Negative vertical derivative of the mean momentum flux, $-\partial_z \overline{w'u'}$ [(m/s)/day] (From Stechmann et al. [24])

By themselves, (11)–(12) include the dry dynamical basis and the multiscale interactions, but the source term $S'_{\theta,1}$ still needs to be specified; the multicloud model is thus used to supply interactive source terms and moisture effects. Note that (11)–(12) allow for changes in the mean thermodynamic state such as $\bar{\Theta}(z,T)$ in addition to the mean flow $\bar{U}(z,T)$; this is included here as well, following Majda and Stechmann [18], but only the mean flow $\bar{U}(z,T)$ dynamics will be shown, as it has the most significant effect in this single-planetary-scale-column setup. In short, the model for CCW–environment interactions can be thought of as the multiscale model in (11)–(12) with the multicloud model used to supply moisture effects and interactive source terms for (12).

An example of the multiscale multicloud dynamics is shown in Fig. 5. This background state is similar to the westerly wind burst stage of the MJO [1, 7, 15, 17, 28, 29]. The mean flow oscillates about a climate base state that is mostly first baroclinic, i.e., the $\cos z$ term dominates, but CMT causes the maximum low-level winds to shift aloft to $z = 3$ or 4 km, as occurs from $t = 1{,}040$ to 1,070 days. This phase in the cycle of the zonal winds in the simple dynamical model strongly resembles the one for the zonal winds in the westerly wind burst stage of the MJO from the observational record [15, 28, 29]. First, at time $t = 1{,}040$, the shear is entirely first baroclinic with the maximum of the westerlies at the base of the troposphere, as in the westerly onset stage. Tung and Yanai [28, 29] use the diagnostic

$$\frac{\bar{U}(z,T)}{|\bar{U}|}\,\frac{\partial \bar{U}}{\partial t} > 0 \ (< 0) \qquad (13)$$

to denote acceleration (deceleration) of the zonal jet, where $\partial \bar{U}/\partial t$ is measured from turbulent transports in the observations. In the westerly wind burst phase of the MJO, they find first a phase of acceleration of the zonal winds in the lower troposphere due to CMT, which is followed by a phase of deceleration of these westerly winds [29]. This is exactly what happens in the simple model due to CMT, as shown in the upper panels of Fig. 5. The zonal winds in the lower troposphere first accelerate between $t = 1{,}040$ and 1,070 days, where a strong westerly wind burst develops aloft, as in the observations, and then decelerate at the times beyond $t = 1{,}070$ days.

What happens in the simple dynamical model between times $t = 1{,}040$ and 1,070 days is a coherent eastward-propagating CCW which affects the zonal mean flow through CMT and drives the acceleration of the westerly zonal wind. Masunaga et al. [21] have noted the prominent occurrence in observations of eastward-propagating convectively coupled Kelvin waves in the westerly wind burst phase of the MJO. This occurs, for instance, as the CCW propagates eastward from $t = 1{,}040$ to 1,070 days. (This is also the same role played by eastward-propagating superclusters in a recent diagnostic multiscale model of the MJO [1, 17].) Note that this analogous behavior occurs in this simple dynamical model even though it is one dimensional horizontally and without Coriolis effects.

Multiscale Multi-cloud Modeling and the Tropics, Fig. 5 Evolution of the mean wind $\bar{U}$ (a) and the convectively coupled waves (c) through one transition from weak low-level westerlies to strong low-level westerlies. The deep convective heating $H_d(x,t)$ is shaded light gray when $H_d > 2$ K day$^{-1}$, dark gray when $H_d > 6$ K day$^{-1}$, and black when $H_d > 10$ K day$^{-1}$. (b, d) Same as (a, c) except for the subsequent decay of this phase due to downscale CMT (From Majda and Stechmann [18])

Another striking feature of Fig. 5 is the occurrence of multiscale waves with envelopes propagating westward and smaller-scale convection propagating eastward within the envelope. These multiscale waves appear in the transition phases between instances of coherent CCWs propagating in opposite directions. At these stages, the wave patterns resemble those in the simulations of Grabowski and Moncrieff [3]. The occurrence of both coherent and scattered convection is also reminiscent of the simulations in Grabowski et al. [4], although their results were on smaller scales and their mean variables were prescribed, not dynamic. Many challenges remain for multiscale multicloud modeling in the tropics. See Klein [13] and Khouider et al. [11] for recent reviews from an applied mathematics perspective.
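The Tung–Yanai diagnostic (13) is easy to apply to a wind time series. The series below is synthetic, loosely mimicking the growth and decay of the low-level westerlies described above; only the diagnostic itself comes from (13).

```python
import numpy as np

# The Tung-Yanai diagnostic (13), (U/|U|) * dU/dt, flags acceleration
# (positive) or deceleration (negative) of the zonal jet.  The wind
# time series below is synthetic, loosely mimicking the westerly wind
# burst cycle in the text (growth to t ~ 1,070 days, then decay).
t = np.linspace(1040, 1100, 601)               # days
U = 5.0 * np.sin(np.pi * (t - 1040) / 60.0)    # low-level wind, m/s

dUdt = np.gradient(U, t)
diag = np.sign(U) * dUdt                       # diagnostic (13)

accel = t[diag > 0]
decel = t[diag < 0]
print(accel.min(), accel.max())    # acceleration phase (roughly t < 1070)
print(decel.min(), decel.max())    # deceleration phase afterward
```

In the observations, $\partial \bar{U}/\partial t$ in (13) is estimated from the measured turbulent transports rather than from the wind series itself.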

References

1. Biello, J.A., Majda, A.J.: A new multiscale model for the Madden–Julian oscillation. J. Atmos. Sci. 62, 1694–1721 (2005)
2. Cheng, C.P., Houze, R.A. Jr.: The distribution of convective and mesoscale precipitation in GATE radar echo patterns. Mon. Weather Rev. 107(10), 1370–1381 (1979)
3. Grabowski, W.W., Moncrieff, M.W.: Large-scale organization of tropical convection in two-dimensional explicit numerical simulations. Q. J. R. Meteorol. Soc. 127, 445–468 (2001)
4. Grabowski, W.W., Wu, X., Moncrieff, M.W.: Cloud-resolving modeling of tropical cloud systems during Phase III of GATE. Part I: two-dimensional experiments. J. Atmos. Sci. 53, 3684–3709 (1996)
5. Houze, R.A. Jr.: Observed structure of mesoscale convective systems and implications for large-scale heating. Q. J. R. Meteorol. Soc. 115(487), 425–461 (1989)
6. Houze, R.A. Jr.: Mesoscale convective systems. Rev. Geophys. 42, RG4003 (2004). doi:10.1029/2004RG000150
7. Houze, R.A. Jr., Chen, S.S., Kingsmill, D.E., Serra, Y., Yuter, S.E.: Convection over the Pacific warm pool in relation to the atmospheric Kelvin–Rossby wave. J. Atmos. Sci. 57, 3058–3089 (2000)
8. Khouider, B., Majda, A.J.: A simple multicloud parameterization for convectively coupled tropical waves. Part I: linear analysis. J. Atmos. Sci. 63, 1308–1323 (2006)
9. Khouider, B., Majda, A.J.: Equatorial convectively coupled waves in a simple multicloud model. J. Atmos. Sci. 65, 3376–3397 (2008)
10. Khouider, B., Majda, A.J.: Multicloud models for organized tropical convection: enhanced congestus heating. J. Atmos. Sci. 65, 895–914 (2008)
11. Khouider, B., Majda, A.J., Stechmann, S.N.: Climate science in the tropics: waves, vortices and PDEs. Nonlinearity 26(1), R1–R68 (2013)
12. Kiladis, G.N., Wheeler, M.C., Haertel, P.T., Straub, K.H., Roundy, P.E.: Convectively coupled equatorial waves. Rev. Geophys. 47, RG2003 (2009). doi:10.1029/2008RG000266
13. Klein, R.: Scale-dependent models for atmospheric flows. Ann. Rev. Fluid Mech. 42, 249–274 (2010)
14. Lau, W.K.M., Waliser, D.E. (eds.): Intraseasonal Variability in the Atmosphere–Ocean Climate System, 2nd edn. Springer, Berlin (2012)
15. Lin, X., Johnson, R.H.: Kinematic and thermodynamic characteristics of the flow over the western Pacific warm pool during TOGA COARE. J. Atmos. Sci. 53, 695–715 (1996)
16. Majda, A.J.: New multi-scale models and self-similarity in tropical convection. J. Atmos. Sci. 64, 1393–1404 (2007)
17. Majda, A.J., Biello, J.A.: A multiscale model for the intraseasonal oscillation. Proc. Natl. Acad. Sci. U.S.A. 101(14), 4736–4741 (2004)
18. Majda, A.J., Stechmann, S.N.: A simple dynamical model with features of convective momentum transport. J. Atmos. Sci. 66, 373–392 (2009)
19. Mapes, B.E.: Convective inhibition, subgrid-scale triggering energy, and stratiform instability in a toy tropical wave model. J. Atmos. Sci. 57, 1515–1535 (2000)
20. Mapes, B.E., Tulich, S., Lin, J.L., Zuidema, P.: The mesoscale convection life cycle: building block or prototype for large-scale tropical waves? Dyn. Atmos. Oceans 42, 3–29 (2006)
21. Masunaga, H., L'Ecuyer, T., Kummerow, C.: The Madden–Julian oscillation recorded in early observations from the Tropical Rainfall Measuring Mission (TRMM). J. Atmos. Sci. 63(11), 2777–2794 (2006)
22. Moncrieff, M.W.: Organized convective systems: archetypal dynamical models, mass and momentum flux theory, and parameterization. Q. J. R. Meteorol. Soc. 118(507), 819–850 (1992)
23. Schumacher, C., Houze, R.A. Jr., Kraucunas, I.: The tropical dynamical response to latent heating estimates derived from the TRMM precipitation radar. J. Atmos. Sci. 61(12), 1341–1358 (2004)
24. Stechmann, S.N., Majda, A.J., Skjorshammer, D.: Convectively coupled wave–environment interactions. Theor. Comput. Fluid Dyn. 27, 513–532 (2013)
25. Straub, K.H., Kiladis, G.N.: The observed structure of convectively coupled Kelvin waves: comparison with simple models of coupled wave instability. J. Atmos. Sci. 60(14), 1655–1668 (2003)
26. Takayabu, Y.N., Lau, K.M., Sui, C.H.: Observation of a quasi-2-day wave during TOGA COARE. Mon. Weather Rev. 124(9), 1892–1913 (1996)
27. Tulich, S.N., Randall, D., Mapes, B.: Vertical-mode and cloud decomposition of large-scale convectively coupled gravity waves in a two-dimensional cloud-resolving model. J. Atmos. Sci. 64, 1210–1229 (2007)
28. Tung, W., Yanai, M.: Convective momentum transport observed during the TOGA COARE IOP. Part I: general features. J. Atmos. Sci. 59(11), 1857–1871 (2002)
29. Tung, W., Yanai, M.: Convective momentum transport observed during the TOGA COARE IOP. Part II: case studies. J. Atmos. Sci. 59(17), 2535–2549 (2002)
30. Zhang, C.: Madden–Julian Oscillation. Rev. Geophys. 43, RG2003 (2005). doi:10.1029/2004RG000158

Multiscale Numerical Methods in Atmospheric Science

Rupert Klein
FB Mathematik and Informatik, Freie Universität Berlin, Berlin, Germany

Description

Numerical simulation plays a vital part in modern weather forecasting and climate research. Related numerical methods must respect the multiscale character of atmospheric dynamics. Different hierarchies of scales arise from a variety of origins, and each comes with its specific demands in the context of computational simulation. This entry addresses multiscale issues in the numerical solution of the atmospheric flow equations. Issues associated with the mathematical modeling of unresolved scales are not addressed.

Parameter-Induced Scales/Multi-rate, (Semi-)implicit, Well-Balanced, and Asymptotically Consistent Schemes

A substantial part of today's theoretical meteorological knowledge has been derived through scale analyses. These exploit the wide separation between certain characteristic length and time scales of atmospheric motions whose existence is implied by the Earth's geophysical parameters. Such parameters are the Earth's radius and rotation rate, the total mass of its atmosphere, the global mean atmospheric temperature, a typical horizontal temperature difference between the poles and the equator, and the average acceleration of gravity. Through classical dimensional analysis, these parameters combine to form dynamically relevant characteristic scales, such as the pressure scale height, $h_{\mathrm{sc}} \approx 10$ km, which measures the height of the troposphere; the mid-latitude synoptic scale, $L \approx 1{,}000$ km, which is the typical diameter of a high- or low-pressure region; or the tropospheric Brunt–Väisälä frequency, $N$, which characterizes the stability of the atmosphere's stratification against adiabatic vertical mass displacement [1].

Theoretical studies reveal that associated with these characteristic scales are certain dominant balances of physical forces or processes [1, 2]. Examples are the near hydrostatic and geostrophic balances of the pressure gradient with the gravitational and the Coriolis apparent forces, respectively, which are relevant to the synoptic length and daily time scales. On the one hand, these dominant balances justify related reduced dynamical models, such as the quasi-geostrophic model for the said examples. On the other hand, they imply that numerical schemes for solving the unapproximated full compressible flow equations should reproduce these near balances without undue interference from numerical truncation errors, and they should properly handle the underlying fast-wave processes that arise when the flow data are out of balance.
Split-explicit or multi-rate time integrators reduce the expense of making small acoustics-resolving time steps by splitting the governing equations into a linearized first part that captures the fast acoustic and other fast-wave modes and a nonlinear second equation set that describes the remaining slow modes, notably advection. Both parts are integrated explicitly in time, the first using acoustics-resolving time steps and the second using time steps that only resolve

Multiscale Numerical Methods in Atmospheric Science

the slow processes. As intuitively clear as this approach appears at a conceptual level, as difficult it is to actually construct a split scheme that delivers on the promise of allowing large separation between the time steps used in the sub-integrations. The optimization of such methods remains an active field of research [4, 5] for at least one important reason: They have a decisive advantage over the (partially) implicit approaches discussed in the next paragraph in terms of parallelization on modern supercomputer hardware. Semi-implicit, or [linearized] implicit-explicit ([L]IMEX), is used when fast-wave oscillations are not important and thus need not be resolved in time – as is the case, e.g., with acoustic modes. Using unconditionally stable implicit time integrators on a linearized fast-wave part of the governing equations enables integration at time steps comparable to those used to resolve the slower processes of interest. A potential caveat with this approach is that unconditional stability with respect to the time step size is achieved with implicit integrators at the cost of artificially slowing down the oscillations of fast-wave modes with wavelengths of the order of the computational grid size. This slowing-down distorts the wave dispersion to the extent that the numerically realized group velocity of the related shortwaves may nearly vanish or even change sign relative to its physical counterparts depending on the details of the schemes used. In both cases some of the short-wavelength oscillatory modes are then nearly stationary on the grid. As a consequence, they are prone to weakly nonlinear amplification through truncation errors that arise in coupling the implicit and explicit substeps or as a result of erroneous channeling of energy from physical processes into the unphysical oscillatory modes. As a countermeasure one resorts to implicit integrators that feature nonzero dissipation for shortwave modes as part of their truncation error. 
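The split-explicit strategy described above can be sketched in toy-ODE form. The system, the forward-backward substep, and all parameter values below are invented for illustration; real codes apply the same idea to the discretized flow equations.

```python
import numpy as np

# Toy split-explicit (multirate) integration of a fast-slow system
# mimicking "acoustics plus slow forcing":
#     du/dt = -omega*p + s(u),   dp/dt = omega*u
# The fast oscillation is substepped with a forward-backward update
# (a classic choice for the acoustic part of split-explicit codes),
# while the slow tendency s(u) is evaluated once per large step and
# frozen during the substeps.  All values are invented for illustration.
omega = 50.0

def s_slow(u):
    return 0.1 - 0.05 * u          # weak forcing plus slow relaxation

def split_explicit_step(u, p, dt, nsub):
    slow = s_slow(u)               # frozen slow tendency
    dtau = dt / nsub
    for _ in range(nsub):
        u = u + dtau * (-omega * p + slow)   # forward step in u
        p = p + dtau * omega * u             # backward step: uses new u
    return u, p

u, p = 1.0, 0.0
dt = 0.02                          # omega*dt = 1: large for the fast mode
for _ in range(500):
    u, p = split_explicit_step(u, p, dt, nsub=10)  # omega*dtau = 0.1
print(u, p)                        # bounded oscillation, no blow-up
```

The forward-backward substep is neutrally stable for the oscillation as long as $\omega\,\Delta\tau < 2$, which is what permits the slow part to take the much larger step; a plain forward Euler substep would amplify the oscillation.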
This is sometimes achieved by tuning the second-order implicit trapezoidal or related schemes toward the first-order accurate backward Euler method, or by resorting to multilevel backward differentiation (BDF) schemes that are at least second-order accurate but also dissipative [6]. The adoption of higher-order integrators with more favorable properties faces the efficiency critique: implicit solves are computationally expensive and not easily parallelizable on modern hardware. For these reasons, the development of semi- and fully implicit time integrators remains a focus of interest [7–10].

Asymptotically adaptive or asymptotic-preserving schemes mostly belong to the class of semi-implicit methods. In their construction, particular attention is paid, however, to the requirement that the schemes not only work stably under practically relevant conditions of time scale separation but also automatically and seamlessly turn into adequate solvers for the reduced asymptotic models that describe flows in the respective fully balanced limits [11–15].
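The slowing-down of unresolved fast modes by the trapezoidal rule, and the damping alternative offered by backward Euler, can be seen directly from the amplification factors of the two schemes applied to a single oscillatory mode y' = iωy (a minimal sketch, not tied to any particular atmospheric model):

```python
import numpy as np

# one oscillatory "fast mode" y' = i*omega*y, advanced with time step dt;
# theta = omega*dt ranges from well resolved (0.1) to badly unresolved (100)
theta = np.array([0.1, 1.0, 10.0, 100.0])
z = 1j * theta

R_trap = (1 + z / 2) / (1 - z / 2)   # trapezoidal rule amplification factor
R_beul = 1 / (1 - z)                 # backward Euler amplification factor

# trapezoidal: no damping at all (|R| = 1 for every theta), but the phase
# advance per step saturates near pi, so unresolved fast modes are
# artificially slowed down rather than removed
print(np.abs(R_trap))                # -> [1. 1. 1. 1.]
print(np.angle(R_trap) / theta)      # relative phase speed: ~[1.00 0.93 0.27 0.03]
# backward Euler: only first-order accurate, but it damps unresolved modes
print(np.abs(R_beul))                # -> ~[1.00 0.71 0.10 0.01]
```

The neutrally stable, phase-distorting trapezoidal behavior is exactly what makes the near-stationary grid-scale modes discussed above possible; the dissipative backward Euler limit removes them instead.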

Well-Balanced Schemes
Split-explicit and semi-implicit schemes, when applied to a configuration with approximate balance of some fast processes, approach numerically balanced states by multiple fast iterations or by implicitly solving for them. Yet these states generally bear the imprint of the numerical truncation errors associated with the spatial discretization used, and this may distort the steady states away from the physically meaningful ones at unacceptable levels. A prominent example is spurious numerically induced winds over steep topography that arise after initialization of a simulation with a nominal static state at rest: it is not automatically guaranteed that the discrete pressure gradient on a terrain-following grid that balances the (vertical) acceleration of gravity has vanishing horizontal components. Any remaining horizontal components, however, induce spurious horizontal and, in the sequel, also vertical flows. Well-balanced schemes overcome this general issue by building explicit information on the to-be-respected balanced states into the numerical discretizations. The central underlying idea is as follows: instead of building more sophisticated schemes from first-order versions that work with piecewise constant states as the simplest base states, one constructs schemes that use locally balanced states as the fundamental building blocks [16–18]. These schemes guarantee clean numerical static states for the shallow water and atmospheric Euler equations in second-order accurate discretizations. Meanwhile there exist advanced schemes of this type which also maintain steady states with nontrivial flow [19] or achieve higher than second-order accuracy [20], and related ideas are being exploited in global weather codes based on the hydrostatic primitive equations [21].
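The hydrostatic reconstruction of Audusse et al. [17] illustrates the idea in its simplest setting. The sketch below is a first-order finite-volume scheme with a Rusanov flux for the one-dimensional shallow water equations; grid size, bottom profile, and boundary treatment are arbitrary choices for the demonstration. It preserves a "lake at rest" over nontrivial topography to round-off level:

```python
import numpy as np

g = 9.81
N = 100
dx = 1.0 / N
x = (np.arange(N) + 0.5) * dx
b = 0.25 * np.exp(-100.0 * (x - 0.5) ** 2)   # bottom topography with a bump
h = 1.0 - b                                   # water depth: flat surface h + b = 1
hu = np.zeros(N)                              # momentum: fluid at rest

def step(h, hu, dt):
    # zero-gradient boundaries via edge padding
    hp, hup, bp = (np.pad(v, 1, mode="edge") for v in (h, hu, b))
    # hydrostatic reconstruction of the interface states (Audusse et al.)
    bstar = np.maximum(bp[:-1], bp[1:])
    hL = np.maximum(hp[:-1] + bp[:-1] - bstar, 0.0)
    hR = np.maximum(hp[1:] + bp[1:] - bstar, 0.0)
    uL = hup[:-1] / np.maximum(hp[:-1], 1e-12)
    uR = hup[1:] / np.maximum(hp[1:], 1e-12)
    # Rusanov (local Lax-Friedrichs) flux for mass and momentum
    c = np.maximum(np.abs(uL) + np.sqrt(g * hL), np.abs(uR) + np.sqrt(g * hR))
    F0 = 0.5 * (hL * uL + hR * uR) - 0.5 * c * (hR - hL)
    F1 = 0.5 * (hL * uL**2 + 0.5 * g * hL**2 + hR * uR**2 + 0.5 * g * hR**2) \
        - 0.5 * c * (hR * uR - hL * uL)
    # well-balanced correction: each cell sees interface pressures built from
    # its own reconstructed depths, so lake-at-rest contributions cancel exactly
    Fm_right = F1[1:] + 0.5 * g * (h**2 - hL[1:]**2)
    Fm_left = F1[:-1] + 0.5 * g * (h**2 - hR[:-1]**2)
    return h - dt / dx * (F0[1:] - F0[:-1]), hu - dt / dx * (Fm_right - Fm_left)

dt = 0.2 * dx / np.sqrt(g)
for _ in range(200):
    h, hu = step(h, hu, dt)
# free surface stays flat and fluid stays at rest, up to round-off
print(np.max(np.abs(h + b - 1.0)), np.max(np.abs(hu)))
```

Replacing the corrected interface pressures by a naive pointwise discretization of the topography source term would produce exactly the spurious flows described above.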


Process-Induced Dynamic Balances/Conservation Principles and Mimetic Schemes
The atmosphere is a nonequilibrium system driven by incoming solar radiation. The incoming flux of energy is redistributed through a myriad of processes, being channeled predominantly back and forth between the potential, kinetic, and internal forms of energy, including the latent heat of liquid water. A central role in this context is played by diabatic processes, which irreversibly transfer energy from its mechanical forms (potential, kinetic, and elastic) to its thermodynamic forms (thermal energy and latent heat of condensation). While the related energy fluxes are responsible for many weather phenomena, they are at the same time quite weak in comparison with the ubiquitous adiabatic, i.e., reversible, energy exchanges. Estimates in [22, 23] show that these processes have mean horizontally averaged transfer rates of 10 W/m², while the part of the sun's total energy flux absorbed by the atmosphere is ten times as large, and typical vertically averaged advective kinetic energy flux divergences can be even larger, depending on the specific flow situation. This magnitude difference between the quantities of interest (here the rate of irreversible energy transfers) and those that must be balanced to compute them creates a third challenge for numerical flow solvers: they should accurately reproduce these subtle diabatic energy transfers without overwhelming them by artificial diabatic exchanges induced by truncation errors from the discretization of the adiabatic dynamics. This is considered particularly important for long-term simulations as routinely pursued in climate research: systematic but erroneous long-term trends can be the consequence of truncation errors competing with physical effects that are weak but accumulate over long times.
The exact conservation, up to machine accuracy, of the primary conserved quantities mass, total energy, and momentum (in the absence of nonconservative forcing) is achieved routinely by adopting conservative finite volume discretizations [3, 24]. These conservation laws hold no matter whether a flow is adiabatic or not. Yet, to also preserve secondary constants of integration of the adiabatic dynamics, such as Ertel's potential vorticity, angular momentum, or helicity, requires discretizations with particular algebraic properties. Schemes with "mimetic properties" are being developed to meet these requirements. Their construction principle is to reproduce fundamental identities of vector calculus, which are used in the derivation of the secondary conservation principles, at the discrete level. Such discrete identities are a solid foundation for precise control, e.g., over transfers between different forms of energy, in a numerical scheme. The general approach was pioneered by A. Arakawa and co-workers in the 1980s in the context of atmospheric flow simulation [25]. These authors exploited the fact that certain expressions in the shallow water equations can be written in terms of antisymmetric differential operators (Poisson brackets) and that their antisymmetry is responsible for the conservation of total energy and of the square norm of vorticity in these equations in the adiabatic case. By constructing a discretization that directly mimics the operations of the Poisson brackets at the discrete level, they provided a shallow water solver with superior properties in long-time simulations. Recently, techniques of this type have been developed for more general computational grid structures and for more realistic flow models by various teams [26–29] using, inter alia, the Nambu formulation of fluid dynamics [30, 31]. One caveat associated with the use of such schemes is that they lead to fully nonlinear implicit, and thus computationally expensive, formulations if the said exact conservation properties are to be realized. Nevertheless, explicit or semi-implicit formulations in connection with such mimetic spatial discretizations can still exhibit advantageous dispersion behavior and very good approximate, although not exact, conservation properties. Another open issue for fully implicit schemes of this type is related to the nonlinearity of the fluid equations of state.
The algebraic constraints to be observed to guarantee correct transfers between the different energy reservoirs may prohibit or strongly constrain the formulation of higher-order approximations in time [32].
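The mechanism by which antisymmetry enforces conservation, and the role of implicit time stepping in realizing it exactly, can be illustrated on the simplest possible example: linear advection with periodic centered differences. The difference matrix D below is skew-symmetric, the discrete analogue of the antisymmetric operators discussed above, so the discrete energy is exactly conserved by the implicit midpoint rule (a toy sketch; grid size, wave speed, and initial data are arbitrary choices):

```python
import numpy as np

N, a, dt = 64, 1.0, 0.05
dx = 2 * np.pi / N
x = np.arange(N) * dx

# periodic centered difference matrix: D[i,i+1] = +1/(2dx), D[i,i-1] = -1/(2dx);
# skew-symmetry (D.T == -D) makes u.T @ (D @ u) vanish identically, which is
# exactly the antisymmetry argument for energy conservation
I = np.eye(N)
D = (np.roll(I, 1, axis=1) - np.roll(I, -1, axis=1)) / (2 * dx)
assert np.allclose(D.T, -D)

u = np.exp(np.sin(x))          # smooth periodic initial data
E0 = u @ u                     # discrete "energy": squared l2 norm

# implicit midpoint rule for u' = -a D u:
#   (I + a dt/2 D) u_new = (I - a dt/2 D) u_old
# with S skew-symmetric, (I+S)^{-1}(I-S) is orthogonal, so ||u|| is invariant
A = I + 0.5 * a * dt * D
B = I - 0.5 * a * dt * D
for _ in range(1000):
    u = np.linalg.solve(A, B @ u)

print(abs(u @ u - E0) / E0)    # energy drift stays at round-off level
```

Replacing the implicit solve by explicit Euler keeps the same skew-symmetric spatial operator but loses the exact invariance, mirroring the caveat above that exact conservation forces implicit, and hence more expensive, formulations.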

Problem-Induced Scales/Nesting and Adaptivity
Depending on the purpose of an atmospheric flow simulation, it is often desirable to resolve different parts of the computational domain nonuniformly. This has, e.g., long been standard in everyday weather forecasting.


Global simulations based on a relatively coarse computational grid, with grid sizes of 100 km, are supplemented by local high-resolution embedded simulations for a particular region of interest. More than two levels of refinement are being used, e.g., to maximize the simulated detail for hurricane forecasts [33, 34]. The standard approach to realizing the communication between coarse-grid and fine-grid computations is one-way nesting. Here one first completes the large-scale simulation at lower resolution and subsequently uses the results, after suitable interpolation, as effective boundary conditions for the embedded simulation. In an interesting variant of this approach, one solves, on a fine mesh, for perturbations away from a possibly time-dependent large-scale field that itself is either precomputed and prescribed or simulated on the fly on a coarser mesh [35]. In general, mutually coupled simulations – or two-way nesting – on grids of different refinements promise further improvements, as the accuracy of the coarse-grid computation can benefit from the more accurate information generated in the regions with higher grid resolution. While this is intuitively plausible, and while it will remind readers of the very successful modern adaptive numerical solver techniques for partial differential equations [36], it does deserve a closer look in the context of atmospheric flow simulations. Modern grid-adaptive numerical methods in computational (geophysical) fluid dynamics [37–39] generally realize two-way coupling: they exploit the higher accuracy achieved with higher resolution to also improve the coarse-grid computation. Ultimate efficiency is achieved when the grid resolution is adapted locally and dynamically in time to the resolution needs of a running simulation. The potential of these modern techniques is increasingly appreciated in numerical meteorology, and two-way nesting for regional weather forecasting is being tested at weather and climate centers.
Some caveats in this context result, however, from the notorious underresolution that weather forecasters live with and will have to live with for the foreseeable future. Even the finest grids in a production weather forecast do not feature cell sizes smaller than 1 km. Many important physical processes, notably those associated with moisture, cannot be resolved at this level. These processes must therefore still be represented by effective closure models or parameterizations, which not only have limited accuracy but are also inherently resolution dependent. As a consequence, adaptive grids with two-way nesting for meteorological applications must necessarily be accompanied by resolution-adaptive subgrid-scale process parameterizations. The importance of such developments has been fully recognized only in recent years [40, 41] (see also the fall 2012 program on "Multiscale Numerics for the Atmosphere and Ocean" of the Newton Institute in Cambridge, UK) and is now a topic of very active research.

There is a second difficulty for adaptive simulations in meteorology, again related to the ubiquitous presence of small unresolved scales. Grid resolution in adaptive methods is originally meant to be adjusted such that all features of a solution are resolved with comparable accuracy, independent of their characteristic scales. Thus, one would run with a relatively coarse grid in regions of smooth solutions and with finer grids in areas exhibiting small-scale structure. Yet, if small-scale structures are essentially present everywhere, as in the atmosphere, then a change of resolution must be controlled by different criteria. These cannot be criteria based on the sheer "accuracy" of the numerical solution but must include aspects of what the target of the simulation is and of which aspects of the solution are and are not important for achieving this goal. As a consequence, adaptive multiscale modeling for atmospheric flows is to be understood inherently as a joint task of modelers and numerical analysts.

References
1. Klein, R.: Scale-dependent asymptotic models for atmospheric flows. Ann. Rev. Fluid Mech. 42, 249–274 (2010)
2. Pedlosky, J.: Geophysical Fluid Dynamics, 2nd edn. Springer, New York (1987)
3. Durran, D.R.: Numerical Methods for Fluid Dynamics: With Applications to Geophysics, 2nd edn. Springer, New York/Berlin/Heidelberg (2010)
4. Klemp, J.B., Skamarock, W.C., Dudhia, J.: Conservative split-explicit time integration methods for the compressible nonhydrostatic equations. Mon. Weather Rev. 135, 2897–2913 (2007)
5. Wensch, J., Knoth, O., Galant, A.: Multirate infinitesimal step methods for atmospheric flow simulation. BIT 49, 449–473 (2009)
6. Giraldo, F., Restelli, M., Läuter, M.: Semi-implicit formulations of the Euler equations: applications to nonhydrostatic atmospheric modeling. SIAM J. Sci. Comput. 32, 3394–3425 (2010)


7. Reisner, J.M., Knoll, D.A., Wyszogradzki, A.A.: An implicitly balanced hurricane model with physics-based preconditioning. Mon. Weather Rev. 133, 1003–1022 (2005)
8. Jebens, S., Knoth, O., Weiner, R.: Linearly implicit peer methods for the compressible Euler equations. Appl. Numer. Math. 62, 1380–1392 (2012)
9. Wood, N., Staniforth, A., White, A., Allen, T., Diamantakis, M., Gross, M., Melvin, T., Smith, C., Vosper, S., Zerroukat, M., Thuburn, J.: An inherently mass-conserving semi-implicit semi-Lagrangian discretization of the deep-atmosphere global non-hydrostatic equations. Q. J. R. Meteorol. Soc. 140, 1505–1520 (2014)
10. Smolarkiewicz, P., Kühnlein, C., Wedi, N.: A consistent framework for discrete integrations of soundproof and compressible PDEs of atmospheric dynamics. J. Comput. Phys. 263, 185–205 (2014)
11. Klein, R.: Asymptotic analyses for atmospheric flows and the construction of asymptotically adaptive numerical methods. Z. Angew. Math. Mech. 80, 765–777 (2000)
12. Gatti-Bono, C., Colella, P.: An anelastic allspeed projection method for gravitationally stratified flows. J. Comput. Phys. 216, 589–615 (2006)
13. Cullen, M.J.P.: Modelling atmospheric flows. Acta Numer. 16, 67–154 (2007)
14. Vater, S., Klein, R., Knio, O.: A scale-selective multilevel method for long-wave linear acoustics. Acta Geophys. 59(6), 1076–1108 (2011)
15. Cordier, F., Degond, P., Kumbaro, A.: An asymptotic-preserving all-speed scheme for the Euler and Navier-Stokes equations. J. Comput. Phys. 231, 5685–5704 (2012)
16. Greenberg, J., Le Roux, A.: A well-balanced scheme for the numerical processing of source terms in hyperbolic equations. SIAM J. Numer. Anal. 33, 1–16 (1996)
17. Audusse, E., Bouchut, F., Bristeau, M.-O., Klein, R., Perthame, B.: A fast and stable well-balanced scheme with hydrostatic reconstruction for shallow water flows. SIAM J. Sci. Comput. 25, 2050–2065 (2004)
18. Botta, N., Klein, R., Langenberg, S., Lützenkirchen, S.: Well-balanced finite volume methods for nearly hydrostatic flows. J. Comput. Phys. 196, 539–565 (2004)
19. LeVeque, R.: Balancing source terms and flux gradients in high-resolution Godunov methods: the quasi-steady wave-propagation algorithm. J. Comput. Phys. 146, 346–356 (1998)
20. Noelle, S., Pankratz, N., Puppo, G., Natvig, J.: Well-balanced finite volume schemes of arbitrary order of accuracy for shallow water flows. J. Comput. Phys. 213, 474–499 (2006)
21. Zängl, G.: Extending the numerical stability limit of terrain-following coordinate models over steep slopes. Mon. Weather Rev. 140, 3722–3733 (2012)
22. Pauluis, O., Held, I.: Entropy budget of an atmosphere in radiative-convective equilibrium. Part I: maximum work and frictional dissipation. J. Atmos. Sci. 59, 125–139 (2002)
23. Ozawa, H., Ohmura, A., Lorenz, R.D., Pujol, T.: The second law of thermodynamics and the global climate system: a review of the maximum entropy production principle. Rev. Geophys. 41, 1–24 (2003)
24. LeVeque, R.J.: Finite Volume Methods for Hyperbolic Problems. Cambridge University Press, Cambridge (2002)
25. Arakawa, A., Lamb, V.: A potential enstrophy and energy conserving scheme for the shallow water equations. Mon. Weather Rev. 109, 18–36 (1981)
26. Thuburn, J., Ringler, T.D., Skamarock, W.C., Klemp, J.B.: Numerical representation of geostrophic modes on arbitrarily structured C-grids. J. Comput. Phys. 228, 8321–8335 (2009)
27. Salmon, R.: A general method for conserving energy and potential enstrophy in shallow-water models. J. Atmos. Sci. 64, 515–531 (2007)
28. Skamarock, W., Klemp, J., Duda, M., Fowler, L., Park, S.-H., Ringler, T.: A multi-scale nonhydrostatic atmospheric model using centroidal Voronoi tesselations and C-grid staggering. Mon. Weather Rev. 140, 3090–3105 (2012)
29. Gassmann, A.: A global hexagonal C-grid non-hydrostatic dynamical core (ICON-IAP) designed for energetic consistency. Q. J. R. Meteorol. Soc. 139, 152–175 (2013)
30. Névir, P., Blender, R.: A Nambu representation of incompressible hydrodynamics using helicity and enstrophy. J. Phys. A 26, 1189–1193 (1993)
31. Sommer, M., Névir, P.: A conservative scheme for the shallow-water system on a staggered geodesic grid based on a Nambu representation. Q. J. R. Meteorol. Soc. 135, 485–494 (2009)
32. Gassmann, A., Herzog, H.: Towards a consistent numerical compressible non-hydrostatic model using generalized Hamiltonian tools. Q. J. R. Meteorol. Soc. 134, 1597–1613 (2008)
33. Davis, C., Wang, W., Chen, S., Chen, Y., Corbosiero, K., DeMaria, M., Dudhia, J., Holland, G., Klemp, J., Michalakes, J., Reeves, H., Rotunno, R., Snyder, C., Xiao, Q.: Prediction of landfalling hurricanes with the Advanced Hurricane WRF model. Mon. Weather Rev. 136, 1990–2005 (2008)
34. Skamarock, W., Klemp, J., Dudhia, J., Gill, D., Barker, D., Duda, M., Huang, X.-Y., Wang, W., Powers, J.: A description of the Advanced Research WRF version 3. Tech. Note 475, NCAR (2008)
35. Smolarkiewicz, P., Margolin, L., Wyszogrodzki, A.: A class of nonhydrostatic global models. J. Atmos. Sci. 58, 349–364 (2001)
36. Deuflhard, P., Weiser, M.: Adaptive Numerical Solution of PDEs. De Gruyter, Berlin (2012)
37. Almgren, A.S., Bell, J.B., Colella, P., Howell, L.H., Welcome, M.L.: A conservative adaptive projection method for the variable density incompressible Navier-Stokes equations. J. Comput. Phys. 146, 1–46 (1998)
38. Nikiforakis, N. (ed.): Special issue on: Mesh generation and mesh adaptation for large-scale Earth-system modelling. Philos. Trans. R. Soc. A 367, 4473 (2009)
39. Wang, Z. (ed.): Adaptive High-Order Methods in Computational Fluid Dynamics. Advances in Computational Fluid Dynamics, vol. 2. World Scientific, Singapore/Hackensack (2011)
40. Klein, R. (ed.): Special issue on: Multiple scales in fluid dynamics and meteorology. Theor. Comput. Fluid Dyn. 27(3–4), 219–220 (2012)
41. Arakawa, A., Wu, C.-M.: A unified representation of deep moist convection in numerical modeling of the atmosphere. Part I. J. Atmos. Sci. 70, 1977–1992 (2013)

Multistep Methods

Gustaf Söderlind
Centre for Mathematical Sciences, Numerical Analysis, Lund University, Lund, Sweden

Introduction
Linear multistep methods are a class of numerical methods for computing approximate solutions to initial value problems in ordinary differential equations. The most widely used methods are the Adams methods and the backward differentiation formulas, better known as the BDF methods. The former are used for nonstiff equations, and the latter for stiff equations [3, 10, 12, 13, 16].

The problem to be solved is a first-order system of differential equations,

$$\frac{dy}{dt} = f(t, y); \qquad y(0) = y_0. \tag{1}$$

The independent variable $t$ usually denotes "time," and the dependent variable is a vector-valued function of time, $y(t) \in \mathbb{R}^m$. The vector field $f$ is usually assumed to be continuous in $t$ and Lipschitz continuous with respect to $y$. One seeks a solution $y(t)$ for $t \in [0, T]$, satisfying the initial condition $y(0) = y_0$, where the initial value $y_0 \in \mathbb{R}^m$ is a given vector.

Unlike differential equations, difference equations are well suited to sequential computing. A linear multistep method thus approximates (1) by a difference equation of the form

$$\sum_{j=0}^{k} \alpha_j y_{n+j} = h \sum_{j=0}^{k} \beta_j f(t_{n+j}, y_{n+j}), \tag{2}$$

where the coefficients $\{\alpha_j\}_{j=0}^{k}$ and $\{\beta_j\}_{j=0}^{k}$ determine the method. Here $\{t_n\}_{n=0}^{N}$ is a sequence of points in time such that $t_n = n \cdot h$, where $h$ is the step size, defined by $N \cdot h = T$. The sequence $\{y_n\}_{n=0}^{N}$, which is to be computed, contains the corresponding approximations to $y(t)$, that is, $y_n \approx y(t_n)$ for all $n$. As the difference equation is of order $k$, the method (2) is referred to as a $k$-step method.

To make the definition of a linear multistep method precise, let the two sets of coefficients $\{\alpha_j\}_{j=0}^{k}$ and $\{\beta_j\}_{j=0}^{k}$ define the two generating polynomials

$$\rho(\zeta) = \sum_{j=0}^{k} \alpha_j \zeta^j; \qquad \sigma(\zeta) = \sum_{j=0}^{k} \beta_j \zeta^j, \tag{3}$$

which are assumed to have no common factor. It is further assumed that $\deg(\rho) = k$, which is equivalent to the requirement $\alpha_k \neq 0$. Finally, the method's coefficients are normalized by imposing the condition $\sigma(1) = 1$. Under these assumptions, the pair $(\rho, \sigma)$ uniquely defines a linear $k$-step method.

Assume that $k$ previous values $y_n, y_{n+1}, \ldots, y_{n+k-1}$ are available. As $\alpha_k \neq 0$, the difference equation (2) can be written as

$$y_{n+k} - h \frac{\beta_k}{\alpha_k} f(t_{n+k}, y_{n+k}) = \frac{1}{\alpha_k} \left[ h \sum_{j=0}^{k-1} \beta_j f(t_{n+j}, y_{n+j}) - \sum_{j=0}^{k-1} \alpha_j y_{n+j} \right], \tag{4}$$

where the right-hand side only consists of known values of $y$ and $f$, and where the task is to solve for $y_{n+k}$.

The method is called explicit if $\beta_k = 0$ (equivalent to $\deg(\sigma) < k$). Then $y_{n+k}$ is directly obtained by evaluating the right-hand side of (4). The numerical integration of (1) consists of repeating this computation, step by step, until the terminal point $T$ is reached. The Adams–Bashforth methods are examples of explicit linear multistep methods.

If $\beta_k \neq 0$ (equivalent to $\deg(\sigma) = k$), the method is implicit. Then $y_{n+k}$ is defined by the (nonlinear) equation (4), which must be solved numerically to determine $y_{n+k}$. Such a method is computationally more expensive per step, but as implicit methods typically offer improved accuracy or superior stability, the use of larger step sizes may offset the added cost of solving a nonlinear equation on each step. Well-known examples of implicit methods are the Adams–Moulton methods and the BDF methods.

The main alternative to linear multistep methods is Runge–Kutta methods. Although these classes of methods are quite different in character, they are both covered by a comprehensive, unifying theory, known as general linear methods. Robust and highly efficient software exists for both classes. General-purpose solvers
for (1) are usually adaptive, meaning that the step size is not kept constant (as above) but is automatically varied during the course of the integration. This makes it possible to keep computational costs low, while still computing an approximate solution to within a user-specified accuracy requirement.
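A core ingredient of such adaptive solvers is a cheap local error estimate. The sketch below pairs the two-step Adams-Bashforth predictor with the trapezoidal (Adams-Moulton) corrector and uses the Milne device: since the two methods have local error constants 5/12 and -1/12, the corrector's local error is roughly |corrector - predictor|/6. The test problem and step size are arbitrary illustrations, and a production code would vary h based on this estimate, which is omitted here:

```python
import numpy as np

def pc_step(f, t, y_prev, y, h):
    """One predictor-corrector step from (t-h, y_prev) and (t, y) to t+h,
    with a Milne-device estimate of the corrector's local error."""
    f_prev, f_cur = f(t - h, y_prev), f(t, y)
    y_pred = y + h * (1.5 * f_cur - 0.5 * f_prev)      # Adams-Bashforth 2
    y_corr = y + 0.5 * h * (f_cur + f(t + h, y_pred))  # trapezoidal corrector
    err_est = abs(y_corr - y_pred) / 6.0               # Milne device
    return y_corr, err_est

# test problem y' = -y with exact solution exp(-t); start from exact data
f = lambda t, y: -y
h = 0.1
y_corr, err_est = pc_step(f, h, np.exp(0.0), np.exp(-h), h)
err_true = abs(y_corr - np.exp(-2 * h))
print(err_est, err_true)   # the estimate tracks the true local error closely
```

An adaptive driver would accept the step if err_est is below a tolerance and rescale h accordingly; with a third-order local error, the standard controller is h_new = 0.9 h (tol/err_est)^(1/3).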

Order of Consistency
In general, the exact solution $y(t)$ will not satisfy the difference equation (2). Inserting any sufficiently differentiable function $y$ and its derivative $y'$ into (2), one finds, by Taylor series expansion, that

$$r_n[y] := \sum_{j=0}^{k} \alpha_j y(t_{n+j}) - h \sum_{j=0}^{k} \beta_j y'(t_{n+j}) = c_k h^{p+1} y^{(p+1)}(t_n + \theta k h), \tag{5}$$

for some $\theta \in [0, 1]$ as $h \to 0$. The remainder term $r_n$ is called the local residual; $c_k$ is the error constant; and $p$ is the method's order of consistency. The order is usually determined by inserting polynomials $y(t) = t^q$, with $y'(t) = q t^{q-1}$, since $r_n[t^q] \equiv 0$ for all $q \leq p$. Specifically, for $q = 0$, the condition on the coefficients is $\sum \alpha_j = \rho(1) = 0$. Hence, $\rho(\zeta) = 0$ must always have one root $\zeta = 1$, known as the principal root. Further, taking $n = 0$ and $t_j = j \cdot h$, for $q \geq 1$, the order conditions are

$$\sum_{j=0}^{k} \left( \alpha_j j^q - \beta_j q j^{q-1} \right) = 0; \qquad q = 1, \ldots, p, \tag{6}$$

where the $(p+1)$th condition fails. The first order condition ($q = 1$) can also be written $\rho'(1) = \sigma(1)$. The two conditions $\rho(1) = 0$ and $\rho'(1) = \sigma(1)$ are often merely referred to as "consistency," noting that any method that fails to satisfy this minimum requirement is also unable to track the solution of (1).

The order conditions can be used to construct methods. As a $k$-step method has $2k+1$ free coefficients (one of its $2k+2$ coefficients being lost to the normalization requirement $\sigma(1) = 1$), and the coefficients $\{\alpha_j\}_0^k$ and $\{\beta_j\}_0^k$ enter (6) linearly, it is possible to select them such that $r_n[t^q] \equiv 0$ for $q = 0, \ldots, 2k$. Thus, the maximal order of consistency is $p = 2k$. However, for $k > 2$, such methods are unstable and fail to be convergent. Thus, stability will restrict the order, and one must distinguish between the order of consistency and the order of convergence. The latter means that the point-wise numerical error $e_n[y] := \|y_n - y(t_n)\| = O(h^p)$ as $h \to 0$. This requirement is more demanding than merely having the difference equation (2) approximate (1) to a certain accuracy, such that $r_n[y] = O(h^{p+1})$.

Stability and Convergence
Let $E$ denote the forward shift operator, defined by $E y_n = y_{n+1}$ for all $n$, and apply $(\rho, \sigma)$ to the simple problem $y' = f(t)$. Then (2) can be written as

$$\rho(E) y_n = h \, \sigma(E) f_n. \tag{7}$$

The solution is $y_n = u_n + v_n$, where $\{u_n\}$ is a particular solution, and the homogeneous solutions $\{v_n\}$ satisfy $\rho(E) v_n = 0$. The latter are determined by the roots $\zeta_\nu$ of the characteristic equation $\rho(\zeta) = 0$. Thus, the homogeneous solutions depend on the method but are unrelated to the given problem $y' = f(t)$. They must therefore remain bounded for all $n$, or preferably decay, lest the method produce a spurious, unstable numerical solution, diverging from the particular solution $\{u_n\}$, which approximates the exact solution $\int f(t)\, dt$. The homogeneous solutions are unstable unless all roots $\zeta_\nu$ lie inside or on the unit circle. Furthermore, $\{v_n\}$ also grows if any root on the unit circle is multiple. Thus, it is necessary to impose the root condition:

$$\rho(\zeta_\nu) = 0 \;\Rightarrow\; |\zeta_\nu| \leq 1; \qquad \nu = 1, \ldots, k,$$
$$|\zeta_\nu| = 1 \;\Rightarrow\; \zeta_\nu \text{ is a simple root.}$$

A method whose $\rho$ polynomial satisfies the root condition is called zero stable. Zero stability is necessary for the method to be convergent, that is, for the numerical solution to converge to the exact solution, $y_n \to y(t_n)$ as $h \to 0$. This is one of many examples of the Lax principle, also referred to as the fundamental theorem of numerical analysis: consistency and stability together are equivalent to convergence. More precisely, let a zero-stable method $(\rho, \sigma)$ have order of consistency $p$. Then it is also convergent, with order of convergence $p$, that is, $\|y_n - y(t_n)\| = O(h^p)$ as $h \to 0$. Note that every consistent one-step method is convergent: such a method has $\rho(\zeta) = \zeta - 1$, and the root condition is trivially satisfied.

As only convergent methods are of interest, the maximum order needs to be reexamined, taking zero stability into account. According to the First Dahlquist Barrier theorem [6], the maximal order of convergence of a $k$-step method is

$$p_{\max} = \begin{cases} k + 2 & \text{if } k \text{ is even,} \\ k + 1 & \text{if } k \text{ is odd,} \\ k & \text{if the method is explicit } (\beta_k = 0). \end{cases}$$
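Both the order conditions (6) and the root condition are easy to check numerically. The sketch below does so for the two-step Adams-Bashforth method and for the classic counterexample ρ(ζ) = ζ² + 4ζ - 5, σ(ζ) = 4ζ + 2 (kept unnormalized here; dividing both by σ(1) = 6 changes neither the roots nor the order). The latter attains the maximal order 3 of an explicit two-step method but has the characteristic root ζ = -5 and therefore diverges:

```python
import numpy as np

def order_of_consistency(alpha, beta, qmax=8, tol=1e-10):
    """Largest p such that the order conditions (6) hold for q = 0, ..., p."""
    j = np.arange(len(alpha), dtype=float)
    for q in range(qmax):
        res = alpha.sum() if q == 0 else alpha @ j**q - q * (beta @ j**(q - 1))
        if abs(res) > tol:
            return q - 1
    return qmax - 1

def zero_stable(alpha, tol=1e-7):
    """Root condition: all roots of rho in the closed unit disk,
    and roots on the unit circle simple."""
    r = np.roots(alpha[::-1])          # np.roots expects highest degree first
    if np.any(np.abs(r) > 1 + tol):
        return False
    for z in r[np.abs(np.abs(r) - 1) < tol]:
        if np.sum(np.abs(r - z) < tol) > 1:
            return False
    return True

# 2-step Adams-Bashforth: rho = z^2 - z, sigma = (3z - 1)/2
ab2_a, ab2_b = np.array([0.0, -1.0, 1.0]), np.array([-0.5, 1.5, 0.0])
print(order_of_consistency(ab2_a, ab2_b), zero_stable(ab2_a))   # -> 2 True

# counterexample: rho = z^2 + 4z - 5 has roots 1 and -5
bad_a, bad_b = np.array([-5.0, 4.0, 1.0]), np.array([2.0, 4.0, 0.0])
print(order_of_consistency(bad_a, bad_b), zero_stable(bad_a))   # -> 3 False

# despite its higher order of consistency, the unstable method diverges:
def integrate(alpha, beta, h, n):
    y = [1.0, np.exp(-h)]              # exact starting values for y' = -y
    for _ in range(n - 1):
        f0, f1 = -y[-2], -y[-1]        # f(t, y) = -y at the two back values
        y.append((h * (beta[0] * f0 + beta[1] * f1)
                  - alpha[0] * y[-2] - alpha[1] * y[-1]) / alpha[2])
    return y[-1]

print(abs(integrate(ab2_a, ab2_b, 0.01, 100) - np.exp(-1)))   # small error
print(abs(integrate(bad_a, bad_b, 0.01, 100) - np.exp(-1)))   # blows up
```

The parasitic root -5 amplifies the per-step truncation error by a factor of roughly 5 each step, so the higher order of consistency is worthless: this is the distinction between consistency and convergence that zero stability encodes.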
