Advances in Water Resources 28 (2005) 215–233 www.elsevier.com/locate/advwatres

A parallel multi-subdomain strategy for solving Boussinesq water wave equations

X. Cai^{a,b,*}, G.K. Pedersen^{c}, H.P. Langtangen^{a,b}

^a Simula Research Laboratory, P.O. Box 134, N-1325 Lysaker, Norway
^b Department of Informatics, University of Oslo, P.O. Box 1080, Blindern, N-0316 Oslo, Norway
^c Department of Mathematics, Mechanics Division, University of Oslo, P.O. Box 1053, Blindern, N-0316 Oslo, Norway

* Corresponding author. Tel.: +47 678 282 84; fax: +47 678 282 01. E-mail addresses: [email protected] (X. Cai), [email protected] (G.K. Pedersen), [email protected] (H.P. Langtangen).

Received 25 June 2004; received in revised form 1 November 2004; accepted 5 November 2004. Available online 20 January 2005.

Abstract

This paper describes a general parallel multi-subdomain strategy for solving the weakly dispersive and nonlinear Boussinesq water wave equations. The parallelization strategy is derived from the additive Schwarz method based on overlapping subdomains. Besides allowing the subdomains to independently solve their local problems, the strategy is also flexible in the sense that different discretization schemes, or even different mathematical models, are allowed in different subdomains. The parallelization strategy is particularly attractive from an implementational point of view, because it promotes the reuse of existing serial software and opens up the possibility of using different software in different subdomains. We study the strategy's performance with respect to accuracy, convergence properties of the Schwarz iterations, and scalability through numerical experiments concerning waves in a basin, solitary waves, and waves generated by a moving vessel. We find that the proposed technique is promising for large-scale parallel wave simulations. In particular, we demonstrate that satisfactory accuracy and convergence speed of the Schwarz iterations are obtainable independent of the number of subdomains, provided there is sufficient overlap. Moreover, existing serial wave solvers are readily reusable when implementing the parallelization strategy.
© 2004 Elsevier Ltd. All rights reserved.

Keywords: Parallel computing; Wave equations; Domain decomposition; Additive Schwarz iterations

1. Introduction and motivation

Shallow water models have been dominant in many branches of ocean modeling, tsunami computations being one important example. Such models are efficient, robust, allow explicit time stepping, and may treat breaking waves provided that an appropriate shock capturing scheme is employed. Moreover, due to the use of explicit numerical schemes, the shallow water models are simple to parallelize. However, important physics, such as (frequency) dispersion, is absent in the shallow water formulation. Dispersion is crucial in a series of long
wave applications where the wave length to depth ratio is only moderate (15 or less) or huge propagation distances are involved as for ‘‘trans-Pacific’’ tsunamis [39]. In particular, important wave forms like solitary waves, ondular bores and dispersive wave trains, often with dominant fronts, are not at all reproduced by shallow water theory. While solution of the primitive Navier–Stokes equations, or even full potential theory, is still far too heavy for an ocean domain of appreciable size, the Boussinesq type equations provide an attractive alternative. These equations include both nonlinearity and leading dispersion, while they are still depth averaged and favorable with respect to numerical solution. In addition to ocean modeling and tsunamis, Boussinesq models are also useful for internal waves, plasma physics, and applications in coastal engineering such as


Nomenclature

$\Omega$   the spatial solution domain
$\eta$   the water surface elevation
$\phi$   the depth averaged velocity potential
$H$   the water depth
$\epsilon$   the magnitude of dispersion
$\alpha$   the magnitude of nonlinearity
$p$   a known external pressure field
$T$   the stopping time for a simulation
$\hat\eta$   the approximate solution of $\eta$
$\hat\phi$   the approximate solution of $\phi$
$I_{\rm DD}^\eta$   the average number of additive Schwarz iterations per time step needed for solving the discretized continuity equation (4)
$I_{\rm DD}^\phi$   the average number of additive Schwarz iterations per time step needed for solving the discretized Bernoulli equation (5)
$E_{L_2}^\eta$, $E_{L_2}^\phi$   the error of a numerical solution at the end of a simulation, measured in the $L_2$-norm; for example, $E_{L_2}^\eta = \left(\int_\Omega (\eta(x,y,T) - \hat\eta(x,y,T))^2\,dx\,dy\right)^{1/2}$, and $E_{L_2}^\phi$ is defined similarly
$E_{L_2}^{\eta,\rm ref}$, $E_{L_2}^{\phi,\rm ref}$   the $L_2$-norm of the deviation between a numerical solution and a reference solution, which is obtained by running the serial one-domain solution strategy on a very fine grid
$P$   the number of processors, the same as the number of subdomains
$WT$   the total wall-time consumption of a parallel simulation

swells in shallow water and harbor seiches. One theme that has been popular during the last decades is the generation of nonlinear surface waves by a moving source at trans-critical speed. Recently, the practical implications of this topic have become more important due to the increased use of high speed ferries in coastal waters. Such vessels traveling in shallow water have occasionally produced waves ("Solitary killers") with fatal results [24,6]. A numerical case study in the present paper is inspired by this type of application. Boussinesq type equations have been solved numerically by finite differences [40,1,41,36,34] or finite elements [27,46,33,44]. All the models in these references involve large sets of implicit equations at each time step. In comparison with the shallow water equations, this makes the computations and memory requirements substantially heavier. At the same time, parallel computing becomes both more difficult and much more crucial. This leads to the theme of the present paper: parallel solution of Boussinesq equations by domain decomposition. Parallel computing is an indispensable technique for running large-scale simulations, which may arise from a combination of many millions of degrees of freedom and many thousands of time steps. Single-processor computers are limited in processing speed and in the size of local memory and storage, and are thus unfit for large-scale simulations. Parallelization of a serial wave simulator can be done in different fashions. In case the entire solution domain uses the same mathematical model and discretization scheme, a possible parallelization approach is to divide the loop- or array-level operations among multiple processors. For example, when traversing a long array, a master process can spawn a

set of worker processes, each being responsible for a portion of the long array. Such a parallelization approach maintains a global image of the global simulation and is particularly attractive for shared-memory parallel computing systems. The overall numerical strategy following this parallelization approach remains the same as that in the serial simulator. A more flexible parallelization approach is to explicitly partition the global solution domain into several subdomains, each being the responsibility of an assigned processor. The resulting overall numerical strategy may be different from that of the serial simulator, and thus more favorable for parallel computing. Each subdomain is given more independence, i.e., either different discretization schemes, or different local solution methods, or even different mathematical models may be allowed in different subdomains. For tsunami simulations, for instance, linear finite difference methods may be used in deep sea domains, while shallower coastal regions may be treated by nonlinear finite element techniques that enable boundary fitted discretization and local refinement. Therefore, a resource-effective simulation strategy should adopt a hybrid approach that divides the entire solution domain into multiple computational regions. Advanced numerical schemes together with unstructured grids are used for regions that require high accuracy, such as the coastal regions where the water is shallow and the coast line is of a complicated shape. On the other hand, simple rectangular grids and finite difference schemes can be applied to regions that correspond to the vast open sea. The essence of such a hybrid simulation strategy is that each computational region computes its local solution mostly by itself. A certain amount of overlap is needed between the neighboring


computational regions. Within each overlapping zone, the different locally computed solutions are exchanged and then averaged between the neighbors to ensure a smooth transition of the overall solution. This procedure of local computation plus solution exchange normally needs to undergo a few iterations during each time step, so that convergence is obtained for a global solution composed of the regional solutions. We will consider in this paper a particular variant of parallelization based on overlapping subdomains. In respect of programming, we adopt the general parallel programming model of message passing [23]. That is, the exchange of information between processors is in the form of explicitly sending and receiving messages. The mathematical foundation for our parallelization approach is additive Schwarz iterations, see [12,42]. We remark that this parallelization approach is fully compatible with the hybrid simulation strategy mentioned above, where a computational region may be further decomposed into smaller subdomains. During each time step, the processors first independently carry out local discretizations restricted to the subdomains. Then, to find a global solution, an iterative procedure proceeds until convergence, where in each iteration every subdomain solves a local problem and exchanges solutions in the overlapping zones with the neighbors. The advantages of the Schwarz iteration-based parallelization approach include a simple algorithmic structure, rapid convergence, good re-usability of existing serial code, and no need for special interface solvers between neighboring subdomains. (We remark that nonoverlapping domain decomposition algorithms involve special interface solvers, see [12,42].) The disadvantages of our overlapping subdomain-based strategy are twofold. First, an additional layer of iterations has to be introduced, i.e., between the time-stepping iterations and the subdomain iterations. This extra cost may sometimes be compensated by a combination of fast Schwarz convergence and cheap subdomain solvers, especially when the number of degrees of freedom becomes large. Second, there will arise a certain amount of extra local computational work due to overlap, slightly affecting the speed up of parallel simulations. However, we will show by numerical experiments that this is not a severe problem for really large-scale simulations. The work of the present paper is mainly motivated by [22], where it has been shown that rapid convergence of the additive Schwarz iterations is obtainable for the onedimensional Boussinesq equations. This nice behavior of wave-related problems is due to the fact that the spread of information is limited by the wave speed, thus giving rise to faster convergence than that of standard elliptic boundary value problems with pure Laplace operators, in particular when sufficient overlap is employed. More specifically, theoretical analyses of the one-dimensional Boussinesq equations in [22] show that an amount of


overlap between subdomains of the order of the water depth can ensure rapid convergence within a few additive Schwarz iterations. Hence, a correspondingly good efficiency is to be expected when the technique is employed in two dimensions. In this paper, we investigate the convergence in two-dimensional cases for the method proposed in [22]. Many practical issues that are not addressed in [22] are considered. More specifically, we study different mechanisms for determining how to terminate the Schwarz iteration, including both local-type and global-type convergence criteria. It will also be shown that the theoretical estimate of the overlap amount from [22] may sometimes be too strict. That is, the additive Schwarz iterations may be robust enough in many cases to allow a small amount of overlap between subdomains. The numerical accuracy is studied with respect to the number of subdomains. Moreover, we analyze the performance scalability with respect to both the number of degrees of freedom and the number of processors. It should be stated that domain decomposition methods have been extensively used to solve PDEs, see e.g. [4,30,29]. Applications to wave-related problems have also been studied by numerous authors, see e.g. [17,3,2,16,21,5,15,19,25], to mention just a few. Many of the cited papers deal with the Helmholtz equation, which has a significantly different numerical nature in comparison with the time-discrete equations of the present paper. To the authors' knowledge, there are no papers (except for [22]) that address domain decomposition and parallel computing for PDEs of the same nature as the Boussinesq equations, i.e., equations with limited signal speed and implicit dispersion terms. The remainder of the paper is organized as follows. First, Section 2 presents the mathematical model of the Boussinesq equations and standard single-domain discretization techniques. Then, we devote Section 3 to the explanation of our multi-subdomain strategy, addressing both the mathematical background and numerical details. Later on, Section 4 investigates the behavior of the parallel solution strategy by several numerical experiments, and Section 5 concludes the paper with some remarks and discussions.

2. Mathematical model and discretization

2.1. The Boussinesq equations

The Boussinesq equations in a scaled form are considered in this paper. More specifically, we introduce a typical depth, $H_0$, and a typical wavelength, $L$, as the vertical and horizontal length scales, respectively. Selecting $L/\sqrt{gH_0}$ as the time scale and extracting an amplitude factor $\alpha$ from the field variables, we then obtain the Boussinesq equations in the following form [45]:


$$\frac{\partial \eta}{\partial t} + \nabla\cdot \mathbf{q} = 0, \qquad (1)$$

$$\frac{\partial \phi}{\partial t} + \frac{\alpha}{2}\,\nabla\phi\cdot\nabla\phi + \eta + p - \frac{\epsilon}{2}\,H\,\nabla\cdot\!\left(H\,\nabla\frac{\partial \phi}{\partial t}\right) + \frac{\epsilon}{6}\,H^2\,\nabla^2\frac{\partial \phi}{\partial t} = 0, \qquad (2)$$

where $\epsilon = (H_0/L)^2$ must be small for the equations to apply. In the above system of partial differential equations (PDEs), we recognize Eq. (1) as the continuity equation, whereas Eq. (2) is a variant of the Bernoulli (momentum) equation. The primary unknowns are the surface elevation $\eta(x,y,t)$ and the depth averaged velocity potential $\phi(x,y,t)$. In Eq. (2), $H(x,y)$ denotes the water depth and $p(x,y,t)$ is a known external pressure applied to the surface. The latter is assumed to be of the same order as $\eta$ or less. In Eq. (1), the flux function $\mathbf{q}$ is given by

$$\mathbf{q} = (H + \alpha\eta)\,\nabla\phi + \epsilon H\left(\frac{1}{6}\,\frac{\partial\eta}{\partial t} - \frac{1}{3}\,\nabla H\cdot\nabla\phi\right)\nabla H. \qquad (3)$$

Eqs. (1) and (2) are assumed to be valid in a two-dimensional domain $\Omega$, where suitable boundary conditions apply on $\partial\Omega$. In the applications of the present paper we employ no-flux conditions, implying that $\mathbf{q}\cdot\mathbf{n} = 0$ and $\partial\phi/\partial n = 0$. In addition, the Boussinesq equations are supplemented with initial conditions in the form of prescribed $\eta(x,y,0)$ and $\phi(x,y,0)$. The complete mathematical problem is thus to find $\eta(x,y,t)$ and $\phi(x,y,t)$, $0 < t \le T$, as solutions to Eqs. (1) and (2), subject to the boundary and initial conditions. The Boussinesq equations as given above are limited to potential flow, which excludes wave-breaking, bottom friction, the Coriolis effect, and other sources of vorticity, which may be incorporated if velocities are used as variables instead of the potential. Still, for many aspects of tsunami propagation and coastal engineering the assumption of potential flow is appropriate. Moreover, regarding the application of overlapping domain decomposition methods, the experiences obtained for the present model will also be valid when primitive variables are employed, due to the likeness in the structure of the nonlinear and dispersion terms; see also [22].

2.2. Temporal and spatial discretization

The temporal and spatial discretization proposed in [33] is employed in the present work. The time domain $0 < t \le T$ is divided into discrete time levels $0, \Delta t, 2\Delta t, \ldots, T$. During each time step $t_{\ell-1} < t \le t_\ell$, where $t_{\ell-1} = (\ell-1)\Delta t$ and $t_\ell = \ell\Delta t$, the Boussinesq equations (1) and (2) are discretized by centered differences in time, and a Galerkin finite element method or centered finite differences are used in

the spatial discretization. A staggered grid in time [36] is used, where $\eta$ is sought at integer time levels ($\ell$) and $\phi$ is sought at half-integer time levels ($\ell+\frac12$). Using a superscript, as in $\eta^\ell$ and $\phi^{\ell+\frac12}$, we can formulate the time-discrete problem as follows:

$$\frac{\eta^\ell - \eta^{\ell-1}}{\Delta t} + \nabla\cdot\left[\left(H + \alpha\,\frac{\eta^{\ell-1}+\eta^\ell}{2}\right)\nabla\phi^{\ell-\frac12} + \epsilon H\left(\frac{1}{6}\,\frac{\eta^\ell - \eta^{\ell-1}}{\Delta t} - \frac{1}{3}\,\nabla H\cdot\nabla\phi^{\ell-\frac12}\right)\nabla H\right] = 0, \qquad (4)$$

$$\frac{\phi^{\ell+\frac12} - \phi^{\ell-\frac12}}{\Delta t} + \frac{\alpha}{2}\,\nabla\phi^{\ell-\frac12}\cdot\nabla\phi^{\ell+\frac12} + \eta^\ell + p^\ell - \frac{\epsilon}{2}\,H\,\nabla\cdot\!\left(H\,\frac{\nabla\phi^{\ell+\frac12} - \nabla\phi^{\ell-\frac12}}{\Delta t}\right) + \frac{\epsilon}{6}\,H^2\,\frac{\nabla^2\phi^{\ell+\frac12} - \nabla^2\phi^{\ell-\frac12}}{\Delta t} = 0. \qquad (5)$$

In addition to centered differences in time we have used an arithmetic mean for the $\eta$ term in Eq. (3) and a geometric mean for the $\nabla\phi\cdot\nabla\phi$ term in Eq. (2). The latter yields a linear set of equations to be solved for new $\phi$ values at each time step. The discretization techniques have been applied to long term numerical simulations in, e.g., [36,38], and have also been found to be stable for the solitary wave solution [37]. Another nice feature of these approximations is that they imply an operator splitting in the sense that the originally coupled system (1) and (2) can be solved in sequence. That is, Eq. (4) is solved first to find $\eta^\ell$, using $\eta^{\ell-1}$ and $\phi^{\ell-\frac12}$ from the computations at the previous time level. Then, Eq. (5) is solved to find $\phi^{\ell+\frac12}$, using the recently computed $\eta^\ell$ and the previous $\phi^{\ell-\frac12}$ as known quantities. There are several options for the spatial discretization. If the finite difference method is preferred, a rectangular spatial grid $(i\Delta x, j\Delta y)$ is normally needed for discretizing $\Omega$. Eqs. (4) and (5) are spatially discretized on the interior grid points, while the boundary conditions must be incorporated on the grid points lying along $\partial\Omega$. The approximate finite difference solutions, named $\hat\eta^\ell(x,y)$ and $\hat\phi^{\ell+\frac12}(x,y)$, are sought on all the spatial grid points, in the form of discrete values $\hat\eta^\ell_{i,j}$ and $\hat\phi^{\ell+\frac12}_{i,j}$.

If the finite element method is chosen, more freedom is allowed in constructing the spatial discretization, since we may employ unstructured grids and the elements may have different sizes, shapes, and approximation properties. The finite element solutions, also denoted by $\hat\eta^\ell(x,y)$ and $\hat\phi^{\ell+\frac12}(x,y)$, have values over the entire spatial domain $\Omega$. These solutions are as usual taken as linear combinations of a set of basis functions $N_i$ defined on the finite element grid:

$$\hat\eta^\ell(x,y) = \sum_i \eta^\ell_i\,N_i(x,y), \qquad \hat\phi^{\ell+\frac12}(x,y) = \sum_i \phi^{\ell+\frac12}_i\,N_i(x,y).$$


These expressions are inserted in the time-discrete equations (4) and (5), and the resulting residual is integrated against weighting functions $N_i$ according to the Galerkin method. Second order terms are integrated by parts, and the no-flux boundary conditions imply that the boundary integrals vanish. The objective of the solution is to find the coefficient vectors $\{\eta^\ell_i\}$ and $\{\phi^{\ell+\frac12}_i\}$. The accuracy of the numerical method is typically second order in time and space if bilinear/linear elements or centered spatial differences are used. Adjustments of the scheme for achieving fourth order accuracy are described in [33], where accuracy and stability are analyzed in detail. The stability criterion can be written as

$$\Delta t \le \sqrt{\frac{h^2}{\max_\Omega H} + \frac{4\epsilon}{3}\,\min_\Omega H}, \qquad (6)$$

maybe with some smaller adjustments depending on the details of the discretization, see [33]. Note that the implicit treatment of dispersion gives a more favorable stability criterion (because $\epsilon > 0$ in Eq. (6)), in comparison with the corresponding non-dispersive case. No matter whether finite differences or finite elements are chosen, the final result of the spatial discretization of Eqs. (4) and (5) is the following two systems of linear equations:

$$A_\eta\!\left(\phi^{\ell-\frac12}\right)\eta^\ell = b_\eta\!\left(\eta^{\ell-1},\, \phi^{\ell-\frac12}\right), \qquad (7)$$

$$A_\phi\!\left(\phi^{\ell-\frac12}\right)\phi^{\ell+\frac12} = b_\phi\!\left(\eta^\ell,\, \phi^{\ell-\frac12}\right), \qquad (8)$$

which need to be solved at each time step. We remark that the vectors $\eta^\ell$ and $\phi^{\ell+\frac12}$ contain either the discrete values $\hat\eta^\ell_{i,j}$ and $\hat\phi^{\ell+\frac12}_{i,j}$ in the finite difference method, or the coefficients $\eta^\ell_i$ and $\phi^{\ell+\frac12}_i$ in the finite element method. The system matrices $A_\eta$ and $A_\phi$ depend on the latest solution of $\phi$, whereas the right-hand side vectors $b_\eta$ and $b_\phi$ depend on the latest solutions of $\eta$ and $\phi$. We may summarize the computations as follows in the case of a standard single-domain numerical strategy for single-processor computers:

A single-domain numerical strategy
For time step $t_{\ell-1} < t \le t_\ell$:
1. Use $\hat\eta^{\ell-1}$ and $\hat\phi^{\ell-\frac12}$ as known solutions and discretize Eq. (1).
2. Solve the resulting linear system (7) for finding $\hat\eta^\ell$.
3. Use $\hat\eta^\ell$ and $\hat\phi^{\ell-\frac12}$ as known solutions and discretize Eq. (2).
4. Solve the resulting linear system (8) for finding $\hat\phi^{\ell+\frac12}$.

We remark that the above numerical strategy assumes that the same mathematical model and spatial discretization scheme apply to the entire spatial domain $\Omega$.
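To make the structure of this strategy concrete, the following sketch (not taken from the paper's software) advances Eqs. (4) and (5) on a uniform finite difference grid in the strongly simplified setting $\alpha = \epsilon = 0$, constant depth $H$ and zero surface pressure, where both updates become explicit. All grid parameters and the initial state are illustrative assumptions.

```python
import numpy as np

# Illustrative single-domain time stepping of Eqs. (4)-(5) in the simplified
# case alpha = epsilon = 0, constant depth H and zero surface pressure, using
# centered finite differences. All names and parameters here are hypothetical.
nx, ny = 101, 101
dx = dy = 0.1
dt = 0.05
H = 0.04
T = 2.0

x = np.arange(nx) * dx
eta = 0.008 * np.cos(3 * np.pi * x)[:, None] * np.ones((1, ny))  # eta at level 0
phi = np.zeros((nx, ny))                                         # phi at level 1/2

def laplacian(f):
    """Five-point Laplacian with homogeneous no-flux (Neumann) boundaries."""
    g = np.pad(f, 1, mode="edge")
    return ((g[2:, 1:-1] - 2.0 * f + g[:-2, 1:-1]) / dx**2 +
            (g[1:-1, 2:] - 2.0 * f + g[1:-1, :-2]) / dy**2)

for step in range(int(T / dt)):
    # Eq. (4) with alpha = epsilon = 0:  eta^l = eta^{l-1} - dt*div(H grad phi^{l-1/2})
    eta = eta - dt * H * laplacian(phi)
    # Eq. (5) with alpha = epsilon = 0, p = 0:  phi^{l+1/2} = phi^{l-1/2} - dt*eta^l
    phi = phi - dt * eta
```

With dispersion ($\epsilon > 0$), nonlinearity, or a consistent finite element mass matrix, steps 2 and 4 instead require the linear solves (7) and (8).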


3. A parallel multi-subdomain strategy

As mentioned earlier in Section 1, we will adopt a particular parallelization technique based on subdomains. More specifically, we explicitly divide the global solution domain $\Omega$ into a set of overlapping subdomains $\{\Omega_s\}$. Each subdomain becomes an independent working unit, which mostly concentrates on local discretizations within $\Omega_s$ and solving local linear systems. In addition, the subdomains frequently collaborate with each other, in a form that neighboring subdomains exchange local solutions within overlapping zones. A loose synchronization of the work progress on the subdomains has also to be enforced.

3.1. Additive Schwarz iterations

The mathematical foundation of our parallelization approach is the additive Schwarz method, see [12,42]. The numerical strategy behind this domain decomposition method can be understood as follows. Suppose we want to solve a global linear system

$$A x = b, \qquad (9)$$

which arises from discretizing a PDE on a global domain $\Omega$. Given a set of subdomains $\{\Omega_s\}$ such that $\Omega = \cup_s \Omega_s$ and there is a certain amount of overlap between neighboring subdomains, we locally discretize the PDE in every subdomain $\Omega_s$. The result of the local discretization is

$$A_s x_s = b_s\!\left(x|_{\partial\Omega_s\setminus\partial\Omega}\right). \qquad (10)$$

Mathematically, the above linear system arises from restricting the discretization of the PDE within $\Omega_s$. The only special treatment happens on the so-called internal boundary $\partial\Omega_s\setminus\partial\Omega$, i.e., the part of $\partial\Omega_s$ that does not coincide with the physical boundary $\partial\Omega$ of the global domain. We remark that a requirement of the overlapping zones says that any point lying on the internal boundary of a subdomain must also be an interior point in at least one of the other subdomains. On the internal boundary, artificial Dirichlet conditions are repeatedly updated with new values given by the solution (in the previous iteration) in the neighboring subdomains. The involvement of the artificial Dirichlet conditions is indicated by the notation $b_s(x|_{\partial\Omega_s\setminus\partial\Omega})$ in Eq. (10). On the remaining part of $\partial\Omega_s$, the original boundary conditions of the global PDE are valid as before. Actually, a subdomain matrix $A_s$ can also arise from first building the global matrix $A$ and then cutting out the portion of $A$ that corresponds to the grid points lying in $\Omega_s$. However, this approach requires unnecessary construction and storage of global matrices, which is not a desired situation during parallel computations. We just make the point that the global matrix $A$ can be logically represented by the collection of subdomain matrices $A_s$.
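As a small illustration of this last point, the snippet below forms a subdomain matrix $A_s$ by cutting out the rows and columns of a toy global sparse matrix that correspond to a hypothetical index set of grid points in $\Omega_s$. As emphasized above, a real parallel code would not assemble the global matrix at all; each subdomain discretizes its own patch directly.

```python
import numpy as np
import scipy.sparse as sp

# Toy global matrix A (1D Laplacian-like) standing in for the global system (9).
n = 100
A = sp.diags([-np.ones(n - 1), 2.0 * np.ones(n), -np.ones(n - 1)],
             offsets=[-1, 0, 1], format="csr")

# Hypothetical index set of the grid points lying in subdomain s (with overlap).
idx_s = np.arange(40, 65)

# "Cutting out" the subdomain matrix A_s; this is only a logical picture.
A_s = A[idx_s, :][:, idx_s]
print(A_s.shape)   # (25, 25)
```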


For the artificial Dirichlet conditions to converge toward the correct values on the internal boundary, iterations need to be carried out. That is, we generate on each subdomain a series of approximate solutions $x_s^0, x_s^1, x_s^2, \ldots$, which will hopefully converge toward the correct solution $x_s = x|_{\Omega_s}$. One such additive Schwarz iteration is defined as

$$x_s^k = \tilde A_s^{-1}\, b_s\!\left(x^{k-1}|_{\partial\Omega_s\setminus\partial\Omega}\right), \qquad x^k = \text{composition of all } x_s^k. \qquad (11)$$

We note that the subdomain local solves can be carried out independently in each additive Schwarz iteration. This immediately gives rise to the possibility of parallel computing. The symbol $\tilde A_s^{-1}$ in Eq. (11) indicates that a local solve can be approximate, not necessarily an exact inverse of $A_s$. The right-hand side vector $b_s$ needs to be updated with artificial Dirichlet conditions on the points that lie on the internal boundary, using the solution of the previous Schwarz iteration provided by the neighboring subdomains. At the end of the kth additive Schwarz iteration, the (logically existing) global approximate solution $x^k$ is composed on the basis of the subdomain approximate solutions $\{x_s^k\}$. The principle of partition of unity, which roughly means that composing subdomain solutions of constant one should result in a global solution of constant one (see e.g. [12]), is used in the following rule for composing a global solution:

• An overlapping point refers to a point that lies inside a zone of overlap, i.e., the point belongs to at least two subdomains.
• For every non-overlapping point, i.e., a point that belongs to only one subdomain, the global solution attains the same value as that inside the host subdomain.
• For every overlapping point, let us denote by $n_{\rm total}$ the total number of host subdomains that own this point. Let also $n_{\rm interior}$ denote the number of subdomains, among those $n_{\rm total}$ host subdomains, which do not have the point lying on their internal boundaries. (The setup of the overlapping subdomains ensures $n_{\rm interior} \ge 1$.) Then, the average of the $n_{\rm interior}$ local values becomes the global solution on the point. The other $n_{\rm total} - n_{\rm interior}$ local values are not used, because the point lies on the internal boundary there. Finally, the obtained global solution is enforced in each of the $n_{\rm total}$ host subdomains. For the $n_{\rm total} - n_{\rm interior}$ host subdomains, which have the point lying on their internal boundary, the obtained global solution will be used as the artificial Dirichlet condition during the next Schwarz iteration.

To compose the global solution and update the artificial Dirichlet conditions, as described by the above

rule, we need to carry out a procedure of communication among the neighboring subdomains at the end of each Schwarz iteration. During this procedure of communication, each pair of neighboring subdomains exchanges between each other an array of values that are associated with their shared overlapping points. It is clear that if each subdomain solution $x_s^k$ converges toward the correct solution $x|_{\Omega_s}$, the difference between the subdomain solutions in an overlapping zone will eventually disappear. A well-known technique in domain decomposition methods for obtaining convergence independent of the number of subdomains is the so-called coarse grid correction, see e.g. [12]. The rough idea behind one coarse grid correction is that a global "residual" is mapped to another very coarse global grid and resolved accurately there. Then, the computed "correction" is interpolated back to the global fine grid level and thus improves the accuracy of the global solution. Coarse grid corrections have been proved to be essential for domain decomposition methods to solve purely elliptic PDEs, where the "spread of information" is infinitely fast. However, we will deliberately avoid using coarse grid corrections in our paper, because the spread of information in the Boussinesq equations is limited by the wave speed. Numerical experiments in Section 4 will indicate that convergence independent of the number of subdomains is obtainable in many cases. In fact, one-dimensional experiments from [22] displayed a significant effect of coarse grid corrections only when the global coarse grid has a sufficiently high resolution.

3.2. Solving the Boussinesq equations in parallel

3.2.1. Domain partitioning

To solve the Boussinesq equations (1) and (2) in parallel, we need to first partition the global domain $\Omega$ into overlapping subdomains $\Omega_s$, $s = 1, 2, \ldots, P$. The overlapping subdomains may arise as follows. First, the global domain $\Omega$ is partitioned into $P$ non-overlapping subdomains $\hat\Omega_s$, $s = 1, 2, \ldots, P$. For example, if $\Omega$ is of a rectangular shape, it can be partitioned regularly into a mesh of smaller rectangles. A more general partitioning scheme, which can be applied to both rectangular- and irregular-shaped $\Omega$, should allow curves as the borders between neighboring subdomains. The resulting partitioning is consequently of an unstructured nature, see Fig. 1 for an example. When a non-overlapping partitioning of $\Omega$ is done, each subdomain $\hat\Omega_s$ is enlarged with a certain amount, so that overlapping zones arise between neighboring subdomains. The final results are the overlapping subdomains $\{\Omega_s\}$. A rule of thumb for domain partitioning is that the subdomains should have approximately the same number of degrees of freedom. In addition, the length of the internal boundaries should preferably be


Fig. 1. An example of an unstructured partitioning of $\Omega$ into 16 subdomains (a), and the approximate solution $\hat\eta$ obtained from a corresponding parallel simulation at t = 2 (b).

small, for limiting the communication overhead in the parallel simulations.

3.2.2. Distributed discretization

When the overlapping subdomains are ready, parallelization can be done by modifying the single-domain numerical strategy, on the basis of the overlapping subdomains. That is, each subdomain solver (on $\Omega_s$) is only responsible for finding the approximate solutions within $\Omega_s$, denoted by $\hat\eta^\ell_s$ and $\hat\phi^{\ell+\frac12}_s$. Regarding the discretization, the main difference between a serial strategy and a parallel one is the area of discretization. In a parallel strategy, the discretization can be carried out independently on the subdomains. Optionally, different discretization schemes or even different mathematical models can be adopted by the subdomains. The result of a parallel discretization is that a global matrix $A_\eta$ is distributed as a set of subdomain matrices $\{A_{\eta,s}\}$, and similarly $A_\phi$ is distributed as $\{A_{\phi,s}\}$.

3.2.3. Solving the discretized equations in parallel

To solve the two distributed systems of linear equations during each time step, i.e., the distributed form of Eqs. (7) and (8) among the subdomains, we use the additive Schwarz iterations described in Section 3.1. More specifically, during time step $t_{\ell-1} < t \le t_\ell$, subdomain $\Omega_s$ participates in the solution of Eq. (7) by the following iterations: for $k = 1, 2, \ldots$,

$$A_{\eta,s}\!\left(\phi_s^{\ell-\frac12}\right)\eta_s^{\ell,k} = b_{\eta,s}\!\left(\eta_s^{\ell-1},\, \phi_s^{\ell-\frac12},\, \eta^{\ell,k-1}|_{\partial\Omega_s\setminus\partial\Omega}\right), \qquad (12)$$

until global convergence. Here, $\eta_s^{\ell-1}$ and $\phi_s^{\ell-\frac12}$ denote the converged solutions of the additive Schwarz iterations from the previous time step. The initial guess $\eta_s^{\ell,0}$ of the Schwarz iterations for the current time level is the same as $\eta_s^{\ell-1}$. We remark that updating the artificial Dirichlet conditions per Schwarz iteration is indicated by $\eta^{\ell,k-1}|_{\partial\Omega_s\setminus\partial\Omega}$ in Eq. (12). The intermediate global solution $\eta^{\ell,k-1}$ needs only to exist logically, and is composed of the intermediate subdomain solutions $\{\eta_s^{\ell,k-1}\}$, as described in Section 3.1. We also remark that the subdomain systems (12) can be solved totally independently of each other. An approximate subdomain solver is often sufficient for $A_{\eta,s}$. Moreover, a loose synchronization between the subdomains has to be enforced, in the sense that no subdomain can start with its local work for the next Schwarz iteration, before all the other subdomains have finished their local solves for the current Schwarz iteration and have communicated with each other. Similarly, to solve Eq. (8) in parallel, we use the following additive Schwarz iterations on subdomain $\Omega_s$: for $k = 1, 2, \ldots$,

$$A_{\phi,s}\!\left(\phi_s^{\ell-\frac12}\right)\phi_s^{\ell+\frac12,k} = b_{\phi,s}\!\left(\eta_s^{\ell},\, \phi_s^{\ell-\frac12},\, \phi^{\ell+\frac12,k-1}|_{\partial\Omega_s\setminus\partial\Omega}\right), \qquad (13)$$

until global convergence. The initial guess $\phi_s^{\ell+\frac12,0}$ is the same as $\phi_s^{\ell-\frac12}$.

3.2.4. Checking the global convergence of Schwarz iterations

An important issue for the above Schwarz iterations is checking the convergence of the intermediate global solutions $\eta^{\ell,1}, \eta^{\ell,2}, \ldots, \eta^{\ell,k}$ and $\phi^{\ell+\frac12,1}, \phi^{\ell+\frac12,2}, \ldots, \phi^{\ell+\frac12,k}$. If the same mathematical model and discretization scheme are used in all the subdomains, a global-type convergence monitor can check an intermediate global residual vector. Let us consider, for instance, the case of the continuity equation. The subdomain linear systems (12) logically constitute the following global linear system:

$$A_\eta\!\left(\phi^{\ell-\frac12}\right)\eta^{\ell,k} \approx b_\eta\!\left(\eta^{\ell-1},\, \phi^{\ell-\frac12}\right), \qquad (14)$$

which can be considered as a "global view" of one Schwarz iteration. The associated global residual vector

$$r_\eta^{\ell,k} = b_\eta\!\left(\eta^{\ell-1},\, \phi^{\ell-\frac12}\right) - A_\eta\!\left(\phi^{\ell-\frac12}\right)\eta^{\ell,k}, \qquad (15)$$

which arises from "composing" a set of locally computed residual vectors $r_\eta^{\ell,k}|_{\Omega_s}$, can be used to check the global convergence. More precisely, when the global solution $\eta^{\ell,k}$ is ready, each subdomain computes its local contribution to the global residual vector as

$$r_\eta^{\ell,k}|_{\Omega_s} = b_{\eta,s} - A_{\eta,s}\,\eta^{\ell,k}|_{\Omega_s}, \qquad (16)$$


where $\eta^{\ell,k}|_{\Omega_s}$ denotes the restriction of the global solution on subdomain $s$. In Eq. (16), the residual values on the internal boundary points will be incorrect and thus should not participate in computing a global norm of $r_\eta^{\ell,k}$. Duplicated contributions to $\|r_\eta^{\ell,k}\|$ from residual values on the other overlapping points should also be scaled according to the principle of partition of unity. Thereby, a typical global-type monitor for checking the global convergence of the Schwarz iterations is

$$\frac{\|r_\eta^{\ell,k}\|}{\|r_\eta^{\ell,0}\|} < \epsilon_{\rm global}, \qquad (17)$$

where $\epsilon_{\rm global}$ is a prescribed threshold value. For small time steps, where the initial residual is expected to be small, Eq. (17) might be too strict, and checking for the absolute instead of the relative residual is preferable. We remark that the global matrix and vectors $A_\eta$, $b_\eta$, $\eta^{\ell,k}$, and $r_\eta^{\ell,k}$ exist only logically, because their actual values are computed and distributed on the subdomains. However, if different discretization schemes or different mathematical models are used on the subdomains, it makes no sense to compute global residuals. Therefore, the convergence of the intermediate global solutions $\eta^{\ell,1}, \eta^{\ell,2}, \ldots, \eta^{\ell,k}$ and $\phi^{\ell+\frac12,1}, \phi^{\ell+\frac12,2}, \ldots, \phi^{\ell+\frac12,k}$ should be checked locally, in a collaboration involving all the subdomains. A local-type monitor for the global convergence can thus be

$$\frac{\|\eta_s^{\ell,k} - \eta_s^{\ell,k-1}\|}{\|\eta_s^{\ell,k}\|} < \epsilon_{\rm global}^{\rm local} \quad \text{for all } s, \qquad (18)$$

where $\epsilon_{\rm global}^{\rm local}$ is another prescribed threshold value. We remark that the above local-type convergence monitor (18) can replace the global-type monitor (17), but not the other way around. In Section 4 we report experience with both convergence monitors.

3.2.5. The subdomain solvers

During every additive Schwarz iteration, each subdomain $\Omega_s$ needs to solve a local linear system (12) or (13) using updated artificial Dirichlet conditions on its internal boundary. Normally, an iterative solver can be used, because the additive Schwarz method allows approximate subdomain solvers. A typical subdomain solver may use a few (preconditioned) conjugate gradient (CG) iterations, see e.g. [7]. Let us consider Eq. (12) during the kth Schwarz iteration. An iterative subdomain solver generates a series of local solutions $\eta_s^{\ell,k,0}, \eta_s^{\ell,k,1}, \ldots, \eta_s^{\ell,k,m}$ on $\Omega_s$. To monitor the convergence with a subdomain solver, we may use the following subdomain residual vector:

$$r_{\eta,s}^{\ell,k,m} = b_{\eta,s}\!\left(\eta_s^{\ell-1},\, \phi_s^{\ell-\frac12},\, \eta^{\ell,k-1}|_{\partial\Omega_s\setminus\partial\Omega}\right) - A_{\eta,s}\!\left(\phi_s^{\ell-\frac12}\right)\eta_s^{\ell,k,m}. \qquad (19)$$

A typical monitor for the subdomain convergence is thus

$$\frac{\|r_{\eta,s}^{\ell,k,m}\|}{\|r_{\eta,s}^{\ell,k,0}\|} < \epsilon_{\rm subd}, \qquad (20)$$

where $\epsilon_{\rm subd}$ is a prescribed threshold value. We remark that different subdomains may choose different iterative solvers and different values of $\epsilon_{\rm subd}$. For example, on a subdomain where the solutions change very little from time step to time step, the threshold $\epsilon_{\rm subd}$ should use a relatively large value, or a convergence monitor that only checks the absolute value of $\|r_{\eta,s}^{\ell,k,m}\|$ should be used.

3.2.6. The overall parallel strategy

The whole parallel numerical strategy can be summarized as follows.

A multi-subdomain numerical strategy
Partition the global domain $\Omega$ into overlapping subdomains $\{\Omega_s\}$. During each time step $t_{\ell-1} < t \le t_\ell$, the following sub-steps are carried out on every subdomain $\Omega_s$:
1. Use $\hat\eta_s^{\ell-1}$ and $\hat\phi_s^{\ell-\frac12}$ as known solutions and discretize Eq. (1) inside $\Omega_s$.
2. Solve the distributed global linear system (7) to find $\hat\eta_s^\ell$, using the additive Schwarz iterations (12).
3. Use $\hat\eta_s^\ell$ and $\hat\phi_s^{\ell-\frac12}$ as known solutions and discretize Eq. (2) inside $\Omega_s$.
4. Solve the distributed global linear system (8) to find $\hat\phi_s^{\ell+\frac12}$, using the additive Schwarz iterations (13).

In the case nonlinearity and dispersion are neglected, i.e., $\alpha = \epsilon = 0$, the time-discrete scheme becomes explicit. One Schwarz iteration is then sufficient per time step for solving both Eqs. (1) and (2), because only quantities computed at the previous time level are involved in the artificial boundary conditions in the subdomains.

3.2.7. Software

The attractive feature of the proposed parallelization strategy, from an implementational point of view, is that a serial simulation code can, in principle, be reused in each subdomain. The authors have developed this idea to the extent that a serial simulation code can be reused without modifications. When the serial code is available as a C++ class, only a small subclass needs to be programmed for gluing the original solver with a generic library for communicating finite element fields via MPI. The small subclass then works as a parallel subdomain solver, which can be used by a general Schwarz iterator. This parallelization framework is available as part of the Diffpack programming environment [18,31] and documented in previous papers [8,32,11,9,10]. For the present work we have managed to insert the Boussinesq equation solver from [33] in the parallelization framework [9]. The approach of reusing a serial solver


not only saves implementation time, but also makes the parallel software more reliable: a well-tested serial code is combined with a well-tested generic communication library and a general Schwarz iterator. We believe that such a step-wise development of parallel simulation software is essential, because debugging parallel codes soon becomes a tedious and challenging process.
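The following compact, purely serial sketch illustrates the additive Schwarz iteration (11) on a toy one-dimensional "implicit time step" problem with two overlapping subdomains, including the update of artificial Dirichlet values and the averaging-based composition of the global iterate. The model problem, sizes and tolerance are illustrative assumptions; it is not the Diffpack/MPI implementation described above, where each subdomain solve runs on its own processor and the exchange is done by message passing.

```python
import numpy as np

# Serial sketch of additive Schwarz iterations (Eq. (11)) for a toy problem:
# (u - c u'') = f on (0, 1), u(0) = u(1) = 0, two overlapping subdomains.
n, c = 201, 1.0e-3
x = np.linspace(0.0, 1.0, n)
h = x[1] - x[0]
f = np.sin(np.pi * x)

def local_matrix(m):
    """Tridiagonal matrix of (I - c d^2/dx^2); first/last rows hold Dirichlet data."""
    A = np.zeros((m, m))
    i = np.arange(1, m - 1)
    A[i, i - 1] = A[i, i + 1] = -c / h**2
    A[i, i] = 1.0 + 2.0 * c / h**2
    A[0, 0] = A[m - 1, m - 1] = 1.0
    return A

overlap = 8                                   # half-width of the overlap zone (in points)
domains = [np.arange(0, n // 2 + overlap),    # subdomain 1
           np.arange(n // 2 - overlap, n)]    # subdomain 2

u = np.zeros(n)                               # global iterate x^k
for k in range(100):
    u_prev = u.copy()
    local_solutions = []
    for dom in domains:
        A_s = local_matrix(len(dom))
        b_s = f[dom].copy()
        b_s[0] = u_prev[dom[0]]               # artificial (or physical) Dirichlet value
        b_s[-1] = u_prev[dom[-1]]
        local_solutions.append(np.linalg.solve(A_s, b_s))

    # Compose the global solution: average the "interior" local values in the
    # overlap and ignore values sitting on a subdomain's internal boundary.
    acc = np.zeros(n)
    cnt = np.zeros(n)
    for dom, u_s in zip(domains, local_solutions):
        acc[dom[1:-1]] += u_s[1:-1]
        cnt[dom[1:-1]] += 1.0
    u = np.where(cnt > 0, acc / np.maximum(cnt, 1.0), u_prev)
    u[0] = u[-1] = 0.0                        # physical Dirichlet boundary

    # Convergence check in the spirit of the local-type monitor (18).
    if np.linalg.norm(u - u_prev) <= 1e-10 * max(np.linalg.norm(u), 1.0):
        print(f"converged after {k + 1} Schwarz iterations")
        break
```

Because the subdomain solves inside one sweep are independent of each other, they are exactly the operations that get distributed over the processors in the parallel version.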

4. Numerical experiments

The purpose of this section is to test the numerical properties of the parallel multi-subdomain strategy described in the preceding section. A series of numerical experiments is done, by which we investigate in particular:

(1) The relationship between the accuracy of the resulting numerical solutions and a chosen convergence monitor.
(2) The effect of different partitioning schemes, both regular and unstructured.
(3) The convergence speed of Schwarz iterations with respect to the amount of overlap between neighboring subdomains.
(4) The applicability of the global-type and local-type convergence monitors, i.e., Eqs. (17) and (18).
(5) The scalability of the parallel simulations, i.e., how the speed-up results depend on the number of processors and the grid resolution.

4.1. Eigen-oscillations in a closed basin

As the first set of test cases, we consider standing waves in the following solution domain: $\Omega = (0, 10) \times (0, 10)$, with a constant water depth H = 0.04. The external pressure term in Eq. (2) is prescribed as zero. Both the case of linear waves without dispersion ($\alpha = \epsilon = 0$, i.e., the hydrostatic model) and the case of nonlinear and dispersive waves ($\alpha = \epsilon = 1$) are studied. The boundary conditions are of no-flux type, $\partial\phi/\partial n = 0$ and $\mathbf{q}\cdot\mathbf{n} = 0$, on the entire boundary $\partial\Omega$. The spatial discretization is done by the finite element method based on a uniform grid with bilinear elements; the spatial grid resolution varies from $\Delta x = \Delta y = 0.025$ to $\Delta x = \Delta y = 0.1$. The time domain of simulation is $0 < t \le T = 2$, and the initial conditions are chosen such that the following solutions:

$$\eta(x,y,t) = 0.008\,\cos(3\pi x)\cos(4\pi y)\cos(\pi t) \qquad (21)$$

and

$$\phi(x,y,t) = -\frac{0.008}{\pi}\,\cos(3\pi x)\cos(4\pi y)\sin(\pi t) \qquad (22)$$


are exact for the linear case without dispersion (a =  = 0). (The above exact solutions are derived from a more general form exp(i(kxx + kyy  xt)) using desired initial and boundary conditions, see e.g. [43,35,31].) We note that there are 15 wave lengths in the x-direction, and 20 wave lengths in the y-direction. For the size of the time steps, we have used a fixed value of Dt = 0.05 independent of the spatial grid resolution. We also mention that consistent mass matrices (without lumping) have been used in our finite element discretization, so that the numerical scheme is implicit even for the linear non-dispersive case (a =  = 0). In the present simulation case, lumped mass is known to give better numerical approximations, but consistent mass matrices are deliberately chosen for studying the convergence behavior of additive Schwarz iterations. The topic on lumping mass matrices is, e.g., discussed in [33]. 4.1.1. The effect of the global convergence criterion We use a 100 · 100 global grid for the case of linear waves without dispersion. For domain partitioning we use an unstructured strategy (see Fig. 1 for an example). This unstructured partitioning strategy is based on using the Metis [26] software package that is originally designed for decomposing unstructured finite element meshes, in which equal-sized subdomains are sought while the total volume of communication overhead is minimized. Although we can use here a regular partitioning scheme, in which straight ‘‘cutting lines’’ that are parallel to the x- and/or y-axis produce subdomains of a rectangular shape, we have deliberately chosen the unstructured strategy to study whether irregularly shaped subdomains pose a problem for the convergence of the additive Schwarz iterations. The amount of overlap is fixed as one layer of shared elements between two neighboring subdomains. In Table 1, we show the relationship between the accuracy and the threshold value in a chosen convergence criterion, for different values of P. More specifically, we vary the value of eglobal in the global-type convergence monitor (17) and check how it affects EgL2 , E/L2 ; I gDD , and I /DD . It can be observed that I gDD and I /DD increase when the global convergence criterion becomes stricter, while the L2-norm of the errors first decreases and then stabilize at the level of discretization errors. We can also see that eglobal = 104 ensures at least three-digit accuracy, which becomes totally independent of the number of subdomains, so this conservative value of eglobal is used in Tables 2–5. Moreover, for a fixed value of eglobal, it can be observed that I gDD and I /DD increase slightly with respect to P. For practical applications, it may be argued that eglobal = 102 is sufficient. The choice of thresholds must be related to the choice of grid resolution and the size of the approximations in the underlying wave model (the Boussinesq equations).


Table 1. The effect of the threshold value ε_global, see Eq. (17), on convergence speed and accuracy for the case of linear waves without dispersion (α = ε = 0).

ε_global   P    I^η_DD   E^η_L2         I^φ_DD   E^φ_L2
10^-1      2    1.23     1.1828·10^-2   1.15     3.7155·10^-3
10^-1      4    1.23     1.2606·10^-2   1.25     3.9693·10^-3
10^-1      8    1.38     1.3051·10^-2   1.43     4.1668·10^-3
10^-1      16   1.43     1.3811·10^-2   1.33     4.5126·10^-3
10^-1      32   1.98     1.1373·10^-2   1.43     3.4153·10^-3
10^-2      2    2.10     9.6199·10^-3   2.00     3.2211·10^-3
10^-2      4    2.13     9.6209·10^-3   2.03     3.2241·10^-3
10^-2      8    2.68     9.5882·10^-3   2.50     3.2086·10^-3
10^-2      16   3.08     9.6674·10^-3   3.03     3.2556·10^-3
10^-2      32   3.58     9.4929·10^-3   3.55     3.2657·10^-3
10^-3      2    3.00     9.5775·10^-3   3.00     3.2080·10^-3
10^-3      4    3.00     9.5788·10^-3   3.00     3.2091·10^-3
10^-3      8    5.00     9.5792·10^-3   4.95     3.2088·10^-3
10^-3      16   5.05     9.5837·10^-3   5.03     3.2120·10^-3
10^-3      32   6.03     9.5608·10^-3   6.00     3.1935·10^-3
10^-4      2    5.00     9.5757·10^-3   5.00     3.2061·10^-3
10^-4      4    4.05     9.5750·10^-3   4.03     3.2055·10^-3
10^-4      8    7.00     9.5761·10^-3   7.00     3.2065·10^-3
10^-4      16   9.00     9.5758·10^-3   9.00     3.2063·10^-3
10^-4      32   9.00     9.5766·10^-3   9.00     3.2068·10^-3
10^-5      2    7.00     9.5756·10^-3   7.00     3.2061·10^-3
10^-5      4    7.00     9.5756·10^-3   7.00     3.2061·10^-3
10^-5      8    10.00    9.5755·10^-3   10.00    3.2060·10^-3
10^-5      16   12.00    9.5755·10^-3   12.00    3.2060·10^-3
10^-5      32   13.00    9.5756·10^-3   13.00    3.2061·10^-3

The global 100 × 100 grid is partitioned using an unstructured scheme, where the amount of overlap is fixed as one layer of shared elements. The subdomain solvers use 1-5 CG iterations for obtaining ε_subd = 10^-1, see Eq. (20).

Table 2. Results from a one-dimensional regular partitioning scheme for the case of linear waves without dispersion (α = ε = 0). The global-type convergence monitor (17) is used with ε_global = 10^-4. The amount of overlap is one layer of shared elements between two neighboring subdomains. The subdomain solvers use 1-5 CG iterations for obtaining ε_subd = 10^-1, see Eq. (20). Global 100 × 100 grid; subdomains as vertical stripes.

P    I^η_DD   E^η_L2         I^φ_DD   E^φ_L2
2    4.10     9.5751·10^-3   4.10     3.2056·10^-3
4    5.00     9.5756·10^-3   5.00     3.2061·10^-3
8    6.00     9.5724·10^-3   6.00     3.2035·10^-3
16   7.00     9.5761·10^-3   7.00     3.2067·10^-3

Table 3. Results due to different partitioning schemes associated with 16-subdomain linear simulations on a 100 × 100 global grid.

Partitioning scheme   I^η_DD   E^η_L2         I^φ_DD   E^φ_L2         WT
16 × 1 rectangles     7.00     9.5761·10^-3   7.00     3.2067·10^-3   7.78
8 × 2 rectangles      9.98     9.5750·10^-3   9.98     3.2056·10^-3   10.52
4 × 4 rectangles      7.93     9.5752·10^-3   7.93     3.2057·10^-3   8.33
2 × 8 rectangles      7.95     9.5748·10^-3   7.98     3.2053·10^-3   10.43
1 × 16 rectangles     7.00     9.5767·10^-3   7.00     3.2070·10^-3   7.74
Unstructured          9.00     9.5758·10^-3   9.00     3.2063·10^-3   9.46

The other parameters are the same as in Table 2. We note that WT denotes the wall-time consumption (in s) of the simulations.

4.1.2. The effect of domain partitioning We have also tested a regular partitioning scheme that produces the subdomains as a coarse mesh of rectangles. The amount of overlap is fixed as one layer of

shared elements between two neighboring subdomains and the grid has 100 · 100 elements, as in Section 4.1.1. Table 2 shows EgL2 , E/L2 , I gDD , and I /DD arising from using a regular one-dimensional partitioning in the


Table 4. The relationship between grid resolution (fixed Δt = 0.05) and convergence speed for the case of nonlinear and dispersive waves (α = ε = 1).

Global grid   P    I^η_DD   E^{η,ref}_L2   I^φ_DD   E^{φ,ref}_L2
100 × 100     2    6.68     7.2339·10^-3   4.25     3.3084·10^-3
100 × 100     4    6.80     7.2338·10^-3   4.38     3.3084·10^-3
100 × 100     8    8.43     7.2342·10^-3   4.83     3.3083·10^-3
100 × 100     16   9.35     7.2342·10^-3   5.35     3.3084·10^-3
200 × 200     2    5.48     1.5613·10^-3   3.10     9.4077·10^-4
200 × 200     4    5.48     1.5609·10^-3   3.13     9.4073·10^-4
200 × 200     8    5.75     1.5609·10^-3   3.20     9.4082·10^-4
200 × 200     16   6.13     1.5611·10^-3   3.68     9.4091·10^-4
400 × 400     2    4.05     2.5737·10^-4   5.45     1.9139·10^-4
400 × 400     4    4.15     2.5740·10^-4   6.98     1.9131·10^-4
400 × 400     8    4.40     2.5740·10^-4   7.00     1.9115·10^-4
400 × 400     16   4.28     2.5766·10^-4   7.00     1.9084·10^-4
800 × 800     2    3.78     1.1286·10^-5   13.00    9.8782·10^-7
800 × 800     4    3.83     1.4519·10^-5   15.85    1.5556·10^-6
800 × 800     8    3.88     1.3249·10^-5   15.93    1.7371·10^-6
800 × 800     16   3.98     9.0133·10^-6   15.78    1.4481·10^-6

The global-type convergence monitor (17) with ε_global = 10^-4 is used. For domain partitioning, an unstructured scheme is used where the amount of overlap is fixed as one layer of shared elements. The subdomain solvers use 1-5 CG iterations for obtaining ε_subd = 10^-1, see Eq. (20).

Table 5. Comparing Eqs. (17) and (18) as the global convergence monitor for the case of nonlinear and dispersive waves (α = ε = 1) on a global 200 × 200 grid.

Using the global-type convergence monitor (17); ε_global = 10^-4
ε_subd   P    I^η_DD   E^{η,ref}_L2   I^φ_DD   E^{φ,ref}_L2
10^-1    2    5.48     1.5613·10^-3   3.10     9.4077·10^-4
10^-1    4    5.48     1.5609·10^-3   3.13     9.4073·10^-4
10^-1    8    5.75     1.5609·10^-3   3.20     9.4082·10^-4
10^-1    16   6.13     1.5611·10^-3   3.68     9.4091·10^-4
10^-2    2    7.03     1.5610·10^-3   3.00     9.4099·10^-4
10^-2    4    7.00     1.5610·10^-3   3.00     9.4114·10^-4
10^-2    8    8.00     1.5611·10^-3   3.00     9.4056·10^-4
10^-2    16   8.08     1.5611·10^-3   3.00     9.4042·10^-4
10^-3    2    7.03     1.5610·10^-3   3.00     9.4098·10^-4
10^-3    4    7.00     1.5609·10^-3   3.00     9.4113·10^-4
10^-3    8    8.00     1.5611·10^-3   3.00     9.4057·10^-4
10^-3    16   7.10     1.5610·10^-3   4.23     9.4087·10^-4

Using the local-type convergence monitor (18); ε^local_global = 10^-4
ε_subd   P    I^η_DD   E^{η,ref}_L2   I^φ_DD   E^{φ,ref}_L2
10^-1    2    6.05     1.5609·10^-3   3.98     9.4086·10^-4
10^-1    4    6.55     1.5609·10^-3   4.08     9.4086·10^-4
10^-1    8    6.93     1.5609·10^-3   4.10     9.4088·10^-4
10^-1    16   7.10     1.5610·10^-3   4.23     9.4087·10^-4
10^-2    2    7.63     1.5609·10^-3   3.40     9.4086·10^-4
10^-2    4    8.28     1.5610·10^-3   3.63     9.4085·10^-4
10^-2    8    9.00     1.5609·10^-3   3.78     9.4085·10^-4
10^-2    16   9.50     1.5610·10^-3   3.73     9.4085·10^-4
10^-3    2    7.63     1.5609·10^-3   3.20     9.4086·10^-4
10^-3    4    8.23     1.5610·10^-3   3.40     9.4085·10^-4
10^-3    8    9.00     1.5610·10^-3   3.65     9.4086·10^-4
10^-3    16   9.55     1.5610·10^-3   3.70     9.4085·10^-4

The table also investigates the effect of the subdomain convergence threshold value ε_subd, see Eq. (20). For domain partitioning, an unstructured scheme is used where the amount of overlap is fixed as one layer of shared elements.

x-direction, such that the subdomains are vertical stripes. Compared with the measurements in Table 1 that correspond to eglobal = 104, we can observe that

the accuracy is independent of the chosen partitioning scheme. However, the convergence speed of the additive Schwarz iterations is somewhat sensitive to the parti-


tioning scheme. In Table 3, we compare six different partitioning schemes for P = 16 in particular. We have also listed the wall-time consumption associated with each partitioning scheme in the table. It can be observed that the time usage depends primarily on the number of Schwarz iterations used, rather than on the shape of the subdomains. This suggests that the message passing overhead is quite low in the parallel simulations. 4.1.3. The effect of grid resolution For the case of nonlinear and dispersive waves (a =  = 1), we try different grid resolutions and study how the number of Schwarz iterations changes. The unstructured partitioning scheme is used, and the amount of overlap is fixed as one layer of shared elements between two neighboring subdomains. We can observe in Table 4 that the obtained accuracy depends on the grid resolution, but is almost completely insensitive to the number of subdomains. Note that in this nonlinear case we have no exact solution available and we therefore use fine-grid reference solutions for accuracy comparisons. The reference solutions gref(x, y, T) and /ref(x, y, T) are produced by a serial solution method on a 800 · 800 global grid, i.e., without domain decomposition. The linear systems involved in the serial solution process have been solved with sufficient accuracy. From Table 4 we can also see that I gDD decreases when the spatial grid resolution is increased, whereas I /DD shows an opposite tendency. We remark that for the 800 · 800 global grid, one layer of shared elements between neighboring subdomains is a very small amount of overlap, therefore the large numbers of I /DD . For a fixed spatial grid resolution, I gDD and I /DD tend to increase slightly with P. (We could of course use a larger amount of overlap, which will result in faster convergence, but this small amount of overlap is deliberately chosen to test the robustness of the additive Schwarz iterations.) With our choice of bilinear elements for the spatial discretization and centered differences in time, the scheme is of second order in the time step size Dt and the element size h. This means that the numerically computed surface elevation field ^ g‘ ðx; y; h; DtÞ is related to ‘ the exact solution g through ^ g‘ ðx; y; h; DtÞ ¼ g‘ þ Ah2 þ BDt2 ; where A and B are constants independent of the discretization parameters h and Dt. For the reference solution we have ^ g‘ ðx; y; href ; DtÞ ¼ g‘ þ Ah2ref þ BDt2 ; with href being the element size in the grid used for computing the reference solution. Since we have used a fixed value of Dt in the experiments, this implies that the difference between the numerical solution and a reference solution can be written as

ðh; DtÞ ¼ Ah2 þ C; Eg;ref L2

where C ¼ Ah2ref :

ð23Þ

Applying Eq. (23) to two resolutions h1 and h2, we can estimate A as   g;ref A ¼ Eg;ref ðh ; DtÞ  E ðh ; DtÞ ðh21  h22 Þ1 : ð24Þ 1 2 2 2 L L The C parameter then follows from: C ¼ Eg;ref ðh1 ; DtÞ  Ah21 ; L2

h1 6¼ hhref :

ð25Þ

Based on Eqs. (24) and (25), the values of A and C can be estimated using two consecutive experiments, and with sufficiently fine resolution in space and time, the estimated values will (hopefully) converge to the true values. The same estimation technique applies of course also to the reference solution of the / field. From the values in Table 4 we find from the two finest grids Eg;ref ðh; DtÞ  7:1  103 h2  1:9  104 ; L2 which fits well with all g-related values in Table 4. For ðh; DtÞ an estimate using the first and third grid E/;ref L2 gives a formula ðh; DtÞ  3:3  103 h2  1:7  105 ; E/;ref L2 which also predicts the third value satisfactorily. It and E/;ref values in Table therefore appears that the Eg;ref L2 L2 4 are compatible with a discretization method of secondorder in space. 4.1.4. More on global and subdomain convergence monitors As we have mentioned earlier in Section 3.2.4, the local-type global convergence monitor (18) can replace the global-type monitor (17). To demonstrate this property, we compare the results obtained from the two global convergence monitors in Table 5. The first half of the table shows the results from using the global-type monitor (17) with eglobal = 104, whereas the second half of Table 5 is devoted to the results due to using the local4 type monitor (18) with elocal global ¼ 10 . In each half of the table, we also show the effect of different values of esubd, which is used by Eq. (20) to check the convergence of the local subdomain solvers during each Schwarz iteration. It can be observed from Table 5 that the use of the local-type global convergence monitor (18) results in slightly more iterations I gDD and I /DD , while achieving at the same time more stable measurements of Eg;ref L2 and E/;ref . This indicates that the local-type monitor L2 (18) can be a useful mechanism for checking the global convergence of the Schwarz iterations. The advantage is in particular its applicability for situations where different subdomains use different discretizations or mathematical models, a feature of particular importance in large-scale ocean wave modeling. Regarding the choice of esubd used in the subdomain convergence monitor (20), we have observed in Table


5 that esubd = 101 is sufficient for obtaining three-digit global accuracy with respect to both Eg;ref and E/;ref . L2 L2 This means that the subdomain problems need not be solved very accurately. A stricter subdomain convergence monitor (smaller value of esubd) will only increase the computational cost, without improving the accuracy of the global numerical solutions. We remark that the subdomain convergence monitor (20) with a constant threshold value esubd actually becomes ‘‘stricter and stricter’’ toward the final Schwarz iteration, because l;k;0 the value of rl;k;0 g;s or r/;s decreases with the number of Schwarz iterations k.
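The practical meaning of a loose subdomain tolerance is simply that the subdomain Krylov solver is stopped early. The sketch below shows a plain (unpreconditioned) conjugate gradient iteration with the relative-residual stopping test of Eq. (20); the matrix-free operator and the default tolerance of 10^-1 are illustrative assumptions, not the MILU-preconditioned solver used in the experiments.

```python
import numpy as np

def cg_loose(matvec, b, x0, eps_subd=0.1, max_iter=50):
    """Unpreconditioned CG stopped by the relative-residual test of Eq. (20)."""
    x = x0.copy()
    r = b - matvec(x)
    p = r.copy()
    r0_norm = np.linalg.norm(r)
    rs_old = r @ r
    for _ in range(max_iter):
        if np.linalg.norm(r) <= eps_subd * r0_norm:
            break
        Ap = matvec(p)
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

# Example on a small SPD tridiagonal system (illustrative only).
n = 50
def matvec(v):
    w = 2.0 * v
    w[:-1] -= v[1:]
    w[1:] -= v[:-1]
    return w

x = cg_loose(matvec, np.ones(n), np.zeros(n))
```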


4.2. Solitary waves In the preceding test cases, the solutions have a uniform frequency x throughout the entire domain. This roughly means that all the subdomains have the same dynamic behavior. To investigate whether differences in the dynamic behavior among the subdomains affect our parallel solution strategy, we now study the problem of a solitary wave, which moves in the positive x-direction. We have deliberately chosen the initial conditions to be dependent on x only, even though we simulate in the entire two-dimensional domain. This enables an easy verification of the solutions visually, and can be used to check whether different domain partitionings affect the parallel simulation results. The solution domain is X = (0, 400) · (0, 20) with a constant water depth of H = 1. As before, the boundary conditions are of no-flux type as in the previous test case, and the spatial discretization is done by the finite element method based on a uniform grid with bilinear elements. Consistent mass matrices are used and the external pressure term in Eq. (2) is also prescribed as zero. The time domain of interest is 0 < t 6 T = 200, and the forms of g(x, y, 0) and /(x, y, Dt/2) are depicted in Fig. 3. The size of the time step is fixed at Dt = 0.25. We simulate a case of nonlinearity and dispersion (a =  = 1) for the solitary wave. Three resolutions of the global grid are chosen as 400 · 40, 800 · 40, and 1600 · 40. The chosen Dt is then 4.6–6 times smaller than the stability limit of Dt (see Eq. (6)). A one-dimensional regular partitioning scheme is used such that the resulting subdomains are vertical stripes. The relationship between convergence speed and amount of overlap is studied in Table 7. We can observe that a sufficient overlap is necessary for achieving rapid convergence of the Schwarz iterations in this test case, meaning that the width of the overlapping zones must be above a fixed

4.1.5. Variable water depth and unstructured grids

To study the effect of a non-constant water depth on the convergence of the additive Schwarz iterations, we introduce a bottom profile as depicted in Fig. 2. More specifically, the bottom profile is a bell-shaped function centered around (x = 5, y = 0), such that H varies between 0.04 and 0.62. Three unstructured computational grids with different resolutions have been built accordingly for Ω = (0, 10) × (0, 10); see Fig. 2 for the coarsest grid. The unstructured domain partitioning scheme (see Section 4.1.1) has to be used for such unstructured grids. For the three grids we have used, respectively, one, two, and three layers of elements as the overlapping zone between neighboring subdomains. Table 6 shows the average numbers of Schwarz iterations needed to achieve convergence with ε_global = 10⁻⁴, where Δt = 0.05, T = 2, and α = ε = 1. The subdomain solver uses 1–5 preconditioned CG iterations to obtain ε_subd = 10⁻¹, see Eq. (20). The chosen preconditioner is based on the modified incomplete LU-factorization (MILU), see e.g. [7]. We can observe from Table 6 that a moderate size of the overlapping zone is sufficient in this test case for obtaining a stable convergence independent of P.
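Building the overlapping zones on an unstructured grid amounts to letting each subdomain repeatedly absorb the elements that share a node with it, one layer per pass. The sketch below illustrates this idea; the element-to-subdomain array, the element connectivity structure, and the function name are hypothetical and do not reflect the actual implementation behind our partitioning scheme (a non-overlapping element partition, e.g. from Metis [26], is assumed as input).

#include <cstdio>
#include <set>
#include <vector>

// Sketch: grow each subdomain by a prescribed number of element layers by
// absorbing all elements that share a node with the subdomain.
std::vector<std::set<int> > add_overlap_layers(
    const std::vector<int>& elm_owner,                 // subdomain id per element
    const std::vector<std::vector<int> >& elm_nodes,   // node ids per element
    int num_subdomains, int num_layers)
{
  const int num_elms = static_cast<int>(elm_nodes.size());
  std::vector<std::set<int> > sub_elms(num_subdomains);
  for (int e = 0; e < num_elms; ++e)
    sub_elms[elm_owner[e]].insert(e);

  for (int layer = 0; layer < num_layers; ++layer)
    for (int s = 0; s < num_subdomains; ++s) {
      std::set<int> nodes;                             // nodes currently in subdomain s
      for (int e : sub_elms[s])
        nodes.insert(elm_nodes[e].begin(), elm_nodes[e].end());
      std::set<int> grown = sub_elms[s];
      for (int e = 0; e < num_elms; ++e) {             // absorb node-adjacent elements
        if (grown.count(e)) continue;
        for (int v : elm_nodes[e])
          if (nodes.count(v)) { grown.insert(e); break; }
      }
      sub_elms[s] = grown;
    }
  return sub_elms;
}

int main() {
  // Toy chain of 6 elements, element e having nodes {e, e+1}, two subdomains.
  std::vector<std::vector<int> > elm_nodes;
  for (int e = 0; e < 6; ++e) elm_nodes.push_back({e, e + 1});
  const std::vector<int> owner = {0, 0, 0, 1, 1, 1};
  const auto subs = add_overlap_layers(owner, elm_nodes, 2, 1);
  for (int s = 0; s < 2; ++s) {
    std::printf("subdomain %d:", s);
    for (int e : subs[s]) std::printf(" %d", e);
    std::printf("\n");
  }
  return 0;
}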


Fig. 2. A variable-depth bottom profile and an associated unstructured computational grid.


Table 6
The average numbers of Schwarz iterations needed for solving a nonlinear and dispersive case with variable water depth, as described in Section 4.1.5

Grid              2741                 10761                42641
Overlap layers    1                    2                    3
P         I_DD^η    I_DD^φ     I_DD^η    I_DD^φ     I_DD^η    I_DD^φ
2         5.85      9.43       4.00      11.45      4.00      7.50
4         6.93      10.03      4.00      11.10      4.00      7.48
8         7.00      10.10      4.00      11.18      4.00      8.28
16        7.65      11.55      4.00      11.40      4.00      8.03


Fig. 3. A region of the initial conditions for the test case of a solitary wave; η at t = 0 (a) and φ at t = Δt/2 (b).

4.2. Solitary waves

In the preceding test cases the solutions have a uniform frequency ω throughout the entire domain, which roughly means that all the subdomains exhibit the same dynamic behavior. To investigate whether differences in the dynamic behavior among the subdomains affect our parallel solution strategy, we now study a solitary wave that moves in the positive x-direction. We have deliberately chosen the initial conditions to depend on x only, even though we simulate in the entire two-dimensional domain. This enables an easy visual verification of the solutions and can be used to check whether different domain partitionings affect the parallel simulation results. The solution domain is Ω = (0, 400) × (0, 20) with a constant water depth of H = 1. The boundary conditions are of no-flux type as in the previous test case, and the spatial discretization is done by the finite element method on a uniform grid with bilinear elements. Consistent mass matrices are used, and the external pressure term in Eq. (2) is prescribed as zero. The time domain of interest is 0 < t ≤ T = 200, and the forms of η(x, y, 0) and φ(x, y, Δt/2) are depicted in Fig. 3. The size of the time step is fixed at Δt = 0.25. We simulate a nonlinear and dispersive case (α = ε = 1) for the solitary wave. Three resolutions of the global grid are chosen: 400 × 40, 800 × 40, and 1600 × 40. The chosen Δt is then 4.6–6 times smaller than the stability limit (see Eq. (6)). A one-dimensional regular partitioning scheme is used such that the resulting subdomains are vertical stripes.

The relationship between convergence speed and amount of overlap is studied in Table 7. We observe that a sufficient overlap is necessary for achieving rapid convergence of the Schwarz iterations in this test case, meaning that the width of the overlapping zones must be above a fixed value, typically of the size of H, independent of the grid resolution. This observation is in accordance with the one-dimensional analysis in [22]. For large values of P, even though only a small number of subdomains have large changes in η and φ at a given time level, this does not prevent a convergence speed of the Schwarz iterations similar to that for small values of P. Fig. 4 shows the numerical solution of η at, e.g., t = 10. Due to numerical dispersion, small components of different wave lengths lag behind the moving soliton.

Such residual wave trains are characteristic of virtually all Boussinesq or KdV-type models, unless some kind of filtering is employed. This is because the analytic solitary wave solution, which is inserted as the initial condition, cannot be reproduced exactly by the discrete solution. The result is a slightly modified solitary wave and a small residual wave train with a high content of short waves. The residual train does not grow in time and thus does not cause instability; see e.g. [37,22].
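The exact analytic solitary wave used as the initial condition is not restated here. For orientation, the classical first-order solitary wave of amplitude A over unit depth reads η(x) = A sech²(√(3A/4)(x − x₀)) with celerity roughly 1 + A/2; the snippet below merely samples such a profile, and the amplitude and center are rough visual matches to Fig. 3, not the values used in the reported simulations.

#include <cmath>
#include <cstdio>

// Illustrative first-order (KdV-type) solitary wave over unit depth:
//   eta(x) = A * sech^2( sqrt(3A/4) * (x - x0) ),  celerity roughly 1 + A/2.
// A and x0 are assumptions chosen to resemble Fig. 3, not the paper's values.
int main() {
  const double A = 0.4, x0 = 20.0, k = std::sqrt(3.0 * A / 4.0);
  for (double x = 0.0; x <= 40.0; x += 1.0) {
    const double s = 1.0 / std::cosh(k * (x - x0));
    std::printf("x = %5.1f   eta = %8.5f\n", x, A * s * s);
  }
  return 0;
}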

Table 7
The relationship between convergence speed and amount of overlap for a nonlinear solitary wave problem

Width of overlap:                  H                    2H                   3H
Global grid     P        I_DD^η    I_DD^φ     I_DD^η    I_DD^φ     I_DD^η    I_DD^φ
400 × 40        2        1.17      2.05       1.11      2.02       1.08      2.00
400 × 40        4        1.49      2.09       1.42      2.03       1.29      2.00
400 × 40        8        2.10      2.18       1.75      2.06       1.61      2.00
400 × 40       12        2.52      2.30       1.95      2.10       1.80      2.00
400 × 40       16        2.86      2.40       2.13      2.13       2.01      2.00
800 × 40        2        1.08      3.06       1.05      3.03       1.06      3.03
800 × 40        4        1.18      3.12       1.19      3.06       1.12      3.06
800 × 40        8        1.40      3.23       1.37      3.12       1.24      3.12
800 × 40       12        1.59      3.40       1.47      3.20       1.41      3.21
800 × 40       16        1.75      3.51       1.58      3.26       1.54      3.27
1600 × 40       2        1.09      3.12       1.06      3.09       1.06      3.09
1600 × 40       4        1.27      3.25       1.32      3.19       1.14      3.19
1600 × 40       8        1.62      3.49       1.61      3.37       1.29      3.38
1600 × 40      12        1.86      3.86       1.74      3.65       1.43      3.67
1600 × 40      16        1.92      4.11       1.84      3.84       1.54      3.86

The global-type convergence monitor (17) with ε_global = 10⁻³ is used, and the subdomain solvers use 1–2 MILU-preconditioned CG iterations to obtain ε_subd = 10⁻¹, see Eq. (20).



Fig. 4. A region of the computed solution of the solitary wave at t = 10; the case of nonlinearity and dispersion (α = ε = 1).

4.3. Waves generated by a moving disturbance

In the final test case we study waves generated at transcritical speeds by a moving disturbance, such as a boat. The standard Kelvin ship wave pattern is then strongly modified; in particular, solitary waves may be generated and radiated upstream. For an in-depth explanation of this subject we refer to [45,20,28,36,14,13,34]. These references are generally concerned with sources that move along uniform channels with constant speed, even though a few also address wave generation in a horizontally unbounded fluid. The wave patterns evolve slowly over large propagation distances. To limit the size of the computations, these references therefore truncate the domain downstream, and a computational window is sometimes designed to follow the source, either by dynamical inclusion of upstream grid points or by a coordinate transformation. In a more general setting, with highly variable source velocity and bathymetry, such techniques can hardly be invoked. Hence, to approach real applications, such as the "Solitary killers" [24], we will need much larger computational domains than those used in the academic studies referenced above, and parallel computing becomes desirable.

A moving disturbance is most conveniently incorporated into our model through the pressure term p(x, y, t) in the Bernoulli equation (2). We assume that p(x, y, t) has an effective region of fixed shape, whose center moves along a trajectory. Since we are concerned with the parallel aspects, we keep the problem simple by assuming constant depth and a disturbance that moves along the x-axis with a velocity F, which in the present scaling equals the Froude number. However, there is nothing in our model that depends on these limitations. Moreover, following [36] and others, we assume that the effective region of p(x, y, t) is of an ellipsoidal shape, i.e.,

    p(x, y, t) = p_a cos²(πR/2)  for R ≤ 1,    p(x, y, t) = 0  for R > 1,        (26)

where

    R(x, y, t) = √( ((x − Ft)/b)² + (y/w)² ).                                    (27)

We have chosen the spatial domain as (x, y) ∈ [−100, 600] × [−60, 60], which can be thought of as a wide channel of shallow water (H ≡ 1) in which a moving vessel generates waves; the physical size of H may be 10 m in such a case. Due to symmetry (the disturbance moves along the x-axis), only the upper y-half of the spatial domain is used for computation. The speed of the disturbance is chosen as F = 1.1, i.e., a slightly supercritical case. The ellipsoidal shape of the effective region of p(x, y, t) is determined by choosing b = 12, w = 7, and p_a = 0.1 in Eqs. (26) and (27). Several snapshots of η(x, y, t) from a simulation covering 100 ≤ t ≤ 500 are depicted in Fig. 5, showing that upstream radiation of solitons occurs when t is large enough. The main purpose of this test case is to study the scalability of our parallel solution strategy with respect to the grid resolution and the number of subdomains. In order to carry out detailed measurements of a set of large-scale parallel simulations, we have chosen to consider only a short simulation period, 0 < t ≤ 1.
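For illustration, the pressure disturbance defined by Eqs. (26) and (27) with the above parameter values can be evaluated as in the following sketch; the function name and the use of default arguments are ours, not part of the solver.

#include <cmath>
#include <cstdio>

// Sketch of the moving pressure disturbance of Eqs. (26) and (27): an
// ellipsoidal "footprint" of half-length b and half-width w, centred at
// (F*t, 0) and moving along the x-axis with Froude number F.  The default
// parameter values follow the test case in the text.
double moving_pressure(double x, double y, double t,
                       double F = 1.1, double b = 12.0,
                       double w = 7.0, double pa = 0.1)
{
  const double pi = 3.14159265358979323846;
  const double R = std::sqrt(std::pow((x - F * t) / b, 2) + std::pow(y / w, 2));
  return (R <= 1.0) ? pa * std::pow(std::cos(0.5 * pi * R), 2) : 0.0;
}

int main() {
  // Sample the pressure along the centre line at t = 0.
  for (double x = -15.0; x <= 15.0; x += 5.0)
    std::printf("p(%5.1f, 0, 0) = %.4f\n", x, moving_pressure(x, 0.0, 0.0));
  return 0;
}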


Fig. 5. Snapshots of η(x, y, t) from a simulation of waves generated by a disturbance moving at a supercritical speed.

Three different grid resolutions are chosen: 2800 × 240, 5600 × 480, and 11200 × 960, where we note that the finest computational grid has 10,764,161 nodal points. Two-dimensional rectangular domain partitioning is used to divide Ω into P overlapping subdomains, where the amount of overlap is, respectively for the three grid resolutions, four, eight, and sixteen layers of shared elements. In other words, the width of the overlapping zone is H. We study the case of nonlinear and dispersive waves (α = ε = 1). A finite element discretization with linear triangular elements and a consistent mass matrix is used on all the subdomains. To monitor the convergence of the Schwarz iterations, we have used the global-type monitor (17) with ε_global = 10⁻³. As the subdomain solver, we always use ten CG iterations.

In Table 8, we report the average numbers of Schwarz iterations I_DD^η and I_DD^φ per time step and the total wall-clock time usage WT. The wall-clock time measurements are obtained on a Linux cluster consisting of 1.3 GHz Itanium2 processors inter-connected through a Gigabit Ethernet. It can be observed that the wall-clock times scale quite well with respect to P. Note that the speed-up results in Table 8 are based on the measurements for P = 2; e.g., the speed-up for P = 20 is obtained as 2 × WT(P = 2)/WT(P = 20), which for the coarsest grid gives 2 × 60.79/7.11 ≈ 17.1. The speed-up results also gradually improve as the grid resolution increases. This indicates that our parallel solution strategy has inherently good parallel performance, where the communication overhead is not dominating. The main obstacle to perfect speed-up is the extra work in the overlapping zones between neighboring subdomains, which actually has a larger effect than the communication overhead. In addition, we can observe that the additive Schwarz iterations rapidly reach the desired convergence level for the continuity equation in all situations, whereas the number of iterations for the Bernoulli equation is independent of P but increases with the grid resolution.
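As an illustration of the two-dimensional rectangular partitioning with a prescribed number of shared element layers, the sketch below computes overlapping element index ranges for the coarsest grid. The near-equal block split and the 5 × 4 subdomain layout for P = 20 are assumptions made for this example; they do not necessarily match the layouts used in the reported runs.

#include <algorithm>
#include <cstdio>

// Sketch: each subdomain first receives a contiguous block of elements and is
// then extended by a given number of element layers in every direction,
// clipped at the physical boundary.
struct Range { int lo, hi; };                          // inclusive element indices

Range owned(int n, int parts, int p) {                 // near-equal block split
  const int base = n / parts, rem = n % parts;
  const int lo = p * base + std::min(p, rem);
  return { lo, lo + base + (p < rem ? 1 : 0) - 1 };
}

Range with_overlap(Range r, int n, int layers) {       // extend and clip
  return { std::max(0, r.lo - layers), std::min(n - 1, r.hi + layers) };
}

int main() {
  const int nx = 2800, ny = 240;                       // coarsest grid in Section 4.3
  const int Px = 5, Py = 4, layers = 4;                // P = 20 subdomains (assumed)
  for (int q = 0; q < Py; ++q)
    for (int p = 0; p < Px; ++p) {
      const Range ox = with_overlap(owned(nx, Px, p), nx, layers);
      const Range oy = with_overlap(owned(ny, Py, q), ny, layers);
      std::printf("subdomain (%d,%d): elements [%d,%d] x [%d,%d]\n",
                  p, q, ox.lo, ox.hi, oy.lo, oy.hi);
    }
  return 0;
}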


Table 8
The average numbers of Schwarz iterations per time step and the total wall-clock time consumption WT (in s) for a set of parallel simulations of waves due to a moving disturbance

        2800 × 240 grid; 4 steps              5600 × 480 grid; 8 steps              11200 × 960 grid; 16 steps
P       I_DD^η  I_DD^φ  WT      Speedup       I_DD^η  I_DD^φ  WT       Speedup      I_DD^η  I_DD^φ  WT        Speedup
2       1       2       60.79   N/A           1       4       612.55   N/A          1       10      7568.91   N/A
4       1       2       30.45   3.99          1       4       306.35   4.00         1       10      3836.77   3.95
8       1       2       16.06   7.57          1       4       159.68   7.67         1       10      1928.40   7.85
12      1       2       11.04   11.01         1       4       108.86   11.25        1       10      1304.43   11.60
16      1       2       8.28    14.68         1       4       82.76    14.80        1       10      990.90    15.28
20      1       2       7.11    17.10         1       4       66.36    18.46        1       10      780.31    19.40


5. Discussion and conclusions

Modeling of long destructive water waves in the ocean frequently involves huge domains and fine grids, thus calling for highly efficient parallel simulation codes. Optimal parallel speed-up is easy to achieve for the explicit time-marching schemes that currently dominate long water wave modeling. However, weak dispersion is often needed (and is usually sufficient [39]), and this makes the numerical schemes implicit and harder to parallelize.

In the present paper we have proposed and evaluated a parallelization strategy for the weakly dispersive and nonlinear Boussinesq equations in two dimensions. Because the wave speed is limited, additive Schwarz iterations are better suited to such wave-related problems than to Laplace-like elliptic boundary value problems. This has been demonstrated by nearly constant or slowly growing numbers of Schwarz iterations as the number of processors and subdomains grows; the growth is far slower than what arises in Laplace-like elliptic problems. In another paper [22] we have performed an in-depth investigation of overlapping domain decomposition methods and Schwarz iterations for the one-dimensional Boussinesq equations. Such investigations are important because artificial reflection of waves at the subdomain boundaries may arise and destroy the quality of the simulations. The present paper provides evidence that the main conclusions from [22] carry over to two dimensions and that the domain decomposition method can be parallelized with satisfactory scalability.

Regarding accuracy, the number of subdomains does not affect the overall accuracy of the resulting numerical solution, provided that a strict enough global convergence monitor is used for the Schwarz iterations. Such a convergence monitor may well be local to each subdomain, and this is necessary if we want to exploit the flexibility of using different discretization methods and/or mathematical models in different subdomains. Our experiments indicate similar behavior of the global-type and local-type convergence monitors, i.e., Eqs. (17) and (18). Another advantage of the proposed parallel strategy is that the subdomain problems do not need to be solved very accurately (at least during the early Schwarz iterations), which is an important computation-saving factor.

With respect to software development, our experience shows that the described parallelization approach strongly encourages code reuse: an existing serial solver can serve as the subdomain solver. The amount of modification required in the serial solver depends on how it is designed. Our implementation shows an example where serial solvers following an object-oriented design can normally be reused without modifications in a generic parallel framework [9]. Although our numerical experiments have been carried out on an in-house PC cluster, whose processors are relatively much faster than its communication network, we still manage to achieve close-to-perfect scalability. This means that such parallel wave simulations are well suited for cheap and low-end parallel computers, provided that the size of the computation is sufficiently large. In other words, we believe that parallel wave simulations will eventually become "affordable" for many researchers, with respect to both software implementation and hardware building.


Regarding future work, we see several topics that need to be investigated. First, multi-algorithmic applications should be tested, i.e., applications where different wave models and/or discretizations are used in different subdomains. We remark that our parallel implementation readily allows such multi-algorithmic applications and that a few one-dimensional cases have been investigated to a certain extent in [22]. Second, the necessity of a coarse grid correction mechanism (see Section 3.1) should be studied. The experience from [22] suggests that coarse grid corrections may slightly improve the convergence, but they also introduce a certain amount of extra computation. We remark that our solution strategy without coarse grid correction is straightforward to implement, which is important for "popularizing" parallel computations with (weakly) dispersive wave models. Third, adaptive mesh refinement is important for treating coastlines and significant variations in the water depth, but the resulting mesh is static and does not pose challenges beyond our unstructured domain partitioning scheme using Metis [26]. Dispersion reduces the need for dynamic adaptive mesh refinement from time step to time step, since there are fewer localized phenomena in a dispersive wave train. However, dynamic adaptive mesh refinement is important for treating, e.g., wave breaking. In the context of our parallel domain decomposition strategy, the main challenge of dynamic adaptive mesh refinement lies not in locating the target areas for refinement and performing the element subdivision (this is already taken care of in our software [31]), but rather in the actual parallel implementation, because load balancing is not easily achievable and may involve shuffling solution regions and data between subdomains from one time step to another.

Acknowledgments The authors thank Tom Thorvaldsen for his contribution to an early programming phase of the parallel software used in this paper. We are also grateful to Sylfest Glimsdal for his help in providing the initial conditions for the experiments concerning solitary waves.

References [1] Abott MB, Petersen HM, Skovgaard O. On the numerical modelling of short waves in shallow water. J Hyd Res 1978; 16(3):173–203. [2] Bamberger A, Glowinski R, Tran QH. A domain decomposition method for the acoustic wave equation with discontinuous coefficients and grid change. SIAM J Numer Anal 1997;34(2): 603–39.


[3] Benamou JD, Despre´s B. A domain decomposition method for the Helmholtz equation and related optimal control problems. J Comp Phys 1997;136:68–82. [4] Bjørstad PE, Espedal M, Keyes D, editors. Domain decomposition methods in sciences and engineering. Proceedings of the 9th International Conference on Domain Decomposition Methods, June 1996, Bergen, Norway; 1998. Domain Decomposition Press. [5] Boubendir Y, Bendali A, Collino F. Domain decomposition methods and integral equations for solving Helmholtz diffraction problem. In Fifth International Conference on Mathematical and Numerical Aspects of Wave Propagation, Philadelphia, PA; 2000. SIAM. p. 760–4. [6] Marine Accident Investigation Branch. Report on the investigation of the man overboard fatality from the angling boat Purdy at Shipwash Bank off Harwich on 17 July 1999. Technical Report 17/2000, Marine Accident Investigation Branch, Carlton House, Carlton Place, Southampton, SO15 2DZ, 2000. [7] Bruaset AM. A survey of preconditioned iterative methods. In: Pitman Res Notes, Math Ser 328; 1995. London: Longman Scientific & Technical. [8] Bruaset AM, Cai X, Langtangen HP, Tveito A. Numerical solution of PDEs on parallel computers utilizing sequential simulators. In: Ishikawa Y, Oldehoeft RR, Reynders JVW, Tholburn M, editors. Scientific computing in object-oriented parallel environments. Lect Notes Comput Sci. Berlin: Springer; 1997. p. 161–8. [9] Cai X. Overlapping domain decomposition methods. In: Langtangen HP, Tveito A, editors. Advanced topics in computational partial differential equations—numerical methods and Diffpack programming. Berlin: Springer; 2003. p. 57–95. [10] Cai X, Acklam E, Langtangen HP, Tveito A. Parallel computing. In: Langtangen HP, Tveito A, editors. Advanced topics in computational partial differential equations–numerical methods and Diffpack programming, Lect Notes Computat Sci Eng.; 2003. Berlin: Springer. p. 1–55. [11] Cai X, Langtangen HP. Developing parallel object-oriented simulation codes in Diffpack. In H.A. Mang, F.G. Rammerstorfer, and J. Eberhardsteiner, editors, Proceedings of the Fifth World Congress on Computational Mechanics, 2002. http:// wccm.tuwien.ac.at. [12] Chan TF, Mathew TP. Domain decomposition algorithms. In: Acta Numerica. Cambridge University Press; 1994. p. 64–143. [13] Chen X, Sharma S. A slender ship moving at a near-critical speed in a shallow channel. J Fluid Mech 1995;291:263–85. [14] Choi H, Bai K, Cho J. Nonlinear free surface waves due to a ship moving near the critical speed in shallow water. In Proceedings of 18th Symposium of Naval Hydrodynamics, Washington, DC; 1991. p. 173–90. [15] Collino F, Ghanemi S, Joly P. Domain decomposition methods for harmonic wave propagation: a general presentation. Comput Methods Appl Mech Eng 2000;184(2–4):171–211. [16] Dean EJ, Glowinski R, Pan T-W. A wave equation approach to the numerical simulation of incompressible viscous fluid flow modeled by the Navier–Stokes equations. In: De Santo JA, editor. Mathematical and numerical aspects of wave propagation. Philadelphia, PA: SIAM; 1998. p. 65–74. [17] Despre´s B. Domain decomposition method and the Helmholtz problem II. In Second International Conference on Mathematical and Numerical Aspects of Wave Propagation (Newark, DE, 1993); 1993. Philadelphia, PA: SIAM. p. 197–206. [18] Diffpack Home Page. Available from: http://www.diffpack.com. [19] Dolean V, Lanteri S. A domain decomposition approach to finite volume solutions of the Euler equations on unstructured triangular meshes. 
Int J Numer Methods Fluids 2001;37(6):625–56. [20] Ertekin RC, Webster WC, Wehausen JV. Waves caused by a moving disturbance in a shallow channel of finite width. J Fluid Mech 1986;169:275–92.

[21] Feng X. Analysis of a domain decomposition method for the nearly elastic wave equations based on mixed finite element methods. IMA J Numer Anal 1998;18(2):229–50. [22] Glimsdal S, Pedersen GK, Langtangen HP. An investigation of domain decomposition methods for one-dimensional dispersive long wave equations. Adv Water Res 2004;27:1111–33. [23] Gropp W, Lusk E, Skjellum A. Using MPI—portable parallel programming with the message-passing interface. 2nd ed. Cambridge, MA: The MIT Press; 1999. [24] Hamer M. Solitary killers. New Scientist August 1999;18–19. [25] Ingber MS, Schmidt CC, Tanski JA, Phillips J. Boundary-element analysis of 3D diffusion problems using a parallel domain decomposition method. Numer Heat Transfer, Pt B 2003;44(2): 145–64. [26] Karypis G, Kumar V. Metis: unstructured graph partitioning and sparse matrix ordering system. Technical report, Department of Computer Science, University of Minnesota, Minneapolis/St. Paul, MN, 1995. [27] Katopedes ND, Wu C-T. Computation of finite-amplitude dispersive waves. J Waterw Port, Coastal, Ocean Eng 1987; 113(4):327–46. [28] Katsis C, Akylas TR. On the excitation of long nonlinear water waves by a moving pressure distribution. Pt. 2. Three-dimensional effects. J Fluid Mech 1987;177:49–65. [29] Kornhuber R, Hoppe R, Pe´riaux J, Pironneau O, Widlund O, Xu J, editors. Domain decomposition methods in sciences and engineering. In: Proceedings of the 15th International Conference on Domain Decomposition Methods, July 2003, Berlin, Germany, Lect Notes Computat Sci Eng, vol. 40; 2004. Berlin: Springer. [30] Lai C-H, Bjørstad PE, Cross M, Widlund O, editors. Domain decomposition methods in sciences and engineering. Proceedings of the 11th International Conference on Domain Decomposition Methods, July 1998, Greenwich, UK; 1999. Domain Decomposition Press. [31] Langtangen HP. Computational partial differential equationsnumerical methods and Diffpack programming. Texts in computational science and engineering. 2nd ed.; 2003. Berlin: Springer. [32] Langtangen HP, Cai X. A software framework for easy parallelization of PDE solvers. In: Jensen CB, Kvamsdal T, Andersson HI, Pettersen B, Ecer A, Periaux J, et al, editors. Parallel computational fluid dynamics. Amsterdam: North-Holland; 2001. [33] Langtangen HP, Pedersen G. Computational models for weakly dispersive nonlinear water waves. Comp Methods Appl Mech Eng 1998;160:337–58. [34] Li Y, Sclavounos PD. Three-dimensional nonlinear solitary waves in shallow water generated by an advancing disturbance. J Fluid Mech 2002;470:383–410. [35] Mei CC. The Applied Dynamics of Ocean Surface Waves. Singapore: World Scientific; 1989. [36] Pedersen G. Three-dimensional wave patterns generated by moving disturbances at transcritical speeds. J Fluid Mech 1988; 196:39–63. [37] Pedersen G. Finite difference representations of nonlinear waves. Int J Numer Methods Fluids 1991;13:671–90. [38] Pedersen G. Nonlinear modulations of solitary waves. J Fluid Mech 1994;267:83–108. [39] Pedersen G, Langtangen HP. Dispersive effects on tsunamis. In: Proceedings of the International Conference on Tsunamis, Paris, France; 1999. p. 325–40. [40] Peregrine DH. Long waves on a beach. J Fluid Mech 1967;77: 417–31. [41] Rygg OB. Nonlinear refraction–diffraction of surface waves in intermediate and shallow water. Coast Eng 1988;12:191–211. [42] Smith BF, Bjørstad PE, Gropp W. Domain decomposition: parallel multilevel methods for elliptic partial differential equations. Cambridge University Press; 1996.

[43] Whitham GB. Linear and nonlinear waves. Pure and applied mathematics. New York: John Wiley & Sons; 1974. [44] Woo J-K, Liu PL-F. Finite element model for modified Boussinesq equations. I: Model development. J Waterways Port, Coastal, Ocean Eng 2004;130(1):1–16.


[45] Wu DM, Wu TY. Three-dimensional nonlinear long waves due to moving surface pressure. In Proceedings of the 14th Symposium on Naval Hydrodynamics, MI, USA; 1982. p. 103–29. [46] Zelt JA, Raichlen F. A Lagrangian model for wave-induced harbour oscillations. J Fluid Mech 1990;213:203–25.