Domain Decomposition for Navier-Stokes Equations

Gassan Abdoulaev, Yves Achdou, Jean-Claude Hontand, Yuri Kuznetsov, Olivier Pironneau, Christophe Prud'homme

Laboratoire d'Analyse Numérique, Université Paris 6, 75252 Paris, France

Abstract. The numerical solution of most partial differential equations is a computer-intensive task. Parallel computing is an economical way of increasing the computing power at our disposal, especially when it consists of a cluster of workstations. In this talk we present two parallel computations for incompressible flows and discuss all the issues involved, up to the use of sophisticated C++ tools and high-end parallel computers such as the CRAY T3E. Along the way we also discuss alternative choices, the mathematical tools and the difficulties.

1 Introduction

It is commonplace to recall that microprocessors double their computing power roughly every two years. Even so, a few very important fields of physics remain out of computational reach; turbulence is one of them. It is therefore tempting to cluster computers (see for example the Beowulf projects) to increase the computing resources available for a single task. This can be done either by the user or by the computer manufacturer. Users can link several computers together with a high-speed network, or even with a simple Ethernet cable and a switch. They can also buy computing boards and rack them around a high-speed bus. Manufacturers have invested a great deal in parallel architectures; nowadays many computations run in parallel mode, even in industry, on machines like the Paragon (Intel), the SP2 (IBM) or the Origin 2000 (Silicon Graphics). There are fewer research projects on parallel machines than before, but there is still the remarkable US project ASCI.

On the software side it is necessary to distinguish clearly between shared memory machines and distributed memory machines. On shared memory machines the user does not worry about conflicts in data access and concentrates on the parallelism of the task. At the system level the task is broken into threads which are executed by each processor. At the programming language level one uses compiler directives or tools like OpenMP or Cilk [1] or other systems. But data access conflicts will slow down the machine


when there are too many processors. The more traditional MIMD machines with distributed memory are conveniently driven by two libraries: PVM [4] and MPI [3]. Research on, and hope for, a high-level parallel language like HPF are still strong, but it seems difficult to capture the subtleties of parallelism in one high-level paradigm. We must also mention one very interesting direction of research in which the user is provided with tools to write his program, such as PETSc [2] and POOMA [5]. In these libraries the functions for parallelism are no longer low level but deal directly with matrix-vector multiplication or linear system solution.

So let us consider the situation, typical of intensive computing, of a user who is ready to use MPI/PVM in a FORTRAN or C/C++ program which will run on a network of workstations in his lab and on a T3E or SP2 outside. The first major decision is whether he should start from scratch or put parallelism into an existing program. Then he should identify bottlenecks such as the solution of linear systems. Finally he should rethink old options, such as structured versus unstructured meshes (figure 1) and implicit versus explicit methods, because all these choices are linked. In our experience, and until an efficient and easy-to-use parallel library for solving linear systems is available, it is easier to implement an explicit solver for a PDE, using a pseudo-time even if the problem is stationary; then whether the mesh is structured or unstructured makes little difference. But explicit schemes are slow and cursed with stability conditions. On the other hand, implicit solvers are an order of magnitude harder to implement in parallel, and for them structured meshes allow the use of multigrid and other fast block solvers. So the type of mesh and the choice of iterative scheme are important issues.

In most applications the mesh is built independently of the machine, and domain decomposition is applied to it afterwards (figure 2). But modern methods include mesh adaption (figure 1), and then load balancing between processors together with mesh adaption in parallel becomes a real challenge. If mesh adaption is done within each processor, the concept of multi-level meshes arises automatically, so it is better to use it right from the start. We can then speak of "a-priori parallelism" as in figure 2 on the left, versus "a-posteriori parallelism" as in figure 2 on the right. We present here a middle way which uses multi-level meshes, unstructured at the coarse block level and structured within each block; it uses the mortar element method introduced in [6].

2 Data Communication and Explicit Schemes

The Euler equations for non-viscous fluids, written in conservation form, are

    ∂_t W + ∇ · F(W) = 0,                                        (2.1)

where W = (ρ, ρu, ρe)^T, ρ(x, t) is the density, u(x, t) the velocity and e(x, t) the energy. If the fluid is meshed into non-overlapping adjacent elements and, on the j-th element σ_j, W takes the constant value W_j^m at time mδt, then an explicit scheme for this conservation law is

    W_j^{m+1} = W_j^m + (δt/|σ_j|) ∫_{∂σ_j} Φ(W^+, W^-),         (2.2)

where Φ is a numerical flux function implementing "upwinding", such that Φ(W, W) = F(W), and W^± are the values of W^m upstream and downstream of the current point. For incompressible flows, i.e. ∇ · u = 0, the characteristic-Galerkin scheme can be used [8]:

    ∂_t v + ∇ · (u ⊗ v) = 0,                                     (2.3)

discretized by

    v^{m+1}(x_j) = v^m(X^m(x_j)),   X^m(x) ≈ x − u(x)δt.         (2.4)
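To fix ideas, here is a minimal C++ sketch of an explicit finite-volume update of the form (2.2), for a one-dimensional scalar conservation law ∂_t w + ∂_x (a w) = 0 with a > 0 and the simple upwind flux Φ(w^-, w^+) = a w^-. The uniform mesh, the periodic boundary conditions and the square-pulse initial data are illustrative assumptions, not the discretization of the paper.

// Sketch: explicit finite-volume scheme w_j^{m+1} = w_j^m - (dt/dx)(Phi_{j+1/2} - Phi_{j-1/2})
// for dw/dt + d(a w)/dx = 0, a > 0, with the upwind flux Phi(wl, wr) = a*wl.
// Illustrative only: 1D, uniform mesh, periodic boundary conditions.
#include <cstdio>
#include <vector>

int main() {
    const int    n  = 200;          // number of cells
    const double a  = 1.0;          // advection speed
    const double dx = 1.0 / n;
    const double dt = 0.5 * dx / a; // CFL condition dt <= dx/a
    std::vector<double> w(n), wnew(n);

    for (int j = 0; j < n; ++j)     // initial condition: a square pulse
        w[j] = (j > n / 4 && j < n / 2) ? 1.0 : 0.0;

    auto flux = [a](double wl, double wr) { (void)wr; return a * wl; }; // Phi(w,w) = a w = F(w)

    for (int m = 0; m < 200; ++m) {
        for (int j = 0; j < n; ++j) {
            double wl = w[(j - 1 + n) % n];  // periodic neighbours
            double wr = w[(j + 1) % n];
            // flux balance over the cell boundary, as in (2.2)
            wnew[j] = w[j] - dt / dx * (flux(w[j], wr) - flux(wl, w[j]));
        }
        w.swap(wnew);
    }
    std::printf("w[n/2] after 200 steps: %f\n", w[n / 2]);
    return 0;
}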

For these two schemes, if data are assigned to processors by domain decomposition, we see clearly that the communications are limited to the data on the band of elements which touches the boundaries of each block. For the characteristic-Galerkin method one needs to compute all the characteristics of each block and then to transfer the positions of those which are not completed to their neighbor elements; a minimal MPI sketch of such a boundary-band exchange is given at the end of this section.

Operation count in 3D.  Assume that the machine has P processors of speed v, and that the communication cost per byte of data is c. The computation cost is proportional to the number of unknowns in each processor, N^3, and inversely proportional to the processor speed, O(N^3/v). Similarly the communication cost is O(cN^2), so the total computing time in each processor is H = N^3/v + cN^2, giving an efficiency

    eff = (N^3/v) / (cN^2 + N^3/v) = 1 / (1 + cv/N).

Therefore, asymptotically, N ≈ (vH)^{1/3}, the total number of unknowns is PN^3 ≈ vHP, and the efficiency as a function of the computing time is

    eff ≈ 1 / (1 + c v^{2/3} H^{-1/3}).

This means that for a given computer and a given computing time H there is an optimal number of mesh points, which increases with H. The efficiency improves slowly with H but decreases if the processor speed is increased. In short, it is a waste of money to use fast parallel computers for problems which do not fill them.
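As an illustration of the communication pattern just described, here is a minimal MPI sketch in C++: each processor owns a strip of cells plus one layer of ghost cells on each side, and two MPI_Sendrecv calls per time step exchange the boundary values with the left and right neighbours. The one-dimensional strip decomposition and the trivial update rule are assumptions made for brevity; they stand in for the band of boundary elements of a real block.

// Sketch: ghost-band exchange for an explicit update with MPI.
// Each rank owns `local` cells, stored in u[1..local]; u[0] and u[local+1] are ghost cells
// holding copies of the neighbours' boundary values. Compile with: mpicxx ghost.cpp
#include <mpi.h>
#include <vector>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int local = 100;                       // cells owned by this rank
    std::vector<double> u(local + 2, 0.0), unew(local + 2, 0.0);
    u[local / 2] = double(rank + 1);             // some initial data

    int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    for (int step = 0; step < 10; ++step) {
        // exchange the boundary band: send my first owned cell left, receive the right ghost, and vice versa
        MPI_Sendrecv(&u[1],         1, MPI_DOUBLE, left,  0,
                     &u[local + 1], 1, MPI_DOUBLE, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&u[local],     1, MPI_DOUBLE, right, 1,
                     &u[0],         1, MPI_DOUBLE, left,  1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        // purely local explicit update (a three-point average, as a stand-in for (2.2))
        for (int j = 1; j <= local; ++j)
            unew[j] = 0.25 * u[j - 1] + 0.5 * u[j] + 0.25 * u[j + 1];
        u.swap(unew);
    }
    if (rank == 0) std::printf("done on %d processors\n", size);
    MPI_Finalize();
    return 0;
}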

3 Implicit Schemes

To update the pressure in an incompressible Navier-Stokes solver, it is difficult to avoid the use of an implicit scheme. The equation for the pressure and its


discrete counterpart are

    −∆p = f,   ∂p/∂n|_Γ = 0,   ∫_Ω p = 0,                        (3.1)

    Ap = F,   A_{ij} = ∫_Ω ∇w_i · ∇w_j,   F_j = ∫_Ω f w_j,       (3.2)

where {w_j}_1^N are the basis functions of the discrete space. Solved by an iterative process such as the conjugate gradient algorithm, or for simplicity the Richardson method,

    p^{k+1} = p^k + F − Ap^k,                                    (3.3)

one has just to make the vector-matrix multiplication parallel, and the communication is also limited to the data on a band of elements at the boundary of the subdomains. However, without preconditioning, iterative methods are very slow for such large systems. Thus, taking care of parallelism only at the vector-matrix multiplication level may not be the best strategy.

3.1 Schwarz Method with overlap

Another way is to rethink the algorithm and use Schwarz iterations. Take the case of two subdomains Ω_1 and Ω_2 and call Γ_{ij} the part of the boundary of Ω_i which is inside Ω_j (figure 3). Because the Laplacian is a second-order elliptic operator, the problem is solved if one has

    −∆u_1 = f in Ω_1,      −∆u_2 = f in Ω_2,
    u_1 = u_2,   ∂u_1/∂n_1 = −∂u_2/∂n_2   on Γ_{ij},             (3.4)
    u = 0 on ∂Ω.

The additive Schwarz method consists of iterating on (u_1^n, u_2^n) → (u_1^{n+1}, u_2^{n+1}) with

    −∆u_1^{n+1} = f in Ω_1,      −∆u_2^{n+1} = f in Ω_2,
    u_1^{n+1} = u_2^n on Γ_12,   u_2^{n+1} = u_1^n on Γ_21.

The rate of convergence of this algorithm depends on the relative overlap

    δ = max_{i=1,2} diameter(Ω_1 ∩ Ω_2) / diameter(Ω_i),

and

    ‖u^n − u^*‖ ≤ ( Cδ / (1 + Cδ) )^n.                           (3.5)

But when there are many subdomains the constant C becomes large and a coarse grid solver is needed for optimal preconditioning [17].
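A minimal sketch of the Schwarz iteration above, in C++ and in one dimension: −u'' = f on (0,1) with u(0) = u(1) = 0 is solved on two overlapping subdomains (0, b) and (a, 1), a < b, each local Dirichlet problem being solved exactly by a tridiagonal (Thomas) solve. The mesh, the overlap and the right-hand side are illustrative choices, not those of the paper.

// Sketch: additive Schwarz with overlap for -u'' = f on (0,1), u(0)=u(1)=0, f = 1.
// Two overlapping subdomains (0, b) and (a, 1), a < b; each Dirichlet sub-problem is
// solved exactly with the Thomas algorithm. Illustrative 1D finite differences only.
#include <cmath>
#include <cstdio>
#include <vector>

// Solve -u'' = f on the nodes lo..hi of a uniform grid of spacing h,
// with u[lo] = uL and u[hi] = uR prescribed; the result is written into u[lo..hi].
static void dirichletSolve(std::vector<double>& u, const std::vector<double>& f,
                           int lo, int hi, double uL, double uR, double h) {
    int m = hi - lo - 1;                         // number of interior unknowns
    std::vector<double> c(m), d(m);              // Thomas algorithm workspace
    double diag = 2.0 / (h * h), off = -1.0 / (h * h);
    for (int k = 0; k < m; ++k) {
        double rhs = f[lo + 1 + k];
        if (k == 0)     rhs -= off * uL;         // move the boundary values to the right-hand side
        if (k == m - 1) rhs -= off * uR;
        double denom = (k == 0) ? diag : diag - off * c[k - 1];
        c[k] = off / denom;
        d[k] = (rhs - (k == 0 ? 0.0 : off * d[k - 1])) / denom;
    }
    u[lo] = uL; u[hi] = uR;
    for (int k = m - 1; k >= 0; --k)             // back substitution
        u[lo + 1 + k] = d[k] - (k == m - 1 ? 0.0 : c[k] * u[lo + 2 + k]);
}

int main() {
    const int n = 100;                 // global grid 0..n, h = 1/n
    const double h = 1.0 / n;
    const int i1 = 40, i2 = 60;        // subdomain 1 = nodes 0..i2, subdomain 2 = nodes i1..n
    std::vector<double> f(n + 1, 1.0);
    std::vector<double> u1(n + 1, 0.0), u2(n + 1, 0.0);

    for (int it = 0; it < 50; ++it) {
        double g1 = u2[i2];            // trace of the other iterate on Gamma_12 (x = b)
        double g2 = u1[i1];            // trace of the other iterate on Gamma_21 (x = a)
        dirichletSolve(u1, f, 0,  i2, 0.0, g1, h);   // -u1'' = f in (0,b), u1(b) = u2^n(b)
        dirichletSolve(u2, f, i1, n,  g2, 0.0, h);   // -u2'' = f in (a,1), u2(a) = u1^n(a)
        int mid = (i1 + i2) / 2;       // monitor the mismatch of the two iterates in the overlap
        if (it % 10 == 0)
            std::printf("iter %2d  |u1-u2|(overlap) = %.2e\n", it, std::fabs(u1[mid] - u2[mid]));
    }
    // exact solution of -u'' = 1, u(0)=u(1)=0 is u(x) = x(1-x)/2
    std::printf("u1(0.25) = %.5f  exact = %.5f\n", u1[n / 4], 0.25 * 0.75 / 2);
    return 0;
}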

3.2 Non-overlapping subdomains: the Schur Complement Method

Suppose now that the domain Ω is partitioned into two subdomains without overlap and call Γ the interface. If u|_Γ is known, then the problem decouples on each subdomain:

    −∆u_i = f in Ω_i,   u_i = 0 on ∂Ω_i\Γ,   u_i = u|_Γ on Γ.    (3.6)

Now consider the Steklov-Poincaré operator

    S : L^2(Ω) × H^{1/2}_{00}(Γ) → H^{−1/2}(Γ),
    S(f, u|_Γ) = ( ∂u_1/∂n_1 + ∂u_2/∂n_2 )|_Γ.                   (3.7)

Then u|_Γ satisfies a linear system at the interface,

    S(0, u|_Γ) = −S(f, 0),                                       (3.8)

which can be solved after discretization by an iterative method like the preconditioned conjugate gradient algorithm.
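In discrete form, the interface equation (3.8) becomes a small system in the interface unknowns, the Schur complement system. Below is a minimal C++ sketch for the 1D model problem −u'' = f on (0,1) split at one interior node; the interface "system" is then a single scalar equation, solved here directly. The finite-difference setting and the exact subdomain solves are illustrative assumptions.

// Sketch: Schur complement solve for -u'' = f on (0,1), u(0)=u(1)=0,
// with two non-overlapping subdomains separated by the single interface node k.
// The interface system S u_G = g is a scalar equation here. 1D finite differences only.
#include <cstdio>
#include <vector>

// Solve the tridiagonal system (-x[i-1] + 2 x[i] - x[i+1])/h^2 = rhs[i], i = 0..m-1,
// with homogeneous Dirichlet values outside the range (Thomas algorithm).
static std::vector<double> triSolve(std::vector<double> rhs, double h) {
    int m = (int)rhs.size();
    double diag = 2.0 / (h * h), off = -1.0 / (h * h);
    std::vector<double> c(m), x(m);
    for (int i = 0; i < m; ++i) {
        double denom = (i == 0) ? diag : diag - off * c[i - 1];
        c[i] = off / denom;
        rhs[i] = (rhs[i] - (i == 0 ? 0.0 : off * rhs[i - 1])) / denom;
    }
    for (int i = m - 1; i >= 0; --i)
        x[i] = rhs[i] - (i == m - 1 ? 0.0 : c[i] * x[i + 1]);
    return x;
}

int main() {
    const int n = 100, k = 60;              // nodes 0..n, interface at node k
    const double h = 1.0 / n;
    const double off = -1.0 / (h * h), diag = 2.0 / (h * h);

    // interior unknowns of each subdomain: nodes 1..k-1 and k+1..n-1, right-hand side f = 1
    std::vector<double> f1(k - 1, 1.0), f2(n - 1 - k, 1.0);

    // coupling columns A_{1G}, A_{2G}: only the node adjacent to the interface couples (value -1/h^2)
    std::vector<double> e1(k - 1, 0.0), e2(n - 1 - k, 0.0);
    e1.back()  = off;
    e2.front() = off;
    std::vector<double> w1 = triSolve(e1, h), w2 = triSolve(e2, h);  // A_ii^{-1} A_iG
    std::vector<double> z1 = triSolve(f1, h), z2 = triSolve(f2, h);  // A_ii^{-1} f_i

    // scalar Schur complement S = A_GG - A_G1 A_11^{-1} A_1G - A_G2 A_22^{-1} A_2G
    double S = diag - off * w1.back() - off * w2.front();
    double g = 1.0 /* f at the interface */ - off * z1.back() - off * z2.front();
    double uG = g / S;                       // interface value: discrete analogue of (3.8)

    // back-substitution: local Dirichlet solves with u = uG on the interface (parallel in each block)
    std::vector<double> r1 = f1, r2 = f2;
    r1.back()  -= off * uG;
    r2.front() -= off * uG;
    std::vector<double> u1 = triSolve(r1, h), u2 = triSolve(r2, h);

    std::printf("u(interface) = %.6f, exact = %.6f\n", uG, 0.5 * (k * h) * (1.0 - k * h));
    std::printf("u(%.2f) = %.6f, exact = %.6f\n", (k / 2) * h, u1[k / 2 - 1],
                0.5 * ((k / 2) * h) * (1.0 - (k / 2) * h));
    return 0;
}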

3.3 Discretization

The fact that overlapping Schwarz methods require much more data communication has discredited them a little. However, let us note that the origin of the problem plays a role in the selection of the Schwarz method; here too, a-priori and a-posteriori parallelism will orient the choice between overlapping and non-overlapping meshes, and, among the overlapping ones, between meshes which coincide or do not coincide in Ω_1 ∩ Ω_2 (figure 4). In Constructive Solid Geometry (CSG), such as used by VRML (Virtual Reality Modeling Language) [9], the object is given as the result of set operations on simple shapes, and then domain decomposition leads naturally to non-compatible meshes. Similarly, in the Chimera approach [10], the complete object being too hard to triangulate, it is decomposed into simpler shapes and non-compatible meshes are used. For non-compatible overlapping meshes a modified version of Schwarz' algorithm has been proposed by J.L. Lions et al. in [13], and an extension which uses "Virtual Controls" is given in [?] (we developed the software freefem+ [12] especially to evaluate these new formulations). For instance

    −∆u = f,   u ∈ H_0^1(Ω),                                     (3.9)

is replaced by

    min_v Σ_i ∫_{∂Ω_i} u_i^2  :  −∆u_i = f + (−1)^i v 1_O,   u_i ∈ H^1(Ω_i),   u_i|_Γ = 0,    (3.10)

where O is any set inside Ω_1 ∩ Ω_2 and 1_O its characteristic function.

3.4 Preconditioners

Let us come back to the non-overlapping case (3.8). It is solved by an iterative method such as the preconditioned conjugate gradient algorithm. A "good preconditioner" must be such that

• the rate of convergence of the preconditioned system is as independent as possible of the mesh size and of the size of the subdomains,
• the algorithmic complexity of the resolution of a linear system with the preconditioner is low,
• the cost of the latter resolution scales down well as the number of processors is increased.

Example: in the case without overlap, the matrix S has a condition number of order O(1/h), where h is the mesh size; for the matrix S preconditioned with a Neumann-Neumann preconditioner [15], see below, the condition number behaves like (1/H^2)(1 + log(H/h)), where H is the diameter of the subdomains. The convergence rate depends very little on the number of unknowns contained in the subdomains, but when the number of subdomains is increased the performance of the method deteriorates. For that reason it is necessary to include a coarse level correction, and then the condition number behaves like (1 + log(H/h)), which is almost perfect.

The Neumann-Neumann Preconditioner.  The basic idea (see Glowinski [14] and Le Tallec [15]) is to approach the inverse of S(0, ·) by T:

    T : H^{−1/2}(Γ) → H^{1/2}_{00}(Γ),   T(g) = (1/2)(v_1 + v_2)|_Γ,          (3.11)

where

    −∆v_i = 0 in Ω_i,   v_i = 0 on ∂Ω_i\Γ,   ∂v_i/∂n_i = g on Γ.              (3.12)

Remark 1. If Ω_1 = Ω_2 then T is exactly the inverse of S(0, ·).

Remark 2. This preconditioner is almost optimal if a coarse level correction is added (multigrid effect). Extensions to convection-diffusion operators are available in Achdou [16] and Nataf [18].

4 Mortar Elements for the Navier-Stokes equations

4.1 The discretization

Another discretization is possible with mortar elements, as introduced in Bernardi et al. [6]. Applied to the finite element method, it provides a way to impose a matching condition between functions which are defined on non-overlapping subdomains with non-matching grids (see figure 4). This can be combined with an implicit time scheme for the Navier-Stokes equations, first or second order in time, which leads to constant symmetric linear operators when the characteristic-Galerkin method is used for the convection terms. In the subdomains the discretizations are independent, and the meshes do not coincide on the two sides of a common boundary. The grid is structured in each block but non-conforming between blocks. In each block, a Cartesian


grid is used to apply a piecewise bilinear finite element method (overparametric Q1-FEM) to the pressure variables. An overparametric Q1-FEM is also applied to the velocity variables, but on a twice finer grid, so that an inf-sup compatibility condition between the discrete velocity and pressure spaces is satisfied. For the velocity and the pressure, weak matching conditions are imposed at the boundaries of the blocks, so the method is non-conforming. The incompressibility condition is handled by projection. In two dimensions, a stream function-vorticity formulation is used with a special boundary condition for the vorticity to decouple the operators [7].

The Scheme.  Finally the scheme is

• The convection-diffusion step:

    (1/δt)(ũ^{n+1} − u^n) + (1/δt)(ũ^n − ũ^n ∘ X^n) − ν∆ũ^{n+1} = f,   + B.C.

• The projection step:

    u^{n+1} = ũ^{n+1} − δt ∇p^{n+1},   −∆p^{n+1} = −(1/δt) ∇ · ũ^{n+1},   ∂p^{n+1}/∂n given on Γ.

At each time step one needs to

• compute the streamlines for the method of characteristics (a sketch follows below),
• solve three uncoupled (symmetric) PDEs for the velocity: −νδt ∆u_i + u_i = f_i,
• solve a Poisson problem for the pressure: −∆p = f, ∂p/∂n = g.
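The first of these three tasks, following the characteristics backwards, amounts to the update v^{m+1}(x_j) = v^m(x_j − u(x_j)δt) of (2.4). A minimal C++ sketch on one Cartesian block is given below; the uniform grid, the prescribed rotating velocity field and the bilinear interpolation are illustrative assumptions, not the actual mortar discretization.

// Sketch: one characteristic (semi-Lagrangian) step v^{m+1}(x_j) = v^m(x_j - u(x_j) dt)
// on a Cartesian block, with bilinear interpolation of v^m at the foot of the characteristic.
// Illustrative only: uniform grid on [0,1]^2 and a prescribed rotating velocity field.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    const int nx = 64, ny = 64;
    const double hx = 1.0 / (nx - 1), hy = 1.0 / (ny - 1), dt = 0.01;
    std::vector<double> v(nx * ny), vnew(nx * ny);
    auto at = [nx](int i, int j) { return j * nx + i; };

    for (int j = 0; j < ny; ++j)                 // initial condition: a Gaussian bump
        for (int i = 0; i < nx; ++i) {
            double x = i * hx, y = j * hy;
            v[at(i, j)] = std::exp(-50.0 * ((x - 0.7) * (x - 0.7) + (y - 0.5) * (y - 0.5)));
        }

    for (int j = 0; j < ny; ++j)
        for (int i = 0; i < nx; ++i) {
            double x = i * hx, y = j * hy;
            double ux = -(y - 0.5), uy = (x - 0.5);          // rotating velocity field
            double xf = x - ux * dt, yf = y - uy * dt;       // foot of the characteristic X^m(x)
            xf = std::min(1.0, std::max(0.0, xf));           // keep the foot inside the block
            yf = std::min(1.0, std::max(0.0, yf));
            int i0 = std::min(nx - 2, int(xf / hx));         // cell containing the foot
            int j0 = std::min(ny - 2, int(yf / hy));
            double s = xf / hx - i0, t = yf / hy - j0;       // local coordinates in [0,1]
            vnew[at(i, j)] =                                 // bilinear interpolation of v^m
                (1 - s) * (1 - t) * v[at(i0, j0)]     + s * (1 - t) * v[at(i0 + 1, j0)] +
                (1 - s) * t       * v[at(i0, j0 + 1)] + s * t       * v[at(i0 + 1, j0 + 1)];
        }
    v.swap(vnew);
    std::printf("v at the centre after one step: %f\n", v[at(nx / 2, ny / 2)]);
    return 0;
}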

The method is naturally parallel and uses fast multilevel-type preconditioners within each block. Moreover, the mesh can be adapted to the flow in each processor independently. Unfortunately, such a scheme is somewhat difficult to program; it requires new mesh generators and new graphic routines, and there is a stability problem particular to Navier-Stokes: the discontinuities at the interfaces create vorticity which, at high Reynolds number, takes a long time to disappear by viscous effects.

An Example with Mortar Elements.  Consider in two dimensions the PDE, where Dirichlet conditions are treated by choosing γ ≫ 1 and appropriate quadratures,

    u − µ∆u = f in Ω,   γu + µ ∂u/∂n = g on ∂Ω.                  (4.1)

Suppose that each Ω_k is provided with its own discretization X_{k,h} of H^1(Ω_k), and call X_h = Π_k X_{k,h}. Let E be the set of all the edges of the subdomains not


contained in ∂Ω. Consider also, for e ∈ E, a discrete space L_{e,h} contained in L^2(e), and call L_h = Π_e L_{e,h}. Call Y_h the space

    Y_h = { v_h ∈ X_h ;  ∀e ∈ E, ∀µ_h ∈ L_{e,h},  ∫_e [v_h] µ_h = 0 }.

The Mortar Element Method consists of finding u_h ∈ Y_h such that

    ∀v_h ∈ Y_h,   a(u_h, v_h) = ∫_Ω f v_h + ∫_{∂Ω} g v_h,

where

    a(u, w) = Σ_i ( ∫_{Ω_i} u w + µ ∇u · ∇w ) + ∫_{∂Ω} γ u w.    (4.2)

Calling b the bilinear form on X_h × L_h,

    b(v_h, µ_h) = Σ_e ∫_e [v_h] µ_h,

it is also possible to consider the saddle point problem of finding (u_h, λ_h) ∈ X_h × L_h such that

    ∀v_h ∈ X_h,   a(u_h, v_h) + b(v_h, λ_h) = ∫_Ω f v_h + ∫_{∂Ω} g v_h,
    ∀µ_h ∈ L_h,   b(u_h, µ_h) = 0.
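The only non-standard ingredient is the assembly of the coupling terms ∫_e φ_i µ_j between the trace of a subdomain basis function φ_i and a multiplier basis function µ_j when the two sides of the edge e carry different one-dimensional meshes. A minimal C++ sketch is given below for piecewise linear functions; the two sample meshes and the Gauss rule are illustrative assumptions.

// Sketch: assembly of the mortar coupling matrix M_ij = \int_e phi_i mu_j on one edge e,
// where phi_i are the P1 hat functions of the trace mesh on one side of e and mu_j the P1
// hat functions of the (non-matching) mesh on the other side. Integration is exact: the
// union of the two sets of breakpoints is formed and a 2-point Gauss rule is used on each
// sub-interval (products of piecewise linear functions are quadratic there).
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// value at x of the P1 hat function attached to node k of the 1D mesh `nodes`
static double hat(const std::vector<double>& nodes, int k, double x) {
    double xl = (k > 0) ? nodes[k - 1] : nodes[k];
    double xr = (k + 1 < (int)nodes.size()) ? nodes[k + 1] : nodes[k];
    if (x <= xl || x >= xr) return (x == nodes[k]) ? 1.0 : 0.0;
    return (x < nodes[k]) ? (x - xl) / (nodes[k] - xl) : (xr - x) / (xr - nodes[k]);
}

int main() {
    std::vector<double> side1 = {0.0, 0.25, 0.5, 0.75, 1.0};        // trace mesh of block 1
    std::vector<double> side2 = {0.0, 1.0 / 3.0, 2.0 / 3.0, 1.0};   // non-matching mesh of block 2

    std::vector<double> pts(side1);                                 // merged breakpoints
    pts.insert(pts.end(), side2.begin(), side2.end());
    std::sort(pts.begin(), pts.end());
    pts.erase(std::unique(pts.begin(), pts.end()), pts.end());

    const double g = 1.0 / std::sqrt(3.0);                          // 2-point Gauss rule on [-1,1]
    std::vector<std::vector<double>> M(side1.size(), std::vector<double>(side2.size(), 0.0));

    for (size_t s = 0; s + 1 < pts.size(); ++s) {
        double a = pts[s], b = pts[s + 1], mid = 0.5 * (a + b), half = 0.5 * (b - a);
        for (double q : {-g, g}) {
            double x = mid + half * q, w = half;                    // Gauss point and weight
            for (size_t i = 0; i < side1.size(); ++i)
                for (size_t j = 0; j < side2.size(); ++j)
                    M[i][j] += w * hat(side1, (int)i, x) * hat(side2, (int)j, x);
        }
    }
    for (size_t i = 0; i < side1.size(); ++i) {
        for (size_t j = 0; j < side2.size(); ++j) std::printf("%8.4f ", M[i][j]);
        std::printf("\n");
    }
    return 0;
}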

When working with first order elements in the subdomains, it is possible to construct L_h(e) from the traces on e of the functions of one of the adjacent subdomains. Well-posedness requires that the space L_h should not be too large, so for w_h in L_h one requires that ∂_s w_h = 0 at both ends of a given edge. For higher order elements and spectral methods, see [6]. The method can also be generalized to three dimensions.

The linear system.  The problem is of the type

    BU = 0,   and   ∀W with BW = 0,   W^T A U = W^T F.           (4.3)

So by the separation theorem, there exists Λ such that

    AU + B^T Λ = F,   BU = 0,                                    (4.4)

i.e.

    [ A   B^T ] [ U ]   [ F ]
    [ B    0  ] [ Λ ] = [ 0 ],                                   (4.5)

with A block-diagonal, A = diag(A_11, A_22, A_33, ...). When Λ is known, AU = F − B^T Λ is a local system in each block, which can be solved in parallel. After eliminating U, one obtains

    B A^{−1} B^T Λ = B A^{−1} F.                                 (4.6)
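A minimal C++ illustration of (4.4)-(4.6) with two decoupled blocks and a single constraint: A is block-diagonal, B has one row enforcing equality of one unknown of each block, and the system is solved by forming the (here scalar) interface operator BA^{-1}B^T. The tiny dense blocks and the single constraint are illustrative assumptions; in the real method each A_kk is a large sparse block solved by a fast local solver.

// Sketch: solving the saddle point system (4.5) with a block-diagonal A and one constraint row B,
// by the elimination (4.6): solve B A^{-1} B^T Lambda = B A^{-1} F, then A U = F - B^T Lambda.
// Two 2x2 blocks and a single multiplier; every solve with A is local to a block (hence parallel).
#include <cstdio>

// solve a 2x2 system a x = b (Cramer's rule); the blocks stand in for the local subdomain matrices
static void solve2(const double a[2][2], const double b[2], double x[2]) {
    double det = a[0][0] * a[1][1] - a[0][1] * a[1][0];
    x[0] = ( a[1][1] * b[0] - a[0][1] * b[1]) / det;
    x[1] = (-a[1][0] * b[0] + a[0][0] * b[1]) / det;
}

int main() {
    // A = diag(A11, A22), unknowns U = (u0, u1 | u2, u3)
    const double A11[2][2] = {{2.0, -1.0}, {-1.0, 2.0}};
    const double A22[2][2] = {{2.0, -1.0}, {-1.0, 2.0}};
    const double F[4]      = {1.0, 0.0, 0.0, 2.0};
    const double B[4]      = {0.0, 1.0, -1.0, 0.0};   // one constraint: u1 - u2 = 0

    // y = A^{-1} F and w = A^{-1} B^T, block by block (these solves are independent)
    double y[4], w[4];
    solve2(A11, &F[0], &y[0]);  solve2(A22, &F[2], &y[2]);
    solve2(A11, &B[0], &w[0]);  solve2(A22, &B[2], &w[2]);

    // interface operator B A^{-1} B^T (a scalar here) and right-hand side B A^{-1} F, cf. (4.6)
    double s = 0.0, g = 0.0;
    for (int i = 0; i < 4; ++i) { s += B[i] * w[i]; g += B[i] * y[i]; }
    double lambda = g / s;

    // local back-substitution U = A^{-1}(F - B^T lambda) = y - lambda * w, block by block
    double U[4], constraint = 0.0;
    for (int i = 0; i < 4; ++i) { U[i] = y[i] - lambda * w[i]; constraint += B[i] * U[i]; }

    std::printf("U = (%.4f, %.4f, %.4f, %.4f), lambda = %.4f, BU = %.1e\n",
                U[0], U[1], U[2], U[3], lambda, constraint);
    return 0;
}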


Remark 3. If A = −∆, special care must be taken because the local matrices are singular.

4.2 Preconditioning

To precondition an iterative algorithm applied to (4.6), one must find a matrix spectrally equivalent to (4.5). If A is positive definite, a good preconditioner is

    [ R_V   0  ]        [ A        0       ]
    [  0   R_Λ ]   ≈    [ 0   B A^{−1} B^T ],   with  R_V ∼ A,  R_Λ ∼ B A^{−1} B^T.

We take R_V with the same block structure as A,

    R_V = diag(R_1, ..., R_K),

and we can choose

    R_k^{−1} = P̂_k H_k P̂_k + (1/(εh_k^3)) P_k

in the case where A ≡ −∆u + εu = 0, ε