© 1998 Society for Industrial and Applied Mathematics

SIAM J. SCI. COMPUT. Vol. 19, No. 5, pp. 1606–1624, September 1998


A FAST POISSON SOLVER FOR THE FINITE DIFFERENCE SOLUTION OF THE INCOMPRESSIBLE NAVIER–STOKES EQUATIONS∗

GENE H. GOLUB†, LAN CHIEH HUANG‡, HORST SIMON§, AND WEI-PAI TANG¶

Abstract. In this paper, a fast direct solver for the Poisson equation on the half-staggered grid is presented. The Poisson equation results from the projection method of the finite difference solution of the incompressible Navier–Stokes equations. To achieve our goal, new algorithms for diagonalizing a semidefinite pair are developed. The fast solver can also be extended to the three-dimensional case. The motivation and related issues in using this half-staggered grid are also discussed.

Key words. fast solver, generalized eigenvalues, Poisson equation, incompressible Navier–Stokes equations

AMS subject classifications. 65F05, 65F15, 65M06, 76D05, 76M20

PII. S1064827595285299

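1. Introduction. Consider the unsteady incompressible Navier–Stokes equations (INSE)

$$\frac{\partial w}{\partial t} + u\frac{\partial w}{\partial x} + v\frac{\partial w}{\partial y} + \operatorname{grad} p = \alpha \operatorname{div} \operatorname{grad} w, \tag{1}$$

$$\operatorname{div} w = 0, \tag{2}$$

where $w = (u, v)'$ in a two-dimensional region $\Omega$, with initial condition

$$w(x, y, 0) = w_0(x, y) \quad \text{in } \Omega$$

satisfying the constraint condition (2), and boundary condition

$$w(x, y, t) = w_b(x, y, t) \quad \text{on } \partial\Omega$$

satisfying the consistency condition

$$\oint_{\partial\Omega} \frac{\partial w}{\partial n}\, ds = 0. \tag{3}$$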

∗ Received by the editors April 26, 1995; accepted for publication (in revised form) November 25, 1996; published electronically April 28, 1998. This work was supported by the Natural Sciences and Engineering Research Council of Canada, the Information Technology Research Centre, which is funded by the Province of Ontario, NASA contract NAS 2-13721, NSF grant CCR-9505393, the National Key Projects of Fundamental Research of China, and the National Natural Science Foundation of China. http://www.siam.org/journals/sisc/19-5/28529.html
† Department of Computer Science, Stanford University, Stanford, CA 94305 (golub@sccm.stanford.edu).
‡ Laboratory for Scientific and Engineering Computing, Institute of Computational Mathematics, Chinese Academy of Sciences, Beijing 100080, China.
§ NERSC Division, Lawrence Berkeley National Laboratory, Mail Stop 50A/5104, University of California, Berkeley, CA 94720 ([email protected]).
¶ Department of Computer Science, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1 ([email protected]).

The main difficulty in the solution of this problem is that the unsteady INSE is not entirely evolutionary; it is subject to the divergence-free constraint (2). The projection method, proposed by Chorin [4] and Temam [23], has proved to be very effective, and its variants are widely used in many applications. With this method, some auxiliary velocity is found and then projected into the divergence-free space via the solution of a Poisson equation for some form of the pressure. For example, let the discretization of (1) and (2), after linearization of nonlinear terms, be

$$\frac{\Delta w}{\Delta t} + A\,\Delta w + \nabla\phi = E, \tag{4}$$

$$\nabla \cdot w = 0, \tag{5}$$

where $\Delta w = w^{n+1} - w^n$; $A$ represents the discretization of convection and diffusion; $\nabla$ and $\nabla\cdot$ are the discretizations of grad and div, respectively; $\phi = p^{n+1} - p^n$ is the pressure increment; and $E$ includes all known terms at the $n$th level. With an approximate factorization, (4) is put into the following fractional steps:

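$$\frac{\Delta \tilde{w}}{\Delta t} + A\,\Delta \tilde{w} = E, \tag{6}$$

where $\Delta \tilde{w} = \tilde{w} - w^n$ ($\tilde{w}$ is called the auxiliary velocity) and

$$\frac{w^{n+1} - \tilde{w}}{\Delta t} + \nabla\phi = 0. \tag{7}$$

Now, applying the discrete divergence to (7), we obtain the discrete Poisson equation for $\phi$:

$$\nabla^2 \phi = \frac{\nabla \cdot \tilde{w}}{\Delta t}, \tag{8}$$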

in which the discrete divergence-free condition (5), $\nabla \cdot w^{n+1} = 0$, has been enforced. The solution process is then:
• find $\tilde{w}$ by (6);
• solve the discrete Poisson equation (8) for $\phi$;
• update $w^{n+1}$ by (7).
For a more detailed account of the boundary treatment, see [11].

The constraint (5) causes difficulties in the choice of the grid, the coordinate system, the related discrete Poisson equation, and other aspects of the solution method. For a rectangular region $\Omega$, the usual staggered grid ($u$, $v$, and $p$ staggered as shown in Fig. 1.1) is most frequently used. With this grid, no pressure boundary condition is needed, which is mathematically correct. Let the discretized linear system of equations of (8) be denoted as

$$L\Phi = R, \tag{9}$$

where Φ is the unknown vector of size, say, M . With this grid, L is of rank M − 1; the constraint on the right-hand side R reduces to an approximation of (3), and the solution Φ is unique up to a constant. For nonuniform grid intervals, L may not be symmetric, and direct solvers from the FISHPACK have proved to be most efficient and convenient; see [2, 19], for example. Also, for the usual staggered grid, the finite difference approximation of (1) is the same as the finite volume approximation,

[Fig. 1.1. Staggered grid.]

[Fig. 1.2. Half-staggered grid.]

and the computations of ∇p and ∇ · w are straightforward without any interpolation. However, u and v involve different finite volumes, and near the boundary, half-interval differencing is necessary. More serious is the problem on curvilinear grids in arbitrary regions; there the extension of the usual staggered grid will lose many of the above advantages (see [20], for instance). Hence, the half-staggered grid (as referred to by [17] with w, p staggered as shown in Fig. 1.2) was studied in [12]. This grid was first proposed in [6] for the unsteady INSE; it was used in [1] with a different discretization of projection, as a recent example; but in general it is not used in practical computation. For steady-state INSE, this grid was used and investigated for the finite element method (FEM) in [18] and for the Galerkin formulation in [21].


The advantages of this grid for the unsteady INSE are: no pressure boundary condition is required; the finite difference approximation of (1) is the same as the finite volume approximation with the same finite volume for $u$ and $v$; $\nabla p$ and $\nabla \cdot w$ involve only simple averages; and there is no half-interval differencing near the boundary. Most important is the extension of this staggering to curvilinear grids in arbitrary regions. As the velocity components are at the same point, they are easily transformed to contravariant components for finite volume formation, for approximation of div $w$, and for boundary conditions, etc. The disadvantage of this grid is in the nature of the discrete Poisson equation (8), where $L$ is of rank $M - 2$. There is an additional constraint and the solution $\Phi$ may have oscillations; see the next section. As a first step toward this problem, we consider rectangular regions with nonuniform grids. Since (9) is to be solved at every time step and is the CPU-intensive part of the algorithm for the unsteady INSE, efficient solvers are essential, and for the convenience of users, direct solvers are considered in this paper.

2. Discretization. Consider a rectangular region with a nonuniform grid, schematically shown in Fig. 2.1. We generate the nonuniform grid by some smooth transformation $x(\xi)$, $y(\eta)$, with a uniform grid in the computational $\xi\eta$ region, and approximate

$$\frac{\partial}{\partial x} = \frac{d\xi}{dx}\frac{\partial}{\partial \xi} \quad \text{by} \quad \frac{\delta}{\delta x} = \frac{\Delta\xi}{\delta x}\frac{\delta}{\Delta\xi},$$

where $\delta$ on the right-hand side denotes centered differencing, and similarly for $\partial/\partial y$. Then for every interior $w$ point,

$$\nabla\phi\,|_{j+\frac12,k+\frac12} = \begin{pmatrix} \dfrac{\delta\phi}{\delta x} \\[2mm] \dfrac{\delta\phi}{\delta y} \end{pmatrix}_{j+\frac12,k+\frac12} = \frac12 \begin{pmatrix} \dfrac{\phi_{j+1,k} - \phi_{j,k}}{x_{j+1} - x_j} + \dfrac{\phi_{j+1,k+1} - \phi_{j,k+1}}{x_{j+1} - x_j} \\[3mm] \dfrac{\phi_{j,k+1} - \phi_{j,k}}{y_{k+1} - y_k} + \dfrac{\phi_{j+1,k+1} - \phi_{j+1,k}}{y_{k+1} - y_k} \end{pmatrix}, \tag{10}$$

and (7) can be written as

$$\frac{1}{\Delta t}\bigl(W^{n+1} - \widetilde{W}\bigr) + G\Phi = 0, \tag{11}$$

where $G$ is the gradient matrix for the solution vector $\Phi$. For every interior $\phi$ point, or rather every $(j, k)$ cell, the most straightforward approximation of div $w$ is

$$\begin{aligned} \nabla \cdot w\,|_{j,k} = {} & \frac{1}{x_{j+\frac12} - x_{j-\frac12}} \cdot \frac12 \Bigl[ \bigl(u_{j+\frac12,k+\frac12} + u_{j+\frac12,k-\frac12}\bigr) - \bigl(u_{j-\frac12,k+\frac12} + u_{j-\frac12,k-\frac12}\bigr) \Bigr] \\ & + \frac{1}{y_{k+\frac12} - y_{k-\frac12}} \cdot \frac12 \Bigl[ \bigl(v_{j+\frac12,k+\frac12} + v_{j-\frac12,k+\frac12}\bigr) - \bigl(v_{j+\frac12,k-\frac12} + v_{j-\frac12,k-\frac12}\bigr) \Bigr], \end{aligned} \tag{12}$$

which results in matrix form

$$DW = -B. \tag{13}$$

Here $D$ is a matrix form of $\nabla\cdot$ with boundary modifications; i.e., for cells adjacent to the boundary, some of the $w$ values in (12) are known and put into $B$. Hence, the final discrete system of equations is

$$L\Phi = DG\Phi = R = \frac{1}{\Delta t}\bigl(D\widetilde{W} + B\bigr). \tag{14}$$
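The step from (11) and (13) to (14) is short but easy to lose track of. The following lines spell it out as a reconstruction that uses only the symbols defined above:

```latex
\begin{aligned}
D\Bigl[\tfrac{1}{\Delta t}\bigl(W^{n+1}-\widetilde{W}\bigr) + G\Phi\Bigr] &= 0
  && \text{(apply $D$ to (11))} \\
DG\Phi &= \tfrac{1}{\Delta t}\bigl(D\widetilde{W} - DW^{n+1}\bigr) \\
       &= \tfrac{1}{\Delta t}\bigl(D\widetilde{W} + B\bigr)
  && \text{(use $DW^{n+1} = -B$ from (13))}
\end{aligned}
```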

[Fig. 2.1. Nonuniform grid and the stencil.]

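After some tedious derivations, the matrix $L$ is put into the following form:

$$L = DG = \left[ C_n \otimes D_x A_m + D_y B_n \otimes C_m \right],$$

which is a nine-diagonal unsymmetric matrix. Letting $dx_i = x_{i+1} - x_i$, $dy_i = y_{i+1} - y_i$, $dx_{i,i+1} = (dx_{i+1} + dx_i)/2$, and $dy_{i,i+1} = (dy_{i+1} + dy_i)/2$, the matrices in the expression of $L$ can be written as

$$D_x = \begin{pmatrix} \frac{1}{dx_1} & & & \\ & \frac{1}{dx_2} & & \\ & & \ddots & \\ & & & \frac{1}{dx_m} \end{pmatrix}_{m \times m}, \qquad D_y = \begin{pmatrix} \frac{1}{dy_1} & & & \\ & \frac{1}{dy_2} & & \\ & & \ddots & \\ & & & \frac{1}{dy_n} \end{pmatrix}_{n \times n},$$

$$C_l = \begin{pmatrix} 1 & 1 & & & \\ 1 & 2 & 1 & & \\ & \ddots & \ddots & \ddots & \\ & & 1 & 2 & 1 \\ & & & 1 & 1 \end{pmatrix}_{l \times l},$$

$$A_m = \begin{pmatrix} -\frac{1}{dx_{1,2}} & \frac{1}{dx_{1,2}} & & & \\ \frac{1}{dx_{1,2}} & -\left(\frac{1}{dx_{1,2}} + \frac{1}{dx_{2,3}}\right) & \frac{1}{dx_{2,3}} & & \\ & \ddots & \ddots & \ddots & \\ & & \frac{1}{dx_{m-2,m-1}} & -\left(\frac{1}{dx_{m-2,m-1}} + \frac{1}{dx_{m-1,m}}\right) & \frac{1}{dx_{m-1,m}} \\ & & & \frac{1}{dx_{m-1,m}} & -\frac{1}{dx_{m-1,m}} \end{pmatrix}_{m \times m},$$

$$B_n = \begin{pmatrix} -\frac{1}{dy_{1,2}} & \frac{1}{dy_{1,2}} & & & \\ \frac{1}{dy_{1,2}} & -\left(\frac{1}{dy_{1,2}} + \frac{1}{dy_{2,3}}\right) & \frac{1}{dy_{2,3}} & & \\ & \ddots & \ddots & \ddots & \\ & & \frac{1}{dy_{n-2,n-1}} & -\left(\frac{1}{dy_{n-2,n-1}} + \frac{1}{dy_{n-1,n}}\right) & \frac{1}{dy_{n-1,n}} \\ & & & \frac{1}{dy_{n-1,n}} & -\frac{1}{dy_{n-1,n}} \end{pmatrix}_{n \times n}.$$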

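Here an $m \times n$ rectangular grid is assumed (note that $n$ here does not have the same meaning as in the time-stepping process). For the interior nodes, a nine-point stencil is shown in Fig. 2.1; the corresponding coefficients for a node $(i, j)$ in the figure are:

$$\begin{aligned} a_{i,j} &= \frac{1}{dx_i}\frac{1}{dx_{i-1,i}} + \frac{1}{dy_j}\frac{1}{dy_{j-1,j}}, \\ b_{i,j} &= -\frac{1}{dx_i}\left(\frac{1}{dx_{i-1,i}} + \frac{1}{dx_{i,i+1}}\right) + 2\,\frac{1}{dy_j}\frac{1}{dy_{j-1,j}}, \\ c_{i,j} &= \frac{1}{dx_i}\frac{1}{dx_{i,i+1}} + \frac{1}{dy_j}\frac{1}{dy_{j-1,j}}, \\ d_{i,j} &= 2\,\frac{1}{dx_i}\frac{1}{dx_{i-1,i}} - \frac{1}{dy_j}\left(\frac{1}{dy_{j-1,j}} + \frac{1}{dy_{j,j+1}}\right), \\ e_{i,j} &= 2\,\frac{1}{dx_i}\frac{1}{dx_{i,i+1}} - \frac{1}{dy_j}\left(\frac{1}{dy_{j-1,j}} + \frac{1}{dy_{j,j+1}}\right), \\ f_{i,j} &= \frac{1}{dx_i}\frac{1}{dx_{i-1,i}} + \frac{1}{dy_j}\frac{1}{dy_{j,j+1}}, \\ g_{i,j} &= -\frac{1}{dx_i}\left(\frac{1}{dx_{i-1,i}} + \frac{1}{dx_{i,i+1}}\right) + 2\,\frac{1}{dy_j}\frac{1}{dy_{j,j+1}}, \\ h_{i,j} &= \frac{1}{dx_i}\frac{1}{dx_{i,i+1}} + \frac{1}{dy_j}\frac{1}{dy_{j,j+1}}, \\ o_{i,j} &= -\left(a_{i,j} + c_{i,j} + f_{i,j} + h_{i,j}\right). \end{aligned}$$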

It is not difficult to see that $L$ is singular, and in the next section we will show it is of rank $M - 2$. The two basis vectors of the null space $N(L)$ are

$$\phi_{01} = (1, 1, 1, \ldots, 1)^T \otimes (1, 1, 1, \ldots, 1)^T = e_n \otimes e_m,$$

$$\phi_{02} = (1, -1, 1, \ldots, (-1)^{n-1})^T \otimes (-1, 1, -1, \ldots, (-1)^{m})^T = e'_n \otimes (-e'_m),$$

where

$$e_l = (\underbrace{1, 1, 1, \ldots, 1}_{l})^T, \qquad e'_l = (\underbrace{1, -1, 1, \ldots, (-1)^{l-1}}_{l})^T,$$

since $A_m e_m = 0$, $B_n e_n = 0$, and $C_l e'_l = 0$.
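The null vectors above are easy to check numerically. The sketch below assembles $L$ from the Kronecker form given earlier and applies it to $\phi_{01}$ and $\phi_{02}$. The helper names (`second_difference`, `averaging`, `build_L`, `alternating`) and the grid spacings are illustrative assumptions, not part of the paper; the unknowns are ordered with the $x$ index running fastest.

```python
import numpy as np

def second_difference(d):
    """A_m (or B_n) of Section 2, built from the grid spacings d_1..d_l."""
    d = np.asarray(d, dtype=float)
    w = 2.0 / (d[:-1] + d[1:])            # 1 / d_{i,i+1}
    A = np.zeros((len(d), len(d)))
    for i, wi in enumerate(w):
        A[i, i] -= wi
        A[i + 1, i + 1] -= wi
        A[i, i + 1] += wi
        A[i + 1, i] += wi
    return A

def averaging(l):
    """C_l: tridiagonal (1, 2, 1) with corner entries equal to 1."""
    C = 2.0 * np.eye(l) + np.eye(l, k=1) + np.eye(l, k=-1)
    C[0, 0] = C[-1, -1] = 1.0
    return C

def build_L(dx, dy):
    """L = C_n (x) D_x A_m + D_y B_n (x) C_m, with the x index running fastest."""
    dx, dy = np.asarray(dx, dtype=float), np.asarray(dy, dtype=float)
    DxA = np.diag(1.0 / dx) @ second_difference(dx)
    DyB = np.diag(1.0 / dy) @ second_difference(dy)
    return np.kron(averaging(len(dy)), DxA) + np.kron(DyB, averaging(len(dx)))

def alternating(l):
    """e'_l = (1, -1, 1, ...)."""
    return (-1.0) ** np.arange(l)

dx = np.array([1.0, 1.2, 0.8, 1.1, 0.9])   # illustrative spacings, m = 5
dy = np.array([0.7, 1.0, 1.3, 1.0])        # illustrative spacings, n = 4
L = build_L(dx, dy)
phi01 = np.kron(np.ones(len(dy)), np.ones(len(dx)))
phi02 = np.kron(alternating(len(dy)), -alternating(len(dx)))
print("max |L phi01| =", np.abs(L @ phi01).max())
print("max |L phi02| =", np.abs(L @ phi02).max())
print("numerical rank =", np.linalg.matrix_rank(L), "of", L.shape[0])
```

Both residuals should be at rounding level, and the reported numerical rank can be compared with $mn - 2$.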


We note from $\phi_{01}$ that the solution $\phi$ can differ by a constant, which is irrelevant, as our interest is only in $\nabla\phi$. From $\phi_{02}$, we see that $\phi$ can have oscillations, called the checkerboard effect [18], and from (10), we also see that in the present finite differencing the oscillations in $\phi$ do not affect $\nabla\phi$ and hence do not affect the solution $w$. To ensure that the solution of (14) exists, the right-hand side $R$ must satisfy $R \in N(L^T)^\perp$; i.e., $R$ must be orthogonal to the two basis vectors of $N(L^T)$. For a nonuniform grid, the two basis vectors of the null space of $L^T$ are

$$\psi_{01} = dy_n \otimes dx_m, \qquad \psi_{02} = e'_n \otimes (-e'_m),$$

where

$$dx_m = (\underbrace{dx_1, dx_2, \ldots, dx_m}_{m})^T \quad \text{and} \quad dy_n = (\underbrace{dy_1, dy_2, \ldots, dy_n}_{n})^T.$$

This can be verified by multiplying directly by the matrix $L^T$:

$$\begin{aligned} L^T \psi_{01} &= \left[ C_n \otimes A_m D_x + B_n D_y \otimes C_m \right] dy_n \otimes dx_m \\ &= C_n\, dy_n \otimes A_m D_x\, dx_m + B_n D_y\, dy_n \otimes C_m\, dx_m = 0, \\ L^T \psi_{02} &= \left[ C_n \otimes A_m D_x + B_n D_y \otimes C_m \right] e'_n \otimes (-e'_m) \\ &= -C_n e'_n \otimes A_m D_x e'_m - B_n D_y e'_n \otimes C_m e'_m = 0, \end{aligned}$$

since $A_m D_x\, dx_m = 0$, $B_n D_y\, dy_n = 0$, and $C_l e'_l = 0$.

In [18], Sani investigated the solution of (13) and presented the basis vectors of $N(D^T)$ to be $\psi_{01}$ and $\psi_{02}$. He showed that the constraint $\psi_{01}^T B = 0$ reduces to a discrete form of (3), and $\psi_{02}^T B = 0$ yields a constraint for the tangential velocity component on the boundary.

For the unsteady INSE, the interest is in the solution of (14). Due to the divergence form of $R$, the $D\widetilde{W}$ term will cancel out, leaving $\psi_{01}^T R = (\psi_{01}^T B)/\Delta t$. Hence, the constraint $\psi_{01}^T R = 0$ is essentially a discrete form of (3). The constraint $\psi_{02}^T R = 0$ will also hold if (5) holds for some velocity field with the given boundary condition; i.e., (13) holds for some $W$, for example, the initial velocity field and time-independent boundary condition.

When $dx_i \equiv \Delta x$ and $dy_i \equiv \Delta y$ are constant, the matrix $L$ is symmetric and has a simpler form:

$$L = -\left[ \frac{1}{\Delta x^2}\, C_n \otimes A'_m + \frac{1}{\Delta y^2}\, A'_n \otimes C_m \right],$$

where

$$A'_l = \begin{pmatrix} 1 & -1 & & & \\ -1 & 2 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 2 & -1 \\ & & & -1 & 1 \end{pmatrix}_{l \times l}.$$

[Fig. 2.2. Skewed five-point stencil (weight 2 at the four diagonal neighbors, −8 at the center).]

In particular, when $\Delta x = \Delta y = h$, the discretization is the well-known skewed five-point stencil (see Fig. 2.2).

3. Fast solver. As we mentioned in the introduction, since matrix equation (9) is to be solved every time step, an efficient solver is essential. Fast solvers for the sum of two matrix tensor products were discussed in the classic paper [2]. Later, Kaufman and Warner [15] provided a generalization to the higher order discretization case. In particular, the eigendecomposition used in [2] is replaced by a generalized eigenvalue and eigenvector problem. If we examine the formula of the matrix $L$, it is clear that generalized eigendecomposition is needed to develop our fast solver. For a matrix of the form $A \otimes B + C \otimes E$, the key step in developing a fast algorithm in [2, 15] was to simultaneously diagonalize the matrices $B$ and $E$ by a matrix $Z$ for which

$$Z^T B Z = I, \qquad Z^T E Z = D,$$

where $B$ and $E$ are symmetric and positive definite. There are two obvious problems which prevent these algorithms from being directly applicable here. The first problem is that the matrices $L$, $D_x A_m$, and $D_y B_n$ are not symmetric. The second problem is the singularity of the matrices $A_m$, $C_l$, and $B_n$. Many of the effective generalized eigenvalue algorithms [5, 9, 14, 24] for simultaneously diagonalizing two banded matrices need the matrix pair to be nonsingular and symmetric. Unfortunately, both $A_m$ and $C_l$ are singular and $D_x A_m$ is unsymmetric, and thus a new approach is needed.
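For reference, the classical definite case can be sketched in a few lines. This is not the algorithm of [2] or [15]; it is a minimal illustration of the identity $Z^T B Z = I$, $Z^T E Z = D$ using a Cholesky factor of $B$, and the function name `simultaneous_diagonalize` and the random test matrices are assumptions made for the example.

```python
import numpy as np

def simultaneous_diagonalize(B, E):
    """Return Z, d with Z^T B Z = I and Z^T E Z = diag(d).

    B must be symmetric positive definite and E symmetric.
    """
    Lc = np.linalg.cholesky(B)                 # B = Lc Lc^T
    X = np.linalg.solve(Lc, E)                 # Lc^{-1} E
    M = np.linalg.solve(Lc, X.T).T             # Lc^{-1} E Lc^{-T}
    d, Q = np.linalg.eigh(0.5 * (M + M.T))     # symmetrize against rounding
    Z = np.linalg.solve(Lc.T, Q)               # Z = Lc^{-T} Q
    return Z, d

rng = np.random.default_rng(0)
G1, G2 = rng.standard_normal((2, 5, 5))
B = G1 @ G1.T + 5.0 * np.eye(5)                # illustrative SPD matrices
E = G2 @ G2.T + 5.0 * np.eye(5)
Z, d = simultaneous_diagonalize(B, E)
print(np.abs(Z.T @ B @ Z - np.eye(5)).max())
print(np.abs(Z.T @ E @ Z - np.diag(d)).max())
```

The Cholesky step is exactly what fails when $B$ is only semidefinite, which is the situation for $A_m$ and $C_l$ here.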


3.1. Equal-spaced in one direction. Assume the mesh is equal-spaced in one direction, say, the $x$-direction. Letting $\Delta x_i = \Delta x$, $i = 1, \ldots, m$, the matrix $L$ can be written as

$$L = \left[ C_n \otimes A''_m + D_y B_n \otimes C_m \right], \tag{15}$$

where $A''_m = -A'_m/\Delta x^2$. Though the matrix $L$ is not symmetric, both $A'_m$ and $C_m$ are symmetric, semidefinite matrices. We can show the following result.

Theorem 3.1. There exists a matrix $Z$ such that

$$Z^T C^2 Z = A'_m, \qquad Z^T S^2 Z = C_m,$$

where the diagonal elements $c_i$, $s_i$ of $C$ and $S$ satisfy

$$c_i^2 + s_i^2 = 1, \qquad 0 \le c_i, s_i \le 1, \qquad i = 1, \ldots, m.$$

Proof. The proof is a natural generalization of the result presented in [24], where a pair of symmetric and definite matrices is considered. There exist two orthogonal matrices $T_1$ and $T_2$ such that

$$A'_m = T_1 D_1^2 T_1^T, \qquad C_m = T_2 D_2^2 T_2^T,$$

where $D_i$, $i = 1, 2$, are diagonal matrices with nonnegative diagonal elements. Now, construct

$$W = \begin{pmatrix} D_1 T_1^T \\ D_2 T_2^T \end{pmatrix} = QR,$$

where $Q$, $R$ make up the QR decomposition of the matrix $W$. Then partition matrix $Q$ as follows:

$$Q = \begin{pmatrix} Q_1 \\ Q_2 \end{pmatrix}. \tag{16}$$

Applying Stewart's theorem [22], we have from the singular value decomposition (SVD) [8]

$$Q_1 = U_1 C V_1, \qquad Q_2 = U_2 S V_2,$$

where $U_1$, $U_2$, $V_1$, and $V_2$ are all orthogonal matrices and $C$, $S$ are diagonal matrices $\operatorname{diag}(c_i)$ and $\operatorname{diag}(s_i)$, respectively. Then from (16) and the orthogonality of $Q$,

$$V_1 = V_2 = V \quad \text{and} \quad c_i^2 + s_i^2 = 1.$$

Therefore, we have

$$A'_m = T_1 D_1^2 T_1^T = R^T V^T C^2 V R = Z^T C^2 Z, \qquad C_m = T_2 D_2^2 T_2^T = R^T V^T S^2 V R = Z^T S^2 Z,$$

where $Z = VR$.

After the simultaneous diagonalization of the matrices $A'_m$ and $C_m$ is achieved, the fast solver in this case is similar to the Buzbee–Golub–Nilson or Kaufman–Warner algorithm. First, the matrix equation (15) can be transformed into a tridiagonal matrix equation as follows:

$$\begin{aligned} (I \otimes Z^{-T})\, L\, (I \otimes Z^{-1}) &= (I \otimes Z^{-T}) \left[ C_n \otimes A''_m + D_y B_n \otimes C_m \right] (I \otimes Z^{-1}) \\ &= C_n \otimes Z^{-T} A''_m Z^{-1} + D_y B_n \otimes Z^{-T} C_m Z^{-1} \\ &= -\frac{1}{\Delta x^2}\, C_n \otimes C^2 + D_y B_n \otimes S^2 \\ &= L'', \end{aligned}$$

where $L''$ is a tridiagonal matrix. A simple permutation matrix transforms $L''$ into a block diagonal matrix. The diagonal blocks are $n \times n$ tridiagonal matrices. Solving these tridiagonal systems is straightforward. In particular, the solves can be implemented efficiently on a vector or parallel computer. For a regular mesh, $\Delta x = \Delta y = h$, this block diagonal matrix has a very simple form. We show the following result.

Theorem 3.2. The rank of the matrix $L$ for a regular mesh is $mn - 2$.

Proof. We know that both $A'_m$ and $C_m$ have only one zero eigenvalue and $N(A'_m) \cap N(C_m) = \{0\}$. Therefore, the two zero diagonal elements $c_{i^*}$ and $s_{j^*}$ in $C$ and $S$ will not be in the same location. It is easy to arrange them such that $c_1 = 0$ and $s_m = 0$. After permuting the matrix $L''$ to a block diagonal matrix, the first and last diagonal block matrices will be

$$T_1 = -s_1^2 A'_n, \qquad T_m = -c_m^2 C_n,$$

respectively. Each has only one zero eigenvalue. The rest of the diagonal block matrices are of the form

$$T_i = -c_i^2 C_n - s_i^2 A'_n, \qquad i = 2, \ldots, m - 1, \tag{17}$$

where $c_i^2 + s_i^2 = 1$. When $c_i = s_i$, (17) reduces to the diagonal matrix $-\operatorname{diag}\{1, 2, \ldots, 2, 1\}$, and it is not singular. If $c_i \ne s_i$, $T_i$ is a tridiagonal matrix; viz.,

$$T_i = (-c_i^2 + s_i^2) \begin{pmatrix} -\alpha & 1 & & & \\ 1 & -2\alpha & 1 & & \\ & \ddots & \ddots & \ddots & \\ & & 1 & -2\alpha & 1 \\ & & & 1 & -\alpha \end{pmatrix}_{n \times n},$$

where $|\alpha| = \left| \dfrac{1}{s_i^2 - c_i^2} \right| > 1$, and hence $T_i$ is nonsingular.

In the next section, we show that the rank of $L$ for the general case is also $mn - 2$.
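The construction in the proof of Theorem 3.1 translates almost line by line into code. The sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, NumPy's `eigh`, `qr`, and `svd` stand in for whatever routines the authors used, and $S$ is obtained from $c_i^2 + s_i^2 = 1$ rather than from a separate SVD of $Q_2$. The $c_i$ come out in descending order, so the ordering $c_1 = 0$, $s_m = 0$ used in Theorem 3.2 would need a permutation.

```python
import numpy as np

def neumann(l):
    """A'_l: tridiagonal (-1, 2, -1) with corner entries equal to 1."""
    A = 2.0 * np.eye(l) - np.eye(l, k=1) - np.eye(l, k=-1)
    A[0, 0] = A[-1, -1] = 1.0
    return A

def averaging(l):
    """C_l: tridiagonal (1, 2, 1) with corner entries equal to 1."""
    C = 2.0 * np.eye(l) + np.eye(l, k=1) + np.eye(l, k=-1)
    C[0, 0] = C[-1, -1] = 1.0
    return C

def cs_factor(A, C):
    """Return Z, c, s with Z^T diag(c^2) Z = A and Z^T diag(s^2) Z = C.

    A and C are symmetric positive semidefinite with A + C nonsingular.
    """
    dA, TA = np.linalg.eigh(A)                       # A = TA diag(dA) TA^T
    dC, TC = np.linalg.eigh(C)
    rootA = np.sqrt(np.clip(dA, 0.0, None))          # D_1 (clip rounding noise)
    rootC = np.sqrt(np.clip(dC, 0.0, None))          # D_2
    W = np.vstack([rootA[:, None] * TA.T,            # D_1 T_1^T
                   rootC[:, None] * TC.T])           # D_2 T_2^T
    Q, R = np.linalg.qr(W)                           # reduced QR, Q is 2m x m
    m = A.shape[0]
    _, c, Vt = np.linalg.svd(Q[:m])                  # Q_1 = U_1 diag(c) V
    c = np.clip(c, 0.0, 1.0)
    s = np.sqrt(1.0 - c ** 2)                        # from c_i^2 + s_i^2 = 1
    return Vt @ R, c, s                              # Z = V R

m = 6
A, C = neumann(m), averaging(m)
Z, c, s = cs_factor(A, C)
print(np.abs(Z.T @ np.diag(c ** 2) @ Z - A).max())
print(np.abs(Z.T @ np.diag(s ** 2) @ Z - C).max())
```

Both printed residuals should be at rounding level. No matrix is inverted during the construction, so the semidefiniteness of $A'_m$ and $C_m$ causes no difficulty.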


3.2. General case. For a general nonuniform grid, the simultaneous diagonalization of both matrices $D_x A_m$ and $C_m$ by a single matrix cannot be achieved. However, the generalized eigenvectors can still be used to achieve the same purpose with two different matrices. Recall the matrix equation

$$L\Phi = R, \tag{18}$$

where

$$L = DG = \left[ C_n \otimes D_x A_m + D_y B_n \otimes C_m \right]. \tag{19}$$

The QZ algorithm [8, 16] is used to compute the generalized eigenvalues $(\alpha_i, \beta_i)$ and the corresponding eigenvectors $v_i$ such that

$$\beta_i D_x A_m v_i = \alpha_i C_m v_i.$$

Let $D_a$ and $D_c$ be two diagonal matrices whose diagonal elements are $\alpha_i$ and $\beta_i$, respectively. We have

$$D_x A_m V D_c = C_m V D_a,$$

where $V = [v_1, v_2, \ldots, v_m]$. Since each of the matrices $A_m$ and $C_m$ has one zero eigenvalue, there is one zero diagonal element in each of $D_a$ and $D_c$. Due to the orthogonality of the null spaces $N(A_m)$ and $N(C_m)$, the two zero diagonal elements will not be in the same location. We arrange the generalized eigenvectors such that

$$\alpha_m = 0, \qquad \beta_1 = 0.$$

Then we construct a matrix

$$U = \left[ D_x A_m [v_1, v_2, \ldots, v_{m-1}],\; C_m v_m \right]. \tag{20}$$

It is not difficult to see that

$$U^{-1} (D_x A_m) V = \operatorname{diag}(1, 1, \ldots, 1, 0) = \tilde{I}_m$$

and

$$U^{-1} C_m V = \operatorname{diag}\!\left(0, \frac{\beta_2}{\alpha_2}, \ldots, \frac{\beta_{m-1}}{\alpha_{m-1}}, 1\right) = \tilde{D}_x.$$

Thus, the tridiagonalization of (19) can be achieved as follows:

$$\begin{aligned} (I_n \otimes U^{-1})\, L\, (I_n \otimes V) &= (I_n \otimes U^{-1}) \left[ C_n \otimes D_x A_m + D_y B_n \otimes C_m \right] (I_n \otimes V) \\ &= C_n \otimes U^{-1} D_x A_m V + D_y B_n \otimes U^{-1} C_m V \\ &= C_n \otimes \tilde{I}_m + D_y B_n \otimes \tilde{D}_x \\ &= \tilde{L}. \end{aligned}$$

Here, the matrix $\tilde{L}$ has three nonzero diagonals with a bandwidth of $2m - 1$. The resulting matrix equation will be

$$(I_n \otimes U^{-1})\, L\, (I_n \otimes V)(I_n \otimes V^{-1})\Phi = (I_n \otimes U^{-1}) R, \qquad \tilde{L}\tilde{\Phi} = \tilde{R},$$

where

$$\tilde{\Phi} = (I_n \otimes V^{-1})\Phi, \qquad \tilde{R} = (I_n \otimes U^{-1}) R.$$

We start with an ordering of the grid nodes first in the $x$ direction, then in the $y$ direction. Let $P$ be the permutation matrix which reorders the grid nodes first in the $y$ direction, then in the $x$ direction; we will then have

$$(P \tilde{L} P^T)(P \tilde{\Phi}) = P \tilde{R}, \qquad \hat{L}\hat{\Phi} = \hat{R},$$

where $\hat{L} = P \tilde{L} P^T$, $\hat{\Phi} = P\tilde{\Phi}$, and $\hat{R} = P\tilde{R}$. The matrix $\hat{L}$ is a block diagonal matrix, in which each diagonal block $T_i$ is an $n \times n$ tridiagonal matrix:

$$T_1 = C_n, \qquad T_i = C_n + \frac{\beta_i}{\alpha_i} D_y B_n \quad (i = 2, 3, \ldots, m - 1), \qquad T_m = D_y B_n,$$

and

$$\hat{L} = \begin{pmatrix} T_1 & & & & \\ & T_2 & & & \\ & & \ddots & & \\ & & & T_{m-1} & \\ & & & & T_m \end{pmatrix}.$$

Solving the equations with matrices $T_2, \ldots, T_{m-1}$ is straightforward. Some attention should be paid to the first and last equations, since both $T_1$ and $T_m$ have rank $n - 1$, while the top left $(n - 1) \times (n - 1)$ submatrix of each is nonsingular. We can solve the two subsystems with that nonsingular submatrix and assume the last element of the solution to be zero. Using this approach, the contribution of the two null space basis vectors of $L$ is set to zero, and this will not affect the solution $u$ and $v$ in the INSE. The final solution of (18) is

$$\Phi = (I_n \otimes V) P^T \hat{\Phi}.$$
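The transformation above can be sketched end to end in NumPy. Several choices here are assumptions made for illustration and are not taken from the paper. First, the generalized eigenvectors are obtained by a shift-and-invert standard eigenproblem rather than by the QZ algorithm, to keep the example NumPy-only; this relies on the finite eigenvalues being nonpositive (as the Remark below asserts) so that the shift $\sigma = 1$ is not an eigenvalue. Second, dense solves replace the tridiagonal LU factorizations. Third, the permutation $P$ is implicit in storing the unknowns as an $n \times m$ array and solving column by column. All function names and grid spacings are hypothetical.

```python
import numpy as np

def second_difference(d):
    """A_m (or B_n) built from the grid spacings d_1..d_l."""
    d = np.asarray(d, dtype=float)
    A = np.zeros((len(d), len(d)))
    for i, wi in enumerate(2.0 / (d[:-1] + d[1:])):   # wi = 1 / d_{i,i+1}
        A[i, i] -= wi
        A[i + 1, i + 1] -= wi
        A[i, i + 1] += wi
        A[i + 1, i] += wi
    return A

def averaging(l):
    """C_l: tridiagonal (1, 2, 1) with corner entries equal to 1."""
    C = 2.0 * np.eye(l) + np.eye(l, k=1) + np.eye(l, k=-1)
    C[0, 0] = C[-1, -1] = 1.0
    return C

def pencil_vectors(DA, C, sigma=1.0):
    """Eigenvectors of DA v = lambda C v, ordered so that C v_1 = 0, DA v_m = 0.

    Shift-and-invert stand-in for QZ: C v = mu (DA - sigma C) v.
    """
    mu, V = np.linalg.eig(np.linalg.solve(DA - sigma * C, C))
    order = np.argsort(np.abs(mu))       # mu = 0 first, mu = -1/sigma last
    mu, V = mu[order].real, V[:, order].real
    rho = np.zeros(len(mu))              # rho_i plays the role of beta_i / alpha_i
    rho[1:-1] = mu[1:-1] / (1.0 + sigma * mu[1:-1])
    return rho, V

def fast_solve(dx, dy, R):
    """Solve L Phi = R for a consistent R; x index runs fastest in Phi and R."""
    m, n = len(dx), len(dy)
    DA = np.diag(1.0 / dx) @ second_difference(dx)     # D_x A_m
    DB = np.diag(1.0 / dy) @ second_difference(dy)     # D_y B_n
    Cm, Cn = averaging(m), averaging(n)
    rho, V = pencil_vectors(DA, Cm)
    U = np.hstack([DA @ V[:, :-1], Cm @ V[:, -1:]])    # equation (20)
    Rt = np.linalg.solve(U, R.reshape(n, m).T).T       # R-tilde = (I (x) U^{-1}) R
    Y = np.zeros((n, m))
    for i in range(m):
        if i == 0:
            T = Cn                                     # T_1
        elif i == m - 1:
            T = DB                                     # T_m
        else:
            T = Cn + rho[i] * DB                       # T_i
        if i in (0, m - 1):
            # singular block: solve the leading submatrix, last unknown set to 0
            Y[:-1, i] = np.linalg.solve(T[:-1, :-1], Rt[:-1, i])
        else:
            Y[:, i] = np.linalg.solve(T, Rt[:, i])
    return (Y @ V.T).ravel()                           # Phi = (I (x) V) Phi-tilde

dx = np.array([1.0, 1.1, 0.9, 1.05, 0.95, 1.0])        # illustrative, m = 6
dy = np.array([0.8, 1.0, 1.2, 0.9, 1.1])               # illustrative, n = 5
L = (np.kron(averaging(5), np.diag(1.0 / dx) @ second_difference(dx))
     + np.kron(np.diag(1.0 / dy) @ second_difference(dy), averaging(6)))
R = L @ np.random.default_rng(1).standard_normal(30)   # consistent right-hand side
Phi = fast_solve(dx, dy, R)
print("max residual =", np.abs(L @ Phi - R).max())
```

The dense matrix `L` is assembled only to produce a consistent right-hand side and to check the residual; the solver itself works with $m \times m$ and $n \times n$ matrices.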

The fast solver can be summarized as follows.
1. Preprocess.
(a) Compute the generalized eigendecomposition $(D_a, D_c, V)$ of the pencil $(D_x A_m, C_m)$.
(b) Compute the LU decomposition of the matrix $U$ given in (20).
(c) Form the LU decompositions of the matrices $T_i$.
(Note that this step is needed only once at the beginning of the solution process of the INSE. Its cost can be amortized over many time steps. The complexity of steps (a) and (b) is $O(m^3)$, and that of (c) is $O(nm)$.)
2. Solve $(I \otimes U)\tilde{R} = R$. The complexity is $O(nm^2)$.
3. Solve $\hat{L}\hat{\Phi} = \hat{R} = P\tilde{R}$. The complexity is $O(nm)$.
4. Compute $\Phi = (I \otimes V) P^T \hat{\Phi}$. The complexity is $O(nm^2)$.

Remark. It is not difficult to see that the generalized eigenvalue problem $D_x A_m x = \lambda C_m x$ is a discrete approximation of the eigenvalue problem

$$y'' = \lambda y, \qquad y'(0) = y'(1) = 0 \tag{21}$$

on a nonuniform grid. Therefore, this pencil is diagonalizable. The eigenvectors are approximations of the eigenfunctions of (21). Introducing a small perturbation on the (1, 1) element of $C_m$ (to make $C_m$ positive definite), we can show that $\beta_i/\alpha_i < 0$, $i = 2, \ldots, m - 1$. Therefore, the matrices $T_2, \ldots, T_{m-1}$ are diagonally dominant and nonsingular. The conclusion for the rank of $L$ is also true for the general case.

4. Three-dimensional case. The fast solver can be extended to a three-dimensional problem without difficulty. Assume an $m \times n \times l$ nonuniform grid for the solution domain. The coefficient matrix of the three-dimensional case in (18) will be

$$L = C_l \otimes C_n \otimes D_x A_m + C_l \otimes D_y B_n \otimes C_m + D_z E_l \otimes C_n \otimes C_m,$$

where

$$D_z = \begin{pmatrix} \frac{1}{dz_1} & & & \\ & \frac{1}{dz_2} & & \\ & & \ddots & \\ & & & \frac{1}{dz_l} \end{pmatrix}_{l \times l}$$

and

$$E_l = \begin{pmatrix} -\frac{1}{dz_{1,2}} & \frac{1}{dz_{1,2}} & & & \\ \frac{1}{dz_{1,2}} & -\left(\frac{1}{dz_{1,2}} + \frac{1}{dz_{2,3}}\right) & \frac{1}{dz_{2,3}} & & \\ & \ddots & \ddots & \ddots & \\ & & \frac{1}{dz_{l-2,l-1}} & -\left(\frac{1}{dz_{l-2,l-1}} + \frac{1}{dz_{l-1,l}}\right) & \frac{1}{dz_{l-1,l}} \\ & & & \frac{1}{dz_{l-1,l}} & -\frac{1}{dz_{l-1,l}} \end{pmatrix}_{l \times l}.$$


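There are two generalized eigenvalue problems,

$$D_x A_m x = \lambda C_m x, \qquad D_y B_n y = \tau C_n y,$$

which need to be solved. Let $V$ and $W$ be the eigenvector matrices for the two problems, respectively. Similar to the two-dimensional case, form

$$U = \left[ D_x A_m [v_1, v_2, \ldots, v_{m-1}],\; C_m v_m \right], \qquad Q = \left[ D_y B_n [w_1, w_2, \ldots, w_{n-1}],\; C_n w_n \right],$$

and we have

$$U^{-1} D_x A_m V = \tilde{I}_m, \quad U^{-1} C_m V = \tilde{D}_x, \qquad Q^{-1} D_y B_n W = \tilde{I}_n, \quad Q^{-1} C_n W = \tilde{D}_y.$$

The tridiagonalization of $L$ can be achieved as follows:

$$(I_l \otimes Q^{-1} \otimes I_m)(I_l \otimes I_n \otimes U^{-1})\, L\, (I_l \otimes W \otimes I_m)(I_l \otimes I_n \otimes V) = C_l \otimes \tilde{D}_y \otimes \tilde{I}_m + C_l \otimes \tilde{I}_n \otimes \tilde{D}_x + D_z E_l \otimes \tilde{D}_y \otimes \tilde{D}_x.$$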

5. Numerical experiments. We present three test problems to illustrate the use of our fast solver. The finite difference schemes in these test problems are all of second-order accuracy in time and in space. For each individual problem, we specify the particular projection method we used.

(a) Exact Solution Example. Consider the INSE with exact solution

$$u = e^t \sin x \cos y, \qquad v = -e^t \cos x \sin y, \qquad p = e^t \sin x \sin y,$$

with corresponding nonhomogeneous terms in the momentum equations, on a square $0 \le x \le \pi$ and $0 \le y \le \pi$. The initial and boundary values of $u$ and $v$ are taken from the exact solution. The conservative-form centered difference Crank–Nicolson scheme was used for (6). For the discrete Poisson equation, the first constraint $\psi_{01}^T R = 0$ is satisfied approximately, as it is just a discrete form of (3). The second constraint $\psi_{02}^T R = 0$ reduces to

$$\sum_{\text{red}} \nabla \cdot w - \sum_{\text{black}} \nabla \cdot w = 0$$

and is also satisfied approximately, as boundary values are taken from the exact solution satisfying (2). On a $64 \times 64$ nonuniform grid with $\min_i(dx_i) = \min_i(dy_i) \approx 3.3 \times 10^{-2}$ and $\max_i(dx_i) = \max_i(dy_i) \approx 7.6 \times 10^{-2}$ generated by the same transformation function as in [10], the present fast solver took 0.13 sec of CPU time on the 100MHz Indigo R4000 workstation for each solution of (14), with residual $\approx 0.4 \cdot 10^{-3}$ and $\max_{ij}(\nabla \cdot w) \approx 0.1 \cdot 10^{-3}$. There was no apparent oscillation in pressure, but adding $\pm 1$, say, to the initial pressure at alternate cells (the red and black cells) will cause an oscillating pressure distribution, as shown in Fig. 5.1 for $t = 1$. As we indicated earlier, this has no effect on the solution $u$ and $v$, nor any effect on the solution process, as illustrated in Fig. 5.2. The present solver is about six times faster than the adaptive multigrid method used in [12], for which the optimal parameters apparently were too difficult to determine. The difficulty is presumably due to the fact that the right-hand sides of the residual equations do not satisfy the constraints, especially the second one, and hence the five-point staggered differencing was used for the residual equations. For larger scale problems and more careful tuning of parameters, it is conceivable that the multigrid method might work better.
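The claim that a red/black pressure oscillation leaves the velocity untouched follows directly from the averaged gradient (10). A minimal sketch, assuming a hypothetical helper `gradient` that implements (10) on arbitrary illustrative cell-center coordinates:

```python
import numpy as np

def gradient(phi, x, y):
    """Averaged gradient of equation (10); phi[j, k] lives at (x[j], y[k])."""
    dxv = np.diff(x)[:, None]                       # x_{j+1} - x_j
    dyv = np.diff(y)[None, :]                       # y_{k+1} - y_k
    gx = 0.5 * ((phi[1:, :-1] - phi[:-1, :-1])
                + (phi[1:, 1:] - phi[:-1, 1:])) / dxv
    gy = 0.5 * ((phi[:-1, 1:] - phi[:-1, :-1])
                + (phi[1:, 1:] - phi[1:, :-1])) / dyv
    return gx, gy

x = np.cumsum([0.0, 1.0, 1.2, 0.8, 1.1, 0.9])       # illustrative nonuniform grid
y = np.cumsum([0.0, 0.7, 1.0, 1.3, 1.0])
phi = np.sin(x)[:, None] * np.cos(y)[None, :]       # any smooth field
checker = (-1.0) ** np.add.outer(np.arange(len(x)), np.arange(len(y)))

gx0, gy0 = gradient(phi, x, y)
gx1, gy1 = gradient(phi + 3.0 * checker, x, y)      # add a red/black oscillation
print(np.abs(gx1 - gx0).max(), np.abs(gy1 - gy0).max())
```

Each averaged difference pairs one red-to-black jump with one black-to-red jump of equal size, so the checkerboard contribution cancels at every velocity point.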

[Fig. 5.1. Oscillating pressure at t = 1.]

(b) Plane Poiseuille Flow. This flow problem is often used to calibrate the high order accuracy of numerical methods. Consider an infinite channel with height 2, $-1 \le y \le 1$, with equilibrium state

$$u_0(x, y) = 1 - y^2, \qquad v_0(x, y) = 0, \qquad p_0(x, y) = -\frac{2}{Re}\, x$$

and a small perturbation of the form

$$u'(x, y, t) = \hat{u}(y) e^{i(\alpha x - \omega t)}, \qquad v'(x, y, t) = \hat{v}(y) e^{i(\alpha x - \omega t)}, \qquad p'(x, y, t) = \hat{p}(y) e^{i(\alpha x - \omega t)},$$

which leads to the Orr–Sommerfeld equations for $\hat{u}(y)$, $\hat{v}(y)$, and $\hat{p}(y)$ (see [3]). We compute the unstable mode (a specific $\omega = \omega_r + i\omega_i$) for $Re = 7500$, $\alpha = 1$, with initial values taken from the solution of the Orr–Sommerfeld equations. The perturbation kinetic energy should satisfy

$$\ln(E(t)/E(0)) = 2\omega_i t.$$

[Fig. 5.2. Solution u and v at t = 1.]

The INSE for $u$, $v$, and $p'$ is formed with nonhomogeneous term $2/Re$ on the right-hand side of the $x$ momentum equation, so that the solution is periodic in the $x$ direction with period $2\pi$. The Crank–Nicolson modified Temam scheme with the component-consistent pressure correction projection method of [13] was used on a $129 \times 128$ grid, uniform in the $x$ direction and nonuniform in the $y$ direction with $\min_i(dy_i) \approx 7.8 \times 10^{-3}$ and $\max_i(dy_i) \approx 2.1 \times 10^{-2}$. The numerical results are extremely sensitive to the accuracy near the wall boundaries. Our method achieves good results because using the half-staggered grid on top of a carefully generated nonuniform grid allows uniform treatment of finite differencing at all grid points. For example, see Fig. 5.3 (dashed line for $\Delta t = 0.04$, short dashed line for $\Delta t = 0.08$); the results are comparable to the example presented in [3], using grid size of $n = 256$. For details, see [13].

[Fig. 5.3. Plane Poiseuille flow: ln(E(t)/E(0)).]

(c) The Driven Square Cavity Flow. This flow problem is often used to prove the competence of numerical methods in complex flow simulation. The flow is described by the INSE in a unit square with boundary conditions $u = 1$, $v = 0$ on top, and $u = 0$, $v = 0$ on the other sides. The solution of the unsteady INSE is used as a time-dependent method for the steady-state solution with $Re = 5000$. The Crank–Nicolson modified Temam scheme with the pressure correction projection method of [10] is used on a $97 \times 97$ grid, with $\min_i(dx_i) = \min_i(dy_i) \approx 3.5 \times 10^{-3}$ and $\max_i(dx_i) = \max_i(dy_i) \approx 2.1 \times 10^{-2}$. The top boundary condition was modified to $u = (0, 0.5, 1, \ldots, 1, 0.5, 0)$ in order to compare the numerical results with the standard results on the staggered grid. At every time step, other than the Poisson equation for $\phi$, another Poisson solution for $p$ from the "component-consistent condition" was found to be necessary for the projection method to be robust. Both Poisson equations were solved with the fast solver of section 3. Time step $\Delta t = 0.02$ produced a steady-state solution in about 5,000 steps; $\psi_c = -0.114$, $u_{\min} = -0.4356$, and $v_{\max}/v_{\min} = 0.4322/{-0.5668}$ agree well with standard results in the literature, e.g., [7]. The $u$ distribution on the mid-vertical plane (vmid) and the $v$ distribution on the mid-horizontal plane (umid) are given in Fig. 5.4. The solid line is the result of the present method and the dots are the result of [7]. The streamlines are given in Fig. 5.5. Again, the running time for using multigrid for this test problem is an order of magnitude longer than our fast solver. No optimal parameters were found.

[Fig. 5.4. Driven square cavity flow: umid, vmid, Re = 5000.]

6. Conclusion. We have developed a fast Poisson solver for the projection method of the incompressible Navier–Stokes equations with finite difference schemes on the half-staggered grid. Its efficiency has been demonstrated by numerical simulation of flow fields. When the solution domain is irregular, domain decomposition and preconditioning techniques will be used in conjunction with this fast solver to achieve increased efficiency.

[Fig. 5.5. Driven square cavity flow: streamlines, Re = 5000.]

REFERENCES

[1] J. B. Bell, P. Colella, and H. M. Glaz, A second-order projection method for the incompressible Navier-Stokes equations, J. Comput. Phys., 85 (1989), pp. 257–283.
[2] B. Buzbee, G. Golub, and C. Nilson, On direct methods for solving Poisson's equation, SIAM J. Numer. Anal., 7 (1970), pp. 627–656.
[3] C. Canuto, M. Y. Hussaini, A. Quarteroni, and T. A. Zang, Spectral Methods in Fluid Dynamics, Springer-Verlag, New York, 1987.
[4] A. J. Chorin, Numerical solution of Navier-Stokes equations, Math. Comp., 22 (1968), pp. 745–762.
[5] C. R. Crawford, Reduction of a band-symmetric generalized eigenvalue problem, Comm. ACM, 16 (1973), pp. 41–44.
[6] M. Fortin, R. Peyret, and R. Temam, Calcul des ecoulements d'un fluide visqueux incompressible, J. Mec., 10 (1971), pp. 357–390.
[7] U. Ghia, K. N. Ghia, and C. T. Shin, High-Re solutions for incompressible flow using the Navier-Stokes equations and a multigrid method, J. Comput. Phys., 48 (1982), pp. 387–411.
[8] G. Golub and C. Van Loan, Matrix Computations, 2nd ed., The Johns Hopkins University Press, Baltimore, MD, 1989.
[9] R. G. Grimes, J. G. Lewis, and H. D. Simon, A shifted block Lanczos algorithm for solving sparse symmetric generalized eigenproblems, SIAM J. Matrix Anal. Appl., 15 (1994), pp. 228–272.
[10] L. C. Huang, J. Oliger, W.-P. Tang, and Y. D. Wu, Toward efficient and robust finite difference schemes for unsteady incompressible Navier-Stokes equation on the half-staggered mesh, in Proc. 6th International Symposium on CFD, 1995.
[11] L. C. Huang, On boundary treatment for the numerical solution of the incompressible Navier-Stokes equations with finite difference methods, J. Comput. Math, 14 (1996), pp. 135–142.
[12] L. C. Huang and Y. D. Wu, On the half-staggered mesh for the finite difference solution of the unsteady incompressible Navier-Stokes equations, J. Comput. Math, 18 (1996), pp. 24–37 (in Chinese).
[13] L. C. Huang and Y. D. Wu, The component-consistent pressure correction projection method for the incompressible Navier-Stokes equations, Comput. Math. Appl., 31 (1996), pp. 1–21.
[14] L. Kaufman, An algorithm for the banded symmetric generalized matrix eigenvalue problem, SIAM J. Matrix Anal. Appl., 14 (1993), pp. 372–389.
[15] L. Kaufman and D. D. Warner, High-order, fast-direct methods for separable elliptic equations, SIAM J. Numer. Anal., 21 (1984), pp. 672–694.
[16] C. B. Moler and G. W. Stewart, An algorithm for generalized matrix eigenvalue problems, SIAM J. Numer. Anal., 10 (1973), pp. 241–256.
[17] R. Peyret and T. D. Taylor, Computational Methods for Fluid Flow, Springer-Verlag, New York, 1983.
[18] R. L. Sani, The cause and cure of the spurious pressure generated by certain FEM solution of the incompressible Navier-Stokes equations: Part I, Internat. J. Numer. Methods Fluids, 1 (1981), pp. 17–43.
[19] U. Schumann and R. Sweet, A direct method for the solution of Poisson's equation with Neumann boundary conditions on a staggered grid of arbitrary sizes, J. Comput. Phys., 20 (1976), pp. 171–182.
[20] Shyy and Vu, On the adoption of velocity variable and grid system for fluid flow computation in curvilinear coordinates, J. Comput. Phys., 92 (1991), pp. 82–105.
[21] A. B. Stephen, A finite difference Galerkin formulation for the incompressible Navier-Stokes equations, J. Comput. Phys., 53 (1984), pp. 152–172.
[22] G. W. Stewart, On the perturbation of pseudo-inverses, projections, and linear least squares problems, SIAM Rev., 19 (1977), pp. 634–662.
[23] R. Temam, Navier-Stokes Equations, Theory and Numerical Analysis, Elsevier Science Pub. B. V., New York, 1984.
[24] S. Wang and S. Zhao, An algorithm for Ax = λBx with symmetric and positive definite A and B, SIAM J. Matrix Anal. Appl., 12 (1991), pp. 654–660.
