THE NESTED RECURSIVE TWO-LEVEL FACTORIZATION METHOD FOR NINE-POINT DIFFERENCE MATRICES

OWE AXELSSON AND VICTOR EIJKHOUT
Abstract. Nested recursive two-level factorization methods for nine-point difference matrices are analyzed. Somewhat similar in construction to multilevel methods for finite element matrices, these methods use recursive red-black orderings of the meshes, approximating the nine-point stencils by five-point ones in the red points and then forming the reduced system explicitly. Because this Schur complement is again a nine-point matrix (on a skew grid this time), the process of approximating and factorizing can be applied anew. Progressing until a sufficiently coarse grid has been reached, this procedure gives a multilevel preconditioner for the original matrix. Solving the levels in V-cycle order will not give an optimal order method (that is, one with total work proportional to the number of unknowns), but we show that certain combinations of V-cycles and W-cycles give methods of optimal order both in the number of iterations and in computational complexity. Since all systems to be solved during a preconditioner solve are of diagonal form, the method is suitable for execution on massively parallel architectures.
1. Introduction. Recently the first author and P.S. Vassilevski [7], [6] have derived and analyzed algebraic multilevel iteration methods for finite element matrices, in particular for piecewise linear approximations. In the present paper we propose somewhat similar methods for nine-point difference matrices. Because nine-point difference approximations can have a higher degree of approximation, while being just as easy to construct as five-point approximations, it is of interest to have preconditioners based directly on the nine-point matrix. In fact, the method proposed in this paper is also directly applicable to five-point difference matrices. The methods presented here differ from the methods proposed earlier in that, apart from the top level, they do not use coefficient matrices derived from the differential operator. Instead, the Schur complement system of the previous level is used. This is possible because the factorization proposed recursively ensures a nine-point structure on each level. Namely, if a red-black ordering is imposed on the grid, and the nine-point stencil in the red points is suitably modified to a five-point one, elimination of these red points will give a skew nine-point stencil on the black points. A fundamental property of this procedure is that the red points become mutually uncoupled, thus facilitating the solution of systems and, indeed, enabling the explicit formation of the Schur complement. Continuing this recursive factorization process until a sufficiently coarse grid is obtained, we obtain a multilevel preconditioner which can be used in a conjugate gradient or other iterative method. For systems on the coarse grid one can for instance use a direct solution method. However, using the preconditioner as a simple V-cycle (descending to the coarsest grid during the forward solve, and ascending again to the finest grid in the back substitution) will not give spectral equivalence to the original matrix.
* Faculty of Mathematics and Informatics, University of Nijmegen, 6525 ED Nijmegen, The Netherlands.
† Center for Supercomputing Research and Development, University of Illinois at Urbana-Champaign, 104 S. Wright St., Urbana, Illinois 61801, USA.

We show that the relative condition number grows with the meshsize h as O(h^{-q}) for some positive q. Therefore, we use the nested polynomial approximations proposed earlier. Each visit to a level then entails multiple visits to the next coarser level, thereby making the preconditioner into a general W-cycle. More precisely, we show that for polynomial degree ν = 2 the relative condition number is O(1), but as the number of arithmetic operations grows linearly with the number of levels, the result is a method with computational complexity of order O(h^{-2} log h^{-1}). Using alternately first and second (or even third) degree polynomials, we show that a method of optimal order of computational complexity results. The optimality or near-optimality also holds for problems with anisotropy, provided the coefficients differ by a factor sufficiently close to 1. The bilinear approximation then permits the ratio of the coefficients to be between 1/2 and 2, while the standard nine-point discretization allows values between 1/5 and 5. For other ranges the method will converge, but in general not with optimal order. Earlier use of similar red-black orderings for five-point difference matrices can be found in [5]. The use of "intermediate" (i.e., skewed) grids has been described in [16] and in [9] and [14]. A method similar to the present one, but for finite element meshes using bisections of triangles, has been analyzed in [2]. A first description of 5-point/9-point methods, but without analysis of the influence of the polynomials on the condition number, and not based on explicit use of the Schur complement, can be found in [1]. Besides the multilevel methods mentioned above, in which standard finite element basis functions are used, there is the multilevel method using hierarchical basis functions [17]. However, in this method there is no use of nested polynomials, and the condition number grows slowly (O(log h^{-1})) for two-dimensional, and more rapidly (O(h^{-1})) for three-dimensional elliptic problems. A method based on Schur complements and several mesh levels has been used in [11]. Here also the nested recursive definition of the preconditioners, which is essential for an optimal order method, was not used. This paper proceeds as follows. In section 2 we present the preconditioner in matrix form; in section 3 we derive the actual coefficients involved.
Section 4 gives an element matrix analysis of the preconditioner to arrive at the value of a certain series of constants γ_k. A global analysis of the condition number is then given in section 5. Here we will show that a condition number of O(1) is attainable. Section 6 concerns itself with the computational complexity of the method. The result here is that for methods with a condition number of O(1) an optimal number of operations can also be attained. Finally, section 7 gives numerical tests and discussion.

2. Construction of the level k coefficient and preconditioning matrices. The nested recursive factorization method resembles in some respects the recursive two-level methods proposed in [7], [6]. It differs in the fact that, whereas the two-level methods use a finite element stiffness matrix on every level, the nested recursive factorization uses on each level the Schur complement of the previous (finer) level to generate the coefficient matrix on that level. The nested recursive factorization can be described formally (that is, in terms of matrices, and without reference to stencils) as follows. Let some symmetric positive definite coefficient matrix A^{(p)} be given, and let for some value k ≤ p − 1 the matrix
$$A^{(k+1)} = \begin{pmatrix} D & C^t \\ C & E \end{pmatrix}$$
(where D and E are symmetric and positive definite) be the coefficient matrix on level k + 1. The coefficient matrix A^{(k)} on the next level is formed as an approximation of the Schur complement
$$S^{(k)} = E - CD^{-1}C^t$$
in such a way that for all vectors u there exists a γ_k ∈ (0,1) such that
$$(1)\qquad 0 < \gamma_k\, u^tS^{(k)}u \le u^tA^{(k)}u \le u^tS^{(k)}u.$$
The exact mechanics of this approximation are discussed in the next section. This recursive factorization is pursued until we arrive at a sufficiently coarse level, say level 0, where direct solution with A^{(k)} or S^{(k)} becomes feasible. We precondition A^{(k+1)} by
$$(2)\qquad M^{(k+1)} = \begin{pmatrix} D & 0 \\ C & Z^{(k)} \end{pmatrix}\begin{pmatrix} I & D^{-1}C^t \\ 0 & I \end{pmatrix},$$
where I stands generically for an identity matrix of the proper order, and Z^{(k)} is an approximation to the Schur complement S^{(k)}, defined by
$$(3)\qquad \begin{aligned} Z^{(k)^{-1}} &= \bigl(I - P(M^{(k)^{-1}}S^{(k)})\bigr)\,S^{(k)^{-1}} &&\text{version (i)},\\ Z^{(k)^{-1}} &= \bigl(I - P(M^{(k)^{-1}}A^{(k)})\bigr)\,A^{(k)^{-1}} &&\text{version (ii)}.\end{aligned}$$
Here P = P_{ν_k}^{(k)} is a polynomial of degree ν = ν_k whose order zero term has coefficient 1. We will specify the choice of the polynomials in section 5; subscripts and superscripts k indicating level dependencies will be omitted when they are not needed. We will see later on that, if we aim at preconditioned solution of a nine-point system, the matrix on the finest mesh is an S^{(k+1)}-matrix. Alternatively, one could start out with a mixed five/nine-point system. The matrix on the finest mesh is then an A^{(k+1)}-matrix.

3. Recursive derivation of the level k coefficients. In the previous section we did not mention the stencils involved in the S^{(k)} and A^{(k)} matrices; in particular, we left unspecified how the approximation of the Schur complement system S^{(k)} by A^{(k)} is performed. This will be discussed now. We may consider on each level a nine-point matrix S^{(k)} to be given. A standard red-black structure is then imposed on the grid, the nine-point stencils are modified to five-point stencils in the red points, and are left untouched in the black points. The resulting matrix is called A^{(k)}. In the transition from this level to the next, the red points are eliminated; as we will show below, in the resulting Schur complement system S^{(k-1)} the black points are then connected by a nine-point stencil, but on a skew grid.
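Since the red points are mutually uncoupled, the block D in (2) is diagonal, so forming S^{(k)} explicitly and applying the preconditioner involve only diagonal solves apart from the coarse level. The following sketch (our illustration, not the paper's code; all names are ours) shows one such step in numpy:

```python
import numpy as np

# Illustrative sketch (not the paper's code): one factorization step with a
# diagonal D-block.  A^(k+1) = [[D, C^t], [C, E]]; because the red points are
# mutually uncoupled, D is diagonal and S = E - C D^{-1} C^t can be formed
# explicitly without any linear solves.

def schur_step(D_diag, C, E):
    """Explicit Schur complement S = E - C diag(D_diag)^{-1} C^t."""
    return E - (C / D_diag) @ C.T

def precond_solve(D_diag, C, coarse_solve, r1, r2):
    """Apply M^{-1} for M = [[D, 0], [C, Z]] [[I, D^{-1}C^t], [0, I]] as in (2);
    coarse_solve applies Z^{-1} (supplied by the caller)."""
    y1 = r1 / D_diag                   # diagonal forward solve in the red points
    x2 = coarse_solve(r2 - C @ y1)     # coarse-level solve
    x1 = y1 - (C.T @ x2) / D_diag      # diagonal back substitution
    return x1, x2
```

When Z^{(k)} is taken as the exact Schur complement, M^{(k+1)} coincides with A^{(k+1)}; in the method itself Z^{(k)} is only the polynomial approximation (3).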
Note that this Schur complement system is derived from exact factorization of A^{(k)}; factorization of S^{(k)} would give rise to fill-in outside the nine-point stencil on the skew grid. The coefficient matrix A^{(k-1)} on the grid of the black points is again derived by imposing a red-black structure on these points, and modifying the stencils in the new red points in order to obtain five-point connections. Eliminating the red points will then give nine-point stencils on a horizontal/vertical grid, which is the same structure that we had at the outset, but with different coefficients, and on a double distance grid. Suppose that we start off in the black points with a nine-point molecule composed of s times (−Δ_5^{(h,+)}) plus t times (−Δ_5^{(h,×)}), where s and t are positive numbers and
$$-\Delta_5^{(h,+)} = \frac{1}{h^2}\begin{Bmatrix} \cdot & -1 & \cdot \\ -1 & 4 & -1 \\ \cdot & -1 & \cdot \end{Bmatrix},\qquad -\Delta_5^{(h,\times)} = \frac{1}{(\sqrt2\,h)^2}\begin{Bmatrix} -1 & \cdot & -1 \\ \cdot & 4 & \cdot \\ -1 & \cdot & -1 \end{Bmatrix}$$
with grid spacing h in both stencils are the axiparallel and skew five-point difference molecules; i.e., our initial molecule in the black points is
$$\frac{1}{h^2}\begin{Bmatrix} -t/2 & -s & -t/2 \\ -s & 4s+2t & -s \\ -t/2 & -s & -t/2 \end{Bmatrix}.$$
The stencil in the red points is then
$$\begin{Bmatrix} \cdot & -s & \cdot \\ -s & 4s & -s \\ \cdot & -s & \cdot \end{Bmatrix}.$$
Now, in order to eliminate the red points (i.e., the horizontal and vertical neighbours of the black points), we multiply on the horizontal and vertical cross-connections by
$$\begin{Bmatrix} \cdot & 1/4 & \cdot \\ 1/4 & 1 & 1/4 \\ \cdot & 1/4 & \cdot \end{Bmatrix}.$$
Such algebra of stencils can already be found in [14]. This gives a resulting (skewed) nine-point stencil
$$\frac{1}{2h^2}\begin{Bmatrix} & & -s/2 & & \\ & -(s+t) & & -(s+t) & \\ -s/2 & & 6s+4t & & -s/2 \\ & -(s+t) & & -(s+t) & \\ & & -s/2 & & \end{Bmatrix}$$
(or (s+t)(−Δ_5^{(h,×)}) + s(−Δ_5^{(2h,+)})) on the black points. Next we impose a red-black structure on the black points, with the black points occupying the positions of an axiparallel grid of meshwidth 2h. We then modify the stencil in the red points by eliminating the horizontal/vertical connections and moving them to the central node; that is, we eliminate the s(−Δ_5^{(2h,+)}) term at the red points. Thus we obtain the stencil
$$\frac{1}{2h^2}\begin{Bmatrix} -(s+t) & \cdot & -(s+t) \\ \cdot & 4(s+t) & \cdot \\ -(s+t) & \cdot & -(s+t) \end{Bmatrix}$$
in the red points. Eliminating the red points of the skew grid gives the double distance nine-point molecule
$$\frac{1}{2h^2}\begin{Bmatrix} -(s+t)/2 & -(2s+t) & -(s+t)/2 \\ -(2s+t) & 10s+6t & -(2s+t) \\ -(s+t)/2 & -(2s+t) & -(s+t)/2 \end{Bmatrix}$$
or (2s+t)(−Δ_5^{(2h,+)}) + (s+t)(−Δ_5^{(2h,×)}). Thus the factorization progresses with stencils
on level k: a_k(−Δ_5^{(h,+)}) + b_k(−Δ_5^{(h,×)}),
on level k−1: a_{k-1}(−Δ_5^{(h,×)}) + b_{k-1}(−Δ_5^{(2h,+)}),
on level k−2: a_{k-2}(−Δ_5^{(2h,+)}) + b_{k-2}(−Δ_5^{(2h,×)}),
and coefficients satisfying
$$a_{k-1} = a_k + b_k,\qquad b_{k-1} = a_k$$
for k = p, p−1, …, 1. We see that (after a normalization) the coefficients satisfy a Fibonacci series, with the coefficients for the finest mesh (level p) given. In explicit form
$$a_k = c_1\left(\frac{1+\sqrt5}{2}\right)^{p-k} + c_2\left(\frac{1-\sqrt5}{2}\right)^{p-k}$$
with
$$c_1 = \frac12\left(1+\frac{1}{\sqrt5}\right)a_p + \frac{1}{\sqrt5}\,b_p,\qquad c_2 = \frac12\left(1-\frac{1}{\sqrt5}\right)a_p - \frac{1}{\sqrt5}\,b_p.$$
In the course of the factorization (that is, for k ↓ 0) the quotient b_k/a_k converges:
$$\frac{b_k}{a_k} = \frac{a_{k+1}}{a_k} \rightarrow \mu = \frac{\sqrt5-1}{2} \approx 0.618;$$
its maximal value is a_p/(a_p+b_p) if c_2 > 0, and it is b_p/a_p if c_2 < 0. Furthermore, if b_p = μa_p, then b_k/a_k = μ for all k.

4. Local analysis. In this section we analyze the error that we make by modifying a Schur complement system S^{(k)} to obtain the five-point/nine-point coefficient matrix A^{(k)} on that level. As was shown in the previous section, the Schur complement matrix itself was derived by eliminating the red points in the coefficient matrix A^{(k+1)} one level higher. To this purpose we construct element matrices that can be associated with the global matrices and analyze their relative condition numbers. For simplicity of presentation we start by treating isotropic problems.

4.1. Isotropic problems. Suppose that on a certain (horizontal/vertical) level we have a coefficient matrix A^{(k+1)} with a nine-point stencil in the black points and a five-point stencil in the red points. If this matrix was derived by discarding the skew connections in the red points, we can write the stencils as follows:
$$\text{black: }\begin{Bmatrix} -b/2 & -a & -b/2 \\ -a & 4a+2b & -a \\ -b/2 & -a & -b/2 \end{Bmatrix},\qquad \text{red: }\begin{Bmatrix} \cdot & -a & \cdot \\ -a & 4a & -a \\ \cdot & -a & \cdot \end{Bmatrix}.$$
The Schur complement S^{(k)} obtained after eliminating the red points will then have the nine-point stencil
$$\begin{Bmatrix} \cdot & \cdot & -a/4 & \cdot & \cdot \\ \cdot & -(b+a)/2 & \cdot & -(b+a)/2 & \cdot \\ -a/4 & \cdot & 3a+2b & \cdot & -a/4 \\ \cdot & -(b+a)/2 & \cdot & -(b+a)/2 & \cdot \\ \cdot & \cdot & -a/4 & \cdot & \cdot \end{Bmatrix}.$$
We can associate local element matrices with the above Schur complement, without any need to actually construct underlying basis functions. Consider then the numbering of the nodes of a local element
$$\begin{matrix} (1) & (3) \\ (4) & (2) \end{matrix}$$
on the skew grid of the black points. Each center node lies in the support of four rotated squares, and we find the element matrix
$$S_e = \begin{pmatrix} (3a+2b)/4 & -a/4 & -(b+a)/4 & -(b+a)/4 \\ -a/4 & (3a+2b)/4 & -(b+a)/4 & -(b+a)/4 \\ -(b+a)/4 & -(b+a)/4 & (3a+2b)/4 & -a/4 \\ -(b+a)/4 & -(b+a)/4 & -a/4 & (3a+2b)/4 \end{pmatrix}$$
by the reverse of the process by which we usually construct global matrices from element matrices. Letting the points (3) and (4) be the red points of the next level, we equip them with a five-point stencil
$$\begin{Bmatrix} -(b+a)/2 & \cdot & -(b+a)/2 \\ \cdot & 2a+2b & \cdot \\ -(b+a)/2 & \cdot & -(b+a)/2 \end{Bmatrix}$$
and we eliminate the couplings between the red points. For the resulting coefficient matrix on this level we find an element matrix
$$A_e = \begin{pmatrix} (3a+2b)/4 & -a/4 & -(b+a)/4 & -(b+a)/4 \\ -a/4 & (3a+2b)/4 & -(b+a)/4 & -(b+a)/4 \\ -(b+a)/4 & -(b+a)/4 & (2a+2b)/4 & 0 \\ -(b+a)/4 & -(b+a)/4 & 0 & (2a+2b)/4 \end{pmatrix}.$$
The zeros in positions (3,4) and (4,3) are due to the uncoupling of the red points 3 and 4. Now we want to solve the generalized eigenvalue problem
$$S_e\begin{pmatrix}x\\y\end{pmatrix} = \lambda A_e\begin{pmatrix}x\\y\end{pmatrix}.$$
Since S_e and A_e have the same nullspace, the eigenvalue problem is understood as finding the solutions in the complementary space to the nullspace. For this we partition the matrices into 2×2 blocks and make a block factorization
$$S_e = \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix} = \begin{pmatrix} I & 0 \\ S_{21}S_{11}^{-1} & I \end{pmatrix}\begin{pmatrix} S_{11} & S_{12} \\ 0 & S_{22}-S_{21}S_{11}^{-1}S_{12} \end{pmatrix},$$
$$A_e = \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & A_{22} \end{pmatrix} = \begin{pmatrix} I & 0 \\ S_{21}S_{11}^{-1} & I \end{pmatrix}\begin{pmatrix} S_{11} & S_{12} \\ 0 & A_{22}-S_{21}S_{11}^{-1}S_{12} \end{pmatrix},$$
and we find that the solution to the generalized eigenvalue problem satisfies:
1. λ = 1 and S_{22}y = A_{22}y, or
2. x = −S_{11}^{-1}S_{12}y and (S_{22} − S_{21}S_{11}^{-1}S_{12})y = λ(A_{22} − S_{21}S_{11}^{-1}S_{12})y, where y is not in the nullspace of the two matrices.
An elementary computation reveals that the vector y = (1,1)^t satisfies S_{22}y = A_{22}y, so the generalized eigenvalue problem in the second point above simplifies to
$$\frac{2a+b}{4}\begin{pmatrix}1&-1\\-1&1\end{pmatrix}\begin{pmatrix}y_1\\y_2\end{pmatrix} = \lambda\,\frac{a+b}{4}\begin{pmatrix}1&-1\\-1&1\end{pmatrix}\begin{pmatrix}y_1\\y_2\end{pmatrix},$$
where y_1 ≠ y_2. Hence the eigenvalues of the generalized eigenproblem are λ = 1 (twice) and
$$(4)\qquad \lambda = \frac{2a+b}{a+b},$$
the latter for a vector with y_1 ≠ y_2. Changing a nine-point matrix into a five-point matrix by the process of moving elements to the central node corresponds to subtracting a certain positive semi-definite matrix, so from the positiveness of a and b at every level we conclude that the coefficient matrix A^{(k)} and the Schur complement matrix S^{(k)} derived from the previous level satisfy
$$(5)\qquad \text{for all } x:\quad \gamma_k\,x^tS^{(k)}x \le x^tA^{(k)}x \le x^tS^{(k)}x,$$
where (4) shows that γ_k = (a+b)/(2a+b); that is, γ_k ∈ (0.5, 1) for all levels k < p. The estimates based on element matrices can be used for the global matrix since
$$x^tAx = x^t\left(\sum_{\text{elements}} A_e^{(k)}\right)x$$
if the global matrix is a Toeplitz form, or if we can identify the element that gives an upper bound for all elements. Note that even if the coefficient matrix on the finest level is a Toeplitz matrix, the matrices on the coarser levels need not be so. A simple calculation shows, however, that after an elimination step an even sharper bound than (4) holds for the elements along the boundary of the domain. Therefore, inequalities of the form (5) hold on all levels of the factorization of a Toeplitz form, and the quantity γ_k can be computed locally. In fact, similar to the corresponding analysis in [7], [6], we find that if we construct the nine-point stencil recursively from an initial coarse grid to finer levels, we can even permit the coefficients of the underlying elliptic differential equation to be discontinuous on the elements of the coarsest grid. The quantity γ_k can still be computed locally. The relation between γ_k and the parameter γ̃_k used in [7], [6] is γ_k = 1 − γ̃_k². Hence, as an example, the estimates are valid for the nine-point discretization of −∇·(ρ∇u) = f in Ω, where Ω consists of four rectangular boxes with ρ = ρ_i, i = 1, 2, 3, 4.
[Figure: the domain Ω divided into four rectangular boxes, numbered 1-4.]
In particular, they are valid for ρ_4 = 0, which corresponds to a problem on an L-shaped domain, with Neumann-type boundary conditions on the interior boundary part of the 'L'. Equation (5) could also have been derived in the following way. Consider the original nine-point matrix and its approximation by using a five-point stencil in the red points. From the generalized eigenvalue problem for the corresponding element matrices we find in this case
$$\gamma_k = a_k/(a_k+b_k) = a/(a+b).$$
The relation between the entries in the element matrices for the original mesh and the next, skew, mesh is
$$b \leftarrow a/2,\qquad a \leftarrow (b+a)/2.$$
Hence
$$\gamma_{k-1} = \frac{(b+a)/2}{(b+a)/2 + a/2} = \frac{b+a}{b+2a},$$
which gives (4). Therefore, using the recurrence relations a_{k-1} = a_k + b_k, b_{k-1} = a_k we find
$$\gamma_{k-1} = \frac{1}{1+\gamma_k},\qquad k = p, p-1, \ldots, 1,$$
which shows that the γ_k converge to the positive solution of the equation
$$\gamma^2 + \gamma - 1 = 0\ \Rightarrow\ \gamma = \frac{\sqrt5-1}{2} \approx 0.618.$$
We will now calculate the value of γ_k for the finest mesh (that is, at level k = p) for some nine-point stencils. For the standard nine-point difference stencil (which is of fourth order of approximation after suitable modification of the right-hand side)
$$\begin{Bmatrix} -1 & -4 & -1 \\ -4 & 20 & -4 \\ -1 & -4 & -1 \end{Bmatrix}$$
we have a = 4, b = 2, so
$$\gamma_p = \frac{4}{4+2} = \frac23,$$
which is the largest γ_k in this case. For the nine-point molecule arising from bilinear finite elements
$$\begin{Bmatrix} -1 & -1 & -1 \\ -1 & 8 & -1 \\ -1 & -1 & -1 \end{Bmatrix}$$
we find a = 1, b = 2, so
$$\gamma_p = \frac13,$$
which is the smallest γ_k in this case; on all coarser levels γ_k ≥ 0.5.
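The recursion for γ_k can be checked numerically; this small sketch (our code, under the assumptions of this section) iterates γ_{k−1} = 1/(1+γ_k) from the two starting values just derived:

```python
# Sketch (our code): iterate the recursion gamma_{k-1} = 1/(1 + gamma_k),
# starting from the finest-level value gamma_p.

def gamma_sequence(gamma_p, nlevels):
    """Return [gamma_p, gamma_{p-1}, ..., gamma_{p-nlevels}]."""
    g = [gamma_p]
    for _ in range(nlevels):
        g.append(1.0 / (1.0 + g[-1]))
    return g

mu = (5 ** 0.5 - 1) / 2          # limit value 0.618...
```

Both sequences approach the limit (√5−1)/2 ≈ 0.618; from γ_p = 2/3 all values stay below 2/3, and from γ_p = 1/3 all coarser-level values stay at or above 1/2, as claimed.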
4.2. Anisotropic problems. In the analysis of the previous section, in particular in the derivation of the inequalities in equation (5), the fact that the operator stencil was of positive type was of crucial importance. Unfortunately, nine-point stencils for the anisotropic elliptic self-adjoint differential equation
$$Lu = -\varepsilon_1 u_{xx} - \varepsilon_2 u_{yy}$$
do not always give difference approximations of positive type. For instance, bilinear finite elements give the stencil
$$\begin{Bmatrix} -(\varepsilon_1+\varepsilon_2)/2 & -(2\varepsilon_2-\varepsilon_1) & -(\varepsilon_1+\varepsilon_2)/2 \\ -(2\varepsilon_1-\varepsilon_2) & 4(\varepsilon_1+\varepsilon_2) & -(2\varepsilon_1-\varepsilon_2) \\ -(\varepsilon_1+\varepsilon_2)/2 & -(2\varepsilon_2-\varepsilon_1) & -(\varepsilon_1+\varepsilon_2)/2 \end{Bmatrix},$$
which is not of positive type if the coefficients differ by more than a factor of 2. The general form of the fourth order (again, assuming a suitable modification of the right-hand side) nine-point box can be derived by noting that
$$\varepsilon_1 u_{xx} + \varepsilon_2 u_{yy} = \frac{\varepsilon_1+\varepsilon_2}{2}(u_{xx}+u_{yy}) + \frac{\varepsilon_1-\varepsilon_2}{2}u_{xx} + \frac{\varepsilon_2-\varepsilon_1}{2}u_{yy}$$
and using the nine-point difference scheme for the first term and central differences for the second and third; the resulting stencil (multiplied by 6) is then
$$(6)\qquad \begin{Bmatrix} -(\varepsilon_1+\varepsilon_2)/2 & -(5\varepsilon_2-\varepsilon_1) & -(\varepsilon_1+\varepsilon_2)/2 \\ -(5\varepsilon_1-\varepsilon_2) & 10(\varepsilon_1+\varepsilon_2) & -(5\varepsilon_1-\varepsilon_2) \\ -(\varepsilon_1+\varepsilon_2)/2 & -(5\varepsilon_2-\varepsilon_1) & -(\varepsilon_1+\varepsilon_2)/2 \end{Bmatrix},$$
which is only of positive type if the coefficients differ by at most a factor of 5. To get a difference approximation of positive type we therefore modify the stencil when 5ε_2 − ε_1 < 0 (or when 5ε_1 − ε_2 < 0) by moving the off-diagonal positive entries to the diagonal:
$$(7)\qquad \begin{Bmatrix} -(\varepsilon_1+\varepsilon_2)/2 & 0 & -(\varepsilon_1+\varepsilon_2)/2 \\ -(5\varepsilon_1-\varepsilon_2) & 12\varepsilon_1 & -(5\varepsilon_1-\varepsilon_2) \\ -(\varepsilon_1+\varepsilon_2)/2 & 0 & -(\varepsilon_1+\varepsilon_2)/2 \end{Bmatrix},$$
and we proceed with this as described before. For the analysis of the spectral relation between the Schur complements and their approximations for a general scheme of anisotropic type, we start with a stencil
$$(8)\qquad \begin{Bmatrix} -b/2 & -c & -b/2 \\ -a & 2(a+b+c) & -a \\ -b/2 & -c & -b/2 \end{Bmatrix}.$$
The Schur complement obtained after eliminating the red points (where we use
$$\begin{Bmatrix} \cdot & -c & \cdot \\ -a & 2(a+c) & -a \\ \cdot & -c & \cdot \end{Bmatrix}$$
for the five-point stencil) has the skew nine-point stencil
$$\begin{Bmatrix} \cdot & \cdot & -\frac{c^2}{2(a+c)} & \cdot & \cdot \\ \cdot & -\frac b2-\frac{ac}{a+c} & \cdot & -\frac b2-\frac{ac}{a+c} & \cdot \\ -\frac{a^2}{2(a+c)} & \cdot & \frac{4(a+b+c)(a+c)-2a^2-2c^2}{2(a+c)} & \cdot & -\frac{a^2}{2(a+c)} \\ \cdot & -\frac b2-\frac{ac}{a+c} & \cdot & -\frac b2-\frac{ac}{a+c} & \cdot \\ \cdot & \cdot & -\frac{c^2}{2(a+c)} & \cdot & \cdot \end{Bmatrix}.$$
Corresponding to this stencil we get two kinds of element matrices for square elements: those lying to the left or right of the central node, and those lying above or below it. For an element to the right of the central node, the element matrix and the modified element matrix (where a five-point stencil is used in the red points) are
$$S_e = \begin{pmatrix} \frac b2+a-\frac{a^2}{2(a+c)} & -\frac{a^2}{2(a+c)} & -\frac b4-\frac{ac}{2(a+c)} & -\frac b4-\frac{ac}{2(a+c)} \\ -\frac{a^2}{2(a+c)} & \frac b2+a-\frac{a^2}{2(a+c)} & -\frac b4-\frac{ac}{2(a+c)} & -\frac b4-\frac{ac}{2(a+c)} \\ -\frac b4-\frac{ac}{2(a+c)} & -\frac b4-\frac{ac}{2(a+c)} & \frac b2+c-\frac{c^2}{2(a+c)} & -\frac{c^2}{2(a+c)} \\ -\frac b4-\frac{ac}{2(a+c)} & -\frac b4-\frac{ac}{2(a+c)} & -\frac{c^2}{2(a+c)} & \frac b2+c-\frac{c^2}{2(a+c)} \end{pmatrix},$$
$$A_e = \begin{pmatrix} \frac b2+a-\frac{a^2}{2(a+c)} & -\frac{a^2}{2(a+c)} & -\frac b4-\frac{ac}{2(a+c)} & -\frac b4-\frac{ac}{2(a+c)} \\ -\frac{a^2}{2(a+c)} & \frac b2+a-\frac{a^2}{2(a+c)} & -\frac b4-\frac{ac}{2(a+c)} & -\frac b4-\frac{ac}{2(a+c)} \\ -\frac b4-\frac{ac}{2(a+c)} & -\frac b4-\frac{ac}{2(a+c)} & \frac b2+\frac{ac}{a+c} & 0 \\ -\frac b4-\frac{ac}{2(a+c)} & -\frac b4-\frac{ac}{2(a+c)} & 0 & \frac b2+\frac{ac}{a+c} \end{pmatrix};$$
the other matrices are the same, but with a and c interchanged. Solving the generalized eigenvalue problem
$$S_e x = \lambda A_e x$$
orthogonal to the null space of the matrices results in eigenvalues λ = 1 (twice) and
$$\lambda = \max\left\{ \frac{2a+b}{\frac{2ac}{a+c}+b},\ \frac{2c+b}{\frac{2ac}{a+c}+b} \right\}.$$
Hence, assuming that a, b, and c are nonnegative, we find that the new coefficient matrix A^{(k)} and the Schur complement matrix S^{(k)} are related by
$$(9)\qquad \text{for all } x:\quad \gamma_k\,x^tS^{(k)}x \le x^tA^{(k)}x \le x^tS^{(k)}x,$$
where
$$(10)\qquad \gamma_k = \min\left\{ \frac{\frac{2ac}{a+c}+b}{2a+b},\ \frac{\frac{2ac}{a+c}+b}{2c+b} \right\}.$$
For the nine-point scheme (6), substituting b = ε_1+ε_2, a = 5ε_1−ε_2, c = 5ε_2−ε_1, equation (10) shows that if 1/5 ≤ ε_1/ε_2 ≤ 5 holds, then γ_k ≥ 1/9. For the modified nine-point scheme (7) we obtain with b = ε_1+ε_2, a = 5ε_1−ε_2, c = 0 that γ_k ≥ 1/11 for all values of ε_1 and ε_2. Unfortunately, the results in section 5 show that an optimal, or nearly optimal, method requires γ_k ≥ 1/4. A similarly modified bilinear scheme gives, with a = 2ε_1−ε_2, b = ε_1+ε_2, c = 0, a value of
$$\gamma_k = \frac{1+\varepsilon_2/\varepsilon_1}{5-\varepsilon_2/\varepsilon_1},$$
so γ_k > 1/5 for all values of ε_1 and ε_2. In case ε_2/ε_1 > 1/5 we have γ_k > 1/4.
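As a numerical illustration of the bound (10) (hypothetical helper functions, named by us):

```python
# Hypothetical check of the bound (10); names and structure are ours.

def gamma_aniso(a, b, c):
    """gamma = min of (2ac/(a+c)+b)/(2a+b) and (2ac/(a+c)+b)/(2c+b)."""
    q = 2.0 * a * c / (a + c) + b
    return min(q / (2.0 * a + b), q / (2.0 * c + b))

def coeffs_fourth_order(e1, e2):
    """a, b, c of stencil (6); positive type only for 1/5 <= e1/e2 <= 5."""
    return 5.0 * e1 - e2, e1 + e2, 5.0 * e2 - e1

def coeffs_modified(e1, e2):
    """a, b, c of the modified stencil (7), used when 5*e2 - e1 < 0."""
    return 5.0 * e1 - e2, e1 + e2, 0.0
```

At the extreme ratio ε_1/ε_2 = 5 the fourth-order scheme (6) attains γ = 1/9 exactly, and for the modified scheme (7) the bound approaches 1/11 as the anisotropy grows.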
Next we have to investigate how the value of γ changes during the factorization. Consider then a skew nine-point stencil
$$\begin{Bmatrix} & & -c & & \\ & -b & & -b & \\ -a & & 2(a+2b+c) & & -a \\ & -b & & -b & \\ & & -c & & \end{Bmatrix};$$
after equipping the red points (that is, the cross connections) with a five-point stencil and eliminating them, we obtain
$$\begin{Bmatrix} -b/4 & \cdot & -c-b/2 & \cdot & -b/4 \\ \cdot & \cdot & \cdot & \cdot & \cdot \\ -a-b/2 & \cdot & 2a+2c+3b & \cdot & -a-b/2 \\ \cdot & \cdot & \cdot & \cdot & \cdot \\ -b/4 & \cdot & -c-b/2 & \cdot & -b/4 \end{Bmatrix}.$$
For the analysis we assume, in terms of the quantities a, b, and c of (8), that c ≥ a. Now note that
$$\gamma^{-1} = \max\{1+\delta_a,\ 1+\delta_c\},$$
where
$$\delta_a = \frac{2a^2}{2ac+ab+bc} = \frac{2}{\frac{2c}{a}+\frac{b}{a}+\frac{b}{a}\frac{c}{a}},\qquad \delta_c = \frac{2c^2}{2ac+ab+bc} = \frac{2}{\frac{2a}{c}+\frac{b}{c}+\frac{b}{c}\frac{a}{c}}.$$
Two steps of the factorization, that is, going from one level to one with double meshsize, transform the quantity c/a as follows:
$$\frac{c}{a} \mapsto \frac{c(a+c)+(ab+bc)}{a(a+c)+(ab+bc)} \in [1, c/a],$$
so both δ_a and δ_c may increase during the factorization, but they remain bounded uniformly in the initial coefficients. In fact, in practice we often see them converging rather rapidly to 1. The analysis for the case a > c follows by interchanging a and c, and we can perform a similar analysis for the spectral condition bounds on the skew levels. Another approach to obtaining nine-point schemes of positive type would be imaginable, namely to start out with the standard five-point difference scheme and let the method be based on the nine-point scheme of the Schur complement after elimination of the red points. For anisotropic problems, however, one then finds (with a = ε_1, c = ε_2, and b = 0) that on the finest mesh
$$\gamma = \min\left\{\frac{c}{a+c},\ \frac{a}{a+c}\right\} = \frac{\min\{\varepsilon_1,\varepsilon_2\}}{\varepsilon_1+\varepsilon_2}.$$
Hence in this case the lower eigenvalue bound is not uniform in ε_1 and ε_2.
5. Estimation of the condition number. In this section we analyze the relative condition numbers of the coefficient matrix A^{(k)} and the Schur complement matrix S^{(k)} with respect to the preconditioner M^{(k)} on the k-th level. This is done by comparing the condition numbers on two consecutive levels and performing a limit analysis. Ultimately, we are interested in the condition number of M^{(p)} relative to either A^{(p)} or S^{(p)}. As the latter two are related by the simple inequalities (1), we can look at either. We show that if the polynomial P in the preconditioner is taken as a linear polynomial, the relative condition number increases with the meshsize h as h^{-q}, where q = log_2 μ^{-1}, μ ≈ 0.618. If a higher degree polynomial is taken, the condition number can be reduced to O(1). However, as per level only half of the points are eliminated, such higher degree polynomials do not give a preconditioner of optimal order in the number of arithmetic operations. Therefore we also consider preconditioners alternately taking polynomials of first and higher degree; this again leads to an O(1) condition number. In section 6 it is shown that such preconditioners have an optimal order of the number of arithmetic operations for polynomial degrees 2 and 3.

5.1. Analysis of optimal order. We begin by recalling that the preconditioner M^{(k+1)} satisfies
$$(11)\qquad M^{(k+1)} = \begin{pmatrix} D & C^t \\ C & Z^{(k)}+CD^{-1}C^t \end{pmatrix} = A^{(k+1)} + \begin{pmatrix} 0 & 0 \\ 0 & Z^{(k)}-S^{(k)} \end{pmatrix},$$
where S^{(k)} = E − CD^{-1}C^t and where Z^{(k)} is defined as Z^{(k)^{-1}} = (I − P(M^{(k)^{-1}}S^{(k)}))S^{(k)^{-1}} for version (i), and Z^{(k)^{-1}} = (I − P(M^{(k)^{-1}}A^{(k)}))A^{(k)^{-1}} for version (ii).
At the lowest level k = 0 (i.e., the coarsest mesh) we let A^{(0)} = M^{(0)}. The polynomials P can have both their degree and their coefficients defined per level; they are chosen such that
$$0 \le P(x) < 1 \quad\text{for all } x \in I_k,$$
where the interval I_k is defined as
$$I_k = \left[\inf_u \frac{u^tS^{(k)}u}{u^tM^{(k)}u},\ \sup_u \frac{u^tS^{(k)}u}{u^tM^{(k)}u}\right]\ \text{for version (i)};\qquad I_k = \left[\inf_u \frac{u^tA^{(k)}u}{u^tM^{(k)}u},\ \sup_u \frac{u^tA^{(k)}u}{u^tM^{(k)}u}\right]\ \text{for version (ii)}.$$
Also we assume that P is chosen to have at least one zero in I_k. For the further analysis we need the following basic lemma.

Lemma 5.1. The following relations between coefficient matrices and Schur complement matrices hold.
1. For all v:
$$0 < \gamma_k\,v^tS^{(k)}v \le v^tA^{(k)}v \le v^tS^{(k)}v.$$
2. For all v = (v_1, v_2) partitioned consistently with A^{(k+1)}:
$$v^tA^{(k+1)}v \ge v_2^tS^{(k)}v_2,$$
and equality is attainable.

Proof. Part 1 is formula (1) carried over from the element matrices to the global matrix; the value of γ_k was derived in formula (5). Part 2 follows by noting that
$$v^tA^{(k+1)}v = (v_1 - D^{-1}C^tv_2)^tD(v_1 - D^{-1}C^tv_2) + v_2^t(E - CD^{-1}C^t)v_2 \ge v_2^tS^{(k)}v_2,$$
with equality if v_1 = D^{-1}C^tv_2.

The definition of the polynomial implies the following a priori bounds on the relative condition of S^{(k)} and Z^{(k)}.

Lemma 5.2. For version (i) of the method
$$(12)\qquad \text{for all } u:\quad \frac{u^tS^{(k)}u}{u^tZ^{(k)}u} \in (0,1];$$
for version (ii) of the method
$$(13)\qquad \text{for all vectors } u:\quad \frac{u^tS^{(k)}u}{u^tZ^{(k)}u} \in (0,\gamma_k^{-1}],$$
with the lower and upper bounds satisfying
$$\inf_u \frac{u^tS^{(k)}u}{u^tZ^{(k)}u} \le 1 \le \sup_u \frac{u^tS^{(k)}u}{u^tZ^{(k)}u}.$$

Proof. The assertions for version (i) follow trivially from the definition of the polynomial. In order to obtain the bounds for version (ii), use in addition lemma 5.1, and note that for block vectors with a zero first block component
$$\frac{u^tS^{(k)}u}{u^tZ^{(k)}u} = \frac{u^tA^{(k)}u}{u^tZ^{(k)}u} \in (0,1],$$
which implies by the definition of the polynomial that the lower bound does not exceed 1. Also, for any u ≠ 0,
$$\frac{u^tS^{(k)}u}{u^tZ^{(k)}u} = \frac{u^tS^{(k)}u}{u^tA^{(k)}u}\,\frac{u^tA^{(k)}u}{u^tZ^{(k)}u} \le \gamma_k^{-1}.$$
If u is such that P(u^tA^{(k)}u/u^tM^{(k)}u) = 0 (and note that we have required P to have a zero in I_k), then we find
$$\frac{u^tS^{(k)}u}{u^tZ^{(k)}u} \ge \frac{u^tA^{(k)}u}{u^tZ^{(k)}u} = 1,$$
which establishes that the upper bound is at least 1.

Investigation of u^tS^{(k)}u/u^tZ^{(k)}u gives us bounds for u^tA^{(k+1)}u/u^tM^{(k+1)}u.
Lemma 5.3. For both versions of the method
$$\inf_u \frac{u^tA^{(k+1)}u}{u^tM^{(k+1)}u} = \inf_{u_2} \frac{u_2^tS^{(k)}u_2}{u_2^tZ^{(k)}u_2},$$
and
$$\sup_u \frac{u^tA^{(k+1)}u}{u^tM^{(k+1)}u} = 1\ \text{for version (i)};\qquad \sup_u \frac{u^tA^{(k+1)}u}{u^tM^{(k+1)}u} = \sup_{u_2} \frac{u_2^tS^{(k)}u_2}{u_2^tZ^{(k)}u_2}\ \text{for version (ii)}.$$

Proof. For the lower bound let
$$\alpha = \inf_u \frac{u^tS^{(k)}u}{u^tZ^{(k)}u},$$
and note that lemma 5.2 shows that α ≤ 1; using (11), the bound then follows from
$$(14)\qquad \frac{u^tM^{(k+1)}u}{u^tA^{(k+1)}u} = 1 + \frac{u^t(M^{(k+1)}-A^{(k+1)})u}{u^tA^{(k+1)}u} = 1 + \frac{u_2^t(Z^{(k)}-S^{(k)})u_2}{u^tA^{(k+1)}u} \le 1 + (\alpha^{-1}-1)\frac{u_2^tS^{(k)}u_2}{u^tA^{(k+1)}u} \le \alpha^{-1},$$
where the last inequality follows from α ≤ 1 and lemma 5.1. For the upper bound we have to consider the two versions of the method separately. For version (i), equations (11) and (12) imply that
$$u^tM^{(k+1)}u = u^tA^{(k+1)}u + u_2^t(Z^{(k)}-S^{(k)})u_2 \ge u^tA^{(k+1)}u;$$
for version (ii) let
$$\beta = \inf_{u_2} \frac{u_2^tZ^{(k)}u_2}{u_2^tS^{(k)}u_2},$$
and note that β ≤ 1 by lemma 5.2. Now we have for all vectors u = (u_1, u_2)
$$(1-\beta)\,u^tA^{(k+1)}u \ge (1-\beta)\,u_2^tS^{(k)}u_2 \ge u_2^t(S^{(k)}-Z^{(k)})u_2,$$
so
$$u^tM^{(k+1)}u = u^tA^{(k+1)}u + u_2^t(Z^{(k)}-S^{(k)})u_2 \ge \beta\,u^tA^{(k+1)}u.$$
Also in this case we have equality for the vectors u = (u_1, u_2) with u_1 = D^{-1}C^tu_2 (so that u^tA^{(k+1)}u = u_2^tS^{(k)}u_2) for which the infimum is attained.

Lemma 5.4. The following upper bounds hold for the quadratic forms of A^{(k+1)} and S^{(k+1)} relative to M^{(k+1)}: for version (i)
$$\frac{u^tA^{(k+1)}u}{u^tM^{(k+1)}u} \le 1,\qquad \frac{u^tS^{(k+1)}u}{u^tM^{(k+1)}u} \le \gamma_{k+1}^{-1},$$
and for version (ii)
$$\frac{u^tA^{(k+1)}u}{u^tM^{(k+1)}u} \le \gamma_k^{-1},\qquad \frac{u^tS^{(k+1)}u}{u^tM^{(k+1)}u} \le \gamma_k^{-1}\gamma_{k+1}^{-1}.$$

Proof. We proved the bounds for A^{(k+1)} in the previous lemma. The bounds for S^{(k+1)} then follow from lemma 5.1.

Corollary 5.5. The interval I_k can be taken as
$$I_k = \left[\inf_u \frac{u^tS^{(k-1)}u}{u^tZ^{(k-1)}u},\ \gamma_k^{-1}\gamma_{k-1}^{-\theta}\right],$$
where θ = 0 for version (i) and θ = 1 for version (ii).

Proof. Combine lemmas 5.3 and 5.4, and for the lower bound for version (i) use in addition that
$$\frac{u^tS^{(k)}u}{u^tM^{(k)}u} = \frac{u^tS^{(k)}u}{u^tA^{(k)}u}\,\frac{u^tA^{(k)}u}{u^tM^{(k)}u} \ge \frac{u^tA^{(k)}u}{u^tM^{(k)}u}$$
for all u ≠ 0, and lemma 5.1, part 1.

We will now give the definition of the polynomials P_{ν_k}^{(k)}. The best approximation to zero on the interval I_k among polynomials of degree ν satisfying 0 ≤ P(x) < 1 and P(0) = 1 is the shifted and scaled Chebyshev polynomial
$$(15)\qquad P_{\nu_k}^{(k)}(t) = \frac{1 + T_{\nu_k}\!\left(\dfrac{\gamma_k^{-1}+\lambda_k-2t}{\gamma_k^{-1}-\lambda_k}\right)}{1 + T_{\nu_k}\!\left(\dfrac{\gamma_k^{-1}+\lambda_k}{\gamma_k^{-1}-\lambda_k}\right)},$$
where T_ν(x) = ½[(x+√(x²−1))^ν + (x−√(x²−1))^ν] and
$$\lambda_k = \inf_u \frac{u^tS^{(k-1)}u}{u^tZ^{(k-1)}u};$$
for version (ii), γ_k^{-1} is replaced by the upper endpoint γ_k^{-1}γ_{k-1}^{-1} of I_k.

Remark 5.6. For ν_k = 1 the polynomial reduces to
$$P_1^{(k)}(t) = 1 - \gamma_k t\qquad\text{and}\qquad P_1^{(k)}(t) = 1 - \gamma_k\gamma_{k-1}t$$
for versions (i) and (ii), respectively.

Lemma 5.7. If we define λ_k by
$$\lambda_k = \inf_u \frac{u^tS^{(k)}u}{u^tM^{(k)}u}\ \text{for version (i)};\qquad \lambda_k = \inf_u \frac{u^tA^{(k)}u}{u^tM^{(k)}u}\ \text{for version (ii)},$$
then a lower bound for u^tS^{(k)}u/u^tZ^{(k)}u is given by
$$\frac{u^tS^{(k)}u}{u^tZ^{(k)}u} \ge 1 - P_{\nu_k}^{(k)}(\lambda_k)$$
for both versions.

Proof. The assertion follows as
$$\frac{u^tS^{(k)}u}{u^tZ^{(k)}u} = 1 - P_{\nu_k}^{(k)}\!\left(\frac{\tilde u^tS^{(k)}\tilde u}{\tilde u^tM^{(k)}\tilde u}\right)\quad\text{for some } \tilde u$$
for version (i), and
$$\frac{u^tS^{(k)}u}{u^tZ^{(k)}u} = \frac{u^tS^{(k)}u}{u^tA^{(k)}u}\left[1 - P_{\nu_k}^{(k)}\!\left(\frac{\tilde u^tA^{(k)}\tilde u}{\tilde u^tM^{(k)}\tilde u}\right)\right]\quad\text{for some } \tilde u$$
for version (ii), where lemma 5.1 has also been used.

Remark 5.8. Note that also for version (i) lemma 5.3 shows that
$$\frac{u^tA^{(k+1)}u}{u^tM^{(k+1)}u} \ge 1 - P_{\nu_k}^{(k)}(\lambda_k)\qquad\text{for all } u.$$
We collect the above results into a theorem.
Theorem 5.9. The following bounds hold for the quadratic forms of A^{(k+1)} and S^{(k+1)} relative to M^{(k+1)}: for version (i)
$$\frac{u^tA^{(k+1)}u}{u^tM^{(k+1)}u} \in [1-P_{\nu_k}^{(k)}(\lambda_k),\ 1],\qquad \frac{u^tS^{(k+1)}u}{u^tM^{(k+1)}u} \in [1-P_{\nu_k}^{(k)}(\lambda_k),\ \gamma_{k+1}^{-1}],$$
and for version (ii)
$$\frac{u^tA^{(k+1)}u}{u^tM^{(k+1)}u} \in [1-P_{\nu_k}^{(k)}(\lambda_k),\ \gamma_k^{-1}],\qquad \frac{u^tS^{(k+1)}u}{u^tM^{(k+1)}u} \in [1-P_{\nu_k}^{(k)}(\lambda_k),\ \gamma_k^{-1}\gamma_{k+1}^{-1}],$$
where the quantities λ_k satisfy the recurrence
$$\lambda_k = 1 - P_{\nu_{k-1}}^{(k-1)}(\lambda_{k-1}).$$
Note that the above theorem is somewhat independent of the actual choice of the polynomials; the results hold for any polynomial such that its maximum on the interval I_k is taken at the left boundary. It is also worthwhile to remark that there is an essential difference in behaviour between polynomials of odd and even degree, in that polynomials of even degree are very sensitive to estimates of the upper bound of I_k. If we underestimate the upper bound, in both cases the condition P(x) ∈ [0,1] is violated, but for polynomials of odd degree this gives P(x) < 0, which merely results in eigenvalues of M^{-1}A greater than 1. For even degree polynomials, on the other hand, it gives P(x) > 1, leading to negative eigenvalues, so it may result in divergent methods. To analyze the spectral condition number κ_k of M^{(k)^{-1}}A^{(k)} for version (i), we note that for ν = 1
$$\kappa_{k+1} = \frac{1}{1-P(\lambda_k)} = \frac{1}{\gamma_k\lambda_k} = \gamma_k^{-1}\kappa_k,$$
so the condition number on the finest mesh is
$$\kappa_p = \prod_{k=0}^{p-1}\gamma_k^{-1}.$$
As λ_k^{-1} ≥ 1 + δ for some positive δ, the condition number grows geometrically with the number of levels. An upper bound for the number of iterations needed to achieve a certain relative error ε in the norm {r^t M^(p)^{-1} r}^{1/2}, where r is the residual, is found to be (see [3])

    #iterations = (1/2) √κ_p log(2/ε).
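As a small illustration of how this bound behaves, the following sketch (the function name and the sample values q, kappa, eps are ours, not from the text) shows that multiplying κ_p by a factor q multiplies the bound by √q:

```python
import math

def max_iterations(kappa, eps):
    # upper bound (1/2) * sqrt(kappa) * log(2/eps) on the number of
    # preconditioned CG iterations for relative error eps
    return 0.5 * math.sqrt(kappa) * math.log(2 / eps)

# if the condition number grows by a factor q per level, the iteration
# bound grows by sqrt(q) per level
q, kappa, eps = 1.618, 50.0, 1e-10
growth = max_iterations(q * kappa, eps) / max_iterations(kappa, eps)
```

This is why a geometric growth of κ_p, as above, translates into a geometric growth of the iteration count with half the exponent.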
For the standard nine-point stencil we have

    κ_p → (1/3) c₁ (σ^{-1})^p        (c₁ = (5 + 2√5)/15, σ^{-1} ≈ 1.618),

so

    #iterations = O((1/σ)^{p/2} log(2/ε)) = O(2^{(log₂ 1/σ) p/2} log(2/ε)).

With h^{-1} = 2^{p/2} we find

    #iterations = O(h^{−log₂(1/σ)} log(2/ε))

for h → 0, which incidentally shows that the number of iterations grows asymptotically more slowly than for the incomplete factorization method, where the number of iterations grows as O(h^{-1}), but faster than for the modified incomplete factorization method (see [3], [8], or [12], for instance), where it is O(h^{-0.5}). For version (ii), and for the relative condition of S^(k) and M^(k), we get a similar analysis, but involving ratios of λ_k's.

Consider now the case ν ≥ 2. Actually, from the results in section 6 we deduce that only ν = 2 is of interest, because of the high computational complexity for ν ≥ 3. For ν = 2, equation 15 shows that for version (i)
    P^(ν_k)(t) = ( (λ_k^{-1} + μ_k − 2t) / (λ_k^{-1} + μ_k) )²,
    P^(ν_k)(μ_k) = ( (λ_k^{-1} − μ_k) / (λ_k^{-1} + μ_k) )²,

so

(16)    λ_{k+1} = 4 μ_k λ_k^{-1} / (λ_k^{-1} + μ_k)²,    k = 0, …, p − 1.

It is readily seen that if μ_k ≥ μ̂, the recursion converges to a positive fixed point λ̂ = 2μ̂^{-1/2} − μ̂^{-1} if μ̂ > 1/4. Even though the sequence μ_k is in fact not constant, the quantities λ_k are bounded below by a number independent of p. The condition numbers are κ_k = λ_k^{-1}, and

    κ_{k+1} = (λ_k^{-1} + μ_k)² / (4 μ_k λ_k^{-1}),    k = 0, …, p − 1.

We have seen earlier that b₀ = a₀ implies a constant sequence μ_k, in which case

    κ_k → κ̂ = μ̂ / (2√μ̂ − 1) ≈ 1.09,

that is, a very small condition number. The condition numbers for the standard nine-point difference stencils converge to the same limit and should be close to this.
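The convergence of recursion (16) toward this fixed point is easy to check numerically. The sketch below iterates the map for a constant μ_k ≡ μ̂; the value μ̂ = 2 is purely illustrative, chosen so that the limiting condition number happens to come out near the 1.09 quoted above:

```python
import math

def next_lam(lam, mu):
    # recursion (16): lambda_{k+1} = 4*mu*lambda_k^{-1} / (lambda_k^{-1} + mu)^2
    return 4 * mu / lam / (1 / lam + mu) ** 2

mu_hat = 2.0        # illustrative constant value of mu_k (mu_hat > 1/4)
lam = 0.5           # arbitrary positive starting value
for _ in range(80):
    lam = next_lam(lam, mu_hat)

lam_hat = 2 / math.sqrt(mu_hat) - 1 / mu_hat   # fixed point 2*mu^(-1/2) - mu^(-1)
kappa_hat = 1 / lam_hat                        # limiting condition number, about 1.09
```

The map is a contraction near λ̂ (its derivative there is (1 − √μ̂)/√μ̂), so the iterates settle on the closed-form fixed point quickly.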
Consider next preconditioners where polynomials of first and higher degree are used on alternating levels; that is, ν_k = 1, and ν_{k+1} = 2 or ν_{k+1} = 3. For ν_{k+1} = 2 we find

(17)    κ_{k+2} = λ_{k+1}^{-1} κ_{k+1} = κ_{k+1} (1 + μ_k λ_k^{-1})² / (4 μ_k λ_k^{-1}).

Similarly to the above derivation for the case where ν = 2 was used throughout, we find that 17 has a fixed point κ̂ = μ̂/(2μ̂ − 1) if μ_k ≥ μ̂ and μ̂ > 1/2. For μ̂ = σ we find

    κ̂ = σ/(2σ − 1) = (3 + √5)/2 ≈ 2.62.

For ν_{k+1} = 3, a computation with the third degree Chebyshev polynomial T₃, evaluated at (μ_k + λ_k)/(μ_k − λ_k), gives

    1 − P₃^(k)(λ_k) = [ 1 + (1/8)(μ_k − λ_k)² / ((μ_k + λ_k) λ_k) ]^{-1},

so, with κ_{k+1} = λ_k^{-1} κ_k,

(18)    κ_{k+2} = λ_{k+1}^{-1} κ_{k+1} = κ_{k+1} [ 1 + (1/8)(μ_k − λ_k)² / ((μ_k + λ_k) λ_k) ].

If μ_k ≥ μ̂ this recurrence has a fixed point κ̂, and if μ̂ > 1/3 we find

    κ̂ = μ̂ (3 − μ̂) / (3μ̂ − 1),

which, if μ̂ = σ, equals

    κ̂ = 3/2 + √5/10 ≈ 1.72.
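As a numerical cross-check of the two alternating-level fixed points above (under the assumption μ̂ = σ = (√5 − 1)/2 ≈ 0.618), the closed forms can be compared against their quoted values:

```python
import math

sigma = (math.sqrt(5) - 1) / 2     # sigma ≈ 0.618, so sigma^{-1} ≈ 1.618

# degrees 1 and 2 alternating: kappa = mu/(2*mu - 1) evaluated at mu = sigma
kappa_12 = sigma / (2 * sigma - 1)

# degrees 1 and 3 alternating: kappa = mu*(3 - mu)/(3*mu - 1) at mu = sigma
kappa_13 = sigma * (3 - sigma) / (3 * sigma - 1)
```

Both identities are exact: σ/(2σ − 1) = (3 + √5)/2 ≈ 2.62 and σ(3 − σ)/(3σ − 1) = 3/2 + √5/10 ≈ 1.72, confirming that the 1&3 combination yields the smaller limiting condition number.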
5.2. Analysis of parameter-free polynomials. The previous section gave an analysis for the case where the polynomials P^(ν_k) were optimized with respect to the quantities λ_k and μ_k. This of course presumes that these quantities are computable, which in general, and in particular for problems with strongly variable coefficients, may not be the case. In this section we therefore suggest polynomials that do not require explicit knowledge of parameters. In particular, we give an analysis of the quantities

    inf_u (u^t A^(k) u)/(u^t M^(k) u),    sup_u (u^t A^(k) u)/(u^t M^(k) u)

for several specific classes of polynomials. The quantities u^t S^(k) u / u^t M^(k) u differ from these by at most a simple factor, as was shown in the previous section. We start by considering separately polynomials of even and odd degree, where the latter are required to have degree at least 3. We then consider the case ν = 1, and the use of mixed first and higher degree polynomials.

Case I: P(x) = α^ν (x − α^{-1})^ν for ν even. In this case we show that the polynomial P assumes only values in (0, 1) for some value of α > 1. Then lemma 5.5 applies, and we find that if I_k is such that for some τ_k ∈ (0, 1]

    I_k = [τ_k, τ_k^{-1}],

then

    I_{k+1} = [τ_{k+1}, τ_{k+1}^{-1}],    τ_{k+1} = min{1 − P(τ_k), 1 − P(τ_k^{-1})}.

It is seen that α^{-1} = min_k τ_k satisfies the above assumption.

Case II: P(x) = (1 − βx)^ν for ν odd, ν > 1, and β ∈ (0, 1]. First let us assume that β = 1. In this case we show that there exist quantities μ_k such that

    (u^t A^(k) u)/(u^t M^(k) u) ∈ [1, μ_k]   for all u,

and such that the sequence k ↦ μ_k is bounded. Suppose inductively that

    (u^t A^(k) u)/(u^t M^(k) u) ∈ [1, μ_k]   for all u,

for some μ_k > 1. Then

    (u^t S^(k) u)/(u^t M^(k) u) ∈ [1, σ^{-1} μ_k],

and, as 1 − P is increasing and 1 − P(1) = 1, we find

    (u^t S^(k) u)/(u^t Z^(k) u) ∈ [1, μ_{k+1}],    μ_{k+1} = 1 − P(σ^{-1} μ_k).

Now use lemma 5.5 and the second half of the proof of lemma 5.4 to find that also

    (u^t A^(k+1) u)/(u^t M^(k+1) u) ∈ [1, μ_{k+1}].

In order to show that the μ_k's converge if ν = 2ν₀ + 1 (for ν₀ > 0), we show that the recursion

    x₀ = 1,    x_{k+1} = 1 − (1 − σ^{-1} x_k)³

has a fixed point for σ^{-1} sufficiently close to 1. It is easily seen that if this is the case, a fixed point also exists for higher values of ν₀. Elementary computation shows that this recurrence has a fixed point if

    σ^{-1} ≤ 4/3.
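This threshold can be checked directly: substituting t = 1 − σ^{-1}x into the fixed-point equation x = 1 − (1 − σ^{-1}x)³ gives t² + t + 1 − σ = 0, which has a real root precisely when σ^{-1} ≤ 4/3. A small sketch (with a standing in for σ^{-1}; the helper names are ours):

```python
def fixed_point(a):
    # real fixed point of x -> 1 - (1 - a*x)^3, if any; exists iff a <= 4/3
    disc = 4 / a - 3
    if disc < 0:
        return None
    t = (-1 + disc ** 0.5) / 2          # root of t^2 + t + 1 - 1/a = 0
    return (1 - t) / a

def iterate(a, n):
    # run the recurrence x_0 = 1, x_{k+1} = 1 - (1 - a*x_k)^3
    x = 1.0
    for _ in range(n):
        x = 1 - (1 - a * x) ** 3
    return x
```

For a = 1.2 the iterates settle on the fixed point, while for a = σ^{-1} ≈ 1.618 no fixed point exists and the iterates blow up.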
As this condition is not met in practice, we choose β such that a fixed point is ensured. For constant values of μ_k a correspondingly scaled β would be a convenient choice; in practice it suffices to take β = 1/2. Note, however, that in the numerical tests on the Poisson problem, even for β = 1 the method converges in a number of iterations that is of optimal order, or close to it.

Case III: P(x) = 1 − x. The important observation in the case of P(x) = 1 − x is that
    Z^(k) = M^(k),

so if μ_k is such that

    (u^t A^(k) u)/(u^t M^(k) u) ≤ μ_k   for all u,

then

    sup_u (u^t A^(k+1) u)/(u^t M^(k+1) u) ≤ sup_u (u^t S^(k) u)/(u^t M^(k) u)
        ≤ sup_u (u^t S^(k) u)/(u^t A^(k) u) · sup_u (u^t A^(k) u)/(u^t M^(k) u) ≤ λ_k^{-1} μ_k;

that is,

    μ_{k+1} = λ_k^{-1} μ_k.

The lower bound of 1 carries over from the above analysis for polynomials of odd degree. Thus in this case we find an (approximately) geometric increase of the condition number, and consequently an increase of the number of iterations inversely proportional to the mesh width.

Case IV: P(x) of first and higher degree on alternating levels. It is shown in section 6 that if second or third degree polynomials are employed only on every other level, the number of operations per preconditioner solve is of the order of the number of unknowns. In order to show that a condition number O(1) can still be reached, we have to extend the above analysis somewhat. Again we consider the cases of odd and even polynomials separately.

Assume that k is such that on level k + 1 we have P(x) = 1 − x, while on level k the polynomial is of even degree. Let λ_k and μ_k be such that

    (u^t A^(k) u)/(u^t M^(k) u) ∈ [λ_k, μ_k]   for all u,

and let the polynomial satisfy

(19)    x ∈ [λ_k, μ_k λ_k^{-1}]  ⟹  P(x) ∈ [0, 1].
For a lower bound on the condition number we find, referring to lemma 5.4,

    inf_u (u^t A^(k+2) u)/(u^t M^(k+2) u) = inf_{u₂} (u₂^t S^(k+1) u₂)/(u₂^t Z^(k+1) u₂)
        ≥ inf_{u₂} (u₂^t S^(k+1) u₂)/(u₂^t M^(k+1) u₂)
        ≥ inf_{u₂₂} (u₂₂^t S^(k) u₂₂)/(u₂₂^t Z^(k) u₂₂)
        = inf_{u₂₂} [ 1 − P( (u₂₂^t S^(k) u₂₂)/(u₂₂^t M^(k) u₂₂) ) ].

Again using the proof of lemma 5.4, and taking the requirements on the polynomial into account, we find that

    (u^t S^(k) u)/(u^t Z^(k) u) ∈ [0, 1],   so   (u^t A^(k+1) u)/(u^t M^(k+1) u) ≤ 1,

and

    sup_u (u^t S^(k+1) u)/(u^t Z^(k+1) u) = sup_u (u^t S^(k+1) u)/(u^t M^(k+1) u) ≤ λ_{k+1}^{-1},

which shows that

    (u^t A^(k+2) u)/(u^t M^(k+2) u) ≤ λ_{k+1}^{-1}   for all u.

Thus we have proved inductively that

    0 < λ_{k+2} = inf_{x ∈ [λ_k, μ_k λ_k^{-1}]} (1 − P(x)) ≤ 1,    μ_{k+2} = λ_{k+1}^{-1}.
An easy way to have condition 19 satisfied is to choose P(x) = α^ν (x − α^{-1})^ν, where α^{-1} is a lower bound for all of the λ_k's. Next, let the polynomial on level k be of odd degree; specifically, let

(20)    P(x) = (1 − βx)^ν,

where β ∈ (0, 1]. We begin by assuming that β = 1 is chosen. Our inductive assumption is

    (u^t A^(k) u)/(u^t M^(k) u) ∈ [1, μ_k]   for all u.

As

    (u^t S^(k+2) u)/(u^t M^(k+2) u) ∈ [1, μ_k λ_k^{-1}]  ⟹  (u^t S^(k+2) u)/(u^t Z^(k+2) u) ∈ [1, 1 − P(μ_k λ_k^{-1})],

following the above analysis for odd degree polynomials we get the lower bound

    (u^t A^(k+2) u)/(u^t M^(k+2) u) ≥ 1   for all u

and the upper bound

    sup_u (u^t A^(k+2) u)/(u^t M^(k+2) u) = sup_{u₂} (u₂^t S^(k+1) u₂)/(u₂^t Z^(k+1) u₂)
        ≤ λ_{k+1}^{-1} sup_{u₂₂} (u₂₂^t S^(k) u₂₂)/(u₂₂^t Z^(k) u₂₂).

As the recurrence

    μ_{k+2} = σ^{-1} [ 1 − (1 − σ^{-1} μ_k)^ν ]

does not have a fixed point for any value of ν, we will again have to choose β < 1. In practice a value of β = 0.5 is sufficient.

6. Computational complexity. The solution of a system M^(k+1) a = b involves the solution of a system of lower dimension with Z^(k). From the definition 3 of Z^(k), we see that solving Z^(k) y = x is equivalent to computing

(21)    y = [ I − P(M^(k)^{-1} S^(k)) ] S^(k)^{-1} x

for version (i), or

    y = [ I − P(M^(k)^{-1} A^(k)) ] A^(k)^{-1} x

for version (ii). As was already remarked before, we choose for P a polynomial which can be written

    P(x) = 1 − c₁ x − ⋯ − c_ν x^ν,

so that for instance the solution process 21 boils down to
(22)    y = [ c_ν (M^(k)^{-1} S^(k))^{ν−1} M^(k)^{-1} + ⋯ + c₁ M^(k)^{-1} ] x
          = M^(k)^{-1} [ c₁ + S^(k) M^(k)^{-1} ( c₂ + ⋯ + S^(k) M^(k)^{-1} c_ν ) ⋯ ] x,

which can be written algorithmically as

    y ← 0
    for j = 0, …, ν − 1:
        y ← M^(k)^{-1} ( c_{ν−j} x + S^(k) y )

for version (i); for version (ii) we must replace S^(k) by A^(k).

We will now proceed to give operation counts for the solution of a system with preconditioners using combined five-point/nine-point stencils and the elimination procedure described above. If we take N to denote the number of points on the finest mesh, at level p, we define N_p = N; the number of points N_k on the mesh at level k then equals (disregarding boundary effects)

    N_k = N / 2^{p−k}.
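The Horner-type loop above can be sketched with dense matrices as follows (illustrative only: here M^(k) and S^(k) are small random SPD matrices and the solves use a generic dense solver, whereas in the actual method the M^(k)-solves reduce to diagonal operations):

```python
import numpy as np

def zsolve(M, S, c, x):
    # evaluate y = M^{-1}(c_1 + S M^{-1}(c_2 + ... + S M^{-1} c_nu)...) x,
    # which equals [I - P(M^{-1} S)] S^{-1} x for P(t) = 1 - c_1 t - ... - c_nu t^nu
    y = np.zeros_like(x)
    for cj in reversed(c):              # c = [c_1, ..., c_nu]
        y = np.linalg.solve(M, cj * x + S @ y)
    return y

# consistency check against the unfactored form (21)
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)); M = A @ A.T + 4 * np.eye(4)
B = rng.standard_normal((4, 4)); S = B @ B.T + 4 * np.eye(4)
x = rng.standard_normal(4)
c = [0.8, 0.3]                          # P(t) = 1 - 0.8 t - 0.3 t^2
Bm = np.linalg.solve(M, S)              # M^{-1} S
y_direct = (c[0] * Bm + c[1] * Bm @ Bm) @ np.linalg.solve(S, x)
```

The loop performs ν solves with M^(k) and ν − 1 multiplications by S^(k), which is where the operation counts below come from.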
From the definition 2 of the preconditioner and the above algorithm for solving systems with Z^(k), we find that the number of operations ω_{k+1} needed to solve a system with M^(k+1) can (for version (i)) be given by

    ω_{k+1} =  2 N_k                        (solve with D^(k), twice)
             + 2·4 N_k                      (multiply by C and C^t)
             + ν ω_k + (ν + 9(ν − 1)) N_k   (solve the system with Z^(k))
             = (5ν + 1/2) N_{k+1} + ν ω_k.  (total)

For version (ii) we find similarly

    ω_{k+1} = (4ν + 3/2) N_{k+1} + ν ω_k.

These operation counts concern the number of multiplications; the number of additions is somewhat smaller. When the coefficients are locally constant, we can actually reduce the number of multiplications substantially, using a method of summation of differences, or taking averages for the computation of matrix-vector products; for further details see [3]. However, in the interest of not presenting too many variants of our method, we do not give the corresponding work estimates here.

It is now easy to estimate the amount of work for various values of ν. Let the amount of work ω₀ at the coarsest level be given. As N_k = N_{k+1}/2, we solve the recurrence for the amount of work ω_k explicitly:

    ω_k = (5ν + 1/2)(1 + ν/2 + ⋯ + (ν/2)^{k−1}) N_k + ν^k ω₀.

For some specific values of ν we find for the total amount of work at the finest level

    ν = 1:  ω_p ≈ 11 N_p + ω₀,
    ν = 2:  ω_p ≈ 10.5 p N_p + 2^p ω₀,
    ν = 3:  ω_p ≈ 31 (3/2)^p N_p + 3^p ω₀

for version (i); for version (ii) we find somewhat smaller constants, except for ν = 1, when the methods coincide. Thus, for ν = 2 or larger the preconditioner can no longer be of optimal order of computational complexity. As it is desirable to have polynomial degrees higher than 1 (see the analysis of the condition number in section 5), we shall propose a preconditioner that uses different polynomials on different levels. In particular, let us consider a preconditioner which uses first and either second or third degree polynomials on alternate levels. Let k be such that a first degree polynomial is used on level k + 1. For the amount of work we then find for version (i)

    ω_{k+2} = 5.5 N_{k+2} + (5ν + 1/2) N_{k+1} + ν ω_k = (23/4 + 5ν/2) N_{k+2} + ν ω_k.

The explicit solution of this recurrence is

    ω_{2k} = (23/4 + 5ν/2)(1 + ν/4 + ⋯ + (ν/4)^{k−1}) N_{2k} + ν^k ω₀,

which gives (neglecting the last term)

    ν = 2:  ω_p ≈ 21.5 N_p,    ν = 3:  ω_p ≈ 53 N_p
for version (i), and
    ν = 2:  ω_p ≈ 20.5 N_p,    ν = 3:  ω_p ≈ 49 N_p

for version (ii). Hence both versions of the method give, when using polynomials of degree 1 and 2 or 3 on alternating levels, a preconditioner with computational work per preconditioning step of optimal order, O(N).

7. Numerical tests. We have performed a number of tests, to examine the optimality of the methods, to compare the Chebyshev polynomials and the parameter-free polynomials, and to investigate the sensitivity of the methods to anisotropy. The method used throughout is the conjugate gradient method with a multilevel preconditioner M with various degrees of polynomials, subject to the stopping criterion
    √(g^t M^{-1} g) < 10^{-10},

where g is the residual. The starting vector was the zero vector, and the solution of the differential equation was u ≡ 1 throughout. We have tested the Poisson problem, and the anisotropic problem

    −u_xx − 100 u_yy = f    on Ω = (0, 1)².

The coefficient matrix was a nine-point matrix (that is, an S^(p) matrix) derived from the fourth order box finite difference stencil, and the preconditioner was of version (i). The incomplete factorization was pursued all the way down to where the size of the system was 1, so the number of levels was p = 2⌈log₂ h^{-1}⌉.

Some observations. For the methods with alternating first and higher degree polynomials, it makes a difference whether the first degree polynomial is taken on the axiparallel or on the skew grid. It turns out that for the Chebyshev polynomials one has to take the axiparallel grid, but for the parameter-free polynomials it is better to take the skew grid for the first degree polynomials. In both cases the numbers of iterations may differ, especially for the anisotropic problem, by a factor of 2 depending on which grid is taken. We have reported only the optimal numbers of iterations.

For first degree polynomials the method with the Chebyshev polynomials converges surprisingly slowly, to such an extent that for the anisotropic problem no convergence within the first 100 iterations was reached for h^{-1} > 20. The convergence speed for the parameter-free first degree polynomials is probably due to an effect of improved condition through underestimation of the upper bound of the spectrum, as was described above.

When parameter-free polynomials of alternating degree are used on the anisotropic problem, the value of β in the definition of the polynomials is crucial. Although for the Poisson problem β = 1 seems optimal, with other values giving slightly more iterations, for anisotropic problems this choice gives a very slowly converging method.
Taking β = 1/2, on the other hand, gives, for third degree polynomials on every other level, satisfactory convergence speeds that are within a factor of 2 of those for the (optimal) Chebyshev polynomials.
Numerical tests indicate that this value β = 1/2 suffices for most values of the anisotropy; only for the Poisson problem does performance improve slightly when β = 1 is chosen.

For parameter-free polynomials of the second degree, the choice of α is crucial. It turned out not to be necessary to choose α = max_k λ_k^{-1}: about half to one third of that value sufficed. The resulting numbers of iterations were within a factor of two of those of the methods with optimal polynomials. Choosing α smaller led to divergent methods, and choosing it larger rapidly led to slower convergence. However, since no practical method for finding this optimal α seems to exist, we have only given results for this variant for the Poisson problem, where α = 1 suffices.

We start by giving, in tables 1 and 2, the iteration counts on the Poisson problem. On this problem the "V-cycle" methods (polynomial degree 1) give numbers of iterations that seem to indicate a condition number of less than O(h^{-1}); the "W-cycle" methods (polynomial degree 2) give constant numbers of iterations, while using alternating first and higher degree polynomials gives a slowly increasing number of iterations for the parameter-free polynomials.

    1/h =      10   20   40   80
    ν = 1      10   13   17   22
    ν = 2       6    6    6    6
    ν = 3       6    6    6    6
    ν = 1&2     7    7    7    7
    ν = 1&3     6    6    6    6

Table 1: Numbers of iterations for the 5/9 difference stencil on the Poisson problem using optimal polynomials.

    1/h =      10   20   40   80
    ν = 1       7   10   12   16
    ν = 2       7    8    8    8
    ν = 3       5    6    6    6
    ν = 1&2     7    8    8    9
    ν = 1&3     6    7    9   12

Table 2: Numbers of iterations for the 5/9 difference stencil on the Poisson problem using parameter-free polynomials.

As a prototype of a more difficult, and thus more realistic, problem we consider the anisotropic problem
    −u_xx − 100 u_yy = f.

Although we give the numbers of iterations for the Chebyshev polynomials of second and third degree on each level, these data, indicating that the preconditioner is spectrally equivalent in this case too, should be regarded as reference material only, because of the high operation count per iteration.
Our interest then lies with the first order methods, and with those alternating first and higher degree polynomials. As already remarked above, however, using first degree polynomials in a V-cycle method with Chebyshev polynomials gave a very slowly converging method, so these results were not reported. For the methods with alternating degrees of polynomials, the tests show a reasonably slow increase in the numbers of iterations for the second and in particular the third degree polynomials. The preconditioners with parameter-free polynomials converge in a number of iterations that stays within a factor of 2 of the optimal number for the Chebyshev polynomials.

    1/h =                    10   20   40   80
    Chebyshev:    ν = 2      13   16   17   17
                  ν = 3      10   11   12   12
                  ν = 1&2    24   34   38   44
                  ν = 1&3    13   15   16   17
    Parameter-free: ν = 1    15   20   23   26
                  ν = 1&3    16   20   25   28

Table 3: Numbers of iterations for the 5/9 difference stencil on the anisotropic problem.
Tests not reported here, on problems with even stronger anisotropy or with large discontinuities in the coefficients, show essentially the same behaviour, with numbers of iterations differing little from those given here.

Next we consider the efficiency of these methods compared with some incomplete block factorization preconditioners, namely the standard recursive incomplete line block preconditioner (denoted Lr in [4] and INV C1 in [10]), and a vectorizable variant of this in which series expansion of the pivot blocks is used in the preconditioner solve (denoted INV Vj in [15] and, more generally, Lx(p) in [4]). For the methods with alternating degree polynomials we use, for the Poisson problem, second degree polynomials, as for third degree the number of iterations is the same but the number of operations is about twice as high; for the anisotropic problem, however, second degree polynomials give an increasing number of iterations, so there we present the figures for third degree polynomials.

    h^{-1} =      10            20            40            80            160
    Lr         5  1.3   106   8  1.6   160  13  1.9   250  24  2.2   448  42   .   772
    Lx(3)      5  1.8   162   8  2.3   240  13  2.9   370  24  3.7   656  42   .  1124
    ν = 1      7  1.8   167  10  2.0   231  12  2.2   274  16  2.4   358   .   .     .
    ν = 1&2    7  1.38  263   7  1.4   263   7  1.42  263   7  1.42  263   .   .     .

Table 4: Numbers of iterations, condition number estimates, and operations/n² per PCG solve for multilevel and incomplete factorization preconditioners on the Poisson problem, using Chebyshev polynomials.

First of all we notice, with respect to scalar efficiency, that it is hard to outperform the recursive preconditioner Lr, especially as its performance improves with increasing anisotropy. Only for the Poisson problem is the multilevel preconditioner with ν = 1&2 more efficient, for sufficiently fine grids.
    h^{-1} =       10             20             40             80             160
    Lr          3  1.003   40   4  1.03    88   6  1.2   144  12  1.6   242  23   .   430
    Lx(3)       6  3.5    188  10  5.8   292  20  9.2   552  41 12.7  1098  75   .  1982
    ν = 1      15  6.2    337  20  8.7   442  23 10.9   505  26 16.3   568   .   .     .
    ν = 1&3    13  2.1    925  15  2.2  1051  16  2.3  1114  17  2.4  1177   .   .     .

Table 5: Numbers of iterations, condition number estimates, and operations/n² per PCG solve for multilevel and incomplete factorization preconditioners on the anisotropic problem; Chebyshev polynomials for the case 1&3.
On vector computers and parallel architectures, however, where a recursive preconditioner is less desirable, we have to compare the multilevel preconditioners with the vectorizable preconditioner Lx. It is then apparent that, even for the anisotropic problem, multilevel preconditioners can be competitive in scalar efficiency on relatively coarse meshes. The simple V-cycle method (ν = 1) with parameter-free polynomials then seems preferable to the W-cycle methods, as it has at most a slightly higher complexity on the Poisson problem, but a noticeably lower operation count on the anisotropic problem. Extrapolating the results for the anisotropic problem leads us to expect that the W-cycle methods become more efficient only for grids with h^{-1} > 600. Note that solving a system with the multilevel preconditioner is vectorizable, as only diagonal systems need to be solved. However, the sequence of grids presupposes that the architecture is capable of efficient gather/scatter operations. It is only on massively parallel architectures that W-cycle methods are the uncontested winners, with a number of iterations that is both lower than that of the V-cycle method and constant, or at most very slowly increasing.

In the last two tables estimates of the condition number of the preconditioned system are also given, based on parameters computed during the iterative process; see for instance [13]. Comparing, for instance, the incomplete factorization preconditioner Lx(3) and the first degree multilevel preconditioner on the anisotropic problem, we see that the latter has a higher condition number, but converges faster. These figures are thus not the best measure of the relative value of the methods. However, we do see that for the multilevel preconditioners the condition numbers grow slowly for the first degree methods, and are almost constant for the higher degree ones.

8. Conclusion.
We have investigated multilevel preconditioners for self-adjoint nine-point difference operators, where the incompleteness of the preconditioner derives from replacing the nine-point stencil by a five-point one in the red points of a red-black structure imposed on each grid. Each coarser grid then consists of the black points of the previous grid. We have shown that if the grids are traversed using certain polynomials, to obtain a generalized W-cycle-like method, it is possible to have a preconditioner that is both spectrally equivalent to the original matrix and has a number of operations proportional to the number of unknowns. As an added bonus, only systems with diagonal matrices need to be solved. Such multilevel preconditioners can function competitively on vector machines if there is an efficient gather/scatter mechanism to overcome the increasing grid sizes.
On massively parallel architectures they can be extremely fast, involving no global synchronization and only local communication.

REFERENCES

[1] O. Axelsson, On multigrid methods of the two-level type, in Multigrid Methods, Proceedings, Köln-Porz, W. Hackbusch and U. Trottenberg, eds., Lecture Notes in Mathematics, vol. 960, 1982, pp. 352–367.
[2] ——, A multilevel solution method for nine-point difference approximations, in Parallel Supercomputing: Methods, Algorithms and Applications, G. F. Carey, ed., John Wiley, 1989, pp. 199–205.
[3] O. Axelsson and A. Barker, Finite Element Solution of Boundary Value Problems. Theory and Computation, Academic Press, Orlando, FL, 1984.
[4] O. Axelsson and V. Eijkhout, Vectorizable preconditioners for elliptic difference equations in three space dimensions, J. Comp. Appl. Math., 27 (1989), pp. 299–321.
[5] O. Axelsson and I. Gustafsson, On the use of preconditioned conjugate gradient methods for red-black ordered five-point difference schemes, J. Comp. Physics, 35 (1980), pp. 284–299.
[6] O. Axelsson and P. Vassilevski, Algebraic multilevel preconditioning methods, II, Tech. Report 1988-15, Inst. for Sci. Comput., University of Wyoming, Laramie, 1988; to appear in SIAM J. Numer. Anal.
[7] ——, Algebraic multilevel preconditioning methods, I, Numer. Math., 56 (1989), pp. 157–177.
[8] R. Beauwens, On Axelsson's perturbations, Lin. Alg. Appl., 68 (1985), pp. 221–242.
[9] D. Braess, The contraction number of a multigrid method for solving the Poisson equation, Numer. Math., 37 (1981), pp. 387–404.
[10] P. Concus, G. Golub, and G. Meurant, Block preconditioning for the conjugate gradient method, SIAM J. Sci. Stat. Comput., 6 (1985), pp. 220–252.
[11] H. Elman, Approximate Schur complement preconditioners on serial and parallel computers, SIAM J. Sci. Stat. Comput., 10 (1989), pp. 581–605.
[12] I. Gustafsson, A class of first-order factorization methods, BIT, 18 (1978), pp. 142–156.
[13] L. Hageman and D. Young, Applied Iterative Methods, Academic Press, New York, 1981.
[14] T. Meis, Schnelle Lösung von Randwertaufgaben, Z. Angew. Math. Mech., 62 (1982), pp. 263–270.
[15] G. Meurant, The block preconditioned conjugate gradient method on vector computers, BIT, 24 (1984), pp. 623–633.
[16] M. Ries, U. Trottenberg, and G. Winter, A note on MGR methods, Lin. Alg. Appl., 49 (1983), pp. 1–26.
[17] H. Yserentant, On the multilevel splitting of finite element spaces, Numer. Math., 49 (1986), pp. 379–412.