THEORETICAL AND NUMERICAL COMPARISON OF PROJECTION METHODS DERIVED FROM DEFLATION, DOMAIN DECOMPOSITION AND MULTIGRID METHODS∗

J.M. TANG†, R. NABBEN‡, C. VUIK†, AND Y.A. ERLANGGA‡
Abstract. For various applications, it is well known that a two-level-preconditioned Krylov method is an efficient method for solving large and sparse linear systems. Beside a traditional preconditioner, such as an incomplete Cholesky decomposition, a projector is included as preconditioner to get rid of the effect of a number of small or large eigenvalues of the coefficient matrix. In the literature, various projection methods are known, coming from the fields of deflation, domain decomposition and multigrid. At first glance, the projectors seem to be different. However, from an abstract point of view, it can be shown that these methods are closely related. The aim of this paper is to compare these projection methods both theoretically and numerically. We investigate their convergence properties and stability by considering their implementation, the effect of rounding errors, inexact coarse solves, severe termination criteria and perturbed starting vectors. Finally, we end up with a suggestion for a second-level preconditioner that is as stable as the abstract balancing preconditioner and as cheap and fast as the deflation preconditioner.

Key words. deflation, domain decomposition, multigrid, conjugate gradients, two-grid schemes, preconditioning, implementation, SPD matrices, hybrid methods, coarse grid corrections, projection methods.

AMS subject classifications. 65F10, 65F50, 65N22, 65N55.

1. Introduction. The Conjugate Gradient (CG) method [14] is a very popular iterative method to solve large linear systems of equations,

    Ax = b,    A = [a_{ij}] ∈ R^{n×n},    (1.1)

whose coefficient matrix A is sparse and symmetric positive definite (SPD). The convergence rate of CG depends on the condition number of A, i.e., after j iterations of CG, the exact error is bounded by

    ||x − x_j||_A ≤ 2 ||x − x_0||_A ( (√κ − 1) / (√κ + 1) )^j,    (1.2)

where x_0 is the starting vector, κ = κ(A) denotes the spectral condition number of A, and ||x||_A is the A-norm of x, defined as ||x||_A = √(x^T A x). If κ is large, it is advisable to solve a preconditioned system, M^{−1}Ax = M^{−1}b, instead of (1.1). The SPD preconditioner, M^{−1}, should be chosen such that M^{−1}A has a more clustered spectrum, or a smaller condition number, than A. Furthermore, systems My = z must be cheap to solve, relative to the improvement they provide in convergence rate. Nowadays, the design and analysis of preconditioners for CG are a main focus of research. Even fast solvers, like multigrid (MG) or domain decomposition methods (DDM), are used as preconditioners. Traditional preconditioners are diagonal scaling, basic iterative methods, approximate inverse preconditioning, and incomplete Cholesky preconditioners. A discussion and overview of these fast solvers (MG and DDM) and traditional preconditioners is given in [43].

∗ Part of this work has been done during the visit of the first and third author at Technische Universität Berlin.
† Delft University of Technology, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft Institute of Applied Mathematics, Mekelweg 4, 2628 CD Delft, The Netherlands ([email protected], [email protected]). Part of the work of these authors has been funded by the Dutch BSIK/BRICKS project.
‡ Technische Universität Berlin, Institut für Mathematik, Straße des 17. Juni 136, D-10623 Berlin, Germany ([email protected], [email protected]). The work of these authors has been partially funded by the Deutsche Forschungsgemeinschaft (DFG), Project NA248/22.


However, it appears that the resulting preconditioned CG method shows slow convergence in many applications with highly refined grids, and for flows with high coefficient ratios in the original differential equations. In these cases, the presence of small eigenvalues has a harmful influence on the convergence of preconditioned CG.

Recently, it has appeared that, beside traditional preconditioners, a second-level preconditioner can be used to get rid of the effect of the small eigenvalues. This new type is also known as a projector or projection method, where extra coarse linear systems have to be solved during the iterations. Deflation is one of the frequently used projection methods, and it can itself be divided into several variants, see, e.g., [8, 16, 27, 30, 39]. Other typical examples of projectors are additive coarse grid correction [4, 5] and abstract balancing methods [17–19], which are well known in the fields of MG and DDM. Various projectors appear to be useful for solving problems with large jumps in the coefficients, combined with domain decomposition methods [32, 36], and in combination with block-Jacobi-type preconditioners in parallel computing [9, 38, 40]. Recently, it has appeared that two-level preconditioning can also be useful for problems with constant coefficients, which are solved on sequential computers [34].

At first glance, the projectors from deflation, DDM and MG seem to be different. Eigenvector approximations are usually used as deflation projectors, whereas special projections are built to transfer information to the whole domain or to a coarser grid in the cases of DDM and MG. However, from an abstract point of view, these projections are comparable, or even identical in some sense. In [24–26], theoretical comparisons have been given of the deflation, abstract balancing and additive coarse grid correction projectors. It has been shown that the deflation method is expected to be faster in convergence than the other two projectors, by considering, e.g., the effective condition numbers. For certain starting vectors, deflation and abstract balancing even produce the same iterates. Although these projectors seem to be comparable, deflation turned out to be unstable, as observed in the concise numerical experiments: the residuals stagnated or even diverged when the required accuracy was (too) high. On the other hand, the other two projectors are more stable, but they have the drawbacks that balancing is more expensive to apply and coarse grid correction is slower in convergence. More recent papers about the stability of projectors can be found in, e.g., [2, 11, 13].

Note that, in [24–26], the comparisons of deflation, abstract balancing and coarse grid correction were mainly based on theoretical aspects, whereas the numerical comparison was done only concisely. Additionally, there are more attractive projectors available in the literature, which were not included. Some of these basically employ the same operators, where slight differences can only be noticed in the implementation. In this paper, we examine a wide set of projection methods, used in different fields. Beside deflation, abstract balancing and coarse grid correction, some more attractive projectors are included in this set. First, this set will be compared theoretically, by considering the corresponding spectral properties, numerical implementations and equivalences.

Thereafter, the main focus will be on the numerical experiments, where the projectors will be tested on their convergence properties and stability. The effect of the different implementations will be analyzed extensively. The following questions will be answered:
• which projectors are stable with respect to rounding errors?
• which projectors can be applied if one uses inaccurate coarse solvers, severe termination criteria or perturbed starting vectors?
• is there a second-level preconditioner which is as stable as abstract balancing and as cheap and fast as deflation?

Beside the projection methods considered in this paper, some other variants are known, such as the augmented subspace CG [7], the deflated Lanczos method [30] and the


Odir and Omin versions of CG combined with extra vectors [1]. We refer to [30, 31] for a discussion and comparison of these methods. Finally, another comparison of projection methods has been carried out in [11], where methods like Init-CG, Def-CG, Proj-CG and SLRU have been compared. The aim of that paper is to obtain an optimal solver which exploits accurate spectral information about the coefficient matrix in an efficient way.

This paper is organized as follows. In Section 2, we introduce and discuss the projection methods. Section 3 is devoted to the theoretical comparison of these methods. Subsequently, the numerical comparison of the projectors is carried out in Section 4. Finally, the conclusions are given in Section 5.

2. Projection Methods. In this section, the projectors will be defined and motivated, but we first start with some terminology and preliminary results.

Definition 2.1. Suppose that an SPD coefficient matrix, A ∈ R^{n×n}, an SPD preconditioning matrix, M^{−1} ∈ R^{n×n}, and a deflation subspace matrix, Z ∈ R^{n×k}, with full rank and k ≪ n, are given. Then, we define the invertible coarse matrix, E ∈ R^{k×k}, the correction matrix, Q ∈ R^{n×n}, and the deflation matrix, P ∈ R^{n×n}, as follows:

    P := I − AQ,    Q := Z E^{−1} Z^T,    E := Z^T A Z,

where I is the identity matrix of appropriate size.

Lemma 2.2. The following equalities hold:
(a) P = P^2;
(b) PA = AP^T;
(c) P^T Z = 0, P^T Q = 0;
(d) PAZ = 0, PAQ = 0;
(e) QA = I − P^T, QAZ = Z, QAQ = Q;
(f) Q^T = Q.
Proof. See, e.g., [35].

If k ≪ n and Z has rank k, the coarse matrix E can easily be computed and factored, and it is SPD for any such Z. From an abstract point of view, all projectors will consist of an arbitrary M^{−1}, combined with one or more of the matrices P and Q. In the next subsection, we give a concise explanation of, and choices for, these matrices in the different fields. Nevertheless, from our point of view, the given matrices M^{−1} and Z are just arbitrary. This abstract setting allows us to compare the different approaches used in DDM, MG and deflation.
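To make Definition 2.1 and Lemma 2.2 concrete, the following small sketch (our own illustration, not part of the original paper; all variable names are hypothetical) builds E, Q and P for a random SPD matrix A and a full-rank Z, and spot-checks a few of the identities numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 5

# A random SPD matrix A and a full-rank deflation subspace matrix Z
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)
Z = rng.standard_normal((n, k))

E = Z.T @ A @ Z                    # coarse matrix E = Z^T A Z (k x k, SPD)
Q = Z @ np.linalg.solve(E, Z.T)    # correction matrix Q = Z E^{-1} Z^T
P = np.eye(n) - A @ Q              # deflation matrix P = I - AQ

# Spot-check Lemma 2.2: P^2 = P, PA = AP^T, PAZ = 0, QAQ = Q, Q^T = Q
assert np.allclose(P @ P, P)
assert np.allclose(P @ A, A @ P.T)
assert np.allclose(P @ A @ Z, 0)
assert np.allclose(Q @ A @ Q, Q)
assert np.allclose(Q, Q.T)
```

In practice, E is factored once (e.g., by a Cholesky decomposition) and P and Q are never formed explicitly; the dense n × n matrices above are for illustration only.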

2.1. Background of the Matrices in Domain Decomposition, Multigrid and Deflation. In the projection methods used in DDM, like the balancing Neumann-Neumann or the (two-level) additive coarse grid correction method, the preconditioner, M^{−1}, consists of the local exact or inexact solves on subdomains. For example, M^{−1} can be the additive Schwarz preconditioner. Moreover, Z describes a restriction operator, while Z^T is the prolongation or interpolation operator based on the subdomains. In this case, E is called the coarse grid or Galerkin matrix. To speed up the convergence of the additive coarse grid correction method, a coarse grid correction matrix, Q, can be added. Finally, P can be interpreted as a subspace correction, in which each subdomain is agglomerated into a single cell. More details can be found in [32, 36].

Also in the MG approach, Z and Z^T are the restriction and prolongation operators, respectively, where there can be a connection between some subdomains. E and Q are again the coarse grid or Galerkin matrix and the coarse grid correction matrix, respectively, corresponding to the Galerkin approach. The matrix P can be interpreted as a coarse grid correction using an interpolation operator with extreme coarsening, where linear systems with E are usually solved recursively. In the context of MG projection methods, M^{−1} should work as a smoother, followed by a coarse grid correction with P. We refer to [37, 42] for more details.

In deflation methods, M^{−1} can be an arbitrary preconditioner, such as an incomplete Cholesky factorization. Furthermore, the deflation subspace matrix Z consists of so-called deflation vectors, used in the deflation matrix P. In this case, the column space of Z builds the deflation subspace, i.e., the space to be projected out of the residuals. It consists of, for example, eigenvectors, approximations of eigenvectors, or piecewise-constant or linear vectors, which are strongly related to DDM. If one chooses eigenvectors, the corresponding eigenvalues are shifted to zero in the spectrum of the deflated matrix. This fact has motivated the name 'deflation method'. In the literature, it is also known as the spectral preconditioner, see also, e.g., [11]. Usually, systems with E are solved directly, using, e.g., a Cholesky decomposition.

2.2. General Linear System. The general linear system, which will be the basis for the projection methods, is given by

    𝒫𝒜x = 𝒷,    𝒫, 𝒜 ∈ R^{n×n}.    (2.1)

In the standard preconditioned CG method, the unknown x in (2.1) is the solution of the original linear system Ax = b, 𝒜 = A is the SPD coefficient matrix, 𝒫 = M^{−1}_PREC represents a traditional SPD preconditioner, and 𝒷 = M^{−1}_PREC b is the right-hand side. We will call this method 'Traditional Preconditioned CG' (PREC), see also [12, 22]. Next, 𝒜 may also be a combination of A and P, such that 𝒜 is symmetric positive (semi-)definite (SP(S)D), while 𝒫 remains a traditional preconditioner. Note that this does not cause difficulties for CG, since CG can deal with SPSD matrices as long as the linear system is consistent [15]. Furthermore, instead of choosing one traditional preconditioner for 𝒫, we can combine different traditional preconditioners and matrices P and Q in an additive or multiplicative way, as illustrated below.

The additive combination of two SPD preconditioners C_1 and C_2 leads to 𝒫_{a2}, given by

    𝒫_{a2} := C_1 + C_2,    (2.2)

which should be SPD. Of course, the summation of the preconditioners can be done with different weights for C_1 and C_2. Moreover, (2.2) can easily be generalized to 𝒫_{ai} for more SPD preconditioners C_1, C_2, . . . , C_i.

The multiplicative combination of preconditioners can easily be explained by considering the stationary iterative methods induced by the preconditioners. Assuming that C_1 and C_2 are two SPD preconditioners, we can combine x_{i+1/2} := x_i + C_1(b − Ax_i) and x_{i+1} := x_{i+1/2} + C_2(b − Ax_{i+1/2}), to obtain x_{i+1} = x_i + 𝒫_{m2}(b − Ax_i) with

    𝒫_{m2} := C_1 + C_2 − C_2 A C_1,    (2.3)

which is the multiplicative operator consisting of two preconditioners. In addition, C_1 and C_2 can again be combined with another SPD preconditioner C_3 in a multiplicative way, which yields

    𝒫_{m3} = C_1 + C_2 + C_3 − C_2 A C_1 − C_3 A C_2 − C_3 A C_1 + C_3 A C_2 A C_1.    (2.4)

This can also be generalized to 𝒫_{mi}, for C_1, C_2, . . . , C_i.
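As a quick sanity check (our own sketch, not from the paper; the two preconditioner choices are arbitrary), one can verify numerically that the multiplicative operator (2.3) reproduces the two stationary half-steps from which it was derived:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)            # SPD test matrix
b = rng.standard_normal(n)

C1 = np.diag(1.0 / np.diag(A))         # Jacobi-type preconditioner (SPD)
C2 = np.eye(n) / np.linalg.norm(A)     # scaled identity (SPD)

P_a2 = C1 + C2                         # additive combination (2.2)
P_m2 = C1 + C2 - C2 @ A @ C1           # multiplicative combination (2.3)

# One composite stationary step starting from x_i = 0:
x_half = C1 @ b                        # x_{i+1/2} = x_i + C1 (b - A x_i)
x_one = x_half + C2 @ (b - A @ x_half) # x_{i+1}
assert np.allclose(x_one, P_m2 @ b)    # equals x_i + P_m2 (b - A x_i)
```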


2.3. Definition of the Projection Methods. The projection methods which will be compared are given and motivated below.

2.3.1. Additive Method. If one substitutes a traditional preconditioner C_1 = M^{−1} and a coarse grid correction matrix C_2 = Q into the additive combination (2.2), then this implies

    𝒫_AD = M^{−1} + Q.    (2.5)

Using the additive Schwarz preconditioner for M^{−1}, the abstract form (2.5) includes the additive coarse grid correction preconditioner [3], and the resulting method is known as BPS. This operator has been analyzed further in, e.g., [4, 5, 28]. If the multiplicative Schwarz preconditioner is taken for M^{−1}, we obtain the Hybrid-2 preconditioner [36, p. 47]. In the MG language, 𝒫_AD is sometimes called an additive multigrid preconditioner. In this paper, the resulting method, associated with 𝒫_AD, will be called 'Additive Coarse Grid Correction' (AD).

2.3.2. Deflation Methods. The deflation technique has been exploited by several authors [9, 10, 16, 20, 21, 23–25, 27, 30, 39]. Below, we first describe the deflation method following [39], and thereafter [16, 27, 30]. First note that Q = Q^T, (I − P^T)x = Qb and AP^T = PA hold by Lemma 2.2. Then, in order to solve Ax = b, we employ the splitting x = (I − P^T)x + P^T x, where (I − P^T)x = Qb can be computed immediately. For the part P^T x, we solve the deflated system

    PAx̃ = Pb.    (2.6)

Obviously, (2.6) is singular, but it can still be solved by CG, as long as it is consistent, see also [15]. Since matrix A is nonsingular and Ax = b is consistent, this is certainly true for (2.6), where the same projection is applied to both sides of the nonsingular system. If A is singular, this projection can also be applied in many cases, see [33, 34]. Then, because P^T x̃ = P^T x, the unique solution x can be obtained via (2.6), by premultiplying x̃ by P^T and adding Qb, i.e., x = Qb + P^T x̃. Subsequently, the deflated system can also be solved using a preconditioner, M^{−1}, which gives

    M^{−1}PAx̃ = M^{−1}Pb,    (2.7)

see [39] for details. The linear system (2.7) can be written in the form of (2.1), by taking 𝒫 = M^{−1}, 𝒜 = PA and 𝒷 = M^{−1}Pb. Note that this is well-defined, since it can be shown that PA is an SPSD matrix. The resulting method will be called 'Deflation Variant 1' (DEF1).

An alternative way to describe the deflation technique is to start with a random vector, x̄, and to set x_0 := Qb + P^T x̄. Then, it can be shown that the solution of Ax = b can be constructed from the deflated system

    AP^T y = r_0,    r_0 := b − Ax_0,    (2.8)

where it can be proven that y yields a unique solution. Again, the deflated system (2.8) can also be solved with the preconditioner M^{−1}, leading to M^{−1}AP^T y = M^{−1}r_0. After some rewriting, x can be uniquely determined from

    P^T M^{−1}Ax = P^T M^{−1}b,    (2.9)

see, e.g., [16] for more details. The resulting method will be denoted as 'Deflation Variant 2' (DEF2). In contrast to M^{−1}AP^T y = M^{−1}r_0, Eq. (2.9) cannot be written in the form of (2.1) with an SPD operator 𝒫 and an SPSD matrix 𝒜. Fortunately, in Subsection 3.2, it will be shown that (2.9) is identical to a linear system which is in the form of (2.1).

Remark 1. The main difference between DEF1 and DEF2 is their flipped projection operators. In addition, define the 'uniqueness' operation as v = Qb + P^T ṽ, for certain vectors v and ṽ. This operation is carried out at the end of the iteration process in DEF1, so that an arbitrarily chosen starting vector x̄ can be used. By contrast, this operation is applied prior to the iteration process in DEF2, which can be interpreted as adopting a special starting vector.

2.3.3. Adapted Deflation Methods. If one applies C_1 = Q and C_2 = M^{−1} in the multiplicative combination (2.3), then this yields

    𝒫_A-DEF1 = M^{−1}P + Q.    (2.10)

In the MG language, this operator results from the non-symmetric multigrid iteration scheme where one first applies a coarse grid correction, followed by a smoothing step. Note that, although Q and M^{−1} are SPD preconditioners, (2.10) is a non-symmetric operator and, moreover, it is not even symmetric with respect to the inner product induced by A. In addition, 𝒫_A-DEF1 can also be seen as an adapted deflation preconditioner, since M^{−1}P from DEF1 is combined in an additive way with a coarse grid correction Q. Hence, the resulting method, corresponding to 𝒫_A-DEF1, will be denoted as 'Adapted Deflation Variant 1' (A-DEF1).

Subsequently, we can also reverse the order of Q and M^{−1} in (2.3), i.e., we choose C_1 = M^{−1} and C_2 = Q, which implies

    𝒫_A-DEF2 = P^T M^{−1} + Q.    (2.11)

Using the additive Schwarz preconditioner for M^{−1}, 𝒫_A-DEF2 is the two-level Hybrid-II Schwarz preconditioner [32, p. 48]. In the MG methods, 𝒫_A-DEF2 is the (non-symmetric) multigrid preconditioner where M^{−1} is used as a smoother. Similar to A-DEF1, 𝒫_A-DEF2 is non-symmetric. Fortunately, in Subsection 3.2, we will see that A-DEF2 is identical to a method based on a symmetric operator. As in the case of 𝒫_A-DEF1, the operator 𝒫_A-DEF2 can also be seen as an adapted deflation preconditioner, since P^T M^{−1} from DEF2 is combined with Q in an additive way. Therefore, the resulting method will be called 'Adapted Deflation Variant 2' (A-DEF2).

2.3.4. Abstract Balancing Methods. The operators 𝒫_A-DEF1 and 𝒫_A-DEF2 can be symmetrized by using the multiplicative combination of three preconditioners. If one substitutes C_1 = Q, C_2 = M^{−1} and C_3 = Q into (2.4), we obtain

    𝒫_BNN = P^T M^{−1} P + Q.

In the MG philosophy, 𝒫_BNN results from a symmetric multigrid iteration scheme, where one first applies a coarse grid correction, followed by a smoothing step, and ends with another coarse grid correction. However, in MG, the symmetrization is usually done by another smoothing step. Moreover, 𝒫_BNN is a well-known operator in DDM. In combination with the additive Schwarz preconditioner for M^{−1}, and after some scaling and special choices of Z, the operator 𝒫_BNN is known as the Balancing-Neumann-Neumann preconditioner, introduced in [17] and analyzed further in, e.g., [6, 18, 19, 29, 36]. In the abstract form, 𝒫_BNN is called the Hybrid-1 preconditioner [36, p. 34]. Here, we will call it 'Abstract Balancing Neumann-Neumann' (BNN). Moreover, we will also consider two variants of BNN, see below.

In the first variant, we omit Q from 𝒫_BNN, giving us

    𝒫_R-BNN1 = P^T M^{−1} P,

which still remains a symmetric operator. To our knowledge, 𝒫_R-BNN1 is unknown in the literature, so this is the first time its properties are analyzed. The corresponding method is called 'Reduced Balancing Neumann-Neumann Variant 1' (R-BNN1). Next, in the second variant of BNN, we omit both P and Q from 𝒫_BNN, resulting in

    𝒫_R-BNN2 = P^T M^{−1},    (2.12)


and this method will be denoted as 'Reduced Balancing Neumann-Neumann Variant 2' (R-BNN2). Notice that the operators of R-BNN2 and DEF2 are equal, i.e., 𝒫_DEF2 = 𝒫_R-BNN2 = P^T M^{−1}; only the implementations appear to differ, see Subsection 2.4.1. In fact, the implementation of DEF2 is equal to the approach applied in, e.g., [30], where the deflation method has been derived by combining a deflated Lanczos procedure with the standard CG algorithm. On the other hand, R-BNN2 is the approach where deflation has been incorporated into the CG algorithm in a direct way [16], but it is also the approach where a hybrid variant has been employed in DDM [36]. Finally, as mentioned earlier, P^T M^{−1} is a non-symmetric preconditioner, but it will be shown in Subsection 3.2 that both 𝒫_R-BNN1 and 𝒫_R-BNN2 are identical to 𝒫_BNN for certain starting vectors. Hence, we classify these methods as variants of the original abstract balancing method, rather than as variants of deflation methods.

2.4. Aspects of Projection Methods. A list of the methods which will be compared is given in Table 2.1. More details about the methods can be found in the references given in the last column of this table. Subsequently, the implementation and the computational cost of these methods are considered in this subsection.

    Name     Method                           Operator            References
    PREC     Traditional Preconditioned CG    M^{−1}              [12, 22]
    AD       Additive Coarse Grid Correction  M^{−1} + Q          [3, 32, 36]
    DEF1     Deflation Variant 1              M^{−1} P            [39]
    DEF2     Deflation Variant 2              P^T M^{−1}          [16, 27, 30]
    A-DEF1   Adapted Deflation Variant 1      M^{−1} P + Q        [32]
    A-DEF2   Adapted Deflation Variant 2      P^T M^{−1} + Q      [32]
    BNN      Abstract Balancing               P^T M^{−1} P + Q    [17]
    R-BNN1   Reduced Balancing Variant 1      P^T M^{−1} P        –
    R-BNN2   Reduced Balancing Variant 2      P^T M^{−1}          [17, 36]

Table 2.1: List of the methods which will be compared. The operator of each method can be interpreted as the preconditioner 𝒫, given in (2.1) with 𝒜 = A. If possible, references to the methods and their implementations are given in the last column.

2.4.1. Implementation Issues. The general implementation of any method given in Table 2.1 can be found in Algorithm 2.4.1. For each method, the corresponding matrices M_i and vectors Vstart and Vend can be found in Table 2.2. For more details, we refer to [35].

Algorithm 2.4.1: General implementation for solving Ax = b.
 1: Select a random x̄, and Vstart, M_1, M_2, M_3, Vend from Table 2.2
 2: x_0 := Vstart, r_0 := b − Ax_0
 3: z_0 := M_1 r_0, p_0 := M_2 z_0
 4: for j := 0, 1, . . . , until convergence do
 5:   w_j := M_3 A p_j
 6:   α_j := (r_j, z_j) / (p_j, w_j)
 7:   x_{j+1} := x_j + α_j p_j
 8:   r_{j+1} := r_j − α_j w_j
 9:   z_{j+1} := M_1 r_{j+1}
10:   β_j := (r_{j+1}, z_{j+1}) / (r_j, z_j)
11:   p_{j+1} := M_2 z_{j+1} + β_j p_j
12: end for
13: x_it := Vend

From Algorithm 2.4.1 and Table 2.2, it can be observed that one or more preconditioning and projection operations are carried out in the steps where the M_i with i = 1, 2, 3 are involved. For most projectors, these steps are combined to obtain the preconditioned/projected residuals, z_{j+1}. DEF2 is the only method where a projection step is applied to the search directions, p_{j+1}, whereas DEF1 is the only method where the projection is performed to create w_j. Moreover, notice that we use the same random starting vector x̄ in each method, but the actual starting vector, Vstart, differs for each method. Finally, it can also be noticed that the ending vector, Vend, is the same for all methods except DEF1.
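The following sketch (ours, not from the paper; dense NumPy matrices are used for clarity, and only a subset of the rows of Table 2.2 is filled in) transcribes Algorithm 2.4.1 directly; swapping the configuration tuple switches between the methods:

```python
import numpy as np

def projected_pcg(A, b, M_inv, Z, method="DEF1", tol=1e-8, maxit=250):
    """Sketch of Algorithm 2.4.1 for some of the methods in Table 2.2."""
    n = A.shape[0]
    I = np.eye(n)
    E = Z.T @ A @ Z
    Q = Z @ np.linalg.solve(E, Z.T)      # Q = Z E^{-1} Z^T
    P = I - A @ Q                        # P = I - AQ

    xbar = np.random.default_rng(2).standard_normal(n)
    # (Vstart, M1, M2, M3) per Table 2.2; Vend is handled after the loop
    cfg = {
        "PREC": (xbar,               M_inv,               I,   I),
        "AD":   (xbar,               M_inv + Q,           I,   I),
        "DEF1": (xbar,               M_inv,               I,   P),
        "DEF2": (Q @ b + P.T @ xbar, M_inv,               P.T, I),
        "BNN":  (xbar,               P.T @ M_inv @ P + Q, I,   I),
    }
    x, M1, M2, M3 = cfg[method]

    r = b - A @ x
    z = M1 @ r
    p = M2 @ z
    z_ref = np.linalg.norm(z)            # reference norm for the stopping test
    rz = r @ z
    for _ in range(maxit):
        w = M3 @ (A @ p)
        alpha = rz / (p @ w)
        x = x + alpha * p
        r = r - alpha * w
        z = M1 @ r
        rz_new = r @ z
        beta = rz_new / rz
        p = M2 @ z + beta * p
        rz = rz_new
        if np.linalg.norm(z) / z_ref < tol:
            break
    if method == "DEF1":
        x = Q @ b + P.T @ x              # Vend: the 'uniqueness' step of DEF1
    return x
```

In a practical code, P and Q would of course be applied via Z, AZ and a factorization of E, rather than stored as dense n × n matrices.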

    Method   Vstart         M_1                 M_2   M_3   Vend
    PREC     x̄              M^{−1}              I     I     x_{j+1}
    AD       x̄              M^{−1} + Q          I     I     x_{j+1}
    DEF1     x̄              M^{−1}              I     P     Qb + P^T x_{j+1}
    DEF2     Qb + P^T x̄     M^{−1}              P^T   I     x_{j+1}
    A-DEF1   x̄              M^{−1} P + Q        I     I     x_{j+1}
    A-DEF2   Qb + P^T x̄     P^T M^{−1} + Q      I     I     x_{j+1}
    BNN      x̄              P^T M^{−1} P + Q    I     I     x_{j+1}
    R-BNN1   Qb + P^T x̄     P^T M^{−1} P        I     I     x_{j+1}
    R-BNN2   Qb + P^T x̄     P^T M^{−1}          I     I     x_{j+1}

Table 2.2: Choices of Vstart, M_1, M_2, M_3 and Vend for each method, as used in Algorithm 2.4.1.

Next, recall that 𝒫, as given in (2.1), should be SPD to guarantee convergence of CG. This is obviously the case for PREC, AD, DEF1 and BNN. As mentioned earlier, it can be shown that DEF2, A-DEF2, R-BNN1 and R-BNN2 also give appropriate operators, where it turns out that Vstart = Qb + P^T x̄ plays an important role in the derivation. A-DEF1 is the only method which does not have an SPD operator and which can also not be decomposed or transformed into an SPD operator 𝒫. Nonetheless, we will see that it works fine in several test cases, see Section 4.

2.4.2. Computational Cost. The computational cost of each method depends not only on the choices of M^{−1} and Z, but also on the implementation and the storage of the matrices. It is easy to see that, for each iteration, PREC requires 1 matrix-vector multiplication (MVM), 4 inner products (IP), 3 vector updates (VU) and 1 preconditioning step. Next, note that AZ and E should be computed and stored beforehand, so that only one MVM with A is required in each iteration of the projection methods. Moreover, we distinguish two cases regarding Z and AZ:
• Z is sparse, and both Z and AZ can be stored in a vector;
• Z is full, and therefore Z and AZ are full matrices.
For each projection method, the extra computational cost per iteration with respect to PREC is given in Table 2.3. In the table, the number of operations Py and Qy, for a given vector y, per iteration is also provided. Note that, if both Py and Qy have to be computed for the same vector y, as in A-DEF1 and BNN, then Qy can be determined efficiently, since it only requires one IP if Z is sparse, or one MVM if Z is full. From Table 2.3, it can be seen that AD is obviously the cheapest method, while BNN and R-BNN1 are the most expensive projection methods, since two operations with P and P^T are involved. Finally, observe that using a projection method is only efficient if Z is sparse, or if the number of deflation vectors is relatively small in the case of a full matrix Z.

3. Theoretical Comparison. In this section, a comparison of the eigenvalue distributions of the operators will be made, and thereafter some relations between the abstract balancing method and the other projectors will be derived.

3.1. Spectral Analysis of the Methods. It is well known that the eigenvalue distribution of the system corresponding to PREC is always worse than those of the projection methods. For example, in [41], we have shown that

    κ̃(M^{−1}PA) ≤ κ(M^{−1}A),

for any SPD matrices A and M^{−1}, and any Z with full rank. This means that the effective condition number, κ̃, of DEF1 is always below the condition number, κ, of PREC.

    Method   Py, P^T y   Qy   IP / MVM   VU   CCS
    AD       0           1    2          0    1
    DEF1     1           0    2          1    1
    DEF2     1           0    2          1    1
    A-DEF1   1           1    3          1    1
    A-DEF2   1           1    4          1    2
    BNN      2           1    5          2    2
    R-BNN1   2           0    4          2    2
    R-BNN2   1           0    2          1    1

Table 2.3: Extra computational cost per iteration of the projection methods compared to PREC. IP = inner products, MVM = matrix-vector multiplications, VU = vector updates and CCS = coarse system solves. Note that the IP count applies for sparse Z and the MVM count for full Z.

It appears that the κ associated with PREC is always larger than the κ̃ associated with the other projection methods. Therefore, we restrict ourselves to the projection methods in the analysis below.

In [24, 25], it has been shown that the effective condition number of DEF1 is below the condition number of both AD and BNN, i.e.,

    κ̃(M^{−1}PA) ≤ κ(M^{−1}A + QA),
    κ̃(M^{−1}PA) ≤ κ(P^T M^{−1}PA + QA),

for all full-rank Z and SPD matrices A and M^{−1}. It is important to notice that such an inequality between AD and BNN does not hold. One would expect the condition number associated with BNN to be below the one associated with AD, but this is not the case, see [26] for a counterexample.

Beside the comparisons of AD, DEF1 and BNN, carried out in [24–26], more relations between the eigenvalue distributions of these and other projection methods are given next. We show that DEF1, DEF2, R-BNN1 and R-BNN2 have identical spectra, and that the same holds for BNN, A-DEF1 and A-DEF2, see Theorem 3.1. Here, σ(C) = {λ_1, λ_2, . . . , λ_n} denotes the spectrum of an arbitrary matrix C with eigenvalues λ_i.

Theorem 3.1. The following two statements hold:
• σ(M^{−1}PA) = σ(P^T M^{−1}A) = σ(P^T M^{−1}PA);
• σ((P^T M^{−1}P + Q)A) = σ((M^{−1}P + Q)A) = σ((P^T M^{−1} + Q)A).

Proof. Note first that σ(CD) = σ(DC), σ(C + I) = σ(C) + σ(I) and σ(C) = σ(C^T) hold for arbitrary matrices C, D ∈ R^{n×n}, see also [35, Lemma 3.1]. Using these facts and Lemma 2.2, we obtain immediately

    σ(M^{−1}PA) = σ(AM^{−1}P) = σ(P^T M^{−1}A),

and

    σ(M^{−1}PA) = σ(M^{−1}P^2 A) = σ(M^{−1}PAP^T) = σ(P^T M^{−1}PA),

which proves the first statement. Moreover, we have

    σ(P^T M^{−1}PA + QA) = σ(P^T M^{−1}PA − P^T + I)
                         = σ((M^{−1}PA − I)P^T) + σ(I)
                         = σ(M^{−1}P^2 A − P^T) + σ(I)
                         = σ(M^{−1}PA + QA),

and, likewise,

    σ(P^T M^{−1}A + QA) = σ(P^T M^{−1}A − P^T) + σ(I)
                        = σ(AM^{−1}P − P) + σ(I)
                        = σ(PAM^{−1}P − P) + σ(I)
                        = σ(P^T M^{−1}AP^T − P^T) + σ(I)
                        = σ(P^T M^{−1}PA + QA),

which completes the proof of the second statement.

As a consequence of Theorem 3.1, DEF1, DEF2, R-BNN1 and R-BNN2 can be seen as one class of projectors, whereas BNN, A-DEF1 and A-DEF2 form another class of projectors. These two classes can be connected by Theorem 2.8 of [25], which states that if σ(M^{−1}PA) = {0, . . . , 0, µ_{k+1}, . . . , µ_n} is given, then σ(P^T M^{−1}PA + QA) = {1, . . . , 1, µ_{k+1}, . . . , µ_n}. It appears that the reverse statement also holds. For completeness, these results are given in Theorem 3.2.

Theorem 3.2. Let the spectra of DEF1 and BNN be given by

    σ(M^{−1}PA) = {λ_1, . . . , λ_n},    σ(P^T M^{−1}PA + QA) = {µ_1, . . . , µ_n},

respectively. Then, the numbering of the eigenvalues within these spectra can be chosen such that the following statements hold:
• λ_i = 0 and µ_i = 1 for i = 1, . . . , k;
• λ_i = µ_i for i = k + 1, . . . , n.

Proof. By Lemma 2.2, (P^T M^{−1}P + Q)AZ = Z and M^{−1}PAZ = 0 hold. As a consequence, the columns of Z are eigenvectors corresponding to the eigenvalues of BNN and DEF1 which are equal to 1 and 0, respectively. Next, due to Theorem 2.8 of [25], it suffices to show that if σ(P^T M^{−1}PA + QA) = {1, . . . , 1, µ_{k+1}, . . . , µ_n} holds, then this implies σ(M^{−1}PA) = {0, . . . , 0, µ_{k+1}, . . . , µ_n}. The proof is as follows. Consider the eigenvalues µ_i and corresponding eigenvectors v_i, with i = k + 1, . . . , n, of BNN, i.e., (P^T M^{−1}P + Q)Av_i = µ_i v_i, which implies

    P^T (P^T M^{−1}P + Q)Av_i = µ_i P^T v_i.    (3.1)

Applying Lemma 2.2, we have (P^T)^2 M^{−1}PA + P^T QA = P^T M^{−1}PAP^T. Using the latter expression, Eq. (3.1) can be rewritten as P^T M^{−1}PAw_i = µ_i w_i, with w_i = P^T v_i. Note that P^T x = 0 if x ∈ Col(Z), due to Lemma 2.2. However, w_i ≠ 0, since v_i ∉ Col(Z) for i = k + 1, . . . , n. Hence, µ_i is also an eigenvalue of P^T M^{−1}PA. Theorem 3.1 gives σ(M^{−1}PA) = σ(P^T M^{−1}PA), so that µ_i is also an eigenvalue of DEF1.

Due to Theorem 3.2, DEF1 and BNN provide almost the same spectra with the same clustering. The zero eigenvalues of DEF1 are replaced by eigenvalues equal to one if BNN is used. Next, Theorem 3.3 connects all methods in terms of spectra.

Theorem 3.3. Suppose the spectrum of DEF1, DEF2, R-BNN1 or R-BNN2 is given by {0, . . . , 0, λ_{k+1}, . . . , λ_n}. Moreover, let the spectrum of BNN, A-DEF1 or A-DEF2 be {1, . . . , 1, µ_{k+1}, . . . , µ_n}. Then, if the λ_i and µ_i in these spectra are sorted increasingly, λ_i = µ_i for all i = k + 1, . . . , n.

Proof. The theorem follows immediately from Theorems 3.1 and 3.2.

From Theorem 3.3, it can be concluded that all projectors have almost the same clusters of eigenvalues.
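These spectral relations are easy to check numerically. The sketch below (our own illustration, not part of the paper) verifies Theorem 3.2 as a multiset identity: the spectrum of BNN equals the spectrum of DEF1 with the k zeros traded for k ones.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 40, 4
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)              # SPD coefficient matrix
M_inv = np.diag(1.0 / np.diag(A))        # an SPD preconditioner
Z = rng.standard_normal((n, k))          # full-rank deflation subspace

E = Z.T @ A @ Z
Q = Z @ np.linalg.solve(E, Z.T)
P = np.eye(n) - A @ Q

s_def1 = np.linalg.eigvals(M_inv @ P @ A).real              # DEF1 spectrum
s_bnn = np.linalg.eigvals((P.T @ M_inv @ P + Q) @ A).real   # BNN spectrum

# DEF1 has k zero eigenvalues; BNN replaces them by k ones (Theorem 3.2)
assert np.allclose(np.sort(np.abs(s_def1))[:k], 0, atol=1e-8)
lhs = np.sort(np.concatenate([s_def1, np.ones(k)]))
rhs = np.sort(np.concatenate([s_bnn, np.zeros(k)]))
assert np.allclose(lhs, rhs, atol=1e-8)
```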

3.2. Comparison of the Abstract Balancing and Other Projection Methods. In this subsection, it will be shown that DEF2, A-DEF2, R-BNN1 and R-BNN2 are identical methods in exact arithmetic. More importantly, we will prove that these projectors are equal to the expensive projector BNN for certain starting vectors. First, Lemma 3.4 shows that some steps in the BNN implementation can be reduced, see also [36, Sect. 2.5.2].

Lemma 3.4. Suppose that Vstart = Qb + P^T x̄ instead of Vstart = x̄ is used in BNN. Then, this implies
• Qr_{j+1} = 0;
• Pr_{j+1} = r_{j+1},
for all j = 0, 1, 2, . . ., in the BNN implementation of Algorithm 2.4.1.

Proof. Both statements can be proven by induction. For the first statement, the proof is as follows. It can be verified that Qr_1 = 0 and QAp_1 = 0. By the inductive hypothesis, Qr_j = 0 and QAp_j = 0 hold. Then, for the inductive step, we obtain Qr_{j+1} = 0 and QAp_{j+1} = 0, since

    Qr_{j+1} = Qr_j − α_j QAp_j = 0,

and

    QAp_{j+1} = QAz_{j+1} + β_j QAp_j
              = QAP^T M^{−1}P r_{j+1} + QAQ r_{j+1}
              = 0,

where we have used Lemma 2.2. Next, for the second statement, Pr_1 = r_1 and PAp_1 = Ap_1 can easily be shown. Assume that Pr_j = r_j and PAp_j = Ap_j hold. Then, both Pr_{j+1} = r_{j+1} and PAp_{j+1} = Ap_{j+1} hold, because

    Pr_{j+1} = Pr_j − α_j PAp_j = r_j − α_j Ap_j = r_{j+1},

and

    PAp_{j+1} = PAz_{j+1} + β_j PAp_j
              = PAP^T M^{−1}P r_{j+1} + β_j Ap_j
              = AP^T M^{−1} r_{j+1} + β_j Ap_j
              = AP^T M^{−1}P r_{j+1} + β_j Ap_j
              = A(z_{j+1} + β_j p_j)
              = Ap_{j+1},

where we have applied the result of the first statement. This concludes the proof.

As a result of Lemma 3.4, it can be concluded that, if Vstart = Qb + P^T x̄ is used, BNN is completely identical to R-BNN1, R-BNN2 and A-DEF2 in exact arithmetic. Moreover, it is also mathematically identical to DEF2, since the operator P^T M^{−1} is the same in R-BNN2 and DEF2. Thus, besides the fact that DEF2, A-DEF2, R-BNN1 and R-BNN2 are identical methods, they are also identical to BNN in some cases. These results are summarized in Theorem 3.5.

Theorem 3.5. BNN with Vstart = Qb + P^T x̄ is identical to DEF2, A-DEF2, R-BNN1 and R-BNN2, in exact arithmetic.

As a consequence of Theorem 3.5, the corresponding operators are all appropriate, so that CG in combination with them works fine. Moreover, note that Lemma 3.4 may not be fully satisfied in practice, due to rounding errors. Therefore, although BNN, DEF2, A-DEF2, R-BNN1 and R-BNN2 are mathematically identical, all involved projectors except BNN may lead to inaccurate solutions and may suffer from instabilities in numerical experiments. In these cases, the omitted projection and correction steps of the BNN algorithm appear to be important.

Next, we provide a more detailed comparison between BNN and DEF1, in terms of the exact errors in the A-norm, see Theorem 3.6. In fact, it is a generalization of Theorems 3.4 and 3.5 of [25], where we now apply random starting vectors, x̄, instead of zero starting vectors.


Theorem 3.6. Let (x_{j+1})_DEF1 and (x_{j+1})_BNN denote the iterates x_{j+1} of DEF1 and BNN, respectively. Then, they satisfy
• ||x − (x_{j+1})_DEF1||_A ≤ ||x − (x_{j+1})_BNN||_A;
• (x_{j+1})_DEF1 = (x_{j+1})_BNN, if Vstart = Qb + P^T x̄ instead of Vstart = x̄ is used in BNN.

Proof. The proof is essentially the same as given in [25, Th. 3.4 and 3.5].

From Theorem 3.6, we conclude that the errors of the iterates built by DEF1 are never larger than those of BNN. Even stronger, DEF1 and BNN produce the same iterates in exact arithmetic if Vstart = Qb + P^T x̄ is used in BNN.

4. Numerical Comparison. In this section, a numerical comparison of the projectors will be given. We restrict ourselves to one specific test problem, which is a 2-D porous media flow problem. Other 2-D test problems, with the Laplace equation and bubbly flows, have been considered in [35]. Since those results appear to be comparable, it suffices to present only the results of one test problem.

We consider the Poisson equation with a discontinuous coefficient,

    −∇ · ( (1/ρ(x)) ∇p(x) ) = 0,    x = (x, y) ∈ Ω = (0, 1)^2,    (4.1)

where ρ and p denote the piecewise-constant density and the fluid pressure, respectively. The contrast, ε = 10^{−6}, which is the jump between the high and low densities, is fixed. In addition, we impose a Dirichlet condition on the boundary y = 1 and homogeneous Neumann conditions on the other boundaries. The geometry of the problem, which consists of layers, is given in Figure 4.1. The layers are denoted by the disjoint sets Ω_j, j = 1, 2, . . . , k, such that Ω = ∪_{j=1}^{k} Ω_j. The discretized domain and layers are denoted by Ω_h and Ω_{hj}, respectively. Then, for each Ω_{hj} with j = 1, 2, . . . , k, each deflation vector, z_j, is defined as follows:

    (z_j)_i := { 0, x_i ∈ Ω_h \ Ω_{hj};
                 1, x_i ∈ Ω_{hj}, }

where x_i is a grid point of Ω_h. Then, we define Z := [z_1 z_2 · · · z_k], consisting of orthogonal, disjoint, piecewise-constant vectors. Other, more sophisticated settings of the deflation vectors can also be used, see [35, Sect. 4.1].

[Fig. 4.1. Geometry of the porous media problem with k = 5 layers having a fixed density ρ; from top to bottom: shale Ω_1 (permeability 10^0), sandstone Ω_2 (10^{−6}), shale Ω_3 (10^0), sandstone Ω_4 (10^{−6}), shale Ω_5 (10^0). The number of deflation vectors and layers is equal.]
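As an aside (our own sketch, not from the paper; the lexicographic node ordering and the equal-height bands are assumptions), the layer-wise deflation vectors can be set up as follows for an n_x × n_y grid:

```python
import numpy as np

def layer_deflation_vectors(nx, ny, k):
    """Piecewise-constant deflation vectors: (z_j)_i = 1 if grid point i lies
    in layer Omega_hj, and 0 otherwise; nodes are ordered lexicographically."""
    n = nx * ny
    Z = np.zeros((n, k))
    row = np.repeat(np.arange(ny), nx)   # y-index of each grid point
    layer = (row * k) // ny              # split the rows into k horizontal bands
    Z[np.arange(n), layer] = 1.0
    return Z

Z = layer_deflation_vectors(29, 29, 5)
assert np.allclose(Z.sum(axis=1), 1.0)   # layers are disjoint and cover the grid
assert np.linalg.matrix_rank(Z) == 5     # Z has full rank
```

Because the columns have disjoint supports, Z is extremely sparse, which keeps the cost of the operations with P and Q in Table 2.3 low.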

A standard second-order finite-difference scheme is used to discretize (4.1), resulting in our main linear system, Ax = b, with A ∈ R^{n×n}. Moreover, we choose as preconditioner, M^{−1}, the incomplete Cholesky decomposition [22], but any other SPD preconditioner could be used as well.


We start with a numerical experiment using standard parameters, which means that an appropriate termination criterion, an exact computation of E^{−1} and exact starting vectors are used. Subsequently, numerical experiments will be performed with an inexact E^{−1}, severe termination tolerances and perturbed starting vectors, respectively. The results for each method will be presented by giving the number of iterations and the 2-norms of the exact errors (i.e., ||x_it − x||_2) in a table, and by plotting the exact errors in the A-norm during the iteration process (i.e., ||x_j − x||_A, with x_j denoting the j-th iterate) in a figure. Figures with the residuals and the exact errors in the 2-norm (i.e., ||x_j − x||_2) during the iteration process are omitted, because they do not provide extra relevant information in our test cases. Finally, for all methods, we terminate the iterative process if the norm of the relative preconditioned/projected residual, ||z_{j+1}||_2 / ||z_1||_2, is below a tolerance δ, or if the maximum allowed number of iterations, equal to 250, is reached.

4.1. Experiment using Standard Parameters. In the first numerical experiment, standard parameters are used, with stopping tolerance δ = 10^{−8}, an exact coarse matrix E^{−1} and exact starting vectors Vstart. The results are presented in Table 4.1 and Figure 4.2. Note that we only give figures for a few test cases, because they all show similar behavior. Moreover, for the sake of a better view, the results for PREC are omitted in these figures.

              n = 29^2, k = 5        n = 54^2, k = 5        n = 41^2, k = 7        n = 55^2, k = 7
    Method    # It.  ||x_it − x||_2  # It.  ||x_it − x||_2  # It.  ||x_it − x||_2  # It.  ||x_it − x||_2
    PREC      101    2.2 × 10^{−7}   135    3.9 × 10^{−7}   188    2.3 × 10^{−7}   236    6.3 × 10^{−7}
    AD        60     5.0 × 10^{−7}   74     1.7 × 10^{−6}   76     1.7 × 10^{−6}   92     3.2 × 10^{−7}
    DEF1      53     2.2 × 10^{−7}   68     5.9 × 10^{−7}   68     3.0 × 10^{−7}   85     4.1 × 10^{−7}
    DEF2      53     2.2 × 10^{−7}   68     5.9 × 10^{−7}   68     2.9 × 10^{−7}   85     4.3 × 10^{−7}
    A-DEF1    60     4.6 × 10^{−7}   76     7.0 × 10^{−7}   69     2.5 × 10^{−6}   86     8.8 × 10^{−7}
    A-DEF2    53     2.5 × 10^{−7}   69     2.6 × 10^{−7}   68     2.8 × 10^{−7}   85     4.5 × 10^{−7}
    BNN       53     2.4 × 10^{−7}   69     2.6 × 10^{−7}   68     2.8 × 10^{−7}   85     4.3 × 10^{−7}
    R-BNN1    53     2.2 × 10^{−7}   69     2.6 × 10^{−7}   68     2.8 × 10^{−7}   85     4.2 × 10^{−7}
    R-BNN2    53     2.2 × 10^{−7}   69     2.8 × 10^{−7}   68     2.9 × 10^{−7}   85     4.3 × 10^{−7}

Table 4.1: Number of required iterations for convergence and the 2-norm of the exact error, for all proposed methods, for the test problem with 'standard' parameters.

Considering both Table 4.1 and Figure 4.2, we notice that all methods need more iterations to converge as the number of grid points, n, or the number of layers, k, increases. It can also be seen that all methods except PREC, AD and A-DEF1 perform more or less the same, which confirms the theory (cf. Theorems 3.1 and 3.3). PREC is obviously the slowest method, followed by AD. Moreover, the differences between AD and the other projectors are relatively small, although the plots show an erratic behavior for AD. Additionally, A-DEF1 is somewhat slower in convergence and gives less accurate solutions than the other methods, except for PREC and AD.

4.2. Experiment using Inaccurate Coarse Solves. In practice, it may be difficult to find an accurate solution of the coarse system, Ey = z, at each iteration of a projection method. Therefore, only an approximate solution ỹ may be available, if one uses, for instance, an iterative solver with a low accuracy. In this case, ỹ can be interpreted as Ẽ^{−1}z, where Ẽ is an inexact matrix based on E. This motivates our next experiment, where we use Ẽ^{−1} defined as

    Ẽ^{−1} := (I + ψR) E^{−1} (I + ψR),    ψ > 0,    (4.2)

where R ∈ R^{k×k} is a symmetric random matrix with elements from the interval [−0.5, 0.5], see also [24, Sect. 3] for more details.
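In code, the perturbation (4.2) amounts to the following (our sketch; the symmetrization step is one way to draw R as described above):

```python
import numpy as np

def perturbed_coarse_inverse(E, psi, seed=4):
    """Return E~^{-1} = (I + psi R) E^{-1} (I + psi R), cf. (4.2), with a
    symmetric random R whose entries lie in [-0.5, 0.5]."""
    k = E.shape[0]
    rng = np.random.default_rng(seed)
    R = rng.uniform(-0.5, 0.5, size=(k, k))
    R = (R + R.T) / 2                    # symmetrize; entries stay in [-0.5, 0.5]
    I = np.eye(k)
    return (I + psi * R) @ np.linalg.inv(E) @ (I + psi * R)
```

Note that Ẽ^{−1} remains symmetric for any ψ; for small ψ it stays close to E^{−1}, but the exact identities of Lemma 2.2 between P, Q and A are lost.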

[Fig. 4.2. Exact errors in the A-norm for the test problem with 'standard' parameters. Panels: (a) n = 29^2, k = 5; (b) n = 41^2, k = 7. Curves shown for AD, DEF1, DEF2, A-DEF1, A-DEF2, BNN, R-BNN1 and R-BNN2.]

Note that the theoretical results, as derived in Subsection 3.2, no longer hold. The sensitivity of the projectors to this inaccurate coarse matrix, for various values of ψ, will be investigated. Note that the results for PREC are not influenced by this adaptation of E^{−1}; they are only included for reference. The results can be found in Table 4.2 and Figure 4.3.

By studying Table 4.2 and Figure 4.3, we see that the most stable projectors are AD, BNN and A-DEF2. Furthermore, notice that A-DEF1, R-BNN1 and R-BNN2 converge for ψ ≤ 10^{−8}, but sometimes to an incorrect solution. In addition, DEF1 and DEF2 are obviously the worst methods, which is as expected, since the zero eigenvalues of the deflated system become small non-zero eigenvalues due to the perturbation.

              ψ = 10^{−16}           ψ = 10^{−12}           ψ = 10^{−8}            ψ = 10^{−4}
    Method    # It.  ||x_it − x||_2  # It.  ||x_it − x||_2  # It.  ||x_it − x||_2  # It.  ||x_it − x||_2
    PREC      101    2.2 × 10^{−7}   101    2.2 × 10^{−7}   101    2.2 × 10^{−7}   101    2.2 × 10^{−7}
    AD        60     4.9 × 10^{−7}   60     5.0 × 10^{−7}   60     4.5 × 10^{−7}   53     1.9 × 10^{−5}
    DEF1      53     2.2 × 10^{−7}   NC     2.3 × 10^{−2}   234    2.1 × 10^{−6}   91     2.1 × 10^{−1}
    DEF2      53     2.2 × 10^{−7}   NC     2.8 × 10^{+4}   NC     3.7 × 10^{+4}   NC     3.5 × 10^{+5}
    A-DEF1    60     7.3 × 10^{−7}   60     7.0 × 10^{−7}   60     7.9 × 10^{−7}   77     1.4 × 10^{−5}
    A-DEF2    53     2.6 × 10^{−7}   53     2.4 × 10^{−7}   53     2.5 × 10^{−7}   57     5.4 × 10^{−6}
    BNN       53     2.5 × 10^{−7}   53     2.3 × 10^{−7}   53     2.4 × 10^{−7}   49     1.6 × 10^{−5}
    R-BNN1    53     2.4 × 10^{−7}   53     1.9 × 10^{−5}   53     1.9 × 10^{−1}   78     1.1 × 10^{−3}
    R-BNN2    53     2.2 × 10^{−7}   53     5.7 × 10^{−5}   54     5.7 × 10^{−1}   NC     4.1 × 10^{+3}

Table 4.2: Number of required iterations for convergence and the 2-norm of the exact error of all methods, for the test problem with parameters n = 29^2, k = 5 and Ẽ^{−1}. ('NC' denotes no convergence within the maximum of 250 iterations.)

It can be observed that the errors diverge in all test cases of DEF2, whereas they remain bounded in the case of DEF1.

4.3. Experiment using Severe Termination Tolerances. We perform a numerical experiment using various values for the tolerance δ. Note that a relatively small δ may lead to a termination criterion that is too severe with respect to the machine precision. However, the goal of this experiment is to test the sensitivity of the projectors to δ, rather than to perform realistic experiments. In Table 4.3 and Figure 4.4, the results are provided.

              δ = 10^{−8}            δ = 10^{−12}           δ = 10^{−16}
    Method    # It.  ||x_it − x||_2  # It.  ||x_it − x||_2  # It.  ||x_it − x||_2
    PREC      101    2.2 × 10^{−7}   131    3.8 × 10^{−8}   201    3.6 × 10^{−8}
    AD        60     5.0 × 10^{−7}   76     2.0 × 10^{−8}   87     2.0 × 10^{−8}
    DEF1      53     2.2 × 10^{−7}   69     6.3 × 10^{−8}   NC     1.4 × 10^{−6}
    DEF2      53     2.2 × 10^{−7}   69     7.7 × 10^{−8}   NC     2.3 × 10^{+5}
    A-DEF1    60     4.6 × 10^{−7}   97     2.5 × 10^{−8}   136    2.5 × 10^{−8}
    A-DEF2    53     2.5 × 10^{−7}   69     7.0 × 10^{−9}   81     7.0 × 10^{−9}
    BNN       53     2.4 × 10^{−7}   69     6.0 × 10^{−9}   81     6.1 × 10^{−9}
    R-BNN1    53     2.2 × 10^{−7}   69     2.9 × 10^{−8}   81     2.9 × 10^{−8}
    R-BNN2    53     2.2 × 10^{−7}   69     7.7 × 10^{−8}   NC     7.7 × 10^{−8}

Table 4.3: Number of required iterations for convergence and the 2-norm of the exact error of all methods, for the test problem with parameters n = 29^2, k = 5 and various termination tolerances.

By considering Table 4.3 and Figure 4.4, it can be seen that all methods except A-DEF1 perform well for δ ≥ 10^{−12}. However, for δ ≤ 10^{−16}, A-DEF1, R-BNN2, DEF1 and DEF2 show difficulties: they do not converge appropriately, or even diverge. This is in contrast to PREC, AD, A-DEF2, BNN and R-BNN1, which give good convergence results in all test cases. Therefore, these projectors can be characterized as stable methods with respect to severe termination tolerances.

4.4. Experiment using Perturbed Starting Vectors. In Subsection 3.2, it has been proven that BNN with Vstart = Qb + P^T x̄ is equal to DEF2, A-DEF2, R-BNN1 and R-BNN2, in exact arithmetic. In this case, the resulting operators are well-defined and should perform appropriately. In our next experiment, we perturb Vstart in DEF2, A-DEF2, R-BNN1 and R-BNN2, and examine whether this influences the results. Note that, in this case, there are no equivalences between BNN and these methods.

[Fig. 4.3. Exact errors in the A-norm for the test problem with n = 29^2, k = 5 and Ẽ^{−1}. Panels: (a) ψ = 10^{−12}; (b) ψ = 10^{−8}. Curves shown for AD, DEF1, DEF2, A-DEF1, A-DEF2, BNN, R-BNN1 and R-BNN2.]

The perturbed Vstart, denoted by Wstart, is defined by

    Wstart := (1 + γ y_0) Vstart,    γ ≥ 0,

where y_0 is a random vector with elements from the interval [−0.5, 0.5]. Note that, if DEF2, R-BNN1 or R-BNN2 converges with Wstart, then we may obtain a non-unique solution, since the corresponding operator is singular. Therefore, as in the case of DEF1, we should apply the 'uniqueness' step, as mentioned in Remark 1, at the end of the iteration process. Note that this procedure is not required for A-DEF2, because this method corresponds to a non-singular operator. Now, we perform the numerical experiment using Wstart for different γ.

[Fig. 4.4. Exact errors in the A-norm for the test problem with n = 29^2, k = 5 and various termination tolerances. Panels: (a) δ = 10^{−12}; (b) δ = 10^{−16}. Curves shown for AD, DEF1, DEF2, A-DEF1, A-DEF2, BNN, R-BNN1 and R-BNN2.]

The results can be found in Table 4.4 and Figure 4.5. Here, we use asterisks to stress that extra uniqueness steps are applied for some methods. Moreover, notice that PREC, AD, DEF1 and BNN are not included in this experiment, since they apply the random vector Vstart = x̄ by definition.

From the results, it can be noticed that all involved methods converge appropriately for γ ≤ 10^{−8}. For γ ≥ 10^{−6}, DEF2 and R-BNN2 fail to converge, or converge to the wrong solution, even after the uniqueness step. The most stable methods are obviously A-DEF2 and R-BNN1. This experiment has clearly shown that the 'reduced' variants of BNN have different stability properties with respect to perturbations in the starting vectors.

              γ = 10^{−8}            γ = 10^{−6}              γ = 10^{0}
    Method    # It.  ||x_it − x||_2  # It.  ||x_it − x||_2    # It.  ||x_it − x||_2
    DEF2      53     2.2 × 10^{−7}   NC     1.2 × 10^{+12}    NC     4.0 × 10^{+18}
    A-DEF2    53     2.5 × 10^{−7}   53     2.1 × 10^{−7}     54     2.5 × 10^{−7}
    R-BNN1    53     2.5 × 10^{−7}   53     2.3 × 10^{−7} *   53     2.7 × 10^{−7} *
    R-BNN2    53     2.5 × 10^{−7}   53     1.6 × 10^{−5} *   NC     1.6 × 10^{+1}

Table 4.4: Number of required iterations for convergence and the 2-norm of the exact error of some methods, for the test problem with parameters n = 29^2, k = 5 and perturbed starting vectors. An asterisk (*) means that an extra uniqueness step is applied in that test case.

[Fig. 4.5. Exact errors in the A-norm for the test problem with n = 29^2, k = 5 and perturbed starting vectors. Panels: (a) γ = 10^{−6}; (b) γ = 10^{0}. Curves shown for DEF2, A-DEF2, R-BNN1 and R-BNN2.]


4.5. Summary and Discussion. From a numerical point of view, we have observed that the theoretical results only hold for standard parameters, i.e., with exact computations of the coarse matrices E^{−1}, appropriate choices for the stopping tolerance and, for certain projectors, no perturbations in the starting vectors. In these cases, the numerical results confirm the theoretical fact that all projection methods perform approximately the same, although A-DEF1 showed problems in some test cases. This can be understood from the fact that A-DEF1 corresponds to a non-SPSD operator, see also Subsection 2.4.1.

If the dimensions of the coarse matrix, E, become large, it is favorable to solve the corresponding systems iteratively with a low accuracy. In this case, we have seen that DEF1, DEF2, R-BNN1 and R-BNN2 showed difficulties to converge. It could be observed that the errors during the iterative process of DEF2 exploded, whereas DEF1 converged slowly to the solution. The most robust methods turn out to be AD, BNN, A-DEF1 and A-DEF2.

If matrix A is ill-conditioned and the tolerance of the termination criterion, chosen by the user, becomes too severe, it would be advantageous if the projection method still worked appropriately. However, we have observed that DEF1, DEF2, R-BNN2 and A-DEF1 cannot deal with too strict tolerances. This is in contrast to AD, BNN, A-DEF2 and R-BNN1, which remain stable in all test cases.

In theory, BNN is identical to DEF2, A-DEF2, R-BNN1 and R-BNN2 for certain starting vectors. Beside the fact that these 'reduced' variants, except A-DEF2, were not able to deal with inaccurate coarse solves, some of them were also sensitive to perturbations of the starting vector. Both DEF2 and R-BNN2 were unstable, whereas A-DEF2 and R-BNN1 appeared to be insensitive to these perturbations. This can be of importance if one uses multigrid-like subdomains, where the number of subdomains, k, is very large, and the starting vector cannot be obtained accurately.

In the numerical experiments, we have observed that several methods showed divergence, stagnation or erratic behavior of the errors during the iterative process. This may be caused by the fact that the residuals gradually lose orthogonality with respect to the columns of Z, see also [30]. It can easily be shown that

    Z^T r_j = 0,    j = 1, 2, . . . ,    (4.3)

should hold for DEF1, DEF2, A-DEF2, R-BNN1 and R-BNN2. However, it appeared that (4.3) was not always satisfied in the experiments. A remedy to recover this orthogonality in the badly converging methods is described in, e.g., [30]. If we define the 'reorthogonalization' matrix W as

    W := I − Z(Z^T Z)^{−1} Z^T,    (4.4)

then W is orthogonal to Z, i.e.,

    Z^T W = Z^T − Z^T Z(Z^T Z)^{−1} Z^T = 0.    (4.5)

Now, the orthogonality of the residuals, r_j, can be preserved by premultiplying r_j by W right after r_j is computed in the algorithm:

    r_j := W r_j,    j = 1, 2, . . . .    (4.6)

As a consequence, these adapted residuals satisfy (4.3), due to (4.5).¹

¹ Note that (4.3) is not valid for AD, A-DEF1 and BNN. In the case of AD and BNN, this is not a problem, because they appear extremely stable in most test cases. This is in contrast to A-DEF1, which is unstable in several test cases. The instability of this projector cannot be resolved using the reorthogonalization strategy. Moreover, note that the reorthogonalization operation (4.6) is relatively cheap, provided that Z is sparse.
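In code, the reorthogonalization step is a one-liner (our sketch; in practice one would factor Z^T Z once, or exploit the orthogonality of disjoint piecewise-constant deflation vectors, for which Z^T Z is diagonal):

```python
import numpy as np

def reorthogonalize(r, Z):
    """Apply W = I - Z (Z^T Z)^{-1} Z^T to a residual r, cf. (4.4) and (4.6).
    The result satisfies Z^T (W r) = 0, i.e., relation (4.3) is restored."""
    return r - Z @ np.linalg.solve(Z.T @ Z, Z.T @ r)
```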


In the numerical experiments of [35, Sect. 4.6], we have shown that the adapted versions of the methods, including the reorthogonalization strategy, converge better in terms of the residuals. Unfortunately, it appeared that accurate solutions could not be obtained. To preserve the relation r_j = b − Ax_j, each iterate x_j should be adapted via

    x_j := x_j − A^{−1} Z(Z^T Z)^{−1} Z^T r_j,    j = 1, 2, . . . .    (4.7)

However, it is clear that (4.7) is not useful in practice. Hence, the reorthogonalization strategy does not seem to be an appropriate remedy for badly converging projection methods, and it should be avoided.

5. Conclusions. In this paper, several abstract projectors, listed in Table 2.1, have been considered, coming from deflation, domain decomposition and multigrid. A comparison of these projectors has been carried out by investigating their theoretical and numerical aspects.

Theoretically, DEF1 appeared to be the best method [24–26]. We have seen that all projection methods, except for PREC and AD, have comparable eigenvalue distributions. Two classes of projectors could be formed having the same spectral properties. The first class consists of DEF1, DEF2, R-BNN1 and R-BNN2, and the second class includes BNN, A-DEF1 and A-DEF2. It has been shown that the spectrum associated with the methods of the first class is more favorable than that of the second class. In numerical experiments, DEF1 works fine if one applies a realistic stopping criterion and relatively small perturbations of E^{−1}. If one has to choose between DEF1 and DEF2, we prefer DEF1, since the corresponding errors remain bounded, in contrast to those of DEF2.

Moreover, it has been shown that, for certain starting vectors, the expensive operator of BNN can be reduced to the simpler and cheaper operators which are used in DEF2, A-DEF2, R-BNN1 and R-BNN2. Hence, some projectors of the two spectral classes are the same in exact arithmetic. However, these reduced variants, except for A-DEF2, appear to be unstable in the numerical experiments when applying inaccurate coarse solves, strict stopping tolerances or perturbed starting vectors. In fact, one should realize that the reduced variants of BNN, except A-DEF2, are as unstable as DEF1 or DEF2.

Finally, by examining all theoretical and numerical aspects, we conclude that BNN and A-DEF2 are the best, i.e., most robust, projectors. However, two deflation matrices are involved in BNN, making the method expensive to use. By contrast, only one deflation matrix is involved in A-DEF2, so that it is attractive to apply. Hence, A-DEF2 seems to be the best and most stable method, considering the theory, the numerical experiments and the computational cost.

REFERENCES

[1] S.F. Ashby, T.A. Manteuffel and P.E. Saylor, A taxonomy for conjugate gradient methods, SIAM J. Numer. Anal., 27 (1990), pp. 1542–1568.
[2] R. Blaheta, Multilevel iterative methods and deflation, Proceedings of ECCOMAS CFD 2006, Egmond aan Zee, The Netherlands, September 5-8, 2006.
[3] J.H. Bramble, J.E. Pasciak and A.H. Schatz, The construction of preconditioners for elliptic problems by substructuring, I, Math. Comp., 47 (1986), pp. 103–134.
[4] M. Dryja, An additive Schwarz algorithm for two- and three-dimensional finite element elliptic problems, Domain Decomposition Methods, SIAM, Philadelphia, pp. 168–172, 1989.
[5] M. Dryja and O.B. Widlund, Towards a unified theory of domain decomposition algorithms for elliptic problems, Third International Symposium on DDM for PDEs, SIAM, Philadelphia, pp. 3–21, 1990.
[6] M. Dryja and O.B. Widlund, Schwarz methods of Neumann-Neumann type for three-dimensional elliptic finite element problems, Comm. Pure Appl. Math., 48 (1995), pp. 121–155.

[7] J. Erhel and F. Guyomarc'h, An augmented subspace conjugate gradient, Report RR-3278, INRIA, Rocquencourt, France, 1997.
[8] M. Eiermann, O.G. Ernst and O. Schneider, Analysis of acceleration strategies for restarted minimal residual methods, J. Comput. Appl. Math., 123 (2000), pp. 261–292.
[9] J. Frank and C. Vuik, On the construction of deflation-based preconditioners, SIAM J. Sci. Comput., 23 (2001), pp. 442–462.
[10] H. De Gersem and K. Hameyer, A deflated iterative solver for magnetostatic finite element models with large differences in permeability, Eur. Phys. J. Appl. Phys., 13 (2000), pp. 45–49.
[11] L. Giraud and S. Gratton, On the sensitivity of some spectral preconditioners, SIAM J. Matrix Anal. Appl., 27 (2006), pp. 1089–1105.
[12] G.H. Golub and C.F. Van Loan, Matrix Computations, Third Edition, The Johns Hopkins University Press, Baltimore, 1996.
[13] I.G. Graham and R. Scheichl, Robust domain decomposition algorithms for multiscale PDEs, BICS Preprint 14/06, University of Bath, 2006.
[14] M.R. Hestenes and E. Stiefel, Methods of conjugate gradients for solving linear systems, J. Res. Nat. Bur. Stand., 49 (1952), pp. 409–436.
[15] E.F. Kaasschieter, Preconditioned conjugate gradients for solving singular systems, J. Comp. Appl. Math., 24 (1988), pp. 265–275.
[16] L.Y. Kolotilina, Preconditioning of systems of linear algebraic equations by means of twofold deflation, I. Theory, J. Math. Sci., 89 (1998), pp. 1652–1689.
[17] J. Mandel, Balancing domain decomposition, Commun. Appl. Numer. Meth., 9 (1993), pp. 233–241.
[18] J. Mandel, Hybrid domain decomposition with unstructured subdomains, Proceedings of the Sixth International Symposium on DDM, Como, Italy, 1992; AMS, Providence; Contemp. Math., 157 (1994), pp. 103–112.
[19] J. Mandel and M. Brezina, Balancing domain decomposition for problems with large jumps in coefficients, Math. Comp., 216 (1996), pp. 1387–1401.
[20] L. Mansfield, On the conjugate gradient solution of the Schur complement system obtained from domain decomposition, SIAM J. Numer. Anal., 27 (1990), pp. 1612–1620.
[21] L. Mansfield, Damped Jacobi preconditioning and coarse grid deflation for conjugate gradient iteration on parallel computers, SIAM J. Sci. Stat. Comput., 12 (1991), pp. 1314–1323.
[22] J.A. Meijerink and H.A. Van der Vorst, An iterative solution method for linear systems of which the coefficient matrix is a symmetric M-matrix, Math. Comp., 31 (1977), pp. 148–162.
[23] R.B. Morgan, A restarted GMRES method augmented with eigenvectors, SIAM J. Matrix Anal. Appl., 16 (1995), pp. 1154–1171.
[24] R. Nabben and C. Vuik, A comparison of deflation and coarse grid correction applied to porous media flow, SIAM J. Numer. Anal., 42 (2004), pp. 1631–1647.
[25] R. Nabben and C. Vuik, A comparison of deflation and the balancing preconditioner, SIAM J. Sci. Comput., 27 (2006), pp. 1742–1759.
[26] R. Nabben and C. Vuik, A comparison of abstract versions of deflation, balancing and additive coarse grid correction preconditioners, 2007, submitted. See also: Report 07-01, Delft University of Technology, Delft Institute of Applied Mathematics, ISSN 1389-6520, 2007.
[27] R.A. Nicolaides, Deflation of conjugate gradients with applications to boundary value problems, SIAM J. Numer. Anal., 24 (1987), pp. 355–365.
[28] A. Padiy, O. Axelsson and B. Polman, Generalized augmented matrix preconditioning approach and its application to iterative solution of ill-conditioned algebraic systems, SIAM J. Matrix Anal. Appl., 22 (2000), pp. 793–818.
[29] L.F. Pavarino and O.B. Widlund, Balancing Neumann-Neumann methods for incompressible Stokes equations, Comm. Pure Appl. Math., 55 (2002), pp. 302–335.
[30] Y. Saad, M. Yeung, J. Erhel and F. Guyomarc'h, A deflated version of the conjugate gradient algorithm, SIAM J. Sci. Comput., 21 (2000), pp. 1909–1926.
[31] V. Simoncini and D.B. Szyld, Recent computational developments in Krylov subspace methods for linear systems, Num. Lin. Alg. Appl., 2007, to appear. See also: Research Report 05-9-25, Department of Mathematics, Temple University, September 2005 (revised May 2006).
[32] B. Smith, P. Bjørstad and W. Gropp, Domain Decomposition, Cambridge University Press, Cambridge, 1996.
[33] J.M. Tang and C. Vuik, On deflation and singular symmetric positive semi-definite matrices, J. Comp. Appl. Math., 2007, to appear.
[34] J.M. Tang and C. Vuik, An efficient deflation method applied to 3-D bubbly flow problems, 2007, submitted. See also: Report 06-01, Delft University of Technology, Delft Institute of Applied Mathematics, ISSN 1389-6520, 2006.
[35] J.M. Tang, R. Nabben, C. Vuik and Y.A. Erlangga, Theoretical and numerical comparison of various projection methods derived from deflation, domain decomposition and multigrid methods, Delft University of Technology, Department of Applied Mathematical Analysis, Report 07-04, ISSN 1389-6520, 2007.

[36] A. Toselli and O.B. Widlund, Domain Decomposition: Algorithms and Theory, Comp. Math., 34, Springer, Berlin, 2005.
[37] U. Trottenberg, C.W. Oosterlee and A. Schüller, Multigrid, Academic Press, London, 2001.
[38] F. Vermolen, C. Vuik and A. Segal, Deflation in preconditioned conjugate gradient methods for finite element problems, in: Conjugate Gradient and Finite Element Methods, Springer, Berlin, pp. 103–129, 2004.
[39] C. Vuik, A. Segal and J.A. Meijerink, An efficient preconditioned CG method for the solution of a class of layered problems with extreme contrasts in the coefficients, J. Comp. Phys., 152 (1999), pp. 385–403.
[40] C. Vuik and J. Frank, Coarse grid acceleration of a parallel block preconditioner, Fut. Gen. Comp. Syst., 17 (2001), pp. 933–940.
[41] C. Vuik, R. Nabben and J.M. Tang, Deflation acceleration for domain decomposition preconditioners, Proceedings of the 8th European Multigrid Conference on Multigrid, Multilevel and Multiscale Methods, The Hague, The Netherlands, September 27-30, 2005.
[42] P. Wesseling, An Introduction to Multigrid Methods, John Wiley & Sons, 1992. Corrected reprint: R.T. Edwards, Philadelphia, 2004.
[43] J. Xu, Iterative methods by space decomposition and subspace correction, SIAM Review, 34 (1992), pp. 581–613.
