Geometric Aspects in the Theory of Krylov Subspace Methods

Michael Eiermann† and Oliver G. Ernst‡

March 8, 1999

Abstract

The recent development of Krylov subspace methods for the solution of operator equations has shown that two basic construction principles, the orthogonal residual (OR) and minimal residual (MR) approaches, underlie the most commonly used algorithms. It is shown that these can both be formulated as techniques for solving an approximation problem on a sequence of nested subspaces of a Hilbert space, a problem not necessarily related to an operator equation. Most of the familiar Krylov subspace algorithms result when these subspaces form a Krylov sequence. The well-known relations among the iterates and residuals of OR/MR pairs are shown to hold also in this rather general setting. We further show that a common error analysis for these methods involving the canonical angles between subspaces allows many of the recently developed error bounds to be derived in a simple manner. An application of this analysis to compact perturbations of the identity shows that OR/MR pairs of Krylov subspace methods converge q-superlinearly when applied to such operator equations. Key words. Krylov subspace methods, projection methods, subspace correction methods, best approximation, Galerkin approximation, minimal residual methods, Lanczos process, Arnoldi process, error bounds, residual bounds, compact operators, angles between subspaces

AMS Subject Classifications. 65F10, 65-02, 39B42

* Part of this work was completed while the first author was supported by the Laboratoire d'Analyse Numérique et d'Optimisation of the Université des Sciences et Technologies de Lille and while the second author was supported by NSF grant No. ASC9704683 and by the University of Maryland Institute for Advanced Computer Studies.
† Institut für Angewandte Mathematik II, Technische Universität Bergakademie Freiberg, D-09596 Freiberg, Germany; e-mail: [email protected]
‡ Institut für Angewandte Mathematik II, Technische Universität Bergakademie Freiberg, D-09596 Freiberg, Germany; e-mail: [email protected]


1 Introduction

In the literature of the last two decades on Krylov subspace techniques for solving linear systems of equations, two principal methods have emerged as the basis for most algorithms: the minimal residual (MR) and the orthogonal residual (OR) approaches. While both methods select an approximation to the solution of the linear system from a shifted Krylov space, the former does so in such a way that the resulting residual norm is minimized, whereas the latter chooses the approximation such that the associated residual is orthogonal to the Krylov space. Examples of such OR/MR pairs are Conjugate Gradients (CG)/Conjugate Residuals (CR) (Hestenes and Stiefel [25]) for Hermitian positive definite systems, CG/Minimal Residual Method (MINRES) (Paige and Saunders [32]) for Hermitian indefinite systems, Full Orthogonalization Method (FOM)/Generalized Minimal Residual Method (GMRES) (Saad and Schultz [37, 38]) for the general non-Hermitian case, as well as Biconjugate Gradient Method (BCG)/Quasi-Minimal Residual Method (QMR) (Lanczos [29], Freund and Nachtigal [14]) and Conjugate Gradients Squared (CGS)/Transpose-Free QMR (TFQMR) (Sonneveld [39], Freund [11]), where the last two pairs of methods require defining orthogonality with respect to a problem-dependent inner product. In this paper, we cast these two methods of solving linear systems as two abstract approximation problems on a sequence of nested subspaces of a Hilbert space, problems not necessarily related to an operator equation. We show that all of the well-known relations between the OR and MR approximations and their residuals in the context of Krylov subspace methods also hold in this abstract setting. In the usual implementations, the OR/MR approximations are computed by solving a small linear system or least-squares problem, respectively, involving a tridiagonal or Hessenberg matrix at each step.
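The Hessenberg least-squares solve referred to here can be sketched in a few lines; the following is our own illustration (NumPy, dense storage, hypothetical helper `givens_qr_hessenberg`), not an algorithm from the paper:

```python
import numpy as np

def givens_qr_hessenberg(H):
    """QR factorization of an (m+1) x m upper Hessenberg matrix using one
    Givens rotation per column; returns R and the (c, s) rotation pairs."""
    R = H.astype(float).copy()
    _, m = R.shape
    rot = []
    for k in range(m):
        a, b = R[k, k], R[k + 1, k]
        rho = np.hypot(a, b)
        c, s = a / rho, b / rho
        G = np.array([[c, s], [-s, c]])
        R[k:k + 2, k:] = G @ R[k:k + 2, k:]   # zero out the subdiagonal entry
        rot.append((c, s))
    return R, rot

# Example: random 5 x 4 upper Hessenberg matrix
rng = np.random.default_rng(6)
H = np.triu(rng.standard_normal((5, 4)), -1)
R, rot = givens_qr_hessenberg(H)
assert np.allclose(np.tril(R, -1), 0)         # R is upper triangular
```

Since each column of a Hessenberg matrix has only one subdiagonal entry, a single rotation per step suffices; the stored `(c, s)` pairs are the sines and cosines whose intrinsic meaning the paper develops.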
This is done by maintaining a QR factorization of this matrix, which is updated at each step using Givens rotations. Due to the Hessenberg structure of this matrix, only one Givens rotation is required at each step, and the sines and cosines of these rotations can be used to characterize the OR and MR residual norms and also to express the iterates and residuals in terms of each other. In our abstract setting, these relations are derived using only the notion of angles between the spaces defining the orthogonal and oblique projections. The sines and cosines occurring in these expressions are therefore not mere artifacts of the computational scheme for the MR and OR approximations, but have an intrinsic meaning. It is this intrinsic relationship between the orthogonal and oblique projections that is at the root of the often observed close relationship between OR and MR approximations. We further discuss the relation between MR and quasi-minimal residual (QMR) approximations. The latter is also obtained by solving a least-squares problem in coordinate space, the only difference from the MR approximation being that these coordinates are with respect to a non-orthogonal basis. We show that QMR approximations become MR approximations if the inner product is chosen appropriately. Again, the QMR approximation can be defined in

our abstract setting and then has a structure identical to that of the MR approximation. This distinction between MR and QMR approximations via different inner products also extends to OR and quasi-orthogonal residual (QOR) approximations; in fact, we show that essentially any Krylov subspace method for solving a linear system can be classified as either an MR or an OR method by an appropriate choice of the inner product. This paper, although it also contains some new results, has the character of a survey. Our objective is to present many seemingly separate results on Krylov subspace methods within a common abstract framework: In Section 2 we cast the MR and OR approximations to the solution of an operator equation as abstract approximation problems or, equivalently, as an orthogonal and an oblique projection method, respectively. In contrast to the orthogonal projection, there are situations in which the oblique projection, and hence the OR approximation, may not exist. A useful characterization of the oblique projection as the Moore-Penrose inverse of the product of two orthogonal projections leads to a canonical way of defining an OR approximation in case of such a breakdown. We characterize the conditions under which the oblique projection exists and relate the norm of the oblique projection to the angle between two subspaces. Section 3 specializes the spaces characterizing the projections to two sequences of closely related nested spaces. It is shown that, under these general conditions, all the well-known relations among OR/MR pairs such as FOM/GMRES can be derived. These results also elucidate the theory behind residual smoothing techniques. Section 4 explores the coordinate calculations required to compute the MR and OR approximations with respect to orthogonal and non-orthogonal bases of the underlying spaces. These involve solving least-squares problems and linear systems with a Hessenberg matrix.
When a QR factorization based on Givens rotations is used to solve these problems, the angles characterizing the Givens rotations are identified as the angles between these subspaces. Moreover, it is shown that methods based on non-orthogonal bases can be characterized as MR/OR methods with respect to a different, basis-dependent inner product. We further characterize the QMR and QOR approximations as oblique projections with respect to the original inner product and conclude with a characterization of when the QMR approximation coincides with the MR approximation. Section 5 specializes the spaces yet further to Krylov subspaces and their images under the operator A. We recover familiar Krylov subspace algorithms and show that our framework covers all Krylov subspace methods. In Section 6 we show how angles between subspaces may be used to derive essentially all known residual and error bounds for MR and OR methods, and we include an application to compact perturbations of the identity. Final remarks and conclusions are given in Section 7.


2 Two Abstract Approximation Problems and Their Relation to Angles Between Subspaces

In the sequel, $H$ denotes a Hilbert space with inner product $(\cdot,\cdot)$ and induced norm $\|\cdot\|$. By $A : H \to H$ we always denote an invertible bounded linear operator. We begin by posing the MR and OR approximation problems for the operator equation

$$ Ax = b \tag{1} $$

with respect to a pair of subspaces. Given an arbitrary finite-dimensional subspace¹ $V \subset H$, not necessarily a Krylov space, and an initial approximation $x_0 \in H$ to the solution $x$ of (1), we define the MR approximation $x^{MR}$ of $x$ with respect to $x_0$ and $V$ by $x^{MR} := x_0 + v^{MR}$, with the correction $v^{MR} \in V$ chosen to minimize the residual norm

$$ \|r^{MR}\| = \|b - Ax^{MR}\| = \|r_0 - Av^{MR}\| $$

among all $v \in V$. Here and in what follows, $r_0 := b - Ax_0$ always denotes the residual of the initial guess $x_0$. Thus, if the space $W := AV$ denotes the image of $V$ under $A$, the minimal residual correction $v^{MR}$ results from requiring that $Av^{MR}$ be the best approximation to $r_0$ from $W$. The unique solution of this best approximation problem is given by $P_W r_0$, where $P_W : H \to W$ denotes the orthogonal projection onto $W$. The corresponding approximation error $r^{MR} := r_0 - Av^{MR}$ is characterized by

$$ r^{MR} \perp W. \tag{2} $$

The uniqueness of $v^{MR}$ and $x^{MR}$ follows from the injectivity of $A$. We define the OR approximation $x^{OR} := x_0 + v^{OR}$, again as an element of $x_0 + V$, where now, in place of (2), the associated residual $r^{OR} := b - Ax^{OR} = r_0 - Av^{OR}$ is required to satisfy

$$ r^{OR} \perp V. \tag{3} $$

In contrast to the MR approximation, there are situations in which the OR approximation may not exist or may not be uniquely determined; these are discussed below. The OR approximation thus also results from the problem of approximating $r_0$ in $W$, but with the approximation characterized by (3) rather than (2). In summary, we see that $x^{MR} = x_0 + v^{MR}$ and $x^{OR} = x_0 + v^{OR}$ simply result from different ways of requiring the images of the corresponding corrections, $Av^{MR}$ or $Av^{OR}$, to approximate $r_0$ from $W$. Moreover, these approximation

¹ All assertions of this section can be proven for arbitrary closed subspaces, but since only those of finite dimension are relevant for numerical computations, we restrict our attention to these.


problems, along with the two different characterizations (2) and (3) of their solutions, can be formulated entirely without reference to the operator equation (1). Given an arbitrary finite-dimensional subspace $W \subset H$ and an element $r \in H$, we define its MR approximation $h^{MR}$ as the best approximation of $r$ from $W$ and denote by $d^{MR}$ the associated error:

$$ h^{MR} := P_W r, \qquad d^{MR} := r - h^{MR} = (I - P_W)r \perp W. $$

The distance between $r$ and its best approximation $P_W r$ in $W$ can be described in terms of angles between vectors and subspaces of $H$. The angle $\angle(x, y)$ between two nonzero elements $x, y \in H$ is defined by the relation

$$ \cos \angle(x, y) := \frac{|(x, y)|}{\|x\| \, \|y\|}, $$

which, in view of the Cauchy-Schwarz inequality, uniquely determines the number $\angle(x, y) \in [0, \pi/2]$. We note here that the natural definition of the angle would replace the modulus in the numerator by the real part; our definition, however, is more appropriate for comparing subspaces (cf. also the discussion of this issue in Davis and Kahan [8, p. 9]). Similarly, we define the angle between a nonzero vector $x \in H$ and a subspace $U \subset H$, $U \neq \{0\}$, as

$$ \angle(x, U) := \inf_{0 \neq u \in U} \angle(x, u), \quad \text{i.e.,} \quad \cos \angle(x, U) = \sup_{0 \neq u \in U} \cos \angle(x, u). $$

Further, we define the sine of this angle as

$$ \sin \angle(x, U) := \sqrt{1 - \cos^2 \angle(x, U)}. $$

The connection between angles and orthogonal projections is given in the following lemma, a proof of which can be found, e.g., in Wedin [46].

Lemma 2.1. Let $U$ be a finite-dimensional subspace of $H$ and denote by $P_U$ the orthogonal projection onto $U$. For each $x \in H$ there holds

$$ \angle(x, U) = \angle(x, P_U x) \tag{4} $$

and, as a consequence,

$$ \|P_U x\| = \|x\| \cos \angle(x, U), \tag{5} $$
$$ \|(I - P_U)x\| = \|x\| \sin \angle(x, U). \tag{6} $$

In light of this result, the distance between $r$ and its MR approximation $h^{MR}$ may be expressed as

$$ \|d^{MR}\| = \|r - h^{MR}\| = \|(I - P_W)r\| = \|r\| \sin \angle(r, W). \tag{7} $$

In order to define the OR approximation in this abstract setting, we require a further finite-dimensional subspace $V \subset H$ to formulate the orthogonality constraint. The OR approximation $h^{OR} \in W$ of $r$ is then defined by the requirement

$$ h^{OR} \in W, \qquad r - h^{OR} \perp V. $$

Of course, since choosing $V = W$ yields the MR approximation, the latter is just a special case of the OR approximation. We choose nonetheless to distinguish the two, both for historical reasons and for ease of exposition. Existence and uniqueness of $h^{OR}$ are summarized in

Proposition 2.2. If $V, W$ are subspaces of the Hilbert space $H$ and $r \in H$, then there holds:
1. There exists $h \in W$ such that $r - h \perp V$ if and only if $r \in W + V^\perp$.
2. Such an $h$ is unique if and only if $W \cap V^\perp = \{0\}$.

Thus, a unique OR approximation is defined whenever $H = W \oplus V^\perp$. In this case $h^{OR}$ is the oblique projection of $r$ onto $W$ orthogonal to $V$ (or, equivalently, along $V^\perp$), which we denote by $P_W^V : H \to W$, and $h^{OR}$ is characterized by

$$ h^{OR} = P_W^V r, \qquad d^{OR} = r - h^{OR} = (I - P_W^V)r \perp V. $$

When it exists, the oblique projection $P_W^V$ is given by the Moore-Penrose inverse $(P_V P_W)^+$ of $P_V P_W$ (cf. Wedin [46]). This is established in the following two lemmas, the first of which describes the mapping properties of $(P_V P_W)^+$.

Lemma 2.3. Given two finite-dimensional subspaces $V$ and $W$ of the Hilbert space $H$, let $S := (P_V P_W)^+$ denote the Moore-Penrose inverse of the product of the orthogonal projections onto $W$ and $V$. Then $S$ is a projection and there holds

$$ \mathrm{range}(S) = W \cap (V + W^\perp), \tag{8} $$
$$ \mathrm{null}(S) = V^\perp + (W^\perp \cap V). \tag{9} $$

Proof. The proof of the projection property $S^2 = S$ follows along the lines of Greville [19, Theorem 1]. First, since $P := P_V P_W$ has finite rank, its Moore-Penrose pseudoinverse $P^+ = S$ exists and satisfies $\mathrm{range}(P^+) = \mathrm{range}(P^*)$, from which it follows that

$$ \mathrm{range}(S^*) \subset \mathrm{range}(P_V), \qquad \mathrm{range}(S) \subset \mathrm{range}(P_W), $$

and hence, by the idempotency of $P_V$ and $P_W$, there holds

$$ P_W S = S, \qquad S P_V = S, $$

which together imply $S^2 = S P_V P_W S = P^+ P P^+ = P^+ = S$. Since $\mathrm{range}(S) = \mathrm{range}((P_V P_W)^*) = \mathrm{range}(P_W P_V)$, and analogously for the null space, it is sufficient to show (8) and (9) for the operator $P_W P_V$ instead of

$S$. To derive (8), note that $w \in W$ lies in $\mathrm{range}(P_W P_V)$ if and only if there exists $v \in V$ such that $v = w + w^\perp$ for some $w^\perp \in W^\perp$, which in turn is equivalent to $w \in W \cap (V + W^\perp)$. To see (9), note that any $x \in H$ may be written as $x = (w + w^\perp) + v^\perp$ with $w \in W$, $w^\perp \in W^\perp$, $v^\perp \in V^\perp$, and $w + w^\perp \in V$. Thus, $x \in \mathrm{null}(P_W P_V)$ if and only if $w = 0$ or, equivalently, $x \in V^\perp + (W^\perp \cap V)$.

Lemma 2.4. Given two finite-dimensional subspaces $V$ and $W$ of the Hilbert space $H$ such that $H = W \oplus V^\perp$, the oblique projection $P_W^V$ onto $W$ orthogonal to $V$ is given by

$$ P_W^V = (P_V P_W)^+. \tag{10} $$

Proof. By the previous lemma, $S$ is a projection and, in view of $V + W^\perp = (V^\perp \cap W)^\perp = H$, $W^\perp \cap V = (W + V^\perp)^\perp = \{0\}$, and $H = W \oplus V^\perp$, its range is $W$ and its null space is $V^\perp$, which characterizes it as $P_W^V$.

Remark 2.5. We note that, while the left-hand side of (10) exists only under the condition $H = W \oplus V^\perp$, the right-hand side is always defined. Thus, $(P_V P_W)^+ r$ can be viewed as a natural way of defining an OR approximation in those cases where this direct sum condition fails to hold, a situation sometimes referred to, in the context of Krylov subspace methods, as a Galerkin breakdown.

As is to be expected, the error $d^{OR} = r - P_W^V r$ depends on the angles between the subspaces $V$ and $W$, which we introduce as follows (cf. Golub and Van Loan [15]): Given two finite-dimensional subspaces $V$ and $W$ of $H$, let $m := \min(\dim V, \dim W)$. The canonical angles $\{\theta_j\}_{j=1}^m$ between $V$ and $W$ are defined recursively by

$$ \cos \theta_j := \max_{0 \neq v \in V} \max_{0 \neq w \in W} \frac{|(v, w)|}{\|v\| \, \|w\|} =: \frac{|(v_j, w_j)|}{\|v_j\| \, \|w_j\|} $$

subject to $v \perp v_1, \ldots, v_{j-1}$ and $w \perp w_1, \ldots, w_{j-1}$. We further define the angle between the spaces $V$ and $W$ as the largest canonical angle

$$ \angle(V, W) := \theta_m. $$

Remark 2.6. If $P_V P_W = \sum_{j=1}^m \sigma_j (\cdot, w_j) v_j$ is a singular value decomposition of $P_V P_W$, then the variational characterization of the singular values,

$$ \sigma_j(P_V P_W) = \max_{v \in V,\, w \in W} \frac{|(P_V P_W w, v)|}{\|w\| \, \|v\|} =: \frac{|(P_V P_W w_j, v_j)|}{\|w_j\| \, \|v_j\|} $$

subject to $v \perp v_1, \ldots, v_{j-1}$, $w \perp w_1, \ldots, w_{j-1}$ for $j = 1, \ldots, m$ (cf. Björck and Golub [2]), shows immediately that $\cos \theta_j = \sigma_j$. Furthermore, we note that, given any two orthonormal bases $\{v_j\}_{j=1}^{\dim V}$ and $\{w_j\}_{j=1}^{\dim W}$ of $V$ and $W$, the cosines of the canonical angles are the singular values of the matrix of inner products $[(v_j, w_k)]$ (cf. Chatelin [6]).

Remark 2.7. As a consequence of Remark 2.6, we see that $S = (P_V P_W)^+$ can be written as

$$ S = \sum_{j=1}^m \sigma_j^+ (\cdot, v_j) w_j \qquad \text{with} \qquad \sigma_j^+ := \begin{cases} 1/\sigma_j & \text{if } \sigma_j > 0, \\ 0 & \text{otherwise.} \end{cases} $$

In particular, we have

$$ \mathrm{range}(S) = \mathrm{span}\{w_j : \sigma_j > 0\}, \qquad \mathrm{null}(S) = \mathrm{span}\{v_j : \sigma_j > 0\}^\perp. $$

Thus, we have $\mathrm{range}(S) = W$ if and only if $\dim(W) = m$ and $\sigma_j > 0$ for all $j = 1, \ldots, m$. Similarly, $\mathrm{null}(S) = V^\perp$ if and only if $\dim(V) = m$ and $\sigma_j > 0$ for all $j = 1, \ldots, m$. Consequently, the oblique projection $P_W^V$ exists if and only if $\dim(V) = \dim(W)$ and $\angle(V, W) < \pi/2$.

Remark 2.8. For completeness, we note that the sine of the angle between two subspaces $V$ and $W$ is also given by $\sin \angle(V, W) = \|P_V - P_W\|$ (cf. [6]).

Besides the relative position of the spaces $V$ and $W$, the error of the OR approximation also depends on the position of $r$ with respect to $V$ and $W$. In this generality, all we can do to bound the OR approximation error is determine the norm of the complementary projection $I - P_W^V$. For simplicity, since $H = W \oplus V^\perp$ for finite-dimensional $V$ and $W$ implies $\dim V = \dim W$, we assume that both spaces have the same dimension $m < \infty$.

Theorem 2.9. Given two $m$-dimensional subspaces $V, W \subset H$ of the Hilbert space $H$ such that $H = W \oplus V^\perp$, let $P_W^V : H \to W$ denote the oblique projection onto $W$ orthogonal to $V$. Then there holds

$$ \|I - P_W^V\| = \frac{1}{\cos \angle(V, W)}. \tag{11} $$

Proof. The proof follows from the fact that $\|P\| = \|I - P\|$ for any projection operator $P$ in a Hilbert space (cf. Kato [28]). Thus,

$$ \|I - (P_V P_W)^+\| = \|(P_V P_W)^+\| = \sigma_{\max}\big((P_V P_W)^+\big) = \frac{1}{\sigma_{\min}(P_V P_W)} $$

and the conclusion follows from Remark 2.6.

Without further assumptions on $r$ and the spaces $V$ and $W$, all we can say about the error of the OR approximation is

$$ \|d^{OR}\| = \|r - h^{OR}\| = \|(I - P_W^V) r\| \leq \frac{\|r\|}{\cos \angle(V, W)}. $$

As an immediate consequence, noting that

$$ d^{OR} = (I - P_W^V) r = (I - P_W^V)(r - w) \qquad \forall\, w \in W, $$

we obtain

$$ \|d^{OR}\| \leq \|I - P_W^V\| \inf_{w \in W} \|r - w\| = \|I - P_W^V\| \, \|d^{MR}\|, $$

an estimate usually referred to as Céa's lemma, which, in view of (11), implies

$$ \cos \angle(V, W) \, \|d^{OR}\| \leq \|d^{MR}\| \leq \|d^{OR}\|. $$
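Lemma 2.4 and Theorem 2.9 can likewise be checked numerically. In the sketch below (our illustration; NumPy), the oblique projection is formed as the pseudoinverse $(P_V P_W)^+$ and the norm of its complement is compared with $1/\cos \angle(V, W)$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 9, 3

Qv, _ = np.linalg.qr(rng.standard_normal((n, m)))
Qw, _ = np.linalg.qr(rng.standard_normal((n, m)))
Pv, Pw = Qv @ Qv.T, Qw @ Qw.T

# Oblique projector onto W orthogonal to V via Lemma 2.4: P_W^V = (P_V P_W)^+
P_obl = np.linalg.pinv(Pv @ Pw)
assert np.allclose(P_obl @ P_obl, P_obl)      # S is a projection (Lemma 2.3)

# Theorem 2.9: ||I - P_W^V|| = 1 / cos angle(V, W)
I = np.eye(n)
cos_max_angle = np.linalg.svd(Qv.T @ Qw, compute_uv=False).min()
assert np.isclose(np.linalg.norm(I - P_obl, 2), 1.0 / cos_max_angle)
```

For randomly drawn subspaces the largest canonical angle is strictly less than $\pi/2$ with probability one, so the oblique projector exists here.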

3 Projections onto Nested Subspaces

Since until now nothing further was assumed to relate $V$, $W$ and $r$, we were only able to bound the error of the OR approximation in terms of the largest canonical angle between the spaces $V$ and $W$. We will obtain more interesting results in this section by selecting a specific space $V$ which differs only slightly from $W$. This choice, however, is still general enough to contain all Krylov subspace methods. Moreover, we now consider the MR and OR approximations on nested sequences of subspaces, which is the setting in which these approximations are used by practical algorithms. Consider a sequence of nested subspaces

$$ \{0\} = W_0 \subset W_1 \subset \cdots \subset W_{m-1} \subset W_m \subset \cdots \tag{12} $$

of $H$ and assume for simplicity that $\dim W_m = m$. Throughout this section, $\{w_1, \ldots, w_m\}$ will always denote an orthonormal basis of $W_m$ such that $\{w_1, \ldots, w_{m-1}\}$ forms a basis of $W_{m-1}$. In terms of such a basis, the MR approximation, i.e., the best approximation of $r \in H$ from $W_m$, can be expressed as the truncated Fourier expansion

$$ h_m^{MR} = P_{W_m} r = \sum_{j=1}^m (r, w_j) w_j, $$

so that the norm of the associated error $d_m^{MR} = r - h_m^{MR}$ is given by

$$ \|d_m^{MR}\|^2 = \|(I - P_{W_m}) r\|^2 = \|r\|^2 - \sum_{j=1}^m |(r, w_j)|^2. \tag{13} $$

To see how the approximations on successive spaces are related, we note that for every $m \geq 1$ there holds

$$ h_m^{MR} = \sum_{j=1}^m (r, w_j) w_j = h_{m-1}^{MR} + (r, w_m) w_m = h_{m-1}^{MR} + P_{W_m} d_{m-1}^{MR}. $$

It follows that

$$ d_m^{MR} = d_{m-1}^{MR} - P_{W_m} d_{m-1}^{MR} = (I - P_{W_m}) d_{m-1}^{MR} $$

and, since $d_m^{MR} \perp P_{W_m} d_{m-1}^{MR} \in W_m$,

$$ \|d_m^{MR}\|^2 = \|d_{m-1}^{MR}\|^2 - \|P_{W_m} d_{m-1}^{MR}\|^2 = \|d_{m-1}^{MR}\|^2 - |(r, w_m)|^2. \tag{14} $$

Relation (14) shows that no improvement in the MR approximation results whenever the direction in which $W_{m-1}$ is enlarged is orthogonal to $r$. In other words,

$$ \|d_m^{MR}\| < \|d_{m-1}^{MR}\| \quad \text{if and only if} \quad (r, w_m) \neq 0. \tag{15} $$

To express successive approximation errors in terms of angles, we note that $d_m^{MR} = (I - P_{W_m}) d_{m-1}^{MR}$ together with (6) yields

$$ \|d_m^{MR}\| = \|(I - P_{W_m}) d_{m-1}^{MR}\| = s_m \|d_{m-1}^{MR}\|, \tag{16} $$

where $s_m := \sin \angle(d_{m-1}^{MR}, W_m)$. Note that the sine $s_m$ is also given by (cf. (7))

$$ s_m = \frac{\|d_m^{MR}\|}{\|d_{m-1}^{MR}\|} = \frac{\|(I - P_{W_m}) r\|}{\|(I - P_{W_{m-1}}) r\|} = \frac{\sin \angle(r, W_m)}{\sin \angle(r, W_{m-1})}, \tag{17} $$

i.e., $s_m$ is the sine of the angle between the previous approximation error and $W_m$ or, equivalently, the quotient of the sines of the angles between $r$ and the current and previous approximation spaces. In order for the last three terms in (17) to make sense, we assume that $r \notin W_{m-1}$; otherwise, the approximation problem is solved exactly in the space $W_{m-1}$ and the larger spaces no longer contribute toward improving the approximation. In view of (14), the corresponding cosine is given by

$$ c_m := \sqrt{1 - s_m^2} = \sqrt{1 - \frac{\|d_m^{MR}\|^2}{\|d_{m-1}^{MR}\|^2}} = \frac{|(r, w_m)|}{\|d_{m-1}^{MR}\|} \tag{18} $$

and we see that an equivalent way of stating (15) is

$$ \|d_m^{MR}\| < \|d_{m-1}^{MR}\| \quad \text{if and only if} \quad c_m \neq 0. \tag{19} $$

An obvious induction applied to (16) leads to the error formula

$$ \|d_m^{MR}\| = s_1 s_2 \cdots s_m \|r\|, \tag{20} $$

from which we see that the sequence of approximations will converge to $r$ if and only if the product of sines tends to zero. Moreover, if the numbers $s_m$ themselves tend to zero, the convergence of the approximations is superlinear. To define the corresponding OR approximations with respect to the nested sequence $\{W_m\}_{m \geq 0}$, we make a special choice for the sequence $\{V_m\}_{m \geq 1}$ of spaces which define the orthogonality condition: we set

$$ V_m := \mathrm{span}\{r\} + W_{m-1}, \qquad m = 1, 2, \ldots. \tag{21} $$
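Relations (13)-(20) are easy to observe numerically. The following sketch (our illustration; NumPy, a random nested orthonormal basis) verifies the recursion (14) and the product-of-sines formula (20):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 12
r = rng.standard_normal(n)

# Nested spaces W_m spanned by the first m columns of a random orthonormal Q
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))

sines = []
d_prev = r.copy()                       # d_0 = r, since W_0 = {0}
for m in range(1, 6):
    Wm = Q[:, :m]
    d_m = r - Wm @ (Wm.T @ r)           # d_m = (I - P_{W_m}) r
    # (14): ||d_m||^2 = ||d_{m-1}||^2 - |(r, w_m)|^2
    assert np.isclose(d_m @ d_m, d_prev @ d_prev - (r @ Q[:, m - 1])**2)
    sines.append(np.linalg.norm(d_m) / np.linalg.norm(d_prev))  # s_m, cf. (16)
    d_prev = d_m

# (20): ||d_m|| = s_1 s_2 ... s_m ||r||
assert np.isclose(np.linalg.norm(d_prev), np.prod(sines) * np.linalg.norm(r))
```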

With this definition of $V_m$, since the $m$-th OR and MR approximations lie in $W_m$, the corresponding approximation errors lie in $V_{m+1}$. For this reason, we shall refer to $\{V_m\}_{m \geq 1}$ as the sequence of error spaces. We first investigate the question of when the OR approximation is well-defined. In view of Remark 2.7, this amounts to checking whether the largest canonical angle between $V_m$ and $W_m$ is strictly less than $\pi/2$. As a consequence of the special choice (21) of $V_m$, it turns out that this angle is the same as $\angle(d_{m-1}^{MR}, W_m)$:

Theorem 3.1. If the spaces $V_m$ and $W_m$ are related as in (21), then the largest canonical angle between them is given by

$$ \angle(V_m, W_m) = \angle(d_{m-1}^{MR}, W_m). \tag{22} $$

Moreover, the remaining $m - 1$ canonical angles between $V_m$ and $W_m$ are zero.

Proof. Noting that $\{w_1, \ldots, w_{m-1}, \hat w_m\}$ with $\hat w_m = d_{m-1}^{MR} / \|d_{m-1}^{MR}\|$ is an orthonormal basis of $V_m$, we obtain the cosine of the largest canonical angle as the smallest singular value of the matrix

$$ \begin{bmatrix} (w_1, w_1) & \cdots & (w_{m-1}, w_1) & (w_m, w_1) \\ \vdots & & \vdots & \vdots \\ (w_1, w_{m-1}) & \cdots & (w_{m-1}, w_{m-1}) & (w_m, w_{m-1}) \\ (w_1, \hat w_m) & \cdots & (w_{m-1}, \hat w_m) & (w_m, \hat w_m) \end{bmatrix} = \begin{bmatrix} I_{m-1} & 0 \\ 0 & \dfrac{(w_m, r)}{\|d_{m-1}^{MR}\|} \end{bmatrix} $$

(cf. Remark 2.6). We see that the smallest singular value is $|(w_m, r)| / \|d_{m-1}^{MR}\| = c_m$ (cf. (18)) and that all remaining singular values are equal to one.

As an immediate consequence of Theorem 3.1, we obtain the following characterization of when the oblique projection is defined for our special choice of $V_m$:

Corollary 3.2. The OR approximation of an arbitrary $r \in H$ with respect to the sequences of spaces $V_m$ and $W_m$ given by (12) and (21) is uniquely defined if and only if $(r, w_m) \neq 0$, i.e., if and only if the MR approximation improves as $W_{m-1}$ is enlarged to $W_m$.

Thus the degenerate case, in which the OR approximation is not uniquely defined for all $r \in H$ and in which the MR approximation makes no progress, is characterized by $c_m = 0$. We therefore tacitly assume $(r, w_m) \neq 0$ whenever mentioning the OR approximation of index $m$. In many algorithms such as GMRES it is convenient to work also with orthonormal bases $\{v_1, \ldots, v_m\}$ of $V_m$, $m = 1, 2, \ldots$, i.e., $v_m$ is a unit vector in $V_m \cap V_{m-1}^\perp$. The following result relates $\angle(r, w_m)$ with $\angle(v_{m+1}, w_m)$.

Lemma 3.3. If the subspaces $V_m$ and $W_m$ are related by (21) and $\{v_1, \ldots, v_m\}$ and $\{w_1, \ldots, w_m\}$ are orthonormal bases of $V_m$ and $W_m$, respectively, then there holds

$$ |(v_{m+1}, w_m)| = s_m. $$

In the degenerate case $(r, w_m) = 0$, the vectors $v_{m+1}$ and $w_m$ must be collinear.


Proof. From $v_{m+1} \in \mathrm{span}\{r, w_1, \ldots, w_m\} \cap \mathrm{span}\{r, w_1, \ldots, w_{m-1}\}^\perp$ we conclude $v_{m+1} \in \mathrm{span}\{d_m^{MR}, w_m\}$, and the assertion follows from the remaining requirements $\|v_{m+1}\| = 1$ and $v_{m+1} \perp r$: Setting $\hat w_{m+1} := d_m^{MR} / \|d_m^{MR}\|$, we note that $\{\hat w_{m+1}, w_m\}$ forms an orthonormal basis of $\mathrm{span}\{d_m^{MR}, w_m\}$, hence $v_{m+1} = \alpha \hat w_{m+1} + \beta w_m$ for some coefficients $\alpha, \beta \in \mathbb{C}$. Since $(d_m^{MR}, r) = \|d_m^{MR}\|^2$, orthogonality of $v_{m+1}$ and $r$ yields

$$ 0 = (v_{m+1}, r) = \alpha \frac{(d_m^{MR}, r)}{\|d_m^{MR}\|} + \beta (w_m, r) = \alpha \|d_m^{MR}\| + \beta (w_m, r). $$

The requirement that $v_{m+1}$ have unit norm now gives

$$ |\beta|^2 = \left( 1 + \frac{|(w_m, r)|^2}{\|d_m^{MR}\|^2} \right)^{-1} = \frac{\|d_m^{MR}\|^2}{\|d_m^{MR}\|^2 + |(w_m, r)|^2} = \frac{\|d_m^{MR}\|^2}{\|d_{m-1}^{MR}\|^2} = s_m^2. $$

Since $d_m^{MR} \perp w_m$, we now obtain $|(v_{m+1}, w_m)| = |\beta| = s_m$. In the case $(r, w_m) = 0$, this implies $|(v_{m+1}, w_m)| = 1$, i.e., $|(v_{m+1}, w_m)| = \|v_{m+1}\| \, \|w_m\|$, which means that these two vectors are collinear.

Recall that $r \in W_{m-1}$ implies that the MR approximation with respect to the space $W_{m-1}$ solves the approximation problem exactly. The same is true for the OR approximation, since $r \in W_{m-1}$ implies $r - h_{m-1}^{OR} \in W_{m-1} \cap V_{m-1}^\perp = \{0\}$, so that the OR approximation solves the problem exactly whenever the MR approximation does. In other words, the assumption $r \notin W_{m-1}$ is equivalent to saying that both $d_{m-1}^{MR}$ and $d_{m-1}^{OR}$ are not yet zero.

If we define $\tilde w_m := \frac{|(w_m, r)|}{(r, w_m)} \hat w_m$, where $\hat w_m = d_{m-1}^{MR} / \|d_{m-1}^{MR}\|$, then $(w_m, \tilde w_m) = c_m$ (cf. the proof of Theorem 3.1). Consequently, the sets $\{w_1, \ldots, w_{m-1}, \tilde w_m / c_m\}$ and $\{w_1, \ldots, w_m\}$ form a pair of biorthonormal bases of $V_m$ and $W_m$. This fact allows us to express the oblique projection which determines the OR approximation as the singular value expansion

$$ P_{W_m}^{V_m} = \sum_{j=1}^{m-1} (\cdot, w_j) w_j + \frac{1}{c_m} (\cdot, \tilde w_m) w_m, \tag{23} $$

from which we derive the following expression for the difference of the OR and MR approximations:

$$ h_m^{OR} - h_m^{MR} = (P_{W_m}^{V_m} - P_{W_m}) r = \left( c_m^{-1} (r, \tilde w_m) - (r, w_m) \right) w_m = \frac{\|d_{m-1}^{MR}\|^2 - |(r, w_m)|^2}{(w_m, r)} w_m = \frac{\|d_m^{MR}\|^2}{(w_m, r)} w_m. \tag{24} $$

In other words, since the spaces $V_m$ and $W_m$ are so closely related, the projection $P_{W_m}^{V_m}$ is simply a rank-one modification of $P_{W_m}$, and this is the essential ingredient of the proof of the following familiar relations:

Theorem 3.4. Given an arbitrary element $r \in H$, a nested sequence of subspaces $W_m \subset H$ of dimension $m$ (cf. (12)) and a corresponding sequence of error spaces $V_m$ as defined by (21), the MR and OR approximations of $r$ with respect to $W_m$ and $V_m$ satisfy

$$ \|d_m^{MR}\| = s_m \|d_{m-1}^{MR}\|, \tag{25} $$
$$ \|d_m^{MR}\| = s_1 s_2 \cdots s_m \|r\|, \tag{26} $$
$$ \|d_m^{MR}\| = c_m \|d_m^{OR}\|, \tag{27} $$
$$ \|d_m^{OR}\| = s_1 s_2 \cdots s_m \|r\| / c_m, \tag{28} $$
$$ h_m^{MR} = s_m^2 h_{m-1}^{MR} + c_m^2 h_m^{OR}, \tag{29} $$
$$ d_m^{MR} = s_m^2 d_{m-1}^{MR} + c_m^2 d_m^{OR}, \tag{30} $$
$$ \frac{1}{\|d_m^{MR}\|^2} \, h_m^{MR} = \sum_{j=0}^m \frac{1}{\|d_j^{OR}\|^2} \, h_j^{OR}, \tag{31} $$
$$ \frac{1}{\|d_m^{MR}\|^2} \, d_m^{MR} = \sum_{j=0}^m \frac{1}{\|d_j^{OR}\|^2} \, d_j^{OR}, \tag{32} $$
$$ \frac{1}{\|d_m^{MR}\|^2} = \sum_{j=0}^m \frac{1}{\|d_j^{OR}\|^2} = \frac{1}{\|d_{m-1}^{MR}\|^2} + \frac{1}{\|d_m^{OR}\|^2}, \tag{33} $$

where

$$ s_m = \sin \angle(d_{m-1}^{MR}, W_m) \qquad \text{and} \qquad c_m = \cos \angle(d_{m-1}^{MR}, W_m). $$

Proof. The identities (25) and (26) have already been proven. Now from (24) there follows

$$ d_m^{MR} - d_m^{OR} = h_m^{OR} - h_m^{MR} \in \mathrm{span}\{w_m\} \tag{34} $$

and

$$ h_m^{OR} = h_m^{MR} + \frac{\|d_m^{MR}\|^2}{(w_m, r)} \, w_m = h_{m-1}^{MR} + \frac{\|d_{m-1}^{MR}\|^2}{(w_m, r)} \, w_m, \tag{35} $$

where we have used (14) for the last equality. Since $d_m^{MR} \perp w_m$, the Pythagoras identity and (14) yield

$$ \|d_m^{OR}\|^2 = \left( 1 + \frac{\|d_m^{MR}\|^2}{|(w_m, r)|^2} \right) \|d_m^{MR}\|^2 = \frac{\|d_{m-1}^{MR}\|^2}{|(w_m, r)|^2} \, \|d_m^{MR}\|^2, $$

which, in view of (18), gives the following error formula for the OR approximation:

$$ \|d_m^{OR}\| = \frac{\|d_{m-1}^{MR}\| \, \|d_m^{MR}\|}{|(w_m, r)|} = \frac{1}{c_m} \, \|d_m^{MR}\|. $$

This proves (27) and (28). From (35) and $h_m^{MR} - h_{m-1}^{MR} = (r, w_m) w_m$ we obtain

$$ h_m^{OR} = h_m^{MR} + \frac{\|d_m^{MR}\|^2}{(w_m, r)(r, w_m)} \, (h_m^{MR} - h_{m-1}^{MR}) = h_m^{MR} + \frac{\|d_m^{MR}\|^2}{\|d_{m-1}^{MR}\|^2} \frac{\|d_{m-1}^{MR}\|^2}{|(r, w_m)|^2} \, (h_m^{MR} - h_{m-1}^{MR}) = h_m^{MR} + \frac{s_m^2}{c_m^2} \, (h_m^{MR} - h_{m-1}^{MR}) $$

(cf. (25) and (18)), which implies the relationship (29) between the OR and MR approximations and, by way of $s_m^2 + c_m^2 = 1$, the associated relationship (30) between their errors. Repeated application of these two formulas leads to

$$ h_m^{MR} = \sum_{j=0}^m \lambda_{m,j}^2 \, h_j^{OR} \qquad \text{and} \qquad d_m^{MR} = \sum_{j=0}^m \lambda_{m,j}^2 \, d_j^{OR}, $$

where $\lambda_{m,0} := s_1 s_2 \cdots s_m$ and $\lambda_{m,j} := c_j s_{j+1} \cdots s_m$ ($1 \leq j \leq m$). In view of (25) and (27) this can be simplified to

$$ \lambda_{m,j} = c_j \, \frac{\|d_{j+1}^{MR}\|}{\|d_j^{MR}\|} \frac{\|d_{j+2}^{MR}\|}{\|d_{j+1}^{MR}\|} \cdots \frac{\|d_m^{MR}\|}{\|d_{m-1}^{MR}\|} = c_j \, \frac{\|d_m^{MR}\|}{\|d_j^{MR}\|} = \frac{\|d_m^{MR}\|}{\|d_j^{OR}\|} $$

and we obtain (31) as well as (32). We finally note that, since the errors $d_j^{OR}$ are orthogonal,

$$ \frac{1}{\|d_m^{MR}\|^2} = \sum_{j=0}^m \frac{1}{\|d_j^{OR}\|^2} = \sum_{j=0}^{m-1} \frac{1}{\|d_j^{OR}\|^2} + \frac{1}{\|d_m^{OR}\|^2} = \frac{1}{\|d_{m-1}^{MR}\|^2} + \frac{1}{\|d_m^{OR}\|^2}, $$

which proves (33). Strictly speaking, this proves

$$ \frac{1}{\|d_m^{MR}\|^2} = \frac{1}{\|d_{m-1}^{MR}\|^2} + \frac{1}{\|d_m^{OR}\|^2} $$

only under the assumption that all OR approximations $h_1^{OR}, \ldots, h_m^{OR}$ exist. But this last equation is merely a reformulation of the Pythagoras identity,

$$ 1 = s_m^2 + c_m^2 = \frac{\|d_m^{MR}\|^2}{\|d_{m-1}^{MR}\|^2} + \frac{\|d_m^{MR}\|^2}{\|d_m^{OR}\|^2} $$

(cf. (25) and (27)), and requires only the existence of $h_m^{OR}$ (besides $r \notin W_m$).
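The identities of Theorem 3.4 lend themselves to a direct numerical check. The sketch below (our illustration; NumPy) builds $V_m = \mathrm{span}\{r\} + W_{m-1}$, computes the OR approximation from the orthogonality condition, and verifies (27) and (33):

```python
import numpy as np

rng = np.random.default_rng(4)
n, M = 12, 5
r = rng.standard_normal(n)
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))  # columns w_1, w_2, ...

norm = np.linalg.norm
d_prev = r.copy()                                 # d_0^MR = r
for m in range(1, M + 1):
    W = Q[:, :m]
    d_mr = r - W @ (W.T @ r)                      # MR error in W_m
    c_m = abs(r @ Q[:, m - 1]) / norm(d_prev)     # cosine, cf. (18)
    # OR approximation: residual orthogonal to V_m = span{r} + W_{m-1}
    V = np.column_stack([r, Q[:, :m - 1]])
    coef = np.linalg.solve(V.T @ W, V.T @ r)
    d_or = r - W @ coef
    assert np.isclose(norm(d_mr), c_m * norm(d_or))                  # (27)
    assert np.isclose(1 / norm(d_mr)**2,
                      1 / norm(d_prev)**2 + 1 / norm(d_or)**2)       # (33)
    d_prev = d_mr
```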

Corollary 3.5. In view of

$$ s_m = \frac{\|d_m^{MR}\|}{\|d_{m-1}^{MR}\|}, \quad \text{i.e.,} \quad c_m = \sqrt{1 - \frac{\|d_m^{MR}\|^2}{\|d_{m-1}^{MR}\|^2}}, $$

an angle-free formulation of (27), (29) and (30) reads

$$ \|d_m^{MR}\| = \sqrt{1 - \frac{\|d_m^{MR}\|^2}{\|d_{m-1}^{MR}\|^2}} \; \|d_m^{OR}\|, $$
$$ h_m^{MR} = h_m^{OR} + \frac{\|d_m^{MR}\|^2}{\|d_{m-1}^{MR}\|^2} \, (h_{m-1}^{MR} - h_m^{OR}), $$
$$ d_m^{MR} = d_m^{OR} + \frac{\|d_m^{MR}\|^2}{\|d_{m-1}^{MR}\|^2} \, (d_{m-1}^{MR} - d_m^{OR}). $$

Of course, the first of these identities, or its reformulation

$$ \|d_m^{OR}\| = \left( 1 - \frac{\|d_m^{MR}\|^2}{\|d_{m-1}^{MR}\|^2} \right)^{-1/2} \|d_m^{MR}\|, $$

only makes sense if $h_m^{OR}$ is defined, which is equivalent to $\|d_m^{MR}\| < \|d_{m-1}^{MR}\|$. If $\|d_m^{MR}\| \approx \|d_{m-1}^{MR}\|$, then the factor $(1 - \|d_m^{MR}\|^2 / \|d_{m-1}^{MR}\|^2)^{-1/2}$ will be large and, consequently, $\|d_m^{OR}\| \gg \|d_m^{MR}\|$. Conversely, if the MR approximation makes considerable progress in step $m$, then $(1 - \|d_m^{MR}\|^2 / \|d_{m-1}^{MR}\|^2)^{-1/2} \approx 1$ and $\|d_m^{OR}\| \approx \|d_m^{MR}\|$. In the context of Krylov subspace methods, this observation is sometimes referred to as the peak/plateau phenomenon of OR/MR approximations (cf., e.g., Cullum and Greenbaum [7]).

A smoothing algorithm transforms a given sequence $\{h_m\}$, $h_m \in W_m$, of approximations to $r$ into a new sequence $\{\hat h_m\}$, $\hat h_m \in W_m$, according to

$$ \hat h_m := (1 - \eta_m) \hat h_{m-1} + \eta_m h_m, \quad \text{i.e.,} \quad \hat d_m = (1 - \eta_m) \hat d_{m-1} + \eta_m d_m \tag{36} $$

($m = 1, 2, \ldots$, $\hat h_0 := h_0 = 0$). The intention is that the errors $\hat d_m$ of $\hat h_m$ should converge to 0 "more smoothly" than the errors $d_m$ associated with the original sequence. Ideally, we would like to have $\hat h_m = h_m^{MR}$, and we shall discuss two smoothing procedures which achieve this goal when applied to $h_m = h_m^{OR}$. In minimal residual smoothing (cf. Weiss [47], Zhou and Walker [50], Gutknecht [23, §17]), the parameter $\eta_m$ of (36) is chosen such that the norm of the error $\hat d_m = \hat d_{m-1} - \eta_m (\hat d_{m-1} - d_m)$ is minimized as a function of $\eta_m$. In other words, we seek the best approximation $\eta_m (\hat d_{m-1} - d_m)$ from $\mathrm{span}\{\hat d_{m-1} - d_m\}$ to $\hat d_{m-1}$, which is obtained for

$$ \eta_m^{MR} := \frac{(\hat d_{m-1}, \hat d_{m-1} - d_m)}{\|\hat d_{m-1} - d_m\|^2}. $$

In an alternative smoothing procedure known as quasi-minimal residual smoothing (cf. [50], [23, §17]), the parameter $\eta_m$ is chosen according to

$$ \eta_m^{QMR} := \frac{\tau_m^2}{\|d_m\|^2} \qquad \text{with } \tau_m \text{ such that} \qquad \frac{1}{\tau_m^2} = \frac{1}{\tau_{m-1}^2} + \frac{1}{\|d_m\|^2}, \quad \tau_0 = \|r\|. $$

It is easy to see by induction that

$$ \frac{1}{\tau_m^2} = \sum_{j=0}^m \frac{1}{\|d_j\|^2}, \quad \text{i.e.,} \quad \eta_m^{QMR} = \frac{1/\|d_m\|^2}{\sum_{j=0}^m 1/\|d_j\|^2}, $$

and therefore

$$ \hat h_m = \frac{\sum_{j=0}^m h_j / \|d_j\|^2}{\sum_{j=0}^m 1 / \|d_j\|^2} \qquad \text{as well as} \qquad \hat d_m = \frac{\sum_{j=0}^m d_j / \|d_j\|^2}{\sum_{j=0}^m 1 / \|d_j\|^2}. $$

The last formula for $\hat h_m$ reveals the idea behind quasi-minimal residual smoothing: $\hat h_m$ is a weighted sum of all previous iterates $h_0, h_1, \ldots, h_m$ with weights $(1/\|d_k\|^2) / \big( \sum_{j=0}^m 1/\|d_j\|^2 \big)$, which are (relatively) large if $h_k$ approximates $r$ well and (relatively) small if $h_k$ is a poor approximation to $r$. In general, i.e., for an arbitrary sequence $\{h_m\}$, $h_m \in W_m$, minimal and quasi-minimal residual smoothing will generate different "smoothed" iterates $\hat h_m$. In the case $h_m = h_m^{OR}$, however, these two methods are equivalent:

Proposition 3.6. If either the minimal residual or the quasi-minimal residual smoothing algorithm is applied to the sequence of OR approximations $\{h_m^{OR}\}$ for an element $r \in \mathcal{H}$, then the resulting smoothed sequence consists of the MR approximations $h_m^{MR}$ for $r$, and the associated smoothing parameters are given by $\eta_m^{MR} = \eta_m^{QMR} = c_m^2$.

Proof. An induction shows that minimal residual smoothing applied to $h_m = h_m^{OR}$ yields $\hat h_m = h_m^{MR}$ and that, in this case, $\eta_m^{MR} = c_m^2$: The assertion is trivial for $m = 1$. Assuming $\hat h_{m-1} = h_{m-1}^{MR}$ for some $m \ge 2$, we see from (35)
\[
  \hat d_{m-1} - d_m = d_{m-1}^{MR} - d_m^{OR} = \frac{\|d_{m-1}^{MR}\|^2}{(w_m, r)}\, w_m
\]
and consequently, noting that $d_{m-1}^{MR} = d_m^{MR} + (r, w_m)w_m$ and $d_m^{MR} \perp w_m$,
\[
  (\hat d_{m-1},\, \hat d_{m-1} - d_m)
  = \left( (r, w_m)\, w_m,\; \frac{\|d_{m-1}^{MR}\|^2}{(w_m, r)}\, w_m \right)
  = \|d_{m-1}^{MR}\|^2.
\]
From (18) and (29), there finally follows
\[
  \eta_m^{MR} = \|d_{m-1}^{MR}\|^2\, \frac{|(w_m, r)|^2}{\|d_{m-1}^{MR}\|^4}
  = \frac{|(w_m, r)|^2}{\|d_{m-1}^{MR}\|^2} = c_m^2
\]
and $\hat h_m = h_m^{MR}$.

The analogous assertion for quasi-minimal residual smoothing follows, with (31) and (32), immediately from the orthogonality of the error vectors $d_m^{OR}$.

It should not come as a surprise that, in our setting, a one-dimensional minimization procedure like minimal residual smoothing yields the best approximation $h_m^{MR}$, which is the global optimum on $\mathcal{W}_m$. Recall that $h_m^{OR}$ is already ``nearly optimal'' and needs to be corrected only in the direction of $w_m$.

Remark 3.7. We close this section by reconsidering the issue of the so-called Galerkin breakdown mentioned in Remark 2.5. Using the biorthonormal bases introduced in (23), we obtain the singular value expansion of $P_{\mathcal{V}} P_{\mathcal{W}}$ as
\[
  P_{\mathcal{V}} P_{\mathcal{W}} = \sum_{j=1}^{m-1} (\,\cdot\,, w_j)\, w_j + c_m (\,\cdot\,, w_m)\, \tilde w_m.
\]
If, in view of (10), one defines the OR approximation in the degenerate case $c_m = 0$ by $h_m^{OR} := (P_{\mathcal{V}} P_{\mathcal{W}})^+ r$, then this leads to $h_m^{OR} = h_{m-1}^{MR} = h_m^{MR}$.
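Before moving on to coordinates, the $\tau$-recurrence for quasi-minimal residual smoothing can be sanity-checked against its closed form. The sketch below uses made-up error norms and our own helper name; it is an illustration, not code from the paper.

```python
def qmr_smoothing_etas(d_norms):
    """Run the recurrence 1/tau_m^2 = 1/tau_{m-1}^2 + 1/||d_m||^2 with
    tau_0 = ||d_0|| = ||r||, and return eta_m = tau_m^2/||d_m||^2 for m >= 1.
    By induction these equal the weights (1/||d_m||^2) / sum_j 1/||d_j||^2."""
    inv_tau_sq = 1.0 / d_norms[0] ** 2       # 1/tau_0^2
    etas = []
    for dm in d_norms[1:]:
        inv_tau_sq += 1.0 / dm ** 2
        etas.append((1.0 / inv_tau_sq) / dm ** 2)
    return etas

# Hypothetical error norms ||d_0||, ..., ||d_3||.
print(qmr_smoothing_etas([4.0, 2.0, 1.0, 0.5]))
```

Each $\eta_m$ lies in $(0,1)$, so (36) always forms a convex combination of the previous smoothed iterate and the new iterate.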

4 Working with Coordinates

The above results resemble the matrix formulations of well-known Krylov subspace methods more closely when expressed in coordinates with respect to suitable bases of the spaces $\mathcal{W}_m$ and $\mathcal{V}_m$. The simplest case is a formulation in terms of an orthonormal basis of $\mathcal{W}_m$. In the context of operator equations, however, the error space $\mathcal{V}_m$ of the previous step is commonly used not only as the space defining the OR orthogonality constraint, but also as the trial space for the approximate solution; for this reason, a formulation in terms of an orthonormal basis of the spaces $\mathcal{V}_m$ is often more convenient.

4.1 Using an Orthonormal Basis of Wm

If, for each $m \ge 1$, the vectors $\{w_1, \dots, w_m\}$ form an orthonormal basis of $\mathcal{W}_m$, then each $h \in \mathcal{W}_m$ possesses the unique representation $h = W_m y$, in which $W_m$ denotes the row vector $W_m := [w_1, \dots, w_m]$ and $y \in \mathbb{C}^m$ is the coordinate vector of $h$ with respect to this basis. The characterization $d_m^{MR} = r - W_m y_m^{MR} \perp \mathcal{W}_m$ then immediately determines the coordinate vector $y_m^{MR}$ of $h_m^{MR}$ to be
\[
  y_m^{MR} = [(r, w_1), \dots, (r, w_m)]^\top. \tag{37}
\]
The coordinate vector $y_m^{OR}$ of the corresponding OR approximation $h_m^{OR}$ is given by
\[
  y_m^{OR} = \bigl[(r, w_1), \dots, (r, w_{m-1}),\, \|d_{m-1}^{MR}\|^2/(w_m, r)\bigr]^\top \tag{38}
\]
since $h_m^{OR} = h_{m-1}^{MR} + \bigl(\|d_{m-1}^{MR}\|^2/(w_m, r)\bigr)\, w_m$ (cf. (35)).

4.2 Using an Orthonormal Basis of Vm

We now drop the orthogonality requirement on the vectors $w_j$, assuming only that, for each $m$, $\{w_1, \dots, w_m\}$ forms a basis of $\mathcal{W}_m$. In the same manner, let $\{v_1, \dots, v_{m+1}\}$ form a basis of the corresponding error space $\mathcal{V}_{m+1}$. Since $\mathcal{W}_m \subset \operatorname{span}\{r\} + \mathcal{W}_m = \mathcal{V}_{m+1}$, we may represent each $w_k$ as a linear combination of $v_1, \dots, v_{k+1}$,
\[
  w_k = \sum_{j=1}^{k+1} \eta_{j,k}\, v_j, \qquad k = 1, \dots, m,
\]
or, more compactly, employing the row vector notation $W_m = [w_1, \dots, w_m]$ and $V_m := [v_1, \dots, v_m]$, $m = 1, 2, \dots$,
\[
  W_m = V_{m+1} \tilde H_m = V_m H_m + [0, \dots, 0,\, \eta_{m+1,m} v_{m+1}], \tag{39}
\]
where $\tilde H_m := [\eta_{j,k}] \in \mathbb{C}^{(m+1)\times m}$ is an upper Hessenberg matrix and $H_m := [I_m\ 0]\tilde H_m$ is the square matrix obtained by deleting the last row of $\tilde H_m$. Note that, as long as $r \not\in \mathcal{W}_m$, i.e., $w_m \not\in \mathcal{V}_m$, we have $\eta_{m+1,m} \ne 0$, which implies $\operatorname{rank}(\tilde H_m) = m$. If $r \in \mathcal{W}_m$ for some index $m$, we set
\[
  L := \min\{m : r \in \mathcal{W}_m\} \tag{40}
\]
to be the smallest such index and observe that $\mathcal{V}_{L+1} = \mathcal{V}_L = \operatorname{span}\{v_1, \dots, v_L\}$, i.e., $W_L = V_L H_L$, implying $\operatorname{rank}(H_L) = \operatorname{rank}(\tilde H_L) = L$. If such an index does not exist, we set $L = \infty$. To avoid cumbersome notation we shall concentrate on the case $L < \infty$, which is the most relevant for practical applications. We note, however, that all our conclusions can be proven in the general case.

For a given sequence $\{w_j\}_{j\ge 1}$, an orthonormal sequence of vectors $\{v_j\}_{j\ge 1}$ may be constructed recursively starting with $v_1 := r/\|r\|$ and, in view of $\mathcal{V}_{m+1} = \mathcal{V}_m + \operatorname{span}\{w_m\}$, successively orthogonalizing each $w_m$ against the previously generated $v_1, \dots, v_m$:
\[
  v_1 := r/\beta, \quad \beta := \|r\|; \qquad
  v_{m+1} := \frac{(I - P_{\mathcal{V}_m})\, w_m}{\|(I - P_{\mathcal{V}_m})\, w_m\|}, \quad m = 1, 2, \dots, L-1. \tag{41}
\]
Of course, this is nothing but the Gram-Schmidt orthogonalization procedure applied to the basis $\{r, w_1, \dots, w_{m-1}\}$ of $\mathcal{V}_m$, and hence in this case the entries of the Hessenberg matrix introduced in (39) are given by
\[
  \eta_{j,m} = (w_m, v_j), \qquad j = 1, \dots, m+1, \quad m \ge 1.
\]
We also note that
\[
  \eta_{m+1,m} = (w_m, v_{m+1}) = \|(I - P_{\mathcal{V}_m})\, w_m\| \ge 0 \tag{42}
\]
with equality holding if and only if $w_m \in \mathcal{V}_m$ or, equivalently, $r \in \mathcal{W}_m$, i.e., $m = L$.

We now turn to the determination of the coordinate vectors of the MR and OR approximations with respect to the basis $\{w_1, \dots, w_m\}$. In the following lemma and in the remainder of the paper, we will employ the notation $u_1^{(m)}$ to denote the first unit coordinate vector of $\mathbb{C}^m$ wherever this adds clarity.

Lemma 4.1. The coordinate vector $y_m^{MR} \in \mathbb{C}^m$ of the MR approximation $h_m^{MR}$ with respect to the basis $\{w_1, \dots, w_m\}$ is the solution of the least-squares problem
\[
  \|\beta u_1^{(m+1)} - \tilde H_m y\|_2 \to \min_{y \in \mathbb{C}^m}, \tag{43}
\]
whereas the coordinate vector $y_m^{OR}$ of the OR approximation solves the linear system of equations
\[
  H_m y = \beta u_1^{(m)}. \tag{44}
\]
In short,
\[
  y_m^{MR} = \beta \tilde H_m^+ u_1^{(m+1)} \quad\text{and}\quad y_m^{OR} = \beta H_m^{-1} u_1^{(m)},
\]
where $\tilde H_m^+ = (\tilde H_m^H \tilde H_m)^{-1} \tilde H_m^H$ is the Moore-Penrose pseudoinverse of $\tilde H_m$.

Proof. The assertions of the lemma become obvious when the relevant quantities are represented in terms of the orthonormal basis $\{v_1, \dots, v_{m+1}\}$ of $\mathcal{V}_{m+1}$. The vector $r$ to be approximated possesses the coordinate vector $\beta u_1 \in \mathbb{R}^{m+1}$, and the approximation space $\mathcal{W}_m = \operatorname{span}\{w_1, \dots, w_m\}$ is represented by the span of the columns of $\tilde H_m$. In other words, if $h \in \mathcal{W}_m$ has the coordinate vector $y$ with respect to $\{w_1, \dots, w_m\}$, then $d = r - h \in \mathcal{V}_{m+1}$ has the coordinate vector $\beta u_1 - \tilde H_m y$ with respect to $\{v_1, \dots, v_{m+1}\}$. More formally, for any $w = W_m y \in \mathcal{W}_m$ ($y \in \mathbb{C}^m$), there holds
\[
  r - w = r - W_m y = \beta v_1 - V_{m+1} \tilde H_m y = V_{m+1}(\beta u_1 - \tilde H_m y).
\]
As the vectors $\{v_1, \dots, v_{m+1}\}$ are orthonormal, it follows that
\[
  \|r - w\| = \|\beta u_1 - \tilde H_m y\|_2
\]
($\|\cdot\|_2$ denoting the Euclidean norm in $\mathbb{C}^{m+1}$). Similarly, $r - w \perp \mathcal{V}_m$ if and only if the first $m$ coefficients of $\beta u_1 - \tilde H_m y$ vanish, i.e., if
\[
  \beta u_1 - H_m y = 0 \qquad (u_1 \in \mathbb{R}^m).
\]

Remark 4.2. To determine $y_m^{OR}$ using Lemma 4.1 we must of course assume that the linear system $H_m y = \beta u_1$ is solvable. But this is equivalent to our previous characterization of the existence of $h_m^{OR}$, namely $c_m = \cos\angle(d_{m-1}^{MR}, \mathcal{W}_m) \ne 0$ (cf. (19) and Corollary 3.2), which can be seen as follows: First, note that $u_1 \in \mathbb{R}^m$ and the first $m-1$ column vectors of $H_m$ form a basis of $\mathbb{C}^m$ as long as $m \le L$ (since $\eta_{j+1,j} \ne 0$ for $j = 1, 2, \dots, L-1$). This implies that $H_m y = \beta u_1$ is consistent, i.e., $u_1 \in \operatorname{range}(H_m)$, if and only if $H_m$ is nonsingular. Next, recall from Remark 2.6 that $c_m$ equals the smallest singular value of the matrix $[(v_j, \hat w_k)]_{j,k=1,2,\dots,m}$, where $\{\hat w_1, \hat w_2, \dots, \hat w_m\}$ is any orthonormal basis of $\mathcal{W}_m$. We select such an orthonormal basis and represent its elements as linear combinations in the original basis $\{w_1, w_2, \dots, w_m\}$. In our row vector notation, this leads to a nonsingular matrix $T \in \mathbb{C}^{m\times m}$ with $[\hat w_1, \hat w_2, \dots, \hat w_m] = [w_1, w_2, \dots, w_m]\, T$. Now, $[(v_j, \hat w_k)] = [(v_j, w_k)]\, T = H_m^H T$ and, consequently, the smallest singular value of $[(v_j, \hat w_k)]$ is positive if and only if $H_m$ is nonsingular.
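The recursion (41) and relation (39) can be checked in a few lines. The following sketch uses plain Python lists as stand-ins for Hilbert-space vectors; the helper names are ours and the data are illustrative, so this is a numerical illustration rather than an implementation from the paper.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return dot(a, a) ** 0.5

def build_hessenberg(r, ws):
    """Gram-Schmidt per (41): v_1 = r/||r||, then orthogonalize each w_m
    against v_1,...,v_m.  cols[m-1][j-1] holds eta_{j,m} = (w_m, v_j), and the
    appended entry eta_{m+1,m} = ||(I - P_{V_m}) w_m|| >= 0, cf. (42)."""
    vs = [[x / norm(r) for x in r]]
    cols = []
    for w in ws:
        col = [dot(w, v) for v in vs]
        resid = [x - sum(c * v[i] for c, v in zip(col, vs)) for i, x in enumerate(w)]
        eta_next = norm(resid)
        col.append(eta_next)
        if eta_next > 1e-12:
            vs.append([x / eta_next for x in resid])
        cols.append(col)
    return vs, cols
```

Relation (39), $w_m = \sum_j \eta_{j,m} v_j$, can then be verified by recombining each column with the orthonormal vectors produced.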

Remark 4.3. In view of (39) and the result of Lemma 4.1, the approximations $h_m^{MR}$ and $h_m^{OR}$ and their associated errors have the following representations in terms of the basis $\{v_1, \dots, v_{m+1}\}$:
\[
  h_m^{MR} = \beta V_{m+1} \tilde H_m \tilde H_m^+ u_1, \qquad
  d_m^{MR} = \beta V_{m+1} \bigl( I_{m+1} - \tilde H_m \tilde H_m^+ \bigr) u_1,
\]
\[
  h_m^{OR} = \beta V_{m+1} \tilde H_m H_m^{-1} u_1, \qquad
  d_m^{OR} = \beta V_{m+1} \bigl( I_{m+1} - \tilde H_m [H_m^{-1}\ 0] \bigr) u_1. \tag{45}
\]
The last identity shows that the coordinate vector of $d_m^{OR}$ has a particularly simple form: Introducing the notation $H_m^{-1} = \bigl[ \eta_{j,k}^{[-1]} \bigr]$, we obtain for $m < L$
\[
  d_m^{OR} = \beta V_{m+1} \bigl( u_1^{(m+1)} - \tilde H_m H_m^{-1} u_1^{(m)} \bigr)
           = \beta V_{m+1} \bigl( I_{m+1} - \tilde H_m [H_m^{-1}\ 0] \bigr) u_1^{(m+1)}
           = -\beta\, \eta_{m+1,m}\, \eta_{m,1}^{[-1]}\, v_{m+1}.
\]
The matrix $I_{m+1} - \tilde H_m [H_m^{-1}\ 0]$ represents $I - P_{\mathcal{W}_m}^{\mathcal{V}_m}$ restricted to $\mathcal{V}_{m+1}$ with respect to the orthonormal basis $\{v_1, \dots, v_{m+1}\}$. The following lemma, which was recently obtained by Hochbruck and Lubich [26], provides a simpler expression for this projection.

Lemma 4.4. Let the vector $\hat w_{m+1} \in \mathcal{V}_{m+1} \cap \mathcal{W}_m^\perp$ be defined by the condition $(v_{m+1}, \hat w_{m+1}) = 1$. Then for all $v \in \mathcal{V}_{m+1}$ there holds
\[
  \bigl( I - P_{\mathcal{W}_m}^{\mathcal{V}_m} \bigr) v = (v, \hat w_{m+1})\, v_{m+1}. \tag{46}
\]
Moreover, the coordinate vector $\hat y_{m+1}$ of $\hat w_{m+1}$ with respect to $\{v_1, \dots, v_{m+1}\}$ has the form
\[
  \hat y_{m+1} = \begin{bmatrix} g_m \\ 1 \end{bmatrix},
  \qquad\text{where } g_m \text{ solves}\qquad
  H_m^H g_m = -\eta_{m+1,m}\, u_m.
\]

Proof. On $\mathcal{V}_{m+1}$, the projection $I - P_{\mathcal{W}_m}^{\mathcal{V}_m}$ is characterized by the two properties
\[
  \bigl( I - P_{\mathcal{W}_m}^{\mathcal{V}_m} \bigr) v = v \quad \forall v \in \mathcal{V}_{m+1} \cap \mathcal{V}_m^\perp = \operatorname{span}\{v_{m+1}\}, \qquad
  \bigl( I - P_{\mathcal{W}_m}^{\mathcal{V}_m} \bigr) v = 0 \quad \forall v \in \mathcal{V}_{m+1} \cap \mathcal{W}_m,
\]
i.e., it is the oblique projection onto $\mathcal{V}_{m+1} \cap \mathcal{V}_m^\perp$ orthogonal to $\mathcal{V}_{m+1} \cap \mathcal{W}_m^\perp$, both of which are one-dimensional spaces of which $\{v_{m+1}\}$ and $\{\hat w_{m+1}\}$ are biorthonormal bases. Thus, (46) is the singular value expansion of $I - P_{\mathcal{W}_m}^{\mathcal{V}_m}$ restricted to $\mathcal{V}_{m+1}$.

To obtain the coordinate vector $\hat y_{m+1}$, we first note that the requirement $(v_{m+1}, \hat w_{m+1}) = 1$ implies that its last component is equal to one. Next, $\hat w_{m+1} \perp \mathcal{W}_m$ translates to $\hat y_{m+1} \in \operatorname{null}(\tilde H_m^H)$, since the columns of $\tilde H_m$ span the coordinate space of $\mathcal{W}_m$. Denoting by $g_m \in \mathbb{C}^m$ the first $m$ components of $\hat y_{m+1}$ and recalling that $\eta_{m+1,m} > 0$, we obtain
\[
  0 = \tilde H_m^H \hat y_{m+1}
    = \bigl[ H_m^H \;\; \eta_{m+1,m}\, u_m \bigr] \begin{bmatrix} g_m \\ 1 \end{bmatrix}
    = H_m^H g_m + \eta_{m+1,m}\, u_m.
\]

The representation (46) can be used to obtain another expression for the OR approximation error as follows: by virtue of the inclusion $\mathcal{W}_{m-1} \subset \mathcal{W}_m$, an arbitrary vector $w \in \mathcal{W}_{m-1}$ must lie in the nullspace of $I - P_{\mathcal{W}_m}^{\mathcal{V}_m}$. Furthermore, the difference $r - w \in \operatorname{span}\{r\} + \mathcal{W}_{m-1} = \mathcal{V}_m$ has a representation $r - w = V_m z$ with $z \in \mathbb{C}^m$. It follows that
\[
  d_m^{OR} = \bigl( I - P_{\mathcal{W}_m}^{\mathcal{V}_m} \bigr) r
           = \bigl( I - P_{\mathcal{W}_m}^{\mathcal{V}_m} \bigr)(r - w)
           = (r - w, \hat w_{m+1})\, v_{m+1}
           = \left( V_{m+1} \begin{bmatrix} z \\ 0 \end{bmatrix},\; V_{m+1} \begin{bmatrix} g_m \\ 1 \end{bmatrix} \right) v_{m+1}
           = (g_m^H z)\, v_{m+1}, \tag{47}
\]
and therefore $\|d_m^{OR}\| = |g_m^H z| \le \|g_m\|_2 \|z\|_2$, with equality holding if and only if $g_m$ and $z$ are collinear. At the same time, as $g_m$ is fixed, equality must occur when $\|z\|_2 = \|r - w\|$ is minimized among all $w \in \mathcal{W}_{m-1}$, which is the case for $w = h_{m-1}^{MR}$. As a result, $\|d_m^{OR}\| = \|g_m\|_2\, \|d_{m-1}^{MR}\|$ which, in view of (25) and (27), implies $\|g_m\|_2 = s_m/c_m$, an identity which could also have been derived directly from the definition of $g_m$.

The least-squares problem (43) can be solved with the help of a QR decomposition of the Hessenberg matrix $\tilde H_m$,
\[
  Q_m \tilde H_m = \begin{bmatrix} R_m \\ 0 \end{bmatrix}, \tag{48}
\]

where $Q_m \in \mathbb{C}^{(m+1)\times(m+1)}$ is unitary ($Q_m^H Q_m = I_{m+1}$) and $R_m \in \mathbb{C}^{m\times m}$ is upper triangular. Substituting this QR factorization in (43) yields
\[
  \min_{y\in\mathbb{C}^m} \|\beta u_1 - \tilde H_m y\|_2
  = \min_{y\in\mathbb{C}^m} \left\| Q_m^H \left( \beta Q_m u_1 - \begin{bmatrix} R_m \\ 0 \end{bmatrix} y \right) \right\|_2
  = \min_{y\in\mathbb{C}^m} \left\| \beta \begin{bmatrix} q_m \\ q_{m+1,1}^{(m)} \end{bmatrix} - \begin{bmatrix} R_m y \\ 0 \end{bmatrix} \right\|_2,
\]
where $[q_m^\top,\, q_{m+1,1}^{(m)}]^\top$ ($q_m \in \mathbb{C}^m$) denotes the first column of $Q_m$. Since $\tilde H_m$ has full rank, $R_m$ is nonsingular and the solution of the above least-squares problem is $y_m^{MR} = \beta R_m^{-1} q_m$. The associated least-squares error is given by $\|d_m^{MR}\| = \beta\, |q_{m+1,1}^{(m)}|$. The following theorem identifies the angles between $r$ and $\mathcal{W}_m$, as well as those between the spaces $\mathcal{V}_m$ and $\mathcal{W}_m$, with quantities from the first column of the matrix $Q_m$:

Theorem 4.5. If, for $m = 1, \dots, L$, $Q_m = [q_{j,k}^{(m)}]_{j,k=1}^{m+1} \in \mathbb{C}^{(m+1)\times(m+1)}$ is the unitary matrix in the QR decomposition (48) of the Hessenberg matrix $\tilde H_m$ in (39), then there holds
\[
  \sin\angle(r, \mathcal{W}_m) = |q_{m+1,1}^{(m)}|, \tag{49}
\]
\[
  \sin\angle(d_{m-1}^{MR}, \mathcal{W}_m) = \sin\angle(\mathcal{V}_m, \mathcal{W}_m) = \bigl| q_{m+1,1}^{(m)} / q_{m,1}^{(m-1)} \bigr|. \tag{50}
\]

Proof. As mentioned earlier (cf. the proof of Lemma 4.1), the vector $r$ possesses the coordinates $\beta u_1 \in \mathbb{C}^{m+1}$ with respect to the orthonormal basis $\{v_1, \dots, v_{m+1}\}$ of $\mathcal{V}_{m+1}$, whereas $\mathcal{W}_m$ is represented by $\operatorname{range}(\tilde H_m) \subset \mathbb{C}^{m+1}$. This implies
\[
  \angle(r, \mathcal{W}_m) = \angle_2\bigl( \beta u_1, \operatorname{range}(\tilde H_m) \bigr) = \angle_2\bigl( u_1, \operatorname{range}(\tilde H_m) \bigr),
\]
where the index 2 indicates that the last two angles are defined with respect to the Euclidean inner product on $\mathbb{C}^{m+1}$. The vectors $[v_1, \dots, v_{m+1}]\, Q_m^H$ form another orthonormal basis of $\mathcal{V}_{m+1}$, with respect to which $r$ possesses the coordinate vector $\beta\, [q_m^\top, q_{m+1,1}^{(m)}]^\top \in \mathbb{C}^{m+1}$, i.e., a multiple of the first column of $Q_m$, and $\mathcal{W}_m$ is represented by $Q_m \tilde H_m y = \begin{bmatrix} R_m \\ 0 \end{bmatrix} y$ ($y \in \mathbb{C}^m$), a subspace of $\mathbb{C}^{m+1}$ which we abbreviate by $\mathbb{C}^m$ because it consists of those vectors from $\mathbb{C}^{m+1}$ whose last component equals zero. Consequently, there holds
\[
  \angle(r, \mathcal{W}_m) = \angle_2\bigl( \beta [q_m^\top, q_{m+1,1}^{(m)}]^\top, \mathbb{C}^m \bigr) = \angle_2\bigl( [q_m^\top, q_{m+1,1}^{(m)}]^\top, \mathbb{C}^m \bigr),
\]
which, in view of (6), proves assertion (49). Formula (50) follows directly from (17) and (22).

The matrix $Q_m$ is usually constructed as a product of Givens rotations
\[
  Q_m = G_m \begin{bmatrix} G_{m-1} & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} G_{m-2} & 0 \\ 0 & I_2 \end{bmatrix} \cdots \begin{bmatrix} G_1 & 0 \\ 0 & I_{m-1} \end{bmatrix},
\]
where, for $k = 1, 2, \dots, m$,
\[
  G_k := \begin{bmatrix} I_{k-1} & 0 & 0 \\ 0 & \tilde c_k & \tilde s_k e^{-i\theta_k} \\ 0 & -\tilde s_k e^{i\theta_k} & \tilde c_k \end{bmatrix}
  \qquad (\tilde c_k, \tilde s_k \ge 0,\ \tilde c_k^2 + \tilde s_k^2 = 1,\ \theta_k \in \mathbb{R}).
\]
We briefly explain how these rotations have to be chosen inductively. Assume that we have constructed $G_1, \dots, G_{m-2}, G_{m-1}$ such that
\[
  \begin{bmatrix} G_{m-1} & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} G_{m-2} & 0 \\ 0 & I_2 \end{bmatrix} \cdots \begin{bmatrix} G_1 & 0 \\ 0 & I_{m-1} \end{bmatrix} \tilde H_m
  = \begin{bmatrix} R_{m-1} & r \\ 0 & \alpha \\ 0 & \eta_{m+1,m} \end{bmatrix}.
\]
For later use, we rewrite this identity in the form
\[
  \begin{bmatrix} Q_{m-1} & 0 \\ 0 & 1 \end{bmatrix} \tilde H_m = \begin{bmatrix} R_{m-1} & r \\ 0 & \alpha \\ 0 & \eta_{m+1,m} \end{bmatrix}. \tag{51}
\]
Now we set
\[
  \tilde c_m := \frac{|\alpha|}{\sqrt{|\alpha|^2 + \eta_{m+1,m}^2}}, \qquad
  \tilde s_m := \frac{\eta_{m+1,m}}{\sqrt{|\alpha|^2 + \eta_{m+1,m}^2}}, \qquad
  \theta_m := \arg(\eta_{m+1,m}) - \arg(\alpha) = -\arg(\alpha) \tag{52}
\]
(recall $\eta_{m+1,m} \ge 0$) and verify by a simple calculation that
\[
  \begin{bmatrix} I_{m-1} & 0 & 0 \\ 0 & \tilde c_m & \tilde s_m e^{-i\theta_m} \\ 0 & -\tilde s_m e^{i\theta_m} & \tilde c_m \end{bmatrix}
  \begin{bmatrix} R_{m-1} & r \\ 0 & \alpha \\ 0 & \eta_{m+1,m} \end{bmatrix}
  = \begin{bmatrix} R_{m-1} & r \\ 0 & \gamma \\ 0 & 0 \end{bmatrix}
\]
with $\gamma = \sqrt{|\alpha|^2 + \eta_{m+1,m}^2}\; e^{-i\theta_m}$.

That the quantities $\tilde s_m$ and $\tilde c_m$ are indeed the sines and cosines of the angles $\angle(d_{m-1}^{MR}, \mathcal{W}_m) = \angle(\mathcal{V}_m, \mathcal{W}_m)$ can easily be seen as follows: Since $q_{m+1,1}^{(m)} = -\tilde s_m e^{i\theta_m} q_{m,1}^{(m-1)}$, by Theorem 4.5 we have
\[
  s_m = \sin\angle(d_{m-1}^{MR}, \mathcal{W}_m) = \bigl| q_{m+1,1}^{(m)} / q_{m,1}^{(m-1)} \bigr| = \tilde s_m.
\]
When describing the MR and OR approximations, another orthonormal basis of $\mathcal{V}_{m+1}$, which was already employed by Paige and Saunders in [32], often proves useful: we define
\[
  \hat V_{m+1} := [\hat v_1^{(m+1)}, \dots, \hat v_{m+1}^{(m+1)}] := V_{m+1} Q_m^H
\]
(cf. the proof of Theorem 4.5). The above notation for these new basis vectors, namely $\hat v_1^{(m+1)}, \dots, \hat v_{m+1}^{(m+1)}$, is not quite appropriate, since all but the last do not change with the index $m$. This is shown in the following proposition.

Proposition 4.6. There holds
\[
  [\hat v_1^{(m+1)}, \dots, \hat v_{m+1}^{(m+1)}] = [\hat v_1, \dots, \hat v_m, \tilde v_{m+1}],
\]
where $\tilde v_1 = v_1$ and, for $m = 1, \dots, L-1$,
\[
  \hat v_m = c_m \tilde v_m + s_m e^{i\theta_m} v_{m+1}, \qquad
  \tilde v_{m+1} = -s_m e^{-i\theta_m} \tilde v_m + c_m v_{m+1}.
\]
$\{\hat v_1, \dots, \hat v_m, \tilde v_{m+1}\}$ is an orthonormal basis of $\mathcal{V}_{m+1}$ such that $\{\hat v_1, \dots, \hat v_m\}$ is a basis of $\mathcal{W}_m$. In addition, there holds
\[
  h_m^{MR} = \beta \hat V_{m+1} \begin{bmatrix} q_m \\ 0 \end{bmatrix}
  \quad\text{and}\quad
  d_m^{MR} = \beta \hat V_{m+1} \begin{bmatrix} 0 \\ q_{m+1,1}^{(m)} \end{bmatrix}. \tag{53}
\]

Proof. To prove the first assertion we observe
\[
  \hat V_{m+1} = V_{m+1} Q_m^H = [V_m,\, v_{m+1}] \begin{bmatrix} Q_{m-1}^H & 0 \\ 0 & 1 \end{bmatrix} G_m^H
  = \bigl[ \hat V_m,\; v_{m+1} \bigr]
    \begin{bmatrix} I_{m-1} & 0 & 0 \\ 0 & c_m & -s_m e^{-i\theta_m} \\ 0 & s_m e^{i\theta_m} & c_m \end{bmatrix}.
\]
That the first $m$ columns of $\hat V_{m+1}$ form a basis of the approximation space $\mathcal{W}_m$ follows from
\[
  W_m = V_{m+1} \tilde H_m = V_{m+1} Q_m^H \begin{bmatrix} R_m \\ 0 \end{bmatrix} = [\hat v_1, \dots, \hat v_m]\, R_m,
\]
which also implies (53).

We summarize the coordinate representations of the MR and OR errors in

Proposition 4.7. For the MR and OR errors there holds:
\[
  d_m^{MR} = \beta\, q_{m+1,1}^{(m)}\, \tilde v_{m+1}
           = \beta \prod_{j=1}^m \bigl( -s_j e^{i\theta_j} \bigr)\, \tilde v_{m+1},
\]
\[
  d_m^{OR} = -\beta\, \frac{s_m}{c_m}\, e^{i\theta_m}\, q_{m,1}^{(m-1)}\, v_{m+1}
           = \frac{\beta}{c_m} \prod_{j=1}^m \bigl( -s_j e^{i\theta_j} \bigr)\, v_{m+1},
\]
\[
  d_{m-1}^{MR} - d_m^{OR} = \frac{\beta}{c_m}\, q_{m,1}^{(m-1)}\, \hat v_m
                          = \frac{\beta}{c_m} \prod_{j=1}^{m-1} \bigl( -s_j e^{i\theta_j} \bigr)\, \hat v_m.
\]

Proof. The recursive definition of $Q_m$ allows us to express its entries $q_{k,1}^{(m)}$ explicitly in terms of the $s_j$'s and $c_j$'s:
\[
  q_{k,1}^{(m)} = c_k \prod_{j=1}^{k-1} \bigl( -s_j e^{i\theta_j} \bigr) \quad (1 \le k \le m), \qquad
  q_{m+1,1}^{(m)} = \prod_{j=1}^{m} \bigl( -s_j e^{i\theta_j} \bigr).
\]
This, together with (53), proves the first identity.

Next, we recall from Remark 4.3 that $d_m^{OR} = -\beta\, \eta_{m+1,m}\, \eta_{m,1}^{[-1]}\, v_{m+1}$. To eliminate $\eta_{m,1}^{[-1]}$ from this relation, we note that the matrix $H_m$ possesses the QR decomposition (cf. (51))
\[
  Q_{m-1} H_m = \begin{bmatrix} R_{m-1} & r \\ 0 & \alpha \end{bmatrix},
  \quad\text{i.e.,}\quad
  H_m^{-1} = \begin{bmatrix} R_{m-1}^{-1} & \tilde r \\ 0 & 1/\alpha \end{bmatrix} Q_{m-1},
\]
which implies $\eta_{m,1}^{[-1]} = q_{m,1}^{(m-1)}/\alpha$. Since $\eta_{m+1,m}/\alpha = e^{i\theta_m}\, s_m/c_m$ (cf. (52)), we conclude $\eta_{m,1}^{[-1]}\, \eta_{m+1,m} = q_{m,1}^{(m-1)}\, e^{i\theta_m}\, s_m/c_m$. This proves the second identity. The desired representation of $d_{m-1}^{MR} - d_m^{OR}$ now follows from $\hat v_m = c_m \tilde v_m + s_m e^{i\theta_m} v_{m+1}$ (cf. Proposition 4.6).

We note that all relations of Theorem 3.4 connecting the MR and OR approaches can be easily obtained by manipulating the error representations of Proposition 4.7. Indeed, this is essentially how these relations are proven in the literature. The main difference is that the sines and cosines involved usually result from the Givens rotations needed to construct the QR decomposition of $\tilde H_m$, and are not identified as the sines and cosines of $\angle(r, \mathcal{W}_m)$. Finally, we observe that the vector $\hat w_{m+1}$ introduced in Lemma 4.4 is given by $\tilde v_{m+1}/c_m$, i.e., an equivalent formulation of (46) reads
\[
  \bigl( I - P_{\mathcal{W}_m}^{\mathcal{V}_m} \bigr) v = \left( v,\, \frac{\tilde v_{m+1}}{c_m} \right) v_{m+1} \qquad (v \in \mathcal{V}_{m+1}).
\]
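The rotation choice (52) is easy to verify numerically: built from a complex pivot $\alpha$ and the nonnegative subdiagonal entry $\eta_{m+1,m}$, it annihilates the latter while preserving the column norm. The sketch below uses our own helper name and illustrative data.

```python
import cmath
import math

def apply_givens_52(alpha, eta):
    """Form c, s, theta as in (52) for a complex pivot alpha and eta >= 0,
    then apply the 2x2 rotation [[c, s*e^{-i*theta}], [-s*e^{i*theta}, c]]
    to the column (alpha, eta).  Returns (new pivot, annihilated entry, c, s)."""
    rho = math.sqrt(abs(alpha) ** 2 + eta ** 2)
    c, s = abs(alpha) / rho, eta / rho
    theta = -cmath.phase(alpha)       # = arg(eta) - arg(alpha) since eta >= 0
    top = c * alpha + s * cmath.exp(-1j * theta) * eta
    bottom = -s * cmath.exp(1j * theta) * alpha + c * eta
    return top, bottom, c, s

top, bottom, c, s = apply_givens_52(1 + 2j, 3.0)
```

As expected, `bottom` vanishes (up to rounding) and `|top|` equals $\sqrt{|\alpha|^2 + \eta_{m+1,m}^2}$.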

4.3 Using Arbitrary Bases of Wm and Vm

For practical computations it is desirable that the matrices $\tilde H_m$ have small bandwidth. If $\tilde H_m$ has only $k$ nonvanishing diagonals, namely those with indices $-1, 0, 1, \dots, k-2$ (we follow the standard convention according to which a diagonal has index $k$ if its entries $\eta_{j,\ell}$ are characterized by $\ell - j = k$), then only $k$ diagonals of the upper triangular matrices $R_m$ are nonzero, namely those with indices $0, 1, \dots, k-1$. This follows easily from the fact that $Q_m$ has lower Hessenberg form. The special structure of the matrices $R_m$ can then be used to derive $k$-term recurrence formulas for the coordinate vectors $y_m$ in terms of $y_{m-1}, y_{m-2}, \dots, y_{m-k+1}$ and for the approximations $h_m$ in terms of $h_{m-1}, h_{m-2}, \dots, h_{m-k+1}$. (This statement applies to both the MR and the OR approach.) The most important consequence of this observation is that, at each step, only the $k$ previous approximations (or, in other implementations, the last $k$ basis vectors) have to be stored, which means that storage requirements do not increase with $m$.

If we insist on choosing $v_1, \dots, v_L$ as orthogonal vectors, the Hessenberg matrices will generally not have banded form. The main motivation for doing without an orthonormal basis of $\mathcal{V}_m$ is therefore to constrain the bandwidth of $H_m$ in order to keep storage requirements low. As explained at the beginning of Section 4.2, no orthogonality conditions are required to derive the fundamental relationship (39)
\[
  W_m = V_{m+1} \tilde H_m = V_m H_m + [0, \dots, 0,\, \eta_{m+1,m} v_{m+1}].
\]
In this section, we only assume $v_1, v_2, \dots$ to be linearly independent, so that $\{v_1, \dots, v_m\}$ constitutes a basis of $\mathcal{V}_m$ ($m = 1, 2, \dots, L$). Just as before, the $j$-th column of the upper Hessenberg matrix $\tilde H_m \in \mathbb{C}^{(m+1)\times m}$ contains the coefficients of $w_j \in \mathcal{W}_j \subset \mathcal{V}_{m+1}$ with respect to the basis vectors $v_1, \dots, v_{m+1}$. The difference is that, since now the vectors $v_j$ need not be orthogonal, these coefficients can no longer be expressed in terms of the inner product with which $\mathcal{H}$ was originally endowed. We shall see below that the usual inner product representation does indeed hold if we switch to another suitable inner product.

As in the proof of Lemma 4.1 we see that, for each $h = W_m y \in \mathcal{W}_m$ ($y \in \mathbb{C}^m$), the associated error is represented by
\[
  d = r - h = r - W_m y = \beta v_1 - V_{m+1} \tilde H_m y = V_{m+1}(\beta u_1 - \tilde H_m y).
\]

Minimizing the norm of $d$ among all $h \in \mathcal{W}_m$ leads, as above, to the least-squares problem
\[
  \min_{y \in \mathbb{C}^m} \bigl\| V_{m+1} (\beta u_1 - \tilde H_m y) \bigr\| = \min_{y \in \mathbb{C}^m} \|\beta u_1 - \tilde H_m y\|_v, \tag{54}
\]
in which $\|\cdot\|_v$ denotes the norm induced on the coordinate space with respect to $V_L$ by the inner product $(\cdot,\cdot)$ given on $\mathcal{H}$. More precisely, if we set
\[
  (y, z)_v := (V_L y, V_L z) = z^H M y, \quad\text{where}\quad M := [(v_j, v_k)] \in \mathbb{C}^{L \times L}, \tag{55}
\]
for $y, z \in \mathbb{C}^L$, then $(\cdot,\cdot)_v$ is an inner product on $\mathbb{C}^L$ and induces a norm, namely $\|\cdot\|_v := \sqrt{(\cdot,\cdot)_v}$, on $\mathbb{C}^L$.

At this point one could proceed as in the algorithms which use an orthonormal basis, the only difference being that all inner products in the coordinate space now require knowledge of the Gram matrix $M$. In particular, the Givens rotations and the matrices $Q_m$ in the QR factorization (48) must now be unitary with respect to the inner product $(\cdot,\cdot)_v$, i.e., they must satisfy $Q_m^H M_m Q_m = I_{m+1}$, where $M_m \in \mathbb{C}^{(m+1)\times(m+1)}$ is the $(m+1)$-st leading principal submatrix of $M$. The submatrices $M_m$, however, cannot be computed unless all basis vectors $v_1, v_2, \dots, v_m$ are available, regardless of the fact that they may obey short recurrences; this is precisely what we want to avoid.

An alternative was originally proposed by Freund [10]: Rather than solving the minimization problem (54), we instead solve
\[
  \min_{y \in \mathbb{C}^m} \|\beta u_1 - \tilde H_m y\|_2,
\]

and, if $y_m^{QMR} \in \mathbb{C}^m$ denotes the unique solution of this least-squares problem, regard
\[
  h_m^{QMR} := W_m\, y_m^{QMR}
\]
as an approximation of $r$. Using the terminology introduced by Freund, we refer to this approach as the quasi-minimal residual (QMR) approach. In the context of Krylov subspace methods the vector
\[
  s_m^{QMR} := \beta u_1 - \tilde H_m y_m^{QMR} \in \mathbb{C}^{m+1}
\]
is usually called the quasi-residual of $h_m^{QMR}$. We note that, instead of changing the inner product in the coordinate space from $(\cdot,\cdot)_v$ to the Euclidean inner product, one could equivalently have replaced the given inner product $(\cdot,\cdot)$ on $\mathcal{V}_L \subset \mathcal{H}$ by
\[
  (v, w)_V = (V_L y, V_L z)_V := z^H y \qquad \forall\, v = V_L y,\ w = V_L z \in \mathcal{V}_L \tag{56}
\]
and proceeded as in the MR algorithm of Section 4.2. The new inner product $(\cdot,\cdot)_V$ thus defined has the property that the basis vectors $\{v_1, \dots, v_L\}$ are orthonormal.

The following assertions are obvious if we keep in mind that
\[
  (V_L x, V_L z)_V = (V_L x, V_L M^{-1} z) = (V_L M^{-1} x, V_L z) = (V_L M^{-1/2} x, V_L M^{-1/2} z),
\]
\[
  (V_L x, V_L z) = (V_L x, V_L M z)_V = (V_L M x, V_L z)_V = (V_L M^{1/2} x, V_L M^{1/2} z)_V \tag{57}
\]
for $x, z \in \mathbb{C}^L$. As a consequence, we obtain for instance

Theorem 4.8. The QMR iterates are the MR iterates with respect to the inner

product (; )V :

kdmQMRkV = kr ? hmQMRkV = hmin kr ? h kV : 2Wm

In terms of the original norm in H, the MR and QMR errors may be bounded by

kdmMRk  kdmQMRk  2 (Mm ) kdmMRk; p

(58)

in which 2 (Mm ) denotes the (Euclidean) condition number of the (Hermitian positive de nite) matrix Mm . Moreover, we have =2 (Mm )ks QMR k: kdmQMRk  1max m

Proof. Only the second inequality of (58) remains to be proven. This follows

immediately from

=2 (M ) ky k  ky k  1=2 (M ) ky k 1min m 2 v 2 max m

for all y 2 C m+1 . In view of (58) the deviation of the QMR approach from the MR approach is bounded by the condition numbers 2 (Mm ), i.e., by the ratio of the extremal eigenvalues of Mm . The largest eigenvalue max (Mm ) is easily controlled: it merely requires choosing the basis vectors vm such that they have unit length, i.e., kvm k = 1 for all m, to guarantee max (Mm )  m + 1 (note that max (Mm )  kMm kF := [Pmj;k+1=1(vj ; vk )2 ]1=2  [Pmj;k+1=1 kvj k kvk )k]1=2). The crucial point is to construct the basis Vm in such a way that min (Mm ) does not (or only slowly) approach zero. Another immediate consequence of (57) is

Proposition 4.9. The error vectors of the QMR approach ful ll dmQMR ? Um ; 27

where Um := fv = Vm+1 y : y = Mm?1H~ m z for some z 2 C m g is an mdimensional subspace of Vm+1 (for m = L, there holds UL = VL) and orthogonality is understood with respect to the original inner product (; ) on H. Consequently, 



Um r ; dmQMR = I ? PW m

where PWUmm denotes the oblique projection onto Wm orthogonal to Um . Proof. Since the QMR approximations are merely the MR approximations with

respect to (; )V , their errors dmQMR 2 Vm+1 are characterized by dmQMR ?V Wm : From this observation and (57) the proof easily follows. An equally simple proof observes h i dmQMR = Vm+1 1 u1 ? H~ m ymQMR ;

where ymQMR solves the least-squares problem k 1 ? H~ m y k2 ! min, in other words 1 ? H~ m y QMR ? range(H~ m ) or 1 ? H~ m y QMR ?v M ?1 range(H~ m ). Note that the orthogonal complement of Um is given by U?m = spanfdmQMR g ? +Vm+1 , i.e., U?m  Vm = H and the oblique projection PWUmm exists. Next, we brie y describe the analogue of the OR approach if we start out with a non-orthonormal basis. Instead of seeking y 2 C m such that 0 = (Vm+1 ( u1 ? H~ m y ); vj ) = ( u1 ? H~ m y ; uj )v (j = 1; 2; : : : ; m); which would lead to r ? Wm y ? Vm , i.e., to a proper OR approximation, we determine ymQOR 2 C m such that 0 = ( u1 ? H~ m ymQOR ; uj )2 (j = 1; 2; : : : ; m); i.e., Hm ymQOR = u1 (59) (provided Hm is nonsingular). The corresponding approximants are then de ned by hmQOR = Wm ymQOR . In terms of the inner product (; )V on VL they are characterized by QOR ? V dmQOR = r ? hm V m i.e., the QOR iterates are the OR iterates with respect to the inner product (; )V . Similarly to Proposition 4.9 there holds

Proposition 4.10. The errors of the QOR approximants ful ll dmQOR ? Tm; where Tm := fv = Vm+1 y 2 Vm+1 : y = Mm?1 [z T 0]T with z 2

Cmg

is an m-dimensional subspace of Vm+1 (for m = L, there holds TL = VL) and

28

orthogonality is understood with respect to the original inner product (; ) on H. Consequently, 



Tm r ; dmQOR = I ? PW m

where PWTmm denotes the oblique projection onto Wm orthogonal to Tm.

We know from Remark 4.2 that Hm being nonsingular is equivalent Wm  V?mV = H. But since,Tmfor every h 2 H, h ?V Vm if and only if h ? Tm, the

oblique projection PWm exists if Hm is nonsingular. Recall that the QMR and QOR approximations are the MR and OR approximations, respectively, obtained when replacing the original inner product (; ) by the basis-dependent inner product (; )V . This simple observation implies that the assertions of the preceding sections, particularly those of Theorem 3.4 and Propositions 3.6, 4.7, are valid for any pair of QMR/QOR methods. Note, however, that when formulating these results for QMR/QOR methods, each occurrence of the original norm must be replaced by the k  kV -norm and that angles are understood to be de ned with respect to (; )V . As an example, we mention that QMR + c^2 h QOR ; hmQMR = s^2m hm m m ?1 where QMR s^m := sin ]V (dmQMR ?1 ; Wm ); c^m := cos ]V (dm?1 ; Wm )

QMR and ]V (dmQMR ?1 ; Wm ) denotes the angle between dm?1 and Wm with respect to (; )V inner product. Next, we comment on the e ect of applying the smoothing procedures introduced in Section 3, namely minimal and quasi-minimal residual smoothing, to the QOR approximations. We rst note that they are no longer equivalent: If we de ne (dmQMR ; dmQMR ? 1 ?1 ? dmQOR ) MR (60) m := QMR QOR 2

kdm?1 ? dm k

QOR then, in general, hmQMR 6= (1 ? MR )hmQMR m MR m hm because the formula ?1 + MR (60) for the smoothing parameter m was derived in order that the errors of the smoothed approximations solve a local approximation problem with respect to the inner product (; ), which is di erent from the inner product (; )V which characterizes the QMR and QOR approximations. Minimal residual smoothing therefore does not lead from QOR to QMR. It is an easy consequence of Remark 4.3 that the situation is di erent if we apply QMR smoothing: If we set 1=kdmQOR k2V ; QMR m = Pm QOR 2 j =0 1=kdj kV

29

QMR QMR QOR then hmQMR = (1 ? QMR mQMR)hm?1 + m hm does indeed hold (cf. Proposition 3.6). But since dm = vm+1 for some 2 C , we have

kdmQMR kV = j j = kdmQMRk=kvm+1 k and consequently

kvm+1 k=kdm k : QMR m = Pm QOR j =0 kvj +1 k=kdj k2 QOR 2

If we supose the usual assumption that the basis vectors vj have length one (kvj k = 1 for all j), then 1=kdmQORk2 ; QMR m = Pm QOR 2 j =0 1=kdj k and the QMR and QOR approximants are related by exactly the formulas which hold for a proper MR/OR pair, namely: h QMR m

=

Pm

QOR

QOR 2

j =0 hj =kdj k Pm QOR j =0 1=kdj k2

and

d QMR = m

Pm

QOR

QOR 2

j =0 dj =kdj k QOR j =0 1=kdj k2

Pm

:

QMR smoothing applied to CGS and Bi-CGSTAB is discussed in Walker [45]. For smoothing techniques applied to the general class of Lanczos-type product methods, see Ressel and Gutknecht [33]. The above analysis might lead one to believe that the QMR approximations will move steadily farther away from the MR approximation at each step. The following observation due to G. W. Stewart [42] shows that this is not necessarily the case, but that the QMR approximation may under certain conditions recover, regardless of how far it may have deviated from the (optimal) MR approximation in earlier steps. Proposition 4.11. There holds hmQMR = hmMR if and only if v~m+1 ? Wm. There holds hmQOR = hmOR if and only if vm+1 ? Vm . Proof. By Proposition 4.6, we have both Wm = spanfv^1; : : :; v^m g and dmQMR 2 spanfv~m+1 g. Since dmQMR = dmMR if and only if dmQMR ? Wm , the rst assertion is proven. The analogous assertion for the QOR approximation follows from dmQOR 2 spanfvm+1 g, hence h QOR ? Vm if and only if vm+1 ? Vm .

5 Krylov Subspace Methods We now return to our original problem of approximating the solution of the operator equation (1) by selecting corrections to an initial approximation x0 2 H from a nested sequence of m-dimensional subspaces Vm  H. As mentioned at the beginning of Section 2, the MR and OR methods accomplish this by 30

approximating the initial residual r0 on the sequence of spaces Wm = AVm . The requirement (21) of choosing the (residual) error space spanfr0g + Wm as the space Vm de ning the orthogonality condition in the OR method leads to the relation

Vm+1 = spanfr0g + AVm; m = 0; 1; : : :; which immediately determines Vm to be the Krylov space Km (A; r0) := spanfr0; Ar0; : : : ; Am?1r0g

of the coecient matrix A and the initial residual r0. Thus, we see that the minimal residual and orthogonal residual Krylov subspace methods are special cases of the abstract MR and OR methods described in Section 3 with Vm = Km(A; r0) and Wm = AKm (A; r0). In the remainder of this section, we recover some well known results on Krylov subspace methods by specializing the abstract results of Sections 3 and 4. Before we begin translating these abstract results to the Krylov subspace setting we note that the number L de ned in (40) is obviously equal to the index of the smallest A-invariant Krylov subspace, i.e., L = minfm : Km (A; r0) = Km+1 (A; r0)g = minfm : A?1 r0 2 Km (A; r0)g: We also note here that the MR and OR approximations of an arbitrary vector r 2 H from a given sequence of nested spaces fWm gLm=0 , dim Wm = m, can always be interpreted as Krylov subspace MR and OR approximations of the solution of a linear operator equation: Indeed, relation (39) uniquely determines a linear operator A on the spaces fVm gLm=0 , and A has VL as an invariant subspace. If A is then extended to an invertible operator on the entire space H, then the MR and OR approximation sequences for r coincide with the corresponding Krylov subspace approximations for solving the equation Ae = r with initial guess eo = 0.

5.1 Using an Orthonormal Basis

Following the minimal residual approach, we seek approximate solutions x_m^MR = x_0 + k_m^MR of (1) with k_m^MR ∈ K_m(A, r_0) such that

‖r_m^MR‖ = ‖b − A x_m^MR‖ = ‖r_0 − A k_m^MR‖ = min_{k ∈ K_m(A, r_0)} ‖r_0 − A k‖    (61)

(m = 1, 2, ..., L). Recall that A k_m^MR is the best approximation to r_0 from the space W_m = A K_m(A, r_0) and that r_m^MR = r_0 − A k_m^MR is the associated approximation error. Following the orthogonal residual approach, we seek approximate solutions x_m^OR = x_0 + k_m^OR with k_m^OR ∈ K_m(A, r_0) such that

r_m^OR = b − A x_m^OR = r_0 − A k_m^OR ⊥ K_m(A, r_0).    (62)

In the terminology of Section 3, A k_m^OR is the OR approximation to r_0 from W_m with error r_m^OR. As mentioned above, the test spaces defining the orthogonality conditions coincide with the Krylov spaces V_m = K_m(A, r_0). We first turn to the question of existence of the two approximate solutions (cf. Corollary 3.2) and express the residual norms in terms of the angles

φ_m := ∠(K_m(A, r_0), A K_m(A, r_0))    (63)

between the subspaces K_m(A, r_0) and A K_m(A, r_0) (cf. Theorem 3.4). (Note that φ_m = 0 if and only if m = L.)

Corollary 5.1. The MR approximation x_m^MR ∈ x_0 + K_m(A, r_0) defined by (61) exists for every m = 1, 2, ..., L. The associated residual vector satisfies

‖r_m^MR‖ = ‖r_{m−1}^MR‖ sin φ_m = ‖r_0‖ ∏_{j=1}^m sin φ_j.

In particular, this implies

‖r_0^MR‖ ≥ ‖r_1^MR‖ ≥ ... ≥ ‖r_{L−1}^MR‖ > ‖r_L^MR‖ = 0.

For m = 1, 2, ..., L, the OR approximation x_m^OR ∈ x_0 + K_m(A, r_0) defined by (62) exists if and only if φ_m ≠ π/2. In this case, the associated residual vector fulfills

‖r_m^OR‖ = ‖r_m^MR‖ / cos φ_m = ‖r_0‖ ∏_{j=1}^m sin φ_j / cos φ_m.

In particular, x_L^OR exists and there holds x_L^OR = x_L^MR = A^{−1} b.

Since sin φ_m = 1 and cos φ_m = 0 are equivalent statements, we see again that x_m^OR exists if and only if ‖r_{m−1}^MR‖ > ‖r_m^MR‖. We next turn to the question of how x_m^MR and x_m^OR and their associated residual vectors are related (cf. (29), (30) and Corollary 3.5).

Corollary 5.2. With the angle φ_m as defined by (63) there holds

x_m^MR = sin²(φ_m) x_{m−1}^MR + cos²(φ_m) x_m^OR,
r_m^MR = sin²(φ_m) r_{m−1}^MR + cos²(φ_m) r_m^OR

for m = 1, 2, ..., L, provided x_m^OR exists, i.e., that φ_m ≠ π/2. Equivalently, we have

‖r_m^MR‖ = √(1 − (‖r_m^MR‖ / ‖r_{m−1}^MR‖)²) ‖r_m^OR‖,
x_m^MR = x_m^OR + (‖r_m^MR‖ / ‖r_{m−1}^MR‖)² (x_{m−1}^MR − x_m^OR).

Of course, not only (29) and (30) but every relation presented in Section 3 can be translated into the Krylov subspace language. This applies in particular to the results of Proposition 3.6 on minimal and quasi-minimal residual smoothing. The identities contained in Corollaries 5.1 and 5.2 are not new and may be found, e.g., in the papers of Brown [3], Freund [10], Gutknecht [21], and Cullum and Greenbaum [7]. In these works, however, the sines and cosines result from the Givens rotations needed to construct a QR decomposition of the Hessenberg matrix analogous to (39) in the course of a specific algorithm for computing the MR and OR approximations. Section 4, specifically Theorem 4.5, has revealed the more fundamental significance of these rotation angles, namely as the angles between the subspaces K_m(A, r_0) and A K_m(A, r_0). The algorithms which have been proposed in the literature for calculating the MR and OR approximations for solving linear equations are based on one of the coordinate representations introduced in Section 4. The new feature in the context of equation-solving is that one is interested in the coordinates of the correction vector in V_m rather than those of the residual approximation in W_m. To see the connection, observe that the residual of the approximation x_m = x_0 + V_m y_m in terms of a basis {v_1, ..., v_m} of V_m is r_m = r_0 − A V_m y_m; hence the desired coordinate vector y_m with respect to V_m is the same as that of the residual approximation with respect to the basis {A v_1, ..., A v_m} of W_m. The GMRES and FOM algorithms proceed by successively constructing an orthonormal basis {v_1, ..., v_m} of V_m = K_m(A, r_0) beginning with V_1 = span{r_0}. We observe that this is exactly the Gram-Schmidt procedure introduced in (41) with w_m = A v_m, m = 1, ..., L, which in this setting is known as the Arnoldi process.
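A minimal sketch of the Arnoldi process in NumPy may help fix ideas (this is our own illustration, not a production implementation: modified Gram-Schmidt without reorthogonalization, and a simple breakdown test):

```python
import numpy as np

def arnoldi(A, r0, m):
    """Arnoldi process: V has orthonormal columns spanning K_{m+1}(A, r0)
    and H is the (m+1) x m upper Hessenberg matrix with A V[:, :m] = V @ H,
    cf. the fundamental relation (64)."""
    n = len(r0)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = r0 / np.linalg.norm(r0)
    for k in range(m):
        w = A @ V[:, k]
        for j in range(k + 1):
            H[j, k] = V[:, j] @ w       # entries (A v_k, v_j), cf. (65)
            w = w - H[j, k] * V[:, j]
        H[k + 1, k] = np.linalg.norm(w)
        if H[k + 1, k] < 1e-14:         # breakdown: K_k is A-invariant (m = L)
            return V[:, :k + 1], H[:k + 1, :k]
        V[:, k + 1] = w / H[k + 1, k]
    return V, H

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8))
r0 = rng.standard_normal(8)
V, H = arnoldi(A, r0, 5)
print(np.allclose(A @ V[:, :5], V @ H))   # relation (64) -> True
print(np.allclose(V.T @ V, np.eye(6)))    # orthonormal basis -> True
```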
In this case equation (39) reads

A V_m = V_{m+1} H̃_m = V_m H_m + [0, ..., 0, η_{m+1,m} v_{m+1}]    (64)

and the upper Hessenberg matrix H̃_m is given by

H̃_m = [(A v_k, v_j)] ∈ C^{(m+1)×m},    j = 1, ..., k+1,  k = 1, ..., m.    (65)

To compute the coordinate vector y_m^MR of the MR approximation x_m^MR = x_0 + V_m y_m^MR, we note that, in view of (64), the corresponding residual has the form r_m^MR = r_0 − V_{m+1} H̃_m y_m^MR = V_{m+1}(β u_1 − H̃_m y_m^MR) with β = ‖r_0‖, and hence the minimum residual condition again determines y_m^MR as the solution of the least-squares problem (43). Just as in Section 4.2, the GMRES algorithm uses a QR factorization of H̃_m constructed from Givens rotations to solve this least-squares problem, and thus, by Theorem 4.5, the rotation angles of these Givens rotations coincide with the angles (63) between the subspaces K_m(A, r_0) and A K_m(A, r_0). The FOM implementation of the OR approach, which is also based on the Arnoldi process (cf. (64)), computes the coordinate vector y_m^OR ∈ C^m of the OR approximation, x_m^OR = x_0 + V_m y_m^OR, just as in (44), as the solution of the linear system H_m y_m^OR = β u_1, i.e.,

x_m^OR = x_0 + V_m H_m^{−1} β u_1,    (66)

which requires that H_m be nonsingular. We know from Remark 4.2 that this condition is consistent with our previous result, namely that

φ_m = ∠(K_m(A, r_0), A K_m(A, r_0)) ≠ π/2.

We conclude this section by briefly mentioning two alternative algorithms for computing the MR and OR approximations which are based on an orthonormal basis {w_1, ..., w_m} of W_m = A K_m(A, r_0) as in Section 4.1. The first of these, which was proposed by Walker and Zhou [44], computes the MR approximation at step m as x_m^MR = x_0 + [r_0, w_1, ..., w_{m−1}] y_m^MR. As noted above, y_m^MR is also the coordinate vector of the approximation of r_0 from W_m with respect to [A r_0, A w_1, ..., A w_{m−1}]. Since these vectors lie in W_m for m = 1, ..., L, there exists an upper triangular matrix T_m ∈ C^{m×m} such that

[A r_0, A w_1, ..., A w_{m−1}] = [w_1, ..., w_m] T_m;

hence the MR residual is given by r_m^MR = r_0 − W_m T_m y_m^MR and, comparing with (37), y_m^MR is the solution of the triangular system

T_m y_m^MR = [(w_1, r_0), ..., (w_m, r_0)]^T.

Similarly, in view of (38), the coordinate vector y_m^OR of the OR approximation with respect to {r_0, w_1, ..., w_{m−1}} is the solution of the triangular system

T_m y_m^OR = [(w_1, r_0), ..., (w_{m−1}, r_0), ‖r_{m−1}^MR‖² / (w_m, r_0)]^T.

The triangular matrix T_m can be computed using the Arnoldi process applied to A with initial vector A r_0. Finally, a similar algorithm proposed by Rozložník and Strakoš [35] employs a basis {v_1, ..., v_m} of V_m with the property that {A v_1, ..., A v_m} is an orthonormal basis of W_m; in other words, {v_1, ..., v_m} is an A*A-orthonormal basis of V_m. The advantage of this approach is that the MR and OR coordinate vectors can be obtained without solving either a least-squares problem or a linear system. Indeed, since A V_m is orthonormal, by (37) and (38) we have

y_m^MR = [(r_0, A v_1), ..., (r_0, A v_m)]^T,
y_m^OR = [(r_0, A v_1), ..., (r_0, A v_{m−1}), ‖r_{m−1}^MR‖² / (w_m, r_0)]^T.
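A sketch of this idea in NumPy (our own naive illustration under the stated setting, not the algorithm of [35] itself): generate an A*A-orthonormal basis of the Krylov space and read off the MR coordinates as plain inner products.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 10, 4
A = np.eye(n) + 0.2 * rng.standard_normal((n, n))
r0 = rng.standard_normal(n)

# Build v_1, ..., v_m spanning K_m(A, r0) so that {A v_j} is orthonormal
# (the v_j are A*A-orthonormal): orthogonalize A v against the previously
# generated w_i = A v_i and apply the identical update to v itself.
V = np.zeros((n, m)); W = np.zeros((n, m))
cand = r0.copy()
for j in range(m):
    v = cand.copy(); w = A @ v
    for i in range(j):
        h = W[:, i] @ w
        w -= h * W[:, i]; v -= h * V[:, i]
    nw = np.linalg.norm(w)
    V[:, j] = v / nw; W[:, j] = w / nw     # W[:, j] = A V[:, j] by construction
    cand = A @ V[:, j]                     # next Krylov direction

y = W.T @ r0          # MR coordinates are plain inner products (r0, A v_j)
x_mr = V @ y          # x0 = 0
r_mr = r0 - A @ x_mr

# sanity check against the best approximation of r0 from A K_m(A, r0)
P = np.column_stack([np.linalg.matrix_power(A, k + 1) @ r0 for k in range(m)])
Q, _ = np.linalg.qr(P)
r_best = r0 - Q @ (Q.T @ r0)
print(np.isclose(np.linalg.norm(r_mr), np.linalg.norm(r_best)))  # -> True
```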

We refer to [35] for implementation details of these two algorithms as well as a discussion of their stability properties relative to the standard GMRES algorithm.

The GMRES and FOM algorithms and the variants mentioned above implement the MR and OR approach for solving (1) in the case of a general injective A. The method of choice for constructing an orthonormal basis of V_m = K_m(A, r_0) is the Arnoldi process, of which there exist several modifications with varying degrees of stability in the presence of roundoff errors (cf., e.g., the discussions in Rozložník et al. [36] and Greenbaum et al. [16]). Whenever A is selfadjoint, an orthogonal basis may be generated using the Hermitian Lanczos process, with the result that the Hessenberg matrix in the fundamental relation (64) is tridiagonal, with all the computational benefits mentioned at the beginning of Section 4.3. In particular, the short recurrences for the basis vectors allow the approximations and residuals to be obtained inexpensively by clever update formulas. Well-known algorithms which follow this approach to implement the MR and OR approximations in the selfadjoint and positive definite case are the CR/CG algorithms of Hestenes and Stiefel [25], and an MR/OR pair of algorithms for the selfadjoint but possibly indefinite case is MINRES/CG due to Paige and Saunders [32]. Since these are all implementations of MR and OR approximations for special cases, all MR and OR relations, in particular those listed in Corollaries 5.1 and 5.2, also hold for these two Krylov subspace pairs.
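The MR/OR relations of Corollaries 5.1 and 5.2 are easy to check numerically. The following NumPy sketch is a naive illustration, not an actual GMRES/FOM implementation: it reruns Arnoldi from scratch at every step and solves the least-squares problem (43) and the linear system (66) with dense linear algebra instead of Givens-rotation updates. It verifies ‖r_m^OR‖ = ‖r_m^MR‖ / cos φ_m with sin φ_m = ‖r_m^MR‖ / ‖r_{m−1}^MR‖.

```python
import numpy as np

def arnoldi(A, r0, m):
    # compact Arnoldi (modified Gram-Schmidt): A V[:, :m] = V @ H,
    # V of size n x (m+1) orthonormal, H of size (m+1) x m Hessenberg
    n = len(r0)
    V = np.zeros((n, m + 1)); H = np.zeros((m + 1, m))
    V[:, 0] = r0 / np.linalg.norm(r0)
    for k in range(m):
        w = A @ V[:, k]
        for j in range(k + 1):
            H[j, k] = V[:, j] @ w
            w = w - H[j, k] * V[:, j]
        H[k + 1, k] = np.linalg.norm(w)
        V[:, k + 1] = w / H[k + 1, k]
    return V, H

rng = np.random.default_rng(1)
n = 12
A = np.eye(n) + 0.3 * rng.standard_normal((n, n))
b = rng.standard_normal(n)
x0 = np.zeros(n)
r0 = b - A @ x0
beta = np.linalg.norm(r0)

mr = [beta]   # ||r_0^MR||, ||r_1^MR||, ...
orr = []      # ||r_1^OR||, ||r_2^OR||, ...
for m in range(1, 7):
    V, H = arnoldi(A, r0, m)
    e1 = np.zeros(m + 1); e1[0] = beta
    y_mr, *_ = np.linalg.lstsq(H, e1, rcond=None)   # MR: least squares (43)
    mr.append(np.linalg.norm(b - A @ (x0 + V[:, :m] @ y_mr)))
    y_or = np.linalg.solve(H[:m, :m], e1[:m])       # OR: linear system (66)
    orr.append(np.linalg.norm(b - A @ (x0 + V[:, :m] @ y_or)))

for m in range(1, 7):
    s = mr[m] / mr[m - 1]                           # sin(phi_m)
    print(np.isclose(orr[m - 1], mr[m] / np.sqrt(1.0 - s**2)))  # -> True
```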

5.2 Using a Non-Orthogonal Basis

The previous section identified Krylov subspace algorithms for the iterative solution of (1) which use an orthogonal basis of V_m = K_m(A, r_0) as algorithms which compute the MR and OR approximations of the initial residual r_0. In the same manner, we show in this section how Krylov subspace methods such as QMR/BCG and TFQMR/CGS, which use a non-orthogonal basis of the Krylov space, can be treated in the same way as the MR/OR approximations discussed in Section 4.3. To this end, let v_1, v_2, ..., v_L ∈ H be any set of linearly independent vectors such that {v_1, v_2, ..., v_m} forms a basis of V_m = K_m(A, r_0) for m = 1, ..., L (in particular, r_0 = β v_1 for some 0 ≠ β ∈ C). This basis leads naturally to a (quasi-) minimal residual as well as to a (quasi-) orthogonal residual method. Since, for every 1 ≤ m ≤ L, A v_m ∈ span{v_1, v_2, ..., v_{m+1}}, where we set v_{L+1} = 0, there exists an upper Hessenberg matrix H̃_m ∈ C^{(m+1)×m} such that, just as in (39) and in (64),

A V_m = V_m H_m + [0, ..., 0, η_{m+1,m} v_{m+1}] = V_{m+1} H̃_m.    (67)

Just as in (65), the m-th column of H̃_m contains the coefficients of A v_m ∈ K_{m+1}(A, r_0) with respect to the basis vectors v_1, ..., v_{m+1}. We are thus in the situation of Section 4.3, with V_m = K_m(A, r_0) and W_m = A V_m. Defining the auxiliary inner products (·,·)_V on V_L and (·,·)_v on the coordinate space C^L as in (56) and (55), respectively, we obtain the QMR approximation

x_m^QMR := x_0 + V_m y_m^QMR

of the solution A^{−1} b by requiring that A V_m y_m^QMR be the MR approximation to r_0 with respect to the inner product (·,·)_V. Just as in Section 4.3, the coefficient vector y_m^QMR ∈ C^m is characterized as the unique solution of the least-squares problem (43). Merely translating Theorems 4.8 and 4.9 to Krylov subspace notation, we obtain

Theorem 5.3. The QMR iterates are the MR iterates with respect to the inner product (·,·)_V:

‖b − A x_m^QMR‖_V = min_{x ∈ x_0 + K_m(A, r_0)} ‖b − A x‖_V.    (68)

In terms of the original norm in H, the MR and QMR residuals may be bounded by

‖r_m^MR‖ ≤ ‖r_m^QMR‖ ≤ √(κ_2(M_m)) ‖r_m^MR‖,    (69)

in which κ_2(M_m) denotes the (Euclidean) condition number of the (Hermitian positive definite) matrix M_m.

Theorem 5.4. The residual vectors of the QMR iterates are characterized by

r_m^QMR ⊥ U_m,

where U_m = {v = V_{m+1} y ∈ K_{m+1}(A, r_0) : y = M_m^{−1} H̃_m z with z ∈ C^m} is an m-dimensional subspace of K_{m+1}(A, r_0) (for m = L, there holds U_L = K_L(A, r_0)) and orthogonality is understood with respect to the original inner product (·,·) on H. Consequently,

r_m^QMR = (I − P_{W_m}^{U_m}) r_0,

where P_{W_m}^{U_m} denotes the oblique projection onto W_m = A K_m(A, r_0) orthogonal to U_m.

Again drawing from the abstract results in Section 4.3, the corresponding QOR approximations in the Krylov subspace case are defined by

x_m^QOR = x_0 + V_m y_m^QOR,

where the coordinate vector y_m^QOR ∈ C^m is determined by

0 = (β u_1 − H̃_m y_m^QOR, u_j)_2    (j = 1, 2, ..., m),    i.e.,    H_m y_m^QOR = β u_1,    (70)

under the usual assumption that H_m is nonsingular. In terms of the inner product (·,·)_V on K_L(A, r_0) the QOR approximations are characterized by

r_m^QOR = b − A x_m^QOR ⊥_V K_m(A, r_0),

i.e., the QOR iterates are the OR iterates with respect to the inner product (·,·)_V. Similarly to Theorem 4.9 there holds

Theorem 5.5. The residual vectors of the QOR iterates satisfy

r_m^QOR ⊥ T_m,

where T_m = {v = V_{m+1} y ∈ K_{m+1}(A, r_0) : y = M_m^{−1} [z^T 0]^T with z ∈ C^m} is an m-dimensional subspace of K_{m+1}(A, r_0) (for m = L, there holds T_L = K_L(A, r_0)) and orthogonality is understood with respect to the original inner product (·,·) on H. Consequently,

r_m^QOR = (I − P_{W_m}^{T_m}) r_0,

where P_{W_m}^{T_m} denotes the oblique projection onto W_m = A K_m(A, r_0) orthogonal to T_m.

The QMR and QOR iterates being identified as MR and OR iterates with respect to the inner product (·,·)_V implies that the assertions of Corollaries 5.1 and 5.2 are valid for any pair of QMR/QOR methods. The crucial angles φ_m of (63), however, are now defined with respect to (·,·)_V. The motivation for using non-orthogonal bases comes from the potential savings in storage and computation when using algorithms for generating a basis of K_m(A, r_0) which lead to a Hessenberg matrix in relation (67) with a small number of bands. A tridiagonal Hessenberg matrix can be achieved using the non-Hermitian Lanczos process. The non-Hermitian variant of the Lanczos process is almost as economical as its Hermitian counterpart, requiring in addition the generation of a basis of a Krylov space with respect to the adjoint A*, and thus storage for several additional vectors and, what can be expensive, multiplication by A* at each step. The result is a pair of biorthogonal, in general not orthogonal, bases. The non-Hermitian Lanczos process may break down before the Krylov space becomes stationary, and in finite arithmetic this leads to numerical instabilities in the case of near-breakdowns. This problem is addressed by so-called look-ahead techniques for the non-Hermitian Lanczos process (cf. Freund et al. [13], Gutknecht [23] and the references therein), which result in a pair of block-biorthogonal bases and a Hessenberg matrix which is as close to tridiagonal form as possible while maintaining stability.

Remark 5.6. We have mentioned that Krylov subspace QMR/QOR methods have the advantage of being able to work with an arbitrary basis of the Krylov space, in particular also with a non-orthogonal basis.
As the bounds (69) show, how close the QMR iterates come to also minimizing the residual in the original norm depends on the Euclidean condition number of the Gram matrices M_m, i.e., on the largest and smallest singular values of V_m = [v_1, ..., v_m] (interpreted as an operator from C^m to K_m(A, r_0)). The largest singular value of V_m is bounded by √m if the basis vectors are normalized with respect to the original norm. When the look-ahead Lanczos algorithm is used to generate the basis, bounds on the smallest singular value may be obtained depending on the look-ahead strategy being followed.

Example: QMR/BCG and TFQMR/CGS

We now consider two particular examples of Krylov subspace QMR/QOR algorithms: the QMR method of Freund and Nachtigal [14] and the TFQMR and CGS methods due to Freund [11, 12] and Sonneveld [39], respectively. As another QMR/QOR pair, we mention the QMRCGSTAB method of Chan et al. [5] and BICGSTAB developed by Van der Vorst [43] and Gutknecht [22].

The QMR algorithm of Freund and Nachtigal proceeds exactly as described above, with the basis of the Krylov space generated by the look-ahead Lanczos algorithm. The QOR counterpart of Freund and Nachtigal's QMR is the BCG algorithm, the iterates of which are characterized by

x_m^BCG ∈ x_0 + K_m(A, r_0),    r_m^BCG ⊥ K_m(A*, r̃_0).

The algorithm proceeds by generating a basis of the Krylov space K_m(A*, r̃_0), where r̃_0 is an arbitrary starting vector, simultaneously with a basis V_m of K_m(A, r_0) in such a way that these two bases are biorthogonal. If y ∈ C^m denotes the coefficient vector of the BCG approximation with respect to V_m, then the biorthogonality requirement implies

r_m^BCG = V_{m+1}(β u_1 − H̃_m y) ⊥ K_m(A*, r̃_0)  ⇔  r_m^BCG ⊥_V V_m  ⇔  H_m y = β u_1.

The last equality identifies x_m^BCG as the m-th QOR iterate.

To treat the TFQMR/CGS pair, recall that the residual of any Krylov subspace approximation can be expressed as r_m = ψ_m(A) r_0 in terms of a polynomial ψ_m of degree m satisfying ψ_m(0) = 1. Sonneveld [39] defined the CGS iterate x_m^CGS ∈ x_0 + K_{2m}(A, r_0) such that

r_m^CGS = [ψ_m(A)]² r_0,    where r_m^BCG = ψ_m(A) r_0.

It is shown in [11] that x_m^CGS = x_0 + Y_{2m} z_{2m} for a coefficient vector z_{2m} ∈ C^{2m}, where K_{2m}(A, r_0) = span{y_1, ..., y_{2m}}. Moreover, there exists a sequence {w_n} such that r_0 = w_1 and

A Y_n = W_{n+1} H̃_n,    n = 1, ..., 2L.

In terms of this sequence, there holds

r_m^CGS = W_{2m+1}(u_1 − H̃_{2m} z_{2m}),    where H_{2m} z_{2m} = u_1,

i.e.,

r_m^CGS ⊥_W span{w_1, ..., w_{2m}} = A K_{2m}(A, r_0),

which identifies CGS as an OR method. The corresponding MR method is Freund's transpose-free QMR method TFQMR (cf. [11, 12]), the iterates of which are defined as

x_n^TFQMR = x_0 + Y_n z_n,    n = 1, ..., 2L,

where the coefficient vector z_n ∈ C^n solves the least-squares problem

‖u_1 − H̃_n z_n‖_2 → min over z_n ∈ C^n.

In other words,

‖r_n^TFQMR‖_W = min_{x ∈ x_0 + K_n(A, r_0)} ‖b − A x‖_W.

Comparison of Residuals

The availability of a sequence of vectors {ṽ_j}_{j≥1} which is biorthogonal to the Krylov basis {v_j} permits a convenient representation of the QOR residual via (47). If

(v_j, ṽ_k) = δ_{jk} d_j,    j, k = 1, ..., L,

and if both sequences are normalized to one, then, since the MR residual at step m − 1 lies in W_{m−1} ⊆ V_m, we have r_{m−1}^MR = Σ_{j=1}^m (r_{m−1}^MR, ṽ_j/d_j) v_j. Inserting r − w = r_{m−1}^MR in (47) then yields

r_m^QOR = (r_{m−1}^MR, ŵ_{m+1})_V v_{m+1} = v_{m+1} Σ_{j=1}^m (g_j/d_j)(r_{m−1}^MR, ṽ_j),

which, after taking norms and applying the Cauchy-Schwarz inequality, becomes

‖r_m^QOR‖ ≤ ‖r_{m−1}^MR‖ Σ_{j=1}^m |g_j| / |d_j|.

This bound is a slightly improved version of a bound by Hochbruck and Lubich [26].

5.3 Every Method is an MR or OR Method

After successfully bringing a number of Krylov subspace algorithms into the MR/OR framework, one might ask whether this is possible for all Krylov subspace methods. A more precise formulation of this question is the following: given a linear operator equation Ax = b and an initial guess x_0, this determines the sequence of Krylov spaces V_1 = span{r_0}, ..., V_L = K_L(A, r_0) up to the (possibly infinite) index L at which the Krylov spaces become stationary. We consider Krylov subspace methods which possess the so-called finite termination property, by which we mean that they obtain the exact solution A^{−1} b in step L. Strictly speaking, this excludes for instance the Chebyshev semi-iterative methods (cf. Manteuffel [30]), but if we consider only at most L − 1 steps of any Krylov subspace method (which is never a restriction in practice, since we cannot afford L Krylov steps for large systems), then the following results apply. Any Krylov subspace method is uniquely determined by either its sequence of iterates

x_m ∈ x_0 + V_m,    m = 1, ..., L − 1,

or its sequence of residuals

r_m ∈ r_0 + A V_m ⊆ V_{m+1},    m = 1, ..., L − 1,    (71)

where we recall that x_L = A^{−1} b and r_L = 0. The first question is under which conditions there exists an inner product (·,·)_V on V_L such that

‖r_m‖_V = min_{x ∈ x_0 + V_m} ‖b − A x‖_V,    m = 1, 2, ..., L − 1,

or, equivalently, such that the given Krylov subspace method turns out to be a QMR method, i.e., an MR method with respect to ‖·‖_V. The second question is under which conditions there exists an inner product (·,·)_Ṽ on V_L such that

r_m ⊥_Ṽ V_m,    m = 1, 2, ..., L − 1,

or, equivalently, such that the given Krylov subspace method turns out to be a QOR method, i.e., an OR method with respect to (·,·)_Ṽ.

Theorem 5.7. Assume that fhmgm=1;::: ;L is a sequence of approximants to 2 H such that hm 2 Wm and hL = r . There exists an inner product (; )V

r

on VL = WL such that

kdm kV = hmin kr ? h kV ; m = 1; 2; : : : ; L ? 1; 2Wm if and only if hm 2 Wm?1 implies hm = hm?1 for m = 1; 2; : : : ; L ? 1 (where W0 = f0g, i.e., h0 := 0), in other words, if and only if either hm 2 Wm n Wm?1 or hm = hm?1 (72) is ful lled for any m = 1; 2; : : : ; L ? 1. Proof. Assume that, for all m = 1; 2; : : : ; L ? 1, the vectors hm are the best approximations to r from Wm with respect to some inner product (; )V . If now, for some m, hm 2 Wm?1 , then hm must also be the best approximation to r from Wm?1 , and consequently, hm = hm?1 , which proves the necessity of (72). Next assume that (72) is ful lled. We write r = (h1 ? h0) + (h2 ? h1) +    + (hL ? hL?1)

and construct a basis fw1; : : : wL g of WL in the following way. We set wm :=



if hm 2 Wm n Wm?1 ; an arbitrary vector from Wm n Wm?1 ; if hm = hm?1 :

hm ? hm?1 ;

40

Note that, for each m, fw1; : : : wm g is a basis of Wm . We further de ne the \Fourier coecients" m (m = 1; : : : ; L) by m :=



1; if hm 2 Wm n Wm?1 ; 0; if hm = hm?1:

Note that this implies r = 1 w1 + 2 w2 +    L wL and hm = 1w1 + 2w2 +    m wm (m = 1; : : : ; L); i.e., hm is nothing but the truncated \Fourier expansion" of r . De ning the inner product (; )V such that fw1; : : : wL g is an orthonormal basis of WL leads to the desired conclusion. Remark 5.8. Condition (72) translated to the Krylov subspace context, reads xm ? x0 2 Km (A; r0) n Km?1 (A; r0), a condition which is satis ed by any \reasonable" Krylov subspace method. However, as e.g. the well-known example of the circulant shift matrix due to Brown [3] shows, one can easily concoct pathological examples where r = w1 + 0w2 +    + 0wL?1 + wL ;

i.e., rL?1 =    = r0 so that the MR method makes absolutely no progress until the very last step. For the OR or, more precisely, the QOR approach, we have the following theorem:

Theorem 5.9. Assume that fhmgm=1;::: ;L is a sequence of approximants to 2 H such that hm 2 Wm and hL = r . There is an inner product (; )V~ on

r

VL = WL such that

dm ?V~ Vm ;

m = 1; 2; : : : ; L ? 1;

if and only if hm 2 Wm n Wm?1 for m = 1; 2; : : : ; L ? 1. Proof. Assume that, for all m = 1; 2; : : : ; L ? 1, the vectors hm are the OR

approximations to r from Wm with respect to some inner product (; )V~ . If now, for some m, hm 2 Wm?1 , then dm 2 spanfr g + Wm?1 = Vm , i.e., dm 2 Vm \ V?m = f0g, which implies hm = r . But this is impossible unless r 2 Wm , i.e., m = L. We saw that hm 2 Wm n Wm?1 for m = 1; : : : ; L ? 1. Since hm 2 Wm n Wm?1 implies dm 2 Vm+1 n Vm for m = 1; : : : ; L ? 1, we see that fd0; d1; : : : dL?1g (d0 = r ) is a basis of VL such that, for every m = 1; : : : ; L ? 1, fd0; d1; : : : dm?1 g is a basis of Vm . De ning the inner product (; )V~ such that fd0; : : : dL?1g is an orthogonal basis of VL leads to the desired conclusion.

41

6 Residual and Error Bounds

In this section we will obtain bounds for the residual and error norms of Krylov subspace methods by specializing Theorem 3.4. The goal is to relate properties of the operator A to the decay of the numbers s_m.

6.1 Parameters Determining the Rate of Convergence

We begin by recalling that, as a result of Theorem 3.4, the residual norm history of the MR and OR methods is completely determined by the sequence of angles between the spaces V_m and W_m, which in the present setting of Krylov subspaces specialize to K_m(A, r_0) and A K_m(A, r_0), respectively. In other words, convergence depends only on the sequence of these spaces and their relative position. Furthermore, Theorem 4.5 and the ensuing discussion revealed that the sines and cosines of these angles appear as the parameters of the Givens rotations in the recursive construction of the QR factorizations (48) of the Hessenberg matrices H̃_m, m = 1, ..., L. As a consequence, we note that all the information regarding the progress of the solution algorithm is contained in the Q-factors. In other words, the residual norm history of the MR/OR approximations associated with the Arnoldi relations

A V_m = V_{m+1} H̃_m = V_{m+1} Q_m [R_m; 0],    m = 1, ..., L,

is independent of the matrices R_m. In particular, quantities such as the singular values of H̃_m, which coincide with those of R_m, play no role for the rate of convergence. Furthermore, this relation reveals that there is a particularly simple matrix Ã which has the same MR/OR residual norm history for the same initial residual r_0, namely the unitary matrix obtained by setting R_L equal to the identity. If V_L is a proper subspace of H, Ã may be made unique by extending it to be the identity on the complementary space. This observation, that for each linear system of equations there exists a linear system of equations with a unitary matrix for which MR/OR display the identical convergence behavior, was first pointed out by Greenbaum and Strakoš [17]. The next lemma, which we will make use of in Section 6.4, is due to Moret [31] and gives a bound on s_m = |(v_{m+1}, w_m)| for the case when V_m is a Krylov space:

Lemma 6.1. Let {v_m} be the Arnoldi basis defined above with respect to r_0 ∈ H and A : H → H, where we assume A is bounded and possesses a bounded inverse. In addition, let {w_j}_{j≥1} denote an orthonormal system such that {w_1, ..., w_m} is an orthonormal basis of W_m. Then there holds

|(v_{m+1}, w_m)| ≤ ‖A^{−1}‖ (v_{m+1}, A v_m).

Proof. Since A^{−1} w_m ∈ V_m, we can write

(v_{m+1}, w_m) = (v_{m+1}, A P_{V_m} A^{−1} w_m).

Moreover, since A v_j ∈ V_m ⊥ v_{m+1} for 1 ≤ j ≤ m − 1, we have

|(v_{m+1}, A P_{V_m} A^{−1} w_m)| = |(v_{m+1}, Σ_{j=1}^m (A^{−1} w_m, v_j) A v_j)|
= |(v_{m+1}, (A^{−1} w_m, v_m) A v_m)|
= |(A^{−1} w_m, v_m)| |(v_{m+1}, A v_m)|
≤ ‖A^{−1}‖ (A v_m, v_{m+1}).

Note that the modulus in |(A v_m, v_{m+1})| can be omitted since (A v_m, v_{m+1}) = ‖(I − P_{V_m}) A v_m‖ ≥ 0.

6.2 Residual Bounds

By (7), the residual norm of the MR approximation of the solution of (1) with respect to the initial guess x_0 ∈ H and the space V ⊆ H is given by ‖r‖ = ‖r_0‖ sin ∠(r_0, A V), i.e., the factor by which the residual is reduced is given by the sine of the smallest angle formed by r_0 and the image of a vector v ∈ V under A. When we add the assumption of Section 3 that r_0 ∈ V, we can bound this factor as follows:

‖r‖²/‖r_0‖² = 1 − cos² ∠(r_0, A V) = 1 − sup_{v ∈ V} |(r_0, A v)|² / (‖r_0‖² ‖A v‖²)
≤ 1 − |(r_0, A r_0)|² / (‖r_0‖² ‖A r_0‖²)
= 1 − [(A r_0, r_0)/(r_0, r_0)] [(r_0, A r_0)/(A r_0, A r_0)]
≤ 1 − inf_{v ∈ V} |(A v, v)|/(v, v) · inf_{w ∈ A V} |(A^{−1} w, w)|/(w, w)
= 1 − inf{|z| : z ∈ W(A|_V)} · inf{|z| : z ∈ W(A^{−1}|_{A V})}
≤ 1 − inf{|z| : z ∈ W(A)} · inf{|z| : z ∈ W(A^{−1})} =: 1 − γ(A) γ(A^{−1}),

where

W(A) = {(A v, v)/(v, v) : 0 ≠ v ∈ H}

denotes the field of values of A and we have assumed that A is invertible. We have shown

Theorem 6.2. The residual r of the MR approximation to the solution of (1) on the subspace V ⊆ H with initial residual r_0 ∈ V satisfies

‖r‖/‖r_0‖ ≤ √(1 − γ(A) γ(A^{−1}))    (73)

with γ(A) defined above.


Of course, the bound (73) only yields a reduction provided 0 ∉ W(A), which also implies 0 ∉ W(A^{−1}); cf. Horn and Johnson [27, p. 66]. If V_m is the sequence of Krylov spaces, we obtain

Corollary 6.3. The MR residual with index m satisfies

‖r_m‖/‖r_0‖ ≤ ∏_{j=1}^m √(1 − γ_j(A) γ̃_j(A^{−1})),    (74)

where the quantities γ_j(A) and γ̃_j(A^{−1}) are defined as

γ_j(A) := inf{|z| : z ∈ W(A|_{V_j ∩ W_{j−1}^⊥})}    and    γ̃_j(A^{−1}) := inf{|z| : z ∈ W(A^{−1}|_{A(V_j ∩ W_{j−1}^⊥)})}.

Proof. From (26), the fact that s_j = sin ∠(r_{j−1}^MR, A V_j) and r_{j−1}^MR ∈ V_j ∩ W_{j−1}^⊥, we conclude

‖r_m‖/‖r_0‖ = ∏_{j=1}^m s_j = ∏_{j=1}^m sin ∠(r_{j−1}^MR, A V_j)
= ∏_{j=1}^m (1 − sup_{v ∈ V_j} |(r_{j−1}^MR, A v)|² / (‖r_{j−1}^MR‖² ‖A v‖²))^{1/2}
≤ ∏_{j=1}^m (1 − |(r_{j−1}^MR, A r_{j−1}^MR)|² / (‖r_{j−1}^MR‖² ‖A r_{j−1}^MR‖²))^{1/2}
= ∏_{j=1}^m (1 − [(A r_{j−1}^MR, r_{j−1}^MR)/(r_{j−1}^MR, r_{j−1}^MR)] [(r_{j−1}^MR, A r_{j−1}^MR)/(A r_{j−1}^MR, A r_{j−1}^MR)])^{1/2}
≤ ∏_{j=1}^m (1 − γ_j(A) γ̃_j(A^{−1}))^{1/2}.

Noting that γ_j(A) ≥ γ(A) and γ̃_j(A^{−1}) ≥ γ(A^{−1}), we obtain the simpler bound given in

Corollary 6.4. The MR residual with index m satisfies

‖r_m‖/‖r_0‖ ≤ (1 − γ(A) γ(A^{−1}))^{m/2}.    (75)

If A is a positive real operator, i.e., if its Hermitian part H := (A + A*)/2 is positive definite, then γ(A) = λ_min(H) and

γ(A^{−1}) = inf_{v ∈ H} |(A^{−1} v, v)|/(v, v) = inf_{w ∈ H} |(w, A w)|/(A w, A w) ≥ λ_min(H)/‖A‖²,

in view of which Corollary 6.4 yields a bound first given by Elman [9]:

‖r_m‖/‖r_0‖ ≤ (1 − λ_min(H)²/λ_max(A* A))^{m/2}.
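Elman's bound is easy to check numerically. The following NumPy sketch (a naive MR computation via the power basis on a randomly generated positive real matrix, for illustration only) verifies that the MR residuals respect the reduction factor (1 − λ_min(H)²/λ_max(A*A))^{m/2}:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20
M = rng.standard_normal((n, n))
# positive real A: identity plus small skew-symmetric and symmetric parts,
# so the Hermitian part H stays positive definite for this seed
A = np.eye(n) + 0.1 * (M - M.T) + 0.05 * (M + M.T)
H = 0.5 * (A + A.T)
lam_min = np.linalg.eigvalsh(H).min()
lam_max = np.linalg.eigvalsh(A.T @ A).max()
rho = np.sqrt(1.0 - lam_min**2 / lam_max)        # per-step reduction factor

b = rng.standard_normal(n)
r0 = b                                           # x0 = 0
beta = np.linalg.norm(r0)
ratios = []
for m in range(1, 8):
    K = np.column_stack([np.linalg.matrix_power(A, k) @ r0 for k in range(m)])
    Q, _ = np.linalg.qr(A @ K)                   # orthonormal basis of A K_m
    rm = np.linalg.norm(r0 - Q @ (Q.T @ r0))     # MR (GMRES) residual norm
    ratios.append(rm / beta)
    print(m, ratios[-1] <= rho**m)               # the bound holds at each step
```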

Remark 6.5. If, in the derivation of the residual bound (73), one makes the slightly cruder estimate

‖r‖²/‖r_0‖² ≤ 1 − |(r_0, A r_0)|² / (‖r_0‖² ‖A r_0‖²) ≤ 1 − inf_{0 ≠ r_0 ∈ H} |(r_0, A r_0)|² / (‖r_0‖² ‖A r_0‖²) =: sin²(ψ(A)),

then ψ(A) is the largest angle between a nonzero vector v ∈ H and its image A v. One thus obtains a bound sin^m ψ(A) on the residual reduction after m steps. The angle ψ(A) was introduced by Wielandt [48] in 1967; cf. also Gustafson [20].

6.3 Error Bounds

When solving a linear system (1) approximately using successive iterates x_m, the residual r_m = b − A x_m may be the only computable indication of the progress of the solution process. The quantity of primary interest, however, is the error e_m = x − x_m = A^{−1} r_m. For Krylov subspace methods, we have x_m ∈ x_0 + V_m, so that e_m = e_0 − v for some v ∈ V_m. Of course, the best one could do is to select this v ∈ V_m as the best approximation to e_0 from V_m = K_m(A, r_0) = A K_m(A, e_0). This would correspond to computing the MR approximation of e_0 with respect to the sequence of spaces W_m = A K_m(A, e_0), a process which would require knowledge of the initial error and hence of the solution x. The relation between residuals and errors, however, allows us to bound the error of the MR approximation with respect to this best possible approximation:

Lemma 6.6. The error e_m^MR of the MR approximation satisfies

‖e_m^MR‖ ≤ κ(A) inf_{v ∈ V_m} ‖e_0 − v‖,    (76)

where κ(A) = ‖A‖ ‖A^{−1}‖ denotes the condition number of A.

Proof. With V_m = K_m(A, r_0) and W_m = A V_m, there holds

‖r_m^MR‖ = min_{w ∈ W_m} ‖r_0 − w‖ = min_{v ∈ V_m} ‖A(e_0 − v)‖ ≤ ‖A‖ min_{v ∈ V_m} ‖e_0 − v‖,

and thus the assertion follows from ‖e_m^MR‖ = ‖A^{−1} r_m^MR‖ ≤ ‖A^{−1}‖ ‖r_m^MR‖.

Thus, the error of the MR approximation is within the condition number of A of the error of the best approximation to e_0 from the Krylov space. In view of the relation (27), this translates to the following bound for the OR error:

‖e_m^OR‖ ≤ (κ(A)/c_m) inf_{v ∈ V_m} ‖e_0 − v‖.    (77)

However, a stronger bound can be obtained if the field of values of A is bounded away from the origin:

Theorem 6.7. If inf{|z| : z ∈ W(A)} ≥ γ for some γ > 0, then the OR error satisfies

‖e_m^OR‖ ≤ (‖A‖/γ) inf_{v ∈ V_m} ‖e_0 − v‖.

Proof. From the characterization of the OR approximation we have e_m^OR = e_0 − v for some v ∈ K_m(A, r_0) subject to

r_m^OR = A e_m^OR ⊥ K_m(A, r_0)  ⇔  e_m^OR ⊥ A* K_m(A, r_0).

This means that the OR error is obtained as the error of an OR approximation of e_0 from the space K_m(A, r_0) orthogonal to A* K_m(A, r_0). Thus, with V_m = K_m(A, r_0), the results on the norm of an oblique projection give

‖e_m^OR‖ ≤ ‖I − P_{V_m}^{A* V_m}‖ inf_{v ∈ V_m} ‖e_0 − v‖ = (1/cos ∠(V_m, A* V_m)) inf_{v ∈ V_m} ‖e_0 − v‖.

We bound the cosine of the largest canonical angle between V_m and A* V_m by

cos² ∠(V_m, A* V_m) = inf_{u ∈ V_m} sup_{v ∈ V_m} |(u, A* v)|² / (‖u‖² ‖A* v‖²)
≥ inf_{u ∈ V_m} |(u, A* u)|² / (‖u‖² ‖A* u‖²)
= inf_{u ∈ V_m} [|(A u, u)|/(u, u)] · [|(A u, u)|/(A* u, A* u)]
≥ γ²/‖A‖²,

since (A* u, A* u) ≤ ‖A‖² (u, u). Combining the two estimates yields the assertion.

An important class of operators consists of those with a positive definite Hermitian part $H := (A + A^*)/2$. These operators are sometimes referred to as positive real. In this case the inner product $(H\,\cdot\,, \cdot\,)$ induced by $H$ defines a norm on $\mathcal{H}$. The next theorem, which is due to Starke [40], shows that the OR error measured in this norm is optimal up to a factor which depends on the skew-Hermitian part $S := (A - A^*)/2$.

Theorem 6.8. If $A$ is positive real with Hermitian and skew-Hermitian parts $H$ and $S$, then the OR error satisfies
$$\|e_m^{OR}\|_H \;\le\; \bigl(1 + \rho(H^{-1}S)\bigr) \, \inf_{v \in V_m} \|e_0 - v\|_H,$$
where $\rho(H^{-1}S)$ denotes the spectral radius of $H^{-1}S$.

Proof. Since $r_m^{OR} = A e_m^{OR} \perp V_m$, we have, noting $(Hv, v) = \operatorname{Re}(Av, v)$ and $(Sv, v) = i \operatorname{Im}(Av, v)$ for $v \in \mathcal{H}$,
$$\|e_m^{OR}\|_H^2 = (H e_m^{OR}, e_m^{OR}) \le |(A e_m^{OR}, e_m^{OR})| = |(A e_m^{OR}, e_0)| = |(A e_m^{OR}, e_0 - v)|$$
for arbitrary $v \in V_m$, and therefore
$$\|e_m^{OR}\|_H^2 \;\le\; |(H e_m^{OR}, e_0 - v) + (S e_m^{OR}, e_0 - v)|.$$
The first term is bounded by $\|e_0 - v\|_H \, \|e_m^{OR}\|_H$, and for the second term we obtain
$$|(S e_m^{OR}, e_0 - v)| = |(H^{1/2} H^{-1/2} S H^{-1/2} H^{1/2} e_m^{OR}, e_0 - v)| \le \|H^{-1/2} S H^{-1/2} H^{1/2} e_m^{OR}\| \, \|e_0 - v\|_H$$
$$\le \|H^{-1/2} S H^{-1/2}\| \, \|e_m^{OR}\|_H \, \|e_0 - v\|_H = \rho(H^{-1}S) \, \|e_m^{OR}\|_H \, \|e_0 - v\|_H,$$
which, together with the bound for the first term, yields the assertion.

An immediate consequence of Theorem 6.8 is that, for a Hermitian positive definite operator $A$, the OR method, which in this case simplifies to the well-known conjugate gradient method, yields the best approximation in the $A$-norm.

Remark 6.9. In view of the remark preceding Lemma 6.6, the best approximation of the initial error $e_0$ from the Krylov space $K_m(r_0, A) = A K_m(e_0, A)$ has the same structure as the best approximation of $r_0$ from $A K_m(r_0, A)$ with $r_0$ replaced by $e_0$; hence the infimum in (76) and (77) may be bounded in a manner analogous to the MR residual in Theorem 6.2 and Corollaries 6.3 and 6.4.
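Theorem 6.8 can likewise be checked numerically. In the sketch below (a hypothetical positive-real test matrix; sizes, scalings, and variable names are our own choices, not from the paper), the $H$-norm of the OR (FOM) error is compared against the best $H$-norm approximation of $e_0$, with the factor $1 + \rho(H^{-1}S)$ computed directly.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
# Hypothetical positive-real test matrix: H Hermitian positive definite,
# S skew-Hermitian (here real symmetric / skew-symmetric for simplicity).
G = rng.standard_normal((n, n))
H = np.diag(rng.uniform(1.0, 2.0, n))
S = 0.05 * (G - G.T)
A = H + S

# Spectral radius of H^{-1} S (its eigenvalues are purely imaginary).
rho = np.abs(np.linalg.eigvals(np.linalg.solve(H, S))).max()

b = rng.standard_normal(n)
x_exact = np.linalg.solve(A, b)
r0, e0 = b, x_exact                      # x0 = 0

# Orthonormal basis of K_m(r0, A) via modified Gram-Schmidt.
m = 8
V = np.zeros((n, m))
V[:, 0] = r0 / np.linalg.norm(r0)
for j in range(m - 1):
    w = A @ V[:, j]
    for i in range(j + 1):
        w -= (V[:, i] @ w) * V[:, i]
    V[:, j + 1] = w / np.linalg.norm(w)

# OR (FOM) iterate and its H-norm error.
y = np.linalg.solve(V.T @ A @ V, V.T @ r0)
e_OR = e0 - V @ y
H_norm = lambda x: np.sqrt(x @ H @ x)

# Best H-norm approximation of e0 from V_m: the H-orthogonal projection.
y_best = np.linalg.solve(V.T @ H @ V, V.T @ H @ e0)
e_best = e0 - V @ y_best

# Theorem 6.8: ||e_OR||_H <= (1 + rho(H^{-1}S)) * inf ||e0 - v||_H.
assert H_norm(e_OR) <= (1 + rho) * H_norm(e_best) * (1 + 1e-8)
```

Setting `S` to zero reproduces the Hermitian positive definite case, in which the OR (conjugate gradient) error coincides with the best approximation in the $A$-norm.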

6.4 An Application: Compact Operators

Many applications, such as the solution of elliptic boundary value problems by the integral equation method, require the solution of second-kind Fredholm equations, i.e., operator equations (1) in which $A$ has the form $A = \lambda I + K$ with $\lambda \ne 0$ and $K : \mathcal{H} \to \mathcal{H}$ a compact operator. The development of fast multiplication algorithms (cf. Greengard and Rokhlin [18], Hackbusch and Nowak [24]) has made Krylov subspace methods attractive as solution algorithms for discretizations of these problems, since they require only applications of the (discrete) operator to vectors. Moreover, as shown in [31] for GMRES and in Winther [49] for CG, Krylov subspace methods converge q-superlinearly when applied to operator equations involving compact perturbations of (multiples of) the identity. The reason for this is that, for these operators, the sines $s_m$ of the largest canonical angles between the Krylov space $V_m$ and $W_m = A V_m$ converge to zero. To show this, we recall a basic result on compact operators and orthonormal systems:

Theorem 6.10. Let $K : \mathcal{H} \to \mathcal{H}$ be a compact linear operator and let $\{v_m\}_{m \ge 1} \subset \mathcal{H}$ be an orthonormal system. Then
$$\lim_{m \to \infty} (K v_m, v_{m+1}) = 0.$$

Proof. See, e.g., Ringrose [34].


Corollary 6.11. Let $A = \lambda I + K$ with $\lambda \ne 0$ and $K : \mathcal{H} \to \mathcal{H}$ compact, let $\{v_j\}_{j \ge 1}$ denote the orthonormal system generated by the Arnoldi process applied to $A$ and $r_0 \in \mathcal{H}$, and let $\{w_j\}_{j \ge 1}$ be an orthonormal system such that $\{w_1, \ldots, w_m\}$ is an orthonormal basis of $W_m = A V_m$. Then the sines $s_m$ of the largest canonical angle between $V_m$ and $W_m$ form a null sequence.

Proof. Lemmas 3.3 and 6.1 combined with (42) yield
$$|s_m| = |(v_{m+1}, w_m)| \le |(v_{m+1}, A v_m)| \, \|A^{-1}\| = |(v_{m+1}, K v_m)| \, \|A^{-1}\| \to 0,$$
where the middle equality holds since $v_{m+1} \perp v_m$, the limit follows from Theorem 6.10, and $A^{-1}$ is bounded whenever $\lambda \ne 0$.

In particular, since $s_m \to 0$ implies $|s_m| \ne 1$ for all $m$ sufficiently large, the OR approximation is defined for all but possibly a finite number of indices. Moreover, as $|s_m|$ is bounded away from one, $c_m$ is accordingly bounded away from zero; hence the relation (27) also implies the q-superlinear convergence of the OR approximation. We summarize this result in the following theorem.

Theorem 6.12. Given $K : \mathcal{H} \to \mathcal{H}$ compact, $0 \ne \lambda \in \mathbb{C}$ and $b \in \mathcal{H}$, let $x_0 \in \mathcal{H}$ be an initial guess at the solution of (1) with $A = \lambda I + K$. Then the OR approximation with respect to the spaces $V_m = K_m(r_0, A)$ and $W_m = A V_m$ with initial residual vector $r_0 = b - A x_0$ exists for all sufficiently large indices. Moreover, the sequences of MR and OR approximations converge q-superlinearly.

We remark that the rate of superlinear convergence may be quantified in terms of the rate of decay of the singular values of $K$; cf. [31]. We also note, in view of (58), that this result applies to all MR/OR pairs of Krylov subspace methods, including QMR/BCG, given a bound on the conditioning of the basis of the Krylov space being used. For bases generated by the look-ahead Lanczos method, such bounds are guaranteed, e.g., by the implementation given in [13].

7 Conclusions and Final Remarks

We have presented a unifying framework for describing and analyzing Krylov subspace methods by first introducing the abstract MR and OR approximation methods on sequences of subspaces and then specializing these methods to the Krylov subspace setting for solving linear operator equations. All known relations between MR/OR-type Krylov methods were shown to hold in the abstract formulation. In particular, the angles appearing in the Givens QR factorization of the Hessenberg matrix used in many Krylov subspace algorithms were identified as angles between the Krylov spaces and their images under $A$. Moreover, depending on whether orthogonal or non-orthogonal bases are employed, both MR/OR and QMR/QOR methods can be described and analyzed in the same manner. Furthermore, we have shown that essentially all Krylov subspace methods can be interpreted as QMR/QOR methods. The description of the algorithms in terms of angles was subsequently used to derive many of the previously known error and residual bounds.

For related work on interpreting QMR methods as MR methods in a modified norm, see Barth and Manteuffel [1]. The idea of using a nonstandard inner product in Krylov subspace algorithms like GMRES appears in Starke [41]. Another benefit of the analysis in this paper is that, at least conceptually, it separates the issue of generating bases from the method for computing the approximations. Indeed, algorithms other than the Lanczos or Arnoldi processes could be used to generate the bases required for the MR and OR methods, but this is seldom done for lack of promising alternatives. Calvez and Saad [4] introduce a nonstandard inner product and develop a QMR-like algorithm not based on the Lanczos algorithm. We have made no mention in this paper of preconditioning, which is indispensable for most problems of practical relevance; when preconditioning is accounted for, our results apply to the preconditioned system. We believe that our approach provides a simple and intuitive way of describing Krylov subspace algorithms which simplifies many of the standard proofs and brings out the connections among the many algorithms in the literature.

Acknowledgements

The authors would like to thank Howard Elman for many helpful remarks and G. W. Stewart for his observation on QMR recovery.

References

[1] T. Barth and T. Manteuffel. Variable metric conjugate gradient algorithms. In M. Natori and T. Nodera, editors, Advances in Numerical Methods for Large Sparse Sets of Linear Systems, volume 10 of Parallel Processing for Scientific Computing, pages 165–188, Yokohama, Japan, 1994. Keio University.
[2] Å. Björck and G. H. Golub. Numerical methods for computing angles between linear subspaces. Math. Comp., 27:579–594, 1973.
[3] P. N. Brown. A theoretical comparison of the Arnoldi and GMRES algorithms. SIAM J. Sci. Stat. Comput., 12:58–77, 1991.
[4] C. L. Calvez and Y. Saad. Modified Krylov acceleration for parallel environments. Technical Report UNMSI-98-6, University of Minnesota, Minneapolis, 1998.
[5] T. F. Chan, E. Gallopoulos, V. Simoncini, T. Szeto, and C. H. Tong. A quasi-minimal residual variant of the Bi-CGSTAB algorithm for nonsymmetric systems. SIAM J. Sci. Comput., 15:338–347, 1994.
[6] F. Chatelin. Eigenvalues of Matrices. John Wiley & Sons, 1993.
[7] J. Cullum and A. Greenbaum. Relations between Galerkin and norm-minimizing iterative methods for solving linear systems. SIAM J. Matrix Anal. Appl., 17(2):223–247, 1996.
[8] C. Davis and W. M. Kahan. The rotation of eigenvectors by a perturbation. III. SIAM J. Numer. Anal., 7(1):1–46, 1970.
[9] H. Elman. Iterative Methods for Sparse, Nonsymmetric Systems of Linear Equations. PhD thesis, Yale University, Department of Computer Science, 1982.
[10] R. W. Freund. Quasi-kernel polynomials and their use in non-Hermitian matrix iterations. J. Comput. Appl. Math., 43(1–2):135–158, 1992.
[11] R. W. Freund. A transpose-free quasi-minimal residual algorithm for non-Hermitian linear systems. SIAM J. Sci. Comput., 14(2):470–482, 1993.
[12] R. W. Freund. Transpose-free quasi-minimal residual methods for non-Hermitian linear systems. In G. Golub, A. Greenbaum, and M. Luskin, editors, Recent Advances in Iterative Methods, volume 60 of Mathematics and its Applications, pages 69–94, New York, 1994. IMA, Springer-Verlag.
[13] R. W. Freund, M. Gutknecht, and N. M. Nachtigal. An implementation of the look-ahead Lanczos algorithm for non-Hermitian matrices. SIAM J. Sci. Comput., 14(1), 1993.
[14] R. W. Freund and N. M. Nachtigal. QMR: a quasi-minimal residual method for non-Hermitian linear systems. Numer. Math., 60:315–339, 1991.
[15] G. Golub and C. van Loan. Matrix Computations. Johns Hopkins Univ. Press, 1989.
[16] A. Greenbaum, M. Rozložník, and Z. Strakoš. Numerical behaviour of the modified Gram-Schmidt GMRES implementation. BIT, 37(3):706–719, 1997.
[17] A. Greenbaum and Z. Strakoš. Matrices that generate the same Krylov residual spaces. In G. Golub, A. Greenbaum, and M. Luskin, editors, Recent Advances in Iterative Methods. Springer, New York, 1994.
[18] L. Greengard and V. Rokhlin. A fast algorithm for particle simulations. J. Comput. Phys., 73(2):325–348, 1987.
[19] T. N. E. Greville. Solutions of the matrix equation XAX = X and relations between oblique and orthogonal projectors. SIAM J. Appl. Math., 26(4), 1974.
[20] K. E. Gustafson and D. K. M. Rao. Numerical Range: The Field of Values of Operators and Matrices. Springer-Verlag, New York, 1996.
[21] M. H. Gutknecht. Changing the norm in conjugate gradient type algorithms. SIAM J. Numer. Anal., 30:40–56, 1993.
[22] M. H. Gutknecht. Variants of BiCGStab for matrices with complex spectrum. SIAM J. Sci. Stat. Comput., 14(5):1020–1033, 1993.
[23] M. H. Gutknecht. Lanczos-type solvers for nonsymmetric linear systems of equations. Acta Numerica, pages 271–397, 1997.
[24] W. Hackbusch and Z. Nowak. On the fast matrix multiplication in the boundary element method by panel clustering. Numer. Math., 54:463–491, 1989.
[25] M. R. Hestenes and E. Stiefel. Methods of conjugate gradients for solving linear systems. J. Res. Nat. Bur. Standards, 49:409–436, 1952.
[26] M. Hochbruck and C. Lubich. Error analysis of Krylov methods in a nutshell. SIAM J. Sci. Comput., 19:695–701, 1998.
[27] R. Horn and C. Johnson. Topics in Matrix Analysis. Cambridge University Press, 1991.
[28] T. Kato. Estimation of iterated matrices, with application to the von Neumann condition. Numer. Math., 2:22–29, 1960.
[29] C. Lanczos. Solution of systems of linear equations by minimized iterations. J. Res. Nat. Bur. Standards, 49:33–53, 1952.
[30] T. A. Manteuffel. The Tchebychev iteration for nonsymmetric linear systems. Numer. Math., 28:307–327, 1977.
[31] I. Moret. A note on the superlinear convergence of GMRES. SIAM J. Numer. Anal., 34(2):513–516, 1997.
[32] C. C. Paige and M. A. Saunders. Solution of sparse indefinite systems of linear equations. SIAM J. Numer. Anal., 12:617–629, 1975.
[33] K. J. Ressel and M. H. Gutknecht. QMR-smoothing for Lanczos-type product methods based on three-term recurrences. Technical Report TR-96-18, Swiss Center for Scientific Computing, Zürich, 1996.
[34] J. R. Ringrose. Compact Non-Self-Adjoint Operators. Van Nostrand Reinhold Company, 1971.
[35] M. Rozložník and Z. Strakoš. Variants of the residual minimizing Krylov space methods. In I. Marek, editor, Proceedings of the XI. School 'Software and Algorithms of Numerical Mathematics', pages 208–225, 1996.
[36] M. Rozložník, Z. Strakoš, and M. Tůma. On the role of orthogonality in the GMRES method. In K. Jeffery, J. Král, and M. Bartošek, editors, SOFSEM'96: Theory and Practice of Informatics, volume 1175 of Lecture Notes in Computer Science, pages 409–416. Springer-Verlag, 1996.
[37] Y. Saad. Krylov subspace methods for solving large unsymmetric linear systems. Math. Comp., 37:105–126, 1981.
[38] Y. Saad and M. H. Schultz. GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM J. Sci. Stat. Comput., 7:856–869, 1986.
[39] P. Sonneveld. CGS, a fast Lanczos-type solver for nonsymmetric linear systems. SIAM J. Sci. Stat. Comput., 10(1):36–52, 1989.
[40] G. Starke. Iterative Methods and Decomposition-Based Preconditioners for Nonsymmetric Elliptic Boundary Value Problems. Habilitationsschrift, Universität Karlsruhe, 1994.
[41] G. Starke. Field-of-values analysis of preconditioned iterative methods for nonsymmetric elliptic problems. Numer. Math., 78:103–117, 1997.
[42] G. W. Stewart. Private communication, 1998.
[43] H. A. van der Vorst. Bi-CGSTAB: a fast and smoothly converging variant of Bi-CG for the solution of nonsymmetric linear systems. SIAM J. Sci. Stat. Comput., 13:631–644, 1992.
[44] H. Walker and L. Zhou. A simpler GMRES. Numer. Linear Algebra Appl., 1(6):571–581, 1994.
[45] H. F. Walker. Residual smoothing and peak/plateau behavior in Krylov subspace methods. Appl. Numer. Math., 19:279–286, 1995.
[46] P. Å. Wedin. On angles between subspaces of a finite dimensional inner product space. In B. Kågström and A. Ruhe, editors, Matrix Pencils, number 973 in Lecture Notes in Mathematics, pages 263–285, New York, 1983. Springer.
[47] R. Weiss. Properties of generalized conjugate gradient methods. Numer. Linear Algebra Appl., 1(1):45–63, 1994.
[48] H. Wielandt. Topics in the analytic theory of matrices. In B. Huppert and H. Schneider, editors, Helmut Wielandt, Mathematical Works, volume 2. Walter de Gruyter, 1996.
[49] R. Winther. Some superlinear convergence results for the conjugate gradient method. SIAM J. Numer. Anal., 17:14–17, 1980.
[50] L. Zhou and H. F. Walker. Residual smoothing techniques for iterative methods. SIAM J. Sci. Comput., 15:297–312, 1994.
