Hybrid procedures and semi-iterative methods for linear systems

C. Brezinski
Laboratoire d'Analyse Numérique et d'Optimisation, UFR IEEA - M3
Université des Sciences et Technologies de Lille
59655 Villeneuve d'Ascq cedex, France
e-mail: [email protected]
Abstract
The aim of this paper is to study the multiple hybrid procedure, which produces a better residual vector from those given by several iterative methods used simultaneously for solving a system of linear equations. A particular case leads to new semi-iterative methods where the residual is minimized instead of the error. Several related topics such as projection, extrapolation, best approximation and biorthogonalization are also discussed, and generalizations of the Lanczos method are presented. Numerical examples illustrate the procedures.
AMS (MOS) subject classification: 65F10, 65F25, 65B05, 65D15.
Keywords: linear systems, semi-iterative methods, hybrid procedures, projection, extrapolation, best approximation, orthogonalization, Lanczos method.
1 Introduction

We consider the problem of solving a nonsingular system of linear equations $Ax = b$ by an iterative method which produces the sequence of iterates $(x_n)$. A semi-iterative method consists of constructing a new sequence of iterates $(y_n^{(k)})$ by
\[
 y_n^{(k)} = \sum_{i=0}^{k} a_i^{(k,n)} x_{n+i}
\]
with $\sum_{i=0}^{k} a_i^{(k,n)} = 1$. We have
\[
 e_n^{(k)} = \sum_{i=0}^{k} a_i^{(k,n)} e_{n+i}
\]
with $e_i = x - x_i$ and $e_n^{(k)} = x - y_n^{(k)}$. Setting $r_i = b - Ax_i$ and $\rho_n^{(k)} = b - Ay_n^{(k)}$, we have $r_i = Ae_i$, $\rho_n^{(k)} = Ae_n^{(k)}$, and it follows that
\[
 \rho_n^{(k)} = \sum_{i=0}^{k} a_i^{(k,n)} r_{n+i}.
\]
If the iterates $(x_n)$ are produced by a stationary iterative method of the form
\[
 x_{n+1} = Tx_n + c, \qquad n = 0, 1, \ldots
\]
with $I - T$ nonsingular and $x_0$ arbitrary, then $e_n = T^n e_0$ and it follows that
\[
 e_n^{(k)} = P_k^{(n)}(T)\, e_0
\]
where
\[
 P_k^{(n)}(\lambda) = \sum_{i=0}^{k} a_i^{(k,n)} \lambda^i.
\]
The problem is to find the best polynomial $P_k^{(n)}$, that is, the polynomial minimizing the Euclidean norm of the error $\|P_k^{(n)}(T) e_0\|$ among the set of polynomials of degree at most $k$ satisfying $P(1) = 1$. Such a procedure will be called a minimal error semi-iterative method. However, since $e_0 = x - x_0$ is unknown, it is obviously impossible to solve this problem in the general case. Since we have $\|P_k^{(n)}(T) e_0\| \leq \|P_k^{(n)}(T)\|\, \|e_0\|$, a simpler problem is to find the polynomial $P_k^{(n)}$, with $P_k^{(n)}(1) = 1$, which minimizes $\|P_k^{(n)}(T)\|$. When the matrix $T$ is symmetric (or, more generally, Hermitian, since everything can easily be generalized to the complex case), the solution of this problem is well known and it involves Chebyshev polynomials [68, 70, 123, 94, 105] (a survey can be found in [76, Chap. 7]). Recently, the problem has been studied for more general matrices [93] and its solution involves Faber polynomials [45, 46, 47, 114] or generalizations of Chebyshev polynomials [104, 9]. The connections between numerical linear algebra and approximation theory have been reviewed in [116].

In this paper, we shall consider the application of a semi-iterative method to an arbitrary sequence of iterates, not necessarily produced by a stationary iterative method, and we shall also make use of a different minimization criterion based on the residual vector $\rho_n^{(k)} = b - Ay_n^{(k)}$ instead of the error $e_n^{(k)} = x - y_n^{(k)}$. In the case of iterates produced by a stationary iterative method, such a procedure will be called a minimal residual semi-iterative method. This terminology comes from [96]. We shall also discuss a more general case where several iterative methods are used simultaneously to produce a better residual vector. Obviously, such a procedure is well adapted to parallel computing [51]. Our approach is based on the multiple hybrid procedure introduced in [26] and studied in more detail in [1]. We shall also use some results from [2]. Since such a procedure is related to projections, extrapolation methods and best approximation, we shall begin, in Section 2, by recalling some results about these topics and their connections. We shall also discuss several orthogonalization and biorthogonalization processes and apply them to the solution of a system of linear equations, thus leading to generalizations of the Lanczos method. Section 3 will be devoted to hybrid procedures and to minimal and quasi-minimal residual smoothing algorithms. Section 4 deals with semi-iterative methods and their convergence acceleration properties. Many open problems needing further investigation will also be raised.
2 Preliminaries

2.1 Projections
In this subsection we shall recall some results about oblique and orthogonal projections in $\mathbb{R}^n$. Let $(u_1, \ldots, u_k)$ and $(v_1, \ldots, v_k)$ be two sets of linearly independent vectors in $\mathbb{R}^n$. Let $E_k = \mathrm{span}(u_1, \ldots, u_k)$ and $F_k = \mathrm{span}(v_1, \ldots, v_k)$. Let $P_k$ be the (oblique) projection on $E_k$ along $F_k^{\perp}$; $P_k$ is the projection on $E_k$ orthogonal to $F_k$. Let $U_k$ and $V_k$ be the $n \times k$ matrices whose columns are $u_1, \ldots, u_k$ and $v_1, \ldots, v_k$ respectively. $P_k$ exists if and only if the $k \times k$ matrix $V_k^T U_k$ is regular. This is true if $u_1, \ldots, u_k$ and also $v_1, \ldots, v_k$ are linearly independent (see Davis [40], Lemma 2.2.1, p. 26) or, equivalently, if $\dim(E_k) = \dim(F_k^{\perp}) = k$ and no vector of $E_k$ is orthogonal to $F_k$ [106]. If the matrix $V_k^T U_k$ is singular and if $u_1, \ldots, u_k$ (resp. $v_1, \ldots, v_k$) are linearly independent, then $v_1, \ldots, v_k$ (resp. $u_1, \ldots, u_k$) are dependent. If $u_1, \ldots, u_k$ (resp. $v_1, \ldots, v_k$) are independent and if $V_k^T U_k$ is regular, then $v_1, \ldots, v_k$ (resp. $u_1, \ldots, u_k$) are also independent. We have
\[
 V_k^T U_k = \begin{pmatrix} (v_1, u_1) & \cdots & (v_1, u_k) \\ \vdots & & \vdots \\ (v_k, u_1) & \cdots & (v_k, u_k) \end{pmatrix}.
\]
If $u_1, \ldots, u_k$ and $v_1, \ldots, v_k$ are dual bases of $E_k$ and $F_k$ then, by definition, $(v_i, u_j) = \delta_{ij}$ and $V_k^T U_k = I$. We recall that $E_k = \mathrm{Im}\, P_k = \mathrm{Ker}\,(I - P_k)$ and $F_k^{\perp} = \mathrm{Ker}\, P_k$. We shall identify the projection operator $P_k$ with the matrix which represents it. We have the

Theorem 1
\[
 P_k = U_k (V_k^T U_k)^{-1} V_k^T.
\]

It is well known that $I - P_k$ is also a projection. If $y$ is an arbitrary vector then
\[
 P_k y = a_1 u_1 + \cdots + a_k u_k
\]
and it follows immediately that
\[
 (y - P_k y, v_i) = 0 \quad \text{for } i = 1, \ldots, k.
\]
Theorem 2
\[
 P_k y = - \frac{\begin{vmatrix} (v_1, u_1) & \cdots & (v_1, u_k) & (v_1, y) \\ \vdots & & \vdots & \vdots \\ (v_k, u_1) & \cdots & (v_k, u_k) & (v_k, y) \\ u_1 & \cdots & u_k & 0 \end{vmatrix}}{\begin{vmatrix} (v_1, u_1) & \cdots & (v_1, u_k) \\ \vdots & & \vdots \\ (v_k, u_1) & \cdots & (v_k, u_k) \end{vmatrix}}
\]
and
\[
 y - P_k y = \frac{\begin{vmatrix} (v_1, u_1) & \cdots & (v_1, u_k) & (v_1, y) \\ \vdots & & \vdots & \vdots \\ (v_k, u_1) & \cdots & (v_k, u_k) & (v_k, y) \\ u_1 & \cdots & u_k & y \end{vmatrix}}{\begin{vmatrix} (v_1, u_1) & \cdots & (v_1, u_k) \\ \vdots & & \vdots \\ (v_k, u_1) & \cdots & (v_k, u_k) \end{vmatrix}}.
\]

Applying Schur's formula to the ratio of determinants for $P_k y$ (see [17]) leads to $P_k y = U_k (V_k^T U_k)^{-1} z$ where $z = ((v_1, y), \ldots, (v_k, y))^T$, and thus we recover the result given in Theorem 1. If the $u_i$ and the $v_i$ are biorthogonal (that is, $(v_i, u_j) = 0$ if $i \neq j$) then
\[
 P_k = \sum_{i=1}^{k} \frac{u_i v_i^T}{(v_i, u_i)} \qquad \text{and} \qquad P_k y = \sum_{i=1}^{k} \frac{(v_i, y)}{(v_i, u_i)}\, u_i.
\]
$P_k^T$ is the projection on $F_k$ along $E_k^{\perp}$, while $I - P_k^T$ is the projection on $E_k^{\perp}$ along $F_k$. $P_k$ is an orthogonal projection if and only if $P_k = P_k^T$, that is, if and only if $v_i = u_i$ for $i = 1, \ldots, k$ or, equivalently, $E_k = F_k$. In that case $P_k y$ is the solution of the minimization problem
\[
 \min_{a_1, \ldots, a_k} \| y - (a_1 u_1 + \cdots + a_k u_k) \|
\]
and the usual results about best approximation in an inner product space (also called least squares approximation) are recovered. In particular, we have the
Theorem 3
\[
 (y - P_k y,\, P_k y) = 0,
\]
\[
 \| y - P_k y \|^2 = \frac{\begin{vmatrix} (u_1, u_1) & \cdots & (u_1, u_k) & (u_1, y) \\ \vdots & & \vdots & \vdots \\ (u_k, u_1) & \cdots & (u_k, u_k) & (u_k, y) \\ (y, u_1) & \cdots & (y, u_k) & (y, y) \end{vmatrix}}{\begin{vmatrix} (u_1, u_1) & \cdots & (u_1, u_k) \\ \vdots & & \vdots \\ (u_k, u_1) & \cdots & (u_k, u_k) \end{vmatrix}} = (y - P_k y,\, y).
\]

If the $u_i$ are mutually orthogonal, then
\[
 P_k = \sum_{i=1}^{k} \frac{u_i u_i^T}{(u_i, u_i)}, \qquad P_k y = \sum_{i=1}^{k} \frac{(u_i, y)}{(u_i, u_i)}\, u_i
\]
and
\[
 \| P_k y \|^2 = \sum_{i=1}^{k} \frac{|(u_i, y)|^2}{(u_i, u_i)}.
\]
On these questions, see, for example, Davis [40], Laurent [88] or Cryer [39]. The following general result should also be recalled.

Theorem 4 If $P_k \neq 0$, then $\|P_k\| \geq 1$, and $\|P_k\| = 1$ if and only if $P_k$ is an orthogonal projection.

Other results about projections can be found, for example, in Chatelin [34, 35]. The connection with Galerkin approximation is studied in [117, p. 291]; see also subsection 2.5 below. The subject is also closely related to biorthogonality, as explained in [19]; some other results will be given in subsection 2.4. All the preceding results will be useful in the sequel.
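The formula of Theorem 1 is easy to try out numerically for moderate $k$. The following minimal sketch (Python with NumPy, on randomly generated data, so all dimensions and values are merely illustrative assumptions) forms the oblique projector explicitly and checks the characteristic properties $P_k^2 = P_k$, $P_k u_i = u_i$ and $(y - P_k y, v_i) = 0$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 8, 3
U = rng.standard_normal((n, k))   # columns u_1, ..., u_k spanning E_k
V = rng.standard_normal((n, k))   # columns v_1, ..., v_k spanning F_k

# Oblique projector on E_k orthogonal to F_k (Theorem 1); V^T U must be regular.
P = U @ np.linalg.solve(V.T @ U, V.T)

y = rng.standard_normal(n)
print(np.allclose(P @ P, P))               # P_k is a projection
print(np.allclose(P @ U, U))               # E_k is left invariant
print(np.allclose(V.T @ (y - P @ y), 0))   # y - P_k y is orthogonal to F_k
```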
2.2 Extrapolation methods
Let $(s_n)$ be a sequence of vectors (usually converging to some unknown limit $s_\infty$). An extrapolation method is a method whose purpose is to accelerate the convergence of $(s_n)$. It consists of transforming this sequence into a set of new sequences $(t_k^{(n)})$ given by
\[
 t_k^{(n)} = a_0 s_n + \cdots + a_k s_{n+k}, \qquad k, n = 0, 1, \ldots
\]
where the numbers $a_i$ can depend on $k$ and $n$. If $s_n = \cdots = s_{n+k} = s_\infty$ for some indexes $n$ and $k$, then a desirable property is that $t_k^{(n)} = s_\infty$. Thus, we must have $a_0 + \cdots + a_k = 1$. As explained in [18] (see also [24], pp. 18-21, 214-215), such extrapolation methods can often be written as a ratio of two determinants
\[
 t_k^{(n)} = \frac{\begin{vmatrix} s_n & \cdots & s_{n+k} \\ g_1(n) & \cdots & g_1(n+k) \\ \vdots & & \vdots \\ g_k(n) & \cdots & g_k(n+k) \end{vmatrix}}{\begin{vmatrix} 1 & \cdots & 1 \\ g_1(n) & \cdots & g_1(n+k) \\ \vdots & & \vdots \\ g_k(n) & \cdots & g_k(n+k) \end{vmatrix}}
\]
where the $g_i(n)$ are given numbers which can depend on some terms of the sequence $(s_n)$ itself. Such ratios of determinants can be recursively computed by several algorithms such as the E- and the H-algorithms or some variant of the CRPA [24] (see Section 3 below).
We shall now show that projections, as explained in the previous subsection, also fit into this framework. It is easy to see, after some manipulations, that
\[
 y - P_k y = \frac{\begin{vmatrix} y & u_1 + y & \cdots & u_k + y \\ (v_1, y) & (v_1, u_1 + y) & \cdots & (v_1, u_k + y) \\ \vdots & \vdots & & \vdots \\ (v_k, y) & (v_k, u_1 + y) & \cdots & (v_k, u_k + y) \end{vmatrix}}{\begin{vmatrix} 1 & 1 & \cdots & 1 \\ (v_1, y) & (v_1, u_1 + y) & \cdots & (v_1, u_k + y) \\ \vdots & \vdots & & \vdots \\ (v_k, y) & (v_k, u_1 + y) & \cdots & (v_k, u_k + y) \end{vmatrix}}
\]
which shows that a projection can be considered as an extrapolation method and vice versa. Let us mention that the connection between both subjects was studied by Sidi [111] and by Jbilou and Sadok [77] for linearly generated vector sequences and, in particular, for the Lanczos method and other Krylov subspace methods.

Another related problem is to find $r = a_1 u_1 + \cdots + a_k u_k$ with $a_1 + \cdots + a_k = 1$ such that $(r, v_i) = 0$ for $i = 1, \ldots, k-1$. Its solution is given by
\[
 r = \frac{\begin{vmatrix} u_1 & \cdots & u_k \\ (v_1, u_1) & \cdots & (v_1, u_k) \\ \vdots & & \vdots \\ (v_{k-1}, u_1) & \cdots & (v_{k-1}, u_k) \end{vmatrix}}{\begin{vmatrix} 1 & \cdots & 1 \\ (v_1, u_1) & \cdots & (v_1, u_k) \\ \vdots & & \vdots \\ (v_{k-1}, u_1) & \cdots & (v_{k-1}, u_k) \end{vmatrix}}.
\]

To end this subsection, let us mention a paradox: it can sometimes be interesting to accelerate the convergence of a sequence whose limit is known. This is, in particular, the case for a sequence of residual vectors $(r_n = b - Ax_n)$ converging to zero. If this sequence is transformed into a new sequence of residuals $(\rho_n = b - Ay_n)$, then the sequence $(y_n)$ can converge to the solution $x = A^{-1} b$ faster than the sequence $(x_n)$.
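In practice the ratio of determinants defining $t_k^{(n)}$ need not be evaluated as such: expanding the numerator along its first row shows that $t_k^{(n)} = \sum_{i=0}^{k} a_i s_{n+i}$ where the $a_i$ satisfy $\sum_i a_i = 1$ and $\sum_i a_i g_j(n+i) = 0$ for $j = 1, \ldots, k$. A minimal sketch of this computation (Python/NumPy; the sequence and the choice of the $g_j$ below are illustrative assumptions, not taken from the text) is:

```python
import numpy as np

def extrapolate(s, g, n, k):
    """t_k^(n) for the vector sequence s, given the functions g_1, ..., g_k.

    s : array of shape (N, p), the iterates s_0, s_1, ...
    g : list of k callables, g[j](m) returning a number.
    """
    # Coefficients a_0..a_k: sum a_i = 1 and sum a_i g_j(n+i) = 0, j = 1..k.
    M = np.ones((k + 1, k + 1))
    for j in range(k):
        M[j + 1, :] = [g[j](n + i) for i in range(k + 1)]
    rhs = np.zeros(k + 1)
    rhs[0] = 1.0
    a = np.linalg.solve(M, rhs)
    return sum(a[i] * s[n + i] for i in range(k + 1))

# Assumed toy data: a sequence with two geometric error terms; its limit is (1, -2).
s = np.array([[1 + 2 * 0.5**m + 0.3**m, -2 + 3 * 0.5**m - 0.3**m] for m in range(6)])
g = [lambda m, r=r: r**m for r in (0.5, 0.3)]   # an assumed choice of the g_j
print(extrapolate(s, g, 0, 2))                  # recovers the limit (1, -2)
```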
2.3 Best approximation
A special case of the question discussed at the end of the preceding subsection is the problem (M) of finding $r \in \mathrm{span}(u_1, \ldots, u_k)$ which minimizes
\[
 (r, r) = \| a_1 u_1 + \cdots + a_k u_k \|^2
\]
under the assumption $a_1 + \cdots + a_k = 1$. We first need the
Lemma 1 The problem (M) is equivalent to the problem (M') of finding the best approximation $s$ of $-u_k$ in $\mathrm{span}(u_1 - u_k, \ldots, u_{k-1} - u_k)$.

Proof: The problem (M) can be written as finding $r$ which minimizes
\[
 (r, r) = \| a_1 u_1 + \cdots + a_{k-1} u_{k-1} + (1 - a_1 - \cdots - a_{k-1}) u_k \|^2
        = \| a_1 (u_1 - u_k) + \cdots + a_{k-1} (u_{k-1} - u_k) + u_k \|^2,
\]
that is, to find the best approximation $s$ of $-u_k$ in $\mathrm{span}(u_1 - u_k, \ldots, u_{k-1} - u_k)$.

Remark 1: Obviously the problem (M) is also equivalent to finding the best approximation of any $-u_i$, $i \in \{1, \ldots, k\}$, in $\mathrm{span}(u_1 - u_i, \ldots, u_{i-1} - u_i, u_{i+1} - u_i, \ldots, u_k - u_i)$.
Theorem 5
The solution of the problem (M) is given by
u (u ? uk ; u ) (u ? uk ; u ) ... (u ? uk ; u ) r = k? 1 (u ? uk ; u ) (u ? uk ; u ) .. . (uk? ? uk ; u ) 1
1
2
1
1
2
1
2
1
1
1
2
2
2
...
1
2
(uk? ? uk ; u ) 1 (u ? uk ; u ) (u ? uk ; u ) .. . (uk? ? uk ; u )
1
1
... (uk? ? uk ; uk ) 1 (u ? uk ; uk ) (u ? uk ; uk ) .. . (uk? ? uk ; uk )
u uk (u ? uk ; u ) (u ? uk ; uk ) (u ? uk ; u ) (u ? uk ; uk )
1
1
1
2
1
2
2
2
1
2
1
1
2
1
and it holds
(u ? uk ;. u ? uk ) (u ? uk ; u. k? ? uk ) ?(u ?. uk ; uk ) . . . (uk? ? uk. ; u ? uk ) (uk? ? uk ;. uk? ? uk ) ?(uk? ?. uk ; uk ) ? (uk ; u ? uk ) ?(uk ; uk? ? uk ) (uk ; uk ) (r; r) = : (u ? uk ; u ? uk ) (u ? uk ; uk? ? uk ) ... ... (uk? ? uk ; u ? uk ) (uk? ? uk ; uk? ? uk ) 1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
7
1
Proof: Let us set $r = u_k + s$. By Lemma 1, the problem (M) is transformed into the problem (M') whose solution is well known (see, for example, [40], Theorem 8.7.5, p. 182). We have
\[
 -r = -u_k - s = \frac{\begin{vmatrix} (u_1 - u_k, u_1 - u_k) & \cdots & (u_{k-1} - u_k, u_1 - u_k) & -(u_k, u_1 - u_k) \\ \vdots & & \vdots & \vdots \\ (u_1 - u_k, u_{k-1} - u_k) & \cdots & (u_{k-1} - u_k, u_{k-1} - u_k) & -(u_k, u_{k-1} - u_k) \\ u_1 - u_k & \cdots & u_{k-1} - u_k & -u_k \end{vmatrix}}{\begin{vmatrix} (u_1 - u_k, u_1 - u_k) & \cdots & (u_{k-1} - u_k, u_1 - u_k) \\ \vdots & & \vdots \\ (u_1 - u_k, u_{k-1} - u_k) & \cdots & (u_{k-1} - u_k, u_{k-1} - u_k) \end{vmatrix}}.
\]
The denominator of this expression can be written as
\[
 - \begin{vmatrix} (u_1 - u_k, u_1 - u_k) & \cdots & (u_{k-1} - u_k, u_1 - u_k) & -(u_k, u_1 - u_k) \\ \vdots & & \vdots & \vdots \\ (u_1 - u_k, u_{k-1} - u_k) & \cdots & (u_{k-1} - u_k, u_{k-1} - u_k) & -(u_k, u_{k-1} - u_k) \\ 0 & \cdots & 0 & -1 \end{vmatrix}.
\]
Subtracting the last column of the numerator and of the denominator from the other ones gives the ratio of determinants for $r$. The expression for $(r, r)$ is an immediate consequence of a well-known result; see, for example, [40, Theorem 8.7.4, p. 181].
u (u ? u ; u ) (u ? u ; u ) .. . (uk ? uk? ; u ) r = 1 (u ? u ; u ) (u ? u ; u ) ... (uk ? uk? ; u ) 1
2
1
1
3
2
1
1
1
2
1
1
3
2
1
1
1
.. . (uk ? uk? ; uk ) : 1 (u ? u ; uk ) (u ? u ; uk ) ... (uk ? uk? ; uk )
u uk (u ? u ; u ) (u ? u ; uk ) (u ? u ; u ) (u ? u ; uk ) 2
2
1
2
3
2
2
.. .
(uk ? uk? ; u ) 1 (u ? u ; u ) (u ? u ; u ) ... (uk ? uk? ; u ) 1
2
2
1
2
3
2
2
1
2
2
1
3
2
1
2
1
3
2
1
The expression of r given in Theorem 5 is similar to the ratio of determinants involved in extrapolation methods and thus there is a close connection between orthogonal projections, extrapolation methods and best approximations. This connection will be exploited in the next sections.
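Lemma 1 also gives a convenient way of computing the solution of (M) numerically: subtract $u_k$ from the other vectors, solve a small least-squares problem for $-u_k$, and recover the coefficients. A minimal sketch under these assumptions (Python/NumPy, with randomly generated illustrative data) is:

```python
import numpy as np

def min_norm_affine_combination(U):
    """Solve (M): minimize ||sum_i a_i u_i|| subject to sum_i a_i = 1.

    U : array of shape (n, k) whose columns are u_1, ..., u_k.
    Returns (a, r) with r = U @ a.
    """
    uk = U[:, -1]
    W = U[:, :-1] - uk[:, None]          # columns u_i - u_k, i = 1, ..., k-1
    # Best approximation of -u_k in span(W): problem (M') of Lemma 1.
    c, *_ = np.linalg.lstsq(W, -uk, rcond=None)
    a = np.append(c, 1.0 - c.sum())      # a_k = 1 - a_1 - ... - a_{k-1}
    return a, U @ a

rng = np.random.default_rng(1)
U = rng.standard_normal((6, 3))
a, r = min_norm_affine_combination(U)
print(a.sum(), np.linalg.norm(r))        # the coefficients sum to 1
```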
2.4 Biorthogonalization
As we saw in subsection 2.1, biorthogonality is an important concept in projection methods since it considerably reduces the amount of work to be performed. Let $v_1, v_2, \ldots$ and $r_1, r_2, \ldots$ be two sets of vectors. We shall biorthogonalize them, that is, we shall build two new sets $\bar v_1, \bar v_2, \ldots$ and $\bar r_1, \bar r_2, \ldots$ such that
\[
 (\bar v_i, \bar r_j) = 0, \quad \forall i \text{ and } \forall j \neq i.
\]
The conditions for the existence of such biorthogonal families are well known; see, for example, [40, Theorem 2.6.1, p. 41]. It is easy to check that the vectors defined by
\[
 \bar v_1 = v_1, \quad \bar r_1 = r_1, \qquad
 \bar r_k = r_k - \sum_{i=1}^{k-1} \frac{(\bar v_i, r_k)}{(\bar v_i, \bar r_i)}\, \bar r_i, \quad
 \bar v_k = v_k - \sum_{i=1}^{k-1} \frac{(v_k, \bar r_i)}{(\bar v_i, \bar r_i)}\, \bar v_i
\]
satisfy $(\bar v_k, \bar r_i) = (\bar v_i, \bar r_k) = 0$ for $i = 1, \ldots, k-1$. But this is also true $\forall i \neq k$ since one of the indexes is always smaller than the other one. We also have
\[
 (\bar v_k, \bar r_k) = (\bar v_k, r_k) = (v_k, \bar r_k).
\]
Obviously, this biorthogonalization scheme is a two-sided Gram-Schmidt process which generalizes the usual one. The usual Gram-Schmidt process (which corresponds to $v_i = r_i$) is known to be numerically unstable because of the loss of orthogonality between the vectors. Another implementation of this algorithm is the so-called modified Gram-Schmidt process (MGS). It is uncertain who first found the MGS, but numerical experiments were reported by Rice [101] and it was proved to be more stable by Björck [11]. It is explained in [12, p. 504] and [69, pp. 218-219]; a detailed analysis of the algorithm is given in [127, p. 175]. A similar procedure could also be used in our case. The algorithm is as follows
\[
 v_k^{(1)} = v_k, \quad r_k^{(1)} = r_k,
\]
\[
 r_k^{(i+1)} = r_k^{(i)} - \frac{(\bar v_i, r_k^{(i)})}{(\bar v_i, \bar r_i)}\, \bar r_i, \quad
 v_k^{(i+1)} = v_k^{(i)} - \frac{(v_k^{(i)}, \bar r_i)}{(\bar v_i, \bar r_i)}\, \bar v_i
 \qquad \text{for } i = 1, \ldots, k-1,
\]
\[
 \bar v_k = v_k^{(k)}, \quad \bar r_k = r_k^{(k)},
\]
with $\bar v_1 = v_1$ and $\bar r_1 = r_1$. It is easy to show, by induction, that
\[
 (v_k^{(i)}, \bar r_j) = \begin{cases} 0 & \text{for } j = 1, \ldots, i-1 \\ (v_k, \bar r_j) & \text{for } j = i, \ldots, k-1 \end{cases}
 \qquad
 (\bar v_j, r_k^{(i)}) = \begin{cases} 0 & \text{for } j = 1, \ldots, i-1 \\ (\bar v_j, r_k) & \text{for } j = i, \ldots, k-1. \end{cases}
\]
It is also possible, in order to improve the numerical stability of the algorithm, to reorthogonalize the vectors, a procedure proposed by Lanczos himself [86, p. 271] (see [98, p. 262] for more details). After having obtained $\bar r_k$, it consists of checking its biorthogonality with all the previous $\bar v_i$. If some scalar product $(\bar v_i, \bar r_k)$ differs substantially from zero, then the vector
\[
 \varepsilon_k^{(i)} = - \frac{(\bar v_i, \bar r_k)}{(\bar v_i, \bar r_i)}\, \bar r_i
\]
is added to $\bar r_k$. A similar procedure is used for $\bar v_k$. Obviously, these two techniques could also be applied to all the other orthogonalization processes discussed in this paper. As explained by Wilkinson [132, p. 383], the loss of orthogonality is not always due to the accumulation of rounding errors and, because of cancellation, it can even happen at the first step of the process with vectors of dimension 2.

It is important to notice that, if $v_1, \ldots, v_{k-1}$ and $r_1, \ldots, r_{k-1}$ are respectively replaced by $a_i v_i$ and $b_i r_i$, where the $a_i$ and the $b_i$ are arbitrary nonzero constants, then $\bar v_k$ and $\bar r_k$ remain unchanged. Thus $\bar v_k$ and $\bar r_k$ can also be changed to $a_k \bar v_k$ and $b_k \bar r_k$ by multiplying the formulae above respectively by the nonzero constants $a_k$ and $b_k$. Obviously, such a change in the normalizations does not affect the biorthogonality property of the families $\{\bar v_k\}$ and $\{\bar r_k\}$. This invariance property of the preceding biorthogonalization process will be very useful in the sequel. We have the following determinantal formulae
\[
 \bar v_k = \frac{\begin{vmatrix} (v_1, r_1) & \cdots & (v_k, r_1) \\ \vdots & & \vdots \\ (v_1, r_{k-1}) & \cdots & (v_k, r_{k-1}) \\ v_1 & \cdots & v_k \end{vmatrix}}{\begin{vmatrix} (v_1, r_1) & \cdots & (v_{k-1}, r_1) \\ \vdots & & \vdots \\ (v_1, r_{k-1}) & \cdots & (v_{k-1}, r_{k-1}) \end{vmatrix}},
 \qquad
 \bar r_k = \frac{\begin{vmatrix} (v_1, r_1) & \cdots & (v_1, r_k) \\ \vdots & & \vdots \\ (v_{k-1}, r_1) & \cdots & (v_{k-1}, r_k) \\ r_1 & \cdots & r_k \end{vmatrix}}{\begin{vmatrix} (v_1, r_1) & \cdots & (v_1, r_{k-1}) \\ \vdots & & \vdots \\ (v_{k-1}, r_1) & \cdots & (v_{k-1}, r_{k-1}) \end{vmatrix}}.
\]
=1
1
1
1
Gk = Lk Dk MkT : It is easy to check that the preceding formulae are equivalent to where B ?T
Vk = Vk L?k T , Rk = Rk Mk?T ? denotes (B ? )T = B T for any matrix B . Setting 1
1
aik = (vk; ri)=(vi; ri) and bik = (vi; rk )=(vi; ri) ( )
( )
10
(1)
we have
0 B B B Lk = B B @
1 a 1 O ... ... . . . a k a k 1 (2) 1
( ) 1
( ) 2
0 1 1 BB b CC 1 O C B Mk = B . . B@ .. .. . . . CCA : b k b k 1
1 CC CC CA
(2) 1
( ) 1
( ) 2
Since
VkT Rk = Gk = Lk Dk MkT = Lk VkT Rk MkT it follows that Dk = VkT Rk . When 8i; vi = ri, the matrices Gk are symmetric positive de nite. Let us now assume that the ri are the residual vectors obtained by some iterative method for solving the system Ax = b. The preceding process can be used for trying to improve the convergence properties of the iterative method by biorthogonalization. However, in this case, we also need to compute the vectors xk de ned by rk = b ? Axk without using A? . This is indeed possible by using the invariance property of the biorthogonalization process. From the above formula, we have kX ? rk = b ? Axk = b ? Axk ? bik (b ? Axi ) 1
1
( )
i=1
with, as in (1), Pbik = (vi; rk )=(vi; ri). So, b can be eliminated from both sides of this relation if and only if 1 ? ik? bik = 1. Usually, this is not the case and we have to replace these vectors rk by the new vectors ! !, kX ? kX ? k k 1 ? bi rk = rk ? bi ri ( )
1 =1
( )
1
with r = r1. 1
1
( )
i=1
i=1
It follows
xk
\[
 \tilde x_k = \left( x_k - \sum_{i=1}^{k-1} b_i^{(k)} \tilde x_i \right) \Big/ \left( 1 - \sum_{i=1}^{k-1} b_i^{(k)} \right)
\]
with $\tilde x_1 = x_1$. If we denote by $R'_k$ the matrix whose columns are these new vectors $\tilde r_i$, then $R'_k = \bar R_k \Delta_k$ where $\Delta_k$ is a diagonal matrix. If a breakdown (due to a division by zero) occurs in this scheme, it is possible to skip over the residual vector $r_k$ which produces it and to consider the following one instead. Now, if $v_i = r_i$ then $\bar v_i = \bar r_i$, and it follows that $\exists k \leq p$ (the dimension of the system) such that $\tilde r_k = 0$ and $\tilde x_k = x$. Thus the basic iterative method has been transformed into a method with orthogonal residuals and, thus, a finite termination. However, it must be noticed that the computation of $\tilde r_k$ needs the storage of $k$ vectors, a procedure usually much too costly. In the next section, we shall see how this drawback can be avoided in a particular case. In the general case, it is always possible to perform only a partial biorthogonalization of the vectors $v_i$ and $r_i$ by using, for $k \geq d+2$, the relations
\[
 \bar r_k = \left( r_k - \sum_{i=k-d-1}^{k-1} b_i^{(k)} \bar r_i \right) \Big/ \left( 1 - \sum_{i=k-d-1}^{k-1} b_i^{(k)} \right),
 \qquad
 \bar v_k = v_k - \sum_{i=k-d-1}^{k-1} a_i^{(k)} \bar v_i,
\]
( )
1
1
( )
i=k?d?1
( )
i=k?d?1
This process is a truncated version of the biorthogonalization process described above since, in this case, we have (vk; ri) = (vi; rk) = 0 for i = k ? d ? 1; : : :; k ? 1 only, that is (vi; rj) = 0 for 1 ji ? j j d. Such a procedure is called an incomplete biorthogonalization. Applications will be given in the next subsection. Until now, starting from the vi and the ri we have constructed the vi and the ri such that 8j; (vj; ri) = 0. Thus, both families of vectors have been transformed into new ones. It is also possible to modify only one of the families, for example the ri, and to leave the other one unchanged. If the same biorthogonality property is still required between both families (assumed to be limited to m vectors), then each rk has to be written as a combination of all the ri and not only of rk and the previous ri. Thus, if m has to be increased, the new ri are not related to the olds ones in a simple fashion, a drawback limiting the practical use of this procedure (which is analogous to the Lagrange interpolation formula, while the biorthogonalization process described above is similar to Newton's; see [19, 40]). This is the reason why we shall now construct vectors rk of the form kX ? rk = rk ? i k ri 1
( )
i=0
such that, 8k
(vj ; rk) = 0 for j = 1; : : : ; k ? 1: Thus we must have, for j = 1; : : : ; k ? 1 kX ? (vj ; rk ) ? i k (vj ; ri) = 0: 1
But, since obtain
(vj ; ri)
( )
i=0
= 0 for j < i, the preceding system reduces to a lower triangular one and we 1 0 kX ? (2) i k = @(vi; rk ) ? j k (vi; rj)A /(vi; ri) : 1
( )
( )
j =1
The vectors rk
are given by the same determinantal formula as above. Such a process will be called a semi{biorthogonalization. Obviously, it is also possible to consider the vectors ! !, kX ? kX ? k k 1 ? i rk = rk ? i ri 1
1
( )
( )
i=0
i=0
with the i k still given by the same formula. In this case, the vectors xk such that rk = b ? Axk are obtained by ! !, kX ? kX ? k k 1 ? i xk = xk ? i xi ( )
1
( )
1
i=0
i=0
12
( )
Of course, a partial semi-biorthogonalization can also be considered. It consists of constructing the vectors $\bar r_i$ such that, $\forall k$, $(v_j, \bar r_k) = 0$ for $j = k-d-1, \ldots, k-1$, where $d$ is a fixed nonnegative integer. These vectors can be written in the form
\[
 \bar r_k = r_k - \sum_{i=k-d-1}^{k-1} \gamma_i^{(k)} \bar r_i.
\]
In that case, the matrix of the system giving the coefficients $\gamma_i^{(k)}$ for $i = k-d-1, \ldots, k-1$ is still lower triangular and the formula (2) holds with the sum going from $k-d-1$ to $k-1$. It is also possible to modify the process as above in order to be able to compute the corresponding vectors $\tilde x_i$. An application will be given in the next subsection.
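A minimal sketch of the two-sided Gram-Schmidt process of this subsection, in its modified (MGS-like) form, is given below (Python/NumPy; the vectors $v_i$ and $r_i$ are randomly generated illustrative data, and possible breakdowns $(\bar v_i, \bar r_i) = 0$ are simply assumed not to occur).

```python
import numpy as np

def biorthogonalize(V, R):
    """Two-sided Gram-Schmidt: returns Vb, Rb with (Vb[:,i], Rb[:,j]) = 0 for i != j.

    V, R : arrays of shape (n, m) whose columns are v_1..v_m and r_1..r_m.
    """
    Vb, Rb = V.astype(float).copy(), R.astype(float).copy()
    for k in range(1, V.shape[1]):
        for i in range(k):                       # modified (two-sided) sweep
            d = Vb[:, i] @ Rb[:, i]              # (vbar_i, rbar_i); assumed nonzero
            Rb[:, k] -= (Vb[:, i] @ Rb[:, k]) / d * Rb[:, i]
            Vb[:, k] -= (Vb[:, k] @ Rb[:, i]) / d * Vb[:, i]
    return Vb, Rb

rng = np.random.default_rng(2)
V = rng.standard_normal((7, 4))
R = rng.standard_normal((7, 4))
Vb, Rb = biorthogonalize(V, R)
G = Vb.T @ Rb
print(np.allclose(G, np.diag(np.diag(G))))       # Vb^T Rb is diagonal
```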
2.5 The Lanczos method and around
Let us give an application of the preceding biorthogonalization processes. We consider a system of $p$ linear equations in $p$ unknowns, $Ax = b$. The method of Galerkin (also often called Petrov-Galerkin, see [103]) for solving this system consists of constructing a sequence of approximate solutions $(x_n)$ such that
\[
 x_n = a_1 u_1 + \cdots + a_n u_n
\]
where $u_1, \ldots, u_n$ are arbitrary linearly independent vectors, and such that the residual vector $r_n = b - Ax_n$ is orthogonal to the $n$ linearly independent vectors $v_1, \ldots, v_n$. Writing down these conditions leads to
\[
 a_1 (Au_1, v_i) + \cdots + a_n (Au_n, v_i) = (b, v_i) \quad \text{for } i = 1, \ldots, n.
\]
Obviously, this method corresponds to an oblique projection. The coefficients $a_i$ are the solution of the system of dimension $n$ given above. Thus they will be easier to compute if the vectors $Au_i$ and the vectors $v_j$ have been previously biorthogonalized by the procedure described in the preceding subsection. Let us mention that, in some variants, it is the vector $x_n - u_0$, where $u_0$ is an arbitrary vector, which is considered. The method of Vorobyev [125] is a particular case of Galerkin's where $u_i = Au_{i-1}$, $i \geq 2$; see [21]. As explained in [107], many well-known methods for solving linear systems can be put into this framework, in particular the GCR [50, Chap. 5] (see also [49] and [52] for a generalization), ORTHOMIN [124], ORTHODIR [136], Axelsson's method [3] and GMRES [108]. A review can be found in [57] and [4]. These methods have connections with extrapolation methods, as explained in [111, 77].

As explained in [21], the method of Lanczos [87] for solving the system $Ax = b$ is a particular case of the method of Vorobyev [125]. It consists of constructing the sequence of vectors $(x_n)$ such that, $\forall n$,
1. $x_n - x_0 \in \mathrm{span}(r_0, Ar_0, \ldots, A^{n-1} r_0)$,
2. $r_n = b - Ax_n \perp F_n = \mathrm{span}(y, A^T y, \ldots, (A^T)^{n-1} y)$,
13
where $y$ and $x_0$ are arbitrary vectors and $r_0 = b - Ax_0$. Let $P_n r_0$ be the projection of $r_0$ on $E_n = \mathrm{span}(Ar_0, \ldots, A^n r_0)$ along $F_n^{\perp}$. We set $\bar r_n = r_0 - P_n r_0$. Multiplying both sides of the first condition by $A$, adding and subtracting $b$, it is easy to see that this vector is the same as the vector $r_n$ constructed by the Lanczos method, since $r_n \in r_0 + E_n$ and $r_n = r_0 - P_n r_0 \perp F_n$. We have
\[
 r_n = \frac{\begin{vmatrix} r_0 & Ar_0 & \cdots & A^n r_0 \\ (y, r_0) & (y, Ar_0) & \cdots & (y, A^n r_0) \\ \vdots & \vdots & & \vdots \\ ((A^T)^{n-1} y, r_0) & ((A^T)^{n-1} y, Ar_0) & \cdots & ((A^T)^{n-1} y, A^n r_0) \end{vmatrix}}{\begin{vmatrix} (y, Ar_0) & \cdots & (y, A^n r_0) \\ \vdots & & \vdots \\ ((A^T)^{n-1} y, Ar_0) & \cdots & ((A^T)^{n-1} y, A^n r_0) \end{vmatrix}}.
\]
When A = AT and y = r , the projection is orthogonal and we recover the minimization property of rn discussed in subsection 2.1. Thus, Pn r is the best approximation of r in En . The projector Pn can be represented by the matrix Pn (A) where Pn () is the nth polynomial (normalized by the condition Pn (0) = 1) belonging to the family of formal orthogonal polynomials (FOP) with respect to the linear functional c on the space of polynomials de ned by c() = ci = (y; Air ). The connection between FOP and the Lanczos method was already implicitly contained in [86]. The theory of FOP is fully developed in [13] where this connection is studied in more detail. Generalizations of the Lanczos method where Fn is an arbitrary subspace, instead of a Krylov one, are discussed in [20] and [21]. In that case, Pn can still be represented by Pn (A) where Pn () is a biorthogonal polynomial as de ned in [19]. Some properties of such generalizations are given in [89]. Let us now show how the results of the preceding subsection apply to the Lanczos method and its generalizations. We consider the case where 8i 1; ri = Ai? r . As we shall see below, it will not be necessary, in this case, to know the corresponding vectors xi such that ri = b ? Axi except x . From the formulae given in subsection 2.4 for the biorthogonalization and semi{biorthogonalization processes, we have for k 2 kX ? (3) rk = Ark? ? bik ri 0
0
0
0
1
0
0
1
1
( )
i=1
k
with r = r = r and where the bi are the coecients involved in any of these processes. It is easy to see, by induction, that rk = Pk? (A)r with kX ? Pk? () = k? ? bik Pi (): 1
1
0
( )
1
1
1
0
2
i=0
( ) +1
(4)
Let us mention that, usually, Pk? (A)r is denoted by rk? instead of rk. This shift in the index is due to the fact that our vectors ri are numbered from i = 1 instead of i = 0. 1
0
1
14
Since rk? is a linear combination of r ; : : : ; rk? and vice versa, rk? can be replaced, in relation (3), by rk? . Thus, k? can be replaced, in relation (4), by Pk? . Of course, the bik have to be changed to new coecients that will be denoted by i k . Thus we have kX ? (5) rk = Ark? ? i k ri 1
1
1
1
1
1
( )
2
( )
1
1
and
i=1
Pk? () = Pk? () ? 1
( )
2
kX ?2 i=0
i k Pi ():
(6)
( ) +1
These polynomials Pk form a family of (monic) biorthogonal polynomials. Such polynomials were introduced in [16] and they have been discussed at length in [19]. Let L ; L ; : : : be the linear functionals de ned on the space of polynomials by 0
1
Li(j ) = (vi ; Aj r ) for i; j = 0; 1; : : : : +1
0
It follows that the conditions (vi; rj) = 0 for i = 1; : : : ; j ? 1 can be written as
Li (Pj? ()) = 0 for i = 0; : : : ; j ? 2:
(7)
1
A family of polynomials satisfying such conditions 8j 2 with P an arbitrary nonzero constant is called a family of biorthogonal polynomials with respect to the linear functionals L ; L ; : : :. The biorthogonality conditions (7) determine the polynomials Pk apart from a multiplying factor since we have k conditions for the k + 1 coecients of Pk . So, one condition has to be added to the conditions (7). The preceding polynomials were taken to be monic but, later, a dierent normalization will chosen. Replacing, 8i, the vector vi by the vector vi (which is a linear combination of v ; : : :; vi) in the biorthogonality conditions leads to replacing the linear functional Li by the functional Li given as the linear combination of L ; : : :; Li with the same coecients. The polynomials Pk can be recursively computed by (6) but the length of the recurrence grows with k. If some relation holds between the vectors vi (that is between the linear functionals Li) then this recurrence relationship can simplify and its length can become independent of ?k1 (this is, for example, what happens for the FOP mentioned above which correspond to vi = AT y). Such cases will be studied below. As explained in subsection 2.4, we shall not be able, from the formula (5), to compute the corresponding vectors xk . Since, now, we have the product Ark? in the formula (5) for rk, it is necessary that ? Pk? k = 1 or, in other words, to consider the new vectors r given by 0
0
1
1
1
i
1 =1
i
1
( )
i
k
kX ?1 (k ) rk = 0 Ark?1 ? i(k)ri i=1
with Pik? i k = ?1 and we shall have 1 =1
( )
xk = ? k rk? ? ( ) 0
1
15
kX ?1 i=1
i k xi ( )
with r = r = b ? Ax and x = x . This normalization is equivalent to the normalization Pk? (0) = 1 for the biorthogonal polynomials and relation (6) has to be modi ed accordingly. Let us now show how to compute directly the coecients i k . Using the biorthogonalization process described above and its invariance property, we immediately obtain i k = (vi; Ark? ) (vi; ri) k where the vi are computed by the biorthogonalization scheme of subsection 2.4. Then, k is obtained by the formula : k =? k k 1 = + + kk? = k In fact, such a method builds up a pair of biorthogonal basis for span(r ; Ar ; : : :; Ak? r ) and span(v ; : : : ; vk). Let us now investigate the preceding biorthogonalization algorithms for two particular choices of the vectors vi. 0
1
0
0
1
1
( )
( )
1
( ) 0
( ) 0
( ) 0
( ) 1
( ) 0
( ) 1
( ) 0
0
0
1
0
1
Orthores
For the choice vk = rk = Ak? r , the algorithm Orthores [136] is recovered. Since the vectors rk are mutually orthogonal, this method terminates with a zero residual in at most p iterations where p is the dimension of the system. However, unless the matrix A has some speci c properties [80] (see [79] for a review) or a nonsingular matrix H such that HA = AT H is known [78], the recursion does not simplify and, thus, its practical use is quite limited. This is the reason why, as explained at the end of subsection 2.4, the recurrence was shortened thus leading to a truncated version of Orthores. For example, with d = 2, a three{term recurrence relationship is obtained for the vectors rk which are now orthogonal only to the two preceding ones. However, in some cases, such a truncated method works quite well. Moreover it only requires one matrix{by{vector multiplication per iteration. A numerical example will be given below. It seems that the idea of truncating was introduced by Vinsome [124]. The polynomials Pk which underline the algorithm Orthores satisfy, in the case where the algorithm is truncated, a truncated recurrence relationship of the form kX ? Pk () = Pk? () ? i k Pi() (8) 1
0
1
1
i=k?d?1
( +1) +1
for k d. For k < d the same relation holds but the sum starts from i = ?1 and P? () = 0 and P () = 1. Families of polynomials satisfying such a (d +2){term recurrence relationship were introduced by Van Iseghem [121] and they are called orthogonal of dimension d. An extension of the classical theorem of Shohat{Favard [112] says that, for any family of polynomials such that (8) holds, there exist linear functionals L ; ; Ld? such that, for k = nd + m with 0 m < d, we have Li(j Pk ) = 0 for j = 0; : : : ; n ? 1 and i = 0; : : : ; d ? 1 Li(n Pk ) = 0 for i = 0; : : : ; m ? 1: 1
0
0
16
1
These linear functionals are uniquely determined once the values of Li(j ) have been given for i = 0; : : : ; d ? 1 and j = 0; : : : ; i [122]. Orthogonal polynomials of dimension d are a particular case of biorthogonal polynomials when the linear functionals Li are related by
Li(j ) = Li d (j ): +1
+
(9)
The case d = 1 corresponds to the FOP discussed above (with the linear functionals c and L identical). Conversely, if the functionals Li are related by (9), then the biorthogonal polynomials satisfy a recurrence relationship of the form (8). This is, in particular, the case if vi d = AT vi for i = 1; 2; : : : with v ; : : :; vd (almost) arbitrary given vectors, that is if vi = AT vq for i = 1; 2; : : : where i = ld + q with 1 q d. If, as above, we set k = nd + m with 0 m < d, then it follows that 0
+
1
l
(vi; Aj rk ) = 0 for j = 0; : : :; n ? 1 and i = 1; : : : ; d (vi; Anrk ) = 0 for i = 1; : : : ; m: +1
+1
If n = 0, only the last relations hold with m = k. Thus, we obtain a generalization of the Lanczos method which could be related to the generalization of the method of moments discussed by Vorobyev [125, pp. 128] in a way similar to the connection, pointed out in [21], between the usual Lanczos method (which corresponds to d = 1) and the method of moments. The theory of such a generalization remains to be studied. Instead of truncating the recurrence relationship of the biorthogonal polynomials underlying the algorithm Orthores, another possibility would consist of using biorthogonal polynomials in the least{squares sense that is such that mX ? [Li(Pk )] 1
2
i=0
is minimized, where m k. Such polynomials were introduced in [23] in the particular case Li(P ) = L (i P ) for i = 0; 1; : : :. When m is xed, these polynomials can be recursively computed by the CRPA [14] for increasing values of k m. When m = k, they coincide with the biorthogonal polynomials de ned above. Using least{squares biorthogonal polynomials will lead to a least{squares Orthores algorithm. Such an algorithm has to be studied both from a theoretical and practical point of view. Finally, let us mention that a restarted version of Orthores can also be used. 0
Lanczos/Orthores
Let us now consider the particular case vk = AT ?1 y where y is an (almost) arbitrary vector. This is exactly the method of Lanczos described at the beginning of this subsection. It builds up a pair of biorthogonal basis for Kk (A; r ) and Kk (AT ; y). The recurrence relationship of the biorthogonalization process reduces itself to a three{term one and the Lanczos/Orthores algorithm is recovered. This algorithm is also directly recovered from Orthores in the particular case A = AT . Such a three{term recurrence relationship arises because the polynomials fPk g form a family of FOP as explained above. k
0
17
Breakdowns in this algorithm can be avoided by using a look{ahead strategy as introduced by Parlett, Taylor and Liu [99]. The technique described by Draux [43], which consists of jumping over the residuals responsible for a division by zero, can be applied. The other algorithms for implementing the Lanczos method (Lanczos/Orthodir, Lanczos/Orthomin, ...) can also be used [6, 30] and breakdowns and near{breakdowns (division by a quantity close to zero which causes numerical instability) are avoided by similar techniques [27, 28] or by building intermediate residuals [5]. Other solutions to the breakdown and near{breakdown problems can be found in the literature; see, for example, [7, 73, 74]. Since this question is not our main purpose here, we shall not refer more to it. Another possibility consists of taking rk = (AT A)k? r with r = AT (b ? Ax ) and, as in Orthores, chosing vk = rk . This is similar to applying the biorthogonalization procedure to the normal equations AT Ax = AT b. In this case, the procedure reduces to the following three{term recurrence relationship 1
0
0
0
rk = k AT r~k + k rk ? k rk? +1
+1
+1
+1
1
with r~k = Ark; r = r and = 0. The constants are given by 0
0
1
k =k = ?(~rk ; r~k )=(rk ; rk) +1
k =k = (~rk? ; r~k )=(rk? ; rk? )
+1
+1
+1
1
1
1
and k = 1=( k =k ? k =k ). We have +1
+1
+1
+1
+1
xk = ?k rk + k xk ? k xk? +1
+1
+1
+1
1
with x = x . Thus, we have obtained an algorithm slightly dierent from Lanczos/Orthores. Both algorithms need AT and they require two matrix{by{vector products. Let us give a numerical example comparing this Lanczos/Orthores algorithm on the normal equations with the truncated version of Orthores when d = 2. We consider the system of dimension p = 500 whose matrix is given by aii = 1; ai;i = ?1; ai ;i = 1 and a p = 1. With bi = 1; i = 1; : : : ; p ? 1 and bp = 2 the solution is x = (1; : : : ; 1)T . Figure 1 shows log krkk as a function of k starting from x = 0. We see that Lanczos/Orthores on the normal equations (the lowest curve) gives better results than the truncated Orthores algorithm. However, taking into account that it requires two matrix{by{vector products per iteration instead of one, the convergence of both methods is almost exactly the same. 0
0
+1
+1
1
10
0
3 Hybrid procedures Hybrid procedures for solving systems of linear equations were introduced in [26]. Let us assume that we are given two iterative methods for solving the system Ax = b which respectively produce the sequences of residual vectors (rn ) and (rn ). The hybrid procedure consists of constructing the new sequence of residual vectors (n ) by (1)
(2)
n = anrn + (1 ? an)rn (1)
(2)
18
Figure 2 2 0
log_10 (norm of residual)
-2 -4 -6 -8 -10 -12 -14 -16 0
5
10
15
20
25
30
35
40
iteration
where an is chosen to minimize knk. Such an an is given by rn ? r n ; r n : an = ? rn ? r n ; rn ? r n Thanks to this minimization property, we have kn k min krn k; krn k : It is easy to see that n ; n ? rn = n ; n ? rn = n; rn ? rn which shows that such a procedure enters into the framework of projections as de ned in subsection 2.1. We have rn rn (rn ? rn ; rn ) (rn ? rn ; rn ) n = 1 1 (rn ? rn ; rn ) (rn ? rn ; rn ) (1)
(1)
(2)
(2)
(2)
(1)
(1)
(2)
(2)
(1)
(2)
(1)
(1)
(2)
(1)
(2)
(1)
(1)
(2)
(2)
(1)
(2)
(1)
(1)
(2)
(2)
19
(2)
which shows that such a procedure also enters into the framework of extrapolations methods as de ned in subsection 2.2. It is also possible to use more than two iterative methods in an hybrid procedure and to consider the new sequence of residual vectors (nk ) de ned from the sequences (rn ); : : : ; (rnk ) by (1)
( )
( )
nk = an rn + + ank rnk ( )
(1) (1)
( ) ( )
with an + + ank = 1. The integer k is called the rank of the procedure. If ynk is the vector such that nk = b ? Aynk then ( )
(1)
( )
( )
( )
ynk = an xn + + ank xnk ( )
(1)
( )
(1)
( )
where the xni are related to the rni by rni = b ? Axni . If an ; : : :; ank are chosen to minimize knk k we recover the multiple hybrid procedure proposed in [26] and, for k = 2, the case treated above. However, before treating this case, let us consider the more general case where an ; : : :; ank are chosen such that k n ; vi = 0 for i = 1; : : :; k ? 1 where the vi are arbitrary vectors which can depend on n and k. From the result given at the end of subsection 2.2, we have rnk rn (v ; rn ) (v ; rnk ) ... ... (vk? ; rn ) (vk? ; rnk ) k (10) n = : 1 1 (v ; rn ) (v ; rnk ) ... ... (vk? ; rn ) (vk? ; rnk ) ( )
(1)
( )
( )
( )
( )
( )
(1)
( )
( )
(1)
1
1
( )
( )
(1)
(1)
1
( )
1
(1)
1
( )
1
( )
1
(1)
1
( )
This procedure will be called the general multiple hybrid procedure, in short the GMHP. Obviously, if one of the rni is equal to zero then nk is also zero. But we have in fact the more general following result ( )
( )
Theorem 6 8n; nk = 0 if and only if 9(a n ); : : : ; (akn ) such that 8n; a n + + akn 6= 0 and ( )
( ) 1
( )
( ) 1
a n rn + + akn rnk = 0: ( ) (1) 1
( ) ( )
Proof: If the condition of the theorem holds between the rni , then n a rn + + akn rnk ; vi = 0 ( )
( ) (1) 1
( ) ( )
20
( )
and the columns in the numerator of nk are dependent and the result follows. Conversely if nk = 0, a linear combination between the columns of the numerator of nk exists and we have the nresult. The condition a + + akn 6= 0 insures that the denominator of nk is dierent from zero. Obviously we have the ( )
( )
( )
( ) 1
( )
( )
Corollary 1 In IRp ; 8n; 9k p + 1 such that (nk) = 0.
In practice, for high values of k, a recursive algorithm is needed for computing the vectors . Such an algorithm exists when the vi are independent of k and when n is xed (thus the vi can depend on n). In (10), let us replace rn by rni ; : : : ; rnk by rni k? and denote by nk;i the corresponding ratio of determinants. Thus nk; is identical to nk given by (10). These vectors can be recursively computed by a variant of the CRPA [14] which is the following k? ;i k? ;i v ; k ? n k? ;i nk;i = nk? ;i ? ? n vk? ; nk? ;i ? nk? ;i n
nk
( )
(1) (
(
)
(
( )
(
1
( +
1)
(
1
1 )
( )
1)
(
)
( )
1 )
(
1 +1)
(
1 +1)
(
1 )
1 )
with n ;i = rni . Let us now consider the case where vi = rni ? rnk rn k (rn ? rn ; rn ) .. k? . k (r ? rn ; rn ) nk = n 1 k ( r ? r ; r ) n n n . . k? . k (rn ? rn ; rn )
for i = 1; : : : ; k ? 1. In this case, we have rnk (rn ? rnk ; rnk ) ... (rnk? ? rnk ; rnk ) : 1 (rn ? rnk ; rnk ) ... (rnk? ? rnk ; rnk )
From Remark 2, we also have rn ? ( r n rn ; rn ) ... k (r ? rnk? ; rn ) k n = n 1 (rn ? rn ; rn ) ... k (rn ? rnk? ; rn )
... (rnk ? rnk? ; rnk ) : 1 (rn ? rn ; rnk ) ... (rnk ? rnk? ; rnk )
(1 )
( )
( )
( )
(1)
(1)
( )
(
( )
1)
( )
(1)
(
( )
1)
( )
(1)
(1)
(1)
(1)
(1)
(1)
(2)
( )
( )
(
(2)
( )
1)
(1)
(
1)
(1)
(1)
(1)
(1)
21
( )
(1)
(
( )
1)
( )
(1)
(
( )
1)
( )
( )
( )
( )
( )
rnk (rn ? rn ; rnk ) ( )
(1)
(2)
( )
(
(2)
( )
1)
(1)
(
1)
( )
( )
( )
( )
From Theorem 5, we know that nk is the solution of the minimization problem ( )
a1min ;:::;ak ka1rn
+ + ak rnk k
(1)
( )
under the assumption a + + ak = 1. By Lemma 1, this problem can be written as nding the minimum of k k n ; n = ka rn + + ak1 rnk? + (1 ? a ? ? ak? )rnk k = ka (rn ? rnk ) + + ak? (rnk? ? rnk ) + rnk k 1
( )
( )
1
1
(1)
(
(1)
1)
1
( )
1
(
1)
1
( )
( ) 2
( ) 2
that is to nd the best approximation snk of ?rnk in span(rn ? rnk ; : : :; rnk? ? rnk ). We have nk = snk + rnk . Since the denominator of nk is the Gram determinant of the vectors rn ? rnk ; : : :; rnk? ? rnk , it is zero if and only if these vectors are dependent [40, Theorem 8.7.2, p.178]. By construction knk k min krn k; : : :; krnk k ( )
( )
( )
( )
(1)
( )
(
1)
( )
( )
( )
(1)
( )
(1)
( )
(
1)
( )
( )
and the gain is given by the second result of Theorem 5. Moreover, Theorem 6 and its Corollary still hold. From Theorem 3, we have (nk ; nk ? rnk ) = 0. This procedure is called the multiple hybrid procedure, in short the MHP. It was proposed in [26] and was studied in [1] where recursive algorithms for its implementation are discussed in the particular case rni = rn i? . The cheapest one is based on the H {algorithm proposed in [15] and studied in [29] and is as follows i i r ? r ; H n k n k? k i Hki = Hki ? H ? H k k rn k ? rn k? ; Hki ? Hki ( )
( )
+
( ) +1
( )
( )
1
+
( )
+
+
+
( ) )
1
( +1)
1
( +1)
( )
( )
with H j = rn j . For a xed index n, we obtain Hk = nk , the Hki for i 6= 0 being auxiliary vectors. With this particular choice of the vectors rni , the MHP is identical to the reduced rank extrapolation (RRE) due to Mesina [95] and Eddy [44] and studied in [111, 77]. Let us give, in the general case, a new expression for the vectors nk . This expression could serve as an algorithm for their computation. We have the ( ) 1
(0)
+
( )
( )
( )
( )
Theorem 7
Let vn(1); : : :; vn(k) be k vectors such that
(vni ; rn(j))
(
( )
Let us set
znk = ( )
if i 6= j if i = j:
=0 6= 0 k X i=1
vni : (vni ; rni )
22
( )
( )
( )
(11)
Then, the vectors (nk) of the MHP can be expressed as
nk
( )
,k k X i X k i j(znk ; rni )j = (zn ; rn )rn ( )
( )
( )
( )
i=1
and we have
(nk
( )
( )
2
i=1
; nk
( )
)=1
,X k i=1
(i)
j(znk ; rni )j ( )
( )
2
where the rn are the rn(i) orthonormalized (in any way).
Proof: If y = a rn + + ak rnk , then it is easy to check that (znk ; y) = a + + ak . The subset of span(rn ; : : :; rnk ) satisfying (znk ; y) = 1 is a linear variety of dimension k ? 1 at most. From 1
(1)
(1)
( )
( )
( )
1
( )
what precedes, we know that the vector nk obtained by the MHP is the element of this variety closest to the origin. The solution of this problem is known and is given by the formulae above (see [40, Theorem 9.4.3, page 228]). From the de nition (11) of znk , we have znk = znk? + vnk =(vnk ; rnk ) with zn = 0. Thus (znk ; rni ) = (znk? ; rni ) + (vnk ; rni )=(vnk ; rnk ): Let us now assume that 8i; rni is a linear combination of rn ; : : : ; rni only (such as given, for example, by the Gram{Schmidt process). Thanks to the orthogonality property of vnk with respect to rn ; : : : ; rnk? , we have (vnk ; rni ) = 0 for i = 1; : : : ; k ? 1 and thus (znk ; rni ) = (znk? ; rni ) , i = 1; : : : ; k ? 1. Thus, from the preceding theorem, we have k nk = X (znk ; rni )rni k kn k i kX ? (znk ; rni )rni + (znk ; rnk )rnk = ( )
( )
( )
(
1)
( )
( )
( )
(0)
( )
( )
(
1)
( )
( )
( )
( )
( )
( )
(1)
( )
( )
(1)
(
1)
( )
( )
( )
( )
( ) 2
= = k X i=1
j(znk
( )
( )
( )
( )
( )
( )
=1
1
and
( )
; rni )j2 ( )
i=1 kX ?1 i=1
( )
( )
( )
(znk? ; rni )rni + (znk ; rnk )rnk (
1)
( )
( )
( )
( )
( )
nk? + (z k ; r k )r k n n n knk? k (
(
= =
1)
( )
( )
( )
1) 2
kX ?1 i=1 kX ?1 i=1
j(znk ; rni )j + j(znk ; rnk )j ( )
( )
2
( )
( )
2
j(znk? ; rni )j + j(znk ; rnk )j : (
23
1)
( )
2
( )
( )
2
( )
(
1)
( )
Thus, we nally have
h i nk = knk k nk? =knk? k + (znk ; rnk )rnk ( )
(
( ) 2
1)
(
1) 2
( )
( )
( )
1 + j(z(k); r(k) )j2: n n k k kn ?1)k2 (i) In practice, if the r are obtained from the rn(i) by the classical Gram{Schmidt process then the n (i) (i) (i) vectors xn such that rn = b ? Axn cannot be computed from the rn(i) without using A?1 and 1
knk
( ) 2
=
(
the preceding algorithm has no interest. Below, we shall give a modi cation of the Gram{Schmidt i process which avoids this drawback. However, even if the vectors xn are known, the vectors ynk such that nk = b ? Aynk cannot be obtained without using A? since the sum of the coecients in the formula of Theorem 7 for nk is not equal to 1. This second drawback can be avoided by assuming that the rni are mutually orthogonal and we have the ( )
( )
( )
( )
1
( )
( )
Theorem 8 If 8i = 6 j; (rni ; rnj ) = 0 then ( )
( )
,X k i 1 r n = i i i i ; i (rn ; rn ) i (rn ; rn ) ,X k k i X x 1 n k yn = i i i i i (rn ; rn ) i (rn ; rn ) k X
nk
( )
=1
( )
( )
=1
( )
( )
( )
( )
=1
and
( )
( )
( )
=1
( )
( )
k X 1 : 1 = k i k (n ; n ) i (rn ; rni ) ( )
( )
=1
( )
( )
Proof: If the rni are orthogonal, it is well known that the rni = rni =krni k are orthonormal. From the de nition (11) of znk , we immediately have (znk ; rni ) = 1=krni k and the results follow. ( )
( )
( )
( )
( )
( )
( )
( )
This result, which is independent of the choice of the vectors vni , is a generalization of a result obtained by Weiss [128, Lemma 4.11, Theorem 4.1] (see also [130, Theorem 3.1] or [129, Lemma 11, Theorem 12]) in the case of the minimal residual smoothing (in short the MRS). This method, which will be explained in detail in subsection 4.3, consists of setting nk = nk? + (1 ? )rnk and choosing the value of which minimizes knk k . The results of Weiss follow directly from the orthogonality of the vector rnk with the vector nk? which is a linear combination of rn ; : : : ; rnk? . Obviously, from the preceding theorem, we have 1 1 1 k k = k? k? + k k : (n ; n ) (n ; n ) (rn ; rn ) Thus we obtain the following recursive algorithm for implementing the MRS. ( )
( )
(
1)
( )
( ) 2
( )
( )
( )
(
(
1)
1)
(
24
(1)
1)
( )
( )
(
1)
For k = 2; 3; : : :, and n = 0; 1; : : : 1 1 + 1 = knk k knk? k" krnk k k? k # r n n k k n = kn k + k k? k krn k # n k " k? k ynk = knk k ynk? + xnk kn k krn k ( ) 2
(
( )
1) 2
( ) 2
(
( ) 2
(
( )
1)
1) 2
(
( ) 2
(
( )
( ) 2 ( )
1)
1) 2
( ) 2
with n = rn and yn = xn . For using the preceding algorithm, we are now faced to the problem of constructing mutually i orthogonal vectors rn from the vectors rni . As mentioned above, the problem with the usual i such that r i = b ? Ax i Gram{Schmidt process is that it is not possible to recover the vectors x n n n from the rni without using A? . This is the reason why we shall now modify the Gram{Schmidt procedure in order to avoid this problem. We shall look for mutually orthogonal vectors rni given by rnk = dk r~nk with kX ? r~nk = rnk ? bik rni (1)
(1)
(1)
(1)
( )
( )
( )
( )
( )
1
( )
( )
( )
( )
1
( )
( )
( ) ( )
i=1
and r~n = rn . Obviously dk and the bik also depend on n. We have ! ? kX k i k k k rn = b ? Axn = dk b ? Axn ? bi b ? Axn : (1)
( )
(1)
( )
( )
1
( )
( )
( )
i=1
Thus, it follows
xnk ( )
= dk xnk
( )
if and only if
dk = 1
,
?
kX ?1
1?
bik x(ni) ( )
i=1 kX ?1 i=1
bi
k
( )
!
!
and d = 1. Thus, it is easy to check that these vectors are obtained by the following modi cation of the Gram{Schmidt process (compare with the modi cation given in subsection 2.4) 1. set rn = rn , 2. for k = 2; 3; : : : compute kX ? (r k ; r i ) n n ri k k r~n = rn ? i i n ( r n ; rn ) i , kX ? (r k ; r i ) ! n n k k rn = r~n 1? i i : i (rn ; rn ) 1
(1)
(1)
( )
1
( )
=1
( )
( )
( )
( )
( )
1
( )
=1
25
( )
( )
( )
( )
( )
Of course, this algorithm breaks down if 1 ? Pik? (rnk ; rni )=(rni ; rni ) = 0. In that case, it is possible to skip over the residual rnk which is responsible for the breakdown and to consider the following one. The procedure can be repeated until a residual which does not produce a breakdown has been found. A smoothing algorithm similar to the preceding one can be applied even if the vectors rnk are not mutually orthogonal and we have for k = 2; 3; : : :, and n = 0; 1; : : : nk = krnk k 1 = 1 + 1 nk 2 nk? "2 nk 2 k? k # r 2 k k n n n = n k? 2 + k 2 n n " k? k # x y 2 n n k k + yn = n nk? 2 nk 2 with n = rn ; yn = xn and n = krn k. Thus, for this algorithm, the quantities nk are no longer equal to the quantities knk k (except for k = 1 or when the rni are mutually orthogonal) and the relation with the MHP no longer holds. When n = 0 and the r k are the successive residual vectors produced by an arbitrary iterative method, then the general quasi{minimal residual smoothing procedure (in short the QMRSm) introduced by Zhou and Walker [137] is recovered (an extra m was introduced in the name of this procedure for avoiding confusion with the QMR Squared algorithm). In particular, if the iterates r k are those given by the biconjugate gradient algorithm (BCG) then, Zhou and Walker [137] proved that the k given by the QMRSm are the iterates of the QMR of Freund and Nachtigal [58] and the same type of results holds (see eq. (3.1.9) of [137]). This result was implicitly contained in [59] where it is shown that the QMR iterates can be obtained from those of the BCG (see also [60] for a shortened version). The connection between both processes was also pointed out in [58]. However, in these papers, the relationships with the QMRSm kwas not explicitly mentioned. When the matrix A is symmetric positive de nite and the vectors r are obtained by the conjugate gradient algorithm (they are mutually orthogonal) then the theoretical results given above are recovered, nk = knk k and the procedure reduces to the MRS. Zhou and Walker [137] also proved that, if the QMRSm is applied to the algorithm CGS of Sonneveld [113] then the TFQMR of Freund [56] is recovered and that the QMRCGSTAB [33] can be similarly obtained from the Bi{CGSTAB of Van Der Vorst [120]. A synthetic review of these results and the connections between these algorithms is presented in [126]. Applying the look{ahead version of the Bi{CGSTAB described in [25] to the Example 1 of [26] leads to the results presented in Figure 2, the lowest curve corresponding to the QMRSm. In practice, there often exists a discrepancy between the residual vectors rnk and their actual values b ? Axnk which causes the vectors nk computed recursively to dier signi cantly from the exact residuals b ? Aynk . A mathematically equivalent algorithm for avoiding this drawback was formulated by Zhou and Walker [137, Algo. 3.2.2]. This algorithm also applies in our case. For treating the case of several iterative methods, it is possible, instead of the MHP to use the 1 =1
( )
( )
( )
( )
( )
( )
( )
( )
( )
(
( )
( )
( )
(1)
(1)
(1)
(1)
(1)
1)
( )
( )
(
1)
( )
(
1)
( )
(
1)
(
1)
( )
( )
(1)
( )
( )
( )
( ) 0
( ) 0
( ) 0
( ) 0
( )
( )
( )
( )
( )
( )
26
Figure 3 0 -1
log 10 (norm of the residual)
-2 -3 -4 -5 -6 -7 -8 -9 0
10
20
30
40
50
60
70
80
90
100
nk
hybrid procedure of rank 2 in cascade as explained in [26]. This procedure (in short the CHP) consists of applying the hybrid procedure of rank 2 to rn and rn . Denoting by n the result, the same procedure is then applied to n and rn . Denoting the result by n , the same procedure is applied again to n and rn and so on. Thus, the CHP is given by, for k = 2; 3; : : : nk = nk nk? + (1 ? nk )rnk . k? n ? rnk ; nk ? rnk and n = rn . with nk = ? nk? ? rnk ; rnk The MRS and the CHP are related by the (1)
(2)
(3)
(
1)
(2)
(3)
(4)
( )
( )
(2)
(3)
( )
( )
( ) (
(
1)
1)
( )
( )
( )
Theorem 9
( )
( )
(1)
(1)
If the vectors rn(i) are mutually orthogonal, then the MRS and the CHP are identical. Proof: (nk?1) is a linear combination of rn(1); : : : ; rn(k?1). Thus (nk?1) is orthogonal to rn(k). It follows from the de nition of (nk) in the CHP that (nk) = (rn(k); rn(k))=[((nk?1); (nk?1)) + (rn(k); rn(k))] and we obtain (see [26, line 1 of page 4]) ((nk); (nk)) = (nk)((nk?1); (nk?1)): Moreover 1 ? (nk) = ((nk); (nk))=(rn(k); rn(k)) and the MRS is recovered.
27
4 Semi{iterative methods For solving the system Ax = b let us use an iterative method producing the sequence of iterates (xn). We set rn = b ? Axn and we consider the choice
rni = rn ( )
for i = 1; : : : ; k:
i?1
+
Combining these vectors by the GMHP described in the preceding section leads to a new residual nk . If vi = rn i? ? rn k? for i = 1; : : : ; k ? 1 we are in the case of the MHP (that is, in other words, the RRE of Mesina [95] and Eddy [44]), and nk is the vector whose norm is minimum in span(rn; : : : ; rn k? ). Thus the GMHP and the MHP lead to semi{iterative methods in the sense explained in [123, Chap.5]. They will be respectively called SIGMHP and SIMHP where SI stands for semi{iterative. If n is xed and k increases then for k p +1, the dimension of the system, the exact solution is obtained (see Corollary 1). Obviously the case where k is xed and n increases is easier to implement since it always makes use of the same xed number of consecutive iterates of the original method. As proved in [2, Theorem 2.7], we have for the SIMHP of rank 2 ( )
+
1
+
1
+
1
( )
Theorem 10
(n ; n ) = (rn ; rn ) sin n where n is the angle between rn ? rn and rn . Thus the sequence (n ) converges to zero faster than the sequence (rn ) in the sense that (2)
(2)
+1
+1
+1
2
(2)
+1
+1
nlim !1 kn
(2)
k=krn k = 0 +1
if and only if (n) tends to 0 or .
We also recall that
(12) (n ; n ) = (rn ; rn ) ? (r(rn ? ?r r; nr; rn ?)r ) : n n n n Let us now look at the acceleration properties of the SIMHP for some particular sequences (xn). (2)
(2)
+1
+1
+1
+1
4.1 Projection methods
Let us consider the case
$$x_{n+1} = x_n + \lambda_n z_n$$
and
$$r_{n+1} = r_n - \lambda_n A z_n$$
where $z_n$ is an arbitrary vector. If we take
$$\lambda_n = (u_n, r_n)/(u_n, A z_n)$$
where $u_n$ is an arbitrary vector such that the denominator of $\lambda_n$ does not vanish, then
$$(u_n, r_{n+1}) = 0$$
which shows, using the results of section 2.1, that $r_{n+1}$ is the oblique projection of $r_n$ on $u_n^{\perp}$ along $A z_n$. A general framework and convergence results for such methods were given by Beuneu [10]. Let us remark that the above formulae for $r_{n+1}$ and $\lambda_n$ are quite similar to one of the recurrence relationships used in some algorithms for implementing the Lanczos method [30], such as Lanczos/Orthodir [136]. This relation is
$$P_{n+1}(\xi) = P_n(\xi) - \lambda_n \xi P_n^{(1)}(\xi) \qquad (13)$$
with $P_0(\xi) = P_0^{(1)}(\xi) = 1$ and $\lambda_n = c(U_n P_n)/c^{(1)}(U_n P_n^{(1)})$, where $U_n$ is an arbitrary polynomial of exact degree $n$ and $\{P_n\}$ and $\{P_n^{(1)}\}$ are the families of FOP with respect to the linear functionals defined by $c(\xi^i) = (y, A^i r_0)$ and $c^{(1)}(\xi^i) = (y, A^{i+1} r_0)$. If we take $z_n = P_n^{(1)}(A) r_0$ and $u_n = U_n(A^T) y$, then, since $r_n = P_n(A) r_0$, we exactly obtain the relation used for the Lanczos method. All the possible relationships for computing recursively these two families of FOP, and thus implementing the Lanczos method, have been studied in [6].

Many well-known methods enter into this framework according to the choices of the vectors $u_n$ and $z_n$. Let us consider some of them.

$u_n = A z_n$. This choice minimizes $\|r_{n+1}\|$ and we obtain $\|\rho_n^{(2)}\| = \|r_{n+1}\|$. Thus, the SIMHP of rank 2 does not accelerate the convergence. This result is not surprising to those working on convergence acceleration methods, since a result due to Pennacchi [100] (see also [66]) states that the set of linearly converging scalar sequences cannot be accelerated by a method using only two successive iterates and that it is necessary, for this purpose, to make use of an algorithm needing three iterates. Among such algorithms, Aitken's process is optimal in three different senses, as proved by Delahaye [41] (see also [42, p. 213]). This is the reason why methods with a higher rank have to be studied and, in particular, the vector generalizations of Aitken's process (see, for example, [90]), or the method given in [22].

$z_n = A^T u_n$. In that case, $x_{n+1}$ is the orthogonal projection of $x_n$ along $A^T u_n$. From Pythagoras' theorem we have
$$\|x_{n+1} - x\| \leq \|x_n - x\|.$$
If the vector $u_n$ is chosen so that $(u_n, r_n) = \|r_n\|$, where $\|\cdot\|$ is an arbitrary vector norm, then we recover the method of norm decomposition introduced by Gastinel [62] for the $L_1$-norm and extended to other vector norms in [63] (see also [64]). Other choices of the vectors $u_n$ lead to the processes of Kaczmarz [81] (when $u_n = e_i$, the $i$th unit vector, with $i = n + 1 \pmod{p}$) and Cimmino [37], as explained in [65, pp. 168-174].
$u_n = z_n$. This choice leads to a generalization of the Lanczos method [20] which does not, in general, terminate in a finite number of iterations. It corresponds, in relation (13), to the choice $U_n \equiv P_n^{(1)}$ in the expression of $\lambda_n$. In the particular case of the Lanczos method, the exact solution is obtained in a finite number of iterations not greater than the dimension of the system. If $A$ is symmetric positive definite, then the conjugate gradient algorithm is recovered and, moreover, $(r_n, A^{-1} r_n) = (e_n, A e_n)$ is minimized [8, p. 14]. When the symmetric part $(A + A^T)/2$ of the real matrix $A$ is positive definite, the conjugate gradient was generalized by Concus and Golub [38] and Widlund [131]. As proved by Eisenstat [48], the iterates constructed by this algorithm are the best approximants to the solution from certain affine subspaces (which are not the affine Krylov subspaces). When $A = A^T$ and $u_n = z_n = r_n$ the method of steepest descent is obtained.

$u_n = r_n$. This choice also leads to a generalization of the Lanczos method [20] which, in general, does not terminate in a finite number of iterations. It corresponds, in relation (13), to the choice $U_n \equiv P_n$ in the expression of $\lambda_n$. For this generalization of the Lanczos method, since we have $(r_{n+1}, r_n) = 0$, it follows that $(r_{n+1} - r_n, r_{n+1}) = (r_{n+1}, r_{n+1})$ and $(r_{n+1} - r_n, r_{n+1} - r_n) = (r_{n+1}, r_{n+1}) + (r_n, r_n)$. From (12), we have
$$(\rho_n^{(2)}, \rho_n^{(2)}) = \frac{(r_{n+1}, r_{n+1})}{1 + (r_{n+1}, r_{n+1})/(r_n, r_n)}$$
and thus there exists $K < 1$ such that, for all $n$, $(\rho_n^{(2)}, \rho_n^{(2)})/(r_{n+1}, r_{n+1}) \leq K^2$ if and only if there exists $M > 0$ such that, for all $n$, $M \leq \|r_{n+1}\|/\|r_n\|$. In that case, $\|\rho_n^{(2)}\|$ is strictly smaller than $\|r_{n+1}\|$ and their ratio cannot tend to 1, which shows that the SIMHP of rank 2 can be interesting. Of course, if there exists $M > 0$ such that $\lim_{n \to \infty} \|r_{n+1}\|/\|r_n\| = M$, then $\lim_{n \to \infty} \|\rho_n^{(2)}\|/\|r_{n+1}\| = K = (1 + M^2)^{-1/2} < 1$ and, in that case, the convergence is improved (but not accelerated). Obviously, $K = 0$ if and only if $M = \infty$. Denoting by $\varphi_n$ the angle between $r_n$ and $A z_n$ and, as above, by $\theta_n$ the angle between $r_{n+1} - r_n$ and $r_{n+1}$, it is easy to see that $\sin \theta_n = \cos \varphi_n$ and $\|r_{n+1}\|/\|r_n\| = |\tan \varphi_n|$. Thus $\theta_n = 0$ if and only if $(r_n, A z_n) = 0$, which shows that $z_n$ has to be chosen as close as possible to orthogonality to the vector $A^T r_n$. This condition cannot be satisfied for the Lanczos method even if $A$ is symmetric. Indeed, in that case, since $z_n = P_n^{(1)}(A) r_0$, we have, with the choice $y = r_0$, $(r_n, A z_n) = c^{(1)}(P_n P_n^{(1)})$, which is different from zero unless a breakdown occurs in the algorithm. Again, it will be interesting to study methods with a higher rank. Another type of procedure is described in [22].

$z_n = r_n$. In this case, $r_{n+1} = (I - \lambda_n A) r_n$ and, whatever the $\lambda_n$ may be, we have $r_{n+1} = p_{n+1}(A) r_0$ with $p_{n+1}(\xi) = \prod_{i=0}^{n} (1 - \lambda_i \xi)$. We also have $e_{n+1} = (I - \lambda_n A) e_n$ and $e_{n+1} = p_{n+1}(A) e_0$. Such a procedure is usually called a first order Richardson method since the polynomials $p_n$ are related by a difference equation of the first order, $p_{n+1}(\xi) = (1 - \lambda_n \xi) p_n(\xi)$ with $p_0(\xi) = 1$. This method goes back to Richardson [102] and it has been discussed in many textbooks, see [76] for example. Extensions will be discussed in subsection 4.2.
There are many ways of choosing the parameters $\lambda_n$, that is, the polynomials $p_n$ among all the polynomials with $p_n(0) = 1$. When the spectrum of $A$ is known to belong to some bounded region of the complex plane, $p_n$ is usually selected in order to minimize the maximum norm over that region, an idea due to Young [133]. In the particular case where $A$ is symmetric positive definite, it is well known that the $p_n$ are related to the Chebyshev polynomials. Another possibility is, at each step of the procedure, to choose $\lambda_n$ minimizing $\|r_{n+1}\|$. Such a $\lambda_n$ is given by
$$\lambda_n = (r_n, A r_n)/(A r_n, A r_n)$$
which means that it corresponds to the choice $u_n = A r_n$ in the projection method considered at the beginning of this subsection. This method is discussed in detail in [22]. Let us mention that a quite similar idea was exploited by Van Der Vorst [120] in his Bi-CGSTAB, where the residual $r_n$ is defined by $r_n = p_n(A) P_n(A) r_0$, where $P_n$ is the FOP involved in the Lanczos method (see subsection 2.5), $p_{n+1}(\xi) = (1 - \lambda_n \xi) p_n(\xi)$, and where $\lambda_n$ is chosen to minimize $\|r_{n+1}\|$. Other choices of $\lambda_n$ were proposed in [25].
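A sketch of the first order Richardson method with $z_n = r_n$ and $\lambda_n$ chosen to minimize $\|r_{n+1}\|$, that is $\lambda_n = (r_n, A r_n)/(A r_n, A r_n)$ as above; tolerances and names are illustrative, and the loop is only a plain implementation of the two recurrences of this subsection.

```python
import numpy as np

def minimal_residual_richardson(A, b, x0, tol=1e-10, maxit=500):
    """First order Richardson iteration with z_n = r_n and
    lambda_n = (r_n, A r_n)/(A r_n, A r_n), the value minimizing ||r_{n+1}||."""
    x = x0.copy()
    r = b - A @ x
    for _ in range(maxit):
        if np.linalg.norm(r) <= tol:
            break
        Ar = A @ r
        lam = (r @ Ar) / (Ar @ Ar)
        x = x + lam * r            # x_{n+1} = x_n + lambda_n r_n
        r = r - lam * Ar           # r_{n+1} = r_n - lambda_n A r_n
    return x, r
```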
4.2 Stationary and nonstationary iterative methods
Let us now consider the case where the nonsingular system $Ax = b$ is solved by a first-order (or one-step, or first-degree) stationary iterative method of the form
$$x_{n+1} = T x_n + c$$
with $x_0$ arbitrary. Assuming that $I - T$ is nonsingular and setting $G = A(I - T)^{-1}$, it follows that $c = G^{-1} b$ and $T = I - G^{-1} A$. Thus the method corresponds to the splitting $A = G - GT$ of the matrix $A$. We have
$$AT = A(I - G^{-1} A) = A - A G^{-1} A = (I - A G^{-1}) A.$$
Since $x - x_{n+1} = T(x - x_n)$ and $r_n = b - A x_n = A(x - x_n)$, it follows that $r_{n+1} = AT(x - x_n)$ and we obtain $r_{n+1} = B r_n$ with $B = I - A G^{-1}$. Let us remark that the usual splitting $A = M - N$, with $M$ nonsingular and $T = M^{-1} N$, leads to $G = M$ and $B = N M^{-1}$. Thus, we have $r_{n+i} = B^i r_n$ and the SIGMHP (defined at the beginning of this section) can be written as
$$\rho_n^{(k)} = P_{k-1}^{(n)}(B) r_n$$
with
$$P_{k-1}^{(n)}(\xi) = \begin{vmatrix} 1 & \xi & \cdots & \xi^{k-1} \\ (v_1, r_n) & (v_1, r_{n+1}) & \cdots & (v_1, r_{n+k-1}) \\ \vdots & & & \vdots \\ (v_{k-1}, r_n) & (v_{k-1}, r_{n+1}) & \cdots & (v_{k-1}, r_{n+k-1}) \end{vmatrix} \Bigg/ \begin{vmatrix} 1 & 1 & \cdots & 1 \\ (v_1, r_n) & (v_1, r_{n+1}) & \cdots & (v_1, r_{n+k-1}) \\ \vdots & & & \vdots \\ (v_{k-1}, r_n) & (v_{k-1}, r_{n+1}) & \cdots & (v_{k-1}, r_{n+k-1}) \end{vmatrix}.$$
If, as in subsection 2.5, we define the linear functionals $L_i$ on the space of polynomials by
$$L_i(\xi^j) = (v_i, r_j)$$
then this polynomial satisfies
$$L_i\bigl(\xi^n P_{k-1}^{(n)}(\xi)\bigr) = 0 \qquad \text{for } i = 1, \ldots, k-1$$
which shows that it is a biorthogonal polynomial in the sense defined in [19]. Since, now, the normalization is $P_{k-1}^{(n)}(1) = 1$, we can write
$$P_{k-1}^{(n)}(\xi) = 1 - (1 - \xi) Q_{k-2}^{(n)}(\xi)$$
where $Q_{k-2}^{(n)}$ is a polynomial of degree $k-2$ at most. It is easy to see that $GT = BG$ and $A = (I - B)G$. Thus it holds (compare with [96, Prop. 1.4.1])
$$y_n^{(k)} = x_n + G^{-1} Q_{k-2}^{(n)}(B) r_n \qquad \text{and} \qquad e_n^{(k)} = A^{-1} P_{k-1}^{(n)}(B) A e_n.$$
Since $AT = BA$, it follows that $A^{-1} B^i A = T^i$ and
$$e_n^{(k)} = P_{k-1}^{(n)}(T) e_n.$$
Sometimes, the residual vectors are taken as $r'_n = c + T x_n - x_n$. In that case we have $r'_{n+1} = T r'_n$, which is similar to the relation $e_{n+1} = T e_n$ for the errors. It follows that $r'_n = G^{-1} r_n$. In the case where $T = I - A$, we have $G = I$ and thus $r'_n = r_n$ and $T = B$. Obviously the SIGMHP can be applied to the sequence $(r'_n)$ and analogous results are obtained. However, it must be noticed that, in general, the polynomials $P_{k-1}^{(n)}$ corresponding to both cases are not the same. In the case of the SIMHP, such a procedure will be called a minimal residual semi-iterative method, thanks to its minimization property as explained above.

Let us mention that, from the practical point of view, the computation of $\rho_n^{(k)}$ becomes quite costly in the general case when $k$ increases and that it requires the storage of $k$ vectors. In the particular case where a recurrence relationship of order $d+1$ exists among the polynomials $P_k^{(n)}$ when $n$ is fixed, then it is no longer necessary to compute first the residuals $r_n, \ldots, r_{n+k-1}$ and, moreover, it is only necessary to store the $d+1$ vectors $\rho_n^{(k-1)}, \ldots, \rho_n^{(k-d-1)}$. Let us assume that, for $n$ fixed, the polynomials $P_k^{(n)}$ form a family of formal orthogonal polynomials of dimension $d$. Thus, as proved in [121], they satisfy a recurrence relationship of order $d+1$ which, due to the normalization $P_k^{(n)}(1) = 1$, will be written as
$$P_{k+1}^{(n)}(\xi) = a^{(k,n)} \left[ \xi + \beta_0^{(k,n)} \right] P_k^{(n)}(\xi) - \sum_{i=1}^{d} \beta_i^{(k,n)} P_{k-i}^{(n)}(\xi)$$
with $P_i^{(n)}$ identically zero for $i < 0$ and $P_0^{(n)}(\xi) = 1$. Replacing the variable $\xi$ by the matrix $T$ and multiplying by the vector $e_n$ gives
$$e_n^{(k+1)} = a^{(k,n)} \left[ T + \beta_0^{(k,n)} I \right] e_n^{(k)} - \sum_{i=1}^{d} \beta_i^{(k,n)} e_n^{(k-i)}.$$
Using the relations $Tx = x - c$ and $P_{k+1}^{(n)}(1) = a^{(k,n)} (1 + \beta_0^{(k,n)}) - \sum_{i=1}^{d} \beta_i^{(k,n)} = 1$, it is easy to see that
$$y_n^{(k+1)} = a^{(k,n)} \left[ T y_n^{(k)} + c + \beta_0^{(k,n)} y_n^{(k)} \right] - \sum_{i=1}^{d} \beta_i^{(k,n)} y_n^{(k-i)}$$
with $y_n^{(i)} = 0$ for $i < 0$ and $y_n^{(0)} = x_n$. Thus, the method is a $(d+1)$-step iterative method. It is stationary if the coefficients $a^{(k,n)}, \beta_0^{(k,n)}, \ldots, \beta_d^{(k,n)}$ are independent of $k$; otherwise the method is nonstationary. The recurrence relationships between the two adjacent families of formal orthogonal polynomials of dimension $d$, $\{P_k^{(n)}\}$ and $\{P_k^{(n+1)}\}$, given in [122], lead to recurrences between the $e_n^{(k)}$ and the $e_{n+1}^{(k)}$. There exists an extensive literature on these methods (also known as Richardson iterative methods) which cannot be quoted here. It seems that stationary $k$-step methods were introduced by Kublanovskaya [85]. Such methods can also be applied to the solution of nonlinear equations as, for example, in [75].

Let us consider the formal series expansion
$$x(\lambda) = (I - \lambda T)^{-1} c = \sum_{i=0}^{\infty} \lambda^i T^i c.$$
Then $x(1) = x$ and the computation of $x$ is equivalent to computing the limit of the series for $\lambda = 1$, if it converges, or its antilimit. So, extrapolation methods (also called convergence acceleration methods) can be used for that purpose. In [91], Mann proposed to use linear summation processes (see the review paper [92]) and the idea was exploited by Germain-Bonne in his thesis [67]. In [97], the authors study Euler's summation process in connection with stationary iterative methods. Obviously, other convergence acceleration processes could also be of interest (see [61] for numerical experiments) and, in particular, those based on rational approximations of formal power series. In this respect, Pade approximants command attention. Such approximants are rational functions whose expansion in ascending powers of the variable agrees with the series to be approximated as far as possible, that is up to the term of degree $r + s$ inclusively, where $r$ and $s$ are the degrees of the numerator and the denominator of the approximant respectively; see [31] for a review and [32] for an introduction.
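Before turning to these rational summation processes, here is a small sketch forming the partial sums of the Neumann series above at $\lambda = 1$, which converge to $x$ when the iteration matrix is a contraction; the test matrix and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 5
T = 0.1 * rng.standard_normal((p, p))         # small norm, so the series converges at lambda = 1
c = rng.standard_normal(p)
x_exact = np.linalg.solve(np.eye(p) - T, c)   # x = (I - T)^{-1} c

s, term = np.zeros(p), c.copy()
for _ in range(200):                          # partial sums of sum_i T^i c
    s = s + term
    term = T @ term
print(np.linalg.norm(s - x_exact))            # small: the partial sums approach x(1) = x
```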
Since the coefficients of the Neumann series for $x(\lambda)$ are vectors, it seems appropriate to use the vector Pade approximants of Van Iseghem [121] or those of Graves-Morris [71] (see [72] for a review). The theoretical properties of such processes for summing up the series $x(\lambda)$ still have to be studied. The vector Pade approximants of Van Iseghem [121] are related to her formal vector orthogonal polynomials of dimension $d$, which satisfy a recurrence relationship of order $d+1$ as discussed above. Of particular interest is the case $d = 1$, which corresponds to a three-term recurrence relationship, that is, to the usual formal orthogonal polynomials and to second-degree iterative methods. Such methods were introduced by Frankel [55] and investigated by Stiefel [115] (see also [54, pp. 234-235]) and Young in [134, Chap. 16, pp. 486-494] and [135]; see also [53, Chap. 9] and [82]. It must be noticed that the Neumann series for $x(\lambda)$ is such that each vector coefficient is obtained from the preceding one by multiplication by the matrix $T$. The stationary case was studied in detail in [84, 83]. As proved by Cheney [36, Problem 15, page 114], the only polynomials satisfying a three-term recurrence relationship with constant coefficients are related, in a simple way, to the Chebyshev polynomials. Thus polynomials related by a recurrence of order $d+2$ with constant coefficients could be considered as a generalization of the Chebyshev polynomials. They remain to be studied. It must also be mentioned that, for studying the nonstationary case, it should be possible, when the coefficients of the three-term recurrence relationship converge to some limits, to use the idea of Van Assche [118], which consists of comparing these polynomials with the Chebyshev polynomials by a perturbation technique (see also [119]).
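As an illustration of the case $d = 1$, here is a sketch of a stationary second-degree (two-step) method in one common form, $x_{n+1} = x_n + \alpha (c + T x_n - x_n) + \beta (x_n - x_{n-1})$. The constant parameters $\alpha$ and $\beta$ and the starting convention are illustrative only; the optimal, Chebyshev-based choices are those referred to above.

```python
import numpy as np

def second_degree_iteration(T, c, x0, alpha, beta, maxit=200):
    """Stationary two-step (second-degree) iteration, a sketch of one common form:
    x_{n+1} = x_n + alpha*(c + T x_n - x_n) + beta*(x_n - x_{n-1})."""
    x_prev, x = x0.copy(), x0.copy()       # start with x_{-1} = x_0 (illustrative convention)
    for _ in range(maxit):
        x_new = x + alpha * (c + T @ x - x) + beta * (x - x_prev)
        x_prev, x = x, x_new
    return x
```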
4.3 The minimal residual smoothing method
Let us consider an iterative method producing the sequence of residual vectors $(r'_n)$. The minimal residual smoothing method (MRS) introduced by Schonauer, Muller and Schnepf [110] (see also [109]) consists, in fact, of applying the MHP of rank 2 to $r'_{n+1}$ and the previous iterate of the hybrid procedure itself. That is,
$$r_{n+1} = a_n r'_{n+1} + (1 - a_n) r_n$$
with $r_0 = r'_0$ and $a_n$ minimizing $\|r_{n+1}\|$, that is,
$$a_n = -\frac{(r'_{n+1} - r_n, r_n)}{(r'_{n+1} - r_n, r'_{n+1} - r_n)}.$$
It is easy to see, by induction, that
$$r_n = a_{n-1} r'_n + \sum_{i=-1}^{n-2} a_i \left[ \prod_{j=i+1}^{n-1} (1 - a_j) \right] r'_{i+1}$$
for $n = 0, 1, \ldots$, with the convention that the sum is zero for $n = 0$ (and $a_{-1} = 1$). Thus the MRS is a semi-iterative method. By the fundamental property of the hybrid procedure of rank 2, we have
$$\|r_{n+1}\| \leq \min\left( \|r_n\|, \|r'_{n+1}\| \right)$$
which shows that the sequence $(\|r_n\|)$ decreases monotonically. Thus, even if the convergence of $(\|r'_n\|)$ is quite irregular, we are able to transform it into a smooth sequence, a property giving its name to the procedure. Since sequences with a smooth behavior are easier to accelerate than irregular sequences, it is better to transform first the sequence $(r'_n)$ into the sequence $(r_n)$ by the MRS before trying to accelerate its convergence.
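A minimal sketch of the MRS, assuming the residuals $r'_n$ of the original method are available as vectors; the function name is illustrative.

```python
import numpy as np

def minimal_residual_smoothing(residuals):
    """MRS: r_{n+1} = a_n r'_{n+1} + (1 - a_n) r_n with a_n minimizing ||r_{n+1}||."""
    r = residuals[0]                       # r_0 = r'_0
    smoothed = [r]
    for r_prime in residuals[1:]:
        d = r_prime - r
        a = -(d @ r) / (d @ d)             # a_n = -(r'_{n+1} - r_n, r_n)/||r'_{n+1} - r_n||^2
        r = a * r_prime + (1.0 - a) * r
        smoothed.append(r)
    return smoothed
```

By construction the smoothed norms satisfy the monotonicity property stated just above.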
Let us begin by trying to accelerate the convergence of the sequence $(r_n)$ given by the MRS with the SIMHP of rank 2. We have
$$\rho_n^{(2)} = b_n r_{n+1} + (1 - b_n) r_n$$
with $b_n = -(r_{n+1} - r_n, r_n)/(r_{n+1} - r_n, r_{n+1} - r_n)$. But $r_{n+1} - r_n = a_n (r'_{n+1} - r_n)$ and, using the above expression for $a_n$, we find that $b_n = 1$ and thus $\rho_n^{(2)} = r_{n+1}$, which shows that the procedure cannot accelerate the convergence of the MRS. Thus, again, methods with a higher rank will have to be used. Other types of acceleration processes are studied in [22].

This work was supported by the Human Capital and Mobility Programme of the European Community, Project ROLLS, Contract CHRX-CT93-0416 (DG 12 COMA).
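The identity $b_n = 1$ is easy to confirm numerically; the following sketch applies the MRS to arbitrary vectors (illustrative random data) and then evaluates $b_n$ for the smoothed sequence.

```python
import numpy as np

rng = np.random.default_rng(2)
r_prime = [rng.standard_normal(6) for _ in range(5)]   # residuals of some original method (illustrative)

# MRS pass: r_{n+1} = a_n r'_{n+1} + (1 - a_n) r_n with the optimal a_n
r = [r_prime[0]]
for rp in r_prime[1:]:
    d = rp - r[-1]
    a = -(d @ r[-1]) / (d @ d)
    r.append(a * rp + (1.0 - a) * r[-1])

# SIMHP of rank 2 applied to the smoothed sequence: b_n equals 1 for every n
for n in range(len(r) - 1):
    d = r[n + 1] - r[n]
    b = -(d @ r[n]) / (d @ d)
    print(round(b, 12))
```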
References

[1] A. Abkowicz, Etude de la Procedure Hybride Appliquee a la Resolution des Systemes Lineaires, These, Universite des Sciences et Technologies de Lille, 1995.
[2] A. Abkowicz, C. Brezinski, Acceleration properties of the hybrid procedure for solving linear systems, Applicationes Mathematicae, (1995), to appear.
[3] O. Axelsson, Conjugate gradient type methods for unsymmetric and inconsistent systems of linear equations, Linear Algebra Appl., 29 (1980) 1-16.
[4] O. Axelsson, Iterative Solution Methods, Cambridge University Press, Cambridge, 1994.
[5] E.H. Ayachour, Avoiding the look-ahead in the Lanczos method, Appl. Numer. Math., to appear.
[6] C. Baheux, New implementations of the Lanczos method, J. Comput. Appl. Math., to appear.
[7] R.E. Bank, T.F. Chan, A composite step bi-conjugate gradient algorithm for nonsymmetric linear systems, Numerical Algorithms, 7 (1994) 1-16.
[8] R. Barrett et al., Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, SIAM, Philadelphia, 1993.
[9] V.O. Belash, N. Ya. Mar'yashkin, The solution of linear algebraic systems by polynomial iterative methods, Comp. Maths. Math. Phys., 34 (1994) 967-975.
[10] J. Beuneu, Methodes de projection-minimisation pour les problemes lineaires, RAIRO Anal. Numer., 17 (1983) 221-248.
[11] A. Bjorck, Solving linear least squares problems by Gram-Schmidt orthogonalization, BIT, 7 (1967) 1-21.
[12] A. Bjorck, Least squares methods, in Handbook of Numerical Analysis, vol. I, P.G. Ciarlet and J.L. Lions eds., North-Holland, Amsterdam, 1990, pp. 465-652.
[13] C. Brezinski, Pade-Type Approximation and General Orthogonal Polynomials, Birkhauser, Basel, 1980.
[14] C. Brezinski, Recursive interpolation, extrapolation and projection, J. Comput. Appl. Math., 9 (1983) 369-376.
[15] C. Brezinski, About Henrici's method for nonlinear equations, Symposium on Numerical Analysis and Computational Complex Analysis, Zurich, August 1983, unpublished.
[16] C. Brezinski, Ideas for further investigations on orthogonal polynomials and Pade approximants, in Actas III Simposium sobre Polinomios Ortogonales y Aplicaciones, F. Marcellan ed., 1985.
[17] C. Brezinski, Other manifestations of the Schur complement, Linear Algebra Appl., 111 (1988) 231-247.
[18] C. Brezinski, Bordering methods and progressive forms for sequence transformations, Zastosow. Mat., 20 (1990) 435-443.
[19] C. Brezinski, Biorthogonality and its Applications to Numerical Analysis, Marcel Dekker, New York, 1992.
[20] C. Brezinski, Biorthogonality and conjugate gradient-type algorithms, in Contributions in Numerical Mathematics, R.P. Agarwal ed., World Scientific, Singapore, 1993, pp. 55-70.
[21] C. Brezinski, The methods of Vorobyev and Lanczos, Linear Algebra Appl., to appear.
[22] C. Brezinski, Acceleration by projection and preconditioning, submitted.
[23] C. Brezinski, A.C. Matos, Least-squares orthogonal polynomials, J. Comput. Appl. Math., 46 (1993) 229-239.
[24] C. Brezinski, M. Redivo Zaglia, Extrapolation Methods. Theory and Practice, North-Holland, Amsterdam, 1991.
[25] C. Brezinski, M. Redivo Zaglia, Look-ahead in Bi-CGSTAB and other product-type methods for linear systems, BIT, (1995), to appear.
[26] C. Brezinski, M. Redivo Zaglia, Hybrid procedures for solving systems of linear equations, Numer. Math., 67 (1994) 1-19.
[27] C. Brezinski, M. Redivo Zaglia, H. Sadok, A breakdown-free Lanczos type algorithm for solving linear systems, Numer. Math., 63 (1992) 29-38.
[28] C. Brezinski, M. Redivo Zaglia, H. Sadok, Avoiding breakdown and near-breakdown in Lanczos type algorithms, Numerical Algorithms, 1 (1991) 261-284.
[29] C. Brezinski, H. Sadok, Vector sequence transformations and fixed point methods, in Numerical Methods in Laminar and Turbulent Flows, vol. I, C. Taylor et al. eds., Pineridge Press, Swansea, 1987, pp. 3-11.
[30] C. Brezinski, H. Sadok, Lanczos-type algorithms for solving systems of linear equations, Appl. Numer. Math., 11 (1993) 443-473.
[31] C. Brezinski, J. Van Iseghem, Pade approximations, in Handbook of Numerical Analysis, vol. III, P.G. Ciarlet and J.L. Lions eds., North-Holland, Amsterdam, 1994, pp. 47-222.
[32] C. Brezinski, J. Van Iseghem, A taste of Pade approximation, in Acta Numerica 1995, A. Iserles ed., Cambridge University Press, Cambridge, 1995, pp. 53-103.
[33] T.F. Chan, E. Gallopoulos, V. Simoncini, T. Szeto, C.H. Tong, A quasi-minimal residual variant of the Bi-CGSTAB algorithm for nonsymmetric systems, SIAM J. Sci. Comput., 15 (1994) 338-347.
[34] F. Chatelin, Spectral Approximation of Linear Operators, Academic Press, New York, 1983.
[35] F. Chatelin, Valeurs Propres de Matrices, Masson, Paris, 1988.
[36] E.W. Cheney, Introduction to Approximation Theory, McGraw-Hill, New York, 1966.
[37] G. Cimmino, Calcolo approssimato per le soluzioni dei sistemi di equazioni lineari, Ricerca Sci., II, 9 (1938) 326-333.
[38] P. Concus, G.H. Golub, A generalized conjugate gradient method for nonsymmetric systems of linear equations, in Computing Methods in Applied Sciences and Engineering, R. Glowinski and J.L. Lions eds., Lecture Notes in Economics and Mathematical Systems, vol. 134, Springer-Verlag, Berlin, 1976, pp. 56-65.
[39] C.W. Cryer, Numerical Functional Analysis, Oxford University Press, Oxford, 1982.
[40] P.J. Davis, Interpolation and Approximation, 2nd edition, Dover, New York, 1975.
[41] J.P. Delahaye, Optimalite du procede d'Aitken pour l'acceleration de la convergence lineaire, RAIRO Anal. Numer., 15 (1981) 321-330.
[42] J.P. Delahaye, Sequence Transformations, Springer-Verlag, Berlin, 1988.
[43] A. Draux, Formal orthogonal polynomials revisited; applications, Annals Numer. Math., 3 (1996), to appear.
[44] R.P. Eddy, Extrapolation to the limit of a vector sequence, in Information Linkage between Applied Mathematics and Industry, P.C.C. Wang ed., Academic Press, New York, 1979, pp. 387-396.
[45] M. Eiermann, On semiiterative methods generated by Faber polynomials, Numer. Math., 56 (1989) 139-156.
[46] M. Eiermann, X. Li, R.S. Varga, On hybrid semiiterative methods, SIAM J. Numer. Anal., 26 (1989) 152-168.
[47] M. Eiermann, W. Niethammer, R.S. Varga, A study of semi-iterative methods for nonsymmetric systems of equations, Numer. Math., 47 (1985) 505-533.
[48] S.C. Eisenstat, A note on the generalized conjugate gradient method, SIAM J. Numer. Anal., 20 (1983) 358-361.
[49] S.C. Eisenstat, H.C. Elman, M.H. Schultz, Variational iterative methods for nonsymmetric systems of linear equations, SIAM J. Numer. Anal., 20 (1983) 345-357.
[50] H.C. Elman, Iterative Methods for Large, Sparse, Nonsymmetric Systems of Linear Equations, Ph.D. Thesis, Yale University, 1982.
[51] N. Emad, S. Petiton, in preparation.
[52] V. Faber, T.A. Manteuffel, Orthogonal error methods, SIAM J. Numer. Anal., 24 (1987) 170-187.
[53] D.K. Faddeev, V.N. Faddeeva, Computational Methods of Linear Algebra, W.H. Freeman and Company, San Francisco, 1963.
[54] G.E. Forsythe, W.R. Wasow, Finite-Difference Methods for Partial Differential Equations, Wiley, New York, 1960.
[55] S.P. Frankel, Convergence rates of iterative treatments of partial differential equations, M.T.A.C., 4 (1950) 65-75.
[56] R.W. Freund, A transpose-free quasi-minimal residual algorithm for non-Hermitian linear systems, SIAM J. Sci. Stat. Comput., 14 (1993) 470-482.
[57] R.W. Freund, G.H. Golub, N.M. Nachtigal, Iterative solution of linear systems, in Acta Numerica 1992, A. Iserles ed., Cambridge University Press, Cambridge, 1992, pp. 57-100.
[58] R.W. Freund, N.M. Nachtigal, QMR: a quasi-minimal residual method for non-Hermitian linear systems, Numer. Math., 60 (1991) 315-339.
[59] R.W. Freund, T. Szeto, A quasi-minimal residual squared algorithm for non-Hermitian linear systems, Tech. Report 91.26, NASA Ames Research Center, Moffett Field, California, December 1991.
[60] R.W. Freund, T. Szeto, A transpose-free quasi-minimal residual squared algorithm for non-Hermitian linear systems, in Advances in Computer Methods for Partial Differential Equations - VII, R. Vichnevetsky et al. eds., IMACS, 1992, pp. 258-264.
[61] W. Gander, G.H. Golub, D. Gruntz, Solving linear equations by extrapolation, in Supercomputing, J.S. Kovalik ed., Springer-Verlag, Berlin, 1989, pp. 279-293.
[62] N. Gastinel, Procede iteratif pour la resolution numerique d'un systeme d'equations lineaires, C.R. Acad. Sci. Paris, 246 (1958) 2571-2574.
[63] N. Gastinel, Matrices du Second Degre et Normes Generales en Analyse Numerique Lineaire, These, Universite de Grenoble, Publ. Sci. et Tech. du Ministere de l'Air, S.D.I.T., Paris, 1962.
[64] N. Gastinel, Sur-decomposition de normes generales et procedes iteratifs, Numer. Math., 5 (1963) 142-151.
[65] N. Gastinel, Analyse Numerique Lineaire, Hermann, Paris, 1966.
[66] B. Germain-Bonne, Transformations de suites, RAIRO Anal. Numer., 7eme annee, R1 (1973) 84-90.
[67] B. Germain-Bonne, Estimation de la Limite de Suites et Formalisation de Procedes d'Acceleration de Convergence, These de Doctorat es Sciences Mathematiques, Universite des Sciences et Techniques de Lille, 1978.
[68] G.H. Golub, The Use of Chebyshev Matrix Polynomials in the Iterative Solution of Linear Equations Compared with the Method of Successive Over-Relaxation, Ph.D. Thesis, University of Illinois, 1959.
[69] G.H. Golub, C.F. Van Loan, Matrix Computations, The Johns Hopkins University Press, Baltimore, 2nd ed., 1989.
[70] G.H. Golub, R.S. Varga, Chebyshev semi-iterative methods, successive overrelaxation iterative methods, and second order Richardson iterative methods, Numer. Math., 3 (1961) 147-156, 157-168.
[71] P.R. Graves-Morris, Extrapolation methods for vector sequences, Numer. Math., 61 (1992) 475-487.
[72] P.R. Graves-Morris, A review of Pade methods for the acceleration of convergence of a sequence of vectors, Appl. Numer. Math., 15 (1994) 153-174.
[73] M.H. Gutknecht, A completed theory of the unsymmetric Lanczos process and related algorithms, Part I, SIAM J. Matrix Anal. Appl., 13 (1992) 594-639.
[74] M.H. Gutknecht, A completed theory of the unsymmetric Lanczos process and related algorithms, Part II, SIAM J. Matrix Anal. Appl., 15 (1994) 15-58.
[75] M.H. Gutknecht, W. Niethammer, R.S. Varga, k-step iterative methods for solving nonlinear systems of equations, Numer. Math., 48 (1986) 699-712.
[76] W. Hackbusch, Iterative Solution of Large Sparse Systems of Equations, Springer-Verlag, New York, 1994.
[77] K. Jbilou, H. Sadok, Analysis of some vector extrapolation methods for solving systems of linear equations, Numer. Math., 70 (1995) 73-89.
[78] K.C. Jea, D.M. Young, On the simplification of generalized conjugate-gradient methods for nonsymmetrizable linear systems, Linear Algebra Appl., 52/53 (1983) 399-417.
[79] W.D. Joubert, T.A. Manteuffel, Iterative methods for nonsymmetric linear systems, in Iterative Methods for Large Linear Systems, D.R. Kincaid and L.J. Hayes eds., Academic Press, New York, 1990, pp. 149-171.
[80] W.D. Joubert, D.M. Young, Necessary and sufficient conditions for the simplification of generalized conjugate gradient algorithms, Linear Algebra Appl., 88/89 (1987) 449-485.
[81] S. Kaczmarz, Angenaherte Auflosung von Systemen linearer Gleichungen, Bull. Acad. Polon. Sci., A35 (1937) 355-357.
[82] D.R. Kincaid, On complex second-degree iterative methods, SIAM J. Numer. Anal., 11 (1974) 211-218.
[83] D.R. Kincaid, Stationary second-degree iterative methods, Appl. Numer. Math., 16 (1994) 227-237.
[84] D.R. Kincaid, D.M. Young, Stationary second-degree iterative methods and recurrences, in Iterative Methods in Linear Algebra, R. Beauwens and P. de Groen eds., North-Holland, Amsterdam, 1992, pp. 27-47.
[85] V.N. Kublanovskaya, Application of analytic continuation in numerical analysis by means of change of variables, Trudy Mat. Inst. Steklov, 53 (1959) 145-185.
[86] C. Lanczos, An iteration method for the solution of the eigenvalue problem of linear differential and integral operators, J. Res. Natl. Bur. Stand., 45 (1950) 255-282.
[87] C. Lanczos, Solution of systems of linear equations by minimized iterations, J. Res. Natl. Bur. Stand., 49 (1952) 33-53.
[88] P.J. Laurent, Approximation et Optimisation, Hermann, Paris, 1972.
[89] H. Le Ferrand, Quelques inegalites sur les determinants de Gram d'une suite vectorielle de Krylov associee a une matrice orthogonale de IR^p, Linear Algebra Appl., 196 (1994) 243-252.
[90] H. Le Ferrand, Une generalisation au cas vectoriel du procede d'Aitken et les suites a comportement lineaire, M2AN, 29 (1995) 53-62.
[91] W.R. Mann, Mean value methods in iteration, Proc. Am. Math. Soc., 4 (1953) 506-510.
40
[92] W.R. Mann, Averaging to improve convergence of iterative processes, in Functional Analysis Methods in Numerical Analysis, M.Z. Nashed ed., LNM vol. 701, Springer{Verlag, Berlin, 1979, pp. 169{179. [93] T.A. Manteufel, The Tchebychev iteration for nonsymmetric linear systems, Numer. Math., 28 (1977) 307{327. [94] T.A. Manteufel, Adaptative procedure for estimation of parameters for the nonsymmetric Tchebychev iteration, Numer. Math., 31 (1978) 183{208. [95] M. Mesina, Convergence acceleration for the iterative solution of x = Ax + f , Comput. Methods Appl. Mech. eng., 10 (1977) 165{173. [96] O. Nevanlinna, Convergence of Iterations for Linear Equations, Birkhauser, Basel, 1993. [97] W. Niethammer, R.S. Varga, The analysis of k{step iterative methods for linear systems from summability theory, Numer. Math., 41 (1983) 177{206. [98] B.N. Parlett, The Symmetric Eigenvalue Problem, Prentice{Hall, Englewood Clis, 1980. [99] B.N. Parlett, D.R. Taylor, Z.A. Liu, A look{ahead Lanczos algorithm for unsymmetric matrices, Math. Comput., 14 (1985) 105{124. [100] R. Pennacchi, Le trasformazioni razionali di una successionne, Calcolo, 5 (1968) 37{50. [101] J.R. Rice, Experiments on Gram{Schmidt orthogonalization, Math. Comput., 20 (1966) 325{ 328. [102] L.F. Richardson, The approximate arithmetical solution by nite dierences of physical problems involving dierential equations, with application to the stress in a masonry dam, Philos. Trans. Roy. Soc. London, ser. A, 226 (1910) 307{357. [103] Y. Saad, The Lanczos biorthogonalization algorithm and other oblique projection methods for solving large unsymmetric systems, SIAM J. Numer. Anal., 19 (1982) 485{506. [104] Y. Saad, Iterative solution of inde nite symmetric linear systems by methods using orthogonal polynomials over two disjoint intervals, SIAM J. Numer. Anal., 20 (1983) 784{814. [105] Y. Saad, Least squares polynomials in the complex plane and their use in solving non{ symmetric linear systems, SIAM J. Numer. Anal., 24 (1987) 155{169. [106] Y. Saad, Numerical Methods for Large Eigenvalue Problems, Manchester University Press, Manchester, 1992. [107] Y. Saad, M.H. Schultz, Conjugate gradient{like algorithms for Solving nonsymmetric linear systems, Math. Comput., 44 (1985) 417{424. [108] Y. Saad, M.H. Schultz, GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems, SIAM J. Sci. Stat. Comput., 7 (1986) 856{869. 41
[109] W. Schonauer, Scienti c Computing on Vector Computers, North{Holland, Amsterdam, 1987. [110] W. Schonauer, H. Muller, E. Schnepf, Numerical tests with biconjugate gradient type methods, Z. Angew. Math. Mech., 65 (1985) T400{T402. [111] A. Sidi, Extrapolation vs. projection methods for linear systems of equations, J. Comput. Appl. Math., 22 (1988) 71{88. [112] J.A. Shohat, Sur les polyn^omes orthogonaux generalises, C.R. Acad. Sci. Paris, 207 (1938) 556{558. [113] P. Sonneveld, CGS, a fast Lanczos{type solver for nonsymmetric linear systems, SIAM J. Sci. Stat. Comput., 10 (1989) 36{52. [114] G. Starke, R.S. Varga, A hybrid Arnoldi{Faber iterative method for nonsymmetric systems of linear equations, Numer. Math., 64 (1993) 213{240. [115] E.L. Stiefel, Kernel polynomials in linear algebra and their numerical applications, in Further Contributions to the Solution of Simultaneous Linear Equations and the Determination of Eigenvalues, Nat. Bur. Standards Appl. Math. Ser., 49 (1958) 1{22. [116] L.N. Trefethen, Approximation theory and numerical linear algebra, in Algorithms for Approximation II, J.C. Mason and M.G. Cox eds., Chapman and Hall, London, 1990, pp. 336{ 360. [117] M.M. Vainberg, Variational Method and Method of Monotone Operators in the Theory of Nonlinear Equations, John Wiley, New York, 1973. [118] W. Van Assche, Asymptotics for orthogonal polynomials and three{term recurrences, in Orthogonal Polynomials: Theory and Practice, P. Nevai ed., Kluwer, Dordrecht, 1990, pp. 435{ 462. [119] W. Van Assche, Chebyshev polynomials as a comparison system for orthogonal polynomials, in Proceedings of the Cornelius Lanczos International Centenary Conference, J.D. Brown et al. eds., SIAM, Philadelphia, 1994, pp. 365{367. [120] H.A. Van Der Vorst, Bi{CGSTAB: a fast and smoothly converging variant of Bi{CG for the solution of nonsymmetric linear systems, SIAM J. Sci. Stat. Comput., 13 (1992) 631{644. [121] J. Van Iseghem, Vector Pade approximants, in Numerical Mathematics and Applications, R. Vichnevetsky and J. Vignes eds., North{Holland, Amsterdam, 1986, pp. 73{77. [122] J. Van Iseghem, Vector orthogonal relations. Vector QD{algorithm, J. Comput. Appl. Math., 19 (1987) 141{150. [123] R.S. Varga, Matrix Iterative Analysis, Prentice{Hall, Englewood Clis, 1962. 42
[124] P.K.W. Vinsome, ORTHOMIN, an iterative method for solving sparse sets of simultaneous linear equations, in Proc. Fourth Symposium on Reservoir Simulation, Society of Petroleum Engineers of AIME, 1976, pp. 149{159. [125] Yu.V. Vorobyev, Method of Moments in Applied Mathematics, Gordon and Breach, New York, 1965. [126] H.F. Walker, Residual smoothing and peak/plateau behavior in Krylov subspace methods, Appl. Numer. Math., to appear. [127] D.S. Watkins, Fundamentals of Matrix Computations, John Wiley, New York, 1991. [128] R. Weiss, Convergence Behavior of Generalized Conjugate Gradient Methods, Doctoral Thesis, University of Karlsruhe, 1990. [129] R. Weiss, Properties of generalized conjugate gradient methods, Num. Lin. Algebra with Appl., 1 (1994) 45{63. [130] R. Weiss, W. Schonauer, Accelerating generalized conjugate gradient methods by smoothing, in Iterative Methods in Linear Algebra, R. Beauwens and P. de Groen eds., North{Holland, Amsterdam, 1992, pp. 283{292. [131] O. Widlund, A Lanczos method for a class of nonsymmetric systems of linear equations, SIAM J. Numer. Anal., 15 (1978) 801{812. [132] J.H. Wilkinson, The Algebraic Eigenvalue Problem, Clarendon Press, Oxford, 1965. [133] D.M. Young, On Richardson's method for solving linear systems with positive de nite matrices, J. Math. Phys., 32 (1954) 243{255. [134] D.M. Young, Iterative Solution of Large Linear Systems, Academic Press, New York, 1971. [135] D.M. Young, Second{degree iterative methods for the solution of large linear systems, J. Approximation Theory, 5 (1972) 137{148. [136] D.M. Young, K.C. Jea, Generalized conjugate{gradient acceleration of nonsymmetrizable iterative methods, Linear Algebra Appl., 34 (1980) 159{194. [137] L. Zhou, H.F. Walker, Residual smoothing techniques for iterative methods, SIAM J. Sci. Comput., 15 (1994) 297{312.