TWO ALGORITHMS FOR SYMMETRIC LINEAR SYSTEMS WITH MULTIPLE RIGHT-HAND SIDES

Hua Dai (1,2)

Department of Mathematics, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, People's Republic of China

CERFACS Technical Report - TR/PA/98/17
Abstract. In this paper we investigate the block Lanczos algorithm for solving large sparse symmetric linear systems with multiple right-hand sides, show how to incorporate deflation in order to drop converged linear systems using a natural convergence criterion, and present an adaptive block Lanczos algorithm. We also propose a block version of Paige and Saunders' MINRES method for the iterative solution of symmetric linear systems, and describe important implementation details. We establish a relationship between the block Lanczos algorithm and the block MINRES algorithm, and compare the numerical performance of the Lanczos algorithm and the MINRES method applied to a sequence of right-hand sides with that of the block Lanczos and block MINRES algorithms applied to the multiple linear systems simultaneously.
Key words: symmetric linear systems, multiple right-hand sides, block Lanczos algorithm, block MINRES method
AMS(MOS) subject classification: 65F10, 65F15
(1) The research was supported in part by the National Natural Science Foundation of China and the Jiangsu Province Natural Science Foundation.
(2) The work of this author was done during a visit to CERFACS, France, in March-August 1998.
1. Introduction
In many applications we need to solve multiple systems of linear equations

    $A x^{(i)} = b^{(i)}, \quad i = 1, \dots, s$    (1)

with the same $n \times n$ real symmetric coefficient matrix $A$, but $s$ different right-hand sides $b^{(i)}$ $(i = 1, \dots, s)$. If all of the right-hand sides are available simultaneously, then these $s$ linear systems can be written in block form as

    $A X = B$    (2)
where $X = [x^{(1)}, \dots, x^{(s)}]$ and $B = [b^{(1)}, \dots, b^{(s)}]$. If the order $n$ of the matrix $A$ is small, we can solve (2) by direct methods: first decompose $A$ into simpler factors at a cost of $O(n^3)$ operations, and then solve for each right-hand side at a cost of $O(n^2)$. Direct methods can also be advantageous if the right-hand sides are not all simultaneously available. However, for large $n$ direct methods can be prohibitively expensive in both memory and computational cost, and iterative methods become appealing. We will assume that the order $n$ of $A$ is sufficiently large that only a limited number of vectors of dimension $n$ can be kept in memory, that all of the right-hand sides are available simultaneously, and that $s \ll n$. Most iterative methods are tailored to the solution of linear systems with a single right-hand side. They could be trivially used to solve the multiple right-hand side problem (2) by simply solving the $s$ linear systems (1) individually. The Lanczos algorithms proposed by Parlett [15] and extended by Saad [16], van der Vorst [21], and Papadrakakis and Smerou [14] are mostly suitable when not all $b^{(i)}$ are simultaneously available. Reference [14] is of particular interest as it contains numerical experiments from actual applications. However, it can be significantly more efficient to apply block variants of the iterative methods, which generate iterates for all of the linear systems simultaneously. In the block approach, one extends an iterative method from single to multiple systems by devising a block version that is applied to the block formulation (2) of the multiple linear systems (1). The approximate solutions generated by a block method for the $s$ different right-hand sides will generally converge at different stages of the block iteration. An efficient block method must therefore be able to detect and adaptively deflate converged systems.
O'Leary [11] was the first to devise the block conjugate gradient method and the block Lanczos algorithm for multiple symmetric linear systems, which are closely related to the conjugate gradient method [8], the classical Lanczos method [9], and the block Lanczos process [3, 7]. However, the algorithms in [11, 12] cannot handle deflation or variable block sizes. For multiple symmetric positive definite systems, Sadkane and Vital [18] proposed the block Davidson method, Calvetti and Reichel [1] devised an adaptive Richardson iteration method, and Nikishin and Yeremin [10] presented a block version of the conjugate gradient algorithm that allows varying block sizes. The literature for nonsymmetric linear systems with multiple right-hand sides is vast [17]. Some of the methods that have been proposed are block generalizations of solvers for nonsymmetric linear systems: the block biconjugate gradient algorithm [11, 20], block GMRES [22, 2], hybrid GMRES [19], and the block QMR method [6]. Although the block methods for multiple nonsymmetric systems could be used directly to solve multiple symmetric systems, they cannot take full advantage of the symmetry of the matrix $A$. In this paper, we investigate the block Lanczos algorithm (referred to as BLanczos hereafter) for the multiple symmetric linear systems (2) and consider its deflation procedure. Using a natural convergence criterion, we can identify and drop linear systems whose solutions can be recovered from the solution of the remaining multiple linear systems.
In order to smooth the possibly erratic behavior of the residual norm curve generated by applying the BLanczos algorithm to the multiple systems (2), we describe a block version of Paige and Saunders' MINRES method [13] for the iterative solution of symmetric linear systems with a single right-hand side. The MINRES iterates are defined by a minimization of the residual norm, which leads to smooth convergence behavior. The block MINRES algorithm (referred to as BMINRES hereafter) is an extension of MINRES to multiple symmetric linear systems. The BMINRES iterates are determined via a block residual smoothing technique, which can be formulated as a matrix least-squares problem. In order to deflate the converged block iterates, we restart the algorithm with new full-rank bases.

The structure of the paper is as follows. In section 2 we briefly review the block Lanczos process, discuss the deflation of the BLanczos algorithm for multiple symmetric linear systems, and present an adaptive BLanczos algorithm that allows varying block sizes. In section 3 we describe the BMINRES algorithm, show how it is affected by deflation, and give some implementation details. In section 4 we discuss the relation between the BLanczos and BMINRES algorithms. In section 5 we compare the numerical performance of the Lanczos algorithm and the MINRES method applied to a sequence of right-hand sides with that of the BLanczos and BMINRES algorithms applied to the multiple linear systems simultaneously, and give results of numerical experiments. Finally, conclusions are presented in section 6.

The following notation will be used. $\|\cdot\|_2$ denotes the Euclidean vector norm or the induced spectral norm, and $\|\cdot\|_F$ the Frobenius matrix norm. $I_n$ is the identity matrix of order $n$. $E_i = [0, \dots, 0, I_s, 0, \dots, 0]^T$ with $I_s$ at the $i$th block position; the total size of $E_i$ will be clear from the context.
2. The block Lanczos algorithm
We briefly review the block Lanczos process. For $R_0 \in \mathbb{R}^{n\times s}$, let $R_0 = Q_1 T_1$ be a QR factorization of $R_0$, i.e., $Q_1 \in \mathbb{R}^{n\times s}$ is orthonormal and $T_1 \in \mathbb{R}^{s\times s}$ is upper-triangular. The block Lanczos process generates a sequence of mutually orthogonal matrices $Q_k \in \mathbb{R}^{n\times s}$, $k = 1, 2, \dots$, i.e.,

    $Q_j^T Q_k = I_s$ if $j = k$, and $Q_j^T Q_k = 0$ if $j \ne k$,

that satisfy the three-term recursion relation

    $Q_{k+1} T_{k+1} = A Q_k - Q_k M_k - Q_{k-1} T_k^T, \quad k \ge 1$    (3)

where $Q_0 = 0$ and $M_k, T_k \in \mathbb{R}^{s\times s}$. Define

    $\mathcal{K}_k(A, R_0) = \mathrm{span}\{R_0, A R_0, \dots, A^{k-1} R_0\}$    (4)

$\mathcal{K}_k(A, R_0)$ is referred to as the $k$th block Krylov subspace. The block Lanczos process constructs an orthonormal basis for the block Krylov subspace $\mathcal{K}_k(A, R_0)$. This can be accomplished by the following algorithm.

Algorithm 2.1 (Block Lanczos process)
1). Compute the QR factorization of $R_0$: $R_0 = Q_1 T_1$, where $Q_1 \in \mathbb{R}^{n\times s}$ is orthonormal and $T_1 \in \mathbb{R}^{s\times s}$ is upper-triangular, and set $M_1 = Q_1^T A Q_1$.
2). For $j = 1, \dots, k$, do
    2.1). Compute $P_{j+1} = A Q_j - Q_j M_j - Q_{j-1} T_j^T$ (with $Q_0 = 0$).
    2.2). Compute the QR factorization of $P_{j+1}$: $P_{j+1} = Q_{j+1} T_{j+1}$, where $Q_{j+1} \in \mathbb{R}^{n\times s}$ is orthonormal and $T_{j+1} \in \mathbb{R}^{s\times s}$ is upper-triangular.
    2.3). Compute $M_{j+1} = Q_{j+1}^T A Q_{j+1}$.
end do.

The matrices $M_j, T_j$ are uniquely determined if the diagonal elements of the $T_j$ generated by Algorithm 2.1 are positive. We now assume this to be the case. We will consider the case of a rank deficiency of $P_{j+1}$ later.
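As a concrete illustration, the process above can be sketched in a few lines. This is a minimal NumPy sketch under our own naming (the paper's Fortran implementation is not reproduced here), assuming no rank deficiency occurs in the QR factorizations:

```python
import numpy as np

def block_lanczos(A, R0, k):
    """Run k steps of the block Lanczos process (Algorithm 2.1).

    Returns Qhat = [Q_1, ..., Q_k] (n x ks), the block tridiagonal
    projection That = Qhat^T A Qhat (ks x ks), and the coupling blocks
    Q_{k+1}, T_{k+1} appearing in relation (7).
    """
    n, s = R0.shape
    Qj, Tj = np.linalg.qr(R0)          # R0 = Q_1 T_1
    Qprev = np.zeros((n, s))
    basis = []
    for _ in range(k):
        basis.append(Qj)
        Mj = Qj.T @ A @ Qj
        # three-term recursion (3): P_{j+1} = A Q_j - Q_j M_j - Q_{j-1} T_j^T
        P = A @ Qj - Qj @ Mj - Qprev @ Tj.T
        Qnext, Tnext = np.linalg.qr(P)
        Qprev, Qj, Tj = Qj, Qnext, Tnext
    Qhat = np.hstack(basis)
    That = Qhat.T @ A @ Qhat
    return Qhat, That, Qj, Tj

# Small symmetric example
rng = np.random.default_rng(0)
n, s, k = 20, 2, 4
A = rng.standard_normal((n, n)); A = A + A.T
R0 = rng.standard_normal((n, s))
Qhat, That, Qk1, Tk1 = block_lanczos(A, R0, k)

# The generated blocks are mutually orthonormal ...
assert np.allclose(Qhat.T @ Qhat, np.eye(k * s), atol=1e-10)
# ... and satisfy relation (7): A Qhat = Qhat That + Q_{k+1} T_{k+1} E_k^T
Ek = np.zeros((k * s, s)); Ek[-s:] = np.eye(s)
assert np.allclose(A @ Qhat, Qhat @ That + Qk1 @ Tk1 @ Ek.T, atol=1e-8)
```

For larger $k$, a practical implementation would also reorthogonalize the blocks, since orthogonality is gradually lost in finite precision.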
Let

    $\hat Q_k = [Q_1, \dots, Q_k]$    (5)

and

    $\hat T_k = \begin{pmatrix} M_1 & T_2^T & & & \\ T_2 & M_2 & T_3^T & & \\ & T_3 & \ddots & \ddots & \\ & & \ddots & \ddots & T_k^T \\ & & & T_k & M_k \end{pmatrix}$    (6)
The three-term recursion relation (3) can be written in matrix form as
    $A \hat Q_k = \hat Q_k \hat T_k + P_{k+1} E_k^T = \hat Q_k \hat T_k + Q_{k+1} T_{k+1} E_k^T$    (7)
where $\hat Q_k \in \mathbb{R}^{n\times ks}$ is orthonormal, and $\hat T_k = \hat Q_k^T A \hat Q_k \in \mathbb{R}^{ks\times ks}$ is an orthogonal projection of $A$ onto the block Krylov subspace $\mathcal{K}_k(A, R_0)$ and is a symmetric band matrix of bandwidth $2s+1$.

In the BLanczos algorithm the goal is to solve the multiple symmetric linear systems (2). Given a block of initial guesses $x_0^{(i)}$ $(i = 1, \dots, s)$, we define $R_0$ as the block of initial residuals

    $R_0 = B - A X_0 = [r_0^{(1)}, \dots, r_0^{(s)}]$    (8)

where $X_0 = [x_0^{(1)}, \dots, x_0^{(s)}]$ and $r_0^{(i)} = b^{(i)} - A x_0^{(i)}$ $(i = 1, \dots, s)$. Each of the approximate solutions generated by the BLanczos algorithm has the form

    $x_k^{(i)} = x_0^{(i)} + \hat Q_k y_k^{(i)}$    (9)

Grouping the approximations $x_k^{(i)}$ in a block $X_k$, and the $y_k^{(i)}$ in a block $Y_k$, we can write

    $X_k = X_0 + \hat Q_k Y_k$    (10)

Then the relation (7) gives for the residual $R_k = B - A X_k$

    $R_k = R_0 - \hat Q_k \hat T_k Y_k - Q_{k+1} T_{k+1} E_k^T Y_k$    (11)
The $k$th BLanczos iterate $X_k$ is chosen such that the $k$th residual $R_k$ satisfies the Galerkin condition, i.e.,

    $R_k \perp \mathcal{K}_k(A, R_0)$    (12)

The $k$th BLanczos iterate $X_k$ is obtained by solving

    $\hat T_k Y_k = E_1 T_1$    (13)

and so the $k$th BLanczos residual $R_k$ is of the form

    $R_k = -P_{k+1} E_k^T Y_k = -Q_{k+1} T_{k+1} E_k^T Y_k$    (14)

Let

    $\hat R_k = T_{k+1} E_k^T Y_k$    (15)

Then

    $R_k = -Q_{k+1} \hat R_k$    (16)

Equation (16) can be used for the convergence test. If $A$ is also assumed to be positive definite, then after $k$ iterations of the BLanczos algorithm the error in the $i$th component, $e_k^{(i)} = x^{(i)} - x_k^{(i)}$, satisfies [11]

    $e_k^{(i)T} A e_k^{(i)} \le c \left( \frac{1 - \sqrt{\kappa^{-1}}}{1 + \sqrt{\kappa^{-1}}} \right)^{2k}, \quad 1 \le i \le s$

where $\kappa = \lambda_n / \lambda_s$, $\lambda_1 \le \dots \le \lambda_n$ are the eigenvalues of the matrix $A$, and $c$ is a constant dependent on $i$ but independent of $k$. This estimate shows that the block version of the Lanczos algorithm may converge essentially faster.

Now let us consider the case of a rank deficiency of $P_{k+1}$ and describe a process for reducing the block size of the BLanczos algorithm. It follows from (14) that rank deficiency of $P_{k+1}$ leads to the loss of full rank in $R_k$. We can easily establish the following result, which allows us to see how the convergence of the BLanczos algorithm forces the loss of full rank of the residual matrix.

Theorem 2.1. Consider two linear systems with multiple right-hand sides, $A X = B$ and $A \tilde X = \tilde B$, where $A \in \mathbb{R}^{n\times n}$ is symmetric and $B, \tilde B, X, \tilde X \in \mathbb{R}^{n\times s}$. Suppose that $\tilde B = B \Omega$ and $\tilde X_0 = X_0 \Omega$, where $\Omega \in \mathbb{R}^{s\times s}$ is nonsingular, and that the BLanczos algorithm applied to these multiple systems does not fail after $k$ steps. Then the BLanczos iterates and residuals satisfy $\tilde X_k = X_k \Omega$ and $\tilde R_k = R_k \Omega$.

If the columns of $P_{k+1}$ are linearly dependent at the $k$th iteration, then the current residual $R_k$ loses full rank, so there exists a nonsingular matrix $\Omega_k$ such that

    $\tilde R_k = R_k \Omega_k = [\bar R_k, 0]$    (17)

where the submatrix $\bar R_k \in \mathbb{R}^{n\times d}$ $(d < s)$ has full rank.
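The Galerkin construction (12)-(14) is easy to check numerically with an explicit block Krylov basis. The following is a sketch under our own naming, using a well-conditioned random symmetric test matrix rather than the paper's test problems:

```python
import numpy as np

rng = np.random.default_rng(1)
n, s, k = 40, 3, 5
A = rng.standard_normal((n, n)); A = A + A.T + n * np.eye(n)  # symmetric, well conditioned
B = rng.standard_normal((n, s))
X0 = np.zeros((n, s))
R0 = B - A @ X0

# Orthonormal basis of the block Krylov subspace K_k(A, R0)
blocks = [R0]
for _ in range(k - 1):
    blocks.append(A @ blocks[-1])
Qhat, _ = np.linalg.qr(np.hstack(blocks))

# Galerkin conditions (12)-(13): solve the projected block system
Yk = np.linalg.solve(Qhat.T @ A @ Qhat, Qhat.T @ R0)
Xk = X0 + Qhat @ Yk
Rk = B - A @ Xk

# (12): the residual block is orthogonal to the block Krylov subspace
assert np.allclose(Qhat.T @ Rk, 0, atol=1e-8)
```

The adaptive algorithm below computes the same iterates with short recurrences instead of an explicit basis.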
Let

    $\tilde B = B \Omega_k = [\bar B, B^{(I)}], \qquad \tilde X_k = X_k \Omega_k = [\bar X_k, X_k^{(I)}]$    (18)

where $\bar B, \bar X_k \in \mathbb{R}^{n\times d}$. From $\tilde R_k = \tilde B - A \tilde X_k$, we have

    $A X_k^{(I)} = B^{(I)}$    (19)

and

    $\bar R_k = \bar B - A \bar X_k$    (20)

Equation (19) shows that the residual $R_k$ may lose full rank only in the case of the algebraic convergence of some vector components of the modified residual $\tilde R_k$. Thus for $j \ge k$ the BLanczos iterates $\tilde X_{j+1}$ can be written in the form

    $\tilde X_{j+1} = [\bar X_{j+1}, X_k^{(I)}]$

Using $\bar X_k$ and $\bar R_k$ as initial guess and initial residual, respectively, we repeat the BLanczos algorithm until we find the solution $\bar X$ of the deflated multiple systems $A \bar X = \bar B$. Then $[\bar X, X_k^{(I)}] \Omega_k^{-1}$ is the solution of the multiple linear systems (2). Therefore, we can reduce the block size adaptively.

Using (10) and (13), however, we cannot conveniently accumulate the iterates $X_k$ as the block Lanczos process proceeds, but a further change of variables makes this possible. Similarly to the discussion in [13, 11], we define an LQ decomposition of $\hat T_k$:

    $\hat T_k = \bar L_k V_k$    (21)

where $V_k \in \mathbb{R}^{ks\times ks}$ is orthogonal, and

    $\bar L_k = \begin{pmatrix} L_{11} & & & & \\ L_{21} & L_{22} & & & \\ L_{31} & L_{32} & L_{33} & & \\ & \ddots & \ddots & \ddots & \\ & & L_{k,k-2} & L_{k,k-1} & \bar L_{kk} \end{pmatrix}$    (22)

with $L_{ij} \in \mathbb{R}^{s\times s}$. Let

    $W_k = \hat Q_k V_k^T, \qquad \Phi_k = V_k Y_k$    (23)
Then (10), (13) and (15) become
    $X_k = X_0 + W_k \Phi_k$    (24)

    $\bar L_k \Phi_k = E_1 T_1$    (25)

    $\hat R_k = T_{k+1} (V_k E_k)^T \Phi_k$    (26)
The orthogonal matrix V is updated from the previous iteration by setting
    $V_k = \begin{pmatrix} I_{(k-2)s} & 0 \\ 0 & V(a_k, b_k, c_k, d_k)^T \end{pmatrix} \begin{pmatrix} V_{k-1} & 0 \\ 0 & I_s \end{pmatrix}$    (27)

where

    $V(a_k, b_k, c_k, d_k) = \begin{pmatrix} a_k & b_k \\ c_k & d_k \end{pmatrix}$

is a $2s \times 2s$ orthogonal matrix, written as four $s \times s$ blocks, and is chosen such that

    $[\bar L_{k-1,k-1}, T_k^T] \begin{pmatrix} a_k & b_k \\ c_k & d_k \end{pmatrix} = [L_{k-1,k-1}, 0]$
A detailed derivation is similar to that of Paige and Saunders for the single-vector case [13]. Thus we can state an adaptive BLanczos algorithm for the multiple symmetric linear systems (2) as follows.

Algorithm 2.2 (Adaptive BLanczos algorithm)
Choose an initial guess $X_0$, compute $R_0 = B - A X_0$, and set $i = 0$, $s_i = s$.
1). Initialization
    $R_0 = Q_1 T_1$ (QR factorization of $R_0$)
    $M_1 = Q_1^T A Q_1$
    $\hat X_0 = X_0$; $Q_0 = 0$; $\bar W_1 = Q_1$; $\bar L_{11} = M_1$
    $a_1 = d_1 = I_{s_i}$; $b_1 = c_1 = 0$; $\Phi_{-1} = \Phi_0 = 0$
2). For $k = 1, 2, \dots$
    2.1). Compute
        $P_{k+1} = A Q_k - Q_k M_k - Q_{k-1} T_k^T$
        $P_{k+1} = Q_{k+1} T_{k+1}$ (QR factorization of $P_{k+1}$)
        $M_{k+1} = Q_{k+1}^T A Q_{k+1}$
    2.2). Update the LQ factorization. Compute
        $L_{k+1,k-1} = T_{k+1} c_k$, $\bar L_{k+1,k} = T_{k+1} d_k$
        Find a $2s_i \times 2s_i$ orthogonal matrix $\begin{pmatrix} a_{k+1} & b_{k+1} \\ c_{k+1} & d_{k+1} \end{pmatrix}$ so that
        $[\bar L_{kk}, T_{k+1}^T] \begin{pmatrix} a_{k+1} & b_{k+1} \\ c_{k+1} & d_{k+1} \end{pmatrix} = [L_{kk}, 0]$
        where $L_{kk} \in \mathbb{R}^{s_i\times s_i}$ is lower-triangular. Compute
        $L_{k+1,k} = \bar L_{k+1,k} a_{k+1} + M_{k+1} c_{k+1}$
        $\bar L_{k+1,k+1} = \bar L_{k+1,k} b_{k+1} + M_{k+1} d_{k+1}$
        $W_k = \bar W_k a_{k+1} + Q_{k+1} c_{k+1}$
        $\bar W_{k+1} = \bar W_k b_{k+1} + Q_{k+1} d_{k+1}$
    2.3). Update the iterates. Compute $\Phi_k$ from
        $L_{kk} \Phi_k = -L_{k,k-2} \Phi_{k-2} - L_{k,k-1} \Phi_{k-1}$ (if $k = 1$, $L_{11} \Phi_1 = T_1$)
        Form $\hat X_k = \hat X_{k-1} + W_k \Phi_k$
    2.4). Test convergence and reduce the block size. Compute $\bar\Phi_{k+1}$ from
        $\bar L_{k+1,k+1} \bar\Phi_{k+1} = -L_{k+1,k-1} \Phi_{k-1} - L_{k+1,k} \Phi_k$
        Form $\hat R_k = T_{k+1} (c_{k+1} \Phi_k + d_{k+1} \bar\Phi_{k+1})$
        Compute a QR factorization with row pivoting of $\hat R_k$: $\Pi \hat R_k U = Z$, where $\Pi \in \mathbb{R}^{s_i\times s_i}$ is a permutation matrix, $U \in \mathbb{R}^{s_i\times s_i}$ is orthogonal, and $Z \in \mathbb{R}^{s_i\times s_i}$ is lower-triangular.
        If $Z$ has full rank, go to 2); otherwise compute $X_k = \hat X_k + \bar W_{k+1} \bar\Phi_{k+1}$ and partition $Z$ and $X_k U$ as
        $Z = [\bar Z, Z^{(i)}], \qquad X_k U = [\bar X_k, X^{(i)}]$
        where $\bar Z \in \mathbb{R}^{s_i\times d}$ $(d < s_i)$, $\|Z^{(i)}\|_F \le \varepsilon$, and $X^{(i)} \in \mathbb{R}^{n\times(s_i-d)}$. Set
        $U^{(i)} = U$, $R_0 = -Q_{k+1} \bar Z$, $i = i + 1$, $s_i = d$.
        If $s_i = 0$, go to 3); otherwise go to 1).
3). Form the solution of (2) by undoing the deflation transformations:
    $X = [\cdots[[\bar X, X^{(i)}] U^{(i)T}, X^{(i-1)}] \cdots, X^{(0)}] U^{(0)T}$
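The rank test in step 2.4 decides which linear combinations of the systems have converged and can be dropped. Algorithm 2.2 uses a pivoted QR factorization of the small factor $\hat R_k$; as a simple stand-in, the sketch below uses an SVD-based numerical rank test (our own simplification, not the paper's exact procedure):

```python
import numpy as np

def deflation_split(Rk, tol):
    """Detect loss of full rank in a residual block, cf. step 2.4.

    Returns the numerical rank d and an orthogonal matrix Omega such that
    the last s - d columns of Rk @ Omega are negligible; those columns
    correspond to converged directions that can be deflated.
    """
    U, sv, Vt = np.linalg.svd(Rk, full_matrices=False)
    d = int(np.sum(sv > tol * max(sv[0], 1e-300)))
    return d, Vt.T

# A residual block whose two columns are nearly parallel: its rank is 1,
# so one linear combination of the systems has (almost) converged.
rng = np.random.default_rng(2)
r = rng.standard_normal((30, 1))
Rk = np.hstack([r, 2.0 * r + 1e-13 * rng.standard_normal((30, 1))])
d, Omega = deflation_split(Rk, tol=1e-8)
assert d == 1
assert np.linalg.norm((Rk @ Omega)[:, d:]) <= 1e-8 * np.linalg.norm(Rk)
```

A pivoted QR (as in the algorithm) is cheaper than an SVD for the same purpose and is usually an equally reliable rank revealer for these small $s_i \times s_i$ factors.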
3. The block MINRES algorithm
If the residual norms $\|R_k\|_F$, $k = 1, 2, \dots$, generated by the BLanczos algorithm are plotted, the resulting curve is typically not monotonically decreasing as a function of the iteration number; irregular peaks can appear, reflecting erratic convergence behavior. In addition, the matrix $\hat T_k$ may be singular, in which case the BLanczos iterates $X_k$ are not well defined. In order to overcome these drawbacks, we extend Paige and Saunders' MINRES algorithm [13] from single to multiple systems. The BMINRES algorithm is derived in much the same way as the block GMRES algorithm [3, 22]. Define the $(k+1)s \times ks$ matrix $\tilde T_k$ by

    $\tilde T_k = \begin{pmatrix} \hat T_k \\ T_{k+1} E_k^T \end{pmatrix}$    (28)

The relation (7) can then be written in the form

    $A \hat Q_k = \hat Q_{k+1} \tilde T_k$    (29)
The $k$th BMINRES iterate $X_k$ has the form

    $X_k = X_0 + \hat Q_k Y_k$    (30)

The $k$th BMINRES residual $R_k = B - A X_k$ is of the form

    $R_k = R_0 - A \hat Q_k Y_k = \hat Q_{k+1} (E_1 T_1 - \tilde T_k Y_k)$    (31)

We choose $Y_k$ to minimize $\|R_k\|_F$. Since the columns of $\hat Q_{k+1}$ are orthonormal, $Y_k$ satisfies the matrix least-squares problem

    $\|R_k\|_F = \min_Y \|E_1 T_1 - \tilde T_k Y\|_F$    (32)
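The minimization (32) can be previewed with an explicit basis before introducing the recursive QR machinery. The following NumPy sketch (our own naming and test matrix) computes the minimum-residual iterate directly and compares it with the Galerkin iterate over the same subspace:

```python
import numpy as np

rng = np.random.default_rng(4)
n, s, k = 40, 2, 6
A = rng.standard_normal((n, n)); A = A + A.T     # symmetric, indefinite
B = rng.standard_normal((n, s))
R0 = B                                           # X0 = 0

# Explicit-basis analogue of the BMINRES iterate: minimize the residual
# Frobenius norm over the block Krylov subspace, cf. (30)-(32).
blocks = [R0]
for _ in range(k - 1):
    blocks.append(A @ blocks[-1])
Qhat, _ = np.linalg.qr(np.hstack(blocks))
Yk = np.linalg.lstsq(A @ Qhat, R0, rcond=None)[0]   # min ||R0 - A Qhat Y||_F
Xk = Qhat @ Yk
Rk = B - A @ Xk

# The minimum-residual iterate is never worse than the initial residual,
# nor than the Galerkin (BLanczos) iterate over the same subspace.
Yg = np.linalg.lstsq(Qhat.T @ A @ Qhat, Qhat.T @ R0, rcond=None)[0]
Rg = B - A @ (Qhat @ Yg)
assert np.linalg.norm(Rk) <= np.linalg.norm(R0) + 1e-12
assert np.linalg.norm(Rk) <= np.linalg.norm(Rg) + 1e-10
```

Note that a column-wise least-squares solve minimizes the Frobenius norm of the whole residual block, which is why (32) decouples so conveniently.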
The properties (30) and (32) completely define the BMINRES iterates. The following result, similar to Theorem 2.1, can also be established.

Theorem 3.1. Consider two linear systems with multiple right-hand sides, $A X = B$ and $A \tilde X = \tilde B$, where $A \in \mathbb{R}^{n\times n}$ is symmetric and $B, \tilde B, X, \tilde X \in \mathbb{R}^{n\times s}$. Suppose that $\tilde B = B \Omega$ and $\tilde X_0 = X_0 \Omega$, where $\Omega \in \mathbb{R}^{s\times s}$ is nonsingular, and that $\tilde T_k$ has full rank after $k$ steps. Then the BMINRES iterates and residuals satisfy $\tilde X_k = X_k \Omega$ and $\tilde R_k = R_k \Omega$.

The matrix least-squares problem (32) may be solved by using a QR factorization. We find an orthogonal matrix $\tilde Q_{k+1}$ such that
    $\tilde Q_{k+1} \tilde T_k = \begin{pmatrix} \mathcal R_k \\ 0 \end{pmatrix} = \begin{pmatrix} R_{11} & R_{12} & R_{13} & & \\ & R_{22} & R_{23} & R_{24} & \\ & & \ddots & \ddots & \ddots \\ & & & R_{k-1,k-1} & R_{k-1,k} \\ & & & & R_{kk} \\ & & & & 0 \end{pmatrix}$    (33)

where $\mathcal R_k \in \mathbb{R}^{ks\times ks}$ and the diagonal blocks $R_{ii} \in \mathbb{R}^{s\times s}$ are upper-triangular. $\tilde Q_{k+1}$ may be updated from the previous iteration by setting

    $\tilde Q_{k+1} = \begin{pmatrix} I_{(k-1)s} & 0 \\ 0 & Q(a_k, b_k, c_k, d_k) \end{pmatrix} \begin{pmatrix} \tilde Q_k & 0 \\ 0 & I_s \end{pmatrix}$    (34)

where $Q(a_k, b_k, c_k, d_k) = \begin{pmatrix} a_k & b_k \\ c_k & d_k \end{pmatrix}$ is a $2s \times 2s$ orthogonal matrix and $\tilde Q_1 = I_s$.

The problem (32) is equivalent to

    $\min_Y \left\| \tilde Q_{k+1} E_1 T_1 - \begin{pmatrix} \mathcal R_k \\ 0 \end{pmatrix} Y \right\|_F$    (35)

Let

    $\tilde Q_{k+1} E_1 T_1 = \begin{pmatrix} t_k \\ \tau_{k+1} \end{pmatrix}, \qquad \tau_1 = T_1$    (36)

where $t_k \in \mathbb{R}^{ks\times s}$ and $\tau_{k+1} \in \mathbb{R}^{s\times s}$. Then the solution of the problem (35) is obtained by setting

    $Y_k = \mathcal R_k^{-1} t_k$    (37)

and the $k$th BMINRES residual $R_k$ is of the form

    $R_k = \hat Q_{k+1} \tilde Q_{k+1}^T \begin{pmatrix} 0 \\ \tau_{k+1} \end{pmatrix}$    (38)
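Because the block Krylov subspaces are nested and each BMINRES iterate minimizes the residual norm over the current subspace, the residual norms $\|R_k\|_F$ are non-increasing in $k$, which is exactly the smooth convergence that motivated the method. A quick numerical check on the equivalent least-squares formulation (32), as a NumPy sketch with our own test matrix:

```python
import numpy as np

rng = np.random.default_rng(5)
n, s = 30, 2
A = rng.standard_normal((n, n)); A = A + A.T
B = rng.standard_normal((n, s))
R0 = B

# The minimum-residual iterates over nested block Krylov subspaces give
# non-increasing residual norms; cf. the least-squares formulation (32).
norms, blocks = [np.linalg.norm(R0)], [R0]
for k in range(1, 11):
    Qhat, _ = np.linalg.qr(np.hstack(blocks))
    Y = np.linalg.lstsq(A @ Qhat, R0, rcond=None)[0]
    norms.append(np.linalg.norm(B - A @ (Qhat @ Y)))
    blocks.append(A @ blocks[-1])

assert all(b <= a + 1e-9 for a, b in zip(norms, norms[1:]))
```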
The following theorem establishes a relationship between the rank deficiency of $P_{k+1}$ and the loss of full rank in the BMINRES residual $R_k$.

Theorem 3.2. Suppose that $\hat T_k$ is nonsingular after $k$ steps and that $P_{k+1}$ is rank-deficient. Then the BMINRES residual $R_k$ loses full rank. Moreover, if $\mathrm{rank}(P_{k+1}) = d < s$, then $\mathrm{rank}(R_k) \le d$.

Proof. From (34) and (36) it is easy to verify that

    $\tau_{k+1} = c_k \tau_k, \quad k = 1, 2, \dots$    (39)

Since $\hat T_k$ is nonsingular,

    $\tilde Q_k \hat T_k = \begin{pmatrix} R_{11} & R_{12} & R_{13} & & \\ & R_{22} & R_{23} & R_{24} & \\ & & \ddots & \ddots & \ddots \\ & & & R_{k-1,k-1} & R_{k-1,k} \\ & & & & \tilde R_{kk} \end{pmatrix} = \tilde{\mathcal R}_k$    (40)

is nonsingular, and so is $\tilde R_{kk} \in \mathbb{R}^{s\times s}$. Using (34) and (40), we have

    $\tilde Q_{k+1} \tilde T_k = \begin{pmatrix} I_{(k-1)s} & 0 \\ 0 & Q(a_k, b_k, c_k, d_k) \end{pmatrix} \begin{pmatrix} \tilde{\mathcal R}_k \\ T_{k+1} E_k^T \end{pmatrix}$

where the $2s \times 2s$ orthogonal matrix $Q(a_k, b_k, c_k, d_k)$ is chosen such that

    $\begin{pmatrix} a_k & b_k \\ c_k & d_k \end{pmatrix} \begin{pmatrix} \tilde R_{kk} \\ T_{k+1} \end{pmatrix} = \begin{pmatrix} R_{kk} \\ 0 \end{pmatrix}$    (41)

where $R_{kk} \in \mathbb{R}^{s\times s}$ is upper-triangular. From (41) we obtain

    $c_k \tilde R_{kk} + d_k T_{k+1} = 0$    (42)

It follows from (38), (39), (42) and the nonsingularity of $\tilde R_{kk}$ that $\mathrm{rank}(R_k) = \mathrm{rank}(\tau_{k+1}) \le \mathrm{rank}(c_k) \le \mathrm{rank}(T_{k+1}) = \mathrm{rank}(P_{k+1}) = d$. □

Using Theorems 3.1 and 3.2, we can also reduce the block size adaptively in the BMINRES algorithm. An adaptive BMINRES algorithm for the multiple systems (2) may be summarized as follows.

Algorithm 3.1 (Adaptive BMINRES algorithm)
Choose an initial guess $X_0$, compute $R_0 = B - A X_0$, and set $i = 0$, $s_i = s$.
1). Initialization
    $R_0 = Q_1 T_1$ (QR factorization of $R_0$)
    $\tau_1 = T_1$
    $Q_0 = 0$; $q_0 = Q_1$; $D_{-1} = D_0 = 0$
    $a_0 = d_0 = d_{-1} = I_{s_i}$; $b_0 = b_{-1} = c_0 = 0$
2). For $k = 1, 2, \dots$
    2.1). Compute
        $M_k = Q_k^T A Q_k$
        $P_{k+1} = A Q_k - Q_k M_k - Q_{k-1} T_k^T$
        $P_{k+1} = Q_{k+1} T_{k+1}$ (QR factorization of $P_{k+1}$)
    2.2). Update the QR factorization of $\tilde T_k$. Compute
        $R_{k-2,k} = b_{k-2} T_k^T$
        $R_{k-1,k} = a_{k-1} d_{k-2} T_k^T + b_{k-1} M_k$
        $\tilde R_{kk} = c_{k-1} d_{k-2} T_k^T + d_{k-1} M_k$
        Find a $2s_i \times 2s_i$ orthogonal matrix $\begin{pmatrix} a_k & b_k \\ c_k & d_k \end{pmatrix}$ such that
        $\begin{pmatrix} a_k & b_k \\ c_k & d_k \end{pmatrix} \begin{pmatrix} \tilde R_{kk} \\ T_{k+1} \end{pmatrix} = \begin{pmatrix} R_{kk} \\ 0 \end{pmatrix}$
        where $R_{kk} \in \mathbb{R}^{s_i\times s_i}$ is upper-triangular.
    2.3). Update the iterates:
        $D_k = (Q_k - D_{k-1} R_{k-1,k} - D_{k-2} R_{k-2,k}) R_{kk}^{-1}$
        $t_k = a_k \tau_k$
        $X_k = X_{k-1} + D_k t_k$
    2.4). Test convergence and reduce the block size. Compute
        $q_k = q_{k-1} c_k^T + Q_{k+1} d_k^T$
        $\tau_{k+1} = c_k \tau_k$
        Compute a QR factorization with row pivoting of $\tau_{k+1}$: $\Pi \tau_{k+1} U = Z$, where $\Pi \in \mathbb{R}^{s_i\times s_i}$ is a permutation matrix, $U \in \mathbb{R}^{s_i\times s_i}$ is orthogonal, and $Z \in \mathbb{R}^{s_i\times s_i}$ is lower-triangular.
        If $Z$ has full rank, go to 2); otherwise partition $Z$ and $X_k U$ as
        $Z = [\bar Z, Z^{(i)}], \qquad X_k U = [\bar X_k, X^{(i)}]$
        where $\bar Z \in \mathbb{R}^{s_i\times d}$ $(d < s_i)$, $\|Z^{(i)}\|_F \le \varepsilon$, and $X^{(i)} \in \mathbb{R}^{n\times(s_i-d)}$. Set
        $U^{(i)} = U$, $R_0 = q_k \bar Z$, $i = i + 1$, $s_i = d$.
        If $s_i = 0$, go to 3); otherwise go to 1).
3). Form the solution of (2) by undoing the deflation transformations:
    $X = [\cdots[[\bar X, X^{(i)}] U^{(i)T}, X^{(i-1)}] \cdots, X^{(0)}] U^{(0)T}$
L
L
k
k
M
M
k
L
k
k
M
k
k
k
T k
k
k
s
k
s
ks
k
ks
k
k
0I B F =B @
k
?
(i 1)s
i
a b c d i
i
i
i
I( ? ) k
1 C C A
(44)
i s
2 R . Note that the rst k ? 1 2s 2s orthogonal matrices
a b
with a ; b ; c ; d c d (i = 1; ; k ? 1) used to factor Te are the same as those used to factor Te ?1 . The following result is a generalization of Theorem 1 in [4]. Theorem 4.1. Let ( ) = E1T1 ? Te Y ( ); ( ) = E1T1 ? Te Y ( ) If d is nonsingular at iteration k, then (45) k( )k = kd?1c c1T1k ; k( )k = kc c1T1k i
i
i
i
s
s
k
k
k
M
M
L
L
k
k
k
k
k
k
M
L
k
F
k
k
F
12
k
F
k
F
i
i
i
i
to
. Let Qe Proof R
= F F1 be the (k + 1)s (k + 1)s orthogonal matrix reduing Te . Since Tb is nonsingular, so is R . It follows from (32) and (35)-(39) that
k
0
k +1
k
k
k
k
k k = ke k = kc c T k (M )
k +1
F
k
F
1 1
k
This establishes the second equality in (45). From (13) and (28), we have
F
( ) = E1T1 ? Te Tb?1E1T1 = ?T E 0Tb?1E T +1 1 1 L
k
k
k
(46)
T
k
k
k
It follows from (40) that Tb = Q R , where Q = F ?1 F1 and F is the ks ks principal submatrix of F . we have Tb?1 = R ?1 Q , and the (k; 1) block of Tb?1 is Re?1 times the (k; 1) block of Q = F ?1 F1 and this is just c ?1 c1 . Then T
k
k
k
i
k
k
k
k
i
k
k
k
k
kk
k
(47) k k = kT E Tb? E T k = kT Re? c ? c T k 0I 1 ? Since F = @ a b A is chosen such that (41) holds, it then follows from (42) c d (L)
k +1
F
k
1
T
k
1 1
k
k +1
F
1
kk
k
1
1 1
F
(k 1)s
k
that
k
k
k
k
T +1Re?1 = ?d?1c k
(48)
k
k
kk
Combining (47) and (48), we obtain the rst equality in (45). 2 The following result shows exactly how much larger the residual norm is for the BLanczos algorithm than that for the BMINRES algorithm, when the approximations come from the same block Krylov subspace. Theorem 4.2. In exact arithmetic, if d is nonsingular at iteration k, then the BLanczos and BMINRES residuals are related by k
kR k kR k kd? k kR k s ? kR k =kR ? k (M )
q
k
(L)
F
k
(M ) 2
F
1
F
k
(M ) 2 k 1 F
k
(49)
(M )
2
F
k
Proof. From (10), (29) and (30), it follows that the BLanczos and BMINRES residuals can each be written in the form R( ) = R0 ? AQb Y ( ) = Qb +1(E1T1 ? Te Y ( )) L;M
L;M
k
k
L;M
k
k
k
Since the columns of Qb +1 are orthonormal, then we have
k
k
kR
(L;M )
k = kE T ? Te Y 1 1
F
k
k
(L;M )
k = k F
k
(L;M )
k
(50)
F
k
From (39), we obtain
kR k = k k = ke k = kc e k kc k ke k = kc k kR ? k (M )
(M ) k
F
k
F
k +1
F
k
13
k
F
k
F
k
F
k
F
(M ) k 1
F
(51)
Then
kc k kR k =kR ? k (M )
k
Let
F
(M ) k 1
F
k
!
0
= (L)
(52)
F
Z(
L)
k
k
where Z ( ) 2 R . From (46) and (48), we have L
s
s
k
Z ( ) = d?1c c1T1 = d?1e +1 L
k
k
k
k
k
Then
kR k = k k = kZ k kd? k ke k = kd? k kR k (L)
(L)
(L)
F
k
k
F
1
F
k
2
k
k +1
1
F
k
(M )
2
k
F
(53)
kR k = ke k kd k kZ k = kd k kR k
(54)
kd k kR k =kR k
(55)
(M )
k +1
F
k
(L)
F
k
F
k
(M )
k
F
k
(L)
F
k
F
k
(L)
F
k
F
F
Since c c + d d = I , it follows from (52) and (55) that k
T k
k
T k
s
s = kc k + kd k kR k + kR k kR ? k kR k k
2
F
k
(M ) 2
2
k
F
F
(M ) 2 k 1 F
(M ) 2 k
F
(L) 2 k
(56)
F
So (49) holds. 2 Note that if s = 1, then (51)-(56) reduce equalities. Therefore, the inequalities in (49) become equalities in this case. Theorem 4.2 shows that when the BMINRES algorithm is converging rapidly, the residual norms in the BLanczos algorithm exhibit similar behavior, whereas when the BMINRES algorithm is converging very slowly, the residual norms in the BLanczos algorithm are usually larger. This is a generalization of the relationship established between Lanczos and MINRES residual norms[4].
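For $s = 1$, the equality version of (49) is the classical relation between Galerkin (Lanczos) and minimum-residual (MINRES) residual norms, and it is easy to verify numerically. The following NumPy sketch (our own naming; an SPD test matrix with modest condition number, and explicit Krylov bases instead of the recurrences) checks it directly:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 30
G = rng.standard_normal((n, n))
A = G @ G.T / n + np.eye(n)          # SPD, modest condition number
b = rng.standard_normal((n, 1))

# For s = 1 the inequalities in (49) collapse to the classical relation
# ||r_k^(L)|| = ||r_k^(M)|| / sqrt(1 - ||r_k^(M)||^2 / ||r_{k-1}^(M)||^2)  [4].
blocks, mn, ln = [b], [np.linalg.norm(b)], []
for k in range(1, 7):
    Q, _ = np.linalg.qr(np.hstack(blocks))
    y = np.linalg.lstsq(A @ Q, b, rcond=None)[0]      # minimum-residual iterate
    mn.append(np.linalg.norm(b - A @ (Q @ y)))
    yg = np.linalg.solve(Q.T @ A @ Q, Q.T @ b)        # Galerkin iterate
    ln.append(np.linalg.norm(b - A @ (Q @ yg)))
    blocks.append(A @ blocks[-1])

for k in range(1, 7):
    pred = mn[k] / np.sqrt(1.0 - (mn[k] / mn[k - 1]) ** 2)
    assert abs(ln[k - 1] - pred) <= 1e-6 * pred
```

The peaks in the Galerkin residual curve appear exactly where the minimum-residual curve stagnates, which is what the denominator in the relation expresses.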
5. Numerical experiments

In this section we present numerical experiments that illustrate the behavior of the adaptive BLanczos and BMINRES algorithms defined by Algorithm 2.2 and Algorithm 3.1, respectively, and compare the numerical performance of the BLanczos algorithm with that of the BMINRES algorithm. We also compare the BLanczos and BMINRES algorithms with the standard Lanczos and MINRES algorithms for symmetric linear systems applied to a sequence of right-hand sides. All codes were written in Fortran, using double precision arithmetic. All the experiments were performed on a SunOS workstation at CERFACS; the codes were compiled and executed under the F77 compiler, and timings were obtained using the function Elapse.
In all examples the right-hand sides were chosen as a random matrix of dimension $n \times s$ with values uniformly distributed in the interval $[-1, 1]$. The starting guess $X_0$ was taken to be zero. The deflating or stopping test at iteration $k$ is

    $\|r_k^{(i)}\|_2 / \|R_0\|_F \le \varepsilon$

where $r_k^{(i)}$ is the residual for the deflated linear systems and $R_0$ is the initial residual matrix.

Example 5.1. The first test matrix is GR3030 from the LAPLACE group of the Harwell-Boeing collection [5]. The matrix was generated from a nine-point discretization of the Laplacian on the unit square with Dirichlet boundary conditions. The order of the matrix is 900 and it has 7744 nonzero entries. Let $\varepsilon = 10^{-10}$. Table 5.1 shows the number of iterations to convergence for each method. The numbers of iterations for the BLanczos and BMINRES algorithms decrease as the number of right-hand sides increases, corroborating the theoretical results in [11].

Table 5.1. Number of iterations to convergence

    s          2      4      8      16     32     64
    Lanczos    151    300    599    1180   2335   4630
    MINRES     150    299    591    1172   2323   4587
    BLanczos   68     51     38     28     19     13
    BMINRES    69     53     40     30     23     15
Table 5.2 displays the number of matrix-vector products performed to generate the Krylov subspaces until convergence for each method. The BMINRES algorithm requires the smallest number of matrix-vector products among the four methods.

Table 5.2. Number of matrix-vector products

    s          2      4      8      16     32     64
    Lanczos    153    304    607    1196   2367   4694
    MINRES     152    303    599    1188   2355   4651
    BLanczos   137    210    312    456    658    926
    BMINRES    136    204    303    441    631    885
Table 5.3 reports the CPU time for the four methods. The BLanczos and BMINRES algorithms retain good performance.

Table 5.3. CPU time to convergence (in seconds)

    s          2      4      8      16     32     64
    Lanczos    140    279    694    1092   2193   4338
    MINRES     140    277    546    1092   2207   4254
    BLanczos   131    205    321    519    969    1815
    BMINRES    130    201    318    534    933    1875
Example 5.2. The second test matrix is BCSSTK09 from the BCSSTRUC1 group [5]. The order of the matrix is 1083 and the number of nonzero entries is 18437. Let $\varepsilon = 10^{-8}$. The numbers of iterations and matrix-vector products, and the CPU times, for each method are shown in Tables 5.4, 5.5 and 5.6, respectively.

Table 5.4. Number of iterations to convergence

    s          2      4      8      16     32     64
    Lanczos    578    1163   2311   4581   9101   18038
    MINRES     576    1148   2277   4514   8936   17748
    BLanczos   266    203    195    82     178    43
    BMINRES    238    155    95     62     42     30

Table 5.5. Number of matrix-vector products

    s          2      4      8      16     32     64
    Lanczos    580    1167   2319   4597   9133   18102
    MINRES     578    1152   2285   4530   8968   17812
    BLanczos   484    677    975    1078   1589   1594
    BMINRES    458    582    723    907    1143   1401
Table 5.6. CPU time to convergence (in seconds)

    s          2      4      8      16     32     64
    Lanczos    1284   2663   5133   10212  20313  40033
    MINRES     1282   2557   5051   10076  19817  39456
    BLanczos   1096   1551   2284   2680   4262   5212
    BMINRES    1040   1345   1724   2314   3327   5094
The convergence curves for the BLanczos and BMINRES algorithms for Example 5.2 are plotted in Figures 5.1-5.6. They show that the residual norm $\|R_k\|_F$ generated by the BMINRES algorithm is a monotonically decreasing function of the iteration number $k$, while the corresponding curve for the BLanczos algorithm is not monotonically decreasing.
6. Conclusion
We have proposed the adaptive BLanczos and BMINRES algorithms for the iterative solution of symmetric linear systems with multiple right-hand sides, and have compared the two algorithms both in theory and in numerical performance. Numerical examples compare the BLanczos and BMINRES algorithms with the standard Lanczos and MINRES algorithms, in which the multiple linear system (2) is treated as $s$ separate linear systems of equations. All our numerical experiments show that the adaptive BLanczos and BMINRES algorithms are attractive methods for solving symmetric linear systems with multiple right-hand sides.

Acknowledgment. The author would like to thank Professor F. Chaitin-Chatelin and Dr. V. Fraysse for providing a fine working environment during his visit to CERFACS.
References

[1] D. Calvetti and L. Reichel. Application of a block modified Chebyshev algorithm to the iterative solution of symmetric linear systems with multiple right hand side vectors, Numer. Math., 68, 3-16, 1994.
[2] A. Chapman and Y. Saad. Deflated and augmented Krylov subspace techniques, Numer. Linear Algebra Appl., 4, 43-66, 1997.
[3] J. Cullum and W. E. Donath. A block Lanczos algorithm for computing the q algebraically largest eigenvalues and a corresponding eigenspace of large, sparse, symmetric matrices, Proc. 1974 IEEE Conference on Decision and Control, pp. 505-509, 1974.
[4] J. Cullum and A. Greenbaum. Relations between Galerkin and norm-minimizing iterative methods for solving linear systems, SIAM J. Matrix Anal. Appl., 17, 223-247, 1996.
[5] I. S. Duff, R. G. Grimes and J. G. Lewis. User's Guide for the Harwell-Boeing Sparse Matrix Collection (Release I), Tech. Report TR/PA/92/86, CERFACS, Toulouse, France, Oct. 1992.
[6] R. W. Freund and M. Malhotra. A block QMR algorithm for non-Hermitian linear systems with multiple right-hand sides, Linear Algebra Appl., 254, 119-157, 1997.
[7] G. H. Golub and R. Underwood. The block Lanczos method for computing eigenvalues, in Mathematical Software III (J. R. Rice, ed.), Academic Press, New York, pp. 361-377, 1977.
[8] M. R. Hestenes and E. Stiefel. Methods of conjugate gradients for solving linear systems, J. Res. Nat. Bur. Standards, 49, 409-436, 1952.
[9] C. Lanczos. An iteration method for the solution of the eigenvalue problem of linear differential and integral operators, J. Res. Nat. Bur. Standards, 45, 255-282, 1950.
[10] A. A. Nikishin and A. Yu. Yeremin. Variable block CG algorithms for solving large sparse symmetric positive definite linear systems on parallel computers, I: general iterative scheme, SIAM J. Matrix Anal. Appl., 16, 1135-1153, 1995.
[11] D. P. O'Leary. The block conjugate gradient algorithm and related methods, Linear Algebra Appl., 29, 293-322, 1980.
[12] D. P. O'Leary. Parallel implementation of the block conjugate gradient algorithm, Parallel Computing, 5, 127-139, 1987.
[13] C. C. Paige and M. A. Saunders. Solution of sparse indefinite systems of linear equations, SIAM J. Numer. Anal., 12, 617-629, 1975.
[14] M. Papadrakakis and S. Smerou. A new implementation of the Lanczos method in linear problems, Internat. J. Numer. Methods Engrg., 29, 141-159, 1990.
[15] B. N. Parlett. A new look at the Lanczos algorithm for solving symmetric systems of linear equations, Linear Algebra Appl., 29, 323-346, 1980.
[16] Y. Saad. On the Lanczos method for solving symmetric systems with several right hand sides, Math. Comp., 48, 651-662, 1987.
[17] Y. Saad. Iterative Methods for Sparse Linear Systems, PWS Publishing Company, Boston, 1996.
[18] M. Sadkane and B. Vital. Davidson's method for linear systems of equations. Implementation of a block algorithm on a multi-processor, Tech. Report TR/PA/91/60, CERFACS, Toulouse, France, Sept. 1991.
[19] V. Simoncini and E. Gallopoulos. An iterative method for nonsymmetric systems with multiple right-hand sides, SIAM J. Sci. Comput., 16, 917-933, 1995.
[20] V. Simoncini. A stabilized QMR version of block BICG, SIAM J. Matrix Anal. Appl., 18, 419-434, 1997.
[21] H. A. van der Vorst. An iterative solution method for solving f(A)x = b using Krylov subspace information obtained for the symmetric positive definite matrix A, J. Comput. Appl. Math., 18, 249-263, 1987.
[22] B. Vital. Etude de quelques methodes de resolution de problemes lineaires de grande taille sur multiprocesseur, Ph.D. thesis, Universite de Rennes I, Rennes, France, Nov. 1990.
[Figure 5.1: Convergence history for s = 2; log(||R_k||_F) versus iteration count, BMINRES and BLanczos]
[Figure 5.2: Convergence history for s = 4; log(||R_k||_F) versus iteration count, BMINRES and BLanczos]
[Figure 5.3: Convergence history for s = 8; log(||R_k||_F) versus iteration count, BMINRES and BLanczos]
[Figure 5.4: Convergence history for s = 16; log(||R_k||_F) versus iteration count, BMINRES and BLanczos]
[Figure 5.5: Convergence history for s = 32; log(||R_k||_F) versus iteration count, BMINRES and BLanczos]
[Figure 5.6: Convergence history for s = 64; log(||R_k||_F) versus iteration count, BMINRES and BLanczos]