ABLE: an Adaptive Block Lanczos Method for Non-Hermitian Eigenvalue Problems

Zhaojun Bai (Department of Mathematics, University of Kentucky, Lexington, KY 40506, USA), David Day (MS 1110, Sandia National Laboratories, PO Box 5800, Albuquerque, NM 87185, USA), and Qiang Ye (Department of Applied Mathematics, University of Manitoba, Winnipeg, Manitoba R3T 2N2, Canada)

May 10, 1995. Revised February 4, 1996.

This work was supported in part by NSF grant ASC-9313958 and in part by DOE grant DE-FG03-94ER25219 via subcontracts from the University of California at Berkeley. This research is also supported in part by a research grant from NSERC of Canada.

Abstract

This work presents an Adaptive Block Lanczos method for large scale non-Hermitian Eigenvalue problems (henceforth the ABLE method). The ABLE method is a block version of the non-Hermitian Lanczos algorithm. There are three innovations. First, an adaptive blocksize scheme cures (near) breakdown and adapts the blocksize to the order of multiple or clustered eigenvalues. Second, stopping criteria are developed that exploit the quadratic convergence property of the method. Third, a well-known technique from the Hermitian Lanczos algorithm is generalized to monitor the loss of duality and maintain semi-duality among the computed Lanczos vectors. Each innovation is theoretically justified. Academic model problems and real application problems are solved to demonstrate the robustness and effectiveness of this competitive method.

1 Introduction

A number of efficient numerical algorithms for solving large scale matrix computation problems are built upon the Lanczos procedure, a procedure for successive reduction of a given matrix to a tridiagonal form [35]. SYMMLQ [41] and QMR [21] are Lanczos based procedures for solving large scale Hermitian and non-Hermitian linear systems,


respectively. LANCZOS [14], LASO [46, 55] and LANZ [30] are Lanczos based algorithms for solving large scale symmetric eigenvalue problems. All corresponding software can be obtained from netlib, a software distribution system. Block Lanczos algorithms for Hermitian eigenvalue problems are developed in [11, 22, 48], and a state-of-the-art implementation is described in [26]. Lanczos based procedures have emerged as the methods of choice for large sparse partial Hermitian eigenvalue problems.

Over the last decade there has been considerable interest in Lanczos based algorithms for solving non-Hermitian eigenvalue problems. The Lanczos algorithm without rebiorthogonalization is implemented and applied to a number of application problems in [13]. It is also successfully used in other applications, such as the transient stability analysis of Euler/Navier-Stokes solvers used to compute the flow over an airfoil [36]. Different schemes to overcome possible failure in the non-Hermitian Lanczos algorithm are studied in [47, 20, 63]. Theoretical studies of breakdown and instability can be found in [24, 44, 27, 6]. Error analyses of the non-Hermitian Lanczos procedure implemented in finite precision arithmetic are presented in [2, 16], which extend the well-known results of Paige from the Hermitian case [40].

Despite all this progress, a number of unresolved issues obstruct a robust implementation of the non-Hermitian Lanczos procedure. These issues include:

• how to reduce data movement to achieve high performance when the Lanczos algorithm is applied to a large sparse matrix,
• how to distinguish copies of converged Rayleigh-Ritz values from multiple or clustered eigenvalues,
• how to treat a breakdown or a near breakdown to preserve the stringent numerical stability requirements on the Lanczos procedure for eigenvalue problems in finite precision arithmetic,
• how to explain and take advantage of the observed fast (quadratic) convergence of the Lanczos procedure,
• how to detect ill-posed or poorly conditioned eigenvalue problems,
• how to extend the understanding of the Hermitian Lanczos algorithm with semi-orthogonality [56, 25] and the non-Hermitian Lanczos algorithm with semi-duality [16] to the block non-Hermitian Lanczos algorithm.

In the ABLE method proposed in this work, we address each of these issues as follows.

• A block version of the Lanczos procedure is implemented. Several non-trivial implementation issues are addressed.
• The blocksize adapts to be at least the order of multiple or clustered eigenvalues, and the linear independence of the Lanczos vectors is maintained. This accelerates convergence in the presence of multiple or clustered eigenvalues.
• The blocksize also adapts to cure (near) breakdowns. The adaptive blocking scheme proposed here enjoys the theoretical advantage that any exact breakdown can be cured with fixed augmentation vectors. In contrast, the prevalent look-ahead techniques require an arbitrary number of augmentation vectors to cure a breakdown, and may not be able to cure all breakdowns [47, 20, 44].
• An asymptotic analysis of the second order convergence of the Lanczos procedure is presented and utilized in the stopping criteria.
• Since both the left and right approximate eigenvectors are available, the approximate eigenvalue condition numbers are readily computable. This detects ill-posedness in an eigenvalue problem.
• A scheme to monitor the loss of duality and maintain semi-duality is developed in the adaptive block Lanczos procedure.

The ABLE method is a generalization of the block Hermitian Lanczos algorithm [22, 26] to the non-Hermitian case. For general application codes that represent their matrices out-of-core, block algorithms substitute matrix block multiplies and block solvers for matrix vector products and simple solvers [26]. In other words, higher level BLAS are used in the inner loop of block algorithms. This decreases the I/O costs essentially by a factor of the blocksize. The robustness and effectiveness of the ABLE method is demonstrated using several numerical examples from academic model problems and real applications.

Other competitive methods for computing some eigenvalues of a large sparse (non-Hermitian) matrix include the simultaneous iteration method [5, 60, 17], the Arnoldi method [1, 51, 58], the block version of the Arnoldi method [52, 54, 28], the rational Krylov subspace method [49], the Davidson method [15, 53] and the Jacobi-Davidson method [57, 8]. An extensive comparison among these approaches is expected in the future.

The rest of this paper is organized as follows. In §2, we present a basic block non-Hermitian Lanczos algorithm, discuss its convergence properties and review how to maintain duality among Lanczos vectors computed in finite precision arithmetic. An adaptive block scheme to cure (near) breakdown and adapt the blocksize to the order of multiple or clustered eigenvalues is described in §3. In §4, we model the loss of duality among the Lanczos vectors in finite precision arithmetic, and present an efficient algorithm for maintaining semi-duality among the computed Lanczos vectors. The complete ABLE method is presented in §5 and implementation issues are discussed. In §6, we briefly discuss how a spectral transformation is used to solve a generalized eigenvalue problem using the ABLE method. Numerical experiments are reported in §7.


2 A Block Non-Hermitian Lanczos Algorithm

In this section we present a block implementation of the non-Hermitian Lanczos algorithm and discuss its convergence properties for solving non-Hermitian eigenvalue problems. An adaptive block non-Hermitian Lanczos algorithm (see §5) builds into this algorithm features presented in the intervening sections.

2.1 A Basic Block Lanczos Algorithm

The block non-Hermitian Lanczos procedure is a variation of the original Lanczos procedure as proposed by Lanczos [35]. Given an $n \times n$ matrix $A$ and initial $n \times p$ block vectors $P_1$ and $Q_1$, two sequences of $n \times p$ block vectors $\{P_j\}$ and $\{Q_j\}$, called Lanczos vectors, are generated such that for $j = 1, 2, \ldots$,
$$\mathrm{span}\{P_1^T, P_2^T, \ldots, P_j^T\} = \mathcal{K}_j(P_1^T, A), \qquad \mathrm{span}\{Q_1, Q_2, \ldots, Q_j\} = \mathcal{K}_j(Q_1, A),$$
where $\mathcal{K}_j(Q_1, A)$ and $\mathcal{K}_j(P_1^T, A)$ are Krylov subspaces of $\mathbf{C}^n$ and the dual space of $\mathbf{C}^n$ generated by $A$ and the $n \times p$ block vectors $Q_1$ and $P_1^T$, respectively, i.e.,
$$\mathcal{K}_j(P_1^T, A) \equiv \mathrm{span}\{P_1^T, P_1^T A, P_1^T A^2, \ldots, P_1^T A^{j-1}\},$$
$$\mathcal{K}_j(Q_1, A) \equiv \mathrm{span}\{Q_1, AQ_1, A^2 Q_1, \ldots, A^{j-1} Q_1\}.$$
The block vectors $\{P_j\}$ and $\{Q_j\}$ are constructed so that they are bi-orthonormal, i.e.,
$$P_i^T Q_j = \begin{cases} I & \text{if } i = j, \\ 0 & \text{if } i \neq j. \end{cases}$$

Together these properties determine the computed quantities up to a scaling. Several non-trivial practical issues are resolved in the following implementation.

Basic Block Non-Hermitian Lanczos Algorithm

1. Choose starting $n \times p$ vectors $P_1$ and $Q_1$ so that $P_1^T Q_1 = I$
2. $R_1 = (P_1^T A)^T$
3. $S_1 = A Q_1$
4. For $j = 1, 2, \ldots$:
   1. $A_j = P_j^T S_j$
   2. $R_j := R_j - P_j A_j^T$
   3. $S_j := S_j - Q_j A_j$
   4. Compute the QR decomposition: $R_j = P_{j+1} B_{j+1}^T$
   5. Compute the QR decomposition: $S_j = Q_{j+1} C_{j+1}$
   6. Compute the singular value decomposition: $P_{j+1}^T Q_{j+1} = U_j \Sigma_j V_j^H$
   7. $B_{j+1} := B_{j+1} U_j \Sigma_j^{1/2}$
   8. $C_{j+1} := \Sigma_j^{1/2} V_j^H C_{j+1}$
   9. $P_{j+1} := P_{j+1} \bar{U}_j \Sigma_j^{-1/2}$
   10. $Q_{j+1} := Q_{j+1} V_j \Sigma_j^{-1/2}$
   11. $R_{j+1} = (P_{j+1}^T A - C_{j+1} P_j^T)^T$
   12. $S_{j+1} = A Q_{j+1} - Q_j B_{j+1}$
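For concreteness, the following is a minimal NumPy sketch of one pass of steps 4.1-4.12 for a real matrix $A$ (so the transpose and conjugate transpose coincide). The function name and calling convention are ours, not part of the paper.

```python
import numpy as np

def block_lanczos_step(A, P_prev, Q_prev, P, Q, B, C):
    """One pass of steps 4.1-4.12 for a real n-by-n array A.
    P, Q: current n-by-p Lanczos blocks with P.T @ Q = I.
    P_prev, Q_prev, B, C: blocks and coefficients from the previous
    pass (all zeros on the first pass)."""
    R = A.T @ P - P_prev @ C.T            # step 4.11 (R_j, stored column-wise)
    S = A @ Q - Q_prev @ B                # step 4.12 (S_j)
    Aj = P.T @ S                          # 4.1
    R = R - P @ Aj.T                      # 4.2
    S = S - Q @ Aj                        # 4.3
    P1, Bt = np.linalg.qr(R)              # 4.4: R_j = P_{j+1} B_{j+1}^T
    Q1, C1 = np.linalg.qr(S)              # 4.5: S_j = Q_{j+1} C_{j+1}
    U, sig, Vh = np.linalg.svd(P1.T @ Q1) # 4.6
    s = np.sqrt(sig)
    B1 = (Bt.T @ U) * s                   # 4.7: B_{j+1} U_j Sigma_j^{1/2}
    C1 = (s[:, None] * Vh) @ C1           # 4.8: Sigma_j^{1/2} V_j^H C_{j+1}
    P1 = (P1 @ U) / s                     # 4.9 (U_j is real here, so no conjugation)
    Q1 = (Q1 @ Vh.T) / s                  # 4.10: Q_{j+1} V_j Sigma_j^{-1/2}
    return Aj, P1, Q1, B1, C1
```

In exact arithmetic $P_{j+1}^T Q_{j+1} = I$ after the rescaling; a breakdown shows up as a (near) zero entry of `sig`, which is where the adaptive blocking of §3 intervenes.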

Initially, if no other choice suggests itself, the vectors P1 and Q1 are chosen so that P1 = Q1 is a real orthonormalized random matrix. The above basic block Lanczos iteration implements the three term recurrences

$$B_{j+1} P_{j+1}^T = P_j^T A - A_j P_j^T - C_j P_{j-1}^T, \tag{1}$$
$$Q_{j+1} C_{j+1} = A Q_j - Q_j A_j - Q_{j-1} B_j. \tag{2}$$

This implementation is believed to be the least susceptible to roundoff errors (see Remark 3). The procedure can also be viewed as a successive reduction of an $n \times n$ non-Hermitian matrix $A$ to block tridiagonal form. If we let

$$\mathcal{P}_j = [\,P_1, P_2, \ldots, P_j\,], \qquad \mathcal{Q}_j = [\,Q_1, Q_2, \ldots, Q_j\,]$$
and
$$T_j = \begin{bmatrix} A_1 & B_2 & & \\ C_2 & A_2 & \ddots & \\ & \ddots & \ddots & B_j \\ & & C_j & A_j \end{bmatrix}, \tag{3}$$
then the three term vector recurrences (1) and (2) have the matrix form
$$\mathcal{P}_j^T A = T_j \mathcal{P}_j^T + E_j B_{j+1} P_{j+1}^T, \tag{4}$$
$$A \mathcal{Q}_j = \mathcal{Q}_j T_j + Q_{j+1} C_{j+1} E_j^T, \tag{5}$$
where $E_j$ is a tall thin matrix whose bottom square is an identity matrix and which vanishes otherwise. Furthermore, the computed Lanczos vectors $\mathcal{P}_j$ and $\mathcal{Q}_j$ satisfy the bi-orthonormality or duality property
$$\mathcal{P}_j^T \mathcal{Q}_j = I. \tag{6}$$

When the blocksize $p = 1$, this is just the unblocked non-Hermitian Lanczos algorithm discussed in [23, p. 503].

Remark 1. For a complex matrix $A$ we still use the transpose $\cdot^T$ instead of the conjugate transpose $\cdot^H$. If $A$ is complex symmetric and $P_1 = Q_1$, then equation (4) is the transpose of equation (5) and it is only necessary to compute one of these two recurrences.

Remark 2. The above block Lanczos algorithm can break down prematurely if $R_j^T S_j$ is singular. See steps 4.9 and 4.10. We will discuss this issue in §3.

Remark 3. Many choices of the $p \times p$ nonsingular scaling matrices $B_{j+1}$ and $C_{j+1}$ satisfy $R_j^T S_j = B_{j+1} C_{j+1}$. An alternative scaling maintains the unit length of all Lanczos vectors. This scaling scheme for the unblocked Lanczos algorithm is used in [20, 44, 16]. In this case the Lanczos algorithm determines a pencil $(\hat{T}_j, \Omega_j)$, where $\hat{T}_j$ is tridiagonal and $\Omega_j$ is diagonal. It can be shown that the tridiagonal $T_j$ determined by the above unblocked ($p = 1$) Lanczos algorithm and this pencil are related by
$$T_j = |\Omega_j|^{1/2} \hat{T}_j |\Omega_j|^{1/2}.$$
The Lanczos vectors are also related, up to sign, by a similar scaling.

Remark 4. The condition number of the Lanczos vectors $\mathcal{P}_j$ and $\mathcal{Q}_j$ can be monitored by the diagonal matrices $\Sigma_1, \Sigma_2, \ldots, \Sigma_j$. Recall that the condition number of the rectangular matrix $\mathcal{Q}_j$ is defined by
$$\mathrm{cond}(\mathcal{Q}_j) \stackrel{\mathrm{def}}{=} \|\mathcal{Q}_j\|_2 \|\mathcal{Q}_j^\dagger\|_2 = \frac{\|\mathcal{Q}_j\|_2}{\sigma_{\min}(\mathcal{Q}_j)},$$
where $\sigma_{\min}(\mathcal{Q}_j) = \min_{\|x\|_2 = 1} \|\mathcal{Q}_j x\|_2$. To derive the bound for $\mathrm{cond}(\mathcal{Q}_j)$, observe from steps 4.9 and 4.10 that $\|P_i^T\|_2 = \|Q_i\|_2 = \|\Sigma_i^{-1/2}\|_2$. Then for a unit vector $v$ such that $\|\mathcal{P}_j^T\|_2 = \|\mathcal{P}_j^T v\|_2$,
$$\|\mathcal{P}_j^T\|_2^2 = \|\mathcal{P}_j^T v\|_2^2 = \sum_{i=1}^j \|P_i^T v\|_2^2 \le \sum_{i=1}^j \frac{1}{\sigma_{\min}(\Sigma_i)},$$
where $\sigma_{\min}(\Sigma_i)$ denotes the smallest diagonal element of $\Sigma_i$. The latter bound also applies to $\mathcal{Q}_j$. Furthermore, we note that
$$\mathcal{P}_j^T \mathcal{Q}_j = I_j \implies \|\mathcal{P}_j\|_2\, \sigma_{\min}(\mathcal{Q}_j) \ge 1.$$
Therefore
$$\mathrm{cond}(\mathcal{Q}_j) \le \|\mathcal{P}_j\|_2 \|\mathcal{Q}_j\|_2 \le \sum_{i=1}^j \frac{1}{\sigma_{\min}(\Sigma_i)}.$$
The bound applies to $\mathrm{cond}(\mathcal{P}_j)$ by a similar argument. This generalizes and slightly improves a result from [44].

Remark 5. This implementation is a generalization of the block Hermitian Lanczos algorithms of Golub and Underwood [22] and Grimes, Lewis and Simon [26] to the non-Hermitian case. A simple version of the block non-Hermitian Lanczos procedure has been studied in [3]. Other implementations of the basic block non-Hermitian Lanczos procedure have been proposed for different applications in [7, 34].


2.2 Eigenvalue Approximation

To extract approximations to the eigenvalues and eigenvectors of $A$, we solve the eigenvalue problem of the $jp \times jp$ block tridiagonal matrix $T_j$ after step 4.5. Each eigentriplet $(\lambda, w^H, z)$ of $T_j$,
$$w^H T_j = \lambda w^H \quad\text{and}\quad T_j z = z\lambda,$$
determines a Rayleigh-Ritz triplet $(\lambda, y^H, x)$, where $y^H = w^H \mathcal{P}_j^T$ and $x = \mathcal{Q}_j z$. Rayleigh-Ritz triplets approximate eigentriplets of $A$.

To assess the approximation $(\lambda, y^H, x)$ of an eigentriplet of the matrix $A$, let $s^H$ and $r$ denote the corresponding left and right residual vectors. Then by (4) and (5), we have
$$s^H = y^H A - \lambda y^H = (w^H E_j) B_{j+1} P_{j+1}^T, \tag{7}$$
$$r = A x - x \lambda = Q_{j+1} C_{j+1} (E_j^T z). \tag{8}$$
Note that a remarkable feature of the Lanczos algorithm is that the residual norms $\|s^H\|_2$ and $\|r\|_2$ are available without explicitly computing $y^H$ and $x$. There is no need to form $y^H$ and $x$ until their accuracy is satisfactory.

The residuals determine a backward error bound for the triplet. The duality condition (6), applied to the definitions of $x$ and $y^H$, yields
$$P_{j+1}^T x = 0, \qquad y^H Q_{j+1} = 0. \tag{9}$$
From (8) and (7), we have the following measure of the backward error for the Rayleigh-Ritz triplet $(\lambda, y^H, x)$:
$$y^H (A - F) = \lambda y^H, \tag{10}$$
$$(A - F) x = x \lambda, \tag{11}$$
where the backward error matrix $F$ is
$$F = \frac{r x^H}{\|x\|_2^2} + \frac{y s^H}{\|y\|_2^2} \tag{12}$$
and $\|F\|_F^2 = \|r\|_2^2 / \|x\|_2^2 + \|s^H\|_2^2 / \|y\|_2^2$. That is, the left and right residual norms bound the distance to the nearest matrix to $A$ with eigentriplet $(\lambda, y^H, x)$. In fact it has been shown that $F$ is the smallest perturbation of $A$ such that $(\lambda, y^H, x)$ is an eigentriplet of $A - F$ [31]. The computed Rayleigh-Ritz value $\lambda$ is a $\|F\|_F$-pseudo eigenvalue of the matrix $A$ [61].

If we write $A = B + F$, where $B = A - F$, then a first order perturbation analysis indicates that there is an eigenvalue $\mu$ of $A$ such that $|\mu - \lambda| \le \mathrm{cond}(\lambda) \|F\|_2$, where
$$\mathrm{cond}(\lambda) = \frac{\|y^H\|_2 \|x\|_2}{|y^H x|}$$


is the condition number of the Rayleigh-Ritz value $\lambda$ [23]. This first order estimate is very often pessimistic because $\lambda$ is a two-sided or generalized Rayleigh quotient [42]. A second order perturbation analysis yields a more realistic error estimate, which should be used as a stopping criterion. Global second order bounds for the accuracy of the generalized Rayleigh quotient may be found in [59] and [9]. Here we derive an asymptotic bound.

Recall that $(\lambda, y^H, x)$ is an eigentriplet of $B = A - F$ and that $y^H F = s^H$ and $F x = r$. Assume that $B$ has distinct eigenvalues $\{\lambda_i\}$ and the corresponding normalized left and right eigenvectors $\{y_i^H, x_i\}$ ($\|y_i^H\|_2 = \|x_i\|_2 = 1$). Let us perturb $\lambda = \lambda(0)$ towards an eigenvalue $\mu$ of $A$ using the implicit function $\lambda(t) = \lambda(B + tE)$ for $E = F / \|F\|_2$. Under classical results from function theory [32], it can be shown that in a neighborhood of the origin there exist differentiable $\lambda(t)$, $y^H(t)$ and $x(t)$ ($\|y^H(t)\|_2 = \|x(t)\|_2 = 1$), such that
$$y^H(t)(B + tE) = \lambda(t) y^H(t) \quad\text{and}\quad (B + tE) x(t) = x(t) \lambda(t). \tag{13}$$
Next expand $\lambda(t)$ about $t = 0$:
$$\mu = \lambda(\|F\|_2) = \lambda(0) + \lambda'(0) \|F\|_2 + \tfrac{1}{2} \lambda''(0) \|F\|_2^2 + O(\|F\|_2^3).$$
By differentiating the equations (13) with respect to $t$, and setting $t = 0$, we obtain
$$\lambda'(0) = \frac{1}{\|F\|_2} \frac{y^H F x}{y^H x}.$$
Note that from (12), $y^H F x = y^H r + s^H x$. Substitute (7), (8) and (9) to find $y^H F x = 0$. This implies the stationarity property $\lambda'(0) = 0$. Differentiate equation (13) with respect to $t$ twice, set $t = 0$, and there appears
$$\lambda''(0) = \frac{2}{\|F\|_2} \frac{s^H x'(0)}{y^H x}.$$
Now the standard eigenvector sensitivity analysis gives
$$x'(0) = \sum_{\lambda_i \neq \lambda} \frac{y_i^H E x}{(\lambda - \lambda_i)\, y_i^H x_i}\, x_i,$$
see for example Golub and Van Loan [23, p. 345]. From the above two formulas, up to the second order of $\|F\|_2$, we obtain
$$|\mu - \lambda| \le \frac{\|s^H\|_2 \|r\|_2}{\mathrm{gap}(\lambda, B)} \left( \sum_{\lambda_i \neq \lambda} \frac{1}{|y^H x|} \frac{1}{|y_i^H x_i|} \right). \tag{14}$$
Here $\mathrm{gap}(\lambda, B) = \min_{\lambda_i \neq \lambda} |\lambda - \lambda_i|$. Note that the term in the parentheses involves the condition numbers of the eigenvalues $\{\lambda_i\}$ of $B$.


The bound (14) shows that the accuracy of the Rayleigh-Ritz value $\lambda$ is proportional to the product of the left and right residuals and the gap in the eigenvalues of $B$. Since $\mathrm{gap}(\lambda, B)$ is not computable, we use $\mathrm{gap}(\lambda, T_j)$ to approximate $\mathrm{gap}(\lambda, B)$ when $\|F\|_2$ is small. From (12) and (14), we advocate accepting $\lambda$ as an approximate eigenvalue of $A$ if
$$\min\left\{ \|s^H\|_2,\ \|r\|_2,\ \frac{\|s^H\|_2 \|r\|_2}{\mathrm{gap}(\lambda, T_j)} \right\} \le \tau_c, \tag{15}$$

where $\tau_c$ is a given accuracy threshold. Note that for ill-posed problems, small residuals (backward errors) do not imply high eigenvalue accuracy (small forward error). In this case, the estimate is optimistic. In any case, since both the left and right approximate eigenvectors are available, the approximate eigenvalue condition numbers are readily computable. This detects ill-posedness in an eigenvalue problem. See numerical example 5 in §7.

It is well-known that for Hermitian matrices, the Lanczos algorithm reveals first the outer and the well-separated eigenvalues [43]. In the block Hermitian Lanczos algorithm with blocksize $p$, the outer eigenvalues and the eigenvalue clusters of order up to $p$ that are well separated from the remaining spectrum converge first [22, 50]. This qualitative understanding of convergence has been extended to the block Arnoldi algorithm for non-Hermitian eigenproblems in [51, 29].
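As an illustration, here is a small NumPy sketch of the acceptance test (15). It uses the fact that the residual norms in (7) and (8) are available from the bottom blocks of the eigenvectors of $T_j$; the function name, and the use of $\|w^H E_j B_{j+1}\|_2$ and $\|C_{j+1} E_j^T z\|_2$ as the residual norms (exact only up to the norms of the last Lanczos blocks), are our choices.

```python
import numpy as np

def accept_ritz_value(lam, w, z, B_next, C_next, ritz_values, tau_c=1e-8):
    """Stopping test (15) for one eigentriplet (lam, w^H, z) of T_j.
    B_next, C_next: the p-by-p blocks B_{j+1}, C_{j+1}.
    ritz_values: all eigenvalues of T_j, used for gap(lam, T_j)."""
    p = B_next.shape[0]
    s_norm = np.linalg.norm(w[-p:].conj() @ B_next)  # ~ ||s^H||_2, from (7)
    r_norm = np.linalg.norm(C_next @ z[-p:])         # ~ ||r||_2, from (8)
    others = ritz_values[np.abs(ritz_values - lam) > 0]
    gap = np.abs(others - lam).min() if others.size else np.inf
    return min(s_norm, r_norm, s_norm * r_norm / gap) <= tau_c
```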

2.3 Maintaining the Duality of the Lanczos Vectors

The quantities computed in the block Lanczos algorithm in the presence of finite precision arithmetic have different properties than the corresponding exact quantities. The duality property (6) fails to hold, and the columns of the matrices $\mathcal{P}_j$ and $\mathcal{Q}_j$ are spanning sets but not bases. The loss of linear independence in the matrices $\mathcal{P}_j$ and $\mathcal{Q}_j$ computed by the three term recurrence is coherent; as a Rayleigh-Ritz triplet converges to an eigentriplet of $A$, copies of the Rayleigh-Ritz values appear. At this iteration, $\mathcal{Q}_j$ is singular because it maps a group of right eigenvectors of $T_j$ to an eigenvector of $A$. For example, in a Lanczos run of 100 iterations, one may observe 10 copies of the dominant eigenvalue of $A$ among the Rayleigh-Ritz values. This increases the number of iterations required to complete a given task.

As a partial remedy, we advocate maintaining local duality to ensure the duality among consecutive Lanczos vectors in the three term recurrences. Local duality is maintained as follows. After step 4.3,

$$R_j := R_j - P_j (Q_j^T R_j), \qquad S_j := S_j - Q_j (P_j^T S_j).$$
This enhances the numerical duality of the Lanczos vectors [16]. Repeating this inner loop increases the number of floating point operations in a Lanczos iteration.


However, no new data transfer is required, and the local duality would normally be swamped by iteration 10. The cost effectiveness seems indisputable.

Another limitation of simple implementations of the three term recurrences is that the multiplicity of an eigenvalue of $A$ is not related in any practical way to the multiplicity of a Rayleigh-Ritz value. To reveal the multiplicity or clustering of an eigenvalue it suffices to explicitly enforce (6). This variation has been called the Lanczos algorithm with full rebiorthogonalization [47] or complete rebiorthogonalization [7]. We will call it full duality, following the terminology used in [16]. Full duality is maintained by incorporating a variant of the Gram-Schmidt process called the two-sided modified Gram-Schmidt biorthogonalization [44] (TSMGS hereafter). After step 4.10, we dualize $P_{j+1}$ and $Q_{j+1}$ in place against all previous Lanczos vectors $\mathcal{P}_j = [\,P_1, P_2, \ldots, P_j\,]$ and $\mathcal{Q}_j = [\,Q_1, Q_2, \ldots, Q_j\,]$:

for $i = 1, 2, \ldots, j$
    $P_{j+1} := P_{j+1} - P_i (Q_i^T P_{j+1})$
    $Q_{j+1} := Q_{j+1} - Q_i (P_i^T Q_{j+1})$
end

Maintaining full duality substantially increases the cost per iteration of the Lanczos algorithm. To be precise, at Lanczos iteration $j$, it requires an additional $8 p^2 j n$ floating point operations. More importantly, all the computed Lanczos vectors are accessed at each iteration. This is very often the most costly part of a Lanczos run. A less costly alternative to full duality is presented in §4.
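A direct NumPy transcription of TSMGS might look as follows (a sketch; the blocks are kept as a Python list rather than out-of-core, and the in-place updates mirror the loop above):

```python
import numpy as np

def tsmgs(P_blocks, Q_blocks, P_new, Q_new):
    """Two-sided modified Gram-Schmidt biorthogonalization: dualize the
    candidate blocks P_new, Q_new against all previous Lanczos blocks
    P_1..P_j and Q_1..Q_j (given as lists of n-by-p arrays)."""
    for Pi, Qi in zip(P_blocks, Q_blocks):
        P_new -= Pi @ (Qi.T @ P_new)
        Q_new -= Qi @ (Pi.T @ Q_new)
    return P_new, Q_new
```

Each sweep touches every stored block once, which is exactly the $8p^2jn$ flops and the data traffic that the text attributes to full duality.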

3 An Adaptive Block Lanczos Algorithm

In this section, we present an adaptive block scheme. This algorithm has the flexibility to adjust the blocksize to the multiplicity or the order of a cluster of desired eigenvalues. In addition, the algorithm can also be used to cure (near) breakdowns.

3.1 Augmenting the Lanczos Vectors

In a variable block Lanczos algorithm, at the $j$th iteration, $P_j$ and $Q_j$ have $p_j$ columns. At the next, $(j+1)$th, iteration, the number of columns of the Lanczos vectors $P_{j+1}$ and $Q_{j+1}$ can be increased by $k$ as follows.

First note that for any $n \times k$ matrices $\hat{P}_{j+1}$ and $\hat{Q}_{j+1}$, the basic three term recurrences (1) and (2) also hold with augmented $(j+1)$th Lanczos vectors $[\,P_{j+1}\ \hat{P}_{j+1}\,]$ and $[\,Q_{j+1}\ \hat{Q}_{j+1}\,]$:
$$\begin{bmatrix} B_{j+1} & 0 \end{bmatrix} \begin{bmatrix} P_{j+1}^T \\ \hat{P}_{j+1}^T \end{bmatrix} = P_j^T A - A_j P_j^T - C_j P_{j-1}^T$$
and
$$\begin{bmatrix} Q_{j+1} & \hat{Q}_{j+1} \end{bmatrix} \begin{bmatrix} C_{j+1} \\ 0 \end{bmatrix} = A Q_j - Q_j A_j - Q_{j-1} B_j.$$
Provided that
$$\begin{bmatrix} P_{j+1} & \hat{P}_{j+1} \end{bmatrix}^T \begin{bmatrix} Q_{j+1} & \hat{Q}_{j+1} \end{bmatrix} \tag{16}$$
is nonsingular, the Lanczos procedure continues as before under the substitutions
$$P_{j+1} \leftarrow \begin{bmatrix} P_{j+1} & \hat{P}_{j+1} \end{bmatrix}, \qquad Q_{j+1} \leftarrow \begin{bmatrix} Q_{j+1} & \hat{Q}_{j+1} \end{bmatrix},$$
with proper normalization, and
$$B_{j+1} \leftarrow \begin{bmatrix} B_{j+1} & 0 \end{bmatrix}, \qquad C_{j+1} \leftarrow \begin{bmatrix} C_{j+1} \\ 0 \end{bmatrix}.$$
The only other constraint on $\hat{P}_{j+1}$ and $\hat{Q}_{j+1}$ is that they satisfy the duality condition among the Lanczos vectors, i.e., it is required that
$$\hat{P}_{j+1}^T \mathcal{Q}_j = 0 \quad\text{and}\quad \mathcal{P}_j^T \hat{Q}_{j+1} = 0.$$
As a consequence the adaptive block scheme has the same governing equations and the resulting Rayleigh-Ritz approximation properties as the basic block Lanczos method described in §2.

Before we turn to the usage of the adaptive block scheme, we discuss the choice of the increment vectors $\hat{P}_{j+1}$ and $\hat{Q}_{j+1}$. Ideally we would like to choose augmentations so that the resulting matrix $P_{j+1}^T Q_{j+1}$ is well-conditioned. To be precise, we want the smallest singular value of $P_{j+1}^T Q_{j+1}$ to be larger than a given threshold $\tau_b$, say $\tau_b = 10^{-8}$ in double precision. However, there may not exist $\hat{P}_{j+1}$ and $\hat{Q}_{j+1}$ such that the given threshold $\tau_b$ is satisfied. A natural choice of $\hat{P}_{j+1}$ and $\hat{Q}_{j+1}$ in practice is to dualize a pair of random $n \times k$ vectors against the previous Lanczos vectors. In other words, the vectors $\hat{P}_{j+1}$ and $\hat{Q}_{j+1}$ are computed by applying TSMGS (see §2.3) to a pair of random $n \times k$ vectors. The construction is repeated a few (5 at most) times if necessary to ensure that the smallest singular value of (16) is larger than the threshold. We observe that this works well in practice.
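A sketch of this construction, reusing the `tsmgs` helper from §2.3 (the function name and retry logic are ours):

```python
import numpy as np

def augment_blocks(P_blocks, Q_blocks, P_next, Q_next, k,
                   tau_b=1e-8, tries=5, seed=None):
    """Augment the candidate (j+1)th blocks with k random vectors
    dualized against the previous Lanczos vectors, retrying a few
    times until the smallest singular value of (16) exceeds tau_b."""
    n = P_next.shape[0]
    rng = np.random.default_rng(seed)
    for _ in range(tries):
        P_hat, Q_hat = tsmgs(P_blocks, Q_blocks,
                             rng.standard_normal((n, k)),
                             rng.standard_normal((n, k)))
        P_aug = np.hstack([P_next, P_hat])
        Q_aug = np.hstack([Q_next, Q_hat])
        smin = np.linalg.svd(P_aug.T @ Q_aug, compute_uv=False).min()
        if smin > tau_b:
            break
    return P_aug, Q_aug
```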

3.2 Adaptive Blocking for Clustered Eigenvalues

If $A$ has an eigenvalue of multiplicity greater than the blocksize, then the Rayleigh-Ritz values converge slowly to this group of eigenvalues [13, 22, 26]. In some applications, information about multiplicity is available a priori and then the blocksize can be chosen accordingly. But when this information is not available, it is desirable to adjust the blocksize using the information obtained during the iteration. In any variable block implementation of the Lanczos algorithm in which the duality of the computed Lanczos vectors is maintained, it is advantageous to increase the blocksize to the order of the largest cluster of Rayleigh-Ritz values $\{\lambda_i\}$. The adaptive block scheme proposed in §3.1 offers such flexibility.


The cluster of Rayleigh-Ritz values about $\lambda_i$ is the set of all $\lambda_k$ such that
$$|\lambda_i - \lambda_k| \le \eta \max(|\lambda_i|, |\lambda_k|), \tag{17}$$
where $\eta$ is a user specified clustering threshold. The order of the largest cluster of Rayleigh-Ritz values is computed whenever we test for convergence, and the blocksize is increased to the order of the largest cluster.
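In code, the cluster test (17) is a few lines (a sketch; `eta` plays the role of the clustering threshold $\eta$):

```python
import numpy as np

def largest_cluster_order(ritz_values, eta=1e-6):
    """Order of the largest cluster of Rayleigh-Ritz values under (17)."""
    lam = np.asarray(ritz_values)
    orders = [
        np.count_nonzero(np.abs(li - lam)
                         <= eta * np.maximum(np.abs(li), np.abs(lam)))
        for li in lam
    ]
    return max(orders, default=0)
```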

3.3 Adapting the Blocksize to Treat Breakdown

A second reason to increase the blocksize is to overcome a breakdown in the block Lanczos algorithm. Recall from §2.1 that breakdown occurs when $R_j^T S_j$ is singular. There are two cases:

(I) Either $R_j$ or $S_j$ or both is rank deficient.
(II) Both $R_j$ and $S_j$ are not rank deficient, but $R_j^T S_j$ is.

Exact breakdowns are rare, but near breakdowns (i.e., $R_j^T S_j$ has singular values close to 0) do occur. In finite precision arithmetic this can cause numerical instability.

In case I, if $S_j$ vanishes in step 4.3 of the basic block Lanczos algorithm, an invariant subspace is detected. To restart the Lanczos procedure, choose $Q_{j+1}$ to be any vector such that $\mathcal{P}_j^T Q_{j+1} = 0$. If $S_j$ is just (nearly) rank deficient, then after the QR decomposition of $S_j$, $S_j = Q_{j+1} C_{j+1}$, we biorthogonalize $Q_{j+1}$ against the previous left Lanczos vectors $\mathcal{P}_j$. This also effectively expands the Krylov subspace and continues the procedure. Rank deficiency of $R_j$ is treated similarly. Note that in this case, the blocksize is not changed. This generalizes the treatment suggested by Wilkinson for the unblocked Lanczos procedure [62, p. 389].

Case II is called a serious breakdown [62]. Let us first examine the case of exact breakdown. Let $R_j = P_{j+1} B_{j+1}^T$ and $S_j = Q_{j+1} C_{j+1}$ be the QR decompositions of $R_j$ and $S_j$. In this case, $P_{j+1}^T Q_{j+1}$ is singular (see step 4.6 of the algorithm in §2.1). Suppose that $P_{j+1}^T Q_{j+1}$ has the singular value decomposition
$$P_{j+1}^T Q_{j+1} = U \begin{bmatrix} \Sigma & 0 \\ 0 & 0 \end{bmatrix} V^H,$$
where $\Sigma$ is nonsingular if it exists ($\Sigma$ may be 0 by 0). Let us see how to augment $P_{j+1}$ and $Q_{j+1}$ so that $P_{j+1}^T Q_{j+1}$ is nonsingular. For clarity, drop the subscript $j+1$ and partition $PU$ and $QV$ into
$$PU = \begin{bmatrix} P_{(1)} & P_{(2)} \end{bmatrix} \quad\text{and}\quad QV = \begin{bmatrix} Q_{(1)} & Q_{(2)} \end{bmatrix}.$$
Here the number of columns of $P_{(2)}$ and $Q_{(2)}$ is the number of zero singular values of $P^T Q$. We will also use the oblique projector
$$\Pi_j = \mathcal{Q}_j \mathcal{P}_j^T.$$
Let the augmented Lanczos vectors be
$$P := \begin{bmatrix} P_{(1)} & P_{(2)} & \hat{P} \end{bmatrix} \quad\text{and}\quad Q := \begin{bmatrix} Q_{(1)} & Q_{(2)} & \hat{Q} \end{bmatrix},$$
where

$$\hat{P} = (I - \Pi_j)^T Q_{(2)} \quad\text{and}\quad \hat{Q} = (I - \Pi_j) P_{(2)}.$$
The duality condition (6) and then the orthonormality of the columns of $[\,P_{(1)}\ P_{(2)}\,]$ yield
$$P_{(1)}^T \hat{Q} = P_{(1)}^T (I - \Pi_j) P_{(2)} = P_{(1)}^T P_{(2)} = 0$$
and
$$P_{(2)}^T \hat{Q} = P_{(2)}^T (I - \Pi_j) P_{(2)} = P_{(2)}^T P_{(2)} = I.$$
Similarly, $\hat{P}^T Q_{(1)} = 0$ and $\hat{P}^T Q_{(2)} = I$. Therefore, we have
$$P^T Q = \begin{bmatrix} \Sigma & 0 & 0 \\ 0 & 0 & I \\ 0 & I & \hat{P}^T \hat{Q} \end{bmatrix},$$
which is nonsingular for any $\hat{P}^T \hat{Q}$. Therefore, we conclude that exact breakdowns are always curable by the adaptive blocksize technique.

However, for the near breakdown case, the situation is more complicated. The above choice may not succeed in increasing the smallest singular value of $P_{j+1}^T Q_{j+1}$ above a specific given threshold $\tau_b$. The difficulty involves the fact that the norms of $\hat{P}$ and $\hat{Q}$ can be large. In our implementation, we have chosen $\hat{P}$ and $\hat{Q}$ by dualizing a pair of random $n \times k$ vectors against the previous Lanczos vectors as described in §3.1. The increment to the blocksize is the number of singular values of $P_{j+1}^T Q_{j+1}$ below the specified threshold.

Another scheme for adjusting the block size to cure (near) breakdown is the look-ahead strategy [47, 20]. In the look-ahead Lanczos algorithm, the spans of the columns of $\mathcal{P}_j$ and $\mathcal{Q}_j$ remain within $\mathcal{K}(P_1^T, A)$ and $\mathcal{K}(Q_1, A)$ respectively. Specifically, $P_{j+1}$ and $Q_{j+1}$ are augmented by
$$\hat{P} = (I - \Pi_j)^T A^T P_{j+1} = A^T P_{j+1} - P_j C_{j+1}^T$$
and
$$\hat{Q} = (I - \Pi_j) A Q_{j+1} = A Q_{j+1} - Q_j B_{j+1}.$$
If $[\,P_{j+1}\ \hat{P}\,]^T [\,Q_{j+1}\ \hat{Q}\,]$ is not (nearly) singular, then one step of look-ahead is successful and $P_{j+1}$ and $Q_{j+1}$ are obtained from $P$ and $Q$, respectively, after normalization. Since
$$\mathrm{span}(\mathcal{Q}_{j+1}) = \mathrm{span}(\mathcal{Q}_j, [\,Q_{j+1}, \hat{Q}\,])$$
and
$$\mathrm{span}(\mathcal{Q}_{j+2}) = \mathrm{span}(\mathcal{Q}_j, [\,Q_{j+1}, \hat{Q}\,], A[\,Q_{j+1}, \hat{Q}\,]) = \mathrm{span}(\mathcal{Q}_{j+1}, A^2 Q_{j+1}),$$
$Q_{j+2}$ has no more columns than $Q_{j+1}$ prior to augmentation. That is, the block size doubles at step $j+1$ only, and then returns to the ambient block size at the following


step $j+2$. It may be necessary to repeatedly augment the $(j+1)$th Lanczos block-vectors [44]. In contrast, we have shown that the adaptive strategy has the property that an exact breakdown is cured using a fixed number of augmentation vectors. Moreover, to reveal clustered eigenvalues and to eliminate a potential source of slow convergence, we store $\mathcal{P}_j$ and $\mathcal{Q}_j$ and maintain duality (see §4). We have found the adaptive block scheme to be a viable alternative to look-ahead strategies here.

4 Maintaining Semi-Duality

In this section we present a form of limited rebiorthogonalization that is more efficient than the Lanczos algorithm with full rebiorthogonalization described in §2.3. This method extends the block Hermitian Lanczos algorithm with partial reorthogonalization to the non-Hermitian case [26]. Instead of maintaining full duality (§2.3), only semi-duality is maintained at each iteration, i.e., for $j \ge 1$,
$$d_{j+1} = \max\left( \frac{\|\mathcal{P}_j^T Q_{j+1}\|_1}{\|\mathcal{P}_j\|_1 \|Q_{j+1}\|_1},\ \frac{\|\mathcal{Q}_j^T P_{j+1}\|_1}{\|\mathcal{Q}_j\|_1 \|P_{j+1}\|_1} \right) \le \sqrt{\epsilon}, \tag{18}$$
where $\epsilon$ is the roundoff error unit. This generalizes the definition of semi-duality for the unblocked Lanczos algorithm [16].

We will show that semi-duality requires less computation and data transfer to maintain than full duality. In particular, $\mathcal{P}_j$ and $\mathcal{Q}_j$ are only accessed at certain iterations. In §4.1 we show how to monitor the loss of numerical duality without significantly increasing the number of floating point operations in the Lanczos recurrences. In §4.2 we show how best to correct the loss of duality.

4.1 Monitoring the Loss of Duality

When the Lanczos algorithm is implemented in finite precision arithmetic, the computed quantities can be modeled by perturbed three term recurrences:
$$B_{j+1} P_{j+1}^T = P_j^T A - A_j P_j^T - C_j P_{j-1}^T - F_j^T \tag{19}$$
and
$$Q_{j+1} C_{j+1} = A Q_j - Q_j A_j - Q_{j-1} B_j - G_j, \tag{20}$$
where $F_j$ and $G_j$ represent the roundoff error introduced at iteration $j$. By applying the standard model of the rounding errors committed in floating point arithmetic [62], it can be shown that to first order in roundoff errors, there holds
$$\|F_j\|_F \le u \left( \|A\|_1 \|P_j\|_1 + \|A_j\|_1 \|P_j\|_1 + \|C_j\|_1 \|P_{j-1}\|_1 \right),$$
$$\|G_j\|_F \le u \left( \|A\|_1 \|Q_j\|_1 + \|A_j\|_1 \|Q_j\|_1 + \|B_j\|_1 \|Q_{j-1}\|_1 \right),$$
where $u$ is a constant multiple of the roundoff error unit $\epsilon$. The governing equations for the computed quantities are
$$\mathcal{P}_j^T A = T_j \mathcal{P}_j^T + E_j B_{j+1} P_{j+1}^T + \mathcal{F}_j^T \tag{21}$$
and

$$A \mathcal{Q}_j = \mathcal{Q}_j T_j + Q_{j+1} C_{j+1} E_j^T + \mathcal{G}_j, \tag{22}$$
where the matrices $\mathcal{F}_j = [\,F_1, F_2, \ldots, F_j\,]$ and $\mathcal{G}_j = [\,G_1, G_2, \ldots, G_j\,]$ are such that
$$\max(\|\mathcal{F}_j\|_F, \|\mathcal{G}_j\|_F) \le u \left( \|A\|_1 + \|T_j\|_1 \right) \max(\|\mathcal{P}_j\|_F, \|\mathcal{Q}_j\|_F). \tag{23}$$
A detailed analysis for the unblocked case can be found in [2, 16].

Now we use this model of rounding errors in the Lanczos process to quantify the propagation of the loss of duality from iteration to iteration. The duality of the $(j+1)$th Lanczos vectors to the previous Lanczos vectors can be measured using the short vectors
$$X_j = \mathcal{P}_j^T Q_{j+1} \quad\text{and}\quad Y_j = P_{j+1}^T \mathcal{Q}_j.$$
In the following, we show that these vectors satisfy perturbed three term recurrences which we can use to efficiently monitor the duality loss.

The recurrence for $X_j$ is derived as follows. Note that
$$\mathcal{P}_j^T Q_j = \begin{bmatrix} X_{j-1} \\ 0 \end{bmatrix} + E_j. \tag{24}$$
Let $W_{ij} = P_i^T Q_j$. Multiply (20) by $\mathcal{P}_j^T$ on the left, substitute in (21), and there appears
$$X_j C_{j+1} = T_j \mathcal{P}_j^T Q_j - \mathcal{P}_j^T Q_j A_j - \mathcal{P}_j^T Q_{j-1} B_j + E_j B_{j+1} W_{j+1,j} + \mathcal{F}_j^T Q_j - \mathcal{P}_j^T G_j. \tag{25}$$
Substitute (24) above and equation (3), the definition of $T_j$, and simplify to find
$$T_j \mathcal{P}_j^T Q_j - \mathcal{P}_j^T Q_j A_j = T_j \begin{bmatrix} X_{j-1} \\ 0 \end{bmatrix} - \begin{bmatrix} X_{j-1} \\ 0 \end{bmatrix} A_j + E_{j-1} B_j. \tag{26}$$
In addition, we have the identity
$$\mathcal{P}_j^T Q_{j-1} = \begin{bmatrix} X_{j-2} \\ 0 \\ 0 \end{bmatrix} + E_{j-1} + E_j W_{j,j-1}. \tag{27}$$
Substituting equations (26) and (27) into equation (25) finally yields
$$X_j C_{j+1} = T_j \begin{bmatrix} X_{j-1} \\ 0 \end{bmatrix} - \begin{bmatrix} X_{j-1} \\ 0 \end{bmatrix} A_j - \begin{bmatrix} X_{j-2} \\ 0 \\ W_{j,j-1} \end{bmatrix} B_j + E_j B_{j+1} W_{j+1,j} + O(u_j), \tag{28}$$
where $O(u_j)$ represents the local rounding error term $\mathcal{F}_j^T Q_j - \mathcal{P}_j^T G_j$ and
$$u_j = u \left( \|T_j\|_1 + \|A\|_1 \right) \max(\|\mathcal{P}_j\|_F, \|\mathcal{Q}_j\|_F).$$


The similar analysis of the left Lanczos vectors yields
$$B_{j+1} Y_j = [\,Y_{j-1},\ 0\,] T_j - A_j [\,Y_{j-1},\ 0\,] - C_j [\,Y_{j-2},\ 0,\ W_{j-1,j}\,] + W_{j,j+1} C_{j+1} E_j^T + O(u_j). \tag{29}$$
Equations (28) and (29) model the propagation of the loss of the numerical duality among Lanczos vectors from iteration to iteration. The following algorithm implements these recurrence relations to monitor the duality loss. Note that the scalar parameter $\hat{d}_{j+1}$ is our measure of the duality. When $\hat{d}_{j+1} > \sqrt{\epsilon}$, TSMGS is invoked to recover duality as described in the next section.

To economize on storage there is a subtle change of notation in the following monitoring algorithm. At Lanczos iteration $j$, the vectors $X_{j-1}$, $X_j$ and $X_{j+1}$ are denoted $X_1$, $X_2$ and $X_3$, and the previous $X_k$ are not stored. Similar conventions apply to $\{Y_i\}$ and $\{W_{i,k}\}$.

Algorithm for monitoring the loss of duality

Initially, when $j = 1$, we set $X_1 = 0$, $Y_1 = 0$, $d_1 = u$, compute $X_2 = P_1^T Q_2$, $Y_2 = P_2^T Q_1$ and let $W_1^{(l)} = Y_2$, $W_1^{(r)} = X_2$. When $j > 1$:

1. $W_2^{(l)} = P_{j+1}^T Q_j$
2. $X_3 = T_j \begin{bmatrix} X_2 \\ 0 \end{bmatrix} - \begin{bmatrix} X_2 \\ 0 \end{bmatrix} A_j - \begin{bmatrix} X_1 \\ 0 \\ W_1^{(l)} \end{bmatrix} B_j + E_j B_{j+1} W_2^{(l)}$
3. $X_3 := (X_3 + F_j) C_{j+1}^{-1}$
4. $X_1 = \begin{bmatrix} X_2 \\ 0 \end{bmatrix}$, $X_2 = X_3$
5. $W_2^{(r)} = P_j^T Q_{j+1}$
6. $Y_3 = [\,Y_2,\ 0\,] T_j - A_j [\,Y_2,\ 0\,] - C_j [\,Y_1,\ 0,\ W_1^{(r)}\,] + W_2^{(r)} C_{j+1} E_j^T$
7. $Y_3 := B_{j+1}^{-1} (Y_3 + F_j^T)$
8. $Y_1 = [\,Y_2,\ 0\,]$, $Y_2 = Y_3$
9. $W_1^{(l)} = W_2^{(l)}$, $W_1^{(r)} = W_2^{(r)}$
10. $\hat{d}_{j+1} = \max\left( \|X_2\|_1 / (\|\mathcal{P}_j\|_1 \|Q_{j+1}\|_1),\ \|Y_2\|_1 / (\|\mathcal{Q}_j\|_1 \|P_{j+1}\|_1) \right)$

The matrix $F_j$ is a random matrix scaled to have norm $u_j$ to simulate the roundoff errors in the three term recurrences. The number of floating point operations per iteration of the monitoring algorithm is $2j^2 + O(n)$, where the $2j^2$ is for the multiplications by $T_j$ in steps 2 and 6, and the $n$ comes from the "inner products" of block Lanczos vectors in steps 1 and 5. If the block tridiagonal structure of $T_j$ is taken into account, then the cost is just $O(n)$. Therefore the cost of the monitoring algorithm is not significant, as promised.
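For validation it is useful to compute the exact duality measure of (18) directly from the stored Lanczos vectors, which is precisely the expensive computation the recurrences (28) and (29) avoid. A brute-force sketch (names ours):

```python
import numpy as np

def exact_duality_measure(P_blocks, Q_blocks, P_next, Q_next):
    """d_{j+1} of (18), computed directly (O(n j p^2) work and a pass
    over all stored blocks).  Useful only as a cross-check of the
    monitoring recurrences, e.g. to reproduce the solid line of Figure 1."""
    Pj = np.hstack(P_blocks)      # [P_1, ..., P_j]
    Qj = np.hstack(Q_blocks)      # [Q_1, ..., Q_j]
    n1 = lambda M: np.linalg.norm(M, 1)
    return max(n1(Pj.T @ Q_next) / (n1(Pj) * n1(Q_next)),
               n1(Qj.T @ P_next) / (n1(Qj) * n1(P_next)))
```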


4.2 Correcting the Loss of Duality

When action is required to maintain semi-duality (18), TSMGS (see §2.3) is invoked to rebiorthonormalize, or correct, the candidate Lanczos vectors $P_{j+1}$ and $Q_{j+1}$. Recall from equation (28) that the sequence $\{X_j\}$ satisfies a perturbed three term recurrence. Correcting $Q_{j+1}$ annihilates the $O(\sqrt{\epsilon})$ matrix $X_j$, but at the next Lanczos iteration, $X_{j+1}$ will be a multiple of the nearly $O(\sqrt{\epsilon})$ matrix $X_{j-1}$. Instead, as $Q_{j+1}$ is corrected to maintain semi-duality, we also correct $Q_j$; in this way the duality of the following Lanczos vectors can deteriorate gradually. Similar comments hold for the left Lanczos vectors.

There is a much better way to do this than to apply TSMGS at consecutive iterations to the pairs $P_j$ and $Q_j$ and $P_{j+1}$ and $Q_{j+1}$, respectively. Instead, as the columns of $\mathcal{P}_j$ and $\mathcal{Q}_j$ are transferred from slow storage to the computational unit to correct $P_{j+1}$ and $Q_{j+1}$, the previous Lanczos vectors $P_j$ and $Q_j$ can also be retroactively corrected. This halves the amount of data transfer required.

Retroactive TSMGS: dualize $P_j$, $P_{j+1}$, $Q_j$ and $Q_{j+1}$ against the previous Lanczos

vectors in place.

for $i = 1, 2, \ldots, j-1$
    $P_j := P_j - P_i (Q_i^T P_j)$
    $P_{j+1} := P_{j+1} - P_i (Q_i^T P_{j+1})$
    $Q_j := Q_j - Q_i (P_i^T Q_j)$
    $Q_{j+1} := Q_{j+1} - Q_i (P_i^T Q_{j+1})$
end
$P_{j+1} := P_{j+1} - P_j (Q_j^T P_{j+1})$
$Q_{j+1} := Q_{j+1} - Q_j (P_j^T Q_{j+1})$

We do not update the QR and SVD decompositions computed in steps 4.3 and 4.5 after retroactive TSMGS, for the same technical reasons discussed in §6.3 of [16] for the unblocked Lanczos algorithm.
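A sketch of retroactive TSMGS in the same NumPy style as before; the point is that each previous block pair is loaded once and used to correct both trailing pairs:

```python
import numpy as np

def retroactive_tsmgs(P_blocks, Q_blocks):
    """Correct P_j, P_{j+1}, Q_j, Q_{j+1} (the last two entries of each
    list) against all earlier blocks in a single streaming pass."""
    Pj, Pn = P_blocks[-2], P_blocks[-1]
    Qj, Qn = Q_blocks[-2], Q_blocks[-1]
    for Pi, Qi in zip(P_blocks[:-2], Q_blocks[:-2]):
        Pj -= Pi @ (Qi.T @ Pj)     # each stored block is read once ...
        Pn -= Pi @ (Qi.T @ Pn)     # ... and used to correct both pairs
        Qj -= Qi @ (Pi.T @ Qj)
        Qn -= Qi @ (Pi.T @ Qn)
    Pn -= Pj @ (Qj.T @ Pn)
    Qn -= Qj @ (Pj.T @ Qn)
```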

5 The ABLE Method

In this section we present in summary form an Adaptive Block Lanczos method for Eigenproblems. We call it the ABLE method. Implementation issues affecting performance are discussed in §5.2. Software development is reviewed in §5.3.

5.1 The ABLE Method

The ABLE method below incorporates adaptive blocking into the basic block Lanczos algorithm of §2.1 and maintains the local duality and semi-duality of the Lanczos vectors.


Adaptive Block Non-Hermitian Lanczos (ABLE) Method

1. Choose starting vectors $P_1$ and $Q_1$ so that $P_1^T Q_1 = I$
2. $R = (P_1^T A)^T$
3. $S = A Q_1$
4. For $j = 1, 2, \ldots$ until convergence:
   1. $A_j = P_j^T S$
   2. Compute the eigen-decomposition of $T_j$, and test for convergence
   3. Find the largest order $\delta$ of the clustering of converged Rayleigh-Ritz values
   4. $R := R - P_j A_j^T$
   5. $S := S - Q_j A_j$
   6. Local duality: $R := R - P_j (Q_j^T R)$; $S := S - Q_j (P_j^T S)$
   7. Compute the QR decompositions: $R = P_{j+1} B_{j+1}^T$, $S = Q_{j+1} C_{j+1}$
   8. If $R$ or $S$ (or both) is rank deficient, apply TSMGS to dualize $P_{j+1}$ and $Q_{j+1}$ against the previous Lanczos vectors
   9. Compute the singular value decomposition: $P_{j+1}^T Q_{j+1} = U \Sigma V^H$
   10. Increase the blocksize if $\sigma_{\min}(\Sigma) < \tau_b$ and/or $\delta > p_j$
   11. $B_{j+1} := B_{j+1} U \Sigma^{1/2}$
   12. $C_{j+1} := \Sigma^{1/2} V^H C_{j+1}$
   13. $P_{j+1} := P_{j+1} \bar{U} \Sigma^{-1/2}$
   14. $Q_{j+1} := Q_{j+1} V \Sigma^{-1/2}$
   15. Monitor the loss of duality and maintain semi-duality
   16. $R = (P_{j+1}^T A - C_{j+1} P_j^T)^T$
   17. $S = A Q_{j+1} - Q_j B_{j+1}$

Certain steps refer to techniques and algorithms developed in earlier sections of this paper:

• At step 4.2, we suggest the use of equation (15) in §2.2 as the stopping criterion. At the end of a Lanczos run, we compute the "exact" residual norms $\|s^H\|_2$ and $\|r\|_2$ corresponding to the converged Rayleigh-Ritz triplets $(\lambda, y^H, x)$. See equations (7) and (8) in §2.2.
• At step 4.3, equation (17) is used to compute the order of the largest cluster, as described in §3.2.
• For step 4.8, see §3.3 for an explanation.
• At step 4.10, $\tau_b$ is a threshold for breakdown, and $\sigma_{\min}(\Sigma)$ is the smallest singular value of the matrix $P_{j+1}^T Q_{j+1}$. If there is (near) breakdown and/or the order of the largest cluster of the converged Rayleigh-Ritz values is larger than the blocksize, then the blocks are augmented as described in §3.1.
• Algorithms for monitoring the loss of duality and maintaining semi-duality at step 4.15 are described in §4.1 and §4.2.

5.2 Implementation Issues

We now discuss a number of implementation issues which affect the performance of the ABLE method.

Matrix-vector products: The non-Hermitian Lanczos algorithm is also called the two-sided Lanczos algorithm because both the operations $X^T A$ and $AX$ are required at each iteration. As a rule, $A$ is referenced only to compute these matrix-vector products. Because of this feature, the algorithm is well suited for large sparse matrices or large structured dense matrices for which matrix-vector products can be computed cheaply. The efficient implementation of these products depends on the data structure and storage format for the matrix $A$ and for the Lanczos vectors.

Storage requirements and out-of-core computations: If no Lanczos vectors are saved, the three term recurrences can be implemented using only 6 block vectors of length $n$. To maintain the semi-duality of the computed Lanczos vectors $\mathcal{P}_j$ and $\mathcal{Q}_j$, it is necessary to store these vectors in core or out-of-core memory. This consumes a significant amount of memory. The user must be conscious of how much memory is needed for each application.

For very large matrices it may be best to store the Lanczos vectors out of core. After each Lanczos iteration, save the current Lanczos vectors to an auxiliary storage device. The Lanczos vectors are recalled in the procedure TSMGS for rebiorthogonalization, and when the converged Rayleigh-Ritz vectors are computed at the end of a Lanczos run.

A block Lanczos algorithm is ideal for application codes that represent $A$ out-of-core. The main cost of a Lanczos iteration, with or without blocks, is accessing $A$. Block algorithms compute the matrix block vector products with only one pass over the data structure defining $A$, with a corresponding savings of work.

Computational bottlenecks: The most time consuming steps in a Lanczos run are to

1. apply the matrix $A$ (from the left and the right),
2. apply retroactive TSMGS to maintain semi-duality,
3. solve the eigenproblem for the block tridiagonal matrix $T_j$.

Items 1 and 2 have been addressed already (see the above and §4.2). We do not need to solve the eigenproblem for $T_j$ at each Lanczos iteration. A way to reduce this cost is to solve the eigenvalue problem for $T_j$ only after a correction iteration has been made to maintain semi-duality. This technique utilizes the connection between the loss of duality and convergence [39, 43, 2].

5.3 An Algorithm Template for the ABLE Method

An algorithm template for the ABLE method is in progress. A prototype of the template has been developed. It is part of a project to develop a template library and environment for solving different types of eigenvalue and singular value problems [4]. In the prototype of the ABLE template, a user has the option to implement the ABLE method at different levels. These levels refer to the Lanczos algorithm with or without rebiorthogonalization, maintaining full duality or semi-duality, and treating or not treating breakdown and/or clustering. Implementations of each level of the method are described in detail in the template. Memory storage requirements and floating point operation counts for the different levels of the ABLE implementation are given. Finally, expert advice and troubleshooting tips are given to users in a question-and-answer format and illustrated by numerical examples. For a copy of the algorithm template, the reader can send email to the authors.

6 A Spectral Transformation ABLE Method

In this section we briefly discuss how to use the ABLE method to compute some eigenvalues of the generalized eigenvalue problem
$$K x = \lambda M x \tag{30}$$
nearest an arbitrary complex number $\sigma$. We assume that $M$ is nonsingular, and that it is feasible to solve linear systems of equations with coefficient matrices $K - \sigma M$ and $M$. The reward for solving these linear systems of equations is the rapid convergence of the Lanczos algorithm. In §7 we apply the ABLE method to such a generalized eigenvalue problem arising in magneto-hydro-dynamics.

We apply a popular shift-and-invert strategy to $(K, M)$ with shift $\sigma$ (Ericsson and Ruhe [19]). In this approach, the ABLE method is applied with
$$A = (K - \sigma M)^{-1} M. \tag{31}$$
The eigenvalues, $\theta$, of $A$ are of the form
$$\theta = \frac{1}{\lambda - \sigma}.$$


The outer eigenvalues of $A$ are now the eigenvalues of $(K, M)$ nearest to $\sigma$. This spectral transformation also generally improves the separation of the eigenvalues of interest from the remaining eigenvalues of $(K, M)$, a very desirable property.

When we apply the ABLE method to the matrix $A = (K - \sigma M)^{-1} M$, the governing equations become
$$\hat{\mathcal{P}}_j^T M (K - \sigma M)^{-1} = T_j \hat{\mathcal{P}}_j^T + E_j B_{j+1} \hat{P}_{j+1}^T, \tag{32}$$
$$(K - \sigma M)^{-1} M \mathcal{Q}_j = \mathcal{Q}_j T_j + Q_{j+1} C_{j+1} E_j^T, \tag{33}$$
where $\hat{\mathcal{P}}_j^T = \mathcal{P}_j^T M^{-1}$ and $\hat{P}_{j+1}^T = P_{j+1}^T M^{-1}$, and $\hat{\mathcal{P}}_j$ and $\mathcal{Q}_j$ are $M$-biorthogonal, i.e., $\hat{\mathcal{P}}_j^T M \mathcal{Q}_j = I$. Therefore, if $(\theta, w^H, z)$ is an eigentriplet of $T_j$, then from the above governing equations (32) and (33), the triplet
$$\left( \sigma + \frac{1}{\theta},\ w^H \mathcal{P}_j^T M^{-1},\ \mathcal{Q}_j z \right)$$
is an approximate eigentriplet of the pencil $(K, M)$.

The matrix-vector products $Y = [(K - \sigma M)^{-1} M] X$ and $Z^T = X^T [(K - \sigma M)^{-1} M]$ required in the inner loop of the algorithm can be performed by first computing the LU factorization $K - \sigma M = LU$, and then solving the linear systems of equations $LUY = MX$ and $Z^T = X^T (LU)^{-1} M$ for $Y$ and $Z^T$, respectively. If $K$ and $M$ are real and the shift $\sigma$ is complex, one can still keep the Lanczos procedure in real arithmetic using a strategy proposed by Parlett and Saad [45].

In many applications, $M$ is symmetric positive definite. In this case, one can avoid factoring $M$ explicitly by preserving $M$-biorthogonality among the Lanczos vectors [19, 43, 12, 26]. Numerical methods for the case in which $M$ is symmetric indefinite are discussed in [26, 37].
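For sparse $K$ and $M$, the two products can be packaged around a single LU factorization. A SciPy sketch (function names are ours; `splu` requires CSC format):

```python
import scipy.sparse as sp
from scipy.sparse.linalg import splu

def shift_invert_pair(K, M, sigma):
    """Return apply_A and apply_At for A = (K - sigma*M)^{-1} M, sharing
    one sparse LU factorization of K - sigma*M."""
    lu = splu(sp.csc_matrix(K - sigma * M))
    def apply_A(X):                    # Y = (K - sigma M)^{-1} (M X)
        return lu.solve(M @ X)
    def apply_At(X):                   # Z with Z^T = X^T (K - sigma M)^{-1} M
        return M.T @ lu.solve(X, trans='T')
    return apply_A, apply_At
```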

7 Summary of Numerical Examples

This section summarizes our numerical experience with the ABLE method. We use test matrices and matrix eigenvalue problems from real applications to demonstrate the major features of the ABLE method. Each numerical example illustrates a property of the ABLE method. The ABLE method has been implemented in Matlab 4.2 with sparse matrix computation functions. All numerical experiments were performed on a SUN Sparc 10 workstation with IEEE double precision floating point arithmetic. The tolerance value $\tau_c$ for the stopping criterion (15) is set to $10^{-8}$. The clustering threshold in (17) is $\eta = 10^{-6}$. The breakdown threshold is $\tau_b = 10^{-8}$.

Example 1. The block algorithm accelerates convergence in the presence of multiple and clustered eigenvalues. When the desired eigenvalues are known in advance to be multiple or clustered, we should initially choose the blocksize as the expected multiplicity or cluster order. For example, the largest eigenvalue of the 656 × 656 Chuck matrix has multiplicity 2 [13]. If we use the unblocked ABLE method, then at iteration 20, the converged Rayleigh-Ritz values,
5.502378378875370e+00
1.593971696766128e+00
approximate the two largest distinct eigenvalues. But the multiplicity is not yet revealed. However, if we use the ABLE method with initial blocksize 2, then at iteration 7 the converged Rayleigh-Ritz values are
5.502378378347202e+00
5.502378378869873e+00
Each computed Rayleigh-Ritz value agrees to 10 to 12 decimal digits with the one computed by the dense QR algorithm.

Example 2. Full duality is very expensive to maintain in terms of floating point operations and memory access. Based on our experience, maintaining semi-duality is a reliable and much less expensive alternative. Our example is a 2500 × 2500 block tridiagonal coefficient matrix obtained by discretizing the two-dimensional model convection-diffusion differential equation
$$-\Delta u + 2 p_1 u_x + 2 p_2 u_y - p_3 u = f \quad\text{in } \Omega, \qquad u = 0 \quad\text{on } \partial\Omega,$$
using finite differences, where $\Omega$ is the unit square $\{(x, y) \in \mathbf{R}^2;\ 0 \le x, y \le 1\}$ [18]. The eigenvalues of the coefficient matrix can be expressed analytically in terms of the parameters $p_1$, $p_2$ and $p_3$. In our test run, we choose $p_1 = 0.5$, $p_2 = 2$ and $p_3 = 1$. With full duality, at iteration 132 two extreme eigenvalues have converged. If we use the ABLE method with semi-duality, at iteration 139 the same two extreme eigenvalues have converged to the same accuracy. The difference is that only 8 corrections of duality loss are invoked to maintain semi-duality, compared to 132 corrections for full duality.

In Figure 1 the solid and dotted lines display the exact and estimated duality of the computed Lanczos vectors, and the "+" points concern breakdown and are the smallest singular values of $P_{j+1}^T Q_{j+1}$. The solid line plots
$$d_{j+1} = \max\left( \frac{\|\mathcal{P}_j^T Q_{j+1}\|_1}{\|\mathcal{P}_j\|_1 \|Q_{j+1}\|_1},\ \frac{\|\mathcal{Q}_j^T P_{j+1}\|_1}{\|\mathcal{Q}_j\|_1 \|P_{j+1}\|_1} \right)$$
for $j = 1, 2, \ldots, 132$. Each sharp decrease corresponds to a correction. The dotted line plots the estimate, $\hat{d}_{j+1}$, of this quantity computed by the monitoring algorithm of §4.1. Correction iterations are taken when the dotted line increases to the threshold $\sqrt{\epsilon}$, where $\epsilon$ denotes the machine precision. The observation that the solid line is below the dotted line indicates that the monitoring algorithm is prudent.



Figure 1: The exact (solid line) and estimated (dash-dot line) duality of the Lanczos vectors; "+" are the smallest singular values of $P_{j+1}^T Q_{j+1}$.

A near breakdown occurs if the smallest singular value of $P_{j+1}^T Q_{j+1}$ is less than the breakdown threshold, but this is not the case in this example.
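For reference, one plausible construction of a 2500 × 2500 test matrix of this type (centered differences on a 50 × 50 interior grid, with the system scaled by $h^2$; the exact scaling used in [18] may differ) is:

```python
import numpy as np
import scipy.sparse as sp

def convection_diffusion(m, p1=0.5, p2=2.0, p3=1.0):
    """A plausible centered finite-difference discretization (scaled by
    h^2) of -Lap(u) + 2 p1 u_x + 2 p2 u_y - p3 u on the unit square with
    u = 0 on the boundary; m interior grid points per side, so m = 50
    gives a 2500 x 2500 block tridiagonal matrix as in Example 2."""
    h = 1.0 / (m + 1)
    e = np.ones(m - 1)
    # 1-D operators: second difference plus centered convection term
    Tx = sp.diags([-(1 + p1 * h) * e, 2 * np.ones(m), (-1 + p1 * h) * e],
                  [-1, 0, 1])
    Ty = sp.diags([-(1 + p2 * h) * e, 2 * np.ones(m), (-1 + p2 * h) * e],
                  [-1, 0, 1])
    I = sp.identity(m)
    return (sp.kron(I, Tx) + sp.kron(Ty, I)
            - p3 * h**2 * sp.identity(m * m)).tocsr()
```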

Example 3. As mentioned before, when we know the multiplicity of the eigenvalues in advance, we should choose the appropriate blocksize. Otherwise the adaptive scheme presented in §3 can dynamically adjust the blocksize to accelerate convergence. This smoothes the convergence behavior to clusters of eigenvalues. For example, we apply the ABLE method with initial blocksize 1 to the 656 × 656 Chuck matrix [13]. At iteration 24, the double eigenvalue is detected and the blocksize is doubled.

Example 4. Exact breakdowns are rare but near breakdowns are not. In general, we can successfully cure the near breakdowns. For example, when the ABLE method is applied to the 882 × 882 Quebec Hydro matrix from the application of numerical methods for power systems simulations [38], four near breakdowns are cured. At iteration 37, the four leading eigenvalues have converged. In a further investigation of this example, we found that the breakdowns are caused solely by the bad balancing of the entries of the matrix $A$. If we balance the matrix first (say, using the balance function available in Matlab), then breakdown does not occur for the balanced matrix. The balancing of a large sparse matrix is a subject of further study.

Example 5. One of the attractive features of the ABLE method is that condition numbers of the approximate eigenvalues can be readily computed at the end of the ABLE method. This makes it possible to detect ill-posed eigenvalue problems. Our example is the 30 by 30 Wilkinson bidiagonal matrix [62, p. 90],
$$A = \begin{bmatrix} 30 & 30 & & & \\ & 29 & 30 & & \\ & & \ddots & \ddots & \\ & & & 2 & 30 \\ & & & & 1 \end{bmatrix}.$$
In the ABLE method with blocksize 1, all the residual errors after 30 iterations indicate convergence, but the Rayleigh-Ritz values do not approximate the exact eigenvalues; see Figure 2. This is understandable since all corresponding condition numbers $\mathrm{cond}(\lambda_i)$ are of the order $10^{11}$ to $10^{13}$. The eigenvalue problem for the Wilkinson matrix is ill-posed and the "converged" Rayleigh-Ritz values are pseudo-spectra.

Figure 2: Spectra (+) and pseudo-spectra (o) of the 30 by 30 Wilkinson bidiagonal matrix.

Example 6. In this example, we apply the spectral transformation ABLE method of §6 to solve the eigenvalue problem of the test matrix Tolosa [10]. The order of the matrix $K$ is 2000, and $M = I$. Our goal is to compute the eigenvalues of $K$ in the upper left end of the spectrum. There are 23 eigenvalues in the region bounded by $-750 < \mathrm{Re}(\lambda) < -650$ and $2200 < \mathrm{Im}(\lambda) < 2400$. It is known that these eigenvalues are the worst conditioned. We choose the shift $\sigma = -750 + 2390i$, the same shift as used in [49]. At iteration 69, all 23 Rayleigh-Ritz values $\lambda_i$ in the specified region have converged. They agree to 9 to 11 decimal digits with the corresponding values computed by the dense QR algorithm. The convergence history of the Rayleigh-Ritz values is shown in Figure 3. (Yes, we computed the eigenvalues of this 2000 by 2000 matrix using the LAPACK driver DGEEV; it took 10 minutes of CPU time on a CRAY 2.) A total of 12 corrections are invoked to maintain the semi-duality.



Figure 3: Number of converged Rayleigh-Ritz values at each iteration of the shift-and-invert spectral transformation ABLE method for the eigenvalue problem of the 2000 by 2000 Tolosa matrix.

Example 7. In this example, we apply the spectral transformation ABLE method to a generalized eigenvalue problem
$$K x = \lambda M x \tag{34}$$
arising from magneto-hydro-dynamics (MHD) [33, 12], where $K$ is non-Hermitian and $M$ is Hermitian positive definite. The interesting part of the spectrum in MHD problems is not the outer part of the spectrum but an internal branch, known as the Alfven spectrum. We need to use a spectral transformation technique to transfer the interesting spectrum to the outer part. In §6 we have outlined a general approach. Now we show the numerical results of this general approach for the MHD test problem.

The MHD test problem we consider here is from [8]. Both $K$ and $M$ are 416 by 416 block tridiagonal matrices with 16 by 16 blocks. To two significant digits, $\|K\|_1 = 3100$ and $\|M\|_1 = 2.50$, but the estimated condition number of $M$ is $5.05 \times 10^9$; $M$ is quite ill-conditioned. The computational task is to calculate the eigenvalues close to the shift $\sigma = -0.3 + 0.65i$ [8].

We ran the unblocked spectral transformation ABLE method. After only 30 iterations, 10 Rayleigh-Ritz values have converged; their accuracy ranges from $10^{-8}$ to $10^{-12}$, compared with the eigenvalues computed by the QZ algorithm. The following table lists the 10 converged Rayleigh-Ritz values $\lambda_i$ and the corresponding left and right residual norms, where
$$\mathrm{Res\text{-}L}_i = \frac{\|y_i^H K - \lambda_i y_i^H M\|_2}{\max(\|K\|_1, \|M\|_1)}, \qquad \mathrm{Res\text{-}R}_i = \frac{\|K x_i - \lambda_i M x_i\|_2}{\max(\|K\|_1, \|M\|_1)},$$
and $(y_i^H, x_i)$ are the normalized approximate left and right eigenvectors of $(K, M)$ (i.e., $\|y_i^H\|_2 = \|x_i\|_2 = 1$).

  i   lambda_i                                            Res-L_i     Res-R_i
  1   -2.940037576164888e-01 + 5.871546479737660e-01i    3.82e-12    6.59e-11
  2   -2.381814888866186e-01 + 5.914958688660595e-01i    2.66e-11    4.46e-11
  3   -3.465530921874517e-01 + 5.468970786348115e-01i    1.23e-11    2.76e-10
  4   -3.780991425908282e-01 + 5.071655448857557e-01i    6.18e-11    3.98e-10
  5   -2.410301845692590e-01 + 5.238090347100917e-01i    9.81e-11    4.32e-10
  6   -1.989292783177918e-01 + 5.900118523050361e-01i    5.34e-11    8.55e-11
  7   -2.045328538082208e-01 + 5.678048139549289e-01i    5.97e-11    1.12e-10
  8   -3.092857309948118e-01 + 4.687528684165645e-01i    5.23e-09    2.59e-08
  9   -1.749780170739634e-01 + 5.920044440850396e-01i    5.62e-10    9.58e-10
 10   -1.573456542107287e-01 + 5.976613227972810e-01i    5.98e-09    9.63e-09

In addition, 6 other Rayleigh-Ritz values range in accuracy from $10^{-5}$ to $10^{-7}$. Figure 4 shows the Alfven spectrum computed by the QZ algorithm (+) and the Rayleigh-Ritz values (o) computed by the spectral transformation ABLE method. Three corrections to maintain semi-duality were taken, at iterations 13, 20 and 26. The convergence history of the Rayleigh-Ritz values is shown in the following table, where $j$ is the Lanczos iteration and $k$ is the number of converged Rayleigh-Ritz values at the $j$th iteration.

  j   <=14   15-18   19   20-22   23-24   25-26   27-28   29   30
  k    0      1       2    3       4       7       8       9   10

Moreover, at Lanczos iteration 45, the Alfven branch of the spectrum of the MHD test problem is revealed: 20 Rayleigh-Ritz values have converged, and 12 other Rayleigh-Ritz values range in accuracy from $10^{-7}$ up to $10^{-5}$. See Figure 4. No copies of eigenvalues are observed.

References

[1] W. E. Arnoldi. The principle of minimized iteration in the solution of the matrix eigenproblem. Quart. Appl. Math., 9:17-29, 1951.

[2] Z. Bai. Error analysis of the Lanczos algorithm for the nonsymmetric eigenvalue problem. Math. Comp., 62:209-226, 1994.

[3] Z. Bai. A spectral transformation block nonsymmetric Lanczos algorithm for solving sparse nonsymmetric eigenvalue problems. In J. G. Lewis, editor, Proceedings of the Fifth SIAM Conference on Applied Linear Algebra, pages 307-311. SIAM, 1994.

[4] Z. Bai, D. Day, J. Demmel, J. Dongarra, M. Gu, A. Ruhe, and H. van der Vorst. Templates for linear algebra problems. In J. Van Leeuwen, editor, Computer Science Today: Recent Trends and Developments. Springer-Verlag, 1995. Lecture Notes in Computer Science, Vol. 1000.



Figure 4: The Alfven spectra of the MHD test problem. "+" denotes the eigenvalues computed by the QZ algorithm. "o" are the Rayleigh-Ritz values computed by the spectral transformation ABLE method. "*" is the shift $\sigma = -0.3 + 0.65i$. The figure at the top displays the results after 30 iterations. The figure at the bottom gives the results after 45 iterations.


[5] Z. Bai and G. W. Stewart. SRRIT: a Fortran subroutine to calculate the dominant invariant subspace of a nonsymmetric matrix. Computer Science Dept. Report TR 2908, University of Maryland, April 1992.

[6] D. Boley, S. Elhay, G. H. Golub, and M. H. Gutknecht. Nonsymmetric Lanczos and finding orthogonal polynomials associated with indefinite weights. Numerical Analysis Report NA-90-09, Stanford University, 1990.

[7] D. Boley and G. Golub. The nonsymmetric Lanczos algorithm and controllability. Systems and Control Lett., 16:97-105, 1991.

[8] J. G. L. Booten, H. A. van der Vorst, P. M. Meijer, and H. J. J. te Riele. A preconditioned Jacobi-Davidson method for solving large generalized eigenvalue problems. Technical Report NM-R9414, Dept. of Num. Math., CWI, Amsterdam, the Netherlands, 1994.

[9] F. Chatelin. Eigenvalues of Matrices. Wiley, Chichester, England, 1993. A translation with modification of a French version published in 1988.

[10] F. Chatelin and S. Godet-Thobie. Stability analysis and nonnormal matrices in aeronautical industries. 1991. Proceedings of the 13th IMACS World Congress on Computation and Applied Mathematics.

[11] J. Cullum and W. E. Donath. A block Lanczos algorithm for computing the q algebraically largest eigenvalues and a corresponding eigenspace of large sparse real symmetric matrices. In Proc. of the 1974 IEEE Conference on Decision and Control. Phoenix, 1974.

[12] J. Cullum, W. Kerner, and R. Willoughby. A generalized nonsymmetric Lanczos procedure. Comput. Phys. Commun., 53:19-48, 1989.

[13] J. Cullum and R. Willoughby. A practical procedure for computing eigenvalues of large sparse nonsymmetric matrices. In J. Cullum and R. Willoughby, editors, Large Scale Eigenvalue Problems. North-Holland, Amsterdam, 1986.

[14] J. Cullum and R. A. Willoughby. Lanczos Algorithms for Large Symmetric Eigenvalue Computations. Birkhauser, Basel, 1985. Vol. 1: Theory, Vol. 2: Programs.

[15] E. R. Davidson. The iterative calculation of a few of the lowest eigenvalues and corresponding eigenvectors of large real-symmetric matrices. J. Comp. Phys., 17:87-94, 1975.

[16] D. Day. Semi-duality in the two-sided Lanczos algorithm. PhD thesis, UC Berkeley, 1993.

[17] I. Duff and J. Scott. Computing selected eigenvalues of sparse nonsymmetric matrices using subspace iteration. ACM Trans. on Math. Soft., 19:137-159, 1993.

[18] H. C. Elman and R. L. Streit. Polynomial iteration for nonsymmetric indefinite linear systems, volume 1230 of Lecture Notes in Math. Springer-Verlag, Berlin, 1984.
[19] T. Ericsson and A. Ruhe. The spectral transformation Lanczos method for the numerical solution of large sparse generalized symmetric eigenvalue problems. Math. Comp., 35:1251–1268, 1980.
[20] R. W. Freund, M. H. Gutknecht, and N. M. Nachtigal. An implementation of the look-ahead Lanczos algorithm for non-Hermitian matrices. SIAM J. Sci. Comput., 14:137–158, 1993.
[21] R. W. Freund and N. M. Nachtigal. QMR: a quasi-minimal residual method for non-Hermitian linear systems. Numer. Math., 60:315–339, 1991.
[22] G. Golub and R. Underwood. The block Lanczos method for computing eigenvalues. In J. Rice, editor, Mathematical Software III. Academic Press, 1977.
[23] G. Golub and C. Van Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, MD, 2nd edition, 1989.
[24] W. B. Gragg. Matrix interpretations and applications of the continued fraction algorithm. Rocky Mountain J. of Math., 5:213–225, 1974.
[25] A. Greenbaum and Z. Strakos. Predicting the behavior of finite precision Lanczos and conjugate gradient computations. SIAM J. Mat. Anal. Appl., 13(1):121–137, 1992.
[26] R. Grimes, J. Lewis, and H. Simon. A shifted block Lanczos algorithm for solving sparse symmetric generalized eigenproblems. SIAM J. Mat. Anal. Appl., 15(1):228–272, 1994.
[27] M. H. Gutknecht. A completed theory of the unsymmetric Lanczos process and related algorithms, Parts I and II. SIAM J. Mat. Anal. Appl., 13:594–639, 1992 (Part I) and 15:15–58, 1994 (Part II).
[28] Z. Jia. Some Numerical Methods for Large Unsymmetric Eigenproblems. PhD thesis, Universität Bielefeld, Bielefeld, Germany, 1994.
[29] Z. Jia. Convergence of generalized Lanczos methods for large unsymmetric eigenproblems. SIAM J. Mat. Anal. Appl., 16:843–862, 1995.
[30] M. T. Jones and M. L. Patrick. The use of Lanczos's method to solve the large generalized symmetric definite eigenvalue problem. ICASE Report TR-89-67, NASA Langley Research Center, Hampton, VA, 1989.
[31] W. Kahan, B. N. Parlett, and E. Jiang. Residual bounds on approximate eigensystems of nonnormal matrices. SIAM J. Numer. Anal., 19:470–484, 1982.

[32] T. Kato. Perturbation Theory for Linear Operators. Springer-Verlag, Berlin, 2nd edition, 1980.
[33] W. Kerner. Large-scale complex eigenvalue problems. J. of Comp. Phys., 85:1–85, 1989.
[34] S. Kim and A. Chronopoulos. An efficient nonsymmetric Lanczos method on parallel vector computers. J. Comp. and Appl. Math., 42:357–374, 1992.
[35] C. Lanczos. An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. J. Res. Natl. Bur. Stand., 45:255–282, 1950.
[36] A. Mahajan, E. H. Dowell, and D. Bliss. Eigenvalue calculation procedure for an Euler/Navier-Stokes solver with applications to flows over airfoils. J. Comp. Phys., 97:398–413, 1991.
[37] K. Meerbergen and A. Spence. Implicitly restarted Arnoldi with purification for the shift-invert transformation. Report TW 225, Dept. of Comp. Sci., Katholieke Universiteit Leuven, Belgium, 1995.
[38] D. Ndereyimana. Private communication, 1994.
[39] C. Paige. The computation of eigenvalues and eigenvectors of very large sparse matrices. PhD thesis, London University, London, England, 1971.
[40] C. Paige. Error analysis of the Lanczos algorithm for tridiagonalizing a symmetric matrix. J. Inst. Math. Appl., 18:341–349, 1976.
[41] C. Paige and M. Saunders. Solution of sparse indefinite systems of linear equations. SIAM J. Num. Anal., 12:617–629, 1975.
[42] B. Parlett. The Rayleigh quotient iteration and some generalizations for nonnormal matrices. Math. Comp., 28:679–693, 1974.
[43] B. Parlett. The Symmetric Eigenvalue Problem. Prentice Hall, Englewood Cliffs, NJ, 1980.
[44] B. Parlett. Reduction to tridiagonal form and minimal realizations. SIAM J. Mat. Anal. Appl., 13(2):567–593, 1992.
[45] B. Parlett and Y. Saad. Complex shift and invert strategies for real matrices. Lin. Alg. Appl., 88/89:575–595, 1987.
[46] B. Parlett and D. S. Scott. The Lanczos algorithm with selective orthogonalization. Math. Comp., 33:217–238, 1979.
[47] B. N. Parlett, D. R. Taylor, and Z. A. Liu. A look-ahead Lanczos algorithm for unsymmetric matrices. Math. Comp., 44:105–124, 1985.

[48] A. Ruhe. Implementation aspects of band Lanczos algorithms for computation of eigenvalues of large sparse symmetric matrices. Math. Comp., 33:680–687, 1979.
[49] A. Ruhe. Rational Krylov, a practical algorithm for large sparse nonsymmetric matrix pencils. Computer Science Division Report UCB/CSD-95-871, University of California, Berkeley, CA, April 1995.
[50] Y. Saad. On the rates of convergence of the Lanczos and block Lanczos methods. SIAM J. Numer. Anal., 17:687–706, 1980.
[51] Y. Saad. Variations on Arnoldi's method for computing eigenelements of large unsymmetric matrices. Lin. Alg. Appl., 34:269–295, 1980.
[52] Y. Saad. Numerical Methods for Large Eigenvalue Problems. Halsted Press, Div. of John Wiley & Sons, Inc., New York, 1992.
[53] M. Sadkane. Block-Arnoldi and Davidson methods for unsymmetric large eigenvalue problems. Numer. Math., 64:195–211, 1993.
[54] M. Sadkane. A block Arnoldi-Chebyshev method for computing the leading eigenpairs of large sparse unsymmetric matrices. Numer. Math., 64:181–193, 1993.
[55] D. Scott. Block Lanczos software for symmetric eigenvalue problems. Tech. Rep. ORNL/CSD-48, Oak Ridge National Laboratory, 1979.
[56] H. Simon. The Lanczos algorithm with partial reorthogonalization. Math. Comp., 42:115–142, 1984.
[57] G. L. G. Sleijpen and H. A. van der Vorst. A Jacobi-Davidson iteration method for linear eigenvalue problems. Technical Report 856, Dept. of Math., University of Utrecht, 1994.
[58] D. Sorensen. Implicit application of polynomial filters in a k-step Arnoldi method. SIAM J. Mat. Anal. Appl., 13(1):357–385, 1992.
[59] G. W. Stewart. Error and perturbation bounds for subspaces associated with certain eigenvalue problems. SIAM Review, 15(4):727–764, 1973.
[60] W. J. Stewart and A. Jennings. Algorithm 570: LOPSI, a simultaneous iteration algorithm for real matrices. ACM Trans. on Math. Soft., 7:230–232, 1981.
[61] L. N. Trefethen. Pseudospectra of matrices. In Proceedings of the 1991 Dundee Numerical Analysis Conference, Dundee, Scotland, June 1991.
[62] J. H. Wilkinson. The Algebraic Eigenvalue Problem. Oxford University Press, Oxford, 1965.
[63] Q. Ye. A breakdown-free variation of the nonsymmetric Lanczos algorithms. Math. Comp., 62:179–207, 1994.