EFFICIENT IMPLEMENTATION OF THE MULTISHIFT QR ALGORITHM FOR THE UNITARY EIGENVALUE PROBLEM

RODEN J. A. DAVID AND DAVID S. WATKINS∗

Abstract. We present an efficient implementation of the multi-shift QR algorithm for computing the eigenvalues of a unitary matrix. The algorithm can perform QR iterations of arbitrary degree, it is conceptually simple, and it is backward stable.

Keywords. unitary matrix, eigenvalue, multi-shift QR algorithm

AMS subject classifications. 65F15, 15A18

∗ Department of Mathematics, Washington State University, Pullman, Washington 99164-3113 (e-mail: {rdavid,watkins}@math.wsu.edu).

1. Introduction. We consider the eigenvalue problem for a unitary matrix U ∈ C^{n×n} that is upper Hessenberg, i.e. u_{ij} = 0 whenever i > j + 1. Without loss of generality we can assume that all of the subdiagonal entries u_{j+1,j} are nonzero. Assuming this, then by a unitary, diagonal similarity transformation we can make them real and positive. Thus we will assume that u_{j+1,j} > 0 for j = 1, . . . , n − 1. Then U can be expressed as a product of matrices of a very simple form [9]:

(1.1)    U = G_1 G_2 \cdots G_{n-1} G_n,

where G_k = diag{I_{k−1}, G̃_k, I_{n−k−1}},

    \tilde G_k = \begin{bmatrix} \gamma_k & \sigma_k \\ \sigma_k & -\bar\gamma_k \end{bmatrix}, \qquad \sigma_k > 0, \qquad |\gamma_k|^2 + \sigma_k^2 = 1,

for k = 1, . . . , n − 1, and G_n = diag{I_{n−1}, γ_n} with |γ_n| = 1. We will refer to the numbers γ_1, . . . , γ_n, σ_1, . . . , σ_{n−1} collectively as Schur parameters. Since these determine U completely, we see that we can store U, in the form of Schur parameters, in O(n) storage space instead of the usual O(n²) for n × n matrices.
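As a concrete illustration of the factored form (our own NumPy sketch, not from the paper; the helper name is hypothetical), the following assembles U from a set of Schur parameters and checks the structure just described:

```python
import numpy as np

def unitary_from_schur(g, s):
    """Assemble U = G1 G2 ... Gn from Schur parameters g (n complex numbers)
    and s (n-1 positive reals).  O(n^2) storage -- for illustration only;
    the whole point of the paper is to avoid ever forming U."""
    n = len(g)
    U = np.eye(n, dtype=complex)
    for k in range(n - 1):
        G = np.eye(n, dtype=complex)
        G[k:k+2, k:k+2] = [[g[k], s[k]],
                           [s[k], -np.conj(g[k])]]
        U = U @ G
    U[:, -1] *= g[-1]          # G_n = diag(I_{n-1}, gamma_n)
    return U

# random Schur parameters with |g_k|^2 + s_k^2 = 1, s_k > 0, |g_n| = 1
rng = np.random.default_rng(0)
n = 6
phi = rng.uniform(0.1, np.pi/2 - 0.1, size=n-1)
g = np.append(np.cos(phi) * np.exp(2j*np.pi*rng.uniform(size=n-1)),
              np.exp(2j*np.pi*rng.uniform()))
s = np.sin(phi)

U = unitary_from_schur(g, s)
assert np.allclose(U.conj().T @ U, np.eye(n))   # unitary
assert np.allclose(np.tril(U, -2), 0)           # upper Hessenberg
assert np.allclose(np.diag(U, -1), s)           # u_{j+1,j} = sigma_j > 0
```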

This being the case, one might reasonably hope to compute the eigenvalues of U in O(n²) work instead of the usual O(n³). It turns out that this can be done, and a number of interesting methods have been proposed. The first was an ingenious method of Rutishauser [12], which is however unstable and can break down. It relies on LU decompositions that sometimes do not exist. Gragg [9] showed how to do an iteration of the shifted QR algorithm [8] in terms of Schur parameters. Supposing the QR iteration starts with a matrix U and ends with a matrix Û, Gragg derived formulas for computing the Schur parameters of Û directly from those of U in O(n) arithmetic. Making the reasonable practical assumption that all of the eigenvalues can be found in O(n) QR iterations, we see that we can get the eigenvalues in O(n²) work. The formulas given in [9] turned out to be unstable, but they can be stabilized [13]. Other methods that have been proposed are described below.

This paper presents another scheme for performing a unitary QR iteration in O(n) work. Our method has several virtues. For one, it can do multi-shift QR iterations of arbitrary degree. Furthermore, it is straightforward and easy to understand. Finally, it is backward stable. Numerical experiments confirm that the method works well.

2. Previous Work. We have already mentioned the work of Rutishauser [12] and Gragg [9]. Bunse-Gerstner and He [6] proposed a bisection method based on a Sturm sequence. Gragg and Reichel [10] and Ammar, Reichel, and Sorensen [2, 3] developed divide-and-conquer algorithms, which were improved by Gu, Guzzo, Chi, and Cao [11].


Several methods make use of the odd-even form: H is unitarily similar to

    \tilde H = H_o H_e,

where H_o (resp. H_e) is the product of the G_i with odd (resp. even) subscripts. Thus, when n is even, for example,

    H_o = \mathrm{diag}\{\tilde G_1, \tilde G_3, \ldots, \tilde G_{n-1}\}, \qquad H_e = \mathrm{diag}\{1, \tilde G_2, \tilde G_4, \ldots, \tilde G_n\}.

Ammar, Gragg, and Reichel [1] have used this form to develop an algorithm for real orthogonal matrices that reduces the problem to two half-sized bidiagonal singular value decompositions. The eigenvalue problem for H̃ = H_o H_e can also be formulated as a generalized eigenvalue problem for the odd-even pencil H_o − λH_e^{−1}. Bunse-Gerstner and Elsner [5] formulated variants of the QZ algorithm (single and double shift) for the odd-even pencil.

3. Multi-Shift QR Algorithm. Our method is an efficient implementation of the multi-shift QR algorithm [4, 17]. We will begin, therefore, with a brief review of how the multi-shift QR algorithm is implemented implicitly. Given a matrix A ∈ C^{n×n} in unreduced upper Hessenberg form and shifts µ_i ∈ C for i = 1, 2, . . . , m, a multi-shift QR iteration of degree m carries out the steps

    \check A_{i-1} - \mu_i I = \check Q_i \check R_i, \qquad \check A_i := \check R_i \check Q_i + \mu_i I, \qquad i = 1, 2, \ldots, m

(with Ǎ_0 := A) implicitly. The final matrix Â := Ǎ_m is produced directly from A and is unitarily similar to A by

(3.1)    \hat A = Q^* A Q,

where Q = Q̌_1 Q̌_2 · · · Q̌_m. It can be shown [4, 17] that Q is also the unitary factor in the unitary-upper triangular decomposition

    (A - \mu_m I)(A - \mu_{m-1} I) \cdots (A - \mu_1 I) = QR.

The transformation (3.1) from A to Â is carried out implicitly as follows:

1. Construct a unitary matrix V ∈ C^{n×n} that satisfies

    V e_1 = \frac{1}{\alpha} (A - \mu_m I)(A - \mu_{m-1} I) \cdots (A - \mu_1 I) e_1,

where α = \|(A - \mu_m I)(A - \mu_{m-1} I) \cdots (A - \mu_1 I) e_1\|_2.

2. Reduce the matrix V^*AV to upper Hessenberg form.

Since A is upper Hessenberg, the unitary matrix V has the block diagonal form V = diag{Ṽ_1, I_{n−m−1}}, where Ṽ_1 ∈ C^{(m+1)×(m+1)} is unitary. In fact the matrix V^* maps the vector

    v = (A - \mu_m I)(A - \mu_{m-1} I) \cdots (A - \mu_1 I) e_1

to y = (α, 0, . . . , 0)^T ∈ C^n. Because of the form of V, the transformation A ↦ V^*A acts only on the first (m + 1) rows and the transformation V^*A ↦ (V^*A)V acts only on the first (m + 1) columns. Hence the unitary similarity transformation A ↦ Ã := V^*AV introduces an initial bulge of size (m + 1) × (m + 1) given by the submatrix

(3.2)    \begin{bmatrix} \tilde a_{2,1} & \cdots & \tilde a_{2,m+1} \\ \vdots & & \vdots \\ \tilde a_{m+2,1} & \cdots & \tilde a_{m+2,m+1} \end{bmatrix}.
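Since Q is characterized as the unitary QR factor of the shift polynomial applied to A, the whole iteration can also be written down explicitly. The following dense O(n³) reference sketch (ours, not the paper's; NumPy assumed) is useful only for checking an implicit implementation; note that np.linalg.qr determines Q only up to a diagonal phase matrix, so the result can differ from Â by a diagonal unitary similarity, and an exact-eigenvalue shift would make the product singular.

```python
import numpy as np

def explicit_multishift_step(A, shifts):
    """A_hat = Q* A Q, where (A - mu_m I)...(A - mu_1 I) = Q R."""
    n = A.shape[0]
    P = np.eye(n, dtype=complex)
    for mu in shifts:                  # the shifted factors commute
        P = (A - mu * np.eye(n)) @ P
    Q, _ = np.linalg.qr(P)             # assumes p(A) is nonsingular
    return Q.conj().T @ A @ Q
```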

To return Ã to upper Hessenberg form, a unitary matrix P_1 is built such that the transformation Ã ↦ P_1^*Ã acts only on rows 2, . . . , m + 2 of Ã to zero out the entries ã_{3,1}, . . . , ã_{m+2,1}. Matrix P_1 has the block diagonal form diag{I_1, P̃_1, I_{n−m−2}}. The transformation P_1^*Ã ↦ (P_1^*Ã)P_1 acts only on the columns 2, . . . , m + 2, leaving the newly created zeros unaffected, and adds a new row to the bulge. Hence the unitary similarity transformation Ã ↦ P_1^*ÃP_1 returns the first column to upper Hessenberg form and moves the bulge one row and one column down. A second unitary matrix P_2 is built so that the transformation P_1^*ÃP_1 ↦ P_2^*(P_1^*ÃP_1)P_2 returns the second column to Hessenberg form and moves the bulge one row and one column down. The process is repeated until the bulge is chased off the bottom of the matrix and Ã is eventually returned to upper Hessenberg form. In all, unitary matrices P_1, P_2, . . . , P_{n−2} are created to carry out this reduction. The kth unitary matrix has the block diagonal form

(3.3)    P_k = \begin{bmatrix} I_k & & \\ & \tilde P_k & \\ & & I_{n-m-k-1} \end{bmatrix}

for k = 1, 2, . . . , n − m − 2 and

(3.4)    P_k = \begin{bmatrix} I_k & \\ & \tilde P_k \end{bmatrix}

for k = n − m − 1, . . . , n − 2. It can be shown [4, 17] that the upper Hessenberg matrix that is obtained at the end of this reduction of Ã is the matrix Â in (3.1). Hence matrices A and Â are related by

    \hat A = (P_{n-2}^* \cdots P_2^* P_1^* V^*) A (V P_1 P_2 \cdots P_{n-2}).
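The reduction just described is an ordinary Hessenberg reduction of Ã, except that each reflector P̃_k has order at most m + 1. A dense NumPy sketch of the whole implicit step follows (our illustration, not the paper's code; it spends O(n) arithmetic per reflector row/column update because it works on the full matrix, and it does not enforce positive subdiagonal entries — the modification for that is discussed next):

```python
import numpy as np

def reflector(x):
    """Hermitian unitary Householder matrix H with H @ x = alpha * e1."""
    v = np.array(x, dtype=complex)
    rho = v[0] / abs(v[0]) if abs(v[0]) > 0 else 1.0
    v[0] += rho * np.linalg.norm(v)
    nv = np.linalg.norm(v)
    if nv == 0.0:                        # x = 0: nothing to do
        return np.eye(len(x), dtype=complex)
    v /= nv
    return np.eye(len(x), dtype=complex) - 2.0 * np.outer(v, v.conj())

def implicit_multishift_step(A, shifts):
    A = np.array(A, dtype=complex)
    n, m = A.shape[0], len(shifts)
    x = np.zeros(n, dtype=complex)
    x[0] = 1.0
    for mu in shifts:                    # x = p(A) e1; only entries 0..m are nonzero
        x = A @ x - mu * x
    V1 = reflector(x[:m+1])              # V1 = V1*, and V1 e1 is proportional to x
    A[:m+1, :] = V1 @ A[:m+1, :]         # A <- V* A V: creates the (m+1)x(m+1) bulge
    A[:, :m+1] = A[:, :m+1] @ V1
    for k in range(n - 2):               # P_k returns column k+1 to Hessenberg form
        lo, hi = k + 1, min(k + m + 2, n)
        H = reflector(A[lo:hi, k])
        A[lo:hi, :] = H @ A[lo:hi, :]
        A[:, lo:hi] = A[:, lo:hi] @ H
        A[lo+1:hi, k] = 0.0              # entries below the subdiagonal are now O(u)
    return A
```

Up to the phase ambiguity mentioned above, this produces the same Hessenberg matrix as the explicit sketch, which is the content of the implicit-Q reasoning in [4, 17].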

If A is unitary Hessenberg, we make the following modification in the scheme: the matrices P_1, P_2, . . . , P_{n−2} are constructed such that we get real, positive subdiagonal entries in Â. We require this so that Â has a factorization of the form (1.1). Matrix P_1, for instance, can be chosen as the unitary matrix that maps the first column of Ã to the vector (ã_{11}, ξ, 0, 0, . . . , 0)^T ∈ C^n, where ξ = \|(ã_{21}, . . . , ã_{m+2,1})\|_2.

4. Efficient Unitary Multi-Shift QR Iteration. We now show how to implement a multi-shift QR iteration efficiently on a unitary matrix in factored form. Let U ∈ C^{n×n} be a unitary matrix in upper Hessenberg form with u_{j+1,j} > 0 for j = 1, . . . , n − 1. Then U has a factorization

(4.1)    U = G_1 G_2 \cdots G_{n-1} G_n,

where

(4.2)    G_k = \begin{bmatrix} I_{k-1} & & \\ & \tilde G_k & \\ & & I_{n-k-1} \end{bmatrix}

with

    \tilde G_k = \begin{bmatrix} \gamma_k & \sigma_k \\ \sigma_k & -\bar\gamma_k \end{bmatrix}, \qquad \sigma_k > 0, \qquad |\gamma_k|^2 + \sigma_k^2 = 1,

for k = 1, . . . , n − 1, and

(4.3)    G_n = \begin{bmatrix} I_{n-1} & \\ & \gamma_n \end{bmatrix}

with |γ_n| = 1. The QR iteration will produce a new unitary matrix Û in factored form:

(4.4)    \hat U = \hat G_1 \hat G_2 \cdots \hat G_{n-1} \hat G_n.
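As a brief aside before the windowed scheme (our sketch, not from the paper): the factored form is not only compact but also cheap to work with. For example, a full matrix-vector product with U costs only O(n) flops when U is held as the two Schur-parameter vectors introduced next.

```python
import numpy as np

def apply_U(g, s, x):
    """y = U x computed directly from Schur parameters in O(n) flops,
    where U = G1 G2 ... Gn; g holds gamma_1..gamma_n, s holds sigma_1..sigma_{n-1}."""
    y = np.asarray(x, dtype=complex).copy()
    n = len(g)
    y[n-1] *= g[n-1]                      # G_n acts first (rightmost factor)
    for k in range(n - 2, -1, -1):        # then G_{n-1}, ..., G_1
        a, b = y[k], y[k+1]
        y[k] = g[k]*a + s[k]*b            # 2x2 block of G_{k+1} on rows k, k+1
        y[k+1] = s[k]*a - np.conj(g[k])*b
    return y
```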

We define two vectors g = (γ_1, . . . , γ_n) ∈ C^n and s = (σ_1, . . . , σ_{n−1}) ∈ R^{n−1} to store matrix U. Let µ_i ∈ C for i = 1, . . . , m be the shifts. The first part of the implicit algorithm is as follows. We construct a matrix V = diag{Ṽ_1, I_{n−m−1}} as described in the preceding section. If Ũ := V^*UV, then Ũ contains the initial bulge given by the submatrix Ũ(2 : m + 2, 1 : m + 1). The second part of the algorithm is to return Ũ to upper Hessenberg form Û by chasing this bulge. The idea behind our implementation is to multiply together the first few of the G_i factors to build a leading submatrix of U that is big enough to accommodate the bulge. We then build the bulge and begin to chase it downward. As we do so, we must multiply in additional G_i factors to accommodate the progressing bulge. However, we also get to factor out matrices Ĝ_1, Ĝ_2, . . . from the top since, as soon as the bulge begins to move downward, we can begin to refactor the top part of the matrix, for which the iteration is complete. At any given point in the algorithm, the part of the matrix that contains the bulge can be stored in a work area of dimension (m + 2) × (m + 2). On each forward step we must factor in one new G_i at the bottom of the work area, and we get to factor out a Ĝ_j at the top. The total storage space needed by our algorithm is thus O(n + m²).

Let

    W_1 = \begin{bmatrix} \tilde G_1 & \\ & I_m \end{bmatrix} \begin{bmatrix} I_1 & & \\ & \tilde G_2 & \\ & & I_{m-1} \end{bmatrix} \cdots \begin{bmatrix} I_m & \\ & \tilde G_{m+1} \end{bmatrix}.

Thus W_1 is the (m + 2) × (m + 2) leading principal submatrix of G_1 G_2 · · · G_{m+1}. This goes into the work area initially. Note that the submatrix W_1(:, 1 : m + 1), consisting of the first (m + 1) columns of W_1, is the submatrix U(1 : m + 2, 1 : m + 1) of U. It follows that the submatrix Ũ(1 : m + 2, 1 : m + 1), which contains the initial bulge, is the first m + 1 columns of W_2 := V_1^* W_1 V_1, where V_1 = diag{Ṽ_1, I_1}. In computing W_2, we have thus performed the transformation U ↦ V^*UV by working only with matrix W_1.

We now chase the bulge. The matrix P_1 = diag{I_1, P̃_1, I_{n−m−2}} is constructed such that the transformation Ũ ↦ P_1^*Ũ returns the first column of Ũ to upper Hessenberg form. In terms of the working matrix, we perform the transformation W_2 ↦ W_2^{(1)} := P̃_1^{(1)*} W_2, where P̃_1^{(1)} = diag{I_1, P̃_1}. Further, P̃_1 is constructed so that the entry W_2^{(1)}(2, 1) > 0. Hence the first column of W_2^{(1)} is (γ̂_1, σ̂_1, 0, . . . , 0)^T. We can then perform the factorization

(4.5)    W_2^{(1)} = \begin{bmatrix} \tilde G_1^{(1)} & \\ & I_m \end{bmatrix} \begin{bmatrix} I_1 & \\ & \tilde W_2^{(1)} \end{bmatrix},

where

    \tilde G_1^{(1)} = \begin{bmatrix} \hat\gamma_1 & \hat\sigma_1 \\ \hat\sigma_1 & -\bar{\hat\gamma}_1 \end{bmatrix}.

The matrix Ĝ_1 = diag{G̃_1^{(1)}, I_{n−2}} is the first matrix in the factorization (4.4). The first entries of the vectors g and s are replaced with the new Schur parameters γ̂_1 and σ̂_1. From (4.5), we see that W̃_2^{(1)} is the trailing (m + 1) × (m + 1) principal submatrix of diag{G̃_1^{(1)}, I_m}^* W_2^{(1)}. We extract W̃_2^{(1)} and let

(4.6)    W_2^{(2)} := \begin{bmatrix} \tilde W_2^{(1)} & \\ & I_1 \end{bmatrix}.

This is our new working matrix. The next factor in (4.1) is multiplied in:

    W_2^{(3)} := W_2^{(2)} \begin{bmatrix} I_m & \\ & \tilde G_{m+2} \end{bmatrix}.

Finally, to carry out the transformation P_1^*Ũ ↦ Ũ_1 := (P_1^*Ũ)P_1, we note from (3.3) that P_1 commutes with G_{m+3}, . . . , G_n. Thus if

    W_3 := W_2^{(3)} \begin{bmatrix} \tilde P_1 & \\ & I_1 \end{bmatrix},

then the first (m + 1) columns of W_3 form the submatrix Ũ_1(3 : m + 3, 2 : m + 2), which contains the new bulge. This completes the transformation Ũ ↦ P_1^*ŨP_1.

In general, for k = 2, . . . , n − m − 2, we have the working matrix W_{k+1} whose first (m + 1) columns contain the bulge. The matrix P_k having the form (3.3) is built. In this block diagonal form, the unitary matrix P̃_k is constructed such that the transformation

(4.7)    W_{k+1} ↦ W_{k+1}^{(1)} := \tilde P_k^{(1)*} W_{k+1},

where P̃_k^{(1)} = diag{I_1, P̃_k}, returns the first column of W_{k+1} to upper Hessenberg form and makes the entry W_{k+1}^{(1)}(2, 1) > 0. Next, the factorization

(4.8)    W_{k+1}^{(1)} = \begin{bmatrix} \tilde G_k^{(1)} & \\ & I_m \end{bmatrix} \begin{bmatrix} I_1 & \\ & \tilde W_{k+1}^{(1)} \end{bmatrix}

is performed, where

    \tilde G_k^{(1)} = \begin{bmatrix} \hat\gamma_k & \hat\sigma_k \\ \hat\sigma_k & -\bar{\hat\gamma}_k \end{bmatrix}.
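In code, the splitting step (4.8) is only a few lines. A sketch (ours, with a hypothetical helper name; W is the current (m + 2) × (m + 2) working array after (4.7), and the rescaling anticipates the normalization step (4.11) discussed below):

```python
import numpy as np

def split_out_factor(W):
    """Perform (4.8): W = diag(G_tilde, I_m) @ diag(1, W_tilde).
    W is nearly unitary with first column (gamma, sigma, 0, ..., 0)^T, sigma > 0.
    Returns the new Schur parameter pair and the trailing block W_tilde."""
    gamma, sigma = W[0, 0], W[1, 0].real
    nu = np.sqrt(abs(gamma)**2 + sigma**2)    # normalization, cf. (4.11)
    gamma, sigma = gamma / nu, sigma / nu
    G = np.eye(W.shape[0], dtype=complex)
    G[:2, :2] = [[gamma, sigma],
                 [sigma, -np.conj(gamma)]]
    T = G.conj().T @ W       # in exact arithmetic this is diag(1, W_tilde)
    return gamma, sigma, T[1:, 1:]   # first row/column of T equal e1 up to roundoff
```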


The kth entries in the vectors g and s are updated with γ̂_k and σ̂_k, respectively. The submatrix W̃_{k+1}^{(1)} is extracted and the working matrix

(4.9)    W_{k+1}^{(2)} := \begin{bmatrix} \tilde W_{k+1}^{(1)} & \\ & I_1 \end{bmatrix}

is formed. The next factor in (4.1) is multiplied in:

    W_{k+1}^{(3)} := W_{k+1}^{(2)} \begin{bmatrix} I_m & \\ & \tilde G_{m+k+1} \end{bmatrix},

and a full working matrix is formed by

(4.10)    W_{k+2} := W_{k+1}^{(3)} \begin{bmatrix} \tilde P_k & \\ & I_1 \end{bmatrix}.

When k = n − m − 1, the working matrix begins to shrink. After the operations (4.7) and (4.8), there is no need to make the extension indicated by (4.9), because G̃_n = [γ_n] is only 1 × 1, not 2 × 2. On subsequent steps the working matrix continues to shrink, because there are no more factors to multiply in. By the time the bulge chase is complete, the working matrix has been reduced to 2 × 2 and can be factored to form

    \begin{bmatrix} \hat\gamma_{n-1} & \hat\sigma_{n-1} \\ \hat\sigma_{n-1} & -\bar{\hat\gamma}_{n-1} \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & \hat\gamma_n \end{bmatrix}.

The new Schur parameters γ̂_{n−1}, σ̂_{n−1}, and γ̂_n replace the old ones in g and s, and the iteration is complete.

Enforcement of Unitarity. One other important detail needs to be mentioned. Each new pair of Schur parameters γ̂_k, σ̂_k satisfies |γ̂_k|² + σ̂_k² = 1 in principle, but in practice roundoff errors will cause this equation to be violated by a tiny amount. Therefore the following normalization step is required:

(4.11)    \nu \leftarrow \left( |\hat\gamma_k|^2 + \hat\sigma_k^2 \right)^{1/2}, \qquad \hat\gamma_k \leftarrow \hat\gamma_k/\nu, \qquad \hat\sigma_k \leftarrow \hat\sigma_k/\nu.

This should be done even when k = n, taking σ̂_n = 0. This enforcement of unitarity is essential to the stability of the algorithm. If it is not done, the matrix will (over the course of many iterations) drift away from being unitary, and the algorithm will fail.
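In code the normalization is a one-line rescaling (a sketch):

```python
import numpy as np

def normalize(gamma, sigma):
    """Normalization step (4.11): rescale so that |gamma|^2 + sigma^2 = 1
    to working precision.  For k = n, call with sigma = 0."""
    nu = np.sqrt(abs(gamma)**2 + sigma**2)
    return gamma / nu, sigma / nu
```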

Backward Stability. If we could perform a QR iteration in exact arithmetic, we would have Û = Q^*UQ, where Q, U, and Û are exactly unitary matrices. Now suppose we perform the iteration in floating-point arithmetic, at first supposing that U and Û are fully assembled, i.e. not in the factored form U = G_1 · · · G_n. Then it is well established that the computed Û satisfies Û = Q^*(U + E)Q, where Q is exactly unitary and ‖E‖_2 is a modest multiple of the unit roundoff of the floating-point arithmetic [19, Ch. 3]. Thus the iteration is backward stable.

The additional complication that arises in our algorithm is that the matrix U is presented in factored form, and Û is produced in factored form. In the course of the iteration, factors are multiplied together at some points and split apart at others. Errors occur during each of these operations, and we need to analyze their effect. There is no problem with the phase in which factors are multiplied together. The multiplication of two unitary (or nearly unitary) matrices results in a product that has a tiny backward error.

The big question is what happens in the splitting-apart step shown in (4.5) and (4.8). The factorization (4.8) is effected by multiplying W_{k+1}^{(1)} on the left by the conjugate transpose of diag{G̃_k^{(1)}, I_m} to obtain

 



    \begin{bmatrix} I_1 & \\ & \tilde W_{k+1}^{(1)} \end{bmatrix} = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & \tilde W_{k+1}^{(1)} & \\ 0 & & & \end{bmatrix}.

The generation of zeros in the first row of this matrix depends upon the fact that it is unitary and therefore has orthonormal columns. In fact the matrix is not quite unitary, so those first-row entries will not be exactly zero in practice. We need to show that they are tiny enough that setting them to zero does not compromise the stability of the algorithm. The success of the analysis hinges on the fact that we perform the normalization step (4.11) each time we produce a new pair of Schur parameters. This ensures that, no matter how many steps we have taken in the algorithm, each of the G_i matrices is nearly unitary, having the form G_i = G̃_i + E_i, where G̃_i is exactly unitary and ‖E_i‖_2 is on the order of the unit roundoff u.

Now suppose we multiply together two matrices that are nearly unitary. Say we have A = Ã + E and B = B̃ + F, where Ã and B̃ are unitary and ‖E‖_2 and ‖F‖_2 are on the order of the unit roundoff u. If we multiply them together, we obtain a computed product P that satisfies P = AB + H, where ‖H‖_2 is on the order of u. Substituting in the forms of A and B, and letting P̃ = ÃB̃, we find that

    P = \tilde P + K,

where K = EB̃ + ÃF + EF + H. Thus P is the sum of a unitary matrix and an error matrix K such that ‖K‖_2 is on the order of u.

The first working array that gets split apart is W_2^{(1)}, which gets factored as shown in (4.5). W_2^{(1)} was formed by multiplying together several matrices that are almost exactly unitary. Therefore, by applying the analysis of the previous paragraph repeatedly, we see that W_2^{(1)} = W̃ + E, where W̃ is exactly unitary and ‖E‖_2 is at most a modest multiple of u. The first column of W_2^{(1)} has the form (γ̌_1, σ̌_1, 0, . . . , 0)^T, where |γ̌_1|² + σ̌_1² = 1 + ε_1, with ε_1 on the order of u. Let G = diag{G̃_1^{(1)}, I_m}, the matrix in (4.5) that has the property that G^* zeros out the entry σ̌_1 in the first column of W_2^{(1)}. The entries γ̂_1 and σ̂_1, which are used to build G, are obtained from γ̌_1 and σ̌_1 by carrying out the normalization step (4.11). Since roundoff errors are incurred in the normalization step, G^* transforms (γ̌_1, σ̌_1, 0, . . . , 0)^T to (1 + ε_2, ε_3, 0, . . . , 0)^T; that is, it does not exactly succeed in transforming σ̌_1 to zero. Let G̃ denote the theoretical matrix built using the entries from the first column of W̃ rather than W_2^{(1)}. This matrix is exactly unitary, and the first column of G̃^*W̃ is exactly e_1 = (1, 0, 0, . . . , 0)^T. Moreover, since W_2^{(1)} differs only slightly from W̃ and the computation (4.11) has high relative accuracy in floating-point arithmetic, G = G̃ + F, where ‖F‖_2 is on the order of u.

Now consider the computation G^*W_2^{(1)}, which forms the factor diag{I_1, W̃_2^{(1)}} in (4.5). We have

(4.12)    G^* W_2^{(1)} = \tilde G^* \tilde W + H,

where H = F^*W̃ + G̃^*E + F^*E. The matrix G̃^*W̃ is exactly unitary, its first column is exactly e_1, and its first row is exactly e_1^T. Since ‖H‖_2 is at most a modest multiple of u, we conclude from (4.12) that the first row and column of the computed matrix G^*W_2^{(1)} differ from e_1^T and e_1, respectively, by errors on the order of u. Therefore, the errors we make in setting the first row and column to e_1^T and e_1, respectively, are tiny. Their contribution to the backward error is equally tiny.

We also deduce from (4.12) that the matrix W̃_2^{(1)} that is created in (4.5) differs only negligibly from a unitary matrix. This is the part of the matrix that is carried forward to the working array (4.6). Now, proceeding inductively, we can conclude that at every step of the algorithm the matrix in the working array differs only negligibly from a unitary matrix. At each factorization (4.8) a tiny backward error is incurred, and the part of the working array that is moved forward differs only negligibly from a unitary matrix. Therefore the algorithm is backward stable. This argument depends critically on the normalization (4.11), which guarantees that each new factor that is brought into the working array is almost exactly unitary.

Operation Count. The bulk of the arithmetic in our algorithm is contained in the steps (4.7) and (4.10). Each unitary transformation is taken to be the product of a reflector followed by a diagonal phase-correcting transformation to enforce the condition û_{k+1,k} > 0. The latter costs O(m) arithmetic; the real work is in applying the reflector. Each reflector is at most (m + 1) × (m + 1) (smaller at the very end of the iteration), and the cost of applying it efficiently to the working matrix on the left or right is about 4m² flops [16, § 3.2]. Since the reflector is applied only to the small work area and not to the full Hessenberg matrix, the amount of arithmetic is O(m²) instead of O(nm); this is where we realize our savings. Since n − 1 reflectors are applied (on left and right) in the whole iteration, the arithmetic cost is about 8nm² flops. If m is fixed and small, then we can say that the cost of an iteration is O(n), in the sense that the arithmetic is bounded by C_m n, where C_m is independent of n. However, the fact that C_m grows like m² as m is increased shows that it will be inefficient to take m too large.

There is another important reason for keeping m fairly small. If m is made much bigger than 8 or 10, roundoff errors interfere with the mechanism of shift transmission and render the QR iteration ineffective [15]. This phenomenon is known as shift blurring.

5. Shift Strategies. Eberlein and Huang [7] presented a globally convergent shift strategy for the unitary QR algorithm, and they showed that it converges at least quadratically. Wang and Gragg [14] proposed a family of strategies that includes that of Eberlein and Huang. They demonstrated global convergence and showed that the convergence rate is always at least cubic. These strategies are for single QR iterations, the case m = 1.

Since we are taking multiple steps, we need a different strategy. The most common way to obtain m shifts is to take the eigenvalues of the trailing m × m submatrix of U. Watkins and Elsner [18] showed that this strategy is cubically convergent when it converges. However, it is not globally convergent, as the following well-known example shows. Let U be the unitary circulant shift matrix, which looks like

    \begin{bmatrix} & & & 1 \\ 1 & & & \\ & 1 & & \\ & & 1 & \end{bmatrix}

in the 4 × 4 case. For any m < n, if we take the eigenvalues of the trailing submatrix as shifts, we get shifts 0, . . . , 0, which are equidistant from all of the eigenvalues. A QR iteration on U with these shifts goes nowhere.

Since the eigenvalues of a unitary matrix lie on the unit circle, it makes sense to choose shifts that are on the unit circle. We tried two strategies. The first computes the eigenvalues of the trailing m × m submatrix and normalizes each of them by dividing it by its absolute value. If any of the tentative shifts happens to be zero, it is replaced by a random number on the unit circle. If we use this strategy on the circulant shift matrix, we get m random shifts.

A second strategy stems from the following observation. The last m rows of the unreduced Hessenberg matrix U are orthonormal. Since u_{n−m+1,n−m} > 0, the trailing m × m submatrix U(n − m + 1 : n, n − m + 1 : n) is not unitary, but it is nearly unitary. Its rows are orthogonal, and they all have norm 1, except that the top row U(n − m + 1, n − m + 1 : n) has norm less than one. Unitarity can be restored by dividing this row by its norm. In the rare case when the whole top row is zero, a suitable first row can be generated by orthonormalizing a random row against rows 2 through m. The m eigenvalues of the modified matrix then give m shifts on the unit circle. When this strategy is used on the circulant shift matrix, the orthonormalization process will generate a first row of the form (0, . . . , 0, γ) with |γ| = 1. The shifts are then the roots of the equation z^m − γ = 0, which are equally spaced points on the unit circle.

We found that these two strategies work about equally well. Both are locally cubically convergent: as u_{n−m+1,n−m} → 0, the trailing m × m submatrix becomes closer and closer to unitary. Its eigenvalues become ever closer to the unit circle, and normalizing them as in the first strategy moves them only slightly. On the other hand, if we modify the matrix as in the second strategy by normalizing its first row, that also moves the eigenvalues only slightly, because the rescaling factor is very close to 1. Thus both strategies behave asymptotically the same as the strategy that simply takes the eigenvalues of the trailing submatrix as shifts; that is, they converge cubically [18] when they converge. We conjecture that both strategies converge globally.
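Both strategies are easy to state in code. A NumPy sketch of each (ours; the function names are hypothetical):

```python
import numpy as np

def shifts_normalized_eigs(U, m, rng=np.random.default_rng()):
    """First strategy: eigenvalues of the trailing m x m block, pushed onto the
    unit circle; a zero eigenvalue becomes a random unimodular number."""
    mus = np.linalg.eigvals(U[-m:, -m:])
    zero = (mus == 0)
    mus[zero] = np.exp(2j * np.pi * rng.uniform(size=zero.sum()))
    return mus / np.abs(mus)

def shifts_unitarized_block(U, m, rng=np.random.default_rng()):
    """Second strategy: restore unitarity of the trailing m x m block by
    normalizing its top row (orthonormalizing a random row in the rare
    zero-row case), then take its eigenvalues."""
    T = np.array(U[-m:, -m:], dtype=complex)
    r = np.linalg.norm(T[0])
    if r > 0:
        T[0] /= r
    else:
        v = rng.standard_normal(m) + 1j * rng.standard_normal(m)
        v -= T[1:].T @ (T[1:].conj() @ v)   # project against rows 2 through m
        T[0] = v / np.linalg.norm(v)
    return np.linalg.eigvals(T)
```

Applied to the circulant shift example, the first function returns m random points on the unit circle and the second returns the m-th roots of γ, as described above.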

6. Numerical Results. To verify that our algorithm works as expected, we coded it in MATLAB and tried it out on numerous unitary matrices. Test problems with known eigenvalues were generated as follows. A unitary diagonal matrix D was generated and its eigenvalues noted. A unitary matrix Q, random with respect to Haar measure, was generated, and the random unitary matrix B = QDQ^* formed. Then B was transformed to upper Hessenberg form to yield an upper Hessenberg unitary matrix A with known eigenvalues, which was then factored into the form (1.1).

Table 6.1
Error in computed eigenvalues of a 1000 × 1000 unitary matrix

              maximum error
    m = 1     1.10 × 10^{−13}
    m = 2     5.31 × 10^{−14}
    m = 3     3.25 × 10^{−14}
    m = 4     2.77 × 10^{−14}
    m = 5     2.71 × 10^{−14}
    m = 6     2.53 × 10^{−14}
    m = 7     2.52 × 10^{−14}
    m = 8     2.14 × 10^{−14}
    MATLAB    1.03 × 10^{−14}

The eigenvalues of unitary matrices are perfectly conditioned, so we always expect to be able to compute them to very high accuracy. We found that our algorithm was able to do this. The results in Table 6.1 are typical. These are for a matrix of order 1000 × 1000 with eigenvalues randomly distributed on the unit circle. We computed the eigenvalues with our code using m = 1, 2, 3, . . . , 8 and obtained accurate results in all cases. It is interesting that increasing m increases the accuracy. At m = 4 the maximum error is only about one fourth what it is for m = 1. We already have at least two reasons for not taking m too large, but these numbers suggest that m = 1 may not be the best choice. For real orthogonal matrices one should always take m ≥ 2, and the complex shifts should be taken in conjugate pairs. Then the matrix (A − µ_m I) · · · (A − µ_1 I) is real, and all operations can be done in real arithmetic.

As Table 6.1 shows, we also had the standard MATLAB QR code compute the eigenvalues of the Hessenberg matrix, and we found that it was a bit more accurate than our codes, but the difference was not substantial.

The results in Table 6.1 are for a single matrix, but they are entirely typical of what we observed. The test matrices included matrices with many repeated eigenvalues and others with tight clusters of eigenvalues. The eigenvalues of smaller matrices are computed with slightly more accuracy than are those of large ones, but in all cases the results were qualitatively like those in Table 6.1. We conclude that our algorithm works as expected.
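The test-matrix construction is easy to reproduce. A sketch (ours, using SciPy's Hessenberg reduction; as noted in the introduction, one would additionally rescale by a diagonal unitary similarity to make the subdiagonal positive before extracting Schur parameters):

```python
import numpy as np
from scipy.linalg import hessenberg

rng = np.random.default_rng(1)
n = 200
d = np.exp(2j * np.pi * rng.uniform(size=n))     # known eigenvalues on the circle
Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
Q, R = np.linalg.qr(Z)
Q = Q * (np.diag(R) / np.abs(np.diag(R)))        # phase fix: Q is Haar-distributed
B = (Q * d) @ Q.conj().T                         # B = Q diag(d) Q*, unitary
H = hessenberg(B)                                # upper Hessenberg, still unitary

lam = np.linalg.eigvals(H)
err = max(np.min(np.abs(lam - mu)) for mu in d)
print(f"max eigenvalue error: {err:.2e}")        # expect a modest multiple of u
```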

REFERENCES

[1] G. Ammar, W. Gragg, and L. Reichel, On the eigenproblem for orthogonal matrices, in Proc. 25th IEEE Conference on Decision and Control, Athens, Greece, 1986, pp. 1963–1966.
[2] G. S. Ammar, L. Reichel, and D. C. Sorensen, An implementation of a divide and conquer algorithm for the unitary eigenproblem, ACM Trans. Math. Software, 18 (1992), pp. 292–307.
[3] G. S. Ammar, L. Reichel, and D. C. Sorensen, Corrigendum: Algorithm 730: An implementation of a divide and conquer algorithm for the unitary eigenproblem, ACM Trans. Math. Software, 20 (1994), p. 161.
[4] Z. Bai and J. Demmel, On a block implementation of the Hessenberg multishift QR iteration, Internat. J. High Speed Comput., 1 (1989), pp. 97–112.
[5] A. Bunse-Gerstner and L. Elsner, Schur parameter pencils for the solution of the unitary eigenproblem, Linear Algebra Appl., 154–156 (1991), pp. 741–778.

[6] A. Bunse-Gerstner and C. He, On a Sturm sequence of polynomials for unitary Hessenberg matrices, SIAM J. Matrix Anal. Appl., 16 (1995), pp. 1043–1055.
[7] P. J. Eberlein and C. P. Huang, Global convergence of the QR algorithm for unitary matrices with some results for normal matrices, SIAM J. Numer. Anal., 12 (1975), pp. 97–104.
[8] J. G. F. Francis, The QR transformation, parts I and II, Computer J., 4 (1961), pp. 265–272, 332–345.
[9] W. B. Gragg, The QR algorithm for unitary Hessenberg matrices, J. Comput. Appl. Math., 16 (1986), pp. 1–8.
[10] W. B. Gragg and L. Reichel, A divide and conquer algorithm for the unitary and orthogonal eigenproblems, Numer. Math., 57 (1990), pp. 695–718.
[11] M. Gu, R. Guzzo, X.-B. Chi, and X.-Q. Cao, A stable divide and conquer algorithm for the unitary eigenproblem, SIAM J. Matrix Anal. Appl., 25 (2003), pp. 385–404.
[12] H. Rutishauser, Bestimmung der Eigenwerte orthogonaler Matrizen, Numer. Math., 9 (1966), pp. 104–108.
[13] M. Stewart, An error analysis of a unitary Hessenberg QR algorithm, Tech. Rep. TR-CS-98-11, Department of Computer Science, Australian National University, 1998. http://eprints.anu.edu.au/archive/00001557/.
[14] T.-L. Wang and W. B. Gragg, Convergence of the shifted QR algorithm for unitary Hessenberg matrices, Math. Comp., 71 (2002), pp. 1473–1496.
[15] D. S. Watkins, The transmission of shifts and shift blurring in the QR algorithm, Linear Algebra Appl., 241–243 (1996), pp. 877–896.
[16] D. S. Watkins, Fundamentals of Matrix Computations, second ed., John Wiley and Sons, New York, 2002.
[17] D. S. Watkins and L. Elsner, Chasing algorithms for the eigenvalue problem, SIAM J. Matrix Anal. Appl., 12 (1991), pp. 374–384.
[18] D. S. Watkins and L. Elsner, Convergence of algorithms of decomposition type for the eigenvalue problem, Linear Algebra Appl., 143 (1991), pp. 19–47.
[19] J. H. Wilkinson, The Algebraic Eigenvalue Problem, Clarendon Press, Oxford, 1965.
