PRODUCT ALGORITHMS FOR EIGENSYSTEMS

DAVID E. STEWART

March 2, 1994

Abstract. A number of problems arising from dynamical systems and other areas lead to problems of computing eigenvalues/vectors and singular value decompositions of products of matrices. A number of recent algorithms by Bojanczyk, Golub and Van Dooren; Bojanczyk, Ewerbring and Luk; D. Stewart; and Abarbanel, Brown and Kennel have been devised to compute Schur forms and SVD's of products in terms of the factors. A common connection between them is the use of recursive QR factorisations and related methods. Relationships between these methods, and their accuracy, are discussed. Generalised eigen- and singular value decompositions can also be understood in this framework.
1. Why product algorithms?

Product algorithms are algorithms that compute factorisations of products of matrices while working with the product in terms of its factors. There are a number of reasons why this is a good thing to do. In terms of the design of an algorithm, by working in terms of the factors, at each stage of the algorithm the data is close in form to the true data of the original problem. This often means that meaningful backward error results can be proven. These product algorithms can be very important where long products are involved, such as arise from dynamical systems. These are often studied to obtain estimates of Lyapunov exponents, and related quantities such as bounds on (fractal) dimensions of attractors and topological or Kolmogorov entropy. Long products typically have exponentially diverging eigenvalues and singular values. Multiplying out such long products will often result in the (exponentially) smaller eigenvalues being swamped by the roundoff errors from the larger eigenvalues. As with the common criticism of the normal equations approach to solving least squares problems, the problem comes from the much poorer conditioning obtained from multiplying matrices and then working with the product rather than its factors.

1991 Mathematics Subject Classification. 65F15.
Key words and phrases. Matrix products, recursive QR, treppeniteration, Lyapunov exponents.
One way of considering the problem is in terms of additive and multiplicative error. If we compute $C = AB$ and then work with $C$, the algorithms operating on $C$ will usually generate small errors relative to $\|C\|$, and will (if they are backward stable) compute the exact results for a matrix $C + F$ where $\|F\| = O(u)\|C\|$, and $u$ is the unit roundoff or machine epsilon. This error $F$ cannot, in turn, be imputed to errors in $A$ and $B$ of the same size. What is more desirable is to use multiplicative backward error analysis; that is, to show that the computed solution is the exact solution for $(I + E_A)A(I + E_B)B$, where the $E$'s are of size $O(u)$. Note that for perturbation results on long products of matrices, we will need to consider $A$ and $B$ that are themselves long products. For problems with long products, small (multiplicative) backward errors do not necessarily imply that the computed solution has small relative errors. For that, some perturbation theory that is independent of, or slowly growing in, the length of the product is needed. This cannot be guaranteed in general, although it seems that under assumptions related to exponential dichotomies [4] useful perturbation results can be proven. Two partial perturbation results are given in the next subsection for "outer" multiplicative perturbations.

1.1. Multiplicative perturbations. Here we are concerned with products of the form $(I + E)A$ where $\|E\|$ is small and $A$ has both extremely large and extremely small eigenvalues or singular values. What we aim to prove is that the eigenvalues or singular values of the product are close to the corresponding eigenvalues or singular values of $A$ relative to the size of the corresponding eigen- or singular values. This is an example of an "outer" multiplicative perturbation. "Inner" perturbations of products have the form $A(I + E)B$ and can be considerably harder to deal with. If either $A$ or $B$ is non-singular, this inner perturbation can be transformed into either of the outer perturbations $(I + AEA^{-1})AB$ or $AB(I + B^{-1}EB)$. The penalty for this transformation is the growth in the size of the perturbation, possibly by as much as $\kappa(A)$ or $\kappa(B)$. If both of these condition numbers are large, then the relative stability of the eigen- and singular values cannot be guaranteed, as will be shown later.

Firstly, for singular values, there is the result given in Stewart [19]:

Theorem 1. If $\|E\|_2 < 1$ then
$$1 - \|E\|_2 \le \frac{\sigma_i((I+E)A)}{\sigma_i(A)} \le 1 + \|E\|_2,$$
where $\sigma_i(B)$ is the $i$th largest singular value of $B$.

Proof. This can be proven easily in terms of the result in Horn and Johnson [10, Thm. 3.3.16] that $\sigma_{i+j-1}(CD) \le \sigma_i(C)\,\sigma_j(D)$. Thus
$$\sigma_i((I+E)A) \le \sigma_1(I+E)\,\sigma_i(A) \le (1 + \|E\|_2)\,\sigma_i(A).$$
Conversely, $\sigma_i(A) \le \sigma_1((I+E)^{-1})\,\sigma_i((I+E)A)$, so
$$\sigma_i((I+E)A) \ge \sigma_i(A)\,\sigma_n(I+E) \ge (1 - \|E\|_2)\,\sigma_i(A).$$

A corresponding result can be proven for eigenvalues under some fairly mild assumptions:

Theorem 2. If $A$ is diagonalisable with $X^{-1}AX = \Lambda = \mathrm{diag}(\lambda_1,\ldots,\lambda_n)$ and
$$\kappa_1(X)\,\|E\|_1 < \min_{i\neq j}\frac{|\lambda_i - \lambda_j|}{|\lambda_i| + |\lambda_j|},$$
then the eigenvalues $\tilde\lambda_i$ of $(I+E)A$ are distinct and satisfy
$$\frac{|\tilde\lambda_i - \lambda_i|}{|\lambda_i|} \le \kappa_1(X)\,\|E\|_1.$$

Proof. The proof uses Gershgorin's theorem. Note that $(I+E)A = (I+E)X\Lambda X^{-1}$ is similar to
$$X^{-1}(I+E)X\Lambda = (I + X^{-1}EX)\Lambda.$$
Let $F = X^{-1}EX$. Then $\Lambda$ scales the columns of $F$. The Gershgorin disks of $(I+F)\Lambda$ are given by
$$|\lambda - (1 + f_{ii})\lambda_i| \le |\lambda_i|\sum_{j\neq i}|f_{ji}|.$$
These are contained in the disks given by
$$\frac{|\lambda - \lambda_i|}{|\lambda_i|} \le \sum_j |f_{ji}| \le \max_k\sum_j |f_{jk}| = \|F\|_1 \le \kappa_1(X)\,\|E\|_1.$$
After some algebra it can be seen that there is no overlap between the $i$th disk and the $j$th disk provided that $|\lambda_j - \lambda_i|/(|\lambda_j| + |\lambda_i|) > \kappa_1(X)\|E\|_1$. Under this condition for no overlap of Gershgorin disks, there is exactly one eigenvalue in each disk. Denote the eigenvalue of $(I+E)A$ in the $i$th disk by $\tilde\lambda_i$. Then by the above bounds for the $i$th disk, $|\tilde\lambda_i - \lambda_i|/|\lambda_i| \le \kappa_1(X)\|E\|_1$.

To see that "inner" perturbations can cause the loss of all relative accuracy in eigen- and singular values, consider the singular values of the following product for $0 < \delta \ll \epsilon \ll 1$:
$$A(I+E)B = \begin{bmatrix}1 & \\ & \delta\end{bmatrix}\begin{bmatrix}1 & \epsilon\\ & 1\end{bmatrix}\begin{bmatrix}\delta & \\ & 1\end{bmatrix} = \begin{bmatrix}\delta & \epsilon\\ & \delta\end{bmatrix}.$$
The product $AB = \mathrm{diag}(\delta,\delta)$, but $A(I+E)B$ has a singular value of size $\approx\epsilon$, which indicates arbitrarily large relative errors in at least one of the singular values. Since the singular values of $A(I+E)B$ are the square roots of the eigenvalues of $B^T(I+E^T)A^TA(I+E)B$, it follows that this example carries over to showing that the relative error of an eigenvalue of a product can be arbitrarily large due to inner perturbations. A closer inspection can reveal an explanation:
$$B^T(I+E^T)A^T\,A(I+E)B = \begin{bmatrix}\delta & \\ \epsilon & \delta\end{bmatrix}\begin{bmatrix}\delta & \epsilon\\ & \delta\end{bmatrix},$$
and if we set $\delta = 0$, the rightmost factor is not diagonalisable; if $0 < \delta \ll 1$ then the matrix of eigenvectors will be nearly singular, and the above theorem on eigenvalue perturbations does not guarantee any "nice" error bounds.
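The effect can be checked numerically. The following is a minimal sketch (assuming NumPy; the specific values of $\delta$ and $\epsilon$ are illustrative choices) of the $2\times 2$ example above, comparing the singular values of $AB$ with those of the inner-perturbed product $A(I+E)B$ and an outer-perturbed product $(I+E)AB$:

    import numpy as np

    # Inner vs. outer multiplicative perturbations of a product:
    # an O(eps) inner perturbation destroys the relative accuracy of the
    # singular values of A*B, while an outer one of the same size does not
    # (consistent with Theorem 1).
    delta, eps = 1e-8, 1e-3
    A = np.diag([1.0, delta])
    B = np.diag([delta, 1.0])
    E = np.array([[0.0, eps], [0.0, 0.0]])
    I = np.eye(2)

    sv_exact = np.linalg.svd(A @ B, compute_uv=False)
    sv_inner = np.linalg.svd(A @ (I + E) @ B, compute_uv=False)
    sv_outer = np.linalg.svd((I + E) @ A @ B, compute_uv=False)

    print("sigma(AB)      =", sv_exact)   # both approximately delta
    print("sigma(A(I+E)B) =", sv_inner)   # largest is about eps >> delta
    print("sigma((I+E)AB) =", sv_outer)   # within a factor 1 +/- eps of sigma(AB)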
2. Treppeniteration and recursive QR

The basic building block for many algorithms is the recursive QR factorisation. Treppeniteration is an old method [8, §7.3.4] which computed eigenvalues and eigenvectors by performing the recursive LU or QR factorisation on the power $A^m$ for $m$ suitably large. The basic algorithm takes a product $A^{(1)}A^{(2)}\cdots A^{(p)}$ and factors the last matrix $A^{(p)} = Q^{(p)}R^{(p)}$ using the standard QR factorisation. The matrix $C^{(p-1)} = A^{(p-1)}Q^{(p)}$ is formed, and then the QR factorisation $C^{(p-1)} = Q^{(p-1)}R^{(p-1)}$ is computed. Continuing, we obtain
$$A^{(1)}A^{(2)}\cdots A^{(p)} = Q^{(1)}R = Q^{(1)}R^{(1)}R^{(2)}\cdots R^{(p)},$$
where $R$ is upper triangular, being the product of upper triangular matrices. More formally, this is performed by the following piece of pseudo-code:

    procedure treppeniteration (in: $A^{(i)}$; out: $R^{(i)}$, $Q^{(1)}$)
        $Q^{(p+1)} \leftarrow I$
        for $i = p, p-1, \ldots, 1$
            $C^{(i)} \leftarrow A^{(i)}Q^{(i+1)}$
            compute QR factorisation $C^{(i)} = Q^{(i)}R^{(i)}$
        return
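As an illustration, here is a minimal NumPy sketch of this recursive QR factorisation of a product (the function name treppen_qr and the use of numpy.linalg.qr are this sketch's own choices, not taken from the works cited here):

    import numpy as np

    def treppen_qr(factors, Q_init=None):
        """Recursive QR of the product factors[0] @ ... @ factors[-1] (@ Q_init).

        Returns (Q, Rs) with the product equal to Q @ Rs[0] @ ... @ Rs[-1],
        each Rs[i] upper triangular.
        """
        n = factors[-1].shape[1]
        Q = np.eye(n) if Q_init is None else Q_init
        Rs = [None] * len(factors)
        for i in range(len(factors) - 1, -1, -1):   # i = p, p-1, ..., 1
            C = factors[i] @ Q
            Q, Rs[i] = np.linalg.qr(C)
        return Q, Rs

    # quick check on a random product of three factors
    rng = np.random.default_rng(0)
    As = [rng.standard_normal((4, 4)) for _ in range(3)]
    Q, Rs = treppen_qr(As)
    assert np.allclose(As[0] @ As[1] @ As[2], Q @ Rs[0] @ Rs[1] @ Rs[2])

Passing an orthogonal matrix as Q_init gives the variant mentioned below, in which the initialisation $Q^{(p+1)} \leftarrow I$ is replaced by an input parameter.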
Later we will need to use variants which do not have the initialisation step $Q^{(p+1)} \leftarrow I$, in order to compute QR factorisations of $AU$ where $U$ is an orthogonal matrix. In this case $Q^{(p+1)}$ will be an input parameter. Additionally, if the $Q$'s are stored we can reconstruct the original factors as $A^{(i)} = C^{(i)}Q^{(i+1)T} = Q^{(i)}R^{(i)}Q^{(i+1)T}$.

2.1. Error analysis. The recursive QR factorisation admits an elegant backward error analysis which is readily obtained from the error analysis of the standard QR factorisation [8]. This backward error analysis is in Stewart [19]. Here the notation $\hat Q^{(1)}$ and $\hat R^{(i)}$ denotes the computed versions of $Q^{(1)}$ and $R^{(i)}$. It is assumed that a suitably stable method of computing the standard QR factorisation is used, such as the Householder method or Givens' rotations, but not methods such as classical Gram-Schmidt
reduction [8, §5.2.7]. The main result is that there are modest constants $k_1$ and $k_2$, and an exactly orthogonal matrix $\tilde Q^{(1)}$, such that
$$\tilde Q^{(1)}\hat R^{(1)}\cdots\hat R^{(p)} = (A^{(1)} + E^{(1)})\cdots(A^{(p)} + E^{(p)}),$$
where $\|E^{(i)}\|_2 \le k_1u\,\|A^{(i)}\|_2$ and $\|\tilde Q^{(1)} - \hat Q^{(1)}\|_2 \le k_2u$.
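This behaviour is easy to observe on small examples. The following sketch (assuming NumPy; it simply repeats the recursive QR loop inline) compares the recomposed product $\hat Q^{(1)}\hat R^{(1)}\cdots\hat R^{(p)}$ with the product of the factors; for well-scaled random factors the relative residual is of the order of the unit roundoff, consistent with the backward error bound:

    import numpy as np

    rng = np.random.default_rng(1)
    As = [rng.standard_normal((5, 5)) for _ in range(4)]

    Q = np.eye(5)
    Rs = [None] * len(As)
    for i in range(len(As) - 1, -1, -1):        # recursive QR of the product
        Q, Rs[i] = np.linalg.qr(As[i] @ Q)

    recomposed = Q
    for R in Rs:
        recomposed = recomposed @ R
    exact = np.linalg.multi_dot(As)
    print(np.linalg.norm(recomposed - exact) / np.linalg.norm(exact))
    # typically around 1e-15 for factors of this size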
2.2. (Im)Possibilities for parallelism. The recursive QR algorithm is not an inherently parallel algorithm. Recursive doubling does not work: if $A = A^{(1)}A^{(2)}$, where $A^{(1)}$ and $A^{(2)}$ are themselves products, applying the QR factorisation to each gives $A = \tilde Q^{(1)}\tilde R^{(1)}\tilde Q^{(2)}\tilde R^{(2)}$. The problem is now to remove the $\tilde Q^{(2)}$ matrix from the middle of the product. To push $\tilde Q^{(2)}$ through to the first factor, we need to form $\tilde R^{(1)}\tilde Q^{(2)}$ and then take the QR factorisation of this product, which indicates that there is no significant benefit from the first factorisation.

It is possible to get a factor of two improvement for a modified problem using parallelism. With this the product is split into two parts $A = A^{(1)}A^{(2)}$, the QR factorisation of $A^{(1)}$ is formed, and the "RQ" factorisation (see next section) of $A^{(2)}$ is formed, giving
$$A = Q^{(1)}R^{(1)}R^{(2)}Q^{(2)}.$$
This may be suitable for computing singular values. However, its value is very limited.
3. Inverses and generalised eigenvalues and singular values

Generalised eigenvalues and singular values of pairs of matrices are really the eigen- and singular values of the matrix product $AB^{-1}$. Computing the product form of the Schur form or the SVD of $AB^{-1}$ effectively gives the generalised Schur form and the generalised SVD of the pair $(A, B)$. Furthermore, it is done in a backward stable way, and the resulting algorithms are either identical or very close to the standard QZ-type algorithms for generalised eigen- and singular values. In fact, the QZ algorithm for generalised eigenvalues can be considered (at least in terms of the overall scheme) as an extension of the periodic Schur decomposition of Bojanczyk, Golub and Van Dooren [2].

3.1. Computing with inverses. If the product has factors that are inverses, then the recursive QR factorisation can be computed without needing to explicitly form the inverses. Consider a factor $C = B^{-1}$. From the previous step of treppeniteration an orthogonal matrix $Q$ is computed and we wish to find the QR factorisation $CQ = UR$ in terms of $B$. The product $CQ = B^{-1}Q$ can be written as $((Q^TB)^{-1} = (R^{-1}U^T)^{-1}$ inverted$)$; that is, $Q^TB = R^{-1}U^T$. Since the inverse of an upper triangular matrix is upper triangular, $\tilde R = R^{-1}$ is upper triangular, and it suffices to form the "RQ" factorisation of $Q^TB = \tilde R\tilde U^T$. This can be computed by means of an ordinary QR factorisation.
Given $E \in \mathbf{R}^{n\times n}$, its "RQ" factorisation $E = SV$ ($S$ upper triangular, $V$ orthogonal) can be computed by means of suitable Householder transformations. Alternatively, given code for the standard QR factorisation, the "RQ" factorisation can be computed as follows. Let $J$ be the "reflection" matrix where $(J)_{kl}$ is one if $k + l = n + 1$ and zero otherwise. Then $J$ is an orthogonal, symmetric matrix with $J^2 = I$. Take the standard QR factorisation of $JE^TJ = QR$. Then $S = JR^TJ$ is upper triangular ($s_{ij} = r_{n+1-j,\,n+1-i} = 0$ if $n + 1 - j > n + 1 - i$), $V = JQ^TJ$ is orthogonal, and $E = SV$.

Returning to the problem of dealing with inverse matrices implicitly, the steps for a factor $C = B^{-1}$ are to compute the "RQ" factorisation of $Q^TB$, namely $Q^TB = \tilde R\tilde U^T$, to set $Q \leftarrow \tilde U$, and to replace the factor $B^{-1}$ by $\tilde R^{-1}$. Since the end result of this step is to replace an inverse by an inverse, it is not necessary to explicitly compute $\tilde R^{-1}$. Instead, the factor at the end of the step is given implicitly as the inverse of $\tilde R$. Even if $B$ is singular or nearly singular, the numerical factorisation is still backward stable.
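A minimal NumPy sketch of this flip-based RQ factorisation (the function name rq_via_qr is this sketch's own; the construction follows the $J$-reflection argument above):

    import numpy as np

    def rq_via_qr(E):
        """Compute E = S @ V with S upper triangular and V orthogonal,
        using only a standard QR factorisation of J E^T J."""
        n = E.shape[0]
        J = np.fliplr(np.eye(n))          # (J)_{kl} = 1 iff k + l = n + 1
        Q, R = np.linalg.qr(J @ E.T @ J)
        S = J @ R.T @ J                   # upper triangular
        V = J @ Q.T @ J                   # orthogonal
        return S, V

    rng = np.random.default_rng(2)
    E = rng.standard_normal((5, 5))
    S, V = rq_via_qr(E)
    assert np.allclose(E, S @ V)
    assert np.allclose(np.tril(S, -1), 0)       # S is upper triangular
    assert np.allclose(V @ V.T, np.eye(5))      # V is orthogonal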
3.2. Generalised eigenvalues. The complex number $\lambda$ is a generalised eigenvalue of the matrix pair $(A, B)$ if there is a nonzero vector $x$ where
$$Ax = \lambda Bx.$$
(It is assumed that $A$ and $B$ are real.) If $B$ is nonsingular then this is equivalent to $\lambda$ being an eigenvalue of $AB^{-1}$, with eigenvector $Bx$. The product QR algorithm for eigenvalues (described below) computes an orthogonal matrix $Q$ such that $Q^TAB^{-1}Q = T$ with $T$ quasi-upper triangular [8, §7.3.3]. The product QR algorithm also computes a unitary matrix $Z$ where $Z^TB^{-1}Q$ is upper triangular and $Q^TAZ$ is quasi-upper triangular. Thus $Q^TAZ$ is quasi-upper triangular and $Q^TBZ$ is upper triangular. This is the result of the QZ algorithm as described in Golub and Van Loan [8, §7.7].
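As a small sanity check of the equivalence used here (a sketch assuming NumPy and SciPy; this is not one of the product algorithms under discussion), the generalised eigenvalues of the pair $(A, B)$ coincide with the eigenvalues of $AB^{-1}$ when $B$ is nonsingular:

    import numpy as np
    from scipy.linalg import eig

    rng = np.random.default_rng(3)
    A = rng.standard_normal((5, 5))
    B = rng.standard_normal((5, 5))      # generically nonsingular

    gen_eigs = eig(A, B, left=False, right=False)     # eigenvalues of the pencil (A, B)
    prod_eigs = np.linalg.eigvals(A @ np.linalg.inv(B))

    print(np.sort_complex(gen_eigs))
    print(np.sort_complex(prod_eigs))    # same values up to roundoff and ordering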
3.3. Generalised singular values. The generalised singular value decomposition (GSVD) can be handled in a similar way. A generalised singular value is the square root of a generalised eigenvalue of the pair $(A^TA, B^TB)$. This is equivalent to being an eigenvalue of $(B^{-1})^TA^TAB^{-1}$. Again applying the product QR factorisation (or the symmetric product LR algorithm) we get orthogonal matrices $Q$, $Z$ and $W$ where $R_1 = Q^TAZ$ and $R_2 = W^TBZ$ are upper triangular, and $D = (Q^TAZ)(W^TBZ)^{-1} = R_1R_2^{-1}$ is diagonal. Let $d_{ii} = s_i/c_i$ where $c_i^2 + s_i^2 = 1$, so $D = D_1D_2^{-1}$ where $D_1 = \mathrm{diag}(s_1,\ldots,s_n)$ and $D_2 = \mathrm{diag}(c_1,\ldots,c_n)$. Now $R_1 = DR_2$, so $Q^TA(ZR_2^{-1}) = D$ and $W^TB(ZR_2^{-1}) = I$. Multiplying both of these equations on the right by $D_2$ gives
$$Q^TA(ZR_2^{-1}D_2) = D_1 = \mathrm{diag}(s_1,\ldots,s_n),$$
$$W^TB(ZR_2^{-1}D_2) = D_2 = \mathrm{diag}(c_1,\ldots,c_n),$$
which is the form of the GSVD specified in [8, §8.7.3] if we set $X = ZR_2^{-1}D_2$.
3.4. The decompositions of de Moor and Zha. There have also been some other forms of product SVD's (in fact, a whole tree of such decompositions) described by de Moor [12], H. Zha [21] and de Moor and Zha [13]. For each factor in the product SVD's in de Moor's and Zha's family of decompositions, they assign a "P" or a "Q" depending on whether the product involves the matrix as a factor directly, or the inverse of the matrix as a factor. (The "P" stands for "product", while the "Q" stands for "quotient".) That is, while a PP-SVD for factors $A$ and $B$ gives the SVD of $AB$, the PQ-SVD gives the SVD of $A(B^{-1})^T$. These decompositions of de Moor and Zha are a little different to the other decompositions and to the ones described here, as they use general linear transformations in their product SVD's. This makes them a more direct generalisation of the usual GSVD as described in [8, §8.7.3].

For the $P^n$-SVD factorisation of de Moor and Zha, the factors of the product $A_1A_2\cdots A_p$ are each factorised
$$A_i = X_{i-1}D_iX_i^{-1} \qquad (3.4)$$
where $U = X_0$ and $V = X_p$ are orthogonal, but $X_i$ is otherwise a general nonsingular matrix. The $D_i$ matrices are quasi-diagonal; a matrix $S$ is quasi-diagonal if
- where $s_{k,p} \neq 0$, it is the only non-zero entry in that row or column,
- if there is a nonzero in column $k + 1$, then there is a nonzero in column $k$,
- if $s_{k,p} \neq 0$ and $s_{k+1,q} \neq 0$, then $p < q$.
Furthermore, the quasi-diagonal matrices $D_i$ have the same zero-structure. The singular values can thus be directly read off from the $D_i$'s:
$$\Sigma = D_1D_2\cdots D_p = U^TA_1A_2\cdots A_pV.$$
For a "Q" factor in a de Moor and Zha product SVD, instead of (3.4), the following factorisation is used:
$$A_i = (X_{i-1}^{-1})^TD_iX_i^T.$$
This gives an SVD of a product in which the factor $(A_i^{-1})^T$ occurs. The greatest complication of the de Moor and Zha product SVD's is the use of quasi-diagonal matrices. The precise zero-structure of these quasi-diagonal matrices is rather complicated and depends, as they say in [13, Thm. 1], on "the integers [describing the zero-structure which] are ranks of certain matrices as given in the constructive proof in ..."
4. The product QR algorithm for eigenvalues

The simplest algorithm for computing eigenvalues using the recursive QR factorisation is the well-known QR algorithm first devised by Francis and Kublanovskaya in 1961 [8, §7.3]. This iteration is very straightforward in its basic form:

    for $k = 1, 2, \ldots$
        $A_k = Q_kR_k$ (QR factorisation)
        $A_{k+1} \leftarrow R_kQ_k$
As the recursive QR factorisation can be performed on a product in terms of its factors, this algorithm is easily implemented. This is used by Abarbanel, Brown and Kennel in [1] and by Bojanczyk, Golub and Van Dooren in [2]. The Abarbanel et al. algorithm is a straightforward implementation of the QR algorithm using the recursive QR method. The Bojanczyk et al. algorithm [2], on the other hand, goes to some lengths to increase the speed of the algorithm by means of reduction to "product" Hessenberg form. That is, the first matrix is put into upper Hessenberg form, and the remaining factors are put into upper triangular form.
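The following is a minimal sketch of the straightforward (unshifted, un-reduced) product QR iteration in NumPy; the helper and variable names are this sketch's own, and without Hessenberg reduction or shifting the convergence is slow:

    import numpy as np

    def product_qr_step(factors):
        """One unshifted QR step, A_{k+1} = R_k Q_k, applied to a product
        A_k = factors[0] @ ... @ factors[-1] kept in factored form."""
        n = factors[0].shape[0]
        Q = np.eye(n)
        Rs = [None] * len(factors)
        for i in range(len(factors) - 1, -1, -1):     # recursive QR of the product
            Q, Rs[i] = np.linalg.qr(factors[i] @ Q)
        Rs[-1] = Rs[-1] @ Q                           # push Q back in on the right
        return Rs

    rng = np.random.default_rng(4)
    factors = [rng.standard_normal((4, 4)) for _ in range(3)]
    original_eigs = np.linalg.eigvals(np.linalg.multi_dot(factors))

    for _ in range(300):
        factors = product_qr_step(factors)

    iterate = np.linalg.multi_dot(factors)
    # Each step is an orthogonal similarity of the product, so the eigenvalues
    # are unchanged, while the iterate approaches real Schur (quasi-triangular) form.
    print(np.linalg.norm(np.tril(iterate, -2)))       # below-subdiagonal part, -> 0
    print(np.sort_complex(np.linalg.eigvals(iterate)))
    print(np.sort_complex(original_eigs))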
4.1. Reduction to product Hessenberg form. This is described in [2]. Given a product
$$A = A^{(1)}A^{(2)}\cdots A^{(p)},$$
the first step is to apply the recursive QR method to the product $A^{(2)}\cdots A^{(p)}$ to give $A^{(2)}\cdots A^{(p)} = Q^{(2)}R^{(2)}\cdots R^{(p)}$. Then set $B^{(1)} \leftarrow A^{(1)}Q^{(2)}$. A Householder matrix $P$ should then be computed which zeros the entries in column one of $B^{(1)}$ below the first sub-diagonal. Then
$$PAP^T = PB^{(1)}R^{(2)}\cdots R^{(p)}P^T.$$
Applying the recursive QR factorisation to the last $p$ factors gives $R^{(2)}\cdots R^{(p)}P^T = \tilde P\tilde R^{(2)}\cdots\tilde R^{(p)}$. If the standard QR factorisation techniques are used, then $P$ and $\tilde P$ have the forms
$$P = \begin{bmatrix}1 & \\ & P_1\end{bmatrix}, \qquad \tilde P = \begin{bmatrix}1 & \\ & \tilde P_1\end{bmatrix},$$
since the $R^{(k)}$ matrices are all upper triangular. Thus we get
$$PAP^T = PB^{(1)}\tilde P\,\tilde R^{(2)}\cdots\tilde R^{(p)}.$$
Because of the zero structure of $\tilde P$ it will not introduce any non-zeros into the first column of $PB^{(1)}$. Repeating this for the remaining columns of $PB^{(1)}\tilde P$ puts it into Hessenberg form. Since the remaining factors in the product are always kept in upper triangular form, this process puts the entire product into Hessenberg form.
4.2. QR algorithm with product Hessenberg form. This is described in [2]. If we have a product
$$A = H^{(1)}R^{(2)}\cdots R^{(p)}$$
with $H^{(1)}$ upper Hessenberg and $R^{(k)}$ upper triangular for $k = 2,\ldots,p$, then a "chasing" technique can be used to reduce the number of floating point operations needed for one step of the QR algorithm. For this we use $2\times 2$ Givens' rotations applied to adjacent rows or columns. Let $G$ be such a Givens' rotation, for rows/columns $i$ and $i+1$. Then
$$G^TAG = G^TH^{(1)}R^{(2)}\cdots R^{(p)}G.$$
The product $R^{(p)}G$ has at most one sub-diagonal non-zero entry, at the $(i+1, i)$ entry. To restore the upper triangularity of this matrix it suffices to premultiply by a Givens' rotation applied to the same rows ($i$ and $i+1$). Continuing in this way we obtain
$$G^TAG = G^TH^{(1)}\tilde G\,\tilde R^{(2)}\cdots\tilde R^{(p)}.$$
If $G$ is chosen to eliminate an entry in the sub-sub-diagonal, then $G^TH^{(1)}\tilde G$ will have a sub-sub-diagonal entry one place further down than was the case with $H^{(1)}$. If $i = n - 1$, then $G^TH^{(1)}\tilde G$ has no sub-sub-diagonal entry and hence is Hessenberg. With this and the implicit Q theorem [8, pp. 367-368], a chasing technique can be developed paralleling the standard one.

4.3. Shifting for products. Shifting is the standard method for accelerating convergence of the QR algorithm. Below is the QR algorithm with shifts, with shift $\mu_k$ computed at each stage, which is hoped to be close to an eigenvalue of $A_k$.

    for $k = 1, 2, \ldots$
        $A_k - \mu_kI = Q_kR_k$ (QR factorisation)
        $A_{k+1} \leftarrow R_kQ_k + \mu_kI$
This $\mu_k$ is usually taken to be the smallest eigenvalue of the bottom right-hand $2\times 2$ submatrix of $A_k$. Of course, for matrices represented as a product it is not reasonable to compute $A_k - \mu_kI$ directly. Instead, it is possible to form the product $(I - \mu_kA_k^{-1})A_k$, which is again a product. But can $I - \mu_kA_k^{-1}$ be formed accurately? One thing can be seen immediately: $\mu_k$ should be close to the smallest eigenvalue of $A_k$, as $|\mu_k|\,\|A_k^{-1}\|$ needs to be bounded.

One way of implementing a shift in product algorithms is to write
$$A - \mu I = H^{(1)}R^{(2)}\cdots R^{(p)} - \mu I = \big(H^{(1)} - \mu(R^{(2)}\cdots R^{(p)})^{-1}\big)R^{(2)}\cdots R^{(p)}.$$
Provided the matrix $\mu(R^{(2)}\cdots R^{(p)})^{-1}$ is modest in size, the matrix $H^{(1)}$ can be modified with high accuracy relative to $\|H^{(1)}\|$. Note that $R^{(2)}\cdots R^{(p)}$ is upper triangular. This means that, provided each $R^{(k)}$ is well-conditioned and for some $\eta > 0$
$$|\mu|\Big/\prod_{k=2}^{p}|r_{ii}^{(k)}| \le 1 - \eta$$
for all $i$ and all iterates, each row of $\mu(R^{(2)}\cdots R^{(p)})^{-1}$ can be computed to high accuracy, relative to the size of that row. Thus $H^{(1)} - \mu(R^{(2)}\cdots R^{(p)})^{-1}$ can be computed (under these conditions) with high absolute accuracy. Since the shift $\mu_k$ is chosen to make $A_k - \mu_kI$ nearly singular, the matrix $H^{(1)}$ is going to become nearly singular. Of course, adding back $\mu_kI$ in the second step of the QR algorithm with shifts will make the resulting leading matrix in the product well-conditioned again. Note that shifting can also be carried out for the product SVD by a related technique, with hyperbolic rotations used to modify the leading factor of the product in the LR algorithm in [19].
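A small numerical check of the shift identity above (a sketch assuming NumPy and SciPy; the triangular factor is applied through back substitution rather than by forming an explicit inverse, and the diagonal offset merely keeps the example well-conditioned):

    import numpy as np
    from scipy.linalg import solve_triangular

    rng = np.random.default_rng(5)
    n, mu = 5, 0.1
    H1 = np.triu(rng.standard_normal((n, n)), -1)            # upper Hessenberg leading factor
    Rs = [np.triu(rng.standard_normal((n, n))) + 3 * np.eye(n) for _ in range(2)]

    T = Rs[0] @ Rs[1]                                         # upper triangular
    shifted_lead = H1 - solve_triangular(T, mu * np.eye(n))   # H1 - mu * T^{-1}

    lhs = shifted_lead @ T                                    # (H1 - mu T^{-1}) R(2)...R(p)
    rhs = H1 @ T - mu * np.eye(n)                             # A - mu I
    print(np.linalg.norm(lhs - rhs))                          # of the order of roundoff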
4.4. Deflation. Deflation for the QR algorithm with Hessenberg matrices is the process which splits the spectrum of a matrix by setting a small, but usually nonzero, sub-diagonal element to zero. For product algorithms this can be done with small multiplicative backward error provided $H^{(1)}$ is well conditioned. This is because deflation can be performed considering only $H^{(1)}$: if we can find a matrix $Z$, with $\|Z - I\|$ small, such that $ZH^{(1)}$ splits, then since $R^{(2)}\cdots R^{(p)}$ is upper triangular, the product $ZH^{(1)}R^{(2)}\cdots R^{(p)}$ splits.
5. Product SVD's: LR, QR and Jacobi

In one sense, computing SVD's is no harder than computing eigenvalues, as the singular values of $A$ are the square roots of the eigenvalues of $A^TA$. Indeed, eigenvalue methods are the basis of most methods for computing singular values. Care must be taken with eigenvalue methods, as forming $A^TA$ squares the condition number; thus this product is used, but only implicitly. Since we are dealing with product form algorithms here, the product need not be explicitly formed, and eigenvalue methods can be used much more freely than is the case for standard algorithms.
5.1. QR for $A^TA$. The simplest approach is to apply the QR algorithm to
$$A^TA = A^{(p)T}\cdots A^{(2)T}A^{(1)T}A^{(1)}A^{(2)}\cdots A^{(p)}.$$
This is, in fact, done in Abarbanel et al. [1]. The implementation is straightforward, and a list of $2p$ matrices needs to be stored. In this form backward error analyses cannot be completely developed, as the perturbations to $A^{(i)T}$ and to $A^{(i)}$ will, in general, not be transposes of each other. Thus we cannot guarantee that the computed singular values are those of a product with perturbations in the factors of size $O(u)$. Nevertheless, it will have much better error properties than the naive algorithm.
5.2. LR algorithm. An alternative to the QR algorithm is to use the symmetric LR algorithm of Rutishauser and Schwartz [17] in product form. The basic symmetric LR algorithm applied to a symmetric positive definite matrix $B_1$ is

    for $k = 1, 2, \ldots$
        $B_k = L_kL_k^T$ (Cholesky factorisation)
        $B_{k+1} \leftarrow L_k^TL_k$
If $B_k = A_k^TA_k$, the Cholesky factorisation step is equivalent to taking the QR factorisation of $A_k = Q_kR_k$. If $A_k$ is itself a product, then this step can be done by the recursive QR factorisation. The second step, $B_{k+1} \leftarrow L_k^TL_k$, is equivalent to $A_{k+1} \leftarrow R_k^T$. Again this can be implemented for products by reversing the product and transposing each factor. This algorithm is described by Stewart [19].

This method is closely related to the ordinary QR algorithm. In fact, for symmetric positive definite matrices, two steps of the symmetric LR algorithm are equivalent to one step of the QR algorithm in exact arithmetic; a fact known to Wilkinson [20, p. 535]. Note that for the product algorithm for computing SVD's, the symmetric LR algorithm requires half the number of QR factorisations per step compared to the QR algorithm, and there is no significant speed difference. Note that this approach only requires a list of $p$ matrices, not $2p$ as for the QR algorithm. Another advantage, though on a theoretical level, is that this approach is easily amenable to a backward error analysis, where the computed singular values are the singular values of a product with factors perturbed by $O(u)$. This is described in [19]. Also, if the $Q_k$ matrices are remembered then the $U$ and $V$ matrices of the SVD $A = U\Sigma V^T$ can be accumulated. (Which matrix is modified depends on whether $k$ is even or odd.)

One useful addition to the techniques available in this case is the ability to perform eigenvalue shifts by means of hyperbolic rotations. This requires the computation of products, but these need only be computed to high accuracy relative to the norm of the product. The problem with this technique is that it apparently destroys the ability to accumulate the $U$ and $V$ matrices of the SVD. (See §4.3.)
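A minimal sketch of one product-form symmetric LR step for singular values (assuming NumPy; function and variable names are this sketch's own, and the accumulation of $U$ and $V$ is omitted): the recursive QR factorisation supplies the Cholesky factor of $A_k^TA_k$ implicitly, and the next iterate is the reversed, transposed list of triangular factors.

    import numpy as np

    def product_lr_step(factors):
        """One symmetric LR step on B = A^T A with A = factors[0] @ ... @ factors[-1],
        performed entirely on the factors: A_{k+1} = R_k^T in factored form."""
        n = factors[0].shape[0]
        Q = np.eye(n)
        Rs = [None] * len(factors)
        for i in range(len(factors) - 1, -1, -1):      # recursive QR of the product
            Q, Rs[i] = np.linalg.qr(factors[i] @ Q)
        return [R.T for R in reversed(Rs)]             # A_{k+1} = (R(1) ... R(p))^T

    rng = np.random.default_rng(6)
    As = [rng.standard_normal((4, 4)) for _ in range(3)]
    target = np.sort(np.linalg.svd(np.linalg.multi_dot(As), compute_uv=False))

    factors = As
    for _ in range(100):
        factors = product_lr_step(factors)

    # The singular values are preserved at every step (B_{k+1} is similar to B_k),
    # and the iterate approaches a form whose diagonal reveals them.
    current_diag = np.sort(np.abs(np.diag(np.linalg.multi_dot(factors))))
    print(target)
    print(current_diag)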
5.3. Jacobi for product SVD's. Another approach to computing SVD's of products is due to Bojanczyk, Ewerbring, Luk and Van Dooren [3], and is based on Jacobi methods. An earlier version of this algorithm for a product of two matrices was developed by Heath, Laub, Paige and Ward [9]. This method is based on the Kogbetliantz methods for finding SVD's [11]. The approach is to first apply the recursive QR factorisation to a product. The upper triangular structure is then used to zero strictly upper triangular entries of the product using rotations. To apply the method we need to be able to compute $2\times 2$ products of a consistent set of $2\times 2$ submatrices of the factors. Consequently the analysis is applied to just the $2\times 2$ case. In [3] the Givens' rotations can be propagated through the product either from the left or the right depending on the values in the product of the $2\times 2$ sub-matrices. This flexibility is used to give smaller backward error for the product decomposition.

It is interesting to note that most of the algorithms given for SVD's of short products, such as the SVD algorithm (for a product of two matrices) of Fernando and Hammarling [7] and the HK-SVD (for products of three matrices) of Ewerbring and Luk [6], are based on Givens' rotations and Jacobi-like methods.

6. Uniqueness of product factorisations

Suppose that $A = A^{(1)}\cdots A^{(p)}$ has an orthogonal product form factorisation $A = Q^{(1)}R^{(1)}\cdots R^{(p)}(Q^{(p+1)})^T$ based on treppeniteration as described above. Then, given $Q^{(1)}$, the remaining triangular factors (and the associated orthogonal matrices) are uniquely specified up to the signs of the rows and columns in real arithmetic.

To prove this we note that in all of the above product form factorisations, $A^{(i)} = Q^{(i)}R^{(i)}(Q^{(i+1)})^T$ for $i = 1, 2, \ldots, p$, for some orthogonal matrices $Q^{(1)}, Q^{(2)}, \ldots, Q^{(p+1)}$. Suppose that there are two product form factorisations
$$A = Q_1^{(1)}R_1^{(1)}R_1^{(2)}\cdots R_1^{(p)}(Q_1^{(p+1)})^T = Q_2^{(1)}R_2^{(1)}R_2^{(2)}\cdots R_2^{(p)}(Q_2^{(p+1)})^T.$$
Then for each $i$,
$$A^{(i)} = Q_1^{(i)}R_1^{(i)}(Q_1^{(i+1)})^T = Q_2^{(i)}R_2^{(i)}(Q_2^{(i+1)})^T.$$
We now inductively construct $D^{(i)}$ of the form $\mathrm{diag}(\pm 1,\ldots,\pm 1)$, for $i = 1, 2, \ldots, p+1$, where $Q_1^{(i)} = Q_2^{(i)}D^{(i)}$. To start the induction we note that $Q_1^{(1)} = Q_2^{(1)}I$, so that $D^{(1)} = I$ has the appropriate form. Suppose that the induction has been carried through to $i = j$; the problem now is to construct $D^{(j+1)}$. Given that
$$Q_1^{(j)}R_1^{(j)}(Q_1^{(j+1)})^T = Q_2^{(j)}R_2^{(j)}(Q_2^{(j+1)})^T,$$
premultiplying by $(Q_2^{(j)})^T$ gives
$$D^{(j)}R_1^{(j)}(Q_1^{(j+1)})^T = R_2^{(j)}(Q_2^{(j+1)})^T.$$
Thus
$$(R_2^{(j)})^{-1}D^{(j)}R_1^{(j)} = \big((Q_1^{(j+1)})^TQ_2^{(j+1)}\big)^T.$$
Since the matrix on the left-hand side is upper triangular, and the matrix on the right-hand side is orthogonal, it must be a diagonal matrix of the form $\mathrm{diag}(\pm 1,\ldots,\pm 1)$. Call it $D^{(j+1)}$. Thus the induction hypothesis is satisfied for $i = j + 1$.

Hence there are matrices $D^{(i)}$ of the form $\mathrm{diag}(\pm 1,\ldots,\pm 1)$ for $i = 1,\ldots,p+1$ where $Q_1^{(i)} = Q_2^{(i)}D^{(i)}$. Since $R_2^{(i)} = (Q_2^{(i)})^TQ_1^{(i)}R_1^{(i)}(Q_1^{(i+1)})^TQ_2^{(i+1)} = D^{(i)}R_1^{(i)}D^{(i+1)}$, it follows that the factors of the factorisation are determined uniquely up to the signs of the rows and the columns.
Note that if we choose the QR factorisations to have the property that the diagonal entries of the $R$ matrix are all non-negative, then provided all of the factor matrices are non-singular, the induction step gives $D^{(1)} = D^{(2)} = \cdots = D^{(p+1)} = I$ and the product factorisation is unique given $Q^{(1)}$.

This result can be used to show that the product form factorisations are as unique (up to sign) as the corresponding standard factorisations. For example, if the singular values are distinct, then given the ordering of singular values, the factorisation $A = U\Sigma V^T$ is unique up to the choice of signs of columns of $U$. Thus the product form of the SVD is unique up to the choice of signs in $D^{(i)}$, $i = 1,\ldots,p$. Furthermore, if the product $A$ is non-singular, and the QR factors are chosen to have $r_{ii} \ge 0$, then the product form SVD is unique up to the choice of ordering and the signs of the columns of $Q^{(1)}$.

7. Product Arnoldi methods

The Arnoldi algorithm is one way to find eigenvalues of large sparse matrices in a semi-iterative way [8, pp. 501-502]. It is a Krylov subspace method that constructs both an orthonormal basis for the Krylov subspace $\mathrm{span}\{x_0, Ax_0, \ldots, A^{m-1}x_0\}$ and the matrix representation of the large matrix with respect to this orthonormal basis. The LSQR algorithm of Paige and Saunders [15] for computing least squares solutions to large, sparse, overdetermined systems of equations can be regarded as a product-type iterative method, since its computations implicitly construct a Krylov subspace of the matrix
$$\tilde A = \begin{bmatrix}O & A^T\\ A & O\end{bmatrix}.$$
This is done in a particularly structured manner by using a starting vector of the form $x_0 = [u_0^T, 0]^T$, so that $\tilde Ax_0 = [0, (Au_0)^T]^T$ and $\tilde A^2x_0 = [(A^TAu_0)^T, 0]^T$, etc. The orthonormal basis vectors $q_i$ are computed by first setting $q_0 = x_0/\|x_0\|_2$, and thereafter
$$h_{i+1,i}\,q_{i+1} = \tilde Aq_i - \sum_{j=0}^{i} h_{ji}q_j,$$
This structure generalises to longer products, although the symmetry of Ae is lost, even for SVD problems. If A = A A :::A p (1)
(2)
( )
14
David E. Stewart
we can construct a collection of n m matrices Umi for i = 1; : : : ; p containing orthonormal columns and Ump contains an orthonormal basis for an m-dimensional Krylov subspace. There are also mm matrices Hmi for i = 1; : : : ; p where A k Umk = Umk? Hmk if k > 1 and A Um = Ump Hm + m ump eTm. Taking the complete product of the A k 's we get AUmp = Ump Hm : : : Hmp + m ump eTm: Thus we get a factored representation of the product with Hmk = Umk? T A k Umk identifying p with zero in the superscripts. These matrices and orthonormal bases can be constructed from the Arnoldi basis generated for the matrix 3 2 O A 7 6 p 7 6A O 7 6 p ? 7 6 A O 7 Ae = 66 7 ... ... 7 6 5 4 A O with a starting vector x = [uT ; 0; : : : ; 0]T . The Arnoldi matrix Hpm generated for this pn pn matrix with this starting vector has the property that hji can be nonzero only where j i + 1 mod p. However, it does not seem to be numerically advisable to compute eigenvalues of Ae directly in terms of the resulting H matrix. (Some evidence for this is described in the last section of Stewart [19].) It appears that the best way to implement an Arnoldi-type method for computing eigenvalues of products of matrices is by applying the above factored-form methods for computing the eigenvalues of Hm Hm : : : Hmp . The Hmk can be extracted from Hpm by taking the submatrix with row numbers (starting from zero) congruent to k + 1 mod p, and column numbers congruent to k mod p. Note that the product Hm Hm : : : Hmp is in product Hessenberg form: Hm is upper Hessenberg, and Hmk is upper triangular for k > 1. As yet, the author has no numerical experience of this approach applied to a suitable large-scale system. ( )
( )
(1)
8. Implications for dynamical systems

The dynamical systems considered here are discrete time systems given by
$$x_{t+1} = g(x_t).$$
If we write $\Phi(x, t) = g^t(x) = g(g(\cdots g(x)\cdots))$, the Jacobian matrix of the map $\Phi(\cdot, t)$ at $x_0$ is the product
$$\nabla_x\Phi(x_0, t) = \nabla g(x_{t-1})\,\nabla_x\Phi(x_0, t-1) = \nabla g(x_{t-1})\nabla g(x_{t-2})\cdots\nabla g(x_0).$$
For computing Lyapunov exponents, we need to compute the singular values of $\nabla_x\Phi(x_0, t)$ for large $t$.
For shadowing problems we need to look at the Newton-type methods for solving $g(x_t) - x_{t+1} = 0$ for all $t$. The Newton methods consider $n(t-1)\times nt$ matrices of the form
$$\begin{bmatrix}\nabla g(x_0) & -I & & & \\ & \nabla g(x_1) & -I & & \\ & & \ddots & \ddots & \\ & & & \nabla g(x_{t-2}) & -I\end{bmatrix}.$$
If we consider transformations that both preserve this structure (keeping the $-I$ matrices) and have the form $\tilde x_t = P_tx_t$, then product algorithms arise naturally. (Indeed products of matrices are just the composition of linear maps; it is this composition structure that these transformations preserve.)

8.1. Lyapunov exponents and Oseledec matrices. In the famous paper by Oseledec [14] it is shown that, under some mild conditions, for almost all $x$ the limit
$$\lim_{t\to\infty}\frac{\log\sigma_i(\nabla_x\Phi(x, t))}{t}$$
exists. This limit is denoted by $\lambda_i(x)$ and is called the $i$th Lyapunov exponent for the trajectory with initial value $x$. Under somewhat stronger conditions, the Lyapunov exponent $\lambda_i(x)$ is independent of $x$, except on a null set. A number of important quantities relating to chaotic dynamical systems can be computed in terms of Lyapunov exponents (such as upper bounds on the Hausdorff-Besicovitch dimension of the limit set of a trajectory, and the topological entropy of the system [16]). Product algorithms for the SVD can be used to estimate Lyapunov exponents.

Furthermore, Oseledec showed in [14] that, under the same mild conditions that ensured that the Lyapunov exponents existed for almost all $x$, the matrix limit
$$\lim_{t\to\infty}\big(\nabla_x\Phi(x, t)^T\nabla_x\Phi(x, t)\big)^{1/2t} = \Lambda$$
also exists for almost all $x$. The limiting matrix is called an Oseledec matrix. The Oseledec matrix can also be estimated by means of product form SVD algorithms. Once we have $\nabla_x\Phi(x, t) = A_t = U_t\Sigma_tV_t^T$, it follows that
$$\big(\nabla_x\Phi(x, t)^T\nabla_x\Phi(x, t)\big)^{1/2t} = \big(V_t\Sigma_t^2V_t^T\big)^{1/2t} = V_t(\Sigma_t^2)^{1/2t}V_t^T = V_t(\Sigma_t)^{1/t}V_t^T.$$
The diagonal entries of $(\Sigma_t)^{1/t}$ are just $\sigma_i(\nabla_x\Phi(x, t))^{1/t}$, and since these are represented as a product of diagonal values in the factor matrices of the product SVD, this is just the geometric mean of the $(i, i)$ entries of the factor matrices. Provided the factor matrices are well conditioned and bounded, the entries of $(\Sigma_t)^{1/t}$ are bounded and bounded away from zero, independently of $t$. Thus it is possible to estimate the Oseledec matrix numerically, in a reliable way, using product SVD algorithms.
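For illustration, here is a minimal sketch of the standard discrete QR estimator for Lyapunov exponents, a close relative of the recursive QR ideas above (assuming NumPy, and using the Hénon map purely as an example system; this is the common QR-based estimator, not the specific product SVD algorithms discussed in this paper):

    import numpy as np

    a, b = 1.4, 0.3

    def henon(x):
        return np.array([1.0 - a * x[0]**2 + x[1], b * x[0]])

    def jacobian(x):
        return np.array([[-2.0 * a * x[0], 1.0],
                         [b,               0.0]])

    x = np.array([0.1, 0.1])
    for _ in range(1000):              # discard transient
        x = henon(x)

    Q = np.eye(2)
    log_r = np.zeros(2)
    T = 100000
    for _ in range(T):
        Q, R = np.linalg.qr(jacobian(x) @ Q)
        log_r += np.log(np.abs(np.diag(R)))
        x = henon(x)

    print(log_r / T)    # roughly [0.42, -1.62]; their sum is log(b) = log(0.3)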
Perturbation theorems which deal with inner perturbations can be proven for Lyapunov exponents, and these are given in Dieci, Russell and Van Vleck [5]. The result they prove is essentially that if the scaled linearised system $y_{t+1} = e^{-\lambda}\nabla g(x_t)y_t$ has an exponential dichotomy [4] for every $\lambda$ that is not a Lyapunov exponent, then the Lyapunov exponents are continuous with respect to perturbations in the $\nabla g(x_t)$ matrices. This is proven in [5] for the case of autonomous ODE's, although it carries over to the discrete time case. It is worth noting that the stability failure with respect to inner perturbations described in §1.1 can be considered to be due to a failure to have an exponential dichotomy.