Conditioning in the Application of Σ-orthogonal Transformations

Michael Stewart†        Paul Van Dooren‡

June 14, 1996

Abstract

This work attempts to give a unified treatment of sensitivity issues in problems involving the application of Σ-orthogonal transformations. Algorithms for such problems are sometimes implemented with minor variations in the way elementary hyperbolic transformations are used to construct a Σ-orthogonal transformation. This results in a possible variation in the condition of the elementary transformations which can obscure the condition of the underlying problem. To characterize the possible elementary transformations which can be used to solve a given problem and to clarify important conditioning issues, we introduce a canonical decomposition of a partitioned Σ-orthogonal matrix which is analogous to the CS decomposition of a partitioned orthogonal matrix. We then proceed to prove optimality properties of the hyperbolic transformations given by the decomposition and show how these properties relate to the sensitivity of Cholesky downdating and block Toeplitz factorization problems.

1 Introduction

In block implementations of the generalized Schur algorithm it is typically necessary to compute

    H = [ H11  H12 ]
        [ H21  H22 ]

such that

    H^T [ A ]   [ Â ]
        [ B ] = [ 0 ]                                    (1)

and

    H^T Σ H = Σ                                          (2)

with

    Σ = [ I   0 ]
        [ 0  -I ].

Any transformation, H, satisfying (2) is referred to as Σ-orthogonal. Whenever A^T A - B^T B is positive definite, the existence of a Σ-orthogonal transformation satisfying (1) is guaranteed.

* This work was supported by ARPA (grant 60NANB2D1272) and NSF (grant CCR-9209349).
† Coordinated Science Laboratory, University of Illinois, Urbana, Illinois 61801 ([email protected]).
‡ Dept. Mathematical Engineering, Universite Catholique de Louvain, Louvain-la-Neuve, Belgium ([email protected]).

The positivity constraint is naturally satisfied by the A and B which arise when applying the generalized Schur algorithm to a positive definite structured matrix. While the transformation is unique up to an inconsequential orthogonal transformation, there is no single approach to computing it. The most common algorithm applies and computes H in a form which is factored into elementary plane rotations and hyperbolic transformations. Any orthogonal matrix of the form H = U_A ⊕ U_B obviously satisfies (2). Further, if H1 and H2 are separately Σ-orthogonal and separately satisfy (2), then it is easy to see that H2 H1 will also be Σ-orthogonal. The product of block diagonal orthogonal transformations will always be Σ-orthogonal. Although in the block diagonal case it is a trivial observation, there is a more important practical use for the general observation about products of Σ-orthogonal transformations. In particular, it applies to transformations which act on both A and B. An obvious choice for such a transformation, satisfying (2), which acts on a row of A together with a row of B is the elementary hyperbolic transformation

    [ I                                          ]
    [    1/sqrt(1-ρ²)          ρ/sqrt(1-ρ²)      ]
    [                  I                         ]
    [    ρ/sqrt(1-ρ²)          1/sqrt(1-ρ²)      ]
    [                                         I  ].

This transformation is Σ-orthogonal whenever it acts on rows from both A and B. Thus, there are two fundamental transformations from which to construct Σ-orthogonal transformations: plane rotations which act on two rows of A or two rows of B, and hyperbolic transformations which act on a row of A and a row of B. If we include transformations which multiply a row of A or B by -1, then it can be shown that any Σ-orthogonal transformation can be represented as a product of these three types of elementary transformations.

A common approach for producing an H satisfying (1) follows a simple triangularization procedure. Suppose orthogonal U_A and U_B are computed so that

    [ U_A^T    0   ] [ A ]   [ T_A ]
    [  0    U_B^T  ] [ B ] = [ T_B ]

where T_A and T_B are upper triangular. The procedure for introducing these zero elements in A and B through the use of plane rotations is well known. It is also well known that zeros can be introduced with hyperbolic transformations,

    (1/sqrt(1-ρ²)) [ 1  ρ ] [ a ]   [ sqrt(a²-b²) ]
                   [ ρ  1 ] [ b ] = [      0      ],

for ρ = -b/a. If T_A(:,1) = a e1 and T_B(:,1) = b e1, then this transformation can be applied to give T̂_A(:,1) = sqrt(a²-b²) e1 and T̂_B(:,1) = 0. If A and B are both 4 × 4, then this looks like

    [ A ]   [ x x x x ]
    [   ]   [ 0 x x x ]
    [   ]   [ 0 0 x x ]
    [   ] = [ 0 0 0 x ]
    [ B ]   [ 0 x x x ]
    [   ]   [ 0 x x x ]
    [   ]   [ 0 0 x x ]
    [   ]   [ 0 0 0 x ].

Further plane rotations can be used to transform B to the form

    B = [ 0 x x x ]
        [ 0 0 x x ]
        [ 0 0 0 x ]
        [ 0 0 0 0 ].

This introduces a zero row in B, and the first non-zero element of the first row of B can then be eliminated by applying a hyperbolic transformation with the second row of A. If further plane rotations are applied, then an additional zero row will be introduced in B, and the process can continue until B = 0. Thus, if A and B are n × n, we have

    H = U1 H1 U2 H2 ... Un Hn                            (3)

satisfying (1). This is not the only way such an H could have been computed. The most obvious modification avoids completing the triangularization of A and B before computing hyperbolic transformations; while performing the kth hyperbolic transformation we are only directly concerned with preserving the structure of columns 1 through k, and there is no reason to be concerned with the structure of other columns. However, the method described here and its obvious alternatives have two properties which are characteristic of all practical methods for computing H. First, although each U_i may represent multiple plane rotations, each H_i is a single hyperbolic transformation represented by a single ρ_i, and thus the hyperbolic part of the transformation is represented by exactly n reflection coefficients. Second, the H_i and the U_i do not have any effect on Â(j,:) for j < i. After hyperbolic transformation i, the ith row of Â is determined. These two assumptions will be useful in characterizing the ρ_i which can occur in factorizations, of the form (3), of all Σ-orthogonal H which satisfy (1). The first step to this characterization involves the introduction of a canonical decomposition of H.
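Although the paper's exposition is purely mathematical, the elementary 2 × 2 hyperbolic step used above is easy to check numerically. The sketch below is our illustration, not the authors' code; the function name is invented. It zeros b against a with the reflection coefficient ρ = -b/a and verifies the Σ-orthogonality relation (2):

```python
import numpy as np

def hyperbolic_zero(a, b):
    """2x2 hyperbolic transformation that zeros b against a.

    Requires |a| > |b|, which holds when A^T A - B^T B is positive
    definite.  rho = -b/a is the associated reflection coefficient.
    """
    rho = -b / a
    c = 1.0 / np.sqrt(1.0 - rho**2)
    H = c * np.array([[1.0, rho], [rho, 1.0]])
    return H, rho

a, b = 5.0, 3.0
H, rho = hyperbolic_zero(a, b)
v = H @ np.array([a, b])                 # [sqrt(a^2 - b^2), 0]

Sigma = np.diag([1.0, -1.0])
assert np.allclose(H.T @ Sigma @ H, Sigma)   # Sigma-orthogonality (2)
assert np.isclose(v[0], np.sqrt(a**2 - b**2)) and np.isclose(v[1], 0.0)
```

Note that, unlike a plane rotation, the norm of H grows without bound as |ρ| approaches 1; this is exactly the conditioning issue the paper studies.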

2 Decomposition of a Σ-Orthogonal Matrix

The main observation of this section is the following theorem, which gives a generalization of the CS decomposition of a partitioned orthogonal matrix, [6], to a partitioned Σ-orthogonal matrix. A more general form of the decomposition, in which A and B are not constrained to have the same number of rows, appears in [4]. For the purposes of this paper, this higher degree of generality is not needed.

Theorem 1 Every Σ-orthogonal H can be decomposed uniquely (except for sign changes and permutations in the orthogonal matrices and for equivalent orthogonal transformations operating on parts of the U and V matrices corresponding to equal values of P = Σ_B Σ_A^{-1}) as

    H = [ U_A   0  ] [ Σ_A  Σ_B ] [ (Σ_A² - Σ_B²)^{-1/2}            0           ] [ V_A^T    0   ]
        [  0   U_B ] [ Σ_B  Σ_A ] [          0            (Σ_A² - Σ_B²)^{-1/2}  ] [   0    V_B^T ]   (4)

where U_A, U_B, V_A and V_B are orthogonal, Σ_A and Σ_B are diagonal with elements bounded in magnitude by one, and the diagonal elements of Σ_A are strictly greater in magnitude than those of Σ_B.

Proof: Let the singular value decomposition of H11 be

    H11 = U_A D_A V_A^T.

Equation (2) implies that

    D_A² - V_A^T H21^T H21 V_A = I

so that V_A^T H21^T H21 V_A must be diagonal, and the singular value decomposition of H21 must have the form

    H21 = U_B (D_A² - I)^{1/2} V_A^T = U_B D_B V_A^T.

Similarly, since H^T will also be Σ-orthogonal,

    D_A² - U_A^T H12 H12^T U_A = I,

and the singular value decomposition of H12 will have the form

    H12 = U_A D_B V_B^T.

In a similar manner, the two relations

    D_B² - V_B^T H22^T H22 V_B = -I

and

    D_B² - U_B^T H22 H22^T U_B = -I

imply that the singular value decomposition of H22 is

    H22 = U_B (I + D_B²)^{1/2} V_B^T = U_B D_A V_B^T.

Thus

    H = [ U_A   0  ] [ D_A  D_B ] [ V_A^T    0   ]
        [  0   U_B ] [ D_B  D_A ] [   0    V_B^T ].

The representation of D_A and D_B in terms of Σ_A and Σ_B with the stated properties follows from consideration of 2 × 2 hyperbolic transformations and the fact that D_A² - I = D_B². The uniqueness claim follows from the uniqueness of the singular value decomposition.

Although the orthogonal analog is well known, it is worth dealing with a point which may be less familiar in the Σ-orthogonal context. In particular, we wish to understand the extent to which Σ_A, Σ_B and (1) determine H. Clearly an orthogonal transformation of the form Û^T ⊕ V̂^T may be applied to the left of H^T without destroying the constraints in (1). The inverse of a Σ-orthogonal matrix, H^T, can easily be verified to be Σ H Σ. From this we find

    [ A ]   [  H11 ]
    [ B ] = [ -H21 ] Û Â,

or

    [  H11 ]   [ A ]
    [ -H21 ] = [ B ] Â^{-1} Û^T.

Thus the only freedom in choosing H11 and H21 is in the transformation Û. It is also not difficult to verify that H12 and H22 are completely determined except for V̂.
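Theorem 1 can be illustrated numerically. The sketch below is ours, not the paper's; it assumes the standard hyperbolic parametrization D_A = cosh Θ, D_B = sinh Θ for the middle factor, which satisfies D_A² - I = D_B². It builds H in the factored form, confirms Σ-orthogonality, and recovers the diagonal of D_A from the singular values of H11:

```python
import numpy as np

rng = np.random.default_rng(0)

def rand_orth(n):
    # random orthogonal factor from a QR decomposition
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

n = 3
theta = rng.uniform(0.1, 1.0, n)
DA = np.diag(np.cosh(theta))          # D_A, diagonal, elements >= 1
DB = np.diag(np.sinh(theta))          # D_B, with D_A^2 - I = D_B^2
UA, UB, VA, VB = (rand_orth(n) for _ in range(4))
Z = np.zeros((n, n))

H = (np.block([[UA, Z], [Z, UB]])
     @ np.block([[DA, DB], [DB, DA]])
     @ np.block([[VA.T, Z], [Z, VB.T]]))

Sigma = np.diag([1.0] * n + [-1.0] * n)
assert np.allclose(H.T @ Sigma @ H, Sigma)      # H is Sigma-orthogonal

# the singular values of H11 recover the diagonal of D_A
s = np.linalg.svd(H[:n, :n], compute_uv=False)
assert np.allclose(np.sort(s), np.sort(np.cosh(theta)))
```

The reflection coefficients of the decomposition are then tanh(θ_k), the diagonal of D_B D_A^{-1}.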

3 The Orthogonal Case

Although the difference between the orthogonal and Σ-orthogonal cases is not great, the existence of a canonical decomposition is more widely recognized for orthogonal matrices. A goal of this paper is to develop an optimality result for the reflection coefficients associated with the canonical decomposition of a Σ-orthogonal matrix. However, for the purpose of clarity, it will be natural to develop the analogous results for the orthogonal case. The fundamental decomposition is a special case of the CS decomposition, and it has its roots in [2] and in [5]. Suppose we wish to compute an orthogonal Q such that

    Q^T [ A ]   [ Â ]
        [ B ] = [ 0 ]                                    (5)

but that we wish to compute Q in such a manner that we use only n plane rotations acting between A and B. We assume that A^T A + B^T B has full rank. The situation is analogous to the factorization (3), but with each H replaced by a plane rotation G,

    Q = U1 G1 U2 G2 ... Un Gn.                           (6)

As before, the orthogonal U_i act on A and B separately, while the G_i represent a single rotation between a row of A and a row of B. As with the Σ-orthogonal case, it can be shown that the constraint in (5), for A and B for which A^T A + B^T B has full rank, determines Q uniquely with the exception of a possible block orthogonal transformation

    Q̂ = Q [ V^T   0  ]
           [  0   V̂^T ].

The transformation Q^T maps the subspace

    range [ A ]
          [ B ]                                          (7)

to the subspace

    range [ I ]
          [ 0 ].                                         (8)

The notion of canonical angles between subspaces is well known, [2]. By introducing a set of canonical angles between these two subspaces, we gain some understanding of the representation of Q in terms of the angles associated with the G_i. The fundamental theorem is the following special case of the CS decomposition, [6].

Theorem 2 A partitioned orthogonal matrix Q can be decomposed as

    Q = [ U_A   0  ] [ C  -S ] [ V_A^T    0   ]
        [  0   U_B ] [ S   C ] [   0    V_B^T ]

for orthogonal U_A, U_B, V_A and V_B, and where C and S are real and diagonal, satisfying C² + S² = I.

If (5) is satisfied, then the angles for which the diagonal elements of C are the cosines are known as the canonical angles between the subspaces given by (7) and (8). The decomposition given in Theorem 2 is a special case of a decomposition of the form (6) in which U2 = U3 = ... = Un = I. In this special case, the G_i all commute and can be ordered so that G_i acts on the ith row of A and the ith row of B. The angles associated with the CS decomposition have an optimality property among a specific class of decompositions of Q into the form (6). The property is similar to the following theorem, taken from [2].

Theorem 3 If W^T is an orthogonal transformation mapping the subspace (7) to the subspace (8), and if the s_i are the sines of the canonical angles associated with the two subspaces, and if the w_i are an orthonormal basis for the subspace (7), then

    sum_{k=1}^n sin² ∠(w_k, W^T w_k) >= sum_{k=1}^n s_k².

The transformation, W, which achieves optimality is the direct rotation,

    W = [ U_A   0  ] [ C  -S ] [ U_A^T    0   ]
        [  0   U_B ] [ S   C ] [   0    U_B^T ]

and the vectors w_i are given by

    w_i = [ U_A   0  ] [ c_i e_i ]
          [  0   U_B ] [ s_i e_i ].

The result we would like to prove is that an inequality similar to that of Theorem 3 holds for the angles associated with the factorization (6). Unfortunately, without further constraints on how the factorization is computed, this will not generally be true. The missing condition is that the rows of Â in (5) be computed sequentially as described in Section 1. As already noted, this constraint is satisfied by the triangularization procedure. Further, there is no real loss of generality in the assumption about the order in which the rows of Â are computed; all that is required is that a new row of Â be established with each G_i and that later transformations do not act on this row. The theorem is as follows.

Theorem 4 Assume that A^T A + B^T B has full rank. If an orthogonal Q, satisfying (5) and factored as (6), is computed using a scheme which computes the rows of Â sequentially as described in Section 1, but with the hyperbolic transformations replaced by plane rotations, then the angles associated with the plane rotations will have sines, ŝ_k, which satisfy

    sum_{k=1}^n ŝ_k² >= sum_{k=1}^n s_k².

The values s_k are sines associated with the canonical angles between the subspaces given in (7) and (8).

Proof: The proof is inductive. Assume without loss of generality that each G_i acts on row i of A and row 1 of B. The uniqueness up to block diagonal orthogonal transformations imposed by the full rank condition, together with the CS decomposition and the fact that the Frobenius norm is unitarily invariant, imply that for any Q satisfying (5),

    ||Q21||²_F = sum_{j=1}^n s_j².

In the interest of finding a basis which clearly illustrates the actions of Q, we can choose invertible X so that

    Q^T [ A ]       [ I ]
        [ B ] X  =  [ 0 ].

In fact, for the rest of the proof, we will assume that the transformation X has been applied to A and B so that

    [ A ]   [ Q11 ]
    [ B ] = [ Q21 ]

and we can assume that ||B||²_F = sum_j s_j². We will use Q_{i1} to refer to the initial matrices, while using A and B more loosely to describe various stages after transformations have been applied. To allow the application of induction, we will prove the more general assertion that if Q is (m+n) × (m+n),

    Q = U1 G1 U2 G2 ... Un Gn,

and Q11 is n × n, then

    sum_{j=1}^n ŝ_j² >= ||Q21||²_F.

The induction is on n. The case n = 1 is obvious. The assumption that G_j and U_j do not act on row i of A for j > i implies that after the application of U1^T and G1^T the first column of A will be e1 and the first column of B will be zero. Similarly, after the application of U1^T only, the first columns of A and B will be multiples of e1. Thus, the cosine and sine associated with G1 will be computed to satisfy

    [ ĉ1  -ŝ1 ] [ ||A(:,1)|| ]   [ 1 ]
    [ ŝ1   ĉ1 ] [ ||B(:,1)|| ] = [ 0 ].

This gives

    ŝ1² = ||B(:,1)||² = ||Q21(:,1)||².

Since orthogonality guarantees that G1^T introduces zeros into the first row of A(:,2:n), the application of G1^T will only increase ||B(:,2:n)||_F. Since later transformations will not act on the first row of A, the induction hypothesis guarantees that the sum of the squares of the later sines will be greater than this new, increased ||B(:,2:n)||²_F, and consequently

    sum_{j=1}^n ŝ_j² >= ||Q21||²_F.

This completes the proof.

It is fairly simple to check, even by constructing random matrices from factorizations of the form (3), that this result depends on the sequential computation of the rows of Â in an essential way. The significance of the theorem is that it gives a lower bound on the amount of "action" required by Givens rotations acting between A and B to zero B through a triangularization procedure, with or without any transformation from the right.

To put things in perspective, it is worth drawing a comparison with Theorem 3. Despite similarities, the sines which the two results bound in terms of the canonical sines are distinct. Suppose that A and B are such that we are applying the triangularization procedure described in Section 1 to a matrix with orthonormal columns. Theorem 3 and Theorem 4 can both be applied without difficulty, but they provide bounds on different quantities. In the former case, it is a bound on the sines associated with angles between the columns of the original orthogonal matrix and its transformed version. In the latter it is a bound on the sines computed by the triangularization procedure. In effect, we have shifted a theorem which describes the action of Q on a particular basis for a subspace to a theorem which describes the representation of Q.

Several other points are worth noting about this modified optimality result. First, we have lost the concern with the symmetry of W which is implicit in Theorem 3. Applying a right block diagonal orthogonal transformation to the direct rotation, W^T, will destroy its optimality by the criteria of Theorem 3. However, Theorem 4 applies to representations of non-symmetric matrices. Under these circumstances, the sines associated with triangularization are typically smaller than those to which Theorem 3 applies. This is not always the case, but the bounds in Theorem 4 are typically tighter. The way this was observed was by constructing Q from the QR decomposition of random matrices and then applying a triangularization procedure to

    [ Q11 ]
    [ Q21 ]

to construct a factorization of a different orthogonal matrix (but with the first n columns having the same range). The resulting orthogonal matrix is not constructed with any attempt to obtain symmetry. It is not surprising that, even when the sines computed by triangularization are not far from those of the direct rotation, the resulting orthogonal transformation fares poorly by the standard of Theorem 3. What is surprising is that the sines computed by triangularization of an orthogonal matrix are frequently quite close to those of the direct rotation.
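Theorem 4 can be checked experimentally. The sketch below is our code, not the paper's; the particular sequential elimination order (re-triangularize B with rotations acting on B alone, then one Givens rotation between row k of A and the first row of B) is one admissible scheme, chosen for illustration:

```python
import numpy as np

def canonical_sines(A, B):
    # sines of the canonical angles between range([A; B]) and range([I; 0])
    L = np.linalg.cholesky(A.T @ A + B.T @ B)     # A^T A + B^T B = L L^T
    W = B @ np.linalg.inv(L.T)                    # lower block of an orthonormal basis
    return np.linalg.svd(W, compute_uv=False)

def triangularization_sines(A, B):
    # sequential scheme: rotations within B expose a single leading entry,
    # then one rotation between row k of A and row 0 of B fixes row k of A-hat
    A = np.linalg.qr(A)[1].copy()                 # T_A upper triangular
    B = B.copy()
    n = A.shape[1]
    sines = []
    for k in range(n):
        R = np.linalg.qr(B[:, k:])[1]             # orthogonal U_k acting on B only
        B = np.zeros_like(B)
        B[:R.shape[0], k:] = R
        a, b = A[k, k], B[0, k]
        r = np.hypot(a, b)
        c, s = a / r, b / r
        sines.append(s)
        rowA, rowB = A[k].copy(), B[0].copy()
        A[k] = c * rowA + s * rowB                # new row k of A-hat, never touched again
        B[0] = -s * rowA + c * rowB               # zeros B[0, k]
    return np.array(sines)

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
s_can = canonical_sines(A, B)
s_tri = triangularization_sines(A, B)
# Theorem 4: the computed sines dominate the canonical sines in sum of squares
assert np.sum(s_tri**2) >= np.sum(s_can**2) - 1e-10
```

Only the squared sines are compared, since the sign conventions of the rotations are not fixed by the scheme.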

4 The Σ-Orthogonal Case

The Σ-orthogonal case is somewhat less intuitive, if only because of a lesser degree of familiarity. However, the proof of an optimality result for reflection coefficients associated with the canonical decomposition is directly analogous to the proof of Theorem 4. The result is the following theorem.

Theorem 5 Assume that A and B are such that A^T A - B^T B has full rank. If a Σ-orthogonal H, satisfying (1) and factored as (3), is computed using a scheme which computes the rows of Â sequentially as described in Section 1, then the reflection coefficients, ρ̂_k, associated with the hyperbolic transformations, H_k, will satisfy

    sum_{k=1}^n ρ̂_k² >= sum_{k=1}^n ρ_k².

The values ρ_k are reflection coefficients associated with the canonical decomposition of H, and are equal to the diagonal elements of Σ_B Σ_A^{-1}.

Proof: The proof is inductive and is very similar to the proof of Theorem 4. Assume without loss of generality that each H_i acts on row i of A and row 1 of B. The uniqueness up to block orthogonal transformations imposed by the full rank condition, together with the canonical decomposition and the fact that the Frobenius norm is unitarily invariant, imply that for any H satisfying (1),

    ||H21 H11^{-1}||²_F = sum_{j=1}^n ρ_j².

In the interest of finding a basis which clearly illustrates the actions of H, we can choose invertible X so that

    H^T [ A ]       [ I ]
        [ B ] X  =  [ 0 ].

As before, for the rest of the proof, we will assume that the transformation X has been applied to A and B so that

    [ A ]   [ H11 ]
    [ B ] = [ H21 ].

Thus we can assume that initially ||BA^{-1}||²_F = sum_j ρ_j². We will use H_{i1} to refer to the initial matrices, while using A and B more loosely to describe stages after transformations have been applied to A and B. Again, to allow the application of induction, we will prove the more general assertion that if H is (m+n) × (m+n),

    H = U1 H1 U2 H2 ... Un Hn,

and H11 is n × n, then

    sum_{j=1}^n ρ̂_j² >= ||H21 H11^{-1}||²_F

still holds. The induction is on n. The case n = 1 is obvious. The assumption that H_j and U_j do not act on row i of A for j > i implies that after the application of U1^T and H1^T the first column of A will be e1 and the first column of B will be zero. Similarly, after the application of U1^T only, the first columns of A and B will be multiples of e1. Assume that after U1^T has been applied, we have

    [ A ]   [ a11  a12^T ]
    [   ]   [  0    A22  ]
    [ B ] = [ b11  b12^T ]
            [  0    B22  ].

The application of U1^T does not change ||BA^{-1}||²_F, and after U1^T has been applied

    BA^{-1} = [ b11/a11   (b12^T - (b11/a11) a12^T) A22^{-1} ]   [ -ρ̂1   (b12^T + ρ̂1 a12^T) A22^{-1} ]
              [    0               B22 A22^{-1}              ] = [   0           B22 A22^{-1}         ].

After H1^T is applied,

    [ A ]   [ sqrt(a11² - b11²)   (a12^T + ρ̂1 b12^T)/sqrt(1 - ρ̂1²) ]
    [   ]   [         0                          A22                ]
    [ B ] = [         0           (b12^T + ρ̂1 a12^T)/sqrt(1 - ρ̂1²) ]
            [         0                          B22                ].

Clearly,

    [ (b12^T + ρ̂1 a12^T)/sqrt(1 - ρ̂1²) ]   [ 1/sqrt(1 - ρ̂1²)   0 ] [ b12^T + ρ̂1 a12^T ]
    [               B22                 ] = [        0           I ] [        B22        ].

Consequently,

    ρ̂1² + || [ (b12^T + ρ̂1 a12^T)/sqrt(1 - ρ̂1²) ]  A22^{-1} ||²   >=  || BA^{-1} ||²
              [               B22                 ]           F                      F

and the theorem follows from the induction hypothesis upon noting that we are left with the smaller problem of computing reflection coefficients from A22 and

    [ (b12^T + ρ̂1 a12^T)/sqrt(1 - ρ̂1²) ]
    [               B22                 ].

This completes the induction step and the proof.

It is easy to prove other optimality properties of the reflection coefficients associated with the canonical form. Of particular interest is the following result.

Theorem 6 Along with the assumptions in the statement of Theorem 5, if we also assume that

    |ρ1| <= |ρ2| <= ... <= |ρn|,

then

    |ρ1| <= |ρ̂_k| <= |ρn|

for k = 1, 2, ..., n.

Proof: Without loss of generality, we assume A and B have been multiplied from the right by X as in the proof of Theorem 5 and that they have been multiplied from the left by the appropriate orthogonal transformations for the first stage of triangularization. Clearly |ρ1| = σ_min(BA^{-1}) and |ρn| = σ_max(BA^{-1}). Since

    BA^{-1} = [ b11/a11   (b12^T - (b11/a11) a12^T) A22^{-1} ]   [ -ρ̂1   (b12^T + ρ̂1 a12^T) A22^{-1} ]
              [    0               B22 A22^{-1}              ] = [   0           B22 A22^{-1}         ]

we see that -ρ̂1 is an eigenvalue of BA^{-1}. Consequently |ρ̂1| <= |ρn|. This is the starting point for an inductive proof. If Â and B̂ are the matrices after the hyperbolic transformation has been applied, then A^T A - B^T B = Â^T Â - B̂^T B̂ = I. From this we see that

    (BA^{-1})^T (BA^{-1}) = I - A^{-T} A^{-1}

and

    (B̂Â^{-1})^T (B̂Â^{-1}) = I - Â^{-T} Â^{-1}.

Since the smallest eigenvalue of Â^{-T} Â^{-1} is 1/||Â||², and since a similar equality holds for A, it is easily verified that the smallest eigenvalue of Â^{-T} Â^{-1} is larger than the smallest eigenvalue of A^{-T} A^{-1}. This implies ||B̂Â^{-1}|| < ||BA^{-1}||, and consequently, by induction, |ρ̂_k| <= |ρn|. The other side of the inequality is similarly easy to prove. In the same manner as before, we can conclude that |ρ̂1| >= |ρ1|. But

    B̂Â^{-1} = [ 1/sqrt(1 - ρ̂1²)   0 ] [ 0   BA^{-1}(:, 2:n) ]
               [        0           I ]

which implies that the smallest nonzero singular value of B̂Â^{-1} is at least σ_min(BA^{-1}). Since we now effectively have a problem of size (n-1) × (n-1), this is the inductive step which proves that |ρ̂_k| >= |ρ1|.

As suggested earlier, the significance of these results is broader than just providing an inequality which says that the canonical reflection coefficients are better in a sum-of-squares sense than those of the obvious triangularization procedure. The theorems in this section give a characterization of the range of possible reflection coefficients for schemes for computing H which are more general than triangularization. Since, for an appropriate transformation from the right of A and B, the triangularization procedure will compute the canonical reflection coefficients, we effectively have bounds on the range of reflection coefficients which are possible when triangularizing after transforming from the right. As with the orthogonal case, the sequential computation of the rows of Â is essential for the result to hold. The fact that we have decoupled the n reflection coefficients with orthogonal transformations shows that the canonical reflection coefficients reveal the inherent condition of this problem.
Theorem 6 is reassuring in that it implies that it is not possible to introduce artificially ill-conditioned transformations through a poor choice of a method for computing H. However, it does not really go far enough: in most schemes for computing H, the hyperbolic transformations will not be decoupled as they are in the decomposition of H. We will discuss this further in Section 5 and in Section 6. Although the optimality is interesting and it does clarify an important property of the canonical reflection coefficients, we will shift emphasis for the rest of the paper. We will argue that the worst canonical reflection coefficient is an appropriate measure of ill-conditioning for several important problems.
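The hyperbolic results can be tested the same way as the orthogonal ones. In the sketch below (ours, not the paper's; it uses the same assumed sequential elimination order as in the orthogonal experiment) the canonical reflection coefficients are computed as the singular values of BA^{-1}, which are invariant under the normalizing transformation X used in the proofs, and the inequalities of Theorems 5 and 6 are verified on random data:

```python
import numpy as np

def canonical_rhos(A, B):
    # canonical reflection coefficients: singular values of B A^{-1}
    return np.linalg.svd(B @ np.linalg.inv(A), compute_uv=False)

def triangularization_rhos(A, B):
    # sequential scheme of Section 1: rotations within B, then one
    # hyperbolic transformation between row k of A and row 0 of B
    A = np.linalg.qr(A)[1].copy()
    B = B.copy()
    n = A.shape[1]
    rhos = []
    for k in range(n):
        R = np.linalg.qr(B[:, k:])[1]
        B = np.zeros_like(B)
        B[:R.shape[0], k:] = R
        rho = -B[0, k] / A[k, k]
        rhos.append(abs(rho))
        c = 1.0 / np.sqrt(1.0 - rho**2)
        rowA, rowB = A[k].copy(), B[0].copy()
        A[k] = c * (rowA + rho * rowB)            # row k of A-hat, fixed hereafter
        B[0] = c * (rowB + rho * rowA)            # zeros B[0, k]
    return np.array(rhos)

rng = np.random.default_rng(2)
A = 3.0 * np.linalg.qr(rng.standard_normal((4, 4)))[0]   # well-conditioned A
B = 0.5 * rng.standard_normal((4, 4))                    # A^T A - B^T B pos. def.
can = canonical_rhos(A, B)
tri = triangularization_rhos(A, B)
# Theorem 5: the computed coefficients dominate the canonical ones in sum of squares
assert np.sum(tri**2) >= np.sum(can**2) - 1e-10
# Theorem 6: each computed coefficient lies between the extreme canonical ones
assert np.all(tri >= can.min() - 1e-10) and np.all(tri <= can.max() + 1e-10)
```

The scaling of A and B above is only there to guarantee positive definiteness of A^T A - B^T B for the random trial.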

5 An Alternate Perspective on Cholesky Downdating

The decomposition given in (4) offers an interesting perspective on Cholesky downdating and provides a natural variant of the Linpack algorithm. Given a triangular matrix, C, and a vector, x, such that C^T C - x x^T is positive definite, we seek Ĉ such that

    Ĉ^T Ĉ = C^T C - x x^T.

This is the traditional Cholesky downdating problem. Suppose we have computed the decomposition of H in (4) for

    H^T [ A ]         [      C       ]   [    Ĉ    ]
        [ B ]  =  H^T [     x^T      ] = [ 0_{n×n} ].
                      [ 0_{(n-1)×n}  ]

The introduction of zeros in A is always possible by positivity of C^T C - x x^T, while the triangularity of Ĉ is possible through the freedom in choosing V_A. The canonical decomposition of H shows that U_A^T C must have a row which is a scalar multiple of x^T. Thus, the downdating problem becomes a matter of multiplying C by U_A^T, scaling a row, and transforming back to triangular form. In practice, we wish to avoid deviating too far from a triangular structure when applying U_A^T. The fact that B is just a single row makes this very easy to do. Suppose u^T C = x^T and U_A^T u = ||u|| e1. Then

    e1^T U_A^T C = u^T C / ||u|| = x^T / ||u||.

Thus, scaling the first row of U_A^T C by sqrt(1 - ||u||²) and transforming back to a triangular Ĉ will accomplish the downdate. If we choose U_A as a sequence of plane rotations, U_A^T = Q_{n-1}^T ... Q_1^T, with Q_i^T acting in the (n-i, n-i+1) plane, then the structure of U_A^T C will be upper Hessenberg. The triangular structure can be restored with another n-1 rotations, resulting in an O(n²) complexity algorithm. Technically, this is an algorithm which is distinct from both the Linpack algorithm and from the Chambers algorithm. However, it is so closely related to the Linpack algorithm that it might be considered to be a simple-minded and inefficient implementation of the same basic idea. By adding a row of zeros to C,

    A = [ 0^T ]
        [  C  ]

and adding an additional element to u,

    [ sqrt(1 - ||u||²) ]
    [        u         ],

the Linpack algorithm computes an orthogonal transformation, U_A, such that U_A^T A is upper Hessenberg and the scaling on the first row of U_A^T A which completes the downdate is zero. With a first row of zero, there is no need to compute additional transformations to restore the triangular structure. The Linpack algorithm can be thought of as a clever way to compute an (n+1) × (n+1) U_A in (4) which makes the downdate trivial. It is not the only way to do this, but of the two alternatives discussed here, it is the more efficient. Not surprisingly, the stability properties of the canonical form downdating algorithm are the same as those of the Linpack method. Here we will present an outline of the analysis without the details necessary to establish tight bounds. The first stage of the algorithm is the back substitution. The computed u will satisfy

    u^T (C + F) = x^T

where F is upper triangular and satisfies ||F|| <= εn||C|| + O(ε²), where ε is the machine precision. We will freely neglect terms of O(ε²) or higher. Let U_A = U_1 U_2 ... U_{n-1}. Standard results on the application of plane rotations, [7], ensure that

    fl( U_{n-1}^T U_{n-2}^T ... U_1^T [ u  C ] ) = Û_{n-1}^T Û_{n-2}^T ... Û_1^T [ u + g   C + G ]        (9)

for rotations Û_1, ..., Û_{n-1} acting in the same planes as U_1, ..., U_{n-1}, and for ||g|| <= εC_g||u|| and ||G|| <= εC_G||C||. The constants C_g and C_G are independent of ε, ||u||, and ||C||, but the analysis in [7] suggests the possibility of exponential growth in n. Fortunately, with typical ε, the base which is being exponentiated is so close to 1 that for reasonable values of n these two exponential functions are effectively smaller than low order polynomials. The rotations U_i are the exact rotations which introduce the appropriate zeros into the computed U_{i-1}^T ... U_1^T u. We can also ensure that if

    Û_{n-1}^T ... Û_1^T u = α e1 + ĝ

with e1^T ĝ = 0, then ||ĝ||/||u|| <= C_ĝ nε for some C_ĝ which is independent of ε and n. The computed ||u|| will be α̂ = α + g(1). As expected, the bounds on g and ĝ, together with (9), show that α̂ approximates ||u|| with high relative precision. At this point, we have a computed Hessenberg matrix Û_A^T C + G. If we let H represent the computed Hessenberg matrix obtained after scaling the first row by sqrt((1 - α̂)(1 + α̂)), which is itself computed to high relative accuracy, then

    fl(H) = [ β̂  0 ] ( Û_A^T C + G ) + e1 h^T = D ( Û_A^T C + G ) + e1 h^T        (10)
            [ 0  I ]

where β̂ = sqrt((1 - α̂)(1 + α̂)) and

    ||h||/||H|| < ||h|| / ( sqrt(1 - α̂²) ||x||/|α̂| ) <= C_h nε.

From Û_A^T u = α e1 + ĝ we know that

    x̂^T = α e1^T Û_A^T C = u^T C - ĝ^T Û_A^T C = x^T - u^T F - ĝ^T Û_A^T C.

Combining this with (10), we see that

    (H - DG - e1 h^T)^T (H - DG - e1 h^T) = C^T Û_A D² Û_A^T C = C^T C - x̂ x̂^T.

Thus, H is close to the Hessenberg matrix which results from applying the algorithm with no error on C and an x̂ close to x. This result is equivalent to the existence of E_1, E_2 and U_1 such that

    [ U_1^T  0 ] [  C  ]   [   H + E_1    ]
    [ 0^T    1 ] [ x^T ] = [ x̂^T + E_2^T ].

The transformation of H to the downdated Ĉ can be computed to satisfy

    U_2^T (H + E_3) = Ĉ

so that

    [ U_2^T  0 ] [ U_1^T  0 ] [  C  ]   [ Ĉ + U_2^T (E_1 - E_3) ]
    [ 0^T    1 ] [ 0^T    1 ] [ x^T ] = [     x̂^T + E_2^T       ].

This is a mixed stability result in which part of the error is put on the initial data, x, and part of the error is placed on the result, Ĉ. Such bounds are common when dealing with problems which have algorithmic solutions involving the use of hyperbolic transformations. Finally, we note that the same sort of decomposition has been used in the context of Cholesky downdating before. Although we have developed it by analogy with the CS decomposition, it fits naturally within the closely related context of the GSVD. In [3], in slightly different language, it is noted that for a block downdating problem, the conditioning of the problem is dependent on the largest canonical reflection coefficient.
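The canonical-form downdating variant described in this section can be sketched in a few lines. The code below is our illustration, not the Linpack or Chambers implementation: back substitution for u, n-1 rotations producing a Hessenberg U_A^T C, a scaling of the first row, and n-1 rotations to re-triangularize:

```python
import numpy as np

def downdate(C, x):
    """Given upper-triangular C and x with C^T C - x x^T positive definite,
    return upper-triangular Chat with Chat^T Chat = C^T C - x x^T
    (a sketch of the canonical-form variant, O(n^2) work)."""
    n = C.shape[0]
    u = np.linalg.solve(C.T, x)            # back substitution: u^T C = x^T
    alpha = np.linalg.norm(u)
    assert alpha < 1.0                     # positivity of C^T C - x x^T
    H = C.astype(float).copy()
    v = u.copy()
    # n-1 rotations mapping u to alpha*e1; C becomes upper Hessenberg
    for i in range(n - 1, 0, -1):
        r = np.hypot(v[i - 1], v[i])
        if r == 0.0:
            continue
        c, s = v[i - 1] / r, v[i] / r
        G = np.array([[c, s], [-s, c]])
        v[i - 1:i + 1] = G @ v[i - 1:i + 1]
        H[i - 1:i + 1, :] = G @ H[i - 1:i + 1, :]
    H[0, :] *= np.sqrt(1.0 - alpha**2)     # scale the row proportional to x^T
    # n-1 more rotations restore the triangular structure
    for i in range(n - 1):
        r = np.hypot(H[i, i], H[i + 1, i])
        c, s = H[i, i] / r, H[i + 1, i] / r
        G = np.array([[c, s], [-s, c]])
        H[i:i + 2, :] = G @ H[i:i + 2, :]
    return np.triu(H)

C = np.triu(np.ones((3, 3))) + 2.0 * np.eye(3)
x = np.array([0.3, 0.2, 0.1])
Chat = downdate(C, x)
assert np.allclose(Chat.T @ Chat, C.T @ C - np.outer(x, x))
```

As the text notes, this is mathematically equivalent to, but less efficient than, the Linpack organization, which makes the scaled first row exactly zero.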

6 Condition Numbers of Block Toeplitz Matrices

For an n × n symmetric positive definite Toeplitz matrix, T, it is shown in [1] that

    prod_{k=1}^{n-1} (1 + |ρ_k|)/(1 - ρ_k²)  <=  ||T^{-1}||  <=  prod_{k=1}^{n-1} (1 + |ρ_k|)²/(1 - ρ_k²),        (11)

where the ρ_k are the reflection coefficients of T. We assume that T is scaled to have ones on the diagonal, so that the bounds can be immediately translated into bounds on the condition number of T. Although these bounds can give a very broad range for ||T^{-1}||, it is immediately clear that the presence of a reflection coefficient which is very close to 1 yields an ill-conditioned Toeplitz matrix. It is in the other direction that problems arise: a Toeplitz matrix can be very ill-conditioned without revealing it in the form of a single very bad reflection coefficient. The cumulative effect of a large number of moderately bad reflection coefficients can result in an extremely ill-conditioned Toeplitz matrix. For block Toeplitz matrices, the situation is even worse. There are likely to be more reflection coefficients, and their relation to the condition number is even more dubious. Without claiming to have put the notion of estimating condition numbers from reflection coefficients on a safe footing, the point of this section is to show that the decoupling provided by the canonical decomposition makes the canonical reflection coefficients more appropriate for this task. Further, if the block size is small, the additional computational cost is not prohibitive. It will certainly be less expensive than conventional condition estimation, which involves a cost comparable to that of the factorization. Assume we are dealing with a symmetric positive definite block Toeplitz matrix,

    T_N = [ B0       B1   B2   ...  B_{N-1} ]
          [ B1       B0   B1    .      :    ]
          [ B2       B1   B0    .      B2   ]
          [ :         .    .    .      B1   ]
          [ B_{N-1}  ...  B2   B1      B0   ]

with symmetric blocks, B_k. If

    R_N = [ B1  B2  ...  B_N ]^T

then triangular factors of $T_N^{-1}$ can be obtained from the solutions $A_k$ to $T_k A_k = -R_k$ for $k = 1, 2, \ldots, N-1$. In particular, if $A$ is defined as a lower triangular matrix by

$$A(:,\, (k-1)n+1 : kn) = \begin{bmatrix} 0_{(k-1)n \times n} \\ I_{n \times n} \\ A_{N-k} \end{bmatrix}$$

for $k = 1, \ldots, N$, then $T_k A_k = -R_k$ implies that $T_N A$ will be upper block triangular. Consequently, it is possible to express $T_N^{-1}$ as $T_N^{-1} = A D A^T$ where $D$ is a block diagonal matrix with $n \times n$ symmetric positive definite diagonal blocks. The algorithm which computes such a factorization is a block version of the Levinson algorithm. This is not a new idea, and the derivation of the algorithm is so similar to that of the traditional Levinson algorithm that we will only summarize the results before applying them to the problem at hand. We partition $A_k$ as

$$A_k = \begin{bmatrix} \hat{A}_k \\ \hat{P}_k \end{bmatrix}$$

where $\hat{P}_k$ is $n \times n$, and let $J$ be the block reversal matrix, with $n \times n$ identity blocks on the block anti-diagonal:

$$J = \begin{bmatrix} & & I \\ & \iddots & \\ I & & \end{bmatrix}.$$
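Since the blocks are symmetric, $T_N$ is block persymmetric, $J T_N J = T_N$; this is the property the recursion below exploits, an observation we add here for intuition since the paper only summarizes the algorithm. A small sketch (our own helper names) of $J$ and this identity:

```python
import numpy as np

def block_reversal(k, n):
    """k*n x k*n matrix with n x n identity blocks on the block anti-diagonal."""
    J = np.zeros((k * n, k * n))
    for i in range(k):
        J[i * n:(i + 1) * n, (k - 1 - i) * n:(k - i) * n] = np.eye(n)
    return J

def block_toeplitz(blocks):
    """Symmetric block Toeplitz T_N built from blocks [B_0, B_1, ..., B_{N-1}]."""
    N = len(blocks)
    return np.block([[blocks[abs(i - j)] for j in range(N)] for i in range(N)])

# Symmetric 2 x 2 blocks (example data assumed) give a block persymmetric T_N.
B = [np.array([[4.0, 1.0], [1.0, 5.0]]),
     np.array([[1.0, 0.3], [0.3, 1.0]]),
     np.array([[0.2, 0.1], [0.1, 0.3]])]
T = block_toeplitz(B)
J = block_reversal(3, 2)
print(np.allclose(J @ T @ J, T))   # block persymmetry J T J = T
```

Because $J T_k J = T_k$, the reversed solution $J A_k$ solves the "backward" system $T_k (J A_k) = -J R_k$, which is what allows a single recursion to extend $A_k$ to $A_{k+1}$.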

It is possible to recursively compute $A_{k+1}$ from $A_k$:

$$\hat{A}_{k+1} = A_k + J A_k \hat{P}_{k+1}, \qquad (12)$$

$$\hat{P}_{k+1} = -E_k^{-1}\,(B_{k+1} + R_k^T J A_k), \qquad (13)$$

$$E_{k+1} = E_k - \hat{P}_{k+1}^T E_k \hat{P}_{k+1}. \qquad (14)$$

The matrices $E_k$ are symmetric positive definite and

$$D = \mathrm{diag}(E_{N-1}^{-1}, E_{N-2}^{-1}, \ldots, E_0^{-1}).$$

Let $E_k = C_k^T C_k$ for upper triangular $C_k$. Then (14) is equivalent to

$$C_{k+1}^T C_{k+1} = C_k^T C_k - (C_k \hat{P}_{k+1})^T (C_k \hat{P}_{k+1}) = C_k^T (I - P_{k+1}^T P_{k+1}) C_k,$$

where $P_{k+1} = C_k \hat{P}_{k+1} C_k^{-1}$. It is easily verified from (13) that $P_{k+1}$ is symmetric. This is a downdating problem. Using the notation of (4) and letting $R_{k+1}$ be the Cholesky factor of $I - P_{k+1}^T P_{k+1}$, we have

$$R_{k+1} = U_A \Gamma_A^{-1} (\Gamma_A^2 - \Gamma_B^2)^{1/2} V_A^T$$

and $C_{k+1} = R_{k+1} C_k$, where the reflection coefficients correspond to those from the $\Sigma$-orthogonal transformation computed by the Schur algorithm.
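A minimal numerical sketch of the recursion (12)-(14) follows, under our reading of the summarized algorithm; the initialization $E_0 = B_0$, $A_0$ empty, and taking $J$ as the block reversal matrix are assumptions on our part, checked against the factorization $T_N^{-1} = A D A^T$.

```python
import numpy as np

def block_levinson_factor(B):
    """Blocks B = [B_0, ..., B_{N-1}] of a symmetric positive definite block
    Toeplitz matrix T_N with symmetric n x n blocks.  Returns (A, D) with
    T_N^{-1} = A D A^T, A unit lower block triangular, D block diagonal,
    via the recursion (12)-(14)."""
    N, n = len(B), B[0].shape[0]
    As = [np.zeros((0, n))]              # A_0 is empty (assumed initialization)
    Es = [B[0].copy()]                   # E_0 = B_0 (assumed initialization)
    for k in range(N - 1):
        Ak = As[-1]
        # J A_k: block rows of A_k in reversed order.
        JA = np.vstack([Ak[(k - 1 - i) * n:(k - i) * n] for i in range(k)]) \
             if k > 0 else Ak
        Rk = np.vstack(B[1:k + 1]) if k > 0 else np.zeros((0, n))
        Phat = -np.linalg.solve(Es[-1], B[k + 1] + Rk.T @ JA)       # (13)
        As.append(np.vstack([Ak + JA @ Phat, Phat]))                # (12)
        Es.append(Es[-1] - Phat.T @ Es[-1] @ Phat)                  # (14)
    # Assemble A column-block by column-block, and D = diag(E_{N-1}^{-1},...,E_0^{-1}).
    A = np.zeros((N * n, N * n))
    D = np.zeros((N * n, N * n))
    for k in range(1, N + 1):
        c = (k - 1) * n
        A[c:c + n, c:c + n] = np.eye(n)
        A[c + n:, c:c + n] = As[N - k]
        D[c:c + n, c:c + n] = np.linalg.inv(Es[N - k])
    return A, D

# Small example with symmetric 2 x 2 blocks (data assumed, diagonally dominant).
B = [np.array([[4.0, 1.0], [1.0, 5.0]]),
     np.array([[1.0, 0.3], [0.3, 1.0]]),
     np.array([[0.2, 0.1], [0.1, 0.3]]),
     np.array([[0.1, 0.0], [0.0, 0.1]])]
T = np.block([[B[abs(i - j)] for j in range(4)] for i in range(4)])
A, D = block_levinson_factor(B)
print(np.allclose(A @ D @ A.T, np.linalg.inv(T)))
```

The diagonal blocks of $D$ are the inverses of the Schur complements $E_k$, so the growth of these blocks, and hence of the canonical reflection coefficients of the $P_{k+1}$, is visible directly from the recursion.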

This provides a bound on the growth in the block diagonal elements of the triangular factor of $T_N^{-1}$, connecting the condition of $T$ with the worst canonical reflection coefficient. We can tighten the connection by modifying (12). So far, the symmetry of $P$ shows that for the diagonal elements, $E_k$, the eigenvalues of $\hat{P}$, rather than the singular values, are what really matter. In (12) there is an apparent dependence on $\|\hat{P}\|$, but this can be dealt with easily. From (12),

$$\hat{A}_{k+1} C_k^{-1} = (A_k C_k^{-1}) + J (A_k C_k^{-1}) P_{k+1}.$$

But

$$C_{k+1}^{-1} = C_k^{-1} R_{k+1}^{-1},$$

so

$$(\hat{A}_{k+1} C_{k+1}^{-1}) = (A_k C_k^{-1}) R_{k+1}^{-1} + J (A_k C_k^{-1}) P_{k+1} R_{k+1}^{-1}$$

and

$$(\hat{P}_{k+1} C_{k+1}^{-1}) = C_k^{-1} P_{k+1} R_{k+1}^{-1}.$$

The effects of $P_{k+1}$ and $R_{k+1}$ in these equations can be easily observed by noting that

$$V_A^T R_{k+1}^{-1} U_A = (I - \Gamma_A^{-2} \Gamma_B^2)^{-1/2}$$

and

$$V_A^T P_{k+1} R_{k+1}^{-1} U_A = \Gamma_A^{-1} \Gamma_B (I - \Gamma_A^{-2} \Gamma_B^2)^{-1/2}.$$

Since $A_k C_k^{-1}$ is just a column of the triangular factor of $T_N^{-1}$, we see that the growth in the norms of these columns will be determined by the largest diagonal element of $\Gamma_A^{-1} \Gamma_B$. From this result, we can derive bounds which are essentially identical to the upper bound in (11).

From a practical point of view, there are several significant points to this observation. First, the products in such bounds will involve multiplying factors which are equal in number to the size of the matrix in terms of blocks rather than elements; the number of factors will be $N$ rather than $nN$. This leads to a better result than attempting to detect ill-conditioning by multiplying the norms of the $nN$ hyperbolic transformations produced by the Schur algorithm. Second, if the block size is small, computing the canonical reflection coefficients is easy. Since they give a more realistic characterization of the conditioning of the problem, there is no reason not to compute them. They will not produce nearly as good an estimate as a traditional condition estimator, but, since computing them only involves looking at diagonal blocks, they will be far less computationally expensive.
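In the scalar case, the bound (11) can be checked directly. The sketch below is our own illustration: it computes the reflection coefficients of a unit-diagonal symmetric positive definite Toeplitz matrix via the Levinson-Durbin recursion and compares the products in (11) against $\|T^{-1}\|$; taking the matrix 1-norm is our assumption, since (11) as reconstructed above does not fix the norm. The test matrix has entries $\rho^{|i-j|}$, for which only the first reflection coefficient is nonzero.

```python
import numpy as np

def durbin_reflection_coefficients(r):
    """r = [1, r_1, ..., r_{n-1}]: first row of a unit-diagonal SPD Toeplitz
    matrix.  Returns its n-1 reflection coefficients via Levinson-Durbin."""
    n = len(r) - 1
    gammas = []
    a = np.zeros(0)          # current Yule-Walker solution T_k a = -[r_1..r_k]
    e = 1.0                  # prediction error, e_0 = r_0 = 1
    for k in range(n):
        g = -(r[k + 1] + a @ r[1:k + 1][::-1]) / e
        a = np.concatenate([a + g * a[::-1], [g]])   # Levinson update
        gammas.append(g)
        e *= (1.0 - g * g)
    return np.array(gammas)

rho, n = 0.6, 6
r = rho ** np.arange(n)                               # entries rho^|i-j|
T = r[np.abs(np.subtract.outer(np.arange(n), np.arange(n)))]
g = durbin_reflection_coefficients(r)
Tinv_norm = np.linalg.norm(np.linalg.inv(T), 1)       # 1-norm: our choice
lower = np.prod(1.0 / (1.0 - g**2))
upper = np.prod((1.0 + np.abs(g))**2 / (1.0 - g**2))
print(lower <= Tinv_norm <= upper * (1 + 1e-10))
```

For this family the upper bound is essentially attained, which is why the comparison allows a small relative tolerance; a matrix with many moderate reflection coefficients would instead sit well inside the (much wider) interval, illustrating the looseness discussed above.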

7 Summary

In this paper we have presented a decomposition of a $\Sigma$-orthogonal matrix and analyzed its properties, which parallel those of the CS decomposition. In addition, we have addressed the issue of conditioning. The CS decomposition reveals canonical rotations which are always well conditioned, but the $\Sigma$-orthogonal equivalent reveals information which is relevant to understanding the conditioning of Cholesky downdating and the factorization of block Toeplitz matrices. Since the elementary hyperbolic transformations are completely decoupled by orthogonal transformations, the norms of these transformations reveal the inherent condition of the problem. In addition to this, we have shown that these reflection coefficients can be used to characterize the range of reflection coefficients which are achievable by other approaches to solving the problem.

References

[1] G. Cybenko, The Numerical Stability of the Levinson-Durbin Algorithm for Toeplitz Systems of Equations, SIAM J. Sci. Stat. Comput., 1 (1980), pp. 303-319.

[2] C. Davis and W. M. Kahan, The Rotation of Eigenvectors by a Perturbation, III, SIAM J. Numer. Anal., 7 (1970), pp. 1-46.

[3] L. Eldén and H. Park, Perturbation Analysis for Block Downdating of a Cholesky Decomposition.

[4] E. J. Grimme, D. C. Sorensen, and P. Van Dooren, Model Reduction of State Space Systems via an Implicitly Restarted Lanczos Method.

[5] G. W. Stewart, On the Perturbation of Pseudo-Inverses, Projections, and Linear Least Squares Problems, SIAM Review, 19 (1977), pp. 634-662.

[6] G. W. Stewart and J.-G. Sun, Matrix Perturbation Theory, Academic Press, Boston, 1990.

[7] J. H. Wilkinson, The Algebraic Eigenvalue Problem, Oxford University Press, London, 1965.

