Numer. Math. 67: 491-500 (1994)

Numerische Mathematik

© Springer-Verlag 1994 Electronic Edition

Bounds on the error of an approximate invariant subspace for non-self-adjoint matrices

Moshe Haviv¹,², Ya'acov Ritov¹

¹ Department of Statistics, The Hebrew University of Jerusalem, 91905 Jerusalem, Israel
² Department of Econometrics, The University of Sydney, Sydney, NSW 2006, Australia

Received December 1, 1992 / Revised version received October 20, 1993

Summary. Suppose one approximates an invariant subspace of an $n \times n$ matrix in $\mathbb{C}^{n \times n}$ which is not necessarily self-adjoint. Suppose that one also has an approximation for the corresponding eigenvalues. We consider the question of how good the approximations are. Specifically, we develop bounds on the angle between the approximating subspace and the invariant subspace itself. These bounds are functions of the following three terms: (1) the residual of the approximations; (2) the singular-value separation in an associated matrix; and (3) the goodness of the approximations to the eigenvalues.

Mathematics Subject Classification (1991): 65F15

1. Introduction

The subject of bounding the angle between an invariant subspace of a matrix $B \in \mathbb{C}^{n \times n}$ and an approximation to it is not new. Most of the attention received in the literature concerns the case where $B$ is a self-adjoint matrix; for a recent paper see Sun (1991). This paper removes this assumption.

Suppose one approximates $p$ eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_p$ of a matrix $B$ with the scalars $\mu_1, \mu_2, \ldots, \mu_p$, and suppose that one approximates the corresponding invariant subspace with the $p$-dimensional subspace $Y$. The question that arises is how good the approximations are. In particular, one would like to bound the angle between the invariant subspace corresponding to $\lambda_1, \lambda_2, \ldots, \lambda_p$ and its approximation. We obtain bounds separately for the following four cases:

1. $p = 1$ and the approximation $\mu$ to $\lambda$ is error-free.
2. $p = 1$.
3. $p \ge 1$ and the approximations $\mu_1, \ldots, \mu_p$ to $\lambda_1, \ldots, \lambda_p$ are error-free.
4. $p \ge 1$.

In all cases, the bounds on the angle between the two subspaces are monotone functions of the following:¹

(a) The residual of the approximation.
(b) The reciprocal of the singular-value separation in the matrix $(B - \mu_p)(B - \mu_{p-1}) \cdots (B - \mu_1)$.
(c) The distances between $\lambda_1, \ldots, \lambda_p$ and their approximations (only in cases 2 and 4).

In comparing the bounds we developed with those existing for the self-adjoint case, as they appear in Parlett (1980, pp. 222-225), one finds terms corresponding to (a) and (b) above. A term corresponding to (c) does not appear in the self-adjoint case. This is due to the fact that for self-adjoint matrices the orthogonal complement of an invariant subspace is an invariant subspace itself.

We would like to mention two related questions which were considered in the literature. Stewart (1971) defined a measure which quantifies how near a given subspace $X$ is to an invariant subspace. He then constructed an invariant subspace whose distance from $X$ approaches zero as the corresponding measure does. Finally, he developed a bound on the norm of the difference between the two subspaces; see also Stewart (1973). In Sect. 3 we compare our bounds with Stewart's. Kahan, Parlett and Jiang (1982) considered an alternative question: for a given subspace $X$, does there exist a small perturbation of $B$ such that $X$ is an invariant subspace of the perturbed matrix?

Our final remark here considers ill-conditioned eigenvalues. An eigenvalue is called ill-conditioned if the right and left eigenspaces belonging to it are almost orthogonal. This might be the case only for non-self-adjoint matrices. In that case, the eigenspaces may be sensitive to small perturbations (cf. Golub and Wilkinson 1976, p. 588). Hence, it looks as though a bound on the angle between an invariant subspace and an approximation to it should be a function of the degree of ill-conditioning. As our bounds do not contain a term directly representing ill-conditioning, we conclude that ill-conditioned eigenvalues imply poor singular-value separation. For more on the relation between the ill-conditioning phenomenon and eigenvalue and singular-value separation see Golub and Wilkinson (1976).

Correspondence to: Y. Ritov

¹ Exact definitions are given in the next section.

2. Notation and preliminaries

For a vector $x \in \mathbb{C}^n$, we denote its Euclidean norm by $\|x\|$. Similarly, for a matrix $B \in \mathbb{C}^{n \times n}$, $\|B\|$ denotes $\max_{\|x\|=1} \|Bx\|$. All the vectors denoted later by $x, y, z$ and $w$ will be norm-one vectors in $\mathbb{C}^n$. Also, for a matrix $B$ we denote by $B^H$ its conjugate transpose. The angle between the vectors $x, y \in \mathbb{C}^n$, denoted by $\angle(x, y)$, is defined as $\arccos |x^H y|$. Of course, $0 \le \angle(x, y) \le \pi/2$. The angle between a subspace $X$ and a subspace $Y$, denoted by $\angle(X, Y)$, is defined as $\sup_{x \in X} \inf_{y \in Y} \angle(x, y)$. Note that $\angle(X, Y)$ does not necessarily equal $\angle(Y, X)$.

For an $n \times n$ matrix $B$ and for a vector $\tilde\mu = (\mu_1, \ldots, \mu_p)$ of length $p$, let $B(\tilde\mu) = (B - \mu_p)(B - \mu_{p-1}) \cdots (B - \mu_1)$. For an $n \times n$ matrix $B$, a $p$-dimensional subspace $Y$, and a vector $\tilde\mu = (\mu_1, \ldots, \mu_p)$ of length $p$, let $r_B(\tilde\mu, Y)$ denote the residual of $(\tilde\mu, Y)$ at $B$, namely $\max_{y \in Y} \|B(\tilde\mu) y\|$. Of course, if the dimension of $Y$ is one, the maximization is redundant. The term $\delta_B(\tilde\mu, Y)$ is defined accordingly as the minimization over the orthogonal complement, namely $\delta_B(\tilde\mu, Y) = \min_{y \perp Y} \|B(\tilde\mu) y\|$. Note that the number of scalars, $p$, involved in the definitions of $r_B(\cdot)$ and $\delta_B(\cdot)$ equals the dimension of $Y$. The case $p = 1$ is, of course, possible; in that case, we use $\mu$ instead of $\tilde\mu$. Also, for a matrix $B$, let $\lambda(B)$ denote the set of eigenvalues of $B$. We deviate from the traditional notation and include an eigenvalue in $\lambda(B)$ as many times as its geometric multiplicity (i.e., the dimension of its corresponding eigenspace) indicates. Finally, let $\sigma_1(B) \ge \cdots \ge \sigma_n(B)$ be the nonnegative square roots of the eigenvalues of $B^H B$ (i.e., the singular values of $B$), appearing in nonincreasing order. Again, a multiple singular value appears in this sequence as many times as its multiplicity indicates.

We end this section by quoting two results which will be used frequently later on.

Result 1 (cf. Parlett (1980), p. 188). For $\tilde\lambda = (\lambda_1, \ldots, \lambda_p) \subseteq \lambda(B)$ with the corresponding $p$-dimensional invariant subspace $Z$,

(1) $\delta_B(\tilde\lambda, Z) = \sigma_{n-p}(B(\tilde\lambda))$.

Result 2 (cf. Parlett (1980), p. 222). For a self-adjoint matrix $B$ let $(\lambda, z)$ be an eigenpair. Let $\mu$ and $y$ be approximations to $\lambda$ and $z$, respectively. If $\lambda$ is the closest to $\mu$ in $\lambda(B)$, then²

(2) $\sin \angle(z, y) \le \dfrac{r_B(\mu, y)}{\sigma_{n-1}(B - \mu)}$.
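To make the notation concrete, the following numerical sketch (our own illustration, not part of the paper) computes $B(\tilde\mu)$, $r_B(\tilde\mu, Y)$, $\delta_B(\tilde\mu, Y)$ and $\sin \angle(X, Y)$ with numpy; the helper names `poly_B`, `residual`, `separation` and `sin_angle` are hypothetical.

```python
import numpy as np

def poly_B(B, mus):
    # B(mu~) = (B - mu_p)(B - mu_{p-1}) ... (B - mu_1); the factors commute,
    # so the order of multiplication does not matter
    n = B.shape[0]
    M = np.eye(n, dtype=complex)
    for mu in mus:
        M = (B - mu * np.eye(n)) @ M
    return M

def residual(B, mus, Y):
    # r_B(mu~, Y) = max_{y in Y, ||y||=1} ||B(mu~) y||; for an n x p matrix Y
    # with orthonormal columns this is the largest singular value of B(mu~) Y
    return np.linalg.svd(poly_B(B, mus) @ Y, compute_uv=False)[0]

def separation(B, mus, Y):
    # delta_B(mu~, Y) = min_{y perp Y, ||y||=1} ||B(mu~) y||: the smallest
    # singular value of B(mu~) restricted to the orthogonal complement of Y
    p = Y.shape[1]
    W = np.linalg.qr(Y, mode='complete')[0][:, p:]   # basis of Y-perp
    return np.linalg.svd(poly_B(B, mus) @ W, compute_uv=False)[-1]

def sin_angle(X, Y):
    # sin of angle(X, Y) = sup_{x in X} inf_{y in Y} angle(x, y), for
    # orthonormal column bases X and Y (note the asymmetry in X and Y)
    n = X.shape[0]
    return np.linalg.norm((np.eye(n) - Y @ Y.conj().T) @ X, 2)
```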

3. The bounds

3.1. The case $p = 1$, $\mu = \lambda$

We begin with the simplest case, where the dimension of the invariant subspace is one and where the approximation $\mu$ to $\lambda$ is known to be error-free. More specifically, let $y$ be an approximation to the eigenvector $z$ belonging to the (known) eigenvalue $\lambda$. Theorem 3.1 below bounds $\sin \angle(y, z)$ in terms of $r_B(\lambda, y)$ and $\sigma_{n-1}(B - \lambda)$.

Theorem 3.1.
$$\sin \angle(y, z) \le r_B(\lambda, y) \, / \, \sigma_{n-1}(B - \lambda).$$

Proof. To simplify notation, let $\theta = \angle(y, z)$. Hence, $y = z \cos\theta + w \sin\theta$ for some $w$ with $w \perp z$. As $Bz = \lambda z$, one easily gets that $By = \lambda z \cos\theta + Bw \sin\theta$ and that $\lambda y = \lambda z \cos\theta + \lambda w \sin\theta$. Hence, $By - \lambda y = (B - \lambda) w \sin\theta$, and then $r_B(\lambda, y) = \sin\theta \, \|(B - \lambda) w\|$, implying that

(3) $\sin\theta \le r_B(\lambda, y) \, / \, \delta_B(\lambda, z)$.

The proof is completed by noticing that $\sigma_{n-1}(B - \lambda) = \delta_B(\lambda, z)$, as indicated in (1). □

It is worthwhile to note that inequality (3) above is valid for any norm on $\mathbb{C}^n$ (when one adjusts the definitions of $r_B(\lambda, y)$ and of $\delta_B(\lambda, z)$ accordingly). For explicit expressions for $\delta_B(\lambda, z)$ for the $l_1$-norm and for the $l_\infty$-norm see Rothblum (1984).

² In Parlett (1980, p. 222), $\mu$ stands for the Rayleigh quotient of $y$, namely for $y^H B y$, but it is easy to see from the proof there that this restriction is not necessary for our needs.
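As a quick numerical illustration of Theorem 3.1 (our own check, not from the paper, assuming numpy is available), one can perturb an exact eigenvector and compare the two sides of the bound:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
lams, V = np.linalg.eig(B)
lam = lams[0]                                   # exact eigenvalue
z = V[:, 0] / np.linalg.norm(V[:, 0])           # exact eigenvector

y = z + 1e-3 * rng.standard_normal(n)           # approximate eigenvector
y /= np.linalg.norm(y)

r = np.linalg.norm((B - lam * np.eye(n)) @ y)   # r_B(lambda, y)
sig = np.linalg.svd(B - lam * np.eye(n), compute_uv=False)[n - 2]  # sigma_{n-1}
sin_theta = np.sqrt(1 - abs(np.vdot(z, y))**2)  # sin angle(y, z)

assert sin_theta <= r / sig + 1e-12             # Theorem 3.1
```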


3.2. The case $p = 1$, $\mu \ne \lambda$

Suppose one approximates the eigenpair $(\lambda, z)$ with the pair $(\mu, y)$, where $\lambda$ and $\mu$ do not necessarily coincide. In this subsection we give three different bounds for the angle between $y$ and $z$, in terms of $|\lambda - \mu|$, $r_B(\mu, y)$, $\delta_B(\mu, z)$, and $\|B - B^H\|$. Later, in Theorem 3.3, we bound $\delta_B(\mu, z)$ from below, in terms of $\sigma_n(B - \mu)$ and $\sigma_{n-1}(B - \mu)$, the smallest and the second smallest singular values of $B - \mu$, respectively.

For a given $y$, the traditional choice for $\mu$ is its Rayleigh quotient $\mu^* = y^H B y$, as it minimizes the corresponding residual $r_B(\mu, y)$. For that choice we have the following result as a corollary to Theorem 3.1.

Corollary 1.
$$\sin \angle(y, z) \le \frac{[\, r_B^2(\mu^*, y) + |\mu^* - \lambda|^2 \,]^{1/2}}{\sigma_{n-1}(B - \lambda)}.$$

Proof. Since $y \perp (B - \mu^*) y$,
$$r_B^2(\lambda, y) = \|(B - \mu^*) y + (\mu^* - \lambda) y\|^2 = \|(B - \mu^*) y\|^2 + |\mu^* - \lambda|^2.$$

The corollary follows from Theorem 3.1. □

Our two other bounds follow from the following theorem.

Theorem 3.2. For some $w \perp z$,
$$\sin \angle(y, z) \le \frac{r_B(\mu, y) + |\lambda - \mu| \, |z^H (B - \mu) w| \, / \, \|(B - \mu) w\|}{\delta_B(\mu, z)}.$$

Proof. Again, for simplicity let $\theta = \angle(y, z)$. Then $y = z \cos\theta + w \sin\theta$ for some $w$ with $w \perp z$. As $Bz = \lambda z$, one easily gets that $By = \lambda z \cos\theta + Bw \sin\theta$ and that $\mu y = \mu z \cos\theta + \mu w \sin\theta$. Thus, $By - \mu y = (\lambda - \mu) z \cos\theta + (B - \mu) w \sin\theta$. Hence,
$$\begin{aligned}
r_B^2(\mu, y) &= |\lambda - \mu|^2 \cos^2\theta + 2 \sin\theta \cos\theta \, \Re\big[\overline{(\lambda - \mu)} \, z^H (B - \mu) w\big] + \|(B - \mu) w\|^2 \sin^2\theta \\
&\ge \left[ \|(B - \mu) w\| \sin\theta - |\lambda - \mu| \cos\theta \, \frac{|z^H (B - \mu) w|}{\|(B - \mu) w\|} \right]^2 + |\lambda - \mu|^2 \cos^2\theta \left( 1 - \frac{|z^H (B - \mu) w|^2}{\|(B - \mu) w\|^2} \right) \\
&\ge \left[ \|(B - \mu) w\| \sin\theta - |\lambda - \mu| \cos\theta \, \frac{|z^H (B - \mu) w|}{\|(B - \mu) w\|} \right]^2.
\end{aligned}$$
The theorem follows since, by definition, $\delta_B(\mu, z) \le \|(B - \mu) w\|$. □

The first corollary is immediate:

Corollary 2.
$$\sin \angle(y, z) \le \frac{r_B(\mu, y) + |\lambda - \mu|}{\delta_B(\mu, z)}.$$

The next corollary is useful when $B$ is a nearly self-adjoint matrix, in the sense that $\|B - B^H\|$ is small.

Corollary 3.
$$\sin \angle(y, z) \le \frac{r_B(\mu, y)}{\delta_B(\mu, z)} + \frac{|\lambda - \mu| \, \|B - B^H\|}{\delta_B^2(\mu, z)}.$$

Proof. The corollary follows from Theorem 3.2 since $z \perp w$, and hence $|z^H (B - \mu) w| = |z^H (B - B^H) w| \le \|B - B^H\|$. □

It is possible to see from Theorem 3.2, and from the analysis in the self-adjoint case, that the bounds are proportional to the reciprocal of the eigenvalue separation in the corresponding matrices. In the general case, as can be deduced from Theorem 3.2, the bounds are functions of $\delta_B(\mu, z)$. Thus we conclude that the closer $\lambda$ and $\mu$ are, the closer $\delta_B(\mu, z)$ and $\sigma_{n-1}(B - \mu)$ are. Theorem 3.3, given next, quantifies this observation in the sense that it bounds $\delta_B(\mu, z)$ from below in terms of $\sigma_{n-1}(B - \mu)$. Of course, this lower bound can replace $\delta_B(\mu, z)$ in the bound on $\sin \angle(y, z)$ given in Theorem 3.2.
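Before turning to that theorem, here is a numerical check of Corollary 2 (again our own sketch, not from the paper, assuming numpy); $\delta_B(\mu, z)$ is computed as the smallest singular value of $B - \mu$ restricted to the orthogonal complement of $z$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
lams, V = np.linalg.eig(B)
lam = lams[0]
z = V[:, 0] / np.linalg.norm(V[:, 0])

mu = lam + 1e-3 * (1 + 1j)                      # approximate eigenvalue
y = z + 1e-3 * rng.standard_normal(n)           # approximate eigenvector
y /= np.linalg.norm(y)

M = B - mu * np.eye(n)
r = np.linalg.norm(M @ y)                       # r_B(mu, y)

W = np.linalg.qr(z.reshape(-1, 1), mode='complete')[0][:, 1:]  # basis of z-perp
delta = np.linalg.svd(M @ W, compute_uv=False)[-1]             # delta_B(mu, z)

sin_theta = np.sqrt(1 - abs(np.vdot(z, y))**2)
assert sin_theta <= (r + abs(lam - mu)) / delta + 1e-12        # Corollary 2
```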

Theorem 3.3.
$$\delta_B^2(\mu, z) \ge \sigma_{n-1}^2(B - \mu) - \big[ \sigma_{n-1}^2(B - \mu) - \sigma_n^2(B - \mu) \big] \, \frac{|\lambda - \mu|^2 \, \|B - \mu\|^2}{\sigma_{n-1}^4(B - \mu)}.$$

Proof. Let $z^*$ be the eigenvector of $(B - \mu)^H (B - \mu)$ belonging to $\sigma_n^2(B - \mu)$, with $\|z^*\| = 1$, and let $\theta^*$ be the angle between $z^*$ and $z$. Then, by taking $(0, z)$ as an approximation to the eigenpair $(\sigma_n^2(B - \mu), z^*)$ of the self-adjoint matrix $(B - \mu)^H (B - \mu)$, one gets by (2) that

(4) $\sin \theta^* \le \dfrac{\|(B - \mu)^H (B - \mu) z\|}{\sigma_{n-1}^2(B - \mu)} \le \dfrac{|\lambda - \mu| \, \|B - \mu\|}{\sigma_{n-1}^2(B - \mu)}$.

Now, let $w$ be the vector where the minimum defining $\delta_B(\mu, z)$ is attained. For some angle $\varphi$ and some vector $v$ with $v \perp z^*$ and $\|v\| = 1$, $w = z^* \sin\varphi + v \cos\varphi$. (Note that $\varphi = \arcsin |w^H z^*|$.) Applying $B - \mu$ yields
$$\delta_B^2(\mu, z) = \|(B - \mu) w\|^2 = \|(B - \mu) z^* \sin\varphi + (B - \mu) v \cos\varphi\|^2.$$
As $v^H (B - \mu)^H (B - \mu) z^* = 0$ and as $\|(B - \mu) z^*\| = \sigma_n(B - \mu)$, one gets that
$$\|(B - \mu) w\|^2 = \sigma_n^2(B - \mu) \sin^2\varphi + \|(B - \mu) v\|^2 \cos^2\varphi.$$
But $\|(B - \mu) v\|^2 \ge \min \{ u^H (B - \mu)^H (B - \mu) u \mid u \perp z^*, \; \|u\| = 1 \}$, where the latter equals $\sigma_{n-1}^2(B - \mu)$ as indicated in (1). Hence,

(5) $\delta_B^2(\mu, z) \ge \sigma_n^2(B - \mu) \sin^2\varphi + \sigma_{n-1}^2(B - \mu) \cos^2\varphi = \sigma_{n-1}^2(B - \mu) - \big[ \sigma_{n-1}^2(B - \mu) - \sigma_n^2(B - \mu) \big] \sin^2\varphi$.


Next we show that $\sin\varphi \le \sin\theta^*$. This, coupled with inequalities (4) and (5), completes the proof. Indeed, one can write $z^* = z \cos\theta^* + u \sin\theta^*$ for some vector $u$, $\|u\| = 1$. Then
$$\sin\varphi = |w^H z^*| = |w^H (z \cos\theta^* + u \sin\theta^*)| \le \sin\theta^*$$
by $w^H z = 0$ and $|w^H u| \le 1$. □

Remark 1. The term $\|B - \mu\|$ in Theorem 3.3 bounds $\|(B^H - \bar\mu) z\|$. It could be replaced by $|\lambda - \mu| + \|(B^H - \bar\lambda) z\| \le |\lambda - \mu| + \|B - B^H\|$. Again, this is a useful bound for nearly self-adjoint matrices $B$.
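A numerical check of the lower bound of Theorem 3.3 (our own sketch, under the same assumptions as the earlier ones) reads:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
lams, V = np.linalg.eig(B)
lam = lams[0]
z = V[:, 0] / np.linalg.norm(V[:, 0])
mu = lam + 1e-3                                 # approximate eigenvalue

M = B - mu * np.eye(n)
sig = np.linalg.svd(M, compute_uv=False)        # sigma_1 >= ... >= sigma_n
s_n, s_n1 = sig[-1], sig[-2]

W = np.linalg.qr(z.reshape(-1, 1), mode='complete')[0][:, 1:]
delta = np.linalg.svd(M @ W, compute_uv=False)[-1]   # delta_B(mu, z)

# right-hand side of Theorem 3.3
lower = s_n1**2 - (s_n1**2 - s_n**2) * (abs(lam - mu) * sig[0] / s_n1**2)**2
assert delta**2 >= lower - 1e-9
```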

3.3. The case $p \ge 1$, $\mu_1 = \lambda_1, \; \mu_2 = \lambda_2, \; \ldots, \; \mu_p = \lambda_p$

Let $Z$ be an invariant subspace of $B$ of dimension $p$, with corresponding set of $p$ (known) eigenvalues $\tilde\lambda = (\lambda_1, \ldots, \lambda_p)$. Finally, let $Y$ be a $p$-dimensional subspace, and think of it as an approximation to $Z$. This subsection considers the issue of measuring the quality of $Y$ as an approximation to $Z$. Of course, we first need a measure which quantifies how close a given subspace is to another. As stated in the introduction, we use the quantity $\sup_{y \in Y} \inf_{z \in Z} \angle(y, z)$, which is denoted by $\angle(Y, Z)$. Next, in Theorem 3.4, we bound $\angle(Y, Z)$ in terms of $r_B(\tilde\lambda, Y)$, namely the residual of $\tilde\lambda$ and $Y$, which is defined as $\sup_{y \in Y} \|B(\tilde\lambda) y\|$, and in terms of $\sigma_{n-p}(B(\tilde\lambda))$.

Theorem 3.4.
$$\sin \angle(Y, Z) \le \frac{r_B(\tilde\lambda, Y)}{\sigma_{n-p}(B(\tilde\lambda))}.$$

Proof. Let $y \in Y$. Then, for some $z \in Z$ and some $w \perp Z$, $y = z \cos\theta + w \sin\theta$, where $\theta = \angle(y, z) = \angle(y, Z)$. Note, by the definition of $z$ as the projection of $y$ on $Z$, that $\theta = \inf_{z \in Z} \angle(y, z)$. (Of course, $\theta$ is a function of $y$.) Then $\|B(\tilde\lambda) y\| = \sin\theta \, \|B(\tilde\lambda) w\|$, or
$$\sin\theta = \frac{\|B(\tilde\lambda) y\|}{\|B(\tilde\lambda) w\|}.$$
By definition, $\|B(\tilde\lambda) y\| \le r_B(\tilde\lambda, Y)$, and by Eq. (1) and the fact that $w \perp Z$ one gets that $\|B(\tilde\lambda) w\| \ge \sigma_{n-p}(B(\tilde\lambda))$. Hence,
$$\sin\theta \le \frac{r_B(\tilde\lambda, Y)}{\sigma_{n-p}(B(\tilde\lambda))}.$$

Since the right-hand side of the last inequality does not depend on $y$, it holds for any $\theta$, namely for any $y \in Y$. This completes the proof. □

Remark 2. Suppose the subspace $Y$ is spanned by the orthonormal vectors $y_1, \ldots, y_p \in \mathbb{C}^n$. With a slight abuse of notation, denote by $Y$ also the $n \times p$ matrix whose $j$-th column is $y_j$. Then $r_B(\tilde\lambda, Y) = \sigma_1(B(\tilde\lambda) Y)$. This is the case as
$$r_B(\tilde\lambda, Y) = \max_{y \in Y} \|B(\tilde\lambda) y\| = \max_{w \in \mathbb{C}^p, \, \|w\| = 1} \|B(\tilde\lambda) Y w\| = \|B(\tilde\lambda) Y\| = \sigma_1(B(\tilde\lambda) Y).$$
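An executable illustration of Theorem 3.4, using the formula of Remark 2 for the residual (our own sketch, assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 8, 3
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
lams, V = np.linalg.eig(B)
Z = np.linalg.qr(V[:, :p])[0]                   # orthonormal basis of Z
Y = np.linalg.qr(Z + 1e-4 * rng.standard_normal((n, p)))[0]   # approximation

Bl = np.eye(n, dtype=complex)                   # B(lambda~)
for lam in lams[:p]:
    Bl = (B - lam * np.eye(n)) @ Bl

r = np.linalg.svd(Bl @ Y, compute_uv=False)[0]  # r_B(lambda~, Y), cf. Remark 2
s = np.linalg.svd(Bl, compute_uv=False)[n - p - 1]   # sigma_{n-p}(B(lambda~))

sin_YZ = np.linalg.norm((np.eye(n) - Z @ Z.conj().T) @ Y, 2)  # sin angle(Y, Z)
assert sin_YZ <= r / s + 1e-12                  # Theorem 3.4
```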


3.4. The case $p \ge 1$, $\mu_1 \ne \lambda_1, \ldots, \mu_p \ne \lambda_p$

Suppose one approximates a $p$-dimensional invariant subspace $Z$ of $B \in \mathbb{C}^{n \times n}$ by some other $p$-dimensional subspace $Y$. Also, suppose $\tilde\mu = (\mu_1, \ldots, \mu_p)$ are approximations to the corresponding eigenvalues $\tilde\lambda = (\lambda_1, \ldots, \lambda_p)$. Next we bound $\sin \angle(Y, Z)$ in terms of the residual of $(\tilde\mu, Y)$, in terms of $\epsilon \equiv \|B(\tilde\mu) - B(\tilde\lambda)\|$ (which represents the goodness of the approximation $\tilde\mu$ to $\tilde\lambda$), and in terms of the singular-value separation measure $\sigma_{n-p}(B(\tilde\mu))$. Before stating Theorem 3.5, we need the following lemma.

Lemma 3.1. For a self-adjoint matrix $M \in \mathbb{C}^{n \times n}$, let $\alpha_1 \ge \cdots \ge \alpha_n$ be its eigenvalues. Also, let $Z$ be the invariant subspace corresponding to $\alpha_{n-p+1}, \ldots, \alpha_n$. Then, for any vector $y$,
$$\sin \angle(y, Z) \le \frac{\|My\|}{\alpha_{n-p}}.$$

Proof. For some $z \in Z$ and $w \perp Z$, $y = z \cos\theta + w \sin\theta$, where $\theta = \angle(y, z) = \angle(y, Z)$. Then, as for self-adjoint matrices the orthogonal complement of an invariant subspace is invariant itself,
$$\|My\|^2 = \|Mz \cos\theta\|^2 + \|Mw \sin\theta\|^2 \ge \|Mw\|^2 \sin^2\theta,$$
or $\sin\theta \le \|My\| / \|Mw\|$. By (1), the proof is completed. □

Theorem 3.5.
$$\sin \angle(Y, Z) \le \frac{r_B(\tilde\mu, Y) + \epsilon}{\left[\, \sigma_{n-p}^2(B(\tilde\mu)) - \epsilon^2 \|B(\tilde\mu)\|^2 / \sigma_{n-p}^2(B(\tilde\mu)) \,\right]^{1/2}}.$$

Proof. For $y \in Y$, write $y = z \cos\theta + w \sin\theta$ for $z \in Z$ and $w \perp Z$. Then, as $B(\tilde\mu) w \sin\theta = B(\tilde\mu) y - B(\tilde\mu) z \cos\theta$, by the triangle inequality one gets that

(6) $\sin\theta \le \dfrac{\|B(\tilde\mu) y\| + \|B(\tilde\mu) z\|}{\|B(\tilde\mu) w\|}$.

Now, let $\tilde Z$ be the invariant subspace belonging to the $p$ smallest eigenvalues of $B^H(\tilde\mu) B(\tilde\mu)$. Then, by Lemma 3.1,

(7) $\sin \angle(Z, \tilde Z) \le \sup_{z \in Z} \dfrac{\|B^H(\tilde\mu) B(\tilde\mu) z\|}{\sigma_{n-p}^2(B(\tilde\mu))} = \sup_{z \in Z} \dfrac{\|B^H(\tilde\mu) [B(\tilde\mu) - B(\tilde\lambda)] z\|}{\sigma_{n-p}^2(B(\tilde\mu))} \le \dfrac{\epsilon \, \|B(\tilde\mu)\|}{\sigma_{n-p}^2(B(\tilde\mu))}$.

Write $w = \tilde z \sin\varphi + v \cos\varphi$ with $\tilde z \in \tilde Z$, $v \perp \tilde Z$ and $\|v\| = 1$, and note that $\sin\varphi \le \sin \angle(Z, \tilde Z)$. Then, as in the proof of Theorem 3.3,
$$\|B(\tilde\mu) w\|^2 = \|B(\tilde\mu)(\tilde z \sin\varphi + v \cos\varphi)\|^2 = \|B(\tilde\mu) \tilde z\|^2 \sin^2\varphi + \|B(\tilde\mu) v\|^2 \cos^2\varphi \ge (1 - \sin^2\varphi) \, \|B(\tilde\mu) v\|^2 \ge (1 - \sin^2\varphi) \, \sigma_{n-p}^2(B(\tilde\mu)),$$
where the last inequality follows from (1). Then, by noting that $\sin\varphi \le \sin \angle(Z, \tilde Z)$ and using (7), one gets that

(8) $\|B(\tilde\mu) w\|^2 \ge \sigma_{n-p}^2(B(\tilde\mu)) - \dfrac{\epsilon^2 \|B(\tilde\mu)\|^2}{\sigma_{n-p}^2(B(\tilde\mu))}$.

Finally, combining the inequalities (6) and (8), together with $\|B(\tilde\mu) y\| \le r_B(\tilde\mu, Y)$ and $\|B(\tilde\mu) z\| = \|[B(\tilde\mu) - B(\tilde\lambda)] z\| \le \epsilon$, completes the proof. □

Remark 3. Similarly to Remark 2, note that if $Y$ stands also for an orthonormal matrix whose columns span $Y$, then $r_B(\tilde\mu, Y) = \sigma_1(B(\tilde\mu) Y)$.
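The bound of Theorem 3.5 can be checked numerically as well. In the following sketch (our own illustration, not part of the paper, assuming numpy), the perturbations are kept small so that the expression under the square root stays positive:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 6, 2
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
lams, V = np.linalg.eig(B)
Z = np.linalg.qr(V[:, :p])[0]                   # invariant subspace
mus = lams[:p] + 1e-6                           # mu~: perturbed eigenvalues
Y = np.linalg.qr(Z + 1e-6 * rng.standard_normal((n, p)))[0]

def poly(B, shifts):                            # B(.) as defined in Sect. 2
    M = np.eye(B.shape[0], dtype=complex)
    for s in shifts:
        M = (B - s * np.eye(B.shape[0])) @ M
    return M

Bmu, Blam = poly(B, mus), poly(B, lams[:p])
eps = np.linalg.norm(Bmu - Blam, 2)             # epsilon
r = np.linalg.svd(Bmu @ Y, compute_uv=False)[0] # r_B(mu~, Y) = sigma_1(B(mu~)Y)
s = np.linalg.svd(Bmu, compute_uv=False)[n - p - 1]  # sigma_{n-p}(B(mu~))

bound = (r + eps) / np.sqrt(s**2 - (eps * np.linalg.norm(Bmu, 2) / s)**2)
sin_YZ = np.linalg.norm((np.eye(n) - Z @ Z.conj().T) @ Y, 2)
assert sin_YZ <= bound + 1e-12                  # Theorem 3.5
```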

The bound of Theorem 3.5 is a monotone function of $\epsilon = \|B(\tilde\mu) - B(\tilde\lambda)\|$. Next, in Theorem 3.6, we bound $\epsilon$ in terms of $\max_{1 \le i \le p} |\lambda_i - \mu_i|$, which is denoted next by $\delta$.

Theorem 3.6. Let $\beta$ be a uniform bound on $\|B - \mu_i\|$, $1 \le i \le p$. Then
$$\epsilon \le (\beta + \delta)^p - \beta^p.$$

Proof. First note that, since all the factors commute,
$$\epsilon = \Big\| \prod_{i=1}^{p} (B - \lambda_i) - \prod_{i=1}^{p} (B - \mu_i) \Big\| = \Big\| \sum_{m=1}^{p} \; \sum_{1 \le i_1 < \cdots < i_m \le p} \; \prod_{j=1}^{m} (\mu_{i_j} - \lambda_{i_j}) \prod_{i \notin \{i_1, \ldots, i_m\}} (B - \mu_i) \Big\| \le \sum_{m=1}^{p} \binom{p}{m} \delta^m \beta^{p-m} = (\beta + \delta)^p - \beta^p. \quad \Box$$
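A numerical check of Theorem 3.6 (our own sketch, assuming numpy) reads:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 6, 3
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
lams = np.linalg.eigvals(B)[:p]
mus = lams + 1e-3 * rng.standard_normal(p)      # approximate eigenvalues

def poly(B, shifts):
    M = np.eye(n, dtype=complex)
    for s in shifts:
        M = (B - s * np.eye(n)) @ M
    return M

eps = np.linalg.norm(poly(B, mus) - poly(B, lams), 2)            # epsilon
beta = max(np.linalg.norm(B - mu * np.eye(n), 2) for mu in mus)  # beta
delta = np.abs(lams - mus).max()                                 # delta
assert eps <= (beta + delta)**p - beta**p + 1e-12                # Theorem 3.6
```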
