Technical Report 2016-01, http://www.uta.edu/math/preprint/
Error Bounds for Approximate Deflating Subspaces for Linear Response Eigenvalue Problems

Wei-guo Wang∗    Lei-Hong Zhang†    Ren-Cang Li‡

January 6, 2016

Dedicated to Professor Rajendra Bhatia on the occasion of his 65th birthday

∗ School of Mathematical Sciences, Ocean University of China, Qingdao, 266100, People's Republic of China. Email: [email protected]. Supported in part by the National Natural Science Foundation of China Grant 11371333 and the Shandong Province Natural Science Foundation Grant ZR2013AM025.
† School of Mathematics and Shanghai Key Laboratory of Financial Information Technology, Shanghai University of Finance and Economics, 777 Guoding Road, Shanghai 200433, China. Email: [email protected]. Supported in part by the National Natural Science Foundation of China NSFC-11371102, and the Basic Academic Discipline Program, the 11th five-year plan of 211 Project for Shanghai University of Finance and Economics.
‡ Department of Mathematics, University of Texas at Arlington, P.O. Box 19408, Arlington, TX 76019. Email: [email protected]. Supported in part by NSF grants DMS-1317330 and CCF-1527104, and NSFC grant 11428104.

Abstract
Consider the linear response eigenvalue problem (LREP) for H = \begin{bmatrix} 0 & K \\ M & 0 \end{bmatrix}, where K and M are positive semidefinite and one of them is definite. Given a pair of approximate deflating subspaces of {K, M}, it can be shown that LREP can be transformed into one for H̃ that is nearly decoupled into two smaller LREPs, upon congruence transformations on K and M that preserve the eigenvalues of H. In this paper, we establish a bound on how far the pair of approximate deflating subspaces is from a pair of exact ones, using the closeness of H̃ from being decoupled.
Key words. linear response eigenvalue problem, random phase approximation, perturbation, deflating subspaces

AMS subject classifications. 15A42, 65F15
1 Introduction
In computational quantum chemistry and physics, the so-called random phase approximation (RPA) describes the excitation states (energies) of physical systems in the study of collective motion of many-particle systems [1, 12, 13]. It has important applications in silicon nanoparticles, nanoscale materials, and the analysis of interstellar clouds [1, 2]. One important question in RPA is to compute a few eigenpairs associated with the smallest positive eigenvalues of the following eigenvalue problem:

    H w := \begin{bmatrix} A & B \\ −B & −A \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = λ \begin{bmatrix} u \\ v \end{bmatrix},    (1.1)

where A, B ∈ R^{n×n} are both symmetric matrices and \begin{bmatrix} A & B \\ B & A \end{bmatrix} is positive definite. Through a similarity transformation, this eigenvalue problem can be equivalently transformed into [1, 2, 4]

    H z := \begin{bmatrix} 0 & K \\ M & 0 \end{bmatrix} \begin{bmatrix} y \\ x \end{bmatrix} = λ \begin{bmatrix} y \\ x \end{bmatrix},    (1.2)
where K = A − B and M = A + B. This eigenvalue problem is also referred to as the linear response eigenvalue problem (LREP) [1, 4, 15], and will be in this paper, too. The condition imposed upon A and B in (1.1) implies that both K and M are symmetric and positive definite [1]. But in the rest of this paper, unless otherwise explicitly stated, we relax the positive definiteness of both K and M to that both are positive semidefinite and one of them is definite.

An important notion for LREP [1] is the so-called pair of deflating subspaces {U, V}, by which we mean that both U and V are subspaces of R^n and satisfy

    KU ⊆ V  and  MV ⊆ U.
More discussion on this is given in section 3. The notion is a generalization of the concept of an invariant subspace (or eigenspace) in the standard eigenvalue problem, adapted to the special structure of the LREP (1.2). It is not only vital in analyzing theoretical properties such as the subspace version [1] of Thouless's minimization principle [12] and the Cauchy-like interlacing inequalities [2], but also fundamental for several rather efficient algorithms, e.g., the Locally Optimal Block Preconditioned 4D Conjugate Gradient Method (LOBP4DCG) [2] and its space-extended variation [3], the block Chebyshev-Davidson method [11], as well as the generalized Lanczos method [9, 14, 15]. Each of these algorithms generates a sequence of approximate deflating subspace pairs {U_j, V_j} that hopefully converge to, or contain subspaces near, the pair of deflating subspaces of interest. Therefore, it is important to establish relationships between the accuracy in eigenvalue approximations and the distances from the exact deflating subspaces to their approximate ones. Analogously to error estimates for approximate invariant subspaces in the standard and generalized eigenvalue problems [5, 6, 7], in this paper we establish error bounds for approximate deflating subspaces of LREP (1.2).

The rest of the paper is organized as follows. In section 2, we collect some known results for LREP that are essential to our later development. Section 3 discusses the most important property of a pair of deflating subspaces: it decouples H into two smaller LREPs upon congruence transformations on K and M that preserve the eigenvalues of H. In section 4, we present a way to construct a pair of exact deflating subspaces that will decouple the LREP, given a pair of approximate deflating subspaces. Our main results on the existence of the pair of exact deflating subspaces and how far it is from the pair of approximate ones are given in section 5.

Notation. K^{n×m} is the set of all n × m matrices whose entries belong to the number field K, K^n = K^{n×1}, and K = K^1, where K = R (the set of real numbers) or C (the set of complex numbers). I_n (or simply I if its dimension is clear from the context) denotes the n × n identity matrix. All vectors are column vectors and are in boldface. For a matrix Z,

1. Z^T denotes its transpose;
2. R(Z) is Z's column space, spanned by its column vectors;
3. ‖Z‖_2 and ‖Z‖_F are the spectral norm and the Frobenius norm, respectively;
4. Z's submatrices Z_{(k:ℓ, i:j)}, Z_{(k:ℓ, :)}, and Z_{(:, i:j)} consist of the intersections of rows k to ℓ and columns i to j, rows k to ℓ, and columns i to j, respectively;
5. when Z is square, its eigenvalue set is denoted by eig(Z).
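Before moving on, the setup (1.1)-(1.2) is easy to experiment with numerically. The following NumPy sketch is ours, not part of the paper's development: A and B are randomly generated placeholders arranged so that \begin{bmatrix} A & B \\ B & A \end{bmatrix} is positive definite, and the script observes the ± pairing of the eigenvalues of H recalled in section 2.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# Hypothetical test data: symmetric B, and A = B B^T + n I, so that both
# K = A - B and M = A + B are positive definite (hence [A, B; B, A] > 0).
B = rng.standard_normal((n, n)); B = (B + B.T) / 2
A = B @ B.T + n * np.eye(n)

K, M = A - B, A + B
H = np.block([[np.zeros((n, n)), K],
              [M, np.zeros((n, n))]])

# H is not symmetric, but its eigenvalues are real here; drop the O(eps)
# imaginary parts and check the pairing -lambda_n <= ... <= lambda_n.
ev = np.sort(np.linalg.eigvals(H).real)
print(np.allclose(ev, -ev[::-1]))   # True
```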
2 Preliminaries
Recall that in LREP (1.2) we assume, in what follows, that both K and M are positive semidefinite and one of them is definite. Many theoretical properties of LREP have been established in [1, 2]. In particular, LREP (1.2) has only real eigenvalues, coming in pairs {λ, −λ}. Let the eigenvalues be

    −λ_n ≤ · · · ≤ −λ_1 ≤ λ_1 ≤ · · · ≤ λ_n.    (2.1)

Theorem 2.1 presents decompositions of K and M that are necessary for our developments later in this paper.

Theorem 2.1. Suppose that K is semidefinite and M is definite. Then the following statements are true:

(i) There exists a nonsingular Φ ∈ R^{n×n} such that

    K = Ψ Λ² Ψ^T  and  M = Φ Φ^T,    (2.2)

where Λ = diag(λ_1, λ_2, . . . , λ_n), the λ_i are as in (2.1), and Ψ = Φ^{−T}.

(ii) If K is also definite, then all λ_i > 0 and H is diagonalizable:

    H \begin{bmatrix} ΨΛ & ΨΛ \\ Φ & −Φ \end{bmatrix} = \begin{bmatrix} ΨΛ & ΨΛ \\ Φ & −Φ \end{bmatrix} \begin{bmatrix} Λ & 0 \\ 0 & −Λ \end{bmatrix}.    (2.3)

(iii) The eigen-decompositions of KM and MK are

    (KM) Ψ = Ψ Λ²  and  (MK) Φ = Φ Λ²,    (2.4)

respectively.

LREP (1.2) is equivalent to either of the following product eigenvalue problems [1]:

    KM y = λ² y,    (2.5a)
    MK x = λ² x.    (2.5b)
In theory, solving any one of them leads to solutions of the other two. It is tempting to solve either (2.5a) or (2.5b) because they are half the size of the original LREP (1.2). But numerically it is preferable to solve (1.2) directly, for a couple of reasons. First, KM and MK are no longer symmetric, and thus spurious complex eigenvalues could show up due to rounding errors; second, if carefully designed, solving (1.2) directly costs about the same as solving either (2.5a) or (2.5b), even though the former is twice the size of the latter. Nonetheless, for theoretical analysis there are times when it is more convenient to deal with either (2.5a) or (2.5b). For example, in seeking transformations that bring K and M to "simpler forms" while preserving the eigenvalues of these problems, working with either KM or MK seems more convenient. In fact, in order not to change the eigenvalues of KM, we may perform

    K → Z^{−1} K X,  M → X^{−1} M Z

to get

    KM → Z^{−1} (KM) Z,

which, on H, becomes

    H → \begin{bmatrix} 0 & Z^{−1} K X \\ X^{−1} M Z & 0 \end{bmatrix} = \begin{bmatrix} Z & 0 \\ 0 & X \end{bmatrix}^{−1} H \begin{bmatrix} Z & 0 \\ 0 & X \end{bmatrix}.

But such transformations destroy the symmetry in K and M, not to mention their definiteness. Instead, we prefer symmetry- and definiteness-preserving transformations:

    K → X^T K X,  M → Y^T M Y,

with the constraint X Y^T = I_n to ensure that the eigenvalues of KM before and after the transformation remain the same. In fact, under X Y^T = I_n, we have

    KM → X^T K X · Y^T M Y = X^T (KM) Y = Y^{−1} (KM) Y

and, on H (note that X Y^T = I_n implies X^T = Y^{−1}),

    H → \begin{bmatrix} 0 & X^T K X \\ Y^T M Y & 0 \end{bmatrix} = \begin{bmatrix} X & 0 \\ 0 & Y \end{bmatrix}^T H \begin{bmatrix} Y & 0 \\ 0 & X \end{bmatrix} = \begin{bmatrix} Y & 0 \\ 0 & X \end{bmatrix}^{−1} H \begin{bmatrix} Y & 0 \\ 0 & X \end{bmatrix},

a similarity transformation.
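As a quick numerical sanity check (our illustration, not from the paper), one can verify that the congruence transformations K → X^T K X, M → Y^T M Y with X Y^T = I_n leave the eigenvalues of H unchanged; any nonsingular X with Y = X^{−T} satisfies the constraint.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4

def rand_spd(n):
    # Hypothetical data: a random symmetric positive definite matrix.
    S = rng.standard_normal((n, n))
    return S @ S.T + n * np.eye(n)

K, M = rand_spd(n), rand_spd(n)
H = np.block([[np.zeros((n, n)), K], [M, np.zeros((n, n))]])

X = rng.standard_normal((n, n)) + n * np.eye(n)   # almost surely nonsingular
Y = np.linalg.inv(X).T                            # so that X Y^T = I_n

Kt = X.T @ K @ X        # congruences: symmetry and definiteness preserved
Mt = Y.T @ M @ Y
Ht = np.block([[np.zeros((n, n)), Kt], [Mt, np.zeros((n, n))]])

ev  = np.sort(np.linalg.eigvals(H).real)
evt = np.sort(np.linalg.eigvals(Ht).real)
print(np.allclose(ev, evt))    # True: eig(H) is preserved
```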
3 Deflating subspaces
Recall that {U, V} is a pair of deflating subspaces of {K, M} if

    KU ⊆ V  and  MV ⊆ U.    (3.1)
Let U ∈ R^{n×k} and V ∈ R^{n×k} be basis matrices for U and V, respectively. Alternatively, (3.1) can be restated as: there exist K_R ∈ R^{k×k} and M_R ∈ R^{k×k} such that

    K U = V K_R  and  M V = U M_R,

or equivalently,

    H \begin{bmatrix} V & 0 \\ 0 & U \end{bmatrix} = \begin{bmatrix} V & 0 \\ 0 & U \end{bmatrix} H_R  with  H_R := \begin{bmatrix} 0 & K_R \\ M_R & 0 \end{bmatrix},    (3.2)

i.e., V ⊕ U is an invariant subspace of H [1, Theorem 2.4]. Two particular choices of {K_R, M_R} to satisfy (3.2) are

    K_R = (U^T V)^{−1} U^T K U,  M_R = (V^T U)^{−1} V^T M V;    (3.3a)
    K_R = (V^T V)^{−1} V^T K U,  M_R = (U^T U)^{−1} U^T M V.    (3.3b)
In (3.3a), it is assumed that U^T V is invertible, which is guaranteed if one of K and M is definite [1, Lemma 2.7]. Since either choice makes (3.2) hold, the associated H_R with (3.3a) or with (3.3b) must have the same eigenvalues. The pair {K_R, M_R} given by (3.3a) relates to the structure-preserving projection H_SR of H in [2, (2.2)] that plays an important role numerically there. To highlight this particular pair, in [16] we called {K_R, M_R} given by (3.3a) a Rayleigh quotient pair of LREP (1.2) associated with {U, V} and introduced

    K_RQ := (U^T V)^{−1} U^T K U,  M_RQ := (V^T U)^{−1} V^T M V    (3.4)

for the ease of future reference. Both K_RQ and M_RQ vary with different selections of U and V as the basis matrices of U and V, respectively. But the eigenvalues of the induced

    H_RQ = \begin{bmatrix} 0 & K_RQ \\ M_RQ & 0 \end{bmatrix}    (3.5)

do not. In the case when U^T V = I_k, both K_RQ and M_RQ are symmetric and have the same definiteness properties as their corresponding K and M. But when U^T V ≠ I_k, if one of K and M is definite [1, Lemma 2.7], we can always "normalize" the basis matrices so that U^T V is I_k. For example, we factorize U^T V as U^T V = W_1^T W_2, where W_1, W_2 ∈ R^{k×k} are nonsingular; then U W_1^{−1} and V W_2^{−1} are new basis matrices of U and V, respectively, and satisfy (U W_1^{−1})^T (V W_2^{−1}) = I_k. For this reason, we frequently select basis matrices for a pair of deflating subspaces to have this property.

Theorem 3.1. Let {U, V} be a pair of deflating subspaces of {K, M} with dim(U) = dim(V) = k, and let the columns of U_1 and V_1 form bases for U and V, respectively, satisfying U_1^T V_1 = I_k. Then there are nonsingular matrices [U_1, U_2] and [V_1, V_2] such that [U_1, U_2]^T [V_1, V_2] = I_n, and

    [U_1, U_2]^T K [U_1, U_2] = \begin{bmatrix} K_1 & 0 \\ 0 & K_2 \end{bmatrix},  [V_1, V_2]^T M [V_1, V_2] = \begin{bmatrix} M_1 & 0 \\ 0 & M_2 \end{bmatrix},    (3.6)

where the leading diagonal blocks are k × k and the trailing ones are (n − k) × (n − k). Furthermore, {R(U_2), R(V_2)} is also a pair of deflating subspaces, of dimension n − k, and

    eig(H) = eig(H_1) ∪ eig(H_2),

where H_i = \begin{bmatrix} 0 & K_i \\ M_i & 0 \end{bmatrix} for i = 1, 2.
Proof. Let the columns of U_2 ∈ R^{n×(n−k)} and V_2 ∈ R^{n×(n−k)} be basis matrices of the orthogonal complements of V and U, respectively. In particular, U_2^T V_1 = V_2^T U_1 = 0. Then we have U_2^T K U_1 = 0 and V_2^T M V_1 = 0 (cf. (3.2)), which implies that [U_1, U_2]^T K [U_1, U_2] and [V_1, V_2]^T M [V_1, V_2] have the form (3.6).

We claim that both [U_1, U_2] and [V_1, V_2] are nonsingular. Consider [U_1, U_2]. It is sufficient to show that U_1 y = U_2 z implies y = 0 and z = 0. We have y = V_1^T U_1 y = V_1^T U_2 z = 0 since V_1^T U_2 = 0. Thus U_2 z = 0, which implies z = 0 because U_2 is a basis matrix of the orthogonal complement of V. Similarly, [V_1, V_2] is also nonsingular, and so is

    [U_1, U_2]^T [V_1, V_2] = \begin{bmatrix} I_k & 0 \\ 0 & U_2^T V_2 \end{bmatrix}.

Thus U_2^T V_2 is nonsingular. As we commented just before stating this theorem, we can always "normalize" so that U_2^T V_2 = I_{n−k}, which we assume for the rest of this proof. Now use (3.6) to get

    K [U_1, U_2] = [V_1, V_2] \begin{bmatrix} K_1 & 0 \\ 0 & K_2 \end{bmatrix}  ⇒  K U_2 = V_2 K_2,

and similarly M V_2 = U_2 M_2. Hence {R(U_2), R(V_2)} is a pair of deflating subspaces of dimension n − k.

Corollary 3.1. Let M be positive definite, and let {U, V} be a pair of deflating subspaces with dim(U) = dim(V) = k. Then there are nonsingular matrices U and V such that U V^T = I_n, and

    U^T K U = \begin{bmatrix} K_1 & 0 \\ 0 & K_2 \end{bmatrix},  V^T M V = \begin{bmatrix} I_k & 0 \\ 0 & I_{n−k} \end{bmatrix}.    (3.7)

Proof. In Theorem 3.1, let M_1 = R_1^T R_1 and M_2 = R_2^T R_2 be Cholesky decompositions; R_i for i = 1, 2 are nonsingular. Set

    U := [U_1, U_2] \begin{bmatrix} R_1^T & 0 \\ 0 & R_2^T \end{bmatrix},  V := [V_1, V_2] \begin{bmatrix} R_1^{−1} & 0 \\ 0 & R_2^{−1} \end{bmatrix}

to complete the proof.
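For concreteness, here is a short sketch (ours, for illustration only) of forming the Rayleigh quotient pair (3.4) from given basis matrices. The function name is hypothetical, and the particular normalization of U^T V = I_k corresponds to taking W_1 = I and W_2 = U^T V in the factorization U^T V = W_1^T W_2; it requires U^T V to be invertible, which holds if one of K and M is definite [1, Lemma 2.7].

```python
import numpy as np

def rayleigh_quotient_pair(K, M, U, V):
    """Sketch of (3.4): normalize the bases so that U^T V = I_k, then form
    K_RQ, M_RQ (symmetric after this normalization) and H_RQ of (3.5)."""
    C = U.T @ V                      # assumed invertible
    V = V @ np.linalg.inv(C)         # now U^T V = I_k
    K_RQ = U.T @ K @ U               # = (U^T V)^{-1} U^T K U
    M_RQ = V.T @ M @ V               # = (V^T U)^{-1} V^T M V
    k = K_RQ.shape[0]
    H_RQ = np.block([[np.zeros((k, k)), K_RQ],
                     [M_RQ, np.zeros((k, k))]])
    return K_RQ, M_RQ, H_RQ
```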
4 Approximate Deflating Subspaces
Theorem 3.1 says that if {U, V} is a pair of deflating subspaces, then both K and M can be block-diagonalized as in (3.6); the converse is also true. Because [U_1, U_2]^T [V_1, V_2] = I_n, the congruence transformations on K and M there preserve eigenvalues, as we argued at the end of section 2.

Now suppose {U, V} is only a pair of approximate deflating subspaces; then we cannot expect something like (3.6). In fact, examining the proof of Theorem 3.1, we see that we no longer have U_2^T K U_1 = 0 and V_2^T M V_1 = 0, except that both U_2^T K U_1 and V_2^T M V_1 are expected to be small in magnitude. Thus, again following the proof, we will have [U_1, U_2]^T [V_1, V_2] = I_n and

    [U_1, U_2]^T K [U_1, U_2] = \begin{bmatrix} K_1 & K_{21}^T \\ K_{21} & K_2 \end{bmatrix} =: K̂,    (4.1a)
    [V_1, V_2]^T M [V_1, V_2] = \begin{bmatrix} M_1 & M_{21}^T \\ M_{21} & M_2 \end{bmatrix} =: M̂,    (4.1b)

with tiny ‖K_{21}‖_2 and ‖M_{21}‖_2, where U_1 and V_1 are basis matrices of U and V, respectively, the (1,1) blocks are k × k, and the (2,2) blocks are (n − k) × (n − k). Naturally, we expect that there is a pair of exact deflating subspaces near {U, V}.

Recently in [10], the authors considered the effect of setting both K_{21} and M_{21} to 0 on the eigenvalues; perturbation bounds in terms of the squares of ‖K_{21}‖_2 and ‖M_{21}‖_2 were obtained.
The main purpose of this paper, however, is to quantify how far {U, V} is from a pair of exact deflating subspaces. This section sets the stage for our main result in section 5.

For K̂ and M̂ given by (4.1), let

    X̂ = \begin{bmatrix} I & −Q^T \\ P & I \end{bmatrix},  Ŷ = \begin{bmatrix} I & −P^T \\ Q & I \end{bmatrix},    (4.2)

where P, Q ∈ R^{(n−k)×k} are to be determined such that the off-diagonal blocks of

    X̂^T K̂ X̂ = \begin{bmatrix} I & P^T \\ −Q & I \end{bmatrix} \begin{bmatrix} K_1 & K_{21}^T \\ K_{21} & K_2 \end{bmatrix} \begin{bmatrix} I & −Q^T \\ P & I \end{bmatrix} =: \begin{bmatrix} K̂_1 & K̂_{21}^T \\ K̂_{21} & K̂_2 \end{bmatrix},    (4.3a)
    Ŷ^T M̂ Ŷ = \begin{bmatrix} I & Q^T \\ −P & I \end{bmatrix} \begin{bmatrix} M_1 & M_{21}^T \\ M_{21} & M_2 \end{bmatrix} \begin{bmatrix} I & −P^T \\ Q & I \end{bmatrix} =: \begin{bmatrix} M̂_1 & M̂_{21}^T \\ M̂_{21} & M̂_2 \end{bmatrix}    (4.3b)

become 0, i.e., making K̂_{21} = M̂_{21} = 0. Calculations give

    K̂_{21} = K_{21} − Q K_1 + K_2 P − Q K_{21}^T P,
    M̂_{21} = M_{21} − P M_1 + M_2 Q − P M_{21}^T Q.

Thus to make K̂_{21} = M̂_{21} = 0, we need P and Q to satisfy

    Q K_1 − K_2 P = K_{21} − Q K_{21}^T P,    (4.4a)
    P M_1 − M_2 Q = M_{21} − P M_{21}^T Q.    (4.4b)

Suppose we have found such P and Q, and suppose also ‖P^T Q‖_2 < 1. Then

    X̂ Ŷ^T = (Ŷ X̂^T)^T = \begin{bmatrix} I + Q^T P & 0 \\ 0 & I + P Q^T \end{bmatrix} =: D

is nonsingular; so are X̂ and Ŷ. Now set¹

    X = X̂ D^{−1},  Y = Ŷ.    (4.5)

It can be verified that X Y^T = X̂ D^{−1} Ŷ^T = I and

    X^T K̂ X = \begin{bmatrix} K̃_1 & 0 \\ 0 & K̃_2 \end{bmatrix} =: K̃,  Y^T M̂ Y = \begin{bmatrix} M̃_1 & 0 \\ 0 & M̃_2 \end{bmatrix} =: M̃,    (4.6a)

where

    K̃_1 = (K_1 + K_{21}^T P)(I + Q^T P)^{−1},    (4.6b)
    M̃_1 = (I + Q^T P)(M_1 + M_{21}^T Q),    (4.6c)
    K̃_2 = (I + Q P^T)^{−1}(K_2 − Q K_{21}^T),    (4.6d)
    M̃_2 = (M_2 − P M_{21}^T)(I + Q P^T).    (4.6e)

¹ This is only one choice among many. Other choices will also work, e.g., X = X̂, Y = Ŷ D^{−T}, or X = X̂ D^{−1/2}, Y = Ŷ (D^{−1/2})^T.
These expressions do not look as if they come from (4.3); in particular, K̃_i and M̃_i do not look symmetric, but they are. This is because we have used (4.4). In fact, we have

    K̂_1 = K_1 + P^T K_{21} + K_{21}^T P + P^T K_2 P    (clearly symmetric)
        = K_1 + K_{21}^T P + P^T (K_{21} + K_2 P)
        = K_1 + K_{21}^T P + P^T (Q K_1 + Q K_{21}^T P)    (use (4.4a))
        = (I + P^T Q)(K_1 + K_{21}^T P),

and hence

    K̃_1 = (I + P^T Q)^{−1} K̂_1 (I + Q^T P)^{−1} = (K_1 + K_{21}^T P)(I + Q^T P)^{−1}.

Because X Y^T = I, K̂ M̂ and K̃ M̃ have the same eigenvalues (and hence so does KM, by the congruences in (4.1)), and so do H and

    H̃ = \begin{bmatrix} 0 & K̃ \\ M̃ & 0 \end{bmatrix}.

Our analysis so far suggests the importance of the system (4.4) of equations in P and Q. In preparation for studying the system, in the next subsection we introduce a linear operator and investigate its invertibility. Much of the development in the next two subsections bears similarity to Stewart [6, 7].
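The construction (4.2)-(4.6) is mechanical once P and Q are available. The following sketch (ours; the function name is hypothetical) assembles X = X̂ D^{−1} and Y = Ŷ from trial P and Q and returns the transformed pair; a solver for (4.4) itself is sketched after Lemma 5.2 below.

```python
import numpy as np

def transform(Khat, Mhat, P, Q):
    """Sketch of (4.2)-(4.6): given trial P, Q in R^{m x k}, form
    X = Xhat D^{-1}, Y = Yhat, and return X^T Khat X and Y^T Mhat Y.
    If P, Q solve (4.4), both results are block diagonal."""
    m, k = P.shape
    Ik, Im = np.eye(k), np.eye(m)
    Xhat = np.block([[Ik, -Q.T], [P, Im]])
    Yhat = np.block([[Ik, -P.T], [Q, Im]])
    D = np.block([[Ik + Q.T @ P, np.zeros((k, m))],
                  [np.zeros((m, k)), Im + P @ Q.T]])
    X = Xhat @ np.linalg.inv(D)
    Y = Yhat
    assert np.allclose(X @ Y.T, np.eye(k + m))   # X Y^T = I, as in (4.5)
    return X.T @ Khat @ X, Y.T @ Mhat @ Y
```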
5 Error bounds for approximate deflating subspaces
The most important outcome of the analysis in section 4 is that if the equations in (4.4) hold, then {R(U_1 + U_2 P), R(V_1 + V_2 Q)} is an exact pair of deflating subspaces of {K, M}. In this section, we bound the difference between the pair of approximate deflating subspaces {R(U), R(V)} and this exact pair. To this end, we first investigate an operator and then use the result of the investigation to derive bounds for the difference.
5.1 The operator 𝓛(P, Q) = (Q K_1 − K_2 P, P M_1 − M_2 Q)

Let K_1, M_1 ∈ R^{k×k} and K_2, M_2 ∈ R^{m×m}, not necessarily symmetric. Define the linear operator 𝓛 : R^{m×k} × R^{m×k} → R^{m×k} × R^{m×k} by

    𝓛(P, Q) = (Q K_1 − K_2 P, P M_1 − M_2 Q).    (5.1)

In this subsection, we investigate the nonsingularity of 𝓛. The results may be of independent interest and will be used later in showing the solvability of the system (4.4) for P and Q and in establishing bounds on P and Q. Although in (4.4) the K_i and M_i are symmetric, as a result of their origin in LREP (1.2), this assumption is not required here; so this subsection is slightly more general than we will need later. On the other hand, dropping the symmetry assumption does not increase the complexity of our development.

The following theorem succinctly states a nonsingularity condition for 𝓛 in terms of the eigenvalues of K_1 M_1 and K_2 M_2.
Theorem 5.1. The operator 𝓛 is invertible if and only if eig(K_1 M_1) ∩ eig(K_2 M_2) = ∅.

Proof. Suppose that eig(K_1 M_1) ∩ eig(K_2 M_2) = ∅. We need to show that 𝓛 is invertible. Since 𝓛 maps R^{m×k} × R^{m×k} to itself, it suffices to show that 𝓛(P, Q) = (0, 0) has only the trivial solution (0, 0). Suppose otherwise; without loss of generality, let Q ≠ 0 (the case P ≠ 0 is handled symmetrically). Post-multiply Q K_1 − K_2 P = 0 by M_1, pre-multiply P M_1 − M_2 Q = 0 by K_2, and add the resulting equations to get Q K_1 M_1 − K_2 M_2 Q = 0, which can only have the trivial solution Q = 0 since eig(K_1 M_1) ∩ eig(K_2 M_2) = ∅, contradicting Q ≠ 0.

Conversely², we prove that eig(K_1 M_1) ∩ eig(K_2 M_2) ≠ ∅ implies that 𝓛 is singular. From eig(K_1 M_1) ∩ eig(K_2 M_2) ≠ ∅ it follows that there is a nonzero Q̃ ∈ R^{m×k} such that Q̃ K_1 M_1 = K_2 M_2 Q̃. For the same reason, there is a nonzero W ∈ R^{k×k} such that W M_1 K_1 = K_1 M_1 W, because eig(K_1 M_1) ∩ eig(M_1 K_1) ≠ ∅. Now set P = M_2 Q̃ W and Q = Q̃ W M_1 to get P M_1 = M_2 Q and

    Q K_1 − K_2 P = Q̃ W M_1 K_1 − K_2 M_2 Q̃ W
                  = Q̃ W M_1 K_1 − Q̃ K_1 M_1 W
                  = Q̃ (W M_1 K_1 − K_1 M_1 W) = 0.

Therefore 𝓛 is singular.

In order to obtain estimates on the solution of the equation 𝓛(P, Q) = (R, S), we define a norm on R^{m×k} × R^{m×k} as follows: for a given pair (P, Q) ∈ R^{m×k} × R^{m×k},

    ‖(P, Q)‖_F := ‖[P, Q]‖_F = \sqrt{‖P‖_F² + ‖Q‖_F²},

and accordingly define the dis function, in analogy with the functions sep and dif introduced in Stewart [5, 6]:

    dis(K_1, M_1; K_2, M_2) := min_{‖(P,Q)‖_F = 1} ‖𝓛(P, Q)‖_F.    (5.2)

Since 𝓛 is a linear operator that maps a finite-dimensional vector space to itself, by Theorem 5.1 we have

    dis(K_1, M_1; K_2, M_2) = 0  ⇔  𝓛 is singular, i.e., eig(K_1 M_1) ∩ eig(K_2 M_2) ≠ ∅.    (5.3)

If 𝓛 is nonsingular, then

    ‖(P, Q)‖_F ≤ ‖𝓛(P, Q)‖_F / dis(K_1, M_1; K_2, M_2).
We remark that the use of min in (5.2) is justified. Introduce the vec operation that turns a matrix into a column vector by stacking its columns one by one, the first column followed by the second and so on, and let ⊗ be the usual matrix Kronecker product. We have

    \begin{bmatrix} vec(Q K_1 − K_2 P) \\ vec(P M_1 − M_2 Q) \end{bmatrix} = L \begin{bmatrix} vec(P) \\ vec(Q) \end{bmatrix}  with  L = \begin{bmatrix} −(I ⊗ K_2) & K_1^T ⊗ I \\ M_1^T ⊗ I & −(I ⊗ M_2) \end{bmatrix}.    (5.4)

The matrix L is thus the matrix representation of the operator 𝓛, and

    dis(K_1, M_1; K_2, M_2) = σ_min(L),

the smallest singular value of L, attained when [vec(P)^T, vec(Q)^T]^T is the unit right singular vector corresponding to σ_min(L). In what follows, we give a few useful results about the dis(·) function.

Lemma 5.1. We have

    dis(K_1 + E_1, M_1 + F_1; K_2 + E_2, M_2 + F_2) ≥ dis(K_1, M_1; K_2, M_2) − \sqrt{‖(E_1, F_1)‖_F² + ‖(E_2, F_2)‖_F²}.    (5.5)

² We thank Dr. Weihong Yang of Fudan University for this "converse" part of the proof, which is much simpler than our original one.
Proof. Suppose dis(K_1 + E_1, M_1 + F_1; K_2 + E_2, M_2 + F_2) is attained by the pair (P, Q) with ‖(P, Q)‖_F = 1. Then

    dis(K_1 + E_1, M_1 + F_1; K_2 + E_2, M_2 + F_2)
      = ‖(Q(K_1 + E_1) − (K_2 + E_2)P, P(M_1 + F_1) − (M_2 + F_2)Q)‖_F
      = \sqrt{‖Q(K_1 + E_1) − (K_2 + E_2)P‖_F² + ‖P(M_1 + F_1) − (M_2 + F_2)Q‖_F²}
      ≥ \sqrt{‖Q K_1 − K_2 P‖_F² + ‖P M_1 − M_2 Q‖_F²} − \sqrt{‖Q E_1 − E_2 P‖_F² + ‖P F_1 − F_2 Q‖_F²}
      ≥ dis(K_1, M_1; K_2, M_2) − \sqrt{‖E_1‖_F² + ‖E_2‖_F² + ‖F_1‖_F² + ‖F_2‖_F²},

as was to be shown.
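Numerically, the matrix representation (5.4) makes dis(·) immediate to evaluate. The sketch below is ours (not from the paper's development): it forms L with Kronecker products and returns the smallest singular value.

```python
import numpy as np

def dis(K1, M1, K2, M2):
    """dis(K1, M1; K2, M2) = sigma_min(L), with L as in (5.4).
    Here K1, M1 are k x k and K2, M2 are m x m."""
    k, m = K1.shape[0], K2.shape[0]
    Ik, Im = np.eye(k), np.eye(m)
    L = np.block([[-np.kron(Ik, K2), np.kron(K1.T, Im)],
                  [ np.kron(M1.T, Im), -np.kron(Ik, M2)]])
    return np.linalg.svd(L, compute_uv=False).min()
```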
Stewart [5] introduced the function

    sep(K_1, K_2) = min_{‖P‖_F = 1} ‖P K_1 − K_2 P‖_F

to measure the separation between the operators K_1 and K_2. It was used in the perturbation analysis of approximate invariant subspaces of a matrix. The function dis(·) plays an analogous role for deflating subspaces of {K, M}. Since, when M = I, the eigenvalue problem for KM reduces to an ordinary eigenvalue problem, it is natural to look for a connection between sep(·) and dis(·).

Theorem 5.2. For all square matrices K_1 and K_2,

    (1/√2) sep(K_1, K_2) ≥ dis(K_1, I; K_2, I).    (5.6)

If also ‖K_1‖_2 ≤ 1 and ‖K_2‖_2 ≤ 1, then

    dis(K_1, I; K_2, I) ≥ (1/2) sep(K_1, K_2).    (5.7)
Proof. The proof is similar to that of [6, Theorem 4.3]. Notice

    vec(P K_1 − K_2 P) = (K_1^T ⊗ I − I ⊗ K_2) vec(P),

and let vec(P) be the unit singular vector of K_1^T ⊗ I − I ⊗ K_2 associated with its smallest singular value. Then

    ‖P‖_F = 1,  sep(K_1, K_2) = ‖P K_1 − K_2 P‖_F.

Since ‖(P, P)‖_F = √2, we have

    dis(K_1, I; K_2, I) ≤ (1/√2) ‖𝓛(P, P)‖_F = (1/√2) ‖(P K_1 − K_2 P, P − P)‖_F = (1/√2) sep(K_1, K_2),

which proves (5.6). To prove (5.7), we let (P, Q) be the optimal pair that achieves dis(K_1, I; K_2, I), i.e.,

    ‖(P, Q)‖_F = 1,  dis(K_1, I; K_2, I) = ‖𝓛(P, Q)‖_F.    (5.8)

Set E = Q K_1 − K_2 P and F = P − Q. The second equation in (5.8) says dis(K_1, I; K_2, I) = ‖(E, F)‖_F. Use E = Q K_1 − K_2 P = Q K_1 − K_2 Q − K_2 F to get ‖E‖_F ≥ ‖Q K_1 − K_2 Q‖_F − ‖K_2 F‖_F ≥ ‖Q K_1 − K_2 Q‖_F − ‖F‖_F, i.e.,

    ‖E‖_F + ‖F‖_F ≥ ‖Q K_1 − K_2 Q‖_F.    (5.9)

Let ‖P‖_F = α and ‖Q‖_F = β. Then α² + β² = 1, which implies max{α, β} ≥ 1/√2. It follows from (5.9) that

    ‖E‖_F + ‖F‖_F ≥ ‖Q K_1 − K_2 Q‖_F ≥ β sep(K_1, K_2),    (5.10a)

and similarly

    ‖E‖_F + ‖F‖_F ≥ ‖P K_1 − K_2 P‖_F ≥ α sep(K_1, K_2).    (5.10b)

Combining (5.10a) and (5.10b), we have ‖E‖_F + ‖F‖_F ≥ max{α, β} sep(K_1, K_2), and thus

    dis(K_1, I; K_2, I) = \sqrt{‖E‖_F² + ‖F‖_F²}
      ≥ (1/√2)(‖E‖_F + ‖F‖_F)
      ≥ (1/√2) max{α, β} sep(K_1, K_2)
      ≥ (1/2) sep(K_1, K_2),

as expected.

Finally, we have the following theorem.

Theorem 5.3. Let L be defined by (5.4), and suppose 𝓛 is nonsingular. Then

    1 / dis(K_1, M_1; K_2, M_2) = ‖L^{−1}‖_2.
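The inequalities (5.6) and (5.7) are easy to spot-check numerically. The following sketch is our illustration, reusing the dis routine from the previous sketch, with random K_1, K_2 scaled so that ‖K_i‖_2 ≤ 1 as (5.7) requires.

```python
import numpy as np

rng = np.random.default_rng(2)
k = 3
Ik = np.eye(k)

def sep(K1, K2):
    # sep(K1, K2) = sigma_min(K1^T kron I - I kron K2), per Stewart [5]
    S = np.kron(K1.T, Ik) - np.kron(Ik, K2)
    return np.linalg.svd(S, compute_uv=False).min()

def dis(K1, M1, K2, M2):
    # same as the sketch after Lemma 5.1, specialized to k = m
    L = np.block([[-np.kron(Ik, K2), np.kron(K1.T, Ik)],
                  [ np.kron(M1.T, Ik), -np.kron(Ik, M2)]])
    return np.linalg.svd(L, compute_uv=False).min()

K1 = rng.standard_normal((k, k)); K1 /= np.linalg.norm(K1, 2)
K2 = rng.standard_normal((k, k)); K2 /= np.linalg.norm(K2, 2)
s, d = sep(K1, K2), dis(K1, Ik, K2, Ik)
print(s / np.sqrt(2) + 1e-12 >= d, d + 1e-12 >= s / 2)   # (5.6), (5.7): True True
```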
5.2 Error bounds

We now return to LREP (1.2) with (4.1) in section 4. We would like to know when the system (4.4) has a solution pair (P, Q) with ‖P^T Q‖_2 < 1. In terms of the linear operator 𝓛 defined in (5.1), the equations in (4.4) can be combined into one nonlinear equation:

    𝓛(P, Q) = (K_{21} − Q K_{21}^T P, M_{21} − P M_{21}^T Q).    (5.11)

Lemma 5.2. Let K_i, M_i, K_{21} and M_{21} be given by (4.1), and let

    δ = ‖(K_{21}, M_{21})‖_F,  γ = dis(K_1, M_1; K_2, M_2).    (5.12)

If γ > 0 and

    κ_1 = δ²/γ² < 1/4,    (5.13)

then (5.11) has a solution pair (P, Q) ∈ R^{(n−k)×k} × R^{(n−k)×k} satisfying

    ‖(P, Q)‖_F ≤ ρ := (δ/γ)(1 + κ) < 2δ/γ,    (5.14)

where

    κ = 2κ_1 / (1 − 2κ_1 + \sqrt{1 − 4κ_1}) < 1.    (5.15)
Proof. The proof is largely similar to that of [6, Theorem 5.1]. The idea is to construct an iterative procedure to solve (5.11) for P and Q. Since γ > 0, 𝓛 is invertible. Let

    (P_0, Q_0) = 𝓛^{−1}(K_{21}, M_{21}),    (5.16)

and recursively define (P_{i+1}, Q_{i+1}) by

    (P_{i+1}, Q_{i+1}) = 𝓛^{−1}(K_{21} − Q_i K_{21}^T P_i, M_{21} − P_i M_{21}^T Q_i)
                       = (P_0, Q_0) − 𝓛^{−1}(Q_i K_{21}^T P_i, P_i M_{21}^T Q_i).    (5.17)

We will show that the sequence {(P_i, Q_i)} converges to a solution of (5.11). The first step is to bound ‖(P_i, Q_i)‖_F. From (5.16), we have

    ‖(P_0, Q_0)‖_F = ‖𝓛^{−1}(K_{21}, M_{21})‖_F ≤ δ/γ =: ρ_0 < 1/2

by (5.13). Assuming ‖(P_i, Q_i)‖_F ≤ ρ_i, we have by (5.17)

    ‖(P_{i+1}, Q_{i+1})‖_F ≤ ρ_0 + γ^{−1} δ ρ_i² =: ρ_{i+1}.

This recursively defines ρ_i for i ≥ 1. Write ρ_i = ρ_0(1 + κ_i) for i ≥ 1, which recursively defines κ_i by

    κ_1 = δρ_0/γ = δ²/γ²,    (5.18a)
    κ_{i+1} = κ_1 (1 + κ_i)²  for i ≥ 1.    (5.18b)

If (5.13) is satisfied, we see that κ_1 < 1/4 and κ_2 < (1/4)(1 + 1/4)² < 1. Suppose that κ_i < 1. Then κ_{i+1} = κ_1(1 + κ_i)² < (1/4)(1 + 1)² = 1. Thus, by induction, all κ_i < 1. On the other hand, κ_2 = κ_1(1 + κ_1)² > κ_1, and if κ_i > κ_{i−1}, then κ_{i+1} = κ_1(1 + κ_i)² > κ_1(1 + κ_{i−1})² = κ_i. This shows that the sequence {κ_i} is monotonically increasing. Hence it converges, and

    κ_i < κ := lim_{i→∞} κ_i = 2κ_1 / (1 − 2κ_1 + \sqrt{1 − 4κ_1}) < 1.

As a result, the sequence {ρ_i = ρ_0(1 + κ_i)} is monotonically increasing, too, and

    ‖(P_i, Q_i)‖_F ≤ ρ := lim_{i→∞} ρ_i = ρ_0(1 + κ) < 1.

To show that the matrix pair sequence {(P_i, Q_i)} converges as well, let ∆_i = (P_{i+1} − P_i, Q_{i+1} − Q_i) =: (∆P_i, ∆Q_i). Then

    ‖∆_i‖_F = ‖𝓛^{−1}(Q_i K_{21}^T P_i − Q_{i−1} K_{21}^T P_{i−1}, P_i M_{21}^T Q_i − P_{i−1} M_{21}^T Q_{i−1})‖_F
            ≤ γ^{−1} ‖(∆Q_{i−1} K_{21}^T P_i + Q_{i−1} K_{21}^T ∆P_{i−1}, ∆P_{i−1} M_{21}^T Q_i + P_{i−1} M_{21}^T ∆Q_{i−1})‖_F
            ≤ 2γ^{−1} δ ρ ‖∆_{i−1}‖_F.

So the sequence {(P_i, Q_i)} converges provided 2γ^{−1}δρ < 1, which is ensured by (5.13) and ρ < 1. The limit of (P_i, Q_i) gives a solution to (5.11) and satisfies (5.14).
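The proof's iteration (5.16)-(5.17) is constructive, and a direct NumPy transcription is short. This is our illustrative sketch (the function name is hypothetical), applying 𝓛^{−1} via the Kronecker matrix L of (5.4); it presumes the hypothesis (5.13), under which the iterates converge.

```python
import numpy as np

def solve_PQ(K1, M1, K2, M2, K21, M21, iters=50):
    """Fixed-point iteration (5.16)-(5.17) for the nonlinear system (4.4).
    Assumes kappa_1 = delta^2/gamma^2 < 1/4 as in Lemma 5.2, so that the
    limit satisfies ||(P, Q)||_F < 2*delta/gamma."""
    k, m = K1.shape[0], K2.shape[0]
    Ik, Im = np.eye(k), np.eye(m)
    L = np.block([[-np.kron(Ik, K2), np.kron(K1.T, Im)],
                  [ np.kron(M1.T, Im), -np.kron(Ik, M2)]])
    Linv = np.linalg.inv(L)                  # fine for small k, m
    P, Q = np.zeros((m, k)), np.zeros((m, k))
    for _ in range(iters):                   # first pass yields (P0, Q0)
        rhs = np.concatenate([(K21 - Q @ K21.T @ P).ravel(order='F'),
                              (M21 - P @ M21.T @ Q).ravel(order='F')])
        pq = Linv @ rhs
        P = pq[:m*k].reshape((m, k), order='F')
        Q = pq[m*k:].reshape((m, k), order='F')
    return P, Q
```

The resulting P and Q can then be fed to the transformation sketched at the end of section 4 to block-diagonalize K̂ and M̂.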
To measure differences between subspaces, we need the notion of canonical angles between two subspaces U and Ũ of dimension k. Let U and Ũ be basis matrices of U and Ũ, respectively. The canonical angles between U and Ũ are [8, Definition 5.3 on p. 43]

    θ_i := arccos σ_i,  i = 1, 2, . . . , k,

where σ_i (1 ≤ i ≤ k) are the singular values of (U^T U)^{−1/2} U^T Ũ (Ũ^T Ũ)^{−1/2}. Furthermore, we define the angle ∠(U, Ũ) and the angle matrix Θ(U, Ũ) between U and Ũ to be

    ∠(U, Ũ) = max_i θ_i,  Θ(U, Ũ) = diag(θ_1, θ_2, . . . , θ_k).

Note that the canonical angles θ_i, and thus ∠(U, Ũ) and Θ(U, Ũ), are independent of the choices of basis matrices.
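Canonical angles are straightforward to compute via the SVD. A brief sketch (ours): orthonormalizing each basis by QR gives the same singular values as the (U^T U)^{−1/2} scaling in the definition above, since the σ_i do not depend on which orthonormal bases are used.

```python
import numpy as np

def canonical_angles(U, Ut):
    """Return the canonical angles theta_i = arccos(sigma_i) between
    R(U) and R(Ut); the largest entry is the angle defined above."""
    QU, _ = np.linalg.qr(U)
    QT, _ = np.linalg.qr(Ut)
    s = np.clip(np.linalg.svd(QU.T @ QT, compute_uv=False), -1.0, 1.0)
    return np.arccos(s)
```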
Theorem 5.4. For LREP (1.2) with (4.1), where [U_1, U_2]^T [V_1, V_2] = I_n, if (5.13) holds, then there is a matrix pair (P, Q) ∈ R^{(n−k)×k} × R^{(n−k)×k} satisfying (5.14) to yield X and Y in (4.5) such that (4.6) holds. In particular, {R(U_1 + U_2 P), R(V_1 + V_2 Q)} is a pair of deflating subspaces for {K, M}, and

    eig(H) = eig(H_1) ∪ eig(H_2),  eig(H_1) ∩ eig(H_2) = ∅,    (5.19)

where H_i = \begin{bmatrix} 0 & K̃_i \\ M̃_i & 0 \end{bmatrix}, with K̃_i and M̃_i given by (4.6), for i = 1, 2. Moreover, if

    η := 2 max{ cos θ_U σ_max(U_2)/σ_min(U_1), cos θ_V σ_max(V_2)/σ_min(V_1) } (δ/γ) < 1,    (5.20)

where σ_max(·) and σ_min(·) denote the largest and the smallest singular value of a matrix, then

    ‖tan Θ_U‖_F ≤ ‖P‖_F / (σ_min(V_2) σ_min(U_1)) · 1/(1 − η),    (5.21a)
    ‖tan Θ_V‖_F ≤ ‖Q‖_F / (σ_min(U_2) σ_min(V_1)) · 1/(1 − η),    (5.21b)

where ‖P‖_F and ‖Q‖_F can be bounded using (5.14), and

    θ_U = ∠(R(U_1), R(U_2)),  θ_V = ∠(R(V_1), R(V_2)),    (5.22)
    Θ_U = Θ(R(U_1), R(U_1 + U_2 P)),  Θ_V = Θ(R(V_1), R(V_1 + V_2 Q)).    (5.23)
Proof. Under the assumptions of the theorem, (5.11) has a solution satisfying (5.14). Hence the column spaces of U_1 + U_2 P and V_1 + V_2 Q form a pair of deflating subspaces for {K, M}. Note that P and Q satisfy (4.4). By (4.6), we have

    eig(K̃_1 M̃_1) = eig((K_1 + K_{21}^T P)(M_1 + M_{21}^T Q)),
    eig(K̃_2 M̃_2) = eig((K_2 − Q K_{21}^T)(M_2 − P M_{21}^T)).

Hence the first relation of (5.19), i.e., eig(H) = eig(H_1) ∪ eig(H_2), holds as a consequence of (4.6). We now show eig(H_1) ∩ eig(H_2) = ∅. Apply Lemma 5.1 to obtain

    dis(K_1 + K_{21}^T P, M_1 + M_{21}^T Q; K_2 − Q K_{21}^T, M_2 − P M_{21}^T)
      ≥ γ − \sqrt{‖(K_{21}^T P, M_{21}^T Q)‖_F² + ‖(Q K_{21}^T, P M_{21}^T)‖_F²}
      ≥ γ − \sqrt{2(‖K_{21}‖_F² + ‖M_{21}‖_F²)(‖P‖_F² + ‖Q‖_F²)}
      ≥ γ − 2δρ > 0,

where ρ is defined in (5.14). Thus apply (5.3) to conclude eig(H_1) ∩ eig(H_2) = ∅.

Next we prove (5.21). Let U_{i0} = U_i (U_i^T U_i)^{−1/2} and V_{i0} = V_i (V_i^T V_i)^{−1/2}. Then

    R(U_1) = R(U_{10}),  R(U_1 + U_2 P) = R(U_{10} + U_2 P (U_1^T U_1)^{−1/2}).

Note that U_1^T V_2 = 0, so U_{10} U_{10}^T + V_{20} V_{20}^T = I, and thus

    U_{10} + U_2 P (U_1^T U_1)^{−1/2}
      = U_{10} + (U_{10} U_{10}^T + V_{20} V_{20}^T) U_2 P (U_1^T U_1)^{−1/2}
      = U_{10} [I + U_{10}^T U_2 P (U_1^T U_1)^{−1/2}] + V_{20} V_{20}^T U_2 P (U_1^T U_1)^{−1/2}
      = U_{10} [I + G] + V_{20} (V_2^T V_2)^{−1/2} P (U_1^T U_1)^{−1/2},

where G = U_{10}^T U_2 P (U_1^T U_1)^{−1/2} = U_{10}^T U_{20} (U_2^T U_2)^{1/2} P (U_1^T U_1)^{−1/2}, and we have used V_{20}^T U_2 = (V_2^T V_2)^{−1/2} V_2^T U_2 = (V_2^T V_2)^{−1/2}, since V_2^T U_2 = I_{n−k}. We have

    ‖G‖_2 ≤ cos θ_U (σ_max(U_2)/σ_min(U_1)) ‖P‖_2
          ≤ cos θ_U (σ_max(U_2)/σ_min(U_1)) ‖(P, Q)‖_F
          ≤ 2 cos θ_U (σ_max(U_2)/σ_min(U_1)) (δ/γ) ≤ η < 1,

where the last inequality holds because of (5.20). Thus

    R(U_1 + U_2 P) = R(U_{10} + U_2 P (U_1^T U_1)^{−1/2})
                   = R(U_{10} + V_{20} (V_2^T V_2)^{−1/2} P (U_1^T U_1)^{−1/2} (I + G)^{−1}),

which gives

    ‖tan Θ_U‖_F = ‖(V_2^T V_2)^{−1/2} P (U_1^T U_1)^{−1/2} (I + G)^{−1}‖_F
                ≤ ‖(V_2^T V_2)^{−1/2}‖_2 ‖P‖_F ‖(U_1^T U_1)^{−1/2}‖_2 ‖(I + G)^{−1}‖_2
                ≤ ‖P‖_F / (σ_min(V_2) σ_min(U_1)) · 1/(1 − η).

Similarly, we can prove (5.21b).
6 Conclusion

A pair of exact deflating subspaces of {K, M} can be used to decouple LREP (1.2) into two smaller LREPs, as proved in Theorem 3.1. A pair of approximate deflating subspaces, however, can only lead to an almost-decoupling of LREP (1.2), to an extent reflected by the tininess of ‖K_{21}‖_2 and ‖M_{21}‖_2 in (4.1). We have obtained in Theorem 5.4 bounds on how far the given pair of approximate deflating subspaces is from a nearby pair of exact deflating subspaces of {K, M}. The results can be viewed as extensions of Stewart's theory [5, 6, 7] for the standard eigenvalue problem of a matrix and the generalized eigenvalue problem of a matrix pencil. Closely related, in [10], we considered the effect of setting both K_{21} and M_{21} to 0 on the eigenvalues; perturbation bounds in terms of the squares of ‖K_{21}‖_2 and ‖M_{21}‖_2 were obtained.
References

[1] Zhaojun Bai and Ren-Cang Li. Minimization principles for the linear response eigenvalue problem I: Theory. SIAM J. Matrix Anal. Appl., 33(4):1075–1100, 2012.

[2] Zhaojun Bai and Ren-Cang Li. Minimization principles for the linear response eigenvalue problem II: Computation. SIAM J. Matrix Anal. Appl., 34(2):392–416, 2013.

[3] Zhaojun Bai, Ren-Cang Li, and Wen-Wei Lin. Linear response eigenvalue problem solved by extended locally optimal preconditioned conjugate gradient methods. Technical Report 2015-18, Department of Mathematics, University of Texas at Arlington, November 2015. Available at http://www.uta.edu/math/preprint/.

[4] D. Rocca. Time-Dependent Density Functional Perturbation Theory: New Algorithms with Applications to Molecular Spectra. PhD thesis, The International School for Advanced Studies, Trieste, Italy, 2007.

[5] G. W. Stewart. Error bounds for approximate invariant subspaces of closed linear operators. SIAM J. Numer. Anal., 8:796–808, 1971.

[6] G. W. Stewart. On the sensitivity of the eigenvalue problem Ax = λBx. SIAM J. Numer. Anal., 9(4):669–686, 1972.

[7] G. W. Stewart. Error and perturbation bounds for subspaces associated with certain eigenvalue problems. SIAM Rev., 15:727–764, 1973.

[8] G. W. Stewart and Ji-Guang Sun. Matrix Perturbation Theory. Academic Press, Boston, 1990.

[9] Zhongming Teng and Ren-Cang Li. Convergence analysis of Lanczos-type methods for the linear response eigenvalue problem. J. Comput. Appl. Math., 247:17–33, 2013.

[10] Zhongming Teng, Linzhang Lu, and Ren-Cang Li. Perturbation of partitioned linear response eigenvalue problems. Electron. Trans. Numer. Anal., 2016. To appear.

[11] Zhongming Teng, Yunkai Zhou, and Ren-Cang Li. A block Chebyshev-Davidson method for linear response eigenvalue problems. Technical Report 2013-11, Department of Mathematics, University of Texas at Arlington, September 2013. Available at http://www.uta.edu/math/preprint/; submitted.

[12] D. J. Thouless. Vibrational states of nuclei in the random phase approximation. Nuclear Physics, 22(1):78–95, 1961.

[13] D. J. Thouless. The Quantum Mechanics of Many-Body Systems. Academic Press, New York, 1972.

[14] E. V. Tsiper. Variational procedure and generalized Lanczos recursion for small-amplitude classical oscillations. JETP Letters, 70(11):751–755, 1999.

[15] E. V. Tsiper. A classical mechanics technique for quantum linear response. J. Phys. B: At. Mol. Opt. Phys., 34(12):L401–L407, 2001.

[16] Lei-Hong Zhang, Jungong Xue, and Ren-Cang Li. Rayleigh–Ritz approximation for the linear response eigenvalue problem. SIAM J. Matrix Anal. Appl., 35(2):765–782, 2014.